2024-07-12
| Notation | Meaning |
|---|---|
| $x_j$ | the $j$-th feature (column attribute) |
| $n$ | number of features |
| $\vec{x}^{(i)}$ | feature vector (row) of the $i$-th training example |
| $x_j^{(i)}$ | value of feature $j$ in the $i$-th example |
| $\mu$ | mean (used for standardization) |
| $\sigma$ | standard deviation |
$\vec{w} = [w_1 \;\; w_2 \;\; w_3 \;\dots\; w_n]$

$\vec{x} = [x_1 \;\; x_2 \;\; x_3 \;\dots\; x_n]$
$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$
```python
import numpy as np  # the dot product below relies on the np alias

f = np.dot(w, x) + b  # vectorized computation of w·x + b
```
Note: when n is large, the vectorized `np.dot` call is much faster than an explicit loop, because NumPy evaluates the dot product in optimized, parallel native code.
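To make the speed note concrete, here is a minimal sketch (the sample values of `w`, `x`, and `b` are illustrative, not from the post) comparing an explicit loop with the vectorized call:

```python
import numpy as np

# illustrative parameters and features (hypothetical values)
w = np.array([0.5, 1.2, -3.0, 7.0])
b = 10.0
x = np.array([10.0, 2.0, 1.0, 35.0])

# explicit loop: one multiply-add per feature, executed in Python
f_loop = b
for j in range(len(w)):
    f_loop += w[j] * x[j]

# vectorized: the whole dot product runs in optimized native code
f_vec = np.dot(w, x) + b

print(f_loop, f_vec)  # both print the same prediction
```

Both versions compute the same f; the vectorized one pays the Python interpreter overhead only once, which is why it scales much better as n grows.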
$w_n = w_n - \alpha\dfrac{1}{m}\sum\limits_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_n^{(i)}$

$b = b - \alpha\dfrac{1}{m}\sum\limits_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$
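A minimal NumPy sketch of these update rules (batch gradient descent; the names `X`, `y`, `alpha`, and `num_iters` are mine, not from the post), assuming `X` holds one training example per row:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for f(x) = w·x + b.

    X: (m, n) array of training examples, y: (m,) array of targets.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        err = X @ w + b - y           # f_{w,b}(x^(i)) - y^(i) for every example
        w -= alpha * (X.T @ err) / m  # updates all weights w_1..w_n at once
        b -= alpha * err.sum() / m
    return w, b
```

Note that all the weights and b are updated simultaneously from the same error vector, matching the summations above.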
Features (independent variables) with a large value range tend to end up with small weights, while features with a small range tend to end up with large weights; this mismatch in scales slows gradient descent down, which is why feature scaling helps.
Max scaling: dividing each feature by the maximum value of its range rescales it to roughly [0, 1].
Mean normalization. Starting from the original ranges $300 \le x_1 \le 2000$ and $0 \le x_2 \le 5$:

Horizontal axis: $x_1 = \dfrac{x_1 - \mu_1}{2000 - 300}$   Vertical axis: $x_2 = \dfrac{x_2 - \mu_2}{5 - 0}$

which gives $-0.18 \le x_1 \le 0.82$ and $-0.46 \le x_2 \le 0.54$.
Z-score normalization: $x_1 = \dfrac{x_1 - \mu_1}{\sigma_1}$, which here gives $-0.67 \le x_1 \le 3.1$.
Scaling keeps all feature values in a similar range (values roughly within −3 to 3 are acceptable), so that a change in any one feature has a comparable effect on the prediction and gradient descent converges faster.
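The three scaling methods above might be implemented like this (a sketch; the function names are mine), with each method applied column by column to a feature matrix `X`:

```python
import numpy as np

def max_scale(X):
    """Divide each feature by its maximum value -> roughly [0, 1]."""
    return X / X.max(axis=0)

def mean_normalize(X):
    """(x - mean) / (max - min) -> roughly [-1, 1]."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def zscore_normalize(X):
    """(x - mean) / standard deviation -> mean 0, std 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

Whichever method is used, the same μ, σ (or max/min) computed on the training set should also be applied to new inputs at prediction time.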
If the cost function J increases over iterations, the learning rate (step size) α is too large, or there is a bug in the code.
Note: the number of iterations needed to converge varies widely from one application to another.
Besides plotting the learning curve (J versus the number of iterations) to judge convergence, you can use an automatic convergence test.
Let $\varepsilon = 10^{-3}$. If J decreases by less than this small number in one iteration, gradient descent is considered to have converged.
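A sketch of how this automatic convergence test could be wired into the loop (`epsilon`, `max_iters`, and the stopping logic are illustrative; the cost here is the usual squared-error J):

```python
import numpy as np

def run_until_converged(X, y, alpha=0.01, epsilon=1e-3, max_iters=100_000):
    """Gradient descent that stops once J decreases by less than epsilon."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    prev_cost = np.inf
    for _ in range(max_iters):
        err = X @ w + b - y
        cost = (err ** 2).sum() / (2 * m)   # squared-error cost J(w, b)
        if prev_cost - cost < epsilon:      # automatic convergence test
            break
        prev_cost = cost
        w -= alpha * (X.T @ err) / m        # one gradient-descent step
        b -= alpha * err.sum() / m
    return w, b
```

Collecting the cost values in a list instead of keeping only the previous one also gives you the learning curve mentioned above.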
Feature engineering: create new features by transforming or combining existing ones, giving the model more useful inputs to choose from.
$f_{\vec{w},b}(\vec{x}) = w_1x_1 + w_2x_2 + w_3x_3 + b$
Note: Polynomial regression can be used for both linear and nonlinear fits.
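As a sketch of what such feature engineering can look like for polynomial regression (the raw feature `x` and the choice of x², x³ are illustrative, not from the post):

```python
import numpy as np

x = np.arange(1.0, 21.0)   # a single raw feature, 20 hypothetical examples

# engineered features: x, x^2, x^3 as three columns
X_poly = np.c_[x, x**2, x**3]

# z-score scaling matters here, since x^3 has a far larger range than x
X_scaled = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)

# X_scaled can now be fed to the same linear model f = w·x + b,
# which lets it fit a nonlinear curve in the original x
```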