Deep Learning: Multiple Linear Regression Study Notes

2024-07-12


Multidimensional features

Variables and terms

$x_j$: the $j$-th feature (column attribute)
$n$: the number of features
$\vec{x}^{(i)}$: the $i$-th training example, a row vector
$x_j^{(i)}$: the value of feature $j$ in the $i$-th example (superscript indexes the row, subscript the column)
$\mu$: the mean (used for standardization)
$\sigma$: the standard deviation

Formula

$\vec{w} = [w_1 \; w_2 \; w_3 \; \dots \; w_n]$
$\vec{x} = [x_1 \; x_2 \; x_3 \; \dots \; x_n]$

$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$

Multiple Linear Regression

import numpy as np

# vectorized dot product: w1*x1 + w2*x2 + ... + wn*xn + b in one call
f = np.dot(w, x) + b

Note: even when n is large, this is very fast, because np.dot uses vectorized (parallel) hardware operations.
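
A minimal runnable sketch of the line above; the weights, bias, and feature values are made up for illustration:

import numpy as np

w = np.array([1.0, 2.5, -3.3])    # made-up weights
b = 4.0                           # made-up bias
x = np.array([10.0, 20.0, 30.0])  # one example's feature values
f = np.dot(w, x) + b              # 1.0*10 + 2.5*20 - 3.3*30 + 4 = -35.0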

Normal equation method

The normal equation solves for $\vec{w}$ and $b$ in a single closed-form step (a sketch follows this list):

  1. It is inefficient when the number of features is large (greater than about 1,000).
  2. It does not generalize to other algorithms such as logistic regression or neural networks.
  3. It requires no iteration.
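
A minimal sketch of the closed-form solution $(X^TX)^{-1}X^Ty$, with a column of ones prepended so the bias $b$ is learned as the first coefficient (all data made up):

import numpy as np

X = np.array([[2104., 5.], [1416., 3.], [1534., 3.], [852., 2.]])  # made-up features
y = np.array([460., 232., 315., 178.])                             # made-up targets

Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # ones column for the bias term
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # solves (X^T X) theta = X^T y
b, w = theta[0], theta[1:]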

$w_n = w_n - \alpha \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_n^{(i)}$

$b = b - \alpha \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)$
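
A vectorized sketch of one simultaneous gradient descent update of all parameters (the data, learning rate, and starting parameters are made up):

import numpy as np

X = np.array([[1., 2.], [3., 4.], [5., 6.]])  # made-up data, m=3 examples, n=2 features
y = np.array([1., 2., 3.])
w, b, alpha, m = np.zeros(2), 0.0, 0.01, len(y)

err = X @ w + b - y               # f(x^(i)) - y^(i) for all m examples at once
w = w - alpha * (X.T @ err) / m   # updates every w_n simultaneously
b = b - alpha * err.sum() / m     # updates the bias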

The weights learned for features with larger ranges tend to be smaller, and the weights for features with smaller ranges tend to be larger; this imbalance distorts the cost surface and slows gradient descent, which is why features are rescaled.

Mean normalization

Subtract the mean and divide by the range (maximum minus minimum), so the rescaled feature lies in a small interval around zero.

Horizontal axis: $x_1 = \dfrac{x_1 - \mu_1}{2000 - 300}$  Vertical axis: $x_2 = \dfrac{x_2 - \mu_2}{5 - 0}$

$-0.18 \le x_1 \le 0.82$  $-0.46 \le x_2 \le 0.54$
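
A numpy sketch of mean normalization, over made-up data whose columns span roughly the 300–2000 and 0–5 ranges above:

import numpy as np

X = np.array([[2000., 5.], [300., 0.], [1000., 2.]])  # made-up raw features
mu = X.mean(axis=0)
X_norm = (X - mu) / (X.max(axis=0) - X.min(axis=0))   # (x - mean) / (max - min)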

Z-score normalization

$300 \le x_1 \le 2000$  $0 \le x_2 \le 5$

$x_1 = \dfrac{x_1 - \mu_1}{\sigma_1}$ giving $-0.67 \le x_1 \le 3.1$
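
The corresponding z-score sketch (same made-up data as above):

import numpy as np

X = np.array([[2000., 5.], [300., 0.], [1000., 2.]])  # made-up raw features
X_z = (X - X.mean(axis=0)) / X.std(axis=0)            # each column: mean 0, std 1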

By scaling, we keep the values of all features in a similar range, roughly (−3, 3), so that each feature's variation has a comparable effect on the prediction and gradient descent converges faster.

If the cost function J increases during training, the learning rate is too large or there is a bug in the code.

[Figure: learning curve showing cost J against the number of iterations]

Note: the number of iterations needed to converge varies widely from one problem to another.

Besides plotting the learning curve to judge convergence, you can also use an automatic convergence test:
let $\varepsilon = 10^{-3}$; if $J$ decreases by less than $\varepsilon$ in one iteration, it is considered to have converged.
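
A sketch of that test over a made-up history of J values recorded once per iteration:

cost_history = [10.2, 5.1, 4.8, 4.7995]  # made-up J values, one per iteration
epsilon = 1e-3
converged = cost_history[-2] - cost_history[-1] < epsilon  # 0.0005 < 0.001 -> True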

Set an appropriate learning rate

  1. When debugging, first try a very small learning rate and check that J decreases on every iteration.
  2. During training, the learning rate should be neither too large (J may diverge) nor too small (convergence is slow).
  3. When tuning, multiply the candidate rate by roughly 3 each time (0.001, 0.003, 0.01, …), then choose the largest rate that still converges, or a value slightly smaller than it; see the sweep sketch after this list.
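
A hypothetical sweep over roughly-3x-spaced learning rates on made-up single-feature data:

import numpy as np

X = np.array([[1.0], [2.0], [3.0]])  # made-up data
y = np.array([2.0, 4.0, 6.0])
m = len(y)

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:  # each ~3x the previous
    w, b = np.zeros(1), 0.0
    for _ in range(200):                            # fixed budget of iterations
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.sum() / m
    J = ((X @ w + b - y) ** 2).sum() / (2 * m)
    print(alpha, J)  # pick the largest alpha whose J is still decreasing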

Feature engineering

Create new features by transforming or combining existing ones, giving the model more informative inputs to choose from.

$f_{\vec{w},b}(\vec{x}) = w_1x_1 + w_2x_2 + w_3x_3 + b$
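
For instance (the house-price framing here is an assumption for illustration), if $x_1$ is a lot's frontage and $x_2$ its depth, an engineered third feature can be the area:

x1, x2 = 40.0, 30.0  # made-up frontage and depth of a lot
x3 = x1 * x2         # engineered feature: area = frontage * depth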

Note: by adding polynomial features, linear regression can fit nonlinear functions as well as linear ones (polynomial regression).
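
A sketch of the polynomial feature construction (made-up data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])         # made-up single raw feature
X_poly = np.column_stack([x, x**2, x**3])  # engineered features x, x^2, x^3
# feed X_poly to ordinary multiple linear regression; note that x^3 spans a
# far larger range than x, so feature scaling matters even more here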