Deep Learning: Multiple Linear Regression Study Notes

2024-07-12


Multidimensional features

Variables and terms

$x_j$: the $j$-th feature (column attribute)
$n$: the number of features
$\vec{x}^{(i)}$: the $i$-th training example, a row vector
$x_j^{(i)}$: the value of feature $j$ in the $i$-th example (superscript indexes the row, subscript the column)
$\mu$: the mean (used for standardization)
$\sigma$: the standard deviation

Formula

$\vec{w} = [w_1 \; w_2 \; w_3 \; \dots \; w_n]$
$\vec{x} = [x_1 \; x_2 \; x_3 \; \dots \; x_n]$

$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b = w_1x_1 + w_2x_2 + \dots + w_nx_n + b$

Multiple Linear Regression

import numpy as np

# vectorized dot product: w1*x1 + w2*x2 + ... + wn*xn + b in one call
f = np.dot(w, x) + b

Note: even when n is large, this is very fast, because np.dot uses vectorized (parallel) hardware operations.
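
A minimal runnable sketch of the line above; the weights, bias, and feature values are made up for illustration:

import numpy as np

w = np.array([1.0, 2.5, -3.3])    # made-up weights
b = 4.0                           # made-up bias
x = np.array([10.0, 20.0, 30.0])  # one example's feature values
f = np.dot(w, x) + b              # 1.0*10 + 2.5*20 - 3.3*30 + 4 = -35.0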

Normal equation method

The normal equation solves for $\vec{w}$ and $b$ in a single closed-form step (a sketch follows this list):

  1. It is inefficient when the number of features is large (greater than about 1,000).
  2. It does not generalize to other algorithms such as logistic regression or neural networks.
  3. It requires no iteration.
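
A minimal sketch of the closed-form solution $(X^TX)^{-1}X^Ty$, with a column of ones prepended so the bias $b$ is learned as the first coefficient (all data made up):

import numpy as np

X = np.array([[2104., 5.], [1416., 3.], [1534., 3.], [852., 2.]])  # made-up features
y = np.array([460., 232., 315., 178.])                             # made-up targets

Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # ones column for the bias term
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)   # solves (X^T X) theta = X^T y
b, w = theta[0], theta[1:]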

$w_n = w_n - \alpha \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_n^{(i)}$

$b = b - \alpha \dfrac{1}{m} \sum\limits_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)$
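
A vectorized sketch of one simultaneous gradient descent update of all parameters (the data, learning rate, and starting parameters are made up):

import numpy as np

X = np.array([[1., 2.], [3., 4.], [5., 6.]])  # made-up data, m=3 examples, n=2 features
y = np.array([1., 2., 3.])
w, b, alpha, m = np.zeros(2), 0.0, 0.01, len(y)

err = X @ w + b - y               # f(x^(i)) - y^(i) for all m examples at once
w = w - alpha * (X.T @ err) / m   # updates every w_n simultaneously
b = b - alpha * err.sum() / m     # updates the bias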

The weights learned for features with larger ranges tend to be smaller, and the weights for features with smaller ranges tend to be larger; this imbalance distorts the cost surface and slows gradient descent, which is why features are rescaled.

Mean normalization

Subtract the mean and divide by the range (maximum minus minimum), so the rescaled feature lies in a small interval around zero.

Horizontal axis: $x_1 = \dfrac{x_1 - \mu_1}{2000 - 300}$  Vertical axis: $x_2 = \dfrac{x_2 - \mu_2}{5 - 0}$

$-0.18 \le x_1 \le 0.82$  $-0.46 \le x_2 \le 0.54$
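
A numpy sketch of mean normalization, over made-up data whose columns span roughly the 300–2000 and 0–5 ranges above:

import numpy as np

X = np.array([[2000., 5.], [300., 0.], [1000., 2.]])  # made-up raw features
mu = X.mean(axis=0)
X_norm = (X - mu) / (X.max(axis=0) - X.min(axis=0))   # (x - mean) / (max - min)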

Z-score normalization

$300 \le x_1 \le 2000$  $0 \le x_2 \le 5$

$x_1 = \dfrac{x_1 - \mu_1}{\sigma_1}$ giving $-0.67 \le x_1 \le 3.1$
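
The corresponding z-score sketch (same made-up data as above):

import numpy as np

X = np.array([[2000., 5.], [300., 0.], [1000., 2.]])  # made-up raw features
X_z = (X - X.mean(axis=0)) / X.std(axis=0)            # each column: mean 0, std 1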

By scaling, we keep the values of all features in a similar range, roughly (−3, 3), so that each feature's variation has a comparable effect on the prediction and gradient descent converges faster.

If the cost function J increases during training, the learning rate is too large or there is a bug in the code.

[Figure: learning curve showing cost J against the number of iterations]

Note: the number of iterations needed to converge varies widely from one problem to another.

Besides plotting the learning curve to judge convergence, you can also use an automatic convergence test:
let $\varepsilon = 10^{-3}$; if $J$ decreases by less than $\varepsilon$ in one iteration, it is considered to have converged.
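
A sketch of that test over a made-up history of J values recorded once per iteration:

cost_history = [10.2, 5.1, 4.8, 4.7995]  # made-up J values, one per iteration
epsilon = 1e-3
converged = cost_history[-2] - cost_history[-1] < epsilon  # 0.0005 < 0.001 -> True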

Set an appropriate learning rate

  1. When debugging, first try a very small learning rate and check that J decreases on every iteration.
  2. During training, the learning rate should be neither too large (J may diverge) nor too small (convergence is slow).
  3. When tuning, multiply the candidate rate by roughly 3 each time (0.001, 0.003, 0.01, …), then choose the largest rate that still converges, or a value slightly smaller than it; see the sweep sketch after this list.
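
A hypothetical sweep over roughly-3x-spaced learning rates on made-up single-feature data:

import numpy as np

X = np.array([[1.0], [2.0], [3.0]])  # made-up data
y = np.array([2.0, 4.0, 6.0])
m = len(y)

for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:  # each ~3x the previous
    w, b = np.zeros(1), 0.0
    for _ in range(200):                            # fixed budget of iterations
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.sum() / m
    J = ((X @ w + b - y) ** 2).sum() / (2 * m)
    print(alpha, J)  # pick the largest alpha whose J is still decreasing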

Feature engineering

Create new features by transforming or combining existing ones, giving the model more informative inputs to choose from.

$f_{\vec{w},b}(\vec{x}) = w_1x_1 + w_2x_2 + w_3x_3 + b$
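
For instance (the house-price framing here is an assumption for illustration), if $x_1$ is a lot's frontage and $x_2$ its depth, an engineered third feature can be the area:

x1, x2 = 40.0, 30.0  # made-up frontage and depth of a lot
x3 = x1 * x2         # engineered feature: area = frontage * depth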

Note: by adding polynomial features, linear regression can fit nonlinear functions as well as linear ones (polynomial regression).
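
A sketch of the polynomial feature construction (made-up data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])         # made-up single raw feature
X_poly = np.column_stack([x, x**2, x**3])  # engineered features x, x^2, x^3
# feed X_poly to ordinary multiple linear regression; note that x^3 spans a
# far larger range than x, so feature scaling matters even more here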