2024-07-12
Introduction: To train an artificial intelligence model, in simple terms, we adjust the model's parameters based on the data so that the values the model predicts match the values in our data. At the beginning they are certainly different, so we introduce a loss function to measure how far off we still are. Once we know how far off we are, how do we adjust the parameters of the original model?
Well, the whole point of adjusting the parameters is to make the predicted values match the required values. So can we find model parameters that minimize the difference between the predicted values and the required values? ===> This is the problem of finding a minimum.
So, in essence, we are looking for the minimum of the loss function.
Breaking the problem down:
Target: find a value of x that minimizes f(x).
Logic:
1. Pick any point x0 and compute the derivative f'(x0) at that point.
2. Based on the sign of the derivative, decide whether x0 should be increased or decreased: if the derivative is positive, decrease x (because as x increases, y also increases); if the derivative is negative, increase x.
3. Repeat steps 1 and 2 until the derivative is 0, or until the derivative changes sign.
Under what circumstances does the derivative change sign?
When the function's value was decreasing before and is now increasing, the derivative changes sign, and the minimum lies in between (the path to salvation lies within).
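The three steps above can be sketched in a few lines of Python. This is a minimal illustration (the function f(x) = 5x^2, the starting point, and the fixed step size are arbitrary choices for the example), stopping exactly when the derivative is 0 or changes sign:

```python
def f(x):
    return 5 * x ** 2          # example function to minimize

def df(x):
    return 10 * x              # its derivative

x = 3.0                        # step 1: pick any starting point x0
step = 0.01                    # fixed step size for the sketch
prev_sign = None
while True:
    g = df(x)                  # derivative at the current point
    if g == 0:
        break                  # derivative is 0: minimum reached
    sign = 1 if g > 0 else -1
    if prev_sign is not None and sign != prev_sign:
        break                  # derivative changed sign: minimum is bracketed
    x -= sign * step           # step 2: positive derivative -> decrease x, and vice versa
    prev_sign = sign

print(x)                       # ends very close to 0, the minimizer of 5x^2
```

With a fixed step the answer is only accurate to within one step size; real optimizers scale the step by the gradient's magnitude instead, which is what the learning rate does below.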
Gradient: it can be understood as a derivative, but in deep learning it is usually not the derivative of a single-variable function; rather, it is the collection of partial derivatives of a multivariate function.
for example:
Single-variable function:
Original function: y = 5x^2
Derivative: y' = 10x
So when x = 1, the derivative is 10.
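As a quick sanity check, the derivative at x = 1 can also be approximated numerically with a central finite difference (a sketch, not how frameworks compute gradients):

```python
def f(x):
    return 5 * x ** 2

h = 1e-6
# central difference approximation of f'(1); the exact value is 10
approx = (f(1 + h) - f(1 - h)) / (2 * h)
print(approx)  # ~10
```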
Multivariable function:
Three-variable function: y = 2x^2 + 6z^2 + 7m^3
Its partial derivatives with respect to the three variables: {4x, 12z, 21m^2}
The gradient at [1, 1, 1] is [4, 12, 21]; note that the gradient is a vector.
In both cases we differentiate the function, so the derivative is a good way to build intuition for the gradient.
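The three-variable example above can be written out directly from its partial derivatives:

```python
# Gradient of y = 2x^2 + 6z^2 + 7m^3, built from its partial derivatives
def grad(x, z, m):
    return [4 * x, 12 * z, 21 * m ** 2]

print(grad(1, 1, 1))  # [4, 12, 21]
```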
Interpretation: the gradient descent algorithm computes the gradient of the model on the input data, then uses the learning rate to update the model's original weight parameters. There are many variants of this algorithm, which we will introduce later.
The following diagram is a flowchart of deep learning; the minimum-finding problem corresponds to: loss function –> optimizer –> model
1. The smaller the loss function, the better the model
2. The goal of learning is to minimize the loss function
3. Model weights affect the loss function
4. Find the optimal weights through gradient descent
Update logic:
1. Calculate the predicted value y1 based on the input x and the current weight of the model
2. Compute the loss from y1 and the target value y using the loss function
3. Calculate the gradient of the model weights based on the loss
4. Use gradients and learning rates to adjust the model weights according to the optimizer
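The four steps above can be sketched end to end for a toy one-weight linear model y_pred = w * x with a mean-squared-error loss (the data, learning rate, and iteration count here are made up for illustration):

```python
# Hypothetical data following the true relation y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0            # initial model weight
lr = 0.01          # learning rate

for epoch in range(500):
    # 1. predicted values from the input and the current weight
    preds = [w * x for x in xs]
    # 2. mean squared error loss between predictions and targets
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. gradient of the loss w.r.t. w: d/dw mean((wx - y)^2) = mean(2x(wx - y))
    grad = sum(2 * x * (p - y) for x, p, y in zip(xs, preds, ys)) / len(xs)
    # 4. optimizer step: plain gradient descent
    w -= lr * grad

print(w)  # converges close to 2.0
```

In a real framework, step 3 is done by automatic differentiation and step 4 by an optimizer object, but the arithmetic is the same.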
Update methods:
1. Batch gradient descent: the gradient is computed over all samples together (accumulated)
2. Stochastic gradient descent: the gradient is computed from one sample at a time
3. Mini-batch gradient descent: each update uses n samples (accumulated)
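The only mechanical difference between the three methods is how the data is split per update, which can be sketched with a simple batching helper (the function name and sample data are illustrative):

```python
import random

def batches(samples, batch_size):
    """Yield shuffled mini-batches; the last batch may be smaller."""
    data = samples[:]
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

samples = list(range(10))
# batch_size = len(samples) -> batch gradient descent (one update per pass)
# batch_size = 1            -> stochastic gradient descent (one update per sample)
# otherwise                 -> mini-batch gradient descent
for batch in batches(samples, 4):
    print(batch)
```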