Technology Sharing

Deep Learning - Gradient Descent Algorithm - NLP (V)

2024-07-12


Introduction to Gradient Descent Algorithm in Deep Learning

Finding the minimum value problem

Introduction: To train an artificial intelligence model, in simple terms, we adjust the model parameters based on the data so that the values predicted by the model match the values in our data. At the beginning the predictions are certainly different from the data, so we introduce a loss function to measure how far off they are. Once we know how far off they are, how do we adjust the parameters of the original model?

The purpose of adjusting the parameters of the original model is to make the predicted value match the required value. Can we find model parameters that minimize the difference between the predicted value and the required value? ==> This is a problem of finding a minimum.

So the essence is to find the minimum value of the loss function.

Mathematically find the minimum value

Breaking it down:
Goal: find a value of x that minimizes f(x).
Logic:

1. Pick an arbitrary point x0 and compute the derivative f'(x0) at that point.
2. Use the sign of the derivative to decide whether x0 should be increased or decreased: if the derivative is positive, decrease x, because increasing x would also increase y; if the derivative is negative, increase x.
3. Repeat steps 1 and 2 until the derivative is 0, or until the derivative changes sign.
Under what circumstances does the derivative change sign?
When the function value, which was decreasing, starts increasing again. At that point the derivative flips sign, so the minimum lies in between (the path to salvation lies within), as the sketch below illustrates.
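A minimal Python sketch of this logic, using a hypothetical function f(x) = (x - 3)^2 (not from the original example) and a fixed step size:

```python
def f(x):
    # Hypothetical example function; any differentiable f(x) would work here.
    return (x - 3) ** 2

def df(x):
    # Derivative of f(x) = (x - 3)^2
    return 2 * (x - 3)

x = 0.0      # step 1: start from an arbitrary point x0
step = 0.1   # fixed step size for this sketch
for _ in range(1000):
    d = df(x)
    if d == 0:                           # derivative is 0: stop
        break
    x = x - step if d > 0 else x + step  # step 2: move against the sign of the derivative
    if df(x) * d < 0:                    # derivative changed sign: minimum lies between the last two points
        break

print(x)  # ends up near x = 3, the minimizer, within one step size
```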

Gradient

Gradient: it can be loosely understood as a derivative, but in deep learning it is usually not the derivative of a single variable; rather, it is the set of partial derivatives of a multivariate function.
For example:
Single-variable function:

Original function: y = 5x^2
Derivative function: y' = 10x
That is, when x = 1, the derivative value is 10.

Multivariate function

Three-variable function: y = 2x^2 + 6z^2 + 7m^3
Partial derivatives with respect to the three variables (the gradient): (4x, 12z, 21m^2)
The gradient at [1, 1, 1] is [4, 12, 21]; note that the gradient is a vector.

In both cases we are differentiating the function, so the derivative is a good way to understand the gradient; the sketch below checks the multivariate example.
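The gradient of the three-variable example can be verified with an automatic-differentiation library; this is a minimal sketch assuming PyTorch is available:

```python
import torch

# The three-variable function from the example: y = 2x^2 + 6z^2 + 7m^3
x = torch.tensor(1.0, requires_grad=True)
z = torch.tensor(1.0, requires_grad=True)
m = torch.tensor(1.0, requires_grad=True)

y = 2 * x**2 + 6 * z**2 + 7 * m**3
y.backward()  # autograd computes the partial derivatives

print(x.grad, z.grad, m.grad)  # tensor(4.) tensor(12.) tensor(21.), i.e. the gradient [4, 12, 21]
```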

Gradient Descent Algorithm

Interpretation: the gradient descent algorithm computes the model's gradients on the input data and then updates the model's original weight parameters using the learning rate. There are many variants of the algorithm, which we will introduce later.
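A minimal sketch of a single gradient-descent update, with illustrative weights and learning rate (the gradient vector is borrowed from the example above):

```python
# One gradient-descent step: move each weight a small step against its gradient.
learning_rate = 0.01  # illustrative value

def gradient_descent_step(weights, gradients):
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

weights = [0.5, -1.2, 3.0]     # current model weights (made up)
gradients = [4.0, 12.0, 21.0]  # e.g. the gradient from the example above
weights = gradient_descent_step(weights, gradients)
print(weights)  # approximately [0.46, -1.32, 2.79]
```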

The minimum-finding problem in the deep learning process

Overall flow of deep learning

In the overall flow of deep learning, the minimum-finding problem corresponds to the part Loss function –> Optimizer –> Model.

The goal of minimizing the loss function

1. The smaller the loss function, the better the model.
2. The goal of learning is to minimize the loss function (a small illustration follows this list).
3. The model weights affect the loss function.
4. The optimal weights are found through gradient descent.
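As a small illustration of what the loss function measures, here is a sketch using mean squared error as one possible choice of loss (the prediction and target values are made up):

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()  # mean squared error, one common choice of loss function

y_pred = torch.tensor([2.5, 0.0, 2.0])   # values predicted by the model (illustrative)
y_true = torch.tensor([3.0, -0.5, 2.0])  # target values from the data (illustrative)

loss = loss_fn(y_pred, y_true)
print(loss.item())  # the smaller this value, the closer the predictions are to the targets
```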

Weight Update

Update logic

1. Compute the predicted value y1 from the input x and the model's current weights.
2. Compute the loss from y1 and y using the loss function.
3. Compute the gradients of the model weights from the loss.
4. Use the gradients and the learning rate to adjust the model weights via the optimizer (see the sketch after this list).
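A minimal PyTorch sketch of these four steps, using a toy linear model, random data, and an arbitrary learning rate (all illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)  # toy model (illustrative size)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)    # input batch (made-up data)
y = torch.randn(8, 1)    # target values

y1 = model(x)            # 1. predicted value from the current weights
loss = loss_fn(y1, y)    # 2. loss computed from y1 and y
optimizer.zero_grad()
loss.backward()          # 3. gradients of the model weights w.r.t. the loss
optimizer.step()         # 4. optimizer uses gradients and learning rate to update the weights
```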

Update methods:

1. Gradient descent: the gradient is computed over all samples together (accumulated).
2. Stochastic gradient descent: the gradient is computed from one sample at a time.
3. Mini-batch gradient descent: the gradient is computed over n samples at a time (accumulated); see the sketch below.
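The three schemes differ only in how many samples contribute to each gradient computation. A sketch using PyTorch's DataLoader, with an illustrative dataset and batch sizes:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(100, 3)  # made-up dataset
y = torch.randn(100, 1)
dataset = TensorDataset(x, y)

full_batch = DataLoader(dataset, batch_size=len(dataset))      # 1. gradient descent: all samples per update
stochastic = DataLoader(dataset, batch_size=1, shuffle=True)   # 2. stochastic gradient descent: one sample per update
mini_batch = DataLoader(dataset, batch_size=16, shuffle=True)  # 3. mini-batch gradient descent: n samples per update
```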