Introduction To Gradient Descent
by Amir Ali
Understanding the Concept
Iteration: Gradient descent works by taking small steps in the direction of the negative gradient of the cost function, the direction of steepest descent toward lower cost.

Learning Rate: The learning rate determines the size of each step, and it is crucial to find the right balance to ensure convergence.

Convergence: The algorithm continues iterating until the changes in the parameters are negligible, indicating that a minimum has been found.
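These three ingredients fit in a few lines. A minimal sketch in Python, using the hypothetical cost f(x) = (x - 3)^2, whose gradient is 2(x - 3), chosen purely for illustration:

# Gradient of the illustrative cost f(x) = (x - 3)^2.
def gradient(x):
    return 2.0 * (x - 3.0)

x = 0.0              # initial guess (an assumed starting point)
learning_rate = 0.1  # step size: too large diverges, too small is slow
tolerance = 1e-8     # stop once updates become negligible

for step in range(10_000):
    update = learning_rate * gradient(x)
    x -= update                  # step against the gradient
    if abs(update) < tolerance:  # convergence: change is negligible
        break

print(x)  # approaches 3.0, the minimizer of f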
Cost Function and Optimization
2. Chain Rule
When the model is complex, the gradient is computed using the chain rule to differentiate the composite functions.

3. Numerical Approximation
In some cases, the gradient may be difficult to compute analytically, so it can be approximated numerically, as the sketch after this list shows.
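Both ideas can be sketched on a hypothetical one-sample cost J(w) = (sigmoid(w * x) - y)^2, with all values below assumed purely for illustration: the chain rule gives the gradient analytically, and a central difference approximates it numerically.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Composite cost for a single sample (x, y).
def cost(w, x, y):
    return (sigmoid(w * x) - y) ** 2

# Chain rule: with s = sigmoid(w * x),
# dJ/dw = 2 * (s - y) * s * (1 - s) * x.
def analytic_grad(w, x, y):
    s = sigmoid(w * x)
    return 2.0 * (s - y) * s * (1.0 - s) * x

# Numerical approximation via central difference:
# dJ/dw ≈ (J(w + h) - J(w - h)) / (2h).
def numerical_grad(w, x, y, h=1e-6):
    return (cost(w + h, x, y) - cost(w - h, x, y)) / (2.0 * h)

w, x, y = 0.5, 2.0, 1.0          # assumed values for the demo
print(analytic_grad(w, x, y))    # about -0.2115
print(numerical_grad(w, x, y))   # about -0.2115, matching the chain rule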
Updating the Parameters
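The update itself is one line per step: each parameter moves against its gradient, scaled by the learning rate (theta <- theta - learning_rate * grad). A minimal sketch, assuming NumPy and hypothetical parameter and gradient values:

import numpy as np

theta = np.array([0.5, -1.2])  # current parameters (assumed values)
grad = np.array([0.8, -0.3])   # gradient of the cost at theta (assumed)
learning_rate = 0.01           # step size

theta = theta - learning_rate * grad  # one gradient descent update
print(theta)                          # [0.492, -1.197]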
The method's trade-offs are worth keeping in mind.

Advantages:
- Efficient for large-scale optimization problems
- Works well with high-dimensional data
- Converges to the global minimum for convex functions, given a suitable learning rate
- Can be parallelized for faster computation

Limitations:
- Can get stuck in local minima for non-convex functions
- Sensitive to the choice of learning rate (illustrated after this list)
- May require many iterations to converge, especially for complex models
- May not perform well with sparse or noisy data
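That sensitivity is easy to see on f(x) = x^2, whose gradient is 2x: each update multiplies x by (1 - 2 * learning_rate), so the iterates shrink only when that factor has magnitude below 1. A small illustrative sketch:

# Run gradient descent on f(x) = x^2 from x = 1.0 for a fixed step count.
def run(learning_rate, steps=20):
    x = 1.0
    for _ in range(steps):
        x -= learning_rate * 2.0 * x  # gradient of x^2 is 2x
    return x

print(run(0.1))   # ~0.0115: converges smoothly (factor 0.8)
print(run(0.6))   # ~1e-14: oscillates but still converges (factor -0.2)
print(run(1.1))   # ~38.3: diverges (factor -1.2 grows each step)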