GRADIENT DESCENT ALGORITHM IN MACHINE LEARNING
Deepanshu Thakur- RBA09
Pallav Jain- RBA 106
Ritesh Shinde- RBA 111
Pranav Khadke- RBA 108
Sankalp Agarwal- RBA 112
Karanveer- RBA 101
01. INTRODUCTION
https://builtin.com/data-science/gradient-descent
MATHEMATICS OF GRADIENT DESCENT:
Consider the function f(x) = x², whose slope (gradient) is the derivative f'(x) = 2x.
At point A, to the left of the minimum, the function decreases as x increases; since x is negative there, the slope 2x is negative. At point B, to the right of the minimum, the function increases as x increases; since x is positive there, the slope 2x is positive.
The sign of the slope therefore tells us in which direction to move: stepping opposite to the gradient always leads towards the global minimum.
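To make this concrete, here is a minimal Python sketch, assuming the function from the slide is f(x) = x² with derivative 2x; the starting points chosen for A and B and the step size are illustrative assumptions, not values from the slides.

# The sign of the gradient of f(x) = x^2 decides the direction of each step.
def grad(x):
    return 2 * x                        # derivative of f(x) = x^2

step_size = 0.1                         # illustrative step size

for label, x in (("A (left of minimum)", -3.0), ("B (right of minimum)", 3.0)):
    g = grad(x)                         # negative at A, positive at B
    x_next = x - step_size * g          # move opposite to the sign of the gradient
    print(f"Point {label}: gradient = {g:+.1f}, x moves from {x:+.1f} to {x_next:+.2f}")

# Repeating this update from either side drives x towards the global minimum at x = 0.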
GRADIENT DESCENT LEARNING RATE:
• The size of the steps gradient descent takes in the direction of the local minimum is determined by the learning rate, which controls how fast or slow we move towards the optimal weights.
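As a rough sketch of this effect (the quadratic loss and the three learning-rate values below are assumptions chosen for illustration, not values from the slides), running the same update x ← x − learning_rate · f'(x) with different learning rates shows slow convergence, fast convergence, and divergence:

# Effect of the learning rate on gradient descent applied to f(x) = x^2.
def grad(x):
    return 2 * x                      # derivative of f(x) = x^2

for lr in (0.01, 0.1, 1.1):           # small, moderate, and overly large learning rates
    x = 5.0
    for _ in range(100):
        x -= lr * grad(x)             # gradient descent update step
    print(f"learning rate {lr}: after 100 steps x = {x:.4g}")

# A small rate creeps towards the minimum, a moderate rate converges quickly,
# and an overly large rate overshoots on every step and diverges.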
Simplicity
Advantage: Easy to understand, implement, and apply to various machine learning problems.
Limitation: May require additional techniques for stability in complex scenarios.
Customer Churn Prediction: Gradient Descent optimizes models predicting which customers are likely to leave, enabling retention strategies.
Supply Chain Optimization: Fine-tunes machine learning models for routing, demand forecasting, and reducing operational costs.
COMPARISON WITH SIMILAR ALGORITHMS
Algorithm: Gradient Descent
Description: Optimizes by iteratively updating parameters using the gradient of the loss function.
Advantages: Simple, easy to implement, efficient for large datasets.
Disadvantages: Sensitive to learning rate, can get stuck in local minima, slow near convergence.
Best Use: General-purpose optimization, regression, neural networks.

Algorithm: RMSProp
Description: Scales the learning rate by averaging squared gradients to stabilize updates.
Advantages: Prevents exploding gradients, effective in non-stationary settings.
Disadvantages: Can converge too fast to a suboptimal solution, needs careful tuning of the decay rate.
Best Use: Training RNNs and deep neural networks.

Algorithm: Momentum
Description: Adds a fraction of the previous update to the current gradient to accelerate convergence.
Advantages: Reduces oscillations, faster convergence in ravine-like regions.
Disadvantages: Requires additional hyperparameter tuning (momentum term).
Best Use: Deep learning and models with complex loss surfaces.
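For a concrete sense of how these variants differ, the sketch below writes the three update rules out on the same toy quadratic loss; the hyperparameter values (learning rate 0.1, momentum 0.9, decay 0.9, eps 1e-8) are common defaults chosen for illustration, not values from the slides.

# Update rules of Gradient Descent, Momentum and RMSProp on the toy loss f(x) = x^2.
def grad(x):
    return 2 * x                              # gradient of f(x) = x^2

lr, steps = 0.1, 100

# Plain gradient descent: x <- x - lr * g
x_gd = 5.0
for _ in range(steps):
    x_gd -= lr * grad(x_gd)

# Momentum: v <- beta * v + g;  x <- x - lr * v
x_m, v, beta = 5.0, 0.0, 0.9
for _ in range(steps):
    v = beta * v + grad(x_m)
    x_m -= lr * v

# RMSProp: s <- rho * s + (1 - rho) * g^2;  x <- x - lr * g / (sqrt(s) + eps)
x_r, s, rho, eps = 5.0, 0.0, 0.9, 1e-8
for _ in range(steps):
    g = grad(x_r)
    s = rho * s + (1 - rho) * g ** 2
    x_r -= lr * g / (s ** 0.5 + eps)

print(f"gradient descent: {x_gd:.6f}, momentum: {x_m:.6f}, rmsprop: {x_r:.6f}")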
• It may get stuck in local minima, saddle points, or oscillate if the learning rate is not
properly tuned.
• Gradient Descent remains one of the most widely used optimization techniques in
machine learning due to its efficiency, scalability, and adaptability. Despite its
limitations, improvements like Momentum, Adam, and RMSprop have enhanced its
performance in deep learning applications.
REFERENCES:
https://towardsdatascience.com/gradient-descent-from-scratch-e8b75fa986cc
https://builtin.com/data-science/gradient-descent
https://youtu.be/xOB10eTjoQ
https://www.javatpoint.com/gradient-descent-in-machine-learning
THANK YOU