Gradient Descent Final

Gradient Descent is an optimization algorithm used in machine learning to minimize cost functions by iteratively updating model parameters in the direction of the steepest descent. It is computationally efficient and scalable, making it suitable for high-dimensional datasets, but can struggle with local minima if the learning rate is not properly tuned. The algorithm has various applications in business analytics, including predictive modeling, dynamic pricing, and fraud detection.


GRADIENT DESCENT ALGORITHM IN MACHINE LEARNING

Deepanshu Thakur - RBA09
Pallav Jain - RBA 106
Ritesh Shinde - RBA 111
Pranav Khadke - RBA 108
Sankalp Agarwal - RBA 112
Karanveer - RBA 101
TABLE OF CONTENTS

01. INTRODUCTION
02. ITS WORKING
03. PROS AND CONS
04. UTILITY IN BUSINESS ANALYTICS
05. UTILITY IN BUSINESS SCENARIO
06. COMPARISON WITH SIMILAR ALGORITHMS
07. DATASET & CONCLUSION


WHAT IS GRADIENT DESCENT?

Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. In machine learning, gradient descent is used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible.

Function requirements: the gradient descent algorithm does not work for all functions. There are two specific requirements. A function must be:

• Differentiable: a differentiable function has a derivative at every point in its domain; not all functions meet this criterion.

• Convex: for a univariate function, this means that the line segment connecting any two points on the function's curve lies on or above the curve (it does not cross it). If the segment crosses the curve, the function has a local minimum that is not the global one.
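Both requirements can be checked numerically for a simple function. A minimal Python sketch (function names and tolerances are illustrative, not from the slides):

```python
# Check the two requirements on f(x) = x**2.
def f(x):
    return x * x

def derivative(f, x, h=1e-6):
    # Central difference: defined wherever f is differentiable at x.
    return (f(x + h) - f(x - h)) / (2 * h)

def is_convex_on_segment(f, a, b, steps=100):
    # Convexity: the chord between (a, f(a)) and (b, f(b))
    # lies on or above the curve at every intermediate point.
    for i in range(1, steps):
        t = i / steps
        x = (1 - t) * a + t * b
        chord = (1 - t) * f(a) + t * f(b)
        if f(x) > chord + 1e-9:
            return False
    return True

print(round(derivative(f, 3.0), 3))          # slope of x**2 at x = 3 is 6
print(is_convex_on_segment(f, -5.0, 4.0))    # x**2 is convex everywhere
```

A concave function such as -x**2 fails the chord test, which is exactly the situation the convexity requirement rules out.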
HOW DOES GRADIENT DESCENT WORK?

In Gradient Descent, we start with a random guess and then move step by step toward the correct answer.

The common formula for updating a parameter in gradient descent is:
• New Value = Old Value - Step Size (how much you want to shift)

So the new value is an updated version of the old value, adjusted by an amount known as the step size.

Mathematically, the step size in Gradient Descent is computed as:
• Step Size = Learning Rate x Slope
Let us consider a square function. The objective of Gradient Descent is to minimize this function.

The minimum value this function can take is 0, so our objective is to reach that minimum value.

Now a random guess is plotted on the graph, and we do not yet know in which direction to move. The way to find out which direction leads to the minimum point is with the help of a concept called the derivative.

https://builtin.com/data-science/gradient-descent
MATHEMATICS OF GRADIENT DESCENT:

For the square function f(x) = x^2, the derivative is 2x. At point A, to the left of the minimum (x < 0), the gradient or slope 2x is negative, so the function decreases as x increases. At point B, to the right of the minimum (x > 0), the slope 2x is positive, so the function increases as x increases.

This is how the slope decides in which direction to move to reach the global minimum.
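The slope-driven update described above can be sketched in a few lines (illustrative code, not part of the original slides):

```python
# Gradient descent on f(x) = x**2, whose derivative (slope) is 2*x.
def gradient_descent(x0, learning_rate=0.1, iterations=100):
    x = x0
    for _ in range(iterations):
        slope = 2 * x                      # derivative of x**2 at the current point
        step_size = learning_rate * slope  # Step Size = Learning Rate x Slope
        x = x - step_size                  # New Value = Old Value - Step Size
    return x

# Starting from a guess on either side, the sign of the slope
# steers the update toward the minimum at x = 0.
print(gradient_descent(5.0))    # approaches 0 from the right
print(gradient_descent(-3.0))   # approaches 0 from the left
```

Note how the same update rule handles both starting points: a negative slope makes the step negative, so subtracting it moves x to the right, and vice versa.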
GRADIENT DESCENT LEARNING RATE:

• How big the steps gradient descent takes in the direction of the local minimum is determined by the learning rate, which figures out how fast or slow we will move towards the optimal weights.

• For the gradient descent algorithm to reach the local minimum, we must set the learning rate to an appropriate value, which is neither too low nor too high. This is important because if the steps it takes are too big, it may never reach the local minimum, bouncing back and forth across the convex function instead.

• If we set the learning rate to a very small value, gradient descent will eventually reach the local minimum, but that may take a while. So the learning rate should never be too high or too low.
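A small experiment illustrating the three regimes on f(x) = x^2 (illustrative code; the specific learning-rate values are assumptions chosen for demonstration):

```python
# Effect of the learning rate when minimizing f(x) = x**2 from x = 5.
def run(learning_rate, iterations=50, x=5.0):
    for _ in range(iterations):
        x = x - learning_rate * 2 * x  # update using the slope 2*x
    return x

print(abs(run(0.001)))  # too low: still far from 0 after 50 steps
print(abs(run(0.1)))    # well chosen: very close to the minimum
print(abs(run(1.1)))    # too high: each step overshoots and the iterate diverges
```

With the too-high rate, each update overshoots the minimum by more than the previous distance, so the iterates bounce back and forth with growing magnitude, which is exactly the divergence described above.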
PROS AND CONS

Efficiency
  Pros: Computationally efficient, especially in stochastic and mini-batch variations.
  Cons: Full-batch gradient descent is computationally expensive for large datasets.

Simplicity
  Pros: Easy to understand, implement, and apply to various machine learning problems.
  Cons: May require additional techniques for stability in complex scenarios.

Scalability
  Pros: Handles high-dimensional data and large datasets effectively with optimization techniques.
  Cons: May struggle with very large datasets without mini-batch or stochastic variants.

Convergence
  Pros: Guaranteed to converge to a minimum (local/global) with proper learning rate tuning.
  Cons: Can get stuck in local minima or saddle points in non-convex functions.

Learning Rate
  Pros: Flexibility to adjust the learning rate for different use cases.
  Cons: Sensitive to the learning rate: too high causes divergence, too low leads to slow convergence.
UTILITY IN BUSINESS ANALYTICS

• Optimization in Predictive Models: Gradient Descent helps train machine learning models (e.g., linear regression, logistic regression, neural networks) to predict customer behavior, demand, and risks.

• Cost Function Minimization: Businesses use it to minimize error metrics (e.g., mean squared error) in predictive modeling, improving the accuracy of forecasts and insights.

• Revenue and Profit Maximization: Can be applied in dynamic pricing strategies or inventory optimization to maximize revenue and minimize costs.

• Clustering and Segmentation: Facilitates unsupervised learning models like k-means clustering, enabling customer segmentation for targeted marketing campaigns.

• Risk Modeling and Mitigation: Helps optimize models for credit scoring, fraud detection, and insurance risk assessment by fine-tuning parameters for minimal loss.

• Operational Efficiency: Optimizes resource allocation, supply chain parameters, and workforce management using machine learning models trained with gradient descent.
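As a concrete sketch of cost-function minimization in predictive modeling, here is a minimal linear-regression fit trained by gradient descent. The data is synthetic and purely illustrative (roughly y = 2x + 1 with noise); it stands in for, say, spend vs. sales:

```python
# Fit y = w*x + b by minimizing mean squared error with gradient descent.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative feature values
ys = [3.1, 4.9, 7.2, 9.0, 10.8]   # illustrative targets, roughly y = 2x + 1

w, b = 0.0, 0.0
learning_rate = 0.01
n = len(xs)
for _ in range(5000):
    # Gradients of MSE = (1/n) * sum((w*x + b - y)**2) with respect to w and b.
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # converges to the least-squares slope and intercept
```

The same loop structure scales to logistic regression or neural networks; only the cost function and its gradients change.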
UTILITY IN BUSINESS SCENARIO

Customer Churn Prediction: Gradient Descent optimizes models predicting which customers are likely to leave, enabling retention strategies.

Sales Forecasting: Trains time-series models to predict future sales trends, helping businesses manage inventory and operations.

Dynamic Pricing: Optimizes pricing models based on real-time demand, supply, and competitor analysis.

Marketing Campaigns: Refines logistic regression models to target potential customers for higher conversion rates.

Inventory Management: Optimizes stock levels by predicting demand and minimizing holding or shortage costs.

Fraud Detection: Helps train anomaly detection models to identify fraudulent transactions in real time.

Supply Chain Optimization: Fine-tunes machine learning models for routing, demand forecasting, and reducing operational costs.
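Several of these scenarios (churn prediction, marketing campaigns, fraud detection) come down to training a classifier by minimizing a loss. A minimal logistic-regression sketch trained with gradient descent on tiny illustrative data (feature and labels are invented for demonstration):

```python
import math

# Illustrative data: feature = months inactive, label = churned (1) or not (0).
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   1,   1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.0, 0.0
learning_rate = 0.5
n = len(xs)
for _ in range(2000):
    # Gradient of the average cross-entropy (logistic) loss.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Predicted churn probability rises with months inactive.
print(round(sigmoid(w * 1.0 + b), 2), round(sigmoid(w * 4.0 + b), 2))
```

Note that the update rule is identical to the regression case; only the loss and therefore its gradient differ.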
COMPARISON WITH SIMILAR ALGORITHMS

Gradient Descent
  Description: Optimizes by iteratively updating parameters using the gradient of the loss function.
  Advantages: Simple, easy to implement, efficient for large datasets.
  Disadvantages: Sensitive to the learning rate, can get stuck in local minima, slow near convergence.
  Best Use: General-purpose optimization, regression, neural networks.

Adam (Adaptive Moment Estimation)
  Description: Combines Momentum and RMSProp for adaptive learning rates and faster convergence.
  Advantages: Fast convergence, adaptive learning rate, works well with sparse data.
  Disadvantages: Requires tuning hyperparameters, computationally expensive compared to vanilla Gradient Descent.
  Best Use: Deep learning, NLP, and vision tasks.

RMSProp
  Description: Scales the learning rate by averaging squared gradients to stabilize updates.
  Advantages: Prevents exploding gradients, effective in non-stationary settings.
  Disadvantages: Can converge too fast to a suboptimal solution, needs careful tuning of the decay rate.
  Best Use: Training RNNs and deep neural networks.

Momentum
  Description: Adds a fraction of the previous update to the current gradient to accelerate convergence.
  Advantages: Reduces oscillations, faster convergence in ravine-like regions.
  Disadvantages: Requires additional hyperparameter tuning (momentum term).
  Best Use: Deep learning and models with complex loss surfaces.
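A minimal sketch of how the Momentum update differs from the vanilla update, on the same square function used earlier (hyperparameter values are illustrative assumptions):

```python
# Minimize f(x) = x**2 with vanilla gradient descent vs. Momentum.
def vanilla(x, lr=0.02, steps=200):
    for _ in range(steps):
        x -= lr * 2 * x                  # plain gradient step
    return x

def momentum(x, lr=0.02, beta=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = beta * v + lr * 2 * x        # keep a fraction of the previous update
        x -= v                           # step by the accumulated velocity
    return x

print(abs(vanilla(5.0)), abs(momentum(5.0)))  # both approach the minimum at 0
```

Here beta is the momentum term mentioned in the table: it controls how much of the previous update carries over, smoothing oscillations along ravine-like loss surfaces.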


WORKING ON
TESLA DATASET
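The dataset itself is not reproduced in this deck. As a stand-in, here is a sketch of the kind of exercise the slide implies: fitting a linear trend to a small synthetic price series with gradient descent. All numbers are invented for illustration, not real Tesla data:

```python
# Illustrative only: synthetic "closing prices" over 6 days (not real data).
days = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
prices = [180.0, 183.5, 186.0, 190.5, 193.0, 196.5]

# Gradient-descent fit of price ≈ w*day + b under MSE loss.
w, b = 0.0, 0.0
lr = 0.02
n = len(days)
for _ in range(20000):
    grad_w = (2 / n) * sum((w * d + b - p) * d for d, p in zip(days, prices))
    grad_b = (2 / n) * sum((w * d + b - p) for d, p in zip(days, prices))
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # per-day trend and day-0 level of this synthetic series
```

On a real price series one would add train/test splits and feature scaling, but the optimization loop is the same one shown throughout the deck.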
CONCLUSION
• Gradient Descent is a fundamental optimization algorithm widely used in machine
learning and deep learning for minimizing cost functions. It iteratively updates
model parameters in the direction of the steepest descent, as determined by the
negative gradient.

• Unlike traditional optimization methods, Gradient Descent can handle high-dimensional datasets efficiently.

• Works well for large datasets and deep learning models.

• It may get stuck in local minima, saddle points, or oscillate if the learning rate is not
properly tuned.

• Gradient Descent remains one of the most widely used optimization techniques in
machine learning due to its efficiency, scalability, and adaptability. Despite its
limitations, improvements like Momentum, Adam, and RMSprop have enhanced its
performance in deep learning applications.
REFERENCES:

https://towardsdatascience.com/gradient-descent-from-scratch-e8b75fa986cc
https://builtin.com/data-science/gradient-descent
https://youtu.be/xOB10eTjoQ
https://www.javatpoint.com/gradient-descent-in-machine-learning

THANK
YOU
