

Gradient Descent algorithm and its variants

Difficulty Level : Medium ● Last Updated : 02 Jun, 2020

Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms. It is mainly used to update the parameters of the learning model.
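At its core, gradient descent repeatedly moves the parameters a small step in the direction opposite to the gradient of the cost function. Below is a minimal Python sketch of this update rule, using an illustrative one-parameter cost J(θ) = θ², where the cost function, starting value and learning rate are arbitrary choices rather than anything specified in this article:

# Core gradient descent update: theta moves against the gradient of J.
def gradient(theta):
    return 2 * theta              # dJ/dtheta for the illustrative cost J(theta) = theta**2

theta = 5.0                       # arbitrary starting point
learning_rate = 0.1               # step size

for _ in range(100):
    theta = theta - learning_rate * gradient(theta)

print(theta)                      # close to 0, the minimizer of J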

Types of Gradient Descent:


1. Batch Gradient Descent: This is a type of gradient descent that processes all the training examples in each iteration. If the number of training examples is large, batch gradient descent becomes computationally very expensive and hence is not preferred; instead, we prefer to use stochastic gradient descent or mini-batch gradient descent.
2. Stochastic Gradient Descent: This is a type of gradient descent that processes one training example per iteration, so the parameters are updated after every single example is processed. Each update is therefore much faster than in batch gradient descent. However, when the number of training examples is large, the number of iterations becomes quite large, which can be additional overhead for the system.
3. Mini-Batch Gradient Descent: This is a type of gradient descent that works faster than both batch gradient descent and stochastic gradient descent. Here b examples, where b < m, are processed per iteration. So even if the number of training examples is large, it is processed in batches of b training examples in one go. Thus, it works for larger training sets with fewer iterations.

Variables used:
Let m be the number of training examples.
Let n be the number of features.

Note: if b == m, then mini-batch gradient descent will behave the same as batch gradient descent.

Algorithm for batch gradient descent:

Let hθ(x) be the hypothesis for linear regression. Then, the cost function is given by:
Let Σ represent the sum over all training examples from i=1 to m.

Jtrain(θ) = (1/2m) Σ( hθ(x(i)) - y(i))2

Repeat {
  θj = θj – (learning rate/m) * Σ( hθ(x(i)) - y(i)) * xj(i)
    For every j = 0 … n
}

Here, xj(i) represents the jth feature of the ith training example. So if m is very large (e.g. 5 million training examples), then it takes hours or even days to converge to the global minimum. That's why for large datasets it is not recommended to use batch gradient descent, as it slows down the learning.
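A short NumPy sketch of the batch update above, assuming X is an m x (n+1) design matrix whose first column is all ones (for θ0), y is the vector of m targets, and the learning rate and iteration count are illustrative choices:

import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, num_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)                       # one parameter theta_j per feature
    for _ in range(num_iters):
        errors = X @ theta - y                # h_theta(x(i)) - y(i) for all i at once
        gradient = (X.T @ errors) / m         # (1/m) * sum of errors * x_j(i)
        theta -= learning_rate * gradient     # simultaneous update of every theta_j
    return theta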

Algorithm for stochastic gradient descent:

1) Randomly shuffle the data set so that the parameters are trained evenly across all types of data.
2) As mentioned above, it considers one example per iteration.

Hence,
Let (x(i), y(i)) be a single training example. Then,

Cost(θ, (x(i), y(i))) = (1/2) ( hθ(x(i)) - y(i))2

Jtrain(θ) = (1/m) Σ Cost(θ, (x(i), y(i)))


Repeat {
  For i = 1 to m {
    θj = θj – (learning rate) * ( hθ(x(i)) - y(i)) * xj(i)
      For every j = 0 … n
  }
}
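A corresponding sketch of stochastic gradient descent, with the same assumptions about X and y as in the earlier sketch; the data set is shuffled each epoch and θ is updated from one example at a time:

import numpy as np

def stochastic_gradient_descent(X, y, learning_rate=0.01, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        for i in np.random.permutation(m):        # step 1: randomly shuffle the data set
            error = X[i] @ theta - y[i]           # h_theta(x(i)) - y(i) for one example
            theta -= learning_rate * error * X[i] # update every theta_j from this example only
    return theta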

Algorithm for mini-batch gradient descent:

Let b be the number of examples in one batch, where b < m.
Assume b = 10, m = 100.

Note: However, we can adjust the batch size. It is generally kept as a power of 2, because some hardware such as GPUs achieves better run time with batch sizes that are powers of 2.

Repeat {
  For i = 1, 11, 21, ….., 91
    Let Σ be the summation over k from i to i+9.
    θj = θj – (learning rate / b) * Σ( hθ(x(k)) - y(k)) * xj(k)
      For every j = 0 … n
}
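A sketch of the same idea with mini-batches, using b = 10 as in the text; X, y and the remaining hyperparameters are again assumptions made only for illustration:

import numpy as np

def mini_batch_gradient_descent(X, y, b=10, learning_rate=0.01, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        for start in range(0, m, b):                        # i = 1, 11, 21, ..., 91 in the text
            Xb, yb = X[start:start + b], y[start:start + b]
            gradient = (Xb.T @ (Xb @ theta - yb)) / len(Xb) # gradient averaged over one batch
            theta -= learning_rate * gradient
    return theta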

Convergence trends in different variants of Gradient Descent:

In the case of Batch Gradient Descent, the algorithm follows a straight path towards the minimum. If the cost function is convex, it converges to the global minimum; if the cost function is not convex, it converges to a local minimum. Here the learning rate is typically held constant.

In the case of Stochastic Gradient Descent and mini-batch gradient descent, the algorithm does not converge exactly but keeps fluctuating around the global minimum. Therefore, in order to make it converge, we have to slowly decrease the learning rate. However, the convergence of stochastic gradient descent is much noisier, since each iteration processes only one training example.
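One common way to slowly decrease the learning rate is a decay schedule; the 1/(1 + decay·t) form below is just one illustrative choice, not something prescribed by this article:

initial_learning_rate = 0.1
decay = 0.01

def learning_rate_at(t):
    # The learning rate shrinks as the iteration count t grows, so the updates
    # around the minimum become smaller and the fluctuation dies down.
    return initial_learning_rate / (1 + decay * t)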
