

Gradient Descent algorithm and its variants

Difficulty Level : Medium ● Last Updated : 02 Jun, 2020

Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms. It is mainly used to update the parameters of the learning model.
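At its core, gradient descent repeatedly moves the parameters a small step in the direction opposite to the gradient of the cost function. Below is a minimal Python sketch of this update rule, using an illustrative one-parameter cost J(θ) = θ², where the cost function, starting value and learning rate are arbitrary choices rather than anything specified in this article:

# Core gradient descent update: theta moves against the gradient of J.
def gradient(theta):
    return 2 * theta              # dJ/dtheta for the illustrative cost J(theta) = theta**2

theta = 5.0                       # arbitrary starting point
learning_rate = 0.1               # step size

for _ in range(100):
    theta = theta - learning_rate * gradient(theta)

print(theta)                      # close to 0, the minimizer of J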

Types of Gradient Descent:


1. Batch Gradient Descent: This is a type of gradient descent that processes all the training examples in each iteration. If the number of training examples is large, batch gradient descent becomes computationally very expensive and hence is not preferred; instead, we prefer to use stochastic gradient descent or mini-batch gradient descent.
2. Stochastic Gradient Descent: This is a type of gradient descent that processes one training example per iteration, so the parameters are updated after every single example is processed. Each update is therefore much faster than in batch gradient descent. However, when the number of training examples is large, the number of iterations becomes quite large, which can be additional overhead for the system.
3. Mini-Batch Gradient Descent: This is a type of gradient descent that works faster than both batch gradient descent and stochastic gradient descent. Here b examples, where b < m, are processed per iteration. So even if the number of training examples is large, it is processed in batches of b training examples in one go. Thus, it works for larger training sets with fewer iterations.

Variables used:
Let m be the number of training examples.
Let n be the number of features.

Note: if b == m, then mini-batch gradient descent will behave the same as batch gradient descent.

Algorithm for batch gradient descent:

Let hθ(x) be the hypothesis for linear regression. Then, the cost function is given by:
Let Σ represent the sum over all training examples from i=1 to m.

Jtrain(θ) = (1/2m) Σ( hθ(x(i)) - y(i))2

Repeat {
  θj = θj – (learning rate/m) * Σ( hθ(x(i)) - y(i)) * xj(i)
    For every j = 0 … n
}

Here, xj(i) represents the jth feature of the ith training example. So if m is very large (e.g. 5 million training examples), then it takes hours or even days to converge to the global minimum. That's why for large datasets it is not recommended to use batch gradient descent, as it slows down the learning.
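A short NumPy sketch of the batch update above, assuming X is an m x (n+1) design matrix whose first column is all ones (for θ0), y is the vector of m targets, and the learning rate and iteration count are illustrative choices:

import numpy as np

def batch_gradient_descent(X, y, learning_rate=0.01, num_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)                       # one parameter theta_j per feature
    for _ in range(num_iters):
        errors = X @ theta - y                # h_theta(x(i)) - y(i) for all i at once
        gradient = (X.T @ errors) / m         # (1/m) * sum of errors * x_j(i)
        theta -= learning_rate * gradient     # simultaneous update of every theta_j
    return theta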

Algorithm for stochastic gradient descent:

1) Randomly shuffle the data set so that the parameters are trained evenly across all types of data.
2) As mentioned above, it considers one example per iteration.

Hence,
Let (x(i), y(i)) be a single training example. Then,

Cost(θ, (x(i), y(i))) = (1/2) ( hθ(x(i)) - y(i))2

Jtrain(θ) = (1/m) Σ Cost(θ, (x(i), y(i)))


Repeat {
  For i = 1 to m {
    θj = θj – (learning rate) * ( hθ(x(i)) - y(i)) * xj(i)
      For every j = 0 … n
  }
}
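A corresponding sketch of stochastic gradient descent, with the same assumptions about X and y as in the earlier sketch; the data set is shuffled each epoch and θ is updated from one example at a time:

import numpy as np

def stochastic_gradient_descent(X, y, learning_rate=0.01, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        for i in np.random.permutation(m):        # step 1: randomly shuffle the data set
            error = X[i] @ theta - y[i]           # h_theta(x(i)) - y(i) for one example
            theta -= learning_rate * error * X[i] # update every theta_j from this example only
    return theta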

Algorithm for mini-batch gradient descent:

Let b be the number of examples in one batch, where b < m.
Assume b = 10, m = 100.

Note: However, we can adjust the batch size. It is generally kept as a power of 2, because some hardware such as GPUs achieves better run time with batch sizes that are powers of 2.

Repeat {
  For i = 1, 11, 21, ….., 91
    Let Σ be the summation over k from i to i+9.
    θj = θj – (learning rate / b) * Σ( hθ(x(k)) - y(k)) * xj(k)
      For every j = 0 … n
}
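A sketch of the same idea with mini-batches, using b = 10 as in the text; X, y and the remaining hyperparameters are again assumptions made only for illustration:

import numpy as np

def mini_batch_gradient_descent(X, y, b=10, learning_rate=0.01, num_epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_epochs):
        for start in range(0, m, b):                        # i = 1, 11, 21, ..., 91 in the text
            Xb, yb = X[start:start + b], y[start:start + b]
            gradient = (Xb.T @ (Xb @ theta - yb)) / len(Xb) # gradient averaged over one batch
            theta -= learning_rate * gradient
    return theta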

Convergence trends in different variants of Gradient Descent:

In the case of Batch Gradient Descent, the algorithm follows a straight path towards the minimum. If the cost function is convex, it converges to the global minimum; if the cost function is not convex, it converges to a local minimum. Here the learning rate is typically held constant.

In the case of Stochastic Gradient Descent and mini-batch gradient descent, the algorithm does not converge exactly but keeps fluctuating around the global minimum. Therefore, in order to make it converge, we have to slowly decrease the learning rate. However, the convergence of stochastic gradient descent is much noisier, since each iteration processes only one training example.
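One common way to slowly decrease the learning rate is a decay schedule; the 1/(1 + decay·t) form below is just one illustrative choice, not something prescribed by this article:

initial_learning_rate = 0.1
decay = 0.01

def learning_rate_at(t):
    # The learning rate shrinks as the iteration count t grows, so the updates
    # around the minimum become smaller and the fluctuation dies down.
    return initial_learning_rate / (1 + decay * t)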
