3 Types of Gradient Descent Algorithms For Small & Large Datasets
Introduction
Gradient Descent (GD) is an iterative algorithm for finding a minimum of an objective function (cost function) J(θ); for convex cost functions such as the one used in linear regression, this is the global minimum. It is widely used in machine learning to minimize cost functions. GD comes in several variants that trade off accuracy against computation time, and these variants are discussed below in detail.
We will use linear regression as the running example in this article while talking about gradient descent (a minimal sketch of its cost function follows the list below), although the ideas apply to other algorithms too, such as:
Logistic regression
Neural networks
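To make the running example concrete, here is a minimal sketch of the linear-regression cost function J(θ) that the variants below minimize (the function name and the variable names t0 and t1 are illustrative choices, not from the article):

import numpy as np

def cost(t0, t1, x, y):
    # J(theta): half the mean squared error of the hypothesis h(x) = t0 + t1*x
    m = len(y)
    h = t0 + t1 * x                # predictions of the linear model
    return np.sum((h - y) ** 2) / (2 * m)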
Batch Gradient Descent
In batch gradient descent we need to calculate the gradient on the whole dataset to perform just one update, so it can be very slow and is intractable for datasets that don't fit in memory. After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation:

repeat until convergence:
    θ_j := θ_j − (α/m) · Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x_j^(i)

where h_θ(x) = θ_0 + θ_1·x is the hypothesis, α is the learning rate, and m is the number of training examples.
If you have 300,000,000 records, you need to read all of them from disk on every iteration because you can't store them all in memory.
After computing the sum over all the records for one iteration, we take one step.
Then we repeat this for every step.
This means it takes a long time to converge.
Especially because disk I/O is typically a system bottleneck anyway, and this will inevitably require a huge number of reads.
Batch gradient descent is therefore not suitable for huge datasets. The code below shows how to implement batch gradient descent in Python.
import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, max_iter=1000, tol=1e-6):
    m = len(y)                      # number of training examples
    # initial theta
    t0 = np.random.random()
    t1 = np.random.random()
    # initial error (cost)
    J = sum((t0 + t1 * x[i] - y[i]) ** 2 for i in range(m)) / (2 * m)
    converged = False
    iter = 0
    # Iterate Loop
    while not converged:
        # over all training samples, compute the gradient (d/d_theta J(theta))
        grad0 = 1.0 / m * sum((t0 + t1 * x[i] - y[i]) for i in range(m))
        grad1 = 1.0 / m * sum((t0 + t1 * x[i] - y[i]) * x[i] for i in range(m))
        # simultaneous update of theta
        temp0 = t0 - alpha * grad0
        temp1 = t1 - alpha * grad1
        t0 = temp0
        t1 = temp1
        # update error and check convergence
        e = sum((t0 + t1 * x[i] - y[i]) ** 2 for i in range(m)) / (2 * m)
        if abs(J - e) <= tol:
            converged = True
        J = e            # update error
        iter += 1        # update iteration counter
        if iter == max_iter:
            print('Max iterations exceeded!')
            converged = True
    return t0, t1
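As a quick usage sketch (the synthetic dataset below, with a true intercept of 2 and slope of 3, is an illustrative assumption, not part of the original article):

import numpy as np

np.random.seed(0)
x = np.linspace(0, 10, 100)                     # 100 synthetic inputs
y = 2.0 + 3.0 * x + 0.5 * np.random.randn(100)  # y = 2 + 3x plus noise

t0, t1 = batch_gradient_descent(x, y, alpha=0.01, max_iter=5000)
print('intercept:', t0, 'slope:', t1)           # should land roughly near 2 and 3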
Stochastic Gradient Descent
The first step of the algorithm is to shuffle the whole training set. Then, to update each parameter, we use only one training example per iteration to compute the gradient of the cost function. Because it uses a single training example per iteration, this algorithm is faster on large datasets. With SGD one may not reach the same accuracy as batch gradient descent, but the results are computed much faster.
After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation:

for i = 1 to m:
    θ_j := θ_j − α · (h_θ(x^(i)) − y^(i)) · x_j^(i)

where m is the number of training examples.
SGD never actually converges the way batch gradient descent does, but ends up wandering around some region close to the global minimum.
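A minimal sketch of stochastic gradient descent for the same linear-regression example (the function name, the epoch count, and the shuffling via np.random.permutation are my own illustrative choices):

import numpy as np

def stochastic_gradient_descent(x, y, alpha=0.01, epochs=50):
    m = len(y)
    t0, t1 = np.random.random(), np.random.random()
    for _ in range(epochs):
        # first step: randomize (shuffle) the order of the training set
        for i in np.random.permutation(m):
            # update theta using a single training example
            error = t0 + t1 * x[i] - y[i]
            t0 = t0 - alpha * error
            t1 = t1 - alpha * error * x[i]
    return t0, t1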
Mini-Batch Gradient Descent
Mini-batch gradient descent takes the best of both worlds and performs an update for every mini-batch of training examples. This way, it:
reduces the variance of the parameter updates, which can lead to more stable convergence.
can make use of highly optimized matrix operations, which makes computing the gradient very efficient.
After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation:

repeat:
    for k = 1, 1 + b, 1 + 2b, ..., up to m:
        θ_j := θ_j − (α/b) · Σ_{i=k..k+b−1} (h_θ(x^(i)) − y^(i)) · x_j^(i)

where b is the size of each mini-batch and m is the total number of training examples.
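A sketch of the mini-batch variant under the same assumptions (the batch size of 32 and the function name are illustrative choices; x and y are assumed to be NumPy arrays):

import numpy as np

def mini_batch_gradient_descent(x, y, alpha=0.01, b=32, epochs=50):
    m = len(y)
    t0, t1 = np.random.random(), np.random.random()
    for _ in range(epochs):
        idx = np.random.permutation(m)       # shuffle the training set
        for start in range(0, m, b):
            batch = idx[start:start + b]     # indices of one mini-batch
            error = t0 + t1 * x[batch] - y[batch]
            # average the gradient over the examples in this mini-batch
            t0 = t0 - alpha * np.mean(error)
            t1 = t1 - alpha * np.mean(error * x[batch])
    return t0, t1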
Learning Rate
The learning rate α is a crucial parameter that controls how large a step our algorithm takes; a small sketch illustrating its effect follows this list.
1. If α is too large, the algorithm takes large steps, may overshoot the minimum, and may not converge.
2. If α is small, the steps are smaller and convergence is more reliable, but it can take many iterations to converge.
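As a purely illustrative sketch (the quadratic objective J(θ) = θ² and the two α values are assumptions, not from the article), the snippet below runs a few gradient steps with a small and a too-large learning rate:

def run_gd(alpha, steps=10, theta=5.0):
    # plain gradient descent on J(theta) = theta**2, whose gradient is 2*theta
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run_gd(alpha=0.1))   # small alpha: theta shrinks toward the minimum at 0
print(run_gd(alpha=1.1))   # too-large alpha: theta oscillates and blows up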
Checking whether gradient descent is working
Plot the value of the cost function against the number of iterations completed. This plot helps to identify whether gradient descent is working properly or not.
"J(θ) should decrease after every iteration and should become constant (or converge) after some iterations."
The above statement holds because after every iteration of gradient descent, θ_0 and θ_1 take values that move J(θ) towards its minimum, i.e. the value of J(θ) decreases after every iteration.
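A minimal sketch of such a diagnostic plot (it reuses the simple quadratic J(θ) = θ² from the learning-rate sketch above purely for illustration; in practice you would record J(θ) inside your own training loop):

import matplotlib.pyplot as plt

# record J(theta) = theta**2 at every iteration of a simple gradient descent run
theta, alpha = 5.0, 0.1
cost_history = []
for _ in range(50):
    cost_history.append(theta ** 2)    # cost at the current iteration
    theta = theta - alpha * 2 * theta  # gradient step: dJ/dtheta = 2*theta

plt.plot(range(len(cost_history)), cost_history)
plt.xlabel('Number of iterations')
plt.ylabel('Cost J(theta)')
plt.title('J(theta) should decrease and then flatten out')
plt.show()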
Source: http://blog.hackerearth.com/3-types-gradient-descent-algorithms-small-large-data-sets?utm_source=facebook-post&utm_campaign=Blog-gradient-descent-algorithms&utm_medium=he-handle