2.5 Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that optimizes machine learning models by using a single random training example or a small batch for each iteration, improving computational efficiency for large datasets. While SGD is faster and more memory-efficient than traditional methods, it can produce noisy updates leading to oscillations and may require more iterations to converge. Despite its disadvantages, SGD is preferred in many scenarios due to its ability to escape local minima and its suitability for online learning.

Stochastic Gradient Descent (SGD)

Gradient Descent is an iterative optimization process that searches for an objective function's optimum value (minimum or maximum). It is one of the most widely used methods for adjusting a model's parameters in order to reduce a cost function in machine learning projects.
The primary goal of gradient descent is to identify the model parameters that minimize the cost function, which in turn yields good accuracy on both the training and test datasets.
In gradient descent, the gradient is a vector pointing in the direction of the function's steepest ascent at a particular point. By repeatedly moving in the opposite direction of the gradient, the algorithm gradually moves towards lower values of the function until it reaches a minimum.
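To make the update rule concrete, here is a minimal sketch (our own illustration, not code from the original) that repeatedly steps against the gradient of a toy one-dimensional quadratic; the objective, starting point, and learning rate are arbitrary choices for demonstration:

```python
def f(theta):
    # Example objective: a simple quadratic whose minimum is at theta = 3.
    return (theta - 3.0) ** 2

def grad_f(theta):
    # Analytical derivative of the quadratic above.
    return 2.0 * (theta - 3.0)

theta = 10.0         # arbitrary starting point
learning_rate = 0.1  # step size, often called alpha

for _ in range(100):
    # Move against the gradient, i.e. towards lower values of f.
    theta -= learning_rate * grad_f(theta)

print(round(theta, 4))  # approaches 3.0, the minimizer of f
```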

Types of Gradient Descent:


Typically, there are three types of Gradient Descent:
1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Mini-batch Gradient Descent
In this article, we will be discussing Stochastic Gradient Descent (SGD).

Stochastic Gradient Descent (SGD):


Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that is used for optimizing machine learning models. It addresses the computational inefficiency of traditional Gradient Descent methods when dealing with large datasets in machine learning projects.
In SGD, instead of using the entire dataset for each iteration, only a single random training example (or a small batch) is selected to calculate the gradient and update the model parameters. This random selection introduces randomness into the optimization process, hence the term "stochastic" in Stochastic Gradient Descent.
The advantage of using SGD is its computational efficiency, especially when dealing with large datasets. By using a single example or a small batch, the computational cost per iteration is significantly reduced compared to traditional Gradient Descent methods that require processing the entire dataset.
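As a hedged illustration of the per-example update, the sketch below (the linear model and squared-error loss are assumptions of ours, not part of the original) computes the gradient from a single randomly selected example and applies one parameter update:

```python
import numpy as np

def sgd_step(w, b, x_i, y_i, lr):
    """One SGD update computed from a single training example (x_i, y_i)."""
    y_pred = np.dot(w, x_i) + b   # model prediction for this one example
    error = y_pred - y_i          # residual on this example only
    grad_w = 2.0 * error * x_i    # gradient of (y_pred - y_i)**2 w.r.t. w
    grad_b = 2.0 * error          # gradient w.r.t. the bias
    # Parameters move a small step in the direction of the negative gradient.
    return w - lr * grad_w, b - lr * grad_b
```

A full pass over the data then simply calls a step like this once per shuffled example, which is what the algorithm described next does.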
Stochastic Gradient Descent Algorithm
 Initialization: Randomly initialize the parameters of the model.
 Set Parameters: Determine the number of iterations and the learning rate (alpha) for updating the parameters.
 Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or reaches the maximum number of iterations:
   Shuffle the training dataset to introduce randomness.
   Iterate over each training example (or a small batch) in the shuffled order.
   Compute the gradient of the cost function with respect to the model parameters using the current training example (or batch).
   Update the model parameters by taking a step in the direction of the negative gradient, scaled by the learning rate.
   Evaluate the convergence criteria, such as the difference in the cost function between successive iterations.
 Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations is reached, return the optimized model parameters.
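Putting those steps together, the following is a minimal sketch of the loop for a linear-regression model with a squared-error cost; the model, the tolerance-based convergence test, and the synthetic data at the bottom are illustrative assumptions rather than details from the original:

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, max_epochs=100, tol=1e-6, seed=0):
    """Minimal SGD loop for linear regression with a squared-error cost."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = rng.normal(size=n_features)          # Initialization: random parameters
    b = 0.0
    prev_cost = float("inf")

    for epoch in range(max_epochs):          # loop until convergence or max iterations
        order = rng.permutation(n_samples)   # shuffle the training dataset
        for i in order:                      # iterate over single examples
            error = X[i] @ w + b - y[i]
            w -= lr * 2.0 * error * X[i]     # negative-gradient step, scaled by lr
            b -= lr * 2.0 * error

        cost = np.mean((X @ w + b - y) ** 2)   # cost over the full dataset
        if abs(prev_cost - cost) < tol:        # convergence criterion
            break
        prev_cost = cost

    return w, b                              # return the optimized parameters


# Hypothetical usage on synthetic data:
X = np.random.default_rng(1).normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3
w, b = sgd_linear_regression(X, y)
```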
In SGD, since only one sample from the dataset is chosen at random for each iteration, the path taken by the algorithm to reach the minima is usually noisier than that of the typical Gradient Descent algorithm. This matters little in practice: what counts is that we reach the minimum, and with a significantly shorter training time.
The path taken by Batch Gradient Descent is shown below:
[Figure: batch gradient optimization path]
A path taken by Stochastic Gradient Descent looks as follows:
[Figure: stochastic gradient optimization path]

One thing to note is that, because SGD is generally noisier than typical Gradient Descent, it usually takes a higher number of iterations to reach the minima, owing to the randomness in its descent. Even though it requires more iterations than typical Gradient Descent, each iteration is computationally much cheaper, so the overall cost is usually lower. Hence, in most scenarios, SGD is preferred over Batch Gradient Descent for optimizing a learning algorithm.

Difference between Stochastic Gradient Descent & Batch Gradient Descent
The comparison between Stochastic Gradient Descent (SGD) and Batch Gradient Descent is as follows:
Dataset Usage
  SGD: Uses a single random sample or a small batch of samples at each iteration.
  Batch Gradient Descent: Uses the entire dataset (batch) at each iteration.

Computational Efficiency
  SGD: Computationally less expensive per iteration, as it processes fewer data points.
  Batch Gradient Descent: Computationally more expensive per iteration, as it processes the entire dataset.

Convergence
  SGD: Faster convergence due to frequent updates.
  Batch Gradient Descent: Slower convergence due to less frequent updates.

Noise in Updates
  SGD: High noise due to frequent updates with a single or few samples.
  Batch Gradient Descent: Low noise, as it updates parameters using all data points.

Stability
  SGD: Less stable, as it may oscillate around the optimal solution.
  Batch Gradient Descent: More stable, as it converges smoothly towards the optimum.

Memory Requirement
  SGD: Requires less memory, as it processes fewer data points at a time.
  Batch Gradient Descent: Requires more memory to hold the entire dataset in memory.

Update Frequency
  SGD: Frequent updates make it suitable for online learning and large datasets.
  Batch Gradient Descent: Less frequent updates make it suitable for smaller datasets.

Initialization Sensitivity
  SGD: Less sensitive to initial parameter values due to frequent updates.
  Batch Gradient Descent: More sensitive to initial parameter values.
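Much of the comparison above boils down to how many examples feed each parameter update. The helper below is a small illustration of that single difference; the function name and signature are our own, not taken from the original:

```python
import numpy as np

def select_examples(X, y, method, rng, batch_size=32):
    """Pick the examples that feed one parameter update under each variant."""
    n = len(X)
    if method == "batch":          # Batch GD: every update uses the whole dataset
        idx = np.arange(n)
    elif method == "sgd":          # SGD: one randomly chosen example per update
        idx = rng.integers(0, n, size=1)
    elif method == "mini-batch":   # Mini-batch GD: a small random subset
        idx = rng.choice(n, size=batch_size, replace=False)
    else:
        raise ValueError(f"unknown method: {method}")
    return X[idx], y[idx]
```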

Advantages of Stochastic Gradient Descent
 Speed: SGD is faster than other variants of Gradient Descent such as Batch Gradient Descent and Mini-Batch Gradient Descent, since it uses only one example (or a few) to update the parameters at each step.
 Memory Efficiency: Since SGD updates the parameters for each training example one at a time, it is memory-efficient and can handle large datasets that cannot fit into memory.
 Avoidance of Local Minima: Due to the noisy updates in SGD, it has the ability to escape from shallow local minima and move towards better (potentially global) minima.
Disadvantages of Stochastic Gradient Descent
 Noisy Updates: The updates in SGD are noisy and have a high variance, which can make the optimization process less stable and lead to oscillations around the minimum.
 Slow Convergence: SGD may require more iterations to converge to the minimum, since it updates the parameters for each training example one at a time.
 Sensitivity to Learning Rate: The choice of learning rate can be critical in SGD, since a high learning rate can cause the algorithm to overshoot the minimum, while a low learning rate can make the algorithm converge slowly.
 Less Accurate: Due to the noisy updates, SGD may not converge to the exact global minimum and can result in a suboptimal solution. This can be mitigated by using techniques such as learning rate scheduling and momentum-based updates (a brief sketch of both follows this list).
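As a hedged sketch of those two mitigations, the fragment below pairs a momentum-style update with a simple step-decay learning-rate schedule; the coefficient 0.9, the halving factor, and the drop interval are arbitrary demonstration values, not recommendations from the original:

```python
def momentum_update(w, velocity, grad, lr, beta=0.9):
    """One momentum-based SGD step: the velocity averages recent gradients,
    which damps the noise of single-example updates."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def step_decay_lr(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Simple learning-rate schedule: halve the rate every few epochs so the
    later, smaller steps settle closer to the minimum."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))
```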
