Gradient Descent & Stochastic Gradient Descent

Uploaded by aimad baigouar

Why Stochastic Gradient Descent Works?
Sep 14, 2019 · 4 min read
Why Stochastic Gradient Descent Works? | by Sujan Dutta | Towards Data Science

Crazy paths often lead to the right destination!

Optimizing a cost function is one of the most important concepts in Machine Learning. Gradient Descent is the most common optimization algorithm and the foundation of how we train an ML model. But it can be really slow for large datasets. That’s why we use a variant of this algorithm known as Stochastic Gradient Descent to make our model learn a lot faster. But what makes it faster? And does it come at a cost?

Well… before diving into SGD, here’s a quick reminder of vanilla Gradient Descent…

We first randomly initialize the weights of our model. Using these weights, we calculate the cost over all the data points in the training set. Then we compute the gradient of the cost w.r.t. the weights and, finally, we update the weights. This process continues until we reach the minimum.

The update step is something like this…

W := W − α · ∂J/∂W

where J is the cost over all the training data points and α is the learning rate.
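To make this concrete, here is a minimal sketch of vanilla GD on a toy problem of my own (not from the article): a one-feature linear model with a mean-squared-error cost, where every update averages the gradient over all m examples.

```python
import numpy as np

def batch_gd_step(W, X, Y, lr):
    """One vanilla GD update: average gradient over ALL m examples."""
    m = len(X)
    # Gradient of J = (1/2m) * sum((X @ W - Y)^2) w.r.t. W
    grad_J = X.T @ (X @ W - Y) / m
    return W - lr * grad_J

# Toy data: y = 2x, so the optimal weight is exactly 2
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([2.0, 4.0, 6.0])
W = np.zeros(1)
for _ in range(200):
    W = batch_gd_step(W, X, Y, lr=0.1)
```

Note that every single step touches all three examples; with m in the millions, that inner matrix product becomes the bottleneck.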

Now, what happens if the number of data points in our training set becomes large? Say m = 10,000,000. In this case, we have to sum the cost over all m examples just to perform one update step!

Here comes SGD to rescue us…

Instead of calculating the cost over all the data points, we calculate the cost of a single data point and the corresponding gradient. Then we update the weights.

The update step is as follows…

W := W − α · ∂J_i/∂W

where J_i is the cost of the i-th training example.

We can easily see that, in this case, the update steps are performed very quickly, which is why we can reach the minimum in a very small amount of time.
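The per-example version of the same toy problem is then just one step per randomly chosen data point (again my own sketch, with made-up data):

```python
import numpy as np

def sgd_step(W, x_i, y_i, lr):
    """One SGD update: gradient of the cost of a SINGLE example."""
    grad_J_i = x_i * (x_i @ W - y_i)  # gradient of (1/2)(x_i . W - y_i)^2
    return W - lr * grad_J_i

rng = np.random.default_rng(0)
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([2.0, 4.0, 6.0])
W = np.zeros(1)
for _ in range(300):
    i = rng.integers(len(X))  # pick ONE random example per step
    W = sgd_step(W, X[i], Y[i], lr=0.05)
```

Each step now costs O(1) in the dataset size, which is exactly what makes SGD attractive when m is huge.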

But… why does SGD work?

The key concept is that we don’t need to check all the training examples to get an idea of the direction of decreasing slope. By analyzing only one example at a time and following its slope, we can reach a point that is very close to the actual minimum. Here’s an intuition…

Suppose you have made an app and want to improve it by taking feedback from 100 customers. You can do it in two ways. In the first way, you give the app to the first customer and take their feedback, then to the second one, then the third, and so on. After collecting feedback from all of them, you can improve your app. In the second way, you improve the app as soon as you get feedback from the first customer. Then you give it to the second customer and improve it again before giving it to the third one. Notice that in this way you are improving your app at a much faster rate and can reach an optimal point much earlier.

Hopefully, you can tell that the first process is vanilla Gradient Descent and the second one is SGD.

But SGD has some cons too…

SGD is much faster, but its convergence path is noisier than that of original gradient descent. This is because, in each step, it is not computing the actual gradient but an approximation of it. So we see a lot of fluctuations in the cost. But still, it is usually a much better choice.

Convergence paths are shown on a contour plot

We can see the noise of SGD in the above contour plot. Note that vanilla GD takes fewer updates, but each update is done only after one whole epoch. SGD takes many more update steps, but it needs fewer epochs, i.e. the number of times we iterate through all the examples is smaller, and thus it is a much faster process overall.
As you can see in the plot, there is a third variant of gradient descent, known as mini-batch gradient descent. This process combines the flexibility of SGD and the accuracy of GD. In this case, we take a fixed number (known as the batch size) of training examples at a time and compute the cost and the corresponding gradient. Then we update the weights and continue the same process for the next batch. If batch size = 1, it becomes SGD, and if batch size = m, it becomes normal GD.

W := W − α · ∂J_b/∂W

where J_b is the cost of the b-th batch.

Implementation from scratch

Here’s a Python implementation of mini-batch gradient descent from scratch. You can simply set batch_size = 1 to get SGD. In this code, I’ve used SGD to optimize the cost function of logistic regression for a simple binary classification problem.

Find the full code here.

Still curious? Watch a video that I made recently…

Stochastic Gradient Descent | Why and How it Works? - YouTube


import numpy as np
from sklearn.utils import shuffle

# X, Y, cost, grad and next_batch are defined elsewhere in the full code
def sgd(W_new, W_prev, lr, batch_size, epochs):
    X_, Y_ = shuffle(X, Y, random_state=0)
    for e in range(epochs):
        epoch_loss = []
        # Reshuffle the training data at the start of every epoch
        X_, Y_ = shuffle(X_, Y_, random_state=0)
        for (batchX, batchY) in next_batch(X_, Y_, batch_size):
            W_prev = W_new
            epoch_loss.append(cost(W_prev, batchX, batchY))
            gradients = grad(W_prev, batchX, batchY)
            # Mini-batch update: step against the batch gradient
            W_new = W_prev - lr * gradients
        # Track the average loss per epoch
        print(np.average(epoch_loss))
    return W_new
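The snippet above calls next_batch, cost, and grad without showing them. For the logistic-regression setup described, they could look roughly like this (my own sketch, assuming X has shape (m, n), Y has shape (m,), and W has shape (n,); the real helpers in the full code may differ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, X, Y):
    """Binary cross-entropy loss of logistic regression on a batch."""
    p = sigmoid(X @ W)
    eps = 1e-12  # keep log() away from zero
    return -np.mean(Y * np.log(p + eps) + (1 - Y) * np.log(1 - p + eps))

def grad(W, X, Y):
    """Gradient of the batch cost w.r.t. the weights W."""
    return X.T @ (sigmoid(X @ W) - Y) / len(X)

def next_batch(X, Y, batch_size):
    """Yield successive mini-batches of the given size."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], Y[start:start + batch_size]
```

With batch_size = 1 the generator yields one example at a time (pure SGD), and with batch_size = len(X) it yields the whole training set at once (vanilla GD).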
