SGD
SGD
G.R.A.D.I.E.N.T.
4b. Stochastic Gradient Descent (SGD) and conventional Gradient Descent (GD) are
both optimization algorithms used in training machine learning models, but they differ
in their approach to updating model parameters. Here’s when each one is typically
used:
Gradient Descent (GD):
Batch Processing: GD computes the gradient of the loss function with respect to the
model parameters over the entire training dataset, then updates the parameters once
per epoch (one full pass through the dataset).
Suitable for Small Datasets: GD works well when the dataset fits entirely in memory
and is not too large, since each parameter update is based on the exact average
gradient computed across the whole dataset.
Advantages: It typically converges to the global minimum (for convex problems) or a
good local minimum (for non-convex problems) more reliably, because every step
uses exact gradient information (see the sketch below).
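To make the GD update concrete, here is a minimal sketch of full-batch gradient
descent for a mean-squared-error linear model in NumPy. The function name,
learning rate, and epoch count are illustrative assumptions, not part of any particular
library.

```python
import numpy as np

def gradient_descent(X, y, lr=0.05, epochs=100):
    """Full-batch gradient descent on a mean-squared-error linear model.
    Exactly one parameter update per epoch, using the gradient averaged
    over the whole dataset."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Average gradient of the squared error over every training example
        grad = (2.0 / n_samples) * X.T @ (X @ w - y)
        w -= lr * grad  # one precise update per full pass over the data
    return w

# Tiny usage example with noise-free synthetic data (y = 3 * x)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 6.0, 9.0, 12.0])
print(gradient_descent(X, y))  # approaches [3.]
```

Because the whole dataset is touched before each update, the trajectory is smooth,
but each step costs a full pass over the data.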
Stochastic Gradient Descent (SGD):
Online Learning: SGD updates the model parameters incrementally, computing the
gradient from a single training example (or a small mini-batch) at a time rather than
from the full dataset.
Large Datasets: SGD is particularly useful for large datasets that cannot fit into
memory, since it makes iterative updates without loading the entire dataset into
memory at once.
Advantages: Each update is far cheaper than a full GD step, so SGD often makes
faster overall progress per unit of computation, even though the individual updates
are noisier (a sketch follows this block).
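For contrast with the batch version above, here is a minimal sketch of per-example
SGD for the same mean-squared-error objective; the names and hyperparameter
values are again illustrative assumptions.

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=20, seed=0):
    """Stochastic gradient descent: one parameter update per training
    example, using a noisy single-example gradient estimate."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # visit examples in random order
            xi, yi = X[i], y[i]
            grad = 2.0 * xi * (xi @ w - yi)   # gradient from one example only
            w -= lr * grad                    # many cheap, noisy updates per epoch
    return w
```

Shuffling each epoch keeps the single-example gradients from following a fixed,
possibly biased order; in a true streaming setting the inner loop would simply
consume examples as they arrive.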
When to Use SGD vs. GD:
GD: Use GD when you have a small to moderate-sized dataset that can fit into
memory and when you want to ensure precise updates based on the entire dataset.
It's also suitable for situations where you want a smoother convergence trajectory
towards the minimum.
SGD: Use SGD when dealing with large datasets or when implementing online
learning, where the model must be updated continuously as new data arrives. It is
also beneficial when computational resources are limited, since each update is
cheap and can be made frequently.
Variants:
Mini-Batch SGD: A compromise between GD and SGD, where each update is
computed from a small batch of examples. It combines the benefits of both
approaches, reducing the variance of the parameter updates compared to pure SGD
while remaining computationally efficient (a sketch is given after this list).
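Under the same illustrative setup as the sketches above, mini-batch SGD only
changes how many examples feed each gradient estimate; the batch size below is an
assumed default and is commonly tuned in practice.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.05, epochs=20, batch_size=32, seed=0):
    """Mini-batch SGD: average the gradient over a small random batch,
    trading gradient noise against per-update cost."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        order = rng.permutation(n_samples)        # reshuffle once per epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient averaged over just this batch of examples
            grad = (2.0 / len(idx)) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad
    return w
```

Setting batch_size to 1 recovers plain SGD, while setting it to the dataset size
recovers full-batch GD, which is why mini-batching is described as a compromise
between the two.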
In practice, the choice between GD and SGD (or its variants) depends on the specific
problem, the size of the dataset, computational resources available, and desired
convergence properties of the optimization process.
DOWNSIDES to SGD:
Due to its stochastic nature, SGD works with noisy gradient estimates, so the loss
fluctuates from update to update and the parameters tend to oscillate around a
minimum rather than settling on it exactly (a decaying learning rate is the usual
remedy). On non-convex problems it is also not guaranteed to reach the global
minimum of the loss function and often settles for a good local minimum, which may
or may not be optimal.