More On Gradient Descent
GRADIENT DESCENT
An optimization algorithm
Needs a differentiable loss function
Finds the parameter values that minimize the loss
Start with random values for the weights and bias, then apply the update rule:
w_new = w_old - lr * dL/dw
b_new = b_old - lr * dL/db
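A minimal sketch of one update step (assuming, for illustration, a 1-D linear model with a mean-squared-error loss; the data, model, and learning rate here are assumptions, not from the slides):

import numpy as np

# Hypothetical 1-D linear model y = w*x + b with mean-squared-error loss (assumed example)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

rng = np.random.default_rng(0)
w, b = rng.standard_normal(), rng.standard_normal()   # first take random values of weight and bias
lr = 0.01                                             # learning rate

y_pred = w * x + b
dL_dw = np.mean(2 * (y_pred - y) * x)                 # dL/dw for the MSE loss
dL_db = np.mean(2 * (y_pred - y))                     # dL/db for the MSE loss

w = w - lr * dL_dw                                    # w_new = w_old - lr * dL/dw
b = b - lr * dL_db                                    # b_new = b_old - lr * dL/db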
ACTIVATION FUNCTIONS
We are interested in activation functions because they are used to compute the output of each neuron, and those outputs are also used when we calculate the derivatives to update the weights and biases.
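For example, with the sigmoid as an illustrative activation (an assumption; the slides do not name a specific function here), the neuron output and the derivative reused in the weight updates could be computed as:

import numpy as np

def sigmoid(z):
    # Neuron output for pre-activation z
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # Derivative reused in the chain rule when updating weights and biases
    s = sigmoid(z)
    return s * (1.0 - s)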
VANISHING GRADIENT PROBLEM
In backpropagation, the gradient of the loss with respect to a weight in an early layer is a product of gradients coming from the later layers (chain rule).
The weight updates of nodes in the network therefore depend on the derivatives of the activation functions of each node.
For the sigmoid, the derivative reaches a maximum value of 0.25. With more layers in the network, the product of these derivatives keeps shrinking until the partial derivative of the loss function approaches zero and effectively vanishes. We call this the vanishing gradient problem.
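A rough numerical sketch of the effect (assuming each layer contributes at most the sigmoid's maximum derivative of 0.25):

# Upper bound on the product of sigmoid derivatives across layers
max_sigmoid_grad = 0.25
for n_layers in (2, 5, 10, 20):
    print(n_layers, max_sigmoid_grad ** n_layers)
# 2 -> 0.0625, 5 -> ~9.8e-4, 10 -> ~9.5e-7, 20 -> ~9.1e-13:
# the gradient reaching the early layers effectively vanishes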
VANISHING GRADIENT PROBLEM
When dL/dw and dL/db are close to zero, the updates
w_new = w_old - lr * dL/dw
b_new = b_old - lr * dL/db
barely change the weights and bias, so learning stalls.
SOLUTIONS FOR VANISHING GRADIENTS
In a network with vanishing gradients, the weights cannot be updated, so the network cannot learn. The performance of the network decreases as a result.
The simplest solution to the problem is to replace the activation function with ReLU.
The derivative of the ReLU function is defined as 1 for inputs greater than zero and 0 for negative inputs.
The problem with the use of ReLU is when the gradient has a value of 0 (dying ReLU).
Other techniques to avoid the vanishing gradient problem are proper weight initialization, reducing model complexity, Leaky ReLU, batch normalization, and residual networks (ResNet).
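A small sketch of the ReLU and Leaky ReLU derivatives described above (the 0.01 slope for Leaky ReLU is a common default, assumed here):

import numpy as np

def relu_derivative(z):
    # 1 for positive inputs, 0 otherwise; a neuron stuck at 0 stops learning (dying ReLU)
    return (z > 0).astype(float)

def leaky_relu_derivative(z, alpha=0.01):
    # Small slope alpha keeps a nonzero gradient for negative inputs
    return np.where(z > 0, 1.0, alpha)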
TYPES OF GRADIENT
DESCENT
In Batch Gradient Descent, all the training data is taken into consideration to take a single step.
In Stochastic Gradient Descent (SGD), we consider just one example at a time to take a single step.
Mini-Batch Gradient Descent is a mixture of Batch Gradient Descent and SGD: we use a batch of a fixed number of training examples, smaller than the full dataset, and call it a mini-batch.
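One way to see the three variants is that they differ only in how many examples are used per step: the full dataset (Batch GD), one example (SGD), or a fixed batch size in between (mini-batch). A minimal sketch (the helper name and shapes are assumptions):

import numpy as np

def iterate_minibatches(X, y, batch_size):
    # batch_size = len(X) gives Batch GD, batch_size = 1 gives SGD,
    # anything in between gives Mini-Batch Gradient Descent
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]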
BATCH GD
Take the whole dataset
Feed it to the neural network
Calculate the gradient of the loss
Use the gradient to update the weights
Repeat for a number of epochs
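A minimal sketch of these steps (assuming, as an illustration, a single-layer linear model with a mean-squared-error loss; the data, learning rate, and epoch count are assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                # the whole dataset
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(100)

w = np.zeros(3)
lr, epochs = 0.1, 50

for epoch in range(epochs):                      # repeat for a number of epochs
    y_pred = X @ w                               # feed the whole dataset to the model
    grad = (2 / len(X)) * X.T @ (y_pred - y)     # calculate the gradient over the full batch
    w = w - lr * grad                            # use the gradient to update the weights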