Vanishing Gradient and Exploding Gradient: Simple Notes

Vanishing Gradient and Exploding Gradient:

● Neural networks are trained using back propagation and gradient-based learning methods.
● During training, we want to reach the optimum values of the weights that result in minimum loss.
● Each weight gets updated repeatedly during training.
● The update is proportional to the partial derivative of the error function with respect to the current
weight in each training iteration.
● However, sometimes this update becomes too small, so the weight barely changes. This results in very little or practically no training of the network and is referred to as the vanishing gradient problem.

● As shown in the figure, the sigmoid function can lead to the vanishing gradient problem, while ReLU or Leaky ReLU largely avoids this issue (a small numeric sketch follows this bullet).
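The following is a minimal numeric sketch, not part of the original notes, of why deep sigmoid networks suffer from this: the chain rule multiplies one derivative per layer, and the sigmoid derivative is at most 0.25, so the product shrinks rapidly with depth. The depth and pre-activation value are assumed, and weights are ignored for simplicity.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The backpropagated gradient is (roughly) a product of one sigmoid derivative
# per layer; the contribution of the weights is ignored here for simplicity.
grad = 1.0
for layer in range(20):          # 20 hypothetical layers, pre-activation z = 0.5
    grad *= sigmoid_deriv(0.5)
print(grad)                      # ~2.6e-13: the gradient has effectively vanished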

Back Propagation
● Each neuron in the network has an activation function and a bias term.
● It accepts a finite number of input-weight products, adds a bias term, and then applies an activation function. The output is then passed to the next neuron (see the sketch after this list).
● The difference between the expected output and the predicted value is called the error term.
● The error is minimized when we have found the best combination of weights and biases across the layers and neurons.
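As a rough illustration of the neuron computation described above, here is a minimal Python sketch; the inputs, weights, bias, and target value are made-up numbers, and sigmoid is used as the assumed activation.

import numpy as np

def neuron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias    # sum of input-weight products plus bias
    return 1.0 / (1.0 + np.exp(-z))       # assumed sigmoid activation

inputs  = np.array([0.5, -1.2, 0.3])      # made-up inputs
weights = np.array([0.4,  0.7, -0.2])     # made-up weights
bias    = 0.1
output  = neuron(inputs, weights, bias)

expected = 0.6                   # hypothetical target value
error = expected - output        # the error term that training tries to minimize
print(output, error)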
Gradient Descent

● When the error is calculated, gradient descent is applied to the error function.
● The gradient is the partial derivative of the error function with respect to the weights and biases.
● The back propagation algorithm adjusts these weights and biases using a learning rate.
● This is done from the last layer to the first layer in the backward direction or from the
right to the left.
● In each iteration, gradient descent determines the direction of change, updating the weights and biases until the error is minimized or reaches a global minimum, as shown in the figure (a minimal update-rule sketch follows this list).
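A minimal sketch of the update rule described above, assuming a single weight, a squared-error loss, and an arbitrary learning rate of 0.1; the data point (x=2, y=4) is made up.

# Tiny one-parameter "network": prediction = w * x, loss = (prediction - y)^2.
def loss_grad(w, x, y):
    return 2.0 * (w * x - y) * x          # derivative of the loss with respect to w

w, learning_rate = 0.0, 0.1
for step in range(100):
    g = loss_grad(w, x=2.0, y=4.0)        # gradient: direction of steepest increase of the error
    w -= learning_rate * g                # move against the gradient to reduce the error
print(w)                                  # converges toward 2.0, the error-minimizing weight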

Detection:
1. The kernel weight distribution shows weights approaching zero.
2. Weights in the final layers change more than those in the initial layers.
3. Slow or no improvement in the model during training.
4. Training stalls early with no further improvement (a layer-wise gradient-monitoring sketch follows this list).
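One hedged way to check for these signs is to log the average gradient magnitude per layer in each training iteration; the layer names and values below are assumed purely for illustration.

import numpy as np

def gradient_report(grads_per_layer):
    # Print the mean absolute gradient for each layer in one training iteration.
    for name, grad in grads_per_layer.items():
        print(f"{name}: mean |grad| = {np.mean(np.abs(grad)):.2e}")

# Assumed example: early layers receive gradients orders of magnitude smaller
# than the final layers, which points to vanishing gradients.
grads_per_layer = {
    "layer_1": np.full(100, 1e-7),        # made-up values for illustration
    "layer_5": np.full(100, 1e-2),
}
gradient_report(grads_per_layer)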
Solutions:
1. Reduce Network Depth: Simplifies the network but might reduce performance.
2. Use ReLU Activation: ReLU (Rectified Linear Unit) helps maintain gradients better than
sigmoid or tanh functions.
3. Residual Networks (ResNets): Use skip connections to maintain gradient flow, making deep networks train more effectively (see the sketch after this list).
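A rough sketch of the skip-connection idea in point 3, with assumed layer sizes and randomly initialized weights: the block's output is its input plus a learned transformation, so during back propagation the gradient always has a direct identity path around the block.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    h = relu(x @ w1)                      # learned transformation of the input
    return x + h @ w2                     # skip connection: add the untouched input back

x  = np.random.randn(1, 8)                # assumed input and layer sizes
w1 = np.random.randn(8, 8) * 0.1
w2 = np.random.randn(8, 8) * 0.1
y  = residual_block(x, w1, w2)
print(y.shape)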

Exploding Gradient Problem:


In deep networks, error gradients can become very large as they accumulate. The resulting weight updates are then very large, which makes the network unstable. There are a few signs that can help us detect exploding gradients:
1. The model shows poor loss during the training phase (the loss does not decrease or behaves erratically).
2. During training, we might encounter NaN values for the loss or for the weights.
3. The model is generally unstable; in other words, the changes in loss between subsequent iterations are huge, indicating an unstable state.
4. The error gradients are constantly above 1 for each of the layers and neurons in the network (a numeric sketch follows this list).
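A minimal numeric sketch of sign 4, using an assumed per-layer factor of 1.5: when each layer multiplies the gradient by more than 1, the gradient grows exponentially with depth.

grad = 1.0
for layer in range(50):                   # 50 hypothetical layers
    grad *= 1.5                           # each layer multiplies the gradient by more than 1
print(grad)                               # ~6.4e8: huge updates that destabilize training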

Exploding gradients can be resolved using the following approaches:


1. We can reduce the number of layers in the network, or try reducing the batch size during training.
2. L1 and L2 weight regularization can be added, which acts as a penalty on the network's loss function.
3. Gradient clipping can be used: we limit the size of the gradients during training by setting a threshold for the error gradients, and any gradient that exceeds the threshold is clipped to that limit (see the sketch after this list).
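A minimal sketch of norm-based gradient clipping as described in point 3; the threshold value and the example gradient are assumed for illustration.

import numpy as np

def clip_gradient(grad, threshold):
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)  # rescale, keeping the direction
    return grad

g = np.array([30.0, -40.0])               # assumed exploding gradient with norm 50
g = clip_gradient(g, threshold=5.0)       # assumed threshold
print(np.linalg.norm(g))                  # 5.0: the update size is now bounded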
