Gradient Problems in Neural Networks
Introduction
Neural networks rely on the backpropagation algorithm to update the weights of the
model based on the gradient of the loss function. During training, issues such as
vanishing and exploding gradients can arise, particularly in deep neural networks.
These problems can severely impact the training process, leading to slow convergence
or unstable updates.
1. Vanishing gradient problem
The vanishing gradient problem is a challenge that occurs when training deep neural
networks, where the gradients used to update the network's weights become very
small. [1]
During backpropagation, the error gradient is propagated back through the network to
update the weights. However, in deep networks, the gradient can become very small as
it moves back to the initial layers, making it difficult to update the weights. This can lead
to the network learning very slowly or not at all. [3, 4]
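This effect can be observed directly by inspecting per-layer gradient norms. The following is a minimal sketch (the depth, layer width, and random data are illustrative choices, not taken from the sources below): it stacks sigmoid layers and prints how the gradient norm shrinks toward the first layers.

```python
# Minimal sketch: observing vanishing gradients in a deep sigmoid network.
# The depth, layer width, and dummy data are illustrative choices, not from the text.
import torch
import torch.nn as nn

torch.manual_seed(0)

depth, width = 20, 64
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Sigmoid()]
model = nn.Sequential(*layers, nn.Linear(width, 1))

x = torch.randn(32, width)          # dummy batch
y = torch.randn(32, 1)              # dummy targets
loss = nn.MSELoss()(model(x), y)
loss.backward()

# Gradient norms typically decay by orders of magnitude toward the first layers,
# which is exactly the "very small gradients" described above.
for i, layer in enumerate(model):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d} grad norm: {layer.weight.grad.norm():.2e}")
```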
The consequences of the vanishing gradient problem include slow convergence, the
network getting stuck in poor local minima, and impaired learning of deep
representations.
References:
[1] https://www.engati.com/glossary/vanishing-gradient-problem
[2] https://en.wikipedia.org/wiki/Vanishing_gradient_problem
[3] https://www.linkedin.com/pulse/vanishing-gradient-problem-iain-brown-ph-d--5qyle
[4] https://mrinalwalia.medium.com/understanding-the-vanishing-gradient-problem-in-deep-learning-c648a4f16b05
[5] https://medium.com/@amanatulla1606/vanishing-gradient-problem-in-deep-learning-understanding-intuition-and-solutions-da90ef4ecb54
2. Exploding gradient problem
When It Occurs:
The exploding gradient problem is the counterpart to the vanishing gradient problem:
it occurs when the gradients in a neural network grow exponentially during training.
In other words, gradients during backpropagation become excessively large, causing
the model to update its weights drastically. This leads to unstable training dynamics
and makes it difficult for the network to converge to an optimal solution. [1]
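The opposite behaviour is just as easy to provoke. The sketch below is a hypothetical PyTorch example; the oversized initialization (3x the He scale), depth, and dummy data are illustrative assumptions. After a single backward pass, the gradient norms are many orders of magnitude larger than in a well-initialized network.

```python
# Minimal sketch: provoking exploding gradients with oversized initial weights.
# The 3x He-scale initialization, depth, and dummy data are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)

width, depth = 64, 15
layers = []
for _ in range(depth):
    linear = nn.Linear(width, width)
    nn.init.kaiming_normal_(linear.weight, nonlinearity='relu')
    with torch.no_grad():
        linear.weight.mul_(3.0)   # deliberately too large
    layers += [linear, nn.ReLU()]
model = nn.Sequential(*layers, nn.Linear(width, 1))

x = torch.randn(32, width)
y = torch.randn(32, 1)
loss = nn.MSELoss()(model(x), y)
loss.backward()

# Every layer now sees gradient norms orders of magnitude larger than in a
# well-initialized network; in a real training loop such gradients produce
# huge, destabilizing weight updates.
for i, layer in enumerate(model):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d} grad norm: {layer.weight.grad.norm():.2e}")
```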
Consequences:
Exploding gradients produce very large weight updates, causing the loss to oscillate or
diverge and often resulting in numerical overflow (NaN values) during training.
Remedies:
● Gradient Clipping: caps the gradient norm at a fixed threshold to maintain
stability in training (see the sketch after this list).
● Weight Regularization: adds L1/L2 penalties on the weights, discouraging the large
values that contribute to exploding gradients.
● Residual Connections: provide shortcut paths that let gradients flow directly
through very deep networks.
● Truncated Backpropagation: limits how many time steps gradients are propagated
back through in recurrent networks.
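Of these remedies, gradient clipping is the simplest to apply. Below is a minimal sketch of what it might look like in a PyTorch training step; the model, dummy data, and the max_norm threshold of 1.0 are illustrative assumptions rather than values prescribed by the sources.

```python
# Minimal sketch: clipping the global gradient norm before each optimizer step.
# The model, dummy data, and max_norm threshold of 1.0 are illustrative choices.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale all gradients so their combined L2 norm is at most 1.0,
    # preventing any single step from making a destabilizing update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```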
When These Problems Commonly Occur:
Both problems arise most often in very deep feedforward networks and in recurrent
networks trained on long sequences, where gradients are multiplied through many
layers or time steps. Vanishing gradients are especially common with saturating
activations such as sigmoid and tanh, while exploding gradients are common with
poor weight initialization and long recurrent dependencies.
References:
[1] https://deepai.org/machine-learning-glossary-and-terms/exploding-gradient-problem
[2] https://spotintelligence.com/2023/12/06/exploding-gradient-problem/
[3] https://medium.com/@fraidoonomarzai99/vanishing-and-exploding-gradient-problems-in-deep-learning-057a275fde6f
[4] https://www.linkedin.com/advice/0/what-best-way-handle-exploding-gradients-artificial-tt1rc
[5] https://www.analyticsvidhya.com/blog/2024/04/exploring-vanishing-and-exploding-gradients-in-neural-networks/
[6] https://www.kdnuggets.com/2022/02/vanishing-gradient-problem.html
[7] https://medium.com/@piyushkashyap045/understanding-batch-normalization-in-deep-learning-a-beginners-guide-40917c5bebc8
Real-world Problems:
Vanishing Gradients:
● Deep Networks with Saturating Activations:
○ Example: Early deep networks built on sigmoid or tanh activations, and RNNs
processing long sequences, often learned very slowly because the gradients
reaching the first layers were close to zero.
Exploding Gradients:
● Sequence-to-Sequence Models:
○ Example: Training RNNs for language translation often results in exploding
gradients, requiring gradient clipping to stabilize training.
Table for Remedies
Remedy | Benefit | Limitation | Best Suited For
Leaky ReLU | Mitigates vanishing by allowing small gradients for negative values | |
He Initialization | Optimized for ReLU activations | May not work well with all activations | Deep networks with ReLU variants
Residual Connections | Direct gradient flow for deep layers | Requires architectural design changes | Very deep networks (e.g., ResNet)
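To illustrate the last two rows of the table, the sketch below combines He (Kaiming) initialization with a simple residual block. The block structure, widths, and depth are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch: He (Kaiming) initialization plus a residual (skip) connection.
# The block structure and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.act = nn.ReLU()
        # He initialization keeps activation variance stable under ReLU.
        for fc in (self.fc1, self.fc2):
            nn.init.kaiming_normal_(fc.weight, nonlinearity='relu')
            nn.init.zeros_(fc.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity shortcut gives gradients a direct path back to x,
        # which is what lets very deep stacks of these blocks train.
        return self.act(x + self.fc2(self.act(self.fc1(x))))

# A deep stack of residual blocks remains trainable thanks to the skip paths.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(30)], nn.Linear(64, 1))
out = model(torch.randn(8, 64))
print(out.shape)   # torch.Size([8, 1])
```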
Conclusion
The vanishing and exploding gradient problems highlight the challenges of training deep
neural networks. These issues can render a model ineffective if not addressed properly.
Modern deep learning practices—including optimized activation functions, weight
initialization techniques, and advanced architectures like residual networks—have made
it possible to mitigate these problems and train deeper networks effectively. By
understanding and addressing these challenges, we can ensure stable and efficient
neural network training.