2.6 Regularization
Overfitting in Deep Learning
• Overfitting occurs when a model learns the training data too well, capturing even the noise
and random fluctuations instead of general patterns.
Training Phase:
– Suppose we have a given training dataset.
– As the model trains, it gradually adjusts to fit the data points.
– If training continues for too long, the model may fit the data too perfectly, creating a highly
complex decision boundary that captures almost all training points.
Testing Phase:
– When we evaluate the model on a separate test dataset, we observe that it fails to generalize well.
– The decision boundary that worked perfectly for training data does not correctly classify test points.
– This results in high accuracy on training data but low accuracy on test data.
Conclusion:
– Overfitting leads to poor model performance on unseen data.
– The model memorizes specific examples instead of learning generalizable features.
– To prevent overfitting, techniques such as regularization, dropout, early stopping, and data
augmentation can be applied (a minimal code sketch of the resulting train/test gap follows below).
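The following Python sketch illustrates the symptom described above: near-perfect accuracy on the training data but noticeably lower accuracy on held-out test data. The dataset, the choice of an unrestricted decision tree, and all parameter values are illustrative assumptions, not part of the original example.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise, so a perfect training fit cannot generalize.
X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# An unrestricted tree keeps splitting until it captures almost every training point,
# i.e. it builds the overly complex decision boundary described above.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # close to 1.0 (memorization)
print("test accuracy: ", model.score(X_test, y_test))    # clearly lower (poor generalization)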
Regularization
Overfitting occurs when a model learns the training data too well, capturing noise and details that do
not generalize to new data. This results in poor performance on real-world data. Regularization
techniques help prevent overfitting in neural networks and other machine learning models.
For example, in a simple scenario with one input and one output, an overfitted model would follow the
training data points too precisely, forming a complex curve that fits every detail. While this might
seem ideal for the training set, it fails to generalize to unseen data, making predictions unreliable.
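As a concrete sketch of this one-input, one-output scenario, the Python example below fits the same high-degree polynomial twice, once without and once with an L2 (ridge) penalty. The sine-plus-noise data, the polynomial degree, and the penalty strength alpha are assumptions chosen only for illustration: the unregularized fit chases every noisy training point, while the regularized fit stays close to the underlying signal and generalizes better.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 15)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * x_train).ravel() + rng.normal(0, 0.2, 15)  # true signal + noise
x_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * x_test).ravel()

degree = 12  # deliberately too flexible for 15 training points

# Same model class, with and without an L2 penalty on the weights.
overfit = make_pipeline(PolynomialFeatures(degree), LinearRegression())
regular = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=1e-3))

overfit.fit(x_train, y_train)
regular.fit(x_train, y_train)

# The unregularized curve hugs the noisy points; the ridge curve tracks the signal.
print("unregularized test MSE:", np.mean((overfit.predict(x_test) - y_test) ** 2))
print("ridge test MSE:        ", np.mean((regular.predict(x_test) - y_test) ** 2))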
Dropout Regularization
• During each training step, every neuron has a probability p of being inactive, which is referred
to as the dropout rate. This is a hyperparameter that must be set before training begins.
• For example, consider a neural network with two hidden layers, each containing four neurons.
If we apply a dropout rate of 0.25 (i.e., each neuron has a 25% chance of being inactive), then
in each training step, some neurons will randomly be disabled.
• In one step, the first neuron from the first layer and the second neuron from the second layer
might be dropped.
• A key aspect of dropout regularization is that it only applies during training: neurons are
randomly dropped to prevent over-reliance on specific ones and to improve generalization.
However, during testing (or inference), all neurons remain active, ensuring the full capacity
of the trained network is utilized for predictions (see the sketch after this list).
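A minimal NumPy sketch of this behaviour is given below, using the same assumed setup as the example above (a 4-neuron hidden layer and a dropout rate of 0.25); the activation values are made up for illustration. It shows a single training-time mask, and the test-time pass in which all neurons stay active and the outputs are scaled by the keep probability 0.75, which the next heading discusses.

import numpy as np

rng = np.random.default_rng(0)
p = 0.25                                         # dropout rate (hyperparameter set before training)
activations = np.array([0.8, 0.3, 0.5, 0.9])     # outputs of a 4-neuron hidden layer (illustrative)

# Training step: sample a binary mask; each neuron is disabled with probability p = 0.25.
mask = rng.random(activations.shape) >= p        # True = neuron stays active
train_output = activations * mask
print("training-mode output:", train_output)     # some entries are zeroed out

# Test / inference: every neuron is active; activations are scaled by the keep
# probability (1 - p) = 0.75 so their expected value matches what the next
# layer saw during training.
test_output = activations * (1 - p)
print("test-mode output:    ", test_output)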
Why Scale with 0.75?
In Detail