DL Class3
Bias:
This part is due to wrong assumptions in the model, such as assuming that the data
is linear when it is actually quadratic. A high-bias model is most likely to underfit
the training data.
Variance:
This part is due to the model’s excessive sensitivity to small variations in the training
data. A model with many degrees of freedom (such as a high-degree polynomial model)
is likely to have high variance, and thus to overfit the training data.
Bias-Variance tradeoff
Irreducible error
This part is due to the noisiness of the data itself. The only way to reduce this part
of the error is to clean up the data (e.g., fix the data sources, such as broken
sensors, or detect and remove outliers).
The formula that connects the expected test MSE to bias, variance, and irreducible error (written out below):
https://en.wikipedia.org/wiki/Bias–variance_tradeoff
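The decomposition referenced above can be written out as follows (with y = f(x) + ε, noise variance σ², and \hat{f} the fitted model; notation follows the linked article):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```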
Hyperparameter tuning
LR range test: Run your model for several epochs while letting the
learning rate increase linearly between low and high LR values. This
test is enormously valuable whenever you are facing a new
architecture or dataset.
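A minimal sketch of the LR range test in PyTorch; the placeholder model, dummy batches, LR bounds, and step count are assumptions, so substitute your own architecture and data loader:

```python
import torch
import torch.nn as nn

# Placeholder model and loss; replace with your own architecture.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

low_lr, high_lr, num_steps = 1e-6, 1.0, 200   # assumed sweep range
optimizer = torch.optim.SGD(model.parameters(), lr=low_lr)

lrs, losses = [], []
for step in range(num_steps):
    # Increase the learning rate linearly from low_lr to high_lr.
    lr = low_lr + (high_lr - low_lr) * step / (num_steps - 1)
    for group in optimizer.param_groups:
        group["lr"] = lr

    # Dummy batch; in practice draw the next batch from your training loader.
    x, y = torch.randn(32, 20), torch.randn(32, 1)
    loss = loss_fn(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    lrs.append(lr)
    losses.append(loss.item())

# Plot losses vs. lrs (e.g. with matplotlib) and pick the LR range where the
# loss is still falling steeply, before it diverges.
```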
Hyperparameter tuning
(2) Batch Size
• Use as large a batch size as fits in your memory, then compare the
performance of different batch sizes.
• Small batch sizes add regularization while large batch sizes add less, so
use this while balancing the proper amount of regularization.
• It is often better to use a larger batch size so that a larger learning rate
can be used (see the sketch below).
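A minimal sketch comparing a few batch sizes on dummy data, assuming PyTorch; the linear learning-rate scaling heuristic used here is one common choice for pairing larger batches with larger learning rates, not something the slide prescribes:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset; substitute your own training data.
X, y = torch.randn(1024, 20), torch.randn(1024, 1)
train_ds = TensorDataset(X, y)

base_lr, base_batch = 0.01, 32   # assumed reference point

for batch_size in (32, 128, 512):
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    # Heuristic: scale the learning rate with the batch size so that larger
    # batches take larger steps (linear scaling rule, an assumption here).
    lr = base_lr * batch_size / base_batch
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

    for epoch in range(3):   # a few epochs, just enough to compare trends
        for xb, yb in loader:
            loss = nn.functional.mse_loss(model(xb), yb)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    print(f"batch_size={batch_size:4d}  lr={lr:.3f}  final_loss={loss.item():.4f}")
```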
Hyperparameter tuning
(3) Momentum
https://nanonets.com/blog/hyperparameter-optimization/
Regularization: L2 regularization
L2 & L1 regularization:
• Due to the addition of this regularization term, the values of the weight matrices
decrease, because the penalty favors smaller weights, and a neural network with
smaller weight matrices leads to simpler models. Therefore, it also reduces
overfitting to quite an extent.
• In L2, we add a squared-weight penalty to the loss, as written out below.
• It also produces very good results and is consequently the most frequently
used regularization technique in the field of deep learning.
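The penalized cost functions, written in the usual form (λ is the regularization strength and m the number of training examples; the exact notation is an assumption, since the slide gives only the prose):

```latex
\text{Cost}_{L2} = \text{Loss} + \frac{\lambda}{2m}\sum_{w} \lVert w \rVert_2^2,
\qquad
\text{Cost}_{L1} = \text{Loss} + \frac{\lambda}{2m}\sum_{w} \lVert w \rVert_1
```

In practice, frameworks often expose the L2 penalty through the optimizer's weight_decay argument (e.g. torch.optim.SGD(params, lr=0.01, weight_decay=1e-4)).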
Dropout
At every iteration, dropout randomly selects some nodes and removes them, along
with all of their incoming and outgoing connections.
So each iteration trains a different set of nodes, and this results in a different
set of outputs. It can also be thought of as an ensemble technique in machine
learning (see the sketch below).
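A minimal PyTorch sketch, assuming a small placeholder network and a drop probability of 0.5:

```python
import torch
import torch.nn as nn

# Network with dropout between the fully connected layers.
# The architecture and p=0.5 are placeholder assumptions.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes 50% of activations each forward pass
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 1),
)

x = torch.randn(8, 20)

model.train()   # dropout is active: a different set of nodes is dropped each call
out1, out2 = model(x), model(x)
print(torch.allclose(out1, out2))          # usually False: different nodes dropped

model.eval()    # dropout is disabled at evaluation time
print(torch.allclose(model(x), model(x)))  # True: output is deterministic
```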
Data Augmentation
• There are a few ways of increasing the size of the training data –
rotating the image, flipping, scaling, shifting, etc. (see the sketch below).
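A minimal sketch using torchvision transforms; the specific transforms, sizes, and probabilities are placeholder assumptions:

```python
from PIL import Image
from torchvision import transforms

# Typical augmentations that enlarge the effective training set:
# rotation, flipping, scaling (random resized crop), and shifting (translation).
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                     # rotating
    transforms.RandomHorizontalFlip(p=0.5),                    # flipping
    transforms.RandomResizedCrop(size=32, scale=(0.8, 1.0)),   # scaling
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # shifting
    transforms.ToTensor(),
])

# Dummy image; in practice this pipeline is passed to a Dataset,
# e.g. torchvision.datasets.CIFAR10(..., transform=augment).
img = Image.new("RGB", (32, 32))
augmented = augment(img)   # a new random variant is produced on each call
print(augmented.shape)     # torch.Size([3, 32, 32])
```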
Early stopping
• When we see that the performance on the validation set is getting worse,
we immediately stop training the model. This is known as early stopping
(see the sketch below).
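A minimal sketch of patience-based early stopping in PyTorch; the model, dummy data, and patience value are placeholder assumptions:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; substitute your own.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy train/validation splits; substitute your own loaders.
x_train, y_train = torch.randn(256, 20), torch.randn(256, 1)
x_val, y_val = torch.randn(64, 20), torch.randn(64, 1)

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(1000):
    model.train()
    loss = nn.functional.mse_loss(model(x_train), y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(x_val), y_val).item()

    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")  # keep the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:   # validation has stopped improving
            print(f"Early stopping at epoch {epoch}")
            break
```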