DL Regularization
BITS Pilani
Pilani Campus
• The content for these slides has been obtained from books and various other sources on the Internet
• I hereby acknowledge all the contributors for their material and inputs.
• I have provided source information wherever necessary.
• I have added and modified the content to suit the requirements of the course.
Model selection
• L2 regularization
• Dataset augmentation
• Early stopping
• Ensemble methods
• Dropout
L2 regularization (weight decay)
Regularized cost function: add the squared L2 norm of the weights as a penalty term to the loss being minimized. This ensures that the weight vector stays small; a sketch of the resulting objective is given below.
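As a minimal sketch of the objective described above, assuming L(w) is the unregularized loss, w the weight vector, and α the regularization strength (the symbols are illustrative, not taken from the slide):

\tilde{L}(w) = L(w) + \frac{\alpha}{2}\,\lVert w \rVert_2^2,
\qquad
\nabla_w \tilde{L}(w) = \nabla_w L(w) + \alpha w

With gradient descent and learning rate η the update becomes w ← (1 − ηα) w − η ∇_w L(w), i.e. the weights are shrunk ("decayed") by a factor (1 − ηα) at every step, which is why this penalty is also called weight decay.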
Early stopping

[Figure: training error and validation error vs. training steps k; the validation error stops improving around step k − p while the training error keeps decreasing]

• Track the validation error
• Have a patience parameter p
• If you are at step k and there was no improvement in validation error in the previous p steps, then stop training and return the model stored at step k − p (see the sketch after this list)
• Basically, stop the training early, before it drives the training error to 0 and blows up the validation error
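A minimal sketch of this patience rule in Python; the helpers train_one_step, evaluate_validation_error, save_checkpoint and load_checkpoint are hypothetical placeholders, not part of the slides:

def train_with_early_stopping(max_steps, patience):
    # Hypothetical training loop illustrating patience-based early stopping.
    best_error = float("inf")
    best_step = 0
    for step in range(1, max_steps + 1):
        train_one_step()                     # one training step (or epoch)
        error = evaluate_validation_error()  # track the validation error
        if error < best_error:
            best_error = error
            best_step = step
            save_checkpoint(step)            # remember the best model so far
        elif step - best_step >= patience:   # no improvement in the last p steps
            break                            # stop training early
    return load_checkpoint(best_step)        # return the model stored at step k - p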
Ensemble - Bagging
Dropout
• We initialize all the parameters (weights) of the network and start training
• For the first training instance (or mini-batch), we apply dropout, resulting in a thinned network
• We compute the loss and backpropagate
• Which parameters will we update? Only those which are active in the thinned network (see the sketch after this list)
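A minimal sketch of how dropout can be applied to one mini-batch, using NumPy and inverted dropout; the layer sizes, keep probability and variable names are illustrative assumptions, not taken from the slides:

import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network (illustrative shapes): 8 inputs, 16 hidden units, 4 outputs.
x  = rng.standard_normal((32, 8))          # one mini-batch of 32 examples
W1 = 0.1 * rng.standard_normal((8, 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal((16, 4));  b2 = np.zeros(4)

def dropout(a, keep_prob=0.5):
    # Keep each unit with probability keep_prob and rescale so the
    # expected activation is unchanged (inverted dropout).
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob, mask

# Forward pass for this mini-batch: the sampled mask defines the thinned network.
h = np.maximum(0.0, x @ W1 + b1)   # hidden layer with ReLU
h_drop, mask = dropout(h)          # dropout on the hidden activations
scores = h_drop @ W2 + b2          # output layer of the thinned network

# During backpropagation the same mask multiplies the upstream gradient, so
# dropped units contribute zero gradient: only the parameters on paths through
# the active units are updated for this mini-batch.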
References
• Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 2014: https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
• Textbook: Dive into Deep Learning, Sections 5.4, 5.5, 5.6 (online version)
• IIT Madras CS7015 (Deep Learning): Lecture 8
Thank You All!