DL Regularization

The document provides an overview of deep learning concepts, focusing on training and generalization errors, model complexity, and various regularization techniques. It discusses the importance of model selection and the bias-variance tradeoff, as well as methods like l2 regularization, dataset augmentation, early stopping, ensemble methods, and dropout. The content is derived from multiple sources and is tailored for a deep learning course at BITS Pilani.

Deep Learning

BITS Pilani
Pilani Campus

Acknowledgement: IIT M CS7015 (Deep Learning)


Deep Neural Network

Disclaimer and Acknowledgement

• The content for these slides has been obtained from books and various other sources on the Internet
• I hereby acknowledge all the contributors for their material and inputs.
• I have provided source information wherever necessary
• I have added and modified the content to suit the requirements of the course



Session Agenda

• Training Error and Generalization Error


• Fit of the model
• Model complexity
• Regularization
• l2 regularization
• Dataset augmentation
• Early stopping
• Ensemble methods
• Dropout



Training Error and Generalization Error

• Training error is the error of our model as calculated on the training dataset.
  • Obtained while training the model.
• Generalization error is the expectation of our model's error if an infinite stream of additional data examples, drawn from the same underlying data distribution as the original sample, were applied to the model.
  • It cannot be computed exactly, only estimated.
• Estimate the generalization error by applying the model to an independent test set,
  • constituted of a random selection of data examples that were withheld from the training set.

Factors that influence the generalizability of a model

1. The number of tunable parameters.
   • When the number of tunable parameters, called the degrees of freedom, is large, models tend to be more susceptible to overfitting.
2. The values taken by the parameters.
   • When weights can take a wider range of values, models can be more susceptible to overfitting.
3. The number of training examples.
   • It is trivially easy to overfit a dataset containing only one or two examples, even if your model is simple.
   • But overfitting a dataset with millions of examples requires an extremely flexible model.
Fit of the model

• Underfitting: high training loss, high validation loss.
• Good fit: low training loss, low validation loss, little gap between the two.
• Overfitting: low training loss, high validation loss.

Slide credit: Andrew Ng


Bias-variance

• Simple models trained on different samples of the data do not differ much from each other.
• However, they are very far from the true sinusoidal curve (underfitting).
• On the other hand, complex models trained on different samples of the data are very different from each other (high variance).

• Simple model: high bias, low variance
• Complex model: low bias, high variance

Slide credit: IITM CS7015


Model complexity

• Simple models and abundant data:
  • Expect the generalization error to resemble the training error.
• More complex models and fewer examples:
  • Expect the training error to go down but the generalization gap to grow.
• Model complexity:
  • A model with more parameters might be considered more complex.
  • A model whose parameters can take a wider range of values might be more complex.
  • A neural network model that takes more training iterations is more complex, and
  • one subject to early stopping (fewer training iterations) is less complex.


Model complexity

• Let there be n training points and m test (validation) points.
• As the model complexity increases, the training error becomes overly optimistic and gives us a wrong picture of how close f̂ is to f.
• The validation error gives the real picture of how close f̂ is to f.

Slide credit: Mitesh M. Khapra (IITM CS7015)
Model selection

• Model selection is the process of selecting the final model after evaluating several candidate models.
• With MLPs, compare models with
  • different numbers of hidden layers,
  • different numbers of hidden units,
  • different activation functions applied to each hidden layer.
• We should touch the test data only once: to assess the very best model or to compare a small number of models to each other.
• Use the validation dataset to determine the best among our candidate models.
• In deep learning, with millions of examples available, the split is generally (see the sketch below):
  • Training = 98-99% of the original dataset
  • Validation = 1-2% of the training dataset
  • Testing = 1-2% of the original dataset
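A minimal sketch of such a split, assuming the data is simply shuffled and sliced by index (the function name and exact fractions are illustrative, not from the slides):

```python
import numpy as np

def split_indices(n_examples, val_frac=0.02, test_frac=0.02, seed=0):
    """Shuffle example indices and split them into train / validation / test sets.
    With millions of examples, 1-2% is enough for validation and for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_examples)
    n_test = int(n_examples * test_frac)
    n_val = int(n_examples * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx
```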
Model complexity

• Why do we care about this bias-variance tradeoff and model complexity?
• Deep neural networks are highly complex models: many parameters, many nonlinearities.
• It is easy for them to overfit and drive the training error to 0.
• Hence we need some form of regularization.
Different forms of regularization

• l2 regularization
• Dataset augmentation
• Early stopping
• Ensemble methods
• Dropout
l2 regularization - weight decay

• Add the squared l2 norm of the weights as a penalty term to the loss being minimized. This ensures that the weight vector stays small.
• Regularized cost function: $\tilde{\mathcal{L}}(w) = \mathcal{L}(w) + \frac{\lambda}{2}\lVert w \rVert^2$ (the same penalty is added whether the underlying loss is that of logistic regression or of a neural network).
• The gradient-descent update then becomes $w_{t+1} = w_t - \eta \nabla \mathcal{L}(w_t) - \eta \lambda w_t$ (see the sketch below).
• The bias w0 is not regularized.
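A minimal NumPy sketch of this update, assuming a linear model with a squared-error loss (the function name and the choice of loss are illustrative, not from the slides); the key point is the extra −ηλw term, applied to every weight except the bias w0:

```python
import numpy as np

def sgd_step_weight_decay(w, w0, X, y, eta=0.1, lam=0.01):
    """One gradient step with an l2 (weight decay) penalty on a linear model.
    Squared-error loss is assumed for illustration; the bias w0 is not regularized."""
    y_hat = X @ w + w0
    grad_w = X.T @ (y_hat - y) / len(y)     # gradient of the unregularized loss w.r.t. w
    grad_w0 = np.mean(y_hat - y)            # gradient w.r.t. the bias
    w = w - eta * grad_w - eta * lam * w    # w_{t+1} = w_t - eta*grad - eta*lambda*w_t
    w0 = w0 - eta * grad_w0                 # no weight decay on the bias
    return w, w0
```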
Dataset augmentation

• We exploit the fact that certain transformations of an image do not change its label: a shifted or slightly distorted image of the digit 2 still has label = 2 (see the sketch after this slide).
• Augmented data is created from the given training data using some knowledge of the task.
• Typically, more data = better learning.
• Works well for image classification / object recognition tasks; also shown to work well for speech.
• For some tasks it may not be clear how to generate such data.
Slide credit: IITM CS7015
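A toy NumPy sketch of the idea (the particular transform is an illustrative choice, not from the slides): new training examples are generated by applying label-preserving transformations, here small wrap-around shifts of a digit image.

```python
import numpy as np

def augment(image, label, rng):
    """Label-preserving augmentation of a digit image (an H x W array):
    small translations do not change the class label (a shifted 2 is still a 2)."""
    dy = rng.integers(-2, 3)                         # shift up/down by at most 2 pixels
    dx = rng.integers(-2, 3)                         # shift left/right by at most 2 pixels
    shifted = np.roll(image, (dy, dx), axis=(0, 1))  # toy shift (wraps around the border)
    return shifted, label                            # the label is unchanged

# Usage: grow the training set with transformed copies of existing examples.
rng = np.random.default_rng(0)
image = np.zeros((28, 28))                           # placeholder image
aug_image, aug_label = augment(image, 2, rng)
```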
Early stopping

• Track the validation error.
• Have a patience parameter p.
• If you are at step k and there was no improvement in the validation error in the previous p steps, then stop training and return the model stored at step k − p (see the sketch after this slide).
• Basically, stop the training early, before it drives the training error to 0 and blows up the validation error.

[Figure: training and validation error vs. training steps; training stops at step k and the model stored at step k − p is returned.]

Slide credit: Mitesh M. Khapra (IITM CS7015)
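A minimal sketch of this rule; train_step and validation_error are hypothetical callables supplied by the user, and keeping a snapshot of the best model seen so far is equivalent to returning the model stored at step k − p:

```python
import copy

def train_with_early_stopping(model, train_step, validation_error, max_steps, patience):
    """Stop once the validation error has not improved for `patience` steps and
    return the snapshot of the model taken when the validation error was lowest."""
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    steps_since_improvement = 0
    for step in range(max_steps):
        train_step(model)                         # one optimization step (or epoch)
        error = validation_error(model)           # error on the held-out validation set
        if error < best_error:
            best_error = error
            best_model = copy.deepcopy(model)     # remember the model at this step
            steps_since_improvement = 0
        else:
            steps_since_improvement += 1
            if steps_since_improvement >= patience:   # no improvement in p steps
                break                                 # stop training early
    return best_model                             # the model stored p steps before stopping
```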
Ensemble - Bagging

• Each model is trained on a different sample of the data (sampling with replacement).

Ensemble - Bagging

• Typically, model averaging (a bagging ensemble) always helps.
• Training several large neural networks for making an ensemble is prohibitively expensive.
• Option 1: Train several neural networks having different architectures (obviously expensive).
• Option 2: Train multiple instances of the same network using different training samples (again expensive).
• Even if we manage to train with option 1 or option 2, combining several models at test time is infeasible in real-time applications.

Slide credit: Mitesh M. Khapra (IITM CS7015)
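A minimal sketch of bagging with model averaging; make_model, fit, and predict are placeholder callables standing in for whatever model family is used:

```python
import numpy as np

def train_bagged_ensemble(make_model, fit, X, y, n_models, rng):
    """Train n_models copies of the same architecture, each on a bootstrap
    sample (sampling with replacement) of the training data."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap: sample with replacement
        model = make_model()
        fit(model, X[idx], y[idx])
        models.append(model)
    return models

def ensemble_predict(models, predict, X):
    """Model averaging: average the predictions of all ensemble members."""
    return np.mean([predict(model, X) for model in models], axis=0)
```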
Dropout

• Dropout is a technique which addresses both these issues.
• Effectively, it allows training several neural networks without any significant computational overhead.
• It also gives an efficient approximate way of combining exponentially many different neural networks.
Dropout

• Dropout refers to dropping out units.
• Temporarily remove a node and all its incoming/outgoing connections, resulting in a thinned network.
• Each node is retained with a fixed probability: typically p = 0.5 for hidden nodes and p = 0.8 for visible (input) nodes.

Slide credit: Mitesh M. Khapra (IITM CS7015)
Dropout

• Suppose a neural network has n nodes.
• Using the dropout idea, each node can be retained or dropped.
• For example, in the case illustrated on the slide we drop 5 nodes to get a thinned network.
• Given a total of n nodes, the total number of thinned networks that can be formed is 2^n.
• We cannot possibly train so many networks.
• Trick: (1) Share the weights across all the networks. (2) Sample a different network for each training instance.
Dropout

• We initialize all the parameters (weights) of the network and start training.
• For the first training instance (or mini-batch), we apply dropout, resulting in a thinned network.
• We compute the loss and backpropagate.
• Which parameters will we update? Only those which are active.
Dropout

• For the second training instance (or mini-batch), we again apply dropout, resulting in a different thinned network.
• We again compute the loss and backpropagate to the active weights.
• If a weight was active for both training instances, then it has received two updates by now.
• If a weight was active for only one of the training instances, then it has received only one update by now.
• Parameter sharing ensures that no model has untrained or poorly trained parameters.
Dropout

• Prevents hidden units from co-adapting.
• Dropout yields a smaller (thinned) neural network, giving the effect of regularization.
• In general:
  • Vary the keep probability (0.5 to 0.8) for each hidden layer.
  • The input layer has a keep probability of 1.0 or 0.9.
  • The output layer has a keep probability of 1.0.
• A minimal implementation sketch follows below.
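A minimal sketch of one common implementation, "inverted" dropout (the 1/keep_prob scaling is an implementation choice not spelled out in the slides): a fresh random mask is sampled for every training instance or mini-batch, and only the surviving (active) units carry gradients.

```python
import numpy as np

def dropout_layer(h, keep_prob, rng, training=True):
    """Inverted dropout applied to a layer's activations h.
    During training each unit is retained with probability keep_prob and the
    surviving activations are scaled by 1/keep_prob, so the full (un-thinned)
    network can be used at test time without any extra rescaling."""
    if not training or keep_prob >= 1.0:
        return h, None                                       # test time: keep all units
    mask = (rng.random(h.shape) < keep_prob) / keep_prob     # samples one thinned network
    return h * mask, mask                                    # mask also gates the backward pass

# Usage: sample a fresh mask per mini-batch, e.g. keep_prob = 0.5 for a hidden layer.
rng = np.random.default_rng(0)
h = np.ones((4, 8))                                          # activations of a hidden layer
h_dropped, mask = dropout_layer(h, keep_prob=0.5, rng=rng)
```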
References

• https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
• Textbook: Dive into Deep Learning, Sections 5.4, 5.5, 5.6 (online version)
• IIT M CS7015 (Deep Learning) : Lecture 8
Thank You All !
