
18AIC301J: DEEP LEARNING TECHNIQUES

B. Tech in ARTIFICIAL INTELLIGENCE, 5th semester

Faculty: Dr. Athira Nambiar


Section: A, Slot: D
Venue: TP 804
Academic Year: 2022-23
UNIT-2

Limitations of the gradient descent learning algorithm, Contour maps, Momentum-based gradient descent, Nesterov accelerated gradient descent, AdaGrad, RMSProp, Adam learning algorithm, Stochastic gradient descent
Implement linear regression with stochastic gradient descent
Mini-batch gradient descent, Bias-Variance tradeoff, Overfitting in deep neural networks, Hyperparameter tuning, Regularization: L2 regularization, Dataset Augmentation and Early stopping
Implement linear regression with stochastic mini-batch gradient descent and compare the results with the previous exercise (a minimal sketch follows this outline)
Dimensionality reduction, Principal Component Analysis, Singular Value Decomposition, Autoencoders, Relation between PCA and Autoencoders, Regularization in Autoencoders
Optimizing neural networks using L2 regularization, Dropout, data augmentation and early stopping
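
For reference, a minimal NumPy sketch of the mini-batch gradient descent exercise above, on synthetic data with illustrative names (not the official lab solution); setting batch_size=1 recovers plain stochastic gradient descent for the comparison:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))                  # synthetic inputs
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.standard_normal(200)   # y = 3x + 2 + noise

def minibatch_sgd_linear_regression(X, y, lr=0.1, batch_size=16, epochs=50):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)                  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb                   # residuals on the mini-batch
            w -= lr * (Xb.T @ err) / len(idx)       # gradient of 0.5*MSE w.r.t. w
            b -= lr * err.mean()                    # gradient of 0.5*MSE w.r.t. b
    return w, b

w, b = minibatch_sgd_linear_regression(X, y)
# batch_size=1 gives plain stochastic gradient descent for the comparison exercise.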
What is OVERFITTING?
GENERALIZATION?
Overfitting
Underfitting
• Underfitting is the opposite of overfitting.
• It occurs when the model is too simple to learn the underlying structure of the data.
• The model neither fits the training data nor generalizes to new data.

The main options to fix this problem are:

• Selecting a more powerful model, with more parameters
• Feeding better features to the learning algorithm (feature engineering)
• Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
Overfitting: solutions?
Overfitting happens when the model is too complex relative to the amount
and noisiness of the training data.

The possible solutions are:


• To simplify the model, by selecting one with fewer parameters, by reducing the number of attributes in the training data, or by constraining the model
• To gather more training data
• To reduce the noise in the training data (e.g., fix data errors and remove outliers)
• Regularization
Bias-Variance tradeoff
An important theoretical result of statistics and Machine Learning is the fact that a
model’s generalization error can be expressed as the sum of three very different errors:

Bias

This part of the generalization error is due to wrong assumptions, such as assuming that the data is linear when it is actually quadratic. A high-bias model is most likely to underfit the training data.

Variance:

This part is due to the model’s excessive sensitivity to small variations in the training
data. A model with many degrees of freedom (such as a high-degree polynomial model)
is likely to have high variance, and thus to overfit the training data.
Bias-Variance tradeoff
Irreducible error

This part is due to the noisiness of the data itself. The only way to reduce this part
of the error is to clean up the data (e.g., fix the data sources, such as broken
sensors, or detect and remove outliers).

The formula that connects test MSE to bias, variance and irreducible error is given below; see https://en.wikipedia.org/wiki/Bias–variance_tradeoff
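
Written out, the standard decomposition from the page above is (with \hat{f} the fitted model, f the true function and \sigma^2 the noise variance):

\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}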
Hyperparameter tuning

Finding Optimal Hyper-parameters


(1) Learning Rate (LR)
If the learning rate (LR) is too small, overfitting can occur. Large learning rates help to regularize the training, but if the learning rate is too large, training will diverge.
Hyperparameter tuning
(1) Learning Rate (LR)

• Perform a learning rate range test to identify a “large” learning rate.

• Using the 1-cycle LR policy with a maximum learning rate determined from an LR range test, set the minimum learning rate to a tenth of the maximum.

LR range test: Run your model for several epochs while letting the learning rate increase linearly between low and high LR values. This test is enormously valuable whenever you are facing a new architecture or dataset. A minimal sketch of the test is given below.
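
A minimal sketch of the LR range test in PyTorch (model, train_loader and criterion are assumed to be defined; the bounds and number of iterations are illustrative):

import torch

def lr_range_test(model, train_loader, criterion, lr_min=1e-5, lr_max=1.0, num_iters=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
    lrs, losses = [], []
    it = 0
    for inputs, targets in train_loader:
        if it >= num_iters:
            break
        # Increase the LR linearly from lr_min to lr_max over num_iters steps.
        lr = lr_min + (lr_max - lr_min) * it / num_iters
        for group in optimizer.param_groups:
            group["lr"] = lr
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        lrs.append(lr)
        losses.append(loss.item())
        it += 1
    return lrs, losses  # plot loss vs. LR and pick the LR just before the loss blows up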
Hyperparameter tuning
(2) Batch Size

• Batch size must be examined in conjunction with the execution time of the training.

• The batch size is limited by your hardware’s memory, while the learning rate is not.

• Use as large a batch size as fits in your memory, then compare the performance of different batch sizes.

• Small batch sizes add regularization while large batch sizes add less, so utilize this while balancing the proper amount of regularization.

• It is often better to use a larger batch size so that a larger learning rate can be used.
Hyperparameter tuning
(3) Momentum

• Test with short runs of momentum values 0.99, 0.97, 0.95, and 0.9 to get the best value for momentum.

• If using the 1-cycle learning rate schedule, it is better to use a cyclical momentum (CM) that starts at this maximum momentum value and decreases, as the learning rate increases, to a value of 0.8 or 0.85. A sketch using PyTorch's built-in 1-cycle scheduler follows the figure.

Figure: left, learning rate over one cycle; right, momentum over one cycle.
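
A sketch of this schedule using PyTorch's OneCycleLR scheduler, which cycles the momentum together with the learning rate (model, train_loader, criterion and num_epochs are assumed to be defined; the momentum bounds mirror the values suggested above):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.95)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.1,            # the "large" LR found by the LR range test
    total_steps=len(train_loader) * num_epochs,
    cycle_momentum=True,   # momentum decreases while the LR increases, and vice versa
    base_momentum=0.85,    # momentum at the LR peak
    max_momentum=0.95,     # momentum at the start and end of the cycle
)
for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step()   # advance the 1-cycle schedule once per batch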


Hyperparameter tuning
(4) Weight decay

• Weight decay is defined as multiplying each weight, at each step of gradient descent, by a factor slightly smaller than 1, governed by a constant λ with 0 < λ < 1.

• Weight decay is a regularization technique applied to the weights of a neural network: we minimize a loss function comprising both the primary loss and a penalty on the norm of the weights.

• λ is a value determining the strength of the penalty (encouraging smaller weights). A minimal sketch of the update is given below.

https://nanonets.com/blog/hyperparameter-optimization/
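
A minimal NumPy sketch of one weight-decay update for a single weight array (lr and wd are illustrative hyperparameters; for plain SGD this shrinkage corresponds to adding (wd/2)·‖w‖² to the loss):

import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.01, wd=1e-4):
    w = w - lr * grad            # gradient step on the primary loss
    w = w * (1.0 - lr * wd)      # shrink every weight by a factor slightly below 1
    return w

# Example usage inside a training loop (toy values):
w = np.array([0.5, -1.2, 0.3])
grad = np.array([0.1, -0.2, 0.05])
w = sgd_step_with_weight_decay(w, grad)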
Regularization: L2 regularization

L2 & L1 regularization:

• L1 and L2 are the most common types of regularization.


• These update the general cost function by adding another term known as the
regularization term.

Cost function = Loss (say, binary cross entropy) + Regularization term

• Due to the addition of this regularization term, the values of the weight matrices decrease; the underlying assumption is that a neural network with smaller weight matrices leads to simpler models. Therefore, it will also reduce overfitting to quite an extent.

• In L2 regularization, we have:

  Cost function = Loss + (λ / 2m) × Σ ‖w‖²

  (λ is the regularization parameter and m is the number of training examples)
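
A minimal PyTorch sketch of adding this L2 term to the training loss (model, criterion, inputs and targets are assumed to exist; here lam plays the role of λ and the 1/m factor is folded into it):

import torch

lam = 1e-4  # regularization strength (λ)

def regularized_loss(model, criterion, inputs, targets):
    data_loss = criterion(model(inputs), targets)
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())  # Σ ‖w‖²
    return data_loss + (lam / 2.0) * l2_penalty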


Dropout
• This is the one of the most interesting types of regularization techniques.

• It also produces very good results and is consequently the most frequently
used regularization technique in the field of deep learning.

• To understand dropout, consider the neural network structure shown in the figure:

Dropout
What does dropout do?

At every iteration, it randomly selects some nodes and removes them along
with all of their incoming and outgoing connections as shown below.

So each iteration has a different set of nodes and this results in a different
set of outputs. It can also be thought of as an ensemble technique in
machine learning.
Dropout

The probability with which nodes are dropped (the dropout rate) is the hyperparameter of the dropout function. As seen in the image above, dropout can be applied to the hidden layers as well as to the input layer. A minimal sketch of the idea is given below.
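
A minimal NumPy sketch of (inverted) dropout applied to a matrix of hidden activations; the shapes and drop probability are illustrative:

import numpy as np

def dropout(activations, drop_prob=0.5, training=True):
    if not training or drop_prob == 0.0:
        return activations
    # Randomly zero each node with probability drop_prob and rescale the survivors,
    # so the expected activation is unchanged and no rescaling is needed at test time.
    mask = (np.random.rand(*activations.shape) >= drop_prob) / (1.0 - drop_prob)
    return activations * mask

# Example: drop roughly half of the units in a batch of hidden activations.
h = np.random.randn(32, 128)
h_dropped = dropout(h, drop_prob=0.5)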
Dataset Augmentation

• The simplest way to reduce overfitting is to increase the size of the training data.

• In practice, it is often not feasible to simply collect more training data, because labelled data is costly.

• There are a few ways of increasing the size of the training data: rotating the image, flipping, scaling, shifting, etc., as sketched below.

• Some of these transformations, applied to the handwritten digits dataset, are shown on the next slides.
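
A sketch of these transformations using torchvision transforms (the ranges are illustrative, and horizontal flips are deliberately left out because flipping a digit can change its label):

import torchvision.transforms as T
from torchvision.datasets import MNIST

augment = T.Compose([
    T.RandomRotation(degrees=15),                    # small random rotations
    T.RandomAffine(degrees=0,
                   translate=(0.1, 0.1),             # random shifts
                   scale=(0.9, 1.1)),                # random scaling
    T.ToTensor(),
])

# Apply the augmentations on the fly while loading the handwritten digits dataset.
train_set = MNIST(root="./data", train=True, download=True, transform=augment)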
Dataset Augmentation (example transformations on the handwritten digits dataset, shown as images)
Early stopping
• Early stopping is a kind of cross-validation strategy where we keep one part
of the training set as the validation set.

• When we see that the performance on the validation set is getting worse, we immediately stop training the model. This is known as early stopping. A minimal sketch is given below.

• In the figure, we stop training at the dotted line, since after that point the model will start overfitting on the training data.
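
A minimal early-stopping sketch; train_one_epoch, evaluate, model, the data loaders, max_epochs and patience are assumed/illustrative names:

import copy

best_val_loss = float("inf")
patience = 5                      # how many epochs to wait for an improvement
epochs_without_improvement = 0
best_weights = None

for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)                    # assumed training helper
    val_loss = evaluate(model, val_loader)                  # assumed validation helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        best_weights = copy.deepcopy(model.state_dict())    # remember the best model
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                                           # validation loss stopped improving

model.load_state_dict(best_weights)                         # restore the best weights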
Learning
Resources
• Charu C. Aggarwal, Neural Networks and Deep Learning, Springer, 2018.
• Eugene Charniak, Introduction to Deep Learning, MIT Press, 2018.
• Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016.
• Michael Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
• Deng & Yu, Deep Learning: Methods and Applications, Now Publishers, 2013.
• https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/
Thank you
