Deep Learning: Computer Science and Engineering

This document discusses regularization techniques in deep learning. It introduces regularization, which modifies learning algorithms to reduce generalization error without affecting training error. Common regularization strategies include parameter norm penalties like L2 regularization (weight decay), which adds a penalty term for high parameter norms to the objective function. This limits model capacity and prevents unbounded parameter growth. The document analyzes the effect of L2 regularization on the learning updates and shows that it shrinks parameters along eigenvectors of the Hessian matrix. In deep learning, regularization is widely used to reduce overfitting and obtain a well-generalizing model.


Deep Learning (CS60010)
Regularization and Batch Normalization
27 Feb 2020

Abir Das
Assistant Professor
Computer Science and Engineering Department
Indian Institute of Technology Kharagpur

http://cse.iitkgp.ac.in/~adas/

Agenda
• Introduce the concepts of
• Regularization
• Dropout
• Batch normalization

• Resource: Goodfellow Book (Chapter 7)


Recap: The Bias-Variance Decomposition

With a training set $\mathcal{D}$ of $n$ points, the learned hypothesis $g_n^{\mathcal{D}}$, the target function $f$, and the average hypothesis $\bar{g}(\boldsymbol{x}) = \mathbb{E}_{\mathcal{D}}\big[g_n^{\mathcal{D}}(\boldsymbol{x})\big]$:

$$
\begin{aligned}
E_{out}(\boldsymbol{x}) &= \mathbb{E}_{\mathcal{D}}\Big[\big(g_n^{\mathcal{D}}(\boldsymbol{x}) - f(\boldsymbol{x})\big)^2\Big] \\
&= \mathbb{E}_{\mathcal{D}}\Big[f(\boldsymbol{x})^2 - 2 f(\boldsymbol{x})\, g_n^{\mathcal{D}}(\boldsymbol{x}) + g_n^{\mathcal{D}}(\boldsymbol{x})^2\Big] \\
&= f(\boldsymbol{x})^2 - 2 f(\boldsymbol{x})\, \mathbb{E}_{\mathcal{D}}\big[g_n^{\mathcal{D}}(\boldsymbol{x})\big] + \mathbb{E}_{\mathcal{D}}\big[g_n^{\mathcal{D}}(\boldsymbol{x})^2\big] \\
&= f(\boldsymbol{x})^2 - 2 f(\boldsymbol{x})\, \bar{g}(\boldsymbol{x}) + \mathbb{E}_{\mathcal{D}}\big[g_n^{\mathcal{D}}(\boldsymbol{x})^2\big] \\
&= f(\boldsymbol{x})^2 - 2 f(\boldsymbol{x})\, \bar{g}(\boldsymbol{x}) + \bar{g}(\boldsymbol{x})^2 - \bar{g}(\boldsymbol{x})^2 + \mathbb{E}_{\mathcal{D}}\big[g_n^{\mathcal{D}}(\boldsymbol{x})^2\big] \\
&= \underbrace{\big(f(\boldsymbol{x}) - \bar{g}(\boldsymbol{x})\big)^2}_{\text{Bias}} + \underbrace{\mathbb{E}_{\mathcal{D}}\big[g_n^{\mathcal{D}}(\boldsymbol{x})^2\big] - \bar{g}(\boldsymbol{x})^2}_{\text{Variance}}
\end{aligned}
$$

So $E_{out}(\boldsymbol{x}) = \text{Bias} + \text{Variance}$.

Slide motivation: Malik Magdon-Ismail
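
A small Monte Carlo check of this decomposition (a toy sketch: the sinusoidal target, straight-line hypothesis class, and sample sizes are hypothetical choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                                 # target function (toy choice)
x_eval = np.linspace(-np.pi, np.pi, 200)   # points at which E_out(x) is estimated
n, trials = 5, 2000                        # dataset size n and number of sampled datasets D

preds = np.empty((trials, x_eval.size))
for t in range(trials):
    x = rng.uniform(-np.pi, np.pi, n)      # sample a dataset D of n points
    y = f(x)
    a, b = np.polyfit(x, y, 1)             # g_n^D: fit a straight line to D
    preds[t] = a * x_eval + b

g_bar = preds.mean(axis=0)                 # g_bar(x) ~ E_D[g_n^D(x)]
bias = np.mean((f(x_eval) - g_bar) ** 2)                 # averaged over x
variance = np.mean(preds.var(axis=0))                    # E_D[g^2] - g_bar^2, averaged over x
e_out = np.mean((preds - f(x_eval)) ** 2)                # should equal bias + variance
print(f"bias={bias:.3f}  variance={variance:.3f}  "
      f"bias+var={bias + variance:.3f}  E_out={e_out:.3f}")
```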



Recap: Bias-Variance Trade-off

[Figure: learning curves of test error and training error versus the number of data points N, annotated with the bias and variance contributions.]


Regularization
• Machine learning is concerned more with performance on the test data than on the training data.

• According to the Goodfellow book, Chapter 7: "Many strategies used in machine learning are explicitly designed to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as regularization."

• Also in the book, regularization is defined as "any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error."


Regularization Strategies
• Adding restrictions on parameter values

• Adding constraints that are designed to encode specific kinds of prior knowledge

• Use of ensemble methods/dropout

• Dataset augmentation

• In practical deep learning scenarios, we almost always find that the best-fitting model (in the sense of minimizing generalization error) is a large model that has been regularized appropriately.


Parameter Norm Penalties

• The most traditional form of regularization applied to deep learning is to add a penalty for a high norm of the parameters.

• This approach limits the capacity of the model by adding a penalty term to the objective function, resulting in
$$\tilde{J}(\boldsymbol{\theta}; \boldsymbol{X}, \boldsymbol{y}) = J(\boldsymbol{\theta}; \boldsymbol{X}, \boldsymbol{y}) + \alpha\, \Omega(\boldsymbol{\theta})$$
where $\Omega(\boldsymbol{\theta})$ is the parameter norm penalty and $\alpha \in [0, \infty)$ is a hyperparameter weighting its contribution relative to the standard objective $J$.

• When the optimization procedure tries to minimize the regularized objective, it also keeps the parameters from growing in an unbounded manner, thus restricting the complexity of the model.
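
As a small illustrative sketch (the linear model, squared-error loss, and variable names here are hypothetical, not from the slides), the regularized objective for an L2 penalty could be computed as:

```python
import numpy as np

def regularized_loss(w, X, y, alpha):
    """J~(w; X, y) = J(w; X, y) + alpha * Omega(w), with
    J = mean squared error and Omega(w) = 0.5 * ||w||^2."""
    data_loss = 0.5 * np.mean((X @ w - y) ** 2)   # J(w; X, y)
    penalty = 0.5 * np.sum(w ** 2)                # Omega(w), an L2 norm penalty
    return data_loss + alpha * penalty
```

Larger values of `alpha` push the minimizer toward smaller-norm weights; `alpha = 0` recovers the unregularized objective.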

Parameter Norm Penalties

• The two most common choices of penalty are the L2 norm (also known as weight decay in the deep learning community) and the L1 norm.

• In neural networks, we typically choose only the weights $\boldsymbol{w}$, not the biases, as the parameters to regularize.

• Regularizing the bias parameters can introduce a significant amount of underfitting.

• Thus, for neural networks, the regularized objective is written in terms of the weights alone:
$$\tilde{J}(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y}) = J(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y}) + \alpha\, \Omega(\boldsymbol{w})$$


L-2 Parameter Norm Regularization

• The L2 parameter norm penalty $\Omega(\boldsymbol{w}) = \frac{1}{2}\lVert\boldsymbol{w}\rVert_2^2$ is commonly known as weight decay.

• We can gain some insight into the behavior of weight decay regularization by studying the gradient of the regularized objective function
$$\tilde{J}(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y}) = \frac{\alpha}{2}\boldsymbol{w}^\top\boldsymbol{w} + J(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y})$$

• The gradient is
$$\nabla_{\boldsymbol{w}}\tilde{J}(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y}) = \alpha\boldsymbol{w} + \nabla_{\boldsymbol{w}} J(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y})$$

• So, the update step (with learning rate $\epsilon$) is
$$\boldsymbol{w} \leftarrow \boldsymbol{w} - \epsilon\big(\alpha\boldsymbol{w} + \nabla_{\boldsymbol{w}} J(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y})\big) = (1 - \epsilon\alpha)\,\boldsymbol{w} - \epsilon\,\nabla_{\boldsymbol{w}} J(\boldsymbol{w}; \boldsymbol{X}, \boldsymbol{y})$$

• The addition of the weight decay term modifies the learning rule to multiplicatively shrink the weight vector (by the factor $1 - \epsilon\alpha$) before performing the usual gradient update.
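
A minimal numpy sketch of this update rule (the quadratic objective, learning rate, and decay coefficient below are hypothetical illustration values):

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad_J, lr=0.1, alpha=1e-2):
    """One gradient step on the L2-regularized objective:
    w <- (1 - lr*alpha) * w - lr * grad_J(w)."""
    return (1.0 - lr * alpha) * w - lr * grad_J(w)

# Hypothetical quadratic objective J(w) = 0.5 * ||w - w_star||^2
w_star = np.array([3.0, -2.0])
grad_J = lambda w: w - w_star

w = np.zeros(2)
for _ in range(200):
    w = sgd_step_with_weight_decay(w, grad_J)
print(w)   # converges near w_star, shrunk slightly toward the origin by the decay
```

For this toy objective the Hessian is the identity, so the fixed point is $\boldsymbol{w}^*/(1+\alpha)$, a small shrinkage toward the origin, consistent with the analysis on the following slides.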

L-2 Parameter Norm Regularization

• We can simplify the analysis further by making a quadratic approximation to the unregularized objective function in the neighborhood of its optimum weights $\boldsymbol{w}^* = \arg\min_{\boldsymbol{w}} J(\boldsymbol{w})$:
$$\hat{J}(\boldsymbol{w}) = J(\boldsymbol{w}^*) + \frac{1}{2}(\boldsymbol{w} - \boldsymbol{w}^*)^\top \boldsymbol{H}\,(\boldsymbol{w} - \boldsymbol{w}^*)$$

• $\boldsymbol{H}$ is the Hessian matrix of $J$ with respect to $\boldsymbol{w}$, evaluated at $\boldsymbol{w}^*$.

• What rule/formula is used to get this approximation?
  • Taylor series expansion

• Where is the first-order term?
  • $\boldsymbol{w}^*$ being the minimizing value, the gradient $\nabla_{\boldsymbol{w}} J(\boldsymbol{w}^*)$ is 0, so the first-order term vanishes.


L-2 Parameter Norm Regularization

• With this approximation, the regularized objective is given by
$$\tilde{J}(\boldsymbol{w}) = \hat{J}(\boldsymbol{w}) + \frac{\alpha}{2}\boldsymbol{w}^\top\boldsymbol{w}$$

• Computing the gradient of the above and equating it to 0, we get the minimizer $\tilde{\boldsymbol{w}}$ of the regularized and approximated objective:
$$\alpha\tilde{\boldsymbol{w}} + \boldsymbol{H}(\tilde{\boldsymbol{w}} - \boldsymbol{w}^*) = 0 \quad\Rightarrow\quad \tilde{\boldsymbol{w}} = (\boldsymbol{H} + \alpha\boldsymbol{I})^{-1}\boldsymbol{H}\boldsymbol{w}^*$$

• As $\alpha \to 0$, $\tilde{\boldsymbol{w}} \to \boldsymbol{w}^*$.

• As $\alpha$ grows, we can see the effect by using the eigendecomposition of $\boldsymbol{H}$.


L-2 Parameter Norm Regularization

• Write the eigendecomposition $\boldsymbol{H} = \boldsymbol{Q}\boldsymbol{\Lambda}\boldsymbol{Q}^\top$, where $\boldsymbol{\Lambda}$ is the diagonal matrix of eigenvalues $\lambda_i$ and the columns of $\boldsymbol{Q}$ are the corresponding orthonormal eigenvectors.

• Then
$$\tilde{\boldsymbol{w}} = \big(\boldsymbol{Q}\boldsymbol{\Lambda}\boldsymbol{Q}^\top + \alpha\boldsymbol{I}\big)^{-1}\boldsymbol{Q}\boldsymbol{\Lambda}\boldsymbol{Q}^\top\boldsymbol{w}^* = \boldsymbol{Q}\big(\boldsymbol{\Lambda} + \alpha\boldsymbol{I}\big)^{-1}\boldsymbol{\Lambda}\,\boldsymbol{Q}^\top\boldsymbol{w}^*$$

• The effect of weight decay is to rescale $\boldsymbol{w}^*$ along the axes defined by the eigenvectors of $\boldsymbol{H}$. Specifically, the component of $\boldsymbol{w}^*$ that is aligned with the $i$-th eigenvector of $\boldsymbol{H}$ is rescaled by a factor of $\frac{\lambda_i}{\lambda_i + \alpha}$.
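
A quick numpy check of this rescaling (the Hessian and $\boldsymbol{w}^*$ below are made-up illustration values):

```python
import numpy as np

# Hypothetical positive-definite Hessian and unregularized optimum
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])
w_star = np.array([1.0, 1.0])
alpha = 0.5

# Direct solution: w~ = (H + alpha*I)^-1 H w*
w_tilde = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)

# Same thing via the eigendecomposition H = Q diag(lam) Q^T:
# each eigen-component of w* is rescaled by lam_i / (lam_i + alpha)
lam, Q = np.linalg.eigh(H)
w_tilde_eig = Q @ ((lam / (lam + alpha)) * (Q.T @ w_star))

print(np.allclose(w_tilde, w_tilde_eig))   # True
```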


Regularization Strategies: Dataset Augmentation


• One way to get better generalization is to train on more data.
• But under most circumstances, data is limited. Furthermore, labelling is an extremely tedious task.
• Dataset Augmentation provides a cheap and easy way to increase the amount of training data.

[Figure: an example image with augmented variants such as color jitter, horizontal flip, and many more.]
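
An illustrative torchvision pipeline in this spirit (the specific transforms and parameter values are example choices, not prescribed by the slides):

```python
from torchvision import transforms

# A hypothetical augmentation pipeline: each training image is randomly
# flipped, cropped, and color-jittered every time it is drawn, so the
# network effectively sees many variants of every labelled example.
train_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```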



Regularization Strategies: Dropout


• Bagging is a technique for reducing generalization error through combining several models (Breiman, 1994)
• Bagging: (1) Train k different models on k different subsets of training data, constructed to have the same number
of examples as the original dataset through random sampling from that dataset with replacement
• Bagging: (2) Have all of the models vote on the output for test examples
• Dropout is a computationally inexpensive but powerful extension of Bagging
• Training with dropout consists of training sub-networks that can be formed by removing non-output units from an
underlying base network

[Figure: dropout illustrated as sub-networks of a base network. Images courtesy: Goodfellow et al., Karpathy et al.]
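
A minimal numpy sketch of (inverted) dropout applied to a layer's activations during training (the keep probability of 0.8 is an arbitrary example value):

```python
import numpy as np

def dropout_forward(h, keep_prob=0.8, training=True, rng=np.random.default_rng()):
    """Randomly zero each unit with probability 1 - keep_prob during training.
    Scaling by 1/keep_prob ("inverted dropout") keeps the expected activation
    unchanged, so no rescaling is needed at test time."""
    if not training:
        return h
    mask = rng.random(h.shape) < keep_prob   # sample a sub-network
    return h * mask / keep_prob

h = np.ones((2, 5))
print(dropout_forward(h))                   # some units zeroed, the rest scaled up
print(dropout_forward(h, training=False))   # unchanged at test time
```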



Dropout (Fun Intuition)


Batch Normalization

[Figure: a single neuron computing $a(\boldsymbol{x}) = \sum_{i} w_i x_i$ over inputs $x_1, \ldots, x_d$ (with $w_0 = b$) followed by a nonlinearity $g(a)$, and a two-layer network with pre-activations $a^{(1)}, a^{(2)}$, hidden activations $h^{(1)}, h^{(2)}$, and output $\hat{y}$.]

• For the inputs, normalization uses the batch mean $\mu = \frac{1}{m}\sum_{i=1}^{m} \boldsymbol{x}^{(i)}$, which is subtracted elementwise: $\boldsymbol{x} \leftarrow \boldsymbol{x} - \mu$.

• Can we normalize the intermediate pre-activations $a^{(l)}$ in the same way, so as to train the downstream weights and biases faster?

Implementing BatchNorm

Given some intermediate values $a^{(i)}$, $i = 1, \ldots, m$, in a NN (one mini-batch):

$$\mu = \frac{1}{m}\sum_{i=1}^{m} a^{(i)}, \qquad \sigma^2 = \frac{1}{m}\sum_{i=1}^{m}\big(a^{(i)} - \mu\big)^2$$

$$a^{(i)}_{norm} = \frac{a^{(i)} - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad \tilde{a}^{(i)} = \gamma\, a^{(i)}_{norm} + \beta$$

• $\gamma$ and $\beta$ are learnable parameters of the model.

• If $\gamma = \sqrt{\sigma^2 + \epsilon}$ and $\beta = \mu$, then $\tilde{a}^{(i)} = a^{(i)}$, i.e., the layer can recover the unnormalized values.

• Use $\tilde{a}^{(i)}$ in place of $a^{(i)}$ in the rest of the network.
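
A minimal numpy sketch of this forward computation over a mini-batch (the feature dimension, $\epsilon$ value, and $\gamma$, $\beta$ initialization are illustrative assumptions):

```python
import numpy as np

def batchnorm_forward(a, gamma, beta, eps=1e-5):
    """Normalize pre-activations a of shape (m, d) over the mini-batch,
    then scale and shift with the learnable parameters gamma and beta."""
    mu = a.mean(axis=0)                      # per-feature batch mean
    var = a.var(axis=0)                      # per-feature batch variance
    a_norm = (a - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * a_norm + beta             # learnable rescaling

m, d = 32, 4
a = np.random.randn(m, d) * 5.0 + 3.0
gamma, beta = np.ones(d), np.zeros(d)        # a typical initialization
out = batchnorm_forward(a, gamma, beta)
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1 per feature
```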

Effect of Batch Normalization on Biases

• Consider a pre-activation with a bias term: $a^{(l)} = w^{(l)} h^{(l-1)} + b^{(l)}$.

• We know the batch mean that batch normalization subtracts:
$$\mu = \frac{1}{m}\sum_{i=1}^{m} a^{(l)(i)} = w^{(l)}\Big(\frac{1}{m}\sum_{i=1}^{m} h^{(l-1)(i)}\Big) + b^{(l)}$$

• So,
$$a^{(l)(i)} - \mu = w^{(l)}\Big(h^{(l-1)(i)} - \frac{1}{m}\sum_{j=1}^{m} h^{(l-1)(j)}\Big)$$
and the bias $b^{(l)}$ cancels: with batch normalization the bias term has no effect, and its role is taken over by the learnable shift $\beta$.

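A quick numerical check of this cancellation (toy numbers, a single unit, all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((32, 1))          # previous-layer activations, one unit
w = 1.7

a_with_bias = w * h + 5.0                 # pre-activation with bias b = 5
a_no_bias = w * h                         # same pre-activation without the bias

# After subtracting the batch mean, the two are identical: the bias cancels.
centered_with = a_with_bias - a_with_bias.mean(axis=0)
centered_without = a_no_bias - a_no_bias.mean(axis=0)
print(np.allclose(centered_with, centered_without))   # True
```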