LECTURE#9 EE258 F22 Part2 Draft v1
(Part 2)
Based on Chapter 7 of Deep Learning textbook
What is Regularization?
• How to make an algorithm perform well not only on the
training data but also on new inputs?
• Strategies designed to reduce the test error = REGULARIZATION
[Figure: training vs. test error curves; a huge gap between the training and test error at the training optimum indicates overfitting]
Regularization Methods
L1 regularization: Ω(w) = λ‖w‖₁, where λ is the regularization parameter
• 7.4 Dataset Augmentation
• 7.5 Noise Robustness
• 7.6 Semi-supervised Learning
• 7.7 Multitask Learning
• 7.8 Early Stopping
• 7.9 Parameter Sharing (will be discussed in CNN)
• 7.11 Bagging and Ensemble Models
• 7.12 Dropout
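As a concrete illustration of the L1 norm penalty above, here is a minimal sketch (not code from the lecture; the function name `l1_loss` and the toy data are made up) of a squared-error loss with an L1 penalty added:

```python
# Minimal sketch of an L1-regularized loss for a linear model.
# All names and data here are illustrative, not from the lecture.

def l1_loss(w, X, y, lam):
    """Mean squared error plus an L1 penalty lam * ||w||_1."""
    n = len(y)
    mse = sum((sum(wi * xi for wi, xi in zip(w, x)) - yi) ** 2
              for x, yi in zip(X, y)) / n
    penalty = lam * sum(abs(wi) for wi in w)
    return mse + penalty

# With lam = 0 the penalty vanishes; a larger lam shrinks weights toward 0.
print(l1_loss([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, -2.0], 0.5))  # 1.5
```

With this toy data the model fits perfectly (MSE = 0), so the loss equals the penalty term λ(|1| + |−2|) = 1.5.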
2
Dataset Augmentation
• It is always better to train a model using more data, but data is in general limited
• Data augmentation: create new (fake) training examples by transforming the existing ones
Data Augmentation for Object Recognition
Object Recognition Problem: detect the animal
Inputs: lion images, each rotated by 180° to produce a new training example
Outputs: "Lion" (the label is unchanged by the rotation)
Images are from mygreatlearning.com
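The 180° rotation used in this example can be sketched in a few lines of pure Python (illustrative only; a real pipeline would use an image library):

```python
def rotate_180(img):
    """Rotate a 2D image (a list of pixel rows) by 180 degrees:
    reverse the row order, then reverse each row's pixel order."""
    return [row[::-1] for row in img[::-1]]

img = [[1, 2],
       [3, 4]]
print(rotate_180(img))  # [[4, 3], [2, 1]]
```

The rotated image keeps the same label, so one labeled example becomes two.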
Data Augmentation for Speech Recognition
[Figure: noise is added to a clean speech waveform, producing a new training example with the same transcription]
Noise Robustness
• Adding Noise to Inputs
– Discussed in data augmentation
– It was shown that this is equivalent to a norm penalty under certain conditions (Bishop, 1995)
• Adding Noise to Weights
– Update equation: w(n+1) = w(n) − η∇J(w(n)) + ε, where ε is zero-mean Gaussian noise
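The weight-noise update can be sketched as follows (a minimal illustration; the names `noisy_weight_step`, `lr`, and `sigma` are mine, not from the lecture):

```python
import random

def noisy_weight_step(w, grad, lr, sigma, rng):
    """One gradient-descent update, followed by perturbing each weight
    with zero-mean Gaussian noise of standard deviation sigma."""
    return [wi - lr * gi + rng.gauss(0.0, sigma)
            for wi, gi in zip(w, grad)]

rng = random.Random(0)
w = noisy_weight_step([1.0, -1.0], [0.5, -0.5], lr=0.1, sigma=0.01, rng=rng)
```

Setting `sigma = 0` recovers plain gradient descent, which makes the sketch easy to check.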
Semi-supervised Learning
[Diagram: inputs X → Unsupervised Learning (PCA, k-means) → better representation of the inputs (a version of X) → Supervised Learning → outputs y]
• Semi-supervised learning: use unlabeled data to learn a better representation of the inputs, then train the task-specific supervised model (e.g., classifying human vs. animal) on the labeled data
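A toy sketch of this pipeline, with simple standardization standing in for PCA/k-means (all names and data are illustrative, not from the lecture): the representation is fit on all inputs, labels are only needed for the second stage.

```python
def fit_scaler(X):
    """Per-feature mean and std, computed from ALL inputs (no labels needed).
    A stand-in for an unsupervised representation learner such as PCA."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [max((sum((v - m) ** 2 for v in col) / n) ** 0.5, 1e-12)
            for col, m in zip(zip(*X), means)]
    return means, stds

def transform(X, means, stds):
    """Map inputs into the learned representation (standardized features)."""
    return [[(v - m) / s for v, m, s in zip(x, means, stds)] for x in X]

X_all = [[0.0], [2.0], [4.0], [6.0]]       # labeled + unlabeled inputs
means, stds = fit_scaler(X_all)
X_labeled = transform([[0.0], [6.0]], means, stds)  # fed to supervised stage
```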
Early Stopping
Trade-offs:
– Space for storing a copy of the (best) parameters
– Some training data is used for validation
To exploit the validation data after early stopping:
• Retrain with all the data and use the early stopping point from the previous training
• Or continue training with all the data and check if the error decreases below the early stopping point
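The early stopping rule can be sketched as follows (a hypothetical `patience` criterion; the lecture does not specify the exact stopping test):

```python
def early_stopping(val_errors, patience):
    """Return the index of the best epoch, stopping once the validation
    error has failed to improve for `patience` consecutive epochs."""
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0  # new best: reset
        else:
            waited += 1
            if waited >= patience:
                break  # validation error has stopped improving
    return best_epoch

print(early_stopping([1.0, 0.8, 0.7, 0.75, 0.9, 0.95], patience=2))  # 2
```

In practice one also keeps a copy of the weights from the best epoch, which is the storage cost mentioned above.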
Parameter Sharing
• Force constraints on the parameters: sets of parameters are forced to be equal to each other
→ REDUCES THE NUMBER OF PARAMETERS
Bagging (an ensemble model)
• Bagging (bootstrap aggregating): construct k different datasets by sampling from the original dataset with replacement (BOOTSTRAPPING), and train one model on each
– Regression: average the k models' predictions
– Classification: combine the k predictions by the majority rule
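A minimal sketch of bagging's two combination rules (function names are illustrative; the "models" here are stand-ins for trained predictors):

```python
import random

def bootstrap_sample(data, rng):
    """Sample len(data) points WITH replacement (bootstrapping)."""
    return [rng.choice(data) for _ in range(len(data))]

def bagged_classify(models, x):
    """Majority rule over the ensemble's class predictions."""
    votes = [m(x) for m in models]
    return max(set(votes), key=votes.count)

def bagged_regress(models, x):
    """Average of the ensemble's regression predictions."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Hypothetical trained models: two predict class 1, one predicts class 0.
models = [lambda x: 1, lambda x: 1, lambda x: 0]
print(bagged_classify(models, x=None))  # 1
```

Each model would be trained on its own `bootstrap_sample` of the data; only the combination step is shown in full here.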
Other Ensemble Models
• Combine the predictions of several models (see previous slide)
Dropout
• Definition: a practical way of implementing a version of bagging in neural networks, for a large set of models sharing parameters
• Each ensemble member is formed by randomly REMOVING NON-OUTPUT UNITS from the base network
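A sketch of dropout applied to one layer's activations. The rescaling by 1/(1 − p) is the common "inverted dropout" implementation choice, not necessarily the lecture's; all names are illustrative:

```python
import random

def dropout(activations, p_drop, rng, train=True):
    """Zero each unit with probability p_drop; scale survivors by
    1/(1 - p_drop) so the expected activation is unchanged, which
    lets test-time inference use the layer as-is."""
    if not train or p_drop == 0.0:
        return list(activations)  # no units removed at test time
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

rng = random.Random(0)
h = dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, rng=rng)
```

Each forward pass samples a different mask, so training effectively visits a different subnetwork each time, which is what connects dropout to bagging.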