
Regularization in Deep Learning

L1, L2, and Dropout


Recap: Overfitting
• Overfitting refers to the phenomenon where a neural network models the training data very well but fails when it sees new data from the same problem domain. Overfitting is caused by noise in the training data that the neural network picks up during training and learns as an underlying concept of the data.
Overfitting
• This learned noise, however, is unique to each training set. As soon as the model sees new data from the same problem domain that does not contain this noise, the performance of the neural network gets much worse.
• “Why does the neural network pick up that noise in the first place?”
• The reason for this is that the complexity of the network is too high. A fit of a neural network with higher complexity is shown in the image on the right‐hand side.
Graph 1. Model with a good fit and high variance
Overfitting
• The model with higher complexity is able to pick up and learn patterns (noise) in the data that are just caused by some random fluctuation or error. The network would be able to model each data sample of the distribution one by one, while not recognizing the true function that describes the distribution.
• New arbitrary samples generated with the true function would have a high distance to the fit of the model. We also say that the model has a high variance.
• On the other hand, the lower complexity network on the left side models the distribution much better by not trying too hard to model each data pattern individually.
Overfitting
• In practice, overfitting causes the neural network model to perform very well during training, but the performance gets much worse at inference time when faced with brand new data.
Regularization

• Regularization refers to a set of different techniques that lower the complexity of a neural network model during training and thus prevent overfitting.
• There are three very popular and efficient regularization techniques, called L1, L2, and dropout, which we are going to discuss in the following.
L2 Regularization

• L2 regularization is the most common of all regularization techniques and is also commonly known as weight decay or Ridge Regression.
• During L2 regularization, the loss function of the neural network is extended by a so‐called regularization term, which is called here Ω.
L2 Regularization

• The regularization term Ω is defined as the Euclidean norm (or L2 norm) of the weight matrices, which is the sum over all squared values of a weight matrix.
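In symbols (notation assumed, not taken verbatim from the slide):
Ω = Σ w²   (the sum runs over every entry w of the weight matrices)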
L2 Regularization

• The regularization term is weighted by the scalar alpha divided by two and added to the regular loss function that is chosen for the current task. This leads to a new expression for the loss function:
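In symbols (notation assumed; L is the original loss for the current task):
L_new = L + (α/2) · Ω = L + (α/2) · Σ w²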
Gradient Descent during L2
• Alpha is sometimes called the regularization rate and is an additional hyperparameter we introduce into the neural network. Simply speaking, alpha determines how much we regularize our model.
• In the next step we can compute the gradient of the new loss function and put the gradient into the update rule for the weights:
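A standard form of this update (the learning rate symbol η is assumed, not from the slide):
w ← w − η · (∂L/∂w + α · w)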
Gradient Descent during L2
• Some reformulations of the update rule lead to an expression which looks very much like the update rule for the weights during regular gradient descent:
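In symbols (again assuming learning rate η):
w ← (1 − η·α) · w − η · ∂L/∂w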
• The only difference is that by adding the regularization term we introduce an additional subtraction from the current weights (the first term in the equation).
• In other words, independent of the gradient of the loss function, we make our weights a little bit smaller each time an update is performed.
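To make this weight-decay view concrete, here is a minimal sketch of a single update step in Python (the function name, learning rate, and example values are illustrative assumptions, not from the slides):

```python
import numpy as np

def l2_regularized_step(W, grad_loss, lr=0.1, alpha=0.01):
    """One gradient step with L2 regularization (weight decay).

    Implements the reformulated update rule above:
    W <- (1 - lr * alpha) * W - lr * dL/dW
    """
    return (1.0 - lr * alpha) * W - lr * grad_loss

# Toy example: the weights shrink slightly on every step,
# independently of the gradient of the original loss.
W = np.array([0.5, -1.2, 3.0])
grad = np.array([0.1, -0.2, 0.05])
print(l2_regularized_step(W, grad))
```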
L1 Regularization

• In the case of L1 regularization (also known as Lasso regression), we simply use another regularization term Ω. This term is the sum of the absolute values of the weight parameters in a weight matrix:
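In symbols (matching the notation assumed above):
Ω = Σ |w|   (summed over every entry w of the weight matrices)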
L1 Regularization

• The derivative of the new loss function leads to the following expression, which is the sum of the gradient of the old loss function and the sign of a weight value times alpha:
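Written out per weight (notation assumed; sign(w) is +1 for positive and −1 for negative weights):
∂L_new/∂w = ∂L/∂w + α · sign(w)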
Why do L1 and L2 Regularizations work?
• To answer this question, consider the plots of the absolute value (|w|) and square (w²) functions, where red represents the operation performed during L1 regularization and blue the operation performed during L2 regularization.
Why do L1 and L2 Regularizations work?
• In the case of L2 regularization, weight parameters decrease, but do not necessarily become zero, since the curve becomes flat near zero.
• On the other hand, during L1 regularization the weights are always forced all the way towards zero.
• In the case of L2, you can think of solving an equation where the sum of squared weight values is equal to or less than a value s. Here s is a constant that exists for each possible value of the regularization rate α. For just two weight values W1 and W2 this equation would look as follows:
W1² + W2² ≤ s
• L1 regularization, in turn, can be thought of as an equation where the sum of the absolute values of the weights is less than or equal to a value s. This would look like the following expression:
|W1| + |W2| ≤ s
Visualization
• The introduced equations for L1 and L2 regularizations are
constraint functions, which we can visualize:

The left image shows the constraint function (green area) for the L1 regularization and the right image shows the constraint function for the L2 regularization. The red ellipses are contours of the loss function that is used during gradient descent. In the center of the contours there is a set of optimal weights for which the loss function has a global minimum.
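For readers who want to reproduce this picture, here is a rough sketch in Python (assuming matplotlib and numpy; the value of s and the loss contours are made up purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

s = 1.0
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))

# L1 constraint region |W1| + |W2| <= s: a diamond with corners on the axes.
ax1.fill([s, 0, -s, 0], [0, s, 0, -s], color="green", alpha=0.5)
ax1.set_title("L1: |W1| + |W2| <= s")

# L2 constraint region W1^2 + W2^2 <= s: a disk, no corners.
theta = np.linspace(0, 2 * np.pi, 200)
ax2.fill(np.sqrt(s) * np.cos(theta), np.sqrt(s) * np.sin(theta), color="green", alpha=0.5)
ax2.set_title("L2: W1^2 + W2^2 <= s")

# Illustrative loss contours (red ellipses) around an arbitrary optimum.
w1, w2 = np.meshgrid(np.linspace(-2, 2, 300), np.linspace(-2, 2, 300))
for ax in (ax1, ax2):
    ax.contour(w1, w2, (w1 - 1.2) ** 2 + 2 * (w2 - 0.8) ** 2, levels=5, colors="red")
    ax.set_xlabel("W1")
    ax.set_ylabel("W2")
    ax.set_aspect("equal")

plt.show()
```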
Visualization
• In the case of L1 and L2 regularization, the estimates of W1 and W2 are given by the first point where an ellipse intersects the green constraint area.
• Since L2 regularization has a circular constraint area, the intersection will not generally occur on an axis, and thus the estimates for W1 and W2 will be exclusively non‐zero.
• In the case of L1, the constraint area has a diamond shape with corners, and thus the contours of the loss function will often intersect the constraint region at an axis. When this occurs, one of the estimates (W1 or W2) will be zero.
• In a high-dimensional space, many of the weight parameters will equal zero simultaneously.
What does Regularization achieve?

• Performing L2 regularization encourages the weight values towards zero (but not exactly zero).
• Performing L1 regularization encourages the weight values to be exactly zero.
• Intuitively speaking, smaller weights reduce the impact of the hidden neurons. In that case, those hidden neurons become negligible and the overall complexity of the neural network gets reduced.
What does Regularization achieve?

• As mentioned earlier: less complex models typically avoid modeling noise in the data, and therefore there is no overfitting.
• But you have to be careful when choosing the regularization rate α. The goal is to strike the right balance between low complexity of the model and accuracy.
• If your alpha value is too high, your model will be simple, but you run the risk of underfitting your data. Your model won't learn enough about the training data to make useful predictions.
• If your alpha value is too low, your model will be more complex, and you run the risk of overfitting your data. Your model will learn too much about the particularities of the training data and won't be able to generalize to new data.
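In most deep-learning frameworks this trade-off is controlled through a single hyperparameter. As a rough sketch, assuming PyTorch (where L2 regularization is exposed as the weight_decay argument of the optimizer; the model and values below are placeholders):

```python
import torch

# Minimal sketch: alpha is passed as weight_decay, which applies
# L2 regularization / weight decay to all trainable parameters.
model = torch.nn.Linear(10, 1)       # stand-in for any network
alpha = 1e-3                         # regularization rate (tune on validation data)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=alpha)
# alpha too high -> overly simple model (underfitting);
# alpha too low  -> overly complex model (overfitting).
```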
