
Unit 2

Regularization
• Regularization: Overview, Parameter Norm Penalties,
Norm Penalties as Constrained Optimization, Regularization and
Underconstrained Problems, Data Augmentation, Noise Robustness,
Batch Normalization, Semi-Supervised Learning, Multi-Task
Learning, Early Stopping, Parameter Tying and Parameter Sharing,
Sparse Representations, Bagging, Dropout. Tuning Neural Networks,
Hyperparameters
Regularization

• Definition: Regularization is one of the most
important concepts in deep learning. It is a technique
to prevent the model from overfitting by adding extra
information to it.
Regularization
• Generalization error (test error): Sometimes a deep learning model performs well on
the training data but does not perform well on the test data, i.e. the model is
not able to predict the output when it deals with unseen data because it has fit
the noise in the training data; such a model is called overfitted.

• At the left end of the graph, training error and
generalization error are both high. This is the underfitting
regime.

• As we increase capacity, training error decreases, but the gap between training
and generalization error increases. Eventually, the size of this gap outweighs
the decrease in training error, and we enter the overfitting regime, where
capacity is too large, above the optimal capacity.
Regularization (Overview)
• A trained deep learning model falls into one of three regimes:
1. It excludes the true data-generating process (underfitting, which induces bias), or
2. It matches the true data-generating process (the desired "just fit"), or
3. It includes the true generating process but also many other possible generating
processes (the overfitting regime, where variance rather than bias dominates the
estimation error).
• The goal of regularization is to take a model from the third regime into the
second regime.
• The best-fitting model (in the sense of minimizing generalization error) is a
large model that has been regularized appropriately.
• In a regularization technique, we reduce the magnitude of the
features while keeping the same number of features.
Overfitting occurs when our deep learning model tries to cover all
the data points, or more than the required data points, present in the
given dataset. The overfitted model has low bias and high variance.

Ques: How do we avoid overfitting in a model?
Ans: Regularization
Regularization Techniques
Parameter Norm Penalties
• The parameter norm penalty approaches limit the capacity of
neural network models by adding a parameter norm penalty Ω(θ) to the
objective function J:

J̃(θ; X, y) = J(θ; X, y) + αΩ(θ)

• where α ∈ [0, ∞) is a hyperparameter that weights the relative contribution of
the norm penalty term Ω.
• We choose a parameter norm penalty Ω that penalizes only the weights
of the affine transformation at each layer and leaves the biases unregularized,
since biases typically require less data to fit accurately than the weights.
• Different choices for the parameter norm Ω can result in different solutions
being preferred.
• Using a separate penalty with a different α coefficient for each layer is
sometimes desirable, but because it can be expensive to search for the correct
value of multiple hyperparameters, it is still reasonable to use the same weight
decay at all layers just to reduce the size of the search space.
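The penalized objective above can be sketched in a few lines of numpy. This is a minimal illustration, not a library API; the function names are our own:

```python
import numpy as np

def l2_penalty(w):
    # Omega(theta) = (1/2) * ||w||_2^2, applied to the weights only
    # (biases are left unregularized, as noted above)
    return 0.5 * np.sum(w ** 2)

def penalized_objective(base_loss, w, alpha):
    # J~(theta; X, y) = J(theta; X, y) + alpha * Omega(theta)
    return base_loss + alpha * l2_penalty(w)

w = np.array([1.0, -2.0, 0.5])
print(penalized_objective(0.3, w, alpha=0.1))  # 0.3 + 0.1 * 2.625 = 0.5625
```

Setting alpha = 0 recovers the unregularized objective; larger alpha weights the penalty more heavily relative to the task loss.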
Parameter Norm Penalties
1. L2 Parameter Regularization (ridge regression or Tikhonov regularization):
This regularization strategy drives the weights closer to the origin by
adding a regularization term Ω(θ) = ½‖w‖²₂ to the objective function.
L2 Parameter Regularization
• The addition of the weight decay term has modified the learning rule
to multiplicatively shrink the weight vector by a constant factor on
each step, just before performing the usual gradient update.
• The solid ellipses represent contours of equal
value of the unregularized objective.
• The dotted circles represent contours of equal
value of the L2 regularizer.
• At the point w̃, these competing objectives reach
an equilibrium.
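The multiplicative shrinkage described above can be seen directly in the update rule. A minimal sketch, assuming Ω = ½‖w‖² and plain gradient descent (the function name is ours):

```python
import numpy as np

def weight_decay_step(w, grad_J, lr, alpha):
    # Gradient of the penalized objective is grad_J + alpha * w, so
    # w <- w - lr * (grad_J + alpha * w) = (1 - lr * alpha) * w - lr * grad_J
    # i.e. the weight vector is shrunk by the constant factor (1 - lr * alpha)
    # on each step, just before the usual gradient update.
    return (1.0 - lr * alpha) * w - lr * grad_J

w = np.array([1.0, -2.0])
w_next = weight_decay_step(w, grad_J=np.zeros(2), lr=0.1, alpha=0.5)
# with a zero task gradient the weights simply shrink: [0.95, -1.9]
```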
L1 Regularization
• Formally, L1 regularization on the model parameters w is defined as:

Ω(θ) = ‖w‖₁ = Σᵢ |wᵢ|

• We can see that the regularization contribution to the gradient no longer
scales linearly with each wᵢ; instead it is a constant factor with a sign equal to
sign(wᵢ).
• In comparison to L2 regularization, L1 regularization results in a solution that
is more sparse. Sparsity in this context refers to the fact that some
parameters have an optimal value of zero.
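The difference between the two penalty gradients can be illustrated directly; a small sketch with our own function names:

```python
import numpy as np

def l2_grad(w, alpha):
    # L2 contribution alpha * w_i: shrinks large weights proportionally,
    # so the push toward zero fades as a weight gets small
    return alpha * w

def l1_grad(w, alpha):
    # L1 (sub)gradient alpha * sign(w_i): a constant-magnitude push toward
    # zero for every nonzero weight, which is what produces exact zeros
    return alpha * np.sign(w)

w = np.array([2.0, -0.1, 0.0])
print(l2_grad(w, 0.5))  # [ 1.   -0.05  0.  ]
print(l1_grad(w, 0.5))  # [ 0.5  -0.5   0.  ]
```

Note that for the small weight -0.1 the L2 push is tiny (-0.05) while the L1 push is still 0.5 in magnitude, which is why L1 drives such weights all the way to zero.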
Norm Penalties as Constrained
Optimization
• To minimize a function subject to constraints, a generalized Lagrange function,
consisting of the original objective function plus a set of penalties can be
constructed.
• If we wanted to constrain Ω(θ) to be less than some constant k, we could construct a
generalized Lagrange function:

L(θ, α; X, y) = J(θ; X, y) + α(Ω(θ) − k)
Norm Penalties as Constrained
Optimization
• Solving this problem requires modifying both θ and α. Many different
procedures are possible; in all of them, α must increase whenever Ω(θ) > k and
decrease whenever Ω(θ) < k.
• We can fix α at its optimal value α* and view the problem as just a function of θ:

θ* = argmin_θ J(θ; X, y) + α*Ω(θ)

• We can thus think of the parameter norm penalty as imposing a constraint on
the weights.
Norm Penalties as Constrained
Optimization
• How α influences the weights:
• If Ω is the L2 norm, the weights are constrained to lie in an L2 ball.
• If Ω is the L1 norm, the weights are constrained to lie in a region of
limited L1 norm.
• Usually, we do not know the size of the constraint region that we impose
by using weight decay with coefficient α*, because the value of
α* does not directly tell us the value of k.
- A larger α results in a smaller constraint region.
- A smaller α results in a larger constraint region.
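The effect of α on the size of the implicit constraint region can be checked numerically. A sketch using the ridge closed-form solution on synthetic data (function names and data are ours, purely for illustration):

```python
import numpy as np

def ridge_weights(X, y, alpha):
    # Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Larger alpha acts like a tighter constraint region: ||w|| shrinks
norms = [np.linalg.norm(ridge_weights(X, y, a)) for a in (0.01, 1.0, 100.0)]
```

Printing `norms` shows a strictly decreasing sequence, matching the statement that a larger α corresponds to a smaller constraint region on the weights.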
Norm Penalties as Constrained
Optimization
• Reprojection
• Sometimes we may wish to use explicit constraints rather than
penalties.
- We can modify SGD to take a step downhill on J(θ) and then project θ
back to the nearest point that satisfies Ω(θ) < k.
- This is useful when we have an idea of what value of k is appropriate and
we do not want to spend time searching for the value of α that
corresponds to this k.
• Rationale for explicit constraints/reprojection:
1. Dead weights
2. Stability
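The projected-SGD step described above can be sketched as follows, for an L2-ball constraint (function names are ours):

```python
import numpy as np

def project_l2_ball(w, k):
    # Nearest point satisfying ||w||_2 <= k: if w is outside the ball,
    # rescale it back onto the surface; otherwise leave it unchanged
    norm = np.linalg.norm(w)
    return w if norm <= k else w * (k / norm)

def projected_sgd_step(w, grad_J, lr, k):
    # Take a plain downhill step on J(theta), then reproject theta
    # so the constraint Omega(theta) <= k holds after every update
    return project_l2_ball(w - lr * grad_J, k)

w = np.array([3.0, 4.0])            # ||w|| = 5
print(project_l2_ball(w, 1.0))      # [0.6 0.8]
```

Because the projection runs after every step, the weight norm can never exceed k, which is what cuts off the positive feedback loop between large weights and large gradients discussed below.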
Norm Penalties as Constrained
Optimization
• Eliminating dead weights
• One reason to use explicit constraints and reprojection rather than
enforcing constraints with penalties:
- Penalties can cause nonconvex optimization procedures to get stuck in
local minima corresponding to small θ.
- This manifests as training with dead units.
• Explicit constraints implemented by reprojection can work much better
because they do not encourage the weights to approach the origin.
Norm Penalties as Constrained
Optimization
• Stability of Optimization
• Explicit constraints with reprojection can be useful because
these impose some stability on the optimization procedure.
• When using high learning rates, it is possible to encounter a
positive feedback loop in which large weights induce
large gradients, which then induce a large update of the weights;
this can lead to numerical overflow.
• Explicit constraints with reprojection prevent this feedback loop
from continuing to increase the magnitude of the weights without
bound.
Regularization and Underconstrained
Problems
• Underconstrained logistic regression
• Solution for Underconstrained: Iterative
