Regularization

The document discusses regularization techniques in machine learning, focusing on bias versus variance trade-offs, early stopping, and L1/L2 regularization methods to prevent overfitting. It highlights symptoms of high variance and high bias, and outlines strategies to address these issues, including adding training data and adjusting model complexity. Additionally, it provides practical examples of implementing these techniques in PyTorch.

Uploaded by

ayaazouz1997

Bias versus variance

A central problem in machine learning is how to make an algorithm that will perform well
not just on the training data, but also on new inputs. Many strategies used in machine
learning are explicitly designed to reduce the test error, possibly at the expense of increased
training error. These strategies are known collectively as regularization. A great many forms
of regularization are available to the deep learning practitioner. In fact, developing more
effective regularization strategies has been one of the major research efforts in the field.
High Variance - the model fits the training data closely but does not generalize to new data (overfitting).

Symptoms (taking ϵ as the acceptable error level):
• Training error is much lower than test error
• Training error is lower than ϵ
• Test error is above ϵ
Actions:
• Add more training data
• Reduce model complexity - complex models are prone to high variance

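High Bias - the model is not expressive enough to capture the underlying pattern, so its predictions are inaccurate even on the training data (underfitting).
Symptoms:
• Training error is higher than ϵ
Actions:
• Use a more complex model
• Add features
• Use more training data (this usually helps less than the two actions above, because an underfit model cannot exploit the extra data)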
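The symptom lists above can be read as a simple decision rule. The following is a minimal sketch of that rule; the function name `diagnose` and the example error values are hypothetical, and `epsilon` is assumed to be the acceptable error level.

```python
def diagnose(train_error, test_error, epsilon):
    """Classify a model's failure mode from its errors (illustrative helper)."""
    # High bias: the model cannot even fit the training data well enough.
    if train_error > epsilon:
        return "high bias"
    # High variance: training error is acceptable, but test error is not.
    if test_error > epsilon:
        return "high variance"
    # Both errors are within the acceptable level.
    return "ok"


# Made-up error rates, with 5% as the acceptable error level.
print(diagnose(train_error=0.02, test_error=0.15, epsilon=0.05))
print(diagnose(train_error=0.20, test_error=0.22, epsilon=0.05))
```

The high-bias check comes first because a model that fails on the training data needs more capacity before its generalization gap is worth examining.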
Early Stopping
When training large models with sufficient representational capacity to overfit the task, we
often observe that training error decreases steadily over time, but validation set error begins
to rise again. See the figure below for an example of this behavior, which occurs reliably. This
means we can obtain a model with better validation set error (and thus, hopefully, better test
set error) by returning to the parameter setting at the point in time with the lowest validation
set error. Every time the error on the validation set improves, we store a copy of the model
parameters. When the training algorithm terminates, we return these parameters, rather than
the latest parameters. The algorithm terminates when no parameters have improved over the
best recorded validation error for some pre-specified number of iterations. This strategy is
known as early stopping.
It is probably the most commonly used form of regularization in deep learning. Its popularity
is due to both its effectiveness and its simplicity. One way to think of early stopping is as a
very efficient hyperparameter selection algorithm. In this view, the number of training steps
is just another hyperparameter. We can see in the figure that this hyperparameter has a
U-shaped validation set performance curve. Most hyperparameters that control model capacity
have such a U-shaped validation set performance curve.

In the case of early stopping, we are controlling the effective capacity of the model by
determining how many steps it can take to fit the training set. Most hyperparameters must
be chosen using an expensive guess-and-check process, where we set a hyperparameter at
the start of training, then run training for several steps to see its effect. The "training time"
hyperparameter is unique in that, by definition, a single run of training tries out many values
of the hyperparameter. The only significant cost to choosing this hyperparameter
automatically via early stopping is running the validation set evaluation periodically during
training.
Ideally, this is done in parallel to the training process on a separate machine, separate CPU,
or separate GPU from the main training process. If such resources are not available, then
the cost of these periodic evaluations may be reduced by using a validation set that is small
compared to the training set or by evaluating the validation set error less frequently and
obtaining a lower-resolution estimate of the optimal training time.
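The early stopping procedure described in this section can be sketched as a small framework-independent loop. This is an illustrative sketch, not a definitive implementation: the function `train_with_early_stopping` and its callable arguments (`train_step`, `validate`, `get_state`) are hypothetical names, and the validation curve in the demonstration is made up.

```python
import copy


def train_with_early_stopping(train_step, validate, get_state, patience, max_steps):
    """Train until validation error has not improved for `patience` evaluations."""
    best_error = float("inf")
    best_state = None
    best_step = -1
    evals_without_improvement = 0

    for step in range(max_steps):
        train_step()
        error = validate()
        if error < best_error:
            # Validation error improved: store a copy of the parameters.
            best_error = error
            best_state = copy.deepcopy(get_state())
            best_step = step
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                break  # No improvement for `patience` evaluations in a row.

    # Return the best recorded parameters, not the latest ones.
    return best_state, best_error, best_step


# Toy demonstration with a made-up validation curve that falls, then rises.
val_curve = iter([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 2.0])
state = {"steps_taken": 0}


def toy_train_step():
    state["steps_taken"] += 1


best_state, best_error, best_step = train_with_early_stopping(
    train_step=toy_train_step,
    validate=lambda: next(val_curve),
    get_state=lambda: state,
    patience=3,
    max_steps=7,
)
print(best_step, best_error, best_state)
```

With a PyTorch model, `get_state` would typically return `model.state_dict()`, and the stored copy would be restored with `model.load_state_dict(best_state)` once training stops.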
Batch normalization

Training
Batch normalization: Test
Comparison of Normalization Layers

torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)

torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)
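To illustrate the difference between training and test behavior, here is a minimal sketch using `torch.nn.BatchNorm1d` with its default arguments. The input tensor is random placeholder data. In training mode the layer normalizes with the statistics of the current batch and updates its running statistics. In evaluation mode it normalizes with the stored running statistics and leaves them unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(num_features=3)  # defaults: momentum=0.1, affine=True
x = torch.randn(8, 3) * 5 + 2        # placeholder batch: 8 samples, 3 features

# Training mode: normalize with the batch mean/variance, update running stats.
bn.train()
y_train = bn(x)
mean_after_train = bn.running_mean.clone()

# Evaluation mode: normalize with the running stats; they are not updated.
bn.eval()
y_eval = bn(x)

print(y_train.mean(dim=0))  # approximately zero for each feature
print(bn.running_mean)      # 0.1 * batch mean after a single training step
```

After one training step the running mean is still far from the batch mean, which is why `y_eval` differs from `y_train` here. The running statistics only become representative after many batches.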
Decorrelated Batch Normalization
Dropout
torch.nn.Dropout(p=0.5, inplace=False)
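During training, the layer randomly zeroes elements of the input tensor, each with probability
p, using samples from a Bernoulli distribution. Each element is zeroed independently on every
forward call, and the surviving elements are scaled by 1/(1-p) so that the expected activation
is unchanged. During evaluation the layer acts as the identity function.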
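A short sketch of `torch.nn.Dropout` in both modes may make this concrete. The input of all ones is placeholder data, chosen so that the effect of the layer is easy to read off.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)  # placeholder input

# Training mode: roughly half of the elements are zeroed,
# and the survivors are scaled by 1 / (1 - p) = 2.
drop.train()
y = drop(x)
zero_fraction = (y == 0).float().mean().item()

# Evaluation mode: dropout is disabled and the input passes through unchanged.
drop.eval()
z = drop(x)

print(sorted(y.unique().tolist()))
print(zero_fraction)
```

Because the layer behaves differently in the two modes, calling `model.eval()` before validation or inference matters for any model that contains dropout.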

Dataset Augmentation
L2 regularization
L2 regularization, also known as weight decay, is an important technique used in deep
learning to prevent overfitting, which occurs when a model becomes too complex and fits
too closely to the training data, leading to poor generalization performance on new, unseen
data.

In L2 regularization, a penalty term is added to the loss function of the model, which
encourages the model to have smaller weights. This penalty term is proportional to the
square of the magnitude of the weights, which means that larger weights are penalized
more heavily than smaller weights. By penalizing larger weights, the model is encouraged
to choose simpler solutions, which can lead to better generalization performance.

L2 regularization can also be seen as a way of implementing Occam's razor, which states
that among competing hypotheses, the one that makes the fewest assumptions should be
selected. In the context of deep learning, this means that simpler models that generalize
well to new data are preferred over more complex models that fit the training data
perfectly but may not generalize well.

Overall, L2 regularization is an important tool in the deep learning toolbox that can help
improve model generalization performance and prevent overfitting.
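In PyTorch, the weight_decay argument of the optimizer sets the strength of the L2 penalty (assuming import torch.optim as optim and an existing model):
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)
or
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)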
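To connect the weight_decay argument with the penalty term described earlier, the sketch below checks that, for plain SGD, `weight_decay=wd` gives the same update as adding `0.5 * wd * sum(w**2)` to the loss by hand. The model, data, and variable names are placeholders chosen for illustration. The equivalence shown is for SGD only: with adaptive optimizers such as Adam the two formulations can behave differently.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

wd = 0.01
x, y = torch.randn(16, 10), torch.randn(16, 1)  # placeholder data
criterion = nn.MSELoss()

model_a = nn.Linear(10, 1)
model_b = copy.deepcopy(model_a)  # identical starting weights

# (a) L2 regularization through the optimizer's weight_decay argument.
opt_a = optim.SGD(model_a.parameters(), lr=0.01, weight_decay=wd)
opt_a.zero_grad()
criterion(model_a(x), y).backward()
opt_a.step()

# (b) The same penalty written explicitly into the loss.
opt_b = optim.SGD(model_b.parameters(), lr=0.01)
opt_b.zero_grad()
l2_penalty = sum(p.pow(2).sum() for p in model_b.parameters())
loss_b = criterion(model_b(x), y) + 0.5 * wd * l2_penalty
loss_b.backward()
opt_b.step()

# Both models should hold the same weights after one step.
print(torch.allclose(model_a.weight, model_b.weight, atol=1e-6))
```

Note that weight_decay applies to every parameter passed to the optimizer, including biases, which is why the explicit penalty here also sums over all parameters.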
L1 regularization
L1 and L2 regularization are two common techniques used in machine learning to prevent
overfitting by adding a penalty term to the loss function of a model. While both techniques
aim to reduce the complexity of the model and improve generalization performance, they
differ in how they achieve this goal.

The main difference between L1 and L2 regularization is in the penalty term that is added
to the loss function. In L1 regularization, the penalty term is proportional to the absolute
value of the weights, while in L2 regularization, it is proportional to the square of the
magnitude of the weights.

This difference in penalty term leads to some key differences in the effect of L1 and L2
regularization on the model. One of the main differences is that L1 regularization tends to
result in sparse weight vectors, where many of the weights are exactly zero. This is
because the penalty term in L1 regularization encourages some of the weights to be set to
zero, effectively removing some of the features from the model.

On the other hand, L2 regularization tends to spread the weight across all features:
weights are shrunk toward zero but are rarely exactly zero. This makes L2
regularization better suited to models where all features are potentially important, as
every feature still contributes.

In summary, L1 regularization tends to produce sparse models, while L2 regularization
produces models with small, non-zero weights. Choosing between the two regularization
techniques depends on the specific problem and the characteristics of the data and the
model.
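The sparsity difference can be seen in a one-dimensional worked example. The sketch below is an illustration under stated assumptions rather than the training procedure used in these notes: it solves the penalized problem for a single weight in closed form, where the L1 penalty leads to soft-thresholding and the L2 penalty leads to proportional shrinkage. The function names and the example weights are hypothetical.

```python
def l1_shrink(w, lam):
    """Minimizer of 0.5 * (v - w)**2 + lam * abs(v): soft-thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # weights with magnitude at most lam become exactly zero


def l2_shrink(w, lam):
    """Minimizer of 0.5 * (v - w)**2 + 0.5 * lam * v**2: proportional shrinkage."""
    return w / (1.0 + lam)


weights = [3.0, -0.5, 0.2, -2.0]  # made-up unregularized weights
lam = 1.0

l1_result = [l1_shrink(w, lam) for w in weights]
l2_result = [l2_shrink(w, lam) for w in weights]

print(l1_result)  # small weights are set exactly to zero
print(l2_result)  # every weight is scaled down, none reaches zero
```

The L1 penalty subtracts a fixed amount from each weight, so small weights reach zero, while the L2 penalty scales each weight by a constant factor, so a non-zero weight stays non-zero.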

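import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 1)  # Example linear model

# No weight_decay here: the L1 penalty is added to the loss by hand
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()  # Example loss function

l1_lambda = 0.01  # Strength of L1 regularization

# Placeholder random data so the example runs; substitute your own DataLoader
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
dataloader = DataLoader(dataset, batch_size=16)

# Training loop
for inputs, labels in dataloader:
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    # L1 penalty: sum of the absolute values of all parameters.
    # (nn.L1Loss compares predictions with targets, so it is not used here.)
    l1_penalty = sum(param.abs().sum() for param in model.parameters())
    loss = loss + l1_lambda * l1_penalty
    loss.backward()
    optimizer.step()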
