
CM20315 - Machine Learning

Prof. Simon Prince


9. Regularization
Regularization
• Why is there a generalization gap between training and test data?
• Overfitting (model describes statistical peculiarities)
• Model unconstrained in areas where there are no training examples
• Regularization = methods to reduce the generalization gap
• Technically means adding terms to loss function
• But colloquially means any method (hack) to reduce gap
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Explicit regularization
• Standard loss function:

• Regularization adds an extra term

• Favors some parameters, disfavors others.

• λ > 0 controls the strength of the regularization
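The equations on this slide did not survive the export. A sketch in the style of the course text (assuming a per-example loss ℓ_i over I training pairs {x_i, y_i}, parameters φ, a regularizer g, and weight λ; the exact symbols are an assumption):

  \hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmin}} \Big[ \sum_{i=1}^{I} \ell_i[\mathbf{x}_i, y_i] \Big]

  \hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmin}} \Big[ \sum_{i=1}^{I} \ell_i[\mathbf{x}_i, y_i] + \lambda \, g[\boldsymbol{\phi}] \Big]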
Probabilistic interpretation
• Maximum likelihood:

• Regularization is equivalent to adding a prior over parameters

… what you know about the parameters before seeing the data


Equivalence
• Explicit regularization:

• Probabilistic interpretation:

• Mapping:
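The missing equations can be sketched as follows (assuming an independent likelihood Pr(y_i | x_i, φ) and a prior Pr(φ)):

  Maximum likelihood:    \hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmax}} \Big[ \prod_{i=1}^{I} \Pr(y_i \mid \mathbf{x}_i, \boldsymbol{\phi}) \Big]

  Maximum a posteriori:  \hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\operatorname{argmax}} \Big[ \prod_{i=1}^{I} \Pr(y_i \mid \mathbf{x}_i, \boldsymbol{\phi}) \cdot \Pr(\boldsymbol{\phi}) \Big]

Taking the negative logarithm turns the second criterion into the regularized loss, with the regularization term playing the role of the negative log prior: λ · g[φ] ↔ −log Pr(φ) (up to scale and additive constants).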
L2 Regularization
• We rarely have prior knowledge about individual parameters, so only very general regularization terms can be used
• Most common is L2 regularization
• Favors smaller parameters

• Also called Tikhonov regularization or ridge regression


• In neural networks, it is usually applied just to the weights and is then called weight decay (sketched below)
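A minimal sketch of how the penalty enters a single gradient-descent step (Python/NumPy; loss_grad, the learning rate lr, and the regularization weight lam are assumed names, not part of the slides):

import numpy as np

def l2_regularized_step(phi, loss_grad, lr=0.01, lam=1e-3):
    """One gradient step on  loss + (lam/2) * ||phi||^2.

    phi       : parameter vector (np.ndarray)
    loss_grad : function returning dL/dphi at phi (assumed to be supplied by the user)
    """
    grad = loss_grad(phi) + lam * phi   # gradient of the L2 term is lam * phi
    return phi - lr * grad              # parameters shrink toward zero each step

The same update can be rewritten as phi ← (1 − lr·lam)·phi − lr·loss_grad(phi): every step first decays the weights toward zero, hence the name weight decay.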
Why does L2 regularization help?
• Discourages slavish adherence to the data (overfitting)
• Encourages smoothness between datapoints
L2 regularization
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Implicit regularization

(Figure: gradient descent approximates a differential equation in the limit of infinitesimal step size; a finite step size is equivalent to adding a regularization term to that differential equation; adding that regularization explicitly, the continuous equation converges to the same place as the discrete updates.)
Implicit regularization
• Gradient descent disfavors areas where gradients are steep

• SGD likes all batches to have similar gradients

• Depends on learning rate – perhaps why larger learning rates generalize better.
Generally, performance is
• best for larger learning rates
• best with smaller learning rates
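One way to make this concrete (a sketch following the implicit-gradient-regularization analysis; the formula is not on the slide): gradient descent with step size α approximately follows a continuous-time flow on a modified loss

  \tilde{L}[\boldsymbol{\phi}] = L[\boldsymbol{\phi}] + \frac{\alpha}{4} \Big\| \frac{\partial L}{\partial \boldsymbol{\phi}} \Big\|^2 ,

so larger step sizes penalize steep-gradient regions more heavily; for SGD an additional term penalizes the variance of the per-batch gradients around the full-batch gradient.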
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Early stopping
• If we stop training early, weights don’t have time to overfit to noise
• Weights start small, don’t have time to get large
• Reduces effective model complexity
• Known as early stopping
• Don’t have to re-train
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Ensembling
• Average together several models – an ensemble
• Can take mean or median
• Different initializations / different models
• Different subsets of the data, resampled with replacement (bagging), as sketched below
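A minimal sketch of bagging by averaging (Python/NumPy; make_model and the fit/predict interface are assumptions, not from the slides):

import numpy as np

def bagged_predictions(make_model, X_train, y_train, X_test, n_models=10, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    n = len(X_train)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)     # resample the training data with replacement
        model = make_model()                 # fresh model (could also vary init or architecture)
        model.fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    preds = np.stack(preds)
    return preds.mean(axis=0)                # or np.median(preds, axis=0)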
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Dropout

• Can eliminate kinks in the function that are far from the data and don't contribute to the training loss
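Dropout randomly clamps a subset of hidden units to zero at each training iteration. A minimal sketch of inverted dropout applied to a hidden-layer activation (Python/NumPy; p is the probability of dropping a unit):

import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if rng is None:
        rng = np.random.default_rng()
    if not training or p == 0.0:
        return h                              # at test time all units are kept
    mask = rng.random(h.shape) >= p           # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)               # rescale so the expected activation is unchanged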
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Adding noise

• to inputs
• to weights
• to outputs (labels)
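A minimal sketch of the three options (Python/NumPy; the noise scale sigma and the smoothing amount eps are assumed hyperparameters, and label smoothing is used here as one common way of adding noise to the targets):

import numpy as np

rng = np.random.default_rng(0)

def noisy_inputs(x, sigma=0.1):
    return x + sigma * rng.standard_normal(x.shape)       # perturb inputs each iteration

def noisy_weights(phi, sigma=0.01):
    return phi + sigma * rng.standard_normal(phi.shape)   # perturb weights before the forward pass

def smoothed_labels(y_onehot, eps=0.1):
    k = y_onehot.shape[-1]
    return (1 - eps) * y_onehot + eps / k                  # soften one-hot targets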
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Bayesian approaches
• There are many parameter values compatible with the data
• Can find a probability distribution over them (combining the likelihood with prior info about the parameters)

• Take all possible parameters into account when making a prediction
Bayesian approaches
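The missing equations can be sketched as follows (assuming a prior Pr(φ) and likelihood Pr(y | x, φ)):

  Posterior over parameters:  \Pr(\boldsymbol{\phi} \mid \{\mathbf{x}_i, y_i\}) \;\propto\; \Big[ \prod_{i=1}^{I} \Pr(y_i \mid \mathbf{x}_i, \boldsymbol{\phi}) \Big] \Pr(\boldsymbol{\phi})

  Prediction (marginalizing over parameters):  \Pr(y \mid \mathbf{x}, \{\mathbf{x}_i, y_i\}) = \int \Pr(y \mid \mathbf{x}, \boldsymbol{\phi}) \, \Pr(\boldsymbol{\phi} \mid \{\mathbf{x}_i, y_i\}) \, d\boldsymbol{\phi}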
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
• Transfer learning

• Multi-task learning

• Self-supervised learning
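Of these, transfer learning is the simplest to illustrate. A minimal sketch (PyTorch; the pretrained backbone, its feature dimension, and the new task's class count are assumptions, not from the slides): reuse a network pretrained on a large secondary task, freeze its layers, and train a fresh output head on the smaller target dataset.

import torch.nn as nn
import torch.optim as optim

def build_finetune_model(pretrained_backbone, feature_dim, n_new_classes):
    """Hypothetical transfer-learning setup: frozen backbone + new trainable head."""
    for p in pretrained_backbone.parameters():
        p.requires_grad = False                    # keep the pretrained weights fixed initially
    head = nn.Linear(feature_dim, n_new_classes)   # new output layer for the target task
    model = nn.Sequential(pretrained_backbone, head)
    # Optimize only the head; later one can unfreeze and fine-tune everything with a small learning rate
    optimizer = optim.Adam(head.parameters(), lr=1e-3)
    return model, optimizer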
Regularization
• Explicit regularization
• Implicit regularization
• Early stopping
• Ensembling
• Dropout
• Adding noise
• Bayesian approaches
• Transfer learning, multi-task learning, self-supervised learning
• Data augmentation
Data augmentation
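A minimal sketch of on-the-fly augmentation for image data (Python/NumPy; the specific transformations, a horizontal flip and a small random crop, are assumed examples of label-preserving changes):

import numpy as np

rng = np.random.default_rng(0)

def augment(image, pad=4):
    """Random horizontal flip and random crop of an H x W x C image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                            # horizontal flip
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]             # crop back to the original size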
Regularization overview
