07 Regularization
To recap: if we have too many features, the learned hypothesis may drive the cost function on the training set to (almost) exactly zero
But this tries too hard to fit the training set
Fails to provide a general solution - unable to generalize (apply to new examples)
Addressing overfitting
Regularization
Small values for the parameters correspond to a simpler hypothesis (you effectively get rid of some of the terms)
A simpler hypothesis is less prone to overfitting
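As an illustration, in the lecture's polynomial example (a fourth-order hypothesis) we could add heavy penalties on the high-order parameters to the minimization, e.g.

\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2

The only way to keep this cost small is to drive θ3 and θ4 towards zero, which effectively reduces the hypothesis to a quadratic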
Another example
Have 100 features x1, x2, ..., x100
Unlike the polynomial example, we don't know which of these are the high-order terms
How do we pick which parameters to shrink?
With regularization, we take the cost function and modify it to shrink all the parameters
Add a term at the end
This regularization term shrinks every parameter
By convention you don't penalize θ0 - minimization is from θ1 onwards
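Written out for linear regression (m training examples, n features), the modified cost function from the lecture is:

J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

λ is the regularization parameter; it controls the trade-off between fitting the training set well and keeping the parameters small. The second sum starts at j = 1, which is why θ0 is left unpenalized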
Previously, gradient descent would repeatedly update the parameters θj, where j = 0, 1, 2, ..., n, simultaneously
With regularization the update for θ0 is unchanged, but the updates for θ1 onwards gain an extra (λ/m)θj term, shown below
The term (1 − αλ/m) that appears when that update is regrouped is a number slightly less than 1, so each iteration multiplies θj by a value just under 1 before applying the usual gradient step
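Written out (repeat until convergence), the regularized updates from the lecture are:

\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_0^{(i)}

\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \qquad (j = 1, 2, \dots, n)

Regrouping the θj update gives \theta_j := \theta_j \big(1 - \alpha \tfrac{\lambda}{m}\big) - \alpha \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big) x_j^{(i)}, which makes the shrinking factor explicit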
We saw earlier that logistic regression can be prone to overfitting with lots of features
The logistic regression cost function is as follows:
To modify it we have to add an extra term
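With that extra term added, the regularized logistic regression cost function is:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

As before, the regularization sum runs from j = 1 to n, so θ0 is not penalized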
Again, to modify the algorithm we simply need to modify the update rule for θ1 onwards
Looks cosmetically the same as linear regression, except obviously the hypothesis is very
different
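For θ1 onwards the update takes exactly the same form as the regularized linear regression update above; the difference is hidden in the hypothesis, which here is the sigmoid h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} rather than h_\theta(x) = \theta^T x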
Use fminunc
Pass it a @costFunction argument (a handle to the cost function you write)
Minimizes in an optimized manner using the cost function
The costFunction you write must return two values:
jVal - code to compute J(θ), now including the regularization term
gradient - code to compute the partial derivative of J(θ) with respect to each θj, with the appropriate extra regularization term added for θ1 onwards
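A minimal Octave sketch of the kind of costFunction you would hand to fminunc for regularized logistic regression; the argument names X, y and lambda and the surrounding setup are illustrative assumptions, not code taken from the lecture:

% costFunction.m (the function would normally live in its own file)
function [jVal, gradient] = costFunction(theta, X, y, lambda)
  % X: m x (n+1) design matrix (first column all ones), y: m x 1 labels, lambda: regularization parameter
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                 % sigmoid hypothesis
  % jVal: regularized J(theta); theta(1) (i.e. theta_0) is not penalized
  jVal = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) ...
         + (lambda / (2*m)) * sum(theta(2:end) .^ 2);
  % gradient: partial derivatives of J(theta) with respect to each theta_j,
  % with the extra (lambda/m)*theta_j term added for j >= 1
  gradient = (1/m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end

% Usage sketch: wrap costFunction so fminunc sees a function of theta only
options = optimset('GradObj', 'on', 'MaxIter', 400);
initialTheta = zeros(size(X, 2), 1);
[optTheta, functionVal] = fminunc(@(t) costFunction(t, X, y, lambda), initialTheta, options);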