
Linear Regression Regularization

Linear Regression Model

Lab 1: Estimating mileage based on the features of a second-hand car

Description – Sample data is available at


https://archive.ics.uci.edu/ml/datasets/Auto+MPG

The dataset has the 9 attributes listed below:
1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

Solution: Ridge_Lasso_Regression.ipynb
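The lab itself is worked out in Ridge_Lasso_Regression.ipynb. As a rough, hedged sketch of the setup (not the notebook's exact steps), the data can be loaded with pandas and a baseline linear model fit with scikit-learn. The file name auto_mpg.csv, the column names, and the simple handling of missing values below are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumed pre-prepared CSV export of the UCI Auto MPG data with the 9 columns listed above
df = pd.read_csv("auto_mpg.csv")

# Drop the free-text car name and any rows with missing values for this sketch
df = df.drop(columns="car name").dropna()

X = df.drop(columns="mpg")
y = df["mpg"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Baseline, unregularized linear regression on the raw attributes
lr = LinearRegression().fit(X_train, y_train)
print("Train R^2:", lr.score(X_train, y_train))
print("Test  R^2:", lr.score(X_test, y_test))
```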
Regularising Linear Models (Shrinkage methods)

When we have too many parameters and are exposed to the curse of dimensionality, we resort to dimensionality reduction techniques such as transforming to principal components (PCA) and eliminating the components with the smallest eigenvalues. This can be a laborious process before we find the right number of principal components. Instead, we can employ shrinkage methods.

Shrinkage methods attempt to shrink the coefficients of the attributes and lead us towards simpler yet effective models. The two shrinkage methods are:

1. Ridge regression is similar to linear regression, where the objective is to find the best-fit surface. The difference is in the way the best coefficients are found. Unlike linear regression, where the function being optimized is the SSE, here it is slightly different:

Linear Regression cost function: SSE = Σ (yᵢ − ŷᵢ)²

Ridge Regression cost function, with the additional term: SSE + λ Σ βⱼ²

2. The λ term is like a penalty term used to penalize large-magnitude coefficients. When it is set to a high number, the coefficients are suppressed significantly. When it is set to 0, the cost function becomes the same as the linear regression cost function.
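As a minimal numerical sketch of the cost function above (not code from the notebook), the ridge cost is the SSE plus λ times the sum of squared coefficients. The small arrays and the λ values used below are made-up illustrative data.

```python
import numpy as np

def ridge_cost(X, y, beta, lam):
    """SSE plus the ridge penalty: sum((y - X.beta)^2) + lam * sum(beta^2)."""
    residuals = y - X @ beta
    sse = np.sum(residuals ** 2)
    penalty = lam * np.sum(beta ** 2)
    return sse + penalty

# Tiny made-up example: with lam = 0 the cost is just the SSE (plain linear regression);
# a large lam makes big coefficients expensive and pushes the optimum towards smaller ones.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0]])
y = np.array([3.0, 2.5, 4.0])
beta = np.array([1.2, 0.4])

print(ridge_cost(X, y, beta, lam=0.0))   # plain SSE
print(ridge_cost(X, y, beta, lam=10.0))  # SSE plus a heavy penalty on coefficient magnitude
```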

Regularising Linear Models (Shrinkage methods)

Why should we be interested in shrinking the coefficients? How does it help?


When we have a large number of dimensions and few data points, the models are likely to become complex, overfit, and be prone to variance errors. When you print out the coefficients of the attributes of such a complex model, you will notice that the magnitudes of the different coefficients become large.

Large coefficients indicate a case where, for a unit change in the input variable, the magnitude of change in the target column is very large.

Coefficients for the simple linear regression model of 10 dimensions:

1. The coefficient for cyl is 2.5059518049385052
2. The coefficient for disp is 2.5357082860560483
3. The coefficient for hp is -1.7889335736325294
4. The coefficient for wt is -5.551819873098725
5. The coefficient for acc is 0.11485734803440854
6. The coefficient for yr is 2.931846548211609
7. The coefficient for car_type is 2.977869737601944
8. The coefficient for origin_america is -0.5832955290166003
9. The coefficient for origin_asia is 0.3474931380432235
10. The coefficient for origin_europe is 0.3774164680868855

Coefficients with polynomial features, shooting up to 57 from 10:

-9.67853872e-13 -1.06672046e+12 -4.45865268e+00 -2.24519565e+00 -2.96922206e+00
-1.56882955e+00 3.00019063e+00 -1.42031640e+12 -5.46189566e+11 3.62350196e+12
-2.88818173e+12 -1.16772461e+00 -1.43814087e+00 -7.49492645e-03 2.59439087e+00
-1.92409515e+00 -3.41759793e+12 -6.27534905e+12 -2.44065576e+12 -2.32961194e+12
3.97766113e-01 1.94046021e-01 -4.26086426e-01 3.58203125e+00 -2.05296326e+00
-7.51019934e+11 -6.18967069e+11 -5.90805593e+11 2.47863770e-01 -6.68518066e-01
-1.92150879e+00 -7.37030029e-01 -1.01183732e+11 -8.33924574e+10 -7.95983063e+10
-1.70394897e-01 5.25512695e-01 -3.33097839e+00 1.56301740e+12 1.28818991e+12
1.22958044e+12 5.80200195e-01 1.55352783e+00 3.64527008e+11 3.00431724e+11
2.86762821e+11 3.97644043e-01 8.58604718e+10 7.07635073e+10 6.75439422e+10
-7.25449332e+11 1.00689540e+12 9.61084146e+11 2.18532428e+11 -4.81675252e+12
2.63818648e+12

Very large coefficients!

Ref: Ridge_Lasso_Regression.ipynb
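The blow-up above comes from fitting an unregularized linear model on polynomial features. A rough sketch of how such a fit might be produced is below; the variables X_train and y_train from the earlier loading sketch and the choice of degree=2 are assumptions, not the notebook's exact settings.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# X_train, y_train are assumed to come from the earlier data-loading sketch.
# Expanding the original attributes with degree-2 polynomial terms multiplies
# the feature count several-fold.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)

lr_poly = LinearRegression().fit(X_train_poly, y_train)

# Unregularized fit on many correlated features: some coefficients become huge
print("Number of features:", X_train_poly.shape[1])
print("Largest |coefficient|:", np.max(np.abs(lr_poly.coef_)))
```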

Regularising Linear Models (Shrinkage methods)

(Figure: a complex, undulating fitted surface Z = f(x, y).)

1. The curse of dimensionality results in large-magnitude coefficients, which result in a complex, undulated surface / model.

2. This complex surface has the data points occupying its peaks and valleys.

3. The model gives near 100% accuracy in training but poor results in testing, and the testing scores also vary a lot from one sample to another.

4. The model has, in effect, absorbed the noise in the data distribution!

5. Large magnitudes of the coefficients give the least SSE, at times even SSE = 0: a model that fits the training set 100%!

6. Such models do not generalize.

Regularising Linear Models (Shrinkage methods)

(Figure: a smoother, constrained fitted surface Z = f(x, y).)

1. In Ridge Regression, the algorithm, while trying to find the combination of coefficients that minimizes the SSE on the training data, is constrained by the penalty term.

2. The penalty term is akin to a cost on the magnitude of the coefficients: the higher the magnitude, the higher the cost. Thus, to minimize the cost, the coefficients are suppressed.

3. The resulting surface therefore tends to be much smoother than the unconstrained surface. This means we have settled for a model which will make some errors on the training data.

4. This is fine as long as those errors can be attributed to random fluctuations, i.e. they arise because the model does not absorb the random fluctuations in the data.

5. Such a model will perform about equally well on unseen (test) data; it will generalize better than the complex model (see the sketch below).
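A hedged sketch of this trade-off, reusing the assumed polynomial features from the earlier sketches: the unregularized model scores near-perfectly on training data but poorly on test data, while a Ridge model gives up a little training accuracy for better test performance. The value alpha=0.3 is an arbitrary illustrative choice, not the notebook's setting.

```python
from sklearn.linear_model import LinearRegression, Ridge

# poly, X_train_poly, X_test, y_train, y_test are assumed from the earlier sketches
X_test_poly = poly.transform(X_test)

lr_poly = LinearRegression().fit(X_train_poly, y_train)
ridge = Ridge(alpha=0.3).fit(X_train_poly, y_train)

# The complex, unconstrained model typically shows a large train/test gap (variance error);
# the ridge-constrained model makes slightly more training error but generalizes better.
for name, model in [("Unregularized", lr_poly), ("Ridge", ridge)]:
    print(name,
          "train R^2 =", round(model.score(X_train_poly, y_train), 3),
          "test R^2 =", round(model.score(X_test_poly, y_test), 3))
```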

Regularising Linear Models (Shrinkage methods)

Impact of Ridge Regression on the coefficients of the 56 attributes

Ridge model: [[ 0. 3.73512981 -2.93500874 -2.13974194 -3.56547812 -1.28898893 3.01290805
2.04739082 0.0786974 0.21972225 -0.3302341 -1.46231096 -1.17221896 0.00856067 2.48054694
-1.67596093 0.99537516 -2.29024279 4.7699338 -2.08598898 0.34009408 0.35024058 -0.41761834
3.06970569 -2.21649433 1.86339518 -2.62934278 0.38596397 0.12088534 -0.53440382 -1.88265835
-0.7675926 -0.90146842 0.52416091 0.59678246 -0.26349448 0.5827378 -3.02842915 -0.36548074
0.5956112 -0.15941014 0.49168856 1.45652375 -0.43819158 -0.20964198 0.77665496 0.36489921
-0.4750838 0.3551047 0.23188557 -1.42941282 2.06831543 -0.34986402 -0.32320394 0.39054656 0.06283411]]

Large coefficients have been suppressed, almost close to 0 in many cases.

Ref: Ridge_Lasso_Regression.ipynb
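A brief sketch of how a coefficient vector like the one above might be produced and inspected; alpha=0.3 and the near-zero threshold of 0.1 are arbitrary illustrative choices, and X_train_poly / y_train are assumed from the earlier sketches.

```python
import numpy as np
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=0.3).fit(X_train_poly, y_train)

coef = ridge.coef_.ravel()
print("Ridge coefficients:", coef)

# Ridge shrinks coefficients towards zero but (unlike Lasso) rarely makes them exactly zero
print("Coefficients with |value| < 0.1:", np.sum(np.abs(coef) < 0.1))
print("Coefficients exactly zero      :", np.sum(coef == 0))
```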

Regularising Linear Models (Shrinkage methods)

1. Lasso Regression is similar to Ridge regression, with a difference in the penalty term. Unlike Ridge, the penalty term here uses the absolute values of the coefficients (i.e. they are raised to the power 1), also known as the L1 norm.

2. The λ term continues to be the input parameter which decides how heavy the penalties on the coefficients will be. The larger the value, the more diminished the coefficients will be.

3. Unlike Ridge regression, where the coefficients are driven towards zero but may not become exactly zero, the Lasso penalty will make many of the coefficients 0, in other words literally dropping those dimensions (see the sketch below).
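A minimal sketch of this sparsity effect, again assuming the polynomial features from the earlier sketches. The value alpha=0.1 is an arbitrary illustrative choice, and scikit-learn's Lasso (coordinate descent) may warn about convergence on unscaled data.

```python
import numpy as np
from sklearn.linear_model import Lasso

# X_train_poly, y_train are assumed from the earlier sketches
lasso = Lasso(alpha=0.1, max_iter=10000).fit(X_train_poly, y_train)

coef = lasso.coef_
# Unlike Ridge, the L1 penalty drives many coefficients to exactly zero,
# effectively dropping those dimensions from the model
print("Non-zero coefficients:", np.sum(coef != 0), "out of", coef.size)
print("Dropped dimensions   :", np.sum(coef == 0))
```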

Regularising Linear Models (Shrinkage methods)

Impact of Lasso Regression on the coefficients of the 56 attributes

Lasso model: [ 0. 0.52263805 -0.5402102 -1.99423315 -4.55360385 -0.85285179 2.99044036 0.00711821 -0. 0.76073274 -0. -0. -0.19736449
0. 2.04221833 -1.00014513 0. -0. 4.28412669 -0. 0. 0.31442062 -0. 2.13894094 -1.06760107 0. -0. 0. 0. -0.44991392 -1.55885506 -0. -0.68837902 0.
0.17455864 -0.34653644 0.3313704 -2.84931966 0. -0.34340563 0.00815105 0.47019445 1.25759712 -0.69634581 0. 0.55528147 0.2948979 -0.67289549
0.06490671 0. -1.19639935 1.06711702 0. -0.88034391 0. -0. ]

Large coefficients have been suppressed, to exactly 0 in many cases, making those dimensions useless, i.e. dropped from the model.

Ref: Ridge_Lasso_Regression.ipynb

Regularising Linear Models (Comparing The Methods)

To compare Ridge and Lasso, let us first transform our error function (which is a quadratic / convex function) into a contour graph.

1. Every ring on the error function represents a combination of coefficients (m1 and m2 in the image) which results in the same quantum of error, i.e. SSE.

2. Let us convert that to a 2D contour plot. In the contour plot, every ring represents one quantum of error.

3. The innermost ring / bull's eye is the combination of the coefficients that gives the least SSE.

Regularising Linear Models (Ridge Constraint)
(Figure: SSE contour rings (red) and the Ridge constraint region (yellow circle) in the m1–m2 plane. Callouts: the lowest-SSE ring violates the constraint; the yellow region shows the combinations of m1, m2 allowed by the Ridge constraint; a sub-optimal combination meets the constraint but is not the minimal SSE possible within the constraint; the most optimal combination of m1, m2 given the constraint is where the yellow circle touches the smallest reachable red ring.)

1. The yellow circle is the Ridge constraint region, representing the ridge penalty (sum of squared coefficients).

2. Any combination of m1 and m2 that falls within the yellow region is a possible solution.

3. The most optimal of all solutions is the one which satisfies the constraint and also minimizes the SSE (the smallest possible red circle).

4. Thus the optimal solution for m1 and m2 is the one where the yellow circle touches a red circle.

The point to note is that the red rings and the yellow circle will never be tangential (touch) on the axes representing the coefficients. Hence Ridge can make coefficients close to zero but never exactly zero. You may notice some coefficients becoming zero, but that will be due to round-off…
Regularising Linear Models (Ridge Constraint)

1. As the lambda value (shown here as alpha) increases, the coefficients have to become smaller and smaller to minimize the penalty term in the cost function.

2. The larger the lambda, the smaller the sum of squared coefficients should be, and as a result the tighter the constraint region.

3. The tighter the constraint region, the larger the red circle in the contour diagram that will be tangent to the boundary of the yellow region.

4. Thus, the higher the lambda, the stronger the shrinkage: the coefficients shrink significantly, and hence the smoother the surface / model.

5. The smoother the surface, the more likely the model is to perform equally well in production.

6. When we move away from a model with sharp peaks and valleys (a complex model) to a smoother surface (a simpler model), we reduce the variance errors but the bias errors go up.

7. Using grid search, we have to find the value of lambda which results in the right fit, neither too complex nor too simple a model (see the sketch below).
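A hedged sketch of such a grid search with scikit-learn, assuming the polynomial features from the earlier sketches; the candidate alpha values below are arbitrary illustrative choices, not the notebook's grid.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Search over a range of penalty strengths (lambda, called alpha in scikit-learn)
param_grid = {"alpha": [0.01, 0.1, 0.3, 1, 3, 10, 30, 100]}
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X_train_poly, y_train)

# Too small an alpha -> complex, high-variance model; too large -> over-smoothed, high-bias model
print("Best alpha:", search.best_params_["alpha"])
print("Best cross-validated R^2:", round(search.best_score_, 3))
```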

Regularising Linear Models (Lasso Constraint)
(Figure: SSE contour rings (red) and the Lasso constraint region (yellow rectangle) in the m1–m2 plane. Callouts: the lowest-SSE ring violates the constraint; the yellow region shows the combinations of m1, m2 allowed by the Lasso constraint; a sub-optimal combination meets the constraint but is not the minimal SSE possible within the constraint; the most optimal combination of m1, m2 given the constraint is where the yellow rectangle touches the smallest reachable red ring.)

1. The yellow rectangle is the Lasso constraint region, representing the Lasso penalty (sum of the absolute values of the coefficients).

2. Any combination of m1 and m2 that falls within the yellow region is a possible solution.

3. The most optimal of all solutions is the one which satisfies the constraint and also minimizes the SSE (the smallest possible red circle).

4. Thus the optimal solution for m1 and m2 is the one where the yellow rectangle touches a red circle.

The beauty of Lasso is that the red circle may touch the constraint region on an attribute axis! In the picture above, the circle touches the yellow rectangle on the m1 axis, and at that point the m2 coefficient is 0! This means that dimension has been dropped from the analysis. Thus Lasso performs dimensionality reduction, which Ridge does not (a minimal feature-selection sketch follows below).
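Because Lasso zeroes out coefficients, it can be used directly as a feature selector. A minimal sketch with scikit-learn's SelectFromModel is below; alpha=0.1 is an arbitrary illustrative value, and X_train_poly / y_train are assumed from the earlier sketches.

```python
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Keep only the dimensions whose Lasso coefficients are (essentially) non-zero
selector = SelectFromModel(Lasso(alpha=0.1, max_iter=10000))
selector.fit(X_train_poly, y_train)

X_train_reduced = selector.transform(X_train_poly)
print("Features before:", X_train_poly.shape[1])
print("Features after Lasso selection:", X_train_reduced.shape[1])
```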
