Linear Regression Regularization
Linear Regression Model
The dataset has 9 attributes, listed below, that describe each car instance:
1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)
Sol : Ridge_Lasso_Regression.ipynb
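Before any model is fit, the raw data needs basic cleanup. Below is a minimal preparation sketch; the file name auto-mpg.csv, the '?' missing-value marker in horsepower, and the origin code mapping are assumptions for illustration, not details taken from the notebook (whose column names may differ).

```python
# A minimal data-preparation sketch (file name, '?' marker, and origin code
# mapping are assumptions, not taken from the notebook).
import pandas as pd

df = pd.read_csv("auto-mpg.csv")

# 'horsepower' often loads as a string because of '?' markers; coerce and drop
df["horsepower"] = pd.to_numeric(df["horsepower"], errors="coerce")
df = df.dropna()

# 'car name' is unique for each instance, so it carries no predictive signal
df = df.drop(columns=["car name"])

# One-hot encode 'origin'; the 1/2/3 -> america/asia/europe mapping is assumed,
# chosen to match the origin_america / origin_asia / origin_europe coefficient
# names shown later in this deck
df["origin"] = df["origin"].map({1: "america", 2: "asia", 3: "europe"})
df = pd.get_dummies(df, columns=["origin"])

X = df.drop(columns=["mpg"])
y = df["mpg"]
```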
Regularising Linear Models (Shrinkage methods)
When we have too many parameters and are exposed to the curse of dimensionality, we resort to dimensionality reduction techniques such as transforming the data to its principal components and eliminating the components with the smallest eigenvalues. This can be a laborious process before we find the right number of principal components. Instead, we can employ shrinkage methods.
Shrinkage methods attempt to shrink the coefficients of the attributes, leading us towards simpler yet effective models. The two shrinkage methods are:
1. Ridge regression is similar to linear regression, where the objective is to find the best-fit surface. The difference is in the way the best coefficients are found. Unlike linear regression, where the optimization function is the SSE, here it is slightly different.

The linear regression cost function is:

$$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Ridge regression adds a penalty term to the cost function:

$$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

The $\lambda$ term acts as a penalty, used to penalize large-magnitude coefficients: when it is set to a high number, coefficients are suppressed significantly; when it is set to 0, the cost function becomes the same as the linear regression cost function. (A short fitting sketch follows below.)
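A minimal fitting sketch with scikit-learn, where Ridge's alpha parameter plays the role of $\lambda$ above; the split proportions and the alpha value are illustrative assumptions, not values taken from the notebook.

```python
# Minimal Ridge sketch: alpha corresponds to the lambda penalty above.
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)   # split proportions are an assumption

ridge = Ridge(alpha=0.3)   # alpha -> 0 recovers plain linear regression
ridge.fit(X_train, y_train)

print(ridge.coef_)                   # coefficients are shrunk, but generally nonzero
print(ridge.score(X_test, y_test))   # R^2 on held-out data
```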
Regularising Linear Models (Shrinkage methods)
Large coefficients indicate a case where, for a unit change in the input variable, the magnitude of change in the target column is very large.
Coefficients for the simple linear regression model of 10 dimensions:
1. The coefficient for cyl is 2.5059518049385052
2. The coefficient for disp is 2.5357082860560483
3. The coefficient for hp is -1.7889335736325294
4. The coefficient for wt is -5.551819873098725
5. The coefficient for acc is 0.11485734803440854
6. The coefficient for yr is 2.931846548211609
7. The coefficient for car_type is 2.977869737601944
8. The coefficient for origin_america is -0.5832955290166003
9. The coefficient for origin_asia is 0.3474931380432235
10. The coefficient for origin_europe is 0.3774164680868855

Coefficients with polynomial features shooting up to 57 from 10:
-9.67853872e-13 -1.06672046e+12 -4.45865268e+00 -2.24519565e+00
-2.96922206e+00 -1.56882955e+00 3.00019063e+00 -1.42031640e+12
-5.46189566e+11 3.62350196e+12 -2.88818173e+12 -1.16772461e+00
-1.43814087e+00 -7.49492645e-03 2.59439087e+00 -1.92409515e+00
-3.41759793e+12 -6.27534905e+12 -2.44065576e+12 -2.32961194e+12
3.97766113e-01 1.94046021e-01 -4.26086426e-01 3.58203125e+00
-2.05296326e+00 -7.51019934e+11 -6.18967069e+11 -5.90805593e+11
2.47863770e-01 -6.68518066e-01 -1.92150879e+00 -7.37030029e-01
-1.01183732e+11 -8.33924574e+10 -7.95983063e+10 -1.70394897e-01
5.25512695e-01 -3.33097839e+00 1.56301740e+12 1.28818991e+12
1.22958044e+12 5.80200195e-01 1.55352783e+00 3.64527008e+11
3.00431724e+11 2.86762821e+11 3.97644043e-01 8.58604718e+10
7.07635073e+10 6.75439422e+10 -7.25449332e+11 1.00689540e+12
9.61084146e+11 2.18532428e+11 -4.81675252e+12 2.63818648e+12
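The blow-up in the second set of coefficients can be reproduced by expanding the features polynomially and fitting an unregularized model. A sketch under assumptions: degree-2 interaction terms are used here, and the exact settings that yield 57 columns may differ from the notebook's.

```python
# Sketch: expand the 10 original dimensions with polynomial features, then
# fit an unregularized model; coefficients explode to the ~1e+12 range.
# (degree / interaction settings are assumptions)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=2, interaction_only=True)
X_poly = poly.fit_transform(X)
print(X_poly.shape)        # second value is the expanded feature count

lr = LinearRegression().fit(X_poly, y)
print(lr.coef_)            # many huge coefficients: a symptom of overfitting
```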
Regularising Linear Models (Shrinkage methods)
[Figures: Ridge regression fits as the penalty term varies, starting from λ = 0]
Ref: Ridge_Lasso_Regression.ipynb
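The figures referenced above show the fit changing with the penalty. A quick sketch of the same idea, assuming the X_train / y_train split from the earlier sketch is in scope: refit Ridge over a range of alpha values and watch the coefficient magnitudes shrink.

```python
# Sketch: Ridge coefficient magnitudes shrink as alpha grows
# (alpha grid is illustrative; reuses X_train / y_train from the earlier sketch)
import numpy as np
from sklearn.linear_model import Ridge

for alpha in [0.01, 0.1, 1, 10, 100]:
    r = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>6}: max |coef| = {np.abs(r.coef_).max():.4f}")
```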
Regularising Linear Models (Shrinkage methods)
2. Lasso Regression is similar to Ridge regression, with a difference in the penalty term. Unlike Ridge, the penalty term here is raised to the power 1; it is also known as the L1 norm:

$$J(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

   a. The $\lambda$ term continues to be the input parameter that decides how heavily the coefficients are penalized. The larger the value, the more diminished the coefficients will be.
   b. Unlike Ridge regression, where the coefficients are driven towards zero but may never become zero, the Lasso penalty process will make many of the coefficients exactly 0; in other words, it literally drops those dimensions (see the sketch below).
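A minimal Lasso sketch mirroring the Ridge one above; the alpha value is an illustrative assumption.

```python
# Minimal Lasso sketch: same data as the Ridge sketch, L1 penalty instead
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.1)   # alpha value is an assumption
lasso.fit(X_train, y_train)
print(lasso.coef_)         # note the exact zeros: those dimensions are dropped
```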
Regularising Linear Models (Shrinkage methods)
Lasso model: [ 0. 0.52263805 -0.5402102 -1.99423315 -4.55360385 -0.85285179 2.99044036 0.00711821 -0. 0.76073274 -0. -0. -0.19736449
0. 2.04221833 -1.00014513 0. -0. 4.28412669 -0. 0. 0.31442062 -0. 2.13894094 -1.06760107 0. -0. 0. 0. -0.44991392 -1.55885506 -0. -0.68837902 0.
0.17455864 -0.34653644 0.3313704 -2.84931966 0. -0.34340563 0.00815105 0.47019445 1.25759712 -0.69634581 0. 0.55528147 0.2948979 -0.67289549
0.06490671 0. -1.19639935 1.06711702 0. -0.88034391 0. -0. ]
Large coefficients have been suppressed, in many cases all the way to 0, making those dimensions useless, i.e. dropped from the model.
Ref: Ridge_Lasso_Regression.ipynb
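To make "dropped dimensions" concrete, one can simply count the exact zeros in the fitted coefficients; a small sketch continuing from the Lasso fit above.

```python
# Sketch: count how many dimensions Lasso has effectively dropped
import numpy as np

coef = lasso.coef_
print(f"{np.sum(coef == 0)} of {coef.size} coefficients are exactly 0")
```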
Regularising Linear Models (Comparing The Methods)
To compare Ridge and Lasso, let us first represent our error function (which is a quadratic, convex function) as a contour graph, as sketched below.
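A sketch of how such a contour view can be produced for a two-coefficient toy problem; the synthetic data and the constraint budget t are assumptions for illustration only.

```python
# Sketch: SSE contour rings over (m1, m2) for a synthetic 2-feature problem,
# with the Ridge (circle) and Lasso (diamond) constraint boundaries overlaid.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X2 = rng.normal(size=(100, 2))
y2 = 3 * X2[:, 0] + 2 * X2[:, 1] + rng.normal(scale=0.5, size=100)

# SSE evaluated on a grid of candidate coefficients (m1, m2)
m1, m2 = np.meshgrid(np.linspace(-1, 5, 200), np.linspace(-2, 4, 200))
sse = ((y2[:, None, None]
        - m1 * X2[:, 0, None, None]
        - m2 * X2[:, 1, None, None]) ** 2).sum(axis=0)

fig, ax = plt.subplots()
ax.contour(m1, m2, sse, levels=20)            # the "red rings" in the slides
t = 1.5                                       # constraint budget (assumed)
theta = np.linspace(0, 2 * np.pi, 200)
ax.plot(t * np.cos(theta), t * np.sin(theta),
        label="Ridge boundary: m1^2 + m2^2 = t^2")
ax.plot([t, 0, -t, 0, t], [0, t, 0, -t, 0],
        label="Lasso boundary: |m1| + |m2| = t")
ax.set_xlabel("m1"); ax.set_ylabel("m2"); ax.legend()
plt.show()
```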
Regularising Linear Models (Ridge Constraint)
1. The yellow circle is the Ridge constraint region, representing the ridge penalty (the sum of the squared coefficients).
2. Any combination of m1 and m2 that falls within the yellow region is a possible solution.
[Figure: red SSE contour rings with the yellow constraint disc; the most optimal combination of m1, m2 given the constraint lies where a ring touches the disc, while the lowest-SSE ring violates the constraint.]
The point to note is that the red rings and the yellow circle will never be tangential (touch) on the axes representing the coefficients. Hence Ridge can make coefficients close to zero, but never exactly zero. You may notice some coefficients becoming zero, but that will be due to round-off. (A quick numeric check follows below.)
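A quick numeric check of this claim, continuing from the earlier sketches (the fitted ridge and lasso models are assumed to still be in scope).

```python
# Sketch: Ridge shrinks coefficients towards zero but rarely hits it exactly;
# Lasso produces exact zeros. Reuses the fitted models from the sketches above.
import numpy as np

print("Ridge exact zeros:", np.sum(ridge.coef_ == 0))   # typically 0
print("Lasso exact zeros:", np.sum(lasso.coef_ == 0))   # typically several
```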
Regularising Linear Models (Ridge Constraint)
1. As the lambda value (shown here as alpha) increases, the coefficients have to become smaller and smaller to minimize the penalty term in the cost function, i.e. the constraint region becomes tighter.
2. The tighter the constraint region, the larger the red ring in the contour diagram that ends up tangent to the boundary of the yellow region.
Regularising Linear Models (Lasso Constraint)
1. The yellow diamond (a rotated square) is the Lasso constraint region, representing the Lasso penalty (the sum of the absolute values of the coefficients).
2. Any combination of m1 and m2 that falls within the yellow region is a possible solution.
[Figure: red SSE contour rings with the yellow constraint diamond; the most optimal combination of m1, m2 given the constraint lies where a ring touches the diamond, while the lowest-SSE ring violates the constraint.]
The beauty of Lasso is that a red ring may touch the constraint region exactly on an attribute axis! In the picture above, the ring touches the yellow diamond on the m1 axis, and at that point the m2 coefficient is 0! This means that dimension has been dropped from the analysis. Thus Lasso performs dimensionality reduction, which Ridge does not.