
L2 and L1 Regularization/ Ridge and Lasso Regression

Subject- Machine Learning

Dr. Varun Kumar

Subject- Machine Learning Dr. Varun Kumar 1 / 10


Outline

1 Problem of Overfitting

2 Ridge Regression
Main idea behind Ridge regression
Working of Ridge regression
How can it solve the unsolvable?

3 Lasso Regression

4 References



Problem of Overfitting
In a real-world scenario, the model should perform well on unlabeled or
unknown data.

An overfitted model works well on the training data only.

As model complexity increases, performance on the testing data improves
at first, but degrades again at very high complexity.
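The overfitting problem can be sketched numerically: a polynomial that passes exactly through every training point (zero training error) can still miss unseen data badly. A minimal pure-Python illustration; the data and the use of Lagrange interpolation are my own choices, not from the slides:

```python
# A degree-(n-1) polynomial interpolates n training points exactly
# (zero training residual) yet can be far off for unseen inputs.

def lagrange_predict(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Training data lying roughly on the line y = x, with a little noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.0, 1.1, 1.9, 3.2, 4.0]

# Residual is exactly zero at every training point: an overfitted model.
train_err = sum((lagrange_predict(train_x, train_y, x) - y) ** 2
                for x, y in zip(train_x, train_y))

# At an unseen input x = 5 the true line would give y ≈ 5, but the
# overfitted polynomial predicts something quite different.
test_pred = lagrange_predict(train_x, train_y, 5.0)
print(train_err, test_pred)
```

The high-degree fit has zero bias on the training set but high variance: small noise in the training labels swings the prediction at x = 5 far from the underlying trend.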
Ridge regression: (L2 regularization)

Main idea

⇒ In a regression problem, we find the line that gives the minimum mean
squared error (MSE) over the residuals.
⇒ By changing the weight (slope) value, the MSE can also be reduced for
the test data.
Working of Ridge regression

⇒ Red dots are the training points.
⇒ Green dots are the testing points.
⇒ The residual error is zero for the training model but not for the testing data.
⇒ The given red line (training model) has high variance and zero bias.



Continued–

⇒ We introduce a small bias, so that the MSE for the test data can be
minimized.
⇒ From the above figure:

size = y-axis intercept + Slope × weight

⇒ From Ridge regression


MSE = (1/m) × Σ_{i=n+1}^{n+m} (yᵢ − f̂(xᵢ))² + λ × Θ²

(a) Θ → slope of the straight line
(b) λ → penalty factor, 0 ≤ λ < ∞
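The Ridge loss above can be written as a small helper: the mean squared residual of a 1-D line f̂(x) = intercept + Θ·x over the m test points, plus the L2 penalty. The example points and values below are illustrative, not from the slides:

```python
def ridge_mse(points, intercept, theta, lam):
    """Mean squared residual over the points plus the L2 penalty λ·Θ²."""
    m = len(points)
    residual = sum((y - (intercept + theta * x)) ** 2 for x, y in points) / m
    return residual + lam * theta ** 2

# Two test points near the line y = x, scored with λ = 1, Θ = 1.
test_points = [(1.0, 1.2), (2.0, 2.1)]
loss = ridge_mse(test_points, 0.0, 1.0, 1.0)
print(loss)  # mean residual 0.025 plus penalty 1.0
```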



Example–

♦ Case 1 (least-squares line): Let λ = 1, Θ = 1.3 ⇒ MSE = 0 + λΘ² = 0 + 1.69 = 1.69
→ high variance
♦ Case 2 (Ridge regression): Let λ = 1, Θ = 0.8, e₁ = 0.3, e₂ = 0.1 ⇒
MSE = 0.3² + 0.1² + λΘ² = 0.74 → low variance
♦ Under a given penalty, the Ridge regression line is less sensitive to the
weight than the least-squares line.
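The arithmetic of the two cases can be checked directly: the steep least-squares line pays the full penalty with no residuals, while the flatter Ridge line trades small residuals for a much smaller penalty.

```python
lam = 1.0

# Case 1: least-squares line, zero residuals, slope Θ = 1.3.
case1 = 0.0 + lam * 1.3 ** 2

# Case 2: Ridge line, residuals e1 = 0.3 and e2 = 0.1, slope Θ = 0.8.
case2 = 0.3 ** 2 + 0.1 ** 2 + lam * 0.8 ** 2

print(case1, case2)  # 1.69 vs. 0.74: the penalised loss prefers case 2
```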
Penalty factor λ

⇒ MSE = (1/m) × Σ_{i=n+1}^{n+m} (yᵢ − f̂(xᵢ))² + λ × Θ² → Ridge regression
⇒ The higher the slope (Θ), the more sensitive size is to the weight in the
above figure.
⇒ Penalty = λ × Θ²
If λ = 0, the penalty is 0 and Ridge regression reduces to least squares.
If λ₁ = 1 and λ₂ = 2, then for the penalty term to stay the same, Θ₂ < Θ₁.
As λ → ∞, the slope → 0.
⇒ For a very large λ, size becomes insensitive to the weight in the above
figure.
Note: Ridge regression has been discussed here for size vs. weight, which are
continuous variables. It is also applicable to discrete variables.
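The shrinking effect of λ on the slope can be made concrete. For a no-intercept line fit by minimizing (1/m) Σ (yᵢ − Θxᵢ)² + λΘ², setting the derivative to zero gives the closed form Θ(λ) = Σxᵢyᵢ / (Σxᵢ² + m·λ); this derivation and the sample data are mine, not from the slides:

```python
def ridge_slope(points, lam):
    """Closed-form ridge slope for a no-intercept 1-D line."""
    m = len(points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + m * lam)

data = [(1.0, 1.0), (2.0, 2.2), (3.0, 2.9)]
print(ridge_slope(data, 0.0))    # λ = 0: ordinary least-squares slope
print(ridge_slope(data, 1.0))    # moderate λ: slope shrinks
print(ridge_slope(data, 100.0))  # very large λ: slope approaches 0
```

The slope decreases monotonically as λ grows, matching the discussion above: zero penalty recovers least squares, and λ → ∞ drives the slope toward zero.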



Lasso regression: L1 regularization

It is similar to Ridge regression, but there are some differences.


From Lasso regression
MSE = (1/m) × Σ_{i=n+1}^{n+m} (yᵢ − f̂(xᵢ))² + λ × |Θ|

(a) Θ → slope of the straight line
(b) λ → penalty factor, 0 ≤ λ < ∞

Difference between Ridge and Lasso regression


Lasso regression can exclude useless variables from the equation, since the
L1 penalty can shrink their coefficients exactly to zero.
Ridge regression keeps every variable in the equation, only shrinking the
less important ones toward zero.
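This exclusion property can be shown in one dimension. Minimizing (1/m) Σ (yᵢ − Θxᵢ)² + λ|Θ| gives a soft-thresholded slope that becomes exactly zero once λ is large enough, whereas the ridge slope Σxy/(Σx² + mλ) only approaches zero; the derivation and data below are my own sketch, not from the slides:

```python
def lasso_slope(points, lam):
    """Closed-form lasso slope for a no-intercept 1-D line (soft threshold)."""
    m = len(points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    shrunk = abs(sxy) - m * lam / 2.0
    if shrunk <= 0.0:
        return 0.0  # the variable is excluded from the model entirely
    return (1.0 if sxy > 0 else -1.0) * shrunk / sxx

# A weakly relevant feature: y barely depends on x.
data = [(1.0, 0.1), (2.0, 0.1), (3.0, 0.2)]
print(lasso_slope(data, 0.01))  # small penalty: small but non-zero slope
print(lasso_slope(data, 1.0))   # large penalty: exactly 0.0 — excluded
```

The hard zero is the practical difference from Ridge: with the L2 penalty the same feature would keep a small non-zero coefficient at every finite λ.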



References

E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.

T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University,
School of Computer Science, 2006, vol. 9.

J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media,
2019.

