
Lasso Regression (L1 Regularization)

Lasso Regression, a regression method based on the Least Absolute Shrinkage and Selection
Operator, is an important technique in regression analysis for variable selection and
regularization. Lasso regression removes unnecessary features, which helps prevent overfitting.
It also makes weak features easier to identify by shrinking their coefficients, sometimes all
the way to zero.
Lasso Regression extends the linear regression concept by adding a regularization term to the
standard regression equation. Linear Regression operates by minimizing the sum of squared
discrepancies between the observed and predicted values, fitting a line (or, in higher
dimensions, a plane or hyperplane) to the data points.
Multicollinearity happens when some features in a dataset are strongly related to each other,
which is common in real-world data. Lasso regression helps by using regularization, which means
adding a penalty to the model. This prevents overfitting and makes the model more reliable.
Bias-Variance Tradeoff in Lasso Regression
The balance between bias and variance is known as the bias-variance tradeoff. In lasso
regression, the penalty term (L1 regularization) significantly lowers the variance of the model
by shrinking the coefficients of less significant features towards zero. This helps avoid
overfitting, where the model would otherwise fit noise in the training set rather than the
underlying patterns. However, increasing the regularization strength to reduce variance can also
make the model too simple, preventing it from capturing the real patterns in the data. This
leads to higher bias.

Thus, bias and variance are traded off in lasso regression, just like in other regularization
strategies. Achieving the ideal balance usually requires minimizing the total prediction
error (MSE) by adjusting the regularization parameter, typically with methods like
cross-validation.
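A minimal sketch of this tradeoff is shown below, assuming scikit-learn and a synthetic dataset
from make_regression; the alpha grid (alpha is scikit-learn's name for λ) and the dataset sizes
are illustrative choices, not values from the text.

# Sketch: how regularization strength affects fit quality and sparsity.
# The dataset and the alpha grid are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    nonzero = np.sum(model.coef_ != 0)
    print(f"alpha={alpha:>6}: train MSE={train_mse:8.1f}, "
          f"test MSE={test_mse:8.1f}, non-zero coefs={nonzero}")

Very small alphas behave much like ordinary linear regression (low bias, higher variance), while
very large alphas zero out most coefficients and underfit (high bias).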
Lasso regression is fundamentally an extension of linear regression. The goal of traditional
linear regression is to minimize the sum of squared differences between the observed and
predicted values in order to determine the line that best fits the data points. However, linear
regression does not account for the complexity of real-world data, particularly when there are
many predictors.

1. Ordinary Least Squares (OLS) Regression: Lasso regression is useful here because it adds a
penalty term to the OLS objective, which by itself only minimizes the sum of squared
differences. The absolute values of the predictors' coefficients serve as the basis for this
penalty.
The formula for OLS is:
min RSS = Σ(yᵢ − ŷᵢ)²
Where,
· yᵢ is the observed value,
· and ŷᵢ is the predicted value for each data point i.
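As a minimal sketch of this objective, the snippet below fits OLS coefficients with NumPy's
least-squares solver and reports the resulting RSS; the toy data (100 observations, 3
predictors, true_beta) is an illustrative assumption.

# Sketch: ordinary least squares minimizes RSS = Σ(yᵢ − ŷᵢ)².
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 observations, 3 predictors
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=100)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution
rss = np.sum((y - X @ beta_hat) ** 2)              # residual sum of squares
print("estimated coefficients:", beta_hat)
print("RSS:", rss)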
2. Penalty Term for Lasso Regression: The OLS equation is supplemented with a penalty term:
the sum of the absolute values of the coefficients (also known as L1 regularization). The goal
is now to minimize the sum of squared differences plus the penalty term:
RSS + λ∑|βᵢ|
Where,
· βᵢ represents the coefficients of the predictors,
· and λ is the tuning parameter that controls the strength of the penalty. As λ increases,
more coefficients are pushed towards zero.
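A minimal sketch of this objective is given below; the helper name lasso_objective and the toy
data and λ values are illustrative assumptions, showing only how the penalty raises the cost of
large coefficients.

# Sketch: evaluate the lasso objective RSS + λ·Σ|βᵢ| for a fixed coefficient vector.
import numpy as np

def lasso_objective(beta, X, y, lam):
    rss = np.sum((y - X @ beta) ** 2)        # residual sum of squares
    l1_penalty = lam * np.sum(np.abs(beta))  # L1 penalty on the coefficients
    return rss + l1_penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
beta = np.array([1.5, 0.0, -2.0, 0.3])
y = X @ beta + rng.normal(scale=0.1, size=50)

for lam in (0.0, 1.0, 10.0):                 # larger λ makes the same β more costly
    print(f"lambda={lam}: objective={lasso_objective(beta, X, y, lam):.2f}")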
3. Shrinking Coefficients: The penalty term in lasso regression has a unique characteristic,
which is its ability to reduce the coefficients of less significant variables to exactly zero.
As a result, features with zero coefficients are eliminated from the model, essentially
performing variable selection. This is especially helpful when working with high-dimensional
data where there are many predictors relative to the number of observations.
Lasso regression makes the model simpler and less prone to overfitting by reducing or deleting
the coefficients of unimportant predictors. This improves the model's readability and its
ability to generalize to fresh sets of data.
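The snippet below is a minimal sketch of this selection effect using scikit-learn's Lasso (its
alpha parameter plays the role of λ); the dataset, in which only 5 of 20 predictors are
informative, and the alpha value are illustrative assumptions.

# Sketch: lasso drives the coefficients of uninformative predictors to exactly zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(model.coef_)            # indices of non-zero coefficients
print("non-zero coefficients:", len(kept), "of", X.shape[1])
print("selected feature indices:", kept)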
4. Selecting the optimal λ: In lasso regression, selecting the tuning parameter λ is essential.
Cross-validation methods are frequently employed to determine the value of λ that strikes the
best balance between predictive accuracy and model complexity.
The primary objective of Lasso regression is therefore to minimize the residual sum of squares
(RSS) plus a penalty term: λ multiplied by the sum of the absolute values of the coefficients.
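A minimal sketch of this cross-validated tuning with scikit-learn's LassoCV follows (alpha is
scikit-learn's name for λ); the synthetic dataset and the 5-fold setting are illustrative
assumptions.

# Sketch: choose λ by cross-validation over a grid of candidate values.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)

model = LassoCV(cv=5, random_state=0).fit(X, y)   # 5-fold CV over an alpha grid
print("best alpha:", model.alpha_)
print("non-zero coefficients:", (model.coef_ != 0).sum())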

In the plot, the Lasso Regression cost function combines the residual sum of squares (RSS) with
an L1 penalty on the coefficients βⱼ.
· RSS: measures the squared difference between the predicted and actual values.
· L1 penalty: penalizes the coefficients' absolute values, bringing some of them to zero and
simplifying the model. The strength of the L1 penalty is controlled by the λ term. Larger
values of λ result in stronger penalties, which may both increase the RSS and make the model
sparser (more coefficients equal to zero).
The graph itself shows the relationship between the value of λ (x-axis) and the cost function
(y-axis).
· y-axis: represents the value of the cost function, which Lasso Regression tries to minimize.
· x-axis: represents the value of the λ parameter, which controls the strength of the L1
penalty in the cost function.
· Green to orange curve: this curve depicts how the cost function changes with increasing λ.
As λ increases (moving to the right on the x-axis), the curve transitions from green to orange,
showing the cost function value going up (potentially due to a higher RSS term) as the L1
penalty becomes stronger (forcing more coefficients to zero).
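The curve described above can be reproduced with a minimal sketch like the one below, which
refits a scikit-learn Lasso over a grid of λ values and evaluates RSS + λ·Σ|βⱼ| at each fit;
the dataset and the logarithmic λ grid are illustrative assumptions.

# Sketch: cost (RSS + L1 penalty) as a function of the regularization strength λ.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

lambdas = np.logspace(-2, 2, 30)
costs = []
for lam in lambdas:
    model = Lasso(alpha=lam).fit(X, y)
    rss = np.sum((y - model.predict(X)) ** 2)
    costs.append(rss + lam * np.sum(np.abs(model.coef_)))

plt.plot(lambdas, costs)
plt.xscale("log")
plt.xlabel("lambda (strength of the L1 penalty)")
plt.ylabel("cost function (RSS + L1 penalty)")
plt.show()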
When to use Lasso Regression
Lasso regression is very helpful when working with high-dimensional datasets that contain a
large number of features, some of which may be redundant or irrelevant. More specifically, we
can use lasso regression in the following situations:
· Feature Selection: By reducing the coefficients of less significant features to zero, lasso
regression automatically chooses a subset of features. This is helpful when you have a lot of
features and want to find the ones that are most significant (a short sketch follows this list).
· Collinearity: Lasso regression can be useful when there is multicollinearity, that is, when
the predictor variables have a high degree of correlation with one another, because it shrinks
the coefficients of correlated variables and tends to keep only one of them.
· Regularization: By penalizing big coefficients, lasso regression can help prevent
overfitting. This becomes particularly important when the number of predictors approaches or
surpasses the number of observations.
· Interpretability: Compared to conventional linear regression models that incorporate all
features, lasso regression often yields sparse models with fewer non-zero coefficients. This
can make the final model simpler to understand.
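As a minimal sketch of using lasso for feature selection, the snippet below wraps scikit-learn's
SelectFromModel around a Lasso; the dataset and the alpha value are illustrative assumptions.

# Sketch: keep only the features whose lasso coefficient is (effectively) non-zero.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=150, n_features=40, n_informative=6,
                       noise=5.0, random_state=0)

selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_selected = selector.transform(X)
print("features kept:", X_selected.shape[1], "of", X.shape[1])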

Ridge Regression(L2 Regularization)

Ridge Regression is a type of linear regression that adds a penalty to the model to prevent
overfitting. It is useful when there is multicollinearity (high correlation between independent
variables) or when the model is too complex.

Ridge Regression modifies the standard linear regression equation by adding a penalty term
(λ∑β²) to the cost function:

Loss = ∑(y − ŷ)² + λ∑β²
· First part: Measures how well the model fits the data (squared error).
· Second part: Adds a penalty based on the sum of squared coefficients (L2 regularization).
· λ (lambda): Controls the strength of regularization. A higher λ shrinks the coefficients
more, making the model simpler.
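A minimal sketch of ridge regression with scikit-learn is given below (alpha again plays the
role of λ); the nearly collinear toy predictors and the alpha value are illustrative
assumptions, meant only to show how the coefficients are shrunk relative to plain linear
regression.

# Sketch: ridge shrinks unstable coefficients that arise from collinear predictors.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)    # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print("OLS coefficients:  ", ols.coef_)       # can be large and unstable
print("Ridge coefficients:", ridge.coef_)     # shrunk and more stable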

Key Benefits:
✔ Reduces overfitting by shrinking large coefficients.
✔ Works well with highly correlated features.
✔ Helps improve model generalization to new data.

When to Use Ridge Regression?

· When your dataset has many features with collinearity.


· When you want to avoid overfitting in linear regression.
· When you need a model that generalizes well to new data.
