Regression
INTRODUCTION TO REGRESSION
• Regression is a well-known statistical technique to model the
predictive relationship between several independent variables
and one dependent variable.
• The objective is to find the best-fitting curve for a dependent
variable in a multidimensional space, with each independent
variable being a dimension.
• The curve could be a straight line, or it could be a nonlinear
curve.
• The quality of fit of the curve to the data can be measured by the coefficient of
correlation (r), whose square (r²) is the proportion of variance in the dependent
variable explained by the curve (illustrated in the sketch below).
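As a small illustration of the last point, the sketch below fits a straight line to a tiny made-up dataset with scikit-learn and reports r² (the proportion of variance explained) and its square root r. The data values and the use of LinearRegression and r2_score are illustrative additions, not part of the original slides.

# Illustrative sketch: fit a line and measure quality of fit via r^2 and r.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # one independent variable (made-up data)
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])             # dependent variable (made-up data)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

r_squared = r2_score(y, y_pred)   # proportion of variance explained by the fitted line
r = np.sqrt(r_squared)            # coefficient of correlation for this simple fit

print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")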
POINT TO PONDER?
● “Imagine you have made plans with friends after a long time and you wish to go
out, but you are not sure whether it will rain or not. It’s the monsoon season, but
your mom says the air feels dry today, and therefore the probability of raining
today is less. On the contrary, your sister believes because it rained yesterday it’s
likely that it will rain today. Considering you have no control over the weather,
how will you decide whose opinion to take more seriously, keeping in mind the
fact that you are impartial towards both?”
Source: https://www.dezyre.com/article/types-of-regression-analysis-in-machine-learning/410
[Figure: Rainfall/Precipitation as the dependent variable, linearly correlated with independent variables such as geographical location, humidity, and wind speed.]
KEY STEPS
[Figure: A regression model takes previous time-series values (Xt-1, Xt) as inputs and predicts the future value (Xt+1).]
EVALUATING REGRESSION MODELS
ACCURACY IS NOT A VALID METRIC FOR EVALUATING REGRESSION MODELS!
• There are many other metrics for regression, although the ones covered here (MSE,
RMSE, and MAE) are the most commonly used. You can see the full list of regression
metrics supported by the scikit-learn Python machine learning library here:
• Scikit-Learn API: Regression Metrics.
Original Source: https://machinelearningmastery.com/regression-metrics-for-
machine-learning/
1. MEAN SQUARED ERROR
• Mean Squared Error, or MSE for short, is a popular error metric for
regression problems.
• It is also an important loss function for algorithms fit or optimized using
the least squares framing of a regression problem. Here “least squares”
refers to minimizing the mean squared error between predictions and
expected values.
• The MSE is calculated as the mean or average of the squared differences
between predicted and expected target values in a dataset.
• The squaring also has the effect of inflating or magnifying large errors.
That is, the larger the difference between the predicted and expected
values, the larger the resulting squared positive error. This has the effect
of “punishing” models more for larger errors when MSE is used as a loss
function. It also has the effect of “punishing” models by inflating the
average error score when used as a metric.
• The mean squared error between your expected and predicted values
can be calculated using the mean_squared_error() function from the
scikit-learn library.
• The function takes a one-dimensional array or list of expected values and
predicted values and returns the mean squared error value.
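A minimal sketch of that calculation follows; the expected and predicted values are made up for illustration.

# Minimal sketch: computing MSE with scikit-learn's mean_squared_error().
from sklearn.metrics import mean_squared_error

expected = [0.0, 0.5, 0.0, 0.5, 0.0]    # true target values (made-up data)
predicted = [0.2, 0.4, 0.1, 0.6, 0.2]   # model predictions (made-up data)

mse = mean_squared_error(expected, predicted)   # mean of the squared differences
print(f"MSE: {mse:.3f}")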
2. ROOT MEAN SQUARED ERROR
• The Root Mean Squared Error, or RMSE, is an extension of
the mean squared error.
• Importantly, the square root of the error is calculated,
which means that the units of the RMSE are the same as
the original units of the target value that is being predicted.
• As such, it may be common to use MSE loss to train a
regression predictive model, and to use RMSE to evaluate
and report its performance.
• MSE uses the square operation to remove the sign of each
error value and to punish large errors. The square root
reverses this operation, although it ensures that the result
remains positive.
• The root mean squared error between your expected and
predicted values can be calculated by taking the square root of
the value returned by the mean_squared_error() function from
the scikit-learn library (depending on the library version, a
squared=False argument or a dedicated root_mean_squared_error()
function is also available).
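A minimal sketch of that calculation, reusing the made-up values from the MSE example and taking the square root so the result is in the original units of the target.

# Minimal sketch: computing RMSE as the square root of MSE.
from math import sqrt
from sklearn.metrics import mean_squared_error

expected = [0.0, 0.5, 0.0, 0.5, 0.0]    # true target values (made-up data)
predicted = [0.2, 0.4, 0.1, 0.6, 0.2]   # model predictions (made-up data)

rmse = sqrt(mean_squared_error(expected, predicted))   # same units as the target
print(f"RMSE: {rmse:.3f}")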
3. MEAN ABSOLUTE ERROR
• Mean Absolute Error, or MAE, is a popular metric because, like RMSE,
the units of the error score match the units of the target value that is
being predicted.
• MSE and RMSE punish larger errors more than smaller errors, inflating
or magnifying the mean error score. This is due to the square of the
error value. The MAE does not give more or less weight to different
types of errors and instead the scores increase linearly with increases
in error.
• As its name suggests, the MAE score is calculated as the average of
the absolute error values. Absolute or abs() is a mathematical function
that simply makes a number positive. Therefore, the difference
between an expected and predicted value may be positive or negative
and is forced to be positive when calculating the MAE.
• The mean absolute error between your expected and predicted values
can be calculated using the mean_absolute_error() function from the
scikit-learn library.
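A minimal sketch of the MAE calculation, again with made-up values.

# Minimal sketch: computing MAE with scikit-learn's mean_absolute_error().
from sklearn.metrics import mean_absolute_error

expected = [0.0, 0.5, 0.0, 0.5, 0.0]    # true target values (made-up data)
predicted = [0.2, 0.4, 0.1, 0.6, 0.2]   # model predictions (made-up data)

mae = mean_absolute_error(expected, predicted)   # average of the absolute differences
print(f"MAE: {mae:.3f}")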
REGRESSION ANALYSIS
[Figure: Common challenges in regression analysis: outliers, underfitting, overfitting, and heteroscedasticity.]
Simple Linear Regression
● A simple linear regression fits the model y = mx + c + e, where m is the slope of the line, c is the intercept, and e represents the error in the model.
● The predictor error is the difference between the observed values and the predicted values.
● The values of m and c are selected so that they give the minimum predictor error. It is important to note that a
simple linear regression model is susceptible to outliers (see the sketch below).
Source: https://medium.com/machine-learning-id/simple-linear-
regression-teori-d4abebd1ade2
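To make the outlier point concrete, the sketch below fits scikit-learn's LinearRegression on a small made-up dataset twice, once on clean data and once after corrupting a single observation; the dataset and estimator choice are illustrative assumptions, not from the original slides.

# Illustrative sketch: a single outlier noticeably shifts the fitted slope and intercept.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # one independent variable (made-up data)
y_clean = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # roughly y = 2x
y_outlier = y_clean.copy()
y_outlier[-1] = 30.0                                 # corrupt one observation

for label, y in [("clean", y_clean), ("with outlier", y_outlier)]:
    model = LinearRegression().fit(X, y)
    print(f"{label}: m = {model.coef_[0]:.2f}, c = {model.intercept_:.2f}")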
Multiple Linear Regression
The equation for a multiple linear regression is shown below:
y = b0 + b1x1 + b2x2 + … + bnxn + e
where y is the dependent variable, x1 … xn are the independent variables, b0 is the intercept, b1 … bn are the coefficients, and e is the error term.
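A minimal sketch of fitting such a model with two made-up independent variables; the data and the use of scikit-learn's LinearRegression are illustrative assumptions.

# Illustrative sketch: multiple linear regression with two independent variables.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [3.0, 0.2],
              [4.0, 1.5],
              [5.0, 0.8]])                 # two independent variables (made-up data)
y = np.array([3.1, 5.2, 5.9, 9.3, 9.8])    # dependent variable (made-up data)

model = LinearRegression().fit(X, y)
print("coefficients b1..bn:", np.round(model.coef_, 2))
print("intercept b0:", round(float(model.intercept_), 2))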
Overfitting!!!
Regularization to reduce Overfitting
● Two well-known types of regression use this regularization technique, namely:
○ Ridge Regression
○ Lasso Regression
Ridge Regression
● Lasso regression and ridge regression are both known as regularization methods because they both
attempt to minimize the sum of squared residuals (RSS) along with some penalty term.
● In other words, they constrain or regularize the coefficient estimates of the model.
● The main difference between ridge and LASSO regression is that ridge regression can only shrink the
coefficients close to 0, so all predictor variables are retained, whereas LASSO can shrink coefficients to
exactly 0, allowing it to select predictors and discard those whose coefficients become 0.
● When we use ridge regression, the coefficients of each predictor are shrunk towards zero, but none of
them can go completely to zero.
● Conversely, when we use lasso regression, it is possible for some of the coefficients to go completely
to zero when λ gets sufficiently large (a minimal comparison is sketched below).
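The sketch below contrasts the two behaviours on synthetic data where only the first two of five features matter; the data, the alpha values (scikit-learn's name for the penalty strength λ), and the Ridge/Lasso estimators used are illustrative assumptions, not from the original slides.

# Minimal sketch: Ridge shrinks all coefficients, Lasso drives irrelevant ones to exactly 0.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features influence the target; the other three are pure noise.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=10.0).fit(X, y)   # alpha corresponds to the penalty strength λ
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge coefficients:", np.round(ridge.coef_, 3))   # shrunk, but none exactly 0
print("Lasso coefficients:", np.round(lasso.coef_, 3))   # noise features become exactly 0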
Which is better: Ridge or Lasso Regression?