09 Regression

Uploaded by

l.arrizabalaga
Regression

Rubén Sánchez Corcuera
[email protected]

Objectives

■ We will see how to apply regression using scikit-learn
■ We will review how to evaluate regression models

What is a regression?

■ A regression is a statistical technique that relates a dependent variable to one or more independent (explanatory) variables.
■ A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables.
■ It does this by essentially fitting a best-fit line and seeing how the data is dispersed around this line.
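The idea of fitting a best-fit line and looking at the dispersion around it can be sketched as follows. This is a minimal illustration, not part of the original slides: the synthetic data (y = 3x + 2 plus noise) and all variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                    # one explanatory variable
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)   # dependent variable

# Fit the best-fit line by ordinary least squares.
model = LinearRegression().fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_

# The residuals describe how the data is dispersed around the fitted line.
residuals = y - model.predict(X)
residual_std = residuals.std()

print(f"slope={slope:.2f}, intercept={intercept:.2f}, residual std={residual_std:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, and the spread of the residuals should be close to the noise level.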
Regression in sklearn (our fav library)

■ There are multiple methods for regression supported in sklearn:
● Linear regression
● Logistic regression (despite its name, sklearn's LogisticRegression is a classifier)
● Generalized linear regression
● Quantile regression
● Polynomial regression
● Nearest Neighbour regression
● Support Vector Regression
    ■ LinearSVR
    ■ SVR
    ■ NuSVR
● SGD Regression
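Most of these methods share the same fit / predict / score interface, so they can be swapped with little effort. The sketch below is an illustration under assumptions: the synthetic quadratic data, the choice of estimators and their hyperparameters are not from the slides. Polynomial regression is written here as a pipeline of PolynomialFeatures and LinearRegression, which is one common way to express it in sklearn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumed): a noisy quadratic relationship.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 0.3, size=200)

# A few of the regressors listed above, all exposing the same estimator API.
models = {
    "linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "nearest neighbours": KNeighborsRegressor(n_neighbors=5),
    "SVR": SVR(kernel="rbf"),
    "SGD": make_pipeline(StandardScaler(), SGDRegressor(random_state=0)),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R-squared on the training data
    print(f"{name}: R2 = {scores[name]:.3f}")
```

Scores computed on the training data are optimistic; in practice a held-out test set would be used for the comparison.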

Regression in sklearn (our fav library)

■ There are multiple methods for regression supported in sklearn:
● Gaussian Process Regression
● Decision Trees Regression

Robust regression in sklearn

■ Robust regression aims to fit a regression model in the presence of corrupt data: either outliers, or error in the model.
■ Scikit-learn provides 3 robust regression estimators: RANSAC, Theil Sen and HuberRegressor.
● HuberRegressor should be faster than RANSAC and Theil Sen unless the number of samples is very large, i.e. n_samples >> n_features.
● HuberRegressor should be more robust than RANSAC and Theil Sen on default parameters.
● RANSAC is faster than Theil Sen and scales much better with the number of samples.
● RANSAC will deal better with large outliers in the y direction (most common situation).
● Theil Sen will cope better with medium-size outliers in the X direction, but this property will disappear in high-dimensional settings.
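A small experiment can make the effect of outliers concrete. This is a hedged sketch rather than a benchmark: the synthetic data, the way the targets are corrupted (a large shift in the y direction on the 20 points with the largest X) and the default hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import (
    HuberRegressor,
    LinearRegression,
    RANSACRegressor,
    TheilSenRegressor,
)

# Synthetic data (assumed): y = 2x + 1 plus small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=200)

# Corrupt 10% of the targets with large outliers in the y direction.
# They are placed at the largest X values so that they tilt the fitted line.
outlier_idx = np.argsort(X[:, 0])[-20:]
y[outlier_idx] += 50.0

estimators = {
    "OLS": LinearRegression(),
    "Huber": HuberRegressor(),
    "RANSAC": RANSACRegressor(random_state=0),
    "Theil-Sen": TheilSenRegressor(random_state=0),
}

slopes = {}
for name, est in estimators.items():
    est.fit(X, y)
    # RANSACRegressor wraps a base estimator; the final model is in estimator_.
    fitted = est.estimator_ if name == "RANSAC" else est
    slopes[name] = fitted.coef_[0]
    print(f"{name}: slope = {slopes[name]:.2f} (true slope is 2.0)")
```

Ordinary least squares is pulled towards the corrupted points, while the robust estimators are designed to limit their influence. How well each one does depends on the data and on its parameters.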
Evaluating Regression Models

■ Why can’t we use accuracy to evaluate our regression models?
● We have a continuous target variable.
● Accuracy counts exact matches, and a predicted continuous value almost never matches the actual one exactly, so evaluating accuracy on each data point gives awful results even for a good model.
■ We need other types of metrics to properly evaluate our models.
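The problem with accuracy can be seen with a handful of numbers. The values below are made up for illustration: every prediction is within 0.05 of the actual value, yet none matches exactly.

```python
import numpy as np

# Hypothetical actual and predicted values of a continuous target.
y_true = np.array([3.20, 5.75, 1.10, 8.40])
y_pred = np.array([3.18, 5.80, 1.05, 8.43])

# "Accuracy" as the fraction of exact matches: no prediction matches exactly.
exact_match_accuracy = np.mean(y_pred == y_true)

# An error-based metric reflects how close the predictions actually are.
mean_abs_error = np.mean(np.abs(y_pred - y_true))

print(f"exact-match accuracy = {exact_match_accuracy:.2f}")
print(f"mean absolute error  = {mean_abs_error:.4f}")
```

The accuracy is zero even though the model is off by less than 0.04 on average, which is why regression needs metrics based on the size of the error.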

Mean Absolute Error (MAE)

■ One of the most used metrics
■ We try to calculate the difference between the predicted values and the actual ones.
■ If $\hat{y}_i$ is the predicted value and $y_i$ the expected one, the error would be

$$|y_i - \hat{y}_i|$$

■ As it would not be useful to present it as the total error, we calculate the mean over the $n$ data points:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

Mean Absolute Percentage Error (MAPE)

■ When the target variable has a single dimension, some users tend to normalize it, whereas others don't.
■ The value of MAE will vary between normalized and non-normalized approaches.
■ Defining the error as a percentage variation from the actual values solves these situations:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

(multiply by 100 to express it as a percentage)
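Both metrics can be computed by hand or with sklearn.metrics. In this sketch the numbers are invented for illustration, and "normalizing" is assumed to mean a simple rescaling of the target by a constant factor. Note that sklearn's mean_absolute_percentage_error returns a fraction, not a value multiplied by 100.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Hypothetical actual and predicted values.
y_true = np.array([100.0, 200.0, 400.0, 500.0])
y_pred = np.array([110.0, 190.0, 420.0, 450.0])

# Manual computation following the definitions.
mae_manual = np.mean(np.abs(y_true - y_pred))               # (10 + 10 + 20 + 50) / 4
mape_manual = np.mean(np.abs((y_true - y_pred) / y_true))   # (0.10 + 0.05 + 0.05 + 0.10) / 4

# Same metrics using sklearn.
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

# Rescaling the target (assumed normalization: multiply by a constant)
# changes the MAE but leaves the MAPE untouched.
scale = 0.001
mae_scaled = mean_absolute_error(y_true * scale, y_pred * scale)
mape_scaled = mean_absolute_percentage_error(y_true * scale, y_pred * scale)

print(f"MAE = {mae:.3f}, MAPE = {mape:.3f}")
print(f"after rescaling: MAE = {mae_scaled:.5f}, MAPE = {mape_scaled:.3f}")
```

MAPE is undefined when an actual value is zero and becomes unstable when actual values are close to zero, so it suits targets that stay well away from zero.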
Root Mean Squared Error (RMSE)

■ RMSE is another widely used metric for regression models.
■ It is similar to the Mean Squared Error (MSE), but the result is square-rooted, so it is expressed in the same units as the target variable.

R-squared (R2)

■ R-squared measures to what extent the variance of the dependent variable is explained by the explanatory variables.
● It is also known as the Coefficient of Determination.
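As a sketch of both metrics, using the standard definitions RMSE = sqrt(mean((y_i - ŷ_i)^2)) and R2 = 1 - SS_res / SS_tot (the formulas are not spelled out in the slides, and the numbers are invented for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# RMSE: square root of the mean squared error.
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 1 + 0) / 4
rmse = np.sqrt(mse)

# R-squared computed by hand: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# Same value using sklearn.
r2 = r2_score(y_true, y_pred)

print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}, R2 = {r2:.3f}")
```

An R2 of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the target, and negative values are possible for models that do worse than that.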

Adjusted R-squared (Adjusted R2)

■ An overfitted model can have a high R-squared, because R-squared does not decrease when more explanatory variables are added. The adjusted R-squared measure helps with this problem by penalizing the number of explanatory variables.
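sklearn does not ship an adjusted R-squared metric, so the sketch below defines a small helper using the usual formula, Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - p - 1), where n is the number of samples and p the number of explanatory variables. The helper name adjusted_r2 and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared (hypothetical helper, not a sklearn function)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic data (assumed): one useful feature and a noisy linear target.
rng = np.random.default_rng(0)
n = 30
x_useful = rng.normal(size=(n, 1))
y = 2.0 * x_useful[:, 0] + rng.normal(0, 1.0, size=n)

# Add 10 pure-noise features that carry no information about y.
X_noisy = np.hstack([x_useful, rng.normal(size=(n, 10))])

# Training R-squared never decreases when features are added.
r2_small = LinearRegression().fit(x_useful, y).score(x_useful, y)
r2_big = LinearRegression().fit(X_noisy, y).score(X_noisy, y)

# The adjustment penalizes the model with more features more heavily.
adj_small = adjusted_r2(r2_small, n, 1)
adj_big = adjusted_r2(r2_big, n, 11)

print(f"1 feature:   R2 = {r2_small:.3f}, adjusted R2 = {adj_small:.3f}")
print(f"11 features: R2 = {r2_big:.3f}, adjusted R2 = {adj_big:.3f}")
```

The adjusted value is always below the plain R-squared for an imperfect fit, and the gap grows with the number of features relative to the number of samples.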
Exercise

■ Let’s try this with a quick exercise.

Do you have any questions?
[email protected]

Thanks!
