
UNIVERSITY OF ECONOMICS HO CHI MINH CITY

INTRODUCTION TO DATA SCIENCE AND APPLICATIONS

2023

Instructor: TRAN THI TUAN ANH


4. REGRESSION IN SUPERVISED LEARNING



[Figure: "What is regression?", an introductory illustration (Source: Internet)]



What is regression?
In machine learning, regression refers to the problem of learning the
relationship between some (qualitative or quantitative) input variables
x = [x_1, x_2, ..., x_p] and a quantitative output variable y.
Model:

    y = f(x_1, x_2, \ldots, x_p) + u

where
u: a noise/error term which describes everything that cannot be
captured by the model.
Types of regression:
Linear regression
Nonlinear regression

Linear regression

    y = \underbrace{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}_{f(x_1, x_2, \ldots, x_k)} + u

β_0, β_1, ..., β_k: parameters
The problem is how to learn the parameters β_0, β_1, ..., β_k from the
training dataset.
The linear regression model can be used for two different purposes:
Classical statistics: describe relationships
Machine learning: predict future outputs


First, learn the model from training data

Learn the unknown parameters β_0, β_1, ..., β_k from a training dataset;
that means finding values such that the model fits the data well.
How?
By OLS: Ordinary Least Squares
By LAD: Least Absolute Deviation
By MLE: Maximum Likelihood Estimation

The OLS method is the most commonly used.


Second, use the trained model to predict the outputs for new data:

    \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1^* + \hat{\beta}_2 x_2^* + \cdots + \hat{\beta}_k x_k^*
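As an illustration (an addition, not from the slides): a minimal NumPy sketch of the OLS step on a small synthetic dataset, assuming k = 2 inputs; the data and variable names here are hypothetical.

    import numpy as np

    # Hypothetical training data: n = 100 observations, k = 2 inputs
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

    # Prepend a column of ones so the intercept beta_0 is estimated too
    X1 = np.column_stack([np.ones(len(X)), X])

    # OLS: solve min ||y - X1 @ beta||^2 (equivalent to (X'X)^{-1} X'y)
    beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(beta_hat)  # close to [1.0, 2.0, -0.5]

    # Predict the output for a new input x* = (0.3, -1.2)
    y_hat = np.array([1.0, 0.3, -1.2]) @ beta_hat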


Some special cases:


The polynomial regression:

    y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p + u

Qualitative input variables

Use dummy variables (see the sketch below):
If a qualitative input variable takes only two different values, create
one dummy variable;
If a qualitative input variable can take m different values, create
m − 1 dummy variables.
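A brief illustrative sketch (an addition, not from the slides) of both special cases in Python; the DataFrame and column names here are hypothetical:

    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures

    # Hypothetical data: one numeric input, one qualitative input with m = 3 values
    df = pd.DataFrame({
        "x": [1.0, 2.0, 3.0, 4.0],
        "city": ["HCMC", "Hanoi", "Danang", "HCMC"],
    })

    # Polynomial regression: expand x into [x, x^2, x^3]
    poly = PolynomialFeatures(degree=3, include_bias=False)
    X_poly = poly.fit_transform(df[["x"]])

    # Qualitative input: m = 3 values -> m - 1 = 2 dummy variables
    dummies = pd.get_dummies(df["city"], drop_first=True)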


The problem of overfitting and regularization


Overfit regression models have too many parameters for the number
of observations.
An overfit model can cause the regression coefficients, p-values, and
R-squared to be misleading.

A useful approach to handle overfitting is regularization.

[Figure: the problem of overfitting, illustrated (Source: Internet)]

Regularization
"Regularization" in regression is a way to give a penalty for each
parameter included into the model;
In regularized regression, the magnitude (size) of coefficients, as well
as the magnitude of the error term, are penalized.
Complex models are discouraged, that help to avoid overfitting.
Two most common types of Regularized Regression are:
Ridge regression
Lasso regression



OLS:

    Loss = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 \to \min

Ridge regression:

    Loss = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \to \min

Lasso regression:

    Loss = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \to \min

where λ is the tuning parameter.
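As a side note (an addition, not from the slides): without an intercept and with standardized inputs, ridge regression has the closed-form solution β̂ = (XᵀX + λI)⁻¹Xᵀy; lasso has no closed form and is usually solved by coordinate descent, as scikit-learn does. A minimal NumPy sketch of the ridge solution:

    import numpy as np

    def ridge_fit(X, y, lam):
        """Closed-form ridge estimate (no intercept; X assumed standardized)."""
        k = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

    # lam = 0 reduces to OLS; a large lam shrinks all coefficients toward 0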



LASSO vs Ridge:

[Figure: comparison of LASSO and Ridge regression]



Note:
The tuning parameter λ controls the strength of the penalty term.
When λ = 0, Ridge/Lasso regression equals least squares regression;
As λ → ∞, all parameters are shrunk toward 0;
The ideal penalty therefore lies somewhere between 0 and ∞, and in
practice is chosen by cross-validation (see the sketch below).
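A brief sketch of choosing λ by cross-validation (an addition, not from the slides), using scikit-learn's built-in CV estimators and assuming X_train and y_train have already been created (as in Example 3.6 below); note that scikit-learn calls the tuning parameter alpha:

    import numpy as np
    from sklearn.linear_model import RidgeCV, LassoCV

    # A grid of candidate penalties between (nearly) 0 and large values
    alphas = np.logspace(-4, 2, 50)

    ridge_cv = RidgeCV(alphas=alphas).fit(X_train, y_train)
    lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)
    print(ridge_cv.alpha_, lasso_cv.alpha_)  # the selected penalties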


Tuning parameter:

[Figure: the tuning parameter, illustrated]



Example 3.6
+ Linear regression in Python:

    from sklearn.linear_model import LinearRegression
    lr = LinearRegression()
    lr.fit(X_train, y_train)

+ Ridge regression:

    from sklearn.linear_model import Ridge
    ridge = Ridge(alpha=0.01)
    ridge.fit(X_train, y_train)

+ Lasso regression:

    from sklearn.linear_model import Lasso
    lasso = Lasso(alpha=0.01)
    lasso.fit(X_train, y_train)
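A short usage note (an addition, not from the slides): in scikit-learn the penalty λ is exposed as the alpha parameter, and predictions for new data come from predict; X_test here is assumed to come from a train/test split:

    y_pred = ridge.predict(X_test)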

Evaluating forecast accuracy


Mean Absolute Error:

    MAE = \frac{1}{n} \sum_{i=1}^{n} |u_i|

    from sklearn.metrics import mean_absolute_error
    mean_absolute_error(y_true, y_pred)

Mean Squared Error:

    MSE = \frac{1}{n} \sum_{i=1}^{n} u_i^2

    from sklearn.metrics import mean_squared_error
    mean_squared_error(y_true, y_pred)


Evaluating forecast accuracy (cont)


Mean Absolute Percentage Error:

    MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right| \times 100 = \frac{1}{n} \sum_{i=1}^{n} \frac{|u_i|}{|Y_i|} \times 100

Root Mean Squared Error:

    RMSE = \sqrt{MSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} u_i^2 }
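For completeness (an addition, not from the slides): recent scikit-learn versions (0.24+) also provide MAPE directly; note it returns a fraction rather than a percentage, and RMSE can be computed from the MSE:

    import numpy as np
    from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error

    mape = mean_absolute_percentage_error(y_true, y_pred) * 100  # as a percentage
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))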


Teamwork 3:
Using the data in file regression.csv, where
Inputs: x1, x2, x3, x4
Output: y
Create a Python code file to do the following tasks:
Loading the required libraries and modules
Loading the data
Creating arrays for the input and output variables
Creating the training and test datasets
Building, predicting with, and evaluating the Ridge and Lasso regressions
Hint: similar to the Python files for the previous algorithms; a minimal skeleton is sketched below.
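A minimal skeleton sketch (an addition, not part of the assignment file), assuming regression.csv has columns x1, x2, x3, x4, and y as stated; the split ratio and random seed are arbitrary choices:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import Ridge, Lasso
    from sklearn.metrics import mean_squared_error

    # Load the data and build input/output arrays
    data = pd.read_csv("regression.csv")
    X = data[["x1", "x2", "x3", "x4"]].values
    y = data["y"].values

    # Training and test datasets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    # Build, predict, and evaluate each regularized model
    for model in (Ridge(alpha=0.01), Lasso(alpha=0.01)):
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
        print(type(model).__name__, "test RMSE:", rmse)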

THE END

THANK YOU FOR LISTENING
