DataScience - Chapter03 - Machine Learning With Python - 03 - Regression
2023
What is regression?
In machine learning, regression refers to the problem of learning the
relationship between a set of (qualitative or quantitative) input variables
$x = [x_1, x_2, \dots, x_p]$ and a quantitative output variable $y$.
Model:
$y = f(x_1, x_2, \dots, x_p) + u$
where
$u$: a noise/error term that describes everything the model cannot
capture.
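To make the role of $u$ concrete, the sketch below simulates data from the model with a made-up $f$ (the function `f`, its coefficients, and the noise level σ = 0.5 are assumptions chosen for illustration). Even the true $f$ cannot predict $y$ exactly: its mean squared error stays close to the noise variance σ².

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100_000, 3
X = rng.normal(size=(n, p))            # inputs x1, x2, x3


def f(X):
    # Hypothetical "true" relationship, chosen only for this illustration
    return 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.5 * np.sin(X[:, 2])


sigma = 0.5
u = rng.normal(scale=sigma, size=n)    # noise term: what f cannot capture
y = f(X) + u

# The leftover error of the true f is exactly u, so its MSE is about sigma^2
mse_true_f = np.mean((y - f(X)) ** 2)
print(f"MSE of the true f: {mse_true_f:.4f} (noise variance {sigma**2:.4f})")
```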
Types of regression:
Linear regression
Nonlinear regression
Instructor: TRAN THI TUAN ANH 4 / 19
4. REGRESSION IN SUPERVISED LEARNING
Linear regression
$y = \underbrace{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}_{f(x_1, x_2, \dots, x_k)} + u$
$\beta_0, \beta_1, \dots, \beta_k$: parameters
The problem is how to learn the parameters $\beta_0, \beta_1, \dots, \beta_k$ from
the training dataset.
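One standard way to learn $\beta_0, \dots, \beta_k$ is ordinary least squares, which picks the values that minimise the sum of squared errors on the training set. A minimal sketch with NumPy on simulated data (the true coefficients, sample size, and noise level are assumptions for the demo); scikit-learn's `LinearRegression` should return the same estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Simulated training data from known parameters (assumption, for illustration)
n, k = 200, 3
beta_true = np.array([4.0, 1.5, -2.0, 0.7])   # beta_0, beta_1, beta_2, beta_3
X = rng.normal(size=(n, k))
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=0.3, size=n)

# Column of ones so that beta_0 acts as the intercept
X1 = np.column_stack([np.ones(n), X])

# Least squares: minimise sum_i (y_i - beta_0 - sum_j beta_j x_ji)^2
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("numpy  :", np.round(beta_hat, 3))

# The same estimates via scikit-learn
lr = LinearRegression().fit(X, y)
print("sklearn:", np.round(np.r_[lr.intercept_, lr.coef_], 3))
```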
The linear regression model can be used for two different purposes:
Classical statistics: describing relationships
Machine learning: predicting future outputs
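Polynomial regression (nonlinear in $x$, but still linear in the parameters):
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + u$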
(Source: Internet)
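Because the polynomial model is linear in $\beta_0, \dots, \beta_p$, it can be fitted with ordinary linear regression after expanding $x$ into $x, x^2, \dots, x^p$. A sketch using scikit-learn's `PolynomialFeatures` in a pipeline, on data simulated from a cubic (the data-generating coefficients and the choice p = 3 are assumptions).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)

# Hypothetical data from a cubic relationship plus noise
x = rng.uniform(-3, 3, size=1000)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.3 * x**3 + rng.normal(scale=0.5, size=x.size)

degree = 3
# PolynomialFeatures turns x into [x, x^2, ..., x^p]; LinearRegression adds beta_0
model = make_pipeline(
    PolynomialFeatures(degree=degree, include_bias=False),
    LinearRegression(),
)
model.fit(x.reshape(-1, 1), y)

lin = model.named_steps["linearregression"]
print("beta_0:", round(lin.intercept_, 2))
print("beta_1..beta_p:", np.round(lin.coef_, 2))
```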
Regularization
Regularization in regression adds a penalty for each parameter
included in the model.
In regularized regression, the loss penalizes both the size of the errors
(the sum of squared residuals) and the magnitude (size) of the coefficients.
This discourages overly complex models, which helps to avoid overfitting.
The two most common types of regularized regression are:
Ridge regression
Lasso regression
Ridge regression:
$\text{Loss} = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} \beta_j^2 \to \min$
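Since $\beta_0$ is not penalised, the Ridge loss has a closed-form minimiser: centre the inputs and output, solve $(X_c^\top X_c + \lambda I)\beta = X_c^\top y_c$ for the slopes, then recover $\beta_0$. A sketch of this under simulated data (the helper names `ridge_loss`/`ridge_fit`, the data, and λ = 10 are assumptions); scikit-learn's `Ridge` minimises the same objective with `alpha` playing the role of λ, so the two should agree.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_loss(X, y, beta0, beta, lam):
    """Ridge loss: sum of squared errors + lam * sum_j beta_j^2."""
    resid = y - beta0 - X @ beta
    return np.sum(resid**2) + lam * np.sum(beta**2)


def ridge_fit(X, y, lam):
    """Closed-form minimiser of ridge_loss (beta_0 is not penalised)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring removes beta_0
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    beta0 = y_mean - x_mean @ beta           # recover the intercept
    return beta0, beta


# Simulated data (assumption): 100 samples, 4 inputs
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = 2.0 + X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)

lam = 10.0
b0, b = ridge_fit(X, y, lam)
print("closed form:", round(b0, 4), np.round(b, 4))

# scikit-learn's Ridge uses the same objective, with alpha = lambda
sk = Ridge(alpha=lam).fit(X, y)
print("sklearn    :", round(sk.intercept_, 4), np.round(sk.coef_, 4))
print("loss       :", round(ridge_loss(X, y, b0, b, lam), 4))
```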
Lasso regression:
$\text{Loss} = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ji} \Big)^2 + \lambda \sum_{j=1}^{k} |\beta_j| \to \min$
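The Lasso loss has no closed form, but it can be evaluated directly and minimised numerically. One detail worth knowing: scikit-learn's `Lasso` minimises $\frac{1}{2n}\,\text{SSE} + \alpha \sum_j |\beta_j|$, so `alpha = lam / (2n)` targets the same minimiser as the loss written here. A sketch on simulated data in which only two inputs matter (the data, λ = 50, and the helper name `lasso_loss` are assumptions).

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_loss(X, y, beta0, beta, lam):
    """Lasso loss: sum of squared errors + lam * sum_j |beta_j|."""
    resid = y - beta0 - X @ beta
    return np.sum(resid**2) + lam * np.sum(np.abs(beta))


rng = np.random.default_rng(4)
n = 100
X = rng.normal(size=(n, 5))
# Only the first two inputs matter in this simulated example
y = 1.0 + X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)

lam = 50.0
# scikit-learn scales the squared-error term by 1/(2n), hence alpha = lam / (2n)
lasso = Lasso(alpha=lam / (2 * n), tol=1e-10, max_iter=100_000).fit(X, y)

print("coefficients:", np.round(lasso.coef_, 3))
print("loss at solution:", round(lasso_loss(X, y, lasso.intercept_, lasso.coef_, lam), 3))
```

Coefficients of irrelevant inputs are typically driven to exactly 0, which is the main practical difference from Ridge.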
LASSO vs Ridge:
Note:
The tuning parameter $\lambda \ge 0$ controls the strength of the penalty term.
When $\lambda = 0$, Ridge/Lasso regression reduces to ordinary least squares regression;
As $\lambda \to \infty$, all penalized parameters $\beta_1, \dots, \beta_k$ shrink toward 0;
The ideal penalty therefore lies somewhere between 0 and $\infty$.
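Both limiting cases can be checked numerically. A sketch using `Ridge` on simulated data (the data and the grid of `alpha` values are assumptions): a near-zero `alpha` reproduces least squares, and the size of the coefficient vector shrinks toward 0 as `alpha` grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4))
y = 0.5 + X @ np.array([2.0, -1.0, 0.5, 1.5]) + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)
print(f"OLS:          ||beta|| = {np.linalg.norm(ols.coef_):.6f}")

alphas = [1e-8, 1.0, 10.0, 100.0, 1e4, 1e8]
norms = []
for a in alphas:
    coef = Ridge(alpha=a).fit(X, y).coef_     # alpha plays the role of lambda
    norms.append(np.linalg.norm(coef))
    print(f"alpha={a:<8g}: ||beta|| = {norms[-1]:.6f}")
```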
Tuning parameter:
Example 3.6
+ Linear regression in Python:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()   # ordinary least squares (no penalty)
lr.fit(X_train, y_train)  # X_train: 2-D array of inputs, y_train: output values
+ Ridge regression:
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.01)  # alpha plays the role of lambda in the Ridge loss
ridge.fit(X_train, y_train)
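+ Lasso regression:
from sklearn.linear_model import Lasso
# scikit-learn's Lasso scales the squared-error term by 1/(2n),
# so alpha corresponds to lambda / (2n) in the Lasso loss
lasso = Lasso(alpha=0.01)
lasso.fit(X_train, y_train)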
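Example 3.6 assumes `X_train` and `y_train` already exist. A self-contained sketch that creates them, fits the three models, and evaluates them on a held-out test set; the synthetic dataset from `make_regression`, the 70/30 split, and the MSE/R² metrics are assumptions, not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 samples, 10 inputs, 4 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=0.01),
    "Lasso": Lasso(alpha=0.01),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)              # learn the parameters
    y_pred = model.predict(X_test)           # predict unseen outputs
    results[name] = (mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred))
    print(f"{name:6s}  test MSE = {results[name][0]:8.2f}   R^2 = {results[name][1]:.3f}")
```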
Teamwork 3:
Use the data in the file regression.csv, where:
Inputs: x1, x2, x3, x4
Output: y
Create a Python script that performs the following tasks:
Loading the required libraries and modules
Loading the data
Creating arrays for the inputs and output variable
Creating the training and test datasets
Building, predicting with, and evaluating the Ridge and Lasso regression models.
Hint: Follow the same structure as the Python files for the previous algorithms.
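A possible skeleton for the Teamwork 3 steps, offered as a sketch rather than a reference solution. It assumes the CSV columns are named exactly `x1`, `x2`, `x3`, `x4`, `y`; the helper name `evaluate_ridge_lasso`, `alpha=0.01`, the 70/30 split, and the MSE/R² metrics are illustrative choices. So the sketch runs without the file, the main block builds a synthetic DataFrame with the same columns where `pd.read_csv("regression.csv")` would go.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_ridge_lasso(df, alpha=0.01, test_size=0.3, seed=42):
    """Fit Ridge and Lasso on columns x1..x4 -> y and return test metrics."""
    # Arrays for the inputs and the output (column names assumed from the task)
    X = df[["x1", "x2", "x3", "x4"]].to_numpy()
    y = df["y"].to_numpy()

    # Training and test datasets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)

    scores = {}
    for name, model in [("Ridge", Ridge(alpha=alpha)), ("Lasso", Lasso(alpha=alpha))]:
        model.fit(X_train, y_train)           # build
        y_pred = model.predict(X_test)        # predict
        scores[name] = {                      # evaluate
            "MSE": mean_squared_error(y_test, y_pred),
            "R2": r2_score(y_test, y_pred),
        }
    return scores


if __name__ == "__main__":
    # For the real exercise: df = pd.read_csv("regression.csv")
    # Here a synthetic DataFrame with the same column names stands in for it.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 4))
    y = 3 + X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=150)
    df = pd.DataFrame(X, columns=["x1", "x2", "x3", "x4"]).assign(y=y)
    for name, s in evaluate_ridge_lasso(df).items():
        print(f"{name}: MSE = {s['MSE']:.3f}, R^2 = {s['R2']:.3f}")
```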
THE END