
INSY 446 – Winter 2023

Data Mining for Business Analytics

Session 3 – Linear Model Part 2


January 23, 2023
Dongliang Sheng
Linear Regression Revisited

§ The core idea of linear regression models is to find a linear relationship between the predictors and the target variable
§ It works well from both a statistical perspective and a data mining perspective
2
Linear Regression Revisited

§ There is one important issue when we use linear regression models in data mining
§ That is, the model is very sensitive to the training dataset

3
Linear Regression Revisited

§ Suppose we separate the data into training (red) and test (blue) sets
§ The relationship estimated from the training data would differ from the "true" relationship (dark grey)
4
Linear Regression Revisited

§ When the model is used on the test dataset (blue), the performance would be subpar
§ In other words, although the performance on the training dataset is reasonably good, the performance on the test dataset is bad
5
The Issue

§ This issue occurs because the objective of the linear regression model is to minimize the sum of squared errors in the training data
§ So, the model has low bias (it performs well on the training data) and high variance (it can perform very badly on the test data)
§ This is similar to the overfitting issue

6
Regularization

§ The idea of the regularization technique is to add a small amount of bias to the model (i.e., making the model perform worse on the training data)
§ In the process, the variance can be reduced (i.e., the model performs better on the test data)
§ Several models operationalize the regularization technique (a small sketch of the trade-off follows)

7
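A minimal sketch of this bias-variance trade-off on synthetic data; the data-generating process and the alpha value below are assumptions for illustration. The penalized model may fit the training data slightly worse yet fit the test data better.

# Bias-variance sketch on synthetic data (illustrative only)
import numpy
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = numpy.random.RandomState(0)
X = rng.uniform(0, 10, size=(30, 1))
y = 2.5 * X.ravel() + rng.normal(0, 5, size=30)   # noisy linear relationship

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=9)

for name, model in [("Linear", LinearRegression()), ("Ridge", Ridge(alpha=10))]:
    model.fit(X_train, y_train)
    print(name,
          "train MSE:", mean_squared_error(y_train, model.predict(X_train)),
          "test MSE:", mean_squared_error(y_test, model.predict(X_test)))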
Example 1
Linear Regression Issues
# Load libraries
import pandas
from matplotlib import pyplot
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Using the whole data
lm1 = LinearRegression()
model1 = lm1.fit(X, y)

# Generate the prediction value
y_pred = model1.predict(X)

# Separate the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 9)

# Run linear regression based on the training data
lm2 = LinearRegression()
model2 = lm2.fit(X_train, y_train)

# Generate the prediction value from the training data
y_train_pred = model2.predict(X_train)
# Generate the prediction value from the test data
y_test_pred = model2.predict(X_test)

# Plot the linear regression line for the whole data
pyplot.plot(X, y_pred, color='black', linewidth=1)
# Plot observations in the training data
pyplot.scatter(X_train, y_train, color='red')
# Plot the linear regression line for the training data
pyplot.plot(X_train, y_train_pred, color='red', linewidth=1)
# Plot observations in the test data
pyplot.scatter(X_test, y_test, color='green')
# Plot the linear regression line for the test data
pyplot.plot(X_test, y_test_pred, color='green', linewidth=1)
pyplot.show()
8
Ridge Regression

§ One of the popular models that utilizes the regularization technique is Ridge Regression
§ Ridge Regression adds bias by changing the objective of the model from minimizing the sum of squared errors (SSE) to minimizing:

SSE + (λ × Slope²)

where λ × Slope² is the additional penalty imposed by Ridge Regression

Note: Slope corresponds to the coefficient of the variable x, which is usually denoted by β in a linear regression model.
9
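To make the objective concrete, here is a tiny hand-computed sketch of SSE + (λ × Slope²); the numbers are made up for illustration. With the same SSE, the steeper line pays the larger penalty.

# Ridge objective on toy numbers (illustrative)
def ridge_objective(y_true, y_pred, slope, lam):
    sse = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return sse + lam * slope ** 2

print(ridge_objective([10, 20], [12, 18], slope=0.5, lam=1.0))   # 8 + 0.25 = 8.25
print(ridge_objective([10, 20], [12, 18], slope=2.0, lam=1.0))   # 8 + 4.00 = 12.0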
Ridge Regression

§ Intuitively, the slope represents the sensitivity of the target variable to changes in the value of the predictor(s)
§ For a steep line (i.e., a large slope), a change in a predictor leads to a large change in the target variable
§ For a flat line (i.e., a small slope), a change in a predictor leads to a small change in the target variable

10
Ridge Regression

§ The parameter λ controls how much the model is penalized for being sensitive to changes in the value of the predictor(s)
§ λ ranges from zero to infinity
§ If λ = 0, the model becomes the traditional linear regression model
§ As λ increases, the model is penalized more when it is sensitive to changes in the value of the predictor(s)
§ In other words, a higher λ pushes the slope of the ridge regression line closer to 0 (a short sketch follows)
11
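A minimal sketch of this effect on the cereals data used in the examples below; the candidate λ values are assumptions, and note that scikit-learn calls the penalty parameter alpha rather than λ.

# Slope shrinks toward 0 as the penalty grows
from sklearn.linear_model import Ridge
import pandas

df = pandas.read_csv("cereals.CSV")
X = df[['Sodium']]
y = df['Rating']

for alpha in [0.01, 1, 10, 100, 1000]:   # illustrative λ values
    ridge = Ridge(alpha=alpha)
    ridge.fit(X, y)
    print(alpha, ridge.coef_)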
Ridge Regression

§ With this penalty term, the ridge regression line has a smaller slope than the linear regression line
§ The ridge regression line fits the test set better than the linear regression line
12
Ridge Regression

§ With trial and error, we can find the value of λ that optimizes the SSE based on the test set
§ In practice, we use cross validation to find the optimal value of λ (a minimal sketch follows)

13
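A minimal sketch using scikit-learn's built-in RidgeCV, which selects the penalty by cross validation; the candidate grid and cv=5 are assumptions for illustration.

# Cross-validated choice of the penalty parameter
from sklearn.linear_model import RidgeCV
import pandas

df = pandas.read_csv("cereals.CSV")
X = df[['Sodium']]
y = df['Rating']

ridge_cv = RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0], cv=5)
ridge_cv.fit(X, y)
print(ridge_cv.alpha_)   # the selected penalty
print(ridge_cv.coef_)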
Example 2
Ridge Regression

# Load libraries
from sklearn.linear_model import Ridge
import pandas

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Separate the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 9)

# Run ridge regression (sklearn calls the penalty parameter alpha;
# alpha=1.0 below is an illustrative value)
ridge = Ridge(alpha=1.0)
model = ridge.fit(X_train, y_train)

# Generate the prediction value from the test data
y_test_pred = model.predict(X_test)

# Calculate the MSE
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_test_pred))
14
Example 3
Penalty Parameter

# Load libraries
from sklearn.linear_model import Ridge
import pandas

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Separate the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 9)

# Run ridge regression with a larger penalty (alpha=100 is illustrative)
ridge = Ridge(alpha=100)
model = ridge.fit(X_train, y_train)

# Print the coefficients
print(model.intercept_)
print(model.coef_)

# Generate the prediction value from the test data
y_test_pred = model.predict(X_test)

# Calculate the MSE
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_test_pred))

15
Example 4
Finding Optimal Alpha

# Load libraries
from sklearn.linear_model import Ridge
import pandas

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Separate the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 9)

# Run the loops (the candidate alpha values are illustrative)
from sklearn.metrics import mean_squared_error
for alpha in [0.01, 0.1, 1, 10, 100, 1000]:
    ridge = Ridge(alpha=alpha)
    model = ridge.fit(X_train, y_train)
    y_test_pred = model.predict(X_test)
    print(alpha, mean_squared_error(y_test, y_test_pred))

16
The Idea of Ridge Regression

§ Ridge Regression adds bias to the model to reduce the variance by using the parameter λ
§ The core idea is to manipulate the slope of the regression line so that it fits the training data less and the test data more
17
LASSO Regression

§ LASSO: Least Absolute Shrinkage and Selection Operator
§ It is also a regression model that utilizes the regularization technique
§ In other words, LASSO also introduces bias to reduce variance
§ It is very similar to Ridge Regression, with one important difference

18
LASSO Regression

§ The objective of Ridge Regression is to minimize:

SSE + (λ × Slope²)

§ In LASSO, the objective is to minimize:

SSE + (λ × |Slope|)

(a toy comparison of the two penalties follows)

19
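A side-by-side sketch of the two penalties for the same λ and slope; the numbers are made up for illustration. Note that the squared penalty grows much faster for large slopes.

# Ridge vs. LASSO penalty on toy numbers (illustrative)
lam, slope, sse = 1.0, -3.0, 8.0
ridge_obj = sse + lam * slope ** 2     # SSE + λ × Slope²  = 17.0
lasso_obj = sse + lam * abs(slope)     # SSE + λ × |Slope| = 11.0
print(ridge_obj, lasso_obj)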
LASSO Regression

§ Similar to Ridge Regression, λ is a parameter of the model that ranges from zero to infinity
§ We usually use cross validation to determine the optimal value of λ (i.e., the value of λ that produces the lowest MSE/MAPE based on the test set); a sketch follows

20
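A minimal sketch using scikit-learn's LassoCV, which picks λ by cross validation; the candidate grid and cv=5 are assumptions for illustration.

# Cross-validated choice of the LASSO penalty
from sklearn.linear_model import LassoCV
import pandas

df = pandas.read_csv("cereals.CSV")
X = df[['Sodium']]
y = df['Rating']

lasso_cv = LassoCV(alphas=[0.1, 1.0, 10.0, 100.0], cv=5)
lasso_cv.fit(X, y)
print(lasso_cv.alpha_)   # the selected penalty
print(lasso_cv.coef_)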
The Role of Lambda

§ Intuitively, the parameter λ penalizes the predictor(s) according to their influence on the target variable
§ So, the penalty that λ imposes is different for different predictors
§ When λ increases, the slope (i.e., coefficient) of each predictor naturally decreases
§ This process is called "shrinking" for both Ridge Regression and LASSO

21
Ridge Regression vs. LASSO

§ In Ridge Regression, the shrinking operation can decrease the slope to be asymptotically zero (i.e., very close to zero)
§ In LASSO, the shrinking operation can decrease the slope to exactly zero
§ Therefore, LASSO generally performs better when there are many useless predictors, since it can shrink their coefficients all the way to zero
§ Ridge Regression tends to perform better when most of the predictors are useful (a sketch of this difference follows)
22
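A sketch of this key difference on synthetic data; the data-generating process and alpha values are assumptions. Only the first of five predictors actually matters, and LASSO zeroes out the rest while Ridge only shrinks them.

# Ridge shrinks useless coefficients; LASSO sets them to exactly 0
import numpy
from sklearn.linear_model import Ridge, Lasso

rng = numpy.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=100)   # predictors 2-5 are useless

print(Ridge(alpha=10).fit(X, y).coef_)    # small but nonzero everywhere
print(Lasso(alpha=0.5).fit(X, y).coef_)   # exact zeros on useless predictors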
Example 5
LASSO

# Load libraries
from sklearn.linear_model import Lasso
import pandas

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Separate the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 5)

# Run LASSO regression based on the training data (alpha=1.0 below is
# an illustrative value for the penalty parameter)
lasso = Lasso(alpha=1.0)
model = lasso.fit(X_train, y_train)

# Generate the prediction value from the test data
y_test_pred = model.predict(X_test)

# Calculate the MSE
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_test_pred))

23
Example 6
Penalty Parameter

# Load libraries
from sklearn.linear_model import Lasso
import pandas

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Separate the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 5)

# Run LASSO regression with a larger penalty (alpha=10 is illustrative)
lasso = Lasso(alpha=10)
model = lasso.fit(X_train, y_train)

# Print the coefficients
print(model.intercept_)
print(model.coef_)

# Generate the prediction value from the test data
y_test_pred = model.predict(X_test)

# Calculate the MSE
from sklearn.metrics import mean_squared_error
print(mean_squared_error(y_test, y_test_pred))

24
Example 7
Finding Optimal Alpha

# Load libraries
from sklearn.linear_model import Lasso
import pandas

# Import data
df = pandas.read_csv("cereals.CSV")

# Construct variables
X = df[['Sodium']]
y = df['Rating']

# Separate the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 5)

# Run the loops (the candidate alpha values are illustrative)
from sklearn.metrics import mean_squared_error
for alpha in [0.01, 0.1, 1, 10, 100, 1000]:
    lasso = Lasso(alpha=alpha)
    model = lasso.fit(X_train, y_train)
    y_test_pred = model.predict(X_test)
    print(alpha, mean_squared_error(y_test, y_test_pred))

25
Exercise #1

§ Use the nutrition.csv dataset
§ Use CALORIES as the target variable and all other variables as predictors
§ Construct a ridge regression model
§ Print all coefficients (including the intercept). You do not have to format the results

26
Exercise #2

§ Use the same dataset as in #1
§ Use CALORIES as the target variable and all other variables as predictors
§ Construct a LASSO model
§ Print all coefficients (including the intercept). You do not have to format the results

27
