0% found this document useful (0 votes)

11 views2 pages

Topic 7 Linear Regression

Uploaded by

alhammadheba77

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views2 pages

Topic 7 Linear Regression

Uploaded by

alhammadheba77

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 2

LinearRegressionMethods

ISE791

Table of Contents
1 Regression
1.1 Learning Outcomes
1.1.1 What is the difference between correlation and regression?
1.2 Regression vs Correlation
2 Linear Regression
2.1 Linear Regression Overview
3 Single Input Single Output (SISO) Linear regression
3.1 Solution Method for SISO Linear Regression
3.1.1 Example: SISO Linear Regression
3.1.2 Gradient Descent for Linear Regression
3.1.3 Example-2: SISO Linear Regression
3.1.3.1 How to find the value β1 and β0 ?
3.2 How does the error function looks for our data?
3.3 What are the best values for β1 and β0 ?
3.4 Where do the best values for β1 and β0 fall in the error function?
3.4.1 Gradient descent algorithm
4 Multiple Input Single Output (MISO) Linear Regression
4.1 Gradient Descent for MISO Linear Regression
4.2 Closed Form Solution for MISO Linear Regression
4.2.1 Example MISO Linear Regression
4.3 Penalized Linear Regression (PLR)
4.3.1 Solution of the Penalized Linear Regression
4.3.2 Example Penalized Regressions
4.3.3 Lasso vs Ridge: Summary
5 Cross-validation for Parameter Selection
6 Case Study
6.1 Case Study - 1
7 Concluding Remarks
7.1 Handling Big-Data
7.2 Concluding Remarks
8 References
8.1 Theory:
8.2 Data Sets:

Regression
A process for modeling the relationship between variables of interest.

Example: If you know the relationship between education and income (the more someone is educated, the more money they make), we could predict someone's income based on their education. Simply speaking, learning such a
relationship is regression.

Learning Outcomes
1. Implement linear and penalized regression methods (CLO-2)
2. Execute gradient descent algorithm for regression problem (CLO-2)
3. Process real data before applying regression (CLO-1)
4. Apply cross-validation and hyper-parameter selection (CLO-4)

What is the difference between correlation and regression?

Question-A: Consider the following data for 10 students who took exam last semester. Column-1 represents the student ID, Column-2 represents the attitude of the students before taking the exam, and Column-3 represents the
score obtained in the exam.

STU-ID Attitude Score

1 65 129

2 67 126

3 68 143

4 70 156

5 71 161

6 72 158

7 72 168

8 73 166

9 73 182

10 75 201

1. Calculate the correlation between Attitude and Score using python.

2. If attitude decrease, then score will increase. True or False. Explain your answer based on Part-1.
3. If attitude of a new student is 74 units, then can you estimated his score before the exam using correlation?

In [1]: # 1. Calculate the correlation between *Attitude* and *Score* using python.

import pandas as pd
# import matplotlib.pyplot as plt
# import numpy as np

df = pd.read_csv('data/Regression-1.csv', delimiter =',')

corr = df.drop(['STU-ID'], axis=1).corr()

#corr = df.loc[:,['Attitude','Score']].corr()
#corr = df.iloc[:,[1,2]].corr()

display(corr)
print(f"The correlation between Attitude and Score is {corr.loc['Attitude','Score']}")

# # Following code can be used to compute the correlation too

# x = df['Attitude']
# y = df['Score']
# r = np.corrcoef(x, y)[0,1]
# print(f'The correlation between Attitude and Score is {r}')

Attitude Score

Attitude 1.00000 0.94179

Score 0.94179 1.00000

The correlation between Attitude and Score is 0.9417903723646914

1. If attitude decrease, then score will increase. True or False. Explain your answer based on Part-1.
False, since the direction of relationship is positive.

2. If attitude of a new student is 74 units, then can you estimated his score before the exam using correlation?
No, the correlation shows the strength and direction of the relationship. However, it cannot depict the actual relation.

Regression vs Correlation
Correlation:

1. Correlation may indicate whether two variables are related or not.

2. However, correlation will not provide information of how one variable is related to another.

Regression:

1. Regression may identify how one or more variables are related to an output variable.
2. Specifically, it will provide details of how input variables affects the output variable.
3. Beyond estimating a relationship, regression is a way of predicting an output variable from one or more input variables.

Linear Regression
The most common form of regression used in data analysis.
It assumes the relationship of the input variables and the output variables is linear (can be expressed as a line or hyperplane).

Linear Regression Overview

Single Input Single Output (SISO) Linear regression

How one input variable is related to one output variable?

We will start with Single Input Single Output (SISO) linear regression.

Notations: Let x be the input variables, and let y be the output variable. The linear regression model can be stated as:

Model:

y = β0 + β1 x ,

where β1 represents the slope of the x, and β0 is the intercept for the equation.

Goal: Linear regression estimates the best values of β0 and β1 . So, when a new or previously unobserved data point x comes with unknown value of y, using the value of x, and estimated β0 and β1 values, one can find estimated
value of y, say y^. The goal of linear regression is to have y^ as close as possible to y.

Solution Method for SISO Linear Regression

From statistical analysis, it has been shown that the coefficients can be estimated as follows:

sdy
β1 = r
sdx

and

β0 = ¯y¯¯ − β1¯x¯¯

where r is the Pearson's correlation coefficient, sdx and sdy represent the standard deviation of x and y variables respectively, ¯x
¯¯ and ¯y
¯¯ represent the means of x and y variables respectively.

Example: SISO Linear Regression

Question-B: Consider the following data from Question-A:

STU-ID Attitude Score

1 65 129

2 67 126

3 68 143

4 70 156

5 71 161

6 72 158

7 72 168

8 73 166

9 73 182

10 75 201

1. Visualize the relationship between Attitude and Score (plot Attitude on x-axis, and Score on y-axis).
2. Is the relationship linear, comment.
3. Identify the linear relationship between Attitude and Score of the students using the formula. Then verify the result using python.
4. If a new participant with positive attitude of 78 is taking the exam, then what is the estimated score of the participant.
5. If new participants with positive attitude of 78, 74, 68 and 69 are taking the exam, then what are the estimated scores for the participants.
6. The estimated score and predicted score of student with attitude of 68 are different. Is there some error in the approach?

In [2]: # 1. Visualize the relationship between Attitude and Score (plot Attitude on x-axis, and Score on y-axis).
%matplotlib inline
# %matplotlib notebook
# %matplotlib qt

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('data/Regression-1.csv', delimiter =',')

plt.figure()
sns.relplot(x='Attitude', y='Score',
color = 'purple', marker = 'D',
kind='scatter',
data=df)
plt.xlabel('Attitude')
plt.ylabel('Score')
plt.show()

<Figure size 432x288 with 0 Axes>

2: Is the relationship linear, comment.

From the plot, it can be seen that the relationship could be linear.

1. Identify the linear relationship between Attitude and Score of the students using the formula. Then verify the result using python.

To identify the relationship, we need to identifying the betas

Using the formula:

Pearson’s correlation coefficient r = 0.94. Mean of Attitude = 70.6, mean of Score = 159.
S.D. of Attitude = 2.94, S.D. of Score = 21.64.

Thus,

21.64
β1 = 0.94 × = 6.93
2.94

β0 = 159 − (6.93 × 70.6) = −330.25

In [3]: # Then verify the result using python.

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear regression object

reg = LinearRegression(fit_intercept=True)

# Train the model using the training sets

reg.fit(df[['Attitude']], df['Score']) #single column
# reg.fit(df['Attitude'].values.reshape(-1,1), df['Score']) #single column

# The betas are

print(f'Beta1, the slope is {np.round(reg.coef_,2)}; and Beta0, the intercept is {np.round(reg.intercept_,2)}')

## The following code computes betas using the formula and python,
## but without using the skliear library

# import numpy as np
# x=df['Attitude'].values
# y=df['Score'].values
# r = np.corrcoef(x, y)[0,1]
# sdx = np.std(x)
# sdy = np.std(y)
# beta1 = r * (sdy / sdx)
# beta0 = np.mean(y) - beta1*np.mean(x)
# print(r,np.mean(x),np.mean(y),sdx,sdy,beta1,beta0)

Beta1, the slope is [6.93]; and Beta0, the intercept is -330.46

1. If a new participant with positive attitude of 78 is taking the exam, then what is the estimated score of the participant.

From Part-3 we know:

β1 = 6.93

β0 = −330.46

Thus, the new participant's estimated score will be

y^ = −330.46 + (6.93 × 78) = 210.08

In [4]: x_new = 78
y_new = reg.predict(np.array(x_new).reshape(1, -1)) #single value or single record
print(f'The estimated score for the student with attiude of {x_new} is {np.round(y_new,2)}.')

The estimated score for the student with attiude of 78 is [210.3].

In [5]: # 5. If new participants with positive attitude of 78,74,68 and 69 are taking the exam,
# then what are the estimated scores for the participants.

x_new = [78,74,68,69]
y_new = reg.predict(np.array(x_new).reshape(-1, 1)) #single column
print(f'The estimated scores for the students with attiudes of {x_new} are {np.round(y_new,2).tolist()} respectively .')

The estimated scores for the students with attiudes of [78, 74, 68, 69] are [210.3, 182.57, 140.97, 147.91] respectively .

1. The estimated score and predicted score of student with attitude of 68 are different. Is there some error in the approach?
From the plot, we can see that the relationship is not perfectly linear. Thus, the estimated values will not give the perfect results. There will be some error. The linear regression tries to minimize the mean squared error of the
input data points. We will see all these details in the following slides.

Gradient Descent for Linear Regression

The previous solution method is based on Square Error Minimization.
The least square approach has a closed form solution.
However, the closed form solution will not work with large scale data.
In this section, we will look at gradient descent based solution.

Example-2: SISO Linear Regression

Question-C Let us take an example to understand the gradient descent approach. Consider the data given in Regression-2.csv file. Do the following:

1. Read and display the data.

2. Plot the data
3. Can we model the data using a linear relationship? If yes, then suggest the linear model, and highlight the unknowns.

In [1]: # 1. Read and display the data.

import pandas as pd
df = pd.read_csv('data/Regression-2.csv', delimiter =',')
display(df.T)
# df.describe().T

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

y 3 4 8 4 6 9 8 12 15 26 35 40 45 54 49 59 60 62 63 68

In [2]: # 2. Plot the data

%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# df = pd.read_csv('data/Regression-2.csv', delimiter =',')

plt.figure()
sns.relplot(x='x', y='y', color = 'r', marker = 'o',kind='scatter',data=df)
plt.xlabel('x')
plt.ylabel('y')
plt.show()

<Figure size 432x288 with 0 Axes>

1. Can we model the data using a linear relationship? If yes, then suggest the linear model, and highlight the unknowns.

The data can be modeled using a linear relationship (more or less).

let us denote the data as xi , yi , where i = 1, … , n, and n is the total observations.
Let us assume, the equation of the line (or the linear relation) is:

y = β1 x + β0

where β1 is the line's slope and β0 is the line's y-intercept are the two unknowns.

How to find the value β1 and β0 ?

Idea:

1. try different values of β1 and β0 , and

2. find the best values.

Mechanism:

1. define an error function (a.k.a cost function), that measures how good a given line is,
2. for a given value of β1 and β0 , the error function returns an error value based on how well the line fits our data.

Squared Error Function:

1. one of the classical ways to measure the error value,

2. the error value is the summation of square of errors for each point,
3. linear regression with such error function is called as Ordinary Least Square (OLS) regression,
4. Formally, this error function looks like:

1 n
ϵ= ∑((β1 xi + β0 ) − yi )2 .
n
i=1

Solution Method:

1. values of value of β1 and β0 that minimizes the above function will give best line fitted for our data.
2. minimization of the above function can be done using different algorithms.
3. We will focus on gradient descent algorithm.

How does the error function looks for our data?

In [12… # The following code is not for the course. It is just a depiction of the error function.
# The code details should be skipped.

%matplotlib inline
# %matplotlib qt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

df = pd.read_csv('data/Regression-2.csv', delimiter =',')

x = df['x'].values
y = df['y'].values

b1Grid = np.linspace(0,8)
b0Grid = np.linspace(-20,10)

def error_function(x,y,b0,b1,e=0):
for i,yi in enumerate(y):
e += np.square((b1*x[i]+b0) - yi)
return e/len(y)

bv, mv = np.meshgrid(b0Grid, b1Grid)

z = error_function(x,y,bv,mv)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(mv, bv, z, color='b', alpha=0.3)
ax.contour(mv, bv, z, 100, offset =-1000)
ax.set_xlabel('b1')
ax.set_ylabel('b0')
ax.set_zlabel('Z')
ax.view_init(elev=40, azim=120)
plt.show()

What are the best values for β1 and β0 ?

In [9]: from sklearn.linear_model import LinearRegression

reg = LinearRegression(fit_intercept=True)
reg.fit(df[['x']], df['y']) #single column
# reg.fit(df['x'].values.reshape(-1,1), df['y']) #single column

best_b1 = reg.coef_[0]
best_b0 = reg.intercept_
print(f'The best values for b1 and b0 are {np.round(best_b1,2)} and {np.round(best_b0,2)} respectively.')

The best values for b1 and b0 are 3.98 and -10.26 respectively.

Where do the best values for β1 and β0 fall in the error function?
The values of β1 and β0 fall at the lowest point of the error function.

Gradient descent algorithm

The following Squared error function is convex or bowl shape:

1 n
ϵ= ∑((β1 xi + β0 ) − yi )2 .
n
i=1

The partial derivatives w.r.t β1 and β0 are:

∂ϵ 2 n
= ∑((β1 xi + β0 ) − yi )xi ,
∂β1 n
i=1

∂ϵ 2 n
= ∑((β1 xi + β0 ) − yi ).
∂β0 n
i=1

Algorithm:
Initialize:
Start with random β1 and β0 values, say β old and β old .
1 0
Calculate ϵold
repeat {
Update:
β1new = β1old − λ n2 ∑ni=1 ((β1old xi + β0old ) − yi )xi
β0new = β0old − λ n2 ∑ni=1 ((β1old xi + β0old ) − yi )
Calculate ϵnew
} until (ϵnew >= ϵold )

In the above algorithm λ > 0 is step size, and is usually selected as a small number. Here we do not have to consider the second order information because the function is convex.

In [10… # Implementation of gradient descent method.

def error_function(x,y,b0,b1):
e=0
for i,yi in enumerate(y):
e += ((b1*x[i]+b0) - yi)**2
return e/len(y)

def partial_b1(x,y,b0,b1):
e=0
for i,yi in enumerate(y):
e += ((b1*x[i]+b0) - yi)*x[i]
return 2*e/len(y)

def partial_b0(x,y,b0,b1):
e=0
for i,yi in enumerate(y):
e += ((b1*x[i]+b0) - yi)
return 2*e/len(y)

# Given

df = pd.read_csv('data/Regression-2.csv', delimiter =',')

x = df['x'].values
y = df['y'].values

#initialize
b1_Old = 10
b0_Old = 10
mse_Old = error_function(x,y,b0_Old,b1_Old)
lam = 0.005
print('Initial Values:', b0_Old, b1_Old, mse_Old,'\n')

##repeate
while True:
b1_New = b1_Old - lam * partial_b1(x,y,b0_Old,b1_Old)
b0_New = b0_Old - lam * partial_b0(x,y,b0_Old,b1_Old)
mse_New = error_function(x,y,b0_New,b1_New)
if mse_New >= mse_Old:
b0_New,b1_New,mse_New = b0_Old,b1_Old,mse_Old
break
else:
b0_Old,b1_Old,mse_Old = b0_New,b1_New,mse_New

print('Final values:', b0_New, b1_New, mse_New)

Initial Values: 10 10 8210.8

Final values: -10.263152927431642 3.977443244976516 32.53308270677262

Multiple Input Single Output (MISO) Linear Regression

How multiple input variables are related to one output variable?

So far we have seen one input variable's relationship with one output variable.
In many practical applications, we will be dealing with multiple input variables.
To incorporate multiple variables, consider the following notations:
Let the n observations be represented by (xi , yi ), for i = 1, … , n
where xi is a vector in RP , and P are the number of input variables.
Let β0 , the intercept
Let βj be the coefficients of variable xi , for j = 1, … , P , and for i = 1, … , n
j
For ease of notation, we introduce xi0 = 1 for i = 1, … , n

Based on the notations, the linear relationship is defined as follows:

P
h(xi ) = ∑ βj xij ∀i = 1, … , n.
j=0

Moreover, the error function can be defined as:

1 n 2
J(β) = ∑ (h(xi ) − yi ) .
n
i=0

Gradient Descent for MISO Linear Regression

In this section, we extend the gradient descent approach that we have developed for SISO case.
Let us take partial derivatives of the Squared error function, for each parameter βj :

∂ 2 n
J(β) = ∑ (h(xi ) − yi ) xij
∂βj n
i=0

The update mechanism can be defined as:

2 n
βj := βj − λ ∑ (h(xi ) − yi ) xij
n
i=0

Thus, the gradient descent algorithm will be modified as follows:

Algorithm:
Initialize:
Start with random βj for j = 0, 1, … , P
Calculate J(β)old
repeat {
Update:
βj := βj − λ n1 ∑ni=0 (h(xi ) − yi ) xij ∀j = 0, 1, … , P
Calculate J(β)new
} until (J(β)new >= J(β)old )

In the above algorithm λ > 0 is step size, and is usually selected as a small number. Here we do not have to consider the second order information because the J(β) function is convex.

Closed Form Solution for MISO Linear Regression

For the squared error function, the solution can be obtained as follows:

β∗ = (XT X)−1 XT y,

where X ∈ Rn×(P +1) , each column of X represent an input variable, the first column of X contains all ones, and y ∈ Rn represents the column containing output variable.

Example MISO Linear Regression

Question-D: Consider the data given in the csv file titled "Regression-3.csv".

1. Read and describe the data.

2. Calculate the correlation among all the variables.
3. Identify the top three correlated input variables to the output variable.
4. Calculate the coefficient estimates using OLS closed form.
5. Calculate the coefficient estimates using sci-kit learn LinearRegression module.

In [6]: # 1. Read and describe the data.

import numpy as np
import pandas as pd

df = pd.read_csv('data/Regression-3.csv', delimiter =',')

#display(df.head())
display(df.describe().T)

count mean std min 25% 50% 75% max

x1 97.0 1.030930e-11 1.005195 -2.300218 -0.713997 0.082650 0.662694 2.107397

x2 97.0 2.061865e-11 1.005195 -2.942386 -0.593769 -0.013927 0.580608 2.701661

x3 97.0 -2.061859e-11 1.005195 -3.087227 -0.521961 0.153109 0.558151 2.043304

x4 97.0 -2.061855e-10 1.005195 -1.030029 -1.030029 0.138397 1.010033 1.542252

x5 97.0 -6.185566e-11 1.005195 -0.525657 -0.525657 -0.525657 -0.525657 1.902379

x6 97.0 -1.030927e-10 1.005195 -0.867655 -0.867655 -0.445098 0.976274 2.216735

x7 97.0 5.154621e-11 1.005195 -1.047571 -1.047571 0.344407 0.344407 3.128363

x8 97.0 -1.030929e-10 1.005195 -0.868957 -0.868957 -0.334356 0.556647 2.695054

y 97.0 -3.092764e-11 1.005195 -2.533318 -0.650257 0.098514 0.503299 2.703452

In [2]: # 2: Calculate the correlation among all the variables.

corr = df.corr()

corr.style.background_gradient(cmap='coolwarm').set_precision(4) # see pandas.DataFrame.style

# display(corr)

# import seaborn as sns

# %matplotlib inline
# sns.heatmap(corr)

C:\Users\TAHIR~1.MAH\AppData\Local\Temp/ipykernel_2780/1861820697.py:4: FutureWarning: this method is deprecated in favour of `Styler.format(precision=..)`

corr.style.background_gradient(cmap='coolwarm').set_precision(4) # see pandas.DataFrame.style
Out[2]: x1 x2 x3 x4 x5 x6 x7 x8 y

x1 1.0000 0.2805 0.2250 0.0273 0.5388 0.6753 0.4324 0.4337 0.7345

x2 0.2805 1.0000 0.3480 0.4423 0.1554 0.1645 0.0569 0.1074 0.4333

x3 0.2250 0.3480 1.0000 0.3502 0.1177 0.1277 0.2689 0.2761 0.1696

x4 0.0273 0.4423 0.3502 1.0000 -0.0858 -0.0070 0.0778 0.0785 0.1798

x5 0.5388 0.1554 0.1177 -0.0858 1.0000 0.6731 0.3204 0.4576 0.5662

x6 0.6753 0.1645 0.1277 -0.0070 0.6731 1.0000 0.5148 0.6315 0.5488

x7 0.4324 0.0569 0.2689 0.0778 0.3204 0.5148 1.0000 0.7519 0.3690

x8 0.4337 0.1074 0.2761 0.0785 0.4576 0.6315 0.7519 1.0000 0.4223

y 0.7345 0.4333 0.1696 0.1798 0.5662 0.5488 0.3690 0.4223 1.0000

In [7]: # 3. Identify the top three correlated input variables to the output variable.

corr=corr.apply(lambda x: np.abs(x))

sorted_corr = corr.sort_values(by=['y'], ascending=False) # negative values will be considered using abs.

sorted_corr = sorted_corr['y'].index
print('The top three correlated input variables are: ',sorted_corr[1:4].tolist())
print(sorted_corr)

The top three correlated input variables are: ['x1', 'x5', 'x6']
Index(['y', 'x1', 'x5', 'x6', 'x2', 'x8', 'x7', 'x4', 'x3'], dtype='object')

In [8]: #4: Calculate the coefficient estimates using OLS closed form.
# import numpy as np

Xo = df.iloc[:,0:-1].values
y = df.iloc[:,-1].values
X = np.c_[np.ones(len(y)), Xo]
#print(Xo)
#print(y)
#print(X)

## or using the labels

# Xo = df.loc[:,'x1':'x8'].values
# y = df['y'].values
# X = np.hstack([np.ones(len(y)).reshape(-1,1), Xo])

best_beta = np.linalg.inv(X.T @ X) @ X.T @ y

print('The closed form estimates are:', np.round(best_beta,2).tolist())

The closed form estimates are: [-0.0, 0.58, 0.23, -0.14, 0.12, 0.27, -0.13, 0.03, 0.11]

In [15… # 5: Calculate the coefficient estimates using scikit learn LinearRegression module.
from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(Xo, y)

best_beta = np.round(reg.coef_,2)
best_beta_0 = np.round(reg.intercept_,2)

print(f'The best values for the estimates are :', best_beta_0, best_beta.tolist())

The best values for the estimates are : -0.0 [0.58, 0.23, -0.14, 0.12, 0.27, -0.13, 0.03, 0.11]

Penalized Linear Regression (PLR)

(Book-2)

The coefficients obtained by minimizing the squared error of the training data, may not perform in general good on the new data.
This phenomenon is called as over-fitting.
In order to control over-fitting, a penalized term (or regularization term) is added to the error function.
The aim of the penalized function is to set some of the coefficients to zero.

In PLR, the least square error function will be updated as follows:

1 n
∑ (h(xi ) − yi ) + αP (β),
2
J(β) =
n
i=1

where P (β) is the penalty or regularized term, and α is the regularization coefficient.

In our class, we will look at the following two popular penalization linear regression models:
Ridge regression:

1 n P
∑ (h(x ) − yi ) + α ∑ βj2
i 2
n
i=1 j=0

LASSO regression:

1 n P
∑ (h(x ) − yi ) + α ∑ |βj |
i 2
n
i=1 j=0

Solution of the Penalized Linear Regression

Ridge regression:
A closed form solution exists, given as:

β∗ = (XT X + αI)−1 XT y,

where I ∈ R(P +1)×(P +1) is the identity matrix.

LASSO regression:
No closed form solution, but many efficient algorithms exists.

Note: gradient descent can be used for solving the above penalized linear regression problems.

Example Penalized Regressions

Question-E: Consider the data given in Question-D.

1. Find the coefficient estimates using Ridge regression for alpha = 500, using sci-kit learn.
2. Find the coefficient estimates using Lasso regression for alpha = 0.5, using sci-kit learn.
3. Are the coefficients different from OLS. What can you infer.

In [16… #1. Find the coefficient estimates using Ridge regression for 𝑎𝑙𝑝ℎ𝑎=500 , using sci-kit learn.
import pandas as pd
df = pd.read_csv('data/Regression-3.csv', delimiter =',')

Xo = df.iloc[:,0:-1].values
y = df.iloc[:,-1].values

from sklearn.linear_model import Ridge

regr = Ridge(alpha=500)
regr.fit(Xo, y)
best_beta = np.round(regr.coef_,2)
best_beta_0 = np.round(regr.intercept_,2)
print(f'The best values for the estimates are :', best_beta_0, best_beta.tolist())

The best values for the estimates are : -0.0 [0.1, 0.06, 0.01, 0.02, 0.07, 0.06, 0.04, 0.04]

In [17… #2. Find the coefficient estimates using Lasso regression for 𝑎𝑙𝑝ℎ𝑎=0.5 , using sci-kit learn.
from sklearn.linear_model import Lasso
regl = Lasso(alpha=0.5)
regl.fit(Xo, y)
best_beta = np.round(regl.coef_,2)
best_beta_0 = np.round(regl.intercept_,2)
print(f'The best values for the estimates are :', best_beta_0, best_beta.tolist())

The best values for the estimates are : -0.0 [0.23, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

1. Are the coefficients different from OLS. What can you infer.

Yes, the coefficients are different due to penalized objective. Following can be inferred:

OLS vs Ridge: OLS and Ridge have all non-zero coefficients. However, the coefficients given by Ridge are less in terms of absolute value (shrunken coefficients)

OLS vs Lasso: OLS has all non-zero coefficients, whereas Lasso has ONE non-zero coefficients (selected coefficient). Thus, Lasso claims that only the first column is relevant to the output variable, for alpha=0.5.

Lasso vs Ridge: Summary

Ridge regression shrinks the estimate values, it is useful when there are correlated input columns.
Lasso does feature selection, it is useful when there are unrelated input columns.
The quality of solution depends on the value of the hyper-parameter α
Selecting the best value of α generates better results than OLS!

How to identify/select the best value of the hyper-parameter?

Cross-validation for Parameter Selection

An idea to find best value of the hyper-parameters

Train-Testing:

Two data sets (training and testing) are available.

The idea is to build the model on training data, and test the model on testing data.
The testing data is usually smaller dataset compared to the training data.
The aim is to test the model's generalizability.
Other variations: Training-testing-validation datasets.
to goal is to reduce the level of overfitting.
the training is done a little bit more rigorously.
validation set is for validating the model's accuracy before the actual test on testing data.

Cross-validation (CV):

It is used when there is no predefined training/validation/testing data.

It is one of the pragmatic approaches for parameter selection.
The idea is to partition the data into training/validation/testing sets.
Holdout method
k-fold method
Leave-one-out method
Monte-Carlo method
The aim is to improve the model's generalizability.

Case Study
Let's test the power of linear regression

Case Study - 1
Data:
Historical data of real estate prices per unit area are collected for a city. Following 6 input features are recorded over the time:
X1=the transaction date
X2=the house age (unit: year)
X3=the distance to the nearest MRT station (unit: meter)
X4=the number of nearby convenience stores (integer)
X5=the geographic coordinate, latitude. (unit: degree)
X6=the geographic coordinate, longitude. (unit: degree)

Values of the 6 input features for (one observation) is written in a row. Furthermore, the corresponding real estate price per unit area (output variable) values are stored under the column Y in the corresponding row. The data is given
in Regression-4.csv file.

Hypothesis:
Our underlying hypothesis is that the input variables are linearly related with the output variable.

Objective:
The objective of this case study is to identify the input variables' relationship with the output variable. Specifically, conduct a regression analysis, and estimate the best coefficients that capture the underlying relationship. Note: It is
not necessary that all the input variables are related to the output variable.

In [4]: # Reading & describing the data

import pandas as pd
df = pd.read_csv('data/Regression-4.csv', delimiter = ',')
display(df.info())
df.describe().T

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 414 entries, 0 to 413
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 X1 transaction date 414 non-null float64
1 X2 house age 414 non-null float64
2 X3 distance to the nearest MRT station 414 non-null float64
3 X4 number of convenience stores 414 non-null int64
4 X5 latitude 414 non-null float64
5 X6 longitude 414 non-null float64
6 Y house price of unit area 414 non-null float64
dtypes: float64(6), int64(1)
memory usage: 22.8 KB
None
Out[4]: count mean std min 25% 50% 75% max

X1 transaction date 414.0 2013.148971 0.281967 2012.66700 2012.917000 2013.16700 2013.417000 2013.58300

X2 house age 414.0 17.712560 11.392485 0.00000 9.025000 16.10000 28.150000 43.80000

X3 distance to the nearest MRT station 414.0 1083.885689 1262.109595 23.38284 289.324800 492.23130 1454.279000 6488.02100

X4 number of convenience stores 414.0 4.094203 2.945562 0.00000 1.000000 4.00000 6.000000 10.00000

X5 latitude 414.0 24.969030 0.012410 24.93207 24.963000 24.97110 24.977455 25.01459

X6 longitude 414.0 121.533361 0.015347 121.47353 121.528085 121.53863 121.543305 121.56627

Y house price of unit area 414.0 37.980193 13.606488 7.60000 27.700000 38.45000 46.600000 117.50000

In [5]: # Generate Train - Test splits

from sklearn.model_selection import train_test_split
X = df.iloc[:,:-1].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [7]: # Scaling the Train - Test splits

import numpy as np
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit(np.c_[X_train,y_train])

A_train = scaler.transform(np.c_[X_train,y_train])
X_train = A_train[:,:-1]
y_train = A_train[:,-1]

A_test = scaler.transform(np.c_[X_test,y_test])
X_test = A_test[:,:-1]
y_test = A_test[:,-1]

# print(A_train)

In [10… # Regression Analysis: Mean Squared Error Metric

from sklearn.metrics import mean_squared_error

## OLS
from sklearn.linear_model import LinearRegression
reg1 = LinearRegression(fit_intercept=False).fit(X_train, y_train)
y_pred1 = reg1.predict(X_test)
print('The MSE using OLS is:', mean_squared_error(y_test, y_pred1))

## Ridge
from sklearn.linear_model import RidgeCV
reg2 = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3], fit_intercept=False,cv=10).fit(X_train, y_train)
y_pred2 = reg2.predict(X_test)
print('The MSE using Ridge is:', mean_squared_error(y_test, y_pred2))

## Lasso
from sklearn.linear_model import LassoCV
reg3 = LassoCV(alphas=[1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3],
fit_intercept=False,cv=10, random_state=0).fit(X_train, y_train)
y_pred3 = reg3.predict(X_test)
print('The MSE using Lasso is:', mean_squared_error(y_test, y_pred3))

#best_beta = np.round(reg3.coef_,2)
#best_beta_0 = np.round(reg3.intercept_,2)
#print(f'The best values for the estimates are :', best_beta_0, best_beta.tolist())

The MSE using OLS is: 0.3840946602349498

The MSE using Ridge is: 0.3793714233792296
The MSE using Lasso is: 0.3788784599125441
The best values for the estimates are : 0.0 [0.11, -0.19, -0.41, 0.23, 0.21, -0.0]

In [9]: ## Details of the best estimates

print('The best penalty coefficient is:', reg3.alpha_)
print('The best coefficient estimates are:', reg3.coef_)

The best penalty coefficient is: 0.01

The best coefficient estimates are: [ 1.07962770e-01 -1.87811814e-01 -4.08893754e-01 2.28328947e-01
2.14423679e-01 -3.65142883e-04]

Summary of the Case Study-1:

Lasso performs better than OLS and Ridge for this data.
From the regression analysis, we can say that input variable X6 seems to have no relationship with Y.
Given a new information related to X1, X2, X3, X4 & X5, using the above lasso reg3 model, one can predict the price (Y).

Concluding Remarks
Further knowledge in regression

Handling Big-Data
When n is very huge, the close form solution requires high computer memory.
Typically, gradient descent is preferred over closed form solution in such cases.
The gradient descent can be applied in following ways:
Batch mode: The gradient descent algorithm considers all n data points while calculating the gradient
Stochastic/Incremental mode: The gradient descent algorithm considers one of the n data points at a time while calculating the gradient
Mini-Batch mode: The gradient descent algorithm considers a sample from n data points at a time while calculating the gradient
When n is very huge, typically, Stochastic or Mini-batch mode is used.

Concluding Remarks
Accuracy
Training Time
Number of Parameters
Number of Features
Choosing the Right Estimator
ElasticNet

References
Theory:
1. Chirag Shah, "A Hands-On Introduction to Data Science," Cambridge University Press, 2020, Section 3.8.1, 8.3, 8.4, 12.4.
2. Bishop, Christopher M., "Pattern recognition and machine learning," Springer, 2006, Section 3.1.4.

Data Sets:
1. Regression-1 and Regression-2: Chirag Shah, "A Hands-On Introduction to Data Science," Cambridge University Press, 2020.
2. Regression-3: Stamey, Thomas A., et al. "Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients." The Journal of urology 141.5 (1989): 1076-1083.
3. Regression-4: Yeh, I. C., & Hsu, T. K. (2018). Building real estate valuation models with comparative approach through case-based reasoning. Applied Soft Computing, 65, 260-271.

In [ ]:

Ignou Assignment Receiving Slip - ACKNOWLEDGEMENT Slip - Black by Youtube SabOnlineHai
0% (1)
Ignou Assignment Receiving Slip - ACKNOWLEDGEMENT Slip - Black by Youtube SabOnlineHai
1 page
Correlation Regression
No ratings yet
Correlation Regression
16 pages
Chapter 2 Multiple Regression 2
No ratings yet
Chapter 2 Multiple Regression 2
6 pages
Quarter 2 Week 4 Statistics and Probability
No ratings yet
Quarter 2 Week 4 Statistics and Probability
7 pages
Spearman and Pearson
No ratings yet
Spearman and Pearson
2 pages
MBA QN
No ratings yet
MBA QN
6 pages
Decision Science: Sales (INR 000s)
No ratings yet
Decision Science: Sales (INR 000s)
17 pages
Tool For Forecating (Chap 4)
No ratings yet
Tool For Forecating (Chap 4)
22 pages
Customized Lecture 5
No ratings yet
Customized Lecture 5
19 pages
BBA II A Project-1 2022-23
No ratings yet
BBA II A Project-1 2022-23
15 pages
Tablas y Figuras Septima Edicion
No ratings yet
Tablas y Figuras Septima Edicion
6 pages
Practical (Bda)
No ratings yet
Practical (Bda)
15 pages
Module 2 Practice Problems
No ratings yet
Module 2 Practice Problems
21 pages
Linear Regression
100% (1)
Linear Regression
4 pages
Note #10 Correlation and Regression
No ratings yet
Note #10 Correlation and Regression
7 pages
Categorical Slide2024
No ratings yet
Categorical Slide2024
32 pages
Assignment - IV
No ratings yet
Assignment - IV
2 pages
Name: M.Shunmalar Roll No: BC0190044
No ratings yet
Name: M.Shunmalar Roll No: BC0190044
2 pages
Lesson 3 Assignment
No ratings yet
Lesson 3 Assignment
9 pages
Data Science Report
No ratings yet
Data Science Report
35 pages
Ch1 Solutions 1
No ratings yet
Ch1 Solutions 1
59 pages
Attachment 1
No ratings yet
Attachment 1
20 pages
XIME-QT-1 Assignment-II
No ratings yet
XIME-QT-1 Assignment-II
2 pages
TABLE 3.2 Regression Parameter Estimation - Solution
No ratings yet
TABLE 3.2 Regression Parameter Estimation - Solution
14 pages
Decision Science Assignment
No ratings yet
Decision Science Assignment
13 pages
Intro To Regression
No ratings yet
Intro To Regression
30 pages
Correlation and Regression Correlation: Correlation Between 2 Variables X and Y Indicates Whether They Are Related To Each
No ratings yet
Correlation and Regression Correlation: Correlation Between 2 Variables X and Y Indicates Whether They Are Related To Each
18 pages
ch13 Multiple Regression
No ratings yet
ch13 Multiple Regression
37 pages
Symbiosis Institute of Business Management, Pune
No ratings yet
Symbiosis Institute of Business Management, Pune
3 pages
Statistics Classwork #3-2
No ratings yet
Statistics Classwork #3-2
13 pages
Company Profile
No ratings yet
Company Profile
8 pages
2022 CFA Program Curriculum Level II Box Set Vol 1 6 1st Edition Cfa Institute Download
No ratings yet
2022 CFA Program Curriculum Level II Box Set Vol 1 6 1st Edition Cfa Institute Download
44 pages
Lab Question Set
No ratings yet
Lab Question Set
7 pages
Relative - Mixed - Absolute References
No ratings yet
Relative - Mixed - Absolute References
17 pages
QB Stat
No ratings yet
QB Stat
7 pages
Statistics For Engineers (Lab) Digital Assignment-2: 1.correlation
No ratings yet
Statistics For Engineers (Lab) Digital Assignment-2: 1.correlation
4 pages
Delta Wire Corporation Uses Training As A Weapon 1
No ratings yet
Delta Wire Corporation Uses Training As A Weapon 1
12 pages
Exercise 4 Trip Distribution: Work Trips
No ratings yet
Exercise 4 Trip Distribution: Work Trips
18 pages
Ricky Caibog BSA-1: No - of Observations
No ratings yet
Ricky Caibog BSA-1: No - of Observations
14 pages
SFSF
No ratings yet
SFSF
6 pages
APPENDIX D - INTRODUCTION TO SAS - Introduction To Linear Regression Analysis, 5th Edition
No ratings yet
APPENDIX D - INTRODUCTION TO SAS - Introduction To Linear Regression Analysis, 5th Edition
9 pages
3rd Attempt Revised
No ratings yet
3rd Attempt Revised
137 pages
Teja 2585
No ratings yet
Teja 2585
7 pages
Advanced Statistics Problems (New) 1
No ratings yet
Advanced Statistics Problems (New) 1
5 pages
EXCEL Assignment
No ratings yet
EXCEL Assignment
7 pages
Training On Excel - Day1 - 13 5 16 Devender Jalan
No ratings yet
Training On Excel - Day1 - 13 5 16 Devender Jalan
28 pages
RSHH Qam12 ch04
No ratings yet
RSHH Qam12 ch04
83 pages
aSUMSI KLASIK
No ratings yet
aSUMSI KLASIK
4 pages
Financial Statement Analysis and Business Valuation CgnXN64hU2
No ratings yet
Financial Statement Analysis and Business Valuation CgnXN64hU2
18 pages
Calculate Growth
No ratings yet
Calculate Growth
5 pages
Comparative Analysis
No ratings yet
Comparative Analysis
3 pages
DTE-2 R Language Paper
No ratings yet
DTE-2 R Language Paper
8 pages
Group D Research Methodology
No ratings yet
Group D Research Methodology
6 pages
New - Purchase Plan Template
No ratings yet
New - Purchase Plan Template
12 pages
Chapter 5 - Forecasting Techniques and Analysing Data
No ratings yet
Chapter 5 - Forecasting Techniques and Analysing Data
15 pages
Linear Models SEM-IV (Part2) Main (1) - 119-133
No ratings yet
Linear Models SEM-IV (Part2) Main (1) - 119-133
15 pages
CH 11
No ratings yet
CH 11
76 pages
Business Statistics, 4e: by Ken Black
No ratings yet
Business Statistics, 4e: by Ken Black
38 pages
Reinforcement Learning: A Practical Guide to Algorithms
From Everand
Reinforcement Learning: A Practical Guide to Algorithms
Trilokesh Khatri
No ratings yet
Técnicas Estadísticas para la Ciencia de Datos a través de R. Aprendizaje Supervisado: Análisis Discriminante, Árboles de Decisión, Redes Neuronales y Modelos Lineales Generalizados
From Everand
Técnicas Estadísticas para la Ciencia de Datos a través de R. Aprendizaje Supervisado: Análisis Discriminante, Árboles de Decisión, Redes Neuronales y Modelos Lineales Generalizados
César Pérez López
No ratings yet
Topic 8-ISE291 Shimaa Alwadie
No ratings yet
Topic 8-ISE291 Shimaa Alwadie
10 pages
Topic 5-ISE291 Shimaa Alwadie
No ratings yet
Topic 5-ISE291 Shimaa Alwadie
8 pages
Topic 9 Basic Clustering Methods
No ratings yet
Topic 9 Basic Clustering Methods
3 pages
Kobe Bryant - The Game of His Life (PDFDrive)
No ratings yet
Kobe Bryant - The Game of His Life (PDFDrive)
163 pages
Circuit Globe: All About Electrical and Electronics
No ratings yet
Circuit Globe: All About Electrical and Electronics
6 pages
Business Plan Ambito Final Submission
No ratings yet
Business Plan Ambito Final Submission
17 pages
Nalanda Bulletin Issue No 19
No ratings yet
Nalanda Bulletin Issue No 19
13 pages
Structured Denoising Diffusion Models in Discrete State-Spaces
No ratings yet
Structured Denoising Diffusion Models in Discrete State-Spaces
33 pages
Integration of Facilities Management (FM) Practices With Building Information Modeling (BIM)
No ratings yet
Integration of Facilities Management (FM) Practices With Building Information Modeling (BIM)
8 pages
HHT Manual
No ratings yet
HHT Manual
37 pages
Her First Ball-Katherine Mansfield
100% (1)
Her First Ball-Katherine Mansfield
1 page
Traffic Surveillance Camera Calibration by 3D Model B - 2017 - Computer Vision A
No ratings yet
Traffic Surveillance Camera Calibration by 3D Model B - 2017 - Computer Vision A
12 pages
Chapter 1 Introduction To Sales Management
No ratings yet
Chapter 1 Introduction To Sales Management
46 pages
2040 4 0 SV8100 ACD Supervisor
No ratings yet
2040 4 0 SV8100 ACD Supervisor
458 pages
Beamer - Customize Mini Frame Navigation - TeX - LaTeX Stack Exchange2
No ratings yet
Beamer - Customize Mini Frame Navigation - TeX - LaTeX Stack Exchange2
4 pages
Mechatronics Design 1 Second Mid Term Exam 2010-2011
100% (1)
Mechatronics Design 1 Second Mid Term Exam 2010-2011
2 pages
Yemen WASH Cluster Cholera SOPs v3.1 ZD
No ratings yet
Yemen WASH Cluster Cholera SOPs v3.1 ZD
42 pages
Abrazadera para Cable XLP
No ratings yet
Abrazadera para Cable XLP
6 pages
Date Sheet For The Associate Degree in ArtsScience Part I Annual Exami89783
No ratings yet
Date Sheet For The Associate Degree in ArtsScience Part I Annual Exami89783
2 pages
Mid Sem Notes 6th Semester
No ratings yet
Mid Sem Notes 6th Semester
22 pages
G-30 Entrance Exam Syllabus
No ratings yet
G-30 Entrance Exam Syllabus
1 page
(DOC) From Lessons To Life - Authentic Materials Bridge The Gap - Maria Spelleri
No ratings yet
(DOC) From Lessons To Life - Authentic Materials Bridge The Gap - Maria Spelleri
4 pages
D T Pronunciation Maze
No ratings yet
D T Pronunciation Maze
4 pages
CW COMP1252 159762 st846 20100528 124047 0910
No ratings yet
CW COMP1252 159762 st846 20100528 124047 0910
67 pages
Midterm Quizzes
No ratings yet
Midterm Quizzes
7 pages
Camera Quick Start Guide
No ratings yet
Camera Quick Start Guide
20 pages
Elisabeth Johanne Rook: Curriculum Vitae
No ratings yet
Elisabeth Johanne Rook: Curriculum Vitae
2 pages
2019-AGA-DDSEP-9-Chapter-2-1557871764749 1 ENFERMEDADES DEL ESTOMAGO
No ratings yet
2019-AGA-DDSEP-9-Chapter-2-1557871764749 1 ENFERMEDADES DEL ESTOMAGO
24 pages
India Local Services Lists - Hospital List For India July 19.odt
No ratings yet
India Local Services Lists - Hospital List For India July 19.odt
38 pages
EXPEDIA Remittancedetails
No ratings yet
EXPEDIA Remittancedetails
18 pages
Bachelor of Technology: Prediction of Used Car Prices Using Artificial Neural Networks and Machine Learning
No ratings yet
Bachelor of Technology: Prediction of Used Car Prices Using Artificial Neural Networks and Machine Learning
47 pages
Data For Hijra Year 1421-1520
No ratings yet
Data For Hijra Year 1421-1520
52 pages
Multimedia Resources As Learning Tools
No ratings yet
Multimedia Resources As Learning Tools
10 pages