Lecture 9-10 - Updated Version S25 - Regression

The document covers Linear Regression, detailing both Simple and Multiple Linear Regression models, including their mathematical formulations and applications. It discusses the importance of regression for predicting response variables and the assumptions required for the data. Additionally, it includes examples and methods for calculating regression coefficients, error measures, and the correlation coefficient.


Linear Regression

Lecture 9&10
INSTRUCTOR:
DR. MAHA AMIN HASSANEIN
PROFESSOR, ENGINEERING MATHEMATICS AND PHYSICS DEPARTMENT
FACULTY OF ENGINEERING
CAIRO UNIVERSITY
Study Outline
Simple Linear Regression Model
Multiple Linear Regression
Variance-Covariance Matrix
R² Goodness of Fit

SPRING 2025 PROF. DR. MAHA A. HASSANEIN


Why Regression?
To study the relationship between two or more variables, or to predict a response variable Y from predictor variables Xᵢ.
Regression measures the direction and strength of the linear relationship between the dependent variable and the independent variables.

Assumptions about the data:
Linearity – Independence – Normally distributed error terms

Linear Regression Models
A simple linear regression model:

Y = α + βx

where α is the intercept and β is the slope.

A multiple regression model with more than one independent variable (regressors) is given by the probabilistic model:

Y = α + β₁x₁ + β₂x₂ + ⋯ + βₚxₚ

Example: Housing dataset

The response Y (dependent variable): price

The predictors X (independent variables): lotsize, bedrooms, bathrooms, stories, driveway, recroom, fullbase, gashw, airco, garageplace.

Training data size: N = 546 records



Ex. Housing Data Scatter Plot



Simple Linear Regression Model
Statistical model for the regression line:

Y = α + βx + ε

α: the intercept
β: the slope
ε: the model error, ε ~ N(0, σ²)
where σ² is the residual variance (error variance).

[Figure: data points about the regression line, with errors ε₁, ε₂, ε₃ shown as vertical deviations]



The Method of Least Squares
Goal: find a and b in ŷ = a + bx so that the sum of squares of the errors about the regression line is a minimum.

[Figure: fitted line with the residuals eᵢ shown as vertical deviations of the data points from the line]



Computing a and b
Data: {(xᵢ, yᵢ), i = 1, 2, …, n}. The least squares method finds the a and b that minimize the sum of squares error (SSE):

Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

One can show that the minimizers are

b = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / (n Σxᵢ² − (Σxᵢ)²)

and

a = ȳ − b x̄
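As a sketch, the closed-form formulas above can be computed directly from the raw sums. The small dataset below is hypothetical, chosen only to exercise the formulas:

```python
# Hypothetical data, used only to illustrate the closed-form least-squares formulas
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.1, 7.9]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_xx = sum(x * x for x in xs)

# b = (n Σxy − Σx Σy) / (n Σxx − (Σx)²),  a = ȳ − b x̄
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n
print(a, b)  # ≈ 0.15  1.95
```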



Example 1
Fit a straight line to the data by the method of least
squares, and use it to estimate y at x=10. Then plot the
line.



Solution
x̄ = 6.4, ȳ = 7.0
b ≈ −1.1, a = 14.0

The fitted line: y = 14.0 − 1.1x

The predicted value at x = 10:
y = 14 − 1.1 · 10 = 3



S-Notation
Using the notations:
Σx = Σxᵢ,  Σy = Σyᵢ,  Σxy = Σxᵢyᵢ,  Σxx = Σxᵢ²,  Σyy = Σyᵢ²
define the sums of squares and products as:

Sxx = Σᵢ₌₁ⁿ (xᵢ − x̄)² = Σxx − (Σx)²/n

Syy = Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σyy − (Σy)²/n

Sxy = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = Σxy − (Σx)(Σy)/n
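The shortcut forms on the right-hand side can be sketched in a few lines; the data are the same hypothetical values used earlier, chosen only to exercise the formulas:

```python
# Hypothetical data; the S-quantities use the computational shortcut forms
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.1, 7.9]
n = len(xs)

Sum_x, Sum_y = sum(xs), sum(ys)
Sxx = sum(x * x for x in xs) - Sum_x ** 2 / n                  # Σxx − (Σx)²/n
Syy = sum(y * y for y in ys) - Sum_y ** 2 / n                  # Σyy − (Σy)²/n
Sxy = sum(x * y for x, y in zip(xs, ys)) - Sum_x * Sum_y / n   # Σxy − (Σx)(Σy)/n
```

Note Sxy / Sxx reproduces the slope b = 1.95 obtained from the raw-sum formula.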



Simpler Formulas
The simple linear regression coefficients in S-notation:

b = Sxy / Sxx
a = ȳ − b x̄

The error sum of squares (SSE):

SSE = Syy − Sxy² / Sxx

An estimate of the residual variance σ² is

sₑ² = SSE / (n − 2)



Example 2
Given the data in the table:
a) Find the linear relationship between solar radiation and the energy output of a solar panel.
b) Predict the energy output for an average daily solar radiation equal to 5.2 kWh/m².
c) Compute the residual sum of squares.

Day | Solar Radiation X (kWh/m²) | Energy Output Y (kWh)
1   | 4.5 | 1800
2   | 5.0 | 2000
3   | 3.5 | 1400
4   | 6.0 | 2400
5   | 5.5 | 2200
6   | 4.0 | 1600
7   | 5.0 | 2000
8   | 4.5 | 1800
9   | 6.5 | 2600
10  | 3.0 | 1100
Solution
Σx = 47.5, Σy = 18900, Σxy = 94200, Σxx = 236.25, Σyy = 3757 × 10⁴

Sxx = 236.25 − 47.5²/10 = 10.625,  Syy = 1849 × 10³,  Sxy = 4425

x̄ = 4.75 kWh/m² and ȳ = 1890 kWh

b = Sxy / Sxx = 4425 / 10.625 = 416.47
a = ȳ − b x̄ = 1890 − 416.47 · 4.75 = −88.2325

The linear regression equation is:
Y = −88.2325 + 416.47 X

The predicted output at X = 5.2 kWh/m² is
Y = −88.2325 + 416.47 · 5.2 ≈ 2077.4 kWh

The residual sum of squares SSE is
SSE = Syy − Sxy²/Sxx = 1849 × 10³ − 4425²/10.625 ≈ 6117.647

r ≈ 0.9983
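Example 2 can be checked end-to-end with a short script (a sketch of the computation above, not library code):

```python
x = [4.5, 5.0, 3.5, 6.0, 5.5, 4.0, 5.0, 4.5, 6.5, 3.0]          # radiation, kWh/m²
y = [1800, 2000, 1400, 2400, 2200, 1600, 2000, 1800, 2600, 1100]  # output, kWh
n = len(x)

Sxx = sum(v * v for v in x) - sum(x) ** 2 / n
Syy = sum(v * v for v in y) - sum(y) ** 2 / n
Sxy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n

b = Sxy / Sxx                       # slope ≈ 416.47
a = sum(y) / n - b * sum(x) / n     # intercept ≈ −88.24
sse = Syy - Sxy ** 2 / Sxx          # residual sum of squares ≈ 6117.6
pred = a + b * 5.2                  # predicted output at 5.2 kWh/m², ≈ 2077.4
```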



The linear regression model in Example 2 fits the data better than that of Example 1.



Correlation Coefficient r
The correlation coefficient is a measure of the strength of the linear relationship between variables:

r = (1/(n − 1)) · Σ(xᵢ − x̄)(yᵢ − ȳ) / (sₓ s_y)

A simpler formula for r:

r = Sxy / √(Sxx Syy)

r is also known as the Pearson correlation coefficient, and
−1 ≤ r ≤ 1
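Continuing with the S-quantities of Example 2, r follows directly from the simpler formula:

```python
import math

# S-quantities computed in Example 2
Sxx, Syy, Sxy = 10.625, 1.849e6, 4425.0

r = Sxy / math.sqrt(Sxx * Syy)   # Pearson correlation coefficient, ≈ 0.998
```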



Multiple Linear Regression
Making predictions of Y based on multiple factors X.
Multiple linear regression equation:

Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε

β₀: the intercept
β₁, β₂, …, βₖ: the slopes
The slopes provide the strength and direction of the relationships.



Illustrative Example
The linear relationship between the response Y (wage) and the predictor variables xᵢ is

yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₃ᵢ + β₄x₄ᵢ + eᵢ

where
x₁ = Female/Male,
x₂ = Age in years,
x₃ = Education level (1–4),
x₄ = Part-time job (1 if not working, 0 if working)



Multiple Linear Regression Model: Matrix Form
For n observations and k factors:

Y = Xβ + ε

Y is an n × 1 vector
X is an n × (k + 1) design matrix
β is a (k + 1) × 1 vector of unknown parameters
ε is an n × 1 error term
The estimate of the parameter β is denoted β̂ = b.



Least Squares Solution
The least squares solution of
y = Xb
is the b that minimizes the residual error ‖y − Xb‖².
The normal equations:

XᵀX b = Xᵀy

If X is of full rank, that is rank(X) = k + 1, then XᵀX is non-singular and

b = (XᵀX)⁻¹ Xᵀ y
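As a sketch with NumPy: solving the normal equations directly, alongside np.linalg.lstsq, which is the numerically preferred route. The tiny dataset is hypothetical and lies exactly on y = 1 + 2x, so both give a perfect fit:

```python
import numpy as np

# Hypothetical design matrix: intercept column plus one regressor
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([1., 3., 5., 7.])                      # exactly y = 1 + 2x

b_normal = np.linalg.solve(X.T @ X, X.T @ y)        # normal equations
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)     # QR/SVD-based solver
```

Forming XᵀX squares the condition number of the problem, which is why lstsq is preferred for real data.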



Measures of Model Fit
To numerically determine how well the model fits, two measures are used:
1. Sum of squares error (SSE)
2. R-squared (R²)



Sum of Squares Error (SSE)
The sum of squares error of the actual values y from the estimate ŷ = Xb:

SSE = (y − ŷ)ᵀ(y − ŷ) = (y − Xb)ᵀ(y − Xb)

An unbiased estimate of the residual variance σ² in ε ~ N(0, σ²) is given by

sₑ² = SSE / (n − k − 1)



The Coefficient of Determination R²
R² is a measure of how close the data are to the fitted line: the proportion of the variability in the response Y that is explained by the linear model, i.e., it compares the regression model to the simplest model, the mean of the data points.

R² = (sum of squares due to regression) / (total sum of squares)

R² = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)² = SSR / SST = 1 − SSE / SST

SST = SSR + SSE (exercise: prove this)
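As a sketch, R² for the fitted line ŷ = 9 − 2x on the data of Example 3 later in the lecture (x = 0, …, 4 and y = (8, 9, 4, 3, 1)):

```python
y = [8.0, 9.0, 4.0, 3.0, 1.0]       # data of Example 3
yhat = [9.0, 7.0, 5.0, 3.0, 1.0]    # fitted values from ŷ = 9 − 2x
ybar = sum(y) / len(y)

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))   # = 6
sst = sum((yi - ybar) ** 2 for yi in y)                # = 46
r2 = 1 - sse / sst                                     # ≈ 0.87, a strong fit
```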



Values of R-squared
R² ≈ 0: poor fit (SSE ≈ SST)
0 < R² < 0.3: weak fit
0.3 < R² < 0.7: moderate fit
R² > 0.7: strong fit
R² = 1: perfect fit (SSR = SST)



Example 3 (case k = 1)
Use the matrix relations to fit a straight line to the data:

x | 0 | 1 | 2 | 3 | 4
y | 8 | 9 | 4 | 3 | 1

a) Fit the data with a simple linear regression line.
b) Find the sum of squares error SSE and the standard error.
c) Find the 95% confidence intervals of the regression coefficients a and b.



Solution
X' = [1 1 1 1 1; 0 1 2 3 4],   y = (8, 9, 4, 3, 1)'

X'X = [5 10; 10 30],   (X'X)⁻¹ = [0.6 −0.2; −0.2 0.1],   X'y = (25, 30)'

b = (X'X)⁻¹ X'y = [0.6 −0.2; −0.2 0.1] (25, 30)' = (9, −2)'

The fitted equation: ŷ = 9 − 2x

ŷ = Xb = (9, 7, 5, 3, 1)'  and  y − ŷ = (−1, 2, −1, 0, 0)'

→ SSE = 6  and  sₑ² = 6 / (5 − (1 + 1)) = 2.0
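The matrix computation of Example 3 can be reproduced with NumPy (a sketch of the same steps):

```python
import numpy as np

x = np.arange(5.0)                          # 0, 1, 2, 3, 4
y = np.array([8., 9., 4., 3., 1.])
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept column

b = np.linalg.inv(X.T @ X) @ (X.T @ y)      # (X'X)^{-1} X'y = (9, −2)
resid = y - X @ b
sse = float(resid @ resid)                  # 6.0
se2 = sse / (len(y) - 2)                    # 2.0
```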
Example 4
y: number of twists required to break an alloy bar; x1: % of element A in the bar; x2: % of element B in the bar.
Fit a least squares regression plane and use it to estimate the number of twists required to break a bar with x1 = 2.5, x2 = 12.

y  | 41 | 42 | 69 | 40 | 50 | 43
x1 | 1  | 2  | 3  | 1  | 2  | 4
x2 | 5  | 5  | 5  | 10 | 10 | 20



Solution
X' = [1 1 1 1 1 1; 1 2 3 1 2 4; 5 5 5 10 10 20],   y = (41, 42, 69, 40, 50, 43)'

X'X = [6 13 55; 13 35 140; 55 140 675]

(X'X)⁻¹ = [0.915 −0.244 −0.024; −0.244 0.233 −0.028; −0.024 −0.028 0.009]

X'y = (285, 644, 2520)'

Compute b = (X'X)⁻¹ X'y = (43.24, 8.8, −1.615)'

The fitted equation: ŷ = 43.24 + 8.8 x1 − 1.615 x2




ŷ = Xb with b = (43.24, 8.8, −1.615)', giving the residuals

y − ŷ = (−2.959, −10.76, 7.439, 4.116, 5.315, −3.137)'

→ SSE = 234.9  and  sₑ² = 234.9 / (6 − (2 + 1)) = 78.3

SST = 617.5

R² = 1 − SSE/SST = 0.62 → moderate fit

The estimated number of twists at x1 = 2.5, x2 = 12 is
ŷ = 43.24 + 8.8 · 2.5 − 1.615 · 12 ≈ 45.9
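Example 4 can be verified the same way; the prediction at x1 = 2.5, x2 = 12 answers the original question (a sketch using lstsq):

```python
import numpy as np

x1 = np.array([1., 2., 3., 1., 2., 4.])
x2 = np.array([5., 5., 5., 10., 10., 20.])
y = np.array([41., 42., 69., 40., 50., 43.])
X = np.column_stack([np.ones(6), x1, x2])   # design matrix

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # ≈ (43.23, 8.80, −1.615)
sse = float(np.sum((y - X @ b) ** 2))       # ≈ 234.9
se2 = sse / (6 - 3)                         # ≈ 78.3
r2 = 1 - sse / float(np.sum((y - y.mean()) ** 2))  # ≈ 0.62
pred = float(b @ [1.0, 2.5, 12.0])          # ≈ 45.9 twists
```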



Variance-Covariance Matrix
Let (X'X)⁻¹ = C. Then the estimated variances and covariances of the least squares estimators are expressed as

sₑ² · C =
[ Var(b₀)      Cov(b₀, b₁)  ⋯  Cov(b₀, bₖ) ]
[ Cov(b₁, b₀)  Var(b₁)      ⋯  Cov(b₁, bₖ) ]
[ ⋮            ⋮            ⋱  ⋮            ]
[ Cov(bₖ, b₀)  Cov(bₖ, b₁)  ⋯  Var(bₖ)      ]

σ̂²_bᵢ = Var(bᵢ) = cᵢᵢ sₑ²
σ̂_bᵢbⱼ = Cov(bᵢ, bⱼ) = cᵢⱼ sₑ²
Example 3 (continued)
The variance-covariance matrix:

sₑ² · (X'X)⁻¹ = 2.0 · [0.6 −0.2; −0.2 0.1]

The estimated variances and covariance:
σ̂²_b₀ = Var(b₀) = 2.0 · 0.6 = 1.2
σ̂²_b₁ = Var(b₁) = 2.0 · 0.1 = 0.2
Cov(b₁, b₀) = 2.0 · (−0.2) = −0.4
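A sketch of the same computation for Example 3:

```python
import numpy as np

C = np.array([[0.6, -0.2],   # (X'X)^{-1} from Example 3
              [-0.2, 0.1]])
se2 = 2.0                    # residual variance estimate s_e^2

cov_b = se2 * C              # variance-covariance matrix of (b0, b1)
var_b0, var_b1 = np.diag(cov_b)   # 1.2 and 0.2
cov_b0_b1 = cov_b[0, 1]           # −0.4
```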



Inferences Based on the Least Squares Estimators
Statistical model for the simple regression line:

Yᵢ = α + βxᵢ + εᵢ

where the εᵢ are independent, normally distributed random variables, εᵢ ~ N(0, σ²).



Example 3: Confidence Intervals
For 95% confidence, n = 5, df = n − 2 = 3,
t_{α/2} = t_{0.025} = 3.182

CI for the intercept coefficient:
a ∈ 9 ± 3.182 · √1.2 = [5.51, 12.49]

CI for the slope coefficient:
b ∈ −2 ± 3.182 · √0.2 = [−3.42, −0.58]
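The interval arithmetic, as a sketch (half-width = t times the standard error, i.e. t times the square root of the estimated variance):

```python
import math

t = 3.182                   # t_{0.025} with 3 degrees of freedom
a_hat, var_a = 9.0, 1.2     # intercept estimate and its estimated variance
b_hat, var_b = -2.0, 0.2    # slope estimate and its estimated variance

a_ci = (a_hat - t * math.sqrt(var_a), a_hat + t * math.sqrt(var_a))
b_ci = (b_hat - t * math.sqrt(var_b), b_hat + t * math.sqrt(var_b))
```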





Example 4 (continued)
The variance-covariance matrix:

sₑ² · (X'X)⁻¹ = 78.3 · [0.915 −0.244 −0.024; −0.244 0.233 −0.028; −0.024 −0.028 0.009]

The estimated variances and covariances:
σ̂²_b₀ = Var(b₀) = 78.3 · 0.915 = 71.63
σ̂²_b₁ = Var(b₁) = 78.3 · 0.233 = 18.24
σ̂²_b₂ = Var(b₂) = 78.3 · 0.009 = 0.73
Cov(b₁, b₀) = −19.13,  Cov(b₂, b₀) = −1.86,  Cov(b₁, b₂) = −2.224



Example 4: Confidence Intervals
For 95% confidence, n = 6, df = n − 3 = 3,
t_{α/2} = 3.182

CI for the regression coefficients:

b₀ ∈ 43.24 ± 3.182 · √71.63 ≈ [16.30, 70.17]
b₁ ∈ 8.8 ± 3.182 · √18.24 ≈ [−4.79, 22.39]
b₂ ∈ −1.615 ± 3.182 · √0.73 ≈ [−4.33, 1.10]



Example 4: Hypothesis Test
Test the hypothesis that β₂ = −2.0 at the 0.05 level of significance against the alternative hypothesis that β₂ > −2.0.
Solution:
H₀: β₂ = −2.0
Hₐ: β₂ > −2.0
For the 0.05 significance level, n = 6, df = n − 3 = 3, t_α = t_{0.05} = 2.353
b₂ = −1.615, σ̂²_b₂ = 0.73

t_test = (−1.615 + 2.0) / √0.73 = 0.45 < 2.353

Fail to reject H₀.
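The one-sided t-test, as a sketch:

```python
import math

b2_hat = -1.615     # estimate of beta_2 from Example 4
var_b2 = 0.73       # its estimated variance
beta2_null = -2.0   # hypothesized value under H0

t_stat = (b2_hat - beta2_null) / math.sqrt(var_b2)   # ≈ 0.45
t_crit = 2.353                                       # t_{0.05}, df = 3
reject_h0 = t_stat > t_crit                          # False: fail to reject H0
```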



[Figure: scatter plot with regression line and confidence interval]



Other Regression Types



Textbook
Chapter 11, Sec. 11.1 – 11.7

