Linear Regression
Dr. Mohamed Sief
Fayoum University
December 10, 2024
Purpose of Regression
The idea behind regression in the social sciences is that the researcher
would like to find the relationship between two or more variables.
Regression is a statistical technique that allows the scientist to
examine the existence and extent of this relationship.
Correlation vs. Regression
Correlation can tell you how the values of your variables co-vary, but regression analysis aims at a stronger claim: demonstrating how one variable, the independent variable, causes changes in another variable, the dependent variable.
Correlation determines the strength of the relationship between variables, while regression attempts to describe the relationship between these variables.
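The distinction can be made concrete with a short Python sketch (not from the slides; NumPy is assumed), using the data of Example 5.2.2 later in these notes: the correlation coefficient summarizes how strongly X and Y co-vary, while the fitted slope and intercept describe the relationship itself.

```python
import numpy as np

# Lot size (X) and man-hours (Y) from Example 5.2.2 later in these slides.
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

r = np.corrcoef(x, y)[0, 1]    # correlation: strength of the linear association
b, a = np.polyfit(x, y, 1)     # regression: slope b and intercept a of the fitted line

print(f"correlation r = {r:.4f}")                  # close to 1: strong positive association
print(f"fitted line:   y = {a:.1f} + {b:.1f} x")   # describes how Y changes with X
```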
Spurious Correlation
[Figure slides: illustrative examples of spurious correlation (content not extracted).]
Simple Linear Regression Line
Equation
The simple linear regression line of a population describing the linear
relationship between explanatory (or predictor) variable X and the
response variable Y is given by the following relation:
Y = a + bX + ε
Where:
ε is a normal random error term with zero expectation, E(ε) = 0. The presence of ε is what makes the regression model probabilistic rather than deterministic.
a and b are the parameters of the simple regression line: a is the constant term (intercept) and b is the coefficient of the variable X (slope).
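A minimal simulation sketch of this model (plain NumPy, not part of the slides; the parameter values and noise scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

a, b = 10.0, 2.0                                    # illustrative intercept and slope
x = rng.uniform(20, 80, size=100)                   # values of the explanatory variable X
eps = rng.normal(loc=0.0, scale=5.0, size=x.size)   # normal error with E(eps) = 0
y = a + b * x + eps                                 # responses scatter around the line a + b*x
```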
Simple Linear Regression Line
Graph
Proof: Least Squares Method
Introduction
We aim to derive the least squares estimates of the line a + bx using a sample of data points.
Proof: Least Squares Method (Cont’d)
Setup
Consider a sample of n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where the $x_i$ are values of the independent variable and the $y_i$ are the corresponding values of the dependent variable.
Proof: Least Squares Method (Cont’d)
Objective
Our objective is to find the line a + bx that minimizes the sum of the
squares of the vertical distances between the data points and the line.
Proof: Least Squares Method (Cont’d)
Error Function
Let E be the error function, defined as the sum of the squares of the
vertical distances:
$$E = \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2$$
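As a small illustration (plain Python; the function and variable names are illustrative, not from the slides), the error function is simply a sum of squared residuals:

```python
def sum_of_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between the points (x_i, y_i) and the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# For the line 10 + 2x and the two points (30, 73) and (20, 50):
# residuals are 73 - 70 = 3 and 50 - 50 = 0, so E = 9.
print(sum_of_squared_errors(10, 2, [30, 20], [73, 50]))   # 9
```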
Proof: Least Squares Method (Cont’d)
Minimization
To find the line a + bx that minimizes the error function E , we
differentiate E with respect to a and b, set the derivatives equal to zero,
and solve for a and b.
Proof: Least Squares Method (Cont’d)
Partial Derivatives
Differentiating E with respect to a and b gives:
$$\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i)$$
$$\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i)$$
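These expressions can be sanity-checked numerically (an illustrative sketch in plain Python, not part of the slides; the sample values are arbitrary) by comparing them with central finite differences of the error function:

```python
def sse(a, b, xs, ys):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def dE_da(a, b, xs, ys):
    return -2 * sum(y - a - b * x for x, y in zip(xs, ys))

def dE_db(a, b, xs, ys):
    return -2 * sum(x * (y - a - b * x) for x, y in zip(xs, ys))

xs, ys = [30.0, 20.0, 60.0], [73.0, 50.0, 128.0]
a0, b0, h = 1.0, 1.5, 1e-6

# Central differences should agree with the analytic partial derivatives.
print(dE_da(a0, b0, xs, ys), (sse(a0 + h, b0, xs, ys) - sse(a0 - h, b0, xs, ys)) / (2 * h))
print(dE_db(a0, b0, xs, ys), (sse(a0, b0 + h, xs, ys) - sse(a0, b0 - h, xs, ys)) / (2 * h))
```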
Proof: Least Squares Method (Cont’d)
Normal Equations
Setting both partial derivatives equal to zero yields the normal equations:
$$\sum_{i=1}^{n} (y_i - a - b x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$
Proof: Least Squares Method (Cont’d)
Solving for a
Solving the first normal equation and replacing a and b by â and b̂ we
obtain
$$n\hat{a} = \sum_{i=1}^{n} (y_i - \hat{b} x_i) \quad\Longrightarrow\quad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$
Proof: Least Squares Method (Cont’d)
Solving for b
Similarly, from the second normal equation we obtain
$$\hat{b} \sum_{i=1}^{n} x_i^2 = -\hat{a} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i y_i$$
Substituting $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$ and solving for $\hat{b}$ gives
$$\hat{b} = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}$$
Estimated Regression Model
The estimated regression line for the given sample can be obtained as:
$$\hat{Y} = \hat{a} + \hat{b} X, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$
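A minimal plain-Python sketch of these estimators (the function name is illustrative, not from the slides), using the closed-form expressions derived above:

```python
def fit_line(xs, ys):
    """Least squares estimates (a_hat, b_hat) of the line a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b_hat = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
        sum(x * x for x in xs) - n * x_bar ** 2
    )
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat
```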
Coefficient of Determination
The coefficient of determination measures how much of the total variation in Y is explained by the regression line:
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
Coefficient of Determination (cont’d)
Where:
Total Sum of Squared Deviations (Total Variation):
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Regression (Explained) Sum of Squares:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
Error (Residual) Sum of Squares:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
1. $0 \le r^2 \le 1$.
2. If $r^2 = 0$, the least squares regression line has no explanatory power.
3. Conversely, if $r^2 = 1$, the regression line explains the entire variation in the response variable Y, accounting for 100% of its variability.
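A short plain-Python sketch of this quantity (illustrative names, not from the slides), computed from the observed values and the least squares fitted values:

```python
def r_squared(ys, y_hats):
    """Coefficient of determination r^2 = SSR / SST for least squares fitted values."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)          # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hats)    # variation explained by the line
    return ssr / sst
```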
Example 5.2.2
The following data give the lot size X and the corresponding man-hours Y for ten production runs:
X (lot size)    30   20   60   80   40   50   60   30   70   60
Y (man-hours)   73   50  128  170   87  108  135   69  148  132
1. The scatter diagram
The scatter plot suggests that there is a strong positive linear association
between X and Y.
2. Analysis of Scatter Plot
The scatter plot of the data shows a linear trend, since the value of Y increases linearly as the value of X increases. Hence, the regression model Y = a + bX + ε is appropriate to describe the relationship between X and Y.
3. Estimating Parameters of Regression
  i     x      x²      y       y²      xy
  1    30     900     73     5329    2190
  2    20     400     50     2500    1000
  3    60    3600    128    16384    7680
  4    80    6400    170    28900   13600
  5    40    1600     87     7569    3480
  6    50    2500    108    11664    5400
  7    60    3600    135    18225    8100
  8    30     900     69     4761    2070
  9    70    4900    148    21904   10360
 10    60    3600    132    17424    7920
Sum   500   28400   1100   134660   61800
3. Estimating Parameters of Regression (Continued)
1. From the column sums, $\bar{x} = 500/10 = 50$ and $\bar{y} = 1100/10 = 110$, so
$$\hat{b} = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sum x_i^2 - n\,\bar{x}^2} = \frac{61800 - 55000}{28400 - 25000} = 2, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} = 110 - 2 \times 50 = 10$$
2. The estimated simple linear regression equation is: $\hat{Y} = 10 + 2X$.
3. From this equation, we see that when the lot size increases by one unit, the man-hours increase by 2 hours, while there are 10 hours that do not depend on the lot size.
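These estimates can be reproduced in a few lines of plain Python (an illustrative check, not part of the slides); the intermediate sums match the table above:

```python
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n                 # 50.0, 110.0
b_hat = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)                                                     # (61800 - 55000) / (28400 - 25000) = 2.0
a_hat = y_bar - b_hat * x_bar                         # 110 - 2 * 50 = 10.0
print(a_hat, b_hat)                                   # 10.0 2.0
```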
4. The estimated regression line on the scatter diagram
[Figure: scatter diagram of the data with the fitted line Ŷ = 10 + 2X.]
5. Predict the man-hours for a lot of size 65
Using the estimated equation, $\hat{Y} = 10 + 2(65) = 140$ man-hours.
6. Coefficient of Determination
6. Coefficient of Determination (Continued)
From the tables, we find that:
The Total Sum of Squared Deviations (Total Variation):
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 13660$$
The Regression (Explained) Sum of Squares:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 13600$$
Hence, with $SSE = SST - SSR = 60$,
$$r^2 = \frac{SSR}{SST} = \frac{13600}{13660} \approx 0.9956 \quad\text{or}\quad r^2 = 1 - \frac{SSE}{SST} = 1 - \frac{60}{13660} \approx 0.9956$$
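The same numbers fall out of a short plain-Python check (illustrative, not part of the slides), using the fitted values from Ŷ = 10 + 2X:

```python
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

y_bar = sum(y) / len(y)                                    # 110.0
y_hat = [10 + 2 * xi for xi in x]                          # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                   # 13660.0
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # 13600.0
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))      # 60
print(ssr / sst, 1 - sse / sst)                            # both ≈ 0.9956
```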
6. Coefficient of Determination (Continued)
Thus, approximately 99.56% of the variation in man-hours is explained by the regression on lot size.