Linear Regression

The document discusses regression analysis, a statistical technique used to examine the relationship between independent and dependent variables in social sciences. It explains the purpose of regression, the concept of spurious correlation, and details the least squares method for estimating regression lines. Additionally, it covers the coefficient of determination and provides an example involving lot size and labor hours.

Uploaded by

Mariam Ahmed
Regression Analysis

Dr. Mohamed Sief

Fayoum University

December 10, 2024

Dr. Mohamed Sief (Fayoum University) Regression Analysis December 10, 2024 1 / 33
Purpose of Regression

The idea behind regression in the social sciences is that the researcher
would like to find the relationship between two or more variables.
Regression is a statistical technique that allows the scientist to
examine the existence and extent of this relationship.

Regression analysis allows us to understand how changes in one
variable (the independent variable) are associated with changes in
another variable (the dependent variable).
By fitting a regression model to the data, we can quantify the
relationship between these variables, typically in terms of the slope of
the regression line.
For example, if we’re examining the relationship between study hours
and exam scores, regression analysis can tell us how much a one-unit
increase in study hours is associated with a change in exam score.

Correlation vs. Regression

Correlation can tell you how the values of your variables co-vary, but
regression analysis goes a step further: it models how the dependent
variable changes, on average, as the independent variable changes.
Correlation measures the strength of the relationship between
variables, while regression describes the form of that relationship.
On its own, neither one demonstrates that the independent variable
causes the dependent variable.

Spurious Correlation

Regression may lead to what is called “spurious correlation,” where
the co-variation of two variables implies a causal relationship that
does not exist.
Just because two variables are correlated does not mean that one
variable causes the other to change.
Establishing causation requires additional evidence, often derived from
substantive theories or experimental designs.
For example, while regression analysis might reveal a correlation
between ice cream sales and drowning incidents, it doesn’t mean that
buying ice cream causes drownings. To establish causation, we would
need to investigate other factors, such as temperature, swimming
habits, or lifeguard availability.
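
The ice cream example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the variable names, the coefficients, and the noise levels below are hypothetical and are not taken from any real data. Temperature drives both series, so they correlate strongly with each other, yet the correlation largely disappears once the effect of temperature is removed from each.

```python
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5


def residuals(u, t):
    """Residuals of u after a least squares fit of u on t."""
    n = len(u)
    u_bar, t_bar = sum(u) / n, sum(t) / n
    s_tu = sum((a - t_bar) * (b - u_bar) for a, b in zip(t, u))
    s_tt = sum((a - t_bar) ** 2 for a in t)
    slope = s_tu / s_tt
    intercept = u_bar - slope * t_bar
    return [b - (intercept + slope * a) for a, b in zip(t, u)]


# Hypothetical data: temperature is the common cause of both series.
rng = random.Random(0)
temperature = [rng.uniform(10, 35) for _ in range(1000)]
ice_cream = [5.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

# Raw correlation between the two series (no causal link was simulated).
raw_r = pearson(ice_cream, drownings)

# Correlation after removing the linear effect of temperature from both.
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(drownings, temperature))

print("raw correlation:", raw_r)
print("after adjusting for temperature:", adjusted_r)
```

In this setup neither series depends on the other, so any strong raw correlation is produced entirely by the shared dependence on temperature.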

Simple Linear Regression Line

Equation
The simple linear regression line of a population describing the linear
relationship between explanatory (or predictor) variable X and the
response variable Y is given by the following relation:

Y = a + bX + ε

Where:
ε is a normal random variable with zero expectation, E(ε) = 0. The
presence of ε in the regression line is what makes regression
analysis a probabilistic approach.
a and b are the parameters of the simple regression line, where a is
the constant term (intercept) and b is the coefficient of the variable X
(slope). ε is the random error term, not a parameter.

Figure: Simple Linear Regression Line

Proof: Least Squares Method

Introduction
We aim to derive the least squares estimates of the line a + bx
from a sample of data points.

Setup
Consider a sample of n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
where the $x_i$ are the values of the independent variable and the $y_i$
are the corresponding values of the dependent variable.

Objective
Our objective is to find the line a + bx that minimizes the sum of the
squares of the vertical distances between the data points and the line.

Error Function
Let E be the error function, defined as the sum of the squares of the
vertical distances:

$$E = \sum_{i=1}^{n} \bigl(y_i - (a + b x_i)\bigr)^2$$

Minimization
To find the line a + bx that minimizes the error function E , we
differentiate E with respect to a and b, set the derivatives equal to zero,
and solve for a and b.

Partial Derivatives
Differentiating E with respect to a and b gives:

$$\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i)$$

$$\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i)$$

Solving for a and b


Setting the partial derivatives equal to zero gives the normal equations:

$$\sum_{i=1}^{n} (y_i - a - b x_i) = 0$$

$$\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$

Solving these equations yields the estimates for a and b.

Solving for a
Solving the first normal equation and replacing a and b by â and b̂ we
obtain

$$n\hat{a} = \sum_{i=1}^{n} (y_i - \hat{b} x_i)$$

so, the estimator of a is given by

â = Ȳ − b̂ X̄

Solving for b
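Similarly, from the second normal equation we obtain

$$\hat{b} \sum_{i=1}^{n} x_i^2 = -\hat{a} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i y_i$$

Replacing â by Ȳ − b̂X̄ and rearranging, we get

$$\hat{b} \sum_{i=1}^{n} x_i^2 - n\hat{b}\bar{X}^2 = \sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}$$

and solving for b̂, we obtain the estimator

$$\hat{b} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{X})^2} = \frac{S_{XY}}{S_{XX}}$$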
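
The step from the raw-sum form of b̂ to the deviation form rests on two standard identities. A short sketch of why they hold, using only $\sum x_i = n\bar{X}$ and $\sum y_i = n\bar{Y}$ (all sums run over i = 1, ..., n):

```latex
\begin{align*}
\sum (x_i - \bar{X})(y_i - \bar{Y})
  &= \sum x_i y_i - \bar{Y} \sum x_i - \bar{X} \sum y_i + n\bar{X}\bar{Y} \\
  &= \sum x_i y_i - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
  &= \sum x_i y_i - n\bar{X}\bar{Y}, \\[4pt]
\sum (x_i - \bar{X})^2
  &= \sum x_i^2 - 2\bar{X} \sum x_i + n\bar{X}^2 \\
  &= \sum x_i^2 - n\bar{X}^2 .
\end{align*}
```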

Estimated Regression model

The estimated regression line for the given sample can be obtained as:

Ŷ = â + b̂X

where the coefficients â and b̂ can be estimated as:



$$\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

â = ȳ − b̂x̄
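
The two estimator formulas translate directly into code. The following is a minimal sketch in plain Python; the function name `fit_line` and the small test sample are illustrative choices, not part of the lecture.

```python
def fit_line(xs, ys):
    """Return (a_hat, b_hat) for the least squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_XY and S_XX in deviation form, as in the formula for b_hat.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Points lying exactly on y = 1 + 2x, so the fit should recover a = 1, b = 2.
a_hat, b_hat = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a_hat, b_hat)  # 1.0 2.0
```

The check for `s_xx == 0` reflects the fact that the slope formula has no solution when every x value is the same.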

Coefficient of Determination

The coefficient of determination r² can be obtained by squaring the
Pearson correlation coefficient (r). This shortcut works only for the
simple linear regression model

Ŷ = â + b̂X

The coefficient of determination r² represents the proportion of the total
sample variation in Y (measured by the sum of squares of deviations of
the sample values $y_1, y_2, \ldots, y_n$ about their mean ȳ) that is explained by
(or attributed to) the linear relationship between X and Y.
The formula to calculate the coefficient of determination is:

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

The proof is left as an exercise.

Where:
Total Sum of Squares (Total Variation):

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Regression Sum of Squares (Explained Variation):

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

Error Sum of Squares (Unexplained Variation):

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The total variation decomposes as:

$$SST = SSR + SSE$$
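
The decomposition is not automatic; it holds because the line is fitted by least squares. A sketch of the argument, writing $e_i = y_i - \hat{y}_i$ for the residuals (a symbol introduced here for convenience) and using $\hat{y}_i - \bar{y} = \hat{b}(x_i - \bar{x})$, which follows from $\hat{a} = \bar{y} - \hat{b}\bar{x}$:

```latex
\begin{align*}
\sum (y_i - \bar{y})^2
  &= \sum \bigl[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})\bigr]^2 \\
  &= \underbrace{\sum (y_i - \hat{y}_i)^2}_{SSE}
   + \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{SSR}
   + 2 \sum e_i (\hat{y}_i - \bar{y}), \\[4pt]
\sum e_i (\hat{y}_i - \bar{y})
  &= \hat{b} \sum e_i (x_i - \bar{x})
   = \hat{b} \Bigl( \sum x_i e_i - \bar{x} \sum e_i \Bigr) = 0 .
\end{align*}
```

The last line is zero because the two normal equations state exactly that $\sum e_i = 0$ and $\sum x_i e_i = 0$ at the least squares estimates.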

1. 0 ≤ r² ≤ 1.
2. If r² = 0, it indicates that the least squares regression line holds no
explanatory power.
3. Conversely, if r² = 1, it signifies that the regression line can explain
the entire variation in the response variable Y, accounting for 100%
of its variability.

Example 5.2.2

A certain spare part is manufactured by Westwood Company once a
month in lots which vary in size as demand fluctuates.
Let X represent the lot size and Y the number of man-hours of labor
for recent production runs.
The data are given in the table below:

X   30   20   60    80    40   50    60    30   70    60
Y   73   50   128   170   87   108   135   69   148   132

1. Construct the scatter diagram.
2. Is the linear relationship appropriate to describe the relationship
between X and Y?
3. Estimate the parameters of the linear regression line Y = a + bX and
write down the estimated regression line.
4. Plot the estimated regression line on the scatter diagram.
5. Estimate (or predict) the man-hours for a lot of size 65 (X = 65).
6. Calculate the coefficient of determination (r²) and hence deduce the
simple linear correlation coefficient (r) and interpret the results.

1. The scatter diagram

The scatter plot suggests that there is a strong positive linear association
between X and Y.
2. Analysis of Scatter Plot

The scatter plot of the data shows a linear trend: the value of Y
increases roughly linearly as the value of X increases. Hence, the
regression model Y = a + bX + ε is appropriate to describe the
relationship between X and Y.

3. Estimating Parameters of Regression

i     x     x²      y      y²       xy
1     30    900     73     5329     2190
2     20    400     50     2500     1000
3     60    3600    128    16384    7680
4     80    6400    170    28900    13600
5     40    1600    87     7569     3480
6     50    2500    108    11664    5400
7     60    3600    135    18225    8100
8     30    900     69     4761     2070
9     70    4900    148    21904    10360
10    60    3600    132    17424    7920
Sum   500   28400   1100   134660   61800


1. The parameter estimates are:

$$\hat{b} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(61800) - 500 \times 1100}{10(28400) - (500)^2} = \frac{68000}{34000} = 2$$

$$\hat{a} = \bar{Y} - \hat{b}\bar{X} = 110 - 2 \times 50 = 10$$

2. The estimated simple linear regression equation is: Ŷ = â + b̂X = 10 + 2X.
3. From this equation, we see that when the lot size increases by one
unit, the man-hours increase by 2 hours on average, while there are
10 hours that do not depend on the lot size.
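
The hand calculation can be cross-checked with a few lines of plain Python. This is only a verification sketch: the variable names are arbitrary, and it uses the raw-sum form of the slope formula on the Westwood data from the table.

```python
# Lot sizes (X) and man-hours (Y) from the Westwood Company table.
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Raw-sum form of the least squares slope, then the intercept.
b_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_hat = sum_y / n - b_hat * sum_x / n

print(sum_x, sum_x2, sum_y, sum_xy)  # 500 28400 1100 61800
print(a_hat, b_hat)                  # 10.0 2.0

# Predicted man-hours for a lot of size 65.
print(a_hat + b_hat * 65)            # 140.0
```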

4. The estimated regression line on the scatter diagram

5. Predict the man-hours for a lot of size 65

The estimated man-hours for a lot of size 65 (X = 65) is:

Ŷ = 10 + 2X = 10 + 2(65) = 140 hours.

6. Coefficient of Determination

i       yi     ŷi     yi − ȳ   (yi − ȳ)²   (yi − ŷi)²
1       73     70     -37      1369        9
2       50     50     -60      3600        0
3       128    130    18       324         4
4       170    170    60       3600        0
5       87     90     -23      529         9
6       108    110    -2       4           4
7       135    130    25       625         25
8       69     70     -41      1681        1
9       148    150    38       1444        4
10      132    130    22       484         4
Total   1100   1100   0        13660       60

From the tables, we find that:
The Total Sum of Squares (Total Variation):

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 13660$$

The Regression Sum of Squares (Explained Variation):

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 13600$$

The Error Sum of Squares (Unexplained Variation):

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 60$$

It is clear that SST = SSR + SSE.

The coefficient of determination is:

$$r^2 = \frac{13600}{13660} \approx 0.9956 \quad \text{or} \quad r^2 = 1 - \frac{60}{13660} \approx 0.9956$$

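This shows that 99.6% of the total variation of the man-hours is
explained by the lot size, and hence we can conclude that the lot size
is a very strong linear predictor of the man-hours.
Since the slope b̂ = 2 is positive, the simple linear correlation
coefficient is r = +√0.9956 ≈ 0.998, which indicates a very strong
positive linear relationship between X and Y.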
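
The three sums of squares and r² can be recomputed from the data as a check. The sketch below takes the estimates â = 10 and b̂ = 2 from the example as given; the variable names are arbitrary, and the sign of r is taken from the sign of the slope.

```python
import math

# Lot sizes (X) and man-hours (Y) from the Westwood Company table.
x = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]
a_hat, b_hat = 10.0, 2.0  # estimates taken from the worked example

y_bar = sum(y) / len(y)
y_hat = [a_hat + b_hat * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation

r2 = ssr / sst
r = math.copysign(math.sqrt(r2), b_hat)  # r carries the sign of the slope

print(sst, ssr, sse)              # 13660.0 13600.0 60.0
print(round(r2, 4), round(r, 3))  # 0.9956 0.998
```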
