Unit 07 Regression Correlation

The document provides an introduction to regression and correlation, focusing on the relationship between dependent and independent variables through regression models. It explains simple linear regression, the calculation of regression coefficients, and the use of regression for prediction, along with examples and formulas. Additionally, it discusses the concept of residuals and the construction of scatter diagrams to visualize relationships between variables.

Uploaded by

mehransharif278

Week No.

07
Program: MS
Course Code: AGU408
Semester: 03
Topic: Regression & Correlation
By:
S. Taskeen Shah
Associate Professor
INTRODUCTION TO REGRESSION
Regression investigates the dependence of a variable on one or
more independent variables and provides an equation to be used
for estimating the average value of the dependent variable.
Technically, the dependent variable is called the regressand or
response variable, and the independent variables are called
regressors or predictors.

Regression is the statistical technique that quantifies the
relationship between a response variable and its predictor(s).
Example 1: Take the revenue of a firm as the response
variable and spending on advertisement as the predictor.
The regression model would take the following form:
$$\text{Revenue} = \alpha + \beta \,(\text{Spending on advertisement}) + \text{Error term}$$
$$Y = \alpha + \beta X + \epsilon$$
α represents the expected total revenue when advertisement
spending is zero.
β represents the average change in total
revenue when advertisement spending is increased by one
unit (e.g. one dollar).
Example 2: A researcher might administer various
dosages of a certain drug to patients and observe how
their blood pressure responds. Here blood pressure is
taken as the response variable and dosage as the predictor
variable. The regression model would take the following
form:
$$\text{Blood pressure level} = \alpha + \beta \,(\text{Dosage}) + \text{Error term}$$
$$Y = \alpha + \beta X + \epsilon$$
α represents the expected blood pressure when the
dosage is zero, and β represents the average change
in blood pressure when the dosage is increased by one
unit.
Simple Linear Regression Model
Simple linear regression investigates the dependence
of a response variable on a single predictor variable.
Statistically, the simple linear regression model can be
expressed as:
$$Y = \alpha + \beta X + \epsilon$$
Y → Response variable
X → Predictor variable
α → Intercept
β → Slope
ε → Disturbance (noise) term, which includes all those variables
that are not under consideration in the analysis.
The estimated model can be expressed as:
$$\hat{Y} = a + bX$$
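OLS method
$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
$$a = \bar{Y} - b\bar{X}$$
Note: $\sum \hat{Y} = \sum Y$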
Example 1: Compute the least squares regression equation of Y
on X for the following data.

X 5 6 8 10 12 13 15 16 17
Y 16 19 23 28 36 41 44 45 50

What is the regression coefficient and what does it mean?


Solution:
No     X     Y     XY     X²
1      5    16     80     25
2      6    19    114     36
3      8    23    184     64
4     10    28    280    100
5     12    36    432    144
6     13    41    533    169
7     15    44    660    225
8     16    45    720    256
9     17    50    850    289
Σ    102   302   3853   1308
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{9(3853) - 102 \times 302}{9(1308) - (102)^2} = 2.831$$
$$a = \bar{Y} - b\bar{X} = \frac{302}{9} - 2.831 \times \frac{102}{9} = 1.47$$
Hence the estimated regression equation of Y on X is
$$\hat{Y} = a + bX$$
$$\hat{Y} = 1.47 + 2.831X$$
The regression coefficient is b = 2.831, which indicates that Y
increases by 2.831 units, on average, for each unit increase in X.
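The arithmetic in this example can be checked with a short script. The following is a minimal Python sketch of the shortcut formulas for b and a; the function name `ols_fit` is an illustrative choice and does not come from the original notes.

```python
def ols_fit(x, y):
    """Return (a, b) for the least squares line Y-hat = a + b*X."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: b = (n*ΣXY - ΣX*ΣY) / (n*ΣX² - (ΣX)²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


X = [5, 6, 8, 10, 12, 13, 15, 16, 17]
Y = [16, 19, 23, 28, 36, 41, 44, 45, 50]

a, b = ols_fit(X, Y)
print(f"a = {a:.2f}, b = {b:.3f}")  # a = 1.47, b = 2.831
```

The script works from the same column totals as the solution table (ΣX, ΣY, ΣXY, ΣX²), so each intermediate sum can be compared against the table row by row.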
Prediction by Regression Model
The regression model can be used to predict the value
of the response variable (Y) for a specified value
$X = X_0$.
Consider the estimated regression model
$$\hat{Y} = a + bX$$
The predicted value of Y at $X = X_0$ can be obtained as:
$$\hat{Y} = a + bX_0$$
Residual
The difference between the observed value and the value estimated
by the regression model is called the residual, denoted by e.
$$e = Y - \hat{Y}$$
Further, the sum of the residuals is always equal to zero.
$$\sum e = 0$$
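The zero-sum property follows from the least squares intercept. As a brief sketch, substitute $a = \bar{Y} - b\bar{X}$ into the sum of the residuals and use $\sum X = n\bar{X}$ and $\sum Y = n\bar{Y}$:

```latex
\begin{align*}
\sum e &= \sum (Y - \hat{Y}) = \sum (Y - a - bX) \\
       &= \sum Y - na - b\sum X \\
       &= n\bar{Y} - n(\bar{Y} - b\bar{X}) - bn\bar{X} \\
       &= 0
\end{align*}
```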
Example: Find the regression line, predict Y for X = 7,
and show that the sum of the residuals is zero.
𝑋 2 4 6 8
𝑌 3 7 5 10
Solution:
      X     Y    XY    X²    Ŷ = 1.50 + 0.95X    Y − Ŷ
      2     3     6     4          3.4            −0.4
      4     7    28    16          5.3             1.7
      6     5    30    36          7.2            −2.2
      8    10    80    64          9.1             0.9
Σ    20    25   144   120         25.0             0.0
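$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
$$b = \frac{4 \times 144 - 20 \times 25}{4 \times 120 - (20)^2}$$
$$b = \frac{576 - 500}{480 - 400} = \frac{76}{80}$$
$$b = 0.95$$
$$\bar{X} = \frac{\sum X}{n} = \frac{20}{4} = 5$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{25}{4} = 6.25$$
$$a = \bar{Y} - b\bar{X}$$
$$a = 6.25 - 0.95 \times 5 = 1.50$$
The estimated regression line:
$$\hat{Y} = a + bX$$
$$\hat{Y} = 1.50 + 0.95X$$
Prediction of Y when X = 7:
$$\hat{Y} = 1.50 + 0.95 \times 7$$
$$\hat{Y} = 1.50 + 6.65$$
$$\hat{Y} = 8.15$$
The sum of the residuals is zero:
$$\sum (Y - \hat{Y}) = 0$$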
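The prediction and the zero-sum check for this example can be reproduced in a few lines. This is a minimal Python sketch under the same formulas; the helper name `predict` is illustrative only.

```python
X = [2, 4, 6, 8]
Y = [3, 7, 5, 10]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Least squares slope and intercept
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n


def predict(x0):
    """Predicted value of Y at X = x0 from the fitted line."""
    return a + b * x0


# Residuals e = Y - Y-hat for every observation
residuals = [y - predict(x) for x, y in zip(X, Y)]

print(round(a, 2), round(b, 2))     # 1.5 0.95
print(round(predict(7), 2))         # 8.15
print(abs(sum(residuals)) < 1e-9)   # True
```

The residual sum is compared against a small tolerance rather than exactly zero, because floating-point arithmetic can leave a rounding error on the order of 1e-16.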
Scatter Diagram (Plot)
A scatter diagram in regression exhibits the relationship
between a response variable (generally represented by
Y) and an explanatory variable (generally represented by
X). The scatter diagram is constructed by taking the
explanatory variable on the x-axis and the response variable on
the y-axis, and plotting the bivariate data (X, Y) on graph paper.
The relationship between the response variable and the
explanatory variable is linear if the plotted points
fall approximately along a straight line;
otherwise the relationship is nonlinear.
Scatter Diagram
Repeating the previous example:
Example: Find the regression line and plot it on a scatter diagram.
𝑋 2 4 6 8
𝑌 3 7 5 10
Solution:

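      X     Y    XY    X²
      2     3     6     4
      4     7    28    16
      6     5    30    36
      8    10    80    64
Σ    20    25   144   120
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
$$b = \frac{4 \times 144 - 20 \times 25}{4 \times 120 - (20)^2}$$
$$b = \frac{576 - 500}{480 - 400} = \frac{76}{80}$$
$$b = 0.95$$
$$\bar{X} = \frac{\sum X}{n} = \frac{20}{4} = 5$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{25}{4} = 6.25$$
$$a = \bar{Y} - b\bar{X}$$
$$a = 6.25 - 0.95 \times 5 = 1.50$$
The estimated regression line:
$$\hat{Y} = a + bX$$
$$\hat{Y} = 1.50 + 0.95X$$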
Regression line Y on X
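The regression line of Y on X is commonly expressed as:
$$\hat{Y} = a + bX$$
But to distinguish it from the regression of X on Y, it can be written as:
$$\hat{Y} = a_{yx} + b_{yx} X$$
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
$$a_{yx} = \bar{Y} - b_{yx}\bar{X}$$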
Regression Line X on Y
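The regression line of X on Y can be written as:
$$\hat{X} = a + bY$$
But to distinguish it from the regression of Y on X, it can be written as:
$$\hat{X} = a_{xy} + b_{xy} Y$$
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$
$$a_{xy} = \bar{X} - b_{xy}\bar{Y}$$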
Example: Find the regression lines of Y on X and of X on Y.
𝑋 1 3 5 7
𝑌 4 6 8 10
Solution:
      X     Y    XY    X²    Y²
      1     4     4     1    16
      3     6    18     9    36
      5     8    40    25    64
      7    10    70    49   100
Σ    16    28   132    84   216
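$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{4 \times 132 - 16 \times 28}{4 \times 84 - (16)^2} = 1$$
$$a_{yx} = \bar{Y} - b_{yx}\bar{X}$$
$$a_{yx} = \frac{28}{4} - 1 \times \frac{16}{4} = 3$$
The regression line Y on X:
$$\hat{Y} = a_{yx} + b_{yx} X$$
$$\hat{Y} = 3 + 1X$$
Now
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2} = \frac{4 \times 132 - 16 \times 28}{4 \times 216 - (28)^2} = 1$$
$$a_{xy} = \bar{X} - b_{xy}\bar{Y}$$
$$a_{xy} = \frac{16}{4} - 1 \times \frac{28}{4} = -3$$
The regression line X on Y:
$$\hat{X} = a_{xy} + b_{xy} Y$$
$$\hat{X} = -3 + 1Y$$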
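Both lines come from the same formula with the roles of the two variables swapped. The following minimal Python sketch illustrates this; the function name `regression_line` is an assumption made for the illustration.

```python
def regression_line(u, v):
    """Regress v on u and return (intercept, slope)."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(ui * vi for ui, vi in zip(u, v))
    sum_u2 = sum(ui ** 2 for ui in u)
    slope = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)
    intercept = sum_v / n - slope * sum_u / n
    return intercept, slope


X = [1, 3, 5, 7]
Y = [4, 6, 8, 10]

a_yx, b_yx = regression_line(X, Y)  # Y on X: X is the predictor
a_xy, b_xy = regression_line(Y, X)  # X on Y: Y is the predictor

print(a_yx, b_yx)  # 3.0 1.0
print(a_xy, b_xy)  # -3.0 1.0
```

In this data set Y = X + 3 holds exactly for every pair, so the two regression lines describe the same straight line. With scattered data the two lines generally differ.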
Comparison between slopes of two regression lines
The slope of the regression line Y on X is given by:
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
The slope of the regression line X on Y is given by:
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$
The numerators of both expressions are identical, and the
denominators are sums of squares and therefore positive. So it is
concluded that both slopes must have the same sign.
That is, if $b_{yx}$ is positive then $b_{xy}$ is also positive, and vice versa.
PROPERTIES OF REGRESSION LINES
o The regression line always passes through the point of means
of the data, $(\bar{X}, \bar{Y})$.
o The sum of the differences between the observed and
estimated values (called residuals) is always equal to
zero.
$$\sum (Y - \hat{Y}) = 0$$
o The sum of the squares of the differences between the
observed and estimated values is a minimum.
$$\sum (Y - \hat{Y})^2 = \text{minimum}$$
o The geometric mean of the two regression coefficients
(Y on X and X on Y) gives the correlation coefficient, which
takes the common sign of the two coefficients.
$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$
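The last property can be checked numerically. The sketch below reuses the earlier data (X = 2, 4, 6, 8 and Y = 3, 7, 5, 10) and compares the geometric mean of the two slopes with the correlation coefficient computed directly. Using `math.copysign` to attach the common sign of the slopes is an implementation choice for this illustration.

```python
import math

X = [2, 4, 6, 8]
Y = [3, 7, 5, 10]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Shared numerator of both slopes: n*ΣXY - ΣX*ΣY
numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # slope of Y on X
b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # slope of X on Y

# Geometric mean of the slopes, carrying their common sign
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Correlation coefficient computed directly, for comparison
r_direct = numerator / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(r_from_slopes, 4), round(r_direct, 4))
```

The two printed values agree, and both slopes share the sign of the common numerator, as the comparison of slopes requires.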
CORRELATION

Correlation is the phenomenon in which two variables
vary simultaneously, either in the same
direction or in opposite directions.
TYPES OF CORRELATION

Positive Correlation
The two variables vary simultaneously in the same direction.

X ↑, Y ↑   or   X ↓, Y ↓

Negative Correlation
The two variables vary simultaneously in opposite directions.

X ↑, Y ↓   or   X ↓, Y ↑
Coefficient of Correlation
(Pearson's product-moment correlation coefficient)
A numerical quantity that measures the strength and
direction of the linear relationship between two variables.
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
The coefficient of correlation r lies between −1 and +1.
Interpretation:
r > 0: the correlation between the two variables is positive.
r < 0: the correlation between the two variables is negative.
Example: Calculate the linear correlation
coefficient for the following data. X = 4, 8, 12,
16 and Y = 5, 10, 15, 20.
Solution:
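      X     Y    XY    X²    Y²
      4     5    20    16    25
      8    10    80    64   100
     12    15   180   144   225
     16    20   320   256   400
Σ    40    50   600   480   750
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
$$r = \frac{4 \times 600 - 40 \times 50}{\sqrt{[4 \times 480 - (40)^2][4 \times 750 - (50)^2]}}$$
$$r = \frac{400}{\sqrt{320 \times 500}}$$
$$r = \frac{400}{400} = 1$$
Since r = +1, there is a perfect positive correlation between X
and Y.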
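The same calculation can be expressed as a small function. This is a minimal Python sketch of the shortcut formula for r; the name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient via the shortcut (raw sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


X = [4, 8, 12, 16]
Y = [5, 10, 15, 20]

r = pearson_r(X, Y)
print(r)  # 1.0
```

The result is exactly 1 because every Y value equals 1.25 times the corresponding X value, so all the points lie on one straight line with positive slope.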
PROPERTIES OF CORRELATION
COEFFICIENT

o The correlation coefficient lies between −1 and +1;
symbolically,
$$-1 \le r \le +1$$
o The correlation coefficient is independent of a change
of origin and scale.
$$r_{XY} = r_{uv}$$
where u and v are X and Y after a change of origin and scale.
o The coefficient of correlation is the geometric mean of
the two regression coefficients.
$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$
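The invariance under a change of origin and scale can be illustrated numerically. The sketch below assumes transformed variables u = (X − A)/h and v = (Y − B)/k with positive scale factors h and k. The constants A, h, B and k are hypothetical values chosen for the illustration and are not taken from the notes.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


X = [2, 4, 6, 8]
Y = [3, 7, 5, 10]

# Hypothetical change of origin (A, B) and scale (h, k), with h, k > 0
A, h = 4, 2
B, k = 5, 10
U = [(x - A) / h for x in X]
V = [(y - B) / k for y in Y]

r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
print(abs(r_xy - r_uv) < 1e-12)  # True
```

If exactly one of the scale factors were negative, the magnitude of r would be unchanged but its sign would flip, which is why the sketch restricts h and k to positive values.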
SOME IMPORTANT QUESTIONS:
1. If $n = 10$, $\sum Y = 25$, $\sum X = 10$ and $b = 0.50$, what is the value of
a?
2. Interpret the regression model given below:
$$\hat{Y} = 100 - 0.25X$$
3. If $\bar{Y} = 45$, $\bar{X} = 20$ and $a = 1.20$, find the slope of the line.
4. If $b_{yx} = 0.12$ and $b_{xy} = -0.30$, is this possible?
5. If $b_{yx} = 1.10$ and $b_{xy} = 0.31$, find the correlation coefficient.
