Unit 07 Regression Correlation
Program: MS
Course Code: AGU408
Semester: 03
Topic: Regression & Correlation
By:
S. Taskeen Shah
Associate Professor
INTRODUCTION TO REGRESSION
Regression investigates how one variable depends on one or more independent variables. It provides an equation for estimating the average value of the dependent variable.
Technically, the dependent variable is called the regressand or response variable, and an independent variable is called a regressor or predictor.
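$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \bar{Y} - b\bar{X}$$
Note: $\sum \hat{Y} = \sum Y$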
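The two slope formulas above are algebraically equivalent. The first uses deviations from the means, and the second uses raw sums. The minimal Python sketch below computes b both ways and then a = Ȳ − bX̄. The function names are hypothetical and chosen only for illustration. It uses the data of the example with X = 2, 4, 6, 8 and Y = 3, 7, 5, 10.

```python
def slope_deviation_form(xs, ys):
    """b = sum (X - X_bar)(Y - Y_bar) / sum (X - X_bar)^2"""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_computational_form(xs, ys):
    """b = (n*sum XY - sum X * sum Y) / (n*sum X^2 - (sum X)^2), from raw sums"""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def intercept(xs, ys, b):
    """a = Y_bar - b * X_bar, so the fitted line passes through the means"""
    n = len(xs)
    return sum(ys) / n - b * sum(xs) / n


# Example data: X = 2, 4, 6, 8 and Y = 3, 7, 5, 10
xs, ys = [2, 4, 6, 8], [3, 7, 5, 10]
b1 = slope_deviation_form(xs, ys)
b2 = slope_computational_form(xs, ys)
print(round(b1, 4), round(b2, 4))       # 0.95 0.95
print(round(intercept(xs, ys, b2), 4))  # 1.5
```

The deviation form is the better choice when the raw sums are large, because it avoids subtracting two large, nearly equal numbers.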
Example 1: Compute the least squares regression equation of Y
on X for the following data.
X 5 6 8 10 12 13 15 16 17
Y 16 19 23 28 36 41 44 45 50
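No worked solution is given for Example 1. Applying the computational formulas to these data gives the coefficients. The Python sketch below does this; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least squares line of Y on X from raw sums; returns (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = Y_bar - b * X_bar
    return a, b


# Data of Example 1
X = [5, 6, 8, 10, 12, 13, 15, 16, 17]
Y = [16, 19, 23, 28, 36, 41, 44, 45, 50]

a, b = fit_line(X, Y)
print(f"Y-hat = {a:.2f} + {b:.2f}X")  # prints: Y-hat = 1.47 + 2.83X
```

By hand, n = 9, ΣX = 102, ΣY = 302, ΣXY = 3853 and ΣX² = 1308. So b = (9 × 3853 − 102 × 302)/(9 × 1308 − 102²) = 3873/1368 ≈ 2.83, and a = Ȳ − bX̄ ≈ 1.47.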
Note: the residuals $e = Y - \hat{Y}$ of a least squares line always sum to zero, $\sum e = 0$.
Example: Find the regression line, predict Y for X = 7, and show that the sum of the residuals is zero.
𝑋 2 4 6 8
𝑌 3 7 5 10
Solution:
X    Y    XY    X²    Ŷ = 1.50 + 0.95X    Y − Ŷ
2    3    6     4     3.4                 −0.4
4    7    28    16    5.3                 1.7
6    5    30    36    7.2                 −2.2
8    10   80    64    9.1                 0.9
Totals: ΣX = 20, ΣY = 25, ΣXY = 144, ΣX² = 120, ΣŶ = 25, Σ(Y − Ŷ) = 0
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$b = \frac{4 \times 144 - 20 \times 25}{4 \times 120 - 20^2} = \frac{576 - 500}{480 - 400} = \frac{76}{80}$$
$$b = 0.95$$
$$\bar{X} = \frac{\sum X}{n} = \frac{20}{4} = 5$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{25}{4} = 6.25$$
$$a = \bar{Y} - b\bar{X}$$
$$a = 6.25 - 0.95 \times 5 = 1.50$$
The estimated regression line:
$$\hat{Y} = a + bX$$
$$\hat{Y} = 1.50 + 0.95X$$
Prediction of Y when X = 7:
$$\hat{Y} = 1.50 + 0.95 \times 7 = 1.50 + 6.65 = 8.15$$
The sum of the residuals is zero:
$$\sum (Y - \hat{Y}) = -0.4 + 1.7 - 2.2 + 0.9 = 0$$
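The whole example can be checked in a few lines of Python: the coefficients, the prediction at X = 7, and the zero residual sum. This is a sketch that uses only the formulas given above.

```python
# Example data: X = 2, 4, 6, 8 and Y = 3, 7, 5, 10
X = [2, 4, 6, 8]
Y = [3, 7, 5, 10]
n = len(X)

# Slope and intercept from the computational formulas
b = (n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)) / (
    n * sum(x * x for x in X) - sum(X) ** 2
)
a = sum(Y) / n - b * sum(X) / n

fitted = [a + b * x for x in X]                 # Y-hat for each observed X
residuals = [y - f for y, f in zip(Y, fitted)]  # Y - Y-hat

print(f"a = {a:.2f}, b = {b:.2f}")              # a = 1.50, b = 0.95
print(f"prediction at X = 7: {a + b * 7:.2f}")  # prediction at X = 7: 8.15
# Compare with a tolerance: floating-point rounding can leave a tiny nonzero sum
print("sum of residuals is zero:", abs(sum(residuals)) < 1e-9)  # True
```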
Scatter Diagram (Plot)
A scatter diagram in regression displays the relationship between a response variable (generally denoted by Y) and an explanatory variable (generally denoted by X). It is constructed by plotting the bivariate data (X, Y) on graph paper, with the explanatory variable on the x-axis and the response variable on the y-axis.
If the plotted points follow a straight-line pattern, the relationship between the response variable and the explanatory variable is linear; otherwise, it is nonlinear.
Example (same data as the previous example): Find the regression line and plot it on a scatter diagram.
𝑋 2 4 6 8
𝑌 3 7 5 10
Solution:
X    Y    XY    X²
2    3    6     4
4    7    28    16
6    5    30    36
8    10   80    64
Totals: ΣX = 20, ΣY = 25, ΣXY = 144, ΣX² = 120
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$b = \frac{4 \times 144 - 20 \times 25}{4 \times 120 - 20^2} = \frac{576 - 500}{480 - 400} = \frac{76}{80}$$
$$b = 0.95$$
$$\bar{X} = \frac{\sum X}{n} = \frac{20}{4} = 5$$
$$\bar{Y} = \frac{\sum Y}{n} = \frac{25}{4} = 6.25$$
$$a = \bar{Y} - b\bar{X}$$
$$a = 6.25 - 0.95 \times 5 = 1.50$$
The estimated regression line:
$$\hat{Y} = a + bX$$
$$\hat{Y} = 1.50 + 0.95X$$
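Without a plotting library, a rough text version of the scatter diagram with the fitted line can be sketched in Python. The function `ascii_scatter` is hypothetical. It maps X to columns and Y to rows, marks observed points with `*` and marks the fitted line with `.`.

```python
def ascii_scatter(xs, ys, a, b, width=41, height=15):
    """Text scatter diagram of (xs, ys) with the fitted line Y-hat = a + bX.

    '*' marks an observed point and '.' marks the fitted line.
    Assumes X and Y each take at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    # The line is straight, so its extremes on [x_min, x_max] are at the ends
    line_ends = [a + b * x_min, a + b * x_max]
    y_min = min(min(ys), *line_ends)
    y_max = max(max(ys), *line_ends)

    def col(x):
        return round((x - x_min) / (x_max - x_min) * (width - 1))

    def row(y):
        # Row 0 is the top of the plot, so a larger Y gives a smaller row index
        return round((y_max - y) / (y_max - y_min) * (height - 1))

    grid = [[" "] * width for _ in range(height)]
    # Draw the fitted line by evaluating it at every column
    for c in range(width):
        x = x_min + c * (x_max - x_min) / (width - 1)
        grid[row(a + b * x)][c] = "."
    # Draw the observed points last so the line does not hide them
    for x, y in zip(xs, ys):
        grid[row(y)][col(x)] = "*"
    return ["".join(line) for line in grid]


# Explanatory variable X on the horizontal axis, response Y on the vertical axis
lines = ascii_scatter([2, 4, 6, 8], [3, 7, 5, 10], a=1.50, b=0.95)
for line in lines:
    print(line)
```

With matplotlib available, the same diagram would instead use `plt.scatter` for the points and `plt.plot` for the line.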
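Regression Line Y on X
The regression line of Y on X is commonly expressed as:
$$\hat{Y} = a + bX$$
To distinguish it from the line of X on Y, it can be written as:
$$\hat{Y} = a_{yx} + b_{yx}X$$
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a_{yx} = \bar{Y} - b_{yx}\bar{X}$$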
Regression Line X on Y
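The regression line of X on Y can be written as:
$$\hat{X} = a + bY$$
To distinguish it from the line of Y on X, it can be written as:
$$\hat{X} = a_{xy} + b_{xy}Y$$
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2}$$
$$a_{xy} = \bar{X} - b_{xy}\bar{Y}$$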
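Both lines can be computed from the same five sums: n, ΣX, ΣY, ΣXY, and ΣX² or ΣY². The Python sketch below does this; the function name `regression_lines` is hypothetical. It uses the earlier data X = 2, 4, 6, 8 and Y = 3, 7, 5, 10 to show that the two lines generally differ.

```python
def regression_lines(xs, ys):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for the lines Y on X and X on Y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)

    # Y on X: Y-hat = a_yx + b_yx * X (minimises squared vertical deviations)
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a_yx = sy / n - b_yx * sx / n

    # X on Y: X-hat = a_xy + b_xy * Y (minimises squared horizontal deviations)
    b_xy = (n * sxy - sx * sy) / (n * syy - sy ** 2)
    a_xy = sx / n - b_xy * sy / n
    return (a_yx, b_yx), (a_xy, b_xy)


(a_yx, b_yx), (a_xy, b_xy) = regression_lines([2, 4, 6, 8], [3, 7, 5, 10])
print(f"Y on X: Y-hat = {a_yx:.2f} + {b_yx:.2f}X")  # Y on X: Y-hat = 1.50 + 0.95X
print(f"X on Y: X-hat = {a_xy:.2f} + {b_xy:.2f}Y")  # X on Y: X-hat = 0.56 + 0.71Y
```

Here ΣY² = 183, so b_xy = 76/107 ≈ 0.71. This differs from b_yx = 0.95 because the points do not lie exactly on a line.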
Example: Find the regression lines of Y on X and of X on Y.
𝑋 1 3 5 7
𝑌 4 6 8 10
Solution:
X    Y    XY    X²    Y²
1    4    4     1     16
3    6    18    9     36
5    8    40    25    64
7    10   70    49    100
Totals: ΣX = 16, ΣY = 28, ΣXY = 132, ΣX² = 84, ΣY² = 216
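$$b_{yx} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{4 \times 132 - 16 \times 28}{4 \times 84 - 16^2} = \frac{528 - 448}{336 - 256} = \frac{80}{80} = 1$$
$$a_{yx} = \bar{Y} - b_{yx}\bar{X}$$
$$a_{yx} = \frac{28}{4} - 1 \times \frac{16}{4} = 7 - 4 = 3$$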
The regression line of Y on X:
$$\hat{Y} = a_{yx} + b_{yx}X$$
$$\hat{Y} = 3 + 1X$$
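Now, for the regression line of X on Y:
$$b_{xy} = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - \left(\sum Y\right)^2} = \frac{4 \times 132 - 16 \times 28}{4 \times 216 - 28^2} = \frac{528 - 448}{864 - 784} = \frac{80}{80} = 1$$
$$a_{xy} = \bar{X} - b_{xy}\bar{Y}$$
$$a_{xy} = \frac{16}{4} - 1 \times \frac{28}{4} = 4 - 7 = -3$$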
The regression line of X on Y:
$$\hat{X} = a_{xy} + b_{xy}Y$$
$$\hat{X} = -3 + 1Y$$
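In this example, a_xy = −3 and b_xy = 1, so the line of X on Y is X = −3 + Y. Solving for Y gives Y = 3 + X, which is exactly the line of Y on X. The two regression lines coincide whenever the points lie exactly on a straight line (r = ±1). The short Python check below uses `fractions.Fraction` so that the comparison involves no rounding.

```python
from fractions import Fraction

X = [1, 3, 5, 7]
Y = [4, 6, 8, 10]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)

# Slopes and intercepts of both regression lines, as exact fractions
b_yx = Fraction(n * sxy - sx * sy, n * sxx - sx ** 2)
a_yx = Fraction(sy, n) - b_yx * Fraction(sx, n)
b_xy = Fraction(n * sxy - sx * sy, n * syy - sy ** 2)
a_xy = Fraction(sx, n) - b_xy * Fraction(sy, n)

# Rewrite X = a_xy + b_xy * Y in the form Y = intercept + slope * X
slope_from_xy = 1 / b_xy
intercept_from_xy = -a_xy / b_xy

print("Y on X:", a_yx, b_yx)                                     # Y on X: 3 1
print("X on Y solved for Y:", intercept_from_xy, slope_from_xy)  # X on Y solved for Y: 3 1
print("lines coincide:", (a_yx, b_yx) == (intercept_from_xy, slope_from_xy))  # True
```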
Comparison between slopes of two regression lines
The slope of the regression line Y on X is given by:
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
The slope of the regression line X on Y is given by:
$$b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$$
The numerators of both expressions are identical. The denominators are sums of squares, which are always positive. Therefore, both slopes must have the same sign.
That is, if $b_{yx}$ is positive then $b_{xy}$ is positive, and vice versa; likewise, if one is negative, so is the other.
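The sign argument can be written compactly. The shorthand $S_{xy}$, $S_{xx}$, $S_{yy}$ for the sums of products and squares of deviations is introduced here for convenience. The last line links the two slopes to the correlation coefficient r.

```latex
S_{xy} = \sum (X - \bar{X})(Y - \bar{Y}), \qquad
S_{xx} = \sum (X - \bar{X})^2 > 0, \qquad
S_{yy} = \sum (Y - \bar{Y})^2 > 0

b_{yx} = \frac{S_{xy}}{S_{xx}}, \qquad b_{xy} = \frac{S_{xy}}{S_{yy}}
\quad\Longrightarrow\quad
\operatorname{sign}(b_{yx}) = \operatorname{sign}(b_{xy}) = \operatorname{sign}(S_{xy})

b_{yx}\, b_{xy} = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = r^2
```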
PROPERTIES OF REGRESSION LINES
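1. The regression line always passes through the point of means $(\bar{X}, \bar{Y})$.
2. The sum of the residuals (observed minus estimated values) is zero:
$$\sum (Y - \hat{Y}) = 0$$
3. The sum of the squares of the differences between observed and estimated values is a minimum:
$$\sum (Y - \hat{Y})^2 = \text{minimum}$$
4. The correlation coefficient can be obtained from the two regression slopes, taking their common sign:
$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$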
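Properties 1 to 3 can be checked numerically on the earlier data (X = 2, 4, 6, 8 and Y = 3, 7, 5, 10). The Python sketch below does so. The helper `sse` is hypothetical. For property 3, it nudges a and b slightly in several directions; every nudge increases the sum of squared residuals.

```python
X = [2, 4, 6, 8]
Y = [3, 7, 5, 10]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n


def sse(a, b):
    """Sum of squared residuals for the line Y-hat = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))


# Least squares coefficients (deviation form)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum(
    (x - x_bar) ** 2 for x in X
)
a = y_bar - b * x_bar

# Property 1: the line passes through (X_bar, Y_bar)
print("passes through means:", abs(a + b * x_bar - y_bar) < 1e-9)

# Property 2: the residuals sum to zero
print("residual sum is zero:", abs(sum(y - (a + b * x) for x, y in zip(X, Y))) < 1e-9)

# Property 3: small changes to a or b only increase the sum of squares
best = sse(a, b)
nudged = [
    sse(a + da, b + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.01, 0.0, 0.01)
    if (da, db) != (0.0, 0.0)
]
print("least squares sum is minimal:", all(s > best for s in nudged))
```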
CORRELATION
Positive Correlation
The two variables vary simultaneously in the same direction: when X increases, Y increases (X ↑, Y ↑), and when X decreases, Y decreases (X ↓, Y ↓).
Negative Correlation
The two variables vary simultaneously in opposite directions: when X increases, Y decreases (X ↑, Y ↓), and when X decreases, Y increases (X ↓, Y ↑).
Coefficient of Correlation
(Pearson’s product moment correlation coefficient)
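A numerical quantity that measures the strength and direction of the linear relationship between two variables.
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$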
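The two formulas for r can be sketched in Python to confirm that they agree. The function names are hypothetical. Applied to the regression example data (X = 2, 4, 6, 8 and Y = 3, 7, 5, 10), both give r = 76/√8560 ≈ 0.8214, which is a fairly strong positive correlation.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """r from deviations: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pearson_r_raw(xs, ys):
    """Same r from raw sums: (nΣXY − ΣXΣY) / sqrt([nΣX² − (ΣX)²][nΣY² − (ΣY)²])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


xs, ys = [2, 4, 6, 8], [3, 7, 5, 10]
print(round(pearson_r_deviation(xs, ys), 4))  # 0.8214
print(round(pearson_r_raw(xs, ys), 4))        # 0.8214
```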
The correlation coefficient r lies between −1 and +1, that is, $-1 \le r \le +1$.
Interpretation:
𝑟 > 0, the correlation between two variables is positive.
𝑟 < 0, the correlation between two variables is negative.
Example: Calculate the linear correlation coefficient for the following data: X = 4, 8, 12, 16 and Y = 5, 10, 15, 20.
Solution:
X    Y    XY    X²     Y²
4    5    20    16     25
8    10   80    64     100
12   15   180   144    225
16   20   320   256    400
Totals: ΣX = 40, ΣY = 50, ΣXY = 600, ΣX² = 480, ΣY² = 750
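$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{4 \times 600 - 40 \times 50}{\sqrt{\left(4 \times 480 - 40^2\right)\left(4 \times 750 - 50^2\right)}}$$
$$r = \frac{400}{\sqrt{320 \times 500}}$$
$$r = \frac{400}{400} = 1$$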
Since r = +1, there is perfect positive correlation between X and Y.
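Every point in this example satisfies Y = 1.25X exactly, which is why r comes out as +1. The arithmetic above can be reproduced step by step in Python. The comments show the table totals and the intermediate values.

```python
from math import sqrt

X = [4, 8, 12, 16]
Y = [5, 10, 15, 20]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)              # 40, 50
sum_xy = sum(x * y for x, y in zip(X, Y))  # 600
sum_x2 = sum(x * x for x in X)             # 480
sum_y2 = sum(y * y for y in Y)             # 750

numerator = n * sum_xy - sum_x * sum_y     # 2400 - 2000 = 400
denom_x = n * sum_x2 - sum_x ** 2          # 1920 - 1600 = 320
denom_y = n * sum_y2 - sum_y ** 2          # 3000 - 2500 = 500

r = numerator / sqrt(denom_x * denom_y)    # 400 / sqrt(160000) = 400 / 400
print(r)  # 1.0
```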
PROPERTIES OF CORRELATION COEFFICIENT
$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$
where r takes the common sign of $b_{yx}$ and $b_{xy}$.
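The relation between r and the two slopes can be sketched as a small Python function; the name `r_from_slopes` is hypothetical. It rejects slopes of opposite sign, which cannot come from the same data. It also rejects slope products above 1, because $b_{yx} b_{xy} = r^2 \le 1$.

```python
from math import sqrt


def r_from_slopes(b_yx, b_xy):
    """Correlation coefficient from the two regression slopes.

    r takes the common sign of the slopes. Slopes of opposite sign, or a
    product greater than 1, cannot come from the same data set.
    """
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("b_yx and b_xy must have the same sign")
    if product > 1 + 1e-12:  # small tolerance for floating-point rounding
        raise ValueError("b_yx * b_xy = r^2 cannot exceed 1")
    magnitude = sqrt(product)
    return -magnitude if b_yx < 0 else magnitude


# Earlier regression example: b_yx = 0.95 and, using ΣY² = 183, b_xy = 76/107
print(round(r_from_slopes(0.95, 76 / 107), 4))  # 0.8214
```

The result matches the Pearson r computed directly from the same data.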
SOME IMPORTANT QUESTIONS
1. If $n = 10$, $\sum Y = 25$, $\sum X = 10$ and $b = 0.50$, find the value of $a$.
2. Interpret the regression model given below:
$$\hat{Y} = 100 - 0.25X$$
3. If $\bar{Y} = 45$, $\bar{X} = 20$ and $a = 1.20$, find the slope of the line.
4. If $b_{yx} = 0.12$ and $b_{xy} = -0.30$, is this possible?
5. If $b_{yx} = 1.10$ and $b_{xy} = 0.31$, find the correlation coefficient.