Correlation and Regression
CORRELATION ANALYSIS
Karl Pearson introduced the concept of correlation.
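$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - \left(\sum X\right)^2}\,\sqrt{N \sum Y^2 - \left(\sum Y\right)^2}} $$

where N = number of paired observations.

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y} $$

$$ r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} $$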
Solution:

   X     Y     X²     Y²      XY
  10    88    100   7744     880
  12    90    144   8100    1080
  18    94    324   8836    1692
   8    86     64   7396     688
  13    87    169   7569    1131
  20    92    400   8464    1840
  22    96    484   9216    2112
  15    94    225   8836    1410
   5    88     25   7744     440
  17    85    289   7225    1445
 140   900   2224  81130   12718   (totals)

$$
\begin{aligned}
r &= \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - \left(\sum X\right)^2}\,\sqrt{N \sum Y^2 - \left(\sum Y\right)^2}} \\
  &= \frac{10 \times 12718 - 140 \times 900}{\sqrt{10 \times 2224 - 140^2}\,\sqrt{10 \times 81130 - 900^2}} \\
  &= \frac{127180 - 126000}{\sqrt{22240 - 19600}\,\sqrt{811300 - 810000}} \\
  &= \frac{1180}{\sqrt{2640}\,\sqrt{1300}} = \frac{1180}{51.38 \times 36.06} = \frac{1180}{1852.76} = 0.637
\end{aligned}
$$

r = 0.637

Course Teacher: Dr.D.Ramya
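The arithmetic in this solution can be checked with a short Python sketch. It applies the raw-sum form of Pearson's formula to the ten (X, Y) pairs from the table; the function name `pearson_r` is illustrative and does not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r using the raw-sum form: N*sum(XY) - sum(X)*sum(Y) over the root terms."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the worked example
X = [10, 12, 18, 8, 13, 20, 22, 15, 5, 17]
Y = [88, 90, 94, 86, 87, 92, 96, 94, 88, 85]

r = pearson_r(X, Y)
print(round(r, 3))  # 0.637
```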
From the following data, compute the coefficient of correlation
between X and Y
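Solution:

Given, with N = 10:

$$ \sum (X - \bar{X})^2 = \sum x^2 = 8250 $$
$$ \sum (Y - \bar{Y})^2 = \sum y^2 = 724 $$
$$ \sum (X - \bar{X})(Y - \bar{Y}) = \sum xy = 2350 $$

$$ \sigma_x = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{8250}{10}} = \sqrt{825} = 28.72 $$

$$ \sigma_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{N}} = \sqrt{\frac{724}{10}} = \sqrt{72.4} = 8.51 $$

$$ \operatorname{cov}(x, y) = \frac{\sum xy}{N} = \frac{2350}{10} = 235 $$

$$ r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{235}{28.72 \times 8.51} = \frac{235}{244.41} = 0.961 $$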
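As a rough check of the covariance form, the sketch below recomputes r from the three deviation sums given in the problem, taking N = 10 as in the solution. The function name is hypothetical. Because the code keeps full precision instead of rounding the standard deviations to two decimals, its result (about 0.9615) can differ from the hand-computed 0.961 in the last digit.

```python
import math

def r_from_deviation_sums(sum_x2, sum_y2, sum_xy, n):
    """r = cov(x, y) / (sigma_x * sigma_y), all built from sums of deviations."""
    cov_xy = sum_xy / n               # covariance
    sigma_x = math.sqrt(sum_x2 / n)   # standard deviation of X
    sigma_y = math.sqrt(sum_y2 / n)   # standard deviation of Y
    return cov_xy / (sigma_x * sigma_y)

r_dev = r_from_deviation_sums(sum_x2=8250, sum_y2=724, sum_xy=2350, n=10)
print(round(r_dev, 4))
```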
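$$
\begin{aligned}
r &= \frac{7 \times 4826 - 182 \times 175}{\sqrt{7 \times 4844 - 182^2}\,\sqrt{7 \times 5131 - 175^2}} \\
  &= \frac{33782 - 31850}{\sqrt{33908 - 33124}\,\sqrt{35917 - 30625}} \\
  &= \frac{1932}{\sqrt{784 \times 5292}} = \frac{1932}{28 \times 72.75} = \frac{1932}{2037} = 0.948
\end{aligned}
$$

r = 0.948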
Solution:

  Rx    Ry    d = Rx - Ry    d²
   1     3        -2          4
   3     7        -4         16
   4     2         2          4
   5     8        -3          9
   6     5         1          1
   2    10        -8         64
   8     9        -1          1
   7     6         1          1
  10     4         6         36
   9     1         8         64
                 Total      200

$$
\begin{aligned}
r_s &= 1 - \frac{6 \sum d^2}{N^3 - N} \\
    &= 1 - \frac{6 \times 200}{10^3 - 10} \\
    &= 1 - \frac{1200}{1000 - 10} \\
    &= 1 - \frac{1200}{990} = 1 - 1.212 = -0.212
\end{aligned}
$$

r_s = -0.212
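When the ranks are already given, the calculation reduces to summing the squared rank differences. A minimal Python sketch of that step, using the ten rank pairs from the table above (the function name `spearman_from_ranks` is illustrative):

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (N^3 - N), for ranks without ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks from the worked example
Rx = [1, 3, 4, 5, 6, 2, 8, 7, 10, 9]
Ry = [3, 7, 2, 8, 5, 10, 9, 6, 4, 1]

rs = spearman_from_ranks(Rx, Ry)
print(round(rs, 3))  # -0.212
```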
Case (ii) When the ranks are not given and no repeated ranks
From the data given below, find Spearman’s rank correlation
coefficient:
X: 52 63 45 36 72 65 47 25
Y: 62 53 51 25 79 43 60 33
Solution:

   X     Y    Rx    Ry    d = Rx - Ry    d²
  52    62     4     2         2          4
  63    53     3     4        -1          1
  45    51     6     5         1          1
  36    25     7     8        -1          1
  72    79     1     1         0          0
  65    43     2     6        -4         16
  47    60     5     3         2          4
  25    33     8     7         1          1
                             Total       28

$$
\begin{aligned}
r_s &= 1 - \frac{6 \sum d^2}{N^3 - N} \\
    &= 1 - \frac{6 \times 28}{8^3 - 8} \\
    &= 1 - \frac{168}{512 - 8} \\
    &= 1 - \frac{168}{504} = 1 - 0.33 = 0.67
\end{aligned}
$$

r_s = 0.67
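The ranking step in Case (ii) can be sketched in Python as follows. The sketch assumes, as in the solution table, that rank 1 goes to the largest value and that no value repeats; both helper names are hypothetical.

```python
def descending_ranks(values):
    """Rank 1 = largest value. Assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """Rank both series, then apply r_s = 1 - 6*sum(d^2) / (N^3 - N)."""
    rx, ry = descending_ranks(xs), descending_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Data from Case (ii)
X = [52, 63, 45, 36, 72, 65, 47, 25]
Y = [62, 53, 51, 25, 79, 43, 60, 33]

print(descending_ranks(X))               # [4, 3, 6, 7, 1, 2, 5, 8]
print(round(spearman_no_ties(X, Y), 2))  # 0.67
```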
Case (iii) When the ranks are not given (with repeated ranks)
Find Spearman's rank correlation coefficient from the following data:
X: 68 64 75 50 64 80 75 40 55 64
Y: 62 58 68 45 81 60 68 48 50 70
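Solution:

$$ r_s = 1 - \frac{6\left[\sum d^2 + \frac{m^3 - m}{12} + \frac{m^3 - m}{12} + \frac{m^3 - m}{12} + \cdots\right]}{N^3 - N} $$

where m is the number of times a value is repeated; one correction term (m³ - m)/12 is added for each group of tied values in X or in Y.

   X     Y    Rx    Ry    d = Rx - Ry    d²
  68    62
  64    58
  75    68
  50    45
  64    81
  80    60     1
  75    68
  40    48
  55    50
  64    70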
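The table above can be completed programmatically. The following Python sketch is one possible way to apply the tie-corrected formula to the Case (iii) data. It assumes that rank 1 goes to the largest value (consistent with X = 80 receiving rank 1) and that tied values share the average of the rank positions they occupy; the helper names are illustrative and not taken from the slides.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest; tied values share the mean of the positions they occupy."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Data from Case (iii)
X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

print(average_ranks(X))  # [4.0, 6.0, 2.5, 9.0, 6.0, 1.0, 2.5, 10.0, 8.0, 6.0]
print(average_ranks(Y))  # [5.0, 7.0, 3.5, 10.0, 1.0, 6.0, 3.5, 9.0, 8.0, 2.0]
print(round(spearman_with_ties(X, Y), 3))  # 0.545
```

Under these assumptions the sum of squared differences is 72 and the three tie groups (75 twice and 64 three times in X, 68 twice in Y) contribute 0.5 + 2 + 0.5 = 3, so r_s = 1 - 6(75)/990.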
X: 39 65 62 90 82 75 25 98 36
Y: 47 53 58 86 62 68 60 91 51
X: 30 50 25 30 60 70 30 65 75 85
Y: 50 60 30 40 70 50 90 60 40 80
Sir Francis Galton introduced the word ‘regression’ to study the relationship between variables.
This measure helps to estimate or predict unknown values of one variable from known values of another variable.
The unknown variable for which the value is to be predicted is called the dependent variable.
Regression coefficients
The regression coefficients indicate the degree and
direction of change in the dependent variable with respect
to change in the independent variable.
bxy and byx are known as the regression coefficients.
Regression coefficient X on Y is given by,
$$ b_{XY} = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - \left(\sum Y\right)^2} \quad \text{or} \quad b_{XY} = r\,\frac{\sigma_X}{\sigma_Y} $$
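Solution:

   X     Y     X²     Y²     XY
   6     9     36     81     54
   2    11      4    121     22
  10     5    100     25     50
   4     8     16     64     32
   8     7     64     49     56
  30    40    220    340    214   (totals)

X on Y:

$$
\begin{aligned}
b_{XY} &= \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - \left(\sum Y\right)^2} \\
       &= \frac{5 \times 214 - 30 \times 40}{5 \times 340 - 40^2} \\
       &= \frac{1070 - 1200}{1700 - 1600} = \frac{-130}{100} = -1.3
\end{aligned}
$$

Y on X:

$$
\begin{aligned}
b_{YX} &= \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - \left(\sum X\right)^2} \\
       &= \frac{5 \times 214 - 30 \times 40}{5 \times 220 - 30^2} \\
       &= \frac{1070 - 1200}{1100 - 900} = \frac{-130}{200} = -0.65
\end{aligned}
$$

b_XY = -1.3, b_YX = -0.65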
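Both coefficients share the same numerator and differ only in the denominator, which a short Python sketch makes explicit. It uses the five (X, Y) pairs from this solution; the function name `regression_coefficients` is illustrative.

```python
def regression_coefficients(xs, ys):
    """Return (b_xy, b_yx) from raw sums, as in the hand calculation."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    cross = n * sum_xy - sum_x * sum_y          # shared numerator
    b_xy = cross / (n * sum_y2 - sum_y ** 2)    # X on Y
    b_yx = cross / (n * sum_x2 - sum_x ** 2)    # Y on X
    return b_xy, b_yx

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

b_xy, b_yx = regression_coefficients(X, Y)
print(b_xy, b_yx)  # -1.3 -0.65
```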
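Solution:

   X     Y     X²     Y²     XY
  10     5    100     25     50
  12     6    144     36     72
  13     7    169     49     91
  17     9    289     81    153
  18    13    324    169    234
  70    40   1026    360    600   (totals)

X on Y:

$$
\begin{aligned}
b_{XY} &= \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - \left(\sum Y\right)^2} \\
       &= \frac{5 \times 600 - 70 \times 40}{5 \times 360 - 40^2} \\
       &= \frac{3000 - 2800}{1800 - 1600} = \frac{200}{200} = 1
\end{aligned}
$$

Y on X:

$$
\begin{aligned}
b_{YX} &= \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - \left(\sum X\right)^2} \\
       &= \frac{5 \times 600 - 70 \times 40}{5 \times 1026 - 70^2} \\
       &= \frac{3000 - 2800}{5130 - 4900} = \frac{200}{230} = 0.87
\end{aligned}
$$

b_XY = 1, b_YX = 0.87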
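Means:

$$ \bar{X} = \frac{\sum X}{N} = \frac{70}{5} = 14, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8 $$

X on Y:

$$
\begin{aligned}
X - \bar{X} &= b_{XY}\,(Y - \bar{Y}) \\
X - 14 &= 1\,(Y - 8) \\
X - 14 &= Y - 8 \\
X &= Y + 6
\end{aligned}
$$

Y on X:

$$
\begin{aligned}
Y - \bar{Y} &= b_{YX}\,(X - \bar{X}) \\
Y - 8 &= 0.87\,(X - 14) \\
Y - 8 &= 0.87X - 12.18 \\
Y &= 0.87X - 12.18 + 8 \\
Y &= 0.87X - 4.18
\end{aligned}
$$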
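The two regression lines can also be obtained directly from the summary totals used in this solution (N = 5, ΣX = 70, ΣY = 40, ΣXY = 600, ΣX² = 1026, ΣY² = 360). The Python sketch below is one way to do that; the function name is hypothetical. Since it does not round the slope to 0.87 before computing the intercept, its Y-on-X intercept comes out near -4.17 instead of the hand-computed -4.18.

```python
def regression_lines_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return ((b_xy, a_x), (b_yx, a_y)) for X = b_xy*Y + a_x and Y = b_yx*X + a_y."""
    mean_x, mean_y = sum_x / n, sum_y / n
    cross = n * sum_xy - sum_x * sum_y
    b_xy = cross / (n * sum_y2 - sum_y ** 2)   # slope of X on Y
    b_yx = cross / (n * sum_x2 - sum_x ** 2)   # slope of Y on X
    a_x = mean_x - b_xy * mean_y               # from X - mean_x = b_xy * (Y - mean_y)
    a_y = mean_y - b_yx * mean_x               # from Y - mean_y = b_yx * (X - mean_x)
    return (b_xy, a_x), (b_yx, a_y)

(x_slope, x_icpt), (y_slope, y_icpt) = regression_lines_from_sums(
    n=5, sum_x=70, sum_y=40, sum_xy=600, sum_x2=1026, sum_y2=360
)
print(x_slope, x_icpt)                      # 1.0 6.0  ->  X = Y + 6
print(round(y_slope, 2), round(y_icpt, 2))  # 0.87 -4.17
```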
X 2 8 10 -2 5 -4
Y 3 2 5 10 -2 -3