Correlation and Regression
Department of Mathematics
BY
Dr Sowbhagya
Assistant Professor
Syllabus for Second Semester
Subject Code: 20MAT21
Subject Name: Differential Equations and Numerical Methods
$$-1 \le r \le 1$$
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \qquad \sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$$
3. Degree of association (strength in magnitude)
For positive correlation: $0 < r \le 1$
For negative correlation: $-1 \le r < 0$
For no correlation: $r = 0$
Note: The coefficient of correlation is independent of the units of measurement.
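A minimal Python sketch of the definition $r = \operatorname{Cov}(x, y)/(\sigma_x \sigma_y)$, assuming the divide-by-n forms of $\sigma_x$, $\sigma_y$ and Cov(x, y). The function name `pearson_r` is our own. It uses the homework data that appears later in this section, and it also rescales x by a factor of 100 to illustrate that r does not depend on the units.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r = Cov(x, y) / (sigma_x * sigma_y), using 1/n forms."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((a - x_bar) ** 2 for a in xs) / n)
    sigma_y = math.sqrt(sum((b - y_bar) ** 2 for b in ys) / n)
    return cov / (sigma_x * sigma_y)

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
r = pearson_r(x, y)
# Same data with x expressed in different units (multiplied by 100)
r_scaled = pearson_r([100 * v for v in x], y)
print(round(r, 4), round(r_scaled, 4))
```

Both values agree with the homework answer r = 0.8062, since the factor 100 cancels between Cov(x, y) and $\sigma_x$.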
Problems
1. Calculate the correlation coefficient from the following data
x 1 2 3 4 5 6 7 8 9 10
y 10 12 16 28 25 36 41 49 40 50
Solution:
From the given data we get $n = 10$, $\sum x = 55$, $\sum y = 307$, $\sum x^2 = 385$, $\sum y^2 = 11387$, $\sum xy = 2074$.
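The final value of r for this problem is not shown in the notes. As a check, the Python sketch below recomputes the five totals from the raw data and evaluates the same computational formula for r that is used in the next solution; the variable names are illustrative.

```python
import math

# Data of Problem 1
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [10, 12, 16, 28, 25, 36, 41, 49, 40, 50]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)), with S denoting raw sums
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 4))
```

Run as a script, it reproduces the totals above and gives r ≈ 0.9582.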
Solution:
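From the given data, $n = 10$, $\sum x = 990$, $\sum y = 980$, $\sum xy = 97112$, $\sum x^2 = 98180$, $\sum y^2 = 96180$.
The correlation coefficient is given by
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{10(97112) - (990)(980)}{\sqrt{10(98180) - 990^2}\,\sqrt{10(96180) - 980^2}} = 0.5963$$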
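When only the totals are available, as in this problem, r can be evaluated directly from them. A small Python sketch; `r_from_sums` is a hypothetical helper name, not something defined in the notes.

```python
import math

def r_from_sums(n, sx, sy, sxy, sx2, sy2):
    """r from the totals n, sum x, sum y, sum xy, sum x^2 and sum y^2."""
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

# Totals for this problem, as listed above
r = r_from_sums(n=10, sx=990, sy=980, sxy=97112, sx2=98180, sy2=96180)
print(round(r, 4))
```

Keeping the totals as integers makes $n\sum xy - \sum x \sum y$ exact (here 971120 - 970200 = 920), which matters because it is a small difference of two large numbers.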
3. Find the correlation coefficient between x and y from the given data:
x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108
Solution:
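From the given data, $n = 8$, $\sum x = 596$, $\sum y = 1006$, $\sum xy = 76538$, $\sum x^2 = 45770$, $\sum y^2 = 128560$.
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{8(76538) - (596)(1006)}{\sqrt{8(45770) - 596^2}\,\sqrt{8(128560) - 1006^2}} = 0.9488$$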
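The computational formula above and the deviation form $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$ are algebraically equal. The Python sketch below evaluates both on this data as a cross-check; it is an illustration, not part of the original solution.

```python
import math

x = [78, 89, 97, 69, 59, 79, 68, 57]
y = [125, 137, 156, 112, 107, 138, 123, 108]
n = len(x)

# Computational (raw-sum) form
num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
den = math.sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
                * (n * sum(b * b for b in y) - sum(y) ** 2))
r_raw = num / den

# Deviation form, using the means x_bar and y_bar
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
r_dev = sxy / math.sqrt(sxx * syy)

print(round(r_raw, 4), round(r_dev, 4))
```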
4. A person, while calculating the coefficient of correlation between two …
Corrected $\sum y^2 = 460 - (14^2 + 6^2) + (12^2 + 8^2) = 436$
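The correction step amounts to subtracting the squares of the mis-copied values and adding the squares of the correct ones. The statement of the problem is incomplete here, so this Python sketch simply reads the arithmetic above as replacing 14 and 6 by 12 and 8; `corrected_sum_of_squares` is a hypothetical helper.

```python
def corrected_sum_of_squares(total, wrong_values, right_values):
    """Remove the squares of the mis-copied values and add the squares of the correct ones."""
    return total - sum(v * v for v in wrong_values) + sum(v * v for v in right_values)

corrected = corrected_sum_of_squares(460, wrong_values=[14, 6], right_values=[12, 8])
print(corrected)
```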
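Proof:
Let
$$\sigma_x^2 = \frac{1}{n}\sum (x - \bar{x})^2, \qquad \sigma_y^2 = \frac{1}{n}\sum (y - \bar{y})^2, \qquad \operatorname{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}), \qquad r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$
1. Consider
$$\sum \left( \frac{x - \bar{x}}{\sigma_x} + \frac{y - \bar{y}}{\sigma_y} \right)^2 = \frac{\sum (x - \bar{x})^2}{\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{\sigma_y^2} + \frac{2\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} = \frac{n\sigma_x^2}{\sigma_x^2} + \frac{n\sigma_y^2}{\sigma_y^2} + \frac{2n\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = n + n + 2nr = 2n(1 + r)$$
Since the left-hand side is a sum of squares, $2n(1 + r) \ge 0$, so $r \ge -1$.
2. Consider
$$\sum \left( \frac{x - \bar{x}}{\sigma_x} - \frac{y - \bar{y}}{\sigma_y} \right)^2 = \frac{\sum (x - \bar{x})^2}{\sigma_x^2} + \frac{\sum (y - \bar{y})^2}{\sigma_y^2} - \frac{2\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} = n + n - 2nr = 2n(1 - r)$$
Since the left-hand side is a sum of squares, $2n(1 - r) \ge 0$, so $r \le 1$.
Hence $-1 \le r \le 1$.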
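The two identities in the proof can be checked numerically. This Python sketch, assuming the divide-by-n definitions of $\sigma_x$, $\sigma_y$ and Cov(x, y) and using the homework data that follows, computes both sums of squared standardised deviations and compares them with $2n(1 + r)$ and $2n(1 - r)$.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sigma_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
sigma_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
r = cov / (sigma_x * sigma_y)

# Standardised deviations u = (x - x_bar)/sigma_x and v = (y - y_bar)/sigma_y
u = [(a - x_bar) / sigma_x for a in x]
v = [(b - y_bar) / sigma_y for b in y]
plus = sum((p + q) ** 2 for p, q in zip(u, v))   # compare with 2n(1 + r)
minus = sum((p - q) ** 2 for p, q in zip(u, v))  # compare with 2n(1 - r)

print(plus, 2 * n * (1 + r))
print(minus, 2 * n * (1 - r))
```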
Homework Problems
1. Calculate the correlation coefficient from the following data
x 1 2 3 4 5
y 2 5 3 8 7
Answer: r = 0.8062
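Lines of Regression
[Figure: data points A1, A2, A3 plotted against x_i]
To fit the line $Y = ax + b$ to the data $(x_i, y_i)$, $i = 1, \ldots, n$, choose $a$ and $b$ so that
$$S = \sum_{i=1}^{n} (Y_i - y_i)^2 = \sum_{i=1}^{n} (a x_i + b - y_i)^2$$
is a minimum.
Differentiating w.r.t. $a$ and $b$, we get
$$\frac{\partial S}{\partial a} = 0 \quad (1), \qquad \frac{\partial S}{\partial b} = 0 \quad (2)$$
Normal equations:
$$a\sum x + nb = \sum y$$
$$a\sum x^2 + b\sum x = \sum xy$$
Solving eq. (1) & (2) (multiply the first normal equation by $\sum x$ and the second by $n$, then subtract), we get
$$a = b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b = \bar{y} - b_{yx}\,\bar{x}, \qquad \text{where } \bar{x} = \frac{\sum x}{n}, \quad \bar{y} = \frac{\sum y}{n}$$
Slope $= b_{yx}$; the line passes through $(\bar{x}, \bar{y})$.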
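The normal equations translate directly into code. A minimal Python sketch; the function name `fit_y_on_x` is ours, and the data are those of Problem 1 below.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line Y = a*x + b from the normal equations
        a*sum(x)   + n*b      = sum(y)
        a*sum(x^2) + b*sum(x) = sum(xy)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(p * q for p, q in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope b_yx
    b = sy / n - a * sx / n                         # intercept y_bar - b_yx * x_bar
    return a, b

a, b = fit_y_on_x([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(a, b)
```

The intercept is computed as $\bar{y} - b_{yx}\bar{x}$, so the fitted line passes through $(\bar{x}, \bar{y})$ by construction.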
Role of coefficient of correlation in regression
[Figure: scatter plots of y_i against x_i for r = 1, r = 0.8 and r = 0.5]
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
Regression Lines (Equations)
• If x is treated as the independent variable and y as the dependent variable, the straight line obtained is called the regression equation of y on x.
It has the form $y - \bar{y} = b_{yx}(x - \bar{x})$, where
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad \text{or} \quad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
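2. $r$, $b_{yx}$ and $b_{xy}$ have the same sign (all positive or all negative).
3. If $b_{yx} > 1$ then $b_{xy} < 1$ (since $b_{yx} b_{xy} = r^2 \le 1$).
4. The point $(\bar{x}, \bar{y})$ satisfies both regression equations.
5. For the regression equation of y on x, i.e. $y - \bar{y} = b_{yx}(x - \bar{x})$, the slope is $b_{yx}$. For the regression equation of x on y, $x - \bar{x} = b_{xy}(y - \bar{y})$, i.e. $y - \bar{y} = \frac{1}{b_{xy}}(x - \bar{x})$, the slope is $\frac{1}{b_{xy}}$.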
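A numerical check of these properties, written as a Python sketch on the data of Problem 1 below (x = 1, …, 5; y = 2, 5, 3, 8, 7). It also confirms the relation $b_{yx} b_{xy} = r^2$, which is why property 3 holds.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)

b_yx = sxy / sxx   # regression coefficient of y on x
b_xy = sxy / syy   # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)
sigma_x, sigma_y = math.sqrt(sxx / n), math.sqrt(syy / n)

# Property 2: r, b_yx and b_xy share the same sign
assert (r > 0) == (b_yx > 0) == (b_xy > 0)
# The product of the regression coefficients equals r^2
assert math.isclose(b_yx * b_xy, r * r)
# Equivalent form of the slope: b_yx = r * sigma_y / sigma_x
assert math.isclose(b_yx, r * sigma_y / sigma_x)
print(b_yx, b_xy, r)
```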
Problems
1) Find the two regression lines and hence the coefficient of correlation
from the following data
x 1 2 3 4 5
y 2 5 3 8 7
Solution:
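From the given data we get $n = 5$, $\sum x = 15$, $\sum y = 25$, $\sum x^2 = 55$, $\sum y^2 = 151$, $\sum xy = 88$.
$$\bar{x} = \frac{\sum x}{n} = 3, \quad \bar{y} = \frac{\sum y}{n} = 5, \quad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = 1.3, \quad b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} = 0.5$$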
• Regression equation of y on x is given by
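then $r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{6 \times \frac{2}{3}} = 2 > 1$, which is not accepted.
Let $b_{yx} = \frac{3}{2}$ and $\frac{1}{b_{xy}} = 6$, so $b_{xy} = \frac{1}{6}$;
then $r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\frac{3}{2} \times \frac{1}{6}} = 0.5 < 1$, which is accepted.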
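The step above, trying one assignment of the two lines and rejecting it when $b_{yx} b_{xy} > 1$, can be automated. This Python sketch assumes each line has first been rewritten as y = m·x + c, so that the values above correspond to slopes 6 and 3/2; `assign_regression_coefficients` is a hypothetical helper name.

```python
import math

def assign_regression_coefficients(m1, m2):
    """Decide which of two regression lines is the line of y on x.

    m1, m2 are the slopes dy/dx of the two lines (each written as y = m*x + c).
    Option A: line 1 is y on x, so b_yx = m1 and b_xy = 1/m2.
    Option B: line 2 is y on x, so b_yx = m2 and b_xy = 1/m1.
    The valid option satisfies 0 <= b_yx * b_xy = r^2 <= 1.
    """
    for b_yx, b_xy in ((m1, 1 / m2), (m2, 1 / m1)):
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r has the same sign as the regression coefficients
            r = math.copysign(math.sqrt(product), b_yx)
            return b_yx, b_xy, r
    raise ValueError("neither assignment gives r^2 <= 1")

b_yx, b_xy, r = assign_regression_coefficients(6, 3 / 2)
print(b_yx, b_xy, r)
```

The same helper applies to the lab-record problem that follows, where the slopes are 4/5 and 20/9.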
4) In a partially destroyed laboratory record of correlation data, only the following records were available: var(x) = 9; regression equations 4x - 5y + 33 = 0 and 20x - 9y = 107. Find $\bar{x}$, $\bar{y}$, $r$ and $\sigma_y$.
Solution:
Since $(\bar{x}, \bar{y})$ satisfies both regression equations, we get
$$4\bar{x} - 5\bar{y} = -33, \qquad 20\bar{x} - 9\bar{y} = 107$$
Solving, we get $\bar{x} = 13$ and $\bar{y} = 17$.
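• To find r:
$$4x - 5y + 33 = 0 \;\Rightarrow\; y = \frac{4}{5}x + \frac{33}{5}$$
$$20x - 9y = 107 \;\Rightarrow\; y = \frac{20}{9}x - \frac{107}{9}$$
Suppose $b_{yx} = \frac{4}{5}$ and $\frac{1}{b_{xy}} = \frac{20}{9}$, i.e. $b_{xy} = \frac{9}{20}$;
then $r = \sqrt{b_{yx} \cdot b_{xy}} = \sqrt{\frac{4}{5} \times \frac{9}{20}} = 0.6$.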
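• To find $\sigma_y$:
var(x) = 9, so $\sigma_x = 3$.
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \;\Rightarrow\; \sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{\frac{4}{5} \times 3}{0.6} = 4$$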
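The whole solution fits in a few lines of Python. This sketch writes each line as a·x + b·y = c, solves for the means with Cramer's rule, and takes the first line as the line of y on x, which is the assignment shown above to give $r^2 \le 1$. Variable names are illustrative.

```python
import math

# Regression lines written as a*x + b*y = c:  4x - 5y = -33  and  20x - 9y = 107
a1, b1, c1 = 4, -5, -33
a2, b2, c2 = 20, -9, 107

# (x_bar, y_bar) lies on both lines: solve the 2x2 system by Cramer's rule
det = a1 * b2 - a2 * b1
x_bar = (c1 * b2 - c2 * b1) / det
y_bar = (a1 * c2 - a2 * c1) / det

# Line 1 as y on x: y = (4/5) x + 33/5;  line 2 as x on y: x = (9/20) y + 107/20
b_yx = -a1 / b1
b_xy = -b2 / a2
r = math.sqrt(b_yx * b_xy)      # both coefficients are positive, so r > 0

sigma_x = math.sqrt(9)          # var(x) = 9
sigma_y = b_yx * sigma_x / r    # from b_yx = r * sigma_y / sigma_x
print(x_bar, y_bar, r, sigma_y)
```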
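Angle between Regression Lines
Consider two regression lines
$$y - \bar{y} = b_{yx}(x - \bar{x}) \quad \ldots (1)$$
$$x - \bar{x} = b_{xy}(y - \bar{y}) \;\Rightarrow\; y - \bar{y} = \frac{1}{b_{xy}}(x - \bar{x}) \quad \ldots (2)$$
Slope of line (1): $m_1 = b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$
Slope of line (2): $m_2 = \frac{1}{b_{xy}} = \frac{1}{r}\,\frac{\sigma_y}{\sigma_x}$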
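Assuming the standard formula for the acute angle θ between two lines, $\tan\theta = \left|\frac{m_2 - m_1}{1 + m_1 m_2}\right|$, and substituting $m_1 = r\sigma_y/\sigma_x$ and $m_2 = \sigma_y/(r\sigma_x)$ gives $\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$. The Python sketch below checks that both expressions agree, using r = 0.6, σ_x = 3 and σ_y = 4 from the lab-record problem; it requires r ≠ 0, and the function names are our own.

```python
import math

def tan_angle_from_slopes(m1, m2):
    """Acute angle between two lines: tan(theta) = |(m2 - m1) / (1 + m1*m2)|."""
    return abs((m2 - m1) / (1 + m1 * m2))

def tan_angle_from_r(r, sigma_x, sigma_y):
    """Same angle using m1 = r*sigma_y/sigma_x and m2 = sigma_y/(r*sigma_x); r must be non-zero."""
    return (1 - r * r) / abs(r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)

r, sigma_x, sigma_y = 0.6, 3.0, 4.0
m1 = r * sigma_y / sigma_x       # slope of line (1), b_yx
m2 = sigma_y / (r * sigma_x)     # slope of line (2), 1/b_xy
t1 = tan_angle_from_slopes(m1, m2)
t2 = tan_angle_from_r(r, sigma_x, sigma_y)
theta_degrees = math.degrees(math.atan(t1))
print(t1, t2, theta_degrees)
```

When r = ±1 both expressions give tan θ = 0, i.e. the two regression lines coincide.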
x x1 x2 x3 x4 … xn
y y1 y2 y3 y4 … yn
• Assign ranks to the x-series data and the y-series data, with the highest value as rank 1 and the lowest as rank n.
x x1 x2 x3 x4 … xn
y y1 y2 y3 y4 … yn
Rx Rx1 Rx2 Rx3 Rx4 … Rxn
Ry Ry1 Ry2 Ry3 Ry4 … Ryn
Spearman’s rank correlation coefficient: Formula
• Let d denote the difference in ranks, i.e. d = Rx - Ry.
d = Rx-Ry Rx1-Ry1 Rx2-Ry2 Rx3-Ry3 Rx4-Ry4 … Rxn-Ryn
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Solution:
Marks in Physics (x) 36 43 47 28 35 50 40
Marks in Maths (y) 73 44 35 30 20 36 40
Rx 5 3 2 7 6 1 4
Ry 1 2 5 6 7 4 3
d = Rx-Ry 4 1 -3 1 -1 -3 1
d² 16 1 9 1 1 9 1
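The ranking rule and the formula can be sketched in Python and applied to the Physics and Maths marks above. The notes do not show the final value for this example; the sketch reproduces the Rx and Ry rows and evaluates ρ from Σd² = 38. It assumes there are no tied values (ties would need average ranks), and the function names are our own.

```python
def ranks_descending(values):
    """Rank 1 for the highest value and rank n for the lowest (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), with d = Rx - Ry."""
    n = len(xs)
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

physics = [36, 43, 47, 28, 35, 50, 40]
maths = [73, 44, 35, 30, 20, 36, 40]
print(ranks_descending(physics))
print(ranks_descending(maths))
print(round(spearman_rho(physics, maths), 4))
```

With these marks, $\rho = 1 - \frac{6 \times 38}{7(7^2 - 1)} \approx 0.3214$.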
x y Rx Ry d = Rx-Ry d²
64 58 6 6 0 0
75 68 3 3 0 0
50 45 8 9 -1 1
69 81 4 1 3 9
80 60 1 5 -4 16
76 69 2 2 0 0
40 48 9 8 1 1
55 50 7 7 0 0
… … 5 4 1 1
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 28}{9(9^2 - 1)} = 0.7667$$
3. In a beauty contest, 10 participants are ranked by 3 judges. Determine which pair of judges has a common taste in respect of beauty.
Judge 1 4 7 2 1 5 10 9 6 3 8
Judge 2 6 5 1 2 9 10 3 7 4 8
Judge 3 6 8 2 1 5 9 10 3 4 7
Solution :
To find: the rank correlation between Judges 1 & 2, Judges 1 & 3, and Judges 2 & 3.
Judge 1 Judge 2 Judge 3 d12 d13 d23 d12² d13² d23²
4 6 6 -2 -2 0 4 4 0
7 5 8 2 -1 -3 4 1 9
2 1 2 1 0 -1 1 0 1
1 2 1 -1 0 1 1 0 1
5 9 5 -4 0 4 16 0 16
10 10 9 0 1 1 0 1 1
9 3 10 6 -1 -7 36 1 49
6 7 3 -1 3 4 1 9 16
3 4 4 -1 -1 0 1 1 0
8 8 7 0 1 1 0 1 1
Total 64 18 94
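The same computation for all three pairs, as a Python sketch (names are illustrative). It reproduces the column totals 64, 18 and 94 and evaluates ρ for each pair; the pair with the largest ρ is the pair whose rankings agree most closely.

```python
from itertools import combinations

# Ranks given by each judge to the 10 participants (from the table above)
judges = {
    1: [4, 7, 2, 1, 5, 10, 9, 6, 3, 8],
    2: [6, 5, 1, 2, 9, 10, 3, 7, 4, 8],
    3: [6, 8, 2, 1, 5, 9, 10, 3, 4, 7],
}

def sum_d2(r1, r2):
    """Sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(r1, r2))

def spearman_from_ranks(r1, r2):
    """rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)) for two rankings without ties."""
    n = len(r1)
    return 1 - 6 * sum_d2(r1, r2) / (n * (n * n - 1))

pairs = list(combinations(judges, 2))
totals = {(i, j): sum_d2(judges[i], judges[j]) for i, j in pairs}
rho = {(i, j): spearman_from_ranks(judges[i], judges[j]) for i, j in pairs}
best_pair = max(rho, key=rho.get)

for pair in pairs:
    print(pair, totals[pair], round(rho[pair], 4))
print("highest rank correlation:", best_pair)
```

It reports ρ ≈ 0.6121, 0.8909 and 0.4303 for the pairs (1, 2), (1, 3) and (2, 3), so judges 1 and 3 show the most similar taste.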
• Rank correlation between Judge 1 and Judge 2 is given by
$$\rho_{12} = 1 - \frac{6\sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 64}{10(10^2 - 1)} = 0.6121$$
Answer: 0.9336