
ProbStat - Curvefitting - U5notes

1. The document discusses various concepts related to correlation and regression including simple regression, multiple regression, least squares method, and curve fitting. 2. It provides examples of using the least squares method to fit straight lines, exponential curves, and other functions to sets of data points. 3. The examples calculate regression equations and curves of best fit for different types of relationships between variables.

B.Tech II Semester (2020 Batch)


PROBABILITY AND STATISTICS (20BM1104)
(For CSE-3 & CSE-4)

Unit – 5: Correlation and Regression

(The method of least squares, curvilinear regression, multiple regressions, correlation (excluding
causation))

Curve fitting: Computing a curve that corresponds to a given set of points is called curve fitting.

Regression: A relation between independent and dependent variables obtained from a given set of points is called a regression.

Simple regression: A relation between one dependent variable and one independent variable obtained from a given set of points is called a simple regression.

Multiple regression: A relation between one dependent variable and two or more independent variables obtained from a given set of points is called a multiple regression.

Regression line of y on x: A relation of the form $y = a + bx$ is called a regression line of y on x.

Regression line of x on y: A relation of the form $x = a + by$ is called a regression line of x on y.

Regression curve of y on x1, x2: A relation of the form $y = a + bx_1 + cx_2$ is called a regression curve of y on $x_1, x_2$.

Least Squares Method: The method of computing a curve (or regression curve) from a given set of points such that the sum of the squares of the deviations of the points from the curve, measured along the y-axis, is minimum.

Curve fitting by Least Squares Method:

1. To fit a straight line of the form $y = a + bx$, the normal equations are
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$

2. To fit a straight line of the form $x = a + by$, the normal equations are
$\sum x = na + b\sum y$
$\sum xy = a\sum y + b\sum y^2$

3. To fit an exponential curve of the form $y = ae^{bx}$, first write $\ln y = \ln a + bx$; the normal equations are then
$\sum \ln y = n\ln a + b\sum x$
$\sum x\ln y = \ln a\sum x + b\sum x^2$

4. To fit an exponential curve of the form $y = ab^x$, first write $\log y = \log a + x\log b$; the normal equations are then
$\sum \log y = n\log a + \log b\sum x$
$\sum x\log y = \log a\sum x + \log b\sum x^2$

5. To fit a power curve (geometric curve) of the form $y = ax^b$, first write $\log y = \log a + b\log x$; the normal equations are then
$\sum \log y = n\log a + b\sum \log x$
$\sum \log x\log y = \log a\sum \log x + b\sum (\log x)^2$

6. To fit a parabola of 2nd degree (or quadratic curve) of the form $y = a + bx + cx^2$, the normal equations are
$\sum y = na + b\sum x + c\sum x^2$
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4$

7. To fit a multiple regression curve of the form $z = a + bx + cy$, the normal equations are
$\sum z = na + b\sum x + c\sum y$
$\sum xz = a\sum x + b\sum x^2 + c\sum xy$
$\sum yz = a\sum y + b\sum xy + c\sum y^2$

8. To fit a multiple regression curve of the form $y = a + bx_1 + cx_2$, the normal equations are
$\sum y = na + b\sum x_1 + c\sum x_2$
$\sum x_1y = a\sum x_1 + b\sum x_1^2 + c\sum x_1x_2$
$\sum x_2y = a\sum x_2 + b\sum x_1x_2 + c\sum x_2^2$

Problems:

1. Fit a straight line $y = a + bx$ to the following data by the least squares method

x 1 2 3 4 5
y 12 25 40 50 65

Solution: The normal equations for the straight line $y = a + bx$ are
$\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$
Consider
x y x^2 xy
1 12 1 12
2 25 4 50
3 40 9 120
4 50 16 200
5 65 25 325
Total 15 192 55 707
Here $\sum x = 15$, $\sum y = 192$, $\sum x^2 = 55$, $\sum xy = 707$ and $n = 5$
The normal equations become $192 = 5a + 15b$ ... (1)
and $707 = 15a + 55b$ ... (2)
Solving (1) and (2), $a = -0.9$ and $b = 13.1$
Hence the straight line is $y = -0.9 + 13.1x$
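
The elimination step in Problem 1 can be checked numerically. The following is a minimal sketch, assuming a helper named `fit_line` (a name chosen here for illustration, not part of the notes) that solves the two normal equations in closed form.

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x.

    sum(y)  = n*a      + b*sum(x)
    sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a from the two equations gives b; back-substitute for a.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


# Data of Problem 1
a, b = fit_line([1, 2, 3, 4, 5], [12, 25, 40, 50, 65])
print(round(a, 4), round(b, 4))
```

The printed pair should agree with the hand computation above, a close to -0.9 and b close to 13.1.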

2. By the method of least squares, fit a straight line $y = a + bx$ to the following data

x 50 70 100 120
y 12 15 21 25

Solution: The normal equations for the straight line $y = a + bx$ are
$\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$
Consider
x y x^2 xy
50 12 2500 600
70 15 4900 1050
100 21 10000 2100
120 25 14400 3000
Total 340 73 31800 6750
Here $\sum x = 340$, $\sum y = 73$, $\sum x^2 = 31800$, $\sum xy = 6750$ and $n = 4$
The normal equations become $73 = 4a + 340b$ ... (1)
and $6750 = 340a + 31800b$ ... (2)
Solving (1) and (2), $a = 2.2759$ and $b = 0.1879$
Hence the straight line is $y = 2.2759 + 0.1879x$
3. By the method of least squares, fit a straight line $x = a + by$ to the following data

x 12 15 21 25
y 50 70 100 120

Solution: The normal equations for the straight line $x = a + by$ are
$\sum x = na + b\sum y$ and $\sum xy = a\sum y + b\sum y^2$
Consider
x y y^2 xy
12 50 2500 600
15 70 4900 1050
21 100 10000 2100
25 120 14400 3000
Total 73 340 31800 6750
Here $\sum x = 73$, $\sum y = 340$, $\sum y^2 = 31800$, $\sum xy = 6750$ and $n = 4$
The normal equations become $73 = 4a + 340b$ ... (1)
and $6750 = 340a + 31800b$ ... (2)
Solving (1) and (2), $a = 2.2759$ and $b = 0.1879$
Hence the straight line is $x = 2.2759 + 0.1879y$

4. Fit an exponential curve $y = ae^{bx}$ to the following data

x 1 3 5 7 9
y 100 81 73 54 43

Solution: The curve is $y = ae^{bx}$
Taking log (base 10) on both sides, $\log y = \log a + bx\log e$
That is, $Y = A + Bx$, where $Y = \log y$, $A = \log a$, $B = b\log e$
Now the normal equations for $Y = A + Bx$ are
$\sum Y = nA + B\sum x$ and $\sum xY = A\sum x + B\sum x^2$
Consider,
x y Y = log y x^2 xY
1 100 2 1 2
3 81 1.9085 9 5.7255
5 73 1.8633 25 9.3165
7 54 1.7324 49 12.1268
9 43 1.6335 81 14.7015
Total 25 - 9.1377 165 43.8703
Here $\sum x = 25$, $\sum Y = 9.1377$, $\sum x^2 = 165$, $\sum xY = 43.8703$ and $n = 5$
The normal equations become $9.1377 = 5A + 25B$ ... (1)
and $43.8703 = 25A + 165B$ ... (2)
Solving (1) and (2), $A = 2.0548$ and $B = -0.0455$
Therefore, $a = 10^A = 113.4488$ and $b = \frac{B}{\log e} = -0.1048$
Hence the required curve is $y = 113.4488\,e^{-0.1048x}$
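
The log-linearisation used in Problem 4 can be sketched in code as follows. The function name `fit_exponential` is an assumption made for this illustration. Because the code keeps full floating-point precision for the logarithms, its result differs slightly in the last digits from the four-decimal hand computation.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by a straight-line fit of Y = log10(y) on x."""
    n = len(xs)
    Y = [math.log10(y) for y in ys]
    sx = sum(xs)
    sY = sum(Y)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * v for x, v in zip(xs, Y))
    # Normal equations: sY = n*A + B*sx and sxY = A*sx + B*sxx
    B = (n * sxY - sx * sY) / (n * sxx - sx * sx)
    A = (sY - B * sx) / n
    a = 10 ** A                   # A = log10(a)
    b = B / math.log10(math.e)    # B = b * log10(e)
    return a, b


# Data of Problem 4
a, b = fit_exponential([1, 3, 5, 7, 9], [100, 81, 73, 54, 43])
print(round(a, 2), round(b, 4))
```

The negative value of b reflects that y decreases as x increases in this data set.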

5. Fit an exponential curve $y = ab^x$ to the following data

x 1 2 3 4 5
y 130 152.2 177.3 190.2 244.7

Solution: The curve is $y = ab^x$
Taking log on both sides, $\log y = \log a + x\log b$
That is, $Y = A + Bx$, where $Y = \log y$, $A = \log a$, $B = \log b$
Now the normal equations for $Y = A + Bx$ are
$\sum Y = nA + B\sum x$ and $\sum xY = A\sum x + B\sum x^2$
Consider,
x y Y = log y x^2 xY
1 130 2.1139 1 2.1139
2 152.2 2.1824 4 4.3648
3 177.3 2.2487 9 6.7461
4 190.2 2.2792 16 9.1168
5 244.7 2.3886 25 11.9432
Total 15 - 11.2129 55 34.2849
Here $\sum x = 15$, $\sum Y = 11.2129$, $\sum x^2 = 55$, $\sum xY = 34.2849$ and $n = 5$
The normal equations become $11.2129 = 5A + 15B$ ... (1)
and $34.2849 = 15A + 55B$ ... (2)
Solving (1) and (2), $A = 2.0487$ and $B = 0.0646$
Therefore, $a = 10^A = 111.8716$ and $b = 10^B = 1.1604$
Hence the required curve is $y = 111.8716\,(1.1604)^x$


6. Fit an exponential curve $y = ab^x$ to the following data

x 2 3 4 5 6
y 144 172.8 207.4 248.8 298.6

Solution: Write $\log y = \log a + x\log b$, or $Y = A + Bx$, where $Y = \log y$, $A = \log a$, $B = \log b$
The normal equations for $Y = A + Bx$ are
$\sum Y = nA + B\sum x$ and $\sum xY = A\sum x + B\sum x^2$
Here $\sum x = 20$, $\sum Y = \sum \log y = 11.5837$, $\sum x^2 = 90$, $\sum xY = \sum x\log y = 47.1266$ and $n = 5$
The normal equations become $11.5837 = 5A + 20B$ and $47.1266 = 20A + 90B$
On solving, $A = 2$ and $B = 0.0792$, so that $a = 100$ and $b = 1.2$
Hence the exponential curve is $y = 100\,(1.2)^x$
7. Fit a power curve $y = ax^b$ to the following data
x 1 2 3 4 5 6
y 2.98 4.26 5.21 6.10 6.80 7.50

Solution: The power curve is $y = ax^b$
Taking log on both sides, $\log y = \log a + b\log x$
That is, $Y = A + bX$, where $Y = \log y$, $X = \log x$, $A = \log a$
Now the normal equations for $Y = A + bX$ are
$\sum Y = nA + b\sum X$ and $\sum XY = A\sum X + b\sum X^2$
Consider,
x y X = log x Y = log y X^2 XY
1 2.98 0 0.4742 0 0
2 4.26 0.3010 0.6294 0.0906 0.1895
3 5.21 0.4771 0.7168 0.2276 0.3420
4 6.10 0.6021 0.7853 0.3625 0.4728
5 6.80 0.6990 0.8325 0.4886 0.5819
6 7.50 0.7782 0.8751 0.6055 0.6809
Total - - 2.8573 4.3134 1.7748 2.2671
Here $\sum X = 2.8573$, $\sum Y = 4.3134$, $\sum X^2 = 1.7748$, $\sum XY = 2.2671$ and $n = 6$
The normal equations become $4.3134 = 6A + 2.8573b$ ... (1)
and $2.2671 = 2.8573A + 1.7748b$ ... (2)
Solving (1) and (2), $A = 0.4740$ and $b = 0.5143$
Therefore, $a = 10^A = 2.9783$
Hence the curve is $y = 2.9783\,x^{0.5143}$
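
For the power curve, both variables are transformed before the straight-line fit. A minimal sketch follows, with `fit_power` as an illustrative name. As with the exponential fit, full-precision logarithms shift the last digit or two relative to the tabulated values.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b via the line Y = A + b*X with X = log10(x), Y = log10(y)."""
    n = len(xs)
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    sX = sum(X)
    sY = sum(Y)
    sXX = sum(u * u for u in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    b = (n * sXY - sX * sY) / (n * sXX - sX * sX)
    A = (sY - b * sX) / n
    return 10 ** A, b   # a = 10**A


# Data of Problem 7
a, b = fit_power([1, 2, 3, 4, 5, 6], [2.98, 4.26, 5.21, 6.10, 6.80, 7.50])
print(round(a, 3), round(b, 4))
```

Once a and b are known, a prediction at any positive x is simply `a * x ** b`.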

8. Determine the regression line of y on x for the following data

x 20 25 28 35 43
y 52 48 63 79 95

Solution: The regression line of y on x is given by $y = a + bx$
The normal equations for this line are
$\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$
Consider
x y x^2 xy
20 52 400 1040
25 48 625 1200
28 63 784 1764
35 79 1225 2765
43 95 1849 4085
Total 151 337 4883 10854
Here $\sum x = 151$, $\sum y = 337$, $\sum x^2 = 4883$, $\sum xy = 10854$ and $n = 5$
The normal equations become $337 = 5a + 151b$ ... (1)
and $10854 = 151a + 4883b$ ... (2)
Solving (1) and (2), $a = 4.0998$ and $b = 2.096$
Hence the regression line of y on x is $y = 4.0998 + 2.096x$

9. Determine the regression line of x on y for the following data

x 1.1 2.3 4.5 7.6
y 21 35 64 84
And estimate the value of x at $y = 4.8$

Solution: The regression line of x on y is given by $x = a + by$
The normal equations for the straight line $x = a + by$ are
$\sum x = na + b\sum y$ and $\sum xy = a\sum y + b\sum y^2$
Consider
x y y^2 xy
1.1 21 441 23.1
2.3 35 1225 80.5
4.5 64 4096 288
7.6 84 7056 638.4
Total 15.5 204 12818 1030
Here $\sum x = 15.5$, $\sum y = 204$, $\sum y^2 = 12818$, $\sum xy = 1030$ and $n = 4$
The normal equations become $15.5 = 4a + 204b$ ... (1)
and $1030 = 204a + 12818b$ ... (2)
Solving (1) and (2), $a = -1.1849$ and $b = 0.0992$
Hence the regression line of x on y is $x = -1.1849 + 0.0992y$
Therefore, the value of x at $y = 4.8$ is given by $x = -1.1849 + 0.0992(4.8) = -0.70874$

10. Fit a second degree parabola $y = a + bx + cx^2$ to the following data

x 1 3 5 7 9
y 2 7 10 11 9

Solution: The curve is $y = a + bx + cx^2$
The normal equations are
$\sum y = na + b\sum x + c\sum x^2$,
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4$
Consider,
x y x^2 x^3 x^4 xy x^2y
1 2 1 1 1 2 2
3 7 9 27 81 21 63
5 10 25 125 625 50 250
7 11 49 343 2401 77 539
9 9 81 729 6561 81 729
Total 25 39 165 1225 9669 231 1583
Here $\sum x = 25$, $\sum y = 39$, $\sum x^2 = 165$, $\sum x^3 = 1225$, $\sum x^4 = 9669$, $\sum xy = 231$, $\sum x^2y = 1583$ and $n = 5$
The normal equations become $39 = 5a + 25b + 165c$ ... (1)
$231 = 25a + 165b + 1225c$ ... (2)
and $1583 = 165a + 1225b + 9669c$ ... (3)
Solving (1), (2) and (3), $a = -1.5571$, $b = 3.7571$ and $c = -0.2857$
Hence the parabola is $y = -1.5571 + 3.7571x - 0.2857x^2$
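
Solving three normal equations by hand is error-prone, so a numerical check is useful. The sketch below is one possible way to do it: `solve_linear` is a small Gaussian elimination routine written for this illustration (it is not a library function), and `fit_parabola` builds the normal equations of Problem 10 from the data.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        # Bring the row with the largest pivot into position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                 # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_parabola(xs, ys):
    """Normal equations for y = a + b*x + c*x^2, solved for [a, b, c]."""
    n = len(xs)

    def power_sum(p):
        return sum(x ** p for x in xs)

    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    A = [[n, power_sum(1), power_sum(2)],
         [power_sum(1), power_sum(2), power_sum(3)],
         [power_sum(2), power_sum(3), power_sum(4)]]
    return solve_linear(A, [sy, sxy, sx2y])


# Data of Problem 10
a, b, c = fit_parabola([1, 3, 5, 7, 9], [2, 7, 10, 11, 9])
print(round(a, 4), round(b, 4), round(c, 4))
```

The negative c indicates a parabola that opens downward, consistent with y rising and then falling in the data.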

11. Fit a second degree parabola $y = a + bx + cx^2$ to the following data

x 1.0 1.5 2.0 2.5 3.0 3.5 4.0
y 1.1 1.3 1.6 2.0 2.7 3.4 4.1

Solution: The curve is $y = a + bx + cx^2$
The normal equations are
$\sum y = na + b\sum x + c\sum x^2$,
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4$
Consider,
x y x^2 x^3 x^4 xy x^2y
1.0 1.1 1 1 1 1.1 1.1
1.5 1.3 2.25 3.375 5.0625 1.95 2.925
2.0 1.6 4 8 16 3.2 6.4
2.5 2.0 6.25 15.625 39.0625 5 12.5
3.0 2.7 9 27 81 8.1 24.3
3.5 3.4 12.25 42.875 150.0625 11.9 41.65
4.0 4.1 16 64 256 16.4 65.6
Total 17.5 16.2 50.75 161.875 548.1875 47.65 154.475

The normal equations become $16.2 = 7a + 17.5b + 50.75c$ ... (1)
$47.65 = 17.5a + 50.75b + 161.875c$ ... (2)
and $154.475 = 50.75a + 161.875b + 548.1875c$ ... (3)
Solving (1), (2) and (3), $a = 1.0357$, $b = -0.1929$ and $c = 0.2429$
Hence the parabola is $y = 1.0357 - 0.1929x + 0.2429x^2$
12. Derive the normal equations to fit a straight line of the form $y = a + bx$
Solution: Let $y = f(x)$, where $f(x) = a + bx$
Let $A = (x_i, y_i)$ be any given data point
At $x = x_i$, the observed (given) value of y is $y_i$; that is, $AC = y_i$
At $x = x_i$, the expected value of y is $f(x_i)$; that is, $BC = f(x_i)$
Therefore, the deviation $d_i$ at $x = x_i$ is given by $AB = AC - BC$
That is, $d_i = y_i - f(x_i)$, so that $d_i^2 = [y_i - f(x_i)]^2$
The sum of the squares of the deviations is given by $S = \sum d_i^2 = \sum [y_i - f(x_i)]^2 = \sum [y_i - (a + bx_i)]^2$
According to the least squares method, S is minimum
For S to be minimum, we need $\frac{\partial S}{\partial a} = 0$ and $\frac{\partial S}{\partial b} = 0$
Now $\frac{\partial S}{\partial a} = 0 \Rightarrow 2\sum [y_i - (a + bx_i)](-1) = 0$
$\Rightarrow \sum [y_i - (a + bx_i)] = 0$
$\Rightarrow \sum y_i = \sum a + \sum bx_i$
$\Rightarrow \sum y_i = na + b\sum x_i$ ... (1)
And $\frac{\partial S}{\partial b} = 0 \Rightarrow 2\sum [y_i - (a + bx_i)](-x_i) = 0$
$\Rightarrow \sum [x_iy_i - (ax_i + bx_i^2)] = 0$
$\Rightarrow \sum x_iy_i = \sum ax_i + \sum bx_i^2$
$\Rightarrow \sum x_iy_i = a\sum x_i + b\sum x_i^2$ ... (2)
Therefore (1) and (2) are the required normal equations
13. Determine the least squares regression equation of the form $y = a + bx_1 + cx_2$ for the following data

y 3 5 6 8 12 14
x1 16 10 7 4 3 2
x2 90 72 54 42 30 12

Solution: The equation is $y = a + bx_1 + cx_2$
The normal equations are
$\sum y = na + b\sum x_1 + c\sum x_2$,
$\sum x_1y = a\sum x_1 + b\sum x_1^2 + c\sum x_1x_2$
$\sum x_2y = a\sum x_2 + b\sum x_1x_2 + c\sum x_2^2$

Consider,
y x1 x2 x1^2 x2^2 x1x2 x1y x2y
3 16 90 256 8100 1440 48 270
5 10 72 100 5184 720 50 360
6 7 54 49 2916 378 42 324
8 4 42 16 1764 168 32 336
12 3 30 9 900 90 36 360
14 2 12 4 144 24 28 168
Total 48 42 300 434 19008 2820 236 1818

The normal equations become $48 = 6a + 42b + 300c$ ... (1)
$236 = 42a + 434b + 2820c$ ... (2)
and $1818 = 300a + 2820b + 19008c$ ... (3)
Solving (1), (2) and (3), $a = 16.1067$, $b = 0.4270$ and $c = -0.2219$
Hence the regression equation is $y = 16.1067 + 0.4270x_1 - 0.2219x_2$

14. Determine the least squares regression equation of the form $z = a + bx + cy$ for the following data

z 16 19 23 20 26 23 28
x 1 2 3 4 5 6 7
y 4 5 7 2 6 1 4

Solution: The equation is $z = a + bx + cy$
The normal equations are
$\sum z = na + b\sum x + c\sum y$,
$\sum xz = a\sum x + b\sum x^2 + c\sum xy$
$\sum yz = a\sum y + b\sum xy + c\sum y^2$
Consider,
z x y x^2 y^2 xy xz yz
16 1 4 1 16 4 16 64
19 2 5 4 25 10 38 95
23 3 7 9 49 21 69 161
20 4 2 16 4 8 80 40
26 5 6 25 36 30 130 156
23 6 1 36 1 6 138 23
28 7 4 49 16 28 196 112
Total 155 28 29 140 147 107 667 651

The normal equations become $155 = 7a + 28b + 29c$ ... (1)
$667 = 28a + 140b + 107c$ ... (2)
and $651 = 29a + 107b + 147c$ ... (3)
Solving (1), (2) and (3), $a = 10$, $b = 2$ and $c = 1$
Hence the regression equation is $z = 10 + 2x + y$
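
The three normal equations of Problem 14 can also be solved with Cramer's rule, which is a convenient alternative to elimination for a 3 x 3 system. The sketch below uses the illustrative helper names `det3` and `cramer3`. Cramer's rule is a choice made for this example and is not necessarily the method used in the notes.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cramer3(A, rhs):
    """Solve a 3x3 system by Cramer's rule (assumes det(A) != 0)."""
    d = det3(A)
    solution = []
    for k in range(3):
        # Replace column k of A by the right-hand side.
        Ak = [[rhs[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(Ak) / d)
    return solution


# Data of Problem 14
z = [16, 19, 23, 20, 26, 23, 28]
x = [1, 2, 3, 4, 5, 6, 7]
y = [4, 5, 7, 2, 6, 1, 4]
n = len(z)

sx, sy, sz = sum(x), sum(y), sum(z)
sxx = sum(u * u for u in x)
syy = sum(v * v for v in y)
sxy = sum(u * v for u, v in zip(x, y))
sxz = sum(u * w for u, w in zip(x, z))
syz = sum(v * w for v, w in zip(y, z))

a, b, c = cramer3([[n, sx, sy], [sx, sxx, sxy], [sy, sxy, syy]], [sz, sxz, syz])
print(a, b, c)
```

The same routine applies to the form $y = a + bx_1 + cx_2$ of Problem 13 after renaming the variables.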

Exercise:

1. Fit a straight line $y = a + bx$ to the following data by the least squares method

x 1 2 3 4 5 6
y 14 33 40 63 76 85

2. Fit a straight line $y = a + bx$ to the following data by the least squares method

x 0 2 3 5 9
y -3 4 3 8 15

3. Fit a straight line $x = a + by$ to the following data by the least squares method

x 18.5 25.4 30 64.5 34.6 89.8 20.8
y 5 8 10 25 12 36 6
(Ans: $x = 7 + 2.3y$)

4. The following shows the improvement of eight students in a speed-reading program, and the number of weeks they have been in the program:

No. of weeks x 3 5 2 8 6 9 3 4
Speed gain (words/min.) y 86 118 49 193 164 232 73 109

Fit a straight line by the method of least squares

5. If p is the pull required to lift a load w by means of a pulley block, find a linear law of the form $p = mw + c$ using the data
p 12 15 21 25
w 50 70 100 120

6. Predict y at $x = 3.75$ by fitting a power curve $y = ax^b$ to the following data

x 1 2 3 4 5 6
y 2.98 4.26 5.21 6.10 6.80 7.50

7. Fit a second degree parabola $y = a + bx + cx^2$ to the following data

x 2.5 3.6 4.6 5.2 6.8 7.2 8.9 9.2
y 1.8 2.6 4.8 6.2 8.9 4.2 2.9 4.5

(Ans: $y = -7.7504 + 4.422x - 0.3472x^2$)

8. Determine the regression line of y on x for the following data

x 50 60 70 90 100
y 65 51 40 26 08

9. Find Y when X1 = 10 and X2 = 6 from the least squares regression equation of Y on X1 and X2 for the following data
Y 90 72 54 42 30 12
X1 3 5 6 8 12 14
X2 16 10 7 4 3 2

10. Fit a least-squares regression plane to the following data and also find y at $x_1 = 2.2$ and $x_2 = 90$.

y 5.3 7.8 7.4 9.8 10.8 9.1 8.1 7.2 6.5 12.6
x1 1.5 2.5 0.5 1.2 2.6 0.3 2.4 2 0.7 1.6
x2 66 87 69 141 93 105 111 78 66 123

Correlation: A relationship between two variables such that a change in one variable results in a positive or negative change in the other, and a greater change in one variable results in a correspondingly greater change in the other, is called a correlation. If a change in one variable is accompanied by a corresponding change in the other variable, then the variables are said to be correlated.
Note:
(i) If the variables deviate in the same direction then the correlation is called direct or positive correlation
(ii) If the variables deviate in opposite directions then the correlation is called inverse or negative correlation

Correlation Coefficient (or Karl Pearson coefficient of correlation): The numerical measure of the linear relationship between the variables x and y is called the coefficient of correlation of x and y, and it is denoted by $r(x, y)$ or $r$
Note:
(i) The coefficient of correlation r always lies between $-1$ and $1$; that is, $-1 \le r \le 1$
(ii) If $r = 0$ then the variables are not correlated
(iii) If $r = 1$ then the variables are positively and perfectly correlated
(iv) If $r = -1$ then the variables are negatively and perfectly correlated
(v) If $0 < r < 1$ then the variables are positively and partially correlated
(vi) If $-1 < r < 0$ then the variables are negatively and partially correlated

Correlation formulas:

(i) Mean of x is given by $\bar{x} = \frac{\sum x}{n}$

(ii) Variance of x is given by $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}$ or $\sigma_x^2 = \frac{\sum x^2}{n} - (\bar{x})^2$

(iii) Covariance of x and y is given by $Cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$ or $Cov(x, y) = \frac{\sum xy}{n} - (\bar{x})(\bar{y})$

(iv) Coefficient of correlation of x and y is given by $r = \frac{Cov(x, y)}{\sigma_x\sigma_y}$ or $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

(v) $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ or $r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}$

(vi) Regression line of y on x is given by $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$
The slope of the regression line of y on x is called the regression coefficient of y on x
It is denoted by $b_{yx}$ and is given by $b_{yx} = r\frac{\sigma_y}{\sigma_x}$

(vii) Regression line of x on y is given by $x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$
The slope of the regression line of x on y is called the regression coefficient of x on y
It is denoted by $b_{xy}$ and is given by $b_{xy} = r\frac{\sigma_x}{\sigma_y}$

(viii) Both the regression lines pass through the point $(\bar{x}, \bar{y})$

(ix) The geometric mean of the regression coefficients is r; that is, $r^2 = b_{xy}\cdot b_{yx} = r\frac{\sigma_x}{\sigma_y}\cdot r\frac{\sigma_y}{\sigma_x}$
Problems:

1. Determine the coefficient of correlation for the following data

x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9

Solution: The coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Here $\bar{x} = \frac{\sum x}{n} = \frac{56}{8} = 7$ and $\bar{y} = \frac{\sum y}{n} = \frac{40}{8} = 5$
$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})(y-\bar{y})$ $(x-\bar{x})^2$ $(y-\bar{y})^2$
1 1 -6 -4 24 36 16
3 2 -4 -3 12 16 9
4 4 -3 -1 3 9 1
6 4 -1 -1 1 1 1
8 5 1 0 0 1 0
9 7 2 2 4 4 4
11 8 4 3 12 16 9
14 9 7 4 28 49 16
Total 56 40 0 0 84 132 56

Observe that $\sum (x - \bar{x})^2 = 132$, $\sum (y - \bar{y})^2 = 56$, $\sum (x - \bar{x})(y - \bar{y}) = 84$

Therefore, the coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{84}{\sqrt{132 \times 56}} = 0.977$
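
The deviation table above translates directly into a few lines of code. This is a minimal sketch, with `pearson_r` as an illustrative name, that reproduces the three column totals and the final ratio.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Data of Problem 1
r = pearson_r([1, 3, 4, 6, 8, 9, 11, 14], [1, 2, 4, 4, 5, 7, 8, 9])
print(round(r, 3))
```

Because the divisor n cancels between numerator and denominator, the sums can be used directly without first computing variances.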

2. Determine the coefficient of correlation for the following data

x 78 36 98 25 75 82 90 62 65 39
y 84 51 91 60 68 62 86 58 53 47

Solution: The coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Here $\bar{x} = \frac{\sum x}{n} = \frac{650}{10} = 65$ and $\bar{y} = \frac{\sum y}{n} = \frac{660}{10} = 66$

$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})(y-\bar{y})$ $(x-\bar{x})^2$ $(y-\bar{y})^2$
78 84 13 18 234 169 324
36 51 -29 -15 435 841 225
98 91 33 25 825 1089 625
25 60 -40 -6 240 1600 36
75 68 10 2 20 100 4
82 62 17 -4 -68 289 16
90 86 25 20 500 625 400
62 58 -3 -8 24 9 64
65 53 0 -13 0 0 169
39 47 -26 -19 494 676 361
Total 650 660 0 0 2704 5398 2224

Observe that $\sum (x - \bar{x})^2 = 5398$, $\sum (y - \bar{y})^2 = 2224$, $\sum (x - \bar{x})(y - \bar{y}) = 2704$

Therefore, the coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{2704}{\sqrt{5398 \times 2224}} = 0.7804$
3. Determine the coefficient of correlation for the following data

x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71

Solution: The coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Here $\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68$ and $\bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$

$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})(y-\bar{y})$ $(x-\bar{x})^2$ $(y-\bar{y})^2$
65 67 -3 -2 6 9 4
66 68 -2 -1 2 4 1
67 65 -1 -4 4 1 16
67 68 -1 -1 1 1 1
68 72 0 3 0 0 9
69 72 1 3 3 1 9
70 69 2 0 0 4 0
72 71 4 2 8 16 4
Total 544 552 0 0 24 36 44

Observe that $\sum (x - \bar{x})^2 = 36$, $\sum (y - \bar{y})^2 = 44$, $\sum (x - \bar{x})(y - \bar{y}) = 24$

Therefore, the coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{24}{\sqrt{36 \times 44}} = 0.603$

4. From the following data

x 65 66 67 67 68 69 70 72
y 67 68 65 68 72 72 69 71
Determine (i) $\bar{x}$ and $\bar{y}$ (ii) $\sigma_x$ and $\sigma_y$ (iii) $Cov(x, y)$ (iv) the correlation coefficient between x and y

Solution: We know that $\bar{x} = \frac{\sum x}{n}$, $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}$, $Cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$ and $r = \frac{Cov(x, y)}{\sigma_x\sigma_y}$

Consider,
$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})(y-\bar{y})$ $(x-\bar{x})^2$ $(y-\bar{y})^2$
65 67 -3 -2 6 9 4
66 68 -2 -1 2 4 1
67 65 -1 -4 4 1 16
67 68 -1 -1 1 1 1
68 72 0 3 0 0 9
69 72 1 3 3 1 9
70 69 2 0 0 4 0
72 71 4 2 8 16 4
Total 544 552 0 0 24 36 44

(i) $\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68$ and $\bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$

(ii) $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{36}{8} = 4.5$, so $\sigma_x = \sqrt{4.5} = 2.1213$

$\sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{44}{8} = 5.5$, so $\sigma_y = \sqrt{5.5} = 2.3452$

(iii) $Cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{24}{8} = 3$

(iv) $r = \frac{Cov(x, y)}{\sigma_x\sigma_y} = \frac{3}{2.1213 \times 2.3452} = 0.6030$

5. Find the correlation coefficient between x and y from the following data

x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108

Solution: The coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Here $\bar{x} = \frac{\sum x}{n} = \frac{596}{8} = 74.5$ and $\bar{y} = \frac{\sum y}{n} = \frac{1006}{8} = 125.75$
Consider,
$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})(y-\bar{y})$ $(x-\bar{x})^2$ $(y-\bar{y})^2$
78 125 3.5 -0.75 -2.625 12.25 0.5625
89 137 14.5 11.25 163.125 210.25 126.5625
97 156 22.5 30.25 680.625 506.25 915.0625
69 112 -5.5 -13.75 75.625 30.25 189.0625
59 107 -15.5 -18.75 290.625 240.25 351.5625
79 138 4.5 12.25 55.125 20.25 150.0625
68 123 -6.5 -2.75 17.875 42.25 7.5625
57 108 -17.5 -17.75 310.625 306.25 315.0625
Total 596 1006 0 0 1591 1368 2055.5

Observe that $\sum (x - \bar{x})^2 = 1368$, $\sum (y - \bar{y})^2 = 2055.5$, $\sum (x - \bar{x})(y - \bar{y}) = 1591$

Therefore, the coefficient of correlation is $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{1591}{\sqrt{1368 \times 2055.5}} = 0.9488$

6. From the following data

x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108
Determine
(i) $\bar{x}$ and $\bar{y}$
(ii) $\sigma_x$ and $\sigma_y$
(iii) $Cov(x, y)$
(iv) the correlation coefficient between x and y
(v) the two regression lines

Solution: We know that $\bar{x} = \frac{\sum x}{n}$, $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n}$, $Cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$ and $r = \frac{Cov(x, y)}{\sigma_x\sigma_y}$
Consider,

$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})(y-\bar{y})$ $(x-\bar{x})^2$ $(y-\bar{y})^2$
78 125 3.5 -0.75 -2.625 12.25 0.5625
89 137 14.5 11.25 163.125 210.25 126.5625
97 156 22.5 30.25 680.625 506.25 915.0625
69 112 -5.5 -13.75 75.625 30.25 189.0625
59 107 -15.5 -18.75 290.625 240.25 351.5625
79 138 4.5 12.25 55.125 20.25 150.0625
68 123 -6.5 -2.75 17.875 42.25 7.5625
57 108 -17.5 -17.75 310.625 306.25 315.0625
Total 596 1006 0 0 1591 1368 2055.5

(i) $\bar{x} = \frac{\sum x}{n} = \frac{596}{8} = 74.5$ and $\bar{y} = \frac{\sum y}{n} = \frac{1006}{8} = 125.75$

(ii) $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{1368}{8} = 171$, so $\sigma_x = \sqrt{171} = 13.0767$

$\sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{2055.5}{8} = 256.9375$, so $\sigma_y = \sqrt{256.9375} = 16.0293$

(iii) $Cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{1591}{8} = 198.875$

(iv) $r = \frac{Cov(x, y)}{\sigma_x\sigma_y} = \frac{198.875}{13.0767 \times 16.0293} = 0.9488$

(v) The regression line of y on x is given by $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$
That is, $y - 125.75 = (0.9488)\frac{16.0293}{13.0767}(x - 74.5)$
$\Rightarrow y - 125.75 = 1.1630(x - 74.5)$
$\Rightarrow y = 125.75 + 1.1630x - 86.6435$
$\Rightarrow y = 39.1065 + 1.163x$
And the regression line of x on y is given by $x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$
That is, $x - 74.5 = (0.9488)\frac{13.0767}{16.0293}(y - 125.75)$
$\Rightarrow x - 74.5 = 0.7740(y - 125.75)$
$\Rightarrow x = 74.5 + 0.774y - 97.3305$
$\Rightarrow x = -22.8305 + 0.774y$

7. The two regression equations of the variables x and y are $y - 0.399x - 6.934 = 0$ and $x - 1.212y + 2.461 = 0$. Find (i) mean of x (ii) mean of y (iii) correlation coefficient between x and y

Solution: Both regression lines pass through $(\bar{x}, \bar{y})$. Solving the given equations, $x = 11.5083$ and $y = 11.5258$
Therefore, $\bar{x} = 11.5083$ and $\bar{y} = 11.5258$
The regression coefficient of y on x is given by $r\frac{\sigma_y}{\sigma_x} = 0.399$
The regression coefficient of x on y is given by $r\frac{\sigma_x}{\sigma_y} = 1.212$
Correlation coefficient $r = \sqrt{\left(r\frac{\sigma_y}{\sigma_x}\right)\left(r\frac{\sigma_x}{\sigma_y}\right)} = \sqrt{0.399 \times 1.212} = 0.6954$
(r is positive since both the regression coefficients are positive)
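
The recipe in Problem 7 (intersect the two lines to get the means, then take the signed square root of the product of the regression coefficients) can be sketched as follows. The function `means_and_r` and its argument names are assumptions introduced for this example. Each line is passed in slope-intercept form.

```python
import math


def means_and_r(b_yx, c_yx, b_xy, c_xy):
    """Lines are y = b_yx*x + c_yx (y on x) and x = b_xy*y + c_xy (x on y).

    Their intersection is (mean of x, mean of y); r is the geometric mean of
    the two regression coefficients, carrying their common sign.
    """
    # Substitute the first line into the second and solve for x.
    mean_x = (b_xy * c_yx + c_xy) / (1 - b_xy * b_yx)
    mean_y = b_yx * mean_x + c_yx
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return mean_x, mean_y, r


# Problem 7: y = 0.399x + 6.934 and x = 1.212y - 2.461
mean_x, mean_y, r = means_and_r(0.399, 6.934, 1.212, -2.461)
print(round(mean_x, 4), round(mean_y, 4), round(r, 4))
```

With negative regression coefficients the same function returns a negative r, since `math.copysign` transfers the sign of the slope to the square root.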

8. The two regression equations of the variables x and y are $x = 19.13 - 0.87y$ and $y = 11.64 - 0.50x$. Find (i) mean of x (ii) mean of y (iii) correlation coefficient between x and y

Solution: Solving the given equations, $x = 15.9349$ and $y = 3.6726$
Therefore, $\bar{x} = 15.9349$ and $\bar{y} = 3.6726$
The regression coefficient of y on x is given by $r\frac{\sigma_y}{\sigma_x} = -0.50$
The regression coefficient of x on y is given by $r\frac{\sigma_x}{\sigma_y} = -0.87$
Correlation coefficient $r = -\sqrt{\left(r\frac{\sigma_y}{\sigma_x}\right)\left(r\frac{\sigma_x}{\sigma_y}\right)} = -\sqrt{(-0.50)(-0.87)} = -0.66$
(r is negative since both the regression coefficients are negative)

9. Establish the formula $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$
Solution: Consider,
$\sigma_{x-y}^2 = \frac{1}{n}\sum [(x - y) - (\bar{x} - \bar{y})]^2 = \frac{1}{n}\sum [(x - \bar{x}) - (y - \bar{y})]^2$
$= \frac{1}{n}\sum (x - \bar{x})^2 + \frac{1}{n}\sum (y - \bar{y})^2 - \frac{2}{n}\sum (x - \bar{x})(y - \bar{y})$
$= \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$
Therefore, $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$

10. Establish the formula $r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}$
Solution: Consider,
$\sigma_{x+y}^2 = \frac{1}{n}\sum [(x + y) - (\bar{x} + \bar{y})]^2 = \frac{1}{n}\sum [(x - \bar{x}) + (y - \bar{y})]^2$
$= \frac{1}{n}\sum (x - \bar{x})^2 + \frac{1}{n}\sum (y - \bar{y})^2 + \frac{2}{n}\sum (x - \bar{x})(y - \bar{y})$
$= \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$
Therefore, $r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}$

11. Use the formula $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ to compute the correlation coefficient for the following data

x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108

Solution: Consider,

$x$ $y$ $x-\bar{x}$ $y-\bar{y}$ $(x-\bar{x})^2$ $(y-\bar{y})^2$ $z = x-y$ $z-\bar{z}$ $(z-\bar{z})^2$
78 125 3.5 -0.75 12.25 0.5625 -47 4.25 18.0625
89 137 14.5 11.25 210.25 126.5625 -48 3.25 10.5625
97 156 22.5 30.25 506.25 915.0625 -59 -7.75 60.0625
69 112 -5.5 -13.75 30.25 189.0625 -43 8.25 68.0625
59 107 -15.5 -18.75 240.25 351.5625 -48 3.25 10.5625
79 138 4.5 12.25 20.25 150.0625 -59 -7.75 60.0625
68 123 -6.5 -2.75 42.25 7.5625 -55 -3.75 14.0625
57 108 -17.5 -17.75 306.25 315.0625 -51 0.25 0.0625
Total 596 1006 0 0 1368 2055.5 -410 0 241.5
(i) $\bar{x} = \frac{\sum x}{n} = \frac{596}{8} = 74.5$, $\bar{y} = \frac{\sum y}{n} = \frac{1006}{8} = 125.75$ and $\bar{z} = \bar{x} - \bar{y} = \frac{\sum z}{n} = \frac{-410}{8} = -51.25$

(ii) $\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{1368}{8} = 171$, so $\sigma_x = \sqrt{171} = 13.0767$

$\sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{2055.5}{8} = 256.9375$, so $\sigma_y = \sqrt{256.9375} = 16.0293$

$\sigma_{x-y}^2 = \sigma_z^2 = \frac{\sum (z - \bar{z})^2}{n} = \frac{241.5}{8} = 30.1875$, so $\sigma_{x-y} = \sigma_z = \sqrt{30.1875} = 5.4943$
(iii) $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y} = \frac{171 + 256.9375 - 30.1875}{2 \times 13.0767 \times 16.0293} = 0.9488$
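
The variance-of-differences formula can be verified on the same data with a few lines of code. This is a sketch under the convention used in these notes (variance with divisor n); the helper name `variance` is chosen for this example.

```python
import math


def variance(values):
    """Population variance (divisor n), matching the formulas in these notes."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


xs = [78, 89, 97, 69, 59, 79, 68, 57]
ys = [125, 137, 156, 112, 107, 138, 123, 108]
diffs = [x - y for x, y in zip(xs, ys)]   # z = x - y

var_x = variance(xs)
var_y = variance(ys)
var_d = variance(diffs)

# r = (var_x + var_y - var_{x-y}) / (2 * sd_x * sd_y)
r = (var_x + var_y - var_d) / (2 * math.sqrt(var_x) * math.sqrt(var_y))
print(var_x, var_y, var_d, round(r, 4))
```

The result agrees with the value obtained earlier from the deviation-product form for the same data, which is a useful cross-check when working by hand.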
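12. If $\theta$ is the angle between the two regression lines, prove that $\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
Solution: We know that the two regression lines are
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ... (1)
$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ ... (2)
From (2), $y - \bar{y} = \frac{1}{r}\frac{\sigma_y}{\sigma_x}(x - \bar{x})$
The slope of the line (1) is $m_1 = r\frac{\sigma_y}{\sigma_x}$
The slope of the line (2) is $m_2 = \frac{1}{r}\frac{\sigma_y}{\sigma_x}$
Therefore, $\tan\theta = \frac{m_1 - m_2}{1 + m_1m_2} = \frac{r\frac{\sigma_y}{\sigma_x} - \frac{1}{r}\frac{\sigma_y}{\sigma_x}}{1 + \left(r\frac{\sigma_y}{\sigma_x}\right)\left(\frac{1}{r}\frac{\sigma_y}{\sigma_x}\right)} = \frac{\frac{r^2 - 1}{r}\cdot\frac{\sigma_y}{\sigma_x}}{\frac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{r^2 - 1}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$
If $\theta$ is the acute angle then $\tan\theta$ is positive, and therefore $\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$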

Exercise:

1. Determine the correlation coefficient for the following data

x 11.1 10.3 12 15.1 13.7 18.5 17.3 14.2 14.8 15.3


y 10.9 14.2 13.8 21.5 13.2 21.1 16.4 19.3 17.4 19.0

2. Compute the correlation coefficient to the following data

x 62 56 36 66 25 75 82 78
y 58 44 51 58 60 68 62 84
3. Compute the correlation coefficient to the following data

x 8 1 5 4 7
y 3 4 0 2 1

4. From the following data

x 50 60 70 90 100
y 65 51 40 26 08
Determine
(i) $\bar{x}$ and $\bar{y}$
(ii) $\sigma_x$ and $\sigma_y$
(iii) $Cov(x, y)$
(iv) the correlation coefficient between x and y
(v) the two regression lines

5. The equations of two regression lines obtained in a correlation analysis are $4x - 5y + 33 = 0$ and $20x - 9y = 107$. Compute (i) mean of x (ii) mean of y (iii) correlation coefficient between x and y

6. Psychological tests of intelligence and engineering ability were applied to 10 students. Here is a record of ungrouped data showing intelligence ratio (IR) and engineering ratio (ER). Calculate the coefficient of correlation

Student A B C D E F G H I J
IR 105 104 102 101 100 99 98 96 93 92
ER 101 103 100 98 95 96 104 92 97 94

(Ans: $\bar{x} = 99$, $\bar{y} = 98$, $\sum (x - \bar{x})^2 = 170$, $\sum (y - \bar{y})^2 = 140$, $\sum (x - \bar{x})(y - \bar{y}) = 92$

Correlation coefficient $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \approx 0.596$)

7. Use the formula $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$ to compute the correlation coefficient for the following data
X 62 56 36 66 25 75 82 78
Y 58 44 51 58 60 68 62 84

8. Given that $\bar{x} = 31.6$, $\bar{y} = 38$, $\sigma_x = 3.72$, $\sigma_y = 6.31$ and $r = 0.36$. Determine the two regression lines

Rank Correlation: The correlation between the ranks of the variables x and y is called the rank correlation

Rank Correlation Coefficient (or Spearman Rank Correlation Coefficient): It is denoted by $\rho(x, y)$ or $\rho$ and is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, where d is the difference between the ranks of corresponding values of x, y and n is the number of pairs of data points.

Repeated Values: If an item of x or y is repeated m times, then we give the average rank to the repeated items and add the factor $\frac{m(m^2 - 1)}{12}$ to $\sum d^2$ in the formula of $\rho$.
Problems:
1. Determine the rank correlation coefficient for the following data

x 68 64 75 50 64 80 75 40 55 64
y 62 58 68 45 81 60 68 48 50 70

Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
The values of x in decreasing order: 80, 75, 75, 68, 64, 64, 64, 55, 50, 40
The values of y in decreasing order: 81, 70, 68, 68, 62, 60, 58, 50, 48, 45
Consider,
x y Rank x Rank y d = Rank x - Rank y d^2
68 62 4 5 -1 1
64 58 6 7 -1 1
75 68 2.5 3.5 -1 1
50 45 9 10 -1 1
64 81 6 1 5 25
80 60 1 6 -5 25
75 68 2.5 3.5 -1 1
40 48 10 9 1 1
55 50 8 8 0 0
64 70 6 2 4 16
$\sum d^2 = 72$
Here, in the values of x, 75 is repeated 2 times and 64 is repeated 3 times
And in the values of y, 68 is repeated 2 times
Therefore the correction factor is given by
$\sum \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} = \frac{1}{2} + 2 + \frac{1}{2} = 3$
Now, $n = 10$, and the corrected sum is $\sum d^2 + \sum \frac{m(m^2 - 1)}{12} = 72 + 3 = 75$
Hence the rank correlation coefficient is
$\rho = 1 - \frac{6(75)}{10(10^2 - 1)} = 1 - \frac{450}{990} = 1 - 0.4545 = 0.5455$
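
The ranking with ties and the correction factor used in Problem 1 can be sketched in code. The helper names below are assumptions made for this illustration. The tie handling follows the rule stated in these notes (average ranks, plus $m(m^2-1)/12$ added to $\sum d^2$ for every repeated value), which is one common convention and may differ from what statistical packages compute.

```python
def average_ranks(values):
    """Rank in decreasing order (largest value gets rank 1); ties share the average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1    # first position occupied by v
        count = ordered.count(v)        # how many times v occurs
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over all values; a value occurring once contributes 0."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(m * (m * m - 1) / 12 for m in counts.values())


def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Data of Problem 1
xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_rho(xs, ys)
print(round(rho, 4))
```

When there are no repeated values, `tie_correction` returns 0 and the function reduces to the basic formula.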
2. Determine the rank correlation coefficient for the following data

x 10 15 12 17 13 16 24 14 22
y 30 42 45 46 33 34 40 35 39

Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Consider,

x y Rank x Rank y d = Rank x - Rank y d^2
10 30 9 9 0 0
15 42 5 3 2 4
12 45 8 2 6 36
17 46 3 1 2 4
13 33 7 8 -1 1
16 34 4 7 -3 9
24 40 1 4 -3 9
14 35 6 6 0 0
22 39 2 5 -3 9
$\sum d^2 = 72$
Here, there are no repetitions in the values of x and y
Now, $n = 9$ and $\sum d^2 = 72$
Hence the rank correlation coefficient is
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(72)}{9(9^2 - 1)} = 1 - \frac{432}{720} = 1 - 0.6 = 0.4$


3. Ten participants in a contest are ranked by two judges as follows

x 1 6 5 10 3 2 4 9 7 8
y 6 4 9 8 1 2 3 10 5 7
Calculate the rank correlation coefficient
Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Consider,
x y Rank x Rank y d = Rank x - Rank y d^2
1 6 1 6 -5 25
6 4 6 4 2 4
5 9 5 9 -4 16
10 8 10 8 2 4
3 1 3 1 2 4
2 2 2 2 0 0
4 3 4 3 1 1
9 10 9 10 -1 1
7 5 7 5 2 4
8 7 8 7 1 1
$\sum d^2 = 60$
Here, there are no repetitions in the values of x and y
Now, $n = 10$ and $\sum d^2 = 60$
Hence the rank correlation coefficient is
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(60)}{10(10^2 - 1)} = 1 - \frac{360}{990} = 1 - 0.3636 = 0.6364$

4. Determine the rank correlation coefficient for the following data

x 5 10 6 3 19 5 6 12 8 2 10 19
y 8 3 2 9 12 3 17 18 22 12 17 20

Solution: The rank correlation of x, y is given by $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Consider,
x y Rank x Rank y d = Rank x - Rank y d^2
5 8 9.5 9 0.5 0.25
10 3 4.5 10.5 -6 36
6 2 7.5 12 -4.5 20.25
3 9 11 8 3 9
19 12 1.5 6.5 -5 25
5 3 9.5 10.5 -1 1
6 17 7.5 4.5 3 9
12 18 3 3 0 0
8 22 6 1 5 25
2 12 12 6.5 5.5 30.25
10 17 4.5 4.5 0 0
19 20 1.5 2 -0.5 0.25
$\sum d^2 = 156$
Here, in the values of x, 19 is repeated 2 times, 10 is repeated 2 times, 6 is repeated 2 times, and 5 is repeated 2 times
And in the values of y, 17 is repeated 2 times, 12 is repeated 2 times and 3 is repeated 2 times
Therefore the correction factor is given by
$\sum \frac{m(m^2 - 1)}{12} = 7 \times \frac{2(2^2 - 1)}{12} = \frac{7}{2} = 3.5$
Now, $n = 12$, and the corrected sum is $\sum d^2 + \sum \frac{m(m^2 - 1)}{12} = 156 + 3.5 = 159.5$
Hence the rank correlation coefficient is
$\rho = 1 - \frac{6(159.5)}{12(12^2 - 1)} = 1 - \frac{957}{1716} = 1 - 0.5577 = 0.4423$
Exercise:

1. Determine the rank correlation coefficient for the following data

x 8 3 9 2 7 10 4 6 1 5
y 9 5 10 1 8 7 3 4 2 6

Ans: $n = 10$, $\sum d^2 = 24$ and $\rho = 0.8545$


2. Determine the rank correlation coefficient for the following data

x 78 56 36 66 25 75 82 62
y 84 44 57 58 60 68 62 58

Ans: $n = 8$, $\sum d^2 = 28.5$, $\sum \frac{m(m^2 - 1)}{12} = \frac{1}{2}$ and $\rho = 0.655$

3. Determine the rank correlation coefficient for the following data

x 65 63 67 64 68 62 70 66 68 67 69 71
y 68 66 68 65 69 66 68 65 71 67 68 70

Ans: $n = 12$, $\sum d^2 = 72.5$, $\sum \frac{m(m^2 - 1)}{12} = 7$ and $\rho = 0.722$
