Chapter 4
CORRELATION
Ans: Correlation: Correlation measures the strength of the linear relationship between two or more variables. If a change in one variable is accompanied by a change in the other variable, the variables are said to be correlated.
For example, the production of paddy depends on rainfall. Here the production of paddy is considered the dependent variable and rainfall the independent variable.
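$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad \text{(1)}$$
Equation (1) is known as Karl Pearson's coefficient of correlation formula, given by Karl Pearson (1890).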
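$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}\left\{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right\}}}$$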
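Both forms of Karl Pearson's formula give the same number: the deviation form (1) is convenient in proofs, and the raw-sum form is convenient for hand calculation. The Python sketch below implements each one directly. The function names are illustrative choices, and the sketch assumes two equal-length lists in which neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson_r_deviation(xs, ys):
    """Karl Pearson's r from deviations about the means, equation (1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_raw_sums(xs, ys):
    """The same coefficient computed from the raw sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 7]
print(round(pearson_r_deviation(xs, ys), 4), round(pearson_r_raw_sums(xs, ys), 4))
```

Python 3.10 and later also provide `statistics.correlation`, which returns the same Pearson coefficient; the explicit version is shown only to mirror the two formulas.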
2. Types of Correlation.
There are three types of correlations. They are
4. What is a scatter diagram? Interpret the different values of r with the help of scatter diagrams.
Scatter diagram: The diagrammatic way of representing bivariate data is called a scatter diagram.
Suppose (x1, y1), (x2, y2), ..., (xn, yn) are n pairs of observations. If the values of the variables x and y are plotted along the x-axis and y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a scatter diagram.
Fig 1.1: Scatter diagram showing r = +1
Fig 1.2: Scatter diagram showing r = -1
Fig 1.3: Scatter diagram showing 0 < r < 1
Fig 1.4: Scatter diagram showing -1 < r < 0
Fig 1.5: Scatter diagram showing r = 0
Fig 1.6: Scatter diagram showing r = 0
i.e., $r_{xy} = \sqrt{b_{yx} \times b_{xy}}$
5. For two independent variables, the correlation coefficient is zero.
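$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad \text{(1)}$$
Writing $X = x_i - \bar{x}$ and $Y = y_i - \bar{y}$,
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$$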
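Let us consider the following expression, which is never negative because it is a sum of squares:
$$\sum\left(\frac{X}{\sqrt{\sum X^2}} \pm \frac{Y}{\sqrt{\sum Y^2}}\right)^2 \ge 0$$
or,
$$\sum\left(\frac{X^2}{\sum X^2} \pm \frac{2XY}{\sqrt{\sum X^2}\sqrt{\sum Y^2}} + \frac{Y^2}{\sum Y^2}\right) \ge 0$$
or,
$$\frac{\sum X^2}{\sum X^2} \pm \frac{2\sum XY}{\sqrt{\sum X^2 \sum Y^2}} + \frac{\sum Y^2}{\sum Y^2} \ge 0$$
or, $1 \pm 2r + 1 \ge 0$, i.e., $2(1 \pm r) \ge 0$, so $1 \pm r \ge 0$.
Taking the + sign: $1 + r \ge 0$, or $r \ge -1$, i.e., $-1 \le r$ …………(ii)
Taking the - sign: $1 - r \ge 0$, or $1 \ge r$, i.e., $r \le 1$ …………..(iii)
From (ii) and (iii) we get $-1 \le r \le 1$.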
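The inequality can also be checked numerically. This is an illustration, not a proof: the sketch computes r for many randomly generated data sets and confirms that every value falls in [-1, 1]. The seed, the sample size and the range of the random values are arbitrary assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    # Deviation form of Karl Pearson's coefficient: r = sum(XY) / sqrt(sum(X^2) * sum(Y^2))
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    X = [x - x_bar for x in xs]
    Y = [y - y_bar for y in ys]
    return sum(a * b for a, b in zip(X, Y)) / math.sqrt(
        sum(a * a for a in X) * sum(b * b for b in Y)
    )

random.seed(1)
values = []
for _ in range(1000):
    xs = [random.uniform(-10, 10) for _ in range(8)]
    ys = [random.uniform(-10, 10) for _ in range(8)]
    values.append(pearson_r(xs, ys))

# The result proved above says every value must lie in [-1, 1];
# the tiny tolerance only allows for floating-point rounding.
print(min(values), max(values))
```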
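Since x and y are independent, their covariance is zero, i.e., $\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = 0$.
We know,
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{0}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} = 0 \text{ (proved)}$$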
(ii) r = -1 indicates that there is perfect negative correlation between x and y.
(iii) r = 1 indicates that there is perfect positive correlation between x and y.
(iv) r > 1 is not possible, because the correlation coefficient lies between -1 and +1.
(v) r < -1 is not possible, because the correlation coefficient lies between -1 and +1.
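$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad \text{(1)}$$
Now, y = mx + c ………..(ii)
so that $\bar{y} = m\bar{x} + c$ and $y_i - \bar{y} = mx_i + c - m\bar{x} - c$. Substituting in (1),
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(mx_i + c - m\bar{x} - c)}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(mx_i - m\bar{x})^2}} = \frac{m\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})}{m\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(x_i-\bar{x})^2}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = 1$$
Here m > 0 is assumed, because $\sqrt{m^2} = m$ only for positive m; if m < 0, the same steps give r = -1.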
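A numerical check of this result, including the sign caveat for a negative slope: the sketch builds data lying exactly on y = mx + c for a few arbitrarily chosen slopes and intercepts and computes r with equation (1).

```python
import math

def pearson_r(xs, ys):
    # Deviation form of Karl Pearson's coefficient, equation (1)
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.5, 4.0, 7.0, 9.5]
results = {}
for m, c in [(3.0, 2.0), (0.5, -4.0), (-2.0, 1.0)]:
    ys = [m * x + c for x in xs]          # points lying exactly on y = mx + c
    results[(m, c)] = pearson_r(xs, ys)

for (m, c), r in results.items():
    print(f"m = {m:5.1f}, c = {c:5.1f}, r = {r:.6f}")
```

Up to floating-point rounding, r is +1 for both positive slopes and -1 for the negative slope. The intercept c has no effect, because it cancels when deviations from the mean are taken.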
Application Problem-2: A research physician recorded the temperature of the water and the reduction in pulse rate when the faces of ten small children were submerged in cold water to control abnormally rapid heartbeats. The results are presented in the following table. Calculate the correlation coefficient between the temperature of the water and the reduction in pulse rate.
Temperature of water (x):       68  65  70  62  60  55  58  65  69  63
Reduction in pulse rate (y):     2   5   1  10   9  13  10   3   4   6
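Here n = 10, ∑x = 635, ∑y = 63, ∑xy = 3835, ∑x² = 40537 and ∑y² = 541.
We know,
$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}\left\{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right\}}}$$
$$= \frac{3835 - \frac{635 \times 63}{10}}{\sqrt{\left\{40537 - \frac{(635)^2}{10}\right\}\left\{541 - \frac{(63)^2}{10}\right\}}} = \frac{-165.5}{\sqrt{214.5 \times 144.1}} = -0.94$$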
The value r = -0.94 indicates a strong negative correlation between the temperature of the water and the reduction in pulse rate.
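The arithmetic above can be reproduced with a short Python sketch; the comments give the intermediate sums that appear in the calculation.

```python
import math

# Application Problem-2 data: x = temperature of water, y = reduction in pulse rate
x = [68, 65, 70, 62, 60, 55, 58, 65, 69, 63]
y = [2, 5, 1, 10, 9, 13, 10, 3, 4, 6]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                     # 635, 63
sum_xy = sum(a * b for a, b in zip(x, y))         # 3835
sum_x2 = sum(a * a for a in x)                    # 40537
sum_y2 = sum(b * b for b in y)                    # 541

s_xy = sum_xy - sum_x * sum_y / n                 # -165.5
s_xx = sum_x2 - sum_x ** 2 / n                    # 214.5
s_yy = sum_y2 - sum_y ** 2 / n                    # about 144.1
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 2))
```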
Assignment problem-1: Compute r for the following paired sets of values:
i.(x, y): (1,2) , (2, 3), (3, 5), (4, 4), (5, 7)
ii. (x, y): (1,1) , (2, 3), (3, 5), (4, 7), (5, 9)
iii.(x, y): (1,10) , (2, 8), (3, 6), (4, 4), (5, 2)
iv.(x, y): (2,9) , (3, 5), (4, 6), (5, 2), (6, 1)
v.(x, y): (-2,4) , (-1, 1), (0, 0), (1, 1), (2, 4)
Solution 1: (x, y): (1,2) , (2, 3), (3, 5), (4, 4), (5, 7)
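$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}\left\{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right\}}}$$
Let us make a table to calculate the correlation coefficient.
x    y    x²    y²    xy
1    2    1     4     2
2    3    4     9     6
3    5    9     25    15
4    4    16    16    16
5    7    25    49    35
∑x = 15   ∑y = 21   ∑x² = 55   ∑y² = 103   ∑xy = 74
$$r_{xy} = \frac{74 - \frac{15 \times 21}{5}}{\sqrt{\left\{55 - \frac{(15)^2}{5}\right\}\left\{103 - \frac{(21)^2}{5}\right\}}} = \frac{11}{\sqrt{10 \times 14.8}} = 0.90$$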
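The same raw-sum formula can be applied to all five data sets of Assignment problem-1 at once. This is only a checking aid; the dictionary keys simply mirror the item numbers i to v.

```python
import math

def pearson_r(pairs):
    """Karl Pearson's r from a list of (x, y) pairs, using the raw-sum formula."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    num = sum_xy - sum_x * sum_y / n
    den = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return num / den

data_sets = {
    "i": [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7)],
    "ii": [(1, 1), (2, 3), (3, 5), (4, 7), (5, 9)],
    "iii": [(1, 10), (2, 8), (3, 6), (4, 4), (5, 2)],
    "iv": [(2, 9), (3, 5), (4, 6), (5, 2), (6, 1)],
    "v": [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)],
}
results = {name: pearson_r(pairs) for name, pairs in data_sets.items()}
for name, r in results.items():
    print(name, round(r, 3))
```

Set (v) is worth noticing: y = x² holds exactly, yet r = 0, because Pearson's r measures only the linear part of a relationship.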
Age in years (x):      56   42   36   47   49   42   72   63   55   60
Blood pressure (y):   147  125  118  128  125  140  155  160  149  150
(i) Draw a scatter diagram.
(ii) Find the correlation coefficient between x and y and comment.
RANK CORRELATION
Rank correlation: In some situations it is difficult to measure the values of the variables of a bivariate distribution numerically, but they can be ranked. The correlation coefficient between the two sets of ranks is called the rank correlation coefficient, given by Spearman (1904). It is denoted by R. It is particularly useful for finding the relationship between two qualitative variables such as beauty, honesty, intelligence, efficiency and so on.
When there are no ties, Spearman's rank correlation coefficient is computed as
$$R = 1 - \frac{6\sum d^2}{n(n^2-1)}$$
where d is the difference between the two ranks of an item and n is the number of pairs of ranks.
Remarks:
Note: For finding the rank correlation coefficient, we may have two types of data:
Example-1: Obtain the rank correlation co-efficient for the following data:
A: 80 75 90 70 65 60
B: 65 70 60 75 85 80
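Solution: Here the ranks of the scores are not given. Let us rank both variables starting from the highest value, as shown in the table below:
A     B     Rank of A: R1    Rank of B: R2    d = R1 - R2    d²
80    65    2                5                -3             9
75    70    3                4                -1             1
90    60    1                6                -5             25
70    75    4                3                 1             1
65    85    5                1                 4             16
60    80    6                2                 4             16
Total                                         ∑d = 0         ∑d² = 68
$$R = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 68}{6(6^2-1)} = 1 - \frac{408}{210} = -0.94$$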
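The ranking-and-differencing steps of Example-1 can be sketched in Python as follows. The helper names are hypothetical, and ranking by list position assumes there are no ties, which is the condition under which this formula is stated.

```python
def ranks_from_highest(values):
    """Rank 1 = largest value. Assumes no ties, as the no-ties formula requires."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(a, b):
    """R = 1 - 6*sum(d^2) / (n(n^2 - 1)), with d the difference of the two ranks."""
    n = len(a)
    ra, rb = ranks_from_highest(a), ranks_from_highest(b)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

A = [80, 75, 90, 70, 65, 60]
B = [65, 70, 60, 75, 85, 80]
R, sum_d2 = spearman_no_ties(A, B)
print(sum_d2, round(R, 2))
```

Ranking from the lowest value instead gives the same R, as long as both variables are ranked in the same direction, because every d only changes sign.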
Example-2: Obtain the rank correlation co-efficient for the following data:
Examiner   A  B  C  D  E
I          1  2  3  4  5
II         2  3  1  5  4
Ranking by examiner I: R1    Ranking by examiner II: R2    d = R1 - R2    d²
1                            2                             -1             1
2                            3                             -1             1
3                            1                              2             4
4                            5                             -1             1
5                            4                              1             1
Total                                                      ∑d = 0         ∑d² = 8
$$R = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 8}{5(5^2-1)} = 0.6$$
Comment: There is a positive rank correlation between the rankings of the two examiners.
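When the ranks are given directly, as in Example-2, no ranking step is needed. The sketch also computes Pearson's r on the two rank lists, which illustrates the definition of R as the correlation coefficient between two sets of ranks (the two agree exactly when there are no ties). The function names are illustrative.

```python
import math

def spearman_from_ranks(r1, r2):
    """Spearman's R = 1 - 6*sum(d^2) / (n(n^2 - 1)) when the ranks are given (no ties)."""
    n = len(r1)
    d = [a - b for a, b in zip(r1, r2)]
    sum_d2 = sum(v * v for v in d)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), d, sum_d2

def pearson_r(xs, ys):
    """Karl Pearson's r, equation (1), applied here to the ranks themselves."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

examiner_1 = [1, 2, 3, 4, 5]
examiner_2 = [2, 3, 1, 5, 4]
R, d, sum_d2 = spearman_from_ranks(examiner_1, examiner_2)
print(d, sum_d2, round(R, 2))
print(round(pearson_r(examiner_1, examiner_2), 2))   # same value: R is r computed on ranks
```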
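When some values are tied (repeated), each tied value is given the average of the ranks it would occupy, and the formula is modified to
$$R = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n^2-1)}$$
where m is the number of times a value is repeated, and the correction term is added for every group of tied values.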
Marks in mathematics (x):  20  80  40  12  28  20  15  60
Marks in statistics (y):   30  60  20  30  50  30  40  20
Compute the rank correlation coefficient and comment.
Solution: Let the marks obtained in mathematics be x and the marks obtained in statistics be y.
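The marks in this problem contain ties: 20 occurs twice in mathematics, while 30 occurs three times and 20 twice in statistics. The sketch below assumes one common textbook treatment: tied values receive the average of the ranks they occupy, and a correction m(m² - 1)/12 is added to ∑d² for every group of m tied values, giving R = 1 - 6[∑d² + ∑m(m² - 1)/12] / (n(n² - 1)). Treat this as an assumption about the intended method; some software instead reports Pearson's r computed on the average ranks, which can differ slightly. The helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank from the highest value; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        positions = [i + 1 for i, w in enumerate(ordered) if w == v]
        ranks[v] = sum(positions) / len(positions)
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1)), sum_d2, correction

maths = [20, 80, 40, 12, 28, 20, 15, 60]
stats = [30, 60, 20, 30, 50, 30, 40, 20]
R, sum_d2, correction = spearman_with_ties(maths, stats)
print(sum_d2, correction, round(R, 3))
```

Under these assumptions ∑d² = 80, the total correction is 3, and R is very close to zero, i.e., there is almost no rank correlation between the two sets of marks.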
Assignment problem-3:
Profit (Tk. Crore), x:  25  28  27  33  31  10  16  16  18  23
(ii) Calculate Karl Pearson's and Spearman's rank correlation coefficients and comment.
Assignment problem-4:
(Tk. Lac)
Sales (Tk. Crore):  11  13  17  18  21  24  21  27  26  21
Calculate Karl Pearson's correlation coefficient and Spearman's rank correlation coefficient.
Regression
1. What is regression?
Ans: The probable movement of one variable in terms of the other variables is called
regression.
In other words, the statistical technique by which we can estimate the unknown value of one variable (the dependent variable) from the known value of another variable is called regression.
The term “regression” was used by the famous biometrician Sir F. Galton (1822-1911) in 1877.
Let (x1, y1), (x2, y2), ..., (xn, yn) be n pairs of observations. Then the regression coefficient of y on x is denoted by byx and defined by
$$b_{yx} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
Again, the regression coefficient of x on y is denoted by bxy and defined by
$$b_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
4. Regression lines:
If we consider two variables X and Y, we have two regression lines, namely the regression line of Y on X and the regression line of X on Y. The regression line of Y on X gives the most probable values of Y for given values of X, and the regression line of X on Y gives the most probable values of X for given values of Y. However, when there is either perfect positive or perfect negative correlation between the two variables, the two regression lines coincide, i.e., there is only one line.
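5. Regression equation:
The regression equation of y on x is expressed as $\hat{y} = a + bx$, where
$$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\frac{\sum x}{n}$$
and
$$b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$
Similarly, the regression equation of x on y is $\hat{x} = a + by$, where $a = \bar{x} - b\bar{y}$ and
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}$$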
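The formulas for a and b translate directly into code. The sketch below fits both lines; its data are hypothetical and were chosen to lie exactly on y = 1 + 2x, so that (as noted under regression lines for perfect correlation) the two lines coincide: x̂ = -0.5 + 0.5y is the same line as ŷ = 1 + 2x.

```python
def regression_lines(x, y):
    """Least-squares lines y-hat = a + b*x (y on x) and x-hat = a' + b'*y (x on y)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum(p * p for p in x) - sum_x ** 2 / n
    s_yy = sum(q * q for q in y) - sum_y ** 2 / n
    b_yx = s_xy / s_xx                      # regression coefficient of y on x
    b_xy = s_xy / s_yy                      # regression coefficient of x on y
    a_yx = sum_y / n - b_yx * sum_x / n     # a = y-bar - b_yx * x-bar
    a_xy = sum_x / n - b_xy * sum_y / n     # a' = x-bar - b_xy * y-bar
    return (a_yx, b_yx), (a_xy, b_xy)

# Hypothetical data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1 + 2 * v for v in x]
(a_yx, b_yx), (a_xy, b_xy) = regression_lines(x, y)
print(f"y-hat = {a_yx} + {b_yx}x")
print(f"x-hat = {a_xy} + {b_xy}y")
```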
i.e., $r_{xy} = \sqrt{b_{yx} \times b_{xy}}$
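5. The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient (numerically, i.e., in absolute value, when the coefficients are negative), i.e.,
$$\frac{b_{yx} + b_{xy}}{2} \ge r_{xy}$$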
6. If one of the regression coefficients is greater than unity, the other must be less than unity.
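We know,
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad \text{(1)}$$
The regression coefficient of y on x is
$$b_{yx} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$$
Again, the regression coefficient of x on y is
$$b_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$\therefore\; b_{yx} \times b_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \times \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$\sqrt{b_{yx} \times b_{xy}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} = r_{xy} \text{ (proved)}$$
The square root is taken with the common sign of byx and bxy, which is also the sign of r.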
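The arithmetic mean of byx and bxy is $A.M. = \frac{b_{yx} + b_{xy}}{2}$ and the geometric mean is $G.M. = \sqrt{b_{yx} \times b_{xy}}$. Since A.M. ≥ G.M. for positive numbers,
$$\frac{b_{yx} + b_{xy}}{2} \ge \sqrt{b_{yx} \times b_{xy}}$$
or,
$$\frac{b_{yx} + b_{xy}}{2} \ge r_{xy} \text{ (proved)}$$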
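Properties 4 and 5 can be checked numerically. The sketch uses a small hypothetical data set with positive correlation, where the A.M. ≥ G.M. argument applies directly; `math.copysign` attaches the common sign of the regression coefficients to the geometric mean.

```python
import math

def coefficients(x, y):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    s_xx = sum((p - mx) ** 2 for p in x)
    s_yy = sum((q - my) ** 2 for q in y)
    return s_xy / s_xx, s_xy / s_yy, s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical positively correlated data
x = [2, 4, 5, 7, 9, 10]
y = [3, 4, 7, 8, 8, 12]
b_yx, b_xy, r = coefficients(x, y)

gm = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # G.M., with the common sign
am = (b_yx + b_xy) / 2                             # A.M.
print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r = {r:.4f}")
print(f"G.M. = {gm:.4f}, A.M. = {am:.4f}")
```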
6. When r = 0, the variables are uncorrelated.
6. When r = 0, the two lines of regression are perpendicular to each other.
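$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$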
x    y    x²     y²     xy
39 37 1521 1369 1443
25 18 625 324 450
29 20 841 400 580
35 25 1225 625 875
32 25 1024 625 800
27 20 729 400 540
37 30 1369 900 1110
∑x = 224   ∑y = 175   ∑x² = 7334   ∑y² = 4643   ∑xy = 5798
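(a) Here,
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{5798 - \frac{(224)(175)}{7}}{7334 - \frac{(224)^2}{7}} = \frac{198}{166} = 1.193$$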
And
$$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\frac{\sum x}{n} = \frac{175}{7} - (1.193)\frac{224}{7} = 25 - 38.176 = -13.176$$
(b) Hence, if the age of the husband is 45, the probable age of the wife would be
$\hat{y} = -13.176 + 1.193x = -13.176 + 1.193 \times 45 = 40.51$ years.
(c) The equation of the best-fitted regression line of x on y is $\hat{x} = a + by$,
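where
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}} = \frac{5798 - \frac{(224)(175)}{7}}{4643 - \frac{(175)^2}{7}} = \frac{198}{268} = 0.739$$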
And
$$a = \bar{x} - b\bar{y} = \frac{\sum x}{n} - b\frac{\sum y}{n} = \frac{224}{7} - 0.739 \times \frac{175}{7} = 13.525$$
Hence $\hat{x} = a + by = 13.525 + 0.739y$.
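The whole worked example, parts (a) to (c), can be reproduced as follows, taking x as the husband's age and y as the wife's age as in part (b). The code does not round b before computing a, so its intercepts (about -13.169 and 13.530) differ slightly from the hand values -13.176 and 13.525, which were obtained with b rounded to three decimals.

```python
# Ages from the worked example: x = age of husband, y = age of wife
x = [39, 25, 29, 35, 32, 27, 37]
y = [37, 18, 20, 25, 25, 20, 30]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                    # 224, 175
sum_xy = sum(p * q for p, q in zip(x, y))        # 5798
sum_x2 = sum(p * p for p in x)                   # 7334
sum_y2 = sum(q * q for q in y)                   # 4643

s_xy = sum_xy - sum_x * sum_y / n                # 198
s_xx = sum_x2 - sum_x ** 2 / n                   # 166
s_yy = sum_y2 - sum_y ** 2 / n                   # 268

# (a) regression line of y on x
b_yx = s_xy / s_xx
a_yx = sum_y / n - b_yx * sum_x / n
# (b) probable age of the wife when the husband is 45
y_hat_45 = a_yx + b_yx * 45
# (c) regression line of x on y
b_xy = s_xy / s_yy
a_xy = sum_x / n - b_xy * sum_y / n

print(f"y-hat = {a_yx:.3f} + {b_yx:.3f}x;  y-hat(45) = {y_hat_45:.2f}")
print(f"x-hat = {a_xy:.3f} + {b_xy:.3f}y")
```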
Application problem-2: A research physician recorded the temperature of the water and the reduction in pulse rate when the faces of ten small children were submerged in cold water to control abnormally rapid heartbeats. The results are presented in the following table. Calculate the correlation coefficient and the regression coefficients between the temperature of the water and the reduction in pulse rate.
Temperature of water (x):       68  65  70  62  60  55  58  65  69  63
Reduction in pulse rate (y):     2   5   1  10   9  13  10   3   4   6
Also show that (i) $\frac{b_{yx} + b_{xy}}{2} \ge r_{xy}$ (numerically).
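Here n = 10, ∑x = 635, ∑y = 63, ∑xy = 3835, ∑x² = 40537 and ∑y² = 541.
We know,
$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}\left\{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right\}}}$$
$$= \frac{3835 - \frac{635 \times 63}{10}}{\sqrt{\left\{40537 - \frac{(635)^2}{10}\right\}\left\{541 - \frac{(63)^2}{10}\right\}}} = \frac{-165.5}{\sqrt{214.5 \times 144.1}} = -0.94$$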
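We know, the regression coefficient of y on x is
$$b_{yx} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{3835 - \frac{635 \times 63}{10}}{40537 - \frac{(635)^2}{10}} = \frac{-1655}{2145} = -0.77$$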
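Again, the regression coefficient of x on y is
$$b_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}} = \frac{3835 - \frac{635 \times 63}{10}}{541 - \frac{(63)^2}{10}} = \frac{-1655}{1441} = -1.15$$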
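(i) Here,
$$\frac{b_{yx} + b_{xy}}{2} = \frac{(-0.77) + (-1.15)}{2} = -0.96$$
Numerically, 0.96 ≥ 0.94 = |r_xy|, so the arithmetic mean of the regression coefficients is numerically greater than the correlation coefficient.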
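A Python sketch of this application problem. Because both regression coefficients are negative here, it compares the arithmetic mean with r in absolute value, and it also confirms property 4, byx × bxy = r².

```python
import math

# Application problem-2: x = temperature of water, y = reduction in pulse rate
x = [68, 65, 70, 62, 60, 55, 58, 65, 69, 63]
y = [2, 5, 1, 10, 9, 13, 10, 3, 4, 6]
n = len(x)

s_xy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n   # -165.5
s_xx = sum(p * p for p in x) - sum(x) ** 2 / n                   # 214.5
s_yy = sum(q * q for q in y) - sum(y) ** 2 / n                   # about 144.1

b_yx = s_xy / s_xx
b_xy = s_xy / s_yy
r = s_xy / math.sqrt(s_xx * s_yy)
am = (b_yx + b_xy) / 2

print(round(b_yx, 2), round(b_xy, 2), round(r, 2), round(am, 2))
# Both coefficients are negative, so the inequality is checked numerically:
print(abs(am) >= abs(r))
```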
Assignment Problem-1: The following data give the test scores and sales of nine salesmen of a big department store during the last year:
Test scores (y):              14  19  24  21  26  22  15  20  19
Sales in lakh Taka (x):       31  36  48  37  50  45  33  41  39
(a) Find the regression equation of test scores on sales.
Ans: ŷ = -2.4 + 0.56x
(b) Find the test score when sales are Tk. 40 lakh.
Ans: 20
(c) Find the regression equation of sales on test scores.
Ans: x̂ = 7.8 + 1.61y
(d) Predict the value of sales if the test score is 30.
Ans: Tk. 56.1 lakh
(e) Compute the value of the correlation coefficient with the help of the regression coefficients.
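A sketch for checking the answers to Assignment Problem-1, taking x = sales and y = test scores as in the given answers. Small differences from the listed intercepts (-2.31 here against -2.4) arise because the answers round b to two decimals before computing a.

```python
import math

# Assignment Problem-1: y = test score, x = sales (lakh Taka) for nine salesmen
y = [14, 19, 24, 21, 26, 22, 15, 20, 19]
x = [31, 36, 48, 37, 50, 45, 33, 41, 39]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n                        # 40, 20

s_xy = sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y))  # 193
s_xx = sum((p - x_bar) ** 2 for p in x)                      # 346
s_yy = sum((q - y_bar) ** 2 for q in y)                      # 120

b_yx = s_xy / s_xx                 # test scores on sales
a_yx = y_bar - b_yx * x_bar
b_xy = s_xy / s_yy                 # sales on test scores
a_xy = x_bar - b_xy * y_bar
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # part (e): r from the regression coefficients

print(f"(a) y-hat = {a_yx:.2f} + {b_yx:.2f}x")
print(f"(b) y-hat(40) = {a_yx + b_yx * 40:.1f}")
print(f"(c) x-hat = {a_xy:.2f} + {b_xy:.2f}y")
print(f"(d) x-hat(30) = {a_xy + b_xy * 30:.1f}")
print(f"(e) r = {r:.2f}")
```

Part (b) returns exactly the mean test score, 20, because x = 40 is the mean sales figure and every regression line passes through the point of means.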
Assignment Problem-2: The following table gives the ages and blood pressures of 10 women:
Age in years (x):      56   42   36   47   49   42   72   63   55   60
Blood pressure (y):   147  125  118  128  125  140  155  160  149  150
(i) Obtain the regression line of y on x. Ans: ŷ = 83.76 + 1.11x
(ii) Estimate the blood pressure of a woman whose age is 50 years. Ans: 139.26
(iii) Obtain the regression line of x on y.
(iv) Find the correlation coefficient between x and y and comment.
Assignment Problem-3: Consider the following data set on two variables x and y:
x: 1  2  3  4  5  6
y: 6  4  3  5  4  2
Units:      56   40   48   30   41   42   55   35
Overhead:  282  173  233  116  191  171  274  152
(i) Draw a scatter diagram and comment.
Assignment Problem-5: The following data refer to information about annual sales
Salesman:              1   2   3   4   5   6   7   8
Years of experience:   7   4   5   6  11  12  13  17
(i) Fit the two regression lines.