PPP Correlation BIOSTATISTICS
Correlation: If a change in one variable affects the other variable, the variables are said to be correlated.
If the variables deviate in the same direction, i.e. an increase (decrease) in the value of one variable results in a corresponding increase (decrease) in the value of the other variable, the correlation is said to be positive (+ve) correlation. For example,
(i) Height and weight of a man
(ii) Income and Expenditure
(iii) Population and unemployment.
If the variables deviate in opposite directions, i.e. an increase (decrease) in the value of one variable results in a corresponding decrease (increase) in the value of the other variable, the correlation is said to be negative correlation, e.g.
(i) Price and Demand of a commodity
(ii) Volume and pressure of a perfect gas
Linear Correlation: Correlation is said to be linear if the ratio of the change in the value of one variable to the change in the value of the other variable is almost constant.
For example,
X   5   10   20   25   30   40
Y   8   16   31   39   48   63
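The "almost constant ratio" criterion can be checked directly on the example data. The short Python sketch below computes the ratio of the change in Y to the change in X between consecutive pairs; the helper name `change_ratios` is our own, not from the notes.

```python
# Example data from the linear-correlation illustration above
X = [5, 10, 20, 25, 30, 40]
Y = [8, 16, 31, 39, 48, 63]

def change_ratios(xs, ys):
    """Return (change in Y) / (change in X) for each pair of consecutive observations."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

ratios = change_ratios(X, Y)
print([round(r, 2) for r in ratios])
print("spread of ratios:", round(max(ratios) - min(ratios), 2))
```

All the ratios lie between 1.5 and 1.8, which is what "almost constant" means here; a strongly curved relationship would show ratios that drift steadily up or down.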
Scatter Diagram: A scatter diagram is one of the simplest ways to get an idea of the correlation between two variables. In this method, one variable (generally the independent variable) is taken along the horizontal axis and the other (generally the dependent variable) along the vertical axis, and a dot is plotted for each pair of values of the two variables. The diagram thus obtained is known as a scatter diagram.
INTERPRETATION FROM SCATTER DIAGRAM
In the scatter diagram
1. If the dots lie along a line that rises from bottom left to top right (Fig. 1), the correlation is said to be perfect positive.
[Fig. 1: Scatter diagram for perfect positive correlation (y values plotted against x)]
2. If the dots lie along a line that falls from top left to bottom right (Fig. 2), the correlation is perfect negative.
[Fig. 2: Scatter diagram for perfect negative correlation (y values plotted against x)]
4. If the dots in the scatter diagram show no trend (Fig. 5), there is no correlation between the variables.
[Fig. 5: Scatter diagram showing no correlation (y values plotted against x)]
Note: Remember
Types of Correlation
When both variables are continuous
Ans: Pearson product moment correlation coefficient (r)
When one variable is continuous and the other is dichotomous (0 & 1)
Ans: Point-biserial correlation
When both are dichotomous
Ans: Phi coefficient
When both variables are ordinal
Ans: Spearman rank correlation
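The point-biserial and phi coefficients listed above are Pearson's r computed on variables coded 0 and 1. The Python sketch below checks this on small hypothetical data (the lists `u`, `v` and `scores` are invented for illustration and are not from these notes) by comparing a plain Pearson function with the 2 × 2 table formula for phi and the group-mean formula for point-biserial r. All helper names are our own.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def phi_coefficient(u, v):
    """Phi from the 2x2 table of two 0/1 variables: (ad - bc) / sqrt(product of margins)."""
    a = sum(1 for p, q in zip(u, v) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(u, v) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(u, v) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(u, v) if p == 0 and q == 0)
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def point_biserial(scores, group):
    """r_pb = (M1 - M0) / s * sqrt(p * q), s = population SD of the continuous scores."""
    n = len(scores)
    ones = [s for s, g in zip(scores, group) if g == 1]
    zeros = [s for s, g in zip(scores, group) if g == 0]
    p = len(ones) / n
    q = 1 - p
    mean = sum(scores) / n
    s = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / s * sqrt(p * q)

# Hypothetical data, for illustration only
u = [1, 1, 1, 0, 0, 0, 1, 0]             # dichotomous (0 & 1)
v = [1, 1, 0, 0, 0, 1, 1, 0]             # dichotomous (0 & 1)
scores = [12, 15, 11, 8, 9, 7, 14, 10]   # continuous

print(round(pearson_r(u, v), 4), round(phi_coefficient(u, v), 4))
print(round(pearson_r(scores, u), 4), round(point_biserial(scores, u), 4))
```

Each printed pair agrees, so the special-purpose formulas are shortcuts for Pearson's r on coded data rather than different measures.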
Assumptions when interpreting r:
Normal distribution for X and Y
Linear relationship between X and Y
Homoscedasticity
Reliability of X and Y
Validity of X and Y
Random and representative sampling
Homoscedasticity
In a scatter plot, the vertical distance between a dot and the regression line reflects the amount of prediction error (known as the “residual”).
Note: These residuals should not be related to X. If they are, there is some confound in the study. Residuals should be chance errors, not systematic ones, because systematic means they are related to X. In particular, homoscedasticity means that the spread of the residuals is about the same at every value of X.
X    Y    x = X − 24   y = Y − 37   x²   y²   xy
18   30   −6           −7           36   49   42
20   35   −4           −2           16   4    8
22   37   −2            0           4    0    0
24   33    0           −4           0    16   0
26   38    2            1           4    1    2
28   45    4            8           16   64   32
30   41    6            4           36   16   24
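The question statement for this table is not included in this section; assuming the table is the working for Karl Pearson's r with deviations taken from the actual means (24 for X and 37 for Y), the Python sketch below finishes the computation with r = Σxy / √(Σx² · Σy²).

```python
from math import sqrt

X = [18, 20, 22, 24, 26, 28, 30]
Y = [30, 35, 37, 33, 38, 45, 41]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n   # actual means: 24 and 37
x = [a - mean_x for a in X]               # deviations of X from its mean
y = [b - mean_y for b in Y]               # deviations of Y from its mean

sum_x2 = sum(d * d for d in x)
sum_y2 = sum(d * d for d in y)
sum_xy = sum(dx * dy for dx, dy in zip(x, y))

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(sum_x2, sum_y2, sum_xy, round(r, 4))
```

The totals are Σx² = 112, Σy² = 150 and Σxy = 108, giving r ≈ 0.833, a fairly high positive correlation.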
3. From the following data, compute the coefficient of correlation between X and Y.
X-series Y-series
No. of items 15 15
Arithmetic mean 25 18
Sum of squares of deviations from mean 136 138
Summation of products of deviations of X and Y-series from their respective means = 122.
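When the sums of squared deviations and the sum of products of deviations are taken from the actual means, r = Σxy / √(Σx² · Σy²); the number of items and the means are not needed. A minimal Python sketch with the data of question 3:

```python
from math import sqrt

sum_x2 = 136   # sum of squares of deviations of X from its mean
sum_y2 = 138   # sum of squares of deviations of Y from its mean
sum_xy = 122   # sum of products of deviations of X and Y from their means

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(round(r, 4))
```

This gives r ≈ 0.89.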
4. (a) Calculate the coefficient of correlation between the X- and Y-series from the following data.
X-series Y-series
No. of items 15 15
Arithmetic mean 25 18
Standard deviations 5 5
(b) If the covariance between the X- and Y-series is +3.5 and the variances of the X- and Y-series are 2.99 and 10.56 respectively, find the coefficient of correlation between them.
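Part (b) uses r = Cov(X, Y) / (σx σy), with each σ the square root of the corresponding variance. A minimal Python sketch follows. Note that part (a) also needs the covariance or the sum of products of deviations, which does not appear in the data as given here, so only (b) is computed.

```python
from math import sqrt

cov_xy = 3.5
var_x, var_y = 2.99, 10.56

r = cov_xy / sqrt(var_x * var_y)   # r = Cov(X, Y) / (sigma_x * sigma_y)
print(round(r, 4))
```

This gives r ≈ 0.62.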
5. The coefficient of correlation between two variables X and Y is 0.4 and their covariance is 10. If the variance of X is 9, find the second moment about the mean of the Y-series.
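The second moment about the mean is the variance. Rearranging r = Cov(X, Y) / (σx σy) gives σy = Cov(X, Y) / (r σx). A minimal Python sketch:

```python
from math import sqrt

r = 0.4
cov_xy = 10
var_x = 9

sigma_x = sqrt(var_x)               # 3
sigma_y = cov_xy / (r * sigma_x)    # from r = cov / (sigma_x * sigma_y)
second_moment_y = sigma_y ** 2      # second moment about the mean = variance of Y
print(round(sigma_y, 4), round(second_moment_y, 2))
```

So σy = 10 / 1.2 ≈ 8.33 and the second moment about the mean of Y is 625/9 ≈ 69.44.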
6. Calculate the coefficient of correlation from the following data.
X    Y    U = X − 25   V = Y − 43   U²   V²    UV
20   30   −5           −13          25   169   65
22   35   −3           −8           9    64    24
24   38   −1           −5           1    25    5
26   45    1            2           1    4     2
28   52    3            9           9    81    27
32   60    7           17           49   289   119
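Karl Pearson’s coefficient between the U and V series, and consequently between X and Y, is given by
$$ r = \frac{n\sum UV - (\sum U)(\sum V)}{\sqrt{\left[n\sum U^2 - (\sum U)^2\right]\left[n\sum V^2 - (\sum V)^2\right]}} $$
where n = 6 is the number of pairs of observations.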
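The deviation columns of question 6 correspond to assumed means of 25 for X and 43 for Y (for example 20 − 25 = −5 and 30 − 43 = −13). The Python sketch below applies the assumed-mean form of Karl Pearson's formula; the function name `pearson_assumed_mean` is our own. Because r is unaffected by a change of origin, any assumed means give the same answer, which the last line checks.

```python
from math import sqrt

def pearson_assumed_mean(X, Y, ax, ay):
    """Pearson's r from deviations U = X - ax and V = Y - ay about assumed means."""
    n = len(X)
    u = [a - ax for a in X]
    v = [b - ay for b in Y]
    su, sv = sum(u), sum(v)
    suu = sum(d * d for d in u)
    svv = sum(d * d for d in v)
    suv = sum(p * q for p, q in zip(u, v))
    num = n * suv - su * sv
    den = sqrt((n * suu - su ** 2) * (n * svv - sv ** 2))
    return num / den

X = [20, 22, 24, 26, 28, 32]
Y = [30, 35, 38, 45, 52, 60]

r = pearson_assumed_mean(X, Y, 25, 43)
print(round(r, 4))
print(round(pearson_assumed_mean(X, Y, 0, 0), 4))   # same r with a different origin
```

With ΣU = 2, ΣV = 2, ΣU² = 94, ΣV² = 632 and ΣUV = 242, this gives r = 1448 / √(560 × 3788) ≈ 0.994, a very high positive correlation.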
10. Find Karl Pearson’s coefficient of correlation between the values of X and Y given
below.
X 78 89 96 69 59 79 68 61
Y 125 137 156 112 107 136 123 108
Take 69 and 112 as the assumed means of X and Y respectively.
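Here 69 and 112 are working (assumed) means, not the actual means, so the assumed-mean form of Karl Pearson's formula applies. A minimal Python sketch for question 10, using u = X − 69 and v = Y − 112:

```python
from math import sqrt

X = [78, 89, 96, 69, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]
ax, ay = 69, 112              # assumed means given in the question

u = [a - ax for a in X]       # deviations from the assumed mean of X
v = [b - ay for b in Y]       # deviations from the assumed mean of Y
n = len(X)

su, sv = sum(u), sum(v)
suu = sum(d * d for d in u)
svv = sum(d * d for d in v)
suv = sum(p * q for p, q in zip(u, v))

r = (n * suv - su * sv) / sqrt((n * suu - su ** 2) * (n * svv - sv ** 2))
print(su, sv, suu, svv, suv, round(r, 4))
```

The sums are Σu = 47, Σv = 108, Σu² = 1475, Σv² = 3468 and Σuv = 2116, giving r ≈ 0.954.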
11. Sum of the products of the deviations of X and Y = 3044
Number of pairs of observations = 10
Sum of the deviations of X = −170
Sum of the deviations of Y = −20
Sum of the squares of the deviations of X = 8288
Sum of the squares of the deviations of Y = 2264
Find the coefficient of correlation when the arbitrary means of X and Y are 82 and 68 respectively.
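Since the deviations are taken from arbitrary means, the sums of deviations are not zero and the full assumed-mean formula is needed. A minimal Python sketch with the constants of question 11:

```python
from math import sqrt

n = 10
sum_dxdy = 3044
sum_dx, sum_dy = -170, -20
sum_dx2, sum_dy2 = 8288, 2264

num = n * sum_dxdy - sum_dx * sum_dy
den = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
r = num / den
print(num, round(r, 4))

# The same sums also recover the actual means: arbitrary mean + (sum of deviations / n)
mean_x = 82 + sum_dx / n
mean_y = 68 + sum_dy / n
print(mean_x, mean_y)
```

This gives r = 27040 / √(53980 × 22240) ≈ 0.78. The arbitrary means affect only the actual means (65 and 66), not r.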
12. A computer, while calculating the correlation coefficient between two variables X and
Y obtained the following constants.
It was, however, later discovered at the time of checking that it had copied down two
pairs of observations as:
When the variables under study are qualitative in nature, we cannot obtain actual numerical values for them. In such cases, instead of Karl Pearson’s coefficient of correlation, we find Spearman’s rank correlation coefficient, which is given below:
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where d is the difference between the two ranks of each item and n is the number of pairs of observations.
Note: Remember
X    Y    Rx   Ry   d = Rx − Ry   d²
10   15   2    4    −2            4
12   10   3    2     1            1
8    6    1    1     0            0
15   25   4    7    −3            9
20   16   5    5     0            0
25   12   6    3     3            9
40   18   7    6     1            1
                               Σd² = 24
In case ranks 1, 2, ... are given to the highest, next highest, and so on:
X    Y    Rx   Ry   d = Rx − Ry   d²
10   15   6    4     2            4
12   10   5    6    −1            1
8    6    7    7     0            0
15   25   4    1     3            9
20   16   3    3     0            0
25   12   2    5    −3            9
40   18   1    2    −1            1
                               Σd² = 24
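The two tables above show that giving rank 1 to the lowest or to the highest value leaves Σd² (and hence R) unchanged. The Python sketch below, whose helper names are our own and which assumes there are no tied values, ranks the data both ways and applies R = 1 − 6Σd² / [n(n² − 1)].

```python
def ranks(values, descending=False):
    """Rank 1 = smallest value (or largest if descending). Assumes no ties."""
    order = sorted(values, reverse=descending)
    return [order.index(v) + 1 for v in values]

def spearman(x, y, descending=False):
    """Return (sum of d^2, Spearman's R) for untied data."""
    rx, ry = ranks(x, descending), ranks(y, descending)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))

X = [10, 12, 8, 15, 20, 25, 40]
Y = [15, 10, 6, 25, 16, 12, 18]

print(spearman(X, Y))                    # rank 1 given to the lowest values
print(spearman(X, Y, descending=True))   # rank 1 given to the highest values
```

Both calls give Σd² = 24 and R = 1 − 144/336 ≈ 0.571.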
X    Y    Rx    Ry   d = Rx − Ry   d²
80   12   1     8    −7            49
78   13   2     7    −5            25
75   14   3.5   5    −1.5          2.25
75   14   3.5   5    −1.5          2.25
68   14   5     5     0            0
67   16   6     2     4            16
60   15   7     3     4            16
59   17   8     1     7            49
                                Σd² = 159.5
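Here, in the X-series, 75 is repeated 2 times, so m = 2, and in the Y-series, 14 is repeated 3 times, so m = 3. Putting these values into the formula with the correction for tied ranks, we get
$$ R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{n(n^2 - 1)} = 1 - \frac{6\,(159.5 + 0.5 + 2)}{8(8^2 - 1)} = 1 - \frac{972}{504} \approx -0.929 $$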
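With tied values, each group of m equal values shares the average of the ranks it would occupy, and (m³ − m)/12 is added to Σd² for every such group: R = 1 − 6[Σd² + Σ(m³ − m)/12] / [n(n² − 1)]. The Python sketch below implements that correction; the helper names are our own, and rank 1 is given to the largest value, as in the worked table.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=True)
    rank = {}
    for v in set(values):
        first = order.index(v) + 1        # first rank position taken by v
        m = order.count(v)                # number of times v is repeated
        rank[v] = first + (m - 1) / 2     # mean of positions first .. first + m - 1
    return [rank[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)
    return sum_d2, 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))

X = [80, 78, 75, 75, 68, 67, 60, 59]
Y = [12, 13, 14, 14, 14, 16, 15, 17]
sum_d2, R = spearman_with_ties(X, Y)
print(sum_d2, round(R, 4))
```

This reproduces Σd² = 159.5, a correction of 0.5 + 2 = 2.5, and R ≈ −0.929, a high negative rank correlation.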
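R1   R2   R3   d12   d13   d23   d12²   d13²   d23²
1    3    6    −2    −5    −3    4      25     9
6    5    4     1     2     1    1      4      1
5    8    9    −3    −4    −1    9      16     1
10   4    8     6     2    −4    36     4      16
3    7    1    −4     2     6    16     4      36
2    10   2    −8     0     8    64     0      64
4    2    3     2     1    −1    4      1      1
9    1    10    8    −1    −9    64     1      81
7    6    5     1     2     1    1      4      1
8    9    7    −1     1     2    1      1      4
Total                           200    60     214
(R1, R2, R3: ranks given by Judges 1st, 2nd and 3rd; d12 = R1 − R2, d13 = R1 − R3, d23 = R2 − R3)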
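Rank correlation between Judges 1st and 2nd is given by
$$ R_{12} = 1 - \frac{6 \times 200}{10(10^2 - 1)} = 1 - \frac{1200}{990} \approx -0.212 $$
Rank correlation between Judges 1st and 3rd is given by
$$ R_{13} = 1 - \frac{6 \times 60}{10(10^2 - 1)} = 1 - \frac{360}{990} \approx 0.636 $$
Rank correlation between Judges 2nd and 3rd is given by
$$ R_{23} = 1 - \frac{6 \times 214}{10(10^2 - 1)} = 1 - \frac{1284}{990} \approx -0.297 $$
Since R13 is maximum, we conclude that the pair of Judges 1st and 3rd has the nearest approach to common tastes in beauty.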
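The three-judge comparison can be automated by computing Spearman's R for every pair of judges and taking the largest. The Python sketch below uses the ranks from the worked table for this example (Judge 1st: 1, 6, 5, 10, 3, 2, 4, 9, 7, 8; Judge 2nd: 3, 5, 8, 4, 7, 10, 2, 1, 6, 9; Judge 3rd: 6, 4, 9, 8, 1, 2, 3, 10, 5, 7). The dictionary layout and function name are our own choices.

```python
from itertools import combinations

def spearman(r1, r2):
    """Spearman's R for two complete rankings without ties."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

judges = {
    "1st": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "2nd": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "3rd": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# R for every pair of judges: (1st, 2nd), (1st, 3rd), (2nd, 3rd)
coeffs = {(a, b): spearman(judges[a], judges[b]) for a, b in combinations(judges, 2)}
for pair, value in coeffs.items():
    print(pair, round(value, 4))

best = max(coeffs, key=coeffs.get)
print("Nearest approach to common tastes:", best)
```

The pairwise values are about −0.212, 0.636 and −0.297, so the largest R again belongs to Judges 1st and 3rd.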
17 From the following data, calculate Spearman’s Rank Coefficient of correlation.
(a) Calculate the rank correlation coefficient for proficiencies of this group in
Mathematics and Statistics.
(b) What does the value of the coefficient obtained indicate?
(c) If you had found out Karl Pearson’s simple coefficient of correlation between the
ranks of these 16 students, would your results have been the same as obtained in
(a) or different?
19 Construct a Scatter diagram from the following data and comment on the correlation.
X 1 4 7 10 13 16 19
Y 1 4 2 9 14 17 19
20 Construct a graph showing the correlation, if any, between the ages of husbands and wives from the data given below.
Husband’s Age 16 17 18 19 20 21 22
Wife’s Age 14 15 16 17 18 19 20
21 From the following data, construct a correlation graph and comment upon the result.
Price    11 12 13 14 15 16 17 18 19 20
Supply   40 39 39 35 34 34 34 31 28 25
Exercises based on Karl Pearson’s Coefficient of correlation.
22 From the following data, compute the coefficient of correlation between X and Y.
X-series Y-series
No. of items 10 10
Arithmetic Mean 65 66
Sum of squares of deviations from Mean 5398 2224
Summation of products of deviations of X and Y-series from their respective means = 2704.
23 A computer, while calculating the correlation coefficient between two variables X and Y, obtained the following constants.
N = 25
It was, however, later discovered at the time of checking that it had copied down two pairs of
observations as:
X Y
X Y 8 10
8 12 6 10
6 8 While the correct values
were
Obtain the correct value of coefficient.
24 Ten students get the following Marks in Statistics and Accountancy.
Marks in Statistics 6 34 96 23 73 80 90 60 63 33
Marks in Accountancy 80 47 87 56 64 58 82 54 31 45
Calculate Rank coefficient of correlation.
25 Find coefficient of correlation by ranking method from the following data
X 14 12 16 18 20 22 24 27 28 25
Y 12 17 8 19 10 7 13 14 5 11
26 Find out coefficient of correlation between X and Y by the method of rank
differences.
X 42 48 35 50 50 57 45 40 50 39
Y 90 110 95 95 95 120 115 128 116 130
27 Ten competitors in a beauty contest are ranked by three Judges in the following order.
1st Judge 1 5 6 10 2 3 4 9 8 7
2nd Judge 3 5 8 7 4 10 2 1 6 9
3rd Judge 6 4 9 8 1 7 5 10 3 2
Use the rank correlation to determine which pair of Judges has the nearest approach to
common tastes in beauty.
ANSWERS