PPP Correlation BIOSTATISTICS

The document discusses correlation between variables and different types of correlation. It defines positive and negative correlation, and explains linear correlation. It also describes using scatter diagrams to understand the correlation between variables and interpret different patterns in scatter diagrams. Finally, it discusses Karl Pearson's coefficient of correlation and how it is calculated.

Uploaded by

aishaheaven

CORRELATION

Correlation: If a change in one variable is associated with a change in another variable, the
two variables are said to be correlated.
If the variables deviate in the same direction, i.e. an increase (decrease) in the value of one
variable is accompanied by a corresponding increase (decrease) in the value of the other
variable, the correlation is said to be positive. For example,
(i) Height and weight of a man
(ii) Income and expenditure
(iii) Population and unemployment
If the variables deviate in opposite directions, i.e. an increase (decrease) in the value of
one variable is accompanied by a corresponding decrease (increase) in the value of the other
variable, the correlation is said to be negative. For example,
(i) Price and demand of a commodity
(ii) Volume and pressure of a perfect gas
Linear Correlation: Correlation is said to be linear if the ratio of the change in the value of
one variable to the corresponding change in the value of the other variable is almost constant.

For example,
X    5    10   20   25   30   40
Y    8    16   31   39   48   63
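
The "almost constant ratio" condition can be checked numerically. The following is a minimal Python sketch using the X and Y values from the example; the function name change_ratios is illustrative and not part of the original notes.

```python
# Values of X and Y from the linear-correlation example.
xs = [5, 10, 20, 25, 30, 40]
ys = [8, 16, 31, 39, 48, 63]

def change_ratios(xs, ys):
    """Return (change in Y) / (change in X) for each pair of consecutive points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

ratios = change_ratios(xs, ys)
print(ratios)  # [1.6, 1.5, 1.6, 1.8, 1.5]
```

The ratios all lie between 1.5 and 1.8, so the relationship is approximately, though not exactly, linear.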

Scatter Diagram: A scatter diagram is one of the simplest ways to get an idea of the
correlation between two variables. We take one variable along the horizontal axis
(generally the independent variable) and the other along the vertical axis (generally the
dependent variable), and then plot one dot for each pair of values of the two variables.
The diagram thus obtained is known as a scatter diagram.
INTERPRETATION FROM SCATTER DIAGRAM
In the scatter diagram
1. If dots are in the shape of a line and line rises from left bottom to the right top (Fig.1),
then correlation is said to be perfect positive.
Fig. 1: Scatter diagram for perfect positive correlation

2. If dots in the scatter diagram are in the shape of a line and the line moves from left top to
right bottom (Fig. 2), then correlation is perfect negative.
Fig. 2: Scatter diagram for perfect negative correlation


3. If dots show some trend and trend is downward from left top to the right bottom
(Fig.4) correlation is said to be negative.

Fig. 4: Scatter diagram for negative correlation

4. If the dots of the scatter diagram do not show any trend (Fig. 5), there is no correlation
between the variables.

Fig. 5: Scatter diagram for uncorrelated data


NOTE: A scatter diagram does not give the exact relationship between two variables; it only
indicates whether they are correlated or not. Some numerical values of Karl Pearson’s
coefficient of correlation (discussed later on) and their interpretation are given in the
following table:

Degree of Correlation                        Numerical Value of r
Perfect Positive Correlation                 +1
Perfect Negative Correlation                 –1
High Degree of Positive Correlation
Moderate Degree of Positive Correlation
Low Degree of Positive Correlation
High Degree of Negative Correlation
Moderate Degree of Negative Correlation
Low Degree of Negative Correlation
No Correlation                               0
KARL PEARSON’S COEFFICIENT OF CORRELATION: (r)
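Karl Pearson’s coefficient of correlation between two variables X and Y is defined as

$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
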
Note: Remember
Types of Correlation
• When both variables are continuous
Ans: Pearson product moment correlation coefficient (r)
• When one variable is continuous and the other is dichotomous (0 & 1)
Ans: Point bi-serial correlation
• When both are dichotomous
Ans: Phi coefficient
• When both variables are ordinal
Ans: Spearman rank correlation
Assumptions when interpreting r:
• Normal distribution for X and Y
• Linear relationship between X and Y
• Homoscedasticity
• Reliability of X and Y
• Validity of X and Y
• Random and representative sampling

How to check the Assumptions

• Normal distribution for X and Y
How to detect violations: plot histograms and examine summary statistics.

• Homoscedasticity
In a scatter plot, the vertical distance between a dot and the regression line reflects the
amount of prediction error (known as the “residual”).

Note: These residuals should not be related to X. If they are, there is some confound in
the study. Residuals should be chance error, not systematic error; systematic means that
they are related to X.

Some Exercises for Practice


1. Construct a graph showing correlation between demand and supply of a certain
commodity.

Year      2000  2001  2002  2003  2004  2005  2006  2007
Demand    5     10    14    20    25    29    35    38
Supply    4     8     12    18    22    25    30    35
2. Find Karl Pearson’s coefficient between X and Y series.
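X    18  20  22  24  26  28  30
Y    30  35  37  33  38  45  41
Solution: Here the mean of X = 168/7 = 24 and the mean of Y = 259/7 = 37.
Let dx = X – 24 and dy = Y – 37.

X      Y      dx    dy    dx²    dy²    dx·dy
18     30     –6    –7    36     49     42
20     35     –4    –2    16     4      8
22     37     –2    0     4      0      0
24     33     0     –4    0      16     0
26     38     2     1     4      1      2
28     45     4     8     16     64     32
30     41     6     4     36     16     24
Total  168    259   0     0      112    150    108
(totals are for X, Y, dx, dy, dx², dy² and dx·dy respectively)

Karl Pearson’s coefficient between X and Y series is given by

$$r = \frac{\sum dx \, dy}{\sqrt{\sum dx^2 \, \sum dy^2}} = \frac{108}{\sqrt{112 \times 150}} \approx 0.833$$
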
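
The calculation in Exercise 2 can be sketched in Python as follows. This is a minimal illustration that takes deviations from the actual means and applies r = Σdx·dy / √(Σdx² · Σdy²); the function name pearson_r is an assumption for illustration, not something defined in the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations of X and Y from their actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Data of Exercise 2.
x = [18, 20, 22, 24, 26, 28, 30]
y = [30, 35, 37, 33, 38, 45, 41]
r = pearson_r(x, y)
print(round(r, 3))  # 0.833
```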
3. From the following data, compute the coefficient of correlation between X and Y.

                                          X-series   Y-series
No. of items                              15         15
Arithmetic mean                           25         18
Sum of squares of deviations from mean    136        138

Summation of products of deviations of X and Y-series from their respective means = 122.
4. (a) Calculate the coefficient of correlation between X and Y-series from the
following data.

                       X-series   Y-series
No. of items           15         15
Arithmetic mean        25         18
Standard deviations    5          5

(b) If the covariance between X and Y-series is +3.5 and the variances of the X and Y-series
are respectively 2.99 and 10.56, find the coefficient of correlation between them.
5. The coefficient of correlation between two variables X and Y is 0.4 and their
covariance is 10. If the variance of X is 9, find the second moment about the mean of the
Y-series.
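
Exercises 4(b) and 5 both rest on the relation r = Cov(X, Y) / (σx · σy). A minimal Python sketch of that relation, and of solving it for the variance of Y, is given below; the function names are illustrative assumptions.

```python
from math import sqrt

def r_from_cov(cov_xy, var_x, var_y):
    """r = Cov(X, Y) / (sd of X * sd of Y)."""
    return cov_xy / sqrt(var_x * var_y)

def var_y_from_r(r, cov_xy, var_x):
    """Solve r = Cov / (sd_x * sd_y) for the variance of Y."""
    sd_y = cov_xy / (r * sqrt(var_x))
    return sd_y ** 2

r4b = r_from_cov(3.5, 2.99, 10.56)   # Exercise 4(b)
var_y5 = var_y_from_r(0.4, 10, 9)    # Exercise 5
print(round(r4b, 2))     # 0.62
print(round(var_y5, 2))  # 69.44
```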
6. Calculate the coefficient of correlation from the following data.

X    50  100  150  200  250  300  350
Y    10  20   30   40   50   60   70
7. Two variates X and Y, when expressed as deviations from their respective means, are
given as follows:
X    –4  –3  –1  –2  0   1   2   3   4
Y    3   –3  ?   0   4   1   2   –2  –1
Find the coefficient of correlation between them.
8. Find Karl Pearson’s coefficient of correlation. Take deviations from the actual means
6 and 8 respectively.
X    6   2   10  4   8
Y    9   11  ?   8   7
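9. Find Karl Pearson’s coefficient of correlation between X and Y-series.
X    20  22  24  26  28  32
Y    30  35  38  45  52  60
Solution: Let U = X – 25 and V = Y – 43.

X      Y      U     V     U²    V²     UV
20     30     –5    –13   25    169    65
22     35     –3    –8    9     64     24
24     38     –1    –5    1     25     5
26     45     1     2     1     4      2
28     52     3     9     9     81     27
32     60     7     17    49    289    119
Total  152    260   2     2     94     632    242
(totals are for X, Y, U, V, U², V² and UV respectively)

Karl Pearson’s coefficient between U and V series, and consequently between X and Y,
is given by

$$r = \frac{N \sum UV - \sum U \sum V}{\sqrt{\left[N \sum U^2 - (\sum U)^2\right]\left[N \sum V^2 - (\sum V)^2\right]}} = \frac{6(242) - (2)(2)}{\sqrt{\left[6(94) - 2^2\right]\left[6(632) - 2^2\right]}} = \frac{1448}{\sqrt{560 \times 3788}} \approx 0.994$$
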
10. Find Karl Pearson’s coefficient of correlation between the values of X and Y given
below.
X 78 89 96 69 59 79 68 61
Y 125 137 156 112 107 136 123 108
Assume 69 and 112 as the mean values for X and Y respectively.
11. Total of the products of deviations of X and Y = 3044
Number of pairs of observations = 10
Total of the deviations of X = –170
Total of the deviations of Y = –20
Total of the squares of deviations of X = 8288
Total of the squares of deviations of Y = 2264

Find the coefficient of correlation when the arbitrary (assumed) means of X and Y are 82 and
68 respectively.
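
When deviations are taken from assumed means rather than actual means, the totals of the deviations are not zero, and the standard shortcut formula r = (NΣdxdy – ΣdxΣdy) / √([NΣdx² – (Σdx)²][NΣdy² – (Σdy)²]) applies. The sketch below is a minimal Python illustration using the totals of Exercise 11; the function name r_from_totals is an assumption for illustration.

```python
from math import sqrt

def r_from_totals(n, sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy):
    """Pearson's r from totals of deviations taken about assumed means."""
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Totals given in Exercise 11.
r11 = r_from_totals(n=10, sum_dx=-170, sum_dy=-20,
                    sum_dx2=8288, sum_dy2=2264, sum_dxdy=3044)
print(round(r11, 2))  # 0.78
```

The assumed means themselves (82 and 68) do not enter the formula, because r is unaffected by a change of origin.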

12. A computer, while calculating the correlation coefficient between two variables X and
Y obtained the following constants.

It was, however, later discovered at the time of checking that it had copied down two
pairs of observations as:

X    Y
6    14
8    6

while the correct values were:

X    Y
8    12
6    8

Obtain the correct value of the coefficient of correlation.
Solution: Corrected value of

Now, corrected value of r is given by

SPEARMAN’S RANK CORRELATION COEFFICIENT

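When the variables under study are qualitative in nature, we cannot find actual numerical
values corresponding to different values of the variables. In such cases, instead of Karl
Pearson’s coefficient of correlation, we find Spearman’s rank correlation coefficient, which
is given below:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where d is the difference between the two ranks of an individual and n is the number of
individuals.

Remark: In case of repeated ranks we add a factor $\frac{m(m^2 - 1)}{12}$ to $\sum d^2$ in the formula for $\rho$ for
each repeated rank, where m is the number of times a value is repeated, i.e.

$$\rho = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \cdots\right]}{n(n^2 - 1)}$$
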
Note: Remember

13. Find the coefficient of correlation by ranking method.


X    10  12  8   15  20  25  40
Y    15  10  6   25  16  12  18
Solution:
In case ranks 1, 2, ... are given to lowest, next lowest, and so on:

X     Y     Rank of X (Rx)   Rank of Y (Ry)   d = Rx – Ry   d²
10    15    2                4                –2            4
12    10    3                2                1             1
8     6     1                1                0             0
15    25    4                7                –3            9
20    16    5                5                0             0
25    12    6                3                3             9
40    18    7                6                1             1
Total                                                       24

In case ranks 1, 2, ... are given to highest, next highest, and so on:

X     Y     Rank of X (Rx)   Rank of Y (Ry)   d = Rx – Ry   d²
10    15    6                4                2             4
12    10    5                6                –1            1
8     6     7                7                0             0
15    25    4                1                3             9
20    16    3                3                0             0
25    12    2                5                –3            9
40    18    1                2                –1            1
Total                                                       24

In both cases the sum of squared rank differences is 24 with n = 7, so

$$\rho = 1 - \frac{6(24)}{7(7^2 - 1)} = 1 - \frac{144}{336} \approx 0.5714$$
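
The ranking method of Exercise 13 can be sketched in Python as below. This is a minimal illustration that assumes no repeated values and uses ρ = 1 – 6Σd² / (n(n² – 1)); the function names and the highest_first parameter are illustrative assumptions.

```python
def ranks(values, highest_first=False):
    """Rank distinct values 1..n (this sketch assumes no repeated values)."""
    ordered = sorted(values, reverse=highest_first)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys, highest_first=False):
    """Spearman's rank correlation for data without ties."""
    n = len(xs)
    rx = ranks(xs, highest_first)
    ry = ranks(ys, highest_first)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Data of Exercise 13.
x = [10, 12, 8, 15, 20, 25, 40]
y = [15, 10, 6, 25, 16, 12, 18]
rho_low = spearman_rho(x, y)                       # rank 1 = lowest value
rho_high = spearman_rho(x, y, highest_first=True)  # rank 1 = highest value
print(round(rho_low, 4), round(rho_high, 4))  # 0.5714 0.5714
```

Both ranking directions give the same coefficient, because reversing both sets of ranks only changes the sign of every difference d.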

14. Calculate the rank coefficient of correlation from the following data.

X    80  78  75  75  68  67  60  59
Y    12  13  14  14  14  16  15  17
Solution:
In case ranks 1, 2, ... are given to highest, next highest, and so on:

X     Y     Rank of X (Rx)   Rank of Y (Ry)   d = Rx – Ry   d²
80    12    1                8                –7            49
78    13    2                7                –5            25
75    14    3.5              5                –1.5          2.25
75    14    3.5              5                –1.5          2.25
68    14    5                5                0             0
67    16    6                2                4             16
60    15    7                3                4             16
59    17    8                1                7             49
Total                                                       159.5

Here, in the X-series, 75 is repeated 2 times, so m = 2, and in the Y-series, 14 is repeated
3 times, so m = 3. Putting these values in the formula for repeated ranks, we get

$$\rho = 1 - \frac{6\left[159.5 + \frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12}\right]}{8(8^2 - 1)} = 1 - \frac{6(162)}{504} \approx -0.9286$$
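
The repeated-rank calculation of Exercise 14 can be sketched in Python as follows. This is a minimal illustration, assuming tied values share the average of the ranks they occupy and that m(m² – 1)/12 is added to Σd² for each value repeated m times; the function names are illustrative assumptions.

```python
from collections import Counter

def average_ranks(values):
    """Rank the highest value 1; repeated values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_ties(xs, ys):
    """Spearman's coefficient with the correction factor for repeated ranks."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

# Data of Exercise 14.
x = [80, 78, 75, 75, 68, 67, 60, 59]
y = [12, 13, 14, 14, 14, 16, 15, 17]
rho = spearman_rho_ties(x, y)
print(round(rho, 4))  # -0.9286
```

Note that this corrected formula is the textbook shortcut; it is not guaranteed to equal Pearson's coefficient computed on the averaged ranks.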

15. Calculate the coefficient of correlation by Spearman’s Ranking Method.

X    13  10  15  16  15  10  8   15
Y    6   8   10  8   4   5   7   3
16. Ten competitors in a beauty contest are ranked by three Judges in the following order:
1st Judge    1   6   5   10  3   2   4   9   7   8
2nd Judge    3   5   8   4   7   10  2   1   6   9
3rd Judge    6   4   9   8   1   2   3   10  5   7
Find which pair of Judges has the nearest approach to common taste in beauty.
Solution: Let R1, R2 and R3 denote the ranks given by the 1st, 2nd and 3rd Judges respectively.

R1    R2    R3    (R1 – R2)²   (R1 – R3)²   (R2 – R3)²
1     3     6     4            25           9
6     5     4     1            4            1
5     8     9     9            16           1
10    4     8     36           4            16
3     7     1     16           4            36
2     10    2     64           0            64
4     2     3     4            1            1
9     1     10    64           1            81
7     6     5     1            4            1
8     9     7     1            1            4
Total             200          60           214

Rank correlation between the 1st and 2nd Judges is given by

$$\rho_{12} = 1 - \frac{6(200)}{10(10^2 - 1)} = 1 - \frac{1200}{990} \approx -0.212$$

Rank correlation between the 1st and 3rd Judges is given by

$$\rho_{13} = 1 - \frac{6(60)}{10(10^2 - 1)} = 1 - \frac{360}{990} \approx 0.636$$

Rank correlation between the 2nd and 3rd Judges is given by

$$\rho_{23} = 1 - \frac{6(214)}{10(10^2 - 1)} = 1 - \frac{1284}{990} \approx -0.297$$

Since $\rho_{13}$ is maximum, we conclude that the 1st and 3rd Judges have the
nearest approach to common tastes in beauty.
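
The pairwise comparison of judges in Exercise 16 can be sketched in Python as below. This is a minimal illustration that applies ρ = 1 – 6Σd² / (n(n² – 1)) to every pair of judges; the dictionary layout and function name are illustrative assumptions.

```python
from itertools import combinations

# Ranks awarded by each judge in Exercise 16.
judges = {
    "1st": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "2nd": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "3rd": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

def rho_from_ranks(r1, r2):
    """Spearman's coefficient computed directly from two sets of ranks (no ties)."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

results = {(a, b): rho_from_ranks(judges[a], judges[b])
           for a, b in combinations(judges, 2)}
best = max(results, key=results.get)

for pair, rho in results.items():
    print(pair, round(rho, 3))
print("closest pair:", best)  # closest pair: ('1st', '3rd')
```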
17. From the following data, calculate Spearman’s Rank Coefficient of correlation.

Serial Number   Rank difference   Serial Number   Rank difference
1               –2                6               0
2               –4                7               ?
3               –1                8               +3
4               +3                9               +3
5               +2                10              –3
18. The ranks of the same 16 students in tests in Mathematics and Statistics were as follows,
the two numbers within brackets denoting the ranks of the same student in
Mathematics and Statistics respectively.
1. (1,1), 5. (5,5), 9. (9,8), 13. (13,1),
2. (2,10) 6. (6,7), 10. (10,1), 14. (14,1),
3. (3,3), 7. (7,2), 11. (11,1), 15. (15,1),
4. (4,4), 8. (8,6), 12. (12,9), 16. (16,1),

(a) Calculate the rank correlation coefficient for proficiencies of this group in
Mathematics and Statistics.
(b) What does the value of the coefficient obtained indicate?
(c) If you had found out Karl Pearson’s simple coefficient of correlation between the
ranks of these 16 students, would your results have been the same as obtained in
(a) or different?
19. Construct a scatter diagram from the following data and comment on the correlation.

X    1   4   7   10  13  16  19
Y    1   4   2   9   14  17  19
20. Construct a graph showing correlation, if any, between the ages of husbands and
wives from the data given below.
Husband’s Age    16  17  18  19  20  21  22
Wife’s Age       14  15  16  17  18  19  20
21. From the following data, construct a correlation graph and comment upon the result.
Price     11  12  13  14  15  16  17  18  19  20
Supply    40  39  39  35  34  34  34  31  28  25
Exercises based on Karl Pearson’s Coefficient of correlation.
22. From the following data, compute the coefficient of correlation between X and Y.

                                           X-series   Y-series
No. of items                               10         10
Arithmetic mean                            65         66
Sum of squares of deviations from mean     5398       2224

Summation of products of deviations of X and Y-series from their respective means =
2704.
23. A computer, while calculating the correlation coefficient between two variables X and
Y, obtained the following constants.

N = 25

It was, however, later discovered at the time of checking that it had copied down two pairs of
observations as:

X    Y
8    12
6    8

while the correct values were:

X    Y
8    10
6    10

Obtain the correct value of the coefficient of correlation.
24. Ten students get the following marks in Statistics and Accountancy.
Marks in Statistics      6   34  96  23  73  80  90  60  63  33
Marks in Accountancy     80  47  87  56  64  58  82  54  31  45
Calculate the rank coefficient of correlation.
25. Find the coefficient of correlation by the ranking method from the following data.
X    14  12  16  18  20  22  24  27  28  25
Y    12  17  8   19  10  7   13  14  5   11
26. Find the coefficient of correlation between X and Y by the method of rank
differences.
X    42  48   35  50  50  57   45   40   50   39
Y    90  110  95  95  95  120  115  128  116  130
27. Ten competitors in a beauty contest are ranked by three Judges in the following order.
1st Judge    1   5   6   10  2   3   4   9   8   7
2nd Judge    3   5   8   7   4   10  2   1   6   9
3rd Judge    6   4   9   8   1   7   5   10  3   2
Use the rank correlation to determine which pair of Judges has the nearest approach to
common tastes in beauty.
ANSWERS

2. 0.833
3. 0.89
4. (a) 0.33 (b) 0.62
5. Variance of Y = 69.44
6. 1
7. –0.06, missing item = –4
8. missing item = 5, r = –0.919
9. 0.994
10. 0.954
11. 0.78
12. 0.667
13. 0.572
14. –0.928
15. 0.018
16. Rank correlations: 1st and 2nd Judges –0.212, 1st and 3rd Judges 0.636, 2nd and 3rd
Judges –0.297. So, the 1st and 3rd Judges have the nearest approach.
17. 0.636
18. (a) 0.8 (b) The coefficient of correlation between Mathematics and Statistics is positive
and highly significant. (c) r = 0.8. Karl Pearson’s coefficient computed on the ranks gives
the same value as the rank coefficient of correlation. However, the two would differ
when there is repetition in ranks.
19. High degree of positive correlation
20. Perfect positive correlation
21. High degree of negative correlation
22. 0.78
23. 0.28
24. 0.454
25. –0.36
26. –0.055
27. Rank correlations: 1st and 2nd Judges 0.115, 1st and 3rd Judges 0.345, 2nd and 3rd
Judges –0.079. So, the 1st and 3rd Judges have the nearest approach.
