Unit 7 Rank Correlation
Structure
7.1 Introduction
Objectives
7.2 Concept of Rank Correlation
7.3 Derivation of Rank Correlation Coefficient Formula
7.4 Tied or Repeated Ranks
7.5 Concurrent Deviation
7.6 Summary
7.7 Solutions / Answers
7.1 INTRODUCTION
In the second unit of this block, we discussed correlation and its properties,
along with the calculation of the correlation coefficient. The correlation
coefficient, or product moment correlation coefficient, assumes that both
characteristics are measurable. Sometimes characteristics are not measurable,
but ranks may be given to individuals according to their qualities. In such
situations, rank correlation is used to measure the association between the two
characteristics. In this unit, we will discuss rank correlation and the
calculation of the rank correlation coefficient, with its merits and demerits.
We will also study the method of concurrent deviation.
Section 7.2 introduces the concept of rank correlation, while Section 7.3
gives the derivation of Spearman’s rank correlation coefficient formula. Merits
and demerits of the rank correlation coefficient are discussed in Sub-section
7.3.1. Two items may sometimes receive the same rank. This situation is
called tied or repeated ranks and is described in Section 7.4. You will learn the
method of concurrent deviation in Section 7.5.
Objectives
After reading this unit, you should be able to
• explain the concept of rank correlation;
• derive Spearman’s rank correlation coefficient formula;
• describe the merits and demerits of rank correlation coefficient;
• calculate the rank correlation coefficient in case of tied or repeated ranks;
and
• describe the method of concurrent deviation.
7.2 CONCEPT OF RANK CORRELATION
Sometimes individuals or items cannot be measured on a characteristic but can
be arranged in order of their merits. This type of situation occurs when we
deal with a qualitative study of attributes such as honesty, beauty, voice, etc.
For example, contestants in a singing competition may be ranked by a judge
according to their performance. In another example, students may be ranked in
different subjects according to their performance in tests.
Arrangement of individuals or items in order of merit or proficiency in the
possession of a certain characteristic is called ranking and the number
indicating the position of individuals or items is known as rank.
If ranks of individuals or items are available for two characteristics then
correlation between ranks of these two characteristics is known as rank
correlation.
With the help of rank correlation, we find the association between two
qualitative characteristics. Karl Pearson’s correlation coefficient gives the
intensity of the linear relationship between two variables, whereas Spearman’s
rank correlation coefficient measures the strength of association between two
ranked variables. The derivation of Spearman’s rank correlation coefficient
formula is discussed in the following section.
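7.3 DERIVATION OF RANK CORRELATION COEFFICIENT FORMULA
Let $x_i$ and $y_i$ $(i = 1, 2, ..., n)$ be the ranks of the $i$-th individual in
the two characteristics. When no ranks are tied, each of x and y takes the values
1, 2, ..., n, so that
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + ... + x_n}{n} = \frac{n+1}{2} = \bar{y}$$
The variance of the ranks of x is
$$\begin{aligned}
\sigma_x^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(x_i^2 + \bar{x}^2 - 2\bar{x}x_i\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n}\bar{x}^2 - 2\bar{x}\sum_{i=1}^{n} x_i\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 + n\bar{x}^2 - 2n\bar{x}^2\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)
\end{aligned}$$
$$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{1}{n}\left(x_1^2 + x_2^2 + ... + x_n^2\right) - \bar{x}^2 \qquad \ldots (2)$$
Since the ranks are 1, 2, ..., n, we have $\sum x_i^2 = \frac{n(n+1)(2n+1)}{6}$ and
$\bar{x} = \frac{n+1}{2}$. Substituting these in equation (2),
$$\begin{aligned}
\sigma_x^2 &= \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} \\
&= (n+1)\left[\frac{2n+1}{6} - \frac{n+1}{4}\right] \\
&= (n+1)\left[\frac{2(2n+1) - 3(n+1)}{12}\right] \\
&= (n+1)\left[\frac{4n + 2 - 3n - 3}{12}\right] \\
&= (n+1)\left[\frac{n-1}{12}\right] \\
&= \frac{n^2-1}{12}
\end{aligned}$$
In the same way, $\sigma_y^2 = \frac{n^2-1}{12}$.
Let $d_i$ be the difference of the ranks of the $i$-th individual:
$$d_i = x_i - y_i$$
$$d_i = x_i - y_i - \bar{x} + \bar{y} \qquad (\text{since } \bar{x} = \bar{y})$$
$$d_i = (x_i - \bar{x}) - (y_i - \bar{y})$$
Squaring and summing over i from 1 to n,
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 - 2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \qquad \ldots (3)$$
Dividing equation (3) by n,
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \sigma_x^2 + \sigma_y^2 - 2\,\mathrm{Cov}(x, y) \qquad \ldots (4)$$
We know that $r = \frac{\mathrm{Cov}(x, y)}{\sigma_x\sigma_y}$, which implies that
$\mathrm{Cov}(x, y) = r\sigma_x\sigma_y$.
Substituting $\mathrm{Cov}(x, y) = r\sigma_x\sigma_y$ in equation (4), we have
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y$$
Since $\sigma_x^2 = \sigma_y^2$, then
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} d_i^2 &= \sigma_x^2 + \sigma_x^2 - 2r\sigma_x\sigma_x \\
&= 2\sigma_x^2 - 2r\sigma_x^2 \\
&= 2\sigma_x^2(1 - r)
\end{aligned}$$
$$\frac{1}{2n\sigma_x^2}\sum_{i=1}^{n} d_i^2 = (1 - r)$$
$$r = 1 - \frac{\sum_{i=1}^{n} d_i^2}{2n\sigma_x^2}$$
$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \qquad \left(\text{since } \sigma_x^2 = \frac{n^2-1}{12}\right)$$
Denoting the rank correlation coefficient by $r_s$, we have
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \qquad \ldots (5)$$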
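As a quick numerical check of the derivation above, the following Python sketch confirms, for one small value of n, that the variance of untied ranks equals (n² − 1)/12 and that the rank-difference formula agrees with the product moment correlation coefficient computed on the ranks. The function names `pearson` and `spearman_untied` and the choice n = 5 are illustrative assumptions and are not part of the unit.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Product moment correlation coefficient, using divisor n."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


def spearman_untied(rank_x, rank_y):
    """Rank-difference formula; valid only when no ranks are tied."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


n = 5  # assumed small sample size for the check
ranks = list(range(1, n + 1))

# Variance of the ranks 1, 2, ..., n should equal (n^2 - 1) / 12.
variance = sum((r - (n + 1) / 2) ** 2 for r in ranks) / n
print(variance, (n ** 2 - 1) / 12)

# The two coefficients should agree for every possible ranking of y.
agree = all(
    math.isclose(pearson(ranks, perm), spearman_untied(ranks, perm), abs_tol=1e-12)
    for perm in permutations(ranks)
)
print("formulas agree for all rankings:", agree)
```

The check covers only one value of n, so it illustrates the result rather than proving it.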
Spearman’s rank correlation coefficient formula is
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$
Let us denote the rank of students in Statistics by $R_x$ and the rank in
Mathematics by $R_y$. For the calculation of the rank correlation coefficient we
have to find $\sum_{i=1}^{n} d_i^2$, which is obtained through the following table:
Rank in Statistics   Rank in Mathematics   Difference of Ranks    d_i²
      (R_x)                (R_y)            (d_i = R_x − R_y)
        1                    2                     −1               1
        2                    4                     −2               4
        3                    1                      2               4
        4                    5                     −1               1
        5                    3                      2               4
        6                    8                     −2               4
        7                    7                      0               0
        8                    6                      2               4
                                                          Σ d_i² = 22
Here n = 8, so
$$r_s = 1 - \frac{6 \times 22}{8(64-1)} = 1 - \frac{132}{504} = \frac{372}{504} = 0.74$$
Thus there is a positive association between ranks of Statistics and
Mathematics.
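The calculation in this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `spearman_rank_correlation` is an illustrative assumption, and the formula applies only when no ranks are tied.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's coefficient from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks of the 8 students taken from the table above.
rank_statistics = [1, 2, 3, 4, 5, 6, 7, 8]
rank_mathematics = [2, 4, 1, 5, 3, 8, 7, 6]

r_s = spearman_rank_correlation(rank_statistics, rank_mathematics)
print(round(r_s, 2))  # 0.74
```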
Example 2: Suppose we have ranks of 5 students in three subjects Computer,
Physics and Statistics and we want to test which two subjects have the same
trend.
Rank in Computer 2 4 5 1 3
Rank in Physics 5 1 2 3 4
Rank in Statistics 2 3 5 4 1
Solution: In this problem, we want to see which two subjects have the same
trend, i.e. which two subjects have a positive rank correlation coefficient.
Here we have to calculate three rank correlation coefficients:
$r_{12s}$ = Rank correlation coefficient between the ranks of Computer and Physics
$r_{23s}$ = Rank correlation coefficient between the ranks of Physics and Statistics
$r_{13s}$ = Rank correlation coefficient between the ranks of Computer and Statistics
From the given ranks, the sums of squared rank differences are
$$\sum d_{12}^2 = 32, \quad \sum d_{23}^2 = 32 \quad \text{and} \quad \sum d_{13}^2 = 14.$$
Now
$$r_{12s} = 1 - \frac{6\sum d_{12}^2}{n(n^2-1)} = 1 - \frac{6 \times 32}{5 \times 24} = 1 - \frac{8}{5} = -\frac{3}{5} = -0.6$$
$$r_{23s} = 1 - \frac{6\sum d_{23}^2}{n(n^2-1)} = 1 - \frac{6 \times 32}{5 \times 24} = 1 - \frac{8}{5} = -\frac{3}{5} = -0.6$$
$$r_{13s} = 1 - \frac{6\sum d_{13}^2}{n(n^2-1)} = 1 - \frac{6 \times 14}{5 \times 24} = 1 - \frac{7}{10} = \frac{3}{10} = 0.3$$
$r_{12s}$ is negative, which indicates that Computer and Physics have opposite
trends. Similarly, the negative rank correlation coefficient $r_{23s}$ shows
opposite trends in Physics and Statistics. $r_{13s} = 0.3$ indicates that Computer
and Statistics have the same trend.
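The three pairwise coefficients of Example 2 can also be obtained in one pass. The sketch below is illustrative only: the dictionary layout and the names `spearman_rank_correlation`, `results` and `same_trend` are assumptions made for this example.

```python
from itertools import combinations


def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's coefficient from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks of the 5 students in each subject, as given in Example 2.
ranks = {
    "Computer": [2, 4, 5, 1, 3],
    "Physics": [5, 1, 2, 3, 4],
    "Statistics": [2, 3, 5, 4, 1],
}

# One coefficient for every pair of subjects.
results = {}
for first, second in combinations(ranks, 2):
    results[(first, second)] = spearman_rank_correlation(ranks[first], ranks[second])
    print(first, second, round(results[(first, second)], 2))

# Pairs with a positive coefficient are the ones with the same trend.
same_trend = [pair for pair, r in results.items() if r > 0]
print(same_trend)
```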
Sometimes ranks are not given but the actual values of the variables are
available. If we are interested in the rank correlation coefficient, we first
find the ranks from the given values. The following example illustrates this case.
Example 3: Calculate rank correlation coefficient from the following data:
x 78 89 97 69 59 79 68
y 125 137 156 112 107 136 124
$$\sum_{i=1}^{n} d_i^2 = 2$$
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$
$$r_s = 1 - \frac{6 \times 2}{7(49-1)} = 1 - \frac{12}{7 \times 48}$$
$$= 1 - \frac{1}{28} = \frac{27}{28} = 0.96$$
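The step of finding ranks from actual values in Example 3 can be sketched in Python as follows. The sketch assumes that rank 1 is given to the largest value and that no value is repeated; the helper names `ranks_descending` and `spearman_rank_correlation` are illustrative assumptions.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's coefficient from two lists of untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Data of Example 3.
x = [78, 89, 97, 69, 59, 79, 68]
y = [125, 137, 156, 112, 107, 136, 124]

rank_x = ranks_descending(x)
rank_y = ranks_descending(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = spearman_rank_correlation(rank_x, rank_y)

print(rank_x)  # [4, 2, 1, 5, 7, 3, 6]
print(rank_y)  # [4, 2, 1, 6, 7, 3, 5]
print(sum_d2, round(r_s, 2))  # 2 0.96
```

Ranking in ascending order instead would give the same coefficient, provided the same direction is used for both variables.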
3. If we want to see the association between qualitative characteristics, rank
correlation coefficient is the only formula;
4. Rank correlation coefficient is the non-parametric version of the Karl
Pearson’s product moment correlation coefficient; and
5. It does not require the assumption of the normality of the population from
which the sample observations are taken.
Demerits of Rank Correlation Coefficient
1. Product moment correlation coefficient can be calculated for bivariate
frequency distribution but rank correlation coefficient cannot be
calculated; and
2. If n >30, this formula is time consuming.
Series 50 70 80 80 85 90
In the above example 80 was repeated twice. It may also happen that two or
more values are repeated twice or more than that.
For example, in the following series there is a repetition of 80 and 110.
Observe the values, assign ranks and check them against the following.
When there is a repetition of ranks, a correction factor $\frac{m(m^2-1)}{12}$ is
added to $\sum d^2$ in the Spearman’s rank correlation coefficient formula, where
m is the number of times a rank is repeated. It is very important to know that
this correction factor is added for every repetition of rank in both characters.
In the first example the correction factor is added once, which is
2(4 − 1)/12 = 0.5, while in the second example the correction factors are
2(4 − 1)/12 = 0.5 and …
In the case of tied or repeated ranks the formula is
$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2-1)}{12} + \dots\right\}}{n(n^2-1)}$$
Here rank 6 is repeated three times in rank of x and rank 2.5 is repeated twice
in rank of y, so the correction factor is
$$\frac{3(3^2-1)}{12} + \frac{2(2^2-1)}{12}$$
Hence rank correlation coefficient is
$$r_s = 1 - \frac{6\left\{83.50 + \frac{3(3^2-1)}{12} + \frac{2(2^2-1)}{12}\right\}}{8(64-1)}$$
$$r_s = 1 - \frac{6\left\{83.50 + \frac{3 \times 8}{12} + \frac{2 \times 3}{12}\right\}}{8 \times 63}$$
$$r_s = 1 - \frac{6(83.50 + 2.50)}{504}$$
$$r_s = 1 - \frac{516}{504}$$
$$r_s = 1 - 1.024 = -0.024$$
There is a negative association between expenditure on advertisement and
profit.
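The effect of the correction factors can be checked with a short Python sketch that starts from the sum of squared rank differences and the sizes of the tied groups, as used in the calculation above. The function name `spearman_tied_from_sum` and its parameters are illustrative assumptions.

```python
def spearman_tied_from_sum(sum_d2, n, tie_sizes):
    """Spearman's coefficient with the m(m^2 - 1)/12 correction for ties.

    sum_d2    -- sum of squared rank differences
    n         -- number of pairs of observations
    tie_sizes -- one entry m per repeated rank, counted over both series
    """
    correction = sum(m * (m ** 2 - 1) / 12 for m in tie_sizes)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Values from the worked example: rank 6 repeated 3 times in x,
# rank 2.5 repeated twice in y, sum of d^2 = 83.50 and n = 8.
r_s = spearman_tied_from_sum(83.50, 8, [3, 2])
print(round(r_s, 3))  # -0.024
```

Passing an empty list for `tie_sizes` reduces the function to the formula for untied ranks.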
Now, let us solve the following exercises.
E2) Calculate rank correlation coefficient from the following data:
x 10 20 30 30 40 45 50
y 15 20 25 30 40 40 40
third value. If the third value is greater than the second value, a ‘+’ sign is
assigned. If the second and third values are equal, an ‘=’ sign is assigned.
4. This procedure is repeated up to the last value of the series.
5. Similarly, we obtain column $D_y$ for series y.
$$r_c = \pm\sqrt{\pm\frac{(2c-k)}{k}}$$
$$r_c = \pm\sqrt{\pm\frac{2 \times 9 - 10}{10}} = +\sqrt{+\frac{8}{10}}$$
(Both signs are + because 2c − k is positive)
$$= \sqrt{0.8} = 0.89$$
Thus correlation is positive.
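The method of concurrent deviation can be sketched in Python as below, with c the number of concurrent deviations and k the number of pairs of deviations. The function names are illustrative assumptions. Pairs in which either sign is ‘=’ are not counted as concurrent here, which is also an assumption of this sketch.

```python
import math


def deviation_signs(series):
    """Sign of the change from each value to the next: '+', '-' or '='."""
    signs = []
    for previous, current in zip(series, series[1:]):
        if current > previous:
            signs.append("+")
        elif current < previous:
            signs.append("-")
        else:
            signs.append("=")
    return signs


def count_concurrent(x, y):
    """Return (c, k): concurrent deviations and number of pairs of deviations."""
    d_x, d_y = deviation_signs(x), deviation_signs(y)
    c = sum(1 for a, b in zip(d_x, d_y) if a == b and a != "=")
    return c, len(d_x)


def concurrent_deviation_coefficient(c, k):
    """r_c = +/- sqrt(+/- (2c - k) / k), both signs taken from 2c - k."""
    inner = (2 * c - k) / k
    sign = 1 if inner >= 0 else -1
    return sign * math.sqrt(abs(inner))


# Worked example in the text: c = 9 and k = 10.
print(round(concurrent_deviation_coefficient(9, 10), 2))  # 0.89

# Supply (x) and price (y) series from exercise E4.
supply = [114, 127, 128, 121, 120, 124]
price = [108, 104, 105, 106, 100, 99]
c, k = count_concurrent(supply, price)
print(c, k, round(concurrent_deviation_coefficient(c, k), 2))
```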
Now, let us solve the following exercises.
7.6 SUMMARY
In this unit, we have discussed:
1. The rank correlation which is used to see the association between two
qualitative characteristics;
2. Derivation of the Spearman’s rank correlation coefficient formula;
3. Calculation of rank correlation coefficient in different situations- (i) when
values of variables are given, (ii) when ranks of individuals in different
characteristics are given and (iii) when repeated ranks are given;
4. Properties of rank correlation coefficient; and
5. Concurrent deviation which provides the direction of correlation.
7.7 SOLUTIONS / ANSWERS
E1)
$$\sum_{i=1}^{n} d_i^2 = 26$$
$$r_s = 1 - \frac{6 \times 26}{6(36-1)}$$
$$= 1 - \frac{26}{35} = \frac{9}{35} = 0.26$$
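E2)
$$\sum d^2 = 2.5$$
$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2-1)}{12} + \dots\right\}}{n(n^2-1)}$$
$$r_s = 1 - \frac{6\left\{2.5 + \frac{2 \times 3}{12} + \frac{3 \times 8}{12}\right\}}{7 \times 48}$$
$$r_s = 1 - \frac{6(2.5 + 2.5)}{336}$$
$$r_s = 1 - \frac{30}{336} = \frac{306}{336}$$
$$r_s = 0.91$$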
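The whole procedure for tied ranks, from raw values to the corrected coefficient, can be sketched in Python using the data of exercise E2. The sketch assumes that rank 1 goes to the largest value and that tied values share the average of the ranks they would otherwise occupy. The helper names are illustrative assumptions.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 for the largest value; tied values share their average rank."""
    positions = {}
    for position, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(x, y):
    """Spearman's coefficient with correction factors for repeated ranks."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    corrected = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * corrected / (n * (n ** 2 - 1))


# Data of exercise E2.
x = [10, 20, 30, 30, 40, 45, 50]
y = [15, 20, 25, 30, 40, 40, 40]

print(average_ranks(x))  # [7.0, 6.0, 4.5, 4.5, 3.0, 2.0, 1.0]
print(average_ranks(y))  # [7.0, 6.0, 5.0, 4.0, 2.0, 2.0, 2.0]
print(round(spearman_with_ties(x, y), 2))  # 0.91
```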
E3) We have some calculations in the following table:
$$\sum d^2 = 97.5$$
$$r_s = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2-1)}{12} + \dots\right\}}{n(n^2-1)}$$
Here, ranks 4 and 6.5 are repeated thrice and twice respectively in the ranks of
x, and rank 2 is repeated thrice in the ranks of y, so the correction factor is
$$\frac{3(3^2-1)}{12} + \frac{2(2^2-1)}{12} + \frac{3(3^2-1)}{12}$$
and therefore, the rank correlation coefficient is
$$r_s = 1 - \frac{6\left\{97.5 + \frac{3(3^2-1)}{12} + \frac{2(2^2-1)}{12} + \frac{3(3^2-1)}{12}\right\}}{7(49-1)}$$
$$r_s = 1 - \frac{6(97.5 + 4.5)}{7 \times 48}$$
$$r_s = 1 - \frac{6(102)}{336}$$
$$r_s = 1 - \frac{102}{56} = -0.82$$
E4) Coefficient of concurrent deviation is given by
$$r_c = \pm\sqrt{\pm\frac{(2c-k)}{k}}$$
Let us denote the supply by x and price by y and we calculate c by the
following table:
  x    Change of direction    y    Change of direction   DxDy
       sign for x (Dx)             sign for y (Dy)
 114                         108
 127          +              104          −                −
 128          +              105          +                +
 121          −              106          +                −
 120          −              100          −                +
 124          +               99          −                −
                                                         c = 2
With c = 2 and k = 5,
$$r_c = \pm\sqrt{\pm\frac{2 \times 2 - 5}{5}}$$