11 Correlation
There is a relationship between the heights of fathers and their adult sons
Correlation between age and hearing capacity
Relationships and Scatter plots
Direction of relationship
Correlations can be classified into three
basic categories: Positive, Negative and
Neutral.
Positive correlation: both move in the
same direction
Negative correlation: both move in the
opposite direction
No relationship: move independently
Some common examples
Example 1: Direct relation
Var A: Income        3 6 9 12 15
Var B: Expenditure   1 2 3 4 5
Example 2 : Inverse relation
Variable A: Income of family in 1000 Rs
Variable B : Number of children in the family
Var A: Income 3 6 9 12 15
Var B: Children 5 4 3 2 1
The two basic questions ?
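r = (co-variability of X and Y) / (variability of X and Y separately)

\[ r = \frac{\operatorname{Cov}(X, Y)}{SD(X)\, SD(Y)} \]

\[ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \]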
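As a quick check of the formula for r, the sketch below evaluates it from raw sums (N, ΣX, ΣY, ΣXY, ΣX², ΣY²) for the income/expenditure and income/children data shown earlier. The function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from raw sums:
    (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Data copied from the two examples in the slides
income = [3, 6, 9, 12, 15]
expenditure = [1, 2, 3, 4, 5]
children = [5, 4, 3, 2, 1]

r_direct = pearson_r(income, expenditure)   # perfect direct relation
r_inverse = pearson_r(income, children)     # perfect inverse relation
print(r_direct, r_inverse)                  # 1.0 -1.0
```

Both data sets lie exactly on a straight line, which is why r reaches its limits of +1 and -1 here.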
Meaning of Correlation
\[ \rho = 1 - \frac{6\sum d_i^2}{n^3 - n} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \]
where
di = difference between the two ranks of the i-th individual
n = number of pairs of observations
Example 4: Rank Correlation Computation
Individual No.   Beauty rank (X)   Honesty rank (Y)   di = X - Y   di²
01               01                01                  0            0
02               10                02                  8           64
03               03                03                  0            0
04               04                04                  0            0
05               05                05                  0            0
06               07                06                  1            1
07               02                07                 -5           25
08               06                08                 -2            4
09               08                09                 -1            1
10               11                10                  1            1
11               15                11                  4           16
12               09                12                 -3            9
13               14                13                  1            1
14               12                14                 -2            4
15               16                15                  1            1
16               13                16                 -3            9
Total                                                  0          136
Rank Correlation: Computation (continued)

\[ \rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \]

\[ \rho = 1 - \frac{6 \times 136}{16\,(16^2 - 1)} = 1 - \frac{816}{4080} = 0.8 \]
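The computation for Example 4 can be reproduced with a few lines of Python. This is a minimal sketch that assumes the ranks are already assigned and untied; the function name `spearman_rho` is an illustrative choice.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks copied from the Example 4 table (individuals 01 to 16)
beauty = [1, 10, 3, 4, 5, 7, 2, 6, 8, 11, 15, 9, 14, 12, 16, 13]
honesty = list(range(1, 17))

sum_d2 = sum((a - b) ** 2 for a, b in zip(beauty, honesty))
rho = spearman_rho(beauty, honesty)
print(sum_d2, round(rho, 3))   # 136 0.8
```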
When ranks are repeated:
If two or more individuals have the same value, a common rank is
assigned to the repeated items. This common rank is the average of
the ranks they would have received if there were no repetition.
For example, take the series 50, 70, 80, 80, 85, 90. Rank 1 is
assigned to 90 because it is the largest value, and rank 2 to 85.
The value 80 then occurs twice. Without repetition these two items
would have received ranks 3 and 4, so both are assigned the
average, (3 + 4)/2 = 3.5.
Series 50 70 80 80 85 90
Ranks 6 5 3.5 3.5 2 1
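The averaging rule for repeated values can be sketched as a small helper. The name `average_ranks` is illustrative, and the sketch assumes, as in the slides, that the largest value receives rank 1.

```python
def average_ranks(values):
    """Rank values in descending order (largest = rank 1).
    Tied values share the mean of the ranks they would otherwise occupy."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1         # first 1-based position occupied by v
        count = ordered.count(v)             # how many times v is repeated
        ranks[v] = first + (count - 1) / 2   # mean of first .. first + count - 1
    return [ranks[v] for v in values]

series = [50, 70, 80, 80, 85, 90]
print(average_ranks(series))   # [6.0, 5.0, 3.5, 3.5, 2.0, 1.0]
```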
Formula for repeated rank correlation
Expenditure on advertisement   10 15 14 25 14 14 20 22
Profit                          6 25 12 18 25 40 10  7
Solution: Let us denote the expenditure on
advertisement by x and profit by y
x    Rank of x (Rx)   y    Rank of y (Ry)   d = Rx - Ry   d²
10   8                6    8                 0             0
15   4                25   2.5               1.5           2.25
14   6                12   5                 1             1
25   1                18   4                -3             9
14   6                25   2.5               3.5          12.25
14   6                40   1                 5            25
20   3                10   6                -3             9
22   2                7    7                -5            25
                                                   Σd² =  83.50
\[ \rho = 1 - \frac{6\left[\sum d_i^2 + \dfrac{m(m^2 - 1)}{12} + \dots\right]}{n(n^2 - 1)} \]
Here rank 6 is repeated three times in the ranks of x (m = 3) and rank 2.5 is
repeated twice in the ranks of y (m = 2), so there is one correction term for each.
Hence the rank correlation coefficient is
\[ \rho = 1 - \frac{6\left[83.50 + \dfrac{3(3^2 - 1)}{12} + \dfrac{2(2^2 - 1)}{12}\right]}{8\,(64 - 1)} = 1 - \frac{6 \times 86}{504} = -0.024 \]
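The worked example with repeated ranks can be checked with the sketch below. It assumes the correction used in the slides, one term m(m² - 1)/12 for every group of m tied values in either variable; the helper names `average_ranks` and `spearman_rho_tied` are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Descending ranks (largest = 1); tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1
        count = ordered.count(v)
        ranks[v] = first + (count - 1) / 2
    return [ranks[v] for v in values]

def spearman_rho_tied(x, y):
    """Rank correlation with the m(m^2 - 1)/12 correction for repeated ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # One correction term per group of m tied values, in x and in y
    correction = sum(
        m * (m ** 2 - 1) / 12
        for counts in (Counter(x), Counter(y))
        for m in counts.values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Data copied from the advertisement/profit example
expenditure = [10, 15, 14, 25, 14, 14, 20, 22]
profit = [6, 25, 12, 18, 25, 40, 10, 7]

rho = spearman_rho_tied(expenditure, profit)
print(round(rho, 3))   # -0.024
```

A value this close to zero indicates practically no rank association between advertising expenditure and profit in this data set.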
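• Check: Σdi = 0 always, because both sets of ranks have the same total
• ρ is interpreted in the same way as r and lies between -1 and +1
• It cannot be computed for a bivariate frequency distribution
(grouped data)
• It is simple and easy to understand
• It is tedious to compute for larger data sets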