
11 Correlation

Research methodology class

Correlation

Correlation: The History

Sir Francis Galton (1822-1911)

An eminent anthropologist who collected data on the stature of parents and their adult offspring.

Galton, F. "Co-relations and their measurement, chiefly from anthropometric data." Proceedings of the Royal Society, 1888.
Galton’s empirical correlation

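Scatter plot of the heights of 1078 fathers and their sons, showing a positive association between their heights.

[Figure: scatter plot of father and son heights, with the labels "Father" and "73 inch" and the reference line Y = X]

There is a relationship between the heights of fathers and their adult sons.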
Correlation between Age and Hearing Capacity
Relationships and Scatter plots
Direction of relationship
Correlations can be classified into three basic categories: positive, negative and neutral.
Positive correlation: both variables move in the same direction
Negative correlation: the variables move in opposite directions
No correlation (neutral): the variables move independently of each other
Some common examples

1. Relation between price and demand of a commodity


2. Relation between births and infant deaths in a community
3. Relation between pressure and volume of a gas
4. Relation between income and expenditure
Spurious correlation:
5. Relation between size of shoes and income
6. Relation between grip strength and vision
Example 1: Direct relation
Variable A: Income of family (in Rs 1000s)
Variable B: Expenditure of family

This is a perfect positive correlation: as one variable increases, the other increases in exactly the same proportion.

Variable             Pairs of observations
Var A: Income        3   6   9   12  15
Var B: Expenditure   1   2   3   4   5
Example 2 : Inverse relation
Variable A: Income of family (in Rs 1000s)
Variable B: Number of children in the family

This is a perfect negative correlation: as one variable increases, the other decreases in exactly the same proportion.

Variable             Pairs of observations
Var A: Income        3   6   9   12  15
Var B: Children      5   4   3   2   1
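Both examples can be checked numerically with Pearson's coefficient r, which these slides define further on. The following is a minimal sketch; the function name pearson_r and the variable names are illustrative and do not come from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the two means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

income = [3, 6, 9, 12, 15]       # Var A in both examples
expenditure = [1, 2, 3, 4, 5]    # Var B in Example 1
children = [5, 4, 3, 2, 1]       # Var B in Example 2

r_direct = pearson_r(income, expenditure)
r_inverse = pearson_r(income, children)
print(r_direct, r_inverse)  # prints 1.0 -1.0
```

Because every point lies exactly on a straight line in each example, r reaches its extreme values of +1 and -1.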
The two basic questions

1. How strong is the apparent relationship? (Degree of relationship)

2. Can a simple rule be given to express the relationship? (Nature of relationship)
Definition
Correlation is a statistical technique that is used to measure the relationship between two variables.

Correlation requires two scores from each individual (one score from each of the two variables).
Karl Pearson’s Correlation
• Used to describe the linear relationship
between two variables that are both measured
on an interval or ratio scale
• The symbol for Pearson’s correlation
coefficient is r. This is also called product
moment correlation coefficient
• The underlying principle of r is that it
compares how consistently each Y value is
paired with each X value in a linear fashion
Karl Pearson’s Correlation...

$$r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}} = \frac{\text{co-variability of } X \text{ and } Y}{\text{variability of } X \text{ and } Y \text{ separately}}$$

$$r = \frac{\operatorname{Cov}(X, Y)}{SD(X)\,SD(Y)} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\{N\sum X^2 - (\sum X)^2\}\{N\sum Y^2 - (\sum Y)^2\}}}$$
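The covariance form and the raw-sums form of r are algebraically the same quantity. A small sketch on hypothetical data (the five x, y pairs below are made up for illustration, as are the function names) shows the two routes agreeing:

```python
import math

def r_from_raw_sums(x, y):
    """r via N, sum(X), sum(Y), sum(XY), sum(X^2), sum(Y^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    )

def r_from_covariance(x, y):
    """r via Cov(X, Y) / (SD(X) * SD(Y)), using divisor N throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_raw = r_from_raw_sums(x, y)
r_cov = r_from_covariance(x, y)
print(round(r_raw, 4), round(r_cov, 4))
```

The raw-sums form is convenient for hand calculation from a table of totals; the covariance form makes the meaning of r more visible.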
Meaning of Correlation

1. r measures the degree of linear relationship
2. r lies between +1 and -1
3. If r = +1: perfect positive correlation
4. If r = -1: perfect negative correlation
5. If r = 0: no linear relationship (the variables are uncorrelated, which does not by itself mean they are independent)
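A value of r = 0 rules out only a linear relationship. The sketch below uses hypothetical data (not from the slides) in which Y is completely determined by X through Y = X², yet r comes out as 0:

```python
import math

def pearson_r(x, y):
    """Pearson's r from the raw-sums formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    )

# Hypothetical data: y depends on x exactly, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]   # [4, 1, 0, 1, 4]

r = pearson_r(x, y)
print(r)  # prints 0.0
```

The positive and negative halves of the curve cancel in the sum of XY, which is why a scatter plot should accompany any reported r.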
Correlation Coefficient: Values
r = -1.0: perfect negative correlation
r = 0: no correlation
r = +1.0: perfect positive correlation

Scale: -1.0   -0.5   0   +0.5   +1.0

From 0 toward -1.0 the degree of negative correlation increases; from 0 toward +1.0 the degree of positive correlation increases.
Correlation and causality
• The fact that there is a relationship between two variables does not mean that changes in one variable will always cause changes in the other variable.
• A statistical relationship may exist even though one variable does not cause or influence the other.
• Correlational research cannot be used to infer causal relationships between two variables.
Example 3: Correlation coefficient computation

Height in inches

Sl       Father (X)   Son (Y)    X²            Y²            XY
1        65           67         4225          4489          4355
2        66           68         4356          4624          4488
3        67           65         4489          4225          4355
4        67           68         4489          4624          4556
5        68           72         4624          5184          4896
6        69           72         4761          5184          4968
7        70           69         4900          4761          4830
8        72           71         5184          5041          5112
Totals   ΣX = 544     ΣY = 552   ΣX² = 37028   ΣY² = 38132   ΣXY = 37560

$$r = \frac{8 \times 37560 - 544 \times 552}{\sqrt{\{8 \times 37028 - 544^2\}\{8 \times 38132 - 552^2\}}} = \frac{192}{\sqrt{288 \times 352}} \approx 0.603$$
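The column totals and the final value of r in Example 3 can be recomputed from the eight father and son heights. This is only a checking sketch; the variable names are illustrative.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]   # X, heights in inches
sons = [67, 68, 65, 68, 72, 72, 69, 71]      # Y, heights in inches

n = len(fathers)
sum_x = sum(fathers)
sum_y = sum(sons)
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))

# Raw-sums formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 544 552 37028 38132 37560
print(round(r, 3))                           # 0.603
```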
Qualitative data:
Spearman’s Rank correlation coefficient
• Observations on qualitative characteristics may not be normally distributed, e.g.
  • Relationship of Health-Intelligence
  • Relationship of Beauty-Honesty
• It is a distribution-free statistic
• It is a non-parametric approach to correlation
The rank correlation

$$\rho = 1 - \frac{6\sum d_i^2}{n^3 - n} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

Where
d_i = difference between the ranks of the two attributes
n = number of pairs of observations
Example 4: Rank Correlation Computation

Individual No   Beauty rank (X)   Honesty rank (Y)   d_i = X - Y   d_i²
01              01                01                  0             0
02              10                02                  8             64
03              03                03                  0             0
04              04                04                  0             0
05              05                05                  0             0
06              07                06                  1             1
07              02                07                 -5             25
08              06                08                 -2             4
09              08                09                 -1             1
10              11                10                  1             1
11              15                11                  4             16
12              09                12                 -3             9
13              14                13                  1             1
14              12                14                 -2             4
15              16                15                  1             1
16              13                16                 -3             9
Total                                                 0             136
Rank Correlation: Computation..

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 136}{16 \times (16^2 - 1)} = 0.8$$
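The rank differences and the coefficient for Example 4 can be reproduced directly from the two rank columns. A minimal sketch, with illustrative variable names:

```python
# Ranks for the 16 individuals in Example 4, in order of Individual No.
beauty = [1, 10, 3, 4, 5, 7, 2, 6, 8, 11, 15, 9, 14, 12, 16, 13]   # X
honesty = list(range(1, 17))                                        # Y

d = [x - y for x, y in zip(beauty, honesty)]   # d_i = X - Y
sum_d = sum(d)                                 # should be 0 as a check
sum_d2 = sum(v * v for v in d)

n = len(d)
rho = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(sum_d, sum_d2, round(rho, 3))  # 0 136 0.8
```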
When ranks are repeated:
If two or more individuals have the same value, a common rank is assigned to the repeated items. This common rank is the average of the ranks they would have received had there been no repetition.
For example, take the series 50, 70, 80, 80, 85, 90. Rank 1 is assigned to 90 because it is the largest value, and rank 2 to 85. The value 80 occurs twice, so both occurrences receive the same rank: the average of ranks 3 and 4, which is 3.5. The remaining values, 70 and 50, receive ranks 5 and 6.

Series   50   70   80    80    85   90
Ranks    6    5    3.5   3.5   2    1
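The averaging rule for repeated values can be sketched as a small function. The name average_ranks is hypothetical; the largest value gets rank 1, as in the example above.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean rank."""
    # Indices of the values, ordered from largest to smallest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Mean of the positions (pos + 1) .. (end + 1) the run would occupy.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

series = [50, 70, 80, 80, 85, 90]
ranks = average_ranks(series)
print(ranks)  # [6.0, 5.0, 3.5, 3.5, 2.0, 1.0]
```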
Formula for repeated rank correlation

For each group of repeated ranks, the factor m(m² - 1)/12 is added to Σd_i²:

$$\rho = 1 - \frac{6\left[\sum d_i^2 + \frac{m(m^2 - 1)}{12} + \cdots\right]}{n(n^2 - 1)}$$

m = the number of times a rank is repeated


Example 5 :
Calculate the rank correlation coefficient from the following data:

Expenditure on advertisement   10   15   14   25   14   14   20   22
Profit                         6    25   12   18   25   40   10   7
Solution: Let us denote the expenditure on
advertisement by x and profit by y
x    Rank of x (Rx)   y    Rank of y (Ry)   d = Rx - Ry   d²
10   8                6    8                 0            0
15   4                25   2.5               1.5          2.25
14   6                12   5                 1            1
25   1                18   4                -3            9
14   6                25   2.5               3.5          12.25
14   6                40   1                 5            25
20   3                10   6                -3            9
22   2                7    7                -5            25

Σd² = 83.50

$$\rho = 1 - \frac{6\left[\sum d_i^2 + \frac{m(m^2 - 1)}{12} + \cdots\right]}{n(n^2 - 1)}$$
Here rank 6 is repeated three times in the ranks of x and rank 2.5 is repeated twice in the ranks of y.
Hence the rank correlation coefficient is

$$\rho = 1 - \frac{6\left[83.50 + \frac{3(3^2 - 1)}{12} + \frac{2(2^2 - 1)}{12}\right]}{8(64 - 1)} = 1 - \frac{6 \times 86}{504} \approx -0.024$$

There is a negative but negligible association between expenditure on advertisement and profit.
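Example 5 can be re-run end to end: rank both series with averaged ties, add the slide's correction factor m(m² - 1)/12 for each group of repeated values, and apply the formula. This is a sketch under those assumptions; the helper names average_ranks and tie_correction are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2   # mean of the tied positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over groups of equal values (0 when m = 1)."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())

x = [10, 15, 14, 25, 14, 14, 20, 22]   # expenditure on advertisement
y = [6, 25, 12, 18, 25, 40, 10, 7]     # profit

rx, ry = average_ranks(x), average_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
correction = tie_correction(x) + tie_correction(y)

n = len(x)
rho = 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))
print(sum_d2, correction, round(rho, 3))  # 83.5 2.5 -0.024
```

A value this close to zero indicates essentially no monotonic association in these eight pairs.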
Rank Correlation : Remarks

• Check, Σdi = 0
• ρ is interpreted in the same way as r (limits ±1)
• It cannot be computed from a grouped
bivariate frequency distribution
• It is simple and easy to understand
• Tedious to compute by hand for larger
data sets
Thank You
