Correlation
Correlation is used to measure and describe
a relationship between two variables.
The measure of correlation is called the correlation
coefficient; it gives the degree
and direction of the correlation.
Correlation analysis measures the closeness
of the relationship between variables.
Examples: the ages of husbands and wives; a company's
sales and its expenditure on advertising.
Describing relationships:
An example…
Correlation & Causation
Correlation does not imply causation.
Causation implies correlation.
[Figure: scatter diagram with Education (Predictor Variable) on the horizontal axis]
Merits/Demerits of Scatter Diagram
The correlation coefficient is denoted by 'r'.
The value of r lies between –1 and +1:
$-1 \le r \le +1$
Pearson’s r
Definitional formula:
$$ r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} $$
$$ r = \frac{COV_{XY}}{(s_x)(s_y)}, \qquad COV_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n} $$
Computational formula:
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}} $$
An Example: Correlation
        X (Education)   Y (Income)   XY      X^2    Y^2
        8               3.4          27.2    64     11.56
        7               4.4          30.8    49     19.36
        6               2.5          15.0    36      6.25
        5               2.1          10.5    25      4.41
        4               1.6           6.4    16      2.56
        3               1.5           4.5     9      2.25
        2               1.2           2.4     4      1.44
        1               1.0           1.0     1      1.00
Total   36              17.7         97.8    204    48.83
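$\sum X = 36$, $\sum Y = 17.7$, $\sum XY = 97.8$, $\sum X^2 = 204$, $\sum Y^2 = 48.83$, $n = 8$
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}} $$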
An Example: Correlation
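Substituting $\sum X = 36$, $\sum Y = 17.7$, $\sum XY = 97.8$, $\sum X^2 = 204$, $\sum Y^2 = 48.83$ and $n = 8$ into the computational formula:
$$ r = \frac{8(97.8) - (36)(17.7)}{\sqrt{\left(8(204) - (36)^2\right)\left(8(48.83) - (17.7)^2\right)}} = \frac{145.2}{\sqrt{(336)(77.35)}} \approx 0.90 $$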
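The education/income example can be checked with a few lines of Python. This is only a minimal sketch of the computational formula; the function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the computational (raw-sums) formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the education/income table
education = [8, 7, 6, 5, 4, 3, 2, 1]
income = [3.4, 4.4, 2.5, 2.1, 1.6, 1.5, 1.2, 1.0]

r = pearson_r(education, income)
print(round(r, 2))  # roughly 0.90
```

The intermediate sums computed inside the function should match the totals row of the table (36, 17.7, 97.8, 204, 48.83).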
An Example: Correlation
Researchers who measure reaction time for human
participants often observe a relationship between the
reaction time scores and the number of errors that the
participants commit. This relationship is known as the
speed-accuracy tradeoff. The following data are from a
reaction time study where the researcher recorded the
average reaction time (milliseconds) and the total number of
errors for each individual in a sample of 8 participants.
Calculate the correlation coefficient.
[Figure: Speed-Accuracy Tradeoff. Scatter plot of Number of Errors (vertical axis) against Reaction Time (horizontal axis).]
An Example: Correlation
        X       X^2      Y     Y^2    XY
        184     33856    10    100    1840
        213     45369     6     36    1278
        234     54756     2      4     468
        197     38809     7     49    1379
        189     35721    13    169    2457
        221     48841    10    100    2210
        237     56169     4     16     948
        192     36864     9     81    1728
Total   1667    350385   61    555    12308
(X = reaction time in milliseconds, Y = number of errors)
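$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}} $$
$$ r = \frac{8(12308) - (1667)(61)}{\sqrt{\left(8(350385) - (1667)^2\right)\left(8(555) - (61)^2\right)}} = \frac{-3223}{\sqrt{(24191)(719)}} \approx -0.77 $$
The negative sign reflects the tradeoff: participants with longer reaction times tend to make fewer errors.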
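As a cross-check, the same reaction-time data can be run through the definitional formula (covariance divided by the product of the standard deviations, each with divisor n). The sketch below is illustrative only; `pearson_r_definitional` is a hypothetical helper name, and the data are the eight (reaction time, errors) pairs from the table.

```python
import math

def pearson_r_definitional(xs, ys):
    """Pearson's r as COV_XY / (s_x * s_y), using divisor n throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (s_x * s_y)

reaction_time = [184, 213, 234, 197, 189, 221, 237, 192]
errors = [10, 6, 2, 7, 13, 10, 4, 9]

r = pearson_r_definitional(reaction_time, errors)
print(round(r, 2))  # roughly -0.77 (note the negative sign)
```

Because the definitional and computational formulas are algebraically equivalent, both should give the same value up to rounding.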
Example:
Sales revenue and profit for cement companies for the
quarter July-Sept 2006-07. Find r.
Company              Revenue        Profit after tax
                     (Rs. crores)   (Rs. crores)
ACC                  13             2.5
Grasim Industries    21             3.2
Guj Ambuja Cements   10             2.6
Ultratech Cement      9             1.4
Shree Cements         3             0.8
India Cements         5             1.1
(Ans: r ≈ 0.92 for the six pairs of figures given here)
Interpreting r
How can we describe the strength of the
relationship in a scatter plot?
– A number between -1 and +1 that indicates the
relationship between two variables.
• The sign (- or +) indicates the direction of
the relationship.
• The number indicates the strength of the
relationship.
-1 ------------------------- 0 ------------------------- +1
Perfect negative relationship   No relationship   Perfect positive relationship
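Rank correlation coefficient:
$$ R = 1 - \frac{6\sum D^2}{n(n^2 - 1)} $$
where D is the difference between the two ranks of an item and n is the number of items.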
A 1 1
B 2 3
C 3 2
D 4 4
E 5 6
F 6 5
G 7 9
H 8 8
I 9 10
J 10 7
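The rank table for items A to J can be worked through in code. This is a sketch under two assumptions: that the two numeric columns are the two sets of ranks, and that the usual rank-correlation formula R = 1 − 6ΣD²/(n(n² − 1)) applies (no tied ranks). The function name `rank_correlation` is illustrative.

```python
def rank_correlation(ranks_1, ranks_2):
    """Rank correlation R = 1 - 6*sum(D^2) / (n*(n^2 - 1)), assuming no ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks for items A..J from the table
first_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
second_ranks = [1, 3, 2, 4, 6, 5, 9, 8, 10, 7]

R = rank_correlation(first_ranks, second_ranks)
print(round(R, 3))  # sum of D^2 is 18, so R = 1 - 108/990, roughly 0.891
```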
When ranks are not given:
Quotations of index numbers of security prices of
a certain joint stock company are given. Find r.
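When some ranks are tied:
$$ R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)} $$
where each m is the number of items sharing a tied rank.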
Limits for the population correlation coefficient: r ± P.E.
Standard Error
Standard error is defined as
$$ S.E. = \frac{1 - r^2}{\sqrt{n}} $$
If r = 0.6 and n = 64, find the probable
error and the standard error of the coefficient of
correlation, and determine the limits for
the population r.
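A short sketch of this exercise follows. It assumes the usual textbook definitions S.E. = (1 − r²)/√n and P.E. = 0.6745 × S.E., with the population limits taken as r ± P.E.; the function names are illustrative.

```python
import math

def standard_error(r, n):
    """Standard error of r, assuming S.E. = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """Probable error of r, assuming P.E. = 0.6745 * S.E."""
    return 0.6745 * standard_error(r, n)

r, n = 0.6, 64
se = standard_error(r, n)    # (1 - 0.36) / 8, roughly 0.08
pe = probable_error(r, n)    # roughly 0.054
lower, upper = r - pe, r + pe

print(f"S.E. = {se:.4f}, P.E. = {pe:.4f}")
print(f"Limits for population r: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the population correlation is expected to lie between about 0.546 and 0.654.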
Coefficient of Determination
Square of correlation coefficient.
$0 \le r^2 \le 1$
If r = 0.9, then r² = 0.81, which means that 81% of the total variation in y is due to, or explained by, the variation
in x. The remaining 19% (= 100 − 81) is due to, or explained by, other factors.
Some Practical Examples