
Correlation

Correlation analysis measures the relationship between two variables. The correlation coefficient indicates the strength and direction of this relationship. A scatter plot provides a visual representation of the data and can show if variables are positively correlated, negatively correlated, or not correlated at all. Pearson's correlation coefficient (r) is commonly used to quantify the degree of linear correlation between two variables on a scale of -1 to 1. Spearman's rank correlation coefficient is an alternative method used when variables are expressed qualitatively rather than quantitatively.
Copyright
© Attribution Non-Commercial (BY-NC)

Correlation Analysis

Correlation
• Correlation is used to measure and describe the relationship between two variables.
• The measure of correlation is called the correlation coefficient; it gives the degree and direction of the correlation.
• Correlation analysis measures the closeness of the relationship between variables.
• Examples: the ages of husband and wife; a company's sales and its advertising expenditure.
Describing relationships:
An example…
Correlation & Causation
• Correlation does not imply causation.
• Causation implies correlation.
• Correlation may be coincidental, especially in small samples.
• The relationship between the variables may be caused by some third variable.
• The two variables may influence each other, so that neither can be designated as the cause and the other as the effect.
Types of Correlation
• Positive and negative correlation:
• The type depends on the direction of change of the variables.
• If both variables vary in the same direction, the correlation is positive.

  X:   2   4   6   8  10     or     X:  50  40  30  20  10
  Y:   1   3   5   7   9            Y:  24  21  19  18  14
Negative Correlation
• The variables vary in opposite directions.

  X:   2   4   6   8  10     or     X:  50  40  30  20  10
  Y:   9   7   5   3   1            Y:  24  26  28  30  32

Simple/Partial/Multiple Correlation
• The distinction between the three depends on the number of variables studied.
• Simple: only two variables are studied.
• Multiple: three or more variables are studied simultaneously.
• Partial: more than two variables are recognized, but only two are considered to influence each other while the others are held constant.
Linear/Non-Linear Relationship
• The distinction depends on the constancy of the ratio of change between the variables.
• If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the relationship is linear.

  X:  10   20   30   40   50
  Y:  70  140  210  280  350

• If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the relationship is non-linear.
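The constant-ratio test described above can be sketched in a few lines of Python. This is only an illustration: the helper names `change_ratios` and `is_linear` and the tolerance `tol` are assumptions made for the example, not part of the slides.

```python
def change_ratios(xs, ys):
    """Ratio (change in Y) / (change in X) for each consecutive pair of points."""
    pairs = list(zip(xs, ys))
    return [(y1 - y0) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


# Data from the slide: Y changes by 70 for every change of 10 in X.
x = [10, 20, 30, 40, 50]
y = [70, 140, 210, 280, 350]

print(change_ratios(x, y))               # [7.0, 7.0, 7.0, 7.0]
print(is_linear(x, y))                   # True
print(is_linear(x, [v * v for v in x]))  # False: Y = X squared has no constant ratio
```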
Methods of Correlation
• Scatter Diagram Method
• The simplest device for ascertaining whether two variables are related is to prepare a dot chart.
• The greater the scatter of the plotted points, the weaker the relationship between the variables.

[Figure: three scatter diagrams showing positive correlation, negative correlation, and no correlation]

Scatter Plot
• What is the relationship between level of education and lifetime earnings?

Education Level and Lifetime Earnings

X (Education)   Y (Income)
8               3.4
7               4.4
6               2.5
5               2.1
4               1.6
3               1.5
2               1.2
1               1.0

[Figure: scatter plot of Lifetime Earnings (criterion variable, y-axis, 0 to 5) against Education (predictor variable, x-axis, 0 to 10)]
Merits/Demerits of Scatter Diagram
• Useful for gaining a visual impression of the relationship.
• Cannot establish the exact degree of correlation between the variables, so a more quantitative description is needed.
• Gives only a rough indication of the nature and strength of the relationship between the variables.
Karl Pearson’s Coefficient of Correlation
• A measure of linear correlation.
• A widely used method.
• The Pearsonian correlation coefficient is denoted by ‘r’.
• The value of r lies between -1 and +1:

$$-1 \le r \le +1$$
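Pearson’s r
• Definitional formula:

$$r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}}$$

$$r = \frac{COV_{XY}}{(s_x)(s_y)}, \qquad COV_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$$

• Computational formula:

$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$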
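As a rough check that the definitional and computational formulas agree, both can be written out directly. The sketch below assumes the standard deviations divide by n, to match the covariance formula above; the function names are illustrative only. The data are the two positive-correlation examples from the earlier slide.

```python
from math import sqrt


def pearson_definitional(xs, ys):
    """r = COV_XY / (s_x * s_y), with covariance and standard deviations dividing by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (s_x * s_y)


def pearson_computational(xs, ys):
    """r from the raw sums: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# First positive-correlation example: Y rises by 2 whenever X rises by 2.
x1, y1 = [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]
# Second positive-correlation example: both series fall together.
x2, y2 = [50, 40, 30, 20, 10], [24, 21, 19, 18, 14]

print(round(pearson_definitional(x1, y1), 6))   # 1.0
print(round(pearson_computational(x1, y1), 6))  # 1.0
print(round(pearson_definitional(x2, y2), 4), round(pearson_computational(x2, y2), 4))
```

The two functions are algebraically the same quantity; the computational form only avoids computing the means first, which is why it suits hand calculation from a table of sums.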
An Example: Correlation

X (Education)   Y (Income)   XY     X²    Y²
8               3.4          27.2   64    11.56
7               4.4          30.8   49    19.36
6               2.5          15.0   36    6.25
5               2.1          10.5   25    4.41
4               1.6          6.4    16    2.56
3               1.5          4.5    9     2.25
2               1.2          2.4    4     1.44
1               1.0          1.0    1     1.00
Total: 36       17.7         97.8   204   48.83

$$\sum X = 36, \quad \sum Y = 17.7, \quad \sum XY = 97.8, \quad \sum X^2 = 204, \quad \sum Y^2 = 48.83, \quad n = 8$$

$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$
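The slides stop at the sums; substituting them into the computational formula gives the following worked step (the arithmetic is added here for illustration and is rounded to two decimals):

```latex
\begin{aligned}
r &= \frac{8(97.8) - (36)(17.7)}{\sqrt{\left(8(204) - 36^2\right)\left(8(48.83) - 17.7^2\right)}} \\
  &= \frac{782.4 - 637.2}{\sqrt{(1632 - 1296)(390.64 - 313.29)}} \\
  &= \frac{145.2}{\sqrt{(336)(77.35)}} \approx \frac{145.2}{161.21} \approx 0.90
\end{aligned}
```

A value of about 0.90 indicates a strong positive linear relationship between education and income in this sample.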
An Example: Correlation
• Researchers who measure reaction time for human participants often observe a relationship between the reaction time scores and the number of errors that the participants commit. This relationship is known as the speed-accuracy tradeoff. The following data are from a reaction time study in which the researcher recorded the average reaction time (milliseconds) and the total number of errors for each individual in a sample of 8 participants. Calculate the correlation coefficient.
Speed Accuracy Tradeoff

Reaction Time (ms)   Errors
184                  10
213                  6
234                  2
197                  7
189                  13
221                  10
237                  4
192                  9

[Figure: scatter plot of Number of Errors (y-axis, 0 to 15) against Reaction Time (x-axis, 150 to 250)]
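An Example: Correlation

X      X²       Y    Y²    XY
184    33856    10   100   1840
213    45369    6    36    1278
234    54756    2    4     468
197    38809    7    49    1379
189    35721    13   169   2457
221    48841    10   100   2210
237    56169    4    16    948
192    36864    9    81    1728
Total: 1667   350385   61   555   12308

$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$

$$r = \frac{8(12308) - (1667)(61)}{\sqrt{\left(8(350385) - (1667)^2\right)\left(8(555) - (61)^2\right)}} = -0.77$$

The negative sign reflects the tradeoff: participants with longer reaction times tend to make fewer errors.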
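The column totals and the final value can be re-derived from the raw data with a short script. This is a sketch for checking the hand calculation; the variable names are chosen for the example.

```python
from math import sqrt

# Raw data from the reaction-time study (8 participants).
reaction_time = [184, 213, 234, 197, 189, 221, 237, 192]
errors = [10, 6, 2, 7, 13, 10, 4, 9]

n = len(reaction_time)
sum_x = sum(reaction_time)
sum_y = sum(errors)
sum_x2 = sum(x * x for x in reaction_time)
sum_y2 = sum(y * y for y in errors)
sum_xy = sum(x * y for x, y in zip(reaction_time, errors))

# Computational formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)  # 1667 350385 61 555 12308
print(numerator)                             # -3223
print(round(r, 2))                           # -0.77
```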
Example
Sales revenue and profit for cement companies for the quarter July-Sept 2006-07. Find r.

Company              Revenue (Rs. crores)   Profit after tax (Rs. crores)
ACC                  13                     2.5
Grasim Industries    21                     3.2
Guj Ambuja Cements   10                     2.6
Ultratech Cement     9                      1.4
Shree Cements        3                      0.8
India Cements        5                      1.1

Source: Economic Times , dt. 11th October 2006.


(Ans: r ≈ 0.92)
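One way to reach the answer is to tabulate the sums for the six companies and substitute them into the computational formula. The working below is added for illustration, with X as revenue and Y as profit after tax:

```latex
\begin{aligned}
&\sum X = 61, \quad \sum Y = 11.6, \quad \sum XY = 146.2, \quad \sum X^2 = 825, \quad \sum Y^2 = 27.06, \quad n = 6 \\
r &= \frac{6(146.2) - (61)(11.6)}{\sqrt{\left(6(825) - 61^2\right)\left(6(27.06) - 11.6^2\right)}} \\
  &= \frac{877.2 - 707.6}{\sqrt{(1229)(27.8)}} \approx \frac{169.6}{184.84} \approx 0.918
\end{aligned}
```

This is close to the answer quoted above; the small difference in the third decimal is consistent with rounding during hand calculation.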
Example
• The following table gives indices of industrial production and the number of registered unemployed people (in lakhs). Calculate the value of the correlation coefficient.

Year                1991  1992  1993  1994  1995  1996  1997  1998
Index of prod.        78    89    99    60    59    79    68    61
No. of unemployed    125   137   156   112   107   136   123   108

(Ans: r ≈ 0.97)
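The figures as tabulated can be checked with the deviation form of Pearson's r. This is a sketch only; `pearson_r` is a helper written for this example. For the eight pairs shown, the coefficient works out to about 0.97.

```python
from math import sqrt

# Data as printed in the table (years 1991 to 1998).
index_of_production = [78, 89, 99, 60, 59, 79, 68, 61]
unemployed_lakhs = [125, 137, 156, 112, 107, 136, 123, 108]


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


r = pearson_r(index_of_production, unemployed_lakhs)
print(round(r, 2))  # 0.97
```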
Interpreting r
• How can we describe the strength of the relationship in a scatter plot?
  - r is a number between -1 and +1 that indicates the relationship between two variables.
    - The sign (- or +) indicates the direction of the relationship.
    - The magnitude indicates the strength of the relationship.

  -1 -------------------- 0 -------------------- +1
  Perfect Relationship    No Relationship        Perfect Relationship

• The closer r is to -1 or +1, the stronger the relationship.
• When r = +1, there is a perfect positive relationship.
• When r = -1, there is a perfect negative relationship.
• When r = 0, there is no linear relationship.
• The closer r is to 0, the weaker the relationship.
• The closeness of the relationship is not proportional to r.
Correlation Coefficient
Spearman’s Rank Correlation Coefficient
• This method is useful for correlation analysis when the variables are expressed in qualitative terms such as beauty, judgment, intelligence, or honesty.
• Spearman’s rank correlation coefficient is defined as

$$R = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$

• where R: rank correlation coefficient
• D: difference between the ranks of paired items in the two series
• n: number of observations
When ranks are given:

      Rank as per final grade   Rank as per salary offered
A     1                         1
B     2                         3
C     3                         2
D     4                         4
E     5                         6
F     6                         5
G     7                         9
H     8                         8
I     9                         10
J     10                        7
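With the ranks already given, the formula needs only the rank differences D. A minimal sketch follows; the function name `spearman_from_ranks` is an assumption for this example, and it presumes there are no tied ranks.

```python
def spearman_from_ranks(rank_a, rank_b):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks for items A to J from the table.
grade_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salary_rank = [1, 3, 2, 4, 6, 5, 9, 8, 10, 7]

sum_d2 = sum((a - b) ** 2 for a, b in zip(grade_rank, salary_rank))
R = spearman_from_ranks(grade_rank, salary_rank)

print(sum_d2)       # 18
print(round(R, 3))  # 0.891
```

Here the sum of squared differences is 18, so R = 1 - 108/990, a strong positive agreement between the two rankings.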
When ranks are not given:
• Quotations of index numbers of security prices of a certain joint stock company are given. Find R.

Year   Debenture Price   Share Price
1      97.8              73.2
2      99.2              85.8
3      98.8              78.9
4      98.3              75.8
5      98.4              77.2
6      96.7              87.2
7      97.1              83.8
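When only raw values are given, each series is ranked first and the rank formula is then applied. The sketch below assumes rank 1 goes to the largest value and that no values are tied (true for this data); the helper names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 = largest value. Assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_from_ranks(rank_a, rank_b):
    """R = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), assuming no tied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


debenture_price = [97.8, 99.2, 98.8, 98.3, 98.4, 96.7, 97.1]
share_price = [73.2, 85.8, 78.9, 75.8, 77.2, 87.2, 83.8]

debenture_rank = ranks_descending(debenture_price)
share_rank = ranks_descending(share_price)
R = spearman_from_ranks(debenture_rank, share_rank)

print(debenture_rank)  # [5, 1, 2, 4, 3, 7, 6]
print(share_rank)      # [7, 2, 4, 6, 5, 1, 3]
print(round(R, 3))     # -0.107
```

Ranking from smallest to largest instead would give the same R, as long as both series are ranked in the same direction.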
Equal ranks

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{n(n^2 - 1)}$$

• m: number of items sharing a common rank (one correction term for each group of tied items)
• Obtain the rank correlation coefficient between X and Y:

  X:   50   55   65   50   55   60   50   65   70
  Y:  110  110  115  125  140  115  130  120  115
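The tied-rank procedure can be sketched as follows, assuming the usual convention that tied items share the average of the ranks they would otherwise occupy. The helper names (`average_ranks`, `tie_correction`, `spearman_with_ties`) are made up for this illustration.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of the positions they occupy."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    """Rank correlation with the correction terms added to sum(D^2)."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    corrected = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * corrected / (n * (n ** 2 - 1))


x = [50, 55, 65, 50, 55, 60, 50, 65, 70]
y = [110, 110, 115, 125, 140, 115, 130, 120, 115]

print(average_ranks(x))  # [2.0, 4.5, 7.5, 2.0, 4.5, 6.0, 2.0, 7.5, 9.0]
print(average_ranks(y))  # [1.5, 1.5, 4.0, 7.0, 9.0, 4.0, 8.0, 6.0, 4.0]
print(tie_correction(x) + tie_correction(y))  # 5.5
print(round(spearman_with_ties(x, y), 4))     # -0.1625
```

For this data the sum of squared rank differences is 134 and the five tied groups add 5.5, so R = 1 - 6(139.5)/720.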
Probable Error
• Used for testing the reliability of an observed correlation coefficient.

$$P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$

• If r < P.E., the correlation is not significant.
• If r > 6 P.E., the correlation is definitely significant.
• Adding the probable error to, and subtracting it from, the coefficient of correlation gives the limits within which the population coefficient of correlation is expected to lie:

$$\rho = r \pm P.E.$$
Standard Error
• The standard error is defined as

$$S.E. = \frac{1 - r^2}{\sqrt{n}}$$

• If r = 0.6 and n = 64, find the probable error and the standard error of the coefficient of correlation, and determine the limits for the population r.
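The exercise above can be worked through with two small functions. This is a sketch using the formulas as stated on the slides; the function names are chosen for the example.

```python
from math import sqrt


def standard_error(r, n):
    """S.E. = (1 - r^2) / sqrt(n)"""
    return (1 - r ** 2) / sqrt(n)


def probable_error(r, n):
    """P.E. = 0.6745 * S.E."""
    return 0.6745 * standard_error(r, n)


r, n = 0.6, 64
se = standard_error(r, n)
pe = probable_error(r, n)
lower, upper = r - pe, r + pe

print(round(se, 4))                      # 0.08
print(round(pe, 4))                      # 0.054
print(round(lower, 3), round(upper, 3))  # 0.546 0.654
print(r > 6 * pe)                        # True: r exceeds six times the probable error
```

So the standard error is 0.08, the probable error is about 0.054, and the population coefficient is expected to lie roughly between 0.546 and 0.654.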
Coefficient of Determination
• The square of the correlation coefficient.
• Indicates the extent to which variation in one variable is explained by the variation in the other.
• It is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.
• Coefficient of determination:

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

• Coefficient of non-determination (alienation):

$$k^2 = 1 - r^2 = \frac{\text{Unexplained variation}}{\text{Total variation}}$$

$$0 \le r^2 \le 1$$

If r = 0.9, then r² = 0.81, which means that 81% of the total variation in y is explained by the variation in x. The remaining 19% (= 100 - 81) is due to other factors.
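To make "explained" and "unexplained" variation concrete, the sketch below splits the total variation in income from the earlier education example. It assumes the variation is measured against a least-squares straight line, which the slides do not spell out, so treat that choice as an assumption of this illustration.

```python
from math import sqrt

# Education and income data from the earlier worked example.
education = [8, 7, 6, 5, 4, 3, 2, 1]
income = [3.4, 4.4, 2.5, 2.1, 1.6, 1.5, 1.2, 1.0]

n = len(education)
mean_x = sum(education) / n
mean_y = sum(income) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(education, income))
s_xx = sum((x - mean_x) ** 2 for x in education)
total_variation = sum((y - mean_y) ** 2 for y in income)

r = s_xy / sqrt(s_xx * total_variation)

# Assumption: predictions come from the least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in education]

explained = sum((p - mean_y) ** 2 for p in predicted)
unexplained = sum((y - p) ** 2 for y, p in zip(income, predicted))

print(round(r ** 2, 2))                         # 0.81
print(round(explained / total_variation, 2))    # 0.81
print(round(unexplained / total_variation, 2))  # 0.19
```

For this data r is about 0.90, so the split happens to reproduce the 81% and 19% figures quoted in the text.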
Some Practical
Examples
