Correlation Analysis
4.1 INTRODUCTION
Statistical measures of central tendency, dispersion, skewness and kurtosis are
helpful for the purpose of comparison and analysis of distributions involving only one
variable, i.e. univariate distributions. Describing the relationship between two or more
variables, however, calls for other statistical methods.
In many business research situations, the key to decision making lies in understanding the
relationships between two or more variables. For example, in an effort to predict the behavior
of the bond market, a broker might find it useful to know whether the interest rate of bonds is
related to the prime interest rate. While studying the effect of advertising on sales, an account
executive may find it useful to know whether there is a strong relationship between
advertising expenditure and sales.
The statistical methods of Correlation (discussed in the present lesson) and Regression (to be
discussed in the next lesson) are helpful in knowing the relationship between two or more
variables which may be related in some way, like interest rate of bonds and prime interest
rate; advertising expenditure and sales; income and consumption; crop-yield and fertilizer
used; and so on.
In all these cases involving two or more variables, we may be interested in seeing:
• whether the two variables are related;
• if so, what form the relationship between the two variables takes;
• how we can make use of that relationship for predictive purposes, that is, forecasting; and
• how good such predictions are.
Since these issues are interrelated, correlation and regression analysis, as two sides of a
single process, consist of methods of examining the relationship between two or more
variables. If two (or more) variables are correlated, we can use information about one (or
more) variable(s) to predict the value of the other variable(s), and can measure the error of
such estimation. Correlation analysis is also useful in exploratory research, when the
objective is to locate variables that might be related in some way.
Correlation can be classified in several ways. The important ways of classifying correlation
are:
• positive and negative correlation;
• linear and non-linear correlation; and
• simple, partial and multiple correlation.
If both the variables move in the same direction, we say that there is a positive correlation,
i.e., if one variable increases, the other variable also increases on an average, or if one
variable decreases, the other variable also decreases on an average.

On the other hand, if the variables are varying in opposite directions, we say that it is a case
of negative correlation.
If the change in one variable is accompanied by a change in the other variable in a constant
ratio, it is a case of linear correlation. Consider, for example, the following data:
X : 10 20 30 40 50
Y : 25 50 75 100 125
The ratio of change in the above example is the same. It is, thus, a case of linear correlation.
If we plot these variables on graph paper, all the points will fall on the same straight line.
On the other hand, if the amount of change in one variable does not follow a constant ratio
with the change in the other variable, the correlation is said to be non-linear or curvilinear.
In the above example, if a couple of figures in either series X or series Y are changed, it
would give a non-linear correlation.
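As a quick numerical check of the linear case above, the following sketch (assuming Python with the numpy package is available) computes the correlation coefficient introduced later in this lesson; since Y is exactly 2.5 times X, all the points fall on one straight line:

    import numpy as np

    X = np.array([10, 20, 30, 40, 50])
    Y = np.array([25, 50, 75, 100, 125])   # Y = 2.5 * X, a constant ratio of change

    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the correlation between X and Y
    print(np.corrcoef(X, Y)[0, 1])   # 1.0 -- perfect positive linear correlation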
The distinction amongst simple, partial and multiple correlation depends upon the number of
variables involved in a study. If only two variables are involved, the correlation is said to be
simple correlation. When three or more variables are involved, the correlation is either
multiple or partial. In multiple correlation, three or more variables are studied
simultaneously. In partial correlation, we consider only two variables influencing each other,
while the effect of the other variable(s) is held constant.
Suppose we have a problem comprising three variables X, Y and Z. X is the number of hours
studied, Y is I.Q. and Z is the number of marks obtained in the examination. In a multiple
correlation, we will study the relationship between the marks obtained (Z) and the two
variables, number of hours studied (X) and I.Q. (Y). In contrast, when we study the
relationship between X and Z, keeping an average I.Q. (Y) constant, it is said to be a study
of partial correlation.
The correlation analysis, in discovering the nature and degree of relationship between
variables, does not necessarily imply any cause and effect relationship between the variables.
Two variables may be related to each other but this does not mean that one variable causes
the other. For example, we may find that logical reasoning and creativity are correlated, but
that does not mean that if we could increase people's logical reasoning ability, we would
produce highly creative people. Correlation by itself does not establish a causal relationship.
But if it is true that influencing someone's logical reasoning ability does influence their
creativity, then the two variables must be correlated with each other. In other words,
causation always implies correlation, but the converse is not true. An observed correlation
between two variables may be due to any of the following reasons:
1. The correlation may be due to chance, particularly when the data pertain to a small
sample. A small sample bivariate series may show a relationship that does not
exist in the universe.

2. It is possible that both the variables are influenced by one or more other variables.
For example, expenditure on food and expenditure on entertainment of a group of
households may show a positive relationship because both have increased over time.
But this is due to the rise in family incomes over the same period. In other words,
the two variables are being influenced by a third variable, family incomes.
3. There may be another situation where both the variables may be influencing each
other so that we cannot say which is the cause and which is the effect. For
example, take the case of price and demand. The rise in price of a commodity may
lead to a decline in the demand for it. Here, price is the cause and the demand is
the effect. In yet another situation, an increase in demand may lead to a rise in
price. Here, the demand is the cause while price is the effect, which is just the
reverse of the first situation. In such cases, it may be difficult to identify which
variable is causing the effect on which variable, as both are influencing each
other.
The foregoing discussion clearly shows that correlation does not indicate any causation or
functional relationship. Correlation is merely a statistical relationship and has nothing to do
with cause and effect; it only reveals co-variation between two variables. Even when there is
no cause-and-effect relationship in a bivariate series, one may observe a high degree of
correlation, known as spurious or nonsense correlation. Obviously, this can be misleading.
As such, one has to be very careful in correlation exercises and look into other relevant
factors before concluding a cause-and-effect relationship.
Correlation Analysis is a statistical technique used to indicate the nature and degree of
relationship existing between one variable and the other(s). It is also used along with
regression analysis to measure how well the regression line explains the variations of the
dependent variable.
4.3 METHODS OF CORRELATION ANALYSIS

The commonly used methods for studying the linear relationship between two variables
involve both graphic and algebraic methods. Some of the widely used methods include:
1. Scatter Diagram
2. Correlation Graph
3. Pearson's Coefficient of Correlation
4. Spearman's Rank Correlation Coefficient
5. Concurrent Deviation Method
4.3.1 SCATTER DIAGRAM

This method, also known as Dotogram or Dot diagram, is one of the simplest ways of
studying the relationship between two variables. In this method, both the variables are
plotted on graph paper by putting dots. The diagram so obtained is called a "Scatter
Diagram". By studying the diagram, we can have a rough idea about the nature and degree
of relationship between the two variables. The term scatter refers to the spreading of dots on
the graph. We should keep the following points in mind while interpreting correlation:
• if the plotted points are very close to each other, it indicates a high degree of
correlation. If the plotted points are far away from each other, it indicates a low
degree of correlation.
Figure 4-1 Scatter Diagrams
• if the points on the diagram reveal any trend (either upward or downward), the
variables are said to be correlated; if no trend is revealed, the variables are
uncorrelated.
• if there is an upward trend rising from the lower left-hand corner and going up to the
upper right-hand corner, the correlation is positive, since this reveals that the values
of the two variables move in the same direction. If, on the other hand, the points
depict a downward trend from the upper left-hand corner to the lower right-hand
corner, the correlation is negative, since in this case the values of the two variables
move in opposite directions.
• in particular, if all the points lie on a straight line starting from the left bottom and
going up towards the right top, the correlation is perfect and positive; and if all the
points lie on a straight line starting from the left top and coming down to the right
bottom, the correlation is perfect and negative.
The various diagrams of the scattered data in Figure 4-1 depict different forms of correlation.
Example 4-1
Given the following data on sales (in thousand units) and expenses (in thousand rupees) of a
firm for ten months:
Month : J F M A M J J A S O
Sales: 50 50 55 60 62 65 68 60 60 50
Expenses: 11 13 14 16 16 15 15 14 13 13
a) Make a Scatter Diagram
b) Do you think that there is a correlation between sales and expenses of the
firm?
Solution: (a) The Scatter Diagram of the given data is shown in Figure 4-2.
[Figure 4-2: Scatter diagram with Sales (thousand units) on the horizontal axis and Expenses (thousand rupees) on the vertical axis]
(b) Figure 4-2 shows that the plotted points are close to each other and reveal an upward
trend. So there is a high degree of positive correlation between sales and expenses of the firm.
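A scatter diagram like Figure 4-2 can be reproduced with a few lines of code; the following is a minimal sketch, assuming Python with the matplotlib package installed:

    import matplotlib.pyplot as plt

    sales    = [50, 50, 55, 60, 62, 65, 68, 60, 60, 50]   # thousand units
    expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]   # thousand rupees

    # each (sales, expenses) pair becomes one dot on the diagram
    plt.scatter(sales, expenses)
    plt.xlabel("Sales (thousand units)")
    plt.ylabel("Expenses (thousand rupees)")
    plt.title("Scatter Diagram")
    plt.show()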
4.3.2 CORRELATION GRAPH

This method, also known as Correlogram, is very simple. The data pertaining to two series are
plotted on a graph sheet. We can find out the correlation by examining the direction and
closeness of two curves. If both the curves drawn on the graph are moving in the same
direction, it is a case of positive correlation. On the other hand, if both the curves are moving
in opposite direction, correlation is said to be negative. If the graph does not show any
definite pattern on account of erratic fluctuations in the curves, then it shows an absence of
correlation.
Example 4-2
Find out graphically if there is any correlation between yield per plot (qtls), denoted by Y,
and the variable X, from the data for ten plots given below:
Plot No.: 1 2 3 4 5 6 7 8 9 10
Y: 3.5 4.3 5.2 5.8 6.4 7.3 7.2 7.5 7.8 8.3
X: 6 8 9 12 10 15 17 20 18 24
[Figure 4-3: Correlation graph of the two series X and Y plotted against Plot Number]
Figure 4-3 shows that the two curves move in the same direction and, moreover, they are very
close to each other, suggesting a close relationship between yield per plot (qtls), Y, and the
variable X.
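A correlation graph such as Figure 4-3 simply plots both series against the common index; a minimal sketch, again assuming matplotlib is available:

    import matplotlib.pyplot as plt

    plot_no = range(1, 11)
    Y = [3.5, 4.3, 5.2, 5.8, 6.4, 7.3, 7.2, 7.5, 7.8, 8.3]
    X = [6, 8, 9, 12, 10, 15, 17, 20, 18, 24]

    # two curves moving together in the same direction suggest positive correlation
    plt.plot(plot_no, Y, marker="o", label="Y (yield per plot)")
    plt.plot(plot_no, X, marker="s", label="X")
    plt.xlabel("Plot Number")
    plt.legend()
    plt.show()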
Remark: Both the graphic methods, scatter diagram and correlation graph, provide a
'feel' for the data by giving a visual representation of the association between the
variables. These are readily comprehensible and enable us to form a fairly good, though
rough, idea of the nature and degree of the relationship between the two variables. However,
these methods are unable to quantify the relationship between them. To quantify the extent of
correlation, we make use of the algebraic methods discussed below.
4.3.3 PEARSON'S COEFFICIENT OF CORRELATION

A mathematical method for measuring the intensity or the magnitude of linear relationship
between two variables was suggested by Karl Pearson (1857-1936), a great British
biometrician and statistician, and it is by far the most widely used method in practice.

Karl Pearson's measure, known as the Pearsonian correlation coefficient between two
variables X and Y, usually denoted by r_xy, is a numerical measure of the linear relationship
between them and is defined as the ratio of the covariance between X and Y to the product of
the standard deviations of X and Y. Symbolically,

    r_xy = Cov(X, Y) / (S_x . S_y)    …………(4.1)
where Cov(X, Y) is the covariance between X and Y, and S_x and S_y are the standard
deviations of X and Y in the bivariate distribution:

    Cov(X, Y) = ∑(X − X̄)(Y − Ȳ) / N    …………(4.2a)

    S_x = √[∑(X − X̄)² / N]    …………(4.2b)

    S_y = √[∑(Y − Ȳ)² / N]    …………(4.2c)
Thus, by substituting Eqs. (4.2) in Eq. (4.1), we can write the Pearsonian correlation
coefficient as

    r_xy = [(1/N) ∑(X − X̄)(Y − Ȳ)] / {√[(1/N) ∑(X − X̄)²] √[(1/N) ∑(Y − Ȳ)²]}

or

    r_xy = ∑(X − X̄)(Y − Ȳ) / √[∑(X − X̄)² ∑(Y − Ȳ)²]    …………(4.3)
If we denote d_x = X − X̄ and d_y = Y − Ȳ, then

    r_xy = ∑d_x d_y / √(∑d_x² ∑d_y²)    …………(4.3a)
We have

    Cov(X, Y) = (1/N) ∑(X − X̄)(Y − Ȳ)
              = (1/N) ∑XY − X̄ Ȳ
              = (1/N) ∑XY − (∑X / N)(∑Y / N)
              = (1/N²) [N ∑XY − ∑X ∑Y]    …………(4.4)

and

    S_x² = (1/N) ∑(X − X̄)²
         = (1/N) ∑X² − (X̄)²
         = (1/N) ∑X² − (∑X / N)²
         = (1/N²) [N ∑X² − (∑X)²]    …………(4.5a)

Similarly, we have

    S_y² = (1/N²) [N ∑Y² − (∑Y)²]    …………(4.5b)

Substituting Eqs. (4.4) and (4.5) in Eq. (4.1),

    r_xy = (1/N²) [N ∑XY − ∑X ∑Y] / {(1/N) √[N ∑X² − (∑X)²] . (1/N) √[N ∑Y² − (∑Y)²]}

or

    r_xy = [N ∑XY − ∑X ∑Y] / {√[N ∑X² − (∑X)²] √[N ∑Y² − (∑Y)²]}    …………(4.6)
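Eq. (4.6) translates directly into code, since it needs only the raw sums ∑X, ∑Y, ∑X², ∑Y² and ∑XY; a minimal Python sketch (the function name pearson_r is merely illustrative):

    from math import sqrt

    def pearson_r(x, y):
        """Pearsonian correlation coefficient computed via Eq. (4.6)."""
        n = len(x)
        sx, sy = sum(x), sum(y)
        sxx = sum(v * v for v in x)
        syy = sum(v * v for v in y)
        sxy = sum(a * b for a, b in zip(x, y))
        # numerator N*Sum(XY) - Sum(X)*Sum(Y), over the square root of the
        # product of the two corrected sums of squares
        return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

    # the perfectly linear series X and Y used earlier give r = 1.0
    print(pearson_r([10, 20, 30, 40, 50], [25, 50, 75, 100, 125]))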
Remark: Eq. (4.3) or Eq. (4.3a) is quite convenient to apply if the means X̄ and
Ȳ come out to be integers. If X̄ or/and Ȳ is (are) fractional, then Eq. (4.3) or Eq. (4.3a) is
not convenient to apply, since the calculations of ∑(X − X̄)², ∑(Y − Ȳ)² and
∑(X − X̄)(Y − Ȳ) are quite time consuming and tedious. In such a case Eq. (4.6) may be
used, provided the values of X or/and Y are small. But if X and Y assume large values, the
calculation of ∑X², ∑Y² and ∑XY is again quite time consuming.

Thus if (i) X̄ and Ȳ are fractional and (ii) X and Y assume large values, Eq. (4.3) and Eq.
(4.6) are not generally used for numerical problems. In such cases, the step deviation method,
where we take the deviations of the variables X and Y from any arbitrary points, is used. We
shall return to this method under Property 2 below.

Properties of Pearsonian Correlation Coefficient
1. The Pearsonian correlation coefficient cannot exceed 1 numerically; it always lies
between −1 and +1. Symbolically,

    −1 ≤ r ≤ 1

Remarks: (i) This property provides us a check on our calculations. If in any problem
the obtained value of r lies outside the limits ±1, this implies that there is some mistake in
our calculations.
(ii) The sign of r indicates the nature of the correlation. A positive value of r indicates
positive correlation, a negative value indicates negative correlation, and r = 0 indicates
absence of correlation.
(iii) The following table sums up the degrees of correlation corresponding to various
values of r:
Value of r            Degree of correlation
±1                    perfect correlation
±0.90 or more         very high degree of correlation
±0.75 to ±0.90        sufficiently high degree of correlation
±0.60 to ±0.75        moderate degree of correlation
±0.30 to ±0.60        only the possibility of a correlation
less than ±0.30       possibly no correlation
0                     absence of correlation
2. The Pearsonian correlation coefficient is independent of the change of origin and scale.
If we define

    U = (X − A) / h and V = (Y − B) / k

where A, B, h and k are constants with h > 0 and k > 0, then the correlation coefficient
between U and V is the same as that between X and Y, i.e., r_xy = r_uv.
Remark: This is one of the very important properties of the correlation coefficient and
is extremely helpful in numerical computation of r. We had already stated that Eq. (4.3) and
Eq.(4.6) become quite tedious to use in numerical problems if X and/or Y are in fractions or if
X and Y are large. In such cases we can conveniently change the origin and scale (if possible)
in X or/and Y to get new variables U and V and compute the correlation between U and V by
    r_xy = r_uv = [N ∑UV − ∑U ∑V] / {√[N ∑U² − (∑U)²] √[N ∑V² − (∑V)²]}    …………(4.7)
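The invariance property behind Eq. (4.7) is easy to verify numerically; a short sketch (assuming numpy), using the marks data of Example 4-5 further below:

    import numpy as np

    X = np.array([45, 70, 65, 30, 90, 40, 50, 75, 85, 60], dtype=float)
    Y = np.array([35, 90, 70, 40, 95, 40, 60, 80, 80, 50], dtype=float)

    # change of origin (A = 60, B = 65) and scale (h = k = 5)
    U = (X - 60) / 5
    V = (Y - 65) / 5

    print(np.corrcoef(X, Y)[0, 1])   # ~0.903
    print(np.corrcoef(U, V)[0, 1])   # identical: r is unaffected by origin and scale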
3. Two independent variables are uncorrelated but the converse is not true
If X and Y are independent variables, then

    r_xy = 0

However, the converse of the theorem is not true, i.e., uncorrelated variables need not be
independent. As an illustration, consider the following bivariate distribution:

X : 1 2 3 -3 -2 -1
Y : 1 4 9 9 4 1
For this distribution, ∑X = 0 and ∑XY = (1 + 8 + 27) − (27 + 8 + 1) = 0, so that
Cov(X, Y) = 0 and hence r_xy = 0. Thus in the above example the variables X and Y are
uncorrelated. But if we examine the data carefully we find that X and Y are not independent
but are connected by the relation Y = X². The above example illustrates that uncorrelated
variables need not be independent.
Remarks: One should not be confused by the words uncorrelation and independence.
r_xy = 0, i.e., uncorrelation between the variables X and Y, simply implies the absence of any
linear (straight line) relationship between them. They may, however, be related in some
form other than a straight line, e.g., quadratic (as we have seen in the above example),
logarithmic or trigonometric.

4. The Pearsonian coefficient of correlation is the geometric mean of the two regression
coefficients, i.e.

    r_xy = ±√(b_xy . b_yx)

The signs of both the regression coefficients are the same, and so the value of r will also have
the same sign. This property will be dealt with in detail in the next lesson on Regression
Analysis.
5. The square of the Pearsonian correlation coefficient is known as the coefficient of
determination. The coefficient of determination, which measures the percentage variation
in the dependent variable that is accounted for by the independent variable, is a much
better and more useful measure for interpreting the value of r. This property will also be
dealt with in detail in the next lesson.
Probable Error of Correlation Coefficient

The correlation coefficient establishes the relationship of the two variables. After ascertaining
this level of relationship, we may be interested to find the extent up to which this coefficient is
dependable. Probable error of the correlation coefficient is such a measure of testing the
reliability of the observed value of the correlation coefficient. If r is the observed value of the
correlation coefficient in a sample of N pairs of observations for the two variables under
consideration, then the Probable Error, denoted by PE(r), is expressed as

    PE(r) = 0.6745 SE(r)

or

    PE(r) = 0.6745 (1 − r²) / √N

(i) Determination of limits: The limits of the population correlation coefficient are
r ± PE(r), implying that if we take another random sample of the size N from the same
population, then the observed value of the correlation coefficient in the second
sample can be expected to lie within the limits given above, with 0.5 probability.
When the sample size N is small, the concept or value of PE may lead to wrong
conclusions. Hence, to use the concept of PE effectively, the sample size N should be
fairly large.
(ii) Interpretation of r: As a working rule, if r is less than its probable error, there is no
evidence of correlation; if r is more than six times its probable error, there is definite
evidence of correlation.
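The probable error and the working rule above can be wrapped in a few lines of code; a minimal sketch (the function name probable_error is merely illustrative):

    from math import sqrt

    def probable_error(r, n):
        """PE(r) = 0.6745 * (1 - r^2) / sqrt(N)."""
        return 0.6745 * (1 - r ** 2) / sqrt(n)

    r, n = 0.9, 10
    pe = probable_error(r, n)
    print(pe)               # 0.0405
    print(r - pe, r + pe)   # limits for the population correlation coefficient
    print(r > 6 * pe)       # True -> definite evidence of correlation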
Example 4-3
Find the Pearsonian correlation coefficient between sales (in thousand units) and expenses (in
thousand rupees) of the following ten firms:
Firm: 1 2 3 4 5 6 7 8 9 10
Sales: 50 50 55 60 65 65 65 60 60 50
Expenses: 11 13 14 16 16 15 15 14 13 13
Solution:

Calculations for Coefficient of Correlation
{Using Eq. (4.3a)}

Firm   X    Y    d_x = X − X̄   d_y = Y − Ȳ   d_x²   d_y²   d_x d_y
1      50   11   −8            −3            64     9      24
2      50   13   −8            −1            64     1      8
3      55   14   −3            0             9      0      0
4      60   16   2             2             4      4      4
5      65   16   7             2             49     4      14
6      65   15   7             1             49     1      7
7      65   15   7             1             49     1      7
8      60   14   2             0             4      0      0
9      60   13   2             −1            4      1      −2
10     50   13   −8            −1            64     1      8
       ∑X = 580  ∑Y = 140                    ∑d_x² = 360   ∑d_y² = 22   ∑d_x d_y = 70

    X̄ = ∑X / N = 580 / 10 = 58 and Ȳ = ∑Y / N = 140 / 10 = 14
    r_xy = ∑d_x d_y / √(∑d_x² ∑d_y²)

    r_xy = 70 / √(360 × 22)

    r_xy = 70 / √7920

    r_xy = 0.79

The value r_xy = 0.79 indicates a high degree of positive correlation between sales and expenses.
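The result can be cross-checked against a library routine; a one-line verification, assuming numpy:

    import numpy as np

    sales    = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
    expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

    print(np.corrcoef(sales, expenses)[0, 1])   # ~0.787, matching the value above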
Example 4-4
The data on price and quantity purchased relating to a commodity for 5 months is given
below:
Find the Pearsonian correlation coefficient between prices and quantity and comment on its
Solution: Here N = 5, and the necessary totals are

    ∑X = 55, ∑Y = 21, ∑X² = 609, ∑Y² = 95, ∑XY = 226
    r_xy = [N ∑XY − ∑X ∑Y] / {√[N ∑X² − (∑X)²] √[N ∑Y² − (∑Y)²]}

    r_xy = (5 × 226 − 55 × 21) / √[(5 × 609 − 55 × 55)(5 × 95 − 21 × 21)]

    r_xy = (1130 − 1155) / √(20 × 34)

    r_xy = −25 / √680

    r_xy = −0.96

The negative sign of r indicates negative correlation and its large magnitude indicates a very
high degree of correlation. So there is a high degree of negative correlation between prices
and quantity purchased.
Example 4-5
Find the Pearsonian correlation coefficient from the following series of marks obtained by 10
students in Mathematics (X) and in Statistics (Y), and compute its probable error:
X: 45 70 65 30 90 40 50 75 85 60
Y: 35 90 70 40 95 40 60 80 80 50
Solution:
Calculations for Coefficient of Correlation
{Using Eq. (4.7)}
X Y U V U2 V2 UV
45 35 -3 -6 9 36 18
70 90 2 5 4 25 10
65 70 1 1 1 1 1
30 40 -6 -5 36 25 30
90 95 6 6 36 36 36
40 40 -4 -5 16 25 20
50 60 -2 -1 4 1 2
75 80 3 3 9 9 9
85 80 5 3 25 9 15
60 50 0 -3 0 9 0
∑U = 2   ∑V = −2   ∑U² = 140   ∑V² = 176   ∑UV = 141
where U = (X − 60) / 5 and V = (Y − 65) / 5.
    r_xy = r_uv = [N ∑UV − ∑U ∑V] / {√[N ∑U² − (∑U)²] √[N ∑V² − (∑V)²]}

        = [10 × 141 − 2 × (−2)] / {√[10 × 140 − (2)²] √[10 × 176 − (−2)²]}

        = (1410 + 4) / √(1396 × 1756)

        = 1414 / √2451376

        = 0.9
So there is a high degree of positive correlation between marks obtained in Mathematics and
in Statistics.
The probable error of r is

    PE(r) = 0.6745 (1 − r²) / √N

    PE(r) = 0.6745 × [1 − (0.9)²] / √10

    PE(r) = 0.0405

Since r = 0.9 is more than six times its probable error (6 × 0.0405 = 0.243), the correlation is
definitely significant.
4.3.4 SPEARMAN'S RANK CORRELATION COEFFICIENT

Sometimes we come across statistical series in which the variables under consideration are
not capable of quantitative measurement but can be arranged in serial order. This happens
when we are dealing with qualitative characteristics (attributes) such as honesty, beauty,
character, morality, etc., which cannot be measured quantitatively but can be arranged
serially. In such situations Karl Pearson's coefficient of correlation cannot be used as such.
Charles Edward Spearman, a British psychologist, developed a measure for such situations,
known as the rank correlation coefficient. The method consists in obtaining the correlation
coefficient between the ranks of N individuals in the two attributes under study.
Suppose we want to find if two characteristics A, say, intelligence and B, say, beauty are
related or not. Both the characteristics are incapable of quantitative measurements but we can
arrange a group of N individuals in order of merit (ranks) w.r.t. proficiency in the two
characteristics. Let the random variables X and Y denote the ranks of the individuals in the
characteristics A and B respectively. If we assume that no two individuals get the same rank
in a characteristic then, obviously, X and Y assume numerical values ranging from 1 to N.

The Pearsonian correlation coefficient between the ranks X and Y is called the rank
correlation coefficient between the characteristics A and B for the group of individuals.
Spearman's rank correlation coefficient, usually denoted by ρ (rho), is given by the equation

    ρ = 1 − 6 ∑d² / [N (N² − 1)]    …………(4.8)

where d is the difference between the pair of ranks of the same individual in the two
characteristics and N is the number of pairs of observations.
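Eq. (4.8) is straightforward to implement once the ranks are known; a minimal Python sketch (the function name spearman_rho is merely illustrative):

    def spearman_rho(rank_x, rank_y):
        """Spearman's rank correlation coefficient, Eq. (4.8)."""
        n = len(rank_x)
        d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
        return 1 - 6 * d2 / (n * (n * n - 1))

    # ranks given by judges J1 and J2 in Example 4-6 below
    print(spearman_rho([9, 3, 7, 5, 1, 6, 2, 4, 10, 8],
                       [9, 1, 10, 4, 3, 8, 5, 2, 7, 6]))   # ~0.709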
Example 4-6
Ten entries are submitted for a competition. Three judges study each entry and list the ten in
rank order. Their rankings are as follows:

Entry: A B C D E F G H I J
Judge J1: 9 3 7 5 1 6 2 4 10 8
Judge J2: 9 1 10 4 3 8 5 2 7 6
Judge J3: 6 3 8 7 2 4 1 5 9 10

Calculate the appropriate rank correlation to help you answer the following questions:
(i) Which pair of judges agrees the most?
(ii) Which pair of judges disagrees the most?

Solution: For each pair of judges, d is the difference between the ranks awarded to the same
entry; the sums of d² work out to 48 (J1 & J2), 26 (J1 & J3) and 88 (J2 & J3). Using
Eq. (4.8):
    ρ(J1 & J2) = 1 − 6 ∑d² / [N (N² − 1)]
               = 1 − (6 × 48) / [10 (10² − 1)]
               = 1 − 288 / 990
               = 1 − 0.29
               = +0.71
    ρ(J1 & J3) = 1 − (6 × 26) / [10 (10² − 1)]
               = 1 − 156 / 990
               = 1 − 0.1575
               = +0.8425
    ρ(J2 & J3) = 1 − (6 × 88) / [10 (10² − 1)]
               = 1 − 528 / 990
               = 1 − 0.53
               = +0.47

Since ρ(J1 & J3) = +0.8425 is the largest coefficient, judges J1 and J3 agree the most; since
ρ(J2 & J3) = +0.47 is the smallest, judges J2 and J3 disagree the most.
Spearman's rank correlation Eq. (4.8) can also be used even if we are dealing with variables
which are measured quantitatively, i.e. when the actual data, and not the ranks, relating to two
variables are given. In such a case we shall have to convert the data into ranks. The highest
(or the smallest) observation is given the rank 1, the next highest (or next lowest) observation
is given the rank 2, and following this order (descending or ascending) the ranks are assigned
to all the observations. However, the same approach should be followed for all the variables
under consideration.
Example 4-7
Calculate the rank coefficient of correlation from the following data:
X: 75 88 95 70 60 80 81 50
Y: 120 134 150 115 110 140 142 100
Solution:
Calculations for Coefficient of Rank Correlation
{Using Eq.(4.8)}
X     Rank R_X     Y     Rank R_Y     d = R_X − R_Y     d²
75 5 120 5 0 0
88 2 134 4 -2 4
95 1 150 1 0 0
70 6 115 6 0 0
60 7 110 7 0 0
80 4 140 3 +1 1
81 3 142 2 +1 1
50 8 100 8 0 0
∑d² = 6
    ρ = 1 − 6 ∑d² / [N (N² − 1)]
      = 1 − (6 × 6) / [8 (8² − 1)]
      = 1 − 36 / 504
      = 1 − 0.07
      = +0.93
Hence, there is a high degree of positive correlation between X and Y.
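When only raw values are given, as here, a library routine that ranks the data internally can serve as a check; a sketch assuming the scipy package is available:

    from scipy.stats import spearmanr

    X = [75, 88, 95, 70, 60, 80, 81, 50]
    Y = [120, 134, 150, 115, 110, 140, 142, 100]

    rho, _ = spearmanr(X, Y)   # converts the values to ranks before correlating
    print(rho)                 # ~0.929, in line with the 0.93 computed above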
Repeated Ranks
In case of attributes, if there is a tie, i.e., if any two or more individuals are placed together in
any classification w.r.t. an attribute, or if in case of variable data there is more than one item
with the same value in either or both the series, then Spearman's Eq. (4.8) for calculating the
rank correlation coefficient breaks down, since in this case the variables X [the ranks of
individuals in characteristic A (1st series)] and Y [the ranks of individuals in characteristic B
(2nd series)] do not take the values from 1 to N.
In this case common ranks are assigned to the repeated items. These common ranks are the
arithmetic mean of the ranks, which these items would have got if they were different from
each other and the next item will get the rank next to the rank used in computing the common
rank. For example, suppose an item is repeated at rank 4. Then the common rank to be
assigned to each item is (4+5)/2, i.e., 4.5 which is the average of 4 and 5, the ranks which
these observations would have assumed if they were different. The next item will be assigned
the rank 6. If an item is repeated thrice at rank 7, then the common rank to be assigned to
each value will be (7+8+9)/3, i.e., 8 which is the arithmetic mean of 7,8 and 9 viz., the ranks
these observations would have got if they were different from each other. The next rank to be
assigned will be 10.

If only a small proportion of the ranks are tied, this technique may be applied together with
Eq. (4.8) as such. If a large proportion of the ranks are tied, it is advisable to apply a
correction factor: we add

    m (m² − 1) / 12    …………(4.8a)

to ∑d², where m is the number of times an item is repeated. This correction factor is to be
added for each repeated value in both the series.
Example 4-8
For a certain joint stock company, the prices of preference shares (X) and debentures (Y) are
given below:
X: 73.2 85.8 78.9 75.8 77.2 81.2 83.8
Y: 97.8 99.2 98.8 98.3 98.3 96.7 97.1
Use the method of rank correlation to determine the relationship between preference prices
and debentures prices.
Solution:
Calculations for Coefficient of Rank Correlation
{Using Eq. (4.8) and (4.8a)}
X      Y      Rank of X (XR)   Rank of Y (YR)   d = XR − YR   d²
73.2 97.8 7 5 2 4
85.8 99.2 1 1 0 0
78.9 98.8 4 2 2 4
75.8 98.3 6 3.5 2.5 6.25
77.2 98.3 5 3.5 1.5 2.25
81.2 96.7 3 7 -4 16
83.8 97.1 2 6 -4 16
∑d = 0     ∑d² = 48.50
In this case, due to the repeated values of Y, we have to assign ranks as the average of the
ranks which would have been allotted had the values been different. Thus ranks 3 and 4 have
been allotted as 3.5 to both the values Y = 98.3. We also have to apply the correction factor
m(m² − 1)/12 to ∑d², where m is the number of times the value is repeated; here m = 2.
    ρ = 1 − 6 [∑d² + m (m² − 1) / 12] / [N (N² − 1)]

      = 1 − 6 [48.5 + 2 (4 − 1) / 12] / [7 (7² − 1)]

      = 1 − (6 × 49) / (7 × 48)

      = 0.125
Hence, there is a very low degree of positive correlation, probably no correlation, between
the prices of preference shares and debentures.
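Library implementations handle such ties by assigning average ranks automatically; a check of this example, again assuming scipy:

    from scipy.stats import spearmanr

    X = [73.2, 85.8, 78.9, 75.8, 77.2, 81.2, 83.8]
    Y = [97.8, 99.2, 98.8, 98.3, 98.3, 96.7, 97.1]

    rho, _ = spearmanr(X, Y)   # the tied Y values both receive the average rank 3.5
    print(rho)                 # ~0.126, close to the 0.125 from the correction factor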
Remarks on Spearman’s Rank Correlation Coefficient
1. We always have ∑d = ∑(X − Y) = 0, since the two series of ranks have equal means;
this serves as a check on the calculations.

2. Since Spearman's rank correlation coefficient is nothing but the Pearsonian
correlation coefficient, r, between the ranks, it can be interpreted in the same way
as the Karl Pearson's correlation coefficient.

3. Karl Pearson's correlation coefficient assumes that the parent population from
which sample observations are drawn is normal. If this assumption is violated, then
we need a measure which is distribution free (or non-parametric). Spearman's ρ
is such a distribution free measure, since no strict assumptions are made about the
form of the population from which sample observations are drawn.

4. Spearman's formula is easy to understand and apply as compared with Karl
Pearson's formula. The values obtained by the two formulae, viz Pearsonian r and
Spearman's ρ, are generally different. The difference arises due to the fact that
when ranking is used instead of the full set of observations, there is always some loss
of information. Unless many ties exist, the coefficient of rank correlation should
be only slightly lower than the Pearsonian coefficient.

5. Spearman's formula is the only formula to be used for qualitative characteristics,
which cannot be measured quantitatively but can be arranged serially. It can also be
used where actual data are given, especially when extreme observations are present.

6. Spearman's formula has its limitations also. It is not practicable in the case of a
bivariate frequency distribution. For N > 30, this formula should not be used unless
the ranks are given.
4.3.5 CONCURRENT DEVIATION METHOD
This is a casual method of determining the correlation between two series when we are not
very serious about its precision. This is based on the signs of the deviations (i.e. the
direction of the change) of the values of the variable from its preceding value and does not
take into account the exact magnitude of the values of the variables. Thus we put a plus (+)
sign, minus (-) sign or equality (=) sign for the deviation if the value of the variable is greater
than, less than or equal to the preceding value respectively. The deviations in the values of
two variables are said to be concurrent if they have the same sign (either both deviations are
positive or both are negative or both are equal). The formula used for computing correlation
    r_c = ±√[± (2c − N) / N]    …………(4.9)
Where c is the number of pairs of concurrent deviations and N is the number of pairs of
deviations. If (2c-N) is positive, we take positive sign in and outside the square root in Eq.
(4.9) and if (2c-N) is negative, we take negative sign in and outside the square root in Eq.
(4.9).
Remarks: (i) It should be clearly noted that here N is not the number of pairs of
observations but the number of pairs of deviations, and as such it is one less than the
number of pairs of observations.

(ii) The coefficient of concurrent deviations is primarily based on the following principle:

“If the short time fluctuations of the time series are positively correlated or, in other
words, if their deviations are concurrent, their curves would move in the same
direction and would indicate positive correlation between them”
Example 4-9
Calculate the coefficient of correlation by the concurrent deviation method for the following data:
Supply: 112 125 126 118 118 121 125 125 131 135
Price: 106 102 102 104 98 96 97 97 95 90
Solution:
Calculations for Coefficient of Concurrent Deviations
{Using Eq. (4.9)}
Supply (X)   Sign of deviation from   Price (Y)   Sign of deviation from   Concurrent
             preceding value (X)                  preceding value (Y)      deviations
112                                   106
125          +                        102         −
126          +                        102         =
118          −                        104         +
118          =                        98          −
121          +                        96          −
125          +                        97          +                        (c)
125          =                        97          =                        (c)
131          +                        95          −
135          +                        90          −
We have
Number of pairs of deviations, N =10 – 1 = 9
c = Number of concurrent deviations
= Number of deviations having like signs
=2
Coefficient of correlation by the method of concurrent deviations is given by

    r_c = ±√[± (2c − N) / N]

    r_c = ±√[± (2 × 2 − 9) / 9]

Since 2c − N = −5 is negative, we take the negative sign both inside and outside the square root:

    r_c = −√[−(−0.5556)]

    r_c = −√0.5556

    r_c = −0.745
Hence there is a fairly good degree of negative correlation between supply and price.
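The whole procedure of Example 4-9 can be expressed compactly in code; a minimal sketch of Eq. (4.9) (the function name concurrent_deviation_r is merely illustrative):

    from math import sqrt

    def concurrent_deviation_r(x, y):
        """Coefficient of concurrent deviations, Eq. (4.9)."""
        # sign of the change from the preceding value: +1, -1 or 0
        def signs(series):
            return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

        sx, sy = signs(x), signs(y)
        n = len(sx)                                   # pairs of deviations
        c = sum(1 for a, b in zip(sx, sy) if a == b)  # concurrent deviations
        t = (2 * c - n) / n
        # take the same sign inside and outside the square root
        return sqrt(t) if t >= 0 else -sqrt(-t)

    supply = [112, 125, 126, 118, 118, 121, 125, 125, 131, 135]
    price  = [106, 102, 102, 104, 98, 96, 97, 97, 95, 90]
    print(concurrent_deviation_r(supply, price))   # ~ -0.745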
4.4 INTERPRETATION OF COEFFICIENT OF CORRELATION

As mentioned earlier, correlation analysis is a statistical tool which should be properly used
and carefully interpreted. Some common errors of interpretation are discussed below.

1. A high degree of correlation is often taken to mean a cause-and-effect relationship.
Before concluding so, one has to be reasonably sure that one variable is the cause
while the other is the effect. Let us take an example.
Suppose that we study the performance of students in their graduate examination and
their earnings after, say, three years of their graduation. We may find that these two
variables are highly and positively related. At the same time, we must not forget that
both the variables might have been influenced by some other factors such as quality of
the educational process and so forth. If the data on these factors are available, then it
is worthwhile to examine their influence as well.

2. Another frequent mistake occurs when r is read as a percentage. Suppose r = 0.7; it
would be wrong to conclude that correlation explains 70 percent of the total variation
in Y. The error can be seen easily by calculating the coefficient of determination: here
r² will be 0.49. This means that only 49 percent of the total variation in Y is
explained. Likewise, the coefficient of determination should not be taken to indicate
a causal relationship, that is, that the percentage of the change in one variable is due
to the change in the other variable.
3. Another mistake in the interpretation of the coefficient of correlation occurs when one
concludes a positive or negative relationship even though the two variables are
actually unrelated. For example, the age of students and their score in the examination
have no relation with each other. The two variables may show similar movements, but
there is no real relationship between them; any such correlation is spurious.
To sum up, one has to be extremely careful while interpreting the coefficient of correlation.
Before one concludes a causal relationship, one has to consider other relevant factors that
might have any influence on the dependent variable or on both the variables. Such an
approach will avoid many of the pitfalls in the interpretation of the coefficient of correlation.
It has been rightly said that the coefficient of correlation is not only one of the most widely
used, but also one of the most widely abused statistical measures.
SELF-ASSESSMENT QUESTIONS

2. Explain the meaning and significance of the concept of correlation. Does correlation
always signify causal relationships between two variables? Explain with illustrations.

(a) Over a period of time there has been an increased financial aid to underdeveloped
countries and also an increase in comedy act television shows. The two series show a
high positive correlation. Comment on the nature of this correlation.