CORRELATION
Introduction
In the previous chapters we studied the analysis of only one variable; for example, marks, prices, ages, weights, heights, rainfall, sales, etc. Such analysis of a single characteristic is called univariate analysis. Sometimes it may happen that there exists some relationship between two variables, and if we study the data to find whether there is any relationship between the values of the two variables, this type of statistical analysis is called bivariate analysis.
Definition of correlation
According to Ya Lun Chou, "Correlation analysis attempts to determine the degree of relationship between variables."
According to W. I. King, "Correlation means that between two series or groups of data there exists some causal connection."
According to L. R. Connor, "If two or more quantities vary in sympathy, so that movements in one tend to be accompanied by corresponding movements in the other(s), then they are said to be correlated."
Correlation is very useful in the physical and social sciences, business and economics. In this book we study the uses of correlation in business.
1. Correlation is very useful to economists to study the relationship between variables, like price and quantity demanded. To businessmen, it helps to estimate costs, sales, prices and other related variables.
2. Some variables show some kind of relationship; correlation analysis helps in measuring the degree of relationship between variables like supply and demand, price and supply, income and expenditure, etc.
3. The relation between variables can be verified and tested for significance with the help of correlation analysis. The effect of correlation is to reduce the range of uncertainty of our prediction.
4. The coefficient of correlation is a relative measure, and we can compare the relationship between variables which are expressed in different units.
5. Sampling error can also be calculated.
6. Correlation is the basis for the concept of regression and the ratio of variation.
Correlation and causation
Correlation analysis deals with the association or co-variation between two or more variables and
helps to determine the degree of relationship between two or more variables. But correlation does
not indicate a cause and effect relationship between two variables. It explains only co-variation.
A high degree of correlation between two variables may exist due to any one of, or a combination of, the following reasons.
1. Pure chance: Especially in a small sample, the correlation may be due to pure chance. If we select a small sample from a bivariate distribution, it may show a high degree of correlation even though, in the universe, there is no relationship between the variables. A high degree of mathematical correlation can be obtained even when there is no real relationship between the variables. For example, the production of shoes and agricultural production have nothing to do with each other, yet a comparison may show a relationship; if a relationship is found, it is only a coincidence, and such correlation is called nonsensical or spurious correlation. Another example is the relationship between the number of cars produced and the number of children born in a country. Here the covariation may be due to chance, and there is no logical basis for a relationship. A measure of correlation may be arrived at on the basis of covariation, but it may be nonsensical or without meaning. That is, there may be correlation between two variables even when the two variables do not operate in the same physical or social system and have nothing to do with each other; again, such correlation is known as spurious or nonsensical correlation.
2. Both the variables are influenced by some other variable(s): A high degree of correlation between two variables may be due to the same cause, or to different causes, affecting each of these variables. For example, a high degree of correlation may exist between the yield per acre of paddy and of wheat due to the effect of rainfall and other factors like fertilizers used, favourable weather, etc. But neither of the two variables is the cause of the other. It is difficult to say which is the cause and which is the effect; they may not have caused each other at all, but there is an outside influence.
3. Mutual dependence: In this case, the variables affect each other, and which series is the subject (cause) and which is the relative (effect) has to be judged from the circumstances. For example, rainfall and the production of jute are directly related; the effect of rainfall on jute production is direct, while the production of jute is only relative.
Types of correlation: Correlation is classified into many types; the important ones are linear and non-linear (curvilinear) correlation. Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable; otherwise the graph of the relationship is a curve and the correlation is non-linear. In the majority of cases we find curvilinear relationships, because relationships in the social sciences are not so perfect as in the natural sciences.
Methods of studying correlation: The different methods of finding out whether two variables are related are:
A. Graphic methods
   1. Scatter diagram or scattergram
   2. Simple graph
B. Mathematical methods
   3. Pearson's coefficient of correlation
   4. Spearman's rank coefficient of correlation
   5. Coefficient of concurrent deviation
   6. Method of least squares
1. Scatter diagram method: This is the simplest method of finding out whether there is any relationship between two variables, by plotting the values on a chart known as a scatter diagram. In this method, the given data are plotted on a graph paper in the form of dots: the X variable is taken on the horizontal axis and the Y variable on the vertical axis. From the dots we can see the scatter or concentration of the various points, and this shows the type of correlation.
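A minimal sketch of how such a scatter diagram may be drawn in Python (assuming matplotlib is installed), using the paired values of Illustration 12.1 further below:

import matplotlib.pyplot as plt

# Paired observations taken from Illustration 12.1
x = [15, 18, 22, 20, 25, 20]
y = [30, 35, 43, 41, 51, 40]

plt.scatter(x, y)            # one dot per (X, Y) pair
plt.xlabel("X variable")     # X variable on the horizontal axis
plt.ylabel("Y variable")     # Y variable on the vertical axis
plt.title("Scatter diagram")
plt.show()

The direction in which the cloud of dots runs, and how tightly it clusters about a straight line, gives the rough idea of the type and degree of correlation described below.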
Perfect Positive Correlation        Perfect Negative Correlation
        Diagram 1                            Diagram 2
If the plotted points form a straight line running from the lower left-hand corner to the upper right-hand corner, there is a perfect positive correlation (i.e., r = +1, Diagram 1). On the other hand, if the points lie on a straight line with a falling trend from the upper left-hand corner to the lower right-hand corner, there is a perfect negative or inverse correlation (i.e., r = -1, Diagram 2). If the plotted points fall in a narrow band and rise from the lower left-hand corner to the upper right-hand corner, there is a high degree of positive correlation between the variables (Diagram 3). If the plotted points fall in a narrow band from the upper left-hand corner to the lower right-hand corner, there is a high degree of negative correlation (Diagram 4).
High degree of positive correlation        High degree of negative correlation
        Diagram 3                            Diagram 4                            Diagram 5
Merits
1. The scatter diagram is a simple and attractive method of finding out the nature of correlation between two variables.
2. It is a non-mathematical method of studying correlation, and it is easy to understand.
3. We can get a rough idea at a glance whether the correlation is positive or negative.
4. It is not influenced by extreme items.
5. It is a first step in finding out the relationship between two variables.
Demerits
1. By this method we cannot get the exact degree of correlation between two variables; it gives only a rough idea.
2. Simple Graph: The values of the two variables are plotted on a graph paper. We get two curves, one for the X variable and another for the Y variable. These two curves reveal the direction and closeness of the relationship, and also whether or not the variables are related. If both the curves move in the same direction, i.e., parallel to each other, either upward or downward, the correlation is said to be positive. On the other hand, if they move in opposite directions, the correlation is said to be negative.

Illustration 12.1: Draw a correlation graph from the following data:
Variable 1:  15  18  22  20  25  20
Variable 2:  30  35  43  41  51  40
Correlation Graph
(Variable 1 and Variable 2 plotted against the months on two vertical scales; both curves move in the same direction.)
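A sketch of the same correlation graph in Python (assuming matplotlib is installed; the month labels January to June are taken as the time axis):

import matplotlib.pyplot as plt

months = ["J", "F", "M", "A", "M", "J"]
v1 = [15, 18, 22, 20, 25, 20]
v2 = [30, 35, 43, 41, 51, 40]
positions = range(len(v1))

fig, ax1 = plt.subplots()
ax1.plot(positions, v1, marker="o", label="Variable 1")
ax1.set_xlabel("Months")
ax1.set_ylabel("Variable 1")
ax1.set_xticks(list(positions))
ax1.set_xticklabels(months)
ax2 = ax1.twinx()                  # second vertical scale for Variable 2
ax2.plot(positions, v2, marker="s", linestyle="--", label="Variable 2")
ax2.set_ylabel("Variable 2")
fig.suptitle("Correlation Graph")
plt.show()

Since both curves rise and fall together, the graph suggests a positive correlation.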
The above method is used in the case of time series, but it does not reveal the extent to which the variables are related.

Coefficient of Correlation: Correlation is a statistical technique used for analysing the behaviour of two or more variables; correlation analysis deals with the association or co-variation between them. The measures of correlation are evidence only of association, not proof of a causal relationship, and it is not always possible to obtain the exact value of one variable of the series from the known value of the other.
Karl Pearson's Coefficient of Correlation: Karl Pearson, a great biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. Karl Pearson's method is the most widely used method in practice and is known as the Pearsonian coefficient of correlation. It is denoted by the symbol r; the formulas for calculating Pearsonian r are:

(1) r = Covariance of x and y / (σx × σy)
(2) r = Σxy / (N × σx × σy)
(3) r = Σxy / √(Σx² × Σy²)

where x and y are the deviations of X and Y from their respective means, σx = standard deviation of series X, σy = standard deviation of series Y, and N = number of pairs of observations.

When the deviations of the items are taken from the actual mean, we can apply any one of these formulas, but the simplest is the third one.
When we take the actual values of X and Y (without computing deviations), the formula becomes:

r = [N ΣXY - (ΣX)(ΣY)] / [√(N ΣX² - (ΣX)²) × √(N ΣY² - (ΣY)²)]

For a series with ΣXY = 676, ΣX = 70, ΣY = 63, ΣX² = 728, ΣY² = 651 and N = 7:

r = (676 × 7 - 70 × 63) / [√(728 × 7 - 70²) × √(651 × 7 - 63²)]
  = (4732 - 4410) / [√(5096 - 4900) × √(4557 - 3969)]
  = 322 / √(196 × 588)
  = 322 / 339.48
  = +0.95.
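A minimal computational sketch of the same raw-score formula, assuming the totals above are already available:

from math import sqrt

# Totals from the worked example above
n, sum_xy, sum_x, sum_y, sum_x2, sum_y2 = 7, 676, 70, 63, 728, 651

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(round(r, 2))   # 0.95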
Find the coefficient of correlation between the following heights and weights:
Height in inches:   57   59   62   63   64   65   55   58
Weight in lbs.:    113  117  126  116  126  130  129  111
(B.Com.)
When deviations are taken from an assumed mean: When the actual mean is not a whole number but a fraction, or when the series involves large numbers, deviations are taken from an assumed mean and the following formula is used:

r = [Σdxdy - (Σdx)(Σdy)/N] / [√(Σdx² - (Σdx)²/N) × √(Σdy² - (Σdy)²/N)]

where dx = deviation of the items of the x-series from an assumed mean, i.e., dx = (X - A), and dy = deviation of the items of the y-series from an assumed mean of that series.

Steps:
1. Take the deviations of the x series from an assumed mean and denote these deviations by dx; square them and obtain the total Σdx².
2. Take the deviations of the y series from an assumed mean and denote them by dy; square them and obtain the total Σdy².
3. Multiply dx by dy for each pair and obtain the total Σdxdy.
4. Substitute the totals in the formula.
Illustration: Find the coefficient of correlation between the heights of fathers and sons from the following data:
Height of father (in inches):  65  66  67  67  68  69  71  73
Height of son (in inches):     67  68  64  68  72  70  69  70
(B.Com. Andhra, M.A. Allahabad)

Solution: Computation of Coefficient of Correlation
Height of      dx          dx²     Height of      dy          dy²     dxdy
father X     (X - 67)              son Y        (Y - 68)
   65           -2           4        67           -1            1       2
   66           -1           1        68            0            0       0
   67            0           0        64           -4           16       0
   67            0           0        68            0            0       0
   68            1           1        72            4           16       4
   69            2           4        70            2            4       4
   71            4          16        69            1            1       4
   73            6          36        70            2            4      12
ΣX = 546    Σdx = 10   Σdx² = 62   ΣY = 548    Σdy = 4    Σdy² = 42  Σdxdy = 26
Coefficient of Correlation
r = [Σdxdy - (Σdx)(Σdy)/N] / [√(Σdx² - (Σdx)²/N) × √(Σdy² - (Σdy)²/N)]

or, equivalently,

r = [N Σdxdy - (Σdx)(Σdy)] / [√(N Σdx² - (Σdx)²) × √(N Σdy² - (Σdy)²)]

Here Σdxdy = 26, Σdx = 10, Σdy = 4, Σdx² = 62, Σdy² = 42, N = 8.

r = (26 × 8 - 10 × 4) / [√(62 × 8 - 10²) × √(42 × 8 - 4²)]
  = (208 - 40) / √[(496 - 100) × (336 - 16)]
  = 168 / √(396 × 320)
  = 168 / √126720
  = 168 / 355.98
  = 0.472
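A sketch of the assumed-mean calculation in code, using the father and son heights above (assumed means 67 and 68 as in the table):

from math import sqrt

fathers = [65, 66, 67, 67, 68, 69, 71, 73]
sons    = [67, 68, 64, 68, 72, 70, 69, 70]
ax, ay = 67, 68                      # assumed means
n = len(fathers)

dx = [x - ax for x in fathers]
dy = [y - ay for y in sons]
sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

r = (n * sum_dxdy - sum_dx * sum_dy) / (
    sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2))
print(round(r, 3))   # 0.472

Because the correction terms (Σdx)(Σdy)/N, (Σdx)²/N and (Σdy)²/N remove the effect of the arbitrary choice of the assumed means, the result is identical to the one obtained with the actual means.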
Illustration 12.6: Find a suitable coefficient of correlation for the following data:
Fertiliser used (tonnes):   15   18   20   24   30   35   40   50
Productivity (tonnes):      85   93   95  105  120  130  150  160

Solution: Taking deviations from the assumed means 29 (fertiliser) and 119 (productivity):

Fertiliser     dx          dx²     Productivity    dy           dy²     dxdy
used X       (X - 29)                   Y        (Y - 119)
   15          -14         196         85          -34         1156      476
   18          -11         121         93          -26          676      286
   20           -9          81         95          -24          576      216
   24           -5          25        105          -14          196       70
   30            1           1        120            1            1        1
   35            6          36        130           11          121       66
   40           11         121        150           31          961      341
   50           21         441        160           41         1681      861
            Σdx = 0   Σdx² = 1022              Σdy = -14   Σdy² = 5368  Σdxdy = 2317

r = [Σdxdy - (Σdx)(Σdy)/N] / [√(Σdx² - (Σdx)²/N) × √(Σdy² - (Σdy)²/N)]
  = (2317 - 0) / [√1022 × √(5368 - (-14)²/8)]
  = 2317 / √(1022 × 5343.5)
  = 2317 / 2336.89
  = 0.99.

Hence there is a high degree of correlation between fertiliser used and productivity.
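As a cross-check, the same coefficient can be obtained directly from the raw figures tabulated above (a sketch assuming numpy is installed); np.corrcoef computes Pearson's r without the assumed-mean shortcut.

import numpy as np

fertiliser   = [15, 18, 20, 24, 30, 35, 40, 50]
productivity = [85, 93, 95, 105, 120, 130, 150, 160]

# Pearson correlation matrix; the off-diagonal entry is r
r = np.corrcoef(fertiliser, productivity)[0, 1]
print(round(r, 2))   # 0.99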
Illustration 12.7: Calculate the coefficient of correlation between the age of cars and their annual maintenance cost, and comment.

Solution (substituting the totals in the assumed-mean formula):
r = (8,800 + 272) / [√(9,200 - 64) × √(15,400 - 1,156)]
  = 9,072 / √(9,136 × 14,244)
  = 9,072 / √130,133,184
  = 9,072 / 11,407.6
  = +0.7953.

Since r is positive and fairly high, the older the car, the higher its annual maintenance cost tends to be.
Merits of Coefficient of Correlation
1. The coefficient of correlation summarises in one figure the degree of correlation and its direction.
2. Moreover, we can estimate the value of the dependent variable from known values of the independent variable.
Demerits of Coefficient of Correlation
1. A linear relationship between the variables is assumed, whether that assumption is correct or not.
2. The calculation of the coefficient of correlation is time-consuming.
3. Like the standard deviation, the value of the coefficient is unduly affected by extreme items.
4. The coefficient of correlation, which lies between +1 and -1, needs a very careful interpretation, or the yardstick of correlation will be misinterpreted; careless interpretation will be fallacious.
Mathematical Properties of the Coefficient of Correlation
The converse of the theorem, i.e., that r = 0 implies independence, is not true; that is, uncorrelated variables need not necessarily be independent. Zero correlation between the variables X and Y simply implies the absence of a linear relationship between them. They may, however, be related in some other form.
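A small numerical illustration of this property (hypothetical values, assuming numpy is installed): Y = X² is completely determined by X, yet over a symmetric range its linear correlation with X is zero.

import numpy as np

x = np.array([-2, -1, 0, 1, 2])
y = x ** 2                        # Y depends on X exactly, but not linearly
print(np.corrcoef(x, y)[0, 1])    # 0.0, i.e., r = 0 despite full dependence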
Probable Error: The probable error of the coefficient of correlation helps in interpreting its value and is given by
P.E.(r) = 0.6745 × (1 - r²) / √N
where r = coefficient of correlation and N = number of pairs of observations. The limits within which the population correlation may be expected to lie are r ± P.E.(r).

CONDITIONS FOR THE USE OF PROBABLE ERROR
1. The number of pairs of observations must be reasonably large.
2. The distribution should be a normal distribution, that is, a bell-shaped curve.
3. The items in the sample must have been selected at random, in an unbiased manner.
4. The statistical measure for which the probable error is computed must have been obtained from a sample.
Illustration 12.12: Test the significance of the correlation for the following values, based on the number of observations (i) 10 and (ii) 100, and r = +0.4 and +0.9.
(B.Com. Rajasthan)

Solution: If r < 6 P.E.(r), the correlation is not significant; if r > 6 P.E.(r), it is significant.

No. of observations    r      P.E.(r) = 0.6745(1 - r²)/√N       r / P.E.(r)    Result
        10            0.4     0.6745 × 0.84 / √10  = 0.18          2.22       Not significant
       100            0.4     0.6745 × 0.84 / √100 = 0.06          6.67       Significant
        10            0.9     0.6745 × 0.19 / √10  = 0.04          22.5       Highly significant
       100            0.9     0.6745 × 0.19 / √100 = 0.0128        70.3       Very highly significant

It will always be good to calculate the probable error before starting the interpretation of the coefficient of correlation.
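A sketch of the probable-error test in Python (standard library only); small differences from the hand-worked table arise only from rounding P.E. to two decimals there.

from math import sqrt

def probable_error(r, n):
    # P.E.(r) = 0.6745 (1 - r^2) / sqrt(N)
    return 0.6745 * (1 - r ** 2) / sqrt(n)

for n, r in [(10, 0.4), (100, 0.4), (10, 0.9), (100, 0.9)]:
    pe = probable_error(r, n)
    verdict = "significant" if r > 6 * pe else "not significant"
    print(n, r, round(pe, 4), round(r / pe, 2), verdict)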
Standard Error: Standard error is considered better than the probable error in modern times. The formula is:
S.E. of r = (1 - r²) / √N
(If 0.6745 is omitted from the P.E. formula, we get the S.E. of r.)

Coefficient of Determination: The square of the coefficient of correlation, r², is called the coefficient of determination; it gives the proportion of the variance in the relative (dependent) series that has been explained by the variance of the subject (independent) series. For example, if r = 0.9, then r² = 0.81, and we can conclude that 81% of the variance in the relative series has been explained by the variance of the subject series.

Coefficient of non-determination, K² = Unexplained variance / Total variance = 1 - Explained variance / Total variance = 1 - r².
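For instance, a short check of these measures for r = 0.9 (the sample size N = 100 is an assumed value used only for the standard error):

from math import sqrt

r, n = 0.9, 100
se = (1 - r ** 2) / sqrt(n)       # standard error of r
r2 = r ** 2                       # coefficient of determination
k2 = 1 - r ** 2                   # coefficient of non-determination
print(round(se, 3), round(r2, 2), round(k2, 2))   # 0.019 0.81 0.19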
Rank Correlation Coefficient: In 1904, Charles Edward Spearman, a British psychologist, developed a method of ascertaining the coefficient of correlation by ranks. This measure is useful in dealing with qualitative characteristics, such as intelligence, beauty, character, etc., which cannot be measured quantitatively as required for Pearson's coefficient of correlation; it is based on the ranks given to the observations. It can also be used when the data are irregular, or when extreme items are erratic or inaccurate, because the rank correlation coefficient is not based on the assumption of normality of the data.

Rank correlation is applicable only to individual observations. The result we get from this method is only an approximate one, because under the ranking method the original values are not taken into account. The formula for Spearman's rank correlation coefficient, which is denoted by P, is:

P = 1 - 6ΣD² / [N(N² - 1)]    or    P = 1 - 6ΣD² / (N³ - N)

where P = rank coefficient of correlation, ΣD² = sum of the squares of the differences of the two ranks, and N = number of paired observations.
Like Karl Pearson's coefficient of correlation, the value of P lies between +1 and -1. When P = +1, there is complete agreement in the order of the ranks and the ranks are in the same direction. When P = -1, there is complete disagreement in the order of the ranks and they are in opposite directions. We can see this in the following examples.
We may come across two types of problems:
(a) Where ranks are given
(b) Where ranks are not given
(a) Where ranks are given: When the actual ranks are given, take the differences of the two ranks (D), square them to obtain ΣD², and apply the formula.

Illustration 12.13: The ranks obtained by 10 students in Mathematics and in another subject are given below. Is the performance in the two subjects related?
(B.Com. Bombay)
Solution: Calculation of Rank Correlation
Taking the differences of the two ranks (D) for each student and squaring them gives ΣD² = 40, with N = 10.

P = 1 - 6ΣD² / [N(N² - 1)]
  = 1 - (6 × 40) / [10(100 - 1)]
  = 1 - 240 / 990
  = 1 - 0.24
  = +0.76.
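A sketch of the same formula in Python; the two rank lists below are hypothetical stand-ins, and the ranks of the ten students can be substituted directly:

ranks_subject_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # hypothetical ranks
ranks_subject_2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]    # hypothetical ranks

n = len(ranks_subject_1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_subject_1, ranks_subject_2))
p = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(round(p, 2))    # 0.94 for these hypothetical ranks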
(b) Where ranks are not given: When no ranks are given, but actual data are given, we must first assign ranks. We can give ranks by taking the highest value as 1, the next highest as 2, and so on (or, equally, the lowest value as 1), following the same procedure for both the variables.

Illustration 12.14: A random sample of 5 college students is selected and their grades in two subjects are ranked. Here ΣD² = 4 and N = 5.

P = 1 - 6ΣD² / [N(N² - 1)]
  = 1 - (6 × 4) / [5(25 - 1)]
  = 1 - 24 / 120
  = 1 - 0.2
  = +0.8

The rank correlation is +0.8.
Equal or repeated ranks: When two or more items have equal values, it is difficult to give them ranks. In that case each item is given the average of the ranks it would have received had the values not been equal. For example, if two items are ranked equal at the seventh place, each is given the rank (7 + 8)/2 = 7.5, which is the common rank; if three items are ranked equal at the seventh place, each is given the rank (7 + 8 + 9)/3 = 8, and the next item receives the rank 10. When ranks are repeated, a slightly different formula is used: a correction of (m³ - m)/12 is added to ΣD² for each group of m tied items, as shown below.

Illustration 12.15: Calculate the rank correlation coefficient for the following two variables.

Solution: First we have to assign ranks to the two variables.
First
we have to assign ranks Correlation
Calculation of Rank
D
DR ()-R ()
Rank ( )
Rank () Y 2.5 6.25
5.5 0.25
8 13 0.5
5.5 9.00
40 13 3
10 2.25
24 -1.5
2. 4 16.00
6 6
4.00
16 3 15 2
1.00
4
24 10 9 1 1.00
20
4 0.5 0.25
9
3 2.5 1.00
6
19 2D =41
In the X series, the value 16 is repeated 3 times, hence m = 3; in the Y series, 13 and 6 are each repeated twice, hence m = 2 for each. Therefore, the formula is:

P = 1 - 6[ΣD² + (m³ - m)/12 + (m³ - m)/12 + (m³ - m)/12] / (N³ - N)
  = 1 - 6[41 + (3³ - 3)/12 + (2³ - 2)/12 + (2³ - 2)/12] / (10³ - 10)
  = 1 - 6(41 + 2 + 0.5 + 0.5) / 990
  = 1 - 264 / 990
  = 1 - 0.267
  = +0.733
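A sketch of the tie-corrected calculation in code; the data lists are hypothetical placeholders chosen only to show the mechanics (average ranks for ties, plus one (m³ - m)/12 correction per tied group):

from collections import Counter

def average_ranks(values):
    # Rank the values (largest value gets rank 1); tied values share
    # the average of the ranks they would otherwise occupy.
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def rank_correlation(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # one correction term (m^3 - m)/12 for every group of m tied values
    correction = sum((m ** 3 - m) / 12
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Hypothetical data with ties: 16 occurs twice in x, 7 occurs twice in y
x = [12, 16, 16, 9, 20, 25, 30, 8]
y = [7, 6, 7, 4, 9, 13, 11, 3]
print(round(rank_correlation(x, y), 3))   # 0.911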
Merits of Rank Correlation Coefficient
1. It is simple to understand and easy to calculate.
2. It is very useful in the case of data which are of a qualitative nature, like intelligence, honesty, beauty, efficiency, etc.
3. When only the ranks are given, no other method can be used.
4. When the actual data are given, this method can also be applied.
Demerits of Rank Correlation Coefficient
1. It cannot be used in the case of a bi-variate frequency distribution.
2. If the number of items is greater than, say, 30, the calculation becomes tedious and requires a lot of time. If we are given the ranks, however, we can apply this method even though the number of items exceeds 30.
Coefficient of Concurrent Deviations: This is a very simple method based only on the direction of change of the two variables. The steps are:
1. Find out the direction of change of the X variable compared with its preceding value: if it increases put '+', if it decreases put '-', and if it is equal put '0'. This column is denoted by Dx.
2. Find out the direction of change of the Y variable in the same way, following the above step. This column is denoted by Dy.
3. Multiply Dx by Dy and find out C, i.e., the number of positive products (the number of concurrent deviations).
4. Substitute the figures in the formula:
r = ±√(±(2C - N)/N)
If (2C - N) is negative, the minus sign inside the root makes the quantity positive so that the square root can be taken, but the ultimate result is negative, since we cannot take the square root of a negative quantity and the minus sign is therefore kept outside. If (2C - N) is positive, all the signs will be positive.
tlustration 12.16: Calculate the coefficient of concurrent deviations from the data given below:
Month: Jan. Feb. Mar. Apr. May June July Aug. Sept.
Supply: 160 164 172 182 166 170 178 192 186
Price: 292 280 260 234 266 254 230 190 200
(B.Com. Madras)
Solution:
r = ±√(±(2C - N)/N)
Here C = 0 and N = 8.
r = -√(-(2 × 0 - 8)/8)
  = -√(8/8)
  = -√1
  = -1.
Calculation of Coefficient of Concurrent Deviations

Month    Supply x    Direction of change     Price y    Direction of change     dx × dy
                     compared to previous               compared to previous
                     month (dx)                         month (dy)
Jan.       160                                 292
Feb.       164             +                   280              -                  -
Mar.       172             +                   260              -                  -
Apr.       182             +                   234              -                  -
May        166             -                   266              +                  -
June       170             +                   254              -                  -
July       178             +                   230              -                  -
Aug.       192             +                   190              -                  -
Sept.      186             -                   200              +                  -

C = number of positive products of dx × dy = 0.
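A sketch of the concurrent-deviation computation on the supply and price series above (standard library only; the sign convention follows the steps listed earlier):

from math import sqrt

supply = [160, 164, 172, 182, 166, 170, 178, 192, 186]
price  = [292, 280, 260, 234, 266, 254, 230, 190, 200]

def directions(series):
    # +1 if a value rose over the previous one, -1 if it fell, 0 if unchanged
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

dx, dy = directions(supply), directions(price)
n = len(dx)                                       # pairs of deviations compared
c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # concurrent deviations

inner = (2 * c - n) / n
r = sqrt(inner) if inner >= 0 else -sqrt(-inner)  # sign rule of the formula
print(c, n, r)                                    # 0 8 -1.0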
Merits
1. It is the easiest and the simplest method.
2. It is useful in the study of short time oscillations.
3. This method can be used when the number of items is very large, and we can get a quick idea of the degree of relationship.
Demerits
1. It gives equal weight to small and big changes.
2. It provides only a rough measure of the coefficient of correlation.
LAG AND LEAD IN CORRELATION
It is necessary to consider lag and lead in the field of economic series. One may find that there is some time gap before a cause and effect relationship between the variables is established, so that the variables do not show a simultaneous movement. For instance, the supply of raw materials may increase today, but it may not have an immediate effect on prices; it may take a few days for prices to adjust to the increased supply. Similarly, a boom in agricultural products may get reflected in industrial output only after a gap of time. This difference in the time period is known as lag.
CORRELATION OF TIME SERIES
"A time series
may be defined as a collection of
readings
economic variable or
composite of variables." A seriesbelonging
some to different time
in which one variable is
periods, df
time series. That is, historical time, is cald
data spread over a
series depicts two period of time constitute a time series. The te
ty es of fluctuations (1) long term and short
series, without any change in the (2) term. If we corelate
twO
series, the resulting coefficient of
long term and short term correlation will
include
values will be eliminated changes.
If we desire to
study correlation of short term change, the u
by the moving average method. The
1. Find trend value steps in brief:
by moving average method.
2.Subtract thé moving
average from the
by x for X series and y for Y series. given time series to get short term flucnuau
Dentk
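A minimal sketch of these steps, assuming numpy is available; the two monthly series are hypothetical and the window of 3 periods is an arbitrary choice:

import numpy as np

# Hypothetical monthly series (illustrative only)
series_x = np.array([112, 115, 120, 118, 125, 130, 128, 135, 140], dtype=float)
series_y = np.array([210, 214, 222, 219, 230, 238, 235, 244, 252], dtype=float)

window = 3
kernel = np.ones(window) / window

# Step 1: trend values by a centred 3-period moving average
trend_x = np.convolve(series_x, kernel, mode="valid")
trend_y = np.convolve(series_y, kernel, mode="valid")

# Step 2: short term fluctuations = original values minus the trend
x = series_x[1:-1] - trend_x
y = series_y[1:-1] - trend_y

# Step 3: coefficient of correlation between the fluctuations
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))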
The following example illustrates the above points.
Illustration 12.17: The following are the values of two time series; compute the coefficient of correlation between their short term fluctuations.

FORMULAE
1. Karl Pearson's coefficient of correlation (deviations taken from the actual means):
   r = Σxy / √(Σx² × Σy²), where x and y are deviations from the actual means of the two series.
2. When deviations are taken from assumed means:
   r = [Σdxdy - (Σdx)(Σdy)/N] / [√(Σdx² - (Σdx)²/N) × √(Σdy² - (Σdy)²/N)]
3. Bivariate frequency distribution:
   r = [Σfdxdy - (Σfdx)(Σfdy)/N] / [√(Σfdx² - (Σfdx)²/N) × √(Σfdy² - (Σfdy)²/N)]
4. When we take the actual values of X and Y:
   r = [NΣXY - (ΣX)(ΣY)] / [√(NΣX² - (ΣX)²) × √(NΣY² - (ΣY)²)]
5. Spearman's rank correlation coefficient:
   r = 1 - 6ΣD² / [N(N² - 1)]   or   r = 1 - 6ΣD² / (N³ - N)
6. Spearman's rank correlation coefficient with repeated ranks:
   r = 1 - 6[ΣD² + (m³ - m)/12 + (m³ - m)/12 + ...] / (N³ - N)
7. Concurrent deviation method:
   r = ±√(±(2C - N)/N)
   where C = number of concurrent deviations, i.e., the number of positive signs obtained on multiplying dx by dy, and N = number of pairs of deviations compared, i.e., one less than the number of observations.
8. Probable error:
   P.E.(r) = 0.6745 (1 - r²) / √N
   where r = coefficient of correlation and N = number of pairs of observations.
9. Standard error:
   S.E.(r) = (1 - r²) / √N
10. Coefficient of determination = r²; coefficient of non-determination = 1 - r².
QUESTIONS
Objective Type Questions
State whether the following statements are "True" or "False":
1. Correlation always signifies a cause and effect relationship between the variables.
2. The ...
3. The relationship between three or more variables ...