
CORRELATION

Correlation is a statistical concept which is widely used for determining the relationship between two variables. This document explains correlation in detail, with exercises.

Uploaded by

SHRIRAM
Copyright © All Rights Reserved

CHAPTER 12

CORRELATION

Introduction
In the previous chapter we studied the characteristics of only one variable, for example, marks, weights, heights, prices, ages, rainfalls, sales, etc. This type of statistical analysis is called univariate analysis. Sometimes it may happen that the values of two variables are inter-related, and we may be interested in finding out whether there is any relationship between them. The analysis of such data is called bivariate analysis. For example, the price of a commodity and its sale are so related that with an increase in the price, the quantity sold is bound to decrease, or with a decrease in the price of the product, the quantity sold is bound to increase. Therefore, we can conclude that there is some relationship between price and sale.

Thus correlation refers to the relationship of two or more variables. We can find some relationship between two variables; for example, between the height of a father and the height of his son, yield and rainfall, price and demand, wage and price index, height and weight, and so on. Correlation is the statistical analysis which measures and analyses the degree or extent to which two variables fluctuate with reference to each other. The word 'relationship' is important and indicates that there is some connection between the variables under observation. Correlation measures the closeness of the relationship between the variables. For instance, height and weight are correlated because both tend to increase or decrease together.

Definition of correlation
According to Ya Lun Chou, "Correlation analysis attempts to determine the degree of relationship between variables."
According to W. I. King, "Correlation means that between two series or groups of data there exists some causal connection."
According to L. R. Connor, "If two or more quantities vary in sympathy so that movements in one tend to be accompanied by corresponding movements in the other(s), then they are said to be correlated."
According to Croxton and Cowden, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."
According to A. M. Tuttle, "Correlation is an analysis of the covariation between two or more variables."
Thus, the association of any two variates is known as correlation: the relationship or inter-dependence of two sets of variables upon each other in such a way that changes in the value of one variable are in sympathy with the changes in the other. The coefficient of correlation is the numerical measurement showing the degree of correlation between two variables.
One variable is called the "subject" (independent) variable and the other the "relative" (dependent) variable. The relative variable may be measured in terms of the subject. For instance, rainfall is a cause which is reflected in agricultural production; but agricultural production cannot cause rainfall. Therefore, rainfall is independent and agricultural production is dependent.
SIGNIFICANCE OF STUDY OF CORRELATION
Correlation is useful in the physical and social sciences. In this book we study the uses of correlation in business and economics.
1. Correlation is very useful to economists to study the relationship between variables, like price and quantity demanded. To businessmen, it helps to estimate costs, sales, price and other related variables.
2. Some variables show some kind of relationship; correlation analysis helps in measuring the degree of relationship between variables like supply and demand, price and supply, income and expenditure, etc.
3. The relation between variables can be verified and tested for significance with the help of correlation analysis. The effect of correlation is to reduce the range of uncertainty of our prediction.
4. The coefficient of correlation is a relative measure, so we can compare the relationship between variables which are expressed in different units.
5. Sampling error can also be calculated.
6. Correlation is the basis for the concept of regression and the ratio of variation.
Correlation and causation
Correlation analysis deals with the association or co-variation between two or more variables and helps to determine the degree of relationship between them. But correlation does not indicate a cause-and-effect relationship between two variables; it explains only co-variation. A high degree of correlation between two variables may exist due to any one or a combination of the following reasons.
1. Pure chance: Especially in a small sample, the correlation may be due to pure chance. If we select a small sample from a bivariate distribution, it may show a high degree of correlation even though in the universe there is no relationship between the variables. A high degree of mathematical correlation can be obtained even when there is no relationship between the variables. For example, a comparison of the production of shoes with agricultural production, which have no relationship, may show a relationship; but if such a relationship is found, it may be only a chance coincidence, and such correlation is called nonsensical or spurious correlation. Another example is the relationship between the number of cars produced and the number of children born in a country. Here the covariation may be due to chance, and there is no logical basis for the relationship.
A measure of correlation may be arrived at on the basis of covariation, but it may be nonsensical or without meaning. That is, there may be correlation between two variables even when they do not operate in the same physical or social system, or when they have nothing to do with each other. Again, such correlation is known as spurious or nonsensical correlation.
2. Both variables are influenced by some other variables: A high degree of correlation between two variables may be due to some cause, or different causes, affecting each of these variables. For example, a high degree of correlation may exist between the yield per acre of paddy and of wheat due to the effect of rainfall and other factors like fertilizers used, favourable weather, etc. But neither of the two variables is the cause of the other. It is difficult to explain which is the cause and which is the effect; they may not have caused each other, but there is an outside influence.
3. Mutual dependence: In this case the variables affect each other. Which is the subject and which is the relative variable has to be judged from the circumstances. For example, in the production of jute and rainfall, rainfall is the subject and jute production is relative. The effect of rainfall is directly related to jute production.
Types of correlation: Correlation is classified into many types, but the important ones are:
1. Positive and negative
2. Simple and multiple
3. Partial and total
4. Linear and non-linear
1. Positive and negative correlation: Whether correlation is positive or negative depends upon the direction of change of the variables. If two variables tend to move together in the same direction, so that an increase in the value of one variable is accompanied by an increase in the value of the other, or a decrease in the value of one variable is accompanied by a decrease in the value of the other, then the correlation is called positive or direct correlation. Height and weight, rainfall and yield of crops, and price and supply are examples of positive correlation.
If two variables tend to move in opposite directions, so that an increase or decrease in the value of one variable is accompanied by a decrease or increase in the value of the other, then the correlation is called negative or inverse correlation. Price and demand, yield of crops and price, etc., are examples of negative correlation. Here, an increase in the values of the independent variable is associated with a decrease in the values of the dependent variable, or vice versa.
2. Simple and multiple: When we study the relationship between only two variables, it is described as simple correlation; for example, quantity of money and price level, demand and price, etc. In multiple correlation we study more than two variables simultaneously; for example, the relationship of price, demand and supply of a commodity.
3. Partial and total: The study of two variables excluding some other variables is called partial correlation. For example, we study price and demand, eliminating the supply side. In total correlation, all the facts are taken into account.
4. Linear and non-linear: If the ratio of change between two variables is uniform, there will be linear correlation between them. Consider the following data:
X: 5   10   15   20   …
Y: 4    6    8    …   12
The ratio of change between the variables is the same. If we plot these values on a graph, we get a straight line.
In curvilinear or non-linear correlation, the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. The graph of a non-linear relationship will form a curve. In the majority of cases we find a curvilinear relationship, which is a complicated one, so we generally assume that the relationship between the variables under study is linear. In the social sciences, linear correlation is rare, because the relationships are not as exact as in the natural sciences.

Methods of studying correlation: The different methods of finding out correlation between two variables are:
A. Graphic method
1. Scatter diagram or scattergram
2. Simple graph
B. Mathematical method
1. Karl Pearson's coefficient of correlation
2. Spearman's rank coefficient of correlation
3. Coefficient of concurrent deviation
4. Method of least squares
1. Scatter diagram method: This is the simplest method of finding out whether there is any relationship between two variables, by plotting the values on a chart known as a scatter diagram. In this method, the given data are plotted on graph paper in the form of dots, with the X variable on the horizontal axis and the Y variable on the vertical axis. The scatter or concentration of the various dots then shows the type of correlation.
Diagram 1: Perfect positive correlation (r = +1)
Diagram 2: Perfect negative correlation (r = -1)

If the plotted points form a straight line running from the lower left-hand corner to the upper right-hand corner, then there is perfect positive correlation (i.e., r = +1, Diagram 1). On the other hand, if the points are in a straight line having a falling trend from the upper left-hand corner to the lower right-hand corner, there is perfect negative or inverse correlation (i.e., r = -1, Diagram 2).
If the plotted points fall in a narrow band rising from the lower left-hand corner to the upper right-hand corner, there is a high degree of positive correlation between the variables (Diagram 3). If the plotted points fall in a narrow band from the upper left-hand corner to the lower right-hand corner, there is a high degree of negative correlation (Diagram 4). If the plotted points lie scattered all over the diagram, there is no correlation between the two variables (Diagram 5).
Diagram 3: High degree of positive correlation
Diagram 4: High degree of negative correlation
Diagram 5: No correlation (r = 0)
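As a rough, text-only stand-in for plotting dots on graph paper, the sketch below marks each (X, Y) pair with a `*` on a character grid, with X on the horizontal axis and Y on the vertical axis. The function name `scatter_rows` is hypothetical, it assumes small whole-number data (one column or row per unit), and repeated identical points collapse into a single mark. The data are those of Illustration 12.2, used here only as sample input.

```python
def scatter_rows(xs, ys, mark="*"):
    """Return the rows of a character scatter diagram; the top row is the largest Y."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width = x_max - x_min + 1
    rows = []
    for y in range(y_max, y_min - 1, -1):         # print from high Y down to low Y
        line = [" "] * width
        for x, yy in zip(xs, ys):
            if yy == y:
                line[x - x_min] = mark            # one dot per observation
        rows.append(f"{y:>3} |" + "".join(line))
    rows.append("    +" + "-" * width)            # horizontal (X) axis
    return rows

# Illustration 12.2 data (whole numbers, so each value maps to one grid cell)
X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
for row in scatter_rows(X, Y):
    print(row)
```

The dots rise from the lower left towards the upper right in a fairly narrow band, which the scatter diagram method would read as a high degree of positive correlation.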
Merits
1. The scatter diagram is a simple and attractive method of finding out the nature of correlation between two variables.
2. It is a non-mathematical method of studying correlation, and it is easy to understand.
3. We can get a rough idea at a glance whether the correlation is positive or negative.
4. It is not influenced by extreme items.
5. It is the first step in finding out the relationship between two variables.
Demerits
By this method we cannot get the exact degree of correlation between two variables; it gives only a rough idea.
2. Simple Graph: The values of the two variables are plotted on graph paper, giving two curves, one for the X variable and another for the Y variable. These two curves reveal the direction and closeness of the variables, and also whether or not the variables are related. If both curves move in the same direction, i.e., parallel to each other, either upward or downward, the correlation is said to be positive. On the other hand, if they move in opposite directions, the correlation is said to be negative.
Illustration 12.1: Draw a correlation graph from the following data:

Period:      Jan.  Feb.  Mar.  April  May  June
Variable 1:  15    18    22    20     25   20
Variable 2:  30    35    43    41     51   40

Correlation Graph: Variable 1 and Variable 2 plotted month by month (Jan. to June)
The above method is used in the case of time series. This method also does not reveal the exact degree of correlation between the variables.
Coefficient of Correlation: Correlation analysis deals with the association or co-variation between two or more variables, and the measures of correlation are the statistical techniques used for analysing that relationship. Correlation is only proof of association, not of a causal relationship. It is not always possible to obtain the exact value of one variable when the value of the other variable is known.
Karl Pearson's Coefficient of Correlation: Karl Pearson, a great biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. Karl Pearson's method is the most widely used method in practice and is known as the Pearsonian coefficient of correlation. It is denoted by the symbol r. The formulas for calculating Pearsonian r are:
(1) $$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y} $$
(2) $$ r = \frac{\sum xy}{N \, \sigma_x \, \sigma_y} $$
(3) $$ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} $$
where x = (X - X̄), y = (Y - Ȳ), σx = standard deviation of series x, σy = standard deviation of series y and N = number of pairs.
When the deviations of items are taken from the actual means, we can apply any one of these formulas, but the simplest is the third one.
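The three formulas above are algebraically the same quantity, and a quick numerical check makes that concrete. The sketch below assumes the standard deviations are population standard deviations (dividing by N), which is what formula (2) requires; the data are those of Illustration 12.2, whose actual means (10 and 9) happen to be whole numbers.

```python
import math

X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
N = len(X)
mean_x, mean_y = sum(X) / N, sum(Y) / N
x = [v - mean_x for v in X]               # x = X - mean of X
y = [v - mean_y for v in Y]               # y = Y - mean of Y

sum_xy = sum(a * b for a, b in zip(x, y))  # Σxy
sum_x2 = sum(a * a for a in x)             # Σx²
sum_y2 = sum(b * b for b in y)             # Σy²
sigma_x = math.sqrt(sum_x2 / N)            # population standard deviations
sigma_y = math.sqrt(sum_y2 / N)

r1 = (sum_xy / N) / (sigma_x * sigma_y)    # formula (1): covariance / (σx σy)
r2 = sum_xy / (N * sigma_x * sigma_y)      # formula (2)
r3 = sum_xy / math.sqrt(sum_x2 * sum_y2)   # formula (3)
print(round(r1, 4), round(r2, 4), round(r3, 4))  # all three agree, about 0.9485
```

Formula (3) avoids computing σx and σy separately, which is why it is the convenient one for hand calculation.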

PROPERTIES OF COEFFICIENT OF CORRELATION

The measure of correlation, called the coefficient of correlation, summarises in one figure the direction and the degree of correlation. The value of the coefficient of correlation always lies between -1 and +1.
When r = +1, there is perfect positive correlation between the variables.
When r = -1, there is perfect negative correlation between the variables.
When r = 0, there is no linear relationship between the variables.
Theoretically we get values which lie between +1 and -1, but normally the value lies somewhere in between. For example, r = +0.5 means that there is positive correlation, because r is positive, and the magnitude of correlation is 0.5; r = -0.5 means that the correlation is negative and the magnitude of correlation is 0.5. Thus the coefficient describes both the magnitude and the direction of correlation.
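A small check of these properties, using hypothetical data that are not from the text: points on a rising straight line give r = +1, points on a falling straight line give r = -1, and scattered rising points give a value in between. The last line also illustrates that r is a relative measure: re-expressing X in different units (here multiplying by 2.54, as if converting inches to centimetres) leaves r unchanged. The helper name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means (formula 3)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                             # hypothetical values
print(pearson_r(xs, [2 * v + 3 for v in xs]))    # rising straight line: +1
print(pearson_r(xs, [10 - 3 * v for v in xs]))   # falling straight line: -1
print(pearson_r(xs, [2, 1, 4, 3, 5]))            # rising with scatter: 0.8
print(pearson_r([2.54 * v for v in xs], [2, 1, 4, 3, 5]))  # new units, same r
```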
The third formula given above, that is, $$ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}, $$ is easy to apply, because it is not necessary to calculate the standard deviations of the X and Y series separately.
Steps:
1. Find the means of the two series, i.e., X and Y.
2. Take the deviations of the two series from their means and denote them by x and y.
3. Square the deviations and get the totals of the respective squares of deviations of x and y, i.e., Σx² and Σy².
4. Multiply the deviations of x and y and get the total, Σxy. (This total divided by N is the covariance.)
5. Substitute the values of Σxy, Σx² and Σy² in the formula.
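The five steps above can be followed literally in code. As sample input, the sketch reuses the data of Illustration 12.1, whose means (20 and 40) are whole numbers; the text only graphs these data, so the value of r computed here is an illustration of the steps, not a result stated in the source. The function name `correlation_steps` is hypothetical.

```python
import math

def correlation_steps(xs, ys):
    """Pearson's r by the actual-mean method, keeping each step's totals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n         # Step 1: means
    x = [v - mean_x for v in xs]                       # Step 2: deviations from means
    y = [v - mean_y for v in ys]
    sum_x2 = sum(d * d for d in x)                     # Step 3: totals of squares
    sum_y2 = sum(d * d for d in y)
    sum_xy = sum(a * b for a, b in zip(x, y))          # Step 4: total of products
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)            # Step 5: substitute
    return {"mean_x": mean_x, "mean_y": mean_y, "sum_x2": sum_x2,
            "sum_y2": sum_y2, "sum_xy": sum_xy, "r": r}

# Illustration 12.1 data (Jan. to June), used only as sample input
variable_1 = [15, 18, 22, 20, 25, 20]
variable_2 = [30, 35, 43, 41, 51, 40]
print(correlation_steps(variable_1, variable_2))
```

The totals come out as Σxy = 121, Σx² = 58 and Σy² = 256, giving r of about 0.993, which agrees with the two curves in the correlation graph moving together.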
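Illustration 12.2: Calculate the coefficient of correlation from the following data:
X: 12  9  8  10  11  13  7
Y: 14  8  6   9  11  12  3
(B.Com. Baroda, Madurai, Madras)
Solution: N.B.: In both series the items are small in number. Therefore the correlation coefficient can also be calculated without taking deviations from the actual means or an assumed mean.
Computation of Coefficient of Correlation
X  | Y  | X²  | Y²  | XY
12 | 14 | 144 | 196 | 168
9  | 8  | 81  | 64  | 72
8  | 6  | 64  | 36  | 48
10 | 9  | 100 | 81  | 90
11 | 11 | 121 | 121 | 121
13 | 12 | 169 | 144 | 156
7  | 3  | 49  | 9   | 21
ΣX = 70 | ΣY = 63 | ΣX² = 728 | ΣY² = 651 | ΣXY = 676

$$ r = \frac{(\sum XY \times N) - (\sum X \times \sum Y)}{\sqrt{\sum X^2 \times N - (\sum X)^2}\,\sqrt{\sum Y^2 \times N - (\sum Y)^2}} $$
ΣXY = 676; ΣX = 70; ΣY = 63; ΣX² = 728; ΣY² = 651; N = 7
$$ r = \frac{(676 \times 7) - (70 \times 63)}{\sqrt{728 \times 7 - (70)^2}\,\sqrt{651 \times 7 - (63)^2}} = \frac{4732 - 4410}{\sqrt{5096 - 4900}\,\sqrt{4557 - 3969}} = \frac{322}{\sqrt{196 \times 588}} = \frac{322}{339.48} = +0.95 $$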
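The raw-sum calculation of Illustration 12.2 can be reproduced directly from the original X and Y values, with no deviations at all. The sketch below recomputes every total in the table and the intermediate figures 322 and 339.48.

```python
import math

X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
N = len(X)

sum_x, sum_y = sum(X), sum(Y)                     # ΣX = 70, ΣY = 63
sum_x2 = sum(v * v for v in X)                    # ΣX² = 728
sum_y2 = sum(v * v for v in Y)                    # ΣY² = 651
sum_xy = sum(a * b for a, b in zip(X, Y))         # ΣXY = 676

numerator = N * sum_xy - sum_x * sum_y            # 4732 - 4410 = 322
denominator = math.sqrt((N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2))  # sqrt(196 x 588)
r = numerator / denominator
print(numerator, round(denominator, 2), round(r, 2))  # 322 339.48 0.95
```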

Illustration 12.3: Find if there is any significant correlation between the heights and weights given below:
Height in inches: 57   59   62   63   64   65   55   58
Weight in lbs.:  113  117  126  126  130  129  111  116
(B.Com.)
Solution: Coefficient of Correlation
…
r = +0.847.

When deviations are taken from an assumed mean: When the actual mean is not a whole number but a fraction, or when the series is large, calculation by the actual mean method involves a lot of time. To avoid such tedious calculation, we can use the assumed mean method. The formula is:
$$ r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}} $$
where dx = deviations of the items of the x series from an assumed mean, i.e., dx = (X - A);
dy = deviations of the items of the y series from an assumed mean, i.e., dy = (Y - A), A here being the assumed mean of the y series;
N = number of items;
Σdxdy = total of the products of the deviations of the x and y series from their assumed means;
Σdx² = total of the squares of the deviations of the x series from an assumed mean;
Σdy² = total of the squares of the deviations of the y series from an assumed mean;
Σdx = total of the deviations of the x series from the assumed mean;
Σdy = total of the deviations of the y series from the assumed mean.
Steps:
1. Take the deviations of the x series from an assumed mean, denote these deviations by the symbol dx, and get the total, i.e., Σdx.
2. Similarly, take the deviations of the y series from an assumed mean, denote these deviations by dy, and get the total, i.e., Σdy.
3. Square dx and get the total Σdx²; square dy and get the total Σdy².
4. Multiply dx by dy and get the total, i.e., Σdxdy.
5. Substitute the values in the above formula.
Illustration 12.5: Find the coefficient of correlation in the following case:
Height of father (in inches): 65  66  67  67  68  69  71  73
Height of son (in inches):    67  68  64  68  72  70  69  70
(B.Com. Andhra, M.A. Allahabad)
Solution: Computation of Coefficient of Correlation
Height of father X (inches) | dx = X - 67 | dx² | Height of son Y (inches) | dy = Y - 68 | dy² | dxdy
65 | -2 | 4  | 67 | -1 | 1  | 2
66 | -1 | 1  | 68 | 0  | 0  | 0
67 | 0  | 0  | 64 | -4 | 16 | 0
67 | 0  | 0  | 68 | 0  | 0  | 0
68 | 1  | 1  | 72 | 4  | 16 | 4
69 | 2  | 4  | 70 | 2  | 4  | 4
71 | 4  | 16 | 69 | 1  | 1  | 4
73 | 6  | 36 | 70 | 2  | 4  | 12
ΣX = 546 | Σdx = 10 | Σdx² = 62 | ΣY = 548 | Σdy = 4 | Σdy² = 42 | Σdxdy = 26

Coefficient of Correlation:
$$ r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}} = \frac{\sum d_x d_y \times N - \sum d_x \times \sum d_y}{\sqrt{\sum d_x^2 \times N - (\sum d_x)^2}\,\sqrt{\sum d_y^2 \times N - (\sum d_y)^2}} $$
Σdxdy = 26, Σdx = 10, Σdy = 4, Σdx² = 62, Σdy² = 42, N = 8
$$ r = \frac{26 \times 8 - 10 \times 4}{\sqrt{(62 \times 8 - 10^2)(42 \times 8 - 4^2)}} = \frac{208 - 40}{\sqrt{(496 - 100)(336 - 16)}} = \frac{168}{\sqrt{396 \times 320}} = \frac{168}{\sqrt{126720}} = \frac{168}{355.98} = +0.472 $$
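The totals in the table of Illustration 12.5 can be checked mechanically. The sketch below uses the same assumed means (67 for fathers, 68 for sons) and the multiplied-through form of the formula used in the solution.

```python
import math

father = [65, 66, 67, 67, 68, 69, 71, 73]   # X series (inches)
son = [67, 68, 64, 68, 72, 70, 69, 70]      # Y series (inches)
A_X, A_Y = 67, 68                           # assumed means used in the table

dx = [v - A_X for v in father]
dy = [v - A_Y for v in son]
N = len(dx)
totals = {
    "sum_dx": sum(dx), "sum_dy": sum(dy),
    "sum_dx2": sum(d * d for d in dx), "sum_dy2": sum(d * d for d in dy),
    "sum_dxdy": sum(a * b for a, b in zip(dx, dy)),
}
num = totals["sum_dxdy"] * N - totals["sum_dx"] * totals["sum_dy"]     # 208 - 40
den = math.sqrt((totals["sum_dx2"] * N - totals["sum_dx"] ** 2)
                * (totals["sum_dy2"] * N - totals["sum_dy"] ** 2))      # sqrt(396 x 320)
r = num / den
print(totals, round(r, 3))
```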
Illustration 12.6: Find a suitable coefficient of correlation for the following data:
Fertiliser used (tonnes): 15  18  20  24   30   35   40   50
Productivity (tonnes):    85  93  95  105  120  130  150  160
(B.Com., Madras)
Solution: Computation of Coefficient of Correlation
Fertiliser used X | dx = X - 29 | dx² | Productivity Y | dy = Y - 119 | dy² | dxdy
15 | -14 | 196 | 85  | -34 | 1156 | 476
18 | -11 | 121 | 93  | -26 | 676  | 286
20 | -9  | 81  | 95  | -24 | 576  | 216
24 | -5  | 25  | 105 | -14 | 196  | 70
30 | 1   | 1   | 120 | 1   | 1    | 1
35 | 6   | 36  | 130 | 11  | 121  | 66
40 | 11  | 121 | 150 | 31  | 961  | 341
50 | 21  | 441 | 160 | 41  | 1681 | 861
Σdx = 0 | Σdx² = 1022 | Σdy = -14 | Σdy² = 5368 | Σdxdy = 2317

Coefficient of Correlation:
$$ r = \frac{\sum d_x d_y - \dfrac{\sum d_x \times \sum d_y}{N}}{\sqrt{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}\,\sqrt{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}} $$
Σdxdy = 2317, Σdx = 0, Σdy = -14, N = 8, Σdx² = 1022, Σdy² = 5368
$$ r = \frac{2317 - \dfrac{0 \times (-14)}{8}}{\sqrt{1022 - \dfrac{0^2}{8}}\,\sqrt{5368 - \dfrac{(-14)^2}{8}}} = \frac{2317}{\sqrt{1022 \times 5343.5}} = \frac{2317}{2336.89} = 0.99 $$
Hence there is a high degree of correlation between fertiliser used and productivity.
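The deviation totals of Illustration 12.6 and the final value of r can be recomputed from the raw data with the assumed means 29 and 119 used in the table. This is a checking sketch only; any other assumed means would give the same r.

```python
import math

fertiliser = [15, 18, 20, 24, 30, 35, 40, 50]          # X (tonnes)
productivity = [85, 93, 95, 105, 120, 130, 150, 160]   # Y (tonnes)
dx = [v - 29 for v in fertiliser]                       # assumed mean of X = 29
dy = [v - 119 for v in productivity]                    # assumed mean of Y = 119
N = len(dx)

s_dx, s_dy = sum(dx), sum(dy)
s_dx2 = sum(d * d for d in dx)
s_dy2 = sum(d * d for d in dy)
s_dxdy = sum(a * b for a, b in zip(dx, dy))
r = (s_dxdy - s_dx * s_dy / N) / math.sqrt((s_dx2 - s_dx ** 2 / N) * (s_dy2 - s_dy ** 2 / N))
print(s_dx, s_dy, s_dx2, s_dy2, s_dxdy, round(r, 2))
```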
Illustration 12.7: Calculate the coefficient of correlation between age of cars and annual maintenance cost and comment:
…
$$ r = \frac{8800 + 272}{\sqrt{(9200 - 64)(15400 - 1156)}} = \frac{9072}{\sqrt{9136 \times 14244}} = \frac{9072}{\sqrt{130133184}} = \frac{9072}{11407.6} = +0.7953 $$
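The data table of Illustration 12.7 is not reproduced here, so only its closing arithmetic can be checked. The sketch below takes the intermediate figures exactly as printed and confirms that they lead to r = +0.7953.

```python
import math

# Intermediate figures as printed in the last steps of Illustration 12.7
num = 8800 + 272                                  # numerator
den = math.sqrt((9200 - 64) * (15400 - 1156))     # sqrt(9136 x 14244)
r = num / den
print(num, round(den, 1), round(r, 4))
```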

Assumptions of Pearsonian Coefficient

There are some assumptions of Karl Pearson's coefficient of correlation. They are as follows:
1. Linear relationship: If the two variables are plotted on a scatter diagram, it is assumed that the plotted points will form a straight line, so there is a linear relationship between the variables.
2. Normality: The correlated variables are affected by a large number of independent causes, which form a normal distribution. Variables like quantity of money, age, weight, height, price, demand, etc., are affected by such forces, so that a normal distribution is formed.
3. Causal relationship: Correlation is meaningful only if there is a cause-and-effect relationship between the forces affecting the distribution of items in the two series. It is meaningless if there is no such relationship. There is no relationship between rice and weight, because the factors that affect these variables are not common.
4. Proper grouping: The correlation analysis will be better if there is an equal number of pairs.
5. Error of measurement: If the error of measurement is reduced to the minimum, the coefficient of correlation is more reliable.

Merits of Coefficient of Correlation
1. Karl Pearson's method is the most popular mathematical method used for measuring the degree of relationship.
2. The coefficient of correlation summarises in one figure the degree of correlation and its direction.
3. Moreover, we can estimate the value of the dependent variable from known values of the independent variable.

Demerits of Coefficient of Correlation
1. A linear relationship between the variables is always assumed, whether that assumption is correct or not.
2. The calculation of the coefficient of correlation is time-consuming.
3. Like the standard deviation, the value of the coefficient is unduly affected by extreme items.
4. The coefficient of correlation, which lies between +1 and -1, needs very careful interpretation; otherwise the correlation will be misinterpreted. Careless interpretation will be fallacious.

Mathematical Properties of the Coefficient of Correlation

1. The coefficient of correlation lies between -1 and +1. Symbolically,
$$ -1 \le r \le +1 $$
2. The converse of the theorem that independent variables are uncorrelated is not true; that is, uncorrelated variables (r = 0) need not necessarily be independent. Uncorrelation between the variables X and Y simply implies the absence of a linear relationship between them. They may, however, be related in some other form.
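A standard counter-example makes the last point concrete. In the hypothetical data below, Y is completely determined by X (Y = X²), yet Pearson's r is exactly zero, because the relationship is a curve rather than a straight line. The helper name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

X = [-3, -2, -1, 0, 1, 2, 3]     # symmetric about zero (hypothetical data)
Y = [v * v for v in X]           # Y depends exactly on X, but not linearly
print(pearson_r(X, Y))           # 0.0: uncorrelated, yet clearly dependent
```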

Interpreting the Coefficient of Correlation

We must take utmost care while interpreting the value of the coefficient of correlation; otherwise we arrive at a fallacious conclusion. The interpretation of the coefficient of correlation depends very much on experience. However, the following general rules are helpful in interpreting the value of r.
1. When r = +1, there is a perfect positive relationship between the variables.
2. When r = -1, there is a perfect negative relationship between the variables.
3. When r = 0, there is no linear relationship between the variables, i.e., the variables are not correlated.
4. If the coefficient of correlation is near +1 or -1, there is a high degree of correlation (positive or negative) between the two variables. If r is near 0, e.g., 0.1, -0.1 or 0.2, there is little correlation.
5. A coefficient of correlation shows only the degree of correlation between two series, not the causes of the relationship.

Coefficient of Correlation and Probable Error

To find out the reliability or significance of the value of the Pearsonian coefficient of correlation, the probable error is used. According to Horace Secrist, "The probable error of the coefficient of correlation is an amount which, if added to or subtracted from the mean correlation coefficient, produces amounts within which the chances are even that a coefficient of correlation from a series selected at random will fall." The formula for calculating the probable error is:
$$ \text{P.E.}(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$
where 0.6745 is a constant,
r = Pearsonian coefficient of correlation,
N = number of pairs.
The limits for the population correlation coefficient are:
$$ \rho = r \pm \text{P.E.}(r) $$
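The probable error formula and the limits r ± P.E.(r) translate directly into two lines of code. As a hypothetical usage, the sketch applies them to the result of Illustration 12.2 (r = 0.9485 from N = 7 pairs); the function name `probable_error` is an illustrative choice.

```python
import math

def probable_error(r, n):
    """P.E. of r = 0.6745 * (1 - r**2) / sqrt(n), for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r, n = 0.9485, 7                       # hypothetical usage: Illustration 12.2
pe = probable_error(r, n)
limits = (r - pe, r + pe)              # limits for the population correlation
print(round(pe, 4), tuple(round(v, 4) for v in limits))
```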

Functions of Probable Error

1. If the value of r is less than the probable error, the value of r is not at all significant.
2. If the value of r is more than six times the probable error (r > 6 P.E.), the value of r is significant.
3. If r is less than 0.3, the correlation should not be considered at all.
4. If the probable error is small, the correlation definitely exists.

CONDITIONS FOR THE USE OF PROBABLE ERROR
1. The number of items should be large enough. When the number of pairs of observations is small, the probable error may lead to fallacious conclusions.
2. The distribution should be normal, that is, a bell-shaped curve.
3. The items in the sample must have been selected at random, in an unbiased manner.
4. The statistical measure for which the probable error is computed must have been obtained from a sample.
Illustration 12.12: Test the significance of correlation for the following values, based on the number of observations (i) 10 and (ii) 100, and r = +0.4 and +0.9.
(B.Com. Rajasthan)
Solution: r < 6 P.E.(r): not significant; r > 6 P.E.(r): significant.
No. of observations | r | P.E.(r) | r / P.E.(r) | Significance
10 | 0.4 | 0.6745 × (1 - 0.4²)/√10 = 0.18 | 0.4/0.18 = 2.22 | Not significant
100 | 0.4 | 0.6745 × (1 - 0.4²)/√100 = 0.06 | 0.4/0.06 = 6.67 | Significant
10 | 0.9 | 0.6745 × (1 - 0.9²)/√10 = 0.04 | 0.9/0.04 = 22.5 | Highly significant
100 | 0.9 | 0.6745 × (1 - 0.9²)/√100 = 0.0128 | 0.9/0.0128 = 70.3 | Very highly significant
It is always good to calculate the probable error before starting the interpretation of the coefficient of correlation.
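The table of Illustration 12.12 can be regenerated with the "r > 6 P.E." rule. The sketch keeps full precision rather than rounding P.E. first, so the ratios differ slightly from the table (for example 2.23 instead of 2.22), but every significant/not-significant verdict is the same.

```python
import math

def probable_error(r, n):
    """P.E. of r = 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

rows = []
for r in (0.4, 0.9):
    for n in (10, 100):
        pe = probable_error(r, n)
        verdict = "significant" if r > 6 * pe else "not significant"   # r > 6 P.E. rule
        rows.append((n, r, round(pe, 4), round(r / pe, 2), verdict))
for row in rows:
    print(row)
```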
Standard Error: The standard error is considered better than the probable error in modern times. The formula is:
$$ \text{S.E.}(r) = \frac{1 - r^2}{\sqrt{N}} $$
(If 0.6745 is omitted from the formula for P.E., we get the S.E. of r.)

Coefficient of determination: The nature and the extent of the relationship between two variables are indicated by the coefficient of correlation. An effective way of interpreting the coefficient of correlation is the coefficient of determination. The coefficient of determination is defined as the ratio of the explained variance to the total variance; multiplied by 100, it gives the percentage of the variance in y (x) which is associated with the variance in x (y).
$$ \text{Coefficient of determination} = r^2 = \frac{\text{Explained variance}}{\text{Total variance}} $$
For example, if r = 0.9, then r² = 0.81; multiplied by 100, this becomes 81%, and we conclude that 81% of the variance in the relative series has been explained by the subject series, and the remaining 19% of the variance is due to other factors. When r = 0.9, we cannot conclude that 90% of the variance of the dependent series is due to the variance of the independent series.


According to Tuttle, "The coefficient of correlation has been grossly overrated and is used entirely too much. Its square, the coefficient of determination, is a much more useful measure of the linear covariation of two variables. The reader should develop the habit of squaring every correlation coefficient he finds cited or stated before coming to any conclusion about the extent of the linear relationship between the correlated variables."
Coefficient of non-determination: The coefficient of non-determination is the ratio of the unexplained variation to the total variation. It is denoted by K².
$$ K^2 = \frac{\text{Unexplained variance}}{\text{Total variance}} = 1 - \frac{\text{Explained variance}}{\text{Total variance}} = 1 - r^2 $$
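The r = 0.9 example, together with the coefficient of non-determination K² = 1 - r², reduces to a single function. The name `determination` is an illustrative choice; it returns both coefficients as percentages of the total variance.

```python
def determination(r):
    """Return (r**2, 1 - r**2) as percentages: explained and unexplained variance."""
    explained = r ** 2                  # coefficient of determination
    unexplained = 1 - explained         # coefficient of non-determination, K²
    return round(100 * explained, 2), round(100 * unexplained, 2)

print(determination(0.9))   # (81.0, 19.0): 81% explained, 19% due to other factors
```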
Rank Correlation Coefficient: In 1904, Charles Edward Spearman, a British psychologist, found out the method of ascertaining the coefficient of correlation by ranks. This method is based on ranks. This measure is useful in dealing with qualitative characteristics, such as intelligence, beauty, character, etc., which cannot be measured quantitatively as in the case of Pearson's coefficient of correlation; instead, it is based on the ranks given to the observations. It can be used when the data are irregular or the extreme items are erratic or inaccurate, because the rank correlation coefficient is not based on the assumption of normality of data.
Rank correlation is applicable only to individual observations. The result we get from this method is only an approximate one, because under the ranking method the original values are not taken into account. The formula for Spearman's rank correlation, which is denoted by ρ, is:
$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \qquad \text{or} \qquad \rho = 1 - \frac{6\sum D^2}{N^3 - N} $$
where ρ = rank coefficient of correlation,
ΣD² = sum of the squares of the differences of two ranks,
N = number of paired observations.
Like Karl Pearson's coefficient of correlation, the value of ρ lies between +1 and -1. If ρ = +1, there is complete agreement in the order of the ranks, and the ranks run in the same direction. If ρ = -1, there is complete disagreement in the order of the ranks, and they run in opposite directions. We can see this in the following examples.
We may come across two types of problems:
(a) where ranks are given;
(b) where ranks are not given.
(a) Where ranks are given: When the actual ranks are given, the steps followed are:
1. Compute the difference of the two ranks (R1 and R2) and denote it by D.
2. Square D and get ΣD².
3. Substitute the figures in the formula.
Illustration 12.13: Following are the ranks obtained by 10 students in two subjects, Statistics and Mathematics. To what extent is the knowledge of the students in the two subjects related?
Statistics:  1  2  3  4  5  6  7  8   9  10
Mathematics: 2  4  1  5  3  9  7  10  6  8
(B.Com. Bombay)
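Solution: Calculation of Rank Correlation
Rank in Statistics (x) | Rank in Mathematics (y) | D = x - y | D²
1  | 2  | -1 | 1
2  | 4  | -2 | 4
3  | 1  | +2 | 4
4  | 5  | -1 | 1
5  | 3  | +2 | 4
6  | 9  | -3 | 9
7  | 7  | 0  | 0
8  | 10 | -2 | 4
9  | 6  | +3 | 9
10 | 8  | +2 | 4
ΣD² = 40
$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 40}{10(100 - 1)} = 1 - \frac{240}{990} = 1 - 0.24 = +0.76 $$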
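When the ranks are given, the whole calculation of Illustration 12.13 is a sum of squared rank differences followed by the formula. The function name `spearman_rho` is hypothetical, and it assumes there are no tied ranks.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho = 1 - 6*ΣD² / (N³ - N), for ranks without ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n ** 3 - n)

statistics_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mathematics_rank = [2, 4, 1, 5, 3, 9, 7, 10, 6, 8]
sum_d2, rho = spearman_rho(statistics_rank, mathematics_rank)
print(sum_d2, round(rho, 2))   # 40 0.76
```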

(b) Where ranks are not given: When no ranks are given but the actual data are given, we must first assign ranks. We can give ranks by taking the highest value as 1 (or the lowest value as 1), the next highest (lowest) as 2, and so on, following the same procedure for both variables.
Illustration 12.14: A random sample of 5 college students is selected and their grades in Mathematics and Statistics are found to be:
Student:     1   2   3   4   5
Mathematics: 85  60  73  40  90
Statistics:  93  75  65  50  80
Calculate Spearman's rank correlation coefficient.
Solution:
Marks in Mathematics (X) | Rank x | Marks in Statistics (Y) | Rank y | D = x - y | D²
85 | 2 | 93 | 1 | 1  | 1
60 | 4 | 75 | 3 | 1  | 1
73 | 3 | 65 | 4 | -1 | 1
40 | 5 | 50 | 5 | 0  | 0
90 | 1 | 80 | 2 | -1 | 1
N = 5, ΣD² = 4
$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 4}{5(25 - 1)} = 1 - \frac{24}{5 \times 24} = 1 - \frac{1}{5} = 1 - 0.2 = +0.8 $$
The rank correlation is +0.8.
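When only marks are given, ranks have to be assigned first. The sketch below gives rank 1 to the highest value, as in Illustration 12.14, and then applies Spearman's formula. The helper `ranks_descending` is hypothetical and assumes all values within a series are distinct; tied values need average ranks instead.

```python
def ranks_descending(values):
    """Rank 1 = highest value. Assumes all values are distinct (no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

maths = [85, 60, 73, 40, 90]
stats = [93, 75, 65, 50, 80]
rank_x, rank_y = ranks_descending(maths), ranks_descending(stats)
n = len(maths)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rank_x, rank_y, sum_d2, rho)
```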
Repeated ranks: When two or more items have equal values, it is difficult to give ranks to them. In that case the items are given the average of the ranks they would have received. For example, if two individuals are placed in the seventh place, they are each given the rank (7 + 8)/2 = 7.5, which is common to both, and the next rank will be 9; and if three are ranked equal at the seventh place, they are each given the common rank (7 + 8 + 9)/3 = 8, and the next rank will be 10. A slightly different formula is used when there is more than one item having the same value. The formula is:
$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N} $$
where m = the number of items whose ranks are common; one term (m³ - m)/12 is added for each group of tied items.
following data calculate
ranon 12.15: From the
djustment for tied ranks. 24 16 57
16 65
48 33 40 9 16 6 19
20 9
4
13 13 24 15 (MCom. Meerut)

Solution: First we have to assign ranks to the variables.

Calculation of Rank Correlation

X    Rank (X)   Y    Rank (Y)   D = R(X) - R(Y)   D²
48   3          13   5.5        -2.5              6.25
33   5          13   5.5        -0.5              0.25
40   4          24   1          +3                9.00
9    10         6    8.5        +1.5              2.25
16   8          15   4          +4                16.00
16   8          4    10         -2                4.00
65   1          20   2          -1                1.00
24   6          9    7          -1                1.00
16   8          6    8.5        -0.5              0.25
57   2          19   3          -1                1.00
                                                  ΣD² = 41

In the X series, 16 is repeated 3 times, hence m = 3. In the Y series, 13 and 6 are each repeated twice, hence m = 2 for each. Therefore, the formula is:

$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{N^3 - N}$$

$$= 1 - \frac{6\,(41 + 2 + 0.5 + 0.5)}{10^3 - 10} = 1 - \frac{264}{990} = 1 - 0.267 = +0.733$$
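The tie-corrected formula can be checked against Illustration 12.15 with a short sketch. It assumes rank 1 for the highest value, as in the table above; `spearman_with_ties` is an illustrative name, and exact fractions are used so that the result 726/990 = 11/15 ≈ 0.733 is reproduced without rounding error.

```python
from collections import Counter
from fractions import Fraction


def average_ranks(values):
    """Rank 1 for the highest value; tied values share the mean position."""
    order = sorted(values, reverse=True)
    # mean of positions (index + 1) .. (index + count) = (2*index + count + 1) / 2
    return [Fraction(2 * order.index(v) + order.count(v) + 1, 2) for v in values]


def spearman_with_ties(x, y):
    """rho = 1 - 6[sum D^2 + sum (m^3 - m)/12] / (N^3 - N)."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # one (m^3 - m)/12 term for every group of m equal values, in either series
    correction = sum(Fraction(m ** 3 - m, 12)
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    rho = 1 - 6 * (sum_d2 + correction) / Fraction(n ** 3 - n)
    return sum_d2, correction, rho


X = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
Y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]
sum_d2, correction, rho = spearman_with_ties(X, Y)
print(sum_d2, correction, round(float(rho), 3))  # 41 3 0.733
```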
Merits of Rank Correlation Coefficient
1. It is simple to understand and easy to calculate.
2. It is very useful in the case of data which are of a qualitative nature, like intelligence, honesty, beauty, efficiency, etc.
3. When only the ranks are given, no other method can be used.
4. This method can also be applied when the actual data are given.

Demerits of Rank Correlation Coefficient
1. It cannot be used in the case of a bivariate frequency distribution.
2. If the number of items is greater than, say, 30, the calculation becomes tedious and requires a lot of time. However, if we are given the ranks, we can apply this method even though N exceeds 30.

Concurrent Deviation Method: Under the concurrent deviation method, only the direction of change in the concerned variables is taken into account. When it is desired to study only the direction of correlation and not its degree, the method of concurrent deviation is the easiest. It is based on the signs of the deviations, i.e., the direction of change of each value of the variable from the preceding value, and does not take into account the exact magnitude of the values of the variables. The change in each term from its previous value may be plus (+) or minus (-). It is the simplest method of finding out correlation. The formula is:
$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$$

where
r_c = coefficient of correlation by the concurrent deviation method
C = number of concurrent deviations, i.e., the number of positive signs in the dx dy column
N = number of pairs of deviations compared

Steps:
1. Find out the direction of change of the x variable. Taking the first value as the base, note down whether the second value is increasing, decreasing or constant in relation to the previous one. If it increases, mark a plus (+) sign against it; if it decreases, put a minus (-) sign; and if it is equal, put zero (0). For the third value, the second value is the base; repeat the method till the last item. The heading of the column is denoted by dx.
2. Find out the direction of change of the y variable, following the above step. The heading of the column is denoted by dy.
3. Multiply dx by dy and find out the value of C, i.e., the number of positive items.
4. Substitute the figures in the formula.

The same sign is used inside and outside the square root. If (2C - N)/N is negative, the minus sign inside makes the quantity under the root positive, so the square root can be taken, and the minus sign outside makes the final result negative. If (2C - N)/N is positive, all the signs are positive.
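The four steps can be sketched as a single Python function. This is an illustrative implementation under our own naming (`concurrent_deviation` is hypothetical); it treats a zero change as contributing no positive product, which matches the rule that C counts only the positive signs. The sign rule "same sign inside and outside the root" is implemented with `math.copysign`.

```python
import math


def concurrent_deviation(x, y):
    """Coefficient of concurrent deviation r_c = +/- sqrt(+/-(2C - N) / N)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    # Steps 1 and 2: +1 for an increase, -1 for a decrease, 0 if unchanged
    dx = [(b > a) - (b < a) for a, b in zip(x, x[1:])]
    dy = [(b > a) - (b < a) for a, b in zip(y, y[1:])]
    # Step 3: C = number of positive products dx * dy
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    n = len(dx)  # pairs of deviations compared (one less than the observations)
    # Step 4: the inner sign keeps the root real, the outer sign restores it
    ratio = (2 * c - n) / n
    return math.copysign(math.sqrt(abs(ratio)), ratio)


# Hypothetical series: three of the four changes move together (C = 3, N = 4)
print(concurrent_deviation([1, 2, 3, 4, 5], [10, 12, 11, 13, 14]))  # about 0.707
```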
Illustration 12.16: Calculate the coefficient of concurrent deviations from the data given below:

Month:    Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.
Supply:   160   164   172   182   166   170   178   192   186
Price:    292   280   260   234   266   254   230   190   200
(B.Com. Madras)
Solution: Calculation of Coefficient of Concurrent Deviation

                      Direction of change                Direction of change
Month    Supply (x)   compared to previous    Price (y)  compared to previous   dxdy
                      month (dx)                         month (dy)
Jan.     160          base                    292        base
Feb.     164          +                       280        -                      -
Mar.     172          +                       260        -                      -
Apr.     182          +                       234        -                      -
May      166          -                       266        +                      -
June     170          +                       254        -                      -
July     178          +                       230        -                      -
Aug.     192          +                       190        -                      -
Sept.    186          -                       200        +                      -
                                                                                C = 0

C = 0; N = 8.

$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}} = \pm\sqrt{\pm\frac{2 \times 0 - 8}{8}} = \pm\sqrt{\pm(-1)} = -\sqrt{+1} = -1$$
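The dx, dy and dxdy columns of Illustration 12.16 can be regenerated mechanically. A minimal sketch using only the figures given in the illustration; the helper `sign` is our own.

```python
import math

supply = [160, 164, 172, 182, 166, 170, 178, 192, 186]
price = [292, 280, 260, 234, 266, 254, 230, 190, 200]
months = ["Feb.", "Mar.", "Apr.", "May", "June", "July", "Aug.", "Sept."]


def sign(v):
    """+1 for a rise, -1 for a fall, 0 for no change."""
    return (v > 0) - (v < 0)


dx = [sign(b - a) for a, b in zip(supply, supply[1:])]
dy = [sign(b - a) for a, b in zip(price, price[1:])]
products = [a * b for a, b in zip(dx, dy)]
for month, a, b, p in zip(months, dx, dy, products):
    print(f"{month:6} {a:+d} {b:+d} {p:+d}")

C = sum(1 for p in products if p > 0)
N = len(products)
ratio = (2 * C - N) / N
r_c = math.copysign(math.sqrt(abs(ratio)), ratio)
print("C =", C, "N =", N, "r_c =", r_c)  # C = 0 N = 8 r_c = -1.0
```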
Merits of Concurrent Deviation Method
1. It is the easiest and the simplest method.
2. It is used in the study of short-time oscillations.
3. This method can be used when the number of items is very large, and we can get a quick idea of the degree of relationship.

Demerits of Concurrent Deviation Method
1. It gives equal weight to small and big changes.
2. It provides only a rough measure of the coefficient of correlation.
LAG AND LEAD IN CORRELATION
It is necessary to consider lag and lead in the field of economic series of variables which do not show simultaneous changes. There may be some time gap before a cause-and-effect relationship is established. For instance, an increased supply of raw materials today may not have an immediate effect on the price; it may take a few days for prices to adjust to the increased supply. Similarly, a boom in agricultural products may get reflected in industrial output after a gap of time. This difference in the period is known as lag.
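One common way to allow for lag is to shift one series before correlating, pairing cause[t] with effect[t + k] for a lag of k periods. The sketch below assumes that approach; the figures are hypothetical, built so that price responds to supply exactly two periods later, and `lagged_correlation` is an illustrative name.

```python
import math


def pearson(x, y):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(cause, effect, lag):
    """Correlate cause[t] with effect[t + lag]; the effect lags by `lag` periods."""
    if lag < 0:
        raise ValueError("lag must be zero or positive")
    if lag == 0:
        return pearson(cause, effect)
    return pearson(cause[:-lag], effect[lag:])


# Hypothetical figures: price reacts to supply two periods later
supply = [10, 14, 9, 16, 12, 18, 11, 15]
price = [80, 78] + [100 - 2 * s for s in supply[:-2]]

for lag in range(4):
    print(lag, round(lagged_correlation(supply, price, lag), 3))
```

With these made-up numbers the relationship only shows up clearly (as r = -1) once the two-period lag is allowed for; each extra period of lag costs one pair of observations.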
CORRELATION OF TIME SERIES
"A time series may be defined as a collection of readings belonging to different time periods, of some economic variable or composite of variables." A series in which one variable is time is called a time series; that is, historical data spread over a period of time constitute a time series. A time series depicts two types of fluctuations: (1) long term and (2) short term. If we correlate two series without any change in the series, the resulting coefficient of correlation will include both long-term and short-term changes. If we desire to study the correlation of short-term changes, the long-term (trend) values must first be eliminated by the moving average method. The steps in brief are:
1. Find the trend values by the moving average method.
2. Subtract the moving average from the given time series to get the short-term fluctuations. Denote them by x for the X series and y for the Y series.
3. Square the short-term fluctuations of the X and Y series to get x² and y².
4. Multiply x by y for each value to get xy.
5. Apply the formula $r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$.
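The five steps can be sketched as follows. Assumptions: an odd moving-average period (5, as in the illustration that follows), years without a full window are simply dropped, and step 5 is taken to be r = Σxy/√(Σx²·Σy²) applied to the fluctuations as they stand. The series are hypothetical: two opposite linear trends sharing one 5-period oscillation, so the raw series move in opposite directions while their short-term fluctuations move together.

```python
import math


def centred_moving_average(series, period):
    """Centred moving average for an odd period; positions without a full window get None."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    return trend


def short_term_correlation(X, Y, period=5):
    # Step 1: trend values by moving average
    trend_x = centred_moving_average(X, period)
    trend_y = centred_moving_average(Y, period)
    # Step 2: short-term fluctuations x = X - trend, y = Y - trend
    pairs = [(a - ta, b - tb) for a, ta, b, tb in zip(X, trend_x, Y, trend_y)
             if ta is not None and tb is not None]
    # Steps 3 and 4: x^2, y^2 and xy
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    # Step 5 (assumed form): r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Hypothetical series: different linear trends, same 5-period oscillation
osc = [2, -1, 0, 1, -2]
X = [100 + 3 * t + osc[t % 5] for t in range(15)]
Y = [50 - 2 * t + 3 * osc[t % 5] for t in range(15)]
print(short_term_correlation(X, Y, period=5))  # 1.0
```

Because a 5-period moving average reproduces a linear trend exactly and cancels an oscillation that sums to zero over 5 periods, the fluctuations here are recovered exactly and the short-term correlation is +1.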

The following example illustrates the above points.

Illustration 12.17: Following are the indices of supply and price. Compute the coefficient of correlation for short-time oscillations, taking a 5-yearly moving average.

Year    Index of price    Index of supply
117
2
91
98 97
3
102
4 95
108
92
105
93
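1. Karl Pearson's coefficient of correlation, when deviations are taken from the actual means:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} \quad \text{or} \quad r = \frac{\sum xy}{N\,\sigma_x\,\sigma_y}$$

where x = (X - X̄), y = (Y - Ȳ), N = number of pairs of observations, σx = standard deviation of the X series, σy = standard deviation of the Y series.

2. When deviations are taken from assumed means:

$$r = \frac{\sum d_x d_y - \frac{(\sum d_x)(\sum d_y)}{N}}{\sqrt{\sum d_x^2 - \frac{(\sum d_x)^2}{N}}\;\sqrt{\sum d_y^2 - \frac{(\sum d_y)^2}{N}}}$$

where dx = (X - A) and dy = (Y - B), A and B being the assumed means of the X and Y series.

3. Bivariate frequency distribution:

$$r = \frac{\sum f d_x d_y - \frac{(\sum f d_x)(\sum f d_y)}{N}}{\sqrt{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}}\;\sqrt{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}}}$$

4. When we take the actual values of X and Y:

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\;\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

5. Spearman's rank correlation coefficient:

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \quad \text{or} \quad \rho = 1 - \frac{6\sum D^2}{N^3 - N}$$

where D = difference of ranks and N = number of pairs of observations.

6. When ranks are repeated:

$$\rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N}$$

where m = the number of times a value is repeated, i.e., the number of items whose ranks are common.

7. Concurrent deviation:

$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$$

where r_c = coefficient of correlation by the concurrent deviation method, C = number of concurrent deviations or the number of positive signs, and N = number of pairs of deviations compared, i.e., one less than the number of observations.

8. Probable error:

$$P.E._r = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$$

where P.E._r = probable error, r = coefficient of correlation, N = number of pairs of observations.

9. Standard error:

$$S.E._r = \frac{1 - r^2}{\sqrt{N}}$$

10. Coefficient of determination = r²; coefficient of non-determination = 1 - r².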
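Formulas 1, 2 and 4 are algebraically equivalent ways of computing the same r, which a short sketch can confirm numerically. The data and the assumed means A = 2, B = 4 are hypothetical, and the function names are our own; formulas 8 and 9 are included to show that P.E. is simply 0.6745 times S.E.

```python
import math


def r_actual_mean(X, Y):
    """Formula 1: deviations x, y from the actual means."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [a - mean_x for a in X]
    y = [b - mean_y for b in Y]
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))


def r_assumed_mean(X, Y, A, B):
    """Formula 2: deviations dx = X - A, dy = Y - B from assumed means."""
    n = len(X)
    dx = [a - A for a in X]
    dy = [b - B for b in Y]
    num = sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den = (math.sqrt(sum(a * a for a in dx) - sum(dx) ** 2 / n)
           * math.sqrt(sum(b * b for b in dy) - sum(dy) ** 2 / n))
    return num / den


def r_actual_values(X, Y):
    """Formula 4: actual values of X and Y."""
    n = len(X)
    num = n * sum(a * b for a, b in zip(X, Y)) - sum(X) * sum(Y)
    den = (math.sqrt(n * sum(a * a for a in X) - sum(X) ** 2)
           * math.sqrt(n * sum(b * b for b in Y) - sum(Y) ** 2))
    return num / den


def probable_error(r, n):
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def standard_error(r, n):
    return (1 - r ** 2) / math.sqrt(n)


X = [1, 2, 3, 4, 5]   # hypothetical data
Y = [2, 5, 3, 8, 7]
r1 = r_actual_mean(X, Y)
r2 = r_assumed_mean(X, Y, A=2, B=4)
r4 = r_actual_values(X, Y)
print(round(r1, 4), round(r2, 4), round(r4, 4))  # 0.8062 0.8062 0.8062
print("r^2 =", round(r1 ** 2, 4), "P.E. =", round(probable_error(r1, len(X)), 4),
      "S.E. =", round(standard_error(r1, len(X)), 4))
```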
QUESTIONS
Objective Type Questions
I. State whether the following statements are "True" or "False":
1. Correlation always signifies a cause and effect relationship between the variables.
2. Coefficient of correlation is a relative measure of association between two or more variables.
3. Correlation is a measure of strength of relationship between two variables.
4. The coefficient of correlation is never negative.
5. Coefficient of correlation must be in the same units as the original data.
6. The coefficient of concurrent deviations cannot be negative.
7. There are no limits to the value of r.
8. The rank correlation coefficient was developed by Spearman.
9. A negative correlation between two series means that, as the value of one of the variables decreases, the value of the other variable would also decrease.
10. If r is negative, both the variables are decreasing.

[Ans: True: 2, 3, 8. False: 1, 4, 5, 6, 7, 9, 10.]


II. Fill in the blanks:
1. The coefficient of correlation is independent of change of ______ and ______.
2. If r is more than six times the ______, it is called significant.
3. The relationship between three or more variables is studied with the help of ______ correlation.
4. If r = 0.3, r² will be ______.
5. The coefficient of correlation is the square root of the product of the two ______.
[Ans: 1. scale, origin; 2. probable error; 3. multiple; 4. 0.09; 5. regression coefficients.]
III. Tick the correct answer:
1. The coefficient of correlation:
(a) has no limits   (b) can be less than 1   (c) can be more than 1   (d) varies between ±1
2. The value of the coefficient of determination (r²) for a particular situation is 0.81. What is the coefficient of correlation?
(a) 0.81   (b) 0.9   (c) 0.09
THEORETICAL QUESTIONS
1. What is meant by correlation? What are the properties of the coefficient of correlation? (B.A. Econ. Bombay)
2. (a) Distinguish the coefficient of correlation from the coefficient of variation.
(b) What is a scatter diagram? How does it help us in studying the correlation between two variables, in respect of both their nature and extent? (B.Com. Madurai)
3. Define Karl Pearson's coefficient of correlation. What is it intended to measure? (B.Com. Karnataka)
4. Distinguish between:
(a) Positive and negative correlation.
(b) Linear and non-linear correlation.
(c) Simple, partial and multiple correlation. (M.A. Econ. Cali…)
5. What are the advantages of Spearman's rank correlation coefficient over Karl Pearson's correlation coefficient? Explain the method of calculating Spearman's rank correlation coefficient. (B.Com. Madras)
6. Define 'coefficient of concurrent deviations' and comment on its usefulness. (B.Com. Shi…)
7. What is 'spurious' or 'non-sensical' correlation? Explain with an example. (B.Com. Ma…)
8. Define the coefficient of correlation and mention its important properties. (B.Com.)
9. What are the methods of calculating the coefficient of correlation? (B.Com.)
10. Explain the assumptions on which Karl Pearson's coefficient of correlation is based. (B.Com.)

PRACTICAL PROBLEMS
1. Calculate the coefficient of correlation between X and Y from the following data:
2 3
2 4 5
4 6
5 3 8 6 (BA Puyi

7
[Ans: r = 0.79]
