UNIT 2- SEC-T
1. CONCEPT AND DEFINITION OF CORRELATION
In many practical applications, we may come across situations where observations are available on two or more variables. The following examples illustrate such situations:
a) Heights and weights of persons of a certain group;
b) Sales revenue and advertising expenditure in business; and
c) Time spent on study and marks obtained by students in an exam.
If data are available on two variables, say X and Y, the data form a bivariate distribution.
Consider the example of sales revenue and expenditure on advertising in business. A natural question arises: is there any connection between sales revenue and expenditure on advertising? Does sales revenue increase or decrease as expenditure on advertising increases or decreases?
Similarly, in the example of time spent on study and marks obtained by students, a natural question is whether marks increase or decrease as the time spent on study increases or decreases.
In all these situations, we try to find the relation between two variables, and correlation answers the question of whether there is any relationship between one variable and another.
When two variables are related in such a way that a change in the value of one variable affects the value of the other variable, the variables are said to be correlated, or there is said to be correlation between these two variables.
2. TYPES OF CORRELATION
a) Positive Correlation:
Correlation between two variables is said to be positive if the values of the variables deviate in the same direction, i.e. if the values of one variable increase (or decrease), then the values of the other variable also increase (or decrease). Some examples of positive correlation are the correlation between:
1. Heights and weights of persons of a certain group;
2. Sales revenue and advertising expenditure in business; and
3. Time spent on study and marks obtained by students in an exam.
b) Negative Correlation:
Correlation between two variables is said to be negative if the values of the variables deviate in opposite directions, i.e. if the values of one variable increase (or decrease), then the values of the other variable decrease (or increase). Some examples of negative correlation are the correlation between:
1. Volume and pressure of a perfect gas;
2. Price and demand of goods;
3. Literacy and poverty in a country; and
4. Time spent on watching TV and marks obtained by students in an examination.
c) Simple, Partial and Multiple Correlation:
The distinction between simple, partial and multiple correlation is based upon the number of variables studied.
Simple Correlation: When only two variables are studied, it is a case of simple correlation. For example, when one studies the relationship between the marks secured by students and their attendance in class, it is a problem of simple correlation.
Partial Correlation: In partial correlation, one studies three or more variables but considers only two of them to be influencing each other, the effect of the other influencing variables being held constant. For example, in the above example of the relationship between student marks and attendance, the other influencing variables, such as the effectiveness of the teacher and the use of teaching aids like computers and smart boards, are assumed to be constant.
Multiple Correlation: When three or more variables are studied together, it is a case of multiple correlation. For example, in the above example, if the study covers the relationship between student marks, attendance of students, effectiveness of the teacher, use of teaching aids, etc., it is a case of multiple correlation.
d) Linear and Non-linear Correlation:
Depending upon the constancy of the ratio of change between the variables, the correlation may be linear or non-linear.
Linear Correlation: If the amount of change in one variable bears a constant ratio to the amount of change in the other variable, then the correlation is said to be linear.
Non-linear Correlation: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, then the correlation is said to be non-linear.

3. METHODS OF FINDING THE CORRELATION COEFFICIENT
a) Scatter Diagram
A scatter diagram is a statistical tool for judging whether there may be a correlation between the dependent variable and the independent variable. A scatter diagram does not give the exact relationship between the two variables, but it indicates whether they are correlated or not.
Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$, be the bivariate distribution. If the values of the dependent variable Y are plotted against the corresponding values of the independent variable X in the XY-plane, the resulting diagram of dots is called a scatter diagram or dot diagram. Note that a scatter diagram is not suitable for a large number of observations.
Typical scatter diagrams for different degrees of correlation:
[Figure: Diagram I, perfect positive correlation; Diagram II, perfect negative correlation; Diagram III, high positive correlation; Diagram IV, high negative correlation; Diagram V, low positive correlation; Diagram VI, low negative correlation; Diagram VII, no correlation]
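To see how a scatter diagram is built from paired observations, the sketch below plots Y against X with text characters, so it runs without a plotting library. The function name `ascii_scatter` and the default grid size are our own illustrative choices; for real work a plotting library would normally be used instead.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return the lines of a crude text scatter diagram of Y against X.

    Each (x, y) pair becomes one '*'; X runs left to right, Y bottom to top.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equal-length, non-empty X and Y lists")
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1      # avoid dividing by zero for constant data
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"    # row 0 is printed first (top), so flip Y
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Perfect positive correlation: all dots rise along a straight line
for line in ascii_scatter([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], width=9, height=5):
    print(line)
```

Swapping in data where Y falls as X rises produces dots running from the top left to the bottom right, the pattern of negative correlation.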
b) KARL PEARSON'S CORRELATION COEFFICIENT
A scatter diagram tells us whether the variables are correlated or not, but it does not indicate the extent to which they are correlated. The coefficient of correlation gives the exact idea of the extent to which they are correlated.
The coefficient of correlation measures the intensity or degree of linear relationship between two variables. It was given by the British biometrician Karl Pearson (1857-1936).
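If X and Y are two random variables, then the correlation coefficient between X and Y is
$$\mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)\,V(y)}} \qquad (1)$$
where $\mathrm{Corr}(x, y)$ denotes the correlation coefficient between the two variables X and Y, and $\mathrm{Cov}(x, y)$ is the covariance between X and Y, defined as
$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Similarly, $V(x)$, the variance of X, is defined by
$$V(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
and $V(y)$, the variance of Y, is defined in the same way, where n is the number of paired observations.
Then the correlation coefficient r may be defined as
$$r = \mathrm{Corr}(x, y) = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (2)$$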
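Definition (2) translates almost line for line into code. This is a minimal sketch in plain Python; the helper names `cov`, `var` and `corr` are our own, and the 1/n divisor follows the text. The father-son heights are the data of Example 1 below.

```python
from math import sqrt

def cov(xs, ys):
    """Cov(x, y) = (1/n) * sum((x_i - x_bar) * (y_i - y_bar))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def var(xs):
    """V(x) = (1/n) * sum((x_i - x_bar)^2), which is Cov(x, x)."""
    return cov(xs, xs)

def corr(xs, ys):
    """Karl Pearson's r = Cov(x, y) / sqrt(V(x) * V(y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    vx, vy = var(xs), var(ys)
    if vx == 0 or vy == 0:
        raise ValueError("r is undefined when either series is constant")
    return cov(xs, ys) / sqrt(vx * vy)

# Heights of fathers (X) and sons (Y) from Example 1
fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(corr(fathers, sons), 3))  # 0.603
```

Because the 1/n factors cancel between numerator and denominator, using 1/(n - 1) throughout gives the same r. On Python 3.10 or later, `statistics.correlation(fathers, sons)` can serve as an independent cross-check.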
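REMARK 1: Karl Pearson's correlation coefficient r is also called the product moment correlation coefficient.
REMARK 2: Karl Pearson's correlation coefficient is also denoted by $\rho(X, Y)$.
The expression in equation (2) can be simplified in various forms. Some of them are:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (3)$$
$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2 - \bar{y}^2}} \qquad (4)$$
$$r = \frac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}\,\sqrt{n\sum_{i=1}^{n}y_i^2 - \left(\sum_{i=1}^{n}y_i\right)^2}} \qquad (5)$$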
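When only the column totals are available (n, ΣX, ΣY, ΣX², ΣY², ΣXY), r can be computed from raw sums alone. The sketch below uses a hypothetical helper `corr_from_sums` and feeds it the totals from the calculation table of Example 1, which reproduces the value found there.

```python
from math import sqrt

def corr_from_sums(n, sx, sy, sxx, syy, sxy):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    if den == 0:
        raise ValueError("r is undefined when either series is constant")
    return num / den

# Column totals from the calculation table of Example 1 (n = 8 pairs)
r = corr_from_sums(n=8, sx=544, sy=552, sxx=37028, syy=38132, sxy=37560)
print(round(r, 3))  # 0.603
```

The raw-sum form avoids computing deviations, but with large values the subtractions in the denominator can lose precision in floating point, so the deviation form is usually preferred in software.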
4. ASSUMPTIONS FOR CORRELATION COEFFICIENT
1. Assumption of Linearity
The variables used to compute the correlation coefficient must be linearly related. Linearity of the variables can be checked with a scatter diagram.
2. Assumption of Normality
Both variables under study should follow a Normal distribution. They should not be skewed in either the positive or the negative direction.
3. Assumption of Cause and Effect Relationship
There should be a cause and effect relationship between the two variables, for example, heights and weights of children, demand and supply of goods, etc. When there is no cause and effect relationship between the variables, the correlation coefficient should be zero. If it is non-zero, the correlation is termed chance correlation or spurious correlation. For example, a non-zero correlation coefficient between the following would be spurious:
a) Weight and income of a person over periods of time; and
b) Rainfall and literacy in a state over periods of time.
As correlation measures the degree of linear relationship, different values of the coefficient of correlation can be interpreted as below:

Value of correlation coefficient | Correlation is
+1 | Perfect positive correlation
-1 | Perfect negative correlation
0 | There is no correlation
0 to 0.25 | Weak positive correlation
0.75 to 1 | Strong positive correlation
-0.25 to 0 | Weak negative correlation
-0.75 to -1 | Strong negative correlation
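The interpretation table can be turned into a small lookup. This sketch (the name `describe_r` is hypothetical) follows only the bands listed in the table; which side of each boundary, such as exactly 0.25 or 0.75, belongs to a band is our own assumption, and values the table does not cover, such as 0.5, return `None` rather than a guessed label.

```python
def describe_r(r):
    """Label r using only the bands listed in the interpretation table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 1:
        return "perfect positive correlation"
    if r == -1:
        return "perfect negative correlation"
    if r == 0:
        return "no correlation"
    if 0 < r <= 0.25:
        return "weak positive correlation"
    if 0.75 <= r < 1:
        return "strong positive correlation"
    if -0.25 <= r < 0:
        return "weak negative correlation"
    if -1 < r <= -0.75:
        return "strong negative correlation"
    return None  # band not listed in the table (e.g. 0.25 < r < 0.75)

print(describe_r(0.99))  # strong positive correlation
```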
5. PROPERTIES OF CORRELATION COEFFICIENT
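Property 1: The correlation coefficient lies between -1 and +1.
Proof: We have to prove that
$$-1 \le r(X, Y) \le +1$$
We have
$$r(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n}a_i b_i}{\sqrt{\sum_{i=1}^{n}a_i^2\,\sum_{i=1}^{n}b_i^2}} \qquad (*)$$
where $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$.
The Schwarz inequality states that if $a_i, b_i$ ($i = 1, 2, \ldots, n$) are real quantities, then
$$\left(\sum_{i=1}^{n}a_i b_i\right)^2 \le \left(\sum_{i=1}^{n}a_i^2\right)\left(\sum_{i=1}^{n}b_i^2\right),$$
the sign of equality holding if and only if $\frac{a_1}{b_1} = \frac{a_2}{b_2} = \cdots = \frac{a_n}{b_n}$.
Using the Schwarz inequality, we get from (*)
$$r^2(X, Y) \le 1 \;\Rightarrow\; |r(X, Y)| \le 1 \;\Rightarrow\; -1 \le r(X, Y) \le +1.$$
Property 2: The correlation coefficient is independent of change of origin and scale. That is, if $u = \frac{x - a}{h}$ and $v = \frac{y - b}{k}$, so that $x = a + hu$ and $y = b + kv$, where a and b are constants, $h > 0$ and $k > 0$, then $\mathrm{Corr}(x, y) = \mathrm{Corr}(u, v)$.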
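We have to prove $\mathrm{Corr}(x, y) = \mathrm{Corr}(u, v)$, i.e. there is no change in correlation when origin and scale are changed.
Since $x = a + hu$ and $y = b + kv$, we have $\bar{x} = a + h\bar{u}$ and $\bar{y} = b + k\bar{v}$. Hence
$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum_{i=1}^{n}(a + hu_i - a - h\bar{u})(b + kv_i - b - k\bar{v})$$
$$= hk\,\frac{1}{n}\sum_{i=1}^{n}(u_i - \bar{u})(v_i - \bar{v})$$
$$\mathrm{Cov}(x, y) = hk\,\mathrm{Cov}(u, v)$$
and
$$V(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n}(a + hu_i - a - h\bar{u})^2 = h^2\,V(u)$$
Similarly,
$$V(y) = k^2\,V(v)$$
Therefore, since $h > 0$ and $k > 0$,
$$\mathrm{Corr}(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)\,V(y)}} = \frac{hk\,\mathrm{Cov}(u, v)}{\sqrt{h^2V(u)\,k^2V(v)}} = \frac{\mathrm{Cov}(u, v)}{\sqrt{V(u)\,V(v)}}$$
$$\mathrm{Corr}(x, y) = \mathrm{Corr}(u, v)$$
i.e. the correlation coefficient between X and Y is the same as the correlation coefficient between U and V. Thus, the correlation coefficient is independent of change of origin and scale.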
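A quick numerical check of Property 2, as a sketch: the constants a, b, h, k below are arbitrary values of our own choosing, applied to the Example 1 heights. The last two lines go slightly beyond the property as stated: with a negative scale factor the size of r is unchanged but its sign flips, which is why the condition h > 0, k > 0 matters.

```python
from math import sqrt

def corr(xs, ys):
    """Pearson's r from deviations about the means (the 1/n factors cancel)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

fathers = [65, 66, 67, 67, 68, 69, 70, 72]   # X from Example 1
sons = [67, 68, 65, 68, 72, 72, 69, 71]      # Y from Example 1

# Change of origin and scale: u = (x - a) / h, v = (y - b) / k, with h > 0, k > 0
a, h = 68, 2       # arbitrary illustrative constants
b, k = 69, 0.5
u = [(x - a) / h for x in fathers]
v = [(y - b) / k for y in sons]
print(round(corr(fathers, sons), 3), round(corr(u, v), 3))   # 0.603 0.603

# Outside the stated condition: a negative scale factor flips the sign of r
u_flipped = [(x - a) / -h for x in fathers]
print(round(corr(u_flipped, v), 3))                          # -0.603
```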
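Property 3: If X and Y are two independent variables, then the correlation coefficient between X and Y is zero, i.e. $\mathrm{Corr}(X, Y) = 0$.
Proof: If X and Y are independent variables, then $E(XY) = E(X)E(Y)$, so
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0$$
$$\therefore\; r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = 0$$
where $\sigma_X = \sqrt{V(X)}$ and $\sigma_Y = \sqrt{V(Y)}$.
Hence two independent variables are uncorrelated.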
But the converse of the theorem is not true, i.e. two uncorrelated variables may not be independent, as the following example illustrates:
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} = 0$$
Thus, in the above example, the variables X and Y are uncorrelated. But on careful examination we find that X and Y are not independent: they are connected by the relation $Y = X^2$.
Hence two uncorrelated variables need not be independent.
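To see the converse fail numerically, here is a sketch with hypothetical values of our own choosing: X symmetric about zero and Y = X². `fractions.Fraction` keeps the arithmetic exact, so the covariance comes out as exactly zero even though Y is completely determined by X.

```python
from fractions import Fraction

# Hypothetical illustrative data: X symmetric about 0, Y = X^2
xs = [-3, -2, -1, 1, 2, 3]
ys = [x ** 2 for x in xs]          # Y is a function of X, so they are not independent

n = len(xs)
x_bar = Fraction(sum(xs), n)       # exact arithmetic avoids floating-point noise
y_bar = Fraction(sum(ys), n)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

print(x_bar, cov_xy)   # 0 0  -> Cov(X, Y) = 0, hence r(X, Y) = 0
```

The positive and negative X values cancel in pairs, which is why any set of X values symmetric about zero gives the same result here.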
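Example 1. Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons (Y):
X: 65 66 67 67 68 69 70 72
Y: 67 68 65 68 72 72 69 71
Solution.
CALCULATIONS FOR CORRELATION COEFFICIENT

X | Y | X² | Y² | XY
65 | 67 | 4225 | 4489 | 4355
66 | 68 | 4356 | 4624 | 4488
67 | 65 | 4489 | 4225 | 4355
67 | 68 | 4489 | 4624 | 4556
68 | 72 | 4624 | 5184 | 4896
69 | 72 | 4761 | 5184 | 4968
70 | 69 | 4900 | 4761 | 4830
72 | 71 | 5184 | 5041 | 5112
ΣX = 544 | ΣY = 552 | ΣX² = 37028 | ΣY² = 38132 | ΣXY = 37560

$$\bar{X} = \frac{1}{n}\sum X = \frac{544}{8} = 68, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{552}{8} = 69$$
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{\frac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}}$$
$$= \frac{\frac{37560}{8} - 68 \times 69}{\sqrt{\left(\frac{37028}{8} - 68^2\right)\left(\frac{38132}{8} - 69^2\right)}} = \frac{4695 - 4692}{\sqrt{(4628.5 - 4624)(4766.5 - 4761)}} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$$

Short-cut method: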
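Define $d_x = X - A_x$ and $d_y = Y - A_y$, where $A_x$ and $A_y$ are the assumed means of the X and Y series respectively. The correlation coefficient is then given by
$$r = \frac{\frac{1}{n}\sum d_x d_y - \bar{d}_x\,\bar{d}_y}{\sqrt{\left(\frac{1}{n}\sum d_x^2 - \bar{d}_x^2\right)\left(\frac{1}{n}\sum d_y^2 - \bar{d}_y^2\right)}}$$
where $\bar{d}_x = \frac{1}{n}\sum d_x$ and $\bar{d}_y = \frac{1}{n}\sum d_y$.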
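A minimal sketch of the short-cut (assumed-mean) formula; `corr_shortcut` is a hypothetical name. Because r is unchanged by a change of origin, any choice of assumed means gives the same answer; the assumed means only make the hand arithmetic easier.

```python
from math import sqrt

def corr_shortcut(xs, ys, ax, ay):
    """r from deviations d_x = X - A_x and d_y = Y - A_y about assumed means."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    dx_bar, dy_bar = sum(dx) / n, sum(dy) / n
    num = sum(p * q for p, q in zip(dx, dy)) / n - dx_bar * dy_bar
    den = sqrt((sum(p * p for p in dx) / n - dx_bar ** 2) *
               (sum(q * q for q in dy) / n - dy_bar ** 2))
    return num / den

# Data of Example 2 with assumed means A_x = 14 and A_y = 7
X = [10, 12, 14, 18, 20]
Y = [5, 6, 7, 10, 12]
print(round(corr_shortcut(X, Y, 14, 7), 2))  # 0.99
```

Calling `corr_shortcut(X, Y, 0, 0)` uses the raw values as deviations and returns the same r, up to floating-point rounding.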
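Example 2: Use the short-cut method to find the coefficient of correlation.
X: 10 12 14 18 20
Y: 5 6 7 10 12
Solution. Let $A_x$ = assumed mean of X = 14 and $A_y$ = assumed mean of Y = 7.

X | Y | d_x = X - 14 | d_y = Y - 7 | d_x² | d_y² | d_x d_y
10 | 5 | -4 | -2 | 16 | 4 | 8
12 | 6 | -2 | -1 | 4 | 1 | 2
14 | 7 | 0 | 0 | 0 | 0 | 0
18 | 10 | 4 | 3 | 16 | 9 | 12
20 | 12 | 6 | 5 | 36 | 25 | 30
Total | | 4 | 5 | 72 | 39 | 52

$$\bar{d}_x = \frac{4}{5} = 0.8, \qquad \bar{d}_y = \frac{5}{5} = 1$$
$$r(X, Y) = \frac{\frac{1}{5}(52) - (0.8)(1)}{\sqrt{\left(\frac{72}{5} - 0.64\right)\left(\frac{39}{5} - 1\right)}} = \frac{10.4 - 0.8}{\sqrt{(13.76)(6.8)}} = \frac{9.6}{\sqrt{93.568}} = 0.99$$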
Example 3: A computer, while calculating the correlation coefficient between two variables X and Y from 25 pairs of observations, obtained the following results:
n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508
It was, however, later discovered at the time of checking that two pairs had been copied down as

X | Y
6 | 14
8 | 6

while the correct values were

X | Y
8 | 12
6 | 8

Obtain the correct value of the correlation coefficient.
Solution.
Corrected ΣX = 125 - 6 - 8 + 8 + 6 = 125
Corrected ΣY = 100 - 14 - 6 + 12 + 8 = 100
Corrected ΣX² = 650 - 6² - 8² + 8² + 6² = 650
Corrected ΣY² = 460 - 14² - 6² + 12² + 8² = 436
Corrected ΣXY = 508 - 6×14 - 8×6 + 8×12 + 6×8 = 520
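The correction step can be sketched as follows: subtract the contributions of the wrongly copied pairs from each total, add those of the correct pairs, and then evaluate r from the corrected totals using the 1/n formulas for covariance and variance. The helper `corrected_r` and its argument layout are hypothetical.

```python
from math import sqrt

def corrected_r(n, sums, wrong_pairs, right_pairs):
    """Fix (SX, SY, SX^2, SY^2, SXY) for miscopied pairs, then compute r."""
    sx, sy, sxx, syy, sxy = sums
    for sign, pairs in ((-1, wrong_pairs), (+1, right_pairs)):
        for x, y in pairs:
            sx += sign * x
            sy += sign * y
            sxx += sign * x * x
            syy += sign * y * y
            sxy += sign * x * y
    x_bar, y_bar = sx / n, sy / n
    cov = sxy / n - x_bar * y_bar        # Cov(X, Y) = (1/n) SXY - X_bar * Y_bar
    vx = sxx / n - x_bar ** 2            # V(X) = (1/n) SX^2 - X_bar^2
    vy = syy / n - y_bar ** 2
    return (sx, sy, sxx, syy, sxy), cov / sqrt(vx * vy)

totals, r = corrected_r(25, (125, 100, 650, 460, 508),
                        wrong_pairs=[(6, 14), (8, 6)],
                        right_pairs=[(8, 12), (6, 8)])
print(totals)        # (125, 100, 650, 436, 520)
print(round(r, 4))   # 0.6667
```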
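$$\bar{X} = \frac{1}{n}\sum X = \frac{1}{25} \times 125 = 5, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{1}{25} \times 100 = 4$$
$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum XY - \bar{X}\,\bar{Y} =$$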