Correlational Analysis - Statistics - Alok - Kumar


CORRELATION ANALYSIS

Correlation analysis is a statistical technique that measures the degree of
association between two or more variables.
• According to L. R. Connor, "When two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the others, then they are said to be correlated."
• According to Ya-Lun Chou, "Correlation analysis attempts to determine the degree of relationship between variables."
• According to W. I. King, "Correlation means that between two series or groups of data, there exists some causal connection."
• The study of correlation shows the direction and degree of the relationship between variables. This has helped in the formation of various laws and concepts in economic theory.
• It is very helpful in understanding economic behaviour and in studying the factors by which economic events are affected.
• The study of correlation reduces the range of uncertainty in matters of prediction.
• It is helpful in investigation and research.
• It is also helpful in policy formulation.
Correlation can be:
• Positive and Negative Correlation
• Linear and Non-Linear Correlation
• Simple, Multiple and Partial Correlation
• Positive correlation: When two variables X and Y move in the same direction, i.e., when one increases the other also increases and when one decreases the other also decreases, the correlation between the two is positive. For example, the price and supply of a commodity.

Positive Correlation (X and Y both increase)
X     Y
8     4
16    6
24    10

Positive Correlation (X and Y both decrease)
X     Y
15    35
10    25
5     20
• Negative correlation: When two variables X and Y move in opposite directions, i.e., when one increases the other decreases, the correlation is negative. For example, the inverse relationship between the price and demand of a commodity.

Negative Correlation (X increases, Y decreases)
X     Y
8     32
16    24
24    8

Negative Correlation (X decreases, Y increases)
X     Y
30    45
20    55
10    70
• Linear Correlation: If the ratio of change between two variables is uniform, it is called linear correlation. If the changes are plotted on graph paper, the relationship is indicated by a straight line.
• Example (Y rises by 5 for every rise of 2 in X):

Linear Correlation
X     Y
2     5
4     10
6     15
• Non-Linear Correlation: If the ratio of change between two variables is not uniform, it is called non-linear correlation. If these changes are plotted on graph paper, they will not form a straight line but a curve.
• Example (Y rises by 3 and then by 4 for the same rise of 2 in X):

Non-Linear Correlation
X     Y
2     5
4     8
6     12
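
As a small illustration of the distinction, the following Python sketch computes the ratio of change between consecutive observations for the two example tables. The names `change_ratios` and `is_linear` are illustrative and not part of the original material, and the check assumes that consecutive X values are different.

```python
from fractions import Fraction

def change_ratios(xs, ys):
    """Ratio (change in Y) / (change in X) between consecutive observations."""
    return [
        Fraction(ys[i + 1] - ys[i], xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

def is_linear(xs, ys):
    """True when every consecutive ratio of change is the same (uniform)."""
    return len(set(change_ratios(xs, ys))) == 1

# Example tables from the text
linear_x, linear_y = [2, 4, 6], [5, 10, 15]
curved_x, curved_y = [2, 4, 6], [5, 8, 12]

print(change_ratios(linear_x, linear_y), is_linear(linear_x, linear_y))
print(change_ratios(curved_x, curved_y), is_linear(curved_x, curved_y))
```

Exact fractions are used so that the comparison of ratios is not disturbed by floating-point rounding.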
• Simple Correlation: The relationship between two variables is known as simple correlation. For example, the relationship between the price and demand of a commodity.
• Multiple Correlation: When the relationship among three or more variables is studied simultaneously, it is called multiple correlation. For example, agricultural production depends on rainfall, the amount of manure, seeds, etc.
• Partial Correlation: The relationship between two variables is established keeping the other variables constant. For example, if we study the relationship between the amount of rainfall and agricultural production, assuming the amount of fertilizers and the quality of seeds to be constant, it is known as partial correlation.
• Perfect Correlation
• Absence of Correlation or No Correlation
• Limited degree of Correlation
  (a) High degree of Correlation
  (b) Moderate degree of Correlation
  (c) Low degree of Correlation
• The degree of correlation is expressed by the coefficient of correlation (r). Both positive and negative correlation can be of the following degrees:
• 1. Perfect Correlation: When two variables change in the same proportion. It may be of two kinds:
• (a) Perfect Positive: When the proportional change in two variables is in the same direction, it is called perfect positive correlation. In this case the coefficient of correlation r = +1.
• (b) Perfect Negative: When the proportional change in two variables is in opposite directions, it is called perfect negative correlation. In this case the coefficient of correlation r = -1.
• 2. Absence of Correlation: If there is no relation between two variables, it is called no correlation or absence of correlation. In this case the coefficient of correlation r = 0.
• 3. Limited Correlation: Between perfect correlation and no correlation there is a situation of limited degree of correlation. In this case the absolute value of the coefficient of correlation (r) is more than zero and less than one, i.e. 0 < |r| < 1.
• There are three types of limited degree of correlation:
• (a) High Correlation: When the correlation between two series is close to one, it is called a high degree of correlation. In this case the value of r lies between ±0.75 and ±1.
• (b) Moderate Correlation: When the correlation between two series is neither large nor small, it is called a moderate degree of correlation. In this case the value of r lies between ±0.25 and ±0.75.
• (c) Low Correlation: When the correlation coefficient between two series is very small, it is called a low degree of correlation. In this case the value of r lies between 0 and ±0.25.
DEGREE      POSITIVE                    NEGATIVE
Perfect     +1                          -1
High        Between +0.75 and +1        Between -0.75 and -1
Moderate    Between +0.25 and +0.75     Between -0.25 and -0.75
Low         Between 0 and +0.25         Between 0 and -0.25
Zero        0                           0
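
The table can be turned into a small lookup, sketched below in Python. The function name `degree_of_correlation` is illustrative. Because the ranges in the table share their end points, the sketch assumes that a value exactly at 0.25 counts as moderate and a value exactly at 0.75 counts as high; the source does not say which way these boundaries fall.

```python
import math

def degree_of_correlation(r):
    """Describe r using the degree table (boundary handling is an assumption)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "zero"
    direction = "positive" if r > 0 else "negative"
    if math.isclose(size, 1.0, abs_tol=1e-12):
        degree = "perfect"
    elif size >= 0.75:      # assumed: 0.75 itself is "high"
        degree = "high"
    elif size >= 0.25:      # assumed: 0.25 itself is "moderate"
        degree = "moderate"
    else:
        degree = "low"
    return f"{degree} {direction}"

for value in (1.0, 0.8, -0.5, 0.1, 0.0, -1.0):
    print(value, "->", degree_of_correlation(value))
```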
• SCATTER DIAGRAM
• KARL PEARSON'S COEFFICIENT OF CORRELATION
• COEFFICIENT OF CORRELATION BY RANK DIFFERENCES
• COEFFICIENT OF CONCURRENT DEVIATION
• REGRESSION LINES AND REGRESSION COEFFICIENT
• 1. Scatter Diagram Method: The existence of correlation between variables can be shown graphically by means of a scatter diagram. It is obtained by plotting the values on graph paper. The chart is prepared by measuring the X-variable on the horizontal axis and the Y-variable on the vertical axis, and all the observations are plotted on the graph. The cluster of points so obtained is called the scatter diagram or dot diagram.
• By observing the points we can know the degree and direction of correlation.
• If the trend of the plotted points is upward, rising from the bottom left towards the top right, the correlation is positive. On the other hand, if the plotted points show a downward trend from the top left to the bottom right, the correlation is negative.
• If the plotted points do not show any trend, the two variables are not correlated.
• Closeness of the dots to each other in a particular direction indicates a higher degree of correlation.
PERFECT POSITIVE CORRELATION
• If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, the correlation is perfect positive. In this case, r = +1.

PERFECT NEGATIVE CORRELATION
• If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, the correlation is perfect negative. In this case, r = -1.

[Figures: scatter diagrams for r = +1 and r = -1]
HIGH DEGREE OF CORRELATION
• If the plotted points are very close to each other, it shows a high degree of correlation.

[Figures: high degree of positive correlation; high degree of negative correlation]

LOW DEGREE OF CORRELATION

[Figures: low degree of positive correlation; low degree of negative correlation]
ABSENCE OF CORRELATION OR NO CORRELATION
• If the scatter diagram does not show any trend line and all the points in the diagram are highly scattered, it indicates no correlation. In this case, r = 0.

[Figure: scatter diagram showing no correlation, r = 0]
• MERITS OF SCATTER DIAGRAM:
• It is a simple and non-mathematical method of studying correlation.
• It gives a rough idea at a glance of whether there is positive correlation, negative correlation or absence of correlation between the variables.
• It is not affected by extreme values.
• DEMERITS OF SCATTER DIAGRAM:
• The exact degree of correlation cannot be obtained from it.
• It gives only an approximate idea of the relationship.
• A mathematical method for measuring the linear relationship between the variables X and Y was suggested by the great biologist and statistician Karl Pearson.
• This method is also called the Product Moment Method.
• According to Karl Pearson, the coefficient of correlation (written as r) of two variables is obtained by dividing the sum of the products of the corresponding deviations of the various items of the two series from their respective means by the product of their standard deviations and the number of pairs of observations.
• Formula
• According to Karl Pearson's method, the coefficient of correlation is measured as

$$ r = \frac{\sum xy}{N \, \sigma_x \, \sigma_y} $$

• Here, r = coefficient of correlation.
• $x = (X - \bar{X})$ = deviation of value X from its mean.
• $y = (Y - \bar{Y})$ = deviation of value Y from its mean.
• $\sigma_x$ = standard deviation of the X series.
• $\sigma_y$ = standard deviation of the Y series.
• N = number of pairs of observations.
• A modified version of Karl Pearson's formula:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} $$

• Here, r = coefficient of correlation.
• $x = (X - \bar{X})$ = deviation of value X from its mean.
• $y = (Y - \bar{Y})$ = deviation of value Y from its mean.
• $xy = (X - \bar{X})(Y - \bar{Y})$
• $x^2 = (X - \bar{X})^2$
• $y^2 = (Y - \bar{Y})^2$
STEPS:
• Calculate the arithmetic means of the X and Y series, $\bar{X}$ and $\bar{Y}$.
• Calculate the deviation of each observation in the X series from $\bar{X}$ and denote it by x.
• Calculate the deviation of each observation in the Y series from $\bar{Y}$ and denote it by y.
• Calculate the squares of the deviations in both series and obtain their totals, i.e. $\sum x^2$ and $\sum y^2$.
• Multiply the corresponding deviations of the X and Y series and add the products to obtain $\sum xy$.
• Apply the formula:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} $$
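
The steps above can be sketched in Python as follows. The function name `pearson_r` is illustrative, and the sketch assumes that neither series is constant (otherwise the denominator is zero). The data are the example tables given earlier in this material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    x = [value - mean_x for value in xs]      # deviations of the X series
    y = [value - mean_y for value in ys]      # deviations of the Y series
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Linear example: deviations are (-2, 0, 2) and (-5, 0, 5)
print(pearson_r([2, 4, 6], [5, 10, 15]))

# Negative-correlation example
print(round(pearson_r([8, 16, 24], [32, 24, 8]), 3))
```

For the linear example the sums are Σxy = 20, Σx² = 8 and Σy² = 50, so r = 20 / √400 = +1, a perfect positive correlation.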
STEPS (using assumed means):
1. Any convenient value in the X and Y series is taken as the assumed mean, Ax and Ay.
2. With the help of the assumed means of both series, the deviations of the individual values, i.e. dx = (X - Ax) and dy = (Y - Ay), are calculated.
3. $\sum d_x$ and $\sum d_y$ are found by adding the deviations.
4. The deviations of the two series are multiplied, as dx·dy, and the products are added up to obtain $\sum d_x d_y$.
5. The squares of the deviations, $d_x^2$ and $d_y^2$, are added up to find $\sum d_x^2$ and $\sum d_y^2$.
6. Finally, the coefficient of correlation is calculated using the formula.
• FORMULA:

$$ r = \frac{N \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N \sum d_x^2 - (\sum d_x)^2} \; \sqrt{N \sum d_y^2 - (\sum d_y)^2}} $$
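
A minimal Python sketch of these steps follows. The formula itself is not legible in this copy, so the sketch assumes the standard assumed-mean form, r = (N Σdxdy - Σdx Σdy) / (√(N Σdx² - (Σdx)²) × √(N Σdy² - (Σdy)²)). The function name `pearson_r_assumed_mean` and the choice of assumed means are illustrative.

```python
from math import sqrt

def pearson_r_assumed_mean(xs, ys, ax, ay):
    """Karl Pearson's r from deviations taken from assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]                 # step 2: deviations from Ax
    dy = [y - ay for y in ys]                 # step 2: deviations from Ay
    sum_dx, sum_dy = sum(dx), sum(dy)         # step 3
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))   # step 4
    sum_dx2 = sum(a * a for a in dx)          # step 5
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator            # step 6

xs, ys = [8, 16, 24], [32, 24, 8]             # negative-correlation example table
r_first = pearson_r_assumed_mean(xs, ys, ax=16, ay=24)
r_second = pearson_r_assumed_mean(xs, ys, ax=10, ay=30)
print(round(r_first, 4), round(r_second, 4))
```

Both calls give the same value, which illustrates that the result does not depend on which convenient values are chosen as assumed means.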
STEPS (using step deviations):
1. Any convenient value in the X and Y series is taken as the assumed mean, Ax and Ay.
2. With the help of the assumed means of both series, the deviations of the individual values, i.e. dx = (X - Ax) and dy = (Y - Ay), are calculated.
3. Now divide dx and dy by some common factor:

$$ d_x' = \frac{d_x}{C}, \qquad d_y' = \frac{d_y}{C} $$

Here C is the common factor for series X and series Y, and dx' and dy' are the step deviations.
4. The step deviations of the two series are multiplied, as dx' × dy', and the products are added up to obtain $\sum d_x' d_y'$.
5. The squares of the step deviations, $d_x'^2$ and $d_y'^2$, are added up to find $\sum d_x'^2$ and $\sum d_y'^2$.
6. Finally, apply the formula:

$$ r = \frac{N \sum d_x' d_y' - \sum d_x' \sum d_y'}{\sqrt{N \sum d_x'^2 - (\sum d_x')^2} \; \sqrt{N \sum d_y'^2 - (\sum d_y')^2}} $$
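
The step deviation procedure can be sketched in Python as below. As the formula is not legible in this copy, the sketch assumes it is the assumed-mean formula applied to the step deviations dx' and dy'. The function name `pearson_r_step_deviation` and the values chosen for Ax, Ay and C are illustrative.

```python
from math import sqrt

def pearson_r_step_deviation(xs, ys, ax, ay, c):
    """Karl Pearson's r from step deviations (X - ax) / c and (Y - ay) / c."""
    n = len(xs)
    dx = [(x - ax) / c for x in xs]           # step deviations dx'
    dy = [(y - ay) / c for y in ys]           # step deviations dy'
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

# Negative-correlation example table with Ax = 16, Ay = 24 and C = 8:
# dx' = (-1, 0, 1) and dy' = (1, 0, -2)
r_step = pearson_r_step_deviation([8, 16, 24], [32, 24, 8], ax=16, ay=24, c=8)
print(round(r_step, 4))
```

Dividing by C only makes the arithmetic smaller; the common factor cancels between numerator and denominator, so the value of r is unchanged.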


• Karl Pearson's correlation coefficient lies between -1 and +1, i.e., -1 ≤ r ≤ +1.
• r is a pure number and is independent of the units of measurement.
• A negative value of r indicates an inverse relationship between the variables; if r is positive, the two variables move in the same direction.
• If r = 0, there is no linear correlation between the variables.
• If r = +1, the correlation is perfect positive.
• If r = -1, the correlation is perfect negative.
• The coefficient of correlation is not affected by a change of scale or origin.
• Probable error is used to test the reliability of Karl Pearson's correlation coefficient.

$$ \text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$

• Here, r stands for the coefficient of correlation and N for the number of pairs of observations.
• Probable error is used to interpret the value of the correlation coefficient.
* If the value of r is less than the P.E., there is no evidence of correlation.
* If the value of r is more than six times the P.E., the correlation is significant.
* By adding the value of P.E. to the coefficient of correlation and subtracting it from the coefficient, we get respectively the upper and lower limits within which the coefficient of correlation in the population can be expected to lie.
* P.E. as a measure for interpreting the coefficient of correlation should be used only when N is large.
* P.E. as a measure for interpreting the coefficient of correlation should be used only when a sample study is being made and the sample is unbiased and representative.
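
The rules for probable error can be sketched in Python as follows, assuming the usual form of the formula, P.E. = 0.6745 × (1 - r²) / √N. The function names are illustrative. Two points are assumptions of this sketch rather than statements of the source: the absolute value of r is compared with the P.E. so that negative coefficients are handled, and values between one and six times the P.E. are labelled "inconclusive". The figures r = 0.8 and N = 25 are made up for illustration.

```python
from math import sqrt

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def interpret(r, n):
    """Apply the 'less than P.E.' and 'more than six times P.E.' rules."""
    pe = probable_error(r, n)
    if abs(r) < pe:                 # assumption: compare the size of r
        return "no evidence of correlation"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"           # assumption: label for the range in between

def population_limits(r, n):
    """Lower and upper limits r - P.E. and r + P.E."""
    pe = probable_error(r, n)
    return r - pe, r + pe

r, n = 0.8, 25                      # hypothetical sample values
print(round(probable_error(r, n), 6))
print(interpret(r, n))
print(tuple(round(limit, 4) for limit in population_limits(r, n)))
```

Here P.E. = 0.6745 × 0.36 / 5 ≈ 0.0486, and since 0.8 is more than six times this value, the correlation would be treated as significant.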
• MERITS:
• It is a practical and popular method.
• It leads to meaningful conclusions.
• It measures the degree and direction of correlation simultaneously.
• DEMERITS:
• It is strongly influenced by extreme values.
• The calculation process is long and time consuming.
• There is a possibility of wrong interpretation.
• It assumes a linear relationship between the variables.
• This method was propounded by the British psychologist Charles Edward Spearman in 1904 to calculate the coefficient of correlation of qualitative variables whose quantitative measurement is not possible. For example, we cannot measure honesty, intelligence, beauty, leadership, etc. quantitatively, but these variables can be assigned ranks. These ranks are used for the calculation of the coefficient of correlation under the Rank Correlation method. If the ranks of the X series are denoted by R1, the ranks of the Y series by R2, and the difference between R1 and R2 by D, then we can find the rank correlation with the help of the formula

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$

• Here, R = rank coefficient of correlation, $\sum D^2$ = the total of the squares of the differences of corresponding ranks, and N = number of pairs of observations.
• As in the case of r, -1 ≤ R ≤ +1.
• $\sum D$, the sum of the differences between R1 and R2, is always equal to zero.
• WHEN ACTUAL RANKS ARE GIVEN:
• Steps:
• i. Compute the differences of the ranks (R1 - R2) and denote them by D.
• ii. Compute $D^2$ and total them to get $\sum D^2$.
• iii. Apply the formula:

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$
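
A short Python sketch of these three steps follows. The function name `rank_correlation` and the two sets of ranks (for example, ranks given to five candidates by two judges) are hypothetical and only serve as an illustration.

```python
def rank_correlation(ranks_1, ranks_2):
    """Spearman's R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) from given ranks."""
    n = len(ranks_1)
    d = [r1 - r2 for r1, r2 in zip(ranks_1, ranks_2)]   # step i: D = R1 - R2
    sum_d2 = sum(value ** 2 for value in d)             # step ii: sum of D^2
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))        # step iii

judge_1 = [1, 2, 3, 4, 5]   # hypothetical ranks
judge_2 = [2, 1, 4, 3, 5]   # hypothetical ranks

differences = [a - b for a, b in zip(judge_1, judge_2)]
print(sum(differences))                     # the differences add up to zero
print(rank_correlation(judge_1, judge_2))
```

In this example D = (-1, 1, -1, 1, 0), so ΣD² = 4 and R = 1 - 24/120 = 0.8.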
• WHEN RANKS ARE NOT GIVEN:
• Steps:
• i. First of all, ranks are allotted to all the values of the two series, either in ascending or in descending order (the same order for both series).
• ii. Compute the differences of the ranks (R1 - R2) and denote them by D.
• iii. Compute $D^2$ and total them to get $\sum D^2$.
• iv. Apply the formula:

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$
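
When only the values are available, the ranks have to be allotted first. The Python sketch below ranks in descending order (the largest value gets rank 1) and assumes there are no equal values within a series. The function names and the two series of marks are hypothetical.

```python
def assign_ranks(values):
    """Rank 1 for the largest value; assumes all values are different."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(value) + 1 for value in values]

def rank_correlation_from_values(xs, ys):
    """Allot ranks to both series, then apply R = 1 - 6 * sum(D^2) / (N(N^2 - 1))."""
    r1, r2 = assign_ranks(xs), assign_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

marks_x = [50, 40, 30, 20, 10]   # hypothetical data
marks_y = [12, 15, 9, 7, 3]      # hypothetical data

print(assign_ranks(marks_x), assign_ranks(marks_y))
print(rank_correlation_from_values(marks_x, marks_y))
```

The ranks are (1, 2, 3, 4, 5) and (2, 1, 3, 4, 5), giving ΣD² = 2 and R = 1 - 12/120 = 0.9.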
• WHEN RANKS ARE EQUAL:
• If two or more items in a series have equal values, the problem of determining their ranks arises. In this situation a common rank (the average of the ranks they would otherwise occupy) is assigned to each equal value. In order to avoid the possibility of error, the following formula is used for the calculation of rank correlation in such a situation:

$$ R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots\right]}{N^3 - N} $$

• The correction factor $\frac{m^3 - m}{12}$ is added to the value of $\sum D^2$. Here m = number of items which have a common rank. The correction factor is added as many times as the number of such groups having equal ranks.
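
The treatment of equal ranks can be sketched in Python as follows. The function names and the data are hypothetical. Ranks are allotted in descending order, tied values share the average of the ranks they would otherwise occupy, and one correction factor (m³ - m)/12 is added for every group of tied values in either series.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the largest value; equal values share the average rank."""
    ordered = sorted(values, reverse=True)
    counts = Counter(values)
    rank_of = {}
    for value, count in counts.items():
        first = ordered.index(value) + 1          # first rank occupied by the group
        rank_of[value] = first + (count - 1) / 2  # average of the occupied ranks
    return [rank_of[value] for value in values]

def correction(values):
    """Sum of (m^3 - m) / 12 over every group of m equal values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation_with_ties(xs, ys):
    r1, r2 = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    total = sum_d2 + correction(xs) + correction(ys)
    return 1 - (6 * total) / (n ** 3 - n)

xs = [10, 20, 20, 30]   # hypothetical data with one tied pair (m = 2)
ys = [1, 2, 3, 4]       # hypothetical data without ties

print(average_ranks(xs), average_ranks(ys))
print(rank_correlation_with_ties(xs, ys))
```

Here the two values of 20 share rank 2.5, ΣD² = 0.5, the correction factor is (8 - 2)/12 = 0.5, and R = 1 - 6(1.0)/60 = 0.9.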
• MERITS:
• Its calculation is easier as compared to Karl Pearson's method.
• This method can be used as a measure of the degree of association between qualitative variables.

• DEMERITS:
• This method is not suitable for calculating the coefficient of correlation of a grouped frequency distribution.
• If the number of items is large, this method becomes difficult and unsuitable.