CORRELATION AND REGRESSION

CORRELATION ANALYSIS
 Karl Pearson developed the product-moment coefficient of correlation, formalizing a concept introduced by Francis Galton.
 Correlation refers to the study of the relationship between two or more variables.
 The relationship is termed co-variation or mutual interdependence between the variables.
 E.g. price and demand, height and weight, advertisement and sales, agricultural production and rainfall.
 The correlation coefficient summarizes both the direction and the magnitude of the relationship between the variables in one figure.

Course Teacher: Dr. D. Ramya


Types of correlation
 Positive and negative correlation
 When the two variables change in the same direction, there is a positive correlation between them.
 E.g. price and supply
 When the two variables change in opposite directions, there is a negative correlation between them.
 E.g. price and demand
 Linear and non-linear correlation
 If a change in one variable is accompanied by a change in the other variable in a constant ratio, the correlation is said to be linear; otherwise it is non-linear.


Types of correlation
 Simple, partial and multiple correlation
 When only two variables are studied, it is called simple correlation.
 When more than two variables are involved and the relationship between two of them is studied while the effect of the others is held constant, it is called partial correlation.
 When more than two variables are studied simultaneously, it is called multiple correlation.
 Spurious or nonsense correlation
 A correlation is spurious when two variables appear to be related but have no causal connection with each other.
 E.g. height of the students of a class and the grades they obtain in exams.


Correlation and Causation
 Correlation helps in determining the degree of relationship between two or more variables.
 It does not tell us anything about a cause and effect relationship.
 In many cases it is not possible to say which of the two variables is the cause and which is the effect.
 Moreover, a correlation can be calculated whether or not a cause and effect relationship exists between the variables.


Methods of calculating correlation
 Graphical method – Scatter diagram

 Mathematical or algebraic method

 Karl Pearson’s coefficient of correlation

 Spearman’s Rank correlation coefficient



Scatter diagram
 The scatter diagram method is the simplest way to study the correlation between two variables: each pair of values is plotted on a graph as a dot, giving as many points as there are observations.
 The degree of correlation is then judged by looking at the scatter of the points.


Scatter Diagram
 The degree to which the variables are related to each other depends on the manner in which the points are scattered over the chart.
 The more widely the plotted points are scattered over the chart, the lower the degree of correlation between the variables.
 The closer the plotted points lie to a straight line, the higher the degree of correlation.
 The degree of correlation is denoted by “r”.


Perfect Positive Correlation (r =+1)
 The correlation is said to be perfectly positive when
all the points lie on the straight line rising from the
lower left-hand corner to the upper right-hand corner.



Perfect Negative Correlation (r = -1)
 When all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner, the variables are said to be perfectly negatively correlated.


High Degree of +ve Correlation (r = +0.8)
 The degree of correlation is high when the plotted points fall within a narrow band, and it is positive when they show a rising tendency from the lower left-hand corner to the upper right-hand corner.


High Degree of –ve Correlation (r = –0.8)
 The degree of negative correlation is high when the plotted points fall within a narrow band and show a declining tendency from the upper left-hand corner to the lower right-hand corner.


Low Degree of +ve Correlation (r = +0.3)
 The correlation between the variables is said to be low but positive when the points are widely scattered over the graph and show a rising tendency from the lower left-hand corner to the upper right-hand corner.


Low Degree of –ve Correlation (r = –0.3)
 The degree of correlation is low and negative when the points are widely scattered over the graph and show a falling tendency from the upper left-hand corner to the lower right-hand corner.


No Correlation (r = 0)
 The variables are said to be unrelated when the points are haphazardly scattered over the graph and do not show any specific pattern. Here the correlation is absent and hence r = 0.


Karl Pearson’s coefficient of correlation
 The most commonly used method for computing correlation.
 Popularly known as the Pearsonian coefficient of correlation or the product-moment correlation.
 It is denoted by the symbol ‘r’.
 The formula for calculating ‘r’ is

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

where N = number of paired observations.


Other formulae for calculating correlation

 $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$

 $r = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$


 The following table gives aptitude test scores and productivity indices of 8 randomly selected workers:

Aptitude Score   57  58  59  59  60  61  62  64
Productivity     67  68  65  68  72  72  69  71

Calculate the correlation coefficient between aptitude score and productivity index.

Solution:

    X     Y     X2     Y2     XY
   57    67   3249   4489   3819
   58    68   3364   4624   3944
   59    65   3481   4225   3835
   59    68   3481   4624   4012
   60    72   3600   5184   4320
   61    72   3721   5184   4392
   62    69   3844   4761   4278
   64    71   4096   5041   4544
  480   552  28836  38132  33144   (totals)

$$
\begin{aligned}
r &= \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \\
  &= \frac{8 \times 33144 - 480 \times 552}{\sqrt{8 \times 28836 - 480^2}\,\sqrt{8 \times 38132 - 552^2}} \\
  &= \frac{265152 - 264960}{\sqrt{230688 - 230400}\,\sqrt{305056 - 304704}} \\
  &= \frac{192}{\sqrt{288}\,\sqrt{352}} = \frac{192}{16.97 \times 18.76} = \frac{192}{318.36} = 0.603
\end{aligned}
$$

r = 0.603
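The arithmetic in this worked example can be checked with a short script. The sketch below is not part of the original slides; the function name `pearson_r` is an illustrative choice, and it simply evaluates the raw-sum formula for r on the aptitude and productivity data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from raw sums: N*sum(XY) - sum(X)*sum(Y) over the two root terms."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data from the worked example (8 workers)
aptitude = [57, 58, 59, 59, 60, 61, 62, 64]
productivity = [67, 68, 65, 68, 72, 72, 69, 71]

r = pearson_r(aptitude, productivity)
print(round(r, 3))  # 0.603
```

The same function can be applied to the practice problems that supply raw X and Y values.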
 Practice Problems
1. Compute the coefficient of correlation between X-
Advertisement Expenditure and Y-Sales.
X: 10 12 18 8 13 20 22 15 5 17
Y: 88 90 94 86 87 92 96 94 88 85

2. From the following data, compute the coefficient of correlation between X and Y.

                                                            X       Y
Sum of squares of deviations from the arithmetic mean     8250     724
Sum of products of deviations of X and Y from their respective means: 2350
No. of pairs of observations: 10

3. Calculate the correlation coefficient from the following results:
N = 7; $\sum X = 182$; $\sum Y = 175$; $\sum X^2 = 4844$; $\sum Y^2 = 5131$; $\sum XY = 4826$
1. Compute the coefficient of correlation between X (Advertisement Expenditure) and Y (Sales).

 Solution:

    X     Y    X2     Y2     XY
   10    88   100   7744    880
   12    90   144   8100   1080
   18    94   324   8836   1692
    8    86    64   7396    688
   13    87   169   7569   1131
   20    92   400   8464   1840
   22    96   484   9216   2112
   15    94   225   8836   1410
    5    88    25   7744    440
   17    85   289   7225   1445
  140   900  2224  81130  12718   (totals)

$$
\begin{aligned}
r &= \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \\
  &= \frac{10 \times 12718 - 140 \times 900}{\sqrt{10 \times 2224 - 140^2}\,\sqrt{10 \times 81130 - 900^2}} \\
  &= \frac{127180 - 126000}{\sqrt{22240 - 19600}\,\sqrt{811300 - 810000}} \\
  &= \frac{1180}{\sqrt{2640}\,\sqrt{1300}} = \frac{1180}{51.38 \times 36.06} = \frac{1180}{1852.76} = 0.637
\end{aligned}
$$

r = 0.637
 From the following data, compute the coefficient of correlation between X and Y (Practice Problem 2).

 Solution:
Given
$$\sum (X - \bar{X})^2 = \sum x^2 = 8250$$
$$\sum (Y - \bar{Y})^2 = \sum y^2 = 724$$
$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum xy = 2350$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\,\sqrt{\sum y^2}} = \frac{2350}{\sqrt{8250}\,\sqrt{724}} = \frac{2350}{90.83 \times 26.91} = \frac{2350}{2444.24} = 0.961$$
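 Alternate Solution:
$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

$$\operatorname{cov}(x, y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N} = \frac{2350}{10} = 235$$

$$\sigma_x = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{8250}{10}} = \sqrt{825} = 28.72$$

$$\sigma_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{N}} = \sqrt{\frac{724}{10}} = \sqrt{72.4} = 8.51$$

$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{235}{28.72 \times 8.51} = \frac{235}{244.41} = 0.961$$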
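Both routes to r for this problem start from the same three deviation sums, so they must agree. The following sketch (function names are illustrative, not from the slides) evaluates the deviation formula and the covariance formula side by side. At full precision both give roughly 0.9615; the slide's 0.961 reflects rounding of the intermediate square roots.

```python
from math import sqrt

def r_from_deviation_sums(sum_xy, sum_x2, sum_y2):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y as deviations from their means."""
    return sum_xy / sqrt(sum_x2 * sum_y2)

def r_from_covariance(sum_xy, sum_x2, sum_y2, n):
    """r = cov(x, y) / (sigma_x * sigma_y), each term divided by N."""
    cov = sum_xy / n
    sigma_x = sqrt(sum_x2 / n)
    sigma_y = sqrt(sum_y2 / n)
    return cov / (sigma_x * sigma_y)

# Summary figures given in the problem
r1 = r_from_deviation_sums(2350, 8250, 724)
r2 = r_from_covariance(2350, 8250, 724, n=10)
print(round(r1, 4), round(r2, 4))
```

The factor N cancels between numerator and denominator, which is why the number of pairs does not affect the result here.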


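Calculate the correlation coefficient from the following results:
N = 7; $\sum X = 182$; $\sum Y = 175$; $\sum X^2 = 4844$; $\sum Y^2 = 5131$; $\sum XY = 4826$

Solution:

$$
\begin{aligned}
r &= \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \\
  &= \frac{7 \times 4826 - 182 \times 175}{\sqrt{7 \times 4844 - 182^2}\,\sqrt{7 \times 5131 - 175^2}} \\
  &= \frac{33782 - 31850}{\sqrt{33908 - 33124}\,\sqrt{35917 - 30625}} \\
  &= \frac{1932}{\sqrt{784}\,\sqrt{5292}} = \frac{1932}{28 \times 72.75} = \frac{1932}{2037} = 0.948
\end{aligned}
$$

r = 0.948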


Properties of the correlation coefficient
 The measure of correlation, called the coefficient of correlation, summarises in one figure the direction and the degree of correlation.
 The value of the correlation coefficient always lies between -1 and +1.
 The coefficient of correlation is independent of change of origin and scale of the variables X and Y.
 The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically, $r = \pm\sqrt{b_{XY} \cdot b_{YX}}$, where r takes the sign of the regression coefficients.
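The independence of r from origin and scale can be illustrated numerically. The sketch below is an assumption-labelled demonstration, not part of the slides: it reuses the aptitude and productivity data from the earlier worked example, and the shift and scale constants (60, 2, 70, 5) are arbitrary illustrative choices. Note that the property holds for positive scale factors; multiplying one variable by a negative constant reverses the sign of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

x = [57, 58, 59, 59, 60, 61, 62, 64]
y = [67, 68, 65, 68, 72, 72, 69, 71]

# Change of origin and scale: u = (X - 60) / 2, v = (Y - 70) * 5 (arbitrary constants)
u = [(xi - 60) / 2 for xi in x]
v = [(yi - 70) * 5 for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 3), round(r_uv, 3))  # the two values coincide
```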


Spearman’s Rank correlation coefficient
 This method is based on the ranks of the observations rather than on their actual values.
 If the items are ranked according to their attributes, the correlation between them is called rank correlation.
 This type of correlation is applied when the variables are qualitative in nature and can only be ranked, such as beauty, intelligence, talent, etc.
 Rank correlation is denoted by the symbol rs or 𝜌.
 The range of rank correlation is also -1 to +1.


Formula for calculating rank correlation
 Case (i) When the ranks are given
$$r_s = 1 - \frac{6\sum d^2}{N^3 - N}$$
where d = Rx - Ry (rank of x minus rank of y)
 Case (ii) When the ranks are not given (no repeated ranks)
$$r_s = 1 - \frac{6\sum d^2}{N^3 - N}$$
 Case (iii) When the ranks are not given (with repeated ranks)
$$r_s = 1 - \frac{6\left[\sum d^2 + \dfrac{m^3 - m}{12} + \dfrac{m^3 - m}{12} + \dfrac{m^3 - m}{12} + \cdots\right]}{N^3 - N}$$
where m = number of times a value is repeated; one correction term is added for each group of repeated values.


 Case (i) When the ranks are given:
10 competitors in a beauty contest were ranked in the following order by two judges. Find the rank correlation coefficient.

Judge 1   1   3   4   5   6   2   8   7  10   9
Judge 2   3   7   2   8   5  10   9   6   4   1

Solution:

  Rx   Ry   d = Rx - Ry    d2
   1    3       -2          4
   3    7       -4         16
   4    2        2          4
   5    8       -3          9
   6    5        1          1
   2   10       -8         64
   8    9       -1          1
   7    6        1          1
  10    4        6         36
   9    1        8         64
                   Total  200

$$r_s = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6 \times 200}{10^3 - 10} = 1 - \frac{1200}{990} = 1 - 1.212 = -0.212$$

$r_s = -0.212$
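As a check on the Case (i) calculation, the rank formula can be evaluated directly from the two judges' rankings. This is a minimal sketch; the function name `spearman_from_ranks` is illustrative, and it assumes the ranks contain no ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's r_s = 1 - 6*sum(d^2) / (N^3 - N) for untied ranks."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two equal-length rank lists with at least 2 items")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

judge_1 = [1, 3, 4, 5, 6, 2, 8, 7, 10, 9]
judge_2 = [3, 7, 2, 8, 5, 10, 9, 6, 4, 1]

r_s = spearman_from_ranks(judge_1, judge_2)
print(round(r_s, 3))  # -0.212
```

A small negative value such as this suggests the two judges' rankings are only weakly, and inversely, related.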
 Case (ii) When the ranks are not given and there are no repeated ranks
From the data given below, find Spearman’s rank correlation coefficient:

X:  52  63  45  36  72  65  47  25
Y:  62  53  51  25  79  43  60  33

Solution:

   X    Y   Rx   Ry   d = Rx - Ry   d2
  52   62    4    2        2         4
  63   53    3    4       -1         1
  45   51    6    5        1         1
  36   25    7    8       -1         1
  72   79    1    1        0         0
  65   43    2    6       -4        16
  47   60    5    3        2         4
  25   33    8    7        1         1
                            Total   28

$$r_s = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6 \times 28}{8^3 - 8} = 1 - \frac{168}{504} = 1 - 0.33 = 0.67$$

$r_s = 0.67$
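 Case (iii) When the ranks are not given (with repeated ranks)
Find Spearman’s rank correlation coefficient from the following data:

X   68  64  75  50  64  80  75  40  55  64
Y   62  58  68  45  81  60  68  48  50  70

 Solution:
Tied values share the average of the ranks they would otherwise occupy (highest value ranked 1).

   X    Y    Rx    Ry   d = Rx - Ry    d2
  68   62    4     5        -1          1
  64   58    6     7        -1          1
  75   68    2.5   3.5      -1          1
  50   45    9    10        -1          1
  64   81    6     1         5         25
  80   60    1     6        -5         25
  75   68    2.5   3.5      -1          1
  40   48   10     9         1          1
  55   50    8     8         0          0
  64   70    6     2         4         16
                               Total   72

Repeated values: in X, 75 occurs twice (m = 2) and 64 occurs three times (m = 3); in Y, 68 occurs twice (m = 2).

$$
\begin{aligned}
r_s &= 1 - \frac{6\left[\sum d^2 + \dfrac{m^3 - m}{12} + \dfrac{m^3 - m}{12} + \dfrac{m^3 - m}{12}\right]}{N^3 - N} \\
    &= 1 - \frac{6\left[72 + \dfrac{2^3 - 2}{12} + \dfrac{3^3 - 3}{12} + \dfrac{2^3 - 2}{12}\right]}{10^3 - 10} \\
    &= 1 - \frac{6 \times (72 + 0.5 + 2 + 0.5)}{990} = 1 - \frac{450}{990} = 1 - 0.455 = 0.545
\end{aligned}
$$

$r_s = 0.545$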
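The Case (iii) procedure (assign average ranks to tied values, then add one correction term per group of ties) can be sketched in code as follows. The helper names `average_ranks`, `tie_correction` and `spearman_with_ties` are illustrative, and the largest value is assumed to receive rank 1, as in the earlier examples.

```python
from collections import Counter

def average_ranks(values):
    """Rank with the largest value as 1; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value that occurs m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    """r_s = 1 - 6 * (sum(d^2) + corrections) / (N^3 - N)."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman_with_ties(x, y), 3))  # 0.545
```

When no value is repeated the correction is zero, so the same function also covers Case (ii).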


Practice Problems:
 Ten competitors in a music contest are ranked by three judges in the following order:
Judge A 1 5 4 8 9 6 10 7 3 2
Judge B 4 8 7 6 5 9 10 3 2 1
Judge C 6 7 8 1 5 10 9 2 3 4
Use the rank correlation coefficient to discuss which pair of judges has the nearest approach to common tastes in music (compare rAB, rAC and rBC).
 Calculate spearman’s rank correlation coefficient:

X: 39 65 62 90 82 75 25 98 36
Y: 47 53 58 86 62 68 60 91 51

 Calculate the rank correlation coefficient:

X: 30 50 25 30 60 70 30 65 75 85
Y: 50 60 30 40 70 50 90 60 40 80



REGRESSION ANALYSIS
 The word ‘regression’ means ‘stepping back’ or ‘going back’.
 Sir Francis Galton introduced the word ‘regression’ in his study of the relationship between the heights of fathers and sons.
 Regression refers to the average or functional relationship between the variables.
 This measure helps to estimate or predict unknown values of one variable from the known values of another variable.
 The known variable which is used to estimate an unknown variable is called the independent variable (cause).
 The unknown variable whose value is to be predicted is called the dependent variable (effect).
Regression coefficients
 The regression coefficients indicate the degree and direction of change in the dependent variable with respect to a change in the independent variable.
 bXY and bYX are known as the regression coefficients.
 The regression coefficient of X on Y is given by

$$b_{XY} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2} \quad\text{or}\quad b_{XY} = r\,\frac{\sigma_X}{\sigma_Y}$$

 The regression coefficient of Y on X is given by

$$b_{YX} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} \quad\text{or}\quad b_{YX} = r\,\frac{\sigma_Y}{\sigma_X}$$


Regression Equations
 A regression equation is the algebraic expression of a regression line.
 If two variables X and Y are under study, there are two regression lines, X on Y and Y on X. Hence there are two regression equations.
 The regression equation of X on Y is used to describe the variation in the values of X for given changes in Y, and the regression equation of Y on X is used to describe the variation in the values of Y for given changes in X.
 The regression equations are estimated through the regression coefficients.
Calculating Regression Equations
 Regression equation of X on Y:
It is used to find the value of X for a given value of Y
$$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$$
where $b_{XY}$ is the regression coefficient of X on Y
 Regression equation of Y on X:
It is used to find the value of Y for a given value of X
$$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$$
where $b_{YX}$ is the regression coefficient of Y on X
 $\bar{X} = \dfrac{\sum X}{N}$ and $\bar{Y} = \dfrac{\sum Y}{N}$


 From the following data obtain the two regression equations

X   6   2  10   4   8
Y   9  11   5   8   7

Solution:

   X    Y    X2    Y2    XY
   6    9    36    81    54
   2   11     4   121    22
  10    5   100    25    50
   4    8    16    64    32
   8    7    64    49    56
  30   40   220   340   214   (totals)
Regression Coefficients

X on Y:
$$b_{XY} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2} = \frac{5 \times 214 - 30 \times 40}{5 \times 340 - 40^2} = \frac{1070 - 1200}{1700 - 1600} = \frac{-130}{100} = -1.3$$

Y on X:
$$b_{YX} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \frac{5 \times 214 - 30 \times 40}{5 \times 220 - 30^2} = \frac{1070 - 1200}{1100 - 900} = \frac{-130}{200} = -0.65$$


Regression Equations

$$\bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6 \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8$$

 X on Y
$$
\begin{aligned}
X - \bar{X} &= b_{XY}\,(Y - \bar{Y}) \\
X - 6 &= -1.3\,(Y - 8) \\
X - 6 &= -1.3Y + 10.4 \\
X &= -1.3Y + 16.4
\end{aligned}
$$

 Y on X
$$
\begin{aligned}
Y - \bar{Y} &= b_{YX}\,(X - \bar{X}) \\
Y - 8 &= -0.65\,(X - 6) \\
Y - 8 &= -0.65X + 3.9 \\
Y &= -0.65X + 11.9
\end{aligned}
$$
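The two regression lines for this example can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `regression_lines` and its return layout (slope, intercept for each line) are assumptions made for this example, and the formulas are the raw-sum expressions for bXY and bYX.

```python
def regression_lines(xs, ys):
    """Return ((b_xy, a_xy), (b_yx, a_yx)) for X = a_xy + b_xy*Y and Y = a_yx + b_yx*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    cross = n * sum_xy - sum_x * sum_y           # shared numerator
    b_xy = cross / (n * sum_y2 - sum_y ** 2)     # regression coefficient of X on Y
    b_yx = cross / (n * sum_x2 - sum_x ** 2)     # regression coefficient of Y on X
    mean_x, mean_y = sum_x / n, sum_y / n
    a_xy = mean_x - b_xy * mean_y                # intercept of the X-on-Y line
    a_yx = mean_y - b_yx * mean_x                # intercept of the Y-on-X line
    return (b_xy, a_xy), (b_yx, a_yx)

x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
(b_xy, a_xy), (b_yx, a_yx) = regression_lines(x, y)
print(f"X = {b_xy:.2f}Y + {a_xy:.2f}")   # X = -1.30Y + 16.40
print(f"Y = {b_yx:.2f}X + {a_yx:.2f}")   # Y = -0.65X + 11.90
```

Both lines pass through the point of means (6, 8), which is why the intercepts follow directly from the slopes.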


 Calculate the two regression equations for the data given below and predict the value of X when Y = 20.

X  10  12  13  17  18
Y   5   6   7   9  13

 Solution:

   X    Y     X2    Y2    XY
  10    5    100    25    50
  12    6    144    36    72
  13    7    169    49    91
  17    9    289    81   153
  18   13    324   169   234
  70   40   1026   360   600   (totals)
Regression Coefficients

X on Y:
$$b_{XY} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2} = \frac{5 \times 600 - 70 \times 40}{5 \times 360 - 40^2} = \frac{3000 - 2800}{1800 - 1600} = \frac{200}{200} = 1$$

Y on X:
$$b_{YX} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2} = \frac{5 \times 600 - 70 \times 40}{5 \times 1026 - 70^2} = \frac{3000 - 2800}{5130 - 4900} = \frac{200}{230} = 0.87$$


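Regression Equations

$$\bar{X} = \frac{\sum X}{N} = \frac{70}{5} = 14 \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8$$

 X on Y
$$
\begin{aligned}
X - \bar{X} &= b_{XY}\,(Y - \bar{Y}) \\
X - 14 &= 1\,(Y - 8) \\
X - 14 &= Y - 8 \\
X &= Y + 6
\end{aligned}
$$

 Y on X
$$
\begin{aligned}
Y - \bar{Y} &= b_{YX}\,(X - \bar{X}) \\
Y - 8 &= 0.87\,(X - 14) \\
Y - 8 &= 0.87X - 12.18 \\
Y &= 0.87X - 4.18
\end{aligned}
$$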


Prediction or Estimation:
Calculation of X when Y=20
Equation of X on Y
𝑋 =𝑌+6
X = 20+6
X = 26
The estimated value of X when Y = 20 is 26
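The prediction step can be sketched as a small function that fits the X-on-Y line and evaluates it at a new Y. The name `predict_x_from_y` is illustrative and the deviation form of bXY is used here instead of the raw-sum form; both give the same coefficient. Since Y = 20 lies outside the observed range of Y (5 to 13), the estimate is an extrapolation and should be read with that in mind.

```python
def predict_x_from_y(xs, ys, y_new):
    """Estimate X at y_new from the regression line of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b_XY = sum of products of deviations / sum of squared deviations of Y
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b_xy = s_xy / s_yy
    return mean_x + b_xy * (y_new - mean_y)

x = [10, 12, 13, 17, 18]
y = [5, 6, 7, 9, 13]
print(predict_x_from_y(x, y, 20))  # 26.0
```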



1. From the following information on the values of two variables X and Y, find the two regression lines and the correlation coefficient.
N = 7; $\sum X = 182$; $\sum Y = 175$; $\sum X^2 = 4844$; $\sum Y^2 = 5131$; $\sum XY = 4826$

2. Find out from the following data:
a. Coefficient of correlation
b. The two regression equations
c. Most likely value of X when Y = 12
d. Most likely value of Y when X = 22

X   2   8  10  -2   5  -4
Y   3   2   5  10  -2  -3


Properties of Regression Coefficients
 The geometric mean of the two regression coefficients is the coefficient of correlation. Symbolically, $r = \pm\sqrt{b_{XY} \cdot b_{YX}}$
 If bYX is positive then bXY will also be positive. If bYX is negative then bXY will also be negative.
 The arithmetic mean of the two regression coefficients is never smaller in magnitude than the correlation coefficient. Symbolically, $\left|\dfrac{b_{XY} + b_{YX}}{2}\right| \ge |r|$
 Regression coefficients are not affected by a change of origin, but they are affected by a change of scale.
 The coefficient of correlation and the regression coefficients have the same sign. If the regression coefficients are positive, the correlation coefficient is also positive, and vice versa.
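The first and third properties follow from the standard-deviation form of the regression coefficients given earlier ($b_{XY} = r\,\sigma_X/\sigma_Y$ and $b_{YX} = r\,\sigma_Y/\sigma_X$). A short derivation, added here as a supporting sketch:

$$
\begin{aligned}
b_{XY} \cdot b_{YX} &= \left(r\,\frac{\sigma_X}{\sigma_Y}\right)\left(r\,\frac{\sigma_Y}{\sigma_X}\right) = r^2
\quad\Longrightarrow\quad |r| = \sqrt{b_{XY} \cdot b_{YX}} \\
\frac{|b_{XY}| + |b_{YX}|}{2} &\ge \sqrt{|b_{XY}|\,|b_{YX}|} = |r| \quad \text{(arithmetic mean} \ge \text{geometric mean)}
\end{aligned}
$$

For the first regression example above, $b_{XY} = -1.3$ and $b_{YX} = -0.65$, so $r^2 = 0.845$ and $r = -0.919$ (negative, like both coefficients), while the mean of the coefficients is $-0.975$, which is larger in magnitude than $r$.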


Uses of regression analysis
 Regression analysis helps in establishing a functional
relationship between two or more variables
 It helps to predict the value of dependent variables
from the known value of independent variables
 It is widely used in the statistical estimation of demand
curve, supply curve, production function, cost
function, consumption function etc.
 It is used to calculate coefficient of correlation.
 Since most of the problems of economic analysis are
based on cause and effect relationships, the regression
analysis is a very valuable tool in economic and
business research.
Difference between correlation and regression