Chapter 6 Correlation and Regression
CORRELATION ANALYSIS
TYPES OF CORRELATION
Example: X = 1, 2, 3, 4, 5, 6, 7, 8
Y = 3 + 2X
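r = Σxy / √(Σx² · Σy²), where x = X - X̅ and y = Y - Y̅
Or
r = (N·ΣXY - ΣX·ΣY) / √{[N·ΣX² - (ΣX)²] · [N·ΣY² - (ΣY)²]} (Direct Method)
Under the direct method, X and Y are the values given in the problem, and there
is no need to calculate the means of the values.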
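As a minimal sketch, the deviation form r = Σxy / √(Σx² · Σy²) and the direct method on raw values can both be written in a few lines of Python. The function names are illustrative, not from the text, and the data are the example X = 1, ..., 8 with Y = 3 + 2X.

```python
from math import sqrt

def pearson_deviation(X, Y):
    """Deviation form: r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the respective means."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [a - mean_x for a in X]
    y = [b - mean_y for b in Y]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return sum_xy / sqrt(sum_x2 * sum_y2)

def pearson_direct(X, Y):
    """Direct method: uses the raw values, no means are computed."""
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_x2 = sum(a * a for a in X)
    sum_y2 = sum(b * b for b in Y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [3 + 2 * v for v in X]  # exact straight line
print(pearson_deviation(X, Y), pearson_direct(X, Y))
```

Both functions should agree on any data set, and for points lying exactly on a rising straight line they give r = 1.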
There is a linear relationship between the two variables, i.e. when the two
variables are plotted on a scatter diagram, the points form a straight line.
A cause-and-effect relationship exists between the forces operating on the
items of the two variable series.
CAUSALITY
An individual with a higher level of income may have both higher levels of
savings and spending.
We might find that there is a positive correlation between level of savings and
level of spending, but this does not mean that one variable causes the other.
We should mention the very interesting case where two related variables are
separated by several steps in a cause-effect chain of events.
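Example Problem:
The heights (in centimeters) and weights (in kilograms) of 10 athletes in a team
are:
Height (X)  182 185 190 194 197 199 202 205 208 210
Weight (Y)   75  74  80  82  84  88  94  96  96  98
Find the coefficient of correlation.
HEIGHT  WEIGHT
(X)     (Y)     x = X-X̅   y = Y-Y̅   xy       x²       y²
182     75      -15.2     -11.7     177.84   231.04   136.89
185     74      -12.2     -12.7     154.94   148.84   161.29
190     80       -7.2      -6.7      48.24    51.84    44.89
194     82       -3.2      -4.7      15.04    10.24    22.09
197     84       -0.2      -2.7       0.54     0.04     7.29
199     88        1.8       1.3       2.34     3.24     1.69
202     94        4.8       7.3      35.04    23.04    53.29
205     96        7.8       9.3      72.54    60.84    86.49
208     96       10.8       9.3     100.44   116.64    86.49
210     98       12.8      11.3     144.64   163.84   127.69
1972    867                         751.60   809.60   728.10
X̅ = ΣX/N = 1972/10 = 197.2
Y̅ = ΣY/N = 867/10 = 86.7
r = Σxy / √(Σx² · Σy²) = 751.6 / √(809.6 × 728.1) = 0.98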
Rank Correlation
Rank correlation was developed by the British psychologist Charles
Edward Spearman in 1904. It is applicable to variables that cannot be
measured quantitatively but can be ranked, such as beauty, decision-making
ability, or leadership skill. Rank correlation is also used when the data are
not normally distributed or when the pattern of distribution cannot be
recognized, whereas Pearson's correlation is based on the assumption that the
data are normally distributed. Rank correlation is calculated by using the
formula,
R = 1 - 6ΣD² / [N(N² - 1)]
where D is the difference between the two ranks of an item and N is the number
of items ranked.
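Example:
The ranks of 12 beauty contestants for their beauty and their attitude are tabulated below. Find the
correlation coefficient and interpret the result.
Beauty rank (R1)   Attitude rank (R2)   D (R1 - R2)   D²
 4                 11                   -7            49
 3                  9                   -6            36
 7                  7                    0             0
 9                  5                    4            16
10                  2                    8            64
 1                 10                   -9            81
 2                  8                   -6            36
 4                 12                   -8            64
12                  6                    6            36
 8                  3                    5            25
 6                  1                    5            25
11                  4                    7            49
                                        ΣD² =        481
R = 1 - 6ΣD² / [N(N² - 1)]
= 1 - 6(481) / [12(12² - 1)]
= 1 - 2886 / (12 × 143)
= 1 - 2886 / 1716
= 1 - 1.6818
= -0.6818
The ranks for beauty and attitude are negatively correlated: contestants ranked
high on one tend to be ranked low on the other.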
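The rank correlation formula R = 1 - 6ΣD² / [N(N² - 1)] can be sketched in Python as below. The function name is illustrative, the assignment of the first column to beauty and the second to attitude is an assumption, and the ranks are used exactly as tabulated. The formula itself assumes no tied ranks.

```python
def spearman_rank(rank1, rank2):
    """Spearman's rank correlation: 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(rank1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks of the 12 contestants as listed in the table (column order assumed).
beauty = [4, 3, 7, 9, 10, 1, 2, 4, 12, 8, 6, 11]
attitude = [11, 9, 7, 5, 2, 10, 8, 12, 6, 3, 1, 4]

rho = spearman_rank(beauty, attitude)
print(round(rho, 4))  # -0.6818
```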
REGRESSION ANALYSIS
Regression technique
(i) Helps to find the equation of X on Y, and
(ii) Helps to find the equation of Y on X, by using the formulas below.
Regression equation of X on Y
(X - X̅) = r (σx/σy) (Y - Y̅)
r (σx/σy) = Σxy / Σy²
Regression equation of Y on X
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx²
where x = X - X̅ and y = Y - Y̅.
Example:
The following data give the experience of machine operators and their
performance ratings, measured as the number of good parts turned out per 100
pieces.
Operator   Experience (X)   Performance Rating (Y)
1          16               87
2          12               88
3          18               89
4           4               68
5           3               78
6          10               80
7           5               75
8          12               83
Calculate the regression line of performance rating on experience and estimate
the probable performance of an operator who has 7 years of experience.
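Solution:
X     Y     x = X-X̅   y = Y-Y̅   x²    xy
16    87     6         6        36    36
12    88     2         7         4    14
18    89     8         8        64    64
 4    68    -6       -13        36    78
 3    78    -7        -3        49    21
10    80     0        -1         0     0
 5    75    -5        -6        25    30
12    83     2         2         4     4
80    648                      218   247
X̅ = ΣX/N = 80/8 = 10        Y̅ = ΣY/N = 648/8 = 81
Regression equation of Y on X:
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx² = 247/218 = 1.13
Y - 81 = 1.13 (X - 10)
Y = 1.13X + 69.7
If X = 7,
Y = 1.13 × 7 + 69.7
Y = 77.61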
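The same estimate can be checked with a short Python sketch. The helper name `fit_y_on_x` is hypothetical, and the slope is taken as Σxy / Σx² with deviations measured from the means.

```python
def fit_y_on_x(X, Y):
    """Return (slope, intercept) of the regression line of Y on X."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y))
    sum_x2 = sum((a - mean_x) ** 2 for a in X)
    slope = sum_xy / sum_x2
    intercept = mean_y - slope * mean_x
    return slope, intercept

experience = [16, 12, 18, 4, 3, 10, 5, 12]
rating = [87, 88, 89, 68, 78, 80, 75, 83]

slope, intercept = fit_y_on_x(experience, rating)
estimate = slope * 7 + intercept  # operator with 7 years of experience
print(round(slope, 2), round(intercept, 2), round(estimate, 2))
```

With the unrounded slope the estimate comes out near 77.60. The hand calculation gives 77.61 because it rounds the slope to 1.13 first.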
Example Problems:
1. Estimate (a) the sales for an advertising expenditure of 100 lakh rupees and
(b) the advertising expenditure for sales of 47 crore rupees from the data given
below.
Sales X (in crore rupees)   Advertisement exp. Y (Rs. in lakhs)
14                          52
16                          62
18                          65
20                          70
24                          76
30                          80
32                          78
Answer:
X     x = X-X̅   x²    Y     y = Y-Y̅   y²    xy
14    -8         64   52    -17       289   136
16    -6         36   62     -7        49    42
18    -4         16   65     -4        16    16
20    -2          4   70      1         1    -2
24     2          4   76      7        49    14
30     8         64   80     11       121    88
32    10        100   78      9        81    90
154             288   483             606   384
X̅ = ΣX/N = 154/7 = 22        Y̅ = ΣY/N = 483/7 = 69
Regression equation of X on Y,
(X - X̅) = r (σx/σy) (Y - Y̅)
r (σx/σy) = Σxy / Σy² = 384/606 = 0.63
X - 22 = 0.63 (Y - 69)
X = 0.63Y - 21.47
Regression equation of Y on X,
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx² = 384/288 = 1.33
Y - 69 = 1.33 (X - 22)
Y = 1.33X + 39.74
a) X = 0.63Y - 21.47
= 0.63 × 100 - 21.47
= 41.53 crores
b) Y = 1.33X + 39.74
= 1.33 × 47 + 39.74
= 102.25 lakhs
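A brief Python sketch of both regression lines for this data set follows. The function name `regression_lines` is hypothetical. The line of X on Y is used to estimate sales from advertising, and the line of Y on X to estimate advertising from sales.

```python
def regression_lines(X, Y):
    """Return ((b_xy, a_x), (b_yx, a_y)) such that
    X = b_xy * Y + a_x   (regression of X on Y)
    Y = b_yx * X + a_y   (regression of Y on X)."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(X, Y))
    sum_x2 = sum((a - mean_x) ** 2 for a in X)
    sum_y2 = sum((b - mean_y) ** 2 for b in Y)
    b_xy = sum_xy / sum_y2
    b_yx = sum_xy / sum_x2
    return (b_xy, mean_x - b_xy * mean_y), (b_yx, mean_y - b_yx * mean_x)

sales = [14, 16, 18, 20, 24, 30, 32]    # X, crore rupees
advert = [52, 62, 65, 70, 76, 80, 78]   # Y, lakh rupees

(b_xy, a_x), (b_yx, a_y) = regression_lines(sales, advert)
sales_for_100 = b_xy * 100 + a_x   # (a) sales when advertising is 100 lakhs
advert_for_47 = b_yx * 47 + a_y    # (b) advertising when sales are 47 crores
print(round(sales_for_100, 2), round(advert_for_47, 2))
```

Because the coefficients are not rounded to two decimals, the printed estimates differ slightly from the hand-worked 41.53 and 102.25.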
Standard deviation (σ)   5   25
Correlation coefficient is 0.8
(1) Calculate the two regression equations.
(2) Find the likely sales when advertisement expenditure is 25 crores.
(3) What should be the advertisement budget if the company wants to attain a
sales target of 150 crores?
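Answers:
X̅ = 20   Y̅ = 120
σx = 5   σy = 25
r = 0.8
(1) Regression equation of Y on X:
Y - Y̅ = r (σy/σx) (X - X̅)
Y - 120 = 0.8 (25/5) (X - 20)
Y - 120 = 4 (X - 20)
Y - 120 = 4X - 80
Y = 4X - 80 + 120
Y = 4X + 40
Regression equation of X on Y:
(X - X̅) = r (σx/σy) (Y - Y̅)
X - 20 = 0.8 (5/25) (Y - 120)
X - 20 = 0.16 (Y - 120)
X = 0.16Y + 0.8
(2) Y = 4X + 40
= 4 × 25 + 40
= 140 crores
(3) X = 0.16Y + 0.8
= 0.16 × 150 + 0.8
= 24.80 crores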
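When only the means, standard deviations and r are given, the two regression lines follow directly from the coefficients r·σy/σx and r·σx/σy. The sketch below assumes X is advertisement expenditure and Y is sales, as in the worked answer. The function name is illustrative.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Build both regression lines from summary statistics.
    Returns ((b_yx, a_y), (b_xy, a_x)) with
    Y = b_yx * X + a_y  and  X = b_xy * Y + a_x."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    a_y = mean_y - b_yx * mean_x
    a_x = mean_x - b_xy * mean_y
    return (b_yx, a_y), (b_xy, a_x)

(b_yx, a_y), (b_xy, a_x) = lines_from_summary(20, 120, 5, 25, 0.8)

sales_at_25 = b_yx * 25 + a_y      # likely sales for advertisement of 25
budget_for_150 = b_xy * 150 + a_x  # advertisement needed for sales of 150
print(round(sales_at_25, 2), round(budget_for_150, 2))  # 140.0 24.8
```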
3. The R&D of a Chain Store wants to make a catchment analysis to start its
new branch. As the number of customers visiting the new branch depends
upon the distance of the location of the branch from the city limit, it wants to
make a correlation and regression analysis. It provides you the following data
of number of customers (in thousands) and the distance (in kilometers).
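No. of customers (X)   10   8   6   5   3   2
Distance (Y)           16  24  28  35  38  40
Solution:
x = X - X̅, y = Y - Y̅
X̅ = ΣX/N = 34/6 = 5.67
Y̅ = ΣY/N = 181/6 = 30.17
r = Σxy / √(Σx² · Σy²) = -136.67 / √(45.33 × 424.83)
= -0.98
b) When y is 5 km, x (customers) by regression
Regression of x on y
(X - X̅) = r (σx/σy) (Y - Y̅)
r (σx/σy) = Σxy / Σy² = -136.67/424.83 = -0.32
Regression of y on x
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx² = -136.67/45.33 = -3.01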
Height (X) 182 185 190 194 197 199 202 205 208 210
Weight (Y) 75 74 80 82 84 88 94 96 96 98
Calculate:
1. Fit the regression line of y on x.
2. Calculate the estimated weight of an athlete whose height is measured
215 cm.
1) Regression line of y on x
(y - y̅) = r (σy/σx) (x - x̅)
r (σy/σx) = Σ(x - x̅)(y - y̅) / Σ(x - x̅)² = 751.6/809.6 = 0.93
y - 86.7 = 0.93 (x - 197.2)
y = 0.93x - 96.70
2) Estimated weight for x = 215 cm
y = 0.93 (215) - 96.70
y = 103.25 kg
5. The marks of six students in science and mathematics are given below:
Mathematics 84 72 90 65 60 55
Science 65 60 78 68 55 52
Determine the regression lines and calculate the expected marks in science for a student who has
secured 95 in mathematics.
X̅ (Mathematics) = ΣX/N = 426/6 = 71
Y̅ (Science) = ΣY/N = 378/6 = 63
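The remaining steps of this problem can be sketched in Python, taking mathematics as X and science as Y. The variable names are illustrative. The code only applies the regression-coefficient formulas Σxy / Σx² and Σxy / Σy² to the marks given above.

```python
maths = [84, 72, 90, 65, 60, 55]     # X
science = [65, 60, 78, 68, 55, 52]   # Y

n = len(maths)
mean_x, mean_y = sum(maths) / n, sum(science) / n  # 71 and 63

# Sums of products and squares of deviations from the means.
sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(maths, science))
sum_x2 = sum((a - mean_x) ** 2 for a in maths)
sum_y2 = sum((b - mean_y) ** 2 for b in science)

b_yx = sum_xy / sum_x2  # slope of science on mathematics
b_xy = sum_xy / sum_y2  # slope of mathematics on science

# Expected science marks for a student with 95 in mathematics.
predicted_science = mean_y + b_yx * (95 - mean_x)
print(round(b_yx, 4), round(b_xy, 4), round(predicted_science, 2))
```

The prediction uses the line of science on mathematics, since science is the variable being estimated.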
6. Five diabetics aged 52, 63, 65, 72 and 80 years had blood sugar levels of
105, 115, 125, 126 and 130 mg/dL, respectively.
1. Find the equation of the regression line of age on blood sugar level.
2. Based on this data, what is the approximate sugar level of an 85-year-old
diabetic patient?
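X (age)   X²      Y (sugar)   Y²      XY
52        2704    105         11025    5460
63        3969    115         13225    7245
65        4225    125         15625    8125
72        5184    126         15876    9072
80        6400    130         16900   10400
332       22482   601         72651   40302
X̅ = ΣX/N = 332/5 = 66.4
Y̅ = ΣY/N = 601/5 = 120.2
Σxy = ΣXY - ΣX·ΣY/N = 395.6, Σx² = ΣX² - (ΣX)²/N = 437.2, Σy² = ΣY² - (ΣY)²/N = 410.8
1. Regression line of age on blood sugar level (X on Y):
(X - X̅) = r (σx/σy) (Y - Y̅)
r (σx/σy) = Σxy / Σy² = 395.6/410.8 = 0.96
X - 66.4 = 0.96 (Y - 120.2)
2. Regression line of Y on X:
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx² = 395.6/437.2 = 0.90
For X = 85: Y = 120.2 + 0.90 (85 - 66.4)
Y = 136.94 mg/dL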
7. From the following data of hours worked in a factory (x) and output units (y), determine the
regression line of y on x, the linear correlation coefficient and determine the type of correlation.
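Hours (X)        80  79  83  84  78  60  82  85  79  84  80  62
Production (Y)  300 302 315 330 300 250 300 340 315 330 310 240
X̅ = ΣX/N = 936/12 = 78
Y̅ = ΣY/N = 3632/12 = 302.67
Regression line of y on x:
r (σy/σx) = Σxy / Σx² = 2612/752 = 3.47
(Y - Y̅) = r (σy/σx) (X - X̅)
Y - 302.67 = 3.47 (X - 78)
Linear correlation coefficient:
r = Σxy / √(Σx² · Σy²) = 2612 / √(752 × 9968.67) = 0.95
Since r is close to +1, the correlation is strong and positive.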
X     Y
0.5   9
1     9
1.5   8.5
2     8
2.5   6
3     5
3.5   5
1     5
1. Calculate the correlation coefficient.
2. Determine the equation of the regression line of y on x.
3. If a person sleeps twelve hours, how many hours will he allot for his
study?
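Solution:
X   Y   x = (X - X̅)   x²   y = (Y - Y̅)   y²   xy
X̅ = ΣX/N = 15/8 = 1.875
Y̅ = ΣY/N = 55.5/8 = 6.9375
Σxy = -9.3125, Σx² = 7.875, Σy² = 24.2188
1. Correlation coefficient
r = Σxy / √(Σx² · Σy²) = -9.3125 / √(7.875 × 24.2188)
r = -0.6743
2. Regression line of Y on X
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx² = -9.3125/7.875 = -1.1825
3) Regression line of X on Y
(X - X̅) = r (σx/σy) (Y - Y̅)
r (σx/σy) = Σxy / Σy² = -9.3125/24.2188 = -0.3845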
20-40 11
40-60 8
60-80 9
80-100 1
Marks (X)   Y   M (mid-value of X)   x = M - X̅   y = Y - Y̅   x²   y²   xy
X̅ = ΣM/N = 50
Y̅ = ΣY/N = 8.2
r = Σxy / √(Σx² · Σy²)
r = -0.877
10. The following table gives the details of salary income and the
money spent in beauty parlor of 40 randomly selected women in a
metropolitan city. Find out the correlation coefficient between salary and
parlor expense.
Salary per month (Rs. in '000)   Money spent in parlor per month (Rs. in '00)
Below 10                         15
10 - 20                          20
20 - 30                          25
30 - 40                          25
40 - 50                          35
Above 50                         40
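X   Y   M (mid-value of X)   x = M - X̅   y = Y - Y̅   x²   y²   xy
Mid-values: 5, 15, 25, 35, 45, 55 (taking the open-ended classes as 0 - 10 and 50 - 60)
X̅ = ΣM/N = 180/6 = 30
Y̅ = ΣY/N = 160/6 = 26.67 ≈ 27
r = Σxy / √(Σx² · Σy²) = 850 / √(1750 × 433.33)
r = 0.98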
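For grouped data such as the salary classes, each class is represented by its mid-value before the correlation formula is applied. The sketch below assumes the open-ended classes "Below 10" and "Above 50" have the same width as the others (0 to 10 and 50 to 60). This is an assumption, not something the data state, and the variable names are illustrative.

```python
from math import sqrt

# Class limits in Rs. '000; the first and last classes are closed by assumption.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
spend = [15, 20, 25, 25, 35, 40]  # Rs. '00 per month

# Represent each salary class by its mid-value.
mid = [(low + high) / 2 for low, high in classes]

n = len(mid)
mean_x, mean_y = sum(mid) / n, sum(spend) / n
sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(mid, spend))
sum_x2 = sum((a - mean_x) ** 2 for a in mid)
sum_y2 = sum((b - mean_y) ** 2 for b in spend)

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(mid, round(mean_x, 2), round(mean_y, 2), round(r, 4))
```

The unrounded mean of Y is used here, so the deviations sum to zero, which the deviation formula requires.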
11. The following table summarizes the scores of six salesmen in a sales training (x) and their sales
in the first month (y), in thousand rupees.
X   35  42  32  65  24  76
Y   62  82  58  90  35  94
2. Calculate the regression line of y on x and predict the sales of a salesman who obtains 50 in training.
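X   Y   x = X - X̅   y = Y - Y̅   x²   y²   xy
X̅ = ΣX/N = 274/6 = 45.67
Y̅ = ΣY/N = 421/6 = 70.17
Correlation coefficient r = Σxy / √(Σx² · Σy²)
r = 2078.33 / √(2077.33 × 2552.83)
r = 2078.33 / 2302.84
r = 0.9025
2. Regression line of Y on X
(Y - Y̅) = r (σy/σx) (X - X̅)
r (σy/σx) = Σxy / Σx² = 2078.33/2077.33 = 1.0005
For X = 50: Y = 70.17 + 1.0005 (50 - 45.67) = 74.50 thousand rupees
3) Regression line of X on Y
(X - X̅) = r (σx/σy) (Y - Y̅)
X - 45.67 = r (σx/σy) (Y - 70.17)
r (σx/σy) = Σxy / Σy² = 2078.33/2552.83 = 0.8141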
12. Two customers are asked to rank eight varieties of designer sarees.
The ranks given by them are as follows:
Saree Codes: D101 D102 D103 D104 D105 D106 D107 D108
Customer 1: 7 5 8 3 2 4 6 1
Customer 2: 5 6 3 7 3 2 8 4
Calculate Spearman’s rank correlation coefficient.
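S.C    R1 (Customer 1)   R2 (Customer 2)   D (R1 - R2)   D²
D101   7                 5                  2             4
D102   5                 6                 -1             1
D103   8                 3                  5            25
D104   3                 7                 -4            16
D105   2                 3                 -1             1
D106   4                 2                  2             4
D107   6                 8                 -2             4
D108   1                 4                 -3             9
                                           ΣD² =         64
R = 1 - 6ΣD² / [N(N² - 1)]
= 1 - 6(64) / [8(8² - 1)]
= 1 - 384 / (8 × 63)
= 1 - 384 / 504
= 1 - 0.7619
= 0.2381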
No. of Students: 240 174 216 302 232 253 132 151
X   Y   x = (X - X̅)   y = (Y - Y̅)   x²   y²   xy
X̅ = ΣX/N = 1700/8 = 212.5
Y̅ = ΣY/N = 181.88
r = Σxy / √(Σx² · Σy²)
r = 0.9650