Notes Regression

Uploaded by

Madhura Tambe
Copyright
© All Rights Reserved
REGRESSION

Regression is a method of estimating the value of one variable when the value of the other is known and the two variables are correlated.
If highly correlated variables are plotted on a graph, the points lie in a narrow strip. If the strip is nearly straight, we can draw a line such that all the points lie close to it on both sides. Such a line represents the ideal (average) variation and is called the line of best fit, or the line of regression. It is chosen so that the sum of the squares of the deviations of the points from the line is a minimum. However, the deviation is not measured by dropping a perpendicular from a point to the line. Instead, the deviations are measured vertically and horizontally: minimising the vertical deviations gives one line, and minimising the horizontal deviations gives a second line. Thus we get two lines of regression.
Line of regression of $y$ on $x$
If we minimise the deviations of the points from the line measured parallel to the $y$-axis, we get a line called the line of regression of $y$ on $x$. Its equation is written in the form $y = a + bx$. This line is used for estimating the value of $y$ for a given value of $x$.

Line of regression of $x$ on $y$
If we minimise the deviations of the points from the line measured parallel to the $x$-axis, we get a line called the line of regression of $x$ on $y$. Its equation is written in the form $x = a + by$. This line is used for estimating the value of $x$ for a given value of $y$.
Method of Least Squares
First Method (Normal Equation Method)
The equation of the line of regression of $y$ on $x$ is

$$y = a + bx$$

where the values of $a$ and $b$ are obtained by solving the normal equations ($n$ is the number of pairs of observations)

$$\sum y = na + b\sum x$$

$$\sum xy = a\sum x + b\sum x^2$$

The equation of the line of regression of $x$ on $y$ is

EM III_SMITA N
$$x = a + by$$

where the values of $a$ and $b$ are obtained by solving the normal equations

$$\sum x = na + b\sum y$$

$$\sum xy = a\sum y + b\sum y^2$$
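The two normal equations form a 2 × 2 linear system in $a$ and $b$. As a minimal sketch (the function name `fit_y_on_x` is our own illustration, not part of these notes), the Python below accumulates the four sums and solves the system by Cramer's rule; passing the columns in swapped order gives the line of $x$ on $y$. The data are those of Examples 1 and 2 below.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    for the line y = a + b*x, using Cramer's rule on the 2x2 system."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx          # determinant of [[n, sx], [sx, sxx]]
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Example 1 data: line of regression of y on x
xs = [70, 72, 74, 76, 78, 80]
ys = [163, 170, 179, 188, 196, 220]
a, b = fit_y_on_x(xs, ys)
print(f"y = {a:.4f} + {b:.4f}x")     # y = -212.5714 + 5.3143x
print(f"y(73) = {a + b * 73:.2f}")   # y(73) = 175.37

# Line of x on y: pass the columns in swapped order (Example 2 data)
a2, b2 = fit_y_on_x([1, 2, 4, 4, 5, 7, 8, 9], [1, 3, 4, 6, 8, 9, 11, 14])
print(f"x = {a2:.4f} + {b2:.4f}y")   # x = -0.5000 + 1.5000y
```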

Example 1
Obtain the equation of the line of regression of $y$ on $x$ from the following data and estimate $y$ when $x = 73$.

x: 70 72 74 76 78 80
y: 163 170 179 188 196 220

Solution

x y xy x²
70 163 11410 4900
72 170 12240 5184
74 179 13246 5476
76 188 14288 5776
78 196 15288 6084
80 220 17600 6400
$\sum x = 450$, $\sum y = 1116$, $\sum xy = 84072$, $\sum x^2 = 33820$

Substituting in the normal equations:

$$\sum y = na + b\sum x \;\Rightarrow\; 1116 = 6a + 450b \qquad (1)$$

$$\sum xy = a\sum x + b\sum x^2 \;\Rightarrow\; 84072 = 450a + 33820b \qquad (2)$$

Solving (1) and (2), we get $a = -212.5714$ and $b = 5.3143$.

$$y = a + bx = -212.5714 + 5.3143x$$

When $x = 73$, $y = -212.5714 + 5.3143(73) = 175.37$.

Example 2
Obtain the equation of the line of regression of $x$ on $y$ from the following data.

x: 1 3 4 6 8 9 11 14
y: 1 2 4 4 5 7 8 9

Solution

x y xy y²
1 1 1 1
3 2 6 4
4 4 16 16
6 4 24 16
8 5 40 25
9 7 63 49
11 8 88 64
14 9 126 81
$\sum x = 56$, $\sum y = 40$, $\sum xy = 364$, $\sum y^2 = 256$

Substituting in the normal equations:

$$\sum x = na + b\sum y \;\Rightarrow\; 56 = 8a + 40b \qquad (1)$$

$$\sum xy = a\sum y + b\sum y^2 \;\Rightarrow\; 364 = 40a + 256b \qquad (2)$$

Solving (1) and (2), we get $a = -0.5$ and $b = 1.5$.

$$x = a + by = -0.5 + 1.5y$$

Example 3
Find the lines of regression and hence find the coefficient of correlation.

x: 65 66 67 67 68 69 70 72
y: 67 68 65 66 72 72 69 71

Solution

x y xy x² y²
65 67 4355 4225 4489
66 68 4488 4356 4624
67 65 4355 4489 4225
67 66 4422 4489 4356
68 72 4896 4624 5184
69 72 4968 4761 5184
70 69 4830 4900 4761
72 71 5112 5184 5041
$\sum x = 544$, $\sum y = 550$, $\sum xy = 37426$, $\sum x^2 = 37028$, $\sum y^2 = 37864$

Line of regression of $y$ on $x$
Substituting in the normal equations:

$$\sum y = na + b\sum x \;\Rightarrow\; 550 = 8a + 544b \qquad (1)$$

$$\sum xy = a\sum x + b\sum x^2 \;\Rightarrow\; 37426 = 544a + 37028b \qquad (2)$$

Solving (1) and (2), we get $a = 19.6389$ and $b = 0.7222$.

$$y = a + bx = 19.6389 + 0.7222x$$

Line of regression of $x$ on $y$

$$\sum x = na + b\sum y \;\Rightarrow\; 544 = 8a + 550b \qquad (3)$$

$$\sum xy = a\sum y + b\sum y^2 \;\Rightarrow\; 37426 = 550a + 37864b \qquad (4)$$

Solving (3) and (4), we get $a = 33.2913$ and $b = 0.5049$.

$$x = a + by = 33.2913 + 0.5049y$$

The slopes of the two lines are the regression coefficients $b_{yx} = 0.7222$ and $b_{xy} = 0.5049$, so

$$r^2 = b_{yx}\,b_{xy} = (0.7222)(0.5049) \approx 0.3646$$

$$r = \sqrt{0.3646} \approx 0.6038$$

($r$ is positive because both regression coefficients are positive.)
Example 4

Find the lines of regression.

x: 5 6 7 8 9 10 11
y: 11 14 14 15 12 17 16

Solution

x y xy x² y²
5 11 55 25 121
6 14 84 36 196
7 14 98 49 196
8 15 120 64 225
9 12 108 81 144
10 17 170 100 289
11 16 176 121 256
$\sum x = 56$, $\sum y = 99$, $\sum xy = 811$, $\sum x^2 = 476$, $\sum y^2 = 1427$

Line of regression of $y$ on $x$
Substituting in the normal equations:

$$\sum y = na + b\sum x \;\Rightarrow\; 99 = 7a + 56b \qquad (1)$$

$$\sum xy = a\sum x + b\sum x^2 \;\Rightarrow\; 811 = 56a + 476b \qquad (2)$$

Solving (1) and (2), we get $a = 8.7143$ and $b = 0.6786$.

$$y = a + bx = 8.7143 + 0.6786x$$

Line of regression of $x$ on $y$

$$\sum x = na + b\sum y \;\Rightarrow\; 56 = 7a + 99b \qquad (3)$$

$$\sum xy = a\sum y + b\sum y^2 \;\Rightarrow\; 811 = 99a + 1427b \qquad (4)$$

Solving (3) and (4), we get $a = -2.0053$ and $b = 0.7074$.

$$x = a + by = -2.0053 + 0.7074y$$
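Rounding to four decimals hides a useful fact that the later examples rely on: both lines of regression pass through the point of means $(\bar{x}, \bar{y})$. A minimal sketch using exact rational arithmetic (Python's `fractions.Fraction`; the helper name `normal_equation_line` is hypothetical) solves the normal equations of Example 4 without rounding and checks this.

```python
from fractions import Fraction


def normal_equation_line(us, vs):
    """Exactly solve the normal equations for the line v = a + b*u:
         sum(v)  = n*a      + b*sum(u)
         sum(uv) = a*sum(u) + b*sum(u^2)
    """
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    b = Fraction(n * suv - su * sv, n * suu - su * su)   # slope
    a = (sv - b * su) / n                                # from the first equation
    return a, b


# Example 4 data
xs = [5, 6, 7, 8, 9, 10, 11]
ys = [11, 14, 14, 15, 12, 17, 16]

a_yx, b_yx = normal_equation_line(xs, ys)   # y = a_yx + b_yx * x
a_xy, b_xy = normal_equation_line(ys, xs)   # x = a_xy + b_xy * y
x_bar = Fraction(sum(xs), len(xs))
y_bar = Fraction(sum(ys), len(ys))

print(a_yx, b_yx)   # 61/7 19/28
print(a_xy, b_xy)   # -377/188 133/188
# Each line passes through the point of means
print(a_yx + b_yx * x_bar == y_bar, a_xy + b_xy * y_bar == x_bar)   # True True
```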

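SECOND METHOD
METHOD OF LEAST SQUARES

Line of regression of $y$ on $x$

$$y - \bar{y} = b_{yx}(x - \bar{x}), \qquad \text{where } b_{yx} = r\frac{\sigma_y}{\sigma_x}$$

Line of regression of $x$ on $y$

$$x - \bar{x} = b_{xy}(y - \bar{y}), \qquad \text{where } b_{xy} = r\frac{\sigma_x}{\sigma_y}$$

Here
$\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$ are the means of $x$ and $y$ ($n$, also written $N$, is the number of pairs of observations),
$b_{yx}$ and $b_{xy}$ are the regression coefficients,
$\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$,
$r$ is the coefficient of correlation.

In terms of the sums, the regression coefficients are

$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}}, \qquad b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}}$$

We know that

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \dfrac{(\sum y)^2}{N}}}$$

and therefore

$$b_{yx}\,b_{xy} = r\frac{\sigma_y}{\sigma_x}\cdot r\frac{\sigma_x}{\sigma_y} = r^2$$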
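The computational formulas above need only $N$ and the five sums. A short Python sketch (the function name `regression_from_sums` is illustrative) evaluates $b_{yx}$, $b_{xy}$ and $r$ from those sums; it reproduces the values of Examples 3 and 6 of this method and confirms $b_{yx}b_{xy} = r^2$.

```python
import math


def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (b_yx, b_xy, r) from N, sum x, sum y, sum x^2, sum y^2, sum xy."""
    s_xy = sxy - sx * sy / n    # sum(xy) - sum(x) sum(y) / N
    s_xx = sxx - sx * sx / n    # sum(x^2) - (sum x)^2 / N
    s_yy = syy - sy * sy / n    # sum(y^2) - (sum y)^2 / N
    b_yx = s_xy / s_xx
    b_xy = s_xy / s_yy
    r = s_xy / math.sqrt(s_xx * s_yy)
    return b_yx, b_xy, r


# Example 3 (heights and weights of 100 students)
b_yx, b_xy, r = regression_from_sums(100, 15000, 6800, 2272500, 463025, 1022250)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))   # 0.1 3.6 0.6

# Example 6 (8 observations)
b_yx6, b_xy6, r6 = regression_from_sums(8, 59, 40, 524, 256, 364)
print(round(b_xy6, 4), round(r6, 4))                 # 1.2321 0.9781
```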

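Properties of regression coefficients

1) The coefficient of correlation is the geometric mean of the two coefficients of regression.

$$b_{yx}\,b_{xy} = r\frac{\sigma_y}{\sigma_x}\cdot r\frac{\sigma_x}{\sigma_y} = r^2 \quad\Rightarrow\quad r = \pm\sqrt{b_{yx}\,b_{xy}}$$

Because $\sigma_x$ and $\sigma_y$ are positive, $b_{yx}$, $b_{xy}$ and $r$ always have the same sign:
$r$ is positive if $b_{yx}$ and $b_{xy}$ are positive;
$r$ is negative if $b_{yx}$ and $b_{xy}$ are negative.
In particular, if $b_{yx}$ is negative then $b_{xy}$ is also negative.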

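2) If one coefficient of regression is numerically greater than one, then the other must be numerically less than one.
Since $-1 \le r \le 1$,

$$r^2 \le 1 \;\Rightarrow\; b_{yx}\,b_{xy} \le 1 \;\Rightarrow\; |b_{yx}| \le \frac{1}{|b_{xy}|}$$

Hence if $|b_{xy}| > 1$, then $|b_{yx}| < 1$.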

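3) The arithmetic mean of the coefficients of regression is greater than or equal to the coefficient of correlation (when $r > 0$; for $r < 0$ the inequality reverses, so in general $\left|\frac{b_{yx} + b_{xy}}{2}\right| \ge |r|$).

To prove: $\dfrac{b_{yx} + b_{xy}}{2} \ge r$, that is,

$$\frac{1}{2}\left(r\frac{\sigma_y}{\sigma_x} + r\frac{\sigma_x}{\sigma_y}\right) \ge r$$

Dividing both sides by $r > 0$, it is enough to prove

$$\frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} \ge 2$$

Multiplying by $\sigma_x\sigma_y > 0$, this becomes

$$\sigma_y^2 + \sigma_x^2 \ge 2\sigma_x\sigma_y \iff \sigma_y^2 + \sigma_x^2 - 2\sigma_x\sigma_y \ge 0 \iff (\sigma_x - \sigma_y)^2 \ge 0,$$

which is obviously true.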

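Angle between the lines of regression

If $\theta$ is the acute angle between the two lines of regression, then

$$\tan\theta = \frac{1 - r^2}{|r|}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

(i) If $r = 0$, then $\tan\theta = \infty$ and $\theta = \dfrac{\pi}{2}$: the lines of regression are perpendicular to each other.
(ii) If $r = \pm 1$, then $\tan\theta = 0$ and $\theta = 0$: the lines of regression coincide.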
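The angle formula can be checked numerically. In the sketch below (function names are illustrative), one function evaluates the $\tan\theta$ formula directly and the other computes the acute angle from the slopes of the two lines drawn in the same $(x, y)$ plane: the line of $y$ on $x$ has slope $b_{yx} = r\sigma_y/\sigma_x$, while the line of $x$ on $y$ has slope $dy/dx = 1/b_{xy} = \sigma_y/(r\sigma_x)$. The two agree, and the limiting cases $r = 0$ and $r = \pm 1$ behave as stated above.

```python
import math


def angle_from_formula(r, sigma_x, sigma_y):
    """Acute angle (radians) between the two regression lines, via tan(theta)."""
    if r == 0:
        return math.pi / 2            # tan(theta) is infinite: perpendicular lines
    tan_theta = (1 - r * r) / abs(r) * sigma_x * sigma_y / (sigma_x**2 + sigma_y**2)
    return math.atan(tan_theta)


def angle_from_slopes(r, sigma_x, sigma_y):
    """Same angle from the two slopes in the (x, y) plane (requires r != 0)."""
    m1 = r * sigma_y / sigma_x        # y on x: slope b_yx
    m2 = sigma_y / (r * sigma_x)      # x on y: slope dy/dx = 1 / b_xy
    return math.atan(abs((m2 - m1) / (1 + m1 * m2)))


# Illustrative values (those of Example 11 later in these notes)
r, sx, sy = 0.6, 14, 20
print(math.degrees(angle_from_formula(r, sx, sy)))
print(math.degrees(angle_from_slopes(r, sx, sy)))
print(angle_from_formula(0, sx, sy) == math.pi / 2, angle_from_formula(1, sx, sy))  # True 0.0
```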

Example 1
The following table gives the age of a car of a certain make and its annual maintenance cost. Obtain the equation of the line of regression of cost on age.

Age of a car (x): 2 4 6 8
Maintenance cost (y): 1 2 2.5 3

Solution
Line of regression of $y$ on $x$:

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

x y xy x²
2 1 2 4
4 2 8 16
6 2.5 15 36
8 3 24 64
$\sum x = 20$, $\sum y = 8.5$, $\sum xy = 49$, $\sum x^2 = 120$

$$\bar{x} = \frac{\sum x}{n} = \frac{20}{4} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{8.5}{4} = 2.125$$

$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{49 - \dfrac{(20)(8.5)}{4}}{120 - \dfrac{20^2}{4}} = \frac{6.5}{20} = 0.325$$

Substituting in $y - \bar{y} = b_{yx}(x - \bar{x})$:

$$y - 2.125 = 0.325(x - 5)$$

$$y - 2.125 = 0.325x - 1.625$$

$$y = 0.325x - 1.625 + 2.125$$

$$y = 0.325x + 0.5$$

Example 2

Find the lines of regression.

x: 10 12 13 16 17 20 25
y: 19 22 24 27 29 33 37

Solution
Line of regression of $y$ on $x$:

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

x y xy x² y²
10 19 190 100 361
12 22 264 144 484
13 24 312 169 576
16 27 432 256 729
17 29 493 289 841
20 33 660 400 1089
25 37 925 625 1369
$\sum x = 113$, $\sum y = 191$, $\sum xy = 3276$, $\sum x^2 = 1983$, $\sum y^2 = 5449$

$$\bar{x} = \frac{\sum x}{n} = \frac{113}{7} = 16.1428, \qquad \bar{y} = \frac{\sum y}{n} = \frac{191}{7} = 27.2857$$

$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{3276 - \dfrac{(113)(191)}{7}}{1983 - \dfrac{113^2}{7}} = 1.2131$$

$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{3276 - \dfrac{(113)(191)}{7}}{5449 - \dfrac{191^2}{7}} = 0.8116$$

Substituting in $y - \bar{y} = b_{yx}(x - \bar{x})$:

$$y - 27.2857 = 1.2131(x - 16.1428)$$

$$y = 1.2131x - 19.5828 + 27.2857$$

$$y = 1.2131x + 7.7029$$

Line of regression of $x$ on $y$:

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

$$x - 16.1428 = 0.8116(y - 27.2857)$$

$$x - 16.1428 = 0.8116y - 22.1450$$

$$x = 0.8116y - 22.1450 + 16.1428$$

$$x = 0.8116y - 6.0022$$

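Example 3
The following data regarding the heights ($y$) and the weights ($x$) of 100 college students are given:

$$\sum x = 15000, \quad \sum x^2 = 2272500, \quad \sum y = 6800, \quad \sum y^2 = 463025, \quad \sum xy = 1022250$$

Find the correlation coefficient between height and weight, and state the equation of the line of regression of height on weight.

Solution

$$\bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150, \qquad \bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68$$

$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{1022250 - \dfrac{(15000)(6800)}{100}}{2272500 - \dfrac{15000^2}{100}} = \frac{2250}{22500} = 0.1$$

$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{1022250 - \dfrac{(15000)(6800)}{100}}{463025 - \dfrac{6800^2}{100}} = \frac{2250}{625} = 3.6$$

$$r^2 = b_{yx}\,b_{xy} = (0.1)(3.6) = 0.36$$

$$r = \sqrt{0.36} = 0.6$$

Line of regression of $y$ (height) on $x$ (weight):

$$y - 68 = 0.1(x - 150)$$

$$y - 68 = 0.1x - 15$$

$$y = 0.1x - 15 + 68$$

$$y = 0.1x + 53$$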

Example 4

Find the regression coefficients and the coefficient of correlation from the following data:

$$N = 12, \quad \sum x = 120, \quad \sum x^2 = 1392, \quad \sum y = 432, \quad \sum y^2 = 18252, \quad \sum xy = 4992$$

Solution

$$\bar{x} = \frac{\sum x}{n} = \frac{120}{12} = 10, \qquad \bar{y} = \frac{\sum y}{n} = \frac{432}{12} = 36$$

$$b_{yx} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{4992 - \dfrac{(120)(432)}{12}}{1392 - \dfrac{120^2}{12}} = \frac{672}{192} = 3.5$$

$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{4992 - \dfrac{(120)(432)}{12}}{18252 - \dfrac{432^2}{12}} = \frac{672}{2700} = 0.2489$$

$$r^2 = b_{yx}\,b_{xy} = (3.5)\left(\frac{672}{2700}\right) = 0.8711$$

$$r = \sqrt{0.8711} = 0.9333$$

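Example 5
Given: variance of $x = 25$. The equations of the two lines of regression are $5x - y = 22$ and $64x - 45y = 24$.
Find (i) $\bar{x}$ and $\bar{y}$, (ii) $\sigma_y$, (iii) $r$.

Solution

Variance of $x$: $\sigma_x^2 = 25$, so $\sigma_x = 5$.

Both lines of regression pass through the point $(\bar{x}, \bar{y})$, so solving the given equations simultaneously gives the means:

$$\bar{x} = 6, \qquad \bar{y} = 8$$

If we take $5x - y = 22$ as the regression equation of $x$ on $y$:

$$5x = y + 22 \;\Rightarrow\; x = \frac{1}{5}y + \frac{22}{5}, \qquad b_{xy} = \frac{1}{5} = 0.2$$

and $64x - 45y = 24$ as the regression equation of $y$ on $x$:

$$45y = 64x - 24 \;\Rightarrow\; y = \frac{64}{45}x - \frac{24}{45}, \qquad b_{yx} = \frac{64}{45} = 1.4222$$

(The opposite choice gives $b_{yx} = 5$ and $b_{xy} = \frac{45}{64}$, whose product $\frac{225}{64}$ exceeds 1, which is impossible.)

$$r^2 = b_{yx}\,b_{xy} = (1.4222)(0.2) = 0.2844$$

$$r = \sqrt{0.2844} = 0.5333$$

Using $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ with $\sigma_x = 5$:

$$0.2 = (0.5333)\frac{5}{\sigma_y} = \frac{2.6665}{\sigma_y}$$

$$\sigma_y = \frac{2.6665}{0.2} = 13.33$$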

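Example 6

From 8 observations the following results were obtained:

$$\sum x = 59, \quad \sum x^2 = 524, \quad \sum y = 40, \quad \sum y^2 = 256, \quad \sum xy = 364$$

Find the equation of the line of regression of $x$ on $y$ and the coefficient of correlation.

Solution
Line of regression of $x$ on $y$:

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

$$\bar{x} = \frac{\sum x}{N} = \frac{59}{8} = 7.375, \qquad \bar{y} = \frac{\sum y}{N} = \frac{40}{8} = 5$$

$$b_{xy} = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sum y^2 - \dfrac{(\sum y)^2}{N}} = \frac{364 - \dfrac{(59)(40)}{8}}{256 - \dfrac{40^2}{8}} = \frac{69}{56} = 1.2321$$

Substituting in $x - \bar{x} = b_{xy}(y - \bar{y})$:

$$x - 7.375 = 1.2321(y - 5)$$

$$x = 1.2321y - 6.1605 + 7.375$$

$$x = 1.2321y + 1.2145$$

Coefficient of correlation:

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{N}}\,\sqrt{\sum y^2 - \dfrac{(\sum y)^2}{N}}} = \frac{364 - \dfrac{(59)(40)}{8}}{\sqrt{524 - \dfrac{59^2}{8}}\,\sqrt{256 - \dfrac{40^2}{8}}} = \frac{69}{\sqrt{(88.875)(56)}} = 0.9781$$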

Example 7
The equations of two lines of regression are

$$x = 19.13 - 0.87y, \qquad y = 11.64 - 0.50x$$

Find (i) the means of $x$ and $y$ and (ii) the coefficient of correlation between $x$ and $y$.

Solution

Solving the given equations, we get the means:

$$\bar{x} = 15.93, \qquad \bar{y} = 3.672$$

Taking $x = 19.13 - 0.87y$ as the regression equation of $x$ on $y$: $b_{xy} = -0.87$.

Taking $y = 11.64 - 0.50x$ as the regression equation of $y$ on $x$: $b_{yx} = -0.50$.

$$r^2 = b_{yx}\,b_{xy} = (-0.50)(-0.87) = 0.435$$

Since both regression coefficients are negative, $r$ is negative:

$$r = -\sqrt{0.435} = -0.6595$$
Example 8
In a partially destroyed laboratory record of an analysis of correlation data, the following results are legible: variance of $x = 9$; equations of the lines of regression $4x - 5y + 33 = 0$ and $20x - 9y - 107 = 0$.
Find (i) the mean values of $x$ and $y$, (ii) the standard deviation of $y$, (iii) the coefficient of correlation between $x$ and $y$.

Solution

Solving $4x - 5y + 33 = 0$ and $20x - 9y - 107 = 0$, we get the means

$$\bar{x} = 13, \qquad \bar{y} = 17$$

If we take $4x - 5y + 33 = 0$ as the line of regression of $x$ on $y$, then

$$4x = 5y - 33 \;\Rightarrow\; x = \frac{5}{4}y - \frac{33}{4}, \qquad b_{xy} = \frac{5}{4}$$

If we take $20x - 9y - 107 = 0$ as the line of regression of $y$ on $x$, then

$$9y = 20x - 107 \;\Rightarrow\; y = \frac{20}{9}x - \frac{107}{9}, \qquad b_{yx} = \frac{20}{9}$$

Both $b_{xy}$ and $b_{yx}$ are greater than 1, so $r^2 = b_{yx}\,b_{xy} = \frac{25}{9} > 1$, which is impossible because $r^2$ cannot exceed 1. Hence our assumption is wrong.

So we take $4x - 5y + 33 = 0$ as the line of regression of $y$ on $x$:

$$5y = 4x + 33 \;\Rightarrow\; y = \frac{4}{5}x + \frac{33}{5}, \qquad b_{yx} = \frac{4}{5}$$

and $20x - 9y - 107 = 0$ as the line of regression of $x$ on $y$:

$$20x = 9y + 107 \;\Rightarrow\; x = \frac{9}{20}y + \frac{107}{20}, \qquad b_{xy} = \frac{9}{20}$$

$$r^2 = b_{yx}\,b_{xy} = \left(\frac{4}{5}\right)\left(\frac{9}{20}\right) = 0.36$$

$$r = \sqrt{0.36} = 0.6$$

Variance of $x$: $\sigma_x^2 = 9$, so $\sigma_x = 3$. Using $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$:

$$\frac{9}{20} = (0.6)\frac{3}{\sigma_y} \;\Rightarrow\; 0.45 = \frac{1.8}{\sigma_y}$$

$$\sigma_y = \frac{1.8}{0.45} = 4$$
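The trial-and-check reasoning used in Examples 5, 8, 9 and 14 can be mechanised. Both regression lines pass through $(\bar{x}, \bar{y})$, so their intersection gives the means; then one assignment of the lines is tried and, if $b_{yx}b_{xy}$ falls outside $[0, 1]$, the roles are swapped. (The two possible products are reciprocals of each other, so at most one assignment is valid unless the lines coincide.) The sketch below writes each line as $px + qy = c$ and assumes all $x$ and $y$ coefficients are nonzero; the function name is hypothetical.

```python
import math


def identify_regression_lines(line1, line2):
    """Each line is a tuple (p, q, c) meaning p*x + q*y = c, with p, q != 0.

    Returns (x_bar, y_bar, b_yx, b_xy, r).
    """
    p1, q1, c1 = line1
    p2, q2, c2 = line2
    det = p1 * q2 - p2 * q1
    x_bar = (c1 * q2 - c2 * q1) / det          # intersection = point of means
    y_bar = (p1 * c2 - p2 * c1) / det

    # Trial 1: line1 is y on x, line2 is x on y
    b_yx, b_xy = -p1 / q1, -q2 / p2
    if not 0 <= b_yx * b_xy <= 1:
        # Trial 2: swap the roles of the two lines
        b_yx, b_xy = -p2 / q2, -q1 / p1
    if not 0 <= b_yx * b_xy <= 1:
        raise ValueError("no assignment gives 0 <= r^2 <= 1")
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # r shares the sign of b_yx
    return x_bar, y_bar, b_yx, b_xy, r


# Example 8: 4x - 5y + 33 = 0 and 20x - 9y - 107 = 0
x_bar, y_bar, b_yx, b_xy, r = identify_regression_lines((4, -5, -33), (20, -9, 107))
sigma_x = 3                          # from variance of x = 9
sigma_y = r * sigma_x / b_xy         # from b_xy = r * sigma_x / sigma_y
print(x_bar, y_bar, b_yx, b_xy, round(r, 4), round(sigma_y, 4))
# 13.0 17.0 0.8 0.45 0.6 4.0
```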

Example 9

The regression lines of a sample are

$$x + 6y = 6, \qquad 3x + 2y = 10$$

Find (i) the means of $x$ and $y$ and (ii) the coefficient of correlation between $x$ and $y$. Also estimate $y$ when $x = 12$.

Solution

Solving the given equations, we get the means:

$$\bar{x} = 3, \qquad \bar{y} = 0.5$$

If we take $x + 6y = 6$, that is $x = -6y + 6$, as the regression equation of $x$ on $y$, then $b_{xy} = -6$.

If we take $3x + 2y = 10$ as the regression equation of $y$ on $x$, then

$$2y = -3x + 10 \;\Rightarrow\; y = -\frac{3}{2}x + 5, \qquad b_{yx} = -\frac{3}{2}$$

$$r^2 = b_{yx}\,b_{xy} = (-6)\left(-\frac{3}{2}\right) = 9$$

This gives $r = -3$, which is not possible. Hence our assumption is wrong.

Hence $x + 6y = 6$ is the regression line of $y$ on $x$:

$$6y = -x + 6 \;\Rightarrow\; y = -\frac{1}{6}x + 1, \qquad b_{yx} = -\frac{1}{6}$$

and $3x + 2y = 10$ is the regression equation of $x$ on $y$:

$$3x = -2y + 10 \;\Rightarrow\; x = -\frac{2}{3}y + \frac{10}{3}, \qquad b_{xy} = -\frac{2}{3}$$

$$r^2 = b_{yx}\,b_{xy} = \left(-\frac{1}{6}\right)\left(-\frac{2}{3}\right) = 0.1111$$

Since both regression coefficients are negative, $r = -\sqrt{0.1111} = -0.3333$.

To estimate $y$ when $x = 12$, use the line of regression of $y$ on $x$:

$$y = -\frac{1}{6}(12) + 1 = -2 + 1 = -1$$

Example 10
Given the following information about the marks of 60 students:

             Mathematics   English
Mean         80            50
S.D.         15            10

Coefficient of correlation $r = 0.4$. Estimate the marks in Mathematics of a student who scored 60 marks in English.

Solution
Let $x$ be the marks in Mathematics and $y$ the marks in English.
Mean of Mathematics marks: $\bar{x} = 80$
Mean of English marks: $\bar{y} = 50$
Standard deviation of Mathematics marks: $\sigma_x = 15$
Standard deviation of English marks: $\sigma_y = 10$
Coefficient of correlation: $r = 0.4$

Line of regression of $x$ on $y$:

$$x - \bar{x} = b_{xy}(y - \bar{y}), \qquad b_{xy} = r\frac{\sigma_x}{\sigma_y} = 0.4 \times \frac{15}{10} = \frac{6}{10} = 0.6$$

Hence

$$x - 80 = 0.6(y - 50)$$

$$x - 80 = 0.6y - 30$$

$$x = 0.6y - 30 + 80$$

$$x = 0.6y + 50$$

When $y = 60$:

$$x = 0.6(60) + 50 = 36 + 50 = 86$$

Example 11
Given:

       x series   y series
Mean   18         100
S.D.   14         20

and $r = 0.6$, find the most probable value of $y$ when $x = 70$ and the most probable value of $x$ when $y = 90$.

Solution
$\bar{x} = 18$, $\bar{y} = 100$, $\sigma_x = 14$, $\sigma_y = 20$, $r = 0.6$

Line of regression of $y$ on $x$:

$$y - \bar{y} = b_{yx}(x - \bar{x}), \qquad b_{yx} = r\frac{\sigma_y}{\sigma_x} = 0.6 \times \frac{20}{14} = 0.8571$$

Hence

$$y - 100 = 0.8571(x - 18)$$

$$y = 0.8571x - 15.4278 + 100$$

$$y = 0.8571x + 84.5722$$

When $x = 70$:

$$y = 0.8571(70) + 84.5722 = 144.5692 \approx 144.57$$

Line of regression of $x$ on $y$:

$$x - \bar{x} = b_{xy}(y - \bar{y}), \qquad b_{xy} = r\frac{\sigma_x}{\sigma_y} = 0.6 \times \frac{14}{20} = 0.42$$

Hence

$$x - 18 = 0.42(y - 100)$$

$$x - 18 = 0.42y - 42$$

$$x = 0.42y - 42 + 18$$

$$x = 0.42y - 24$$

When $y = 90$:

$$x = 0.42(90) - 24 = 13.8$$
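When only the means, standard deviations and $r$ are given, as in Examples 10 and 11, each line of regression is fixed by the point of means and its regression coefficient. A minimal Python sketch (the helper name `regression_predictors` is hypothetical) returns the two estimating functions.

```python
def regression_predictors(x_bar, y_bar, sigma_x, sigma_y, r):
    """Return (y_given_x, x_given_y) built from summary statistics."""
    b_yx = r * sigma_y / sigma_x     # regression coefficient of y on x
    b_xy = r * sigma_x / sigma_y     # regression coefficient of x on y

    def y_given_x(x):
        return y_bar + b_yx * (x - x_bar)   # y - y_bar = b_yx (x - x_bar)

    def x_given_y(y):
        return x_bar + b_xy * (y - y_bar)   # x - x_bar = b_xy (y - y_bar)

    return y_given_x, x_given_y


# Example 11: x series (mean 18, S.D. 14), y series (mean 100, S.D. 20), r = 0.6
y_of, x_of = regression_predictors(18, 100, 14, 20, 0.6)
print(round(y_of(70), 2), round(x_of(90), 2))   # 144.57 13.8

# Example 10: x = Mathematics (mean 80, S.D. 15), y = English (mean 50, S.D. 10), r = 0.4
_, maths_of = regression_predictors(80, 50, 15, 10, 0.4)
print(round(maths_of(60), 2))                    # 86.0
```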

Example 12
The means of $x$ and $y$ are 5 and 10 respectively, and the line of regression of $y$ on $x$ is parallel to the line $20y = 9x + 40$. Estimate the value of $y$ for $x = 30$.

Solution
Line of regression of $y$ on $x$:

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

The slope of this line is $b_{yx}$, and the line is parallel to $20y = 9x + 40$, that is,

$$y = \frac{9}{20}x + \frac{40}{20}$$

so

$$b_{yx} = \frac{9}{20} = 0.45$$

With $\bar{x} = 5$ and $\bar{y} = 10$, the line of regression of $y$ on $x$ is

$$y - 10 = 0.45(x - 5)$$

$$y - 10 = 0.45x - 2.25$$

$$y = 0.45x - 2.25 + 10$$

$$y = 0.45x + 7.75$$

When $x = 30$:

$$y = 0.45(30) + 7.75 = 13.5 + 7.75 = 21.25$$

Example 13
A panel of two judges, A and B, graded dramatic performances by independently awarding marks as follows:

Performance No.: 1 2 3 4 5 6 7
Marks by A (x): 36 32 34 31 32 32 34
Marks by B (y): 35 33 31 30 34 32 36

Judge B could not attend the eighth performance, which was awarded 38 marks by judge A. If judge B had also been present, how many marks would he be expected to have awarded to the eighth performance?

Solution
Let $x$ be the marks awarded by A and $y$ the marks awarded by B; we need the line of regression of $y$ on $x$.

x y xy x²
36 35 1260 1296
32 33 1056 1024
34 31 1054 1156
31 30 930 961
32 34 1088 1024
32 32 1024 1024
34 36 1224 1156
$\sum x = 231$, $\sum y = 231$, $\sum xy = 7636$, $\sum x^2 = 7641$

Substituting in the normal equations:

$$\sum y = na + b\sum x \;\Rightarrow\; 231 = 7a + 231b \qquad (1)$$

$$\sum xy = a\sum x + b\sum x^2 \;\Rightarrow\; 7636 = 231a + 7641b \qquad (2)$$

Solving (1) and (2), we get $a = 9.1667$ and $b = 0.7222$.

$$y = a + bx = 9.1667 + 0.7222x$$

When $x = 38$, $y = 9.1667 + 0.7222(38) = 36.61 \approx 37$.

Example 14
The equations of two lines of regression are $3x + 2y = 26$ and $6x + y = 31$.
Find (i) the means of $x$ and $y$, (ii) the coefficient of correlation between $x$ and $y$, (iii) $\sigma_y$ if $\sigma_x = 3$.

Solution

$\sigma_x = 3$.
Solving the given equations, we get the means:

$$\bar{x} = 4, \qquad \bar{y} = 7$$

Taking $3x + 2y = 26$ as the regression equation of $y$ on $x$:

$$2y = -3x + 26 \;\Rightarrow\; y = -\frac{3}{2}x + 13, \qquad b_{yx} = -1.5$$

Taking $6x + y = 31$ as the regression equation of $x$ on $y$:

$$6x = -y + 31 \;\Rightarrow\; x = -\frac{1}{6}y + \frac{31}{6}, \qquad b_{xy} = -\frac{1}{6} = -0.1667$$

$$r^2 = b_{yx}\,b_{xy} = (-1.5)\left(-\frac{1}{6}\right) = 0.25$$

Since both regression coefficients are negative,

$$r = -\sqrt{0.25} = -0.5$$

Using $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$ with $\sigma_x = 3$:

$$-\frac{1}{6} = (-0.5)\frac{3}{\sigma_y} \;\Rightarrow\; \frac{1}{6} = \frac{1.5}{\sigma_y}$$

$$\sigma_y = 1.5 \times 6 = 9$$

Practice Problems
Find the coefficients of regression and hence the equations of the lines of regression for the following data:

x: 78 36 98 25 75 82 90 62 65 39
y: 84 51 91 60 68 62 86 58 53 47

Estimate the value of $x$ when $y = 90$.
