Regression Methods
For a single random variable, we have seen that the mean and variance are parameters that summarize
its average behaviour. For two-dimensional random variables we would like a similar representation, so we
generalize the definition of variance to what is called covariance. The covariance between X and Y describes
the relationship between X and Y, indicating how they tend to vary together. In this chapter we study
relationships between variables.
Curve fitting. Let $(x_i, y_i)$, $i = 1, 2, 3, \dots, n$, be a given set of n pairs of values, X being the independent
variable and Y the dependent variable. The general problem in curve fitting is to find, if possible, an
analytic expression of the form $y = f(x)$ for the functional relationship suggested by the given data.
x    1     2     3     4     6     8
y    2.4   3     3.6   4     5     6
2.2 Statistics and Queueing Theory
X     Y      X^2    XY
1     2.4     1      2.4
2     3.0     4      6.0
3     3.6     9     10.8
4     4.0    16     16.0
6     5.0    36     30.0
8     6.0    64     48.0

$\sum X = 24, \quad \sum Y = 24, \quad \sum X^2 = 130, \quad \sum XY = 113.2$
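The sums in the table feed the normal equations of a straight-line fit. The following is a minimal sketch, assuming the curve to be fitted is $y = a + bx$ (the form is an assumption, since the problem statement is not given here); the function name `fit_line` is illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x from the two normal equations."""
    n = len(xs)
    sx = sum(xs)                                 # sum of X
    sy = sum(ys)                                 # sum of Y
    sxx = sum(x * x for x in xs)                 # sum of X^2
    sxy = sum(x * y for x, y in zip(xs, ys))     # sum of XY
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

xs = [1, 2, 3, 4, 6, 8]
ys = [2.4, 3, 3.6, 4, 5, 6]
a, b = fit_line(xs, ys)
print(f"y = {a:.4f} + {b:.4f} x")
```

Under that assumption the same function applies unchanged to any other table of (X, Y) pairs in this section.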
x    1    2    3    4    5
y    14   27   40   55   68
X     Y     X^2    XY
1     14     1      14
2     27     4      54
3     40     9     120
4     55    16     220
5     68    25     340

$\sum X = 15, \quad \sum Y = 204, \quad \sum X^2 = 55, \quad \sum XY = 748$
x    0    5    10   15   20   25
y    12   15   17   22   24   30
X     Y     X^2    XY
0     12      0      0
5     15     25     75
10    17    100    170
15    22    225    330
20    24    400    480
25    30    625    750

$\sum X = 75, \quad \sum Y = 120, \quad \sum X^2 = 1375, \quad \sum XY = 1805$
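Problem 4 Fit, by the method of least squares, the curve $y = ax + \dfrac{b}{x}$ to the following data.

x    1      2      3      4
y    -1.5   0.99   3.88   7.66

Solution: Given $y = ax + \dfrac{b}{x}$, multiplying through by $x$ gives $xy = ax^2 + b$, that is, $Y = aX + b$ with $X = x^2$ and $Y = xy$.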
x       y       X = x^2   Y = xy   X^2    XY
1      -1.5       1       -1.5       1     -1.5
2       0.99      4        1.98     16      7.92
3       3.88      9       11.64     81    104.76
4       7.66     16       30.64    256    490.24
Total: 10  11.03  30      42.76    354    601.42
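The substitution $X = x^2$, $Y = xy$ turns the problem into an ordinary straight-line fit $Y = aX + b$. A minimal sketch of that route, with the hypothetical helper name `fit_ax_plus_b_over_x`, is given below; it minimizes the squared error in $xy$, as the transformed table does, not the error in $y$ itself.

```python
def fit_ax_plus_b_over_x(xs, ys):
    """Fit y = a*x + b/x by a straight-line fit of Y = x*y against X = x**2."""
    X = [x * x for x in xs]
    Y = [x * y for x, y in zip(xs, ys)]
    n = len(xs)
    sX, sY = sum(X), sum(Y)
    sXX = sum(u * u for u in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    # Normal equations:  sY = a*sX + n*b   and   sXY = a*sXX + b*sX
    a = (n * sXY - sX * sY) / (n * sXX - sX ** 2)
    b = (sY - a * sX) / n
    return a, b

xs = [1, 2, 3, 4]
ys = [-1.5, 0.99, 3.88, 7.66]
a, b = fit_ax_plus_b_over_x(xs, ys)
print(f"y = {a:.4f} x + ({b:.4f}) / x")
```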
x                         1931   1941   1951   1961   1971   1981   1991
y (production in tons)    355    356    357    358    359    361    362
x      y     X = x - 1961   Y = y - 358   X^2    X^3      X^4      XY    X^2 Y
1931   355       -30            -3        900   -27000   810000    90   -2700
1941   356       -20            -2        400    -8000   160000    40    -800
1951   357       -10            -1        100    -1000    10000    10    -100
1961   358         0             0          0        0        0     0       0
1971   359        10             1        100     1000    10000    10     100
1981   361        20             3        400     8000   160000    60    1200
1991   362        30             4        900    27000   810000   120    3600

$\sum X = 0, \quad \sum Y = 2, \quad \sum X^2 = 2800, \quad \sum X^3 = 0, \quad \sum X^4 = 196 \times 10^4, \quad \sum XY = 330, \quad \sum X^2 Y = 1300$
x                         1    2    3    4    5
y (production in tons)    5    12   26   60   97
X      Y     X^2   X^3   X^4    XY    X^2 Y
1      5      1     1      1     5       5
2      12     4     8     16    24      48
3      26     9    27     81    78     234
4      60    16    64    256   240     960
5      97    25   125    625   485    2425
Total: 15  200  55  225  979   832    3672
Using the normal equations for $y = a + bx + cx^2$,

$$\sum Y = na + b\sum X + c\sum X^2 \quad\Rightarrow\quad 200 = 5a + 15b + 55c \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 \quad\Rightarrow\quad 832 = 15a + 55b + 225c \qquad (2)$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4 \quad\Rightarrow\quad 3672 = 55a + 225b + 979c \qquad (3)$$

Solving (1), (2) and (3), we get $a = 10.4$, $b = -11.08$, $c = 5.714$.
The best-fitting parabola is $y = 10.4 - 11.08x + 5.714x^2$.
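The three normal equations can be checked numerically. The sketch below builds them from the data and solves the 3×3 system by Cramer's rule; the helper names `det3` and `fit_parabola` are illustrative, and any linear solver would serve equally well.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    A = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]
    D = det3(A)
    coeffs = []
    for k in range(3):
        Ak = [row[:] for row in A]      # copy A, then replace column k by rhs
        for r in range(3):
            Ak[r][k] = rhs[r]
        coeffs.append(det3(Ak) / D)
    return tuple(coeffs)

a, b, c = fit_parabola([1, 2, 3, 4, 5], [5, 12, 26, 60, 97])
print(f"y = {a:.3f} + ({b:.3f}) x + {c:.3f} x^2")
```

Carried to more decimals, the coefficients are approximately 10.4, -11.086 and 5.714, in agreement with the values quoted above.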
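$$r = r(x, y) = r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

where

$$\operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}, \qquad
\sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}, \qquad
\sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2}$$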
Problem 1 Calculate Karl Pearson's coefficient of correlation for the following data.

x    65   66   67   67   68   69   70   72
y    67   68   65   68   72   72   69   71
Solution:

X      Y     X^2     Y^2     XY
65     67    4225    4489    4355
66     68    4356    4624    4488
67     65    4489    4225    4355
67     68    4489    4624    4556
68     72    4624    5184    4896
69     72    4761    5184    4968
70     69    4900    4761    4830
72     71    5184    5041    5112
Total: 544  552  37028  38132  37560
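$n = 8$

$$\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68, \qquad \bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$$

$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{37028}{8} - (68)^2} = 2.12$$

$$\sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{38132}{8} - (69)^2} = 2.34$$

$$\operatorname{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{37560}{8} - (68 \times 69) = 3$$

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} = \frac{3}{2.12 \times 2.34} = 0.6047$$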
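The same calculation can be sketched in a few lines of Python, following the formulas for cov(x, y), $\sigma_x$ and $\sigma_y$ used above. The function name `pearson_r` is illustrative. Because the code does not round the standard deviations to two decimals, it gives roughly 0.603 instead of the hand-computed 0.6047.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient: cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    sigma_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sigma_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return cov / (sigma_x * sigma_y)

xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(xs, ys)
print(round(r, 4))
```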
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i = x_i - y_i$.
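When some values are repeated, correction factors are added to $\sum d_i^2$:

$$\rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + \cdots\right]}{n(n^2 - 1)}$$

where $d_i = x_i - y_i$. The C.F.s are correction factors, each calculated by $C.F = \dfrac{m(m^2 - 1)}{12}$, where $m$ is the number of
times a value has been repeated.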
x    68   64   75   50   64   80   75   40   55   64
y    62   58   68   45   81   60   68   48   50   70
Solution:
Among the values of X,
75 is repeated 2 times and occupies ranks 2 and 3, so the rank of 75 is $\dfrac{2 + 3}{2} = 2.5$ and

$$C.F_1 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5$$

64 is repeated 3 times and occupies ranks 5, 6 and 7, so the rank of 64 is $\dfrac{5 + 6 + 7}{3} = 6$ and

$$C.F_2 = \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} = 2$$

Among the values of Y,
68 is repeated 2 times and occupies ranks 3 and 4, so the rank of 68 is $\dfrac{3 + 4}{2} = 3.5$ and

$$C.F_3 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5$$

$$\rho = 1 - 0.4545 = 0.5454$$
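The ranking and correction-factor steps can be sketched as follows. The code assumes the usual tied-rank form $\rho = 1 - 6\left[\sum d_i^2 + \sum C.F\right] / \left[n(n^2 - 1)\right]$, with the largest value given rank 1; both are assumptions chosen because they reproduce the ranks and the value 0.4545 in the solution above. Function names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order; tied values share the mean of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def correction_factors(values):
    """Sum of m*(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))      # sum of squared rank differences
    cf = correction_factors(xs) + correction_factors(ys)
    return 1 - 6 * (d2 + cf) / (n * (n * n - 1))

xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_with_ties(xs, ys)
print(round(rho, 4))
```

For this data the squared rank differences sum to 72 and the three correction factors to 3, so the subtracted term is 6(75)/990 = 0.4545.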
Exercise
Problem 1 Ten competitors in a musical contest were ranked by three judges x, y and z. Find out which pair
of judges has the most similar taste in music.

x    1    2    3    4    5    6    7    8    9    10
y    10   6    7    9    5    4    3    2    1    8
z    8    10   9    7    6    5    4    3    2    1
2.4 Regression
Regression is the mathematical study of the average relationship between the variables x and y.
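Line of regression of x on y:

$$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$$

Line of regression of y on x:

$$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$$

Note:

$$r = \pm\sqrt{b_{xy}\, b_{yx}}, \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}, \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

The sign of $r$ is the common sign of $b_{xy}$ and $b_{yx}$.
The point of intersection of the lines of regression of y on x and x on y is $(\bar{x}, \bar{y})$, the mean values
of x and y.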
3. The most likely marks in statistics when the marks in economics are 30.

Marks in Economics (x)    25   28   35   32   31   36   29   38   34   32
Marks in Statistics (y)   43   46   49   41   36   32   31   30   33   39
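$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{-93}{398} = -0.2336$$

and

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-93}{140} = -0.6642$$

Since both regression coefficients are negative, the correlation coefficient is
$r = -\sqrt{b_{xy}\, b_{yx}} = -\sqrt{0.2336 \times 0.6642} = -0.393$.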
Line of regression of x on y is $(x - \bar{x}) = b_{xy}(y - \bar{y})$:
$(x - 32) = -0.2336\,(y - 38)$
$x - 32 = -0.2336y + 8.8768$
$x = -0.2336y + 8.8768 + 32$
$x = -0.2336y + 40.8768$    (1)
Line of regression of y on x is $(y - \bar{y}) = b_{yx}(x - \bar{x})$:
$(y - 38) = -0.6642\,(x - 32)$
$y - 38 = -0.6642x + 21.2544$
$y = -0.6642x + 21.2544 + 38$
$y = -0.6642x + 59.2544$    (2)
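The regression coefficients and the two lines can be reproduced with a short sketch. It assumes x denotes the marks in economics and y the marks in statistics, which matches the means 32 and 38 used in the lines above; the function and variable names are illustrative. The final step evaluates the line of y on x at x = 30.

```python
from math import copysign, sqrt

def regression_summary(xs, ys):
    """Return the means and the regression coefficients b_xy and b_yx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / syy, sxy / sxx

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mean_x, mean_y, b_xy, b_yx = regression_summary(economics, statistics)

# r carries the common sign of the two regression coefficients
r = copysign(sqrt(b_xy * b_yx), b_yx)

# Line of y on x evaluated at x = 30 (marks in economics)
predicted_y = mean_y + b_yx * (30 - mean_x)
print(round(b_xy, 4), round(b_yx, 4), round(r, 4), round(predicted_y, 2))
```

With unrounded coefficients the predicted statistics mark at x = 30 is about 39.33, which agrees with substituting x = 30 into equation (2).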
Problem 2 Two variables x and y have the regression lines $3x + 2y - 26 = 0$ and $6x + y - 31 = 0$; find the
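Solution:
Given
$3x + 2y - 26 = 0$    (1)
$6x + y - 31 = 0$    (2)

Suppose first that (1) is the line of regression of x on y and (2) is the line of regression of y on x. Then
$x = -\dfrac{2}{3}y + \dfrac{26}{3}$, so $b_{xy} = -\dfrac{2}{3}$, and $y = -6x + 31$, so $b_{yx} = -6$. Hence

$$|r| = \sqrt{b_{xy}\, b_{yx}} = \sqrt{\frac{2}{3} \times 6} = 2 > 1$$

Since the magnitude of the correlation coefficient cannot exceed 1, $3x + 2y - 26 = 0$ cannot be the line of regression
of x on y and $6x + y - 31 = 0$ cannot be the line of regression of y on x. We therefore take
$3x + 2y - 26 = 0$ to be the line of regression of y on x:

$$3x + 2y - 26 = 0 \;\Rightarrow\; 2y = -3x + 26 \;\Rightarrow\; y = -\frac{3}{2}x + 13, \qquad b_{yx} = -\frac{3}{2}$$

and $6x + y - 31 = 0$ to be the line of regression of x on y:

$$6x + y - 31 = 0 \;\Rightarrow\; 6x = -y + 31 \;\Rightarrow\; x = -\frac{1}{6}y + \frac{31}{6}, \qquad b_{xy} = -\frac{1}{6}$$

$$|r| = \sqrt{b_{xy}\, b_{yx}} = \sqrt{\frac{3}{2} \times \frac{1}{6}} = 0.5 < 1$$

Since both regression coefficients are negative, $r = -0.5$.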
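The trial-and-check argument above can be sketched in code: try each assignment of the two lines, and keep the one for which the product $b_{xy} b_{yx}$ lies between 0 and 1. The sketch also solves the two equations simultaneously, since the regression lines intersect at $(\bar{x}, \bar{y})$. Each line is stored as a tuple (p, q, c) meaning px + qy = c; this representation and the function names are assumptions made for illustration.

```python
from math import copysign, sqrt

def slopes(line):
    """For p*x + q*y = c, return (slope read as y on x, slope read as x on y)."""
    p, q, _ = line
    return -p / q, -q / p

def identify(line1, line2):
    """Try both assignments; keep the one whose b_yx * b_xy lies in [0, 1]."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = slopes(y_on_x)[0]
        b_xy = slopes(x_on_y)[1]
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = copysign(sqrt(product), b_yx)   # r shares the sign of the coefficients
            return b_yx, b_xy, r
    raise ValueError("no valid assignment of regression lines")

def intersection(line1, line2):
    """Solve the two linear equations by Cramer's rule."""
    p1, q1, c1 = line1
    p2, q2, c2 = line2
    det = p1 * q2 - p2 * q1
    return (c1 * q2 - c2 * q1) / det, (p1 * c2 - p2 * c1) / det

line1 = (3, 2, 26)   # 3x + 2y = 26
line2 = (6, 1, 31)   # 6x + y = 31
b_yx, b_xy, r = identify(line1, line2)
x_mean, y_mean = intersection(line1, line2)
print(b_yx, b_xy, r, x_mean, y_mean)
```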