
Regression Methods

The document discusses regression methods for modeling the relationship between variables, including fitting lines and parabolas to data using the least squares method. It provides examples of using the least squares method to fit straight lines to different data sets by minimizing the sum of the squares of the differences between observed and predicted values of the dependent variable. It also shows an example of fitting a second degree parabola to time-series data on product tonnage by minimizing the sum of squares error.

Uploaded by

Arun Prasath
Copyright
© All Rights Reserved

2. Regression Methods

For a single random variable, we have seen that the mean and variance are parameters that describe
its average behaviour. For two-dimensional random variables we would like a similar representation,
so we generalize the definition of variance to what is called covariance. The covariance between X and Y
describes the relationship between X and Y by indicating how they tend to vary together. In this chapter,
we study the relationship between variables.

2.1 Principles of Least squares

Curve fitting. Let $(x_i, y_i)$, $i = 1, 2, 3, \ldots, n$ be a given set of n pairs of values, X being the independent
variable and Y the dependent variable. The general problem in curve fitting is to find, if possible, an
analytic expression of the form $y = f(x)$ for the functional relationship suggested by the given data.
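
The straight-line problems in this section use the normal equations without deriving them. A brief sketch of where they come from, assuming the line $Y = a + bX$ and writing $E$ (a symbol introduced here, not in the original notes) for the sum of squared residuals:

```latex
\begin{align*}
E(a, b) &= \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2 \\
\frac{\partial E}{\partial a} = 0 &\implies \sum y_i = na + b \sum x_i \\
\frac{\partial E}{\partial b} = 0 &\implies \sum x_i y_i = a \sum x_i + b \sum x_i^2
\end{align*}
```

The same argument with a third parameter c gives the three normal equations used for a second degree parabola.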

2.1.1 Fitting a straight line

Problem 1 Fit a straight line to the following data.

x   1     2     3     4     6     8
y   2.4   3.0   3.6   4.0   5.0   6.0

Solution: Let the line be Y = a + bX.

2.2 Statistics and Queueing Theory

         X     Y     X^2     XY
         1    2.4      1     2.4
         2    3.0      4     6.0
         3    3.6      9    10.8
         4    4.0     16    16.0
         6    5.0     36    30.0
         8    6.0     64    48.0
Total   24   24.0    130   113.2

Using the normal equations,

$$\sum Y = na + b\sum X \implies 24 = 6a + 24b \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 \implies 113.2 = 24a + 130b \qquad (2)$$

Solving (1) and (2), we get a = 1.976 and b = 0.506.
The least squares line of Y on X is
$$Y = a + bX = 1.976 + 0.506X$$
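
The procedure used in this solution can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `fit_line` is hypothetical, and the two normal equations are solved by Cramer's rule.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b*x via the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Data of Problem 1.
a, b = fit_line([1, 2, 3, 4, 6, 8], [2.4, 3, 3.6, 4, 5, 6])
print(round(a, 3), round(b, 3))  # prints 1.976 0.506
```

The same function applies unchanged to the other straight-line problems in this section.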

Problem 2 Fit a straight line to the following data.

x   1    2    3    4    5
y   14   27   40   55   68

Solution: Let the line be Y = a + bX.

         X     Y    X^2    XY
         1    14      1    14
         2    27      4    54
         3    40      9   120
         4    55     16   220
         5    68     25   340
Total   15   204     55   748

Using the normal equations,

$$\sum Y = na + b\sum X \implies 204 = 5a + 15b \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 \implies 748 = 15a + 55b \qquad (2)$$

Solving (1) and (2), we get a = 0 and b = 13.6.
The least squares line of Y on X is
$$Y = a + bX = 13.6X$$

Problem 3 Fit a straight line to the following data.

M. Radhakrishnan, Asst.Professor, SRM University



x   0    5    10   15   20   25
y   12   15   17   22   24   30

Solution: Let the line be Y = a + bX.

         X     Y    X^2     XY
         0    12      0      0
         5    15     25     75
        10    17    100    170
        15    22    225    330
        20    24    400    480
        25    30    625    750
Total   75   120   1375   1805

Using the normal equations,

$$\sum Y = na + b\sum X \implies 120 = 6a + 75b \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 \implies 1805 = 75a + 1375b \qquad (2)$$

Solving (1) and (2), we get a = 11.29 and b = 0.697.
The least squares line of Y on X is
$$Y = a + bX = 11.29 + 0.697X$$

Problem 4 Fit the curve $y = ax + \dfrac{b}{x}$ to the following data by the method of least squares.

x   1      2      3      4
y   -1.5   0.99   3.88   7.66

Solution: Given $y = ax + \dfrac{b}{x}$, multiplying by x gives $xy = ax^2 + b$.

Let $Y = A + BX$, where $Y = xy$, $X = x^2$, $A = b$ and $B = a$.

         x      y     X = x^2   Y = xy    X^2      XY
         1    -1.50      1       -1.50      1     -1.50
         2     0.99      4        1.98     16      7.92
         3     3.88      9       11.64     81    104.76
         4     7.66     16       30.64    256    490.24
Total   10    11.03     30       42.76    354    601.42

Using the normal equations,

$$\sum Y = nA + B\sum X \implies 42.76 = 4A + 30B \qquad (1)$$
$$\sum XY = A\sum X + B\sum X^2 \implies 601.42 = 30A + 354B \qquad (2)$$

Solving (1) and (2), we get A = -5.63 and B = 2.18, that is, a = 2.18 and b = -5.63.

The least squares line of Y on X is
$$Y = A + BX = -5.63 + 2.18X$$
$$xy = -5.63 + 2.18x^2$$
$$y = 2.18x - \frac{5.63}{x}$$
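
The linearizing substitution of Problem 4 can be sketched in Python. This is an illustrative sketch only: the function name `fit_ax_plus_b_over_x` is hypothetical, and it fits the transformed variables $X = x^2$, $Y = xy$, so it minimizes the squared error in $xy$ rather than in $y$ itself.

```python
def fit_ax_plus_b_over_x(xs, ys):
    """Fit y = a*x + b/x by least squares on the linearized form x*y = a*x**2 + b."""
    X = [x * x for x in xs]               # X = x^2
    Y = [x * y for x, y in zip(xs, ys)]   # Y = x*y
    n = len(xs)
    sX, sY = sum(X), sum(Y)
    sXX = sum(v * v for v in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    det = n * sXX - sX * sX
    if det == 0:
        raise ValueError("x**2 values must not all be equal")
    b = (sY * sXX - sX * sXY) / det       # intercept of the line Y on X
    a = (n * sXY - sX * sY) / det         # slope of the line Y on X
    return a, b


# Data of Problem 4.
a, b = fit_ax_plus_b_over_x([1, 2, 3, 4], [-1.5, 0.99, 3.88, 7.66])
print(round(a, 2), round(b, 2))  # prints 2.18 -5.63
```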

2.1.2 Fitting a second degree parabola

Problem 1 Fit a parabola of second degree to the following data:

x (year)                  1931   1941   1951   1961   1971   1981   1991
y (production in tons)     355    356    357    358    359    361    362

Solution: Let $Y = a + bX + cX^2$ be the second degree parabola, where $X = x - 1961$ and $Y = y - 358$.

x       y     X = x - 1961   Y = y - 358    X^2      X^3       X^4     XY    X^2 Y
1931   355        -30            -3         900   -27000    810000     90    -2700
1941   356        -20            -2         400    -8000    160000     40     -800
1951   357        -10            -1         100    -1000     10000     10     -100
1961   358          0             0           0        0         0      0        0
1971   359         10             1         100     1000     10000     10      100
1981   361         20             3         400     8000    160000     60     1200
1991   362         30             4         900    27000    810000    120     3600
Total               0             2        2800        0   1960000    330     1300

Using the normal equations,

$$\sum Y = na + b\sum X + c\sum X^2 \implies 2 = 7a + 2800c \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 \implies 330 = 2800b \qquad (2)$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4 \implies 1300 = 2800a + 196 \times 10^4\, c \qquad (3)$$

Solving (1), (2) and (3), we get a = 0.0476, b = 0.118, c = 0.000595.

The best fitting parabola is
$$y - 358 = 0.0476 + 0.118(x - 1961) + 0.000595(x - 1961)^2$$


Problem 2 Fit a parabola of second degree to the following data:

x                          1    2    3    4    5
y (production in tons)     5   12   26   60   97

Solution: Let $Y = a + bX + cX^2$ be the second degree parabola.

         X     Y    X^2   X^3   X^4    XY   X^2 Y
         1     5      1     1     1     5       5
         2    12      4     8    16    24      48
         3    26      9    27    81    78     234
         4    60     16    64   256   240     960
         5    97     25   125   625   485    2425
Total   15   200     55   225   979   832    3672

Using the normal equations,

$$\sum Y = na + b\sum X + c\sum X^2 \implies 200 = 5a + 15b + 55c \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 \implies 832 = 15a + 55b + 225c \qquad (2)$$
$$\sum X^2 Y = a\sum X^2 + b\sum X^3 + c\sum X^4 \implies 3672 = 55a + 225b + 979c \qquad (3)$$

Solving (1), (2) and (3), we get a = 10.4, b = -11.09, c = 5.714.
The best fitting parabola is $y = 10.4 - 11.09x + 5.714x^2$.
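
The three normal equations for a second degree parabola can also be formed and solved programmatically. The sketch below is an illustration under stated assumptions, not the method of the notes: the function name `fit_parabola` is hypothetical, and the system is solved by Gauss-Jordan elimination with partial pivoting instead of by hand.

```python
def fit_parabola(xs, ys):
    """Return (a, b, c) of the least squares parabola y = a + b*x + c*x**2."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    # Augmented matrix of the three normal equations in the unknowns a, b, c.
    m = [[n, s1, s2, t0],
         [s1, s2, s3, t1],
         [s2, s3, s4, t2]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("normal equations are singular")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Data of Problem 2 (x = 1..5).
a, b, c = fit_parabola([1, 2, 3, 4, 5], [5, 12, 26, 60, 97])
print(round(a, 2), round(b, 2), round(c, 3))
```

For the data of Problem 2 this should give approximately a = 10.4, b = -11.09 and c = 5.714. Passing the shifted values X = x - 1961 and Y = y - 358 of Problem 1 gives the coefficients of that problem in the same way.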


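2.2 Karl Pearson's Correlation Coefficient

Correlation is the study of the relationship between two variables.

Karl Pearson's correlation coefficient is

$$r = r(x, y) = r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$$

where

$$\mathrm{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$
$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}$$
$$\sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2}$$
$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$

and n is the number of data pairs.

Note: The correlation coefficient lies between -1 and 1, i.e., $-1 \le r \le 1$.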

Problem 1 Calculate Karl Pearson's coefficient of correlation for the following data.

x   65   66   67   67   68   69   70   72
y   67   68   65   68   72   72   69   71

Solution:

          X     Y     X^2     Y^2      XY
         65    67    4225    4489    4355
         66    68    4356    4624    4488
         67    65    4489    4225    4355
         67    68    4489    4624    4556
         68    72    4624    5184    4896
         69    72    4761    5184    4968
         70    69    4900    4761    4830
         72    71    5184    5041    5112
Total   544   552   37028   38132   37560

$$n = 8$$
$$\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68$$
$$\bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$$
$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{37028}{8} - (68)^2} = \sqrt{4.5} = 2.12$$
$$\sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{38132}{8} - (69)^2} = \sqrt{5.5} = 2.35$$
$$\mathrm{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{37560}{8} - (68 \times 69) = 3$$
$$r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{3}{\sqrt{4.5}\,\sqrt{5.5}} = 0.603$$
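
The calculation in this problem can be checked with a short Python sketch. The function name `pearson_r` is hypothetical, and the code uses the same formulas as this section (division by n, not n - 1).

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient r = cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    sd_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return cov / (sd_x * sd_y)


# Data of Problem 1.
xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]
print(round(pearson_r(xs, ys), 3))  # prints 0.603
```

Working with unrounded standard deviations avoids the small drift that appears when $\sigma_x$ and $\sigma_y$ are rounded to two decimals before dividing.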

2.3 Rank correlation

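Spearman's rank correlation coefficient is

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i = x_i - y_i$ is the difference between the ranks of the i-th pair.

Note: If ranks are repeated, then

$$\rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + \cdots\right]}{n(n^2 - 1)}$$

where $d_i = x_i - y_i$. The C.F.s are correction factors, each calculated as $C.F = \dfrac{m(m^2 - 1)}{12}$, where m is the number of
times a value is repeated.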

Problem 1 Calculate Spearman's rank correlation coefficient for the following data.

x   68   64   75   50   64   80   75   40   55   64
y   62   58   68   45   81   60   68   48   50   70

Solution:

X    Y    Rank of X   Rank of Y   d_i = x_i - y_i   d_i^2
68   62      4           5             -1              1
64   58      6           7             -1              1
75   68      2.5         3.5           -1              1
50   45      9          10             -1              1
64   81      6           1              5             25
80   60      1           6             -5             25
75   68      2.5         3.5           -1              1
40   48     10           9              1              1
55   50      8           8              0              0
64   70      6           2              4             16
                                  Sum of d_i^2 =      72

Among the values of X, 75 is repeated 2 times and occupies ranks 2 and 3, so the rank of 75 is $\dfrac{2 + 3}{2} = 2.5$ and

$$C.F_1 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5$$

64 is repeated 3 times and occupies ranks 5, 6 and 7, so the rank of 64 is $\dfrac{5 + 6 + 7}{3} = 6$ and

$$C.F_2 = \frac{m(m^2 - 1)}{12} = \frac{3(3^2 - 1)}{12} = 2$$

Among the values of Y, 68 is repeated 2 times and occupies ranks 3 and 4, so the rank of 68 is $\dfrac{3 + 4}{2} = 3.5$ and

$$C.F_3 = \frac{m(m^2 - 1)}{12} = \frac{2(2^2 - 1)}{12} = 0.5$$

$$\rho = 1 - \frac{6\left[\sum d_i^2 + C.F_1 + C.F_2 + C.F_3\right]}{n(n^2 - 1)}$$
$$= 1 - \frac{6\,[72 + 0.5 + 2 + 0.5]}{10(10^2 - 1)}$$
$$= 1 - 0.4545$$
$$= 0.5455$$
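
The ranking and tie-correction steps of this solution can be sketched in Python. The helper names `average_ranks`, `correction_factors` and `spearman_rho` are hypothetical; the sketch ranks the largest value as 1 and gives tied values the mean of the ranks they occupy, as in the solution.

```python
from collections import Counter


def average_ranks(values):
    """Rank in descending order (largest = 1); tied values share their mean rank."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def correction_factors(values):
    """Sum of m*(m**2 - 1)/12 over every value that is repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho(xs, ys):
    """Spearman's rank correlation with the correction factors for repeated ranks."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    cf = correction_factors(xs) + correction_factors(ys)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))


# Data of Problem 1.
xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman_rho(xs, ys), 4))  # prints 0.5455
```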


Exercise

Problem 1 Ten competitors in a musical contest were ranked by 3 judges x, y and z. Find which pair
of judges has the most similar taste in music.

x    1    2   3   4   5   6   7   8   9   10
y   10    6   7   9   5   4   3   2   1    8
z    8   10   9   7   6   5   4   3   2    1

2.4 Regression

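Regression is the mathematical study of the average relationship between two variables x and y.

Line of regression of x on y:
$$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$$

Line of regression of y on x:
$$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$$

where $b_{xy}$ and $b_{yx}$ are the regression coefficients, given by

$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} \qquad \text{and} \qquad b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

Note:

$$r = \pm\sqrt{b_{xy}\, b_{yx}}$$

where r takes the common sign of $b_{xy}$ and $b_{yx}$.

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

The point of intersection of the lines of regression of y on x and x on y is the point of mean values
$(\bar{x}, \bar{y})$.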
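
The first relation in the note follows from the other two. As a short check, using only the symbols defined above:

```latex
b_{xy}\, b_{yx}
  = \left( r\,\frac{\sigma_x}{\sigma_y} \right) \left( r\,\frac{\sigma_y}{\sigma_x} \right)
  = r^2
\quad\Longrightarrow\quad
r = \pm\sqrt{b_{xy}\, b_{yx}}
```

The sign of r is the common sign of $b_{xy}$ and $b_{yx}$, since both coefficients share the numerator $\sum (x - \bar{x})(y - \bar{y})$ and have positive denominators.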

Problem 1 From the following data find

1. Two lines of regressions

2. Coefficient of correlation between the marks of economics and statistics

3. The most likely marks in statistics when the marks in economics is 30.

Marks in Economics    25   28   35   32   31   36   29   38   34   32
Marks in Statistics   43   46   49   41   36   32   31   30   33   39


Solution: Let x be the marks in Economics and y be the marks in Statistics.

$$\bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \qquad \text{and} \qquad \bar{y} = \frac{\sum y}{n} = \frac{380}{10} = 38$$

          x     y    (x - x̄)   (y - ȳ)   (x - x̄)^2   (y - ȳ)^2   (x - x̄)(y - ȳ)
         25    43      -7         5         49          25           -35
         28    46      -4         8         16          64           -32
         35    49       3        11          9         121            33
         32    41       0         3          0           9             0
         31    36      -1        -2          1           4             2
         36    32       4        -6         16          36           -24
         29    31      -3        -7          9          49            21
         38    30       6        -8         36          64           -48
         34    33       2        -5          4          25           -10
         32    39       0         1          0           1             0
Total   320   380       0         0        140         398           -93

$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{-93}{398} = -0.2337$$

and

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-93}{140} = -0.6643$$

Since both regression coefficients are negative, the correlation coefficient is

$$r = -\sqrt{b_{xy}\, b_{yx}} = -\sqrt{0.2337 \times 0.6643} = -0.394$$

The line of regression of x on y is $(x - \bar{x}) = b_{xy}(y - \bar{y})$:

$$(x - 32) = -0.2337\,(y - 38)$$
$$x - 32 = -0.2337y + 8.8806$$
$$x = -0.2337y + 40.8806 \qquad (1)$$

The line of regression of y on x is $(y - \bar{y}) = b_{yx}(x - \bar{x})$:

$$(y - 38) = -0.6643\,(x - 32)$$
$$y - 38 = -0.6643x + 21.2576$$
$$y = -0.6643x + 59.2576 \qquad (2)$$

Now, to find y when x = 30, equation (2) gives

$$y = -0.6643(30) + 59.2576 = 39.3286$$

Marks in Statistics = 39.33
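
The steps of this solution can be sketched in Python. The function and variable names below are hypothetical. The correlation coefficient is given the common sign of the regression coefficients, which is negative for this data because the sum of the products of deviations is -93.

```python
import math


def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_xy, b_yx) from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / syy, sxy / sxx


econ_marks = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
stat_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mean_x, mean_y, b_xy, b_yx = regression_coefficients(econ_marks, stat_marks)

# r has the magnitude sqrt(b_xy * b_yx) and the common sign of b_xy and b_yx.
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

# Most likely marks in Statistics when marks in Economics = 30,
# from the line of regression of y on x.
predicted = mean_y + b_yx * (30 - mean_x)

print(round(b_xy, 4), round(b_yx, 4), round(r, 3), round(predicted, 2))
```

Because the code keeps full precision throughout, its prediction may differ in the last decimal from a hand calculation that rounds the coefficients first.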

Problem 2 Two variables x and y have the regression lines 3x + 2y - 26 = 0 and 6x + y - 31 = 0. Find

1. the mean values of x and y

2. the correlation coefficient between x and y

3. the variance of y when the variance of x is 25

Solution:

Given
$$3x + 2y - 26 = 0 \qquad (1)$$
$$6x + y - 31 = 0 \qquad (2)$$

1. Mean values of x and y

Both regression lines pass through the point of mean values. Solving (1) and (2), we get x = 4 and y = 7, so
$$\bar{x} = 4 \qquad \text{and} \qquad \bar{y} = 7$$

2. Correlation coefficient between x and y

Let 3x + 2y - 26 = 0 be the line of regression of x on y. Then
$$3x + 2y - 26 = 0 \implies 3x = -2y + 26 \implies x = -\frac{2}{3}y + \frac{26}{3}$$
$$b_{xy} = -\frac{2}{3}$$

Let 6x + y - 31 = 0 be the line of regression of y on x. Then
$$6x + y - 31 = 0 \implies y = -6x + 31$$
$$b_{yx} = -6$$

$$|r| = \sqrt{b_{xy}\, b_{yx}} = \sqrt{\frac{2}{3} \times 6} = 2 > 1$$

Since the magnitude of the correlation coefficient cannot exceed 1, 3x + 2y - 26 = 0 cannot be the line of regression
of x on y and 6x + y - 31 = 0 cannot be the line of regression of y on x. Instead, we take
3x + 2y - 26 = 0 as the line of regression of y on x:

$$3x + 2y - 26 = 0 \implies 2y = -3x + 26 \implies y = -\frac{3}{2}x + 13$$
$$b_{yx} = -\frac{3}{2}$$

and 6x + y - 31 = 0 as the line of regression of x on y:

$$6x + y - 31 = 0 \implies 6x = -y + 31 \implies x = -\frac{1}{6}y + \frac{31}{6}$$
$$b_{xy} = -\frac{1}{6}$$

$$\sqrt{b_{xy}\, b_{yx}} = \sqrt{\frac{3}{2} \times \frac{1}{6}} = 0.5 < 1$$

Since both regression coefficients are negative, r = -0.5.

3. Variance of y when the variance of x is 25 ($\sigma_x^2 = 25$)

Here $\sigma_x = 5$, and we have to find $\sigma_y$.

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \implies \sigma_y = r\,\frac{\sigma_x}{b_{xy}} = (-0.5) \times \frac{5}{-\frac{1}{6}} = 15$$

$$\sigma_y^2 = 225$$
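
The reasoning of this problem, trying one assignment of the two lines and rejecting it when the product of the regression coefficients exceeds 1, can be sketched in Python. The helper names are hypothetical and each line is represented by its coefficients (A, B, C) in Ax + By + C = 0.

```python
import math
from fractions import Fraction


def intersection(l1, l2):
    """Point where two lines A*x + B*y + C = 0 meet: the means (x_bar, y_bar)."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return Fraction(b1 * c2 - b2 * c1, det), Fraction(a2 * c1 - a1 * c2, det)


def slopes(line):
    """Return (coefficient if read as x on y, coefficient if read as y on x)."""
    A, B, _ = line
    return Fraction(-B, A), Fraction(-A, B)


def assign_regression_lines(l1, l2):
    """Return (b_xy, b_yx) for the assignment with 0 <= b_xy * b_yx <= 1."""
    bxy1, byx1 = slopes(l1)
    bxy2, byx2 = slopes(l2)
    for b_xy, b_yx in ((bxy1, byx2), (bxy2, byx1)):
        if 0 <= b_xy * b_yx <= 1:
            return b_xy, b_yx
    raise ValueError("no valid assignment of regression lines")


line1 = (3, 2, -26)   # 3x + 2y - 26 = 0
line2 = (6, 1, -31)   # 6x + y - 31 = 0

means = intersection(line1, line2)
b_xy, b_yx = assign_regression_lines(line1, line2)
# r has the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(float(b_xy * b_yx)), float(b_xy))
sigma_x = 5                                  # given: variance of x is 25
sigma_y = r * sigma_x / float(b_xy)          # from b_xy = r * sigma_x / sigma_y

print(means, b_xy, b_yx, r, round(sigma_y ** 2))
```

Exact fractions are used for the coefficients so that the test against 1 is not affected by rounding.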
