Regression
Where $\bar{X}$ is the mean value of X and $\bar{Y}$
is the mean value of Y.
Similarly, the normal equations for the
regression equation X on Y are:
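$$\sum X = nc + d\sum Y$$
$$\sum XY = c\sum Y + d\sum Y^2$$
Solving these for c and d gives:
$$c = \bar{X} - d\bar{Y}, \qquad d = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$$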
Illustration
From the following data obtain the two
Regression equations:
  X      Y     $X^2$   $Y^2$    XY
  6      9      36      81      54
  2     11       4     121      22
 10      5     100      25      50
  4      8      16      64      32
  8      7      64      49      56
$\sum X = 30$, $\sum Y = 40$, $\sum X^2 = 220$, $\sum Y^2 = 340$, $\sum XY = 214$
Regression equation of Y on X
Y = a + bX
Substituting the values in the normal equations
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
we find
40 = 5a + 30b ... (i)
214 = 30a + 220b ... (ii)
Solving these two equations we get
b = -0.65
Substituting the value of b in equation (i)
40 = 5a + 30(-0.65)
a = 11.9
Putting the values of a and b in the equation, the regression
line of Y on X is:
Y = 11.9 - 0.65X
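As a check on the arithmetic of the Y on X line, the two normal equations can be solved numerically. The following is a minimal Python sketch, assuming the five (X, Y) pairs of the illustration; the function name `solve_normal_equations` is hypothetical and chosen only for this example. It applies Cramer's rule to the pair of equations in a and b.

```python
def solve_normal_equations(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Cramer's rule on the normal equations:
    #   sum_y  = n * a     + b * sum_x
    #   sum_xy = a * sum_x + b * sum_xx
    det = n * sum_xx - sum_x ** 2
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
a, b = solve_normal_equations(X, Y)
print(f"Y = {a:.2f} + ({b:.2f}) X")
```

The determinant is zero only when every X value is identical, in which case no unique line exists; the sketch does not guard against that case.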
Regression line of X on Y
X = c + dY
Substituting the values in the normal equations
$$\sum X = nc + d\sum Y$$
$$\sum XY = c\sum Y + d\sum Y^2$$
we find:
30 = 5c + 40d ... (i)
214 = 40c + 340d ... (ii)
Solving these two equations we get
d = -1.3
Substituting the value of d in equation (i)
30 = 5c + 40(-1.3)
c = 16.4
Thus the regression line of X on Y is:
X = 16.4 - 1.3Y
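The X on Y line can also be obtained without solving the simultaneous equations, by using the solved forms $d = (\sum XY - n\bar{X}\bar{Y}) / (\sum Y^2 - n\bar{Y}^2)$ and $c = \bar{X} - d\bar{Y}$. The sketch below is one possible Python rendering under that assumption; `regress_x_on_y` is a hypothetical helper name, and the data are the five pairs of the illustration.

```python
def regress_x_on_y(xs, ys):
    """Return (c, d) for the line X = c + dY."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Slope of X on Y, then the intercept from the two means.
    d = (sum_xy - n * mean_x * mean_y) / (sum_yy - n * mean_y ** 2)
    c = mean_x - d * mean_y
    return c, d


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
c, d = regress_x_on_y(X, Y)
print(f"X = {c:.1f} + ({d:.1f}) Y")
```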
Deviations taken from Arithmetic Mean
In this case instead of dealing with the actual
values of X and Y we take the deviations of X
and Y series from their respective means.
In such a case the two regression equations are
written as follows:
i) Regression equation of X on Y:
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
where
$\bar{X}$ = mean of the X series
$\bar{Y}$ = mean of the Y series
$r\frac{\sigma_x}{\sigma_y}$ = the regression coefficient of X on Y
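ii) Regression equation of Y on X:
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
or
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
where
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}, \qquad x = X - \bar{X}, \qquad y = Y - \bar{Y}$$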
Illustration
Calculate the two regression equations of
X on Y and Y on X from the data given
below :
Price             10   12   13   12   16   15
Amount demanded   40   38   43   45   37   43
Solution:
  X     x    $x^2$     Y     y    $y^2$    xy
 10    -3     9       40    -1     1       3
 12    -1     1       38    -3     9       3
 13     0     0       43     2     4       0
 12    -1     1       45     4    16      -4
 16     3     9       37    -4    16     -12
 15     2     4       43     2     4       4
$\sum X = 78$, $\sum x = 0$, $\sum x^2 = 24$
$\sum Y = 246$, $\sum y = 0$, $\sum y^2 = 50$, $\sum xy = -6$
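Regression equation of X on Y
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
$$\bar{X} = \frac{\sum X}{N} = \frac{78}{6} = 13, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{246}{6} = 41$$
$$r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-6}{50} = -0.12$$
$$X - 13 = -0.12(Y - 41)$$
$$X = -0.12Y + 17.92$$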
Regression equation of Y on X
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
$$r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-6}{24} = -0.25$$
$$Y - 41 = -0.25(X - 13)$$
$$Y - 41 = -0.25X + 3.25$$
$$Y = -0.25X + 44.25$$
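The deviation method used in this illustration can be sketched in Python as follows. This is only an illustrative sketch: the function name `deviation_regressions` and the variable names are assumptions made for the example, and the data are the six price and demand pairs given in the illustration.

```python
def deviation_regressions(xs, ys):
    """Return (mean_x, mean_y, b_xy, b_yx) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # x = X - mean of X
    dy = [y - mean_y for y in ys]  # y = Y - mean of Y
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    sum_xx = sum(p * p for p in dx)
    sum_yy = sum(q * q for q in dy)
    b_xy = sum_xy / sum_yy  # regression coefficient of X on Y
    b_yx = sum_xy / sum_xx  # regression coefficient of Y on X
    return mean_x, mean_y, b_xy, b_yx


price = [10, 12, 13, 12, 16, 15]
demand = [40, 38, 43, 45, 37, 43]
mx, my, b_xy, b_yx = deviation_regressions(price, demand)
# Expand X - mx = b_xy (Y - my) and Y - my = b_yx (X - mx) into slope-intercept form.
print(f"X = {b_xy:.2f} Y + {mx - b_xy * my:.2f}")
print(f"Y = {b_yx:.2f} X + {my - b_yx * mx:.2f}")
```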
Regression Coefficients
Regression equation of Y on X, is given as
Y = a + b X
The quantity b, the slope of the line of regression Y
on X is called the regression coefficient.
The regression coefficient of Y on X is also given
by:
$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
Regression coefficient of X on Y is given by:
$$b_{xy} = r\frac{\sigma_x}{\sigma_y}$$
Where r is the coefficient of correlation
between X and Y,
$\sigma_x$ is the population standard deviation
of X,
$\sigma_y$ is the population standard deviation
of Y.
Properties of Regression Coefficients:
i. The coefficient of correlation is the
geometric mean of the two regression
coefficients:
$$r = \sqrt{b_{xy} \cdot b_{yx}}$$
ii. If $b_{xy}$ is positive then $b_{yx}$ is also
positive, and vice versa. Thus both
regression coefficients must have the same
sign.
iii. Since the coefficient of correlation lies
between -1 and +1, the product of the two
regression coefficients cannot exceed one. If one
of the regression coefficients is greater than one
in magnitude, the other must be less than one.
iv. The coefficient of correlation and the regression
coefficients have the same sign. If the former is
positive, the latter are also positive, and vice
versa.
v. The arithmetic mean of $b_{yx}$ and $b_{xy}$ is equal to
or greater than the coefficient of correlation.
vi. Regression coefficients are independent of a change
of origin but not of scale.
vii. Since $b_{xy} = r\frac{\sigma_x}{\sigma_y}$, we can find any one of the four
values given the other three.
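The geometric-mean property can be checked numerically. The sketch below is a hedged illustration, not a prescribed method: `correlation_from_coefficients` is a hypothetical helper, the sign of r is taken from the common sign of the two coefficients (as the sign property requires), and the inputs are the values $b_{xy} = -0.12$ and $b_{yx} = -0.25$ from the price and demand illustration.

```python
import math


def correlation_from_coefficients(b_xy, b_yx):
    """Return r as the geometric mean of b_xy and b_yx, carrying their common sign."""
    if b_xy * b_yx < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(b_xy * b_yx)
    return -r if b_xy < 0 else r


b_xy, b_yx = -0.12, -0.25  # from the price/demand illustration
r = correlation_from_coefficients(b_xy, b_yx)

# Direct computation from the deviation sums of the same illustration:
# sum(xy) = -6, sum(x^2) = 24, sum(y^2) = 50.
r_direct = -6 / math.sqrt(24 * 50)

print(round(r, 4), round(r_direct, 4))
```

Both routes give the same value of r, and the arithmetic mean of the two coefficients in magnitude (0.185) is not smaller than the magnitude of r, in line with the arithmetic-mean property.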
Standard Error of Estimate (S. E.)
The standard error of estimate measures the
dispersion about an average line, called the
regression line.
It is analogous to standard deviation. S. E. of Y
measures the variability of the observed values
of Y around the regression line.
The deviations are measured not from the arithmetic
mean but from the regression line: for the S. E. of Y
they are the vertical distances of every point from
the line of average relationship.
Formula:
$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N}} = \sigma_y\sqrt{1 - r^2}$$
$S_{yx}$ is the standard error
of regression of Y
values from $Y_c$.
Similarly,
$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N}}$$
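A more convenient formula is:
$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N}}$$
$$S_{xy} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N}}$$
In each formula a and b are the constants of the corresponding
regression line (Y on X for $S_{yx}$, X on Y for $S_{xy}$).
The S. E. measures the accuracy of the estimated
figures.
The smaller the value of S. E., the closer will be
the dots to the regression line and the better the
estimates based on the equation for this line.
If S. E. is zero, then there is no variation about
the line and the correlation will be perfect.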
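To see that the definition and the convenient formula for the standard error of Y agree, both can be evaluated on the same data. The following Python sketch assumes the five (X, Y) pairs of the first illustration and its Y on X line, a = 11.9 and b = -0.65; the variable names are chosen only for this example.

```python
import math

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
a, b = 11.9, -0.65  # Y on X line from the first illustration
n = len(X)

# Definition: root of the mean squared deviation of Y from the computed value Y_c.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
se_definition = math.sqrt(residual_ss / n)

# Convenient form: needs only the column totals of the working table.
sum_y = sum(Y)
sum_yy = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
se_convenient = math.sqrt((sum_yy - a * sum_y - b * sum_xy) / n)

print(round(se_definition, 4), round(se_convenient, 4))
```

The two results coincide up to floating-point rounding, since the convenient form is an algebraic rearrangement of the definition once a and b satisfy the normal equations.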
With the help of S. E. it is possible for us to ascertain
how good and representative the regression line is as a
description of the average relationship between two
series.
Interpreting the SE of Estimate and finding the
Confidence Limits for the Estimate in Large and Small
Samples:
a) SE for Large Samples (where N > 30 in a sample)
With the assumption that the observed values of Y are
normally distributed around the regression line and that the
variance of the distribution around each possible value
of Y is the same, one can find
68% of all observations within Y ± 1 SE limits
95.5% of all observations within Y ± 2 SE limits
99.7% of all observations within Y ± 3 SE limits
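The limits for large samples amount to adding and subtracting a multiple of the S. E. from the estimated value. A minimal Python sketch follows; the function `estimate_limits` and the numbers used (an estimate of 50 with an S. E. of 2) are hypothetical and serve only to illustrate the arithmetic.

```python
def estimate_limits(y_estimate, se, k):
    """Return the (lower, upper) limits y_estimate -/+ k * se."""
    return y_estimate - k * se, y_estimate + k * se


# Hypothetical values chosen only for illustration.
y_c, se = 50.0, 2.0
for k, coverage in [(1, "68%"), (2, "95.5%"), (3, "99.7%")]:
    low, high = estimate_limits(y_c, se, k)
    print(f"{coverage} of observations expected between {low} and {high}")
```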