Linear Regression Analysis: Module II
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
LINEAR REGRESSION ANALYSIS
MODULE II
Lecture - 7
Simple Linear Regression Analysis
2
Orthogonal regression method (or major axis regression method)
The direct and reverse regression methods of estimation assume that the errors in the observations are either in the x-direction or in the y-direction, i.e., either in the dependent variable or in the independent variable. There can be situations when uncertainties are involved in both the dependent and the independent variables. In such situations, orthogonal regression is more appropriate. In order to take care of errors in both directions, the least squares principle in orthogonal regression minimizes the squared perpendicular distance between the observed data points and the line in the scatter diagram to obtain the estimates of the regression coefficients. This is also known as the major axis regression method. The estimates obtained are called orthogonal regression estimates or major axis regression estimates of the regression coefficients.

If we assume that the regression line to be fitted is $Y_i = \beta_0 + \beta_1 X_i$, then it is expected that all the observations $(x_i, y_i),\ i = 1, 2, \ldots, n$, lie on this line. But these points deviate from the line, and in such a case the squared perpendicular distance of the $i$th observed data point $(x_i, y_i)$ from the line is given by

$$d_i^2 = (X_i - x_i)^2 + (Y_i - y_i)^2$$

where $(X_i, Y_i)$ denotes the $i$th pair of observations without any error which lie on the line.

[Scatter diagram (orthogonal or major axis regression method): observed points $(x_i, y_i)$ and the corresponding error-free points $(X_i, Y_i)$ on the line, joined by perpendicular segments.]
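As a side note, for a candidate line $y = \beta_0 + \beta_1 x$ the perpendicular distance from an observed point has the closed form $|y_i - \beta_0 - \beta_1 x_i| / \sqrt{1 + \beta_1^2}$, whose square is exactly the minimized $d_i^2$ above. The short Python sketch below (with made-up numbers, purely for illustration) computes this quantity.

```python
import numpy as np

def perpendicular_distance(x0, y0, b0, b1):
    """Perpendicular distance from the point (x0, y0) to the line y = b0 + b1*x."""
    # the line in implicit form is  b1*x - y + b0 = 0
    return abs(b1 * x0 - y0 + b0) / np.sqrt(1.0 + b1 ** 2)

print(perpendicular_distance(x0=2.0, y0=3.5, b0=1.0, b1=1.0))  # example point and line
```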
3
The objective is to minimize the sum of squared perpendicular distances $\sum_{i=1}^{n} d_i^2$ to obtain the estimates of $\beta_0$ and $\beta_1$. The observations $(x_i, y_i)\ (i = 1, 2, \ldots, n)$ are expected to lie on the line $Y_i = \beta_0 + \beta_1 X_i$, so let

$$E_i \equiv Y_i - \beta_0 - \beta_1 X_i = 0.$$

The regression coefficients are obtained by minimizing $\sum_{i=1}^{n} d_i^2$ under the constraints $E_i$'s using the Lagrangian multiplier method. The Lagrangian function is

$$L_0 = \sum_{i=1}^{n} d_i^2 - 2\sum_{i=1}^{n} \lambda_i E_i$$

where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting

$$\frac{\partial L_0}{\partial X_i} = 0, \quad \frac{\partial L_0}{\partial Y_i} = 0, \quad \frac{\partial L_0}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial L_0}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n).$$

Thus we find

$$\frac{1}{2}\frac{\partial L_0}{\partial X_i} = (X_i - x_i) + \lambda_i \beta_1 = 0,$$

$$\frac{1}{2}\frac{\partial L_0}{\partial Y_i} = (Y_i - y_i) - \lambda_i = 0,$$

$$\frac{1}{2}\frac{\partial L_0}{\partial \beta_0} = \sum_{i=1}^{n} \lambda_i = 0,$$

$$\frac{1}{2}\frac{\partial L_0}{\partial \beta_1} = \sum_{i=1}^{n} \lambda_i X_i = 0.$$
4
Since

$$X_i = x_i - \lambda_i \beta_1, \qquad Y_i = y_i + \lambda_i,$$

substituting these values in $E_i$, we obtain

$$E_i \equiv (y_i + \lambda_i) - \beta_0 - \beta_1 (x_i - \lambda_i \beta_1) = 0 \;\Rightarrow\; \lambda_i = \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2}.$$

Also, using this $\lambda_i$ in the equation $\sum_{i=1}^{n} \lambda_i = 0$, we get

$$\sum_{i=1}^{n} \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2} = 0,$$

and using $X_i = x_i - \lambda_i \beta_1$ in $\sum_{i=1}^{n} \lambda_i X_i = 0$, we get

$$\sum_{i=1}^{n} \lambda_i (x_i - \lambda_i \beta_1) = 0.$$

Substituting $\lambda_i$ in this equation, we get

$$\sum_{i=1}^{n} \frac{(\beta_0 + \beta_1 x_i - y_i)\, x_i}{1 + \beta_1^2} - \beta_1 \sum_{i=1}^{n} \frac{(\beta_0 + \beta_1 x_i - y_i)^2}{(1 + \beta_1^2)^2} = 0. \qquad (1)$$

Using $\lambda_i$ in the equation $\sum_{i=1}^{n} \lambda_i = 0$, i.e., using the equation $\sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i)/(1 + \beta_1^2) = 0$, we solve for $\beta_0$.
5
The solution provides an orthogonal regression estimate of $\beta_0$ as

$$\hat{\beta}_{0OR} = \bar{y} - \hat{\beta}_{1OR}\,\bar{x}$$

where $\hat{\beta}_{1OR}$ is an orthogonal regression estimate of $\beta_1$.

Now, substituting $\hat{\beta}_{0OR}$ in equation (1), we get

$$(1 + \beta_1^2)\sum_{i=1}^{n}\left(x_i \bar{y} - \beta_1 x_i \bar{x} + \beta_1 x_i^2 - x_i y_i\right) - \beta_1 \sum_{i=1}^{n}\left[(y_i - \bar{y}) - \beta_1 (x_i - \bar{x})\right]^2 = 0$$

or

$$(1 + \beta_1^2)\sum_{i=1}^{n} x_i\left[(\bar{y} - y_i) - \beta_1 (\bar{x} - x_i)\right] - \beta_1 \sum_{i=1}^{n}\left[(y_i - \bar{y}) - \beta_1 (x_i - \bar{x})\right]^2 = 0.$$

Writing $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$, and since $\sum_{i=1}^{n} u_i = \sum_{i=1}^{n} v_i = 0$, this reduces to

$$\sum_{i=1}^{n}\left[\beta_1^2\, u_i v_i + \beta_1\,(u_i^2 - v_i^2) - u_i v_i\right] = 0$$

or

$$\beta_1^2\, s_{xy} + \beta_1\,(s_{xx} - s_{yy}) - s_{xy} = 0$$

where $s_{xy} = \sum_{i=1}^{n} u_i v_i$, $s_{xx} = \sum_{i=1}^{n} u_i^2$ and $s_{yy} = \sum_{i=1}^{n} v_i^2$.
6
Solving this quadratic equation provides the orthogonal regression estimate of $\beta_1$ as

$$\hat{\beta}_{1OR} = \frac{(s_{yy} - s_{xx}) + \mathrm{sign}(s_{xy})\sqrt{(s_{xx} - s_{yy})^2 + 4\, s_{xy}^2}}{2\, s_{xy}}$$

where $\mathrm{sign}(s_{xy})$ denotes the sign of $s_{xy}$, which can be positive or negative, i.e.,

$$\mathrm{sign}(s_{xy}) = \begin{cases} +1 & \text{if } s_{xy} > 0, \\ -1 & \text{if } s_{xy} < 0. \end{cases}$$

Notice that the quadratic equation gives two solutions for $\hat{\beta}_{1OR}$. We choose the solution which minimizes $\sum_{i=1}^{n} d_i^2$; the other solution maximizes $\sum_{i=1}^{n} d_i^2$ and corresponds to the direction perpendicular to the optimal solution (the product of the two roots of the quadratic is $-s_{xy}/s_{xy} = -1$, so the two slopes are negative reciprocals of each other). The optimal solution can be chosen with the sign of $s_{xy}$.
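The closed-form estimates above translate directly into code. The following is a minimal Python sketch (the data-generating values are made up for illustration) that computes $\hat{\beta}_{1OR}$ from $s_{xx}$, $s_{yy}$, $s_{xy}$ and then $\hat{\beta}_{0OR} = \bar{y} - \hat{\beta}_{1OR}\bar{x}$:

```python
import numpy as np

def orthogonal_regression(x, y):
    """Orthogonal (major axis) regression estimates of the intercept and slope."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    u, v = x - x.mean(), y - y.mean()
    sxx, syy, sxy = (u * u).sum(), (v * v).sum(), (u * v).sum()
    sign = 1.0 if sxy > 0 else -1.0
    # root of  b^2*sxy + b*(sxx - syy) - sxy = 0  chosen with the sign of sxy
    b1 = ((syy - sxx) + sign * np.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# illustrative data with errors in both x and y (made-up parameters)
rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 50)
x = x_true + rng.normal(scale=0.5, size=50)
y = 2.0 + 1.5 * x_true + rng.normal(scale=0.5, size=50)
print(orthogonal_regression(x, y))
```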
7
Reduced major axis regression method
The direct, reverse and orthogonal methods of estimation minimize the errors in a particular direction, which is usually the distance between the observed data points and the line in the scatter diagram. Alternatively, one can consider the area extended by the data points in a certain neighbourhood and, instead of distances, minimize the areas of the rectangles defined between each observed data point and the nearest point on the line in the scatter diagram. Such an approach is more appropriate when uncertainties are present in the study as well as the explanatory variables. This approach is termed reduced major axis regression.

Suppose the regression line is $Y_i = \beta_0 + \beta_1 X_i$, on which all the observed points are expected to lie. Suppose the points $(x_i, y_i),\ i = 1, 2, \ldots, n$, are observed which lie away from the line.
8
The area of the rectangle extended between the $i$th observed data point and the line is

$$A_i = (X_i \sim x_i)(Y_i \sim y_i) \quad (i = 1, 2, \ldots, n)$$

where $(X_i, Y_i)$ denotes the $i$th pair of observations without any error which lie on the line, and $a \sim b$ denotes the absolute difference between $a$ and $b$.

The total area extended by the $n$ data points is

$$\sum_{i=1}^{n} A_i = \sum_{i=1}^{n} (X_i \sim x_i)(Y_i \sim y_i).$$

All observed data points $(x_i, y_i)\ (i = 1, 2, \ldots, n)$ are expected to lie on the line $Y_i = \beta_0 + \beta_1 X_i$, so let

$$E_i^{*} \equiv Y_i - \beta_0 - \beta_1 X_i = 0.$$

The objective now is to minimize the sum of areas under the constraints $E_i^{*}$ to obtain the reduced major axis estimates of the regression coefficients. Using the Lagrangian multiplier method, the Lagrangian function is

$$L_R = \sum_{i=1}^{n} A_i - \sum_{i=1}^{n} \lambda_i E_i^{*} = \sum_{i=1}^{n} (X_i - x_i)(Y_i - y_i) - \sum_{i=1}^{n} \lambda_i E_i^{*}$$

where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting

$$\frac{\partial L_R}{\partial X_i} = 0, \quad \frac{\partial L_R}{\partial Y_i} = 0, \quad \frac{\partial L_R}{\partial \beta_0} = 0, \quad \frac{\partial L_R}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n).$$
9
Thus

$$\frac{\partial L_R}{\partial X_i} = (Y_i - y_i) + \lambda_i \beta_1 = 0,$$

$$\frac{\partial L_R}{\partial Y_i} = (X_i - x_i) - \lambda_i = 0,$$

$$\frac{\partial L_R}{\partial \beta_0} = \sum_{i=1}^{n} \lambda_i = 0,$$

$$\frac{\partial L_R}{\partial \beta_1} = \sum_{i=1}^{n} \lambda_i X_i = 0.$$

Now

$$X_i = x_i + \lambda_i, \qquad Y_i = y_i - \lambda_i \beta_1,$$

and substituting these in $Y_i - \beta_0 - \beta_1 X_i = 0$ gives

$$(y_i - \lambda_i \beta_1) - \beta_0 - \beta_1 (x_i + \lambda_i) = 0 \;\Rightarrow\; \lambda_i = \frac{y_i - \beta_0 - \beta_1 x_i}{2\beta_1}.$$

Substituting this $\lambda_i$ in $\sum_{i=1}^{n} \lambda_i = 0$, the reduced major axis regression estimate of $\beta_0$ is obtained as

$$\hat{\beta}_{0RM} = \bar{y} - \hat{\beta}_{1RM}\,\bar{x}$$

where $\hat{\beta}_{1RM}$ is the reduced major axis regression estimate of $\beta_1$. Using $X_i = x_i + \lambda_i$ and $\hat{\beta}_{0RM}$ in $\sum_{i=1}^{n} \lambda_i X_i = 0$, we get

$$\sum_{i=1}^{n} \left( \frac{(y_i - \bar{y}) - \beta_1 (x_i - \bar{x})}{2\beta_1} \right) \left( x_i + \frac{(y_i - \bar{y}) - \beta_1 (x_i - \bar{x})}{2\beta_1} \right) = 0.$$
10
Let $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$; then this equation can be re-expressed as

$$\sum_{i=1}^{n} (v_i - \beta_1 u_i)(v_i - \beta_1 u_i + 2\beta_1 x_i) = 0.$$

Using $\sum_{i=1}^{n} u_i = \sum_{i=1}^{n} v_i = 0$, we get

$$\sum_{i=1}^{n} v_i^2 - \beta_1^2 \sum_{i=1}^{n} u_i^2 = 0.$$

Solving this equation, the reduced major axis regression estimate of $\beta_1$ is obtained as

$$\hat{\beta}_{1RM} = \mathrm{sign}(s_{xy}) \sqrt{\frac{s_{yy}}{s_{xx}}}$$

where

$$\mathrm{sign}(s_{xy}) = \begin{cases} +1 & \text{if } s_{xy} > 0, \\ -1 & \text{if } s_{xy} < 0. \end{cases}$$

We choose the estimate of the slope which has the same sign as that of $s_{xy}$.
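As with orthogonal regression, the closed form lends itself to a short sketch. The following Python function (illustrative only) computes the reduced major axis estimates from the same sums of squares and cross products:

```python
import numpy as np

def reduced_major_axis(x, y):
    """Reduced major axis regression estimates of the intercept and slope."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    u, v = x - x.mean(), y - y.mean()
    sxx, syy, sxy = (u * u).sum(), (v * v).sum(), (u * v).sum()
    b1 = np.sign(sxy) * np.sqrt(syy / sxx)   # slope: sign(s_xy) * sqrt(s_yy / s_xx)
    b0 = y.mean() - b1 * x.mean()            # intercept: ybar - b1 * xbar
    return b0, b1
```

Up to sign, the slope is the ratio of the standard deviations of $y$ and $x$; this estimator is sometimes called geometric mean regression, since $\sqrt{s_{yy}/s_{xx}}$ is the geometric mean of the direct slope $s_{xy}/s_{xx}$ and the reciprocal of the reverse slope $s_{xy}/s_{yy}$.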
11
Least absolute deviation regression method
The least squares principle advocates the minimization of the sum of squared errors. Squaring the errors is preferred over using the simple errors because the random errors can be positive as well as negative, so their sum can be close to zero even for a poorly fitting model, which would misleadingly indicate that there is no error. Instead of the sum of random errors, the sum of absolute random errors can also be considered, which likewise avoids the cancellation of positive and negative random errors.

In the method of least squares, the estimates of the parameters $\beta_0$ and $\beta_1$ in the model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n)$$

are chosen such that the sum of squared deviations $\sum_{i=1}^{n} \varepsilon_i^2$ is minimum. In the method of least absolute deviation (LAD) regression, the parameters $\beta_0$ and $\beta_1$ are estimated such that the sum of absolute deviations $\sum_{i=1}^{n} |\varepsilon_i|$ is minimum, i.e., the sum of the absolute vertical deviations of the observations from the line is minimized.
12
The LAD estimates $\hat{\beta}_{0LAD}$ and $\hat{\beta}_{1LAD}$ are the values of $\beta_0$ and $\beta_1$, respectively, which minimize

$$LAD(\beta_0, \beta_1) = \sum_{i=1}^{n} \left| y_i - \beta_0 - \beta_1 x_i \right|$$

for the given observations $(x_i, y_i)\ (i = 1, 2, \ldots, n)$.

Conceptually, the LAD procedure is simpler than the OLS procedure because the absolute residual $|e|$ is a more straightforward measure of the size of a residual than the squared residual $e^2$. The LAD regression estimates of $\beta_0$ and $\beta_1$ are, however, not available in closed form; they have to be obtained numerically using suitable algorithms. Moreover, the numerical computation has to deal with the problems of non-uniqueness and degeneracy of the estimates. Non-uniqueness refers to more than one best line passing through a data point, and degeneracy refers to the best line through a data point also passing through more than one other data point. The non-uniqueness and degeneracy concepts are used in the algorithms to judge the quality of the estimates. The algorithm for finding the estimators generally proceeds in steps. At each step, the best line is found that passes through a given data point. The best line always passes through another data point, and this data point is used in the next step. When there is non-uniqueness, there is more than one best line; when there is degeneracy, the best line passes through more than one other data point. When either problem is present, there is more than one choice for the data point to be used in the next step, and the algorithm may go around in circles or make a wrong choice of the LAD regression line. Exact tests of hypothesis and confidence intervals for the LAD regression estimates cannot be derived analytically; instead, they are derived analogously to the tests of hypothesis and confidence intervals related to the ordinary least squares estimates.
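Since no closed form exists, the LAD criterion has to be minimized numerically. A minimal sketch (assuming NumPy and SciPy are available; production LAD solvers typically use linear programming rather than a generic optimizer) is:

```python
import numpy as np
from scipy.optimize import minimize

def lad_objective(beta, x, y):
    """Sum of absolute deviations for a candidate (intercept, slope)."""
    b0, b1 = beta
    return np.abs(y - b0 - b1 * x).sum()

def lad_fit(x, y):
    """Numerical LAD estimates; the OLS fit is used as the starting value."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    b1_ols = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b0_ols = y.mean() - b1_ols * x.mean()
    # Nelder-Mead is derivative-free, which suits the non-differentiable objective
    res = minimize(lad_objective, x0=[b0_ols, b1_ols], args=(x, y), method="Nelder-Mead")
    return res.x  # array([b0_lad, b1_lad])
```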
13
Estimation of parameters when X is stochastic
In the usual linear regression model, the study variable is supposed to be random and the explanatory variable is assumed to be fixed. In practice, there may be situations in which the explanatory variable is also random.

Suppose both the dependent and independent variables are stochastic in the simple linear regression model

$$y = \beta_0 + \beta_1 X + \varepsilon$$

where $\varepsilon$ is the associated random error component. The observations $(x_i, y_i),\ i = 1, 2, \ldots, n$, are assumed to be jointly distributed. The statistical inferences in such cases are then drawn conditionally on $X$.

Assume the joint distribution of $X$ and $y$ to be bivariate normal $N(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$, where $\mu_x$ and $\mu_y$ are the means of $X$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances of $X$ and $y$, and $\rho$ is the correlation coefficient between $X$ and $y$. Then the conditional distribution of $y$ given $X = x$ is univariate normal with conditional mean

$$E(y \mid X = x) = \mu_{y|x} = \beta_0 + \beta_1 x$$

and conditional variance

$$\mathrm{Var}(y \mid X = x) = \sigma^2_{y|x} = \sigma_y^2 (1 - \rho^2)$$

where

$$\beta_0 = \mu_y - \beta_1 \mu_x \qquad \text{and} \qquad \beta_1 = \rho\,\frac{\sigma_y}{\sigma_x}.$$

Moreover, the correlation coefficient

$$\rho = \frac{E\left[(X - \mu_x)(y - \mu_y)\right]}{\sigma_x \sigma_y}$$

can be estimated by the sample correlation coefficient

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{s_{xy}}{\sqrt{s_{xx}\, s_{yy}}}.$$
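As a quick numerical check of these conditional-distribution relations (a sketch with made-up parameter values, not part of the original derivation), one can simulate a large bivariate normal sample and compare the fitted slope of $y$ on $x$ with $\rho\sigma_y/\sigma_x$:

```python
import numpy as np

# made-up bivariate normal parameters
mu_x, mu_y = 2.0, 5.0
sigma_x, sigma_y, rho = 1.5, 2.0, 0.7
cov = [[sigma_x ** 2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y ** 2]]

rng = np.random.default_rng(7)
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=100_000).T

beta1 = rho * sigma_y / sigma_x               # theoretical slope of E(y | X = x)
beta0 = mu_y - beta1 * mu_x                   # theoretical intercept
b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # sample slope of y on x
print(beta0, beta1)                           # theoretical values
print(y.mean() - b1 * x.mean(), b1)           # close to the above for a large sample
```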
14
When both $X$ and $y$ are stochastic, the problem of estimation of the parameters can be reformulated as follows. Consider the conditional random variable $y \mid X = x$ having a normal distribution with conditional mean $\mu_{y|x}$ and conditional variance $\mathrm{Var}(y \mid X = x) = \sigma^2_{y|x}$. Obtain $n$ independently distributed observations $y_i \mid x_i,\ i = 1, 2, \ldots, n$, from $N(\mu_{y|x}, \sigma^2_{y|x})$ with nonstochastic $X$. Now the method of maximum likelihood can be used to estimate the parameters, and it yields the same estimates of $\beta_0$ and $\beta_1$ as earlier in the case of nonstochastic $X$, namely

$$b_0 = \bar{y} - b_1 \bar{x} \qquad \text{and} \qquad b_1 = \frac{s_{xy}}{s_{xx}},$$

respectively, where $s_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, $s_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$ and $s_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2$.
15
Thus the squared sample correlation coefficient satisfies

$$r^2 = \frac{s_{xy}^2}{s_{xx}\, s_{yy}} = \frac{b_1\, s_{xy}}{s_{yy}} = \frac{b_1^2\, s_{xx}}{s_{yy}} = R^2,$$

which is the same as the coefficient of determination. Thus $R^2$ has the same expression as in the case when $X$ is fixed, and $R^2$ again measures the goodness of fit of the model even when $X$ is stochastic.
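A short numerical check of this identity (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=200)          # stochastic explanatory variable
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=200)

u, v = x - x.mean(), y - y.mean()
sxx, syy, sxy = (u * u).sum(), (v * v).sum(), (u * v).sum()

b1 = sxy / sxx                      # least squares / ML slope
r = sxy / np.sqrt(sxx * syy)        # sample correlation coefficient
R2 = b1 ** 2 * sxx / syy            # coefficient of determination

print(np.isclose(R2, r ** 2))       # True: R^2 equals the squared correlation
```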