Chapter 5: Transformation and Weighting to Correct Model Inadequacies
The graphical methods help in detecting violations of the basic assumptions in regression analysis. Now we consider methods and procedures for building models through data transformation when some of these assumptions are violated.

Variance-stabilizing transformations
The assumption of constant variance is violated when the study variable follows a probability distribution in which the variance is functionally related to the mean. For example, if the study variable ($y$) in a simple linear regression model is a Poisson random variable, then its variance is the same as its mean. Since the mean of $y$ is related to the explanatory variable $x$, the variance of $y$ will be proportional to $x$. In such cases, variance-stabilizing transformations are useful.
In another example, if $y$ is a proportion, i.e., $0 \le y_i \le 1$, then the variance of $y$ is proportional to $E(y)[1 - E(y)]$.
Some commonly used variance-stabilizing transformations, in the order of their strength, are as follows:

Relation of $\sigma^2$ to $E(y)$ | Transformation
$\sigma^2 \propto E(y)$ | $y^* = \sqrt{y}$ (Poisson data)
$\sigma^2 \propto E(y)[1 - E(y)]$ | $y^* = \sin^{-1}(\sqrt{y})$ (binomial proportions, $0 \le y_i \le 1$)
$\sigma^2 \propto [E(y)]^2$ | $y^* = \ln(y)$
$\sigma^2 \propto [E(y)]^3$ | $y^* = 1/\sqrt{y}$
$\sigma^2 \propto [E(y)]^4$ | $y^* = 1/y$
After making a suitable transformation, use y * as a study variable in the respective case.
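As a quick numerical illustration of variance stabilization, the following Python sketch simulates Poisson data whose variance grows with $x$ and applies the square-root transformation from the table above (all data and parameter values are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Poisson responses whose mean (and hence variance) grows with x.
x = np.repeat(np.arange(1, 11), 200)
y = rng.poisson(lam=5.0 * x)

# Group-wise sample variances before and after the square-root transformation.
var_raw = np.array([y[x == v].var() for v in range(1, 11)])
var_sqrt = np.array([np.sqrt(y[x == v]).var() for v in range(1, 11)])

# The raw variances grow roughly tenfold across the x range, while the
# transformed variances stay near 1/4 (the Poisson limit for Var(sqrt(y))).
print(var_raw.max() / var_raw.min())
print(var_sqrt.max() / var_sqrt.min())
```

The near-constant variance of $\sqrt{y}$ is what makes ordinary least squares appropriate on the transformed scale.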
Regression Analysis | Chapter 5 | Transf. Weight. Correct Model Inadequacies | Shalabh, IIT Kanpur
The strength of a transformation depends on the amount of curvature present in the curve between the study and the explanatory variable. The transformations mentioned here range from relatively mild to relatively strong: the square-root transformation is relatively mild, and the reciprocal transformation is relatively strong.
In general, a mild transformation is applied when the minimum and maximum values do not differ greatly (e.g., $y_{max}/y_{min} \le 2$ or $3$), and such a transformation has little effect on the curvature. On the other hand, when the minimum and maximum differ greatly, a strong transformation is needed and will have a substantial impact on the analysis.
In the presence of non-constant variance, the OLSE remains unbiased but loses the minimum variance property.
When the study variable has been transformed as y* , then the predicted values are in the transformed scale.
It is often necessary to convert the predicted values back to the original units ( y ).
When the inverse transformation is applied directly to the predicted values, it gives an estimate of the median of the distribution of the study variable rather than the mean. So one needs to be careful while doing so.
Confidence intervals and prediction intervals may be directly converted from one metric to another, because the interval estimates are percentiles of a distribution and percentiles are unaffected by a monotone transformation. One may note, however, that the resulting intervals may or may not remain the shortest possible intervals.
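Both points, the median-versus-mean issue and the direct conversion of percentile-based interval endpoints, can be checked numerically. A Python sketch on simulated log-normal data (all names and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical log-normal responses: ln(y) ~ N(mu, sigma^2).
mu, sigma = 2.0, 0.8
y = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

# Back-transforming the mean of ln(y) estimates the MEDIAN of y, not its mean:
# E(y) = exp(mu + sigma^2/2) exceeds exp(mu) = median(y).
back = np.exp(np.log(y).mean())
print(back, np.median(y), y.mean())

# Interval endpoints are percentiles, so they convert directly between scales.
lo, hi = np.percentile(np.log(y), [2.5, 97.5])
lo_y, hi_y = np.percentile(y, [2.5, 97.5])
print(np.exp(lo) - lo_y, np.exp(hi) - hi_y)  # both differences are tiny
```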
In some cases, a nonlinear model can be linearized by using a suitable transformation. Such nonlinear models are called intrinsically or transformably linear. The advantage of transforming the nonlinear
function into the linear function is that the statistical tools are developed for the case of a linear regression
model. For example, exact tests for the test of hypothesis, confidence interval estimation etc. are developed
for the case of a linear regression model. Once the nonlinear function is transformed to a linear function, all
such tools can be readily applied, and there is no need to develop them separately.
1. If the curve between $y$ and $x$ resembles the power function $y = \beta_0 x^{\beta_1}$, then using the transformation $y^* = \ln y$, $x^* = \ln x$, i.e., by taking logs on both sides, the model becomes

$\ln y = \ln \beta_0 + \beta_1 \ln x$

or $y^* = \beta_0^* + \beta_1 x^*$

where $\beta_0^* = \ln \beta_0$, and the model becomes a linear model. Note that the parameter $\beta_0$ changes to $\ln \beta_0$ in the transformed model.
2. If the curve between $y$ and $x$ resembles the exponential function $y = \beta_0 \exp(\beta_1 x)$, then taking logs on both sides gives

$\ln y = \ln \beta_0 + \beta_1 x$

or $y^* = \beta_0^* + \beta_1 x$

where $y^* = \ln y$ and $\beta_0^* = \ln \beta_0$. So $y^* = \ln y$ is the transformation needed in this case. The intercept term $\beta_0$ becomes $\ln \beta_0$ in the transformed model.
3. A curve of the form $y = \dfrac{x}{\beta_0 x + \beta_1}$, i.e., $\dfrac{1}{y} = \beta_0 + \beta_1 \dfrac{1}{x}$, becomes a linear model by using the transformation $y^* = \dfrac{1}{y}$, $x^* = \dfrac{1}{x}$, giving $y^* = \beta_0 + \beta_1 x^*$.
With the observed behaviour of the plots, one can choose any such curve and use the linearized form
of the function.
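As an illustration of fitting one such linearized form, the following Python sketch simulates data from a power-law relation with multiplicative error and recovers the parameters by least squares on the log-log scale (the parameter values are hypothetical, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical power-law data y = b0 * x**b1 * eps with multiplicative error.
b0, b1 = 2.0, 1.5
x = rng.uniform(1.0, 10.0, size=500)
y = b0 * x**b1 * rng.lognormal(mean=0.0, sigma=0.1, size=x.size)

# Taking logs gives the linear model ln y = ln b0 + b1 * ln x + ln eps,
# which ordinary least squares fits directly.
X = np.column_stack([np.ones(x.size), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b0_hat, b1_hat = np.exp(coef[0]), coef[1]
print(b0_hat, b1_hat)  # close to the true values 2.0 and 1.5
```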
When such transformations are used, the form of the random error $\varepsilon$ also often changes. For example, in the case of

$y = \beta_0 \exp(\beta_1 x)\,\varepsilon,$

$\ln y = \ln \beta_0 + \beta_1 x + \ln \varepsilon$

or $y^* = \beta_0^* + \beta_1 x + \varepsilon^*.$
This implies that the multiplicative error in the original model must be log-normally distributed for the error in the transformed model to be normally distributed. Many times we ignore this aspect and continue to assume that the random errors are still normally distributed. In such cases, the residuals from the transformed model should be checked for the validity of the assumptions.
When such transformations are used, the OLSE has the desired properties with respect to the
transformed data and not the original data.
Box-Cox power transformation
When the power transformation $y^{\lambda}$ of the study variable is used, the exponent $\lambda$ must itself be chosen. For example, if $\lambda = 0.5$, then the transformation is the square root and $\sqrt{y}$ is used as the study variable in place of $y$.

Now the linear regression model has parameters $\beta$, $\sigma^2$ and $\lambda$. The Box and Cox method tells how to estimate $\lambda$ and $\beta$ simultaneously using the method of maximum likelihood.

Note that as $\lambda$ approaches zero, $y^{\lambda}$ approaches 1. So there is a problem at $\lambda = 0$, because this would make all the observations on $y$ equal to unity; it is meaningless for all the observations on the study variable to be constant.
So there is a discontinuity at $\lambda = 0$. One approach to resolve this difficulty is to use $\dfrac{y^{\lambda} - 1}{\lambda}$ as the study variable. Note that $\dfrac{y^{\lambda} - 1}{\lambda} \to \ln y$ as $\lambda \to 0$. So a possible solution is to use the transformed study variable

$W = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \ln y & \text{for } \lambda = 0. \end{cases}$
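The continuity of this family at $\lambda = 0$ is easy to verify numerically. A small Python sketch (the sample values are arbitrary):

```python
import numpy as np

def W(y, lam):
    """Box-Cox family: (y**lam - 1)/lam for lam != 0, ln(y) at lam = 0."""
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

y = np.array([0.5, 1.0, 2.0, 10.0])
# As lam -> 0 the power form approaches ln(y), so the family is continuous.
for lam in (0.1, 0.01, 0.001):
    print(np.max(np.abs(W(y, lam) - np.log(y))))
```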
So the family $W$ is continuous. Still, it has a drawback: as $\lambda$ changes, the values of $W$ change dramatically, so it is difficult to compare fits for different values of $\lambda$. If different analysts obtain different values of $\lambda$, they will fit different models, and it may then not be appropriate to compare the models with different values of $\lambda$. So it is preferable to use the alternative form
$y^{(\lambda)} = V = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, y^{*\,\lambda - 1}} & \text{for } \lambda \neq 0 \\ y^{*} \ln y & \text{for } \lambda = 0 \end{cases}$

where $y^{*}$ is the geometric mean of the $y_i$'s, $y^{*} = (y_1 y_2 \cdots y_n)^{1/n}$, which is constant.
When $V$ is applied to each $y_i$, we get $V = (V_1, V_2, \ldots, V_n)'$ as a vector of observations on the transformed study variable.
The quantity $y^{*\,\lambda - 1}$ in the denominator is related to the $n$th root of the Jacobian of the transformation. To see how, let

$y_i^{(\lambda)} = W_i = \dfrac{y_i^{\lambda} - 1}{\lambda}, \qquad \lambda \neq 0,$

and let $y = (y_1, y_2, \ldots, y_n)'$, $W = (W_1, W_2, \ldots, W_n)'$.

Note that if $W_1 = \dfrac{y_1^{\lambda} - 1}{\lambda}$, then

$\dfrac{\partial W_1}{\partial y_1} = y_1^{\lambda - 1}, \qquad \dfrac{\partial W_1}{\partial y_2} = 0.$

In general,

$\dfrac{\partial W_i}{\partial y_j} = \begin{cases} y_i^{\lambda - 1} & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}$
$J(W \to y) = \begin{vmatrix} \dfrac{\partial W_1}{\partial y_1} & \dfrac{\partial W_1}{\partial y_2} & \cdots & \dfrac{\partial W_1}{\partial y_n} \\ \dfrac{\partial W_2}{\partial y_1} & \dfrac{\partial W_2}{\partial y_2} & \cdots & \dfrac{\partial W_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial W_n}{\partial y_1} & \dfrac{\partial W_n}{\partial y_2} & \cdots & \dfrac{\partial W_n}{\partial y_n} \end{vmatrix} = \begin{vmatrix} y_1^{\lambda-1} & 0 & \cdots & 0 \\ 0 & y_2^{\lambda-1} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & y_n^{\lambda-1} \end{vmatrix} = \prod_{i=1}^{n} y_i^{\lambda - 1}$

so that

$J(y \to W) = \dfrac{1}{J(W \to y)} = \dfrac{1}{\prod_{i=1}^{n} y_i^{\lambda - 1}}.$
This is the Jacobian for transforming the whole vector $y$ into the whole vector $W$. If an individual $y_i$ is to be transformed into $W_i$, then take the geometric mean of these factors as

$J(y_i \to W_i) = \left(\dfrac{1}{\prod_{i=1}^{n} y_i^{\lambda-1}}\right)^{1/n} = \dfrac{1}{y^{*\,\lambda - 1}}.$
1
The quantity $J(y \to W) = \dfrac{1}{\prod_{i=1}^{n} y_i^{\lambda-1}}$ ensures that unit volume is preserved in moving from the set of $y_i$ to the set of $V_i$. This is a factor which scales the transformed observations and ensures that the residual sums of squares obtained for different values of $\lambda$ are directly comparable. We then fit the model

$y^{(\lambda)} = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I),$

where $y^{(\lambda)} = \dfrac{y^{\lambda} - 1}{\lambda\, y^{*\,\lambda - 1}}$.
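A minimal Python sketch of the scaled transformation (the sample values and the function name `boxcox_v` are ours, not from the text):

```python
import numpy as np

def boxcox_v(y, lam):
    """Scaled Box-Cox transform: V = (y**lam - 1)/(lam * gm**(lam - 1)) for
    lam != 0 and gm*ln(y) at lam = 0, with gm the geometric mean of y."""
    gm = np.exp(np.log(y).mean())   # gm = (y1*y2*...*yn)**(1/n)
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))

y = np.array([1.0, 2.0, 4.0, 8.0])
# At lam = 1 the scale factor gm**(lam-1) is 1, so V is just the shift y - 1.
print(boxcox_v(y, 1.0))  # → [0. 1. 3. 7.]
```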
1
Applying the method of maximum likelihood, the likelihood function for $y^{(\lambda)}$ is

$L\left(y^{(\lambda)}\right) = \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\dfrac{\sum_{i=1}^{n} \varepsilon_i^2}{2\sigma^2}\right)$

$= \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\dfrac{\varepsilon'\varepsilon}{2\sigma^2}\right)$

$= \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\dfrac{\left(y^{(\lambda)} - X\beta\right)'\left(y^{(\lambda)} - X\beta\right)}{2\sigma^2}\right)$

$\ln L\left(y^{(\lambda)}\right) = -\dfrac{n}{2} \ln \sigma^2 - \dfrac{\left(y^{(\lambda)} - X\beta\right)'\left(y^{(\lambda)} - X\beta\right)}{2\sigma^2} \quad \text{(ignoring the constant).}$
Solving

$\dfrac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \beta} = 0, \qquad \dfrac{\partial \ln L\left(y^{(\lambda)}\right)}{\partial \sigma^2} = 0$

gives the maximum likelihood estimators

$\hat{\beta}(\lambda) = (X'X)^{-1} X' y^{(\lambda)}$

$\hat{\sigma}^2(\lambda) = \dfrac{1}{n}\, y^{(\lambda)\prime} \left[I - X(X'X)^{-1}X'\right] y^{(\lambda)} = \dfrac{1}{n}\, y^{(\lambda)\prime} (I - H)\, y^{(\lambda)}, \qquad H = X(X'X)^{-1}X',$

for a given value of $\lambda$.
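For a fixed $\lambda$, these estimators are just the usual least-squares quantities computed on the transformed response. A Python sketch with simulated data (the design, sample size, and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical design and positive response (all values simulated).
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, n)])
y = np.exp(0.5 + 0.3 * X[:, 1] + rng.normal(0.0, 0.2, n))  # y > 0

lam = 0.5
gm = np.exp(np.log(y).mean())                       # geometric mean of y
y_lam = (y**lam - 1.0) / (lam * gm**(lam - 1.0))    # scaled transform

# ML estimators for this fixed lam: the usual least-squares quantities.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y_lam)    # (X'X)^{-1} X' y_lam
resid = y_lam - X @ beta_hat
sigma2_hat = (resid @ resid) / n                    # y'(I - H)y / n
print(beta_hat, sigma2_hat)
```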
Substituting $\hat{\beta}(\lambda)$ and $\hat{\sigma}^2(\lambda)$ back into the log-likelihood gives, up to an additive constant,

$L(\lambda) = -\dfrac{n}{2} \ln \hat{\sigma}^2(\lambda) = -\dfrac{n}{2} \ln SS_{res}(\lambda)$

where $SS_{res}(\lambda)$ is the sum of squares due to residuals, which is a function of $\lambda$. Now maximize $L(\lambda)$ with respect to $\lambda$. It is difficult to obtain a closed form for the estimator of $\lambda$, so the maximization is carried out numerically.

The function $\dfrac{n}{2} \ln SS_{res}(\lambda)$ is called the Box-Cox objective function; minimizing it is equivalent to maximizing $L(\lambda)$.
Let $\lambda_{max}$ be the value of $\lambda$ which minimizes the Box-Cox objective function, i.e., maximizes $L(\lambda)$. Then, under fairly general conditions, $2\left[L(\lambda_{max}) - L(\lambda)\right]$ has approximately a $\chi^2(1)$ distribution. This result is based on the large-sample behaviour of the likelihood ratio statistic and is explained as follows:
$-\ln \Lambda = -\dfrac{n}{2} \ln \left[\dfrac{SS_{res}(\lambda_{max})}{SS_{res}(\lambda)}\right]$

$= \dfrac{n}{2} \ln \left[\dfrac{SS_{res}(\lambda)}{SS_{res}(\lambda_{max})}\right]$

$= \dfrac{n}{2} \ln SS_{res}(\lambda) - \dfrac{n}{2} \ln SS_{res}(\lambda_{max})$

$= L(\lambda_{max}) - L(\lambda)$

where

$L(\lambda) = -\dfrac{n}{2} \ln SS_{res}(\lambda), \qquad L(\lambda_{max}) = -\dfrac{n}{2} \ln SS_{res}(\lambda_{max}).$
Since, under certain regularity conditions, $-2 \ln \Lambda_n$ converges in distribution to $\chi^2(1)$ when the null hypothesis is true,

$-2 \ln \Lambda \sim \chi^2(1)$

or $-\ln \Lambda \sim \dfrac{\chi^2(1)}{2}$

or $L(\lambda_{max}) - L(\lambda) \sim \dfrac{\chi^2(1)}{2}.$
2
Computational procedure
The maximum likelihood estimate of $\lambda$ corresponds to the value of $\lambda$ for which the residual sum of squares from the fitted model, $SS_{res}(\lambda)$, is a minimum. To determine such $\lambda$, we proceed computationally as follows:
- Fit the model using $y^{(\lambda)}$ for various values of $\lambda$. For example, start with values in (-1, 1), then take values in (-2, 2), and so on. About 15 to 20 values of $\lambda$ are expected to be sufficient for estimating the optimum value.
- Plot $SS_{res}(\lambda)$ versus $\lambda$ and read off the minimizing value $\hat{\lambda}$ from the plot.
Note that the value of $\lambda$ cannot be selected by directly comparing the residual sums of squares from the regressions of $y^{\lambda}$ on $x$, because for each $\lambda$ the residual sum of squares is measured on a different scale; this is precisely why the scaled transform $y^{(\lambda)}$ is used. It is better to use simple values of $\lambda$. For example, the practical difference between $\lambda = 0.5$ and $\lambda = 0.58$ is likely to be small, but $\lambda = 0.5$ is much easier to interpret.
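The grid-search procedure can be sketched in Python as follows (simulated data with a true log relationship; the grid and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data in which ln(y) is linear in x, so lam near 0 should win.
n = 300
x = rng.uniform(0.0, 4.0, n)
y = np.exp(1.0 + 0.8 * x + rng.normal(0.0, 0.3, n))
X = np.column_stack([np.ones(n), x])
gm = np.exp(np.log(y).mean())

def ss_res(lam):
    """Residual sum of squares of the scaled transform at a given lam."""
    v = gm * np.log(y) if lam == 0 else (y**lam - 1.0) / (lam * gm**(lam - 1.0))
    beta = np.linalg.solve(X.T @ X, X.T @ v)
    r = v - X @ beta
    return r @ r

# Evaluate SS_res on a coarse grid of simple lam values and take the minimizer.
grid = np.round(np.arange(-2.0, 2.01, 0.25), 2)
best = min(grid, key=ss_res)
print(best)  # expected near 0, i.e. the log transformation
```

SciPy's `scipy.stats.boxcox` performs a comparable search by maximizing the profile log-likelihood directly.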
It is entirely acceptable to use $y^{(\lambda)}$ as the response in the final model. This model will have a scale difference and an origin shift in comparison to a model expressed in terms of $y^{\lambda}$.
An approximate confidence interval for λ
We can find an approximate confidence interval for the transformation parameter $\lambda$. This interval helps in selecting the final value of $\lambda$. For example, suppose $\hat{\lambda} = 0.58$ is the value of $\lambda$ which minimizes the sum of squares due to residuals. If $\lambda = 0.5$ lies in the confidence interval, then one may use the square-root transformation because it is easier to explain. Furthermore, if $\lambda = 1$ lies in the confidence interval, then it may be concluded that no transformation is necessary.
In applying the method of maximum likelihood to the regression model, we are essentially maximizing

$L(\lambda) = -\dfrac{n}{2} \ln SS_{res}(\lambda)$

or, equivalently, minimizing $SS_{res}(\lambda)$.

An approximate $100(1-\alpha)\%$ confidence interval for $\lambda$ consists of those values of $\lambda$ that satisfy

$L(\hat{\lambda}) - L(\lambda) \leq \dfrac{\chi^2_{\alpha}(1)}{2}$

where $\chi^2_{\alpha}(1)$ is the upper $\alpha$ percentage point of the chi-square distribution with one degree of freedom and $\hat{\lambda}$ is the value of $\lambda$ which minimizes the sum of squares due to residuals. To see how this yields a cut-off on $SS_{res}(\lambda)$:
$L(\hat{\lambda}) - \dfrac{\chi^2_{\alpha}(1)}{2} = -\dfrac{n}{2} \ln SS_{res}(\hat{\lambda}) - \dfrac{\chi^2_{\alpha}(1)}{2}$

$= -\dfrac{n}{2} \left[\ln SS_{res}(\hat{\lambda}) + \dfrac{\chi^2_{\alpha}(1)}{n}\right]$

$= -\dfrac{n}{2} \left[\ln SS_{res}(\hat{\lambda}) + \ln \exp\left(\dfrac{\chi^2_{\alpha}(1)}{n}\right)\right]$

$= -\dfrac{n}{2} \ln \left[SS_{res}(\hat{\lambda}) \exp\left(\dfrac{\chi^2_{\alpha}(1)}{n}\right)\right]$

$= -\dfrac{n}{2} \ln SS^{*}$

where $SS^{*} = SS_{res}(\hat{\lambda}) \exp\left(\dfrac{\chi^2_{\alpha}(1)}{n}\right)$. The confidence interval therefore consists of those $\lambda$ for which $SS_{res}(\lambda) \leq SS^{*}$.

Using the expansion of the exponential function,

$\exp(t) = 1 + t + \dfrac{t^2}{2!} + \cdots \approx 1 + t,$

the cut-off may be approximated as

$SS^{*} \approx SS_{res}(\hat{\lambda}) \left(1 + \dfrac{\chi^2_{\alpha}(1)}{n}\right) \approx SS_{res}(\hat{\lambda}) \left(1 + \dfrac{t^2_{\alpha/2}(\nu)}{\nu}\right)$

where $\nu$ is the degrees of freedom associated with the sum of squares due to residuals. These expressions are based on the fact that $\chi^2_{\alpha}(1) = Z^2_{\alpha/2} \approx t^2_{\alpha/2}(\nu)$ unless $\nu$ is small.
It is debatable whether to use $\nu$ or $n$, but in practice the difference between the resulting confidence intervals is very small.
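The cut-off rule $SS_{res}(\lambda) \leq SS^{*} = SS_{res}(\hat{\lambda})\exp(\chi^2_{\alpha}(1)/n)$ can be sketched in Python on simulated data (the value 3.841 is the upper 5% point of $\chi^2(1)$; the grid and data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data with a true log (lam = 0) relationship.
n = 300
x = rng.uniform(0.0, 4.0, n)
y = np.exp(1.0 + 0.8 * x + rng.normal(0.0, 0.3, n))
X = np.column_stack([np.ones(n), x])
gm = np.exp(np.log(y).mean())

def ss_res(lam):
    v = gm * np.log(y) if lam == 0 else (y**lam - 1.0) / (lam * gm**(lam - 1.0))
    beta = np.linalg.solve(X.T @ X, X.T @ v)
    r = v - X @ beta
    return r @ r

grid = np.round(np.arange(-1.0, 1.01, 0.05), 2)
lam_hat = min(grid, key=ss_res)

# 95% cut-off: SS* = SS_res(lam_hat) * exp(chi2/n) with chi2_{0.05}(1) = 3.841.
ss_star = ss_res(lam_hat) * np.exp(3.841 / n)
ci = [lam for lam in grid if ss_res(lam) <= ss_star]
print(lam_hat, min(ci), max(ci))
```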
The Box-Cox transformation was originally introduced to reduce non-normality in the data; it also helps in reducing nonlinearity. The approach is to find a transformation that reduces the residuals associated with outliers and also mitigates the problem of non-constant error variance, provided there was no acute nonlinearity to begin with.
Transformation on explanatory variables: Box and Tidwell procedure
Suppose the relationship between $y$ and one or more of the explanatory variables is nonlinear, while the other usual assumptions (a normally and independently distributed study variable with constant variance) are at least approximately satisfied.
We want to select an appropriate transformation on the explanatory variable so that the relationship between
y and transformed explanatory variable is as simple as possible.
The Box and Tidwell procedure is a general analytical procedure for determining the form of transformation on $x$.

Suppose the study variable $y$ is related to powers of the explanatory variables. The Box-Tidwell procedure chooses the transformed explanatory variables as

$z_{ij} = \begin{cases} x_{ij}^{\alpha_j} & \text{when } \alpha_j \neq 0 \\ \ln x_{ij} & \text{when } \alpha_j = 0 \end{cases} \qquad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, k.$
We need to estimate the $\alpha_j$'s. Since the study variable itself is not being transformed, we need not worry about a change in the form of the error distribution. We consider this for the simple linear regression model instead of a nonlinear regression model.
Assume $y$ is related to $x$ as

$E(y) = f(\xi, \beta_0, \beta_1) = \beta_0 + \beta_1 \xi$

where

$\xi = \begin{cases} x^{\alpha} & \text{if } \alpha \neq 0 \\ \ln x & \text{if } \alpha = 0. \end{cases}$

Usually, the first guess is $\alpha_0 = 1$, so that $\xi = x$ and no transformation is applied in the first iteration.
Expanding $f$ about the initial guess $\alpha_0$ in a Taylor series and ignoring terms of order higher than one gives

$E(y) \approx f(\alpha_0, \beta_0, \beta_1) + (\alpha - \alpha_0)\left[\dfrac{df(\alpha, \beta_0, \beta_1)}{d\alpha}\right]_{\alpha = \alpha_0}$

$= \beta_0 + \beta_1 x + (\alpha - 1)\left[\dfrac{df(\alpha, \beta_0, \beta_1)}{d\alpha}\right]_{\alpha = \alpha_0 = 1}.$

Suppose the term $\left[\dfrac{df(\alpha, \beta_0, \beta_1)}{d\alpha}\right]_{\alpha = \alpha_0 = 1}$ were known; then it could be treated just like an additional explanatory variable. Since the form of the transformation is known, i.e., $\xi = x^{\alpha}$, we have $\dfrac{d\xi}{d\alpha} = x^{\alpha} \ln x$. Furthermore,

$\left[\dfrac{df(\alpha, \beta_0, \beta_1)}{d\alpha}\right]_{\alpha = 1} = \dfrac{d(\beta_0 + \beta_1 \xi)}{d\xi}\left[\dfrac{d\xi}{d\alpha}\right]_{\alpha = 1} = \beta_1\, x \ln x.$
The procedure therefore begins by fitting

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

by the least-squares method. Then, defining

$w = x \ln x,$

the next step is estimating the parameter $\gamma$ in

$E(y) = \beta_0^* + \beta_1^* x + (\alpha - 1)\beta_1 w = \beta_0^* + \beta_1^* x + \gamma w$

by least squares, where $\gamma = (\alpha - 1)\beta_1$.
This gives the following:

$\hat{y} = \hat{\beta}_0^* + \hat{\beta}_1^* x + \hat{\gamma} w$

with $\hat{\gamma} = (\hat{\alpha} - 1)\hat{\beta}_1$, so that

$\alpha_1 = \dfrac{\hat{\gamma}}{\hat{\beta}_1} + 1$

is the revised estimate of $\alpha$. Note that $\hat{\beta}_1$ is obtained from $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and $\hat{\gamma}$ is obtained from $\hat{y} = \hat{\beta}_0^* + \hat{\beta}_1^* x + \hat{\gamma} w$.
This procedure may be repeated using a new regressor $x^* = x^{\alpha_1}$ in the calculation.
Usually, the first-stage result $\alpha_1$ is a satisfactory estimate of $\alpha$. Round-off error is a potential problem: if enough decimal places are not carried, the successive values of $\alpha$ may oscillate badly. If the standard deviation of the error ($\sigma$) is large, or if the range of the explanatory variable is very small relative to its mean, the procedure may face convergence problems; such a situation implies that the data do not support the need for any transformation.
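One iteration of the procedure can be sketched in Python (simulated data with a true power $\alpha = 0.5$; all parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated data with E(y) = b0 + b1*x**alpha and a true alpha of 0.5.
n = 400
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * np.sqrt(x) + rng.normal(0.0, 0.1, n)

def lstsq(Z, t):
    return np.linalg.lstsq(Z, t, rcond=None)[0]

# Step 1: fit y = b0 + b1*x, i.e. the first guess alpha_0 = 1.
b0_hat, b1_hat = lstsq(np.column_stack([np.ones(n), x]), y)

# Step 2: add w = x*ln(x) and fit y = b0* + b1*x + gamma*w.
w = x * np.log(x)
_, _, gamma_hat = lstsq(np.column_stack([np.ones(n), x, w]), y)

# Step 3: revised estimate alpha_1 = gamma_hat / b1_hat + 1.
alpha1 = gamma_hat / b1_hat + 1.0
print(alpha1)  # moves from the initial guess 1 toward the true 0.5
```

Note that $x$ and $x \ln x$ are highly correlated, which is one reason round-off can make further iterations oscillate.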