Robust Weighted Least Squares Estimation
Received: 2 February 2014; Received in revised form: 3 August 2014; Accepted: 15 October 2014

Abstract

In a linear regression model, the ordinary least squares (OLS) method is considered the best method to estimate the regression parameters if the assumptions are met. However, if the data do not satisfy the underlying assumptions, the results will be misleading. Violation of the constant-variance assumption in least squares regression is caused by the presence of outliers and heteroscedasticity in the data. The assumption of constant variance (homoscedasticity) is very important in linear regression, as it is under this assumption that the least squares estimators enjoy the property of minimum variance. A robust regression method is therefore required to handle the problem of outliers in the data. This research uses the weighted least squares (WLS) technique to estimate the regression coefficients when the assumption of constant error variance is violated. WLS estimation is equivalent to carrying out OLS on transformed variables, but WLS can easily be affected by outliers. To remedy this, we suggest a robust technique for estimating the regression parameters in the presence of heteroscedasticity and outliers. We apply robust M-estimation using iteratively reweighted least squares (IRWLS) with the Huber and Tukey bisquare functions, together with the resistant least trimmed squares regression estimator, to estimate the model parameters for the state-wide crime data of the United States in 1993. The results indicate that the estimators obtained from the M-estimation techniques and the least trimmed squares method are more effective than those obtained from OLS.
Keywords: Robust estimation; robust weighted least squares; robust least trimmed squares;
heteroscedasticity; outliers
© 2014 Penerbit UTM Press. All rights reserved.
1.0 INTRODUCTION

In classical linear regression analysis the ordinary least squares (OLS) method is generally used to estimate the parameters of the regression model because of its optimal properties and straightforward computation. Several assumptions must hold for the OLS estimators to be attractive and valid. One assumption of the OLS regression model, and a rather severe one, is homoscedasticity. Researchers frequently encounter situations where the variance of the response variable is related to the values of one or more of the independent variables, leading to heteroscedasticity [1], [2].

Consider the linear model

y = X\beta + \varepsilon \qquad (1)

where y is the n×1 vector of responses, X the model matrix, \beta the vector of regression coefficients, and \varepsilon the n×1 vector of errors. The errors are assumed to be normally distributed, with mean 0 and constant variance \sigma^2. The OLS estimator of the regression coefficients is

\hat{\beta} = (X'X)^{-1} X'y \qquad (2)

with the variance matrix given by

\mathrm{var}(\hat{\beta}) = (X'X)^{-1} X'\Omega X (X'X)^{-1} \qquad (3)

where E(\varepsilon\varepsilon') = \Omega is a positive definite matrix.
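To make equations (2) and (3) concrete, the following is a minimal NumPy sketch, not part of the original study: the synthetic data, dimensions, and coefficient values are illustrative assumptions standing in for the 1993 crime data, and \Omega in (3) is estimated by the diagonal of squared residuals (the usual sandwich estimator), which is one choice among several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (an assumption; the paper's 1993
# state-wide crime data are not reproduced here): 51 cases,
# an intercept, and 6 predictors, matching beta_0..beta_6.
n, k = 51, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5, 0.3, 0.0, 1.5, -1.0])
# Heteroscedastic errors: the variance grows with the first predictor.
sigma = 1.0 + np.abs(X[:, 1])
y = X @ beta + rng.normal(size=n) * sigma

# Equation (2): beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Equation (3) with Omega estimated by diag(e_i^2) gives a
# heteroscedasticity-consistent (sandwich) covariance estimate.
e = y - X @ beta_hat
XtX_inv = np.linalg.inv(X.T @ X)
cov_beta = XtX_inv @ (X.T * e**2) @ X @ XtX_inv

print(beta_hat)
print(np.sqrt(np.diag(cov_beta)))  # robust standard errors
```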
4.0 COMPARISON OF ROBUST REGRESSION METHODS

In general, the three broad categories of robust regression models that play the most important role are M-estimators (extending M-estimates of location by considering the size of the residuals), L-estimators (based on linear combinations of order statistics), and R-estimators (based on the ranks of the residuals). Each category contains a class of models derived under similar conditions and with comparable theoretical statistical properties. The least trimmed squares (LTS) estimate, M-estimate, Yohai MM-estimate, least median squares (LMS) estimate, and S-estimate are among the popular techniques used to estimate the parameters of the regression line. In this study the M-estimator and the least trimmed squares estimator are used; they are briefly described in the following sections.

Suppose we define a sample of n data points as

Z = \{(x_{11}, \ldots, x_{1p}, y_1), \ldots, (x_{n1}, \ldots, x_{np}, y_n)\} \qquad (10)

and let T be a regression estimator, so that applying T to the sample Z produces a vector of regression coefficients; for example, for least squares estimation, T(Z) = \hat{\theta}.

Now consider samples Z' obtained by replacing any m of the original data points by arbitrary values, and denote the maximum bias caused by such contamination as

\mathrm{bias}(m; T, Z) = \sup_{Z'} \lVert T(Z') - T(Z) \rVert \qquad (11)

where the supremum is over all possible contaminated samples Z'. If bias(m; T, Z) is infinite, m outliers can have an arbitrarily large effect on T, which may be expressed by saying that the estimator breaks down. Therefore, the (finite-sample) breakdown point of the estimator T at the sample Z is defined as

\varepsilon^{*}_{n}(T, Z) = \min\left\{ \frac{m}{n} : \mathrm{bias}(m; T, Z) \text{ is infinite} \right\} \qquad (12)

In other words, it is the smallest fraction of contamination that can cause the estimator T to take on values arbitrarily far from T(Z). For least squares, one outlier is sufficient to carry T over all bounds, so its breakdown point is 1/n.

For the ith of n observations, the fitted model is

y_i = b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik} + e_i = x_i'b + e_i \qquad (15)

The general M-estimator minimizes an objective function of the residuals rather than the sum of squared errors; the aim is to minimize a function \rho of the errors. The M-estimation objective function is

\sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i'b) \qquad (16)

The contribution of each residual to the objective function is given by the function \rho. A suitable \rho should have the following properties:

\rho(e) \ge 0, \quad \rho(0) = 0, \quad \rho(e) = \rho(-e), \quad \text{and} \quad \rho(e_i) \ge \rho(e_{i'}) \text{ for } |e_i| \ge |e_{i'}|.

For least squares estimation, \rho(e_i) = e_i^2.

The normal equations for this minimization problem are obtained by setting the partial derivatives with respect to \beta to 0, producing a system of k+1 estimating equations for the coefficients,

\sum_{i=1}^{n} \psi(y_i - x_i'b)\, x_i' = 0

where \psi is the derivative of \rho. The choice of the \psi function depends on how much weight one wishes to give to outliers. A monotone \psi function does not give larger outliers increasing influence the way least squares does (e.g. a 10\sigma outlier receives the same \psi-value as a 3\sigma outlier), whereas a descending \psi function reduces the influence of an outlier beyond a specified distance, down to 0 as the outlying distance becomes considerable. Newton-Raphson and iteratively reweighted least squares (IRLS) are the two methods used to solve the nonlinear M-estimation normal equations; in this research, IRLS robust regression is used.
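The distinction between a monotone and a descending \psi is easiest to see through the implied case weights w(u) = \psi(u)/u. Below is a minimal Python sketch of the Huber and Tukey bisquare weight functions with the tuning constants 1.345 and 4.685 used later in the text; the function names are illustrative, not taken from any particular library.

```python
import numpy as np

def huber_weight(u, c=1.345):
    """Huber case weights w(u) = psi(u)/u: 1 inside the tuning
    constant c, c/|u| outside. psi is monotone, so large outliers
    are down-weighted but never given zero influence."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, 1.0, c / np.abs(u))

def bisquare_weight(u, c=4.685):
    """Tukey bisquare case weights: [1 - (u/c)^2]^2 inside c and
    exactly 0 beyond it. psi is redescending, so gross outliers
    are ignored entirely."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)

u = np.array([0.5, 1.0, 3.0, 10.0])
print(huber_weight(u))     # a 10-sigma point still gets weight 0.1345
print(bisquare_weight(u))  # the 10-sigma point gets weight 0.0
```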
IRLS expresses the weighted normal equations as

\hat{\beta} = (X'WX)^{-1} X'Wy \qquad (18)

However, the weights depend upon the residuals, the residuals depend upon the estimated coefficients, and the estimated coefficients depend upon the weights. An iterative solution, called iteratively reweighted least squares (IRLS), is therefore required. The following steps describe the IRLS procedure.

Step 1: Select the weight function for weighting all the cases; in this study we use the Huber and Tukey bisquare functions. The Huber weight function is defined as

w_i = \begin{cases} 1 & |u| \le 1.345 \\ 1.345/|u| & |u| > 1.345 \end{cases}

The constant 1.345 is called a tuning constant and u is the standardized residual, with the residual scale estimated as

s^{(0)} = \frac{\mathrm{median}\,\left|e_i^{(0)} - \mathrm{median}\,(e_i^{(0)})\right|}{0.6745}

The corresponding bisquare weight function is defined as

w_i = \begin{cases} \left[ 1 - \left( u_i^{(0)} / 4.685 \right)^2 \right]^2 & |u_i^{(0)}| \le 4.685 \\ 0 & |u_i^{(0)}| > 4.685 \end{cases}, \qquad i = 1, 2, \ldots, 51

Step 2: Obtain the starting weights for all the cases and compute

b^{(t)} = \left( X'W^{(t-1)}X \right)^{-1} X'W^{(t-1)}y \qquad (19)

where X is the model matrix, with x_i' as its ith row, and W^{(t-1)} = \mathrm{diag}\{w_i^{(t-1)}\} is the current weight matrix.

Step 3: Use the starting weights in weighted least squares to obtain the residuals e_i^{(t-1)} from the fitted regression function.

Step 4: Use the residuals obtained in Step 3 to compute the revised weights

w_i^{(t-1)} = w\left[ e_i^{(t-1)} \right] \qquad (20)

Step 5: Continue the iteration until convergence is obtained. The asymptotic covariance matrix of \hat{\beta} is

v(b^{(t)}) = \frac{E(\psi^2)}{\left[ E(\psi') \right]^2} (X'X)^{-1} \qquad (21)

Step 6: Finally, carry out a WLS regression using the final weights w_i. The regression coefficients obtained from this WLS regression are the required estimates of the heteroscedastic model.

Another form of robust regression estimation is the least trimmed squares (LTS) regression method [9], developed as a high-efficiency alternative to least median squares (LMS) regression. The LTS estimate is obtained by minimizing

\hat{\theta}_{LTS} = \arg\min_{\theta} Q_{LTS}(\theta), \qquad \text{where} \quad Q_{LTS} = \sum_{i=1}^{h} e_{(i)}^2

and e_{(1)}^2 \le e_{(2)}^2 \le \cdots \le e_{(n)}^2 are the squared residuals ordered from smallest to largest. The value of h is obtained as

h = \left\lfloor \frac{n}{2} \right\rfloor + \left\lfloor \frac{p+1}{2} \right\rfloor

with n and p being the sample size and the number of parameters in the model, respectively. This approach is similar to least squares except that the largest n-h squared residuals are removed (trimmed) from the summation, which allows the outlying points to be excluded completely so that the fit avoids the outliers. LTS can be very efficient when exactly the outlying data points are trimmed; but if more points are trimmed than there are outliers, some good observations will be eliminated from the computation. In terms of breakdown point, LTS is regarded as a high-breakdown technique, attaining a breakdown point of 50% for this choice of h. The main disadvantage of LTS is the large number of operations required to sort the squared residuals in the objective function [15]. Another challenge is deciding on the best approach for determining the initial estimate.

The weighted robust least trimmed squares method consists of the following procedure.

Step 1: Regress the response variable y_i on the explanatory variables x_{ij} by least trimmed squares and compute the regression coefficients from this fit.

Step 2: The inverses of the fitted values, denoted by w_{1i}, are the initial weights.

Step 3: Obtain the final weights from the Huber weight function, which is given as

w_i = \begin{cases} 1 & |u| \le 1.345 \\ 1.345/|u| & |u| > 1.345 \end{cases}

where the constant 1.345 is the tuning constant and u is the standardized residual. The scaled residual estimate is obtained as

s^{(0)} = \frac{\mathrm{median}\,\left|e_i^{(0)} - \mathrm{median}\,(e_i^{(0)})\right|}{0.6745}

and the standardized residuals are then defined as

u_i^{(0)} = \frac{|e_i^{(0)}|}{s^{(0)}}, \qquad i = 1, 2, \ldots, n

Steps 2 and 3 are repeated until the estimated coefficients converge.
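As an illustration of the IRLS scheme in Steps 1–5, the following is a minimal Python sketch, not the authors' code: it starts from the OLS fit, rescales the residuals by the MAD-based scale estimate s, recomputes the case weights, and solves the weighted normal equations (19) until the coefficients converge. The convergence tolerance and iteration cap are arbitrary assumptions, and weight_fn is meant to be huber_weight or bisquare_weight from the earlier sketch.

```python
import numpy as np

def mad_scale(e):
    # Scale estimate s = median|e_i - median(e_i)| / 0.6745, as in the text.
    return np.median(np.abs(e - np.median(e))) / 0.6745

def irls_m_estimate(X, y, weight_fn, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares for an M-estimate.
    weight_fn maps standardized residuals to case weights, e.g.
    huber_weight or bisquare_weight from the earlier sketch."""
    # Start from the OLS fit (unit weights).
    b = np.linalg.solve(X.T @ X, X.T @ y)
    for _ in range(max_iter):
        e = y - X @ b                     # Step 3: current residuals
        w = weight_fn(e / mad_scale(e))   # Step 4: revised weights
        XtW = X.T * w                     # X'W with W = diag(w)
        b_new = np.linalg.solve(XtW @ X, XtW @ y)   # equation (19)
        if np.max(np.abs(b_new - b)) < tol:         # Step 5: convergence
            return b_new
        b = b_new
    return b
```

In practice, packaged implementations such as the RLM class in the Python statsmodels library carry out the same scheme with additional numerical safeguards.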
Table 2 Summary of robust techniques' performance against OLS: estimated coefficients β̂0–β̂6 (original and modified data)

Method           Data type   β̂0        β̂1      β̂2      β̂3     β̂4     β̂5      β̂6
OLS              Original    -857.62    23.42    6.24   -1.27   5.36   15.09    28.632
                 Modified     519.88   -17.82  -16.07   35.90  24.82  -17.40   253.912
RWLS Huber       Original    -662.47    24.16    5.62   -1.14   3.34   10.48    32.170
                 Modified    -139.38    15.09    3.82   -8.58   3.93   14.63    51.781
Tukey Bisquare   Original    -509.74    24.25    5.28   -1.32   2.01    8.11    33.6722
                 Modified    -376.50    23.03    5.28   -1.84   0.42   12.58    31.9234
RWLTS            Original    -662.47    24.16    5.62   -1.14   3.34   10.48    32.170
                 Modified    -139.38    15.09    3.82   -8.58   3.93   14.63    51.781
Table 3 Summary of robust techniques' performance against OLS (original and modified data)
Table 4 Summary of robust techniques' performance against OLS: t-values for β̂0–β̂6 (original and modified data)

Method           Data type   t(β̂0)   t(β̂1)   t(β̂2)   t(β̂3)   t(β̂4)   t(β̂5)   t(β̂6)
OLS              Original    -1.42    5.94     5.28    -0.50    0.77    1.46    1.94
                 Modified     0.14   -0.71    -2.15    -2.22    0.56   -0.27    2.72
RWLS Huber       Original    -1.10    6.12     4.66    -0.45    0.48    1.01    2.17
                 Modified    -0.23    3.34     2.91    -2.45    0.56    1.36    2.79
Tukey Bisquare   Original    -0.81    5.90     4.29    -0.50    0.28    0.75    2.19
                 Modified    -0.63    5.88     4.50    -0.72    0.06    1.22    2.18
RWLTS            Original    -1.10    6.12     4.66    -0.45    0.48    1.01    2.17
                 Modified    -0.23    3.34     2.91    -2.45    0.56    1.36    2.79
Figure 6 Plot of OLS residuals versus fitted values (modified data)
Figure 9 Plot of RWLTS residuals versus fitted values (modified data)