
Jurnal Teknologi | Full Paper

Robust Weighted Least Squares Estimation of Regression Parameter in the Presence of Outliers and Heteroscedastic Errors

Bello Abdulkadir Rasheed a,*, Robiah Adnan a, Seyed Ehsan Saffari b, Kafi Dano Pati a

a Department of Mathematics, Faculty of Science, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
b Centre of Education, Sabzevar University of Medical Sciences, Sabzevar, Iran

*Corresponding author: [email protected]

Article history
Received: 2 February 2014
Received in revised form: 3 August 2014
Accepted: 15 October 2014

Abstract

In a linear regression model, the ordinary least squares (OLS) method is considered the best method to estimate the regression parameters if the assumptions are met. However, if the data do not satisfy the underlying assumptions, the results will be misleading. The violation of the assumption of constant variance in least squares regression is caused by the presence of outliers and heteroscedasticity in the data. This assumption of constant variance (homoscedasticity) is very important in linear regression, since under it the least squares estimators enjoy the property of minimum variance. A robust regression method is therefore required to handle the problem of outliers in the data. This research uses weighted least squares (WLS) techniques to estimate the regression coefficients when the assumption on the error variance is violated. WLS estimation is the same as carrying out OLS on transformed variables; however, WLS can easily be affected by outliers. To remedy this, we suggest a robust technique for the estimation of regression parameters in the presence of heteroscedasticity and outliers. We apply robust M-estimation using iteratively reweighted least squares (IRWLS) with the Huber and Tukey bisquare functions, together with the resistant regression estimator of least trimmed squares, to estimate the model parameters of the state-wide crime data of the United States in 1993. The outcomes of the study indicate that the estimators obtained from the M-estimation techniques and the least trimmed squares method are more effective than those obtained from OLS.

Keywords: Robust estimation; robust weighted least squares; robust least trimmed squares; heteroscedasticity; outliers

© 2014 Penerbit UTM Press. All rights reserved.

71:1 (2014) 11–18 | www.jurnalteknologi.utm.my | eISSN 2180–3722 |



1.0 INTRODUCTION

In classical linear regression analysis, the ordinary least squares (OLS) method is generally used to estimate the parameters of the regression model because of its optimal properties and straightforward computation. Several assumptions must hold for the OLS estimators to be attractive and valid. One of the assumptions of the OLS regression model, the assumption of homoscedasticity, is rather severe. Researchers frequently encounter difficult situations where the variance of the response variable is related to the values of a number of independent variables, leading to heteroscedasticity [1], [2].
   In this type of situation, the variance of the model as a function of the explanatory variables can produce weights for the weighted least squares estimator [2], [3], [4]. Weighted least squares, a special case of the generalized least squares estimator, is optimal when the structure of the heteroscedastic error variance is known. Unfortunately, that structure is usually not known in advance; in that situation, researchers may use estimated generalized least squares [3], [4].
   In the presence of heteroscedasticity, the OLS estimators remain unbiased. However, probably the most harmful consequence of heteroscedasticity in a regression model is that the OLS estimator of the parameter covariance matrix (OLSCM), whose diagonal elements are used to estimate the standard errors of the regression coefficients, becomes biased and unreliable [4], [5]. As a result, the t-tests for individual coefficients are generally too liberal or too conservative, depending on the type of heteroscedasticity. White [4] and Rana et al. [6] suggested a heteroscedasticity consistent covariance matrix (HCCM) to resolve the inconsistency problem of the estimator. But there is evidence that even a couple of outliers can make all of these estimates and methods meaningless [5], [6], [7]. In the presence of outliers there exist some robust approaches for the detection of heteroscedasticity [6], [7].
   However, not enough robust techniques are available in the literature for the estimation of parameters in the presence of both outliers and heteroscedastic error variance. Although heteroscedasticity does not cause any bias problem for the OLS estimators, OLS may easily be affected by the presence of outliers. Weighted least squares suffers exactly the same problem in the presence of outliers, which can produce a serious interpretive issue in the estimation [6], [7], [8]. In most cases, no estimation technique works effectively unless we eliminate the influence of outliers in a heteroscedastic regression model.
   This problem inspires us to build a new and better estimation technique that provides resistant results when heteroscedasticity and outliers occur at the same time. In this study the OLS estimation method is compared with the robust regression M-estimates based on the Huber weight function and the Tukey bisquare function, and with the resistant regression estimator of least trimmed squares. We expect the recommended methods to be less sensitive to outliers and simultaneously able to remedy the problem of heteroscedasticity.

2.0 METHODOLOGY OF HETEROSCEDASTIC REGRESSION MODEL

Consider the following classical linear regression model

   y = Xβ + ε    (1)

where y is the usual n×1 vector of observed dependent values, X is the n×p matrix of predictor variables including the intercept, β is a p×1 vector of regression parameters, and ε is the n×1 vector of errors. The errors are assumed to be normally distributed with mean 0 and constant variance σ².
   The OLS estimator of the regression coefficients is

   β̂ = (X'X)^{-1} X'y    (2)

with variance matrix given by

   var(β̂) = (X'X)^{-1} X'ΩX (X'X)^{-1}    (3)

where E(εε') = Ω is a positive definite matrix. Equation (3) simplifies to the following:

   var(β̂) = σ² (X'X)^{-1}    (4)

If the errors are homoscedastic, then var(β̂) = σ²(X'X)^{-1}. But if the errors are heteroscedastic, the matrix Ω becomes Ω = σ²V, and equation (3) becomes

   var(β̂) = σ² (X'X)^{-1} X'VX (X'X)^{-1}    (5)

The problem mentioned above can be overcome by transforming the model to a new set of observations that fulfil the underlying standard assumptions of least squares; OLS is then applied to the transformed data. Since the covariance matrix of the errors is σ²V, the matrix V must be non-singular and positive definite; then

   β_GLS = (X'V^{-1}X)^{-1} X'V^{-1}y    (6)

is the generalized least squares (GLS) estimate of β. When the errors ε have unequal variances and are uncorrelated, the covariance matrix of ε is written as

   σ²V = σ² diag[1/w_i],  i = 1, 2, ..., n

Consequently, GLS becomes a solution to the heteroscedastic model. Defining W = V^{-1} as the diagonal matrix with diagonal elements (weights) w_1, w_2, ..., w_n, the weighted least squares estimator follows from equation (6) as

   β̂_WLS = (X'WX)^{-1} X'Wy    (7)

   V(β̂_WLS) = σ²_WLS (X'WX)^{-1}    (8)

where

   σ²_WLS = Σ w_i ε̂_i² / (n − p)    (9)

If the heteroscedastic error structure of the regression model is known, the computation of the weight matrix W is simple, and consequently WLS regression serves as a good solution to the heteroscedastic model. Unfortunately, in practice the heteroscedastic error structure of the regression model is usually unknown.
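Since WLS with known weights is just OLS on transformed variables, equations (7) to (9) translate directly into a few lines of linear algebra. The following NumPy sketch is illustrative only (it is not from the paper, and the function name and interface are our own); it assumes the weights w_i are known and strictly positive.

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares, equations (7)-(9).

    X : (n, p) design matrix, first column of ones for the intercept
    y : (n,) response vector
    w : (n,) known positive weights, ideally proportional to 1/Var(eps_i)
    """
    W = np.diag(w)
    XtW = X.T @ W
    beta = np.linalg.solve(XtW @ X, XtW @ y)      # equation (7)
    n, p = X.shape
    resid = y - X @ beta
    sigma2 = np.sum(w * resid**2) / (n - p)       # equation (9)
    cov_beta = sigma2 * np.linalg.inv(XtW @ X)    # equation (8)
    return beta, cov_beta
```

Passing w = np.ones(n) reduces this to ordinary least squares, which is one way to sanity-check the implementation.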

3.0 ESTIMATION OF ROBUST WEIGHTED LEAST SQUARES REGRESSION (RWLSR)

Robust regression analysis provides an alternative to least squares regression when the fundamental assumptions are not fulfilled by the nature of the data [8]. The qualities of efficiency, breakdown and leverage points are widely used to characterize the performance of robust techniques in the theoretical sense. One justification for robust estimators is a high finite-sample breakdown point ε*_n, as defined by [9], [10], [11]. The breakdown point may be defined as the smallest percentage of contaminated data that can cause the estimator to take on arbitrarily large aberrant values [10]. Hence, the breakdown point is simply the first point at which an estimator becomes swamped by contaminated data. Some regression estimators have the smallest possible breakdown point of just 0/n or 1/n; basically, a single outlier can render the OLS regression equation useless. Other estimators attain the highest possible breakdown point of n/2, or 50%. If a robust estimation method has a 50% breakdown point, then up to 50% of the data could contain outliers while the coefficients remain useful [12], [13], [14], [15], [16].

4.0 COMPARISON OF ROBUST REGRESSION METHODS

In general, the three broad categories of robust regression models that play the most important role are M-estimators (extending from M-estimates of location by considering the size of the residuals), L-estimators (based on linear combinations of order statistics) and R-estimators (based on the ranks of the residuals). Each category of estimators contains a class of models derived under similar conditions and with comparable theoretical statistical properties. The Least Trimmed Squares (LTS) estimate, M-estimate, Yohai MM-estimate, Least Median of Squares (LMS) and S-estimate are among the popular techniques used in estimating the parameters of the regression line. In this study the M-estimator and least trimmed squares are used; they are briefly described in the next sections.
   Suppose we define a sample of n data points as

   Z = {(x_11, ..., x_1p, y_1), ..., (x_n1, ..., x_np, y_n)}    (10)

and let T be a regression estimator, so that applying T to the sample Z produces a vector of regression coefficients; for least squares estimation, T(Z) = θ̂.
   Now consider samples Z' obtained by replacing any m of the original data points by arbitrary values, and denote by bias(m; T, Z) the maximum bias that can be caused by such contamination:

   bias(m; T, Z) = sup_{Z'} ‖ T(Z') − T(Z) ‖    (11)

where the supremum is over all possible Z'. If bias(m; T, Z) is infinite, then m outliers can have an arbitrarily large effect on T, which may be expressed by saying that the estimator breaks down. Therefore, the (finite-sample) breakdown point of the estimator T for the sample Z is defined as

   ε*_n(T, Z) = min{ m/n : bias(m; T, Z) is infinite }    (12)

In other words, it is the smallest fraction of contamination that can cause the estimator T to take on values arbitrarily far from T(Z). For least squares, we have seen that one outlier is sufficient to carry T over all bounds. Therefore, its breakdown point is

   ε*_n(T, Z) = 1/n    (13)

which tends to zero as the sample size n increases, so it can be said that LS has a breakdown point of 0%. This again reflects the extreme sensitivity of the LS technique to outliers [12].
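The definitions in equations (11) to (13) are easy to illustrate numerically: replacing a single observation by an arbitrarily remote value is enough to carry the least squares fit over all bounds. The snippet below is an illustration on simulated data, not an analysis from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])

beta_clean = np.linalg.lstsq(X, y, rcond=None)[0]   # close to (2, 3)

# Contaminate m = 1 point: bias(1; T, Z) is unbounded for LS,
# so its breakdown point is 1/n, tending to 0%.
y_bad = y.copy()
y_bad[0] = 1e6
beta_bad = np.linalg.lstsq(X, y_bad, rcond=None)[0]

print(beta_clean, beta_bad)   # the single outlier wrecks the fit
```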
5.0 M-ESTIMATION

Linear least squares estimates behave badly when the error distribution is not normal, especially when the errors are heavy-tailed. One solution is to eliminate influential observations from the least squares fit. The group of M-estimator models consists of all models derived from maximum likelihood models. The most frequently used general method of robust regression is M-estimation, introduced by Huber (1964) [15], which is nearly as efficient as OLS [12]. Consider the following linear model:

   y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_k x_ik + ε_i = x_i'β + ε_i    (14)

For the i-th of n observations, the fitted model is

   y_i = b_0 + b_1 x_i1 + b_2 x_i2 + ... + b_k x_ik + e_i = x_i'b + e_i    (15)

The general M-estimator minimizes an objective function rather than the sum of squared errors, since the aim of an M-estimate is to minimize a function ρ of the errors. The M-estimate objective function is

   Σ_{i=1}^{n} ρ(e_i) = Σ_{i=1}^{n} ρ(y_i − x_i'b)    (16)

The contribution of each residual to the objective function is given by the function ρ. A suitable ρ should have the following characteristics:

   ρ(e) ≥ 0,
   ρ(0) = 0,
   ρ(e) = ρ(−e), and
   ρ(e_i) ≥ ρ(e_i') for |e_i| > |e_i'|.

For least squares, ρ(e_i) = e_i².
   The system of normal equations for this minimization problem is found by setting the partial derivatives with respect to β to 0, which produces a system of k+1 estimating equations for the coefficients:

   Σ_{i=1}^{n} ψ(y_i − x_i'b) x_i' = 0

where ψ is the derivative of ρ. The choice of ψ function depends on how much weight one wishes to give to outliers. A monotone ψ function does not downweight outliers as strongly as one might want relative to least squares (e.g. a 10σ outlier would receive the same weight as a 3σ outlier). A descending ψ function keeps the weight given to an outlier up to a specified distance and then reduces the weight to 0 as the outlying distance becomes considerable.
   Newton-Raphson and iteratively reweighted least squares (IRLS) are the two methods used to solve the M-estimate nonlinear normal equations. In this research, iteratively reweighted least squares robust regression is used. IRLS expresses the normal equations as

   X'WX β̂ = X'Wy    (17)

where W is an n×n diagonal matrix of weights. The initial parameter estimates are typically obtained from OLS. IRLS updates the parameter estimates with

   β̂ = (X'WX)^{-1} X'Wy    (18)

However, the weights depend upon the residuals, the residuals depend upon the estimated coefficients, and the estimated coefficients depend upon the weights. An iterative solution, iteratively reweighted least squares (IRLS), is therefore required. The following steps describe the IRLS procedure:

Step 1: Select the weight function for weighting all the cases. In this study we make use of the Huber and Tukey weight functions. The Huber weight function is defined as

   w_i = 1 if |u_i| ≤ 1.345;  w_i = 1.345/|u_i| if |u_i| > 1.345

The constant 1.345 is called a tuning constant, and the standardized residual u_i^(0) = e_i^(0)/s^(0) uses the scale estimate

   s^(0) = median|e_i^(0) − median(e_i^(0))| / 0.6745

The corresponding bisquare weight function is defined as

   w_i = [1 − (u_i^(0)/4.685)²]² if |u_i^(0)| ≤ 4.685;  w_i = 0 if |u_i^(0)| > 4.685,  i = 1, 2, ..., n

Step 2: Obtain the starting weights for all the cases:

   b^t = (X'W^{t−1}X)^{-1} X'W^{t−1} y    (19)

where X is the model matrix, with x_i' as its i-th row, and W^{t−1} = diag{w_i^{t−1}} is the current weight matrix.

Step 3: Use the starting weights in weighted least squares to obtain the residuals e_i^{t−1} from the fitted regression function.

Step 4: Use the residuals from Step 3 to obtain the revised weights

   w_i^{t−1} = w(e_i^{t−1})    (20)

Step 5: Continue the iteration until convergence is obtained. The asymptotic covariance matrix of β̂ is

   v(b) = [E(ψ²) / (E(ψ'))²] (X'X)^{-1}    (21)

Step 6: Finally, carry out a WLS regression using the final weights w_i. The regression coefficients obtained from this WLS regression are the required estimates of the heteroscedastic model. Steps 2 and 3 are repeated until the estimated coefficients converge; the procedure continues until the convergence criterion is satisfied, and the estimate of the scaled residuals may be updated after each iteration.
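Steps 1 to 5 can be condensed into a short routine. The sketch below is a minimal illustration (our own code, not the paper's implementation) of IRLS with the Huber weight function and the MAD-based scale s; the Tukey bisquare variant only swaps the weight function, and the sketch assumes the scale estimate stays nonzero.

```python
import numpy as np

def huber_w(u, c=1.345):
    """Huber weights: 1 for |u| <= c, c/|u| otherwise (Step 1)."""
    au = np.abs(u)
    return np.where(au <= c, 1.0, c / au)

def bisquare_w(u, c=4.685):
    """Tukey bisquare weights: (1 - (u/c)^2)^2 for |u| <= c, else 0."""
    return np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)

def irls(X, y, weight_fn=huber_w, tol=1e-8, max_iter=100):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS starting values
    w = np.ones(len(y))
    for _ in range(max_iter):
        e = y - X @ beta                                   # Step 3: residuals
        s = np.median(np.abs(e - np.median(e))) / 0.6745   # robust scale estimate
        w = weight_fn(e / s)                               # Step 4: revised weights
        W = np.diag(w)
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # equation (18)
        if np.max(np.abs(beta_new - beta)) < tol:          # Step 5: convergence
            beta = beta_new
            break
        beta = beta_new
    return beta, w
```

Calling irls(X, y, weight_fn=bisquare_w) gives the bisquare fit; the final weights w then feed the WLS regression of Step 6.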
6.0 LEAST TRIMMED SQUARES (LTS) ESTIMATE

Another form of robust regression estimation is the least trimmed squares (LTS) regression method [9]. Rousseeuw developed LTS as a high-efficiency alternative to least median of squares regression (LMS); the estimator is obtained by minimizing

   β̂_LTS = arg min Q_LTS(β),  where  Q_LTS = Σ_{i=1}^{h} e_(i)²

and e_(1)² ≤ e_(2)² ≤ ... ≤ e_(n)² are the squared residuals ordered from smallest to largest. The value of h is obtained as h = ⌊n/2⌋ + ⌊(p+1)/2⌋, with n and p being the sample size and the number of parameters in the model, respectively. This approach is similar to least squares except that the largest (n − h) squared residuals are removed from the summation (the trimmed sum), which allows the outliers to be excluded completely so that the fit avoids them. LTS can be very efficient when exactly the outlying data points are trimmed; but if there is more trimming than there are outlying points, then some good observations will be eliminated from the computation. From the breakdown-point perspective, LTS is regarded as a high-breakdown technique, with a breakdown point of 50% when h is roughly n/2. The main disadvantage of LTS is the large number of operations required to sort the squared residuals in the objective function [15]. Another challenge is deciding the best approach for determining the initial estimate.
   The weighted robust least trimmed squares (RWLTS) method consists of the following steps:

Step 1: Regress the response variable y_i on the explanatory variables x_ij by least trimmed squares and compute the regression coefficients from this fit.

Step 2: The inverses of the fitted values, denoted w_1i, are the initial weights.

Step 3: Obtain the final weights from the Huber weight function, which is given as

   w_2i = 1 if |u_i| ≤ 1.345;  w_2i = 1.345/|u_i| if |u_i| > 1.345

The constant 1.345 is the tuning constant and u is the standardized residual. The scale of the residuals is estimated as

   s^(0) = median|e_i^(0) − median(e_i^(0))| / 0.6745

and the standardized residual estimate is then defined as

   u_i^(0) = |e_i^(0)| / s^(0),  i = 1, 2, ..., n

Step 4: Multiply the initial weight w_1i by the weight w_2i obtained from the Huber function to get the final weight w_i.

Step 5: Finally, perform the WLTS regression with the final weight w_i. The regression coefficients produced by this WLTS regression are the desired estimates of the heteroscedastic model.
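LTS has no closed-form solution, and production implementations use dedicated algorithms such as FAST-LTS. As a rough illustration of the objective Q_LTS (again our own sketch, not the paper's code), one can combine random elemental starts with concentration steps that refit OLS on the h smallest squared residuals.

```python
import numpy as np

def lts_fit(X, y, n_starts=100, n_csteps=20, seed=0):
    """Naive approximate LTS: random starts plus concentration steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = n // 2 + (p + 1) // 2                       # trimming constant h
    best_q, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)   # random elemental start
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):                    # concentration steps
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]                # h smallest squared residuals
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        q = np.sort((y - X @ beta) ** 2)[:h].sum()   # Q_LTS objective
        if q < best_q:
            best_q, best_beta = q, beta
    return best_beta, best_q
```

The fitted coefficients from such a routine would supply Step 1 of the RWLTS procedure above, with the initial weights w_1i of Step 2 taken from the fitted values.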

7.0 NUMERICAL EXAMPLE

In this section we consider a few examples to show the importance of the robust estimators in a situation where heteroscedasticity and outliers are present. A heteroscedastic data set taken from the state-wide crime data of the United States (1993) is used. The data contain fifty-one observations of the violent crime rate (per 100,000 people in the population) y with corresponding predictor variables x1, x2, x3, x4, x5 and x6. The analysis begins by considering one regressor variable x with its corresponding response variable y, to observe the effects of outliers and heteroscedasticity using diagnostic plots. We intentionally replace the values corresponding to the 1st and 25th observations of the original data with higher values, 7610 and 4340, where the original values are 761 and 434 respectively. The OLS, RWLTS and the M-estimates based on the Huber function and Tukey bisquare were then applied to the original and modified data.
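For readers who want to reproduce this kind of comparison, the OLS and M-estimation fits can be run with statsmodels, whose RLM class implements IRLS with the same tuning constants used here; LTS itself is not available in statsmodels and would need a separate implementation such as the sketch in Section 6. The snippet is illustrative only and uses simulated placeholder data standing in for the crime data set.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder standing in for the state-wide crime data (x: predictor,
# y: violent crime rate); the 1st and 25th cases are modified as in the text.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 51)
y = 100.0 + 50.0 * x + rng.normal(0.0, 25.0, 51)
y[[0, 24]] = [7610.0, 4340.0]

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
huber_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT(t=1.345)).fit()
bisquare_fit = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight(c=4.685)).fit()

for name, res in [("OLS", ols_fit), ("Huber", huber_fit), ("Bisquare", bisquare_fit)]:
    print(name, res.params, res.bse)   # coefficients and standard errors
```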
The results are presented in the graphs and tables below. Figure 1 shows the OLS residual plot of the original data against the regression fitted values. The condition for the existence of heteroscedasticity is that the variance of the error terms is not constant; this can be identified when the residuals are not randomly distributed around the zero-residual line, with an indication of a systematic trend in the plot. Based on this concept, the plot clearly indicates that the constant variance assumption is violated, which gives evidence that the OLS fit is improper to use, as there is clear evidence of heterogeneous error variance. In this regard we apply the M-estimator methods based on the Huber and Tukey bisquare weight functions to the data in order to remedy the shortcoming of OLS in the presence of outliers and heteroscedastic error variance.
   To introduce this technique to the data, we first need to plot the residuals against the response variable with a data set that contains outliers. Figure 2, Figure 3 and Figure 4 give the diagnostic plots of the residuals against the fitted values without outliers using the M-estimates and least trimmed squares, while Figure 5 gives the linear regression models obtained from the three robust estimation techniques and OLS. From this plot we notice some differences between the estimators, which is evidence that the performance of the methods was satisfactory. In order to examine the consequence of outliers in the presence of heteroscedasticity, modification of the data is highly important. The OLS, M-estimate and resistant regression methods were used to examine the presence of outliers in the modified data, and the results are presented in Figure 6, Figure 7, Figure 8 and Figure 9 below. Figure 10 gives the linear regression models obtained from the three robust estimation techniques and OLS estimation when the modified data are used. The plots of the linear regression models obtained from the three robust estimation techniques using the original and modified data give a clearer picture of the real situation. The OLS plots in Figure 1 and Figure 6 indicate a violation of the constant variance assumption, which signifies that the OLS estimate is inappropriate to use. On the other hand, the RWLS and RWLTS plots in Figures 2, 3, 4, 7, 8 and 9 indicate that RWLS and RWLTS can solve the heteroscedasticity and outlier problems.

Table 1 Summary of robust techniques performance against OLS (original and modified data): simple regression

Method                Data Type    Estimate    SE       t-value
OLS                   Original     49.03       11.83    4.15
                      Modified     54.21       36.12    1.50
RWLS Huber            Original     35.96       11.19    3.22
                      Modified     49.37       12.26    4.04
RWLS Tukey Bisquare   Original     28.45        9.54    2.98
                      Modified     37.57        9.118   4.12

   Table 1 gives the summary statistics, which include the standard errors, t-values and the estimates of the regression coefficient for the original and modified crime data. The results in Table 1 reveal the influence of outliers on the regression model when OLS is used to estimate the regression parameter, compared with the regression parameters obtained from the RWLS and RWLTS estimates. The RWLS estimates based on the Huber function, the estimates based on the bisquare ψ function, and the RWLTS estimates of the regression coefficient, standard errors and t-values are similar for the modified and original data. We can also see that the weights given to the estimates are dramatically lower using the Tukey bisquare weighting function than the Huber weighting function, and the parameter estimates from these two weighting methods differ.
   On the other hand, consider the estimates obtained from the crime data involving all six explanatory variables. The results from the two analyses of the original and modified data are fairly different, especially with respect to the regression coefficients and the constant (intercept). While normally we are not interested in the constant, if one had centred one or all of the predictor variables, the constant would be useful. It will be noticed that some variables are not statistically significant in either analysis, whereas some are significant in both analyses; this is evidence that the M-estimate and RWLTS have partially addressed the problem of heteroscedasticity in the presence of outliers in the data. Moreover, the results obtained using RWLS based on Huber and RWLTS are only slightly influenced by the outliers. Different weight functions have advantages and drawbacks: Huber weights can have difficulties with severe outliers, while Tukey bisquare weights can have difficulties converging or may yield multiple solutions.
   Table 1 and Tables 2 to 4 summarize the estimated parameters using OLS, RWLS and RWLTS for the simple linear regression of the two variables X and Y and for the multiple regression, for both the original and modified crime data. The OLS estimates are strongly affected by the outliers and hence are not reliable. The M-estimation (IRWLS), however, emerges as plainly more effective and much more reliable because it is less affected by the outliers. The outcomes appear to point out that the M-estimation based on the Huber function, the Tukey bisquare and the RWLTS methods provide a substantial improvement over the other existing techniques.

Table 2 Summary of robust techniques performance against OLS (original and modified data): coefficient estimates

Method                Data type    β0         β1       β2       β3      β4      β5       β6
OLS                   Original    -857.62     23.42     6.24    -1.27    5.36    15.09    28.632
                      Modified     519.88    -17.82   -16.07    35.90   24.82   -17.40   253.912
RWLS Huber            Original    -662.47     24.16     5.62    -1.14    3.34    10.48    32.170
                      Modified    -139.38     15.09     3.82    -8.58    3.93    14.63    51.781
RWLS Tukey Bisquare   Original    -509.74     24.25     5.28    -1.32    2.01     8.11    33.6722
                      Modified    -376.50     23.03     5.28    -1.84    0.42    12.58    31.9234
RWLTS                 Original    -662.47     24.16     5.62    -1.14    3.34    10.48    32.170
                      Modified    -139.38     15.09     3.82    -8.58    3.93    14.63    51.781

Table 3 Summary of robust techniques performance against OLS (original and modified data): standard errors

Method                Data type    S.E.(β0)   S.E.(β1)   S.E.(β2)   S.E.(β3)   S.E.(β4)   S.E.(β5)   S.E.(β6)
OLS                   Original      602.8       3.94       1.18       2.554      6.98      10.34      14.73
                      Modified     3816.4      24.9        7.48      16.168     44.2       65.44      93.26
RWLS Huber            Original      602.7       3.95       1.21       2.554      6.95      10.43      14.86
                      Modified      616.6       4.52       1.32       3.504      7.09      10.73      18.54
RWLS Tukey Bisquare   Original      628.7       4.11       1.23       2.664      7.28      10.78      15.36
                      Modified      599.2       3.92       1.18       2.539      6.95      10.28      14.64
RWLTS                 Original      602.7       3.95       1.21       2.554      6.95      10.43      14.86
                      Modified      616.6       4.52       1.32       3.504      7.09      10.73      18.54

Table 4 Summary of robust techniques performance against OLS (original and modified data): t-values

Method                Data type    t(β0)   t(β1)   t(β2)   t(β3)   t(β4)   t(β5)   t(β6)
OLS                   Original     -1.42    5.94    5.28   -0.50    0.77    1.46    1.94
                      Modified      0.14   -0.71   -2.15   -2.22    0.56   -0.27    2.72
RWLS Huber            Original     -1.10    6.12    4.66   -0.45    0.48    1.01    2.17
                      Modified     -0.23    3.34    2.91   -2.45    0.56    1.36    2.79
RWLS Tukey Bisquare   Original     -0.81    5.90    4.29   -0.50    0.28    0.75    2.19
                      Modified     -0.63    5.88    4.50   -0.72    0.06    1.22    2.18
RWLTS                 Original     -1.10    6.12    4.66   -0.45    0.48    1.01    2.17
                      Modified     -0.23    3.34    2.91   -2.45    0.56    1.36    2.79

Figure 1 Plot of OLS residuals versus fitted values (original data)

Figure 2 Plot of RWLS (Huber) residuals versus fitted values (original data)

Figure 3 Plot of RWLS (Tukey bisquare) residuals versus fitted values (original data)

Figure 4 Plot of RWLTS residuals versus fitted values (original data)

Figure 5 Plot of crime data with four estimated regression lines (original data)

Figure 6 Plot of OLS residuals versus fitted values (modified data)

Figure 7 Plot of RWLS (Huber) residuals versus fitted values (modified data)

Figure 8 Plot of RWLS (Tukey bisquare) residuals versus fitted values (modified data)

Figure 9 Plot of RWLTS residuals versus fitted values (modified data)

Figure 10 Plot of crime data with four estimated regression lines (modified data)

8.0 THE BEST MODEL

The best model is selected using the standard errors and t-values estimated from the state-wide crime data involving all the explanatory variables. Based on the results obtained in Table 1, Table 2, Table 3 and Table 4, it is clear that the RWLS estimator using the Tukey bisquare function has the smallest standard errors and the largest t-values compared with those obtained from the RWLS estimator using the Huber function, from RWLTS and from OLS.

9.0 CONCLUSION

The primary focus of this paper is to produce reliable techniques for correcting the problem of heteroscedastic errors in the presence of outliers. The empirical study discloses that the OLS estimates are easily affected by outliers, while the proposed robust weighted estimators correct the problems of outliers and heteroscedastic errors in the data.

Acknowledgments

We would like to acknowledge the financial support from Universiti Teknologi Malaysia under the Research University Grant.

References

[1] H. Midi, S. Rana, and A. Rahmatullah. 2009. The Performance of Robust Weighted Least Squares in the Presence of Outliers and Heteroscedasticity Errors. WSEAS Transactions on Mathematics. 351–360.
[2] S. Chatterjee and A. S. Hadi. 2006. Regression Analysis by Example. 4th ed. Wiley, New York.
[3] R. D. Cook and S. Weisberg. 1983. Diagnostics for Heteroscedasticity in Regression. Biometrika. 70(1): 1–10.
[4] H. White. 1980. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica. 48: 817–838.
[5] R. A. Maronna, R. D. Martin, and V. J. Yohai. 2006. Robust Statistics: Theory and Methods. Wiley, New York.
[6] M. S. Rana, H. Midi, and A. H. M. R. Imon. 2008. A Robust Modification of the Goldfeld-Quandt Test for the Detection of Heteroscedasticity in the Presence of Outliers. Journal of Mathematics and Statistics. 4(4): 277–283.
[7] M. H. Kutner, C. J. Nachtsheim, and J. Neter. 2004. Applied Linear Regression Models. 4th ed. McGraw-Hill/Irwin, New York.
[8] H. Midi and B. A. Talib. 2008. The Performance of Robust Estimator in Linear Regression Model Having both Continuous and Categorical Variables with Heteroscedasticity Errors. Malaysian Journal of Mathematical Sciences. 2(1): 25–48.
[9] P. J. Rousseeuw and A. Leroy. 1987. Robust Regression and Outlier Detection. Wiley, New York.
[10] H. Midi and A. H. M. R. Imon. 2009. Deletion Residuals in the Detection of Heterogeneity of Variance in Linear Regression. Journal of Applied Statistics. 36: 347–358.
[11] R. A. Maronna, R. D. Martin, and V. J. Yohai. 2006. Robust Statistics: Theory and Methods. John Wiley & Sons Ltd., England.
[12] D. L. Donoho and P. J. Huber. 1983. The Notion of Breakdown Point. In: P. J. Bickel, K. A. Doksum, and J. L. Hodges Jr. (Eds.), A Festschrift for Erich L. Lehmann. Wadsworth, Belmont. 157–184.
[13] A. Christmann. 1994. Least Median of Weighted Squares in Logistic Regression with Large Strata. Biometrika. 81: 413–417.
[14] P. J. Rousseeuw and A. M. Leroy. 1987. Robust Regression and Outlier Detection. Wiley-Interscience, New York.
[15] P. J. Huber. 1964. Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics. 35: 73–101.
