
Outlier detection algorithms for least squares time series regression[1]

Søren Johansen[2,3] & Bent Nielsen[4]

8 September 2014

Summary: We review recent asymptotic results on some robust methods for multiple regression. The regressors include stationary and non-stationary time series as well as polynomial terms. The methods include the Huber-skip M-estimator, 1-step Huber-skip M-estimators, in particular the Impulse Indicator Saturation, iterated 1-step Huber-skip M-estimators and the Forward Search. These methods classify observations as outliers or not. From the asymptotic results we establish a new asymptotic theory for the gauge of these methods, which is the expected frequency of falsely detected outliers. The asymptotic theory involves normal distribution results and Poisson distribution results. The theory is applied to a time series data set.

Keywords: Huber-skip M-estimators, 1-step Huber-skip M-estimators, iteration, Forward Search, Impulse Indicator Saturation, Robustified Least Squares, weighted and marked empirical processes, iterated martingale inequality, gauge.

1 Introduction
The purpose of this paper is to review recent asymptotic results on some robust methods for multiple regression and to apply these to calibrate the methods. The regressors include stationary and non-stationary time series as well as quite general deterministic terms. All the reviewed methods classify observations as outliers according to hard, binary decision rules. The methods include the Huber-skip M-estimator, 1-step versions such as the robustified least squares estimator and the Impulse Indicator Saturation, iterated 1-step versions thereof, and the Forward Search. The paper falls in two parts. In the first part we give a motivating empirical example. This is followed by an overview of the methods and a review of recent asymptotic tools and properties of the estimators. For all the presented methods the outlier classification depends on a cut-off value c, which is taken as given in the first part. In the second part we provide an asymptotic theory for setting the cut-off value c indirectly from the gauge, where the gauge is defined as the frequency of observations classified as outliers when in fact there are no outliers in the data generating process.
Robust methods can be used in many ways. Some methods reject observations that are classified as outliers, while other methods give a smooth weight to all observations.
[1] Acknowledgements: We would like to thank the organizers of the NordStat meeting in Turku, Finland, June 2014, for giving us the opportunity to present these lectures on outlier detection.
[2] The first author is grateful to CREATES - Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation.
[3] Department of Economics, University of Copenhagen and CREATES, Department of Economics and Business, Aarhus University, DK-8000 Aarhus C. E-mail: [email protected].
[4] Nuffield College & Department of Economics, University of Oxford & Programme on Economic Modelling, INET, Oxford. Address for correspondence: Nuffield College, Oxford OX1 1NF, UK. E-mail: bent.nielsen@nuffield.ox.ac.uk.
It is open to discussion which method to use, see for instance Hampel, Ronchetti, Rousseeuw and Stahel (1986, §1.4). Here, we focus on rejection methods. We consider an empirical example, where rejection methods are useful as diagnostic tools. The idea is that most observations are 'good' in the sense that they conform with a regression model with symmetric, if not normal, errors. Some observations may not conform with the model - they are the outliers. When building a statistical model the user can apply the outlier detection methods in combination with considerations about the substantive context to decide which observations are 'good' and how to treat the 'outliers' in the analysis.
In order to use the algorithms with confidence we need to understand their properties when all observations are 'good'. Just as in hypothesis testing, where tests are constructed by controlling their properties when the hypothesis is true, we consider the outlier detection methods when, in fact, there are no outliers. The proposal is to control the cut-off values of the robust methods in terms of their gauge. The gauge is the frequency of wrongly detected outliers when there are none. It is distinct from, but related to, the size of a hypothesis test and the false discovery rate in multiple testing (Benjamini and Hochberg, 1995).
The origins of the notion of a gauge are as follows. Hoover and Perez (1999) studied the properties of a general-to-specific algorithm for variable selection through a simulation study. They considered various measures for the performance of the algorithm that are related to what is now called the gauge. One of these they referred to as the size: the number of falsely significant variables divided by the difference between the total number of variables and the number of variables with non-zero coefficients. The Hoover-Perez idea for regressor selection was the basis of the PcGets and Autometrics algorithms, see for instance Hendry and Krolzig (2005), Doornik (2009) and Hendry and Doornik (2014). The Autometrics algorithm also includes an impulse indicator saturation algorithm. Through extensive simulation studies the critical values of these algorithms have been calibrated in terms of the false detection rates for irrelevant regressors and irrelevant outliers. The term gauge was introduced in Hendry and Santos (2010) and Castle, Doornik and Hendry (2011).

Part I
Review of recent asymptotic results
2 A motivating example
What is an outlier? How do we detect them? How should we deal with them? There is no
simple, universally valid answer to these questions - it all depends on the context. We will
therefore motivate our analysis with an example from time series econometrics.
Demand and supply is key to discussing markets in economics. To study this Graddy (1995, 2006) collected data on prices and quantities from the Fulton Fish market in New York. For our purpose the following will suffice. The data consist of daily observations of the quantity of whiting sold by one wholesaler over the period 2 Dec 1991 to 8 May 1992. Figure 1(a) shows the daily aggregated quantity Q_t measured in pounds. The logarithm of the quantity, q_t = log Q_t, is shown in panel (b). The supply of fish depends on the weather at sea where the fish is caught. Panel (c) shows a binary variable S_t taking value 1 if the weather is stormy. The present analysis is taken from Hendry and Nielsen (2007, §13.5).

[Figure 1 about here. Panels: (a) quantities in pounds; (b) observations and fit; (c) stormy weather at sea; (d) residuals.]

Figure 1: Data and properties of fitted model for Fulton Fish market data
A simple autoregressive model for log quantities q_t gives

  q̂_t = 7.0 + 0.19 q_{t-1} − 0.36 S_t,                                   (2.1)
  (standard error)  (0.8)   (0.09)   (0.15)
  [t-statistic]     [8.8]   [2.03]   [−2.39]

  σ̂ = 0.72,  ℓ̂ = −117.82,  R² = 0.090,  t = 2, ..., 111,

  χ²_norm[2] = 6.9 [p = 0.03],  χ²_skew[1] = 6.8 [p = 0.01],  χ²_kurt[1] = 0.04 [p = 0.84],
  F_ar(1-2)[2, 106] = 0.9 [p = 0.40],  F_arch(1)[1, 106] = 1.4 [p = 0.24],
  F_het[3, 103] = 2.0 [p = 0.12],  F_reset[1, 106] = 1.8 [p = 0.18].

Here σ̂² is the residual variance, ℓ̂ is the log likelihood, T is the sample size. The residual specification tests include cumulant based tests for skewness, χ²_skew, kurtosis, χ²_kurt, and both, χ²_norm = χ²_skew + χ²_kurt; a test F_ar for autoregressive temporal dependence, see Godfrey (1978); a test F_arch for autoregressive conditional heteroscedasticity, see Engle (1982); a test F_het for heteroscedasticity, see White (1980); and a test F_reset for functional form, see Ramsey (1969). We note that the above references only consider stationary processes, but the specification tests also apply for non-stationary autoregressions, see Kilian and Demiroglu (2000) and Engler and Nielsen (2009) for χ²_skew, χ²_kurt and Nielsen (2006) for F_ar. The computations were done using OxMetrics, see Doornik and Hendry (2013). Figure 1(b, d) shows the fitted values and the standardized residuals.
The specification tests indicate that the residuals are skewed. Indeed the time series plot of the residuals in Figure 1(d) shows a number of large negative residuals. The three largest residuals have an interesting institutional interpretation. The observations 18 and 34 are Boxing Day and Martin Luther King Day, which are public holidays, while observation 95 is Wednesday before Easter. Thus, from a substantive viewpoint it seems preferable to include dummy variables for each of these days, which gives

  q̂_t = 7.9 + 0.09 q_{t-1} − 0.36 S_t − 1.94 D18_t − 1.82 D34_t − 2.38 D95_t,      (2.2)
        (0.7)  (0.08)     (0.14)    (0.66)      (0.66)      (0.66)
        [10.8] [1.04]     [−2.68]   [−3.00]     [−2.75]     [−3.64]

  σ̂ = 0.64,  ℓ̂ = −104.42,  R² = 0.287,  t = 2, ..., 111.

Specification tests, which are not reported, indicate a marked improvement in the specification. Comparing the regressions (2.1) and (2.2) it is seen that the lagged quantities were marginally significant in the first, misspecified regression, but not significant in the second, better specified, regression. It is of course no surprise that outliers matter for statistical inference - and that institutions matter for markets.
The above modelling strategy blends usage of specification tests, graphical tools and substantive arguments. It points at robustifying a regression by removing outliers and then refitting the regression. We note that outliers are defined as those observations that do not conform with the statistical model. In the following we will consider some algorithms for outlier detection that are inspired by this example. These algorithms are solely based on statistical information and we can then discuss their properties by mathematical means. In practice, outcomes should of course be assessed within the substantive context. We return to this example in §11.
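The strategy just described (fit, inspect standardized residuals, add impulse dummies, refit) can be written down in a few lines. The following is a minimal sketch of that two-stage idea, not the authors' code; the data arrays q and S, the function name and the cut-off value are placeholders.

```python
import numpy as np

def robustified_ar1(q, S, c=2.576):
    """Fit q_t = b0 + b1*q_{t-1} + b2*S_t + e_t by least squares, flag observations
    whose absolute residual exceeds c*sigma, add an impulse dummy per flagged day
    (as for days 18, 34 and 95 in (2.2)) and refit.  q, S are hypothetical arrays."""
    y = q[1:]
    X = np.column_stack([np.ones(len(q) - 1), q[:-1], S[1:]])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    res = y - X @ beta
    sigma = np.sqrt(res @ res / len(y))
    outliers = np.flatnonzero(np.abs(res) > c * sigma)
    dummies = np.zeros((len(y), len(outliers)))
    dummies[outliers, np.arange(len(outliers))] = 1.0   # one impulse dummy per outlier
    X2 = np.column_stack([X, dummies])
    beta2 = np.linalg.lstsq(X2, y, rcond=None)[0]
    return beta, beta2, outliers
```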

3 Model
Throughout, we consider data (y_i, x_i), i = 1, ..., n, where y_i is univariate and x_i has dimension dim x. The regressors are possibly trending in a deterministic or stochastic fashion. We assume that (y_i, x_i), i = 1, ..., n, satisfy the multiple regression equation

  y_i = x_i'β + ε_i,   i = 1, ..., n.   (3.1)

The innovations ε_i are independent of the filtration F_{i-1}, which is the sigma-field generated by x_1, ..., x_i and ε_1, ..., ε_{i-1}. Moreover, the ε_i are identically distributed with mean zero and variance σ², so that ε_i/σ has known symmetric density f and distribution function F(c) = P(ε_i/σ ≤ c). In practice, the distribution F will often be standard normal.
We will think of the outliers as pairs of observations (y_i, x_i) that do not conform with the model (3.1). In other words, a pair of observations (y_i, x_i) gives us an outlier if the scaled innovation ε_i/σ does not conform with the reference density f. This has slightly different consequences for cross-sectional data and for time series data. For cross-sectional data the pairs of observations (y_1, x_1), ..., (y_n, x_n) are unrelated. Thus, if the innovation ε_i is classified as an outlier, then the pair of observations (y_i, x_i) is dropped. We can interpret this as an innovation not conforming with the model, or that y_i or x_i or both are not correct. This is different for time-series data, where the regressors will include lagged dependent variables. For instance, for a first order autoregression x_i = y_{i-1}. We distinguish between innovative outliers and additive outliers. Classifying the innovation ε_i as an outlier has the consequence that we discard the evaluation of the dynamics from y_{i-1} to y_i without discarding the observations y_{i-1} and y_i. Indeed, y_{i-1} appears as the dependent variable at time i-1 and y_i as the regressor at time i+1, respectively. Thus, finding a single outlier in a time series context implies that the observations are considered correct, but possibly not generated by the model. An additive outlier arises if an observation y_i is wrongly measured. For a first order autoregression this is captured by two innovative outliers ε_i and ε_{i+1}. Discarding these, the observation y_i will not appear.
We consider algorithms using absolute residuals and calculation of least squares estimators from selected observations. Both these choices implicitly assume a symmetric density: if non-outlying innovations were asymmetric then the symmetrically truncated innovations would in general be asymmetric and the least squares estimator for location would be biased.
With symmetry the absolute value errors |ε_i|/σ have density g(c) = 2f(c) and distribution function G(c) = P(|ε_1|/σ ≤ c) = 2F(c) − 1. We define γ = G(c) so that c is the quantile

  c = G^{-1}(γ) = F^{-1}{(1 + γ)/2},   γ ∈ [0, 1),

while the probability of exceeding the cut-off value c is

  α = 1 − γ = 1 − G(c).

Define also the truncated moments

  τ = ∫_{-c}^{c} u² f(u) du,   κ = ∫_{-c}^{c} u⁴ f(u) du,   ζ = ∫_{-∞}^{∞} u⁴ f(u) du,   (3.2)

and the conditional variance of ε_1/σ given {|ε_1| ≤ σc} as

  ς² = τ/γ = ∫_{-c}^{c} u² f(u) du / P(|ε_1/σ| ≤ c),   (3.3)

which will serve as a bias correction for the variance estimators based on the truncated sample. Define also the quantity

  λ = 2c(c² − ς²) f(c).   (3.4)

In this paper we focus on the normal reference distribution. The truncated moments then simplify as follows:

  τ = γ − 2cf(c),   κ = 3γ − 2c(c² + 3)f(c),   ζ = 3.   (3.5)
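For the normal reference distribution these quantities are easy to evaluate numerically. The following sketch (ours, not from the paper; the function name is a placeholder) computes γ, α, τ, κ, ς² and λ for a given cut-off c using the closed forms in (3.5).

```python
from math import erf, exp, pi, sqrt

def normal_truncated_moments(c):
    """Quantities of Section 3 for the standard normal reference density."""
    phi_c = exp(-0.5 * c * c) / sqrt(2.0 * pi)              # f(c)
    gamma = erf(c / sqrt(2.0))                               # G(c) = P(|eps/sigma| <= c)
    alpha = 1.0 - gamma                                      # exceedance probability
    tau = gamma - 2.0 * c * phi_c                            # truncated second moment, (3.5)
    kappa = 3.0 * gamma - 2.0 * c * (c * c + 3.0) * phi_c    # truncated fourth moment, (3.5)
    varsigma2 = tau / gamma                                  # conditional variance, (3.3)
    lam = 2.0 * c * (c * c - varsigma2) * phi_c              # the quantity in (3.4)
    return gamma, alpha, tau, kappa, varsigma2, lam

print(normal_truncated_moments(2.576))                       # cut-off for a 1% exceedance
```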

4 Some outlier detection algorithms

Least squares estimators are known to be fragile with respect to outliers. A number of robust methods have been developed over the years. We study a variety of estimators with the common property that outlying observations are skipped.

4.1 M-estimators
Huber (1964) introduced M-estimators as a class of maximum likelihood type estimators for location. The M-estimator for the regression model (3.1) is defined as the minimizer of

  R_n(β) = n^{-1} Σ_{i=1}^{n} ρ(y_i − x_i'β)   (4.1)
[Figure 2 about here. Panels: (a) c = 1.4 and (b) c = 0.7; vertical axis: objective function.]

Figure 2: Huber-skip objective function for Fulton fish data.

for some absolutely continuous and non-negative criterion function ρ. In particular, the least squares estimator arises when ρ(u) = u², while the median or least absolute deviation estimator arises for ρ(u) = |u|. We will pursue the idea of hard rejection of outliers through the non-convex Huber-skip criterion function ρ(u) = u² 1(|u| ≤ σc) + σ²c² 1(|u| > σc) for some cut-off c > 0 and known scale σ.
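The criterion and the resulting objective function are straightforward to evaluate on a grid, which is how a plot like Figure 2 can be produced. A minimal sketch (ours, with placeholder arguments y, X):

```python
import numpy as np

def huber_skip_rho(u, c, sigma):
    """Huber-skip criterion: quadratic inside the cut-off, constant outside."""
    return np.where(np.abs(u) <= sigma * c, u ** 2, (sigma * c) ** 2)

def huber_skip_objective(beta, y, X, c, sigma):
    """R_n(beta) of (4.1) with the Huber-skip criterion."""
    return np.mean(huber_skip_rho(y - X @ beta, c, sigma))
```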
The objective function of the Huber-skip M-estimator is non-convex. Figure 2 illustrates the objective function for the Fish data.[5] The specification is as in equation (2.1). All parameters apart from that on q_{t-1} are held fixed at the values in (2.1). Panel (a) shows that when the cut-off c is large the Huber-skip is quadratic in the central part. Panel (b) shows that when the cut-off c is smaller the objective function is non-differentiable in a finite number of points. Subsequently, we consider estimators that are easier to compute and apply for unknown scale, while hopefully preserving some useful robustness properties.
The asymptotic theory of M-estimators has been studied in some detail for the situation without outliers. Huber (1964) proposed a theory for location models and convex criterion functions ρ. Jurečková and Sen (1996, p. 215f) analyzed the regression problem with convex criterion functions. Non-convex criterion functions were considered for location models in Jurečková and Sen (1996, p. 197f), see also Jurečková, Sen, and Picek (2012). Chen and Wu (1988) showed strong consistency of M-estimators for general criterion functions with i.i.d. or deterministic regressors, while time series regression is analyzed in Johansen and Nielsen (2014b). We review the latter theory in §7.1.

4.2 Huber-skip estimators

We consider some estimators that involve skipping data points, but are not necessarily M-estimators. The objective functions have binary stochastic weights v_i for each observation.

[5] Graphics were done using R 3.1.1, see R Development Core Team (2014).

These weights are defined in various ways below. In all cases the objective function is

  R_n(β) = n^{-1} Σ_{i=1}^{n} {(y_i − x_i'β)² v_i + σ²c² (1 − v_i)}.   (4.2)

The weights v_i may depend on β. The first example is the Huber-skip M-estimator which depends on a cut-off point c, where

  v_i = 1(|y_i − x_i'β| ≤ σc).   (4.3)

Another example is the Least Trimmed Squares estimator of Rousseeuw (1984) which depends on an integer k ≤ n, where

  v_i = 1(|y_i − x_i'β| ≤ ξ_(k)),   (4.4)

for ξ_(k) chosen as the k-th smallest order statistic of the absolute residuals ξ_i = |y_i − x_i'β| for i = 1, ..., n. Given an integer k ≤ n we can find γ and c so that k/n = γ = G(c), and γ, c, k are different ways of calibrating the methods. In either case, once the regression estimator β̂ has been determined, the scale can be estimated by

  σ̂² = ς^{-2} (Σ_{i=1}^{n} v_i)^{-1} {Σ_{i=1}^{n} v_i (y_i − x_i'β̂)²},   (4.5)

where ς² = τ/γ is the consistency correction factor defined in (3.3).
For the Least Trimmed Squares estimator it holds that Σ_{i=1}^{n} (1 − v_i) = n − k. Thus, the last term in the objective function (4.2) does not depend on β, so that it is equivalent to optimize

  R_{n,LTS}(β) = n^{-1} Σ_{i=1}^{n} v_i (y_i − x_i'β)².   (4.6)

The Least Trimmed Squares weight (4.4) is scale invariant in contrast to the Huber-skip M-estimator. It is known to have breakdown point of α = 1 − γ = 1 − k/n for α < 1/2, see Rousseeuw and Leroy (1987, §3.4). An asymptotic theory is provided by Víšek (2006a,b,c). The estimator is computed through a binomial search algorithm which is uncomputable in most practical situations, see Maronna, Martin and Yohai (2006, §5.7) for a discussion. A number of iterative approximations have been suggested such as the Fast LTS algorithm by Rousseeuw and van Driessen (1998). This leaves additional questions with respect to the properties of the approximating algorithms.
If the weights v_i do not depend on β, the objective function has a least squares solution

  β̂ = (Σ_{i=1}^{n} v_i x_i x_i')^{-1} (Σ_{i=1}^{n} v_i x_i y_i).   (4.7)

From this the variance estimator (4.5) can be computed. Examples include 1-step Huber-skip M-estimators based on initial estimators β̃, σ̃², where

  v_i = 1(|y_i − x_i'β̃| ≤ σ̃c),   (4.8)

and 1-step Huber-skip L-estimators based on an initial estimator β̃ and a cut-off k < n, which defines the k-th smallest order statistic ξ̃_(k) of the absolute residuals ξ̃_i = |y_i − x_i'β̃|, where

  v_i = 1(|y_i − x_i'β̃| ≤ ξ̃_(k)).   (4.9)

These estimators are computationally attractive, but require a good starting point. They can also be iterated. As before, we see that the 1-step L-estimator does not require an initial scale estimator in contrast to the 1-step M-estimator.
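A 1-step Huber-skip M-estimator is just a weighted least squares fit with the indicator weights (4.8) followed by the bias-corrected variance (4.5). The sketch below is our own illustration and assumes a normal reference distribution when computing the consistency factor ς².

```python
import numpy as np
from math import erf, exp, pi, sqrt

def huber_skip_1step(y, X, beta0, sigma0, c):
    """One step of the Huber-skip M-estimator: weights (4.8), then (4.7) and (4.5)."""
    phi_c = exp(-0.5 * c * c) / sqrt(2.0 * pi)
    gamma = erf(c / sqrt(2.0))
    varsigma2 = (gamma - 2.0 * c * phi_c) / gamma        # consistency correction (3.3)
    v = np.abs(y - X @ beta0) <= c * sigma0              # indicator weights (4.8)
    Xv, yv = X[v], y[v]
    beta1 = np.linalg.solve(Xv.T @ Xv, Xv.T @ yv)        # weighted LS (4.7)
    sigma1 = np.sqrt(np.mean((yv - Xv @ beta1) ** 2) / varsigma2)   # (4.5)
    return beta1, sigma1
```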
Robustified least squares arises if the initial estimators β̃, σ̃² are the full-sample least squares estimators. This relates to the estimation procedure for the Fulton Fish Market data in §2. This approach can be fragile, especially when there are more than a few outliers, see Welsh and Ronchetti (2002) for a discussion.
The 1-step estimators relate to the 1-step M-estimators of Bickel (1975), although he was primarily concerned with smooth weights v_i. His idea was to apply preliminary estimators β̂^(0), (σ̂^(0))² and then define the 1-step estimator β̂^(1) by linearising the first order condition. He also suggested iteration, but no results were given.
Ruppert and Carroll (1980) studied a related 1-step L-estimator for which fixed proportions of negative and positive residuals are skipped. Following their suggestion we refer to the estimator with weights (4.9) as a 1-step Huber-skip L-estimator, because the objective function is defined by weights involving an order statistic. We note that there is a mismatch in the nomenclature of L- and M-estimators. Jaeckel (1971) defined L-estimators for location problems in terms of the estimator, whereas Huber (1964) defined M-estimators in terms of the objective function. Thus, the Least Trimmed Squares estimator is not classified as an L-estimator, although its objective function is a quadratic combination of order statistics.

4.3 Some statistical algorithms

We give three statistical algorithms involving iteration of 1-step Huber-skip estimators. These are the iterated 1-step Huber-skip M-estimator, the Impulse Indicator Saturation, and the Forward Search.
The 1-step Huber-skip estimators are amenable to iteration. Here we consider iterated Huber-skip M-estimators.

Algorithm 4.1 Iterated 1-step Huber-skip M-estimator. Choose a cut-off c > 0.
1. Choose initial estimators β̂^(0), (σ̂^(0))² and let m = 0.
2. Define indicator variables v_i^(m) as in (4.8), replacing β̃, σ̃² by β̂^(m), (σ̂^(m))².
3. Compute least squares estimators β̂^(m+1), (σ̂^(m+1))² as in (4.7), (4.5), replacing v_i by v_i^(m).
4. Let m = m + 1 and repeat 2 and 3.

The Iteration Algorithm 4.1 does not have a stopping rule. This leaves the questions whether the algorithm converges with increasing m and n and in which sense it approximates the Huber-skip estimator.
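Algorithm 4.1 can be sketched in a few lines. The code below is our own illustration; a stopping tolerance is added purely for the example, since the algorithm as stated has no stopping rule, and varsigma2 is the consistency correction (3.3) for the chosen reference density.

```python
import numpy as np

def iterated_huber_skip(y, X, beta0, sigma0, c, varsigma2, max_iter=50, tol=1e-8):
    """Algorithm 4.1: iterate the 1-step Huber-skip M-estimator."""
    beta, sigma = np.asarray(beta0, float), float(sigma0)
    for _ in range(max_iter):
        v = np.abs(y - X @ beta) <= c * sigma                 # weights (4.8)
        Xv, yv = X[v], y[v]
        beta_new = np.linalg.solve(Xv.T @ Xv, Xv.T @ yv)      # (4.7)
        sigma_new = np.sqrt(np.mean((yv - Xv @ beta_new) ** 2) / varsigma2)   # (4.5)
        done = np.max(np.abs(beta_new - beta)) + abs(sigma_new - sigma) < tol
        beta, sigma = beta_new, sigma_new
        if done:
            break
    return beta, sigma, v
```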
The Impulse Indicator Saturation algorithm has its roots in the empirical work of Hendry (1999) and Hendry, Johansen and Santos (2008). It is a 1-step M-estimator, where the initial estimator is formed by exploiting in a simple way the assumption that a subset of observations is free of outliers. The idea is to divide the sample into two sub-samples. Then run a regression on each sub-sample and use this to find outliers in the other sub-sample.

Algorithm 4.2 Impulse Indicator Saturation. Choose a cut-off c > 0.
1.1. Split the indices into sets I_j, for j = 1, 2, of n_j observations.
1.2. Calculate the least squares estimators for (β, σ²) based upon sample I_j as

  β̂_j = (Σ_{i∈I_j} x_i x_i')^{-1} (Σ_{i∈I_j} x_i y_i),   σ̂_j² = n_j^{-1} Σ_{i∈I_j} (y_i − x_i'β̂_j)².

1.3. Define indicator variables for each observation

  v̂_i^(-1) = 1(i∈I_1) 1(|y_i − x_i'β̂_2| ≤ cσ̂_2) + 1(i∈I_2) 1(|y_i − x_i'β̂_1| ≤ cσ̂_1).   (4.10)

1.4. Compute least squares estimators β̂^(0), (σ̂^(0))² using (4.7), (4.5), replacing v_i by v̂_i^(-1), and let m = 0.
2. Define indicator variables v_i^(m) = 1(|y_i − x_i'β̂^(m)| ≤ cσ̂^(m)) as in (4.8).
3. Compute least squares estimators β̂^(m+1), (σ̂^(m+1))² as in (4.7), (4.5), replacing v_i by v_i^(m).
4. Let m = m + 1 and repeat 2 and 3.

Due to its split-half approach to the initial estimation, the Impulse Indicator Saturation may be more robust than robustified least squares. The Impulse Indicator Saturation estimator will work best when the outliers are known to be in a particular subset of the observations. For instance, consider the split-half case where the index sets I_1, I_2 are chosen as the first half and the second half of the observations, respectively. Then the algorithm has a good ability to detect for instance a level shift half way through the second sample, while it is poor at detecting outliers scattered throughout both samples, because both sample halves are contaminated. If the location of the contamination is unknown, one will have to iterate over the choice of the initial sets I_1, I_2. This is what the more widely used Autometrics algorithm does, see Doornik (2009) and Doornik and Hendry (2014).
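The initial split-half step (1.1-1.4) can be sketched as follows. This is our own illustration rather than the Autometrics implementation; varsigma2 is again the consistency correction (3.3).

```python
import numpy as np

def iis_initial_step(y, X, c, varsigma2):
    """Steps 1.1-1.4 of Algorithm 4.2: split-half regressions, cross detection,
    then one weighted least squares step on the retained observations."""
    n = len(y)
    half = n // 2
    I1, I2 = np.arange(half), np.arange(half, n)
    v = np.zeros(n, dtype=bool)
    for ins, outs in [(I1, I2), (I2, I1)]:
        b = np.linalg.solve(X[ins].T @ X[ins], X[ins].T @ y[ins])
        s = np.sqrt(np.mean((y[ins] - X[ins] @ b) ** 2))
        # observations in the other half are kept if their residual is small, (4.10)
        v[outs] = np.abs(y[outs] - X[outs] @ b) <= c * s
    Xv, yv = X[v], y[v]
    beta0 = np.linalg.solve(Xv.T @ Xv, Xv.T @ yv)                     # (4.7)
    sigma0 = np.sqrt(np.mean((yv - Xv @ beta0) ** 2) / varsigma2)     # (4.5)
    return beta0, sigma0, v
```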
The Forward Search algorithm is an iterated 1-step Huber-skip L-estimator suggested for the multivariate location model by Hadi (1992) and for multiple regression by Hadi and Simonoff (1993) and developed further by Atkinson and Riani (2000), see also Atkinson, Riani and Cerioli (2010). The algorithm starts with a robust estimate of the regression parameters. This is used to construct the set of observations with the smallest m_0 absolute residuals. We then run a regression on those m_0 observations and compute absolute residuals of all n observations. The observations with the m_0 + 1 smallest residuals are then selected, and a new regression is performed on these m_0 + 1 observations. This is then iterated. Since the estimator based on the m_0 + 1 observations is computed in terms of the order statistic based on the estimator for the m_0 observations, it is a 1-step Huber-skip L-estimator. When iterating, the order of the order statistics is gradually expanding.

Algorithm 4.3 Forward Search.
1. Choose an integer m_0 < n and an initial estimator β̂^(m_0), and let m = m_0.
2.1. Compute absolute residuals ξ̂_i^(m) = |y_i − x_i'β̂^(m)| for i = 1, ..., n.
2.2. Find the (m + 1)th smallest order statistic ẑ^(m) = ξ̂_((m+1))^(m).
2.3. Define indicator variables v_i^(m) = 1(|y_i − x_i'β̂^(m)| ≤ ξ̂_((m+1))^(m)) as in (4.9).
3. Compute least squares estimators β̂^(m+1), (σ̂^(m+1))² as in (4.7), (4.5), replacing v_i by v_i^(m).
4. If m < n let m = m + 1 and repeat 2 and 3.


The Forward Search Algorithm 4.3 has a finite number of steps. It terminates when m = n − 1 and β̂^(n), (σ̂^(n))² are the full sample least squares estimators. Applying the algorithm for m = m_0, ..., n − 1 results in sequences of least squares estimators β̂^(m), (σ̂^(m))² and order statistics ẑ^(m) = ξ̂_((m+1))^(m).
The idea of the Forward Search is to monitor the plot of scaled forward residuals ẑ^(m)/σ̂^(m). For each m we can find the asymptotic distribution of ẑ^(m)/σ̂^(m) and add a curve of pointwise p-quantiles as a function of m for some p. The first m for which ẑ^(m)/σ̂^(m) exceeds the quantile curve is the estimate m̂ of the number of non-outliers. Asymptotic theory for the forward residuals ẑ^(m)/σ̂^(m) is reviewed in §8.3. A theory for the estimator m̂ is given in §10.
A variant of the Forward Search advocated by Atkinson and Riani (2000) is to use the minimum deletion residuals d̂^(m) = min_{i∉S^(m)} ξ̂_i^(m) instead of the forward residuals ẑ^(m).
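A bare-bones version of the search loop, our own and without the robust initialization or the monitoring quantiles, looks as follows.

```python
import numpy as np

def forward_search(y, X, beta_init, m0):
    """Algorithm 4.3: at each step fit LS on the observations with the smallest
    absolute residuals, then enlarge the set by one.  Returns the sequences of
    regression estimators and forward residuals z_hat^(m)."""
    n = len(y)
    beta = np.asarray(beta_init, float)
    betas, z_hats = [], []
    for m in range(m0, n):
        xi = np.abs(y - X @ beta)            # absolute residuals, step 2.1
        order = np.argsort(xi)
        z_hats.append(xi[order[m]])          # (m+1)th smallest order statistic, step 2.2
        keep = order[:m + 1]                 # the m+1 smallest residuals, step 2.3
        beta = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])   # step 3
        betas.append(beta)
    return betas, z_hats
```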

5 Overview of the results for the location case

We give an overview of the asymptotic theory for the M-type Huber-skip estimators for the location problem, where x_i'β reduces to a location parameter β (that is, x_i = 1). The theory evolves around two asymptotic results. The first is an asymptotic distribution for the M-estimator. The second is an asymptotic expansion for 1-step M-estimators. The iterated 1-step M-estimators are found to converge to the M-estimator.
Huber (1964) proposed an asymptotic theory for M-estimators with convex objective function in location models. His proof did not extend to the Huber-skip M-estimator based on the weights (4.3). Instead he assumed consistency and conjectured that the asymptotic distribution would be, for symmetric f,

  n^{1/2}(β̂ − β) = {γ − 2cf(c)}^{-1} n^{-1/2} Σ_{i=1}^{n} ε_i 1(|ε_i| ≤ σc) + o_P(1) →^D N[0, σ²τ/{γ − 2cf(c)}²].   (5.1)

This result is generalized to time series regression in Theorem 7.1.
In the situation with normal errors τ = γ − 2cf(c), see (3.5), so the asymptotic variance in (5.1) reduces to σ²/τ. The efficiency relative to the sample average, which is the least squares estimator, is therefore τ. The bottom curve plotted in Figure 3 shows the efficiency as a function of γ. The least trimmed squares estimator has the same asymptotic distribution.
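Under the normal reference distribution this efficiency, the bottom curve of Figure 3, is simply τ = γ − 2cφ(c) as a function of the cut-off. A small sketch (ours) that tabulates it:

```python
from math import erf, exp, pi, sqrt

def huber_skip_efficiency(c):
    """Efficiency of the Huber-skip M-estimator relative to least squares
    under a normal reference distribution: tau = gamma - 2*c*phi(c)."""
    phi_c = exp(-0.5 * c * c) / sqrt(2.0 * pi)
    gamma = erf(c / sqrt(2.0))
    return gamma - 2.0 * c * phi_c

for c in (1.0, 1.96, 2.576, 3.0):
    print(c, round(huber_skip_efficiency(c), 3))
```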
Next, consider the 1-step Huber-skip M-estimator based on the weights (4.8). It has an asymptotic expansion linking the updated estimator β̂^(1) with the initial estimator β̂^(0) through

  n^{1/2}(β̂^(1) − β) = γ^{-1} n^{-1/2} Σ_{i=1}^{n} ε_i 1(|ε_i| ≤ σc) + {2cf(c)/γ} n^{1/2}(β̂^(0) − β) + o_P(1),   (5.2)

see Theorem 7.2 for regression.
Robustified least squares arises if we choose the initial estimator β̂^(0) as the least squares estimator. In that case we get the expansion

  n^{1/2}(β̂^(1) − β) = γ^{-1} n^{-1/2} Σ_{i=1}^{n} ε_i 1(|ε_i| ≤ σc) + {2cf(c)/γ} n^{-1/2} Σ_{i=1}^{n} ε_i + o_P(1).   (5.3)

We can use the Central Limit Theorem to show asymptotic normality of the estimator. The asymptotic variance follows in Theorem 7.3. The efficiency relative to least squares estimation is shown as the top curve in Figure 3.
[Figure 3 about here. Efficiency (vertical axis, 0 to 1) plotted against the cut-off value c (horizontal axis, 0 to 3) for: robustified least squares = Impulse Indicator Saturation with m = 0 (top curve); Impulse Indicator Saturation with m = 1 (middle curve); Huber-skip M-estimator (bottom curve).]

Figure 3: The efficiency of robustified least squares, the Impulse Indicator Saturation, and the Huber-skip M-estimator relative to full sample least squares when the reference distribution is normal.

Starting with other estimators gives different asymptotic variances. An example is the Impulse Indicator Saturation Algorithm 4.2. Theorem 7.4 shows that the initial split-half estimator β̂^(0) has the same asymptotic distribution as the robustified least squares estimator. The updated 1-step estimator β̂^(1) is slightly less efficient, as shown by the middle curve in Figure 3, but hopefully more robust.
The 1-step M-estimator can be iterated along the lines of Algorithm 4.1. This iteration has a fixed point β̂* solving the equation

  n^{1/2}(β̂* − β) = γ^{-1} n^{-1/2} Σ_{i=1}^{n} ε_i 1(|ε_i| ≤ σc) + {2cf(c)/γ} n^{1/2}(β̂* − β) + o_P(1),   (5.4)

see Theorem 7.6. Thus, any influence of the initial estimator is lost through iteration. Solving this equation gives

  n^{1/2}(β̂* − β) = {γ − 2cf(c)}^{-1} n^{-1/2} Σ_{i=1}^{n} ε_i 1(|ε_i| ≤ σc) + o_P(1),   (5.5)

with the same leading term as the Huber-skip M-estimator in (5.1).

6 Preliminary asymptotic results

We present the main ingredients for the asymptotic theory.

6.1 Assumptions on regressors and density

The innovations ε_i and regressors x_i must satisfy moment assumptions. The innovations ε_i have a symmetric density with derivative satisfying boundedness and tail conditions. Related conditions on the density are often seen in the literatures on empirical processes and quantile processes. These conditions are satisfied for the normal distribution and t-distributions, see Johansen and Nielsen (2014a) for a discussion. For the iterated estimator we need an assumption of unimodality. The minimal assumptions vary for the different estimators, as explored in Johansen and Nielsen (2009, 2013, 2014a, 2014b) for 1-step Huber-skip M-estimators, for iterated 1-step Huber-skip M-estimators, for the Forward Search and for general M-estimators, respectively.
For this presentation we simply assume a normal reference distribution, which, of course, is most used in practice. With normality we avoid a somewhat tedious discussion of existence of moments of a certain order. The regressors can be temporally dependent and possibly deterministically or stochastically trending.

Assumption 6.1 Let F_i be the filtration generated by x_1, ..., x_{i+1} and ε_1, ..., ε_i. Assume
(i) the innovations ε_i/σ are independent of F_{i-1} and standard normal;
(ii) the regressors x_i satisfy, for some non-stochastic normalisation matrix N → 0 and random matrices V, Σ, μ, the following joint convergence results:
  (a) V_n = N' Σ_{i=1}^{n} x_i ε_i →^D V;
  (b) Σ_n = N' Σ_{i=1}^{n} x_i x_i' N →^D Σ > 0 a.s.;
  (c) n^{-1/2} N' Σ_{i=1}^{n} x_i →^D μ;
  (d) max_{i≤n} |n^{1/2} N' x_i| = o_P(n^δ) for all δ > 0;
  (e) n^{-1} E Σ_{i=1}^{n} |n^{1/2} N' x_i|^q = O(1) for some q > 9.

Assumption 6.1(ii) for the regressors is satisfied in a range of situations, see Johansen and Nielsen (2009). For instance, x_i could be vector autoregressive with stationary roots or roots at one. It also holds for quite general regressors including polynomial regressors. The normalisation is N = n^{-1/2} I_{dim x} for stationary regressors and N = n^{-1} I_{dim x} for random walk regressors.
We note that Assumption 6.1 implies Assumption 3.1(i, ii) of Johansen and Nielsen (2014a) for a suitable choice of the constants entering that assumption, with exponent 1/4, moment order q > 9, and a small positive constant bounded by the minimum of 1/(1 + dim x) and (q − 9)/(q − 1).

6.2 Weighted and marked empirical processes

The asymptotic analysis of Huber-skip estimators is concerned with a class of weighted and marked empirical processes. The 1-step estimators for β and σ² have estimation errors that can be expressed in terms of statistics of the form

  Σ_{i=1}^{n} v_i,   Σ_{i=1}^{n} v_i x_i ε_i,   Σ_{i=1}^{n} v_i x_i x_i',   Σ_{i=1}^{n} v_i ε_i²,   (6.1)

where the v_i are indicator functions for small residuals. Such sums of indicator functions are the basis for empirical processes. The F_{i-1}-predictable factors x_i and x_i x_i' are called weights in line with Koul (2002). The unbounded, F_i-adapted factors ε_i and ε_i² are said to be marks.
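For concreteness, the four product moments in (6.1) for a given set of indicator weights can be computed as follows (our own illustration; the arguments are placeholders).

```python
import numpy as np

def product_moments(X, eps, v):
    """The four weighted and marked sums of (6.1) for indicator weights v:
    sum v_i, sum v_i x_i eps_i, sum v_i x_i x_i', sum v_i eps_i^2."""
    v = v.astype(float)
    return (v.sum(),
            X.T @ (v * eps),
            (X * v[:, None]).T @ X,
            np.sum(v * eps ** 2))
```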
For M-type estimators, the indicator functions have the form

  v_{M,i} = 1{|ε_i − x_i'(β̃ − β)| ≤ σc + (σ̃ − σ)c} = 1{|ε_i − x_i'N b̃| ≤ σc + n^{-1/2} ã c},   (6.2)

which allows for estimation uncertainty b̃ = N^{-1}(β̃ − β) and ã = n^{1/2}(σ̃ − σ) in the regression coefficient β and in the scale σ. For L-type estimators the indicators are

  v_{L,i} = 1{|ε_i − x_i'(β̃ − β)| ≤ ξ̃_(k)} = 1{|ε_i − x_i'N b̃| ≤ σ(c + n^{-1/2} d̃)},   (6.3)

which allows for estimation uncertainty b̃ = N^{-1}(β̃ − β) and d̃ = n^{1/2}(ξ̃_(k)/σ − c) in the regression coefficient and in the quantile.
We will need an asymptotic linearization of the statistics (6.1) with respect to the estimation uncertainty. For this purpose we start by considering weights

  v_i^{b,c,d} = 1{|ε_i − x_i'N b| ≤ σ(c + n^{-1/2} d)},   (6.4)

where the estimation uncertainty is replaced by bounded, deterministic terms b, d. Subsequently, we apply the result to M-type and L-type estimators, by replacing b by b̃ and d by ãc/σ and d̃, respectively.
The following asymptotic expansion is a version of Lemma D.5 of Johansen and Nielsen (2014a) formulated under the present simplified Assumption 6.1.

Theorem 6.1 (Johansen and Nielsen, 2014a, Lemma D.5) Suppose Assumption 6.1 holds. Consider the product moments (6.1) with weights v_i^{b,c,d} given by (6.4) and expansions

  n^{-1/2} Σ_{i=1}^{n} v_i^{b,c,d} = n^{-1/2} Σ_{i=1}^{n} 1(|ε_i| ≤ σc) + 2f(c)d + R_v(b, c, d),
  n^{-1/2} Σ_{i=1}^{n} v_i^{b,c,d} ε_i² = n^{-1/2} Σ_{i=1}^{n} ε_i² 1(|ε_i| ≤ σc) + 2σ²c²f(c)d + R_{vεε}(b, c, d),
  N' Σ_{i=1}^{n} v_i^{b,c,d} x_i ε_i = N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i| ≤ σc) + 2cf(c) N' Σ_{i=1}^{n} x_i x_i' N b + R_{vxε}(b, c, d),
  N' Σ_{i=1}^{n} v_i^{b,c,d} x_i x_i' N = γ N' Σ_{i=1}^{n} x_i x_i' N + R_{vxx}(b, c, d).

Let

  R(b, c, d) = |R_v(b, c, d)| + |R_{vεε}(b, c, d)| + |R_{vxε}(b, c, d)| + |R_{vxx}(b, c, d)|.

Then it holds for all (large) B > 0, all (small) δ > 0 and n → ∞ that

  sup_{|b|,|d| ≤ n^{1/4−δ}B}  sup_{0<c<∞}  R(b, c, d) = o_P(1).   (6.5)

In particular, for bounded c,

  sup_{|a|,|b| ≤ n^{1/4−δ}B}  sup_{0<c≤B}  R(b, c, ac/σ) = o_P(1).   (6.6)

Theorem 6.1 is proved by a chaining argument. The idea is to cover the domain of b, d with a finite number of balls. The supremum over the large compact set can then be replaced by considering the maximum value over the centers of the balls and the maximum of the variation within balls. By subtracting the compensators of the product moments we turn them into martingales. The argument will therefore be a consideration of the tail behaviour of the maximum of a family of martingales using the iterated martingale inequality presented in §6.3 and Taylor expansions of the compensators.
Results related to Theorem 6.1 are considered in the literature. Koul and Ossiander (1994) considered weighted empirical processes without marks and with a rate exponent greater than 1/4. Johansen and Nielsen (2009) considered the situation (6.6) for fixed c and with a rate exponent greater than 1/4.
6.3 An iterated martingale inequality

We present an iterated martingale inequality, which can be used to assess the tail behaviour of the maximum of a family of martingales. It builds on an exponential martingale inequality by Bercu and Touati (2008).

Theorem 6.2 (Bercu and Touati, 2008, Theorem 2.1) For i = 1, ..., n let (m_i, F_i) be a locally square integrable martingale difference. Then, for all x, y > 0,

  P[ |Σ_{i=1}^{n} m_i| ≥ x,  Σ_{i=1}^{n} {m_i² + E(m_i²|F_{i-1})} ≤ y ] ≤ 2 exp{−x²/(2y)}.

In order to bound a family of martingales it is useful to iterate this martingale inequality to get the following iterated martingale inequality.

Theorem 6.3 (Johansen and Nielsen, 2014a, Theorem 5.2) For ℓ = 1, ..., L let z_{ℓ,i} be F_i-adapted so that E z_{ℓ,i}^{2^r̄} < ∞ for some r̄ ∈ N. Let D_r = max_{1≤ℓ≤L} Σ_{i=1}^{n} E(z_{ℓ,i}^{2^r} | F_{i-1}) for 1 ≤ r ≤ r̄. Then, for all λ_0, λ_1, ..., λ_r̄ > 0, it holds that

  P[ max_{1≤ℓ≤L} |Σ_{i=1}^{n} {z_{ℓ,i} − E(z_{ℓ,i}|F_{i-1})}| > λ_0 ]
      ≤ L (E D_r̄)/λ_r̄ + Σ_{r=1}^{r̄} (E D_r)/λ_r + 2L Σ_{r=0}^{r̄−1} exp{−λ_r²/(14 λ_{r+1})}.

Theorem 6.3 contains parameters λ_0, λ_1, ..., λ_r̄, which can be chosen in various ways. We give two examples taken from Theorems 5.3, 5.4 of Johansen and Nielsen (2014a).
The first example is to show that the remainder terms in Theorem 6.1 are uniformly small. In the proof we consider a family of size L = O(n^a), where a > 0 depends on the dimension of the regressor, and seek to prove that the maximum of the family of martingales is of order o_P(n^{1/2}). Choosing the λ_r so that λ_0 = ϑn^{1/2} for a small ϑ > 0 and λ_r²/λ_{r+1} = 28 log n, a result of that type follows.
The second example is to show that the empirical processes (6.1) are tight. In this case the family is of fixed size L, and now the probability that the maximum of the family of martingales is larger than ϑn^{1/2} has to be bounded by a small number for large ϑ. Choosing the λ_r so that λ_0 = ϑn^{1/2} and λ_r²/λ_{r+1} = ϑ, a result of that type follows.

7 Asymptotic results for Huber-skip M-estimators

We consider recent results on the Huber-skip M-estimator as well as for 1-step Huber-skip M-estimators and iterations thereof.

7.1 Huber-skip M-estimators

The Huber-skip M-estimator is the solution to the optimization problem (4.2) with weights (4.3). Since this problem is non-convex we need an additional assumption that bounds the frequency of small regressors. That bound involves a function that is an approximate inverse of a function appearing in the analysis of S-estimators by Davies (1990), see also Chen and Wu (1988). The bound can be satisfied for stationary and non-stationary regressors. The condition is used to prove that the objective function is uniformly bounded below for large values of the parameter, a property that implies existence and tightness of the estimator. For full descriptions of the bound on the regressors and extensions to a wider class of M-estimators, see Johansen and Nielsen (2014b).

Theorem 7.1 (Johansen and Nielsen, 2014b, Theorems 1, 2, 3) Consider the Huber-skip M-estimator defined from (4.2), (4.3). Suppose Assumption 6.1 holds and that the frequency of small regressors is bounded as outlined above. Then any minimizer β̂ of the objective function (4.2) has a measurable version and satisfies

  N^{-1}(β̂ − β) = {γ − 2cf(c)}^{-1} Σ_n^{-1} N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i| ≤ σc) + o_P(1).

If, in addition, the regressors are stationary then

  n^{1/2}(β̂ − β) →^D N{0, Σ^{-1} σ²/τ}.

Theorem 7.1 proves the conjecture (5.1) of Huber (1964) for time series regression. The regularity conditions on the regressors are much weaker than those normally considered in for instance Chen and Wu (1988), Liese and Vajda (1994), Maronna, Martin, and Yohai (2006), Huber and Ronchetti (2009), and Jurečková, Sen, and Picek (2012). Theorem 7.1 extends to non-normal, but symmetric, densities and even to non-symmetric densities and objective functions, by introducing a bias correction.
Theorem 7.1 is proved in three steps. First, it is shown that β̂ is tight, that is N^{-1}(β̂ − β) = O_P(n^{1/2}), through a geometric argument that requires the assumption on the frequency of small regressors. Secondly, it is shown that β̂ is consistent, in the sense that N^{-1}(β̂ − β) = O_P(n^{1/2−δ}) for any δ < 1/4, using the iterated martingale inequality of Theorem 6.3. Finally, the presented expansion of Theorem 7.1 is proved, again using Theorem 6.3.

7.2 1-step Huber-skip M-estimators

The asymptotic theory of the 1-step Huber-skip M-estimator for regression is given in Johansen and Nielsen (2009). The main result is a stochastic expansion of the updated estimation error in terms of a kernel and the original estimation error. It follows from a direct application of Theorem 6.1.

Theorem 7.2 (Johansen and Nielsen, 2009, Corollary 1.2) Consider the 1-step Huber-skip M-estimators β̂^(1), σ̂^(1) defined by (4.7), (4.5) with weights (4.8). Suppose Assumption 6.1 holds and that N^{-1}(β̂^(0) − β) and n^{1/2}(σ̂^(0) − σ) are O_P(1). Then

  N^{-1}(β̂^(1) − β) = γ^{-1} Σ_n^{-1} N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i| ≤ σc) + {2cf(c)/γ} N^{-1}(β̂^(0) − β) + o_P(1),   (7.1)
  n^{1/2}(σ̂^(1) − σ) = (2στ)^{-1} n^{-1/2} Σ_{i=1}^{n} (ε_i² − σ²ς²) 1(|ε_i| ≤ σc) + {λ/(2τ)} n^{1/2}(σ̂^(0) − σ) + o_P(1).   (7.2)

Theorem 7.2 generalises the statement (5.2) for the location problem. Theorem 7.2 shows that the updated regression estimator β̂^(1) only depends on the initial regression estimator β̂^(0) and not on the initial scale estimator σ̂^(0). This is a consequence of the symmetry imposed on the problem. Johansen and Nielsen (2009) also analyze situations where the reference distribution f is non-symmetric and the cut-off is made in a matching non-symmetric way. In that situation both expansions involve the initial estimation uncertainty for β and σ².
We can immediately use Theorem 7.2 for an m-fold iteration of (7.1), (7.2). Results for infinite iterations follow in §7.5.

7.3 Robustified Least Squares

Robustified least squares arises when the initial estimators are the full-sample least squares estimators. We can analyze this 1-step Huber-skip M-estimator using Theorem 7.2. The product moment properties in Assumption 6.1 imply that the initial estimators satisfy

  N^{-1}(β̃ − β) = O_P(1),   n^{1/2}(σ̃² − σ²) = n^{-1/2} Σ_{i=1}^{n} (ε_i² − σ²) + O_P(n^{-1/2}).   (7.3)

Thus, the conditions of Theorem 7.2 are satisfied so that the robustified least squares estimators can be expanded as in (7.1), (7.2). The asymptotic distribution of the estimator for β will depend on the properties of the regressors. For simplicity the regressors are assumed stationary in the following result.

Theorem 7.3 (Johansen and Nielsen, 2009, Corollary 1.4) Consider the 1-step Huber-skip M-estimator defined with the weights (4.8) and where the initial estimators β̃, σ̃² are the full-sample least squares estimators. Suppose Assumption 6.1 holds and that the regressors are stationary. Then

  n^{1/2} ( β̂ − β,  σ̂² − σ² )' →^D N( 0, diag{σ² η_ββ Σ^{-1},  2σ⁴ η_σσ} ),

where, using the coefficients defined in (3.2) and (3.4), the efficiency factors η_ββ, η_σσ are

  η_ββ = γ^{-2} [ τ{1 + 4cf(c)} + {2cf(c)}² ],   η_σσ = (2τ²)^{-1} [ (κ − τ²/γ)(1 + λ) + λ²(ζ − 1)/4 ].   (7.4)

The result generalises the statement (5.3) for the location problem. The efficiency factor η_ββ is plotted as the top curve in Figure 3. A plot of the efficiency for the variance can be found in Johansen and Nielsen (2009, Figure 1.1). Situations with non-stationary regressors are also discussed in that paper.

7.4 Impulse Indicator Saturation

Impulse Indicator Saturation is a second example of a 1-step Huber-skip M-estimator. This requires the choice of sub-samples I_j, each with n_j observations. If the product moment properties of Assumption 6.1 hold for each sub-sample and n_j/n converges to a positive fraction, then the initial estimators satisfy

  N^{-1}(β̂_j − β) = O_P(1),   n_j^{1/2}(σ̂_j² − σ²) = n_j^{-1/2} Σ_{i∈I_j} (ε_i² − σ²) + O_P(n^{-1/2}).   (7.5)

The asymptotic distribution theory will depend on the choice of sub-samples and regressors. For simplicity we only report the split-half case with subsets I_1 = (i ≤ n/2) and I_2 = (i > n/2) and stationary regressors.

Theorem 7.4 (Johansen and Nielsen, 2009, Theorems 1.5, 1.7) Consider the split-half Impulse Indicator Saturation estimator of Algorithm 4.2. Suppose Assumption 6.1 holds with stationary regressors. Recall the efficiency factors η_ββ, η_σσ from (7.4). Then the initial estimators satisfy

  n^{1/2} ( β̂^(0) − β,  (σ̂^(0))² − σ² )' →^D N( 0, diag{σ² η_ββ Σ^{-1},  2σ⁴ η_σσ} ).

Moreover, the updated Impulse Indicator Saturation estimator satisfies

  n^{1/2}(β̂^(1) − β) →^D N(0, σ² η_iis Σ^{-1}),

where

  γ⁴ η_iis = τ{γ + 2cf(c)}[γ + 2cf(c) + 2{2cf(c)}²] + {2cf(c)}⁴.

The efficiency factors η_ββ and η_iis for the split-half case are plotted as the top and the middle curve, respectively, in Figure 3. Johansen and Nielsen (2009) also discuss situations with general index sets I_1, I_2 and where the regressors are non-stationary.

7.5 Iterated 1-step Huber-skip M-estimators

The asymptotic theory of the iterated 1-step Huber-skip M-estimator for regression is given in Johansen and Nielsen (2013). This includes iteration of the robustified least squares estimator and of the Impulse Indicator Saturation estimator with general index sets and general regressors. In each step the asymptotic theory is governed by Theorem 7.2. But what does it take to control the iteration and establish a fixed point result?
We start by showing that the sequence of normalised estimators β̂^(m), σ̂^(m) is tight.

Theorem 7.5 (Johansen and Nielsen, 2013, Theorem 3.3) Consider the iterated 1-step Huber-skip M-estimator in Algorithm 4.1. Suppose Assumption 6.1 holds and that N^{-1}(β̂^(0) − β) and n^{1/2}(σ̂^(0) − σ) are O_P(1). Then

  sup_{0 ≤ m < ∞} |N^{-1}(β̂^(m) − β)| + |n^{1/2}(σ̂^(m) − σ)| = O_P(1).

Theorem 7.5 is proved by showing that the expansions (7.1), (7.2) are contractions. Necessary conditions are that 2cf(c)/γ < 1 and λ/(2τ) < 1. This holds for normal or t-distributed innovations, see Johansen and Nielsen (2013, Theorem 3.6).
In turn, Theorem 7.5 leads to a fixed point result for infinitely iterated estimators.

Theorem 7.6 (Johansen and Nielsen, 2013, Theorem 3.3) Consider the iterated 1-step Huber-skip M-estimator in Algorithm 4.1. Suppose Assumption 6.1 holds and that N^{-1}(β̂^(0) − β) and n^{1/2}(σ̂^(0) − σ) are O_P(1). Then, for all ε, δ > 0 a pair m_0, n_0 > 0 exists so that for all m > m_0 and n > n_0 it holds that

  P{ |N^{-1}(β̂^(m) − β̂*)| + n^{1/2}|σ̂^(m) − σ̂*| > δ } < ε,

where

  N^{-1}(β̂* − β) = {γ − 2cf(c)}^{-1} Σ_n^{-1} N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i| ≤ σc),   (7.6)
  n^{1/2}{(σ̂*)² − σ²} = {2/(2τ − λ)} n^{-1/2} Σ_{i=1}^{n} (ε_i² − σ²τ/γ) 1(|ε_i| ≤ σc).   (7.7)

Recently Cavaliere and Georgiev (2013) made a similar analysis of a sequence of Huber-skip M-estimators for the parameter of a first order autoregression with infinite variance errors and an autoregressive coefficient of unity.
Iterated 1-step Huber-skip M-estimators can be viewed as iteratively reweighted least squares with binary weights. Dollinger and Staudte (1981) gave conditions for convergence of iteratively reweighted least squares for smooth weights. Their argument was cast in terms of influence functions. While Theorem 7.6 is similar in spirit, the employed tightness argument is different because of the binary weights.
An issue of interest in the literature is whether a slow initial convergence rate can be improved upon through iteration. This would open up for using robust estimators converging for instance at an n^{1/3} rate as initial estimator. An example would be the Least Median of Squares estimator of Rousseeuw (1984). Such a result would complement the result of He and Portnoy (1992), who find that the convergence rate cannot be improved in a single step of the iteration, as well as Theorem 8.3 below showing that the Forward Search can improve the rate of a slowly converging initial estimator.

8 Asymptotic results for Huber-skip L-type estimators

The difference between the Huber-skip estimators of the M-type and the L-type is that the former have a fixed cut-off, whereas the latter have a cut-off determined from the order statistics of the absolute residuals. The asymptotic results appear to be the same, but the argument to get there is a bit more convoluted for the L-type estimators because of the quantiles involved. We give an overview of the results for Least Trimmed Squares estimators, 1-step Huber-skip L-estimators as well as the Forward Search.

8.1 Least Trimmed Squares

The Least Trimmed Squares estimator has the same asymptotic expansion as the Huber-skip M-estimator. Víšek (2006a,b,c) proved this for the case of fixed regressors.

Theorem 8.1 (Víšek, 2006c, Theorem 1) Consider the Least Trimmed Squares estimator β̂_LTS defined as the minimizer of (4.6). Suppose Assumption 6.1 holds. Suppose the regressors are fixed and that their empirical distribution can be suitably approximated by a continuous distribution function, see Víšek (2006c) for details. Then

  N^{-1}(β̂_LTS − β) = {γ − 2cf(c)}^{-1} Σ_n^{-1} N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i| ≤ σc) + o_P(1).

8.2 1-step Huber-skip L-estimators

The 1-step Huber-skip L-estimator has the following expansion.

Theorem 8.2 Consider the 1-step Huber-skip L-estimators β̂^(1), σ̂^(1) defined by (4.7), (4.5) with weights (4.9). Suppose Assumption 6.1 holds and that N^{-1}(β̂^(0) − β) is O_P(1). Then

  N^{-1}(β̂^(1) − β) = γ^{-1} Σ_n^{-1} N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i| ≤ σc) + {2cf(c)/γ} N^{-1}(β̂^(0) − β) + o_P(1),   (8.1)
  n^{1/2}(σ̂^(1) − σ) = (2στ)^{-1} n^{-1/2} Σ_{i=1}^{n} (ε_i² − σ²ς²) 1(|ε_i| ≤ σc) + {λ/(2τ)} n^{1/2}(ξ̃_(k)/c − σ) + o_P(1)   (8.2)
       = (2στ)^{-1} n^{-1/2} Σ_{i=1}^{n} (ε_i² − σ²ς²) 1(|ε_i| ≤ σc) + {σλ/(4cf(c)τ)} n^{-1/2} Σ_{i=1}^{n} {γ − 1(|ε_i| ≤ σc)} + o_P(1).   (8.3)

Proof. Equations (8.1), (8.2) follow from Theorem 6.1. Equation (8.3), with its expansion of the quantile ξ̃_(k), follows from Johansen and Nielsen (2014a, Lemma D.11).
Ruppert and Carroll (1980) state a similar result for a related 1-step L-estimator, but omit the details of the proof. It is interesting to note that the expansion of the one-step regression estimator of L-type in (8.1) is the same as for the M-type in (7.1). In contrast, the variance estimators have different expansions. In particular, the L-estimator does not use the initial variance estimator and, consequently, the expansion does not involve uncertainty from the initial estimation.

8.3 Forward Search

The Forward Search is an iterated 1-step Huber-skip L-estimator, where the cut-off changes slightly in each step. We highlight asymptotic expansions for the forward regression estimators β̂^(m) and for the scaled forward residuals ẑ^(m)/σ̂^(m). The results are formulated in terms of embeddings of the time series β̂^(m), σ̂^(m), ẑ^(m) for m = m_0 + 1, ..., n into the space D[0,1] of right continuous functions with limits from the left, for instance,

  β̂_ψ = β̂^(m) for m = integer(nψ) and ψ_0 = m_0/n ≤ ψ ≤ 1, and β̂_ψ = 0 otherwise.

Theorem 8.3 (Johansen and Nielsen, 2014a, Theorems 3.1, 3.2, 3.5) Consider the Forward Search estimator in Algorithm 4.3. Suppose Assumption 6.1 holds and that N^{-1}(β̂^(m_0) − β) is O_P(n^{1/4−η}) for some η > 0. Let 1 > ψ_0 > 0. Write c_ψ = G^{-1}(ψ) and let τ_ψ, ς_ψ² denote the quantities (3.2), (3.3) evaluated at c_ψ. Then it holds that

  sup_{ψ_0 ≤ ψ ≤ 1} | N^{-1}(β̂_ψ − β) − {ψ − 2c_ψ f(c_ψ)}^{-1} Σ_n^{-1} N' Σ_{i=1}^{n} x_i ε_i 1(|ε_i/σ| ≤ c_ψ) | = o_P(1),

  sup_{ψ_0 ≤ ψ ≤ n/(n+1)} | τ_ψ n^{1/2}(σ̂_ψ²/σ² − 1) − n^{-1/2} Σ_{i=1}^{n} {(ε_i/σ)² − ς_ψ²} 1(|ε_i/σ| ≤ c_ψ)
       + (c_ψ² − ς_ψ²) n^{-1/2} Σ_{i=1}^{n} {1(|ε_i/σ| ≤ c_ψ) − ψ} | = o_P(1),

  sup_{ψ_0 ≤ ψ ≤ n/(n+1)} | 2f(c_ψ) n^{1/2}(ẑ_ψ/σ − c_ψ) + n^{-1/2} Σ_{i=1}^{n} {1(|ε_i/σ| ≤ c_ψ) − ψ} | = o_P(1).

The asymptotic variances and covariances are given in Theorem A.1.

The proof uses the theory of weighted and marked empirical processes outlined in §6.2 combined with the theory of quantile processes discussed in Csörgő (1983). A single step of the algorithm was previously analyzed in Johansen and Nielsen (2010).
Comparing Theorem 8.3 with Theorems 7.6, 8.1, we recognise the asymptotic result for the estimator of β. The efficiency relative to the least squares estimator is shown as the bottom curve in Figure 3. The asymptotic expansion for the variance estimator σ̂² is, however, different from the expression for the iterated 1-step Huber-skip M-estimator in Theorem 7.6, reflecting the different handling of the scale. The Bahadur (1966) representation, linking the empirical distribution of the scaled innovations ε_i/σ with their order statistics, ĉ_ψ say, implies that 2f(c_ψ)n^{1/2}(ẑ_ψ/σ − ĉ_ψ) vanishes. Moreover, the minimum deletion residual d̂^(m) = min_{i∉S^(m)} ξ̂_i^(m) has the same asymptotic expansion as ẑ^(m) = ξ̂_((m+1))^(m) after a burn-in period. See Johansen and Nielsen (2014a, Theorem 3.4) for details.
The idea of the Forward Search is to monitor the plot of the sequence of scaled forward residuals. Combining the expansions for σ̂ and ẑ in Theorem 8.3 gives the next result.

Theorem 8.4 (Johansen and Nielsen, 2014a, Theorem 3.3) Consider the Forward Search estimator in Algorithm 4.3. Suppose Assumption 6.1 holds and that N^{-1}(β̂^(m_0) − β) is O_P(n^{1/4−η}) for some η > 0. Let 1 > ψ_0 > 0. Then

  sup_{ψ_0 ≤ ψ ≤ n/(n+1)} | 2f(c_ψ) n^{1/2}( ẑ_ψ/σ̂_ψ − c_ψ ) + Z_n(c_ψ) | = o_P(1),

where the process Z_n(c), given by

  Z_n(c) = {1 − (c² − ς²)c f(c)/τ} n^{-1/2} Σ_{i=1}^{n} {1(|ε_i/σ| ≤ c) − γ} + {c f(c)/τ} n^{-1/2} Σ_{i=1}^{n} {(ε_i/σ)² − ς²} 1(|ε_i/σ| ≤ c),   (8.4)

converges to a Gaussian process Z. The covariance of Z is given in Theorem A.1.

Part II
Gauge as a measure of false detection

We now present some new results for the outlier detection algorithms. Outlier detection algorithms will detect outliers with a positive probability even when in fact there are no outliers. We analyze this in terms of the gauge, which is the expected frequency of falsely detected outliers when, in fact, the data generating process has no outliers. The idea of a gauge originates in the work of Hoover and Perez (1999) and is formally introduced in Hendry and Santos (2010), see also Castle, Doornik and Hendry (2011).
The gauge concept is related to, but also distinct from, the concept of the size of a statistical test, which is the probability of falsely rejecting a true hypothesis. For a statistical test we choose the critical value indirectly from the size we are willing to tolerate. In the same way, for an outlier detection algorithm, we can choose the cut-off for outliers indirectly from the gauge we are willing to tolerate.
The detection algorithms assign binary weights v̂_i to each observation, so that v̂_i = 0 for outliers and v̂_i = 1 otherwise. We define the empirical or sample gauge as the frequency of falsely detected outliers,

  α̂ = n^{-1} Σ_{i=1}^{n} (1 − v̂_i).   (8.5)

In turn, the population gauge is the expected frequency of falsely detected outliers, when in fact the model has no contamination, that is

  E α̂ = E n^{-1} Σ_{i=1}^{n} (1 − v̂_i).

To see how the gauge of an outlier detection algorithm relates to the size of a statistical test, consider an outlier detection algorithm which classifies observations as outliers if the absolute residual |y_i − x_i'β̂|/σ̂ is large for some estimator (β̂, σ̂). That algorithm has sample gauge

  α̂ = n^{-1} Σ_{i=1}^{n} (1 − v̂_i) = n^{-1} Σ_{i=1}^{n} 1(|y_i − x_i'β̂| > σ̂c).   (8.6)

Suppose the parameters β, σ were known so that we could choose β̂, σ̂ as β, σ. Then the population gauge reduces to the size of a test that a single observation is an outlier, that is

  E n^{-1} Σ_{i=1}^{n} 1(|y_i − x_i'β| > σc) = E n^{-1} Σ_{i=1}^{n} 1(|ε_i| > σc) = P(|ε_1| > σc) = α.

In general, the population gauge will, however, be different from the size of such a test because of the estimation error. In §9, §10 we analyze the gauge implicit in the definition of a variety of estimators of type M and L, respectively. Proofs follow in the appendix.
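Computing the sample gauge (8.6) for a given fit is a one-liner; the sketch below is ours, with placeholder arguments.

```python
import numpy as np

def sample_gauge(y, X, beta_hat, sigma_hat, c):
    """Empirical gauge (8.6): fraction of observations flagged as outliers."""
    return np.mean(np.abs(y - X @ beta_hat) > c * sigma_hat)
```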

9 The gauge of Huber-skip M-estimators

Initially a consistency result is given for the gauge of Huber-skip M-estimators and a distribution theory follows. A normal theory arises when the proportion of falsely detected outliers is controlled by fixing the cut-off c as n increases, whereas a Poisson exceedance theory arises when nα is held fixed as n increases.

9.1 Asymptotic analysis of the gauge

We give an asymptotic expansion of the sample gauge of the type (8.6).

Theorem 9.1 Consider a sample gauge α̂ of the form (8.6). Suppose Assumption 6.1 holds and that N^{-1}(β̂ − β), n^{1/2}(σ̂² − σ²) are O_P(1). Then, for fixed c,

  n^{1/2}(α̂ − α) = n^{-1/2} Σ_{i=1}^{n} {1(|ε_i| > σc) − α} + 2cf(c) n^{1/2}(1 − σ̂/σ) + o_P(1).   (9.1)

It follows that E α̂ → α.

Note that convergence in mean is equivalent to convergence in probability since the gauge takes values in the interval [0, 1], see Billingsley (1968, Theorem 5.4).
Theorem 9.1 applies to various Huber-skip M-estimators. For the Huber-skip M-estimator the estimators β̂, σ̂² are the Huber-skip estimator and corresponding variance estimator. For the 1-step Huber-skip M-estimator the estimators β̂, σ̂² are the initial estimators. For Impulse Indicator Saturation or the 1-step Huber-skip M-estimator iterated m times the estimators β̂, σ̂² are the estimators from step m − 1.

9.2 Normal approximations to the gauge

We control the proportion of falsely discovered outliers by fixing the cut-off c. In that case an asymptotically normal distribution theory follows from the expansion in Theorem 9.1. The asymptotic variance is analyzed case by case since the expansion in Theorem 9.1 depends on the variance estimator σ̂².
Huber-skip M-estimator: Theorem 7.1 shows that N^{-1}(β̂ − β) is tight. This is the simplest case to analyse since the variance is assumed known so that σ̂² = σ². Therefore only the first, binomial, term in Theorem 9.1 matters.

Theorem 9.2 Consider the Huber-skip M-estimator β̂ defined from (4.2), (4.3) with known σ, fixed c and sample gauge α̂ = n^{-1} Σ_{i=1}^{n} 1(|y_i − x_i'β̂| > σc). Suppose Assumption 6.1 holds. Then

  n^{1/2}(α̂ − α) →^D N{0, α(1 − α)}.

The robustified least squares estimator: This is the 1-step Huber-skip M-estimator $\hat\beta$ defined in (4.7), (4.8), where the initial estimators $\tilde\beta,\tilde\sigma^{2}$ are the full-sample least squares estimators. The binomial term in Theorem 9.1 is now combined with a term from the initial variance estimator $\tilde\sigma^{2}$.

Theorem 9.3 Consider the robustified least squares estimator $\hat\beta$ defined from (4.7), (4.8), where the initial estimators $\tilde\beta$ and $\tilde\sigma^{2}$ are the full-sample least squares estimators, with fixed $c$ and sample gauge $\tilde\gamma=n^{-1}\sum_{i=1}^{n}1_{(|y_i-x_i'\tilde\beta|>\tilde\sigma c)}$. Suppose Assumption 6.1 holds. Then
$$n^{1/2}(\tilde\gamma-\gamma)\ \overset{D}{\to}\ \mathsf N\big[0,\ \gamma(1-\gamma)-2c\,f(c)\,\mathsf E\{(\varepsilon_1^{2}/\sigma^{2}-1)1_{(|\varepsilon_1|>\sigma c)}\}+\{c\,f(c)\}^{2}\,\mathsf{Var}(\varepsilon_1^{2}/\sigma^{2})\big].$$

For a normal reference distribution the variance in Theorem 9.3 is smaller than the binomial variance in Theorem 9.2 for any choice of $\gamma$. This can be verified by evaluating the variance expression for the normal density, and it is also apparent from Table 1.
The split-half Impulse Indicator Saturation estimator: The estimator is defined in Algorithm 4.2. Initially, the outliers are defined using the indicator $\hat v_i^{(-1)}$ based on the split-sample estimators $\hat\beta_1,\hat\sigma_1^{2}$ and $\hat\beta_2,\hat\sigma_2^{2}$; see (4.10). The outliers are then reassessed using the updated estimators $\hat\beta^{(0)},\hat\sigma^{(0)}$. Thus, the algorithm gives rise to two sample gauges,
$$\hat\gamma^{(-1)}=n^{-1}\sum_{i\in I_1}1_{(|y_i-x_i'\hat\beta_2|>\hat\sigma_2 c)}+n^{-1}\sum_{i\in I_2}1_{(|y_i-x_i'\hat\beta_1|>\hat\sigma_1 c)}, \qquad (9.2)$$
$$\hat\gamma^{(0)}=n^{-1}\sum_{i=1}^{n}1_{(|y_i-x_i'\hat\beta^{(0)}|>\hat\sigma^{(0)} c)}. \qquad (9.3)$$
For simplicity we only report the result for the initial gauge $\hat\gamma^{(-1)}$. The updated gauge $\hat\gamma^{(0)}$ has a different asymptotic variance.

gauge γ                                 0.05    0.01    0.005   0.0025  0.001
c                                       1.960   2.576   2.807   3.023   3.291
sdv for Huber-skip M                    0.218   0.0995  0.0705  0.0499  0.0316
sdv for RLS                             0.146   0.0844  0.0634  0.0467  0.0305
sdv for iterated 1-step Huber-skip M    0.314   0.117   0.0783  0.0534  0.0327

Table 1: Asymptotic standard deviations of the empirical gauge.

Theorem 9.4 Consider the Impulse Indicator Saturation. Suppose Assumption 6.1 holds for each of the sets $I_1, I_2$. Then, for fixed $c$, the initial sample gauge $\hat\gamma^{(-1)}$ has the same asymptotic distribution as the sample gauge for robustified least squares reported in Theorem 9.3.

The iterated 1-step Huber-skip M-estimator: The estimator is defined in Algorithm 4.1. Special cases are the iterated robustified least squares estimator and the Impulse Indicator Saturation. If the algorithm is stopped after $m+1$ steps, the sample gauge is
$$\hat\gamma^{(m)}=n^{-1}\sum_{i=1}^{n}1_{(|y_i-x_i'\hat\beta^{(m)}|>\hat\sigma^{(m)} c)}\quad\text{for } m=0,1,2,\dots$$
Because the estimation errors $N^{-1}(\hat\beta^{(m)}-\beta)$ and $n^{1/2}(\hat\sigma^{(m)}-\sigma)$ are tight by Theorem 7.5, the sequence of sample gauges will also be tight. Theorem 9.1 then generalises as follows.

Theorem 9.5 Consider the iterated 1-step Huber-skip estimator. Suppose Assumption 6.1 holds and that the initial estimators satisfy that $N^{-1}(\hat\beta^{(0)}-\beta)$ and $n^{1/2}(\hat\sigma^{(0)}-\sigma)$ are $O_P(1)$. Then, for fixed $c$, the sequence of sample gauges $\hat\gamma^{(m)}$ satisfies
$$\sup_{0\le m<\infty}|\mathsf E\hat\gamma^{(m)}-\gamma|\to0 \quad\text{as } n\to\infty.$$

A fixed point result can also be derived for the gauge.

Theorem 9.6 Consider the iterated 1-step Huber-skip estimator $\hat\beta^{(\infty)},\hat\sigma^{(\infty)}$, see (7.6) and (7.7), and the weights $\hat v_i^{(\infty)}$ defined from these. Suppose Assumption 6.1 holds and that the initial estimators satisfy that $N^{-1}(\hat\beta^{(0)}-\beta)$ and $n^{1/2}(\hat\sigma^{(0)}-\sigma)$ are $O_P(1)$. Then, for all $\eta,\delta>0$, a pair $n_0,m_0>0$ exists so that for all $n\ge n_0$ and $m\ge m_0$ it holds, for fixed $c$, that
$$\mathsf P(n^{1/2}|\hat\gamma^{(m)}-\hat\gamma^{(\infty)}|>\delta)<\eta,$$
where, for $(\psi,\kappa,\tau)$ from (3.2) and (3.4),
$$n^{1/2}(\hat\gamma^{(\infty)}-\gamma)=n^{-1/2}\sum_{i=1}^{n}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\}-\frac{2c\,f(c)}{2\tau}\,n^{-1/2}\sum_{i=1}^{n}\Big(\frac{\varepsilon_i^{2}}{\sigma^{2}}-\frac{\tau}{\psi}\Big)1_{(|\varepsilon_i|\le\sigma c)}+o_P(1).$$
Moreover, the two sums are asymptotically independent and it holds that
$$n^{1/2}(\hat\gamma^{(\infty)}-\gamma)\ \overset{D}{\to}\ \mathsf N\Big[0,\ \gamma(1-\gamma)+\Big\{\frac{2c\,f(c)}{2\tau}\Big\}^{2}\Big(\kappa-\frac{\tau^{2}}{\psi}\Big)\Big].$$
Table 1 shows the asymptotic standard deviations for the Huber-skip M-estimator, the Robustified Least Squares and the fully iterated 1-step Huber-skip estimators. The latter include iterated Robustified Least Squares and iterated Impulse Indicator Saturation. The results are taken from Theorems 9.2, 9.3 and 9.6, respectively. For gauges of 1% or lower the standard deviations are very similar. If the gauge is chosen as $\gamma=0.05$ and $n=100$, then the sample gauge $\hat\gamma$ will be asymptotically normal with mean $\gamma=0.05$ and a standard deviation of about $0.2/n^{1/2}=0.02$. This suggests that it is not unusual to find up to 8-9 outliers when in fact there are none. Lowering the gauge to $\gamma=0.01$ or $\gamma=0.0025$, the standard deviation is about $0.1/n^{1/2}=0.01$ and $0.05/n^{1/2}=0.005$, respectively, when $n=100$. Thus, it is not unusual to find up to 2-3 and up to 1 outliers, respectively, when in fact there are none. This suggests that the gauge should be chosen rather small, in line with the discussion in Hendry and Doornik (2014, §7.6).
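As a cross-check of the first two rows of Table 1, the standard deviations implied by Theorems 9.2 and 9.3 for normal errors can be computed directly. The following R sketch (not from the paper) uses the facts that, for standard normal errors, $\mathsf E\{(\varepsilon_1^{2}/\sigma^{2}-1)1_{(|\varepsilon_1|>\sigma c)}\}=2c\varphi(c)$ and $\mathsf{Var}(\varepsilon_1^{2}/\sigma^{2})=2$.

```r
## R sketch: asymptotic sdv of the sample gauge for fixed cut-off c,
## assuming standard normal errors (first two rows of Table 1).
gamma <- c(0.05, 0.01, 0.005, 0.0025, 0.001)
c0    <- qnorm(1 - gamma / 2)                 # cut-off with P(|eps| > c) = gamma
v.huber <- gamma * (1 - gamma)                # Theorem 9.2: known variance
v.rls   <- gamma * (1 - gamma) -
  2 * c0 * dnorm(c0) * (2 * c0 * dnorm(c0)) + # - 2cf(c) * E{(e^2 - 1)1(|e| > c)}
  (c0 * dnorm(c0))^2 * 2                      # + {cf(c)}^2 * Var(e^2)
round(rbind(c = c0, huber = sqrt(v.huber), rls = sqrt(v.rls)), 4)
## huber: 0.218 0.0995 0.0705 0.0499 0.0316;  rls: 0.146 0.0844 0.0634 ...
```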

9.3 Poisson approximation to the gauge

If we set the cut-off so as to accept the same fixed number of falsely discovered outliers regardless of the sample size, then a Poisson exceedance theory arises.
The idea is to choose the cut-off $c_n$ so that, for some $\lambda>0$,
$$\mathsf P(|\varepsilon_i|>\sigma c_n)=\lambda/n. \qquad (9.4)$$
The cut-off $c_n$ appears both in the definition of the gauge and in the definition of the estimators, so some care is needed. We build the argument around the 1-step M-estimator. Let $\hat\beta_n$ and $\hat\sigma_n$ be sequences of estimators that may depend on $c_n$, hence the subscript $n$ in the notation for the estimators. Given these estimators, the sample gauge is
$$\hat\gamma_n=n^{-1}\sum_{i=1}^{n}1_{(|y_i-x_i'\hat\beta_n|>\hat\sigma_n c_n)}. \qquad (9.5)$$
In the first result we assume that the estimation errors $N^{-1}(\hat\beta_n-\beta)$ and $n^{1/2}(\hat\sigma_n-\sigma)$ are tight. Thus, the result immediately applies to robustified least squares, where the initial estimators $\hat\beta_n$ and $\hat\sigma_n$ are the full-sample least squares estimators, which do not depend on the cut-off $c_n$. But, in general, we need to check this tightness condition.
Theorem 9.7 Consider the 1-step Huber-skip M-estimator, where $n\mathsf P(|\varepsilon_1|>\sigma c_n)=\lambda$. Suppose Assumption 6.1 holds, and that $N^{-1}(\hat\beta_n-\beta)$ and $n^{1/2}(\hat\sigma_n^{2}-\sigma^{2})$ are $O_P(1)$. Then the sample gauge $\hat\gamma_n$ in (9.5) satisfies
$$n\hat\gamma_n\ \overset{D}{\to}\ \mathrm{Poisson}(\lambda).$$
We next discuss this result for particular initial estimators.
Robustified least squares estimator: The initial estimators $\tilde\beta$ and $\tilde\sigma^{2}$ are the full-sample least squares estimators. These do not depend on $c_n$, so Theorem 9.7 trivially applies.

Theorem 9.8 Consider the robustified least squares estimator $\hat\beta$ defined from (4.7), (4.8), where the initial estimators $\tilde\beta$ and $\tilde\sigma^{2}$ are the full-sample least squares estimators, while $c_n$ is defined from (9.4). Suppose Assumption 6.1 holds. Then the sample gauge $\tilde\gamma_n=n^{-1}\sum_{i=1}^{n}1_{(|y_i-x_i'\tilde\beta|>\tilde\sigma c_n)}$ satisfies
$$n\tilde\gamma_n\ \overset{D}{\to}\ \mathrm{Poisson}(\lambda).$$

                                        P(at most x outliers), x =
λ       c_n (n=100)   c_n (n=200)      0     1     2     3     4     5
5       1.960         2.241            0.01  0.04  0.12  0.27  0.44  0.62
1       2.576         2.807            0.37  0.74  0.92  0.98  1.00
0.5     2.807         3.023            0.61  0.91  0.98  1.00
0.25    3.023         3.227            0.78  0.97  1.00
0.1     3.291         3.481            0.90  1.00

Table 2: Poisson approximations to the probability of finding at most x outliers for a given λ. The implied cut-off c_n = Φ^{-1}{1 - λ/(2n)} is shown for n = 100 and n = 200.

Impulse Indicator Saturation: Let $\hat\beta_j$ and $\hat\sigma_j^{2}$ be the split-sample least squares estimators. These do not depend on $c_n$, so Theorem 9.7 trivially applies for the split-sample gauge based on
$$1-\hat v_{i,n}^{(-1)}=1_{(i\in I_1)}1_{(|y_i-x_i'\hat\beta_2|>\hat\sigma_2 c_n)}+1_{(i\in I_2)}1_{(|y_i-x_i'\hat\beta_1|>\hat\sigma_1 c_n)}.$$
The updated estimators $\hat\beta_n^{(0)}$ and $(\hat\sigma_n^{(0)})^{2}$ do, however, depend on the cut-off. Thus, an additional argument is needed when considering the gauge based on the combined initial estimator, as in
$$1-\hat v_{i,n}^{(0)}=1_{(|y_i-x_i'\hat\beta_n^{(0)}|>\hat\sigma_n^{(0)} c_n)}.$$

Theorem 9.9 Consider the Impulse Indicator Saturation Algorithm 4.2. Let $c_n$ be defined from (9.4). Suppose Assumption 6.1 holds for each of the sets $I_1, I_2$. Let the estimators $\hat\beta_n^{(0)}$ and $(\hat\sigma_n^{(0)})^{2}$ be defined from (4.7), (4.5), replacing $v_i$ by $\hat v_{i,n}^{(-1)}$. Then $N^{-1}(\hat\beta_n^{(0)}-\beta)$ and $n^{1/2}(\hat\sigma_n^{(0)}-\sigma)$ are $O_P(1)$, and
$$n\hat\gamma_n^{(m)}=\sum_{i=1}^{n}(1-\hat v_{i,n}^{(m)})\ \overset{D}{\to}\ \mathrm{Poisson}(\lambda)\quad\text{for }m=-1,0.$$

Table 2 shows the Poisson approximation to the probability of finding at most $x$ outliers for different values of $\lambda$. For small $\lambda$ and $n$ this approximation is possibly more accurate than the normal approximation, although that would have to be investigated in a detailed simulation study. The Poisson distribution is right skewed, so the probability of finding at most $x=\lambda$ outliers increases from 62% to 90% as $\lambda$ decreases from 5 to 0.1. In particular, for $\lambda=1$ and $n=100$, so that the cut-off is $c_n=2.58$, the probability of finding at most one outlier is 74% and the probability of finding at most two outliers is 92%. In other words, the chance of finding 3 or more outliers is small when in fact there are none.
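The cut-offs and probabilities in Table 2 are straightforward to reproduce; a minimal R sketch, assuming normal errors so that the cut-off solves $2\{1-\Phi(c_n)\}=\lambda/n$:

```r
## R sketch: cut-offs c_n with P(|eps_i| > sigma*c_n) = lambda/n for normal
## errors, and Poisson probabilities of at most x falsely detected outliers.
lambda <- c(5, 1, 0.5, 0.25, 0.1)
cn100  <- qnorm(1 - lambda / (2 * 100))                   # cut-off for n = 100
cn200  <- qnorm(1 - lambda / (2 * 200))                   # cut-off for n = 200
p.atmost <- t(sapply(lambda, function(l) ppois(0:5, l)))  # P(X <= x), x = 0,...,5
round(cbind(lambda, cn100, cn200, p.atmost), 2)           # reproduces Table 2
```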

10 The gauge of Huber-skip L-type estimators

We now consider the gauge for the L-type estimators. The results and their consequences are somewhat different from the results for M-type estimators. For the Least Trimmed Squares estimator the gauge is trivially $\hat\gamma=\gamma$, because the purpose of the estimator is to keep the trimming proportion fixed. For the Forward Search the idea is to stop the algorithm once the forward residuals $\hat z_{(m)}/\hat\sigma_{(m)}$ become too big. We develop a stopping rule from the gauge.

10.1 Gauge for the Forward Search

The forward plot of forward residuals consists of the scaled forward residuals $\hat z_{(m)}/\hat\sigma_{(m)}$ for $m=m_0,\dots,n-1$. Along with this we plot pointwise confidence bands derived from Theorem 8.4. Suppose we define some stopping time $\hat m$ based on this information, so that $\hat m$ is the number of non-outlying observations while $n-\hat m$ is the number of outliers. This stopping time can then be calibrated in terms of the sample gauge (8.5), which simplifies as
$$\hat\gamma=\frac{n-\hat m}{n}=\frac{1}{n}\sum_{m=m_0}^{n-1}(n-m)1_{(\hat m=m)}.$$
Rewrite this by substituting $n-m=\sum_{j=m}^{n-1}1$ and changing the order of summation to get
$$\hat\gamma=\frac{1}{n}\sum_{j=m_0}^{n-1}1_{(\hat m\le j)}. \qquad (10.1)$$
If the stopping time is an exit time, then the event $(\hat m\le j)$ is true if $\hat z_{(m)}/\hat\sigma_{(m)}$ has exited at the latest by $m=j$.
An example of a stopping time is the following. Theorem 8.4 shows that
$$\tilde Z_n(c_\psi)=2f(c_\psi)\,n^{1/2}\Big(\frac{\hat z_\psi}{\hat\sigma_\psi}-c_\psi\Big)=Z_n(c_\psi)+o_P(1) \qquad (10.2)$$
uniformly in $\psi_0\le\psi\le n/(n+1)$, where $Z_n$ converges to a Gaussian process $Z$. We now choose the stopping time as the first time, greater than or equal to $m_1$ ($\ge m_0$), at which $\hat z_{(m)}/\hat\sigma_{(m)}$ exceeds some constant level $q$ times its pointwise asymptotic standard deviation, that is,
$$\hat m=\min\big[m_1\le m<n:\ \tilde Z_n(c_{m/n})>q\,\mathrm{sdv}\{Z(c_{m/n})\}\big]. \qquad (10.3)$$
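In code, the stopping rule is a simple first-exit computation. The following R sketch assumes a monitored path of $\tilde Z_n(c_{m/n})$ values and the corresponding pointwise asymptotic standard deviations have already been computed; the names are illustrative, not the paper's.

```r
## Sketch of the stopping rule (10.3): given the monitored path Ztilde over
## m = m1, ..., n-1, its pointwise asymptotic sdv, and a level q (e.g. from
## Table 3), return the implied number of outliers n - m.hat.
n.outliers <- function(Ztilde, sdvZ, q, m1, n) {
  exceed <- Ztilde > q * sdvZ
  m.hat  <- if (any(exceed)) (m1:(n - 1))[which(exceed)[1]] else n  # first exit
  n - m.hat
}
```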
To analyse the stopping time (10.3) we consider the event $(\hat m\le j)$. This event satisfies
$$(\hat m\le j)=\Big[\max_{m_1\le m\le j}\frac{\tilde Z_n(c_{m/n})}{\mathrm{sdv}\{Z(c_{m/n})\}}>q\Big].$$
Inserting this expression into (10.1) and then using the expansion (10.2), we arrive at the following result, with details given in the appendix.

Theorem 10.1 Consider the Forward Search. Suppose Assumption 6.1 holds. Let $m_0=\mathrm{int}(\psi_0 n)$ and $m_1=\mathrm{int}(\psi_1 n)$ for some $\psi_1\ge\psi_0>0$. Consider the stopping time $\hat m$ in (10.3) for some level $q\ge0$. Then
$$\mathsf E\hat\gamma=\mathsf E\,\frac{n-\hat m}{n}\ \to\ \gamma=\int_{\psi_1}^{1}\mathsf P\Big[\sup_{\psi_1\le\psi\le u}\frac{Z(c_\psi)}{\mathrm{sdv}\{Z(c_\psi)\}}>q\Big]du.$$
If $\psi_1>\psi_0$, the same limit holds for the Forward Search when replacing $\hat z_{(m)}$ by the deletion residual $\hat d_{(m)}$ in the definition of $\hat m$ in (10.3).

γ \ ψ1    0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90
0.10      2.50   2.43   2.28   2.14   1.99   1.81   1.60   1.31   0.82   -
0.05      2.77   2.71   2.58   2.46   2.33   2.19   2.02   1.79   1.45   0.69
0.01      3.30   3.24   3.14   3.04   2.94   2.83   2.71   2.55   2.33   1.91
0.005     3.49   3.44   3.35   3.26   3.15   3.04   2.95   2.81   2.62   2.26
0.001     3.90   3.85   3.77   3.69   3.62   3.53   3.43   3.32   3.18   2.92

Table 3: Cut-off values q for the Forward Search as a function of the gauge γ and the lower point ψ1 of the range for the stopping time.

The integral in Theorem 10.1 cannot be computed analytically in an obvious way. Instead we simulated it using Ox 7; see Doornik (2007). For a given $n$, draws of normal $\varepsilon_i$ can be made. From this, the process $Z_n$ in (8.4) can be computed. The maximum of $Z_n(c_{m/n})/\mathrm{sdv}\{Z(c_{m/n})\}$ over $m_1\le m\le j$ can then be computed for any $m_1\le j\le n$. Repeating this $n_{rep}$ times, the probability appearing as the integrand can be estimated for a given value of $q$ and $u$. From this the integral can be computed. This expresses $\gamma=\gamma(\psi_1,q)$ as a function of $q$ and $\psi_1$. Inverting this for fixed $\psi_1$ expresses $q=q(\psi_1,\gamma)$ as a function of $\gamma$ and $\psi_1$. Results are reported in Table 3 for $n_{rep}=10^{5}$ and $n=1600$.
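The simulation design can be illustrated in R under strong simplifying assumptions: the sketch below (not the paper's Ox program) treats $\beta$ and $\sigma$ as known, so the forward residual at step $m$ is just an order statistic of $|\varepsilon_i|$ and the monitored ratio reduces to a quantile-process approximation with pointwise standard deviation $\{\psi(1-\psi)\}^{1/2}$; estimation effects and the boundary behaviour for $\psi$ near one are ignored, so the resulting $q$ will only roughly track Table 3.

```r
## Illustrative R sketch of the Monte Carlo for q given the gauge and psi1,
## assuming known parameters (a simplification relative to Theorem 8.4).
set.seed(42)
n <- 1600; nrep <- 1000; psi1 <- 0.80; m1 <- floor(psi1 * n)
sup.path <- replicate(nrep, {
  aeps <- sort(abs(rnorm(n)))                  # ordered absolute errors
  m    <- m1:(n - 1)
  psi  <- m / n
  cpsi <- qnorm((1 + psi) / 2)                 # population quantile of |eps|
  Zn   <- 2 * dnorm(cpsi) * sqrt(n) * (aeps[m + 1] - cpsi)
  cummax(Zn / sqrt(psi * (1 - psi)))           # running supremum over m1,...,j
})
## gauge(q) approximates the integral of P(sup > q) over u in (psi1, 1)
gauge <- function(q) mean(sup.path > q) * (1 - psi1)
uniroot(function(q) gauge(q) - 0.01, c(1, 5))$root   # compare with Table 3
```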

11 Application to fish data

11.1 Impulse Indicator Saturation

The Impulse Indicator Saturation of Algorithm 4.2 is an iterative procedure. Assuming the innovations are normal, cut-offs can be chosen according to a standard normal distribution. For a finite iteration, where the number of steps is chosen a priori, this follows from Theorem 9.1. For an infinite iteration, this follows from Theorem 9.6. Thus, the cut-off is 2.58 for a 1% gauge. When applying the procedure we split the sample into the first and last half.
The estimated model for the first sample half is
$$\text{(1st half)}\qquad \hat q_t = \underset{(1.1)}{6.5} + \underset{(0.12)}{0.26}\,q_{t-1} - \underset{(0.18)}{0.51}\,S_t, \qquad \hat\sigma=0.66,\quad t=2,\dots,56.$$
The preliminary second-half outliers are observations 95, 108, 68, 75, 94, with absolute scaled residuals 4.66, 3.11, 2.85, 2.74, 2.66. The estimated model for the second sample half is
$$\text{(2nd half)}\qquad \hat q_t = \underset{(1.2)}{7.5} + \underset{(0.14)}{0.13}\,q_{t-1} - \underset{(0.30)}{0.21}\,S_t, \qquad \hat\sigma=0.77,\quad t=57,\dots,111.$$
The preliminary first-half outliers are observations 18 and 34, with absolute scaled residuals 3.78, 2.95.
In step $m=0$ we estimate a model with dummies for the preliminary outliers and get the full-sample model
$$\hat q_t^{(0)} = \underset{[-3.16]}{\underset{(0.60)}{-1.98}}\,D_t^{18}\ \underset{[-2.93]}{\underset{(0.61)}{-1.80}}\,D_t^{34}\ \underset{[-2.10]}{\underset{(0.60)}{-1.26}}\,D_t^{68}\ \underset{[-2.23]}{\underset{(0.60)}{-1.34}}\,D_t^{75}\ \underset{[-2.25]}{\underset{(0.60)}{-1.35}}\,D_t^{94}\ \underset{[-3.96]}{\underset{(0.61)}{-2.40}}\,D_t^{95}\ \underset{[-2.61]}{\underset{(0.60)}{-1.56}}\,D_t^{108}$$
$$\qquad +\ \underset{(0.7)}{7.8} + \underset{(0.08)}{0.11}\,q_{t-1} - \underset{(0.13)}{0.41}\,S_t, \qquad \hat\sigma=0.60,$$
with standard errors in parentheses and standardised coefficients reported in square brackets. The observations 18, 34, 95 and 108 remain outliers. All residuals for observations without indicators are now smaller than the cut-off value. Thus we conclude that the observations 18, 34, 95 and 108 are outliers.
In step $m=1$ we get the Impulse Indicator Saturation model
$$\hat q_t^{(1)} = \underset{[-3.10]}{\underset{(0.63)}{-1.96}}\,D_t^{18}\ \underset{[-2.81]}{\underset{(0.65)}{-1.82}}\,D_t^{34}\ \underset{[-3.76]}{\underset{(0.64)}{-2.40}}\,D_t^{95}\ \underset{[-2.44]}{\underset{(0.63)}{-1.55}}\,D_t^{108}\ +\ \underset{(0.7)}{7.9} + \underset{(0.08)}{0.09}\,q_{t-1} - \underset{(0.14)}{0.39}\,S_t, \qquad \hat\sigma=0.63.$$
The observations 18, 34 and 95 remain outliers, while all other residuals are small.
In step $m=2$ the estimated model is identical to the model (2.2). In that model the observations 18, 34 and 95 remain outliers, while all other residuals are smaller than the cut-off. Thus, the algorithm has reached a fixed point.
If the gauge is chosen as 0.5% or 0.25%, so that the cut-off is 2.81 or 3.02, respectively, the algorithm will converge to a solution taking observations 18 and 95, or only observation 95, as outliers, respectively.
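A schematic R sketch of the split-half Impulse Indicator Saturation loop used above is given below. It is not the paper's implementation: the function name, the use of dummy t-values for reassessing flagged observations, and all variable names are illustrative assumptions, and collinearity between dummies and regressors is not handled.

```r
## Schematic R sketch of split-half Impulse Indicator Saturation (Algorithm 4.2),
## assuming a response y and a regressor matrix X that already contains an
## intercept column; cval is the cut-off, e.g. 2.58 for a 1% gauge.
iis <- function(y, X, cval = 2.58, max.iter = 20) {
  n  <- length(y)
  I1 <- 1:floor(n / 2)
  I2 <- setdiff(1:n, I1)
  scaled.res <- function(use, eval) {                   # residuals of 'eval'
    b <- qr.coef(qr(X[use, , drop = FALSE]), y[use])    # from the 'use' fit,
    e <- y[use] - X[use, , drop = FALSE] %*% b          # scaled by sigma-hat
    s <- sqrt(sum(e^2) / (length(use) - ncol(X)))
    as.vector(y[eval] - X[eval, , drop = FALSE] %*% b) / s
  }
  ## step -1: classify each half using the other half's estimates
  outl <- sort(c(I1[abs(scaled.res(I2, I1)) > cval],
                 I2[abs(scaled.res(I1, I2)) > cval]))
  ## steps 0, 1, ...: refit with impulse dummies, keep dummies whose t-values
  ## exceed cval, add observations whose scaled residuals exceed cval
  for (k in 1:max.iter) {
    D    <- diag(n)[, outl, drop = FALSE]
    fit  <- if (length(outl) > 0) lm(y ~ 0 + X + D) else lm(y ~ 0 + X)
    s    <- summary(fit)$sigma
    tval <- summary(fit)$coefficients[, "t value"]
    keep <- if (length(outl) > 0) outl[abs(tail(tval, length(outl))) > cval] else integer(0)
    new  <- sort(union(keep, setdiff(which(abs(residuals(fit) / s) > cval), outl)))
    if (setequal(new, outl)) break                      # fixed point reached
    outl <- new
  }
  sort(outl)
}
```

With y taken as $q_t$, X as the columns $(1, q_{t-1}, S_t)$ and cval = 2.58, a loop of this form should mimic the sequence of outlier sets described above.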

11.2 Forward Search

We need to choose the initial estimator, the fractions $\psi_0,\psi_1$ and the gauge. As initial estimator we chose the fast LTS estimator of Rousseeuw and van Driessen (1998) as implemented in the ltsReg function of the R package robustbase. We chose to use it with breakdown point $1-\psi_0$. There is no asymptotic analysis of this estimator. It is meant to be an approximation to the Least Trimmed Squares estimator, for which we have Theorem 8.1 based on Víšek (2006c). That result requires fixed regressors. Nonetheless, we apply it to the fish data, where the two regressors are the lagged dependent variable and the binary variable $S_t$, which is an indicator for stormy weather. We choose $\psi_0=\psi_1$ as either 0.95 or 0.8.
Figure 4 shows the forward plots of the scaled forward residuals $\hat z_{(m+1)}/(\varsigma_{m/n}\,\hat\sigma_{(m+1)})$, where the scaling factor is chosen in line with Atkinson, Riani and Cerioli (2010). Consider panel (a), where $\psi_0=\psi_1=0.95$. Choose the gauge as, for instance, $\gamma=0.01$, in which case we need to consider the third exit band from the top. This is exceeded for $\hat m=107$, pointing at $n-\hat m=3$ outliers. These are the three holiday observations 18, 34 and 95 discussed in §2. If the gauge is set to $\gamma=0.001$ we find no outliers. If the gauge is set to $\gamma=0.05$ we find $\hat m=104$, pointing at $n-\hat m=6$ outliers, which is 5% of the observations.
Consider now panel (b), where $\psi_0=\psi_1=0.80$. With a gauge of $\gamma=0.01$ we find $\hat m=96$, pointing at $n-\hat m=14$ outliers. These include the three holiday observations along with 11 other observations. This leaves some uncertainty about the best choice of the number of outliers. The present analysis is based on asymptotics and could be distorted in finite samples.
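The initial estimator step can be sketched in R as follows, assuming a data frame with the log quantity and the storm indicator; the data frame and column names are illustrative, not the paper's, and the scale component name follows the robustbase documentation as an assumption.

```r
## Sketch of the initial fast LTS fit with breakdown point 1 - psi0,
## assuming a data frame 'fish' with columns q (log quantity) and S (storm).
library(robustbase)
psi0 <- 0.95
fish$q1 <- c(NA, head(fish$q, -1))                # lagged dependent variable
fit0 <- ltsReg(q ~ q1 + S, data = na.omit(fish), alpha = psi0)
coef(fit0)                                        # initial regression estimates
fit0$scale                                        # initial scale (component name assumed)
## the Forward Search then grows the subset from this initial fit, monitoring
## the scaled forward residuals against exit bands as in Figure 4
```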

[Figure 4 about here. Panels: (a) ψ0 = ψ1 = 0.95, (b) ψ0 = ψ1 = 0.80; vertical axis: forward residual.]

Figure 4: Forward plots of forward residuals for the fish data. Here ψ0 = ψ1 is chosen either as 0.95 or 0.80. The bottom curve shows the pointwise median. The top curves show the exit bands for gauges chosen as, from the top, 0.001, 0.005, 0.01, 0.05, respectively. Panel (b) also includes an exit band for a gauge of 0.10.

12 Conclusion and further work

The results presented concern the asymptotic properties of a variety of Huber-skip estimators in the situation where there are no outliers, and the reference distribution is symmetric if not normal. Combined with the concept of the gauge, these results are used for calibrating the cut-off values of the estimators.
In further research we will look at situations where there actually are outliers. Various configurations of outliers will be of interest: single outliers, clusters of outliers, level shifts, symmetric and non-symmetric outliers. The probability of finding particular outliers is called potency in Hendry and Santos (2010). It will then be possible to compare the potency of two different outlier detection algorithms that are calibrated to have the same gauge.
The approach presented is different from the traditional approaches of robust statistics. It would be of interest to compare the approach with the traditional idea of analysing robust estimators in terms of their breakdown point, see Hampel (1971), or the influence function, see Hampel, Ronchetti, Rousseeuw and Stahel (1986) or Maronna, Martin and Yohai (2006). First order asymptotic theory is known to be fragile in some situations. A comprehensive simulation study of the results presented would therefore be useful, possibly building on Atkinson and Riani (2006) and Hendry and Doornik (2014).
It would be of interest to extend this research to variable selection algorithms such as Autometrics, see Hendry and Doornik (2014). The Impulse Indicator Saturation is a stylized version of Autometrics. It should work well if the researcher can identify a part of the data that is free from outliers. If this is not the case, one will have to iterate over the choice of sub-samples. In Autometrics, potential outliers are coded as dummy variables and the algorithm then searches over these dummy variables along with the other regressors.

A Proofs

For the asymptotic normality results for the gauge some covariance matrices have to be computed. The results are collected in the following theorem.

Theorem A.1 Suppose Assumption 6.1 holds and that $c=G(\psi)$. Then the processes
$$A_n(c)=n^{-1/2}\sum_{i=1}^{n}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\},\qquad
B_n(c)=n^{-1/2}\sum_{i=1}^{n}\Big(\frac{\varepsilon_i^{2}}{\sigma^{2}}-\frac{\tau}{\psi}\Big)1_{(|\varepsilon_i|\le\sigma c)},$$
$$C_n(c)=n^{-1/2}\sum_{i=1}^{n}\Big(\frac{\varepsilon_i^{2}}{\sigma^{2}}-1\Big),\qquad
K_n(c)=N'\sum_{i=1}^{n}x_i\varepsilon_i 1_{(|\varepsilon_i/\sigma|\le c)}$$
converge to continuous limits $A, B, C, K$ on $D[0,1]$ endowed with the uniform metric. The processes $A_n, B_n, C_n$ have Gaussian limits with covariance matrix
$$\mathsf{Var}\begin{pmatrix}A_n(c)\\ B_n(c)\\ C_n(c)\end{pmatrix}=\begin{pmatrix}\gamma(1-\gamma)&0&\psi-\tau\\ 0&\kappa-\tau^{2}/\psi&\kappa-\tau^{2}/\psi\\ \psi-\tau&\kappa-\tau^{2}/\psi&2\end{pmatrix}. \qquad (A.1)$$
If the regressors are stationary, then $K$ is Gaussian, independent of $A, B, C$, with variance $\sigma^{2}\tau$ times the limit of $N'\sum_{i=1}^{n}x_ix_i'N$.
It follows that the asymptotic variance in Theorem 8.3 is given by
$$\mathrm{asVar}\begin{pmatrix}2f(c_\psi)\,n^{1/2}(\hat z_\psi/\sigma-c_\psi)\\ n^{1/2}(\hat\sigma_\psi^{2}/\sigma^{2}-1)\end{pmatrix}
=\mathsf{Var}\begin{pmatrix}A_n(c_\psi)\\ B_n(c_\psi)/\tau+(c_\psi^{2}/\tau-1/\psi)A_n(c_\psi)\end{pmatrix}
=\omega_1'\,\Sigma\,\omega_1$$
for
$$\omega_1'=\begin{pmatrix}1&0&0\\ c_\psi^{2}/\tau-1/\psi&1/\tau&0\end{pmatrix},$$
where $\Sigma$ denotes the covariance matrix in (A.1). The asymptotic variance in Theorem 8.4 is given by
$$\mathrm{asVar}\{2f(c_\psi)\,n^{1/2}(\hat z_\psi/\hat\sigma_\psi-c_\psi)\}
=\mathsf{Var}\Big[\Big\{1-\frac{c_\psi f(c_\psi)}{\tau}\Big(c_\psi^{2}-\frac{1}{\psi}\Big)\Big\}A_n(c_\psi)+\frac{c_\psi f(c_\psi)}{\tau}B_n(c_\psi)\Big]=\omega_2'\,\Sigma\,\omega_2,$$
where
$$\omega_2'=\Big(1-\frac{c_\psi f(c_\psi)}{\tau}\Big(c_\psi^{2}-\frac{1}{\psi}\Big),\ \frac{c_\psi f(c_\psi)}{\tau},\ 0\Big).$$

Proof of Theorem 9.1. Apply the asymptotic expansion in Theorem 6.1.

Proof of Theorem 9.2. Insert $\hat\sigma^{2}=\sigma^{2}$ in the expansion in Theorem 9.1 and apply Theorem A.1 to the binomial term.

Proof of Theorem 9.3. The initial estimators satisfy $N^{-1}(\tilde\beta-\beta)=O_P(1)$ and $n^{1/2}(\tilde\sigma^{2}-\sigma^{2})=n^{-1/2}\sum_{i=1}^{n}(\varepsilon_i^{2}-\sigma^{2})+o_P(1)$; see (7.3). Use Theorem 9.1 to get
$$n^{1/2}(\tilde\gamma-\gamma)=n^{-1/2}\sum_{i=1}^{n}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\}-\frac{c\,f(c)}{\sigma^{2}}\,n^{-1/2}\sum_{i=1}^{n}(\varepsilon_i^{2}-\sigma^{2})+o_P(1), \qquad (A.2)$$
and apply Theorem A.1.

Proof of Theorem 9.4. The initial estimators satisfy $N_j^{-1}(\hat\beta_j-\beta)=O_P(1)$ and $n_j^{1/2}(\hat\sigma_j^{2}-\sigma^{2})=n_j^{-1/2}\sum_{i\in I_j}(\varepsilon_i^{2}-\sigma^{2})+o_P(1)$; see (7.5). Insert this in the expansion in Theorem 9.1 to get
$$n_j^{-1/2}\sum_{i\in I_j}\{1_{(|y_i-x_i'\hat\beta_{3-j}|>\hat\sigma_{3-j}c)}-\gamma\}
=n_j^{-1/2}\sum_{i\in I_j}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\}-\frac{c\,f(c)}{\sigma^{2}}\,n_j^{-1/2}\sum_{i\in I_j}(\varepsilon_i^{2}-\sigma^{2})+o_P(1).$$
Combine the counts of outliers for the two sub-samples to get
$$n^{1/2}(\hat\gamma^{(-1)}-\gamma)=n^{-1/2}\sum_{j=1}^{2}\sum_{i\in I_j}\{1_{(|y_i-x_i'\hat\beta_{3-j}|>\hat\sigma_{3-j}c)}-\gamma\}
=n^{-1/2}\sum_{j=1}^{2}\sum_{i\in I_j}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\}-\frac{c\,f(c)}{\sigma^{2}}\,n^{-1/2}\sum_{j=1}^{2}\sum_{i\in I_j}(\varepsilon_i^{2}-\sigma^{2})+o_P(1).$$
This reduces to the expansion (A.2) for the robustified least squares estimator.

Proof of Theorem 9.5. Theorem 7.5 shows that the normalised estimators are tight. Thus, for all $\epsilon>0$ there exists a $U>0$ so that the set
$$\mathcal A_n=\bigcap_{m=0}^{\infty}\big\{|N^{-1}(\hat\beta^{(m)}-\beta)|+|n^{1/2}\{(\hat\sigma^{(m)})^{2}-\sigma^{2}\}|\le U\big\} \qquad (A.3)$$
has probability of at least $1-\epsilon$. Theorem 6.1 then shows that on that set
$$\hat\gamma^{(m)}-\gamma=\frac{1}{n}\sum_{i=1}^{n}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\}-2c\,f(c)\Big(\frac{\hat\sigma^{(m)}}{\sigma}-1\Big)+o_P(n^{-1/2}), \qquad (A.4)$$
where, uniformly in $m$, the first and the second term are $O_P(n^{-1/2})$, while the remainder term is $o_P(n^{-1/2})$. Therefore
$$\sup_{0\le m<\infty}|\hat\gamma^{(m)}-\gamma|=o_P(1).$$
Since $\hat\gamma^{(m)}$ and $\gamma$ are bounded by one, $\mathsf E\sup_{0\le m<\infty}|\hat\gamma^{(m)}-\gamma|$ vanishes as $n\to\infty$. Thus, by the triangle inequality,
$$\sup_{0\le m<\infty}|\mathsf E\hat\gamma^{(m)}-\gamma|\le\sup_{0\le m<\infty}\mathsf E|\hat\gamma^{(m)}-\gamma|\le\mathsf E\sup_{0\le m<\infty}|\hat\gamma^{(m)}-\gamma|=o(1).$$

Proof of Theorem 9.6. On the set $\mathcal A_n$ defined in the proof of Theorem 9.5, see (A.3), we consider the expansion (A.4), that is,
$$n^{1/2}(\hat\gamma^{(m)}-\gamma)=n^{-1/2}\sum_{i=1}^{n}\{1_{(|\varepsilon_i|>\sigma c)}-\gamma\}-2c\,f(c)\,n^{1/2}(\hat\sigma^{(m)}/\sigma-1)+o_P(1),$$
where the remainder is uniform in $m$. Theorem 7.6 shows that for large $m, n$ we have
$$n^{1/2}(\hat\sigma^{(m)}/\sigma-1)=\frac{1}{2\tau}\,n^{-1/2}\sum_{i=1}^{n}\Big(\frac{\varepsilon_i^{2}}{\sigma^{2}}-\frac{\tau}{\psi}\Big)1_{(|\varepsilon_i|\le\sigma c)}+o_P(1),$$
where the remainder is uniform in $m$. Combining gives the desired expansion. The asymptotic normality follows from Theorem A.1.

Theorem 9.7 is a special case of the following lemma, in view of Remarks A.1 and A.2 below, because Assumption 6.1 assumes Gaussian errors.

Lemma A.2 Suppose Assumption 6.1(ii,d) holds. Let the cut-off $c_n$ be given by (9.4) and assume that
(i) the density $f$ is symmetric with decreasing tails and support on $\mathbb R$, so that $c_n\to\infty$, with
(a) $\mathsf E|\varepsilon_i|^{r}<\infty$ for some $r>4$;
(b) $f(c_n)/[c_n\{1-\mathsf F(c_n)\}]=O(1)$;
(c) $f(c_n-n^{-1/4}A)/f(c_n)=O(1)$ for all $A>0$;
(ii) $N^{-1}(\hat\beta-\beta)$ and $n^{1/2}(\hat\sigma^{2}-\sigma^{2})$ are $O_P(1)$.
Then the sample gauge $\hat\gamma$ in (8.6) satisfies
$$n\hat\gamma\ \overset{D}{\to}\ \mathrm{Poisson}(\lambda).$$

Remark A.1 Assumption (i.a) implies that $c_n=O(n^{1/r})$, where $1/r<1/4$. Combine the definition $\mathsf P(|\varepsilon_i|>\sigma c_n)=\lambda/n$ with the Markov inequality $\mathsf P(|\varepsilon_i|>\sigma c_n)\le(\sigma c_n)^{-r}\mathsf E|\varepsilon_i|^{r}$, so that $c_n\le(\mathsf E|\varepsilon_i|^{r})^{1/r}\sigma^{-1}\lambda^{-1/r}n^{1/r}=O(n^{1/r})$.

Remark A.2 Assumption (i) of Lemma A.2 holds if $f=\varphi$ is standard normal. For (b) use the Mill's ratio result $\{(4+c^{2})^{1/2}-c\}/2<\{1-\Phi(c)\}/\varphi(c)$; see Sampford (1953). For (c) note that $2\log\{\varphi(c_n-n^{-1/4}A)/\varphi(c_n)\}=c_n^{2}-(c_n-n^{-1/4}A)^{2}=2c_nn^{-1/4}A-n^{-1/2}A^{2}$ and use Remark A.1.

Proof of Lemma A.2. 1. A bound on the sample space. Since $N^{-1}(\hat\beta-\beta)$ and $n^{1/2}(\hat\sigma^{2}-\sigma^{2})$ are $O_P(1)$, and in light of Assumption 6.1(ii,d), for all $\epsilon>0$ there exists a constant $A_0>1$ such that the set
$$\mathcal B_n=\{|N^{-1}(\hat\beta-\beta)|+n^{1/2}|\hat\sigma-\sigma|+n^{1/4}\max_{1\le i\le n}|N'x_i|\le A_0\}$$
has probability larger than $1-\epsilon$. It suffices to prove the theorem on this set.
2. A bound on the indicators. Introduce the quantity
$$s_i=\hat\sigma c_n-y_i+x_i'\hat\beta+\varepsilon_i=\sigma c_n+n^{-1/2}\{n^{1/2}(\hat\sigma-\sigma)\}c_n+x_i'N\,N^{-1}(\hat\beta-\beta).$$
On the set $\mathcal B_n$, using $c_n=o(n^{1/4})$ by Remark A.1, the quantity $s_i$ satisfies, for some $A_1>0$,
$$s_i\le\sigma c_n+n^{-1/2}A_0c_n+n^{-1/4}A_0^{2}\le\sigma(c_n+n^{-1/4}A_1),$$
$$s_i\ge\sigma c_n-n^{-1/2}A_0c_n-n^{-1/4}A_0^{2}\ge\sigma(c_n-n^{-1/4}A_1).$$
It therefore holds that
$$1_{(\varepsilon_i/\sigma>c_n+n^{-1/4}A_1)}\le1_{(y_i-x_i'\hat\beta>\hat\sigma c_n)}=1_{(\varepsilon_i>s_i)}\le1_{(\varepsilon_i/\sigma>c_n-n^{-1/4}A_1)}.$$
With a similar inequality for $1_{(y_i-x_i'\hat\beta<-\hat\sigma c_n)}$ we find
$$1_{(|\varepsilon_i/\sigma|>c_n+n^{-1/4}A_1)}\le1_{(|y_i-x_i'\hat\beta|>\hat\sigma c_n)}\le1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}. \qquad (A.5)$$

3. Expectation of the indicator bounds. It will be argued that
$$n\mathsf E\,1_{(|\varepsilon_i/\sigma|>c_n+n^{-1/4}A_1)}\to\lambda,\qquad n\mathsf E\,1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}\to\lambda. \qquad (A.6)$$
Since $n\mathsf E\,1_{(|\varepsilon_i/\sigma|>c_n)}\to\lambda$ it suffices to argue that
$$\mathcal E_n=n\mathsf E\{1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}-1_{(|\varepsilon_i/\sigma|>c_n+n^{-1/4}A_1)}\}\to0.$$
A first order Taylor expansion and the identity $2\{1-\mathsf F(c_n)\}=\lambda/n$ give
$$\mathcal E_n=n\int_{c_n-n^{-1/4}A_1}^{c_n+n^{-1/4}A_1}2f(x)\,dx=4n^{3/4}A_1f(c^{*})=\lambda\,\frac{4n^{-1/4}A_1f(c^{*})}{2\{1-\mathsf F(c_n)\}},$$
for some $|c^{*}-c_n|\le n^{-1/4}A_1$. Rewrite this as
$$\mathcal E_n=2\lambda n^{-1/4}A_1\Big\{\frac{f(c^{*})}{f(c_n-n^{-1/4}A_1)}\Big\}\Big\{\frac{f(c_n-n^{-1/4}A_1)}{f(c_n)}\Big\}\Big[\frac{f(c_n)}{c_n\{1-\mathsf F(c_n)\}}\Big]c_n.$$
The first fraction is bounded by one since $f$ has decreasing tails. The second and the third fractions are bounded by Assumptions (i.c) and (i.b). Then use that $n^{-1/4}c_n=o(1)$ by Remark A.1.
4. Poisson distribution. Using the bounds in item 3, it holds on the set $\mathcal B_n$ that
$$\frac{1}{n}\sum_{i=1}^{n}1_{(|\varepsilon_i/\sigma|>c_n+n^{-1/4}A_1)}\le\hat\gamma=\frac{1}{n}\sum_{i=1}^{n}1_{(|y_i-x_i'\hat\beta|>\hat\sigma c_n)}\le\frac{1}{n}\sum_{i=1}^{n}1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}.$$
Using (A.6), the Poisson limit theorem shows that the upper and lower bounds have Poisson limits with mean $\lambda$.

Proof of Theorem 9.9. 1. Comparison with least squares. The estimation error $N^{-1}(\hat\beta_n^{(0)}-\beta)$ is based on
$$N'\sum_{i=1}^{n}\hat v_{i,n}^{(-1)}x_ix_i'N=N'\sum_{i=1}^{n}x_ix_i'N-N'\sum_{i=1}^{n}(1-\hat v_{i,n}^{(-1)})x_ix_i'N, \qquad (A.7)$$
$$N'\sum_{i=1}^{n}\hat v_{i,n}^{(-1)}x_i\varepsilon_i=N'\sum_{i=1}^{n}x_i\varepsilon_i-N'\sum_{i=1}^{n}(1-\hat v_{i,n}^{(-1)})x_i\varepsilon_i. \qquad (A.8)$$
In each equation the first term is the full-sample product moment, which converges due to Assumption 6.1, and the estimation error of the full-sample least squares estimator is bounded in probability. It suffices to show that the second terms vanish in probability. The argument for $n^{1/2}\{(\hat\sigma_n^{(0)})^{2}-\sigma^{2}\}$ is similar.
2. Tightness of the initial estimators. Because $N^{-1}(\hat\beta_j-\beta)$ and $n^{1/2}(\hat\sigma_j^{2}-\sigma^{2})$ are $O_P(1)$, for all $\epsilon>0$ there exists a constant $A_0>1$ such that the set
$$\mathcal B_n=\Big\{\sum_{j=1}^{2}|N^{-1}(\hat\beta_j-\beta)|+\sum_{j=1}^{2}n^{1/2}|\hat\sigma_j-\sigma|+n^{1/2}\max_{1\le i\le n}|N'x_i|\le A_0\Big\}$$
has probability larger than $1-\epsilon$. It suffices to prove the theorem on this set.
3. Bounding the second terms. The second terms of (A.7) and (A.8) are bounded by
$$S_p=\sum_{i=1}^{n}(1-\hat v_{i,n}^{(-1)})|N'x_i|^{2-p}|\varepsilon_i|^{p}\quad\text{for }p=0,1.$$
On the set $\mathcal B_n$ we get the further bound, see (A.5) in the proof of Lemma A.2,
$$S_p1_{\mathcal B_n}\le\sum_{i=1}^{n}|N'x_i|^{2-p}|\varepsilon_i|^{p}1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)},$$
so that the expectation is bounded as
$$\mathsf E(S_p1_{\mathcal B_n})\le\mathsf E\sum_{i=1}^{n}|N'x_i|^{2-p}|\varepsilon_i|^{p}1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}.$$
Now, by the Cauchy-Schwarz inequality,
$$\mathsf E|\varepsilon_i|^{p}1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}\le\mathsf E^{1/2}1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}\;\mathsf E^{1/2}|\varepsilon_i|^{2p}1_{(|\varepsilon_i/\sigma|>c_n-n^{-1/4}A_1)}.$$
The first factor is of order $n^{-1/2}$, because $n\{1-\mathsf F(c_n-n^{-1/4}A_1)\}$ converges, and the second factor tends to zero because $\mathsf E\varepsilon_i^{2}<\infty$. We also have
$$\mathsf E\sum_{i=1}^{n}|N'x_i|^{2-p}=n^{(p-2)/2}\,\mathsf E\sum_{i=1}^{n}|n^{1/2}N'x_i|^{2-p}\le Cn^{p/2}\le Cn^{1/2}$$
by Assumption 6.1(ii,d). Collecting these evaluations we find that $S_p\overset{P}{\to}0$.
Proof of Theorem 10.1. Theorem 8.3 implies that $Z_n$ converges to a Gaussian process $Z$ on $D[\psi_0,1]$ endowed with the uniform metric. The variance of $Z(c_\psi)$ vanishes for $\psi\to1$, so a truncation argument is needed to deal with the ratio $X_n(c_\psi)=Z_n(c_\psi)/\mathrm{sdv}\{Z(c_\psi)\}$. Approximate the sample gauge by
$$\hat\gamma_v=\frac{1}{n}\sum_{j=m_1}^{\mathrm{int}(nv)-1}1_{(\hat m\le j)}$$
for some $v<1$, using (10.1). Then the sample gauge is $\hat\gamma=\hat\gamma_1$, and
$$0\le\hat\gamma-\hat\gamma_v=\frac{1}{n}\sum_{j=\mathrm{int}(nv)}^{n-1}1_{(\hat m\le j)}\le\frac{n-\mathrm{int}(nv)}{n}\le1-v+n^{-1}. \qquad (A.9)$$
The process $X_n(c_\psi)$ converges on $D[\psi_1,v]$. The Continuous Mapping Theorem 5.1 of Billingsley (1968) then shows that $\sup_{\psi_1\le\psi\le u}X_n(c_\psi)$ converges as a process in $u$ on $D[\psi_1,v]$. In turn, for a given $q$, the deterministic function $\mathsf P(\hat m\le nu)=\mathsf P\{\sup_{\psi_1\le\psi\le u}X_n(c_\psi)>q\}$ in $\psi_1\le u\le v$ converges to a continuous increasing function $p(u)$ on $[\psi_1,v]$, which is bounded by unity. In particular it holds that
$$\mathsf E\hat\gamma_v=\frac{1}{n}\sum_{j=m_1}^{\mathrm{int}(nv)-1}\mathsf E\,1_{(\hat m\le j)}=\frac{1}{n}\sum_{j=m_1}^{\mathrm{int}(nv)-1}\mathsf P(\hat m\le j)\ \to\ \gamma_v=\int_{\psi_1}^{v}p(u)\,du\le v-\psi_1\le1,$$
and
$$\gamma_v=\int_{\psi_1}^{v}p(u)\,du\ \nearrow\ \gamma=\int_{\psi_1}^{1}p(u)\,du=\int_{\psi_1}^{1}\mathsf P\Big[\sup_{\psi_1\le\psi\le u}\frac{Z(c_\psi)}{\mathrm{sdv}\{Z(c_\psi)\}}>q\Big]du$$
regardless of the behaviour of the process $X_n(c_\psi)$ for $\psi$ close to unity.
Now return to the sample gauge $\hat\gamma$, and rewrite it as
$$\hat\gamma-\gamma=(\gamma_v-\gamma)+(\hat\gamma_v-\gamma_v)+(\hat\gamma-\hat\gamma_v)$$
for some fixed $v$. Then
$$|\hat\gamma-\gamma|\le(1-v)+|\hat\gamma_v-\gamma_v|+(1-v+n^{-1}).$$
Choose an $\epsilon>0$ and $v$ such that $1-v\le\epsilon$, and then $n$ so large that $n^{-1}\le\epsilon$ and $|\hat\gamma_v-\gamma_v|\le\epsilon$ with large probability; then $|\hat\gamma-\gamma|\le4\epsilon$ with large probability, which completes the proof.

References
Atkinson, A.C. and Riani, M. (2000) Robust Diagnostic Regression Analysis. New York:
Springer.
Atkinson, A.C. and Riani, M. (2006) Distribution theory and simulations for tests of outliers
in regression. Journal of Computational and Graphical Statistics 15, 460–476.
Atkinson, A.C., Riani, M. and Cerioli, A. (2010) The forward search: Theory and data
analysis (with discussion). Journal of the Korean Statistical Society 39, 117–134.
Bahadur, R.R. (1966) A note on quantiles in large samples. Annals of Mathematical Sta-
tistics 37, 577–580.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and
powerful approach to multiple testing. Journal of the Royal Statistical Society B 57,
289–300.
Bercu, B. and Touati, A. (2008) Exponential inequalities for self-normalized martingales
with applications. Annals of Applied Probability 18, 1848–1869.
Bickel, P.J. (1975) One-step Huber estimates in the linear model. Journal of the American
Statistical Association 70, 428–434.
Billingsley, P. (1968) Convergence of Probability Measures. New York: Wiley.
Castle, J.L., Doornik, J.A. and Hendry, D.F. (2011) Evaluating automatic model selection.
Journal of Time Series Econometrics 3, Issue 1, Article 8.
Cavaliere, G. and Georgiev, I. (2013) Exploiting infinite variance through dummy variables in nonstationary autoregressions. Econometric Theory 29, 1162–1195.
Chen, X.R. and Wu, Y.H. (1988) Strong consistency of M-estimates in linear models. Jour-
nal of Multivariate Analysis 27, 116–130.
Csörgő, M. (1983) Quantile Processes with Statistical Applications. CBMS-NSF Regional Conference Series in Applied Mathematics 42, Society for Industrial and Applied Mathematics.
Davies, L. (1990) The asymptotics of S-estimators in the linear regression model, The
Annals of Statistics 18, 1651–1675.
Dollinger, M.B. and Staudte, R.G. (1991) Influence functions of iteratively reweighted least squares estimators. Journal of the American Statistical Association 86, 709–716.
Doornik, J.A. (2007) Object-Oriented Matrix Programming Using Ox, 3rd ed. London:
Timberlake Consultants Press and Oxford: www.doornik.com.
Doornik, J.A. (2009) Autometrics. In Castle, J.L. and Shephard, N. (eds.) The Methodology
and Practice of Econometrics: A Festschrift in Honour of David F. Hendry, pp. 88–
121. Oxford: Oxford University Press.
Doornik, J.A. and Hendry, D.F. (2013) Empirical Econometric Modelling - PcGive 14,
volume 1. London: Timberlake Consultants.

Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007.

Engler, E. and Nielsen, B. (2009) The empirical process of autoregressive residuals. Econo-
metrics Journal 12, 367–381.

Godfrey, L.G. (1978) Testing Against General Autoregressive and Moving Average Error
Models when the Regressors Include Lagged Dependent Variables. Econometrica 46,
1293–1302.

Graddy, K. (1995) Testing for imperfect competition at the Fulton Fish Market. RAND
Journal of Economics 26, 75–92.

Graddy, K. (2006) The Fulton Fish Market. Journal of Economic Perspectives 20, 207–220.

Hadi, A.S. (1992) Identifying multiple outliers in multivariate data. Journal of the Royal
Statistical Society B 54, 761–771.

Hadi, A.S. and Simonoff, J.S. (1993) Procedures for the identification of multiple outliers in linear models. Journal of the American Statistical Association 88, 1264–1272.

Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley & Sons.

Hampel, F.R. (1971) A general qualitative definition of robustness. Annals of Mathematical Statistics 42, 1887–1896.

He, X. and Portnoy, S. (1992) Reweighted LS estimators converge at the same rate as the
initial estimator. Annals of Statistics 20, 2161–2167.

Hendry, D.F. (1999) An econometric analysis of US food expenditure, 1931–1989. In Mag-


nus, J.R. and Morgan, M.S. Methodology & Tacit Knowledge: Two Experiments in
Econometrics. Chichester: John Wiley & Sons, p. 341–362.

Hendry, D.F. and Doornik, J.A. (2014) Empirical Model Discovery and Theory Evaluation.
Cambridge MA: MIT Press.

Hendry, D.F., Johansen, S. and Santos, C. (2008) Automatic selection of indicators in a


fully saturated regression. Computational Statistics 23, 337–339.

Hendry, D.F. and Krolzig, H.-M. (2005) The properties of automatic GETS modelling.
Economic Journal 115, C32–61.

Hendry, D.F. and Mizon, G.E. (2011) Econometric modelling of time series with outlying
observations. Journal of Time Series Econometrics 3:1:6.

Hendry, D.F. and Nielsen, B. (2007) Econometric Modelling. Princeton NJ: Princeton
University Press.
Hendry, D.F. and Santos, C. (2010) An automatic test of super exogeneity. In Bollerslev,
T., Russell, J.R. and Watson, M.W. (eds.) Volatility and Time Series Econometrics:
Essays in Honor of Robert F. Engle, pp. 164–193. Oxford: Oxford University Press.

Hoover, K.D. and Perez, S.J. (1999) Data mining reconsidered: encompassing and the
general-to-speci…c approach to speci…cation search (with discussion). Econometrics
Journal 2, 167–191.

Huber, P.J. (1964) Robust estimation of a location parameter. Annals of Mathematical


Statistics 35, 73–101.

Huber, P.J. and Ronchetti, E.M. (2009) Robust Statistics. New York: Wiley.

Jaeckel, L.A. (1971) Robust estimates of location: Symmetry and asymmetric contamina-
tion. Annals of Mathematical Statistics 42, 1020–1034.

Johansen, S. and Nielsen, B. (2009) An analysis of the indicator saturation estimator as a


robust regression estimator. In Castle, J.L. and Shephard, N. (eds.) The Methodology
and Practice of Econometrics: A Festschrift in Honour of David F. Hendry, pp. 1–36.
Oxford: Oxford University Press.

Johansen, S. and Nielsen, B. (2010) Discussion: The forward search: Theory and data
analysis. Journal of the Korean Statistical Society 39, 137–145.

Johansen, S. and Nielsen, B. (2013) Asymptotic theory for iterated one-step Huber-skip
estimators. Econometrics 1, 53–70.

Johansen, S. and Nielsen, B. (2014a) Analysis of the Forward Search using some new
results for martingales and empirical processes. Updated version of 2013 Discussion
Paper with title Asymptotic theory of the Forward Search.

Johansen, S. and Nielsen, B. (2014b) Asymptotic theory of M-estimators for multiple re-
gression. Work in Progress.

Jurečková, J. and Sen, P.K. (1996) Robust Statistical Procedures: Asymptotics and Interrelations. New York: John Wiley & Sons.

Jurečková, J., Sen, P.K. and Picek, J. (2012) Methodological Tools in Robust and Nonparametric Statistics. London: Chapman & Hall/CRC Press.

Kilian, L. and Demiroglu, U. (2000) Residual based tests for normality in autoregressions: asymptotic theory and simulations. Journal of Business & Economic Statistics 18, 40–50.

Koul, H.L. (2002) Weighted Empirical Processes in Dynamic Nonlinear Models. 2nd edition.
New York: Springer.

Koul, H.L. and Ossiander, M. (1994) Weak convergence of randomly weighted dependent
residual empiricals with applications to autoregression. Annals of Statistics, 22, 540–
582.
Liese, F. and Vajda, I. (1994) Consistency of M-estimates in general regression models.
Journal of Multivariate Analysis 50, 93–110.

Maronna, R.A., Martin, R.D. and Yohai, V.J. (2006) Robust Statistics: Theory and Meth-
ods. Chicester: John Wiley & Sons.

Nielsen, B. (2006) Order determination in general vector autoregressions. In Ho, H.-C., Ing,
C.-K., and Lai, T.L. (eds): Time Series and Related Topics: In Memory of Ching-Zong
Wei. IMS Lecture Notes and Monograph Series 52, 93-112.

R Development Core Team (2014). R: A language and environment for statistical comput-
ing. R Foundation for Statistical Computing, Vienna, Austria.

Ramsey, J.B. (1969) Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society B 31, 350–371.

Riani, M., Atkinson, A.C. and Cerioli, A. (2009) Finding an unknown number of multivari-
ate outliers. Journal of the Royal Statistical Society B, 71, 447–466.

Rousseeuw, P.J. (1984) Least median of squares regression. Journal of the American Sta-
tistical Association, 79, 871–880.

Rousseeuw, P.J. and van Driessen, K. (1998) A fast algorithm for the minimum covariance
determinant estimator. Technometrics 41, 212–223.

Rousseeuw, P.J. and Leroy, A.M. (1987) Robust Regression and Outlier Detection. New
York: Wiley.

Ruppert, D. and Carroll, R.J. (1980) Trimmed least squares estimation in the linear model.
Journal of the American Statistical Association, 75, 828–838.

Sampford, M.R. (1953) Some inequalities on Mill’s ratio and related functions. Annals of
Mathematical Statistics, 24, 130–132.

Víšek, J.Á. (2006a) The least trimmed squares. Part I: Consistency. Kybernetika, 42, 1–36.
Víšek, J.Á. (2006b) The least trimmed squares. Part II: √n-consistency. Kybernetika, 42, 181–202.

Víšek, J.Á. (2006c) The least trimmed squares. Part III: Asymptotic normality. Kyber-
netika, 42, 203–224.

Welsh, A.H. and Ronchetti, E. (2002) A journey in single steps: robust one-step M-
estimation in linear regression Journal of Statistical Planning and Inference, 103, 287–
310.

White, H. (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
