Dodge1987 - An Introduction To L1-Norm Based Statistical Data Analysis
North-Holland
Abstract: A brief introduction to statistical data analysis based on the minimization of the L1-norm is given for those who are not familiar with the subject. A selected bibliography on L1-norm based statistical data analysis is provided.
Keywords: L1-norm, Least absolute deviations estimation, Minimum sum of absolute errors, Minimum absolute deviations, Linear programming.
1. Introduction
In almost every sphere of human endeavor in the modern world, owing to the lack of certainty, statistical technology plays an indispensable role. The inferential aspects of statistical techniques have made them essential to the toolkit of anyone engaged in scientific enquiry, in economics, sociology, medicine, astronomy, business, psychology, education, industry, engineering and all other branches of applied science. Model fitting, estimation, grouping and data classification, design and analysis of experiments, testing of statistical hypotheses, sample surveys, and time series are among the major statistical methods that are widely used in many areas of applied science.
The underlying theories of these statistical methods contain elements of
optimization.
For example, to determine an estimator, we need a set of criteria by which its performance can be judged. By an estimator of a parameter, we mean a function of the observations which is closest to the true value in some sense. In choosing a criterion of estimation, one attempts to provide a measure of closeness to the parameter, subject to some suitable constraints on the class of estimators. An optimum estimator in the restricted class is then determined by minimizing this measure of closeness.
As a second example, which also serves as a base for some definitions, consider one of the most extensively discussed methods among the statistical tools available for the analysis of data, regression analysis. For a model with errors εᵢ, a general measure of the size of the error vector is given by

$$\left[ \sum_i |\varepsilon_i|^p \right]^{1/p} \tag{1.2}$$

with p ≥ 1. If we set p = 1, we obtain what is known as the absolute or city-block metric, or L1-norm. The minimization of this criterion (distance) is called, among other names, the least absolute values method. If p = 2, we have what is known as the Euclidean metric, or L2-norm. The minimization of this distance is known as the least squares method; the classical approach to the regression problem uses this method. If we minimize the Lp-norm with p = ∞, we have the minimax or Chebychev method.
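To make the contrast concrete, here is a minimal numerical sketch in Python (the data are hypothetical and the brute-force search is for illustration only): for a simple location fit, minimizing the Lp criterion recovers the median for p = 1, the mean for p = 2 and the midrange for p = ∞, and only the L1 fit is unmoved by the outlying observation.

```python
import numpy as np

y = np.array([1.0, 2.0, 2.5, 3.0, 14.0])       # hypothetical sample; 14.0 is an outlier

def lp_cost(c, p):
    """Lp criterion for the location model y_i = c + e_i."""
    e = np.abs(y - c)
    return e.max() if np.isinf(p) else (e ** p).sum()

grid = np.linspace(y.min(), y.max(), 10001)     # brute-force search over c
for p, label in [(1, "L1   -> median  "), (2, "L2   -> mean    "), (np.inf, "Linf -> midrange")]:
    c_hat = grid[np.argmin([lp_cost(c, p) for c in grid])]
    print(label, round(float(c_hat), 2))
# prints approximately 2.5 (median), 4.5 (mean), 7.5 (midrange)
```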
The root of the most popular method of estimating β = (β₀, β₁)′, that is, least squares, goes back to Gauss or Legendre (see Stigler (1981) for a recent historical discussion). Laplace used the name 'most advantageous method'. The usual assumption required to use this method is that the error εᵢ is normally distributed with mean zero and variance σ².
While the method of least squares (and its generalizations) has served statisticians well for a good many years (mainly because of its mathematical convenience and ease of computation), and enjoys certain well-known properties within strictly Gaussian parametric models, it is recognized that outliers, which arise from heavy-tailed distributions, have an unusually large influence on the estimates obtained by this method. Indeed, one single outlier
can have an arbitrarily large effect on the estimate. Consequently, robust methods have been created to modify least squares methods so that outliers have much less influence on the final estimates. Since Box (1953) coined the term robustness, a number of excellent papers have been published on the subject. In 1964, Huber published what is now considered to be a classic paper on robust estimation of a location parameter. Huber's work was subsequently extended to the linear model by Andrews (1974), Bickel (1975) and Huber (1973), among others. Huber (1964) introduced the concept of the M-estimate (or maximum likelihood type estimate), based on the idea of replacing εᵢ² in the minimization of the sum of the εᵢ² by ρ(εᵢ), where ρ is a symmetric function with a unique minimum at zero. Harter's (1974a, b, 1975a, b, c, 1976) monumental study provides a fascinating historical account of linear model estimation based on least squares and alternative methods.
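For concreteness, Huber's ρ can be written in its standard form (the tuning constant k below is our notation; it is not specified in this paper):

$$\rho_k(\varepsilon) = \begin{cases} \tfrac{1}{2}\varepsilon^2, & |\varepsilon| \le k,\\ k|\varepsilon| - \tfrac{1}{2}k^2, & |\varepsilon| > k, \end{cases}$$

which is quadratic near zero and linear in the tails, so that k → 0 gives (up to scale) the L1 criterion and k → ∞ the L2 criterion.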
One of the most satisfying robust alternatives to least squares is the least absolute values method. This method, which is the subject of this book, is a widely recognized superior robust method, especially well-suited to longer-tailed error distributions, e.g. the Laplace or Cauchy. Increasingly, this method is recommended as a preliminary (consistent) estimator for one-step and iteratively reweighted least squares methods. Other advantages of least absolute values methods in robust estimation are explained by Huber (1987).
There are, however, many other robust procedures. For an excellent account of the mathematical aspects of robustness, the reader is referred to Huber (1964, 1972, 1973, 1981) and Hampel et al. (1986).
2. Historical remarks
The least absolute values (LAV) estimator ¹ of the parameter vector β is obtained by minimizing the sum of absolute deviations Σᵢ |Yᵢ − f(Xᵢ, β)| for a given model f relating the response Yᵢ to the vector variable Xᵢ (non-random) and the vector parameter β. For the location model f(Xᵢ, β) = β, the LAV estimator is the sample median.

¹ Whether one uses MAD or LAV (which sounds like 'love') as an abbreviation, they mean the same thing, since love is a kind of MADness anyhow.
Historically, LAV estimation is probably the oldest of all robust methods. The origin of the LAV estimator can be traced back to Galilei (1632), who, in determining the position of a newly discovered star in his "Dialogo dei massimi sistemi", proposed the least possible correction in order to obtain a reliable result for the problem. The LAV estimator in a simple model, however, was first suggested and studied by Boscovich (1757) and Laplace (1793). Laplace called it the 'method of situations' (Stigler, 1973). In this paper Stigler discusses Laplace's work as well as that of R.A. Fisher, who studied the loss of information of the LAV estimator in the double exponential model, for which it is, however, the maximum likelihood estimator.
After nearly seventy years of silence following the publication of Laplace's second supplement to the Théorie Analytique des Probabilités (1818), Edgeworth (1887) presented a method for linear regression using the LAV method. Farebrother (1987) discusses several contributions of Laplace, Gauss, Edgeworth and others to the geometrical, graphical and analytical solution of the L1 and L2 estimation problems. Via many worked-out examples, he illuminates many vacant periods in the chronological development of the L1 and L2 methods that were otherwise not clear. Gentle (1977) and Bloomfield and Steiger (1983) provide excellent bibliographical notes on LAV estimation.
3. Computational algorithms
Since the publication of Edgeworth's work, few attempts have been made to convince statisticians, and particularly applied users, to employ this method (see Turner (1887), Rhodes (1930), Singleton (1940) and Karst (1958)). Reasons for such a long silence may be summarized as follows:
(1) Computational difficulties in producing the numeric values of the LAV estimates in regression (lack of closed-form formulae similar to those of least squares).
(2) Lack of asymptotic theory for LAV estimation in the regression model and, in general, the nonexistence of accompanying statistical inference procedures.
(3) Insufficient evidence of the superiority of the small-sample properties of LAV estimators over least squares estimators when sampling from long-tailed distributions.
Following the work of Charnes, Cooper and Ferguson (1955), a renewed interest in using LAV estimation for regression problems was created. They showed the equivalence between the LAV problem and a linear programming problem. Wagner (1959) suggested that the LAV problem can be solved by solving its dual. He also observed that the dual problem can be reduced to a problem with a smaller basis, but that the dual variables then have upper-bound restrictions. Wagner's formulation of the problem can be stated as follows. Consider the problem of minimizing Σ |εᵢ| with respect to β, where εᵢ is the
deviation between the observed and predicted values of Y for the ith observation. The problem can be stated as follows:

$$\text{Minimize } \sum_i |\varepsilon_i| \quad \text{subject to } X\beta + \varepsilon = Y, \qquad \varepsilon,\ \beta \text{ unrestricted in sign.} \tag{3.1}$$

Noting the fact that |εᵢ| = ε₁ᵢ + ε₂ᵢ, where ε₁ᵢ and ε₂ᵢ are nonnegative, and εᵢ = ε₁ᵢ − ε₂ᵢ, we can reformulate (3.1) as a linear programming problem as follows:

$$\text{Minimize } \sum_i (\varepsilon_{1i} + \varepsilon_{2i}) \quad \text{subject to } X\beta + \varepsilon_1 - \varepsilon_2 = Y, \qquad \varepsilon_1,\ \varepsilon_2 \ge 0,\ \beta \text{ unrestricted in sign.}$$
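As an illustration, here is a minimal sketch of this LP reformulation in Python, assuming NumPy and SciPy are available; the helper name lav_fit and the toy data are ours, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def lav_fit(X, y):
    """LAV regression via the LP above: minimize sum(e1 + e2)
    subject to X b + e1 - e2 = y, with e1, e2 >= 0 and b free."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])      # cost on e1, e2 only
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])           # X b + e1 - e2 = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)    # b free; e1, e2 >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# Toy example: a straight line with one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 20)
y[10] += 5.0                                               # contaminate one point
X = np.column_stack([np.ones_like(x), x])
print(lav_fit(X, y))                                       # close to [1, 2]
```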
From the computational point of view, Barrodale and Roberts (1973) presented a modified simplex algorithm that bypasses some iterations. The LAV estimation problem with additional linear restrictions (the restricted LAV problem) is treated along the same lines in Barrodale and Roberts (1978). Special purpose algorithms for the LAV problem have also been given by Armstrong and Hultz (1977) and Bartels and Conn (1977). Computer comparisons have established the Barrodale and Roberts algorithms as an efficient method for solving the LAV problem. Armstrong, Elam and Hultz (1977) considered the problem of two-way classification models using L1-norm minimization. They demonstrated the equivalence between the problem of obtaining LAV estimates for the two-way classification model and a capacitated transportation problem, and developed a special purpose algorithm to provide LAV estimates. Their method, along with a real-life example, can be found in Arthanari and Dodge (1981). Bloomfield and Steiger (1983) devote a complete chapter to LAV in multi-way tables (classification models), in which they also discuss the relationship between Tukey's (1977) median polish and LAV fitting (a sketch of median polish follows below). A recent revised simplex version of this algorithm by Armstrong, Frome and Kung (1979) is claimed to be even more efficient than the Barrodale and Roberts algorithm.
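For readers unfamiliar with median polish, the following is a minimal sketch in Python (our own illustration, not the algorithm as given in any of the cited works): row and column medians are alternately swept out of a two-way table, yielding an additive fit closely related to the LAV fit.

```python
import numpy as np

def median_polish(table, n_iter=10):
    """Tukey-style median polish of a two-way table: repeatedly subtract
    row and column medians from the residuals, accumulating them into
    row effects, column effects and an overall term."""
    resid = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        r = np.median(resid, axis=1)        # sweep row medians
        row += r
        resid -= r[:, None]
        m = np.median(row)                  # re-centre row effects
        overall += m
        row -= m
        c = np.median(resid, axis=0)        # sweep column medians
        col += c
        resid -= c[None, :]
        m = np.median(col)                  # re-centre column effects
        overall += m
        col -= m
    return overall, row, col, resid

# Example: fitted value = overall + row[i] + col[j]; the 50.0 is aberrant.
t = [[3.0, 6.0, 8.0], [4.0, 7.0, 50.0], [5.0, 8.0, 10.0]]
overall, row, col, resid = median_polish(t)
print(overall, row, col)
print(resid)                                # the 50.0 shows up as a large residual
```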
In 1981, Arthanari and Dodge devoted nearly a complete chapter to computational aspects of LAV estimation. Bloomfield and Steiger (1983) described and compared three algorithms for LAV estimation, by Barrodale and Roberts (1973, 1974), Bartels, Conn and Sinclair (1978) and Bloomfield and Steiger (1980).
For recent developments in computational algorithms, the interested reader may refer to Dielman (1984) and Narula (1987). Gentle, Narula and Sposito (1987) compared the codes that are openly available for solving unconstrained L1 linear regression problems, taking CPU time on scalar and virtual-memory machines as the basis of comparison. For the simple linear regression model (two parameters), they tested the programs of Armstrong and Kung (AK) (1978), Josvanger and Sposito (JS) (1983), Abdelmalek (A) (1980), Armstrong, Frome
and Kung (AFK) (1979) and Bloomfield and Steiger (BS) (1980). For more than two parameters they considered only A, AFK and BS. All things considered, they recommended the AFK algorithm as the best.
Ekblom (1987) presents some algorithms to compute Lp and Huber estimates. He also examines and points out some difficulties in applying these algorithms when p approaches 1 for the Lp-estimates. A reduced gradient algorithm to minimize polyhedral convex functions has been presented by Osborne (1987). Gonin and Money (1987) present a complete and systematic review of computational methods for solving the nonlinear L1-norm estimation problem.
With new improved algorithms for linear programming, such as those of Khachian (1979) and Karmarkar (see Meketon (1986)), new possibilities will probably also open up for LAV estimation.
4. Statistical inference

Probably the major difficulty for applied researchers in using LAV estimation for many years was the lack of accompanying statistical inference procedures. Such procedures would include methods for testing general linear hypotheses, obtaining confidence intervals, constructing analysis of variance tables and performing multiple comparison procedures.
Koenker and Bassett (1982) investigate the asymptotic distribution of three alternative L1 test statistics of a linear hypothesis in the standard linear model. They showed that the three test statistics, which correspond to the Wald, likelihood ratio and Lagrange multiplier tests, have the same limiting chi-square behavior under mild regularity conditions on the design and the error distribution. A very nice summary of LAV estimation, which includes computational algorithms, small and large (asymptotic) sample properties, confidence intervals and hypothesis testing in linear regression, is given by Dielman and Pfaffenberger (1982), who have also suggested some future lines of research on LAV estimation in regression. Koenker (1987) compares the small-sample performance of these three L1 test statistics on a two-way model with interaction, for testing the hypothesis of no interaction.
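For orientation, the key large-sample facts behind these tests can be stated as follows (a standard formulation in the spirit of Bassett and Koenker (1978) and Koenker and Bassett (1982); the notation here is ours). If the errors are i.i.d. with density f, median zero and f(0) > 0, then

$$\sqrt{n}\,(\hat\beta_{\mathrm{LAV}} - \beta) \xrightarrow{\;d\;} N\!\left(0,\ \tau^2 Q^{-1}\right), \qquad \tau = \frac{1}{2 f(0)}, \quad Q = \lim_{n\to\infty} \tfrac{1}{n} X'X,$$

and, writing D(·) for the minimized sum of absolute errors under a model, the likelihood-ratio-type statistic for a hypothesis imposing q linear restrictions satisfies

$$T = \frac{2\left[D(\text{reduced}) - D(\text{full})\right]}{\tau} \xrightarrow{\;d\;} \chi^2_q .$$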
In a series of articles, McKean and Hettmansperger (1976), Schrader (1976), McKean and Schrader (1977) and Schrader and Hettmansperger (1980) have developed robust inference procedures which are very similar to the classical analysis of variance. McKean and Schrader (1987) present an LAV analysis of the general linear model as complete and unified as the least squares analysis of variance. For testing a general linear hypothesis, the procedure is to replace the classical sums of squares by the corresponding sums of absolute errors, suitably scaled.
6. Density estimation
A kernel k(x) is a function satisfying

$$\int k(x)\,dx = 1.$$

After selecting a k(x), one can estimate the density function by the kernel estimate

$$\hat f(x) = \frac{1}{nh} \sum_{j=1}^{n} k\!\left(\frac{x - X_j}{h}\right).$$
However, choosing the scale factor h (the bandwidth) for a given kernel is not all that easy. There are many asymptotic results on how h should be selected in order to obtain the best estimate of the density. As suggested by Parzen, h must be a function of n such that, as n tends to infinity, h tends to 0 and nh → ∞. But it is obvious that the optimal h depends on f(x). Among other results, Dodge (1986) showed that the optimal asymptotic value of h varies as a function of x for different densities, in contrast with practical situations. When the number of observations reaches 2000, in the case of the normal distribution with mean 0 and variance 1, the optimal value of h reaches 0.54, which shows the slow convergence of h as n goes towards infinity.
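A minimal sketch of the kernel estimate above, in Python with a Gaussian kernel (the simulated data and the choice of bandwidths are ours, for illustration only):

```python
import numpy as np

def kernel_density(x, sample, h):
    """Kernel density estimate f_hat(x) = (1/(n h)) sum_j k((x - X_j) / h),
    with the Gaussian kernel k(u) = exp(-u^2 / 2) / sqrt(2 pi)."""
    u = (x[:, None] - sample[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return k.sum(axis=1) / (len(sample) * h)

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, 2000)          # N(0, 1) with n = 2000, as in the text
grid = np.linspace(-3.0, 3.0, 7)
for h in (0.1, 0.54, 2.0):                   # under-smoothed, near-optimal, over-smoothed
    print(h, np.round(kernel_density(grid, sample, h), 3))
```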
Recently, Devroye and Györfi (1985) published a complete monograph entitled "Nonparametric Density Estimation: The L1 View". In it, they develop a smooth L1 theory, since the better-studied L2 theory has led to various anomalies and misconceptions. Their choice of the L1-norm is also motivated by its invariance under monotone transformations of the coordinate axes and by the fact that it is always well defined.
For uniformly mixing samples and for strongly mixing samples, the distribution-free asymptotic L1 consistency of kernel and histogram density estimates is proved by Györfi (1987). For independent samples, Devroye (1983) gives a complete characterization of the L1 consistency of kernel estimates.
7. Cluster analysis
A typical formulation assigns to each candidate cluster A the within-cluster cost

$$c(A) = \begin{cases} \displaystyle \sum_{i<j;\; i,j \in A} d_{ij}, & |A| \ge 2,\\[4pt] 0, & \text{otherwise,} \end{cases}$$

where dᵢⱼ denotes the distance between observations i and j, and then seeks a partition G of the observations into groups T_J so as to

$$\text{Optimize } \sum_{J \in G} c(T_J).$$

An L1-based clustering sketch in this spirit follows below.
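For instance, clustering by medoids (Kaufman and Rousseeuw, 1987) replaces cluster means by representative observations chosen under the L1 distance. The following Python sketch is our own toy version of the idea, not the algorithm of the cited paper:

```python
import numpy as np

def l1_medoids(X, k, n_iter=20, seed=0):
    """Toy k-medoids clustering under the L1 (city-block) distance:
    each cluster is represented by one of its own points (a medoid),
    chosen to minimize the total L1 distance within the cluster."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assign every point to its nearest medoid in L1 distance
        d = np.abs(X[:, None, :] - X[medoids][None, :, :]).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each medoid to the member with least total in-cluster L1 cost
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                cost = np.abs(X[members][:, None, :] - X[members][None, :, :]).sum(axis=(1, 2))
                medoids[j] = members[cost.argmin()]
    return medoids, labels

# Two well-separated point clouds in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
medoids, labels = l1_medoids(X, k=2)
print(X[medoids])            # one representative point near each cloud centre
```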
8. Concluding remarks
In summary, the main goal is to be able to apply the L1-norm wherever the L2-norm was used before.
References
N.N. Abdelmalek (1980), A Fortran subroutine for the L1 solution of overdetermined systems of linear equations, ACM Trans. Math. Software 6, 228-230.
D.F. Andrews (1974), A robust method for multiple linear regression, Technometrics 16, 523-531.
R.D. Armstrong and E.L. Frome (1977), A special purpose linear programming algorithm for obtaining least absolute value estimates in a linear model with dummy variables, Commun. Statist. B6, 383-398.
R.D. Armstrong and J.W. Hultz (1977), An algorithm for a restricted discrete approximation problem in the L1 norm, SIAM J. Numer. Anal. 14, 555.
R.D. Armstrong and D.S. Kung (1978), AS132: Least absolute value estimates for a simple linear regression problem, Appl. Statist. 27, 363-366.
R.D. Armstrong, J.J. Elam and J.W. Hultz (1977), Obtaining least absolute value estimates for a two-way classification model, Commun. Statist. B6, 365-381.
R.D. Armstrong, E.L. Frome and D.S. Kung (1979), A revised simplex algorithm for the absolute deviation curve fitting problem, Commun. Statist. B8, 175.
T.S. Arthanari and Y. Dodge (1981), Mathematical Programming in Statistics (John Wiley, Interscience Division, New York).
V.G. Ashar and T.D. Wallace (1963), A sampling study of minimum absolute deviations estimators, Oper. Res. 11, 747.
I. Barrodale and F.D.K. Roberts (1973), An improved algorithm for discrete L1 linear approximation, SIAM J. Numer. Anal. 10, 839.
I. Barrodale and F.D.K. Roberts (1974), Algorithm 478: Solution of an overdetermined system of equations in the L1 norm, Commun. Assoc. Comput. Mach. 17, 319.
I. Barrodale and F.D.K. Roberts (1978), An efficient algorithm for discrete L1 linear approximation with linear constraints, SIAM J. Numer. Anal. 15, 603.
R.H. Bartels and A.R. Conn (1977), LAV regression: A special case of piecewise linear minimization, Commun. Statist. B6, 329.
R.H. Bartels, A.R. Conn and J.W. Sinclair (1978), Minimization techniques for piecewise differentiable functions: The L1 solution to an overdetermined linear system, SIAM J. Numer. Anal. 15, 224.
G.W. Bassett and R. Koenker (1978), Asymptotic theory of least absolute error regression, J. Amer. Statist. Assoc. 73, 618-622.
P.J. Bickel (1975), One-step Huber estimates in the linear model, J. Amer. Statist. Assoc. 70, 428-434.
P. Bloomfield and W. Steiger (1980), Least absolute deviations curve-fitting, SIAM J. Sci. Statist. Comput. 1, 290-300.
P. Bloomfield and W.L. Steiger (1983), Least Absolute Deviations: Theory, Applications, and Algorithms (Birkhäuser, Boston).
R.J. Boscovich (1757), De litteraria expeditione per pontificiam ditionem, et synopsis amplioris operis, ac habentur plura eius ex exemplaria etiam sensorum impressa, Bononiensi Scientiarum et Artium Instituto Atque Academia Commentarii 4, 353-396.
G.A. Bourdon (1974), A Monte Carlo sampling study for further testing of the robust regression procedure based upon the kurtosis of the least squares residuals, Unpublished M.S. thesis, Air Force Institute of Technology, Wright-Patterson AFB, Ohio.
G.E.P. Box (1953), Non-normality and tests on variances, Biometrika 40, 318-334.
A. Charnes, W.W. Cooper and R.O. Ferguson (1955), Optimal estimation of executive compensation by linear programming, Manage. Sci. 1, 138.
L. Devroye (1983), The equivalence of weak, strong and complete convergence in L1 for kernel density estimates, Ann. Statist. 11, 896-904.
L. Devroye and L. Györfi (1985), Nonparametric Density Estimation: The L1 View (John Wiley, New York).
T.E. Dielman (1984), Least absolute value estimation in regression models: An annotated bibliography, Commun. Statist. 13, 513-541.
T. Dielman and R. Pfaffenberger (1982), LAV (least absolute value) estimation in linear regression: A review, in: S.H. Zanakis and J.S. Rustagi (Eds.), Optimization in Statistics (North-Holland, Amsterdam).
Y. Dodge (1986), Some difficulties involving nonparametric estimation of a density function, J. Official Statist. 2(2), 193-202.
Y. Dodge and J. Jurečková (1987), Adaptive combination of least squares and least absolute deviations estimators, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
J.C. Dunn (1974), A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybernetics 3(3), 32-57.
J. Dupačová (1987), Asymptotic properties of restricted L1-estimates of regression, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
F.Y. Edgeworth (1887), On observations relating to several quantities, Phil. Mag. (5th Series) 24, 222.
H. Ekblom (1987), The L1-estimate as limiting case of an Lp- or Huber-estimate, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R.W. Farebrother (1987), Mechanical representations of the L1 and L2 estimation problems, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R.W. Farebrother (1987), The historical development of the L1 and L2 estimation procedures, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
V.V. Fedorov (1987), Various discrepancy measures in model testing (two competing regression models), in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
A. Gaivoronski (1987), Numerical techniques for finding estimates which minimize the upper bound of the absolute deviation, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
G. Galilei (1632), Dialogo dei massimi sistemi.
J.E. Gentle (1977), Least absolute values estimation: An introduction, Commun. Statist. B6, 313-328.
J.E. Gentle, S.C. Narula and V.A. Sposito (1987), Algorithms for unconstrained L1 linear regression, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
F.R. Glahe and J.G. Hunt (1970), The small sample properties of simultaneous equation least absolute estimators vis-à-vis least squares estimators, Econometrica 38, 742.
R. Gonin and A.H. Money (1987), A review of computational methods for solving the nonlinear L1-norm estimation problem, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R. Gonin and A.H. Money (1987), Outliers in physical processes: L1- or adaptive Lp-norm estimation?, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
L. Györfi (1987), Density estimation from dependent samples, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
F.R. Hampel (1968), Contributions to the theory of robust estimation, Ph.D. thesis, University of California, Berkeley.
F.R. Hampel (1971), A general qualitative definition of robustness, Ann. Math. Statist. 42, 1887-1896.
F. Hampel, E. Ronchetti, P. Rousseeuw and W. Stahel (1986), Robust Statistics: The Approach Based on Influence Functions (John Wiley, New York).
H.L. Harter (1974a), The method of least squares and some alternatives I, Int. Stat. Rev. 42, 147.
H.L. Harter (1974b), The method of least squares and some alternatives II, Int. Stat. Rev. 42, 235.
H.L. Harter (1975a), The method of least squares and some alternatives III, Int. Stat. Rev. 43, 1.
H.L. Harter (1975b), The method of least squares and some alternatives IV, Int. Stat. Rev. 43, 125-190 and 273-278.
H.L. Harter (1975c), The method of least squares and some alternatives V, Int. Stat. Rev. 43, 269.
H.L. Harter (1976), The method of least squares and some alternatives VI, Int. Stat. Rev. 44, 113.
P.J. Huber (1964), Robust estimation of a location parameter, Ann. Math. Statist. 35, 73-101.
P.J. Huber (1972), Robust statistics: A review, Ann. Math. Statist. 43, 1041-1067.
P.J. Huber (1973), Robust regression: Asymptotics, conjectures, and Monte Carlo, Ann. Statist. 1, 799-821.
P.J. Huber (1981), Robust Statistics (John Wiley, New York).
P.J. Huber (1987), The place of the L1-norm in robust estimation, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R.E. Jensen (1969), A dynamic programming algorithm for cluster analysis, Oper. Res. 17, 1034.
L.A. Josvanger and V.A. Sposito (1983), L1-norm estimates for the simple regression problem, Commun. Statist. B12, 215-221.
O.J. Karst (1958), Linear curve fitting using least deviations, J. Amer. Statist. Assoc. 53, 118-132.
L. Kaufman and P.J. Rousseeuw (1987), Clustering by means of medoids, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
L.G. Khachian (1979), A polynomial algorithm for linear programming, Doklady Akad. Nauk SSSR 244(5), 1093-1096.
E.A. Kiountouzis (1973), Linear programming techniques in regression analysis, Appl. Statist. 22, 69.
R. Koenker (1987), A comparison of asymptotic testing methods for L1-regression, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R. Koenker and G.W. Bassett (1982), Tests of linear hypotheses and L1 estimation, Econometrica 50, 1577-1583.
P.S. Laplace (1793), Sur quelques points du système du monde, Mémoires de l'Académie Royale des Sciences de Paris, 1-87; reprinted in Oeuvres Complètes de Laplace, Vol. 11 (Gauthier-Villars, Paris, 1895) 477-558.
P.S. Laplace (1812), Théorie analytique des probabilités (Mme Courcier, Paris, 1820); reprinted in Oeuvres Complètes de Laplace, Vol. 7 (Gauthier-Villars, Paris, 1886).
P.S. Laplace (1818), Deuxième supplément to Laplace (1812).
J.W. McKean and R.M. Schrader (1987), Least absolute errors analysis of variance, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
M.S. Meketon (1986), Least absolute value regression, Working Paper, AT&T Bell Laboratories, Holmdel, NJ.
S.C. Narula (1987), The minimum sum of absolute errors regression, J. Quality Tech. 19, 37-45.
M.R. Osborne (1987), The reduced gradient algorithm, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
E. Parzen (1962), On the estimation of a probability density function and the mode, Ann. Math. Statist. 33, 1065-1076.
R.C. Pfaffenberger and J.J. Dinkel (1978), Absolute deviations curve fitting: An alternative to least squares, in: H.A. David (Ed.), Contributions to Survey Sampling and Applied Statistics (Academic Press, New York).
E.C. Rhodes (1930), Reducing observations by the method of minimum deviations, Phil. Mag. (7th Series) 9, 974.
J.R. Rice and J.S. White (1964), Norms for smoothing and estimation, SIAM Rev. 6, 243.
A.E. Ronner (1977), P-norm estimators in a linear regression model, Ph.D. thesis, Groningen, The Netherlands.
A.E. Ronner (1984), Asymptotic normality of p-norm estimators in multiple regression, Z. Wahrscheinlichkeitstheorie Verw. Gebiete 66, 613-620.
B. Rosenberg and D. Carlson (1977), A simple approximation of the sampling distribution of least absolute residuals regression estimates, Commun. Statist. B6, 421.
P.J. Rousseeuw (1987), An application of L1 to astronomy, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R.M. Schrader and J.W. McKean (1987), Small sample properties of least absolute errors analysis of variance, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
R.L. Sielken and H.O. Hartley (1973), Two linear programming algorithms for unbiased estimation of linear models, J. Amer. Statist. Assoc. 68, 639.
R.R. Singleton (1940), A method of minimizing the sum of absolute values of deviations, Ann. Math. Statist. 11, 301-310.
H. Späth (1987), Using the L1 norm within cluster analysis, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
G. Stangenhaus (1987), Bootstrap and inference procedures for L1 regression, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
S.M. Stigler (1973), Studies in the history of probability and statistics, XXXII: Laplace, Fisher, and the discovery of sufficiency, Biometrika 60, 439-445.
S.M. Stigler (1981), Gauss and the invention of least squares, Ann. Statist. 9, 465-474.
L.D. Taylor (1973), Estimation by minimizing the sum of absolute errors, in: P. Zarembka (Ed.), Frontiers in Econometrics (Academic Press, New York) 169-190.
E. Trauwaert (1987), L1 in fuzzy clustering, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
J.W. Tukey (1977), Exploratory Data Analysis (Addison-Wesley, Reading, MA).
H.H. Turner (1887), On Mr. Edgeworth's method of reducing observations relating to several quantities, Phil. Mag. (5th Series) 24, 466-470.
I. Vajda (1987), L1-distances in statistical inference: Comparison of topological, functional and statistical properties, in: Y. Dodge (Ed.), Statistical Data Analysis Based on the L1-Norm and Related Methods (North-Holland, Amsterdam).
H.D. Vinod (1969), Integer programming and the theory of grouping, J. Amer. Statist. Assoc. 64, 506.
H.M. Wagner (1959), Linear programming techniques for regression analysis, J. Amer. Statist. Assoc. 54, 206.
H.G. Wilson (1978), Least squares versus minimum absolute deviations estimation in linear models, Decis. Sci. 9, 322.