Projection Pursuit Regression

JEROME H. FRIEDMAN and WERNER STUETZLE*
A new method for nonparametric multiple regression is presented. The procedure models the regression surface as a sum of general smooth functions of linear combinations of the predictor variables in an iterative manner. It is more general than standard stepwise and stagewise regression procedures, does not require the definition of a metric in the predictor space, and lends itself to graphical interpretation.

KEY WORDS: Nonparametric regression; Smoothing; Projection pursuit; Surface approximation.

1. INTRODUCTION

In the regression problem, one is given a p-dimensional random vector X, the components of which are called predictor variables, and a random variable Y, which is called the response. The aim of regression analysis is to estimate the conditional expectation of Y given X on the basis of a sample {(x_i, y_i): i = 1, 2, ..., n}. Typically, one assumes that the functional form of the regression surface is known, reducing the problem to that of estimating a set of parameters. To the extent that this model is correct, such parametric procedures can be successful; unfortunately, model correctness is difficult to verify in practice, and an incorrect model can yield misleading results. For this reason, there is growing interest in nonparametric methods, which make only a few very general assumptions about the regression surface.

The most extensively studied nonparametric regression techniques (kernel, nearest-neighbor, and spline smoothing) are based on p-dimensional local averaging: the estimate of the regression surface at a point x_0 is the average of the responses of those observations with predictors in a neighborhood of x_0. These techniques can be shown to have desirable asymptotic properties (Stone 1977). In high-dimensional settings, however, they do not perform well for reasonable sample sizes. The reason is the inherent sparsity of high-dimensional samples. This is illustrated by the following simple example: let X be uniformly distributed over the unit hypercube in R^10, and consider local averaging over hypercubical neighborhoods. If the dimensions of the neighborhood are chosen to cover 10 percent of the range of each coordinate, then it will (on the average) contain only (.1)^10 of the sample, and thus will nearly always be empty. If, on the other hand, one adjusts the neighborhood to contain 10 percent of the sample, it will cover (on the average) (.1)^(1/10), or about 80 percent, of the range of each coordinate. This problem of sparsity basically limits the success of direct p-dimensional local averaging. In addition, these methods do not provide any comprehensible information about the nature of the regression surface.
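The arithmetic behind this example is easy to check; the snippet below is a minimal illustration (the variable names p and side are ours, not the paper's):

    # Sparsity of hypercubical neighborhoods in R^10.
    p = 10
    side = 0.10                 # neighborhood covering 10% of each coordinate range
    print(side ** p)            # expected fraction of the sample captured: 1e-10
    print(0.10 ** (1.0 / p))    # side needed to capture 10% of the sample: ~0.794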
The successful nonparametric regression procedures that have been proposed are based on successive refinement. A hierarchy of models of increasing complexity is formulated. The complexity of a model is the number of degrees of freedom used to fit it. The aim is to find the particular model that, when estimated from the data, best approximates the regression surface. The search usually proceeds through the hierarchy in a stepwise manner. At each step, the model of the subsequent level of the hierarchy that best fits the data is selected. Since the sample size limits the complexity of the models that can be used, these procedures will be successful to the extent that the regression surface can be approximated by models on levels of low complexity in the hierarchy.

Applying this concept with a hierarchy of polynomial functions of the predictors leads to the stepwise, stagewise, and all-subsets polynomial regression procedures. These procedures have proven successful in many applications. Unfortunately, regression surfaces occurring in practice often are not represented well by low-order polynomials (e.g., surfaces with asymptotes); use of higher-order polynomials is limited by considerations of sample size and computational feasibility.

A hierarchy of piecewise constant (Sonquist 1970) or piecewise linear (Breiman and Meisel 1976; Friedman 1979) models leads to recursive partitioning regression. These procedures basically operate as follows: for a particular predictor and a value of this predictor, the predictor space is split into two regions, one projecting to the left and the other to the right of the value. A separate constant or linear model is fit to the sample points lying in each region. The particular predictor and splitting value are chosen to minimize the residual sum of squares over the sample. The procedure is then recursively applied to each of the regions so obtained.
* Jerome H. Friedman is Group Leader, Computation Research Group, Stanford Linear Accelerator Center, Stanford University, Stanford, CA 94305. Werner Stuetzle is with the Stanford Linear Accelerator Center and is Assistant Professor, Department of Statistics, Stanford University, Stanford, CA 94305. The authors' work was supported by the Department of Energy under Contract DE-AC03-76SF00515. The authors gratefully acknowledge the help of Mark Jacobson during the early stages of this work and thank the editor for many helpful comments.

© Journal of the American Statistical Association, December 1981, Volume 76, Number 376, Theory and Methods Section
These recursive partitioning methods can be viewed as local averaging procedures, but unlike kernel and nearest-neighbor procedures, the local regions are adaptively constructed based on the nature of the response variation. In many situations, this results in dramatically improved performance. However, as each split reduces the sample over which further fitting can take place, the number of regions, and thus the number of separate models, is limited.

In this paper we apply the successive refinement concept in a new way that attempts to overcome the limitations of polynomial regression and recursive partitioning. The procedure is presented in Section 2. Univariate smoothing is discussed in Section 3; implementation specifics are considered in Section 4. In Section 5 we illustrate the procedure by applying it to several data sets. The merits of this procedure, relative to other nonparametric procedures, are discussed in Section 6. In Section 7 we relate projection pursuit regression to the projection pursuit technique for cluster analysis presented by Friedman and Tukey (1974).

2. PROJECTION PURSUIT REGRESSION

The procedure models the regression surface as a sum of smooth functions of linear combinations of the predictors,

    y = Σ_{m=1}^M S_{a_m}(a_m · x),   (1)

and constructs such a model iteratively:

1. Initialization. Set the current residuals equal to the response and the term counter to zero:

    r_i ← y_i, i = 1, ..., n;  M ← 0.

(We assume that the response is centered: Σ_i y_i = 0.)
2. Search for the next term in the model. For a given linear combination Z = a · X, construct a smooth representation S_a(Z) of the current residuals as ordered in ascending value of Z (see Sec. 3). Take as a figure of merit (criterion of fit) I(a) for this linear combination the fraction of so far unexplained variance that is explained by S_a:

    I(a) = 1 - Σ_{i=1}^n (r_i - S_a(a · x_i))² / Σ_{i=1}^n r_i².   (2)

Find the coefficient vector (projection pursuit) a_{M+1} = arg max_{‖a‖=1} I(a) and the corresponding smooth S_{a_{M+1}}.

3. Termination. If the figure of merit is smaller than a user-specified threshold, stop. (The last term is not included in the model.) Otherwise, update the current residuals and the term counter,

    r_i ← r_i - S_{a_{M+1}}(a_{M+1} · x_i), i = 1, ..., n;  M ← M + 1,

and go to Step 2.
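A minimal sketch of this loop, under simplifying assumptions of ours: a fixed-bandwidth running-mean smoother stands in for the variable-bandwidth smoother of Section 3, and the maximization in Step 2 is a crude random direction search rather than a numerical optimizer. The names ppr, running_mean_smooth, and merit are hypothetical, not from the paper.

    import numpy as np

    def running_mean_smooth(z, r, k=10):
        # Smooth r against z with a +/-k running mean, a stand-in for the
        # variable-bandwidth smoother described in Section 3.
        order = np.argsort(z)
        rs = r[order]
        n = len(r)
        out = np.empty(n)
        for i in range(n):
            out[order[i]] = rs[max(0, i - k):min(n, i + k + 1)].mean()
        return out

    def merit(a, X, r, k=10):
        # Criterion (2): fraction of so-far-unexplained variance explained.
        s = running_mean_smooth(X @ a, r, k)
        return 1.0 - np.sum((r - s) ** 2) / np.sum(r ** 2)

    def ppr(X, y, threshold=0.05, max_terms=10, n_starts=50, k=10, seed=0):
        # Greedy forward loop of Steps 1-3 with a crude random direction search.
        rng = np.random.default_rng(seed)
        r = y - y.mean()                        # Step 1: centered residuals
        terms = []
        for _ in range(max_terms):
            best_a, best_I = None, -np.inf
            for _ in range(n_starts):           # Step 2: search directions a
                a = rng.standard_normal(X.shape[1])
                a /= np.linalg.norm(a)
                I = merit(a, X, r, k)
                if I > best_I:
                    best_a, best_I = a, I
            if best_I < threshold:              # Step 3: termination test
                break
            r = r - running_mean_smooth(X @ best_a, r, k)  # residual update
            terms.append((best_a, best_I))
        return terms, r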
This procedure directly follows the successive refinement concept outlined in the previous section: the models at the mth level of the hierarchy are sums of m smooth functions of arbitrary linear combinations of the predictors.

Standard additive models approximate the regression surface by a sum of functions of the individual predictors. Such models are not completely general in that they cannot deal with interactions of predictors. Considering functions of linear combinations of the predictors removes this limitation. For example, consider a simple interaction: Y = X1 X2. A standard additive model cannot represent this multiplicative dependence; however, Y can be expressed in the form (1), with a_1 = (1/√2)(1, 1), a_2 = (1/√2)(1, -1), S_1(Z) = Z²/2, S_2(Z) = -Z²/2. The introduction of arbitrary linear combinations of predictors allows the representation of general regression surfaces.
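The decomposition can be checked numerically; a small sketch of our own construction:

    import numpy as np

    rng = np.random.default_rng(1)
    X1, X2 = rng.uniform(-1, 1, 1000), rng.uniform(-1, 1, 1000)
    z1 = (X1 + X2) / np.sqrt(2)          # a_1 . X with a_1 = (1, 1)/sqrt(2)
    z2 = (X1 - X2) / np.sqrt(2)          # a_2 . X with a_2 = (1, -1)/sqrt(2)
    recon = 0.5 * z1**2 - 0.5 * z2**2    # S_1(z) = z^2/2, S_2(z) = -z^2/2
    assert np.allclose(recon, X1 * X2)   # recovers Y = X1*X2 exactly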
3. UNIVARIATE SMOOTHING

The most common smoothers are based on local averaging of the responses ordered by the predictor,

    S(z_i) = AVE(y_j : i - k ≤ j ≤ i + k),

with suitable adjustment for the boundaries. Here AVE can denote the mean, median, or other ways of "averaging." The parameter k defines the bandwidth of the smoother.

The assumption underlying traditional smoothing procedures is that the observed responses y_i are generated according to the model y_i = f(x_i) + ε_i, with the ε_i iid, E(ε_i) = 0, and f smooth. The resulting smooth S is then taken as an estimate for f. Choosing too small a bandwidth will tend to increase the variance component of the mean squared error of the estimate, whereas too large a bandwidth may increase the bias. The optimum bandwidth will, of course, depend on f and the variance of ε, which are generally unknown. Formal methods for estimating the optimal bandwidth using cross-validation have been proposed (Wahba and Wold 1975). Often, however, the degree of smoothing is determined experimentally. One attempts to use as large a bandwidth as possible, subject to the smooth not lying systematically above or below the data in any region (oversmoothing).
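A sketch of such a fixed-bandwidth running-mean smoother and of the bias/variance trade-off just described, on simulated data of our own choosing (names and parameters are ours):

    import numpy as np

    def ave_smooth(y, k):
        # Running-mean smoother with half-width k (windows truncated at the ends).
        n = len(y)
        return np.array([y[max(0, i - k):min(n, i + k + 1)].mean()
                         for i in range(n)])

    # Illustrative trade-off for y_i = f(x_i) + eps_i with f smooth.
    rng = np.random.default_rng(2)
    x = np.linspace(0, 1, 200)
    f = np.sin(2 * np.pi * x)
    y = f + rng.normal(0, 0.3, x.size)
    for k in (2, 10, 50):      # small k: high variance; large k: high bias
        print(k, np.mean((ave_smooth(y, k) - f) ** 2))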
Our design of a smoother is guided by the fact that the model underlying traditional smoothing procedures is not appropriate here. Our model seeks to explain response variability not by just one smoothed sequence, but by a sum of smooths of several sequencings of the response (as induced by the several linear combinations of the predictors). High local variability encountered in a particular sequence may be caused by smooth dependence of the response on other linear combinations. In order to preserve the ability to fit this structure in further iterations, it is important to avoid accounting for it by spurious fits along existing directions. Consequently, we use a variable bandwidth smoother. An average smoother bandwidth is specified by the user; the actual bandwidth used for local averaging at a particular value of Z can be larger or smaller than the average bandwidth. Larger bandwidths are used in regions of high local variability of the response.

To reduce bias, especially at the ends of the sequence, we smooth by locally linear, rather than locally constant, fitting (Cleveland 1979). Furthermore, each observation is omitted from the local average that determines its smoothed value. This cross-validation makes the average squared residual a more realistic indicator of variability about the smooth (e.g., it is not possible to make the average squared residual arbitrarily small by reducing the bandwidth). To protect against isolated outliers, we use running medians of three (Tukey 1977) as a first pass in our smoother. Our smoothing algorithm thus makes four passes over the data.

The procedure can be run with or without readjustment of the smooths along previously determined linear combinations when a new linear combination has been found (backfitting). In the terminology of linear regression, this would correspond to the difference between a stepwise and a stagewise procedure. We have implemented the stepwise version.

In some situations it may be useful to restrict the search for solution directions to the set of predictors (projection selection) rather than allowing for linear combinations. Although the resulting additive model cannot represent completely general regression surfaces, it is still more general than linear regression in allowing for general smooth functions rather than only linear functions of the predictors. Projection selection is computationally less expensive than full projection pursuit, and the resulting models are often more easily interpreted. One could also run projection selection followed by projection pursuit, thereby separating the additive and interactive parts of the model. Another strategy would be to run projection pursuit, extract some easily interpreted linear combinations (as in Sec. 5, second example, with X1 - X2, X4, X5), and then run projection selection on these directions to see how much is lost. Forming a parametric model based on these directions is another possibility.
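A projection selection step can be sketched by restricting the Step 2 search to the coordinate axes; this is our own minimal rendering (function names ours), reusing a fixed-bandwidth running-mean version of criterion (2):

    import numpy as np

    def merit(a, X, r, k=10):
        # Fraction of residual variance explained by a running-mean smooth
        # of the residuals r against z = a . X (criterion (2)).
        order = np.argsort(X @ a)
        rs = r[order]
        n = len(r)
        s = np.array([rs[max(0, i - k):min(n, i + k + 1)].mean()
                      for i in range(n)])
        return 1.0 - np.sum((rs - s) ** 2) / np.sum(r ** 2)

    def projection_selection_step(X, r):
        # Evaluate the figure of merit only along the coordinate axes
        # (i.e., single predictors) and keep the best one.
        axes = np.eye(X.shape[1])
        scores = [merit(e, X, r) for e in axes]
        j = int(np.argmax(scores))
        return axes[j], scores[j]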
[Figure 1a. Y = X1X2 + ε, ε ~ N(0, .04), vs. X2. Y is plotted on the vertical axis, X2 on the horizontal axis; the + symbols represent data points, numbers indicate more than one data point, and the smooth is represented by * symbols.]

[Figure 1c. Residuals From First Solution Smooth vs. Second Solution Linear Combination a_2 · X, a_2 = (.72, -.69).]
... at the San Jose measuring station. Three projections were accepted. Figures 2a through 2c show the three final smooths (after backfitting) plotted against their corresponding linear combinations. The points plotted are obtained by adding the residuals from the final model to each smooth. The first projection (Fig. 2a) shows that a good indicator of suspended particulate matter is (standardized) temperature minus wind speed. For small values of this indicator, the amount of pollution is seen to be roughly constant; for higher values, there is a strong linear dependence. The second smooth (Fig. 2b) and the corresponding direction (essentially X4) show a much smaller pollutant dependence on 4:00 A.M. wind direction. The third projection (Fig. 2c) suggests an additional dependence on the 4:00 P.M. wind direction, but the effect, if any, is clearly small.

[Figure 2b. Air Pollution: Second Solution Smooth, a_2 = (.16, .29, .17, .91, .16), With Residuals Added.]

In order to illustrate PPR on highly structured data, which are common in the physical sciences, we apply it to data taken from a particle physics experiment (Ballam et al. 1971). This data set (500 observations) is described in Friedman and Tukey (1974). Here we study the combined energy of the three π mesons (Y) as a function of the six other variables.
[Figure 3a. Combined Energy of Three π Mesons, E_3π (particle physics data), vs. First Solution Linear Combination, a_1 = (.83, .54, .0, -.16, .0, .0), With Corresponding Smooth Found on the First Iteration.]

[Figure 3c. Particle Physics Data: Second Solution Smooth S_{a_2}, a_2 = (.0, .82, -.05, .0, -.33, .46), With Residuals Added.]
Projection selection allows for nonlinearity by modeling with general smooth functions of the predictors. Full projection pursuit allows for interactions by modeling with general smooth functions of linear combinations of the predictors.

PPR is computationally quite feasible. For increasing sample size n, dimensionality p, and number of iterations M, the computation required to construct the model grows as Mpn log(n).

As seen in the examples, an important feature of PPR is that the results of each iteration can be represented graphically, facilitating interpretation. This pictorial output can be used to adjust the main parameters of the procedure, that is, the average smoother bandwidth and the termination threshold.

The average bandwidth should be chosen as large as possible, subject to the avoidance of oversmoothing. In any projection, whether the smooth systematically deviates from the data is easily detected by visual inspection. Whether a particular projection effects a significant improvement in the model can be judged subjectively by viewing its smooth and the corresponding residuals. Lack of a systematic tendency of the smooth indicates that including this projection in the model would only increase the variance, while not reducing the bias. One can also employ a more formal procedure based on cross-validation (see Stone 1981).

The PPR procedure can clearly be applied to the residuals from any initial model. If the initial model does not fit the data well, PPR will so indicate by augmenting the model.

All stepwise procedures have difficulties modeling regression surfaces that cannot be well represented by models of low complexity in their hierarchy. Because models in PPR are sums of functions, each varying only along a single linear combination of the predictors, PPR has difficulties modeling regression surfaces that vary with equal strength along all possible linear combinations.

7. PROJECTION PURSUIT PROCEDURES

The idea of projection pursuit is not a new one. Interpreting high-dimensional data through the use of well-chosen lower-dimensional projections is a standard procedure in multivariate data analysis. The choice of a projection is usually guided by an appropriate figure of merit. If the goal is to preserve interpoint distances as well as possible, then the appropriate figure of merit is the variance of the projected data, leading to projection on the largest principal component. If the purpose is to separate two Gaussian samples with equal covariance matrices, the figure of merit is the error rate of a one-dimensional classification rule in the projection, leading to linear discriminant analysis. In both cases the figure of merit is especially simple, and the solution can be found by linear algebra.

In a similar spirit, Friedman and Tukey (1974) suggest detecting clusters by searching for clustered projections. Their figure of merit measuring the degree of clustering in a projection (P index) is too complex to be optimized by linear algebra. Instead, the optimal projection was sought by numerical optimization; this was referred to as projection pursuit. As multivariate structure often will not be completely reflected in one projection, it is important to remove structure already discovered (deflate previous optima of the figure of merit), allowing the algorithm to find additional interesting projections. Friedman and Tukey suggest splitting the data into clusters, once a clustered projection has been found, and applying the procedure to the data in each of the clusters separately.

Projection pursuit regression follows a similar prescription. It constructs a model of the regression surface based on projections of the data on planes spanned by the response Y and a linear combination a · X of the predictors. Here the figure of merit for a projection is the fraction of variance explained by a smooth of Y versus a · X. Structure is removed by forming the residuals from the smooth and substituting them for the response. The model at each iteration is the sum of the smooths that were previously subtracted and thus incorporates the structure so far found.
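For the variance criterion, for instance, the linear-algebra solution is an eigenvector computation; a brief sketch (the data and names are ours):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], 500)
    C = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(C)    # eigendecomposition of the covariance
    a = vecs[:, -1]                   # largest principal component direction
    # Projected variance is the figure of merit; 'a' maximizes it over unit
    # vectors. (np.var divides by n, np.cov by n-1, so the two nearly agree.)
    print(np.var(X @ a), vals[-1])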
[Received February 1980. Revised April 1981.]
REFERENCES

BALLAM, J., CHADWICK, G.B., GUIRAGOSSIAN, Z.C.G., JOHNSON, W.B., LEITH, D.W.G.S., and MORIGASU, J. (1971), "Van Hove Analysis of the Reactions π⁻p → π⁻π⁻π⁺p and π⁺p → π⁺π⁺π⁻p at 16 GeV/c," Physical Review D, 4, 1946-1947.

BREIMAN, L., and MEISEL, W.S. (1976), "General Estimates of the Intrinsic Variability of Data in Nonlinear Regression Models," Journal of the American Statistical Association, 71, 301-307.

CLEVELAND, W.S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74, 829-836.

FRIEDMAN, J.H. (1979), "A Tree-Structured Approach to Nonparametric Multiple Regression," in Smoothing Techniques for Curve Estimation, eds. Th. Gasser and M. Rosenblatt, New York: Springer-Verlag, 5-22.

FRIEDMAN, J.H., and TUKEY, J.W. (1974), "A Projection Pursuit Algorithm for Exploratory Data Analysis," IEEE Transactions on Computers, C-23, 881-890.

GASSER, T., and ROSENBLATT, M. (eds.) (1979), Smoothing Techniques for Curve Estimation, Lecture Notes in Mathematics 757, New York: Springer-Verlag.

ROSENBROCK, H.H. (1960), "An Automatic Method for Finding the Greatest or Least Value of a Function," Computer Journal, 3, 175-184.

SONQUIST, J. (1970), "Multivariate Model Building: The Validation of a Search Strategy," Report, Institute for Social Research, University of Michigan, Ann Arbor.

STONE, C.J. (1977), "Nonparametric Regression and Its Applications" (with discussion), Annals of Statistics, 5, 595-645.

STONE, C.J. (1981), "Admissible Selection of an Accurate and Parsimonious Normal Linear Regression Model," Annals of Statistics, 9, in press.

TUKEY, J.W. (1977), Exploratory Data Analysis, Reading, Mass.: Addison-Wesley.

WAHBA, G., and WOLD, S. (1975), "A Completely Automatic French Curve: Fitting Spline Functions by Cross-Validation," Communications in Statistics, 4, 1-17.