Chapter 8
Local Regression Models
William S. Cleveland
Eric Grosse
Local regression models provide methods for fitting regression functions, or regression surfaces, to data. Two examples are shown in Figures 8.1 and 8.2. In the first, there is one predictor, and the fitted function is the curve; in the second, there are two predictors, and the fitted surface is shown by a contour plot. The data in these figures will be explained in detail later.

A local regression model is specified through local properties of the regression surface as a function of the predictors. One basic specification is the following: for each value x of the predictors, there is a neighborhood containing x in which the surface is well approximated by a function from a specific parametric class; in this chapter, there will be two classes: linear and quadratic polynomials. Such specifications of local regression models lead to fitting methods that consist of smoothing the response as a function of the predictors; thus the fitting methods are nonparametric regression procedures. Recall that in Chapters 4 to 6, responses are modeled as parametric functions of the predictors; there, a regression surface of two or more predictors is modeled by additive functions of the predictors.
When a local regression model is an adequate description of the data, these methods provide good estimation properties through the various specifications of the approximating surface. The sections that follow give the details: the fitting method, called loess, is defined, and examples show how the data are analyzed in practice.
8.1 Statistical Models and Fitting
8.1.1 Definition of Local Regression Models
Suppose that for each i from 1 to n, y_i is a measurement of the response and x_i is a vector of measurements of p predictors. In a local regression model, the response and predictors are related by

y_i = g(x_i) + ε_i,

where g is the regression surface and the ε_i are random errors. For any x in the space of the predictors, g(x) is the expected value of the response given the predictors are x. Properties of the regression surface and of the errors are not specified globally; rather, local specifications are made about them. We will now discuss the specifications that are allowable; the S functions and objects that implement them are described in Section 8.2.
Specification of the Errors
In all cases, we suppose that the ε_i are independent random variables with mean 0. One of two families of probability distributions can be specified. The first is the Gaussian. The second is the symmetric family, which is meant to cover the situation where the errors have a distribution with tails longer than those of the normal (leptokurtosis), and which leads to robust methods of estimation.

We specify properties of the variances of the ε_i in one of two ways. The first is simply that they are a constant, σ². The second is that a_i^(1/2) ε_i has constant variance σ², where the a priori weights a_i are positive and known.
Specification of the Surface
Suppose first that all predictors are numeric. For each x in the space of the predictors, the surface is approximated locally within a neighborhood of x; the sizes of the neighborhoods are determined by the loess fitting method defined in Section 8.1.2. Size, of course, depends on distance. For one numeric predictor, distances are ordinary distances on the line; for two or more numeric predictors, we will use Euclidean distance, and the shapes of the neighborhoods are affected by whether we normalize the scales of the numeric predictors. We will elaborate on this later.

We will allow the specification of one of two classes of parametric functions: linear polynomials (λ = 1) and quadratic polynomials (λ = 2). Suppose there are two predictors, u and v. If we specify linear, the class consists of three monomials: a constant, u, and v. If we specify quadratic, the class is made up of six monomials: a constant, u, v, uv, u², and v².

If λ = 2 and there are two or more numeric predictors, we can specify that the squares of certain predictors be dropped from the class. For example, if the squares of u and v are both dropped, the class consists of four monomials: a constant, u, v, and uv.

Finally, we can specify that the surface is a conditionally parametric function of a proper subset of the predictors; that is, given the values of the predictors not in the subset, the surface belongs to the parametric class as a function of the subset.
Suppose now that one or more of the predictors are factors. A combined factor is formed by crossing them: its levels are the combinations of the levels of the individual factors. The surface specifications above then apply separately for each level of the combined factor, so the fitting divides the data into subsets accordingly. Making such specifications in practice rests, of course, on exploration of the data, and the examples will show how this is done.

To summarize, a local regression model is specified by choosing:
• a Gaussian or a symmetric error distribution;
• constant variance or a priori weights;
• locally linear or locally quadratic fitting in the numeric predictors;
• the neighborhood size;
• normalization of the scales;
• dropping of squares;
• a conditionally parametric subset.
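Each item in this list corresponds to an argument of the loess() fitting function described in Section 8.2. As a preview, a single call touching all of them might look like the following sketch; the response y, the predictors u and v, and the weight vector a are hypothetical names, not objects from this chapter:

> loess(y ~ u + v, family = "symmetric", weights = a, span = 3/4,
+     degree = 2, normalize = T, drop.square = "u", parametric = "u")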
Identically Distributed, Gaussian Errors: One Numeric Predictor

Suppose the errors are identically distributed Gaussian errors with constant variance σ², and that there is a single numeric predictor. The loess fit ĝ(x) of g at a specific value x of the predictor is computed as follows. Let Δ_i(x) = |x_i − x| be the distance from x to x_i, and let Δ_(i)(x) be the values of these distances ordered from smallest to largest. Let

T(u; t) = (1 − (u/t)³)³  for 0 ≤ u < t,  and 0 for u ≥ t,

be the tricube weight function.

The smoothness of a loess fit depends on the specification of the neighborhood parameter, α > 0. As α increases, ĝ becomes smoother. Suppose α ≤ 1 and, for simplicity, that q = αn is an integer. We define a weight for each observation by

w_i(x) = T(Δ_i(x); Δ_(q)(x)),

where Δ_(q)(x), the qth-smallest distance, will be called the neighborhood distance. If we have specified locally linear fitting, that is, if λ is 1, a linear polynomial is fitted to the data by weighted least squares with the weights w_i(x); if λ is 2, the fit is a quadratic. The value of the fitted polynomial at x is ĝ(x). For α > 1, the weights are defined in the same manner, but Δ_(q)(x) is replaced by Δ_(n)(x) α.
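To make the definition concrete, here is a minimal sketch of the one-predictor computation in S-style code. It is an illustration of the definition only, not the actual loess implementation, which uses the faster computational methods of Section 8.4:

loess.at <- function(x0, x, y, alpha = 0.75, lambda = 1)
{
    n <- length(x)
    d <- abs(x - x0)                      # distances Delta_i(x0)
    dq <- sort(d)[ceiling(alpha * n)]     # neighborhood distance Delta_(q)(x0)
    w <- ifelse(d < dq, (1 - (d/dq)^3)^3, 0)   # tricube weights
    form <- if (lambda == 1) y ~ x else y ~ x + I(x^2)
    fit <- lm(form, weights = w)          # local weighted least squares
    unname(predict(fit, data.frame(x = x0)))   # ghat(x0)
}

Evaluating loess.at() over a grid of x0 values traces out the fitted curve.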
Identically Distributed, Gaussian Errors: Two or More Numeric Predictors

We continue to suppose the errors are identically distributed and Gaussian. The one additional issue that needs to be addressed for p numeric predictors with p > 1 is the notion of distance in the space of the predictors. Suppose x is a value in the space. To define neighborhood weights we need to define the distance, Δ_i(x), from x to x_i, the ith observation of the predictors. We will use Euclidean distance, but the x_i do not have to be the raw measurements. Typically, it makes sense to take x_i to be the raw measurements normalized in some way. We will normalize the predictors by dividing them by their 10% trimmed sample standard deviation, and call this the standard normalization. There are, however, situations where we might choose not to normalize; for example, if the predictors represent position in space.

Armed with the Δ_i(x), the loess fitting method for p > 1 is just an obvious generalization of the one-predictor method. For α ≤ 1, neighborhood weights, w_i(x), are defined using the same formulas used for one predictor; thus, if λ = 1, we fit a linear polynomial in the p predictors by weighted least squares, and if λ = 2, a quadratic. For α > 1, the one change is that Δ_(q)(x) is replaced by Δ_(n)(x) α^(1/p).
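The standard normalization is easy to compute directly. A minimal sketch follows; the exact trimming convention is an assumption here (5% removed from each tail, so that 10% of the data are trimmed in all), and x1 and x2 are hypothetical predictor vectors:

trimmed.sd <- function(x, trim = 0.05)
{
    q <- quantile(x, c(trim, 1 - trim))
    sqrt(var(x[x >= q[1] & x <= q[2]]))   # sd of the central 90% of the data
}
x1.n <- x1 / trimmed.sd(x1)   # divide each predictor by its
x2.n <- x2 / trimmed.sd(x2)   # trimmed standard deviation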
Dropping Squares and Conditionally Parametric Fitting for Two or More Predictors

Suppose λ has been specified to be 2. Suppose, in addition, that we have specified the squares of certain predictors to be dropped. Then those monomials are not used in the local fitting.

Suppose a proper subset of the predictors has been specified to be conditionally parametric. Then we simply ignore these predictors in computing the Euclidean distances that are used in the definition of the neighborhood weights. It is an easy exercise to show that this results in a conditionally parametric fit.
Symmetric Errors and Robust Fitting
Suppose the ε_i have been specified to have a symmetric distribution. Then we modify the loess fitting procedures to produce a robust estimate, one that is not adversely affected if the errors have a long-tailed distribution, but that loses little efficiency in the Gaussian case.

The loess robust estimate begins with the Gaussian-error estimate. Then the residuals

ε̂_i = y_i − ĝ(x_i)

are computed. Let

B(u; b) = (1 − (u/b)²)²  for |u| < b,  and 0 for |u| ≥ b,

be the bisquare weight function, and let

m = median |ε̂_i|

be the median absolute residual. The robustness weights are

r_i = B(ε̂_i; 6m).

An updated estimate, ĝ(x), is computed using the local fitting method, but with the neighborhood weights w_i(x) replaced by r_i w_i(x); in this way, observations with large residuals receive reduced weight. Then new residuals are computed, the robustness weights are updated, and the estimate is refit; the robust estimate is the result of repeating this process several times.
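The iteration can be expressed directly in terms of the fitting function. A minimal sketch follows (one predictor; y and x are a hypothetical response and predictor, and fixing four iterations is an assumption). In practice, specifying family = "symmetric" in loess() performs this iteration internally:

bisquare <- function(u, b) ifelse(abs(u) < b, (1 - (u/b)^2)^2, 0)
fit <- loess(y ~ x, degree = 2)        # initial Gaussian-error fit
for (it in 1:4) {
    eps <- y - fitted(fit)             # residuals
    m <- median(abs(eps))              # median absolute residual
    r <- bisquare(eps, 6 * m)          # robustness weights
    fit <- loess(y ~ x, degree = 2, weights = r)   # refit with r_i w_i(x)
}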
Factor Predictors
Suppose now that the model has one or more factor predictors. These are accommodated in the fitting by dividing the data into subsets, one for each combination of levels, and fitting separately to each subset; the subsets do not affect one another in the fitting in any way. However, if the error distribution is specified to be symmetric, the residuals from all subsets are pooled in forming the median absolute residual.
Errors with Unequal Scales
Suppose we specify that a_i^(1/2) ε_i have constant variance σ². Then in the fitting, the neighborhood weights w_i(x) are replaced by a_i w_i(x), but everything else about the fitting remains the same.
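In the fitting function of Section 8.2, the a priori weights are passed through the weights argument; for example (a sketch, with y, u, v, and the weight vector a as hypothetical names):

> loess(y ~ u + v, span = 0.5, degree = 2, weights = a)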
8.2 S Functions and Objects
In this section we will fit local regression models to data and then carry out graphical diagnostics to check the adequacy of the fitted models. Our goal is to show how the data are analyzed in practice using S, and how each dataset presents a different challenge. We begin, however, by rapidly running through the S functions for fitting and inference to give an overview.

The basic modeling function is loess(). Let's apply it to some concocted data in the data frame madeup, which has two numeric predictors:
> names(madeup)
[1] "response" "one"      "two"
> attach(madeup)
We will fit a Gaussian model with the smoothing parameter, α, equal to 0.5 and the degree, λ, of the locally fitted polynomial equal to 2:

> madeup.m <- loess(response ~ one + two, span = 0.5, degree = 2)
> madeup.m
Call:
loess(formula = response ~ one + two, span = 0.5, degree = 2)

Number of Observations:          100
Equivalent Number of Parameters: 14.9
Residual Standard Error:         0.9698
Multiple R-squared:              0.76
Residuals:
    min    1st Q  median   3rd Q    max
 -2.289  -0.5064  0.1243  0.7359  2.357
Notice that the printing shows the equivalent number of parameters; this measure of the amount of smoothing, which is defined in Section 8.4, is analogous to the number of parameters in a parametric fit. Also shown is an estimate of σ, the standard error of the residuals. Let's update the fit by dropping the square of the first predictor and making it conditionally parametric:
> madeup.new <- update(madeup.m, drop.square = "one",
+     parametric = "one")
> madeup.new
Call:
loess(formula = response ~ one + two, span = 0.5, degree = 2,
    parametric = "one", drop.square = "one")

Number of Observations:          100
Equivalent Number of Parameters: 6.9
Residual Standard Error:         2.40
Multiple R-squared:              0.56
Residuals:
 ...
Figure 8.5: Residuals against E with a scatterplot smoothing (first fit to gas).
values and, again, no convincing dependence was found. To check the assumption of a Gaussian distribution of the errors, we will make a Gaussian probability plot of the residuals. To help judge the straightness of the points on such plots, we will write a little function that draws a line through the lower and upper quartiles:
> qqline <- function(data)
{
    data.quartiles <- quantile(data, c(0.25, 0.75))
    norm.quartiles <- qnorm(c(0.25, 0.75))
    b <- (data.quartiles[2] - data.quartiles[1]) /
        (norm.quartiles[2] - norm.quartiles[1])
    a <- data.quartiles[1] - norm.quartiles[1] * b
    abline(a, b)
}
Now we make the plot and add the line:

> qqnorm(residuals(gas.m))
> qqline(residuals(gas.m))
Figure 8.6: Residuals against E with a scatterplot smoothing (second fit to gas).
The result, shown in Figure 8.8, suggests that the Gaussian specification is justified, which allows us to carry out statistical inference based on the assumption of normality. First, we compute 99% pointwise confidence intervals:
> pointwise(predict(gas.m, se.fit = T), coverage = .99)
$upper:
 [1] ...

$fit:
 [1] ...

$lower:
 [1] ...

Figure 8.7: Square-root absolute residuals against fitted values with a scatterplot smoothing.

The function pointwise() computes the limits from the fit and the standard errors returned by predict(). The function plot(), which was used earlier to plot the curve, will also compute and draw confidence intervals on the plot, as shown in Figure 8.9:

> plot(gas.m, confidence = 7)

The limits are computed at 7 equally spaced points from the minimum to the maximum of E; thus, the limits that are plotted in Figure 8.9 are pointwise intervals at those points.
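The same limits can also be computed by hand from the standard errors that predict() returns. A sketch follows; the component names ($fit, $se.fit, $df) are those of R's predict.loess, which is an assumption about the version in use:

p <- predict(gas.m, se = TRUE)    # fit, standard errors, look-up df
tq <- qt(0.995, p$df)             # two-sided 99% t quantile
upper <- p$fit + tq * p$se.fit
lower <- p$fit - tq * p$se.fit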
We know from the diagnostics that the fit with span = 1 does not fit the data, but for purposes of illustration we will carry out a statistical comparison of the two models:
Figure 8.8: Gaussian quantile plot of residuals with a line passing through the lower and upper quartiles.
Call:
loess(formula = NOx ~ E, span = 2/3, degree = 2)

Number of Observations:          22
Equivalent Number of Parameters: 5.5
Residual Standard Error:         0.3406
Multiple R-squared:              0.96
Residuals:
     min   1st Q   median   3rd Q     max
 -0.8606  -0.213  0.02811  0.1271  0.6234

Call:
loess(formula = NOx ~ E, span = 1, degree = 2)

Number of Observations:          22
...
In going from a span of 2/3 to a span of 1, the equivalent number of parameters drops, but the residual standard error increases because of the resulting lack of fit. We can test the null model against the alternative one:

> anova(gas.m.null, gas.m)
Model 1:
loess(formula = NOx ~ E, span = 1, degree = 2)
Model 2:
loess(formula = NOx ~ E, span = 2/3, degree = 2)

Analysis of Variance Table
     Npar     RSS    Test  F Value     Pr(F)
1     3.5   4.689  1 vs 2    10.14  0.000861
2     6.5   1.7760

The result, as expected, is highly significant.

8.2.2 Ethanol Data
The experiment that produced the gas data we just analyzed was also run with gasoline replaced by ethanol. There are 88 runs and two predictors: E, as before, and C, the compression ratio of the engine. The data are in ethanol. These data were analyzed previously, and it was discovered that an additive fit did not approximate the regression surface adequately because of an interaction between C and E. To make typing easier, we will attach the data frame:

> attach(ethanol)
Exploratory Data Display
A useful display for starting an analysis with two or more predictors is the scatterplot matrix, shown in Figure 8.10.
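A display like Figure 8.10 can be produced with the standard scatterplot-matrix function; the exact call used for the figure is not preserved in this excerpt, so this is a sketch:

> pairs(ethanol)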
Figure 8.10: Ethanol data: scatterplot matrix of NOx, C, and E.
The matrix shows that the two predictors are nearly uncorrelated and that C takes on one of five values. The intervals are shown on the given panel; as we move through the intervals, we move from left to right and then from bottom to top through the dependence panels. The fraction of points shared by successive intervals is specified by overlap = 1/4, and the endpoints of the intervals appear in the given panel at the left.
8.3.3 Graphics
In some cases, enough computation is done by plot() for loess objects that we want to save the results, particularly if confidence intervals are computed, for future renderings of the graph. This can be done by first calling the function preplot(), and then plotting by plot():

> ethanol.plot <- preplot(ethanol.cp, confidence = 7)
> plot(ethanol.plot)
8.4 Statistical and Computational Methods
In Section 8.4.1 we discuss the statistical methods that underlie the fitting of local regression models, together with the methods of inference that accompany the fits. In Section 8.4.2, we discuss computational methods that underlie loess fitting. To keep the discussion from becoming overly complicated, we suppose throughout that the predictors are all numeric; extending the results to models with factor predictors is obvious.
8.4.1 Statistical Inference
Initially, we will suppose that the errors have been specified to be Gaussian and the variances have been specified to be constant.

One important property of a Gaussian-error loess estimate, ĝ(x), is that it is linear in y; that is,

ĝ(x) = Σ_{i=1}^n l_i(x) y_i,

where the l_i(x) do not depend on the y_i. This linearity results in distributional properties of the estimate that are very similar to those for classical parametric fitting.
Suppose that the diagnostic methods have been applied and have revealed no lack of fit in ĝ(x); we will take this to mean that the bias in ĝ(x) is negligible, so that E ĝ(x) = g(x). Suppose, further, that diagnostic checking has verified the specifications of the error terms of the model.
Estimation of σ²

Since ĝ(x) is linear in the y_i, the fitted value at x_i can be written

ŷ_i = Σ_{j=1}^n l_j(x_i) y_j.

Let L be the n × n matrix whose (i, j)th element is l_j(x_i), so that ŷ = Ly, and let I be the n × n identity matrix. For k = 1 and 2, let

δ_k = tr [(I − L)ᵀ(I − L)]^k.

We estimate σ² by

σ̂² = Σ_{i=1}^n ε̂_i² / δ_1.
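Because the fit is linear in y, the matrix L, and from it δ_1, δ_2, and σ̂², can be recovered numerically by smoothing unit vectors: the fit to the jth unit vector is the jth column of L. A sketch, using R-style loess on a small hypothetical dataset:

x <- seq(0, 1, length = 50)                # hypothetical predictor
y <- sin(2 * pi * x) + rnorm(50, sd = 0.2) # hypothetical response
n <- length(y)
L <- matrix(0, n, n)
for (j in 1:n) {
    e <- rep(0, n); e[j] <- 1              # jth unit vector
    L[, j] <- fitted(loess(e ~ x, span = 0.75, degree = 2))
}
M <- crossprod(diag(n) - L)        # (I - L)'(I - L)
delta1 <- sum(diag(M))             # tr (I - L)'(I - L)
delta2 <- sum(M^2)                 # tr [(I - L)'(I - L)]^2, M symmetric
sigma2.hat <- sum((y - L %*% y)^2) / delta1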
Confidence Intervals for g(x)

Since ĝ(x) is linear in the y_i, the standard deviation of ĝ(x) is

s(x) = σ (Σ_{i=1}^n l_i²(x))^(1/2).

We estimate s(x) by

ŝ(x) = σ̂ (Σ_{i=1}^n l_i²(x))^(1/2).

Let ρ = δ_1²/δ_2. The distribution of

(ĝ(x) − g(x)) / ŝ(x)

is well approximated by a t distribution with ρ degrees of freedom, so we can use this distribution to form confidence intervals for g(x) based on ĝ(x). Notice that ρ, the look-up degrees of freedom, plays the role that the residual degrees of freedom play in classical parametric fitting. There, the two quantities δ_1 and δ_1²/δ_2 are equal; for loess, they are typically close to one another, but not close enough that the difference can be ignored in forming confidence intervals.
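Continuing the sketch above, pointwise 95% limits at the observed x_i follow directly; again, this is an illustration of the formulas, not the internal loess code:

rho <- delta1^2 / delta2                  # look-up degrees of freedom
s.hat <- sqrt(sigma2.hat * rowSums(L^2))  # s(x_i): sum_j l_j(x_i)^2 per row
ghat <- as.vector(L %*% y)                # fitted values
upper <- ghat + qt(0.975, rho) * s.hat
lower <- ghat - qt(0.975, rho) * s.hat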
Analysis of Variance for Nested Models
We can use the analysis of variance to test a null local regression model against an alternative one. Let the parameters of the null model be α⁽⁰⁾ and λ⁽⁰⁾, and let the parameters of the alternative model be α and λ. For the test to make sense, the null model should be nested in the alternative; we will define this concept shortly. Let rss be the residual sum-of-squares of the alternative model, and let rss⁽⁰⁾ be the residual sum-of-squares of the null model. Let δ_k⁽⁰⁾ and δ_k, for k = 1 and 2, be the δ quantities of the two models, and let ν_k = δ_k⁽⁰⁾ − δ_k. The test statistic, which is analogous to that for the analysis of variance in the parametric case, is

F̂ = ((rss⁽⁰⁾ − rss)/ν₁) / (rss/δ₁).

F̂ has a distribution that is well approximated by an F distribution with denominator degrees of freedom ρ, defined earlier, and numerator look-up degrees of freedom ν = ν₁²/ν₂.

The null model being nested in the alternative expresses the idea that the alternative is capable of capturing any effect that the null can capture. Here is a precise specification of when it makes sense to use F̂ to compare two models: the null is nested in the alternative if the following conditions hold.

• The null and alternative models have the same neighborhood variables.
• The fitting variables of the null model are a subset of the fitting variables of the alternative model.
• If the square of a numeric predictor is dropped from the alternative model, then it must not be present in the null model; the converse need not be true.

The Equivalent Number of Parameters

Since a loess fit is linear in the y_i, an equivalent number of parameters can be defined by analogy with parametric fitting; if the ŷ_i are the fitted values, one such definition is

μ = Σ_{i=1}^n var(ŷ_i)/σ² = tr LᵀL.

The equivalent number of parameters depends on all of the neighborhood and fitting variables. However, having selected all of these factors except α, we can achieve, approximately, a desired value γ by taking α to be 1.2 r/γ, where r is the number of fitting variables.
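Given the δ quantities and residual sums-of-squares of the two fits, computed as in the earlier sketch, the test works out as follows (the variable names, with .0 and 0 marking the null model, are illustrative):

nu1 <- delta1.0 - delta1
nu2 <- delta2.0 - delta2
Fhat <- ((rss0 - rss)/nu1) / (rss/delta1)
# reference distribution: F with nu1^2/nu2 and delta1^2/delta2 df
p.value <- 1 - pf(Fhat, nu1^2/nu2, delta1^2/delta2)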
Symmetric Errors
When the error distribution is specified to be symmetric, inferences are based on pseudo-values. Let r_i and m be the robustness weights and the median absolute residual from the final update of the fit, ĝ(x). The pseudo-values are

ỹ_i = ŷ_i + r_i ε̂_i,

where the ŷ_i are the fitted values, the ε̂_i are the residuals, and r_i = B(ε̂_i; 6m). Inferences are carried out by applying the inference procedures of the Gaussian case to the pseudo-values. For example, suppose we want to compute a confidence interval for g(x); we apply the Gaussian-error procedure with the observations of the response, y_i, replaced by the pseudo-values, ỹ_i. For symmetric error distributions, the coverage using this procedure is well approximated by the nominal coverage.
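Mirroring the formula above, the pseudo-values are immediate to compute once the final robust fit is in hand. A sketch, reusing the bisquare() helper from the robust-fitting sketch in Section 8.1; yhat and eps stand for the fitted values and residuals of the final iteration:

m <- median(abs(eps))         # median absolute residual
r <- bisquare(eps, 6 * m)     # robustness weights
ytilde <- yhat + r * eps      # pseudo-values
# Gaussian-case inference is then applied with y replaced by ytilde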