On Asymptotic Distribution Theory in Segmented Regression Problems

This document summarizes Paul Feder's 1975 paper on asymptotic distribution theory in segmented regression problems where the regression function changes form across different domains of the independent variable. Feder shows that when the regression function consists of different linear models in different segments, the least squares estimation problem can be transformed into a new problem where classical techniques can be applied. He then derives the asymptotic distribution of the estimators for this new problem and shows that the results also apply to the original segmented regression problem. Feder focuses on the case where the segments are all linear models but notes the techniques should extend to some nonlinear cases as well.


On Asymptotic Distribution Theory in Segmented Regression Problems -- Identified Case

Author(s): Paul I. Feder


Source: The Annals of Statistics, Vol. 3, No. 1 (Jan., 1975), pp. 49-83
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2958079
Accessed: 15/09/2012 07:38


ON ASYMPTOTIC DISTRIBUTION THEORY IN SEGMENTED
REGRESSION PROBLEMS -- IDENTIFIED CASE¹

BY PAUL I. FEDER

General Electric Company
This paper deals with the asymptotic distribution theory of least squares estimators in regression models having different analytical forms in different regions of the domain of the independent variable. An important special case is that of broken line regression, in which each segment of the regression function is a different straight line. The residual sum of squares function has many corners, and so classical least squares techniques cannot be directly applied. It is shown, however, that the problem can be transformed into a new problem in which the sum of squares function is locally smooth enough to apply the classical techniques. Asymptotic distribution theory is discussed for the new problem and it is shown that the results are also valid for the original problem. Results related to the usual normal theory are derived.

1. Introduction. Frequently in regression problems, a model is assumed which supposes that the regression function is of a single parametric form throughout the entire domain of interest. However, in many problems it is necessary to consider regressions which have different analytical forms in different regions of the domain. An important special case is that of broken line regression, in which each segment is a different straight line. Dunicz [5] provides an example in which such a model naturally arises in a chemical process. Sprent [23] gives examples from agriculture and biology where such regression models are appropriate.

One class of segmented models consists of functions where each segment is in the form of a linear model. Robison [21] gives procedures for obtaining confidence intervals when the regression function is one polynomial, p_1(t), for t ≤ τ and a second polynomial, p_2(t), for t > τ, with p_1(τ) = p_2(τ). However, he assumes that it is known between which observation points τ lies, and furthermore does not restrain his estimate of τ to lie between these appropriate observation points. Quandt [19, 20] discusses methods of estimating the coefficients of segmented regression functions and heuristically obtains from sampling experiments a sampling distribution of the likelihood ratio statistic for the test of no switch in the form of the regression. Farley, Hinich, and McGuire [6, 7] propose a routine screening procedure to detect parameter instability in time series regression models.

Received July 1970; revised February 1974.
¹ This research was supported at Yale University by the Army, Navy, Air Force and NASA under a contract administered by the Office of Naval Research, at Princeton University under Army Research Office Contract DA-31-124-ARO-(D)-215, and at General Electric Corporate Research and Development. This is a generalization of a part of the author's doctoral dissertation written at Stanford University under the support of National Science Foundation Grant GP-5705.
AMS 1970 subject classifications. 62E20, 62J05.
Key words and phrases. Regression, segmented, least squares estimation, asymptotic theory, splines.
In recent years mathematicians interested in approximation theory have devoted much attention to the theory of spline functions. Spline functions consist of polynomial segments. They are continuous and usually continuously differentiable; however, they have discontinuous higher order derivatives at the change-over points between the polynomial segments. The class of spline functions is a considerable extension of the class of polynomials. Splines have been found very useful for approximation and interpolation. See Greville [10] and Schoenberg [22] for details. In the theory of spline approximation, the change-over points between segments (or knots) are chosen merely for analytical convenience, whereas in segmented regression theory, the change-over points usually have intrinsic physical meaning in that they correspond to structural changes in the underlying model. However, the technical problems are the same in both situations. Poirier [17] relates spline theory and segmented regression theory. He develops tests to detect structural changes in the model and to decide whether certain of the model coefficients vanish. However, he makes the simplifying and restrictive assumption that the locations of the change-over points between segments are known.
The principal difficulties in the estimation problem occur when it is not known between which consecutive observations of the independent variable the change-over points lie. If for each k it is known that τ_k, the kth change-over point, lies between the successive observations t_{j(k)} and t_{j(k)+1}, then the fitting problem is relatively simple. For each admissible set of change-over points τ_1, ..., τ_{r-1} one obtains separate least squares fits (functions of τ_1, ..., τ_{r-1}) within each segment, subject only to the restraint of continuity at the change-over points. This can be readily accomplished by the use of Lagrange multipliers. One then chooses that set of admissible τ_k's for which the best fit is obtained. This is not too difficult a job computationally if the set of admissible τ's is relatively small. On the other hand, if the restraining region is large, the problem is very likely to reduce to that of fitting separate regressions in each segment, without imposing any of the constraints. The asymptotic distribution theory then depends on the magnitudes of t_{j(k)+1} - t_{j(k)} as compared with n^{-1/2}.
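The fitting procedure described above can be sketched numerically. The following is an illustrative sketch only, not the paper's algorithm: for a two phase broken line, the basis {1, t, (t - τ)_+} builds the continuity restraint directly into the parametrization (so no Lagrange multipliers are needed in this special case), and each candidate change-over point then requires only an ordinary least squares fit. The data-generating values (τ = 0.4, slopes 2 and -3, noise level 0.05) are hypothetical.

```python
import numpy as np

def fit_broken_line(t, x, tau):
    """Least squares fit of a continuous two phase broken line with
    change-over point tau.  The basis {1, t, (t - tau)_+} makes the
    fitted function automatically continuous at tau."""
    A = np.column_stack([np.ones_like(t), t, np.maximum(t - tau, 0.0)])
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    rss = float(np.sum((x - A @ coef) ** 2))
    return coef, rss

def best_tau(t, x, taus):
    """Grid search: choose the admissible change-over point with best fit."""
    return min(taus, key=lambda tau: fit_broken_line(t, x, tau)[1])

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
mu = np.where(t <= 0.4, 1.0 + 2.0 * t, 1.8 - 3.0 * (t - 0.4))
x = mu + 0.05 * rng.standard_normal(t.size)
tau_hat = best_tau(t, x, np.linspace(0.05, 0.95, 181))
```

The grid search mirrors the "choose the set of admissible τ_k's for which the best fit is obtained" step; with several change-over points the same idea applies over a multidimensional grid.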
In 1966, Hudson [14] considered the problem of obtaining computational procedures for the least squares fit of a continuous, segmented, linear regression function when no prior knowledge is assumed regarding the location of the change-over points. Bellman and Roth [1] applied dynamic programming methods to this problem. In the present paper, asymptotic distribution theory of the least squares estimates is discussed for models such as those considered by Hudson.

In 1965 and 1972 Sylwester [24, 25] considered the case of two straight line segments and one unknown change-over point. In 1967 Feder [8] treated the case where all the segments are dth degree polynomials differing only in their linear term. The present paper updates these results to treat a considerably larger class of models. Feder [9] considered the problem of likelihood ratio testing in segmented regression models. He showed by example that the asymptotic null distribution of the likelihood ratio test that a two-segment model in fact consists of just one segment is not unique but depends on the spacing of the independent variables. Hinkley [11] considered the case of two straight line segments and reports, on empirical grounds, that the asymptotic normality of the estimates of (τ, θ, σ²) derived in this paper may not be an adequate approximation for moderate sample sizes. He presents an informal argument to derive alternative approximations. Hinkley [13] considered the problem of estimating and making inferences about the point of change of distribution in a sequence of random variables, which is related to the problem of the present paper.

For reasons of simplicity and to avoid peripheral issues, this paper confines attention to the case in which all segments of the regression function are in the form of linear models. However, the techniques employed should suffice, by use of appropriate Taylor expansions, to handle many cases in which the segments are nonlinear.
2. Definition of the model and summary of results. Consider an r phase, segmented regression function of the form

(2.1)  μ(ξ; t) = f_1(θ_1; t)   for A ≡ τ_0 ≤ t ≤ τ_1
              = f_2(θ_2; t)   for τ_1 < t ≤ τ_2
                ...
              = f_r(θ_r; t)   for τ_{r-1} < t ≤ B ≡ τ_r.

This can be compactly represented as

(2.2)  μ(ξ; t) = Σ_{j=1}^r f_j(θ_j; t) I_j(t)

where I_j(t) is the indicator function of the interval [τ_{j-1}, τ_j). It is assumed that μ(ξ; t) is continuous at t = τ_j, j = 1, ..., r - 1. In this model, A and B are known constants (assumed 0, 1 without loss of generality) and θ_1, ..., θ_r, τ_1, ..., τ_{r-1} are unknown parameters. Boldface symbols will represent vectors and matrices. Let θ = (θ_1, ..., θ_r), τ = (τ_1, ..., τ_{r-1}), and ξ = (θ, τ). Assume that for each j, θ_j = (θ_{j1}, ..., θ_{jK(j)}) is a K(j) dimensional vector and that there exist known functions f_{j1}(t), ..., f_{jK(j)}(t) such that

(2.3)  f_j(θ_j; t) = Σ_{k=1}^{K(j)} θ_{jk} f_{jk}(t)

where {f_{jk}(t)} are linearly independent functions on the interval [τ_{j-1}, τ_j].

In addition, it will be assumed that there exists an s < ∞ such that any linear combination of the functions {f_{jk}(t)} has at most s sign changes in derivative on the interval [0, 1]. This condition is satisfied by most functions usually encountered, such as polynomials, sines and cosines, and exponentials.
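As a concrete reading of (2.2) and (2.3), the sketch below evaluates a segmented regression function from per-segment basis functions and coefficient vectors. The two phase broken line used as the example (change-over point τ_1 = 0.5, with coefficients chosen so the segments join continuously) is hypothetical.

```python
import numpy as np

def mu(t, taus, thetas, bases):
    """Evaluate mu(xi; t) = sum_j f_j(theta_j; t) I_j(t) of (2.2),
    where each segment f_j(theta_j; t) = sum_k theta_jk f_jk(t) is a
    linear model as in (2.3)."""
    j = int(np.searchsorted(taus, t, side="right"))  # segment containing t
    return sum(c * f(t) for c, f in zip(thetas[j], bases[j]))

# hypothetical two phase broken line on [0, 1], continuous at tau_1 = 0.5
taus = [0.5]                             # interior change-over points
bases = [[lambda t: 1.0, lambda t: t],   # f_11 = 1, f_12 = t
         [lambda t: 1.0, lambda t: t]]   # f_21 = 1, f_22 = t
thetas = [[0.0, 2.0],                    # segment 1: mu = 2t on [0, 0.5)
          [1.0, 0.0]]                    # segment 2: mu = 1  on [0.5, 1]
```

The continuity restraint of the model is the requirement that adjacent segments agree at each τ_j, which here forces 2 · 0.5 = 1.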
Let Θ denote the set of "admissible" vectors θ. That is, Θ is the collection of θ's which lead to functions μ(ξ; t) satisfying the continuity restraints. For each θ ∈ Θ consider the set of τ's (depending on θ) which lead to functions μ(ξ; t) satisfying the continuity restraints. Form the vectors ξ = (θ, τ(θ)) = (θ, τ). Let Ξ denote the set of these ξ's and let U = {μ(ξ; t): ξ ∈ Ξ}. Throughout the discussion attention will be confined to θ's in Θ and to ξ's in Ξ.

For given n, assume that n observations, X_{n1}, ..., X_{nn}, are taken where

(2.4)  X_{ni} = μ(ξ^(0); t_{ni}) + e_{ni}.

Assume that the observation errors, e_{ni}, are independently and identically distributed with E(e_{ni}) = 0, Var(e_{ni}) = σ², unknown, and E|e_{ni}|^{2+δ} < ∞ for some δ > 0. Let ξ^(0) = (θ^(0), τ^(0)) and let s_0 = (ξ^(0), σ²) = (θ_1^(0), ..., θ_r^(0), τ_1^(0), ..., τ_{r-1}^(0), σ²) denote the true state of nature.
Asymptotic properties of the least squares estimators of s_0 will be examined. Since the change-over points τ_1^(0), ..., τ_{r-1}^(0) are unknown, the derivation of asymptotic properties is considerably more involved than one might at first expect. For each fixed τ, the estimate θ̂ (a function of τ) is chosen to minimize

(2.5)  s(ξ) = (1/n) Σ_{i=1}^n (X_{ni} - μ(ξ; t_{ni}))²

subject to the continuity restraints. The residual mean square then can be expressed as s(τ). It is necessary to obtain the minimum of this function. There is no guarantee that s(ξ) possesses asymptotically even one continuous derivative in the neighborhood of τ^(0). For instance, if f_j(θ_j; t) = a_j + b_j t, b_j ≠ b_{j+1}, then s(ξ) and s(τ) possess discontinuities in the τ derivatives along τ_j = t_{ni} for each i, j.

Consider for example the simple 2 phase model where μ(ξ; t) = c for 0 ≤ t ≤ τ and t for τ < t ≤ 1. (See Fig. 2.1.) Here, θ is the scalar c and τ = c. From equation (2.5)

s(ξ) ≡ s(c, c) = (1/n)[ Σ_{t_{ni} ≤ c} (X_{ni} - c)² + Σ_{t_{ni} > c} (X_{ni} - t_{ni})² ].

[FIG. 2.1: the two phase regression function μ(ξ; t). FIG. 2.2: the residual mean square s(c), which has a corner at each observation point t_{n1}, t_{n2}, ..., t_{nn}.]
Thus for all c's between t_{nj} and t_{n,j+1},

∂s(c)/∂c = -(2/n) Σ_{i=1}^j (X_{ni} - c),   ∂²s(c)/∂c² = 2j/n.

It is seen (Fig. 2.2) that there is a discontinuity in derivative at each observation point, t_{ni}.
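The corners of s(c) can be exhibited numerically. The sketch below (illustrative only; the plateau level 0.5, noise level 0.1, and sample size 50 are hypothetical choices) computes one-sided difference quotients of s at an observation point: as c crosses t_{ni}, the term (X_{ni} - t_{ni})² is exchanged for (X_{ni} - c)², so the slope jumps by -2(X_{ni} - t_{ni})/n.

```python
import numpy as np

def s(c, t, x):
    """Residual mean square s(c, c) for the two phase model:
    mu = c on t <= c and mu = t on t > c."""
    left = t <= c
    return (np.sum((x[left] - c) ** 2)
            + np.sum((x[~left] - t[~left]) ** 2)) / t.size

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 1.0, 50))
x = np.where(t <= 0.5, 0.5, t) + 0.1 * rng.standard_normal(t.size)

# one-sided difference quotients of s at the observation point t_ni = t[25]
ti, h = t[25], 1e-6
slope_right = (s(ti + h, t, x) - s(ti, t, x)) / h
slope_left = (s(ti, t, x) - s(ti - h, t, x)) / h
# the slopes differ by -2(x_i - t_i)/n: a corner at each observation point
jump = slope_right - slope_left
```

Between consecutive observation points s is a smooth quadratic in c (curvature 2j/n, as above); only the slope, not s itself, is discontinuous.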
The classical derivations of asymptotic normality of maximum likelihood estimators assume that the log likelihood function (or equivalently s(ξ)) asymptotically behaves like a paraboloid in some neighborhood of ξ^(0). (See Cramér [3], page 501, for instance.) Hence the classical arguments are not directly applicable here. The method of approach and results of this paper are outlined below.
1. It is shown that ŝ, the unrestricted least squares estimator (l.s.e.) of s_0, is consistent under suitable identifiability assumptions, which tacitly assume that no two consecutive f_j(θ_j^(0); t) are identical. (See Theorem 3.6.)

2. If f_j and f_{j+1} are identical, the parameter space is overspecified since the regression function does not depend on τ_j, which therefore cannot be estimated consistently. However, it is shown that under certain circumstances, with large probability as n → ∞, the fitted regression function will reflect this situation. (See Corollary 3.22.) One segment can then be deleted from the model and the regression function recomputed in the reduced model. For the remainder of the summary it is assumed that no two adjacent f_j's are identical.

3. Under suitable identifiability conditions, θ̂ - θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}) and (τ̂_j - τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2}), where θ̂ and τ̂ are the l.s.e.'s of θ and τ, and m_j is the lowest order t-derivative in which f_j^(0) and f_{j+1}^(0) differ at t = τ_j^(0). (See Theorems 3.16 and 3.18.)

4. If ω is a subset of Ξ and ξ^(0) ∈ ω̄, the closure of ω, then statements 1, 2, and 3 apply equally well to ŝ_ω = (ξ̂_ω, σ̂_ω²), the l.s.e. among all ξ ∈ ω.

5. A pseudo problem is formed by deleting o(n/log log n) strategically placed observations near the true change-over points. It is observed that 1, 2, 3, and 4 are still valid in the pseudo problem. (See beginning of Section 4, in particular Theorem 4.1.)

6. Let ξ* = (θ*, τ*) denote the l.s.e. in the pseudo problem. It is shown that under identifiability assumptions θ* - θ^(0) = O_p(n^{-1/2}), θ̂ - θ* = o_p(n^{-1/2}), (τ_j* - τ_j^(0))^{m_j} = O_p(n^{-1/2}), (τ̂_j - τ_j^(0))^{m_j} - (τ_j* - τ_j^(0))^{m_j} = o_p(n^{-1/2}). This implies that θ̂ - θ^(0) = O_p(n^{-1/2}), (τ̂_j - τ_j^(0))^{m_j} = O_p(n^{-1/2}), and that θ̂, θ* have the same asymptotic distribution. (See Lemmas 4.3, 4.12, and 4.16.)

7. The asymptotic distribution of θ*, τ* is obtained by "classical" methods. (See Lemmas 4.4, 4.8, and Theorem 4.15.)

8. Several examples are presented that illustrate the results.

9. Several unresolved problems are mentioned.

Much notation is used in the later sections. Some of the notation frequently used is set out below for ease of reference.
q = K(1) + K(2) + ... + K(r)
t = (t_1, t_2, ..., t_k) or t = (t_1, t_2, ..., t_q)
μ = μ(ξ; t) = (μ(ξ; t_1), μ(ξ; t_2), ..., μ(ξ; t_k))
μ^(0) = μ(ξ^(0); t);  μ_i = μ(ξ; t_i)
ν = ν(ξ; t) = μ(ξ; t) - μ(ξ^(0); t) = μ(t) - μ^(0)(t)
ν_{ni} = ν(ξ; t_{ni});  μ_{ni} = μ(ξ; t_{ni})
||μ|| = max_{0≤t≤1} |μ(t)|  or  ||μ(ξ; t)|| = (Σ_i μ²(ξ; t_{ni}))^{1/2}
θ = (θ_1, ..., θ_r);  θ_j = (θ_{j1}, ..., θ_{jK(j)})
Θ = {admissible θ's};  Ξ = {(θ, τ(θ)): θ ∈ Θ};  U = {μ(ξ; t): ξ ∈ Ξ}
τ = (τ_1, ..., τ_{r-1});  ξ = (θ, τ);  s = (ξ, σ²)
θ^(0), τ^(0) = "true" states of nature;  P_0 = P at θ = θ^(0), τ = τ^(0)
ξ^(0) = (θ^(0), τ^(0));  s_0 = (ξ^(0), σ²)
ξ̂ = (θ̂, τ̂) = least squares estimator (l.s.e.) of ξ^(0)
ξ̂_ω = (θ̂_ω, τ̂_ω) = l.s.e. restricted to ω ⊂ Ξ
ξ* = (θ*, τ*) = least squares estimator in the pseudo problem (see Section 4)
ξ_ω* = (θ_ω*, τ_ω*) = restricted l.s.e. in the pseudo problem
f_j(θ_j; t) = Σ_{k=1}^{K(j)} θ_{jk} f_{jk}(t);  f_{jk}(t) = f_{jk};  f_j(θ_j^(0); t) = f_j^(0)(t) = f_j^(0)
X_{ni} = μ(ξ^(0); t_{ni}) + e_{ni}
E e_{ni} = 0;  Var(e_{ni}) = σ²;  E|e_{ni}|^{2+δ} < ∞ for some δ > 0
H_n(s) = distribution function of {t_{ni}};  H_n(s) →_d H(s) as n → ∞
H_n(A) = ∫_A dH_n(s)
D±(h, j, k) = D± = the kth left and right t-derivatives, respectively, of f_h(θ_h^(0); t) at t = τ_j^(0). If D+ = D- their common value is denoted as D.
n* = sample size in the pseudo problem;  n** = n - n*
Σ* = summation over the n* terms of the pseudo problem;  Σ** = summation over the remaining n** terms

The calculus of O_p and o_p (Definition 2.1 below and discussed rigorously by Pratt [18]) is used throughout the paper without any explanation. Loosely speaking, one can operate with O_p and o_p in asymptotic calculations as with O and o.

DEFINITION 2.1. (O_p and o_p). A sequence of random variables {Y_n} is said to be

(a) O_p(1) if for every ε > 0 there exist constants D(ε) and N(ε) such that n > N(ε) implies P[|Y_n| ≤ D(ε)] > 1 - ε;

(b) o_p(1) if for every ε > 0, δ > 0 there exists a constant N(ε, δ) such that n > N(ε, δ) implies P[|Y_n| < δ] > 1 - ε;
(c) A sequence of random variables {Y_n} is said to be O_p(r_n) (o_p(r_n)) if the sequence {Y_n/r_n} is O_p(1) (o_p(1)).
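A small simulation can make Definition 2.1 concrete. This sketch is illustrative only (the normal errors and the particular sample sizes are arbitrary choices): by the central limit theorem, n^{1/2}(X̄_n - E X) is O_p(1), which is exactly the sense in which X̄_n - E X = O_p(n^{-1/2}) under part (c).

```python
import numpy as np

rng = np.random.default_rng(2)

def scaled_deviation(n):
    """sqrt(n) * (sample mean - true mean) for n standard normal draws.
    By the CLT this quantity is O_p(1), so the sample mean deviation
    itself is O_p(n^{-1/2}) in the sense of Definition 2.1(c)."""
    return np.sqrt(n) * rng.standard_normal(n).mean()

# the scaled deviations stay stochastically bounded as n grows
devs = [abs(scaled_deviation(n)) for n in (10**2, 10**4, 10**6)]
```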

3. Consistency and rate of convergence of ξ̂. The questions of consistency and rate of convergence of ξ̂ to ξ^(0) are considered in this section.

At the outset, the nonstatistical notion of identifiability of the regression function immediately arises. That is, assuming no observation errors, at which t values must μ(ξ; t) be observed in order to uniquely determine it over the entire interval [0, 1]? It will be shown that under suitable identifiability assumptions θ̂ converges to θ^(0) at the rate O_p(n^{-1/2}(log log n)^{1/2}) and τ̂_j converges to τ_j^(0) at a rate determined by the number of t-derivatives in which f_j(θ_j^(0); t) and f_{j+1}(θ_{j+1}^(0); t) agree at t = τ_j^(0).

It will be assumed throughout that f_j(θ_j^(0); t) and f_{j+1}(θ_{j+1}^(0); t) agree in m_j - 1 t-derivatives at t = τ_j^(0) but differ in the m_j th. Further, it will be assumed that f_j and f_{j+1} each have continuous left and right m_j th t-derivatives at t = τ_j^(0), j = 1, 2, ..., r - 1.

DEFINITION 3.1. The parameter θ is identified at μ^(0) by the vector t = (t_1, t_2, ..., t_k) if the system of k simultaneous equations μ(ξ; t) = μ^(0) uniquely determines θ^(0).

LEMMA 3.2. If θ is identified at μ^(0) by t then there exist neighborhoods N, T, where N is a (k-dimensional) neighborhood of μ^(0) and T is a (k-dimensional) neighborhood of t, such that

(a) for all (k-dimensional) vectors μ ∈ N and t' ∈ T such that μ can be represented as μ = μ(ξ; t') for some ξ ∈ Ξ, θ is identified at μ by t';

(b) there exists a constant, C, such that the transformation θ = θ(μ; t') satisfies the Lipschitz condition ||θ_1 - θ_2|| ≤ C||μ_1 - μ_2|| whenever t' ∈ T and μ_1 = μ(ξ_1; t'), μ_2 = μ(ξ_2; t') are both in N.
PROOF. Since θ is identified at μ^(0) by t, it follows that for any possible choice of parameters τ_1, ..., τ_{r-1} (and consequent segments {[τ_{j-1}, τ_j), j = 1, ..., r}) consistent with θ^(0), for each j there must exist K(j) components t_{j1}, ..., t_{jK(j)} within the segment (τ_{j-1}, τ_j) ∩ (τ_{j-1}^(0), τ_j^(0)) such that the K(j) by K(j) matrix A_j(t_{j1}, ..., t_{jK(j)}), with (i, k)th element f_{jk}(t_{ji}), is nonsingular. By continuity, the t_{ji}'s may be perturbed slightly without disturbing the nonsingularity of A_j. Assertions (a) and (b) follow directly from the properties of nonsingular linear transformations. ∎

REMARK 1. Nothing has yet been mentioned about the determination of τ_1, ..., τ_{r-1}. The τ's may or may not be uniquely determined once θ is known. This will be discussed at length later on.

REMARK 2. The proof of Lemma 3.2 shows that if θ is identified at μ^(0) by t = (t_1, t_2, ..., t_k), then k ≥ q and there exists a q-dimensional subvector t̃ of t, such that θ is identified at μ^(0) by t̃.
REMARK 3. In order that θ be identified at μ^(0) by t it is necessary that no two adjacent segments of μ(ξ^(0); t) be identically the same.

REMARK 4. Since μ(ξ; t) effectively depends on q parameters (actually q + r - 1 parameters related by r - 1 continuity restraints), it must be observed at a minimum of q points in order to be identified. It is clear from the linear independence of the f_{jk}'s that the placement of K(j) distinct observations between each pair of consecutive change-over points τ_{j-1}^(0), τ_j^(0) is necessary and sufficient to identify θ^(0). In particular, if the n observations are equally spaced and no two adjacent segments of μ(ξ^(0); t) are identical, then for n sufficiently large a t exists such that θ is identified at μ^(0) by t.

Let H_n(s_2) - H_n(s_1) = n^{-1}{number of observations in (s_1, s_2]}. Assume that the t_{ni} are selected to satisfy the

HYPOTHESIS. H_n(s) → H(s) in distribution, where H(s) is a distribution function with H(0) = 0, H(1) = 1.
DEFINITION 3.3. A center of observations is a point of increase of H.

The principal result of this section is that θ̂ - θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}) if there is a vector t whose components are centers of observations and which identifies θ at μ^(0).

Lemma 3.5 below implies that μ(ξ̂; t) must be near μ(ξ^(0); t) for at least one value of t close to each center of observations. The consistency of θ̂ is a consequence of this.

Condition (*) of Lemma 3.4 guarantees that the least squares estimator θ̂ is contained in a sphere with center θ^(0) and with radius d*.
LEMMA 3.4. Suppose that there exists an η > 0 such that for every K > 0 there exist d(K), n(K) such that d > d(K), n > n(K) imply

(*)  H_n{t: inf_{θ: ||θ - θ^(0)|| > d} |μ(ξ; t) - μ(ξ^(0); t)| > K} > η.

Then there exists a d* such that

lim_{n→∞} P_0{||θ̂ - θ^(0)|| ≤ d*} = 1.

PROOF. Take K = 3σ_0/η^{1/2}. For n > n(K), d > d(K), and ||θ - θ^(0)|| > d,

inf_θ Σ (X_{ni} - μ(ξ; t_{ni}))² = inf_θ Σ (e_{ni} - ν_{ni})².

The triangle inequality implies

Σ (e_{ni} - ν_{ni})² ≥ [(Σ ν²_{ni})^{1/2} - (Σ e²_{ni})^{1/2}]²
  = (Σ ν²_{ni})^{1/2}[(Σ ν²_{ni})^{1/2} - 2(Σ e²_{ni})^{1/2}] + Σ e²_{ni}
  ≥ (Σ ν²_{ni})^{1/2}[3σ_0 n^{1/2} - 2σ_0 n^{1/2} + o_p(n^{1/2})] + Σ e²_{ni}.

The last inequality is a consequence of condition (*). Note that the o_p(n^{1/2}) term does not depend on θ. Thus, if d > d(K), n > n(K),

Σ (e_{ni} - ν_{ni})² ≥ n^{1/2}(Σ ν²_{ni})^{1/2}(σ_0 + o_p(1)) + Σ e²_{ni}   as n → ∞,

where the o_p(1) is independent of θ. This implies that with probability approaching 1 as n → ∞, Σ (e_{ni} - ν_{ni})² > Σ e²_{ni} uniformly for all θ such that ||θ - θ^(0)|| > d(K). In other words (with the inf restricted to such θ),

lim_n P_0{inf_θ Σ (X_{ni} - μ(ξ; t_{ni}))² > Σ e²_{ni}} = 1.

Since the least squares estimator θ̂ minimizes the residual sum of squares function, it follows that with probability approaching 1 as n → ∞, ||θ̂ - θ^(0)|| ≤ d(K). ∎
LEMMA 3.5. If t_0 is a center of observations, δ > 0, η > 0, and condition (*) of Lemma 3.4 holds, then

P_0{|μ(ξ̂; t) - μ(ξ^(0); t)| > η for all t such that |t - t_0| ≤ δ} → 0.

PROOF. Let S denote {θ: ||θ - θ^(0)|| ≤ d*}. Let ξ̃ denote the least squares estimator of ξ with θ restricted to S. Lemma 3.4 implies that with large probability as n → ∞, ξ̂, the unrestricted least squares estimator, is equal to ξ̃, the restricted least squares estimator. We utilize Theorems 1 and 4 of Jennrich [15] to discuss the behavior of ξ̃.

Since t_0 is a center of observations, the number of observations in the interval [t_0 - δ, t_0 + δ] is λn + o(n) for some λ > 0. Let F = {ν: |ν(ξ; t)| > η for all t with |t - t_0| ≤ δ and ||θ - θ^(0)|| ≤ d*}. Now

Σ (X_{ni} - μ(ξ; t_{ni}))² = Σ (e_{ni} - ν_{ni})² = Σ e²_{ni} + Σ ν²_{ni} - 2 Σ e_{ni} ν_{ni}.

If ν ∈ F, then Σ ν²_{ni} ≥ η²λn[1 + o(1)], where o(1) does not depend on θ. Furthermore, the arguments of Theorems 1 and 4 of Jennrich [15] imply that n^{-1} Σ e_{ni} ν_{ni} converges to 0 in probability uniformly for θ ∈ S, since S is a compact set. Thus inf_F Σ (e_{ni} - ν_{ni})² ≥ Σ e²_{ni} + η²λn + o_p(n). Since the least squares estimate minimizes the residual sum of squares, it follows that

Σ (X_{ni} - μ(ξ̃; t_{ni}))² ≤ Σ (X_{ni} - μ(ξ^(0); t_{ni}))² = Σ e²_{ni}.

Thus, with large probability as n → ∞, ν(ξ̃; t) is not in F. Since ξ̂ = ξ̃ with large probability as n → ∞, it follows that with large probability as n → ∞, ν(ξ̂; t) is not in F. That is, |ν(ξ̂; t)| ≤ η for some t with |t - t_0| ≤ δ. ∎
THEOREM 3.6. (Consistency). If (i) Condition (*) of Lemma 3.4 holds, (ii) θ is identified at μ^(0) by t, (iii) the components of t are centers of observations, then

(3.1)  θ̂ - θ^(0) = o_p(1)

(3.2)  σ̂² - σ² = o_p(1).

PROOF. Let N, T be (k-dimensional) neighborhoods of μ^(0) and t within which the assertions of Lemma 3.2 hold. It follows from Lemma 3.5 that given ε > 0, with probability approaching one as n → ∞ there exists a t' ∈ T such that μ(ξ̂; t') ∈ N and ||μ(ξ̂; t') - μ(ξ^(0); t')|| ≤ ε. From Lemma 3.2, θ̂ = θ(μ(ξ̂; t'); t') is uniquely determined and

(3.3)  ||θ̂ - θ^(0)|| ≤ C||μ(ξ̂; t') - μ(ξ^(0); t')|| ≤ Cε.

Since ε is arbitrary, equation (3.3) implies (3.1).

Equation (3.2) follows directly since

σ̂² = (1/n) Σ (e_{ni} - ν̂_{ni})² ≤ (1/n) Σ e²_{ni} = σ² + o_p(1).

On the other hand, σ̂² = (1/n) Σ e²_{ni} + (1/n) Σ ν̂²_{ni} - (2/n) Σ e_{ni} ν̂_{ni}. Lemma 3.4 and the uniform convergence in probability to 0 of n^{-1} Σ e_{ni} ν_{ni} for ||θ - θ^(0)|| ≤ d* imply that σ̂² ≥ σ² + o_p(1). Thus σ̂² = σ² + o_p(1). ∎
REMARK. One might expect that any function which satisfies the constraints of the model and which fits the data better than μ_0(t) has to be close to μ_0(t) somewhere in the neighborhood of any point t around which r(n) observations are taken, as long as r(n) → ∞. Lemma 3.5 implies that this is the case if r(n) = λn. However, the following example indicates that this is not true generally.

[FIG. 3.1: a five segment broken line μ(ξ^(0); t) on [0, 1], with hash marks at the observation points 0, 1/20, 1/10, 1/5, 2/5, 3/5, 4/5, 1, and an additional point t*.]

Suppose the model is a five segment broken line and μ(ξ^(0); t) is as shown in Figure 3.1. Suppose that log n observations are taken at each of the eight t-values indicated with hash marks. Further, suppose n observations are made at t* and n observations are uniformly spread over the subinterval I. Let t_{nM} be the t-value in I at which the maximum disturbance, e_{nM}, occurs. Define μ̃(t) as:

μ̃(t_{ni}) = μ_0(t_{ni})   t_{ni} < t_{nM}
          = X_{nM}        t_{ni} = t_{nM}
          = μ_0(t_{ni})   t_{nM} < t_{ni} ≤ 1/10
          = μ_0(t*)       t_{ni} = t*.

Define μ̃(t) elsewhere by the condition that it conforms to the five segment model. It may be verified that asymptotically μ̃(t) will fit the data better than μ_0(t), but μ̃(t) does not at all resemble μ_0(t).
Suppose ξ^(0) ∈ ω̄, the closure of the set ω ⊂ Ξ. The proofs of Lemma 3.5 and Theorem 3.6 directly imply the consistency of θ̂_ω, the restricted l.s.e. This is formally stated as

COROLLARY 3.7. If ξ^(0) ∈ ω̄ ⊂ Ξ, then under the hypotheses of Theorem 3.6, θ̂_ω - θ^(0) = o_p(1) and σ̂_ω² - σ² = o_p(1), where θ̂_ω is the restricted l.s.e. of θ and σ̂_ω² is the restricted l.s.e. of σ².
Thus far, no mention has been made of the behavior of τ̂. It turns out that under suitable conditions τ̂ converges to τ^(0) in probability and the rate of convergence depends on the number of t-derivatives that adjacent segments have in common at the change-over points τ_1^(0), ..., τ_{r-1}^(0).

Let A_j^(0) denote the set of τ's such that f_j(θ_j^(0); τ) = f_{j+1}(θ_{j+1}^(0); τ) and which lie to the right of all centers of observation involved in the identification of θ_j and to the left of all centers of observation involved in the identification of θ_{j+1}. (For brevity, one can describe A_j^(0) as the set of τ's which are compatible, with respect to θ_j^(0), θ_{j+1}^(0), with the centers of observation.) It will be shown that τ̂_j lies near an element of A_j^(0) with large probability as n → ∞.

It is shown in Lemma 3.8 that if θ_j and θ_{j+1} are close to θ_j^(0) and θ_{j+1}^(0) respectively, then each element of A_j (the set of τ_j's compatible, with respect to θ_j, θ_{j+1}, with the centers of observation) lies close to an element of A_j^(0).

LEMMA 3.8. Let 𝒩_j be any collection of neighborhoods which covers the set A_j^(0). There exist neighborhoods N_j and N_{j+1} about θ_j^(0) and θ_{j+1}^(0) respectively, such that if θ ∈ Θ and θ_j ∈ N_j, θ_{j+1} ∈ N_{j+1} respectively, then if τ_j is compatible with respect to θ_j, θ_{j+1} with the centers of observation, τ_j must be contained in an element of 𝒩_j.
PROOF. Suppose to the contrary that there exists a collection of neighborhoods 𝒩_j such that for all N_j, N_{j+1} as described in the statement, there exists θ ∈ Θ with θ_j ∈ N_j, θ_{j+1} ∈ N_{j+1} and such that τ_j's compatible with respect to θ_j, θ_{j+1} exist outside of the elements of 𝒩_j. Then there exists a sequence θ_(n) ∈ Θ such that θ_{j,n} and θ_{j+1,n} converge to θ_j^(0) and θ_{j+1}^(0) respectively, and a sequence {τ_{j,n}} such that for each n, τ_{j,n} belongs to no element of 𝒩_j, is compatible, with respect to θ_{j,n}, θ_{j+1,n}, with the centers of observation, and is such that f_j(θ_{j,n}; τ_{j,n}) = f_{j+1}(θ_{j+1,n}; τ_{j,n}). Thus there exists a subsequence of τ_{j,n}'s which converges to some τ* ∉ A_j^(0) and which is compatible with respect to θ_j^(0), θ_{j+1}^(0), with the centers. By continuity, f_j(θ_j^(0); τ*) = f_{j+1}(θ_{j+1}^(0); τ*), which contradicts the definition of A_j^(0). ∎
An important special case of Lemma 3.8 occurs when A_j^(0) is the one point set {τ_j^(0)}. This suggests

DEFINITION 3.9. The parameter θ is well-identified at μ^(0) by t if
(i) θ is identified at μ^(0) by t;
(ii) for each j, 1 ≤ j ≤ r - 1, A_j^(0) is the one point set {τ_j^(0)}.

The consistency of τ̂ is then an immediate consequence of Theorem 3.6, Lemma 3.8, and Definition 3.9. This is stated in Theorem 3.10.

THEOREM 3.10. If (i) Condition (*) of Lemma 3.4 holds, (ii) θ is well-identified at μ^(0) by t, (iii) the components of t are centers of observations, then

(3.4)  τ̂ - τ^(0) = o_p(1).

Several examples may help to distinguish among the notions of identified, well-identified, and unidentified. The two phase broken line is well-identified by the four points pictured in example 1 of Figure 3.2. The two phase parabola-straight line function is identified by the five points pictured in example 2 of Figure 3.2. However, it is not well-identified since the change-over point may occur in one of two places. The two phase parabola-straight line function is not identified by the five points pictured in example 3 of Figure 3.2. This is because it is possible to partition the points in two different ways, each in accordance with different θ's; namely 1, 2 and 3, 4, 5 or alternatively 1, 2, 3 and 4, 5. Both of these possibilities are pictured in example 3.

[FIG. 3.2: Example 1 (four points on a two phase broken line), Example 2 (five points on a parabola-straight line function), and Examples 3a and 3b (the two partitions of the same five points).]

As a fourth example, if the model specifies a two-phase broken line but the segments are in reality collinear, then the regression function is not identified, since a nonunique segment can be adjoined in such a way that the resulting function conforms to the model. However, if the additional assumption is made that there are at least two centers of observation within each segment, then the function is identified, but obviously not well-identified.

It will now be shown that under the identifiability assumptions of Theorem 3.6, θ̂ - θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}). It will be shown in the next section that under a mild additional assumption, θ̂ - θ^(0) = O_p(n^{-1/2}).

We first prove three preliminary lemmas which enter into the rate of convergence argument. Note that notation may differ from that in the applications.
Let 𝒳 be an inner product space and 𝒰, 𝒱 subspaces of 𝒳. Suppose x ∈ 𝒰, y ∈ 𝒱, z = x + y, and x*, y* are the orthogonal projections of z onto 𝒰, 𝒱, respectively.

LEMMA 3.11. Suppose there exists an a < 1 such that x ∈ 𝒰, y ∈ 𝒱 implies |(x, y)| ≤ a ||x|| ||y||. Then

(3.5)  ||x + y|| ≤ (||x*|| + ||y*||)/(1 - a).

PROOF. It follows from the definition of x*, y* that (x + y - y*, y) = 0, (x + y - x*, x) = 0. Thus ||y||² = (y, y*) - (x, y) ≤ ||y|| ||y*|| + a||x|| ||y|| and ||x||² = (x, x*) - (x, y) ≤ ||x|| ||x*|| + a||x|| ||y||. The triangle inequality and the above two relations imply ||x + y|| ≤ ||x|| + ||y|| ≤ (||x*|| + a||y||) + (||y*|| + a||x||), from which (3.5) follows. ∎
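Lemma 3.11 can be checked numerically. The sketch below is illustrative only: the subspaces, dimensions, and vectors are arbitrary random choices, and the constant a is taken to be the largest principal cosine between the two subspaces, which is the smallest constant satisfying |(u, v)| ≤ a ||u|| ||v||.

```python
import numpy as np

rng = np.random.default_rng(3)

def proj(A, z):
    """Orthogonal projection of z onto the column space of A."""
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    return A @ coef

A = rng.standard_normal((10, 3))   # basis of the subspace U
B = rng.standard_normal((10, 3))   # basis of the subspace V
x = A @ rng.standard_normal(3)     # x in U
y = B @ rng.standard_normal(3)     # y in V
z = x + y

# a = largest principal cosine between U and V, so |(u, v)| <= a ||u|| ||v||
Qu, _ = np.linalg.qr(A)
Qv, _ = np.linalg.qr(B)
a = np.linalg.svd(Qu.T @ Qv, compute_uv=False)[0]

lhs = np.linalg.norm(z)                                   # ||x + y||
rhs = (np.linalg.norm(proj(A, z))                         # ||x*||
       + np.linalg.norm(proj(B, z))) / (1.0 - a)          # + ||y*||, over 1 - a
```

Two random 3-dimensional subspaces of R^10 intersect only at the origin almost surely, so a < 1 and the hypothesis of the lemma holds.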
Lemma 3.12 is an obvious multivariate generalization of Kolmogorov's in-
equality.

LEMMA 3.12. Suppose X₁, …, X_n are independent p × 1 random vectors, with E(X_i) = 0, Cov(X_i) = Σ_i. Let S_k = Σ_{i=1}^k X_i, Σ^(k) = Σ_{i=1}^k Σ_i, and let M denote a positive definite matrix. Then

(3.6) P{max_{1≤k≤n} S_k′MS_k ≥ ε²} ≤ ε⁻² tr(Σ^(n)M).

PROOF. The quadratic form S_k′MS_k can be written as ‖M^{1/2}S_k‖² = Σ_j [(M^{1/2}S_k)_j]² (where M^{1/2} denotes the symmetric square root of M). The derivation of the usual Kolmogorov inequality is directly applicable to this sum. (See Doob [4], bottom of page 315.) ∎
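A quick Monte Carlo sketch of inequality (3.6) (illustrative, not from the paper): with Cov(X_i) = I, so that Σ^(n) = nI, the exceedance frequency of max_k S_k′MS_k should stay below ε⁻² tr(Σ^(n)M) = n tr(M)/ε². The matrix M and all constants below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 3, 200, 2000
M = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])            # positive definite
eps2 = 2 * n * np.trace(M)                 # threshold chosen so the bound is 1/2
exceed = 0
for _ in range(trials):
    X = rng.standard_normal((n, p))        # independent, mean 0, Cov = I
    S = np.cumsum(X, axis=0)               # partial sums S_1, ..., S_n
    quad = np.einsum('ki,ij,kj->k', S, M, S)  # S_k' M S_k for each k
    exceed += quad.max() >= eps2
bound = n * np.trace(M) / eps2             # eps^-2 tr(Sigma^(n) M) = 0.5
print(exceed / trials <= bound)
```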
Lemma 3.13 will be used to show that the length of the projection of the
disturbance vector e onto a certain random one-dimensional subspace is not too
large. This lemma is used in the proof of Theorem 3.14, the principal rate of
convergence result.
The conditions and assumptions of the lemma are rather opaque and were of
course motivated by the needs of Theorem 3.14; it may be preferable to read it
after the proof of Theorem 3.14.
Assume 0 < p(n) ≤ n and p(n) → ∞ as n → ∞. Assume further that N_n is a sequence of random variables such that 1 ≤ N_n ≤ n and N_n = O_p(p(n)). For brevity denote N_n by N. Let C_i, 1 ≤ i < ∞, be a fixed set of constants and let α > 0. Let C_{n1}, …, C_{nn} be constants such that sup_n max_{1≤i≤n} |C_{ni}| < ∞ and such that n^α C_{ni} = C_i(1 + ρ_{ni}), where m_n ≡ max_{1≤i≤n} |ρ_{ni}| = o(p^{-1/2}(n)) as n → ∞. Let h(i) = o(1) as i → ∞.

LEMMA 3.13. Suppose for every K > 0 such that Kp(n) ≤ n for n sufficiently large,

(3.7) lim_n Σ_{i=1}^{Kp(n)} [C_i²/(h(i)(Σ_{j=1}^i C_j²) log log Σ_{j=1}^i C_j²)]^{1+δ} < ∞.

Then

(3.8) T_N ≡ Σ_{i=1}^N C_{ni}e_{ni}/(Σ_{i=1}^N C_{ni}²)^{1/2} = O_p((log log n)^{1/2}) as n → ∞.
62 PAUL I. FEDER

REMARK 1. Define log log Σ_{j=1}^i C_j² to be 0 if it is otherwise undefined or negative.

REMARK 2. This lemma is used in the proof of Theorem 3.14 to account for the influence on the least squares estimates of the observations near the change-over points.

REMARK 3. The δ in the exponent in equation (3.7) is the δ referred to in the paragraph immediately below equation (2.4), where it is assumed that E|e_{ni}|^{2(1+δ)} < ∞.

PROOF. Multiply each of the C_{ni} by n^α. Thus

T_N = Σ_{i=1}^N C_i(1 + ρ_{ni})e_{ni}/(Σ_{i=1}^N C_i²(1 + ρ_{ni})²)^{1/2}.

Since max_{1≤i≤n} |ρ_{ni}| = o(p^{-1/2}(n)), we have that for every ε > 0, (Σ_{i=1}^N C_i²(1 + ρ_{ni})²)^{1/2} ≥ (1 − ε)(Σ_{i=1}^N C_i²)^{1/2} for n sufficiently large. Therefore, for n sufficiently large, the contribution of the ρ_{ni} terms to T_N is bounded by

(1 − ε)⁻¹|Σ_{i=1}^N C_i ρ_{ni}e_{ni}|/(Σ_{i=1}^N C_i²)^{1/2} ≤ (1 − ε)⁻¹m_n Σ_{i=1}^N |C_i||e_{ni}|/(Σ_{i=1}^N C_i²)^{1/2} ≤ (1 − ε)⁻¹m_n(Σ_{i=1}^N e_{ni}²)^{1/2} = O_p(1),

by the Cauchy–Schwarz inequality, since N = O_p(p(n)) and m_n = o(p^{-1/2}(n)).
Now consider V_N ≡ Σ_{i=1}^N C_i e_{ni}/(Σ_{i=1}^N C_i²)^{1/2}. For every ε > 0 there exists a K = K(ε) such that N ≤ Kp(n) ≤ n with probability greater than 1 − ε as n → ∞. Thus, with probability greater than 1 − ε as n → ∞, |V_N| ≤ max_{1≤k≤Kp(n)} |V_k|. If Σ_{i=1}^{Kp(n)} C_i² is bounded as n → ∞, then Kolmogorov's inequality implies that the numerator is O_p(1); the denominator is obviously bounded away from 0. If Σ_{i=1}^{Kp(n)} C_i² → ∞ as n → ∞, define ē_{ni} as e_{ni} if

e_{ni}² ≤ h(i)(Σ_{j=1}^i C_j²) log log(Σ_{j=1}^i C_j²)/C_i²

and as 0 otherwise.

Imagine the finite sequence {Σ_{i=1}^k C_i ē_{ni}/(Σ_{i=1}^k C_i²)^{1/2}, k = 1, …, Kp(n)} to be the beginning of an infinite sequence. The proof of the first part of the law of the iterated logarithm on pages 261–262 of Loève [16] holds true for this infinite sequence. This implies that for every η > 0

P{|Σ_{i=1}^k C_i ē_{ni}| > (1 + η)[Σ_{i=1}^k C_i²]^{1/2}[log log Σ_{i=1}^k C_i²]^{1/2} infinitely often} = 0.

This in turn implies that

lim_{M→∞} P{sup_{k≥M} |Σ_{i=1}^k C_i ē_{ni}|[Σ_{i=1}^k C_i²]^{-1/2}[log log Σ_{i=1}^k C_i²]^{-1/2} > 1 + η} = 0,

or equivalently that for every ε > 0 there exists an M(ε) such that for all n for which Kp(n) ≥ M(ε), P{max_{M(ε)≤k≤Kp(n)} |same| > 1 + η} < ε. This implies that

max_{1≤k≤Kp(n)} |Σ_{i=1}^k C_i ē_{ni}|/(Σ_{i=1}^k C_i²)^{1/2} = O_p([log log Σ_{i=1}^{Kp(n)} C_i²]^{1/2}).

Since C_i ≤ Cn^α(1 + o(1)), 1 ≤ i ≤ n, it follows that the above expression is O_p([log log n]^{1/2}) as n → ∞.
It now remains to show that e can be substituted for ē in the above order relation. This is done using the first part of the Borel–Cantelli Lemma (see Loève [16], page 228): namely, if Σ_n P(A_n) < ∞ then P(A_n infinitely often) = 0. Now,

Σ_{i=1}^{Kp(n)} P{ē_{ni} ≠ e_{ni}} = Σ_{i=1}^{Kp(n)} P{e_{ni}² > h(i)(Σ_{j=1}^i C_j²) log log(Σ_{j=1}^i C_j²)/C_i²}

≤ E|e_{ni}|^{2(1+δ)} Σ_{i=1}^{Kp(n)} [C_i²/(h(i)(Σ_{j=1}^i C_j²) log log Σ_{j=1}^i C_j²)]^{1+δ},

which, by hypothesis, remains bounded as n → ∞.

If we pretend that {e_{ni}, 1 ≤ i ≤ Kp(n)} is the beginning of an infinite sequence, then the Borel–Cantelli Lemma implies that P{ē_{ni} ≠ e_{ni} infinitely often} = 0. Equivalently, for every ε > 0 there exists an M(ε) such that P{e_{ni} − ē_{ni} = 0 for all i ≥ M(ε)} ≥ 1 − ε. This implies that

max_{1≤k≤Kp(n)} |Σ_{i=1}^k C_i(e_{ni} − ē_{ni})|/(Σ_{i=1}^k C_i²)^{1/2} = o_p(1) as n → ∞. ∎
Let n_{0j}, N_j denote the indices of the observations which occur immediately preceding τ_j^(0), τ̂_j, j = 1, …, r − 1. Assume that the spacing of the observations around the change-over points satisfies the following condition:

There exist functions p_j(n), j = 1, …, r − 1 such that for each j the assumptions of Lemma 3.13 are satisfied with N = |N_j − n_{0j}|, p(n) = p_j(n), and C_{ni} = f_j^(0)(t_{n,n_{0j}+i}) − f_{j+1}^(0)(t_{n,n_{0j}+i}).

REMARK. If μ(t) is a broken line and the t_{ni} are equally spaced, then equation (3.7) holds true with h(i) = i^{-δ/2}, p(n) = n, and N, C_{ni} defined as in the preceding paragraph. To see this, express f_j^(0)(t) = a_j^(0) + b_j^(0)(t − τ_j^(0)), f_{j+1}^(0)(t) = a_j^(0) + b_{j+1}^(0)(t − τ_j^(0)). Then C_{ni} = (b_j^(0) − b_{j+1}^(0))(i/n)(1 + o(1)) and so nC_{ni} → (b_j^(0) − b_{j+1}^(0))i ≡ C_i. The assertion in (3.7) is now easily verified.
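The scaling claimed in the remark can be checked directly for a concrete broken line (an illustrative sketch, not from the paper; the break location and the slopes 1 and −2 are arbitrary choices): with an equally spaced design, nC_{ni} should equal (b_j^(0) − b_{j+1}^(0))·i for the ith point past the break.

```python
import numpy as np

# two-phase broken line, continuous at the break tau
tau, a, b1, b2 = 0.5, 1.0, 1.0, -2.0
f1 = lambda t: a + b1 * (t - tau)   # left segment, extended past tau
f2 = lambda t: a + b2 * (t - tau)   # right segment

n = 1000
t = np.arange(1, n + 1) / n                      # equally spaced design points
n0 = np.searchsorted(t, tau, side='right')       # number of points at or before tau
i = np.arange(1, 11)                             # first 10 points past the break
C_ni = f1(t[n0 + i - 1]) - f2(t[n0 + i - 1])     # C_ni = f1 - f2 at those points
print(np.allclose(n * C_ni, (b1 - b2) * i))      # n*C_ni = (b1 - b2)*i
```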
Next, a key theorem in the rate of convergence argument is stated and proved. The theorem guarantees that within any subset of t values which contains a "substantial" portion of the observations, the estimated regression function must be "quite close" to the true regression function, at least at one point. An assumption is required to the effect that "enough" observations are taken within each (true) segment of the regression function.

Suppose S_j is a subset of (τ_{j−1}^(0), τ_j^(0)), j = 1, …, r, and that with probability approaching one as n → ∞, S_j ⊂ (τ̂_{j−1}, τ̂_j), j = 1, …, r. Let M_j denote the K(j) × K(j) matrix ((1/n) Σ_{t_{ni}∈S_j} f_{jh}(t_{ni})f_{jk}(t_{ni})), h, k = 1, 2, …, K(j). Assume that the minimum eigenvalue of M_j is λ_j + o(1) as n → ∞, where λ_j > 0, j = 1, …, r. In other words, the subset S_j contains a proportion of the information which is bounded away from 0 as n → ∞.

THEOREM 3.14. Suppose W is a subset of [0, 1] such that H(W) > 0. Then

(3.9) min_{t_{ni}∈W} |μ̂(t_{ni}) − μ^(0)(t_{ni})| = O_p(n^{-1/2}(log log n)^{1/2}).
PROOF (r = 2). The proof is given only for the case r = 2, to simplify the notation, with no substantial loss in generality. Let t denote the n-vector (t_{n1}, …, t_{nn}) and let μ_0 denote the n-vector (μ(ξ^(0); t_{n1}), …, μ(ξ^(0); t_{nn})) within (and only within) this proof.

There are n + 1 ways in which the t_{ni} may be divided among two segments. Consider the kth of these partitions: (t_{n1}, …, t_{n,k−1}), (t_{nk}, …, t_{nn}). Let 𝒮_k be the linear space spanned by the q (= K(1) + K(2)) n-vectors where the ith component of the jth vector is

f_{1j}(t_{ni}) if i ≤ k − 1 and j ≤ K(1),
0 if i ≥ k and j ≤ K(1),
0 if i ≤ k − 1 and j = K(1) + h > K(1),
f_{2h}(t_{ni}) if i ≥ k and j = K(1) + h > K(1),

and let 𝒮_k⁺ = 𝒮_k ⊕ [μ_0(t)] denote the direct sum of the two vector spaces. Let Q_k⁺ denote the orthogonal projection onto 𝒮_k⁺, Q_k the orthogonal projection onto 𝒮_k.

Let X = (X_{n1}, …, X_{nn})′ and let μ̂_k* be the orthogonal projection of X onto 𝒮_k⁺, μ̂_k the closest point to X in 𝒮_k⁺ subject to the underlying continuity restrictions. This is displayed schematically in Figure 3.3.

FIG. 3.3.

Then

‖X − μ̂_k*‖ ≤ ‖X − μ̂_k‖ ≤ ‖X − μ_0‖,

which implies

‖X − μ_0‖² ≥ ‖X − μ̂_k‖² = ‖X − μ_0‖² + ‖μ̂_k − μ_0‖² − 2(μ̂_k − μ_0, X − μ_0).

Thus

‖μ̂_k − μ_0‖² ≤ 2(μ̂_k − μ_0, X − μ_0) = 2(μ̂_k − μ_0, μ̂_k* − μ_0) ≤ 2‖μ̂_k − μ_0‖‖μ̂_k* − μ_0‖

and so

(3.10) ‖μ̂_k − μ_0‖ ≤ 2‖μ̂_k* − μ_0‖ = 2‖Q_k⁺e‖.

This computation is an important step in showing that ‖μ̂ − μ_0‖ = O_p((log log n)^{1/2}).

The estimated regression belongs to the random linear space 𝒮, which is spanned by q n-vectors whose components are almost the same as those of the q n-vectors that span 𝒮_k, except that the condition i ≤ k − 1 is replaced by i ≤ N₁, so that there are now a random number of "0" components in the vectors. This implies that μ̂(t) belongs to 𝒮⁺ = 𝒮 ⊕ [μ_0(t)], the direct sum of the two vector spaces. The vector space 𝒮⁺ is also generated by the direct sum of 𝒮 and the vector C, where C′ = (0, …, 0, f₁^(0)(t_{n,N₁+1}) − f₂^(0)(t_{n,N₁+1}), …, f₁^(0)(t_{n,n_{01}}) − f₂^(0)(t_{n,n_{01}}), 0, …, 0). Let Q⁺, Q denote the orthogonal projections onto 𝒮⁺, 𝒮 respectively.

If it can be shown that ‖Q⁺e‖ = O_p((log log n)^{1/2}), then equation (3.10) implies ‖μ̂ − μ_0‖ = O_p((log log n)^{1/2}). Thus any subset, W, of the t_{ni}'s which contains a proportion of the observations asymptotically bounded away from 0 must contain a t_{ni} for which μ̂(t_{ni}) − μ^(0)(t_{ni}) = O_p(n^{-1/2}(log log n)^{1/2}). This implies (3.9).
tn
It now remains to show that ‖Q⁺e‖ = O_p((log log n)^{1/2}). Recall that it was assumed that with large probability as n → ∞, S_j ⊂ (τ̂_{j−1}, τ̂_j) ∩ (τ_{j−1}^(0), τ_j^(0)), j = 1, 2. Since the minimum eigenvalues λ₁, λ₂ of the information matrices M₁, M₂ are asymptotically bounded away from 0 as n → ∞, it follows that the proportions of observations in the subsets S₁, S₂ must each be asymptotically bounded away from 0 as n → ∞. With large probability as n → ∞ the sets S₁, S₂ do not intersect the interval (τ̂₁, τ₁^(0)), and so the components of the vector C are identically 0 for t_{ni} ∈ S₁, t_{ni} ∈ S₂. Thus, intuitively, one would expect the vector C to be at a substantial angle to 𝒮 (in fact almost orthogonal) as n → ∞. This intuition can be quantified by demonstrating the existence of an α < 1 such that with probability approaching one as n → ∞, |(C, g)| ≤ α‖C‖‖g‖ for all g ∈ 𝒮. The calculation of such an α is not difficult but is omitted to avoid digressions.

It thus follows from Lemma 3.11 that

‖Q⁺e‖ ≤ {1/(1 − α)}(‖Qe‖ + |(C, e)|/‖C‖)(1 + o_p(1)).

Therefore, if it can be shown that ‖Qe‖ = O_p(1) and (C, e)/‖C‖ = O_p((log log n)^{1/2}), then

‖μ̂ − μ_0‖ ≤ 2‖Q⁺e‖ = O_p((log log n)^{1/2}).

That ‖Qe‖ = O_p(1) is a consequence of Lemma 3.12 and the assumptions regarding S_j, M_j, j = 1, 2. From standard least squares results

‖Qe‖² ≤ {v₁′M₁⁻¹v₁ + v₂′M₂⁻¹v₂}(1 + o_p(1)),

where n^{1/2}v₁′ = (Σ_{i=1}^{N₁} f₁₁(t_{ni})e_{ni}, …, Σ_{i=1}^{N₁} f₁,K(1)(t_{ni})e_{ni}) and where n^{1/2}v₂′ = (Σ_{i=N₁+1}^n f₂₁(t_{ni})e_{ni}, …, Σ_{i=N₁+1}^n f₂,K(2)(t_{ni})e_{ni}). Lemma 3.12 implies that P{v_j′M_j⁻¹v_j ≥ ε²} ≤ ε⁻² tr{M_j*M_j⁻¹}, where M_j* is the K(j) × K(j) matrix (n⁻¹ Σ_{i=1}^n f_{jh}(t_{ni})f_{jk}(t_{ni})), h, k = 1, 2, …, K(j). Since tr{M_j*M_j⁻¹} = O(1), it follows that v_j′M_j⁻¹v_j = O_p(1) and so ‖Qe‖ = O_p(1).

It now remains to show that (C, e)/‖C‖ = O_p((log log n)^{1/2}). But this follows as a direct consequence of Lemma 3.13. Thus the proof of the theorem is complete. ∎

LEMMA 3.15. If θ is identified at μ^(0) by the q-vector t = (t₁, …, t_q) and the components of t are centers of observations, then the assumptions of Theorem 3.14 relating to S_j, M_j, j = 1, …, r are satisfied.

PROOF. It suffices to exhibit suitable S_j, M_j, j = 1, …, r. Let S_j consist of the union of small neighborhoods about each of the components of t that lie within (τ_{j−1}^(0), τ_j^(0)) and are not limit points of A_j^(0). Define M_j as previously. Since the elements of S_j are not limit points of A_j^(0) and Lemma 3.8 implies that τ̂_j is arbitrarily close to A_j^(0) with large probability as n → ∞, it follows that S_j ⊂ (τ̂_{j−1}, τ̂_j), j = 1, …, r, with probability approaching one as n → ∞. Let G_n be the q × q block diagonal matrix consisting of r diagonal blocks and r(r − 1) off-diagonal blocks. The jth diagonal block is the K(j) × K(j) information matrix M_j. The (i, j)th off-diagonal block is the K(i) × K(j) matrix consisting entirely of zeroes. If it can be shown that the smallest eigenvalue of G_n is bounded away from 0, then the same must be true for each of the M_j. Let G* = (f(t)) denote the q × q matrix with (i, j)th element

G*_{ij} = f_{1p}(t_i), i ≤ K(1) and j = p ≤ K(1);
= 0, i > K(1) and j = p ≤ K(1);
= f_{gp}(t_i), K(1) + ⋯ + K(g−1) < i ≤ K(1) + ⋯ + K(g) and j = K(1) + ⋯ + K(g−1) + p ≤ K(1) + ⋯ + K(g);
= 0, i ≤ K(1) + ⋯ + K(g−1) or i > K(1) + ⋯ + K(g), and j = K(1) + ⋯ + K(g−1) + p ≤ K(1) + ⋯ + K(g).

It follows from Lemma 3.2 that G* is nonsingular, and the smallest eigenvalue of G*′G* is greater than some C > 0. By continuity, there exist intervals I_j about each of the components of t such that if t* is any q-dimensional vector with components t_j* ∈ I_j, j = 1, …, q, then the smallest eigenvalue of (f(t*))′(f(t*)) is greater than C/2. Since the components of t are centers of observations, there exists γ > 0 such that each I_j contains at least 2γn + o(n) of the t_{ni}. Form [γn] vectors t_{nh} = (t*_{h1}, …, t*_{hq}), h = 1, …, [γn], each having one component in I_j, j = 1, …, q. Thus,

G_n ≥ (1/n) Σ_{h=1}^{[γn]} (f(t_{nh}))′(f(t_{nh}))

(A ≥ B means that A − B is positive semidefinite). Therefore, the smallest eigenvalue of G_n is greater than γC/2 + o(1). This implies that G_n is strictly positive definite for n sufficiently large. ∎
Theorem 3.14 and Lemma 3.15 together imply the principal rate of convergence result, namely:

THEOREM 3.16 (Rate of convergence). If (i) θ is identified at μ^(0) by t and the components of t are centers of observations, and (ii) the spacing of the observations around each of the break points is such that the assumptions of Lemma 3.13 are satisfied, then

(3.11) θ̂ − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}).

PROOF. Theorem 3.14 implies that within any small neighborhood of a center of observations there exists a t_{ni} such that

μ̂(t_{ni}) − μ_0(t_{ni}) = O_p(n^{-1/2}(log log n)^{1/2}).

Lemma 3.2 implies the conclusion of the theorem. ∎

COROLLARY 3.17. If ξ^(0) ∈ 𝒟 ⊂ Ξ, then under the hypotheses of Theorem 3.16

(3.12) ξ̂ − ξ^(0) = O_p(n^{-1/2}(log log n)^{1/2}).

For simplicity, it will be assumed in the sequel that θ is well-identified at μ^(0) by t.
The rate of convergence of τ̂ to τ^(0) will now be considered. Suppose that f_j(θ_j^(0); t) and f_{j+1}(θ_{j+1}^(0); t) have m_j − 1 t-derivatives in common at t = τ_j^(0), j = 1, …, r − 1. Further, suppose that f_j and f_{j+1} have continuous left and right m_jth t-derivatives at t = τ_j^(0) and differ in both of these derivatives. For brevity, denote these assumptions by conditions (T). Let D⁺(h, j, k), D⁻(h, j, k) denote the kth right and left t-derivatives respectively of f_h(θ_h^(0); t) at t = τ_j^(0). If they coincide, denote their common value by D(h, j, k).

Expand f_j(θ_j; t) and f_{j+1}(θ_{j+1}; t) in Taylor series about θ_j^(0), τ_j^(0) and θ_{j+1}^(0), τ_j^(0) respectively. Recall that f_j(θ_j^(0); τ_j^(0)) = f_{j+1}(θ_{j+1}^(0); τ_j^(0)), D(j, j, k) = D(j + 1, j, k), k = 1, 2, …, m_j − 1.

For θ ∈ Θ in the neighborhood of θ^(0), the intersection point, τ_j, of the two segments f_j(θ_j; t) and f_{j+1}(θ_{j+1}; t) is obtained by solving the equation f_{j+1}(θ_{j+1}; τ_j) − f_j(θ_j; τ_j) = 0. For θ_j, θ_{j+1}, τ_j near θ_j^(0), θ_{j+1}^(0), τ_j^(0),

0 = f_{j+1}(θ_{j+1}; τ_j) − f_j(θ_j; τ_j)
= [∂f_{j+1}/∂θ_{j+1} + o(1)]′(θ_{j+1} − θ_{j+1}^(0)) − [∂f_j/∂θ_j + o(1)]′(θ_j − θ_j^(0))
+ (1/m_j!)[D±(j + 1, j, m_j) − D±(j, j, m_j) + o(1)](τ_j − τ_j^(0))^{m_j},

where D± is D⁺ if τ_j > τ_j^(0) and D⁻ if τ_j < τ_j^(0). Thus

(3.13) (1/m_j!)[D±(j + 1, j, m_j) − D±(j, j, m_j) + o(1)](τ_j − τ_j^(0))^{m_j}
= [∂f_j/∂θ_j + o(1)]′(θ_j − θ_j^(0)) − [∂f_{j+1}/∂θ_{j+1} + o(1)]′(θ_{j+1} − θ_{j+1}^(0)).

Equation (3.13) and Theorem 3.16 imply

(3.14) (τ̂_j − τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2}).

This is stated formally in the theorem below.

THEOREM 3.18. If (i) θ is well-identified at μ^(0) by t and the components of t are centers of observations, (ii) conditions (T) above are satisfied, and (iii) the observations are spaced around the break points in a manner so that the assumptions of Lemma 3.13 are satisfied, then

θ̂ − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}) and
(τ̂_j − τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2}), j = 1, …, r − 1.

An important special case of this theorem is stated as

COROLLARY 3.19. If the hypotheses of Theorem 3.18 are satisfied and in addition m₁ = ⋯ = m_{r−1} = 1, then

(3.15) ξ̂ − ξ^(0) = O_p(n^{-1/2}(log log n)^{1/2}).

Just as with Corollary 3.17:

COROLLARY 3.20. If ξ^(0) ∈ 𝒟 ⊂ Ξ and the hypotheses of Theorem 3.18 are satisfied, then (τ̂_{j,n} − τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2}), j = 1, …, r − 1.

COROLLARY 3.21. If (i) the conditions of Theorem 3.18 are satisfied, and (ii) E(e⁴) < ∞, then σ̂² − σ₀² = O_p(n^{-1/2}(log log n)^{1/2}). If in addition ξ^(0) ∈ 𝒟, then σ̂_r² − σ₀² = O_p(n^{-1/2}(log log n)^{1/2}).
It was previously remarked that a necessary condition for identification of the underlying regression function is that no two adjacent segments are identically the same. However, if one imposes the additional assumption that a sufficient number of previously specified centers of observation lie within each segment, then θ^(0) is identified even though two adjacent segments may be the same. However, θ^(0) is obviously not well-identified. Thus Theorem 3.16 can be used to decide whether or not two adjacent segments are distinct.

More precisely, if the specified centers of observations lie within the appropriate segments and estimated coefficients from two adjacent segments (having the same functional form) differ by less than log n/n^{1/2}, or if an estimated segment does not contain appropriate disjoint subintervals, each with at least n/log n observations, then it can be inferred that two adjacent segments are identical. This is stated formally as

COROLLARY 3.22. Suppose the assumptions of Theorem 3.16 hold and it is known a priori that certain specified centers of observation lie strictly within each segment and are sufficient to identify θ^(0). If (and only if) μ(θ^(0); t) contains two identical adjacent segments, then with probability approaching one as n → ∞, either

(i) |θ̂_j − θ̂_{j+1}| < log n/n^{1/2}, or
(ii) there exists an estimated segment [τ̂_j, τ̂_{j+1}] which does not contain appropriate disjoint subintervals, each with at least n/log n observations.
PROOF. For the sake of simplicity assume that r = 2. For all least squares solutions with the a priori specified centers contained in the appropriate segments, the argument leading up to the proof of Theorem 3.16 goes through directly. Thus the only way for (i) to be violated is for (ii) to hold. ∎
4. Asymptotic distribution of ξ̂. It will be assumed, unless specifically mentioned to the contrary, that θ is well-identified at μ^(0) by t and the components of t are centers of observations. It was shown in Theorem 3.18 that under these conditions and a mild additional assumption θ̂ − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}) and (τ̂_j − τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2}), where m_j is the order of the lowest t-derivative in which the segments f_j^(0) and f_{j+1}^(0) disagree at t = τ_j^(0). This enables one to discuss the asymptotic distribution of ξ̂.

The principal idea (due to Sylwester [24]) is to form a pseudo problem by deleting all of the observations in intervals L_j(n), j = 1, …, r − 1 of length d_j(n) about each of the τ_j^(0). The intervals L_j(n) are chosen so that d_j(n) → 0 but (n/log log n)^{1/(2m_j)}d_j(n) → ∞. The term pseudo problem is used because in practice the statistician does not know the τ_j^(0) and thus cannot delete such observations.

Assume that the t_{ni} are distributed in a manner that implies only o(n/log log n) observations are deleted by this process. (If H(t) is continuous and has finite slope at each τ_j^(0), then this will be the case.) Intuitively, deleting o(n/log log n) observations eliminates a percentage of the information which approaches zero as n → ∞. Thus, from an asymptotic point of view, the deletion of these observations should not affect any of the distribution theory. It will be shown that this is in fact the case.
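The deletion step can be sketched as follows (an illustration, not the author's construction: `pseudo_problem` is a hypothetical helper, and the window width d(n) below is one convenient choice satisfying d(n) → 0 while (n/log log n)^{1/2}·d(n) → ∞, the case m_j = 1).

```python
import numpy as np

def pseudo_problem(t, x, tau0, m=1):
    """Delete observations in a shrinking window around each true breakpoint."""
    n = len(t)
    # d(n) = (log log n / n)^(1/(2m)) * log n:
    # d(n) -> 0, while (n/log log n)^(1/(2m)) * d(n) = log n -> infinity
    d = (np.log(np.log(n)) / n) ** (1.0 / (2 * m)) * np.log(n)
    keep = np.ones(n, dtype=bool)
    for tau in tau0:
        keep &= np.abs(t - tau) > d / 2
    return t[keep], x[keep]

rng = np.random.default_rng(2)
n = 10_000
t = np.sort(rng.uniform(0, 1, n))
x = np.where(t < 0.5, t, 1 - t) + 0.1 * rng.standard_normal(n)  # broken line + noise
t_star, x_star = pseudo_problem(t, x, [0.5])
# only o(n / log log n) observations are removed
print(len(t_star) < n and (n - len(t_star)) < n / np.log(np.log(n)))
```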
More precisely, it will be shown that

θ̂* − θ^(0) = O_p(n^{-1/2}), (τ̂_j* − τ_j^(0))^{m_j} = O_p(n^{-1/2}),
θ̂* − θ̂ = o_p(n^{-1/2}), (τ̂_j* − τ_j^(0))^{m_j} − (τ̂_j − τ_j^(0))^{m_j} = o_p(n^{-1/2}),

where ξ̂* = (θ̂*, τ̂*) is the l.s.e. in the pseudo problem (p.l.s.e.). This is a great simplification, since the p.l.s.e. behaves asymptotically as if it were known between which two consecutive observations each of the τ_j^(0) are located. Thus it is possible to use classical techniques to discuss the asymptotic behavior of ξ̂*. Notation:

(i) n*: sample size in the pseudo problem, n* = n − o(n/log log n).
(ii) θ̂*: unrestricted l.s.e. in the pseudo problem (p.l.s.e.).
(iii) θ̂_r*: restricted l.s.e. in the pseudo problem.
(iv) Σ*: the summation over the n* terms of the pseudo problem.
(v) Σ**: Σ_{i=1}^n − Σ*.

Generally, a single asterisk refers to the pseudo problem. Thus Theorems 3.10 and 3.18 carry over directly in the pseudo problem.
THEOREM 4.1. If ξ^(0) ∈ 𝒟 ⊂ Ξ, θ is well-identified at μ^(0) by t whose components are centers of observations (in the pseudo problem), the conditions of Lemma 3.13 are satisfied (in the pseudo problem), and f_j(θ_j^(0); t), f_{j+1}(θ_{j+1}^(0); t) have at most m_j − 1 t-derivatives in common at t = τ_j^(0), then

(4.1) θ̂* − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}),
(τ̂_j* − τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2});

(4.2) θ̂_r* − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}),
(τ̂_{r,j}* − τ_j^(0))^{m_j} = O_p(n^{-1/2}(log log n)^{1/2}).

The asymptotic behavior of ξ̂ will now be considered by first treating ξ̂*. Recall that Θ was defined to be the collection of θ's which lead to functions μ(θ; t) satisfying the continuity restraints, and Ξ the set of corresponding ξ's. Thus, Θ is a subset of q-dimensional space, and different asymptotic behavior occurs depending on whether θ^(0) is a boundary point or an interior point of Θ. It will be shown later that θ^(0) interior to Θ implies θ̂* has an asymptotic normal distribution.
LEMMA 4.2. Suppose D⁺(j, j, m_j) = D⁻(j, j, m_j), D⁺(j + 1, j, m_j) = D⁻(j + 1, j, m_j), j = 1, …, r − 1. If m₁, …, m_{r−1} are all odd, θ^(0) is an interior point of Θ. If any of the m_j are even, then θ^(0) is a boundary point of Θ.

PROOF. Recall equation (3.13): if τ_j is the intersection point of f_j(θ_j; t) and f_{j+1}(θ_{j+1}; t), then

(1/m_j!)[D(j + 1, j, m_j) − D(j, j, m_j) + o(1)](τ_j − τ_j^(0))^{m_j}
= [∂f_j(θ_j^(0); τ_j^(0))/∂θ_j + o(1)]′(θ_j − θ_j^(0)) − [∂f_{j+1}/∂θ_{j+1} + o(1)]′(θ_{j+1} − θ_{j+1}^(0))

for all θ_j, θ_{j+1} sufficiently close to θ_j^(0), θ_{j+1}^(0). If m_j is odd and D⁺ = D⁻, then for all θ_j, θ_{j+1}, equation (3.13) can be solved for τ_j. However, if m_j is even, then (3.13) cannot hold both for θ_j^(0) + δ_j, θ_{j+1}^(0) + δ_{j+1} and θ_j^(0) − δ_j, θ_{j+1}^(0) − δ_{j+1}, since the left side cannot change sign while the right side does. Thus any neighborhood of (θ_j^(0), θ_{j+1}^(0)) must contain points which are not in Θ, and so θ^(0) is a boundary point of Θ. ∎

An important special case of Lemma 4.2 occurs when each segment is a straight line. Then m₁ = m₂ = ⋯ = m_{r−1} = 1.
The next lemma shows that the rate of convergence of θ̂* to θ^(0) is n^{-1/2} rather than n^{-1/2}(log log n)^{1/2}.

LEMMA 4.3. θ̂* − θ^(0) = O_p(n^{-1/2}).

PROOF. In analogy with equation (2.5) define

s*(ξ) = n⁻¹ Σ* (X_{ni} − μ(ξ; t_{ni}))².

Theorem 4.1 implies that τ̂_j* ∈ L_j(n), j = 1, …, r − 1, with large probability as n → ∞. Since there are no observations (of the pseudo problem) within the intervals L_j(n), it follows that s*(ξ) within this region does not depend on τ and is a paraboloid in θ. In particular, it is twice differentiable in θ. For the remainder of the proof denote s*(ξ) by s*(θ). Thus, with large probability as n → ∞,

(4.3) 0 ≥ s*(θ̂*) − s*(θ^(0))
= n⁻¹{[n^{1/2} ∂s*(θ^(0))/∂θ]′[n^{1/2}(θ̂* − θ^(0))] + ½[n^{1/2}(θ̂* − θ^(0))]′[∂²s*(θ^(0))/∂θ²][n^{1/2}(θ̂* − θ^(0))]}.

The proof of Lemma 3.15 implies that the smallest eigenvalue of ∂²s*(θ^(0))/∂θ² is bounded away from zero as n → ∞, and the matrix is positive definite and converges to a limit as n → ∞. The vector n^{1/2}[∂s*(θ^(0))/∂θ] has mean 0 and uniformly bounded variance as n → ∞, so that it is O_p(1). Thus equation (4.3) implies that n^{1/2}(θ̂* − θ^(0)) = O_p(1). ∎
LEMMA 4.4. If θ^(0) is an interior point of Θ then

(4.4) ℒ[n^{1/2}(θ̂* − θ^(0))] → N(0, G⁻¹),

where ℒ(V) represents the distribution (law) of V and

(4.5) G_{jk} = ∫₀¹ (∂μ(θ^(0); t)/∂θ_j)(∂μ(θ^(0); t)/∂θ_k) dH(t),

where G is the q × q information matrix and is strictly positive definite.

PROOF. Define s*(θ) as in the proof of Lemma 4.3. Since θ^(0) is interior to Θ, any θ within a neighborhood of θ^(0) is "admissible" as a possibility for θ̂*. Thus, with large probability as n → ∞, θ̂* is the point at which the unrestrained minimum of s*(θ) occurs. Thus θ̂* may be obtained by setting the derivative of s*(θ) equal to 0. This implies

(4.6) {(1/n) Σ* (∂μ(θ^(0); t_{ni})/∂θ)(∂μ(θ^(0); t_{ni})/∂θ)′}(θ̂* − θ^(0)) = (1/n) Σ* (∂μ(θ^(0); t_{ni})/∂θ)e_{ni}.

The proof of Lemma 3.15 implies that the smallest eigenvalue of the matrix on the left-hand side of equation (4.6) is bounded away from zero. Since this matrix converges to G, by continuity G must be positive definite. Thus

n^{1/2}(θ̂* − θ^(0)) = {G⁻¹ + o_p(1)}n^{-1/2} Σ* e_{ni}(∂μ(θ^(0); t_{ni})/∂θ).

The Lindeberg–Feller central limit theorem for double sequences (see [16], page 295) implies the assertion of the lemma. ∎

COROLLARY 4.5. In the case of broken line regression (i.e., when each segment is a straight line), n^{1/2}(θ̂ − θ^(0)) has an asymptotic normal distribution with mean 0 and covariance G⁻¹.
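In the broken-line case the l.s.e. can be computed in practice by profiling the residual sum of squares over a grid of candidate break points, since the fit is linear once the break is fixed. The sketch below is an illustration, not the paper's algorithm; `fit_broken_line`, the grid, and all constants are ad hoc choices, and it simply checks that the estimated break lands near the truth for a moderate sample.

```python
import numpy as np

def fit_broken_line(t, x, taus):
    """Continuous two-phase fit: profile least squares over candidate breakpoints."""
    best = None
    for tau in taus:
        # continuous parameterization: x ~ a + b*(t - tau) + c*(t - tau)_+
        D = np.column_stack([np.ones_like(t), t - tau, np.maximum(t - tau, 0.0)])
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        rss = float(((x - D @ coef) ** 2).sum())
        if best is None or rss < best[0]:
            best = (rss, tau, coef)
    return best[1], best[2]

rng = np.random.default_rng(3)
n, tau0 = 2000, 0.4
t = np.sort(rng.uniform(0, 1, n))
mu = 1.0 + 1.0 * (t - tau0) + (-3.0) * np.maximum(t - tau0, 0.0)  # slopes 1 and -2
x = mu + 0.2 * rng.standard_normal(n)
tau_hat, coef = fit_broken_line(t, x, np.linspace(0.2, 0.6, 201))
print(abs(tau_hat - tau0) < 0.05)
```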

Before discussing the case when θ^(0) is a boundary point of Θ, it is necessary to introduce two definitions. These appear in Chernoff [2].

DEFINITION 4.6. A set C is positively homogeneous if x ∈ C implies cx ∈ C for c > 0.

DEFINITION 4.7. The set Θ is approximated by the positively homogeneous set C if inf_{x∈C} |x − y| = o(|y|) for y ∈ Θ, |y| → 0, and inf_{y∈Θ} |x − y| = o(|x|) for x ∈ C, |x| → 0.

Let Θ − θ^(0) denote the translate of Θ by θ^(0); i.e., Θ − θ^(0) = {θ − θ^(0); θ ∈ Θ}. This amounts to translating the origin of the parameter space to θ^(0).
LEMMA 4.8. If (i) θ^(0) is a boundary point of Θ, and (ii) Θ − θ^(0) is approximated by the convex, positively homogeneous set C, then the asymptotic distribution of n^{1/2}(θ̂* − θ^(0)) is that of the closest (in the metric determined by the asymptotic information matrix, G) point in C to Z, where ℒ(Z) = N(0, G⁻¹).

PROOF. The p.l.s.e. θ̂* is that element of Θ which minimizes s*(θ). Let θ̄* denote the parameter which minimizes s*(θ) without regard for the continuity restraints at the change-over points. Then ℒ[n^{1/2}(θ̄* − θ^(0))] → N(0, G⁻¹), where G is specified in the statement of Lemma 4.4.

Recall that within the region of interest of the parameter space, s*(ξ) does not depend on τ and is a paraboloid in θ. It will thus be convenient for the remainder of the proof to denote s*(ξ) by s*(θ). The change-over points between segments will be understood to lie in L_j(n), j = 1, …, r − 1, regardless of whether or not θ ∈ Θ.

For all θ in the neighborhood of θ^(0),

(4.7) s*(θ) = s*(θ̄*) + (θ̄* − θ)′G_n(θ̄* − θ),

where G_n is the matrix on the left-hand side of equation (4.6). G_n converges to G, where G is given in equation (4.5). With large probability θ̂* is that element of Θ which minimizes (θ̄* − θ)′G_n(θ̄* − θ). It may be shown by an argument that utilizes the convexity of C that θ̂* = θ̃* + o_p(n^{-1/2}), where θ̃* minimizes (θ̄* − θ)′G(θ̄* − θ) over all θ with θ − θ^(0) ∈ C. (The argument is a bit lengthy and so has been omitted. It is available from the author upon request.) This implies that n^{1/2}(θ̂* − θ^(0)) and n^{1/2}(θ̃* − θ^(0)) have the same asymptotic distribution, which may be shown by a continuity argument to be that which is asserted in the statement of the lemma. This completes the proof. ∎

Note 1. Condition (ii) is satisfied if Θ is locally convex at θ^(0).

Note 2. Condition (ii), that C is convex, is a bit stronger than what is actually needed to prove the lemma. If C is a positively homogeneous set which can be divided into disjoint convex sets {C_a} such that with large probability as n → ∞ the estimates θ̂*, θ̄* and θ̃* (the closest point in Θ to θ̄* in the metric determined by G) each lie in the same C_a, then the proof goes through. This relaxed condition is necessary for the validity of example 3, case (iv) at the end of this section.
Lemmas 4.4 and 4.8 are the basic building blocks for the calculation of the asymptotic distribution of θ̂, the l.s.e. in the original problem. It now remains to show that θ̂ and θ̂* do not differ by too much. The next sequence of lemmas leads up to the result that θ̂ − θ̂* = o_p(n^{-1/2}). This shows that the rate of convergence of θ̂ to θ^(0) is n^{-1/2} rather than n^{-1/2}(log log n)^{1/2}, and that θ̂* and θ̂ (suitably normalized) have the same asymptotic distribution.

The following three lemmas lead to uniform bounds on the order of magnitude of certain expressions which occur in regression problems. The bounds will be precisely stated in Lemmas 4.10 and 4.11.

Lemma 4.9 concerns the behavior of partial sums of numbers. This is Theorem 3.1 of Sylwester [24] and is included here only for the sake of completeness. The proof given below is due to a referee.
LEMMA 4.9. Let y_i, i = 1, 2, …, n be any n numbers and w_i, i = 1, 2, …, n be any n numbers such that 1 ≥ w₁ ≥ w₂ ≥ ⋯ ≥ w_n ≥ 0. Then

max_{1≤k≤n} |Σ_{i=1}^k y_i w_i| ≤ max_{1≤k≤n} |Σ_{i=1}^k y_i|.

PROOF. Define w_{n+1} = 0. Let d_i = w_i − w_{i+1} ≥ 0, i = 1, …, n. Then w_i = Σ_{j=i}^n d_j. Thus

max_{1≤k≤n} |Σ_{i=1}^k y_i w_i| = max_{1≤k≤n} |Σ_{i=1}^k y_i(Σ_{j=i}^n d_j)| = max_{1≤k≤n} |Σ_{j=1}^n d_j Σ_{i=1}^{min(j,k)} y_i|
≤ Σ_{j=1}^n d_j[max_{1≤k≤n} |Σ_{i=1}^{min(j,k)} y_i|] ≤ (Σ_{j=1}^n d_j)(max_{1≤k≤n} |Σ_{i=1}^k y_i|) ≤ max_{1≤k≤n} |Σ_{i=1}^k y_i|. ∎
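Lemma 4.9 is easy to check numerically (an illustrative sketch, not from the paper): for random y and random decreasing weights in [0, 1], the weighted maximal partial sum never exceeds the unweighted one.

```python
import numpy as np

rng = np.random.default_rng(4)
ok = True
for _ in range(1000):
    n = int(rng.integers(1, 30))
    y = rng.standard_normal(n)
    w = np.sort(rng.uniform(0, 1, n))[::-1]   # 1 >= w_1 >= ... >= w_n >= 0
    lhs = np.abs(np.cumsum(y * w)).max()      # max_k |sum_{i<=k} y_i w_i|
    rhs = np.abs(np.cumsum(y)).max()          # max_k |sum_{i<=k} y_i|
    ok &= lhs <= rhs + 1e-12
print(ok)
```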
Let ℱ_{s,z} = {f(t)} be the class of functions defined on 0 ≤ t ≤ 1 such that f(t) is composed of at most s segments, each of which is a differentiable function possessing at most z sign changes in derivative. Let e_{ni}, i = 1, 2, …, n be a double sequence of i.i.d. random variables having mean 0 and variance 1. Denote f(t_{ni}) by f_{ni}. For convenience arrange the t_{ni} so that 0 ≤ t_{n1} ≤ ⋯ ≤ t_{nn} ≤ 1.

LEMMA 4.10.

|Σ_{i=1}^n f_{ni}e_{ni}| ≤ {max_{1≤i≤n} |f_{ni}|}O_p(n^{1/2}),

where O_p(n^{1/2}) applies uniformly over all f ∈ ℱ_{s,z}.

PROOF. Assume without loss of generality max |f_{ni}| = 1. By definition of ℱ_{s,z}, the set I = {t: |f(t)| ≤ 1}, which contains all the t_{ni}, consists of at most s(z + 1) (possibly degenerate) intervals, in each of which df(t)/dt has at most z zeros. Therefore each of the s(z + 1) intervals can be subdivided into 2(z + 1) or fewer subintervals, throughout which f(t) is positive and increasing, negative and increasing, positive and decreasing, or negative and decreasing. Let I(k), k = 1, 2, …, 2s(z + 1)², denote the kth such subinterval. Obviously

(4.8) |Σ_{i=1}^n f_{ni}e_{ni}| ≤ Σ_k |Σ_{t_{ni}∈I(k)} f_{ni}e_{ni}|.

The f_{ni} can be assumed positive and decreasing within each subinterval, since otherwise the f_{ni} can be multiplied by minus one or the summation can be taken in reverse order. By Lemma 4.9

(4.9) |Σ_{t_{ni}∈I(k)} f_{ni}e_{ni}| ≤ max_{1≤h≤l≤n} |Σ_{i=h}^l e_{ni}| ≡ A_n.

Equation (4.9) is valid for every nondegenerate interval. The set I may also contain degenerate one-point intervals in which all f(t_{ni}) are +1 or all are −1. For these intervals (4.9) holds trivially. Hence from (4.8) and (4.9)

(4.10) |Σ_{i=1}^n f_{ni}e_{ni}| ≤ 2s(z + 1)²A_n.

Thus far in the proof probability has not been mentioned. The only random variable on the right-hand side of (4.10) is A_n, which does not involve ℱ_{s,z}. By Kolmogorov's inequality A_n = O_p(n^{1/2}). ∎
The proofof Lemma4.10 can easilybe extendedto yield
LEMMA 4.11. Let 5? denotethecollectionof subsets,S, of [0, 1] each of which
consistsof at mostm intervals. Then

IZESffnke.nli < {maxs IfjjI}Op(n1)

forf e
uniformly .8 and Se S ..
LEMMA 4.12. If (i) θ is well-identified at μ^(0) by t and the components of t are centers of observations, (ii) the conditions of Lemma 3.13 are satisfied (in both the original and pseudo problems), and (iii) Θ is locally convex at θ^(0), then

(4.11) θ̂ − θ̂* = o_p(n^{-1/2}).

PROOF. It follows from the definition of identification that θ is also identified at μ^(0) by t in the pseudo problem, and the components of t are centers of observations with respect to the pseudo problem. It follows from Theorems 3.16 and 4.1 that θ̂ − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}) and θ̂* − θ^(0) = O_p(n^{-1/2}(log log n)^{1/2}).

Select a_n > 0 such that a_n/(log log n)^{1/2} → ∞ and a_n = o((n/n**)^{1/2}). Let 𝒞_n = {ξ ∈ Ξ: |θ − θ^(0)| ≤ a_n n^{-1/2}, τ_j ∈ L_j(n), j = 1, …, r − 1}. Then ξ̂ and ξ̂* both lie in 𝒞_n with large probability as n → ∞. Note that the function s*(ξ) depends only on θ for ξ ∈ 𝒞_n, so that s*(ξ) = s*(θ).

Recall s(ξ) = n⁻¹ Σ_{i=1}^n (e_{ni} + v_{ni})², s*(ξ) = n⁻¹ Σ* (e_{ni} + v_{ni})², where v_{ni} = μ(ξ^(0); t_{ni}) − μ(ξ; t_{ni}). Thus

(4.12) s(ξ) = s*(ξ) + n⁻¹ Σ** (e_{ni} + v_{ni})²
= s*(ξ) + n⁻¹ Σ** e_{ni}² + 2n⁻¹ Σ** e_{ni}v_{ni} + n⁻¹ Σ** v_{ni}².

It follows from the definition of 𝒞_n that

sup_{ξ∈𝒞_n} max_{t∈∪_j L_j(n)} |μ(ξ; t) − μ(ξ^(0); t)| = O(a_n n^{-1/2}).

Thus sup_{ξ∈𝒞_n} |n⁻¹ Σ** v_{ni}²| = o_p(n⁻¹), and Lemma 4.11 with n** substituted for n implies

sup_{ξ∈𝒞_n} |n⁻¹ Σ** e_{ni}v_{ni}| = o_p(n⁻¹).

Equation (4.12) thus implies

(4.13) s(ξ) = s*(ξ) + n⁻¹ Σ** e_{ni}² + o_p(n⁻¹),

where o_p(n⁻¹) is uniformly small for ξ ∈ 𝒞_n.

Since ξ̂ and ξ̂* are the l.s.e. and p.l.s.e. respectively,

(4.14) s(ξ̂) ≤ s(ξ̂*), s*(ξ̂*) ≤ s*(ξ̂).

Equations (4.13) and (4.14) imply

(4.15) 0 ≤ s(ξ̂*) − s(ξ̂) = s*(ξ̂*) − s*(ξ̂) + o_p(n⁻¹) ≤ o_p(n⁻¹).

Therefore s*(ξ̂) − s*(ξ̂*) = o_p(n⁻¹). Since Θ is locally convex at θ^(0) and θ̂* is the minimizing value in Θ, it follows that the directional derivative of s*(ξ) at θ̂* in any direction into Θ is nonnegative. Now

(4.16) s*(θ̂) = s*(θ̂*) + n^{-1/2}(θ̂ − θ̂*)′[n^{1/2} ∂s*(θ̂*)/∂θ] + ½(θ̂ − θ̂*)′[∂²s*(θ̂*)/∂θ²](θ̂ − θ̂*).

Note that the last two terms on the right side of the above equation are nonnegative. Since the smallest eigenvalue of ∂²s*(θ̂*)/∂θ² is bounded away from zero, the quadratic term dominates, and equations (4.15) and (4.16) imply θ̂ − θ̂* = o_p(n^{-1/2}). ∎

Note. Condition (iii) is more than what is actually needed to prove the lemma. If Θ can be partitioned into disjoint locally convex sets {Θ_a} such that with large probability as n → ∞, ξ̂ and ξ̂* lie in the same Θ_a, then the directional derivative in (4.16) is nonnegative and the proof of the lemma goes through. This relaxed condition is necessary for the validity of example 3, case (iv) at the end of this section.
Lemma 4.12 implies that n^{1/2}(θ̂ − θ^(0)) and n^{1/2}(θ̂* − θ^(0)) have the same asymptotic distribution. Thus

THEOREM 4.13. Suppose (i) θ is well-identified at μ^(0) by t and the components of t are centers of observations, and (ii) the conditions of Lemma 3.13 are satisfied.

(a) If θ^(0) is an interior point of Θ, then ℒ[n^{1/2}(θ̂ − θ^(0))] → N(0, G⁻¹), where G is the positive definite information matrix and

G_{jk} = ∫₀¹ (∂μ(θ^(0); t)/∂θ_j)(∂μ(θ^(0); t)/∂θ_k) dH(t).

(b) If θ^(0) is a boundary point of Θ and Θ − θ^(0) is approximated by the convex, positively homogeneous set C, then ℒ[n^{1/2}(θ̂ − θ^(0))] converges to the distribution of π(Z), where π(Z) minimizes (Z − θ)′G(Z − θ) among all θ ∈ C, and ℒ(Z) = N(0, G⁻¹).

Note. The convexity assumption in (b) can be relaxed in the manner discussed in Note 2 following Lemma 4.8.

COROLLARY 4.14. If D⁺(j, j, m_j) = D⁻(j, j, m_j), D⁺(j + 1, j, m_j) = D⁻(j + 1, j, m_j), j = 1, 2, …, r − 1, and m₁, …, m_{r−1} are odd, then ℒ[n^{1/2}(θ̂ − θ^(0))] → N(0, G⁻¹).

The conditions of the corollary are satisfied in the important special case of broken line regression.

REMARK. If Ee⁴ < ∞ then σ̂² − σ̂*² = o_p(n^{-1/2}). To see this, note that σ̂² = n⁻¹ Σ_{i=1}^n (e_{ni} + v̂_{ni})² = n⁻¹ Σ_{i=1}^n e_{ni}² + O_p(n⁻¹) and σ̂*² = n*⁻¹ Σ* (e_{ni} + v̂*_{ni})² = n*⁻¹ Σ* e_{ni}² + O_p(n⁻¹). The central limit theorem implies that Σ* e_{ni}² = σ₀²n* + X and Σ_{i=1}^n e_{ni}² = σ₀²n + X + Y, where X = O_p(n^{1/2}), Y = o_p(n^{1/2}). Thus σ̂² − σ̂*² = o_p(n^{-1/2}).

The behavior of the $\hat\tau$'s will now be discussed. First the asymptotic behavior of the $\hat\tau^*$'s will be considered. It will then be shown that $\hat\tau_j$ and $\hat\tau_j^*$ are close and thus have the same asymptotic distribution.


Refer back to equation (3.13) for the intersection point of $f_j(\theta_j; t)$ and $f_{j+1}(\theta_{j+1}; t)$, for all $\theta_j$, $\theta_{j+1}$ sufficiently close to $\theta_j^{(0)}$, $\theta_{j+1}^{(0)}$, $j = 1, 2, \cdots, r-1$. Denote $(1/m_j!)[D^+(j+1, j, m_j) - D^+(j, j, m_j)]$ by $D_j^+$, $\partial f_j(\theta_j^{(0)}; \tau_k^{(0)})/\partial\theta_j$ by $f_{jk}'$, and delete the $+$ from $D_j^+$ if the $m_j$th $t$-derivatives are continuous at $\tau_j^{(0)}$. Since $\hat\theta^* - \theta^{(0)} = O_p(n^{-\frac{1}{2}})$, equation (3.13) implies that $(\hat\tau_j^* - \tau_j^{(0)})^{m_j} = O_p(n^{-\frac{1}{2}})$ and so (3.13) may be rewritten as

$$(4.17)\quad n^{\frac{1}{2}} B^* = A[n^{\frac{1}{2}}(\hat\theta^* - \theta^{(0)})]$$

where $B^*$ is the $r-1$ dimensional column vector with $i$th coordinate $(\hat\tau_i^* - \tau_i^{(0)})^{m_i}$ and $A \equiv A(\tau^{(0)})$ is the $(r-1) \times q$ matrix whose $(i, j)$th components are $f_{ij}'/D_i^+$ for $K(1) + \cdots + K(i-1) + 1 \le j \le K(1) + \cdots + K(i)$, $-f_{i+1,j}'/D_i^+$ for $K(1) + \cdots + K(i) + 1 \le j \le K(1) + \cdots + K(i+1)$, and 0 otherwise, where $i$ runs from 1 to $r-1$.


The asymptotic distribution of $\hat\tau^*$ depends on the value of $m_j$ and on whether or not $D_j^+ = D_j^-$. Suppose first that all of the $m_j$ are odd and $D_j^+ = D_j^- = D_j$, $j = 1, 2, \cdots, r-1$. Lemma 4.2 and Theorem 4.13(a) imply that $n^{\frac{1}{2}}(\hat\theta^* - \theta^{(0)})$ is asymptotically normal. Equation (4.17) then implies

$$\mathcal{L}\{n^{\frac{1}{2}}((\hat\tau_1^* - \tau_1^{(0)})^{m_1}, \cdots, (\hat\tau_{r-1}^* - \tau_{r-1}^{(0)})^{m_{r-1}})'\} \to \mathcal{N}(0, AG^{-1}A').$$

If $D_j^+ \ne D_j^-$ then $n^{\frac{1}{2}}(\hat\theta^* - \theta^{(0)})$ may or may not be asymptotically normal, depending on just how the segments intersect. This will be illustrated by example. It is apparent though from (4.17) that $n^{\frac{1}{2}}(\hat\tau_j^* - \tau_j^{(0)})^{m_j}$ is not asymptotically normal. Its asymptotic distribution is a mixture of half normal distributions.

If $m_j$ is even, then it can be seen from (3.13) that $n^{\frac{1}{2}}((\hat\theta_j^* - \theta_j^{(0)})', (\hat\theta_{j+1}^* - \theta_{j+1}^{(0)})')'$ does not generally have an asymptotic normal distribution. However, if $D_k^+ = D_k^-$ and $m_k$ is odd for $k \ne j$, then $\hat\theta_k^*$ is the unrestrained l.s.e. for $k \ne j-1, j, j+1$. Thus $n^{\frac{1}{2}}(\hat\tau_k^* - \tau_k^{(0)})^{m_k}$ is asymptotically normally distributed for all $k \ne j-1, j, j+1$.

This discussion is summarized in the following theorem.
SEGMENTED REGRESSION PROBLEMS 77

THEOREM 4.15. (i) If $m_j$ is odd and $D_j^+ = D_j^-$, $j = 1, 2, \cdots, r-1$, then

$$(4.18)\quad \mathcal{L}\{n^{\frac{1}{2}}(\hat\tau_1^* - \tau_1^{(0)})^{m_1}, \cdots, n^{\frac{1}{2}}(\hat\tau_{r-1}^* - \tau_{r-1}^{(0)})^{m_{r-1}}\} \to \mathcal{N}(0, AG^{-1}A')$$

where $A \equiv A(\tau^{(0)})$ is the $(r-1) \times q$ matrix on the right side of (4.17). If $m_j$ is odd but $D_j^+ \ne D_j^-$, then the asymptotic distribution of $n^{\frac{1}{2}}(\hat\tau_j^* - \tau_j^{(0)})^{m_j}$ may be nonnormal.

(ii) If $m_k$ is even, then with the exception of $j = k-1, k, k+1$ the statements in (i) are applicable.
It will now be shown that $\hat\tau_j$ and $\hat\tau_j^*$ have the same asymptotic distribution.

LEMMA 4.16. Let $B^*$ be the $r-1$ dimensional column vector defined in (4.17) and let $B$ be defined similarly, but with $\hat\tau_i^*$ replaced by $\hat\tau_i$. Then $n^{\frac{1}{2}}(B - B^*) = o_p(1)$.

PROOF. Equation (4.17) is valid at $\hat\theta$ as well as $\hat\theta^*$. Thus

$$n^{\frac{1}{2}}(B - B^*) = A(\tau^{(0)})\,n^{\frac{1}{2}}(\hat\theta - \hat\theta^*) + o_p(1) = o_p(1). \quad \square$$

Theorem 4.15 and Lemma 4.16 imply

THEOREM 4.17. The asymptotic distribution of $\hat\tau$ as $n \to \infty$ is the same as that stated for $\hat\tau^*$ in Theorem 4.15. In particular, if $m_1, \cdots, m_{r-1}$ are odd and $D_j^+ = D_j^- = D_j$, $j = 1, \cdots, r-1$, then $\mathcal{L}(n^{\frac{1}{2}}B) \to \mathcal{N}(0, AG^{-1}A')$ where $B$ is defined in Lemma 4.16.
The section will be concluded with several examples that illustrate the results stated in the theorems.

1. Consider the two-segment broken line regression model

$$f(\theta; t) = \theta_{11} + \theta_{12} t, \quad 0 \le t \le \tau^{(0)};$$
$$\qquad\quad\;\;\, = \theta_{21} + \theta_{22} t, \quad \tau^{(0)} < t \le 1.$$

Here $r = 2$, $K(1) = K(2) = 2$, $m_1 = 1$, $D_1^+ = D_1^- = \theta_{22}^{(0)} - \theta_{12}^{(0)}$. If $f(\theta^{(0)}; t)$ is identified by centers of observations, Corollary 4.14 implies that $\mathcal{L}\{n^{\frac{1}{2}}(\hat\theta - \theta^{(0)})\} \to \mathcal{N}(0, G^{-1})$, where $\sigma^2 G$ is the $4 \times 4$ information matrix with components

$$\sigma^2 G_{ij} = \int_0^{\tau^{(0)}} t^{i+j-2}\,dH(t) \quad\text{for } i = 1, 2;\ j = 1, 2$$
$$\qquad\;\; = 0 \quad\text{for } i = 1, 2;\ j = 3, 4 \ \text{ or } \ i = 3, 4;\ j = 1, 2$$
$$\qquad\;\; = \int_{\tau^{(0)}}^1 t^{i+j-6}\,dH(t) \quad\text{for } i = 3, 4;\ j = 3, 4.$$

Equation (4.17) (substituting $\hat\tau$ for $\hat\tau^*$) asserts that

$$n^{\frac{1}{2}}(\hat\tau - \tau^{(0)}) = n^{\frac{1}{2}} A(\tau^{(0)})(\hat\theta - \theta^{(0)}) + o_p(1)$$

where $A(\tau^{(0)}) = (1, \tau^{(0)}, -1, -\tau^{(0)})/(\theta_{22}^{(0)} - \theta_{12}^{(0)})$. Thus $\mathcal{L}\{n^{\frac{1}{2}}(\hat\tau - \tau^{(0)})\} \to \mathcal{N}(0, AG^{-1}A')$. This concludes the example.
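The limit law in this example is easy to probe by simulation. The sketch below is my illustration, not from the paper: it uses the $\hat\theta^*$ device of Section 4 (ordinary least squares within each true segment), estimates the change-over point as the intersection of the two fitted lines, and checks that $\hat\tau^*$ concentrates at $\tau^{(0)} = \frac{1}{2}$. The parameter values, sample size and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
tau0 = 0.5
th1 = np.array([0.0, 1.0])    # left segment  0 + 1*t   (assumed values)
th2 = np.array([1.0, -1.0])   # right segment 1 - 1*t ; lines cross at t = 0.5
n, reps, sigma = 200, 2000, 0.1

t = (np.arange(n) + 0.5) / n              # equally spaced design on [0, 1]
left = t <= tau0
X = np.column_stack([np.ones(n), t])
mean = np.where(left, X @ th1, X @ th2)

tau_hat = np.empty(reps)
for r in range(reps):
    y = mean + sigma * rng.standard_normal(n)
    # Unrestrained least squares within each true segment (the theta* device).
    b1, *_ = np.linalg.lstsq(X[left], y[left], rcond=None)
    b2, *_ = np.linalg.lstsq(X[~left], y[~left], rcond=None)
    # Intersection of the two fitted lines estimates the change-over point.
    tau_hat[r] = (b1[0] - b2[0]) / (b2[1] - b1[1])
```

With these values the empirical distribution of $\hat\tau^*$ is centered at $\frac{1}{2}$ with spread of order $n^{-\frac{1}{2}}$, consistent with the normal limit $\mathcal{N}(0, AG^{-1}A')$.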
Hinkley [11] reports for this special case, based on an empirical study, that the asymptotic normality of $\hat\tau$ is not a good approximation for small sample sizes. The estimated change-over point is given by

$$\hat\tau = \frac{\hat\theta_{11} - \hat\theta_{21}}{\hat\theta_{22} - \hat\theta_{12}} = \frac{(\theta_{11}^{(0)} - \theta_{21}^{(0)}) + [(\hat\theta_{11} - \hat\theta_{21}) - (\theta_{11}^{(0)} - \theta_{21}^{(0)})]}{(\theta_{22}^{(0)} - \theta_{12}^{(0)}) + [(\hat\theta_{22} - \hat\theta_{12}) - (\theta_{22}^{(0)} - \theta_{12}^{(0)})]}.$$

The denominator can be expanded in a geometric series to as high an order term as desired. Thus

$$\hat\tau - \tau^{(0)} = \left[\frac{(\hat\theta_{11} - \hat\theta_{21}) - (\theta_{11}^{(0)} - \theta_{21}^{(0)})}{\theta_{22}^{(0)} - \theta_{12}^{(0)}} - \tau^{(0)}\,\frac{(\hat\theta_{22} - \hat\theta_{12}) - (\theta_{22}^{(0)} - \theta_{12}^{(0)})}{\theta_{22}^{(0)} - \theta_{12}^{(0)}}\right] + O_p(n^{-1}).$$

The first term on the right-hand side of the above expression, when suitably normalized, converges to the normal distribution discussed above. The second (correction) term is $O_p(n^{-1})$. Additional terms could be included if desired. Hinkley [12] discusses the distribution of the ratio of two correlated normal random variables, of which the distribution of $\hat\tau$ is an example.
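The order of the correction term is easy to verify numerically. The sketch below is an illustration I am adding (not from the paper); the perturbation sizes standing in for $\hat\theta - \theta^{(0)}$ are arbitrary, scaled as $O(n^{-\frac{1}{2}})$.

```python
import numpy as np

# Stand-ins for the numerator and denominator of tau-hat at theta^(0)
# (arbitrary values; tau0 = N0/D0 = 0.5 as in the example above).
N0, D0 = -1.0, -2.0

def expansion_error(n):
    """Compare the exact ratio with its first-order (geometric-series) term
    when the estimates deviate from truth by O(n^{-1/2})."""
    dN, dD = 0.7 / np.sqrt(n), 0.4 / np.sqrt(n)   # O(n^{-1/2}) perturbations
    exact = (N0 + dN) / (D0 + dD) - N0 / D0
    first_order = dN / D0 - (N0 / D0) * dD / D0
    return abs(exact - first_order)
```

Multiplying $n$ by 100 shrinks the neglected remainder by roughly a factor of 100, i.e. the correction term is $O_p(n^{-1})$ while the first-order term is $O_p(n^{-\frac{1}{2}})$.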
2. In this example $\theta^{(0)}$ lies at a boundary of $\Theta$. Consider the two segment regression model

$$f(\theta; t) = \theta_{11}, \qquad\qquad\;\; 0 \le t \le \tau;$$
$$\qquad\quad\;\, = \theta_{21}(t - \tfrac{1}{2})^2, \quad \tau < t \le 1.$$

[FIG. 4.1]

Suppose $\theta_{11}^{(0)} = 0$, $\theta_{21}^{(0)} = 1$. Then $r = 2$, $f_1(\theta_1; t) = \theta_{11}$, $f_2(\theta_2; t) = \theta_{21}(t - \frac{1}{2})^2$, $\tau^{(0)} = \frac{1}{2}$, $K(1) = K(2) = 1$, $m_1 = 2$. Assume that $\sigma^2 = 1$. The parameter space $\Theta$ consists of the first and third quadrants of two-dimensional Euclidean space, including the $\theta_{21}$ axis but excluding the $\theta_{11}$ axis. The state of nature, $(0, 1)$, is at a boundary of $\Theta$.
Suppose, for definiteness, that observations are equally spaced along the interval $[0, 1]$. It is not hard to show that the conditions of Lemma 4.12 are satisfied and so $\hat\theta - \hat\theta^* = o_p(n^{-\frac{1}{2}})$, etc. It thus suffices to discuss the asymptotic distribution of $\hat\theta^*$.
Calculate the unrestrained estimators $\hat\theta_1^*$, $\hat\theta_2^*$ based on the observations within each segment separately. $\Theta$ is locally convex at $(0, 1)$ and so the conditions of Lemma 4.8 are satisfied, taking for $C$ the half space $\{\theta: \theta_{11} \ge 0\}$. In this example $G$ is the $2 \times 2$ matrix with diagonal elements

$$\int_0^{\frac{1}{2}} dt = \tfrac{1}{2} \qquad\text{and}\qquad \int_{\frac{1}{2}}^1 (t - \tfrac{1}{2})^4\,dt = \tfrac{1}{160}$$

and off diagonal elements equal to 0. Lemma 4.8 asserts that $n^{\frac{1}{2}}(\hat\theta^* - \theta^{(0)})$ has the same asymptotic distribution as the point in $C$ closest to $Z$, where $\mathcal{L}(Z) = \mathcal{N}(0, G^{-1})$ and closeness is measured in the metric determined by $G$. Thus

$$\mathcal{L}\{n^{\frac{1}{2}}(\hat\theta_{11}^*, \hat\theta_{21}^* - 1)\} \to \mathcal{L}(Z_1^+, Z_2)$$

where $Z_1^+ \equiv \max(0, Z_1)$ and $\mathcal{L}\{(Z_1, Z_2)\} = \mathcal{N}(0, \mathrm{diag}(2, 160))$.

The change-over point, $\hat\tau^*$, is estimated in the obvious manner. Thus $(\hat\tau^* - \frac{1}{2})^2 = \hat\theta_{11}^*/\hat\theta_{21}^*$ and so

$$\mathcal{L}\{n^{\frac{1}{2}}(\hat\tau^* - \tfrac{1}{2})^2\} \to \mathcal{L}\{Z_1^+\} \qquad\text{where } \mathcal{L}\{Z_1\} = \mathcal{N}(0, 2).$$
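The boundary effect in this example is visible in a short simulation. The sketch below is my illustration (sample size and noise level are arbitrary assumptions): only the first segment is simulated, since the behavior at issue is that of $\hat\theta_{11}^*$, the mean of the observations on $[0, \frac{1}{2}]$, after projection onto $C = \{\theta_{11} \ge 0\}$. At $\theta^{(0)} = (0, 1)$ roughly half the replications land on the boundary, where $(\hat\tau^* - \frac{1}{2})^2 = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma = 200, 4000, 0.1
n1 = n // 2                      # observations falling in [0, 1/2]

at_boundary = 0
for _ in range(reps):
    # True first segment: f1 = theta11 = 0, so these data are pure noise.
    y1 = sigma * rng.standard_normal(n1)
    th11_unres = y1.mean()            # unrestrained l.s.e., the Z_1 analogue
    th11_hat = max(0.0, th11_unres)   # projection onto C = {theta11 >= 0}
    if th11_hat == 0.0:
        # Here (tau* - 1/2)^2 = theta11-hat / theta21-hat = 0 exactly.
        at_boundary += 1

frac = at_boundary / reps
```

The fraction `frac` approaches $\frac{1}{2}$, the mass that the half-normal limit $Z_1^+$ places at zero.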
3. In this example $\theta^{(0)}$ is sometimes a boundary point and sometimes an interior point of $\Theta$. However, even when $\theta^{(0)}$ is interior, $\hat\tau$ is not normally distributed. Suppose $f(\theta; t)$ is represented by the two segment model $f(\theta; t) = |t - \frac{1}{2}|$ if $0 \le t \le \tau$ and $f(\theta; t) = \theta_1 + \theta_2 t$ if $\tau \le t \le 1$.

The parameter space $\Theta$ is pictured below. Suppose $\theta^{(0)}$ is such that $\theta_1^{(0)} + \theta_2^{(0)}/2 = 0$. Then $\theta^{(0)}$ lies along the dashed line and is thus an interior point of $\Theta$ if $|\theta_2^{(0)}| > 1$ and is a boundary point if $|\theta_2^{(0)}| < 1$.

[FIG. 4.2: the parameter space $\Theta$ in the $(\theta_1, \theta_2)$ plane, with the dashed line $\theta_1 + \theta_2/2 = 0$.]
In either case, $\tau^{(0)} = \frac{1}{2}$. The case $|\theta_2^{(0)}| = 1$ will be discussed separately.

Suppose that the observations are equally spaced along the interval $[0, 1]$. If $|\theta_2^{(0)}| \ne 1$, $\theta$ is well-identified; if $\theta_2^{(0)} = -1$, $\theta$ is identified but not well-identified; and if $\theta_2^{(0)} = 1$, $\theta$ is unidentified. When $|\theta_2^{(0)}| \ne 1$ the conditions of Theorem 4.13 are satisfied. We consider this situation first.
CASE (i). $|\theta_2^{(0)}| < 1$. The parameter space $\Theta - \theta^{(0)}$ is approximated by the half space $2\theta_1 + \theta_2 \ge 0$, as defined in Definition 4.7. Theorem 4.13(b) implies $\mathcal{L}[n^{\frac{1}{2}}(\hat\theta - \theta^{(0)})] \to \mathcal{L}[\pi(Z)]$ where $\pi(Z)$ minimizes $(Z - \theta)'G(Z - \theta)$ among all $\theta$ such that $2\theta_1 + \theta_2 \ge 0$ and $\mathcal{L}(Z) = \mathcal{N}(0, G^{-1})$. It is easily seen that

$$\pi(Z) = Z \quad\text{if } 2Z_1 + Z_2 \ge 0;$$
$$\pi(Z) = \frac{Z'G(1, -2)'}{(1, -2)G(1, -2)'}\binom{1}{-2} \quad\text{if } 2Z_1 + Z_2 < 0.$$

Thus $2\pi_1(Z) + \pi_2(Z) = \max(0, 2Z_1 + Z_2)$.

The estimated change-over point, $\hat\tau$, must satisfy the equation

$$|\hat\tau - \tfrac{1}{2}| = \hat\theta_1 + \hat\theta_2\hat\tau.$$

Therefore

$$|\hat\tau - \tfrac{1}{2}| - (\hat\tau - \tfrac{1}{2})\theta_2^{(0)} = [(\hat\theta_1 + \tfrac{1}{2}\hat\theta_2) - (\theta_1^{(0)} + \tfrac{1}{2}\theta_2^{(0)})] + O_p(n^{-1}).$$

This equation has two solutions $\hat\tau_+$, $\hat\tau_-$, where $\hat\tau_+ \ge \frac{1}{2}$, $\hat\tau_- \le \frac{1}{2}$:

$$(\hat\tau_+ - \tfrac{1}{2}) = [(\hat\theta_1 + \tfrac{1}{2}\hat\theta_2) - (\theta_1^{(0)} + \tfrac{1}{2}\theta_2^{(0)})]/(1 - \theta_2^{(0)}) + O_p(n^{-1})$$
$$(\hat\tau_- - \tfrac{1}{2}) = -[(\hat\theta_1 + \tfrac{1}{2}\hat\theta_2) - (\theta_1^{(0)} + \tfrac{1}{2}\theta_2^{(0)})]/(1 + \theta_2^{(0)}) + O_p(n^{-1}).$$

Thus

$$\mathcal{L}[n^{\frac{1}{2}}(\hat\tau_+ - \tfrac{1}{2})] \to \mathcal{L}[(\pi_1(Z) + \tfrac{1}{2}\pi_2(Z))/(1 - \theta_2^{(0)})]$$
$$\mathcal{L}[n^{\frac{1}{2}}(\hat\tau_- - \tfrac{1}{2})] \to \mathcal{L}[-(\pi_1(Z) + \tfrac{1}{2}\pi_2(Z))/(1 + \theta_2^{(0)})].$$
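The projection formula in Case (i) can be checked directly. The sketch below is my illustration (the positive definite matrix used for $G$ is arbitrary): it projects $Z$ onto the half space $\{2\theta_1 + \theta_2 \ge 0\}$ in the metric determined by $G$ and verifies both the identity $2\pi_1(Z) + \pi_2(Z) = \max(0, 2Z_1 + Z_2)$ and the minimizing property.

```python
import numpy as np

def project(Z, G):
    """G-metric projection of Z onto the half space {2*th1 + th2 >= 0}:
    minimize (Z - th)' G (Z - th). If the constraint already holds, pi(Z) = Z;
    otherwise project onto the boundary line spanned by v = (1, -2)'."""
    if 2 * Z[0] + Z[1] >= 0:
        return Z.copy()
    v = np.array([1.0, -2.0])
    return (Z @ G @ v) / (v @ G @ v) * v

rng = np.random.default_rng(2)
G = np.array([[2.0, 0.3], [0.3, 1.0]])   # arbitrary positive definite matrix

for _ in range(100):
    Z = rng.standard_normal(2)
    p = project(Z, G)
    # Identity from the text: 2*pi_1(Z) + pi_2(Z) = max(0, 2Z_1 + Z_2).
    assert abs(2 * p[0] + p[1] - max(0.0, 2 * Z[0] + Z[1])) < 1e-12
    # Optimality against nearby feasible points.
    for th in p + 0.1 * rng.standard_normal((20, 2)):
        if 2 * th[0] + th[1] >= 0:
            assert (Z - p) @ G @ (Z - p) <= (Z - th) @ G @ (Z - th) + 1e-9
```

On the boundary the projection lands on the line $2\theta_1 + \theta_2 = 0$, so the linear functional $2\pi_1 + \pi_2$ vanishes there, which is exactly the half-normal truncation seen in the limit laws for $\hat\tau_\pm$.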

CASE (ii). $|\theta_2^{(0)}| > 1$. The underlying parameter, $\theta^{(0)}$, is an interior point of $\Theta$. Thus Theorem 4.13(a) implies

$$\mathcal{L}[n^{\frac{1}{2}}(\hat\theta - \theta^{(0)})] \to \mathcal{L}(Z) = \mathcal{N}(0, G^{-1}).$$

If $\theta_2^{(0)} > 1$,

$$\hat\tau - \tfrac{1}{2} = -[(\hat\theta_1 + \tfrac{1}{2}\hat\theta_2) - (\theta_1^{(0)} + \tfrac{1}{2}\theta_2^{(0)})]/(1 + \theta_2^{(0)}) + O_p(n^{-1}) \quad\text{if } \hat\theta_1 + \tfrac{1}{2}\hat\theta_2 \ge 0$$
$$\qquad = [(\hat\theta_1 + \tfrac{1}{2}\hat\theta_2) - (\theta_1^{(0)} + \tfrac{1}{2}\theta_2^{(0)})]/(1 - \theta_2^{(0)}) + O_p(n^{-1}) \quad\text{if } \hat\theta_1 + \tfrac{1}{2}\hat\theta_2 \le 0.$$

This implies

$$\mathcal{L}[n^{\frac{1}{2}}(\hat\tau - \tfrac{1}{2})] \to \mathcal{L}(\xi), \qquad \xi = -(Z_1 + \tfrac{1}{2}Z_2)/(1 + a) \ \text{ if } Z_1 + \tfrac{1}{2}Z_2 \ge 0$$
$$\qquad\qquad\qquad\qquad\qquad\quad\;\; = (Z_1 + \tfrac{1}{2}Z_2)/(1 - a) \ \text{ if } Z_1 + \tfrac{1}{2}Z_2 \le 0$$

where $a = |\theta_2^{(0)}|$. Similarly, if $\theta_2^{(0)} < -1$, $\mathcal{L}[n^{\frac{1}{2}}(\hat\tau - \tfrac{1}{2})] \to \mathcal{L}(-\xi)$. In both instances $\hat\theta$, suitably normalized, is asymptotically normal; however $\hat\tau$ is not.
CASE (iii). $\theta_2^{(0)} = 1$. The parameter $\theta$ is not identified and so there is no reason that $\hat\theta$ should even be consistent. However Corollary 3.22 is applicable.
CASE (iv). $\theta_2^{(0)} = -1$. The regression is identified but not well-identified. The parameter space $\Theta - \theta^{(0)}$ is approximated by the wedge-shaped positively homogeneous region, $C$, of apex angle 206.6 degrees between the half lines $\{\theta_1 + \frac{1}{2}\theta_2 = 0,\ \theta_2 + 1 \ge 0\}$ and $\{\theta_1 = \frac{1}{2},\ \theta_2 + 1 \le 0\}$. Note that $C$ is not convex here. However $\Theta$ and $C$ satisfy the relaxed condition mentioned in the notes following Lemmas 4.8 and 4.12. The region $C$ can be divided into two convex regions by a half line beginning at the point $(\frac{1}{2}, -1)$. The appropriate angle of the line depends on $G$. The extension of the line into the complement of $C$ should contain all points equidistant (in the metric determined by $G$) to both arms of the wedge. Thus, from Lemmas 4.8 and 4.12, $\mathcal{L}[n^{\frac{1}{2}}(\hat\theta - \theta^{(0)})] \to \mathcal{L}[\pi(Z)]$ where $(Z - \pi(Z))'G(Z - \pi(Z)) = \min_{\theta \in C}(Z - \theta)'G(Z - \theta)$. The estimate of the change-over point, $\hat\tau$, is not even consistent.
5. Some unresolved problems. This paper provides a derivation of the asymptotic distribution theory in the identified case, but leaves several important questions unanswered.
(i) Hinkley [11] has done numerical work which shows lack of agreement between the empirical and asymptotic distributions for moderate sample sizes, in a special case. Work should be done to assess the sample sizes necessary for the asymptotics to be valid and to obtain moderate sample size approximations to the distributions of the least squares estimators.
(ii) The distribution of the likelihood ratio statistic, $\lambda$, is of interest. It may be shown by arguments corollary to those of this paper that in the identified case $-2\log\lambda$ has the usual chi-square asymptotic distribution. However these arguments break down in the unidentified case. This is precisely the situation where one wants to test whether or not there is a change in the regression. Quandt [20] reports on empirical grounds that the distribution of $-2\log\lambda$ does not appear to be chi-square. This problem is considered in Feder [9], where it is shown by example that the asymptotic distribution varies with the spacing of the $t$-values.
(iii) The design question is of interest. That is, how should one select the independent variables, either sequentially or nonsequentially, to obtain the most precise parameter estimates and the most powerful tests?
6. Acknowledgments. I wish to thank many people who generously gave me some of their time to discuss ideas pertaining to this manuscript. In particular, Professors Herman Chernoff and the late L. J. Savage provided some valuable insights that kept me from sinking into the quicksand of definitions, theorems, and corollaries.

I thank Professor Herman Chernoff at Stanford University, Professor Francis J. Anscombe at Yale University, Professor John Tukey at Princeton University, and Dr. Richard L. Shuey at General Electric Corporate Research and Development for their support of this work, and the referee for his very helpful comments and suggestions.
I would also like to note that Sylwester's thesis, [24], was the first paper that came to grips with some of the difficulties inherent in regression problems having "kinky" sums of squares functions. His approach provided motivation for me and influenced the format of the present manuscript.

REFERENCES

[1] BELLMAN, R. and ROTH, R. (1969). Curve fitting by segmented straight lines. J. Amer. Statist. Assoc. 64 1079-1084.
[2] CHERNOFF, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist. 25 573-578.
[3] CRAMÉR, H. (1966). Mathematical Methods of Statistics. Princeton Univ. Press.
[4] DOOB, J. L. (1953). Stochastic Processes. Wiley, New York.
[5] DUNICZ, B. L. (1969). Discontinuities in the surface structure of alcohol-water mixtures. Sond. aus der Kolloid-Zeitschrift und Zeitschrift für Polymere 230 346-357.
[6] FARLEY, J. U. and HINICH, M. J. (1970). A test for a shifting slope coefficient in a linear model. J. Amer. Statist. Assoc. 65 1320-1329.
[7] FARLEY, J. U., HINICH, M. and MCGUIRE, T. W. (1971). Testing for a shift in the slopes of a multivariate linear time series model. Technical Report WP 77-70-1, Graduate School of Industrial Administration, Carnegie-Mellon Univ.
[8] FEDER, P. I. (1967). On the likelihood ratio statistic with applications to broken line regression. Technical Report No. 23, Department of Statistics, Stanford Univ.
[9] FEDER, P. I. (1975). The log likelihood ratio in segmented regression. Ann. Statist. 3 84-97.
[10] GREVILLE, T. N. C., ed. (1969). Theory and Applications of Spline Functions. Academic Press, New York.
[11] HINKLEY, D. V. (1969). Inference about the intersection in two-phase regression. Biometrika 56 495-504.
[12] HINKLEY, D. V. (1969). On the ratio of two correlated normal random variables. Biometrika 56 635-639.
[13] HINKLEY, D. V. (1970). Inference about the change-point in a sequence of random variables. Biometrika 57 1-17.
[14] HUDSON, D. J. (1966). Fitting segmented curves whose join points have to be estimated. J. Amer. Statist. Assoc. 61 1097-1129.
[15] JENNRICH, R. I. (1969). Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist. 40 633-643.
[16] LOÈVE, M. (1960). Probability Theory (2nd ed.). Van Nostrand, New York.
[17] POIRIER, D. J. (1973). Piecewise regression using cubic splines. J. Amer. Statist. Assoc. 68 515-524.
[18] PRATT, J. W. (1959). On a general concept of "in probability." Ann. Math. Statist. 30 549-558.
[19] QUANDT, R. E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes. J. Amer. Statist. Assoc. 53 873-880.
[20] QUANDT, R. E. (1960). Tests of the hypothesis that a linear regression system obeys two separate regimes. J. Amer. Statist. Assoc. 55 324-330.
[21] ROBISON, D. E. (1964). Estimates for the points of intersection of two polynomial regressions. J. Amer. Statist. Assoc. 59 214-224.
[22] SCHOENBERG, I. J., ed. (1969). Approximations with Special Emphasis on Spline Functions. Academic Press, New York.
[23] SPRENT, P. (1961). Some hypotheses concerning two phase regression lines. Biometrics 17 634-645.
[24] SYLWESTER, D. L. (1965). On maximum likelihood estimation for two-phase linear regression. Technical Report No. 11, Department of Statistics, Stanford Univ.
[25] SYLWESTER, D. L. (1972). Asymptotic distribution theory for two-phase linear regression models. Personal communication.
GENERAL ELECTRIC COMPANY
CORPORATE RESEARCH AND DEVELOPMENT
BLDG. 37, ROOM 578
SCHENECTADY, NEW YORK 12301
