Silvey
Author(s): S. D. Silvey
Source: The Annals of Mathematical Statistics, Vol. 30, No. 2 (Jun., 1959), pp. 389-407
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2237089
Accessed: 28/09/2013 21:09
If a single-valued function $\hat\theta_n(\cdot, \omega^*)$ is thus defined for almost all $x$, then $\hat\theta_n(\cdot, \omega^*)$
is a random variable called a maximum likelihood estimator of $\theta_0$ in $\omega^*$. When
we refer to "almost all $x$" we mean almost all with respect to the probability
measure defined on the sequence space of points $x$ by the consideration that the
components of a sequence $x$ are regarded as independent observations on a
random variable $X$ with distribution function $F(\cdot, \theta_0)$. Similarly "almost all
$t \in R^q$" means almost all with respect to the probability measure defined on
$R^q$ by $F(\cdot, \theta_0)$.
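The matrix whose $(i, j)$th element is
$$\int_{R^q} \frac{\partial \log f(t, \theta)}{\partial \theta_i}\,\frac{\partial \log f(t, \theta)}{\partial \theta_j}\, dF(t, \theta)$$
we will denote by $B_\theta$. Further, $H_\theta$ will denote the $s \times r$ matrix
$(\partial h_j(\theta)/\partial \theta_i)$. For any real function $v$ defined on $R^s$, $Dv(\theta)$ will denote the
column vector whose $i$th component is $\partial v(\theta)/\partial \theta_i$, while $D^2 v(\theta)$ will denote the
$s \times s$ matrix whose $(i, j)$th component is $\partial^2 v(\theta)/\partial \theta_i \partial \theta_j$. Generally column
vectors corresponding to points in Euclidean space will be printed in the
corresponding boldface type so that, for example, the column vector $\boldsymbol{\theta}$ corresponds
to the point $\theta$.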
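To make the definition of $B_\theta$ concrete, the following minimal sketch evaluates the defining integral for a Bernoulli density $f(t, \theta) = \theta^t (1 - \theta)^{1 - t}$ on $t \in \{0, 1\}$, where the integral is a two-term sum and $s = 1$, so the matrix reduces to a scalar. The Bernoulli model and the function names `score`, `density` and `information` are assumptions chosen for illustration; they do not appear in the paper.

```python
from fractions import Fraction

def score(t, theta):
    """Derivative in theta of log f(t, theta) for the Bernoulli density (illustrative model)."""
    return Fraction(t) / theta - Fraction(1 - t) / (1 - theta)

def density(t, theta):
    """Bernoulli probability of the outcome t in {0, 1}."""
    return theta if t == 1 else 1 - theta

def information(theta):
    """Sum of score^2 * f over the sample space, i.e. the integral defining B_theta."""
    return sum(score(t, theta) ** 2 * density(t, theta) for t in (0, 1))

theta = Fraction(1, 4)
B = information(theta)
```

In this one-parameter case the sum reduces to $1/\{\theta(1 - \theta)\}$, which is strictly positive for $0 < \theta < 1$.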
We will be interested initially in the emergence of $\hat\theta_n(x, \omega)$ as a solution of the
equations
$$n^{-1} D \log L_n(x, \theta) + H_\theta \lambda = 0,$$
$$h(\theta) = 0,$$
and
(4.1.6) $$h(\hat\theta) = [H'_{\theta^*} + o(1)][\hat\theta - \theta^*].$$
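For almost any $x$, if $n$ is sufficiently large, $\hat\theta$ will, with a certain Lagrangian
multiplier $\hat\lambda_n(x)$, satisfy the restricted likelihood equations. So we have, writing
$\hat\lambda$ in place of $\hat\lambda_n(x)$ for brevity,
(4.1.7) $$n^{-1} D \log L_n(x, \theta^*) + [D^2 z(\theta^*) + o(1)][\hat\theta - \theta^*] + H_{\hat\theta} \hat\lambda = 0,$$
(4.1.8) $$[H'_{\theta^*} + o(1)][\hat\theta - \theta^*] = 0.$$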
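Since $z(\theta^*)$ is a maximum in the set $\omega$ of the function $z$, there exists a Lagrangian
multiplier $\lambda^* = (\lambda_1^*, \lambda_2^*, \ldots, \lambda_r^*)$ such that
(4.1.9) $$Dz(\theta^*) + H_{\theta^*} \lambda^* = 0,$$
and on subtracting (4.1.9) from (4.1.7), and using (4.1.5) we obtain
(4.1.10) $$[n^{-1} D \log L_n(x, \theta^*) - Dz(\theta^*)] + [D^2 z(\theta^*) + o(1)][\hat\theta - \theta^*] + [H_{\theta^*} + o(1)][\hat\lambda - \lambda^*] + [H_{\hat\theta} - H_{\theta^*}]\lambda^* = 0.$$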
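Now on expanding the elements of the matrix $H_{\hat\theta}$ by Taylor's Theorem, we find
that, because of the continuity of the second order partial derivatives of the
functions $h_i$, for almost all $x$,
(4.1.11) $$[H_{\hat\theta} - H_{\theta^*}]\lambda^* = \Big[\sum_{i=1}^{r} \lambda_i^* D^2 h_i(\theta^*) + o(1)\Big][\hat\theta - \theta^*].$$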
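We will denote by $-B^*_{\theta^*}$ the matrix $D^2 z(\theta^*) + \sum_{i=1}^{r} \lambda_i^* D^2 h_i(\theta^*)$. Then on
substituting in (4.1.10) the expression for $[H_{\hat\theta} - H_{\theta^*}]\lambda^*$ contained in (4.1.11)
we have
(4.1.12) $$[B^*_{\theta^*} + o(1)][\hat\theta - \theta^*] - [H_{\theta^*} + o(1)][\hat\lambda - \lambda^*] = n^{-1} D \log L_n(x, \theta^*) - Dz(\theta^*),$$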
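and combining (4.1.12) and (4.1.8) we may write
$$\begin{bmatrix} B^*_{\theta^*} + o(1) & -H_{\theta^*} + o(1) \\ -H'_{\theta^*} + o(1) & 0 \end{bmatrix} \begin{bmatrix} \hat\theta - \theta^* \\ \hat\lambda - \lambda^* \end{bmatrix} = \begin{bmatrix} n^{-1} D \log L_n(x, \theta^*) - Dz(\theta^*) \\ 0 \end{bmatrix}.$$
$$\begin{bmatrix} B^*_{\theta^*} & -H_{\theta^*} \\ -H'_{\theta^*} & 0 \end{bmatrix}$$
is non-singular.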
$\sqrt{n}\,[\hat\theta_n(\cdot, \omega) - \theta^*]$
of assumptions is apparent if we note that in the case where $\theta_0 \in \omega$ we might
replace Assumption 11 by the following
Assumption 11A. The matrix $B_{\theta_0}$ is positive definite and the matrix $H_{\theta_0}$ is of
rank $r$).
(4.3) It is now possible to obtain a picture of the typical practical situation
when $n$ is large and $\theta_0$, while not belonging to the set $\omega$, is very near this set.
Usually then $z(\theta_0)$ will be $\sup_{\theta \in \Omega} z(\theta)$ and $\theta^*$ will be near $\theta_0$ so that $Dz(\theta^*)$ will
be near $Dz(\theta_0) = 0$. Then $\lambda^*$ also will be near 0, though, since $n$ is large, $\sqrt{n}\,\lambda^*$
may be appreciably different from 0. Also the elements of $D^2 z(\theta^*)$ will be near
those of $D^2 z(\theta_0) = -B_{\theta_0}$. If in addition $\beta_{ij}(\theta^*)$ is near the corresponding element
of $B_{\theta_0}$, as will usually be the case, then we can say that approximately
$$\sqrt{n} \begin{bmatrix} \hat\theta_n - \theta^* \\ \hat\lambda_n - \lambda^* \end{bmatrix}$$
will have a multivariate normal distribution with mean 0 and variance matrix
$$\begin{bmatrix} P_{\theta_0} & 0 \\ 0 & -R_{\theta_0} \end{bmatrix},$$
this matrix being as defined in [1]. (It would be possible to give a rigorous
mathematical derivation of this result by imagining the true parameter $\theta_0$ to vary with
$n$ in such a way that the distance of $\theta_0$ from the set $\omega$ tended to 0 as $n \to \infty$,
and by imposing suitable restrictions on the functions $f$ and $h$ to ensure that
what is here said to happen usually would in fact happen. But this does not
seem particularly profitable).
(4.4) Finally in this connection, because of the remarks made in the previous
paragraph and of the flexibility of Newton's method of solving equations,
we might expect that, in the case where $\theta_0$ is near the set $\omega$ and $n$ is large, the
iterative method of solving the restricted likelihood equations suggested in [1]
will still apply.
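As a rough illustration of solving restricted likelihood equations of the form $n^{-1} D \log L_n(x, \theta) + H_\theta \lambda = 0$, $h(\theta) = 0$ by Newton's method, the sketch below uses a hypothetical model that is not in the paper: independent observations from a bivariate normal distribution with mean $\theta$ and identity covariance, restricted by $h(\theta) = \theta_1^2 + \theta_2^2 - 1 = 0$. For this model $n^{-1} D \log L_n = \bar{x} - \theta$ and $H_\theta = (2\theta_1, 2\theta_2)'$. It is plain Newton iteration on the joint system in $(\theta, \lambda)$, which need not coincide with the particular iterative scheme of [1]; the helper names `solve` and `restricted_mle` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def restricted_mle(xbar, steps=50, tol=1e-12):
    """Newton's method on (xbar - theta) + H_theta * lam = 0, h(theta) = 0,
    with h(theta) = theta1^2 + theta2^2 - 1 (an assumed, illustrative restriction)."""
    t1, t2, lam = xbar[0], xbar[1], 0.0   # start at the unrestricted estimate
    for _ in range(steps):
        f = [xbar[0] - t1 + 2 * t1 * lam,
             xbar[1] - t2 + 2 * t2 * lam,
             t1 * t1 + t2 * t2 - 1.0]
        if max(abs(v) for v in f) < tol:
            break
        # Jacobian of the three equations with respect to (theta1, theta2, lam).
        jac = [[2 * lam - 1.0, 0.0, 2 * t1],
               [0.0, 2 * lam - 1.0, 2 * t2],
               [2 * t1, 2 * t2, 0.0]]
        d1, d2, dl = solve(jac, [-v for v in f])
        t1, t2, lam = t1 + d1, t2 + d2, lam + dl
    return (t1, t2), lam

theta_hat, lam_hat = restricted_mle((1.2, 0.9))
```

For this toy model the solution is available in closed form (the restricted estimate is $\bar{x}/\|\bar{x}\|$ and the multiplier is $(1 - \|\bar{x}\|)/2$), which makes it a convenient check on the iteration.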
5. Three tests of the hypothesis that $\theta_0 \in \omega$. We will now compare three
intuitively reasonable tests of the hypothesis that $\theta_0 \in \omega$. These are as follows.
(i) The likelihood ratio test. We accept the hypothesis if $\mu(x) = \sup_{\theta \in \omega} L_n(x, \theta) / \sup_{\theta \in \Omega} L_n(x, \theta)$ is "sufficiently
near" 1.
(ii) The Wald test. Assuming the existence of $\hat\theta_n(x, \Omega)$, we accept the hypothesis
if $h(\hat\theta_n(x, \Omega))$ is "sufficiently
near" 0. (Wald [8]).
(iii) The Lagrangian multiplier test. Assuming the existence of $\hat\theta_n(x, \omega)$ and
$\hat\lambda_n(x)$ we accept the hypothesis if $\hat\lambda_n(x)$ is "sufficiently
near" 0. (Aitchison and
Silvey [1]).
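For typographical brevity we will now write $\tilde\theta$ for the unrestricted maximum
likelihood estimator $\hat\theta_n(\cdot, \Omega)$, $\hat\theta$ for the restricted maximum likelihood estimator
$\hat\theta_n(\cdot, \omega)$ and $\hat\lambda$ for the random variable $\hat\lambda_n(\cdot)$.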
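The measure of the distance from 0 of $h(\tilde\theta)$ used by Wald is, in our notation,
$-n[h(\tilde\theta)]' R_{\tilde\theta} [h(\tilde\theta)]$.
(5.3) $$\begin{bmatrix} B_{\theta_0} & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} P_{\theta_0} & Q_{\theta_0} \\ Q'_{\theta_0} & R_{\theta_0} \end{bmatrix}.$$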
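Equation (5.3) introduces $P$, $Q$ and $R$ as the blocks of the inverse of a bordered matrix built from $B$ and $H$. The sketch below inverts such a matrix exactly for a small hypothetical case ($s = 2$, $r = 1$, with $B = \mathrm{diag}(2, 1)$ and $H = (1, 1)'$; these numbers are assumptions, not taken from the paper). It can be used to check the standard partitioned-inverse identities $R = -(H'B^{-1}H)^{-1}$ and $PH = 0$, which hold whenever $B$ is non-singular and $H$ has full column rank.

```python
from fractions import Fraction

def inverse(a):
    """Invert a square matrix (list of rows) exactly by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

B = [[2, 0], [0, 1]]   # hypothetical information matrix (s = 2)
H = [[1], [1]]         # hypothetical s x r matrix of constraint derivatives (r = 1)

# Bordered matrix [[B, -H], [-H', 0]] as on the left-hand side of (5.3).
bordered = [B[0] + [-H[0][0]],
            B[1] + [-H[1][0]],
            [-H[0][0], -H[1][0], 0]]
inv = inverse(bordered)
P = [row[:2] for row in inv[:2]]
Q = [row[2:] for row in inv[:2]]
R = [row[2:] for row in inv[2:]]
```

Here $H'B^{-1}H = 3/2$, so $R$ is the negative scalar $-2/3$; this sign is why the variance matrix of $\sqrt{n}\,\hat\lambda$ is written as $-R_{\theta_0}$.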
Also it is easy to show by the same kind of argument as has been applied above
that the assumptions A imply that $\tilde\theta$ exists and almost certainly converges to $\theta_0$
and that
(5.4) for almost any $x$ and sufficiently
large $n$,
$$D \log L_n[x, \tilde\theta(x)] = 0,$$
(5.5) $$\sqrt{n}(\tilde\theta - \theta_0) = n^{-1/2} B_{\theta_0}^{-1} D \log L_n(\cdot, \theta_0) + o_p(1).$$
We will now use these results to prove the following lemmas.
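LEMMA 2. Subject to assumptions A,
$$-2 \log \mu = n(\hat\theta - \tilde\theta)' B_{\theta_0} (\hat\theta - \tilde\theta) + o_p(1).$$
Hence
$$\log \mu = \log L_n(\cdot, \hat\theta) - \log L_n(\cdot, \tilde\theta) = -\tfrac{1}{2} n (\hat\theta - \tilde\theta)'[B_{\theta_0} + o_p(1)](\hat\theta - \tilde\theta) + o_p(1),$$
and the result follows because $\|\hat\theta - \tilde\theta\| = O_p(n^{-1/2})$.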
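LEMMA 3. Subject to assumptions A, $-2 \log \mu = -n \hat\lambda' R_{\theta_0}^{-1} \hat\lambda + o_p(1)$.
PROOF. We have
$$\sqrt{n}(\hat\theta - \tilde\theta) = n^{-1/2}(P_{\theta_0} - B_{\theta_0}^{-1}) D \log L_n(\cdot, \theta_0) + o_p(1).$$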
Now
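since by 6A, $\theta_0 \in \omega$. Hence $\sqrt{n}\, h(\tilde\theta) = n^{-1/2} H'_{\theta_0} B_{\theta_0}^{-1} D \log L_n(\cdot, \theta_0) + o_p(1)$ and
$$n[h(\tilde\theta)]' R_{\theta_0} [h(\tilde\theta)] = n^{-1} [D \log L_n(\cdot, \theta_0)]' B_{\theta_0}^{-1} H_{\theta_0} R_{\theta_0} H'_{\theta_0} B_{\theta_0}^{-1} [D \log L_n(\cdot, \theta_0)] + o_p(1).$$
It is easy to show that $B_{\theta_0}^{-1} H_{\theta_0} R_{\theta_0} H'_{\theta_0} B_{\theta_0}^{-1} = Q_{\theta_0} R_{\theta_0}^{-1} Q'_{\theta_0}$, and it follows that
$n[h(\tilde\theta)]' R_{\theta_0} [h(\tilde\theta)] = n \hat\lambda' R_{\theta_0}^{-1} \hat\lambda + o_p(1)$. The proof is then completed by the
remark that, as in Lemma 3, $R_{\tilde\theta} = R_{\theta_0} + o_p(1)$.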
LEMMA 5. Subject to assumptions A, each of the random variables $-2 \log \mu$,
$-n[h(\tilde\theta)]' R_{\tilde\theta} [h(\tilde\theta)]$ and $-n \hat\lambda' R_{\hat\theta}^{-1} \hat\lambda$ is asymptotically distributed as $\chi^2$ with $r$ degrees
of freedom.
This follows from lemmas 3 and 4 and from the fact that $\sqrt{n}\,\hat\lambda$ is asymptotically
normally distributed with mean 0 and variance matrix $-R_{\theta_0}$.
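The three statistics of Lemma 5 can be compared numerically in a toy setting. The sketch below assumes a model that is not in the paper: $n$ independent observations from a bivariate normal distribution with mean $\theta$ and identity covariance, with the hypothesis $h(\theta) = \theta_1^2 + \theta_2^2 - 1 = 0$, so that $s = 2$, $r = 1$, $B_\theta = I_2$, $H_\theta = 2\theta$ and $R_\theta = -(H'_\theta B_\theta^{-1} H_\theta)^{-1} = -1/(4\|\theta\|^2)$. The function name `three_statistics` and the sample values are illustrative.

```python
import math

def three_statistics(xbar, n):
    """Return (LR, Wald, LM) statistics for H0: theta1^2 + theta2^2 = 1 in the
    illustrative model X ~ N(theta, I_2), where B_theta = I_2 and H_theta = 2 theta."""
    norm = math.hypot(xbar[0], xbar[1])
    # The unrestricted estimate is xbar; the restricted one is its projection on the circle.
    theta_hat = (xbar[0] / norm, xbar[1] / norm)
    lam_hat = (1.0 - norm) / 2.0   # solves (xbar - theta) + 2 * theta * lam = 0

    def R(theta):
        # R_theta = -(H' B^{-1} H)^{-1}, a negative scalar because r = 1.
        return -1.0 / (4.0 * (theta[0] ** 2 + theta[1] ** 2))

    # -2 log mu equals n * |xbar - theta_hat|^2 for this normal model.
    lr = n * ((xbar[0] - theta_hat[0]) ** 2 + (xbar[1] - theta_hat[1]) ** 2)
    h_tilde = norm ** 2 - 1.0      # h evaluated at the unrestricted estimate
    wald = -n * h_tilde ** 2 * R(xbar)
    lm = -n * lam_hat ** 2 / R(theta_hat)
    return lr, wald, lm

lr, wald, lm = three_statistics((0.84, 0.63), n=100)
```

In this particular model the likelihood ratio and Lagrangian multiplier statistics coincide exactly, while the Wald statistic differs slightly because $R$ is evaluated at the unrestricted estimate; the three agree more closely as $\bar{x}$ approaches the restricted set.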
In consequence of lemma 5, when $n$ is large the natural choices of critical
regions of size $\alpha$ for testing the hypothesis that $\theta_0 \in \omega$ on the bases (i), (ii) and (iii)
Also since $Dz(\theta^*) + H_{\theta^*} \lambda^* = 0$ and since usually $Dz(\theta_0) = 0$ and $D^2 z(\theta_0) = -B_{\theta_0}$,
we will have
(5.8) $$B_{\theta_0}(\theta^* - \theta_0) - H_{\theta_0} \lambda^* = 0,$$
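approximately. Since the distribution of $\tilde\theta$ does not depend on whether $\theta_0$ is in $\omega$
or not, it will remain true (see (5.5)) that
(5.9) $$\sqrt{n}(\tilde\theta - \theta_0) \approx n^{-1/2} B_{\theta_0}^{-1} D \log L_n(\cdot, \theta_0).$$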
Also examination of the details of the proof of lemma 2 shows that the result
there obtained, namely
(5.10) $$-2 \log \mu \approx n(\hat\theta - \tilde\theta)' B_{\theta_0} (\hat\theta - \tilde\theta)$$
still holds.
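Now from (5.6) and (5.9) we have
$$-2 \log \mu \approx n(\hat\theta - \tilde\theta)' B_{\theta_0} (\hat\theta - \tilde\theta) \approx n \hat\lambda' H'_{\theta_0} B_{\theta_0}^{-1} H_{\theta_0} \hat\lambda = -n \hat\lambda' R_{\theta_0}^{-1} \hat\lambda \approx -n \hat\lambda' R_{\hat\theta}^{-1} \hat\lambda,$$
and consequently the tests based on the critical regions $C_1$ and $C_3$ will have
approximately the same power in these circumstances. Moreover it is easy to see
that each of the random variables $-2 \log \mu$ and $-n \hat\lambda' R_{\hat\theta}^{-1} \hat\lambda$ will then have
approximately a non-central $\chi^2$-distribution with $r$ degrees of freedom and
parameter $-n \lambda^{*\prime} R_{\theta_0}^{-1} \lambda^*$. (Again this argument could clearly be made rigorous by
imagining $\theta_0$ to vary with $n$ in such a way that $\|\theta^* - \theta_0\| = O(n^{-1/2})$ and by imposing
suitable conditions on the functions $f$ and $h$).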
We now consider the power of the Lagrangian multiplier test when $n$ is large
and $\theta_0$ is not near $\omega$. Then the asymptotic distribution of $\hat\lambda_n$ will usually be as
given in Lemma 1. Now, if $\lambda^*$ is not near 0, then with a high probability $\hat\lambda$
will be far from 0 and since normally the matrix $-R_{\hat\theta}$ will be positive definite,
the power of the test based on $C_3$ will be near 1. However there is a possibility
that $\theta_0$ might be such that the function $z$ has a stationary value at $\theta^*$, in which
case $\lambda^* = 0$. Then $-n \hat\lambda' R_{\hat\theta}^{-1} \hat\lambda$ would not necessarily be large with a high
probability and consequently the power of the test based on $C_3$ would not be near 1
for such a $\theta_0$. But this is a contingency which does not seem likely to arise often
(the author has been unable to find an example of it) and we may conclude that
in most practical situations the Lagrangian multiplier test is equivalent, for
large samples, to the likelihood ratio test.
6. Singular information matrices. As we have said previously the whole
problem of maximum likelihood estimation is closely bound up with the
behavior of the function $z$. In particular, for unrestricted estimation it is important
that $z$ should have a maximum turning value in $\Omega$ at $\theta_0$, for this condition plays
an important part in ensuring consistency of $\hat\theta_n(\cdot, \Omega)$. Now the demands that
$z(\theta_0)$ should be a maximum turning value of $z$ in $\Omega$ and that $B_{\theta_0}$ should be
positive definite are not unrelated. For it is usually true that $z$ has a stationary
value at $\theta_0$, i.e., that $Dz(\theta_0) = 0$ and also that $D^2 z(\theta_0) = -B_{\theta_0}$: these results
depend only on $f$ being such that we can "differentiate under the integral sign."
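So that if $\theta$ is near $\theta_0$ we will usually have
$$z(\theta) \approx z(\theta_0) - \tfrac{1}{2}(\theta - \theta_0)' B_{\theta_0} (\theta - \theta_0).$$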
Hence if $B_{\theta_0}$ is not positive definite it may very well happen that $z(\theta_0)$ is not a
maximum turning value of $z$ in $\Omega$ and much of unrestricted estimation theory
would then break down.
However, even if $B_{\theta_0}$ is not positive definite and $z(\theta_0)$ is not a maximum
turning value of $z$ in $\Omega$, it may still be the case that if $\theta_0$ belongs to the subset $\omega$ of
$\Omega$, $z(\theta_0)$ is a maximum turning value of $z$ in $\omega$ so that restricted estimation
theory may not need drastic revision. And it is of some theoretical interest to
consider just what revision is necessary in this case. Moreover this problem is of
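(6.2) $$\begin{bmatrix} B_{\theta_0} + o(1) & -H_{\theta_0} + o(1) \\ -H'_{\theta_0} + o(1) & 0 \end{bmatrix} \begin{bmatrix} \hat\theta_n(x, \omega) - \theta_0 \\ \hat\lambda_n(x) \end{bmatrix} = \begin{bmatrix} n^{-1} D \log L_n(x, \theta_0) \\ 0 \end{bmatrix}$$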
for almost any $x$. Now however, since we have dropped the requirement that
$B_{\theta_0}$ be positive definite and since subsequent theory concerning the asymptotic
distributions of $\hat\theta_n(\cdot, \omega)$, $\hat\lambda_n$ and associated random variables makes considerable
use of the inverse of $B_{\theta_0}$, this theory no longer applies. To enable us to replace
this theory we will now introduce assumption 11B which is associated with
assumption 6B in the same manner as 11A was shown at the beginning of this
section to be associated with 6A. This assumption will provide a natural
connection between properties of the matrix $B_{\theta_0}$, the subset $\omega$ and the facts that $\theta_0$
is identifiable when it is known to belong to $\omega$ (assumption 6B), but
unidentifiable in $\Omega$.
Assumption 11B. The matrix $H_{\theta_0}$ is of rank $r$. The matrix $B_{\theta_0}$ is of rank $s - t$
where $t \le r$. There exists an $s \times t$ sub-matrix $H_1$ of $H_{\theta_0}$ such that $B_{\theta_0} + H_1 H_1'$
is positive definite. (Without any loss of generality we may assume that $H_1$ is
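$$\begin{bmatrix} B_{\theta_0} + H_1 H_1' & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{bmatrix}$$
is non-singular and we define $P^*_{\theta_0}$, $Q^*_{\theta_0}$ and $R^*_{\theta_0}$ by
$$\begin{bmatrix} P^*_{\theta_0} & Q^*_{\theta_0} \\ Q^{*\prime}_{\theta_0} & R^*_{\theta_0} \end{bmatrix} = \begin{bmatrix} B_{\theta_0} + H_1 H_1' & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{bmatrix}^{-1}.$$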
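(6.5) $$\sqrt{n} \begin{bmatrix} \hat\theta - \theta_0 \\ \hat\lambda \end{bmatrix}$$
is asymptotically normally distributed with mean 0 and variance matrix
$$\begin{bmatrix} P^*_{\theta_0} & 0 \\ 0 & S_{\theta_0} \end{bmatrix},$$
with
$$Q^{*\prime}_{\theta_0} B_{\theta_0} Q^*_{\theta_0} = -R^*_{\theta_0} - \begin{bmatrix} I_t & 0 \\ 0 & 0 \end{bmatrix} = S_{\theta_0},$$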
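$$W'(B_{\theta_0} + H_1 H_1')W = I_s$$
and
$$W' B_{\theta_0} W = \begin{bmatrix} I_{s-t} & 0 \\ 0 & 0 \end{bmatrix}, \qquad W' H_1 H_1' W = I_s - \begin{bmatrix} I_{s-t} & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & I_t \end{bmatrix}.$$
It follows that $m_1, m_2, \ldots, m_{s-t}$ are independent $N(0, 1)$ random variables,
while $m_{s-t+1} = m_{s-t+2} = \cdots = m_s = 0$.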
and so
Hence
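(6.8) $$m'm \approx n[\hat\theta - \theta_0]' B_{\theta_0} [\hat\theta - \theta_0] + n \hat\lambda' H'_{\theta_0} W W' H_{\theta_0} \hat\lambda.$$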
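Now from (6.4) $\sqrt{n}(\hat\theta - \theta_0) \approx P^*_{\theta_0}(W')^{-1} m$ and, as previously, $P^*_{\theta_0}$ is of
rank $s - r$. Hence asymptotically, when $n[\hat\theta - \theta_0]' B_{\theta_0} [\hat\theta - \theta_0]$ is expressed as
a quadratic form in $m_1, m_2, \ldots, m_{s-t}$, its rank is at most $s - r$. We will now
show that when $n \hat\lambda' H'_{\theta_0} W W' H_{\theta_0} \hat\lambda$ is expressed as a quadratic form in $m_1, m_2,
\ldots, m_{s-t}$, its rank is at most $r - t$.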
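From (6.7) we have, again since $H'_{\theta_0}(\hat\theta - \theta_0) \approx 0$,
$$-H'_{\theta_0} W m \approx \sqrt{n}\, H'_{\theta_0} W W' H_{\theta_0} \hat\lambda.$$
Now
$$H'_{\theta_0} W m = \begin{bmatrix} H_1' W m \\ H_2' W m \end{bmatrix}$$
and, since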
$$N = \begin{bmatrix} n_1 I_1 & & & 0 \\ & n_2 I_2 & & \\ & & \ddots & \\ 0 & & & n_k I_k \end{bmatrix}$$
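(7.1) $$\begin{bmatrix} N B_{\theta_0} & -H_{\theta_0} \\ -H'_{\theta_0} & 0 \end{bmatrix} \begin{bmatrix} \hat\theta(x^*) - \theta_0 \\ \hat\lambda(x^*) \end{bmatrix} \approx \begin{bmatrix} D \log L(x^*, \theta_0) \\ 0 \end{bmatrix}.$$
Also it will usually be true that $D \log L(x^*, \theta_0)$ can be regarded as an observation
on a random variable which is approximately normal with mean 0 and
variance matrix $N B_{\theta_0}$.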
Now in the case where $B_{\theta_0}$ is positive definite we may use (7.1) in the same
way as before to show that when $\theta_0 \in \omega$ and $n_1, n_2, \ldots, n_k$ are large,
$$\hat\lambda' H'_{\hat\theta} [N B_{\hat\theta}]^{-1} H_{\hat\theta} \hat\lambda$$
will usually be distributed approximately as $\chi^2$ with $r$ degrees of freedom, and
it is this statistic which we use in the modified form of the Lagrangian
multiplier test. Alternatively when $B_{\theta_0}$ is of rank $s - t$, when each of the functions
$h_1, h_2, \ldots, h_t$ is a function of only the parameters involved in the distribution
of one of the $X$'s and $B_{\theta_0} + H_1 H_1'$ is positive definite, the statistic on which
the test is based is $\hat\lambda' H'_{\hat\theta} [N(B_{\hat\theta} + H_1 H_1')]^{-1} H_{\hat\theta} \hat\lambda$, which will usually be distributed
as $\chi^2$ with $r - t$ degrees of freedom when $n_1, n_2, \ldots, n_k$ are large.
$X_1$    $n_{11}$  $n_{12}$  $n_1$
$X_2$    $n_{21}$  $n_{22}$  $n_2$
Total    $m_1$     $m_2$     $n$
We suppose that the point $\theta_0 = (\theta_1^0, \theta_2^0, \theta_3^0, \theta_4^0)$ is known to belong to the set
$\Omega = \{\theta \in R^4 : \epsilon \le \theta_i \le 1/\epsilon \ (i = 1, 2, 3, 4)\}$ where $\epsilon$ is a small positive number.
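In this case we also have
$$\log L(x^*, \theta) = \text{constant} + n_{11} \log \theta_1 + n_{12} \log \theta_2 - n_1 \log(\theta_1 + \theta_2) + n_{21} \log \theta_3 + n_{22} \log \theta_4 - n_2 \log(\theta_3 + \theta_4).$$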
The matrix
[-
O'- (01 + 02)-l -(1 + 02) 0 0
L 01- 03 J
so that
$$H_\theta = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 0 \end{bmatrix}.$$
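If $H_1$ is the leading $4 \times 2$ sub-matrix of $H_\theta$, then for any $\theta \in \omega$,
$$B_\theta + H_1 H_1' = \begin{bmatrix} \theta_1^{-1} & 0 & 0 & 0 \\ 0 & \theta_2^{-1} & 0 & 0 \\ 0 & 0 & \theta_3^{-1} & 0 \\ 0 & 0 & 0 & \theta_4^{-1} \end{bmatrix},$$
which is positive definite.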
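The role of Assumption 11B in this example can be checked with exact arithmetic. The sketch below rests on several assumptions that are not legible in the text: each row of the table is taken to be a two-cell sample with cell probabilities $\theta_1/(\theta_1 + \theta_2)$, $\theta_2/(\theta_1 + \theta_2)$ (and similarly for $\theta_3$, $\theta_4$), the per-observation information block is derived from that log-likelihood, the point $\theta$ is chosen with $\theta_1 + \theta_2 = \theta_3 + \theta_4 = 1$, and $H_1$ is taken to have columns $(1, 1, 0, 0)'$ and $(0, 0, 1, 1)'$. Under these assumptions $B_\theta$ is singular of rank $2 = s - t$, while $B_\theta + H_1 H_1'$ is diagonal with positive entries.

```python
from fractions import Fraction

def block(a, b):
    """Per-observation information for (a, b) when the two cell probabilities are
    a/(a+b) and b/(a+b); an assumed form derived from the log-likelihood."""
    s = a + b
    return [[1 / (a * s) - 1 / s**2, -1 / s**2],
            [-1 / s**2, 1 / (b * s) - 1 / s**2]]

def rank(a):
    """Rank of a matrix of Fractions by Gaussian elimination."""
    m = [row[:] for row in a]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[rk])]
        rk += 1
    return rk

# A hypothetical point with theta1 + theta2 = theta3 + theta4 = 1 and theta1 = theta3.
theta = [Fraction(1, 4), Fraction(3, 4), Fraction(1, 4), Fraction(3, 4)]
top, bottom = block(theta[0], theta[1]), block(theta[2], theta[3])
zero = [Fraction(0), Fraction(0)]
B = [top[0] + zero, top[1] + zero, zero + bottom[0], zero + bottom[1]]

H1 = [[1, 0], [1, 0], [0, 1], [0, 1]]   # assumed leading 4 x 2 block of H_theta
H1H1t = [[sum(H1[i][k] * H1[j][k] for k in range(2)) for j in range(4)]
         for i in range(4)]
A = [[B[i][j] + H1H1t[i][j] for j in range(4)] for i in range(4)]
```

Here `A` plays the part of $B_\theta + H_1 H_1'$; because it is diagonal with entries $1/\theta_i > 0$, its positive definiteness can be read off directly.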