AStat
AStat
Asymptotic Statistics
With a View to Stochastic Processes
De Gruyter
Mathematics Subject Classification 2010: 62F12, 62M05, 60J60, 60F17, 60J25
Author
Prof. Dr. Reinhard Höpfner
Johannes Gutenberg University Mainz
Faculty 8: Physics, Mathematics and Computer Science
Institute of Mathematics
Staudingerweg 9
55099 Mainz
Germany
[email protected]
ISBN 978-3-11-025024-4
e-ISBN 978-3-11-025028-2
Printed in Germany
www.degruyter.com
To the students who like both statistics and stochastic processes
Preface
At Freiburg in 1991 I gave my very first lectures on local asymptotic normality; the
first time I explained likelihood ratio processes or minimum distance estimators to
students in a course was 1995 at Bonn, during a visiting professorship. Then, during
a short time at Paderborn and now almost 15 years at Mainz, I lectured from time
to time on related topics, or on parts of these. In cooperation with colleagues in the
field of statistics of stochastic processes (where most of the time I was learning) and
in discussions with good students (who exposed me to very precise questions on the
underlying mathematical notions), the scope of subjects entering the topic increases
steadily; initially, my personal preference was on null recurrent Markov processes and
on local asymptotic mixed normality. Later, lecturing on such topics, a first tentative
version of a script came into life at some point. It went on augmenting and completing
itself in successive steps of approximation. It is now my hope that the combination
of topics may serve students and interested readers to get acquainted with the purely
statistical theory on one hand, developed carefully in a ‘probabilistic’ style, and on the
other hand with some (there are others) typical applications to statistics of stochastic
processes.
The present book can be read in different ways, according to possibly different math-
ematical preferences of a reader. In the author’s view, the core of the book are the
Chapters 5 (Gaussian shift models), 6 (mixed normal and quadratic models), 7 (local
asymptotics where the limit model is a Gaussian shift or a mixed normal or a quadratic
experiment, often abbreviated as LAN, LAMN or LAQ), and finally 8 (examples of
statistical models in a context of diffusion processes where local asymptotics of type
LAN, LAMN or LAQ appear).
A reader who wants to concentrate on the statistical theory alone should skip chap-
ters or subsections marked by an asterisk : he or she would read only the Sections 5.1
and 6.1, and then all subsections of Chapter 7. This route includes a number of exam-
ples formulated in the classical i.i.d. framework, and allows to follow the statistical
theory without gaps.
In contrast, chapters or subsections marked by an asterisk are designed for readers
with an interest in both statistics and stochastic processes. This reader is assumed to be
acquainted with basic knowledge on continuous-time martingales, semi-martingales,
Ito formula and Girsanov theorem, and may go through the entire Chapters 5 to 8
consecutively. In view of the stochastic process examples in Chapter 8, he or she
may consult from time to time the Appendix Section 9 for further background and
for references (on subjects such as Harris recurrence, positive or null, convergence of
martingales, and convergence of additive functionals of a Harris process).
viii Preface
In both cases, a reader may previously have consulted or have read the Sections 1.1
and 1.2 as well as Chapters 3 and 4 for statistical notions such as score and informa-
tion in classical definition, contiguity or L2 -differentiability, to be prepared for the
core of the book. Given Sections 1.1 and 1.2, Chapters 3 and 4 can be read indepen-
dently of each other. Only few basic notions of classical mathematical statistics (such
as sufficiency, Rao–Blackwell theorem, exponential families, ...) are assumed to be
known.
Sections 1.3 and 1.4 are of complementary character and may be skipped; they dis-
cuss naive belief in maximum likelihood and provide some background in order to
appreciate the theorems of Chapter 7.
Chapter 2 stands isolated and can be read separately from all other chapters. In i.i.d.
framework we study in detail one particular class of estimators for the unknown pa-
rameter which ‘works reasonably well’ in a large variety of statistical problems under
weak assumptions. From a theoretical point of view, this allows to explicitly construct
estimator sequences which converge at a certain rate. From a practical point of view,
we find it interesting to start – prior to all optimality considerations in later chapters –
with estimators that tolerate small deviations from theoretical model assumptions.
Fruitful exchanges and cooperations over a long period of time have contributed to
the scope of topics treated in this book, and I would like to thank my colleagues, coau-
thors and friends for those many long and stimulating discussions around successive
projects related to our joint papers. Their influence, well visible in the relevant parts
of the book, is acknowledged with deep gratitude. In a similar way, I would like to
thank my coauthors and partners up to now in other (formal or quite informal) coop-
erations. There are some teachers and colleagues in probability and statistics to whom
I owe much, either for encouragement and help at decisive moments of my mathe-
matical life, or for mathematical discussions on specific topics, and I would like to
take this opportunity to express my gratitude. Furthermore, I have to thank those who
from the beginning allowed me to learn and to start to do mathematics, and – beyond
mathematics, sharing everyday life – my family.
Concerning a more recent time period, I would like to thank my colleague Eva
Löcherbach, my PhD student Michael Diether as well as Tobias Berg and Simon Hol-
bach: they agreed to read longer or shorter parts of this text in close-to-final versions
and made critical and helpful comments; remaining errors are my own.
Preface vii
1 Score and Information 1
1.1 Score, Information, Information Bounds . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Estimator Sequences, Asymptotics of Information Bounds . . . . . . . . . 15
1.3 Heuristics on Maximum Likelihood Estimator Sequences . . . . . . . . . . 23
1.4 Consistency of ML Estimators via Hellinger Distances . . . . . . . . . . . . 30
2 Minimum Distance Estimators 42
2.1 Stochastic Processes with Paths in Lp .T , T , / . . . . . . . . . . . . . . . . . . 43
2.2 Minimum Distance Estimator Sequences . . . . . . . . . . . . . . . . . . . . . . . 55
2.3 Some Comments on Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . 68
2.4 Asymptotic Normality for Minimum Distance Estimator Sequences . . 75
3 Contiguity 85
3.1 Le Cam’s First and Third Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.2 Proofs for Section 3.1 and some Variants . . . . . . . . . . . . . . . . . . . . . . . 92
4 L2 -differentiable Statistical Models 108
4.1 Lr -differentiable Statistical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2 Le Cam’s Second Lemma for i.i.d. Observations . . . . . . . . . . . . . . . . . 119
5 Gaussian Shift Models 127
5.1 Gaussian Shift Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.2 BrownianMotion with Unknown Drift as a Gaussian Shift
Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6 Quadratic Experiments and Mixed Normal Experiments 148
6.1 Quadratic and Mixed Normal Experiments . . . . . . . . . . . . . . . . . . . . . 148
6.2 Likelihood Ratio Processes in Diffusion Models . . . . . . . . . . . . . . . . 160
6.3 Time Changes for Brownian Motion with Unknown Drift . . . . . . . . . 168
x Contents
The chapter starts with classically defined notions of ‘score’, ‘information’ and ‘in-
formation bounds’ in smoothly parameterised statistical models. These are studied
in Sections 1.1 and 1.2; the main results of this part are the asymptotic van Trees
bounds for i.i.d. models in Theorem 1.11. Here we encounter in a restricted setting
the type of information bounds which will play a key role in later chapters, for more
general sequences of statistical models, under weaker assumptions on smoothness of
parameterisation, and for a broader family of risks. Section 1.3 then discusses ‘classical
heuristics’ which link limit distributions of maximum likelihood estimator sequences
to the notion of ‘information’, together with examples of statistical models where such
heuristics either work or do not work at all; we include this – completely informal –
discussion since later sections will show how similar aims can be attained in a rigorous
way based on essentially different mathematical techniques. Finally, a different route
to consistency of ML estimator sequences in i.i.d. models is presented in Section 1.4,
with the main result in Theorem 1.24, based on conditions on Hellinger distances in
the statistical model.
., A, P /
P D ¹P# : # 2 ‚º , ‚ Rd ,
1.2 Definition (Score and information, classical definition). Consider a parametric ex-
periment
E :D ., A, ¹P# : # 2 ‚º/ , ‚ Rd open
with dominating measure and with densities
d P#
.!/ :D f .#, !/ D f# .!/ , # 2‚, !2.
d
Assume that for every ! 2 fixed, f ., !/ is continuous, and let partial derivatives
exist on ‚: as pointwise limits of measurable functions f .#Cheih,/f .#,/ , h ! 0,
these are measurable functions.
(a) Let r denote the vector of partial derivatives with respect to # and define
M# :D .r log f /.#, !/
0 1
@
@#1
log f
B C
:D 1¹f# >0º .!/ @ A .#, !/ , # 2‚, !2.
@
@#d
log f
This yields a well-defined random variable on ., A/ taking values in .Rd , B.Rd //.
Assume further
If all these conditions are satisfied, M# is termed score in #, and the covariance matrix
under P#
I# :D C ov# .M# / D E# M# M#>
Fisher information in #. We then call E an experiment admitting score and Fisher
information.
(b) More generally, we may allow for modification of M# defined in (a) on sets of
f# : # 2 ‚º from
P# -measure zero, and call any family of measurable mappings ¹M
., A/ to .Rd , B.Rd // with the property
f# D M#
for every # in ‚ : M P# -almost surely
Even if many classically studied parametric statistical models do admit score and
Fisher information, it is indeed an assumption that densities # ! f .#, !/ should
satisfy the smoothness conditions in Definition 1.2. Densities # ! f .#, !/ can be
continuous and non-differentiable, their smoothness being e.g. the smoothness of the
Brownian path; densities # ! f .#, !/ can be discontinuous, their jumps correspond-
ing e.g. to the jumps of a Poisson process. For examples, see (1) and (3) in [17] (which
goes back to [60]). We will start from the classical setting.
1.2’ Remark. For a statistical model, score and information – if they exist – depend
essentially on the choice of the parameterisation ¹P# : # 2 ‚º D P for the model, but
not on the choice of a dominating measure: in a dominated experiment, for different
measures 1 and 2 dominating P , we have with respect to 1 C 2
d P# d 1 d P# d P# d 2
D D
d 1 d.1 C 2 / d.1 C 2 / d 2 d.1 C 2 /
where the second factor on the r.h.s. and the second factor on the l.h.s. do not involve
#, hence do not contribute to the score .r log f /.#, /.
and write F for the class of all probability measures on .R, B.R//. We shall show
that for suitable " > 0 and ‚ :D .", "/, there are parametric models
¹F# : # 2 ‚º F with F0 D F
of mutually equivalent probability measures on .R, B.R// where score M# and Fisher
information I# exist at every point # 2 ‚, and where we have at # D 0
Z
.˘˘/ M0 .!/ D h.!/ , I0 D h2 dF .
hence all probability laws in ./ are equivalent. For all j#j < M 1 , score M# and
Fisher information I# according to Definition 1.2 exist and have the form
h
M# .!/ D .!/ ,
1 C #h
0/
.1.3 Z 2 Z
h h2
I# D dF# D dF < 1
1 C #h 1 C #h
which at # D 0 gives .˘˘/ as indicated above.
(b) Now we consider general functions h in .˘/. Select some truncation function
2 C01 .R/, the class of continuously differentiable functions R ! R with compact
support, such that the properties
² ³
1 1
.x/ D x on jxj < , .x/ D 0 on ¹jxj > 1º , max j j <
3 x2R 2
are satisfied. Put
E D . R, B.R/, ¹F# : j#j < 1º / ,
./ R
F# .d!/ :D 1 C Œ .#h.!// .#h/dF F .d!/
and note that in the special case of bounded h as considered in (a) above, paths ./
and ./ coincide when # ranges over some small neighbourhood of 0. By choice of
, the densities
Z
f .#, !/ D 1 C Œ .#h.!// .#h/dF
are strictly positive, hence all probability measures in ./ are equivalent on
.R, B.R//. Since ./ is in particular Lipschitz, dominated convergence shows
Z Z
d
.#h/ dF D h 0 .#h/ dF .
d#
This gives scores M# at j#j < 1
R
d h.!/ 0 .#h.!// h 0 .#h/ dF
M# .!/ D log f .#, !/ D R
d# 1 C Œ .#h.!// .#h/dF
1.4 Example. (location models) Fix a probability measure F on .R, B.R// having
density f with respect to Lebesgue measure such that
1.4’ Exercise. In Example 1.4, we may consider .a, b/ :D .0, 1/ and f .x/ :D
c.˛/1.0,1/ .x/Œx.1 x/˛ for parameter value ˛ > 1 (we write c.˛/ for the norming constant
of the Beta B.˛ C 1, ˛ C 1/ density).
1.4” Exercise. (Location-scale models) For laws F on .R, B.R// with density f differ-
entiable on R and supported by some open interval .a, b/ as in Example 1.4, consider the
location-scale model generated by F
1 #1
E :D R, B.R/, ¹F.#1 ,#2 / : #1 2 R , #2 > 0º , dF.#1 ,#2 / :D f d .
#2 #2
Section 1.1 Score, Information, Information Bounds 7
Write down the Fisher information I# at #. Check that I# is invertible for all # 2 ‚, and
that I# depends on the parameter only through the scaling component #2 (cf. [127, pp. 181–
182]).
E :D . , A, ¹P# : # 2 ‚º / , ‚ Rd open
X
n
n
Mn,# .!1 , : : : , !n / D M# .!i / Pn,# -almost surely on X
iD1
iD1
In,# D n I# , # 2‚.
In,# D n I# , # 2‚.
Proof. Select a dominating measure for the experiment E, and versions f# of the
densities dP#
d
which satisfy the assumptions of Definition 1.2.
n
(1) We prove (a). The product experiment En is dominated by n :D ˝ , with
iD1
densities
dPn,# Y
n
dP# Y
n
fn,# .!1 , : : : , !n / D .!1 , : : : , !n / D .!i / D f# .!i / .
dn d
iD1 iD1
Then Pn,# is supported by the rectangle
n
An,# :D ¹.!1 , : : : , !n / : fn,# .!1 , : : : , !n / > 0º D X ¹!i : f# .!i / > 0º
iD1
n n n
in ˝ A. On the product space . X , ˝ A/ we have with the conventions of 1.2 (a)
iD1 iD1 iD1
X
n
Mn,# ..!1 , : : : , !n // D 1An,# ..!1 , : : : , !n // M# .!i / .
iD1
Since Pn,# .An,# / D 1 , the measurable mappings
X
n
.!1 , : : : , !n / ! Mn,# ..!1 , : : : , !n // , .!1 , : : : , !n / ! M# .!i /
iD1
coincide Pn,# -almost surely on .XniD1 , ˝niD1 A/ , and are identified under Pn,# ac-
cording to Definition write M#,j for the components of M# , 1 j d ,
1.2(b). We
and .I# /j ,l D E# M#,j M#,l .
Section 1.1 Score, Information, Information Bounds 9
The Fisher information yields bounds for the quality of estimators. We present two
types of bounds.
(c) In the special case D id in (a) and (b) above, the bound (˘) at # reads
C ov# .Y / I#1
and Y attains the bound (˘) at # if and only if
Y # D I#1 M# P# -almost surely.
Proof. According to Definition 1.2, we have M# 2 L2 .P# / and E# .M# / D 0 for all
# 2 ‚. Necessarily I# D E# .M# M#> / is symmetric and non-negative definite for
all # 2 ‚. We consider a point # 2 ‚ such that .C/ holds, together with a mapping
: ‚ ! Rk and a random variable Y 2 L2 .P# / satisfying E# .Y / D .#/ and
.CC/.
(1) We start by defining V# by the right-hand side of (++):
V# :D E# Y M#> .
1.7 Remarks. (a) A purely heuristic argument for (++): one should be allowed to
differentiate Z
.#/ D E# .Y / D .d!/ f .#, !/ Y .!/
Section 1.1 Score, Information, Information Bounds 11
1.7’ Remark. Within the class of unbiased and square integrable estimators
Y for
>
, the covariance matrix C ov# .Y / D E# .Y .#//.Y .#// / , quantifying
spread/concentration of estimation errors Y .#/ at #, allows to compare different
estimators at the point #. The lower bound in .˘/ under the assumptions of Proposi-
tion 1.6 involves the inverse of the Fisher information I#1 which thus may indicate
an ‘optimal concentration’; when D id the lower bound in .˘/ equals I#1 .
However, there are two serious drawbacks:
(i) these bounds are attainable bounds only in a few classical parametric models,
see [27, p. 198], or [127, pp. 312–317], and [86, Theorem 7.15 on p. 300];
(ii) unbiasedness is not ‘per se’ relevant for good estimation: a famous example due to
Stein (see [60, p. 26] or [59, p. 93]) shows that even in a normal distribution model
n n ° n ±
k k k
X R , ˝ B.R , ˝ N .#, Ik / : # 2 ‚ , ‚ :D R where k 3
iD1 iD1 iD1
estimators admitting bias can be constructed which (with respect to squared loss)
concentrate better around the true # than the empirical mean, the best unbiased
square integrable estimator in this model.
Fix any subinterval .a, b/ of ‚ and any a priori law with Lebesgue density g :D
d
d
such that
Proof. (0) Note that J is the Fisher information in the location model generated by
on .R, B.R// which satisfies all assumptions of Example 1.4.
(1) We show that the assumptions on the densities made in Definition 1.2 imply
measurability of
are measurable in the pair .#, !/, hence the same holds for their pointwise limit ./
as k ! 1.
(2) The product measurability established in ./ allows to view
Z
.#, A/ ! P# .A/ D 1A .!/ f .#, !/ .d!/ , # 2 ‚ , A 2 A
(3) We consider estimators T : ., A/ ! .R, B.R// for which have the property
Z
E# .T .#//2 .d #/ < 1
.a,b/
(otherwise the bound in Proposition 1.8 would be trivial). To prove Proposition 1.8,
it is sufficient to consider the restriction of the parameter space ‚ to its subset .a, b/:
d
thus we identify ‚ with .a, b/ – then, by assumption, g D d
will be strictly positive
on ‚ with limits g.a/ D g.b/ D 0, and as well as f ., !/ for all ! will have finite
limits at the endpoints of ‚ – and work on the product space
, A :D .‚ , B.‚/˝A/
equipped with the probability measure
P .d #, d!/ :D .d #/ P# .d!/ D .˝/.d #, d!/ g.#/f .#, !/ ,
# 2‚, !2.
(4) In the following steps, we write 0 for the derivative with respect to the parameter
(from the set of assumptions in Definition 1.2, recall differentiability of # ! f .#, !/
for fixed ! when d D 1). Then
Z
.C/ d # .f .#, !/g.#//0 D f .b, !/g.b/ f .a, !/g.a/ D 0
‚
D .d #/ 0 .#/ .
‚
By strict positivity of the densities and strict positivity of g on ‚, the l.h.s. of the first
equality sign is
Z
.˝/.d #, d!/ .f .#, !/g.#//0 .T .!/ .#//
‚
Z Z
.f .#, !/g.#//0
D .d #/P# .d!/ .T .!/ .#//
‚ f .#, !/g.#/
Z 0
g f0
D P .d #, d!/ .#/ C .#, !/ .T .!/ .#// .
‚ g f
14 Chapter 1 Score and Information
are in L2 .P /: the second is the estimation error of an estimator T for which at the
start of step (3) was assumed to be in L2 .P /, the first is the sum of the score in the
location experiment generated by and the score ¹M# : # 2 ‚º in the experiment E,
both necessarily orthogonal in L2 .P /:
Z
g0 f0
P .d #, d!/ .#/ .#, !/ D
‚ g f
.C C C/ Z Z
g0 f0
.d #/ .#/ P# .d!/ .#, !/ D 0 .
‚ g f
Putting the last three blocks of arguments together, the Cauchy–Schwarz inequality
with (+++) gives
Z 2
0
.d #/ .#/
‚
Z 0 2
g f0
D P .d #, d!/ .#/ C .#, !/ .T .!/ .#//
‚ g f
Z 0 2 Z
g f0
P .d #, d!/ .#/ C .#, !/ P .d #, d!/ .T .!/ .#//2
‚ g f ‚
Z Z
D JC .d #/ I# .d #/ E# .T .#//2
‚ ‚
1.8’ Exercise. For ‚ Rd open, assuming densities which are continuous in the parameter,
cover Rd with half-open cubes of side length 2k to prove
In the case where d D 1, the assertion holds under right-continuity (or left-continuity) of the
densities in the parameter.
1.8” Exercise. For ‚ Rd open, for product measurable densities .#, !/ ! f .#, !/ with
dominating measure on ., A/, consider as in Proposition 1.8 the product
respect to some
space , A D .‚ , B.‚/˝A/ equipped with
where is some probability law on .‚, B.‚// with Lebesgue density g, and view .#, !/ !
# and .#, !/ ! ! as random variables on ., A/. We wish to estimate some measurable
Section 1.2 Estimator Sequences, Asymptotics of Information Bounds 15
mapping : .‚, B.‚// ! .Rk , B.Rk //: fixing some loss function ` : Rk ! Œ0, 1/, we call
any estimator T : ., A/ ! .Rk , B.Rk // with the property
Z Z
inf .d #/ E# . `.T .#// / D .d #/ E# `.T .#//
T A-mb ‚ ‚
`-Bayesian with respect to the a priori law . Here ‘inf’ is over the class of all measurable
mappings T : ., A/ ! .Rk , B.Rk //, i.e. over the class of all possible estimators for . So
far, we leave open questions of existence (see Section 37 in Strasser [121]).
In the case where `.y/ D jyj2 and 2 L2 ./, prove that a squared loss Bayesian exists
and is given by
´ R R
R f .,!/ g./
T .!/ D ‚ . / ‚ f . ,!/ g. / d d if
R‚
f . , !/ g. / d > 0
#0 if ‚ f . , !/ g. / d D 0
parameterised by the same parameter set ‚ Rd which does not depend on n, and a
mapping : ‚ ! Rk . An estimator sequence for is a sequence .Yn /n of measur-
able mappings
Yn : .n , An / ! .Rk , B.Rk // , n 1 .
(a) An estimator sequence .Yn /n is called consistent for if
for every # 2 ‚, every " > 0: lim Pn,# .jYn .#/j > "/ D 0
n!1
(convergence in .Pn,# /n -probability of the sequence .Yn /n to .#/, for every # 2 ‚).
(b) Associate sequences .'n .#//n to parameter values # 2 ‚, either taking values
in .0, 1/ and such that 'n .#/ increases to 1 as n ! 1, or taking values in the space
of invertible d d -matrices such that minimal eigenvalues n .#/ of 'n .#/ increase
to 1 as n ! 1. Then an estimator sequence .Yn /n for is called .'n /n -consistent if
(c) .'n /n -consistent estimator sequences .Yn /n for are called asymptotically nor-
mal if
for every # 2 ‚ , L . 'n .#/.Yn .#// j Pn,# / ! N .0, †.#// as n ! 1
(weak convergence in Rk ), for suitable normal distributions N .0, †.#// , # 2 ‚.
as in Lemma 1.5(a), with score Mn,# in # and information In,# D n I# . In this setting
we present some asymptotic lower bounds for the risk of estimators, in terms of the
Fisher information.
1.10 Remark (Asymptotic Cramér–Rao Bound). Consider .En /n as in .˘/ and assume
that I# is invertible for all # 2 ‚. Let .Tn /n denote somepsequence of unbiased and
square integrable estimators for the unknown parameter, n–consistent and asymp-
totically normal:
p
.ı/ for every # 2 ‚ : L n.Tn #/ j Pn,# ! N .0, †.#// , n ! 1
makes an ‘optimal’ limit variance †.#/ D I#1 appear in .ı/, for every # 2 ‚.
Given one estimator
sequence .Tn /n whose rescaled estimation errors at # attain the
limit law N 0, I#1 , one would like to call this sequence ‘optimal’. The problem is
that Cramér–Rao bounds do not allow for comparison within a sufficiently broad class
of competing estimator sequences. Fix # 2 ‚. Except for unbiasedness of estimators
at #, Cramér–Rao needs the assumption (++) in Proposition 1.6 for D id , and needs
./ from Definition 1.2: both last assumptions
E# .Mn,# / D 0 , E# .Tn Mn,# / D I , n1
(with 0 the zero vector in Rd and I the identity matrix in Rd d ) combine in particular
to
./ E# .ŒTn # Mn,# / D I for all n 1 .
Section 1.2 Estimator Sequences, Asymptotics of Information Bounds 17
Thus from the very beginning, condition (++) of Proposition 1.6 for D id estab-
lishes a close connection between the sequence of scores .Mn,# /n on the one hand and
those estimator sequences .Tn /n to which we may apply Cramér–Rao on the other.
Hence the Cramér–Rao setting turns out to be a restricted setting.
Then for independent replication .˘/ of the experiment E, for arbitrary choice of esti-
mators Tn for the unknown parameter # 2 ‚ in the product experiments En , we have
the two bounds .I / and .II/
p
.I / lim lim inf inf sup E# Œ n.Tn #/2 I#1
c#0 n!1 Tn An -mb j##0 j<c 0
p
.II/ lim lim inf inf sup p E# Œ n.Tn #/2 I#1
C "1 n!1 Tn An -mb j##0 j<C = n 0
at every #0 2 ‚.
(finite or infinite) is trivial; to its r.h.s. we apply van Trees inequality in Proposition
1.8 and continue
1 1
.ı/ n R Cr D R Cr
r In,# r .d #/ C Jr r I# r .d #/ C r 2 n J
1
Second, for c # 0 on both sides of the last inequality, continuity .˘˘/ of the Fisher
information gives
p
lim inf lim inf inf sup E# Œ n.Tn #/2 I01 .
c#0 n!1 Tn An -mb j#0j<c
On the l.h.s. of the preceding inequality, the term
p
lim inf inf sup Rn .Tn , #/ , Rn .Tn , #/ :D E# Œ n.Tn #/2
n!1 Tn An -mb j#0j<c
is monotone in c since for c1 < c2
Hence ’ lim inf ’ above is in fact ‘ lim ’ which proves the bound .I /.
c#0 c#0
p
(3) Consider C < 1 large. With r D C = n in the above chain of inequalities .ı/,
we exploit again continuity .˘˘/ of the Fisher information and get
p 1
lim inf inf sup p E# Œ n.Tn #/2 .
n!1 Tn An -mb j#0j<C = n I0 C C12 J
which is the bound .II/. This concludes the proof of Theorem 1.11.
Both bounds .I / or .II/ in Theorem 1.11 are asymptotic lower bounds of minimax
type: using ‘best possible’ estimators for the unknown parameter – where at every
stage n of the asymptotics, competition is between all estimators which may exist in
En – we minimise a maximal risk on small balls around points #0 in ‚. For independent
replication of an experiment E which satisfies all assumptions of Proposition 1.8, the
van Trees inequality thus shows that the maximal risk of estimators on small balls
Section 1.2 Estimator Sequences, Asymptotics of Information Bounds 19
around #0 p(with respect to squared loss, and with estimation errors rescaled by norming
constants n as n ! 1) will never be better than I#1 0
, the inverse of the Fisher
information at #0 .
The two types of bounds .I / and .II/ are different in that they imply different no-
tions of ‘small neighbourhoods
p around #0 ’. Type .II/ neighbourhoods are shrinking
balls of radius O.1= n/: at first glance this seems to be less natural than the small
balls not depending on n which are used in typep .I /. Let us compare the two types
of bounds, writing again Rn .Tn , #/ :D E# Œ n.Tn #/2 as in the last proof. For
every estimator sequence .Tn /n and every pair of constants 0 < c < C < 1, we have
inf sup p en , #/
Rn .T
e
T n An -mb j##0 j<C = n
for arbitrary pairs .c, C /, c > 0 small and C < 1 large. Using again the monotonicity
argument of the last proof, we arrive at a comparison
between left-hand sides in the type .I / or type .II/ bounds of Theorem 1.11.
The two types of bounds correspond to different traditions. On the one hand, see e.g.
Ibragimov and Khasminskii [60, Theorem 12.1, p. 162] and Kutoyants [80, p. 57 or
pp. 114–115] who – in more general settings than the present one – work with bounds
of type .I /. To prove that a given sequence .Ten /n attains a bound of type .I /:
one needs some ‘uniformity in the parameter’ for weak convergence of rescaled esti-
mation errors which has to be proved separately. On the other hand, Le Cam [81, 82],
Hajek [40], Davies [19] or Le Cam and Yang [84] work – in more general settings
than the present one – with bounds of type .II/. To prove that an estimator sequence
en /n attains a type .II/ bound
.T
no separate proof for ‘uniformity in the parameter’ for weak convergence of rescaled
estimation errors is needed since ‘Le Cam’s third lemma’ settles this problem, see
Chapter 3. Our focus in later chapters will be on bounds of type .II/, see Chapter 7; we
will also be interested in loss functions different from squared loss, and in estimator
sequences .Ten /n which achieve type .II/ bounds simultaneously with respect to a
broad class of loss functions.
We conclude the present section by one example illustrating type .I / bounds in the
spirit of the references [60] and [80] mentioned above.
with radius ı > 0. Then for every F 2 F and every point x 2 R such that 0 <
F .x/ < 1, we have a lower bound
.i/ lim lim inf inf sup Ee e .x/2 F .x/.1 F .x// ,
n ŒTn F
F
ı#0 n!1 Tn An -mb e F 2Vı .F /
and thus attains the bound specified in (i). The proof is in several steps.
(1) Fix F 2 F and x 2 R. It is easy to prove (ii): from
X
n
e .x/ D 1
b n .x/ F
F e .x/
1.1,x .Yi / F
n
iD1
sup F e .x/.1 F
e .x// ! F .x/.1 F .x// as ı # 0 .
e
F 2Vı .F /
Thus the empirical distribution function attains the bound proposed in (i).
(2) It remains to prove the bound (i). Fix F 2 F and x 2 R such that 0 < F .x/ < 1.
Let H denote the system of functions h in L2 .F / which satisfy
Z 1 Z x
h bounded on R, h dF D 0 , h dF D 1
1 1
(in addition to Example 1.3(a), the third condition yields aRparticular norming of func-
tions e
h which satisfy the first two conditions together with 1 e
x
h dF ¤ 0). For h 2 H
and neighbourhoods Vı .F / of F we introduce one-parametric paths S h through F by
ı
ı h :D , S h :D ¹ F#h : j#j < ı h º , dF#h :D .1C#h/ dF .
sup jhj
(3) Fix h 2 H . Due to the norming factor included in the definition of H we have
in S h Z x
F#h .x/ D .1C#h/ dF D F .x/ C # , j#j < ı h ,
1
e ! F
hence any An -measurable estimator Tn for : F 3 F e .x/ 2 Œ0, 1 estimates
in restriction to S h
Note that (+) makes appear a shift by F .x/; recall that F and x are fixed. In restriction
to S h , we may associate to a new mapping h and to Tn a new estimator T n
Trees bound in Theorem 1.11 for estimation of the unknown parameter # in small
type .I / neighbourhoods of the point #0 D 0 in ¹# : j#j < ıh º:
p 1
lim lim inf inf sup EF h Œ n.T n #/2 I0h .
n!1
c#0 T n An -mb j#j<c #
(5) In order to conclude the proof of (i) on the basis of the last inequality, it is
sufficient to prove
° ±
1
.C C C/ sup I0h : h 2 H D F .x/.1 F .x// .
1 1
D I0h 2 . F .x/.1 F .x// / 2
Hence the system H contains exactly one element h determined through .ı/ and .ıı/
´
1.1,x .y/ F .x/ 1
, yx
h.y/ D D F .x/ 1
F .x/.1 F .x// 1F .x/
, y>x
(h satisfies the norming conditions in H , and is bounded since 0 < F .x/ < 1) with
the property
° 1 ± 1 Z 2 1
h h
sup I0 : h 2 H D I0 D h dF D F .x/.1 F .x// .
E :D . , A, ¹P# : # 2 ‚º / , ‚ Rd open
24 Chapter 1 Score and Information
admitting score ¹M# : # 2 ‚º and Fisher information ¹I# : # 2 ‚º, under all
conditions of Definition 1.2. The densities f# D dP d
#
are assumed strictly positive
and the parameterisation # ! f .#, / sufficiently smooth on ‚. In such experiments
E, we expect the following interchange conditions (h1) and (h2) to hold. First, as in
Remark 1.7(a) we should have
Z Z Z
@ @ .ŠŠ/ @ @
E# Y D Y f# d D Y f# d D Y log f# f# d
@#i @#i @#i @#i
in our model for suitable Y : ., A/ ! .R, B.R//, and thus
.ŠŠ/
.h1/ r> .E# Y / D E# .Y M#> /
which corresponds to the special case of dimension k D 1 in condition (++) in Propo-
sition 1.6. Second, accepting (h1) we should also have
Z
@ @ .ŠŠ/ @ @
E# Y D Y log f# f# d
@#j @#i @#j @#i
Z
@ @
D Y log f# f#
@#j @#i
@ @
C log f# log f# f# d
@#i @#j
@ @
D E# Y log f# C M#,i M#,j
@#j @#i
where M#,1 , : : : , M#,d are the components of M# . In the particular case Y 1, the
l.h.s. of this chain of equations equals 0, hence by definition of score and Fisher infor-
mation, assuming tacitely P# -integrability of .rr > log f /.#, /,
.ŠŠ/
.h2/ I# D E# M# M#> D E# .rr > log f /.#, / .
Again for general Y we can write the first line of the above chain of equalities in the
alternative form
Z
@ @ .ŠŠ/ @ @
E# Y D Y f .#, / d .
@#j @#i @#j @#i
Comparing right-hand sides in their original and their alternative form when Y 1,
the condition
.ŠŠ/
.h20 / E .rr > f /.#, / D 0
1.13 Heuristics II (ML method). Consider a model E with score ¹M# : # 2 ‚º and
Fisher information ¹I# : # 2 ‚º as in Definition 1.2, with densities f# D dP
d
#
strictly
Section 1.3 Heuristics on Maximum Likelihood Estimator Sequences 25
‚ 3 # ! f .#, !/ 2 .0, 1/
Here oPn,# .1/ denotes remainder terms which vanish in .Pn,# /n -probability as n tends
to 1, and .1CoPn,# .1//d means a diagonal matrix with diagonal entries 1CoPn,# .1/.
We expect that second derivatives of the log-likelihoods should not vary much in the
parameter (in the classical normal distribution model with unknown mean and known
covariance, log-likelihoods are quadratic in the parameter, thus second derivatives do
not depend on the parameter), i.e.
@ @ @ @
.h4/ log fn .b
# n/ log fn .#/.
@#j @#i @#j @#i
Inserting (h4) into the expansion (h3) we should have a final form
.ŠŠ/ >
.h5/ r > log fn .#/ D # b#n rr > log fn .#/ 1 C oPn,# .1/ d
26 Chapter 1 Score and Information
1 X
n
1
rr > log fn .#, .!1 , ..., !n // D rr > log f .#, !i /
n n
iD1
and the strong law of large numbers combined with (h2) gives almost sure convergence
1
.˘/ rr > log fn .#, / ! E# rr > log f .#, / D I#
n
under # as n ! 1. According to Lemma 1.5, the score in product models
X
n
.r log fn / .#, .!1 , ..., !n // D Mn,# .!1 , ..., !n / D M# .!i /
iD1
of rescaled estimation errors. The limit variance at # in (h7) is the inverse of the Fisher
information in E. According to Cramér–Rao asymptotics (Remark 1.10) or – assum-
ing some uniformity in the parameter – van Trees asymptotics (Theorem 1.11) this
is a lower bound for limit variances of ‘good’ estimators for the unknown parameter:
hence ML estimators attain this bound asymptotically.
Section 1.3 Heuristics on Maximum Likelihood Estimator Sequences 27
We turn to examples. There are two examples where all laws under consideration
are normal distributions: in the first case, all approximations above are justified; in the
second case we run into trouble. A third example illustrates that we can be extremely
far from everything which seemed ‘natural’ in the above heuristics.
eter value. In this sense, the normal distribution model 1.15 serves as a prototype exam-
ple for a broad class of statistical models where one can prove that log-likelihoods are
locally – in small neighbourhoods of their unique argmax – approximatively quadratic
in the parameter. Le Cam [83] amused himself in collecting some seemingly harm-
less examples where maximum likelihood goes wrong. In the next example – a 2k-
dimensional normal distribution model with unknown mean and unknown variance
where the mean value parameter is restricted to a particular k-dimensional hyperplane
in R2k – maximum likelihood estimators are asymptotically normal, well concen-
trated, but not around the true parameter value.
1.16 Example ([83], going back to Neyman and Scott). Take k 2 N arbitrarily large.
On ., A/ :D .R2k , B.R2k // where .X1 , Y1 , X2 , Y2 , ..., Xk , Yk / :D id j denotes the
canonical variable, we consider the experiment ¹P# : # 2 ‚º
# D: . 2 , 1 , 2 , ..., k / , ‚ :D .0, 1/ Rk
defined by
.#/ :D . 1 , 1 , 2 , 2 , ..., k , k / 2 R2k , P# :D N . .#/, 2 I2k / , # 2‚
together with two different estimators for the mapping .#/ D 2 .
(1) The random variables XipY 2
i
, i D 1, : : : , k, are i.i.d. and distributed according
to N .0, 2 /, hence an empirical mean
k
1 X Xi Yi 2
T :D p
k 2
iD1
Hence there is a unique maximum likelihood estimator b # for #, defined by its compo-
b
nents 1 k
b
2 , b , ..., b . Since 2 D 1 T , the above properties of T give
2
b2 k k 2 4
L j P# D , 2 , E# b2 D , Var # b2 D .
2 2 2k
This means that the maximum likelihood estimator concentrates in its first component
around one half of the true value, with obviously dramatic effects for large values of
the parameter 2 once the model dimension k is large enough.
The last example shows that the Heuristics 1.12–1.14 cannot pretend to serve as
universal guidelines for good estimation. The next example, associated to classically
well-behaving models, makes types of statistical experiments appear which are very
far from all the considerations above.
As in Barra [4, Chap. X], this model is an exponential family, and the function is
C 1 on D. In particular, for standard Brownian motion X we have .#/ D 12 # 2 and
D D R; for Poisson process X with parameter > 0 we have .#/ D .e # 1/
and D D R.
(2) Fix a point #0 in D; w.l.o.g. we write #0 D 1. Define probability laws ¹Peu :
u > 0º on ., A/ by
e u :D e Xu
dP .1/ u
dP0 , u>0.
1.17’ Exercise. Construct an ML estimator sequence in .En /n of Definition 1.17(b) under the
following set of conditions: (i) ‚ is open in Rd ; (ii) for all n 2 N and all ! 2 n , densities
fn .#, !/ are continuous in # 2 ‚; (iii) for any compact exhaustion .Kn /n of ‚, events
An :D ¹ ! 2 n : max¹fn . , !/ : 2 Kn º D sup¹fn . , !/ : 2 ‚º º
32 Chapter 1 Score and Information
are such that lim Pn,# .An / D 1 for every # 2 ‚. Hint: with respect to .Kn /n and .An /n ,
n!1
for every n, associate to ! 2 An a non-void compact set
® ¯
Mn .!/ :D 2 Kn : fn . , !/ D max¹fn . , !/ : 2 Kn º Kn
E :D . , A, ¹P# : # 2 ‚º / , ‚ Rd open
with strictly positive densities f .#, !/ with respect to some dominating measure, and
product models
n n n
En D . n , An , ¹Pn,# : # 2 ‚º / D X , ˝ A, ¹Pn,# :D ˝ P# : # 2 ‚º .
iD1 iD1 iD1
Assume that for every # 2 ‚ and every " > 0 one can find a finite collection of open
subsets V1 , : : : , Vl of ‚ (with l 2 N and V1 , : : : , Vl depending on # and ") such that
./ and ./ hold:
[
l
® ¯ [
l
./ # … Vi and 2 ‚ : j #j > " Vi
iD1 iD1
Section 1.4 Consistency of ML Estimators via Hellinger Distances 33
f . , / f . , /
sup log 2 L .P# /
1
and E# sup log < 0,
./ 2Vj f .#, / 2Vj f .#, /
j D 1, : : : , l .
Then every ML sequence for the unknown parameter is consistent.
Proof. (i) Given any ML sequence .Tn /n for the unknown parameter with ‘good sets’
.An /n in the sense of Definition 1.17 (b), fix 2 ‚ and " > 0 and select l and
V1 , : : : , V` according to ./. Then
\ l ² ³ \
sup log fn . , / < sup log fn . , / An ¹ jTn #j " º
2Vj 2‚:j#j"
j D1
for every n fixed, by Definition 1.17(b) and since all densities are strictly positive.
Inverting this,
l ²
[ ³ [
¹ jTn #j > " º sup log fn . , / sup log fn . , / Acn
2Vj :j#j"
j D1
l ²
[ ³ [
sup log fn . , / log fn .#, / Acn .
2Vj
j D1
(ii) In the sequence of product experiments .En /n , the .An /n being ‘good sets’ for
.Tn /n , we use the strong law of large numbers thanks to assumption ./ to show that
´ n μ!
1X f . , !i /
Pn,# .!1 , : : : , !n / : sup log 0 ! 0
n 2Vj f .#, !i /
iD1
as n ! 1 for every j , 1 j l: this gives lim Pn,# .jTn #j > "/ D 0.
n!1
1.18 Definition. The Hellinger distance H., / between probability measures Q1 and
Q2 on ., A/ is defined as the square root of
Z
1 ˇˇ 1=2 ˇ
1=2 ˇ2
H 2 .Q1 , Q2 / :D ˇ 1
g g 2 ˇ d 2 Œ0, 1
2
where gi D dQ d
i
are densities with respect to a dominating measure , i D 1, 2. The
affinity A., / between Q1 and Q2 is defined by
Z
1=2 1=2
A.Q1 , Q2 / :D 1 H .Q1 , Q2 / D g1 g2 d .
2
34 Chapter 1 Score and Information
The integrals in Definition 1.18 do not depend on the choice of a dominating mea-
sure for Q1 and Q2 (similar to Remark 1.2’). We have H.Q, Q0 / D 0 if and only
if probability measures Q, Q0 coincide, and H.Q, Q0 / D 1 if and only if Q, Q0 are
mutually singular. Below, we focus on i.i.d. models En and follow Ibragimov and
Khasminskii’s route [60, Chap. 1.4] to consistency of maximum likelihood estimators
under conditions on the Hellinger geometry in the single experiment E.
For the remaining part of this section, the following assumptions will be in force.
Note that we do not assume equivalence of probability laws, and do not assume con-
tinuity of densities in the parameter for fixed !.
we have densities f# D dP d
#
with respect to a dominating measure . For ¤ #, the
likelihood ratio of P with respect to P# is
f
L=# D 1¹f# >0º C 1 1¹f# D0º .
f#
(ii) We write K for the class of compact sets in Rd which are contained in ‚. A
compact exhaustion
S of ‚ is a sequence .Km /m in K such that Km int.KmC1 / for
all m, and m Km D ‚. We introduce the notations
Z ˇ ˇ 1=2
ˇ 1=2 1=2 ˇ2
. , ı/ :D sup ˇf 0 f ˇ d , 2‚, ı>0.
0 2Bı ./\‚
together with
c
.CC/ lim a. , Km /<1
m!1
In En , we write fn,# D fn .#, / for the density of Pn,# with respect to ˝niD1 , and
L=#
n for the likelihood ratio of Pn, with respect to Pn,# .
1.20 Lemma. For K 2 K and # 2 int.K/, the condition a.#, K c / < 1 implies
geometric decrease
q
=# n
En,# sup Ln a.#, K c / , n1.
2 ‚\K c
and thus
q Z n
=# 1=2 1=2 n
En,# sup Ln sup f# f d a.#, K c /
2V 2V
for every n 1.
and (using Lemma 1.21 and (+) in Assumptions 1.19) choose ı > 0 small enough to
have
. 0 , ı/ < h.#, , K/ .
Then we have geometric decrease
q
=#
En,# sup Ln Œ 1 h.#, , K/ C . 0 , ı/ n , n1.
2 Bı .0 /\K
Proof. Write for short V :D Bı . 0 / \ K. For .!1 , : : : , !n / 2 ¹fn,# > 0º, observe
that q
=#
Yn
f 1=2 . , !i /
sup Ln .!1 , : : : , !n / D sup
2V 2V
iD1
f 1=2 .#, !i /
is smaller than
Yn ˇ ˇ
1 ˇ 1=2 ˇ
1=2 .#, ! /
f 1=2
. 0 , !i / C sup ˇ f . , !i / f 1=2
. 0 , ! i /ˇ
iD1
f i 2V
which yields
q ²Z ˇ ˇ ³n
ˇ 1=2 1=2 ˇ
En,# sup L=#
n f#
1=2
f
1=2
0
C sup ˇf f0 ˇ
d .
2V 2V
Putting all this together, assumption (+) in Assumptions 1.19 and our choice of ı im-
plies q
=#
En,# sup Ln ¹ 1 h.#, , K/ C . 0 , ı/ ºn
2 Bı .0 /\K
1.23 Lemma. For all K 2 K, # 2 int.K/, > 0, we have under Assumptions 1.19
exponential bounds
q
C e n 2 h.#, ,K/ , n 1
=# 1
En,# sup Ln
2 KnB .#/
where by choice of the radii in .ı/, and by the elementary inequality e y 1 y for
0 < y < 1, the right-hand side is smaller than
h 1 in
` 1 h.#, , K/ ` e n 2 h.#, ,K/ .
1
2
This is the assertion, with C :D ` the number of balls Bı.0,i / . 0,i / with radii .ı/ which
were needed to cover the compact K n B .#/ subset of ‚ Rd .
Proof. Let any ML estimator sequence .Tn /n with ‘good sets’ .An /n be given, as
defined in Definition 1.17. Fix # 2 ‚ and > 0; we have to show
lim Pn,# . ¹jTn #j º / D 0 .
n!1
Fix a compact exhaustion .Km /m of ‚. Take m large enough for B .#/ Km and
c / < 1 in virtue of condition (++) in Assumptions 1.19.
large enough to have a.#, Km
Then we decompose
c
U D ‚ n B .#/ D U1 [ U2 , U1 :D Km n B .#/ , U2 :D ‚ \ Km
where Lemma 1.20 applies to U2 and Lemma 1.23 to U1 . Thus
q
=#
Pn,# . ¹jTn #j º \ An / En,# sup c Ln
2 ‚\Km
q
=#
C En,# sup Ln
2 Km nB .#/
n
C C e n 2 h.#,
1
c ,Km /
a.#, Km / , n1.
This right-hand side decreases exponentially fast as n ! 1. By definition of the
sequence of ‘good sets’ in Definition 1.17, we also have
.ı/ lim Pn,# Acn D 0
n!1
and are done.
Apart from the very particular case An D n for all n large enough, the preceding
proof did not establish an exponential decrease of Pn,# .jTn #j / as n ! 1.
This is since Assumptions 1.19 do not provide a control on the speed of convergence
in .ı/. Also, we are not interested in proving results ‘uniformly in compact #-sets’.
Later, in more general sequences of statistical models, even the rate of convergence
will vary from point to point on the parameter set, without any hope for results of
such type. We shall work instead with contiguity and Le Cam’s ‘third lemma’, see
Chapters 3 and 7 below.
1.24” Exercise. Calculate the quantities defined in Assumption 1.19(a) for the single exper-
iment E in the following case: ., A/ D .R, B.R//; the dominating measure D is
Lebesgue measure; ‚ is some open interval in R, bounded or unbounded; P# are uniform
laws on intervals of unit length centred at #
1 1
P# :D R # , # C , # 2‚.
2 2
Section 1.4 Consistency of ML Estimators via Hellinger Distances 39
Prove that the parameterisation is Hölder continuous of order 12 in the sense of Hellinger dis-
tance: p
H.P 0 , P / D j 0 j for 0 sufficiently close to .
Prove that condition (+) in Assumptions 1.19 holds with
p
. , ı/ D 2 ı
whenever ı > 0 is sufficiently small (i.e. 0 < ı ı0 . / for 2 ‚). Show also that it is
impossible to satisfy condition (++) whenever diam.‚/ 1. In the case where diam.‚/ > 1,
prove
1
.1 dist. , @‚// lim a. , Km
c
/ 2 .1 dist. , @‚// when 0 < dist. , @‚/
m!1 2
for any choice of a compact exhaustion of ‚, @‚ denoting the boundary of ‚: in this case
condition (++) is satisfied. Hence, in the case where diam.‚/ > 1, Theorem 1.24 establishes
consistency of any sequence of ML estimators for the unknown parameter under independent
replication of the experiment E.
We conclude this section by underlining p the following. First, in i.i.d. models, rates
of convergence need not be the well-known n ; second, in the case where ML estima-
tion errors at # are consistent at some rate 'n .#/ as defined in Assumption 1.9(c), this
is not more than just tightness of rescaled estimation errors L.'n .#/.b # n #/ j Pn,# /
as n ! 1. Either there may be no weak convergence at all, or we may end up with a
variety of limit laws whenever the model allows to define a variety of particular ML
sequences.
1.25 Example. In the location model generated from the uniform law R. 12 , 12 /
° 1 1 ±
E D R , B.R/ , P# :D R # , # C : # 2‚DR
2 2
(extending exercise 1.2400 ), any choice of an ML estimator sequence for the unknown
parameter is consistent at rate n. There is no unicity concerning limit distributions for
rescaled estimation errors at # to be attained simultaneously for any choice of an ML
sequence.
Proof. We determine the density of P0 with respect to as f0 .x/ :D 1Œ 12 , 12 .x/, using
the closed interval of length 1. This will simplify the representation since we may use
two particular ‘extremal’ definitions – the sequences .Tn.1/ /n and .Tn.3/ /n introduced
below – of an ML sequence.
(1) We start with a preliminary remark. If Yi are i.i.d. random variables distributed
according to R..0, 1//, binomial trials show that the probability of an event
² ³
u1 u2
max Yi < 1 , min Yi >
1in n 1in n
40 Chapter 1 Score and Information
tends to e .u1 Cu2 / as n ! 1 for u1 , u2 > 0 fixed. We thus have weak convergence
in R2 as n ! 1
n .1 max Yi / , n min Yi ! . Z1 , Z2 /
1in 1in
in the product model En . Hence any estimator sequence .Tn /n with ‘good sets’ .An /n
such that
1 1
. / Tn 2 max Xi , min Xi C on An
1in 2 1in 2
will be a maximum likelihood sequence for the unknown parameter. The random in-
terval in . / is of strictly positive length since Pn,# –almost surely
max Xi min Xi < 1 Pn,# -almost surely.
1in 1in
All random intervals above are closed by determination 1Œ 12 C, 12 C of the density f
for 2 ‚.
.i/
(3) By . / in step (2), the following .Tn /n with good sets .An /n are ML sequences
in .En /:
1
Tn.1/ :D min Xi C with An :D n
1in 2
² ³
1 1 1
Tn.2/ :D min Xi C 2 with An :D max Xi min Xi < 1 2
1in 2 n 1in 1in n
1
Tn.3/ :D max Xi with An :D n
1in 2
² ³
.4/ 1 1 1
Tn :D max Xi C 2 with An :D max Xi min Xi < 1 2
1in 2 n 1in 1in n
.5/ 1
Tn :D min Xi C max Xi with An :D n .
2 1in 1in
Without the above convention on closed intervals in the density f0 we would remove
.1/ .3/
the sequences .Tn /n and .Tn /n from this list.
(4) Fix # 2 ‚ and consider at stage n of the asymptotics the model En locally at
#, via a reparameterisation D # C u=n according to rate n suggested by step (1).
.#Cu=n/=#
Reparameterised, the likelihood function u ! Ln coincides Pn,# -almost
surely with
R 3 u ! 1 .u/ 2 ¹0, 1º .
n max .Xi #/ 1
2 ,n min .Xi #/ C 12
1i n 1i n
Section 1.4 Consistency of ML Estimators via Hellinger Distances 41
As n ! 1, under Pn,# , we obtain by virtue of step (1) the ‘limiting likelihood func-
tion’
. / R 3 u ! 1. Z1 , Z2 / .u/ 2 ¹0, 1º
where Z1 , Z2 are independent exponentially distributed.
(5) Step (4) yields the following list of convergences as n ! 1:
8
< Z1 for i D 3, 4
L n Tn.i/ # j Pn,# ! Z2 for i D 1, 2
: 1
2 .Z2 Z 1 / for i D 5.
Clearly, we can also realise convergences
L .n .Tn #/ j Pn,# / ! ˛Z2 .1 ˛/Z1
.2/ .4/
for any 0 < ˛ < 1 if we consider Tn :D ˛Tn C .1 ˛/Tn , or we can realise
L .n .Tn #/ j Pn,# / does not converge weakly in R as n ! 1
if we define Tn as Tn.2/ when n is odd, and by Tn.4/ if n is even. Thus, for a maximum
likelihood estimation in the model E as n ! 1, it makes no sense to put forward
the notion of limit distribution, the relevant property of ML estimator sequences being
n-consistency (and nothing more).
Chapter 2
This chapter is devoted to the study of one class of estimators in parametric fami-
lies which – without aiming at any notion of optimality – do have reasonable asymp-
totic properties under assumptions which are weak and easy to verify. Most examples
will consider i.i.d. models, but the setting is more general: sequences of experiments
where empirical objects ‰ b n , calculated from the data at level n of the asymptotics,
are compared to theoretical counterparts ‰# under #, for all values of the parameter
# 2 ‚, which are deterministic and independent of n. Our treatment of asymptotics
of minimum distance (MD) estimators follows Millar [100], for an outline see Kuto-
yants [78–80]. Below, Sections 2.1 and 2.3 contain the mathematical tools and are
of auxiliary character; the statistical part is concentrated in Sections 2.2 and 2.4. The
main statistical results are Theorem 2.11 (almost sure convergence of MD estima-
tors), Theorem 2.14 (representation of rescaled MD estimator errors) and Theorem
2.22 (asymptotic normality of rescaled MD estimation errors). We conclude with an
example where the parameters of a symmetric stable law are estimated by means of
MD estimators based on the empirical characteristic function of the first n observa-
tions.
2.1 Assumptions and Notations for Section 2.1. (a) Let denote a finite measure on
a measurable space .T , T / with countably generated -field T . For 1 p < 1 fixed,
the space Lp ./ D Lp .T , T , / of (-equivalence classes of) p-integrable functions
f : .T , T / ! .R, B.R// is equipped with its norm
Z p1
p
kf k D kf kL .T ,T ,/ D
p jf .t /j .dt / < 1
T
and its Borel- -field B.L p .//. T being countably generated, the space Lp .T , T , /
is separable ([127, p. 138], [117, pp. 269–270]): there is a countable subset S Lp ./
which is dense in Lp .T , T , /, and B.Lp .// is generated by the countable collection
of open balls
Br .g/ :D ¹ f 2 Lp ./ : kf gkLp ./ < r º , r 2 QC , g 2 S .
(b) With parameter set T as in (a), a real valued stochastic process X D .X t / t2T on
a probability space ., A, P / is a collection of random variables X t , t 2 T , defined on
., A/ and taking values in .R, B.R//. This process is termed measurable if .t , !/ !
X.t , !/ is a measurable mapping from .T , T ˝A/ to .R, B.R//. In a measurable
44 Chapter 2 Minimum Distance Estimators
2.1’ Exercise. Consider i.i.d. random variables Y1 , Y2 , : : : defined on some ., A, P /, and
taking values in .T , T / :D .Rm , B.Rm //.
(a) For any n 1, the empirical distribution function associated to the first n observations
1X 1X
n n
bn .t , !/ :D
F 1.1,t .Yi .!// D 1¹Yi tº .!/ , t 2 Rm , ! 2
n n
i D1 i D1
are measurable from .Rm , B.Rm /˝A/ to .R, B.R//, so the same holds for their pointwise
limit as k ! 1 which is .t , !/ ! F bn .t , !/.
(b) Note that the reasoning in (a) did not need more than continuity from the right – in
every component of the argument t 2 Rm – of the paths b F n ., !/ which for fixed ! 2
are distribution functions on Rm . Deduce the following: every real valued stochastic process
.Y t / t2Rm on ., A/ whose paths are continuous from the right is a measurable process.
p
(c) Use the argument of (b) to show that rescaled differences n.F bn F /
p
.t , !/ ! n b F n .t , !/ F .t / , t 2 Rm , ! 2
From this representation of G, associate to every " > 0 some ` D `."/ such that
[ l
Q.G/ Q Bri .gi / C " .
iD1
Thus we have as n ! 1
[
l [
l
n n
lim inf Q .G/ lim inf Q Bri .gi / D lim inf Pn Xn 2 Bri .gi /
n n n
iD1 iD1
D lim inf Pn there is some 1 i l such that kXn gi k < ri
n
l
D lim inf 1 Pn kXn g1 k, : : : , kXn gl k 2 X Œri , 1/ .
n iD1
Using the condition which ensures weak convergence of kXn g1 k, : : : , kXn gl k
in Rl as n ! 1, and the Portmanteau theorem with closed sets in Rl , we can continue
l
1 P .kX g1 k, : : : , kX gl k/ 2 X Œri , 1/
iD1
D P .there is some 1 i l such that kX gi k < ri /
[l [ l
D P X 2 Bri .gi / D Q Bri .gi /
iD1 iD1
Q.G/ "
by choice of l D `."/. Since " > 0 was arbitrary, we have (+). This finishes the
proof.
2.3’ Exercise. Under the assumptions of Proposition 2.3, use the continuous mapping theo-
rem to show that weak convergence Xn ! X in Lp .T , T , / as n ! 1 implies weak
convergence of all integrals
Z Z
g.s/ Xsn .ds/ ! g.s/ Xs .ds/ (weakly in R, as n ! 1)
T T
The following Theorem 2.4, the main result of this section, gives sufficient condi-
tions for weak convergence in Lp .T , T , / – of type ‘convergence of finite dimen-
sional distributions plus uniform integrability’ – from which weak convergence in
Lp .T , T , / can be checked quite easily.
2.4 Theorem (Cremers and Kadelka [16]). Consider measurable stochastic processes
a sufficient condition is that the following properties (i) and (ii) hold simultaneously:
(i) convergence of finite dimensional distributions up to some exceptional set N 2
T such that .N / D 0: for arbitrary l 1 and any choice of t1 , : : : , tl in T n N , one
has
L
L .X tn1 , : : : , X tnl / j Pn ! L .X t1 , : : : , X tl / j P
(weak convergence in Rl , as n ! 1) ;
(ii) uniform integrability of ¹jX n ., /jp : n 0º for the random variables (including
0
X :D X and P0 :D P )
in the following sense: for every " > 0 there is some K D K."/ < 1 such that
Z
sup 1¹jX n j>Kº jX n jp d.˝Pn / < " .
n0 T n
(b) Whenever condition (a.i) is satisfied, any one of the following two conditions
is sufficient for (a.ii):
´
X n ., / ,Rn 1 , X., / are elementsR of Lp .T n , T ˝An , ˝Pn / , and
.2.40 / lim sup T n jX n jp d.˝Pn / T jX jp d.˝P / ,
n!1
´
function f 2 L .T , T , / such that forn-almost all t 2 T
there 1
.2.400 / isn some
E jX t jp
f .t / for all n 1, and lim E jX t jp D E .jX t jp / .
n!1
The remaining parts of this section contain the proof of Theorem 2.4, to be com-
pleted in Proofs 2.6 and 2.6’, and some auxiliary results; we follow Cremers and
Kadelka [16]. Recall that the set of Assumptions 2.1 is in force. W.l.o.g., we take
the finite measure on .T , T / as a probability measure .T / D 1.
48 Chapter 2 Minimum Distance Estimators
2.5 Lemma. Write H for the class of bounded measurable functions ' : T R!R
such that
for every t 2 T fixed, the mapping '.t , / : x ! '.t , x/ is continuous.
Then condition (a.i) of Theorem 2.4 gives for every ' 2 H convergence in law of the
integrals
Z Z
n
'.s, Xs / .ds/ ! '.s, Xs / .ds/ (weak convergence in R, as n ! 1) .
T T
Proof. For real valued measurable processes .X t0 / t2T defined on some .0 , A0 , P 0 /,
the mapping .t , !/ ! .t , X 0 .t , !// is measurable from T ˝A to T ˝B.R/, hence
composition with ' gives a mapping .t , !/ ! '.t , X 0 .t , !// which is T ˝A–B.R/-
measurable. As a consequence,
Z
! ! '.s, X 0 .s, !// .ds/
T
0 0
R , A / taking
is a well-defined random variable on . values in .R, B.R//. In order to
prove convergence in law of integrals T '.s, Xsn / .ds/ as n ! 1, we shall show
Z Z
n
.˘/ EPn g '.s, Xs / .ds/ ! EP g '.s, Xs / .ds/
T T
Put X 0 :D X and P0 :D P , and write left- and right-hand sides in the form
Z l !
n
EPn '.s, Xs / .ds/
T
Z Z !
Y
l
D EPn '.si , Xsni / .ds1 / : : : .dsl /
T T iD1
Z Z
n n
D EPn .s1 ,:::,sl / .Xs1 , : : : , Xsl / .ds1 / : : : .dsl / .
T T
Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 49
Y
l
.x1 , : : : , xl / D .s1 ,:::,sl / .x1 , : : : , xl / :D '.si , xi / 2 Cb .Rl /
iD1
arises which indeed is bounded and continuous: for ' 2 H , we exploit at this point
of the proof the defining property of class H . Condition (a.i) of Theorem 2.4 guaran-
tees convergence of finite dimensional distributions of X n to those of X up to some
exceptional -null set T 2 T , thus we have
EPn .Xsn1 , : : : , Xsnl / ! EP .Xs1 , : : : , Xsl / , n ! 1
for any choice of .s1 , : : : , sl / such that s1 , : : : , sl 2 T n N . Going back to the above
integrals on the product space XliD1 T , note that all expressions in the last convergence
are bounded by M l , uniformly in .s1 , : : : , sl / and n: hence
Z Z
n n
... EPn .s1 ,:::,sl / .Xs1 , : : : , Xsl / .ds1 / : : : .dsl /
T T
This proves .˘/ in the case where g.x/ D x l , for arbitrary l 2 N. This finishes the
proof.
satisfy the condition .˝Pn /.Gn / < , and thus can be inserted in .˘2/ and .˘4/.
(2) With g, ı, C of step (1), introduce a truncated identity h.x/ D .C /_x^.CC /
and define a function ' D 'g,ı : T R ! R by
Then ' belongs to class H as defined in Lemma 2.5, and is such that
.C/ jXsn .!/ g.s/jp D '.s, Xsn .!// on ¹.s, !/ : jX n .s, !/j C , jg.s/j C º .
which is (using the elementary ja C bjp .jaj C jbj/p 2p .jajp C jbjp /, and
definition of ') smaller than
Z
1¹jX n .s,!/j>C º [ ¹jg.s/j>C º 2p jXsn .!/jp C 2p jg.s/jp C 2p C p .ds/
T
²Z
2p 1¹jX n .s,!/j>C º .jX n .s, !/jp C C p / .ds/
T
Z Z
C 1¹jX n .s,!/j>C º jg.s/jp .ds/ C 1¹jg.s/j>C º jX n .s, !/jp .ds/
ZT T
³
p p
C 1¹jg.s/j>C º .jg.s/j C C / .ds/
T
The last right-hand side is the desired bound for ., n/, for ! 2 fixed.
(3) Integrating this bound obtained in step (2) with respect to Pn , we obtain
ˇ Z ˇ
ˇ n ˇ
ˇ
EPn ˇkX .!/ gk p
'.s, Xs .!// .ds/ˇˇ
n
T
Z Z
pC1 n p p
2 1¹jX n j>C º jX j d.˝Pn / C 2 1¹jX n j>C º jgjp d.˝Pn /
T n T n
Z Z
p n p pC1
C2 1¹jgj>C º jX j d.˝Pn / C 2 1¹jgj>C º jgjp d
T n T
where .˘1/–.˘5/ make every term on the right-hand side smaller than 14 ı 2 , indepen-
dently of n. Thus
ˇ Z ˇ
ˇ n ˇ
sup EPn ˇˇkX .!/ gk
p
'.s, Xs .!// .ds/ˇˇ < ı 2
n
n0 T
as desired.
2.6 Proof of Theorem 2.4(a). (1) We shall prove part (a) of Theorem 2.4 using ‘ac-
companying sequences’. We explain this for some sequence .Yen /n of real valued ran-
e
dom variables whose convergence in law to Y 0 we wish to establish. Write Cu .R/ for
52 Chapter 2 Minimum Distance Estimators
the class of uniformly continuous and bounded functions ℝ → ℝ; for f ∈ C_u(ℝ) put M_f := 2 sup|f|. A sequence (Z̃_n)_n, where for every n ≥ 0, Z̃_n and Ỹ_n live on the same probability space, is called a δ-accompanying sequence for (Ỹ_n)_n if

sup_{n≥0} P_n( |Ỹ_n − Z̃_n| > δ ) < δ.

Selecting for every f ∈ C_u(ℝ) and every ε > 0 some δ = δ(f, ε) > 0 such that |f(x) − f(y)| < ε whenever |x − y| ≤ δ, we obtain for every δ-accompanying sequence

(δ)   sup_{n≥0} | E_{P_n}( f(Ỹ_n) ) − E_{P_n}( f(Z̃_n) ) | ≤ ε + M_f δ.

For a given sequence (Ỹ_n)_{n≥0} we thus have the following: whenever we are able to associate for every δ > 0 some sequence (Z̃_n(δ))_{n≥0} which is δ-accompanying and such that Z̃_n(δ) converges in law to Z̃_0(δ) as n → ∞, we do have convergence in law of (Ỹ_n)_n to Ỹ_0, thanks to (δ).
(2) We start the proof of Theorem 2.4(a). In order to show the convergence asserted in Theorem 2.4(a), or equivalently

( ‖X^n − g_1‖_p, …, ‖X^n − g_l‖_p ) → ( ‖X − g_1‖_p, …, ‖X − g_l‖_p )   (weakly in ℝ^l, n → ∞),

it is sufficient by the Cramér–Wold device to establish

(++)   Y^n := Σ_{i=1}^l α_i ‖X^n − g_i‖_p → Σ_{i=1}^l α_i ‖X − g_i‖_p =: Y^0   (weakly in ℝ, n → ∞),

where w.l.o.g. we can assume first α_i ≠ 0 for 1 ≤ i ≤ l, and second Σ_{i=1}^l |α_i| = 1, by multiplication of (α_1, …, α_l) with some constant.
(3) With (α_1, …, α_l) and g_1, …, g_l of (++), for δ > 0 arbitrary, select functions φ_{g_i, δ/(l|α_i|)} in H such that

(⋆)   sup_{n≥0} P_n( | ‖X^n − g_i‖_p^p − ∫_T φ_{g_i, δ/(l|α_i|)}(s, X^n_s) μ(ds) | > δ/(l|α_i|) ) < δ/(l|α_i|)

with the notations of Lemma 2.5'. Then also

φ_{α,g_1,…,g_l,δ}(t, x) := Σ_{i=1}^l α_i φ_{g_i, δ/(l|α_i|)}(t, x),   t ∈ T, x ∈ ℝ,

belongs to the class H; by Lemma 2.5, condition (a.i) of Theorem 2.4 gives convergence in law (⋆⋆) of the integrals ∫_T φ_{α,g_1,…,g_l,δ}(s, X^n_s) μ(ds) as n → ∞.

(4) Combining (⋆) and (⋆⋆), we have constructed a δ-accompanying sequence for (Y^n)_{n∈ℕ_0} which is weakly convergent, for δ > 0 arbitrary. By step (1), we thus have proved convergence in law of Y^n to Y^0 as n → ∞. This is (++). According to step (2), part (a) of Theorem 2.4 is proved.

Next, the function ψ(s, x) := f_1(x) belongs to the class H as defined in Lemma 2.5. Thanks to Lemma 2.5, condition (a.i) of Theorem 2.4, which is assumed here, yields convergence in law

∫_T f_1(X^n_s) μ(ds) → ∫_T f_1(X_s) μ(ds)   (weakly in ℝ, n → ∞),

and hence

(+)   E_{P_n}[ g( ∫_T f_1(X^n_s) μ(ds) ) ] → E_P[ g( ∫_T f_1(X_s) μ(ds) ) ]

for arbitrary g ∈ C_b(ℝ). Put M := sup|f_1| and consider in particular functions g ∈ C_b(ℝ) which on [−M, +M] coincide with the identity. Since μ(T) = 1, (+) for such g yields

(++)   E_{P_n}[ ∫_T f_1(X^n_s) μ(ds) ] → E_P[ ∫_T f_1(X_s) μ(ds) ],   n → ∞.

This is uniform integrability as stated in condition (a.ii) of Theorem 2.4. The proof of Theorem 2.4(b) is finished.
F̂_n(t, ω) := (1/n) Σ_{i=1}^n 1_{(−∞, t]}(Y_i(ω)) = (1/n) Σ_{i=1}^n 1_{{Y_i ≤ t}}(ω),   t ∈ ℝ^m.

Considering as in Exercise 2.1'(a) the mapping (t, ω) → F̂_n(t, ω), F̂_n(·, ·) is a measurable stochastic process in the sense of Assumption 2.1(b). For any choice of a finite measure μ on (ℝ^m, B(ℝ^m)), we write for short L²(μ) = L²(ℝ^m, B(ℝ^m), μ);
F̂_n(·, ·) is a process with paths in (L²(μ), B(L²(μ))) as defined in Lemma 2.2. Assumption (+) guarantees that, for fixed ω, the mapping τ → ‖F̂_n(·, ω) − F_τ‖_{L²(μ)} is continuous on Θ; for n large enough, using the inverse triangle inequality, Glivenko–Cantelli (⋆) in the last expression shows that an identifiability condition

(⋆⋆)   inf_{τ∈Θ: |τ−ϑ|>ε} ‖F_τ − F_ϑ‖_{L²(μ)} > 0   for every ϑ ∈ Θ and every ε > 0
will guarantee consistency of the estimator sequence (ϑ_n)_n for the unknown parameter as n → ∞. Due to the structure of the last right-hand side in the chain of inclusions above, the convergence ϑ_n → ϑ under P_ϑ will necessarily be almost sure convergence, for every ϑ ∈ Θ. Thanks to the continuity (+), the identifiability condition (⋆⋆) is usually easy to satisfy in restriction to compact subsets of Θ; difficulties may arise for τ at large distances from ϑ, in the case where Θ is unbounded and of large dimension.
We give a list of assumptions and notations to be used in this subsection. The assumptions in 2.8(I) will always be in force; out of the list 2.8(III), we will indicate separately for each result what we assume.

(I) H is a Hilbert space, with scalar product ⟨·,·⟩_H and norm ‖·‖_H, equipped with its Borel σ-field B(H). We consider a sequence Ψ̂_n of F_n-measurable H-valued random variables

Ψ̂_n : (Ω, F_n) → (H, B(H)),   n ≥ 1,

and a deterministic family {Ψ_τ : τ ∈ Θ} of objects in H such that the mappings

Θ ∋ τ → ‖Ψ̂_n(ω) − Ψ_τ‖_H ∈ [0, ∞)

are continuous: hence, for open (or closed) sets B or compact sets F which are contained in Θ,

inf_{τ∈B} ‖Ψ̂_n − Ψ_τ‖_H,   sup_{τ∈B} ‖Ψ̂_n − Ψ_τ‖_H,   min_{τ∈F} ‖Ψ̂_n − Ψ_τ‖_H,   …

are F_n-measurable.

(b) Strong law of large numbers SLLN(ϑ): P_ϑ-almost surely, ‖Ψ̂_n − Ψ_ϑ‖_H → 0 as n → ∞;

(c) Tightness condition T(ϑ): there is a sequence φ_n = φ_n(ϑ) ↑ ∞ of norming constants such that

{ L( φ_n ‖Ψ̂_n − Ψ_ϑ‖_H | P_ϑ ) : n ≥ 1 } is tight in ℝ;

(+)   the mapping Θ ∋ τ → Ψ_τ ∈ H is continuous.
2.9 Definition. A sequence of estimators ϑ_n : (Ω, F_n) → (ℝ^d, B(ℝ^d)) for the unknown parameter ϑ ∈ Θ is called a minimum distance (MD) estimator sequence if there is a sequence of events A_n ∈ F_n such that

P_ϑ( lim inf_{n→∞} A_n ) = 1

for all ϑ ∈ Θ fixed, and such that for n ≥ 1, on the event A_n, the mapping Θ ∋ τ → ‖Ψ̂_n − Ψ_τ‖_H attains its global minimum on Θ at ϑ_n.
Then one can construct a sequence (T_n)_n such that the following holds for all n ≥ 1.

(1) Write M_n(ω) for the set of all points in K_n where the mapping τ → ‖Ψ̂_n(ω) − Ψ_τ‖_H attains its global minimum. By continuity of this mapping and the definition of A_n, M_n(ω) is a non-void closed subset of K_n when ω ∈ A_n, hence a non-void compact set. Thus, for ω ∈ A_n, out of arbitrary sequences in M_n(ω), we can select convergent subsequences having limits in M_n(ω).

(2) For fixed n ≥ 1 and fixed ω ∈ A_n we specify one particular point α(ω) ∈ M_n(ω) as follows. Put, successively in the coordinates,

α_1(ω) := min{ x_1 : x ∈ M_n(ω) },   α_m(ω) := min{ x_m : x ∈ M_n(ω), x_j = α_j(ω) for 1 ≤ j < m },   2 ≤ m ≤ d.

(3) For ω ∈ A_n and for the particular point α(ω) ∈ M_n(ω) selected in step (2), write α^{(n)}(ω) := α(ω) for clarity, and define (fixing some default value ϑ_0 ∈ Θ)

T_n := α^{(n)} 1_{A_n} + ϑ_0 1_{A_n^c}.

Then T_n(ω) represents one point in K_n such that the mapping τ → ‖Ψ̂_n(ω) − Ψ_τ‖_H attains its global minimum on Θ at T_n(ω), provided ω ∈ A_n. It remains to show that T_n is F_n-measurable.

(4) Write for short ι := inf_{τ∈Θ} ‖Ψ̂_n − Ψ_τ‖_H. By construction of the sequence (α^{(n)})_n, fixing arbitrary (b_1, …, b_d) ∈ ℝ^d and selecting convergent subsequences in the compacts K_n ∩ ( ×_{j=1}^m (−∞, b_j] × ℝ^{d−m} ), the following is seen to hold successively in 1 ≤ m ≤ d:

A_n ∩ { α_1^{(n)} ≤ b_1, …, α_m^{(n)} ≤ b_m }
= A_n ∩ ⋂_{r>0 rational} ⋃_{τ ∈ ℚ^d∩K_n : τ_j ≤ b_j, 1≤j≤m} { ‖Ψ̂_n − Ψ_τ‖_H < ι + r } ∈ F_n.

The core of the last proof was 'measurable selection' out of the set of points where τ → ‖Ψ̂_n(ω) − Ψ_τ‖_H attains its global minimum on Θ. In asymptotic statistics, problems of this type arise in almost all cases where one wishes to construct estimators through minima, maxima or zeros of suitable mappings τ → H(τ, ω). See [43, Thm. A.2 in App. A] for a general and easily applicable result solving 'measurable selection problems' in parametric models; see also [105, Thm. 6.7.22 and Lem. 6.7.23].
2.11 Proposition. Assume SLLN(ϑ) and I(ϑ) for all ϑ ∈ Θ. Then the sequence (T_n)_n constructed in Proposition 2.10 is a minimum distance estimator sequence for the unknown parameter. Moreover, arbitrary minimum distance estimator sequences (ϑ_n)_n for the unknown parameter as defined in Definition 2.9 are strongly consistent: ϑ_n → ϑ P_ϑ-almost surely as n → ∞, for every ϑ ∈ Θ.

Proof. (1) For the particular sequence (T_n)_n which has been constructed in Proposition 2.10, using a compact exhaustion (K_n)_n of Θ and measurable selection on the events

A_n := { min_{τ∈K_n} ‖Ψ̂_n − Ψ_τ‖_H = inf_{τ∈Θ} ‖Ψ̂_n − Ψ_τ‖_H } ∈ F_n
defined with respect to K_n, we have to show that the conditions of our proposition imply

(δ)   P_ϑ( lim inf_{n→∞} A_n ) = 1

for all ϑ ∈ Θ. Then, by Proposition 2.10, the sequence (T_n)_n with 'good sets' (A_n)_n will have all properties required in Definition 2.9 of an MD estimator sequence.

Fix ϑ ∈ Θ. Since (K_n)_n is a compact exhaustion of the open parameter set Θ, there is some n_0 and some ε_0 > 0 such that B_{2ε_0}(ϑ) ⊂ K_n for all n ≥ n_0. Consider 0 < ε < ε_0 arbitrarily small. By the definition of A_n and T_n in Proposition 2.10, we have for n ≥ n_0

{ min_{τ: |τ−ϑ|≤ε} ‖Ψ̂_n − Ψ_τ‖_H < inf_{τ: |τ−ϑ|>ε} ‖Ψ̂_n − Ψ_τ‖_H } ⊂ A_n ∩ { |T_n − ϑ| ≤ ε };

passing to complements,

A_n^c ∪ { |T_n − ϑ| > ε }
⊂ { min_{τ: |τ−ϑ|≤ε} ‖Ψ̂_n − Ψ_τ‖_H ≥ inf_{τ: |τ−ϑ|>ε} ‖Ψ̂_n − Ψ_τ‖_H }
⊂ { ‖Ψ̂_n − Ψ_ϑ‖_H ≥ inf_{τ: |τ−ϑ|>ε} ( ‖Ψ_τ − Ψ_ϑ‖_H − ‖Ψ̂_n − Ψ_ϑ‖_H ) }
⊂ { 2 ‖Ψ̂_n − Ψ_ϑ‖_H ≥ inf_{τ: |τ−ϑ|>ε} ‖Ψ_τ − Ψ_ϑ‖_H } =: C_n

(in the third line, we use the inverse triangle inequality). For the event C_n defined by the right-hand side of this chain of inclusions, SLLN(ϑ) combined with I(ϑ) yields

P_ϑ( lim sup_{n→∞} C_n ) = P_ϑ( { ω : ω ∈ C_n for infinitely many n } ) = 0.

But A_n^c is a subset of C_n for n ≥ n_0; hence we have P_ϑ( lim sup_{n→∞} A_n^c ) = 0, and thus (δ).

(2) Next we consider an arbitrary MD estimator sequence (ϑ_n)_n for the unknown parameter according to Definition 2.9, and write (Ã_n)_n for its sequence of 'good sets': thus Ã_n ∈ F_n, for ω ∈ Ã_n the mapping τ → ‖Ψ̂_n(ω) − Ψ_τ‖_H attains its global minimum on Θ at ϑ_n(ω), and we have

(δδ)   P_ϑ( lim inf_{n→∞} Ã_n ) = 1.

Repeating the chain of inclusions from step (1) with Ã_n and ϑ_n in place of A_n and T_n, SLLN(ϑ) and I(ϑ) again give P_ϑ( lim sup_{n→∞} ( Ã_n^c ∪ { |ϑ_n − ϑ| > ε } ) ) = 0. This holds for all ε > 0, and we have proved P_ϑ-almost sure convergence of (ϑ_n)_n to ϑ.

Two auxiliary results, Lemma 2.12 and Lemma 2.13 below, prepare for the proof of our main result in this section, the representation of rescaled MD estimator errors. The first is purely analytical. Important in this lemma are contributions

inf_{|h|>c, |h/φ_n|<ε} φ_n ‖Ψ_{ϑ+h/φ_n} − Ψ_ϑ‖_H,   ε > 0 arbitrarily small.
2.13 Lemma. In addition to I(ϑ) and D(ϑ), assume T(ϑ) with norming constants φ_n = φ_n(ϑ) ↑ ∞. Then arbitrary MD estimator sequences (ϑ_n)_n according to Definition 2.9 are (φ_n)_n-consistent at ϑ:

{ L( φ_n(ϑ_n − ϑ) | P_ϑ ) : n ≥ 1 } is tight in ℝ^d.

Proof. Let (ϑ_n)_n with 'good sets' (A_n)_n denote any choice of an MD estimator sequence for the unknown parameter according to Definition 2.9.

(1) For K < ∞ arbitrarily large but fixed, we repeat the reasoning of step (2) in the proof of Proposition 2.11, except that we insert K/φ_n in place of the ε there, for n large enough. Writing

C_n(K) := { 2 ‖Ψ̂_n − Ψ_ϑ‖_H ≥ inf_{τ: |τ−ϑ|>K/φ_n} ‖Ψ_τ − Ψ_ϑ‖_H },

we thus obtain

(δ)   { |φ_n(ϑ_n − ϑ)| > K } ⊂ C_n(K) ∪ A_n^c.

(2) By Lemma 2.12, the quantity inf_{τ: |τ−ϑ|>K/φ_n} φ_n ‖Ψ_τ − Ψ_ϑ‖_H can be made arbitrarily large by choosing K large, whereas the tightness condition T(ϑ) yields

lim_{M↑∞} sup_{n≥1} P_ϑ( φ_n ‖Ψ̂_n − Ψ_ϑ‖_H > M ) = 0.

Combining both statements, we obtain for the events C_n(K) in step (1):

for every ε > 0 there is K = K(ε) < ∞ such that lim sup_{n→∞} P_ϑ( C_n(K(ε)) ) < ε.
2.14 Theorem (Millar [100]). Assume SLLN(ϑ), I(ϑ), D(ϑ), and T(ϑ) with a sequence of norming constants φ_n = φ_n(ϑ) ↑ ∞. With the d×d matrix

Λ_ϑ := ( ⟨D_iΨ_ϑ, D_jΨ_ϑ⟩_H )_{1≤i,j≤d}

Proof. (1) We begin with a preliminary remark. By assumption D(ϑ), see Assumptions 2.8(III), the components D_1Ψ_ϑ, …, D_dΨ_ϑ of the derivative DΨ_ϑ are linearly independent in H. Hence

V_ϑ := span( D_iΨ_ϑ : 1 ≤ i ≤ d )

is a d-dimensional closed linear subspace of H. For points h ∈ ℝ^d with components h_1, …, h_d and for elements f ∈ H, the following two statements are equivalent:
(i) the orthogonal projection of f on V_ϑ takes the form Σ_{i=1}^d h_i D_iΨ_ϑ = h^⊤DΨ_ϑ;

(ii) one has Π_ϑ(f) = h.

Note that the orthogonal projection of f on V_ϑ corresponds to the unique h in ℝ^d such that f − h^⊤DΨ_ϑ ⊥ V_ϑ. This can be rewritten as

0 = ⟨ f − Σ_{i=1}^d h_i D_iΨ_ϑ , D_jΨ_ϑ ⟩_H = ⟨f, D_jΨ_ϑ⟩_H − Σ_{i=1}^d h_i (Λ_ϑ)_{i,j}   for all 1 ≤ j ≤ d.

Moreover, assumption T(ϑ), imposing tightness of ‖φ_n(Ψ̂_n − Ψ_ϑ)‖_H under P_ϑ as n → ∞, guarantees that

the family of laws L( ĥ_n | P_ϑ ), n ≥ 1, is tight in ℝ^d,

since Π_ϑ is a linear mapping.
(3) Let (ϑ_n)_n denote any MD estimator sequence for the unknown parameter with 'good sets' (A_n)_n as in Definition 2.9: for every n ≥ 1, on the event A_n, the mapping

Θ ∋ τ → ‖Ψ̂_n − Ψ_τ‖_H ∈ [0, ∞)

attains its global minimum at ϑ_n, and we have (in particular, cf. proof of Lemma 2.13) lim_{n→∞} P_ϑ(A_n) = 1. With the norming sequence of assumption T(ϑ), put

h_n := φ_n ( ϑ_n − ϑ )

with the notation of steps (2) and (3). The idea is as follows. On the one hand, the function γ_{n,ϑ}(·) defined in (⋆) attains a global minimum at h_n (on the event A_n); on the other hand, by (⋆⋆),

γ_{n,ϑ}(h) = ‖ φ_n(Ψ̂_n − Ψ_ϑ) − φ_n(Ψ_{ϑ+h/φ_n} − Ψ_ϑ) ‖_H
= ‖ [ φ_n(Ψ̂_n − Ψ_ϑ) − h^⊤DΨ_ϑ ] − φ_n [ Ψ_{ϑ+h/φ_n} − Ψ_ϑ − (h/φ_n)^⊤DΨ_ϑ ] ‖_H,

where by differentiability D(ϑ) the second term in the last line vanishes uniformly on arbitrary compacts K in ℝ^d.

(6) Squaring the random function D_{n,ϑ}(·) defined in (⋆⋆⋆), we find a quadratic lower bound

(+)   D²_{n,ϑ}(h) ≥ D²_{n,ϑ}(ĥ_n) + | h − ĥ_n |²_Λ   for all ω ∈ Ω, h ∈ ℝ^d, n ≥ 1,

with |·|_Λ the norm
introduced in (++) in the proof of Lemma 2.12 (there is λ > 0 with |x|_Λ ≥ λ|x| on ℝ^d). Assertion (+) is proved as follows. From (⋆) in step (2),

ĥ_n^⊤ DΨ_ϑ is the orthogonal projection of φ_n(Ψ̂_n − Ψ_ϑ) on the subspace V_ϑ,

or equivalently

φ_n(Ψ̂_n − Ψ_ϑ) − ĥ_n^⊤ DΨ_ϑ ⊥ V_ϑ,

where we have used the quadratic lower bound (+) for D²_{n,ϑ}(·) around ĥ_n from step (6). Recall that lim_{n→∞} P_ϑ(A_n) = 1, as in step (2) in the proof of Lemma 2.13. Thanks to the approximation (⋄) from step (5), we can replace, uniformly on K as n → ∞, the random function D²_{n,ϑ}(·) by γ²_{n,ϑ}(·). This gives

(⋄⋄)   lim sup_{n→∞} P_ϑ( |h_n − ĥ_n| > ε )
< ε + lim sup_{n→∞} P_ϑ( A_n ∩ { γ²_{n,ϑ}(h_n) ≥ γ²_{n,ϑ}(ĥ_n) + ½ ε² λ² } ).
For n large enough, the compact K is contained in Θ_{ϑ,n}. For ω ∈ A_n, by (⋆) in step (3), the global minimum M of the mapping Θ_{ϑ,n} ∋ h → γ²_{n,ϑ}(h) is attained at h_n. Hence for n large enough, the intersection of the sets

A_n and { γ²_{n,ϑ}(ĥ_n) ≤ M − ½ ε² λ² }

must be void: this proves (⋄⋄). By (⋄⋄), ε > 0 being arbitrary, the sequences (h_n)_n and (ĥ_n)_n under P_ϑ are asymptotically equivalent. Thus we have proved

h_n = ĥ_n + o_{P_ϑ}(1) as n → ∞,

which, as stated at the start of step (4), concludes the proof.

2.15 Remark. We indicate a variant of our approach. Instead of almost sure convergence of the MD sequence (ϑ_n)_n to the true parameter, one might be interested only in convergence in probability (consistency in the usual sense of Definition 1.9). For this, it is sufficient to work with Definition 2.9 of MD estimator sequences with good sets (A_n)_n which satisfy the weaker condition lim_{n→∞} P_ϑ(A_n) = 1, and to weaken the condition SLLN(ϑ) to ‖Ψ̂_n − Ψ_ϑ‖_H → 0 in P_ϑ-probability, i.e. to a weak law of large numbers WLLN(ϑ). With these changes, Lemma 2.12, Lemma 2.13 and Theorem 2.14 remain valid, and Proposition 2.11 changes to convergence in P_ϑ-probability instead of P_ϑ-almost sure convergence.
2.17 Examples. (a.i) For T = [0, ∞) or T = [0, 1], consider standard Brownian motion (B_t)_{t∈T} with B_0 ≡ 0. By continuity of all paths, B defined on any (Ω', A', P') is a measurable stochastic process (cf. Exercise 2.1'(b)). By independence of increments, B_{t_2} − B_{t_1} having law N(0, t_2 − t_1), we have E(B_{t_1}B_{t_2}) = E(B²_{t_1}) + E(B_{t_1}(B_{t_2} − B_{t_1})) = t_1 for t_1 < t_2. Hence, writing B⁰_t := B_t − t B_1, t ∈ [0, 1], for the Brownian bridge, B⁰ is a centred Gaussian process with covariance kernel

K(t_1, t_2) = t_1 ∧ t_2 − t_1 t_2,   t_1, t_2 ∈ [0, 1].

(ii) The paths of B⁰ being bounded functions on [0, 1], the Brownian bridge is a measurable stochastic process with paths in L²([0,1], B([0,1]), μ) (apply Lemma 2.2) for every finite measure μ on ([0,1], B([0,1])).
All paths of B^{0,F} being càdlàg (right continuous with left-hand limits: this holds by construction since F is càdlàg), B^{0,F} is a measurable stochastic process (cf. Exercise 2.1'(b)). Using (b), B^{0,F} is a Gaussian process in the sense of Definition 2.16(b) with covariance kernel

K(t_1, t_2) = F(t_1) ∧ F(t_2) − F(t_1)F(t_2),   t_1, t_2 ∈ ℝ.

(ii) The paths of B^{0,F} being bounded functions on ℝ, the Brownian bridge time-changed by F is a process with paths in L²(ℝ, B(ℝ), μ) by Lemma 2.2, for arbitrary choice of a finite measure μ.

Gaussian processes have been considered since about 1940, together with explicit orthogonal representations of the process in terms of eigenfunctions and eigenvalues of the covariance kernel K(·,·) (Karhunen–Loève expansions). See Loève [89, vol. 2, Sects. 36–37, in particular p. 144] or [89, 3rd ed., p. 478]; see also Gihman and Skorohod [28, vol. II, pp. 229–230].
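To make the Karhunen–Loève representation concrete, here is a minimal numerical sketch (not part of the text; grid, truncation level and sample size are arbitrary choices) simulating the Brownian bridge on [0, 1] from its expansion B⁰_t = Σ_{k≥1} ξ_k √2 sin(kπt)/(kπ), with i.i.d. standard normal coefficients ξ_k, and comparing the empirical covariance with K(t_1, t_2) = t_1 ∧ t_2 − t_1 t_2:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)          # time grid on [0, 1]
K_terms, n_paths = 200, 5000            # truncation level, Monte Carlo size

k = np.arange(1, K_terms + 1)
# Karhunen-Loeve basis of the bridge: eigenfunctions sqrt(2) sin(k pi t),
# eigenvalues 1/(k pi)^2, hence factors sqrt(2) sin(k pi t) / (k pi)
phi = np.sqrt(2.0) * np.sin(np.outer(t, k) * np.pi) / (k * np.pi)

xi = rng.standard_normal((n_paths, K_terms))   # i.i.d. N(0,1) coefficients
paths = xi @ phi.T                             # each row: one approximate bridge path

emp_cov = np.cov(paths, rowvar=False, bias=True)
theo_cov = np.minimum.outer(t, t) - np.outer(t, t)   # K(t1,t2) = t1 ^ t2 - t1 t2
print("max |empirical - theoretical| covariance:",
      np.abs(emp_cov - theo_cov).max())
```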
Then for every finite measure μ on (T, 𝒯), there is a real-valued measurable process (X_t)_{t∈T} which is μ-Gaussian with covariance kernel K(·,·) as in Definition 2.16(a).

Proof. (1) Since the kernel K(·,·) is symmetric and non-negative definite, for arbitrary choice of t_1, …, t_ℓ in T, ℓ ≥ 1, there is a centred normal law P_{t_1,…,t_ℓ} on (ℝ^ℓ, B(ℝ^ℓ)) with covariance matrix (K(t_i, t_j))_{i,j=1,…,ℓ}, with characteristic function

ℝ^ℓ ∋ ζ → e^{−½ ζ^⊤Σζ},   Σ := (K(t_i, t_j))_{i,j=1,…,ℓ}.

This family of laws is consistent, so there are (Ω, A, P) such that the canonical process X = (X_t)_{t∈T} on (Ω, A, P), the process of coordinate projections, has finite-dimensional distributions

(+)   L( (X_{t_1}, …, X_{t_ℓ}) | P ) = N( (0)_{i=1,…,ℓ}, (K(t_i, t_j))_{i,j=1,…,ℓ} ),   t_1, …, t_ℓ ∈ T, ℓ ≥ 1,

and these vanish as n → ∞. In this sense, the process (X_t)_{t∈T} under P is 'mean square continuous'. As a consequence, X under P is continuous in probability.

Then, by (+) in step (1), arbitrary finite-dimensional distributions of X^m are Gaussian, with covariances

E( X^m(t_1) X^m(t_2) ) = K( t_{l(t_1,m)}(m), t_{l(t_2,m)}(m) ),   t_1, t_2 ∈ T.

By continuity of K(·,·) and the convergence t_{l(t,m)}(m) → t, the finite-dimensional distributions of X^m converge as m → ∞ to those of the process X constructed in step (1): the last equation combined with (+) gives, for arbitrary t_1, …, t_ℓ ∈ T and ℓ ≥ 1,

(⋆)   L( (X^m_{t_1}, …, X^m_{t_ℓ}) | P ) → L( (X_{t_1}, …, X_{t_ℓ}) | P )   (weak convergence in ℝ^ℓ, as m → ∞).

(3) We fix any finite measure μ on (T, 𝒯) and show that the sequence X^m converges in L²(T×Ω, 𝒯⊗A, μ⊗P) as m → ∞ to some limit process X̃. Renormalising μ, it is sufficient to consider probability measures μ̃ on (T, 𝒯).
Since K(·,·) is uniformly continuous on the compact T×T, and since |t_{l'}(m') − t_l(m)| ≤ √k 2^{−m} for indices l' and l such that A_{l'}(m') ⊂ A_l(m), the integrands in (δ),

K(t_{l'}(m'), t_{l'}(m')) − 2 K(t_{l'}(m'), t_l(m)) + K(t_l(m), t_l(m))   for t ∈ A_{l'}(m') ∩ A_l(m),

vanish uniformly as m, m' → ∞. Hence (X^m)_m is a Cauchy sequence in L²(μ̃⊗P), and we obtain

(+++)   X^m → X̃ in L²(T×Ω, 𝒯⊗A, μ̃⊗P) as m → ∞.

In particular, X̃ is measurable, and we deduce from (+++) convergence X^m_t → X̃_t in L²(P) as m → ∞ for μ̃-almost all t ∈ T; the exceptional μ̃-null set in T arising here can in general not be avoided.

(4) In (+++), we select a subsequence (m_k)_k along which μ̃⊗P-almost sure convergence holds:

X^{m_k} → X̃,   μ̃⊗P-almost surely on (T×Ω, 𝒯⊗A), as k → ∞.
for all ω ∈ Ω by (⋆⋆), and at the same time, combining (⋆⋆⋆) and (⋆), weak convergence in ℝ^ℓ

(⋄⋄)   L( ( 1_M(t_1) X^{m_k}_{t_1}, …, 1_M(t_ℓ) X^{m_k}_{t_ℓ} ) ) → N( 0_ℓ, (K(t_i, t_j))_{i,j=1,…,ℓ} ),   k → ∞.

This shows that the (real-valued and measurable) process 1_M X̃ is μ-Gaussian with covariance kernel K(·,·) in the sense of Definition 2.16(a).

Proof. For X μ-Gaussian, fix an exceptional μ-null set N ∈ 𝒯 such that whenever t_1, …, t_r do not belong to N, the finite-dimensional laws L( (X_{t_1}, …, X_{t_r}) | P' ) are normal laws with covariance matrix (K(t_i, t_j))_{i,j=1,…,r}.

(1) X on (Ω', A', P') being real-valued and measurable, the set

N' := { ω ∈ Ω' : ∫_T X²(t, ω) μ(dt) = +∞ } ∈ A'
Under our assumptions, |g|(t) μ(dt) is a finite measure on (T, 𝒯), and the kernel K(·,·) is continuous and bounded on T×T. Thus by dominated convergence

σ²_m → ∫_{T×T} g(t_1) K(t_1, t_2) g(t_2) (μ⊗μ)(dt_1, dt_2) =: σ²,   m → ∞,

which yields

(+)   L( ∫_T g(t) X^m_t μ(dt) ) → N(0, σ²)   as m → ∞.

Thus the assumptions of Theorem 2.4(a) are satisfied (here we use the sufficient condition (2.4'') with p = 2 and constant f to establish uniform integrability). Applying Theorem 2.4(a) we obtain

X^m → X   (weak convergence in L²(T, 𝒯, μ) as m → ∞),

from which the continuous mapping theorem (see Exercise 2.3') gives weak convergence of integrals

(++)   ∫_T g(t) X^m_t μ(dt) → ∫_T g(t) X_t μ(dt)   (weak convergence in ℝ as m → ∞).
Ψ̂_n : (Ω, F_n) → (H, B(H)),   n ≥ 1.
The spatial structure assumed in (b) was already present in Example 2.7. We complete the list of Assumptions 2.8(III) by strengthening the tightness condition T(ϑ) in 2.8(III).

Now we state the main theorem on minimum distance estimators. It relies directly on Proposition 2.11 and on the representation of rescaled estimation errors in Theorem 2.14.

2.22 Theorem. Under Assumptions 2.20, let for every ϑ ∈ Θ the set of conditions

SLLN(ϑ), I(ϑ), D(ϑ), AN(ϑ), with φ_n = φ_n(ϑ) ↑ ∞,
hold. Then any minimum distance estimator sequence (ϑ_n)_n for the unknown parameter ϑ ∈ Θ defined according to Definition 2.9 is (strongly consistent and) asymptotically normal: we have for every ϑ ∈ Θ

L( φ_n(ϑ_n − ϑ) | P_ϑ ) → N( 0, Λ_ϑ^{−1} Γ_ϑ Λ_ϑ^{−1} ),

with Λ_ϑ as in Theorem 2.14 and Γ_ϑ the covariance matrix determined by condition AN(ϑ).

Write ⟨·,·⟩ for the scalar product in L²(T, 𝒯, μ). Then the continuous mapping theorem gives

(⋄)   ⟨g, W_n⟩ → ⟨g, W⟩   (weakly in ℝ, as n → ∞).

To conclude the proof, it is sufficient to combine the last convergence with the representation of Theorem 2.14,

φ_n(ϑ_n − ϑ) = Π_ϑ(W_n) + o_{P_ϑ}(1),   n → ∞.

2.23 Lemma. Let Y_1, Y_2, … denote i.i.d. random variables taking values in ℝ^k, with continuous distribution function F : ℝ^k → [0, 1]. Let F̂_n : (t, ω) → F̂_n(t, ω) denote the empirical distribution function based on the first n observations. Consider T compact in ℝ^k, with Borel σ-field 𝒯, and a finite measure μ on (T, 𝒯). Write

W_n := √n ( F̂_n − F ),   n ≥ 1.
2.24 Example (Example 2.7 continued). For Θ ⊂ ℝ^d open, consider (Y_i)_{i≥1} i.i.d. observations in ℝ^k, with continuous distribution function F_ϑ : ℝ^k → [0, 1] under ϑ ∈ Θ. Fix T compact in ℝ^k, 𝒯 = B(T), some finite measure μ on (T, 𝒯), and assume the following: for all ϑ ∈ Θ,

(⋄)   the parameterisation Θ ∋ ϑ → Ψ_ϑ := F_ϑ ∈ H = L²(T, 𝒯, μ) satisfies I(ϑ) and D(ϑ).

For the proof of these statements, recall from Example 2.7 that condition SLLN(ϑ) holds as a consequence of Glivenko–Cantelli, cf. (⋆) in Example 2.7. I(ϑ) holds by (⋄) assumed above, hence Proposition 2.11 gives strong consistency of any version of the MD estimator sequence (+). Next, Lemma 2.23 establishes condition AN(ϑ) with covariance kernel
K(t_1, t_2) = F_ϑ(t_1 ∧ t_2) − F_ϑ(t_1)F_ϑ(t_2), where the minimum ∧ is taken componentwise in ℝ^k. Since D(ϑ) holds by assumption (⋄), Theorem 2.22 applies and yields the assertion.
2.24' Exercise. In dimension d = 1, fix a distribution function F on (ℝ, B(ℝ)) which admits a continuous and strictly positive Lebesgue density f on ℝ, and consider as a particular case of Example 2.24 a location model

F_ϑ := F(· − ϑ),   ϑ ∈ Θ,

where Θ is open in ℝ. Check that for any choice of a finite measure μ on (T, 𝒯), T compact in ℝ, the assumptions I(ϑ) and D(ϑ) are satisfied (with DF_ϑ = −f(· − ϑ)) for all ϑ ∈ Θ.
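As a numerical illustration of this exercise (a sketch, not part of the text; the logistic distribution, grid, sample size and optimiser are arbitrary choices), the following minimises ϑ → ‖F̂_n − F(· − ϑ)‖_{L²(μ)} with μ the Lebesgue measure on a compact T, discretised on a grid:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
theta_true = 0.7
F = lambda x: 1.0 / (1.0 + np.exp(-x))      # logistic distribution function

n = 2000
Y = theta_true + rng.logistic(size=n)       # i.i.d. sample from F(. - theta)

t = np.linspace(-5.0, 5.0, 401)             # grid on the compact T = [-5, 5]
F_hat = (Y[None, :] <= t[:, None]).mean(axis=1)   # empirical distribution function

# squared L2(mu)-distance, mu = Lebesgue measure on T (trapezoidal rule)
dist2 = lambda th: np.trapz((F_hat - F(t - th)) ** 2, t)
res = minimize_scalar(dist2, bounds=(-3.0, 3.0), method="bounded")
print("MD estimate:", res.x, " true:", theta_true)
```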
MD estimators in i.i.d. models may be defined in many different ways, e.g. based on empirical Laplace transforms when tractable expressions for the Laplace transforms under ϑ ∈ Θ are at hand, from empirical quantile functions, from empirical characteristic functions, and so on. The next example is from Höpfner and Rüschendorf [58].
2.25 Example. Consider real-valued i.i.d. symmetric stable random variables (Y_j)_{j≥1} with characteristic function

u → E_{(α,σ)}( e^{iuY_1} ) = e^{−σ|u|^α},   u ∈ ℝ,

and estimate both the stability index α ∈ (0, 2) and the weight parameter σ ∈ (0, ∞) from the first n observations Y_1, …, Y_n.

Put ϑ = (α, σ), Θ = (0, 2) × (0, ∞). Symmetry of L(Y | P_ϑ), the characteristic functions being real-valued, allows to work with the real parts of the empirical characteristic functions (1/n) Σ_{j=1}^n e^{iuY_j} only, so we put

Ψ̂_n(u) = (1/n) Σ_{j=1}^n cos(uY_j),   Ψ_ϑ(u) := e^{−σ|u|^α},   ϑ ∈ Θ.

Fix a sufficiently large compact interval T, symmetric around zero and including open neighbourhoods of the points u = ±1, 𝒯 = B(T), and a finite measure μ on (T, 𝒯), symmetric around zero, such that

(+)   ∫_{−ε}^{+ε} | log|u| |² μ(du) < ∞ for all ε > 0,

and assume in addition:

(⋄)   for some open neighbourhood U of the point u = 1, the density of the λ-absolutely continuous part of μ in restriction to U is strictly positive and bounded away from zero.

Then any MD estimator sequence based on the empirical characteristic function,

ϑ_n := arginf_{ϑ∈Θ} ‖ Ψ̂_n − Ψ_ϑ ‖_{L²(T,𝒯,μ)},
82 Chapter 2 Minimum Distance Estimators
Now we repeat exactly the same reasoning with the characteristic functions Ψ_{ϑ'}, Ψ_ϑ restricted to Ũ, instead of their linearisations Ψ̃_{ϑ'}, Ψ̃_ϑ at u = 1 which we considered so far, and obtain

inf { ‖Ψ_{ϑ'} − Ψ_ϑ‖_{L²(Ũ, μ̃)} : ϑ' ∈ Θ, |ϑ' − ϑ| ≥ ε } > 0.

Since there is some constant c > 0 such that μ(dt) ≥ (1_{Ũ} μ)(dt) ≥ c μ̃(dt) by assumption (⋄), we may pass from Ũ to T in the last assertion and obtain a fortiori

inf { ‖Ψ_{ϑ'} − Ψ_ϑ‖_{L²(T,𝒯,μ)} : ϑ' ∈ Θ, |ϑ' − ϑ| ≥ ε } > 0.

Since ε > 0 was arbitrary, this is the identifiability condition I(ϑ).

(2) Following Prakasa Rao [107, Proposition 8.3.1], condition SLLN(ϑ) holds for all ϑ ∈ Θ: we start again from the distribution functions F_ϑ associated to Ψ_ϑ, and Glivenko–Cantelli. Fix ϑ ∈ Θ and write A_ϑ ∈ A for some set of full P_ϑ-measure such that sup_{t∈ℝ} |F̂_n(t, ω) − F_ϑ(t)| → 0 as n → ∞ when ω ∈ A_ϑ. In particular F̂_n(t, ω) → F_ϑ(t) at continuity points t of F_ϑ, hence the empirical measures associated to (Y_1, …, Y_n)(ω) converge weakly to F_ϑ, for ω ∈ A_ϑ. This gives pointwise convergence of the associated characteristic functions Ψ̂_n(t, ω) → Ψ_ϑ(t) for t ∈ ℝ when ω ∈ A_ϑ. By dominated convergence on T with respect to the finite measure μ, this establishes SLLN(ϑ).

(3) Condition (+) is a sufficient condition for differentiability D(ϑ) at all points ϑ ∈ Θ: by dominated convergence under (+) assumed above,

∫_{−ε}^{+ε} | log|u| |² μ(du) < ∞ for ε > 0 arbitrarily small,

with random variables

R_j := Σ_{i=1}^l α_i ( cos(u_i Y_j) − Ψ_ϑ(u_i) ).
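A numerical sketch of Example 2.25 (illustration only; grid, measure, starting values and optimiser are arbitrary choices, and μ is taken as the counting measure on a symmetric grid): minimise over ϑ = (α, σ) the discretised L²(μ)-distance between the real part of the empirical characteristic function and e^{−σ|u|^α}. The sample is generated by the Chambers–Mallows–Stuck method for symmetric stable laws:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
alpha0, sigma0 = 1.5, 1.0
n = 4000

# symmetric alpha-stable sample via Chambers-Mallows-Stuck (beta = 0)
U = rng.uniform(-np.pi / 2, np.pi / 2, n)
W = rng.exponential(1.0, n)
Y = (sigma0 ** (1 / alpha0) * np.sin(alpha0 * U) / np.cos(U) ** (1 / alpha0)
     * (np.cos((1 - alpha0) * U) / W) ** ((1 - alpha0) / alpha0))

u = np.linspace(-2.0, 2.0, 201)                   # grid on T, containing +-1
psi_hat = np.cos(np.outer(u, Y)).mean(axis=1)     # real part of empirical c.f.

def dist2(theta):
    a, s = theta
    return np.sum((psi_hat - np.exp(-s * np.abs(u) ** a)) ** 2)

res = minimize(dist2, x0=[1.0, 0.5], bounds=[(0.1, 2.0), (0.01, 10.0)])
print("MD estimate (alpha, sigma):", res.x, " true:", (alpha0, sigma0))
```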
iD1
2.25' Exercise. For the family P = { Γ(a, p) : a > 0, p > 0 } of Gamma laws on (ℝ, B(ℝ)) with densities

f_{a,p}(x) = 1_{(0,∞)}(x) (p^a / Γ(a)) x^{a−1} e^{−px},

construct, for i.i.d. observations Y_1, Y_2, … from P, MD estimators for (a, p) based on the empirical Laplace transforms

[0, ∞) ∋ λ → (1/n) Σ_{i=1}^n e^{−λY_i} ∈ [0, 1],

and discuss their asymptotics (hint: to satisfy the identifiability condition, work with measures μ on compacts T = [0, C] which satisfy μ(dy) ≥ c dy in restriction to small neighbourhoods [0, ε) of 0+).
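A sketch for this exercise (an illustration, not a full solution; sample size, grid and optimiser are arbitrary choices), using the fact that the Laplace transform of Γ(a, p) is λ → (p/(p+λ))^a:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
a0, p0 = 2.0, 1.5
n = 3000
Y = rng.gamma(shape=a0, scale=1.0 / p0, size=n)   # Gamma(a, p) with rate p

lam = np.linspace(0.0, 4.0, 201)                  # grid on T = [0, C]
L_hat = np.exp(-np.outer(lam, Y)).mean(axis=1)    # empirical Laplace transform

# Laplace transform of Gamma(a, p): (p / (p + lambda))^a
def dist2(theta):
    a, p = theta
    return np.trapz((L_hat - (p / (p + lam)) ** a) ** 2, lam)

res = minimize(dist2, x0=[1.0, 1.0], bounds=[(0.05, 20.0), (0.05, 20.0)])
print("MD estimate (a, p):", res.x, " true:", (a0, p0))
```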
We conclude this chapter with some remarks. Many classical examples for MD
estimator sequences are given in Millar [100, Chap. XIII]. Also, a large number of
extensions of the method exposed here are possible. Kutoyants [79] considers MD
estimator sequences for i.i.d. realisations of the same point process on a fixed spatial
window, in the case where the spatial intensity is parameterised by # 2 ‚. Beyond
the world of i.i.d. observations, MD estimator sequences can be considered in ergodic
Markov processes when the time of observation tends to infinity, see a large number
of diffusion process examples in Kutoyants [80], or in many other stochastic process
models.
Chapter 3
Contiguity
This chapter discusses the notion of contiguity which goes back to Le Cam, see Hájek
and Sidák [41], Le Cam [81], Roussas [113], Strasser [121], Liese and Vajda [87],
Le Cam and Yang [84], van der Vaart [126], and is of crucial importance in the con-
text of convergence of local models. Mutual contiguity considers sequences of like-
lihood ratios whose accumulation points in the sense of weak convergence have the
interpretation of a likelihood ratio between two equivalent probability laws. Section
3.1 fixes setting and notations, and states two key results on contiguity, termed ‘Le
Cam’s first lemma’ and ‘Le Cam’s third lemma’ (3.5 and 3.6 below) since Hájek and
Sidák [41, Chap. 6.1]. Section 3.2 contains the proofs together with some variants of
the main results.
3.1' Notations. For ℝ̄ := [−∞, +∞], consider ℝ̄^d-valued random variables X_n living on some (Ω_n, A_n, Q_n), n ≥ 1, and associate to X_n the ℝ^d-valued random variable X̂_n := X_n 1_{{|X_n|<∞}}.

(i) We call the sequence (X_n)_n ℝ^d-tight under (Q_n)_n if

(⋆)   lim_{n→∞} Q_n( X_n ≠ X̂_n ) = 0 and { L( X̂_n | Q_n ) : n ≥ 1 } is tight in ℝ^d.

(ii) For probability measures F on (ℝ^d, B(ℝ^d)), we say that (X_n)_n under (Q_n)_n converges ℝ^d-weakly to F, and write

L( X_n | Q_n ) → F   (weakly in ℝ^d as n → ∞)

for short, if (X_n)_n under (Q_n)_n is ℝ^d-tight and if the second condition in (⋆) can be strengthened to

(⋆⋆)   L( X̂_n | Q_n ) → F   (weak convergence in ℝ^d as n → ∞).

Proof. For every n ≥ 1, the event {L_n = ∞} has Q_n-measure zero by Notation 3.1(i); we have to show:

for every ε > 0 there is some K < ∞ such that sup_{n≥1} Q_n( ∞ > L_n > K ) < ε.
The following results, Proposition 3.3', Theorem 3.4, Lemmas 3.5 and 3.6, and Proposition 3.6'', will be stated in the present section, together with some comments; all proofs, evolving through a series of variants and alternative formulations, will be collected in Section 3.2. The main results are Lemmas 3.5 and 3.6.

3.3' Proposition. For random variables Y_n on (Ω_n, A_n) taking values in (ℝ^d, B(ℝ^d)), the assertion

{ L( Y_n | Q_n ) : n ≥ 1 } is tight

implies, under contiguity (P_n)_n ◁ (Q_n)_n,

{ L( Y_n | P_n ) : n ≥ 1 } is tight.
3.4 Theorem. (a) The following assertions (i) and (ii) are equivalent:

(i) contiguity (P_n)_n ◁ (Q_n)_n holds;

(ii) the sequence (L_n)_n is ℝ-tight under both (P_n)_n and (Q_n)_n.

Out of a tight sequence of probability laws on (ℝ^d, B(ℝ^d)) we can select weakly convergent subsequences. Hence, considering ℝ^d-tight sequences of ℝ̄^d-valued random variables with the conventions of Notation 3.1'(i), there is no loss of generality, switching to subsequences if necessary, in supposing ℝ^d-weak convergence in the sense of Notation 3.1'(ii).

3.5 Le Cam's First Lemma. Assume that there is a probability law F on (ℝ, B(ℝ)) such that

L( Λ_n | Q_n ) → F   (weakly in ℝ as n → ∞).

Then mutual contiguity (P_n)_n ◁▷ (Q_n)_n holds if and only if ∫_ℝ e^λ F(dλ) = 1.
3.5' Remark. To any probability measure F on (ℝ, B(ℝ)) which may occur in Lemma 3.5, we can associate a real-valued random variable Λ on some probability space (Ω, A, Q) with distribution F; the easiest choice is Λ := id on (Ω, A, Q) := (ℝ, B(ℝ), F). Then the equality ∫_ℝ e^λ F(dλ) = 1 in Lemma 3.5 signifies that (Ω, A) carries a second probability measure P defined by

(+)   dP := e^Λ dQ.

We may view (+) as a 'limit experiment' as n → ∞ for the sequence of binary experiments

(Ω_n, A_n, {P_n, Q_n}),   n ≥ 1.

Thus Le Cam's first lemma relates mutual contiguity (P_n)_n ◁▷ (Q_n)_n to convergence of experiments where weak limits of log-likelihood ratios are log-likelihood ratios between equivalent probability laws.

3.5'' Exercise. In the case where the weak limit for the log-likelihood ratios Λ_n under Q_n is a normal law

L( Λ_n | Q_n ) → N(μ, σ²)   (weakly in ℝ as n → ∞)

with strictly positive variance σ² > 0, use Lemma 3.5 to prove the equivalence

(P_n)_n ◁▷ (Q_n)_n ⟺ μ = −σ²/2.
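The computation behind this equivalence is the normal Laplace transform; as a worked step (added here for convenience, in LaTeX notation):

```latex
% expectation of e^{Lambda} under N(mu, sigma^2):
\int_{\mathbb{R}} e^{\lambda}\, N(\mu,\sigma^2)(d\lambda)
  = \exp\!\Big(\mu + \tfrac{\sigma^2}{2}\Big),
% so, by Le Cam's First Lemma 3.5, mutual contiguity holds iff
\exp\!\Big(\mu + \tfrac{\sigma^2}{2}\Big) = 1
  \iff \mu = -\tfrac{\sigma^2}{2}.
```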
3.5''' Exercise. On (Ω_n, A_n) = (ℝ^n, B(ℝ^n)), consider probability laws Q_n = ⊗_{i=1}^n N(0, 1), P_n = ⊗_{i=1}^n N(h/√n, 1), for some h ∈ ℝ which we keep fixed as n → ∞. Write (X_1, …, X_n) for the canonical variable on Ω_n. Check that Λ_n is given by (h/√n) Σ_{i=1}^n X_i − ½h², and think of the Laplace transform of N(0, 1) to check mutual contiguity (P_n)_n ◁▷ (Q_n)_n via Le Cam's First Lemma 3.5. Check also that the 'limit experiment' in Remark 3.5' can be determined as (ℝ, B(ℝ)) equipped with Q = N(0, 1), P = N(h, 1).
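A quick simulation of this exercise (a sketch; parameter values and sample sizes are arbitrary): under Q_n, Λ_n = (h/√n) Σ X_i − h²/2 should be approximately N(−h²/2, h²), which matches μ = −σ²/2 in Exercise 3.5'' and hence mutual contiguity.

```python
import numpy as np

rng = np.random.default_rng(4)
h, n, reps = 1.3, 500, 20000

X = rng.standard_normal((reps, n))                 # observations under Q_n
Lam = (h / np.sqrt(n)) * X.sum(axis=1) - h**2 / 2  # log-likelihood ratios

print("mean of Lambda_n under Q_n:", Lam.mean(), " (theory -h^2/2 =", -h**2 / 2, ")")
print("var  of Lambda_n under Q_n:", Lam.var(), " (theory  h^2   =", h**2, ")")
# Le Cam's first lemma: E_{Q_n} e^{Lambda_n} = 1 here (exactly, for every n)
print("E e^{Lambda_n} under Q_n:", np.exp(Lam).mean())
```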
3.5'''' Exercise. On (Ω_n, A_n) = (ℝ^n, B(ℝ^n)), for some h > 0 which we keep fixed as n → ∞, consider probability laws Q_n = ⊗_{i=1}^n R(0, 1) and P_n = ⊗_{i=1}^n R(0 + h/n, 1 + h/n), where R(a, b) denotes the uniform law on (a, b). Write U_n = (0, 1)^n for the support of Q_n and V_n = (0 + h/n, 1 + h/n)^n for the support of P_n. Use the sequences A_n = U_n ∖ V_n or Ã_n = V_n ∖ U_n to check directly from Definition 3.2 that
neither (P_n)_n ◁ (Q_n)_n nor (Q_n)_n ◁ (P_n)_n holds. Show also that, instead of a limit law F on (ℝ, B(ℝ)) as required in Lemma 3.5 for the sequence of log-likelihood ratios L(Λ_n | Q_n) as n → ∞, we have a limit law on (ℝ̄, B(ℝ̄)) which takes the value 0 with probability e^{−h} and the value −∞ with probability 1 − e^{−h}.
The next result is Le Cam's third lemma (the 'second lemma' is related to the representation of log-likelihood ratios in smoothly parameterised product experiments, see Chapters 4 and 7).

3.6 Le Cam's Third Lemma. Assume mutual contiguity

(P_n)_n ◁▷ (Q_n)_n.

Consider a probability measure F̃ on (ℝ^{1+k}, B(ℝ^{1+k})) and a sequence of ℝ^k-valued random variables X_n on (Ω_n, A_n) such that the following holds:

(+)   L( (Λ_n, X_n) | Q_n ) → F̃   (weakly in ℝ^{1+k} as n → ∞).

Then

G̃(dλ, dx) := e^λ F̃(dλ, dx),   λ ∈ ℝ, x ∈ ℝ^k,

defines a probability measure on (ℝ^{1+k}, B(ℝ^{1+k})), and we have

L( (Λ_n, X_n) | P_n ) → G̃   (weakly in ℝ^{1+k} as n → ∞).
3.6' Remarks. (a) Under mutual contiguity (P_n)_n ◁▷ (Q_n)_n, we extend the interpretation in Remark 3.5' and view the law F̃ in Lemma 3.6 as the distribution of a random variable (Λ, X) defined on some (Ω, A, Q). Thus, in addition to Remark 3.5', we have an ℝ^k-valued statistic X in the 'limit experiment' (Ω, A, {P, Q}) where P ∼ Q is defined by dP := e^Λ dQ. In the 'limit experiment' we have

F̃ = L( (Λ, X) | Q ),   G̃ = L( (Λ, X) | P ),

where the second statement is a consequence of the definition of P and of G̃: for suitable f(·,·),

E_P( f(Λ, X) ) = E_Q( e^Λ f(Λ, X) ) = ∫ e^λ f(λ, x) F̃(dλ, dx) = ∫ f(λ, x) G̃(dλ, dx).

Now Le Cam's Third Lemma 3.6 can be rephrased as follows: under mutual contiguity (P_n)_n ◁▷ (Q_n)_n, assertion (+) implies

(++)   L( (Λ_n, X_n) | P_n ) → G̃   (weakly in ℝ^{1+k} as n → ∞),

where the passage from F̃ to G̃ is explicit (at least in principle, cf. Lemma 3.6); second, shifting all second components in (++) by h, which does not depend on n,

(+++)   L( (Λ_n, √n( ϑ̃_n − (ϑ + h/√n) )) | P_{ϑ+h/√n} ) → H̃,

with H̃ the correspondingly shifted limit law.

There is one important special case where the passage from F̃ to G̃ in Le Cam's Third Lemma 3.6 is particularly simple. Normal limit laws F̃ for L( (Λ_n, X_n) | Q_n ) as n → ∞ in Lemma 3.6 arise in many classical situations. The following proposition shows that in this case L(X_n | Q_n) and L(X_n | P_n) differ asymptotically only by a mean value shift

τ = lim_{n→∞} Cov_{Q_n}( Λ_n, X_n ).
3.6'' Proposition. When F̃ in Lemma 3.6 is some normal law on (ℝ^{1+k}, B(ℝ^{1+k})) whose first component is not degenerate to a single point, then necessarily

F̃ = N( ( −σ²/2, μ )^⊤ , [ σ², τ^⊤ ; τ, Σ ] )

for some σ² > 0, μ, τ ∈ ℝ^k and some covariance matrix Σ (written in block form), and then

G̃ = N( ( +σ²/2, μ + τ )^⊤ , [ σ², τ^⊤ ; τ, Σ ] ).
3.7 Proposition. The following condition is necessary and sufficient for contiguity (P_n)_n ◁ (Q_n)_n: for every ε > 0 there is some δ = δ(ε) > 0 such that, for all sequences of events A_n ∈ A_n,

(+)   lim sup_{n→∞} Q_n(A_n) < δ ⟹ lim sup_{n→∞} P_n(A_n) < ε.

Proof. (1) We prove that condition (+) is sufficient for contiguity (P_n)_n ◁ (Q_n)_n. To ε > 0 arbitrarily small, associate δ = δ(ε) > 0 such that (+) holds. Fix any sequence of events A_n ∈ A_n, n ≥ 1, with the property lim_{n→∞} Q_n(A_n) = 0. For this sequence, condition (+) makes sure that we must have lim_{n→∞} P_n(A_n) = 0 too: this is contiguity (P_n)_n ◁ (Q_n)_n as in Definition 3.2.

(2) We prove that (+) is a necessary condition. Let us assume that for some ε > 0, irrespectively of the smallness of δ > 0, an implication like (+) never holds true. In this case, considering in particular δ = 1/k, k ∈ ℕ arbitrarily large, there are sequences of events (A^k_n)_n such that

lim sup_{n→∞} Q_n(A^k_n) < 1/k holds together with lim sup_{n→∞} P_n(A^k_n) ≥ ε.
From this we can select a sequence (n_k)_k increasing to ∞ such that for every k ∈ ℕ we have

P_{n_k}(A^k_{n_k}) > ε/2 together with Q_{n_k}(A^k_{n_k}) < 2/k.

Using Definition 3.2 along the subsequence (n_k)_k, contiguity (P_{n_k})_k ◁ (Q_{n_k})_k does not hold. A fortiori, as an easy consequence of Definition 3.2, contiguity (P_n)_n ◁ (Q_n)_n does not hold.
3.7' Proof of Proposition 3.3'. We have to show that any sequence of ℝ^d-valued random variables (Y_n)_n which is tight under (Q_n)_n remains tight under (P_n)_n when contiguity (P_n)_n ◁ (Q_n)_n holds. Let ε > 0 be arbitrarily small. Assuming contiguity (P_n)_n ◁ (Q_n)_n, we make use of Proposition 3.7 and select δ = δ(ε) > 0 such that the implication (+) in Proposition 3.7 is valid. From tightness of (Y_n)_n under (Q_n)_n, there is some large K = K(δ) < ∞ such that

lim sup_{n→∞} Q_n( |Y_n| > K ) < δ,

from which we deduce, thanks to (+),

lim sup_{n→∞} P_n( |Y_n| > K ) < ε.
We prepare the next steps by some comments on the conventions in Notations 3.1'.

3.8 Remarks. With Notations 3.1', consider on (Ω_n, A_n, Q_n) ℝ̄^d-valued random variables X_n, n ≥ 1, and their associated X̂_n := X_n 1_{{|X_n|<∞}}, which are ℝ^d-valued. The following assertions (a)–(c) can be checked directly from Notations 3.1'.

(a) The sequence (X_n)_n under (Q_n)_n is ℝ^d-tight if and only if

lim_{K↑∞} lim sup_{n→∞} Q_n( ∞ ≥ |X_n| > K ) = 0.

(c) Uniform integrability of the sequence (X_n)_n under (Q_n)_n as defined in Notation 3.1'(iii) is equivalent to the validity of the following assertions (i) and (ii) together:

(i) we have sup_{n≥1} E_{Q_n}( |X_n| ) < ∞;

(ii) for every ε > 0 there is some δ = δ(ε) > 0 such that

n ≥ 1, A_n ∈ A_n, Q_n(A_n) < δ ⟹ ∫_{A_n} |X_n| dQ_n < ε.

According to the definition in Notation 3.1'(iii), this is proved exactly as in the usual case of real-valued random variables living on some fixed probability space.
By Proposition 3.3, which was proved in Section 3.1, the sequence of likelihood ratios (L_n)_n is ℝ-tight under (Q_n)_n, without further assumptions. Now we can prove part (a) of Theorem 3.4: contiguity (P_n)_n ◁ (Q_n)_n is equivalent to ℝ-tightness of (L_n)_n under both (P_n)_n and (Q_n)_n.

3.9 Proof of Theorem 3.4(a). (1) To prove (i)⟹(ii) of Theorem 3.4(a), we assume (P_n)_n ◁ (Q_n)_n and have to verify, according to Remark 3.8(a), that the following holds true:

lim_{K↑∞} lim sup_{n→∞} P_n( L_n ∈ [K, ∞] ) = 0.

If this assertion were not true, we could find some ε > 0, a sequence K_j ↑ ∞, and a sequence of natural numbers n_j increasing to ∞ such that

P_{n_j}( L_{n_j} ∈ [K_j, ∞] ) > ε for all j ≥ 1.

By Proposition 3.3 we know that (L_n)_n under (Q_n)_n is ℝ-tight, thus we know

lim_{j→∞} lim sup_{n→∞} Q_n( L_n ∈ [K_j, ∞] ) = 0.

(2) To prove (ii)⟹(i), start from a sequence of events A_n ∈ A_n with the property lim_{n→∞} Q_n(A_n) = 0, and write, for arbitrary K < ∞,

P_n(A_n) ≤ P_n( ∞ ≥ L_n > K ) + K Q_n(A_n)

(since P_n( A_n ∩ {L_n ≤ K} ) = ∫_{A_n∩{L_n≤K}} L_n dQ_n ≤ K Q_n(A_n)). This gives

lim sup_{n→∞} P_n(A_n) ≤ lim sup_{n→∞} P_n( ∞ ≥ L_n > K ),

where, by ℝ-tightness of (L_n)_n under (P_n)_n and by Remark 3.8(a), the right-hand side can be made arbitrarily small by suitable choice of K. This gives lim_{n→∞} P_n(A_n) = 0 and thus proves (ii)⟹(i) of Theorem 3.4(a).
Preparing for the proof of Theorem 3.4(b), which will be completed in Proof 3.13 below, we continue to give characterisations of contiguity (P_n)_n ◁ (Q_n)_n in terms of the behaviour of P_n on the event {L_n = ∞} in the Lebesgue decomposition in Notation 3.1(i), plus any one of the additional conditions considered in step (1).

3.10' Exercise. On (ℝ, B(ℝ)), write P(ϑ) for the exponential law with parameter 1 shifted by ϑ > 0, i.e. supported by (ϑ, ∞). On (Ω_n, A_n) = (ℝ^n, B(ℝ^n)) with canonical variable (X_1, …, X_n), consider probability laws Q_n = ⊗_{i=1}^n P(0) and P_n = ⊗_{i=1}^n P(0 + h/n), for some h > 0 which is fixed as n → ∞. Write U_n = (0, ∞)^n for the support of Q_n and V_n = (h/n, ∞)^n for the support of P_n. Check that the likelihood ratio L_n is given by

L_n = e^h 1_{V_n}.

Deduce contiguity (P_n)_n ◁ (Q_n)_n from Proposition 3.10(iii), from Proposition 3.10(ii), from Theorem 3.4(a), from an ε-δ-argument using Proposition 3.7, and finally directly from Definition 3.2.
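A quick numerical check of this exercise (a sketch; parameter values are arbitrary): with Q_n = Exp(1)^⊗n and the shifted product law P_n, the likelihood ratio is L_n = e^h 1_{V_n}, Q_n(V_n) = (e^{−h/n})^n = e^{−h}, and hence E_{Q_n}(L_n) = 1, matching Theorem 3.11 below.

```python
import numpy as np

rng = np.random.default_rng(7)
h, n, reps = 1.0, 200, 200000

X = rng.exponential(1.0, (reps, n))        # i.i.d. Exp(1): samples under Q_n
on_Vn = (X > h / n).all(axis=1)            # V_n = (h/n, infinity)^n, support of P_n
L = np.exp(h) * on_Vn                      # likelihood ratio L_n = e^h 1_{V_n}

print("Q_n(V_n):", on_Vn.mean(), " (theory e^{-h} =", np.exp(-h), ")")
print("E_{Q_n} L_n:", L.mean(), " (theory 1: one-sided contiguity)")
```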
The sequence of likelihood ratios (L_n)_n under (Q_n)_n being ℝ-tight by Proposition 3.3, for any subsequence (n_k)_k of the natural numbers there are further subsequences (n_{k_l})_l and probability measures F̄ on (ℝ, B(ℝ)) such that (L_{n_{k_l}})_l under (Q_{n_{k_l}})_l converges to F̄ weakly in ℝ as l → ∞. In this sense, the assumption of the following theorem, a one-sided version of Le Cam's First Lemma 3.5, is (up to selection of subsequences) no restriction at all. Accumulation points F̄ in this sense are necessarily concentrated on [0, ∞), and may have a point mass F̄({0}) > 0 at zero.

3.11 Theorem. Assume that there is some probability measure F̄ on (ℝ, B(ℝ)) such that

L( L_n | Q_n ) → F̄   (weakly in ℝ as n → ∞).

Then contiguity (P_n)_n ◁ (Q_n)_n holds if and only if ∫_{[0,∞)} ℓ F̄(dℓ) = 1.
Proof. (1) We start with some preliminary remarks. Define truncated identities g_K(·) ∈ C_b(ℝ),

g_K(x) = 0 ∨ x ∧ K,   x ∈ ℝ.

Considering ∫_0^∞ (x − g_K(x)) Q_n^{L_n}(dx), where E_{Q_n}(L_n) ≤ 1 by definition of a likelihood ratio, we obtain

(+)   E_{Q_n}(L_n) − ∫ g_K dQ_n^{L_n} = ∫ 1_{{K<L_n<∞}} L_n dQ_n − K Q_n^{L_n}((K, ∞)) ∈ [0, 1],

thanks to Proposition 3.10. In the limit as n → ∞, the second assertion in (⋄) forces the right-hand sides in (+) to be small when K is large:

(★)   lim_{K↑∞} lim sup_{n→∞} [ ∫ 1_{{K<L_n<∞}} L_n dQ_n − K Q_n^{L_n}((K, ∞)) ] = 0.

Inserting into this last convergence the first assertion of (⋄), we get via (++)

lim_{K↑∞} lim_{n→∞} ∫ g_K dQ_n^{L_n} = lim_{K↑∞} ∫ g_K dF̄ = 1,

which is the desired assertion ∫_{[0,∞)} x F̄(dx) = 1.

(3) We prove '⟸'. Let us assume ∫_{[0,∞)} x F̄(dx) = 1. A first consequence of this assumption is

(δ)   lim_{K↑∞} lim_{n→∞} ∫ g_K dQ_n^{L_n} = lim_{K↑∞} ∫ g_K dF̄ = 1.

Inserting (δ) into the equations (+) as n tends to ∞, the left-hand sides in (+) will be small for large values of K, see (★★), which again with reference to (δ) gives (δδ); simultaneously, the right-hand sides in (+) must be small for large values of K, see (★), which combined with (δδ) gives

(★★★)   lim_{K↑∞} lim sup_{n→∞} ∫ 1_{{K<L_n<∞}} L_n dQ_n = 0.
3.11' Remark. To any probability measure F̄ on (ℝ, B(ℝ)) which may occur in Theorem 3.11, associate a real-valued random variable L ≥ 0 on some probability space (Ω, A, Q) with distribution F̄. Then the equality ∫_{[0,∞)} ℓ F̄(dℓ) = 1 in Theorem 3.11 signifies that (Ω, A) carries a second probability measure P defined by

dP := L dQ.

Here F̄({0}) = Q(L = 0), the Q-weight of the support of the P-singular part of Q, may be strictly positive. In this sense, one-sided contiguity (P_n)_n ◁ (Q_n)_n is related to convergence of experiments where weak limits of likelihood ratios are likelihood ratios between probability measures P, Q such that P is absolutely continuous with respect to Q.
Proof. According to Notation 3.1(iv), the [0, ∞]-valued random variable L̃_n := 1/L_n is a version of the likelihood ratio of Q_n with respect to P_n, and statement (ii) means ℝ-tightness of the sequence (1/L_n)_n under (Q_n)_n. Changing names, P̃_n := Q_n and Q̃_n := P_n, so that L̃_n is the likelihood ratio of P̃_n with respect to Q̃_n, Theorem 3.4(a) applied to (P̃_n)_n and (Q̃_n)_n proves Proposition 3.12.

3.13 Proof of Theorem 3.4(b). (1) We assume mutual contiguity. Then Propositions 3.3 and 3.12 together show

(+)   lim_{K↑∞} lim sup_{n→∞} Q_n( L_n ∉ [1/K, K] ) = 0,

and Theorem 3.4(a) yields the corresponding statement under (P_n)_n.

At this stage, Theorem 3.4 being completely proved, we can prove Le Cam's first lemma in the form encountered in Lemma 3.5. An obvious corollary merging Theorem 3.11 and Proposition 3.12 is the following:

3.14 Corollary. Assume that there is a probability measure F̄ on (ℝ, B(ℝ)) such that

L( L_n | Q_n ) → F̄   (weakly in ℝ, as n → ∞).

To deduce Corollary 3.14 from Theorem 3.11 and Proposition 3.12, it is sufficient to consider continuity points c > 0 of F̄ for which lim_{n→∞} Q_n( L_n ≤ c ) = F̄([0, c]).
3.15 Proof of Le Cam's First Lemma 3.5. We show that the setting of Lemma 3.5 specialises the setting of Corollary 3.14; in this sense, Corollary 3.14 is a slightly stronger formulation of Le Cam's first lemma. The difference lies in the respective initial conditions on convergence of likelihood ratios. Lemma 3.5 starts from some law F on (ℝ, B(ℝ)) such that

L( Λ_n | Q_n ) → F   (weakly in ℝ as n → ∞).

In explicit restriction to the subclass of laws F̄ which are concentrated on (0, ∞), Corollary 3.14 states

(P_n)_n ◁▷ (Q_n)_n ⟺ ∫_{(0,∞)} ℓ F̄(dℓ) = 1,

where the right-hand side equals ∫_ℝ e^λ F(dλ) = 1 when F̄ is the image of F under the mapping λ → e^λ.
We turn to the proof of Le Cam's Third Lemma 3.6. Since (L_n)_n under (Q_n)_n is always ℝ-tight by Proposition 3.3, we have ℝ^{1+d}-tightness of pairs (L_n, X_n)_n under (Q_n)_n for any sequence (X_n)_n of ℝ̄^d-valued random variables which is ℝ^d-tight under (Q_n)_n, and can select subsequences (n_l)_l and laws F̄̃ on (ℝ^{1+d}, B(ℝ^{1+d})) such that (L_{n_l}, X_{n_l})_l under (Q_{n_l})_l converges weakly in ℝ^{1+d} to F̄̃ as l → ∞. All accumulation points F̄̃ in this sense are necessarily laws which are concentrated on [0, ∞) × ℝ^d, and F̄̃ may put strictly positive mass on the hyperplane {0} × ℝ^d. Thus, up to selection of subsequences, the only condition in the following one-sided variant of Le Cam's third lemma is contiguity (P_n)_n ◁ (Q_n)_n: in this case, sequences (X_n)_n converging jointly with the likelihood ratios under (Q_n)_n will also converge under (P_n)_n, and the limit law will be 'explicit'.
Proof. (I) We prove this result first in the case where all X_n take values in ℝ^d.

(1) We start with some preliminaries. We write Ḡ for the first marginal of the measure Ḡ̃ defined in Theorem 3.16, and F̄ for the first marginal of F̄̃. Then F̄ is a probability on (ℝ, B(ℝ)) such that

L( L_n | Q_n ) → F̄   (weakly in ℝ as n → ∞).

In particular, F̄ is concentrated on [0, ∞). Now Theorem 3.11 and the assumed contiguity give

(δ)   ∫_{[0,∞)} ℓ F̄(dℓ) = 1,

thus Ḡ̃ is a probability. Again by (δ), for ε > 0 arbitrarily small there is K = K(ε) < ∞ such that

(+)   ∫_{ℝ^{1+d}} ℓ 1_{{ℓ>K}} F̄̃(dℓ, dx) = ∫_{(K,∞)} ℓ F̄(dℓ) < ε;

moreover we have (++) from Proposition 3.10 and the assumed contiguity. Contiguity also implies

(+++)   P_n( L_n = ∞ ) → 0 as n → ∞.

(2) First, we prove for functions f ∈ C_b(ℝ^{1+d})

lim_{n→∞} ∫ f( (L_n, X_n)∘ ) L_n dQ_n = ∫ f(ℓ, x) Ḡ̃(dℓ, dx).

The family of convergences (⋄) for K = K(ε), ε > 0 arbitrary, in combination with the last lines gives

lim_{n→∞} ∫ f( (L_n, X_n)∘ ) L_n dQ_n = ∫ f(ℓ, x) ℓ F̄̃(dℓ, dx),

where

(L_n, X_n)∘ = (L_n, X_n) 1_{{L_n<∞}},

in accordance with Notations 3.1', since all X_n are ℝ^d-valued by assumption in this part of the proof.

(3) This holds since the quantity considered in step (2) satisfies

∫ f( (L_n, X_n)∘ ) L_n dQ_n = ∫ f( (L_n, X_n)∘ ) 1_{{L_n<∞}} dP_n.

This ends the proof of Theorem 3.16 in the case where all X_n take values in ℝ^d.
(II) Now we extend the result from ℝ^d-valued to ℝ̄^d-valued random variables X_n, n ≥ 1.

In this case, by Notation 3.1'(ii), the convergence assumption on the pairs (L_n, X_n)_n under (Q_n)_n implies

lim_{n→∞} Q_n( |X_n| = ∞ ) = 0.

Contiguity and arguments similar to step (3) above allow to replace asymptotically as n → ∞

∫ f( (L_n, X_n)∘ ) dQ_n by ∫ f( L_n, X̂_n ) dQ_n,
∫ f( (L_n, X_n)∘ ) dP_n by ∫ f( L_n, X̂_n ) 1_{{L_n<∞}} dP_n,

where we put X̂_n := X_n 1_{{|X_n|<∞}}, and where (L_n, X_n)∘ = (L_n, X_n) 1_{{L_n<∞, |X_n|<∞}} is the notation of 3.1'(ii). The convergence assumption in Theorem 3.16 can thus be rephrased as

∫ f( L_n, X̂_n ) dQ_n → ∫ f(ℓ, x) F̄̃(dℓ, dx).

Hence the assertion of Theorem 3.16 is proved also for ℝ̄^d-valued random variables X_n, n ≥ 1.
Now we can prove Le Cam's Third Lemma 3.6 under mutual contiguity:

3.17 Proof of Le Cam's Third Lemma 3.6. In analogy to Proof 3.15, which deduces Lemma 3.5 from Corollary 3.14, we shall show that the setting of Lemma 3.6 specialises the setting of Theorem 3.16 (hence, Theorem 3.16 is a slightly stronger formulation of Le Cam's third lemma). Writing d in place of k, Lemma 3.6 starts from some law F̃ on (ℝ^{1+d}, B(ℝ^{1+d})) such that

L( (Λ_n, X_n) | Q_n ) → F̃   (weakly in ℝ^{1+d} as n → ∞).
Hence G̃(dλ, dx) = e^λ F̃(dλ, dx) defines a probability measure on (ℝ^{1+d}, B(ℝ^{1+d})). The image Ḡ̃ of G̃ under the mapping ℝ^{1+d} ∋ (λ, x) → (e^λ, x) ∈ ℝ^{1+d} being concentrated on (0, ∞) × ℝ^d, we can write

Ḡ̃(dℓ, dx) = (ℓ ∨ 0) F̄̃(dℓ, dx),

where F̄̃ denotes the image of F̃ under the same mapping.
It remains to prove Proposition 3.6'', which specialises Le Cam's Third Lemma 3.6 to situations where the limit law F̃ in Lemma 3.6 is a normal law with non-degenerate first component.

3.18 Proof of Proposition 3.6''. We start from the assumption that F̃ in Lemma 3.6 is some normal law on (ℝ^{1+k}, B(ℝ^{1+k})) whose first marginal, the limit law for the log-likelihood ratios (Λ_n)_n under (Q_n)_n when mutual contiguity (P_n)_n ◁▷ (Q_n)_n holds, is not degenerate at a single point, i.e. has some strictly positive variance σ² > 0. Then we can write F̃ in the form

F̃ = N( m̃, Λ̃ ),   m̃ := ( −σ²/2, μ )^⊤,   Λ̃ := [ σ², τ^⊤ ; τ, Σ ]

(the first mean being −σ²/2 by Exercise 3.5''). We compute the Laplace transform of G̃(dλ, dx) := e^λ F̃(dλ, dx): for ζ ∈ ℝ, z ∈ ℝ^k,

∫ e^{ζλ + Σ_{i=1}^k z_i x_i} G̃(dλ, dx) = ∫ e^{(ζ+1)λ + Σ_{i=1}^k z_i x_i} F̃(dλ, dx).

Evaluating the normal Laplace transform of F̃ at the argument (ζ+1, z), the exponent splits into the linear part

(ζ+1)( −σ²/2 ) + Σ_{i=1}^k z_i μ_i

and the quadratic part

½ (ζ+1)² σ² + (ζ+1) Σ_{i=1}^k z_i τ_i + ½ z^⊤Σz.

Collecting terms, the Laplace transform of G̃ takes the form

(ζ, z) → exp( ζ σ²/2 + ½ ζ²σ² + Σ_{i=1}^k z_i (μ_i + τ_i) + ζ Σ_{i=1}^k z_i τ_i + ½ z^⊤Σz ).

This is the Laplace transform of N( m̃', Λ̃ ) with m̃' := ( +σ²/2, μ+τ )^⊤. By the uniqueness theorem for Laplace transforms (see e.g. [4, Chap. 10.1]), we have identified the law G̃(dλ, dx) := e^λ F̃(dλ, dx), for F̃ of step (1), as

G̃ = N( ( +σ²/2, μ+τ )^⊤ , [ σ², τ^⊤ ; τ, Σ ] ).
3.18' Exercise. Write M for the space of all piecewise constant right-continuous jump functions f : [0, ∞) → ℕ_0 such that f has at most finitely many jumps over compact time intervals, all of these with jump height +1, and starts from f(0) = 0. Write η_t for the coordinate projections η_t(f) = f(t), t ≥ 0, f ∈ M.

Equip M with the σ-field M generated by the coordinate projections, consider the filtration 𝔽 = (F_t)_{t≥0} where F_t := σ(η_r : 0 ≤ r ≤ t), and call η = (η_t)_{t≥0} the canonical process on (M, M, 𝔽).

For λ > 0, let P(λ) denote the (unique) probability law on (M, M) such that the canonical process (η_t)_{t≥0} is a Poisson process with parameter λ. Then (e.g. [14, p. 165]) the process

L_t^{λ'/λ} := ( λ'/λ )^{η_t} exp( −(λ'−λ) t ),   t ≥ 0,

is the likelihood ratio process of P(λ') relative to P(λ) with respect to 𝔽, i.e., for all t ∈ [0, ∞),

L_t^{λ'/λ} is a version of the likelihood ratio of P(λ')|_{F_t} relative to P(λ)|_{F_t}.

(1) For a fixed reference value λ > 0, reparameterising the family of laws on (M, M, 𝔽) with respect to λ as

(⋄)   ϑ → P( λe^ϑ ),   ϑ ∈ Θ := ℝ,

the likelihood ratio process takes the form which has been considered in Exercise 1.16':

L_t^{λe^ϑ/λ} := exp( ϑ η_t − λ(e^ϑ − 1) t ),   t ≥ 0.

For t fixed, this is an exponential family in ϑ with canonical statistic η_t. Specify a functional κ : Θ → (0, ∞) and an estimator T_t : M → ℝ for this functional:

κ(ϑ) := λ e^ϑ,   T_t := η_t / t.

By classical theory of exponential families, T_t is the best estimator for κ in the sense of uniformly minimum variance within the class of unbiased estimators (e.g. [127, pp. 303 and 157]).
(2) For some h ∈ ℝ which we keep fixed as n → ∞, define sequences of probability laws

Q_n := P(λ)|_{F_n},   P_n := P( λe^{h/√n} )|_{F_n},   n ≥ 1,

and prove mutual contiguity (P_n)_n ◁▷ (Q_n)_n via Lemma 3.5: for this, use a representation of the log-likelihood ratios Λ_n of P_n with respect to Q_n in the form

Λ_n = h (η_n − λn)/√n − λn ( e^{h/√n} − 1 − h/√n )

and weak convergence in ℝ

L( Λ_n | Q_n ) → N( −½ h²λ, h²λ ),   n → ∞.

(3) Writing X_n := √n ( T_n − λ ), the reference point λ being fixed according to the reparameterisation (⋄), extend the last result to joint convergence in ℝ²

L( (Λ_n, X_n) | Q_n ) → N( ( −½h²λ, 0 )^⊤ , [ h²λ, hλ ; hλ, λ ] ),

such that by Le Cam's Third Lemma 3.6

L( (Λ_n, X_n) | P_n ) → N( ( +½h²λ, hλ )^⊤ , [ h²λ, hλ ; hλ, λ ] ).

In particular, this gives

(⋄⋄)   L( X_n − hλ | P_n ) → N( 0, λ ).

(4) Deduce from (⋄⋄) the following 'equivariance' property of the estimator T_n in shrinking neighbourhoods of radius O(1/√n), in the sense of the reparameterisation (⋄) above, of the reference point λ, using the approximation e^{h/√n} = 1 + h/√n + O(1/n) as n → ∞: for every h ∈ ℝ fixed, we have weak convergence

L( √n ( T_n − λe^{h/√n} ) | P(λe^{h/√n}) ) → N( 0, λ )   as n → ∞,

where the limit law does not depend on h. This means that on small neighbourhoods with radius O(1/√n) of a fixed reference point λ, asymptotically as n → ∞, the estimator T_n identifies true parameter values with the same precision.
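A simulation sketch for steps (2)–(4) of this exercise (parameter values are arbitrary choices): it checks the limit law N(−½h²λ, h²λ) of Λ_n under Q_n and the equivariance of T_n = η_n/n under the shifted laws.

```python
import numpy as np

rng = np.random.default_rng(6)
lam, h, n, reps = 2.0, 0.7, 400, 20000

# under Q_n = P(lam)|F_n: eta_n ~ Poisson(lam * n)
eta_Q = rng.poisson(lam * n, reps)
Lam = (h / np.sqrt(n)) * eta_Q - lam * (np.exp(h / np.sqrt(n)) - 1.0) * n
print("Lambda_n under Q_n: mean", Lam.mean(), " (theory", -0.5 * h**2 * lam, ")",
      " var", Lam.var(), " (theory", h**2 * lam, ")")

# under P_n = P(lam e^{h/sqrt n})|F_n: equivariance of T_n = eta_n / n
lam_h = lam * np.exp(h / np.sqrt(n))
eta_P = rng.poisson(lam_h * n, reps)
Z = np.sqrt(n) * (eta_P / n - lam_h)
print("sqrt(n)(T_n - lam_h) under P_n: mean", Z.mean(), " var", Z.var(),
      " (theory: N(0,", lam, "))")
```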
Chapter 4

L²-differentiable Statistical Models

This chapter, closely related to Section 1.1, generalises the notion of 'smoothness' of statistical models. We define it in an L²-sense which e.g. allows us to consider families of laws which are not pairwise equivalent, or where log-likelihood ratios for ω fixed are not smooth functions of the parameter. To the new notion of smoothness corresponds a new and more general definition of score and information in a statistical model. In the new setting, we will be able to prove rigorously, and under weak assumptions, quadratic expansions of log-likelihood ratios, valid locally in small neighbourhoods of fixed reference points ϑ. Later, such expansions will allow to prove assertions very similar to those intended, with questionable heuristics, in Section 1.3. Frequently called 'second Le Cam lemma', expansions of this type can be proved in large classes of statistical models, for various stochastic phenomena observed over long time intervals or in growing spatial windows, provided the parameterisation is L²-smooth, and provided we have strong laws of large numbers and corresponding martingale convergence theorems. For i.i.d. models, we present a 'second Le Cam lemma' in Section 4.2 below.
(i)   Ṽ_ϑ has components Ṽ_{ϑ,1}, …, Ṽ_{ϑ,d} ∈ L^r(Ω, A, μ),

(ii)   (1/|τ−ϑ|) ‖ f_τ^{1/r} − f_ϑ^{1/r} − (1/r)(τ−ϑ)^⊤ Ṽ_ϑ ‖_{L^r(μ)} → 0 as |τ−ϑ| → 0,

where f_τ, f_ϑ denote μ-densities.

(2) We prove: in a statistical model P, (i) and (ii) imply the assertion

(⋆)   Ṽ_ϑ = 0 on {f_ϑ = 0}, μ-almost surely.

This allows to transform Ṽ_ϑ into a new object

V_{ϑ,i} := 1_{{f_ϑ>0}} r f_ϑ^{−1/r} Ṽ_{ϑ,i},   1 ≤ i ≤ d,

which gives (note that with respect to (i) we change the integrating measure from μ to P_ϑ)

V_ϑ with components V_{ϑ,1}, …, V_{ϑ,d} ∈ L^r(Ω, A, P_ϑ),
Ṽ_{ϑ,i} = (1/r) f_ϑ^{1/r} V_{ϑ,i},   1 ≤ i ≤ d.

With V_ϑ thus associated to Ṽ_ϑ, Fréchet differentiability of the mapping (⋄) takes the following form:

(ii')   (1/|τ−ϑ|^r) ∫ | f_τ^{1/r} − f_ϑ^{1/r} − (1/r) f_ϑ^{1/r} (τ−ϑ)^⊤ V_ϑ |^r dμ → 0

as |τ−ϑ| → 0.
We remark that the integral in Definition 4.1'(ii) does not depend on the choice of the dominating measure. Considering for P different dominating measures μ ≪ μ̃, and densities f_ϑ (with respect to μ) and f̃_ϑ (with respect to μ̃), we have f̃_ϑ = f_ϑ (dμ/dμ̃) μ̃-almost surely. Hence, in a representation analogous to Definition 4.1'(ii) with μ̃, f̃_ϑ and f̃_τ, the factor dμ/dμ̃ cancels out. Thus (any version of) the derivative V_ϑ in Definition 4.1'(i) depends only on the law P_ϑ in the family P, and not on the choice of a dominating measure for P.
For every ω ∈ Ω fixed, we shall show that the components V_{ϑ,i} of the L^r-derivative V_ϑ coincide P_ϑ-almost surely with

(δδ)   ω → 1_{{f_ϑ>0}}(ω) (∂/∂ϑ_i) log f(ϑ, ω),

which was considered in Definition 1.2, for all 1 ≤ i ≤ d. We do have V_{ϑ,i} ∈ L^r(P_ϑ) by Definition 4.1'(i), and shall see later in Corollary 4.5 that V_{ϑ,i} is necessarily centred under P_ϑ. When r = 2, (δδ) being the classical definition of the score at ϑ (Definition 1.2), L²-differentiability is a notion of smoothness of parameterisation which extends the classical setting of Chapter 1. We will return to this in Definition 4.6 below.

To check that the components V_{ϑ,i} of L^r-derivatives V_ϑ coincide with (δδ) P_ϑ-almost surely, consider in Definition 4.1' sequences τ_n = ϑ + δ_n e_i (with e_i the i-th unit vector in ℝ^d, and δ_n ↓ 0). From the L^r(μ)-convergence in Definition 4.1'(ii), we can select subsequences (n_k)_k along which μ-almost sure convergence holds:

| ( f_{ϑ+δ_{n_k}e_i}^{1/r} − f_ϑ^{1/r} ) / δ_{n_k} − (1/r) f_ϑ^{1/r} V_{ϑ,i} | → 0 as k → ∞.

Thus, for the mapping (δ), the gradient ∇(f^{1/r})(ϑ, ·) coincides μ-almost surely with (1/r) f_ϑ^{1/r} V_ϑ = Ṽ_ϑ. On {f_ϑ > 0} we thus have

∇(log f)(ϑ, ·) = r ∇( log f^{1/r} )(ϑ, ·) = r ∇(f^{1/r})(ϑ, ·) / f^{1/r}(ϑ, ·) = V_ϑ   μ-almost surely,

whereas on {f_ϑ = 0} we put V_ϑ ≡ 0 ≡ Ṽ_ϑ according to Assertion 4.1(⋆). This gives the representation (δδ).
4.1''' Exercise. Consider the location model P := { F(· − ϑ) : ϑ ∈ ℝ } generated by the doubly exponential distribution F(dx) = ½ e^{−|x|} dx on (ℝ, B(ℝ)). Prove that P is L²-differentiable at τ = ϑ, for every ϑ ∈ ℝ, with derivative V_ϑ = sgn(· − ϑ).
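A numerical check for this exercise (a sketch; the grid and step sizes are arbitrary choices): for the double exponential location model, the L²(dx)-remainder of Definition 4.1'(ii) with r = 2, i.e. ∫ |√f_{ϑ+h} − √f_ϑ − (h/2)√f_ϑ V_ϑ|² dx / h², should tend to 0 as h → 0, with V_ϑ = sgn(· − ϑ).

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 600001)           # fine grid standing in for (R, dx)
theta = 0.0
f = lambda th: 0.5 * np.exp(-np.abs(x - th))   # double exponential density
V = np.sign(x - theta)                         # candidate L2-derivative

for h in [1.0, 0.3, 0.1, 0.03, 0.01]:
    rem = np.sqrt(f(theta + h)) - np.sqrt(f(theta)) - 0.5 * h * np.sqrt(f(theta)) * V
    # L2-differentiability: this ratio should go to 0 as h -> 0
    print(f"h={h:5.2f}  remainder / h^2 = {np.trapz(rem**2, x) / h**2:.3e}")
```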
The next step is to remove the domination assumption from the preliminary Definition 4.1'.

(iia)   (1/|τ−ϑ|^r) P_τ( L^{τ/ϑ} = ∞ ) → 0,   |τ−ϑ| → 0,
(iib)   (1/|τ−ϑ|^r) ∫ | (L^{τ/ϑ})^{1/r} − 1 − (1/r)(τ−ϑ)^⊤ V_ϑ |^r dP_ϑ → 0,   |τ−ϑ| → 0.
4.2' Remarks. (a) For arbitrary statistical models P = {P_τ : τ ∈ Θ} and reference points ϑ ∈ Θ, it is sufficient to check (iia) and (iib) along sequences (τ_n)_n ⊂ Θ which converge to ϑ as n → ∞.

(b) Fix any sequence (τ_n)_n ⊂ Θ which converges to ϑ as n → ∞. Then the countable subfamily {P_{τ_n}, n ≥ 1, P_ϑ} is dominated by some σ-finite measure μ (e.g. take μ := P_ϑ + Σ_{n≥1} 2^{−n} P_{τ_n}) which depends on the sequence under consideration. Let f_{τ_n}, n ≥ 1, f_ϑ denote densities of P_{τ_n}, n ≥ 1, P_ϑ with respect to μ. Along this subfamily, the integral in Definition 4.1'(ii),

∫ | f_{τ_n}^{1/r} − f_ϑ^{1/r} − (1/r)(τ_n−ϑ)^⊤ Ṽ_ϑ |^r dμ = ∫_{{f_ϑ=0}} … dμ + ∫_{{f_ϑ>0}} … dμ,

splits into the two expressions which appear in (iia) and (iib) of Definition 4.2:

∫_{{f_ϑ=0}} f_{τ_n} dμ + ∫_{{f_ϑ>0}} | (f_{τ_n}/f_ϑ)^{1/r} − 1 − (1/r)(τ_n−ϑ)^⊤ V_ϑ |^r f_ϑ dμ
= P_{τ_n}( L^{τ_n/ϑ} = ∞ ) + ∫ | (L^{τ_n/ϑ})^{1/r} − 1 − (1/r)(τ_n−ϑ)^⊤ V_ϑ |^r dP_ϑ.

Thus the preliminary Definition 4.1' and the final Definition 4.2 are equivalent in restriction to the subfamily {P_{τ_n}, n ≥ 1, P_ϑ}. Since we can consider arbitrary sequences (τ_n)_n converging to ϑ, we have proved that Definition 4.2 extends Definition 4.1' consistently.
4.3 Example. On an arbitrary space (Ω, A) consider P := M₁(Ω, A), the set of all probability measures on (Ω, A). Fix r ≥ 1 and P ∈ P. In the non-parametric model E = (Ω, A, P), we shall show that one-parametric paths {Q_ϑ : |ϑ| < ε} through P in directions

g ∈ L^r(P) such that ∫ g dP = 0

(defined in analogy to Example 1.3) are L^r-differentiable at all points of the parameter interval, and have L^r-derivative V_0 = g at the origin Q_0 = P.

(a) Consider first the case of directions g which are bounded, and write M := sup|g|. Then the one-dimensional path through P in P,

(⋆)   S^g_P := ( Ω, A, { Q_ϑ : |ϑ| < 1/M } ),   dQ_ϑ := (1 + ϑg) dP,

is L^r-differentiable at every parameter value ϑ, |ϑ| < 1/M, and the derivative V_ϑ at ϑ is given by

V_ϑ = g / (1 + ϑg) ∈ L^r(Ω, A, Q_ϑ)

in the sense of
Definition 4.2. Since S^g_P is dominated by μ := P, conditions (iia) and (iib) of Definition 4.2 are equivalent to Condition 4.1'(ii), which we shall check now. For all r ≥ 1,

( (1+τg)^{1/r} − (1+ϑg)^{1/r} ) / (τ − ϑ) = (1/r) (1+ζg)^{(1/r)−1} g = (1/r) (1+ζg)^{1/r} · g/(1+ζg)

for every ω fixed, with some ζ (depending on ω) between τ and ϑ. For ε small enough and τ ∈ B_ε(ϑ), |ζ| remains separated from 1/M: thus dominated convergence under P as τ → ϑ in the above line gives

∫ | ( (1+τg)^{1/r} − (1+ϑg)^{1/r} ) / (τ−ϑ) − (1/r) (1+ϑg)^{1/r} g/(1+ϑg) |^r dμ → 0

as |τ−ϑ| → 0.
in ./ are bounded away from both 0 and 2, by choice of . Since is Lipschitz,
dominated convergence gives
Z Z
d
.#g/ dP D g 0 .#g/ dP .
d#
e # defined from the mapping .˘/ in 4.1, our assumptions on
First, with V and g imply
that
Z
e d 1=r 1 . r1 1/ d 1 . r1 1/ 0 0
V# D f D f# f# D f # g .#g/ g .#g/ dP
d# # r d# r
114 Chapter 4 L2 -differentiable Statistical Models
which does not depend on the particular construction – through choice of – of the
path E through P . Note that the last assertion .ı/ holds for arbitrary P 2 P and for
arbitrary directions g 2 Lr .P / such that EP .g/ D 0.
s y 1=s 1 rs
'.y/ :D , .y/ :D ' rs .y/ D ' ts .y/ , y0.
r y 1=r 1
Check that ' is continuous at y D 1 (with '.1/ D 1), and thus continuous on Œ0, 1/.
For ° r r ±
y > max ,1
r s
we have the inclusions
r r r r 1=r
y 1=r
> H) 1 y 1=r > H) .y 1/ > y 1=r
r s s s s
and thus by definition of t
!ts !ts
y 1=s 1 y 1=s 1 1 1 ts
.y/ D ' ts .y/ D r 1=r 1/
< < y sr Dy.
s .y y 1=r
Section 4.1 Lr -differentiable Statistical Models 115
From this and continuity of on Œ0, 1/, the function satisfies a linear growth con-
dition
.ı/ 0 .y/ D ' ts .y/ M.1 C y/ for all y 0
for some constant M .
(2) We exploit the assumption of Lr -differentiability at #. For sequences . m /m
converging to # in ‚, consider likelihood ratios Lm =# of Pm with respect to P# . As a
consequence of (i) in Definition 4.2, the mapping u ! k u> V# kLr .P# / is continuous
on Rd , hence bounded on the unit sphere. Thus . m #/> V# vanishes in Lr .P# /.
From (iib) of Definition 4.2 and inverse triangular inequality
ˇ ˇ
ˇ =# 1=r 1 ˇ
ˇ Lm 1 . #/>
V ˇ o.j m #j/
ˇ Lr .P# / r
m #
Lr .P# / ˇ
Finally, based on convergence in probability (+), we put q D st and apply the first of
the equivalences above to f 1 and fm D '.Lm =# / : this establishes .˘˘/.
(4) With these preparations, we come to the core of the proof. Regarding first (i) and
(iia) in Definition 4.2, it is clear that these conditions are valid for 1 s r when
they are valid for r > 1. We turn to condition (iib) in Definition 4.2. By definition of
the function ' in step (1), we can upper-bound the expression
° 1=s ±
. / s L=# 1 . #/> V# s
L .P# /
with r D 1 gives
#Cın ei #
# V#,i ! 0 in L1 ./ as n ! 1
ın
R
and thus EP# .V#,i / D # V#,i d D 0. This holds for 1 i d .
by Definition 4.2(i) and Corollary 4.5. In the special case of dominated models ad-
mitting continuous densities for which partial derivatives exist, V# necessarily has all
properties of the score considered in Definition 1.2, cf. Remark 4.1”. In this sense, the
present setting generalises Definition 1.2 and allows to transfer notions such as score
or information to L2 -differentiable statistical models.
In the light of Definition 4.2, Hellinger distance H., / between probability mea-
sures Z ˇ
1 ˇˇ 1=2 1=2 ˇ2
H .Q1 , Q2 / D
2
ˇ 1 2 ˇ d 2 Œ0, 1
2
on ., A/ (as in Definition 1.18, not depending on the choice of the -finite measure
which dominates Q1 and Q2 , and not on versions of the densities i D dQ d
i
, i D 1, 2)
2
gives the geometry of experiments which are L -differentiable. This will be seen in
Proposition 4.8 below.
Proof. By Notations 3.1, for any -finite measure on ., A/ which dominates P 0
0
and P and for any choice 0 , of -densities, L = coincides .P C P 0 /-almost
0
surely with 1¹ >0º C 1 1¹ D0º . As in step (1) of the proof of Proposition 3.10,
0 ® ¯ 0
E 1 L = D P 0 D 0 D P 0 L = D 1
p 0
yields the first assertion. Since L = 2 L2 .P /, all expectations in
p 1 0 1 p 0 = 2
.C/ E L 0 = 1 D E L = 1 E L 1
2 2
p
arepwell defined, where equality in (+) is the elementary a 1 D 12 .a 1/
2 . a 1/ for a 0. Taking together the preceding three lines yields the second
1 2
assertion.
m #
ım :D j m #j ! 0 , um :D ! u , m!1.
j m #j
where J# is the Fisher information at # in E. This goes back to Le Cam, see [81];
since Hájek and Sidák [41], results which establish an approximation of this type
120 Chapter 4 L2 -differentiable Statistical Models
are called a ‘second Le Cam lemma’. From the next chapter on, we will exploit
similar approximations of log-likelihood ratios. Beyond the i.i.d. setting, such
approximations do exist in a large variety of contexts where a statistical model is
smoothly parameterised and where strong laws of large numbers and central limit
theorems are at hand (e.g. autoregressive processes, ergodic diffusions, ergodic
Markov processes, . . . ).
4.9’ Exercise. For J 2 Rd d symmetric and strictly positive definite, calculate log-likelihood
ratios in the experiment ¹N .J h, J / : h 2 Rd º and compare to the expansion given in 4.11
below.
4.10 Assumptions and Notations for Section 4.2. (a) We work with an experiment
E D , A , P D ¹P : 2 ‚º , ‚ Rd open
0
with likelihood ratios L = of P 0 with respect to P . Notation # for a point in the
parameter space implies that the following is satisfied:
In this case, we write J# for the Fisher information at # in E, cf. Definition 4.6:
J# D E# V# V#> .
0 = 0 =
Ln and ƒn denote the likelihood ratio and the log-likelihood ratio of Pn0 with
respect to Pn . At points # 2 ‚ where E is L2 -differentiable, cf. (a), we write
1 X
n
Sn .#/.!1 , : : : , !n / :D p V# .!j / , ! D .!1 , : : : , !n / 2 n
n j D1
and have from the Definition 4.6 (which includes Corollary 4.5) and the central limit
theorem
L Sn .#/ j P#n ! N .0, J# / (weak convergence in Rd ) as n ! 1 .
Section 4.2 Le Cam’s Second Lemma for i.i.d. Observations 121
hn = n is in ‚), we have
p
.#Chn = n/=# 1 >
ƒn D h>
n Sn .#/ h J# hn C o.P#n / .1/ , n ! 1
2 n
where L Sn .#/ j P#n converges weakly in Rd as n ! 1 to N .0, J# /.
The proof of Lemma 4.11 will be given in 4.15, after a series of auxiliary results.
p
4.12 Proposition. For bounded sequences .hn /n in Rd and points n :D # C hn = n
in ‚, we have
n p
X 1 1 X >
n
1
Ln =# .!j / 1 D p hn V# .!j / h> J# hn C o.P#n / .1/ , n!1.
2 n 8 n
j D1 j D1
to see this, select in ¹jhj C º at stage n some hn such that E# .Œr# .n, hn /2 / is suf-
p side of .˘/, and apply part (iib) of Definition 4.2 to the
ficiently close to the left-hand
sequence n :D # C hn = n as n ! 1.
(2) For n 1, consider in En
X n p
p 1 1
An,h .!/ :D L.#Ch= n/=# 1 p h> V# .!j /
2 n
j D1
./
jhj X
n
D p r# .n, h/.!j /
n
j D1
for 0 ¤ h 2 Rd . From the second assertion in Proposition 4.7 combined with Corol-
lary 4.5 we know
p
p 1 1
.A
E# n,h / D n E# L.#Ch= n/=# 1 p h> V# D n H 2 P#Ch=pn , P# .
2 n
122 Chapter 4 L2 -differentiable Statistical Models
Consider now a bounded sequence .hn /n in Rd . The unit sphere S d 1 being compact,
hn
we can select subsequences .hnk /k whose directions uk :D jhnk j approach limits
k
u 2 S d 1 . Thus we deduce from Proposition 4.8
1
./ E# An,hn D n H 2 P#Chn =pn , P# D h> J# hn C o.1/
8 n
as n ! 1. On the other hand, calculating variances for ./ we find from .˘/
sup Var# .An,h / C 2 sup Var# .r# .n, h//
jhjC jhjC
C sup E# Œr# .n, h/2
2
! 0
jhjC
as n ! 1. Thus the sequence ./ behaves as the sequence of its expectations ./:
1
An,hn D E# An,hn C o.P#n / .1/ D h> J# hn C o.P#n / .1/
8 n
as n ! 1 which is the assertion.
p
4.13 Proposition. For bounded sequences .hn /n in Rd and points n :D # C hn = n,
we have
n p
X 2 1
Ln =# .!j / 1 D h> J# hn C o.P#n / .1/ , n ! 1 .
4 n
j D1
Proof. We write n .!/, e n .!/, : : : for remainder terms o.P#n / .1/ defined in En . By
Definition 4.6 of the Fisher information and the strong law of large numbers,
1 X
n
V# V#> .!j / D E# V# V#> C n .!/ D J# C n .!/
n
j D1
With notation r# .n, h/ introduced at the start of the proof of Proposition 4.12 we write
p 1 1 jhn j
.CC/ Ln =# .!j / 1 D p h> n V# .!j / C p r# .n, hn /.!j / .
2 n n
As n ! 1, we take squares of both the right-hand sides and the left-hand sides in
(++), and sum over 1 j n. Thanks to .˘/ in the proof of Proposition 4.12
sup E# r# .n, h/2 ! 0 as n ! 1
jhjC
Section 4.2 Le Cam’s Second Lemma for i.i.d. Observations 123
as n ! 1.
where we have to consider carefully the different remainder terms which arise as
n ! 1.
(2) In En , we do have
X
n
.ı/ ƒnn =# .!/ D log Ln =# .!j / for P#n -almost all ! D .!1 , : : : , !n / 2 n
j D1
which justifies the first ‘D’ in the chain of equalities in (1) above. To see this, fix
the sequence . n /n , choose on ., A/ some dominating measure for the restricted
experiment ¹Pn , n 1, P# º, and select densities n , n 1, # . In the restricted
product experiment ¹Pnn , n 1, P#n º, the set
has full measure under P#n , the likelihood ratio Lnn =# coincides .Pnn C P#n /-almost
surely with
Yn
n .!j /
! ! 1An .!/ C 1 1Acn .!/ ,
# .!j /
j D1
and the expressions
X
n
ƒnn =# .!/ D log.Lnn =# .!// and log Ln =# .!j /
j D1
are well defined and Œ1, C1/-valued in restriction to An , and coincide on An . This
is .ı/.
pat # in the experiment E. Fix ı > 0, write Zn D
2
p (3) We exploit L -differentiability
L n 1 where n D # C hn = n . Then we have
=#
Then (+) will justify the last ‘D’ in the chain of heuristic equalities in step (1), and
thus will finish the proof of Proposition 4.14.
(5) To prove (+), we will consider ‘small’ and ‘large’ absolute values of Zn,j sep-
arately, ‘large’ meaning
X
n X
n X
n
R.Zn,j /1¹jZn,j j>ıº D R.Zn,j /1¹Zn,j >ıº C R.Zn,j /1¹Zn,j <ıº
j D1 j D1 j D1
for any ı > 0 fixed. For Zn,j positive, we associate to ı > 0 the quantity
Using step (3) above, the probability of the last event under P#n is smaller than
Negative values of Zn,j can be treated analogously. We thus find that the contribution
of ‘large’ absolute values of Zn,j to the sum (+) is negligible:
X
n X
n
.CC/ for any ı > 0 fixed : R.Zn,j / D R.Zn,j /1¹jZn,j jıº C o.P#n / .1/
j D1 j D1
as n ! 1. It remains to consider ‘small’ absolute values of Zn,j . Fix " > 0 arbitrarily
small. As a consequence of R.z/ D o.z 2 / for z ! 0, we can associate
X
n n p
X 2 1
. / 2
Zn,j .!/ D Ln =# .!j / 1 D h> J# hn C o.P#n / .1/
4 n
j D1 j D1
126 Chapter 4 L2 -differentiable Statistical Models
4.15 Proof of Le Cam’s Second Lemma 4.11. Under Assumptions 4.10, we put to-
gether Propositions 4.14, 4.12 and 4.13
n p
X n p
X
2
ƒnn =# .!/ D 2 L n =# .!j / 1 Ln =# .!j / 1 C o.P#n / .1/
j D1 j D1
1 1 X >
n
1 > 1 >
D2 p hn V# .!j / hn J# hn h J# hn C o.P#n / .1/
2 n 8 4 n
j D1
1 X
n
1 >
Dp h>
n V# .!j / h J# hn C o.P#n / .1/
n j D1 2 n
and have proved the representation of log-likelihoods in Lemma 4.11. Weak conver-
gence of the scores
1 X
n
Sn .#, !/ D p V# .!j / under P#n
n
j D1
has already been stated in Assumption 4.10(b). Le Cam’s Second Lemma 4.11 is now
proved.
Chapter 5
This chapter considers the Gaussian shift model and its statistical properties. The main
stages of Section 5.1 are Boll’s Convolution Theorem 5.5 for equivariant estimators,
the proof that arbitrary estimators are approximately equivariant under a very diffuse
prior, and – as a consequence of both – the Minimax Theorem 5.10 which establishes
a lower bound for the maximal risk of arbitrary estimators, in terms of the central
statistic. We will see a stochastic process example in Section 5.2.
5.1 Example. Fix J 2 Rd d symmetric and strictly positive definite, and consider
the normal distribution model
® ¯
Rd , B.Rd / , Ph :D N .J h, J / : h 2 Rd .
dPh
Here densities fh D d
with respect to Lebesgue measure on Rd are given by
d=2 1=2 1 > 1
f .h, x/ D .2/ .det J / exp .x J h/ J .x J h/ ,
2
x 2 Rd , h 2 Rd
where S.x/ :D x denotes the canonical variable on Rd for which, as a trivial assertion,
.CC/ L .S j P0 / D N .0, J / .
(c) Thus a structure analogous to (+) and (++) persists if we reparameterise around
arbitrary reference points h0 2 Rd : we always have
e 1 > e
. / L.h0 Ch/= h0 D exp eh> .S J h0 / e h J h , e h 2 Rd
2
together with
. / L S J h0 j Ph0 D N .0, J / .
(d) The above quadratic shape (+) of likelihoods combined with a distributional prop-
erty (++) was seen to appear as a limiting structure – by Le Cam’s Second Lemma 4.10
and 4.11 – over shrinking neighbourhoods of fixed reference points # in L2 -differen-
tiable experiments.
Section 5.1 Gaussian Shift Experiments 129
For every given matrix J 2 Rd d symmetric and strictly positive definite, a Gaus-
sian shift experiment E.J / exists by Example 5.1. The following proposition shows
that E.J / as a statistical experiment is completely determined from the matrix J 2
Rd d symmetric and strictly positive definite.
5.3 Proposition. In an experiment E.J / with the properties of Definition 5.2, the fol-
lowing holds:
(a) all laws Ph , h 2 Rd , are equivalent probability measures;
(b) we have L.Z h j Ph / D N .0, J 1 / for all h 2 Rd ;
(c) we have L.S J h j Ph / D N .0, J / for all h 2 Rd ;
(d) we have for all h 2 Rd
.hCe
h/= h e> 1 e> e e
L D exp h .S J h/ h J h , h 2 Rd .
2
Proof. (1) For any h 2 Rd , the likelihood ratio Lh=0 in Definition 5.2 is strictly positive
and finite on : hence neither a singular part of Ph with respect to P0 nor a singular
part of P0 with respect to Ph exists, and we have P0 Ph .
(2) Recall that the Laplace transform of a normal law N .0, ƒ/ on .Rd , B.Rd /// is
Z
> 1 >
d
R 3 u ! e u x N .0, ƒ/.dx/ D e C 2 u ƒ u
which specifies the Laplace transform of the law of S under P0 and establishes
L .S j P0 / D N .0, J / .
L.Z h j Ph / D N .0, J 1 /
for arbitrary h 2 Rd . This is (b), and (c) follows via standard transformations of
normal laws.
(4) From (1) and Definition 5.2 we obtain the representation (d) in the same way
as . / in Example 5.1(b). Then (d) and (c) together show that the experiment E.J /
admits score and Fisher information as indicated.
For the statistical properties of a parametric experiment, the space ., A/ support-
ing the family of laws ¹P : 2 ‚º is of no importance: the structure of likelihoods
L=# matters when and # range over ‚. Hence, one may encounter the Gaussian
shift experiment E.J / in quite different contexts.
In a Gaussian shift experiment E.J /, the problem of estimation of the unknown pa-
rameter h 2 Rd seems completely settled in a very classical way. The central statistic
Section 5.1 Gaussian Shift Experiments 131
An equivariant estimator simply ‘works equally well’ at all points of the statistical
model. By Proposition 5.3(b), the central statistics Z is equivariant in the Gaussian
shift model E.J /.
5.4’ Exercise. With C the space of continuous functions Rd ! R, write Cp C for the
cone of strictly positive f vanishing at 1 faster than any polynomial: for every ` 2 N,
sup¹jxj>Rº ¹jxj` f .x/º ! 0 as R ! 1. Consider an experiment E 0 D .0 , A0 , ¹Ph0 : h 2
Rd º/ of mutually equivalent probability laws for which paths
h ! L.h0 Ce
Rd 3 e h/= h0
.!/ 2 .0, 1/
belong to Cp almost surely, for arbitrary h0 2 Rd fixed, and such that the laws
L L.h0 Ce h/= h0
e
h2Rd
j Ph0 0
are well defined and do not depend on h0 . Prove the following (a) and (b):
(a) In E 0 , a Bayesian estimator h with ‘uniform over Rd prior’ for the unknown parameter
h 2 Rd R 0 h0 =0
d h L .!/ dh0
h .!/ :D R R 0 =0
Rd L
h .!/ dh0
(sometimes called Pitman estimator) is well defined, and is an equivariant estimator.
132 Chapter 5 Gaussian Shift Models
where .Wu /u2R is a two-sided Brownian motion and dimension is d D 1 (the parameter is
‘time’: this is a two-sided variant of Example 1.16’, in the case where X in Example 1.16’ is
Brownian motion), all assumptions above are satisfied, (a) and (b) hold true, and the Bayesian
estimator h outperforms the maximum likelihood estimator b h under squared loss. See the
references quoted at the end of Example 1.16’. For the third point, see [114]; for the second
point, see [56, Lemma 5.2].
5.4” Exercise. Prove the following: in a Gaussian shift model E.J /, the Bayesian estimator
with ‘uniform over Rd prior’ (from Exercise 5.4’) for the unknown parameter h 2 Rd
R 0 h0 =0
d h L .!/ dh0
h .!/ :D R R 0 =0 , !2
Rd L
h .!/ dh0
coincides with the central statistic Z, and thus with the maximum likelihood estimator in E.J /.
Hint: varying the i -th component of h0 , start from one-dimensional integration
Z 1 Z 1
@ h0 =0 0 0
0D 0 L .!/ dh i D .S.!/ J h0 /i Lh =0 .!/ dh0i , 1 i d
1 @hi 1
and prove Z Z
0 0
Z.!/ Lh =0 .!/ dh0 D h0 Lh =0 .!/ dh0 .
Rd Rd
5.4”’ Exercise. In a Gaussian shift model E.J / with unknown parameter h 2 Rd and central
statistic Z D J 1 S , fix some point h0 2 ‚ and some 0 < ˛ < 1, and use a statistic
T :D ˛ Z C .1 ˛/ h0
as an estimator for the unknown parameter. Prove that T is not an equivariant estimator, and
specify the law L.T hjPh / for all h 2 Rd from Proposition 5.3.
The following implies a criterion for optimality within the class of all equivariant
estimators in E.J /. This is the first main result of this section.
Section 5.1 Gaussian Shift Experiments 133
5.5 Convolution Theorem (Boll [13]). Consider a Gaussian shift experiment E.J /.
If is an equivariant estimator for the unknown parameter h 2 Rd , there is some
probability measure Q on .Rd , B.Rd // such that
.˘/ L . j P0 / D N .0, J 1 / ? L . Z j P0 / .
(2) We have ./ for every t 2 Rd . We thus have equality of characteristic functions
on Rd . The right-hand side of .C/ admits an interpretation as characteristic function of
L e C . Z/ j P0
for some random variable e independent of Z under P0 and such that L.e j
P0 / D N .0, J 1 /. Hence, writing Q :D L . Z j P0 /, the right-hand side of ./ is
the characteristic function of a convolution N .0, J 1 / ? Q.
(3) It is important to note the following: the above steps (1) and (2) did not establish
(and the assertion of the theorem did not claim) that Z and . Z/ should be inde-
pendent under P0 .
Combined with Proposition 5.3(b), Boll’s Convolution Theorem 5.5 states that in a
Gaussian shift experiment E.J /, estimation errors of equivariant estimators are ‘more
spread out’ than the estimation error of the central statistic Z, as an estimator for
the unknown parameter. By Theorem 5.5, the best possible concentration of estima-
tion errors (within the class of all equivariant estimators) is attained for D Z: then
134 Chapter 5 Gaussian Shift Models
5.5’ Exercise. For R < 1 arbitrarily large, consider closed balls C centred at 0 with radius R.
(a) In a Gaussian shift model E.J /, check that a Bayesian with uniform prior over the
compact C R 0 h0 =0
h L .!/ dh0
hC .!/ :D CR h0 =0 .!/ dh0
, !2
C L
is not an equivariant estimator for the unknown parameter.
(b) For arbitrary estimators T : ., A/ ! .Rd , B.Rd // for the unknown parameter which
may exist in E.J /, consider quadratic loss and Bayes risks
Z
1
R.T , C / :D Eh .jT hj2 / dh 1 .
.C / C
In the class of all A-measurable estimators T for the unknown parameter, hC minimises the
squared risk for h chosen randomly from C , and provides a lower bound for the maximal
squared risk over C :
This is seen as follows: the first inequality in (+) being a trivial one (we replace an integrand by
its upper bound), it is sufficient to prove the second. Put e :D C and A e :D A ˝ B.C / ;
on the extended space .e e/, let id : .!, h/ ! .!, h/ denote the canonical statistic. Define
, A
a probability measure
0
dh0 Lh =0 .!/ 0
e.d!, dh0 / :D 1C .h0 /
P Ph0 .d!/ D P0 .d!/ 1C .h0 / dh
.C / .C /
are convex and symmetric with respect to the origin (i.e. x 2 Ac ” x 2 Ac , for
x 2 Rd );
(ii) we associate – with respect to a loss function which we keep fixed – a risk
function
Z
R.T , / : ‚0 3 h ! R.T , h/ :D `.T h/ dPh 2 Œ0, 1
0
Note that risk functions according to Definition 5.6(ii) are well defined, but not
necessarily finite-valued. When A is a subset of ‚ of finite Lebesgue measure, we
also write
Z
R.T , A/ :D A .dh/ R.T , h/ 2 Œ0, 1
for the Bayes risk where A is the uniform distribution on A. For short, we write n
for the uniform distribution on the closed ball Bn :D ¹jxj nº. The following lemma
is based on an inequality for volumes of convex combinations of convex sets in Rd ,
see [1, p. 170–171]) for the proof.
5.6’ Lemma (Anderson [1]). Consider C Rd convex and symmetric with respect
to the origin. Consider f : Rd ! Œ0, 1/ integrable, symmetric with respect to the
origin, and such that all sets ¹x 2 Rd : f .x/ cº are convex, 0 < c < 1. Then
Z Z
f .x/ dx f .x/ dx
C C Cy
5.7 Corollary. For matrices ƒ 2 Rd d which are symmetric and strictly positive
definite, sets C 2 B.Rd / convex and symmetric with respect to the origin, subconvex
loss functions ` : Rd ! Œ0, 1/, points a 2 Rd n ¹0º and probability measures Q on
.Rd , B.Rd //, we have
Proof. Writing f ./ for the Lebesgue density of the normal law N .0, ƒ/ on Rd , the
first assertion rephrases Anderson’s Lemma 5.6’. An immediate consequence is
Z Z
è.x/ N .0, ƒ/.dx/ è.x a/ N .0, ƒ/.dx/
X
n2n
X
n2 1 0
n
1 j ° ± C n1
`n :D 1R nCn,j
d D 1 j0 j 0 C1 ¹`>nº , n 1.
2n 2n 2n < ` 2n
j D1 0 j D0
The last inequalities of Corollary 5.7 allow to rephrase Boll’s Convolution Theo-
rem 5.5. This yields a powerful way to compare (equivariant) estimators, in the sense
that ‘optimality’ appears decoupled from any particular choice of a loss function which
we might invent to penalise estimation errors.
5.8 Corollary. In the Gaussian shift experiment E.J /, with respect to any subconvex
loss function, the central statistic Z minimises the risk
R., h/ R.Z, h/ , h 2 Rd
in the class of all equivariant estimators for the unknown parameter.
Proof. Both and Z being equivariant, their risk functions are constant over Rd , thus
it is sufficient to prove the assertion for the parameter value h D 0. We have
L .Z j P0 / D N .0, J 1 /
Section 5.1 Gaussian Shift Experiments 137
from Proposition 5.3. Theorem 5.5 associates to a probability measure Q such that
L . j P0 / D N .0, J 1 / ? Q .
The loss function `./ being subconvex, the third assertion in Corollary 5.7 shows
Z
R.Z, 0/ D E0 . `.Z 0/ / D `.x/ N .0, J 1 /.dx/
Z
`.x/ N .0, J 1 / ? Q .dx/ D E0 . `. 0/ / D R., 0/
Given Theorem 5.5 and Corollary 5.8, we are able to compare equivariant estima-
tors; the next aim is the comparison of arbitrary estimators for the unknown parameter.
We quote the following from [121, Chap. I.2] or [124, Chap. 2.4]:
The following lemma represent a key tool: in a Gaussian shift experiment E.J /,
arbitrary estimators for the unknown parameter can be viewed as ‘approximately
equivariant’ in the absence of any a priori knowledge except that the unknown param-
eter should range over large balls centred at the origin.
5.9 Lemma. In a Gaussian shift experiment E.J /, every estimator for the unknown
parameter h 2 Rd is associated to a sequence of probability measures .Qn /n on
.Rd , B.Rd // such that
Z
1
.i/ d1 n .dh/ L. h j Ph / , N .0, J / ? Qn ! 0 as n ! 1 .
As a consequence, for any choice of a loss function `./ which is subconvex and
bounded, we have
Z
R ., Bn / D n .dh/ Eh .`. h//
.ii/ Z
D `.x/ N .0, J 1 / ? Qn .dx/ C k`k1 o.1/
138 Chapter 5 Gaussian Shift Models
Proof. Assertion (ii) is a consequence of (i), via (+) in Remark 5.8’. We prove (i) in
several steps.
(1) The central statistic Z D J 1 S is a sufficient statistic in the Gaussian shift ex-
periment E.J / (e.g. [4, Chap. II.1 and II.2], or [127, Chap. 3.1]. Thus, for any random
variable taking values in .Rd , B.Rd //, there is a regular version of the conditional
law of given Z D which does not depend on the parameter h 2 Rd : there is a
transition kernel K., / on .Rd , B.Rd // such that
jZDz
K., / is a regular version of .z, A/ ! Ph .A/ for every value of h 2 Rd .
jZDz
We write this as K.z, A/ D P .A/. In the same sense, the conditional law of
Z given Z D does not depend on the parameter, and we define a sequence of
probability measures .Qn /n on .Rd , B.Rd //:
Z
. Z/jZDz
Qn .A/ :D n .dz/ P .A/ , A 2 B.Rd / , n 1 .
Note that the sequence .Qn /n is defined only in terms of the pair ., Z/. Comparing
with the first expression in (i), this definition signifies that the observed value z of the
central statistic starts to take over the role of the parameter h.
(2) For every fixed value of x 2 Rd , we have uniformly in A 2 B.Rd / the bound
ˇZ ˇ
ˇ ˇ . Bn 4 .Bn C x/ /
ˇ ˇ
./ ˇ n .dh/ K.x C h, A C h/ Qn .A x/ˇ .Bn /
where A x is the set A shifted by x, and 4 denotes symmetric difference. To see
this, write
Z Z
1
n .dh/ K.x C h, A C h/ D .dh/ 1Bn .h/ K.x C h, A C h/
.Bn /
Z
1
D .dh0 / 1Bn Cx .h0 / K.h0 , A x C h0 /
.Bn /
Z
1 jZDh0
D .dh0 / 1Bn Cx .h0 / P .A x C h0 /
.Bn /
Z
1 ZjZDh0
D .dh0 / 1Bn Cx .h0 / P .A x/
.Bn /
and compare the last right-hand side to the definition of Qn in step (1)
Z
1 ZjZDh0
Qn .A x/ D .dh0 / 1Bn .h0 / P .A x/ .
.Bn /
Section 5.1 Gaussian Shift Experiments 139
It follows that the difference on the left-hand side of ./ is uniformly in A smaller than
Z
1 ˇ ˇ . Bn 4 .Bn C x/ /
.dh0 / ˇ1Bn .h0 / 1Bn Cx .h0 /ˇ D .
.Bn / .Bn /
(3) To conclude the proof of (i), we condition the first law in (i) with respect to the
central statistic Z. For A 2 B.Rd / we obtain from Proposition 5.3(b), definition of
K., / and substitution z D x C h
Z Z Z
n .dh/ Ph . h 2 A / D n .dh/ PhZ .dz/ Ph jZDz .A C h/
Z Z
D n .dh/ N .h, J 1 /.dz/ K.z, A C h/
Z Z
D N .0, J 1 /.dx/ n .dh/ K.x C h, A C h/
By the bounds ./ obtained in step (2) which are uniform in A for fixed value of x, we
can compare the last right-hand sides. Thus the definition of total variation distance
gives
Z
d1 n .dh/ L. h j Ph / , N .0, J 1 / ? Qn
Z
. Bn 4 .Bn C x/ /
N .0, J 1 /.dx/
.Bn /
where the integrand on the right-hand side, trivially bounded by 2, converges to 0
pointwise in x 2 Rd as n ! 1. Assertion (i) now follows from dominated conver-
gence.
Lemma 5.9 is the key to the minimax theorem in Gaussian shift experiments E.J /.
It allows to compare all possible estimators for the unknown parameter h 2 Rd , with
respect to any subconvex loss function: it turns out that for all choices of `./, the max-
imal risk on Rd is minimised by the central statistic. This is – following Theorem 5.5 –
the second main result of this section.
5.10 Minimax Theorem. In the Gaussian shift experiment E.J /, the central statistic
Z is a minimax estimator for the unknown parameter with respect to any subconvex
loss function `./; we have
Z
sup R., h/ `.z/ N .0, J 1 /.dz/ D R.Z, 0/ D sup R.Z, h/ .
h2Rd h2Rd
140 Chapter 5 Gaussian Shift Models
Proof. Consider risk with respect to any subconvex loss function `./. The last equality
is merely equivariance of Z as an estimator for the unknown parameter h 2 Rd ,
and we have to prove the first sign ‘’. Consider any estimator for the unknown
parameter, define n , Bn , Qn as in Lemma 5.9, and recall that the sequence .Qn /n
depends only on the pair ., Z/. A trivial chain of inequalities is
1 suph2Rd R., h/ suph2Bn R., h/
Z
.C/
n .dh/ R., h/ D: R., Bn /
5.10’ Exercise. In a Gaussian shift model E.J / with unknown parameter h 2 Rd and central
statistic Z D J 1 S , fix some point h0 2 ‚, and consider for 0 < ˛ < 1 estimators
T :D ˛ Z C .1 ˛/ h0
according to exercise 5.4000 which are not equivariant. Under squared loss, calculate the risk of
T as a function of h 2 Rd . Evaluate what T ‘achieves’ at the particular point h0 in comparison
Section 5.2 Brownian Motion with Unknown Drift as a Gaussian Shift Experiment 141
to the central statistic, and the price to be paid for this at parameter points h 2 Rd which are
distant from h0 . Historically, estimators of similar structure have been called ‘superefficient at
h0 ’ and have caused some trouble; in the light of Theorem 5.10 it is clear that any denomina-
tion of this type is misleading.
5.2 Brownian Motion with Unknown Drift as a Gaussian
Shift Experiment
In the following, all sections or subsections preceded by an asterisk will require
techniques related to continuous-time martingales, semi-martingales and stochastic
analysis, and a reader not interested in stochastic processes may skip these and keep
on with the statistical theory. Some typical references for sections marked by are e.g.
Liptser and Shiryaev [88], Metivier [98], Jacod and Shiryaev [64], Ikeda and Watan-
abe [61], Chung and Williams [15], Karatzas and Shreve [69], Revuz and Yor [112].
In the present section, we introduce the notion of a density process (or likelihood ratio
process), and then look in particular to statistical models for Brownian motion with
unknown drift as an illustration to Section 5.1.
P0
loc
P relative to F
Proof. (1) For T 2 N fixed, we have PT0 << PT on FT , hence there is a density
d PT0
fT :D , FT -measurable, Œ0, 1/-valued, unique up to .PT0 C PT /-null sets .
d PT
Define M D .M t /0tT to be the càdlàg modification of the martingale t ! EP .fT j
F t /, 0 t T . Then for every 0 t T and F 2 F t , we can write
P t0 .F / D PT0 .F / D EP .1F fT / D EP .1F MT / D EP .1F M t /
Z Z
D M t dP D M t dP t
F F
d P t0
which shows that in restriction to F t , M t is a version of the density d Pt
.
Section 5.2 Brownian Motion with Unknown Drift as a Gaussian Shift Experiment 143
(2) For T 2 N we can paste together the processes .M t /0tT constructed so far
into one process M D .M t / t0 with the desired properties. This is (a).
(3) Consider F -stopping times . For F 2 F and N 2 N, consider first the subset
F \ ¹ N º which belongs to FN and to F^N . Combining (a) above with the
stopping theorem for bounded stopping times we get
P 0 .F \ ¹ N º/ D EP 1F \¹N º MN D EP 1F \¹N º M^N
D EP 1F \¹N º M
For statistical purposes, the filtered spaces ., A, F / in Definition 5.11 and Theo-
rem 5.12 are frequently path spaces for certain classes of stochastic processes.
and call G t (it contains all events in the process up to time t and infinitesimally be-
yond) for short the -field of events up to time t .
Q :D L ..B t C t / t0 / , 2 R
144 Chapter 5 Gaussian Shift Models
M satisfies E.M,t / D 1 for all t < 1 and thus is a martingale. Fix a time horizon
T < 1 and define from M a probability measure Q e ,T on .C , C / with the property
e ,T .A/ :D EQ M,T 1A
Q for all A 2 C
D EQ M,s 1A whenever 0 s T and A 2 Gs .
M being given by ./, Girsanov theorem (e.g. [14, App. A.3.3], [69, Sect. 3.5], [64,
e ,T , with
Chap. III.3]) establishes that . t / t0 remains a semi-martingale under Q
angle bracket under Qe ,T identical to the angle bracket hi t t under Q, and that
Thus under Q e ,T and up to time T , the canonical . t /0tT is Brownian motion with
drift . The -field GT being generated by the coordinate projections, there is at most
one such probability on GT . Hence the restrictions of the laws Q e ,T and Q to GT
d Q,T
coincide: thus M,T is a version of the density d QT which gives Q,T QT since
M is strictly positive.
As a consequence, the density process of Q with respect to Q coincides with M
up to time T ; T < 1 being arbitrary, we have identified M as the density process
of Q with respect to Q relative to G.
loc
The last part of the assertion is proved as follows: since Q0 Q for all pairs
0 0 =
¤ in R, the density process L of Q0 with respect to Q relative to G is
obtained from ratios
1
M0 ,t =M,t D exp .0 / t .02 2 / t under Q
2
where .m./
t / t0 is the martingale part of the canonical process . t / t0 un-
der Q .
5.15 The Experiment ‘Scaled Brownian Motion with Unknown Drift’. Fix a ma-
trix ƒ 2 Rd d which is symmetric and strictly positive definite, and let ƒ1=2 denote
its square root. On the path space .C , C , G/ of Notation 5.13 with d -dimensional
canonical process . t / t0 , consider probability measures
Qh :D L ƒ1=2 B t C .ƒh/ t t0 , h 2 Rd
is the density process of Qh0 with respect to Qh relative to G, with notation m.h/ for
the local martingale part of the canonical under Qh ; note that
L m.h/ j Qh D L ƒ1=2 B does not depend on h 2 Rd .
For 0 < t < 1 fixed and for .S , J / in Definition 5.2 defined by . t , ƒt /, we have
the structure of a Gaussian shift experiment E.ƒt / D .C , G t , ¹Qh : h 2 Rd º/ which
corresponds to time-continuous observation of the trajectory under unknown h up
to time t .
Proof. Note that in the definition of Qh , the drift term of the form ƒh, h 2 Rd ,
involves the matrix ƒ, i.e. the covariance matrix of 1 under Q0 . Fix h 2 Rd . The
proof is similar to the proof in Example 5.14, and we mention only the main steps.
Again we start from a martingale Mh on .C , C , Q0 / defined as in ./:
² ³
> 1 >
Mh D .Mh,t / t0 , Mh,t :D exp h t h ƒt h
2
e h,T on .C , C / from Mh
and define for 0 < T < 1 probability laws Q
e h,T .A/ :D EQ . Mh,T 1A /
Q for all A 2 C
. t .t ^ T / ƒh / t0 e h,T .
is a local martingale under Q
. t .t ^ T / ƒh / t0 e h,T
is a d -dimensional Brownian motion under Q
from which we deduce that Q e h,T coincides with the restriction of Qh to GT : Mh,T
being strictly positive, we have Qh,T Q0,T , and Mh,T is a version of the density
d Qh,T
d Q0,T
. The remaining parts of the proof are along the lines of Example 5.14.
(with .1/ , : : : , .d / the components of the canonical process ) is the best equivariant
estimator and the minimax estimator for the unknown parameter, by the properties of
the Gaussian shift experiment E.ƒt / D .C , G t , ¹Qh : h 2 Rd º/. Note that this esti-
mator only makes use of the last observation t .
Chapter 6
Jeganathan [66, 67] and Le Cam and Yang [84]. Stochastic process examples leading
to quadratic or to mixed normal experiments will be given in Sections 6.2 and 6.3.
It seems natural to generalise Definition 5.2 and to allow for random matrices J
taking values in the set
® ¯
DC :D j 2 Rd d : j is symmetric and strictly positive definite 2 B.Rd d / .
Frequently we will suppress the indicator in (+), and write for short Z D J 1 S in-
stead of (+).
6.1” Remark. We have seen in Example 5.1 that a Gaussian shift experiment E.J /
exists for any given deterministic matrix J 2 Rd d . In contrast to this, it is no longer
true that arbitrarily prescribed pairs .S , J / induce quadratic experiments once J is ran-
dom. The following nontrivial condition ./ is contained in the formulation of part (ii)
of Definition 6.1:
Z
> 1 >
./ e h S 2 h J h dP0 D 1 for all h 2 Rd .
150 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
from ..0, 1/, B.0, 1// to .R, B.R//, parameterised by 2 R. We keep 0 < < 1 fixed.
(a) Write f .t , x/ for the density of P with respect to Lebesgue measure on .0, 1/ R
and show from Example 5.1 that
f .t , x/ D e x
1 2 t
2 f0 .t , x/ , t 2 .0, 1/ , x 2 R .
Thus, with S.t, x/ D x and J.t, x/ D t , we have a quadratic model in the parameter 2 R,
as defined in Definition 6.1. Moreover, the model ¹P : 2 Rº is in particular mixed normal
in the sense of Definition 6.2 since L.J jP / does not depend on the parameter 2 R.
(b) For a pair . , B/ where B and are independent, where B D .B t / t0 is a one-dimen-
sional standard Brownian motion starting from B0 0 and an exponential time with param-
eter , the above model arises as
® ¯
P D L . , B C / : 2 R .
(c) From (a), ¹P : 2 Rº is a curved exponential family in ./ and T where : R ! R2
is the mapping ! ./ D . 12 2 , / and T D idj.0,1/R the canonical variable on
.0, 1/ R.
S jJ Dj
(iii) for all h 2 Rd we have Ph D N .j h, j / for PhJ -almost all j 2 Rd d ;
ZhjJ Dj
(iv) for all h 2 Rd we have Ph D N .0, j 1 / for PhJ -almost all j 2 Rd d .
j 2 Rd d
S jJ Dj
for every h 2 Rd we have P0 D N .0, j / for P0J -almost all j 2 Rd d .
Thus assertion (i)” .ii) of the proposition is proved; (iii)” .iv) is by definition
Z D 1¹J 2DC º J 1 S of the central statistic. We obviously have (iii)H)(ii); so the
following step (2) will finish the proof.
(2) Proof of (i)+(ii)H)(iii): Under (i) and (ii) we have for arbitrary A 2 B.Rd /,
B 2 B.Rd d /:
Eh .1B .J / 1A .S //
D E0 1B .J / 1A .S / Lh=0
> 1 >
D E0 1B .J / 1A .S / e h S 2 h J h
Z Z
1 > S jJ Dj >
D P0J .dj / 1B .j / e 2 h j h P0 .ds/ 1A .s/ e h s .
From (ii) and the definition of the set DC preceding Definition 6.1, the last line reads
Z "Z #
J 12 h> j h 1 12 s > j 1 s h> s
P0 .dj / 1B \ DC .j / e ds p e 1A .s/ e ,
Rd .2/d jdet.j /j
and after rearranging terms 12 h> j h 12 s > j 1 sCh> s D 12 .sj h/> j 1 .sj h/
gives
Z Z
1 > 1
e 2 .sj h/ j .sj h/ 1A .s/ .
1
P0J .dj / 1B \ DC .j / ds p
Rd .2/d jdet.j /j
By (i) and Definition 6.2, the last expression equals
Z Z
P0 .dj / 1B \ DC .j / N .j h, j /.A/ D PhJ .dj / 1B \ DC .j / N .j h, j /.A/ .
J
152 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
The set DC has full measure under PhJ since Ph P0 . So the complete chain of
equalities shows
Z
Eh .1B .J / 1A .S // D PhJ .dj / 1B .j / N .j h, j /.A/
where A 2 B.Rd / and B 2 B.Rd d / are arbitrary. This gives (iii) and completes
the proof.
6.4’ Remark. In mixed normal experiments E.S , J /, the notion of ‘observed informa-
tion’ due to Barndorff-Nielsen (e.g. [3]) gains a new and deeper sense: the observation
! 2 itself communicates through the observed value j D J.!/ the amount of ‘in-
formation’ it carries about the unknown parameter. Indeed from Definition 6.2 and
Proposition 6.3(iii), the family of conditional probability measures
° ±
jJ Dj
Ph : h 2 Rd
is a Gaussian shift experiment E.j / as discussed in Proposition 5.3 for PJ -almost all
j 2 Rd d ; this means that we can condition on the observed information. By Propo-
sition 6.3, this interpretation is valid under mixed normality, and does not carry over
to general quadratic experiments.
6.5 Definition. In a mixed normal experiment E.S , J /, an estimator for the un-
known parameter h 2 Rd is termed strongly equivariant if the transition kernel
Proof. Clearly Definition 6.2 combined with ./ of Definition 6.5 gives (+). To prove
the converse, fix a countable and \–stable generator S of the -field B.Rd /, and
assume that Definition 6.2 holds in combination with (+). Then for every A 2 S and
every h 2 Rd , we have from (+)
Eh . 1C .J / 1A . h/ / D E0 . 1C .J / 1A ./ /
coincide PJ -almost surely on Rd d . Thus there is some PJ -null set Nh,A 2 B.Rd /
such that
PhhjJ Dj .A/ D P0jJ Dj .A/ for all j 2 DC nNh,A
where DSC is the set defined before Definition 6.1. Keeping h fixed, the countable union
Nh D A2S Nh,A is again a PJ -null set in B.Rd /, and we have
So far, h 2 Rd was fixed but arbitrary: hence K., / is a regular version simultane-
hjJ Dj
ously for all conditional laws .j , A/ ! Ph .A/ , h 2 Rd . We thus have ./
in Definition 6.5 which finishes the proof.
We state the first main result on estimation in mixed normal experiments, due to
Jeganathan [66].
jJ Dj
for PJ -almost all j 2 DC : P0 .A/ D N .0, j 1 / ? Qj .
e , C / :D P0.,S /jJ Dj .C / ,
.j , C / ! K.j j 2 DC , C 2 B.Rd /˝B.Rd /
e , Rd / D N .0, j /
K.j for all j 2 DC .
coincide PJ -almost surely on DC , for every fixed value of h 2 Rd . Let Nh denote
an exceptional PJ -null set
S with respectJ to a fixed value of h in the last formula. The
countable union N :D Nh is a P -null set in DC with the property
h2 Qd
8R
ˆ e i t > .kh/ e h> s 1
h> j h
< Rd RdRK.j , .d k, ds// e 2
The second integral in .˘/ equals fj .0/. The function fj being analytic, assertion .˘/
yields
for all j 2 DC n N : the function fj ./ is constant on C d .
156 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
e / we have proved
By definition of K.,
for all j 2 DC n N :
.CC/ > 1 > 1 >
E0 e i t j J D j D e 2 t j t E0 e i t .Z/ j J D j .
(3) So far, we have considered some fixed value of t 2 Rd . Hence the PJ -null set
N in .CC/ depends on t . Taking the union of such null sets for all t 2 Qd we obtain
e for which dominated convergence with respect to the argument t in
a PJ -null set N
both integrals in (+) establishes
for all j 2 DC n Ne:
> 1 > 1
>
E0 e i t j J D j D e 2 t j t E0 e i t .Z/ j J D j , t 2 Rd .
This is an equation between characteristic functions of conditional laws. Introducing
a regular version
.Z/jJ Dj
.j , A/ ! Qj .A/ D P0 .A/
In combination with Proposition 6.3(iv), the Convolution Theorem 6.6 shows that
within the class of all strongly equivariant estimators for the unknown parameter h 2
Rd in a mixed normal experiment E.S , J /, the central statistic Z D 1¹J 2DC º J 1 S
achieves minimal spread of estimation errors, or equivalently, achieves the best pos-
sible concentration of an estimator around the true value of the parameter. In analogy
to Corollary 5.8 this can be reformulated – again using Anderson’s Lemma 5.7 – as
follows:
6.6’ Corollary. In a mixed normal experiment E.S , J /, with respect to any subconvex
loss function, the central statistic Z minimises the risk
R., h/ R.Z, h/ , h 2 Rd
in the class of all strongly equivariant estimators for the unknown parameter.
Section 6.1 Quadratic and Mixed Normal Experiments 157
6.6” Remark. In the Convolution Theorem 6.6 and in Corollary 6.6’, the best concen-
trated distribution for estimation errors
Z
L .Z j P0 / D L PJ .dj / N 0 , j 1
does not necessarily admit finite second moments. As an example, the mixing distribu-
tion PJ might be in dimension d D 1 some Gamma law .a, p/ with shape parameter
a 2 .0, 1/: then Z
1
EP0 Z 2 D .a, p/.du/ D 1
.0,1/ u
and the central statistic Z does not belong to L2 .P0 /. We shall see examples in Chap-
ter 8. When L .ZjP0 / does not admit finite second moments, comparison of estima-
tors based on squared loss `.x/Dx 2 – or based on polynomial loss functions – does not
make sense, thus bounded loss functions are of intrinsic importance in Theorem 6.6
and Corollary 6.6’, or in Lemma 6.7 and Theorem 6.8 below. Note that under mixed
normality, estimation errors are always compared conditionally on the observed infor-
mation, never globally.
In mixed normal experiments, one would like to be able to compare the central
statistic Z not only to strongly equivariant estimators, but to arbitrary estimators for
the unknown parameter h 2 Rd . Again we can condition on the observed information
to prove in analogy to Lemma 5.9 that arbitrary estimators are ‘approximately strongly
invariant’ under a ‘very diffuse’ prior, i.e. in the absence of any a priori information
except that the unknown parameter should range over large balls centred at the origin.
Again Bn denotes the closed ball ¹x 2 Rd : jxj nº, and n the uniform law on Bn .
6.7 Lemma. In a mixed normal experiment E.S , J /, every estimator for the un-
j
known parameter h 2 Rd can be associated to probability measures ¹ Qn : j 2
DC , n 1 º on .Rd , B.Rd // such that
Z Z h i
J 1 j
d1 n .dh/ L . hjPh / , P .dj / N .0, j / ? Qn ! 0
as n ! 1 ,
Proof. The second assertion is a consequence of the first since `./ is measurable and
bounded, by (+) in Remark 5.8’. We prove the first assertion in analogy to the proof
of Lemma 5.9.
(1) The pair .Z, J / is a sufficient statistic in the mixed normal experiment E.S , J /.
Thus, for any random variable taking values in .Rd , B.Rd //, there is a regular
version of the conditional law of given .Z, J / D which does not depend on the
parameter h 2 Rd . We fix a transition kernel K., e / from Rd DC to Rd which
provides a common regular version
e j.Z,J /D.z,j /
K..z, j /, A/ D Ph .A/ , A 2 Rd , .z, j / 2 Rd DC
e / equals
which by definition of K.,
Z
1
e .h0 , j /, A x C h0
.dh0 / 1Bn Cx .h0 / K
.Bn /
Z
1
D .dh0 / 1Bn Cx .h0 / P 2 .A x C h0 / j .Z, J / D .h0 , j /
.Bn /
Z
1
D .dh0 / 1Bn Cx .h0 / P . Z/ 2 .A x/ j .Z, J / D .h0 , j / .
.Bn /
Now the last expression
Z
1 Z/j.Z,J /D.h0 ,j /
.dh0 / 1Bn Cx .h0 / P. .A x/
.Bn /
can be compared to
Z
1 . Z/j.Z,J /D.h0 ,j /
Qnj .A x/ D .dh0 / 1Bn .h0 / P .A x/
.Bn /
up to the error bound on the right-hand side of ./, uniformly in A 2 B.Rd / and
j 2 DC .
(4) Combining steps (2) and (3), we approximate
Z Z Z
n .dh/ Ph . h 2 A/ by PJ .dj / N .0, j 1 /.dx/ Qnj .A x/
The following is – together with the Convolution Theorem 6.6 – the second main
result on estimation in mixed normal experiments.
6.8 Minimax Theorem. In a mixed normal experiment E.S , J /, the central statistic
Z is a minimax estimator for the unknown parameter with respect to any subconvex
loss function `./: we have
Z Z
ZjJ Dj
sup R., h/ E0 .`.Z// D PJ .dj / P0 .dz/ `.z/ D sup R.Z, h/ .
h2Rd h2Rd
Proof. Consider any estimator for the unknown parameter h 2 Rd in E.S , J /, and
any subconvex loss function `./. Since E0 .`.Z// (finite or not) is the increasing limit
160 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
Now Anderson’s Lemma 5.7 allows us to compare integral terms for all n, j fixed,
and gives lower bounds
Z Z
P .dj / N .0, j 1 /.dv/ `.v/ C o.1/ as n ! 1 .
J
The last integral does not depend on n, and by Definition 6.2 and Proposition 6.3(iv)
equals Eh .`.Z h// D R.Z, h/ for arbitrary h 2 Rd . This finishes the proof.
What happens beyond mixed normality, in the genuinely quadratic case? We know
that Z D J 1 S is a maximum likelihood estimator for the unknown parameter, by
Remark 6.1’, we know that for J.!/ 2 DC the log-likelihood surface
1 >
h ! ƒh=0 .!/ D h> S.!/ h J.!/ h
2
1
D .h Z.!//> J.!/ .h Z.!// C expressions not depending on h
2
has the shape of a parabola which opens towards 1 and which admits a unique maxi-
mum at Z.!/, but this is no optimality criterion. For quadratic experiments which are
not mixed normal, satisfactory optimality results seem unknown. For squared loss,
Gushchin [38, Thm. 1, assertion 3] proves admissibility of the ML estimator Z un-
der random norming by J (which makes the randomly normed estimation errors at h
coincide with the score S J h at h) in dimension d D 1. He also has Cramér–Rao
type results for restricted families of estimators. Beyond the setting of mixed nor-
mality, results which allow to distinguish an optimal estimator from its competitors
simultaneously under a sufficiently large class of loss functions – such as those in the
convolution theorem or in the minimax theorem – seem unknown.
6.2 Likelihood Ratio Processes in Diffusion Models
Statistical models for diffusion processes provide natural examples for quadratic ex-
periments. For laws of solutions of stochastic differential equations, we consider first
Section 6.2 Likelihood Ratio Processes in Diffusion Models 161
6.9 Assumptions and Notations for Section 6.2. (a) We consider d -dimensional
stochastic differential equations (SDE)
.I/ dX t D b.t , X t / dt C .t , X t / d W t , t 0, X0 x0
.II/ dX t0 Db 0
.t , X t0 / dt C .t , X t0 / d W t , t 0, X0 x0
driven by m-dimensional Brownian motion W . SDE (I) and (II) share the same diffu-
sion coefficient
: Œ0, 1/ Rd ! Rd m
but have different drift coefficients
b , b0 : Œ0, 1/ Rd ! Rd .
Both equations (I) and (II) have the same starting value x0 2 Rd . All coefficients are
measurable in .t , x/; we assume Lipschitz continuity in the second variable
jb.t , x/ b.t , y/j C jb 0 .t , x/ b 0 .t , y/j C k .t , x/ .t , y/k K jx yj
(for t 0 and x, y 2 Rd ) together with linear growth
jb.t , x/j2 C jb 0 .t , x/j2 C k .t , x/k2 K .1 C jxj2 /
where K is some global constant. The d d -matrices
c.t , x/ :D .t , x/ > .t , x/
will be important in the sequel.
(c) Under the assumptions of (b), equations (I) and (II) have unique strong solutions
(see e.g. [69, Chap. 5.2.B]) on the probability space ., A, F , P / carrying the driving
Brownian motion W , with F the filtration generated by W . We will be interested in
the laws of the solutions
Q :D L. X j P / and Q0 :D L. X 0 j P /
on the canonical path space .C , C , G/ .
162 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
0
(d) We write mX and mX for the .P , F /-local martingale parts of X and X 0
Z t Z t
X0 0 0
mXt D X t X 0 b.s, X s / ds , m t D X t X 0 b.s, Xs0 / ds .
0 0
Their angle brackets are the predictable processes
D E Z t D 0E Z t
m X
D c.s, Xs / ds , mX
D c.s, Xs0 / ds
t 0 t 0
We shall explain in Remark 6.12’ below why equations (I) and (II) have been as-
sumed to share the same diffusion coefficient. The next result is proved with the argu-
ments which [88, Sect. 7] use on the way to their Theorems 7.7 and 7.18. See also [64,
pp. 159–160, 179–181, and 187]).
6.10 Theorem. Let the drift coefficients b in (I) and b 0 in (II) be such that a measurable
function
: Œ0, 1/ Rd ! Rd
exists which satisfies the following conditions (+) and (++):
(a) Then the probability laws Q0 and Q on .C , C / are locally equivalent relative to G.
(b) The density process of Q0 with respect to Q relative to G is the .Q, G/-martingale
²Z t Z ³
1 t >
L D .L t / t0 , L t D exp > .s, s / d mQ
s c .s, s / ds
0 2 0
where mQ denotes the local martingale part of the canonical process under Q:
Z t
Q Q
mQ D .m t / t0 , m t D t 0 b.s, s / ds
0
and where the local martingale
Z
> .s, s / d mQ
s D: M
(which may take the value C1 with positive probability under Q or under Q0 ) and
write
.n/ .s, s / :D .s, s / 1ŒŒ0,n ŒŒ .s/ , n 1 .
By assumption (++) which is symmetric in Q and Q0 , we have
(2) For every n 2 N, on ., A, F , P /, we also have unique strong solutions for
equations (II.n/ ):
.n/ .n/ .n/
.II.n/ / dX t D Œb C c .n/ .t , X t / dt C .t , X t / d W t , t 0, X0 x0 .
By unicity, X .n/ coincides on ŒŒ0, n with the solution X 0 to SDE (II) where n is the
F -stopping time n :D n ıX 0 . If we write Q.n/ for the law L.X .n/ jP / on .C , C , G/,
then the laws Q.n/ and Q0 coincide in restriction to the -field Gn of events up to
time n .
(3) On .C , C , G/, the Novikov criterion [61, p. 152] applies – as a consequence of
the stopping in step (1) – and grants that for fixed n 2 N
²Z t Z ³
1 t .n/ >
LL t D exp Œ .n/ > .s, s / d mQ
.n/ .n/
s Œ c Œ .s, s / ds , t 0
0 2 0
is a QL .n/ -local martingale whose angle bracket under QL .n/ coincides with the angle
bracket of ˇ >mQ under Q. For the d -dimensional process mQ this shows that
Z
Q
m Œ c .n/ .s, s / ds
0
is a QL .n/ -local martingale whose angle bracket under QL .n/ coincides with the angle
bracket of mQ under Q. Since C is generated by the coordinate projections, this sig-
nifies that QL .n/ is the unique law on .C , C , G/ such that the canonical process under
QL .n/ is a solution to SDE .II.n/ /. As a consequence, the two laws coincide: Q.n/ from
step (2) is the same law as QL .n/ , on .C , C , G/.
(4) For every n fixed, step (3) shows the following: the laws Q.n/ and Q are locally
equivalent relative to G since the density process of Q.n/ with respect to Q relative
to G
L^n D LL .n/
is strictly positive Q-almost surely. Recall also that the laws Q.n/ and Q0 coincide in
restriction to Gn .
loc
(5) The following argument proves Q0 << Q relative to G. For every t < 1,
0
A 2 G t H) Q0 .A/ D P X^t 2A
0
P X^t 2 A , n > t C P . n t /
.n/
P X^t 2 A C P . n t /
D Q.n/ . A / C Q0 . n t /
D Q.n/ . A / C o.1/ as n ! 1
since X 0 and X .n/ coincide up to time n D n ı X 0 as above, and since n " 1
Q 0 -almost surely. Now, Q.n/ and Q being locally equivalent relative to G for all n,
by step (4), the above gives
A 2 G t , Q.A/ D 0 H) Q.n/ .A/ D 0 8 n 1 H) Q0 .A/ D 0
loc
which proves Q0 << Q relative to G.
(6) We prove that L is the density process of Q0 with respect to Q relative to G. For
A 2 G t , the event A \ ¹t < n º belongs to Gn , the -field of events strictly before
time n (for a G-stopping time S , GS is defined as the smallest -field containing
the system ® ¯
G0 [ G \ ¹s < S º : G 2 Gs , 0 s < 1 ,
cf. [98, p. 17], and is contained in GS ). Thus A \ ¹t < n º 2 Gn : as a consequence,
6.11 Example. On the canonical path space .C , C , G/ of Notation 5.13 with canonical
process , write Q0 for Wiener measure. Let Qh denote the law of the solution to the
Ornstein–Uhlenbeck SDE
dX t D h X t dt C dB t , X0 0
for every h 2 R. Then according to Theorem 6.10, all laws Qh are locally equivalent
relative to G, and the density process of Qh with respect to Q0 relative to G is
² Z t Z ³
h=0 h=0 1 2 t 2
Lh=0 D L t t0 , L t :D exp h s d m.0/
s h s ds
0 2 0
with m.0/ the martingale part of under Q0 . For 0 < t < 1 fixed, observation of
up to time t
E.S , J / D C , G t , ¹Qh.t/ : h 2 Rº , Qh.t/ the restriction of Qh to G t
yields a quadratic experiment in the sense of Definition 6.1, with score in 0 and ob-
served information given by
Z t Z t
.0/
S :D s d ms , J :D 2s ds .
0 0
This is not a mixed normal model: first, L.J jQh / depends on h 2 R, second, from
Ito formula and L.., m.0/ /jQ0 / D L..B, B// where B denotes Brownian motion,
the law Z t
1 2
L.S jQ0 / D L Bs dBs D L .B t t /
0 2
166 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
is concentrated on the half-line Œ 2t , 1/. Thus, according to Definition 6.2 and Propo-
sition 6.3, this quadratic model E.S , J / is not a mixed normal model.
R
Scores of similar structure such as Bs dBs motivated Jeganathan to view quadratic
models as ‘Brownian functional models’ (see [67], and the references there). The fol-
lowing example shows that on .C , C , G/, we can attach quadratic statistical models to
solutions of stochastic differential equations in a natural way, under the sole restric-
tion that angle brackets of local martingale parts of the observed should be invertible.
6.12 Example. Assume that c.t , x/ 2 Rd d is invertible for all t and all x. Fix a drift
coefficient b for SDE (I) and write Q for the law of the solution of (I). For SDE (II),
introduce a parameter h 2 Rd , define drift functions
bh .t , x/ :D b.t , x/ C c.t , x/ h , h 2 Rd
which depend on the parameter, and write Qh for the law of the solution of (II) with
b 0 D bh . Clearly bh ¤ bh0 for h ¤ h0 by assumption on c., /. We thus have a
statistical model
¹ Qh : h 2 Rd º on .C , C , G/ with Q0 :D Q .
Note that for every h 2 Rd , with ., / h constant, the assumptions (+) and (++)
of Theorem 6.10 are satisfied, and Theorem 6.10 gives the following.
(a) We have
loc
Qh Q0 relative to G ,
and the density process of Qh with respect to Q0 relative to G is
² ³
1
L t D exp h>m t h>hm.0/ i t h D E t h>m.0/
h=0 .0/
2
where m.0/ is the .G, Q0 /-martingale part of the canonical process . By assumption
on c., /, Z t
h>hm.0/ i t h D h>c.s, s / h ds
0
is strictly positive for every h 2 R Hence the random matrix hm.0/ i t defined on
d.
Then according to ./ in Theorem 5.12(c), we can replace the deterministic time t in
(a) by . This yields a statistical model
(with Qh the restriction of Qh to G ) which again is quadratic in the sense of Defini-
tion 6.1.
It remains to explain why equations (I) and (II) in Assumption 6.9 have been as-
loc
sumed to share the same diffusion coefficient. The reason is that for measures Q Q0
on .C , C , G/, under time-continuous observation of the canonical process , the local
martingale part of can not be modified:
Next, select some subsequence .nk /k such that . / holds almost surely for every time
t which is rational, and then consider events in G t
² Z t ³
A t :D lim Vnk .t / D 2
Œ .s, s / ds ,
k!1 0
² Z t ³
A0t :D lim Vnk .t / D Œ 0 .s, s /2 ds .
k!1 0
168 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
loc
Here A t is a set of full Q-measure, and A0t a set of full Q0 -measure. From Q Q0
relative to G we deduce that the set A t \ A t has full measure under both Q and Q0 .
0
This holds for all rational t . Hence under both laws Q and Q0 , the processes
D E Z t D E Z t
Q Q0 0
. / m D 2
Œ .s, s / ds , m D 2
Œ .s, s / ds
0 t0 0 t0
6.3 Time Changes for Brownian Motion with
Unknown Drift
The present section is on time changes in the model ‘scaled Brownian motion with
unknown drift’ of Section 5.2. We shall see that time changes which are independent
of Brownian motion lead to mixed normal models, other time changes to quadratic
models which are not mixed normal.
With the canonical process on .C , C / and m.h/ the Qh -martingale part of , Exam-
ple 5.15 states that all Qh are locally equivalent relative to G, and that
0 ² ³
0 h0 = h 1
:D exp .h0 h/>m t .h0 h/>ƒt .h0 h/
h =h .h/
Lh = h D L t , Lt
t0 2
is the density process of Qh0 with respect to Qh relative to G, where L.m.h/ jQh / D
L.ƒ1=2 B/ does not depend on the parameter h 2 Rd .
Consider a G-stopping time with the property
By Theorem 5.12(c) and (d) – compare to Remark 6.1” – condition ./ guarantees that
the ‘candidate pair’
.S , J / D m.0/
, ƒ D . , ƒ /
Section 6.3 Time Changes for Brownian Motion with Unknown Drift 169
6.14’ Exercise. This exercise complements Example 6.14. With all notations as in Example
6.14, we consider the one-dimensional case d D 1 with scaling factor ƒ :D 1. We focus on
the G-stopping times Ta , for a > 0 fixed.
170 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
(a) Since 0 D 0 Q0 -almost surely, the law of the iterated logarithm grants
Q0 .¹0 < Ta < 1º/ D 1.
(b) Write for short P :D Q0 and M t :D max s . Write ˆ for the distribution function of
0st
N .0, 1/. For a, t in .0, 1/, use the reflection principle and rescaling
p p
P .Ta < t / D P .M t > a/ D 2P . t > a/ D 2P .1 > a= t / D 2.1 ˆ.a= t //
to determine the density of L.Ta jP /:
a a2
fa .t / D p t 2 e 2
3 1
t , t , a 2 .0, 1/ .
2
Determine the Laplace transform of L.Ta jP /
p
EP e Ta D e a 2 , 0
from the statistical argument in Theorem 5.12(d): for positive drift parameter h > 0 we do
have the equivalent assertions
h=0
Qh .¹0 < Ta < 1º/ D 1 ” EQ0 LTa D 1 , h > 0
where Lh=0 is the density process of Qh with respect to Q0 as in Examples 5.14 or 6.13; on
the right-hand side, exploit Ta D a Q0 -almost surely; finally, change variables :D 12 h2 .
(c) For positive drift h > 0, determine the Laplace transform of L .Ta j Qh /
p
EQh e Ta D e a . h C2 h/ , 0
2
h=0
using (b) and change of measure EQh .e Ta / D EQ0 .e Ta LTa /.
(d) For negative drift h < 0, prove that
Qh .¹0 < Ta < 1º/ D e 2 a h < 1 , h<0
(hint: combine an argument as in step (3) of the proof of Theorem 5.12 with application of (b)
above to eh :D jhj > 0).
(e) Deduce the following from (d). For a > 0 fixed, the candidate pair of statistics S D Ta ,
J D Ta under Q0 does not generate a statistical experiment which satisfies all assumptions of
Definition 6.1, cf. Remark 6.1”. As a consequence, ‘observing Brownian motion with unknown
drift up to time Ta ’ does not lead to a quadratic experiment in the sense of Definition 6.1.
More interesting time transformations for the model ‘scaled Brownian motion with
unknown drift’ are at hand if we extend the probability space and consider stopping
times which are independent from Brownian motion. This will lead to mixed normal
experiments. We will formulate mixed normality in two variants: the first collects all
independent variables, processes, stopping times in one initial -field, the second vari-
ant keeps track of the temporal dynamics of the time transformation.
6.15 Observing Scaled Brownian Motion with Unknown Drift up to some Inde-
pendent Random Time. We extend the experiment
C , C , G , ¹Qh : h 2 Rd º , Qh :D L ƒ1=2 B t C .ƒh/ t t0 , h 2 Rd
Section 6.3 Time Changes for Brownian Motion with Unknown Drift 171
./ : 00 ! Œ0, 1 is A00 -measurable, and 0< <1 P 00 -almost surely.
The object in ./ might be defined in terms of other processes or random variables
living on .00 , A00 , P 00 /. With the notations of Example 5.15, introduce a product space
® ¯
. / H , H , H, Q b h : h 2 Rd , Q b h :D Qh ˝ P 00
where
This constructions lifts the canonical process from .C , C / to .H , H /, and lifts ran-
dom variables from .00 , A00 / to .H , H /. In particular, any Œ0, 1-valued random vari-
able on .00 , A00 / lifted to the extension will be H0 -measurable, and thus can be used
as H-stopping time. On .H , H /, objects lifted from .C , C / and objects lifted from
.00 , A00 / will be independent under all Q b h :D Qh ˝ P 00 , h 2 Rd , and the law of ob-
00 00
jects stemming from . , A / will not depend on h 2 Rd . Due to this independence
structure, density processes of Q b h relative to H are obtained by
b h0 with respect to Q
lifting the density processes given in Example 5.15 from .C , C , G/. Thus ./ and ./
of Example 5.15 remain unchanged provided we redefine on .H , H , H/
m.h/ :D . t ƒh t / t0 bh .
martingale part of under Q
In this sense, the model . / is a simple extension of the statistical model in Exam-
ple 5.15. On .H , H /, we write again for the H-stopping time obtained by lifting ./
above to .H , H , H/. Then condition ./ of Theorem 5.12(c) is satisfied
6.15’ Exercise. In the setting of 6.15, let .00 , A00 , P 00 / carry a Poisson process .N t / t0 with
parameter > 0. Let denote the time of the k-th jump of .N t / t0 . In the particular case k D 1
we recover the example of Exercise 6.1”’, with observation up to an independent exponential
time.
6.15” Exercise. In the setting of 6.15, fix > 0 and let .00 , A00 , P 00 / carry a Gamma
process, i.e. a PIIS . t / t0 (process with stationary and independent increments) where
L . r2 r1 jP 00 / D .r2 r1 , / for 0 r1 < r2 < 1. Here .a, / denotes the Gamma
a
law with density fa, .x/ D 1.0,1/ .x/ .a/ x a1 e x , x 2 R. The state of the process at
time t D 1 is thus exponentially distributed with parameter ; defining :D k we have an
alternative to Exercise 6.15’, for k 2 N. Beyond this, we can use the process t ! t as a time
change for Brownian motion. Note that Gamma processes (cf. Bertoin [10, p. 73]) are obtained
as integrals Z tZ
x .ds, dx/ , t 0
0 .0,1/
The construction in 6.15 does not reflect adequately the temporal dynamics of
a process of time change in cases where such a process may play a key role; the
following construction remedies to this.
6.16 Independent Time Transformation for ‘Scaled Brownian Motion with Un-
known Drift’. Consider a probability space .0 , A0 , P 0 / carrying processes B D
.B t / t0 and A D .A t / t0 as follows:
(i) B is a d -dimensional standard Brownian motion with B0 0,
(ii) for all ! 2 0 , paths t ! A t .!/ are càdlàg non-decreasing, with A0 .!/ D 0
and lim A t .!/ D C1,
t!1
Write F D .F t / t0 for the right-continuous filtration generated by the pair .B, A/.
Then
t :D inf¹v : Av > t º , 0<t <1
are F -stopping times which by (ii) have the property
is mixed normal in the sense that for all 0 < t < 1 fixed, the restrictions Qht of
Qh to the -field of events up to time t form an experiment ¹Qht : h 2 Rd º which
is mixed normal in the sense of Definition 6.2; it represents scaled Brownian motion
with unknown drift time-transformed by the level crossing times of an independent
increasing process A.
Let us state the necessary details more carefully before giving a proof. Write D for
the space of d -dimensional càdlàg functions Œ0, 1/ ! Rd equipped with Skorohod
topology and Borel -field D (see [64, p. 292]). .D, D/ is a Polish space, and D
coincides with the -field generated by the coordinate projections.
T With notation D
. t / t0 for the canonical process on .D, D/, write G t :D r>t .s : 0 s r/.
Next, in dimension 1, we define .DC , DC / as the restriction of the Skorohod space of
one-dimensional càdlàg functions to the closed subspace of non-decreasing functions
h starting at h.0/ D 0 (cf. [64, p. 306]): then .DC , DC / is again Polish.
T With notation
C
D . t / t0 for the canonical process on .DC , DC /, write G t :D r>t . s : 0
s r/. Then the laws Qh defined by .˘/ live on the product space .H , H , H/
.˘˘/ H :D D DC , H :D D ˝ DC , H t :D G t ˝ G tC , H :D .H t / t0 .
We write ., / for the canonical process on .H , H , H/ and have the following:
(˛) All laws Qh , h 2 Rd , are locally equivalent relative to H.
(ˇ) The density process of Qh with respect to Q0 relative to H is
² ³
1
Lh=0 D .L t / t0 , L t :D exp h> t h> ƒ
h=0 h=0
t h .
2
m.1,h/ :D 0 Œƒh
for the Qh -local martingale part of the first component of the canonical pro-
cess ., / on .H , H , H/. Here L . j Qh / D L . j P 0 / does not depend on the
parameter h 2 Rd , and
L m.1,h/ j Qh D L ƒ1=2 B ı j P 0 for all h 2 Rd
e
where id is the deterministic process taking value t at time t . We define H-stopping
times
et :D inf¹v : ev > t º , 0 < t < 1
e /. By the properties of the increasing process A under P 0 , we have
e, H
on .H
for all h 2 Rd and all 0 < t < 1, and L. et j Q e h / does not depend on h 2 Rd .
Put e0 0 and note that paths t ! et and t ! e e
t are càdlàg on ŒŒ0, ŒŒ where
:D inf¹t : et D 1º equals C1 Q e h -almost surely for all h 2 Rd .
e h on H
(2) By definition of the laws Q e, He, H
e , Brownian motion B being indepen-
dent from the increasing process A under P 0 (and trivially independent from the second
Section 6.3 Time Changes for Brownian Motion with Unknown Drift 175
component which is deterministic), Example 5.15 immediately extends to H e, H
e, H e :
thus
e h loc Q
Q e for all h 2 Rd ,
and the density process of Qe h with respect to Qe 0 relative to H e is
° 1 ±
Lh=0 D .e
e Lh=0
t / t0 ,
e
Lh=0
t :D exp h>e t h> ƒt h .
2
(3) In the statistical model of step (2), we can change time according to Theo-
rem 5.12(c) – this hinges on the property ./ for all stopping times et , 0 < t < 1, in
step (1) – and consider mappings
‰ : .e , e, e/ ! ‰ .e , e, e/ :D e ı e, e
e / to .H , H / which have the properties
e, H
from .H
e , 0 < t < 1, L ‰ j Q
‰ 1 .H t / D H e h D Qh , h 2 Rd .
e
t
Thus properties (˛), (ˇ), ( ) and (ı) for the statistical model .˘/ and .˘˘/ hold as
consequences of step (2).
6.17 Remark. (1) For 0 < ˛ < 1, the one-sided stable process S .˛/ with index ˛ is
defined from independent and stationary increments having Laplace transforms
˛
E e .Sr2 Sr1 / D e .r2 r1 / , 0 , 0 r1 < r2 < 1 .
˛ ˛
This process is a functional of Poisson random measure .ds, dx/ on .0, 1/ .0, 1/
with intensity .ds, dx/ D ds .1˛/˛
x ˛1 dx on .0, 1/ .0, 1/
Z tZ
.˛/
St D x .ds, dx/ , t 0,
0 .0,1/
(see [62], or [10, p. 73)]). Paths of S .˛/ are càdlàg and strictly increasing – they have
positive and summable jumps, increase only by jumps, and have an infinite number of
.˛/
infinitesimally small jumps over every finite time interval – such that S0 D 0 and
.˛/
lim t!1 S t D 1.
(2) Let V .˛/ denote the process inverse to S .˛/ , i.e. the process of level crossing
times
V t.˛/ :D inf¹v > 0 : Sv.˛/ > t º , 0 t < 1 .
176 Chapter 6 Quadratic Experiments and Mixed Normal Experiments
The process V .˛/ is called the Mittag–Leffler process of index 0 < ˛ < 1. Paths of
V .˛/ are continuous and non-decreasing such that V0.˛/ D 0 and lim t!1 V t.˛/ D 1.
.˛/
Laplace transforms of V t are given by
1
X
V t.˛/
./n
E e D t n˛ , 0, t 0
nD0
.1 C n˛/
.˛/
and L.V t / admits finite moments of arbitrary order. See [24, p. 453], [12],
or [131].
6.18 Example. At the start of Construction 6.16, for 0 < ˛ < 1, let us take the
increasing process A as the one-sided stable process S .˛/ with index ˛ , thus D V .˛/
in Construction 6.16 is the Mittag–Leffler process of index ˛. We put ƒ :D I for
simplicity. In .˘/ in Construction 6.16 we consider the mixed normal statistical model
Qh :D L B.V t.˛/ / C h V t.˛/ , V t.˛/ t0 , h 2 Rd .
Brownian motion B and the Mittag–Leffler process V .˛/ are independent. Corre-
sponding to observation over the time interval Œ0, t , likelihoods in ( ) of Construc-
tion 6.16 are of type
² ³
h0 = h 0 > .˛/ 1 0 > .˛/ 0
Lt D exp .h h/ B.V t / .h h/ V t I .h h/
2
We remark that the law ./ does not admit finite second moments (see Exercise 6.18’)
for 0 < ˛ < 1.
Under mixed normality, recall that comparison of estimation errors works condi-
tionally on the observed information. As strengthened in Remark 6.6”, the best con-
centrated distribution according to the Convolution Theorem 6.6, its Corollary 6.6’
and the Minimax Theorem 6.8 is not required to have finite second or higher moments;
Example 6.18 again illustrates this fact.
Section 6.3 Time Changes for Brownian Motion with Unknown Drift 177
6.18’ Exercise. We prove that the law ./ in Example 6.18 does not admit finite second mo-
ments:
With notations of Remark 6.17, for 0 < ˛ < 1, deduce from the definition of V .˛/ as
process inverse of S .˛/ and from scaling properties of S .˛/ that
" #˛ !
.˛/ 1
L V1 DL ;
S1.˛/
this representation is sometimes used as a definition of a Mittag–Leffler law. Write now
Z " #!
1 1
.˛/ ˛
.C/ L.V1.˛/ /.du/ D E .˛/
D E S1
.0,1/ u V1
and check that the integral in (+) equals C1:
.˛/
For 0 < ˛ < 1, let G denote the distribution function of S1 , then
1
1 G.t / t ˛ as t ! 1
.1 ˛/
by [24, p. 448] or [12, p. 361)]. For 0 < p ˛, we obtain
p p
E S1.˛/ < 1 for p < ˛ , E S1.˛/ D 1 for p D ˛
from a representation
Z N
p
E S1.˛/ D lim x p d.1 G.x//
N !1 0
The following notations will be used throughout the chapter. The parameter space ‚ is
an open subset of Rd , and we have a sequence .En /n of statistical experiments which
are parameterised by ‚
7.1 Definition. (a) A sequence of experiments .En /n as above is called locally asymp-
totically quadratic at # (LAQ) if relative to # there are pairs of statistics
ın D ın .#/ decreasing to 0 as n ! 1
such that the following properties (i) and (ii) are satisfied:
(i) at #, quadratic expansions
d Pn,#Cın hn 1 >
log D h>
n Sn h Jn hn C o.Pn,# / .1/ , n!1
d Pn,# 2 n
hold true, for arbitrary bounded sequences .hn /n in Rd (then, ‚ being open, we have
# C ın .#/hn in ‚ when n is large enough);
(ii) depending on #, a quadratic experiment exists
E1 D E.S , J / D , A , ¹Ph : h 2 Rd º ,
E1 D E1 .#/ , S D S.#/ , J D J.#/ , Ph D Ph .#/
(weakly in Rd Rd d ) holds as n ! 1.
(b) In particular, .En /n is called locally asymptotically mixed normal at # (LAMN)
if the limit experiment E1 D E.S , J / in (a.ii) is a mixed normal experiment as
defined in Definition 6.2 and Proposition 6.3.
Section 7.1 Local Asymptotics of Type LAN, LAMN, LAQ 181
In the LAN case, J D J.#/ is deterministic; recall that a Gaussian shift experiment
E.J / exists for every J 2 DC . Since convergence in law of Jn D Jn .#/ under Pn,#
to a deterministic limit J D J.#/ is equivalent to convergence in probability, we can
write under LAN
Jn D J C oPn,# .1/ as n ! 1
and can replace in this case the statistics Jn D Jn .#/ in part (a) of Definition 7.1 by
the deterministic quantity J D J.#/ simultaneously for all n 1.
For local asymptotics at some reference point # 2 ‚ we shall always use the
following notations:
The name ‘central sequence’ suggests a benchmark sequence: in fact, we will see
below that it allows us to judge estimation errors at the reference point # under LAMN
simultaneously with respect to a broad variety of loss functions. In a setting of i.i.d.
observations, we will give examples for LAN in Section 7.4; stochastic process ex-
amples for LAN, LAMN or LAQ will be discussed in Chapter 8. For local scale at #,
besides the well-known ın .#/ D n1=2 encountered under certain conditions, various
other rates – slower or faster – will occur in the stochastic process examples of Chap-
ter 8. In particular, we underline the following: in a given sequence of models .En /n ,
we may have at different points # 2 ‚ different rates ın .#/ # 0 and different limit
experiments E1 .#/.
We begin to discuss the statistical implications of the LAQ setting. Note that
Definition 7.1 did not require equivalence or absolute continuity for probability
measures Pn, 0 , Pn, , 0 ¤ , in the experiments En at the pre-limiting stage
h=0
n < 1. In particular, log-likelihoods ƒn,# in En may take the values ˙1 with pos-
itive Pn,# -probability: recall the definition of R-tightness from Notations 3.1 and 3.1’.
7.2 Proposition. When LAQ holds at #, for bounded sequences .hn /n in Rd , log-
hn =0 hn =0
likelihoods .ƒn,# /n and likelihoods .Ln,# /n under Pn,# are R-tight as n ! 1,
and convergence of .hn /n to a limit h 2 Rd implies
hn =0
ƒn,# ƒh=0
n,#
D o.Pn,# / .1/ , Lhn,#
n =0
Lh=0
n,#
D o.Pn,# / .1/ , n ! 1.
hn =0
Ln,# Lh=0
n,#
D o.Pn,# / .1/
for convergent sequences hn ! h from ./ and ./ and step (2).
where ƒh=0 is the log-likelihood ratio of Ph with respect to P0 in the quadratic limit
experiment E1 .#/. By Definition 6.1 and Remark 6.1’, probability measures in the
limit experiment are equivalent. Hence Le Cam’s first lemma applies and establishes
.˘/ in this case, cf. Lemma 3.5 and Remark 3.5’.
(2) Directly from Definition 3.2, mutual contiguity .˘/ holds if and only if any
subsequence .nk /k of N contains a further subsequence .nk` /` along which
Pnk ,#Cınk .#/hnk ` CB Pnk ,# `
` ` ` `
184 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
Among competing sequences of estimators for the unknown parameter in the se-
quence of experiments .En /n , one would like to identify – if possible – sequences
which are ‘asymptotically optimal’. Fix # 2 ‚. In a setting of local asymptotics at
# with local scale .ın .#//n , the basic idea is as follows. To any estimator Tn for the
unknown parameter in En associate
b :D ¹PK : P 2 P º
P on b b
.E, E/
b
L.y, y 0 / :D L.y/ on .E E 0, E ˝ E 0/
b b
(ii) in the model .E, b /, statistics U taking values in .E 0 , E 0 / and realising the
E, P
prescribed laws
Z
0
PK.A / :D P .dy/ K.y, A0 / , A0 2 E 0 ,
E
b
under the probability measure PK 2 P
b b
exist: on .E, E/, we simply define the random variable U as the projection E b 3
0 / ! y 0 2 E 0 on the second component which gives P .U 2 A0 / D PK.A0 /
.y, y K
for A0 2 E 0 .
The experiment .E, b b b / is called a Markov extension of .E, E, P / ; clearly every
E, P
statistic Y already available in the original experiment is also available in .E,b b b/
E, P
0
via lifting Y .y, y / :D Y .y/.
h .d , ds, dj , du/ D e e
e 0 .d , ds, dj , du/ .
Projecting on the components .s, j , u/, we thus have proved for any limit point h 2 Rd
and any sequence .hl /l converging to h
.CC/ L Snl , Jnl , Unl j Pnl ,#Cınl hl ! h , l ! 1
where – with 0 as in ./ above – the probability measures h are defined from 0 by
>s 1 h>j h
.CCC/ h .ds, dj , du/ :D e h 2 0 .ds, dj , du/ on Rd Rd d Rk .
Note that the statistical model ¹h : h 2 Rd º arising in (+++) is attached to the partic-
ular accumulation point 0 for ¹L.Sn , Jn , Un j Pn,# / : n 1º which was selected in
./ above, and that different accumulation points for ¹L.Sn , Jn , Un j Pn,# / : n 1º
lead to different models (+++).
(2) We construct a Markov extension of the limit experiment E.S , J / carrying a
random variable U which allows to identify h in (++) as L .S , J , U j Ph /, for all
h 2 Rd .
Let sL , jL, uL denote the projections which map .s, j , u/ 2 Rd Rd d Rk to either
one of its coordinates s 2 Rd or j 2 Rd d or u 2 Rk . In the statistical model
¹h : h 2 Rd º fixed by (+++), the pair .Ls , jL/ is a sufficient statistic. By sufficiency,
conditional distributions given .Ls , jL/ of Rk -valued random variables admit regular
188 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
for all h 2 Rd . Combining ./ and (++) with Proposition 7.3, we can identify the last
expression with
.S ,J /
Ph .ds, dj / K..s, j /, du/
for all h 2 Rd . The Markov extension . / of the original limit experiment allows us
to write this as
Pb .S ,J ,U / .ds, dj , du/
h
b to Rk . Hence we have proved
where U denotes the projection U.!, u/ :D u from
From now on, we need no longer distinguish carefully between the original limit
experiment and its Markov extension. From ./ and ./ in the last proof, we see that
any accumulation point of ¹L.Sn , Jn , Un jPn,# / : n 1º can be written in the form
L.S , J jP0 /.ds, dj /K..s, j /, du/ for some transition probability from Rd Rd d to
Rk . In this sense, statistical models ¹h : h 2 Rd º which can arise in (++) and (+++)
correspond to transition probabilities K., / from Rd Rd d to Rk .
Section 7.1 Local Asymptotics of Type LAN, LAMN, LAQ 189
7.6 Theorem. Assume LAQ at #. For any estimator sequence .Tn /n for the unknown
parameter in .En /n , let Un D Un .#/ D ın1 .Tn #/ denote the rescaled estimation
errors of Tn at #, n 1. Assume joint weak convergence in Rd Rd d Rd
L . Sn , Jn , Un j Pn,# / ! L . S , J , U j P0 / as n ! 1
where U is a statistic in the (possibly Markov extended) limit experiment E1 .#/ D
E.S , J /. Then we have for arbitrary convergent sequences hn ! h
L Sn , Jn , Un hn j Pn,#Cın hn ! L . S , J , U h j Ph /
.C/
as n ! 1
for bounded and continuous loss functions `./ on Rd and for arbitrarily large constants
C < 1.
Proof. (1) Assertion (+) of the theorem corresponds to the assertion of Lemma 7.5, un-
der the stronger assumption of joint weak convergence of .Sn , Jn , Un / under Pn,# as
n ! 1, without selecting subsequences. For loss functions `./ in Cb .Rd /, (+) con-
tains the following assertion: for arbitrary convergent sequences hn ! h, as n ! 1,
En,#Cın hn ` ın1 .Tn .# C ın hn // D En,#Cın hn . `.Un hn //
. /
! Eh . `.U h// .
(2) We prove that in the limit experiment E1 .#/ D E.S , J /, for `./ continuous
and bounded,
h ! Eh . `.U h// is continuous on Rd .
Consider convergent sequences hn ! h. The structure in Definition 6.1 of likelihoods
in the limit experiment implies pointwise convergence of Lhn =0 as n ! 1 to Lh=0 ;
these are non-negative, and E0 .Lh=0 / D 1 D E0 .Lhn =0 / holds for all n. This gives
(cf. [20, Nr. 21 in Chap. II])
the sequence Lhn =0 under P0 , n 1, is uniformly integrable.
For `./ continuous and bounded, we deduce
the sequence Lhn =0 `.U hn / under P0 , n 1, is uniformly integrable
which contains the assertion of step (2): it is sufficient to write as n ! 1
Ehn . `.U hn // D E0 Lhn =0 `.U hn /
! E0 Lh=0 `.U h/ D Eh `.U h/ .
190 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
(3) Now it is easy to prove (++): in the limit experiment, thanks to step (2), we can
rewrite assertion . / of step (1) in the form
Assume that for some C the sequence .˛n .C //n does not tend to 0. Then there is a
sequence .hn /n in the closed ball ¹jhj C º and a subsequence .nk /k of N such that
for all k
ˇ
ˇ ˇˇ
ˇEnk ,#Cınk hnk ` Unk hnk Ehnk `.U hnk / ˇ > "
for some " > 0. The corresponding .hnk /k taking values in a compact, we can find
some further subsequence .nk` /` and a limit point hL such that convergence hnk` ! hL
holds as ` ! 1, whereas
ˇ ˇˇ
ˇ
ˇEnk ,#Cınk hnk ` Unk` hnk` Ehnk ` U hnk` ˇ > "
` ` ` `
We can rephrase Theorem 7.6 as follows: when LAQ holds at #, any estimator
sequence .Tn /n for the unknown parameter in .En /n satisfying a joint convergence
condition
7.7 Corollary. Under LAQ in #, any sequence .Tn /n with the coupling property .˘/
satisfies
ˇ ˇ
sup ˇEn,#Cın h ` ın1 .Tn .# C ın h// Eh .`.Z h//ˇ ! 0 , n ! 1
jhjC
for continuous and bounded loss functions `./ and for arbitrary constants C < 1.
This signifies that .Tn /n works over shrinking neighbourhoods of #, defined through
local scale ın D ın .#/ # 0 , as well as the maximum likelihood estimator Z in the
limit model E1 .#/ D E.S , J /.
Recall that in Corollary 7.7, the laws L.Z hjPh / may depend on the parameter
h 2 Rd , and may not admit finite higher moments (cf. Remark 6.6”; examples will be
seen in Chapter 8). Note also that the statement of Corollary 7.7 – the ‘best’ result as
long as we do not assume more than LAQ – should not be mistaken as an optimality
criterion: Corollary 7.7 under .˘/ is simply a result on risks of estimators at # – in
analogy to Theorem 7.6 under the joint convergence condition ./ – which does not
depend on a particular choice of a loss function, and which is uniform over shrinking
neighbourhoods of # defined through local scale ın D ın .#/ # 0 . We do not know
about the optimality of maximum likelihood estimators in general quadratic limit
models (recall the remark following Theorem 6.8).
where the limiting law F does not depend on the value of the local parameter h 2 Rd .
192 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
7.9 Example. Under LAMN or LAN at #, estimator sequences .Tn /n in .En /n linked
to the central sequence .Zn /n at # by the coupling condition of Corollary 7.7
are regular at #. This is seen as follows. As in the remarks preceding Corollary 7.7,
rescaled estimation errors Un :D ın1 .#/.Tn #/ satisfy a joint convergence condi-
tion with U D Z on the right-hand side:
L . Sn , Jn , Un j Pn,# / ! L . S , J , Z j P0 / , n ! 1,
When LAN holds at #, F :D L.Z hjPh / does not depend on h 2 Rd , see Proposi-
tion 5.3(b). When LAMN holds at #, F e :D L.J , Z hjPh / does not depend on h ,
see Definition 6.2 together with Proposition 6.3(iv). This establishes regularity at # of
sequences .Tn /n which satisfy condition .˘/, under LAN or LAMN.
7.9’ Exercise. Let E denote the location model ¹F ./ D F0 . / : 2 Rº generated by the
doubly exponential distribution F0 .x/ D 12 e jxj dx on .R, B.R//. Write En for the n-fold
product experiment. Prove the following, for every reference point # 2 ‚:
(a) Recall from Exercise 4.1’’’ that E is L2 -differentiable at D #,p and use Le Cam’s
Second Lemma 4.11 to establish LAN at # with local scale ın .#/ D 1= n.
(b) The median of the first n observations (which is the maximum likelihood estimator in
this model) yields a regular estimator sequence at #. The same holds for the empirical mean,
or for arithmetic means between upper and lower empirical ˛-quantiles, 0 < ˛ < 12 fixed.
Also Bayesians with ‘uniform over R prior’ as in Exercise 5.4’
R1
L=0
n d
Tn :D R11 =0
, n1
1 Ln d
are regular in the sense of Definition 7.8. Check this using only properties of a location
model.
Section 7.2 Asymptotic optimality of estimators in the LAN or LAMN setting 193
7.9” Exercise. We continue ExerciseP 7.9’, with notations and assumptions as there. Focus
on the empirical mean Tn D n1 niD1 Xi as estimator for the unknown parameter. Check
that the sequence .Tn /n , regular by Exercise 7.9’(b), induces the limit law F D N .0, 2/ in
Definition 7.8(a).
Then give an alternative proof for regularity of .Tn /n based on the LAN property: from joint
convergence of
!
1 X 1 X
n n
p sgn.Xi #/ , p .Xi #/ under Pn,#
n i D1 n i D1
specify the two-dimensional normal law which arises as limit distribution for
p
L ƒh=0n,# , n.Tn #/ j Pn,# as n ! 1 ,
using Le Cam’s Third Lemma 3.6 in the particular form of Proposition 3.6”.
Proof. We prove (b) first. With notation Un :D ın1 .Tn #/, regularity means that for
e,
some law F
L Jn , Un h j Pn,#Cın h ! F e, n!1
e does not depend on h 2 Rd . Selecting subsequences
(weakly in Rd Rd d ) where F
e has a representation
according to Lemma 7.5, we see that F
.C/ e D L .J , U h j Ph /
F not depending on h 2 Rd
where U is a statistic in the (possibly Markov extended) limit experiment E.S , J /.
Then U in (+) is a strongly equivariant estimator for the parameter h 2 Rd in the
194 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
mixed normal limit experiment E.S , J /, and the Convolution Theorem 6.6 applies
to U and gives the assertion. To prove (a), which corresponds to deterministic J , we
use a simplified version of the above, and apply Boll’s Convolution Theorem 5.5.
Recall from Definition 5.6 that loss functions on Rd are subconvex if all levels sets
are convex and symmetric with respect to the origin in Rd . Recall from Anderson’s
Lemma 5.7 that for `./ subconvex,
Z Z
`.u/ ŒN .0, j 1 / Q0 .du/ `.u/ N .0, j 1 /.du/
for every j 2 DC and any law Q0 on Rd . Anderson’s lemma shows that best concen-
trated limit distributions in the Convolution Theorem 7.10 are characterised by
Q D 0 under LAN at #,
.7.100 /
Qj D 0 for P0 -almost all j 2 Rd d under LAMN at #.
Estimator sequences .Tn /n in .En /n which are regular at # and attain in the convolu-
tion theorem the limit distribution (7.10’) are called efficient at #.
7.10” Exercise. In the location model generated from the two-sided exponential distribution,
continuing Exercises 7.9’ and 7.9”, check from Exercise 7.9” that the sequence of empirical
means is not efficient.
In some problems we might find efficient estimators directly, in others not. Under
some additional conditions – this will be the topic of Section 7.3 – we can apply
a method which allows us to construct efficient estimator sequences. We have the
following characterisation.
7.11 Theorem. Consider estimators .Tn /n in .En /n for the unknown parameter. Under
LAMN or LAN at #, the following assertions (i) and (ii) are equivalent:
(i) the sequence .Tn /n is regular and efficient at # ;
(ii) the sequence .Tn /n has the coupling property .˘/ of example 7.9 (or of Corol-
lary 7.7):
Proof. We consider the LAMN case (the proof under LAN is then a simplified version).
Consider a sequence .Tn /n which is regular at #, and write Un D ın1 .#/.Tn #/.
The implication (ii)H)(i) follows as in Example 7.9 where we have in particular
under (ii)
L Jn , .Un h/ j Pn,#Cın h ! L .J , .Z h/ j Ph /
Section 7.2 Asymptotic optimality of estimators in the LAN or LAMN setting 195
for every h. But LAMN at # implies according to Definition 6.2 and Proposition 6.3
Z
L .J , .Z h/ j Ph / .A/ D L .J , Z j P0 / .A/ D PJ .dj / N .0, j 1 /.du/ 1A .j , u/
Now we exploit the efficiency assumption for .Tn /n at #: according to (7.10’) above
we have
Qj D 0 for PJ -almost all j 2 Rd d .
Since the Convolution Theorem 6.6 identifies .j , B/ ! Qj .B/ as a regular version
U ZjJ Dj
of the conditional distribution P0 .B/, the last line establishes
U DZ P0 -almost surely.
Using .ı/ and the continuous mapping theorem, this gives
L Unl Znl j Pnl ,# ! L .U Z j P0 / D 0 , l ! 1.
But convergence in law to a constant limit is equivalent to stochastic convergence,
thus
.ıı/ Unl D Znl C o.Pn / .1/ , l ! 1.
l ,#
196 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
We have proved that every subsequence of the natural numbers contains some further
subsequence .nl /l which has the property .ıı/: this gives
However, there might be interesting estimator sequences .Tn /n for the unknown
parameter in .En /n which are not regular as required in the Convolution Theo-
rem 7.10, or we might be unable to prove regularity: when LAMN or LAN holds at a
point #, we wish to include these in comparison results.
7.12 Local Asymptotic Minimax Theorem. Assume that LAMN or LAN holds at #,
consider arbitrary sequences of estimators .Tn /n for the unknown parameter in .En /n ,
and arbitrary loss functions `./ which are continuous, bounded and subconvex.
(a) A local asymptotic minimax bound
lim inf lim inf sup En,#Cın h ` ın1 .Tn .# C ın h// E0 . `.Z//
c!1 n!1 jhjc
holds whenever .Tn /n has estimation errors at # which are tight at rate .ın .#//n :
L ın1 .#/.Tn #/ j Pn,# , n 1 , is tight in Rd .
(b) Sequences .Tn /n satisfying the coupling property .˘/ of Example 7.9
attain the local asymptotic minimax bound at #. One has under this condition
lim sup En,#Cın h ` ın1 .Tn .# C ın h// D E0 . `.Z//
n!1 jhjc
Proof. We give the proof for the LAMN case (again, the proof under LAN is a simpli-
fied version), and write Un D ı 1 .Tn #/ for the rescaled estimation errors of Tn
at #.
(1) Fix c 2 N. The loss function `./ being non-negative and bounded,
is necessarily finite. Select a subsequence of the natural numbers along which ‘liminf’
in the last line can be replaced by ‘lim’, then – using Lemma 7.5 – pass to some further
subsequence .nl /l and some statistic U in the limit experiment E1 .#/ D E.S , J / (if
Section 7.2 Asymptotic optimality of estimators in the LAN or LAMN setting 197
necessary, after Markov extension) such that the following holds for arbitrary limit
points h and convergent sequences hl ! h:
L Snl , Jnl , Unl hl j Pnl ,#Cınl hl ! L . S , J , U h j Ph / , l ! 1 .
From this we deduce as in Theorem 7.6 as l ! 1
ˇ ˇ
ˇ ˇ
.C/ sup ˇEnl ,#Cınl h ` Unl h Eh . ` .U h//ˇ ! 0
jhjc
since ` 2 Cb . Recall that h ! Eh . ` .U h// is continuous, see step (2) in the proof
of Theorem 7.6. Write c for the uniform law on the closed ball Bc in Rd centred at
0 with radius c, as in Lemma 6.7 and in the proof of Theorem 6.8. Then by uniform
convergence according to (+)
sup Enl ,#Cınl h ` Unl h !
jhjc
Z
sup Eh . ` .U h// c .dh/ Eh . ` .U h//
jhjc
where remainder terms .c/ involve an upper bound for `./ and vanish as c increases
to 1. In the last line, at every stage c 2 N of the asymptotics, Anderson’s Lemma 5.7
allows for a lower bound
Z Z
R.U , Bc / P .dj / N .0, j 1 /.du/ l.u/ C .c/
J
since `./ is subconvex. According to Definition 6.2 and Proposition 6.3, the law ap-
pearing on the right-hand side is L .Z j P0 /, and we arrive at
R.U , Bc / E0 .`.Z// C .c/ where lim .c/ D 0.
c!1
198 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
(this U depends on the choice of the subsequence at the start) where for c tending to 1
Both assertions together yield the local asymptotic minimax bound in part (a) of the
theorem.
(4) For estimator sequences .Tn /n satisfying condition .˘/ in 7.9, (+) above can be
strengthened to
ˇ ˇ
sup ˇ En,#Cın h . ` .Un h// Eh . `.Z h// ˇ ! 0 , n ! 1
jhjc
for fixed c, without any need to select subsequences (Corollary 7.7). In the mixed nor-
mal limit experiment, Eh . `.Z h// D E0 .`.Z// does not depend on h. Exploiting
this we can replace the conclusion of step (1) above by the stronger assertion
7.13 Remark. Theorem 7.12 shows in particular that whenever we try to find estimator
sequences .Tn /n which attain a local asymptotic minimax bound at #, we may restrict
our attention to sequences which have the coupling property .˘/ of Example 7.9 (or of
Corollary 7.7). We rephrase this statement according to Theorem 7.11: under LAMN
or LAN at #, in order to attain the local asymptotic minimax bound of Theorem 7.12,
we may focus – within the class of all possible estimator sequences – on those which
are regular and efficient at # in the sense of the convolution theorem.
7.13’ Example. In the set of all probability measures on .R, B.R//, let us consider a
one-parametric path E D ¹P : j j < 1º in direction sgn./ through the law P0 :D
R.1, C1/, the uniform distribution with support .1, C1/, defined as in Examples
1.3 and 4.3 by
1 X
n
Sn .#/.X1 , : : : , Xn / D p V# .Xi /
n iD1
An unsatisfactory point with the last example is that n-fold product models En in
Example 7.13’ are in fact classical exponential families
b b
dPn, D .1 C /nŒ1F n .0/ .1 /nF n .0/ dPn,0 D .1 C /n exp¹ . / TLn º dPn,0 ,
2‚
1
in . / :D log. 1C / and TLn :D nF
b n .0/. Hence we shall generalise it (considering dif-
ferent one-parametric paths through a uniform law which are not exponential families)
in Example 7.21 below.
200 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
starting from any preliminary estimator sequence .Tn /n which converges at rate ın .#/
at #. This ‘one-step modification’ is explicit and requires only few further conditions
in addition to LAQ at #; the main result is Theorem 7.19 below.
In particular, when LAMN or LAN holds at #, one-step modification yields optimal
estimator sequences locally asymptotically at #, via Example 7.9 and Theorem 7.11:
the modified sequence will be regular and efficient in the sense of the Convolution
Theorem 7.10, and will attain the local asymptotic minimax bound of Theorem 7.12.
In the general LAQ case, we only have the following: the modified sequence will work
over shrinking neighbourhoods of # as well as the maximum likelihood estimator Z
in the limit experiment E1 .#/, according to Corollary 7.7 where L.Z hjPh / may
depend on h.
We follow Davies [19] for this construction. We formulate the conditions which
we need simultaneously for all points # 2 ‚, such that the one-step modifications
en /n of .Tn /n will have the desired properties simultaneously at all points # 2 ‚.
.T
This requires some compatibility between quantities defining LAQ at # and LAQ at
# 0 whenever # 0 is close to #.
depending on #.
(B) Local scale: (i) For every n 1 fixed, ın ./ : ‚ ! .0, 1/ is a measurable
mapping which is bounded by 1 (this is no loss of generality: we may always replace
ın ./ by ın ./ ^ 1).
(ii) For every # 2 ‚ fixed, we have for all 0 < c < 1
ˇ ˇ
ˇ ın .# C ın .#/ h/ ˇ
sup ˇˇ 1 ˇˇ ! 0 as n ! 1 .
jhjc ın .#/
Section 7.3 Le Cam’s One-step Modification of Estimators 201
(C) Score and observed information: For every # 2 ‚ fixed, in restriction to the set
of dyadic numbers S :D ¹˛2k : k 2 N0 , ˛ 2 Zd º in Rd , we have
ˇ ® ¯ˇ
sup ˇSn . / Sn .#/ Jn .#/ ı 1 .#/. #/ ˇ
n
.i/ 2 S\‚ , j#j c ın .#/
D oPn,# .1/ , n ! 1
Dn :D ın .Tn / 2 .0, 1 , n 1.
Then we have for every # 2 ‚
Dn
D 1 C oPn,# .1/ , n!1.
ın .#/
(b) For every n 1, define a N0 -valued random variable .n/ by
Fix # 2 ‚ and write Un .#/ :D ın1 .#/.Tn #/ for the rescaled estimation errors
at #. Combining tightness of L .Un .#/jPn,# / as n ! 1 according to Assumption
7.14(D) with a representation Dn D ın .Tn / D ın . # C ın .#/ Un .#/ / and with
Assumption 7.14(B.ii) we obtain (a). From 2..n/C1/ Dn 2.n/ we get
..n/C1/ D , thus (b) follows from (a).
2 Dn 2
1
n
for every # 2 ‚.
together with a default value Gn D #0 on Bnc . Since ‚ is open and rescaled estimation
errors of .Tn /n at # are tight at rate .ın .#//n , since .Dn /n or .e .n/ /n defined in
Proposition 7.15(b) are random tightness rates which are equivalent to .ın .#//n under
.Pn,# /n , we have by construction
together with
p
jGn Tn j < d 2.n/ on Bn , for every n 2 N
such that the following (i) and (ii) hold for every # 2 ‚:
.i/ b
J n D Jn .#/ C oPn,# .1/ as n ! 1,
Proof. Note first that Jn .#/ under Pn,# takes values in DC almost surely, by definition
of LAQ in Definition 7.1, but the same statement is not clear for Jn . / under Pn,#
(we did not require equivalence of laws Pn,# , Pn, for ¤ #). Since Gn takes values
in the countable set S \ ‚, the random variable b J n on .n , An / is well defined and
Rd d -valued; then also K b n is well defined since DC is a Borel set in Rd d .
(1) For every # 2 ‚, we deduce (i) from part (C.ii) of Assumption 7.14, via a
representation
b
J n D Jn .Gn / D Jn . # C ın .#/UL n .#/ / , UL n .#/ :D ın1 .#/.Gn #/
in Rd d Rd d as n ! 1. The mapping on Rd d
² 1
j if det.j / ¤ 0
: j !
Id else
In particular, L.b
S n jPn,# / is tight as n ! 1 for every # 2 ‚.
Proof. Again b
S n D Sn .Gn / is well defined. Fix # 2 ‚ and write
b
S n D Sn .Gn / D Sn . # C ın .#/UL n .#/ / , UL n .#/ :D ın1 .#/.Gn #/
where Gn is S \ ‚-valued and where L.UL n .#/jPn,# / is tight as n ! 1. Then we
write
ˇ
ˇ ® ¯ˇˇ
Pn,# ˇSn .Gn / Sn .#/ Jn .#/ UL n .#/ ˇ > " Pn,# jUL n .#/j > c
ˇ ® ¯ˇ
C Pn,# sup ˇ Sn . / Sn .#/ Jn .#/ ı 1 .#/. #/ ˇ > "
n
2S\‚,j#jcın .#/
7.19 Theorem. (a) With these assumptions and notations, the one-step modification
T bn b
en :D Gn C Dn K Sn , n1
en /n for the unknown parameter in .En /n which has
yields an estimator sequence .T
the property
en #/ D Zn .#/ C o.P / .1/ ,
ın1 .#/.T n!1
n,#
for every # 2 ‚.
(b) In the sense of Corollary 7.7, for every # 2 ‚, .T en /n works over shrinking
neighbourhoods of # as well as the maximum likelihood estimator in the limit model
E1 .#/ D E.S.#/, J.#// .
(c) If LAMN or LAN holds at #, the sequence .T en /n is regular and efficient at # in
the sense of the Convolution Theorem 7.10, and attains the local asymptotic minimax
bound at # according to the Local Asymptotic Minimax Theorem 7.12.
Proof. Only (a) requires a proof. Fix # 2 ‚. Then from Propositions 7.15 and 7.17
en #/ D ın1 .#/.Gn #/ C Dn b b
ın1 .#/.T Kn S n
ın .#/
D ın1 .#/.Gn #/ C 1 C oPn,# .1/ Jn1 .#/ C oPn,# .1/ b
Sn
D ın1 .#/.Gn #/ C Jn1 .#/ b
S n C oP .1/
n,#
where terms ın1 .#/.Gn #/ cancel out and the last line simplifies to
E D . , A , P D ¹P : 2 ‚º /
h=0
p
.Ch= n/= d Pn,Ch=pn
ƒn, D ƒn D log
d Pn,
p
for 2 ‚ and h 2 ‚,n , the set of all h 2 Rd such that C h= n belongs to ‚. We
shall assume
8
< there is an open set ‚0 Rd contained in ‚ such
./ that the following holds: for every # 2 ‚0 , the experiment
:
E is L2 -differentiable at D # with derivative V# .
that # 2 ‚0 implies
Recall from Assumptions 4.10, Corollary 4.5 and Definition 4.2
that V# is centred and belongs to L2 .P# /, write J# D E# V# V#> for the Fisher
information in the sense of Definition 4.6. Then Le Cam’s Second Lemma 4.11 yields
a quadratic expansion of log-likelihood ratios
p
.#Chn = n/ 1 >
ƒn D h> n Sn .#/ h J# hn C oPn,# .1/
2 n
as n ! 1, for arbitrary bounded sequences .hn /n in Rd , at every reference point
# 2 ‚0 , with
1 X
n
Sn .#/ D p V# .Xi /
n
j D1
In terms of Definitions 5.2 and 7.1 we can rephrase Le Cam’s Second Lemma 4.11 as
follows:
We present some examples of i.i.d. models for which Le Cam’s second lemma
establishes LAN at all parameter values. The aim is to specify efficient estimator
Section 7.4 The Case of i.i.d. Observations 207
g.x/ :D sin.x/ ,
1
P .dx/ :D .1 C g.x// P0 .dx/ D .1 C sin.x// dx , x 2 I
2
as in Examples 1.3 and 4.3, where the parameterisation is motivated by
1
F .0/ D , 2‚
2
with F ./ the distribution function corresponding to P
1
F .x/ D .Œx C Œcos.x/ C 1/ when x2I.
2
(1) According to Example 4.3, the model E is L2 -differentiable at D # with
derivative
g sin.x/
V# .x/ D .x/ D , x2I
1C#g 1 C # sin.x/
at every reference point # 2 ‚. As resumed in Theorem 7.20, LepCam’s second lemma
yields LAN at # for every # 2 ‚, with local scale ın .#/ D 1= n and score
1 X
n
Sn .#/.X1 , : : : , Xn / D p V# .Xi /
n
iD1
J# D E# .V#2 / < 1 , # 2 ‚.
1X 2
n
Jn .#/ D V# .Xi /
n
iD1
in the quadratic expansion of log-likelihood ratios in the local model at #, and write
p
1 >
D h>
.#Chn = n/
.C/ ƒn n Sn .#/ h Jn .#/ hn C oPn,# .1/
2 n
as n ! 1, for arbitrary bounded sequences .hn /n in Rd .
208 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
(2) We show that the set of Assumptions 7.14 is satisfied. A preliminary estimator
1 b n .0/
Tn :D F
2
p
for the unknown parameter # in En is at hand, clearly n-consistent as n ! 1.
From step (1), parts (A) and (B) of Assumptions 7.14 are granted; part (C.ii) holds by
continuity of Jn .#/ in the parameter. Calculating
® ¯
Sn . / Sn .#/ Jn .#/ ın1 .#/. #/ , with Jn .#/ as in (+)
(the quantities arising in part (C.i) of Assumption 7.14), we find
1X 2
n
p 1 C # sin.Xi /
Sn . / Sn .#/ D n. #/ V# .Xi /
n 1 C sin.Xi /
iD1
However, even if the set of assumptions 7.14 can be checked in a broad variety of
statistical models, not all of the assumptions listed there are harmless.
7.22 Example. Let E D ., A, ¹P : 2 Rº/ be the location model on .R, B.R//
generated from the two-sided exponential distribution P0 .dy/ :D 12 e jyj dy ; this
example has already been considered in exercises 4.1000 , 7.90 and 7.900 . We shall see
that assumption 7.14 C) i) on the score with estimated parameter does not hold, hence
one-step correction according to theorem 7.19 is not applicable. However, it is easy
–for this model– to find optimal estimator sequences directly.
1) For every # 2 ‚ :D R, we have L2 -differentiability at D # with derivative
V# given by
V# .x/ D sgn.x #/
000
p
(cf. exercise 4.1 ). For all #, put ın .#/ D 1= n. Then Le Cam’s second lemma
(theorem 7.20 or theorem 4.11) establishes LAN at # with score
1 X
n
Sn .#/.y1 , : : : , yn / D p V# .yi /
n
iD1
2) We consider the set of assumptions 7.14. From 1), parts A) and B) of 7.14 are
granted; 7.14 C) part ii) is trivial since Fisher information does not depend on the
parameter. Calculate now
® ¯
Sn . / Sn .#/ Jn .#/ ın1 .#/. #/ , Jn .#/ D J# 1
P
in view of 7.14 C) part i). This takes a form p1n niD1 Yn,i where
²
2 1.#,/ .Xi / C 12 . #/ for # < ,
Yn,i :D
2 1.,#/ .Xi // 12 .# / for < # .
Rr
Defining a function .r/ :D 0 12 e y dy for r > 0, we have
1 X X
n n
Yn,i E# .Yn,i /
p Yn,i D p C oPn,# .1/
n
iD1 iD1
Var# .Yn,1 / C : : : C Var# .Yn,i /
The structure of the Yn,i implies that the Lindeberg condition holds, thus as n ! 1
1 X
n
® ¯
Sn . n / Sn .#/ Jn .#/ ın1 .#/. n #/ D p Yn,i ! N .0, 1/
n iD1
Tn :D median.X1 , : : : , Xn /
which in our model is maximum likelihood in En for every n. Our model allows to
apply a classical result on asymptotic normality of the median: from [128, p. 578]),
p
L n .Tn #/ j Pn,# ! N .0, 1/
210 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ
Recall from 1) that Fisher information in E equals J# D 1 for all #. Thus F appearing
here is the optimal limit distribution F D N .0, J#1 / in Hajek’s convolution theorem.
By (7.10’) and Theorem 7.11, the sequence .Tn /n is thus regular and efficient in the
sense of the convolution theorem, at every reference point # 2 ‚. Theorem 7.11
establishes that the coupling condition
p
n .Tn #/ D Zn .#/ C o.Pn,# / .1/ , n ! 1
holds, for every # 2 R. Asymptotically
P as n ! 1, this links estimation errors of the
median Tn to differences n1 niD1 sgn.Xi #/ between the relative number of obser-
vations above and below #. The coupling condition in turn implies that the estimator
sequence .Tn /n attains the local asymptotic minimax bound 7.12.
7.22’ Exercise. Consider the location model E D ., A, ¹P : 2 Rº/ generated from
P0 D N .0, 1/, show that L2 -differentiability holds with L2 -derivative V# D . #/ at
every point # 2 R. Show that the set of Assumptions 7.14 is satisfied, with left-hand sides
in Assumption 7.14(C) identical to zero.pConstruct an MDE sequence as in Chapter 2 for the
unknown parameter, with tightness rate n, and specify the one-step modification according
to Theorem 7.19 which as n ! 1 grants regularity and efficiency at all points # 2 ‚ (again,
as in Example 7.21, there is no need for discretisation). Verify that this one-step modification
directly replaces the preliminary estimator by the empirical mean, the MLE in this model.
7.22” Exercise. We continue with the location model generated from the two-sided exponen-
tial law, under all notations and assumptions of Exercises 7.9’, 7.9” and of Example 7.22. We
focus on the Bayesians with ‘uniform over R prior’
R1 =0
Ln d
Tn D R1 1 =0
, n1
1 Ln d
and shall prove that .Tn /n is efficient in the sense of the Convolution Theorem 7.10 and of the
Local Asymptotic Minimax Theorem 7.12.
(a) Fix any reference point # 2 R and recall (Example 7.22, or Exercises 7.9’(a), or 4.1’’’
and Lemma 4.11) that LAN holds at # in the form
1 X
n
hn =0 1 2
ƒn,# D hn Sn .#/ h C o.Pn,# / .1/ , Sn .#/ :D p sgn.Xi #/
2 n n i D1
as n ! 1 to a limit law
Z 1 Z 1
uS 12 u2 uS 12 u2
S, ue du , e du
1 1
where S N .0, 1/ generates the Gaussian limit experiment E.1/ D ¹N .h, 1/ : h 2 Rº.
(Hint: finite-dimensional convergence of .Lu=0 /
n,# u2R
allows to deal e.g. with integrals
R u=0
u Ln,# du on compacts K R, as in Lemma 2.5, then give some bound for
RK u=0
K c u Ln,# du as n ! 1).
(c) Let U denote the Bayesian with ‘uniform over R prior’ in the limit experiment E.1/
R1 1 2
u e uS 2 u du
U :D R11 uS 12 u2
1 e du
and recall from Exercise 5.4” that in a Gaussian shift experiment, U coincides with the central
statistic Z.
(d) Use (a), (b) and (c) to prove that the coupling condition
p
n Tn # D Zn .#/ C o.Pn,# / .1/ as n ! 1
holds. By Theorem 7.11, comparing to Theorem 7.10, (7.10’) and Theorem 7.12, we have
efficiency of .Tn /n at #.
Chapter 8
Some Stochastic Process Examples for Local
Asymptotics of Type LAN, LAMN and LAQ
We discuss in detail some examples for local asymptotics of type LAN, LAMN or
LAQ in stochastic process models (hence the asterisk in front of all sections). Mar-
tingale convergence and Harris recurrence (positive or null) will play an important
role in our arguments and provide limit theorems which establish convergence of lo-
cal models to a limit model. Background on these topics and some relevant apparatus
are collected in an Appendix (Chapter 9) to which we refer frequently, so one might
have a look to Chapter 9 first before reading the sections of the present chapter.
Section 8.1 Ornstein-Uhlenbeck Model 213
8.1 Ornstein–Uhlenbeck Process with Unknown
Parameter Observed over a Long Time Interval
We start in dimension d D 1 with the well-known example of Ornstein–Uhlenbeck
processes depending on an unknown parameter, see [23] and [5, p. 4]). We have a
probability space ., A, P / carrying a Brownian motion W , and consider the unique
strong solution X D .X t / t0 to the Ornstein–Uhlenbeck SDE
.8.1/ dX t D # X t dt C d W t , t 0
for some value of the parameter # 2 R and some starting point x 2 R. There is an
explicit representation of the solution
Z t
Xt D e# t x C e # s d Ws , t 0
0
satisfying
Z ´
t
#t # s N 0, 1
e 2# t 1 if # ¤ 0
L e e d Ws D 2#
0 N .0, t / if # D 0 ,
and thus an explicit representation of the semigroup .P t ., // t0 of transition proba-
bilities of X
8.2 Long-time Behaviour of the Process. Depending on the value of the parameter
# 2 ‚, we have three different types of asymptotics for the solution X D .X t / t0 to
equation (8.1).
(a) Positive recurrence in the case where # < 0: When # < 0, the process X is
positive recurrent in the sense of Harris (cf. Definition 9.4) with invariant measure
1
:D N 0, .
2j#j
This follows from Proposition 9.12 in the Appendix where the function
Z x Z y
2# v dv D e # y ,
2
S.x/ D s.y/ dy with s.y/ D exp x, y 2 R
0 0
corresponding to the coefficients of equation (8.1) in the case where # < 0 is a bi-
jection from R onto R, and determines the invariant measure for the process (8.1)
214 Chapter 8 Some Stochastic Process Examples
1
as s.x/ dx , unique up to constant multiples. Normed to a probability measure this
specifies as above.
Next, the Ratio Limit Theorem 9.6 yields for functions f 2 L1 ./
Z
1 t
lim f . s / ds D .f / almost surely as t ! 1
t!1 t 0
for arbitrary choice of a starting point. We thus have strong laws of large numbers for
a large class of additive functionals of X in the case where # < 0.
(b) Null recurrence in the case where # D 0: Here X is one-dimensional Brownian
motion with starting point x, and thus null recurrent (cf. Definition 9.5’) in the sense
of Harris. The invariant measure is , the Lebesgue measure on R.
(c) Transience in the case where # > 0: Here trajectories of X tend towards C1
or towards 1 exponentially fast. In particular, any given compact K in R will be
left in finite time without return: thus X is transient in the case where # > 0. This is
proved as follows. Write F for the (right-continuous) filtration generated by W . For
fixed starting point x 2 R, consider the .P , F /-martingale
Z t
# t
Y D .Y t / t0 , Y t :D e Xt D x C e #s d Ws , t 0 .
0
Rt
Then E ..Y t x/2 / D 0 e 2#s ds and thus
sup E Y t2 < 1
t0
Asymptotics . / can be transformed into a strong law of large numbers for some few
additive functionals of X , in particular
Z t Z t
1 2# t
Xs2 .!/ ds Y1 2
.!/ e 2 # s ds Y1
2
.!/ e as t ! 1
0 0 2#
for P -almost all ! 2 .
Proof. (1) For the process H , consider some localising sequence .n /n under P , and
some localising sequence T .n0 /n under P 0 . For
T finite time horizon N < 1 we have
PN 0 0
PN , hence events n ¹n N º and n ¹n N º in FN are null sets under
both probability measures P and P 0 . This holds for all N , thus .n ^n0 /n is a common
localising sequence under both P and P 0 .
(2) Localising further, we may assume that H as well as M , hM iP , jjjAjjj and M 0 ,
hM 0 iP 0 , jjjA0 jjj are bounded (we write jjjAjjj for the total variation process of A, and
hM iP for the angle bracket under P ). R
(3) Fix a version .t , !/ ! J.t , !/ of the stochastic integral
R Hs dXs under P , and
a version .t , !/ ! J 0 .t , !/ of the stochastic integral Hs dXs under P 0 . For n 1,
define processes .t , !/ ! I .n/ .t , !/
X1 Z t
.n/
It D H kn X t^ kC1 X t^ kn D Hs.n/ dXs ,
2 2n 2 0
kD0
1
X
H .n/ :D H kn 1 k , kC1
2 2n 2n
kD0
to be considered under both P and P 0 . Using [98, Thm. 18.4] with respect to the
martingale parts of X , select a subsequence .n` /` from N such that simultaneously as
`!1
8
ˆ
ˆ for P -almost all !: the paths I .n` / ., !/ converge uniformly on Œ0, 1/
<
to the path J., !/
ˆ for P 0 -almost all !: the paths I .n` / ., !/ converge uniformly on Œ0, 1/
:̂
to the path J 0 ., !/ .
Since PN PN0 for N < 1, there is an event AN 2 FN of full measure under both
P and P such that ! 2 AN implies J., !/ D J 0 ., !/ on Œ0, N . As a consequence,
0
216 Chapter 8 Some Stochastic Process Examples
J and J 0 are indistinguishable processes for the probability measure P as well as for
the probability measure P 0 .
with m.0/ the Q0 -local martingale part of the canonical process under Q0 .
(1) Fix a determination .t , !/ ! Y .t , !/ of the stochastic integral
Z Z Z
Y D s ds D s d m.0/ s D 0 m.0/
C m.0/
s d ms
.0/
under Q0 :
Rt
under Q# . Obviously has Q# -martingale part m.#/ t D t 0 # 0 s ds . There
are two statistical consequences:
(i) for every every 0 < t < 1, ML estimation errors under # 2 ‚ take the form of
the ratio of a Q# -martingale divided by its angle bracket under Q# :
Rt .#/
b s d ms
#t # D 0
Rt under Q# ;
2
0 s ds
(ii) for the density process L=# of Q with respect to Q# relative to G, the repre-
sentation
² Z t ³
1 2
L=0
t =L #=0
t D exp . #/ Y t . # 2
/ 2
s ds , t 0
2 0
(3) The representation (+) allows to reparameterise the model E t with respect to
fixed reference points # 2 ‚, and makes quadratic models appear around #. We will
call
Z t Z t Z
.#/ .#/
s d ms and 2
s ds D dm under Q#
0 t0 0 t0
score martingale at # and information process at #. It is obvious from 8.2 that the
law of the observed information depends on #. Hence reparameterising as in (+)
with respect to different reference points # or # 0 makes statistically different models
appear.
8.3’ Local Models at #. (1) Localising around a fixed reference point # 2 R, write
Qn for Q restricted to Gn . With suitable choice ın .#/ of local scale to be specified
below, consider local models
® n ¯
E#,n D C , Gn , Q#Cı n .#/ h : h 2 R , n1
ƒh=0
#,n
D log Ln.#Cın .#/h/=#
.˘/ Z n Z n
# 1 2 2
D h ın .#/ s d ms h ın .#/ 2s ds , h2R.
0 2 0
218 Chapter 8 Some Stochastic Process Examples
By .˘/ and in view of Definition 7.1, the problem of choice of local scale at # turns
out to be the problem of choice of norming constants for the score martingale: we need
weak convergence as n ! 1 of pairs
Z n Z n
.˘˘/ . Sn .#/ , Jn .#/ / :D ın .#/ s d m#s , ın2 .#/ 2s ds under Q#
0 0
. S.#/ , J.#/ /
which generate as in Definition 6.1 and Remark 6.1” a quadratic limit experiment.
(2) From .˘/, rescaled ML estimation errors at # take the form
Rn
1 b ın .#/ 0 s d m#s
ın .#/ # n # D 2 Rn under Q#
ın .#/ 0 2s ds
Now we show that local asymptotic normality at # holds in the case where # < 0,
local asymptotic mixed normality in the case where # > 0, and local asymptotic
quadraticity in the case where # D 0. We also specify local scale .ın .#//n .
8.4 LAN in the Positive Recurrent Case. By 8.2(a), in the case where # < 0, the
canonical process is positive recurrent under Q# with invariant probability # D
N . 0 , 2j#j
1
/. For the information process in step (3) of Model 8.3 we thus have the
following strong law of large numbers
Z Z
1 t 2 1
lim s ds D x 2 # .dx/ D D: ƒ Q# -almost surely.
t!1 t 0 2j#j
Correspondingly, if at stage n of the asymptotics we observe a trajectory over the time
interval Œ0, n, we take local scale ın .#/ at # such that
(1) Rescaling the score martingale in step (3) of Model 8.3 in space and time we put
G n :D .G tn / t0 and
Z
1=2
tn
M#n D M#n .t / t0 , M#n .t / :D n s d m.#/
s , t 0.
0
Section 8.1 Ornstein-Uhlenbeck Model 219
This yields a family .M#n /n of continuous .Q# , G n /-martingales with angle brackets
Z
˝ n˛ 1 tn 2 1
8t 0 : M# t D s ds ! t Dt ƒ
./ n 0 2j#j
Q# -almost surely as n ! 1 .
From Jacod and Shiryaev [64, Cor. VIII.3.24], the martingale convergence theorem –
we recall this in Appendix 9.1 below – establishes weak convergence in the Skorohod
space D of càdlàg functions Œ0, 1/ ! R to standard Brownian motion with scaling
factor ƒ1=2 :
(2) Combining ./ and ./ above with .˘/ in 8.3’, log-likelihoods in the local
model En,# at # are
1=2 h/=# 1 2 ˝ n˛
ƒh=0
#,n
D log Ln.#Cn D h M#n .1/ h M# 1 , h2R
2
which gives for arbitrary bounded sequences .hn /n
8 h =0
ˆ
ˆ ƒ n D hn M#n .1/ 12 h2n ƒ C oQ# .1/ as n ! 1
< #,n
. / L M#n .1/ j Q# ! N . 0 , ƒ / as n ! 1
ˆ
:̂
with ƒ D 2j#j1
.
This establishes LAN at parameter values # < 0, cf. Definition 7.1(c), and the limit
experiment E1 .#/ is the Gaussian shift E. 2j#j
1
/ in the notation of Definition 5.2.
(3) Once LAN is established, the assertion .˘˘˘/ in 8.3’ is the coupling condition
of Theorem 7.11. From Hájek’s Convolution Theorem 7.10 and the Local Asymptotic
Minimax Theorem 7.12 we deduce the following properties for the ML estimator
sequence .b# n /n : at all parameter values # < 0, the maximum likelihood estimator
sequence is regular and efficient for the unknown parameter, and attains the local
asymptotic minimax bound.
8.5 LAQ in the Null Recurrent Case. In the case where # D 0, the canonical process
under Q0 is a Brownian motion with starting point x, cf. 8.2(b), and self-similarity
properties of Brownian motion turn out to be the key to local asymptotics at # D 0, as
220 Chapter 8 Some Stochastic Process Examples
pointed out by [23] or [38]. Writing B or B e for standard Brownian motion, Ito formula
and scaling properties give
Z t Z t Z t Z t Z t
1 2
Bs dBs , jBs j ds , Bs ds D
2
B t , jBs j ds , 2
Bs ds
0 0 0 2 t 0 0
Z t ˇp ˇ Z t 2
d 1 p e 2 ˇ es ˇ p
e
D Œ t B 1 t , ˇ t B t ˇ ds , t B st ds
2 0 0
Z 1 Z 1 Z 1
d
D t Bs dBs , t 3=2 jBs j ds , t 2 Bs2 ds .
0 0 0
According to .˘/ in 8.3’, the log-likelihoods in the local model En,0 at # D 0 are
h=0 1 h/=0 1 2
. / ƒ0,n D log L.0Cn
n D h M n .1/ h hM n i1 , h2R
2
for every n 1. Thus, as a statistical experiment, local experiments En,0 D ¹Q0C n
1 :
nh
h 2 Rº at # D 0 coincide for all n 1 with the experiment E1 D ¹Qh : h 2 1
(2) We look to ML estimation in the special case of starting value x D 0 for equation
(8.1). Combining the representation of rescaled ML estimation errors .˘˘˘/ in 8.3’
with ./ in step (1), the above scaling properties allow for equality in law which does
not depend on n 1
1
.ı/ L n b #n 0 C h j QŒ0C n1 h D L b # 1 h j Qh
n
at every value h 2 R. To see this, write for functions f 2 Cb .R/
b 1
EQŒ0C 1 h f n # n 0 C h
n n
D EQŒ0C 1 h f b hn h
n
Œ0C 1 h=0
D EQ0 Ln n f bhn h
² ³ n
n 1 2 n M .1/
D EQ0 exp h M .1/ h hM i1 f h
2 hM n i1
which by ./ above is free of n 1. We can rephrase .ı/ as follows: observing over
longer time intervals, we do not gain anything except scaling factors.
(B) Now we consider the general case of starting values 0 x ¤ 0 for equation
(8.1). In this case, by the above decomposition of the score martingale, M n under Q0
is of type
Z Z
1 tn x 1 tn
.x C Bs / dBs D B tn C Bs dBs , t 0
n 0 n n 0
for standard Brownian motion B. Decomposing M n in this sense, we can control
ˇ Z sn ˇ
ˇ n 1 ˇ
sup ˇˇM .s/ . 0 /v d mv ˇˇ under Q0
.0/
0st n 0
x
in the same way as p sup
n 0st
jBs j , for arbitrary n and t , and
ˇ Z sn ˇ
ˇ n 1 ˇ
ˇ
sup ˇhM i .s/ 2 . 0 /v dv ˇˇ under Q0
2
0st n 0
R
in the same way as xnt C 2pjxj
2 t
n 0
jBs j ds , where we use the scaling property stated
at the start.
(1) In the general situation (B), the previous equality in law ./ of score and in-
formation is replaced by weak convergence of the pair (score martingale, information
process) under the parameter value # D 0
Z Z
L M n , hM n i j Q0 ! L Bs dBs , Bs2 ds
./
weakly in D.R2 / as n ! 1 .
222 Chapter 8 Some Stochastic Process Examples
Let us write e
E 1 for the limit experiment which appears in (A.1), in order to avoid
confusion about the different starting values. In our case (B) of starting value x ¤ 0
for equation (8.1), we combine . / for the likelihood ratios in the local models En,0
at # D 0 (which is .˘/ in 8.3’)
for arbitrary loss functions `./ which are continuous and bounded and for arbitrary
constants C < 1. In case (B) of starting value x ¤ 0 for equation (8.1), .ıı/ replaces
equality of laws .ı/ which holds in case (A) above. Recall that .ıı/ merely states the
following: the ML estimator b hn for the local parameter h in the local model En,# at
# D 0 works approximately as well as the ML estimator in the limit model e E 1 . In
particular, .ıı/ is not an optimality criterion.
8.6 LAMN in the Transient Case. By 8.2(c), in the case where # > 0, the canoni-
cal process under Q# is transient, and we have the following asymptotics for the
information process when # > 0:
Z t
1
e 2 # t 2s ds ! Y12
.#/ Q# -almost surely as t ! 1 .
0 2#
Z .nClog.t//C
# n
M#n D M#n .t / t0 , M#n .t / :D e s d m.#/
s , t 0
0
Section 8.1 Ornstein-Uhlenbeck Model 223
(cf. [64, VI.6.1]); we recall this in Theorem 9.3 in the Appendix) in the Skorohood
space D.R2 / of càdlàg functions Œ0, 1/ ! R2 . By continuity of projection mappings
on a subset of D.R2 / of full measure, we end up with weak convergence
n ˝ ˛
M# .1/ , M#n 1 ! . B.'# .1// , '# .1/ /
./
(weakly in R2 , under Q# , as n ! 1) .
coincide with the central sequence at # and converge to the limit law
Z
B.'# .1// 1
. / L .'# .1// .du/ N 0 , .
'# .1// u
By local asymptotic mixed normality according to step (1), Theorems 7.11, 7.10
and 7.12 apply and show the following: at all parameter values # > 0, the ML
estimator sequence .b
# n /n is regular and efficient in the sense of Jeganathan’s version
of the Convolution Theorem 7.10, and attains the local asymptotic minimax bound of
Theorem 7.12.
8.6’ Remark. Under assumptions and notations of 8.6, we comment on the limit law
arising in . /. With x the starting point for equation (8.1), recall from 8.2(c) and the
1
start of 8.6
1
Y1 .#/ N x, , '# .1/ D Y1 2
.#/
2# 2#
which gives (use e.g. [4, Sect. VII.1])
8
< 1
2 , 2#
2 in the case where x D 0
L .'# .1// D p
: 1 3 2
in the case where x ¤ 0
2 , x 2# , 2#
where notation .a, , p/ is used for decentral Gamma laws (a > 0, > 0, p > 0)
X1
e k
.a, , p/ D .aCm, p/ .
mD0
kŠ
In the case where D 0 this reduces to the usual .a, p/. It is easy to see that variance
mixtures of type
Z
1
.a, p/.du/ N 0 , where 0 < a < 1
u
do not admit finite second moments. .a, p/ is the first contribution (summand m D
0) to .a, , p/. Thus the limit law . / which is best concentrated in the sense of
Jeganathan’s Convolution Theorem 7.10 and in the sense of the Local Asymptotic
Minimax Theorem 7.12 in the transient case # > 0
Z Z
1 1 1
L .'# .1// .du/ N 0 , D L Y1 .#/ 2
.du/ N 0 ,
u 2# u
is of infinite variance, for all choices of a starting point x 2 R for SDE (8.1). Recall in
this context Remark 6.6”: optimality criteria in mixed normal models are conditional
on the observed information, never in terms of moments of the laws of rescaled
estimation errors.
Section 8.1 Ornstein-Uhlenbeck Model 225
Let us resume the tableau 8.4, 8.5 and 8.6 for convergence of local models when
we observe an Ornstein–Uhlenbeck trajectory under unknown parameter over a long
time interval: optimality results are available in restriction to submodels where the
process either is positive recurrent or is transient; except in the positive recurrent case,
the rates of convergence and the limit experiments are different at different values of
the unknown parameter.
For practical purposes, one might wish to have limit distributions of homogeneous
and easily tractable structure which hold over the full range of parameter values. De-
pending on the statistical model, one may try either random norming of estimation
errors or sequential observation schemes.
8.6” Exercise
Rt (Random norming). In the Ornstein–Uhlenbeck model, the information process
t ! 0 2s ds is observable, its definition does not involve the unknown parameter. Thus we
may consider random norming for ML estimation errors using the observed information.
(a) Consider first the positive recurrent cases # < 0 in 8.4 and the transient cases # > 0 in
8.6. Using the structure of the limit laws for the pairs
n ˝ ˛
M# .1/ , M#n 1 under Q# as n ! 1
from ./+./ in 8.4 and ./ in 8.6 combined with the representation
Rn
s d m.#/
b
#n # D Rn
0 s
, n2N, under Q#
2 ds
0 s
1
.B 2 1/
is the law L. .R 12 B 21ds/1=2 /. Feigin notes simply that this law lacks symmetry around 0
0 s
!
1
.B 2 1/ 1
P R21 1 0 D P B12 1 D P .B1 2 Œ1, 1/ 0.68 ¤
. 0 Bs2 ds/1=2 2
and thus cannot be a normal law. Hence there is no unified result extending (a) to cover all
cases # 2 R.
226 Chapter 8 Some Stochastic Process Examples
In our model, the information process can be calculated from the observation
without knowing the unknown parameter, cf. step (3) in Model 8.3. This allows to
define a time change and thus a sequential observation scheme by stopping when the
observed information hits a prescribed level.
According to Definition 7.1(c), this is LAN at all parameter values # 2 R, and even
more than that: not only for all values of the parameter # 2 ‚ are limit experiments
e
E 1 .#/ given by the same Gaussian shift E.1/, in the notation of Definition 5.2, but
also all local experiments e
E n,# at all levels n 1 of the asymptotics coincide with
E.1/. Writing
for the maximum likelihood estimator when we observe up to the stopping time
.n, 1/, cf. step (1) of Model 8.3, we have at all parameter values # 2 ‚ the following
properties: the ML sequence is regular and efficient for the unknown parameter at #
(Theorems 7.11 and 7.10), and attains the local asymptotic minimax bound at # (Re-
mark 7.13). This sequential observation scheme allows for a unified treatment over
the whole range of parameter values # 2 R (in fact, everything thus reduces to an
elementary normal distribution model ¹N .h, 1/ : h 2 Rº).
8.7’ Exercise. We compare the observation schemes used in 8.7 and in 8.6 in the transient case
# > 0. Consider only the starting point x D 0 for equation (8.1). Define
1
./ # .u/ :D inf ¹ t > 0 : '# .t / > u º D . u Œ'# .1/1 / 2#
for 0 < u < 1, with # .0/ 0, and recall from 8.6 that
1 1
'# .1/ D Y12
.#/ , 2# 2 .
2# 2
(a) Deduce from ./ that for all parameter values # > 0
(b) Consider in the case where # > 0 the time change u ! .m, u/ in 8.7 and prove that
for 0 < u < 1 fixed,
1
.m, u/ log.m/ under Q#
2#
converges weakly as m ! 1 to
log.# .u// .
˝ n˛
Hint: Observe that the definition of M# t in 8.6 and .n, u/ in 8.7 allows us to replace n 2 N
by arbitrary 2 .0, 1/. Using this, we can write
˝ ˛
P . # .u/ > t / D P . '# .t / < u / D lim Q# M#n t < u
n!1
D lim Q# .e 2# n , u/ > Œn C log.t /C
n!1
for fixed values of u and t in .0, 1/, where the last limit is equal to
1 1
lim Q# .m, u/ > log.m/ C log.t / D lim Q# e .m,u/ 2# log.m/ > t .
m!1 2# m!1
8.2 A Null Recurrent Diffusion Model
We discuss a statistical model where the diffusion process under observation is re-
current null for all values of the parameter. Our presentation follows Höpfner and
Kutoyants [54] and makes use of the limit theorems in Höpfner and Löcherbach [53]
and of a result by Khasminskii [73].
228 Chapter 8 Some Stochastic Process Examples
8.9 Long Time Behaviour of the Process. Under # 2 ‚ where the parameter space
‚ is defined by equation (8.8’), the process X in equation (8.8) is recurrent null in the
sense of Harris with invariant measure
1 p 2#
.8.90 /
2
m.dx/ D 2 1 C y 2 dx , x 2 R .
v
We prove this as follows. Fix # 2 ‚, write b.v/ D # 1Cv 2 for the drift coefficient in
equation (8.8), and consider the mapping S : R ! R defined by
Z x Z y
2b
S.x/ :D s.y/ dy where s.y/ :D exp 2
.v/ dv , x, y 2 R
0 0
Ry
as in Proposition 9.12 in the Appendix; we have 0 2b2 .v/ dv D #2 ln.1 C y 2 / and
thus
# p 2#2
s.y/ D 1 C y 2 2 D 1 C y 2 ,
1 2#
S.x/ sign.x/ jxj1 2 as x ! ˙1 .
1 2#
2
Since j 2#2 j < 1 by equation (8.8’), the function S./ is a bijection onto R: thus Propo-
sition 9.12 shows that X under # is Harris with invariant measure
1 1
m.dx/ D 2 dx on .R, B.R//
s.x/
which gives equation (8.9’). We have null recurrence since m has infinite total
mass.
We remark that ‚ defined by equation (8.8’) is the maximal open interval in R such
that null recurrence holds for all parameter values. For the next result, recall from
Remark 6.17 the definition of a Mittag–Leffler process V .˛/ of index 0 < ˛ < 1: to
the stable increasing process S .˛/ of index 0 < ˛ < 1, the process with independent
and stationary increments having Laplace transforms
.˛/ .˛/
E e .S t2 S t1 / D e .t2 t1 / , 0 , 0 t1 < t2 < 1
˛
Section 8.2 A Null Recurrent Diffusion Model 229
and starting from S0.˛/ 0, we associate the process of level crossing times
.˛/
V .˛/ . Paths of V .˛/ are continuous and non-decreasing, with V0 D 0 and
.˛/
lim t!1 V t D 1. Part (a) of the next result is a consequence of two results due
to Khasminskii [73] which we recall in Proposition 9.14 of the Appendix. Based on
(a), part (b) then is a well-known and classical statement, see Feller [24, p. 448] or
Bingham, Goldie and Teugels [12, p. 349], on domains of attraction of one-sided
stable laws. For regularly varying functions see [12]. X being a one-dimensional
process with continuous trajectories, there are many possibilities to define a sequence
of renewal times .Rn /n1 which decompose the trajectory of X into i.i.d. excursions
.X 1ŒŒRn ,RnC1 /n1 away from 0; a particular choice is considered below.
8.10 Regular Variation. Fix # 2 ‚, consider the function S./ and the measure
m.dx/ of 8.9 (both depending on #), and define
0/ 1 2#
.8.10 ˛ D ˛.#/ D 1 2 2 .0, 1/ .
2
(a) The sequence of renewal times .Rn /n1 defined by
Z !
RnC1
.ii/ E f .Xs / ds D 2 m.f / , f 2 L1 .m/ .
Rn
where V .˛/ is the Mittag–Leffler process of index ˛, 0 < ˛ < 1 . In particular, the
norming function in equation (8.10”) varies regularly at 1 with index ˛ D ˛.#/, and
all objects above depend on # 2 ‚.
230 Chapter 8 Some Stochastic Process Examples
Proof. (1) From Khasminskii’s results [73] which we recall in Proposition 9.14 of the
Appendix, we deduce the assertions (a.i) and (a.ii):
Fix # 2 ‚. We have the functions S./, s./ defined in 8.9 which depend on #.
S./ is a bijection onto R. According to step (3) in the proof of Proposition 9.12, the
process Xe :D S.X /
et D e
dX e t / d Wt
.X D .s / ı S 1
where e
Write for short D 2#2 : by choice of the parameter space ‚ in equation (8.8’),
belongs to .1, 1/. From 8.9 we have the following asymptotics:
s.y/ jyj , y ! ˙1
1
S.x/ sign.x/ jxj1 , x ! ˙1
1
1
S 1 .z/ sign.z/ . .1 /jzj / 1 , z ! ˙1
Œs ı S 1 .v/ . .1 /jvj / 1 , v ! ˙1 .
By the properties of S./, the sequence .Rn /n1 defined in (a) can be written in the
form
e t < 0º , Sn :D inf¹t > Rn1 : X
Rn :D inf¹t > Sn : X e t > 1º ,
.C/
n 1, R0 0
v ! ˙1
which proves part (a.ii) of the assertion. Thus, 8.10(a) is now proved.
(2) We prove 8.10(b). Under # 2 ‚, for ˛ given by equation (8.10’) and for the re-
newal times .Rn /n1 considered in (a), select a strictly increasing continuous norming
function a./ such that
1
a.t / as t ! 1 .
.1 ˛/ P .R2 R1 > t /
In particular, a./ varies regularly at 1 with index ˛. Fix an asymptotic inverse b./ to
a./ which is strictly increasing and continuous. All this depends on #, and (a.i) above
implies
1
1 4 .1 ˛/ ˛ 1
.8.11/ b.t / t ˛ as t ! 1 .
2 2 .˛/
From a.b.n// n as n ! 1, a well-known result on convergence of sums of i.i.d.
variables to one-sided stable laws (cf. [24, p. 448], [12, p. 349]) gives weak conver-
gence
Rn .˛/
! S1 (weakly in R, under #, as n ! 1)
b.n/
232 Chapter 8 Some Stochastic Process Examples
where S1.˛/ follows the one-sided stable law on .0, 1/ with Laplace transform !
exp.˛ /, 0. Regular variation of b./ at 1 with index ˛1 implies that b.t n/
1
t ˛ b.n/ as n ! 1. Thus, scaling properties and independence of increments in the
one-sided stable process S ˛ of index 0 < ˛ < 1 show that the last convergence
extends to finite-dimensional convergence
RŒn f .d .
.8.110 / ! S .˛/ as n ! 1 .
b.n/
Associate a counting process
® ¯
N D .N t / t0 , N t :D max j 2 N : Rj t ,
to the renewal times .Rn /n and write for arbitrary 0 < t1 < < tm < 1, Ai 2
B.R/ and xi > 0
N ti b.n/ Œxi n RŒxi n
P < ,1i m DP > ti , 1 i m .
n n b.n/
The counting process N increasing by 1 on Rn , RnC1 , we have almost sure con-
vergence under #
Z t Z
1 1 Rn
lim f .Xs / ds D lim f .Xs / ds D 2 m.f /
t!1 N t 0 n!1 n 0
from the classical strong law of large numbers with respect to the i.i.d. excursions
X 1ŒŒRj ,Rj C1 , j 1. Together with .8.1100 / we arrive at
Z n
1 f .d .
f .Xs / ds ! 2 m.f / V .˛/ as n ! 1
a.n/ 0
Section 8.2 A Null Recurrent Diffusion Model 233
for functions f 0 belonging to L1 .m/. All processes in the last convergence are
increasing processes, and the limit process is continuous. In this case, according
to Jacod and Shiryaev (1987, VI.3.37), finite dimensional convergence and weak
convergence in D are equivalent, and part (b) of 8.10 is proved.
8.12 Statistical Model. For ‚ given by equation (8.8’) and for some starting point
x0 2 R which does not depend on # 2 ‚, let Q# denote the law of the solution to
(8.8) under #, on the canonical path space .C , C , G/ or .D, D, G/. Applying Theorem
6.10, all laws Q# are locally equivalent relative to G, and the density process of Q#
with respect to Q0 relative to G is
² Z t Z ³
#=0 1 2 t 2
L t D exp # .s / d m.0/
s # . s / 2
ds , t 0
0 2 0
with m.0/ the Q0 -local martingale part of the canonical process under Q0 , and
1 x
.8.120 / .x/ :D , x2R.
1 C x2
2
b Yt
# t :D R t
0 2 . s/
2 ds
R
any determination .t , !/ ! Y .t , !/ of the stochastic
R integral .s / ds under Q0
is also a determination of the stochastic integral .s / ds under Q# , by Lemma
.#/ Rt
8.2’, and the Q# -martingale part of equals m t D t 0 # 0 .s / 2 ds .
This allows us to write ML estimation errors under # 2 ‚ in the form
Rt
b .s / d m.#/
s
# t # D DR0 E under Q#
.#/
.s / d ms
t
(3) The representation (+) allows to reparameterise the model E t with respect to
fixed reference points # such that the model around # is quadratic in . #/. We call
Z t
.#/
.s / d ms
0 t0
and Z Z
t
.#/
2 2
.s / ds D ./ d m under Q#
0 t0
From now on we shall keep trace of the parameter # in some more notations: instead
of m as in .8.90 / we write
1 p 2#
# .dx/ D 1 C y2 2
dx , x2R
2
for the invariant measure of the canonical process under Q# ; instead of a./ we
write a# ./ for the norming function in .8.1000 / which is regularly varying at 1 with
index
1 2#
˛.#/ D 1 2 2 .0, 1/
2
Section 8.2 A Null Recurrent Diffusion Model 235
with m.#/ the Q# -martingale part of the canonical process . Under Q# , we have
weak convergence
n
M , hM n i ! .ƒ.#//1=2 B ı V .˛.#// , ƒ.#/ V .˛.#//
Proof. (1) Fix # 2 ‚. For functions f 0 in L1 .# /, we have from 8.10(b) weak
convergence in D, under Q# , of integrable additive functionals:
Z n
1
f .s / ds ! 2 # .f / V ˛.#/ , n ! 1 .
a# .n/ 0
We rephrase this as weak convergence on D, under Q# , of
Z n
1
f .s / ds ! C.#/ # .f / V ˛.#/ , n!1
n˛.#/ 0
for a constant C.#/ which according to 8.10 is given by
1 ˛.#/ 4 .1˛.#//
1
00
.8.13 / C.#/ :D 2 .
2 2 .˛.#//
(2) Step (1) allows to apply Theorem 9.8(a) from the Appendix: we have the regular
variation condition (a.i) there with ˛ D ˛.#/, m D .#/ and `./ 1=C.#/ on the
right-hand side.
(3) With the same notations, we apply Theorem 9.10 from the Appendix: for locally
square integrable local martingales M f satisfying the conditions made there, this gives
weak convergence of
1 1
p f tn
M Dp f tn
M under Q#
t0 t0
n˛.#/ C.#/ n˛.#/=`.n/
236 Chapter 8 Some Stochastic Process Examples
in D as n ! 1 to
e
.ƒ.#// 1=2
B ı V .˛.#//
where Brownian motion B and Mittag–Leffler process V .˛.#// are independent, with
constant
˝ ˛
.8.13000 / e
ƒ.#/ :D E# M f .
1
(4) We put together steps (1)–(3) above. For g 2 L2 .# / consider the local
.Q# , G/-martingale M
Z t
M t :D g.s / d m.#/
s , t 0
0
where m.#/ is the Q# -local martingale part of the canonical process . Then M is a
martingale additive functional as defined in Definition 9.9 and satisfies
Z 1
E# .hM i1 / D E# g .s / ds D 2 # .g 2 / < 1 .
2 2
0
Thus, as a consequence of step (3),
1
M n :D p .M tn / t0
n˛.#/
under Q# converges weakly in D as n ! 1 to
.ƒ.#//1=2 B ı V .˛.#//
With Model 8.12 and Proposition 8.13 we have all elements which we need to
prove LAMN at arbitrary reference points # 2 ‚. Recall that the parameter space
‚ is defined by .8.80 /, and that the starting point for equation (8.8) is fixed and does
not depend on #. Combining the representation of likelihoods with respect to # in
formula (+) of Model 8.12 with Proposition 8.13 we get the following:
with ƒ.#/ given by .8.130 /, and with Mittag–Leffler process V ˛.#/ independent from
B. For the limit experiment E.S , J / at # see Construction 6.16 and Example 6.18.
(b) For every # 2 ‚, rescaled ML estimation errors at # coincide with the cen-
tral sequence at #. Hence ML estimators are regular and efficient in the sense of Je-
ganathan’s version 7.10(b) of the convolution theorem, and attain the local asymptotic
minimax bound of Theorem 7.12.
Proof. (1) Localising around a fixed reference point # 2 R, write Qn for Q restricted
to Gn . With
1 1 2#
ın .#/ :D p with ˛.#/ D 1 2 2 .0, 1/
n˛.#/ 2
according to 8.10, Model 8.12 and Proposition 8.13 we consider local models at #
° ±
n
E#,n D C , Gn , Q#Cı n .#/ h
: h 2 R , n1
when n tends to 1. According to (+) in step (2) of Model 8.12, log-likelihoods in E#,n
are
Z n Z n
1 2 2
.˘/ ƒh=0
#,n
D h ın .#/ . s / d m #
s h ı n .#/ 2 .s / 2 ds , h 2 R .
0 2 0
Now we can apply Proposition 8.13: note that for every # 2 ‚, ./ belongs to L2 .# /
since
2#
2 .x/ D O x 2 as jxj ! 1 , d# .x/ D O jxj 2 dx as jxj ! 1
where Brownian motion B and Mittag–Leffler process V ˛.#/ are independent, and
.˛.#//
ƒ.#/ D .2 2 /1C˛.#/ # . 2 /
4 .1˛.#//
according to .8.130 /, with ./ from .8.120 /. We have proved in Construction 6.16 and
in Example 6.18 that the pair of random variables .8.140 / indeed generates a mixed
normal experiment.
(2) Combining this with step (2) of Model 8.12, we see that rescaled ML estimation
errors at # coincide with the central sequence at #:
p
.˘ ˘ ˘/ n˛.#/ b# n # D Jn1 .#/ Sn .#/ D Zn .#/ , n 1 .
8.15 Remark. According to .8.140 / and to Remark 6.6’, the limit law for .˘ ˘ ˘/ as
n ! 1 has the form
Z
˛.#/ 1
L .Z.#/jP0 / D L ƒ.#/ V1 .du/ N 0 , .
u
As mentioned in Example 6.18, this law – the best concentrated limit distribution for
rescaled estimation errors at #, in the sense of the Convolution Theorem 6.6 or of
the Local Asymptotic Minimax Theorem 6.8 – does not admit finite second moments
(see Exercise 6.180 ).
8.16 Remark. Our model presents different speeds of convergence at different pa-
rameter values, different limit experiments at different parameter values, but has an
information process
Z t Z t 2
1 s
2 .s / 2 ds D 2 ds , t 0
0 0 1 C 2s
which can be calculated from the observation . t / t0 without knowing the unknown
parameter # 2 ‚. This fact allows for random norming using the observed informa-
tion: directly from LAMN at # in 8.14, combining .˘/, .˘˘/ and .˘ ˘ ˘/ there, we
can write sZ
n
s 2
ds b
# n # ! N .0, 2 /
0 1 C 2s
Section 8.2 A Null Recurrent Diffusion Model 239
for all values of # 2 ‚. The last representation allows for practical work, e.g. to fix
confidence intervals for the unknown parameter determined from an asymptotically
efficient estimator sequence, and overcomes the handicap caused by non-finiteness of
second moments in Remark 8.15.
We conclude the discussion of the statistical model defined by (8.8) and (8.8’) by
pointing out that one-step correction is possible, and that we may start from any prelim-
inary estimator sequence .Tnp/n for the unknown parameter in .En /n whose estimation
errors at # are tight at rate n˛.#/ as n ! 1 for every # 2 ‚, and modify .Tn /n
according to Theorem 7.19 in order to obtain a sequence of estimators .T en /n which
satisfies the coupling condition of Theorem 7.11
p
n˛.#/ T en # D Zn .#/ C o.Q / .1/ as n ! 1
#
8.17 Proposition. With the above notations, the sequence of models .En /n satisfies
all assumptions stated in 7.14. For .Tn /n as above, one-step modification
en :D Tn C p 1
T
1
Sn .Tn / D b
#n
n˛.T n / Jn n/
.T
YLn
TLn :D R n , n1.
0 L 2 .s / 2 ds
p
Then Proposition 8.13 establishes weak convergence of L. n˛.#/ .TLn #/ j Q# /
as n ! 1 under all values of the parameter # 2 ‚. Even if such estimators can be
arbitrarily bad, depending on the choice of A, they converge at the right speed at all
points of the model: this establishes Assumption 7.14(D).
240 Chapter 8 Some Stochastic Process Examples
is a version of the score, according to 8.14 and (+) in Model 8.12, and the information
Z n
1
Jn .#/ D ˛.#/ 2 .s / 2 ds
n 0
for arbitrary values of a constant 0 < c < 1. By .8.100 /, ˛. / differs from ˛.#/ by
Œ 12 . #/. Hence, to check ./ it is sufficient to cancel out n˛.#/ in the ratio in
./ and then take logarithms (note that this argument exploits again 0 < ˛.#/ < 1
for every # 2 ‚). Now parts (B) and (C) of Assumptions 7.14 are established. This
finishes the proof, and we write down the one-step modification
Rn
1 1 Yn Tn 0 2 .s / 2 ds
en :D Tn C p
T Sn .Tn / D Tn C Rn Db #n
n˛.Tn / Jn .Tn / 2 2
0 .s / ds
8.3 Some Further Remarks
We point out several (out of many more) references on LAN, LAMN, LAQ in stochas-
tic process models of different types. Some of these papers prove LAN or LAMN or
LAQ in a particular stochastic process model, others establish properties of estimators
Section 8.3 Some Further Remarks 241
at # which indicate that the underlying statistical model should be LAMN at # or LAQ
at #.
Cox–Ingersoll–Ross process models are treated in Overbeck [103], Overbeck and
Ryden [104] and Ben Alaya and Kebaier [8]. For ergodic diffusions with state space R
dX t D b.#, X t / dt C .X t / d W t , t 0
.IK/ e u 2 juj ,
Lh=0 D e W
1
h2R
with two-sided Brownian motion .W e u /u2R , and studied convergence to this limit
experiment in ‘signal in white noise’ models. Obviously (IK) is not a quadratic ex-
periment since the parameter plays the role of time: in this sense, the experiment
(IK) is linked to the Gaussian shift limit experiment as we pointed out in Example
1.16’. Dachian [17] associated to the limit experiment (IK) approximating experiments
where likelihood ratios have – separately on the positive and on the negative branch –
the form of the trajectory of a particular Poisson process with suitable linear terms
subtracted. In the limit model (IK), Rubin and Song [114] could calculate the risk of
the Bayesian u – for quadratic loss, and with ‘uniform prior over the real line’ – and
could show that quadratic risk of u is by some factor smaller than quadratic risk of
the maximum likelihood estimator b u. Recent investigations by Dachian show quanti-
tatively how this feature carries over to the approximating experiments which he con-
siders: there is a large domain of parameter values in the approximating experiments
where rescaled estimation errors of an analogously defined un outperform those of b un
under quadratic risk. However, not much seems to be known on comparison of b u and
u using other loss functions than the quadratic, a fortiori not under a broader class of
loss functions, e.g. subconvex and bounded. There is nothing like a central statistic in
the sense of Definition 6.1 for the experiment (IK) where the only sufficient statistic is
the whole two-sided Brownian path, hence nothing like a central sequence in the sense
of Definition 7.1’(c) in the approximating experiments. Still, both estimators bu and u
in the experiment (IK) are equivariant in the sense of Definition 5.4: in Höpfner and
Kutoyants [56, 57] we have studied local asymptotics with limit experiment (IK) – in
a context of diffusions carrying a deterministic discontinuous periodic signal in their
drift, and being observed continuously over a long time interval – using some of the
techniques of Chapter 7; the main results of Chapter 7 have no counterpart here. The
limit experiment (IK) is of importance in a broad variety of contexts, e.g. Golubev [32],
Pflug [106], Küchler and Kutoyants [76] and the references therein.
Chapter 9
Appendix
Topics:
This Appendix collects facts of different nature which we quote in the stochastic pro-
cess sections of this book. In most cases, they are stated without proof, and we indicate
references. An asterisk in front of this chapter (and in front of all its sections) indi-
cates that the reader should be acquainted with basic properties of stochastic processes
in continuous time, with semi-martingales and stochastic differential equations. Our
244 Chapter 9 Appendix
principal references are the following. The book by Métivier [98] represents a well-
written source for the theory of stochastic processes; a useful overview appears in
the appendix sections of Bremaud [14]. A detailed treatment can be found in Del-
lacherie and Meyer [20]. For stochastic differential equations, we refer to Karatzas
and Shreve [69] and Ikeda and Watanabe [61]. For semi-martingales and their (weak)
convergence, see Jacod and Shiryaev [64].
9.1 Convergence of Martingales
All filtrations which appear in this section are right-continuous; all processes below
have càdlàg paths. .D, D, G/ denotes the Skorohod space of d -dimensional càdlàg
functions (see [64, Chap. VI]). A d -dimensional locally square integrable local mar-
tingale M D .M t / t0 starting from M0 D 0 is called a continuous Gaussian mar-
tingale if there are no jumps and if the angle bracket process hM i is continuous and
deterministic: in this case, M has independent increments, and all finite dimensional
distributions are Gaussian laws. We quote the following from [64, Coroll. VIII.3.24]:
and write C :D hM 0 i for the deterministic angle bracket. Then stochastic conver-
gence
˝ ˛
for every 0 < t < 1 fixed, M .n/ t D C t C oP .n/ .1/ as n ! 1
implies weak convergence of martingales
Q.n/ :D L M .n/ j P .n/ ! Q0 :D L M 0 j P 0
in the Skorohod space D as n ! 1.
Section 9.1 Convergence of Martingales 245
Next we fix one probability space ..n/ , A.n/ , P .n/ / D ., A, P / for all n,
equipped with a filtration F , and assume that M .n/ and F .n/ as above are derived from
the same locally square integrable local .P , F /-martingale M D .M t / t0 through
.n/ .n/
space-time rescaling (such as e.g. F t :D F tn and M t :D n1=2 M tn ). We need
the following nesting condition (cf. [64, VIII.5.37]):
8
ˆ
< there is a sequence of positive real numbers ˛n # 0 such that
.n/
.C/ F˛n is contained
in F˛.nC1/ for all n 1, and
:̂ S F .n/ D S F D: F .
nC1
n ˛n t t 1
(such as e.g. ˆ t D t for some F1 -measurable random variable > 0). On some
other probability space .0 , A0 , F 0 , P 0 /, consider a continuous Gaussian martingale
M 0 with deterministic angle bracket hM 0 i D C . We define M 0 subject to independent
time change t ! ˆ t
M 0 ı ˆ D Mˆ0 t t0
as follows. Let K 0 ., / denote a transition probability from ., F1 / to .D, D/ such
that for the first argument ! 2 fixed, the canonical process on .D, D/ under
K 0 .!, / is a continuous Gaussian G-martingale with angle bracket
t ! .C ı ˆ/.!, t / D C.ˆ t .!//.
Lifting ˆ and to
D , A˝D , .F1 ˝G t / t0 , .PK 0 /.d!, df / :D P .d!/K 0 .!, df /
the pair .ˆ, M 0 ı ˆ/ is well defined on this space (cf. [64, p. 471]). By this construc-
tion, M 0 ı ˆ is a conditionally Gaussian martingale. We quote the following result
from Jacod and Shiryaev [64, VIII.5.7 and VIII.5.42)]:
9.2 Theorem. For n 1, for filtrations F .n/ in A such that the nesting condition (+)
above holds, consider d -dimensional locally square integrable local martingales
.n/
M .n/ D M t t0 on ., A, F .n/ , P /
.n/
starting from M0 D 0 and satisfying a Lindeberg condition
for all 0 < t < 1 and all " > 0,
Z tZ
jyj2 .n/ .ds, dy/ D oP .1/ as n ! 1 .
0 ¹jyj>"º
246 Chapter 9 Appendix
where M .n,c/ is the continuous local martingale part. Writing M .n,i/ , 1 i d , for
the components of M .n/ and ŒM .n/ for the quadratic covariation process, we obtain
for 0 < t < 1 and i , j D 1, : : : , d
ˇ ˝ ˛ ˇˇ
ˇ
sup ˇ M .n,i/ , M .n,j / s M .n,i/ , M .n,j / s ˇ D oP .1/ as n ! 1
st
provided the sequence .M .n/ /n satisfies the Lindeberg condition. In this situation,
since Jacod and Shiryaev [64, VI.6.1] show that weak convergence of M .n/ as
n ! 1 in D to a continuous limit martingale implies weak convergence of pairs
.M .n/ , ŒM .n/ / in the Skorohod space of càdlàg functions Œ0, 1/ ! Rd Rd d ,
we also have weak convergence of pairs .M .n/ , hM .n/ i/. Thus the following result is
contained in [64, VI.6.1]:
.n/ f denote a
with M0 D 0. Assume the Lindeberg condition of Theorem 9.1. Let M
continuous local martingale
fD M
M ft defined on ., e e
e A, e/
F, P
t0
9.2 Harris Recurrent Markov Processes
For Harris recurrence of Markov chains, we refer to Revuz [111] and Nummelin [101].
For Harris recurrence of continuous time Markov processes, our main reference is
Azema, Duflo and Revuz [2]. On some underlying probability space, we consider a
time homogeneous strong Markov process X D .X t / t0 taking values in Rd , with
càdlàg paths, having infinite life time, and its semigroup
m P t D m for all t 0
is termed invariant for X . On the canonical path space .D, D, G/ for Rd -valued
càdlàg processes, write Qx for the law of the process X starting from x 2 Rd . Let
D . t / t0 denote the canonical process on .D, D, G/, and . t / t0 the collection
of shift operators on .D, D/: t .˛/ :D .˛.t C s//s0 for ˛ 2 D. Systematically,
we speak of ‘properties of the process X ’ when we mean ‘properties of the semigroup
.P t ., // t0 ’: these in turn will be formulated as properties of the canonical process
in the system .D, D, G, . t / t0 , .Qx /x2Rd /.
9.4 Definition. The process X D .X t / t0 is called recurrent in the sense of Harris
(or Harris for short) if there is a -finite measure on .Rd , B.Rd // such that the
248 Chapter 9 Appendix
following holds:
A 2 B.Rd / , .A/ > 0 H)
.˘/ Z 1
1A .s / ds D C1 Qx -almost surely, for every x 2 Rd .
0
9.5’ Definition. A Harris process X is called positive recurrent if the invariant mea-
sure m is of finite total mass on Rd ; X is called null recurrent otherwise.
From [2, 3.1], we quote Theorem 9.6(a.i) below; the following assertions of Theo-
rem 9.6(a.ii) and 9.6(b) are immediate but important consequences.
9.6 Ratio Limit Theorem. (a) If X is recurrent in the sense of Harris, with invariant
measure m,
(i) we have for integrable additive functionals A, B such that 0 < Em .B1 / < 1
At Em .A1 /
for every x 2 Rd : ! Qx -almost surely as t ! 1 ;
Bt Em .B1 /
(b) If X is positive recurrent, with g 1 and norming factor such that m.Rd / D 1,
Z
1 t
for every x 2 R : d f .s / ds ! m.f / Qx -almost surely as t ! 1
t 0
for arbitrary functions f 2 L1 .Rd , B.Rd /, m/.
and paths of V .˛/ are continuous and non-decreasing with V0.˛/ 0 and lim V t.˛/ D
t!1
1 . We extend this definition to the case ˛ D 1 by
S .1/ D id D V .1/ , i.e. S t.1/ t V t.1/ , t 0.
250 Chapter 9 Appendix
The last definition is needed in view of null recurrent processes where suitably normed
integrable additive functionals converge weakly to V .1/ D id . As an example, we
might have a recurrent atom in the state space of X such that the distribution of the time
between successive visits in the atom is ‘relatively stable’ in the sense of [12, Chap. 8.8
combined with p. 359]; at every visit, the process might spend an exponential time in
the atom; then the occupation time A t of the atom up to time t defines an additive
functional A D .A t / t0 of X for which suitable norming functions vary regularly at
1 with index ˛ D 1. Index ˛ D 1 is also needed for the strong law of large num-
bers in Theorem 9.6(b) in positive recurrent Harris processes. We quote the following
from [53, Thm. 3.15].
converges weakly in D as n ! 1 to
m.f / V .˛/
where V .˛/ is the Mittag–Leffler process of index ˛.
(b) The cases in (a) are the only ones where for functions f as in (a.ii) and for suitable
choice of a norming function v./, weak convergence in D of
Z tn
1
f .s / ds under Qx
v.n/ 0 t0
9.9 Definition. On the path space .D, D, G/, for given x 2 Rd , consider a locally
square integrable local Qx -martingale M D .M t / t0 together with its quadratic vari-
ation ŒM and its angle bracket hM i under Qx ; assume in addition that hM i under
Qx is locally bounded. We call M a martingale additive functional if the following
holds:
(i) ŒM and hM i under Qx admit versions which are additive functionals of ;
(ii) for every choice of 0 s < t < 1 and y 2 Rd , one has M t Ms D M ts ı s
Qy -almost surely.
9.9’ Example. (a) On .D, D, G/, for starting point x 2 R, write Qx for the law of a
diffusion
dX t D b.X t / dt C .X t / d W t
and m.x/ for the local martingale part of the canonical process on .D, D, G/ under
Qx . Write L for the Markov generator of X and consider some C 2 function F : R !
R with derivative f . Then
Z t
Mt D f .s / d m.x/
s , t 0
0 t0
driven by the pair .N , W /. Let M denote the local martingale part – sum of a contin-
uous and a purely discontinuous martingale – of the canonical process under Qx .
It admits a version .t , !/ ! Y .t , !/
Z t
Y t :D t 0 " 1.1,0/ .s / 1.0,1/ .s / ds , t 0
0
We quote the following from Höpfner and Löcherbach [53, Thm. 3.16].
times in the approximating processes and apply limit theorems for sums of i.i.d. ran-
dom variables from Resnick and Greenwood [110]. The theorem contains the positive
recurrent cases where ˛ D 1, `./ 1 and V .1/ D id. Note that Theorem 9.10 is not
a martingale convergence theorem in the sense of Theorem 9.2: there is no stochastic
convergence of angle brackets hM n i t , only convergence in law.
Via Theorem 9.3, the following is a direct consequence of Theorems 9.8 and 9.10.
9.10’ Corollary. With notations of Theorem 9.10, and under all assumptions made
there, consider
1
M n :D p . M tn / t0 under Qx
n˛ =`.n/
as n ! 1. With n .ds, dy/ the .Qx , .G tn / t0 /-compensator of the point process of
jumps of M n , assume in addition a Lindeberg condition
Then the assertion of Theorem 9.10 strengthens to weak convergence of the pairs
n
M , hM n i under Qx
9.3 Checking the Harris Condition
We continue under the basic assumptions and notations stated at the start of Sec-
tion 9.2, and present conditions which imply that a càdlàg time-continuous strong
Markov process is Harris. The first one is from [112, pp. 394–395].
Proof. We use arguments of Revuz [111, pp. 94–95] and Revuz and Yor [112, p.
395]. Since m in condition .ı/ is invariant, we consider in .ı/ sets A 2 B.Rd / with
254 Chapter 9 Appendix
Rt
Em . 0 1A .s / ds/ D t m.A/ > 0 . As a consequence, we can specify some " > 0 and
some ı > 0 such that
® ¯
m x 2 Rd : Qx . < 1/ > ı > 0
.ıı/ Rt
where :D A D inf¹t > 0 : 0 1A .s / ds > "º .
For starting point x 2 Rd fixed and for arbitrarily large k < 1, introduce uniformly
integrable .G, Qx /-martingales N .1,x/ , N .2,x,k/ :
.1,x/
Nt :D Ex . 1R j G t / , N t.2,x,k/ :D Ex 1R C 1Rc \ ¹kCık <1º j G t ,
t 0.
By definition of the supermartingale N in step (1) and since the events ¹t C ı t < 1º
are decreasing to R as t ! 1, we can compare conditional expectations, and obtain
Section 9.3 Checking the Harris Condition 255
Qx -almost surely
Revuz and Yor [112, pp. 394–395] take .ı/ in Proposition 9.11 as a definition for
Harris recurrence of .X t / t0 . From the very beginning, we have to know an invari-
ant measure. The following sufficient criterion for Harris recurrence of the process
.X t / t0 avoids this: for some 0 < T < 1 which is deterministic we might be able to
establish that .XkT /k2N0 is a Harris chain.
9.11’ Proposition. Assume that for some step size T 2 .0, 1/ and for some -finite
measure b on .Rd , B.Rd // we have
Then the process X D .X t / t0 is Harris recurrent. In fact, condition .ıı/ implies the
following stronger statement: path segments over Œk t , .kC1/T taken from the path
of X form a Harris chain
.XkT Cv /0vT k2N0
taking values in the Skorohod space .D.Œ0, T /, D.Œ0, T // of càdlàg functions
Œ0, T ! Rd .
Proof. In this proof, we write for short .E, E/ instead of .Rd , B.Rd //, everything
remaining valid when .E, E/ is a Polish space and .D, D, G/ the space of càdlàg
functions Œ0, 1/ ! E with canonical process D .t / t0 .
(1) By condition .ıı/, the grid chainb :D .kT /k2N0 is a Harris chain, has a unique
invariant measure m b (unique up to multiplication with a constant), and .ıı/ holds with
b in place of b (.this is from Harris [45], see also [101, p. 43] and [111, p. 92]).
m
(2) We show that when .ıı/ holds, T -segments in the path of form a Harris chain.
(a) As a consequence of step (1), pairs .kT , .kC1/T /k2N0 form a Harris chain on
.E E, E˝E/ whose invariant measure is m b.dy1 /PT .y1 , dy2 / .
256 Chapter 9 Appendix
(b) The next argument is as in [56, Sect. 2]. Write . t / t0 for the process of coordinate
projections on .D.Œ0, T /, D.Œ0, T //, and let m denote the unique measure on
D.Œ0, T / specified by the following set of finite dimensional distributions:
8
ˆ
< m Rtj 2 Aj , 0 j l D
ˆ
Q
b.dy0 / 1A0 .y0 / jl D1 P tj tj 1 .yj 1 , dyj /1Aj .yj /
E lC1 m
ˆ
:̂ with 0 D t < t < < t D T arbitrary, l 1, and A 2 E for 0 j l .
0 1 l j
Here the measure m.dy1 /PT .y1 , dy2 / is as in (a), and the laws K. .y1 , y2 / , / on
D.Œ0, T / correspond to bridges from state y1 at time 0 to state y2 at time T .
(c) Consider a set F 2 D.Œ0, T / with m.F / > 0. Then by . /, there is some " > 0
such that
BF :D ¹ .y1 , y2 / : K. .y1 , y2 / , F / > " º
b.dy1 /PT .y1 , dy2 /-measure.
has strictly positive m
As a consequence of the Harris property in (a), the chain of pairs .kT ,
.kC1/T /k2N0 will visit infinitely often the set BF , independently of the choice of
a starting point. Then by definition of BF combined with . /, the kernel K pasting in
bridges, the segment chain will visit F infinitely often: we obtain
1
X
F 2 D.Œ0, T / , m.F / > 0 : 1F .kT Cv /0vT D 1
kD0
almost surely, independently of the choice of a starting point. Thus the segment chain
Note that m.FA / D .A/ for A 2 E, and m.G/ D m b.B/ D 1 . The ratio limit
theorem for the segment chain (+) yields the convergence
R mT Pm1
1A .s / ds FA . k / m.FA /
.CC/ lim Pm1
0
D lim PkD0 m1 D D .A/
m!1
kD0 1B . kT /
m!1
mD0 G. k / m.G/
This is property .˘/ in Definition 9.4: hence the continuous-time process D . t / t0
is Harris. By Theorem 9.5, admits a unique (up to constant multiples) invariant
measure m on .E, E/; it remains to identify m .
(4) We show that the three measures m, m b, can be identified. Select some set
C 2 E with the property m.FC / D .C / D 1 . Then similarly to (++), for every
A 2 E, the ratio limit theorem for the segment chain (+) yields
Rt
1A .s / ds m.FA /
lim R 0t D D .A/ Qx -almost surely, for every x 2 E .
t!1 1C .s / ds m.F C/
0
Thus the measures m and coincide, up to some constant multiple, by Theorem 9.6
RT
and the Harris property of . t / t0 in step (3). Next, the measure D 0 ds Œb mPs
is by definition invariant for the grid chain .kT /k2N0 . Thus the Harris property in
step (1) shows that mb and coincide up to constant multiples. This concludes the
proof of the proposition.
It follows from Azema, Duflo and Revuz [2] that the Harris property of .X t / t0 in
continuous time is equivalent to the Harris property of the chain .XSn /n2N0 which cor-
responds to observation of the continuous-time process after independent exponential
waiting times (i.e. Sn :D 1 C C n where .j /j are i.i.d. exponentially distributed
and independent of X ).
258 Chapter 9 Appendix
9.4 One-dimensional Diffusions
For one-dimensional diffusions, it is an easy task to check Harris recurrence, in con-
trast to higher dimensions. We consider first diffusions with state space .1, 1/ and
then – see [69, Sect. 5.5] – diffusions taking values in open intervals I R. The
following criterion is widely known; we take it from from Khasminskii [73, 1st. ed.,
Chap. III.8, Ex. 2 on p. 105].
dX t D b.X t / dt C .X t / d W t
where W is one-dimensional Brownian motion, and where b./ and ./ > 0 are con-
tinuous on R.
(a) If the (strictly increasing) mapping S : R ! R defined by
Z x Z y
2b
./ S.x/ :D s.y/ dy where s.y/ :D exp 2
.v/ dv , x, y 2 R
0 0
is a bijection onto R, then the process X D .X t / t0 is recurrent in the sense of Harris,
and the invariant measure m is given by
Z x
1 1 1 2b
m.dx/ D dx D exp .v/ dv dx on .R, B.R// .
s.x/ 2 .x/ 2 .x/ 0
2
Proof. Let Qx on .D, D, G/ denote the law of X with starting point x. We give a
proof in three steps, from Brownian motion via the driftless case to continuous semi-
martingales as above.
(1) Special case ./ 1 and b./ 0. Here X is one-dimensional Brownian
motion starting from x, thus Lebesgue measure on R is an invariant measure since
Z Z Z
1 1 .yx/2
P t .f / D dx dy p e 2 t f .y/ D dy f .y/ D .f / .
2 t
For any Borel set A of positive Lebesgue measure and for any starting point x for
X , select n large enough for .A \ Bn1 .0// > 0 and x 2 Bn1 .0/, and define
G-stopping times
By the law of the iterated logarithm, these are finite Qx -almost surely, and we have
Tm " 1 Qx -almost surely as m ! 1. Hence A is visited infinitely often by the
canonical process under Qx , and Proposition 9.11 establishes Harris recurrence. By
Theorem 9.5, m D is the unique invariant measure, and we have null recurrence
since m has infinite total mass on R.
(2) Special case b./ 0. Here X is a diffusion without drift. On .D, D, G/, let A
denote the additive functional
Z t
At D 2 .s / ds , t 0
0
and define M :D t 0 , t 0. Then M is a continuous local martingale under Qx
with angle bracket hM i D A. A result by Lepingle [85] states that Qx -almost surely
° ±
on the event Rc :D lim hM i t < 1 , lim M t exists in R .
t!1 t!1
This holds for an arbitrary choice of a starting point x 2 R. We thus have condition
.˘/ in Definition 9.4 with :D , thus the driftless diffusion X is Harris recurrent. It
remains to determine the invariant measure m , unique by Theorem 9.5 up to constant
multiples. We shall show
1
m.dx/ D 2 dx .
.x/
Now, for f 0 measurable, we combine .ı/ and .ıı/ with the obvious representation
Z t Z t
f .Xv / dv D e X
f e v dv for f e :D f ı S 1
0 0
e gives
in order to finish the proof. First, in this combination, the Harris property of X
Z 1
f non-negative, B.R/-measurable, m.f / > 0 : f .X t / dt D 1
0
almost surely, for all choices of a starting point for X . This is property .˘/ in Definition
9.4 with D m. Thus X is a Harris process. Second, for g 0 as in step 2 above and
g :D g ı S 1 which implies m.g/ D m
e e.eg / D 1, the ratio limit theorem
Rt Rt
e.X e s / ds
0 f .Xs / ds f
lim R t D lim R0t Dm e f ı S 1 D m.f /
t!1 g.Xs / ds t!1 e e s / ds
g .X
0 0
identifies m as the invariant measure for X . We have proved part (a) of the proposi-
tion.
9.12’ Examples. The following examples (a)–(c) are applications of Proposition 9.12.
(a) The Ornstein–Uhlenbeck process dX t D #X t dt C d W t with parameters # < 0
2
and > 0 is positive Harris recurrent with invariant probability m D N .0, 2j#j /.
p
(b) The diffusion dX t D .˛ ˇX t / dt C 1 C X t2 d W t with parameters ˛ 2 R,
ˇ > 0, 0 < < 1 is positive Harris recurrent. We have
Z x
2b ˇ
.v/ dv jxj2.1 / as x ! C1 or x ! 1
0 2 1
p 2#
measure is m.dx/ D 1
2 1 C x2 2
dx . Null recurrence holds if and only if
2
j#j 2 .
The following is a variant of Proposition 9.12 for diffusions taking values in open
intervals I R.
Proof. This is a modification of part (3) of the proof of Proposition 9.12 since I is
open. For the I -valued semi-martingale X , the transformed process Xe D S ı X is an
R-valued diffusion without drift as in step (2) of the proof of Proposition 9.12, thus
e on .R, B.R//. Defining m from m
Harris, with invariant measure m e by
1
m.f / :D me f ıS for f : I ! Œ0, 1/ B.I /-measurable
we obtain a -finite measure on .I , B.I // such that sets A 2 B.I / with m.A/ > 0
are visited infinitely often by the process X , almost surely, under every choice of a
starting value in I . Thus X is Harris, and the ratio limit theorem identifies m as the
invariant measure of X .
9.13’ Example. For every choice of a starting p point in I :D .0, 1/, the Cox–
Ingersoll–Ross process dXt D .˛ ˇX t / dt C X t d W t with parameters ˇ > 0,
> 0 and 2˛2 1 almost surely never hits 0 (e.g. Ikeda and Watanabe [61, p. 235–
237]). Proposition 9.13 shows that X is positive Harris recurrent. The invariant prob-
ability m is a Gamma law . 2˛2 , 2ˇ2 /.
9.13” Example. On some ., A, P /, for some parameter # 0 and for deterministic
starting point in I :D .0, 1/, consider the diffusion
1 p
dX t D # . X t / dt C X t .1 X t / d W t
2
Section 9.4 One-dimensional Diffusions 263
up to the stopping time :D inf¹t > 0 : X t … .0, 1/º. Then the following holds:
with respect to any fixed x0 2 .0, 1/. The function s./ in ./ of Proposition 9.13,
given by
y.1 y/ #
s.y/ D , 0<y<1,
x0 .1 x0 /
behaves as c y # for y # 0 and as c .1 y/# for y " 1, with some constant c which
depends on # and on x0 . Hence S./ of ./ of Proposition 9.13 is such that
1 1
m.dx/ :D 1.0,1/ .x/ 2
dx D 1.0,1/ .x/ Œx0 .1 x0 /# Œx.1 x/.#1/ dx .
s.x/ .x/
Up to a factor 2, this is the ‘speed measure’ in terms of Karatzas and Shreve [69,
p. 343]. In the case where # D 0, this measure has infinite total mass m.I / D 1 such
that
for some constant c depending on x0 and #. In the case where # > 0, m has finite
total mass and – arranging for the norming factor – equals the Beta law B.#, #/ on
264 Chapter 9 Appendix
as in [69, p. 347]) .
(2) In the case where # D 0, X is a local martingale, we have s./ 1 on .0, 1/
and thus 8R
< x dy m.Œx0 , y// if x0 < x < 1
V .x/ D Rx0
: x0 dy m..y, x / if 0 < x < x
x 0 0
According to [69, p. 350)], (+) implies P . < 1/ D 1. Thus in the case where # D 0,
almost surely, the martingale X D .X t^ / t0 is absorbed at the boundary of I in
finite time.
(3) Consider the case 0 < # < 1 where both m.I / and S.1 / S.0C / are finite.
By definition of V ./, this again leads to (+) and thus to P . < 1/ D 1: X hits the
boundary of I in finite time.
(4) Consider the case # 1 where S./ maps I onto R. In this case, [69, p. 345]
shows that X has infinite life time in I . Then Proposition 9.13 applies and yields Harris
recurrence of X with invariant measure m . Since up to constants m is a probability
law, X is positive recurrent.
The following result on null recurrent diffusions without drift provides exact con-
stants for weak convergence of additive functionals, both for the invariant measure
and for the norming functions in Theorem 9.8. It was proved in Khasminskii [73]. We
quote part (a) from [73, 2nd ed., pp. 129–130] and part (b) from [73, 2nd ed. pp. 134–
136]), embedding it into the Harris setting of Proposition 9.12.
taking values in .1, 1/. Normed as in Proposition 9.12(b), consider the invari-
ant measure m e together with the sequence of stopping times
e.dx/ D e21.x/ dx of X
.k/ " 1 as k ! 1:
e t < 0º ,
.k/ :D inf¹t > .k/ : X e t > 1º ,
.k/ :D inf¹t > .k 1/ : X
k 1 , .0/ 0 .
Section 9.4 One-dimensional Diffusions 265
(b) Assume that there is ˇ > 1 and constants AC C A > 0 such that the limits
Z
1 x ˇ 2
lim jvj dv D: A˙ 2 Œ0, 1/
x!˙1 x 0 e
2 .v/
[1] T. Anderson, The integral of a symmetric unimodal function over a symmetric convex
set and some probability inequalities, Proc. Amer. Math. Soc. 6 (1955), 170–176.
[2] J. Azéma, M. Duflo, D. Revuz, Mesures invariantes des processus de Markov récurrents,
Séminaire de Probabilités III, Lecture Notes in Mathematics 88, Springer, 1969.
[3] O. Barndorff-Nielsen, Parametric Statistical Models and Likelihood. Lecture Notes in
Statistics 50, Springer, 1988.
[4] J. Barra, Mathematical Basis of Statistics. Academic Press 1981. French ed. Dunod,
1971.
[5] I. Basawa, J. Scott, Asymptotic Optimal Inference for Non-ergodic Models, Lecture
Notes in Statistics 17, Springer, 1983.
[6] R. Bass, Diffusions and Elliptic Operators, Springer, 1998.
[7] H. Bauer, Wahrscheinlichkeitstheorie und Grundzüge der Masstheorie, 3rd ed., De
Gruyter, 1978.
[8] M. Ben Alaya, A. Kebaier, Parameter estimation for the square root diffusions: ergodic
and nonergodic cases, Stochastic Models 28 (2012), 609–634.
[9] R. Beran, Estimating a distribution function, Ann. Statist. 5, 400-404 (1977).
[10] J. Bertoin, Lévy Processes, Cambridge University Press, 1996.
[11] P. Billingsley, Convergence of Probability Measures, Wiley 1968.
[12] N. Bingham, C. Goldie, J. Teugels, Regular Variation, Cambridge University Press
1987.
[13] C. Boll, Comparison of Experiments in the Infinite Case, PhD Thesis, Stanford Univer-
sity, 1955.
[14] P. Brémaud, Point Processes and Queues, Springer, 1981.
[15] K. Chung, R. Williams, Introduction to Stochastic Integration, 2nd ed., Birkhäuser,
1990.
[16] H. Cremers, D. Kadelka, On weak convergence of integral functionals of stochastic
processes with application to processes taking paths in LpE , Stoch. Processes Appl. 21
(1986), 305–317.
[17] S. Dachian, On limiting likelihood ratio processes of some change-point type statistical
models, Journal of Statistical Planning and Inference 140 (2010), 2682–2692.
[18] D. Darling, M. Kac, On occupation times for Markov processes. Trans. Amer. Math.
Soc. 84 (1957), 444–458.
268 Bibliography
[19] R. Davies, Asymptotic inference when the amount of information is random, in: L. Le
Cam, R. Olshen, (Eds): Proc. of the Berkeley Symp. in Honour of J. Neyman and J.
Kiefer, Vol. II. Wadsworth, 1985.
[20] C. Dellacherie, P. Meyer, Probabilités et potentiel, Chap. I–IV, Hermann, 1975;
Chap. V–VIII, Hermann, 1980.
[21] M. Diether, Wavelet estimation in diffusions with periodicity, Statist. Inference Stoch.
Processes 15 (2012), 257–284.
[22] G. Dohnal, On estimating the diffusion coefficient. J. Appl. Probab. 24 (1987), 105–114.
[23] P. Feigin, Some comments concerning a curious singularity. J. Appl. Probab. 16 (1979),
440–444.
[24] W. Feller, An Introduction to Probability Theory and its Applications, Vol. 2, Wiley,
1971.
[25] V. Genon-Catalot, J. Jacod, On the estimation of the diffusion coefficient for multidi-
mensional diffusion processes, Ann. Inst. H. Poincaré Probab. Stat. 29, (1993), 119–
151.
[26] V. Genon-Catalot, D. Picard, Elements de statistique asymptotique, Springer, 1993.
[27] O. Georgii, Stochastik, De Gruyter, 2002.
[28] I. Gihman, A. Skorohod, The Theory of Stochastic Processes, Vol. I+II, Springer, 1974
(Reprint 2004).
[29] R. Gill, B. Levit, Applications of the van Trees inequality: a Bayesian Cramér-Rao
bound, Bernoulli 1 (1995), 59–79.
[30] E. Gobet, Local asymptotic mixed normality property for elliptic diffusion, Bernoulli 7
(2001), 899–912.
[31] E. Gobet, LAN property for ergodic diffusions with discrete observations, Ann. Inst. H.
Poincaré PR 38 (2002), 711–737.
[32] Y. Golubev, Computation of efficiency of maximum likelihood estimate when observing
a discontinuous signal in white noise, Problems Inform. Transm. 15 (1979), 38–52.
[33] P. Greenwood, A. N. Shiryaev, Asymptotic minimaxity of a sequential estimator for a
first order autoregressive system, Stochastics and Stochastics Reports 38 (1992), 49-65.
[34] P. Greenwood, W. Wefelmeyer, Efficiency of empirical estimators for Markov chains,
Ann. Statist. 23 (1995), 132–143.
[35] P. Greenwood, W. Ward, W. Wefelmeyer, Statistical analysis of stochastic resonance
in a simple setting, Phys. Rev. E 60 (1999), 4687–4695.
[36] Greenwood, P., Wefelmeyer, W.: Asymptotic minimax results for stochastic process
families with critical points, Stoch. Proc. Appl. 44, (1993), 107–116.
[37] L. Grinblat, A limit theorem for measurable stochastic processes and its applications,
Proc. Amer. Math. Soc. 61 (1976), 371–376.
[38] A. Gushchin, On asymptotic optimality of estimators under the LAQ condition, Theory
Probab. Appl. 40 (1995), 261–272.
Bibliography 269
[39] A. Gushchin, U. Küchler, Asymptotic inference for a linear stochastic differential equa-
tion with time delay, Bernoulli 5, (1999), 1059–1098.
[40] J. Hájek, A characterization of limiting distributions for regular estimators, Z.
Wahrscheinlichkeitsth. Verw. Geb. 14 (1970), 323–330.
[41] J. Hájek, Z. Sidák, Theory of Rank Tests, Academic Press, 1967.
[42] J. Hájek, Z. Sidák, P. Sen, Theory of Rank Tests (2nd ed.), Academic Press, 1999.
[43] M. Hammer, Parameterschätzung in zeitdiskreten ergodischen Markov-Prozessen am
Beispiel des Cox-Ingersoll-Ross Modells, Diplomarbeit, Institut für Mathematik, Uni-
versität Mainz 2005.
http://ubm.opus.hbz-nrw.de/volltexte/2006/1154/pdf/diss.pdf
[44] M. Hammer, Ergodicity and regularity of invariant measure for branching Markov pro-
cesses with immigration, PhD Thesis, Institute of Mathematics, University of Mainz,
2012.
http://ubm.opus.hbz-nrw.de/volltexte/2012/3306/pdf/doc.pdf
[45] T. Harris, The existence of stationary measures for certain Markov processes.,Proc. 3rd
Berkeley Symp., Vol. II, pp. 113–124, University of California Press, 1956.
[46] R. Höpfner, Asymptotic inference for continuous-time Markov chains, Probab. Theory
Rel. Fields 77 (1988), 537–550.
[47] R. Höpfner, Null recurrent birth-and-death processes, limits of certain martingales, and
local asymptotic mixed normality, Scand. J. Statist. 17 (1990), 201–215.
[48] R. Höpfner, On statistics of Markov step processes: representation of log-likelihood ratio
processes in filtered local models, Probab. Theory Rel. Fields 94 (1993), 375–398.
[49] R. Höpfner, Asymptotic inference for Markov step processes: observation up to a ran-
dom time, Stoch. Proc. Appl. 48 (1993), 295–310.
[50] R. Höpfner, J. Jacod, Some remarks on the joint estimation of the index and the scale
parameter for stable prozesses, in: Mandl and Huskova (Eds.), Asymptotic Statistics,
Proc. Prague 1993, pp. 273–284, Physica Verlag, 1994.
[51] R. Höpfner, J. Jacod, L. Ladelli, Local asymptotic normality and mixed normality for
Markov statistical models, Probab. Theory Rel. Fields 86 (1990), 105–129.
[52] R. Höpfner, E. Löcherbach, Remarks on ergodicity and invariant occupation measure
in branching diffusions with immigration, Ann. Inst. Henri Poincaré 41 (2005), 1025–
1047.
[53] R. Höpfner, E. Löcherbach, Limit Theorems for Null Recurrent Markov Processes,
Memoirs AMS 161, American Mathematical Society, 2003.
[54] R. Höpfner, Y. Kutoyants, On a problem of statistical inference in null recurrent diffu-
sions, Statist. Inference Stoch. Process. 6 (2003), 25–42.
[55] R. Höpfner, Y. Kutoyants, On LAN for parametrized continuous periodic signals in a
time inhomogeneous diffusion, Statistics & Decisions 27 (2009), 309–326.
[56] R. Höpfner, Y. Kutoyants, Estimating discontinuous periodic signals in a time inhomo-
geneous diffusion, Statist. Inference Stoch. Process. 13 (2010), 193–230.
270 Bibliography
[57] R. Höpfner, Y. Kutoyants, Estimating a periodicity parameter in the drift of a time in-
homogeneous diffusion, Math. Meth. Statist. 20 (2011), 58–74.
[58] R. Höpfner, L. Rüschendorf, Comparison of estimators in stable models, Mathematical
and Computer Modelling 29 (1999), 145–160.
[59] R. Iasnogorodski, H. Lhéritier, Théorie de l’estimation ponctuelle paramétrique, EDP
Sciences, 2003.
[60] I. Ibragimov, R. Khasminskii, Statistical Estimation, Springer, 1981.
[61] N. Ikeda, N. Watanabe, Stochastic Differential Equations and Diffusion Processes,
North-Holland / Kodansha, 2nd ed., 1989.
[62] K. Ito, H. McKean, Diffusion Processes and their Sample Paths, Springer, 1965.
[63] J. Jacod, Multivariate point processes: predictable projection, Radon-Nikodym deriva-
tives, representation of martingales, Z. Wahrscheinlichkeitsth. Verw. Geb. 31 (1975),
2435–253.
[64] J. Jacod, A. Shiryaev, Limit Theorems for Stochastic Processes, Springer 1987, 2nd ed.,
2003.
[65] A. Janssen, Zur Asymptotik nichtparametrischer Tests, Vorlesungsskript. Düsseldorf,
1998.
[66] P. Jeganathan, On the asymptotic theory of estimation when the limit of the log-
likelihood ratios is mixed normal, Sankhya Ser. A 44 (1982), 173–212.
[67] P. Jeganathan, Some aspects of asymptotic theory with applications to time series mod-
els. Preprint version, 1988. Econometric Theory 11, 818–887 (1995).
[68] Y. Kabanov, R. Liptser, A. Shiryaev, Criteria for absolute continuity of measures cor-
responding to multivariate point processes, in: J. Prokhorov (Ed.), Proc. Third Japan-
USSR Symposium, Lecture Notes in Math. 550, pp. 232–252, Springer, 1976.
[69] I. Karatzas, S. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed. Springer,
1991.
[70] M. Kessler, Estimation of an ergodic diffusion from discrete observations, Scand. J.
Statist. 24 (1997), 211–229.
[71] M. Kessler, A. Lindner, M. Sørensen (Eds.), Statistical Methods for Stochastic Differ-
ential Equations, CRC Press, 2012.
[72] M. Kessler, A. Schick, W. Wefelmeyer, The information in the marginal law of a
Markov chain, Bernoulli 7 (2001), 342–266.
[73] R. Khasminskii, Stochastic Stability of Differential Equations, 1st ed., Sijthoff und No-
ordhoff, 1980; 2nd ed., Springer, 2012.
[74] R. Khasminskii, G. Yin, Asymptotic behavior of parabolic equations arising from one-
dimensional null recurrent diffusions, J. Diff. Eqns. 161 (2000), 154–173.
[75] A. Klenke, Wahrscheinlichkeitstheorie, 3rd ed., Springer, 2013.
[76] U. Küchler, Y. Kutoyants, Delay estimation for some stationary diffusion-type pro-
cesses, Scand. J. Statist. 27 (2000), 405–414.
Bibliography 271
[120] H. Strasser, Einführung in die lokale asymptotische Theorie der Statistik, Bayreuther
Mathematische Schriften, 1985.
[121] H. Strasser, Mathematical Theory of Statistics, De Gruyter, 1985.
[122] A. Shiryaev, V. Spokoiny, Statistical Experiments and Decisions, World Scientific,
2001.
[123] A. Touati, Théorèmes limites pour les processus de Markov récurrents. Unpublished
paper 1988. See also C. R. Acad. Sci. Paris Série I 305 (1987), 841–844.
[124] A. Tsybakov, Introduction à l’estimation non-parametrique, Springer SMAI, 2004.
[125] A. van der Vaart, An asymptotic representation theorem, Int. Statist. Rev. 59 (1991),
97–121.
[126] A. van der Vaart, Asymptotic Statistics, Cambridge University Press, 1998.
[127] H. Witting, Mathematische Statistik I, Teubner, 1985.
[128] H. Witting, U. Müller-Funk, Mathematische Statistik II, Teubner, 1995.
[129] N. Yoshida, Estimation for diffusion processes from discrete observations, J. Multivar.
Anal. 41 (1992), 220–242.
[130] H. van Zanten, On the rate of convergence of the maximum likelihood estimator in
Brownian semimartingale models, Bernoulli 11 (2005), 643–664.
[131] V. Zolotarev, One-dimensional Stable Distributions, Transl. of Mathematical Mono-
graphs 65, Amer. Math. Soc., 1986.
Index