
De Gruyter Graduate

Höpfner · Asymptotic Statistics


Reinhard Höpfner

Asymptotic Statistics
With a View to Stochastic Processes

De Gruyter
Mathematics Subject Classification 2010: 62F12, 62M05, 60J60, 60F17, 60J25

Author
Prof. Dr. Reinhard Höpfner
Johannes Gutenberg University Mainz
Faculty 8: Physics, Mathematics and Computer Science
Institute of Mathematics
Staudingerweg 9
55099 Mainz
Germany
[email protected]

ISBN 978-3-11-025024-4
e-ISBN 978-3-11-025028-2

Library of Congress Cataloging-in-Publication Data


A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek


The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie;
detailed bibliographic data are available in the Internet at http://dnb.dnb.de.

© 2014 Walter de Gruyter GmbH, Berlin/Boston

Typesetting: PTP-Berlin Protago-TEX-Production GmbH, www.ptp-berlin.de


Printing and binding: CPI buch bücher.de GmbH, Birkach
Printed on acid-free paper

Printed in Germany

www.degruyter.com
To the students who like both statistics and stochastic processes
Preface

At Freiburg in 1991 I gave my very first lectures on local asymptotic normality; the
first time I explained likelihood ratio processes or minimum distance estimators to
students in a course was 1995 at Bonn, during a visiting professorship. Then, during
a short time at Paderborn and now almost 15 years at Mainz, I lectured from time
to time on related topics, or on parts of these. In cooperation with colleagues in the
field of statistics of stochastic processes (where most of the time I was learning) and
in discussions with good students (who exposed me to very precise questions on the
underlying mathematical notions), the scope of subjects entering the topic increased
steadily; initially, my personal preference was for null recurrent Markov processes and
for local asymptotic mixed normality. Later, lecturing on such topics, a first tentative
version of a script came to life at some point. It went on growing and completing
itself in successive steps of approximation. It is now my hope that the combination
of topics may serve students and interested readers to get acquainted on the one hand
with the purely statistical theory, developed carefully in a ‘probabilistic’ style, and on
the other hand with some (there are others) typical applications to statistics of stochastic
processes.
The present book can be read in different ways, according to possibly different math-
ematical preferences of a reader. In the author’s view, the core of the book are the
Chapters 5 (Gaussian shift models), 6 (mixed normal and quadratic models), 7 (local
asymptotics where the limit model is a Gaussian shift or a mixed normal or a quadratic
experiment, often abbreviated as LAN, LAMN or LAQ), and finally 8 (examples of
statistical models in a context of diffusion processes where local asymptotics of type
LAN, LAMN or LAQ appear).
A reader who wants to concentrate on the statistical theory alone should skip chap-
ters or subsections marked by an asterisk (*): he or she would read only the Sections 5.1
and 6.1, and then all subsections of Chapter 7. This route includes a number of exam-
ples formulated in the classical i.i.d. framework, and allows one to follow the statistical
theory without gaps.
In contrast, chapters or subsections marked by an asterisk (*) are designed for readers
with an interest in both statistics and stochastic processes. Such a reader is assumed to be
acquainted with basic knowledge of continuous-time martingales, semimartingales,
the Itô formula and the Girsanov theorem, and may go through the entire Chapters 5 to 8
consecutively. In view of the stochastic process examples in Chapter 8, he or she
may consult from time to time the Appendix Section 9 for further background and
for references (on subjects such as Harris recurrence, positive or null, convergence of
martingales, and convergence of additive functionals of a Harris process).

In both cases, a reader may previously have consulted or have read the Sections 1.1
and 1.2 as well as Chapters 3 and 4 for statistical notions such as score and informa-
tion in the classical definition, contiguity or $L^2$-differentiability, to be prepared for the
core of the book. Given Sections 1.1 and 1.2, Chapters 3 and 4 can be read indepen-
dently of each other. Only a few basic notions of classical mathematical statistics (such
as sufficiency, the Rao–Blackwell theorem, exponential families, ...) are assumed to be
known.
Sections 1.3 and 1.4 are of complementary character and may be skipped; they dis-
cuss naive belief in maximum likelihood and provide some background in order to
appreciate the theorems of Chapter 7.
Chapter 2 stands isolated and can be read separately from all other chapters. In an i.i.d.
framework we study in detail one particular class of estimators for the unknown pa-
rameter which ‘works reasonably well’ in a large variety of statistical problems under
weak assumptions. From a theoretical point of view, this allows us to construct explicitly
estimator sequences which converge at a certain rate. From a practical point of view,
we find it interesting to start – prior to all optimality considerations in later chapters –
with estimators that tolerate small deviations from theoretical model assumptions.
Fruitful exchanges and cooperations over a long period of time have contributed to
the scope of topics treated in this book, and I would like to thank my colleagues, coau-
thors and friends for those many long and stimulating discussions around successive
projects related to our joint papers. Their influence, well visible in the relevant parts
of the book, is acknowledged with deep gratitude. In a similar way, I would like to
thank my coauthors and partners up to now in other (formal or quite informal) coop-
erations. There are some teachers and colleagues in probability and statistics to whom
I owe much, either for encouragement and help at decisive moments of my mathe-
matical life, or for mathematical discussions on specific topics, and I would like to
take this opportunity to express my gratitude. Furthermore, I have to thank those who
from the beginning allowed me to learn and to start to do mathematics, and – beyond
mathematics, sharing everyday life – my family.
Concerning a more recent time period, I would like to thank my colleague Eva
Löcherbach, my PhD student Michael Diether as well as Tobias Berg and Simon Hol-
bach: they agreed to read longer or shorter parts of this text in close-to-final versions
and made critical and helpful comments; remaining errors are my own.

Mainz, June 2013 Reinhard Höpfner


Contents

Preface

1 Score and Information
1.1 Score, Information, Information Bounds
1.2 Estimator Sequences, Asymptotics of Information Bounds
1.3 Heuristics on Maximum Likelihood Estimator Sequences
1.4 Consistency of ML Estimators via Hellinger Distances

2 Minimum Distance Estimators
2.1 Stochastic Processes with Paths in $L^p(T, \mathcal{T}, \mu)$
2.2 Minimum Distance Estimator Sequences
2.3 Some Comments on Gaussian Processes
2.4 Asymptotic Normality for Minimum Distance Estimator Sequences

3 Contiguity
3.1 Le Cam's First and Third Lemma
3.2 Proofs for Section 3.1 and some Variants

4 $L^2$-differentiable Statistical Models
4.1 $L^r$-differentiable Statistical Models
4.2 Le Cam's Second Lemma for i.i.d. Observations

5 Gaussian Shift Models
5.1 Gaussian Shift Experiments
5.2* Brownian Motion with Unknown Drift as a Gaussian Shift Experiment

6 Quadratic Experiments and Mixed Normal Experiments
6.1 Quadratic and Mixed Normal Experiments
6.2* Likelihood Ratio Processes in Diffusion Models
6.3* Time Changes for Brownian Motion with Unknown Drift

7 Local Asymptotics of Type LAN, LAMN, LAQ
7.1 Local Asymptotics of Type LAN, LAMN, LAQ
7.2 Asymptotic optimality of estimators in the LAN or LAMN setting
7.3 Le Cam's One-step Modification of Estimators
7.4 The Case of i.i.d. Observations

8* Some Stochastic Process Examples for Local Asymptotics of Type LAN, LAMN and LAQ
8.1* Ornstein–Uhlenbeck Process with Unknown Parameter Observed over a Long Time Interval
8.2* A Null Recurrent Diffusion Model
8.3* Some Further Remarks

9 Appendix
9.1* Convergence of Martingales
9.2* Harris Recurrent Markov Processes
9.3* Checking the Harris Condition
9.4* One-dimensional Diffusions

Bibliography
Index
Chapter 1

Score and Information

Topics for Chapter 1:


1.1 Score, Information, Information Bounds
Statistical models admitting score and information 1.1–1.2’
Example: one-parametric paths in nonparametric models 1.3
Example: location models 1.4
Score and information in product models 1.5
Cramér–Rao bound 1.6–1.7’
Van Trees bound 1.8

1.2 Estimator Sequences, Asymptotics of Information Bounds


Sequences of experiments, sequences of estimators, consistency 1.9
Asymptotic Cramér–Rao bound in i.i.d. models 1.10
Asymptotic van Trees bound in i.i.d. models 1.11
Example: an asymptotic minimax property of the empirical distribution function 1.11’

1.3 Heuristics on Maximum Likelihood Estimator Sequences


Heuristics I: certain regularity conditions 1.12
Heuristics II: maximum likelihood (ML) estimators 1.13
Heuristics III: asymptotics of ML sequences 1.14
Example: a normal distribution model 1.15
A normal distribution model by Neyman and Scott 1.16
Example: likelihoods which are not differentiable in the parameter 1.16’

1.4 Consistency of ML Estimators via Hellinger Distances


Definition of ML estimators and ML sequences 1.17
An example using Kullback divergence 1.17”
Hellinger distance 1.18
Hellinger distances in i.i.d. models: a set of conditions 1.19
Some lemmata on likelihood ratios 1.20–1.23
Main result: consistency via Hellinger distance 1.24
Example: location model generated from the uniform law $R(-\frac{1}{2}, \frac{1}{2})$ 1.25

Exercises: 1.4’, 1.4”, 1.8’, 1.8”, 1.17’, 1.24”



The chapter starts with classically defined notions of ‘score’, ‘information’ and ‘in-
formation bounds’ in smoothly parameterised statistical models. These are studied
in Sections 1.1 and 1.2; the main results of this part are the asymptotic van Trees
bounds for i.i.d. models in Theorem 1.11. Here we encounter in a restricted setting
the type of information bounds which will play a key role in later chapters, for more
general sequences of statistical models, under weaker assumptions on smoothness of
parameterisation, and for a broader family of risks. Section 1.3 then discusses ‘classical
heuristics’ which link limit distributions of maximum likelihood estimator sequences
to the notion of ‘information’, together with examples of statistical models where such
heuristics either work or do not work at all; we include this – completely informal –
discussion since later sections will show how similar aims can be attained in a rigorous
way based on essentially different mathematical techniques. Finally, a different route
to consistency of ML estimator sequences in i.i.d. models is presented in Section 1.4,
with the main result in Theorem 1.24, based on conditions on Hellinger distances in
the statistical model.

1.1 Score, Information, Information Bounds


1.1 Definition. (a) A statistical model (or statistical experiment) is a triplet

$(\Omega, \mathcal{A}, \mathcal{P})$

where $\mathcal{P}$ is a specified family of probability measures on a measurable space $(\Omega, \mathcal{A})$.

A statistical model is termed parametric if there is a 1-1-mapping between $\mathcal{P}$ and some subset of some $\mathbb{R}^d$, $d \ge 1$,

$\mathcal{P} = \{P_\vartheta : \vartheta \in \Theta\}$, $\Theta \subset \mathbb{R}^d$,

and dominated if there is some $\sigma$-finite measure $\mu$ on $(\Omega, \mathcal{A})$ such that

$P \ll \mu$ for every $P \in \mathcal{P}$.

(b) A statistic on $(\Omega, \mathcal{A}, \mathcal{P})$ is a measurable mapping from $(\Omega, \mathcal{A})$ to some measurable space $(G, \mathcal{G})$. A statistic $T$ taking values in $(\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$, with components $T_1, \dots, T_k$, is termed $q$-integrable ($q \ge 1$) if

$T \in L^q(P)$ (i.e. $T_j \in L^q(P)$ for every $1 \le j \le k$) for every $P \in \mathcal{P}$.

(c) In a parametric experiment $(\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$ with parameter space $\Theta \subset \mathbb{R}^d$, an estimator for the unknown parameter is any statistic $T$ taking values in $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. $T$ is termed unbiased if

$T \in L^1(P_\vartheta)$ and $E_\vartheta(T) = \vartheta$ for every $\vartheta \in \Theta$.

(d) Consider an arbitrary experiment $(\Omega, \mathcal{A}, \mathcal{P})$ and a mapping $\kappa : \mathcal{P} \to \mathbb{R}^k$. An estimator for $\kappa$ is any statistic $T$ taking values in $(\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$. $T$ is termed unbiased for $\kappa$ if

$T \in L^1(P)$ and $E_P(T) = \kappa(P)$ for every $P \in \mathcal{P}$.

1.2 Definition (Score and information, classical definition). Consider a parametric experiment

$E := (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$, $\Theta \subset \mathbb{R}^d$ open

with dominating measure $\mu$ and with densities

$\frac{dP_\vartheta}{d\mu}(\omega) =: f(\vartheta, \omega) = f_\vartheta(\omega)$, $\vartheta \in \Theta$, $\omega \in \Omega$.

Assume that for every $\omega \in \Omega$ fixed, $f(\cdot, \omega)$ is continuous, and let partial derivatives exist on $\Theta$: as pointwise limits of measurable functions $\frac{f(\vartheta + h e_i, \cdot) - f(\vartheta, \cdot)}{h}$, $h \to 0$, these are measurable functions.

(a) Let $\nabla$ denote the vector of partial derivatives with respect to $\vartheta$ and define

$M_\vartheta := (\nabla \log f)(\vartheta, \omega) := 1_{\{f_\vartheta > 0\}}(\omega) \begin{pmatrix} \frac{\partial}{\partial \vartheta_1} \log f \\ \vdots \\ \frac{\partial}{\partial \vartheta_d} \log f \end{pmatrix}(\vartheta, \omega)$, $\vartheta \in \Theta$, $\omega \in \Omega$.

This yields a well-defined random variable on $(\Omega, \mathcal{A})$ taking values in $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. Assume further

$(\ast)$ for every $\vartheta \in \Theta$, we have $M_\vartheta \in L^2(P_\vartheta)$ and $E_\vartheta(M_\vartheta) = 0$.

If all these conditions are satisfied, $M_\vartheta$ is termed score in $\vartheta$, and the covariance matrix under $P_\vartheta$

$I_\vartheta := \mathrm{Cov}_\vartheta(M_\vartheta) = E_\vartheta\left(M_\vartheta M_\vartheta^\top\right)$

Fisher information in $\vartheta$. We then call $E$ an experiment admitting score and Fisher information.

(b) More generally, we may allow for modification of $M_\vartheta$ defined in (a) on sets of $P_\vartheta$-measure zero, and call any family of measurable mappings $\{\widetilde{M}_\vartheta : \vartheta \in \Theta\}$ from $(\Omega, \mathcal{A})$ to $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ with the property

for every $\vartheta$ in $\Theta$: $\widetilde{M}_\vartheta = M_\vartheta$ $P_\vartheta$-almost surely

a version of the score in the statistical model $\{P_\vartheta : \vartheta \in \Theta\}$.


Even if many classically studied parametric statistical models do admit score and Fisher information, it is indeed an assumption that densities $\vartheta \mapsto f(\vartheta, \omega)$ should satisfy the smoothness conditions in Definition 1.2. Densities $\vartheta \mapsto f(\vartheta, \omega)$ can be continuous and non-differentiable, their smoothness being e.g. the smoothness of the Brownian path; densities $\vartheta \mapsto f(\vartheta, \omega)$ can be discontinuous, their jumps corresponding e.g. to the jumps of a Poisson process. For examples, see (1) and (3) in [17] (which goes back to [60]). We will start from the classical setting.

1.2' Remark. For a statistical model, score and information – if they exist – depend essentially on the choice of the parameterisation $\{P_\vartheta : \vartheta \in \Theta\} = \mathcal{P}$ for the model, but not on the choice of a dominating measure: in a dominated experiment, for different measures $\mu_1$ and $\mu_2$ dominating $\mathcal{P}$, we have with respect to $\mu_1 + \mu_2$

$\frac{dP_\vartheta}{d\mu_1} \cdot \frac{d\mu_1}{d(\mu_1 + \mu_2)} = \frac{dP_\vartheta}{d(\mu_1 + \mu_2)} = \frac{dP_\vartheta}{d\mu_2} \cdot \frac{d\mu_2}{d(\mu_1 + \mu_2)}$

where the second factor on the r.h.s. and the second factor on the l.h.s. do not involve $\vartheta$, hence do not contribute to the score $(\nabla \log f)(\vartheta, \cdot)$.

1.3 Example ($d = 1$, one-parametric paths in nonparametric models). Fix any probability measure $F$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, fix any function $h : (\mathbb{R}, \mathcal{B}(\mathbb{R})) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with the properties

$(\diamond)$  $\int h \, dF = 0$, $0 < \int h^2 \, dF < \infty$,

and write $\mathcal{F}$ for the class of all probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. We shall show that for suitable $\varepsilon > 0$ and $\Theta := (-\varepsilon, \varepsilon)$, there are parametric models

$\{F_\vartheta : \vartheta \in \Theta\} \subset \mathcal{F}$ with $F_0 = F$

of mutually equivalent probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ where score $M_\vartheta$ and Fisher information $I_\vartheta$ exist at every point $\vartheta \in \Theta$, and where we have at $\vartheta = 0$

$(\diamond\diamond)$  $M_0(\omega) = h(\omega)$, $I_0 = \int h^2 \, dF$.

Such models are termed one-parametric paths through $F$ in direction $h$.

(a) We consider first the case of bounded functions $h$ in $(\diamond)$. Let $\sup_{x \in \mathbb{R}} |h(x)| \le M < \infty$ and put

$(\ast)$  $E = \left(\mathbb{R}, \mathcal{B}(\mathbb{R}), \{F_\vartheta : |\vartheta| < M^{-1}\}\right)$, $F_\vartheta(d\omega) := (1 + \vartheta h(\omega)) \, F(d\omega)$.

Then the family $(\ast)$ is dominated by $\mu := F$, with strictly positive densities

$f(\vartheta, \omega) = \frac{dF_\vartheta}{dF}(\omega) = 1 + \vartheta h(\omega)$, $\vartheta \in \Theta$, $\omega \in \mathbb{R}$,

hence all probability laws in $(\ast)$ are equivalent. For all $|\vartheta| < M^{-1}$, score $M_\vartheta$ and Fisher information $I_\vartheta$ according to Definition 1.2 exist and have the form

(1.3')  $M_\vartheta(\omega) = \left(\frac{h}{1 + \vartheta h}\right)(\omega)$, $I_\vartheta = \int \left(\frac{h}{1 + \vartheta h}\right)^2 dF_\vartheta = \int \frac{h^2}{1 + \vartheta h} \, dF < \infty$

which at $\vartheta = 0$ gives $(\diamond\diamond)$ as indicated above.

(b) Now we consider general functions $h$ in $(\diamond)$. Select some truncation function $\psi \in C^1_0(\mathbb{R})$, the class of continuously differentiable functions $\mathbb{R} \to \mathbb{R}$ with compact support, such that the properties

$\psi(x) = x$ on $\{|x| < \tfrac{1}{3}\}$, $\psi(x) = 0$ on $\{|x| > 1\}$, $\max_{x \in \mathbb{R}} |\psi| < \tfrac{1}{2}$

are satisfied. Put

$(\ast\ast)$  $E = \left(\mathbb{R}, \mathcal{B}(\mathbb{R}), \{F_\vartheta : |\vartheta| < 1\}\right)$, $F_\vartheta(d\omega) := \left(1 + \left[\psi(\vartheta h(\omega)) - \int \psi(\vartheta h) \, dF\right]\right) F(d\omega)$

and note that in the special case of bounded $h$ as considered in (a) above, paths $(\ast)$ and $(\ast\ast)$ coincide when $\vartheta$ ranges over some small neighbourhood of $0$. By choice of $\psi$, the densities

$f(\vartheta, \omega) = 1 + \left[\psi(\vartheta h(\omega)) - \int \psi(\vartheta h) \, dF\right]$

are strictly positive, hence all probability measures in $(\ast\ast)$ are equivalent on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Since $\psi(\cdot)$ is in particular Lipschitz, dominated convergence shows

$\frac{d}{d\vartheta} \int \psi(\vartheta h) \, dF = \int h \, \psi'(\vartheta h) \, dF$.

This gives scores $M_\vartheta$ at $|\vartheta| < 1$

$M_\vartheta(\omega) = \left(\frac{d}{d\vartheta} \log f\right)(\vartheta, \omega) = \frac{h(\omega) \, \psi'(\vartheta h(\omega)) - \int h \, \psi'(\vartheta h) \, dF}{1 + \left[\psi(\vartheta h(\omega)) - \int \psi(\vartheta h) \, dF\right]}$

which by $(\diamond)$ belong to $L^2(F_\vartheta)$, and are centred under $F_\vartheta$ since

$\int \frac{h(\omega) \, \psi'(\vartheta h(\omega)) - \int h \, \psi'(\vartheta h) \, dF}{1 + \left[\psi(\vartheta h(\omega)) - \int \psi(\vartheta h) \, dF\right]} \, F_\vartheta(d\omega) = \int \left( h(\omega) \, \psi'(\vartheta h(\omega)) - \int h \, \psi'(\vartheta h) \, dF \right) F(d\omega) = 0$.

Thus with $M_\vartheta$ as specified here, pairs $(M_\vartheta, I_\vartheta)$,

(1.3'')  $I_\vartheta = \int \frac{\left( h(\omega) \, \psi'(\vartheta h(\omega)) - \int h \, \psi'(\vartheta h) \, dF \right)^2}{1 + \left[\psi(\vartheta h(\omega)) - \int \psi(\vartheta h) \, dF\right]} \, F(d\omega) < \infty$

satisfy all assumptions of Definition 1.2 for all $|\vartheta| < 1$. At $\vartheta = 0$, expressions (1.3'') reduce to $(\diamond\diamond)$. □
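As a hedged numerical companion to part (a) (an added illustration; the choices of $F$ and $h$ below are mine, not the book's): take $F = \mathcal{N}(0,1)$ and the bounded direction $h = \tanh$, so that $\int h \, dF = 0$ by symmetry and $\sup |h| \le 1$. The sketch, assuming NumPy and SciPy, evaluates $I_\vartheta = \int h^2/(1 + \vartheta h) \, dF$ along the path by quadrature and recovers $I_0 = \int h^2 \, dF$ at $\vartheta = 0$.

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

h = np.tanh        # bounded direction: int h dF = 0 by symmetry, sup|h| <= 1

def fisher_info(theta):
    # I_theta = int h^2 / (1 + theta*h) dF along the path dF_theta = (1 + theta*h) dF
    integrand = lambda x: h(x)**2 / (1.0 + theta * h(x)) * norm.pdf(x)
    return quad(integrand, -np.inf, np.inf)[0]

I0 = quad(lambda x: h(x)**2 * norm.pdf(x), -np.inf, np.inf)[0]
print(I0, fisher_info(0.0))   # coincide: I_0 = int h^2 dF, property (diamond diamond)
print(fisher_info(0.5))       # Fisher information elsewhere on the path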

1.4 Example (location models). Fix a probability measure $F$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ having density $f$ with respect to Lebesgue measure $\lambda$ such that

$f$ is differentiable on $\mathbb{R}$ with derivative $f'$,
$f$ is strictly positive on $(a, b)$, and $\equiv 0$ outside,

for some open interval $(a, b)$ in $\mathbb{R}$ ($-\infty \le a$, $b \le +\infty$), and assume

$\int_{(a,b)} \left(\frac{f'}{f}\right)^2 dF < \infty$.

Then the location model generated by $F$

$E := \left(\mathbb{R}, \mathcal{B}(\mathbb{R}), \{F_\vartheta : \vartheta \in \mathbb{R}\}\right)$, $dF_\vartheta := f(\cdot - \vartheta) \, d\lambda$

satisfies all assumptions made in Definition 1.2, with score at $\vartheta$

$M_\vartheta(\omega) = 1_{(a+\vartheta, \, b+\vartheta)}(\omega) \left(-\frac{f'}{f}\right)(\omega - \vartheta)$, $\omega \in \mathbb{R}$

since $F_\vartheta$ is concentrated on the interval $(a + \vartheta, b + \vartheta)$, and Fisher information

$I_\vartheta = E_\vartheta\left(M_\vartheta^2\right) = \int_{(a,b)} \left(\frac{f'}{f}\right)^2 dF$, $\vartheta \in \mathbb{R}$.

In particular, in the location model, the Fisher information does not depend on the parameter. □
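For a concrete instance of Example 1.4 (an added illustration, not in the original text): the logistic law has density $f(x) = e^{-x}/(1 + e^{-x})^2$ on $(a, b) = (-\infty, \infty)$, with $-\frac{f'}{f}(x) = 2F(x) - 1$ for $F$ the logistic distribution function, and the parameter-free Fisher information equals $1/3$. A minimal quadrature check, assuming NumPy and SciPy:

import numpy as np
from scipy.integrate import quad

# location model generated by the logistic law: f(x) = e^{-x} / (1 + e^{-x})^2
f = lambda x: np.exp(-x) / (1.0 + np.exp(-x))**2
# score at theta = 0: (-f'/f)(x) = 2*F(x) - 1 with F the logistic distribution function
score = lambda x: 2.0 / (1.0 + np.exp(-x)) - 1.0

I = quad(lambda x: score(x)**2 * f(x), -np.inf, np.inf)[0]
print(I)   # = 1/3, the (parameter-free) Fisher information of the logistic location model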

1.4' Exercise. In Example 1.4, we may consider $(a, b) := (0, 1)$ and $f(x) := c(\alpha) \, 1_{(0,1)}(x) \, [x(1-x)]^\alpha$ for parameter value $\alpha > 1$ (we write $c(\alpha)$ for the norming constant of the Beta $B(\alpha+1, \alpha+1)$ density). □

1.4'' Exercise (Location-scale models). For laws $F$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with density $f$ differentiable on $\mathbb{R}$ and supported by some open interval $(a, b)$ as in Example 1.4, consider the location-scale model generated by $F$

$E := \left(\mathbb{R}, \mathcal{B}(\mathbb{R}), \{F_{(\vartheta_1, \vartheta_2)} : \vartheta_1 \in \mathbb{R}, \, \vartheta_2 > 0\}\right)$, $dF_{(\vartheta_1, \vartheta_2)} := \frac{1}{\vartheta_2} \, f\!\left(\frac{\cdot - \vartheta_1}{\vartheta_2}\right) d\lambda$.

This model is parameterised by $\Theta := \{\vartheta = (\vartheta_1, \vartheta_2) : \vartheta_1 \in \mathbb{R}, \, \vartheta_2 > 0\}$. Show that the condition

$\int_{(a,b)} \left[1 + x^2\right] \left(\frac{f'}{f}\right)^2(x) \, F(dx) < \infty$

guarantees that the location-scale model admits score and Fisher information at every point in $\Theta$. Write

$G(x) := -\frac{f'(x)}{f(x)}$, $H(x) := -\left[1 + \frac{x f'(x)}{f(x)}\right]$, $a < x < b$

and show that the score $M_\vartheta$ at $\vartheta$ has the form

$M_\vartheta(\omega) = 1_{(\vartheta_1 + \vartheta_2 a, \, \vartheta_1 + \vartheta_2 b)}(\omega) \begin{pmatrix} \frac{1}{\vartheta_2} \, G\!\left(\frac{\omega - \vartheta_1}{\vartheta_2}\right) \\ \frac{1}{\vartheta_2} \, H\!\left(\frac{\omega - \vartheta_1}{\vartheta_2}\right) \end{pmatrix}$.

Write down the Fisher information $I_\vartheta$ at $\vartheta$. Check that $I_\vartheta$ is invertible for all $\vartheta \in \Theta$, and that $I_\vartheta$ depends on the parameter only through the scaling component $\vartheta_2$ (cf. [127, pp. 181–182]). □

We consider score and information in product models:

1.5 Lemma. Consider as in Definition 1.2 an experiment

$E := (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$, $\Theta \subset \mathbb{R}^d$ open

admitting score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$.

(a) Finite products satisfy again all assumptions of Definition 1.2: for every $n \ge 1$, the product experiment

$E_n := \left( \prod_{i=1}^n \Omega, \, \bigotimes_{i=1}^n \mathcal{A}, \, \{P_{n,\vartheta} := \bigotimes_{i=1}^n P_\vartheta : \vartheta \in \Theta\} \right)$

has score $\{M_{n,\vartheta} : \vartheta \in \Theta\}$ given by

$M_{n,\vartheta}(\omega_1, \dots, \omega_n) = \sum_{i=1}^n M_\vartheta(\omega_i)$  $P_{n,\vartheta}$-almost surely on $\prod_{i=1}^n \Omega$

and Fisher information $\{I_{n,\vartheta} : \vartheta \in \Theta\}$ satisfying

$I_{n,\vartheta} = n \cdot I_\vartheta$, $\vartheta \in \Theta$.

(b) Alternatively, it may be convenient to work with infinite product experiments

$E_\infty := \left( \prod_{i=1}^\infty \Omega, \, \bigotimes_{i=1}^\infty \mathcal{A}, \, \{Q_\vartheta := \bigotimes_{i=1}^\infty P_\vartheta : \vartheta \in \Theta\} \right)$,

with coordinate projections $X_i : (\omega_1, \omega_2, \dots) \to \omega_i$, $i \in \mathbb{N}$, and sub-$\sigma$-fields

$\mathcal{F}_n := \sigma(X_1, \dots, X_n) \subset \bigotimes_{i=1}^\infty \mathcal{A}$, $n \ge 1$,

and to express $n$-fold independent replication of the experiment $E$ as

$\widetilde{E}_n := \left( \prod_{i=1}^\infty \Omega, \, \mathcal{F}_n, \, \{P_{n,\vartheta} := Q_\vartheta |_{\mathcal{F}_n} : \vartheta \in \Theta\} \right)$.

Here $Q_\vartheta |_{\mathcal{F}_n}$ denotes the restriction of $Q_\vartheta$ to $\mathcal{F}_n$. Again all assumptions of Definition 1.2 hold for $\widetilde{E}_n$. On $\widetilde{E}_n$, the score is

$M_{n,\vartheta}(\omega_1, \omega_2, \dots) = \sum_{i=1}^n M_\vartheta(\omega_i)$  $P_{n,\vartheta}$-almost surely on $\prod_{i=1}^\infty \Omega$

for $\vartheta \in \Theta$, and the Fisher information is

$I_{n,\vartheta} = n \cdot I_\vartheta$, $\vartheta \in \Theta$.

Proof. Select a dominating measure $\mu$ for the experiment $E$, and versions $f_\vartheta$ of the densities $\frac{dP_\vartheta}{d\mu}$ which satisfy the assumptions of Definition 1.2.

(1) We prove (a). The product experiment $E_n$ is dominated by $\mu_n := \bigotimes_{i=1}^n \mu$, with densities

$f_{n,\vartheta}(\omega_1, \dots, \omega_n) = \frac{dP_{n,\vartheta}}{d\mu_n}(\omega_1, \dots, \omega_n) = \prod_{i=1}^n \frac{dP_\vartheta}{d\mu}(\omega_i) = \prod_{i=1}^n f_\vartheta(\omega_i)$.

Then $P_{n,\vartheta}$ is supported by the rectangle

$A_{n,\vartheta} := \{(\omega_1, \dots, \omega_n) : f_{n,\vartheta}(\omega_1, \dots, \omega_n) > 0\} = \prod_{i=1}^n \{\omega_i : f_\vartheta(\omega_i) > 0\}$

in $\bigotimes_{i=1}^n \mathcal{A}$. On the product space $(\prod_{i=1}^n \Omega, \bigotimes_{i=1}^n \mathcal{A})$ we have with the conventions of 1.2(a)

$M_{n,\vartheta}((\omega_1, \dots, \omega_n)) = 1_{A_{n,\vartheta}}((\omega_1, \dots, \omega_n)) \sum_{i=1}^n M_\vartheta(\omega_i)$.

Since $P_{n,\vartheta}(A_{n,\vartheta}) = 1$, the measurable mappings

$(\omega_1, \dots, \omega_n) \to M_{n,\vartheta}((\omega_1, \dots, \omega_n))$, $(\omega_1, \dots, \omega_n) \to \sum_{i=1}^n M_\vartheta(\omega_i)$

coincide $P_{n,\vartheta}$-almost surely on $(\prod_{i=1}^n \Omega, \bigotimes_{i=1}^n \mathcal{A})$, and are identified under $P_{n,\vartheta}$ according to Definition 1.2(b). We write $M_{\vartheta,j}$ for the components of $M_\vartheta$, $1 \le j \le d$, and $(I_\vartheta)_{j,l} = E_\vartheta\left(M_{\vartheta,j} M_{\vartheta,l}\right)$.

(2) Under $P_{n,\vartheta}$, successive observations $\omega_1, \dots, \omega_n$ are independent, hence

$(\omega_1, \dots, \omega_n) \to M_\vartheta(\omega_i)$, $1 \le i \le n$

are $\mathbb{R}^d$-valued i.i.d. random variables on $(\prod_{i=1}^n \Omega, \bigotimes_{i=1}^n \mathcal{A}, P_{n,\vartheta})$. On this space, we have for the components $M_{n,\vartheta,j} = 1_{\{f_{n,\vartheta} > 0\}} \left( \frac{\partial}{\partial \vartheta_j} \log f_{n,\vartheta} \right)$ of $M_{n,\vartheta}$

$E_{P_{n,\vartheta}}\left(M_{n,\vartheta,j} M_{n,\vartheta,l}\right) = \int \left( \sum_{r=1}^n M_{\vartheta,j}(\omega_r) \right) \left( \sum_{k=1}^n M_{\vartheta,l}(\omega_k) \right) P_{n,\vartheta}(d\omega_1, \dots, d\omega_n) = n \cdot (I_\vartheta)_{j,l}$

using $(\ast)$ in Definition 1.2. This is (a).

(3) The proof of assertion (b) is analogous since on the infinite product space $(\prod_{i=1}^\infty \Omega, \bigotimes_{i=1}^\infty \mathcal{A})$, in restriction to sub-$\sigma$-fields $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$ where $n < \infty$, the laws $Q_\vartheta |_{\mathcal{F}_n}$ are dominated by $[\bigotimes_{i=1}^\infty \mu] |_{\mathcal{F}_n}$ with density $(\omega_1, \omega_2, \dots) \to \prod_{i=1}^n f_\vartheta(\omega_i)$ as above. □

The Fisher information yields bounds for the quality of estimators. We present two
types of bounds.

1.6 Proposition (Cramér–Rao bound). Consider as in Definition 1.2 an experiment $E = (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$, $\Theta \subset \mathbb{R}^d$ open, with score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$. Consider a mapping $\kappa : \Theta \to \mathbb{R}^k$ which is partially differentiable, and let

$Y : (\Omega, \mathcal{A}) \to (\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$

denote a square-integrable unbiased estimator for $\kappa$.

(a) At points $\vartheta \in \Theta$ where the two conditions

(+) the Fisher information matrix $I_\vartheta$ is invertible,

(++)  $\begin{pmatrix} \frac{\partial}{\partial \vartheta_1} \kappa_1 & \dots & \frac{\partial}{\partial \vartheta_d} \kappa_1 \\ \dots & \dots & \dots \\ \frac{\partial}{\partial \vartheta_1} \kappa_k & \dots & \frac{\partial}{\partial \vartheta_d} \kappa_k \end{pmatrix}(\vartheta) = E_\vartheta\left(Y M_\vartheta^\top\right)$

are satisfied, we have a lower bound

$(\diamond)$  $\mathrm{Cov}_\vartheta(Y) \ge V_\vartheta \, I_\vartheta^{-1} \, V_\vartheta^\top$

at $\vartheta$, where $V_\vartheta$ denotes the Jacobi matrix of $\kappa$ on the l.h.s. of condition (++).

(b) In the setting of (a), $Y$ attains the bound $(\diamond)$ at $\vartheta$ (i.e. achieves $\mathrm{Cov}_\vartheta(Y) = V_\vartheta I_\vartheta^{-1} V_\vartheta^\top$) if and only if its estimation error at $\vartheta$ admits a representation

$Y - \kappa(\vartheta) = V_\vartheta \, I_\vartheta^{-1} M_\vartheta$  $P_\vartheta$-almost surely.

(c) In the special case $\kappa = \mathrm{id}$ in (a) and (b) above, the bound $(\diamond)$ at $\vartheta$ reads

$\mathrm{Cov}_\vartheta(Y) \ge I_\vartheta^{-1}$

and $Y$ attains the bound $(\diamond)$ at $\vartheta$ if and only if

$Y - \vartheta = I_\vartheta^{-1} M_\vartheta$  $P_\vartheta$-almost surely.

Proof. According to Definition 1.2, we have $M_\vartheta \in L^2(P_\vartheta)$ and $E_\vartheta(M_\vartheta) = 0$ for all $\vartheta \in \Theta$. Necessarily $I_\vartheta = E_\vartheta(M_\vartheta M_\vartheta^\top)$ is symmetric and non-negative definite for all $\vartheta \in \Theta$. We consider a point $\vartheta \in \Theta$ such that (+) holds, together with a mapping $\kappa : \Theta \to \mathbb{R}^k$ and a random variable $Y \in L^2(P_\vartheta)$ satisfying $E_\vartheta(Y) = \kappa(\vartheta)$ and (++).

(1) We start by defining $V_\vartheta$ by the right-hand side of (++):

$V_\vartheta := E_\vartheta\left(Y M_\vartheta^\top\right)$.

$\vartheta$ being fixed, introduce a random variable

$W := (Y - E_\vartheta(Y)) - V_\vartheta \, I_\vartheta^{-1} M_\vartheta$

taking values in $\mathbb{R}^k$. Then we have $W \in L^2(P_\vartheta)$ with $E_\vartheta(W) = 0$ and

$\mathrm{Cov}_\vartheta(W) = E_\vartheta(W W^\top) = \mathrm{Cov}_\vartheta(Y) - E_\vartheta\left((Y - \kappa(\vartheta)) \, M_\vartheta^\top I_\vartheta^{-1} V_\vartheta^\top\right) - E_\vartheta\left(V_\vartheta I_\vartheta^{-1} M_\vartheta \, (Y - \kappa(\vartheta))^\top\right) + E_\vartheta\left(V_\vartheta I_\vartheta^{-1} M_\vartheta M_\vartheta^\top I_\vartheta^{-1} V_\vartheta^\top\right)$.

On the r.h.s. of this equation, $M_\vartheta$ being centred under $P_\vartheta$ and $E_\vartheta\left(Y M_\vartheta^\top\right) = V_\vartheta$ by definition, both the second and the third summand reduce to $-V_\vartheta I_\vartheta^{-1} V_\vartheta^\top$. By definition of the Fisher information, the fourth summand on the r.h.s. is $+V_\vartheta I_\vartheta^{-1} V_\vartheta^\top$. In the sense of half-ordering of symmetric and non-negative definite matrices, we thus arrive at

$(\circ)$  $0 \le \mathrm{Cov}_\vartheta(W) = \mathrm{Cov}_\vartheta(Y) - V_\vartheta \, I_\vartheta^{-1} \, V_\vartheta^\top$

which is (a). Equality in $(\circ)$ is possible only in the case where $W = E_\vartheta(W) = 0$ $P_\vartheta$-almost surely. This is the ‘only if’ part of assertion (b), the ‘if’ part is obvious. So far, we did not make use of assumption (++).

(2) We consider the special case $k = d$ and $\kappa = \mathrm{id}$: here assumption (++) is needed to identify $V_\vartheta$ with the $d \times d$ identity matrix. Then (c) is an immediate consequence of (a) and (b). □
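A Monte Carlo sanity check of part (c) in a product model (an added illustration; the model and parameter values below are my choice): for $n$ i.i.d. Bernoulli($\vartheta$) observations, $I_\vartheta = 1/(\vartheta(1-\vartheta))$, so $I_{n,\vartheta}^{-1} = \vartheta(1-\vartheta)/n$; the empirical mean is unbiased and satisfies $\bar{X}_n - \vartheta = I_{n,\vartheta}^{-1} M_{n,\vartheta}$, hence attains the Cramér–Rao bound exactly.

import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 50, 200_000

X = rng.binomial(1, theta, size=(reps, n))
Tn = X.mean(axis=1)                  # empirical mean: unbiased estimator for theta

var_Tn = Tn.var()                    # Monte Carlo variance of the estimator
cr_bound = theta * (1 - theta) / n   # I_{n,theta}^{-1} = (n * I_theta)^{-1}
print(var_Tn, cr_bound)              # the two numbers agree: the bound is attained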

1.7 Remarks. (a) A purely heuristic argument for (++): one should be allowed to differentiate

$\kappa(\vartheta) = E_\vartheta(Y) = \int \mu(d\omega) \, f(\vartheta, \omega) \, Y(\omega)$

with respect to $\vartheta$ under the integral sign, with partial derivatives

$\frac{\partial}{\partial \vartheta_j} E_\vartheta(Y_i) = \int \mu(d\omega) \, \frac{\partial}{\partial \vartheta_j} f(\vartheta, \omega) \, Y_i(\omega) = \int P_\vartheta(d\omega) \left( \frac{\partial}{\partial \vartheta_j} \log f(\vartheta, \omega) \right) Y_i(\omega) = E_\vartheta\left( (Y M_\vartheta^\top)_{i,j} \right)$.

(b) Condition (++) in 1.6 holds in naturally parameterised $d$-parametric exponential families when the parameter space is open (see Barra [4, Chap. X and formula (2) on p. 178], Witting [127, pp. 152–153], or van der Vaart [126, Lemma 4.5 on p. 38]): in this case, the score at $\vartheta$ is given by the canonical statistic of the exponential family centred under $\vartheta$. See also Barndorff–Nielsen [3, Chap. 2.4] or Küchler and Sørensen [77]. □

1.7' Remark. Within the class of unbiased and square integrable estimators $Y$ for $\kappa$, the covariance matrix $\mathrm{Cov}_\vartheta(Y) = E_\vartheta\left( (Y - \kappa(\vartheta))(Y - \kappa(\vartheta))^\top \right)$, quantifying spread/concentration of estimation errors $Y - \kappa(\vartheta)$ at $\vartheta$, allows one to compare different estimators at the point $\vartheta$. The lower bound in $(\diamond)$ under the assumptions of Proposition 1.6 involves the inverse of the Fisher information $I_\vartheta^{-1}$ which thus may indicate an ‘optimal concentration’; when $\kappa = \mathrm{id}$ the lower bound in $(\diamond)$ equals $I_\vartheta^{-1}$.

However, there are two serious drawbacks:

(i) these bounds are attainable bounds only in a few classical parametric models, see [27, p. 198], or [127, pp. 312–317], and [86, Theorem 7.15 on p. 300];

(ii) unbiasedness is not ‘per se’ relevant for good estimation: a famous example due to Stein (see [60, p. 26] or [59, p. 93]) shows that even in a normal distribution model

$\left( \prod_{i=1}^n \mathbb{R}^k, \, \bigotimes_{i=1}^n \mathcal{B}(\mathbb{R}^k), \, \left\{ \bigotimes_{i=1}^n \mathcal{N}(\vartheta, I_k) : \vartheta \in \Theta \right\} \right)$, $\Theta := \mathbb{R}^k$ where $k \ge 3$,

estimators admitting bias can be constructed which (with respect to squared loss) concentrate better around the true $\vartheta$ than the empirical mean, the best unbiased square integrable estimator in this model.
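A minimal sketch of the Stein phenomenon mentioned in (ii) (an added illustration; the original text only asserts that such estimators exist, and the concrete James–Stein form below is one classical choice): for a single observation $X \sim \mathcal{N}(\vartheta, I_k)$ with $k \ge 3$, the biased shrinkage estimator $\left(1 - \frac{k-2}{|X|^2}\right) X$ has strictly smaller squared-error risk than the unbiased estimator $X$.

import numpy as np

rng = np.random.default_rng(2)
k, reps = 10, 100_000
theta = np.full(k, 0.5)                        # arbitrary true mean in R^k, k >= 3

X = rng.normal(theta, 1.0, size=(reps, k))     # one observation per replication
shrink = 1.0 - (k - 2) / (X**2).sum(axis=1)    # James-Stein shrinkage factor
JS = shrink[:, None] * X                       # biased shrinkage estimator

risk_unbiased = ((X - theta)**2).sum(axis=1).mean()   # = k, risk of the unbiased estimator
risk_JS = ((JS - theta)**2).sum(axis=1).mean()        # strictly smaller for k >= 3
print(risk_unbiased, risk_JS)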

The following bound, of a different nature, allows one to consider arbitrary estimators for the unknown parameter. We give it in dimension $d = 1$ only (multivariate generalisations exist, see [29]), and assuming strictly positive densities. The underlying idea is ‘Bayesian’: some probability measure $\nu(d\vartheta)$ playing on the parameter space $\Theta$ selects the true parameter $\vartheta \in \Theta$. Our proof follows Gill and Levit [29].

1.8 Proposition (van Trees inequality). Consider an experiment $E := (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$ where $\Theta$ is an open interval in $\mathbb{R}$, with strictly positive densities $f(\vartheta, \cdot) = \frac{dP_\vartheta}{d\mu}$ on $\Omega$ with respect to a dominating measure $\mu$, for all $\vartheta \in \Theta$. Assume that $E$, satisfying all assumptions of Definition 1.2, has score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$, and consider a differentiable mapping $\kappa : \Theta \to \mathbb{R}$.

Fix any subinterval $(a, b)$ of $\Theta$ and any a priori law $\nu$ with Lebesgue density $g := \frac{d\nu}{d\lambda}$ such that

(i) $g$ is differentiable on $\mathbb{R}$, strictly positive on $(a, b)$, and $\equiv 0$ else,

(ii) $\int_{(a,b)} \left(\frac{g'}{g}\right)^2 d\nu =: J < \infty$

hold. Whenever $a$ or $b$ coincide with a boundary point of $\Theta$, assume that $\kappa : \Theta \to \mathbb{R}$ admits at this point a finite limit denoted by $\kappa(a)$ or $\kappa(b)$, and $f(\cdot, \omega)$ a finite limit denoted by $f(a, \omega)$ or $f(b, \omega)$ for fixed $\omega \in \Omega$. Then we have the bound

$\int_{(a,b)} E_\vartheta\left( (T - \kappa(\vartheta))^2 \right) \nu(d\vartheta) \ \ge \ \frac{\left( \int_{(a,b)} \kappa'(\vartheta) \, \nu(d\vartheta) \right)^2}{\int_{(a,b)} I_\vartheta \, \nu(d\vartheta) + J}$

for arbitrary estimators $T$ for $\kappa$ in the experiment $E$.

Proof. (0) Note that $J$ is the Fisher information in the location model generated by $\nu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which satisfies all assumptions of Example 1.4.

(1) We show that the assumptions on the densities made in Definition 1.2 imply measurability of

$(\ast)$  $\Theta \times \Omega \ni (\vartheta, \omega) \to f(\vartheta, \omega) \in (0, \infty)$

when $\Theta \times \Omega$ is equipped with the product $\sigma$-field $\mathcal{B}(\Theta) \otimes \mathcal{A}$. In Definition 1.2, we have $\Theta$ open, and

for all $\vartheta \in \Theta$: $f(\vartheta, \cdot) : (\Omega, \mathcal{A}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is measurable,
for all $\omega \in \Omega$: $f(\cdot, \omega) : \Theta \to \mathbb{R}$ is continuous.

Write $A_k$ for the set of all $j \in \mathbb{Z}$ such that $\left( \frac{j}{2^k}, \frac{j+1}{2^k} \right]$ has a non-void intersection with $\Theta$, and select some point $\vartheta(k, j)$ in $\Theta \cap \left( \frac{j}{2^k}, \frac{j+1}{2^k} \right]$ for $j \in A_k$. Then all mappings

$f_k(\vartheta, \omega) := \sum_{j \in A_k} 1_{\left( \frac{j}{2^k}, \frac{j+1}{2^k} \right] \cap \Theta}(\vartheta) \, f(\vartheta(k, j), \omega)$, $k \ge 1$

are measurable in the pair $(\vartheta, \omega)$, hence the same holds for their pointwise limit $(\ast)$ as $k \to \infty$.

(2) The product measurability established in $(\ast)$ allows us to view

$(\vartheta, A) \to P_\vartheta(A) = \int 1_A(\omega) \, f(\vartheta, \omega) \, \mu(d\omega)$, $\vartheta \in \Theta$, $A \in \mathcal{A}$

as a transition probability from $(\Theta, \mathcal{B}(\Theta))$ to $(\Omega, \mathcal{A})$.

(3) We consider estimators $T : (\Omega, \mathcal{A}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ for $\kappa$ which have the property

$\int_{(a,b)} E_\vartheta\left( (T - \kappa(\vartheta))^2 \right) \nu(d\vartheta) < \infty$

(otherwise the bound in Proposition 1.8 would be trivial). To prove Proposition 1.8, it is sufficient to consider the restriction of the parameter space $\Theta$ to its subset $(a, b)$: thus we identify $\Theta$ with $(a, b)$ – then, by assumption, $g = \frac{d\nu}{d\lambda}$ will be strictly positive on $\Theta$ with limits $g(a) = g(b) = 0$, and $\kappa$ as well as $f(\cdot, \omega)$ for all $\omega$ will have finite limits at the endpoints of $\Theta$ – and work on the product space

$\left( \overline{\Omega}, \overline{\mathcal{A}} \right) := (\Theta \times \Omega, \, \mathcal{B}(\Theta) \otimes \mathcal{A})$

equipped with the probability measure

$\overline{P}(d\vartheta, d\omega) := \nu(d\vartheta) \, P_\vartheta(d\omega) = (\lambda \otimes \mu)(d\vartheta, d\omega) \, g(\vartheta) f(\vartheta, \omega)$, $\vartheta \in \Theta$, $\omega \in \Omega$.

(4) In the following steps, we write $'$ for the derivative with respect to the parameter (from the set of assumptions in Definition 1.2, recall differentiability of $\vartheta \to f(\vartheta, \omega)$ for fixed $\omega$ when $d = 1$). Then

(+)  $\int_\Theta d\vartheta \, \left( f(\vartheta, \omega) g(\vartheta) \right)' = f(b, \omega) g(b) - f(a, \omega) g(a) = 0$

for all $\omega \in \Omega$ since $g(a) = 0 = g(b)$ by our assumption, together with

(++)  $\int_\Theta d\vartheta \, \kappa(\vartheta) \left( f(\vartheta, \omega) g(\vartheta) \right)' = - \int_\Theta d\vartheta \, \kappa'(\vartheta) \left( f(\vartheta, \omega) g(\vartheta) \right)$

by partial integration. Combining (+) and (++) we get the equation

$\int_{\Theta \times \Omega} (\lambda \otimes \mu)(d\vartheta, d\omega) \left( f(\vartheta, \omega) g(\vartheta) \right)' \left( T(\omega) - \kappa(\vartheta) \right) = 0 + \int \mu(d\omega) \int_\Theta d\vartheta \, \kappa'(\vartheta) \, f(\vartheta, \omega) \, g(\vartheta) = \int \nu(d\vartheta) \, \kappa'(\vartheta)$.

By strict positivity of the densities and strict positivity of $g$ on $\Theta$, the l.h.s. of the first equality sign is

$\int_{\Theta \times \Omega} (\lambda \otimes \mu)(d\vartheta, d\omega) \left( f(\vartheta, \omega) g(\vartheta) \right)' \left( T(\omega) - \kappa(\vartheta) \right) = \int_\Theta \nu(d\vartheta) \int_\Omega P_\vartheta(d\omega) \, \frac{\left( f(\vartheta, \omega) g(\vartheta) \right)'}{f(\vartheta, \omega) g(\vartheta)} \left( T(\omega) - \kappa(\vartheta) \right) = \int_{\Theta \times \Omega} \overline{P}(d\vartheta, d\omega) \left( \frac{g'}{g}(\vartheta) + \frac{f'}{f}(\vartheta, \omega) \right) \left( T(\omega) - \kappa(\vartheta) \right)$.

In the last integrand, both factors

$\left( \frac{g'}{g}(\vartheta) + \frac{f'}{f}(\vartheta, \omega) \right)$, $\left( T(\omega) - \kappa(\vartheta) \right)$

are in $L^2(\overline{P})$: the second is the estimation error of an estimator $T$ for $\kappa$ which at the start of step (3) was assumed to be in $L^2(\overline{P})$, the first is the sum of the score in the location experiment generated by $\nu$ and the score $\{M_\vartheta : \vartheta \in \Theta\}$ in the experiment $E$, both necessarily orthogonal in $L^2(\overline{P})$:

(+++)  $\int_{\Theta \times \Omega} \overline{P}(d\vartheta, d\omega) \, \frac{g'}{g}(\vartheta) \, \frac{f'}{f}(\vartheta, \omega) = \int_\Theta \nu(d\vartheta) \, \frac{g'}{g}(\vartheta) \int_\Omega P_\vartheta(d\omega) \, \frac{f'}{f}(\vartheta, \omega) = 0$.

Putting the last three blocks of arguments together, the Cauchy–Schwarz inequality with (+++) gives

$\left( \int \nu(d\vartheta) \, \kappa'(\vartheta) \right)^2 = \left( \int_{\Theta \times \Omega} \overline{P}(d\vartheta, d\omega) \left( \frac{g'}{g}(\vartheta) + \frac{f'}{f}(\vartheta, \omega) \right) \left( T(\omega) - \kappa(\vartheta) \right) \right)^2$
$\le \int_{\Theta \times \Omega} \overline{P}(d\vartheta, d\omega) \left( \frac{g'}{g}(\vartheta) + \frac{f'}{f}(\vartheta, \omega) \right)^2 \cdot \int_{\Theta \times \Omega} \overline{P}(d\vartheta, d\omega) \left( T(\omega) - \kappa(\vartheta) \right)^2$
$= \left( J + \int_\Theta \nu(d\vartheta) \, I_\vartheta \right) \cdot \int_\Theta \nu(d\vartheta) \, E_\vartheta\left( (T - \kappa(\vartheta))^2 \right)$

which is the assertion. □
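A numerical sanity check of Proposition 1.8 (an added illustration; all concrete choices are mine): take $E = \{\mathcal{N}(\vartheta, 1) : \vartheta \in \mathbb{R}\}$, so $I_\vartheta \equiv 1$ and $\kappa = \mathrm{id}$, and the prior density $g(u) = \cos^2(\pi u / 2)$ on $(-1, 1)$, for which $J = \pi^2$. The sketch, assuming NumPy and SciPy, estimates the Bayes risk of the posterior mean (the squared-loss Bayesian, cf. Exercise 1.8'' below) by Monte Carlo and compares it with the van Trees lower bound $1/(\int I_\vartheta \, d\nu + J) = 1/(1 + \pi^2)$.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
reps = 10_000

# prior density g(u) = cos^2(pi*u/2) on (-1,1); it integrates to 1, and J = pi^2
u = rng.uniform(-1, 1, size=8 * reps)
keep = rng.uniform(0, 1, size=u.size) < np.cos(np.pi * u / 2) ** 2
theta = u[keep][:reps]                       # rejection sample from the prior
x = rng.normal(theta, 1.0)                   # one observation from N(theta, 1), I_theta = 1

grid = np.linspace(-0.999, 0.999, 801)       # quadrature grid on (-1,1) for the posterior
g = np.cos(np.pi * grid / 2) ** 2
post = norm.pdf(x[:, None] - grid[None, :]) * g    # unnormalised posterior on the grid
T = (post * grid).sum(axis=1) / post.sum(axis=1)   # posterior mean = squared-loss Bayesian

bayes_risk = ((T - theta) ** 2).mean()
print(bayes_risk, 1.0 / (1.0 + np.pi ** 2))  # van Trees: Bayes risk >= 1/(int I dnu + J)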

1.8' Exercise. For $\Theta \subset \mathbb{R}^d$ open, assuming densities which are continuous in the parameter, cover $\mathbb{R}^d$ with half-open cubes of side length $2^{-k}$ to prove

$\Theta \times \Omega \ni (\vartheta, \omega) \to f(\vartheta, \omega) \in [0, \infty)$ is $(\mathcal{B}(\Theta) \otimes \mathcal{A})$–$\mathcal{B}([0, \infty))$-measurable.

In the case where $d = 1$, the assertion holds under right-continuity (or left-continuity) of the densities in the parameter.

1.8'' Exercise. For $\Theta \subset \mathbb{R}^d$ open, for product measurable densities $(\vartheta, \omega) \to f(\vartheta, \omega)$ with respect to some dominating measure $\mu$ on $(\Omega, \mathcal{A})$, consider as in Proposition 1.8 the product space $\left( \overline{\Omega}, \overline{\mathcal{A}} \right) = (\Theta \times \Omega, \mathcal{B}(\Theta) \otimes \mathcal{A})$ equipped with

$\overline{P}(d\vartheta, d\omega) = \nu(d\vartheta) \, P_\vartheta(d\omega) = (\lambda \otimes \mu)(d\vartheta, d\omega) \, g(\vartheta) f(\vartheta, \omega)$, $\vartheta \in \Theta$, $\omega \in \Omega$

where $\nu$ is some probability law on $(\Theta, \mathcal{B}(\Theta))$ with Lebesgue density $g$, and view $(\vartheta, \omega) \to \vartheta$ and $(\vartheta, \omega) \to \omega$ as random variables on $(\overline{\Omega}, \overline{\mathcal{A}})$. We wish to estimate some measurable mapping $\kappa : (\Theta, \mathcal{B}(\Theta)) \to (\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$: fixing some loss function $\ell : \mathbb{R}^k \to [0, \infty)$, we call any estimator $T^\ast : (\Omega, \mathcal{A}) \to (\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$ with the property

$\inf_{T \ \mathcal{A}\text{-mb}} \int_\Theta \nu(d\vartheta) \, E_\vartheta\left( \ell(T - \kappa(\vartheta)) \right) = \int_\Theta \nu(d\vartheta) \, E_\vartheta\left( \ell(T^\ast - \kappa(\vartheta)) \right)$

$\ell$-Bayesian with respect to the a priori law $\nu$. Here ‘inf’ is over the class of all measurable mappings $T : (\Omega, \mathcal{A}) \to (\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$, i.e. over the class of all possible estimators for $\kappa$. So far, we leave open questions of existence (see Section 37 in Strasser [121]).

In the case where $\ell(y) = |y|^2$ and $\kappa \in L^2(\nu)$, prove that a squared loss Bayesian exists and is given by

$T^\ast(\omega) = \begin{cases} \int_\Theta \kappa(\zeta) \, \frac{f(\zeta, \omega) \, g(\zeta)}{\int_\Theta f(\zeta', \omega) \, g(\zeta') \, d\zeta'} \, d\zeta & \text{if } \int_\Theta f(\zeta, \omega) \, g(\zeta) \, d\zeta > 0 \\ \vartheta_0 & \text{if } \int_\Theta f(\zeta, \omega) \, g(\zeta) \, d\zeta = 0 \end{cases}$

with arbitrary default value $\vartheta_0 \in \Theta$.

Hint: In a squared loss setting, the Bayes property reduces to the $L^2$-projection property of conditional expectations in $(\overline{\Omega}, \overline{\mathcal{A}}, \overline{P})$. Writing down conditional densities, we have a regular version of the conditional law of $\vartheta : (\vartheta, \omega) \to \vartheta$ given $\omega : (\vartheta, \omega) \to \omega$. The random variable $\kappa : (\vartheta, \omega) \to \kappa(\vartheta)$ belonging to $L^2(\overline{P})$, the conditional expectation of $\kappa$ given $\omega$ under $\overline{P}$ is the integral of $\kappa(\cdot)$ with respect to this conditional law. □

1.2 Estimator Sequences, Asymptotics of Information Bounds

1.9 Definition. Consider a sequence of experiments

$E_n = (\Omega_n, \mathcal{A}_n, \{P_{n,\vartheta} : \vartheta \in \Theta\})$, $n \ge 1$

parameterised by the same parameter set $\Theta \subset \mathbb{R}^d$ which does not depend on $n$, and a mapping $\kappa : \Theta \to \mathbb{R}^k$. An estimator sequence for $\kappa$ is a sequence $(Y_n)_n$ of measurable mappings

$Y_n : (\Omega_n, \mathcal{A}_n) \to (\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$, $n \ge 1$.

(a) An estimator sequence $(Y_n)_n$ is called consistent for $\kappa$ if

for every $\vartheta \in \Theta$, every $\varepsilon > 0$: $\lim_{n \to \infty} P_{n,\vartheta}\left( |Y_n - \kappa(\vartheta)| > \varepsilon \right) = 0$

(convergence in $(P_{n,\vartheta})_n$-probability of the sequence $(Y_n)_n$ to $\kappa(\vartheta)$, for every $\vartheta \in \Theta$).

(b) Associate sequences $(\varphi_n(\vartheta))_n$ to parameter values $\vartheta \in \Theta$, either taking values in $(0, \infty)$ and such that $\varphi_n(\vartheta)$ increases to $\infty$ as $n \to \infty$, or taking values in the space of invertible $d \times d$-matrices such that minimal eigenvalues $\lambda_n(\vartheta)$ of $\varphi_n(\vartheta)$ increase to $\infty$ as $n \to \infty$. Then an estimator sequence $(Y_n)_n$ for $\kappa$ is called $(\varphi_n)_n$-consistent if

for every $\vartheta \in \Theta$, $\left\{ \mathcal{L}\left( \varphi_n(\vartheta)(Y_n - \kappa(\vartheta)) \mid P_{n,\vartheta} \right) : n \ge 1 \right\}$ is tight in $\mathbb{R}^k$.

(c) $(\varphi_n)_n$-consistent estimator sequences $(Y_n)_n$ for $\kappa$ are called asymptotically normal if

for every $\vartheta \in \Theta$, $\mathcal{L}\left( \varphi_n(\vartheta)(Y_n - \kappa(\vartheta)) \mid P_{n,\vartheta} \right) \to \mathcal{N}(0, \Sigma(\vartheta))$ as $n \to \infty$

(weak convergence in $\mathbb{R}^k$), for suitable normal distributions $\mathcal{N}(0, \Sigma(\vartheta))$, $\vartheta \in \Theta$.

For the remaining part of this section we focus on independent replication of an experiment

$E := (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$, $\Theta \subset \mathbb{R}^d$ open

which satisfies all assumptions of Definition 1.2, with score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$. We consider for $n \to \infty$ the sequence of product models

$(\diamond)$  $E_n := (\Omega_n, \mathcal{A}_n, \{P_{n,\vartheta} : \vartheta \in \Theta\}) = \left( \prod_{i=1}^n \Omega, \, \bigotimes_{i=1}^n \mathcal{A}, \, \{P_{n,\vartheta} := \bigotimes_{i=1}^n P_\vartheta : \vartheta \in \Theta\} \right)$

as in Lemma 1.5(a), with score $M_{n,\vartheta}$ in $\vartheta$ and information $I_{n,\vartheta} = n I_\vartheta$. In this setting we present some asymptotic lower bounds for the risk of estimators, in terms of the Fisher information.

1.10 Remark (Asymptotic Cramér–Rao bound). Consider $(E_n)_n$ as in $(\diamond)$ and assume that $I_\vartheta$ is invertible for all $\vartheta \in \Theta$. Let $(T_n)_n$ denote some sequence of unbiased and square integrable estimators for the unknown parameter, $\sqrt{n}$-consistent and asymptotically normal:

$(\circ)$  for every $\vartheta \in \Theta$: $\mathcal{L}\left( \sqrt{n}(T_n - \vartheta) \mid P_{n,\vartheta} \right) \to \mathcal{N}(0, \Sigma(\vartheta))$, $n \to \infty$

(weak convergence in $\mathbb{R}^d$). The Cramér–Rao bound in $E_n$

$E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)][\sqrt{n}(T_n - \vartheta)]^\top \right) = n \, \mathrm{Cov}_\vartheta(T_n) \ge n \, I_{n,\vartheta}^{-1} = I_\vartheta^{-1}$

makes an ‘optimal’ limit variance $\Sigma(\vartheta) = I_\vartheta^{-1}$ appear in $(\circ)$, for every $\vartheta \in \Theta$. Given one estimator sequence $(T_n)_n$ whose rescaled estimation errors at $\vartheta$ attain the limit law $\mathcal{N}\left(0, I_\vartheta^{-1}\right)$, one would like to call this sequence ‘optimal’. The problem is that Cramér–Rao bounds do not allow for comparison within a sufficiently broad class of competing estimator sequences. Fix $\vartheta \in \Theta$. Except for unbiasedness of estimators at $\vartheta$, Cramér–Rao needs the assumption (++) in Proposition 1.6 for $\kappa = \mathrm{id}$, and needs $(\ast)$ from Definition 1.2: both last assumptions

$E_\vartheta(M_{n,\vartheta}) = 0$, $E_\vartheta\left(T_n M_{n,\vartheta}^\top\right) = I$, $n \ge 1$

(with $0$ the zero vector in $\mathbb{R}^d$ and $I$ the identity matrix in $\mathbb{R}^{d \times d}$) combine in particular to

$(\ast)$  $E_\vartheta\left( [T_n - \vartheta] \, M_{n,\vartheta}^\top \right) = I$ for all $n \ge 1$.

Thus from the very beginning, condition (++) of Proposition 1.6 for $\kappa = \mathrm{id}$ establishes a close connection between the sequence of scores $(M_{n,\vartheta})_n$ on the one hand and those estimator sequences $(T_n)_n$ to which we may apply Cramér–Rao on the other. Hence the Cramér–Rao setting turns out to be a restricted setting.

1.11 Theorem (Asymptotic van Trees bounds). Consider an experiment $E := (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$ where $\Theta$ is an open interval in $\mathbb{R}$, with strictly positive densities $f(\vartheta, \cdot) = \frac{dP_\vartheta}{d\mu}$ on $\Omega$ with respect to a dominating measure $\mu$, for all $\vartheta \in \Theta$. Assume that $E$, satisfying all assumptions of Definition 1.2, admits score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$, and assume in addition

$(\diamond\diamond)$  $\Theta \ni \vartheta \to I_\vartheta \in (0, \infty)$ is continuous.

Then for independent replication $(\diamond)$ of the experiment $E$, for arbitrary choice of estimators $T_n$ for the unknown parameter $\vartheta \in \Theta$ in the product experiments $E_n$, we have the two bounds (I) and (II)

(I)  $\lim_{c \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < c} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ \ge \ I_{\vartheta_0}^{-1}$

(II)  $\lim_{C \uparrow \infty} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < C/\sqrt{n}} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ \ge \ I_{\vartheta_0}^{-1}$

at every $\vartheta_0 \in \Theta$.

Proof. Up to easy notational changes it is sufficient to consider the case $\Theta = \mathbb{R}$ and $\vartheta_0 = 0$.

(1) Select a probability measure $\nu$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with Lebesgue density $g$ such that $g$ is differentiable in $\vartheta$, has support $\{g > 0\} = (-1, 1)$, and satisfies

$J := \int_{(-1,1)} \left(\frac{g'}{g}\right)^2 d\nu < \infty$.

Then the location model generated by $\nu$ satisfies the conditions of Example 1.4, and thus conditions (i) and (ii) in Proposition 1.8. The same holds for all probability measures $\nu_r$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ obtained from $\nu$ by scaling

$\nu_r(d\vartheta) := \frac{1}{r} \, g\!\left(\frac{\vartheta}{r}\right) d\vartheta$, $0 < r < \infty$:

$\nu_r$ is concentrated on $(-r, +r)$, and generates a location model with Fisher information $J_r := \frac{1}{r^2} J$. For arbitrary estimators $T_n$ for $\kappa = \mathrm{id}$ in the model $E_n$ and arbitrary $0 < r < \infty$, the inequality

$\sup_{|\vartheta - \vartheta_0| < r} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ge \int_{-r}^{+r} \nu_r(d\vartheta) \, E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right)$

(finite or infinite) is trivial; to its r.h.s. we apply the van Trees inequality in Proposition 1.8 and continue

$(\circ)$  $\ge \ n \cdot \frac{1}{\int_{-r}^{+r} I_{n,\vartheta} \, \nu_r(d\vartheta) + J_r} = \frac{1}{\int_{-r}^{+r} I_\vartheta \, \nu_r(d\vartheta) + \frac{1}{r^2 n} J}$

for arbitrary $0 < r < \infty$.

(2) Consider $c > 0$ small. First, keeping $r := c$ in $(\circ)$ fixed, $\frac{1}{c^2 n} J$ vanishes as $n \to \infty$, thus

$\liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - 0| < c} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ge \frac{1}{\int_{-c}^{+c} I_\vartheta \, \nu_c(d\vartheta)}$.

Second, for $c \downarrow 0$ on both sides of the last inequality, continuity $(\diamond\diamond)$ of the Fisher information gives

$\liminf_{c \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - 0| < c} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ge I_0^{-1}$.

On the l.h.s. of the preceding inequality, the term

$\liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - 0| < c} R_n(T_n, \vartheta)$, $R_n(T_n, \vartheta) := E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right)$

is monotone in $c$ since for $c_1 < c_2$

$\sup_{|\vartheta - 0| < c_2} R_n(T_n, \vartheta) \ \ge \ \sup_{|\vartheta - 0| < c_1} R_n(T_n, \vartheta) \ \ge \ \inf_{\widetilde{T}_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - 0| < c_1} R_n(\widetilde{T}_n, \vartheta)$.

Hence ‘$\liminf_{c \downarrow 0}$’ above is in fact ‘$\lim_{c \downarrow 0}$’ which proves the bound (I).

(3) Consider $C < \infty$ large. With $r = C/\sqrt{n}$ in the above chain of inequalities $(\circ)$, we exploit again continuity $(\diamond\diamond)$ of the Fisher information and get

$\liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - 0| < C/\sqrt{n}} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ge \frac{1}{I_0 + \frac{1}{C^2} J}$.

Similar to step (2) above, the l.h.s. is monotone in $C$, thus

$\lim_{C \uparrow \infty} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - 0| < C/\sqrt{n}} E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right) \ge I_0^{-1}$

which is the bound (II). This concludes the proof of Theorem 1.11. □

Both bounds (I) and (II) in Theorem 1.11 are asymptotic lower bounds of minimax type: using ‘best possible’ estimators for the unknown parameter – where at every stage $n$ of the asymptotics, competition is between all estimators which may exist in $E_n$ – we minimise a maximal risk on small balls around points $\vartheta_0$ in $\Theta$. For independent replication of an experiment $E$ which satisfies all assumptions of Proposition 1.8, the van Trees inequality thus shows that the maximal risk of estimators on small balls around $\vartheta_0$ (with respect to squared loss, and with estimation errors rescaled by norming constants $\sqrt{n}$ as $n \to \infty$) will never be better than $I_{\vartheta_0}^{-1}$, the inverse of the Fisher information at $\vartheta_0$.

The two types of bounds (I) and (II) are different in that they imply different notions of ‘small neighbourhoods around $\vartheta_0$’. Type (II) neighbourhoods are shrinking balls of radius $O(1/\sqrt{n})$: at first glance this seems to be less natural than the small balls not depending on $n$ which are used in type (I). Let us compare the two types of bounds, writing again $R_n(T_n, \vartheta) := E_\vartheta\left( [\sqrt{n}(T_n - \vartheta)]^2 \right)$ as in the last proof. For every estimator sequence $(T_n)_n$ and every pair of constants $0 < c < C < \infty$, we have

$\sup_{|\vartheta - \vartheta_0| < c} R_n(T_n, \vartheta) \ \ge \ \sup_{|\vartheta - \vartheta_0| < C/\sqrt{n}} R_n(T_n, \vartheta) \ \ge \ \inf_{\widetilde{T}_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < C/\sqrt{n}} R_n(\widetilde{T}_n, \vartheta)$

for sufficiently large $n$. It follows that

$\liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < c} R_n(T_n, \vartheta) \ \ge \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < C/\sqrt{n}} R_n(T_n, \vartheta)$

for arbitrary pairs $(c, C)$, $c > 0$ small and $C < \infty$ large. Using again the monotonicity argument of the last proof, we arrive at a comparison

$\lim_{c \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < c} R_n(T_n, \vartheta) \ \ge \ \lim_{C \uparrow \infty} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta - \vartheta_0| < C/\sqrt{n}} R_n(T_n, \vartheta)$

between left-hand sides in the type (I) or type (II) bounds of Theorem 1.11.

The two types of bounds correspond to different traditions. On the one hand, see e.g. Ibragimov and Khasminskii [60, Theorem 12.1, p. 162] and Kutoyants [80, p. 57 or pp. 114–115] who – in more general settings than the present one – work with bounds of type (I). To prove that a given sequence $(\widetilde{T}_n)_n$ attains a bound of type (I):

$\lim_{c \downarrow 0} \ \limsup_{n \to \infty} \ \sup_{|\vartheta - \vartheta_0| < c} R_n(\widetilde{T}_n, \vartheta) = I_{\vartheta_0}^{-1}$

one needs some ‘uniformity in the parameter’ for weak convergence of rescaled estimation errors which has to be proved separately. On the other hand, Le Cam [81, 82], Hajek [40], Davies [19] or Le Cam and Yang [84] work – in more general settings than the present one – with bounds of type (II). To prove that an estimator sequence $(\widetilde{T}_n)_n$ attains a type (II) bound

$\lim_{C \uparrow \infty} \ \limsup_{n \to \infty} \ \sup_{|\vartheta - \vartheta_0| < C/\sqrt{n}} R_n(\widetilde{T}_n, \vartheta) = I_{\vartheta_0}^{-1}$

no separate proof for ‘uniformity in the parameter’ for weak convergence of rescaled estimation errors is needed since ‘Le Cam’s third lemma’ settles this problem, see Chapter 3. Our focus in later chapters will be on bounds of type (II), see Chapter 7; we will also be interested in loss functions different from squared loss, and in estimator sequences $(\widetilde{T}_n)_n$ which achieve type (II) bounds simultaneously with respect to a broad class of loss functions.

We conclude the present section by one example illustrating type (I) bounds in the spirit of the references [60] and [80] mentioned above.

1.11' Example. We consider i.i.d. observations $X_i$, $i \ge 1$, with unknown distribution function $F$ in $\mathcal{F}$, the class of all distribution functions on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, and write $\widehat{F}_n$ for the empirical distribution function based on the first $n$ observations $X_1, \dots, X_n$. For $x \in \mathbb{R}$ fixed, we consider mappings

$\kappa : \mathcal{F} \ni F \to \kappa(F) := F(x) \in [0, 1]$

and wish to prove that the sequence $T_n := \widehat{F}_n(x)$ is asymptotically minimax for $\kappa$ in a sense which we define below. Assertions of type ‘the empirical distribution function provides asymptotically optimal estimators for unknown $F \in \mathcal{F}$’ were first proved by Beran [9].

Let $X_i$, $i \ge 1$, be defined on some $(\Omega, \mathcal{A})$, let $\mathcal{A}_n := \sigma(X_1, \dots, X_n)$ denote the sub-$\sigma$-field of $\mathcal{A}$ generated by the first $n$ observations as in Lemma 1.5(b), and define neighbourhoods of $F$ in $\mathcal{F}$

$V_\delta(F) := \left\{ \widetilde{F} \in \mathcal{F} : \sup_{y \in \mathbb{R}} |\widetilde{F}(y) - F(y)| < \delta \right\}$

with radius $\delta > 0$. Then for every $F \in \mathcal{F}$ and every point $x \in \mathbb{R}$ such that $0 < F(x) < 1$, we have a lower bound

(i)  $\lim_{\delta \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{\widetilde{F} \in V_\delta(F)} E_{\widetilde{F}}\left( n \, [T_n - \widetilde{F}(x)]^2 \right) \ \ge \ F(x)(1 - F(x))$,

and the empirical distribution function satisfies

(ii)  $\lim_{\delta \downarrow 0} \ \limsup_{n \to \infty} \ \sup_{\widetilde{F} \in V_\delta(F)} E_{\widetilde{F}}\left( n \, [\widehat{F}_n(x) - \widetilde{F}(x)]^2 \right) = F(x)(1 - F(x))$

and thus attains the bound specified in (i). The proof is in several steps.

(1) Fix $F \in \mathcal{F}$ and $x \in \mathbb{R}$. It is easy to prove (ii): from

$\widehat{F}_n(x) - \widetilde{F}(x) = \frac{1}{n} \sum_{i=1}^n \left( 1_{(-\infty, x]}(Y_i) - \widetilde{F}(x) \right)$

for independent random variables $Y_i \sim \widetilde{F}$ we get

$E_{\widetilde{F}}\left( n \, [\widehat{F}_n(x) - \widetilde{F}(x)]^2 \right) = \widetilde{F}(x)(1 - \widetilde{F}(x))$, not depending on $n \ge 1$,

where by definition of $V_\delta(F)$ in terms of the uniform distance $\|f - g\|_\infty = \sup_{y \in \mathbb{R}} |f(y) - g(y)|$

$\sup_{\widetilde{F} \in V_\delta(F)} \widetilde{F}(x)(1 - \widetilde{F}(x)) \to F(x)(1 - F(x))$ as $\delta \downarrow 0$.

Thus the empirical distribution function attains the bound proposed in (i).

(2) It remains to prove the bound (i). Fix $F \in \mathcal{F}$ and $x \in \mathbb{R}$ such that $0 < F(x) < 1$. Let $H$ denote the system of functions $h$ in $L^2(F)$ which satisfy

$h$ bounded on $\mathbb{R}$, $\int_{-\infty}^{\infty} h \, dF = 0$, $\int_{-\infty}^{x} h \, dF = 1$

(in addition to Example 1.3(a), the third condition yields a particular norming of functions $\widetilde{h}$ which satisfy the first two conditions together with $\int_{-\infty}^{x} \widetilde{h} \, dF \ne 0$). For $h \in H$ and neighbourhoods $V_\delta(F)$ of $F$ we introduce one-parametric paths $S^h$ through $F$ by

$\delta^h := \frac{\delta}{\sup |h|}$, $S^h := \{F_\vartheta^h : |\vartheta| < \delta^h\}$, $dF_\vartheta^h := (1 + \vartheta h) \, dF$.

The choice of $\delta^h$ makes sure that $S^h$ is contained in $V_\delta(F)$. As shown in (1.3'),

$I_\vartheta^h := \int \frac{h^2}{1 + \vartheta h} \, dF$, $|\vartheta| < \delta^h$

is the Fisher information in the path $S^h$; in particular, we have in $S^h$

$I_0^h = \int h^2 \, dF$ at $\vartheta = 0$.

(3) Fix $h \in H$. Due to the norming factor included in the definition of $H$ we have in $S^h$

$F_\vartheta^h(x) = \int_{-\infty}^{x} (1 + \vartheta h) \, dF = F(x) + \vartheta$, $|\vartheta| < \delta^h$,

hence any $\mathcal{A}_n$-measurable estimator $T_n$ for $\kappa : \mathcal{F} \ni \widetilde{F} \to \widetilde{F}(x) \in [0, 1]$ estimates in restriction to $S^h$

(+)  $\kappa : S^h \ni F_\vartheta^h \to F(x) + \vartheta$.

Note that (+) makes a shift by $F(x)$ appear; recall that $F$ and $x$ are fixed. In restriction to $S^h$, we may associate to $\kappa$ a new mapping $\kappa^h$ and to $T_n$ a new estimator $\overline{T}_n$

$\kappa^h(\vartheta) := \kappa(F_\vartheta^h) - F(x) = \vartheta$, $F_\vartheta^h \in S^h$, $\overline{T}_n := T_n - F(x)$.

Then $\overline{T}_n$ is an $\mathcal{A}_n$-measurable estimator for $\kappa^h = \mathrm{id}$ on $\{\vartheta : |\vartheta| < \delta^h\}$. An estimator for the unknown parameter $\vartheta$ in the model $\{\vartheta : |\vartheta| < \delta^h\}$ corresponds to any $\mathcal{A}_n$-measurable estimator for $\kappa$ in restriction to $S^h$. Now we apply the asymptotic van Trees bound in Theorem 1.11 for estimation of the unknown parameter $\vartheta$ in small type (I) neighbourhoods of the point $\vartheta_0 = 0$ in $\{\vartheta : |\vartheta| < \delta^h\}$:

$\lim_{c \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{\overline{T}_n \ \mathcal{A}_n\text{-mb}} \ \sup_{|\vartheta| < c} E_{F_\vartheta^h}\left( [\sqrt{n}(\overline{T}_n - \vartheta)]^2 \right) \ \ge \ \left( I_0^h \right)^{-1}$.

Transforming this back from $(\overline{T}_n, \kappa^h = \mathrm{id})$ to $(T_n, \kappa)$, we obtain at the point $F$ in $S^h$ a bound

(++)  $\lim_{c \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{F_\vartheta^h : |\vartheta| < c} E_{F_\vartheta^h}\left( [\sqrt{n}(T_n - \kappa(F_\vartheta^h))]^2 \right) \ \ge \ \left( I_0^h \right)^{-1}$.

(4) Now we consider the system of functions $h \in H$ with associated one-parametric paths $S^h$ passing through $F$, and have from $S^h \subset V_\delta(F)$ and from (++) bounds

$\lim_{\delta \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{\widetilde{F} \in V_\delta(F)} E_{\widetilde{F}}\left( [\sqrt{n}(T_n - \kappa(\widetilde{F}))]^2 \right) \ \ge \ \lim_{c \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{F_\vartheta^h : |\vartheta| < c} E_{F_\vartheta^h}\left( [\sqrt{n}(T_n - \kappa(F_\vartheta^h))]^2 \right) \ \ge \ \left( I_0^h \right)^{-1}$

for arbitrary $h \in H$, whence

$\lim_{\delta \downarrow 0} \ \liminf_{n \to \infty} \ \inf_{T_n \ \mathcal{A}_n\text{-mb}} \ \sup_{\widetilde{F} \in V_\delta(F)} E_{\widetilde{F}}\left( [\sqrt{n}(T_n - \widetilde{F}(x))]^2 \right) \ \ge \ \sup\left\{ \left( I_0^h \right)^{-1} : h \in H \right\}$.

(5) In order to conclude the proof of (i) on the basis of the last inequality, it is sufficient to prove

(+++)  $\sup\left\{ \left( I_0^h \right)^{-1} : h \in H \right\} = F(x)(1 - F(x))$.

This is a consequence of the norming conditions imposed on the functions in class $H$ at the start of step (2), together with Cauchy–Schwarz:

$1 = \int_{-\infty}^{x} h(y) \, F(dy) - 0 = \int_{-\infty}^{+\infty} h(y) \left[ 1_{(-\infty, x]}(y) - F(x) \right] F(dy)$
$\le \left( \int h^2 \, dF \right)^{\frac{1}{2}} \left( \int \left[ 1_{(-\infty, x]}(y) - F(x) \right]^2 F(dy) \right)^{\frac{1}{2}} = \left( I_0^h \right)^{\frac{1}{2}} \left( F(x)(1 - F(x)) \right)^{\frac{1}{2}}$

where equality holds if and only if

$(\circ)$  for some $c \in \mathbb{R}$: $h(y) = c \cdot \left[ 1_{(-\infty, x]}(y) - F(x) \right]$, $y \in \mathbb{R}$

or equivalently (again as a consequence of the norming conditions in class $H$) if and only if

$(\circ\circ)$  $c = \frac{1}{F(x)(1 - F(x))}$.

Hence the system $H$ contains exactly one element $h^\ast$ determined through $(\circ)$ and $(\circ\circ)$

$h^\ast(y) = \frac{1_{(-\infty, x]}(y) - F(x)}{F(x)(1 - F(x))} = \begin{cases} \frac{1}{F(x)}, & y \le x \\ -\frac{1}{1 - F(x)}, & y > x \end{cases}$

($h^\ast$ satisfies the norming conditions in $H$, and is bounded since $0 < F(x) < 1$) with the property

$\sup\left\{ \left( I_0^h \right)^{-1} : h \in H \right\} = \left( I_0^{h^\ast} \right)^{-1} = \left( \int (h^\ast)^2 \, dF \right)^{-1} = F(x)(1 - F(x))$.

This element $h^\ast \in H$ is called a least favourable direction at $F$; the associated model $S^{h^\ast}$ is a least favourable one-parametric path through $F$. Within the collection of paths $S^h$, $h \in H$, the least favourable path $S^{h^\ast}$ minimises the Fisher information at $F$. This is (+++). Hence, by the last inequality in step (4), the asymptotic minimax bound (i) is proved. □
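The identity used in step (1) is easy to check by simulation (an added illustration; $F = \mathcal{N}(0,1)$ and $x = 0$ are my choices, and NumPy is assumed): the rescaled squared risk $E_F\left( n [\widehat{F}_n(x) - F(x)]^2 \right)$ equals $F(x)(1 - F(x))$ for every $n$, the asymptotic minimax bound of (i).

import numpy as np

rng = np.random.default_rng(4)
n, reps, x = 200, 100_000, 0.0
F_x = 0.5                              # F(x) for F = N(0,1) at x = 0

X = rng.normal(0.0, 1.0, size=(reps, n))
Fn_x = (X <= x).mean(axis=1)           # empirical distribution function at x

risk = (n * (Fn_x - F_x)**2).mean()
print(risk, F_x * (1 - F_x))           # matches F(x)(1 - F(x)), the bound in (i)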

1.3 Heuristics on Maximum Likelihood Estimator Sequences

There is a close connection between score and information and maximum likelihood (ML) estimation. In this section, we discuss – in heuristic terms only – a tradition which seemed convinced that under independent replication of experiments, maximum likelihood should lead in all relevant statistical problems to estimator sequences which are $\sqrt{n}$-consistent, asymptotically normal with covariance matrix of the limit distribution at $\vartheta$ equal to the inverse $I_\vartheta^{-1}$ of the Fisher information, and such that estimator sequences with better concentrated limit distribution do not exist. These heuristics aim at a representation of rescaled ML estimation errors in terms of score and information, stated under (h6) and (h7) in Heuristics III below. Later (see Chapter 7), based on a mathematically different approach and on assumptions of a different type, a representation of rescaled estimation errors of type (h6) and (h7) will indeed show up again, and will characterise in broad generality estimator sequences which are asymptotically optimal in a rigorous sense. We illustrate by examples where traditional heuristics may work – or do not work at all. Equalities and formulae based on heuristics will be indicated through a notation $\overset{(!!)}{=}$ and a numbering (h1)–(h7), and sometimes by a (!!) in the text.

1.12 Heuristics I (Interchange conditions). Consider an experiment

$E := (\Omega, \mathcal{A}, \{P_\vartheta : \vartheta \in \Theta\})$, $\Theta \subset \mathbb{R}^d$ open

admitting score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$, under all conditions of Definition 1.2. The densities $f_\vartheta = \frac{dP_\vartheta}{d\mu}$ are assumed strictly positive and the parameterisation $\vartheta \to f(\vartheta, \cdot)$ sufficiently smooth on $\Theta$. In such experiments $E$, we expect the following interchange conditions (h1) and (h2) to hold. First, as in Remark 1.7(a) we should have

$\frac{\partial}{\partial \vartheta_i} E_\vartheta(Y) = \frac{\partial}{\partial \vartheta_i} \int Y f_\vartheta \, d\mu \overset{(!!)}{=} \int Y \frac{\partial}{\partial \vartheta_i} f_\vartheta \, d\mu = \int Y \left( \frac{\partial}{\partial \vartheta_i} \log f_\vartheta \right) f_\vartheta \, d\mu$

in our model for suitable $Y : (\Omega, \mathcal{A}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, and thus

(h1)  $\nabla^\top (E_\vartheta Y) \overset{(!!)}{=} E_\vartheta\left( Y \cdot M_\vartheta^\top \right)$

which corresponds to the special case of dimension $k = 1$ in condition (++) in Proposition 1.6. Second, accepting (h1) we should also have

$\frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} E_\vartheta(Y) \overset{(!!)}{=} \frac{\partial}{\partial \vartheta_j} \int Y \left( \frac{\partial}{\partial \vartheta_i} \log f_\vartheta \right) f_\vartheta \, d\mu$
$= \int Y \left( \left( \frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} \log f_\vartheta \right) f_\vartheta + \left( \frac{\partial}{\partial \vartheta_i} \log f_\vartheta \right) \left( \frac{\partial}{\partial \vartheta_j} \log f_\vartheta \right) f_\vartheta \right) d\mu$
$= E_\vartheta\left( Y \left( \frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} \log f_\vartheta + M_{\vartheta,i} M_{\vartheta,j} \right) \right)$

where $M_{\vartheta,1}, \dots, M_{\vartheta,d}$ are the components of $M_\vartheta$. In the particular case $Y \equiv 1$, the l.h.s. of this chain of equations equals $0$, hence by definition of score and Fisher information, assuming tacitly $P_\vartheta$-integrability of $(\nabla \nabla^\top \log f)(\vartheta, \cdot)$,

(h2)  $I_\vartheta = E_\vartheta\left( M_\vartheta M_\vartheta^\top \right) \overset{(!!)}{=} - E_\vartheta\left( (\nabla \nabla^\top \log f)(\vartheta, \cdot) \right)$.

Again for general $Y$ we can write the first line of the above chain of equalities in the alternative form

$\frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} E_\vartheta(Y) \overset{(!!)}{=} \int Y \, \frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} f(\vartheta, \cdot) \, d\mu$.

Comparing right-hand sides in their original and their alternative form when $Y \equiv 1$, the condition

(h2')  $\int (\nabla \nabla^\top f)(\vartheta, \cdot) \, d\mu \overset{(!!)}{=} 0$

should be equivalent to the interchange condition (h2). □

1.13 Heuristics II (ML method). Consider a model $E$ with score $\{M_\vartheta : \vartheta \in \Theta\}$ and Fisher information $\{I_\vartheta : \vartheta \in \Theta\}$ as in Definition 1.2, with densities $f_\vartheta = \frac{dP_\vartheta}{d\mu}$ strictly positive, and such that the parameterisation $\vartheta \to f(\vartheta, \cdot)$ is sufficiently smooth. A maximum likelihood estimator $\widehat{\vartheta}$ for the unknown parameter in $E$ is a statistic selecting for $\omega \in \Omega$ a point $\widehat{\vartheta}(\omega)$ where the likelihood function

$\Theta \ni \vartheta \to f(\vartheta, \omega) \in (0, \infty)$

attains its maximum.

1.14 Heuristics III (Asymptotics of ML estimators). Consider independent replication of an experiment $E$ as in Heuristics 1.12 and 1.13

$E_n = (\Omega_n, \mathcal{A}_n, \{P_{n,\vartheta} : \vartheta \in \Theta\}) = \left( \prod_{i=1}^n \Omega, \, \bigotimes_{i=1}^n \mathcal{A}, \, \{P_{n,\vartheta} := \bigotimes_{i=1}^n P_\vartheta : \vartheta \in \Theta\} \right)$

with score $M_{n,\vartheta}$ in $\vartheta \in \Theta$ and information $I_{n,\vartheta} = n I_\vartheta$ as in Lemma 1.5(a). Let $\widehat{\vartheta}_n$ denote the maximum likelihood estimator for the unknown parameter in the product model $E_n$, $n \ge 1$. We expect that the sequence $(\widehat{\vartheta}_n)_n$ is consistent as $n \to \infty$ (!!). Since $\widehat{\vartheta}_n$ is a point in $\Theta$ where the likelihood function attains its maximum, a Taylor expansion of the gradient of the log-likelihood function at $\widehat{\vartheta}_n$ reads as

$\left( \nabla^\top \log f_n \right)(\cdot) \approx \underbrace{\left( \nabla^\top \log f_n \right)(\widehat{\vartheta}_n)}_{= 0} + \left( \cdot - \widehat{\vartheta}_n \right)^\top \left( \nabla \nabla^\top \log f_n \right)(\widehat{\vartheta}_n)$

for points in $\Theta$ which are close to $\widehat{\vartheta}_n$. Inserting the true parameter value $\vartheta$ gives

(h3)  $\left( \nabla^\top \log f_n \right)(\vartheta) \overset{(!!)}{=} \left[ \underbrace{\left( \nabla^\top \log f_n \right)(\widehat{\vartheta}_n)}_{= 0} + \left( \vartheta - \widehat{\vartheta}_n \right)^\top \left( \nabla \nabla^\top \log f_n \right)(\widehat{\vartheta}_n) \right] \left( 1 + o_{P_{n,\vartheta}}(1) \right)_d$.

Here $o_{P_{n,\vartheta}}(1)$ denotes remainder terms which vanish in $(P_{n,\vartheta})_n$-probability as $n$ tends to $\infty$, and $(1 + o_{P_{n,\vartheta}}(1))_d$ means a diagonal matrix with diagonal entries $1 + o_{P_{n,\vartheta}}(1)$. We expect that second derivatives of the log-likelihoods should not vary much in the parameter (in the classical normal distribution model with unknown mean and known covariance, log-likelihoods are quadratic in the parameter, thus second derivatives do not depend on the parameter), i.e.

(h4)  $\frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} \log f_n(\widehat{\vartheta}_n) \approx \frac{\partial}{\partial \vartheta_j} \frac{\partial}{\partial \vartheta_i} \log f_n(\vartheta)$.

Inserting (h4) into the expansion (h3) we should have a final form

(h5)  $\left( \nabla^\top \log f_n \right)(\vartheta) \overset{(!!)}{=} \left( \vartheta - \widehat{\vartheta}_n \right)^\top \left( \nabla \nabla^\top \log f_n \right)(\vartheta) \left( 1 + o_{P_{n,\vartheta}}(1) \right)_d$

of the expansion. In the product model $E_n$ we have

$\frac{1}{n} \left( \nabla \nabla^\top \log f_n \right)(\vartheta, (\omega_1, \dots, \omega_n)) = \frac{1}{n} \sum_{i=1}^n \left( \nabla \nabla^\top \log f \right)(\vartheta, \omega_i)$

and the strong law of large numbers combined with (h2) gives almost sure convergence

$(\diamond)$  $\frac{1}{n} \left( \nabla \nabla^\top \log f_n \right)(\vartheta, \cdot) \to E_\vartheta\left( \left( \nabla \nabla^\top \log f \right)(\vartheta, \cdot) \right) = -I_\vartheta$

under $\vartheta$ as $n \to \infty$. According to Lemma 1.5, the score in product models

$\left( \nabla \log f_n \right)(\vartheta, (\omega_1, \dots, \omega_n)) = M_{n,\vartheta}(\omega_1, \dots, \omega_n) = \sum_{i=1}^n M_\vartheta(\omega_i)$

is asymptotically normal by definition of score and information:

$(\diamond\diamond)$  $\mathcal{L}\left( \frac{1}{\sqrt{n}} M_{n,\vartheta} \mid P_{n,\vartheta} \right) \to \mathcal{N}(0, I_\vartheta)$  (weak convergence in $\mathbb{R}^d$ as $n \to \infty$).

In combination with $(\diamond)$, expansion (h5) written as

$\frac{1}{\sqrt{n}} \left( \nabla^\top \log f_n \right)(\vartheta) \overset{(!!)}{=} \sqrt{n} \left( \vartheta - \widehat{\vartheta}_n \right)^\top \frac{1}{n} \left( \nabla \nabla^\top \log f_n \right)(\vartheta) \left( 1 + o_{P_{n,\vartheta}}(1) \right)_d$

takes the form

$\frac{1}{\sqrt{n}} M_{n,\vartheta}^\top \overset{(!!)}{=} \sqrt{n} \left( \widehat{\vartheta}_n - \vartheta \right)^\top I_\vartheta \left( 1 + o_{P_{n,\vartheta}}(1) \right)_d$.

By $(\diamond\diamond)$ the l.h.s. of this equation is tight in $\mathbb{R}^d$ under $\vartheta$ as $n \to \infty$; inverting the Fisher information matrix (!!) we get tightness in $\mathbb{R}^d$ of rescaled estimation errors under $\vartheta$

(h6)  $\sqrt{n} \left( \widehat{\vartheta}_n - \vartheta \right) \overset{(!!)}{=} I_\vartheta^{-1} \left( 1 + o_{P_{n,\vartheta}}(1) \right)_d \frac{1}{\sqrt{n}} M_{n,\vartheta}$

as $n \to \infty$. Combined with $(\diamond\diamond)$, (h6) thus implies asymptotic normality

(h7)  $\mathcal{L}\left( \sqrt{n}(\widehat{\vartheta}_n - \vartheta) \mid P_{n,\vartheta} \right) \to \mathcal{N}\left( 0, I_\vartheta^{-1} \right)$  (weakly in $\mathbb{R}^d$ as $n \to \infty$)

of rescaled estimation errors. The limit variance at $\vartheta$ in (h7) is the inverse of the Fisher information in $E$. According to Cramér–Rao asymptotics (Remark 1.10) or – assuming some uniformity in the parameter – van Trees asymptotics (Theorem 1.11) this is a lower bound for limit variances of ‘good’ estimators for the unknown parameter: hence ML estimators attain this bound asymptotically. □
Section 1.3 Heuristics on Maximum Likelihood Estimator Sequences 27

We turn to examples. There are two examples where all laws under consideration
are normal distributions: in the first case, all approximations above are justified; in the
second case we run into trouble. A third example illustrates that we can be extremely
far from everything which seemed ‘natural’ in the above heuristics.

1.15 Example. Consider n-fold replication


 ² ³
n n n
En :D n :D X Rk , An :D ˝ B.Rk /, Pn,# :D ˝ N .#, ƒ/ : # 2 ‚ ,
iD1 iD1 iD1
k
‚ :D R
of the usual normal distribution model E with known k k-covariance matrix ƒ, sym-
metric and strictly positive definite; the parameter of interest is the mean value # 2 Rk .
Write ! D .!1 , : : : , !n / for the elements of n . The dimension of successive obser-
vations !i is k  1, and we wish to estimate the unknown parameter.
(a) Based on observations !1 , : : : , !n in Rk , the likelihood function
‚ 3 # !
 n  
1X
n
1 > 1
fn .#, .!1 , : : : , !n // D k 1
exp  .!i  #/ ƒ .!i  #/
.2/ 2 .det ƒ/ 2 2
iD1
has a unique maximum at the empirical mean
1X
n
b
# n .!1 , : : : , !n / :D !i .
n
iD1
Using the notation .X1 , : : : , Xn / :D id jn , the ML estimator for the unknown pa-
rameter in En is thus
1X
n
b
#n D Xi .
n
iD1
(b) In all i.i.d. models where the single observation is a square integrable statistic,
by the strong law ofplarge numbers and the central limit theorem, the sequence of
empirical means is a n-consistent and asymptotically normal estimator sequence for
# D E# .X1 /. In the present example we have more: laws of rescaled estimation errors
p   
1 X
n
.ı/ L b
n.# n  #/ j Pn,# D L p .Xi  #/ j Pn,# D N . 0k , ƒ /
n iD1
do not depend on n 2 N. It is easy to calculate score and information in En :
X n
1
Mn,# D ƒ .Xi  #/ , In,# D n ƒ1 , # 2 Rk .
iD1
Assertion .ı/ is a particular case of representation (h7), and
p 1
.ıı/ n.b
# n  #/ D I#1 p Mn,#
n
28 Chapter 1 Score and Information

a particular case of representation (h6), without oPn,# .1/-remainder terms. 

The log-likelihood function in the normal distribution model 1.15 is a negative


quadratic polynomial
# !
1X
n
 .Xi .!/  #/> ƒ1 .Xi .!/  #/ C expressions which do not depend on #
2
iD1

in # 2 R k , and is, for every ! 2 , centred at the maximum likelihood estimator


b P
# n D n niD1 Xi which as in .ı/ and .ıı/ of Example 1.15 is close to the true param-
1

eter value. In this sense, the normal distribution model 1.15 serves as a prototype exam-
ple for a broad class of statistical models where one can prove that log-likelihoods are
locally – in small neighbourhoods of their unique argmax – approximatively quadratic
in the parameter. Le Cam [83] amused himself in collecting some seemingly harm-
less examples where maximum likelihood goes wrong. In the next example – a 2k-
dimensional normal distribution model with unknown mean and unknown variance
where the mean value parameter is restricted to a particular k-dimensional hyperplane
in R2k – maximum likelihood estimators are asymptotically normal, well concen-
trated, but not around the true parameter value.

1.16 Example ([83], going back to Neyman and Scott). Take k 2 N arbitrarily large.
On ., A/ :D .R2k , B.R2k // where .X1 , Y1 , X2 , Y2 , ..., Xk , Yk / :D id j denotes the
canonical variable, we consider the experiment ¹P# : # 2 ‚º
# D: . 2 , 1 , 2 , ..., k / , ‚ :D .0, 1/ Rk
defined by
.#/ :D . 1 , 1 , 2 , 2 , ..., k , k / 2 R2k , P# :D N . .#/,  2 I2k / , # 2‚
together with two different estimators for the mapping .#/ D  2 .
(1) The random variables XipY 2
i
, i D 1, : : : , k, are i.i.d. and distributed according
to N .0,  2 /, hence an empirical mean

k  
1 X Xi  Yi 2
T :D p
k 2
iD1

is a reasonable estimator for .#/ D  2 . Since L.Z 2 / D . 12 , 12 / for Z N .0, 1/,


the usual properties of Gamma laws under convolution and scaling give
 
k k 2 4
L .T j P# / D , 2
, E# .T / D  2 , Var# .T / D .
2 2 k
In particular, this estimator T is unbiased for  , square integrable, and well concen-
trated around the true  2 for large k.
Section 1.3 Heuristics on Maximum Likelihood Estimator Sequences 29

(2) We prove that the likelihood function


 2k Y  1 ®
1
k
¯
# ! p exp  2 .Xi  i/
2
C .Yi  i/
2
2 2 2
iD1

admits a unique maximum in ‚. First, for i D 1, ..., k fixed,


X i C Yi
i ! .Xi  i/
2
C .Yi  i/
2
has a unique minimum at bi :D ,
2
second, inserting the bi into the likelihood and taking logarithms, it remains to consider
  X
k ²   ³
1 1 Xi  Y i 2
 2
! k log  2
2 2 2 2
iD1

which has a unique maximum at


k  
b 1 X Xi  Yi 2 1
2
 :D D T .
k 2 2
iD1

Hence there is a unique maximum likelihood estimator b # for #, defined by its compo-
b
nents  1 k
b
2 , b , ..., b . Since  2 D 1 T , the above properties of T give
2
   
 b2  k k   2 4
L  j P# D , 2 , E# b2 D , Var # b2 D .
2  2 2k
This means that the maximum likelihood estimator concentrates in its first component
around one half of the true value, with obviously dramatic effects for large values of
the parameter  2 once the model dimension k is large enough. 

The last example shows that the Heuristics 1.12–1.14 cannot pretend to serve as
universal guidelines for good estimation. The next example, associated to classically
well-behaving models, makes types of statistical experiments appear which are very
far from all the considerations above.

1.16’ Example. Consider a probability space ., A, P0 / carrying a one-dimensional


Lévy process .Xu /u0 (i.e.: stationary independent increments), starting from X0 D 0,
such that the Laplace transform of X1 exists on some open interval D in R. We can
put Laplace transforms in the form
 
# ! EP0 e #Xu D e u .#/ , # 2 D , u  0
which gives rise to two types of statistical models defined from the Laplace transforms.
(1) Fix u D 1 and define a family of probability laws ¹P# : # 2 Dº on ., A/ by
dP# :D e # X1  .#/
dP0 , #2D.
30 Chapter 1 Score and Information

As in Barra [4, Chap. X], this model is an exponential family, and the function is
C 1 on D. In particular, for standard Brownian motion X we have .#/ D 12 # 2 and
D D R; for Poisson process X with parameter  > 0 we have .#/ D .e #  1/
and D D R.
(2) Fix a point #0 in D; w.l.o.g. we write #0 D 1. Define probability laws ¹Peu :
u > 0º on ., A/ by

e u :D e Xu 
dP .1/ u
dP0 , u>0.

Now ‘smoothness’ of the log-likelihood function u ! Xu  .1/u is ‘smoothness’


of the path u ! Xu . In the case of standard Brownian motion X , the log-likelihood
function is thus nowhere differentiable, P0 -a.s., and Hölder continuous of order 12  "
for every " > 0 (e.g., see [112]). In the case of Poisson process X with parameter
 > 0, the log-likelihood function is piecewise linear, and has jumps of fixed size
C1 at the jumps times of the Poisson path u ! Xu , i.e. after successive independent
exponential waiting times with parameter . So we are very far from all assumptions
in Definition 1.2.
(3) It is known that in models of type (2), under quadratic loss, maximum likelihood
estimators are outperformed by squared-loss Bayesians. See Dachian [17] for an inter-
esting class of examples of type (2); in the case of Brownian motion X , this has been
established in successive steps by Golubev [32], Ibragimov and Khasminskii [60] and
Rubin and Song [114]. 

1.4 Consistency of ML Estimators via Hellinger Distances


There are several ways to prove convergence of ML estimators for the unknown pa-
rameter. A rigorous route along the programme sketched in Heuristics 1.12–1.14 re-
quires very strong conditions of type ‘uniform integrability’, see e.g. Witting and
Müller-Funk [128, Chap. 6.1.3]); the proofs go back to Wald. In sharp contrast, the
present section points towards a completely different way to pose the same problem;
this can be applied successfully in a broad variety of cases, under weak and natural con-
ditions. Ibragimov and Khasminskii [60, Thms. 19–20 in App. I.4, and Chaps. I.4–I.5]
analysed the asymptotic behaviour of ML estimators in terms of Hellinger distances
in the statistical model, and in terms of weak convergence of likelihoods – viewed as
random fields in the (local) parameter – to a random field of limiting likelihoods. We
explain only a first step on this route – consistency – in the present section, for i.i.d.
models. The main result is in Theorem 1.24.
In general statistical models, likelihood functions # ! f .#, !/ do not necessarily
achieve maxima on ‚, and if maxima exist, they need not be unique. When maxima
exist, we shall be interested in the set of parameter values where the maximum is
Section 1.4 Consistency of ML Estimators via Hellinger Distances 31

attained. Thus we start with a careful definition of ML estimators and ML estimator


sequences in arbitrary sequences of models.

1.17 Definition. (a) Consider a statistical model


E D ., A, ¹P# : # 2 ‚º/ , ‚  Rd
dP#
which is dominated by some  -finite measure  on ., A/, densities f# D d
with
respect to , and the likelihood function
‚ 3 # ! f .#, !/ D f# .!/ 2 Œ0, 1/
based on the observation ! 2 .
(i) For T : ., A/ ! .Rd , B.Rd // measurable and A 2 A, we call T maximum
likelihood on the event A if ! 2 A implies
T .!/ 2 ‚ and f .T .!/, !/ D sup¹f . , !/ : 2 ‚º .
(ii) We call T : ., A/ ! .Rd , B.Rd //
a maximum likelihood estimator for the
unknown parameter if there is some event A 2 A such that
T is maximum likelihood on the event A, and P# .A/ D 1 for all # 2 ‚ .

(b) Consider a sequence of experiments parameterised by ‚ 2 Rd


En D .n , An , ¹Pn,# : # 2 ‚º/ , n1
and assume that for every n, En is dominated by some n with densities fn .#, / D
dPn,#
dn
, # 2 ‚. Consider a sequence of estimators .Tn /n in .En /n

Tn : .n , An / ! .Rd , B.Rd // , n  1


as in Definition 1.9. If there exist An 2 An such that the following two conditions
hold
´
for every # 2 ‚ : lim Pn,# .An / D 1 ,
n!1
for every n  1 : Tn is maximum likelihood on the event An ,
then .Tn /n is called a maximum likelihood estimator sequence for the unknown pa-
rameter, and .An /n a sequence of good sets for .Tn /n .

In most cases, explicit representations for ML estimators as in Examples 1.15


and 1.16 do not exist; usually in practical problems, even if there is a unique maximum
in # ! fn .#, !/ for every ! in the good set An it has to be evaluated numerically.

1.17’ Exercise. Construct an ML estimator sequence in .En /n of Definition 1.17(b) under the
following set of conditions: (i) ‚ is open in Rd ; (ii) for all n 2 N and all ! 2 n , densities
fn .#, !/ are continuous in # 2 ‚; (iii) for any compact exhaustion .Kn /n of ‚, events
An :D ¹ ! 2 n : max¹fn . , !/ : 2 Kn º D sup¹fn . , !/ : 2 ‚º º
32 Chapter 1 Score and Information

are such that lim Pn,# .An / D 1 for every # 2 ‚. Hint: with respect to .Kn /n and .An /n ,
n!1
for every n, associate to ! 2 An a non-void compact set
®  ¯
Mn .!/ :D 2 Kn : fn .  , !/ D max¹fn . , !/ : 2 Kn º  Kn

depending on !, and specify a random variable Tn : .n , An / ! .Rd , B.Rd // such


that
! 2 An implies Tn .!/ 2 Mn .!/ ;
here ‘measurable selection’ can be done e.g. as in step (2) of the proof of Proposi-
tion 2.10 in Chapter 2 below. 

In this section, we do not focus on the explicit construction of estimators (Chapter 2


below will cover this topic for a particular class of estimators): we are interested in
the location of the random set of maxima of the likelihood function # ! fn .#, !/,
in order to show that this set – under suitable conditions – shrinks as n ! 1 towards
the true parameter value. The following example using Kullback divergence (from
Pfanzagel [105, Chap. 6.5]) illustrates this point.

1.17” Example. Considering equivalent probability laws P# und P P# , Kullback


divergence is defined (e.g. [124, Chap. 2.4]) as
Z  
f
K.P# , P / :D  log dP# ;
f#
by Jensen inequality for the convex function  log./ we have without further condi-
tions
   
f . , / f . , /
0 D  log.1/ D  log E#  E#  log  C1 .
f .#, / f .#, /
Consider an experiment

E :D . , A, ¹P# : # 2 ‚º / , ‚  Rd open

with strictly positive densities f .#, !/ with respect to some dominating measure, and
product models
 
n n n
En D . n , An , ¹Pn,# : # 2 ‚º / D X , ˝ A, ¹Pn,# :D ˝ P# : # 2 ‚º .
iD1 iD1 iD1

Assume that for every # 2 ‚ and every " > 0 one can find a finite collection of open
subsets V1 , : : : , Vl of ‚ (with l 2 N and V1 , : : : , Vl depending on # and ") such that
./ and ./ hold:

[
l
® ¯ [
l
./ # … Vi and 2 ‚ : j  #j > "  Vi
iD1 iD1
Section 1.4 Consistency of ML Estimators via Hellinger Distances 33

   
f . , / f . , /
sup log 2 L .P# /
1
and E# sup log < 0,
./ 2Vj f .#, / 2Vj f .#, /
j D 1, : : : , l .
Then every ML sequence for the unknown parameter is consistent.

Proof. (i) Given any ML sequence .Tn /n for the unknown parameter with ‘good sets’
.An /n in the sense of Definition 1.17 (b), fix 2 ‚ and " > 0 and select l and
V1 , : : : , V` according to ./. Then
\ l ² ³ \
sup log fn . , / < sup log fn . , / An  ¹ jTn  #j  " º
2Vj 2‚:j#j"
j D1
for every n fixed, by Definition 1.17(b) and since all densities are strictly positive.
Inverting this,
l ²
[ ³ [
¹ jTn  #j > " º  sup log fn . , /  sup log fn . , / Acn
2Vj :j#j"
j D1
l ²
[ ³ [
 sup log fn . , /  log fn .#, / Acn .
2Vj
j D1
(ii) In the sequence of product experiments .En /n , the .An /n being ‘good sets’ for
.Tn /n , we use the strong law of large numbers thanks to assumption ./ to show that
´ n   μ!
1X f . , !i /
Pn,# .!1 , : : : , !n / : sup log  0 ! 0
n 2Vj f .#, !i /
iD1
as n ! 1 for every j , 1  j  l: this gives lim Pn,# .jTn  #j > "/ D 0. 
n!1

In order to introduce a geometry on the parametric statistical experiment, to be dis-


tinguished from Euclidian geometry on the parameter set ‚  Rd , we make use of
Hellinger distance H., / which defines a metric on the space of all probability mea-
sures on ., A/; see e.g. [124, Chap. 2.4] or [121, Chap. I.2]. Recall that countable
collections of probability measures on the same space are dominated.

1.18 Definition. The Hellinger distance H., / between probability measures Q1 and
Q2 on ., A/ is defined as the square root of
Z
1 ˇˇ 1=2 ˇ
1=2 ˇ2
H 2 .Q1 , Q2 / :D ˇ 1
g  g 2 ˇ d 2 Œ0, 1
2

where gi D dQ d
i
are densities with respect to a dominating measure , i D 1, 2. The
affinity A., / between Q1 and Q2 is defined by
Z
1=2 1=2
A.Q1 , Q2 / :D 1  H .Q1 , Q2 / D g1 g2 d .
2
34 Chapter 1 Score and Information

The integrals in Definition 1.18 do not depend on the choice of a dominating mea-
sure for Q1 and Q2 (similar to Remark 1.2’). We have H.Q, Q0 / D 0 if and only
if probability measures Q, Q0 coincide, and H.Q, Q0 / D 1 if and only if Q, Q0 are
mutually singular. Below, we focus on i.i.d. models En and follow Ibragimov and
Khasminskii’s route [60, Chap. 1.4] to consistency of maximum likelihood estimators
under conditions on the Hellinger geometry in the single experiment E.
For the remaining part of this section, the following assumptions will be in force.
Note that we do not assume equivalence of probability laws, and do not assume con-
tinuity of densities in the parameter for fixed !.

1.19 Notations and Assumptions. (a) (i) In the single experiment

E D . , A , ¹P# : # 2 ‚º/ , ‚  Rd open

we have densities f# D dP d
#
with respect to a dominating measure . For ¤ #, the
likelihood ratio of P with respect to P# is
f
L=# D 1¹f# >0º C 1  1¹f# D0º .
f#

(ii) We write K for the class of compact sets in Rd which are contained in ‚. A
compact exhaustion
S of ‚ is a sequence .Km /m in K such that Km  int.KmC1 / for
all m, and m Km D ‚. We introduce the notations

h. ,  , K/ :D inf H 2 .P 0 , P / , K2K, 2K,  >0,


 0 2KnB ./
Z
a. , K c / :D sup f 1=20 f
1=2
d , K2K, 2 int.K/ ,
 0 2K c \‚ 

Z ˇ ˇ 1=2
ˇ 1=2 1=2 ˇ2
. , ı/ :D sup ˇf 0  f ˇ d , 2‚, ı>0.
 0 2Bı ./\‚

(iii) For the single experiment E, we assume a condition

.C/ lim . , ı/ D 0 for every 2‚


ı#0

together with
c
.CC/ lim a. , Km /<1
m!1

for every 2 ‚ and every compact exhaustion .Km /m of ‚.


(b) For n  1, we consider product models
 ² ³
n n n
En D . n , An , ¹Pn,# : # 2 ‚º / D X , ˝ A, Pn,# :D ˝ P# : # 2 ‚ .
iD1 iD1 iD1
Section 1.4 Consistency of ML Estimators via Hellinger Distances 35

In En , we write fn,# D fn .#, / for the density of Pn,# with respect to ˝niD1 , and
L=#
n for the likelihood ratio of Pn, with respect to Pn,# .

In Assumptions 1.19(a), condition (+) implies continuity of the parameterisation in


Hellinger distance, whereas (++) guarantees identifiability with respect to parameter
values outside large compacts. The set of Assumptions 1.19 yields consistency of ML
sequences, see Theorem 1.24 below. We proceed via some auxiliary results. The first
=#
is a control on likelihood ratios Ln under Pn,# when parameter values are far away
from #.

1.20 Lemma. For K 2 K and # 2 int.K/, the condition a.#, K c / < 1 implies
geometric decrease
 q 
=# n
En,# sup Ln  a.#, K c / , n1.
 2 ‚\K c

Proof. Write for short V :D K c \ ‚. Then for .!1 , : : : , !n / 2 ¹fn,# > 0º


 q  Y n
=# f 1=2 . , !i /
sup Ln .!1 , : : : , !n / D sup
 2V  2V
iD1
f 1=2 .#, !i /
Yn 
1
 1=2
 sup f 1=2 . , !i /
iD1
f .#, ! i /  2V

and thus
 q  Z n
=# 1=2 1=2 n
En,# sup Ln  sup f# f d  a.#, K c /
 2V  2V

for every n  1. 

1.21 Lemma. Condition 1.19 implies Hellinger continuity of the parameterisation


H.P 0 , P / ! 0 whenever 0 ! in ‚
which in turn guarantees identifiability within compacts contained in ‚:
h. ,  , K/ > 0 for all K 2 K, 2 K and  > 0 .

Proof. By definition of . , ı/ and by (+) in Assumptions 1.19, the first assertion of


the lemma holds by dominated convergence. From this, all mappings
0
.ı/ ‚ 3 ! H.P 0 , P / 2 Œ0, 1 , 2 ‚ fixed
are continuous, by inverse triangular equality jd.x, z/  d.y, z/j  d.x, y/ for the
metric d., / D H., /. Since we have always P 0 ¤ P when 0 ¤ (cf. Defini-
tion 1.1), the continuous mapping .ı/ has its unique zero at 0 D . As a consequence,
36 Chapter 1 Score and Information

fixing K 2 K, 2 K,  > 0 and restricting .ı/ to the compact K n B . /, the second


assertion of the lemma follows. 
=#
This allows us to control likelihood ratios Ln under Pn,# when ranges over
small balls Bı . 0 / which are distant from #.

1.22 Lemma. Fix K 2 K, # 2 int.K/,  > 0. Under Assumptions 1.19, consider


points
0 2 K n B .#/

and (using Lemma 1.21 and (+) in Assumptions 1.19) choose ı > 0 small enough to
have
. 0 , ı/ < h.#,  , K/ .
Then we have geometric decrease
 q 
=#
En,# sup Ln  Œ 1  h.#,  , K/ C . 0 , ı/ n , n1.
 2 Bı .0 /\K

Proof. Write for short V :D Bı . 0 / \ K. For .!1 , : : : , !n / 2 ¹fn,# > 0º, observe
that  q 
=#
Yn
f 1=2 . , !i /
sup Ln .!1 , : : : , !n / D sup
 2V  2V
iD1
f 1=2 .#, !i /
is smaller than
Yn  ˇ ˇ
1 ˇ 1=2 ˇ
1=2 .#, ! /
f 1=2
. 0 , !i / C sup ˇ f . , !i /  f 1=2
. 0 , ! i /ˇ
iD1
f i  2V

which yields
 q  ²Z  ˇ ˇ ³n
ˇ 1=2 1=2 ˇ
En,# sup L=#
n  f#
1=2
f
1=2
0
C sup ˇf  f0 ˇ
d .
 2V   2V

With Notations 1.19, we have by Lemma 1.21 and choice of 0


Z
1=2 1=2
f# f0 d D 1  H 2 .P0 , P# /  1  h.#,  , K/ < 1

whereas Cauchy–Schwarz inequality and choice of V imply


Z ˇ ˇ
1=2 ˇ 1=2 1=2 ˇ
f# sup ˇf  f0 ˇ d  1  . 0 , ı/ D . 0 , ı/ .
 2V

Putting all this together, assumption (+) in Assumptions 1.19 and our choice of ı im-
plies  q 
=#
En,# sup Ln  ¹ 1  h.#,  , K/ C . 0 , ı/ ºn
 2 Bı .0 /\K

where the term ¹1  h.#,  , K/ C . 0 , ı/º is strictly between 0 and 1. 


Section 1.4 Consistency of ML Estimators via Hellinger Distances 37

1.23 Lemma. For all K 2 K, # 2 int.K/,  > 0, we have under Assumptions 1.19
exponential bounds
 q 
 C e n 2 h.#, ,K/ , n  1
=# 1
En,# sup Ln
 2 KnB .#/

where the constant C < 1 depends on #,  , K.

Proof. Fix K 2 K, # 2 int.K/,  > 0. By Lemma 1.21 and (+) in Assumptions


1.19, we associate to every point 0 in K n B .#/ some radius ı. 0 / > 0 such that
1
.ı/ . 0 , ı. 0 // < h.#,  , K/ .
2
® ¯
This defines an open covering Bı.0 / . 0 / : 0 2 K n B .#/ of the compact
® ¯
K n B .#/ in Rd . We can select a finite subcovering Bı.0,i / . 0,i / : i D 1, : : : , ` .
Applying Lemma 1.22 to every Vi :D Bı.0,i / . 0,i / \ K we get
 q  X̀  q 
=# =#
En,# sup Ln  En,# sup Ln
 2 KnB .#/  2 Vi
iD1

n
 Œ 1  h.#,  , K/ C . 0,i , ı. 0,i // 
iD1

where by choice of the radii in .ı/, and by the elementary inequality e y  1  y for
0 < y < 1, the right-hand side is smaller than
h 1 in
` 1  h.#,  , K/  ` e n 2 h.#, ,K/ .
1

2
This is the assertion, with C :D ` the number of balls Bı.0,i / . 0,i / with radii .ı/ which
were needed to cover the compact K n B .#/ subset of ‚  Rd . 

1.24 Theorem. As n ! 1, consider independent replication of an experiment E


which satisfies all of Assumptions 1.19. Then any maximum likelihood estimator se-
quence .Tn /n for the unknown parameter is consistent.

Proof. Let any ML estimator sequence .Tn /n with ‘good sets’ .An /n be given, as
defined in Definition 1.17. Fix # 2 ‚ and  > 0; we have to show
lim Pn,# . ¹jTn  #j   º / D 0 .
n!1

Put U :D ‚ n B .#/. Since Tn is maximum likelihood on An , we have


² ³ \
sup fn, < fn,# An  ¹ Tn … U º
2U
38 Chapter 1 Score and Information

for every n  1. Taking complements we have


² ³ [
¹ Tn 2 U º  sup fn,  fn,# Acn , n1
2U
and write for the first set on the right-hand side
s !  q   q 
fn, =# =#
Pn,# sup  1  Pn,# sup Ln  1  En,# sup Ln .
2U fn,# 2U 2U

Fix a compact exhaustion .Km /m of ‚. Take m large enough for B .#/  Km and
c / < 1 in virtue of condition (++) in Assumptions 1.19.
large enough to have a.#, Km
Then we decompose
c
U D ‚ n B .#/ D U1 [ U2 , U1 :D Km n B .#/ , U2 :D ‚ \ Km
where Lemma 1.20 applies to U2 and Lemma 1.23 to U1 . Thus
 q 
=#
Pn,# . ¹jTn  #j   º \ An /  En,# sup c Ln
 2 ‚\Km
 q 
=#
C En,# sup Ln
 2 Km nB .#/

n
C C e n 2 h.#,
1
c ,Km /
 a.#, Km / , n1.
This right-hand side decreases exponentially fast as n ! 1. By definition of the
sequence of ‘good sets’ in Definition 1.17, we also have
 
.ı/ lim Pn,# Acn D 0
n!1
and are done. 

Apart from the very particular case An D n for all n large enough, the preceding
proof did not establish an exponential decrease of Pn,# .jTn  #j   / as n ! 1.
This is since Assumptions 1.19 do not provide a control on the speed of convergence
in .ı/. Also, we are not interested in proving results ‘uniformly in compact #-sets’.
Later, in more general sequences of statistical models, even the rate of convergence
will vary from point to point on the parameter set, without any hope for results of
such type. We shall work instead with contiguity and Le Cam’s ‘third lemma’, see
Chapters 3 and 7 below.

1.24” Exercise. Calculate the quantities defined in Assumption 1.19(a) for the single exper-
iment E in the following case: ., A/ D .R, B.R//; the dominating measure  D  is
Lebesgue measure; ‚ is some open interval in R, bounded or unbounded; P# are uniform
laws on intervals of unit length centred at #
 
1 1
P# :D R #  , # C , # 2‚.
2 2
Section 1.4 Consistency of ML Estimators via Hellinger Distances 39

Prove that the parameterisation is Hölder continuous of order 12 in the sense of Hellinger dis-
tance: p
H.P 0 , P / D j 0  j for 0 sufficiently close to .
Prove that condition (+) in Assumptions 1.19 holds with
p
. , ı/ D 2 ı

whenever ı > 0 is sufficiently small (i.e. 0 < ı  ı0 . / for 2 ‚). Show also that it is
impossible to satisfy condition (++) whenever diam.‚/  1. In the case where diam.‚/ > 1,
prove
1
.1  dist. , @‚//  lim a. , Km
c
/  2 .1  dist. , @‚// when 0 < dist. , @‚/ 
m!1 2
for any choice of a compact exhaustion of ‚, @‚ denoting the boundary of ‚: in this case
condition (++) is satisfied. Hence, in the case where diam.‚/ > 1, Theorem 1.24 establishes
consistency of any sequence of ML estimators for the unknown parameter under independent
replication of the experiment E. 

We conclude this section by underlining p the following. First, in i.i.d. models, rates
of convergence need not be the well-known n ; second, in the case where ML estima-
tion errors at # are consistent at some rate 'n .#/ as defined in Assumption 1.9(c), this
is not more than just tightness of rescaled estimation errors L.'n .#/.b # n  #/ j Pn,# /
as n ! 1. Either there may be no weak convergence at all, or we may end up with a
variety of limit laws whenever the model allows to define a variety of particular ML
sequences.

1.25 Example. In the location model generated from the uniform law R. 12 , 12 /
 °  1 1 ±
E D R , B.R/ , P# :D R #  , # C : # 2‚DR
2 2
(extending exercise 1.2400 ), any choice of an ML estimator sequence for the unknown
parameter is consistent at rate n. There is no unicity concerning limit distributions for
rescaled estimation errors at # to be attained simultaneously for any choice of an ML
sequence.

Proof. We determine the density of P0 with respect to  as f0 .x/ :D 1Œ 12 , 12  .x/, using
the closed interval of length 1. This will simplify the representation since we may use
two particular ‘extremal’ definitions – the sequences .Tn.1/ /n and .Tn.3/ /n introduced
below – of an ML sequence.
(1) We start with a preliminary remark. If Yi are i.i.d. random variables distributed
according to R..0, 1//, binomial trials show that the probability of an event
² ³
u1 u2
max Yi < 1  , min Yi >
1in n 1in n
40 Chapter 1 Score and Information

tends to e .u1 Cu2 / as n ! 1 for u1 , u2 > 0 fixed. We thus have weak convergence
in R2 as n ! 1
 
n .1  max Yi / , n  min Yi ! . Z1 , Z2 /
1in 1in

where Z1 , Z2 are independent and exponentially distributed with parameter 1.


(2) For n  1, the likelihood function ! L=#
n coincides Pn,# -almost surely with
the function
R3 ! 1 . / 2 ¹0, 1º
max Xi  12 , min Xi C 12
1i n 1i n

in the product model En . Hence any estimator sequence .Tn /n with ‘good sets’ .An /n
such that

1 1
. / Tn 2 max Xi  , min Xi C on An
1in 2 1in 2
will be a maximum likelihood sequence for the unknown parameter. The random in-
terval in . / is of strictly positive length since Pn,# –almost surely
max Xi  min Xi < 1 Pn,# -almost surely.
1in 1in

All random intervals above are closed by determination 1Œ 12 C, 12 C of the density f
for 2 ‚.
.i/
(3) By . / in step (2), the following .Tn /n with good sets .An /n are ML sequences
in .En /:
1
Tn.1/ :D min Xi C with An :D n
1in 2
² ³
1 1 1
Tn.2/ :D min Xi C  2 with An :D max Xi  min Xi < 1  2
1in 2 n 1in 1in n
1
Tn.3/ :D max Xi  with An :D n
1in 2
² ³
.4/ 1 1 1
Tn :D max Xi  C 2 with An :D max Xi  min Xi < 1  2
1in 2 n 1in 1in n
.5/ 1 
Tn :D min Xi C max Xi with An :D n .
2 1in 1in

Without the above convention on closed intervals in the density f0 we would remove
.1/ .3/
the sequences .Tn /n and .Tn /n from this list.
(4) Fix # 2 ‚ and consider at stage n of the asymptotics the model En locally at
#, via a reparameterisation D # C u=n according to rate n suggested by step (1).
.#Cu=n/=#
Reparameterised, the likelihood function u ! Ln coincides Pn,# -almost
surely with
R 3 u ! 1     .u/ 2 ¹0, 1º .
n max .Xi #/  1
2 ,n min .Xi #/ C 12
1i n 1i n
Section 1.4 Consistency of ML Estimators via Hellinger Distances 41

As n ! 1, under Pn,# , we obtain by virtue of step (1) the ‘limiting likelihood func-
tion’
. / R 3 u ! 1. Z1 , Z2 / .u/ 2 ¹0, 1º
where Z1 , Z2 are independent exponentially distributed.
(5) Step (4) yields the following list of convergences as n ! 1:
8
    < Z1 for i D 3, 4
L n Tn.i/  # j Pn,# ! Z2 for i D 1, 2
: 1
2 .Z2  Z 1 / for i D 5.
Clearly, we can also realise convergences
L .n .Tn  #/ j Pn,# / ! ˛Z2  .1  ˛/Z1
.2/ .4/
for any 0 < ˛ < 1 if we consider Tn :D ˛Tn C .1  ˛/Tn , or we can realise
L .n .Tn  #/ j Pn,# / does not converge weakly in R as n ! 1

if we define Tn as Tn.2/ when n is odd, and by Tn.4/ if n is even. Thus, for a maximum
likelihood estimation in the model E as n ! 1, it makes no sense to put forward
the notion of limit distribution, the relevant property of ML estimator sequences being
n-consistency (and nothing more). 
Chapter 2

Minimum Distance Estimators

Topics for Chapter 2:


2.1 Measurable Stochastic Processes with Paths in Lp .T , T , /
Measurable stochastic processes and their paths 2.1
Measurable stochastic processes with paths in Lp .T , T , / 2.2
Characterising weak convergence in Lp .T , T , / 2.3
Main theorem: sufficient conditions for weak convergence in Lp .T , T , / 2.4
Auxiliary results on integrals along the path of a process 2.5–2.5’
Proving the main theorem 2.6–2.6”
2.2 Minimum Distance Estimator Sequences
Example: fitting the empirical distribution function to a parametric family 2.7
Assumptions and notations 2.8
Defining minimum distance (MD) estimator sequences 2.9
Measurable selection 2.10
Strong consistency of MD estimator sequences 2.11
Some auxiliary results 2.12–2.13
Main theorem: representation of rescaled estimation errors in MD estimator
sequences 2.14
Variant: weakly consistent MD estimator sequence 2.15
2.3 Some Comments on Gaussian Processes
Gaussian and -Gaussian processes 2.16
Some examples (time changed Brownian bridge) 2.17
Existence of -Gaussian processes 2.18
-integrals along the path of a -Gaussian process 2.19
2.4 Example: Asymptotic Normality of Minimum Distance Estimator Sequences
Assumptions and notations 2.20–2.21
Main theorem: asymptotic normality for MD estimator sequences 2.22
Empirical distribution functions and Brownian bridges 2.23
Example: MD estimators defined from the empirical distribution function 2.24
Example: MD estimator sequences for symmetric stable i.i.d. observations 2.25
Exercises: 2.1’, 2.3’, 2.24’, 2.25’.
Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 43

This chapter is devoted to the study of one class of estimators in parametric fami-
lies which – without aiming at any notion of optimality – do have reasonable asymp-
totic properties under assumptions which are weak and easy to verify. Most examples
will consider i.i.d. models, but the setting is more general: sequences of experiments
where empirical objects ‰ b n , calculated from the data at level n of the asymptotics,
are compared to theoretical counterparts ‰# under #, for all values of the parameter
# 2 ‚, which are deterministic and independent of n. Our treatment of asymptotics
of minimum distance (MD) estimators follows Millar [100], for an outline see Kuto-
yants [78–80]. Below, Sections 2.1 and 2.3 contain the mathematical tools and are
of auxiliary character; the statistical part is concentrated in Sections 2.2 and 2.4. The
main statistical results are Theorem 2.11 (almost sure convergence of MD estima-
tors), Theorem 2.14 (representation of rescaled MD estimator errors) and Theorem
2.22 (asymptotic normality of rescaled MD estimation errors). We conclude with an
example where the parameters of a symmetric stable law are estimated by means of
MD estimators based on the empirical characteristic function of the first n observa-
tions.

2.1 Stochastic Processes with Paths in Lp .T , T , /


In preparation for asymptotics of MD estimator sequences, we characterise weak con-
vergence in relevant path spaces. We follow Cremers and Kadelka [16], see also Grin-
blat [37]. Throughout this section, the following set 2.1 of assumptions will be in force.

2.1 Assumptions and Notations for Section 2.1. (a) Let  denote a finite measure on
a measurable space .T , T / with countably generated  -field T . For 1  p < 1 fixed,
the space Lp ./ D Lp .T , T , / of (-equivalence classes of) p-integrable functions
f : .T , T / ! .R, B.R// is equipped with its norm
Z  p1
p
kf k D kf kL .T ,T ,/ D
p jf .t /j .dt / < 1
T

and its Borel- -field B.L p .//. T being countably generated, the space Lp .T , T , /
is separable ([127, p. 138], [117, pp. 269–270]): there is a countable subset S  Lp ./
which is dense in Lp .T , T , /, and B.Lp .// is generated by the countable collection
of open balls
Br .g/ :D ¹ f 2 Lp ./ : kf  gkLp ./ < r º , r 2 QC , g 2 S .
(b) With parameter set T as in (a), a real valued stochastic process X D .X t / t2T on
a probability space ., A, P / is a collection of random variables X t , t 2 T , defined on
., A/ and taking values in .R, B.R//. This process is termed measurable if .t , !/ !
X.t , !/ is a measurable mapping from .T , T ˝A/ to .R, B.R//. In a measurable
44 Chapter 2 Minimum Distance Estimators

process X D .X t / t2T , every path


X .!/ : T 3 t ! X.t , !/ 2 R , !2
is a measurable mapping from .T , T / to .R, B.R//.
(c) Points in Rm are written as t D .t1 , ..., tm /. We write .s1 , ..., sm /  .t1 , ..., tm /
if sj  tj for all 1  j  m; in this case, we put .s, t  :D XjmD1 .sj , tj  or Œs, t / :D
XjmD1 Œsj , tj / etc. and speak of intervals in Rm instead of rectangles.

2.1’ Exercise. Consider i.i.d. random variables Y1 , Y2 , : : : defined on some ., A, P /, and
taking values in .T , T / :D .Rm , B.Rm //.
(a) For any n  1, the empirical distribution function associated to the first n observations

1X 1X
n n
bn .t , !/ :D
F 1.1,t .Yi .!// D 1¹Yi tº .!/ , t 2 Rm , ! 2 
n n
i D1 i D1

is a stochastic process bF n D .F bn .t // t2Rm on ., A/. Prove that F bn is a measurable process


in the sense of 2.1.
Hint: with respect to grids 2k Zm , k  1, l D .l1 , : : : , lm / 2 Zm , write
   
l1 C1 lm C1 m lj lj C1
l C .k/ :D , : : : , , A l .k/ :D X , ;
2k 2k j D1 2k 2k

then for every k, the mappings


X
.t , !/ ! 1Al .k/ .t / b
F n .l C .k/, !/
l2Zm

are measurable from .Rm , B.Rm /˝A/ to .R, B.R//, so the same holds for their pointwise
limit as k ! 1 which is .t , !/ ! F bn .t , !/.
(b) Note that the reasoning in (a) did not need more than continuity from the right – in
every component of the argument t 2 Rm – of the paths b F n ., !/ which for fixed ! 2 
are distribution functions on Rm . Deduce the following: every real valued stochastic process
.Y t / t2Rm on ., A/ whose paths are continuous from the right is a measurable process.
p
(c) Use the argument of (b) to show that rescaled differences n.F bn  F /
p  
.t , !/ ! n b F n .t , !/  F .t / , t 2 Rm , ! 2 

are measurable stochastic processes, for every n  1.


(d) Prove that every real valued stochastic process .Yt / t2Rm on ., A/ whose paths are con-
tinuous from the left is a measurable process: proceed in analogy to (a) and (b), replacing the
intervals A` .k/ used in (a) by e
l l C1
Al .k/ :D XjmD1 . 2jk , j2k , and the points `C .k/ by 2k `. 

2.2 Lemma. For a measurable stochastic process X D .X t / t2T on ., A, P /, intro-


duce the condition
Z
./ jX.t , !/jp .dt / < 1 for all ! 2 .
T
Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 45

Then the mapping


X :  3 ! ! X .!/ 2 Lp ./
is a well-defined random variable on ., A/ taking values in the space .Lp .T , T , /,
B.Lp .T , T , ///, and we call X a measurable stochastic process with paths in
Lp ./.

Proof. For arbitrary g 2 Lp .T , T , / fixed, for measurable processes X D .X t / t2T


on ., A/ which satisfy the condition ./, consider
Z  p1
p
Zg : ! ! kX .!/  gk D jX.t , !/  g.t /j .dt / <1.
T
Considering the right-hand side of this expression, our assumptions guarantee that Zg
is a measurable mapping from ., A/ to .R, B.R//. For balls Br .g/ as defined in
Assumption 2.1(a), this gives
X1 .Br .g// D ¹! : X .!/ 2 Br .g/º D ¹Zg < rº 2 A
where the collection of balls Br .g/ generates B.Lp .//. Hence X is a measurable
mapping from ., A/ to .Lp .T , T , /, B.Lp .T , T , ///. 

We give a characterisation of weak convergence in Lp .T , T , /:

2.3 Proposition. Let .X tn / t2T , n  1, and .X t / t2T be measurable stochastic pro-


cesses with paths in Lp .T , T , /. A necessary and sufficient condition for weak con-
vergence in Lp .T , T , /
L
Xn ! X , n!1
is the following:
8
ˆ for arbitrary l  1 and arbitrary choice of functions g1 , ..., gl in L .T , T , /,
ˆ p
<
  L
ˆ kX  g1 k , : : : , kX  gl k ! . kX  g1 k , : : : , kX  gl k /
n n

weakly in Rl as n ! 1 .

Proof. Write for short Lp D Lp .T , T , /. Mappings


h : Lp 3 f ! .kf  g1 k, : : : , kf  gl k/ 2 Rl
being continuous, the condition stated above is a necessary condition, by the continu-
ous mapping theorem. We prove that the condition stated above is sufficient.
Let X n be defined on spaces .n , An , Pn /, n  1, and X on ., A, P /. We write
n for the law L.X n jP / on .Lp , B.Lp //, n  1, and Q for L.X jP /; according
Q  n 
to the Portmanteau theorem (e.g. Billingsley [11]), we shall prove
.C/ lim inf Qn .G/  Q.G/ for every open set G in Lp .
n!1
46 Chapter 2 Minimum Distance Estimators

By separability of Lp according to Assumption 2.1(a) we can write G as a countable


union of open balls
1
[
GD Bri .gi / for suitable gi 2 S , ri 2 QC .
iD1

From this representation of G, associate to every " > 0 some ` D `."/ such that
[ l 
Q.G/  Q Bri .gi / C " .
iD1
Thus we have as n ! 1
[
l   [
l 
n n
lim inf Q .G/  lim inf Q Bri .gi / D lim inf Pn Xn 2 Bri .gi /
n n n
iD1 iD1
 
D lim inf Pn there is some 1  i  l such that kXn  gi k < ri
n
  
  l
D lim inf 1  Pn kXn  g1 k, : : : , kXn  gl k 2 X Œri , 1/ .
n iD1
 
Using the condition which ensures weak convergence of kXn  g1 k, : : : , kXn  gl k
in Rl as n ! 1, and the Portmanteau theorem with closed sets in Rl , we can continue
 
l
 1  P .kX  g1 k, : : : , kX  gl k/ 2 X Œri , 1/
iD1
D P .there is some 1  i  l such that kX  gi k < ri /
 [l  [ l 
D P X 2 Bri .gi / D Q Bri .gi /
iD1 iD1

 Q.G/  "
by choice of l D `."/. Since " > 0 was arbitrary, we have (+). This finishes the
proof. 

2.3’ Exercise. Under the assumptions of Proposition 2.3, use the continuous mapping theo-
rem to show that weak convergence Xn ! X in Lp .T , T , / as n ! 1 implies weak
convergence of all integrals
Z Z
g.s/ Xsn .ds/ ! g.s/ Xs .ds/ (weakly in R, as n ! 1)
T T

for functions g : T ! R which belong to Lq ./, 1


p
C q1 D 1, and thus also weak convergence
Z Z
gn .s/ Xsn .ds/ ! g.s/ Xs .ds/ (weakly in R, as n ! 1)
T T

for arbitrary sequences .gn /n in Lq ./ with the property gn ! g in Lq ./. 


Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 47

The following Theorem 2.4, the main result of this section, gives sufficient condi-
tions for weak convergence in Lp .T , T , / – of type ‘convergence of finite dimen-
sional distributions plus uniform integrability’ – from which weak convergence in
Lp .T , T , / can be checked quite easily.

2.4 Theorem (Cremers and Kadelka [16]). Consider measurable stochastic processes

.X tn / t2T defined on .n , An , Pn /, n  1, and .X t / t2T defined on ., A, P /

with paths in Lp .T , T , /, under all assumptions of 2.1.


(a) In order to establish
L
Xn ! X (weak convergence in Lp .T , T , /, as n ! 1) ,

a sufficient condition is that the following properties (i) and (ii) hold simultaneously:
(i) convergence of finite dimensional distributions up to some exceptional set N 2
T such that .N / D 0: for arbitrary l  1 and any choice of t1 , : : : , tl in T n N , one
has
  L  
L .X tn1 , : : : , X tnl / j Pn ! L .X t1 , : : : , X tl / j P
(weak convergence in Rl , as n ! 1) ;

(ii) uniform integrability of ¹jX n ., /jp : n  0º for the random variables (including
0
X :D X and P0 :D P )

X n defined on .T n , T ˝An , ˝Pn / with values in .R, B.R// , n0

in the following sense: for every " > 0 there is some K D K."/ < 1 such that
Z
sup 1¹jX n j>Kº jX n jp d.˝Pn / < " .
n0 T n

(b) Whenever condition (a.i) is satisfied, any one of the following two conditions
is sufficient for (a.ii):
´
X n ., / ,Rn  1 , X., / are elementsR of Lp .T n , T ˝An , ˝Pn / , and
.2.40 / lim sup T n jX n jp d.˝Pn /  T  jX jp d.˝P / ,
n!1
´
 function f 2 L .T , T , / such that forn-almost all t 2 T
there 1
.2.400 /  isn some 
E jX t jp
 f .t / for all n  1, and lim E jX t jp D E .jX t jp / .
n!1

The remaining parts of this section contain the proof of Theorem 2.4, to be com-
pleted in Proofs 2.6 and 2.6’, and some auxiliary results; we follow Cremers and
Kadelka [16]. Recall that the set of Assumptions 2.1 is in force. W.l.o.g., we take
the finite measure  on .T , T / as a probability measure .T / D 1.
48 Chapter 2 Minimum Distance Estimators

2.5 Lemma. Write H for the class of bounded measurable functions ' : T R!R
such that
for every t 2 T fixed, the mapping '.t , / : x ! '.t , x/ is continuous.
Then condition (a.i) of Theorem 2.4 gives for every ' 2 H convergence in law of the
integrals
Z Z
n
'.s, Xs / .ds/ ! '.s, Xs / .ds/ (weak convergence in R, as n ! 1) .
T T

Proof. For real valued measurable processes .X t0 / t2T defined on some .0 , A0 , P 0 /,
the mapping .t , !/ ! .t , X 0 .t , !// is measurable from T ˝A to T ˝B.R/, hence
composition with ' gives a mapping .t , !/ ! '.t , X 0 .t , !// which is T ˝A–B.R/-
measurable. As a consequence,
Z
! ! '.s, X 0 .s, !// .ds/
T
0 0
R , A / taking
is a well-defined random variable on . values in .R, B.R//. In order to
prove convergence in law of integrals T '.s, Xsn / .ds/ as n ! 1, we shall show
 Z   Z 
n
.˘/ EPn g '.s, Xs / .ds/ ! EP g '.s, Xs / .ds/
T T

for arbitrary g 2 Cb .R/, the class of bounded continuous functions R ! R.


Fix any constant M < 1 such that sup j'j  M . Thanks to the convention
T R
.T / D 1 above, for functions g 2 Cb .R/ to be considered in .˘/, only the restriction
gjŒM ,M  to the interval ŒM , M  is relevant. According to Weierstrass, approximat-
ing g uniformly on ŒM , M  by polynomials, it will be sufficient to prove .˘/ for
polynomials. By additivity, it remains to consider the special case g.x/ :D x l , l 2 N,
and to prove .˘/ in this case. We fix l 2 N and show
Z l ! Z l !
EPn '.s, Xsn / .ds/ ! EP '.s, Xs / .ds/ , n!1.
T T

Put X 0 :D X and P0 :D P , and write left- and right-hand sides in the form
Z l !
n
EPn '.s, Xs / .ds/
T
Z Z !
Y
l
D  EPn '.si , Xsni / .ds1 / : : : .dsl /
T T iD1
Z Z
 n n

D  EPn .s1 ,:::,sl / .Xs1 , : : : , Xsl / .ds1 / : : : .dsl / .
T T
Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 49

Here, at every point .s1 , : : : , sl / of the product space XliD1 T , a function

Y
l
.x1 , : : : , xl / D .s1 ,:::,sl / .x1 , : : : , xl / :D '.si , xi / 2 Cb .Rl /
iD1

arises which indeed is bounded and continuous: for ' 2 H , we exploit at this point
of the proof the defining property of class H . Condition (a.i) of Theorem 2.4 guaran-
tees convergence of finite dimensional distributions of X n to those of X up to some
exceptional -null set T 2 T , thus we have
   
EPn .Xsn1 , : : : , Xsnl / ! EP .Xs1 , : : : , Xsl / , n ! 1

for any choice of .s1 , : : : , sl / such that s1 , : : : , sl 2 T n N . Going back to the above
integrals on the product space XliD1 T , note that all expressions in the last convergence
are bounded by M l , uniformly in .s1 , : : : , sl / and n: hence
Z Z
 n n

... EPn .s1 ,:::,sl / .Xs1 , : : : , Xsl / .ds1 / : : : .dsl /
T T

will tend as n ! 1 by dominated convergence to


Z Z
 
... EP .s1 ,:::,sl / .Xs1 , : : : , Xsl / .ds1 / : : : .dsl / .
T T

This proves .˘/ in the case where g.x/ D x l , for arbitrary l 2 N. This finishes the
proof. 

2.5’ Lemma. For g 2 Lp .T , T , / fixed, consider the family of random variables

Z n .g/ :D kXn  gkp , n0

as in the proof of Lemma 2.2 (with X 0 :D X , P0 :D P ), and the class H as defined


in Lemma 2.5. Under condition R(a.ii) of Theorem 2.4, we can approximate Z n .g/
uniformly in n 2 N0 by integrals T '.s, Xsn / .ds/ for suitable choice of ' 2 H : for
every g 2 Lp ./ and every ı > 0 there is some ' D 'g,ı 2 H such that
ˇ Z ˇ 
ˇ ˇ
sup Pn ˇˇkXn  gkp  'g,ı .s, Xsn /.ds/ˇˇ > ı < ı .
n0 T

Proof. (1) Fix g 2 Lp .T , T , / and ı > 0. Exploiting the uniform integrability


condition (a.ii) of Theorem 2.4, we select C D C.ı/ < 1 large enough for
Z
1
.˘1/ sup 1¹jX n j>C º jX n jp d.˝Pn / < 2.pC1/ ı 2
n0 T 
n
4
50 Chapter 2 Minimum Distance Estimators

and  D .ı/ > 0 small enough such that

Gn 2 T ˝An , .˝Pn /.Gn / <  H)


.˘2/ Z
1
1Gn jX n jp d.˝Pn / < 2p ı 2
T n 4

independently of n. Assertion .˘2/ amounts to the usual "-ı-characterisation of uni-


form integrability, in the case where random variables .t , !/ ! jXn .t , !/jp are de-
fined on different probability spaces .T n , T ˝An , ˝Pn /. In the same way, we
view .t , !/ ! jg.t /jp as a family of random variables on .T n , T ˝An , ˝Pn /
for n  1 which is uniformly integrable. Increasing C < 1 of .˘1/ and decreasing
 > 0 of .˘2/ if necessary, we obtain in addition
Z
1
.˘3/ 1¹jgj>C º jgjp d < 2.pC1/ ı 2
T 4
together with
Gn 2 T ˝An , .˝Pn /.Gn / <  H)
Z
.˘4/ 1
1Gn jgjp d.˝Pn / < 2p ı 2 .
T n 4
A final increase of C < 1 makes sure that in all cases n  0 the sets

.˘5/ Gn :D ¹jX n j > C º or Gn :D ¹jgj > C º

satisfy the condition .˝Pn /.Gn / < , and thus can be inserted in .˘2/ and .˘4/.
(2) With g, ı, C of step (1), introduce a truncated identity h.x/ D .C /_x^.CC /
and define a function ' D 'g,ı : T R ! R by

'.s, x/ :D jh.x/  h.g.s//jp , s2T , x2R.

Then ' belongs to class H as defined in Lemma 2.5, and is such that

.C/ jXsn .!/  g.s/jp D '.s, Xsn .!// on ¹.s, !/ : jX n .s, !/j  C , jg.s/j  C º .

As a consequence of (+), for ! 2  fixed, the difference


ˇ R ˇ
ˇ kX n .!/  gkp  ˇ
T '.s, Xs .!// .ds/ 
n

., n/ Z
j jXsn .!/  g.s/jp  '.s, Xsn .!// j .ds/
T

admits, for ! 2  fixed, the upper bound


Z
1¹jX n .s,!/j>C º [ ¹jg.s/j>C º j jXsn .!/  g.s/jp  '.s, Xsn .!// j .ds/
T
Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 51

which is (using the elementary ja C bjp  .jaj C jbj/p  2p .jajp C jbjp /, and
definition of ') smaller than
Z
 
1¹jX n .s,!/j>C º [ ¹jg.s/j>C º 2p jXsn .!/jp C 2p jg.s/jp C 2p C p .ds/
T
²Z
 2p 1¹jX n .s,!/j>C º .jX n .s, !/jp C C p / .ds/
T
Z Z
C 1¹jX n .s,!/j>C º jg.s/jp .ds/ C 1¹jg.s/j>C º jX n .s, !/jp .ds/
ZT T
³
p p
C 1¹jg.s/j>C º .jg.s/j C C / .ds/
T

and finally smaller than


² Z Z
p n p
2 2 1¹jX n .s,!/j>C º jX .s, !/j .ds/ C 1¹jX n .s,!/j>C º jg.s/jp .ds/
T T
Z Z ³
n p p
C 1¹jg.s/j>C º jX .s, !/j .ds/ C 2 1¹jg.s/j>C º jg.s/j .ds/ .
T T

The last right-hand side is the desired bound for ., n/, for ! 2  fixed.
(3) Integrating this bound obtained in step (2) with respect to Pn , we obtain
ˇ Z ˇ
ˇ n ˇ
ˇ
EPn ˇkX .!/  gk  p
'.s, Xs .!// .ds/ˇˇ
n
T
Z Z
pC1 n p p
2 1¹jX n j>C º jX j d.˝Pn / C 2 1¹jX n j>C º jgjp d.˝Pn /
T n T n
Z Z
p n p pC1
C2 1¹jgj>C º jX j d.˝Pn / C 2 1¹jgj>C º jgjp d
T n T

where .˘1/–.˘5/ make every term on the right-hand side smaller than 14 ı 2 , indepen-
dently of n. Thus
ˇ Z ˇ
ˇ n ˇ
sup EPn ˇˇkX .!/  gk 
p
'.s, Xs .!// .ds/ˇˇ < ı 2
n
n0 T

for ' D 'ı,g 2 H , and application of the Markov inequality gives


ˇ Z ˇ 
ˇ n ˇ
sup Pn ˇˇkX  gk  p
'.s, Xs /.ds/ˇˇ > ı < ı
n
n0 T

as desired. 

2.6 Proof of Theorem 2.4(a). (1) We shall prove part (a) of Theorem 2.4 using ‘ac-
companying sequences’. We explain this for some sequence .Yen /n of real valued ran-
e
dom variables whose convergence in law to Y 0 we wish to establish. Write Cu .R/ for
52 Chapter 2 Minimum Distance Estimators

the class of uniformly continuous and bounded functions R ! R; for f 2 Cu .R/ put
Mf :D 2 sup jf j. A sequence .Z e n /n – where for every n  0, Z
e n and Yen live on the
same probability space – is called a ı-accompanying sequence for .Y ne n / if

 n 
sup Pn jY e Z enj > ı < ı .
n0

Selecting for every f 2 Cu .R/ and every  > 0 some ı D ı.f , / > 0 such that

sup jf .x/  f .x 0 /j <  ,


jxx 0 j<ı

we have for ı D ı.f , /-accompanying sequences .Z e n /n the inequality


ˇ ˇ ˇ ˇ
.ı/ ˇE.f .Yen //  E.f .Ye0 //ˇ  ˇE.f .Z
e n //  E.f .Z e 0 //ˇ C 2  C 2Mf ı .

For a given sequence .Y en /n0 we thus have the following: whenever we are able to
associate for every ı > 0 some sequence .Z e n .ı//n0 which is ı-accompanying and
e
such that Z e
n .ı/ converges in law to Z 0 .ı/ as n ! 1, we do have convergence in
e0 , thanks to .ı/.
en /n to Y
law of .Y
(2) We start the proof of Theorem 2.4(a). In order to show

Xn ! X (weak convergence in Lp .T , T , /, n ! 1)

it is sufficient by Proposition 2.3 to consider arbitrary g1 , ..., gl in Lp .T , T , /, l  1,


and to prove
 n 
kX  g1 k, : : : , kXn  gl k ! .kX  g1 k, : : : , kX  gl k/
(weakly in Rl , n ! 1)

or equivalently
 n   
kX  g1 kp , : : : , kXn  gl kp ! kX  g1 kp , : : : , kX  gl kp
(weakly in Rl , n ! 1) .

According to Cramér–Wold (or to P. Lévy’s continuity theorem for characteristic func-


tions), to establish the last assertion we have to prove for all ˛ D .˛1 , : : : , ˛` / 2 Rl

X
l X
l
Y n :D ˛i kXn  gi kp ! ˛i kX  gi kp D: Y 0
.CC/ iD1 iD1
(weakly in R, n ! 1)
Pl
where w.l.o.g. we can assume first ˛i ¤ 0 for 1  i  l, and second 1
l
1
iD1 j˛i j D 1,
by multiplication of .˛1 , : : : , ˛` / with some constant.
Section 2.1 Stochastic Processes with Paths in Lp .T , T , / 53

(3) With .˛1 , : : : , ˛` / and g1 , : : : , g` of (++), for ı > 0 arbitrary, select functions
'gi , ı in H such that
lj˛i j

ˇ Z ˇ 
ˇ n ˇ ı ı
sup Pn ˇˇkX  gi k 
p
'gi , ı .s, Xs /.ds/ˇˇ >
n
<
n0 T lj˛i j lj˛i j lj˛i j
with notations of Lemma 2.5’. Then also
X
l
'˛,g1 ,:::,gl ,ı .t , x/ :D ˛i 'gi , ı .t , x/ , t 0, x2R
lj˛i j
iD1

is a function in class H . Introducing the random variables


Z
Z n :D '˛,g1 ,:::,gl ,ı .s, Xsn / .ds/ , n  1,
ZT
0
Z :D '˛,g1 ,:::,gl ,ı .s, Xs / .ds/ ,
T

by Lemma 2.5, condition (a.i) of Theorem 2.4 gives convergence in law of the integrals

./ Z n ! Z 0 (weak convergence in R), n ! 1 .


P
Now, triangle inequality and norming 1l liD1 j˛1i j D 1 show that independently of
n0
 
Pn jY n  Z n j > ı
Xl ˇ Z ˇ 
ˇ ˇ ı
 ˇ n p
Pn ˇ˛i kX  gi k  ˛i n ˇ
'gi , ı .s, Xs /.ds/ˇ >
T lj˛i j l
iD1
X l ˇ Z ˇ 
ˇ ˇ ı
D Pn ˇˇkXn  gi kp  'gi , ı .s, Xsn /.ds/ˇˇ > <ı
T lj˛i j lj˛i j
iD1

which identifies .Z n /n2N0 as a ı-accompanying sequence for .Y n /n2N0


ˇ Z ˇ 
ˇ n ˇ
./ ˇ
sup Pn ˇY  n ˇ
'˛,g1 ,:::,gl ,ı .s, Xs / .ds/ˇ > ı < ı .
n0 T

(4) Combining ./ and ./, we have constructed a ı-accompanying sequence for
.Y n /n2N0 which is weakly convergent, for ı > 0 arbitrary. By step (1), we thus have
proved convergence in law of Yn to Y0 as n ! 1. This is (++). According to step (2),
part (a) of Theorem 2.4 is proved. 

2.6’ Proof of Theorem 2.4(b). By dominated convergence with respect to , condi-


tion (2.4”) obviously implies condition (2.4’). We have to prove that condition (2.4’)
54 Chapter 2 Minimum Distance Estimators

combined with convergence of finite dimensional distributions up to some exceptional


set of -measure 0 (condition (a.i) of Theorem 2.4) guarantees that uniform integra-
bility as required in (a.ii) of Theorem 2.4 holds.
Condition (2.4’) implies X.., ./ 2 Lp .T , T ˝A, ˝P /, hence for " > 0 there
is C < 1 such that
Z
1¹jXj>C º jX jp d.˝P / < " .
T 

Write f0 .x/ :D jxjp on Rd , introduce a function f1 2 Cb .R/ which coincides with


f0 on ¹jxj  C º, vanishes on ¹jxj > 2C º, and satisfies 0  f1  f0 everywhere
on Rd . Define f2 :D f0  f1 : then f2 coincides with f0 on ¹jxj > 2C º, vanishes on
¹jxj  C º, and satisfies 0  f2  f0 on Rd . By choice of C D C."/ we have in
particular
Z 
.ı/ EP f2 .Xs / .ds/ < " .
T

Next, the function .s, x/ :D f1 .x/ belongs to class H as defined in Lemma 2.5.
Thanks to Lemma 2.5, condition (a.i) of Theorem 2.4 which is assumed here yields
convergence in law
Z Z
n
f1 .Xs / .ds/ ! f1 .Xs / .ds/ (weakly in R, n ! 1)
T T

which by definition of weak convergence is


 Z   Z 
n
.C/ EPn g f1 .Xs / .ds/ ! EP g f1 .Xs / .ds/ , n!1
T T

for arbitrary g 2 Cb .R/. Put M :D sup jf1 j and consider in particular functions
g 2 Cb .R/ which on ŒM , CM  coincide with the identity. Since .T / D 1, (+) for
such g yields
Z  Z 
.CC/ EPn f1 .Xsn / .ds/ ! EP f1 .Xs / .ds/ , n ! 1 .
T T

Since f1 C f2 D f0 , f0 .x/ D jxjp ,


our assumption (2.4’) combined with (++) and
.ı/ gives
Z  Z 
n
lim sup EPn f2 .Xs / .ds/  EP f2 .Xs / .ds/ < " .
n!1 T T

The function f2 coinciding with f0 on ¹jxj > 2C º, we have a fortiori


Z
1¹jX n j>2C º jX n jp d.˝Pn / < " for n sufficiently large .
T n
Section 2.2 Minimum Distance Estimator Sequences 55

By assumption (2.4’), all X n .., ./ being in Lp .T n , T ˝An , ˝Pn /, we can in-


crease this constant 2C to some K D K."/ < 1 which satisfies
Z
sup 1¹jX n j>Kº jX n jp d.˝Pn / < " .
n0 T n

This is uniform integrability as stated in condition (a.ii) of Theorem 2.4. The proof of
Theorem 2.4(b) is finished. 

2.2 Minimum Distance Estimator Sequences


Millar [100] has shown that minimum distance (MD) estimator sequences are con-
sistent and asymptotically normal under rather weak conditions, and have interesting
robustness properties under small deviations from the statistical model. Hence such
estimators – without aiming at optimality within a specified statistical model – are
interesting under the practical aspect that in all applications a statistical model is at
best an approximately accurate one. Le Cam [83, Ex. 1 on p. 154] presents a striking
example of how ML estimation of the parameters in a normal distribution model can
switch to a notion void of sense once certain – arbitrarily small – ‘contaminations’
are added: the likelihood surface then develops singularities which are in one-to-one
correspondence to the observations themselves. The present section introduces MD
estimator sequences. The main result is strong consistency (Proposition 2.11 below)
and a representation of rescaled MD estimation errors (Theorem 2.14); on this basis,
asymptotic normality for MD estimator sequences will be dealt with in Section 2.4.
Our approach follows [100].

2.7 Example. Observe i.i.d. random variables Y1 , Y2 , : : : , Yn taking values in Rm


whose law depends on an unknown parameter # 2 ‚. Assume that ‚ is an open
subset of Rd , and let the single observation under P# have continuous distribution
function F# : Rm ! Œ0, 1. Consider the family ¹F# : # 2 ‚º as a subset of Cb .Rm /
equipped with uniform convergence, and let the mapping

.C/ ‚ 3 # ! F# ./ 2 Cb .Rm /

b n ., !/ denote the empirical distribution function


be continuous. Let F

Xn
1X
n
b n .t , !/ :D 1
F 1.1,t .Yi .!// D 1¹Yi tº .!/ , t 2 Rm .
n n
iD1 iD1

b n ., / is a mea-
b n .t , !/, F
Considering as in Exercise 2.1’(a) the mapping .t , !/ ! F
surable stochastic process in the sense of Assumption 2.1(b). For any choice of a fi-
nite measure  on .Rm , B.Rm //, we write for short L2 ./ D L2 .Rm , B.Rm /, /,
56 Chapter 2 Minimum Distance Estimators

b n ., / is a process with paths in .L2 ./, B.L2 ./// as defined in Lemma 2.2. As-
F
sumption (+) guarantees that for fixed ! the mapping

.CC/ b n ., !/  F ./kL2 ./


! kF

is continuous on ‚. Thinking of (++) as a surface over ‚ which is a function of the ob-


served Y1 .!/, : : : , Yn .!/, ‘best approximations’ (later we will consider all the details
which are necessary here)
b n ., !/  F ./kL2 ./
#n .!/ D arginf kF
2‚

provides an estimator for the unknown parameter # 2 ‚. This estimator is called a


minimum distance (MD) estimator. It compares the empirical quantity F b n ., !/ to its
theoretical counterparts F ./ under possible values of the parameter, in the Hilbert
space L2 ./, where Glivenko–Cantelli

./ lim b n .t /  F# .t /j D 0 P# -almost surely, for every # 2 ‚


sup jF
n!1 t2Rm

makes random surfaces (++) stabilise under P# as n ! 1 at a deterministic limit

.C C C/ ‚3 ! kF# ./  F ./kL2 ./ 2 R ,

for every # 2 ‚. This allows to obtain strong convergence of MD estimators under


very weak – in contrast to ML estimation, cf. Assumptions 1.19 and Theorem 1.24 –
assumptions. We sketch the core of this argument, leaving questions of existence or
measurability to Propositions 2.10–2.11 below. Starting from
² ³
® ¯
b
min kF n  F kL2 ./ < inf b
kF n  F kL2 ./  j#n  #j  "
:j#j" :j#j>"

we take complements and obtain the inclusions


² ³
®  ¯
j#n  #j > "  min kF b n  F kL2 ./  inf kFb n  F kL2 ./
:j#j" :j#j>"
² ³
 kF b n  F# kL2 ./  inf kFb n  F kL2 ./
:j#j>"
²  ³
b
 kF n  F# kL2 ./  inf b
kF#  F k  kF n  F# k
:j#j>"
² ³
b
 2 kF n  F# kL2 ./  inf kF#  F kL2 ./
:j#j>"

for n large enough, using inverse triangular inequality. Glivenko–Cantelli ./ in the
last expression shows that an identifiability condition

./ inf kF  F# kL2 ./ > 0 , for every # 2 ‚ and every " > 0
2‚:j#j>"
Section 2.2 Minimum Distance Estimator Sequences 57

will guarantee consistency of the estimator sequence .#n /n for the unknown parameter
as n ! 1. Due to the structure of the last right-hand side in the chain of inclusions
above, the convergence #n ! # under P# will necessarily be almost sure conver-
gence, for every # 2 ‚. Thanks to the continuity (+), the identifiability condition
./ is usually easy to satisfy in restriction to compact subsets of ‚; difficulties may
arise for at large distances from #, in the case where ‚ is unbounded and of large
dimension. 

We give a list of assumptions and notations to be used in this subsection. The as-
sumptions in 2.8(I) will always be in force; out of the list 2.8(III), we will indicate
separately for each result what we assume.

2.8 Assumptions and Notations for Section 2.2. ¹P# : # 2 ‚º is a family of


probability measures on some space ., A/. The parameter space ‚  Rd is open.
.Fn /n is an increasing family of sub- -fields in A where Fn represents stage n of the
asymptotics (e.g. n-fold independent replication of an experiment). We write Pn,# :D
P# jFn for the restriction of P# to Fn , and consider the sequence of experiments

En :D ., Fn , ¹Pn,# : # 2 ‚º/ , n1.

(I) H is a Hilbert space, with scalar product h, iH and norm kkH , equipped with its
Borel- -field B.H /. We consider a sequence ‰ bn of Fn -measurable H -valued random
variables
bn : ., Fn / ! .H , B.H // , n  1

and a deterministic family ¹‰ : 2 ‚º of objects in H such that

.C/ the mapping ‚ 3 ! ‰ 2 H is continuous.

b n is a statistic in En , and ‰ a theoretical counterpart


At stage n of the asymptotics, ‰
– deterministic and independent of n – for ‰ bn under a true value of the unknown
parameter.
(II) By continuity (+), for ! 2  fixed, the mappings

‚3 ! b n .!/  ‰
‰ 2 Œ0, 1/
H

are continuous: hence for open (or closed) sets B or compact sets F which are con-
tained in ‚,
b n  ‰
inf ‰ , b n  ‰
sup ‰ , b n  ‰
min ‰ , :::
2B H 2B H 2F H

are well-defined random variables on ., Fn /.


(III) We introduce a list of assumptions concerning the objects in (I) under parameter
values from ‚:
58 Chapter 2 Minimum Distance Estimators

(a) Strong law of large numbers SLLN(#): we require

P# -almost surely: bn  ‰# kH ! 0 as n ! 1;
k‰

(b) Identifiability condition I(#): we require

inf k‰  ‰# kH > 0 for ı > 0 arbitrarily small;


2‚ , j#j>ı

(c) Tightness condition T(#): there is a sequence 'n D 'n .#/ " 1 of norming
constants such that
°   ±
L ' n k‰b n  ‰# kH j P# : n  1 is tight in R ;

(d) Differentiability condition D(#) (including non-singularity of the derivative): the


mapping

.C/ ‚ 3 ! ‰ 2 H

D # , i.e. there is a derivative


is Fréchet differentiable at
0 1
D1 ‰#
D‰# :D @    A , Dj ‰# 2 H , 1  j  d
Dd ‰ #
at # such that
1
‰  ‰#  .  #/> D‰# ! 0 as j  #j ! 0 .
j  #j H

We require in addition that the components Dj ‰# , 1  j  d , be linearly indepen-


dent in H .

2.9 Definition. A sequence of estimators #n : ., Fn / ! .Rd , B.Rd // for the
unknown parameter # 2 ‚ is called a minimum distance (MD) sequence if there is a
sequence of events An 2 Fn such that
 
P# lim inf An D 1
n!1
for all # 2 ‚ fixed, and such that for n  1

#n .!/ 2 ‚ and bn .!/  ‰#  .!/


‰ D inf b n .!/  ‰
‰ for all ! 2 An .
n
H 2‚ H

Whenever the symbolic notation


bn  ‰
#n D arginf ‰ , n!1
2‚ H

will be used below, we assume that the above properties do hold.


Section 2.2 Minimum Distance Estimator Sequences 59

2.10 Proposition. Consider a compact exhaustion .Kn /n of ‚. Define with respect to


Kn the event
² ³
An :D min ‰ b n  ‰ D inf ‰ bn  ‰ 2 Fn , n  1 .
2Kn H 2‚ H

Then one can construct a sequence .Tn /n such that the following holds for all n  1:

Tn : ., Fn / ! .Rd , B.Rd // is measurable,


8 ! 2 An : Tn .!/ 2 Kn , b n .!/  ‰T .!/
‰ bn .!/  ‰
D inf ‰ .
n
H 2‚ H

Proof. (1) We fix a compact exhaustion .Kn /n of ‚:

Kn compact in Rd , Kn  int.KnC1 / , Kn " ‚ .

For every n and every ! 2 , define


² ³
Mn .!/ :D 2 Kn : ‰b n .!/  ‰ b n .!/  ‰
D inf ‰
H 2‚ H

the set of all points in Kn where the mapping ! k‰ bn .!/  ‰ kH attains its global
minimum. By continuity of this mapping and definition of An , Mn .!/ is a non-void
closed subset of Kn when ! 2 An , hence a non-void compact set. Thus, for ! 2 An ,
out of arbitrary sequences in Mn .!/, we can select convergent subsequences having
limits in Mn .!/.
(2) For fixed n  1 and fixed ! 2 An we specify one particular point ˛.!/ 2
Mn .!/ as follows. Put

˛1 .!/ :D inf ¹ 1 : there are points D . 1 , 2 , ..., d/ 2 Mn .!/º .

Selecting convergent subsequences in the non-void compact Mn .!/, we see that


Mn .!/ contains points of the form .˛1 .!/, 2 , ..., d /. Next we consider

˛2 .!/ :D inf¹ 2 : there are points D .˛1 .!/, 2 , ..., d / 2 Mn .!/º .

Again, selecting convergent subsequences in Mn .!/, we see that Mn .!/ contains


points of the form .˛1 .!/, ˛2 .!/, 3 , ..., d /. Continuing in this way, we end up with a
point
˛.!/ :D .˛1 .!/, ˛2 .!/, ..., ˛d .!// 2 Mn .!/
which has the property

˛j .!/ D min¹ j : there are points D .˛1 .!/, : : : , ˛j 1 .!/, j, : : : , d/ 2 Mn .!/º

for all components 1  j  d .


60 Chapter 2 Minimum Distance Estimators

(3) For ! 2 An and for the particular point ˛.!/ 2 Mn .!/ selected in step (2),
write ˛ .n/ .!/ :D ˛.!/ for clarity, and define (fixing some default value #0 2 ‚)

Tn .!/ :D #0 1Acn .!/ C ˛ .n/ .!/ 1An .!/ , !2.

Then Tn .!/ represents one point in Kn such that the mapping ! k‰ bn .!/  ‰ kH
attains its global minimum on ‚ at Tn .!/, provided ! 2 An . It remains to show that
Tn is Fn -measurable.
(4) Write for short  :D inf2‚ k‰ bn  ‰ kH . By construction of the sequence
.n/ / , fixing arbitrary .b , : : : , b / 2 Rd and selecting convergent subsequences in
.˛ n 1 d
compacts Kn \ .XjmD1 .1, bj  X Rd m /, the following is seen to hold successively
in 1  m  d :
° ±
.n/ .n/
An \ ˛1  b1 , : : : , ˛m  bm D
\ [ ° ±
bn  ‰ kH <  C r 2 Fn
k‰
²
r>0 rational  2 Qd \Kn
j bj ,1j m

(where again we have used continuity (+) of the mapping ! k‰ bn .!/  ‰ kH on


‚ for ! fixed). As a consequence of this equality, taking m D d , the mapping ! !
˛ .n/ .!/1An .!/ is Fn -measurable. Thus, with constant #0 and An 2 Fn , Tn defined
in step (3) is a Fn -measurable random variable, and all assertions of the proposition
are proved. 

The core of the last proof was ‘measurable selection’ out of the set of points where
! k‰ bn .!/  ‰ kH attains its global minimum on ‚. In asymptotic statistics,
problems of this type arise in almost all cases where one wishes to construct estima-
tors through minima, maxima or zeros of suitable mappings ! H. , !/. See [43,
Thm. A.2 in App. A] for a general and easily applicable result to solve ‘measurable se-
lection problems’ in parametric models, see also [105, Thm. 6.7.22 and Lem. 6.7.23].

2.11 Proposition. Assume SLLN(#) and I(#) for all # 2 ‚. Then the sequence
.Tn /n constructed in Proposition 2.10 is a minimum distance estimator sequence for
the unknown parameter. Moreover, arbitrary minimum distance estimator sequences
.#n /n for the unknown parameter as defined in Definition 2.9 are strongly consistent:

8# 2‚: #n ! # P# -almost surely as n ! 1.

Proof. (1) For the particular sequence .Tn /n which has been constructed in Proposition
2.10, using a compact exhaustion .Kn /n of ‚ and measurable selection on events
² ³
An :D min ‰ b n  ‰ D inf ‰ bn  ‰ 2 Fn
2Kn H 2‚ H
Section 2.2 Minimum Distance Estimator Sequences 61

defined with respect to Kn , we have to show that the conditions of our proposition
imply
 
.ı/ P# lim inf An D 1
n!1

for all # 2 ‚. Then, by Proposition 2.10, the sequence .Tn /n with ‘good sets’ .An /n
will have all properties required in Definition 2.9 of an MD estimator sequence.
Fix # 2 ‚. Since .Kn /n is a compact exhaustion of the open parameter set ‚,
there is some n0 and some "0 > 0 such that B2"0 .#/  Kn for all n  n0 . Consider
0 < " < "0 arbitrarily small. By the definition of An and Tn in Proposition 2.10, we
have for n  n0
² ³
min b
‰ n  ‰ < inf b
‰ n  ‰  An \ ¹jTn  #j  "º .
:j#j" H :j#j>" H

Passing to complements this reads for n  n0

¹jTn  #j > "º [ Acn


² ³
 min b
‰ n  ‰  inf b
‰ n  ‰
:j#j" H :j#j>" H
² ³
 ‰ b n  ‰#  inf bn  ‰

H :j#j>" H
²  ³
 k‰ bn  ‰# kH  inf bn  ‰# kH
k‰#  ‰ kH  k‰
:j#j>"
² ³
 Cn :D 2  ‰ bn  ‰#  inf ‰  ‰# H
H :j#j>"

(in the third line, we use inverse triangular inequality). For the event Cn defined by
the right-hand side of this chain of inclusions, SLLN(#) combined with I(#) yields
 
P# lim sup Cn D P# .¹! : ! 2 Cn for infinitely many nº/ D 0 .
n!1

But Acn is a subset of Cn for n  n0 , hence we have P# .lim sup Acn / D 0 and thus .ı/.
n!1
(2) Next we consider an arbitrary MD estimator sequence .#n /n for the unknown
parameter according to Definition 2.9, and write .e
An /n for its sequence of ‘good sets’:
thus e
An 2 Fn , for ! 2 e An the mapping ! k‰ bn .!/  ‰ kH attains its global
 .!/, and we have
minimum on ‚ at #n
 
.ıı/ P# lim inf e An D 1 .
n!1

By Definition 2.9, we have necessarily


² ³
 c 
min b n  ‰
‰ < inf bn  ‰
‰  e
An [ ¹j#n  #j  "º
:j#j" H :j#j>" H
62 Chapter 2 Minimum Distance Estimators

(note the different role played by general ‘good sets’ e


An in comparison to the particular
construction of Proposition 2.10 and step (1) above). If we transform the left-hand side
by the chain of inclusions of step (1) with Cn defined as there, we now obtain
¹j#n  #j > "º \ e
An  Cn
for all n  1, and thus
¹j#n  #j > "º  Cn [ e
Acn , n1.
Again SLLN(#) and I(#) guarantee P# .lim sup Cn / D 0 as in step (1) above. Since
n!1
property .ıı/ yields P# .lim sup e
Acn / D 0 , we arrive at
n!1
 
P# lim sup ¹j#n  #j > "º D 0 .
n!1

This holds for all " > 0, and we have proved P# -almost sure convergence of .#n /n
to #. 

Two auxiliary results prepare for the proof of our main result – the representation of
rescaled MD estimator errors, Lemma 2.13 below – in this section. The first is purely
analytical.

2.12 Lemma. Under I(#) and D(#) we have


 
lim lim inf inf 'n ‰#Ch='n  ‰# H
D1
c"1 n!1 jhj>c

for any sequence 'n " 1 of real numbers.

Proof. By assumption I(#), we have for " > 0 fixed as n ! 1


 
.C/ inf 'n ‰#Ch='n  ‰# H D 'n  inf ‰  ‰# H
! 1 .
h:jh='n j" :j#j"

Assumption D(#) includes linear independence in H of the components


D1 ‰# , : : : , Dd ‰# of the derivative D‰# . Since u ! ku> D‰# kH is contin-
uous, we have on the unit sphere in Rd
.CC/  :D min ku> D‰# kH > 0 .
jujD1

Assumption D(#) shows also


1 
.C C C/ sup ‰  ‰#  .  #/> D‰# <
:j#j<" j  #j H 2
provided " > 0 is small enough. For any 0 < c < 1, because of (+), all what remains
to consider in  
inf 'n ‰#Ch='n  ‰# H
jhj>c
Section 2.2 Minimum Distance Estimator Sequences 63

are contributions
 
inf 'n ‰#Ch='n  ‰# H
, " > 0 arbitrarily small
jhj>c , jh='n j<"

which by (++) and (+++) allow for lower bounds


‰#Ch='n  ‰# H
inf jhj
jhj>c , jh='n j<" jh='n j
‰  ‰# H
c  inf
 : 0<j#j<" j  #j

.  #/> D‰# H
 c  inf
 : 0<j#j<" j  #j

‰  ‰#  .  #/> D‰# H
 sup
 : j#j<" j  #j
 
 c  .  / D c 
2 2
which increase to 1 together with c. 

2.13 Lemma. In addition to I(#) and D(#), assume T(#) with norming constants
'n D 'n .#/ " 1. Then arbitrary MD sequences .#n /n according to Definition 2.9
are .'n /n -consistent at #:
®   ¯
L 'n .#n  #/ j P# : n  1 is tight in Rd .

Proof. Let .#n /n with ‘good sets’ .An /n denote any choice of a MD estimator se-
quence for the unknown parameter according to Definition 2.9.
(1) For K < 1 arbitrarily large but fixed, we repeat the reasoning of step (2) in
the proof of Proposition 2.11, except that we insert K='n in place of the " there, for n
large enough. Writing
² ³
Cn .K/ :D 2  ‰ b n  ‰#  inf ‰  ‰# H
H  : j#j>K='n

we thus obtain
®ˇ ˇ ¯
.ı/ ˇ'n .#   #/ˇ > K  Cn .K/ [ Ac
n n

for n large enough, with ‘good sets’ An satisfying P# .lim inf An / D 1.


n!1
(2) Under assumptions I(#) and D(#), Lemma 2.12 shows that deterministic quan-
tities  
lim inf inf 'n ‰#Ch='n  ‰# H
n!1 jhj>K
64 Chapter 2 Minimum Distance Estimators

can be made arbitrarily large by choosing K large, whereas the tightness condition
T(#) yields    
lim sup P# 'n ‰b n  ‰# > M D0.
M "1 n1 H

Combining both statements, we obtain for the events Cn .K/ in step (1)
for every " > 0 there is K D K."/ < 1 such that lim sup P# . Cn .K."// / < " .
n!1

The ‘good sets’ An of step (1) satisfy a fortiori


lim P# .An / D 1
n!1
S T T
(view lim inf An D m nm An as increasing limit of events nm An as m
n!1
tends to 1). Combining both last assertions with .ı/, we have for every " > 0 some
K."/ < 1 such that
ˇ ˇ 
lim sup P# ˇ'n .#n  #/ˇ > K."/ < "
n!1

holds. This finishes the proof. 

We arrive at the main result of this section.

2.14 Theorem (Millar [100]). Assume SLLN(#), I(#), D(#), and T(#) with a se-
quence of norming constants 'n D 'n .#/ " 1. With d d matrix
˝ ˛
ƒ# :D Di ‰# , Dj ‰# 1i,j d

(invertible by assumption D(#)) define a linear mapping …# : H ! Rd


0 1
hD1 ‰# , f i
…# .f / :D ƒ1 @ ::: A , f 2H .
#
hDd ‰# , f i
Then rescaled estimation errors at # of arbitrary minimum distance estimator sequenc-
es .#n /n as in Definition 2.9 admit the representation
 
'n .#n  #/ D …# 'n .‰ b n  ‰# / C oP .1/ , n ! 1 .
#

Proof. (1) We begin with a preliminary remark. By assumption D(#), see Assump-
tions 2.8(III), the components D1 ‰# , .., Dd ‰# of the derivative D‰# are linearly
independent in H . Hence
V# :D span.Di ‰# : 1  i  d /
is a d -dimensional closed linear subspace of H . For points h 2 Rd with components
h1 , : : : , hd and for elements f 2 H , the following two statements are equivalent:
Section 2.2 Minimum Distance Estimator Sequences 65

Pn
(i) the orthogonal projection of f on V# takes the form iD1 hi Di ‰# D h> D‰# ;
(ii) one has …# .f / D h .
Note that the orthogonal projection of f on V# corresponds to the unique h in Rd
such that f  h> D‰# ? V# . This can be rewritten as

X
d  X
d
0D f hi Di ‰# , Dj ‰# D hf , Dj ‰# i hi .ƒ# /i,j , for all 1  j  d
iD1 iD1

by definition of ƒ# , and thus in the form .hD1 ‰# , f i, : : : , hDd ‰# , f i/ D h> ƒ# ;


transposing and using symmetry of ƒ# the last line gives
0 1
hD1 ‰# , f i
ƒ# h D @ ::: A.
hDd ‰# , f i
Inverting ƒ# we have the assertion.
bn  ‰# / under P# in place of f
(2) Inserting the H -valued random variable 'n .‰
in step (1), we define  
b
hn :D …# 'n .‰b n  ‰# /

which is a random variable taking values in Rd . Then by step (1),


./ b
h> b
n D‰# is the orthogonal projection of 'n .‰ n  ‰# / on the subspace V# .

bn  ‰# /kH under P# as
Moreover, assumption T.#/ imposing tightness of k'n .‰
n ! 1 guarantees
the family of laws L. b
hn j P# /, n  1 , is tight in Rd
since …# is a linear mapping.
(3) Let .#n /n denote any MD estimator sequence for the unknown parameter with
‘good sets’ .An /n as in Definition 2.9: for every n  1, on the event An , the mapping

‚ 3 ! bn  ‰
‰ 2 Œ0, 1/
H

attains its global minimum at #n , and we have (in particular, cf. proof of Lemma 2.13)
lim P# .An / D 1. With the norming sequence of assumption T.#/, put
n!1

hn :D 'n .#n  #/ , ‚#,n :D ¹h 2 Rd : # C h='n 2 ‚º .


Then the sets ‚#,n increase to Rd as n ! 1 since ‚ is open. Rewriting points 2 ‚
relative to # in the form # C h='n , h 2 ‚#,n , we have the following: for every n  1,
on the event An , the mapping
./ ‚#,n 3 h ! bn  ‰#Ch=' / D: n,# .h/
'n .‰ n
66 Chapter 2 Minimum Distance Estimators

attains a global minimum at hn . By Lemma 2.13,


the family of laws L. hn j P# /, n  1 , is tight in Rd .
(4) In order to prove our theorem, we have to show
h D b
n hn C oP .1/ as n ! 1
#

with the notation of steps (2) and (3). The idea is as follows. On the one hand, the
function n,# ./ defined in ./ attains a global minimum at hn (on the event An ),
on the other hand, by ./,
bn  ‰# /  'n .‰#Ch='  ‰# /
n,# .h/ D 'n .‰ n
 
b n  ‰# /  h> D‰#  'n ‰#Ch='  ‰#  .h='n /> D‰#
D 'n .‰ n

will be close (for n large) to the function


.  / b n  ‰# /  h> D‰#
Dn,# .h/ :D 'n .‰
which admits a unique minimum at b
hn according to ./ in step (2), up to small pertur-
bations of order  
'n ‰#Ch='n  ‰#  .h='n /> D‰#
which have to be negligible as n ! 1, uniformly in compact h-intervals, in virtue
of the differentiability assumption D.#/. We will make this precise in steps (5)–(7)
below.
(5) By definition of the random functions n,# ./ and Dn,# ./, an inverse triangular
inequality in H establishes
 
j n,# .h/  Dn,# .h/ j  'n ‰#Ch='n  ‰#  .h='n /> D‰#
for all ! 2 , all h 2 ‚n,# and all n  1. The bound on the right-hand side is
deterministic. The differentiability assumption D.#/ shows that for arbitrarily large
constants C < 1,
 
sup 'n ‰#Ch='n  ‰#  .h='n /> D‰# 
jhjC

‰#Ch='n  ‰#  .h='n /> D‰#


C sup
jhjC jh='n j
vanishes as n ! 1. For n sufficiently large, ‚n,# includes ¹jhj  C º. For all ! 2 ,
we thus have
.˘/ sup j n,# .h/  Dn,# .h/ j ! 0 as n ! 1
h2K

on arbitrary compacts K in Rd .
(6) Squaring the random function Dn,# ./ defined in .  /, we find a quadratic
lower bound
ˇ ˇ2
.C/ 2
Dn,# .h/  Dn,#
2
.b
hn / C ˇ h  b
hn ˇ 2 , for all ! 2 , h 2 Rd , n  1
Section 2.2 Minimum Distance Estimator Sequences 67

around its unique minimum at b


hn , with

 D min k u> D‰# k > 0


jujD1

introduced in (++) in the proof of Lemma 2.12. Assertion (+) is proved as follows.
From ./ in step (2)

b
h> b
n D‰# is the orthogonal projection of 'n .‰ n  ‰# / on the subspace V#

or equivalently
 
b n  ‰# /  b
'n .‰ h>
n D‰# ? V#

we have for the function Dn,# ./ according to Pythagoras


ˇ ˇ2
.h/ D .h  b
hn /> D‰# .b
hn /  ˇh  b
hn ˇ 2 C Dn,# .b
2
2
Dn,# C Dn,#
2 2
hn / .

This assertion holds for all ! 2 , all h 2 Rd and all n  1.


(7) To conclude the proof, we fix " > 0 arbitrarily small. By tightness of .b
hn /n and
 / under P , see steps (2) and (3) above, there is a compact K D K."/ in Rd such
.hn n #
that
  "   "
sup P# b hn … K < , sup P# hn … K < .
n1 2 n1 2
Then for the MD estimator sequence .#n /n of step (3), with ‘good sets’ .An /n ,
 
P# jhn  b
hn j > "
     
 P# b hn … K C P# hn … K C P# Acn
 
C P# ¹b hn 2 Kº \ ¹hn 2 Kº \ An \ ¹jhn  b
hn j > "º
" "  
 C C P# Acn
2 2
 ® 2 ¯
C P# ¹b hn 2 Kº \ ¹hn 2 Kº \ An \ Dn,# .hn /  Dn,#
2
.b
hn / C "2 2

where we have used the quadratic lower bound (+) for Dn,# 2 ./ around b
hn from step
(6). Recall that lim P# .An / D 1, as in step (2) in the proof of Lemma 2.13. Thanks
n!1
to the approximation .˘/ from step (5), we can replace – uniformly on K as n ! 1
2 ./ by 2 ./. This gives
– the random function Dn,# n,#
 
lim sup P# jhn  b
hn j > "
n!1
 ® 1 ¯
< " C lim sup P# An \ 2n,# .hn /  2n,# .b
h n / C "2  2 .
n!1 2
68 Chapter 2 Minimum Distance Estimators

We shall show that the last assertion implies


 
.˘˘/ lim sup P# jhn  b hn j > " < " .
n!1

For n large enough, the compact K is contained in ‚#,n . For ! 2 An , by ./ in step
(3), the global minimum M of the mapping ‚#,n 3 h ! 2n,# .h/ is attained at hn .
Hence for n large enough, the intersection of the sets
² ³
b 1 2 2
An and n,# .hn /  M  " 
2
2
must be void: this proves .˘˘/. By .˘˘/, " > 0 being arbitrary, the sequences .hn /n
and .b
hn /n under P# are asymptotically equivalent. Thus we have proved

hn D b
hn C oP# .1/ as n ! 1
which – as stated at the start of step (4) – concludes the proof. 

2.15 Remark. We indicate a variant of our approach. Instead of almost sure conver-
gence of the MD sequence .#n /n to the true parameter one might be interested only
in convergence in probability (consistency in the usual sense of Definition 1.9). For
this, it is sufficient to work with Definition 2.9 of MD estimator sequences with good
sets .An /n which satisfy the weaker condition limn!1 P# .An / D 1 , and to weaken
the condition SLLN(#) to k‰ bn  ‰# kH ! 0 in P# -probability, i.e. to a weak
law of large numbers WLLN(#). With these changes, Proposition 2.12, Lemma 2.13
and Theorem 2.14 remain valid, and Proposition 2.11 changes to convergence in P# -
probability instead of P# -almost sure convergence.

2.3 Some Comments on Gaussian Processes


In the case where the Hilbert space of Section 2.2 is H D L2 .T , T , / as consid-
ered in Assumption 2.1(a), Gaussian processes arise as limits of (weakly convergent
bn  ‰# / under P# .
subsequences of) 'n .‰

2.16 Definition. Consider a measurable space .T , T / with countably generated  -field


T , and a symmetric mapping K., / : T T ! R.
(a) For finite measures  on .T , T /, a real valued measurable process .X t / t2T with
the property
8
< there is an exceptional
  N 2 T of -measure zero such that
set
L X t1 , : : : , X tl D N 0l , .K.ti , tj /i,j D1,...,l
:
for arbitrary t1 , : : : , tl 2 T n N , `  1
is called (centred) -Gaussian with covariance kernel K., /.
Section 2.3 Some Comments on Gaussian Processes 69

(b) A real valued measurable process .X t / t2T with the property


²    
L X t1 , : : : , X tl D N 0l , .K.ti , tj /i,j D1,...,l
for arbitrary t1 , : : : , tl 2 T , `  1

is called (centred) Gaussian with covariance kernel K., /.

2.17 Examples. (a.i) For T D Œ0, 1/ or T D Œ0, 1, consider standard Brown-
ian motion .B t / t2T with B0  0. By continuity of all paths, B defined on any
.0 , A0 , P 0 / is a measurable stochastic process (cf. Exercise 2.1’(b)). By indepen-
dence of increments, B t2  B t1 having law N .0, t2  t1 / , we have E.B t1 B t2 / D
E.B t21 / C E.B t1 .B t2  B t1 // D t1 for t1 < t2 . Hence, writing

K.t1 , t2 / :D E.B t1 B t2 / D t1 ^ t2 for all t1 , t2 in T ,

Brownian motion .B t / t2T is a Gaussian process in the sense of Definition 2.16(b)


with covariance kernel K., /.
(ii) If a finite measure  on .T , T / satisfies the condition
Z  Z Z
E B t .dt / D
2
K.t , t / .dt / D t .dt / < 1 ,
T T T
R
we can modify the paths of B on the P 0 -null set A :D ¹ T B t2 .dt / D 1º in A0 (put
B.t , !/  0 for all t 2 T if ! 2 A): then Brownian motion .B t / t2T is a measurable
process with paths in L2 .T , B.T /, / according to Lemma 2.2.
(b.i) Put T D Œ0, 1 and consider Brownian bridge
 
B 0 D B t0 0t1 , B t0 :D B t  t  B1 , 0  t  1 .

Transformation properties of multidimensional normal laws under linear transforma-


tions and
0 1 0 1
0 0 1 B t1 1 0 : : : 0 t1
B t1 B ::: C B 0 1 : : : 0 t2 C
@ ::: A D e AB@
C, e
A A :D B@ :::
C
A
B tl
B t0l
B1 0 0 : : : 1 tl

yield the finite dimensional distributions of B 0 : hence Brownian bridge is a Gaussian


process in the sense of Definition 2.16(b) with covariance kernel

K.t1 , t2 / D t1 ^ t2  t1 t2 , t1 , t2 2 Œ0, 1 .

(ii) The paths of B 0 being bounded functions on Œ0, 1, Brownian bridge is a mea-
surable stochastic process with paths in L2 .Œ0, 1, B.Œ0, 1/, / (apply Lemma 2.2) for
every finite measure  on .Œ0, 1, B.Œ0, 1//.
70 Chapter 2 Minimum Distance Estimators

(c.i) Fix a distribution function F , associated to an arbitrary probability distribution


on .R, B.R//. Brownian bridge time-changed by F is the process
 
B 0,F D B t0,F , B t0,F :D BF0 .t/ , t 2 R .
t2R

All paths of B 0,F being cJadlJag (right continuous with left-hand limits: this holds by
construction since F is cJadlJag), B 0,F is a measurable stochastic process (cf. Exer-
cise 2.1’(b)). Using (b), B 0,F is a Gaussian process in the sense of Definition 2.16(b)
with covariance kernel
K.t1 , t2 / D F .t1 / ^ F .t2 /  F .t1 /F .t2 / , t1 , t2 2 R .
(ii) The paths of B 0,F being bounded functions on R, Brownian bridge time-changed
by F is a process with paths in L2 .R, B.R/, / by Lemma 2.2, for arbitrary choice
of a finite measure . 

Gaussian processes have been considered since about 1940, together with explicit
orthogonal representations of the process in terms of eigenfunctions and eigenvalues
of the covariance kernel K., / (Karhunen–Loève expansions). See Loeve [89, vol. 2,
Sects. 36–37, in particular p. 144] or [89, 3rd. ed., p. 478], see also Gihman and Sko-
rohod [28, vol. II, pp. 229–230].

2.18 Theorem. Consider T compact in Rk , equipped with its Borel  -field T D


B.T /. Consider a mapping K., / : T T ! R which is symmetric, continuous,
and non-negative definite in the sense
X
˛i K.ti , tj / ˛j  0 for all `  1, t1 , : : : , t` 2 T , ˛1 , : : : , ˛` 2 R .
i,j D1,:::,`

Then for every finite measure  on .T , T /, there is a real valued measurable process
.X t / t2T which is -Gaussian with covariance kernel K., / as in Definition 2.16(a).

Proof. (1) Since the kernel K.., ./ is symmetric and non-negative definite, for arbitrary
choice of t1 , : : : , t` in T , `  1, there is a centred normal law Pt1 ,:::,t` with covariance
matrix .K.ti , tj //i,j D1,:::,` on .R` , B.R` //, with characteristic function
>
R` 3  ! e  2  †  , † :D .K.ti , tj //i,j D1,:::,` .
1

By the consistency theorem of Kolmogorov, the family of laws


 
` `
P t1 ,:::,t` probability measure on X R , ˝ B.R/ , t1 , : : : , t` 2 T , `1
iD1 iD1

being consistent, there exists a unique probability measure


 
P on ., A/ :D X R , ˝ B.R/
t2T t2T
Section 2.3 Some Comments on Gaussian Processes 71

such that the canonical process X D .X t / t2T on ., A, P / – the process of coordinate
projections – has finite dimensional distributions
   
L .X t1 , : : : , X t` / j P D N .0/iD1,:::,` , .K.ti , tj //i,j D1,:::,` ,
.C/
t1 , : : : , t` 2 T , `  1 .

Since K.., ./ is continuous, any convergent sequence tn ! t in T makes


 
EP .X tn  X t /2 D K.tn , tn /  2K.tn , t / C K.t , t /

vanish as n ! 1. In this sense, the process .X t / t2T under P is ‘mean square con-
tinuous’. As a consequence, X under P is continuous in probability:

.CC/ for convergent sequences tn ! t : X tn D X t C oP .1/ , n!1.

So far, being a canonical process on a product space, X has no path properties.


(2) We define T ˝A–measurable approximations .X m /m to the process X . As in
Exercise 2.1’, we cover Rk with half-open cubes Al .m/
 
k lj lj C1
Al .m/ :D X m
, m , l D .l1 , ..., lk / 2 Zk
j D1 2 2
according to a k-dimensional dyadic grid with step size 2m in each dimension, and
define ƒ.m, T / as the set of all indices l 2 Zk such that Al .m/ \ T ¤ ;. For
l 2 ƒ.m, T / we select some point tl .m/ in Al .m/ \ T . Note that for t 2 T fixed
and m 2 N, there is exactly one l 2 Zk such that t 2 Al .m/ holds: we write l.t , m/
for this index l. Define T ˝A-measurable processes
X
X m .t , !/ :D 1Al .m/\T .t / X.tl .m/, !/ , t 2 T , ! 2  , m  1 .
l2ƒ.m,T /

Then by (+) in step (1), arbitrary finite dimensional distributions of X m are Gaussian
with covariances
    

E X m .t1 /X m .t2 / D K tl.t 1 ,m/
.m/ , tl.t 2 ,m/
.m/ , t1 , t2 2 T .

By continuity of K., / and convergence tl.t,m/ .m/ ! t , the finite dimensional distri-
m converge as m ! 1 to those of the process X constructed in step (1):
butions of X
the last equation combined with (+) gives for arbitrary t1 , : : : , t` 2 T and `  1
   
L .X tm1 , : : : , X tm` / j P ! L .X t1 , : : : , X t` / j P
./
(weak convergence in R` , as m ! 1) .

(3) We fix any finite measure  on .T , T / and show that the sequence X m converges
in L2 .T , T ˝A, ˝P / as m ! 1 to some limit process X. e Renormalising , it
is sufficient to consider probability measures e  on .T , T /.
72 Chapter 2 Minimum Distance Estimators

(i) We prove that .X m /m is a Cauchy sequence in L2 .T , T ˝A, e ˝P /. First,


every X belongs to L .T , T ˝A, e
m 2 ˝P / since K., / is bounded on the compact
T T:
Z Z
 m 2  
X .t , !/ .e ˝P /.dt , d!/ D E .X tm /2 e .dt /
T  T
 Œ sup
0
K.t 0 , t 0 / e
.T / < 1 .
t 2T

Consider now pairs .m, m0 / where m0 > m. By construction of the covering of Rk


in step (2) at stages m and m0 , we have for any two indices l 0 and l either Al 0 .m0 / 
Al .m/ or Al 0 .m0 / \ Al .m/ D ;. Thus
Z Z  
m0 m 2 0
jX  X j d.e ˝P / D E .X tm  X tm /2 e .dt /
T  T

takes the form


X   
 Al 0 .m0 / \ Al .m/ \ T
e K.tl0 .m0 /, tl0 .m0 //
l 0 2ƒ.m0 ,T /
.ı/ l2ƒ.m,T / 
 2K.tl0 .m0 /, tl .m// C K.tl .m/, tl .m// .

Since K., / is uniformly continuous on the compact T T and since jtl0 .m0 / 
p
tl .m/j  k 2m for indices l 0 and l such that Al 0 .m0 /  Al .m/, integrands in .ı/
 
K.tl0 .m0 /, tl0 .m0 //  2K.tl0 .m0 /, tl .m// C K.tl .m/, tl .m//
for t 2 Al 0 .m0 / \ Al .m/

vanish as m ! 1 uniformly in t 2 T and uniformly in m0 > m. Hence, e  being a


finite measure, integrals .ı/ vanish as m ! 1 uniformly in m0 > m, and we have
proved Z
0
lim sup 0
jX m  X m j2 d.e
˝P / D 0 .
m!1 m >m T 
(ii) For the Cauchy sequence .X m /m under e e 2 L2 .T ,
˝P there is some X
T ˝A, e ˝P / such that

.C C C/ e in L2 .T , T ˝A, e
X m ! X ˝P / as m ! 1 .
e is measurable, and we deduce from (+++) convergence X tm ! X
In particular, X e t in
2 .P / as m ! 1 for e-almost all t 2 T ; the exceptional e
-null set in T arising here
L
can in general not be avoided.
(4) In (+++), we select a subsequence .mk /k along which e ˝P -almost sure con-
vergence holds:
e
X mk ! X e
˝P -almost surely on .T , T ˝A/ as k ! 1 .
Section 2.3 Some Comments on Gaussian Processes 73

Thus there is some set M 2 T ˝A of full measure under e


˝P such that
./ e
1M X mk ! 1M X pointwise on T  as k ! 1 .
With notation M t for t -sections through M
M t D ¹! 2  : .t , !/ 2 M º 2 A , t 2 T
R
˝P /.M / D T P .M t / e
we have 1 D .e .dt /. Hence there is some e
-null set N 2 T
such that
.  / P .M t / D 1 for all t in T n N .
Now the proof is finished: for arbitrary `  1 and t1 , : : : , t` in T nN , we have pointwise
convergence
   
.˘/ e t1 , : : : , 1Mt X
1M t1 X tm1 k , : : : , 1M t` X tm` k ! 1M t1 X e t` , k ! 1
`

for all ! 2  by ./, and at the same time –combining .  / and ./– weak
convergence in Rl
     
.˘˘/ L 1M t1 X tm1 k , : : : , 1M t` X tm` k ! N 0l , K.ti , tj / i,j D1,...,l , k ! 1 .

This shows that the (real valued and measurable) process 1M X e is -Gaussian with
covariance kernel K., / in the sense of Definition 2.16(a). 

Frequently in what follows, we will consider -integrals along the path of a -


Gaussian process.

2.19 Proposition. Consider a -Gaussian process .X t / t2T with covariance kernel


K., /, defined on some .0 , A0 , P 0 /. Assume that T is compact in Rk and that K., /
is continuous on T T . Then, modifying paths of X on some P 0 -null set N 0 2 A0 if
necessary, X is a process with paths in L2 .T , T , /, and we have
Z
g.t / X t .dt / N .0,  2 /
TZ Z
where  2 :D g.t1 / K.t1 , t2 / g.t2 / .dt1 /.dt2 /
T T

for functions g 2 L2 .T , T , /.

Proof. For X -Gaussian, fix an exceptional -null set N 2 T such that whenever
t1 , : : : , tr do not belong to N , finite dimensional laws L .X t1 , : : : , X tr j P 0 / are normal
laws with covariance matrix .K.ti , tj //i,j D1,:::,r .
(1) X on .0 , A0 , P 0 / being real valued and measurable, the set
² Z ³
0 0
N :D ! 2  : X .t , !/ .dt / D C1 2 A0
2
T
74 Chapter 2 Minimum Distance Estimators

has P -measure zero since under our assumptions


Z Z
X 2 d.˝P 0 / D EP 0 .X t2 / .dt /  Œsup K.t , t / .T / < 1 .
T 0  T t2T

Redefining X.t , !/ D 0 for all t 2 T if ! 2 N 0 , all paths of X are in L2 .T , T , /.


l l C1
(2) Cover Rk with half-open intervals Al .m/ :D XjkD1 Œ 2jm , j2m /, l D .`1 , : : : , lk /
2 Zk , as in step (2) of the proof of Theorem 2.18. Differently from the proof of The-
orem 2.18, define ƒ.m, T / as the set of all l such that .Al .m/ \ T / > 0. For
l 2 ƒ.m, T /, select tl .m/ in Al .m/ \ T such that tl .m/ does not belong to the
exceptional set N  T , and define
X
X m .t , !/ :D 1Al .m/\T .t / X.tl .m/, !/ , t 2 T , ! 2  , m  1 .
l2ƒ.m,T /

Then all finite dimensional distributions of X m , m  1, are normal distributions;


ƒ.m, T / being finite since T is compact, all X m have paths in L2 .T , T , /. For func-
tions g 2 L2 .T , T , /, -integrals
Z X  
g.t / X tm .dt / D X.tl .m//  g 1Al .m/\T
T
l2ƒ.m,T /

are random variables on .0 , A0 , P 0 / following normal laws


 
N 0 , m2
where
X      
2
m :D  g 1Al .m/\T K tl .m/, tl0 .m/  g 1Al 0 .m/\T .
l,l 0 2ƒ.m,T /

Under our assumptions, jgj.t /.dt / is a finite measure on .T , T /, and the kernel
K., / is continuous and bounded on T T . Thus by dominated convergence
Z
m2
! g.t1 / K.t1 , t2 / g.t2 / .˝/.dt1 , dt2 / D  2 , m ! 1
T T
which yields
Z
L  
.C/ g.t / X tm .dt / ! N 0 ,  2 as m!1.
T

(3) As in the proof of Theorem 2.18 we have for points t1 , : : : , tr in T n N


  L  
L .X tm1 , : : : , X tmr / j P 0 ! L .X t1 , : : : , X tr / j P 0
(weak convergence in Rr , as m ! 1)
where all laws on the right-hand side are normal laws, and for points t in T n N
sup E.jX tm j2 /  sup
0
K.t 0 , t 0 / < 1 and lim E.jX tm j2 / D E.jX t j2 / .
m t 2T m!1
Section 2.4 Asymptotic Normality for Minimum Distance Estimator Sequences 75

Thus the assumptions of Theorem 2.4(a) are satisfied (here we use the sufficient con-
dition (2.4”) with p D 2 and constant f to establish uniform integrability). Applying
Theorem 2.4(a) we obtain
L
Xm ! X (weak convergence in L2 .T , T , / as m ! 1) ,

from which the continuous mapping theorem (see Exercise 2.3’) gives weak conver-
gence of integrals
Z Z
m L
g.t / X t .dt / ! g.t / X t .dt /
.CC/ T T
(weak convergence in R as m ! 1) .

Comparing (++) to (+), the assertion is proved. 

2.4 Asymptotic Normality for Minimum Distance


Estimator Sequences
In this section, we continue the approach of Section 2.2: based on the representation of
Theorem 2.14 of rescaled MD estimation errors, the results of Sections 2.3 and 2.1 –
with H D L2 .T , T , / as Hilbert space – will allow to prove asymptotic normality.
Our set of Assumptions 2.20 provides additional structure for the empirical quanti-
b n from which MD estimators were defined, merging the Assumptions 2.1 of
ties ‰
Section 2.1 with the Assumptions 2.8(I) of Section 2.2. We complete the list of As-
sumptions 2.8(III) by introducing an asymptotic normality condition AN(#). The main
result of this subsection is Theorem 2.22.

2.20 Assumptions and Notations for Section 2.4. (a) T is compact in Rk , T D


B.T /, and
H :D L2 .T , T , /
for some finite measure  on .T , T /. For ‚  Rd open, ¹P# : # 2 ‚º is a family of
probability measures on some ., A/; we have an increasing family of sub- -fields
.Fn /n in A, and Pn,# :D P# jFn is the restriction of P# to Fn . We consider the
sequence of experiments

En :D ., Fn , ¹Pn,# : # 2 ‚º/ , n1,

b n of Fn -measurable H -valued random variables


and have a sequence ‰

bn :
‰ ., Fn / ! .H , B.H // , n1
76 Chapter 2 Minimum Distance Estimators

together with a deterministic family ¹‰ : 2 ‚º in H such that


.C/ the mapping ‚ 3 ! ‰ 2 H is continuous.

Specialising to compacts in Rk , this unites the Assumptions 2.1(a) and 2.8(I).


(b) We now impose additional spatial structure on ‰b n , n  1: there is a measurable
process
X n : .T , T ˝Fn / ! .R, B.R// with paths in H D L2 .T , T , /
b n in (a) arises as path
(cf. Lemma 2.2) such that ‰
b n .!/ :D Xn .!/ D X n ., !/ ,
‰ !2
of X n , at every stage n  1 of the asymptotics.

The spatial structure assumed in (b) was already present in Example 2.7. We com-
plete the list of Assumptions 2.8(III) by strengthening the tightness condition T(#)
in 2.8(III):

2.21 Asymptotic Normality Condition AN(# ). Following Notations and Assump-


tions 2.20, there is a sequence of norming constants 'n D 'n .#/ " 1 such that
processes  
W n :D 'n X n  ‰# , n  1 , under P#
have the following properties: there is a kernel
K., / : T T ! R symmetric, continuous and non-negative definite,
an exceptional set N 2 T of -measure zero such that for arbitrary points t1 , : : : , tl
in T n N , l  1 ,
  L  
L .W tn1 , : : : , W tnl / j P# ! N 0 , .K.ti , tj //1i,j l
(weak convergence in Rl , n ! 1) ,
and some function f 2 L1 .T , T , / such that for -almost all t in T
E. jW tn j2 j P# /  f .t / < 1 , n  1, lim E. jW tn j2 j P# / D K.t , t / .
n!1

Now we state the main theorem on minimum distance estimators. It relies directly
on Proposition 2.11 and on the representation of rescaled estimation errors in Theo-
rem 2.14.

2.22 Theorem. Under Assumptions 2.20, let for every # 2 ‚ the set of conditions
SLLN(#), I(#), D(#), AN(#), with 'n D 'n .#/ " 1
Section 2.4 Asymptotic Normality for Minimum Distance Estimator Sequences 77

hold. Then any minimum distance estimator sequence .#n /n for the unknown param-
eter # 2 ‚ defined according to Definition 2.9 is (strongly consistent and) asymptot-
ically normal. We have for every # 2 ‚
   
L 'n .#n  #/ j P# ! N 0 , ƒ1 # „# ƒ#
1

(weak convergence in Rd , as n ! 1), with a d d matrix „# having entries


Z Z
.„# /i,j D Di ‰# .t1 / K.t1 , t2 / Dj ‰# .t2 / .dt1 /.dt2 /
.C/ T T
1  i, j  d ,
and with ƒ# as defined in Theorem 2.14:
˝ ˛
.ƒ# /i,j D Di ‰# , Dj ‰# , 1  i, j  d .

Proof. Fix # 2 ‚. T being compact in Rk , for any finite measure  on .T , T / and


for any covariance kernel K., / on T T which is symmetric, continuous and non-
negative definite, there exists a real valued measurable process W which is -Gaussian
with covariance kernel K., /, by Theorem 2.18. Depending on #, W D W .#/ is
defined on some space .0 , A0 , P 0 /. As in the proof of Proposition 2.19, W has paths
in L2 .T , T , / (after modification of paths on some P 0 -null set in A0 ).
(1) For ‰b n D Xn as in Assumption 2.20(b), and W n D 'n .X n  ‰# / under P#
as in Condition 2.21, we have an exceptional set N 2 T of -measure zero such that
for arbitrary points t1 , : : : , tl in T n N , l  1 ,
  L  
L .W tn1 , : : : , W tnl / j P# ! N 0 , .K.ti , tj //1i,j l
(weak convergence in Rl , n ! 1)
by assumption AN(#). Defining N 0 2 T as the union of N with the exceptional -null
set which is contained in the definition of W as a -Gaussian process (cf. Definition
2.16), we have for t1 , : : : , tl in T n N 0
  L  
L .W tn1 , : : : , W tnl / j P# ! L .W t1 , : : : , W tl / j P#
(weak convergence in Rl , n ! 1).
Thus we have checked assumption (a.i) of Theorem 2.4. Next, again by assumption
AN(#), there is some function f 2 L1 .T , T , / such that for -almost all t 2 T
E. jW tn j2 j P# /  f .t / < 1 , n  1, lim E. jW tn j2 j P# / D K.t , t / .
n!1
This is assumption (a.ii) of Theorem 2.4, via condition (2.4”); hence Theorem 2.4
establishes
L
Wn ! W (weak convergence in L2 .T , T , /, as n ! 1) .
78 Chapter 2 Minimum Distance Estimators

Write h., .i for the scalar product in L2 .T , T , /. Then the continuous mapping theo-
rem gives
L
.˘/ h g, Wn i ! h g, W i (weakly in R, as n ! 1)

for any g 2 L2 .T , T , / where according to Proposition 2.19


 Z Z 
 0

.˘˘/ L hg, W i j P D N 0, g.t1 / K.t1 , t2 / g.t2 / .dt1 /.dt2 / .
T T

(2) By assumption D(#), the components D1 ‰# , : : : , Dd ‰# of the derivative D‰#


are elements of H D L2 .T , T , /. If we apply .˘/+.˘˘/ to g :D h> D‰# , h 2 Rd
arbitrary, Cramér–Wold yields
00 1 1 00 1 1
hD1 ‰# , Wn i hD1 ‰# , W i
L @@ ::: A j P# A ! L @@ ::: A j P 0 A D N . 0 , „# /
hDd ‰# , W i
n hDd ‰# , W i

(weak convergence in Rd , as n ! 1) since the covariance matrix „# in (+) has


entries
Z Z
 
Di ‰# .s/ K.s, t / Dj ‰# .t / .ds/.dt / D E hDi ‰# , W i , hDj ‰# , W i ,
T T
i , j D 1, : : : , d .

To conclude the proof, it is sufficient to combine the last convergence with the repre-
sentation of Theorem 2.14
 
'n .#n  #/ D …# Wn C oP# .1/ , n ! 1

of rescaled MD estimation errors. To apply Theorem 2.14, note that assumption


AN(#) – through step (1) above and the continuous mapping theorem – establishes
L
weak convergence kWn k ! kW k in R as n ! 1, and thus in particular tightness
as required in condition T(#). 

In Example 2.7, we started to consider MD estimator sequences based on the em-


pirical distribution function for i.i.d. observations. In Lemma 2.23 and Example 2.24
below, we will put this example in rigorous terms.

2.23 Lemma. Let Y1 , Y2 , : : : denote i.i.d. random variables taking values in Rk , with
continuous distribution function F : Rk ! Œ0, 1. Let F b n : .t , !/ ! F b n .t , !/
denote the empirical distribution function based on the first n observations. Consider
T compact in Rk , with Borel- -field T , and  a finite measure on .T , T /. Write
p  
W n :D n F bn  F , n  1
Section 2.4 Asymptotic Normality for Minimum Distance Estimator Sequences 79

and consider the kernel


K.s, t / D F .s ^ t /  F .s/F .t / , s, t 2 Rk
with minimum taken componentwise in Rk : s ^ t D .si ^ ti /1ik . Then all require-
ments of the asymptotic normality condition AN(#) are satisfied, there is a measurable
process W which is -Gaussian with covariance kernel K., /, and we have
Wn ! W (weak convergence in L2 .T , T , /, as n ! 1) .
In the case where d D 1, W is Brownian bridge B 0,F time-changed by F .

Proof. (1) For t , t 0 2 T , with minimum t ^ t 0 defined componentwise in Rk , we have


 
K.t , t 0 / D F .t ^ t 0 /  F .t /F .t 0 / D EF 1.1,t .Y1 /  F .t / 1.1,t 0  .Y1 /  F .t 0 /

with intervals in Rk written according to Assumption 2.1(c), and


X l Xl 
˛i K.ti , ti / ˛i D VarF
0 0 ˛i 1.1,ti  .Y1 /  F .ti / 0
i,i 0 D1 iD1

for arbitrary l  1, ˛1 , : : : , ˛l in R, t1 , : : : , tl in T . The distribution function F being


continuous, the kernel K., / : T T ! Œ0, 1 is thus symmetric, continuous and non-
negative definite. By Theorem 2.18, a -Gaussian process W with covariance kernel
K., / exists. p
(2) Let .Yi /i1 be defined on some ., A, P /, and put W n .t , !/ :D n.F b n .t , !/
F .t //. We prove that for arbitrary t1 , : : : , tl in T , l  1, convergence of finite dimen-
sional distributions
   
.C/ L .W tn1 , : : : , W tnl / j P ! N .0, †/ , † :D K.ti , tj / i,j D1,:::,l

(weakly in Rl as n ! 1) holds. Using Cramér–Wold we have to show weak conver-


gence in R
Xl 
L ˛i W ti j P ! N . 0 , ˛ > † ˛ / , n ! 1
n

iD1

for arbitrary ˛ D .˛1 , : : : , ˛l / 2 Rl . By definition of W n and by the central limit


theorem we have
X
l
1 X
n
˛i W tni D p Rj ! N . 0 , Var.R1 / /
iD1
n j D1

with centred i.i.d. random variables


X
l
Rj :D ˛i 1.1,ti  .Yj /  F .ti / , j 2N.
iD1
80 Chapter 2 Minimum Distance Estimators

By step (1) we have Var.R1 / D ˛ > † ˛ : this proves .C/.


(3) A simple particular case of the above, with f .t / :D K.t , t / D F .t /.1  F .t //,
is    
E jW tn j2 D f .t / for all t , n, and E jW t j2 D f .t / for all t ;
f being continuous and T compact, we have f 2 L1 .T , T , / . Hence all require-
ments of assumption AN(#) are satisfied. Thus Theorem 2.4 applies –using condition
(2.4”) as a sufficient condition for condition (a.ii) in Theorem 2.4 – and gives weak
convergence of paths Wn ! W in L2 .T , T , /. 

MD estimator sequences based on the empirical distribution function behave as fol-


lows.

2.24 Example (Example 2.7 continued). For ‚  Rd open, consider .Yi /i1 i.i.d.
observations in Rk , with continuous distribution function F# : Rk ! Œ0, 1 under
# 2 ‚. Fix T compact in Rk , T D B.T /, some finite measure  on .T , T /, and
assume the following: for all # 2 ‚,
the parameterisation ‚ 3 # ! ‰# :D F# 2 H D L2 .T , T , /
.˘/
satisfies I(#) and D(#) .

Then MD estimator sequences based on the empirical distribution function .t , !/ !


b n .t , !/
F

.C/ b n  F ./ kL2 ./ ,


#n D arginf k F n1
2‚

(according to Definition 2.9; a particular construction may be the construction of Prop-


osition 2.10) are strongly consistent and asymptotically normal for all # 2 ‚:
p   
L n .#n  #/ j P# ! N 0 , ƒ1 # „# ƒ#
1
(weakly in Rd , as n ! 1)

where „# and ƒ# take the form


Z Z
.ƒ# /i,j D Di F# .t1 / Dj F# .t2 / .dt1 /.dt2 / , 1  i , j  d ,
Z Z T T
.„# /i,j D Di F# .t1 / Œ F# .t1 ^ t2 /  F# .t1 /F# .t2 /  Dj F# .t2 / .dt1 /.dt2 / ,
T T
1  i, j  d

with ‘min’ taken componentwise in Rk . For the proof of these statements, recall from
Example 2.7 that condition SLLN(#) holds as a consequence of Glivenko–Cantelli,
cf. ./ in Example 2.7. I(#) holds by .˘/ assumed above, hence Proposition 2.11
gives strong consistency of any version of the MD estimator sequence (+). Next,
Lemma 2.23 establishes condition AN(#) with covariance kernel K.t1 , t2 / D F# .t1 ^
Section 2.4 Asymptotic Normality for Minimum Distance Estimator Sequences 81

t2 /  F# .t1 /F# .t2 /. Since D(#) holds by assumption .˘/, Theorem 2.22 applies and
yields the assertion. 

2.24’ Exercise. In dimension d D 1, fix a distribution function F on .R, B.R// which admits
a continuous and strictly positive Lebesgue density f on R, and consider as a particular case
of Example 2.24 a location model

.R, B.R/, ¹F# : # 2 ‚º/ , F# :D F .  #/ , f# :D f .  #/

where ‚ is open in R. Check that for any choice of a finite measure  on .T , T /, T compact
in R, assumptions I(#) and D(#) are satisfied (with DF# D f# ) for all # 2 ‚. 

MD estimators in i.i.d. models may be defined in many different ways, e.g. based on
empirical Laplace transforms when tractable expressions for the Laplace transforms
under # 2 ‚ are at hand, from empirical quantile functions, from empirical charac-
teristic functions, and so on. The next example is from Höpfner and Rüschendorf [58].

2.25 Example. Consider real valued i.i.d. symmetric stable random variables .Yj /j 1
with characteristic function
 
u ! E.˛,/ e i uY1 D e  juj , u 2 R
˛

and estimate both stability index ˛ 2 .0, 2/ and weight parameter 2 .0, 1/ from the
first n observations Y1 , : : : , Yn .
Put # D .˛, /, ‚ D .0, 2/ .0, 1/. Symmetry of L.Y jP# / – the characteristic
functions being real valued
P – allows to work with the real parts of the empirical char-
acteristic functions n1 jnD1 e i uYj only, so we put

X n
b n .u/ D 1
‰ cos.uYj / , ‰# .u/ :D e  juj ,
˛
# 2‚.
n
j D1

Fix a sufficiently large compact interval T , symmetric around zero and including open
neighbourhoods of the points u D ˙1, T D B.T /, and a finite measure  on .T , T /,
symmetric around zero such that
Z "
.C/ j log uj2 .du/ < 1 for all " > 0 ,
"
and assume in addition:
8
< for some open neighbourhood U of the point u D 1,
.˘/ the density of the -absolutely continuous part of  in restriction to U
:
is strictly positive and bounded away from zero.
Then any MD estimator sequence based on the empirical characteristic function
b n  ‰#
#n :D arginf ‰
#2‚ L2 .T ,T ,/
82 Chapter 2 Minimum Distance Estimators

according to Definition 2.9 (a particular construction may be the construction in Propo-


sition 2.10) is strongly consistent and asymptotically normal for all # 2 ‚:
p   
L n .#n  #/ j P# ! N 0 , ƒ1 # „ # ƒ#
1
(weakly in R2 , as n ! 1)

where „# and ƒ# take the form


Z Z
.ƒ# /i,j D Di ‰# .u/ Dj ‰# .v/ .du/.dv/ ,
T T
Z Z 
‰# .u C v/ C ‰# .u  v/
.„# /i,j D Di ‰# .u/  ‰# .u/‰# .v/
T T 2
Dj ‰# .v/ .du/.dv/
for 1  i , j  2. The proof for these statements is in several steps.
(1) We show that .˘/ establishes the identifiability condition I(#) for all # 2 ‚.
Write CN for the compact Œ0, 2 Œ N1 , N  in R2 . Fix # 2 ‚. Fix some pair ."0 , N0 /
such that the ball B"0 .#/ is contained in the interior of CN0 . For N  N0 sufficiently
large, and for parameter values # 0 D .˛ 0 , 0 / for which N1  0  N , we introduce
linearisations at u D 1
e # 0 .t / :D e  0 Œ1  ˛ 0 0 .t  1/
‰ on e :D Br .1/
U
1
of the characteristic functions ‰# 0 ./ , where 0 < r < 4N is a radius sufficiently small
such that .˘/ holds on U e D Br .1/. For fixed, the family ‰
0 e .˛0 , 0 / can be extended
to include limits ˛ 0 D 0 or ˛ 0 D 2. We put e  :D je
U . Then, CN being compact and
the mapping
CN 3 # 0 ! ‰ e# 0 2 L2 .U
e, e/
continuous, we have for every 0 < " < "0
° ±
min ‰ e# 0  ‰e# 2 : # 0
2 C , j# 0
 #j  " >0.
L .e
U ,e
/ N

This last inequality is identically rewritten in the form


° 1 ±
inf ‰ e# 2
e# 0  ‰ : # 0 D .˛ 0 , 0 / 2 ‚ ,  0
 N , j# 0  #j  " > 0 .
L .e
U ,e
/ N
e # is uniformly on U
Since ‰ e separated from all functions ‰ e# 0 with 0 < 1 or 0 >N
N
if N is large, the last inequality can be extended to
° ±
inf ‰ e# 0  ‰ e# 2 : # 0
D .˛ 0 0
, / 2 ‚ , j# 0
 #j  " > 0.
L .e
U ,e
/

Now we repeat exactly the same reasoning with the characteristic functions ‰# 0 , ‰#
e , instead of their linearisations ‰
restricted to U e# 0 , ‰e# at u D 1 which we considered
so far, and obtain
° ±
0 0
inf k‰# 0  ‰# kL2 .e U ,e
/
: # 2 ‚ , j#  #j  " > 0.
Section 2.4 Asymptotic Normality for Minimum Distance Estimator Sequences 83

Since there is some constant c > 0 such that .dt /  .je U /.dt /  c e
.dt /, by
assumption .˘/, we may pass from U e to T in the last assertion and obtain a fortiori
® ¯
inf k‰# 0  ‰# kL2 .T ,T ,/ : # 0 2 ‚ , j# 0  #j  " > 0 .
Since " > 0 was arbitrary, this is the identifiability condition I(#).
(2) Following Prakasa Rao [107, Proposition 8.3.1], condition SLLN(#) holds for
all # 2 ‚:
We start again from the distribution functions F# associated to ‰# , and Glivenko–
Cantelli. Fix # 2 ‚ and write A# 2 A for some set of full P# -measure such
that sup t2R jFb n .t , !/  F# .t /j ! 0 as n ! 1 when ! 2 A# . In particular
b
F n .!, t / ! F# .t / at continuity points t of F# , hence empirical measures asso-
ciated to .Y1 , : : : , Yn /.!/ converge weakly to F# , for ! 2 A# . This gives pointwise
convergence of the associated characteristic functions ‰ bn .t , !/ ! ‰# .t / for t 2 R
when ! 2 A# . By dominated convergence on T with respect to the finite measure 
this establishes SLLN(#).
(3) Condition (+) is a sufficient condition for differentiability D(#) at all points
# 2 ‚:
By dominated convergence under (+) assumed above
Z "
j log uj2 .du/ < 1 for " > 0 arbitrarily small
"

the mapping ‚ 3 ! ‰# 2 L2 .T , T , / is differentiable at #, and the derivative


D‰# has components
D ‰# .u/ D juj˛ ‰# .u/ , D˛ ‰# .u/ D .log juj/ juj˛ ‰# .u/ , u2R
which are linearly independent in L2 .T , T , /.
(4) For all # 2 ‚, the asymptotic normality condition AN(#) holds with covariance
kernel
1
K# .u, v/ :D . ‰# .u C v/ C ‰# .u  v/ /  ‰# .u/‰# .v/ , u, v 2 R:
2
Fix # 2 ‚. Proceeding as in the proof of Lemma 2.23, for any collection of points
u1 , : : : , ul 2 T , we replace the random variable Rj defined there by

X
l
Rj :D ˛i cos.ui Yj /  ‰# .ui / .
iD1

Calculating the variance of R1 under #, we arrive at a kernel


.u, v/ ! E# .Œcos.uY1 /  ‰# .u/ Œcos.vY1 /  ‰# .v// .
As a consequence of cos.x/ cos.y/ D 1
2 .cos.x C y/ C cos.x  y//, this is the kernel
K., / defined above.
84 Chapter 2 Minimum Distance Estimators

(5) Now we conclude the proof: we have conditions SLLN(#)+I(#)+D(#)+AN(#)


by steps (1)–(4) above, so Theorem 2.22 applies and yields the assertion. 

2.25’ Exercise. For the family P D ¹ .a, p/ : a > 0 , p > 0º of Gamma laws on .R, B.R//
with densities
p a a1 px
fa,p .x/ D 1.0,1/ .x/ x e ,
.a/
for i.i.d. observations Y1 , Y2 , : : : from P , construct MD estimators for .a, p/ based on empirical
Laplace transforms
1 X   Yi
n
Œ0, 1/ 3  ! e 2 Œ0, 1
n
i D1

and discuss their asymptotics (hint: to satisfy the identifiability condition, work with measures
 on compacts T D Œ0, C  which satisfy .dy/  c dy in restriction to small neighbourhoods
Œ0, "/ of 0C ). 

We conclude this chapter with some remarks. Many classical examples for MD
estimator sequences are given in Millar [100, Chap. XIII]. Also, a large number of
extensions of the method exposed here are possible. Kutoyants [79] considers MD
estimator sequences for i.i.d. realisations of the same point process on a fixed spatial
window, in the case where the spatial intensity is parameterised by # 2 ‚. Beyond
the world of i.i.d. observations, MD estimator sequences can be considered in ergodic
Markov processes when the time of observation tends to infinity, see a large number
of diffusion process examples in Kutoyants [80], or in many other stochastic process
models.
Chapter 3

Contiguity

Topics for Chapter 3:


3.1 Le Cam’s First and Third Lemma
Notations: likelihood ratios in sequences of binary experiments 3.1
d
Some conventions for sequences of R -valued random variables 3.1’
Definition of contiguity 3.2
Contiguity and R-tightness of likelihood ratios 3.3–3.4
Le Cam’s first lemma: statement and interpretation 3.5–3.5’
Le Cam’s third lemma: statement and interpretation 3.6–3.6’
Example: mean shift when limit laws are normal 3.6”
3.2 Proofs for Section 3.1 and some Variants
An "-ı-characterisation of contiguity 3.7
Proving Proposition 3.3’ 3.7’–3.8
Proof of Theorem 3.4(a) 3.9
One-sided contiguity and Le Cam’s first lemma 3.10–3.12
Proof of Theorem 3.4(b) 3.13
Proving LeCam’s first lemma 3.14–3.15
One-sided contiguity and Le Cam’s third lemma 3.16
Proving LeCam’s third lemma 3.17
Proof of Proposition 3.6” 3.18
Exercises: 3.5”, 3.5”’, 3.5””, 3.10’, 3.18’

This chapter discusses the notion of contiguity which goes back to Le Cam, see Hájek
and Sidák [41], Le Cam [81], Roussas [113], Strasser [121], Liese and Vajda [87],
Le Cam and Yang [84], van der Vaart [126], and is of crucial importance in the con-
text of convergence of local models. Mutual contiguity considers sequences of like-
lihood ratios whose accumulation points in the sense of weak convergence have the
interpretation of a likelihood ratio between two equivalent probability laws. Section
3.1 fixes setting and notations, and states two key results on contiguity, termed ‘Le
Cam’s first lemma’ and ‘Le Cam’s third lemma’ (3.5 and 3.6 below) since Hájek and
Sidák [41, Chap. 6.1]. Section 3.2 contains the proofs together with some variants of
the main results.
86 Chapter 3 Contiguity

3.1 Le Cam’s First and Third Lemma


For two  -finite measures P and Q on the same ., A/, P is absolutely continuous
with respect to Q (notation P << Q) if Q.A/ D 0 implies P .A/ D 0 for arbitrary
A 2 A. By the Radon–Nikodym theorem, P << Q is equivalent to the following: R
there is some A-measurable mapping f :  ! Œ0, 1/ such that P .A/ D A f dQ
for all A 2 A; in this case, f is called (a version of) the density of P with respect to
Q. P is equivalent to Q (notation P Q) if P << Q and Q << P . P and Q are
singular (notation P ?Q) if there is some set A 2 A which is of measure zero under
P and of full measure under Q. A Lebesgue decomposition of P with respect to Q is
any pair .f , N /, N 2 A and f :  ! Œ0, 1/ A-measurable, such that the following
holds:
Z
Q .N / D 0 and P .A/ D P . A \ N / C f dQ for all A 2 A .
A
See e.g. [121, Chap. I] or [127, Satz 1.103].
Throughout this chapter we will consider sequences of probability spaces carrying
two probability measures (‘binary experiments’), and use the following notations.

3.1 Notations. .n , An / denotes a sequence of measurable spaces, n  1, every


.n , An / equipped with a pair Pn , Qn of probability measures.
(i) Any An -measurable mapping Ln : n ! Œ0, 1 such that the pair .Ln 1¹Ln <1º ,
¹Ln D 1º/ is a Lebesgue decomposition of Pn with respect to Qn :
Z
Qn .¹Ln D 1º/ D 0 , Pn .A/ D Pn . A \ ¹Ln D 1º / C Ln dQn , A 2 An
A

is a version of the likelihood ratio of Pn with respect to Qn . Two different versions of


the likelihood ratio of Pn with respect to Qn coincide .Pn C Qn /-almost surely; see
e.g. [127, Satz 1.110]. The measure
R Pn .  \ ¹Ln D 1º / is called the Qn -singular part
of Pn , and the measure A ! A Ln dQn the Qn -absolutely continuous part of Pn .
(ii) For any  -finite measure n on .n , An / which dominates Pn and Qn (take
e.g. n D Pn C Qn ), for versions pn of the n -density of Pn and qn of the n -density
of Qn ,
pn
Ln :D 1¹qn >0º C 1 1¹qn D0º
qn
is (a version of) the likelihood ratio of Pn with respect to Qn .
(iii) For Ln as in (i), ƒn :D log Ln (with conventions log.C1/ D C1 and
log.0/ D 1 ) mapping n to R :D Œ1, C1 is (a version of) the log-likelihood
ratio of Pn with respect to Qn .
(iv) For any version Ln of the likelihood ratio of Pn with respect to Qn , L1n (with
conventions 10 D 1 and 1 1
D 0) is a version of the likelihood ratio of Qn with respect
to Pn .
Section 3.1 Le Cam’s First and Third Lemma 87

d
3.1’ Notations. For R :D Œ1, C1, consider R -valued random variables Xn
living on some .n , An , Qn /, n  1, and associate to Xn the Rd -valued random
b n :D Xn 1¹jX j<1º .
variable X n
(i) We call the sequence .Xn /n Rd -tight under .Qn /n if
°   ±
./ lim Qn .Xn ¤ X b n / D 0 and L X b n j Qn : n  1 is tight in Rd .
n!1

(ii) For probability measures F on .Rd , B.Rd //, we say that .Xn /n under .Qn /n
converges Rd -weakly to F – and write

L .Xn j Qn / ! F (weakly in Rd as n ! 1)

for short – if .Xn /n under .Qn /n is Rd -tight and if the second condition in ./ can
be strengthened to
 
./ L X b n j Qn ! F (weak convergence in Rd as n ! 1) .

(iii) We call the sequence .Xn /n uniformly integrable under .Qn /n if


Z
b
Qn .Xn ¤ X n / D 0 for all n  1, and lim sup b n j dQn D 0 .
jX
K " 1 n1 ¹jb X n j>Kº

3.2 Definition. The sequence .Pn /n of probability measures is contiguous to .Qn /n


(notation .Pn /n C .Qn /n ) if for arbitrary sequences of events An 2 An we have as
n!1
Qn .An / ! 0 H) Pn .An / ! 0 ;
mutual contiguity (notation .Pn /n CB .Qn /n ) means that both .Pn /n C .Qn /n and
.Qn /n C .Pn /n hold true.

Hence contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ is an asymptotic analogue of the classical absolute continuity $P \ll Q$ for probability measures $P$, $Q$ on a fixed space $(\Omega, \mathcal A)$. We start with a simple observation.
3.3 Proposition. Without further conditions, $(L_n)_n$ under $(Q_n)_n$ is $\mathbb R$-tight.

Proof. For every $n \ge 1$, the event $\{L_n = \infty\}$ has $Q_n$-measure zero by Notation 3.1(i); we have to show

    for every $\varepsilon > 0$ there is some $K < \infty$ such that $\sup_{n \ge 1} Q_n(\infty > L_n > K) < \varepsilon$

by definition in Notation 3.1'(i). Choosing $\mu_n := P_n + Q_n$ as the dominating measure for $P_n$ and $Q_n$, we have versions of the densities $p_n$ and $q_n$ such that $p_n + q_n \equiv 1$ on $\Omega_n$. Since $L_n$ is $(P_n + Q_n)$-almost surely uniquely determined, we can write

    $Q_n(\infty > L_n > K) = Q_n\left( q_n > 0,\ \frac{1 - q_n}{q_n} > K \right) = Q_n\left( 0 < q_n < \frac{1}{K+1} \right)$
    $= \int_{\Omega_n} 1_{\{q_n < \frac{1}{K+1}\}}\, q_n \, d\mu_n \ \le\ \frac{1}{K+1}\, \mu_n(\Omega_n) = \frac{2}{K+1}$

where the right-hand side does not depend on $n \ge 1$. □

The following results – Proposition 3.3', Theorem 3.4, Lemmas 3.5 and 3.6, and Proposition 3.6'' – will be stated in the present section, together with some comments; all proofs – evolving through a series of variants and alternative formulations – will be collected in Section 3.2. The main results are Lemmas 3.5 and 3.6.

3.3' Proposition. For random variables $Y_n$ on $(\Omega_n, \mathcal A_n)$ taking values in $(\mathbb R^d, \mathcal B(\mathbb R^d))$, the assertion

    $\{\mathcal L(Y_n \mid Q_n) : n \ge 1\}$ is tight, and $(P_n)_n \vartriangleleft (Q_n)_n$

implies

    $\{\mathcal L(Y_n \mid P_n) : n \ge 1\}$ is tight.

3.4 Theorem. (a) The following assertions (i) and (ii) are equivalent:

    (i) $(P_n)_n \vartriangleleft (Q_n)_n$;
    (ii) the sequence $(L_n)_n$ under $(P_n)_n$ is $\mathbb R$-tight.

(b) The following assertions (i) and (ii) are equivalent:

    (i) $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$;
    (ii) the sequence $(\Lambda_n)_n$ is $\mathbb R$-tight under both $(P_n)_n$ and $(Q_n)_n$.

Out of a tight sequence of probability laws on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ we can select weakly convergent subsequences. Hence, considering $\mathbb R^d$-tight sequences of $\bar{\mathbb R}^d$-valued random variables with the conventions of Notation 3.1'(i), there is no loss of generality – switching to subsequences if necessary – in supposing $\mathbb R^d$-weak convergence in the sense of Notation 3.1'(ii).

3.5 Le Cam's First Lemma. Assume that there is a probability law $F$ on $(\mathbb R, \mathcal B(\mathbb R))$ such that

    $\mathcal L(\Lambda_n \mid Q_n) \to F$  (weakly in $\mathbb R$ as $n \to \infty$).

Then the following holds:

    $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n \iff \int_{\mathbb R} e^{\lambda}\, F(d\lambda) = 1$.

3.5' Remark. To any probability measure $F$ on $(\mathbb R, \mathcal B(\mathbb R))$ which may occur in Lemma 3.5, we can associate a real-valued random variable $\Lambda$ on some probability space $(\Omega, \mathcal A, Q)$ with distribution $F$; the easiest choice is $\Lambda := \mathrm{id}$ on $(\Omega, \mathcal A, Q) := (\mathbb R, \mathcal B(\mathbb R), F)$. Then the equality $\int_{\mathbb R} e^{\lambda} F(d\lambda) = 1$ in Lemma 3.5 signifies that $(\Omega, \mathcal A)$ carries a second probability measure $P$ defined by

    $dP := e^{\Lambda}\, dQ$

which necessarily – by its definition – is equivalent to $Q$, and gives rise to a binary experiment

    $(+)$  $(\Omega, \mathcal A, \{P, Q\})$ with $P \sim Q$.

We may view (+) as a ‘limit experiment’ as $n \to \infty$ for the sequence of binary experiments

    $(\Omega_n, \mathcal A_n, \{P_n, Q_n\})$, $n \ge 1$.

Thus Le Cam's first lemma relates mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$ to convergence of experiments where weak limits of log-likelihood ratios are log-likelihood ratios between equivalent probability laws. □

3.5'' Exercise. In the case where the weak limit for log-likelihood ratios $\Lambda_n$ under $Q_n$ is a normal law

    $\mathcal L(\Lambda_n \mid Q_n) \to \mathcal N(\mu, \sigma^2)$  (weakly in $\mathbb R$ as $n \to \infty$)

with strictly positive variance $\sigma^2 > 0$, use Lemma 3.5 to prove the equivalence

    $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n \iff \mu = -\frac12 \sigma^2$.
3.5''' Exercise. On $(\Omega_n, \mathcal A_n) = (\mathbb R^n, \mathcal B(\mathbb R^n))$, consider probability laws $Q_n = \bigotimes_{i=1}^n \mathcal N(0,1)$, $P_n = \bigotimes_{i=1}^n \mathcal N(\frac{h}{\sqrt n}, 1)$ for some $h \in \mathbb R$ which we keep fixed as $n \to \infty$. Write $(X_1, \ldots, X_n)$ for the canonical variable on $\Omega_n$. Check that $\Lambda_n$ is given by $\frac{h}{\sqrt n} \sum_{i=1}^n X_i - \frac12 h^2$, and think of the Laplace transform of $\mathcal N(0,1)$ to check mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$ via Le Cam's First Lemma 3.5. Check also that the ‘limit experiment’ in Remark 3.5' can be determined as $(\mathbb R, \mathcal B(\mathbb R))$ equipped with $Q = \mathcal N(0,1)$, $P = \mathcal N(h, 1)$.
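A quick Monte Carlo sanity check of Exercises 3.5'' and 3.5''' (our own sketch, not part of the exercise; sample sizes and the seed are arbitrary) simulates $\Lambda_n$ under $Q_n$ and confirms both the normal limit $\mathcal N(-h^2/2, h^2)$ and the criterion $\int e^{\lambda} F(d\lambda) = 1$ of Lemma 3.5:

```python
# Monte Carlo check: under Q_n = N(0,1)^{⊗n}, Λ_n = (h/√n) Σ X_i − h²/2
# is N(−h²/2, h²), and mutual contiguity corresponds to E_{Q_n}[exp(Λ_n)] = 1.
import numpy as np

rng = np.random.default_rng(0)
n, h, reps = 100, 1.5, 100_000
X = rng.standard_normal((reps, n))
Lam = h * X.sum(axis=1) / np.sqrt(n) - h**2 / 2   # Λ_n under Q_n

print(Lam.mean(), Lam.var())   # ≈ −h²/2 = −1.125 and ≈ h² = 2.25
print(np.exp(Lam).mean())      # ≈ 1, i.e. ∫ e^λ F(dλ) = 1 in Lemma 3.5
```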

3.5'''' Exercise. On $(\Omega_n, \mathcal A_n) = (\mathbb R^n, \mathcal B(\mathbb R^n))$, for some $h > 0$ which we keep fixed as $n \to \infty$, consider probability laws $Q_n = \bigotimes_{i=1}^n R(0, 1)$ and $P_n = \bigotimes_{i=1}^n R(0 + \frac{h}{n}, 1 + \frac{h}{n})$. Write $U_n = (0,1)^n$ for the support of $Q_n$ and $V_n = (\frac{h}{n}, 1 + \frac{h}{n})^n$ for the support of $P_n$. Use sequences $A_n = U_n \setminus V_n$ or $\widetilde A_n = V_n \setminus U_n$ to check directly from Definition 3.2 that neither $(P_n)_n \vartriangleleft (Q_n)_n$ nor $(Q_n)_n \vartriangleleft (P_n)_n$ hold. Show also that instead of a limit law $F$ on $(\mathbb R, \mathcal B(\mathbb R))$ as required in Lemma 3.5 for the sequence of log-likelihood ratios $\mathcal L(\Lambda_n \mid Q_n)$ as $n \to \infty$, we have a limit law on $(\bar{\mathbb R}, \mathcal B(\bar{\mathbb R}))$ which takes the value $0$ with probability $e^{-h}$ and $-\infty$ with probability $1 - e^{-h}$.
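The failure of contiguity here can also be watched numerically; the following sketch (our own addition, with arbitrary $n$, $h$) confirms that under $Q_n$ the log-likelihood ratio equals $0$ with probability $(1 - h/n)^n \to e^{-h}$ and $-\infty$ otherwise, so no limit law on $(\mathbb R, \mathcal B(\mathbb R))$ can exist:

```python
# Under Q_n = Uniform(0,1)^{⊗n}, L_n = 1 on V_n = (h/n, 1+h/n)^n and 0 off V_n,
# so Λ_n ∈ {0, −∞} with Q_n(Λ_n = 0) = (1 − h/n)^n → e^{−h}.
import numpy as np

rng = np.random.default_rng(0)
n, h, reps = 200, 2.0, 50_000
X = rng.uniform(0.0, 1.0, (reps, n))     # observations under Q_n
on_Vn = (X > h / n).all(axis=1)          # event {Λ_n = 0}

print(on_Vn.mean(), (1 - h / n) ** n, np.exp(-h))   # all ≈ e^{−2} ≈ 0.135
```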

The next result is Le Cam's third lemma (the ‘second lemma’ is related to representations of log-likelihood ratios in smoothly parameterised product experiments, see Chapters 4 and 7).

3.6 Le Cam's Third Lemma. Assume

    $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$.

Consider a probability measure $\widetilde F$ on $(\mathbb R^{1+k}, \mathcal B(\mathbb R^{1+k}))$ and a sequence of $\mathbb R^k$-valued random variables $X_n$ on $(\Omega_n, \mathcal A_n)$ such that the following holds:

    $\mathcal L((\Lambda_n, X_n) \mid Q_n) \to \widetilde F$  (weakly in $\mathbb R^{1+k}$ as $n \to \infty$).

Then

    $\widetilde G(d\lambda, dx) := e^{\lambda}\, \widetilde F(d\lambda, dx)$, $\lambda \in \mathbb R$, $x \in \mathbb R^k$,

is a probability measure on $(\mathbb R^{1+k}, \mathcal B(\mathbb R^{1+k}))$, and we have

    $\mathcal L((\Lambda_n, X_n) \mid P_n) \to \widetilde G$  (weakly in $\mathbb R^{1+k}$ as $n \to \infty$).

3.6' Remarks. (a) Under mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$, we extend the interpretation in Remark 3.5' and view the law $\widetilde F$ in Lemma 3.6 as the distribution of a random variable $(\Lambda, X)$ defined on some $(\Omega, \mathcal A, Q)$. Thus, in addition to Remark 3.5', we have an $\mathbb R^k$-valued statistic $X$ in the ‘limit experiment’ $(\Omega, \mathcal A, \{P, Q\})$ where $P \sim Q$ is defined by $dP := e^{\Lambda} dQ$. In the ‘limit experiment’ we have

    $\widetilde F = \mathcal L((\Lambda, X) \mid Q)$,  $\widetilde G = \mathcal L((\Lambda, X) \mid P)$

where the second statement is a consequence of the definition of $P$ and of $\widetilde G$: for suitable $f(\cdot, \cdot)$,

    $E_P(f(\Lambda, X)) = E_Q\left( e^{\Lambda} f(\Lambda, X) \right) = \int e^{\lambda} f(\lambda, x)\, \widetilde F(d\lambda, dx) = \int f(\lambda, x)\, \widetilde G(d\lambda, dx)$.

We view $(X_n)_n$ in Lemma 3.6 as a sequence of statistics in the binary experiments $(\Omega_n, \mathcal A_n, \{P_n, Q_n\})$ which under $(Q_n)_n$ converges weakly jointly with the log-likelihood ratios $(\Lambda_n)_n$:

    $(+)$  $\mathcal L((\Lambda_n, X_n) \mid Q_n) \to \mathcal L((\Lambda, X) \mid Q)$, $n \to \infty$.

Now Le Cam's Third Lemma 3.6 can be rephrased as follows: under mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$, assertion (+) implies

    $(++)$  $\mathcal L((\Lambda_n, X_n) \mid P_n) \to \mathcal L((\Lambda, X) \mid P)$, $n \to \infty$,

with $P \sim Q$ given by $dP := e^{\Lambda} dQ$.

(b) We sketch a typical application of Le Cam's third lemma. Consider statistical models for i.i.d. observations and $\sqrt n$-consistent estimator sequences $(\widetilde\vartheta_n)_n$ for an unknown parameter $\vartheta$.
Let $\Theta \subseteq \mathbb R^d$ be open, write $\mathcal F_n$ for the $\sigma$-field generated by the first $n$ observations, $n \ge 1$. Fix a reference point $\vartheta$ and define the sequence of probability laws $(Q_n)_n$ by $(P_{\vartheta}|\mathcal F_n)_n$, $n \ge 1$. Define the sequence $(P_n)_n$ by $(P_{\vartheta + h/\sqrt n}|\mathcal F_n)_n$, for some $h$ in $\mathbb R^d$ which we keep fixed. Then $\Lambda_n$ in Lemma 3.6 is the log-likelihood ratio of $P_{\vartheta + h/\sqrt n}$ restricted to $\mathcal F_n$ with respect to $P_{\vartheta}$ restricted to $\mathcal F_n$. In this setting, mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$ will hold if the parameterisation is ‘smooth enough’ (we shall consider this in Chapters 4 and 7 below, see in particular Proposition 7.3 and Lemma 4.11). The reference point $\vartheta$ being fixed, we identify the $(X_n)_n$ in Lemma 3.6 with rescaled estimation errors $(\sqrt n(\widetilde\vartheta_n - \vartheta))_n$ at $\vartheta$. If we are able to establish weak convergence of pairs $(\Lambda_n, X_n)$ under $\vartheta$ (which requires some tractable relationship between rescaled estimation error $X_n = \sqrt n(\widetilde\vartheta_n - \vartheta)$ and log-likelihood $\Lambda_n$ under $\vartheta$), then Le Cam's Third Lemma 3.6 allows us to deduce from convergence at the reference point $\vartheta$

    $(+)$  $\mathcal L\left( (\Lambda_n, \sqrt n(\widetilde\vartheta_n - \vartheta)) \mid P_{\vartheta} \right) = \mathcal L((\Lambda_n, X_n) \mid Q_n) \to \widetilde F$

the convergence under parameter values of the form $\vartheta + h/\sqrt n$ which are close to $\vartheta$ (in the sense of neighbourhoods shrinking at rate $1/\sqrt n$ as $n \to \infty$): first, by Lemma 3.6,

    $(++)$  $\mathcal L\left( (\Lambda_n, \sqrt n(\widetilde\vartheta_n - \vartheta)) \mid P_{\vartheta + h/\sqrt n} \right) = \mathcal L((\Lambda_n, X_n) \mid P_n) \to \widetilde G$

where the passage from $\widetilde F$ to $\widetilde G$ is explicit (at least in principle, cf. Lemma 3.6); second, shifting all second components in (++) by $h$ which does not depend on $n$,

    $(+++)$  $\mathcal L\left( (\Lambda_n, \sqrt n(\widetilde\vartheta_n - (\vartheta + h/\sqrt n))) \mid P_{\vartheta + h/\sqrt n} \right) \to \widetilde H$

where $\widetilde H$ is the image of $\widetilde G$ under the mapping $(\lambda, x) \mapsto (\lambda, x - h)$. On the left-hand side of (+++) appear the rescaled estimation errors under $\vartheta + h/\sqrt n$ for fixed $h$ as $n \to \infty$.
In this way, our knowledge (+) on limit distributions of rescaled estimation errors at fixed reference points $\vartheta$ extends to rescaled estimation errors (+++) on small neighbourhoods of $\vartheta$ of radius $O(1/\sqrt n)$. We will exploit this extensively in Chapter 7. □

There is one important special case where the passage from $\widetilde F$ to $\widetilde G$ in Le Cam's Third Lemma 3.6 is particularly simple. Normal limit laws $\widetilde F$ for $\mathcal L((\Lambda_n, X_n) \mid Q_n)$ as $n \to \infty$ in Lemma 3.6 arise in many classical situations. The following proposition shows that in this case $\mathcal L(X_n \mid Q_n)$ and $\mathcal L(X_n \mid P_n)$ differ only by a mean value shift

    $\tau = \lim_{n\to\infty} \mathrm{Cov}_{Q_n}(\Lambda_n, X_n)$.

3.6'' Proposition. When $\widetilde F$ in Lemma 3.6 is some normal law on $(\mathbb R^{1+k}, \mathcal B(\mathbb R^{1+k}))$ whose first component is not degenerate to a single point, then necessarily

    $\widetilde F = \mathcal N\left( \begin{pmatrix} -\frac12\sigma^2 \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \tau^{\top} \\ \tau & \Sigma \end{pmatrix} \right)$

for some $\mu \in \mathbb R^k$, $\tau \in \mathbb R^k$, $\sigma > 0$, and some covariance matrix $\Sigma \in \mathbb R^{k \times k}$, symmetric and non-negative definite, and $\widetilde G$ associated to $\widetilde F$ by Lemma 3.6 takes the form

    $\widetilde G = \mathcal N\left( \begin{pmatrix} +\frac12\sigma^2 \\ \mu + \tau \end{pmatrix}, \begin{pmatrix} \sigma^2 & \tau^{\top} \\ \tau & \Sigma \end{pmatrix} \right)$.
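In the Gaussian setting of Exercise 3.5''', Proposition 3.6'' can be watched at work numerically. The sketch below (our own illustration; $X_n := \sqrt n$ times the sample mean, sample sizes arbitrary) estimates $\tau = \lim_n \mathrm{Cov}_{Q_n}(\Lambda_n, X_n)$ and confirms that under $P_n$ the law of $X_n$ is shifted by exactly $\tau$ while $\Lambda_n$ is shifted by $\sigma^2 = h^2$:

```python
# Numerical sketch of Proposition 3.6'' in the setting of Exercise 3.5''':
# with X_n = √n · (sample mean), Cov_{Q_n}(Λ_n, X_n) = h = τ.
import numpy as np

rng = np.random.default_rng(0)
n, h, reps = 100, 1.0, 100_000

def lam_and_x(theta):                    # (Λ_n, X_n) under N(theta, 1)^{⊗n}
    X = rng.standard_normal((reps, n)) + theta
    Xn = np.sqrt(n) * X.mean(axis=1)
    Lam = h * Xn - h**2 / 2              # Λ_n of P_n w.r.t. Q_n
    return Lam, Xn

LamQ, XQ = lam_and_x(0.0)                # under Q_n
LamP, XP = lam_and_x(h / np.sqrt(n))     # under P_n

print(np.cov(LamQ, XQ)[0, 1])            # τ ≈ h = 1
print(XQ.mean(), XP.mean())              # ≈ 0 and ≈ μ + τ = 1
print(LamQ.mean(), LamP.mean())          # ≈ −h²/2 and ≈ +h²/2
```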

3.2 Proofs for Section 3.1 and some Variants


We start with an $\varepsilon$-$\delta$-characterisation of contiguity $(P_n)_n \vartriangleleft (Q_n)_n$, in analogy to the classical $\varepsilon$-$\delta$-characterisation (e.g. [127, p. 109]) of absolute continuity $P \ll Q$ of probability measures.

3.7 Proposition. The following condition is necessary and sufficient for contiguity $(P_n)_n \vartriangleleft (Q_n)_n$: for every $\varepsilon > 0$ there is some $\delta = \delta(\varepsilon) > 0$ such that

    $(+)$  $\limsup_{n\to\infty} Q_n(A_n) < \delta \implies \limsup_{n\to\infty} P_n(A_n) < \varepsilon$

holds for arbitrary sequences of events $A_n$ in $\mathcal A_n$, $n \ge 1$.

Proof. (1) We prove that condition (+) is sufficient for contiguity $(P_n)_n \vartriangleleft (Q_n)_n$. To $\varepsilon > 0$ arbitrarily small associate $\delta = \delta(\varepsilon) > 0$ such that (+) holds. Fix any sequence of events $A_n$ in $\mathcal A_n$, $n \ge 1$, with the property $\lim_{n\to\infty} Q_n(A_n) = 0$. For this sequence, condition (+) makes sure that we must have $\lim_{n\to\infty} P_n(A_n) = 0$ too: this is contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ as in Definition 3.2.
(2) We prove that (+) is a necessary condition. Let us assume that for some $\varepsilon > 0$, irrespective of the smallness of $\delta > 0$, an implication like (+) never holds true. In this case, considering in particular $\delta = \frac1k$, $k \in \mathbb N$ arbitrarily large, there are sequences of events $(A^k_n)_n$ such that

    $\limsup_{n\to\infty} Q_n(A^k_n) < \frac1k$  holds together with  $\limsup_{n\to\infty} P_n(A^k_n) \ge \varepsilon$.

From this we can select a sequence $(n_k)_k$ increasing to $\infty$ such that for every $k \in \mathbb N$ we have

    $P_{n_k}(A^k_{n_k}) > \frac{\varepsilon}{2}$  together with  $Q_{n_k}(A^k_{n_k}) < \frac{2}{k}$.

Using Definition 3.2 along the subsequence $(n_k)_k$, contiguity $(P_{n_k})_k \vartriangleleft (Q_{n_k})_k$ does not hold. A fortiori, as an easy consequence of Definition 3.2, contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ does not hold. □

We use the $\varepsilon$-$\delta$-characterisation of contiguity to prove Proposition 3.3'.

3.7' Proof of Proposition 3.3'. We have to show that any sequence of $\mathbb R^d$-valued random variables $(Y_n)_n$ which is tight under $(Q_n)_n$ remains tight under $(P_n)_n$ when contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ holds. Let $\varepsilon > 0$ be arbitrarily small. Assuming contiguity $(P_n)_n \vartriangleleft (Q_n)_n$, we make use of Proposition 3.7 and select $\delta = \delta(\varepsilon) > 0$ such that the implication (+) in Proposition 3.7 is valid. From tightness of $(Y_n)_n$ under $(Q_n)_n$, there is some large $K = K(\delta) < \infty$ such that

    $\limsup_{n\to\infty} Q_n(|Y_n| > K) < \delta$,

from which we deduce thanks to (+)

    $\limsup_{n\to\infty} P_n(|Y_n| > K) < \varepsilon$.

Since $\varepsilon > 0$ was arbitrary, Proposition 3.3' is proved. □

We prepare the next steps by some comments on the conventions in Notations 3.1'.

3.8 Remarks. With Notations 3.1', consider on $(\Omega_n, \mathcal A_n, Q_n)$ $\bar{\mathbb R}^d$-valued random variables $X_n$, $n \ge 1$, and their associated $\widehat X_n := X_n 1_{\{|X_n| < \infty\}}$ which are $\mathbb R^d$-valued. The following assertions (a)–(c) can be checked directly from Notations 3.1'.
(a) The sequence $(X_n)_n$ under $(Q_n)_n$ is $\mathbb R^d$-tight if and only if

    $\lim_{K\uparrow\infty} \limsup_{n\to\infty} Q_n(+\infty \ge |X_n| > K) = 0$.

(b) For probability measures $F$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$,

    $\mathcal L(X_n \mid Q_n) \to F$  (weakly in $\mathbb R^d$ as $n \to \infty$)

as defined in Notation 3.1'(ii) is equivalent to the following condition $(\star)$:

    $(\star)$  for all $f \in C_b(\mathbb R^d)$:  $\lim_{n\to\infty} \int_{\Omega_n} f(X_n)\, 1_{\{|X_n| < \infty\}} \, dQ_n = \int_{\mathbb R^d} f \, dF$.

Note that for $f \equiv 1$ we have $\int_{\mathbb R^d} f \, dF = 1$ on the right-hand side of $(\star)$, by assumption $F(\mathbb R^d) = 1$; hence $(\star)$ contains the assertion $\lim_{n\to\infty} Q_n(|X_n| = \infty) = 0$ which was part of the definition in Notation 3.1'(ii).

(c) Uniform integrability of the sequence .Xn /n under .Qn /n as defined in Notation
3.1’(iii) is equivalent to validity of the following assertions (i) and (ii) together:
(i) we have sup EQn .jXn j/ < 1 ;
n1

(ii) for every " > 0 there is some ı D ı."/ > 0 such that
Z
n  1 , An 2 An , Qn .An / < ı H) jXn j dQn < " .
An

According to the definition in Notation 3.1’(iii), this is proved exactly as in the usual
case of real valued random variables living on some fixed probability space. 

By Proposition 3.3 which was proved in Section 3.1, the sequence of likelihood ratios $(L_n)_n$ is $\mathbb R$-tight under $(Q_n)_n$, without further assumptions. Now we can prove part (a) of Theorem 3.4: contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ is equivalent to $\mathbb R$-tightness of $(L_n)_n$ under both $(P_n)_n$ and $(Q_n)_n$.

3.9 Proof of Theorem 3.4(a). (1) To prove (i)⟹(ii) of Theorem 3.4(a), we assume $(P_n)_n \vartriangleleft (Q_n)_n$ and have to verify – according to Remark 3.8(a) – that the following holds true:

    $\lim_{K\uparrow\infty} \limsup_{n\to\infty} P_n(L_n \in [K, \infty]) = 0$.

If this assertion were not true, we could find some $\varepsilon > 0$, a sequence $K_j \uparrow \infty$, and a sequence of natural numbers $n_j$ increasing to $\infty$ such that

    $P_{n_j}\left( L_{n_j} \in [K_j, \infty] \right) > \varepsilon$  for all $j \ge 1$.

By Proposition 3.3 we know that $(L_n)_n$ under $(Q_n)_n$ is $\mathbb R$-tight, thus we know

    $\lim_{j\to\infty} \limsup_{n\to\infty} Q_n\left( L_n \in [K_j, \infty] \right) = 0$.

Combining the last two formulas, there would be a sequence of events $A_m \in \mathcal A_m$,

    $A_m := \{L_{n_j} \in [K_j, \infty]\}$ in the case where $m = n_j$, and $A_m := \emptyset$ else,

with the property

    $\lim_{m\to\infty} Q_m(A_m) = 0$,  $\limsup_{m\to\infty} P_m(A_m) \ge \varepsilon$,

in contradiction to the assumption $(P_n)_n \vartriangleleft (Q_n)_n$. This proves (i)⟹(ii) of Theorem 3.4(a).
(2) To prove (ii)⟹(i) in Theorem 3.4(a), we assume $\mathbb R$-tightness of $(L_n)_n$ under $(P_n)_n$ and have to show that contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ holds true. To prove this, we start from a sequence of events $A_n \in \mathcal A_n$ with the property $\lim_{n\to\infty} Q_n(A_n) = 0$ and write

    $P_n(A_n) = P_n(A_n \cap \{\infty \ge L_n > K\}) + P_n(A_n \cap \{L_n \le K\})$
    $\le P_n(\infty \ge L_n > K) + \int_{A_n \cap \{L_n \le K\}} L_n \, dQ_n$
    $\le P_n(\infty \ge L_n > K) + K\, Q_n(A_n)$.

This gives

    $\limsup_{n\to\infty} P_n(A_n) \le \limsup_{n\to\infty} P_n(\infty \ge L_n > K)$

where by $\mathbb R$-tightness of $(L_n)_n$ under $(P_n)_n$ and by Remark 3.8(a), the right-hand side can be made arbitrarily small by suitable choice of $K$. This gives $\lim_{n\to\infty} P_n(A_n) = 0$ and thus proves (ii)⟹(i) of Theorem 3.4(a). □

Preparing for the proof of Theorem 3.4(b), which will be completed in Proof 3.13 below, we continue with characterisations of contiguity $(P_n)_n \vartriangleleft (Q_n)_n$.

3.10 Proposition. The following assertions are equivalent:

    (i) $(P_n)_n \vartriangleleft (Q_n)_n$;
    (ii) the sequence $(L_n)_n$ under $(Q_n)_n$ is uniformly integrable, and we have $\lim_{n\to\infty} P_n(L_n = \infty) = 0$;
    (iii) the sequence $(L_n)_n$ under $(Q_n)_n$ is uniformly integrable, and we have $\lim_{n\to\infty} E_{Q_n}(L_n) = 1$.

Proof. (1) Assertions (ii) and (iii) are equivalent since

    $P_n(L_n = \infty) + E_{Q_n}(L_n) = 1$, $n \ge 1$,

as a direct consequence of the Lebesgue decomposition in Notation 3.1(i) of $P_n$ with respect to $Q_n$: here $E_{Q_n}(L_n)$ is the total mass of the $Q_n$-absolutely continuous part of $P_n$, and $P_n(L_n = \infty)$ the total mass of the $Q_n$-singular part of $P_n$.
(2) For $0 < K < \infty$ arbitrarily large we can write

    $P_n(K < L_n \le \infty) = P_n(L_n = \infty) + \int 1_{\{K < L_n < \infty\}}\, L_n \, dQ_n$
    $= \left( 1 - E_{Q_n}(L_n) \right) + \int 1_{\{K < L_n < \infty\}}\, L_n \, dQ_n$.

By Theorem 3.4(a), contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ is equivalent to $\mathbb R$-tightness of $(L_n)_n$ under $(P_n)_n$. The last equation shows that this is again equivalent to uniform integrability of $(L_n)_n$ under $(Q_n)_n$ – combining Notation 3.1'(iii) with $Q_n(L_n = \infty) = 0$ from the Lebesgue decomposition in Notation 3.1(i) – plus any one of the additional conditions considered in step (1). □

3.10' Exercise. On $(\mathbb R, \mathcal B(\mathbb R))$, write $P(\vartheta)$ for the exponential law with parameter 1 shifted by $\vartheta \ge 0$, i.e. supported by $(\vartheta, \infty)$. On $(\Omega_n, \mathcal A_n) = (\mathbb R^n, \mathcal B(\mathbb R^n))$ with canonical variable $(X_1, \ldots, X_n)$, consider probability laws $Q_n = \bigotimes_{i=1}^n P(0)$ and $P_n = \bigotimes_{i=1}^n P(0 + \frac{h}{n})$, for some $h > 0$ which is fixed as $n \to \infty$. Write $U_n = (0, \infty)^n$ for the support of $Q_n$ and $V_n = (\frac{h}{n}, \infty)^n$ for the support of $P_n$. Check that the likelihood ratio $L_n$ is given by

    $L_n = 0 \cdot 1_{U_n \setminus V_n} + e^{+h}\, 1_{V_n}$

where

    $Q_n(V_n) = e^{-h}$  for all $n \ge 1$.

Deduce contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ from Proposition 3.10(iii), from Proposition 3.10(ii), from Theorem 3.4(a), from an $\varepsilon$-$\delta$-argument using Proposition 3.7, and finally directly from Definition 3.2.
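A short simulation (our own sketch; the parameter choices are arbitrary) confirms the two displayed identities and the criterion of Proposition 3.10(iii):

```python
# Shifted exponential laws: Q_n = P(0)^{⊗n}, P_n = P(h/n)^{⊗n}; then
# L_n = e^{+h} 1_{V_n}, Q_n(V_n) = e^{−h}, hence E_{Q_n}(L_n) = 1 for all n.
import numpy as np

rng = np.random.default_rng(0)
n, h, reps = 100, 1.0, 100_000
X = rng.exponential(1.0, (reps, n))          # observations under Q_n
on_Vn = (X > h / n).all(axis=1)              # event V_n
Ln = np.exp(h) * on_Vn                       # likelihood ratio per sample

print(on_Vn.mean(), np.exp(-h))              # Q_n(V_n) ≈ e^{−1} ≈ 0.368
print(Ln.mean())                             # E_{Q_n}(L_n) ≈ 1
```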

The sequence of likelihood ratios $(L_n)_n$ under $(Q_n)_n$ being $\mathbb R$-tight by Proposition 3.3, for any subsequence $(n_k)_k$ of the natural numbers, there are further subsequences $(n_{k_l})_l$ and probability measures $\check F$ on $(\mathbb R, \mathcal B(\mathbb R))$ such that $(L_{n_{k_l}})_l$ under $(Q_{n_{k_l}})_l$ converges to $\check F$ weakly in $\mathbb R$ as $l \to \infty$. In this sense, the assumption of the following theorem – a one-sided version of Le Cam's First Lemma 3.5 – is (up to selection of subsequences) no restriction at all. Accumulation points $\check F$ in this sense are necessarily concentrated on $[0, \infty)$, and may have a point mass $\check F(\{0\}) > 0$ at zero.

3.11 Theorem. Assume that there is some probability measure $\check F$ on $(\mathbb R, \mathcal B(\mathbb R))$ such that

    $\mathcal L(L_n \mid Q_n) \to \check F$  (weakly in $\mathbb R$ as $n \to \infty$).

Then the following holds:

    $(P_n)_n \vartriangleleft (Q_n)_n \iff \int_{[0,\infty)} \ell\, \check F(d\ell) = 1$.

Proof. (1) We start with some preliminary remarks. Define truncated identities $g_K(\cdot) \in C_b(\mathbb R)$,

    $g_K(x) = 0 \vee x \wedge K$, $x \in \mathbb R$,

for $0 < K < \infty$. Then monotone convergence gives

    $\lim_{K\uparrow\infty} \int_0^{\infty} g_K \, d\check F = \int_{[0,\infty)} x\, \check F(dx) \in [0, 1]$.

Considering $\int_0^{\infty} (x - g_K(x))\, Q_n^{L_n}(dx)$ where $E_{Q_n}(L_n) \le 1$ by definition of a likelihood ratio,

    $(+)$  $E_{Q_n}(L_n) - \int g_K \, dQ_n^{L_n} = \int 1_{\{K < L_n < \infty\}}\, L_n \, dQ_n - K\, Q_n^{L_n}((K, \infty)) \in [0, 1]$.

By the assumption in Theorem 3.11 on weak convergence $\mathcal L(L_n \mid Q_n) \to \check F$ we have

    $(++)$  $\lim_{n\to\infty} \int g_K \, dQ_n^{L_n} = \int g_K \, d\check F$.

(2) We show ‘⟹’. Let us assume $(P_n)_n \vartriangleleft (Q_n)_n$ and thus

    $(\diamond)$  $\lim_{n\to\infty} E_{Q_n}(L_n) = 1$,  $\lim_{K\uparrow\infty} \limsup_{n\to\infty} \int 1_{\{K < L_n < \infty\}}\, L_n \, dQ_n = 0$

thanks to Proposition 3.10. In the limit as $n \to \infty$, the second assertion in $(\diamond)$ forces the right-hand sides in (+) to be small when $K$ is large:

    $(\star)$  $\lim_{K\uparrow\infty} \limsup_{n\to\infty} \left[ \int 1_{\{K < L_n < \infty\}}\, L_n \, dQ_n - K\, Q_n^{L_n}((K, \infty)) \right] = 0$.

Then also the left-hand sides of (+) will be small when $K$ is large:

    $(\star\star)$  $\lim_{K\uparrow\infty} \limsup_{n\to\infty} \left[ E_{Q_n}(L_n) - \int g_K \, dQ_n^{L_n} \right] = 0$.

Inserting in this last convergence the first assertion of $(\diamond)$, we get via (++)

    $\lim_{K\uparrow\infty} \lim_{n\to\infty} \int g_K \, dQ_n^{L_n} = \lim_{K\uparrow\infty} \int g_K \, d\check F = 1$

which is the desired assertion $\int_{[0,\infty)} x\, \check F(dx) = 1$.
(3) We prove ‘⟸’. Let us assume $\int_{[0,\infty)} x\, \check F(dx) = 1$. A first consequence of this assumption is

    $(\circ)$  $\lim_{K\uparrow\infty} \lim_{n\to\infty} \int g_K \, dQ_n^{L_n} = \lim_{K\uparrow\infty} \int g_K \, d\check F = 1$

according to (++); a second consequence, considering continuity points $K$ of $\check F$, is

    $(\circ\circ)$  $\lim_{K\uparrow\infty} \lim_{n\to\infty} K\, Q_n^{L_n}((K, \infty)) = \lim_{K\uparrow\infty} K\, \check F((K, \infty)) = 0$.

Inserting $(\circ)$ into equations (+) as $n$ tends to $\infty$, the left-hand sides in (+) will be small for large values of $K$, see $(\star\star)$, which again with reference to $(\circ)$ gives

    $(\bullet)$  $\lim_{n\to\infty} E_{Q_n}(L_n) = 1$;

simultaneously, the right-hand sides in (+) must be small for large values of $K$, see $(\star)$, which combined with $(\circ\circ)$ gives

    $(\bullet\bullet)$  $\lim_{K\uparrow\infty} \limsup_{n\to\infty} \int 1_{\{K < L_n < \infty\}}\, L_n \, dQ_n = 0$.

By Proposition 3.10(iii), assertions $(\bullet)$ and $(\bullet\bullet)$ together establish contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ as desired. □

3.11' Remark. To any probability measure $\check F$ on $(\mathbb R, \mathcal B(\mathbb R))$ which may occur in Theorem 3.11, associate a real-valued random variable $L \ge 0$ on some probability space $(\Omega, \mathcal A, Q)$ with distribution $\check F$. Then the equality $\int_{[0,\infty)} \ell\, \check F(d\ell) = 1$ in Theorem 3.11 signifies that $(\Omega, \mathcal A)$ carries a second probability measure $P$ defined by

    $dP := L \, dQ$.

Since by definition $P$ is absolutely continuous with respect to $Q$, we have a binary ‘limit experiment’

    $(+)$  $(\Omega, \mathcal A, \{P, Q\})$ with $P \ll Q$;

here $\check F(\{0\}) = Q(L = 0)$, the $Q$-weight of the support of the $P$-singular part of $Q$, may be strictly positive. In this sense, one-sided contiguity $(P_n)_n \vartriangleleft (Q_n)_n$ is related to convergence of experiments where weak limits of likelihood ratios are likelihood ratios between probability measures $P$, $Q$ such that $P$ is absolutely continuous with respect to $Q$.

3.12 Proposition. The following statements are equivalent:

    (i) $(Q_n)_n \vartriangleleft (P_n)_n$;
    (ii) $\lim_{c \downarrow 0} \limsup_{n\to\infty} Q_n(L_n \le c) = 0$.

Proof. According to Notation 3.1(iv), the $[0, \infty]$-valued random variable $\widetilde L_n := \frac{1}{L_n}$ is a version of the likelihood ratio of $Q_n$ with respect to $P_n$, and statement (ii) means $\mathbb R$-tightness of the sequence $(\frac{1}{L_n})_n$ under $(Q_n)_n$. Changing names $\widetilde P_n := Q_n$, $\widetilde Q_n := P_n$, such that $\widetilde L_n$ is the likelihood ratio of $\widetilde P_n$ with respect to $\widetilde Q_n$, Theorem 3.4(a) applied to $(\widetilde P_n)_n$ and $(\widetilde Q_n)_n$ proves Proposition 3.12. □

Now we can also prove the second part of Theorem 3.4:

3.13 Proof of Theorem 3.4(b). We have to show that mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$ is equivalent to $\mathbb R$-tightness of the sequence $(\Lambda_n)_n$ under both $(P_n)_n$ and $(Q_n)_n$.
(1) We assume mutual contiguity. Then Propositions 3.3 and 3.12 together show

    $(+)$  $\lim_{K\uparrow\infty} \limsup_{n\to\infty} Q_n\left( L_n \notin [\tfrac1K, K] \right) = 0$

and Theorem 3.4(a) yields

    $\lim_{K\uparrow\infty} \limsup_{n\to\infty} P_n(L_n \notin [0, K]) = 0$.

In order to consider small values of $L_n$ under $P_n$, write for $K > 1$

    $P_n\left( L_n < \tfrac1K \right) = \int 1_{\{L_n < \frac1K\}}\, L_n \, dQ_n \le \tfrac1K\, Q_n\left( L_n < \tfrac1K \right) \le \tfrac1K$

so that the last convergence strengthens to

    $(++)$  $\lim_{K\uparrow\infty} \limsup_{n\to\infty} P_n\left( L_n \notin [\tfrac1K, K] \right) = 0$.

From (+) and (++), the log-likelihood ratios $(\Lambda_n)_n$ are $\mathbb R$-tight both under $(P_n)_n$ and under $(Q_n)_n$.
(2) Let us assume $\mathbb R$-tightness of $(\Lambda_n)_n$ under both $(P_n)_n$ and $(Q_n)_n$. Then we have $\mathbb R$-tightness of

    $(L_n)_n = (e^{+\Lambda_n})_n$  and  $\left( \tfrac{1}{L_n} \right)_n = (e^{-\Lambda_n})_n$, with the conventions of Notation 3.1(iii),

under both $(P_n)_n$ and $(Q_n)_n$. Hence the likelihood ratios of $P_n$ with respect to $Q_n$ are tight under $(P_n)_n$, and the likelihood ratios of $Q_n$ with respect to $P_n$ are tight under $(Q_n)_n$. Thus two applications of Theorem 3.4(a) establish the desired mutual contiguity. □

At this stage, Theorem 3.4 being completely proved, we can prove Le Cam's first lemma in the form encountered in Lemma 3.5. An obvious corollary merging Theorem 3.11 and Proposition 3.12 is the following:

3.14 Corollary. Assume that there is a probability measure $\check F$ on $(\mathbb R, \mathcal B(\mathbb R))$ such that

    $\mathcal L(L_n \mid Q_n) \to \check F$  (weakly in $\mathbb R$ as $n \to \infty$).

Then the following holds:

    $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n \iff \check F(\{0\}) = 0$ and $\int_{[0,\infty)} \ell\, \check F(d\ell) = 1$.

To deduce Corollary 3.14 from Theorem 3.11 and Proposition 3.12, it is sufficient to consider continuity points $c > 0$ of $\check F$, for which $\lim_{n\to\infty} Q_n(L_n \le c) = \check F([0, c])$. Now we deduce Lemma 3.5 from Corollary 3.14:

3.15 Proof of Le Cam's First Lemma 3.5. We show that the setting of Lemma 3.5 specialises the setting of Corollary 3.14; in this sense, Corollary 3.14 is a slightly stronger formulation of Le Cam's first lemma. The difference is in the respective initial conditions on convergence of likelihood ratios. Lemma 3.5 starts from some law $F$ on $(\mathbb R, \mathcal B(\mathbb R))$ such that

    $\mathcal L(\Lambda_n \mid Q_n) \to F$  (weakly in $\mathbb R$ as $n \to \infty$)

which – writing $\check F$ for the image of $F$ under the continuous mapping $\mathbb R \ni \lambda \mapsto e^{\lambda} \in \mathbb R$ – translates into

    $\mathcal L(L_n \mid Q_n) \to \check F$ with $\check F((0, \infty)) = 1$  (weakly in $\mathbb R$ as $n \to \infty$).

In explicit restriction to the subclass of laws $\check F$ which are concentrated on $(0, \infty)$, Corollary 3.14 states

    $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n \iff \int_{(0,\infty)} \ell\, \check F(d\ell) = 1$

where the right-hand side equals $\int_{\mathbb R} e^{\lambda}\, F(d\lambda) = 1$ when $\check F$ is the image of $F$ under the above mapping. □

We turn to the proof of Le Cam's Third Lemma 3.6. Since $(L_n)_n$ under $(Q_n)_n$ is always $\mathbb R$-tight by Proposition 3.3, we have $\mathbb R^{1+d}$-tightness of the pairs $((L_n, X_n))_n$ under $(Q_n)_n$ for any sequence $(X_n)_n$ of $\bar{\mathbb R}^d$-valued random variables which is $\mathbb R^d$-tight under $(Q_n)_n$, and can select subsequences $(n_l)_l$ and laws $\check{\widetilde F}$ on $(\mathbb R^{1+d}, \mathcal B(\mathbb R^{1+d}))$ such that $((L_{n_l}, X_{n_l}))_l$ under $(Q_{n_l})_l$ converges weakly in $\mathbb R^{1+d}$ to $\check{\widetilde F}$ as $l \to \infty$. All accumulation points $\check{\widetilde F}$ in this sense are necessarily laws which are concentrated on $[0, \infty) \times \mathbb R^d$, and $\check{\widetilde F}$ may put strictly positive mass on the hyperplane $\{0\} \times \mathbb R^d$. Thus, up to selection of subsequences, the only condition in the following one-sided variant of Le Cam's third lemma is contiguity $(P_n)_n \vartriangleleft (Q_n)_n$: in this case, sequences $(X_n)_n$ converging jointly with the likelihood ratios under $(Q_n)_n$ will also converge under $(P_n)_n$, and the limit law will be ‘explicit’.

3.16 Theorem. Assume

    $(P_n)_n \vartriangleleft (Q_n)_n$.

Consider a probability measure $\check{\widetilde F}$ on $(\mathbb R^{1+d}, \mathcal B(\mathbb R^{1+d}))$ and a sequence of $\bar{\mathbb R}^d$-valued random variables $X_n$ on $(\Omega_n, \mathcal A_n)$ such that the following holds:

    $\mathcal L((L_n, X_n) \mid Q_n) \to \check{\widetilde F}$  (weakly in $\mathbb R^{1+d}$ as $n \to \infty$).

Then

    $\check{\widetilde G}(d\ell, dx) := (\ell \vee 0)\, \check{\widetilde F}(d\ell, dx)$, $\ell \in \mathbb R$, $x \in \mathbb R^d$,

is a probability measure on $(\mathbb R^{1+d}, \mathcal B(\mathbb R^{1+d}))$, and we have

    $\mathcal L((L_n, X_n) \mid P_n) \to \check{\widetilde G}$  (weakly in $\mathbb R^{1+d}$ as $n \to \infty$).

Proof. (I) We prove this result first in the case where all $X_n$ take values in $\mathbb R^d$.
(1) We start with some preliminaries. We write $\check G$ for the first marginal of the measure $\check{\widetilde G}$ defined in Theorem 3.16, and $\check F$ for the first marginal of $\check{\widetilde F}$. Then $\check F$ is a probability on $(\mathbb R, \mathcal B(\mathbb R))$ such that

    $\mathcal L(L_n \mid Q_n) \to \check F$  (weakly in $\mathbb R$ as $n \to \infty$).

In particular, $\check F$ is concentrated on $[0, \infty)$. Now Theorem 3.11 and the assumed contiguity give

    $(\circ)$  $\int_{[0,\infty)} \ell\, \check F(d\ell) = 1$,

thus $\check{\widetilde G}$ is a probability. Again by $(\circ)$, for $\varepsilon > 0$ arbitrarily small, there is $K = K(\varepsilon) < \infty$ such that

    $(+)$  $\int_{\mathbb R^{1+d}} \ell\, 1_{\{\ell > K\}}\, \check{\widetilde F}(d\ell, dx) = \int_{(K,\infty)} \ell\, \check F(d\ell) < \varepsilon$;

increasing $K = K(\varepsilon)$ if necessary, we achieve simultaneously

    $(++)$  $\sup_{n\ge1} \int L_n 1_{\{L_n > K\}} \, dQ_n < \varepsilon$

from Proposition 3.10 and the assumed contiguity. Contiguity also implies

    $(+++)$  $P_n(L_n = \infty) \to 0$ as $n \to \infty$.

(2) First, we prove for functions $f \in C_b(\mathbb R^{1+d})$

    $\lim_{n\to\infty} \int f(L_n, X_n)\, L_n \, dQ_n = \int f(\ell, x)\, \check{\widetilde G}(d\ell, dx)$.

Write $M := \sup |f|$ and decompose, with truncated identities $g_K(x) = 0 \vee x \wedge K$, $x \in \mathbb R$,

    $\int f(L_n, X_n)\, L_n \, dQ_n = \int f(L_n, X_n)\, g_K(L_n) \, dQ_n + \int f(L_n, X_n)\, (L_n - g_K(L_n)) \, dQ_n$.

Since $(\ell, x) \mapsto f(\ell, x)\, g_K(\ell)$ is in $C_b(\mathbb R^{1+d})$, weak convergence of the pairs $(L_n, X_n)_n$ under $(Q_n)_n$ gives

    $(\diamond)$  $\lim_{n\to\infty} \int f(L_n, X_n)\, g_K(L_n) \, dQ_n = \int f(\ell, x)\, g_K(\ell)\, \check{\widetilde F}(d\ell, dx)$.

At the same time, we have as a consequence of (++)

    $\sup_{n\ge1} \int |f(L_n, X_n)|\, (L_n - g_K(L_n)) \, dQ_n < M \varepsilon$

and as a consequence of (+)

    $\int |f(\ell, x)|\, (\ell - g_K(\ell))\, \check{\widetilde F}(d\ell, dx) < M \varepsilon$.

The family of convergences $(\diamond)$ for $K = K(\varepsilon)$, $\varepsilon > 0$ arbitrary, in combination with the last lines gives

    $\lim_{n\to\infty} \int f(L_n, X_n)\, L_n \, dQ_n = \int f(\ell, x)\, \ell\, \check{\widetilde F}(d\ell, dx)$;

$\check F$ being concentrated on $[0, \infty)$, the limit is

    $\int f(\ell, x)\, (\ell \vee 0)\, \check{\widetilde F}(d\ell, dx) = \int f(\ell, x)\, \check{\widetilde G}(d\ell, dx)$.

(3) Second, we prove for $f(\cdot, \cdot)$ as above

    $\int f\big( \widehat{(L_n, X_n)} \big) \, dP_n \to \int f(\ell, x)\, \check{\widetilde G}(d\ell, dx)$

where

    $\widehat{(L_n, X_n)} = (L_n, X_n)\, 1_{\{L_n < \infty\}}$

in accordance with Notations 3.1', since all $X_n$ are $\mathbb R^d$-valued, by assumption, in this part of the proof.
This holds since the quantity considered in step (2),

    $\int f(L_n, X_n)\, L_n \, dQ_n = \int f(L_n, X_n)\, 1_{\{L_n < \infty\}} \, dP_n$

(by the Lebesgue decomposition of $P_n$ with respect to $Q_n$), can asymptotically as $n \to \infty$ be replaced by

    $\int f\big( (L_n, X_n)\, 1_{\{L_n < \infty\}} \big) \, dP_n = \int f\big( \widehat{(L_n, X_n)} \big) \, dP_n$,

thanks to contiguity (+++) and the bound

    $\left| \int f\big( \widehat{(L_n, X_n)} \big) \, dP_n - \int f(L_n, X_n)\, 1_{\{L_n < \infty\}} \, dP_n \right| \le M\, P_n(L_n = \infty) \to 0$.

(4) According to Notation 3.1'(ii), contiguity (+++) combined with step (3) is the desired assertion

    $\mathcal L((L_n, X_n) \mid P_n) \to \check{\widetilde G}$  (weakly in $\mathbb R^{1+d}$, as $n \to \infty$).

This ends the proof of Theorem 3.16 in the case where all $X_n$ take values in $\mathbb R^d$.

(II) Now we extend the result from $\mathbb R^d$-valued to $\bar{\mathbb R}^d$-valued random variables $X_n$, $n \ge 1$.
In this case, by Notation 3.1'(ii), the convergence assumption on the pairs $(L_n, X_n)_n$ under $(Q_n)_n$ implies

    $\lim_{n\to\infty} Q_n(|X_n| = \infty) = 0$,

$L_n$ being a.s. finite under $Q_n$, and thanks to contiguity also

    $\lim_{n\to\infty} P_n(L_n = \infty \text{ or } |X_n| = \infty) = 0$.

Contiguity and arguments similar to step (3) above allow us to replace asymptotically as $n \to \infty$

    $\int f\big( \widehat{(L_n, X_n)} \big) \, dQ_n$  by  $\int f\big( L_n, \widehat X_n \big) \, dQ_n$,
    $\int f\big( \widehat{(L_n, X_n)} \big) \, dP_n$  by  $\int f\big( L_n, \widehat X_n \big)\, 1_{\{L_n < \infty\}} \, dP_n$,

where we put $\widehat X_n := X_n 1_{\{|X_n| < \infty\}}$, and where $\widehat{(L_n, X_n)} = (L_n, X_n)\, 1_{\{L_n < \infty,\, |X_n| < \infty\}}$ is the notation of 3.1'(ii). The convergence assumption in Theorem 3.16 can thus be rephrased as

    $\int f\big( L_n, \widehat X_n \big) \, dQ_n \to \int f(\ell, x)\, \check{\widetilde F}(d\ell, dx)$

where we are left to consider pairs $(L_n, \widehat X_n)$ for which steps (2) and (3) of the above proof show

    $\int f\big( L_n, \widehat X_n \big)\, L_n \, dQ_n = \int f\big( L_n, \widehat X_n \big)\, 1_{\{L_n < \infty\}} \, dP_n \to \int f(\ell, x)\, \check{\widetilde G}(d\ell, dx)$.

Hence the assertion of Theorem 3.16 is proved also for $\bar{\mathbb R}^d$-valued random variables $X_n$, $n \ge 1$. □

Now we can prove Le Cam’s Third Lemma 3.6 under mutual contiguity:

3.17 Proof of Le Cam's Third Lemma 3.6. In analogy to Proof 3.15, which deduces Lemma 3.5 from Corollary 3.14, we shall show that the setting of Lemma 3.6 specialises the setting of Theorem 3.16 (hence, Theorem 3.16 is a slightly stronger formulation of Le Cam's third lemma). Writing $d$ in place of $k$, Lemma 3.6 starts from some law $\widetilde F$ on $(\mathbb R^{1+d}, \mathcal B(\mathbb R^{1+d}))$ such that

    $\mathcal L((\Lambda_n, X_n) \mid Q_n) \to \widetilde F$  (weakly in $\mathbb R^{1+d}$ as $n \to \infty$).

If we define $\check{\widetilde F}$ as the image of $\widetilde F$ under the continuous mapping $\mathbb R^{1+d} \ni (\lambda, x) \mapsto (e^{\lambda}, x) \in \mathbb R^{1+d}$, and write $\check F$ for the first marginal of $\check{\widetilde F}$, this translates into

    $\mathcal L((L_n, X_n) \mid Q_n) \to \check{\widetilde F}$ where $\check F((0, \infty)) = 1$  (weakly in $\mathbb R^{1+d}$ as $n \to \infty$).

By the assumed mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$, we have according to Lemma 3.5 or Corollary 3.14

    $1 = \int_{(0,\infty)} \ell\, \check F(d\ell) = \int_{(0,\infty)\times\mathbb R^d} \ell\, \check{\widetilde F}(d\ell, dx) = \int_{\mathbb R^{1+d}} e^{\lambda}\, \widetilde F(d\lambda, dx)$.

Hence $\widetilde G(d\lambda, dx) := e^{\lambda}\, \widetilde F(d\lambda, dx)$ defines a probability measure on $(\mathbb R^{1+d}, \mathcal B(\mathbb R^{1+d}))$. The image $\check{\widetilde G}$ of $\widetilde G$ under the mapping $\mathbb R^{1+d} \ni (\lambda, x) \mapsto (e^{\lambda}, x) \in \mathbb R^{1+d}$ being concentrated on $(0, \infty) \times \mathbb R^d$, we can write

    $\check{\widetilde G}(d\ell, dx) = (\ell \vee 0)\, \check{\widetilde F}(d\ell, dx)$,

thus Theorem 3.16 establishes

    $\mathcal L((L_n, X_n) \mid P_n) \to \check{\widetilde G}$  (weakly in $\mathbb R^{1+d}$ as $n \to \infty$).

Again by the same transformation, the last convergence is equivalent to

    $\mathcal L((\Lambda_n, X_n) \mid P_n) \to \widetilde G$  (weakly in $\mathbb R^{1+d}$ as $n \to \infty$),

which finishes the proof of Lemma 3.6. □

It remains to prove Proposition 3.6'', which specialises Le Cam's Third Lemma 3.6 to situations where the limit law $\widetilde F$ in Lemma 3.6 is a normal law with non-degenerate first component.

3.18 Proof of Proposition 3.6''. We start from the assumption that $\widetilde F$ in Lemma 3.6 is some normal law on $(\mathbb R^{1+k}, \mathcal B(\mathbb R^{1+k}))$ whose first marginal – the limit law for the log-likelihood ratios $(\Lambda_n)_n$ under $(Q_n)_n$ when mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$ holds – is not degenerate at a single point, i.e. has some strictly positive variance $\sigma^2 > 0$. Then we can write $\widetilde F$ in the form

    $\widetilde F = \mathcal N\left( \begin{pmatrix} \gamma \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \tau^{\top} \\ \tau & \Sigma \end{pmatrix} \right)$

for suitable $\mu, \tau \in \mathbb R^k$, $\gamma \in \mathbb R$, and $\Sigma \in \mathbb R^{k \times k}$ which is symmetric and non-negative definite.
(1) Let us prove that in this representation of $\widetilde F$ we have necessarily

    $(\diamond)$  $\gamma = -\frac12 \sigma^2$.

This arises as a consequence of mutual contiguity, by which

    $\int e^{\lambda}\, \widetilde F(d\lambda, dx) = 1$

according to Le Cam's First Lemma 3.5. For the first marginal,

    $1 = \int e^{\lambda}\, \mathcal N(\gamma, \sigma^2)(d\lambda) = \int \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left( \lambda^2 - 2\lambda(\gamma + \sigma^2) + \gamma^2 \right)} \, d\lambda$

amounts – inserting $\pm(\gamma + \sigma^2)^2$ in the exponent – to

    $\gamma^2 = (\gamma + \sigma^2)^2 \iff |\gamma| = |\gamma + \sigma^2| \iff \gamma = -\frac12 \sigma^2$

since $\sigma^2 > 0$, by assumption on the first marginal of $\widetilde F$. This proves the first assertion of Proposition 3.6''.
(2) For the parameters of the normal law

    $\widetilde F = \mathcal N\left( \begin{pmatrix} -\frac12\sigma^2 \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \tau^{\top} \\ \tau & \Sigma \end{pmatrix} \right)$

in step (1) above we write

    $\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix}$, $\tau = \begin{pmatrix} \tau_1 \\ \vdots \\ \tau_k \end{pmatrix}$, $m := \begin{pmatrix} -\frac12\sigma^2 \\ \mu \end{pmatrix}$, $\Lambda := \begin{pmatrix} \sigma^2 & \tau^{\top} \\ \tau & \Sigma \end{pmatrix}$.

Then with notations

    $z = \begin{pmatrix} \zeta \\ z_1 \\ \vdots \\ z_k \end{pmatrix} \in \mathbb R^{1+k}$, $e_0 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in \mathbb R^{1+k}$,

the Laplace transform of the normal law $\widetilde F$ (existing on $\mathbb R^{1+k}$) equals

    $z \mapsto \int e^{\left[ \zeta\lambda + \sum_{i=1}^k z_i x_i \right]}\, \widetilde F(d\lambda, dx) = \exp\left\{ z^{\top} m + \frac12 z^{\top} \Lambda z \right\} =: e(z)$;

as a consequence, the Laplace transform of the law $\widetilde G(d\lambda, dx) := e^{\lambda}\, \widetilde F(d\lambda, dx)$,

    $z \mapsto \int e^{\left[ \zeta\lambda + \sum_{i=1}^k z_i x_i \right]}\, \widetilde G(d\lambda, dx) = \int e^{\left[ (\zeta + 1)\lambda + \sum_{i=1}^k z_i x_i \right]}\, \widetilde F(d\lambda, dx)$,

exists on $\mathbb R^{1+k}$ and has the form

    $(\diamond\diamond)$  $z \mapsto e(z + e_0) = \exp\left\{ (z + e_0)^{\top} m + \frac12 (z + e_0)^{\top} \Lambda (z + e_0) \right\}$.

We rewrite this function using

    $(z + e_0)^{\top} m = -\frac12 (\zeta + 1)\sigma^2 + \sum_{i=1}^k z_i \mu_i$,
    $\frac12 (z + e_0)^{\top} \Lambda (z + e_0) = \frac12 z^{\top} \Lambda z + \zeta\sigma^2 + \sum_{i=1}^k z_i \tau_i + \frac12 \sigma^2$,

as

    $z \mapsto \exp\left\{ \left[ \frac12 \zeta\sigma^2 + \sum_{i=1}^k z_i (\mu_i + \tau_i) \right] + \frac12 z^{\top} \Lambda z \right\}$

or equivalently in compact notation

    $z \mapsto \exp\left\{ z^{\top} \widetilde m + \frac12 z^{\top} \Lambda z \right\}$  where  $\widetilde m := \begin{pmatrix} +\frac12\sigma^2 \\ \mu + \tau \end{pmatrix}$.

This is the Laplace transform of $\mathcal N(\widetilde m, \Lambda)$. By the uniqueness theorem for Laplace transforms (see e.g. [4, Chap. 10.1]), we have identified the law $\widetilde G(d\lambda, dx) := e^{\lambda}\, \widetilde F(d\lambda, dx)$, for $\widetilde F$ of step (1), as

    $\widetilde G = \mathcal N\left( \begin{pmatrix} +\frac12\sigma^2 \\ \mu + \tau \end{pmatrix}, \begin{pmatrix} \sigma^2 & \tau^{\top} \\ \tau & \Sigma \end{pmatrix} \right)$

which proves the second assertion of Proposition 3.6''. □

3.18' Exercise. Write $M$ for the space of all piecewise constant right-continuous jump functions $f : [0, \infty) \to \mathbb N_0$ such that $f$ has at most finitely many jumps over compact time intervals, all these with jump height $+1$, and starts from $f(0) = 0$. Write $\pi_t$ for the coordinate projections $\pi_t(f) = f(t)$, $t \ge 0$, $f \in M$.
Equip $M$ with the $\sigma$-field $\mathcal M$ generated by the coordinate projections, consider the filtration $\mathbb F = (\mathcal F_t)_{t \ge 0}$ where $\mathcal F_t := \sigma(\pi_r : 0 \le r \le t)$, and call $\pi = (\pi_t)_{t \ge 0}$ the canonical process on $(M, \mathcal M, \mathbb F)$.
For $\lambda > 0$, let $P^{(\lambda)}$ denote the (unique) probability law on $(M, \mathcal M)$ such that the canonical process $(\pi_t)_{t \ge 0}$ is a Poisson process with parameter $\lambda$. Then (e.g. [14, p. 165]) the process

    $L_t^{\lambda'/\lambda} := \left( \frac{\lambda'}{\lambda} \right)^{\pi_t} \exp\{ -(\lambda' - \lambda)\, t \}$, $t \ge 0$,

is the likelihood ratio process of $P^{(\lambda')}$ relative to $P^{(\lambda)}$ with respect to $\mathbb F$, i.e., for all $t \in [0, \infty)$,

    $L_t^{\lambda'/\lambda}$ is a version of the likelihood ratio of $P^{(\lambda')}|_{\mathcal F_t}$ relative to $P^{(\lambda)}|_{\mathcal F_t}$,

where $|_{\mathcal F_t}$ denotes restriction of a probability measure on $(M, \mathcal M)$ to the sub-$\sigma$-field $\mathcal F_t$. Accepting this as background (as in Jacod [63], Kabanov, Liptser and Shiryaev [68], and Brémaud [14]), prove the following.
(1) For fixed reference value $\lambda > 0$, reparameterising the family of laws on $(M, \mathcal M, \mathbb F)$ with respect to $\lambda$ as

    $(\diamond)$  $\vartheta \mapsto P^{(\lambda e^{\vartheta})}$, $\vartheta \in \Theta := \mathbb R$,

the likelihood ratio process takes the form which has been considered in Exercise 1.16':

    $L_t^{\lambda e^{\vartheta}/\lambda} = \exp\left\{ \vartheta\, \pi_t - \lambda (e^{\vartheta} - 1)\, t \right\}$, $t \ge 0$.

For $t$ fixed, this is an exponential family in $\vartheta$ with canonical statistic $\pi_t$. Specify a functional $\kappa : \Theta \to (0, \infty)$ and an estimator $T_t : M \to \mathbb R$ for this functional:

    $\kappa(\vartheta) := \lambda e^{\vartheta}$,  $T_t := \frac1t \pi_t$.

By classical theory of exponential families, $T_t$ is the best estimator for $\kappa$ in the sense of uniformly minimum variance within the class of unbiased estimators (e.g. [127, pp. 303 and 157]).
(2) For some $h \in \mathbb R$ which we keep fixed as $n \to \infty$, define sequences of probability laws

    $Q_n := P^{(\lambda)}|_{\mathcal F_n}$,  $P_n := P^{(\lambda e^{h/\sqrt n})}|_{\mathcal F_n}$, $n \ge 1$,

and prove mutual contiguity $(P_n)_n \vartriangleleft\vartriangleright (Q_n)_n$ via Lemma 3.5: for this, use a representation of the log-likelihood ratios $\Lambda_n$ of $P_n$ with respect to $Q_n$ in the form

    $\Lambda_n = h\, \frac{1}{\sqrt n}(\pi_n - \lambda n) - \lambda\left( e^{h/\sqrt n} - 1 - \frac{h}{\sqrt n} \right) n$

and weak convergence in $\mathbb R$

    $\mathcal L(\Lambda_n \mid Q_n) \to \mathcal N\left( -\tfrac12 h^2 \lambda,\ h^2 \lambda \right)$, $n \to \infty$.

(3) Writing $X_n := \sqrt n\, (T_n - \lambda)$, the reference point $\lambda$ being fixed according to the reparameterisation $(\diamond)$, extend the last result to joint convergence in $\mathbb R^2$

    $\mathcal L((\Lambda_n, X_n) \mid Q_n) \to \mathcal N\left( \begin{pmatrix} -\frac12 h^2\lambda \\ 0 \end{pmatrix}, \begin{pmatrix} h^2\lambda & h\lambda \\ h\lambda & \lambda \end{pmatrix} \right)$

such that by Le Cam's Third Lemma 3.6

    $\mathcal L((\Lambda_n, X_n) \mid P_n) \to \mathcal N\left( \begin{pmatrix} +\frac12 h^2\lambda \\ h\lambda \end{pmatrix}, \begin{pmatrix} h^2\lambda & h\lambda \\ h\lambda & \lambda \end{pmatrix} \right)$.

In particular, this gives

    $(\diamond\diamond)$  $\mathcal L(X_n - h\lambda \mid P_n) \to \mathcal N(0, \lambda)$.

(4) Deduce from $(\diamond\diamond)$ the following ‘equivariance’ property of the estimator $T_n$ in shrinking neighbourhoods of radius $O(\frac{1}{\sqrt n})$ – in the sense of the reparameterisation $(\diamond)$ above – of the reference point $\lambda$, using the approximation $e^{h/\sqrt n} = 1 + \frac{h}{\sqrt n} + O(\frac1n)$ as $n \to \infty$: for every $h \in \mathbb R$ fixed, we have weak convergence

    $\mathcal L\left( \sqrt n\, (T_n - \lambda e^{h/\sqrt n}) \mid P^{(\lambda e^{h/\sqrt n})} \right) \to \mathcal N(0, \lambda)$  as $n \to \infty$,

where the limit law does not depend on $h$. This means that on small neighbourhoods with radius $O(\frac{1}{\sqrt n})$ of a fixed reference point $\lambda$, asymptotically as $n \to \infty$, the estimator $T_n$ identifies true parameter values with the same precision. □
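Parts (3) and (4) of this exercise lend themselves to a simulation check. The following sketch (our own addition; the values of $\lambda$, $h$, $n$ are arbitrary, and it uses only the fact that $\pi_n$ is Poisson with parameter $\lambda e^{h/\sqrt n} \cdot n$ under $P^{(\lambda e^{h/\sqrt n})}$) confirms the equivariance property:

```python
# Equivariance check for the Poisson estimator T_n = π_n / n:
# under P^{(λ e^{h/√n})}, the rescaled error √n (T_n − λ e^{h/√n}) ≈ N(0, λ).
import numpy as np

rng = np.random.default_rng(0)
lam, h, n, reps = 2.0, 1.0, 2000, 100_000
lam_n = lam * np.exp(h / np.sqrt(n))       # parameter λ e^{h/√n}

pi_n = rng.poisson(lam_n * n, reps)        # π_n under P^{(λ e^{h/√n})}
T_n = pi_n / n
err = np.sqrt(n) * (T_n - lam_n)           # rescaled estimation error

print(err.mean(), err.var())               # ≈ 0 and ≈ λ e^{h/√n} ≈ λ = 2
```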
Chapter 4

L²-differentiable Statistical Models

Topics for Chapter 4:

4.1 $L^r$-differentiability when $r \ge 1$
    $L^r$-differentiability in dominated families 4.1–4.1''
    $L^r$-differentiability in general 4.2–4.2'
    Example: one-parametric paths in non-parametric models 4.3
    $L^r$-differentiability implies $L^s$-differentiability for $1 \le s \le r$ 4.4
    $L^r$-derivatives are centred 4.5
    Score and information in $L^2$-differentiable families 4.6
    $L^2$-differentiability and Hellinger distances locally at $\vartheta$ 4.7–4.9
4.2 Le Cam's Second Lemma for i.i.d. observations
    Assumptions for Section 4.2 4.10
    Statement of Le Cam's second lemma 4.11
    Some auxiliary results 4.12–4.14
    Proof of Le Cam's second lemma 4.15
Exercises: 4.1''', 4.9'

This chapter, closely related to Section 1.1, generalises the notion of ‘smoothness’ of statistical models. We define it in an $L^2$-sense which e.g. allows us to consider families of laws which are not pairwise equivalent, or where log-likelihood ratios for $\omega$ fixed are not smooth functions of the parameter. To the new notion of smoothness corresponds a new and more general definition of score and information in a statistical model. In the new setting, we will be able to prove rigorously – and under weak assumptions – quadratic expansions of log-likelihood ratios, valid locally in small neighbourhoods of fixed reference points $\vartheta$. Later, such expansions will allow us to prove assertions very similar to those intended – with questionable heuristics – in Section 1.3. Frequently called a ‘second Le Cam lemma’, expansions of this type can be proved in large classes of statistical models – for various stochastic phenomena observed over long time intervals or in growing spatial windows – provided the parameterisation is $L^2$-smooth, and provided we have strong laws of large numbers and corresponding martingale convergence theorems. For i.i.d. models, we present a ‘second Le Cam lemma’ in Section 4.2 below.

4.1 $L^r$-differentiable Statistical Models

We start with models which are dominated, and give a preliminary definition of $L^r$-differentiability of a statistical model at a point $\vartheta$. The domination assumption will be removed in the sequel.

4.1 Motivation. We consider a dominated experiment

    $(\Omega, \mathcal A, \mathcal P := \{P_{\zeta} : \zeta \in \Theta\})$, $\Theta \subseteq \mathbb R^d$ open, $\mathcal P \ll \mu$,

with densities

    $\rho_{\zeta} := \frac{dP_{\zeta}}{d\mu}$, $\zeta \in \Theta$.

Let $r \ge 1$ be fixed. It is trivial that for all $\zeta \in \Theta$, $\rho_{\zeta}^{1/r}$ belongs to $L^r(\Omega, \mathcal A, \mu)$.
(1) In the normed space $L^r(\Omega, \mathcal A, \mu)$, the mapping

    $(\diamond)$  $\Theta \ni \zeta \mapsto \rho_{\zeta}^{1/r} \in L^r(\Omega, \mathcal A, \mu)$

is (Fréchet) differentiable at $\zeta = \vartheta$ with derivative $\widetilde V_{\vartheta}$ if the following (i) and (ii) hold simultaneously:

    (i) $\widetilde V_{\vartheta}$ has components $\widetilde V_{\vartheta,1}, \ldots, \widetilde V_{\vartheta,d} \in L^r(\Omega, \mathcal A, \mu)$,
    (ii) $\frac{1}{|\zeta - \vartheta|} \left\| \rho_{\zeta}^{1/r} - \rho_{\vartheta}^{1/r} - (\zeta - \vartheta)^{\top} \widetilde V_{\vartheta} \right\|_{L^r(\mu)} \to 0$ as $|\zeta - \vartheta| \to 0$.

(2) We prove: in a statistical model $\mathcal P$, (i) and (ii) imply the assertion

    $(\ast)$  $\widetilde V_{\vartheta,i} = 0$ $\mu$-almost surely on $\{\rho_{\vartheta} = 0\}$, $1 \le i \le d$.

To see this, consider sequences $(\zeta_n)_n \subseteq \Theta$ of type $\zeta_n = \vartheta \pm \delta_n e_i$ where $e_i$ is the $i$-th unit vector in $\mathbb R^d$, and $(\delta_n)_n$ any sequence of strictly positive real numbers tending to $0$. Then (ii) with $\zeta = \zeta_n$ restricted to the event $\{\rho_{\vartheta} = 0\}$ gives

    $\left\| 1_{\{\rho_{\vartheta} = 0\}} \left( \frac{1}{\delta_n} \rho_{\vartheta + \delta_n e_i}^{1/r} - \widetilde V_{\vartheta,i} \right) \right\|_{L^r(\mu)} \to 0$,
    $\left\| 1_{\{\rho_{\vartheta} = 0\}} \left( \frac{1}{\delta_n} \rho_{\vartheta - \delta_n e_i}^{1/r} + \widetilde V_{\vartheta,i} \right) \right\|_{L^r(\mu)} \to 0$

as $n \to \infty$. Selecting subsequences $(n_k)_k$ along which $\mu$-almost sure convergence holds, we have on $\{\rho_{\vartheta} = 0\}$:

    $\frac{1}{\delta_{n_k}} \rho_{\vartheta + \delta_{n_k} e_i}^{1/r} \to \widetilde V_{\vartheta,i}$  and  $\frac{1}{\delta_{n_k}} \rho_{\vartheta - \delta_{n_k} e_i}^{1/r} \to -\widetilde V_{\vartheta,i}$

$\mu$-almost surely as $k \to \infty$. The densities on the left-hand sides being non-negative, $\widetilde V_{\vartheta,i}$ necessarily equals $0$ $\mu$-almost surely on the event $\{\rho_{\vartheta} = 0\}$. Thus any version of the derivative $\widetilde V_{\vartheta}$ in (i) and (ii) has the property $(\ast)$.
(3) By step (2) combined with the definition (ii) of the Fréchet derivative, we can modify $\widetilde V_{\vartheta}$ on some $\mu$-null set in $\mathcal A$ in order to achieve

    $\widetilde V_{\vartheta} = 0$ on $\{\rho_{\vartheta} = 0\}$.

This allows us to transform $\widetilde V_{\vartheta}$ into a new object

    $V_{\vartheta,i} := 1_{\{\rho_{\vartheta} > 0\}}\, r\, \rho_{\vartheta}^{-1/r}\, \widetilde V_{\vartheta,i}$, $1 \le i \le d$,

which gives (note that with respect to (i) we change the integrating measure from $\mu$ to $P_{\vartheta}$)

    $V_{\vartheta}$ with components $V_{\vartheta,1}, \ldots, V_{\vartheta,d} \in L^r(\Omega, \mathcal A, P_{\vartheta})$,
    $\widetilde V_{\vartheta,i} = \frac1r\, \rho_{\vartheta}^{1/r}\, V_{\vartheta,i}$, $1 \le i \le d$.

With $V_{\vartheta}$ thus associated to $\widetilde V_{\vartheta}$, Fréchet differentiability of the mapping $(\diamond)$ takes the following form:
4.1' Definition (preliminary). For $\Theta \subseteq \mathbb R^d$ open, consider a dominated family $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ of probability measures on $(\Omega, \mathcal A)$ with $\mu$-densities $\rho_{\zeta}$, $\zeta \in \Theta$. Fix $r \ge 1$ and $\vartheta \in \Theta$. The model $\mathcal P$ is called $L^r$-differentiable in $\vartheta$ with derivative $V_{\vartheta}$ if the following (i) and (ii) hold simultaneously:

    (i) $V_{\vartheta}$ has components $V_{\vartheta,1}, \ldots, V_{\vartheta,d} \in L^r(\Omega, \mathcal A, P_{\vartheta})$,
    (ii) $\frac{1}{|\zeta - \vartheta|^r} \int \left| \rho_{\zeta}^{1/r} - \rho_{\vartheta}^{1/r} - \frac1r\, \rho_{\vartheta}^{1/r}\, (\zeta - \vartheta)^{\top} V_{\vartheta} \right|^r d\mu \to 0$ as $|\zeta - \vartheta| \to 0$.

We remark that the integral in Definition 4.1'(ii) does not depend on the choice of the dominating measure $\mu$. Considering for $\mathcal P$ different dominating measures $\mu \ll \widetilde\mu$, and densities $\rho_{\zeta}$ (with respect to $\mu$) and $\widetilde\rho_{\zeta}$ (with respect to $\widetilde\mu$), we have $\widetilde\rho_{\zeta} = \rho_{\zeta}\, \frac{d\mu}{d\widetilde\mu}$ $\widetilde\mu$-almost surely. Hence, in a representation analogous to Definition 4.1'(ii) with $\widetilde\mu$, $\widetilde\rho_{\zeta}$ and $\widetilde\rho_{\vartheta}$, the factor $\frac{d\mu}{d\widetilde\mu}$ cancels out. Thus (any version of) the derivative $V_{\vartheta}$ in Definition 4.1'(i) depends only on the law $P_{\vartheta}$ in the family $\mathcal P$, and not on the choice of a dominating measure for $\mathcal P$.

4.1'' Remark. Fix $r \ge 1$. If a statistical model $\mathcal P := \{P_{\zeta} : \zeta \in \Theta\}$ is $L^r$-differentiable in all points $\vartheta \in \Theta$ in the sense of Definition 4.1', and if in particular

    $(\circ)$  $\Theta \ni \zeta \mapsto \rho_{\zeta}(\omega) \in [0, \infty)$ is continuous and admits partial derivatives

for every $\omega \in \Omega$ fixed, we shall show that the components $V_{\vartheta,i}$ of the $L^r$-derivatives $V_{\vartheta}$ coincide $P_{\vartheta}$-almost surely with

    $(\circ\circ)$  $\omega \mapsto 1_{\{\rho_{\vartheta} > 0\}}(\omega)\, \left( \frac{\partial}{\partial \vartheta_i} \log \rho \right)(\vartheta, \omega)$

which was considered in Definition 1.2, for all $1 \le i \le d$. We do have $V_{\vartheta,i} \in L^r(P_{\vartheta})$ by Definition 4.1'(i), and shall see later in Corollary 4.5 that $V_{\vartheta,i}$ is necessarily centred under $P_{\vartheta}$. When $r = 2$, $(\circ\circ)$ being the classical definition of the score at $\vartheta$ (Definition 1.2), $L^2$-differentiability is a notion of smoothness of parameterisation which extends the classical setting of Chapter 1. We will return to this in Definition 4.6 below.
To check that the components $V_{\vartheta,i}$ of $L^r$-derivatives $V_{\vartheta}$ coincide with $(\circ\circ)$ $P_{\vartheta}$-almost surely, consider in Definition 4.1' sequences $\zeta_n = \vartheta + \delta_n e_i$ (with $e_i$ the $i$-th unit vector in $\mathbb R^d$, and $\delta_n \downarrow 0$). From $L^r(\mu)$-convergence in Definition 4.1'(ii), we can select subsequences $(n_k)_k$ along which $\mu$-almost sure convergence holds:

    $\left| \frac{\rho_{\vartheta + \delta_{n_k} e_i}^{1/r} - \rho_{\vartheta}^{1/r}}{\delta_{n_k}} - \frac1r\, \rho_{\vartheta}^{1/r}\, V_{\vartheta,i} \right| \to 0$  as $k \to \infty$.

Thus for the mapping $(\circ)$, the gradient $\nabla(\rho^{1/r})(\vartheta, \cdot)$ coincides $\mu$-almost surely with $\frac1r \rho_{\vartheta}^{1/r} V_{\vartheta} = \widetilde V_{\vartheta}$. On $\{\rho_{\vartheta} > 0\}$ we thus have

    $\nabla(\log\rho)(\vartheta, \cdot) = r\, \nabla(\log \rho^{1/r})(\vartheta, \cdot) = r\, \frac{\nabla(\rho^{1/r})(\vartheta, \cdot)}{\rho^{1/r}(\vartheta, \cdot)} = V_{\vartheta}$  $\mu$-almost surely,

whereas on $\{\rho_{\vartheta} = 0\}$ we put $V_{\vartheta} \equiv 0 \equiv \widetilde V_{\vartheta}$ according to Assertion 4.1$(\ast)$. This gives the representation $(\circ\circ)$. □

4.1''' Exercise. Consider the location model $\mathcal P := \{F(\cdot - \zeta) : \zeta \in \mathbb R\}$ generated by the doubly exponential distribution $F(dx) = \frac12 e^{-|x|} dx$ on $(\mathbb R, \mathcal B(\mathbb R))$. Prove that $\mathcal P$ is $L^2$-differentiable in $\zeta = \vartheta$, for every $\vartheta \in \mathbb R$, with derivative $V_{\vartheta} = \mathrm{sgn}(\cdot - \vartheta)$.
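For this exercise, the $L^2(\mu)$-remainder of Definition 4.1' (with $\mu$ the Lebesgue measure, $r = 2$, reference point $\vartheta = 0$) can also be evaluated numerically. The following sketch (our own addition; grid and shift values are arbitrary) shows the remainder tending to $0$, in accordance with the claimed derivative $V_0 = \mathrm{sgn}(\cdot)$:

```python
# Laplace location family: check that
# (1/t²) ∫ ( √f(x−t) − √f(x) − t·(1/2)·sgn(x)·√f(x) )² dx  → 0  as t → 0,
# where f(x) = e^{−|x|}/2, i.e. V_0 = sgn is the L²-derivative at ϑ = 0.
import numpy as np

sqf = lambda x: np.exp(-np.abs(x) / 2) / np.sqrt(2.0)   # √f
x = np.linspace(-30.0, 30.0, 1_500_001)                 # fine grid, dx ≪ t
dx = x[1] - x[0]

for t in [0.5, 0.1, 0.02, 0.004]:
    rem = sqf(x - t) - sqf(x) - t * 0.5 * np.sign(x) * sqf(x)
    print(t, np.sum(rem**2) * dx / t**2)                # decreases to 0 with t
```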

The next step is to remove the domination assumption from the preliminary Definition 4.1'.

4.2 Definition. For $\Theta \subseteq \mathbb R^d$ open, consider a (not necessarily dominated) family $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ of probability measures on $(\Omega, \mathcal A)$. For $\zeta', \zeta \in \Theta$ let $L^{\zeta'/\zeta}$ denote a version of the likelihood ratio of $P_{\zeta'}$ with respect to $P_{\zeta}$, as defined in Notation 3.1(i). Fix $r \ge 1$. The model $\mathcal P$ is called $L^r$-differentiable in $\vartheta$ with derivative $V_{\vartheta}$ if the following (i), (iia), (iib) hold simultaneously:

    (i) $V_{\vartheta}$ has components $V_{\vartheta,1}, \ldots, V_{\vartheta,d} \in L^r(\Omega, \mathcal A, P_{\vartheta})$,
    (iia) $\frac{1}{|\zeta - \vartheta|^r}\, P_{\zeta}\left( L^{\zeta/\vartheta} = \infty \right) \to 0$, $|\zeta - \vartheta| \to 0$,
    (iib) $\frac{1}{|\zeta - \vartheta|^r} \int \left| \left( L^{\zeta/\vartheta} \right)^{1/r} - 1 - \frac1r (\zeta - \vartheta)^{\top} V_{\vartheta} \right|^r dP_{\vartheta} \to 0$, $|\zeta - \vartheta| \to 0$.


4.2' Remarks. (a) For arbitrary statistical models $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ and reference points $\vartheta \in \Theta$, it is sufficient to check (iia) and (iib) along sequences $(\zeta_n)_n \subseteq \Theta$ which converge to $\vartheta$ as $n \to \infty$.
(b) Fix any sequence $(\zeta_n)_n \subseteq \Theta$ which converges to $\vartheta$ as $n \to \infty$. Then the countable subfamily $\{P_{\zeta_n}, n \ge 1, P_{\vartheta}\}$ is dominated by some $\sigma$-finite measure $\mu$ (take e.g. $\mu := P_{\vartheta} + \sum_{n \ge 1} 2^{-n} P_{\zeta_n}$) which depends on the sequence under consideration. Let $\rho_{\zeta_n}$, $n \ge 1$, $\rho_{\vartheta}$ denote densities of $P_{\zeta_n}$, $n \ge 1$, $P_{\vartheta}$ with respect to $\mu$. Along this subfamily, the integral in Definition 4.1'(ii)

    $\int \left| \rho_{\zeta_n}^{1/r} - \rho_{\vartheta}^{1/r} - \frac1r\, \rho_{\vartheta}^{1/r}\, (\zeta_n - \vartheta)^{\top} V_{\vartheta} \right|^r d\mu = \int_{\{\rho_{\vartheta} = 0\}} \ldots \, d\mu + \int_{\{\rho_{\vartheta} > 0\}} \ldots \, d\mu$

splits into the two expressions which appear in (iia) and (iib) of Definition 4.2:

    $\int_{\{\rho_{\vartheta} = 0\}} \rho_{\zeta_n} \, d\mu + \int_{\{\rho_{\vartheta} > 0\}} \left| \left( \frac{\rho_{\zeta_n}}{\rho_{\vartheta}} \right)^{1/r} - 1 - \frac1r (\zeta_n - \vartheta)^{\top} V_{\vartheta} \right|^r \rho_{\vartheta} \, d\mu$
    $= P_{\zeta_n}\left( L^{\zeta_n/\vartheta} = \infty \right) + \int \left| \left( L^{\zeta_n/\vartheta} \right)^{1/r} - 1 - \frac1r (\zeta_n - \vartheta)^{\top} V_{\vartheta} \right|^r dP_{\vartheta}$.

Thus the preliminary Definition 4.1' and the final Definition 4.2 are equivalent in restriction to the subfamily $\{P_{\zeta_n}, n \ge 1, P_{\vartheta}\}$. Since we can consider arbitrary sequences $(\zeta_n)_n$ converging to $\vartheta$, we have proved that Definition 4.2 extends Definition 4.1' consistently. □

4.3 Example. On an arbitrary space $(\Omega, \mathcal A)$ consider $\mathcal P := \mathcal M_1(\Omega, \mathcal A)$, the set of all probability measures on $(\Omega, \mathcal A)$. Fix $r \ge 1$ and $P \in \mathcal P$. In the non-parametric model $E = (\Omega, \mathcal A, \mathcal P)$, we shall show that one-parametric paths $\{Q_{\vartheta} : |\vartheta| < \varepsilon\}$ through $P$ in directions

    $g \in L^r(P)$ such that $\int g \, dP = 0$

(defined in analogy to Example 1.3) are $L^r$-differentiable at all points of the parameter interval, and have $L^r$-derivative $V_0 = g$ at the origin $Q_0 = P$.
(a) Consider first the case of directions $g$ which are bounded, and write $M := \sup |g|$. Then the one-dimensional path through $P$ in $\mathcal P$

    $(\ast)$  $S_P^g := \left( \Omega, \mathcal A, \left\{ Q_{\vartheta} : |\vartheta| < \tfrac1M \right\} \right)$,  $dQ_{\vartheta} := (1 + \vartheta g)\, dP$,

is $L^r$-differentiable at every parameter value $\vartheta$, $|\vartheta| < \frac1M$, and the derivative $V_{\vartheta}$ at $\vartheta$ is given by

    $V_{\vartheta} = \frac{g}{1 + \vartheta g} \in L^r(\Omega, \mathcal A, Q_{\vartheta})$

(in analogy to expression (1.3'), and in agreement with Remark 4.1''). At $\vartheta = 0$ we have simply $V_0 = g$.
We check this as follows. From $|g| \le M$ and $|\vartheta| < \frac1M$, $V_{\vartheta}$ is bounded. Thus $V_{\vartheta} \in L^r(\Omega, \mathcal A, Q_{\vartheta})$ holds for all $\vartheta$ in $(\ast)$ and all $r \ge 1$. This is condition (i) in Definition 4.2. Since $S_P^g$ is dominated by $\mu := P$, conditions (iia) and (iib) of Definition 4.2 are equivalent to Condition 4.1'(ii) which we shall check now. For all $r \ge 1$,

    $\frac{(1 + \zeta g)^{1/r} - (1 + \vartheta g)^{1/r}}{\zeta - \vartheta} = \frac1r\, (1 + \xi g)^{\frac1r - 1}\, g = \frac1r\, (1 + \xi g)^{1/r}\, \frac{g}{1 + \xi g}$

for every $\omega$ fixed, with some $\xi$ (depending on $\omega$) between $\zeta$ and $\vartheta$. For $\varepsilon$ small enough and $\zeta \in B_{\varepsilon}(\vartheta)$, $|\xi|$ remains separated from $\frac1M$: thus dominated convergence as $\zeta \to \vartheta$ in the above line gives

    $\int_{\Omega} \left| \frac{(1 + \zeta g)^{1/r} - (1 + \vartheta g)^{1/r}}{\zeta - \vartheta} - \frac1r\, (1 + \vartheta g)^{1/r}\, \frac{g}{1 + \vartheta g} \right|^r d\mu \to 0$  as $|\zeta - \vartheta| \to 0$,

which is Condition 4.1'(ii).


(b) Consider now arbitrary directions $g \in L^r(P)$ with $E_P(g) = 0$. As in the second part of Example 1.3, use of truncation avoids boundedness assumptions. With $\psi \in C_0^{\infty}(\mathbb R)$ as there,

    $\psi(x) = x$ on $\{|x| < \tfrac13\}$,  $\psi(x) = 0$ on $\{|x| > 1\}$,  $\max_{x \in \mathbb R} |\psi| < \tfrac12$,

we define

    $(\ast\ast)$  $E = (\Omega, \mathcal A, \{Q_{\vartheta} : |\vartheta| < 1\})$,
    $Q_{\vartheta}(d\omega) := \left( 1 + \left[ \psi(\vartheta g(\omega)) - \int \psi(\vartheta g)\, dP \right] \right) P(d\omega)$.

In the special case of bounded $g$ as in (a), the one-parametric paths $(\ast)$ and $(\ast\ast)$ coincide for parameter values $\vartheta$ close to $0$. Uniformly in $(\vartheta, \omega)$, the densities

    $f(\vartheta, \omega) := 1 + \left[ \psi(\vartheta g(\omega)) - \int \psi(\vartheta g)\, dP \right]$

in $(\ast\ast)$ are bounded away from both $0$ and $2$, by choice of $\psi$. Since $\psi$ is Lipschitz, dominated convergence gives

    $\frac{d}{d\vartheta} \int \psi(\vartheta g)\, dP = \int g\, \psi'(\vartheta g)\, dP$.

First, with $\widetilde V_{\vartheta}$ defined from the mapping $(\diamond)$ in 4.1, our assumptions on $\psi$ and $g$ imply that

    $\widetilde V_{\vartheta} = \frac{d}{d\vartheta} f_{\vartheta}^{1/r} = \frac1r\, f_{\vartheta}^{(\frac1r - 1)}\, \frac{d}{d\vartheta} f_{\vartheta} = \frac1r\, f_{\vartheta}^{(\frac1r - 1)}\, \left( g\, \psi'(\vartheta g) - \int g\, \psi'(\vartheta g)\, dP \right)$

belongs to $L^r(\Omega, \mathcal A, P)$ (since $f_{\vartheta}$ is bounded away from $0$, and $\psi'$ is bounded), or equivalently that

    $V_{\vartheta} = r\, f_{\vartheta}^{-1/r}\, \widetilde V_{\vartheta} = \frac{g\, \psi'(\vartheta g) - \int g\, \psi'(\vartheta g)\, dP}{1 + \left[ \psi(\vartheta g) - \int \psi(\vartheta g)\, dP \right]} = \frac{\partial}{\partial \vartheta} \log f(\vartheta, \cdot)$

belongs to $L^r(\Omega, \mathcal A, Q_{\vartheta})$, in analogy to Example 1.3(b). Thus condition (i) in Definition 4.1' is checked. Second, exploiting again our assumptions on $\psi$ and $g$ to work with dominated convergence, we obtain

    $\frac{1}{|\zeta - \vartheta|^r} \int \left| f_{\zeta}^{1/r} - f_{\vartheta}^{1/r} - (\zeta - \vartheta)\, \widetilde V_{\vartheta} \right|^r dP \to 0$  as $|\zeta - \vartheta| \to 0$.

This establishes condition (ii) in Definition 4.1'. According to Definition 4.1', we have proved $L^r$-differentiability in the one-parametric path $E$ at every parameter value $|\vartheta| < 1$. In particular, the derivative at $\vartheta = 0$ is

    $(\circ)$  $V_0(\omega) = g(\omega)$, $\omega \in \Omega$,

which does not depend on the particular construction – through choice of $\psi$ – of the path $E$ through $P$. Note that the last assertion $(\circ)$ holds for arbitrary $P \in \mathcal P$ and for arbitrary directions $g \in L^r(P)$ such that $E_P(g) = 0$. □
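As a numerical illustration of part (a) (our own sketch, with the arbitrary choices $P = \mathcal N(0,1)$ and bounded direction $g = \tanh$, so that $E_P(g) = 0$ and $M = 1$), the $L^2(P)$-remainder of Definition 4.1' at $\vartheta = 0$ vanishes faster than $\vartheta$:

```python
# One-parametric path dQ_ϑ = (1 + ϑ g) dP through P = N(0,1), direction g = tanh:
# check that (1/ϑ²) E_P[ ( √(1+ϑg) − 1 − ϑg/2 )² ] → 0 as ϑ → 0  (r = 2).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(2_000_000)
g = np.tanh(X)                          # bounded, E_P(g) = 0 by symmetry

for th in [0.5, 0.1, 0.02, 0.004]:
    rem = np.sqrt(1 + th * g) - 1 - th * g / 2
    print(th, np.mean(rem**2) / th**2)  # → 0 (at rate ϑ², since rem ≈ −ϑ²g²/8)
```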

4.4 Theorem. Consider a family $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ where $\Theta \subseteq \mathbb R^d$ is open. If $\mathcal P$ is $L^r$-differentiable at $\zeta = \vartheta$ with derivative $V_{\vartheta}$ for some $r > 1$, then $\mathcal P$ is $L^s$-differentiable at $\zeta = \vartheta$ with the same derivative $V_{\vartheta}$ for all $1 \le s \le r$.

Proof. Our proof follows Witting [127, pp. 175–176].
(1) Fix $1 \le s < r$, put $t := \frac{r}{r-s}$ (such that $t > 1$, $\frac{s}{r} + \frac1t = 1$, and $ts = \frac{rs}{r-s}$), and define functions $\varphi$, $\psi$ from $[0, \infty)$ to $(0, \infty)$ by

    $\varphi(y) := \frac{s}{r} \cdot \frac{y^{1/s} - 1}{y^{1/r} - 1}$,  $\psi(y) := \varphi^{\frac{rs}{r-s}}(y) = \varphi^{ts}(y)$, $y \ge 0$.

Check that $\varphi$ is continuous at $y = 1$ (with $\varphi(1) = 1$), and thus continuous on $[0, \infty)$. For

    $y > \max\left\{ \left( \tfrac{r}{r-s} \right)^r, 1 \right\}$

we have the implications

    $y^{1/r} > \frac{r}{r-s} \implies \left( \frac{r}{s} - 1 \right) y^{1/r} > \frac{r}{s} \implies \frac{r}{s}\left( y^{1/r} - 1 \right) > y^{1/r}$

and thus by definition of $t$

    $\psi(y) = \varphi^{ts}(y) = \left( \frac{s}{r} \cdot \frac{y^{1/s} - 1}{y^{1/r} - 1} \right)^{ts} < \left( \frac{y^{1/s} - 1}{y^{1/r}} \right)^{ts} < y^{ts(\frac1s - \frac1r)} = y$.

From this and continuity of $\psi$ on $[0, \infty)$, the function $\psi$ satisfies a linear growth condition

    $(\circ)$  $0 \le \psi(y) = \varphi^{ts}(y) \le M(1 + y)$ for all $y \ge 0$,

for some constant $M$.
(2) We exploit the assumption of $L^r$-differentiability at $\vartheta$. For sequences $(\zeta_m)_m$ converging to $\vartheta$ in $\Theta$, consider likelihood ratios $L^{\zeta_m/\vartheta}$ of $P_{\zeta_m}$ with respect to $P_{\vartheta}$. As a consequence of (i) in Definition 4.2, the mapping $u \mapsto \| u^{\top} V_{\vartheta} \|_{L^r(P_{\vartheta})}$ is continuous on $\mathbb R^d$, hence bounded on the unit sphere. Thus $(\zeta_m - \vartheta)^{\top} V_{\vartheta}$ vanishes in $L^r(P_{\vartheta})$. From (iib) of Definition 4.2 and the inverse triangle inequality,

    $\left| \left\| (L^{\zeta_m/\vartheta})^{1/r} - 1 \right\|_{L^r(P_{\vartheta})} - \left\| \tfrac1r (\zeta_m - \vartheta)^{\top} V_{\vartheta} \right\|_{L^r(P_{\vartheta})} \right| \le o(|\zeta_m - \vartheta|)$,

and both last assertions together establish

    $(\diamond)$  $(L^{\zeta_m/\vartheta})^{1/r} \to 1$ in $L^r(P_{\vartheta})$ as $m \to \infty$.

First, $(\diamond)$ implying convergence in probability, we have

    $(+)$  $g\left( L^{\zeta_m/\vartheta} \right) \to 1$ in $P_{\vartheta}$-probability

for any continuous function $g : [0, \infty) \to \mathbb R$ with $g(1) = 1$; second, $(\diamond)$ gives

    $(++)$  $E_{\vartheta}\left( L^{\zeta_m/\vartheta} \right) = E_{\vartheta}\left[ \left( (L^{\zeta_m/\vartheta})^{1/r} \right)^r \right] \to 1$ as $m \to \infty$.
(3) We wish to prove

    $(\diamond\diamond)$  $\varphi\left( L^{\zeta_m/\vartheta} \right) \to 1$ in $L^{st}(P_{\vartheta})$ as $m \to \infty$.

Fix $q \ge 1$. Recall that for variables $f_m \in L^q(P_{\vartheta})$ converging to some random variable $f$ in $P_{\vartheta}$-probability as $m \to \infty$, we have

    $\{|f_m|^q : m \ge 1\}$ uniformly integrable under $P_{\vartheta}$
    $\iff$ $f \in L^q(P_{\vartheta})$ and $f_m \to f$ in $L^q(P_{\vartheta})$;

in the special case where $q = 1$ and $f_m \ge 0$ for all $m$, we can extend this to a chain of equivalences

    $\iff$ $f \in L^1(P_{\vartheta})$ and $E_{\vartheta}(f_m) \to E_{\vartheta}(f)$ as $m \to \infty$;

for the last equivalence, see [20, Nr. 21 in Chapter II]. In the case where $q = 1$, based on (+) and (++) in step (2), the above equivalences applied to $f \equiv 1$ and $f_m = L^{\zeta_m/\vartheta}$ establish

    $\left\{ L^{\zeta_m/\vartheta} : m \ge 1 \right\}$ uniformly integrable under $P_{\vartheta}$

and thus, thanks to the linear growth condition $(\circ)$ for the function $\psi = \varphi^{st} \ge 0$ in step (1),

    $\left\{ \varphi^{st}\left( L^{\zeta_m/\vartheta} \right) : m \ge 1 \right\}$ uniformly integrable under $P_{\vartheta}$.

Finally, based on convergence in probability (+), we put $q = st$ and apply the first of the equivalences above to $f \equiv 1$ and $f_m = \varphi(L^{\zeta_m/\vartheta})$: this establishes $(\diamond\diamond)$.
(4) With these preparations, we come to the core of the proof. Regarding first (i) and (iia) in Definition 4.2, it is clear that these conditions are valid for $1 \le s \le r$ when they are valid for $r > 1$. We turn to condition (iib) in Definition 4.2. By definition of the function $\varphi$ in step (1), we can upper-bound the expression

    $(\bullet)$  $\left\| s\left[ (L^{\zeta/\vartheta})^{1/s} - 1 \right] - (\zeta - \vartheta)^{\top} V_{\vartheta} \right\|_{L^s(P_{\vartheta})}$

in a first step (using the triangle inequality) by

    $\left\| \left\{ r\left[ (L^{\zeta/\vartheta})^{1/r} - 1 \right] - (\zeta - \vartheta)^{\top} V_{\vartheta} \right\} \varphi(L^{\zeta/\vartheta}) \right\|_{L^s(P_{\vartheta})} + \left\| (\zeta - \vartheta)^{\top} V_{\vartheta} \left[ \varphi(L^{\zeta/\vartheta}) - 1 \right] \right\|_{L^s(P_{\vartheta})}$

and in a second step using the Hölder inequality (we have $\frac{1}{r/s} + \frac1t = 1$) by

    $\left\| r\left[ (L^{\zeta/\vartheta})^{1/r} - 1 \right] - (\zeta - \vartheta)^{\top} V_{\vartheta} \right\|_{L^r(P_{\vartheta})} \cdot \left\| \varphi(L^{\zeta/\vartheta}) \right\|_{L^{ts}(P_{\vartheta})}$
    $+ \left\| (\zeta - \vartheta)^{\top} V_{\vartheta} \right\|_{L^r(P_{\vartheta})} \cdot \left\| \varphi(L^{\zeta/\vartheta}) - 1 \right\|_{L^{ts}(P_{\vartheta})}$.

In this last right-hand side, as $\zeta$ tends to $\vartheta$, the first product is of order

    $o(|\zeta - \vartheta|) \cdot \left\| \varphi(L^{\zeta/\vartheta}) \right\|_{L^{ts}(P_{\vartheta})} = o(|\zeta - \vartheta|)$

and the second of order

    $|\zeta - \vartheta| \cdot \left\| \varphi(L^{\zeta/\vartheta}) - 1 \right\|_{L^{ts}(P_{\vartheta})} = o(|\zeta - \vartheta|)$

by $L^r$-differentiability at $\vartheta$ – we use (iib) of Definition 4.2 in the first case, and $\max_{|u|=1} \| u^{\top} V_{\vartheta} \|_{L^r(P_{\vartheta})} < \infty$ in the second case – in combination with the $L^{ts}$-convergence $(\diamond\diamond)$. Summarising, we have proved that $(\bullet)$ is of order $o(|\zeta - \vartheta|)$ as $\zeta \to \vartheta$: thus condition (iib) in Definition 4.2 holds for $1 \le s < r$ if it holds for $r > 1$. □

As a consequence, we now prove that derivatives $V_{\vartheta}$ arising in Definitions 4.1' or 4.2 are always centred:

4.5 Corollary. Consider a family $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ where $\Theta \subseteq \mathbb R^d$ is open, and fix $r \ge 1$. If $\mathcal P$ is $L^r$-differentiable at $\zeta = \vartheta$ with derivative $V_{\vartheta}$, then $E_{\vartheta}(V_{\vartheta}) = 0$.

Proof. From Theorem 4.4, $\mathcal P$ is $L^1$-differentiable at $\zeta = \vartheta$ with the same derivative $V_{\vartheta}$. Consider a unit vector $e_i$ in $\mathbb R^d$, a sequence $\delta_n \downarrow 0$, a dominating measure $\mu$ for the countable family $\{P_{\vartheta + \delta_n e_i}, n \ge 1, P_{\vartheta}\}$, and associated densities. Then Definition 4.1' with $r = 1$ gives

    $\frac{\rho_{\vartheta + \delta_n e_i} - \rho_{\vartheta}}{\delta_n} - \rho_{\vartheta}\, V_{\vartheta,i} \to 0$ in $L^1(\mu)$ as $n \to \infty$,

and thus $E_{P_{\vartheta}}(V_{\vartheta,i}) = \int \rho_{\vartheta}\, V_{\vartheta,i} \, d\mu = 0$. This holds for $1 \le i \le d$. □

From now on we focus on the case $r = 2$, which is of particular importance. Assuming $L^2$-differentiability at all points $\vartheta \in \Theta$, the derivatives have components

    $(\ast)$  $V_{\vartheta,i} \in L^2(P_{\vartheta})$ such that $E_{P_{\vartheta}}(V_{\vartheta,i}) = 0$, $1 \le i \le d$,

by Definition 4.2(i) and Corollary 4.5. In the special case of dominated models admitting continuous densities for which partial derivatives exist, $V_{\vartheta}$ necessarily has all properties of the score considered in Definition 1.2, cf. Remark 4.1''. In this sense, the present setting generalises Definition 1.2 and allows us to transfer notions such as score or information to $L^2$-differentiable statistical models.

4.6 Definition. Consider $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ where $\Theta \subseteq \mathbb R^d$ is open. If a point $\vartheta \in \Theta$ is such that

    $(\diamond)$  the family $\mathcal P$ is $L^2$-differentiable at $\vartheta$ with derivative $V_{\vartheta}$,

we call $V_{\vartheta}$ the score and

    $J_{\vartheta} := E_{\vartheta}\left( V_{\vartheta} V_{\vartheta}^{\top} \right)$

the Fisher information at $\vartheta$. The family $\mathcal P$ is called $L^2$-differentiable if $(\diamond)$ holds for all $\vartheta \in \Theta$.
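For the Laplace location family of Exercise 4.1''', score and Fisher information in the sense of Definition 4.6 are explicit: $V_{\vartheta} = \mathrm{sgn}(\cdot - \vartheta)$ and $J_{\vartheta} = E_{\vartheta}(V_{\vartheta}^2) = 1$. A two-line Monte Carlo check (our own sketch; sample size arbitrary):

```python
# Score and Fisher information for the Laplace location family at ϑ:
# V_ϑ = sgn(· − ϑ) is centred under P_ϑ (Corollary 4.5), and J_ϑ = 1.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7
X = theta + rng.laplace(0.0, 1.0, 1_000_000)   # sample from P_ϑ
V = np.sign(X - theta)                         # score V_ϑ

print(V.mean())        # ≈ 0   (E_ϑ V_ϑ = 0)
print((V**2).mean())   # = 1   (J_ϑ = 1, since V_ϑ² ≡ 1 almost surely)
```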

In the light of Definition 4.2, the Hellinger distance $H(\cdot, \cdot)$ between probability measures,

    $H^2(Q_1, Q_2) = \frac12 \int \left| \rho_1^{1/2} - \rho_2^{1/2} \right|^2 d\mu \in [0, 1]$

on $(\Omega, \mathcal A)$ (as in Definition 1.18, not depending on the choice of the $\sigma$-finite measure $\mu$ which dominates $Q_1$ and $Q_2$, nor on the versions of the densities $\rho_i = \frac{dQ_i}{d\mu}$, $i = 1, 2$), gives the geometry of experiments which are $L^2$-differentiable. This will be seen in Proposition 4.8 below.

4.7 Proposition. Without further assumptions on a statistical model $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$, we have

    $2 H^2(P_{\zeta'}, P_{\zeta}) = E_{\zeta}\left( \left( \sqrt{L^{\zeta'/\zeta}} - 1 \right)^2 \right) + P_{\zeta'}\left( L^{\zeta'/\zeta} = \infty \right)$

for all $\zeta', \zeta$ in $\Theta$, and

    $E_{\zeta}\left( \sqrt{L^{\zeta'/\zeta}} - 1 \right) = -H^2(P_{\zeta'}, P_{\zeta})$.

Proof. By Notations 3.1, for any $\sigma$-finite measure $\mu$ on $(\Omega, \mathcal A)$ which dominates $P_{\zeta'}$ and $P_{\zeta}$, and for any choice $\rho_{\zeta'}$, $\rho_{\zeta}$ of $\mu$-densities, $L^{\zeta'/\zeta}$ coincides $(P_{\zeta} + P_{\zeta'})$-almost surely with $\frac{\rho_{\zeta'}}{\rho_{\zeta}} 1_{\{\rho_{\zeta} > 0\}} + \infty\, 1_{\{\rho_{\zeta} = 0\}}$. As in step (1) of the proof of Proposition 3.10,

    $1 - E_{\zeta}\left( L^{\zeta'/\zeta} \right) = P_{\zeta'}\left( \rho_{\zeta} = 0 \right) = P_{\zeta'}\left( L^{\zeta'/\zeta} = \infty \right)$

is the total mass of the $P_{\zeta}$-singular part of $P_{\zeta'}$. Now the decomposition

    $\int_{\Omega} \left( \sqrt{\rho_{\zeta'}} - \sqrt{\rho_{\zeta}} \right)^2 d\mu = \int_{\{\rho_{\zeta} > 0\}} \left( \sqrt{\rho_{\zeta'}} - \sqrt{\rho_{\zeta}} \right)^2 d\mu + P_{\zeta'}\left( \rho_{\zeta} = 0 \right)$

yields the first assertion. Since $\sqrt{L^{\zeta'/\zeta}} \in L^2(P_{\zeta})$, all expectations in

    $(+)$  $E_{\zeta}\left( \sqrt{L^{\zeta'/\zeta}} - 1 \right) = \frac12 E_{\zeta}\left( L^{\zeta'/\zeta} - 1 \right) - \frac12 E_{\zeta}\left( \left( \sqrt{L^{\zeta'/\zeta}} - 1 \right)^2 \right)$

are well defined, where the equality in (+) is the elementary $\sqrt a - 1 = \frac12(a - 1) - \frac12(\sqrt a - 1)^2$ for $a \ge 0$. Taking together the preceding three lines yields the second assertion. □

4.8 Proposition. In a model $\mathcal P = \{P_{\zeta} : \zeta \in \Theta\}$ where $\Theta \subseteq \mathbb R^d$ is open, consider sequences $(\zeta_m)_m$ which approach $\vartheta$ from direction $u \in S^{d-1}$ ($S^{d-1}$ the unit sphere in $\mathbb R^d$) at arbitrary rate $(\delta_m)_m$:

    $\delta_m := |\zeta_m - \vartheta| \to 0$,  $u_m := \frac{\zeta_m - \vartheta}{|\zeta_m - \vartheta|} \to u$, $m \to \infty$.

If $\mathcal P$ is $L^2$-differentiable in $\zeta = \vartheta$ with derivative $V_{\vartheta}$, then (with notation $a_n \sim b_n$ for $\lim_{n\to\infty} \frac{a_n}{b_n} = 1$)

    $2 H^2(P_{\zeta_m}, P_{\vartheta}) \sim E_{\vartheta}\left( \left( \sqrt{L^{\zeta_m/\vartheta}} - 1 \right)^2 \right) \sim \frac14\, \delta_m^2\, u^{\top} J_{\vartheta}\, u$

as $m \to \infty$, whenever $u^{\top} J_{\vartheta}\, u$ is strictly positive.

Proof. With respect to a given sequence $\zeta_m \to \vartheta$ satisfying the assumptions of the proposition, define a dominating measure $\mu$ and densities $\rho_{\zeta_m}$, $\rho_{\vartheta}$ as in Remark 4.2'(b). By $L^2$-differentiability at $\vartheta$ we approximate first

    $\frac{\sqrt{\rho_{\zeta_m}} - \sqrt{\rho_{\vartheta}}}{|\zeta_m - \vartheta|}$  by  $\frac12 \sqrt{\rho_{\vartheta}}\, u_m^{\top} V_{\vartheta}$  in $L^2(\mu)$

as $m \to \infty$, using Definition 4.2(iib), and then

    $\frac12 \sqrt{\rho_{\vartheta}}\, u_m^{\top} V_{\vartheta}$  by  $\frac12 \sqrt{\rho_{\vartheta}}\, u^{\top} V_{\vartheta}$  in $L^2(\mu)$,

since $(\zeta_m)_m$ approaches $\vartheta$ from direction $u$. This gives

    $\frac{2 H^2(P_{\zeta_m}, P_{\vartheta})}{\delta_m^2} = \int \left( \frac{\sqrt{\rho_{\zeta_m}} - \sqrt{\rho_{\vartheta}}}{|\zeta_m - \vartheta|} \right)^2 d\mu \to \frac14\, E_{\vartheta}\left( [u^{\top} V_{\vartheta}]^2 \right) = \frac14\, u^{\top} J_{\vartheta}\, u$

as $m \to \infty$, by Definition 4.6 of the Fisher information. $L^2$-differentiability at $\vartheta$ also guarantees

    $P_{\zeta_m}\left( L^{\zeta_m/\vartheta} = \infty \right) = o(\delta_m^2)$

as $m \to \infty$, from Definition 4.2(iia); thus the first assertion of Proposition 4.7 completes the proof. □

The sphere $S^{d-1}$ being compact in $\mathbb R^d$, an arbitrary sequence $(\vartheta_n)_n \subset \Theta$ converging to $\vartheta$ contains subsequences $(\vartheta_{n_m})_m$ which approach $\vartheta$ from some direction $u \in S^{d-1}$, as required in Proposition 4.8.
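The Hellinger asymptotics of Proposition 4.8 can be checked in closed form in the Gaussian location model. The following numerical sketch is not taken from the book; it assumes the model $\{N(\vartheta,1) : \vartheta\in\mathbb R\}$, where $d=1$, $J_\vartheta \equiv 1$, $u=1$, and where the elementary identity $H^2(N(a,1), N(b,1)) = 1 - \exp(-(a-b)^2/8)$ holds:

import numpy as np

# hypothetical check of Proposition 4.8 for {N(theta,1)}: here 2 H^2 ~ delta^2/4
theta = 0.5
for delta in [1.0, 0.1, 0.01]:
    H2 = 1.0 - np.exp(-delta**2 / 8.0)            # H^2(N(theta+delta,1), N(theta,1))
    print(delta, 2.0 * H2 / (0.25 * delta**2))    # ratio tends to 1 as delta -> 0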

4.9 Remark. By Definition 4.6, in an $L^2$-differentiable experiment, the Fisher information $J_\vartheta$ at $\vartheta$ is symmetric and non-negative definite. If $J_\vartheta$ admits an eigenvalue $0$, then for a corresponding eigenvector $u$ and for sequences $(\vartheta_m)_m$ approaching $\vartheta$ from direction $u$, the Hellinger distance is unable to distinguish $\vartheta_m$ from $\vartheta$ at rate $|\vartheta_m - \vartheta|$ as $m\to\infty$, and the proof of the last proposition only gives

$H^2\big( P_{\vartheta_m}, P_\vartheta \big) = o\big( |\vartheta_m - \vartheta|^2 \big)\,,\qquad E_\vartheta\big( (\sqrt{L^{\vartheta_m/\vartheta}} - 1)^2 \big) = o\big( |\vartheta_m - \vartheta|^2 \big)$

as $m\to\infty$. This explains the crucial role of assumptions of strictly positive definite Fisher information in an experiment. Only in this last case, the geometry of the experiment in terms of Hellinger distance is locally at $\vartheta$ equivalent – up to some deformation expressed by $J_\vartheta$ – to Euclidean geometry on the parameter space $\Theta$ as a subset of $\mathbb R^d$. □

4.2 Le Cam's Second Lemma for i.i.d. Observations

Under independent replication of experiments, $L^2$-differentiability of a statistical model $E$ at a point $\vartheta$ makes log-likelihood ratios in the product models $E_n$ resemble – locally in small neighbourhoods of $\vartheta$, which we reparameterise via $\vartheta + h/\sqrt n$ in terms of a local parameter $h$ – the log-likelihoods in the Gaussian shift model

$\big\{ N(J_\vartheta h\,,\, J_\vartheta) : h \in \mathbb R^d \big\}$

where $J_\vartheta$ is the Fisher information at $\vartheta$ in $E$. This goes back to Le Cam, see [81]; since Hájek and Šidák [41], results which establish an approximation of this type are called a 'second Le Cam lemma'. From the next chapter on, we will exploit similar approximations of log-likelihood ratios. Beyond the i.i.d. setting, such approximations do exist in a large variety of contexts where a statistical model is smoothly parameterised and where strong laws of large numbers and central limit theorems are at hand (e.g. autoregressive processes, ergodic diffusions, ergodic Markov processes, . . . ).

4.9' Exercise. For $J \in \mathbb R^{d\times d}$ symmetric and strictly positive definite, calculate log-likelihood ratios in the experiment $\{ N(Jh, J) : h \in \mathbb R^d \}$ and compare to the expansion given in 4.11 below.

Throughout this section the following assumptions will be in force.

4.10 Assumptions and Notations for Section 4.2. (a) We work with an experiment

$E = \big( \Omega\,,\ \mathcal A\,,\ \mathcal P = \{ P_\vartheta : \vartheta\in\Theta \} \big)\,,\qquad \Theta\subset\mathbb R^d \text{ open}$

with likelihood ratios $L^{\vartheta'/\vartheta}$ of $P_{\vartheta'}$ with respect to $P_\vartheta$. Notation $\vartheta$ for a point in the parameter space implies that the following is satisfied:

$\mathcal P = \{ P_\vartheta : \vartheta\in\Theta \}$ is $L^2$-differentiable at $\vartheta$ with derivative $V_\vartheta$.

In this case, we write $J_\vartheta$ for the Fisher information at $\vartheta$ in $E$, cf. Definition 4.6:

$J_\vartheta = E_\vartheta\big( V_\vartheta V_\vartheta^\top \big)\,.$

(b) In $n$-fold product experiments

$E_n = \Big( \Omega_n := \mathop{\times}\limits_{i=1}^{n} \Omega\,,\ \ \mathcal A_n := \mathop{\otimes}\limits_{i=1}^{n} \mathcal A\,,\ \ \Big\{ P^n_\vartheta := \mathop{\otimes}\limits_{i=1}^{n} P_\vartheta : \vartheta\in\Theta \Big\} \Big)\,,$

$L_n^{\vartheta'/\vartheta}$ and $\Lambda_n^{\vartheta'/\vartheta}$ denote the likelihood ratio and the log-likelihood ratio of $P^n_{\vartheta'}$ with respect to $P^n_\vartheta$. At points $\vartheta\in\Theta$ where $E$ is $L^2$-differentiable, cf. (a), we write

$S_n(\vartheta)(\omega_1,\dots,\omega_n) := \frac{1}{\sqrt n} \sum_{j=1}^n V_\vartheta(\omega_j)\,,\qquad \omega = (\omega_1,\dots,\omega_n)\in\Omega_n$

and have from Definition 4.6 (which includes Corollary 4.5) and the central limit theorem

$\mathcal L\big( S_n(\vartheta) \mid P^n_\vartheta \big) \longrightarrow N(0, J_\vartheta)$  (weak convergence in $\mathbb R^d$)  as $n\to\infty$.

Here is the main result of this section:

4.11 Le Cam's Second Lemma for i.i.d. Observations. At points $\vartheta\in\Theta$ which satisfy the Assumptions 4.10, for bounded sequences $(h_n)_n$ in $\mathbb R^d$ (such that $\vartheta + h_n/\sqrt n$ is in $\Theta$), we have

$\Lambda_n^{(\vartheta + h_n/\sqrt n)/\vartheta} \;=\; h_n^\top S_n(\vartheta) \;-\; \frac12\, h_n^\top J_\vartheta h_n \;+\; o_{P^n_\vartheta}(1)\,,\qquad n\to\infty$

where $\mathcal L\big( S_n(\vartheta) \mid P^n_\vartheta \big)$ converges weakly in $\mathbb R^d$ as $n\to\infty$ to $N(0, J_\vartheta)$.

The proof of Lemma 4.11 will be given in 4.15, after a series of auxiliary results.
4.12 Proposition. For bounded sequences $(h_n)_n$ in $\mathbb R^d$ and points $\vartheta_n := \vartheta + h_n/\sqrt n$ in $\Theta$, we have

$\sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big) \;=\; \frac{1}{2\sqrt n} \sum_{j=1}^n h_n^\top V_\vartheta(\omega_j) \;-\; \frac18\, h_n^\top J_\vartheta h_n \;+\; o_{P^n_\vartheta}(1)\,,\qquad n\to\infty\,.$

Proof. (1) Fix $\vartheta$ satisfying Assumptions 4.10, and define for $0\neq h\in\mathbb R^d$

$r_\vartheta(n,h) := \frac{1}{|h|/\sqrt n} \Big( \sqrt{L^{(\vartheta+h/\sqrt n)/\vartheta}} - 1 - \frac{1}{2\sqrt n}\, h^\top V_\vartheta \Big)$

in the experiment $E$. Then for any choice of a constant $C<\infty$, $L^2$-differentiability at $\vartheta$ implies

(⋄)  $\sup_{0<|h|\le C} E_\vartheta\big( [r_\vartheta(n,h)]^2 \big) \to 0$  as $n\to\infty$:

to see this, select in $\{|h|\le C\}$ at stage $n$ some $h_n$ such that $E_\vartheta([r_\vartheta(n,h_n)]^2)$ is sufficiently close to the left-hand side of (⋄), and apply part (iib) of Definition 4.2 to the sequence $\vartheta_n := \vartheta + h_n/\sqrt n$ as $n\to\infty$.

(2) For $n\ge1$, consider in $E_n$

(∗)  $A_{n,h}(\omega) := \sum_{j=1}^n \Big( \sqrt{L^{(\vartheta+h/\sqrt n)/\vartheta}} - 1 - \frac{1}{2\sqrt n}\, h^\top V_\vartheta \Big)(\omega_j) = \frac{|h|}{\sqrt n} \sum_{j=1}^n r_\vartheta(n,h)(\omega_j)$

for $0\neq h\in\mathbb R^d$. From the second assertion in Proposition 4.7 combined with Corollary 4.5 we know

$E_\vartheta( A_{n,h} ) = n\, E_\vartheta\Big( \sqrt{L^{(\vartheta+h/\sqrt n)/\vartheta}} - 1 - \frac{1}{2\sqrt n}\, h^\top V_\vartheta \Big) = -\,n\, H^2\big( P_{\vartheta+h/\sqrt n}\,,\, P_\vartheta \big)\,.$

Consider now a bounded sequence $(h_n)_n$ in $\mathbb R^d$. The unit sphere $S^{d-1}$ being compact, we can select subsequences $(h_{n_k})_k$ whose directions $u_k := \frac{h_{n_k}}{|h_{n_k}|}$ approach limits $u\in S^{d-1}$. Thus we deduce from Proposition 4.8

(∗∗)  $E_\vartheta\big( A_{n,h_n} \big) = -\,n\, H^2\big( P_{\vartheta+h_n/\sqrt n}, P_\vartheta \big) = -\,\frac18\, h_n^\top J_\vartheta h_n + o(1)$

as $n\to\infty$. On the other hand, calculating variances for (∗) we find from (⋄)

$\sup_{|h|\le C} \mathrm{Var}_\vartheta( A_{n,h} ) \;\le\; C^2 \sup_{|h|\le C} \mathrm{Var}_\vartheta\big( r_\vartheta(n,h) \big) \;\le\; C^2 \sup_{|h|\le C} E_\vartheta\big( [r_\vartheta(n,h)]^2 \big) \;\longrightarrow\; 0$

as $n\to\infty$. Thus the sequence (∗) behaves as the sequence of its expectations (∗∗):

$A_{n,h_n} = E_\vartheta\big( A_{n,h_n} \big) + o_{P^n_\vartheta}(1) = -\,\frac18\, h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)$

as $n\to\infty$, which is the assertion. □
4.13 Proposition. For bounded sequences $(h_n)_n$ in $\mathbb R^d$ and points $\vartheta_n := \vartheta + h_n/\sqrt n$, we have

$\sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big)^2 = \frac14\, h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)\,,\qquad n\to\infty\,.$

Proof. We write $\varepsilon_n(\omega), \tilde\varepsilon_n(\omega), \dots$ for remainder terms $o_{P^n_\vartheta}(1)$ defined on $E_n$. By Definition 4.6 of the Fisher information and the strong law of large numbers,

$\frac1n \sum_{j=1}^n V_\vartheta V_\vartheta^\top(\omega_j) = E_\vartheta\big( V_\vartheta V_\vartheta^\top \big) + \varepsilon_n(\omega) = J_\vartheta + \varepsilon_n(\omega)$

as $n\to\infty$. Thus we have for bounded sequences $(h_n)_n$

(+)  $\sum_{j=1}^n \Big( \frac{1}{\sqrt n}\, h_n^\top V_\vartheta(\omega_j) \Big)^2 = h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)\,,\qquad n\to\infty\,.$

With notation $r_\vartheta(n,h)$ introduced at the start of the proof of Proposition 4.12 we write

(++)  $\sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 = \frac{1}{2\sqrt n}\, h_n^\top V_\vartheta(\omega_j) + \frac{|h_n|}{\sqrt n}\, r_\vartheta(n, h_n)(\omega_j)\,.$

As $n\to\infty$, we take squares of both the right-hand sides and the left-hand sides in (++), and sum over $1\le j\le n$. Thanks to (⋄) in the proof of Proposition 4.12,

$\sup_{|h|\le C} E_\vartheta\big( r_\vartheta(n,h)^2 \big) \to 0$  as $n\to\infty\,,$

and the Cauchy–Schwarz inequality, the terms

$\frac1n \sum_{j=1}^n [r_\vartheta(n,h_n)]^2(\omega_j)$  and  $\frac1n \sum_{j=1}^n \Big| \frac{h_n^\top}{|h_n|}\, V_\vartheta(\omega_j)\; r_\vartheta(n,h_n)(\omega_j) \Big|$

then vanish as $n\to\infty$ on the right-hand sides, and we obtain

$\sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big)^2 = \sum_{j=1}^n \Big( \frac12\, \frac{1}{\sqrt n}\, h_n^\top V_\vartheta(\omega_j) \Big)^2 + \tilde\varepsilon_n(\omega) = \frac14\, h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)$

as a consequence of (++), (⋄) in the proof of Proposition 4.12, and (+). □
4.14 Proposition. For bounded sequences $(h_n)_n$ in $\mathbb R^d$ and $\vartheta_n := \vartheta + h_n/\sqrt n$,

$\Lambda_n^{\vartheta_n/\vartheta}(\omega) \;=\; 2 \sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big) \;-\; \sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big)^2 \;+\; o_{P^n_\vartheta}(1)$

as $n\to\infty$.

Proof. (1) The idea of the proof is a logarithmic expansion

$\log(1+z) = z - \frac12 z^2 + o(z^2)$  as $z\to0$

which for bounded sequences $(h_n)_n$ and $\vartheta_n = \vartheta + h_n/\sqrt n$ should give

$\Lambda_n^{\vartheta_n/\vartheta}(\omega) = \sum_{j=1}^n \log L^{\vartheta_n/\vartheta}(\omega_j) = 2 \sum_{j=1}^n \log\Big( 1 + \big( \sqrt{L^{\vartheta_n/\vartheta}} - 1 \big)(\omega_j) \Big)$

$= 2 \sum_{j=1}^n \big( \sqrt{L^{\vartheta_n/\vartheta}} - 1 \big)(\omega_j) \;-\; \sum_{j=1}^n \big( \sqrt{L^{\vartheta_n/\vartheta}} - 1 \big)^2(\omega_j) + \cdots$

where we have to consider carefully the different remainder terms which arise as $n\to\infty$.

(2) In $E_n$, we do have

(○)  $\Lambda_n^{\vartheta_n/\vartheta}(\omega) = \sum_{j=1}^n \log L^{\vartheta_n/\vartheta}(\omega_j)$  for $P^n_\vartheta$-almost all $\omega = (\omega_1,\dots,\omega_n)\in\Omega_n$

which justifies the first '$=$' in the chain of equalities in (1) above. To see this, fix the sequence $(\vartheta_n)_n$, choose on $(\Omega,\mathcal A)$ some dominating measure $\mu$ for the restricted experiment $\{P_{\vartheta_n}, n\ge1, P_\vartheta\}$, and select densities $\varsigma_n$, $n\ge1$, $\varsigma_\vartheta$. In the restricted product experiment $\{P^n_{\vartheta_n}, n\ge1, P^n_\vartheta\}$, the set

$A_n := \{ \omega\in\Omega_n : \varsigma_\vartheta(\omega_j) > 0 \text{ for all } 1\le j\le n \} \in \mathcal A_n$

has full measure under $P^n_\vartheta$, the likelihood ratio $L_n^{\vartheta_n/\vartheta}$ coincides $(P^n_{\vartheta_n} + P^n_\vartheta)$-almost surely with

$\omega \;\to\; \prod_{j=1}^n \frac{\varsigma_n(\omega_j)}{\varsigma_\vartheta(\omega_j)}\; 1_{A_n}(\omega) + \infty\; 1_{A_n^c}(\omega)\,,$

and the expressions

$\Lambda_n^{\vartheta_n/\vartheta}(\omega) = \log\big( L_n^{\vartheta_n/\vartheta}(\omega) \big)$  and  $\sum_{j=1}^n \log L^{\vartheta_n/\vartheta}(\omega_j)$

are well defined and $[-\infty, +\infty)$-valued in restriction to $A_n$, and coincide on $A_n$. This is (○).

(3) We exploit $L^2$-differentiability at $\vartheta$ in the experiment $E$. Fix $\delta>0$, write $Z_n = \sqrt{L^{\vartheta_n/\vartheta}} - 1$ where $\vartheta_n = \vartheta + h_n/\sqrt n$. Then we have

$P_\vartheta( |Z_n| > \delta ) = P_\vartheta( Z_n > \delta ) + P_\vartheta( Z_n < -\delta )$

$\le P_\vartheta\Big( Z_n > \delta\,,\ \frac{1}{2\sqrt n}\, h_n^\top V_\vartheta \le \frac\delta2 \Big) + P_\vartheta\Big( \frac{1}{2\sqrt n}\, h_n^\top V_\vartheta > \frac\delta2 \Big)$

$+\; P_\vartheta\Big( Z_n < -\delta\,,\ \frac{1}{2\sqrt n}\, h_n^\top V_\vartheta \ge -\frac\delta2 \Big) + P_\vartheta\Big( \frac{1}{2\sqrt n}\, h_n^\top V_\vartheta < -\frac\delta2 \Big)$

$\le P_\vartheta\Big( \Big| \sqrt{L^{\vartheta_n/\vartheta}} - 1 - \frac12\, (\vartheta_n - \vartheta)^\top V_\vartheta \Big| > \frac\delta2 \Big) + P_\vartheta\big( |h_n^\top V_\vartheta| > \delta\sqrt n \big)\,.$

Since $(h_n)_n$ is a bounded sequence, we have for suitable $C$ as $n\to\infty$

$P_\vartheta\big( |h_n^\top V_\vartheta| > \delta\sqrt n \big) \le P_\vartheta\Big( |V_\vartheta| > \frac{\delta\sqrt n}{C} \Big) \le \frac{C^2}{\delta^2 n}\, E_\vartheta\Big( |V_\vartheta|^2\, 1_{\{|V_\vartheta| > \frac{\delta\sqrt n}{C}\}} \Big) = o\Big( \frac1n \Big)$

since $V_\vartheta$ is in $L^2(P_\vartheta)$. A simpler argument works for the first term on the right-hand side above since

$E_\vartheta\Big( \Big| \sqrt{L^{\vartheta_n/\vartheta}} - 1 - \frac12\, (\vartheta_n - \vartheta)^\top V_\vartheta \Big|^2 \Big) = o\big( |\vartheta_n - \vartheta|^2 \big) = o\Big( \frac1n \Big)$

by part (iib) of Definition 4.2. Taking all this together, we have

$P_\vartheta( |Z_n| > \delta ) = o\Big( \frac1n \Big)$  as $n\to\infty$

for $\delta>0$ fixed.

(4) In the logarithmic expansion of step (1), we consider remainder terms

$R(z) := \Big| \log(1+z) - z + \frac12 z^2 \Big|\,,\qquad z\in(-1,\infty)$

together with random variables in the product experiment $E_n$

$Z_{n,j}(\omega) := \big( \sqrt{L^{\vartheta_n/\vartheta}} - 1 \big)(\omega_j)\,,\qquad \omega = (\omega_1,\dots,\omega_n)\in\Omega_n\,,$

and shall prove in step (5) below

(+)  $\sum_{j=1}^n R(Z_{n,j}) = o_{P^n_\vartheta}(1)$  as $n\to\infty\,.$

Then (+) will justify the last '$=$' in the chain of heuristic equalities in step (1), and thus will finish the proof of Proposition 4.14.

(5) To prove (+), we will consider 'small' and 'large' absolute values of $Z_{n,j}$ separately, 'large' meaning

$\sum_{j=1}^n R(Z_{n,j})\, 1_{\{|Z_{n,j}|>\delta\}} = \sum_{j=1}^n R(Z_{n,j})\, 1_{\{Z_{n,j}>\delta\}} + \sum_{j=1}^n R(Z_{n,j})\, 1_{\{Z_{n,j}<-\delta\}}$

for any $\delta>0$ fixed. For $Z_{n,j}$ positive, we associate to $\delta>0$ the quantity

$\eta = \eta(\delta) := \inf\{ R(z) : z>\delta \} > 0$

which allows to write

$\Big\{ \sum_{j=1}^n R(Z_{n,j})\, 1_{\{Z_{n,j}>\delta\}} > \eta \Big\} \;\subset\; \big\{ Z_{n,j} > \delta \text{ for at least one } 1\le j\le n \big\}\,.$

Using step (3) above, the probability of the last event under $P^n_\vartheta$ is smaller than

$n\, P_\vartheta( Z_n > \delta ) = o(1)$  as $n\to\infty\,.$

Negative values of $Z_{n,j}$ can be treated analogously. We thus find that the contribution of 'large' absolute values of $Z_{n,j}$ to the sum (+) is negligible:

(++)  for any $\delta>0$ fixed:  $\sum_{j=1}^n R(Z_{n,j}) = \sum_{j=1}^n R(Z_{n,j})\, 1_{\{|Z_{n,j}|\le\delta\}} + o_{P^n_\vartheta}(1)$

as $n\to\infty$. It remains to consider 'small' absolute values of $Z_{n,j}$. Fix $\varepsilon>0$ arbitrarily small. As a consequence of $R(z) = o(z^2)$ for $z\to0$, we can associate

$\delta = \delta(\varepsilon) > 0$  such that  $|R(z)| < \varepsilon\, z^2$  on $\{|z|\le\delta\}$

to every value of $\varepsilon>0$. Since $(h_n)_n$ is a bounded sequence, we have as $n\to\infty$

(⋆)  $\sum_{j=1}^n Z^2_{n,j}(\omega) = \sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big)^2 = \frac14\, h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)$

from Proposition 4.13, and at the same time

(+++)  $\sum_{j=1}^n R(Z_{n,j})\, 1_{\{|Z_{n,j}|\le\delta\}} \;\le\; \varepsilon \sum_{j=1}^n Z^2_{n,j}$

for $\delta = \delta(\varepsilon)$. Now, (⋆) implies tightness of $\mathcal L\big( \sum_{j=1}^n Z^2_{n,j} \mid P^n_\vartheta \big)$ as $n\to\infty$, and we have the possibility to choose $\varepsilon>0$ arbitrarily small. Combining (+++) and (++) we thus obtain

$\sum_{j=1}^n R(Z_{n,j}) = o_{P^n_\vartheta}(1)$  as $n\to\infty\,.$

This is (+), and the proof of Proposition 4.14 is finished. □

Now we can conclude this section:

4.15 Proof of Le Cam's Second Lemma 4.11. Under Assumptions 4.10, we put together Propositions 4.14, 4.12 and 4.13:

$\Lambda_n^{\vartheta_n/\vartheta}(\omega) = 2 \sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big) - \sum_{j=1}^n \Big( \sqrt{L^{\vartheta_n/\vartheta}}(\omega_j) - 1 \Big)^2 + o_{P^n_\vartheta}(1)$

$= 2 \Big( \frac{1}{2\sqrt n} \sum_{j=1}^n h_n^\top V_\vartheta(\omega_j) - \frac18\, h_n^\top J_\vartheta h_n \Big) - \frac14\, h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)$

$= \frac{1}{\sqrt n} \sum_{j=1}^n h_n^\top V_\vartheta(\omega_j) - \frac12\, h_n^\top J_\vartheta h_n + o_{P^n_\vartheta}(1)$

and have proved the representation of log-likelihoods in Lemma 4.11. Weak convergence of the scores

$S_n(\vartheta, \omega) = \frac{1}{\sqrt n} \sum_{j=1}^n V_\vartheta(\omega_j)$  under $P^n_\vartheta$

has already been stated in Assumption 4.10(b). Le Cam's Second Lemma 4.11 is now proved. □
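Before moving on, here is a small numerical illustration of Lemma 4.11. It is a sketch under stated assumptions, not part of the book's development: we take the i.i.d. exponential model $\{\mathrm{Exp}(\vartheta) : \vartheta>0\}$, with score $V_\vartheta(x) = 1/\vartheta - x$ and Fisher information $J_\vartheta = \vartheta^{-2}$, and compare the exact log-likelihood ratio with the expansion $h\, S_n(\vartheta) - \frac12 h^2 J_\vartheta$:

import numpy as np

rng = np.random.default_rng(1)
theta, h, n = 2.0, 0.8, 200_000
x = rng.exponential(1.0 / theta, size=n)        # i.i.d. observations under P_theta^n
theta_n = theta + h / np.sqrt(n)                # local alternative theta + h/sqrt(n)

# exact log-likelihood ratio Lambda_n of Exp(theta_n)^n w.r.t. Exp(theta)^n
Lam = n * np.log(theta_n / theta) - (theta_n - theta) * x.sum()

S_n = (1.0 / theta - x).sum() / np.sqrt(n)      # score statistic S_n(theta)
J = 1.0 / theta**2                              # Fisher information J_theta
print(Lam, h * S_n - 0.5 * h**2 * J)            # difference is o_{P^n_theta}(1)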
Chapter 5

Gaussian Shift Models

Topics for Chapter 5:

5.1 Gaussian Shift Experiments
 A classical normal distribution model 5.1
 Gaussian shift experiment E(J) 5.2–5.3
 Equivariant estimators 5.4
 Boll's convolution theorem 5.5
 Subconvex loss functions 5.6
 Anderson's lemma, and some consequences 5.6'–5.8
 Total variation distance 5.8'
 Under 'very diffuse' a priori, arbitrary estimators are approximately equivariant 5.9
 Main result: the minimax theorem 5.10

5.2 * Brownian Motion with Unknown Drift as a Gaussian Shift Experiment
 Local dominatedness/equivalence of probability measures 5.11
 Filtered spaces and density processes 5.12
 Canonical path space (C, 𝒞, 𝔾) for continuous processes 5.13
 Example: Brownian motion with unknown drift, special case d = 1 5.14
 The statistical model 'scaled BM with unknown drift' in dimension d ≥ 1 5.15
 Statistical consequence: optimal estimation of the drift parameter 5.16

Exercises: 5.4', 5.4'', 5.4''', 5.5', 5.10'

This chapter considers the Gaussian shift model and its statistical properties. The main
stages of Section 5.1 are Boll’s Convolution Theorem 5.5 for equivariant estimators,
the proof that arbitrary estimators are approximately equivariant under a very diffuse
prior, and – as a consequence of both – the Minimax Theorem 5.10 which establishes
a lower bound for the maximal risk of arbitrary estimators, in terms of the central
statistic. We will see a stochastic process example in Section 5.2.

5.1 Gaussian Shift Experiments


Up to a particular representation of the location parameter, the following experiment
is well known:
5.1 Example. Fix $J\in\mathbb R^{d\times d}$ symmetric and strictly positive definite, and consider the normal distribution model

$\big( \mathbb R^d\,,\ \mathcal B(\mathbb R^d)\,,\ \{ P_h := N(Jh, J) : h\in\mathbb R^d \} \big)\,.$

Here densities $f_h = \frac{dP_h}{d\lambda}$ with respect to Lebesgue measure $\lambda$ on $\mathbb R^d$ are given by

$f(h,x) = (2\pi)^{-d/2}\, (\det J)^{-1/2}\, \exp\Big( -\frac12\, (x - Jh)^\top J^{-1} (x - Jh) \Big)\,,\qquad x\in\mathbb R^d\,,\ h\in\mathbb R^d$

and all laws $P_h$, $h\in\mathbb R^d$, are pairwise equivalent.

(a) It follows that likelihood ratios of $P_h$ with respect to $P_0$ have the form

(+)  $L^{h/0} := \frac{dP_h}{dP_0} = \exp\Big( h^\top S - \frac12\, h^\top J h \Big)\,,\qquad h\in\mathbb R^d$

where $S(x) := x$ denotes the canonical variable on $\mathbb R^d$ for which, as a trivial assertion,

(++)  $\mathcal L( S \mid P_0 ) = N(0, J)\,.$
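For the reader's convenience we spell out the one-line computation behind (+) – it only uses the form of the densities above:

$\frac{f(h,x)}{f(0,x)} = \exp\Big( -\frac12\, (x - Jh)^\top J^{-1} (x - Jh) + \frac12\, x^\top J^{-1} x \Big) = \exp\Big( h^\top x - \frac12\, h^\top J h \Big)\,,$

since $(x - Jh)^\top J^{-1} (x - Jh) = x^\top J^{-1} x - 2\, h^\top x + h^\top J h$.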

(b) Taking an arbitrary $h_0\in\mathbb R^d$ as reference point, we obtain in the same way

$L^{h/h_0} := \frac{dP_h}{dP_{h_0}} = \exp\Big( (h - h_0)^\top (S - Jh_0) - \frac12\, (h - h_0)^\top J\, (h - h_0) \Big)\,,\qquad h\in\mathbb R^d$

and have, again as a trivial assertion,

$\mathcal L( S - Jh_0 \mid P_{h_0} ) = N(0, J)\,.$

(c) Thus a structure analogous to (+) and (++) persists if we reparameterise around arbitrary reference points $h_0\in\mathbb R^d$: we always have

(⋆)  $L^{(h_0+\tilde h)/h_0} = \exp\Big( \tilde h^\top (S - Jh_0) - \frac12\, \tilde h^\top J\, \tilde h \Big)\,,\qquad \tilde h\in\mathbb R^d$

together with

(⋆⋆)  $\mathcal L( S - Jh_0 \mid P_{h_0} ) = N(0, J)\,.$

(d) The above quadratic shape (+) of likelihoods combined with the distributional property (++) was seen to appear as a limiting structure – by Le Cam's Second Lemma 4.10 and 4.11 – over shrinking neighbourhoods of fixed reference points $\vartheta$ in $L^2$-differentiable experiments. □
5.2 Definition. A model $(\Omega, \mathcal A, \{P_h : h\in\mathbb R^d\})$ is called a Gaussian shift experiment $E(J)$ if there exists a statistic

$S : (\Omega,\mathcal A) \to (\mathbb R^d, \mathcal B(\mathbb R^d))$

and a deterministic matrix

$J \in \mathbb R^{d\times d}$ symmetric and strictly positive definite

such that for every $h\in\mathbb R^d$,

(+)  $\omega \;\to\; \exp\Big( h^\top S(\omega) - \frac12\, h^\top J h \Big) =: L^{h/0}(\omega)$

is a version of the likelihood ratio of $P_h$ with respect to $P_0$. We call $Z := J^{-1} S$ the central statistic in the Gaussian shift experiment $E(J)$.

In a Gaussian shift experiment, the name 'central statistic' indicates a benchmark for good estimation, simultaneously under a broad class of loss functions: this will be seen in Corollary 5.8 and Theorem 5.10 below.

For every given matrix $J\in\mathbb R^{d\times d}$ symmetric and strictly positive definite, a Gaussian shift experiment $E(J)$ exists by Example 5.1. The following proposition shows that $E(J)$ as a statistical experiment is completely determined by the matrix $J\in\mathbb R^{d\times d}$ symmetric and strictly positive definite.

5.3 Proposition. In an experiment $E(J)$ with the properties of Definition 5.2, the following holds:
(a) all laws $P_h$, $h\in\mathbb R^d$, are equivalent probability measures;
(b) we have $\mathcal L( Z - h \mid P_h ) = N(0, J^{-1})$ for all $h\in\mathbb R^d$;
(c) we have $\mathcal L( S - Jh \mid P_h ) = N(0, J)$ for all $h\in\mathbb R^d$;
(d) we have for all $h\in\mathbb R^d$

$L^{(h+\tilde h)/h} = \exp\Big( \tilde h^\top (S - Jh) - \frac12\, \tilde h^\top J\, \tilde h \Big)\,,\qquad \tilde h\in\mathbb R^d\,.$

It follows that in the classical sense of Definition 1.2, $M_h := S - Jh$ is the score in $h\in\mathbb R^d$ and

$J = E_h\big( M_h M_h^\top \big)$

the Fisher information in $h$. The Fisher information does not depend on the parameter $h\in\mathbb R^d$.
Proof. (1) For any $h\in\mathbb R^d$, the likelihood ratio $L^{h/0}$ in Definition 5.2 is strictly positive and finite on $\Omega$: hence neither a singular part of $P_h$ with respect to $P_0$ nor a singular part of $P_0$ with respect to $P_h$ exists, and we have $P_0 \sim P_h$.

(2) Recall that the Laplace transform of a normal law $N(0,\Lambda)$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ is

$\mathbb R^d \ni u \;\to\; \int e^{-u^\top x}\, N(0,\Lambda)(dx) = e^{+\frac12 u^\top \Lambda u}$

($\Lambda\in\mathbb R^{d\times d}$ symmetric and non-negative definite); the characteristic function of $N(0,\Lambda)$ is

$\mathbb R^d \ni u \;\to\; \int e^{i u^\top x}\, N(0,\Lambda)(dx) = e^{-\frac12 u^\top \Lambda u}\,.$

(3) In $E(J)$, $J$ being deterministic, (1) and Definition 5.2 give

$1 = E_0\big( L^{h/0} \big) = E_0\big( e^{h^\top S} \big)\, e^{-\frac12 h^\top J h}\,,\qquad h\in\mathbb R^d$

which specifies the Laplace transform of the law of $S$ under $P_0$ and establishes

$\mathcal L( S \mid P_0 ) = N(0, J)\,.$

For the central statistic $Z = J^{-1}S$, the law of $Z - h$ under $P_h$ is determined – via scaling and change of measure, by (1) and Definition 5.2 – from the Laplace transform of the law of $S$ under $P_0$: we have

$E_h\big( e^{-\lambda^\top (Z - h)} \big) = E_0\big( e^{-\lambda^\top (Z - h)}\, L^{h/0} \big) = e^{+\lambda^\top h}\, e^{-\frac12 h^\top J h}\, E_0\big( e^{(h - J^{-1}\lambda)^\top S} \big)$

$= e^{+\lambda^\top h}\, e^{-\frac12 h^\top J h}\, e^{+\frac12 (h - J^{-1}\lambda)^\top J\, (h - J^{-1}\lambda)} = e^{+\frac12 \lambda^\top J^{-1} \lambda}$

for all $\lambda\in\mathbb R^d$ and all $h\in\mathbb R^d$. This shows

$\mathcal L( Z - h \mid P_h ) = N(0, J^{-1})$

for arbitrary $h\in\mathbb R^d$. This is (b), and (c) follows via standard transformations of normal laws.

(4) From (1) and Definition 5.2 we obtain the representation (d) in the same way as (⋆) in Example 5.1(b). Then (d) and (c) together show that the experiment $E(J)$ admits score and Fisher information as indicated. □
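As a quick plausibility check of Proposition 5.3(b) – a Monte Carlo sketch with an arbitrarily chosen matrix $J$, not part of the proof – one may simulate $S$ under $P_h$ via assertion (c) and inspect the empirical covariance of $Z - h$:

import numpy as np

rng = np.random.default_rng(2)
J = np.array([[2.0, 0.5],
              [0.5, 1.0]])                    # symmetric, strictly positive definite
h = np.array([1.0, -2.0])
S = rng.multivariate_normal(J @ h, J, size=200_000)   # L(S | P_h) = N(Jh, J), cf. (c)
Z = S @ np.linalg.inv(J)                      # central statistic Z = J^{-1} S (J symmetric)
print(np.cov((Z - h).T))                      # approximately J^{-1}, in line with (b)
print(np.linalg.inv(J))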

For the statistical properties of a parametric experiment, the space $(\Omega,\mathcal A)$ supporting the family of laws $\{P_\vartheta : \vartheta\in\Theta\}$ is of no importance: it is the structure of the likelihoods $L^{\vartheta'/\vartheta}$ that matters as $\vartheta'$ and $\vartheta$ range over $\Theta$. Hence one may encounter the Gaussian shift experiment $E(J)$ in quite different contexts.

In a Gaussian shift experiment $E(J)$, the problem of estimation of the unknown parameter $h\in\mathbb R^d$ seems completely settled in a very classical way. The central statistic $Z$ is a maximum likelihood estimator. The theorems of Rao–Blackwell and Lehmann–Scheffé (e.g. [127, pp. 349 and 354]), using the notions of a canonical statistic in a $d$-parametric exponential family, of sufficiency and of completeness, assert that in the class of all unbiased and square integrable estimators for the unknown parameter in $E(J)$, $Z$ is the unique estimator which uniformly on $\Theta$ minimises the variance.

However, a famous example by Stein (see e.g. [60, pp. 25–27] or [59, p. 93]) shows that for normal distribution models in dimension $d\ge3$, there are estimators admitting bias which improve quadratic risk strictly beyond the best unbiased estimator. Hence it is undesirable to restrict from the start the class of estimators under consideration by imposing unbiasedness. The following definition does not involve any such restrictions.

5.4 Definition. In a statistical model $(\Omega', \mathcal A', \{P'_h : h\in\Theta'\})$, $\Theta'\subset\mathbb R^d$, an estimator $\tau$ for the unknown parameter $h\in\Theta'$ is called equivariant if

$\mathcal L( \tau - h \mid P'_h ) = \mathcal L( \tau \mid P'_0 )$  for all $h\in\Theta'\,.$

An equivariant estimator simply 'works equally well' at all points of the statistical model. By Proposition 5.3(b), the central statistic $Z$ is equivariant in the Gaussian shift model $E(J)$.

5.4' Exercise. With $C$ the space of continuous functions $\mathbb R^d\to\mathbb R$, write $C_p\subset C$ for the cone of strictly positive $f$ vanishing at $\infty$ faster than any polynomial: for every $\ell\in\mathbb N$, $\sup_{\{|x|>R\}} \{ |x|^\ell f(x) \} \to 0$ as $R\to\infty$. Consider an experiment $E' = (\Omega', \mathcal A', \{P'_h : h\in\mathbb R^d\})$ of mutually equivalent probability laws for which the paths

$\mathbb R^d \ni \tilde h \;\to\; L^{(h_0+\tilde h)/h_0}(\omega) \in (0,\infty)$

belong to $C_p$ almost surely, for arbitrary $h_0\in\mathbb R^d$ fixed, and such that the laws

$\mathcal L\Big( \big( L^{(h_0+\tilde h)/h_0} \big)_{\tilde h\in\mathbb R^d} \;\Big|\; P'_{h_0} \Big)$

on $(C, \mathcal C)$ do not depend on $h_0\in\mathbb R^d$ (as an example, all this is satisfied in Gaussian shift experiments $E(J)$ in virtue of Proposition 5.3(c) and (d), for arbitrary dimension $d\ge1$). As a consequence, the laws of the pairs

$\mathcal L\Big( \Big( \int_{\mathbb R^d} \tilde h\, L^{(h_0+\tilde h)/h_0}\, d\tilde h\;,\; \int_{\mathbb R^d} L^{(h_0+\tilde h)/h_0}\, d\tilde h \Big) \;\Big|\; P'_{h_0} \Big)$

are well defined and do not depend on $h_0$. Prove the following (a) and (b):

(a) In $E'$, a Bayesian estimator $\overline h$ with 'uniform over $\mathbb R^d$ prior' for the unknown parameter $h\in\mathbb R^d$

$\overline h(\omega) := \frac{ \int_{\mathbb R^d} h'\, L^{h'/0}(\omega)\, dh' }{ \int_{\mathbb R^d} L^{h'/0}(\omega)\, dh' }$

(sometimes called Pitman estimator) is well defined, and is an equivariant estimator.
(b) In $E'$, a maximum likelihood estimator $\widehat h$ for the unknown parameter $h\in\mathbb R^d$

$\widehat h(\omega) = \operatorname{argmax}\big\{ L^{h'/0}(\omega) : h'\in\mathbb R^d \big\}$

(with measurable selection of an argmax – if necessary – similar to Proposition 2.10) is well defined, and is equivariant.

(c) In general statistical models, estimators according to (a) or to (b) will have different properties. Beyond the scope of the chapter, we add the following remark, for further reading: in the statistical model with likelihoods

$L^{u/0} = \exp\Big( W_u - \frac12\, |u| \Big)\,,\qquad u\in\mathbb R$

where $(W_u)_{u\in\mathbb R}$ is a two-sided Brownian motion and the dimension is $d=1$ (the parameter is 'time': this is a two-sided variant of Example 1.16', in the case where $X$ in Example 1.16' is Brownian motion), all assumptions above are satisfied, (a) and (b) hold true, and the Bayesian estimator $\overline h$ outperforms the maximum likelihood estimator $\widehat h$ under squared loss. See the references quoted at the end of Example 1.16'. For the third point, see [114]; for the second point, see [56, Lemma 5.2]. □

5.4'' Exercise. Prove the following: in a Gaussian shift model $E(J)$, the Bayesian estimator with 'uniform over $\mathbb R^d$ prior' (from Exercise 5.4') for the unknown parameter $h\in\mathbb R^d$

$\overline h(\omega) := \frac{ \int_{\mathbb R^d} h'\, L^{h'/0}(\omega)\, dh' }{ \int_{\mathbb R^d} L^{h'/0}(\omega)\, dh' }\,,\qquad \omega\in\Omega$

coincides with the central statistic $Z$, and thus with the maximum likelihood estimator in $E(J)$.

Hint: varying the $i$-th component of $h'$, start from one-dimensional integration

$0 = \int_{-\infty}^{+\infty} \frac{\partial}{\partial h'_i}\, L^{h'/0}(\omega)\, dh'_i = \int_{-\infty}^{+\infty} \big( S(\omega) - J h' \big)_i\, L^{h'/0}(\omega)\, dh'_i\,,\qquad 1\le i\le d$

and prove

$Z(\omega) \int_{\mathbb R^d} L^{h'/0}(\omega)\, dh' = \int_{\mathbb R^d} h'\, L^{h'/0}(\omega)\, dh'\,.$

Then $Z$ and $\overline h$ coincide on $\Omega$, for the Gaussian shift model $E(J)$. □
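A numerical cross-check of this exercise – a sketch for $d=1$, with arbitrarily chosen values of $S$ and $J$; the grid-based quadrature only approximates the $dh'$-integrals, and on a uniform grid the spacing cancels in the ratio:

import numpy as np

J, S = 1.7, 0.9                                # one observed value of (S, J), d = 1
hs = np.linspace(-30.0, 30.0, 200_001)         # grid for the dh'-integration
L = np.exp(hs * S - 0.5 * J * hs**2)           # likelihood ratios L^{h'/0}
pitman = (hs * L).sum() / L.sum()              # Riemann-sum version of the ratio
print(pitman, S / J)                           # both equal the central statistic Z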

5.4''' Exercise. In a Gaussian shift model $E(J)$ with unknown parameter $h\in\mathbb R^d$ and central statistic $Z = J^{-1}S$, fix some point $h_0\in\Theta$ and some $0<\alpha<1$, and use the statistic

$T := \alpha Z + (1-\alpha)\, h_0$

as an estimator for the unknown parameter. Prove that $T$ is not an equivariant estimator, and determine the law $\mathcal L( T - h \mid P_h )$ for all $h\in\mathbb R^d$ from Proposition 5.3. □

The following implies a criterion for optimality within the class of all equivariant estimators in $E(J)$. This is the first main result of this section.
5.5 Convolution Theorem (Boll [13]). Consider a Gaussian shift experiment $E(J)$. If $\tau$ is an equivariant estimator for the unknown parameter $h\in\mathbb R^d$, there is some probability measure $Q$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ such that

$\mathcal L( \tau - h \mid P_h ) = N(0, J^{-1}) \star Q$  for all $h\in\mathbb R^d\,.$

In addition, the law $Q$ coincides with $\mathcal L( \tau - Z \mid P_0 )$.

Proof. $\tau$ being equivariant for the unknown parameter $h\in\mathbb R^d$, it is sufficient to prove

(⋄)  $\mathcal L( \tau \mid P_0 ) = N(0, J^{-1}) \star \mathcal L( \tau - Z \mid P_0 )\,.$

(1) Using characteristic functions for laws on $\mathbb R^d$, we fix $t\in\mathbb R^d$. Equivariance of $\tau$ gives

(+)  $E_0\big( e^{i t^\top \tau} \big) = E_h\big( e^{i t^\top (\tau - h)} \big) = E_0\big( e^{i t^\top (\tau - h)}\, L^{h/0} \big)\,,\qquad h\in\mathbb R^d\,.$

Let us replace $h\in\mathbb R^d$ by $z\in\mathbb C^d$ in (+) to obtain an analytic function

$f :\ \mathbb C^d \ni z \;\to\; E_0\Big( e^{i t^\top (\tau - z)}\, e^{z^\top S - \frac12 z^\top J z} \Big) \in \mathbb C\,.$

By (+), the restriction of $f(\cdot)$ to $\mathbb R^d\subset\mathbb C^d$ being constant, $f(\cdot)$ is constant on $\mathbb C^d$, thus in particular

$E_0\big( e^{i t^\top \tau} \big) = f(0) = f(-i J^{-1} t)\,.$

Calculating the value of $f$ at $-iJ^{-1}t$ we find

(∗)  $E_0\big( e^{i t^\top \tau} \big) = e^{-\frac12 t^\top J^{-1} t}\; E_0\big( e^{i t^\top (\tau - Z)} \big)\,.$

(2) We have (∗) for every $t\in\mathbb R^d$. We thus have equality of characteristic functions on $\mathbb R^d$. The right-hand side of (∗) admits an interpretation as the characteristic function of

$\mathcal L\big( \tilde\varepsilon + (\tau - Z) \mid P_0 \big)$

for some random variable $\tilde\varepsilon$ independent of $\tau - Z$ under $P_0$ and such that $\mathcal L(\tilde\varepsilon \mid P_0) = N(0, J^{-1})$. Hence, writing $Q := \mathcal L( \tau - Z \mid P_0 )$, the right-hand side of (∗) is the characteristic function of the convolution $N(0, J^{-1}) \star Q$.

(3) It is important to note the following: the above steps (1) and (2) did not establish (and the assertion of the theorem did not claim) that $Z$ and $(\tau - Z)$ should be independent under $P_0$. □
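For completeness, the evaluation of $f$ at $z = -iJ^{-1}t$ in step (1), written out: with $z = -iJ^{-1}t$ one has $e^{i t^\top (\tau - z)} = e^{i t^\top \tau}\, e^{-t^\top J^{-1} t}$, $z^\top S = -i\, t^\top Z$ and $-\frac12 z^\top J z = +\frac12\, t^\top J^{-1} t$, whence

$f(-iJ^{-1}t) = E_0\big( e^{i t^\top (\tau - Z)} \big)\; e^{-t^\top J^{-1} t + \frac12 t^\top J^{-1} t} = e^{-\frac12 t^\top J^{-1} t}\; E_0\big( e^{i t^\top (\tau - Z)} \big)$

which is (∗).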

Combined with Proposition 5.3(b), Boll's Convolution Theorem 5.5 states that in a Gaussian shift experiment $E(J)$, the estimation errors of equivariant estimators are 'more spread out' than the estimation error of the central statistic $Z$, as an estimator for the unknown parameter. By Theorem 5.5, the best possible concentration of estimation errors (within the class of all equivariant estimators) is attained for $\tau = Z$: then $\mathcal L( Z - h \mid P_h )$ equals $N(0, J^{-1})$, and $Q = \delta_0$ is the point mass at $0$. However, a broad class of estimators with possibly interesting properties is not equivariant.

5.5' Exercise. For $R<\infty$ arbitrarily large, consider closed balls $C$ centred at $0$ with radius $R$.

(a) In a Gaussian shift model $E(J)$, check that a Bayesian estimator with uniform prior over the compact $C$

$\overline h_C(\omega) := \frac{ \int_C h'\, L^{h'/0}(\omega)\, dh' }{ \int_C L^{h'/0}(\omega)\, dh' }\,,\qquad \omega\in\Omega$

is not an equivariant estimator for the unknown parameter.

(b) For arbitrary estimators $T : (\Omega,\mathcal A) \to (\mathbb R^d, \mathcal B(\mathbb R^d))$ for the unknown parameter which may exist in $E(J)$, consider quadratic loss and Bayes risks

$R(T, C) := \frac{1}{\lambda(C)} \int_C E_h\big( |T - h|^2 \big)\, dh \;\le\; \infty\,.$

In the class of all $\mathcal A$-measurable estimators $T$ for the unknown parameter, $\overline h_C$ minimises the squared risk for $h$ chosen randomly from $C$, and provides a lower bound for the maximal squared risk over $C$:

(+)  $\sup_{h\in C} E_h\big( |T - h|^2 \big) \;\ge\; R(T, C) \;\ge\; R(\overline h_C, C)\,.$

This is seen as follows: the first inequality in (+) being a trivial one (we replace an integrand by its upper bound), it is sufficient to prove the second. Put $\widetilde\Omega := \Omega\times C$ and $\widetilde{\mathcal A} := \mathcal A \otimes \mathcal B(C)$; on the extended space $(\widetilde\Omega, \widetilde{\mathcal A})$, let $\mathrm{id} : (\omega, h) \to (\omega, h)$ denote the canonical statistic. Define a probability measure

$\widetilde P(d\omega, dh') := 1_C(h')\, \frac{dh'}{\lambda(C)}\, P_{h'}(d\omega) = P_0(d\omega)\, 1_C(h')\, L^{h'/0}(\omega)\, \frac{dh'}{\lambda(C)}$

on $(\widetilde\Omega, \widetilde{\mathcal A})$, and write $\widetilde E$ for expectation under $\widetilde P$. On the space $(\widetilde\Omega, \widetilde{\mathcal A}, \widetilde P)$, identify $\overline h_C$ as the conditional expectation of the second component of $\mathrm{id}$ given the first component of $\mathrm{id}$. Note that the random variable $h$ – the second component of $\mathrm{id}$ – belongs to $L^2(\widetilde P)$ since $C$ is compact. Then the projection property of conditional expectations in $L^2(\widetilde P)$ gives

$\widetilde E\big( |h - T|^2 \big) \;\ge\; \widetilde E\big( |h - \overline h_C|^2 \big)$

for all random variables $T\in L^2(\widetilde P)$ which depend only on the first component of $\mathrm{id}$. But this is the class of all $T : \Omega\to\mathbb R^d$ which are $\mathcal A$-measurable, thus the class of all possible estimators for the unknown parameter in the experiment $E(J)$, and we obtain the second inequality in (+). □

5.6 Definition. In an arbitrary experiment $(\Omega', \mathcal A', \{P'_h : h\in\Theta'\})$, $\Theta'\subset\mathbb R^d$,

(i) we call a loss function

$\ell : \mathbb R^d \to [0,\infty)$, $\mathcal B(\mathbb R^d)$-measurable,

subconvex or bowl-shaped if all level sets

$A_c := \{ x\in\mathbb R^d : \ell(x)\le c \}\,,\qquad c\ge0$

are convex and symmetric with respect to the origin (i.e. $x\in A_c \iff -x\in A_c$, for $x\in\mathbb R^d$);

(ii) we associate – with respect to a loss function which we keep fixed – a risk function

$R(T, \cdot) :\ \Theta' \ni h \;\to\; R(T, h) := \int_{\Omega'} \ell(T - h)\, dP'_h \;\in\; [0,\infty]$

to every estimator $T : (\Omega',\mathcal A') \to (\mathbb R^d, \mathcal B(\mathbb R^d))$ for the unknown parameter $h\in\Theta'$.

Note that risk functions according to Definition 5.6(ii) are well defined, but not necessarily finite-valued. When $A$ is a subset of $\Theta$ of finite Lebesgue measure, we also write

$R(T, A) := \int \lambda_A(dh)\, R(T, h) \;\in\; [0,\infty]$

for the Bayes risk where $\lambda_A$ is the uniform distribution on $A$. For short, we write $\lambda_n$ for the uniform distribution on the closed ball $B_n := \{|x|\le n\}$. The following lemma is based on an inequality for volumes of convex combinations of convex sets in $\mathbb R^d$; see [1, pp. 170–171] for the proof.

5.6' Lemma (Anderson [1]). Consider $C\subset\mathbb R^d$ convex and symmetric with respect to the origin. Consider $f : \mathbb R^d \to [0,\infty)$ integrable, symmetric with respect to the origin, and such that all sets $\{ x\in\mathbb R^d : f(x)\ge c \}$ are convex, $0<c<\infty$. Then

$\int_C f(x)\, dx \;\ge\; \int_{C+y} f(x)\, dx$

for all $y\in\mathbb R^d$, where $C+y$ is the set $C$ shifted by $y$.

We will use Anderson's lemma in the following form:

5.7 Corollary. For matrices $\Lambda\in\mathbb R^{d\times d}$ which are symmetric and strictly positive definite, sets $C\in\mathcal B(\mathbb R^d)$ convex and symmetric with respect to the origin, subconvex loss functions $\ell : \mathbb R^d \to [0,\infty)$, points $a\in\mathbb R^d\setminus\{0\}$ and probability measures $Q$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$, we have

$N(0,\Lambda)(C) \;\ge\; N(0,\Lambda)(C - a)\,;$

$\int \ell(x)\, N(0,\Lambda)(dx) \;\le\; \int \ell(x - a)\, N(0,\Lambda)(dx)\,;$

$\int \ell(x)\, N(0,\Lambda)(dx) \;\le\; \int \ell(x)\, \big[ N(0,\Lambda) \star Q \big](dx)\,.$
Proof. Writing $f(\cdot)$ for the Lebesgue density of the normal law $N(0,\Lambda)$ on $\mathbb R^d$, the first assertion rephrases Anderson's Lemma 5.6'. An immediate consequence is

$\int \tilde\ell(x)\, N(0,\Lambda)(dx) \;\le\; \int \tilde\ell(x - a)\, N(0,\Lambda)(dx)$

for 'elementary' subconvex loss functions which are finite sums

$\tilde\ell = \sum_{i=1}^m \alpha_i\, 1_{\mathbb R^d \setminus C_i}\,,$

$\alpha_i>0$, $C_i\in\mathcal B(\mathbb R^d)$ convex and symmetric with respect to the origin, $m\ge1$.

The second assertion follows if we pass to general subconvex loss functions $\ell(\cdot)$ by monotone convergence $\ell_n \uparrow \ell$ where all $\ell_n$ are 'elementary': writing $C_{n,j} := \{ \ell \le \frac{j}{2^n} \}$ we can take

$\ell_n := \sum_{j=1}^{n 2^n} \frac{1}{2^n}\, 1_{\mathbb R^d \setminus C_{n,j}} = \sum_{j'=0}^{n 2^n - 1} \frac{j'}{2^n}\, 1_{\{ \frac{j'}{2^n} < \ell \le \frac{j'+1}{2^n} \}} + n\, 1_{\{\ell > n\}}\,,\qquad n\ge1\,.$

The second assertion now allows us to compare

$\int \ell(x)\, \big[ N(0,\Lambda) \star Q \big](dx) = \int\!\!\int \ell(y + b)\, N(0,\Lambda)(dy)\, Q(db)$

$\ge \int\!\!\int \ell(y)\, N(0,\Lambda)(dy)\, Q(db) = \int \ell(y)\, N(0,\Lambda)(dy)$

which is the third assertion. □

The last inequalities of Corollary 5.7 allow us to rephrase Boll's Convolution Theorem 5.5. This yields a powerful way to compare (equivariant) estimators, in the sense that 'optimality' appears decoupled from any particular choice of a loss function which we might invent to penalise estimation errors.

5.8 Corollary. In the Gaussian shift experiment $E(J)$, with respect to any subconvex loss function, the central statistic $Z$ minimises the risk:

$R(\tau, h) \;\ge\; R(Z, h)\,,\qquad h\in\mathbb R^d$

in the class of all equivariant estimators $\tau$ for the unknown parameter.

Proof. Both $\tau$ and $Z$ being equivariant, their risk functions are constant over $\mathbb R^d$, thus it is sufficient to prove the assertion for the parameter value $h=0$. We have

$\mathcal L( Z \mid P_0 ) = N(0, J^{-1})$

from Proposition 5.3. Theorem 5.5 associates to $\tau$ a probability measure $Q$ such that

$\mathcal L( \tau \mid P_0 ) = N(0, J^{-1}) \star Q\,.$

The loss function $\ell(\cdot)$ being subconvex, the third assertion in Corollary 5.7 shows

$R(Z, 0) = E_0\big( \ell(Z - 0) \big) = \int \ell(x)\, N(0, J^{-1})(dx)$

$\le \int \ell(x)\, \big[ N(0, J^{-1}) \star Q \big](dx) = E_0\big( \ell(\tau - 0) \big) = R(\tau, 0)$

which is the assertion. □

Given Theorem 5.5 and Corollary 5.8, we are able to compare equivariant estimators; the next aim is the comparison of arbitrary estimators for the unknown parameter. We quote the following from [121, Chap. I.2] or [124, Chap. 2.4]:

5.8' Remark. The total variation distance $d_1(P, Q)$ between probability measures $P$, $Q$ living on the same space $(\Omega', \mathcal A')$ is defined by

$d_1(P, Q) := \sup_{A\in\mathcal A'} |P(A) - Q(A)| = \sup_{A\in\mathcal A'} \Big| \int_A (p - q)\, d\mu \Big| \;\in\; [0,1]\,,$

with $\mu$ some dominating measure and $p = \frac{dP}{d\mu}$, $q = \frac{dQ}{d\mu}$ versions of the densities, and one has

(+)  $\sup_{A\in\mathcal A'} |P(A) - Q(A)| = \sup\Big\{ \Big| \int \psi\, dP - \int \psi\, dQ \Big| \;:\; \psi \text{ an } \mathcal A'\text{-measurable function } \Omega' \to [0,1] \Big\}\,.$

The following lemma represents a key tool: in a Gaussian shift experiment $E(J)$, arbitrary estimators $\tau$ for the unknown parameter can be viewed as 'approximately equivariant' in the absence of any a priori knowledge except that the unknown parameter should range over large balls centred at the origin.

5.9 Lemma. In a Gaussian shift experiment $E(J)$, every estimator $\tau$ for the unknown parameter $h\in\mathbb R^d$ is associated to a sequence of probability measures $(Q_n)_n$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ such that

(i)  $d_1\Big( \int \lambda_n(dh)\, \mathcal L( \tau - h \mid P_h ) \;,\; N(0, J^{-1}) \star Q_n \Big) \;\to\; 0$  as $n\to\infty\,.$

As a consequence, for any choice of a loss function $\ell(\cdot)$ which is subconvex and bounded, we have

(ii)  $R(\tau, B_n) = \int \lambda_n(dh)\, E_h\big( \ell(\tau - h) \big) = \int \ell(x)\, \big[ N(0, J^{-1}) \star Q_n \big](dx) + \|\ell\|_\infty \cdot o(1)$

as $n\to\infty$ (with $B_n$ the closed ball $\{|h|\le n\}$ in $\mathbb R^d$, and $\lambda_n$ the uniform distribution on $B_n$). In the last line, $\|\ell\|_\infty$ is the sup-norm of the loss function, and the $o(1)$-terms do not depend on $\ell(\cdot)$.
do not depend on `./.

Proof. Assertion (ii) is a consequence of (i), via (+) in Remark 5.8’. We prove (i) in
several steps.
(1) The central statistic Z D J 1 S is a sufficient statistic in the Gaussian shift ex-
periment E.J / (e.g. [4, Chap. II.1 and II.2], or [127, Chap. 3.1]. Thus, for any random
variable  taking values in .Rd , B.Rd //, there is a regular version of the conditional
law of  given Z D  which does not depend on the parameter h 2 Rd : there is a
transition kernel K., / on .Rd , B.Rd // such that
jZDz
K., / is a regular version of .z, A/ ! Ph .A/ for every value of h 2 Rd .
jZDz
We write this as K.z, A/ D P .A/. In the same sense, the conditional law of
  Z given Z D  does not depend on the parameter, and we define a sequence of
probability measures .Qn /n on .Rd , B.Rd //:
Z
. Z/jZDz
Qn .A/ :D n .dz/ P .A/ , A 2 B.Rd / , n  1 .

Note that the sequence .Qn /n is defined only in terms of the pair ., Z/. Comparing
with the first expression in (i), this definition signifies that the observed value z of the
central statistic starts to take over the role of the parameter h.
(2) For every fixed value of x 2 Rd , we have uniformly in A 2 B.Rd / the bound
ˇZ ˇ
ˇ ˇ  . Bn 4 .Bn C x/ /
ˇ ˇ
./ ˇ n .dh/ K.x C h, A C h/  Qn .A  x/ˇ  .Bn /
where A  x is the set A shifted by x, and 4 denotes symmetric difference. To see
this, write
Z Z
1
n .dh/ K.x C h, A C h/ D .dh/ 1Bn .h/ K.x C h, A C h/
.Bn /
Z
1
D .dh0 / 1Bn Cx .h0 / K.h0 , A  x C h0 /
.Bn /
Z
1 jZDh0
D .dh0 / 1Bn Cx .h0 / P .A  x C h0 /
.Bn /
Z
1 ZjZDh0
D .dh0 / 1Bn Cx .h0 / P .A  x/
.Bn /
and compare the last right-hand side to the definition of Qn in step (1)
Z
1 ZjZDh0
Qn .A  x/ D .dh0 / 1Bn .h0 / P .A  x/ .
.Bn /
Section 5.1 Gaussian Shift Experiments 139

It follows that the difference on the left-hand side of ./ is uniformly in A smaller than
Z
1 ˇ ˇ  . Bn 4 .Bn C x/ /
.dh0 / ˇ1Bn .h0 /  1Bn Cx .h0 /ˇ D .
.Bn / .Bn /
(3) To conclude the proof of (i), we condition the first law in (i) with respect to the
central statistic Z. For A 2 B.Rd / we obtain from Proposition 5.3(b), definition of
K., / and substitution z D x C h
Z Z Z
n .dh/ Ph .   h 2 A / D n .dh/ PhZ .dz/ Ph jZDz .A C h/
Z Z
D n .dh/ N .h, J 1 /.dz/ K.z, A C h/
Z Z
D N .0, J 1 /.dx/ n .dh/ K.x C h, A C h/

whereas the second law in (i) charges A with mass


Z
 
N .0, J 1 / ? Qn .A/ D N .0, J 1 /.dx/ Qn .A  x/ .

By the bounds ./ obtained in step (2) which are uniform in A for fixed value of x, we
can compare the last right-hand sides. Thus the definition of total variation distance
gives
Z 
d1 n .dh/ L.   h j Ph / , N .0, J 1 / ? Qn
Z
 . Bn 4 .Bn C x/ /
 N .0, J 1 /.dx/
.Bn /
where the integrand on the right-hand side, trivially bounded by 2, converges to 0
pointwise in x 2 Rd as n ! 1. Assertion (i) now follows from dominated conver-
gence. 

Lemma 5.9 is the key to the minimax theorem in Gaussian shift experiments $E(J)$. It allows us to compare all possible estimators for the unknown parameter $h\in\mathbb R^d$, with respect to any subconvex loss function: it turns out that for all choices of $\ell(\cdot)$, the maximal risk on $\mathbb R^d$ is minimised by the central statistic. This is – following Theorem 5.5 – the second main result of this section.

5.10 Minimax Theorem. In the Gaussian shift experiment $E(J)$, the central statistic $Z$ is a minimax estimator for the unknown parameter with respect to any subconvex loss function $\ell(\cdot)$; we have

$\sup_{h\in\mathbb R^d} R(\tau, h) \;\ge\; \int \ell(z)\, N(0, J^{-1})(dz) \;=\; R(Z, 0) \;=\; \sup_{h\in\mathbb R^d} R(Z, h)$

for every estimator $\tau$ for the unknown parameter.

Proof. Consider risk with respect to any subconvex loss function $\ell(\cdot)$. The last equality is merely equivariance of $Z$ as an estimator for the unknown parameter $h\in\mathbb R^d$, and we have to prove the first sign '$\ge$'. Consider any estimator $\tau$ for the unknown parameter, define $\lambda_n$, $B_n$, $Q_n$ as in Lemma 5.9, and recall that the sequence $(Q_n)_n$ depends only on the pair $(\tau, Z)$. A trivial chain of inequalities is

(+)  $\infty \;\ge\; \sup_{h\in\mathbb R^d} R(\tau, h) \;\ge\; \sup_{h\in B_n} R(\tau, h) \;\ge\; \int \lambda_n(dh)\, R(\tau, h) \;=:\; R(\tau, B_n)$

for arbitrary $n\in\mathbb N$. We shall show that

(++)  $\liminf_{n\to\infty} R(\tau, B_n) \;\ge\; \int (\ell\wedge N)(x)\, N(0, J^{-1})(dx)$  for every constant $N<\infty\,.$

Given (+) and (++), monotone convergence as $N\to\infty$ will finish the proof of the theorem.

In order to prove (++), observe first that it is sufficient to work with loss functions $\ell(\cdot)$ which are subconvex and bounded. For such $\ell(\cdot)$, Lemma 5.9(ii) shows

$R(\tau, B_n) = \int \ell(x)\, \big[ N(0, J^{-1}) \star Q_n \big](dx) + \|\ell\|_\infty \cdot o(1)$

as $n\to\infty$, where Anderson's inequality 5.7 gives

$\int \ell(x)\, \big[ N(0, J^{-1}) \star Q_n \big](dx) \;\ge\; \int \ell(x)\, N(0, J^{-1})(dx)$  for every $n$.

Both assertions together establish (++); the proof is complete. □

To summarise: a Gaussian shift experiment $E(J)$ allows – thanks to the properties of its central statistic – for two remarkable results: first, Boll's Convolution Theorem 5.5 for equivariant estimators; second, Lemma 5.9 for arbitrary estimators $\tau$, according to which the structure of risks of equivariant estimators is mimicked by Bayes risks over large balls $B_n$. All this is independent of the choice of a subconvex loss function. The Minimax Theorem 5.10 is then an easy consequence of both results. So neither a notion of 'maximum likelihood' nor any version of a 'Bayes property' turns out to be intrinsic for good estimation: it is the existence of a central statistic which allows for good estimation in the Gaussian shift experiment $E(J)$.

5.10' Exercise. In a Gaussian shift model $E(J)$ with unknown parameter $h\in\mathbb R^d$ and central statistic $Z = J^{-1}S$, fix some point $h_0\in\Theta$, and consider for $0<\alpha<1$ the estimators

$T := \alpha Z + (1-\alpha)\, h_0$

of Exercise 5.4''' which are not equivariant. Under squared loss, calculate the risk of $T$ as a function of $h\in\mathbb R^d$. Evaluate what $T$ 'achieves' at the particular point $h_0$ in comparison to the central statistic, and the price to be paid for this at parameter points $h\in\mathbb R^d$ which are distant from $h_0$. Historically, estimators of similar structure have been called 'superefficient at $h_0$' and have caused some trouble; in the light of Theorem 5.10 it is clear that any denomination of this type is misleading. □


5.2 * Brownian Motion with Unknown Drift as a Gaussian Shift Experiment

In the following, all sections or subsections preceded by an asterisk * will require techniques related to continuous-time martingales, semi-martingales and stochastic analysis, and a reader not interested in stochastic processes may skip these and keep on with the statistical theory. Some typical references for sections marked by * are e.g. Liptser and Shiryaev [88], Metivier [98], Jacod and Shiryaev [64], Ikeda and Watanabe [61], Chung and Williams [15], Karatzas and Shreve [69], Revuz and Yor [112]. In the present section, we introduce the notion of a density process (or likelihood ratio process), and then look in particular at statistical models for Brownian motion with unknown drift as an illustration to Section 5.1.

5.11 Definition. Consider a measurable space $(\Omega, \mathcal A)$ equipped with a right-continuous filtration $\mathbb F = (\mathcal F_t)_{t\ge0}$. For any probability measure $P$ on $(\Omega,\mathcal A)$ and any $0\le t<\infty$, we write $P_t$ for the restriction of $P$ to the $\sigma$-field $\mathcal F_t$. For pairs $P'$, $P$ of probability measures on $(\Omega, \mathcal A)$, we write

$P' \overset{loc}{\ll} P$  relative to $\mathbb F$

if $P'_t \ll P_t$ for all $0\le t<\infty$, and

$P' \overset{loc}{\sim} P$  relative to $\mathbb F$

if $P'_t \sim P_t$ for all $0\le t<\infty$.

Typical situations will combine local equivalence $P' \overset{loc}{\sim} P$ relative to $\mathbb F$ with singularity $P' \perp P$ on $\mathcal A$. As an example, think of a process $X$ on $(\Omega,\mathcal A)$ which is Brownian motion under $P$, and Brownian motion with drift $\gamma\neq0$ under $P'$, and of the strong law of large numbers: then $\lim_{t\to\infty} \frac1t X_t$ equals $\gamma$ almost surely under $P'$, and equals $0$ almost surely under $P$. For the following result, see [64, Chap. III]; for background, see [98].

5.12 Theorem. Consider a measurable space $(\Omega,\mathcal A)$, a right-continuous filtration $\mathbb F = (\mathcal F_t)_{t\ge0}$, and probability measures $P' \overset{loc}{\ll} P$ relative to $\mathbb F$.
(a) Then there is a $(P, \mathbb F)$-martingale $M = (M_t)_{t\ge0}$ with the properties

$M \ge 0\,,\quad E_P(M_t) = 1$ for all $t<\infty\,,\quad P$-almost all paths of $M$ are càdlàg

such that the following holds:

for all $t<\infty$: $M_t$ is a version of the density $\frac{dP'_t}{dP_t}$.

The $P$-martingale $M$ is uniquely determined up to $(P'+P)$-indistinguishability, and is called the density process of $P'$ with respect to $P$ relative to $\mathbb F$.

(b) For $\mathbb F$-stopping times $\sigma$, consider the $\sigma$-field of events up to time $\sigma$

$\mathcal F_\sigma := \{ F\in\mathcal F_\infty : F\cap\{\sigma\le t\}\in\mathcal F_t \text{ for all } 0\le t<\infty \}$

where $\mathcal F_\infty := \sigma\big( \bigcup_{0\le t<\infty} \mathcal F_t \big)$. The two mappings

$F \to P'\big( F\cap\{\sigma<\infty\} \big)$  and  $F \to P\big( F\cap\{\sigma<\infty\} \big)$

define measures on $\mathcal F_\sigma$ of total mass $\le1$, and we have

(+)  $P'\big( F\cap\{\sigma<\infty\} \big) = E_P\big( 1_{F\cap\{\sigma<\infty\}}\, M_\sigma \big)\,.$

(c) For $\mathbb F$-stopping times $\sigma$ satisfying the condition

(∗)  $\sigma<\infty$  $P'$-almost surely and $P$-almost surely,

the restrictions $P'_\sigma := P'|_{\mathcal F_\sigma}$ and $P_\sigma := P|_{\mathcal F_\sigma}$ of $P'$ and $P$ to $\mathcal F_\sigma$ are probability measures such that

$P'_\sigma \ll P_\sigma\,,\qquad M_\sigma\, 1_{\{\sigma<\infty\}}$ is a version of $\frac{dP'_\sigma}{dP_\sigma}\,.$

(d) For $\mathbb F$-stopping times $\sigma$ which are $P$-almost surely finite, we write $M_\sigma$ for $M_\sigma 1_{\{\sigma<\infty\}}$ under $P$ and have equivalence of the following three assertions:

(α) $P'(\{\sigma<\infty\}) = 1$;
(β) $E_P(M_\sigma) = 1$;
(γ) $(M_{\sigma\wedge N})_{N\in\mathbb N_0}$ is a uniformly integrable $P$-martingale.

Proof. (1) For $T\in\mathbb N$ fixed, we have $P'_T \ll P_T$ on $\mathcal F_T$, hence there is a density

$f_T := \frac{dP'_T}{dP_T}\,,\quad \mathcal F_T$-measurable, $[0,\infty)$-valued, unique up to $(P'_T + P_T)$-null sets.

Define $M = (M_t)_{0\le t\le T}$ to be the càdlàg modification of the martingale $t \to E_P( f_T \mid \mathcal F_t )$, $0\le t\le T$. Then for every $0\le t\le T$ and $F\in\mathcal F_t$, we can write

$P'_t(F) = P'_T(F) = E_P( 1_F\, f_T ) = E_P( 1_F\, M_T ) = E_P( 1_F\, M_t ) = \int_F M_t\, dP = \int_F M_t\, dP_t$

which shows that in restriction to $\mathcal F_t$, $M_t$ is a version of the density $\frac{dP'_t}{dP_t}$.

(2) For $T\in\mathbb N$ we can paste together the processes $(M_t)_{0\le t\le T}$ constructed so far into one process $M = (M_t)_{t\ge0}$ with the desired properties. This is (a).

(3) Consider $\mathbb F$-stopping times $\sigma$. For $F\in\mathcal F_\sigma$ and $N\in\mathbb N$, consider first the subset $F\cap\{\sigma\le N\}$ which belongs to $\mathcal F_N$ and to $\mathcal F_{\sigma\wedge N}$. Combining (a) above with the stopping theorem for bounded stopping times we get

$P'\big( F\cap\{\sigma\le N\} \big) = E_P\big( 1_{F\cap\{\sigma\le N\}}\, M_N \big) = E_P\big( 1_{F\cap\{\sigma\le N\}}\, M_{\sigma\wedge N} \big) = E_P\big( 1_{F\cap\{\sigma\le N\}}\, M_\sigma \big)$

where $N\in\mathbb N$. Since $M$ is non-negative, monotone convergence as $N\uparrow\infty$ yields (+). This is (b).

(4) Under the additional condition (∗), (+) in (b) implies (c).

(5) Consider a stopping time $\sigma$ such that $P(\sigma<\infty)=1$. Then the equivalence of (α) and (β) in (d) is a direct consequence of (+) in (b), taking $F = \Omega$. If $E_P(M_\sigma) = 1$, taking conditional expectations under $P$ of $M_\sigma$ given $\mathcal F_{\sigma\wedge N}$ yields a uniformly integrable $P$-martingale which $P$-almost surely coincides with $(M_{\sigma\wedge N})_{N\in\mathbb N_0}$; conversely, a uniformly integrable $P$-martingale $(M_{\sigma\wedge N})_{N\in\mathbb N_0}$ converges $P$-almost surely and in $L^1(P)$ to $M_\sigma$ as $N\to\infty$. This is the equivalence of (γ) and (β) in (d). □

For statistical purposes, the filtered spaces $(\Omega, \mathcal A, \mathbb F)$ in Definition 5.11 and Theorem 5.12 are frequently path spaces for certain classes of stochastic processes.

5.13 Notations. In dimension $d\ge1$, write $C$ for the space of continuous functions $f : [0,\infty)\to\mathbb R^d$ equipped with the topology of locally uniform convergence, and $\mathcal C$ for the Borel $\sigma$-field. At the same time, $\mathcal C$ is the $\sigma$-field generated by the coordinate projections $f \to f(t)$, $t\ge0$ (cf. [11], [64, Chap. VI.1]). On $(C, \mathcal C)$, we write $(\eta_t)_{t\ge0}$ for the canonical process, i.e. the process of coordinate projections $\eta_t(f) := f(t)$, $t\ge0$, $f\in C$. Let $\mathbb G$ denote the (right-continuous) filtration generated by $\eta$,

$\mathbb G = (\mathcal G_t)_{t\ge0}\,,\qquad \mathcal G_t := \bigcap_{r>t} \sigma( \eta_s : 0\le s\le r )\,,$

and call $\mathcal G_t$ (it contains all events in the process $\eta$ up to time $t$ and infinitesimally beyond) for short the $\sigma$-field of events up to time $t$.

We start with Brownian motion with unknown drift in dimension $d=1$, without a scaling constant.

5.14 Example. On the path space $(C, \mathcal C, \mathbb G)$ of Notations 5.13 with $d=1$, consider the probability measures

$Q_\vartheta := \mathcal L\big( (B_t + \vartheta t)_{t\ge0} \big)\,,\qquad \vartheta\in\mathbb R$

where $B$ is one-dimensional standard Brownian motion starting in $0$. For $\vartheta=0$, $Q := Q_0$ is Wiener measure on $(C, \mathcal C)$; the canonical process $(\eta_t)_{t\ge0}$ is Brownian motion under $Q$, and Brownian motion with drift $\vartheta$ under $Q_\vartheta$. We shall show that for all $\vartheta\in\mathbb R$

$Q_\vartheta \overset{loc}{\sim} Q$  relative to $\mathbb G$

and that

(∗)  $M_\vartheta = (M_{\vartheta,t})_{t\ge0}\,,\qquad M_{\vartheta,t} := \exp\Big( \vartheta\, \eta_t - \frac12\, \vartheta^2 t \Big)$

is the density process of $Q_\vartheta$ with respect to $Q$ relative to $\mathbb G$. For pairs $\vartheta'\neq\vartheta$ in $\mathbb R$, this implies that

(∗∗)  $L^{\vartheta'/\vartheta} = \big( L_t^{\vartheta'/\vartheta} \big)_{t\ge0}\,,\qquad L_t^{\vartheta'/\vartheta} := \exp\Big( (\vartheta'-\vartheta)\, m_t^{(\vartheta)} - \frac12\, (\vartheta'-\vartheta)^2\, t \Big)$

is the density process of $Q_{\vartheta'}$ with respect to $Q_\vartheta$ relative to $\mathbb G$, with notation $m^{(\vartheta)}$ for the local martingale part of the observation $\eta$ under $Q_\vartheta$ (in particular, $L^{\vartheta/0}$ is the $Q$-martingale $M_\vartheta$).

Thus for every fixed time $0<t<\infty$, with the pair $(S, J)$ in Definition 5.2 defined by $(\eta_t, t)$, we have the structure of a Gaussian shift experiment $E(t) = (C, \mathcal G_t, \{Q_\vartheta : \vartheta\in\mathbb R\})$ which corresponds to time-continuous observation of a trajectory under unknown $\vartheta$ up to time $t$.

Proof. Fix $\vartheta\in\mathbb R$ and consider the process $M_\vartheta$ on $(C, \mathcal C, Q)$ defined by (∗). The process takes values in $(0,\infty)$ and has $M_{\vartheta,0} = 1$ $Q$-almost surely. By the Itô formula, $M_\vartheta$ is a solution to $dM_{\vartheta,t} = \vartheta\, M_{\vartheta,t}\, d\eta_t$ under $Q$. Since $\eta$ is a $(Q,\mathbb G)$-martingale, $M_\vartheta$ is a local $(Q,\mathbb G)$-martingale and a nonnegative supermartingale. By the classical formula for Laplace transforms of normal laws $N(0,t)$,

$E_Q\big( e^{\vartheta \eta_t} \big) = E\big( e^{\vartheta B_t} \big) = e^{+\frac12 \vartheta^2 t}$  for $0\le t<\infty\,,$

$M_\vartheta$ satisfies $E(M_{\vartheta,t}) = 1$ for all $t<\infty$ and thus is a martingale. Fix a time horizon $T<\infty$ and define from $M_\vartheta$ a probability measure $\widetilde Q_{\vartheta,T}$ on $(C, \mathcal C)$ with the property

$\widetilde Q_{\vartheta,T}(A) := E_Q\big( M_{\vartheta,T}\, 1_A \big)$  for all $A\in\mathcal C$

$= E_Q\big( M_{\vartheta,s}\, 1_A \big)$  whenever $0\le s\le T$ and $A\in\mathcal G_s\,.$

$M_\vartheta$ being given by (∗), the Girsanov theorem (e.g. [14, App. A.3.3], [69, Sect. 3.5], [64, Chap. III.3]) establishes that $(\eta_t)_{t\ge0}$ remains a semi-martingale under $\widetilde Q_{\vartheta,T}$, with angle bracket under $\widetilde Q_{\vartheta,T}$ identical to the angle bracket $\langle \eta \rangle_t \equiv t$ under $Q$, and that

$\big( \eta_t - \vartheta\, (t\wedge T) \big)_{t\ge0}$  is a local martingale under $\widetilde Q_{\vartheta,T}\,.$

From this, by P. Lévy's characterisation of Brownian motion (e.g. [61, Chap. II.7]):

$\big( \eta_t - \vartheta\, (t\wedge T) \big)_{t\ge0}$  is a Brownian motion under $\widetilde Q_{\vartheta,T}\,.$

Thus under $\widetilde Q_{\vartheta,T}$ and up to time $T$, the canonical process $(\eta_t)_{0\le t\le T}$ is Brownian motion with drift $\vartheta$. The $\sigma$-field $\mathcal G_T$ being generated by the coordinate projections, there is at most one such probability on $\mathcal G_T$. Hence the restrictions of the laws $\widetilde Q_{\vartheta,T}$ and $Q_\vartheta$ to $\mathcal G_T$ coincide: thus $M_{\vartheta,T}$ is a version of the density $\frac{dQ_{\vartheta,T}}{dQ_T}$, which gives $Q_{\vartheta,T} \sim Q_T$ since $M_\vartheta$ is strictly positive.

As a consequence, the density process of $Q_\vartheta$ with respect to $Q$ coincides with $M_\vartheta$ up to time $T$; $T<\infty$ being arbitrary, we have identified $M_\vartheta$ as the density process of $Q_\vartheta$ with respect to $Q$ relative to $\mathbb G$.

The last part of the assertion is proved as follows: since $Q_{\vartheta'} \overset{loc}{\sim} Q_\vartheta$ for all pairs $\vartheta'\neq\vartheta$ in $\mathbb R$, the density process $L^{\vartheta'/\vartheta}$ of $Q_{\vartheta'}$ with respect to $Q_\vartheta$ relative to $\mathbb G$ is obtained from the ratios

$M_{\vartheta',t} / M_{\vartheta,t} = \exp\Big( (\vartheta'-\vartheta)\, \eta_t - \frac12\, (\vartheta'^2 - \vartheta^2)\, t \Big)$  under $Q_\vartheta$

where we have to rewrite $\eta$ in terms of its semi-martingale decomposition under $Q_\vartheta$: but

$(\vartheta'-\vartheta)\, \eta_t - \frac12\, (\vartheta'^2 - \vartheta^2)\, t = (\vartheta'-\vartheta)\big( \eta_t - \vartheta t \big) - \frac12\, (\vartheta'-\vartheta)^2\, t = (\vartheta'-\vartheta)\, m_t^{(\vartheta)} - \frac12\, (\vartheta'-\vartheta)^2\, t$

where $(m_t^{(\vartheta)})_{t\ge0}$ is the martingale part of the canonical process $(\eta_t)_{t\ge0}$ under $Q_\vartheta$. □
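A Monte Carlo sketch of the martingale property used in this proof, for one fixed time $t$ (the values of $\vartheta$ and $t$ are chosen arbitrarily for illustration; this only checks the marginal at time $t$, not the process statement):

import numpy as np

rng = np.random.default_rng(3)
theta, t = 0.8, 1.0
eta_t = rng.normal(0.0, np.sqrt(t), size=1_000_000)   # eta_t under Q (Wiener measure)
M = np.exp(theta * eta_t - 0.5 * theta**2 * t)        # candidate density M_{theta,t}
print(M.mean())                                       # approximately 1
# reweighting by M turns Q into Q_theta on G_t: E_{Q_theta}(eta_t) = theta * t
print((eta_t * M).mean(), theta * t)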

The general case (dimension $d\ge1$, scaling matrix for $d$-dimensional Brownian motion) is as follows.

5.15 The Experiment 'Scaled Brownian Motion with Unknown Drift'. Fix a matrix $\Lambda\in\mathbb R^{d\times d}$ which is symmetric and strictly positive definite, and let $\Lambda^{1/2}$ denote its square root. On the path space $(C, \mathcal C, \mathbb G)$ of Notations 5.13 with $d$-dimensional canonical process $(\eta_t)_{t\ge0}$, consider probability measures

$Q_h := \mathcal L\Big( \big( \Lambda^{1/2} B_t + (\Lambda h)\, t \big)_{t\ge0} \Big)\,,\qquad h\in\mathbb R^d$

where $B = (B_t)_{t\ge0}$ is $d$-dimensional standard Brownian motion starting from $B_0\equiv0$. Then we have

$Q_h \overset{loc}{\sim} Q_0$  relative to $\mathbb G$

for all $h\in\mathbb R^d$, with density process

(∗)  $L^{h/0} = (L_t^{h/0})_{t\ge0}\,,\qquad L_t^{h/0} := \exp\Big( h^\top \eta_t - \frac12\, h^\top (\Lambda t)\, h \Big)$

of $Q_h$ with respect to $Q_0$ relative to $\mathbb G$. For pairs $h'\neq h$ in $\mathbb R^d$, this implies that

(∗∗)  $L^{h'/h} = \big( L_t^{h'/h} \big)_{t\ge0}\,,\qquad L_t^{h'/h} := \exp\Big( (h'-h)^\top m_t^{(h)} - \frac12\, (h'-h)^\top (\Lambda t)\, (h'-h) \Big)$

is the density process of $Q_{h'}$ with respect to $Q_h$ relative to $\mathbb G$, with notation $m^{(h)}$ for the local martingale part of the canonical process $\eta$ under $Q_h$; note that

$\mathcal L\big( m^{(h)} \mid Q_h \big) = \mathcal L\big( \Lambda^{1/2} B \big)$  does not depend on $h\in\mathbb R^d\,.$

For $0<t<\infty$ fixed and for $(S, J)$ in Definition 5.2 defined by $(\eta_t, \Lambda t)$, we have the structure of a Gaussian shift experiment $E(\Lambda t) = (C, \mathcal G_t, \{Q_h : h\in\mathbb R^d\})$ which corresponds to time-continuous observation of the trajectory $\eta$ under unknown $h$ up to time $t$.

Proof. Note that in the definition of $Q_h$, the drift term of the form $\Lambda h$, $h\in\mathbb R^d$, involves the matrix $\Lambda$, i.e. the covariance matrix of $\eta_1$ under $Q_0$. Fix $h\in\mathbb R^d$. The proof is similar to the proof in Example 5.14, and we mention only the main steps. Again we start from a martingale $M_h$ on $(C, \mathcal C, Q_0)$ defined as in (∗):

$M_h = (M_{h,t})_{t\ge0}\,,\qquad M_{h,t} := \exp\Big( h^\top \eta_t - \frac12\, h^\top (\Lambda t)\, h \Big)$

and define for $0<T<\infty$ probability laws $\widetilde Q_{h,T}$ on $(C, \mathcal C)$ from $M_h$:

$\widetilde Q_{h,T}(A) := E_{Q_0}\big( M_{h,T}\, 1_A \big)$  for all $A\in\mathcal C$

such that by Girsanov's theorem and the definition of $M_h$

$\big( \eta_t - (t\wedge T)\, \Lambda h \big)_{t\ge0}$  is a local martingale under $\widetilde Q_{h,T}\,.$

By P. Lévy's characterisation of Brownian motion, this implies that

$\big( \eta_t - (t\wedge T)\, \Lambda h \big)_{t\ge0}$  is under $\widetilde Q_{h,T}$ a $d$-dimensional scaled Brownian motion with covariance matrix $\Lambda$, i.e. distributed as $\Lambda^{1/2}B$,

from which we deduce that $\widetilde Q_{h,T}$ coincides with the restriction of $Q_h$ to $\mathcal G_T$: $M_{h,T}$ being strictly positive, we have $Q_{h,T} \sim Q_{0,T}$, and $M_{h,T}$ is a version of the density $\frac{dQ_{h,T}}{dQ_{0,T}}$. The remaining parts of the proof are along the lines of Example 5.14. □

5.16 Remark. As a consequence of Example 5.15, we have optimal estimators for the drift parameter in the sense of Boll's Convolution Theorem 5.5, its reformulation in terms of subconvex loss functions (Corollary 5.8), and in the sense of the Minimax Theorem 5.10: when we observe $d$-dimensional scaled Brownian motion with unknown parameter $h\in\mathbb R^d$ up to (deterministic) time $0<t<\infty$, the central statistic

$Z = \Lambda^{-1} \big( \eta_t^{(1)}/t\,,\ \dots\,,\ \eta_t^{(d)}/t \big)^\top$

(with $\eta^{(1)},\dots,\eta^{(d)}$ the components of the canonical process $\eta$) is the best equivariant estimator and the minimax estimator for the unknown parameter, by the properties of the Gaussian shift experiment $E(\Lambda t) = (C, \mathcal G_t, \{Q_h : h\in\mathbb R^d\})$. Note that this estimator only makes use of the last observation $\eta_t$.
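A simulation sketch of this remark for $d=1$, so that $\Lambda$ and $h$ are scalars (all numerical values are chosen for illustration): we simulate path increments of $\eta$ under $Q_h$ on a grid and evaluate the central statistic, which indeed only needs the endpoint $\eta_t$; its error $Z - h$ has law $N(0, (\Lambda t)^{-1})$:

import numpy as np

rng = np.random.default_rng(4)
Lam, h, t, n_steps = 2.0, 0.5, 10.0, 1_000
dt = t / n_steps
# increments of eta = Lam^{1/2} B + (Lam h) * time under Q_h
inc = np.sqrt(Lam * dt) * rng.normal(size=n_steps) + Lam * h * dt
eta_t = inc.sum()
Z = eta_t / (Lam * t)        # central statistic; L(Z - h | Q_h) = N(0, 1/(Lam*t))
print(Z)                     # close to h = 0.5, with standard deviation 1/sqrt(Lam*t)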
Chapter 6

Quadratic Experiments and Mixed Normal Experiments

Topics for Chapter 6:

6.1 Quadratic and Mixed Normal Experiments
 Quadratic experiments 6.1–6.1''
 Mixed normal experiments 6.2–6.3
 Score and observed information in quadratic experiments 6.4–6.4'
 Strongly equivariant estimators in mixed normal experiments 6.5–6.5'
 Convolution theorem in mixed normal experiments 6.6–6.6''
 On approximate equivariance of arbitrary estimators 6.7
 Minimax theorem for mixed normal experiments 6.8

6.2 * Likelihood Ratio Processes in Diffusion Models
 Assumptions and notations 6.9
 Local absolute continuity of laws of solutions to SDE, density processes 6.10
 Example: Ornstein–Uhlenbeck model 6.11
 Quadratic models attached to the solution of an SDE 6.12
 A remark on quadratic variation and change of measure 6.12'

6.3 * Time Changes for Brownian Motion with Unknown Drift
 Changing time in the model 'scaled Brownian motion with unknown drift' 6.13
 Stopping times inducing quadratic experiments which are not mixed normal 6.14
 Observation up to some independent time 6.15
 Transformation by independent time change 6.16
 Example: one-sided stable processes and Mittag–Leffler processes 6.17–6.18

Exercises: 6.1''', 6.14'', 6.15', 6.15'', 6.18'

6.1 Quadratic and Mixed Normal Experiments

In this chapter we discuss quadratic and in particular mixed normal experiments. We show that the main results of the preceding chapter – the convolution theorem and the minimax theorem – can be generalised to mixed normal experiments, but do not carry over to general quadratic experiments. We present quadratic and mixed normal experiments in an approach which has been sketched by Davies [19], see also Jeganathan [66, 67] and Le Cam and Yang [84]. Stochastic process examples leading to quadratic or to mixed normal experiments will be given in Sections 6.2 and 6.3.

It seems natural to generalise Definition 5.2 and to allow for random matrices $J$ taking values in the set

$D_+ := \big\{ j\in\mathbb R^{d\times d} : j \text{ is symmetric and strictly positive definite} \big\} \in \mathcal B(\mathbb R^{d\times d})\,.$

6.1 Definition. A statistical model $(\Omega, \mathcal A, \{P_h : h\in\mathbb R^d\})$ is called a quadratic experiment $E(S, J)$ if there exists a pair $(S, J)$ of statistics

$S : (\Omega,\mathcal A) \to (\mathbb R^d, \mathcal B(\mathbb R^d))\,,\qquad J : (\Omega,\mathcal A) \to (\mathbb R^{d\times d}, \mathcal B(\mathbb R^{d\times d}))$

with the following properties (i) and (ii):

(i) $P_0$-almost surely, the statistic $J$ takes values in the set $D_+$;

(ii) for every $h\in\mathbb R^d$,

$\omega \;\to\; \exp\Big( h^\top S(\omega) - \frac12\, h^\top J(\omega)\, h \Big) =: L^{h/0}(\omega)$

is a version of the likelihood ratio of $P_h$ with respect to $P_0$.

The central statistic in a quadratic experiment $E(S, J)$ is defined as

(+)  $Z := 1_{\{J\in D_+\}}\; J^{-1} S\,.$

Frequently we will suppress the indicator in (+), and write for short $Z = J^{-1}S$ instead of (+).

6.1' Remark. (a) In a quadratic experiment $E(S, J)$ we have $P_h \sim P_0$ for all $h\in\mathbb R^d$, and the central statistic $Z = J^{-1}S$ is a maximum likelihood estimator for the unknown parameter $h\in\mathbb R^d$.

(b) In contrast to the Gaussian shift experiment $E(J)$, cf. Proposition 5.3(c) and (d) or Example 5.1(c), general quadratic experiments $E(S, J)$ are no longer robust against reparameterisation around reference points $0\neq h_0\in\mathbb R^d$ (examples will be seen in Chapter 8). So there is no analogue to Proposition 5.3(c) and (d) in Definition 6.1. □

6.1'' Remark. We have seen in Example 5.1 that a Gaussian shift experiment $E(J)$ exists for any given deterministic matrix $J\in\mathbb R^{d\times d}$ symmetric and strictly positive definite. In contrast to this, it is no longer true that arbitrarily prescribed pairs $(S, J)$ induce quadratic experiments once $J$ is random. The following nontrivial condition (○) is contained in the formulation of part (ii) of Definition 6.1:

(○)  $\int e^{\,h^\top S - \frac12 h^\top J h}\, dP_0 = 1$  for all $h\in\mathbb R^d\,.$

This is a necessary and sufficient condition in order that a 'candidate' pair $(S, J)$ on a probability space $(\Omega, \mathcal A, P_0)$ generates a quadratic experiment $E(S, J) = (\Omega, \mathcal A, \{P_h : h\in\mathbb R^d\})$. □
 
6.1''' Exercise. Consider the statistical model $\big( \Omega, \mathcal A, \{ P_\vartheta : \vartheta\in\mathbb R \} \big)$,

$\Omega := (0,\infty)\times\mathbb R\,,\qquad \mathcal A := \mathcal B(0,\infty)\otimes\mathcal B(\mathbb R)\,,\qquad P_\vartheta(dt, dx) := \lambda e^{-\lambda t}\, dt\; K^{(\vartheta)}(t, dx)$

where the second factor represents a family of transition probabilities

$K^{(\vartheta)}(t, dx) := N(\vartheta t, t)(dx)\,,\qquad t\in(0,\infty)\,,\ x\in\mathbb R$

from $((0,\infty), \mathcal B(0,\infty))$ to $(\mathbb R, \mathcal B(\mathbb R))$, parameterised by $\vartheta\in\mathbb R$. We keep $0<\lambda<\infty$ fixed.

(a) Write $f_\vartheta(t,x)$ for the density of $P_\vartheta$ with respect to Lebesgue measure on $(0,\infty)\times\mathbb R$ and show from Example 5.1 that

$f_\vartheta(t,x) = e^{\,\vartheta x - \frac12 \vartheta^2 t}\, f_0(t,x)\,,\qquad t\in(0,\infty)\,,\ x\in\mathbb R\,.$

Thus, with $S(t,x) = x$ and $J(t,x) = t$, we have a quadratic model in the parameter $\vartheta\in\mathbb R$, as defined in Definition 6.1. Moreover, the model $\{P_\vartheta : \vartheta\in\mathbb R\}$ is in particular mixed normal in the sense of Definition 6.2 since $\mathcal L( J \mid P_\vartheta )$ does not depend on the parameter $\vartheta\in\mathbb R$.

(b) For a pair $(\sigma, B)$ where $B$ and $\sigma$ are independent, where $B = (B_t)_{t\ge0}$ is a one-dimensional standard Brownian motion starting from $B_0\equiv0$ and $\sigma$ an exponential time with parameter $\lambda$, the above model arises as

$\big\{ P_\vartheta = \mathcal L\big( \sigma\,,\ B_\sigma + \vartheta\sigma \big) : \vartheta\in\mathbb R \big\}\,.$

(c) From (a), $\{P_\vartheta : \vartheta\in\mathbb R\}$ is a curved exponential family in $\zeta(\vartheta)$ and $T$ where $\zeta : \mathbb R\to\mathbb R^2$ is the mapping $\vartheta \to \zeta(\vartheta) = \big( -\frac12\vartheta^2\,,\ \vartheta \big)$ and $T = \mathrm{id}|_{(0,\infty)\times\mathbb R}$ the canonical variable on $(0,\infty)\times\mathbb R$. □
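The structure of this exercise is easy to probe by simulation. The following sketch (parameter values chosen for illustration) checks condition (○) of Remark 6.1'' for the pair $(S, J) = (B_\sigma + \vartheta\sigma, \sigma)$ under $P_0$, i.e. $E_0\big( e^{\vartheta S - \frac12 \vartheta^2 J} \big) = 1$:

import numpy as np

rng = np.random.default_rng(5)
lam, theta, n = 1.0, 0.7, 1_000_000
sigma = rng.exponential(1.0 / lam, size=n)    # observed information J = sigma ~ Exp(lam)
S = rng.normal(0.0, np.sqrt(sigma))           # under P_0: S = B_sigma is N(0, sigma) given sigma
print(np.exp(theta * S - 0.5 * theta**2 * sigma).mean())   # approximately 1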

6.2 Definition. A mixed normal experiment $E(S, J)$ is a quadratic experiment as in Definition 6.1 for which

$\mathcal L( J \mid P_h ) = \mathcal L( J \mid P_0 )$  does not depend on the parameter $h\in\mathbb R^d$.

In this case we write $P^J$ for $\mathcal L( J \mid P_h )$, $h\in\mathbb R^d$ arbitrary.

Mixed normality can be characterised as follows.

6.3 Proposition. In a quadratic experiment $E(S, J)$, the following assertions (i)–(iv) are equivalent:

(i) $E(S, J)$ is mixed normal;

(ii) $P_0^{S\mid J=j} = N(0, j)$ for $P_0^J$-almost all $j\in\mathbb R^{d\times d}$;

(iii) for all $h\in\mathbb R^d$ we have $P_h^{S\mid J=j} = N(jh, j)$ for $P_h^J$-almost all $j\in\mathbb R^{d\times d}$;

(iv) for all $h\in\mathbb R^d$ we have $P_h^{(Z-h)\mid J=j} = N(0, j^{-1})$ for $P_h^J$-almost all $j\in\mathbb R^{d\times d}$.

Proof. (1) By Definition 6.2, mixed normality is equivalent to the assertion


 
E0 .1B .J // D Eh .1B .J // D E0 1B .J / Lh=0
.C/
for all h 2 Rd and all B 2 B.Rd /.
Equality between the first and the third term in (+) is equivalent to any one of the
following assertions:
 for every h 2 Rd , the constant function 1 is a version of
>S  1 h> J h
E0 .e h 2 j J D /;
> >
for every h 2 Rd we have E0 .e h S j J D j / D e C 2 h j h for P0J -almost all
1


j 2 Rd d
S jJ Dj
 for every h 2 Rd we have P0 D N .0, j / for P0J -almost all j 2 Rd d .
Thus assertion (i)” .ii) of the proposition is proved; (iii)” .iv) is by definition
Z D 1¹J 2DC º J 1 S of the central statistic. We obviously have (iii)H)(ii); so the
following step (2) will finish the proof.
(2) Proof of (i)+(ii)H)(iii): Under (i) and (ii) we have for arbitrary A 2 B.Rd /,
B 2 B.Rd d /:
Eh .1B .J / 1A .S //
 
D E0 1B .J / 1A .S / Lh=0
 > 1 >

D E0 1B .J / 1A .S / e h S  2 h J h
Z Z
1 > S jJ Dj >
D P0J .dj / 1B .j / e  2 h j h P0 .ds/ 1A .s/ e h s .

From (ii) and the definition of the set DC preceding Definition 6.1, the last line reads
Z "Z #
J  12 h> j h 1  12 s > j 1 s h> s
P0 .dj / 1B \ DC .j / e ds p e 1A .s/ e ,
Rd .2/d jdet.j /j
and after rearranging terms  12 h> j h 12 s > j 1 sCh> s D  12 .sj h/> j 1 .sj h/
gives
Z Z
1 > 1
e  2 .sj h/ j .sj h/ 1A .s/ .
1
P0J .dj / 1B \ DC .j / ds p
Rd .2/d jdet.j /j
By (i) and Definition 6.2, the last expression equals
Z Z
P0 .dj / 1B \ DC .j / N .j h, j /.A/ D PhJ .dj / 1B \ DC .j / N .j h, j /.A/ .
J
152 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

The set DC has full measure under PhJ since Ph P0 . So the complete chain of
equalities shows
Z
Eh .1B .J / 1A .S // D PhJ .dj / 1B .j / N .j h, j /.A/

where A 2 B.Rd / and B 2 B.Rd d / are arbitrary. This gives (iii) and completes
the proof. 

6.4 Definition. In a quadratic experiment E.S , J /, for h 2 Rd , we call S  J h the


score in h and J the observed information.

Note that S  J h is not a score in h in the classical sense of Definition 1.2: if we


have
Mh :D S  J h D .r log f / .h, / on  ,
there is so far no information on integrability properties of Mh under Ph . Similarly, the
random object J is not Fisher information in h in the classical sense of Definition 1.2.

6.4’ Remark. In mixed normal experiments E.S , J /, the notion of ‘observed informa-
tion’ due to Barndorff-Nielsen (e.g. [3]) gains a new and deeper sense: the observation
! 2  itself communicates through the observed value j D J.!/ the amount of ‘in-
formation’ it carries about the unknown parameter. Indeed from Definition 6.2 and
Proposition 6.3(iii), the family of conditional probability measures
° ±
jJ Dj
Ph : h 2 Rd

is a Gaussian shift experiment E.j / as discussed in Proposition 5.3 for PJ -almost all
j 2 Rd d ; this means that we can condition on the observed information. By Propo-
sition 6.3, this interpretation is valid under mixed normality, and does not carry over
to general quadratic experiments.

Conditioning on the observed information, the convolution theorem carries over to


mixed normal experiments – as noticed by [66] – provided we strengthen the general
notion of Definition 5.4 of equivariance and consider estimators which are equivariant
conditional on the observed information.

6.5 Definition. In a mixed normal experiment E.S , J /, an estimator  for the un-
known parameter h 2 Rd is termed strongly equivariant if the transition kernel

./ .j , A/ ! Ph.h/jJ Dj .A/ , j 2 DC , A 2 B.Rd /

admits a version which does not depend on the parameter h 2 Rd .


Section 6.1 Quadratic and Mixed Normal Experiments 153

In a mixed normal experiment E.S , J /, the central statistic Z D 1¹J 2DC º J 1 S is


a strongly equivariant estimator for the unknown parameter, by Proposition 6.3(iv).
In the particular case of a Gaussian shift experiment E.J /, J is deterministic, and
the notions of equivariance (Definition 5.4) and strong equivariance (Definition 6.5)
coincide: this follows from a reformulation of Definition 6.5.

6.5’ Proposition. Consider a mixed normal experiment E.S , J / and an estimator 


for the unknown parameter h 2 Rd . Then  is strongly equivariant if and only if

.C/ L . .  h, J / j Ph / does not depend on the parameter h 2 Rd .

Proof. Clearly Definition 6.2 combined with ./ of Definition 6.5 gives (+). To prove
the converse, fix a countable and \–stable generator S of the  -field B.Rd /, and
assume that Definition 6.2 holds in combination with (+). Then for every A 2 S and
every h 2 Rd , we have from (+)

Eh . 1C .J / 1A .  h/ / D E0 . 1C .J / 1A ./ /

for all C 2 B.Rd d /, and thus


Z Z
J hjJ Dj jJ Dj
P .dj / 1C .j / Ph .A/ D PJ .dj / 1C .j / P0 .A/

for all C 2 B.Rd d /. This shows that the functions


hjJ Dj jJ Dj
j ! Ph .A/ and j ! P0 .A/

coincide PJ -almost surely on Rd d . Thus there is some PJ -null set Nh,A 2 B.Rd /
such that
PhhjJ Dj .A/ D P0jJ Dj .A/ for all j 2 DC nNh,A
where DSC is the set defined before Definition 6.1. Keeping h fixed, the countable union
Nh D A2S Nh,A is again a PJ -null set in B.Rd /, and we have

j 2 DC nNh : PhhjJ Dj .A/ D P0jJ Dj .A/ for all A 2 S .

S being a \–stable generator of B.Rd /, we infer:


hjJ Dj
the probability laws Ph ./ and P0jJ Dj ./
. /
coincide on .Rd , B.Rd // when j 2 DC nNh .
jJ Dj
Let K., / denote a regular version of the conditional law .j , A/ ! P0 .A/ ,
j 2 DC , A 2 B.Rd /. Since DC nNh is a set of full measure under PJ , assertion . /
hjJ Dj
shows that K., / is at the same time a regular version of .j , A/ ! Ph .A/ .
154 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

So far, h 2 Rd was fixed but arbitrary: hence K., / is a regular version simultane-
hjJ Dj
ously for all conditional laws .j , A/ ! Ph .A/ , h 2 Rd . We thus have ./
in Definition 6.5 which finishes the proof. 

We state the first main result on estimation in mixed normal experiments, due to
Jeganathan [66].

6.6 Convolution Theorem. Consider a strongly equivariant estimator  in a mixed


normal experiment E.S , J /. Then there exist probability measures ¹Qj : j 2 DC º on
.Rd , B.Rd // such that

for PJ -almost all j 2 DC :


.h/jJ Dj
Ph D N .0, j 1 / ? Qj does not depend on h 2 Rd ,

and estimation errors of  are distributed as


Z
L .  h j Ph / D PJ .dj / N .0, j 1 / ? Qj

independently of h 2 Rd . In addition, .j , A/ ! Qj .A/ is obtained as a regular


.Z/jJ Dj
version of the conditional law P0 .A/ where j 2 DC and A 2 B.Rd /.

Proof. Since  is a strongly equivariant estimator for the unknown parameter h 2 Rd ,


it is sufficient to determine a family of probability measures ¹Qj : j 2 Rd d º on
.Rd , B.Rd // such that

jJ Dj
for PJ -almost all j 2 DC : P0 .A/ D N .0, j 1 / ? Qj .

e / of the conditional law of the pair ., S / given J D 


Prepare a regular version K.,
under P0

e , C / :D P0.,S /jJ Dj .C / ,
.j , C / ! K.j j 2 DC , C 2 B.Rd /˝B.Rd /

with second marginals specified according to Proposition 6.3(ii):

e , Rd / D N .0, j /
K.j for all j 2 DC .

(1) For t 2 Rd fixed, for arbitrary events B 2 B.Rd d / and arbitrary h 2 Rd , we


start from Proposition 6.5’
 >   > 
E0 1B .J / e i t  D Eh 1B .J / e i t .h/
Section 6.1 Quadratic and Mixed Normal Experiments 155

which gives on the right-hand side


 >

E0 1B .J / e i t 
 >
  > > 1 > 
D E0 1B .J / e i t .h/ Lh=0 D E0 1B .J / e i t .h/ e h S  2 h J h
Z Z
J
D P0 .dj / 1B .j / e , .d k, ds// e i t > .kh/ e h> s  12 h> j h
K.j
Z Z
D PJ .dj / 1B\DC .j / e , .d k, ds// e i t > .kh/ e h> s  12 h> j h
K.j

and on the left-hand side


  Z Z
i t > J e , .d k, ds// e i t > k
E0 1B .J / e D P .dj / 1B\DC .j / K.j

e /. We compare the expressions in square brackets in both equal-


by definition of K.,
ities: since B 2 B.Rd d / is arbitrary and since PJ is concentrated on DC , the func-
tions Z
j ! e , .d k, ds// e i t > .kh/ e h> s  12 h> j h
K.j
and Z
j ! e , .d k, ds// e i t > k
K.j

coincide PJ -almost surely on DC , for every fixed value of h 2 Rd . Let Nh denote
an exceptional PJ -null set
S with respectJ to a fixed value of h in the last formula. The
countable union N :D Nh is a P -null set in DC with the property
h2 Qd
8R
ˆ e i t > .kh/ e h> s  1
h> j h
< Rd RdRK.j , .d k, ds// e 2

.˘/ D Rd Rd K.je , .d k, ds// e i t > k



for all j 2 DC n N and for all h 2 Qd .

(2) Now we fix j 2 DC n N . We consider the first integral in .˘/ as a function of h


Z
Rd 3 h ! e , .d k, ds// e i t > .kh/ e h> s  12 h> j h
K.j

which as in the proof of Theorem 5.5 is extended to an analytic function


Z
d
C 3 z ! e , .d k, ds// e i t > .kz/ e z > s  12 z > j z D: fj .z/ .
K.j

The second integral in .˘/ equals fj .0/. The function fj being analytic, assertion .˘/
yields
for all j 2 DC n N : the function fj ./ is constant on C d .
156 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

In particular, for all j 2 DC n N ,


fj .0/ D fj .i j 1 t /
which gives in analogy to ./ in the proof of Theorem 5.5
Z Z
.C/ e , .d k, ds// e i t > k D e  12 t > j 1 t
K.j e , .d k, ds// e i t > .kj 1 s/ .
K.j

e / we have proved
By definition of K.,
for all j 2 DC n N :
.CC/  >  1 > 1  > 
E0 e i t  j J D j D e  2 t j t E0 e i t .Z/ j J D j .

(3) So far, we have considered some fixed value of t 2 Rd . Hence the PJ -null set
N in .CC/ depends on t . Taking the union of such null sets for all t 2 Qd we obtain
e for which dominated convergence with respect to the argument t in
a PJ -null set N
both integrals in (+) establishes
for all j 2 DC n Ne:
 >  1 > 1
 > 
E0 e i t  j J D j D e  2 t j t E0 e i t .Z/ j J D j , t 2 Rd .
This is an equation between characteristic functions of conditional laws. Introducing
a regular version
.Z/jJ Dj
.j , A/ ! Qj .A/ D P0 .A/

for the conditional law of   Z given J D  under P0 we have


e , all A 2 B.Rd / :
for all j 2 DC n N P0jJ Dj .A/ D ŒN .0, j 1 / ? Qj .A/ .
This finishes the proof of the convolution theorem in the mixed normal experiment
E.S , J /. 

In combination with Proposition 6.3(iv), the Convolution Theorem 6.6 shows that
within the class of all strongly equivariant estimators for the unknown parameter h 2
Rd in a mixed normal experiment E.S , J /, the central statistic Z D 1¹J 2DC º J 1 S
achieves minimal spread of estimation errors, or equivalently, achieves the best pos-
sible concentration of an estimator around the true value of the parameter. In analogy
to Corollary 5.8 this can be reformulated – again using Anderson’s Lemma 5.7 – as
follows:

6.6’ Corollary. In a mixed normal experiment E.S , J /, with respect to any subconvex
loss function, the central statistic Z minimises the risk
R., h/  R.Z, h/ , h 2 Rd
in the class of all strongly equivariant estimators  for the unknown parameter.
Section 6.1 Quadratic and Mixed Normal Experiments 157

6.6” Remark. In the Convolution Theorem 6.6 and in Corollary 6.6’, the best concen-
trated distribution for estimation errors
Z 
 
L .Z j P0 / D L PJ .dj / N 0 , j 1

does not necessarily admit finite second moments. As an example, the mixing distribu-
tion PJ might be in dimension d D 1 some Gamma law .a, p/ with shape parameter
a 2 .0, 1/: then Z
  1
EP0 Z 2 D .a, p/.du/ D 1
.0,1/ u
and the central statistic Z does not belong to L2 .P0 /. We shall see examples in Chap-
ter 8. When L .ZjP0 / does not admit finite second moments, comparison of estima-
tors based on squared loss `.x/Dx 2 – or based on polynomial loss functions – does not
make sense, thus bounded loss functions are of intrinsic importance in Theorem 6.6
and Corollary 6.6’, or in Lemma 6.7 and Theorem 6.8 below. Note that under mixed
normality, estimation errors are always compared conditionally on the observed infor-
mation, never globally.

In mixed normal experiments, one would like to be able to compare the central
statistic Z not only to strongly equivariant estimators, but to arbitrary estimators for
the unknown parameter h 2 Rd . Again we can condition on the observed information
to prove in analogy to Lemma 5.9 that arbitrary estimators are ‘approximately strongly
invariant’ under a ‘very diffuse’ prior, i.e. in the absence of any a priori information
except that the unknown parameter should range over large balls centred at the origin.
Again Bn denotes the closed ball ¹x 2 Rd : jxj  nº, and n the uniform law on Bn .

6.7 Lemma. In a mixed normal experiment E.S , J /, every estimator  for the un-
j
known parameter h 2 Rd can be associated to probability measures ¹ Qn : j 2
DC , n  1 º on .Rd , B.Rd // such that
Z Z h i
J 1 j
d1 n .dh/ L .  hjPh / , P .dj / N .0, j / ? Qn ! 0

as n ! 1 ,

and for every bounded subconvex loss function the risks


Z
R., Bn / D n .dh/ Eh .`.  h//

can be represented as n ! 1 in the form


Z Z h i
J
R., Bn / D P0 .dj / N .0, j 1 / ? Qnj .dv/ `.v/ C o.1/ , n!1.
158 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

Proof. The second assertion is a consequence of the first since `./ is measurable and
bounded, by (+) in Remark 5.8’. We prove the first assertion in analogy to the proof
of Lemma 5.9.
(1) The pair .Z, J / is a sufficient statistic in the mixed normal experiment E.S , J /.
Thus, for any random variable  taking values in .Rd , B.Rd //, there is a regular
version of the conditional law of  given .Z, J / D  which does not depend on the
parameter h 2 Rd . We fix a transition kernel K., e / from Rd DC to Rd which
provides a common regular version
e j.Z,J /D.z,j /
K..z, j /, A/ D Ph .A/ , A 2 Rd , .z, j / 2 Rd DC

under all values h 2 Rd of the parameter. When we wish to keep j 2 DC fixed, we


write for short
e
K j .z, A/ :D K..z, j /, A/ , A 2 Rd , z 2 Rd .

Sufficiency also allows to consider mixtures


Z
j . Z/j.Z,J /D.h,j /
Qn .A/ :D n .dh/ P .A/ , A 2 B.Rd /

for every j 2 DC and every n 2 N.


(2) With notations introduced in step (1), we have from Proposition 6.3(iv)
Z
n .dh/ Ph .  h 2 A/
Z Z Z
ZjJ Dj e
D n .dh/ PhJ .dj / Ph .dz/ K..z, j /, A C h/
Z Z Z
D PJ .dj / n .dh/ N .h, j 1 /.dz/ K j .z, A C h/
Z Z Z
J 1
D P .dj / N .0, j /.dx/ n .dh/ K j .x C h, A C h/

in analogy to step (3) of the proof of Lemma 5.9.


(3) In analogy to step (2) in the proof of Lemma 5.9, we have bounds
ˇZ ˇ
ˇ ˇ
./ ˇ n .dh/ K .x C h, A C h/  Q .A  x/ ˇ  .Bn 4.Bn C x//
j j
ˇ n ˇ .Bn /

which are uniform in A 2 B.Rd / and j 2 DC . To check this, start from


Z
1
.dh/ 1Bn .h/ K j .x C h, A C h/
.Bn /
Z
1
D .dh0 / 1Bn Cx .h0 / K j .h0 , A  x C h0 /
.Bn /
Section 6.1 Quadratic and Mixed Normal Experiments 159

e / equals
which by definition of K.,
Z
1  
e .h0 , j /, A  x C h0
.dh0 / 1Bn Cx .h0 / K
.Bn /
Z
1  
D .dh0 / 1Bn Cx .h0 / P  2 .A  x C h0 / j .Z, J / D .h0 , j /
.Bn /
Z
1  
D .dh0 / 1Bn Cx .h0 / P .  Z/ 2 .A  x/ j .Z, J / D .h0 , j / .
.Bn /
Now the last expression
Z
1 Z/j.Z,J /D.h0 ,j /
.dh0 / 1Bn Cx .h0 / P. .A  x/
.Bn /
can be compared to
Z
1 . Z/j.Z,J /D.h0 ,j /
Qnj .A  x/ D .dh0 / 1Bn .h0 / P .A  x/
.Bn /

up to the error bound on the right-hand side of ./, uniformly in A 2 B.Rd / and
j 2 DC .
(4) Combining steps (2) and (3), we approximate
Z Z Z
n .dh/ Ph .  h 2 A/ by PJ .dj / N .0, j 1 /.dx/ Qnj .A  x/

uniformly in A 2 B.Rd / as n ! 1, up to error terms


Z Z
.Bn 4.Bn C x//
PJ .dj / N .0, j 1 /.dx/
.Bn /
as in the proof of Lemma 5.9. Using dominated convergence twice, these vanish as
n ! 1. 

The following is – together with the Convolution Theorem 6.6 – the second main
result on estimation in mixed normal experiments.

6.8 Minimax Theorem. In a mixed normal experiment E.S , J /, the central statistic
Z is a minimax estimator for the unknown parameter with respect to any subconvex
loss function `./: we have
Z Z
ZjJ Dj
sup R., h/  E0 .`.Z// D PJ .dj / P0 .dz/ `.z/ D sup R.Z, h/ .
h2Rd h2Rd

Proof. Consider any estimator  for the unknown parameter h 2 Rd in E.S , J /, and
any subconvex loss function `./. Since E0 .`.Z// (finite or not) is the increasing limit
160 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

of E0 .Œ` ^ N .Z// as N ! 1 where all ` ^ N are subconvex loss functions, it is


sufficient to prove a lower bound
sup R., h/  E0 .`.Z// D R.Z, 0/
h2Rd
for subconvex loss functions which are bounded. Then we have a chain of inequalities
(its first two ‘’ are trivial) where we can apply Lemma 6.7:
Z
sup R., h/  sup R., h/  n .dh/ Eh .`.  h// D R., Bn /
h2Rd h2B
Z n Z h i
J
D P0 .dj / N .0, j 1 / ? Qnj .dv/ `.v/ C o.1/ as n ! 1 .

Now Anderson’s Lemma 5.7 allows us to compare integral terms for all n, j fixed,
and gives lower bounds
Z Z
 P .dj / N .0, j 1 /.dv/ `.v/ C o.1/ as n ! 1 .
J

The last integral does not depend on n, and by Definition 6.2 and Proposition 6.3(iv)
equals Eh .`.Z  h// D R.Z, h/ for arbitrary h 2 Rd . This finishes the proof. 

What happens beyond mixed normality, in the genuinely quadratic case? We know
that Z D J 1 S is a maximum likelihood estimator for the unknown parameter, by
Remark 6.1’, we know that for J.!/ 2 DC the log-likelihood surface
1 >
h ! ƒh=0 .!/ D h> S.!/  h J.!/ h
2
1
D  .h  Z.!//> J.!/ .h  Z.!// C expressions not depending on h
2
has the shape of a parabola which opens towards 1 and which admits a unique maxi-
mum at Z.!/, but this is no optimality criterion. For quadratic experiments which are
not mixed normal, satisfactory optimality results seem unknown. For squared loss,
Gushchin [38, Thm. 1, assertion 3] proves admissibility of the ML estimator Z un-
der random norming by J (which makes the randomly normed estimation errors at h
coincide with the score S  J h at h) in dimension d D 1. He also has Cramér–Rao
type results for restricted families of estimators. Beyond the setting of mixed nor-
mality, results which allow to distinguish an optimal estimator from its competitors
simultaneously under a sufficiently large class of loss functions – such as those in the
convolution theorem or in the minimax theorem – seem unknown.


6.2 Likelihood Ratio Processes in Diffusion Models
Statistical models for diffusion processes provide natural examples for quadratic ex-
periments. For laws of solutions of stochastic differential equations, we consider first

Section 6.2 Likelihood Ratio Processes in Diffusion Models 161

the problem of local absolute continuity or local equivalence of probability measures


and the structure of the density processes (this relies on Liptser and Shiryayev [88]
and Jacod and Shiryaev [64]), then we specialise to settings where quadratic statistical
models arise. Since this section is preceded by an asterisk  , a reader not interested in
stochastic processes may skip this and go directly to Chapter 7.
As in Notation 5.13, .C , C , G/ is the canonical path space for Rd -valued stochastic
processes with continuous paths; G is the (right-continuous) filtration generated by
the canonical process  D . t / t0 . Local absolute continuity and local equivalence
of probability measures were introduced in Definition 5.11; density processes were
defined in Theorem 5.12.

6.9 Assumptions and Notations for Section 6.2. (a) We consider d -dimensional
stochastic differential equations (SDE)
.I/ dX t D b.t , X t / dt C  .t , X t / d W t , t 0, X0  x0
.II/ dX t0 Db 0
.t , X t0 / dt C  .t , X t0 / d W t , t 0, X0  x0
driven by m-dimensional Brownian motion W . SDE (I) and (II) share the same diffu-
sion coefficient
 : Œ0, 1/ Rd ! Rd m
but have different drift coefficients
b , b0 : Œ0, 1/ Rd ! Rd .
Both equations (I) and (II) have the same starting value x0 2 Rd . All coefficients are
measurable in .t , x/; we assume Lipschitz continuity in the second variable
jb.t , x/  b.t , y/j C jb 0 .t , x/  b 0 .t , y/j C k .t , x/   .t , y/k  K jx  yj
(for t  0 and x, y 2 Rd ) together with linear growth
jb.t , x/j2 C jb 0 .t , x/j2 C k .t , x/k2  K .1 C jxj2 /
where K is some global constant. The d d -matrices
c.t , x/ :D  .t , x/  > .t , x/
will be important in the sequel.
(c) Under the assumptions of (b), equations (I) and (II) have unique strong solutions
(see e.g. [69, Chap. 5.2.B]) on the probability space ., A, F , P / carrying the driving
Brownian motion W , with F the filtration generated by W . We will be interested in
the laws of the solutions
Q :D L. X j P / and Q0 :D L. X 0 j P /
on the canonical path space .C , C , G/ .
162 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

0
(d) We write mX and mX for the .P , F /-local martingale parts of X and X 0
Z t Z t
X0 0 0
mXt D X t  X 0  b.s, X s / ds , m t D X t  X 0  b.s, Xs0 / ds .
0 0
Their angle brackets are the predictable processes
D E Z t D 0E Z t
m X
D c.s, Xs / ds , mX
D c.s, Xs0 / ds
t 0 t 0

taking values in the space of symmetric and non-negative definite d d -matrices.

We shall explain in Remark 6.12’ below why equations (I) and (II) have been as-
sumed to share the same diffusion coefficient. The next result is proved with the argu-
ments which [88, Sect. 7] use on the way to their Theorems 7.7 and 7.18. See also [64,
pp. 159–160, 179–181, and 187]).

6.10 Theorem. Let the drift coefficients b in (I) and b 0 in (II) be such that a measurable
function
 : Œ0, 1/ Rd ! Rd
exists which satisfies the following conditions (+) and (++):

.C/ b 0 .t , x/ D b.t , x/ C c.t , x/ .t , x/ , t  0 , x 2 Rd ,


Z t 
.CC/  > c  .s, s / ds < 1 for all t < 1, Q- and Q0 -almost surely.
0

(a) Then the probability laws Q0 and Q on .C , C / are locally equivalent relative to G.
(b) The density process of Q0 with respect to Q relative to G is the .Q, G/-martingale
²Z t Z ³
1 t > 
L D .L t / t0 , L t D exp  > .s, s / d mQ
s   c  .s,  s / ds
0 2 0

where mQ denotes the local martingale part of the canonical process  under Q:
Z t
Q Q
mQ D .m t / t0 , m t D  t  0  b.s, s / ds
0
and where the local martingale
Z
 > .s, s / d mQ
s D: M

under Q has angle bracket


Z t 
hM i t D  > c  .s, s / ds , t  0.
0

Section 6.2 Likelihood Ratio Processes in Diffusion Models 163

(c) The process L in (b) is the exponential


° 1 ±
L t D exp M t  hM i t , t 0
2
of M under Q, in the sense of a solution .L t / t0 under Q to the SDE
Z t
Lt D 1 C L t dM t , t 0.
0

A usual notation for the exponential of M under Q is EQ .M /.

Proof. (1) On .C , C , G/ define stopping times


² Z t  ³
n :D inf t > 0 :  > c  .s, s / ds > n , n1
0

(which may take the value C1 with positive probability under Q or under Q0 ) and
write
 .n/ .s, s / :D .s, s / 1ŒŒ0,n ŒŒ .s/ , n  1 .
By assumption (++) which is symmetric in Q and Q0 , we have

n " 1 Q- and Q0 -almost surely.

(2) For every n 2 N, on ., A, F , P /, we also have unique strong solutions for
equations (II.n/ ):
.n/ .n/ .n/
.II.n/ / dX t D Œb C c .n/ .t , X t / dt C  .t , X t / d W t , t 0, X0  x0 .

By unicity, X .n/ coincides on ŒŒ0, n  with the solution X 0 to SDE (II) where n is the
F -stopping time n :D n ıX 0 . If we write Q.n/ for the law L.X .n/ jP / on .C , C , G/,
then the laws Q.n/ and Q0 coincide in restriction to the  -field Gn of events up to
time n .
(3) On .C , C , G/, the Novikov criterion [61, p. 152] applies – as a consequence of
the stopping in step (1) – and grants that for fixed n 2 N
²Z t Z  ³
1 t  .n/ >
LL t D exp Œ .n/ > .s, s / d mQ
.n/ .n/
s  Œ  c Œ  .s, s / ds , t 0
0 2 0

is a Q-martingale. Thus LL .n/ defines a probability measure QL .n/ on .C , C , G/ via


R .n/
QL .n/ .A/ D A LL t dQ if A 2 G t . Writing ˇ for arbitrary directions in Rd , Girsanov’s
theorem [64, Thm. III.3.11] then states that
Z  
> Q > Q .n/ > Q
ˇ m  ˇ m , Œ  .s, s / d ms , 1  i  d
0
164 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

is a QL .n/ -local martingale whose angle bracket under QL .n/ coincides with the angle
bracket of ˇ >mQ under Q. For the d -dimensional process mQ this shows that
Z 
Q
m  Œ c  .n/ .s, s / ds
0

is a QL .n/ -local martingale whose angle bracket under QL .n/ coincides with the angle
bracket of mQ under Q. Since C is generated by the coordinate projections, this sig-
nifies that QL .n/ is the unique law on .C , C , G/ such that the canonical process  under
QL .n/ is a solution to SDE .II.n/ /. As a consequence, the two laws coincide: Q.n/ from
step (2) is the same law as QL .n/ , on .C , C , G/.
(4) For every n fixed, step (3) shows the following: the laws Q.n/ and Q are locally
equivalent relative to G since the density process of Q.n/ with respect to Q relative
to G
L^n D LL .n/
is strictly positive Q-almost surely. Recall also that the laws Q.n/ and Q0 coincide in
restriction to Gn .
loc
(5) The following argument proves Q0 << Q relative to G. For every t < 1,
 0 
A 2 G t H) Q0 .A/ D P X^t 2A
 0 
 P X^t 2 A , n > t C P . n  t /
 
.n/
 P X^t 2 A C P . n  t /
D Q.n/ . A / C Q0 . n  t /
D Q.n/ . A / C o.1/ as n ! 1
since X 0 and X .n/ coincide up to time n D n ı X 0 as above, and since n " 1
Q 0 -almost surely. Now, Q.n/ and Q being locally equivalent relative to G for all n,
by step (4), the above gives
A 2 G t , Q.A/ D 0 H) Q.n/ .A/ D 0 8 n  1 H) Q0 .A/ D 0
loc
which proves Q0 << Q relative to G.
(6) We prove that L is the density process of Q0 with respect to Q relative to G. For
A 2 G t , the event A \ ¹t < n º belongs to Gn , the  -field of events strictly before
time n (for a G-stopping time S , GS  is defined as the smallest  -field containing
the system ® ¯
G0 [ G \ ¹s < S º : G 2 Gs , 0  s < 1 ,
cf. [98, p. 17], and is contained in GS ). Thus A \ ¹t < n º 2 Gn : as a consequence,

Q0 .A/ D lim Q0 .A \ ¹t < n º/ D lim Q.n/ .A \ ¹t < n º/


n!1 n!1
D lim QL .n/ .A \ ¹t < n º/
n!1

Section 6.2 Likelihood Ratio Processes in Diffusion Models 165

which by the above – and since n " 1 Q-almost surely – equals


Z Z
lim LL .n/ dQ D lim L t^n dQ
n!1 A\¹t< º t n!1 A\¹t< º
n
Z n
Z
D lim L t dQ D L t dQ .
n!1 A\¹t< º A
n

This identifies L as the density process of Q0 with respect to Q relative to G.


loc
(7) By step (5), we know Q0 << Q relative to G. By step (6), L D .L t / t0 is a
loc
strictly positive Q-martingale, so this strengthens to Q0 Q relative to G. Now parts
(a) and (b) of the theorem are proved. Part (c) is an application of the Ito formula. 

The Ornstein–Uhlenbeck model as a special case of Theorem 6.10 (take b.t , x/  0,


 .t , x/  1, and .t , x/ D h x for h 2 R) is a well-known prototype of quadratic
models, in dimension d D 1.

6.11 Example. On the canonical path space .C , C , G/ of Notation 5.13 with canonical
process , write Q0 for Wiener measure. Let Qh denote the law of the solution to the
Ornstein–Uhlenbeck SDE

dX t D h X t dt C dB t , X0  0

for every h 2 R. Then according to Theorem 6.10, all laws Qh are locally equivalent
relative to G, and the density process of Qh with respect to Q0 relative to G is
² Z t Z ³
 h=0  h=0 1 2 t 2
Lh=0 D L t t0 , L t :D exp h s d m.0/
s  h  s ds
0 2 0

with m.0/ the martingale part of  under Q0 . For 0 < t < 1 fixed, observation of 
up to time t
 
E.S , J / D C , G t , ¹Qh.t/ : h 2 Rº , Qh.t/ the restriction of Qh to G t

yields a quadratic experiment in the sense of Definition 6.1, with score in 0 and ob-
served information given by
Z t Z t
.0/
S :D s d ms , J :D 2s ds .
0 0

This is not a mixed normal model: first, L.J jQh / depends on h 2 R, second, from
Ito formula and L.., m.0/ /jQ0 / D L..B, B// where B denotes Brownian motion,
the law Z t   
1 2
L.S jQ0 / D L Bs dBs D L .B t  t /
0 2
166 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

is concentrated on the half-line Œ 2t , 1/. Thus, according to Definition 6.2 and Propo-
sition 6.3, this quadratic model E.S , J / is not a mixed normal model. 
R
Scores of similar structure such as Bs dBs motivated Jeganathan to view quadratic
models as ‘Brownian functional models’ (see [67], and the references there). The fol-
lowing example shows that on .C , C , G/, we can attach quadratic statistical models to
solutions of stochastic differential equations in a natural way, under the sole restric-
tion that angle brackets of local martingale parts of the observed  should be invertible.

6.12 Example. Assume that c.t , x/ 2 Rd d is invertible for all t and all x. Fix a drift
coefficient b for SDE (I) and write Q for the law of the solution of (I). For SDE (II),
introduce a parameter h 2 Rd , define drift functions

bh .t , x/ :D b.t , x/ C c.t , x/ h , h 2 Rd

which depend on the parameter, and write Qh for the law of the solution of (II) with
b 0 D bh . Clearly bh ¤ bh0 for h ¤ h0 by assumption on c., /. We thus have a
statistical model

¹ Qh : h 2 Rd º on .C , C , G/ with Q0 :D Q .

Note that for every h 2 Rd , with ., /  h constant, the assumptions (+) and (++)
of Theorem 6.10 are satisfied, and Theorem 6.10 gives the following.
(a) We have
loc
Qh Q0 relative to G ,
and the density process of Qh with respect to Q0 relative to G is
² ³
1  
L t D exp h>m t  h>hm.0/ i t h D E t h>m.0/
h=0 .0/
2

where m.0/ is the .G, Q0 /-martingale part of the canonical process . By assumption
on c., /, Z t
h>hm.0/ i t h D h>c.s, s / h ds
0

is strictly positive for every h 2 R Hence the random matrix hm.0/ i t defined on
d.

.C , C / is invertible for every t  0. As a consequence, for every 0 < t < 1, the


model
.0/
¹ Qht : h 2 Rd º D E.S , J / : S :D m t , J :D hm.0/ i t
(with Qht the restriction of Qh to G t ) is a quadratic model satisfying all assumptions
of Definition 6.1.
(b) Fix any G-stopping time  on .C , C / which has the property

0 <  < 1 Qh -almost surely for every h 2 Rd .



Section 6.2 Likelihood Ratio Processes in Diffusion Models 167

Then according to ./ in Theorem 5.12(c), we can replace the deterministic time t in
(a) by  . This yields a statistical model

¹ Qh : h 2 Rd º D E.S , J / : S :D m0 , J :D hm0 i

(with Qh the restriction of Qh to G ) which again is quadratic in the sense of Defini-
tion 6.1. 

It remains to explain why equations (I) and (II) in Assumption 6.9 have been as-
loc
sumed to share the same diffusion coefficient. The reason is that for measures Q Q0
on .C , C , G/, under time-continuous observation of the canonical process  , the local
martingale part of  can not be modified:

6.12’ Remark. Consider Assumptions 6.9(a) in the special case d D m D 1, and


replace  ., / in equation (II) by  0 ., / : Œ0, 1/ R ! R such that Lipschitz and
linear growth conditions as in Assumptions 6.9 hold for both pairs .b,  / and .b0 ,  0 /.
For X solution to (I) and X 0 solution to (II), fix a common starting point x 2 R, and
consider the laws Q D L.X jP / and Q0 D L.X 0 jP 0 / on .C , C , G/ as in Assump-
tions 6.9. Assume that Q and Q0 are locally equivalent with respect to G. Consider a
sequence of partitions

…n :D ¹tn,i : i 2 N0 º , 0 D tn,0 < tn,1 <    < tn,`n D n ,


sup jtn,i  tn,i1 j ! 0
1i`n

as n ! 1, and empirical quadratic variations


X  2
Vn .t / :D  tn,i C1 ^t   tn,i , t  0.
i : tn,i <t

We have the following convergences in probability as n ! 1, cf. [98, p. 122–125],


and [64, p. 55]):
Z t
Vn .t / ! Œ .s, s /2 ds under Q ,
0
. / Z t
Vn .t / ! Œ 0 .s, s /2 ds under Q0 .
0

Next, select some subsequence .nk /k such that . / holds almost surely for every time
t which is rational, and then consider events in G t
² Z t ³
A t :D lim Vnk .t / D 2
Œ .s, s / ds ,
k!1 0
² Z t ³
A0t :D lim Vnk .t / D Œ 0 .s, s /2 ds .
k!1 0
168 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

loc
Here A t is a set of full Q-measure, and A0t a set of full Q0 -measure. From Q Q0
relative to G we deduce that the set A t \ A t has full measure under both Q and Q0 .
0

This holds for all rational t . Hence under both laws Q and Q0 , the processes
D E Z t  D E Z t 
Q Q0 0
. / m D 2
Œ .s, s / ds , m D 2
Œ .s, s / ds
0 t0 0 t0

are indistinguishable. Thus, under time-continuous observation of the canonical pro-


loc
cess  and locally absolutely continuous change of measure Q Q0 on .C , C , G/,
the local martingale part of  necessarily remains unchanged. In the sense of . /, the
diffusion coefficient plays the role of an ‘observable’ quantity under time-continuous
observation. 


6.3 Time Changes for Brownian Motion with
Unknown Drift
The present section is on time changes in the model ‘scaled Brownian motion with
unknown drift’ of Section 5.2. We shall see that time changes which are independent
of Brownian motion lead to mixed normal models, other time changes to quadratic
models which are not mixed normal.

6.13 Example. We continue Example 5.15, with all notations as there: .C , C , G/


is the canonical path space for Rd -valued processes with continuous paths, B a d -
dimensional standard Brownian motion starting in 0, the matrix ƒ 2 Rd d is sym-
metric and strictly positive definite, with square root ƒ1=2 ; we consider the model
   
C , C , G , ¹Qh : h 2 Rd º , Qh D L .ƒ1=2 B t C .ƒh/ t / t0 , h 2 Rd .

With  the canonical process on .C , C / and m.h/ the Qh -martingale part of  , Exam-
ple 5.15 states that all Qh are locally equivalent relative to G, and that
 0  ² ³
0 h0 = h 1
:D exp .h0  h/>m t  .h0  h/>ƒt .h0  h/
h =h .h/
Lh = h D L t , Lt
t0 2

is the density process of Qh0 with respect to Qh relative to G, where L.m.h/ jQh / D
L.ƒ1=2 B/ does not depend on the parameter h 2 Rd .
Consider a G-stopping time  with the property

./ 0< <1 Qh -almost surely, for every h 2 Rd .

By Theorem 5.12(c) and (d) – compare to Remark 6.1” – condition ./ guarantees that
the ‘candidate pair’  
.S , J / D m.0/
 , ƒ D . , ƒ /

Section 6.3 Time Changes for Brownian Motion with Unknown Drift 169

generates a quadratic experiment as defined in Definition 6.1


 
E.S , J / D C , G , ¹Qh : h 2 Rd º

where Qh :D .Qh /j G are equivalent probability measures, h 2 Rd , and where


 
h0 = h 0 > .h/ 1 0 > 0
L D exp .h  h/ m  .h  h/ ƒ .h  h/
2
provides a version of the density of Qh 0 with respect to Qh . The model E.S , J / rep-
resents scaled Brownian motion with unknown drift observed up to time  .

So far, we have a rather restricted number of time transformations given by G-


stopping times.

6.14 Example. Consider Example 6.13 in the one-dimensional case d D 1, with


scaling factor ƒ :D 1. For a > 0 and b > 0 fixed, we have G-stopping times
a,b :D Ta ^ Sb , Ta :D inf¹r > 0 : r > aº , Sb :D inf¹r > 0 : r < bº .
Writing for short  :D a,b and considering first the Wiener measure P :D Q0 , we
have
b a
.C/ EP . / D ab , P . D a/ D , P . D b/ D
bCa bCa
from [112, Prop. (3.8) and Exer. 3ı of (3.11), Chap. II].
(a) We prove that ./ in Example 6.13 holds for  D Ta ^ Sb . Consider Q0 first;
clearly Q0 .0 <  < 1/ D 1. The canonical process  starting almost surely in 0
under Q0 , the sequence .^N /N 2N takes values almost surely in the interval Œb, a:
hence, for h 2 R arbitrary,
h=0  12 h2 .^N /
L^N D e h  ^N
, N 1
is a uniformly integrable martingale under Q0 . Applying Theorem 5.12(d), the condi-
tion ./ in Example 6.13 is verified.
(b) In the quadratic experiment E.S , J / induced by  , with S D  and J D  ,
conditional laws
S jJ Dj jDj
.ı/ Q0 D Q0  , j 2 .0, 1/
are supported by the two-point-set ¹a, bº, by definition of  D Ta ^ Sb . This is
incompatible with Proposition 6.3(ii). Hence E.S , J / is not a mixed normal experi-
ment. 

6.14’ Exercise. This exercise complements Example 6.14. With all notations as in Example
6.14, we consider the one-dimensional case d D 1 with scaling factor ƒ :D 1. We focus on
the G-stopping times Ta , for a > 0 fixed.
170 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

(a) Since 0 D 0 Q0 -almost surely, the law of the iterated logarithm grants
Q0 .¹0 < Ta < 1º/ D 1.
(b) Write for short P :D Q0 and M t :D max s . Write ˆ for the distribution function of
0st
N .0, 1/. For a, t in .0, 1/, use the reflection principle and rescaling
p p
P .Ta < t / D P .M t > a/ D 2P . t > a/ D 2P .1 > a= t / D 2.1  ˆ.a= t //
to determine the density of L.Ta jP /:
a a2
fa .t / D p t  2 e  2
3 1
t , t , a 2 .0, 1/ .
2
Determine the Laplace transform of L.Ta jP /
  p
EP e Ta D e  a 2 , 0

from the statistical argument in Theorem 5.12(d): for positive drift parameter h > 0 we do
have the equivalent assertions
 
h=0
Qh .¹0 < Ta < 1º/ D 1 ” EQ0 LTa D 1 , h > 0

where Lh=0 is the density process of Qh with respect to Q0 as in Examples 5.14 or 6.13; on
the right-hand side, exploit Ta D a Q0 -almost surely; finally, change variables  :D 12 h2 .
(c) For positive drift h > 0, determine the Laplace transform of L .Ta j Qh /
  p
EQh e  Ta D e  a . h C2  h/ ,   0
2

h=0
using (b) and change of measure EQh .e  Ta / D EQ0 .e  Ta LTa /.
(d) For negative drift h < 0, prove that
Qh .¹0 < Ta < 1º/ D e 2 a h < 1 , h<0
(hint: combine an argument as in step (3) of the proof of Theorem 5.12 with application of (b)
above to eh :D jhj > 0).
(e) Deduce the following from (d). For a > 0 fixed, the candidate pair of statistics S D Ta ,
J D Ta under Q0 does not generate a statistical experiment which satisfies all assumptions of
Definition 6.1, cf. Remark 6.1”. As a consequence, ‘observing Brownian motion with unknown
drift up to time Ta ’ does not lead to a quadratic experiment in the sense of Definition 6.1. 

More interesting time transformations for the model ‘scaled Brownian motion with
unknown drift’ are at hand if we extend the probability space and consider stopping
times which are independent from Brownian motion. This will lead to mixed normal
experiments. We will formulate mixed normality in two variants: the first collects all
independent variables, processes, stopping times in one initial  -field, the second vari-
ant keeps track of the temporal dynamics of the time transformation.

6.15 Observing Scaled Brownian Motion with Unknown Drift up to some Inde-
pendent Random Time. We extend the experiment
    
C , C , G , ¹Qh : h 2 Rd º , Qh :D L ƒ1=2 B t C .ƒh/ t t0 , h 2 Rd

Section 6.3 Time Changes for Brownian Motion with Unknown Drift 171

considered in Example 5.15 in order to obtain random variables, processes or stop-


ping times  which are independent from Brownian motion B. Prepare some space
.00 , A00 , P 00 / carrying  such that

./  : 00 ! Œ0, 1 is A00 -measurable, and 0< <1 P 00 -almost surely.

The object  in ./ might be defined in terms of other processes or random variables
living on .00 , A00 , P 00 /. With the notations of Example 5.15, introduce a product space
 ® ¯
. / H , H , H, Q b h : h 2 Rd , Q b h :D Qh ˝ P 00

where

H :D C 00 , H :D C ˝ A00 , H t :D G t ˝ A00 , H D .H t / t0 .

This constructions lifts the canonical process  from .C , C / to .H , H /, and lifts ran-
dom variables from .00 , A00 / to .H , H /. In particular, any Œ0, 1-valued random vari-
able on .00 , A00 / lifted to the extension will be H0 -measurable, and thus can be used
as H-stopping time. On .H , H /, objects lifted from .C , C / and objects lifted from
.00 , A00 / will be independent under all Q b h :D Qh ˝ P 00 , h 2 Rd , and the law of ob-
00 00
jects stemming from . , A / will not depend on h 2 Rd . Due to this independence
structure, density processes of Q b h relative to H are obtained by
b h0 with respect to Q
lifting the density processes given in Example 5.15 from .C , C , G/. Thus ./ and ./
of Example 5.15 remain unchanged provided we redefine on .H , H , H/

m.h/ :D .  t  ƒh t / t0 bh .
martingale part of  under Q

In this sense, the model . / is a simple extension of the statistical model in Exam-
ple 5.15. On .H , H /, we write again  for the H-stopping time obtained by lifting ./
above to .H , H , H/. Then condition ./ of Theorem 5.12(c) is satisfied

0< <1 b h -almost surely, for every h 2 Rd


Q
b h / do not depend on h 2 Rd , and conditional laws
since the laws L.  j Q
b h/
.Q  jDt
D N . Œƒt  h , ƒt /

have the structure required in Proposition 6.3. Write Q b h to


b  for the restriction of Q
h
the  -field H of all events up to time  . By Theorem 5.12(c), these are equivalent
b  0 with respect to Q
probability laws, and the likelihood of Q b  on H is given by
h h
² ³
0 1 0
Lh = h D exp .h0  h/> m.h/
  .h  h/ >
ƒ .h 0
 h/ ,
2

for all h0 , h 2 Rd . Thus the quadratic experiment


 
E.S , J / D H , H , ¹Q b  : h 2 Rd º , S D  D m.0/
 , J D ƒ
h
172 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

is a mixed normal experiment in the sense of Definition 6.2. It describes time-


continuous observation of scaled Brownian motion with unknown drift up to the in-
dependent time  . The central statistic is
0 1
.1/
 = 
B C
Z :D ƒ1 @ : : : A
.d /
 = 

where .1/ , : : : , .d / are the components of the process . 

6.15’ Exercise. In the setting of 6.15, let .00 , A00 , P 00 / carry a Poisson process .N t / t0 with
parameter  > 0. Let  denote the time of the k-th jump of .N t / t0 . In the particular case k D 1
we recover the example of Exercise 6.1”’, with observation up to an independent exponential
time. 
6.15” Exercise. In the setting of 6.15, fix  > 0 and let .00 , A00 , P 00 / carry a Gamma
process, i.e. a PIIS . t / t0 (process with stationary and independent increments) where
L . r2  r1 jP 00 / D .r2 r1 , / for 0  r1 < r2 < 1. Here .a, / denotes the Gamma
a
law with density fa, .x/ D 1.0,1/ .x/ .a/ x a1 e x , x 2 R. The state of the process at
time t D 1 is thus exponentially distributed with parameter  ; defining  :D k we have an
alternative to Exercise 6.15’, for k 2 N. Beyond this, we can use the process t ! t as a time
change for Brownian motion. Note that Gamma processes (cf. Bertoin [10, p. 73]) are obtained
as integrals Z tZ
x .ds, dx/ , t  0
0 .0,1/

where .ds, dx/ denotes Poisson random measure with intensity ds x 1 e  x dx on


.0, 1/ .0, 1/. Hence they are increasing càdlàg processes, increasing only by jumps, and
admitting an infinite number of small jumps (positive and summable) over every finite time
interval. Accordingly, time-changed Brownian motion .B. t // t0 has càdlàg paths. 

The construction in 6.15 does not reflect adequately the temporal dynamics of
a process of time change in cases where such a process may play a key role; the
following construction remedies to this.

6.16 Independent Time Transformation for ‘Scaled Brownian Motion with Un-
known Drift’. Consider a probability space .0 , A0 , P 0 / carrying processes B D
.B t / t0 and A D .A t / t0 as follows:
(i) B is a d -dimensional standard Brownian motion with B0  0,
(ii) for all ! 2 0 , paths t ! A t .!/ are càdlàg non-decreasing, with A0 .!/ D 0
and lim A t .!/ D C1,
t!1

(iii) B and A are independent under P 0 .



Section 6.3 Time Changes for Brownian Motion with Unknown Drift 173

Write F D .F t / t0 for the right-continuous filtration generated by the pair .B, A/.
Then
t :D inf¹v : Av > t º , 0<t <1
are F -stopping times which by (ii) have the property

0< t .!/ < 1 for all 0 < t < 1 and all ! 2 0 .

We put 0 D 0. Then all paths t ! t are càdlàg non-decreasing, thus B ı :D


.B t / t0 is a càdlàg process. For ƒ 2 Rd d deterministic and strictly positive defi-
nite, the statistical experiment
  
.˘/ Qh :D L ƒ1=2 B t C .ƒh/ t , t t0 j P 0 , h 2 Rd

is mixed normal in the sense that for all 0 < t < 1 fixed, the restrictions Qht of
Qh to the  -field of events up to time t form an experiment ¹Qht : h 2 Rd º which
is mixed normal in the sense of Definition 6.2; it represents scaled Brownian motion
with unknown drift time-transformed by the level crossing times of an independent
increasing process A.
Let us state the necessary details more carefully before giving a proof. Write D for
the space of d -dimensional càdlàg functions Œ0, 1/ ! Rd equipped with Skorohod
topology and Borel  -field D (see [64, p. 292]). .D, D/ is a Polish space, and D
coincides with the  -field generated by the coordinate projections.
T With notation  D
. t / t0 for the canonical process on .D, D/, write G t :D r>t  .s : 0  s  r/.
Next, in dimension 1, we define .DC , DC / as the restriction of the Skorohod space of
one-dimensional càdlàg functions to the closed subspace of non-decreasing functions
h starting at h.0/ D 0 (cf. [64, p. 306]): then .DC , DC / is again Polish.
T With notation
C
D . t / t0 for the canonical process on .DC , DC /, write G t :D r>t  . s : 0 
s  r/. Then the laws Qh defined by .˘/ live on the product space .H , H , H/

.˘˘/ H :D D DC , H :D D ˝ DC , H t :D G t ˝ G tC , H :D .H t / t0 .

We write ., / for the canonical process on .H , H , H/ and have the following:
(˛) All laws Qh , h 2 Rd , are locally equivalent relative to H.
(ˇ) The density process of Qh with respect to Q0 relative to H is
² ³
1
Lh=0 D .L t / t0 , L t :D exp h>  t  h> ƒ
h=0 h=0
t h .
2

( ) The density process of Qh0 with respect to Qh relative to H is


² ³
h0 = h h0 = h h0 = h .1,h/ 1
L D .L t / t0 , L t :D exp .h0 h/> m t  .h0 h/> ƒ t
0
.h h/
2
174 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

for h0 , h 2 Rd , with notation

m.1,h/ :D   0  Œƒh

for the Qh -local martingale part of the first component  of the canonical pro-
cess ., / on .H , H , H/. Here L . j Qh / D L . j P 0 / does not depend on the
parameter h 2 Rd , and
   
L m.1,h/ j Qh D L ƒ1=2 B ı j P 0 for all h 2 Rd

where Brownian motion .B t / t and time change t ! t are independent, by


assumption.
(ı) For 0 < t < 1 fixed, the model .H , H t , ¹.Qh /jHt : h 2 Rd º/ is a mixed
normal model E.S , J / with S D  t and J D ƒ t in application of Proposition
6.3. The central statistic in E.S , J / is
0 1
.1/
t = t
B C
Z :D ƒ1 @ : : : A
.d /
t = t

with .1/ , : : : , .d / the components of .


Below, we shall prove the assertions (˛)–(ı) in three steps.
(1) Introduce one more factor .DC , DC / and consider first the path space
 
e, H
e, H
H e : H e :D D DC DC , H e :D D˝DC˝DC ,
e D .H
H e t :D G t ˝G C˝G C
e t / t0 , H
t t

, e, e/, equipped with the family of laws


with canonical process .e
  
e h :D L ƒ1=2 B C .ƒh/ id , id , A j P 0 , h 2 Rd
Q

e
where id is the deterministic process taking value t at time t . We define H-stopping
times
et :D inf¹v : ev > t º , 0 < t < 1
e /. By the properties of the increasing process A under P 0 , we have
e, H
on .H

./ 0 < et < 1 e h -almost surely


Q

for all h 2 Rd and all 0 < t < 1, and L. et j Q e h / does not depend on h 2 Rd .
Put e0  0 and note that paths t ! et and t ! e e
 t are càdlàg on ŒŒ0, ŒŒ where
 :D inf¹t : et D 1º equals C1 Q e h -almost surely for all h 2 Rd .
 
e h on H
(2) By definition of the laws Q e, He, H
e , Brownian motion B being indepen-
dent from the increasing process A under P 0 (and trivially independent from the second

Section 6.3 Time Changes for Brownian Motion with Unknown Drift 175

 
component which is deterministic), Example 5.15 immediately extends to H e, H
e, H e :
thus
e h loc Q
Q e for all h 2 Rd ,
and the density process of Qe h with respect to Qe 0 relative to H e is
° 1 ±
Lh=0 D .e
e Lh=0
t / t0 ,
e
Lh=0
t :D exp h>e  t  h> ƒt h .
2
(3) In the statistical model of step (2), we can change time according to Theo-
rem 5.12(c) – this hinges on the property ./ for all stopping times et , 0 < t < 1, in
step (1) – and consider mappings
   
‰ : .e , e, e/ ! ‰ .e , e, e/ :D e  ı e, e
e / to .H , H / which have the properties
e, H
from .H
 
e , 0 < t < 1, L ‰ j Q
‰ 1 .H t / D H e h D Qh , h 2 Rd .
e
t

Thus properties (˛), (ˇ), ( ) and (ı) for the statistical model .˘/ and .˘˘/ hold as
consequences of step (2). 

The following is a special case of Construction 6.16: we specify the increasing


process A D .A t / t0 at the start of Construction 6.16 as a one-sided stable process
(stable subordinator) with index ˛ 2 .0, 1/. This is of particular importance for null
recurrent Markov processes, under a regular variation condition, see Chapter 8. We
give the necessary definitions first, and then return to the statistical model.

6.17 Remark. (1) For 0 < ˛ < 1, the one-sided stable process S .˛/ with index ˛ is
defined from independent and stationary increments having Laplace transforms
 ˛ 
E e   .Sr2 Sr1 / D e  .r2 r1 /  ,   0 , 0  r1 < r2 < 1 .
˛ ˛

This process is a functional of Poisson random measure .ds, dx/ on .0, 1/ .0, 1/
with intensity .ds, dx/ D ds .1˛/˛
x ˛1 dx on .0, 1/ .0, 1/
Z tZ
.˛/
St D x .ds, dx/ , t  0,
0 .0,1/

(see [62], or [10, p. 73)]). Paths of S .˛/ are càdlàg and strictly increasing – they have
positive and summable jumps, increase only by jumps, and have an infinite number of
.˛/
infinitesimally small jumps over every finite time interval – such that S0 D 0 and
.˛/
lim t!1 S t D 1.
(2) Let V .˛/ denote the process inverse to S .˛/ , i.e. the process of level crossing
times
V t.˛/ :D inf¹v > 0 : Sv.˛/ > t º , 0  t < 1 .
176 Chapter 6 Quadratic Experiments and Mixed Normal Experiments

The process V .˛/ is called the Mittag–Leffler process of index 0 < ˛ < 1. Paths of
V .˛/ are continuous and non-decreasing such that V0.˛/ D 0 and lim t!1 V t.˛/ D 1.
.˛/
Laplace transforms of V t are given by
1
X
   V t.˛/
 ./n
E e D t n˛ ,   0, t  0
nD0
.1 C n˛/

.˛/
and L.V t / admits finite moments of arbitrary order. See [24, p. 453], [12],
or [131]. 

6.18 Example. At the start of Construction 6.16, for 0 < ˛ < 1, let us take the
increasing process A as the one-sided stable process S .˛/ with index ˛ , thus D V .˛/
in Construction 6.16 is the Mittag–Leffler process of index ˛. We put ƒ :D I for
simplicity. In .˘/ in Construction 6.16 we consider the mixed normal statistical model
  
Qh :D L B.V t.˛/ / C h V t.˛/ , V t.˛/ t0 , h 2 Rd .

Brownian motion B and the Mittag–Leffler process V .˛/ are independent. Corre-
sponding to observation over the time interval Œ0, t , likelihoods in ( ) of Construc-
tion 6.16 are of type
² ³
h0 = h 0 > .˛/ 1 0 > .˛/ 0
Lt D exp .h  h/ B.V t /  .h  h/ V t I .h  h/
2

and the central statistics in (ı) of Construction 6.16 is of type


1
Z :D .˛/
B.V t.˛/ / .
Vt
Thus the best concentrated distribution for estimation errors, in the sense of the Con-
volution Theorem 6.6, its Corollary 6.6’ and of the Minimax Theorem 6.8, is
Z  
.˛/ 1
./ L.ZjQ0 / D L.V t /.du/ N 0 , I .
.0,1/ u

We remark that the law ./ does not admit finite second moments (see Exercise 6.18’)
for 0 < ˛ < 1. 

Under mixed normality, recall that comparison of estimation errors works condi-
tionally on the observed information. As strengthened in Remark 6.6”, the best con-
centrated distribution according to the Convolution Theorem 6.6, its Corollary 6.6’
and the Minimax Theorem 6.8 is not required to have finite second or higher moments;
Example 6.18 again illustrates this fact.

Section 6.3 Time Changes for Brownian Motion with Unknown Drift 177

6.18’ Exercise. We prove that the law ./ in Example 6.18 does not admit finite second mo-
ments:
With notations of Remark 6.17, for 0 < ˛ < 1, deduce from the definition of V .˛/ as
process inverse of S .˛/ and from scaling properties of S .˛/ that
" #˛ !
 .˛/  1
L V1 DL ;
S1.˛/
this representation is sometimes used as a definition of a Mittag–Leffler law. Write now
Z " #!
1 1  
.˛/ ˛
.C/ L.V1.˛/ /.du/ D E .˛/
D E S1
.0,1/ u V1
and check that the integral in (+) equals C1:
.˛/
For 0 < ˛ < 1, let G denote the distribution function of S1 , then
1
1  G.t / t ˛ as t ! 1
.1  ˛/
by [24, p. 448] or [12, p. 361)]. For 0 < p  ˛, we obtain
   
p p
E S1.˛/ < 1 for p < ˛ , E S1.˛/ D 1 for p D ˛

from a representation
  Z N
p
E S1.˛/ D  lim x p d.1  G.x//
N !1 0

via partial integration. 


Chapter 7

Local Asymptotics of Type LAN, LAMN, LAQ

Topics for Chapter 7:


7.1 Local Asymptotics of Type LAN, LAMN, LAQ
LAQ, LAMN, LAN in # 7.1
Notations relative to local asymptotics 7.1’
Contiguity of local alternatives 7.2–7.3
Markov extensions of statistical models 7.4
Rescaled estimation errors in the local experiment at # / estimators in the limit exper-
iment 7.5
Uniform convergence of risks of estimators over shrinking neighbourhoods of # 7.6
Linking rescaled estimation errors at # to the central sequence at # 7.7
7.2 Asymptotic Optimality of Estimators in the LAN or LAMN Setting
Regularity of estimator sequences as asymptotic (strong) equivariance 7.8
Example: linking rescaled estimation errors to the central sequence 7.9
LAN: Hájek’s convolution theorem 7.10(a)
LAMN: Jeganathan’s convolution theorem 7.10(b)
Characterising efficiency at # 7.11
LAMN or LAN: local asymptotic minimax theorem 7.12
A remark on efficient sequences, and an example 7.13–7.13’
7.3 Le Cam’s One-step Modification of Estimators
Conditions for one-step modification 7.14
Local scale with estimated parameter 7.15
Discretisation of estimators 7.16
Information with estimated parameter 7.17
Score with estimated parameter 7.18
The one-step modification 7.19
7.4 The Case of i.i.d. Observations
LAN via Le Cam’s second lemma for i.i.d. observations 7.20
Examples for efficient estimator sequences 7.21–7.22
Exercises: 7.9’, 7.9”, 7.10”, 7.22’, 7.22”
Section 7.1 Local Asymptotics of Type LAN, LAMN, LAQ 179

Given a sequence of statistical experiments .En /n parameterised by the same d -


dimensional parameter 2 ‚, local asymptotics means that we fix a parameter value
# 2 ‚ and consider shrinking neighbourhoods of # in the following way:
(i) for some choice of local scale .ın .#//n tending to 0 in a specific way, we repa-
rameterise with respect to # in En by writing D # C ın .#/ h , and then focus
on h as a local parameter at # at stage n of the asymptotics;
(ii) as n ! 1, local experiments parameterised by h tend in a suitable sense to a
limit experiment whose parameter h ranges over the full space Rd ;
(iii) as n ! 1, statistical properties which characterise the limit experiment carry
over to the local experiments at #, in the following sense: in an approximating lo-
cal experiment at #, at stage n of the asymptotics, they hold true ‘approximately’
when n is large.
Quite different types of limit experiment – exhibiting essentially different statistical
properties – can be considered in such a framework. Also, sets of conditions which at
stage n of the asymptotics link the approximating local experiment at # to the limit
experiment can take various forms. For convergence of experiments in general, see
Strasser [121]. Shiryaev and Spokoiny [122] explore a spectrum of different routes
linking approximating local experiments at # as n ! 1 to a limit experiment.
The present chapter is on local asymptotics where the limit experiment is a Gaus-
sian shift, a mixed normal or a quadratic experiment as considered in Chapters 5 and
6, and where the local experiments at # will inherit the statistical properties of a Gaus-
sian shift as n ! 1, of a mixed normal or a quadratic limit experiment. Our main
references are Le Cam [81], Hàjek [40], Davies [19], Jeganathan [66,67], Le Cam and
Yang [84]; we follow in particular the approach outlined by [19].
The main results of this chapter, for a mixed normal limit experiment, are the
local asymptotic versions of the convolution theorem (in 7.10 and 7.11 below) and
of the minimax theorem (in 7.12 below). We shall look at the special case of i.i.d.
observations in Section 7.4. Some stochastic process examples will then be considered
in a separate Chapter 8.

7.1 Local Asymptotics of Type LAN, LAMN, LAQ

The following notations will be used throughout the chapter. The parameter space ‚ is
an open subset of Rd , and we have a sequence .En /n of statistical experiments which
are parameterised by ‚

En :D .n , An , ¹Pn,# : # 2 ‚º/ , n  1.


180 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ

We fix a reference point # 2 ‚. Relative to this point, we consider a norming se-


quence .ın .#//n decreasing to 0. Our ın D ın .#/ are strictly positive real numbers;
everything would work similarly with matrices in Rd d , strictly positive definite and
decreasing in the sense of half-ordering of positive definite matrices to 0 2 Rd d . We
write again DC 2 B.Rd d / as in Definition 6.1 for the set of symmetric and strictly
positive definite matrices.
A main example for this setting is as follows: for sequences of i.i.d. random vari-
ables whose law depends on a parameter #, consider experiments En corresponding
to observation of the first n variables under unknown # 2 ‚ as in Assumption
4.10(b). In this case, assuming smoothness of the parameterisation at # in the
sense of Assumption 4.10(a), we take local scale at # as ın .#/ D n1=2 . Then the
following should be compared to Le Cam’s Second Lemma 4.11 for i.i.d. observations:

7.1 Definition. (a) A sequence of experiments $(\mathcal{E}_n)_n$ as above is called locally asymptotically quadratic at $\vartheta$ (LAQ) if relative to $\vartheta$ there are pairs of statistics

$(S_n, J_n) = (S_n(\vartheta), J_n(\vartheta)) : \Omega_n \to \mathbb{R}^d \times \mathbb{R}^{d\times d}$, $\mathcal{A}_n$-measurable, $n \ge 1$,

where for every $n \ge 1$, $P_{n,\vartheta}$-almost surely, $J_n(\vartheta)$ takes values in $D^+$,

and norming constants

$\delta_n = \delta_n(\vartheta)$ decreasing to $0$ as $n \to \infty$

such that the following properties (i) and (ii) are satisfied:
(i) at $\vartheta$, quadratic expansions

$\log \dfrac{dP_{n,\vartheta+\delta_n h_n}}{dP_{n,\vartheta}} = h_n^\top S_n - \tfrac12\, h_n^\top J_n h_n + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty$

hold true, for arbitrary bounded sequences $(h_n)_n$ in $\mathbb{R}^d$ (then, $\Theta$ being open, we have $\vartheta + \delta_n(\vartheta) h_n$ in $\Theta$ when $n$ is large enough);
(ii) depending on $\vartheta$, a quadratic experiment

$\mathcal{E}_\infty = \mathcal{E}(S, J) = \left( \Omega, \mathcal{A}, \{P_h : h \in \mathbb{R}^d\} \right)$, $\mathcal{E}_\infty = \mathcal{E}_\infty(\vartheta)$, $S = S(\vartheta)$, $J = J(\vartheta)$, $P_h = P_h(\vartheta)$,

exists as defined in Definition 6.1 such that weak convergence of pairs

$\mathcal{L}(S_n, J_n \mid P_{n,\vartheta}) \to \mathcal{L}(S, J \mid P_0) = \mathcal{L}(S(\vartheta), J(\vartheta) \mid P_0(\vartheta))$

(weakly in $\mathbb{R}^d \times \mathbb{R}^{d\times d}$) holds as $n \to \infty$.
(b) In particular, $(\mathcal{E}_n)_n$ is called locally asymptotically mixed normal at $\vartheta$ (LAMN) if the limit experiment $\mathcal{E}_\infty = \mathcal{E}(S, J)$ in (a.ii) is a mixed normal experiment as defined in Definition 6.2 and Proposition 6.3.
(c) In particular, $(\mathcal{E}_n)_n$ is called locally asymptotically normal at $\vartheta$ (LAN) if the limit experiment $\mathcal{E}_\infty = \mathcal{E}(S, J)$ in (a.ii) is a Gaussian shift experiment $\mathcal{E}(J)$ as defined in Definition 5.2.

In the LAN case, $J = J(\vartheta)$ is deterministic; recall that a Gaussian shift experiment $\mathcal{E}(J)$ exists for every $J \in D^+$. Since convergence in law of $J_n = J_n(\vartheta)$ under $P_{n,\vartheta}$ to a deterministic limit $J = J(\vartheta)$ is equivalent to convergence in probability, we can write under LAN

$J_n = J + o_{P_{n,\vartheta}}(1)$ as $n \to \infty$

and can replace in this case the statistics $J_n = J_n(\vartheta)$ in part (a) of Definition 7.1 by the deterministic quantity $J = J(\vartheta)$, simultaneously for all $n \ge 1$.
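The quadratic expansion in Definition 7.1(a.i) can be observed numerically in a toy i.i.d. model. The following minimal sketch (ours, not part of the text; the choice of a Poisson model is for illustration only) compares the exact log-likelihood ratio with the LAN quadratic $h\, S_n - \frac12 h^2 J_\vartheta$, where $V_\vartheta(x) = x/\vartheta - 1$ and $J_\vartheta = 1/\vartheta$; the printed difference is the $o_{P_{n,\vartheta}}(1)$ remainder.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, h = 2.0, 1.5                 # reference point and local parameter

for n in [10**2, 10**4, 10**6]:
    x = rng.poisson(theta, size=n)
    theta_n = theta + h / np.sqrt(n)          # local alternative, delta_n = n^{-1/2}
    # exact log-likelihood ratio  log dP_{n,theta_n}/dP_{n,theta}
    lam = np.sum(x * np.log(theta_n / theta)) - n * (theta_n - theta)
    # LAN quadratic: S_n = n^{-1/2} sum V_theta(X_i) with V_theta(x) = x/theta - 1
    S_n = np.sum(x / theta - 1.0) / np.sqrt(n)
    quad = h * S_n - 0.5 * h**2 / theta       # J_theta = 1/theta
    print(n, lam - quad)                      # remainder shrinks in n
```

For a Gaussian location model the remainder would vanish identically; the Poisson case shows a genuinely stochastic remainder vanishing in $P_{n,\vartheta}$-probability.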

For local asymptotics at some reference point $\vartheta \in \Theta$ we shall always use the following notations:

7.1' Definition. (a) For all $n \ge 1$, we write for short

$L^{h/0}_{n,\vartheta} := \dfrac{dP_{n,\vartheta+\delta_n(\vartheta)h}}{dP_{n,\vartheta}}\,,\quad \Lambda^{h/0}_{n,\vartheta} = \log L^{h/0}_{n,\vartheta}$

in the experiments $\mathcal{E}_n$, with all conventions on likelihood ratios as explained in Notations 3.1; $\Theta$ being open, we have local parameter spaces

$\Theta_{\vartheta,n} := \left\{ h \in \mathbb{R}^d : \vartheta + \delta_n(\vartheta)h \text{ belongs to } \Theta \right\}$

which as $n \to \infty$ form a sequence of open sets increasing to $\mathbb{R}^d$. At stage $n \ge 1$ of the asymptotics, we call

$\mathcal{E}_{\vartheta,n} := \left( \Omega_n, \mathcal{A}_n, \{P_{n,\vartheta+\delta_n(\vartheta)h} : h \in \Theta_{\vartheta,n}\} \right)$

a local model at $\vartheta$ with local scale $\delta_n(\vartheta)$ and local parameter $h$.
(b) When LAQ holds at $\vartheta$, we call $\mathcal{E}_\infty = \mathcal{E}_\infty(\vartheta)$ of Definition 7.1(a.ii) the limit experiment at $\vartheta$, and write

$L^{h/0} := \dfrac{dP_h}{dP_0}\,,\quad \Lambda^{h/0} = \log L^{h/0}\,,\quad h \in \mathbb{R}^d$

in $\mathcal{E}_\infty$. According to Definitions 6.4 and 6.2, the limit experiment has score $S - Jh = S(\vartheta) - J(\vartheta)h$ at $h \in \mathbb{R}^d$; the observed information in the sense of Definition 6.4 is $J = J(\vartheta)$. Note that in the limit experiment $\mathcal{E}_\infty(\vartheta)$, the law of the observed information $\mathcal{L}(J \mid P_h) = \mathcal{L}(J(\vartheta) \mid P_h(\vartheta))$ may depend on $h \in \mathbb{R}^d$.
(c) The central statistic in the limit experiment $\mathcal{E}_\infty$ is $Z = Z(\vartheta)$, $Z = 1_{\{J \in D^+\}}\, J^{-1} S$, by Definition 6.1. When LAQ holds at $\vartheta$, we call the sequence $(Z_n)_n$ defined by

$Z_n = Z_n(\vartheta)\,,\quad Z_n := 1_{\{J_n \in D^+\}}\, J_n^{-1} S_n\,,\quad n \ge 1,$

the central sequence at $\vartheta$.
The name 'central sequence' suggests a benchmark sequence: in fact, we will see below that it allows us to judge estimation errors at the reference point $\vartheta$ under LAMN simultaneously with respect to a broad variety of loss functions. In a setting of i.i.d. observations, we will give examples for LAN in Section 7.4; stochastic process examples for LAN, LAMN or LAQ will be discussed in Chapter 8. For local scale at $\vartheta$, besides the well-known $\delta_n(\vartheta) = n^{-1/2}$ encountered under certain conditions, various other rates – slower or faster – will occur in the stochastic process examples of Chapter 8. In particular, we underline the following: in a given sequence of models $(\mathcal{E}_n)_n$, we may have at different points $\vartheta \in \Theta$ different rates $\delta_n(\vartheta) \downarrow 0$ and different limit experiments $\mathcal{E}_\infty(\vartheta)$.

We begin to discuss the statistical implications of the LAQ setting. Note that Definition 7.1 did not require equivalence or absolute continuity for probability measures $P_{n,\zeta'}, P_{n,\zeta}$, $\zeta' \ne \zeta$, in the experiments $\mathcal{E}_n$ at the pre-limiting stage $n < \infty$. In particular, log-likelihoods $\Lambda^{h/0}_{n,\vartheta}$ in $\mathcal{E}_n$ may take the values $\pm\infty$ with positive $P_{n,\vartheta}$-probability: recall the definition of $\bar{\mathbb{R}}$-tightness from Notations 3.1 and 3.1'.

7.2 Proposition. When LAQ holds at $\vartheta$, for bounded sequences $(h_n)_n$ in $\mathbb{R}^d$, log-likelihoods $(\Lambda^{h_n/0}_{n,\vartheta})_n$ and likelihoods $(L^{h_n/0}_{n,\vartheta})_n$ under $P_{n,\vartheta}$ are $\bar{\mathbb{R}}$-tight as $n \to \infty$, and convergence of $(h_n)_n$ to a limit $h \in \mathbb{R}^d$ implies

$\Lambda^{h_n/0}_{n,\vartheta} - \Lambda^{h/0}_{n,\vartheta} = o_{P_{n,\vartheta}}(1)\,,\quad L^{h_n/0}_{n,\vartheta} - L^{h/0}_{n,\vartheta} = o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,.$

Proof. (1) For convergent sequences $h_n \to h$, tightness in $\mathbb{R}^d \times \mathbb{R}^{d\times d}$ under $(P_{n,\vartheta})_n$ of the pairs $(S_n, J_n)_n$ in virtue of Definition 7.1(a.ii) implies that the quantities $h_n^\top J_n (h_n - h)$ and $(h_n - h)^\top J_n h$ vanish as $n \to \infty$ in $P_{n,\vartheta}$-probability. For bounded sequences $(h_n)_n$, both $h_n^\top J_n h_n$ and $h_n^\top S_n$ remain tight under $P_{n,\vartheta}$ as $n \to \infty$. Thus we obtain for convergent sequences $h_n \to h$

(+) $\left( h_n^\top S_n - \tfrac12\, h_n^\top J_n h_n \right) - \left( h^\top S_n - \tfrac12\, h^\top J_n h \right) = o_{P_{n,\vartheta}}(1)$

as $n \to \infty$. For bounded sequences $(h_n)_n$ in $\mathbb{R}^d$, we have

(++) $\mathcal{L}\left( h_n^\top S_n - \tfrac12\, h_n^\top J_n h_n \mid P_{n,\vartheta} \right)$, $n \ge 1$, is tight in $\mathbb{R}$.

(2) Combining (+) with the quadratic expansions in Definition 7.1(a.i) of the log-likelihoods gives

$\Lambda^{h_n/0}_{n,\vartheta} - \Lambda^{h/0}_{n,\vartheta} = o_{P_{n,\vartheta}}(1)$ for convergent sequences $h_n \to h$.

(3) The log-likelihoods $(\Lambda^{h_n/0}_{n,\vartheta})_n$ form a sequence of $\bar{\mathbb{R}}$-valued random variables, cf. Notations 3.1 and 3.1'. For bounded sequences $(h_n)_n$, (++) combined with the quadratic expansions in Definition 7.1(a.i) yields

(∗) $\left( \Lambda^{h_n/0}_{n,\vartheta} \right)_n$ is $\bar{\mathbb{R}}$-tight under $(P_{n,\vartheta})_n$.

The likelihoods $(L^{h_n/0}_{n,\vartheta})_n$ are $[0,\infty]$-valued, and Proposition 3.3 shows

(∗∗) $\left( L^{h_n/0}_{n,\vartheta} \right)_n$ is $\bar{\mathbb{R}}$-tight under $(P_{n,\vartheta})_n$.

(4) In order to prove the last assertion of the proposition, we consider functions $(N \wedge \exp(\cdot)) \vee (-N)$ for large $N$ which are bounded and uniformly continuous, and deduce

$L^{h_n/0}_{n,\vartheta} - L^{h/0}_{n,\vartheta} = o_{P_{n,\vartheta}}(1)$

for convergent sequences $h_n \to h$ from (∗) and (∗∗) and step (2). □

Contiguity of probability measures in local models at $\vartheta$ will play a key role in the following. For bounded sequences $(h_n)_n$, we consider $P_{n,\vartheta+\delta_n(\vartheta)h_n}$ as local alternatives to $P_{n,\vartheta}$ when $n \to \infty$, and make use of the results in Chapter 3.

7.3 Proposition. Under LAQ at $\vartheta$ we have mutual contiguity

(⋄) $\left( P_{n,\vartheta+\delta_n(\vartheta)h_n} \right)_n \vartriangleleft\vartriangleright \left( P_{n,\vartheta} \right)_n$

for arbitrary bounded sequences $(h_n)_n$. For convergent sequences $h_n \to h$ we have weak convergence

$\mathcal{L}\left( \Lambda^{h_n/0}_{n,\vartheta}, S_n, J_n \mid P_{n,\vartheta+\delta_n(\vartheta)h_n} \right) \to \mathcal{L}\left( \Lambda^{h/0}, S, J \mid P_h \right)$ (in $\bar{\mathbb{R}} \times \mathbb{R}^d \times \mathbb{R}^{d\times d}$, as $n \to \infty$).

Proof. (1) Consider first convergent sequences $h_n \to h$. Combining Definition 7.1 of LAQ at $\vartheta$ with Proposition 7.2 and Notation 3.1' we have

$\mathcal{L}\left( \Lambda^{h_n/0}_{n,\vartheta} \mid P_{n,\vartheta} \right) \to \mathcal{L}\left( h^\top S - \tfrac12\, h^\top J h \mid P_0 \right) = \mathcal{L}\left( \Lambda^{h/0} \mid P_0 \right)$ (weakly in $\mathbb{R}$, for $n \to \infty$)

where $\Lambda^{h/0}$ is the log-likelihood ratio of $P_h$ with respect to $P_0$ in the quadratic limit experiment $\mathcal{E}_\infty(\vartheta)$. By Definition 6.1 and Remark 6.1', probability measures in the limit experiment are equivalent. Hence Le Cam's first lemma applies and establishes (⋄) in this case, cf. Lemma 3.5 and Remark 3.5'.
(2) Directly from Definition 3.2, mutual contiguity (⋄) holds if and only if any subsequence $(n_k)_k$ of $\mathbb{N}$ contains a further subsequence $(n_{k_\ell})_\ell$ along which

$\left( P_{n_{k_\ell}, \vartheta + \delta_{n_{k_\ell}}(\vartheta) h_{n_{k_\ell}}} \right)_\ell \vartriangleleft\vartriangleright \left( P_{n_{k_\ell},\vartheta} \right)_\ell$

holds. When $(h_n)_n$ is a bounded sequence, selecting further subsequences if necessary, we can always assume that $(h_{n_{k_\ell}})_\ell$ converges. Hence it is sufficient to establish (⋄) for convergent sequences $(h_m)_m$, which was done in step (1). Thus (⋄) is proved in general.
(3) We recall some consequences of mutual contiguity (⋄): for bounded sequences $(h_n)_n$ and for $\mathcal{A}_n$-measurable variables $(Y_n)_n$ taking values in $\mathbb{R}^q$, we can consider events $A_n := \{|Y_n| > \varepsilon\}$ and have

$Y_n = o_{P_{n,\vartheta}}(1) \iff Y_n = o_{P_{n,\vartheta+\delta_n h_n}}(1)\,;$

if we consider events $A_n := \{|Y_n| > K\}$, the $\varepsilon$-$\delta$-characterisation of mutual contiguity in Proposition 3.7 shows that

$(Y_n)_n$ under $(P_{n,\vartheta})_n$ is $\mathbb{R}^q$-tight $\iff$ $(Y_n)_n$ under $(P_{n,\vartheta+\delta_n h_n})_n$ is $\mathbb{R}^q$-tight.

In particular, the quadratic expansions under $P_{n,\vartheta}$ which define LAQ in Definition 7.1 remain valid under $P_{n,\vartheta+\delta_n h_n}$ whenever $(h_n)_n$ is bounded: when LAQ holds at $\vartheta$ we have

(◦) $\Lambda^{h_n/0}_{n,\vartheta} = h_n^\top S_n - \tfrac12\, h_n^\top J_n h_n + o_{P_{n,\vartheta+\delta_n h_n}}(1)$ as $n \to \infty$

for bounded sequences $(h_n)_n$.
(4) Now we can prove the second part of the proposition. Under LAQ at $\vartheta$, weak convergence of pairs

$\mathcal{L}(S_n, J_n \mid P_{n,\vartheta}) \to \mathcal{L}(S, J \mid P_0)$ (in $\mathbb{R}^d \times \mathbb{R}^{d\times d}$, as $n \to \infty$)

implies by the continuous mapping theorem and by the representation of log-likelihood ratios in Definition 7.1

$\mathcal{L}\left( \Lambda^{h/0}_{n,\vartheta}, S_n, J_n \mid P_{n,\vartheta} \right) \to \mathcal{L}\left( \Lambda^{h/0}, S, J \mid P_0 \right)$ (weakly in $\mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d\times d}$, as $n \to \infty$)

for fixed $h \in \mathbb{R}^d$. Proposition 7.2 allows to extend this to convergent sequences $h_n \to h$:

$\mathcal{L}\left( \Lambda^{h_n/0}_{n,\vartheta}, S_n, J_n \mid P_{n,\vartheta} \right) \to \mathcal{L}\left( \Lambda^{h/0}, S, J \mid P_0 \right)$ (weakly in $\mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d\times d}$, for $n \to \infty$).

Here $\Lambda^{h_n/0}_{n,\vartheta}$ is the log-likelihood ratio of $P_{n,\vartheta+\delta_n h_n}$ with respect to $P_{n,\vartheta}$ in $\mathcal{E}_n(\vartheta)$, and $\Lambda^{h/0}$ the log-likelihood ratio of $P_h$ with respect to $P_0$ in $\mathcal{E}_\infty(\vartheta)$. Now we apply Le Cam's third lemma, cf. Lemma 3.6 and Remark 3.6', and obtain weak convergence under contiguous alternatives

$\mathcal{L}\left( \Lambda^{h_n/0}_{n,\vartheta}, S_n, J_n \mid P_{n,\vartheta+\delta_n h_n} \right) \to \mathcal{L}\left( \Lambda^{h/0}, S, J \mid P_h \right)$ (weakly in $\mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d\times d}$, for $n \to \infty$).

This finishes the proof of the proposition. □
Among competing sequences of estimators for the unknown parameter in the sequence of experiments $(\mathcal{E}_n)_n$, one would like to identify – if possible – sequences which are 'asymptotically optimal'. Fix $\vartheta \in \Theta$. In a setting of local asymptotics at $\vartheta$ with local scale $(\delta_n(\vartheta))_n$, the basic idea is as follows. To any estimator $T_n$ for the unknown parameter in $\mathcal{E}_n$ associate

$U_n = U_n(\vartheta) := \delta_n^{-1}(\vartheta)\,(T_n - \vartheta)$

which is the rescaled estimation error of $T_n$ at $\vartheta$; clearly

$U_n - h = \delta_n^{-1}(\vartheta)\,(T_n - [\vartheta + \delta_n(\vartheta)h])$

for $h \in \Theta_{n,\vartheta}$. Thus, $\vartheta$ being fixed, we may consider $U_n = U_n(\vartheta)$ as an estimator for the local parameter $h$ in the local experiment $\mathcal{E}_n(\vartheta) = \{P_{n,\vartheta+\delta_n(\vartheta)h} : h \in \Theta_{n,\vartheta}\}$ at $\vartheta$. The aim will be to associate to such sequences $(U_n)_n$ limit objects $U = U(\vartheta)$ in the limit experiment $\mathcal{E}_\infty(\vartheta)$, in a suitably strong sense. Then one may first discuss properties of the estimator $U$ in the limit experiment $\mathcal{E}_\infty(\vartheta)$ and then convert these into properties of $U_n$ as estimator for the local parameter $h \in \Theta_{n,\vartheta}$ in the local experiment $\mathcal{E}_n(\vartheta)$. This means that we will study properties of $T_n$ in shrinking neighbourhoods of $\vartheta$ defined from local scale $\delta_n(\vartheta) \downarrow 0$. Lemma 7.5 below is the technical key to this approach. It requires the notion of a Markov extension of a statistical experiment.

7.4 Definition. Consider a statistical experiment $(E, \mathcal{E}, \mathcal{P})$, another measurable space $(E', \mathcal{E}')$ and a transition probability $K(y, dy')$ from $(E, \mathcal{E})$ to $(E', \mathcal{E}')$. Equip the product space

$(\widehat{E}, \widehat{\mathcal{E}})\,,\quad \widehat{E} = E \times E'\,,\quad \widehat{\mathcal{E}} = \mathcal{E} \otimes \mathcal{E}'$

with probability laws

$PK(dy, dy') := P(dy)\, K(y, dy')\,,\quad P \in \mathcal{P}\,.$

This yields a statistical model

$\widehat{\mathcal{P}} := \{PK : P \in \mathcal{P}\}$ on $(\widehat{E}, \widehat{\mathcal{E}})$

with the following properties:
(i) for every pair $Q, P$ and every version $L$ of the likelihood ratio of $P$ with respect to $Q$ in the experiment $(E, \mathcal{E}, \mathcal{P})$, $\widehat{L}$ defined by

$\widehat{L}(y, y') := L(y)$ on $(E \times E', \mathcal{E} \otimes \mathcal{E}')$

is a version of the likelihood ratio of $PK$ with respect to $QK$ in the experiment $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$;
(ii) in the model $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$, statistics $U$ taking values in $(E', \mathcal{E}')$ and realising the prescribed laws

$PK(A') := \int_E P(dy)\, K(y, A')\,,\quad A' \in \mathcal{E}'\,,$ under the probability measure $PK \in \widehat{\mathcal{P}}$,

exist: on $(\widehat{E}, \widehat{\mathcal{E}})$, we simply define the random variable $U$ as the projection $\widehat{E} \ni (y, y') \to y' \in E'$ on the second component, which gives $PK(U \in A') = PK(A')$ for $A' \in \mathcal{E}'$.

The experiment $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$ is called a Markov extension of $(E, \mathcal{E}, \mathcal{P})$; clearly every statistic $Y$ already available in the original experiment is also available in $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$ via lifting $Y(y, y') := Y(y)$.

We comment on this definition. By Definition 7.4(i), $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$ is statistically the same experiment as $(E, \mathcal{E}, \mathcal{P})$ since 'likelihoods remain unchanged': for arbitrary $P^{(0)}$ and $P^{(1)}, \ldots, P^{(\ell)}$ in $\mathcal{P}$, writing $L^{(i)}$ for the likelihood ratio of $P^{(i)}$ with respect to $P^{(0)}$ in $(E, \mathcal{E})$, and $\widehat{L}^{(i)}$ for the likelihood ratio of $P^{(i)}K$ with respect to $P^{(0)}K$ in $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$, we have (cf. Strasser [121, 53.10 and 25.6]) by construction

$\mathcal{L}\left( (L^{(1)}, \ldots, L^{(\ell)}) \mid P^{(0)} \right) = \mathcal{L}\left( (\widehat{L}^{(1)}, \ldots, \widehat{L}^{(\ell)}) \mid P^{(0)}K \right)\,.$

Whereas $(E', \mathcal{E}')$-valued random variables having law $PK$ under $P \in \mathcal{P}$ might not exist in the original experiment, their existence is granted on the Markov extension $(\widehat{E}, \widehat{\mathcal{E}}, \widehat{\mathcal{P}})$ of $(E, \mathcal{E}, \mathcal{P})$ by Definition 7.4(ii).

7.5 Lemma. Fix $\vartheta \in \Theta$ such that LAQ holds at $\vartheta$. Relative to $\vartheta$, consider $\mathcal{A}_n$-measurable mappings $U_n = U_n(\vartheta) : \Omega_n \to \mathbb{R}^k$ with the property

$\mathcal{L}(U_n \mid P_{n,\vartheta})$, $n \ge 1$, is tight in $\mathbb{R}^k$.

Then there are subsequences $(n_l)_l$ and $\mathcal{A}$-measurable mappings $U : \Omega \to \mathbb{R}^k$ in (if necessary, Markov extensions of) the limit experiment $\mathcal{E}_\infty = \mathcal{E}(S, J)$, with the following property: weak convergence in $\mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$

$\mathcal{L}\left( S_{n_l}, J_{n_l}, U_{n_l} \mid P_{n_l, \vartheta + \delta_{n_l} h_l} \right) \to \mathcal{L}(S, J, U \mid P_h)$ as $l \to \infty$

holds for arbitrary $h$ and convergent sequences $h_l \to h$.

Proof. (1) By LAQ at $\vartheta$ we have tightness of $\mathcal{L}(S_n, J_n \mid P_{n,\vartheta})$, $n \ge 1$; by assumption, we have tightness of $\mathcal{L}(U_n \mid P_{n,\vartheta})$, $n \ge 1$. Thus

$\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta})$, $n \ge 1$, is tight in $\mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$.

Extracting weakly convergent subsequences, let $(n_l)_l$ denote a subsequence of $\mathbb{N}$ and $\nu_0$ a probability law on $\mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$ such that

(∗) $\mathcal{L}\left( S_{n_l}, J_{n_l}, U_{n_l} \mid P_{n_l,\vartheta} \right) \to \nu_0\,,\quad l \to \infty\,.$

For convergent sequences $h_l \to h$ we deduce from the quadratic expansion of log-likelihoods in Definition 7.1 together with Proposition 7.2

(+) $\mathcal{L}\left( \Lambda^{h_l/0}_{n_l,\vartheta}, S_{n_l}, J_{n_l}, U_{n_l} \mid P_{n_l,\vartheta} \right) \to \widetilde{\nu}_0\,,\quad l \to \infty\,,$

where the law $\widetilde{\nu}_0$ on $\mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$ is the image measure of $\nu_0$ under the mapping

$(s, j, u) \to \left( h^\top s - \tfrac12\, h^\top j h\,,\ s, j, u \right)\,.$

Mutual contiguity (⋄) in Proposition 7.3 and Le Cam's Third Lemma 3.6 transform (+) into weak convergence

$\mathcal{L}\left( \Lambda^{h_l/0}_{n_l,\vartheta}, S_{n_l}, J_{n_l}, U_{n_l} \mid P_{n_l,\vartheta+\delta_{n_l} h_l} \right) \to \widetilde{\nu}_h\,,\quad l \to \infty\,,$

where $\widetilde{\nu}_h$ is a probability law on $\mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$ defined from $\widetilde{\nu}_0$ by

$\widetilde{\nu}_h(d\lambda, ds, dj, du) = e^{\lambda}\, \widetilde{\nu}_0(d\lambda, ds, dj, du)\,.$

Projecting on the components $(s, j, u)$, we thus have proved for any limit point $h \in \mathbb{R}^d$ and any sequence $(h_l)_l$ converging to $h$

(++) $\mathcal{L}\left( S_{n_l}, J_{n_l}, U_{n_l} \mid P_{n_l,\vartheta+\delta_{n_l} h_l} \right) \to \nu_h\,,\quad l \to \infty\,,$

where – with $\nu_0$ as in (∗) above – the probability measures $\nu_h$ are defined from $\nu_0$ by

(+++) $\nu_h(ds, dj, du) := e^{\,h^\top s - \frac12 h^\top j h}\ \nu_0(ds, dj, du)$ on $\mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$.

Note that the statistical model $\{\nu_h : h \in \mathbb{R}^d\}$ arising in (+++) is attached to the particular accumulation point $\nu_0$ for $\{\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta}) : n \ge 1\}$ which was selected in (∗) above, and that different accumulation points for $\{\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta}) : n \ge 1\}$ lead to different models (+++).
(2) We construct a Markov extension of the limit experiment $\mathcal{E}(S, J)$ carrying a random variable $U$ which allows to identify $\nu_h$ in (++) as $\mathcal{L}(S, J, U \mid P_h)$, for all $h \in \mathbb{R}^d$.
Let $\bar{s}, \bar{j}, \bar{u}$ denote the projections which map $(s, j, u) \in \mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^k$ to either one of its coordinates $s \in \mathbb{R}^d$ or $j \in \mathbb{R}^{d\times d}$ or $u \in \mathbb{R}^k$. In the statistical model $\{\nu_h : h \in \mathbb{R}^d\}$ fixed by (+++), the pair $(\bar{s}, \bar{j})$ is a sufficient statistic. By sufficiency, conditional distributions given $(\bar{s}, \bar{j})$ of $\mathbb{R}^k$-valued random variables admit regular versions which do not depend on the parameter $h \in \mathbb{R}^d$. Hence there is a transition probability $K(\cdot, \cdot)$ from $\mathbb{R}^d \times \mathbb{R}^{d\times d}$ to $\mathbb{R}^k$ such that

(∗∗) $K((s,j), du) =: \nu^{\,\bar u \mid (\bar s, \bar j) = (s,j)}(du)$

provides a common determination of all conditional laws $\nu_h^{\,\bar u \mid (\bar s, \bar j) = (s,j)}(du)$, $h \in \mathbb{R}^d$. Defining

(∗∗∗) $\widehat{\Omega} := \Omega \times \mathbb{R}^k\,,\quad \widehat{\mathcal{A}} := \mathcal{A} \otimes \mathcal{B}(\mathbb{R}^k)\,,\quad \widehat{P}_h(d\omega, du) := P_h(d\omega)\, K(\,(S(\omega), J(\omega)), du\,)\,,\quad h \in \mathbb{R}^d\,,$

we have a Markov extension

$\left( \widehat{\Omega}, \widehat{\mathcal{A}}, \{\widehat{P}_h : h \in \mathbb{R}^d\} \right)$

of the original limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J) = (\Omega, \mathcal{A}, \{P_h : h \in \mathbb{R}^d\})$ of Definition 7.1. Exploiting again sufficiency, we can put the laws $\nu_h$ of (++) and (+++) in the form

$\nu_h(ds, dj, du) = \nu_h^{(\bar s, \bar j)}(ds, dj)\ \nu_h^{\,\bar u \mid (\bar s,\bar j)=(s,j)}(du) = \nu_h^{(\bar s, \bar j)}(ds, dj)\ \nu^{\,\bar u \mid (\bar s, \bar j)=(s,j)}(du) = \nu_h^{(\bar s, \bar j)}(ds, dj)\ K((s,j), du)$

for all $h \in \mathbb{R}^d$. Combining (∗) and (++) with Proposition 7.3, we can identify the last expression with

$P_h^{(S,J)}(ds, dj)\ K((s,j), du)$

for all $h \in \mathbb{R}^d$. The Markov extension (∗∗∗) of the original limit experiment allows us to write this as

$\widehat{P}_h^{(S,J,U)}(ds, dj, du)$

where $U$ denotes the projection $U(\omega, u) := u$ from $\widehat{\Omega}$ to $\mathbb{R}^k$. Hence we have proved

$\nu_h(ds, dj, du) = \widehat{P}_h^{(S,J,U)}(ds, dj, du)$ for all $h \in \mathbb{R}^d$

which is the assertion of the lemma. □

From now on, we need no longer distinguish carefully between the original limit experiment and its Markov extension. From (∗) and (∗∗) in the last proof, we see that any accumulation point of $\{\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta}) : n \ge 1\}$ can be written in the form $\mathcal{L}(S, J \mid P_0)(ds, dj)\, K((s,j), du)$ for some transition probability from $\mathbb{R}^d \times \mathbb{R}^{d\times d}$ to $\mathbb{R}^k$. In this sense, statistical models $\{\nu_h : h \in \mathbb{R}^d\}$ which can arise in (++) and (+++) correspond to transition probabilities $K(\cdot,\cdot)$ from $\mathbb{R}^d \times \mathbb{R}^{d\times d}$ to $\mathbb{R}^k$.
7.6 Theorem. Assume LAQ at $\vartheta$. For any estimator sequence $(T_n)_n$ for the unknown parameter in $(\mathcal{E}_n)_n$, let $U_n = U_n(\vartheta) = \delta_n^{-1}(T_n - \vartheta)$ denote the rescaled estimation errors of $T_n$ at $\vartheta$, $n \ge 1$. Assume joint weak convergence in $\mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^d$

$\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta}) \to \mathcal{L}(S, J, U \mid P_0)$ as $n \to \infty$

where $U$ is a statistic in the (possibly Markov extended) limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$. Then we have for arbitrary convergent sequences $h_n \to h$

(+) $\mathcal{L}\left( S_n, J_n, U_n - h_n \mid P_{n,\vartheta+\delta_n h_n} \right) \to \mathcal{L}(S, J, U - h \mid P_h)$ as $n \to \infty$

(weak convergence in $\mathbb{R}^d \times \mathbb{R}^{d\times d} \times \mathbb{R}^d$), and from this

(++) $\sup_{|h| \le C}\ \left| E_{n,\vartheta+\delta_n h}\left[ \ell\left( \delta_n^{-1}(T_n - (\vartheta + \delta_n h)) \right) \right] - E_h\left( \ell(U - h) \right) \right| \to 0$ as $n \to \infty$

for bounded and continuous loss functions $\ell(\cdot)$ on $\mathbb{R}^d$ and for arbitrarily large constants $C < \infty$.

Proof. (1) Assertion (+) of the theorem corresponds to the assertion of Lemma 7.5, under the stronger assumption of joint weak convergence of $(S_n, J_n, U_n)$ under $P_{n,\vartheta}$ as $n \to \infty$, without selecting subsequences. For loss functions $\ell(\cdot)$ in $C_b(\mathbb{R}^d)$, (+) contains the following assertion: for arbitrary convergent sequences $h_n \to h$, as $n \to \infty$,

(∗) $E_{n,\vartheta+\delta_n h_n}\left[ \ell\left( \delta_n^{-1}(T_n - (\vartheta+\delta_n h_n)) \right) \right] = E_{n,\vartheta+\delta_n h_n}\left( \ell(U_n - h_n) \right) \to E_h\left( \ell(U - h) \right)\,.$

(2) We prove that in the limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$, for $\ell(\cdot)$ continuous and bounded,

$h \to E_h(\ell(U - h))$ is continuous on $\mathbb{R}^d$.

Consider convergent sequences $h_n \to h$. The structure in Definition 6.1 of likelihoods in the limit experiment implies pointwise convergence of $L^{h_n/0}$ as $n \to \infty$ to $L^{h/0}$; these are non-negative, and $E_0(L^{h/0}) = 1 = E_0(L^{h_n/0})$ holds for all $n$. This gives (cf. [20, Nr. 21 in Chap. II])

the sequence $L^{h_n/0}$ under $P_0$, $n \ge 1$, is uniformly integrable.

For $\ell(\cdot)$ continuous and bounded, we deduce

the sequence $L^{h_n/0}\, \ell(U - h_n)$ under $P_0$, $n \ge 1$, is uniformly integrable

which contains the assertion of step (2): it is sufficient to write as $n \to \infty$

$E_{h_n}\left( \ell(U - h_n) \right) = E_0\left( L^{h_n/0}\, \ell(U - h_n) \right) \to E_0\left( L^{h/0}\, \ell(U - h) \right) = E_h\left( \ell(U - h) \right)\,.$
(3) Now it is easy to prove (++): in the limit experiment, thanks to step (2), we can rewrite assertion (∗) of step (1) in the form

(∗∗) $E_{n,\vartheta+\delta_n h_n}\left( \ell(U_n - h_n) \right) - E_{h_n}\left( \ell(U - h_n) \right) = o(1)\,,\quad n \to \infty\,,$

for arbitrary convergent sequences $h_n \to h$. For large constants $C < \infty$ define

$\alpha_n(C) := \sup_{|h| \le C}\ \left| E_{n,\vartheta+\delta_n h}\left( \ell(U_n - h) \right) - E_h\left( \ell(U - h) \right) \right|\,,\quad n \ge 1\,.$

Assume that for some $C$ the sequence $(\alpha_n(C))_n$ does not tend to $0$. Then there is a sequence $(h_n)_n$ in the closed ball $\{|h| \le C\}$ and a subsequence $(n_k)_k$ of $\mathbb{N}$ such that for all $k$

$\left| E_{n_k,\vartheta+\delta_{n_k} h_{n_k}}\left( \ell(U_{n_k} - h_{n_k}) \right) - E_{h_{n_k}}\left( \ell(U - h_{n_k}) \right) \right| > \varepsilon$

for some $\varepsilon > 0$. The corresponding $(h_{n_k})_k$ taking values in a compact, we can find some further subsequence $(n_{k_\ell})_\ell$ and a limit point $\bar h$ such that convergence $h_{n_{k_\ell}} \to \bar h$ holds as $\ell \to \infty$, whereas

$\left| E_{n_{k_\ell},\vartheta+\delta_{n_{k_\ell}} h_{n_{k_\ell}}}\left( \ell(U_{n_{k_\ell}} - h_{n_{k_\ell}}) \right) - E_{h_{n_{k_\ell}}}\left( \ell(U - h_{n_{k_\ell}}) \right) \right| > \varepsilon$

still holds for all $\ell$. This is in contradiction to (∗∗). We thus have $\alpha_n(C) \to 0$ as $n \to \infty$. □

We can rephrase Theorem 7.6 as follows: when LAQ holds at $\vartheta$, any estimator sequence $(T_n)_n$ for the unknown parameter in $(\mathcal{E}_n)_n$ satisfying a joint convergence condition

(∗) $\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta}) \to \mathcal{L}(S, J, U \mid P_0)\,,\quad U_n = \delta_n^{-1}(T_n - \vartheta)\,,$

works over shrinking neighbourhoods of $\vartheta$, defined through local scale $\delta_n = \delta_n(\vartheta) \downarrow 0$, as well as the limit object $U$ viewed as estimator for the unknown parameter $h \in \mathbb{R}^d$ in the limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$ of Definition 7.1, irrespective of the choice of loss functions in the class $C_b(\mathbb{R}^d)$, and thus irrespective of any particular way of penalising estimation errors.
Of particular interest are sequences $(T_n)_n$ coupled to the central sequence at $\vartheta$ by

(⋄) $\delta_n^{-1}(\vartheta)(T_n - \vartheta) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

where $Z_n = 1_{\{J_n \in D^+\}}\, J_n^{-1} S_n$ is as in Definition 7.1'. In the limit experiment, $P_0$-almost surely, $J$ takes values in $D^+$ (Definitions 7.1 and 6.1). By the continuous mapping theorem, the joint convergence condition (∗) then holds with $U := J^{-1} S = Z$ on the right-hand side, for sequences satisfying (⋄), and Theorem 7.6 reads as follows:

7.7 Corollary. Under LAQ at $\vartheta$, any sequence $(T_n)_n$ with the coupling property (⋄) satisfies

$\sup_{|h| \le C}\ \left| E_{n,\vartheta+\delta_n h}\left[ \ell\left( \delta_n^{-1}(T_n - (\vartheta+\delta_n h)) \right) \right] - E_h\left( \ell(Z - h) \right) \right| \to 0\,,\quad n \to \infty\,,$

for continuous and bounded loss functions $\ell(\cdot)$ and for arbitrary constants $C < \infty$. This signifies that $(T_n)_n$ works over shrinking neighbourhoods of $\vartheta$, defined through local scale $\delta_n = \delta_n(\vartheta) \downarrow 0$, as well as the maximum likelihood estimator $Z$ in the limit model $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$.

Recall that in Corollary 7.7, the laws $\mathcal{L}(Z - h \mid P_h)$ may depend on the parameter $h \in \mathbb{R}^d$, and may not admit finite higher moments (cf. Remark 6.6''; examples will be seen in Chapter 8). Note also that the statement of Corollary 7.7 – the 'best' result as long as we do not assume more than LAQ – should not be mistaken as an optimality criterion: Corollary 7.7 under (⋄) is simply a result on risks of estimators at $\vartheta$ – in analogy to Theorem 7.6 under the joint convergence condition (∗) – which does not depend on a particular choice of a loss function, and which is uniform over shrinking neighbourhoods of $\vartheta$ defined through local scale $\delta_n = \delta_n(\vartheta) \downarrow 0$. We do not know about the optimality of maximum likelihood estimators in general quadratic limit models (recall the remark following Theorem 6.8).

7.2 Asymptotic optimality of estimators in the LAN or LAMN setting

In local asymptotics at $\vartheta$ of type LAMN or LAN, we can do much better than Corollary 7.7. We first consider estimator sequences which are regular at $\vartheta$, following a terminology well established since Hájek [40]: in a local asymptotic sense, this corresponds to the definition of strong equivariance in Definition 6.5 and Proposition 6.5' for a mixed normal limit experiment, and to the definition of equivariance in Definition 5.4 for a Gaussian shift. The aim is to pass from criteria for optimality which we have in the limit experiment (Theorems 6.6 and 6.8 in the mixed normal case, and Theorems 5.5 and 5.10 for the Gaussian shift) to criteria for local asymptotic optimality for estimator sequences $(T_n)_n$ in $(\mathcal{E}_n)_n$ at $\vartheta$. The main results are Theorems 7.10, 7.11 and 7.12.

7.8 Definition. For $n \ge 1$, consider estimators $T_n$ for the unknown parameter $\vartheta \in \Theta$ in $\mathcal{E}_n$.
(a) (Hájek [40]) If LAN holds at $\vartheta$, the sequence $(T_n)_n$ is termed regular at $\vartheta$ if there is a probability measure $F = F(\vartheta)$ on $\mathbb{R}^d$ such that for every $h \in \mathbb{R}^d$

$\mathcal{L}\left( \delta_n^{-1}(T_n - (\vartheta+\delta_n h)) \mid P_{n,\vartheta+\delta_n h} \right) \to F$ (weak convergence in $\mathbb{R}^d$ as $n \to \infty$)

where the limiting law $F$ does not depend on the value of the local parameter $h \in \mathbb{R}^d$.
(b) (Jeganathan [66]) If LAMN holds at $\vartheta$, $(T_n)_n$ is termed regular at $\vartheta$ if there is some probability measure $\widetilde F = \widetilde F(\vartheta)$ on $\mathbb{R}^{d\times d} \times \mathbb{R}^d$ such that for every $h \in \mathbb{R}^d$

$\mathcal{L}\left( J_n, \delta_n^{-1}(T_n - (\vartheta+\delta_n h)) \mid P_{n,\vartheta+\delta_n h} \right) \to \widetilde F$ (weakly in $\mathbb{R}^{d\times d} \times \mathbb{R}^d$ as $n \to \infty$)

where the limiting law $\widetilde F$ does not depend on $h \in \mathbb{R}^d$.

Thus 'regular' is a short expression for 'locally asymptotically equivariant' in the LAN case, and for 'locally asymptotically strongly equivariant' in the LAMN case.

7.9 Example. Under LAMN or LAN at $\vartheta$, estimator sequences $(T_n)_n$ in $(\mathcal{E}_n)_n$ linked to the central sequence $(Z_n)_n$ at $\vartheta$ by the coupling condition of Corollary 7.7

(⋄) $\delta_n^{-1}(\vartheta)(T_n - \vartheta) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

are regular at $\vartheta$. This is seen as follows. As in the remarks preceding Corollary 7.7, rescaled estimation errors $U_n := \delta_n^{-1}(\vartheta)(T_n - \vartheta)$ satisfy a joint convergence condition with $U = Z$ on the right-hand side:

$\mathcal{L}(S_n, J_n, U_n \mid P_{n,\vartheta}) \to \mathcal{L}(S, J, Z \mid P_0)\,,\quad n \to \infty\,,$

from which by Theorem 7.6, for every $h \in \mathbb{R}^d$,

$\mathcal{L}\left( S_n, J_n, U_n - h \mid P_{n,\vartheta+\delta_n h} \right) \to \mathcal{L}(S, J, Z - h \mid P_h)\,,\quad n \to \infty\,.$

When LAN holds at $\vartheta$, $F := \mathcal{L}(Z - h \mid P_h)$ does not depend on $h \in \mathbb{R}^d$, see Proposition 5.3(b). When LAMN holds at $\vartheta$, $\widetilde F := \mathcal{L}(J, Z - h \mid P_h)$ does not depend on $h$, see Definition 6.2 together with Proposition 6.3(iv). This establishes regularity at $\vartheta$ of sequences $(T_n)_n$ which satisfy condition (⋄), under LAN or LAMN. □

7.9' Exercise. Let $\mathcal{E}$ denote the location model $\{F_\zeta(\cdot) = F_0(\cdot - \zeta) : \zeta \in \mathbb{R}\}$ generated by the doubly exponential distribution $F_0(dx) = \frac12 e^{-|x|}\, dx$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Write $\mathcal{E}_n$ for the $n$-fold product experiment. Prove the following, for every reference point $\vartheta \in \Theta$:
(a) Recall from Exercise 4.1''' that $\mathcal{E}$ is $L_2$-differentiable at $\zeta = \vartheta$, and use Le Cam's Second Lemma 4.11 to establish LAN at $\vartheta$ with local scale $\delta_n(\vartheta) = 1/\sqrt n$.
(b) The median of the first $n$ observations (which is the maximum likelihood estimator in this model) yields a regular estimator sequence at $\vartheta$. The same holds for the empirical mean, or for arithmetic means between upper and lower empirical $\alpha$-quantiles, $0 < \alpha < \frac12$ fixed. Also Bayesians with 'uniform over $\mathbb{R}$ prior' as in Exercise 5.4'

$T_n := \dfrac{\int_{-\infty}^{\infty} \zeta\, L_n^{\zeta/0}\, d\zeta}{\int_{-\infty}^{\infty} L_n^{\zeta/0}\, d\zeta}\,,\quad n \ge 1\,,$

are regular in the sense of Definition 7.8. Check this using only properties of a location model. □
7.9'' Exercise. We continue Exercise 7.9', with notations and assumptions as there. Focus on the empirical mean $T_n = \frac1n \sum_{i=1}^n X_i$ as estimator for the unknown parameter. Check that the sequence $(T_n)_n$, regular by Exercise 7.9'(b), induces the limit law $F = N(0,2)$ in Definition 7.8(a).
Then give an alternative proof for regularity of $(T_n)_n$ based on the LAN property: from joint convergence of

$\left( \dfrac{1}{\sqrt n} \sum_{i=1}^n \mathrm{sgn}(X_i - \vartheta)\,,\ \dfrac{1}{\sqrt n} \sum_{i=1}^n (X_i - \vartheta) \right)$ under $P_{n,\vartheta}$

specify the two-dimensional normal law which arises as limit distribution for

$\mathcal{L}\left( \Lambda^{h/0}_{n,\vartheta}, \sqrt n\, (T_n - \vartheta) \mid P_{n,\vartheta} \right)$ as $n \to \infty$,

finally determine the limit law for

$\mathcal{L}\left( \Lambda^{h/0}_{n,\vartheta}, \sqrt n\, \left( T_n - (\vartheta + h/\sqrt n) \right) \mid P_{n,\vartheta+h/\sqrt n} \right)$ as $n \to \infty$

using Le Cam's Third Lemma 3.6 in the particular form of Proposition 3.6''. □

In the LAN case, the following is known as Hájek's convolution theorem.

7.10 Convolution Theorem. Assume LAMN or LAN at $\vartheta$, and consider a sequence $(T_n)_n$ of estimators for the unknown parameter in $(\mathcal{E}_n)_n$ which is regular at $\vartheta$.
(a) (Hájek [40]) When LAN holds at $\vartheta$, any limit distribution $F$ arising in Definition 7.8(a) can be written as

$F = N\left(0, J^{-1}\right) \star Q$

for some probability law $Q$ on $\mathbb{R}^d$, as in Theorem 5.5.
(b) (Jeganathan [66]) When LAMN holds at $\vartheta$, any limit law $\widetilde F$ in Definition 7.8(b) admits a representation

$\widetilde F(A) = \iint P^J(dj)\, \left[ N(0, j^{-1}) \star Q_j \right](du)\ 1_A(j, u)\,,\quad A \in \mathcal{B}(\mathbb{R}^{d\times d} \times \mathbb{R}^d)\,,$

for some family of probability laws $\{Q_j : j \in D^+\}$ as in Theorem 6.6.

Proof. We prove (b) first. With notation $U_n := \delta_n^{-1}(T_n - \vartheta)$, regularity means that for some law $\widetilde F$,

$\mathcal{L}\left( J_n, U_n - h \mid P_{n,\vartheta+\delta_n h} \right) \to \widetilde F\,,\quad n \to \infty$

(weakly in $\mathbb{R}^{d\times d} \times \mathbb{R}^d$), where $\widetilde F$ does not depend on $h \in \mathbb{R}^d$. Selecting subsequences according to Lemma 7.5, we see that $\widetilde F$ has a representation

(+) $\widetilde F = \mathcal{L}(J, U - h \mid P_h)$, not depending on $h \in \mathbb{R}^d$,

where $U$ is a statistic in the (possibly Markov extended) limit experiment $\mathcal{E}(S, J)$. Then $U$ in (+) is a strongly equivariant estimator for the parameter $h \in \mathbb{R}^d$ in the mixed normal limit experiment $\mathcal{E}(S, J)$, and the Convolution Theorem 6.6 applies to $U$ and gives the assertion. To prove (a), which corresponds to deterministic $J$, we use a simplified version of the above, and apply Boll's Convolution Theorem 5.5. □

Recall from Definition 5.6 that loss functions on $\mathbb{R}^d$ are subconvex if all level sets are convex and symmetric with respect to the origin in $\mathbb{R}^d$. Recall from Anderson's Lemma 5.7 that for $\ell(\cdot)$ subconvex,

$\int \ell(u)\ \left[ N(0, j^{-1}) \star Q' \right](du)\ \ge\ \int \ell(u)\ N(0, j^{-1})(du)$

for every $j \in D^+$ and any law $Q'$ on $\mathbb{R}^d$. Anderson's lemma shows that best concentrated limit distributions in the Convolution Theorem 7.10 are characterised by

(7.10') $Q = \varepsilon_0$ under LAN at $\vartheta$, and $Q_j = \varepsilon_0$ for $P^J$-almost all $j \in \mathbb{R}^{d\times d}$ under LAMN at $\vartheta$.

Estimator sequences $(T_n)_n$ in $(\mathcal{E}_n)_n$ which are regular at $\vartheta$ and attain in the convolution theorem the limit distribution (7.10') are called efficient at $\vartheta$.

7.10'' Exercise. In the location model generated from the two-sided exponential distribution, continuing Exercises 7.9' and 7.9'', check from Exercise 7.9'' that the sequence of empirical means is not efficient. □

In some problems we might find efficient estimators directly, in others not. Under some additional conditions – this will be the topic of Section 7.3 – we can apply a method which allows us to construct efficient estimator sequences. We have the following characterisation.

7.11 Theorem. Consider estimators $(T_n)_n$ in $(\mathcal{E}_n)_n$ for the unknown parameter. Under LAMN or LAN at $\vartheta$, the following assertions (i) and (ii) are equivalent:
(i) the sequence $(T_n)_n$ is regular and efficient at $\vartheta$;
(ii) the sequence $(T_n)_n$ has the coupling property (⋄) of Example 7.9 (or of Corollary 7.7):

$\delta_n^{-1}(\vartheta)(T_n - \vartheta) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,.$

Proof. We consider the LAMN case (the proof under LAN is then a simplified version). Consider a sequence $(T_n)_n$ which is regular at $\vartheta$, and write $U_n = \delta_n^{-1}(\vartheta)(T_n - \vartheta)$. The implication (ii)⟹(i) follows as in Example 7.9, where we have in particular under (ii)

$\mathcal{L}\left( J_n, U_n - h \mid P_{n,\vartheta+\delta_n h} \right) \to \mathcal{L}(J, Z - h \mid P_h)$

for every $h$. But LAMN at $\vartheta$ implies according to Definition 6.2 and Proposition 6.3

$\mathcal{L}(J, Z - h \mid P_h)(A) = \mathcal{L}(J, Z \mid P_0)(A) = \int P^J(dj)\ N(0, j^{-1})(du)\ 1_A(j, u)$

for Borel sets $A$ in $\mathbb{R}^{d\times d} \times \mathbb{R}^d$. According to (7.10') above, we have (i).
To prove (i)⟹(ii), we start from the regularity assumption

$\mathcal{L}\left( J_n, U_n - h \mid P_{\vartheta+\delta_n h} \right) \to \widetilde F\,,\quad n \to \infty\,,$

for arbitrary $h$, where the limit law $\widetilde F$ does not depend on $h \in \mathbb{R}^d$. Fix any subsequence of the natural numbers. Applying Lemma 7.5 along this subsequence, we find a further subsequence $(n_l)_l$ and a statistic $U$ in the limit experiment $\mathcal{E}(S, J)$ (if necessary, after Markov extension) such that

$\mathcal{L}\left( S_{n_l}, J_{n_l}, U_{n_l} \mid P_{\vartheta+\delta_{n_l} h} \right) \to \mathcal{L}(S, J, U \mid P_h)\,,\quad l \to \infty\,,$

for all $h$, or equivalently by definition of $(Z_n)_n$

(◦) $\mathcal{L}\left( Z_{n_l} - h,\ J_{n_l},\ U_{n_l} - h \mid P_{\vartheta+\delta_{n_l} h} \right) \to \mathcal{L}\left( Z - h,\ J,\ U - h \mid P_h \right)\,,\quad l \to \infty\,,$

for every $h \in \mathbb{R}^d$. Again, regularity yields

$\mathcal{L}(J, U - h \mid P_h) = \widetilde F = \mathcal{L}(J, U \mid P_0)$, not depending on $h \in \mathbb{R}^d$,

and allows to view $U$ as a strongly equivariant estimator in the limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$ which is mixed normal. The Convolution Theorem 6.6 yields a representation

$\widetilde F(A) = \int P^J(dj)\ \left[ N(0, j^{-1}) \star Q_j \right](du)\ 1_A(j, u)\,,\quad A \in \mathcal{B}(\mathbb{R}^{d\times d} \times \mathbb{R}^d)\,.$

Now we exploit the efficiency assumption for $(T_n)_n$ at $\vartheta$: according to (7.10') above we have

$Q_j = \varepsilon_0$ for $P^J$-almost all $j \in \mathbb{R}^{d\times d}$.

Since the Convolution Theorem 6.6 identifies $(j, B) \to Q_j(B)$ as a regular version of the conditional distribution $P_0^{\,U - Z \mid J = j}(B)$, the last line establishes

$U = Z$, $P_0$-almost surely.

Using (◦) and the continuous mapping theorem, this gives

$\mathcal{L}\left( U_{n_l} - Z_{n_l} \mid P_{n_l,\vartheta} \right) \to \mathcal{L}(U - Z \mid P_0) = \varepsilon_0\,,\quad l \to \infty\,.$

But convergence in law to a constant limit is equivalent to stochastic convergence, thus

(◦◦) $U_{n_l} = Z_{n_l} + o_{P_{n_l,\vartheta}}(1)\,,\quad l \to \infty\,.$

We have proved that every subsequence of the natural numbers contains some further subsequence $(n_l)_l$ which has the property (◦◦): this gives

$U_n = Z_n + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

which is (ii) and finishes the proof. □

However, there might be interesting estimator sequences $(T_n)_n$ for the unknown parameter in $(\mathcal{E}_n)_n$ which are not regular as required in the Convolution Theorem 7.10, or we might be unable to prove regularity: when LAMN or LAN holds at a point $\vartheta$, we wish to include these in comparison results.

7.12 Local Asymptotic Minimax Theorem. Assume that LAMN or LAN holds at $\vartheta$, consider arbitrary sequences of estimators $(T_n)_n$ for the unknown parameter in $(\mathcal{E}_n)_n$, and arbitrary loss functions $\ell(\cdot)$ which are continuous, bounded and subconvex.
(a) A local asymptotic minimax bound

$\liminf_{c\to\infty}\ \liminf_{n\to\infty}\ \sup_{|h|\le c}\ E_{n,\vartheta+\delta_n h}\left[ \ell\left( \delta_n^{-1}(T_n - (\vartheta+\delta_n h)) \right) \right]\ \ge\ E_0\left( \ell(Z) \right)$

holds whenever $(T_n)_n$ has estimation errors at $\vartheta$ which are tight at rate $(\delta_n(\vartheta))_n$:

$\mathcal{L}\left( \delta_n^{-1}(\vartheta)(T_n - \vartheta) \mid P_{n,\vartheta} \right)$, $n \ge 1$, is tight in $\mathbb{R}^d$.

(b) Sequences $(T_n)_n$ satisfying the coupling property (⋄) of Example 7.9

$\delta_n^{-1}(\vartheta)(T_n - \vartheta) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

attain the local asymptotic minimax bound at $\vartheta$. One has under this condition

$\lim_{n\to\infty}\ \sup_{|h|\le c}\ E_{n,\vartheta+\delta_n h}\left[ \ell\left( \delta_n^{-1}(T_n - (\vartheta+\delta_n h)) \right) \right] = E_0\left( \ell(Z) \right)$

for arbitrary choice of a constant $0 < c < \infty$.

Proof. We give the proof for the LAMN case (again, the proof under LAN is a simplified version), and write $U_n = \delta_n^{-1}(T_n - \vartheta)$ for the rescaled estimation errors of $T_n$ at $\vartheta$.
(1) Fix $c \in \mathbb{N}$. The loss function $\ell(\cdot)$ being non-negative and bounded,

$\liminf_{n\to\infty}\ \sup_{|h|\le c}\ E_{n,\vartheta+\delta_n h}\left( \ell(U_n - h) \right)$

is necessarily finite. Select a subsequence of the natural numbers along which 'liminf' in the last line can be replaced by 'lim', then – using Lemma 7.5 – pass to some further subsequence $(n_l)_l$ and some statistic $U$ in the limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$ (if necessary, after Markov extension) such that the following holds for arbitrary limit points $h$ and convergent sequences $h_l \to h$:

$\mathcal{L}\left( S_{n_l}, J_{n_l}, U_{n_l} - h_l \mid P_{n_l,\vartheta+\delta_{n_l} h_l} \right) \to \mathcal{L}(S, J, U - h \mid P_h)\,,\quad l \to \infty\,.$

From this we deduce as in Theorem 7.6, as $l \to \infty$,

(+) $\sup_{|h|\le c}\ \left| E_{n_l,\vartheta+\delta_{n_l} h}\left( \ell(U_{n_l} - h) \right) - E_h\left( \ell(U - h) \right) \right| \to 0$

since $\ell \in C_b$. Recall that $h \to E_h(\ell(U - h))$ is continuous, see step (2) in the proof of Theorem 7.6. Write $\mu_c$ for the uniform law on the closed ball $B_c$ in $\mathbb{R}^d$ centred at $0$ with radius $c$, as in Lemma 6.7 and in the proof of Theorem 6.8. Then by uniform convergence according to (+)

$\sup_{|h|\le c}\ E_{n_l,\vartheta+\delta_{n_l} h}\left( \ell(U_{n_l} - h) \right) \to \sup_{|h|\le c}\ E_h\left( \ell(U - h) \right)\ \ge\ \int \mu_c(dh)\ E_h\left( \ell(U - h) \right)$

as $l \to \infty$. The last '$\ge$' is a trivial bound for an integral with respect to $\mu_c$.
(2) Now we exploit Lemma 6.7. In the mixed normal limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S, J)$, arbitrary estimators $U$ for the parameter $h \in \mathbb{R}^d$ can be viewed as being approximately strongly equivariant under a very diffuse prior, i.e. in absence of any a priori information on $h$ except that $h$ should range over large balls centred at the origin: there are probability laws $\{Q^c_j : j \in D^+, c \in \mathbb{N}\}$ such that

$d_1\left( \int \mu_c(dh)\ \mathcal{L}(U - h \mid P_h)\,,\ \int P^J(dj)\ \left[ N(0, j^{-1}) \star Q^c_j \right] \right) \to 0$

as $c$ increases to $\infty$, with $d_1(\cdot,\cdot)$ the total variation distance. As a consequence, $\ell(\cdot)$ being bounded,

$R(U, B_c) = \int \mu_c(dh)\ E_h\left( \ell(U - h) \right)$

takes the form

$R(U, B_c) = \int P^J(dj) \int \left[ N(0, j^{-1}) \star Q^c_j \right](du)\ \ell(u)\ +\ \rho(c)$

where remainder terms $\rho(c)$ involve an upper bound for $\ell(\cdot)$ and vanish as $c$ increases to $\infty$. In the last line, at every stage $c \in \mathbb{N}$ of the asymptotics, Anderson's Lemma 5.7 allows for a lower bound

$R(U, B_c)\ \ge\ \int P^J(dj) \int N(0, j^{-1})(du)\ \ell(u)\ +\ \rho(c)$

since $\ell(\cdot)$ is subconvex. According to Definition 6.2 and Proposition 6.3, the law appearing on the right-hand side is $\mathcal{L}(Z \mid P_0)$, and we arrive at

$R(U, B_c)\ \ge\ E_0(\ell(Z)) + \rho(c)$ where $\lim_{c\to\infty} \rho(c) = 0\,.$
(3) Combining steps (1) and (2) we have for $c \in \mathbb{N}$ fixed

$\liminf_{n\to\infty}\ \sup_{|h|\le c}\ E_{n,\vartheta+\delta_n h}\left( \ell(U_n - h) \right) = \lim_{l\to\infty}\ \sup_{|h|\le c}\ E_{n_l,\vartheta+\delta_{n_l} h}\left( \ell(U_{n_l} - h) \right) = \sup_{|h|\le c}\ E_h\left( \ell(U - h) \right)\ \ge\ R(U, B_c)$

(this $U$ depends on the choice of the subsequence at the start), where for $c$ tending to $\infty$

$\liminf_{c\to\infty}\ R(U, B_c)\ \ge\ E_0(\ell(Z))\,.$

Both assertions together yield the local asymptotic minimax bound in part (a) of the theorem.
(4) For estimator sequences $(T_n)_n$ satisfying condition (⋄) in 7.9, (+) above can be strengthened to

$\sup_{|h|\le c}\ \left| E_{n,\vartheta+\delta_n h}\left( \ell(U_n - h) \right) - E_h\left( \ell(Z - h) \right) \right| \to 0\,,\quad n \to \infty\,,$

for fixed $c$, without any need to select subsequences (Corollary 7.7). In the mixed normal limit experiment, $E_h(\ell(Z - h)) = E_0(\ell(Z))$ does not depend on $h$. Exploiting this we can replace the conclusion of step (1) above by the stronger assertion

$\sup_{|h|\le c}\ E_{n,\vartheta+\delta_n h}\left( \ell(U_n - h) \right) \to E_0(\ell(Z))$

as $n \to \infty$, for arbitrary fixed value of $c$. This is part (b) of the theorem. □

7.13 Remark. Theorem 7.12 shows in particular that whenever we try to find estimator sequences $(T_n)_n$ which attain a local asymptotic minimax bound at $\vartheta$, we may restrict our attention to sequences which have the coupling property (⋄) of Example 7.9 (or of Corollary 7.7). We rephrase this statement according to Theorem 7.11: under LAMN or LAN at $\vartheta$, in order to attain the local asymptotic minimax bound of Theorem 7.12, we may focus – within the class of all possible estimator sequences – on those which are regular and efficient at $\vartheta$ in the sense of the convolution theorem. □

Under LAMN or LAN at $\vartheta$ plus some additional conditions, we can construct efficient estimator sequences explicitly by 'one-step modification', starting from any preliminary estimator sequence which converges at rate $\delta_n(\vartheta)$ at $\vartheta$. This will be the topic of Section 7.3.

7.13' Example. In the set of all probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, let us consider a one-parametric path $\mathcal{E} = \{P_\zeta : |\zeta| < 1\}$ in direction $\mathrm{sgn}(\cdot)$ through the law $P_0 := R(-1,+1)$, the uniform distribution with support $(-1,+1)$, defined as in Examples 1.3 and 4.3 by

$g(x) := \mathrm{sgn}(x)\,,\quad P_\zeta(dx) := (1 + \zeta\, g(x))\, P_0(dx)\,,\quad x \in (-1,+1)\,.$

(1) Write $\Theta = (-1,+1)$. According to Example 4.3, the model $\mathcal{E}$ is $L_2$-differentiable at $\zeta = \vartheta$ with derivative

$V_\vartheta(x) = \frac{g}{1+\vartheta g}(x) = -\frac{1}{1-\vartheta}\, 1_{\{x<0\}} + \frac{1}{1+\vartheta}\, 1_{\{x>0\}}\,,\quad x \in (-1,+1)\,,$

at every reference point $\vartheta \in \Theta$ (it is irrelevant how the derivative is defined on Lebesgue null sets). As in Theorem 4.11, Le Cam's second lemma yields LAN at $\vartheta$ with local scale $\delta_n(\vartheta) = 1/\sqrt n$, with score at $\vartheta$

$S_n(\vartheta)(X_1, \ldots, X_n) = \frac{1}{\sqrt n} \sum_{i=1}^n V_\vartheta(X_i)$

at stage $n$ of the asymptotics, and with Fisher information given by

$J_\vartheta = E_\vartheta(V_\vartheta^2) = \frac{1}{1-\vartheta^2}\,,\quad \vartheta \in \Theta\,.$

(2) With $\widehat F_n(\cdot)$ the empirical distribution function based on the first $n$ observations, the score in $\mathcal{E}_n$

$S_n(\vartheta) = -\frac{1}{1-\vartheta}\, \sqrt n\, \widehat F_n(0) + \frac{1}{1+\vartheta}\, \sqrt n\, \left( 1 - \widehat F_n(0) \right)$

can be written in the form

(+) $S_n(\vartheta) = J_\vartheta \cdot \sqrt n\, (T_n - \vartheta)$ for all $\vartheta \in \Theta$

when we estimate the unknown parameter $\vartheta \in \Theta$ by

$T_n := 1 - 2\, \widehat F_n(0)\,.$

From representation (+), for every $\vartheta \in \Theta$, rescaled estimation errors of $(T_n)_n$ coincide with the central sequence $(Z_n(\vartheta))_n$. From Theorems 7.11, 7.10 and 7.12, for every $\vartheta \in \Theta$, the sequence $(T_n)_n$ thus attains the best concentrated limit distribution (7.10') in the convolution theorem, and attains the local asymptotic minimax bound of Theorem 7.12(b). □
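As a quick numerical illustration (our sketch, not part of the text), the identity (+) can be checked in simulation: the rescaled estimation error of $T_n = 1 - 2\widehat F_n(0)$ reproduces the central sequence $Z_n(\vartheta) = J_\vartheta^{-1} S_n(\vartheta)$ exactly, not merely up to $o_{P_{n,\vartheta}}(1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n = 0.3, 10**5

# draw from P_theta: density (1 + theta*sgn(x))/2 on (-1,1), i.e.
# P(X < 0) = (1-theta)/2, and uniform on each half-interval given the sign
neg = rng.random(n) < (1 - theta) / 2
u = rng.random(n)
x = np.where(neg, -u, u)

Fn0 = np.mean(x < 0)                    # empirical distribution function at 0
T_n = 1 - 2 * Fn0                       # estimator of Example 7.13'

V = np.where(x < 0, -1 / (1 - theta), 1 / (1 + theta))
S_n = V.sum() / np.sqrt(n)              # score S_n(theta)
J = 1 / (1 - theta**2)                  # Fisher information
Z_n = S_n / J                           # central sequence

print(np.sqrt(n) * (T_n - theta) - Z_n)   # identity (+): zero up to rounding
```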

An unsatisfactory point with the last example is that the $n$-fold product models $\mathcal{E}_n$ in Example 7.13' are in fact classical exponential families

$dP_{n,\zeta} = (1+\zeta)^{n[1-\widehat F_n(0)]}\, (1-\zeta)^{n \widehat F_n(0)}\, dP_{n,0} = (1+\zeta)^n \exp\{ \kappa(\zeta)\, \check T_n \}\, dP_{n,0}\,,\quad \zeta \in \Theta\,,$

in $\kappa(\zeta) := \log\left( \frac{1-\zeta}{1+\zeta} \right)$ and $\check T_n := n\, \widehat F_n(0)$. Hence we shall generalise it (considering different one-parametric paths through a uniform law which are not exponential families) in Example 7.21 below.
7.3 Le Cam's One-step Modification of Estimators

In this section, we go back to the LAQ setting of Section 7.1 and construct estimator sequences $(\widetilde T_n)_n$ with the coupling property (⋄) of Corollary 7.7

(⋄) $\delta_n^{-1}(\vartheta)(\widetilde T_n - \vartheta) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

starting from any preliminary estimator sequence $(T_n)_n$ which converges at rate $\delta_n(\vartheta)$ at $\vartheta$. This 'one-step modification' is explicit and requires only few further conditions in addition to LAQ at $\vartheta$; the main result is Theorem 7.19 below.
In particular, when LAMN or LAN holds at $\vartheta$, one-step modification yields optimal estimator sequences locally asymptotically at $\vartheta$, via Example 7.9 and Theorem 7.11: the modified sequence will be regular and efficient in the sense of the Convolution Theorem 7.10, and will attain the local asymptotic minimax bound of Theorem 7.12. In the general LAQ case, we only have the following: the modified sequence will work over shrinking neighbourhoods of $\vartheta$ as well as the maximum likelihood estimator $Z$ in the limit experiment $\mathcal{E}_\infty(\vartheta)$, according to Corollary 7.7, where $\mathcal{L}(Z - h \mid P_h)$ may depend on $h$.
We follow Davies [19] for this construction. We formulate the conditions which we need simultaneously for all points $\vartheta \in \Theta$, such that the one-step modifications $(\widetilde T_n)_n$ of $(T_n)_n$ will have the desired properties simultaneously at all points $\vartheta \in \Theta$. This requires some compatibility between quantities defining LAQ at $\vartheta$ and LAQ at $\vartheta'$ whenever $\vartheta'$ is close to $\vartheta$.

7.14 Assumptions for Section 7.3. We consider a sequence of experiments

$\mathcal{E}_n = (\Omega_n, \mathcal{A}_n, \{P_{n,\vartheta} : \vartheta \in \Theta\})\,,\quad n \ge 1\,,\quad \Theta \subset \mathbb{R}^d$ open,

enjoying the following properties (A)–(D):
(A) At every point $\vartheta \in \Theta$ we have LAQ as in Definition 7.1(a), with sequences

$(\delta_n(\vartheta))_n\,,\quad (S_n(\vartheta))_n\,,\quad (J_n(\vartheta))_n$

depending on $\vartheta$, and a quadratic limit experiment

$\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S(\vartheta), J(\vartheta))$

depending on $\vartheta$.
(B) Local scale: (i) For every $n \ge 1$ fixed, $\delta_n(\cdot) : \Theta \to (0, \infty)$ is a measurable mapping which is bounded by 1 (this is no loss of generality: we may always replace $\delta_n(\cdot)$ by $\delta_n(\cdot) \wedge 1$).
(ii) For every $\vartheta \in \Theta$ fixed, we have for all $0 < c < \infty$

$\sup_{|h| \le c}\ \left| \frac{\delta_n(\vartheta + \delta_n(\vartheta)\, h)}{\delta_n(\vartheta)} - 1 \right| \to 0$ as $n \to \infty$.
(C) Score and observed information: For every $\vartheta \in \Theta$ fixed, in restriction to the set of dyadic numbers $S := \{\alpha 2^{-k} : k \in \mathbb{N}_0,\ \alpha \in \mathbb{Z}^d\}$ in $\mathbb{R}^d$, we have

(i) $\sup_{\zeta \in S \cap \Theta,\ |\zeta - \vartheta| \le c\, \delta_n(\vartheta)}\ \left| S_n(\zeta) - \left\{ S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(\zeta - \vartheta) \right\} \right| = o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

(ii) $\sup_{\zeta \in S \cap \Theta,\ |\zeta - \vartheta| \le c\, \delta_n(\vartheta)}\ \left| J_n(\zeta) - J_n(\vartheta) \right| = o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

for arbitrary choice of a constant $0 < c < \infty$.
(D) Preliminary estimator sequence: We have some preliminary estimator sequence $(T_n)_n$ for the unknown parameter in $(\mathcal{E}_n)_n$ which at all points $\vartheta \in \Theta$ is tight at the rate specified by (A):

for every $\vartheta \in \Theta$: $\mathcal{L}\left( \delta_n^{-1}(\vartheta)\, (T_n - \vartheta) \mid P_{n,\vartheta} \right)$, $n \ge 1$, is tight in $\mathbb{R}^d$.

Continuity $\vartheta \to \delta_n(\vartheta)$ of local scale in the parameter should not be imposed (a stochastic process example is given in Chapter 8). Similarly, we avoid imposing continuity of the score or of the observed information in the parameter, not even measurability. This is why we consider, on the left-hand sides in (C), only parameter values belonging to a countably dense subset. Under the set of Assumptions 7.14, the first steps are to define a local scale with an estimated parameter, observed information with an estimated parameter, and a score with an estimated parameter.

7.15 Proposition. (a) Define a local scale with estimated parameter by

$D_n := \delta_n(T_n) \in (0, 1]\,,\quad n \ge 1\,.$

Then we have for every $\vartheta \in \Theta$

$\frac{D_n}{\delta_n(\vartheta)} = 1 + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,.$

(b) For every $n \ge 1$, define an $\mathbb{N}_0$-valued random variable $\kappa(n)$ by

$\kappa(n) = k$ if and only if $D_n \in (2^{-(k+1)}, 2^{-k}]\,,\quad k \in \mathbb{N}_0\,.$

Then for every $\vartheta \in \Theta$, the two sequences

$\frac{2^{-\kappa(n)}}{\delta_n(\vartheta)}$ and $\frac{\delta_n(\vartheta)}{2^{-\kappa(n)}}$

are tight in $\mathbb{R}$ under $P_{n,\vartheta}$ as $n \to \infty$.

Proof. According to Assumption 7.14(B.i), $D_n$ is a random variable on $(\Omega_n, \mathcal{A}_n)$ taking values in $(0,1]$. Consider the preliminary estimator $T_n$ of Assumption 7.14(D). Fix $\vartheta \in \Theta$ and write $U_n(\vartheta) := \delta_n^{-1}(\vartheta)(T_n - \vartheta)$ for the rescaled estimation errors at $\vartheta$. Combining tightness of $\mathcal{L}(U_n(\vartheta) \mid P_{n,\vartheta})$ as $n \to \infty$ according to Assumption 7.14(D) with a representation $D_n = \delta_n(T_n) = \delta_n(\vartheta + \delta_n(\vartheta)\, U_n(\vartheta))$ and with Assumption 7.14(B.ii) we obtain (a). From $2^{-(\kappa(n)+1)} < D_n \le 2^{-\kappa(n)}$ we get $1 \le 2^{-\kappa(n)}/D_n < 2$, thus (b) follows from (a). □

For every $k \in \mathbb{N}_0$, cover $\mathbb{R}^d$ with half-open cubes

$C(k, \alpha) := \bigtimes_{i=1}^d \left[ \alpha_i 2^{-k},\ (\alpha_i + 1) 2^{-k} \right)\,,\quad \alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{Z}^d\,.$

Write $Z(k) := \{\alpha \in \mathbb{Z}^d : C(k, \alpha) \subset \Theta\}$ for the collection of those which are contained in $\Theta$. Fix any default value $\vartheta_0$ in $S \cap \Theta$. From the preliminary estimator sequence $(T_n)_n$ and from local scale $(D_n)_n$ with estimated parameter, we define a discretisation $(G_n)_n$ of $(T_n)_n$ as follows: for $n \ge 1$,

$G_n := \begin{cases} \alpha 2^{-\kappa(n)} & \text{if } \alpha \in Z(\kappa(n)) \text{ and } T_n \in C(\kappa(n), \alpha)\,, \\ \vartheta_0 & \text{else} \end{cases}$

$\phantom{G_n} = \vartheta_0 + \sum_{k=0}^{\infty} \sum_{\alpha \in Z(k)} \left( \alpha 2^{-k} - \vartheta_0 \right)\, 1_{C(k,\alpha)}(T_n)\, 1_{(2^{-(k+1)},\, 2^{-k}]}(D_n)\,.$

Clearly $G_n$ is an estimator for the unknown parameter $\vartheta \in \Theta$ in $\mathcal{E}_n$, taking only countably many values. We have to check that passing from $(T_n)_n$ to $(G_n)_n$ does not modify the tightness rates.
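In computational terms, the discretisation is a rounding of $T_n$ to the dyadic grid of mesh $2^{-\kappa(n)}$. A minimal sketch (ours, not part of the text; it takes $\Theta = \mathbb{R}^d$, so that every cube lies in $\Theta$ and the default value $\vartheta_0$ is never needed):

```python
import numpy as np

def discretise(T_n, D_n):
    """Dyadic discretisation G_n of a preliminary estimate T_n.

    kappa is the integer with D_n in (2**-(kappa+1), 2**-kappa]; G_n is
    the lower-left corner of the dyadic cube of edge 2**-kappa that
    contains T_n.  With Theta a proper subset of R^d one would fall
    back to a default value whenever this cube is not contained in Theta.
    """
    kappa = int(np.floor(-np.log2(D_n)))      # D_n in (2^-(kappa+1), 2^-kappa]
    edge = 2.0 ** (-kappa)
    return edge * np.floor(np.asarray(T_n) / edge)

# |G_n - T_n| <= sqrt(d) * 2**-kappa(n), coordinatewise rounding:
print(discretise(T_n=np.array([0.734, -1.21]), D_n=0.01))
```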

7.16 Proposition. $(G_n)_n$ is a sequence of $S \cap \Theta$-valued estimators satisfying

$\left\{ \mathcal{L}\left( \delta_n^{-1}(\vartheta)\, (G_n - \vartheta) \mid P_{n,\vartheta} \right) : n \ge 1 \right\}$ is tight in $\mathbb{R}^d$

for every $\vartheta \in \Theta$.

Proof. Fix $\vartheta \in \Theta$. The above construction specifies 'good' events

$B_n := \left\{ T_n \in \bigcup_{\alpha \in Z(\kappa(n))} C(\kappa(n), \alpha) \right\} \in \mathcal{A}_n$

together with a default value $G_n = \vartheta_0$ on $B_n^c$. Since $\Theta$ is open and rescaled estimation errors of $(T_n)_n$ at $\vartheta$ are tight at rate $(\delta_n(\vartheta))_n$, and since $(D_n)_n$ or $(2^{-\kappa(n)})_n$ defined in Proposition 7.15(b) are random tightness rates which are equivalent to $(\delta_n(\vartheta))_n$ under $(P_{n,\vartheta})_n$, we have by construction

$\lim_{n\to\infty} P_{n,\vartheta}(B_n) = 1$

together with

$|G_n - T_n| < \sqrt d \cdot 2^{-\kappa(n)}$ on $B_n$, for every $n \in \mathbb{N}$

(for arbitrary $k$ and $\alpha \in \mathbb{Z}^d$, the half-open cube $C(k, \alpha)$ has lower left endpoint $\alpha 2^{-k}$ and diameter $\sqrt d \cdot 2^{-k}$). Combining the two last lines we have

$G_n = T_n + O_{P_{n,\vartheta}}(\delta_n(\vartheta))$ as $n \to \infty$

which is the assertion. □

7.17 Proposition. The discretised sequence $(G_n)_n$ in Proposition 7.16 allows us to define for every $n \in \mathbb{N}$ information with an estimated parameter

$\widehat J_n := J_n(G_n) = \sum_{\zeta \in S \cap \Theta} J_n(\zeta) \cdot 1_{\{G_n = \zeta\}}$

together with its 'inverse'

$\widehat K_n(\omega) := \begin{cases} [\widehat J_n(\omega)]^{-1} & \text{if } \widehat J_n(\omega) \text{ belongs to } D^+\,, \\ \mathrm{Id} & \text{else,} \end{cases}$

such that the following (i) and (ii) hold for every $\vartheta \in \Theta$:

(i) $\widehat J_n = J_n(\vartheta) + o_{P_{n,\vartheta}}(1)$ as $n \to \infty$,

(ii) $\widehat K_n = J_n^{-1}(\vartheta) + o_{P_{n,\vartheta}}(1)$ and $\widehat K_n\, J_n(\vartheta) = \mathrm{Id} + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,.$

Here $\mathrm{Id}$ denotes the $d$-dimensional identity matrix.

Proof. Note first that $J_n(\vartheta)$ under $P_{n,\vartheta}$ takes values in $D^+$ almost surely, by definition of LAQ in Definition 7.1, but the same statement is not clear for $J_n(\zeta)$ under $P_{n,\vartheta}$ (we did not require equivalence of laws $P_{n,\vartheta}, P_{n,\zeta}$ for $\zeta \ne \vartheta$). Since $G_n$ takes values in the countable set $S \cap \Theta$, the random variable $\widehat J_n$ on $(\Omega_n, \mathcal{A}_n)$ is well defined and $\mathbb{R}^{d\times d}$-valued; then also $\widehat K_n$ is well defined since $D^+$ is a Borel set in $\mathbb{R}^{d\times d}$.
(1) For every $\vartheta \in \Theta$, we deduce (i) from part (C.ii) of Assumption 7.14, via a representation

$\widehat J_n = J_n(G_n) = J_n(\vartheta + \delta_n(\vartheta)\, \check U_n(\vartheta))\,,\quad \check U_n(\vartheta) := \delta_n^{-1}(\vartheta)(G_n - \vartheta)\,,$

where $G_n$ is $S \cap \Theta$-valued, and where $\mathcal{L}(\check U_n(\vartheta) \mid P_{n,\vartheta})$ is tight as $n \to \infty$ by Proposition 7.16.
(2) Fix $\vartheta \in \Theta$. From LAQ at $\vartheta$ combined with (i), we have weak convergence

$\mathcal{L}\left( \widehat J_n, J_n(\vartheta) \mid P_{n,\vartheta} \right) \to \mathcal{L}\left( J(\vartheta), J(\vartheta) \mid P_0 \right)$

in $\mathbb{R}^{d\times d} \times \mathbb{R}^{d\times d}$ as $n \to \infty$. The mapping on $\mathbb{R}^{d\times d}$

$\psi : j \to \begin{cases} j^{-1} & \text{if } \det(j) \ne 0\,, \\ \mathrm{Id} & \text{else} \end{cases}$

is continuous on the open set $\{j \in \mathbb{R}^{d\times d} : \det(j) \ne 0\}$ which contains $D^+$. Thus, by assumption of Definition 6.1 on the limit experiment $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S(\vartheta), J(\vartheta))$, the set of discontinuities of $\psi$ is a null set under $\mathcal{L}(J(\vartheta) \mid P_0)$. If we consider $\widehat K_n = \psi(\widehat J_n)$, the continuous mapping theorem allows us to extend the above convergence to

(+) $\mathcal{L}\left( \widehat K_n,\ J_n^{-1}(\vartheta),\ \widehat J_n,\ J_n(\vartheta) \mid P_{n,\vartheta} \right) \to \mathcal{L}\left( J^{-1}(\vartheta),\ J^{-1}(\vartheta),\ J(\vartheta),\ J(\vartheta) \mid P_0 \right)$

weakly in $\mathbb{R}^{d\times d} \times \mathbb{R}^{d\times d} \times \mathbb{R}^{d\times d} \times \mathbb{R}^{d\times d}$ as $n \to \infty$; recall from Definition 7.1 that $J_n(\vartheta)$ takes values $P_{n,\vartheta}$-almost surely in $D^+$ for all $n \ge 1$. In particular, (+) shows that both sequences $J_n^{-1}(\vartheta)$ under $P_{n,\vartheta}$ and $\widehat K_n$ under $P_{n,\vartheta}$ are tight as $n \to \infty$. Applying the continuous mapping theorem to differences or products of components in (+), we see that $\widehat K_n - J_n^{-1}(\vartheta)$ under $P_{n,\vartheta}$ converges weakly in $\mathbb{R}^{d\times d}$ as $n \to \infty$ to the matrix having all entries equal to $0$, and that $\widehat K_n\, J_n(\vartheta)$ under $P_{n,\vartheta}$ converges weakly in $\mathbb{R}^{d\times d}$ to the identity matrix $\mathrm{Id}$. Weak convergence to a constant limit being equivalent to convergence in probability, assertion (ii) follows. □

7.18 Proposition. The discretisation $G_n$ of Proposition 7.16 allows to define a score with estimated parameter

$\widehat S_n := S_n(G_n) = \sum_{\zeta \in S \cap \Theta} S_n(\zeta) \cdot 1_{\{G_n = \zeta\}}\,,\quad n \ge 1\,,$

which satisfies for every $\vartheta \in \Theta$

$\widehat S_n = S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(G_n - \vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,.$

In particular, $\mathcal{L}(\widehat S_n \mid P_{n,\vartheta})$ is tight as $n \to \infty$ for every $\vartheta \in \Theta$.

Proof. Again $\widehat S_n = S_n(G_n)$ is well defined. Fix $\vartheta \in \Theta$ and write

$\widehat S_n = S_n(G_n) = S_n(\vartheta + \delta_n(\vartheta)\, \check U_n(\vartheta))\,,\quad \check U_n(\vartheta) := \delta_n^{-1}(\vartheta)(G_n - \vartheta)\,,$

where $G_n$ is $S \cap \Theta$-valued and where $\mathcal{L}(\check U_n(\vartheta) \mid P_{n,\vartheta})$ is tight as $n \to \infty$. Then we write

$P_{n,\vartheta}\left( \left| S_n(G_n) - \left\{ S_n(\vartheta) - J_n(\vartheta)\, \check U_n(\vartheta) \right\} \right| > \varepsilon \right)
\ \le\ P_{n,\vartheta}\left( |\check U_n(\vartheta)| > c \right)
\ +\ P_{n,\vartheta}\left( \sup_{\zeta \in S \cap \Theta,\ |\zeta - \vartheta| \le c\, \delta_n(\vartheta)} \left| S_n(\zeta) - \left\{ S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(\zeta - \vartheta) \right\} \right| > \varepsilon \right)$

and apply part (C.i) of Assumption 7.14. □


Now we resume: under Assumptions 7.14, with preliminary estimator sequence $(T_n)_n$ and its discretisation $(G_n)_n$ as in Proposition 7.16, with local scale with estimated parameter $D_n$, information with estimated parameter $\widehat J_n$ and score with estimated parameter $\widehat S_n$ as defined in Propositions 7.15, 7.17 and 7.18, and with 'inverse' $\widehat K_n$ for $\widehat J_n$ as in Proposition 7.17, we have

7.19 Theorem. (a) With these assumptions and notations, the one-step modification

$\widetilde T_n := G_n + D_n\, \widehat K_n\, \widehat S_n\,,\quad n \ge 1\,,$

yields an estimator sequence $(\widetilde T_n)_n$ for the unknown parameter in $(\mathcal{E}_n)_n$ which has the property

$\delta_n^{-1}(\vartheta)(\widetilde T_n - \vartheta) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,,$

for every $\vartheta \in \Theta$.
(b) In the sense of Corollary 7.7, for every $\vartheta \in \Theta$, $(\widetilde T_n)_n$ works over shrinking neighbourhoods of $\vartheta$ as well as the maximum likelihood estimator in the limit model $\mathcal{E}_\infty(\vartheta) = \mathcal{E}(S(\vartheta), J(\vartheta))$.
(c) If LAMN or LAN holds at $\vartheta$, the sequence $(\widetilde T_n)_n$ is regular and efficient at $\vartheta$ in the sense of the Convolution Theorem 7.10, and attains the local asymptotic minimax bound at $\vartheta$ according to the Local Asymptotic Minimax Theorem 7.12.

Proof. Only (a) requires a proof. Fix $\vartheta \in \Theta$. Then from Propositions 7.15 and 7.17

$\delta_n^{-1}(\vartheta)(\widetilde T_n - \vartheta) = \delta_n^{-1}(\vartheta)(G_n - \vartheta) + \frac{D_n}{\delta_n(\vartheta)}\, \widehat K_n\, \widehat S_n$
$= \delta_n^{-1}(\vartheta)(G_n - \vartheta) + \left( 1 + o_{P_{n,\vartheta}}(1) \right) \left( J_n^{-1}(\vartheta) + o_{P_{n,\vartheta}}(1) \right) \widehat S_n$
$= \delta_n^{-1}(\vartheta)(G_n - \vartheta) + J_n^{-1}(\vartheta)\, \widehat S_n + o_{P_{n,\vartheta}}(1)$

which according to Proposition 7.18 equals

$\delta_n^{-1}(\vartheta)(G_n - \vartheta) + J_n^{-1}(\vartheta)\left[ S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(G_n - \vartheta) + o_{P_{n,\vartheta}}(1) \right] + o_{P_{n,\vartheta}}(1)$

where the terms $\delta_n^{-1}(\vartheta)(G_n - \vartheta)$ cancel out and the last line simplifies to

$J_n^{-1}(\vartheta)\, S_n(\vartheta) + o_{P_{n,\vartheta}}(1) = Z_n(\vartheta) + o_{P_{n,\vartheta}}(1)\,,\quad n \to \infty\,.$

This is the assertion. □
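Assembled from Propositions 7.15–7.18, the one-step modification of Theorem 7.19(a) is a one-line formula. The following sketch (ours, with hypothetical argument names; in a concrete model, $\widehat S_n$ and $\widehat J_n$ come from evaluating score and observed information at $G_n$) mirrors the construction, including the fallback $\widehat K_n = \mathrm{Id}$ when $\widehat J_n$ lies outside $D^+$:

```python
import numpy as np

def one_step(G_n, D_n, S_hat, J_hat):
    """One-step modification T_tilde = G_n + D_n * K_hat * S_hat (sketch).

    G_n   : discretised preliminary estimate in R^d
    D_n   : local scale with estimated parameter, delta_n(T_n)
    S_hat : score with estimated parameter, S_n(G_n)
    J_hat : observed information with estimated parameter, J_n(G_n)
    """
    J_hat = np.atleast_2d(J_hat)
    try:
        # Cholesky succeeds iff the symmetrised matrix is strictly
        # positive definite, i.e. belongs to D+
        np.linalg.cholesky((J_hat + J_hat.T) / 2)
        K_hat = np.linalg.inv(J_hat)
    except np.linalg.LinAlgError:
        K_hat = np.eye(J_hat.shape[0])       # 'inverse' defaults to Id
    return np.asarray(G_n) + D_n * (K_hat @ np.atleast_1d(S_hat))
```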


7.4 The Case of i.i.d. Observations

To establish LAN with local scale $\delta_n = 1/\sqrt n$ in statistical models for i.i.d. observations, we use in most cases Le Cam's Second Lemma 4.11 and 4.10. We recall this important special case here, and discuss some examples. During this section,

$\mathcal{E} = \left( \Omega, \mathcal{A}, \mathcal{P} = \{P_\zeta : \zeta \in \Theta\} \right)$

is an experiment, $\Theta \subset \mathbb{R}^d$ is open, and $\mathcal{E}_n$ is the $n$-fold product experiment with canonical variable $(X_1, \ldots, X_n)$. We write $P_{n,\zeta} = \otimes_{i=1}^n P_\zeta$ for the laws in $\mathcal{E}_n$ and

$\Lambda^{h/0}_{n,\zeta} = \Lambda^{(\zeta + h/\sqrt n)/\zeta}_n = \log \dfrac{dP_{n,\zeta+h/\sqrt n}}{dP_{n,\zeta}}$

for $\zeta \in \Theta$ and $h \in \Theta_{n,\zeta}$, the set of all $h \in \mathbb{R}^d$ such that $\zeta + h/\sqrt n$ belongs to $\Theta$. We shall assume

(∗) there is an open set $\Theta_0 \subset \mathbb{R}^d$ contained in $\Theta$ such that the following holds: for every $\vartheta \in \Theta_0$, the experiment $\mathcal{E}$ is $L_2$-differentiable at $\zeta = \vartheta$ with derivative $V_\vartheta$.

Recall from Assumptions 4.10, Corollary 4.5 and Definition 4.2 that $\vartheta \in \Theta_0$ implies that $V_\vartheta$ is centred and belongs to $L_2(P_\vartheta)$; write $J_\vartheta = E_\vartheta\left( V_\vartheta V_\vartheta^\top \right)$ for the Fisher information in the sense of Definition 4.6. Then Le Cam's Second Lemma 4.11 yields a quadratic expansion of log-likelihood ratios

$\Lambda^{(\vartheta + h_n/\sqrt n)/\vartheta}_n = h_n^\top S_n(\vartheta) - \tfrac12\, h_n^\top J_\vartheta h_n + o_{P_{n,\vartheta}}(1)$

as $n \to \infty$, for arbitrary bounded sequences $(h_n)_n$ in $\mathbb{R}^d$, at every reference point $\vartheta \in \Theta_0$, with

$S_n(\vartheta) = \dfrac{1}{\sqrt n} \sum_{i=1}^n V_\vartheta(X_i)$ such that $\mathcal{L}(S_n(\vartheta) \mid P_{n,\vartheta}) \to N(0, J_\vartheta)\,,\quad n \to \infty\,.$

In terms of Definitions 5.2 and 7.1 we can rephrase Le Cam's Second Lemma 4.11 as follows:

7.20 Theorem. For $n$-fold independent replication of an experiment $\mathcal{E}$ satisfying (∗), LAN holds as $n \to \infty$ at every $\vartheta \in \Theta_0$ with local scale $\delta_n(\vartheta) = n^{-1/2}$; the limit experiment $\mathcal{E}_\infty(\vartheta)$ is the Gaussian shift $\mathcal{E}(J_\vartheta)$.

We present some examples of i.i.d. models for which Le Cam's second lemma establishes LAN at all parameter values. The aim is to specify efficient estimator sequences, either by checking directly the coupling condition in Theorem 7.11, or by one-step modification according to Section 7.3. The first example is very close to Example 7.13'.
by one-step modification according to Section 7.3. The first example is very close to
Example 7.13’.

7.21 Example. Put $\Theta := \left( -\frac{1}{\pi}, +\frac{1}{\pi} \right)$ and $P_0 := R(I)$, the uniform distribution on $I := (-\pi, +\pi)$. In the set of all probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, define a one-parametric path $\mathcal{E} = \{P_\zeta : \zeta \in \Theta\}$ in direction $g$ through $P_0$ by

$g(x) := \pi \sin(x)\,,\quad P_\zeta(dx) := (1 + \zeta\, g(x))\, P_0(dx) = \frac{1}{2\pi}\left( 1 + \zeta \pi \sin(x) \right) dx\,,\quad x \in I\,,$

as in Examples 1.3 and 4.3, where the parameterisation is motivated by

$F_\zeta(0) = \frac12 - \zeta\,,\quad \zeta \in \Theta\,,$

with $F_\zeta(\cdot)$ the distribution function corresponding to $P_\zeta$:

$F_\zeta(x) = \frac{1}{2\pi}\left( [x + \pi] - \zeta \pi\, [\cos(x) + 1] \right)$ when $x \in I$.

(1) According to Example 4.3, the model $\mathcal{E}$ is $L_2$-differentiable at $\zeta = \vartheta$ with derivative

$V_\vartheta(x) = \frac{g}{1 + \vartheta g}(x) = \frac{\pi \sin(x)}{1 + \vartheta \pi \sin(x)}\,,\quad x \in I\,,$

at every reference point $\vartheta \in \Theta$. As resumed in Theorem 7.20, Le Cam's second lemma yields LAN at $\vartheta$ for every $\vartheta \in \Theta$, with local scale $\delta_n(\vartheta) = 1/\sqrt n$ and score

$S_n(\vartheta)(X_1, \ldots, X_n) = \frac{1}{\sqrt n} \sum_{i=1}^n V_\vartheta(X_i)$

at $\vartheta$, and with finite Fisher information in the sense of Definition 4.6

$J_\vartheta = E_\vartheta(V_\vartheta^2) < \infty\,,\quad \vartheta \in \Theta\,.$

In this model, we prefer to keep the observed information

$J_n(\vartheta) = \frac1n \sum_{i=1}^n V_\vartheta^2(X_i)$

in the quadratic expansion of log-likelihood ratios in the local model at $\vartheta$, and write

(+) $\Lambda^{(\vartheta + h_n/\sqrt n)/\vartheta}_n = h_n\, S_n(\vartheta) - \tfrac12\, h_n^2\, J_n(\vartheta) + o_{P_{n,\vartheta}}(1)$

as $n \to \infty$, for arbitrary bounded sequences $(h_n)_n$ in $\mathbb{R}$.
(2) We show that the set of Assumptions 7.14 is satisfied. A preliminary estimator

$T_n := \frac12 - \widehat F_n(0)$

for the unknown parameter $\vartheta$ in $\mathcal{E}_n$ is at hand, clearly $\sqrt n$-consistent as $n \to \infty$. From step (1), parts (A) and (B) of Assumptions 7.14 are granted; part (C.ii) holds by continuity of $J_n(\vartheta)$ in the parameter. Calculating

$S_n(\zeta) - \left\{ S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(\zeta - \vartheta) \right\}$, with $J_n(\vartheta)$ as in (+)

(the quantities arising in part (C.i) of Assumption 7.14), we find

$S_n(\zeta) - S_n(\vartheta) = -\sqrt n\, (\zeta - \vartheta) \cdot \frac1n \sum_{i=1}^n V_\vartheta^2(X_i)\, \frac{1 + \vartheta \pi \sin(X_i)}{1 + \zeta \pi \sin(X_i)}$

from which we deduce easily that Assumption 7.14(C.i) is satisfied.
(3) Now the one-step modification according to Theorem 7.19 (there is no need for discretisation since score and observed information depend continuously on the parameter)

$\widetilde T_n := T_n + \frac{1}{\sqrt n}\, [J_n(T_n)]^{-1}\, S_n(T_n)\,,\quad n \ge 1\,,$

yields an estimator sequence $(\widetilde T_n)_n$ which is regular and efficient at every point $\vartheta \in \Theta$ (a simplified version of the proof of Theorem 7.19 is sufficient to check this). □
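A simulation sketch of steps (2)–(3) (ours, not part of the text; sampling from $P_\vartheta$ is done by rejection from the uniform law on $I$, which is valid since $(1 + \vartheta\pi\sin y)/2 \in [0,1]$ for $|\vartheta| < 1/\pi$):

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 0.2, 10**5                  # true parameter, |theta| < 1/pi

# rejection sampling from density (1 + theta*pi*sin(x)) / (2*pi) on (-pi, pi)
x = np.empty(0)
while x.size < n:
    y = rng.uniform(-np.pi, np.pi, size=2 * n)
    keep = rng.random(2 * n) < (1 + theta * np.pi * np.sin(y)) / 2
    x = np.concatenate([x, y[keep]])[:n]

def score_info(t, x):
    v = np.pi * np.sin(x) / (1 + t * np.pi * np.sin(x))
    return v.sum() / np.sqrt(len(x)), np.mean(v**2)   # S_n(t), J_n(t)

T_n = 0.5 - np.mean(x <= 0)            # preliminary estimator 1/2 - F_n(0)
S, J = score_info(T_n, x)
T_tilde = T_n + S / (J * np.sqrt(n))   # one-step modification of step (3)
print(T_n, T_tilde)
```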

However, even if the set of Assumptions 7.14 can be checked in a broad variety of statistical models, not all of the assumptions listed there are harmless.

7.22 Example. Let $\mathcal{E} = (\Omega, \mathcal{A}, \{P_\zeta : \zeta \in \mathbb{R}\})$ be the location model on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ generated from the two-sided exponential distribution $P_0(dy) := \frac12 e^{-|y|}\, dy$; this example has already been considered in Exercises 4.1''', 7.9' and 7.9''. We shall see that Assumption 7.14(C)(i) on the score with estimated parameter does not hold, hence one-step correction according to Theorem 7.19 is not applicable. However, it is easy – for this model – to find optimal estimator sequences directly.
(1) For every $\vartheta \in \Theta := \mathbb{R}$, we have $L_2$-differentiability at $\zeta = \vartheta$ with derivative $V_\vartheta$ given by

$V_\vartheta(x) = \mathrm{sgn}(x - \vartheta)$

(cf. Exercise 4.1'''). For all $\vartheta$, put $\delta_n(\vartheta) = 1/\sqrt n$. Then Le Cam's second lemma (Theorem 7.20 or Theorem 4.11) establishes LAN at $\vartheta$ with score

$S_n(\vartheta)(y_1, \ldots, y_n) = \frac{1}{\sqrt n} \sum_{i=1}^n V_\vartheta(y_i)$

at $\vartheta$, and with Fisher information $J_\vartheta = E_\vartheta(V_\vartheta^2) \equiv 1$ not depending on $\vartheta$.


(2) We consider the set of Assumptions 7.14. From (1), parts (A) and (B) of 7.14 are granted; 7.14(C) part (ii) is trivial since the Fisher information does not depend on the parameter. Calculate now

$S_n(\zeta) - \left\{ S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(\zeta - \vartheta) \right\}\,,\quad J_n(\vartheta) = J_\vartheta \equiv 1\,,$

in view of 7.14(C) part (i). This takes the form $\frac{1}{\sqrt n} \sum_{i=1}^n Y_{n,i}$ where

$Y_{n,i} := \begin{cases} -2\left( 1_{(\vartheta,\zeta)}(X_i) - \frac12 (\zeta - \vartheta) \right) & \text{for } \vartheta < \zeta\,, \\ \phantom{-}2\left( 1_{(\zeta,\vartheta)}(X_i) - \frac12 (\vartheta - \zeta) \right) & \text{for } \zeta < \vartheta\,. \end{cases}$

Defining a function $\psi(r) := \int_0^r \frac12 e^{-y}\, dy$ for $r > 0$, we have

$|E_\vartheta(Y_{n,i})| = |\vartheta - \zeta| - 2\psi(|\vartheta - \zeta|)\,,\quad \mathrm{Var}_\vartheta(Y_{n,i}) = 4\, [\psi(1 - \psi)](|\vartheta - \zeta|)$

where $\psi(r) \sim \frac12 r$ as $r \downarrow 0$. We shall see that this structure violates Assumption 7.14(C)(i), thus one-step correction according to Theorem 7.19 breaks down.
To see this, consider a constant sequence $(h_n)_n$ with $h_n := \frac12$ for all $n$, and redefine the $Y_{n,i}$ above using $\zeta_n = \vartheta + n^{-1/2} h_n$ in place of $\zeta$: first, $E_\vartheta(Y_{n,i}) = o(n^{-1/2})$ as $n \to \infty$ and thus

$\frac{1}{\sqrt n} \sum_{i=1}^n Y_{n,i} = \frac{1}{\sqrt n} \sum_{i=1}^n \left[ Y_{n,i} - E_\vartheta(Y_{n,i}) \right] + o_{P_{n,\vartheta}}(1)\,;$

second, $\mathrm{Var}_\vartheta(Y_{n,i}) = O(n^{-1/2})$ as $n \to \infty$ and, by choice of the particular sequence $(h_n)_n$,

$\frac{1}{\sqrt n} \sum_{i=1}^n Y_{n,i} = \frac{\sum_{i=1}^n \left[ Y_{n,i} - E_\vartheta(Y_{n,i}) \right]}{\sqrt{\mathrm{Var}_\vartheta(Y_{n,1}) + \ldots + \mathrm{Var}_\vartheta(Y_{n,n})}} + o_{P_{n,\vartheta}}(1)\,.$

The structure of the $Y_{n,i}$ implies that the Lindeberg condition holds, thus as $n \to \infty$

$S_n(\zeta_n) - \left\{ S_n(\vartheta) - J_n(\vartheta)\, \delta_n^{-1}(\vartheta)(\zeta_n - \vartheta) \right\} = \frac{1}{\sqrt n} \sum_{i=1}^n Y_{n,i} \to N(0,1)$

weakly in $\mathbb{R}$. This is incompatible with 7.14(C)(i).


3) In the location model generated by the two-sided exponential distribution, there
is no need for one-step correction. Consider the median

Tn :D median.X1 , : : : , Xn /

which in our model is maximum likelihood in En for every n. Our model allows to
apply a classical result on asymptotic normality of the median: from [128, p. 578]),
p 
L n .Tn  #/ j Pn,# ! N .0, 1/
210 Chapter 7 Local Asymptotics of Type LAN, LAMN, LAQ

as n ! 1. In all location models, the median is equivariant in En for all n. By this


circonstance, the last convergence is already regularity in the sense of Hajek: for any
h 2 R,
p   
L n Tn  Œ# C n1=2 h j Pn,#Cn1=2 h ! F :D N .0, 1/ .

Recall from 1) that Fisher information in E equals J# D 1 for all #. Thus F appearing
here is the optimal limit distribution F D N .0, J#1 / in Hajek’s convolution theorem.
By (7.10’) and Theorem 7.11, the sequence .Tn /n is thus regular and efficient in the
sense of the convolution theorem, at every reference point # 2 ‚. Theorem 7.11
establishes that the coupling condition
p
n .Tn  #/ D Zn .#/ C o.Pn,# / .1/ , n ! 1
holds, for every # 2 R. Asymptotically
P as n ! 1, this links estimation errors of the
median Tn to differences n1 niD1 sgn.Xi  #/ between the relative number of obser-
vations above and below #. The coupling condition in turn implies that the estimator
sequence .Tn /n attains the local asymptotic minimax bound 7.12. 

7.22’ Exercise. Consider the location model E D ., A, ¹P : 2 Rº/ generated from
P0 D N .0, 1/, show that L2 -differentiability holds with L2 -derivative V# D .  #/ at
every point # 2 R. Show that the set of Assumptions 7.14 is satisfied, with left-hand sides
in Assumption 7.14(C) identical to zero.pConstruct an MDE sequence as in Chapter 2 for the
unknown parameter, with tightness rate n, and specify the one-step modification according
to Theorem 7.19 which as n ! 1 grants regularity and efficiency at all points # 2 ‚ (again,
as in Example 7.21, there is no need for discretisation). Verify that this one-step modification
directly replaces the preliminary estimator by the empirical mean, the MLE in this model. 

7.22” Exercise. We continue with the location model generated from the two-sided exponen-
tial law, under all notations and assumptions of Exercises 7.9’, 7.9” and of Example 7.22. We
focus on the Bayesians with ‘uniform over R prior’
R1 =0
 Ln d
Tn D R1 1 =0
, n1
1 Ln d

and shall prove that .Tn /n is efficient in the sense of the Convolution Theorem 7.10 and of the
Local Asymptotic Minimax Theorem 7.12.
(a) Fix any reference point # 2 R and recall (Example 7.22, or Exercises 7.9’(a), or 4.1’’’
and Lemma 4.11) that LAN holds at # in the form
1 X
n
hn =0 1 2
ƒn,# D hn Sn .#/  h C o.Pn,# / .1/ , Sn .#/ :D p sgn.Xi  #/
2 n n i D1

as n ! 1, for bounded sequences .hn /n in R. Also, we can write under Pn,#


R1
p    u Lu=0 du
n Tn  # D R1 1
n,#
u=0
.
1 Ln,# du
Section 7.4 The Case of i.i.d. Observations 211

(b) Prove joint convergence


 Z 1 Z 1 
u=0 u=0
Sn .#/ , u Ln,# du , Ln,# du under Pn,#
1 1

as n ! 1 to a limit law
 Z 1 Z 1 
uS 12 u2 uS 12 u2
S, ue du , e du
1 1

where S N .0, 1/ generates the Gaussian limit experiment E.1/ D ¹N .h, 1/ : h 2 Rº.
(Hint: finite-dimensional convergence of .Lu=0 /
n,# u2R
allows to deal e.g. with integrals
R u=0
u Ln,# du on compacts K  R, as in Lemma 2.5, then give some bound for
RK u=0
K c u Ln,# du as n ! 1).
(c) Let U  denote the Bayesian with ‘uniform over R prior’ in the limit experiment E.1/
R1 1 2
u e uS 2 u du
U  :D R11 uS 12 u2
1 e du

and recall from Exercise 5.4” that in a Gaussian shift experiment, U  coincides with the central
statistic Z.
(d) Use (a), (b) and (c) to prove that the coupling condition
p   
n Tn  # D Zn .#/ C o.Pn,# / .1/ as n ! 1

holds. By Theorem 7.11, comparing to Theorem 7.10, (7.10’) and Theorem 7.12, we have
efficiency of .Tn /n at #. 
Chapter 8

Some Stochastic Process Examples for Local
Asymptotics of Type LAN, LAMN and LAQ

Topics for Chapter 8:


8.1 *The Ornstein–Uhlenbeck Process with Unknown Parameter
Observed over a Long Time Interval
The process and its long-time behaviour (8.1)–8.2
Statistical model, likelihoods and ML estimator 8.3
Local models at # 8.3’
LAN in the ergodic case # < 0 8.4
LAQ in the null recurrent case # D 0 8.5
LAMN in the transient case # > 0 8.6
Remark on non-finite second moments in the LAMN case 8.6’
Sequential observation schemes: LAN at all parameter values 8.7
8.2 *A Null Recurrent Diffusion Model
The process and its long-time behaviour (8.8)–8.9
Regular variation, i.i.d. cycles, and norming constants for invariant measure
8.10–(8.11”)
Statistical model, likelihoods and ML estimator 8.12
Convergence of martingales together with their angle brackets 8.13
LAMN for local models at every # 2 ‚ 8.14
Remarks on non-finite second moments 8.15
Random norming 8.16
One-step modification is possible 8.17
8.3 *Some Further Remarks
LAN or LAMN or LAQ arising in other stochastic process models
Example: a limit experiment with essentially different statistical properties
Exercises: 8.2’, 8.6”, 8.7’

We discuss in detail some examples for local asymptotics of type LAN, LAMN or
LAQ in stochastic process models (hence the asterisk  in front of all sections). Mar-
tingale convergence and Harris recurrence (positive or null) will play an important
role in our arguments and provide limit theorems which establish convergence of lo-
cal models to a limit model. Background on these topics and some relevant apparatus
are collected in an Appendix (Chapter 9) to which we refer frequently, so one might
have a look to Chapter 9 first before reading the sections of the present chapter.
Section 8.1 Ornstein-Uhlenbeck Model 213


8.1 Ornstein–Uhlenbeck Process with Unknown
Parameter Observed over a Long Time Interval
We start in dimension d D 1 with the well-known example of Ornstein–Uhlenbeck
processes depending on an unknown parameter, see [23] and [5, p. 4]). We have a
probability space ., A, P / carrying a Brownian motion W , and consider the unique
strong solution X D .X t / t0 to the Ornstein–Uhlenbeck SDE

.8.1/ dX t D # X t dt C d W t , t 0

for some value of the parameter # 2 R and some starting point x 2 R. There is an
explicit representation of the solution
 Z t 
Xt D e# t x C e # s d Ws , t  0
0
satisfying
 Z  ´   
t
#t # s N 0, 1
e 2# t  1 if # ¤ 0
L e e d Ws D 2#
0 N .0, t / if # D 0 ,
and thus an explicit representation of the semigroup .P t ., // t0 of transition proba-
bilities of X

P t .x, dy/ :D P .XsCt 2 dy j Xs D x/


´   
N e # t x , 21# e 2# t  1 .dy/ if # ¤ 0
D
N . x , t / .dy/ if # D 0
where x, y 2 R and 0  s, t < 1.

8.2 Long-time Behaviour of the Process. Depending on the value of the parameter
# 2 ‚, we have three different types of asymptotics for the solution X D .X t / t0 to
equation (8.1).
(a) Positive recurrence in the case where # < 0: When # < 0, the process X is
positive recurrent in the sense of Harris (cf. Definition 9.4) with invariant measure
 
1
 :D N 0, .
2j#j
This follows from Proposition 9.12 in the Appendix where the function
Z x  Z y 
2# v dv D e  # y ,
2
S.x/ D s.y/ dy with s.y/ D exp  x, y 2 R
0 0

corresponding to the coefficients of equation (8.1) in the case where # < 0 is a bi-
jection from R onto R, and determines the invariant measure for the process (8.1)
214 Chapter 8 Some Stochastic Process Examples

1
as s.x/ dx , unique up to constant multiples. Normed to a probability measure this
specifies  as above.
Next, the Ratio Limit Theorem 9.6 yields for functions f 2 L1 ./
Z
1 t
lim f . s / ds D .f / almost surely as t ! 1
t!1 t 0

for arbitrary choice of a starting point. We thus have strong laws of large numbers for
a large class of additive functionals of X in the case where # < 0.
(b) Null recurrence in the case where # D 0: Here X is one-dimensional Brownian
motion with starting point x, and thus null recurrent (cf. Definition 9.5’) in the sense
of Harris. The invariant measure is , the Lebesgue measure on R.
(c) Transience in the case where # > 0: Here trajectories of X tend towards C1
or towards 1 exponentially fast. In particular, any given compact K in R will be
left in finite time without return: thus X is transient in the case where # > 0. This is
proved as follows. Write F for the (right-continuous) filtration generated by W . For
fixed starting point x 2 R, consider the .P , F /-martingale
Z t
# t
Y D .Y t / t0 , Y t :D e Xt D x C e #s d Ws , t  0 .
0
Rt
Then E ..Y t  x/2 / D 0 e 2#s ds and thus
 
sup E Y t2 < 1
t0

in the case where # > 0 : hence Y t converges as t ! 1 P -almost surely and in


L2 ., A, P /, and the limit is
Z 1  1 
Y1 D x C e #s d Ws N x , .
0 2#
But almost sure convergence of trajectories t ! Y t .!/ signifies that for P -almost all
! 2 ,

. / X t .!/ Y1 .!/ e # t as t !1.

Asymptotics . / can be transformed into a strong law of large numbers for some few
additive functionals of X , in particular
Z t Z t
1 2# t
Xs2 .!/ ds Y1 2
.!/ e 2 # s ds Y1
2
.!/ e as t ! 1
0 0 2#
for P -almost all ! 2 . 

The following tool will be used several times in this chapter.


Section 8.1 Ornstein-Uhlenbeck Model 215

8.2’ Lemma. On some space ., A, F , P / with right-continuous filtration F , consider


a continuous .P , F /-semi-martingale X admitting a decomposition
X D X0 C M C A
under P , where A is continuous F -adapted with paths locally of bounded variation
starting in A0 D 0, and M a continuous local .P , F )-martingale starting in M0 D 0.
Let P 0 denote a second probability measure on ., A, F / such that X is a continuous
.P 0 , F /-semi-martingale with representation
X D X0 C M 0 C A0
under P 0 . Let H be an F -adapted process with left-continuous paths, locally bounded
loc
under both P and P 0 . Then we have the following: under the assumption RP P0
relative to F , any determination .t , !/ ! I.t , !/ of the stochastic
R integral Hs dXs
under P is also a determination of the stochastic integral Hs dXs under P 0 .

Proof. (1) For the process H , consider some localising sequence .n /n under P , and
some localising sequence T .n0 /n under P 0 . For
T finite time horizon N < 1 we have
PN 0 0
PN , hence events n ¹n  N º and n ¹n  N º in FN are null sets under
both probability measures P and P 0 . This holds for all N , thus .n ^n0 /n is a common
localising sequence under both P and P 0 .
(2) Localising further, we may assume that H as well as M , hM iP , jjjAjjj and M 0 ,
hM 0 iP 0 , jjjA0 jjj are bounded (we write jjjAjjj for the total variation process of A, and
hM iP for the angle bracket under P ). R
(3) Fix a version .t , !/ ! J.t , !/ of the stochastic integral
R Hs dXs under P , and
a version .t , !/ ! J 0 .t , !/ of the stochastic integral Hs dXs under P 0 . For n  1,
define processes .t , !/ ! I .n/ .t , !/
X1   Z t
.n/
It D H kn X t^ kC1  X t^ kn D Hs.n/ dXs ,
2 2n 2 0
kD0
1
X
H .n/ :D H kn 1 k , kC1

2 2n 2n
kD0

to be considered under both P and P 0 . Using [98, Thm. 18.4] with respect to the
martingale parts of X , select a subsequence .n` /` from N such that simultaneously as
`!1
8
ˆ
ˆ for P -almost all !: the paths I .n` / ., !/ converge uniformly on Œ0, 1/
<
to the path J., !/
ˆ for P 0 -almost all !: the paths I .n` / ., !/ converge uniformly on Œ0, 1/

to the path J 0 ., !/ .
Since PN PN0 for N < 1, there is an event AN 2 FN of full measure under both
P and P such that ! 2 AN implies J., !/ D J 0 ., !/ on Œ0, N . As a consequence,
0
216 Chapter 8 Some Stochastic Process Examples

J and J 0 are indistinguishable processes for the probability measure P as well as for
the probability measure P 0 . 

We turn to statistical models defined by observing a trajectory of an Ornstein–


Uhlenbeck process continuously over a long time interval, under unknown parameter
# 2 R.

8.3 Statistical Model. Consider the Ornstein–Uhlenbeck equation (8.1). Write ‚ :D


R. Fix a starting point x D x0 which does not depend on # 2 ‚. Let Q# denote the
law of the solution to equation (8.1) under #, on the canonical path space .C , C , G/
or .D, D, G/. Then as in Theorem 6.10 or Example 6.11, all laws Q# are locally
equivalent relative to G, and the density process of Q# with respect to Q0 relative to
G is ² Z t Z ³
#=0 .0/ 1 2 t 2
L t D exp # s d ms  # s ds , t  0
0 2 0

with m.0/ the Q0 -local martingale part of the canonical process  under Q0 .
(1) Fix a determination .t , !/ ! Y .t , !/ of the stochastic integral
Z Z Z
Y D s ds D s d m.0/ s D  0 m.0/
C m.0/
s d ms
.0/
under Q0 :

as in Example 6.11, using L.Œ  0 , m.0/ j Q0 / D L .B, B/ , there is an explicit


representation
1 2 
Yt D  t  20  t , t  0 .
2
As a consequence, in statistical experiments E t corresponding to observation of the
canonical process  up to time 0 < t < 1, the likelihood function
² Z t ³
#=0 1
‚ 3 # ! L t D exp # Y t  # 2 2s ds 2 .0, 1/
2 0

and the maximum likelihood (ML) estimator


 
Yt 1
2t  20  t
b
# t :D R t D 2
Rt
0 2s ds 2
0 s ds

are expressed without reference to any particular probability measure.


(2) Simultaneously for all # 2 ‚, .t , !/ ! Y .t , !/ provides a common determi-
nation for
Z t  Z t Z t 
s ds D s d m.#/
s C # 2
s ds under Q#
0 t0 0 0 t0

by Lemma 8.2’:  is a continuousR semi-martingale under both Q0 and Q# , andRany de-


termination .t , !/ ! Y .t , !/ of s ds under Q0 is also a determination of s ds
Section 8.1 Ornstein-Uhlenbeck Model 217

Rt
under Q# . Obviously  has Q# -martingale part m.#/ t D  t  0  # 0 s ds . There
are two statistical consequences:
(i) for every every 0 < t < 1, ML estimation errors under # 2 ‚ take the form of
the ratio of a Q# -martingale divided by its angle bracket under Q# :
Rt .#/
b s d ms
#t  # D 0
Rt under Q# ;
2
0 s ds

(ii) for the density process L=# of Q with respect to Q# relative to G, the repre-
sentation
² Z t ³
1 2
L=0
t =L #=0
t D exp .  #/ Y t  .  # 2
/ 2
s ds , t 0
2 0

coincides with the representation from Theorem 6.10 applied to Q and Q# :


² Z t Z t ³
=# .#/ 1
.C/ L t D exp .  #/ s d ms  .  #/ 2 2
s ds .
0 2 0

(3) The representation (+) allows to reparameterise the model E t with respect to
fixed reference points # 2 ‚, and makes quadratic models appear around #. We will
call
Z t  Z t  Z 
.#/ .#/
s d ms and 2
s ds D  dm under Q#
0 t0 0 t0

score martingale at # and information process at #. It is obvious from 8.2 that the
law of the observed information depends on #. Hence reparameterising as in (+)
with respect to different reference points # or # 0 makes statistically different models
appear. 

The last part of Model 8.3 allows to introduce local models at # 2 R.

8.3’ Local Models at #. (1) Localising around a fixed reference point # 2 R, write
Qn for Q restricted to Gn . With suitable choice ın .#/ of local scale to be specified
below, consider local models
 ® n ¯
E#,n D C , Gn , Q#Cı n .#/ h : h 2 R , n1

when n tends to 1. Then from (+), the log-likelihoods in E#,n are

ƒh=0
#,n
D log Ln.#Cın .#/h/=#
.˘/  Z n  Z n
# 1 2 2
D h ın .#/ s d ms  h ın .#/ 2s ds , h2R.
0 2 0
218 Chapter 8 Some Stochastic Process Examples

By .˘/ and in view of Definition 7.1, the problem of choice of local scale at # turns
out to be the problem of choice of norming constants for the score martingale: we need
weak convergence as n ! 1 of pairs
 Z n Z n 
.˘˘/ . Sn .#/ , Jn .#/ / :D ın .#/ s d m#s , ın2 .#/ 2s ds under Q#
0 0

to some pair of limiting random variables

. S.#/ , J.#/ /

which generate as in Definition 6.1 and Remark 6.1” a quadratic limit experiment.
(2) From .˘/, rescaled ML estimation errors at # take the form
  Rn
1 b ın .#/ 0 s d m#s
ın .#/ # n  # D 2 Rn under Q#
ın .#/ 0 2s ds

and thus act in the local experiment E#,n as estimators


 
.˘ ˘ ˘/ b
hn D ın1 .#/ b# n  # D Jn1 .#/ Sn .#/

for the local parameter h 2 R. 

Now we show that local asymptotic normality at # holds in the case where # < 0,
local asymptotic mixed normality in the case where # > 0, and local asymptotic
quadraticity in the case where # D 0. We also specify local scale .ın .#//n .

8.4 LAN in the Positive Recurrent Case. By 8.2(a), in the case where # < 0, the
canonical process  is positive recurrent under Q# with invariant probability # D
N . 0 , 2j#j
1
/. For the information process in step (3) of Model 8.3 we thus have the
following strong law of large numbers
Z Z
1 t 2 1
lim s ds D x 2 # .dx/ D D: ƒ Q# -almost surely.
t!1 t 0 2j#j
Correspondingly, if at stage n of the asymptotics we observe a trajectory over the time
interval Œ0, n, we take local scale ın .#/ at # such that

ın .#/ D n1=2 for all values # < 0 of the parameter.

(1) Rescaling the score martingale in step (3) of Model 8.3 in space and time we put
G n :D .G tn / t0 and
Z
  1=2
tn
M#n D M#n .t / t0 , M#n .t / :D n s d m.#/
s , t 0.
0
Section 8.1 Ornstein-Uhlenbeck Model 219

This yields a family .M#n /n of continuous .Q# , G n /-martingales with angle brackets
Z
˝ n˛ 1 tn 2 1
8t 0 : M# t D s ds ! t Dt ƒ
./ n 0 2j#j
Q# -almost surely as n ! 1 .
From Jacod and Shiryaev [64, Cor. VIII.3.24], the martingale convergence theorem –
we recall this in Appendix 9.1 below – establishes weak convergence in the Skorohod
space D of càdlàg functions Œ0, 1/ ! R to standard Brownian motion with scaling
factor ƒ1=2 :

M#n ! ƒ1=2 B (weakly in D under Q# , as n ! 1) .

From this, for the particular time t D 1,

./ M#n .1/ ! ƒ1=2 B1 (weakly in R under Q# , as n ! 1)

since projection mappings D 3 ˛ ! ˛.t / 2 R are continuous


 at every ˛ 2 C ,
cf. [64, VI.2.1]), and C has full measure under L ƒ1=2 B in the space D.

(2) Combining ./ and ./ above with .˘/ in 8.3’, log-likelihoods in the local
model En,# at # are
1=2 h/=# 1 2 ˝ n˛
ƒh=0
#,n
D log Ln.#Cn D h M#n .1/  h M# 1 , h2R
2
which gives for arbitrary bounded sequences .hn /n
8 h =0
ˆ
ˆ ƒ n D hn M#n .1/  12 h2n ƒ C oQ# .1/ as n ! 1
< #,n 
. / L M#n .1/ j Q# ! N . 0 , ƒ / as n ! 1
ˆ

with ƒ D 2j#j1
.

This establishes LAN at parameter values # < 0, cf. Definition 7.1(c), and the limit
experiment E1 .#/ is the Gaussian shift E. 2j#j
1
/ in the notation of Definition 5.2.
(3) Once LAN is established, the assertion .˘˘˘/ in 8.3’ is the coupling condition
of Theorem 7.11. From Hájek’s Convolution Theorem 7.10 and the Local Asymptotic
Minimax Theorem 7.12 we deduce the following properties for the ML estimator
sequence .b# n /n : at all parameter values # < 0, the maximum likelihood estimator
sequence is regular and efficient for the unknown parameter, and attains the local
asymptotic minimax bound. 

8.5 LAQ in the Null Recurrent Case. In the case where # D 0, the canonical process
 under Q0 is a Brownian motion with starting point x, cf. 8.2(b), and self-similarity
properties of Brownian motion turn out to be the key to local asymptotics at # D 0, as
220 Chapter 8 Some Stochastic Process Examples

pointed out by [23] or [38]. Writing B or B e for standard Brownian motion, Ito formula
and scaling properties give
Z t Z t Z t   Z t Z t 
1 2 
Bs dBs , jBs j ds , Bs ds D
2
B t , jBs j ds , 2
Bs ds
0 0 0 2 t 0 0
   Z t ˇp ˇ Z t 2 
d 1 p e 2 ˇ es ˇ p
e
D Œ t B 1  t , ˇ t B t ˇ ds , t B st ds
2 0 0
 Z 1 Z 1 Z 1 
d
D t Bs dBs , t 3=2 jBs j ds , t 2 Bs2 ds .
0 0 0

From step (3) of Model 8.3, the score martingale at # D 0 is


Z Z
s d m.0/
s D  0 m.0/
C .s  0 / d m.0/
s under Q0

where L.Œ  0 , m.0/ j Q0 / D L .B, B/ . Consequently, observing at stage n of the


asymptotics the canonical process  up to time n, the right choice of local scale is
1
ın .#/ :D at parameter value # D 0 ,
n
and rescaling of the score martingale works as follows: with G n :D .G tn / t0 ,
Z
  1 tn
M n D M n .t / t0 , M n .t / :D s d m.0/
s , t 0
n 0
where in the case where # D 0 we suppress subscript # D 0 from our notation.
Note the influence of the starting point x for equation (8.1) on the form of the score
martingale.
(A) Consider first starting point x D 0 in equation (8.1) as a special case.
(1) Applying the above scaling properties, we have exact equality in law
Z Z 
 n n
 d
./ L M , hM i j Q0 D L Bs dBs , Bs ds2
for all n  1 .

According to .˘/ in 8.3’, the log-likelihoods in the local model En,0 at # D 0 are
h=0 1 h/=0 1 2
. / ƒ0,n D log L.0Cn
n D h M n .1/  h hM n i1 , h2R
2
for every n  1. Thus, as a statistical experiment, local experiments En,0 D ¹Q0C n
1 :
nh
h 2 Rº at # D 0 coincide for all n  1 with the experiment E1 D ¹Qh : h 2 1

Rº where an Ornstein–Uhlenbeck trajectory (having initial point x D 0) is observed


over the time interval Œ0, 1. Thus self-similarity creates a particular LAQ situation for
which approximating local experiments and limit experiment coincide. We have seen
in Example 6.11 that the limit experiment E1 is not mixed normal.
Section 8.1 Ornstein-Uhlenbeck Model 221

(2) We look to ML estimation in the special case of starting value x D 0 for equation
(8.1). Combining the representation of rescaled ML estimation errors .˘˘˘/ in 8.3’
with ./ in step (1), the above scaling properties allow for equality in law which does
not depend on n  1
    
1  
.ı/ L n b #n  0 C h j QŒ0C n1 h D L b # 1  h j Qh
n
at every value h 2 R. To see this, write for functions f 2 Cb .R/
    
b 1
EQŒ0C 1 h f n # n  0 C h
n n
  
D EQŒ0C 1 h f b hn  h
n
 Œ0C 1 h=0  
D EQ0 Ln n f bhn  h
 ² ³  n 
n 1 2 n M .1/
D EQ0 exp h M .1/  h hM i1 f h
2 hM n i1
which by ./ above is free of n  1. We can rephrase .ı/ as follows: observing over
longer time intervals, we do not gain anything except scaling factors.
(B) Now we consider the general case of starting values 0  x ¤ 0 for equation
(8.1). In this case, by the above decomposition of the score martingale, M n under Q0
is of type
Z Z
1 tn x 1 tn
.x C Bs / dBs D B tn C Bs dBs , t  0
n 0 n n 0
for standard Brownian motion B. Decomposing M n in this sense, we can control
ˇ Z sn ˇ
ˇ n 1 ˇ
sup ˇˇM .s/  .  0 /v d mv ˇˇ under Q0
.0/
0st n 0
x
in the same way as p sup
n 0st
jBs j , for arbitrary n and t , and
ˇ Z sn ˇ
ˇ n 1 ˇ
ˇ
sup ˇhM i .s/  2 .  0 /v dv ˇˇ under Q0
2
0st n 0
R
in the same way as xnt C 2pjxj
2 t
n 0
jBs j ds , where we use the scaling property stated
at the start.
(1) In the general situation (B), the previous equality in law ./ of score and in-
formation is replaced by weak convergence of the pair (score martingale, information
process) under the parameter value # D 0
Z Z 
 
L M n , hM n i j Q0 ! L Bs dBs , Bs2 ds
./
weakly in D.R2 / as n ! 1 .
222 Chapter 8 Some Stochastic Process Examples

Let us write e
E 1 for the limit experiment which appears in (A.1), in order to avoid
confusion about the different starting values. In our case (B) of starting value x ¤ 0
for equation (8.1), we combine . / for the likelihood ratios in the local models En,0
at # D 0 (which is .˘/ in 8.3’)

h=0 Œ0C n1 h=0 1 2


ƒ0,n D log Ln D h M n .1/  h hM n i1 , h2R
2

with weak convergence ./ to establish LAQ at # D 0 with limit experiment e E 1.


(2) Given LAQ at # D 0 with limit experiment e E 1 , write Qe h for the laws in e
E 1.
Then Corollary 7.7 shows for ML estimation
.ıı/ ˇ       ˇ
ˇ ˇ
sup ˇ EQŒ0C 1 h ` n .b# n  Œ0 C n1 h/  Ee Qh ` b
# 1  h ˇ ! 0
jhjC n

for arbitrary loss functions `./ which are continuous and bounded and for arbitrary
constants C < 1. In case (B) of starting value x ¤ 0 for equation (8.1), .ıı/ replaces
equality of laws .ı/ which holds in case (A) above. Recall that .ıı/ merely states the
following: the ML estimator b hn for the local parameter h in the local model En,# at
# D 0 works approximately as well as the ML estimator in the limit model e E 1 . In
particular, .ıı/ is not an optimality criterion. 

8.6 LAMN in the Transient Case. By 8.2(c), in the case where # > 0, the canoni-
cal process  under Q# is transient, and we have the following asymptotics for the
information process when # > 0:
Z t
1
e 2 # t 2s ds ! Y12
.#/ Q# -almost surely as t ! 1 .
0 2#

Here Y1 .#/ is the G1 -measurable limit variable


 1 
Y1 D Y1 .#/ N x, ,
2#
for the martingale Y of 8.2(c), with x the starting point for equation (8.1). Observing
at stage n of the asymptotics a trajectory of  up to time n, we have to chose local
scale as
ın .#/ D e # n at # > 0 .
Thus in the transient case, local scale depends on the value of the parameter. Space-
C
 as follows. With notation f D f _ 0
time scaling of the score martingale is done
for the positive part of f , we put G :D G.nClog.t//C t0 and
n

Z .nClog.t//C
  # n
M#n D M#n .t / t0 , M#n .t / :D e s d m.#/
s , t 0
0
Section 8.1 Ornstein-Uhlenbeck Model 223

for n  1. Then angle brackets of M#n under Q# satisfy as n ! 1


Z .nClog.t//C
˝ n˛ 1 2#
M# t D e 2 # n 2s ds ! 2
Y1 .#/ t Q# -almost surely
0 2#
for every 0 < t < 1 fixed. Writing
1 2#
2
'# .t / :D Y1 .#/ t , t 0,
2#
we have a collection of G1 -measurable random variables with the properties
´
'# .0/  0, t ! '# .t / is continuous and strictly increasing, lim '# .t / D C1;
˝ ˛ t!1
for every 0 < t < 1: M#n t ! '# .t / Q# -almost surely as n ! 1.

All M#n being continuous .Q# , G n /-martingales, a martingale convergence theorem


(from Jacod and Shiryaev [64, VIII.5.7 and VIII.5.42]) which we recall in Theorem 9.2
in the Appendix (the nesting condition there is satisfied for our choice of the Gn ) yields

M#n ! B ı '# (weak convergence in D, under Q# , as n ! 1)

where standard Brownian motion B is independent from '# . Again by continuity of


all M#n , we also have weak convergence of pairs under Q# as n ! 1
 n ˝ n ˛
M# , M# ! . B ı '# , '# /

(cf. [64, VI.6.1]); we recall this in Theorem 9.3 in the Appendix) in the Skorohood
space D.R2 / of càdlàg functions Œ0, 1/ ! R2 . By continuity of projection mappings
on a subset of D.R2 / of full measure, we end up with weak convergence
 n ˝ ˛ 
M# .1/ , M#n 1 ! . B.'# .1// , '# .1/ /
./
(weakly in R2 , under Q# , as n ! 1) .

(1) Log-likelihood ratios in local models E#,n at # are


# n h=# 1 2 ˝ n˛
ƒh=0 Œ#Ce
0,n D log Ln D h M#n .1/  h M# 1 , h2R
2
according to .˘/ in 8.3’. Combining this with weak convergence ./, we have estab-
lished LAMN at parameter values # > 0; the limit model E1 .#/ is Brownian motion
with unknown drift observed up to the independent random time '# .1/ as above. This
type of limit experiment was studied in 6.16.
(2) According to step (2) in 8.3’, rescaled ML estimator errors at #
  M n .1/
e# n b # n  # D ˝ # n ˛ D Zn .#/ under Q# , for all n  1
M# 1
224 Chapter 8 Some Stochastic Process Examples

coincide with the central sequence at # and converge to the limit law
Z  
B.'# .1// 1
. / L .'# .1// .du/ N 0 , .
'# .1// u
By local asymptotic mixed normality according to step (1), Theorems 7.11, 7.10
and 7.12 apply and show the following: at all parameter values # > 0, the ML
estimator sequence .b
# n /n is regular and efficient in the sense of Jeganathan’s version
of the Convolution Theorem 7.10, and attains the local asymptotic minimax bound of
Theorem 7.12. 

8.6’ Remark. Under assumptions and notations of 8.6, we comment on the limit law
arising in . /. With x the starting point for equation (8.1), recall from 8.2(c) and the
 1 
start of 8.6
1
Y1 .#/ N x, , '# .1/ D Y1 2
.#/
2# 2#
which gives (use e.g. [4, Sect. VII.1])
8  
< 1
2 , 2#
2 in the case where x D 0
L .'# .1// D  p 
: 1 3 2
in the case where x ¤ 0
2 , x 2# , 2#

where notation .a, , p/ is used for decentral Gamma laws (a > 0,  > 0, p > 0)

X1
e  k
.a, , p/ D .aCm, p/ .
mD0

In the case where  D 0 this reduces to the usual .a, p/. It is easy to see that variance
mixtures of type
Z  
1
.a, p/.du/ N 0 , where 0 < a < 1
u
do not admit finite second moments. .a, p/ is the first contribution (summand m D
0) to .a, , p/. Thus the limit law . / which is best concentrated in the sense of
Jeganathan’s Convolution Theorem 7.10 and in the sense of the Local Asymptotic
Minimax Theorem 7.12 in the transient case # > 0
Z   Z    
1 1 1
L .'# .1// .du/ N 0 , D L Y1 .#/ 2
.du/ N 0 ,
u 2# u
is of infinite variance, for all choices of a starting point x 2 R for SDE (8.1). Recall in
this context Remark 6.6”: optimality criteria in mixed normal models are conditional
on the observed information, never in terms of moments of the laws of rescaled
estimation errors. 
Section 8.1 Ornstein-Uhlenbeck Model 225

Let us resume the tableau 8.4, 8.5 and 8.6 for convergence of local models when
we observe an Ornstein–Uhlenbeck trajectory under unknown parameter over a long
time interval: optimality results are available in restriction to submodels where the
process either is positive recurrent or is transient; except in the positive recurrent case,
the rates of convergence and the limit experiments are different at different values of
the unknown parameter.

For practical purposes, one might wish to have limit distributions of homogeneous
and easily tractable structure which hold over the full range of parameter values. De-
pending on the statistical model, one may try either random norming of estimation
errors or sequential observation schemes.

8.6” Exercise
Rt (Random norming). In the Ornstein–Uhlenbeck model, the information process
t ! 0 2s ds is observable, its definition does not involve the unknown parameter. Thus we
may consider random norming for ML estimation errors using the observed information.
(a) Consider first the positive recurrent cases # < 0 in 8.4 and the transient cases # > 0 in
8.6. Using the structure of the limit laws for the pairs
 n ˝ ˛ 
M# .1/ , M#n 1 under Q# as n ! 1

from ./+./ in 8.4 and ./ in 8.6 combined with the representation
Rn
s d m.#/
b
#n  # D Rn
0 s
, n2N, under Q#
 2 ds
0 s

of ML estimation errors we can write under Q#


sZ
n   M n .1/
2s ds b# n  # D ˝ # ˛  1 ! N .0, 1/ , n!1
0 M#n 1 2

to get a unified result which covers the cases # ¤ 0.


(b) However, random norming as in the last line is not helpful when # D 0. This has been
noted by Feigin [23]. Using the scaling properties in 8.5, we find in the case where # D 0 that
the weak limit of the laws
sZ !
n  
L 2s ds b # n  0 j Q0 as n ! 1
0

1
.B 2 1/
is the law L. .R 12 B 21ds/1=2 /. Feigin notes simply that this law lacks symmetry around 0
0 s

!
1
.B 2  1/   1
P R21 1  0 D P B12  1 D P .B1 2 Œ1, 1/ 0.68 ¤
. 0 Bs2 ds/1=2 2

and thus cannot be a normal law. Hence there is no unified result extending (a) to cover all
cases # 2 R. 
226 Chapter 8 Some Stochastic Process Examples

In our model, the information process can be calculated from the observation
without knowing the unknown parameter, cf. step (3) in Model 8.3. This allows to
define a time change and thus a sequential observation scheme by stopping when the
observed information hits a prescribed level.

8.7 LAN at all Points # 2 ‚ by Transformation R 2 of Time. For the Ornstein–


Uhlenbeck Model 8.3 with information process s ds, define for every n 2 N a
random time change
² Z t ³
 .n, u/ :D inf t > 0 : s ds > u n , u  0 .
2
0

Define score martingale and information process scaled and time-changed by u !


 .n, u/:
Z .n,u/
1  
f n
M # .u/ :D p s d m.#/ Ge n :D G.n,u/
s , u0
,
n 0
Z
˝ n˛ 1 .n,u/ 2
f
M# u D s ds D u .
n 0

 n  theorem (cf.n [61, p. 74]) shows for all n 2 N and for


Then P. Lévy’s characterisation
f
all # 2 ‚ that M # n D M f .u/ e , Q# /-standard Brownian motion.
is a .G
# u0
(1) If at stage n of the asymptotics we observe a trajectory of the canonical process
 up to p the random time  .n, 1/, and write eE n,# for the local model at # with local
scale 1= n:
 ® ¯
e
E n,# D C , G.n,1/ , Q#Cn1=2 h j G.n,1/ : h 2 R ,

then log-likelihoods in the local model at # are


˝ n˛
fn .1/  1 h2 M 1 2
1=2
e h=0 :D log L.#Cn h/=# D h M f fn
ƒ #,n .n,1/ # # 1 D h M # .1/  h , h2R
2 2
where we have for all n 2 N and all # 2 ‚
 n 
L Mf .1/ j Q# D N . 0 , 1 / .
#

According to Definition 7.1(c), this is LAN at all parameter values # 2 R, and even
more than that: not only for all values of the parameter # 2 ‚ are limit experiments
e
E 1 .#/ given by the same Gaussian shift E.1/, in the notation of Definition 5.2, but
also all local experiments e
E n,# at all levels n  1 of the asymptotics coincide with
E.1/. Writing

e 2.n,1/  20   .n, 1/ 2.n,1/  20   .n, 1/


b b
# n D # .n,1/ D R .n,1/ D
2 0 2s ds 2n

Section 8.2 A Null Recurrent Diffusion Model 227

for the maximum likelihood estimator when we observe up to the stopping time
 .n, 1/, cf. step (1) of Model 8.3, we have at all parameter values # 2 ‚ the following
properties: the ML sequence is regular and efficient for the unknown parameter at #
(Theorems 7.11 and 7.10), and attains the local asymptotic minimax bound at # (Re-
mark 7.13). This sequential observation scheme allows for a unified treatment over
the whole range of parameter values # 2 R (in fact, everything thus reduces to an
elementary normal distribution model ¹N .h, 1/ : h 2 Rº). 

8.7’ Exercise. We compare the observation schemes used in 8.7 and in 8.6 in the transient case
# > 0. Consider only the starting point x D 0 for equation (8.1). Define
1
./ # .u/ :D inf ¹ t > 0 : '# .t / > u º D . u Œ'# .1/1 / 2#

for 0 < u < 1, with # .0/  0, and recall from 8.6 that
1 1 
'# .1/ D Y12
.#/ , 2# 2 .
2# 2
(a) Deduce from ./ that for all parameter values # > 0

E .j log # .u/j/ < 1 .

(b) Consider in the case where # > 0 the time change u !  .m, u/ in 8.7 and prove that
for 0 < u < 1 fixed,
1
 .m, u/  log.m/ under Q#
2#
converges weakly as m ! 1 to
log.# .u// .
˝ n˛
Hint: Observe that the definition of M# t in 8.6 and  .n, u/ in 8.7 allows us to replace n 2 N
by arbitrary  2 .0, 1/. Using this, we can write
˝ ˛ 
P . # .u/ > t / D P . '# .t / < u / D lim Q# M#n t < u
n!1
 
D lim Q#  .e 2# n , u/ > Œn C log.t /C
n!1

for fixed values of u and t in .0, 1/, where the last limit is equal to
 
1  1 
lim Q#  .m, u/ > log.m/ C log.t / D lim Q# e .m,u/ 2# log.m/ > t . 
m!1 2# m!1


8.2 A Null Recurrent Diffusion Model
We discuss a statistical model where the diffusion process under observation is re-
current null for all values of the parameter. Our presentation follows Höpfner and
Kutoyants [54] and makes use of the limit theorems in Höpfner and Löcherbach [53]
and of a result by Khasminskii [73].
228 Chapter 8 Some Stochastic Process Examples

In this section, we have a probability space ., A, P / carrying a Brownian motion


W ; for some constant  > 0 we consider the unique strong solution to equation
Xt
.8.8/ dX t D # dt C  d W t , X0  x
1 C X t2
depending on a parameter # which ranges over the parameter space
 
1 1
.8.80 / ‚ :D   2 ,  2 .
2 2
We shall refer frequently to the Appendix (Chapter 9).

8.9 Long Time Behaviour of the Process. Under # 2 ‚ where the parameter space
‚ is defined by equation (8.8’), the process X in equation (8.8) is recurrent null in the
sense of Harris with invariant measure
1 p 2#
.8.90 /
2
m.dx/ D 2 1 C y 2  dx , x 2 R .

v
We prove this as follows. Fix # 2 ‚, write b.v/ D # 1Cv 2 for the drift coefficient in
equation (8.8), and consider the mapping S : R ! R defined by
Z x  Z y 
2b
S.x/ :D s.y/ dy where s.y/ :D exp  2
.v/ dv , x, y 2 R
0 0 
Ry
as in Proposition 9.12 in the Appendix; we have 0 2b2 .v/ dv D #2 ln.1 C y 2 / and
thus
  # p  2#2
s.y/ D 1 C y 2  2 D 1 C y 2  ,
1 2#
S.x/ sign.x/ jxj1  2 as x ! ˙1 .
1 2#
2

Since j 2#2 j < 1 by equation (8.8’), the function S./ is a bijection onto R: thus Propo-
sition 9.12 shows that X under # is Harris with invariant measure
1 1
m.dx/ D 2 dx on .R, B.R//
 s.x/
which gives equation (8.9’). We have null recurrence since m has infinite total
mass. 

We remark that ‚ defined by equation (8.8’) is the maximal open interval in R such
that null recurrence holds for all parameter values. For the next result, recall from
Remark 6.17 the definition of a Mittag–Leffler process V .˛/ of index 0 < ˛ < 1: to
the stable increasing process S .˛/ of index 0 < ˛ < 1, the process with independent
and stationary increments having Laplace transforms
 .˛/ .˛/

E e  .S t2 S t1 / D e .t2 t1 /  ,   0 , 0  t1 < t2 < 1
˛

Section 8.2 A Null Recurrent Diffusion Model 229

and starting from S0.˛/  0, we associate the process of level crossing times
.˛/
V .˛/ . Paths of V .˛/ are continuous and non-decreasing, with V0 D 0 and
.˛/
lim t!1 V t D 1. Part (a) of the next result is a consequence of two results due
to Khasminskii [73] which we recall in Proposition 9.14 of the Appendix. Based on
(a), part (b) then is a well-known and classical statement, see Feller [24, p. 448] or
Bingham, Goldie and Teugels [12, p. 349], on domains of attraction of one-sided
stable laws. For regularly varying functions see [12]. X being a one-dimensional
process with continuous trajectories, there are many possibilities to define a sequence
of renewal times .Rn /n1 which decompose the trajectory of X into i.i.d. excursions
.X 1ŒŒRn ,RnC1  /n1 away from 0; a particular choice is considered below.

8.10 Regular Variation. Fix # 2 ‚, consider the function S./ and the measure
m.dx/ of 8.9 (both depending on #), and define
 
0/ 1 2#
.8.10 ˛ D ˛.#/ D 1  2 2 .0, 1/ .
2 
(a) The sequence of renewal times .Rn /n1 defined by

Rn :D inf¹t > Sn : X t < 0º , Sn :D inf¹t > Rn1 : X t > S 1 .1/º ,


n  1 , R0  0

has the following properties (i) and (ii) under #:


 
1 ˛ 4
.i/ P .RnC1  Rn > t / t ˛ as t ! 1 ,
2 2 .˛/

Z !
RnC1
.ii/ E f .Xs / ds D 2 m.f / , f 2 L1 .m/ .
Rn

(b) For any norming function a./ with the property


1
.8.1000 / a.t / as t !1
.1  ˛/ P .R2  R1 > t /
and for any function f 2 L1 .m/ we have weak convergence
Z n
1
f .Xs / ds ! 2 m.f / V .˛/
.8.10000 / a.n/ 0
(weakly in D, under #, as n ! 1)

where V .˛/ is the Mittag–Leffler process of index ˛, 0 < ˛ < 1 . In particular, the
norming function in equation (8.10”) varies regularly at 1 with index ˛ D ˛.#/, and
all objects above depend on # 2 ‚.
230 Chapter 8 Some Stochastic Process Examples

Proof. (1) From Khasminskii’s results [73] which we recall in Proposition 9.14 of the
Appendix, we deduce the assertions (a.i) and (a.ii):
Fix # 2 ‚. We have the functions S./, s./ defined in 8.9 which depend on #.
S./ is a bijection onto R. According to step (3) in the proof of Proposition 9.12, the
process Xe :D S.X /

et D e
dX e t / d Wt
 .X  D .s   / ı S 1
where e

is a diffusion without drift, is Harris recurrent, and has invariant measure


1 1 1
e.de
m x/ D 2 dexD 2 1
dex on .R, B.R// .
e
 .ex/  Œs ı S 2 .e x/

Write for short  D 2#2 : by choice of the parameter space ‚ in equation (8.8’), 
belongs to .1, 1/. From 8.9 we have the following asymptotics:

s.y/ jyj , y ! ˙1
1
S.x/ sign.x/ jxj1 , x ! ˙1
1
1
S 1 .z/ sign.z/ . .1   /jzj / 1 , z ! ˙1

Œs ı S 1 .v/ . .1   /jvj / 1 , v ! ˙1 .
By the properties of S./, the sequence .Rn /n1 defined in (a) can be written in the
form
e t < 0º , Sn :D inf¹t > Rn1 : X
Rn :D inf¹t > Sn : X e t > 1º ,
.C/
n  1, R0  0

e D S.X /. As a consequence of (+), the stopping times .Rn /n1


with respect to X
induce the particular decomposition of trajectories of X e into i.i.d. excursions
e 1ŒŒR ,R  /n1 away from 0 to which Proposition 9.14 applies. With respect to
.X n nC1
.Rn /n1 and Xe we have

2 2 1 2 2 2 2 2
D 2 1
.v/ . .1   /jvj / 1 D .1   / 1 jvj 1 ,
e2
 .v/  Œs ı S  2  2  2

v ! ˙1

which shows that the condition in Proposition 9.14(b) is satisfied:


Z
1 x ˇ 2
lim jvj dv D A˙ ,
x!˙1 x 0 e
 2 .v/
2 2 2
ˇ :D > 1 , AC D A :D .1   / 1 .
1  2

Section 8.2 A Null Recurrent Diffusion Model 231

Recall that  , ˇ and A˙ depend on #. With ˛ D ˛.#/ defined as in equation (8.10’)


 
1 1 1 2#
˛ :D D .1   / D 1  2 2 .0, 1/
ˇC2 2 2 
we have ˇ D 12˛
˛ and rewrite the constants A˙ in terms of ˛ D ˛.#/ as
2 2 12˛
.1   /ˇ D 2 .2˛/ ˛ .
A˙ D
2
 
From this, Proposition 9.14(b) gives us
˛ 2˛ .ŒAC ˛ C ŒA ˛ / ˛  1 ˛ 4
P .RnC1  Rn > t / t D t ˛
.1 C ˛/ 2 2 .˛/
as t ! 1. This proves part (a.i) of the assertion. Next, we apply Proposition 9.14(a)
e the i.i.d. excursions defined by .Rn /n1 correspond to the norming
to the process X:
constant  Z RnC1 
E fe.Xe s / ds D 2 m e/ , f
e .f e  0 measurable
Rn
e :D f ı S 1 and apply formula .ıı/ in the
e. If we write f
for the invariant measure m
proof of Proposition 9.12  
m.f / D me f ı S 1
we obtain
Z RnC1 
E f .Xs / ds D 2 m.f / , f  0 measurable
Rn

which proves part (a.ii) of the assertion. Thus, 8.10(a) is now proved.
(2) We prove 8.10(b). Under # 2 ‚, for ˛ given by equation (8.10’) and for the re-
newal times .Rn /n1 considered in (a), select a strictly increasing continuous norming
function a./ such that
1
a.t / as t ! 1 .
.1  ˛/ P .R2  R1 > t /
In particular, a./ varies regularly at 1 with index ˛. Fix an asymptotic inverse b./ to
a./ which is strictly increasing and continuous. All this depends on #, and (a.i) above
implies
 1
1 4 .1  ˛/ ˛ 1
.8.11/ b.t / t ˛ as t ! 1 .
2 2 .˛/
From a.b.n// n as n ! 1, a well-known result on convergence of sums of i.i.d.
variables to one-sided stable laws (cf. [24, p. 448], [12, p. 349]) gives weak conver-
gence
Rn .˛/
! S1 (weakly in R, under #, as n ! 1)
b.n/
232 Chapter 8 Some Stochastic Process Examples

where S1.˛/ follows the one-sided stable law on .0, 1/ with Laplace transform  !
exp.˛ /,   0. Regular variation of b./ at 1 with index ˛1 implies that b.t n/
1
t ˛ b.n/ as n ! 1. Thus, scaling properties and independence of increments in the
one-sided stable process S ˛ of index 0 < ˛ < 1 show that the last convergence
extends to finite-dimensional convergence
RŒn f .d .
.8.110 / ! S .˛/ as n ! 1 .
b.n/
Associate a counting process
® ¯
N D .N t / t0 , N t :D max j 2 N : Rj  t ,

to the renewal times .Rn /n and write for arbitrary 0 < t1 <    < tm < 1, Ai 2
B.R/ and xi > 0
   
N ti b.n/ Œxi n RŒxi n
P < ,1i m DP > ti , 1  i  m .
n n b.n/

Then .8.110 / gives finite dimensional convergence under # as n ! 1


Nb.n/ f .d .
! V .˛/
n
to the process inverse V .˛/ of S .˛/ . On the left-hand side of the last convergence,
we may replace n by a.n/ and make use of b.a.n// n as n ! 1. Then the last
convergence takes the form
Nn f .d .
.8.1100 / ! V .˛/ as n ! 1 .
a.n/
By (a.ii) we have for functions f  0 belonging to L1 .m/
 Z RnC1 
E f .Xs / ds D 2 m.f / .
Rn

The counting process N increasing by 1 on Rn , RnC1 , we have almost sure con-
vergence under #
Z t Z
1 1 Rn
lim f .Xs / ds D lim f .Xs / ds D 2 m.f /
t!1 N t 0 n!1 n 0

from the classical strong law of large numbers with respect to the i.i.d. excursions
X 1ŒŒRj ,Rj C1  , j  1. Together with .8.1100 / we arrive at
Z n
1 f .d .
f .Xs / ds ! 2 m.f / V .˛/ as n ! 1
a.n/ 0

Section 8.2 A Null Recurrent Diffusion Model 233

for functions f  0 belonging to L1 .m/. All processes in the last convergence are
increasing processes, and the limit process is continuous. In this case, according
to Jacod and Shiryaev (1987, VI.3.37), finite dimensional convergence and weak
convergence in D are equivalent, and part (b) of 8.10 is proved. 

We turn to statistical models defined by observation of a trajectory of the process


(8.8) under unknown # 2 ‚ over a long time interval, with parameter space ‚
defined by equation (8.8’).

8.12 Statistical Model. For ‚ given by equation (8.8’) and for some starting point
x0 2 R which does not depend on # 2 ‚, let Q# denote the law of the solution to
(8.8) under #, on the canonical path space .C , C , G/ or .D, D, G/. Applying Theorem
6.10, all laws Q# are locally equivalent relative to G, and the density process of Q#
with respect to Q0 relative to G is
² Z t Z ³
#=0 1 2 t 2
L t D exp # .s / d m.0/
s  #  . s /  2
ds , t 0
0 2 0

with m.0/ the Q0 -local martingale part of the canonical process  under Q0 , and
1 x
.8.120 / .x/ :D , x2R.
 1 C x2
2

(1) Fix a determination .t , !/ ! Y .t , !/ of the stochastic integral


Z Z
.s / ds D .s / d m.0/ s under Q0

where L.  0 jQ0 / D L.m.0/ jQ0 / D L.B/. Thus, in statistical experiments E t


corresponding to observation of the canonical process  up to time 0 < t < 1, both
likelihood function
² Z t ³
#=0 1
‚ 3 # ! L t D exp # Y t  # 2  2 .s /  2 ds 2 .0, 1/
2 0

and maximum likelihood (ML) estimator

b Yt
# t :D R t
0  2 . s/ 
2 ds

are expressed without reference to a particular probability measure.


(2) Simultaneously for all # 2 ‚, .t , !/ ! Y .t , !/ provides a common determi-
nation for
Z t  Z t Z t 
.s / ds D .s / d m.#/
s C #  2
. s /  2
ds under Q# :
0 t0 0 0 t0
234 Chapter 8 Some Stochastic Process Examples

R
any determination .t , !/ ! Y .t , !/ of the stochastic
R integral .s / ds under Q0
is also a determination of the stochastic integral .s / ds under Q# , by Lemma
.#/ Rt
8.2’, and the Q# -martingale part of  equals m t D  t  0  # 0 .s /  2 ds .
This allows us to write ML estimation errors under # 2 ‚ in the form
Rt
b .s / d m.#/
s
# t  # D DR0 E under Q#
.#/
.s / d ms
t

and allows us to write density processes L=# of Q with respect to Q# relative to G


either as
² Z t ³
=0 #=0 1 2
L t =L t D exp .  #/ Y t  .  # / 2
 .s /  ds , t  0
2 2
2 0

or equivalently – coinciding with Theorem 6.10 applied to Q and Q# – as


² Z t Z t ³
=# 1
.C/ Lt D exp .  #/ .s / d m.#/
s  .  #/2 2 2
 .s /  ds .
0 2 0

(3) The representation (+) allows to reparameterise the model E t with respect to
fixed reference points # such that the model around # is quadratic in .  #/. We call
Z t 
.#/
.s / d ms
0 t0

and Z  Z 
t
.#/
2 2
 .s /  ds D ./ d m under Q#
0 t0

score martingale at # and information process at #. It is clear from .8.10000 / that


the law of the observed information depends on #. Hence reparameterising as in (+)
with respect to a reference point # or with respect to # 0 ¤ # yields statistical models
which are different. 

From now on we shall keep trace of the parameter # in some more notations: instead
of m as in .8.90 / we write
1 p 2#
# .dx/ D 1 C y2 2
dx , x2R
2
for the invariant measure of the canonical process  under Q# ; instead of a./ we
write a# ./ for the norming function in .8.1000 / which is regularly varying at 1 with
index  
1 2#
˛.#/ D 1  2 2 .0, 1/
2 

Section 8.2 A Null Recurrent Diffusion Model 235

as in .8.100 /. By choice of ‚ in .8.80 /, there is a one-to-one correspondence be-


tween indices 0 < ˛ < 1 and parameter values # 2 ‚. Given the representation
of log-likelihoods in Model 8.12, the following proposition allows to prove con-
vergence of local models at #, or convergence of rescaled estimation errors at
# for a broad class of estimators. It is obtained as a special case of Theorems 9.8
and 9.10 and of Corollary 9.10’ (from Höpfner and Löcherbach [53]) in the Appendix.

8.13 Proposition. For functions f 2 L2 .# / and with respect to G n D .G tn / t0 ,


consider locally square integrable local Q# -martingales
Z tn
1
M n D .M tn / t0 , M tn :D p f .s / d m.#/
s
n˛.#/ 0

with m.#/ the Q# -martingale part of the canonical process . Under Q# , we have
weak convergence
 n   
M , hM n i ! .ƒ.#//1=2 B ı V .˛.#// , ƒ.#/ V .˛.#//

in D.R R/ as n ! 1: in this limit, Brownian motion B and Mittag–Leffler process


V .˛.#// of index 0 < ˛.#/ < 1 are independent, and the constant is
 1C˛.#/ .˛.#//
.8.130 / ƒ.#/ :D 2 2 # .f 2 / .
4 .1˛.#//

Proof. (1) Fix # 2 ‚. For functions f  0 in L1 .# /, we have from 8.10(b) weak
convergence in D, under Q# , of integrable additive functionals:
Z n
1
f .s / ds ! 2 # .f / V ˛.#/ , n ! 1 .
a# .n/ 0
We rephrase this as weak convergence on D, under Q# , of
Z n
1
f .s / ds ! C.#/ # .f / V ˛.#/ , n!1
n˛.#/ 0
for a constant C.#/ which according to 8.10 is given by

1 ˛.#/ 4 .1˛.#//
1
00
.8.13 / C.#/ :D 2  .
2 2 .˛.#//
(2) Step (1) allows to apply Theorem 9.8(a) from the Appendix: we have the regular
variation condition (a.i) there with ˛ D ˛.#/, m D .#/ and `./  1=C.#/ on the
right-hand side.
(3) With the same notations, we apply Theorem 9.10 from the Appendix: for locally
square integrable local martingales M f satisfying the conditions made there, this gives
weak convergence of
1   1  
p f tn
M Dp f tn
M under Q#
t0 t0
n˛.#/ C.#/ n˛.#/=`.n/
236 Chapter 8 Some Stochastic Process Examples

in D as n ! 1 to
e
.ƒ.#// 1=2
B ı V .˛.#//
where Brownian motion B and Mittag–Leffler process V .˛.#// are independent, with
constant
˝ ˛ 
.8.13000 / e
ƒ.#/ :D E# M f .
1

(4) We put together steps (1)–(3) above. For g 2 L2 .# / consider the local
.Q# , G/-martingale M
Z t
M t :D g.s / d m.#/
s , t 0
0

where m.#/ is the Q# -local martingale part of the canonical process . Then M is a
martingale additive functional as defined in Definition 9.9 and satisfies
Z 1 
E# .hM i1 / D E# g .s /  ds D  2 # .g 2 / < 1 .
2 2
0
Thus, as a consequence of step (3),
1
M n :D p .M tn / t0
n˛.#/
under Q# converges weakly in D as n ! 1 to

.ƒ.#//1=2 B ı V .˛.#//

where combining .8.1300 / and .8.13000 / we have


 1C˛.#/ .˛.#//
e
ƒ.#/ D C.#/ ƒ.#/ D C.#/  2 # .g2 / D 2 2 # .g2 / .
4 .1˛.#//
This constant ƒ.#/ appears in .8.130 /.
(5) The martingales in step (4) are continuous. Thus Corollary 9.10’ in the Ap-
pendix extends the result of step (4) to weak convergence of martingales together
with their angle brackets, and concludes the proof of Proposition 8.13. 

With Model 8.12 and Proposition 8.13 we have all elements which we need to
prove LAMN at arbitrary reference points # 2 ‚. Recall that the parameter space
‚ is defined by .8.80 /, and that the starting point for equation (8.8) is fixed and does
not depend on #. Combining the representation of likelihoods with respect to # in
formula (+) of Model 8.12 with Proposition 8.13 we get the following:

8.14 LAMN at Every Reference Point # 2 ‚. Consider the statistical model En


which corresponds to observation of a trajectory of the solution to equation (8.8) con-
tinuously over the time interval Œ0, n, under unknown # 2 ‚.

Section 8.2 A Null Recurrent Diffusion Model 237

(a) For every # 2 ‚, we have LAMN at # with local scale


1
ı.n/ :D p for ˛.#/ 2 .0, 1/ defined by .8.100 / :
n˛.#/
for every n, log-likelihoods in the local model E#,n are quadratic in the local parameter
h=0 1 2
ƒ#,n D ƒ.#Cı
n
n .#/h/=#
D h Sn .#/  h Jn .#/ , h2R
2
and we have weak convergence as n ! 1 of score and information at #
   
. Sn .#/, Jn .#/ j Q# / ! ƒ1=2 .#/ B V1.˛.#// , ƒ.#/ V1.˛.#// D: .S , J /

with ƒ.#/ given by .8.130 /, and with Mittag–Leffler process V ˛.#/ independent from
B. For the limit experiment E.S , J / at # see Construction 6.16 and Example 6.18.
(b) For every # 2 ‚, rescaled ML estimation errors at # coincide with the cen-
tral sequence at #. Hence ML estimators are regular and efficient in the sense of Je-
ganathan’s version 7.10(b) of the convolution theorem, and attain the local asymptotic
minimax bound of Theorem 7.12.

Proof. (1) Localising around a fixed reference point # 2 R, write Qn for Q restricted
to Gn . With
 
1 1 2#
ın .#/ :D p with ˛.#/ D 1  2 2 .0, 1/
n˛.#/ 2 
according to 8.10, Model 8.12 and Proposition 8.13 we consider local models at #
 ° ±
n
E#,n D C , Gn , Q#Cı n .#/ h
: h 2 R , n1

when n tends to 1. According to (+) in step (2) of Model 8.12, log-likelihoods in E#,n
are
 Z n  Z n
1 2 2
.˘/ ƒh=0
#,n
D h ın .#/ . s / d m #
s  h ı n .#/  2 .s /  2 ds , h 2 R .
0 2 0

Now we can apply Proposition 8.13: note that for every # 2 ‚, ./ belongs to L2 .# /
since
   2# 
 2 .x/ D O x 2 as jxj ! 1 , d# .x/ D O jxj  2 dx as jxj ! 1

where j 2#2 j < 1. Proposition 8.13 yields weak convergence as n ! 1 of pairs


 Z n Z n 
1 1
. Sn .#/ , Jn .#/ / :D p .s / d m#s , ˛.#/  2 .s /  2 ds
.˘˘/ n˛.#/ 0 n 0
under Q#
238 Chapter 8 Some Stochastic Process Examples

to the pair of limiting random variables


   
.8.140 / . S.#/ , J.#/ / D ƒ1=2 .#/ B V1.˛.#// , ƒ.#/ V1.˛.#//

where Brownian motion B and Mittag–Leffler process V ˛.#/ are independent, and
.˛.#//
ƒ.#/ D .2 2 /1C˛.#/ # . 2 /
4 .1˛.#//
according to .8.130 /, with ./ from .8.120 /. We have proved in Construction 6.16 and
in Example 6.18 that the pair of random variables .8.140 / indeed generates a mixed
normal experiment.
(2) Combining this with step (2) of Model 8.12, we see that rescaled ML estimation
errors at # coincide with the central sequence at #:
p  
.˘ ˘ ˘/ n˛.#/ b# n  # D Jn1 .#/ Sn .#/ D Zn .#/ , n  1 .

In particular, the coupling condition of Theorem 7.11 holds. By LAMN at #, by the


Convolution Theorem 7.10 and the Local Asymptotic Minimax Theorem 7.12, we
see that the ML estimator sequence is regular and efficient at #, and attains the local
asymptotic minimax bound as in Theorem 7.12(b). 

8.15 Remark. According to .8.140 / and to Remark 6.6’, the limit law for .˘ ˘ ˘/ as
n ! 1 has the form
Z  
 ˛.#/  1
L .Z.#/jP0 / D L ƒ.#/ V1 .du/ N 0 , .
u
As mentioned in Example 6.18, this law – the best concentrated limit distribution for
rescaled estimation errors at #, in the sense of the Convolution Theorem 6.6 or of
the Local Asymptotic Minimax Theorem 6.8 – does not admit finite second moments
(see Exercise 6.180 ). 

8.16 Remark. Our model presents different speeds of convergence at different pa-
rameter values, different limit experiments at different parameter values, but has an
information process
Z t Z t 2
1 s
 2 .s /  2 ds D 2 ds , t  0
0  0 1 C 2s
which can be calculated from the observation . t / t0 without knowing the unknown
parameter # 2 ‚. This fact allows for random norming using the observed informa-
tion: directly from LAMN at # in 8.14, combining .˘/, .˘˘/ and .˘ ˘ ˘/ there, we
can write sZ 
n
s 2  
ds b
# n  # ! N .0,  2 /
0 1 C 2s

Section 8.2 A Null Recurrent Diffusion Model 239

for all values of # 2 ‚. The last representation allows for practical work, e.g. to fix
confidence intervals for the unknown parameter determined from an asymptotically
efficient estimator sequence, and overcomes the handicap caused by non-finiteness of
second moments in Remark 8.15. 

We conclude the discussion of the statistical model defined by (8.8) and (8.8’) by
pointing out that one-step correction is possible, and that we may start from any prelim-
inary estimator sequence .Tnp/n for the unknown parameter in .En /n whose estimation
errors at # are tight at rate n˛.#/ as n ! 1 for every # 2 ‚, and modify .Tn /n
according to Theorem 7.19 in order to obtain a sequence of estimators .T en /n which
satisfies the coupling condition of Theorem 7.11
p  
n˛.#/ T en  # D Zn .#/ C o.Q / .1/ as n ! 1
#

at every # 2 ‚. Again there is no need for discretisation as in Proposition 7.16 since


local scale, observed information and score depend continuously on #. To do this, it
is sufficient to check the set of Conditions 7.14 for our model.

8.17 Proposition. With the above notations, the sequence of models .En /n satisfies
all assumptions stated in 7.14. For .Tn /n as above, one-step modification

en :D Tn C p 1
T
1
Sn .Tn / D b
#n
n˛.T n / Jn n/
.T

directly leads to the ML estimator b


# n.

Proof. Let Y denote a common determination R – as in step (2) of Model 8.12, by


Lemma 8.2’ – of the stochastic integral .s / ds under all laws Q , 2 ‚. We
check the set of conditions in 7.14.
First, it is obvious that the model allows for a broad collection of preliminary
estimator sequences. As an example, select A 2 B.R/ with .A/ > 0, write
RL .x/ :D .x/1A .x/, choose a common determination YL for the stochastic integral
L .s / ds under all laws Q , 2 ‚, and put

YLn
TLn :D R n , n1.
0 L 2 .s /  2 ds
p
Then Proposition 8.13 establishes weak convergence of L. n˛.#/ .TLn  #/ j Q# /
as n ! 1 under all values of the parameter # 2 ‚. Even if such estimators can be
arbitrarily bad, depending on the choice of A, they converge at the right speed at all
points of the model: this establishes Assumption 7.14(D).
240 Chapter 8 Some Stochastic Process Examples

In the local model E#,n at #


Z n  Z n
1 1
Sn .#/ D p .s / d m.#/
s D p Yn  #  2 .s /  2 ds
n˛.#/ 0 n˛.#/ 0

is a version of the score, according to 8.14 and (+) in Model 8.12, and the information
Z n
1
Jn .#/ D ˛.#/  2 .s /  2 ds
n 0

depends on # only through local scale. In particular,


!
n˛.#/
.C/ Jn . /  Jn .#/ D  1 Jn .#/ ;
n˛./
R R
writing down Sn . / and transforming .s / d m./ into .s / d m.#/ plus correc-
tion term we get
Sn . / pSn .#/ D ! p
n˛.#/ n˛.#/ hp ˛.#/ i
.CC/ p  1 Sn .#/  p n .  #/ Jn .#/ .
n˛./ n˛./
From (+) and (++) it is obvious that parts (B) and (C) of Assumptions 7.14 simultane-
ously will follow from
ˇ p ˇ
ˇ n˛. #Ch= n˛.#/ / ˇ
ˇ ˇ
./ sup ˇ  1 ˇ ! 0 as n ! 1
jhjc ˇ n˛.#/ ˇ

for arbitrary values of a constant 0 < c < 1. By .8.100 /, ˛. / differs from ˛.#/ by
Œ 12 .  #/. Hence, to check ./ it is sufficient to cancel out n˛.#/ in the ratio in
./ and then take logarithms (note that this argument exploits again 0 < ˛.#/ < 1
for every # 2 ‚). Now parts (B) and (C) of Assumptions 7.14 are established. This
finishes the proof, and we write down the one-step modification
Rn
1 1 Yn  Tn 0  2 .s /  2 ds
en :D Tn C p
T Sn .Tn / D Tn C Rn Db #n
n˛.Tn / Jn .Tn / 2 2
0  .s /  ds

according to (a simplified version of) Theorem 7.19. 


8.3 Some Further Remarks
We point out several (out of many more) references on LAN, LAMN, LAQ in stochas-
tic process models of different types. Some of these papers prove LAN or LAMN or
LAQ in a particular stochastic process model, others establish properties of estimators

Section 8.3 Some Further Remarks 241

at # which indicate that the underlying statistical model should be LAMN at # or LAQ
at #.
Cox–Ingersoll–Ross process models are treated in Overbeck [103], Overbeck and Ryden [104] and Ben Alaya and Kebaier [8]. For ergodic diffusions with state space $\mathbb R$
$$dX_t \;=\; b(\vartheta, X_t)\, dt + \sigma(X_t)\, dW_t\;, \quad t \ge 0\,,$$
where the drift depends on a $d$-dimensional parameter $\vartheta \in \Theta$, the books by Kutoyants [78, 80] present a large variety of interesting models and examples.
For LAN in parametric models for non-time-homogeneous diffusions – where it is
assumed that the drift is periodic in time and that in a suitable sense ‘periodic ergodic-
ity’ holds – see Höpfner and Kutoyants [55]. An intricate tableau of LAN, LAMN or
LAQ properties arising in a setting of delay equations can be found in Gushchin and
Küchler [39].
Markov step process models with transition intensities depending on an unknown
parameter $\vartheta \in \Theta$ are considered in Höpfner, Jacod and Ladelli [51] and in Höpfner
[46, 48, 49]. For general semi-martingale models with jump parts and with diffusive
parts see Sørensen [119] and Luschgy [95–97]. For branching diffusions – some finite
random number of particles diffusing in space, branching with random number of off-
spring at rates which depend on the position and on the whole configuration, whereas
immigrants are arriving according to Poisson random measure – LAN or LAMN has
been proved by Löcherbach [90, 91]; ergodicity properties and invariant measures for
such processes have been investigated by Hammer [44].
Diffusion process models where the underlying process is not observed continuously in time but at discrete time steps $t_i = i\Delta$ with $0 \le i\Delta \le T$ (either with $T$ fixed or with $T$ increasing to $\infty$), and where asymptotics are in $\Delta$ tending to $0$, have received a lot of interest: see e.g. Yoshida [129], Genon-Catalot and Jacod [25], Kessler [70], or Shimizu and Yoshida [118]. Proofs establishing LAMN (in the case where $T$ is fixed) or LAN (in the case where $T \uparrow \infty$, provided the process is ergodic) have been given by Dohnal [22] and by Gobet [30, 31]. The book [71] edited by Kessler, Lindner and Sørensen gives a broad recent overview on discretely observed semi-martingale models.
If the underlying process is a point process, e.g. Poisson random measure $\mu(dx)$ on some space $(E, \mathcal E)$ with deterministic intensity $\Lambda_\vartheta(dx)$ depending on some unknown parameter $\vartheta \in \Theta$, one may consider sequences of windows $K_n \subseteq E$ which increase to $E$ as $n \to \infty$, and identify the experiment $\mathcal E_n$ at stage $n$ of the asymptotics with observation of the point process in restriction to the window $K_n$, cf. Kutoyants [79] or Höpfner and Jacod [50].
If our Chapters 7 and 8 deal with local asymptotics of type LAN, LAMN or LAQ,
let us mention one example for a limit experiment of different type, inducing statistical
properties via local asymptotics which are radically different from what we have dis-
cussed above. Ibragimov and Khasminskii [60] put forward a limit experiment where

the likelihood ratios are
$$(\text{IK})\qquad L^{h/0} \;=\; e^{\,\widetilde W_h - \frac12 |h|}\;, \quad h \in \mathbb R\,,$$
with two-sided Brownian motion $(\widetilde W_u)_{u \in \mathbb R}$, and studied convergence to this limit experiment in 'signal in white noise' models. Obviously (IK) is not a quadratic experiment since the parameter plays the role of time: in this sense, the experiment (IK) is linked to the Gaussian shift limit experiment as we pointed out in Example 1.16'. Dachian [17] associated to the limit experiment (IK) approximating experiments where likelihood ratios have – separately on the positive and on the negative branch – the form of the trajectory of a particular Poisson process with suitable linear terms subtracted. In the limit model (IK), Rubin and Song [114] could calculate the risk of the Bayesian estimator $\widetilde u$ – for quadratic loss, and with 'uniform prior over the real line' – and could show that quadratic risk of $\widetilde u$ is by some factor smaller than quadratic risk of the maximum likelihood estimator $\widehat u$. Recent investigations by Dachian show quantitatively how this feature carries over to the approximating experiments which he considers: there is a large domain of parameter values in the approximating experiments where rescaled estimation errors of an analogously defined $\widetilde u_n$ outperform those of $\widehat u_n$ under quadratic risk. However, not much seems to be known on comparison of $\widehat u$ and $\widetilde u$ using other loss functions than the quadratic, a fortiori not under a broader class of loss functions, e.g. subconvex and bounded. There is nothing like a central statistic in the sense of Definition 6.1 for the experiment (IK) where the only sufficient statistic is the whole two-sided Brownian path, hence nothing like a central sequence in the sense of Definition 7.1'(c) in the approximating experiments. Still, both estimators $\widehat u$ and $\widetilde u$ in the experiment (IK) are equivariant in the sense of Definition 5.4: in Höpfner and Kutoyants [56, 57] we have studied local asymptotics with limit experiment (IK) – in a context of diffusions carrying a deterministic discontinuous periodic signal in their drift, and being observed continuously over a long time interval – using some of the techniques of Chapter 7; the main results of Chapter 7 have no counterpart here. The limit experiment (IK) is of importance in a broad variety of contexts, e.g. Golubev [32], Pflug [106], Küchler and Kutoyants [76] and the references therein.
Chapter 9

Appendix

Topics:

9.1* Convergence of Martingales


Convergence to a Gaussian martingale 9.1
Convergence to a conditionally Gaussian martingale 9.2
Convergence of pairs (martingale, angle bracket) 9.3
9.2* Harris Recurrent Markov Processes
Harris recurrent Markov processes 9.4
Invariant measure 9.5–9.5’
Additive functionals 9.5”
Ratio limit theorem 9.6
Tightness rates under null recurrence (9.7)
Regular variation condition, Mittag-Leffler processes, weak convergence of integrable
additive functionals 9.8
Martingale additive functionals 9.9–9.9’
Regular variation condition: weak convergence of martingale additive functionals to-
gether with their angle brackets 9.10–9.10’
9.3* Checking the Harris Condition
A variant of the Harris condition 9.11
From grid chains via segment chains to continuous time processes 9.11’
9.4* One-dimensional Diffusions
Harris properties for diffusions with values in R 9.12
Some examples 9.12’
Diffusions taking values in some open interval I in R 9.13
Some examples 9.13’–9.13”
Exact constants for i.i.d. cycles in null recurrent diffusions without drift 9.14

This Appendix collects facts of different nature which we quote in the stochastic pro-
cess sections of this book. In most cases, they are stated without proof, and we indicate
references. An asterisk * in front of this chapter (and in front of all its sections) indi-
cates that the reader should be acquainted with basic properties of stochastic processes
in continuous time, with semi-martingales and stochastic differential equations. Our


principal references are the following. The book by Métivier [98] represents a well-
written source for the theory of stochastic processes; a useful overview appears in
the appendix sections of Brémaud [14]. A detailed treatment can be found in Del-
lacherie and Meyer [20]. For stochastic differential equations, we refer to Karatzas
and Shreve [69] and Ikeda and Watanabe [61]. For semi-martingales and their (weak)
convergence, see Jacod and Shiryaev [64].


9.1 Convergence of Martingales
All filtrations which appear in this section are right-continuous; all processes below have càdlàg paths. $(D, \mathcal D, \mathbb G)$ denotes the Skorohod space of $d$-dimensional càdlàg functions (see [64, Chap. VI]). A $d$-dimensional locally square integrable local martingale $M = (M_t)_{t \ge 0}$ starting from $M_0 = 0$ is called a continuous Gaussian martingale if there are no jumps and if the angle bracket process $\langle M \rangle$ is continuous and deterministic: in this case, $M$ has independent increments, and all finite dimensional distributions are Gaussian laws. We quote the following from [64, Coroll. VIII.3.24]:

9.1 Theorem. For $n \ge 1$, consider $d$-dimensional locally square integrable local martingales
$$M^{(n)} = \bigl(M^{(n)}_t\bigr)_{t \ge 0} \quad\text{defined on } (\Omega^{(n)}, \mathcal A^{(n)}, \mathbb F^{(n)}, P^{(n)})$$
starting from $M^{(n)}_0 = 0$. Let $\mu^{(n)}(ds, dy)$ denote the point process of jumps of $M^{(n)}$ and $\nu^{(n)}(ds, dy)$ its $(P^{(n)}, \mathbb F^{(n)})$-compensator, for $(s,y)$ in $(0,\infty) \times (\mathbb R^d \setminus \{0\})$. Assume a Lindeberg condition
$$\text{for all } 0 < t < \infty \text{ and all } \varepsilon > 0: \quad \int_0^t \int_{\{|y| > \varepsilon\}} |y|^2\, \nu^{(n)}(ds, dy) \;=\; o_{P^{(n)}}(1) \quad\text{as } n \to \infty\,.$$
Let $M'$ denote a continuous Gaussian martingale
$$M' = \bigl(M'_t\bigr)_{t \ge 0} \quad\text{defined on } (\Omega', \mathcal A', \mathbb F', P')$$
and write $C := \langle M' \rangle$ for the deterministic angle bracket. Then stochastic convergence
$$\text{for every } 0 < t < \infty \text{ fixed:} \quad \bigl\langle M^{(n)} \bigr\rangle_t \;=\; C_t + o_{P^{(n)}}(1) \quad\text{as } n \to \infty$$
implies weak convergence of martingales
$$Q^{(n)} := \mathcal L\bigl( M^{(n)} \mid P^{(n)} \bigr) \;\longrightarrow\; Q' := \mathcal L\bigl( M' \mid P' \bigr)$$
in the Skorohod space $D$ as $n \to \infty$.
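As a toy illustration of Theorem 9.1 (our example, not taken from [64]): put $M^{(n)}_t := n^{-1/2}\,(N_{nt} - \lambda n t)$ for a Poisson process $N$ with rate $\lambda$. All jumps of $M^{(n)}$ have size $n^{-1/2}$, so the Lindeberg condition holds trivially once $n > \varepsilon^{-2}$, and $\langle M^{(n)} \rangle_t = \lambda t$ is deterministic from the start; the limit is $\sqrt\lambda$ times Brownian motion. A simulation sketch (sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, t = 2.0, 400, 1.0

def M_n_terminal():
    """Terminal value M^(n)_t = (N_{nt} - lam*n*t)/sqrt(n), one replication."""
    waits = rng.exponential(1.0 / lam, int(3 * lam * n * t))  # ample waiting times
    arrivals = np.cumsum(waits)
    N_nt = np.searchsorted(arrivals, n * t)                   # N_{n t}
    return (N_nt - lam * n * t) / np.sqrt(n)

vals = np.array([M_n_terminal() for _ in range(4000)])
# the Gaussian limit martingale M' has <M'>_t = lam*t, i.e. M'_t ~ N(0, lam*t)
print(vals.mean(), vals.var())   # approx 0.0 and lam*t = 2.0
```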


Next we fix one probability space $(\Omega^{(n)}, \mathcal A^{(n)}, P^{(n)}) = (\Omega, \mathcal A, P)$ for all $n$, equipped with a filtration $\mathbb F$, and assume that $M^{(n)}$ and $\mathbb F^{(n)}$ as above are derived from the same locally square integrable local $(P, \mathbb F)$-martingale $M = (M_t)_{t \ge 0}$ through space-time rescaling (such as e.g. $\mathcal F^{(n)}_t := \mathcal F_{tn}$ and $M^{(n)}_t := n^{-1/2} M_{tn}$). We need the following nesting condition (cf. [64, VIII.5.37]):
$$(+)\qquad \begin{cases} \text{there is a sequence of positive real numbers } \alpha_n \downarrow 0 \text{ such that} \\ \mathcal F^{(n)}_{\alpha_n} \text{ is contained in } \mathcal F^{(n+1)}_{\alpha_{n+1}} \text{ for all } n \ge 1\,, \text{ and} \\ \sigma\bigl( \bigcup_n \mathcal F^{(n)}_{\alpha_n} \bigr) \;=\; \sigma\bigl( \bigcup_t \mathcal F_t \bigr) \;=:\; \mathcal F_\infty\,. \end{cases}$$
On $(\Omega, \mathcal A, P)$, we also need a collection of $\mathcal F_\infty$-measurable random variables $\Phi = (\Phi_t)_{t \ge 0}$ such that
$$(++)\qquad \text{paths } t \to \Phi_t \text{ are continuous and strictly increasing with } \Phi_0 = 0 \text{ and } \lim_{t\to\infty} \Phi_t = \infty\,,\ P\text{-a.s.}$$
(such as e.g. $\Phi_t = t\,\zeta$ for some $\mathcal F_\infty$-measurable random variable $\zeta > 0$). On some other probability space $(\Omega', \mathcal A', \mathbb F', P')$, consider a continuous Gaussian martingale $M'$ with deterministic angle bracket $\langle M' \rangle = C$. We define $M'$ subject to independent time change $t \to \Phi_t$
$$M' \circ \Phi \;=\; \bigl( M'_{\Phi_t} \bigr)_{t \ge 0}$$
as follows. Let $K'(\cdot, \cdot)$ denote a transition probability from $(\Omega, \mathcal F_\infty)$ to $(D, \mathcal D)$ such that for the first argument $\omega \in \Omega$ fixed, the canonical process $\eta$ on $(D, \mathcal D)$ under $K'(\omega, \cdot)$ is a continuous Gaussian $\mathbb G$-martingale with angle bracket
$$t \;\to\; (C \circ \Phi)(\omega, t) \;=\; C(\Phi_t(\omega))\,.$$
Lifting $\Phi$ and $\eta$ to
$$\Bigl( \Omega \times D\,,\; \mathcal A \otimes \mathcal D\,,\; (\mathcal F_\infty \otimes \mathcal G_t)_{t \ge 0}\,,\; (PK')(d\omega, df) := P(d\omega)\, K'(\omega, df) \Bigr)$$
the pair $(\Phi, M' \circ \Phi)$ is well defined on this space (cf. [64, p. 471]). By this construction, $M' \circ \Phi$ is a conditionally Gaussian martingale. We quote the following result from Jacod and Shiryaev [64, VIII.5.7 and VIII.5.42]:

9.2 Theorem. For $n \ge 1$, for filtrations $\mathbb F^{(n)}$ in $\mathcal A$ such that the nesting condition (+) above holds, consider $d$-dimensional locally square integrable local martingales
$$M^{(n)} = \bigl(M^{(n)}_t\bigr)_{t \ge 0} \quad\text{on } (\Omega, \mathcal A, \mathbb F^{(n)}, P)$$
starting from $M^{(n)}_0 = 0$ and satisfying a Lindeberg condition
$$\text{for all } 0 < t < \infty \text{ and all } \varepsilon > 0: \quad \int_0^t \int_{\{|y| > \varepsilon\}} |y|^2\, \nu^{(n)}(ds, dy) \;=\; o_P(1) \quad\text{as } n \to \infty\,.$$
As above, for a collection of $\mathcal F_\infty$-measurable variables $\Phi = (\Phi_t)_{t \ge 0}$ satisfying (++) and for a continuous Gaussian martingale $M'$ with deterministic angle bracket $\langle M' \rangle = C$, we have a (continuous) conditionally Gaussian martingale $M' \circ \Phi$ living on $(\Omega \times D, \mathcal A \otimes \mathcal D, (\mathcal F_\infty \otimes \mathcal G_t)_{t \ge 0}, PK')$. Then stochastic convergence
$$\text{for every } 0 < t < \infty \text{ fixed:} \quad \bigl\langle M^{(n)} \bigr\rangle_t \;=\; (C \circ \Phi)_t + o_P(1) \quad\text{as } n \to \infty$$
implies weak convergence of martingales
$$\mathcal L\bigl( M^{(n)} \mid P \bigr) \;\longrightarrow\; \mathcal L\bigl( M' \circ \Phi \mid PK' \bigr)$$
in the Skorohod space $D$ as $n \to \infty$.

Finally, consider locally square integrable local martingales $M^{(n)}$
$$M^{(n)} \;=\; M^{(n,c)} + \int_0^{\cdot} \int_{\mathbb R^d \setminus \{0\}} y\, \bigl( \mu^{(n)} - \nu^{(n)} \bigr)(ds, dy)$$
where $M^{(n,c)}$ is the continuous local martingale part. Writing $M^{(n,i)}$, $1 \le i \le d$, for the components of $M^{(n)}$ and $[M^{(n)}]$ for the quadratic covariation process, we obtain for $0 < t < \infty$ and $i, j = 1, \ldots, d$
$$\sup_{s \le t}\, \Bigl|\, \bigl[ M^{(n,i)}, M^{(n,j)} \bigr]_s - \bigl\langle M^{(n,i)}, M^{(n,j)} \bigr\rangle_s \Bigr| \;=\; o_P(1) \quad\text{as } n \to \infty$$
provided the sequence $(M^{(n)})_n$ satisfies the Lindeberg condition. In this situation, since Jacod and Shiryaev [64, VI.6.1] show that weak convergence of $M^{(n)}$ as $n \to \infty$ in $D$ to a continuous limit martingale implies weak convergence of the pairs $(M^{(n)}, [M^{(n)}])$ in the Skorohod space of càdlàg functions $[0,\infty) \to \mathbb R^d \times \mathbb R^{d\times d}$, we also have weak convergence of the pairs $(M^{(n)}, \langle M^{(n)} \rangle)$. Thus the following result is contained in [64, VI.6.1]:

9.3 Theorem. For $n \ge 1$, consider $d$-dimensional locally square integrable local martingales
$$M^{(n)} = \bigl(M^{(n)}_t\bigr)_{t \ge 0} \quad\text{defined on } (\Omega^{(n)}, \mathcal A^{(n)}, \mathbb F^{(n)}, P^{(n)})$$
with $M^{(n)}_0 = 0$. Assume the Lindeberg condition of Theorem 9.1. Let $\widetilde M$ denote a continuous local martingale
$$\widetilde M = \bigl( \widetilde M_t \bigr)_{t \ge 0} \quad\text{defined on } (\widetilde\Omega, \widetilde{\mathcal A}, \widetilde{\mathbb F}, \widetilde P)$$
starting from $\widetilde M_0 = 0$. Then weak convergence in $D$
$$\mathcal L\bigl( M^{(n)} \mid P^{(n)} \bigr) \;\longrightarrow\; \mathcal L\bigl( \widetilde M \mid \widetilde P \bigr)\;, \quad n \to \infty$$
implies weak convergence
$$\mathcal L\Bigl( \bigl( M^{(n)}, \langle M^{(n)} \rangle \bigr) \,\Big|\, P^{(n)} \Bigr) \;\longrightarrow\; \mathcal L\Bigl( \bigl( \widetilde M, \langle \widetilde M \rangle \bigr) \,\Big|\, \widetilde P \Bigr)\;, \quad n \to \infty$$
in the Skorohod space $D(\mathbb R^d \times \mathbb R^{d\times d})$ of càdlàg functions $[0,\infty) \to \mathbb R^d \times \mathbb R^{d\times d}$.


9.2 Harris Recurrent Markov Processes
For Harris recurrence of Markov chains, we refer to Revuz [111] and Nummelin [101]. For Harris recurrence of continuous time Markov processes, our main reference is Azéma, Duflo and Revuz [2]. On some underlying probability space, we consider a time homogeneous strong Markov process $X = (X_t)_{t \ge 0}$ taking values in $\mathbb R^d$, with càdlàg paths, having infinite life time, and its semigroup
$$(P_t(\cdot, \cdot))_{t \ge 0}: \quad P_t(x, dy) \;=\; P\bigl( X_{s+t} \in dy \mid X_s = x \bigr)\;, \quad x, y \in \mathbb R^d\,,\ s, t \in [0,\infty)\,.$$
We write $U^1$ for the potential kernel $U^1(x, dy) = \int_0^\infty e^{-t}\, P_t(x, dy)\, dt$ which corresponds to observation of $X$ after an independent exponential time. For $\sigma$-finite measures $m$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$, measurable functions $f: \mathbb R^d \to [0,\infty]$ and $t \ge 0$ write
$$P_t f(x) = \int P_t(x, dy)\, f(y)\;, \qquad m P_t(dy) = \int m(dx)\, P_t(x, dy)\;, \qquad E_m f = \int m(dx)\, E_x(f)\,.$$
A $\sigma$-finite measure $m$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ with the property
$$m P_t \;=\; m \quad\text{for all } t \ge 0$$
is termed invariant for $X$. On the canonical path space $(D, \mathcal D, \mathbb G)$ for $\mathbb R^d$-valued càdlàg processes, write $Q_x$ for the law of the process $X$ starting from $x \in \mathbb R^d$. Let $\eta = (\eta_t)_{t \ge 0}$ denote the canonical process on $(D, \mathcal D, \mathbb G)$, and $(\theta_t)_{t \ge 0}$ the collection of shift operators on $(D, \mathcal D)$: $\theta_t(\alpha) := (\alpha(t+s))_{s \ge 0}$ for $\alpha \in D$. Systematically, we speak of 'properties of the process $X$' when we mean 'properties of the semigroup $(P_t(\cdot,\cdot))_{t \ge 0}$': these in turn will be formulated as properties of the canonical process $\eta$ in the system $(D, \mathcal D, \mathbb G, (\theta_t)_{t \ge 0}, (Q_x)_{x \in \mathbb R^d})$.

9.4 Definition. The process $X = (X_t)_{t \ge 0}$ is called recurrent in the sense of Harris (or Harris for short) if there is a $\sigma$-finite measure $\mu$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ such that the following holds:
$$(\diamond)\qquad A \in \mathcal B(\mathbb R^d)\,,\ \mu(A) > 0 \;\Longrightarrow\; \int_0^\infty 1_A(\eta_s)\, ds = +\infty \quad Q_x\text{-almost surely, for every } x \in \mathbb R^d\,.$$

In Definition 9.4, a process $X$ satisfying $(\diamond)$ will accumulate infinite occupation time in sets of positive $\mu$-measure, independently of the starting point. The following is from [2, 2.4–2.5]:

9.5 Theorem. If $X$ is recurrent in the sense of Harris,
(a) there is a unique (up to multiplicative constants) invariant measure $m$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$,
(b) condition $(\diamond)$ of Definition 9.4 holds with $m$ in place of $\mu$,
(c) any measure $\mu$ in Definition 9.4 for which condition $(\diamond)$ holds satisfies $\mu \ll m$ and $\mu U^1 \sim m$.

9.5' Definition. A Harris process $X$ is called positive recurrent if the invariant measure $m$ is of finite total mass on $\mathbb R^d$; $X$ is called null recurrent otherwise.

9.5'' Definition. An additive functional of $X$ is an adapted process $A = (A_t)_{t \ge 0}$ on $(D, \mathcal D, \mathbb G)$, all paths càdlàg non-decreasing and starting from $A_0 = 0$, such that
$$A_t - A_s \;=\; A_{t-s} \circ \theta_s \quad\text{for all } 0 \le s < t < \infty\,.$$
If $X$ is Harris with invariant measure $m$, $A$ is termed integrable if $E_m(A_1) < \infty$.

From [2, 3.1], we quote Theorem 9.6(a.i) below; the following assertions of Theorem 9.6(a.ii) and 9.6(b) are immediate but important consequences.

9.6 Ratio Limit Theorem. (a) If $X$ is recurrent in the sense of Harris, with invariant measure $m$,
(i) we have for integrable additive functionals $A$, $B$ such that $0 < E_m(B_1) < \infty$
$$\text{for every } x \in \mathbb R^d: \qquad \frac{A_t}{B_t} \;\longrightarrow\; \frac{E_m(A_1)}{E_m(B_1)} \quad Q_x\text{-almost surely as } t \to \infty\;;$$
(ii) we have for functions $f, g \in L^1(\mathbb R^d, \mathcal B(\mathbb R^d), m)$ such that $m(g) \ne 0$
$$\text{for every } x \in \mathbb R^d: \qquad \frac{\int_0^t f(\eta_s)\, ds}{\int_0^t g(\eta_s)\, ds} \;\longrightarrow\; \frac{m(f)}{m(g)} \quad Q_x\text{-almost surely as } t \to \infty\,.$$


(b) If $X$ is positive recurrent, with $g \equiv 1$ and norming factor such that $m(\mathbb R^d) = 1$,
$$\text{for every } x \in \mathbb R^d: \qquad \frac{1}{t} \int_0^t f(\eta_s)\, ds \;\longrightarrow\; m(f) \quad Q_x\text{-almost surely as } t \to \infty$$
for arbitrary functions $f \in L^1(\mathbb R^d, \mathcal B(\mathbb R^d), m)$.

Note that integrability of additive functionals as required in Theorem 9.6(a) turns out to be a restrictive condition in null recurrent cases where the invariant measure $m$ has infinite total mass on $\mathbb R^d$.
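Theorem 9.6(b) is easy to check by simulation in a positive recurrent example. The sketch below uses the Ornstein–Uhlenbeck process of Example 9.12'(a) below, whose normalized invariant measure is $\mathcal N(0, \sigma^2/(2|\vartheta|))$; the Euler discretisation and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma = -1.0, 1.0              # dX = theta*X dt + sigma dW, theta < 0
t, dt = 10_000.0, 0.01
m = int(t / dt)

X = np.empty(m + 1); X[0] = 5.0       # an arbitrary starting point
dW = rng.normal(0.0, np.sqrt(dt), m)
for i in range(m):
    X[i + 1] = X[i] + theta * X[i] * dt + sigma * dW[i]

f = lambda x: np.cos(x)               # some f in L^1(m); m is a probability here
time_avg = np.mean(f(X))              # (1/t) int_0^t f(X_s) ds as a Riemann sum
v = sigma**2 / (2.0 * abs(theta))     # invariant variance
m_f = np.exp(-v / 2.0)                # E cos(Z) = exp(-v/2) for Z ~ N(0, v)
print(time_avg, m_f)                  # close for large t, for any starting point
```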
Whereas Theorem 9.6(b) provides a strong law of large numbers for suitably normed integrable additive functionals of a Harris process $X$ which is positive recurrent, not even convergence in law holds true when $X$ is null recurrent unless additional conditions are imposed. In the general null recurrent case, all that remains granted is existence of a tightness rate for the class of integrable additive functionals, i.e. of some deterministic norming function $t \to v(t)$ such that
$$(9.7)\qquad \lim_{M \to \infty}\, \liminf_{t \to \infty}\, Q_x\Bigl( \frac{1}{M} \,<\, \frac{1}{v(t)} \int_0^t f(\eta_s)\, ds \,<\, M \Bigr) \;=\; 1$$
for all functions $f \in L^1(\mathbb R^d, \mathcal B(\mathbb R^d), m)$ with $m(f) \ne 0$, independently of the choice of the starting point $x \in \mathbb R^d$. This has been proved by Loukianova and Loukianov [94] for the special case of one-dimensional diffusions, using local time. For general Harris processes, the result is due to Löcherbach and Loukianova [93], using a new Nummelin-like splitting technique.
In the case of null recurrence, weak convergence of integrable additive functionals requires an extra condition on the semigroup of $X$, of regular variation type. This goes back to Darling and Kac [18] (see [12, Chap. 8.11]). Below, we give the regular variation condition in a form due to Touati [123]. Then suitably normed integrable additive functionals of $X$ converge weakly to a Mittag–Leffler process $V^{(\alpha)}$ of index $0 < \alpha \le 1$, the process inverse of the stable increasing process $S^{(\alpha)}$ of index $0 < \alpha \le 1$. When $0 < \alpha < 1$, we refer to the definition given in Remark 6.17: the process $S^{(\alpha)}$ starting from $S^{(\alpha)}_0 = 0$ has independent and stationary increments with Laplace transform
$$E\Bigl( e^{-\lambda\, (S^{(\alpha)}_{t_2} - S^{(\alpha)}_{t_1})} \Bigr) \;=\; e^{-(t_2 - t_1)\, \lambda^\alpha}\;, \quad \lambda \ge 0\,,\ 0 \le t_1 < t_2 < \infty\,,$$
$V^{(\alpha)}$ is the process of level crossing times of $S^{(\alpha)}$
$$V^{(\alpha)}_t \;=\; \inf\bigl\{\, v > 0 \,:\, S^{(\alpha)}_v > t \,\bigr\}\;, \quad t \ge 0\,,$$
and paths of $V^{(\alpha)}$ are continuous and non-decreasing with $V^{(\alpha)}_0 \equiv 0$ and $\lim_{t\to\infty} V^{(\alpha)}_t = \infty$. We extend this definition to the case $\alpha = 1$ by
$$S^{(1)} = \mathrm{id} = V^{(1)}\,, \quad\text{i.e. } S^{(1)}_t \equiv t \equiv V^{(1)}_t\,,\ t \ge 0\,.$$
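For the index $\alpha = \frac12$ the increments of $S^{(\alpha)}$ can be sampled exactly: an increment over an interval of length $h$ is distributed as $h^2/(2Z^2)$ with $Z$ standard normal, whose Laplace transform is $e^{-h\lambda^{1/2}}$, matching the formula above. From the definition one also checks $P(V^{(1/2)}_1 > v) = P(S^{(1/2)}_v \le 1) = P(|Z| \ge v/\sqrt2)$, so $V^{(1/2)}_1$ is distributed as $\sqrt2\,|Z|$. A simulation sketch of $S^{(1/2)}$ and its inverse (grid sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
h, k = 0.001, 10_000                  # grid step in v and number of steps

v_grid = h * np.arange(k + 1)

def stable_half_path():
    """Path of S^(1/2) on the grid: increments h^2/(2 Z^2) are exact."""
    Z = rng.normal(size=k)
    return np.concatenate([[0.0], np.cumsum(h**2 / (2.0 * Z**2))])

def V(t, S):
    """Mittag-Leffler process V_t = inf{ v : S_v > t }, read off the grid."""
    i = np.searchsorted(S, t, side='right')
    return v_grid[min(i, k)]

samples = np.array([V(1.0, stable_half_path()) for _ in range(2000)])
print(samples.mean())                        # Monte Carlo estimate of E V_1
print(np.sqrt(2.0) * np.sqrt(2.0 / np.pi))   # E sqrt(2)*|Z| = 2/sqrt(pi) ~ 1.128
```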


The last definition is needed in view of null recurrent processes where suitably normed integrable additive functionals converge weakly to $V^{(1)} = \mathrm{id}$. As an example, we might have a recurrent atom in the state space of $X$ such that the distribution of the time between successive visits in the atom is 'relatively stable' in the sense of [12, Chap. 8.8 combined with p. 359]; at every visit, the process might spend an exponential time in the atom; then the occupation time $A_t$ of the atom up to time $t$ defines an additive functional $A = (A_t)_{t \ge 0}$ of $X$ for which suitable norming functions vary regularly at $\infty$ with index $\alpha = 1$. Index $\alpha = 1$ is also needed for the strong law of large numbers in Theorem 9.6(b) in positive recurrent Harris processes. We quote the following from [53, Thm. 3.15].

9.8 Theorem. Consider a Harris process $X$ with invariant measure $m$.
(a) For $0 < \alpha \le 1$ and $\ell(\cdot)$ varying slowly at $\infty$, the following assertions (i) and (ii) are equivalent:
(i) for every function $g: \mathbb R^d \to [0,\infty)$ which is $\mathcal B(\mathbb R^d)$-measurable and satisfies $0 < m(g) < \infty$, one has
$$(R_{1/t}\, g)(x) \;:=\; E_x\Bigl( \int_0^\infty e^{-\frac{1}{t}s}\, g(\eta_s)\, ds \Bigr) \;\sim\; \frac{t^\alpha}{\ell(t)}\; m(g) \quad\text{as } t \to \infty$$
for $m$-almost all $x \in \mathbb R^d$, the exceptional null set depending on $g$;
(ii) for every $f: \mathbb R^d \to [0,\infty)$ which is $\mathcal B(\mathbb R^d)$-measurable with $0 < m(f) < \infty$, for every $x \in \mathbb R^d$,
$$\Bigl( \frac{\ell(n)}{n^\alpha} \int_0^{tn} f(\eta_s)\, ds \Bigr)_{t \ge 0} \quad\text{under } Q_x$$
converges weakly in $D$ as $n \to \infty$ to
$$m(f)\; V^{(\alpha)}$$
where $V^{(\alpha)}$ is the Mittag–Leffler process of index $\alpha$.
(b) The cases in (a) are the only ones where for functions $f$ as in (a.ii) and for suitable choice of a norming function $v(\cdot)$, weak convergence in $D$ of
$$\Bigl( \frac{1}{v(n)} \int_0^{tn} f(\eta_s)\, ds \Bigr)_{t \ge 0} \quad\text{under } Q_x$$
to a continuous non-decreasing limit process $V$ (starting at $V_0 = 0$, and such that $\mathcal L(V_1)$ is not Dirac measure $\varepsilon_0$ at $0$) is available.
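For standard Brownian motion ($\sigma \equiv 1$, $m =$ Lebesgue measure $\lambda$) all quantities in Theorem 9.8 are explicit: the resolvent condition (i) holds with $\alpha = \frac12$ and $\ell \equiv \sqrt2$, and combining (ii) with the law of $V^{(1/2)}_1$ noted above, $n^{-1/2} \int_0^n f(B_s)\,ds$ converges in law to $\lambda(f)\,|Z|$ with $Z$ standard normal. The following sketch (our example; discretisation constants are arbitrary, and agreement at moderate $n$ is only rough) compares the first two moments:

```python
import numpy as np

rng = np.random.default_rng(5)
n, dt = 400.0, 0.01
m = int(n / dt)

f = lambda x: np.exp(-np.abs(x))      # integrable, lambda(f) = 2
lam_f = 2.0

def scaled_occupation():
    """n^{-1/2} int_0^n f(B_s) ds for one Brownian path, Riemann sum."""
    B = np.cumsum(rng.normal(0.0, np.sqrt(dt), m))
    return np.sum(f(B)) * dt / np.sqrt(n)

vals = np.array([scaled_occupation() for _ in range(2000)])
limit = lam_f * np.abs(rng.normal(size=200_000))   # law of lambda(f)*|Z|
print(vals.mean(), limit.mean())      # both approx 2*sqrt(2/pi) ~ 1.596
print(vals.var(),  limit.var())       # both approx 4*(1 - 2/pi) ~ 1.454
```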

In statistical models for Markov processes under time-continuous observation, we have to consider martingales when the starting point $x \in \mathbb R^d$ for the process is arbitrary but fixed. As an example, in the context of Chapter 8, the score martingale is a locally square integrable local $Q_x$-martingale $M = (M_t)_{t \ge 0}$ on $(D, \mathcal D, \mathbb G)$. We need limit theorems for the pair $(M, \langle M \rangle)$ under $Q_x$ to prove convergence of local models. Under suitable assumptions, Theorem 9.8 above can be used to settle the problem for the angle bracket $\langle M \rangle$ under $Q_x$. Thus we need weak convergence for the score martingale $M$ under $Q_x$. From this – thanks to a result in [64], which was recalled in Theorem 9.3 above – we can pass to joint convergence of the pair $(M, \langle M \rangle)$ under $Q_x$.

9.9 Definition. On the path space $(D, \mathcal D, \mathbb G)$, for given $x \in \mathbb R^d$, consider a locally square integrable local $Q_x$-martingale $M = (M_t)_{t \ge 0}$ together with its quadratic variation $[M]$ and its angle bracket $\langle M \rangle$ under $Q_x$; assume in addition that $\langle M \rangle$ under $Q_x$ is locally bounded. We call $M$ a martingale additive functional if the following holds:
(i) $[M]$ and $\langle M \rangle$ under $Q_x$ admit versions which are additive functionals of $\eta$;
(ii) for every choice of $0 \le s < t < \infty$ and $y \in \mathbb R^d$, one has $M_t - M_s = M_{t-s} \circ \theta_s$ $Q_y$-almost surely.

9.9' Example. (a) On $(D, \mathcal D, \mathbb G)$, for starting point $x \in \mathbb R$, write $Q_x$ for the law of a diffusion
$$dX_t \;=\; b(X_t)\, dt + \sigma(X_t)\, dW_t$$
and $m^{(x)}$ for the local martingale part of the canonical process $\eta$ on $(D, \mathcal D, \mathbb G)$ under $Q_x$. Write $L$ for the Markov generator of $X$ and consider some $C^2$ function $F: \mathbb R \to \mathbb R$ with derivative $f$. Then
$$M \;=\; \Bigl( M_t = \int_0^t f(\eta_s)\, dm^{(x)}_s \,,\ t \ge 0 \Bigr)$$
is a locally square integrable local $Q_x$-martingale admitting a version $(t, \omega) \to Y(t, \omega)$
$$Y_t \;:=\; F(\eta_t) - F(\eta_0) - \int_0^t LF(\eta_s)\, ds\;, \quad t \ge 0$$
by Ito formula. This version satisfies $Y_t - Y_s = Y_{t-s} \circ \theta_s$ for $0 \le s < t < \infty$. Quadratic variation and angle bracket of $M$ under $Q_x$
$$[M]_t \;=\; \langle M \rangle_t \;=\; \int_0^t f^2(\eta_s)\, \sigma^2(\eta_s)\, ds\;, \quad t \ge 0$$
are additive functionals. Hence $M$ is a martingale additive functional as defined in Definition 9.9.
(b) Prepare a Poisson process $N = (N_t)_{t \ge 0}$ with parameter $\lambda > 0$ independent of Brownian motion $W$. Under parameter values $\lambda > 0$, $\sigma > 0$, $\varepsilon > 0$, for starting point $x \in \mathbb R$, write $Q_x$ for the law on $(D, \mathcal D, \mathbb G)$ of the solution to
$$dX_t \;=\; -\,1_{\{X_{t-} > 0\}}\, dt + \varepsilon\, 1_{\{X_{t-} < 0\}}\, dN_t + \sigma\, dW_t$$
driven by the pair $(N, W)$. Let $M$ denote the local martingale part – sum of a continuous and a purely discontinuous martingale – of the canonical process $\eta$ under $Q_x$. It admits a version $(t, \omega) \to Y(t, \omega)$
$$Y_t \;:=\; \eta_t - \eta_0 - \int_0^t \bigl( \varepsilon\lambda\, 1_{(-\infty,0)}(\eta_s) - 1_{(0,\infty)}(\eta_s) \bigr)\, ds\;, \quad t \ge 0$$
which satisfies $Y_t - Y_s = Y_{t-s} \circ \theta_s$ for $0 \le s < t < \infty$. The quadratic variation of $M$ under $Q_x$ admits a version (with notation $\Delta\eta_s = \eta_s - \eta_{s-}$)
$$[M]_t \;=\; \sigma^2\, t + \varepsilon^2 \sum_{0 < s \le t} 1_{\{\Delta\eta_s = \varepsilon\}}$$
since the sum process on the right-hand side is indistinguishable from $\bigl( \sum_{0 < s \le t} (\Delta\eta_s)^2\, 1_{\{|\Delta\eta_s| > 0\}} \bigr)_{t \ge 0}$ under $Q_x$ (see also [64, p. 55]). The angle bracket of $M$ under $Q_x$ is
$$\langle M \rangle_t \;=\; \sigma^2\, t + \varepsilon^2 \lambda \int_0^t 1_{(-\infty,0)}(\eta_s)\, ds\,.$$
All requirements of Definition 9.9 are satisfied: thus $M$ is a martingale additive functional. $\square$

We quote the following from Höpfner and Löcherbach [53, Thm. 3.16].

9.10 Theorem. Consider a Harris process $X$, $\mathbb R^d$-valued, with invariant measure $m$. Assume that for $X$ there are $0 < \alpha \le 1$ and $\ell(\cdot)$ varying slowly at $\infty$ such that
$$\text{condition (i) of Theorem 9.8(a) is satisfied with } \alpha \text{ and } \ell(\cdot)\,.$$
For given $x \in \mathbb R^d$, consider a locally square integrable local $Q_x$-martingale $M = (M_t)_{t \ge 0}$ on $(D, \mathcal D, \mathbb G)$. Assume that $E_m(\langle M \rangle_1)$ is finite, and that $M$ is a martingale additive functional as defined in Definition 9.9. Then as $n \to \infty$
$$M^n \;:=\; \frac{1}{\sqrt{n^\alpha / \ell(n)}}\, \bigl( M_{tn} \bigr)_{t \ge 0} \quad\text{under } Q_x$$
converges weakly in $D$ to the limit martingale $\overline M$
$$\overline M \;=\; \Lambda^{1/2}\; B \circ V^{(\alpha)}\;, \quad \Lambda := E_m\bigl( \langle M \rangle_1 \bigr)$$
where Brownian motion $B$ and Mittag–Leffler process $V^{(\alpha)}$ are independent.


Recall from Remark 6.17 that $\mathcal L(V^{(\alpha)}_t)$ admits finite moments of arbitrary order. The proof in [53] is via Nummelin splitting [102] along independent exponential waiting times in a sequence of strong Markov processes approaching (an extension of) $\eta$: we construct renewal times and independent life cycles between successive renewal times in the approximating processes and apply limit theorems for sums of i.i.d. random variables from Resnick and Greenwood [110]. The theorem contains the positive recurrent cases where $\alpha = 1$, $\ell(\cdot) \equiv 1$ and $V^{(1)} = \mathrm{id}$. Note that Theorem 9.10 is not a martingale convergence theorem in the sense of Theorem 9.2: there is no stochastic convergence of angle brackets $\langle M^n \rangle_t$, only convergence in law.
Via Theorem 9.3, the following is a direct consequence of Theorems 9.8 and 9.10.

9.10' Corollary. With notations of Theorem 9.10, and under all assumptions made there, consider
$$M^n \;:=\; \frac{1}{\sqrt{n^\alpha / \ell(n)}}\, \bigl( M_{tn} \bigr)_{t \ge 0} \quad\text{under } Q_x$$
as $n \to \infty$. With $\nu^n(ds, dy)$ the $(Q_x, (\mathcal G_{tn})_{t \ge 0})$-compensator of the point process of jumps of $M^n$, assume in addition a Lindeberg condition
$$\text{for all } 0 < t < \infty \text{ and all } \varepsilon > 0: \quad \int_0^t \int_{\{|y| > \varepsilon\}} |y|^2\, \nu^n(ds, dy) \;=\; o_{Q_x}(1) \quad\text{as } n \to \infty\,.$$
Then the assertion of Theorem 9.10 strengthens to weak convergence of the pairs
$$\bigl( M^n, \langle M^n \rangle \bigr) \quad\text{under } Q_x$$
in $D(\mathbb R^d \times \mathbb R^{d\times d})$ as $n \to \infty$ to the pair
$$\bigl( \Lambda^{1/2}\, B \circ V^{(\alpha)}\,,\; \Lambda\, V^{(\alpha)} \bigr)\,.$$


9.3 Checking the Harris Condition
We continue under the basic assumptions and notations stated at the start of Sec-
tion 9.2, and present conditions which imply that a càdlàg time-continuous strong
Markov process is Harris. The first one is from [112, pp. 394–395].

9.11 Proposition. If an invariant measure $m$ for $X = (X_t)_{t \ge 0}$ is known, the condition
$$(\circ)\qquad A \in \mathcal B(\mathbb R^d)\,,\ m(A) > 0 \;\Longrightarrow\; \limsup_{t \to \infty}\, 1_A(\eta_t) = 1 \quad Q_x\text{-almost surely, for every } x \in \mathbb R^d$$
implies Harris recurrence.

Proof. We use arguments of Revuz [111, pp. 94–95] and Revuz and Yor [112, p. 395]. Since $m$ in condition $(\circ)$ is invariant, we consider in $(\circ)$ sets $A \in \mathcal B(\mathbb R^d)$ with $E_m\bigl( \int_0^t 1_A(\eta_s)\, ds \bigr) = t\, m(A) > 0$. As a consequence, we can specify some $\varepsilon > 0$ and some $\delta > 0$ such that
$$(\circ\circ)\qquad m\bigl( \bigl\{ x \in \mathbb R^d : Q_x(\tau < \infty) > \delta \bigr\} \bigr) \,>\, 0 \quad\text{where } \tau := \tau_A := \inf\Bigl\{ t > 0 : \int_0^t 1_A(\eta_s)\, ds > \varepsilon \Bigr\}\,.$$
(1) Write $f: \mathbb R^d \to [0,1]$ for the $\mathcal B(\mathbb R^d)$-measurable function $v \to Q_v(\tau < \infty)$. Then
$$f(\eta_t) \;=\; E_\bullet\bigl( 1_{\{ t + \tau \circ \theta_t < \infty \}} \mid \mathcal G_t \bigr)\;, \quad t \ge 0$$
('dot' indicating a common version of the conditional expectation under $Q_x$ for all $x$ in virtue of the Markov property) and from this
$$P_t f(v) \;=\; E_v\bigl( f(\eta_t) \bigr) \;=\; Q_v\bigl( t + \tau \circ \theta_t < \infty \bigr) \;\le\; Q_v(\tau < \infty) \;=\; f(v)\;, \quad v \in \mathbb R^d$$
which establishes the inequality
$$P_t f \;\le\; f \quad\text{for } 0 < t < \infty\,.$$
Define a process $N = (N_t)_{t \ge 0}$ on $(D, \mathcal D)$ by $N_t := f(\eta_t)$, $t \ge 0$. By $P_t f \le f$, $N$ is a non-negative $(\mathbb G, Q_y)$-supermartingale, for every value of a starting point $y \in \mathbb R^d$. Hence, for every value of $y \in \mathbb R^d$, there is a limit variable $N^{(y)}_\infty$ under $Q_y$ such that
$$N_t \;\longrightarrow\; N^{(y)}_\infty \quad Q_y\text{-almost surely as } t \to \infty\,.$$
On the one hand, this implies $f(y) \ge E_y( N^{(y)}_\infty )$ by Fatou, for every $y \in \mathbb R^d$. On the other hand, the set $\{ x : f(x) > \delta \}$ has positive $m$-measure by $(\circ\circ)$, and thus will be visited infinitely often, almost surely, by the process $\eta$ under $Q_y$ as a consequence of $(\circ)$, for every value of $y \in \mathbb R^d$: hence we must have $N^{(y)}_\infty \ge \delta$ $Q_y$-almost surely, for every $y \in \mathbb R^d$. Both assertions combined give $f(y) \ge \delta$ for all $y \in \mathbb R^d$. Thus from definition of $N$, the supermartingale $N$ together with its limit variables $N^{(y)}_\infty$ under $Q_y$ for starting points $y \in \mathbb R^d$ is separated away from $0$.
(2) Consider the event
$$R \;:=\; R_A \;=\; \Bigl\{ \int_0^\infty 1_A(\eta_s)\, ds = \infty \Bigr\} \;=\; \bigcap_t\, \bigl\{\, t + \tau \circ \theta_t < \infty \,\bigr\} \;\in\; \mathcal G_\infty\,.$$
For starting point $x \in \mathbb R^d$ fixed and for arbitrarily large $k < \infty$, introduce uniformly integrable $(\mathbb G, Q_x)$-martingales $N^{(1,x)}$, $N^{(2,x,k)}$:
$$N^{(1,x)}_t \;:=\; E_x\bigl( 1_R \mid \mathcal G_t \bigr)\;, \qquad N^{(2,x,k)}_t \;:=\; E_x\bigl( 1_R + 1_{R^c \cap \{ k + \tau \circ \theta_k < \infty \}} \mid \mathcal G_t \bigr)\;, \quad t \ge 0\,.$$
By definition of the supermartingale $N$ in step (1) and since the events $\{ t + \tau \circ \theta_t < \infty \}$ are decreasing to $R$ as $t \to \infty$, we can compare conditional expectations, and obtain $Q_x$-almost surely
$$1_R \;=\; \lim_{t\to\infty} N^{(1,x)}_t \;\le\; \lim_{t\to\infty} N_t \;=\; N^{(x)}_\infty \;\le\; \lim_{t\to\infty} N^{(2,x,k)}_t \;=\; 1_R + 1_{R^c \cap \{ k + \tau \circ \theta_k < \infty \}}\,.$$
Letting $k$ tend to $\infty$ we arrive at
$$1_R \;=\; N^{(x)}_\infty \quad Q_x\text{-almost surely}$$
where $x \in \mathbb R^d$ is arbitrary. We have seen in step (1) that all limit variables $N^{(x)}_\infty$ under $Q_x$ are separated from $0$ by $\delta > 0$. Hence the event $R$ is of full measure under $Q_x$ for every $x \in \mathbb R^d$. $\square$

Revuz and Yor [112, pp. 394–395] take $(\circ)$ in Proposition 9.11 as a definition for Harris recurrence of $(X_t)_{t \ge 0}$. From the very beginning, we have to know an invariant measure. The following sufficient criterion for Harris recurrence of the process $(X_t)_{t \ge 0}$ avoids this: for some deterministic $0 < T < \infty$ we might be able to establish that $(X_{kT})_{k \in \mathbb N_0}$ is a Harris chain.

9.11' Proposition. Assume that for some step size $T \in (0,\infty)$ and for some $\sigma$-finite measure $\widehat\mu$ on $(\mathbb R^d, \mathcal B(\mathbb R^d))$ we have
$$(\circ\circ)\qquad A \in \mathcal B(\mathbb R^d)\,,\ \widehat\mu(A) > 0 \;\Longrightarrow\; \sum_{k=1}^\infty 1_A(\eta_{kT}) = +\infty \quad Q_x\text{-almost surely, for every } x \in \mathbb R^d\,.$$
Then the process $X = (X_t)_{t \ge 0}$ is Harris recurrent. In fact, condition $(\circ\circ)$ implies the following stronger statement: path segments over $[kT, (k{+}1)T]$ taken from the path of $X$ form a Harris chain
$$\bigl( (X_{kT+v})_{0 \le v \le T} \bigr)_{k \in \mathbb N_0}$$
taking values in the Skorohod space $(D([0,T]), \mathcal D([0,T]))$ of càdlàg functions $[0,T] \to \mathbb R^d$.

Proof. In this proof, we write for short $(E, \mathcal E)$ instead of $(\mathbb R^d, \mathcal B(\mathbb R^d))$, everything remaining valid when $(E, \mathcal E)$ is a Polish space and $(D, \mathcal D, \mathbb G)$ the space of càdlàg functions $[0,\infty) \to E$ with canonical process $\eta = (\eta_t)_{t \ge 0}$.
(1) By condition $(\circ\circ)$, the grid chain $\widehat\eta := (\eta_{kT})_{k \in \mathbb N_0}$ is a Harris chain, has a unique invariant measure $\widehat m$ (unique up to multiplication with a constant), and $(\circ\circ)$ holds with $\widehat m$ in place of $\widehat\mu$ (this is from Harris [45], see also [101, p. 43] and [111, p. 92]).
(2) We show that when $(\circ\circ)$ holds, $T$-segments in the path of $\eta$ form a Harris chain.
(a) As a consequence of step (1), pairs $(\eta_{kT}, \eta_{(k+1)T})_{k \in \mathbb N_0}$ form a Harris chain on $(E \times E, \mathcal E \otimes \mathcal E)$ whose invariant measure is $\widehat m(dy_1)\, P_T(y_1, dy_2)$.
(b) The next argument is as in [56, Sect. 2]. Write $(\pi_t)_{t \ge 0}$ for the process of coordinate projections on $(D([0,T]), \mathcal D([0,T]))$, and let $\overline m$ denote the unique measure on $D([0,T])$ specified by the following set of finite dimensional distributions:
$$\overline m\bigl( \pi_{t_j} \in A_j\,,\ 0 \le j \le l \bigr) \;=\; \int_{E^{l+1}} \widehat m(dy_0)\, 1_{A_0}(y_0) \prod_{j=1}^l P_{t_j - t_{j-1}}(y_{j-1}, dy_j)\, 1_{A_j}(y_j)$$
with $0 = t_0 < t_1 < \cdots < t_l = T$ arbitrary, $l \ge 1$, and $A_j \in \mathcal E$ for $0 \le j \le l$.
Since $D([0,T])$ is a Polish space, $\overline m$ admits the following disintegration: there is a regular version of the conditional law under $\overline m$ of the process $(\pi_t)_{0 \le t \le T}$ given the pair $(\pi_0, \pi_T)$, i.e. a transition probability $K(\cdot, \cdot)$ from $E \times E$ to $D([0,T])$ such that
$$(\star)\qquad \overline m(F) \;=\; \int_{E \times E} \widehat m(dy_1)\, P_T(y_1, dy_2)\; K\bigl( (y_1, y_2)\,,\ F \bigr)\;, \quad F \in \mathcal D([0,T])\,.$$
Here the measure $\widehat m(dy_1) P_T(y_1, dy_2)$ is as in (a), and the laws $K((y_1,y_2), \cdot)$ on $D([0,T])$ correspond to bridges from state $y_1$ at time $0$ to state $y_2$ at time $T$.
(c) Consider a set $F \in \mathcal D([0,T])$ with $\overline m(F) > 0$. Then by $(\star)$, there is some $\varepsilon > 0$ such that
$$B_F \;:=\; \bigl\{\, (y_1, y_2) \,:\, K\bigl( (y_1,y_2)\,,\ F \bigr) > \varepsilon \,\bigr\}$$
has strictly positive $\widehat m(dy_1) P_T(y_1, dy_2)$-measure.
As a consequence of the Harris property in (a), the chain of pairs $(\eta_{kT}, \eta_{(k+1)T})_{k \in \mathbb N_0}$ will visit the set $B_F$ infinitely often, independently of the choice of a starting point. Then by definition of $B_F$ combined with $(\star)$, the kernel $K$ pasting in bridges, the segment chain will visit $F$ infinitely often: we obtain
$$F \in \mathcal D([0,T])\,,\ \overline m(F) > 0: \qquad \sum_{k=0}^\infty 1_F\bigl( (\eta_{kT+v})_{0 \le v \le T} \bigr) \;=\; \infty$$
almost surely, independently of the choice of a starting point. Thus the segment chain
$$(+)\qquad \chi = (\chi_k)_{k \in \mathbb N_0}\;, \quad \chi_k := (\eta_{kT+v})_{0 \le v \le T}$$
is a Harris chain taking values in $(D([0,T]), \mathcal D([0,T]))$. It admits a unique invariant measure. From the definition of $\overline m$ we see that the measure $\overline m$ is invariant for the segment chain. Hence $\chi$ is Harris with invariant measure $\overline m$.
(3) We show that $\eta = (\eta_t)_{t \ge 0}$ is a Harris process. Introduce a $\sigma$-finite measure $\mu$ on $(E, \mathcal E)$:
$$\mu(A) \;:=\; \int_0^T ds\; [\widehat m P_s](A)\;, \quad A \in \mathcal E\,.$$
Fix some set $B \in \mathcal E$ with $\widehat m(B) = 1$ (recall that we are free to multiply $\widehat m$ with positive constants). Together with the $\mathcal D([0,T])$-measurable function
$$G: D([0,T]) \to \mathbb R \quad\text{given by } \alpha \to 1_{\{\pi_0 \in B\}}(\alpha)\;, \quad \alpha \in D([0,T])$$
consider a family of $\mathcal D([0,T])$-measurable functions indexed by $A \in \mathcal E$,
$$F_A: D([0,T]) \to \mathbb R \quad\text{given by } \alpha \to \int_0^T ds\; 1_A(\alpha(s))\;, \quad \alpha \in D([0,T])\,.$$
Note that $\overline m(F_A) = \mu(A)$ for $A \in \mathcal E$, and $\overline m(G) = \widehat m(B) = 1$. The ratio limit theorem for the segment chain (+) yields the convergence
$$(++)\qquad \lim_{m\to\infty} \frac{\int_0^{mT} 1_A(\eta_s)\, ds}{\sum_{k=0}^{m-1} 1_B(\eta_{kT})} \;=\; \lim_{m\to\infty} \frac{\sum_{k=0}^{m-1} F_A(\chi_k)}{\sum_{k=0}^{m-1} G(\chi_k)} \;=\; \frac{\overline m(F_A)}{\overline m(G)} \;=\; \mu(A)$$
$Q_x$-almost surely, for every choice of a starting point $x \in E$. From $\widehat m(B) > 0$ and the Harris property of the grid chain $\widehat\eta = (\eta_{kT})_{k \in \mathbb N_0}$ in (1), denominators in (++) tend to $\infty$ $Q_x$-almost surely, for every choice of a starting point $x \in E$. Hence, for sets $A \in \mathcal E$ with $\mu(A) > 0$, also numerators in (++) increase to $\infty$ $Q_x$-almost surely, for every choice of a starting point $x \in E$:
$$A \in \mathcal E\,,\ \mu(A) > 0: \qquad \int_0^\infty 1_A(\eta_s)\, ds \;=\; \infty \quad Q_x\text{-almost surely, for every } x \in E\,.$$
This is property $(\diamond)$ in Definition 9.4: hence the continuous-time process $\eta = (\eta_t)_{t \ge 0}$ is Harris. By Theorem 9.5, $\eta$ admits a unique (up to constant multiples) invariant measure $m$ on $(E, \mathcal E)$; it remains to identify $m$.
(4) We show that the three measures $m$, $\widehat m$, $\mu$ can be identified. Select some set $C \in \mathcal E$ with the property $\overline m(F_C) = \mu(C) = 1$. Then similarly to (++), for every $A \in \mathcal E$, the ratio limit theorem for the segment chain (+) yields
$$\lim_{t\to\infty} \frac{\int_0^t 1_A(\eta_s)\, ds}{\int_0^t 1_C(\eta_s)\, ds} \;=\; \frac{\overline m(F_A)}{\overline m(F_C)} \;=\; \mu(A) \quad Q_x\text{-almost surely, for every } x \in E\,.$$
Thus the measures $m$ and $\mu$ coincide, up to some constant multiple, by Theorem 9.6 and the Harris property of $(\eta_t)_{t \ge 0}$ in step (3). Next, the measure $\mu = \int_0^T ds\, [\widehat m P_s]$ is by definition invariant for the grid chain $(\eta_{kT})_{k \in \mathbb N_0}$. Thus the Harris property in step (1) shows that $\widehat m$ and $\mu$ coincide up to constant multiples. This concludes the proof of the proposition. $\square$

It follows from Azéma, Duflo and Revuz [2] that the Harris property of $(X_t)_{t \ge 0}$ in continuous time is equivalent to the Harris property of the chain $(X_{S_n})_{n \in \mathbb N_0}$ which corresponds to observation of the continuous-time process after independent exponential waiting times (i.e. $S_n := \tau_1 + \cdots + \tau_n$ where $(\tau_j)_j$ are i.i.d. exponentially distributed and independent of $X$).



9.4 One-dimensional Diffusions
For one-dimensional diffusions, it is an easy task to check Harris recurrence, in contrast to higher dimensions. We consider first diffusions with state space $(-\infty, \infty)$ and then – see [69, Sect. 5.5] – diffusions taking values in open intervals $I \subseteq \mathbb R$. The following criterion is widely known; we take it from Khasminskii [73, 1st ed., Chap. III.8, Ex. 2 on p. 105].

9.12 Proposition. In dimension $d = 1$, consider a continuous semi-martingale
$$dX_t \;=\; b(X_t)\, dt + \sigma(X_t)\, dW_t$$
where $W$ is one-dimensional Brownian motion, and where $b(\cdot)$ and $\sigma(\cdot) > 0$ are continuous on $\mathbb R$.
(a) If the (strictly increasing) mapping $S: \mathbb R \to \mathbb R$ defined by
$$(\ast)\qquad S(x) \;:=\; \int_0^x s(y)\, dy \quad\text{where } s(y) := \exp\Bigl( -\int_0^y \frac{2b}{\sigma^2}(v)\, dv \Bigr)\;, \quad x, y \in \mathbb R$$
is a bijection onto $\mathbb R$, then the process $X = (X_t)_{t \ge 0}$ is recurrent in the sense of Harris, and the invariant measure $m$ is given by
$$m(dx) \;=\; \frac{1}{s(x)\, \sigma^2(x)}\, dx \;=\; \frac{1}{\sigma^2(x)} \exp\Bigl( \int_0^x \frac{2b}{\sigma^2}(v)\, dv \Bigr)\, dx \quad\text{on } (\mathbb R, \mathcal B(\mathbb R))\,.$$
The mapping $S(\cdot)$ defined by $(\ast)$ is called the scale function.
(b) In the special case $b(\cdot) \equiv 0$, we have $S(\cdot) = \mathrm{id}$ on $\mathbb R$: thus a diffusion $X$ without drift – for $\sigma(\cdot)$ continuous and strictly positive on $\mathbb R$ – is Harris with invariant measure $\frac{1}{\sigma^2(x)}\, dx$.

Proof. Let $Q_x$ on $(D, \mathcal D, \mathbb G)$ denote the law of $X$ with starting point $x$. We give a proof in three steps, from Brownian motion via the driftless case to continuous semi-martingales as above.
(1) Special case $\sigma(\cdot) \equiv 1$ and $b(\cdot) \equiv 0$. Here $X$ is one-dimensional Brownian motion starting from $x$, thus Lebesgue measure $\lambda$ on $\mathbb R$ is an invariant measure since
$$\lambda P_t(f) \;=\; \int dx \int dy\, \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(y-x)^2}{2t}}\, f(y) \;=\; \int dy\, f(y) \;=\; \lambda(f)\,.$$
For any Borel set $A$ of positive Lebesgue measure and for any starting point $x$ for $X$, select $n$ large enough for $\lambda(A \cap B_{n-1}(0)) > 0$ and $x \in B_{n-1}(0)$, and define $\mathbb G$-stopping times
$$S_m \;:=\; \inf\{\, t > T_{m-1} : \eta_t > n \,\}\;, \quad T_m \;:=\; \inf\{\, t > S_m : \eta_t < -n \,\}\;, \quad m \ge 1\,,\ T_0 \equiv 0\,.$$
By the law of the iterated logarithm, these are finite $Q_x$-almost surely, and we have $T_m \uparrow \infty$ $Q_x$-almost surely as $m \to \infty$. Hence $A$ is visited infinitely often by the canonical process $\eta$ under $Q_x$, and Proposition 9.11 establishes Harris recurrence. By Theorem 9.5, $m = \lambda$ is the unique invariant measure, and we have null recurrence since $m$ has infinite total mass on $\mathbb R$.
(2) Special case $b(\cdot) \equiv 0$. Here $X$ is a diffusion without drift. On $(D, \mathcal D, \mathbb G)$, let $A$ denote the additive functional
$$A_t \;=\; \int_0^t \sigma^2(\eta_s)\, ds\;, \quad t \ge 0$$
and define $M_t := \eta_t - \eta_0$, $t \ge 0$. Then $M$ is a continuous local martingale under $Q_x$ with angle bracket $\langle M \rangle = A$. A result by Lepingle [85] states that $Q_x$-almost surely
$$\text{on the event } R^c := \Bigl\{ \lim_{t\to\infty} \langle M \rangle_t < \infty \Bigr\}\,, \quad \lim_{t\to\infty} M_t \text{ exists in } \mathbb R\,.$$
Since $\sigma^2(\cdot)$ is continuous and strictly positive on $\mathbb R$, we infer that the event $R^c = \{ \int_0^\infty \sigma^2(\eta_s)\, ds < \infty \}$ is a null set under $Q_x$. As a consequence, $A$ is (strictly) increasing to $\infty$ $Q_x$-almost surely. Define
$$\tau(u) \;:=\; \inf\{\, t > 0 : A_t > u \,\}\;, \quad 0 < u < \infty\,,\ \tau_0 \equiv 0\,.$$
Then $\tau(u)$ tends to $\infty$ $Q_x$-almost surely as $u \to \infty$. As a consequence, the characterisation theorem of P. Lévy (cf. [61, p. 85]) shows that
$$B \;:=\; \bigl( M_{\tau(u)} \bigr)_{u \ge 0} \;=\; \bigl( \eta_{\tau(u)} - \eta_0 \bigr)_{u \ge 0}$$
is a standard Brownian motion. For $f \ge 0$ measurable, the time change gives
$$(+)\qquad \int_0^{\tau(v)} f(\eta_s)\, \sigma^2(\eta_s)\, ds \;=\; \int_0^{\tau(v)} f(\eta_s)\, dA_s \;=\; \int_0^{\tau(v)} f(\eta_0 + B_{A_s})\, dA_s \;=\; \int_0^v f(\eta_0 + B_r)\, dr\,.$$
By assumption on $\sigma(\cdot)$, we may replace measurable functions $f \ge 0$ in (+) by $\frac{f}{\sigma^2}$ and obtain
$$(++)\qquad \int_0^{\tau(v)} f(\eta_s)\, ds \;=\; \int_0^v \frac{f}{\sigma^2}(\eta_0 + B_r)\, dr\;, \quad 0 < v < \infty\,.$$
By step (1), one-dimensional Brownian motion is a Harris process. Letting $v$ tend to $\infty$ in (++) and exploiting again continuity and strict positivity of $\sigma(\cdot)$, we obtain from (++)
$$A \in \mathcal B(\mathbb R)\,,\ \lambda(A) > 0 \;\Longrightarrow\; \int_0^\infty 1_A(\eta_s)\, ds \;=\; \infty \quad Q_x\text{-almost surely}\,.$$
This holds for an arbitrary choice of a starting point $x \in \mathbb R$. We thus have condition $(\diamond)$ in Definition 9.4 with $\mu := \lambda$, thus the driftless diffusion $X$ is Harris recurrent. It remains to determine the invariant measure $m$, unique by Theorem 9.5 up to constant multiples. We shall show
$$m(dx) \;=\; \frac{1}{\sigma^2(x)}\, dx\,.$$
Select $g \ge 0$ measurable with $m(g) = 1$. Then for measurable functions $f \ge 0$, the Ratio Limit Theorem 9.6 combined with (++) gives
$$m(f) \;=\; \lim_{t\to\infty} \frac{\int_0^t f(\eta_s)\, ds}{\int_0^t g(\eta_s)\, ds} \;=\; \lim_{v\to\infty} \frac{\int_0^{\tau(v)} f(\eta_s)\, ds}{\int_0^{\tau(v)} g(\eta_s)\, ds} \;=\; \lim_{v\to\infty} \frac{\int_0^v \frac{f}{\sigma^2}(\eta_0 + B_s)\, ds}{\int_0^v \frac{g}{\sigma^2}(\eta_0 + B_s)\, ds}\,.$$
By step (1) above, $B$ is a Harris process whose invariant measure $\lambda$ is translation invariant. Thus the ratio limit theorem for $B$ applied to the last right-hand side shows
$$m(f) \;=\; \frac{\lambda\bigl( \frac{f}{\sigma^2} \bigr)}{\lambda\bigl( \frac{g}{\sigma^2} \bigr)} \quad\text{for all } f \ge 0 \text{ measurable}\,.$$
As a consequence we obtain $0 < \lambda\bigl( \frac{g}{\sigma^2} \bigr) < \infty$ (by definition of an invariant measure, $m$ is $\sigma$-finite and non-null); then, up to multiplication with a constant, we identify $m$ on $(\mathbb R, \mathcal B(\mathbb R))$ as $m(dx) = \frac{1}{\sigma^2(x)}\, dx$. This proves part (b) of the proposition.
(3) We consider a semi-martingale $X$ admitting the above representation, and assume that $S: \mathbb R \to \mathbb R$ defined by $(\ast)$ is a bijection onto $\mathbb R$. Let $L$ denote the Markov generator of $X$ acting on $C^2$ functions
$$Lf(x) \;=\; b(x) f'(x) + \frac12\, \sigma^2(x) f''(x)\;, \quad x \in \mathbb R\,.$$
It follows from the definition in $(\ast)$ that $LS \equiv 0$. Thus $S \circ X$ is a local martingale by Ito formula. More precisely,
$$\widetilde X \;:=\; S \circ X \;=\; \bigl( S(X_t) \bigr)_{t \ge 0}$$
is a diffusion without drift as in step (2) solving the equation
$$d\widetilde X_t \;=\; \widetilde\sigma\bigl( \widetilde X_t \bigr)\, dW_t \quad\text{with } \widetilde\sigma := (s \cdot \sigma) \circ S^{-1}\,.$$
It has been shown in step (2) that $\widetilde X$ is a Harris process with invariant measure $\widetilde m$ given by
$$\widetilde m(d\widetilde x) \;=\; \frac{1}{\widetilde\sigma^2(\widetilde x)}\, d\widetilde x \;=\; \frac{1}{(s \cdot \sigma)^2\bigl( S^{-1}(\widetilde x) \bigr)}\, d\widetilde x\;, \quad \widetilde x \in \mathbb R$$
and thus
$$(\circ)\qquad \widetilde f \text{ non-negative, } \mathcal B(\mathbb R)\text{-measurable, } \widetilde m(\widetilde f) > 0: \quad \int_0^\infty \widetilde f\bigl( \widetilde X_t \bigr)\, dt \;=\; \infty$$
almost surely, independently of the choice of a starting point for $\widetilde X$.
If we define a measure $m$ on $(\mathbb R, \mathcal B(\mathbb R))$ as the image of $\widetilde m$ under the mapping $\widetilde x \to S^{-1}(\widetilde x)$, then
$$(\circ\circ)\qquad m(f) \;=\; \int \widetilde m(d\widetilde x)\, f\bigl( S^{-1}(\widetilde x) \bigr) \;=\; \widetilde m\bigl( f \circ S^{-1} \bigr)$$
and the transformation formula gives
$$m(dx) \;=\; \frac{1}{\widetilde\sigma^2(S(x))}\, s(x)\, dx \;=\; \frac{1}{\sigma^2(x)\, s(x)}\, dx\;, \quad x \in \mathbb R\,.$$
Now, for $f \ge 0$ measurable, we combine $(\circ)$ and $(\circ\circ)$ with the obvious representation
$$\int_0^t f(X_v)\, dv \;=\; \int_0^t \widetilde f\bigl( \widetilde X_v \bigr)\, dv \quad\text{for } \widetilde f := f \circ S^{-1}$$
in order to finish the proof. First, in this combination, the Harris property of $\widetilde X$ gives
$$f \text{ non-negative, } \mathcal B(\mathbb R)\text{-measurable, } m(f) > 0: \quad \int_0^\infty f(X_t)\, dt \;=\; \infty$$
almost surely, for all choices of a starting point for $X$. This is property $(\diamond)$ in Definition 9.4 with $\mu = m$. Thus $X$ is a Harris process. Second, for $g \ge 0$ as in step (2) above and $\widetilde g := g \circ S^{-1}$ which implies $m(g) = \widetilde m(\widetilde g) = 1$, the ratio limit theorem
$$\lim_{t\to\infty} \frac{\int_0^t f(X_s)\, ds}{\int_0^t g(X_s)\, ds} \;=\; \lim_{t\to\infty} \frac{\int_0^t \widetilde f(\widetilde X_s)\, ds}{\int_0^t \widetilde g(\widetilde X_s)\, ds} \;=\; \widetilde m\bigl( f \circ S^{-1} \bigr) \;=\; m(f)$$
identifies $m$ as the invariant measure for $X$. We have proved part (a) of the proposition. $\square$

9.12' Examples. The following examples (a)–(c) are applications of Proposition 9.12.
(a) The Ornstein–Uhlenbeck process $dX_t = \vartheta X_t\, dt + \sigma\, dW_t$ with parameters $\vartheta < 0$ and $\sigma > 0$ is positive Harris recurrent with invariant probability $m = \mathcal N\bigl( 0, \frac{\sigma^2}{2|\vartheta|} \bigr)$.
(b) The diffusion $dX_t = (\alpha - \beta X_t)\, dt + \bigl( 1 + X_t^2 \bigr)^{\gamma/2}\, dW_t$ with parameters $\alpha \in \mathbb R$, $\beta > 0$, $0 < \gamma < 1$ is positive Harris recurrent. We have
$$\int_0^x \frac{2b}{\sigma^2}(v)\, dv \;\sim\; -\,\frac{\beta}{1-\gamma}\, |x|^{2(1-\gamma)} \quad\text{as } x \to +\infty \text{ or } x \to -\infty$$
with notations of $(\ast)$ in Proposition 9.12. The invariant measure $m$ of $X$ admits finite moments of arbitrary order.
(c) The diffusion $dX_t = \vartheta\, \frac{X_t}{1+X_t^2}\, dt + \sigma\, dW_t$ with parameters $\sigma > 0$ and $\vartheta \le \frac{\sigma^2}{2}$ is recurrent in the sense of Harris. Normed as in Proposition 9.12, the invariant measure is $m(dx) = \frac{1}{\sigma^2}\, \bigl( \sqrt{1+x^2} \bigr)^{2\vartheta/\sigma^2}\, dx$. Null recurrence holds if and only if $|\vartheta| \le \frac{\sigma^2}{2}$.
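The recurrence check behind Proposition 9.12 and Example (c) is easy to automate: approximate $s(\cdot)$ and $S(\cdot)$ by quadrature, test whether $S$ tends to $\pm\infty$, and integrate $1/(s\,\sigma^2)$ to separate positive from null recurrence. A minimal sketch for the drift of Example (c); the grids and cutoffs are crude numerical proxies, not rigorous tests:

```python
import numpy as np

sigma = 1.0

def scale_and_mass(theta, x_max=2000.0, k=200_000):
    """S(x_max) and total mass m(R) for dX = theta*X/(1+X^2) dt + sigma dW.

    In this example b is odd, so s is even and S is odd: it suffices to
    integrate over [0, x_max] and double the mass."""
    x = np.linspace(0.0, x_max, k)
    dx = x[1] - x[0]
    two_b = 2.0 * theta * x / ((1.0 + x**2) * sigma**2)      # 2b/sigma^2
    I = np.concatenate([[0.0], np.cumsum(two_b[1:] * dx)])   # int_0^x 2b/sigma^2
    s = np.exp(-I)
    S_right = np.sum(s) * dx                                 # approx S(x_max)
    mass = 2.0 * np.sum(dx / (s * sigma**2))                 # approx m(R)
    return S_right, mass

for theta in (-1.0, 0.2, 0.6):
    print(theta, scale_and_mass(theta))
# theta = -1.0: S(x_max) explodes, m(R) ~ pi          -> positive recurrent
# theta =  0.2: S(x_max) grows with x_max, m(R) huge  -> null recurrent
# theta =  0.6 > sigma^2/2: S(x_max) stays bounded    -> S not onto R, not Harris
```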

The following is a variant of Proposition 9.12 for diffusions taking values in open intervals $I \subseteq \mathbb R$.

9.13 Proposition. In dimension $d = 1$, consider an open interval $I \subseteq \mathbb R$, write $\mathcal B(I)$ for the Borel $\sigma$-field. Consider a continuous semi-martingale $X$ taking values in $I$ and admitting a representation
$$dX_t \;=\; b(X_t)\, dt + \sigma(X_t)\, dW_t\;, \quad t \ge 0\,.$$
We assume that $b(\cdot)$ and $\sigma(\cdot) > 0$ are continuous functions $I \to \mathbb R$. Fix any point $x_0 \in I$. If the mapping $S: I \to \mathbb R$ defined by
$$(\ast)\qquad S(x) \;:=\; \int_{x_0}^x s(y)\, dy \quad\text{where } s(y) := \exp\Bigl( -\int_{x_0}^y \frac{2b}{\sigma^2}(v)\, dv \Bigr)$$
maps $I$ onto $\mathbb R$, then $X$ is recurrent in the sense of Harris, and the invariant measure $m$ on $(I, \mathcal B(I))$ is given by
$$m(dx) \;=\; \frac{1}{\sigma^2(x)\, s(x)}\, dx \;=\; \frac{1}{\sigma^2(x)} \exp\Bigl( \int_{x_0}^x \frac{2b}{\sigma^2}(v)\, dv \Bigr)\, dx\;, \quad x \in I\,.$$

Proof. This is a modification of part (3) of the proof of Proposition 9.12 since $I$ is open. For the $I$-valued semi-martingale $X$, the transformed process $\widetilde X = S \circ X$ is an $\mathbb R$-valued diffusion without drift as in step (2) of the proof of Proposition 9.12, thus Harris, with invariant measure $\widetilde m$ on $(\mathbb R, \mathcal B(\mathbb R))$. Defining $m$ from $\widetilde m$ by
$$m(f) \;:=\; \widetilde m\bigl( f \circ S^{-1} \bigr) \quad\text{for } f: I \to [0,\infty)\ \mathcal B(I)\text{-measurable}$$
we obtain a $\sigma$-finite measure on $(I, \mathcal B(I))$ such that sets $A \in \mathcal B(I)$ with $m(A) > 0$ are visited infinitely often by the process $X$, almost surely, under every choice of a starting value in $I$. Thus $X$ is Harris, and the ratio limit theorem identifies $m$ as the invariant measure of $X$. $\square$

9.13' Example. For every choice of a starting point in $I := (0,\infty)$, the Cox–Ingersoll–Ross process $dX_t = (\alpha - \beta X_t)\, dt + \sigma \sqrt{X_t}\, dW_t$ with parameters $\beta > 0$, $\sigma > 0$ and $\frac{2\alpha}{\sigma^2} \ge 1$ almost surely never hits $0$ (e.g. Ikeda and Watanabe [61, pp. 235–237]). Proposition 9.13 shows that $X$ is positive Harris recurrent. The invariant probability $m$ is a Gamma law $\Gamma\bigl( \frac{2\alpha}{\sigma^2}, \frac{2\beta}{\sigma^2} \bigr)$. $\square$
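With $b(x) = \alpha - \beta x$ and $\sigma^2(x) = \sigma^2 x$, the density $\frac{1}{\sigma^2(x)\,s(x)}$ of Proposition 9.13 is proportional to $x^{2\alpha/\sigma^2 - 1}\, e^{-2\beta x/\sigma^2}$ on $(0,\infty)$, which is the Gamma law above. A Monte Carlo cross-check via Theorem 9.6(b), using a full-truncation Euler scheme (our choice of scheme and constants, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha_, beta_, sigma = 1.0, 1.0, 1.0    # here 2*alpha/sigma^2 = 2 >= 1
t, dt = 20_000.0, 0.01
m = int(t / dt)

X = np.empty(m + 1); X[0] = 1.0
dW = rng.normal(0.0, np.sqrt(dt), m)
for i in range(m):                      # full-truncation Euler step for CIR
    Xp = max(X[i], 0.0)
    X[i + 1] = X[i] + (alpha_ - beta_ * Xp) * dt + sigma * np.sqrt(Xp) * dW[i]

shape, rate = 2.0 * alpha_ / sigma**2, 2.0 * beta_ / sigma**2
print(X.mean(), shape / rate)           # Gamma mean alpha/beta = 1.0
print(X.var(),  shape / rate**2)        # Gamma variance        = 0.5
```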

9.13'' Example. On some $(\Omega, \mathcal A, P)$, for some parameter $\vartheta \ge 0$ and for deterministic starting point in $I := (0,1)$, consider the diffusion
$$dX_t \;=\; \vartheta\, \bigl( \tfrac12 - X_t \bigr)\, dt + \sqrt{X_t (1 - X_t)}\; dW_t$$
up to the stopping time $\tau := \inf\{ t > 0 : X_t \notin (0,1) \}$. Then the following holds:
$$P(\tau = \infty) = 1 \quad\text{in the case where } \vartheta \ge 1\,,$$
$$P(\tau < \infty) = 1 \quad\text{in the case where } 0 \le \vartheta < 1\,.$$
Only in the case where $\vartheta \ge 1$ is the process $X$ a diffusion taking values in $I$ in the sense of Proposition 9.13. For all $\vartheta \ge 1$, $X$ is positive Harris, and the invariant probability on $(0,1)$ is the Beta law $B(\vartheta, \vartheta)$.

Proof. (1) We fix $\vartheta \ge 0$. Writing $b(x) = \vartheta(\frac12 - x)$ and $\sigma^2(x) = x(1-x)$ for $0 < x < 1$, we have
$$\int_{x_0}^y \frac{2b}{\sigma^2}(v)\, dv \;=\; \vartheta \int_{x_0}^y \frac{1 - 2v}{v(1-v)}\, dv \;=\; \vartheta\, \ln\Bigl( \frac{y(1-y)}{x_0(1-x_0)} \Bigr)\;, \quad 0 < y < 1$$
with respect to any fixed $x_0 \in (0,1)$. The function $s(\cdot)$ in $(\ast)$ of Proposition 9.13, given by
$$s(y) \;=\; \Bigl( \frac{y(1-y)}{x_0(1-x_0)} \Bigr)^{-\vartheta}\;, \quad 0 < y < 1\,,$$
behaves as $c\, y^{-\vartheta}$ for $y \downarrow 0$ and as $c\, (1-y)^{-\vartheta}$ for $y \uparrow 1$, with some constant $c$ which depends on $\vartheta$ and on $x_0$. Hence $S(\cdot)$ of $(\ast)$ of Proposition 9.13 is such that
$$S(0^+) = -\infty\,,\ S(1^-) = +\infty \quad\text{in the case where } \vartheta \ge 1\,,$$
$$S(0^+) > -\infty\,,\ S(1^-) < +\infty \quad\text{for } 0 \le \vartheta < 1\,.$$
Define a $\sigma$-finite measure $m$ on $(I, \mathcal B(I))$ by
$$m(dx) \;:=\; 1_{(0,1)}(x)\, \frac{1}{s(x)\, \sigma^2(x)}\, dx \;=\; 1_{(0,1)}(x)\, \bigl[ x_0(1-x_0) \bigr]^{-\vartheta}\, \bigl[ x(1-x) \bigr]^{\vartheta - 1}\, dx\,.$$
Up to a factor $2$, this is the 'speed measure' in terms of Karatzas and Shreve [69, p. 343]. In the case where $\vartheta = 0$, this measure has infinite total mass $m(I) = \infty$ such that
$$m\bigl( (x, x_0] \bigr) \;\sim\; c\, |\log(x)| \ \text{ as } x \downarrow 0\,, \qquad m\bigl( [x_0, x) \bigr) \;\sim\; c\, |\log(1-x)| \ \text{ as } x \uparrow 1$$
for some constant $c$ depending on $x_0$ and $\vartheta$. In the case where $\vartheta > 0$, $m$ has finite total mass and – arranging for the norming factor – equals the Beta law $B(\vartheta, \vartheta)$ on $(0,1)$. We shall also need the function
$$V(x) \;:=\; \int_{x_0}^x dy\; s(y) \int_{x_0}^y \frac{1}{s(z)\, \sigma^2(z)}\, dz \;=\; \int_{x_0}^x \bigl[ S(x) - S(z) \bigr]\, m(dz)\;, \quad 0 < x < 1$$
as in [69, p. 347].
(2) In the case where $\vartheta = 0$, $X$ is a local martingale, we have $s(\cdot) \equiv 1$ on $(0,1)$ and thus
$$V(x) \;=\; \begin{cases} \int_{x_0}^x dy\; m\bigl( [x_0, y) \bigr) & \text{if } x_0 < x < 1\,, \\[4pt] \int_x^{x_0} dy\; m\bigl( (y, x_0] \bigr) & \text{if } 0 < x < x_0 \end{cases}$$
by definition. Since $|\log(\cdot)|$ is integrable on small neighbourhoods of $0^+$, we have for $\vartheta = 0$
$$(+)\qquad V(0^+) < +\infty\,, \quad V(1^-) < +\infty\,.$$
According to [69, p. 350], (+) implies $P(\tau < \infty) = 1$. Thus in the case where $\vartheta = 0$, almost surely, the martingale $X^\tau = (X_{t\wedge\tau})_{t \ge 0}$ is absorbed at the boundary of $I$ in finite time.
(3) Consider the case $0 < \vartheta < 1$ where both $m(I)$ and $S(1^-) - S(0^+)$ are finite. By definition of $V(\cdot)$, this again leads to (+) and thus to $P(\tau < \infty) = 1$: $X$ hits the boundary of $I$ in finite time.
(4) Consider the case $\vartheta \ge 1$ where $S(\cdot)$ maps $I$ onto $\mathbb R$. In this case, [69, p. 345] shows that $X$ has infinite life time in $I$. Then Proposition 9.13 applies and yields Harris recurrence of $X$ with invariant measure $m$. Since up to constants $m$ is a probability law, $X$ is positive recurrent. $\square$

The following result on null recurrent diffusions without drift provides exact constants for weak convergence of additive functionals, both for the invariant measure and for the norming functions in Theorem 9.8. It was proved in Khasminskii [73]. We quote part (a) from [73, 2nd ed., pp. 129–130] and part (b) from [73, 2nd ed., pp. 134–136], embedding it into the Harris setting of Proposition 9.12.

9.14 Proposition. In dimension $d = 1$, for $\sigma(\cdot) > 0$ locally Lipschitz on $\mathbb R$ and satisfying a linear growth condition, consider the Harris process $\widetilde X$ solution to
$$d\widetilde X_t \;=\; \sigma\bigl( \widetilde X_t \bigr)\, dW_t\;, \quad t \ge 0$$
taking values in $(-\infty, \infty)$. Normed as in Proposition 9.12(b), consider the invariant measure $\widetilde m(dx) = \frac{1}{\sigma^2(x)}\, dx$ of $\widetilde X$ together with the sequence of stopping times $\tau(k) \uparrow \infty$ as $k \to \infty$:
$$\tau(k) \;:=\; \inf\bigl\{\, t > \rho(k) : \widetilde X_t < 0 \,\bigr\}\;, \quad \rho(k) \;:=\; \inf\bigl\{\, t > \tau(k-1) : \widetilde X_t > 1 \,\bigr\}\;, \quad k \ge 1\,,\ \tau(0) \equiv 0\,.$$
(a) For $k \ge 1$, the i.i.d. excursions $\widetilde X\, 1_{[\![\tau(k), \tau(k+1)[\![}$ away from $0$ are such that for $f \ge 0$ measurable
$$E\Bigl( \int_{\tau(k)}^{\tau(k+1)} f\bigl( \widetilde X_s \bigr)\, ds \Bigr) \;=\; 2\, \widetilde m(f)\,.$$
(b) Assume that there is $\beta > -1$ and constants $A_+ + A_- > 0$ such that the limits
$$\lim_{x \to \pm\infty}\, \frac{1}{x} \int_0^x |v|^\beta\, \frac{2}{\sigma^2(v)}\, dv \;=:\; A_\pm \;\in\; [0, \infty)$$
exist. Then $\widetilde X$ is null recurrent. Defining $\alpha := \frac{1}{\beta + 2}$ we have $0 < \alpha < 1$ and regular variation
$$P\bigl( \tau(2) - \tau(1) > t \bigr) \;\sim\; \frac{\alpha\, 2^{\alpha}\, \bigl( [A_+]^{\alpha} + [A_-]^{\alpha} \bigr)}{\Gamma(1+\alpha)}\; t^{-\alpha} \quad\text{as } t \to \infty\,.$$
(On the last right-hand side, we have corrected a typing error in formula (4.101) of [73, 2nd ed.], in accordance with (4.110), (4.111), (4.84) and Lemma 4.19 in [73, 2nd ed.]; see also Bingham, Goldie and Teugels [12, p. 349].)
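For $\sigma \equiv 1$ (standard Brownian motion, $\widetilde m =$ Lebesgue measure) the constant in part (a) can be verified by splitting a cycle into its two legs and using expected local times $E_x[L^y_{T_b}] = 2\,(b - x \vee y)$ of Brownian motion killed at a level; this computation is ours and is supplied only as an illustration. The two Green functions sum to the constant $2$, which recovers $2\,\widetilde m(f)$ for every $f \ge 0$:

```python
import numpy as np

# One cycle [tau(k), tau(k+1)) of standard BM consists of two legs:
#   leg 1: from 0 until first hitting 1; occupation density G1(0,y)
#   leg 2: from 1 until first hitting 0; occupation density G2(1,y)
# with G1(0,y) = 2*(1 - max(y,0)) for y <= 1 and G2(1,y) = 2*min(y,1)
# for y >= 0 (expected local times of killed BM; our computation).
y = np.linspace(-3.0, 4.0, 70_001)
dy = y[1] - y[0]
g1 = 2.0 * np.clip(1.0 - np.maximum(y, 0.0), 0.0, None)
g2 = 2.0 * np.clip(np.minimum(y, 1.0), 0.0, None)
print(np.max(np.abs(g1 + g2 - 2.0)))      # the two legs sum to 2 identically

f = np.exp(-y**2)                          # any integrable f >= 0
print(np.sum((g1 + g2) * f) * dy)          # E int of f over one cycle
print(2.0 * np.sum(f) * dy)                # 2 * m~(f): the same number
```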
Bibliography

[1] T. Anderson, The integral of a symmetric unimodal function over a symmetric convex
set and some probability inequalities, Proc. Amer. Math. Soc. 6 (1955), 170–176.
[2] J. Azéma, M. Duflo, D. Revuz, Mesures invariantes des processus de Markov récurrents,
Séminaire de Probabilités III, Lecture Notes in Mathematics 88, Springer, 1969.
[3] O. Barndorff-Nielsen, Parametric Statistical Models and Likelihood. Lecture Notes in
Statistics 50, Springer, 1988.
[4] J. Barra, Mathematical Basis of Statistics. Academic Press 1981. French ed. Dunod,
1971.
[5] I. Basawa, J. Scott, Asymptotic Optimal Inference for Non-ergodic Models, Lecture
Notes in Statistics 17, Springer, 1983.
[6] R. Bass, Diffusions and Elliptic Operators, Springer, 1998.
[7] H. Bauer, Wahrscheinlichkeitstheorie und Grundzüge der Masstheorie, 3rd ed., De
Gruyter, 1978.
[8] M. Ben Alaya, A. Kebaier, Parameter estimation for the square root diffusions: ergodic
and nonergodic cases, Stochastic Models 28 (2012), 609–634.
[9] R. Beran, Estimating a distribution function, Ann. Statist. 5 (1977), 400–404.
[10] J. Bertoin, Lévy Processes, Cambridge University Press, 1996.
[11] P. Billingsley, Convergence of Probability Measures, Wiley 1968.
[12] N. Bingham, C. Goldie, J. Teugels, Regular Variation, Cambridge University Press
1987.
[13] C. Boll, Comparison of Experiments in the Infinite Case, PhD Thesis, Stanford Univer-
sity, 1955.
[14] P. Brémaud, Point Processes and Queues, Springer, 1981.
[15] K. Chung, R. Williams, Introduction to Stochastic Integration, 2nd ed., Birkhäuser,
1990.
[16] H. Cremers, D. Kadelka, On weak convergence of integral functionals of stochastic processes with application to processes taking paths in $L^p_E$, Stoch. Processes Appl. 21 (1986), 305–317.
[17] S. Dachian, On limiting likelihood ratio processes of some change-point type statistical
models, Journal of Statistical Planning and Inference 140 (2010), 2682–2692.
[18] D. Darling, M. Kac, On occupation times for Markov processes. Trans. Amer. Math.
Soc. 84 (1957), 444–458.

[19] R. Davies, Asymptotic inference when the amount of information is random, in: L. Le
Cam, R. Olshen, (Eds): Proc. of the Berkeley Symp. in Honour of J. Neyman and J.
Kiefer, Vol. II. Wadsworth, 1985.
[20] C. Dellacherie, P. Meyer, Probabilités et potentiel, Chap. I–IV, Hermann, 1975;
Chap. V–VIII, Hermann, 1980.
[21] M. Diether, Wavelet estimation in diffusions with periodicity, Statist. Inference Stoch.
Processes 15 (2012), 257–284.
[22] G. Dohnal, On estimating the diffusion coefficient. J. Appl. Probab. 24 (1987), 105–114.
[23] P. Feigin, Some comments concerning a curious singularity. J. Appl. Probab. 16 (1979),
440–444.
[24] W. Feller, An Introduction to Probability Theory and its Applications, Vol. 2, Wiley,
1971.
[25] V. Genon-Catalot, J. Jacod, On the estimation of the diffusion coefficient for multidi-
mensional diffusion processes, Ann. Inst. H. Poincaré Probab. Stat. 29, (1993), 119–
151.
[26] V. Genon-Catalot, D. Picard, Elements de statistique asymptotique, Springer, 1993.
[27] O. Georgii, Stochastik, De Gruyter, 2002.
[28] I. Gihman, A. Skorohod, The Theory of Stochastic Processes, Vol. I+II, Springer, 1974
(Reprint 2004).
[29] R. Gill, B. Levit, Applications of the van Trees inequality: a Bayesian Cramér-Rao
bound, Bernoulli 1 (1995), 59–79.
[30] E. Gobet, Local asymptotic mixed normality property for elliptic diffusion, Bernoulli 7
(2001), 899–912.
[31] E. Gobet, LAN property for ergodic diffusions with discrete observations, Ann. Inst. H.
Poincaré PR 38 (2002), 711–737.
[32] Y. Golubev, Computation of efficiency of maximum likelihood estimate when observing
a discontinuous signal in white noise, Problems Inform. Transm. 15 (1979), 38–52.
[33] P. Greenwood, A. N. Shiryaev, Asymptotic minimaxity of a sequential estimator for a first order autoregressive system, Stochastics and Stochastics Reports 38 (1992), 49–65.
[34] P. Greenwood, W. Wefelmeyer, Efficiency of empirical estimators for Markov chains,
Ann. Statist. 23 (1995), 132–143.
[35] P. Greenwood, W. Ward, W. Wefelmeyer, Statistical analysis of stochastic resonance
in a simple setting, Phys. Rev. E 60 (1999), 4687–4695.
[36] P. Greenwood, W. Wefelmeyer, Asymptotic minimax results for stochastic process families with critical points, Stoch. Proc. Appl. 44 (1993), 107–116.
[37] L. Grinblat, A limit theorem for measurable stochastic processes and its applications,
Proc. Amer. Math. Soc. 61 (1976), 371–376.
[38] A. Gushchin, On asymptotic optimality of estimators under the LAQ condition, Theory
Probab. Appl. 40 (1995), 261–272.

[39] A. Gushchin, U. Küchler, Asymptotic inference for a linear stochastic differential equa-
tion with time delay, Bernoulli 5, (1999), 1059–1098.
[40] J. Hájek, A characterization of limiting distributions for regular estimators, Z.
Wahrscheinlichkeitsth. Verw. Geb. 14 (1970), 323–330.
[41] J. Hájek, Z. Sidák, Theory of Rank Tests, Academic Press, 1967.
[42] J. Hájek, Z. Sidák, P. Sen, Theory of Rank Tests (2nd ed.), Academic Press, 1999.
[43] M. Hammer, Parameterschätzung in zeitdiskreten ergodischen Markov-Prozessen am
Beispiel des Cox-Ingersoll-Ross Modells, Diplomarbeit, Institut für Mathematik, Uni-
versität Mainz 2005.
http://ubm.opus.hbz-nrw.de/volltexte/2006/1154/pdf/diss.pdf
[44] M. Hammer, Ergodicity and regularity of invariant measure for branching Markov pro-
cesses with immigration, PhD Thesis, Institute of Mathematics, University of Mainz,
2012.
http://ubm.opus.hbz-nrw.de/volltexte/2012/3306/pdf/doc.pdf
[45] T. Harris, The existence of stationary measures for certain Markov processes, Proc. 3rd Berkeley Symp., Vol. II, pp. 113–124, University of California Press, 1956.
[46] R. Höpfner, Asymptotic inference for continuous-time Markov chains, Probab. Theory
Rel. Fields 77 (1988), 537–550.
[47] R. Höpfner, Null recurrent birth-and-death processes, limits of certain martingales, and
local asymptotic mixed normality, Scand. J. Statist. 17 (1990), 201–215.
[48] R. Höpfner, On statistics of Markov step processes: representation of log-likelihood ratio
processes in filtered local models, Probab. Theory Rel. Fields 94 (1993), 375–398.
[49] R. Höpfner, Asymptotic inference for Markov step processes: observation up to a ran-
dom time, Stoch. Proc. Appl. 48 (1993), 295–310.
[50] R. Höpfner, J. Jacod, Some remarks on the joint estimation of the index and the scale parameter for stable processes, in: Mandl and Huskova (Eds.), Asymptotic Statistics, Proc. Prague 1993, pp. 273–284, Physica Verlag, 1994.
[51] R. Höpfner, J. Jacod, L. Ladelli, Local asymptotic normality and mixed normality for
Markov statistical models, Probab. Theory Rel. Fields 86 (1990), 105–129.
[52] R. Höpfner, E. Löcherbach, Remarks on ergodicity and invariant occupation measure
in branching diffusions with immigration, Ann. Inst. Henri Poincaré 41 (2005), 1025–
1047.
[53] R. Höpfner, E. Löcherbach, Limit Theorems for Null Recurrent Markov Processes,
Memoirs AMS 161, American Mathematical Society, 2003.
[54] R. Höpfner, Y. Kutoyants, On a problem of statistical inference in null recurrent diffu-
sions, Statist. Inference Stoch. Process. 6 (2003), 25–42.
[55] R. Höpfner, Y. Kutoyants, On LAN for parametrized continuous periodic signals in a
time inhomogeneous diffusion, Statistics & Decisions 27 (2009), 309–326.
[56] R. Höpfner, Y. Kutoyants, Estimating discontinuous periodic signals in a time inhomo-
geneous diffusion, Statist. Inference Stoch. Process. 13 (2010), 193–230.

[57] R. Höpfner, Y. Kutoyants, Estimating a periodicity parameter in the drift of a time in-
homogeneous diffusion, Math. Meth. Statist. 20 (2011), 58–74.
[58] R. Höpfner, L. Rüschendorf, Comparison of estimators in stable models, Mathematical
and Computer Modelling 29 (1999), 145–160.
[59] R. Iasnogorodski, H. Lhéritier, Théorie de l’estimation ponctuelle paramétrique, EDP
Sciences, 2003.
[60] I. Ibragimov, R. Khasminskii, Statistical Estimation, Springer, 1981.
[61] N. Ikeda, N. Watanabe, Stochastic Differential Equations and Diffusion Processes,
North-Holland / Kodansha, 2nd ed., 1989.
[62] K. Ito, H. McKean, Diffusion Processes and their Sample Paths, Springer, 1965.
[63] J. Jacod, Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales, Z. Wahrscheinlichkeitsth. Verw. Geb. 31 (1975), 235–253.
[64] J. Jacod, A. Shiryaev, Limit Theorems for Stochastic Processes, Springer 1987, 2nd ed.,
2003.
[65] A. Janssen, Zur Asymptotik nichtparametrischer Tests, Vorlesungsskript. Düsseldorf,
1998.
[66] P. Jeganathan, On the asymptotic theory of estimation when the limit of the log-
likelihood ratios is mixed normal, Sankhya Ser. A 44 (1982), 173–212.
[67] P. Jeganathan, Some aspects of asymptotic theory with applications to time series mod-
els, Econometric Theory 11 (1995), 818–887 (preprint version 1988).
[68] Y. Kabanov, R. Liptser, A. Shiryaev, Criteria for absolute continuity of measures cor-
responding to multivariate point processes, in: J. Prokhorov (Ed.), Proc. Third Japan-
USSR Symposium, Lecture Notes in Math. 550, pp. 232–252, Springer, 1976.
[69] I. Karatzas, S. Shreve, Brownian Motion and Stochastic Calculus, 2nd ed. Springer,
1991.
[70] M. Kessler, Estimation of an ergodic diffusion from discrete observations, Scand. J.
Statist. 24 (1997), 211–229.
[71] M. Kessler, A. Lindner, M. Sørensen (Eds.), Statistical Methods for Stochastic Differ-
ential Equations, CRC Press, 2012.
[72] M. Kessler, A. Schick, W. Wefelmeyer, The information in the marginal law of a
Markov chain, Bernoulli 7 (2001), 243–266.
[73] R. Khasminskii, Stochastic Stability of Differential Equations, 1st ed., Sijthoff and
Noordhoff, 1980; 2nd ed., Springer, 2012.
[74] R. Khasminskii, G. Yin, Asymptotic behavior of parabolic equations arising from one-
dimensional null recurrent diffusions, J. Diff. Eqns. 161 (2000), 154–173.
[75] A. Klenke, Wahrscheinlichkeitstheorie, 3rd ed., Springer, 2013.
[76] U. Küchler, Y. Kutoyants, Delay estimation for some stationary diffusion-type pro-
cesses, Scand. J. Statist. 27 (2000), 405–414.
[77] U. Küchler, M. Sørensen, Exponential Families of Stochastic Processes, Springer, 1997.
[78] Y. Kutoyants, Identification of Dynamical Systems with Small Noise, Kluwer, 1994.
[79] Y. Kutoyants, Statistical Inference for Spatial Poisson Processes, Springer, 1998.
[80] Y. Kutoyants, Statistical Inference for Ergodic Diffusion Processes, Springer, 2004.
[81] L. Le Cam, Théorie asymptotique de la décision statistique, Montréal, 1969.
[82] L. Le Cam, Limits of experiments, in: Proc. 6th Berkeley Symposium Math. Statist.
Probab. Vol. I, pp. 245–261, University of California Press, 1972.
[83] L. Le Cam, Maximum likelihood: an introduction, Int. Statist. Rev. 58 (1990), 153–
171.
[84] L. Le Cam, G. Yang, Asymptotics in Statistics: Some Basic Concepts, Springer 1990,
2nd ed., 2002.
[85] D. Lépingle, Sur le comportement asymptotique des martingales locales, in: Séminaire
de Probabilités XII, Lecture Notes in Mathematics 649, pp. 148–161, Springer, 1978.
[86] F. Liese, K. Miescke, Statistical Decision Theory, Springer, 2008.
[87] F. Liese, I. Vajda, Convex Statistical Distances, Teubner, 1987.
[88] R. Liptser, A. Shiryaev, Statistics of Random Processes Vol. I+II, Springer, 1981, 2nd
ed., 2001.
[89] M. Loève, Probability Theory, 3rd ed., Van Nostrand, 1963; 4th ed., Vol. I+II, Springer,
1978.
[90] E. Löcherbach, Likelihood ratio processes for Markovian particle systems with killing
and jumps, Statist. Inference Stoch. Process. 5 (2002), 153–177.
[91] E. Löcherbach, LAN and LAMN for systems of interacting diffusions with branching
and immigration, Ann. Inst. Henri Poincaré (B) Probab. Stat. 38 (2002), 59–90.
[92] E. Löcherbach, Smoothness of the intensity measure density for interacting branching
diffusions with immigration, J. Funct. Analysis 215 (2004), 130–177.
[93] D. Loukianova, E. Löcherbach, On Nummelin splitting for continuous time Harris re-
current Markov processes and application to kernel estimation for multi-dimensional
diffusions, Stoch. Proc. Appl. 118 (2008), 1301–1321.
[94] D. Loukianova, O. Loukianov, On deterministic equivalents of additive functionals of
recurrent diffusions and drift estimation, Statist. Inference Stoch. Process. 11 (2008),
107–121.
[95] H. Luschgy, Local asymptotic mixed normality for semimartingale experiments,
Probab. Theory Rel. Fields 92 (1992), 151–176.
[96] H. Luschgy, Asymptotic inference for semimartingale models with singular parameter
points, J. Statist. Plann. Inference 39 (1994), 155–186.
[97] H. Luschgy, Local asymptotic quadraticity of stochastic process models based on stop-
ping times, Stoch. Proc. Appl. 57 (1995), 305–317.
[98] M. Métivier, Semimartingales, De Gruyter, 1982.
[99] P. Millar, The minimax principle in asymptotic statistical theory, in: P. Hennequin (Ed.),
École d'été de probabilités de St. Flour XI 1981, pp. 75–265, Springer, 1983.
[100] P. Millar, A general approach to the optimality of minimum distance estimators, Trans.
Amer. Math. Soc. 286 (1984), 377–418.
[101] E. Nummelin, General Irreducible Markov Chains and Non-negative Operators, Cam-
bridge University Press, 1985.
[102] E. Nummelin, A splitting technique for Harris recurrent Markov chains, Z. Wahrschein-
lichkeitsth. Verw. Geb. 43 (1978), 309–318.
[103] L. Overbeck, Estimation for continuous branching processes, Scand. J. Statist. 25
(1998), 111–126.
[104] L. Overbeck, T. Rydén, Estimation in the Cox-Ingersoll-Ross model, Econometric The-
ory 13 (1997), 430–461.
[105] J. Pfanzagl, Parametric Statistical Theory, De Gruyter, 1994.
[106] G. Pflug, The limiting log-likelihood process for discontinuous density families, Z.
Wahrscheinlichkeitsth. Verw. Geb. 64 (1983), 15–35.
[107] B. Prakasa Rao, Asymptotic theory of statistical inference, Wiley, 1987.
[108] B. Prakasa Rao, Statistical Inference for Diffusion Type Processes, Arnold, 1999.
[109] B. Prakasa Rao, Semimartingales and their Statistical Inference, Chapman and Hall
CRC, 1999.
[110] S. Resnick, P. Greenwood, A bivariate stable characterization and domains of attrac-
tion, J. Multivar. Anal. 9 (1979), 206–221.
[111] D. Revuz, Markov Chains, Rev. ed. North-Holland, 1984.
[112] D. Revuz, M. Yor, Continuous Martingales and Brownian Motion, Springer, 1991.
[113] G. Roussas, Contiguity of Probability Measures, Cambridge University Press, 1972.
[114] H. Rubin, K. Song, Exact computation of the asymptotic efficiency of maximum like-
lihood estimators of a discontinuous signal in Gaussian white noise, Ann. Statist. 23
(1995), 732–739.
[115] L. Rüschendorf, Asymptotische Statistik, Teubner, 1988.
[116] A. Schick, W. Wefelmeyer, Estimating joint distributions of Markov chains, Statist.
Inference Stoch. Proc. 5 (2002), 1–22.
[117] R. Schilling, Measures, Integrals and Martingales, Cambridge University Press, 2005.
[118] Y. Shimizu, N. Yoshida, Estimation of parameters for diffusion processes with jumps
from discrete observations, Statist. Inference Stoch. Proc. 9 (2006), 227–277.
[119] M. Sørensen, Likelihood methods for diffusion with jumps, in: Prabhu, Basawa (Eds.),
Statistical Inference in Stochastic Processes, pp. 67–105, Marcel Dekker, 1991.
[120] H. Strasser, Einführung in die lokale asymptotische Theorie der Statistik, Bayreuther
Mathematische Schriften, 1985.
[121] H. Strasser, Mathematical Theory of Statistics, De Gruyter, 1985.
[122] A. Shiryaev, V. Spokoiny, Statistical Experiments and Decisions, World Scientific,
2001.
[123] A. Touati, Théorèmes limites pour les processus de Markov récurrents. Unpublished
paper 1988. See also C. R. Acad. Sci. Paris Série I 305 (1987), 841–844.
[124] A. Tsybakov, Introduction à l'estimation non-paramétrique, Springer SMAI, 2004.
[125] A. van der Vaart, An asymptotic representation theorem, Int. Statist. Rev. 59 (1991),
97–121.
[126] A. van der Vaart, Asymptotic Statistics, Cambridge University Press, 1998.
[127] H. Witting, Mathematische Statistik I, Teubner, 1985.
[128] H. Witting, U. Müller-Funk, Mathematische Statistik II, Teubner, 1995.
[129] N. Yoshida, Estimation for diffusion processes from discrete observations, J. Multivar.
Anal. 41 (1992), 220–242.
[130] H. van Zanten, On the rate of convergence of the maximum likelihood estimator in
Brownian semimartingale models, Bernoulli 11 (2005), 643–664.
[131] V. Zolotarev, One-dimensional Stable Distributions, Translations of Mathematical
Monographs 65, American Mathematical Society, 1986.
Index
additive functionals, martingale additive functionals
    definition of, 248, 251
    tightness rates, 249
    regular variation condition, 250, 252, 253, 265
    weak convergence of, 250, 252, 253
approximately equivariant, approximately strongly equivariant, 137, 157
Bayes estimators, 30, 131, 132, 134, 192, 210, 211, 242
Brownian bridges, 69, 70, 79
Brownian motion with unknown drift
    statistical model for, 143, 145
    time change, stopping times, 142, 168, 169
    independent time change, 170, 172, 223, 237
canonical path spaces (C, C, G) or (D, D, G), canonical process, 143, 173, 247
central sequence
    definition of, 181
    results on, 190–192, 194, 196, 205
    else, 200, 224, 237, 242
central statistic
    definition of, 129, 149
    results on, 136, 139, 156, 159
    else, 130–133, 147, 149, 153, 157, 172, 174, 176, 181, 211, 242
contiguity, 87, 92
convergence of martingales, 244, 246, 247
convolution theorem, 133, 154, 157, 193
coupling property, 190, 191, 194, 196, 198, 200, 205
density process (likelihood ratio process)
    definition of, 142
    else, 106, 107, 144, 146, 162, 165, 166, 173, 234
efficient estimator sequences, 194, 210
empirical distribution function, 44, 55, 78, 80
equivariant estimators, strongly equivariant estimators
    definition of, 131, 152
    results on, 133, 136, 154, 156
    else, 131, 132, 147, 192, 242
estimator sequences, 31, 58, 189, 190, 196, 205
filtered statistical experiment, 141, 142
Gaussian martingale, conditionally Gaussian martingale, 244, 245
Gaussian processes, -Gaussian processes, 68–70, 77, 79
Gaussian shift models
    definition of, 129, 181
    results on, 131–133, 136, 137, 139, 206
    else, 119, 121, 144, 146, 147, 219, 226
Harris recurrent Markov processes (positive or null)
    setting, definition of, 247, 248
    Harris conditions, 253, 255, 258, 262
    invariant measure, 247, 248, 258, 262, 264
    ratio limit theorem, 248
    else, 213, 214, 228, 261–263
Hellinger distance, 33, 37, 118
information, 117, 118, 129, 152
information process, 217, 234
information with estimated parameter, 203
integrals along the paths of a process, 44, 48, 73
Kullback divergence, 32
L^r-differentiable statistical models, 110, 111, 114, 116–118, 121, 206
LAMN (local asymptotic mixed normality), 180, 192–194, 196, 205, 223, 237
LAN (local asymptotic normality), 121, 181, 191–194, 196, 205–208, 219
LAQ (local asymptotic quadraticity), 180, 183, 186, 189–191, 200, 205, 220, 222
Le Cam's first lemma, 88, 96, 99
Le Cam's second lemma, 121, 206
Le Cam's third lemma, 90, 92, 100
Lebesgue decomposition, 86
likelihood ratio, 86
limit experiment at ϑ, 181
local model at ϑ, 181
local parameter at ϑ, 181
local scale at ϑ, 181
location models, 39, 81, 111, 208
log-likelihood ratio, 86
Markov extension of a statistical model, 186
maximum likelihood (ML) estimators
    definition of, 31
    results on, 37, 39, 131, 149, 191, 205
    else, 25, 29, 132, 200, 209, 216, 217, 221, 223, 225, 233, 237
measurable selection, 59
minimax theorem, local asymptotic minimax theorem, 139, 159, 196
minimum distance (MD) estimators, 58, 60, 63, 64, 77
Mittag–Leffler process V^α, 175, 176, 229, 235, 237, 250, 252
mixed normal experiment
    definition of, 150
    results on, 150, 152–154, 156, 157, 159
    else, 152, 157, 168, 170, 173, 176, 180, 223, 224, 238
one-parametric paths, 23, 112, 198, 207
one-sided stable (stable increasing) process S^α, 175, 176
one-step modification, 200, 205, 208, 239
quadratic experiment
    definition of, 149, 152
    results on, 149, 191, 205
    else, 160, 162, 165–167, 169, 180, 189, 200, 220, 222
quadratic variation of a continuous semimartingale, 167
regular and efficient estimator sequences, 194, 198, 205, 208, 219, 224, 227, 237
regular estimator sequences, 191–194, 210
score, 117, 129, 152
score martingale, 217, 234
score with estimated parameter, 204
stochastic differential equations (SDE)
    setting for, 161
    laws of solutions to, 161, 162, 165–167, 216, 233
stochastic processes with paths in L^p, 45, 47, 56, 69, 70, 73, 77
subconvex loss functions, 134, 135
total variation distance, 137
weak convergence in L^p-path spaces, 45, 47, 79