
2 Inference for diffusion processes

Statistical inference for diffusion processes has been an active research area during the last two or three decades. The work has developed from estimation of linear systems from continuous-time observations (see Le Breton (1974) and the references therein) to estimation of non-linear systems (parametric or non-parametric) from discrete-time observations. In this chapter, as well as in Papers I and II, we shall be concerned exclusively with parametric inference for discrete-time observations. The models may be linear or non-linear.
This branch of research commenced in the mid-eighties (with the paper by Dacunha-Castelle & Florens-Zmirou (1986) on the loss of information due to discretization as an important reference) and accelerated in the nineties. Important references from the middle of that decade are Bibby & Sørensen (1995) on martingale estimating functions, Gourieroux, Monfort & Renault (1993) on indirect inference, and Pedersen (1995b) on approximate maximum likelihood methods, among others. Later work includes Bayesian analysis (Elerian, Chib & Shephard 2000) and further approximate likelihood methods (Aït-Sahalia 1998, Poulsen 1999).
Ideally, the parameter should be estimated by maximum likelihood but, except for a few models, the likelihood function is not available analytically. In this chapter we review some of the alternatives proposed in the literature. There exist review papers on estimation via estimating functions (Bibby & Sørensen 1996, Sørensen 1997), but we do not know of any surveys covering all the techniques discussed in this chapter.
Papers I and II contain my main contributions in this area. Furthermore, there are some new results on identification for martingale estimating functions in Section 2.3.1. In Paper I we discuss a particular estimating function derived as an approximation to the continuous-time score function. The estimating function is of the so-called simple type; it is unbiased and invariant to data transformations, and provides consistent and asymptotically normal estimators. In Paper II we discuss a method suitable for estimation of parameters in the diffusion term when the drift is known. It is based on a functional relationship between the drift, the diffusion function and the invariant density, and provides asymptotically well-behaved estimators. The asymptotic results are proved using empirical process theory.
In the following we focus on fundamental ideas and refer to the literature for rigorous treatments. In particular, we consider one-dimensional diffusions only, although most methods apply in the multi-dimensional case as well. Also, we do not account for technical assumptions, regularity conditions etc. An exception is Section 2.3.1, though, where the new identification results are presented.
The chapter is organized as follows. The model is defined in Section 2.1, and Section 2.2 contains preliminary comments on the estimation problem. Section 2.3 is about estimating functions, with special emphasis on martingale estimating functions and so-called simple estimating functions, including the one from Paper I. In Section 2.4 we discuss three approximations of the likelihood which can in principle be made arbitrarily accurate, and Section 2.5 is about Bayesian analysis. In Section 2.6 we discuss indirect inference and EMM, which both introduce auxiliary (but wrong) models and correct for the implied bias by simulation. The method from Paper II is reviewed in Section 2.7, and conclusions are finally drawn in Section 2.8.

2.1 Model, assumptions and notation


In this section we present the model and the basic assumptions, and introduce notation that will be used throughout the chapter. We consider a one-dimensional, time-homogeneous stochastic differential equation

    dX_t = b(X_t, θ) dt + σ(X_t, θ) dW_t    (2.1)

defined on a filtered probability space (Ω, F, (F_t), Pr). Here, W is a one-dimensional Brownian motion and θ is an unknown p-dimensional parameter from the parameter space Θ ⊆ R^p. The true parameter value is denoted θ₀. The functions b : R × Θ → R and σ : R × Θ → (0, ∞) are known and assumed to be suitably smooth.
The state space is denoted I = (l, r) for −∞ ≤ l < r ≤ +∞ (implicitly assuming that it is open and the same for all θ). We shall assume that for any θ ∈ Θ and any F₀-measurable initial condition U with state space I, equation (2.1) has a unique strong solution X with X₀ = U. Assume furthermore that there exists an invariant distribution µ_θ = µ(x, θ) dx such that the solution to (2.1) with X₀ ∼ µ_θ is strictly stationary and ergodic. It is well-known that sufficient conditions for this can be expressed in terms of the scale function and the speed measure (see Section II.2, or the textbook by Karatzas & Shreve (1991)), and that µ(x, θ) is given by

    µ(x, θ) = ( M(θ) σ²(x, θ) s(x, θ) )⁻¹    (2.2)

where log s(x, θ) = −2 ∫_{x₀}^{x} b(y, θ)/σ²(y, θ) dy for some x₀ ∈ I and M(θ) is a normalizing constant.
For all θ ∈ Θ the distribution of X with X₀ ∼ µ_θ is denoted by P_θ. Under P_θ all X_t ∼ µ_θ. Further, for t ≥ 0 and x ∈ I, let p_θ(t, x, ·) denote the conditional density (transition density) of X_t given X₀ = x. Since X is time-homogeneous, p_θ(t, x, ·) is also the density of X_{s+t} conditional on X_s = x for all s ≥ 0. Note that the transition densities are most often analytically intractable whereas the invariant density is easy to find (at least up to the normalizing constant).
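As a concrete illustration of (2.2), the following sketch (our own example, not from the text) computes the invariant density numerically for a mean-reverting model with b(x, θ) = −θx and σ ≡ 1, for which the invariant law is known to be N(0, 1/(2θ)):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Invariant density via formula (2.2) for the illustrative model
# dX_t = -theta*X_t dt + dW_t, theta > 0 (our choice of b and sigma).
b = lambda x, theta: -theta * x
sigma2 = lambda x, theta: 1.0  # sigma = 1 for simplicity

def mu_unnormalized(x, theta, x0=0.0):
    # log s(x, theta) = -2 * int_{x0}^x b(y)/sigma^2(y) dy
    log_s, _ = quad(lambda y: -2 * b(y, theta) / sigma2(y, theta), x0, x)
    return 1.0 / (sigma2(x, theta) * np.exp(log_s))

theta = 0.5
M, _ = quad(lambda x: mu_unnormalized(x, theta), -np.inf, np.inf)
mu = lambda x: mu_unnormalized(x, theta) / M

# For this model the invariant law is N(0, 1/(2*theta)).
x = 1.3
print(mu(x), norm.pdf(x, scale=np.sqrt(1.0 / (2 * theta))))
```

The two printed values agree, confirming that the scale-density formula reproduces the Gaussian invariant law in this case.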
We are going to need some matrix notation: vectors in R^p are considered as p × 1 matrices and Aᵀ is the transpose of A. For a function f = (f₁, …, f_q)ᵀ : R × Θ → R^q we let f′(x, θ) and f″(x, θ) denote the matrices of first and second order partial derivatives with respect to x, and ḟ(x, θ) = ∂_θ f(x, θ) denote the q × p matrix of partial derivatives with respect to θ, i.e. ḟ_jk = ∂f_j/∂θ_k, assuming that the derivatives exist.
Finally, introduce the differential operator A_θ given by

    A_θ f(x, θ) = b(x, θ) f′(x, θ) + ½ σ²(x, θ) f″(x, θ)    (2.3)

for twice continuously differentiable functions f : R × Θ → R. When restricted to a suitable subspace, A_θ is the infinitesimal generator of X (see Rogers & Williams (1987), for example).
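A quick numerical sanity check of (2.3) (our own illustration; the drift and diffusion below are arbitrary choices, not from the text):

```python
import numpy as np

# Numerical check of the operator (2.3) on a test function, using the
# illustrative choice b(x, theta) = -theta*x and sigma = 1 (ours).
def A_theta(f, b, sigma, x, theta, h=1e-5):
    # A_theta f = b*f' + 0.5*sigma^2*f'' via central finite differences
    f1 = (f(x + h) - f(x - h)) / (2 * h)
    f2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return b(x, theta) * f1 + 0.5 * sigma(x, theta) ** 2 * f2

b = lambda x, theta: -theta * x
sigma = lambda x, theta: 1.0
theta, x = 0.5, 1.2

# For f(x) = x^2: A_theta f(x) = 2*b(x)*x + sigma^2 = -2*theta*x^2 + 1
approx = A_theta(lambda y: y ** 2, b, sigma, x, theta)
exact = -2 * theta * x ** 2 + 1
print(approx, exact)
```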

2.2 Preliminary comments on estimation


The objective of this chapter is estimation of the parameter θ. First note that if X is observed continuously from time zero to time T, then parameters from the diffusion coefficient can be determined (rather than estimated) from the quadratic variation process of X, and the remaining part can be estimated by maximum likelihood: if the diffusion function is completely known, that is σ(x, θ) = σ(x), then the likelihood function for (X_t)_{0 ≤ t ≤ T} is given by

    L_T^c(θ) = exp( ∫₀ᵀ b(X_s, θ)/σ²(X_s) dX_s − ½ ∫₀ᵀ b²(X_s, θ)/σ²(X_s) ds ).    (2.4)

An informal argument for this formula is given below; for a proper proof see Liptser & Shiryayev (1977, Chapter 7).
From now on we shall consider the situation where X is observed at discrete time-points only. For convenience we consider equidistant time-points ∆, 2∆, …, n∆ for some ∆ > 0. Conditional on the initial value X₀, the likelihood function is given by the product

    L_n(θ) = ∏_{i=1}^{n} p_θ(∆, X_{(i−1)∆}, X_{i∆})

because X is Markov. Ideally, θ should be estimated by the value maximizing L_n(θ), but since the transition densities are not analytically known, neither is the likelihood function.
There are a couple of obvious, very simple alternatives which unfortunately are not satisfactory. First, one could ignore the dependence structure and simply approximate the conditional densities by the marginal density. Then all information due to the time evolution of X is lost, and it is usually not possible to estimate the full parameter vector. See Section 2.3.2 for further details.
As a second alternative, one could use the Euler scheme (or some higher-order scheme) given by the approximation

    X_{i∆} ≈ X_{(i−1)∆} + b(X_{(i−1)∆}, θ)∆ + σ(X_{(i−1)∆}, θ)√∆ ε_i

where ε_i, i = 1, …, n are independent, identically N(0, 1)-distributed. This approximation is good for small values of ∆ but may be bad for larger values. The approximation is two-fold: the moments are not the true conditional moments, and the true conditional distribution need not be Gaussian. The moment approximation introduces bias, implying that the corresponding estimator is inconsistent as n → ∞ for any fixed ∆ (Florens-Zmirou 1989). The Gaussian approximation introduces no bias per se, but usually implies inefficiency: if the conditional mean and variance are replaced by the true ones, but the Gaussian approximation is maintained, then the corresponding approximation to the score function is a non-optimal martingale estimating function, see Section 2.3.1.
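A minimal sketch of the Euler scheme (our own example, with the arbitrary choice b(x) = −θx and σ = 1); simulating on a finer grid and subsampling is a standard way to reduce the discretization bias just discussed:

```python
import numpy as np

# Euler scheme for the SDE (2.1); illustrative OU-type drift b(x) = -theta*x
# and sigma(x) = 1 (the model choice is ours, not from the text).
def euler_path(x0, theta, delta, n, n_sub=1, seed=None):
    # n_sub > 1 simulates on a finer grid and keeps every n_sub-th point,
    # reducing the discretization bias of the Euler approximation.
    rng = np.random.default_rng(seed)
    dt = delta / n_sub
    x = np.empty(n + 1)
    x[0] = x0
    cur = x0
    for i in range(1, n + 1):
        for _ in range(n_sub):
            cur = cur + (-theta * cur) * dt + np.sqrt(dt) * rng.standard_normal()
        x[i] = cur
    return x

path = euler_path(x0=0.0, theta=0.5, delta=0.1, n=1000, n_sub=10, seed=42)
print(path.mean(), path.var())  # roughly 0 and 1/(2*theta) = 1.0
```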
Note that the Euler approximation provides an informal explanation of formula (2.4): if σ does not depend on θ, then the Euler approximation to the discrete-time likelihood function is given by (up to a constant)

    exp( ∑_{i=1}^{n} b(X_{(i−1)∆}, θ)/σ²(X_{(i−1)∆}) (X_{i∆} − X_{(i−1)∆}) − ½ ∑_{i=1}^{n} b²(X_{(i−1)∆}, θ)/σ²(X_{(i−1)∆}) ∆ )    (2.5)

which is the Riemann–Itô approximation of (2.4).

2.3 Estimating functions


Estimating functions provide estimators in very general settings where an unknown p-dimensional parameter θ is to be estimated from data X^obs of size n. Basically, an estimating function F_n is simply an R^p-valued function which takes the data as well as the unknown parameter as arguments. An estimator is obtained by solving F_n(X^obs, θ) = 0 for the unknown parameter θ. General theory for estimating functions may be found in Heyde (1997) or Sørensen (1998b).
The prime example of an estimating function is of course the score function, yielding the maximum likelihood estimator. When the score function is not available, an alternative estimating function should be chosen with care. In order for the corresponding estimator to behave (asymptotically) "nicely" it is crucial that the estimating function is unbiased and is able to distinguish the true parameter value from other values of θ:

    E_{θ₀} F_n(X^obs, θ) = 0 if and only if θ = θ₀.    (2.6)
Now, let us turn to the case of discretely observed diffusions again. The score function

    S_n(θ) = ∂_θ log L_n(θ) = ∑_{i=1}^{n} ∂_θ log p_θ(∆, X_{(i−1)∆}, X_{i∆})

is a sum of n terms where the i'th term depends on data through (X_{(i−1)∆}, X_{i∆}) only. As we are trying to mimic the behaviour of the score function, it is natural to look for estimating functions with the same structure. Hence, we shall consider estimating functions of the form

    F_n(θ) = ∑_{i=1}^{n} f(X_{(i−1)∆}, X_{i∆}, θ)    (2.7)

where we have omitted the dependence of F_n on data from the notation. Condition (2.6) simplifies to: E_{θ₀} f(X₀, X_∆, θ) = 0 if and only if θ = θ₀.
Sørensen (1997) and Jacobsen (1998) provide overviews of estimating functions in the diffusion case. In the following we shall concentrate on two special types, namely martingale estimating functions (F_n(θ) being a P_θ-martingale) and simple estimating functions (each term in F_n depending on one observation only).

2.3.1 Martingale estimating functions


There are (at least) two good reasons for looking at estimating functions that are martingales: (i) the score function, which we are basically trying to imitate, is a martingale; and (ii) we have all the machinery from martingale theory (e.g. limit theorems) at our disposal. Also, martingale estimating functions are important as any asymptotically well-behaved estimating function is asymptotically equivalent to a martingale estimating function (Jacobsen 1998).

Definition, asymptotic results and optimality


Consider the conditional moment condition

    E_θ( h̃(X₀, X_∆, θ) | X₀ = x ) = ∫_I h̃(x, y, θ) p_θ(∆, x, y) dy = 0,   x ∈ I, θ ∈ Θ    (2.8)

for a function h̃ : I² × Θ → R. If all coordinates of f from (2.7) satisfy this condition, and (G_i) is the discrete-time filtration generated by the observations, then

    E_θ( F_n(θ) | G_{n−1} ) = F_{n−1}(θ) + E_θ( f(X_{(n−1)∆}, X_{n∆}, θ) | X_{(n−1)∆} ) = F_{n−1}(θ),

so F_n(θ) is a P_θ-martingale with respect to (G_i). Usually, when p_θ(∆, x, ·) is not known, functions satisfying (2.8) cannot be found explicitly but must be calculated numerically.
Suppose that h₁, …, h_N : I² × Θ → R all satisfy (2.8) and let α₁, …, α_N : I × Θ → R^p be arbitrary weight functions. Then each coordinate of f defined by

    f(x, y, θ) = ∑_{j=1}^{N} α_j(x, θ) h_j(x, y, θ) = α(x, θ) h(x, y, θ)

satisfies (2.8) as well. Here we have used the notation α for the R^{p×N}-valued function with (k, j)'th element equal to the k'th element of α_j, and h for (h₁, …, h_N)ᵀ. Note that the score function is obtained as a special case: for N = p, h(x, y, θ) = (∂_θ log p_θ(∆, x, y))ᵀ and α(x, θ) equal to the p × p unit matrix.
Classical limit theory for stationary martingales (Billingsley 1961) is employed for asymptotic results for F_n with f as above. Under differentiability and integrability conditions, Ḟ_n(θ)/n → A(θ) in P_{θ₀}-probability for all θ, and F_n(θ₀)/√n → N(0, V₀) in distribution wrt. P_{θ₀}. Here,

    A(θ) = E_{θ₀} ḟ(X₀, X_∆, θ) = ∑_{j=1}^{N} E_{θ₀} α_j(X₀, θ) ḣ_j(X₀, X_∆, θ) = E_{θ₀} α(X₀, θ) ḣ(X₀, X_∆, θ),
    V₀ = E_{θ₀} f(X₀, X_∆, θ₀) f(X₀, X_∆, θ₀)ᵀ = E_{θ₀} α(X₀, θ₀) τ_h(X₀, θ₀) α(X₀, θ₀)ᵀ,

where τ_h(x, θ) = Var_θ( h(X₀, X_∆, θ) | X₀ = x ). If the convergence Ḟ_n(θ)/n → A(θ) is suitably uniform in θ and A₀ = A(θ₀) is non-singular, then a solution θ̃_n to F_n(θ) = 0 exists with a probability tending to 1, θ̃_n → θ₀ in probability, and √n(θ̃_n − θ₀) → N(0, A₀⁻¹V₀(A₀⁻¹)ᵀ) in distribution wrt. P_{θ₀} (Sørensen 1998b). The condition that A₀ is non-singular is discussed below.
For h₁, …, h_N given, it is easy to find the optimal weights α* in the sense that the corresponding estimator has the smallest asymptotic variance, where V ≤ V′ as usual means that V′ − V is positive semi-definite (Sørensen 1997):

    α*(x, θ) = τ_h(x, θ)⁻¹ ( E_θ( ḣ(X₀, X_∆, θ) | X₀ = x ) )ᵀ.

How to construct martingale estimating functions in practice


The question of how to choose h₁, …, h_N (and N) is far more subtle (when the score function is not known), and the optimal h₁, …, h_N within some class (typically) change with ∆. Jacobsen (1998) investigates optimality as ∆ → 0, and it is clear that the score for the invariant measure is optimal as ∆ → ∞. Not much work has been done for fixed values of ∆ in between. Here we mention two particular ways of constructing martingale estimating functions.
First, consider functions of the form

    h_j(x, y, θ) = g_j(y) − E_θ( g_j(X_∆) | X₀ = x )    (2.9)

for some (simple) functions g_j : I → R in L¹(µ_θ), j = 1, …, N. Obvious choices are polynomials g_j(y) = y^{k_j} for some (small) integers k_j (Bibby & Sørensen 1995, Bibby & Sørensen 1996). In some models low-order conditional moments are known analytically although the transition densities are not. But even if this is not the case, the conditional moments are easy to calculate by simulation. Kessler & Paredes (1999) investigate the influence of simulations on the asymptotic properties of the estimator.
Second, let g_j(·, θ) : I → R, j = 1, …, N, be eigenfunctions for A_θ with eigenvalues λ_j(θ). Under mild conditions (Kessler & Sørensen 1999) E_θ( g_j(X_∆, θ) | X₀ = x ) = exp(−λ_j(θ)∆) g_j(x, θ), so

    h_j(x, y, θ) = g_j(y, θ) − e^{−λ_j(θ)∆} g_j(x, θ)

satisfies (2.8). Note that this h_j has the same form as (2.9) except that g_j depends on θ. The estimating functions based on eigenfunctions have two advantages: they are invariant to twice continuously differentiable transformations of data, and the optimal weights are easy to simulate (Sørensen 1997). However, the applicability is rather limited as the eigenfunctions are known only for a few models; see Kessler & Sørensen (1999) for some non-trivial examples, though.
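For the OU model (an illustration of ours; the construction in the text is general), the eigenfunction identity can be verified directly from the exact conditional moments:

```python
import numpy as np

# Eigenfunction check for the illustrative OU model
# dX_t = -theta*X_t dt + sigma dW_t: g(x) = x^2 - sigma^2/(2*theta) is an
# eigenfunction of the generator with eigenvalue 2*theta, so
#   E[g(X_delta) | X_0 = x] = exp(-2*theta*delta) * g(x).
theta, sigma, delta, x = 0.7, 1.2, 0.4, 0.9
a = np.exp(-theta * delta)
cond_m2 = a**2 * x**2 + sigma**2 * (1 - a**2) / (2 * theta)  # E[X_d^2 | x]

lhs = cond_m2 - sigma**2 / (2 * theta)                  # E[g(X_d) | x]
rhs = np.exp(-2 * theta * delta) * (x**2 - sigma**2 / (2 * theta))
print(lhs, rhs)  # equal
```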

Considerations on identification

In order for the estimator to behave asymptotically nicely, the matrix A₀ should be regular. Below we shall see how this condition may be explained in terms of reparametrizations. For simplicity we assume that N = 1, such that f(x, y, θ) = α(x, θ) h(x, y, θ) for an α : I × Θ → R^p and an h : I² × Θ → R satisfying (2.8). Note that τ_h(x, θ) = E_θ( h(X₀, X_∆, θ)² | X₀ = x ) is a real number. From now on we let α_j : I × Θ → R, j = 1, …, p, denote the coordinate functions of α and λ the Lebesgue measure on I.
Obviously, τ_h(x, θ) should be positive; otherwise the conditional distribution of h(X₀, X_∆, θ) given X₀ = x is degenerate at zero and provides no information. It is also obvious that the coordinates of α should be linearly independent; otherwise there are essentially fewer than p equations for estimation of p parameters. The following proposition shows that linear independence of the coordinates of α(·, θ₀) is equivalent to regularity of the variance matrix V₀ of f(X₀, X_∆, θ₀), and that regularity of A₀ implies regularity of V₀.

Proposition 2.1 If τ_h(x, θ₀) > 0 for all x ∈ R, then (i) V₀ is singular if and only if there exists β ∈ R^p \ {0} such that βᵀα(x, θ₀) = 0 for λ-almost all x ∈ R; and (ii) V₀ is positive definite if A₀ is regular.

Proof Since

    V₀ = E_{θ₀} α(X₀, θ₀) τ_h(X₀, θ₀) α(X₀, θ₀)ᵀ
       = E_{θ₀} ( τ_h(X₀, θ₀)^{1/2} α(X₀, θ₀) ) ( τ_h(X₀, θ₀)^{1/2} α(X₀, θ₀) )ᵀ,

it holds that V₀ is singular if and only if there exists a linear combination of the coordinates of τ_h(X₀, θ₀)^{1/2} α(X₀, θ₀) that is zero µ_{θ₀}-a.s., i.e. if and only if β ∈ R^p \ {0} exists such that βᵀα(X₀, θ₀) = 0 µ_{θ₀}-a.s. (since τ_h(x, θ₀) > 0). The first assertion now follows as µ_{θ₀} has strictly positive density wrt. λ.
For the second assertion we show that singularity of V₀ implies singularity of A₀. Assume that V₀ is singular and find β as above. Then

    βᵀA₀ = βᵀ E_{θ₀} α(X₀, θ₀) ḣ(X₀, X_∆, θ₀) = E_{θ₀} βᵀα(X₀, θ₀) ḣ(X₀, X_∆, θ₀) = 0,

and A₀ = A(θ₀) is singular as claimed. □



In the following we shall only consider h of the form h(x, y, θ) = g(y) − G(x, θ) where G(x, θ) = E_θ( g(X_∆) | X₀ = x ), see (2.9). Since α is nothing but a weight function, a natural requirement is that G determines the full parameter vector uniquely. In essence, the proposition below claims that this is also sufficient in order for the matrix A₀* corresponding to the optimal weight function α* = Ġ/τ_h to be regular.
Below we write A₀^α to stress the dependence of A₀ on α. In particular, A₀* = A₀^{α*}. We need some further terminology: say that a bijective transformation γ from a neighbourhood Θ₀ of θ₀ to a set Γ₀ ⊆ R^p is a reparametrization around θ₀. The inverse of γ is denoted by γ⁻¹ or θ, and γ₀ = γ(θ₀). The function G^γ : I × Γ₀ → R is defined by G^γ(x, γ) = G(x, θ(γ)); hence G(x, θ) = G^γ(x, γ(θ)).

Proposition 2.2 If there exist j₁, …, j_{p−q} ∈ {1, …, p} with j_k ≠ j_{k′} for k ≠ k′ and a reparametrization around θ₀ such that for j = j₁, …, j_{p−q}

    ∂G^γ(x, γ₀)/∂γ_j = 0,   λ-a.s.,    (2.10)

then A₀^α has rank at most q for any α. Conversely, if A₀* = A₀^{α*} corresponding to the optimal α* has rank q < p and τ_h(x, θ) > 0 for all x ∈ I, then there exists a reparametrization γ around θ₀ such that (2.10) holds for all j = q + 1, …, p.

Proof By the chain rule it holds for any α that

    A₀^α = E_{θ₀} α(X₀, θ₀) Ġ(X₀, θ₀)
         = E_{θ₀} α(X₀, θ₀) Ġ^γ(X₀, γ₀) γ̇(θ₀)
         = ( E_{θ₀} α(X₀, θ₀) Ġ^γ(X₀, γ₀) ) γ̇(θ₀)

where Ġ^γ is the matrix of derivatives of G^γ wrt. γ and γ̇ is the matrix of derivatives of γ wrt. θ. By assumption the j_k'th column of Ġ^γ(X₀, γ₀) has all elements equal to zero almost surely, k = 1, …, p − q, so A₀^α has rank at most q as claimed.
For the second assertion, assume that

    A₀* = E_{θ₀} Ġ(X₀, θ₀)ᵀ Ġ(X₀, θ₀) / τ_h(X₀, θ₀)
        = E_{θ₀} ( Ġ(X₀, θ₀) τ_h(X₀, θ₀)^{−1/2} )ᵀ ( Ġ(X₀, θ₀) τ_h(X₀, θ₀)^{−1/2} )

has rank q < p, and assume without loss of generality that the upper left q × q sub-matrix is positive definite (possibly after the coordinates of θ have been renumbered).
According to Lemma 2.3 below, x₁, …, x_q exist such that the matrix

    ( ∂G(x₁, θ₀)/∂θ₁   ⋯   ∂G(x₁, θ₀)/∂θ_q )
    (        ⋮                    ⋮        )
    ( ∂G(x_q, θ₀)/∂θ₁  ⋯   ∂G(x_q, θ₀)/∂θ_q )

is regular. Hence, there is a neighbourhood Θ₀ of θ₀ such that γ : Θ₀ → R^p defined by

    γ(θ) = ( G(x₁, θ), …, G(x_q, θ), θ_{q+1}, …, θ_p )

is injective. Let Γ₀ = γ(Θ₀) and γ₀ = γ(θ₀). The first q rows of γ̇(θ₀) are given by

    ( ∂G(x₁, θ₀)/∂θ₁   ⋯   ∂G(x₁, θ₀)/∂θ_p )
    (        ⋮                    ⋮        )
    ( ∂G(x_q, θ₀)/∂θ₁  ⋯   ∂G(x_q, θ₀)/∂θ_p )

and the last p − q rows are ( 0_{(p−q)×q}, I_{(p−q)×(p−q)} ).
Next, let Ġ^j = (Ġ₁, …, Ġ_q, Ġ_j) be the 1 × (q+1) matrix of derivatives wrt. θ₁, …, θ_q, θ_j for j = q+1, …, p. Since A₀* has rank q, the matrix

    E_{θ₀} ( Ġ^j(X₀, θ₀) τ_h(X₀, θ₀)^{−1/2} )ᵀ ( Ġ^j(X₀, θ₀) τ_h(X₀, θ₀)^{−1/2} )

is singular, implying that β̃^j ∈ R^{q+1} \ {0} exists such that Ġ^j(X₀, θ₀) β̃^j = 0 almost surely wrt. µ_{θ₀}. Here, β̃^j_{q+1} ≠ 0 because the upper left q × q sub-matrix of A₀* is regular. If β^j ∈ R^p \ {0} is defined by

    β^j_k = β̃^j_k / β̃^j_{q+1},  k = 1, …, q;   β^j_k = 1,  k = j;   β^j_k = 0,  otherwise,

it follows that

    Ġ(X₀, θ₀) β^j = 0   µ_{θ₀}-a.s.    (2.11)

for all j = q+1, …, p, and hence Ġ(x, θ₀) β^j = 0 λ-a.s. for all j = q+1, …, p.
From the expression for the derivative γ̇(θ₀) it now follows that γ̇(θ₀) β^j equals the j'th unit column. Hence, since the inverse θ of γ has derivative θ̇(γ) = γ̇(θ(γ))⁻¹, it holds that

    β^j = ( ∂θ₁(γ₀)/∂γ_j, …, ∂θ_p(γ₀)/∂γ_j )ᵀ,   j = q+1, …, p.

Finally, by the chain rule

    ∂G^γ(x, γ₀)/∂γ_j = Ġ(x, θ₀) ( ∂θ₁(γ₀)/∂γ_j, …, ∂θ_p(γ₀)/∂γ_j )ᵀ = Ġ(x, θ₀) β^j = 0

almost surely wrt. the Lebesgue measure λ for all j = q+1, …, p, as claimed. □


Note that (2.11) implies that the coordinates of α*(·, θ₀) are linearly dependent λ-a.s.; compare with Proposition 2.1. Also note that the reparametrization around θ₀ is not necessarily a global one, as it may not be injective on all of Θ. In the proof we used the following lemma.

Lemma 2.3 Let Y be a real random variable and d : R → R^q be a function such that E d(Y) d(Y)ᵀ is positive definite. Then y₁, …, y_q exist such that the q × q matrix D^{(q)}(y₁, …, y_q) defined coordinate-wise by D^{(q)}_{ij}(y₁, …, y_q) = d_j(y_i) is regular.

Proof By assumption it holds for all β ∈ R^q \ {0} that

    0 < βᵀ ( E d(Y) d(Y)ᵀ ) β = E βᵀ d(Y) d(Y)ᵀ β = E ( βᵀ d(Y) )²,

so βᵀ d(Y) is not zero almost surely and y_β exists with βᵀ d(y_β) ≠ 0.
The points y₁, …, y_q are chosen recursively as follows. First, let β₁ be the first unit vector and choose y₁ such that β₁ᵀ d(y₁) = d₁(y₁) ≠ 0. Next, let β₂ = (−d₂(y₁), d₁(y₁), 0, …, 0)ᵀ and choose y₂ such that

    β₂ᵀ d(y₂) = d₁(y₁) d₂(y₂) − d₂(y₁) d₁(y₂) = det D^{(2)}(y₁, y₂) ≠ 0,

i.e. such that D^{(2)}(y₁, y₂) is regular. Continue in the same manner: for y_r, assume that y₁, …, y_{r−1} are chosen such that D^{(r−1)}(y₁, …, y_{r−1}) is regular, and note that the determinant of D^{(r)}(y₁, …, y_{r−1}, Y) is a linear combination β_rᵀ d(Y) with coefficients β_r depending on d_j(y_i), j = 1, …, r and i = 1, …, r−1. Consequently, we can find y_r such that β_rᵀ d(y_r) = det D^{(r)}(y₁, …, y_r) ≠ 0. The assertion now follows for r = q. □

2.3.2 Simple estimating functions


An estimating function is called simple if it has the form F_n(θ) = ∑_{i=1}^{n} f(X_{i∆}, θ) where f : I × Θ → R^p takes only one state variable as argument (Kessler 2000). Condition (2.6) simplifies to: E_{θ₀} f(X₀, θ) = 0 if and only if θ = θ₀. It involves the marginal distribution only, which has two important consequences: First, since the invariant distribution is known explicitly, it is easy to find functions f analytically with E_{θ₀} f(X₀, θ₀) = 0. Second, simple estimating functions completely ignore the dependence structure of X and can only be used for estimation of (parameters in) the marginal distribution. This is of course a very serious objection.
Kessler (2000) shows asymptotic results for the corresponding estimators and is also concerned with optimality. This work was continued by Jacobsen (1998). However, it is usually not possible to find the optimal f, so f is chosen somewhat ad hoc. An obvious possibility is the score corresponding to the invariant distribution, f = ∂_θ log µ. Another is moment generated functions f_j(x, θ) = x^{k_j} − E_θ X₀^{k_j}, j = 1, …, p. Also, functions could be generated by the infinitesimal generator A_θ defined by (2.3): let h_j : I × Θ → R, j = 1, …, p, be such that the martingale part of h_j(X, θ) is a true martingale wrt. P_θ. Then f = (A_θ h₁, …, A_θ h_p)ᵀ gives rise to an unbiased, simple estimating function. Kessler (2000) suggests using low-order polynomials for h₁, …, h_p, regardless of the model.
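A sketch of the generator-based construction (our own example): for the OU model dX_t = θX_t dt + dW_t with θ < 0, the choice h(x) = x² gives A_θ h(x) = 2θx² + 1, and the resulting simple estimating equation has an explicit solution:

```python
import numpy as np

# Simple estimating function generated by the operator (2.3):
# for dX_t = theta*X_t dt + dW_t (theta < 0) and h(x) = x^2,
#   A_theta h(x) = 2*theta*x^2 + 1,
# so sum_i (2*theta*X_i^2 + 1) = 0 gives theta_hat = -n / (2 * sum X_i^2).
def simulate_ou(theta, delta, n, x0=0.0, seed=7):
    # exact transition: X_{t+d} | X_t = x ~ N(x*e^{theta*d}, (e^{2*theta*d}-1)/(2*theta))
    rng = np.random.default_rng(seed)
    a = np.exp(theta * delta)
    s = np.sqrt((a**2 - 1) / (2 * theta))
    x = np.empty(n + 1); x[0] = x0
    for i in range(n):
        x[i + 1] = a * x[i] + s * rng.standard_normal()
    return x

theta0 = -1.0
x = simulate_ou(theta0, delta=0.5, n=5000)
theta_hat = -len(x) / (2 * np.sum(x**2))
print(theta_hat)  # close to -1.0
```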
In Paper I we study the model-dependent choice (h₁, …, h_p) = ∂_θ log µ. We show that the corresponding estimating function, based on f_j = A_θ(∂_θ log µ)_j, j = 1, …, p, may be interpreted as an approximation to minus twice the continuous-time score function when σ does not depend on θ (Proposition I.1). Intuitively, we would thus expect it to work well for small values of ∆, and it is indeed small ∆-optimal in the sense of Jacobsen (1998), again provided that σ does not depend on θ.

There are two important differences from the usual Riemann–Itô approximation of the continuous-time score, that is, the logarithmic derivative wrt. θ of (2.5): the above approximation is unbiased, which the Riemann–Itô approximation is not; and each term in the Riemann–Itô approximation depends on pairs of observations, whereas each term in the above approximation depends on a single observation only.
Also note that the estimating function from Paper I is invariant to bijective and twice differentiable transformations of the data if σ does not depend on θ (Proposition I.2); this is not the case for the simple estimating functions discussed earlier. The ideas carry over (to some extent at least) to multi-dimensional diffusions, and the estimating function works quite well in simulation studies.
Finally, a remark connecting a simple estimating function F_n(θ) = ∑_{i=1}^{n} f(X_{i∆}, θ) to a class of martingale estimating functions. Define

    h_f(x, y, θ) = U_θ f(y, θ) − U_θ f(x, θ) + f(x, θ)

where U_θ is the potential operator given by U_θ f(x, θ) = ∑_{k=0}^{∞} E_θ( f(X_{k∆}, θ) | X₀ = x ). Then h_f satisfies condition (2.8), and the martingale estimating functions ∑_{i=1}^{n} h_f(X_{(i−1)∆}, X_{i∆}, θ) and F_n(θ) are asymptotically equivalent (Jacobsen 1998). However, the martingale estimating function may be improved by introducing weights α (unless of course the optimal weight α*(·, θ) is constant). In this sense martingale estimating functions are always better than (or at least as good as) simple estimating functions. In practice this is not very helpful, though, as the potential operator is in general not known! Also, the improvement may be very small, as we shall see in the following example.
Example (The Ornstein–Uhlenbeck process) Consider the solution to dX_t = θX_t dt + dW_t where θ < 0. Kessler (2000) shows that the optimal simple estimating function is obtained for f(x, θ) = 2θx² + 1. It is easy to see that h_f(x, y, θ) ∝ f(y, θ) − ψ f(x, θ) where ψ = ψ(θ, ∆) = exp(2θ∆), and that the optimal weight function is given by

    α*(x, θ) = E_θ( ḣ_f(X₀, X_∆) | X₀ = x ) / τ_{h_f}(x, θ) = ( −4θ∆ψx² − (1 − ψ + 2θ∆ψ)/θ ) / ( 2(1 − ψ)² − 8θψ(1 − ψ)x² ).

Since α*(·, θ) is not constant, improvement is indeed possible. It turns out, however, that the asymptotic variance is only reduced by about 1% (for θ₀ = −1). □
It is well-known that the optimal simple estimating function is nearly (globally) efficient in the Ornstein–Uhlenbeck model, and the example does not rule out the possibility that the improvement could be considerable for other models (and other simple estimating functions).

2.3.3 Comments
Obviously, there are lots of unbiased estimating functions that are neither martingales nor simple. For example,

    f(x, y, θ) = h₂(y, θ) A_θ h₁(x, θ) − h₁(x, θ) A_θ h₂(y, θ)

generates a class of estimating functions which are transition dependent and yet explicit (Hansen & Scheinkman 1995, Jacobsen 1998).
Estimating functions of different kinds may of course be combined. For example, one could first estimate parameters from the invariant distribution by solving a simple estimating equation, and second estimate parameters from the conditional distribution one step ahead. See Bibby & Sørensen (1998) for a successful application.
Also, estimating functions may be used as building blocks for the generalized method of moments (GMM), the much favored estimation method in the econometric literature (Hansen 1982). Estimation via GMM is essentially performed by choosing an estimating function F_n of dimension p′ > p and minimizing the quadratic form F_n(θ)ᵀ Ω F_n(θ) for some weight matrix Ω.
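A minimal GMM sketch (our own toy example, not diffusion-specific): two moment conditions for one parameter, combined through a quadratic form with an identity weight matrix:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# GMM sketch: more moment conditions than parameters (p' > p), combined
# through F_n(theta)^T Omega F_n(theta). Toy example: scalar theta,
# two moment conditions for N(theta, 1) data.
rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.0, size=2000)

def F(theta):
    # Moment conditions: E[X - theta] = 0 and E[X^2 - theta^2 - 1] = 0
    return np.array([np.mean(data - theta),
                     np.mean(data**2 - theta**2 - 1.0)])

Omega = np.eye(2)  # identity weight; efficient GMM would use Var(F)^-1
Q = lambda theta: F(theta) @ Omega @ F(theta)
theta_hat = minimize_scalar(Q, bounds=(0.0, 5.0), method="bounded").x
print(theta_hat)  # close to 2.0
```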

2.4 Approximate maximum likelihood estimation


We now describe three approximate maximum likelihood methods. They all supply approximations, analytical or numerical, of p_θ(∆, x, ·) for fixed x and θ. In particular they supply approximations of p_θ(∆, X_{(i−1)∆}, X_{i∆}), i = 1, …, n, and therefore of L_n(θ). The approximate likelihood is finally maximized over θ ∈ Θ.

2.4.1 An analytical approximation


A naive, explicit approximation of the conditional distribution of X_∆ given X₀ = x is provided by the Euler approximation. The Gaussian approximation may be poor even if the conditional moments are replaced by accurate approximations (or perhaps even the true moments). A sequence of explicit, non-Gaussian approximations of p_θ(∆, x, ·) is suggested by Aït-Sahalia (1998). For fixed x and θ the idea is to (i) transform X to a process Z which, conditional on X₀ = x, has Z₀ = 0 and Z_∆ "close" to standard normal; (ii) define a truncated Hermite series expansion of the density of Z_∆ around the standard normal density; and (iii) invert the Hermite approximation in order to obtain an approximation of p_θ(∆, x, ·).
For step (i) define Z = g_{x,θ}(X) where

    g_{x,θ}(y) = (1/√∆) ∫_x^y 1/σ(u, θ) du.

Then Z solves dZ_t = b_Z(Z_t, θ) dt + (1/√∆) dW_t with drift function given by Itô's formula and Z₀ = 0 (given X₀ = x). Note that g′_{x,θ}(y) = (∆σ²(y, θ))^{−1/2} > 0 for all y ∈ I, so that g_{x,θ} is injective.
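The transform in step (i) can be evaluated by quadrature; a small check (our own example, with the arbitrary choice σ(u) = c√u, for which the integral is available in closed form):

```python
import numpy as np
from scipy.integrate import quad

# g_{x,theta}(y) = (1/sqrt(delta)) * int_x^y du/sigma(u), checked against
# the closed form for sigma(u) = c*sqrt(u) (our illustrative choice):
#   int_x^y du/(c*sqrt(u)) = 2*(sqrt(y) - sqrt(x))/c.
c, delta, x = 0.8, 0.25, 1.0
sigma = lambda u: c * np.sqrt(u)

def g(y):
    val, _ = quad(lambda u: 1.0 / sigma(u), x, y)
    return val / np.sqrt(delta)

y = 2.3
closed_form = 2 * (np.sqrt(y) - np.sqrt(x)) / (c * np.sqrt(delta))
print(g(y), closed_form)
```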
For step (ii) note that N (0; 1) is a natural approximation of the conditional
distribution of Z∆ given Z0 = 0, as increments of Z over time intervals of length ∆
has approximately unit variance. Let pθZ (∆; 0; ) denote the true conditional density
of Z∆ given Z0 = 0 and let pZθ ;J (∆; 0; ) be the Hermite series expansion truncated after
J terms of pZθ (∆; 0; ) around the standard normal density.
2.4. Approximate maximum likelihood estimation 17

For step (iii) note that the true densities pθ(Δ, x, ·) and pZθ(Δ, 0, ·) are related by

    pθ(Δ, x, y) = pZθ(Δ, 0, gx,θ(y)) / (√Δ σ(y, θ)),   y ∈ I,

and apply this formula to invert the approximation pZθ,J(Δ, 0, ·) of pZθ(Δ, 0, ·) into an approximation pJθ(Δ, x, ·) of pθ(Δ, x, ·) in the natural way:

    pJθ(Δ, x, y) = pZθ,J(Δ, 0, gx,θ(y)) / (√Δ σ(y, θ)),   y ∈ I.

Then pJθ(Δ, x, y) converges to pθ(Δ, x, y) as J → ∞, suitably uniformly in y and θ. Furthermore, if J = J(n) tends to infinity fast enough as n → ∞, then the estimator maximizing ∏ᵢ₌₁ⁿ pJ(n)θ(Δ, X(i−1)Δ, XiΔ) is asymptotically equivalent to the maximum likelihood estimator (Aït-Sahalia 1998, Theorems 1 and 2).
Note that the coefficients of the Hermite series expansion cannot be computed explicitly but can be replaced by analytical approximations in terms of the infinitesimal generator. Hence, the technique provides explicit, though very complex, approximations to pθ(Δ, x, ·). Aït-Sahalia (1998) performs numerical experiments indicating that the error pJθ(Δ, x, y) − pθ(Δ, x, y) decreases quickly, roughly by a factor of 10 for each extra term included in the expansion of pZθ(Δ, 0, ·).
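As a concrete illustration of steps (i)-(iii), the sketch below builds a truncated Hermite expansion for a toy Ornstein-Uhlenbeck model with constant diffusion coefficient, so that the transform gx,θ reduces to a rescaling. For simplicity the Hermite coefficients are estimated by Monte Carlo rather than by the generator-based analytical approximations of the paper; the model, parameter values and sample sizes are illustrative assumptions.

```python
import numpy as np
from math import factorial, sqrt, exp, pi
from numpy.polynomial.hermite_e import hermeval

def hermite_density_approx(z_samples, J):
    """Truncated Hermite expansion of a density around N(0,1):
    p(z) ~ phi(z) * sum_{j=0..J} eta_j He_j(z) with eta_j = E[He_j(Z)]/j!.
    The eta_j are estimated from samples of Z (a Monte Carlo stand-in for
    the analytical coefficient approximations of the paper)."""
    coefs = np.zeros(J + 1)
    for j in range(J + 1):
        basis = np.zeros(j + 1); basis[j] = 1.0
        coefs[j] = hermeval(z_samples, basis).mean() / factorial(j)
    return lambda z: np.exp(-z**2 / 2) / sqrt(2 * pi) * hermeval(z, coefs)

# Toy model: OU process dX = -theta*X dt + sigma dW.  With constant sigma the
# transform g_{x,theta} reduces to Z = (X_Delta - x0) / (sigma * sqrt(Delta)).
rng = np.random.default_rng(1)
theta, sigma, Delta, x0 = 0.5, 0.3, 0.5, 0.2
n_paths, n_sub = 200_000, 50
dt = Delta / n_sub
x = np.full(n_paths, x0)
for _ in range(n_sub):                        # fine Euler scheme on [0, Delta]
    x += -theta * x * dt + sigma * sqrt(dt) * rng.standard_normal(n_paths)
z = (x - x0) / (sigma * sqrt(Delta))          # step (i): transform, Z_0 = 0

p_z = hermite_density_approx(z, J=6)          # step (ii): truncated expansion
# step (iii): invert back to an approximation of p_theta(Delta, x0, .)
p_x = lambda y: p_z((y - x0) / (sigma * sqrt(Delta))) / (sigma * sqrt(Delta))
```

For this toy model the exact transition density is Gaussian, so the quality of the J-term approximation can be checked directly; in a genuinely non-linear model the coefficients would instead be approximated analytically via the infinitesimal generator.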

2.4.2 Numerical solutions of the Kolmogorov forward equation


A classical result from stochastic calculus states that the transition densities under
certain regularity conditions are characterized as solutions to the Kolmogorov for-
ward equations. Lo (1988) employs a similar result and finds explicit expressions
for the likelihood function for a log-normal diffusion with jumps and a Brownian
motion with zero as an absorbing state. Poulsen (1999) seems to be the first to
employ numerical procedures for non-trivial diffusion models.
For x and θ fixed, the forward equation for pθ(·, x, ·) is a partial differential equation: for (t, y) ∈ (0, ∞) × I,

    ∂/∂t pθ(t, x, y) = −∂/∂y [b(y, θ) pθ(t, x, y)] + (1/2) ∂²/∂y² [σ²(y, θ) pθ(t, x, y)],

with initial condition pθ(0, x, y) = δ(x − y), where δ is the Dirac delta function. In order to calculate the likelihood Ln(θ) one has to solve n of the above forward equations, one for each X(i−1)Δ, i = 1, …, n. Note that the forward equation for X(i−1)Δ determines pθ(t, X(i−1)Δ, y) for all values of (t, y), but that we only need it at a single point, namely (Δ, XiΔ).
Poulsen (1999) employs the Crank-Nicolson finite difference method for each of the n forward equations. For fixed θ he obtains a second order approximation of log Ln(θ) in the sense that the numerical approximation log Lhn(θ) satisfies

    log Lhn(θ) = log Ln(θ) + h² fnθ(X0, XΔ, …, XnΔ) + o(h²) gnθ(X0, XΔ, …, XnΔ)

for suitable functions fnθ and gnθ. The parameter h determines how fine-grained a (t, y)-grid is used in the numerical procedure (and thus the accuracy of the approximation). If h = h(n) tends to zero faster than n^(−1/4) as n → ∞, then the estimator maximizing log Lhn(θ) is asymptotically equivalent to the maximum likelihood estimator (Poulsen 1999, Theorem 3).
Poulsen (1999) fits the CKLS model to a dataset of 655 observations (in a revised version, even a six-parameter extension is fitted) and is able to do so in quite reasonable time. Although n partial differential equations must be solved, the method seems to be much faster than the simulation-based method below.
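A minimal sketch of this approach, assuming an Ornstein-Uhlenbeck toy model so that the result can be checked against the exact transition density: the forward equation is discretized with central differences in space and a Crank-Nicolson step in time, and the Dirac initial condition is smoothed to a narrow Gaussian (a standard trick; a production implementation would rather use e.g. Rannacher start-up steps). Grid sizes and parameters are illustrative.

```python
import numpy as np

# Crank-Nicolson solution of the Kolmogorov forward equation for an OU toy
# model dX = -theta*X dt + sigma dW, chosen because p_theta(t, x, y) is
# known exactly and the numerical density can be checked.
theta, sigma, Delta, x0 = 0.5, 0.3, 0.5, 0.2
y = np.linspace(-1.0, 1.4, 241)               # spatial grid; p ~ 0 at the edges
dy = y[1] - y[0]
n_steps = 50
dt = Delta / n_steps

drift = -theta * y
s2 = np.full_like(y, sigma**2)

# (A p)_i discretizes -d/dy(b p) + (1/2) d^2/dy^2(sigma^2 p) with central
# differences; boundary rows are left zero, keeping p = 0 there
m = len(y)
A = np.zeros((m, m))
for i in range(1, m - 1):
    A[i, i - 1] = drift[i - 1] / (2 * dy) + s2[i - 1] / (2 * dy**2)
    A[i, i] = -s2[i] / dy**2
    A[i, i + 1] = -drift[i + 1] / (2 * dy) + s2[i + 1] / (2 * dy**2)

# Dirac initial condition smoothed to a narrow Gaussian so that
# Crank-Nicolson does not oscillate on non-smooth data
eps = 3 * dy
p = np.exp(-(y - x0)**2 / (2 * eps**2)) / (eps * np.sqrt(2 * np.pi))

M_impl = np.eye(m) - 0.5 * dt * A
M_expl = np.eye(m) + 0.5 * dt * A
for _ in range(n_steps):                      # one Crank-Nicolson step per dt
    p = np.linalg.solve(M_impl, M_expl @ p)
# p now approximates p_theta(Delta, x0, .) on the grid y
```

In the likelihood setting this solve would be repeated once per observation X(i−1)Δ, reading the result off at the single point (Δ, XiΔ).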

2.4.3 Approximation via simulation


Pedersen (1995b) defines a sequence of approximations to pθ(Δ, x, ·) via a missing data approach. The basic idea is to (i) split the time interval from 0 to Δ into pieces short enough that the Euler approximation holds reasonably well; (ii) consider the joint Euler likelihood for the augmented data consisting of the observation XΔ and the values of X at the endpoints of the subintervals; (iii) integrate the unobserved variables out of the joint Euler density; and (iv) calculate the resulting expectation by simulation. The method has been applied successfully to the CKLS model (Honoré 1997).
To be precise, let x and θ be fixed, consider an integer N ≥ 0, and split the interval [0, Δ] into N + 1 subintervals of length ΔN = Δ/(N + 1). Use the notation X0,k for the (unobserved) value of X at time kΔN, k = 1, …, N. Then (with x0,0 = x and x0,N+1 = y),

    pθ(Δ, x, y) = ∫_{I^N} ∏_{i=1}^{N+1} pθ(ΔN, x0,i−1, x0,i) d(x0,1, …, x0,N)
                = ∫_I pθ(NΔN, x, x0,N) pθ(ΔN, x0,N, y) dx0,N
                = Eθ[ pθ(ΔN, X0,N, y) | X0 = x ],   y ∈ I,    (2.12)

where we have used the Chapman-Kolmogorov equations.


Now, for ΔN small (N large), pθ(ΔN, x0,N, ·) is well approximated by the normal density with mean x0,N + b(x0,N, θ)ΔN and variance σ²(x0,N, θ)ΔN. Let p̃Nθ(ΔN, x0,N, ·) denote this density. Following (2.12),

    pNθ(Δ, x, y) = Eθ[ p̃Nθ(ΔN, X0,N, y) | X0 = x ]

is a natural approximation of pθ(Δ, x, y), y ∈ I. Note that N = 0 corresponds to the simple Euler approximation.
The approximate likelihood functions LNn(θ) = ∏_{i=1}^n pNθ(Δ, X(i−1)Δ, XiΔ) converge in probability to Ln(θ) as N → ∞ (Pedersen 1995b, Theorems 3 and 4). Furthermore, there exists a sequence N(n) such that the estimator maximizing LN(n)n(θ) is asymptotically equivalent (as n → ∞) to the maximum likelihood estimator (Pedersen 1995a, Theorem 3).
In practice we could calculate pNθ(Δ, x, y) as the average of a large number of values {p̃Nθ(ΔN, Xr0,N, y)}r, where Xr0,N is the last element of a simulated discrete-time path Xr0, Xr0,1, …, Xr0,N started at x. Note that the paths are simulated conditional on X0 = x only, which implies that the simulated values Xr0,N at time NΔN may be far from the observed value at time Δ. This is not very appealing, as the continuity of X makes a large jump over a small time interval unlikely to occur in practice. It also has the unfortunate numerical implication that a very large number of simulations is needed in order to obtain convergence of the average. Elerian et al. (2000, Section 3.1) suggest an importance sampling technique which utilizes the observation at time Δ as well, but it is far more difficult to perform than the above (see also Section 2.5 below).
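The simulation estimator of pNθ(Δ, x, y) can be sketched in a few lines. OU drift and constant diffusion are used as a stand-in model (an assumption made so that the result can be checked against the exact transition density), and R Euler paths are averaged:

```python
import numpy as np

# Pedersen's simulated approximation p^N_theta(Delta, x, y): simulate R Euler
# paths over the first N sub-intervals (conditional on X_0 = x only), then
# average the one-step Euler (Gaussian) density of the final jump to y.
def b(x, theta): return -theta * x            # OU drift (toy choice)
def s(x, theta): return 0.3 + 0.0 * x         # constant diffusion (toy choice)

def pedersen_density(x, y, theta, Delta, N, R, rng):
    dN = Delta / (N + 1)
    path = np.full(R, float(x))
    for _ in range(N):                        # Euler scheme for X_{0,1},...,X_{0,N}
        path += b(path, theta) * dN + s(path, theta) * np.sqrt(dN) * rng.standard_normal(R)
    mean = path + b(path, theta) * dN         # Euler mean of the last sub-step ...
    var = s(path, theta)**2 * dN              # ... and its variance
    dens = np.exp(-(y - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return dens.mean()                        # Monte Carlo average over the R paths

rng = np.random.default_rng(2)
p_hat = pedersen_density(x=0.2, y=0.25, theta=0.5, Delta=0.5, N=19, R=100_000, rng=rng)
```

The large R needed here is exactly the numerical drawback discussed above: many simulated endpoints land far from y and contribute almost nothing to the average.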

2.5 Bayesian analysis


Bayesian analysis of discretely observed diffusions has been discussed by Eraker
(1998) and Elerian et al. (2000). The unknown model parameter is treated as a
missing data point, and Markov Chain Monte Carlo (MCMC) methods are used for
simulation of the posterior distribution of the parameter with density
    f(θ | X0, XΔ, …, XnΔ) ∝ f(X0, XΔ, …, XnΔ | θ) f(θ).    (2.13)

The Bayesian estimator of θ is simply the mean (say) of this posterior. Note that we use f generically for densities. In particular, f(θ) denotes the prior density of the parameter and f(X0, …, XnΔ | θ) denotes the likelihood function evaluated at θ.
The Bayesian approach deals with the intractability of f(X0, …, XnΔ | θ) in a way very similar to that of Pedersen (1995b), namely by introducing auxiliary data and employing the Euler approximation over small time intervals. However, the auxiliary data are generated and used quite differently in the two approaches.
As in Section 2.4.3 each interval [(i−1)Δ, iΔ] is split into N + 1 subintervals of length ΔN = Δ/(N + 1). We use the notation XiΔ,k for the value of X at time iΔ + kΔN, i = 0, …, n−1 and k = 0, …, N + 1. The value is observed for k = 0 and k = N + 1, and X(i−1)Δ,N+1 = XiΔ,0. Further, let X̃iΔ be the collection of latent variables XiΔ,1, …, XiΔ,N between iΔ and (i+1)Δ, let X̃ = (X̃0, …, X̃(n−1)Δ) be the nN-vector of all auxiliary variables, and let Xobs be short for the vector of observations X0, XΔ, …, XnΔ.
For N large enough the Euler approximation is quite good, and the density of (Xobs, X̃), conditional on θ (and X0), is roughly

    fN(Xobs, X̃ | θ) = ∏_{i=0}^{n−1} ∏_{k=1}^{N+1} φ(XiΔ,k; XiΔ,k−1 + b(XiΔ,k−1, θ)ΔN, σ²(XiΔ,k−1, θ)ΔN),    (2.14)

where ϕ (; m; v) is the density of N (m; v). The idea is now to generate a Markov
chain fX̃ j ; θ j g j with invariant (and limiting) density equal to the approximate
20 Chapter 2. Inference for diffusion processes

posterior density

f N (X obs ; X̃ jθ ) f (θ )
f N (X̃ ; θ jX obs ) = ∝ f N (X obs ; X̃ jθ ) f (θ ): (2.15)
f (X obs )

Then fθ j g j has invariant density equal to the marginal of f N (X̃ ; θ jX obs ). This
is interpreted as an approximation of the posterior (2.13) of θ and the Bayes
estimator of θ is simply the average of the simulated values fθ j g j (after some
burn-in time).
In order to start off the Markov chain, θ 0 is drawn according to the prior den-
sity f (θ ), and X̃ 0 is defined by linear interpolation between the observed values
of X , say. The j’th iteration in the Markov chain is conducted in two steps: first,
X̃ j = (X̃0j ; : : : ; X̃ j ) is updated from f (X̃ jX obs ; θ j 1 ), and second, θ j is updated
(n 1)∆
from f (θ jX obs ; X̃ j ).
For the first step, note that the Markov property of X implies that the conditional distribution of X̃iΔ given (Xobs, θ) depends on (XiΔ, X(i+1)Δ, θ) only, so the vectors X̃ jiΔ, i = 0, …, n−1, may be drawn one at a time. We focus on how to draw X̃0 = (X0,1, …, X0,N) conditional on (X0, XΔ, θ j−1), the target density being proportional to

    ∏_{k=1}^{N+1} φ(X0,k; X0,k−1 + b(X0,k−1, θ j−1)ΔN, σ²(X0,k−1, θ j−1)ΔN),

cf. (2.14). It is (usually) not possible to find the normalizing constant, so direct sampling from the density is not feasible. However, the Metropolis-Hastings algorithm may be employed, for example with suitable Gaussian proposals. Eraker (1998) suggests sampling only one element of X̃0 at a time, whereas Elerian et al. (2000) suggest sampling block-wise, with random block sizes. The latter is supposed to increase the rate of convergence of the Markov chain (of course, all the usual problems with convergence of the chain should be investigated). Note the crucial difference from the simulation approach in Section 2.4.3, where X̃iΔ was simulated conditional on XiΔ only: here X̃iΔ is simulated conditional on both XiΔ and X(i+1)Δ.
For the second step it is sometimes possible to find the posterior of θ explicitly from (2.15), in which case θ is updated by direct sampling from the density. Otherwise the Metropolis-Hastings algorithm is employed again.
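A single-site update of one latent point, in the spirit of Eraker's sampler, can be sketched as follows. The Euler target for one interior point with OU drift and constant diffusion is an illustrative assumption, and the proposal scale is untuned:

```python
import numpy as np

# Metropolis-Hastings update of one latent point x_mid lying between two
# fixed neighbours x_left (time 0) and x_right (time 2*dN).  The target is
# the product of the two one-step Euler (Gaussian) densities, cf. (2.14),
# here with toy OU drift b(x) = -theta*x and constant diffusion sigma.
def log_euler_target(x_mid, x_left, x_right, theta, sigma, dN):
    def log_phi(z, mean, var):                # log of the N(mean, var) density
        return -0.5 * np.log(2 * np.pi * var) - (z - mean)**2 / (2 * var)
    lp = log_phi(x_mid, x_left - theta * x_left * dN, sigma**2 * dN)
    return lp + log_phi(x_right, x_mid - theta * x_mid * dN, sigma**2 * dN)

def mh_update(x_mid, x_left, x_right, theta, sigma, dN, step, rng):
    prop = x_mid + step * rng.standard_normal()       # Gaussian random-walk proposal
    log_a = (log_euler_target(prop, x_left, x_right, theta, sigma, dN)
             - log_euler_target(x_mid, x_left, x_right, theta, sigma, dN))
    return prop if np.log(rng.uniform()) < log_a else x_mid

rng = np.random.default_rng(3)
theta, sigma, dN = 0.5, 0.3, 0.1
x_left, x_right = 0.2, 0.35
x = 0.5 * (x_left + x_right)                  # initialise by linear interpolation
draws = np.empty(20_000)
for j in range(draws.size):
    x = mh_update(x, x_left, x_right, theta, sigma, dN, step=0.15, rng=rng)
    draws[j] = x
```

Because both Euler factors are Gaussian and linear in x_mid, the target here is itself Gaussian, which makes the chain easy to validate; in a real model one would sweep over all latent points (or blocks, as in Elerian et al.) and then update θ.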
The method is easily extended to cover the multi-dimensional case. Also, it applies to models that are only partially observed (e.g. stochastic volatility models), in which case the values of the unobserved coordinates are simulated like X̃ above (Eraker 1998). Eraker (1998) analyses US interest rate data and simulated data, using the CKLS model dXt = α(β − Xt) dt + σXtγ dWt as well as a stochastic volatility model (see Section 3.4.4). Elerian et al. (2000) apply the method to simulated Cox-Ingersoll-Ross data and to interest rate data using a non-standard eight-parameter model.

2.6 Estimation based on auxiliary models


We now discuss indirect inference (Gourieroux et al. 1993) and the so-called ef-
ficient method of moments, or EMM for short (Gallant & Tauchen 1996). The
methods are essentially applicable whenever simulation from the model is possi-
ble and there exists a suitable auxiliary model. This flexibility must be the reason
why the methods are fairly often applied by econometricians in empirical studies.
However, we find the methods somewhat artificial and awkward and believe that
the term “efficient” in EMM is misleading.
The idea is most easily described in a relatively general set-up: let (Y1 ; : : : ; Yn )
be data from a (complicated) time series model Qθ , indexed by the parameter
of interest θ . Estimation is performed in two steps: First, the model Qθ is ap-
proximated by a simpler one Q̃ρ — the auxiliary model, indexed by ρ — and the
auxiliary parameter ρ is estimated. Second, the two parameters ρ and θ are linked
in order to obtain an estimate of θ . This is done via a GMM procedure, and the
first step may simply be viewed as a way of finding moment functionals for the
GMM procedure.
Let us be more specific. Assume that (Y1, …, Yn) has density q̃n(·; ρ) under Q̃ρ and let ρ̂n be the maximum likelihood estimator of ρ, that is,

    ρ̂n = argmaxρ log q̃n(Y1, …, Yn; ρ),

with first-order condition

    (∂/∂ρ) log q̃n(Y1, …, Yn; ρ̂n) = 0.

Loosely speaking, θ̂n is now defined such that simulated data drawn from Qθ̂n resemble data drawn from Q̃ρ̂n.
For θ ∈ Θ let Y1θ, …, YRθ be a long trajectory simulated from Qθ and let ρ̂R(θ) be the maximum likelihood estimator of ρ based on the simulated data. The indirect inference estimator of θ is the value minimizing the quadratic form

    (ρ̂n − ρ̂R(θ))ᵀ Ω (ρ̂n − ρ̂R(θ)),

where Ω is some positive semidefinite matrix of size dim(ρ) × dim(ρ). In EMM computation of ρ̂R(θ) is avoided; instead,

    [(∂/∂ρ) log q̃R(Y1θ, …, YRθ; ρ̂n)]ᵀ Ω̃ [(∂/∂ρ) log q̃R(Y1θ, …, YRθ; ρ̂n)],

with Ω̃ like Ω above, is minimized.


Both estimators of θ are consistent and asymptotically normal, and they are asymptotically equivalent (if Ω and Ω̃ are chosen appropriately). If θ and ρ have the same dimension, then the two estimators coincide and simply solve ρ̂R(θ̂n) = ρ̂n. However, as the auxiliary model should be both easy to handle statistically and flexible enough to resemble the original model, it is often necessary to use one of higher dimension than the original model.

Now, how should we choose the auxiliary model? For the diffusion models considered in this chapter the discrete-time Euler scheme

    XiΔ = X(i−1)Δ + b(X(i−1)Δ, ρ)Δ + σ(X(i−1)Δ, ρ)√Δ Ui,

with U1, …, Un independent and identically N(0, 1)-distributed, is a natural suggestion (Gourieroux et al. 1993). The second step in the estimation procedure corrects for the discrepancy between the true conditional distributions and those suggested by the Euler scheme. In a small simulation study for the Ornstein-Uhlenbeck process (solving dXt = −θXt dt + σ dWt) the indirect inference estimator was highly inefficient (compared to the maximum likelihood estimator). In the EMM literature it is generally suggested to use auxiliary densities based on expansions of a non-parametric density (Gallant & Long 1997). Under certain (strong) conditions EMM performed with these auxiliary models is claimed to be as efficient as maximum likelihood.
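The two-step procedure can be sketched for a toy case in which the auxiliary model is an AR(1) fitted by least squares; the model choice, the common-random-number trick, the grid search, and all tuning constants below are illustrative assumptions:

```python
import numpy as np

# Indirect inference sketch: the "complicated" model Q_theta is an OU process
# dX = -theta*X dt + sigma dW observed at spacing Delta and simulated by an
# Euler scheme; the auxiliary parameter rho is the pooled least-squares AR(1)
# coefficient of the observation series.
def simulate_obs(theta, sigma, Delta, n_obs, z, n_sub=10):
    # z has shape (n_obs * n_sub, n_paths); the innovations are pre-drawn so
    # that the same random numbers are reused for every theta
    dt = Delta / n_sub
    obs = np.zeros((n_obs + 1, z.shape[1]))
    cur = np.zeros(z.shape[1])
    k = 0
    for i in range(n_obs):
        for _ in range(n_sub):
            cur = cur - theta * cur * dt + sigma * np.sqrt(dt) * z[k]
            k += 1
        obs[i + 1] = cur
    return obs

def ar1_coef(obs):
    # the auxiliary MLE rho_hat: pooled AR(1) regression over all paths
    y0, y1 = obs[:-1].ravel(), obs[1:].ravel()
    return np.dot(y1, y0) / np.dot(y0, y0)

rng = np.random.default_rng(4)
theta_true, sigma, Delta, n_obs = 0.5, 0.3, 0.5, 50
z_data = rng.standard_normal((n_obs * 10, 500))
rho_n = ar1_coef(simulate_obs(theta_true, sigma, Delta, n_obs, z_data))

z_sim = rng.standard_normal((n_obs * 10, 2000))   # the "long" simulated sample
grid = np.linspace(0.1, 1.0, 91)
obj = [(rho_n - ar1_coef(simulate_obs(t, sigma, Delta, n_obs, z_sim)))**2
       for t in grid]
theta_hat = float(grid[int(np.argmin(obj))])
```

Here dim(ρ) = dim(θ) = 1, so minimizing the quadratic form amounts to solving ρ̂R(θ̂n) = ρ̂n; fixing the simulation innovations across θ values keeps the objective smooth in θ.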
However, we are convinced that EMM is by no means efficient in practice. The
choice of auxiliary model is still quite arbitrary (and fairly incomprehensible), and
the whole idea seems slightly artificial. We believe that for many models it is
possible to do some kind of (simulated) likelihood approximation that is as fast
and efficient — and far more comprehensible. This has already been done for the
diffusion models (Section 2.4) and Paper III provides ideas for stochastic volatility
models in continuous time.

2.7 Estimation of parameters in the diffusion term


In Paper II we discuss a method for estimation of parameters in the diffusion func-
tion which does not fit into any of the previous sections. We briefly sketch the idea
here and refer to Paper II for details.
Assume that the drift is known, b(x, θ) = b(x) (or has been estimated by some other method). Recall that µ(·, θ) is the invariant density, and define f = σ²µ : I × Θ → (0, ∞). By equation (2.2) it is easy to verify that f′ = 2bµ. Aït-Sahalia (1996) uses this relation for non-parametric estimation of σ² via kernel estimation methods. In Paper II the relation is used for parametric estimation. The idea is to define a pointwise consistent estimator of f(·, θ) and estimate θ by the value that minimizes the uniform distance between the "true" function f(·, θ) and the estimated version.
It is crucial that f converges to zero at at least one of the endpoints, l and r, of the state space. If f(x, θ) → 0 as x ↓ l, then f(x, θ) = 2∫_l^x b(u)µ(u, θ) du for all x ∈ I, and

    f̂1,n(x) = (2/n) ∑_{i=1}^n b(XiΔ) 1{XiΔ ≤ x}

is consistent for f(x, θ), x ∈ I. The uniform distance sup_{x∈I} |f(x, θ) − f̂1,n(x)| is minimized in order to obtain an estimator of θ. Similarly, if f(x, θ) → 0 as x ↑ r, then

    f̂2,n(x) = −(2/n) ∑_{i=1}^n b(XiΔ) 1{XiΔ > x}

is consistent for f(x, θ), x ∈ I, and sup_{x∈I} |f(x, θ) − f̂2,n(x)| is minimized. If f(x, θ) → 0 at both l and r, then both f̂1,n and f̂2,n provide pointwise consistent estimators of f(·, θ), and we may use a weighted average f̂n of the two in order to reduce variance.
The estimators are √n-consistent and in certain cases weakly convergent (Theorems II.7 and II.9), but the limit distribution need not be Gaussian. Note that the observations are mixed in a quite complex way in the uniform distance, so the usual limit theorems do not apply. Instead, the asymptotic results are proved using empirical process theory. We are not aware of any other applications of empirical process theory to problems related to inference for diffusion processes.
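A sketch of the whole procedure for a toy model in which f(·, θ) is explicit: an OU process with known drift b(x) = −x and unknown σ, whose invariant law is N(0, σ²/2), so that f(x, σ) = σ²µ(x, σ) is available in closed form. Sampling from the exact OU transition and evaluating the sup-distance on a finite grid are simplifying assumptions, not the setting of Paper II:

```python
import numpy as np

# Estimate sigma by matching f(., sigma) = sigma^2 * mu(., sigma) against the
# empirical estimator f1_hat in uniform distance.
rng = np.random.default_rng(5)
sigma_true, Delta, n = 0.5, 0.5, 20_000
a = np.exp(-Delta)                            # exact OU autoregression coefficient
sd = np.sqrt(sigma_true**2 / 2 * (1 - a**2))
X = np.empty(n)
x = 0.0
for i in range(n):                            # discrete-time observations X_{i*Delta}
    x = a * x + sd * rng.standard_normal()
    X[i] = x

def f_hat(xs, X):
    # f1_hat(x) = (2/n) sum_i b(X_i) 1{X_i <= x}  with  b(x) = -x
    return np.array([2.0 / len(X) * np.sum(-X[X <= t]) for t in xs])

def f_true(xs, sigma):
    v = sigma**2 / 2                          # invariant variance of the OU model
    return sigma**2 * np.exp(-xs**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

xs = np.linspace(-1.5, 1.5, 101)              # grid on which the sup is evaluated
fh = f_hat(xs, X)
sig_grid = np.linspace(0.2, 1.0, 161)
dist = [np.max(np.abs(f_true(xs, s) - fh)) for s in sig_grid]
sigma_hat = float(sig_grid[int(np.argmin(dist))])
```

Since f → 0 at both endpoints here, f̂2,n (or a weighted average of the two estimators) could be used in exactly the same way.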
In Paper II we apply the method to simulated data from the CKLS model, dXt = (α + βXt) dt + σXtγ dWt, and get reasonable estimators for both γ and σ. The drift parameters are estimated beforehand using martingale estimating functions. Note that this model is relatively hard to identify, as different values of the pair (γ, σ) may yield very similar diffusion functions.
There are two objections to the method. First, it provides estimators of the
parameters in the diffusion function only; the drift needs to be estimated before-
hand. This is possible via martingale estimating functions if the drift is linear (as
in many popular models, e.g. the CKLS model above), but is otherwise difficult.
Second, the approach is perhaps somewhat ad hoc and the estimators need not be
efficient.

2.8 Conclusion
Maximum likelihood estimation is typically not possible for diffusion processes
that have been observed at discrete time-points only. In this chapter we have
reviewed a number of alternatives from the literature.
From a classical point of view, the most appealing methods are those based
on approximations of the true likelihood that in principle can be made arbitrarily
accurate. We reviewed three types above: One provides analytical approximations
to the likelihood function and is therefore in principle the easiest one to use. The
expressions are quite complicated, though, even for low-order approximations.
The other two rely on numerical techniques, one on numerical solutions to partial
differential equations and one on simulations. Even with today’s efficient comput-
ers both methods are quite computationally demanding so faster procedures are
often valuable.
Estimation via estimating functions is generally much faster. So-called simple
estimating functions are available in explicit form but provide only estimators for
parameters from the marginal distribution. Still, they may be useful for prelimi-
nary analysis. Paper I investigates a special simple estimating function which can

be interpreted as an approximation of the continuous-time score function. The


corresponding estimator is invariant to transformations of data. Martingale estimating functions are analytically available for a few models but must in general be computed by simulation. This basically amounts to simulating conditional expectations, which is faster than calculating conditional densities as required by
the direct likelihood approximations above. Under regularity conditions, estima-
tors obtained by martingale estimating functions are consistent and asymptotically
normal. We studied one of the regularity conditions in some detail and showed
how it may be explained in terms of reparametrizations.
The Bayesian approach is to consider the parameter as random and make sim-
ulations from its (posterior) distribution. This is quite hard and requires simu-
lation, conditional on the observations, of the diffusion process at a number of
time-points in between those where it was observed. The posterior distribution
depends on the prior distribution which is chosen more or less arbitrarily. Indi-
rect inference and EMM remove bias due to the discrete-time auxiliary model by
simulation methods. The quality of the estimators is bound to depend on the aux-
iliary model which is chosen somewhat arbitrarily, and we believe that more direct
approaches are preferable. The procedure from Section 2.7 (and Paper II) for esti-
mation of the diffusion parameters (when the drift is known) provides satisfactory
estimates in the difficult CKLS model. The estimators are probably not efficient,
though. The application of empirical process theory for proving asymptotic results
is interesting from a theoretical point of view.
