Chapter V. Stochastic Processes in Continuous Time
1. Brownian Motion
We realise Brownian motion on the time-interval [0, 1] on the following (canonical) filtered probability
space: (i) Ω = C[0, 1], the space of all continuous functions on [0, 1]; (ii) the
points ω ∈ Ω are thus random functions, and we use the coordinate mappings:
Xt, or Xt(ω), := ωt; (iii) the filtration is given by Ft := σ(Xs : 0 ≤ s ≤ t),
F := F1; (iv) P is the measure on (Ω, F) with finite-dimensional distributions
specified by the requirement that the increments Xt+u − Xt are stationary
independent Gaussian N(0, u).
The best way to prove this is by construction, and one that reveals some
properties. The result below is originally due to Paley, Wiener and Zygmund
(1933) and Lévy (1948), but is re-written in the modern language of wavelet
expansions. We omit the proof; for this, see e.g. [BK] 5.3.1, or SP L20-22.
The Haar system (Hn ) = (Hn (.)) is a complete orthonormal system (cons)
of functions in L2 [0, 1]. The Schauder system ∆n is obtained by integrating
the Haar system. Consider the triangular function (or ‘tent function’)
∆(t) := 2t on [0, 1/2), 2(1 − t) on [1/2, 1], 0 else.
With ∆0(t) := t and ∆1(t) := ∆(t), define the nth Schauder function ∆n for
n = 2^j + k (j ≥ 0, 0 ≤ k < 2^j) by ∆n(t) := ∆(2^j t − k), and set λ0 := 1,
λn := 2^{−j/2}/2 (n = 2^j + k), so that λn ∆n is the integral of the nth Haar function.
Theorem (PWZ theorem: Paley-Wiener-Zygmund, 1933). For (Zn)_{n=0}^∞
independent N(0, 1) random variables, λn, ∆n as above,

Wt := Σ_{n=0}^∞ λn Zn ∆n(t)
converges uniformly on [0, 1], a.s. The process W = (Wt : t ∈ [0, 1]) is Brow-
nian motion.
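As a quick illustration (not part of the proof), here is a minimal Python sketch of the truncated expansion, assuming the standard choices ∆n(t) = ∆(2^j t − k) and λn = 2^{−j/2}/2 for n = 2^j + k used above; the truncation level J, the sample grid and the seed are arbitrary.

    import numpy as np

    def tent(t):
        # Tent function: 2t on [0, 1/2), 2(1 - t) on [1/2, 1], 0 elsewhere
        return np.where((t >= 0) & (t < 0.5), 2 * t,
                        np.where((t >= 0.5) & (t <= 1), 2 * (1 - t), 0.0))

    def pwz_brownian(J=12, npoints=1001, seed=None):
        # Truncated PWZ expansion W_t = sum_n lambda_n Z_n Delta_n(t),
        # keeping Delta_0(t) = t and levels j = 0, ..., J-1.
        rng = np.random.default_rng(seed)
        t = np.linspace(0.0, 1.0, npoints)
        W = rng.standard_normal() * t                    # lambda_0 = 1, Delta_0(t) = t
        for j in range(J):
            lam = 0.5 * 2 ** (-j / 2)                    # lambda_n = 2^(-j/2)/2, n = 2^j + k
            Z = rng.standard_normal(2 ** j)
            for k in range(2 ** j):
                W += lam * Z[k] * tent(2 ** j * t - k)   # Delta_n(t) = Delta(2^j t - k)
        return t, W

    t, W = pwz_brownian(seed=42)
    print(W[0], W[-1])    # W_0 = 0; W_1 = Z_0, an N(0, 1) variable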
The Zero Set of Brownian Motion.
The zero set of the Brownian path is
Z := {t ≥ 0 : Xt = 0}.
Next, if tn are zeros and tn → t, then by path-continuity B(tn ) → B(t); but
B(tn ) = 0, so B(t) = 0:
2. Z is a closed set (Z contains its limit points).
Less obvious are the next two properties:
3. Z is a perfect set: every point t ∈ Z is a limit point of points in Z. So
there are infinitely many zeros in every neighbourhood of every zero (so the
paths must oscillate amazingly fast!).
4. Z is a (Lebesgue) null set: Z has Lebesgue measure zero.
In particular, any diagram grossly distorts Z: it is impossible to draw a
realistic picture of a Brownian path.
Brownian Scaling.
For each c ∈ (0, ∞), X(c^2 t) is N(0, c^2 t), so Xc(t) := c^{-1} X(c^2 t) is N(0, t).
Thus Xc has all the defining properties of a Brownian motion (check). So Xc is
itself a Brownian motion: Brownian motion is invariant under this rescaling (self-similarity).
Brownian motion owes part of its importance to belonging to all the im-
portant classes of stochastic processes: it is (strong) Markov, a (continuous)
martingale, Gaussian, a diffusion, a Lévy process (process with stationary
independent increments), etc.
We assume that the filtration is complete (each Ft contains all P-null sets) and right-continuous:
Ft = ∩s>t Fs
(the 'usual conditions' – right-continuity and completeness – in Meyer's terminology).
A stochastic process X = (Xt )t≥0 is a family of random variables defined
on a filtered probability space with Xt Ft -measurable for each t: thus Xt is
known when Ft is known, at time t.
If {t1 , · · · , tn } is a finite set of time-points in [0, ∞), (Xt1 , · · · , Xtn ), or
(X(t1 ), · · · , X(tn )) (for typographical convenience, we use both notations in-
terchangeably, with or without ω: Xt (ω), or X(t, ω)) is a random n-vector,
with a distribution, µ(t1 , · · · , tn ) say. The class of all such distributions as
{t1 , · · · , tn } ranges over all finite subsets of [0, ∞) is called the class of all
finite-dimensional distributions of X. These satisfy certain obvious consis-
tency conditions:
(i) deletion of one point ti can be obtained by ‘integrating out the unwanted
variable’, as usual when passing from joint to marginal distributions,
(ii) permutation of the ti permutes the arguments of the measure µ(t1 , · · · , tn )
on Rn .
Conversely, a collection of finite-dimensional distributions satisfying these
two consistency conditions arises from a stochastic process in this way (this
is the content of the DANIELL-KOLMOGOROV Theorem: P. J. Daniell in
1918, A. N. Kolmogorov in 1933).
Important though it is as a general existence result, however, the Daniell-
Kolmogorov theorem does not take us very far. It gives a stochastic process
X as a random function on [0, ∞), i.e. a random variable taking values in R^[0,∞). This
is a vast and unwieldy space; we shall usually be able to confine attention
to much smaller and more manageable spaces, of functions satisfying reg-
ularity conditions. The most important of these is continuity: we want to
be able to realise X = (Xt (ω))t≥0 as a random continuous function, i.e. a
member of C[0, ∞); such a process X is called path-continuous (since the
map t → Xt (ω) is called the sample path, or simply path, given by ω) – or
more briefly, continuous. This is possible for the extremely important case of
Brownian motion, for example, and its relatives. Sometimes we need to allow
our random function Xt (ω) to have jumps. It is then customary, and con-
venient, to require Xt to be right-continuous with left limits (rcll), or càdlàg
(continu à droite, limite à gauche) – i.e. to have X in the space D[0, ∞) of
all such functions (the Skorohod space). This is the case, for instance, for the
Poisson process and its relatives.
General results on realisability – whether or not it is possible to realise, or
obtain, a process so as to have its paths in a particular function space – are
known, but it is usually better to construct the processes we need directly on
the function space on which they naturally live.
Given a stochastic process X, it is sometimes possible to improve the
regularity of its paths without changing its distribution (that is, without
changing its finite-dimensional distributions). For background on results of
this type (separability, measurability, versions, regularization, ...) see e.g.
Doob’s classic book [D].
The continuous-time theory is technically much harder than the discrete-
time theory, for two reasons:
(i) questions of path-regularity arise in continuous time but not in discrete
time,
(ii) uncountable operations (like taking sup over an interval) arise in contin-
uous time. But measure theory is constructed using countable operations:
uncountable operations risk losing measurability.
The skills needed take the form: economic and financial insight, plus: mathematics, probability and
stochastic processes; statistics (especially pattern recognition, data mining
and machine learning); and numerics and computation.
2. Gaussian Processes.
A Gaussian process is one whose finite-dimensional distributions are multivariate Gaussian;
its law is determined by its mean function m(t) := E[Xt] and its covariance function
σ(s, t) = cov(Xs, Xt).
The sample paths of a Gaussian process are either continuous, or extremely pathological: for example,
unbounded above and below on any time-interval, however short. Naturally,
we shall confine attention in this course to continuous Gaussian processes.
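To make the covariance description concrete: a zero-mean Gaussian process can be simulated on a finite grid by factorising the covariance matrix (σ(ti, tj)). A minimal Python sketch; the choice σ(s, t) = min(s, t) (which gives Brownian motion) and the jitter term are illustrative assumptions, not from the notes.

    import numpy as np

    def sample_gaussian_process(cov, times, nsamples=1, rng=None):
        # Draw sample paths of a zero-mean Gaussian process with covariance
        # function cov(s, t), evaluated on the grid `times`.
        rng = np.random.default_rng(rng)
        S, T = np.meshgrid(times, times, indexing="ij")
        C = cov(S, T)                      # covariance matrix (sigma(t_i, t_j))
        # Cholesky factor (tiny jitter added for numerical positive-definiteness)
        L = np.linalg.cholesky(C + 1e-12 * np.eye(len(times)))
        Z = rng.standard_normal((len(times), nsamples))
        return L @ Z                       # columns are sample paths

    # Example: sigma(s, t) = min(s, t) gives Brownian motion on (0, 1].
    times = np.linspace(0.01, 1.0, 100)
    paths = sample_gaussian_process(np.minimum, times, nsamples=3, rng=0)
    print(paths.shape)   # (100, 3)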
3. Markov Processes.
X is Markov if for each t, each A ∈ σ(Xs : s > t) (the ‘future’) and
B ∈ σ(Xs : s < t) (the ‘past’),
P (A|Xt , B) = P (A|Xt ).
That is, if you know where you are (at time t), how you got there doesn’t
matter so far as predicting the future is concerned. Equivalently, past and
future are conditionally independent given the present.
The same definition applies to Markov processes in discrete time.
X is said to be strong Markov if the above holds with the fixed time t
replaced by a stopping time T (a random variable). This is a real restriction
of the Markov property in continuous time (though not in discrete time) –
another instance of the difference between the two.
4. Diffusions.
A diffusion is a path-continuous strong-Markov process such that for each
time t and state x the following limits exist:
µ(t, x) := lim_{h↓0} (1/h) E[(Xt+h − Xt) | Xt = x],
σ^2(t, x) := lim_{h↓0} (1/h) E[(Xt+h − Xt)^2 | Xt = x].
Then µ(t, x) is called the drift, σ^2(t, x) the diffusion coefficient. Then p(t, x, y),
the density of transitions from x to y in time t, satisfies the parabolic PDE
Lp = ∂p/∂t,   L := (1/2) σ^2 D^2 + µ(x) D,   D := ∂/∂x.
The (2nd-order, linear) differential operator L is called the generator. Brownian
motion is the case σ = 1, µ = 0, and gives the heat equation (L = (1/2) D^2
in one dimension, half the Laplacian ∆ in higher dimensions).
It is not at all obvious, but it is true, that this definition does indeed
capture the nature of physical diffusion. Examples: heat diffusing through a
metal; smoke diffusing through air; dye diffusing through liquid; pollutants
diffusing through air or liquid.
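The defining limits above can be checked by simulation: run a diffusion on a fine time grid and average the (scaled) increments that start near a fixed state x. A rough Python sketch, using the illustrative SDE dX = −X dt + dB (so drift −x and diffusion coefficient 1); the step size, bandwidth and time horizon are arbitrary.

    import numpy as np

    rng = np.random.default_rng(1)
    h, T = 1e-3, 2000.0                     # step size and horizon (arbitrary)
    n = int(T / h)

    # Euler scheme for the illustrative SDE dX = -X dt + dB
    # (drift mu(x) = -x, diffusion coefficient sigma^2(x) = 1).
    X = np.empty(n + 1)
    X[0] = 0.0
    dB = np.sqrt(h) * rng.standard_normal(n)
    for i in range(n):
        X[i + 1] = X[i] - X[i] * h + dB[i]

    # Estimate mu(x) and sigma^2(x) at x = 0.5 from increments starting near x.
    x, band = 0.5, 0.1
    near = np.abs(X[:-1] - x) < band
    dX = X[1:][near] - X[:-1][near]
    print("drift estimate    :", dX.mean() / h, "  (roughly -0.5)")
    print("diffusion estimate:", (dX ** 2).mean() / h, "  (roughly 1)")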
§4. Quadratic Variation (QV) of Brownian Motion; Itô’s Lemma
In particular, for t > 0 small, this shows that the variance of Bt^2 (namely 2t^2,
since Bt is N(0, t)) is negligible compared with its expected value t. Thus the
randomness in Bt^2 is negligible compared to its mean for small t.
This suggests that if we take a fine enough partition P of [0, T ] – a finite
set of points
0 = t0 < t1 < · · · < tk = T
with |P| := max |ti − ti−1| small enough – then the sum of squared increments
Σ_{i=1}^k (B(ti) − B(ti−1))^2 should be close to T. This is indeed the case:
Theorem. The quadratic variation of a Brownian path over [0, T ] exists and
equals T , a.s.
For details of the proof, see e.g. [BK], §5.3.2, SP L22, SA L7,8.
If we increase t by a small amount to t + dt, the increase in the QV can
be written symbolically as (dBt)^2, and the increase in t is dt. So, formally
we may summarise the theorem as
(dBt)^2 = dt.
Suppose now we look at the ordinary variation Σ|∆Bt|, rather than the
quadratic variation Σ(∆Bt)^2. Then instead of Σ(∆Bt)^2 ∼ Σ∆t ∼ t, we get
Σ|∆Bt| ∼ Σ√∆t. Now for ∆t small, √∆t is of a larger order of magnitude
than ∆t. So while Σ∆t = t converges, Σ√∆t diverges to +∞. This suggests –
what is in fact true – that Brownian motion is of infinite (unbounded) variation
on every interval, however short.
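Both points are easy to see in a simulation: refine the partition and the sum of squared Brownian increments settles down to T, while the sum of absolute increments grows without bound. A throwaway Python sketch (T = 2 and the partition sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 2.0
    for k in (8, 12, 16):                # partition of [0, T] into 2^k equal steps
        n = 2 ** k
        dB = np.sqrt(T / n) * rng.standard_normal(n)   # Brownian increments
        print(n, "steps: sum (dB)^2 =", round(np.sum(dB ** 2), 4),
              "  sum |dB| =", round(np.sum(np.abs(dB)), 2))
    # sum (dB)^2 -> T = 2, while sum |dB| grows like sqrt(n).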
The QV result above leads to Lévy’s 1948 result, the Martingale Char-
acterization of BM. Recall that Bt is a continuous martingale with respect
to its natural filtration (Ft ) and with QV t. There is a remarkable converse;
we give two forms.
For proof, see e.g. [RW1], I.2. Observe that for s < t,
Bt^2 = [Bs + (Bt − Bs)]^2 = Bs^2 + 2Bs(Bt − Bs) + (Bt − Bs)^2,
E[Bt^2 | Fs] = Bs^2 + 2Bs E[(Bt − Bs)|Fs] + E[(Bt − Bs)^2|Fs] = Bs^2 + 0 + (t − s):
E[Bt^2 − t | Fs] = Bs^2 − s:
Bt^2 − t is a martingale.
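The conclusion can be checked by Monte Carlo: fix s < t and a value of Bs, simulate many continuations of the path, and compare the sample mean of Bt^2 − t with Bs^2 − s. A quick Python sketch with arbitrary values of s, t and Bs:

    import numpy as np

    rng = np.random.default_rng(2)
    s, t, Bs = 0.7, 1.5, 0.4             # arbitrary choices
    N = 200_000
    Bt = Bs + np.sqrt(t - s) * rng.standard_normal(N)     # B_t given F_s
    print("E[B_t^2 - t | B_s] ~", (Bt ** 2 - t).mean())   # Monte Carlo estimate
    print("B_s^2 - s          =", Bs ** 2 - s)            # both are about -0.54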
Quadratic Variation (QV).
The theory above extends to continuous martingales (bounded continu-
ous martingales in general, but we work on a finite time-interval [0, T ], so
continuity implies boundedness). We quote (for proof, see e.g. [RY], IV.1):
Itô’s Lemma.
We discuss Itô’s Lemma in more detail in §6 below; we pause here to give
the link with quadratic variation and covariation. We quote: if f (t, x1 , · · · , xd )
is C 1 in its zeroth (time) argument t and C 2 in its remaining d space argu-
ments xi , and M = (M 1 , · · · , M d ) is a continuous vector martingale, then
(writing fi , fij for the first partial derivatives of f with respect to its ith
argument and the second partial derivatives with respect to the ith and jth
arguments) f (Mt ) has stochastic differential
df(Mt) = f0(Mt) dt + Σ_{i=1}^d fi(Mt) dMt^i + (1/2) Σ_{i,j=1}^d fij(Mt) d⟨M^i, M^j⟩t.
Integration by Parts. If f(t, x1, x2) = x1 x2, we obtain
d(MN)t = N dMt + M dNt + d⟨M, N⟩t
(the cross term f12 = f21 = 1 appears twice in the double sum, cancelling the factor 1/2).
Similarly for stochastic integrals (defined below): if Zi := ∫ Hi dMi (i = 1, 2), then
d⟨Z1, Z2⟩ = H1 H2 d⟨M1, M2⟩.
Note. The integration-by-parts formula – a special case of Itô’s Lemma, as
above – is in fact equivalent to Itô’s Lemma: either can be used to derive the
other. Rogers & Williams [RW1, IV.32.4] describe the integration-by-parts
formula/Itô’s Lemma as ‘the cornerstone of stochastic calculus’.
Fractals Everywhere.
As we saw, a Brownian path is a fractal – a self-similar object. So too is
its zero-set Z. Fractals were studied, named and popularised by the French
mathematician Benoît B. Mandelbrot (1924-2010). See his books, and
Michael F. Barnsley: Fractals everywhere. Academic Press, 1988.
Fractals look the same at all scales – diametrically opposite to the familiar
functions of Calculus. In Differential Calculus, a differentiable function has a
tangent; this means that locally, its graph looks straight; similarly in Integral
Calculus. While most continuous functions we encounter are differentiable,
at least piecewise (i.e., except for ‘kinks’), there is a sense in which the typi-
cal, or generic, continuous function is nowhere differentiable. Thus Brownian
paths may look pathological at first sight – but in fact they are typical!
P.-A. Meyer: Un cours sur les intégrales stochastiques. Séminaire de Probabilités X, Lecture Notes in Math. 511, 245-400, Springer, 1976.
The first thing to note is that stochastic integrals with respect to Brown-
ian motion, if they exist, must be quite different from the measure-theoretic
integral of III.2. For, the Lebesgue-Stieltjes integrals described there have
as integrators the difference of two monotone (increasing) functions (by Jor-
dan’s theorem), which are locally of finite (bounded) variation, FV. But we
know from §4 that Brownian motion is of infinite (unbounded) variation on
every interval. So Lebesgue-Stieltjes and Itô integrals must be fundamentally
different.
In view of the above, it is quite surprising that Itô integrals can be de-
fined at all. But if we take for granted Itô’s fundamental insight that they
can be, it is obvious how to begin and clear enough how to proceed. We
begin with the simplest possible integrands X, and extend successively much
as we extended the measure-theoretic integral of Ch. III.
1. Indicators.
If Xt(ω) = I[a,b](t), there is exactly one plausible way to define ∫ X dB:
∫_0^t X dB, or ∫_0^t Xs(ω) dBs(ω), := 0 if t ≤ a; Bt − Ba if a ≤ t ≤ b; Bb − Ba if t ≥ b.
Already one wonders how to extend this from constants ci to suitable ran-
dom variables, and one seeks to simplify the obvious but clumsy three-line
expressions above. It turns out that finite sums are not essential: one can
have infinite sums, but now we take the ci uniformly bounded.
We begin again, calling X simple if there is an infinite sequence of times
0 = t0 < t1 < · · · → ∞ and uniformly bounded Ftn-measurable random variables ξn (|ξn| ≤ C for all
n and ω, for some C) such that Xt(ω) can be written in the form
Xt(ω) = Σ_{n=0}^∞ ξn(ω) I_{(tn, tn+1]}(t).
The only definition of ∫_0^t X dB that agrees with the above for finite sums is,
if n is the unique integer with tn ≤ t < tn+1,
It(X) := ∫_0^t X dB = Σ_{i=0}^{n−1} ξi (B(ti+1) − B(ti)) + ξn (B(t) − B(tn))
= Σ_{i=0}^∞ ξi (B(t ∧ ti+1) − B(t ∧ ti))   (0 ≤ t < ∞).
(ii) s < t and t belong to different intervals: s ∈ [tm, tm+1) for m < n. Then, by the tower property,
E[It(X)|Fs] = E[E[It(X)|Ftm+1]|Fs] = E[Itm+1(X)|Fs],
the inner conditional expectation being Itm+1(X) as above (each later term involves a Brownian
increment beyond tm+1). But Itm+1(X) = Is(X) + ξm(B(tm+1) − B(s)); taking E[.|Fs], the second term
gives zero as above, giving the result. //
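For a simple integrand, It(X) is just the finite sum above and can be computed directly from a path. A minimal Python sketch; the grid, the linear interpolation of the path, and the choice ξn = tanh(B(tn)) (bounded and Ftn-measurable) are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(4)

    # A Brownian path on a fine grid, linearly interpolated in between.
    grid = np.linspace(0.0, 1.0, 10_001)
    steps = np.sqrt(np.diff(grid)) * rng.standard_normal(len(grid) - 1)
    path = np.concatenate(([0.0], np.cumsum(steps)))
    B = lambda s: float(np.interp(s, grid, path))

    def ito_integral_simple(t, times, xi):
        # I_t(X) = sum_i xi_i (B(t ^ t_{i+1}) - B(t ^ t_i)) for X = sum_n xi_n 1_{(t_n, t_{n+1}]}
        return sum(xi[i] * (B(min(t, times[i + 1])) - B(min(t, times[i])))
                   for i in range(len(times) - 1))

    times = [0.0, 0.25, 0.5, 0.75, 1.0]
    xi = [np.tanh(B(s)) for s in times[:-1]]   # bounded, F_{t_n}-measurable coefficients
    print(ito_integral_simple(1.0, times, xi))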
This is the continuous-time analogue of the result that martingale transforms are martingales.
We pause to note a property of martingales which we shall need below.
Call Xt − Xs the increment of X over (s, t]. Then for a martingale X,
the product of the increments over disjoint intervals has zero mean. For, if
s < t ≤ u < v,
E[(Xt − Xs)(Xv − Xu)] = E[E[(Xt − Xs)(Xv − Xu)|Fu]] = E[(Xt − Xs) E[Xv − Xu|Fu]],
taking out what is known (as s, t ≤ u). The inner expectation is zero by the
martingale property, so the LHS is zero, as required.
D (Itô isometry). E[(It(X))^2], or E[(∫_0^t Xs dBs)^2], = E[∫_0^t Xs^2 ds].
Proof. The LHS above is E[It(X).It(X)], i.e.
E[(Σ_{i=0}^{n−1} ξi (B(ti+1) − B(ti)) + ξn (B(t) − B(tn)))^2].
Expanding the square, the cross terms have zero mean (they are products of increments of the
martingale I(X) over disjoint intervals, as above), leaving
E[Σ_{i=0}^{n−1} ξi^2 (B(ti+1) − B(ti))^2 + ξn^2 (B(t) − B(tn))^2].
Since ξi is Fti-measurable, each ξi^2-term is independent of the squared Brownian
increment term following it, which has expectation var(B(ti+1) − B(ti)) =
ti+1 − ti. So we obtain
Σ_{i=0}^{n−1} E[ξi^2](ti+1 − ti) + E[ξn^2](t − tn).
This is ∫_0^t E[Xu^2] du = E[∫_0^t Xu^2 du], as required.
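Property D can also be checked by Monte Carlo over many simulated paths, using a simple integrand; the choice ξi = tanh(B(ti)) is an arbitrary bounded, Fti-measurable example. A Python sketch:

    import numpy as np

    rng = np.random.default_rng(5)
    times = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    dt = np.diff(times)
    N = 200_000

    dB = np.sqrt(dt) * rng.standard_normal((N, len(dt)))                    # increments, N paths
    Bt = np.cumsum(np.concatenate([np.zeros((N, 1)), dB], axis=1), axis=1)  # B at the t_i
    xi = np.tanh(Bt[:, :-1])                                                # bounded, F_{t_i}-measurable

    I = np.sum(xi * dB, axis=1)                     # I_1(X) = sum_i xi_i (B(t_{i+1}) - B(t_i))
    print("E[I^2]         :", np.mean(I ** 2))
    print("E[sum xi^2 dt] :", np.mean(np.sum(xi ** 2 * dt, axis=1)))   # the two agree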
E. Itô isometry (continued). It(X) − Is(X) = ∫_s^t Xu dBu satisfies
E[(∫_s^t Xu dBu)^2 | Fs] = E[∫_s^t Xu^2 du | Fs]   P-a.s.
Proof: as above.
F. Quadratic variation. The QV of It(X) = ∫_0^t Xu dBu is ∫_0^t Xu^2 du.
This is proved in the same way as the case X ≡ 1, that B has quadratic
variation process t.
Integrands.
The properties above suggest that ∫_0^t X dB should be defined only for processes with
∫_0^t E[Xu^2] du < ∞ for all t.
We shall restrict attention to such X in what follows. This gives us an L2 -
theory of stochastic integration (compare the L2 -spaces introduced in Ch.
II), for which Hilbert-space methods are available.
3. Approximation.
Recall steps 1 (indicators) and 2 (simple integrands). By analogy with
the integral of Ch. III, we seek a suitable class of integrands suitably ap-
proximable by simple integrands. It turns out that:
(i) The suitable class of integrands is the class of left-continuous adapted
processes X with ∫_0^t E[Xu^2] du < ∞ for all t > 0 (or all t ∈ [0, T] with finite
time-horizon T, as here),
(ii) Each such X may be approximated by a sequence of simple integrands
Xn, so that the stochastic integral It(X) = ∫_0^t X dB may be defined as the
limit of It(Xn) = ∫_0^t Xn dB,
(iii) The stochastic integral ∫_0^t X dB so defined still has properties A-F above.
It is not possible to include detailed proofs of these assertions in a course
of this type [recall that we did not construct the measure-theoretic integral
of Ch. III in detail either – and this is harder!]. The key technical ingredient
needed is the Kunita-Watanabe inequalities. See e.g. [KS], §§3.1-2.
One can define stochastic integration in much greater generality.
1. Integrands. The natural class of integrands X to use here is the class of
predictable processes. These include the left-continuous processes to which
we confine ourselves above.
2. Integrators. One can construct a closely analogous theory for stochastic
integrals with the Brownian integrator B above replaced by a continuous
local martingale integrator M (or more generally by a local martingale: see
below). The properties above hold, with D replaced by
E[(∫_0^t Xu dMu)^2] = E[∫_0^t Xu^2 d⟨M⟩u].
One can go further, to semimartingales: processes expressible as the sum of a local martingale
and a process of (locally) finite variation. Now C is replaced by: stochastic integrals of local
martingales are local martingales. See e.g. [RW1] or Meyer (1976) for details.
f(x) = f(u) + Σ_{i=0}^d (xi − ui) fi(u) + (1/2) Σ_{i,j=0}^d (xi − ui)(xj − uj) fij(u) + · · · .
In our case (writing t0 in place of 0 for the starting time):
f(t, Xt) = f(t0, X(t0)) + (t − t0) f1(t0, X(t0)) + (X(t) − X(t0)) f2 + (1/2)(t − t0)^2 f11
+ (t − t0)(X(t) − X(t0)) f12 + (1/2)(X(t) − X(t0))^2 f22 + · · · ,
which may be written symbolically as
df(t, X(t)) = f1 dt + f2 dX + (1/2) f11 (dt)^2 + f12 dt dX + (1/2) f22 (dX)^2 + · · · .
In this, we
(i) substitute dXt = Ut dt + Vt dBt from above,
(ii) substitute (dBt)^2 = dt, i.e. |dBt| = √dt, from §4:
df = f1 dt + f2 (U dt + V dB) + (1/2) f11 (dt)^2 + f12 dt (U dt + V dB) + (1/2) f22 (U dt + V dB)^2 + · · ·
Now using (dB)^2 = dt,
(U dt + V dB)^2 = V^2 dt + 2UV dt dB + U^2 (dt)^2 = V^2 dt + higher-order terms:
df = (f1 + U f2 + (1/2) V^2 f22) dt + V f2 dB + higher-order terms.
Summarising, we obtain Itô's Lemma, the analogue for the Itô or stochastic
calculus of the chain rule for ordinary (Newton-Leibniz) calculus: if Xt has stochastic
differential dXt = Ut dt + Vt dBt and f = f(t, x) is C^1 in t and C^2 in x, then f(t, Xt)
has stochastic differential
df(t, Xt) = (f1 + Ut f2 + (1/2) Vt^2 f22) dt + Vt f2 dBt,
the partial derivatives being evaluated at (t, Xt).
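As a sanity check, take constant U and V (an arbitrary choice) and f(t, x) = x^2, so f1 = 0, f2 = 2x, f22 = 2, and compare f(XT) − f(X0) with the discretised right-hand side of Itô's Lemma. A rough Python sketch:

    import numpy as np

    rng = np.random.default_rng(6)
    U, V, T, n = 0.1, 0.3, 1.0, 100_000            # dX = U dt + V dB with constants (arbitrary)
    dt = T / n
    dB = np.sqrt(dt) * rng.standard_normal(n)
    X = np.concatenate(([1.0], 1.0 + np.cumsum(U * dt + V * dB)))   # path from X_0 = 1

    f2 = 2 * X[:-1]                                # f(t,x) = x^2: f1 = 0, f2 = 2x, f22 = 2
    rhs = np.sum((U * f2 + 0.5 * V ** 2 * 2) * dt + V * f2 * dB)    # Ito's Lemma, discretised
    print("f(X_T) - f(X_0) :", X[-1] ** 2 - 1.0)
    print("Ito RHS         :", rhs)                # agree up to discretisation error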
Note. Powerful as it is in the setting above, Itô’s Lemma really comes into its
own in the more general setting of semimartingales. It says there that if X is
a semimartingale and f is a smooth function as above, then f (t, X(t)) is also
a semimartingale. The ordinary differential dt gives rise to the bounded-
variation part, the stochastic differential gives rise to the martingale part.
This closure property under very general non-linear operations is very pow-
erful and important.
or a central push: frictional drag acts as a restoring force tending to push the
process back towards its mean. It is important in many areas, including
(i) statistical mechanics, where it originated,
(ii) mathematical finance, where it appears in the Vasicek model for the term-
structure of interest-rates (the mean represents the ‘natural’ interest rate),
(iii) stochastic volatility models, where the volatility σ itself is now a stochas-
tic process σt , subject to an SDE of OU type.
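To fix ideas, here is a minimal Python simulation of an OU/Vasicek-type process; the SDE dX = θ(µ − X) dt + σ dB and the parameter values used are standard illustrative choices, not taken from these notes:

    import numpy as np

    rng = np.random.default_rng(7)
    theta, mu, sigma = 2.0, 0.05, 0.1   # mean-reversion speed, long-run mean, volatility (illustrative)
    T, n = 50.0, 50_000
    dt = T / n

    X = np.empty(n + 1)
    X[0] = 0.5                          # start away from the mean to see the pull back
    for i in range(n):
        # Euler-Maruyama step for dX = theta*(mu - X) dt + sigma dB
        X[i + 1] = X[i] + theta * (mu - X[i]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

    print("long-run sample mean:", X[n // 2:].mean(), "  (close to mu = 0.05)")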