Basics of Stochastic Analysis
Chapter 3. Martingales 69
§3.1. Optional Stopping 71
§3.2. Inequalities 75
§3.3. Local martingales and semimartingales 78
§3.4. Quadratic variation for local martingales 79
§3.5. Doob-Meyer decomposition 84
§3.6. Spaces of martingales 86
Exercises 90
Exercises 105
Chapter 5. Stochastic Integration of Predictable Processes 107
§5.1. Square-integrable martingale integrator 108
§5.2. Local square-integrable martingale integrator 135
§5.3. Semimartingale integrator 145
§5.4. Further properties of stochastic integrals 150
§5.5. Integrator with absolutely continuous Doléans measure 162
Exercises 167
Chapter 6. Itô’s formula 171
§6.1. Quadratic variation 171
§6.2. Itô’s formula 179
§6.3. Applications of Itô’s formula 187
Exercises 193
Chapter 7. Stochastic Differential Equations 195
§7.1. Examples of stochastic equations and solutions 196
§7.2. Existence and uniqueness for a semimartingale equation 203
§7.3. Proof of the existence and uniqueness theorem 208
Exercises 234
Appendix A. Analysis 235
§A.1. Continuous, cadlag and BV functions 236
§A.2. Differentiation and integration 247
Exercises 250
Appendix B. Probability 251
§B.1. General matters 251
§B.2. Construction of Brownian motion 257
Bibliography 269
Index 271
Chapter 1. Measures, Integrals, and Foundations of Probability Theory
In this chapter we sort out the integrals one typically encounters in courses
on calculus, analysis, measure theory, probability theory and various applied
subjects such as statistics and engineering. These are the Riemann integral,
the Riemann-Stieltjes integral, the Lebesgue integral and the Lebesgue-Stieltjes
integral. The starting point is the general Lebesgue integral on an
abstract measure space. The other integrals are special cases, even though
they might have different definitions.
This chapter is not a complete treatment of the basics of measure theory.
It provides a brief unified explanation for readers who have prior familiarity
with various notions of integration. To avoid unduly burdening this chapter,
many technical matters that we need later in the book have been relegated
to the appendix. For details that we have omitted and for proofs the reader
should turn to any of the standard textbook sources, such as Folland [3].
For students of probability, the Appendix of [2] is convenient.
In the second part of the chapter we go over the measure-theoretic foundations
of probability theory. Readers who know basic measure theory and
measure-theoretic probability can safely skip this chapter.
2 1. Measures, Integrals, and Foundations of Probability Theory
It is possible that F (∞) = ∞ and F (−∞) = −∞, but this will not hurt
the definition. One can show that µ0 satisfies the hypotheses of Theorem
1.5, and consequently there exists a measure µ on (R, BR ) that gives mass
F(b) − F(a) to each interval (a, b]. This measure is called the Lebesgue-Stieltjes
measure of the function F, and we shall denote µ by Λ_F to indicate
the connection with F.
The most important special case is Lebesgue measure which we shall
denote by m, obtained by taking F (x) = x.
On the other hand, if µ is a Borel measure on R such that µ(B) < ∞
for all bounded Borel sets, we can define a right-continuous nondecreasing
function by
$$G(0) = 0, \quad\text{and}\quad G(x) = \begin{cases} \mu(0,x], & x > 0, \\ -\mu(x,0], & x < 0, \end{cases}$$
and then µ = Λ_G. Thus Lebesgue-Stieltjes measures give us all the Borel
measures that are finite on bounded sets.
1.1.3. The integral. Let (X, A, µ) be a fixed measure space. To say that a
function f : X → R or f : X → [−∞, ∞] is measurable is always interpreted
with the Borel σ-algebra on R or [−∞, ∞]. In either case, it suffices to
check that {f ≤ t} ∈ A for each real t. The Lebesgue integral is defined
in several stages, starting with cases for which the integral can be written
out explicitly. This same pattern of proceeding from simple cases to general
cases will also be used to define stochastic integrals.
These steps complete the construction of the integral. Along the way
one proves that the integral has all the necessary properties, such as linearity
$$\int (\alpha f + \beta g)\,d\mu = \alpha \int f\,d\mu + \beta \int g\,d\mu,$$
monotonicity:
$$f \le g \quad\text{implies}\quad \int f\,d\mu \le \int g\,d\mu,$$
and the important inequality
$$\Big| \int f\,d\mu \Big| \le \int |f|\,d\mu.$$
Various notations are used for the integral $\int f\,d\mu$. Sometimes it is
desirable to indicate the space over which one integrates by $\int_X f\,d\mu$. Then one
can indicate integration over a subset A by defining
$$\int_A f\,d\mu = \int_X \mathbf{1}_A f\,d\mu.$$
To make the integration variable explicit, one can write
$$\int_X f(x)\,\mu(dx) \quad\text{or}\quad \int_X f(x)\,d\mu(x).$$
Since the integral is bilinear in both the function and the measure, the linear
functional notation hf, µi is used. Sometimes the notation is simplified to
µ(f ), or even to µf .
we can give the following explicit limit expression for the integral $\int f\,d\mu$ of
a [0, ∞]-valued function f. Define simple functions
$$f_n(x) = \sum_{k=0}^{n2^n - 1} 2^{-n}k \cdot \mathbf{1}_{\{2^{-n}k \le f < 2^{-n}(k+1)\}}(x) + n \cdot \mathbf{1}_{\{f \ge n\}}(x).$$
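As a numerical illustration (not from the text; the function and names are my choice), the simple functions fₙ can be integrated exactly for f(x) = x² on [0, 1] with Lebesgue measure m: since f is increasing, each level set {2⁻ⁿk ≤ f < 2⁻ⁿ(k+1)} is an interval whose length is computable in closed form, and the integrals of fₙ increase to ∫ f dm = 1/3.

```python
import math

def simple_approx_integral(n):
    """Integral of the simple function f_n approximating f(x) = x^2
    on [0, 1] with Lebesgue measure m.  Each term is
    2^-n k * m{ 2^-n k <= f < 2^-n (k+1) }; because f is increasing,
    that level set is [sqrt(2^-n k), sqrt(2^-n (k+1))) clipped to [0, 1]."""
    total = 0.0
    for k in range(n * 2**n):
        lo, hi = k * 2.0**-n, (k + 1) * 2.0**-n
        a = min(math.sqrt(lo), 1.0)
        b = min(math.sqrt(hi), 1.0)
        total += lo * (b - a)
    # the cap n * 1{f >= n} contributes nothing here, since f <= 1 on [0, 1]
    return total
```

The values increase with n and converge to 1/3, in line with the monotone convergence of fₙ to f.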
Typically one proves then that every continuous function is Riemann integrable.
The definition of the Riemann integral is fundamentally different from
the definition of the Lebesgue integral. For the Riemann integral there is
one recipe for all functions, instead of a step-by-step definition that proceeds
from simple to complex cases. For the Riemann integral we partition the
domain [a, b], whereas the Lebesgue integral proceeds by partitioning the
range of f, as formula (1.3) makes explicit. This latter difference is sometimes
illustrated by counting the money in your pocket: the Riemann way
picks one coin at a time from the pocket, adds its value to the total, and
repeats this until all coins are counted. The Lebesgue way first partitions
the coins into pennies, nickels, dimes, etc., and then counts the piles. As the
coin-counting picture suggests, the Lebesgue way is more efficient (it leads
to a more general integral with superior properties) but when both apply,
the answers are the same. The precise relationship is the following, which
also gives the exact domain of applicability of the Riemann integral.
Theorem 1.10. Suppose f is a bounded function on [a, b].
(a) If f is a Riemann integrable function on [a, b], then f is Lebesgue
measurable, and the Riemann integral of f coincides with the Lebesgue
integral of f with respect to Lebesgue measure m on [a, b].
for a Borel or Lebesgue measurable function f on [a, b], even if the function
f is not Riemann integrable.
measure takes only values in [0, ∞]. If this point needs emphasizing, we use
the term positive measure as a synonym for measure.
For any signed measure ν, there exist unique positive measures ν⁺ and
ν⁻ such that ν = ν⁺ − ν⁻ and ν⁺ ⊥ ν⁻. (The statement ν⁺ ⊥ ν⁻ reads "ν⁺
and ν⁻ are mutually singular," and means that there exists a measurable
set A such that ν⁺(A) = ν⁻(Aᶜ) = 0.) The measure ν⁺ is the positive
variation of ν, ν⁻ is the negative variation of ν, and ν = ν⁺ − ν⁻ the
Jordan decomposition of ν. There exist measurable sets P and N such that
P ∪ N = X, P ∩ N = ∅, and ν⁺(A) = ν(A ∩ P) and ν⁻(A) = −ν(A ∩ N).
(P, N) is called the Hahn decomposition of ν. The total variation of ν is the
positive measure |ν| = ν⁺ + ν⁻. We say that the signed measure ν is σ-finite
if |ν| is σ-finite.
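For a signed measure with finitely many point masses these decompositions can be written out concretely. The sketch below (a hypothetical example; the weights and names are mine) takes P to be the atoms of positive mass and checks the defining identities.

```python
# A signed measure on the finite space {0, 1, 2, 3} given by point
# masses.  The Hahn decomposition takes P = atoms of positive mass and
# N = the rest; then nu+(A) = nu(A & P), nu-(A) = -nu(A & N), and the
# total variation is |nu| = nu+ + nu-.
weights = {0: 2.0, 1: -3.0, 2: 0.5, 3: -0.25}

P = {x for x, w in weights.items() if w > 0}
N = set(weights) - P

def nu(A):        return sum(weights[x] for x in A)
def nu_plus(A):   return nu(A & P)
def nu_minus(A):  return -nu(A & N)
def total_var(A): return nu_plus(A) + nu_minus(A)
```

For every subset A one gets ν(A) = ν⁺(A) − ν⁻(A), and ν⁺, ν⁻ are mutually singular because ν⁺ vanishes on N and ν⁻ vanishes on P.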
Integration with respect to a signed measure is defined by
$$(1.9)\qquad \int f\,d\nu = \int f\,d\nu^+ - \int f\,d\nu^-,$$
valid for all f for which the integrals on the right are finite.
Note for future reference that integrals with respect to |ν| can be
expressed in terms of ν by
$$(1.11)\qquad \int f\,d|\nu| = \int f\,d\nu^+ + \int f\,d\nu^- = \int (\mathbf{1}_P - \mathbf{1}_N)\,f\,d\nu.$$
The supremum above is taken over partitions of the interval [a, x]. F has
bounded variation on [a, b] if V_F(b) < ∞. BV[a, b] denotes the space of
functions with bounded variation on [a, b] (BV functions).
V_F is a nondecreasing function with V_F(a) = 0. F is a BV function iff
it is the difference of two bounded nondecreasing functions, and in case F
is BV, one way to write this decomposition is
$$F = \tfrac{1}{2}(V_F + F) - \tfrac{1}{2}(V_F - F).$$
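The decomposition can be checked on a sampled path. The sketch below (an illustration with a function of my choosing) computes the cumulative variation on a grid and verifies that both halves of the decomposition are nondecreasing and that their difference recovers F.

```python
def total_variation(values):
    """Cumulative variation of a function sampled on a grid:
    V[j] = sum of |values[i+1] - values[i]| for i < j.  For a piecewise
    monotone function sampled at its turning points this is exact."""
    V = [0.0]
    for prev, curr in zip(values, values[1:]):
        V.append(V[-1] + abs(curr - prev))
    return V

# sample F(x) = x^2 - x on [0, 2]: decreasing then increasing, hence BV
grid = [i / 1000 for i in range(2001)]
F = [x * x - x for x in grid]
V = total_variation(F)

# the two nondecreasing parts of F = (V_F + F)/2 - (V_F - F)/2
G = [0.5 * (v + f) for v, f in zip(V, F)]
H = [0.5 * (v - f) for v, f in zip(V, F)]
```

Here V_F(2) = 1/4 + 9/4 = 5/2, since F falls by 1/4 on [0, 1/2] and rises by 9/4 on [1/2, 2].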
F is BV [6, page 282]. The next lemma gives a version of this limit that
will be used frequently in the sequel.
Lemma 1.12. Let ν be a finite signed measure on (0, T]. Let f be a bounded
Borel function on [0, T] for which the left limit f(t−) exists at all 0 < t ≤ T.
Let $\pi^n = \{0 = s^n_1 < \cdots < s^n_{m(n)} = T\}$ be partitions of [0, T] such that
mesh(πⁿ) → 0. Then
$$\lim_{n\to\infty}\ \sup_{0\le t\le T}\ \Big|\ \sum_i f(s^n_i)\,\nu\big(s^n_i \wedge t,\ s^n_{i+1} \wedge t\big] - \int_{(0,t]} f(s-)\,\nu(ds)\ \Big| = 0.$$
In particular, for a right-continuous function G ∈ BV[0, T],
$$\lim_{n\to\infty}\ \sup_{0\le t\le T}\ \Big|\ \sum_i f(s^n_i)\big(G(s^n_{i+1} \wedge t) - G(s^n_i \wedge t)\big) - \int_{(0,t]} f(s-)\,dG(s)\ \Big| = 0.$$
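The second limit can be watched numerically in a simple case (an illustration of my choosing, not from the text): take f(s) = s and G(s) = s² on [0, 1], so the Lebesgue-Stieltjes integral is ∫₀¹ s · 2s ds = 2/3, and form the left-endpoint sums over uniform partitions.

```python
def stieltjes_sum(f, G, n, T=1.0):
    """Left-endpoint sum sum_i f(s_i)(G(s_{i+1}) - G(s_i)) over the
    uniform partition of [0, T] with mesh T/n, as in the lemma with t = T."""
    s = [T * i / n for i in range(n + 1)]
    return sum(f(s[i]) * (G(s[i + 1]) - G(s[i])) for i in range(n))

f = lambda s: s
G = lambda s: s * s          # continuous, hence right-continuous, BV on [0, 1]
approx = stieltjes_sum(f, G, 4000)
# limiting value: integral of f(s-) dG(s) = integral of s * 2s ds = 2/3
```

As the mesh shrinks the sums approach 2/3, consistent with the lemma.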
each A ∈ A,
$$(1.14)\qquad \nu(A) = \int_A f\,d\mu.$$
Some remarks are in order. Since either $\int_A f^+\,d\mu$ or $\int_A f^-\,d\mu$ is finite,
the integral $\int_A f\,d\mu$ has a well-defined value in [−∞, ∞]. The equality of
integrals (1.14) extends to measurable functions, so that
$$(1.15)\qquad \int g\,d\nu = \int g f\,d\mu$$
for all A-measurable functions g for which the integrals make sense. The
precise sense in which f is unique is this: if f̃ also satisfies (1.14) for all
A ∈ A, then µ{f ≠ f̃} = 0.
The function f is the Radon-Nikodym derivative of ν with respect to µ,
and denoted by f = dν/dµ. The derivative notation is very suggestive. It
leads to dν = f dµ, which tells us how to do the substitution in the integral.
Also, it suggests that
$$(1.16)\qquad \frac{d\nu}{d\rho}\cdot\frac{d\rho}{d\mu} = \frac{d\nu}{d\mu},$$
which is a true theorem under the right assumptions: suppose ν is a signed
measure, ρ and µ positive measures, all σ-finite, ν ≪ ρ and ρ ≪ µ. Then
$$\int g\,d\nu = \int g\cdot\frac{d\nu}{d\rho}\,d\rho = \int g\cdot\frac{d\nu}{d\rho}\cdot\frac{d\rho}{d\mu}\,d\mu$$
by two applications of (1.15). Since the Radon-Nikodym derivative is unique,
the equality above proves (1.16).
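The chain rule (1.16) can be checked numerically for concrete absolutely continuous measures (a sketch with measures of my choosing): on [0, 1] let µ be Lebesgue measure, ρ(dx) = 2x dx and ν(dx) = x ρ(dx), so the distribution functions are known in closed form and a difference quotient recovers each derivative.

```python
# Measures on [0, 1]: mu = Lebesgue, rho(dx) = 2x dx, nu(dx) = x rho(dx).
F_rho = lambda x: x**2             # rho(0, x]
F_nu  = lambda x: 2 * x**3 / 3     # nu(0, x] = int_0^x t * 2t dt

def derivative(F, x, h=1e-5):
    """Symmetric difference quotient; for these absolutely continuous
    measures it recovers the Radon-Nikodym derivative w.r.t. mu."""
    return (F(x + h) - F(x - h)) / (2 * h)

x = 0.4
dnu_dmu  = derivative(F_nu, x)     # should be 2x^2
drho_dmu = derivative(F_rho, x)    # should be 2x
dnu_drho = dnu_dmu / drho_dmu      # chain rule (1.16) rearranged: should be x
```

The computed ratio dν/dµ divided by dρ/dµ returns the density x of ν with respect to ρ, as (1.16) predicts.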
The last integral vanishes as tn & t because 1(t,tn ] (s) → 0 for each point s,
and the integral converges by dominated convergence. Thus F (t+) = F (t).
For any partition 0 = s₀ < s₁ < · · · < sₙ = T,
$$\sum_i \big|F(s_{i+1}) - F(s_i)\big| = \sum_i \Big|\int_{(s_i,s_{i+1}]} g\,d\nu\Big| \le \sum_i \int_{(s_i,s_{i+1}]} |g|\,d|\nu| = \int_{(0,T]} |g|\,d|\nu|.$$
By the assumption g ∈ L1 (ν) the last quantity above is a finite upper bound
on the sums of F -increments over all partitions. Hence F ∈ BV [0, T ].
The last issue is the equality of the two measures ΛF and g dν on (0, T ].
By Lemma B.3 it suffices to check the equality of the two measures for
intervals (a, b], because these types of intervals generate the Borel σ-algebra.
$$\Lambda_F(a,b] = F(b) - F(a) = \int_{[0,b]} g(s)\,\nu(ds) - \int_{[0,a]} g(s)\,\nu(ds) = \int_{(a,b]} g(s)\,\nu(ds).$$
This suffices.
On the other hand, the conclusion of the lemma on (0, T] would not change
if we defined F(0) = 0 and
$$F(t) = \int_{(0,t]} g(s)\,\nu(ds), \qquad 0 < t \le T.$$
This changes F by a constant and hence does not affect its total variation
or Lebesgue-Stieltjes measure.
coin tosses are independent (a term we discuss below) and fair (heads and
tails equally likely). Let S be the class of events of the form
A = {ω : (x1 , . . . , xn ) = (a1 , . . . , an )}
as n varies over N and (a1 , . . . , an ) varies over n-tuples of zeroes and ones.
Include ∅ and Ω to make S a semialgebra. Our assumptions dictate
that the probability of the event A should be P₀(A) = 2⁻ⁿ. One needs
to check that P0 satisfies the hypotheses of Theorem 1.5, and then the
mathematical machinery takes over. There exists a probability measure
P on (Ω, F) that agrees with P0 on S. This is a mathematical model of
a sequence of independent fair coin tosses. Natural random variables to
define on Ω are first the coordinate variables Xi (ω) = xi , and then variables
derived from these such as Sn = X1 + · · · + Xn , the number of ones among
the first n tosses. The random variables {Xi } are an example of an i.i.d.
sequence, which is short for independent and identically distributed.
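For a fixed n this model is small enough to enumerate completely. The sketch below (an illustration; the choice n = 6 is mine) assigns each n-tuple its mass 2⁻ⁿ and computes the distribution of Sₙ, which comes out binomial as independence and fairness dictate.

```python
from itertools import product

n = 6
# Each n-tuple of zeroes and ones determines an event A as above,
# with probability P0(A) = 2**-n under independent fair tosses.
outcomes = list(product([0, 1], repeat=n))
p = 2.0**-n

# distribution of S_n = X_1 + ... + X_n, the number of ones
dist = {k: 0.0 for k in range(n + 1)}
for omega in outcomes:
    dist[sum(omega)] += p
```

Summing the masses over the tuples with exactly k ones gives P(Sₙ = k) = C(n, k) 2⁻ⁿ.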
then Xn → X in L1 .
conditional expectation works with integrals on the real line with respect to
the distribution µ_Y of Y: for any B ∈ B_R,
$$(1.28)\qquad E\big[\mathbf{1}_B(Y)X\big] = \int_B E(X\,|\,Y=y)\,\mu_Y(dy).$$
The next theorem lists the main properties of the conditional expectation.
Equalities and inequalities concerning conditional expectations are
almost sure statements, although we did not indicate this below, because
the conditional expectation is defined only up to null sets.
Theorem 1.25. Let (Ω, F, P ) be a probability space, X and Y integrable
random variables on Ω, and A and B sub-σ-fields of F.
(i) E[E(X|A)] = EX.
(ii) E[αX + βY |A] = αE[X|A] + βE[Y |A] for α, β ∈ R.
(iii) If X ≥ Y then E[X|A] ≥ E[Y |A].
(iv) If X is A-measurable, then E[X|A] = X.
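On a finite probability space, where A is generated by a partition of Ω, the conditional expectation simply averages X over the block containing each outcome. The sketch below (a hypothetical example; the space and partition are mine) verifies properties (i)-(iv) in that setting.

```python
# Finite sample space with equal weights; the sub-sigma-field A is
# generated by a partition of Omega, and E[X|A] averages X over the
# block containing each outcome.
omega = list(range(8))
P = {w: 1.0 / 8 for w in omega}
blocks = [{0, 1, 2}, {3, 4}, {5, 6, 7}]     # partition generating A

def cond_exp(X):
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(X[w] * P[w] for w in B) / pB
        for w in B:
            out[w] = avg                    # constant on blocks: A-measurable
    return out

def expect(X):
    return sum(X[w] * P[w] for w in omega)

X = {w: float(w * w) for w in omega}        # X >= Y pointwise here
Y = {w: float(w) for w in omega}
CX, CY = cond_exp(X), cond_exp(Y)
```

Property (iv) shows up as the fact that conditioning a block-constant function returns it unchanged.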
Proof. The proofs must appeal to the definition of the conditional expectation.
We leave them mostly as exercises or to be looked up in any graduate
probability textbook. Let us prove (v) and (vii) as examples.
Proof of part (v). We need to check that X · E[Y |A] satisfies the definition
of E[XY |A]. The A-measurability of X · E[Y |A] is true because X
is A-measurable by assumption, E[Y |A] is A-measurable by definition, and
multiplication preserves A-measurability. Then we need to check that
$$(1.33)\qquad E\big[\mathbf{1}_A XY\big] = E\big[\mathbf{1}_A X\,E[Y|\mathcal{A}]\big]$$
for an arbitrary A ∈ A. If X were bounded, this would be a special case of
(1.27) with Z replaced by 1_A X. For the general case we need to check the
integrability of X E[Y |A] before we can really write down the right-hand
side of (1.33).
Let us assume first that both X and Y are nonnegative. Then also
E(Y |A) ≥ 0 by (iii), because E(0|A) = 0 by (iv). Let X⁽ᵏ⁾ = X ∧ k be a
truncation of X. We can apply (1.27) to get
$$(1.34)\qquad E\big[\mathbf{1}_A X^{(k)} Y\big] = E\big[\mathbf{1}_A X^{(k)} E[Y|\mathcal{A}]\big].$$
$+\ E\big[\mathbf{1}_A X^- E(Y^-|\mathcal{A})\big].$
Consequently E[X|A] is in L2 (P ).
We refer the reader to [1, Chapter 12] for a proof of Kolmogorov's theorem
in this generality. The appendix in [2] gives a proof for the case where
I is countable and X_t = R for each t. The main idea of the proof is no
different for the more abstract result.
We will not discuss the proof. Let us observe that hypotheses (i) and
(ii) are necessary for the existence of P, so nothing unnecessary is assumed
in the theorem. Property (ii) is immediate from (1.35) because
$(x_{t_1}, x_{t_2}, \ldots, x_{t_{n-1}}) \in A$ iff $(x_{t_1}, x_{t_2}, \ldots, x_{t_n}) \in A \times X_{t_n}$. Property (i) is
also clear on intuitive grounds because all it says is that if the coordinates
are permuted, their distribution gets permuted too. Here is a rigorous
justification. Take a bounded measurable function f on $X^t$. Note that f ∘ π is
then a function on $X^s$, because
$$x = (x_1, \ldots, x_n) \in X^s \iff x_i \in X_{s_i}\ (1 \le i \le n) \iff x_{\pi(i)} \in X_{t_i}\ (1 \le i \le n) \iff \pi x = (x_{\pi(1)}, \ldots, x_{\pi(n)}) \in X^t.$$
Compute as follows, assuming P exists:
$$\int_{X^t} f\,dQ_t = \int_X f(x_{t_1}, x_{t_2}, \ldots, x_{t_n})\,P(dx) = \int_X f(x_{s_{\pi(1)}}, x_{s_{\pi(2)}}, \ldots, x_{s_{\pi(n)}})\,P(dx)$$
$$= \int_X (f\circ\pi)(x_{s_1}, x_{s_2}, \ldots, x_{s_n})\,P(dx) = \int_{X^s} (f\circ\pi)\,dQ_s = \int_{X^s} f\,d(Q_s\circ\pi^{-1}).$$
Exercises
Exercise 1.1. Here is a useful formula for computing expectations. Suppose
X is a nonnegative random variable, and h is a nondecreasing function on
R₊ such that h(0) = 0 and h is absolutely continuous on each bounded
interval. (This last hypothesis is for ensuring that $h(a) = \int_0^a h'(s)\,ds$ for all
(a) Fix two points x and y of the underlying space. Suppose for each
A ∈ E, {x, y} ⊆ A or {x, y} ⊆ Ac . Show that the same property is true for
all A ∈ B. In other words, if the generating sets do not separate x and y,
neither does the σ-field.
(c) In the setting of part (b), suppose two points x and y of X satisfy
f (x) = f (y) for all f ∈ Φ. Show that for each B ∈ B, {x, y} ⊆ B or
{x, y} ⊆ B c .
Exercise 1.4. (Product σ-algebras) Recall the setting of Example 1.3. For
a subset L ⊆ I of indices, let B_L = σ{f_i : i ∈ L} denote the σ-algebra
generated by the projections f_i for i ∈ L. So in particular, $B_I = \bigotimes_{i\in I} \mathcal{A}_i$ is
the full product σ-algebra.
(a) Show that for each B ∈ BI there exists a countable set L ⊆ I such
that B ∈ BL . Hint. Do not try to reason starting from a particular set
B ∈ BI . Instead, try to say something useful about the class of sets for
which a countable L exists.
(b) Let R[0,∞) be the space of all functions x : [0, ∞) → R, with the
product σ-algebra generated by the projections x 7→ x(t), t ∈ [0, ∞). Show
that the set of continuous functions is not measurable.
Exercise 1.5. (a) Let E1 , . . . , En be collections of measurable sets on (Ω, F, P ),
each closed under intersections (if A, B ∈ Ei then A ∩ B ∈ Ei ). Suppose
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 ) · P (A2 ) · · · P (An )
for all A1 ∈ E1 , . . . , An ∈ En . Show that the σ-algebras σ(E1 ), . . . , σ(En ) are
independent. Hint. A straightforward application of the π–λ Theorem B.1.
(b) Let {Ai : i ∈ I} be a collection of independent σ-algebras. Let I1 ,
. . . , In be pairwise disjoint subsets of I, and let Bk = σ{Ai : i ∈ Ik } for
1 ≤ k ≤ n. Show that B1 , . . . , Bn are independent.
(c) Let A, B, and C be sub-σ-algebras of F. Assume σ{B, C} is indepen-
dent of A, and C is independent of B. Show that A, B and C are independent,
and so in particular C is independent of σ{A, B}.
(d) Show by example that the independence of C and σ{A, B} does not
necessarily follow from having B independent of A, C independent of A, and
C independent of B. This last assumption is called pairwise independence of
A, B and C. Hint. An example can be built from two independent fair coin
tosses.
Exercise 1.6. Independence allows us to average separately. Here is a
special case that will be used in a later proof. Let (Ω, F, P ) be a probability
Hints. Start with functions of the type f (x, y) = g(x)h(y). Use Theorem
B.2 from the appendix.
Exercise 1.9. Let (X, Y ) be an R2 -valued random vector with joint density
f (x, y). This means that for any bounded Borel function φ on R2 ,
ZZ
E[φ(X, Y )] = φ(x, y)f (x, y) dx dy.
R2
Let
$$f(x|y) = \begin{cases} \dfrac{f(x,y)}{f_Y(y)}, & \text{if } f_Y(y) > 0, \\[4pt] 0, & \text{if } f_Y(y) = 0. \end{cases}$$
(a) Show that f (x|y)fY (y) = f (x, y) for almost every (x, y), with respect
to Lebesgue measure on R2 . Hint: Let
Show that m(Hy ) = 0 for each y-section of H, and use Tonelli’s theorem.
Chapter 2. Stochastic Processes
This chapter first covers general matters in the theory of stochastic processes,
and then discusses the two most important processes, Brownian motion and
Poisson processes.
36 2. Stochastic Processes
Proof. Part (i). Let A ∈ Fσ . For the first statement, we need to show that
(A ∩ {σ ≤ τ }) ∩ {τ ≤ t} ∈ Ft . Write
(A ∩ {σ ≤ τ }) ∩ {τ ≤ t}
= (A ∩ {σ ≤ t}) ∩ {σ ∧ t ≤ τ ∧ t} ∩ {τ ≤ t}.
All terms above lie in Ft . (i) The first by the definition of A ∈ Fσ . (ii) The
second because both σ ∧ t and τ ∧ t are Ft -measurable random variables: for
any u ∈ R, {σ ∧ t ≤ u} equals Ω if u ≥ t and {σ ≤ u} if u < t, a member of
Ft in both cases. (iii) {τ ≤ t} ∈ Ft since τ is a stopping time.
In particular, if σ ≤ τ , then Fσ ⊆ Fτ .
To show A ∩ {σ < τ} ∈ F_τ, write
$$A \cap \{\sigma < \tau\} = \bigcup_{n\ge 1} \big( A \cap \{\sigma + \tfrac{1}{n} \le \tau\} \big).$$
All members of the union on the right lie in F_τ by the first part of the proof,
because σ ≤ σ + 1/n implies A ∈ F_{σ+1/n}.
All the stochastic processes we study will have some regularity properties
as functions of t, when ω is fixed. These are regularity properties of paths.
A stochastic process X = {X_t : t ∈ R₊} is continuous if for each ω ∈ Ω,
the path t ↦ X_t(ω) is continuous as a function of t. The properties
left-continuous and right-continuous have the obvious analogous meaning. X is
right continuous with left limits (or cadlag, as the French acronym for this
property goes) if the following is true for all ω ∈ Ω and t ∈ R₊:
$$X_t(\omega) = \lim_{s \searrow t} X_s(\omega), \quad\text{and the left limit}\quad X_{t-}(\omega) = \lim_{s \nearrow t} X_s(\omega) \quad\text{exists.}$$
Above s ↘ t means that s approaches t from above (from the right), and
s ↗ t approach from below (from the left). Finally, we also need to consider
the reverse situation, namely a process that is left continuous with right
limits, and for that we use the term caglad.
X is a finite variation process (FV process) if the path t 7→ Xt (ω) has
bounded variation on each compact interval [0, T ].
We shall use all these terms also of a process that has a particular path
property for almost every ω. For example, if t ↦ X_t(ω) is cadlag for all ω
in a set Ω₀ of probability 1, then we can define X̃_t(ω) = X_t(ω) for ω ∈ Ω₀
and X̃_t(ω) = 0 for ω ∉ Ω₀. Then X̃ has all paths cadlag, and X̃
and X are indistinguishable. Since we regard indistinguishable processes
as equal, it makes sense to regard X itself as a cadlag process. When
we prove results under hypotheses of path regularity, we assume that the
path condition holds for each ω. Typically the result will be the same for
processes that are indistinguishable.
Note, however, that processes that are modifications of each other can
have quite different path properties (Exercise 2.3).
The next two lemmas record some technical benefits of path regularity.
Hence Xt (ω) = Yt (ω) for all t ∈ R+ and ω ∈ Ω0 , and this says X and Y are
indistinguishable.
For the left-continuous case the origin t = 0 needs a separate assumption
because it cannot be approached from the left.
{F_{t+}} is a new filtration, and F_{t+} ⊇ F_t. If F_t = F_{t+} for all t, we say {F_t}
is right-continuous. Similarly, we can define
$$(2.3)\qquad \mathcal{F}_{0-} = \mathcal{F}_0 \quad\text{and}\quad \mathcal{F}_{t-} = \sigma\Big(\bigcup_{s<t} \mathcal{F}_s\Big) \quad\text{for } t > 0.$$
for all values except s = 0, but 0 is among the rationals so it gets taken care
of.) Thus we have
$$\{\tau_G < t\} = \bigcup_{q \in \mathbf{Q}^+\cap[0,t)} \{X_q \in G\} \in \sigma\{X_s : 0 \le s < t\} \subseteq \mathcal{F}_t.$$
Example 2.7. Assuming X continuous would not improve the conclusion
to {τ_G ≤ t} ∈ F_t. To see this, let G = (b, ∞) for some b > 0, let t > 0, and
consider the two paths
$$X_s(\omega_0) = X_s(\omega_1) = bs/t \quad\text{for } 0 \le s \le t,$$
while
$$X_s(\omega_0) = bs/t \quad\text{and}\quad X_s(\omega_1) = b(2t-s)/t \quad\text{for } s \ge t.$$
Now τ_G(ω₀) = t while τ_G(ω₁) = ∞. Since X_s(ω₀) and X_s(ω₁) agree for
s ∈ [0, t], the points ω₀ and ω₁ must be together either inside or outside any
event in F_t^X [Exercise 1.2(c)]. But clearly ω₀ ∈ {τ_G ≤ t} while ω₁ ∉ {τ_G ≤ t}.
This shows that {τ_G ≤ t} ∉ F_t^X.
(ii) If uk < s for all k then both X(uk ) and X(uk −) converge to X(s−),
which thus lies in H by the closedness of H.
(iii) If uk > s for all k then both X(uk ) and X(uk −) converge to X(s),
which thus lies in H.
The equality above is checked.
Let
$$H_n = \{y : \text{there exists } x \in H \text{ such that } |x - y| < n^{-1}\}$$
be the n⁻¹-neighborhood of H. Let U contain all the rationals in [0, t] and
the point t itself. Next we claim
$$\{X(0) \in H\} \cup \{X(s) \in H \text{ or } X(s-) \in H \text{ for some } s \in (0,t]\} = \bigcap_{n=1}^{\infty} \bigcup_{q \in U} \{X(q) \in H_n\}.$$
To justify this, note first that if X(s) = y ∈ H for some s ∈ [0, t] or X(s−) =
y ∈ H for some s ∈ (0, t], then we can find a sequence qj ∈ U such that
X(qj ) → y, and then X(qj ) ∈ Hn for all large enough j. Conversely, suppose
we have qn ∈ U such that X(qn ) ∈ Hn for all n. Extract a convergent
subsequence qn → s. By the cadlag property a further subsequence of X(qn )
converges to either X(s) or X(s−). By the closedness of H, one of these
lies in H.
Combining the set equalities proved shows that {σH ≤ t} ∈ Ft .
Lemma 2.8 fails for caglad processes, unless the filtration is assumed
right-continuous (Exercise 2.7). For a continuous process X and a closed
set H the random times defined by (2.4) and (2.5) coincide. So we get this
corollary.
Corollary 2.9. Assume X is continuous and H is closed. Then τH is a
stopping time.
Remark 2.10 (A look ahead). The stopping times discussed above will
play a role in the development of the stochastic integral in the following
way. To integrate an unbounded real-valued process X we need stopping
times ζk % ∞ such that Xt (ω) stays bounded for 0 < t ≤ ζk (ω). Caglad
processes will be an important class of integrands. For a caglad X Lemma
2.6 shows that
ζk = inf{t ≥ 0 : |Xt | > k}
are stopping times, provided {Ft } is right-continuous. Left-continuity of X
then guarantees that |Xt | ≤ k for 0 < t ≤ ζk .
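Remark 2.10 can be illustrated with a concrete caglad step path (a hypothetical example; the jump times and values are mine): for X_t = Y_{t−} built from a cadlag step path Y, the time ζ_k is the left endpoint of the first interval where |X| exceeds k, and left-continuity keeps |X_t| ≤ k up to and including ζ_k.

```python
import bisect

# Y is a cadlag step path with value vals[j] on [times[j], times[j+1]);
# X_t = Y_{t-} is the associated caglad path (for t > 0).
times = [0.0, 0.5, 1.0, 1.5, 2.0]
vals  = [1.0, 3.0, -6.0, 2.0, 9.0]

def X(t):
    """X_t = Y_{t-}: the value of Y on the interval (times[j], times[j+1]]
    containing t."""
    return vals[bisect.bisect_left(times, t) - 1]

def zeta(k):
    """zeta_k = inf{ t >= 0 : |X_t| > k }.  Because X is left continuous,
    |X_t| <= k still holds for 0 < t <= zeta_k."""
    for j, v in enumerate(vals):
        if abs(v) > k:
            return times[j]     # |X| first exceeds k just after this time
    return float("inf")
```

For k = 5 the bound fails first on the interval where Y took the value −6, so ζ₅ = 1.0, yet X(ζ₅) is still the left limit 3.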
Of particular interest will be a caglad process X that satisfies Xt = Yt−
for t > 0 for some adapted cadlag process Y . Then by Lemma 2.8 we get
The same idea is used to define the corresponding object between two
processes.
Definition 2.12. The (quadratic) covariation process [X, Y] = {[X, Y]_t :
t ∈ R₊} of two stochastic processes X and Y is defined by the following
limits, provided these limits exist for all t:
$$(2.9)\qquad \lim_{\mathrm{mesh}(\pi)\to 0}\ \sum_i (X_{t_{i+1}} - X_{t_i})(Y_{t_{i+1}} - Y_{t_i}) = [X,Y]_t \quad\text{in probability.}$$
By definition [Y, Y] = [Y].
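The sums in (2.9) can be watched in a seeded simulation (an illustration; the grid and names are mine). For a standard Brownian path the sums of squared increments over [0, 1] concentrate near 1, while the cross sums for two independent Brownian paths concentrate near 0.

```python
import random

random.seed(7)

# Two independent Brownian paths sampled on a fine grid of [0, 1]:
# increments are independent N(0, dt) variables.
N = 200_000
dt = 1.0 / N
B, W = [0.0], [0.0]
for _ in range(N):
    B.append(B[-1] + random.gauss(0.0, dt**0.5))
    W.append(W[-1] + random.gauss(0.0, dt**0.5))

step = 20                    # coarser partition, mesh = step * dt
idx = range(0, N, step)
qv  = sum((B[i + step] - B[i]) ** 2 for i in idx)                    # near [B]_1 = 1
cov = sum((B[i + step] - B[i]) * (W[i + step] - W[i]) for i in idx)  # near [B, W]_1 = 0
```

Refining the partition (smaller `step`) tightens both sums around their limits.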
or
$$[X,Y] = \tfrac{1}{2}\big([X] + [Y] - [X-Y]\big).$$
jumps of the processes are copied exactly in the covariation. For any cadlag
process Z, the jump at t is denoted by ∆Z(t) = Z(t) − Z(t−).
Proposition 2.15. Suppose X and Y are cadlag processes, and [X] and
[Y ] exist. Then there exists a modification of [X, Y ] that is a cadlag process.
For any t, ∆[X, Y ]t = (∆Xt )(∆Yt ) almost surely.
To bound the jump at u, return to the partition π first chosen right after
(2.12). Let s = t_{m(π)−1}. Keeping s fixed, refine π sufficiently in [t, s] so that,
with probability at least 1 − 2δ,
$$[X]_u - [X]_t \le \sum_{i=0}^{m(\pi)-1} (X_{t_{i+1}} - X_{t_i})^2 + \varepsilon = (X_u - X_s)^2 + \sum_{i=0}^{m(\pi)-2} (X_{t_{i+1}} - X_{t_i})^2 + \varepsilon \le (X_u - X_s)^2 + [X]_s - [X]_t + 2\varepsilon,$$
which rearranges, through ∆[X]_u ≤ [X]_u − [X]_s, to give
$$P\big( \Delta[X]_u \le (X_u - X_s)^2 + 2\varepsilon \big) > 1 - 2\delta.$$
Proof. Once ω is fixed, the result is an analytic lemma, and the depen-
dence of G and H on ω is irrelevant. We included this dependence so that
the statement better fits its later applications. It is a property of product-
measurability that for a fixed ω, G(t, ω) and H(t, ω) are measurable func-
tions of t.
Consider first step functions
$$g(t) = \alpha_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{m-1} \alpha_i \mathbf{1}_{(s_i,s_{i+1}]}(t)$$
and
$$h(t) = \beta_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{m-1} \beta_i \mathbf{1}_{(s_i,s_{i+1}]}(t)$$
where 0 = s₁ < · · · < s_m = T is a partition of [0, T]. (Note that g and h
can be two arbitrary step functions. If they come with distinct partitions,
{s_i} is the common refinement of these partitions.) Then
$$\Big| \int_{[0,T]} g(t)h(t)\,d[X,Y]_t \Big| = \Big| \sum_i \alpha_i\beta_i \big([X,Y]_{s_{i+1}} - [X,Y]_{s_i}\big) \Big|$$
$$\le \sum_i |\alpha_i\beta_i|\,\big([X]_{s_{i+1}} - [X]_{s_i}\big)^{1/2}\big([Y]_{s_{i+1}} - [Y]_{s_i}\big)^{1/2}$$
$$\le \Big( \sum_i |\alpha_i|^2 \big([X]_{s_{i+1}} - [X]_{s_i}\big) \Big)^{1/2} \Big( \sum_i |\beta_i|^2 \big([Y]_{s_{i+1}} - [Y]_{s_i}\big) \Big)^{1/2}$$
$$= \Big( \int_{[0,T]} g(t)^2\,d[X]_t \Big)^{1/2} \Big( \int_{[0,T]} h(t)^2\,d[Y]_t \Big)^{1/2},$$
where we applied (2.10) separately on each partition interval (s_i, s_{i+1}], and
then the Schwarz inequality.
Let g and h be two arbitrary bounded Borel functions on [0, T], and pick
0 < C < ∞ so that |g| ≤ C and |h| ≤ C. Let ε > 0. Define the bounded
Borel measure
$$\mu = \Lambda_{[X]} + \Lambda_{[Y]} + |\Lambda_{[X,Y]}|$$
on [0, T]. Above, Λ_{[X]} is the positive Lebesgue-Stieltjes measure of the
function t ↦ [X]_t (for the fixed ω under consideration), same for Λ_{[Y]},
and |Λ_{[X,Y]}| is the positive total variation measure of the signed Lebesgue-Stieltjes
measure Λ_{[X,Y]}. By Lemma A.16 we can choose step functions g̃
and h̃ so that |g̃| ≤ C, |h̃| ≤ C, and
$$\int \big( |g - \tilde g| + |h - \tilde h| \big)\,d\mu < \frac{\varepsilon}{2C}.$$
On the one hand
$$\Big| \int_{[0,T]} gh\,d[X,Y]_t - \int_{[0,T]} \tilde g\tilde h\,d[X,Y]_t \Big| \le \int_{[0,T]} \big| gh - \tilde g\tilde h \big|\,d|\Lambda_{[X,Y]}|$$
$$\le C\int_{[0,T]} |g - \tilde g|\,d|\Lambda_{[X,Y]}| + C\int_{[0,T]} |h - \tilde h|\,d|\Lambda_{[X,Y]}| \le \varepsilon.$$
with a similar bound for h. Putting these together with the inequality
already proved for step functions gives
$$\Big| \int_{[0,T]} gh\,d[X,Y]_t \Big| \le \varepsilon + \Big( \varepsilon + \int_{[0,T]} g^2\,d[X]_t \Big)^{1/2} \Big( \varepsilon + \int_{[0,T]} h^2\,d[Y]_t \Big)^{1/2}.$$
Since ε > 0 was arbitrary, we can let ε → 0. The inequality as stated in the
proposition is obtained by choosing g(t) = G(t, ω) and h(t) = H(t, ω).
Remark 2.17. Inequality (2.14) has the following corollary. As in the proof,
let |Λ_{[X,Y](ω)}| be the total variation measure of the signed Lebesgue-Stieltjes
measure Λ_{[X,Y](ω)} on [0, T]. For a fixed ω, (1.11) implies that |Λ_{[X,Y](ω)}| ≪
Λ_{[X,Y](ω)} and the Radon-Nikodym derivative
$$\varphi(t) = \frac{d|\Lambda_{[X,Y](\omega)}|}{d\Lambda_{[X,Y](\omega)}}(t)$$
on [0, T] satisfies |φ(t)| ≤ 1. For an arbitrary bounded Borel function g on
[0, T],
$$\int_{[0,T]} g(t)\,|\Lambda_{[X,Y](\omega)}|(dt) = \int_{[0,T]} g(t)\varphi(t)\,d[X,Y]_t(\omega).$$
For the definition of a Markov process, X can take its values in an
abstract space, but R^d is sufficiently general for us. An R^d-valued process X
is a Markov process with respect to {F_t} if
A martingale represents a fair gamble in the sense that, given all the
information up to the present time s, the expectation of the future fortune
Xt is the same as the current fortune Xs . Stochastic analysis relies heavily
on martingale theory, parts of which are covered in the next chapter. The
Markov property is a notion of causality. It says that, given the present
state Xs , future events are independent of the past.
Requirement (a) in the definition says that x is the initial state under
the measure P x . Requirement (b) is for technical purposes. Requirement
(c) is the Markov property. E x stands for expectation under the measure
P x.
Next we discuss the two most important processes, Brownian motion
and the Poisson process.
independent of X, and B̃_t − B̃_s is independent of F̃_s, Exercise 1.5(c) implies
that B_t − B_s is independent of F_s. Conversely, if a process B_t satisfies parts
(ii) and (iii) of the definition, then B̃_t = B_t − B_0 is a standard Brownian
motion, independent of B_0.
The construction (proof of existence) of Brownian motion is rather tech-
nical, and hence relegated to Section B.2 in the Appendix. For the un-
derlying probability space the construction uses the “canonical” path space
C = CR [0, ∞). Let Bt (ω) = ω(t) be the coordinate projections on C, and
FtB = σ{Bs : 0 ≤ s ≤ t} the filtration generated by the coordinate process.
Theorem 2.21. There exists a Borel probability measure P 0 on C = CR [0, ∞)
such that the process B = {Bt : 0 ≤ t < ∞} on the probability space
(C, BC , P 0 ) is a standard one-dimensional Brownian motion with respect to
the filtration {FtB }.
Proof. Follows from the properties of Brownian increments and basic prop-
erties of conditional expectations. Let s < t.
$$E[B_t\,|\,\mathcal{F}_s] = E[B_t - B_s\,|\,\mathcal{F}_s] + E[B_s\,|\,\mathcal{F}_s] = B_s,$$
and
$$E[B_t^2\,|\,\mathcal{F}_s] = E\big[(B_t - B_s + B_s)^2\,|\,\mathcal{F}_s\big] = E\big[(B_t - B_s)^2\,|\,\mathcal{F}_s\big] + 2B_s\,E[B_t - B_s\,|\,\mathcal{F}_s] + B_s^2 = (t-s) + B_s^2.$$
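A consequence of these computations is that B_t and B_t² − t have constant expectation zero, which a seeded Monte Carlo sketch can confirm (an illustration; the sample sizes and names are mine), using only the fact that B_t has the N(0, t) distribution.

```python
import random

random.seed(11)

def terminal_values(t, n_paths=50_000):
    """Sample B_t directly: B_t ~ N(0, t) for standard Brownian motion."""
    return [random.gauss(0.0, t**0.5) for _ in range(n_paths)]

t = 1.5
sample = terminal_values(t)
mean_B = sum(sample) / len(sample)                       # E[B_t] = 0
mean_M = sum(b * b - t for b in sample) / len(sample)    # E[B_t^2 - t] = 0
```

Both sample means sit within Monte Carlo error of zero, consistent with B and B² − t being martingales started at 0.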
Proof. Part (a). Definition (2.1) shows how to complete the filtration. Of
course, the adaptedness of B to the filtration is not harmed by enlarging
the filtration; the issue is the independence of F̄_s and B_t − B_s. If G ∈ F̄_s has
A ∈ F_s such that P(A △ G) = 0, then P(G ∩ H) = P(A ∩ H) for any event
H. In particular, the independence of F̄_s from B_t − B_s follows.
To check the independence of Fs+ and Bt − Bs , let f be a bounded
continuous function on R, and suppose Z is a bounded Fs+ -measurable
random variable. Z is Fs+h -measurable for each h > 0. By path continuity,
and by the independence of Bt − Bs+h and Fs+h for s + h < t, we get
$$E\big[Z \cdot f(B_t - B_s)\big] = \lim_{h\to 0} E\big[Z \cdot f(B_t - B_{s+h})\big] = \lim_{h\to 0} E[Z]\cdot E\big[f(B_t - B_{s+h})\big] = E[Z]\cdot E\big[f(B_t - B_s)\big].$$
This implies the independence of Fs+ and Bt − Bs (Lemma B.4 extends the
equality from continuous f to bounded Borel f ).
Part (b). Fix 0 = t0 < t1 < t2 < · · · < tn and abbreviate the vector of
Brownian increments by
ξ = (Bs+t1 − Bs , Bs+t2 − Bs+t1 , . . . , Bs+tn − Bs+tn−1 ).
We first claim that ξ is independent of Fs . Pick Borel sets A1 , . . . , An of
R and a bounded Fs -measurable random variable Z. In the next calcula-
tion, separate out the factor 1An (Bs+tn − Bs+tn−1 ) and note that the rest is
Fs+tn−1 -measurable and so independent of 1An (Bs+tn − Bs+tn−1 ):
$$E\big[Z \cdot \mathbf{1}_{A_1\times A_2\times\cdots\times A_n}(\xi)\big] = E\Big[Z \cdot \prod_{i=1}^{n} \mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}})\Big]$$
$$= E\Big[Z \cdot \prod_{i=1}^{n-1} \mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}}) \cdot \mathbf{1}_{A_n}(B_{s+t_n} - B_{s+t_{n-1}})\Big]$$
$$= E\Big[Z \cdot \prod_{i=1}^{n-1} \mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}})\Big] \cdot E\big[\mathbf{1}_{A_n}(B_{s+t_n} - B_{s+t_{n-1}})\big].$$
Repeat this argument to separate all the factors, until the expectation becomes
$$E[Z] \cdot \prod_{i=1}^{n} E\big[\mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}})\big] = E[Z]\cdot E\big[\mathbf{1}_{A_1\times A_2\times\cdots\times A_n}(\xi)\big].$$
Now consider the class of all Borel sets G ∈ BRn such that
E[ Z · 1G (ξ)] = E[Z] · E[ 1G (ξ)].
The above argument shows that this class contains all products A1 × A2 ×
· · · × An of Borel sets from R. We leave it to the reader to check that this
class is a λ-system. Thus by the π-λ Theorem B.1 the equality is true for
all G ∈ B_{Rⁿ}. Since Z was an arbitrary bounded F_s-measurable random
variable, we have proved that ξ is independent of F_s.
The vector ξ also satisfies ξ = (Yt1 , Yt2 − Yt1 , . . . , Ytn − Ytn−1 ). Since
the vector η = (Yt1 , Yt2 , . . . , Ytn ) is a function of ξ, we conclude that η is
independent of Fs . This being true for all choices of time points 0 < t1 <
t2 < · · · < tn implies that the entire process Y is independent of Fs .
It remains to check that Y is a standard Brownian motion with respect
to Gt = Fs+t . These details are straightforward and we leave them as an
exercise.
Parts (a) and (b) of the lemma together assert that Y is a standard
Brownian motion, independent of F̄_{t+}, the filtration obtained by replacing
{F_t} with the augmented right-continuous version. (The order of the
two operations on the filtration is immaterial; in other words, the σ-algebra
⋂_{s:s>t} F̄_s agrees with the augmentation of ⋂_{s:s>t} F_s, see Exercise 2.2.)
Next we develop some properties of Brownian motion by concentrating
on the “canonical setting.” The underlying probability space is the path
space C = C_R[0, ∞) with the coordinate process B_t(ω) = ω(t) and the
filtration F^B_t = σ{B_s : 0 ≤ s ≤ t} generated by the coordinates. For each
x ∈ R there is a probability measure P^x on C under which B = {B_t} is
Brownian motion started at x. Expectation under P^x is denoted by E^x and
satisfies
E^x[H] = E^0[H(x + B)]
for any bounded B_C-measurable function H. On the right x + B is a sum
of a point and a process, interpreted as the process whose value at time t is
x + B_t. (In Theorem 2.21 we constructed P^0, and the equality above can
be taken as the definition of P^x.)
On C we have the shift maps {θs : 0 ≤ s < ∞} defined by (θs ω)(t) =
ω(s + t) that move the time origin to s. The shift acts on the process B by
θs B = {Bs+t : t ≥ 0}.
A consequence of Lemma 2.23(a) is that the coordinate process B is a
Brownian motion also relative to the larger filtration F^B_{t+} = ⋂_{s:s>t} F^B_s. We
shall show that members of F^B_t and F^B_{t+} differ only by null sets. (These
σ-algebras are different, see Exercise 2.6.) This will have interesting conse-
quences when we take t = 0. We begin with the Markov property.
quences when we take t = 0. We begin with the Markov property.
Proposition 2.24. Let H be a bounded B_C-measurable function on C.
(a) E^x[H] is a Borel measurable function of x.
(b) For each x ∈ R
(2.18)   E^x[ H ∘ θ_s | F^B_{s+} ](ω) = E^{B_s(ω)}[H]   for P^x-almost every ω.
Proof. The σ-algebra F^B_0 satisfies the 0–1 law under P^x, because P^x{B_0 ∈
G} = 1_G(x). Then every P^x-conditional expectation with respect to F^B_0
equals the expectation (Exercise 1.8). The following equalities are valid
P^x-almost surely for A ∈ F^B_{0+}:
1_A = E^x( 1_A | F^B_{0+} ) = E^x( 1_A | F^B_0 ) = P^x(A).
Thus there must exist points ω ∈ C such that 1A (ω) = P x (A), and so the
only possible values for P x (A) are 0 and 1.
From the 0–1 law we get a fact that suggests something about the fast
oscillation of Brownian motion: if it starts at the origin, then in any nontriv-
ial time interval (0, ε) the process is both positive and negative, and hence
by continuity also zero. To make this precise, define
(2.20)   σ = inf{t > 0 : B_t > 0},   τ = inf{t > 0 : B_t < 0},   and T₀ = inf{t > 0 : B_t = 0}.
Since this is true for every n ∈ N, {σ = 0} ∈ ⋂_{n∈N} F^B_{1/n} = F^B_{0+}. The same
argument shows {τ = 0} ∈ F^B_{0+}.
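The conclusion σ = τ = 0 almost surely can be illustrated by simulation. The sketch below (a hypothetical illustration assuming NumPy) samples Brownian paths on finer and finer grids over [0, 1]; the fraction of paths observed to take both signs should rise toward 1 as the grid refines, since the continuum path takes both signs immediately.

```python
import numpy as np

# Fraction of simulated Brownian paths on [0, 1] that take both signs,
# observed on grids of n points; finer grids catch more of the early
# oscillation, so the fraction climbs toward 1.
rng = np.random.default_rng(1)
paths = 10_000
fracs = []
for n in (10, 100, 1000):
    dB = rng.normal(0.0, np.sqrt(1.0 / n), size=(paths, n))
    B = np.cumsum(dB, axis=1)
    both = (B.max(axis=1) > 0) & (B.min(axis=1) < 0)
    fracs.append(float(both.mean()))
print(fracs)
```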
Proof. Fix γ > 1/2. Since only increments of Brownian motion are involved,
we can assume that the process in question is a standard Brownian motion.
(B_t and B̃_t = B_t − B₀ have the same increments.) In the proof we want to
deal only with a bounded time interval. So define
H_k(C, ε) = { there exists s ∈ [k, k+1] such that |B_t − B_s| ≤ C|t − s|^γ for all t ∈ [s − ε, s + ε] ∩ [k, k+1] }.
G(γ, C, ε) is contained in ⋃_k H_k(C, ε), so it suffices to show P[H_k(C, ε)] = 0
for all k. Since Y_t = B_{k+t} − B_k is a standard Brownian motion, P[H_k(C, ε)] =
P[H₀(C, ε)] for each k. Finally, what we show is P[H₀(C, ε)] = 0.
Fix m ∈ N such that m(γ − 1/2) > 1. Let ω ∈ H₀(C, ε), and pick s ∈ [0, 1]
so that the condition of the event is satisfied. Consider n large enough so
that m/n < ε. Imagine partitioning [0, 1] into intervals of length 1/n. Let
X_{n,k} = max{ |B_{(j+1)/n} − B_{j/n}| : k ≤ j ≤ k + m − 1 }   for 0 ≤ k ≤ n − m.
The point s has to lie in one of the intervals [k/n, (k+m)/n], for some 0 ≤ k ≤ n − m.
For this particular k, and for each j with k ≤ j ≤ k + m − 1,
|B_{(j+1)/n} − B_{j/n}| ≤ |B_{(j+1)/n} − B_s| + |B_s − B_{j/n}| ≤ C( |(j+1)/n − s|^γ + |s − j/n|^γ ) ≤ 2C(m/n)^γ.
Thus if the integer M satisfies M > |ξ| + 1, we can find another integer k
such that for all t ∈ [s − k −1 , s + k −1 ],
−M ≤ ( B_t(ω) − B_s(ω) ) / (t − s) ≤ M  ⟹  |B_t(ω) − B_s(ω)| ≤ M|t − s|.
Consequently ω ∈ G(1, M, k⁻¹).
This reasoning shows that if t ↦ B_t(ω) is differentiable at even a single
time point, then ω lies in the union ⋃_M ⋃_k G(1, M, k⁻¹). This union has
probability zero by the previous theorem.
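The heuristic behind the theorem can be seen numerically: an increment over a window of length h has size of order √h, so difference quotients over mesh h are of order h^{−1/2} and blow up as h → 0. A small sketch (an illustration assuming NumPy):

```python
import numpy as np

# Average difference quotient |B_{t+h} - B_t| / h over many independent
# increments; since E|B_{t+h} - B_t| = sqrt(2h/pi), the mean quotient
# grows like sqrt(2/(pi*h)) as the mesh h shrinks.
rng = np.random.default_rng(2)
qs = []
for h in (1e-2, 1e-4, 1e-6):
    inc = rng.normal(0.0, np.sqrt(h), size=100_000)
    qs.append(float(np.mean(np.abs(inc)) / h))
print(qs)
```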
By Chebyshev's inequality,
P{ | ∑_{i=0}^{m(πⁿ)−1} (B_{t^n_{i+1}} − B_{t^n_i})² − t | ≥ ε } ≤ ε⁻² E[ ( ∑_{i=0}^{m(πⁿ)−1} (B_{t^n_{i+1}} − B_{t^n_i})² − t )² ] ≤ 2t ε⁻² mesh(πⁿ).
If ∑_n mesh(πⁿ) < ∞, these bounds have a finite sum over n (in short,
they are summable). Hence the asserted convergence follows from the Borel-
Cantelli Lemma.
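The concentration of the squared increments around t can be illustrated with one fine partition (a sketch assuming NumPy; the tolerance reflects the variance bound of order t²/n implied by the estimate above):

```python
import numpy as np

# Quadratic variation sum of a Brownian path: squared increments over a
# uniform partition of [0, t] with mesh t/n concentrate at t.
rng = np.random.default_rng(3)
t, n = 2.0, 2**16
dB = rng.normal(0.0, np.sqrt(t / n), size=n)    # increments on mesh t/n
qv = float(np.sum(dB ** 2))
print(qv)   # close to t = 2.0
```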
Corollary 2.34. The following is true almost surely for a Brownian motion
B: the path t 7→ Bt (ω) is not a member of BV [0, T ] for any 0 < T < ∞.
Since the maximum in braces vanishes as n → ∞, the last sum must converge
to ∞. Consequently the path t 7→ Bt (ω) is not BV in any interval [0, k −1 ].
Any other nontrivial interval [0, T ] contains an interval [0, k −1 ] for some k,
and so this path cannot have bounded variation on any interval [0, T ].
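Numerically, the total variation over a mesh-t/n partition grows like √(2tn/π) and has no finite limit. The sketch below (assuming NumPy, and using independent samples for each mesh rather than refinements of one fixed path) shows the orders of magnitude:

```python
import numpy as np

# Sum of absolute increments over finer and finer partitions of [0, 1]:
# expectation sqrt(2*t*n/pi) diverges, consistent with paths not being BV.
rng = np.random.default_rng(4)
t = 1.0
tv = []
for n in (100, 10_000, 1_000_000):
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)
    tv.append(float(np.sum(np.abs(dB))))
print(tv)
```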
Observe that items (i) and (ii) give a complete description of all the
finite-dimensional distributions of {N (A)}. For arbitrary B1 , B2 , . . . , Bm ∈
A, we can find disjoint A1 , A2 , . . . , An ∈ A so that each Bj is a union of
some of the Ai ’s. Then each N (Bj ) is a certain sum of N (Ai )’s, and we see
that the joint distribution of N (B1 ), N (B2 ), . . . , N (Bm ) is determined by
the joint distribution of N (A1 ), N (A2 ), . . . , N (An ).
As the formula reveals, Ki decides how many points to place in Si , and the
{Xji } give the locations of the points in Si . We leave it as an exercise to
check that Ni is a Poisson point process whose mean measure is µ restricted
to Si , defined by µi (B) = µ(B ∩ Si ).
We can repeat this construction for each Si, and take the resulting random
processes Ni mutually independent (by a suitable product space construction).
Finally, define
N(A) = ∑_i Ni(A).
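The two-step recipe above — draw the number of points, then place them i.i.d. according to the normalized mean measure — can be sketched as follows (an illustration assuming NumPy, with Lebesgue mean measure on an interval; the function name is ours, not the text's):

```python
import numpy as np

def poisson_point_process(rate, lo, hi, rng):
    """Points on [lo, hi) with constant intensity `rate`: draw the count
    K ~ Poisson(mu(S)) and then K i.i.d. locations with law mu / mu(S)."""
    K = rng.poisson(rate * (hi - lo))
    return rng.uniform(lo, hi, size=K)

rng = np.random.default_rng(5)
# The count N(B) for B = [0, 0.3) should be Poisson with mean rate*|B| = 1.5,
# so its sample mean and variance should both be near 1.5.
counts = np.array([np.sum(poisson_point_process(5.0, 0.0, 1.0, rng) < 0.3)
                   for _ in range(20_000)])
print(counts.mean(), counts.var())
```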
Exercises
Exercise 2.1. Let {Ft } be a filtration, and let Gt = Ft+ . Show that Gt− =
Ft− for t > 0.
Exercise 2.2. Assume the probability space (Ω, F, P) is complete. Let
{F_t} be a filtration, G_t = F_{t+} its right-continuous version, and H_t = F̄_t
its augmentation. Augment {G_t} to get the filtration {Ḡ_t}, and define also
H_{t+} = ⋂_{s:s>t} H_s. Show that Ḡ_t = H_{t+}. In other words, it is immaterial
whether we augment before or after making the filtration right-continuous.
Hints. Ḡ_t ⊆ H_{t+} should be easy. For the other direction, if C ∈ H_{t+},
then for each s > t there exists C_s ∈ F_s such that P(C △ C_s) = 0. For any
sequence s_i ↘ t, the set C̃ = ⋂_{m≥1} ⋃_{i≥m} C_{s_i} lies in F_{t+}. Use Exercise 1.3.
Exercise 2.3. Let the underlying probability space be Ω = [0, 1] with P
given by Lebesgue measure. Define two processes
Xt (ω) = 0 and Yt (ω) = 1{t=ω} .
X is continuous, Y does not have a single continuous path, but they are
modifications of each other.
Exercise 2.4 (Example of an adapted but not progressively measurable
process). Let Ω = [0, 1], and for each t let Ft be the σ-field generated by
singletons on Ω. (Equivalently, Ft consists of all countable sets and their
complements.) Let Xt (ω) = 1{ω = t}. Then {Xt : 0 ≤ t ≤ 1} is adapted.
But X on [0, 1] × Ω is not B[0,1] ⊗ F1 -measurable.
Hint. Show that elements of B_{[0,1]} ⊗ F₁ are of the type
⋃_{s∈I} ( B_s × {s} ) ∪ ( H × Iᶜ )
Chapter 3

Martingales
By the bounds
c ≤ M_{s+n⁻¹} ∨ c ≤ E[ M_t ∨ c | F_{s+n⁻¹} ]
and Lemma B.12 from the Appendix, for a fixed c the random variables
{M_{s+n⁻¹} ∨ c} are uniformly integrable. Let n → ∞. Right-continuity of
paths implies M_{s+n⁻¹} ∨ c → M_s ∨ c. Uniform integrability then gives con-
vergence in L¹. By Lemma B.13 there exists a subsequence {n_j} such that
conditional expectations converge almost surely:
Consequently
Proposition 3.3. Suppose the filtration {Ft } satisfies the usual conditions,
in other words (Ω, F, P ) is complete, F0 contains all null events, and Ft =
Ft+ . Let M be a submartingale such that t 7→ EMt is right-continuous.
Then there exists a cadlag modification of M that is an {Ft }-submartingale.
3.1. Optional Stopping
The conclusion from the previous lemma needed next is that for any
stopping time τ and T ∈ R₊, the stopped variable M_{τ∧T} is integrable.
Theorem 3.6. Let M be a submartingale with right-continuous paths, and
let σ and τ be two stopping times. Then for T < ∞,
(3.3) E[Mτ ∧T |Fσ ] ≥ Mσ∧τ ∧T .
Proof. As pointed out before the theorem, M_{τ∧T} and M_{σ∧τ∧T} are integrable
random variables. In particular, the conditional expectation is well-defined.
Define approximating discrete stopping times by σn = 2−n ([2n σ]+1) and
τn = 2−n ([2n τ ] + 1). The interpretation for infinite values is that σn = ∞ if
σ = ∞, and similarly for τn and τ .
Let c ∈ R. The function x 7→ x ∨ c is convex and nondecreasing, hence
Mt ∨ c is also a submartingale. Applying Theorem 3.4 to this submartingale
and the stopping times σn and τn gives
E[Mτn ∧T ∨ c|Fσn ] ≥ Mσn ∧τn ∧T ∨ c.
Since σ ≤ σn , Fσ ⊆ Fσn , and if we condition both sides of the above
inequality on Fσ , we get
(3.4)   E[ M_{τn∧T} ∨ c | F_σ ] ≥ E[ M_{σn∧τn∧T} ∨ c | F_σ ],
and
c ≤ Mσn ∧τn ∧T ∨ c ≤ E[MT ∨ c|Fσn ∧τn ].
Together with Lemma B.12 from the Appendix, these bounds imply that the
sequences {Mτn ∧T ∨ c : n ∈ N} and {Mσn ∧τn ∧T ∨ c : n ∈ N} are uniformly
integrable. Since these sequences converge almost surely (as argued above),
uniform integrability implies that they converge in L1 . By Lemma B.13
there exists a subsequence {nj } along which the conditional expectations
converge almost surely:
E[Mτnj ∧T ∨ c|Fσ ] → E[Mτ ∧T ∨ c|Fσ ]
and
E[Mσnj ∧τnj ∧T ∨ c|Fσ ] → E[Mσ∧τ ∧T ∨ c|Fσ ].
(To get a subsequence that works for both limits, extract a subsequence for
the first limit by Lemma B.13, and then apply Lemma B.13 again to extract
a further subsubsequence for the second limit.) Taking these limits in (3.4)
gives
E[Mτ ∧T ∨ c|Fσ ] ≥ E[Mσ∧τ ∧T ∨ c|Fσ ].
M is right-continuous by assumption, hence progressively measurable, and
so Mσ∧τ ∧T is Fσ∧τ ∧T -measurable. This is a sub-σ-field of Fσ , and so
E[Mτ ∧T ∨ c|Fσ ] ≥ E[Mσ∧τ ∧T ∨ c|Fσ ] = Mσ∧τ ∧T ∨ c ≥ Mσ∧τ ∧T .
As c & −∞, Mτ ∧T ∨c → Mτ ∧T pointwise, and for all c we have the integrable
bound |Mτ ∧T ∨ c| ≤ |Mτ ∧T |. Thus by the dominated convergence theorem
for conditional expectations, almost surely
lim E[Mτ ∧T ∨ c|Fσ ] = E[Mτ ∧T |Fσ ].
c→−∞
Proof. For u < v and T ≥ σ(v), (3.3) gives E[M_{σ(v)} | F_{σ(u)}] ≥ M_{σ(u)}. If
M is a martingale, we can apply this to both M and −M. And if M is
an L²-martingale, Lemma 3.5 applied to the submartingale M² implies that
E[M²_{σ(u)}] ≤ 2E[M²_T] + E[M²_0].
3.2. Inequalities
Lemma 3.9. Let M be a submartingale, 0 < T < ∞, and H a finite subset
of [0, T]. Then for r > 0,
(3.6)   P{ max_{t∈H} M_t ≥ r } ≤ r⁻¹ E[M_T⁺]
and
(3.7)   P{ min_{t∈H} M_t ≤ −r } ≤ r⁻¹ ( E[M_T⁺] − E[M_0] ).
from which
−r P{ min_{t∈H} M_t ≤ −r } = −r P{τ < ∞} ≥ E[ M_τ 1{τ<∞} ] ≥ E[M_0] − E[ M_T 1{τ=∞} ] ≥ E[M_0] − E[M_T⁺],
and
(3.9)   P{ inf_{0≤t≤T} M_t ≤ −r } ≤ r⁻¹ ( E[M_T⁺] − E[M_0] ).
We shall also use the shorter term local L2 -martingale for a local square-
integrable martingale.
Lemma 3.14. Suppose M is a local martingale and σ is an arbitrary stop-
ping time. Then M σ is also a local martingale. Similarly, if M is a local
L2 -martingale, then so is M σ . In both cases, if {τk } is a localizing sequence
for M , then it is also a localizing sequence for M σ .
Only large jumps can prevent a cadlag local martingale from being a
local L2 -martingale.
Lemma 3.15. Suppose M is a cadlag local martingale, and there is a con-
stant c such that |Mt − Mt− | ≤ c for all t. Then M is a local L2 -martingale.
Recall that the usual conditions on the filtration {Ft } meant that the
filtration is complete (each Ft contains every subset of a P -null event in F)
and right-continuous (Ft = Ft+ ).
Theorem 3.16 (Fundamental Theorem of Local Martingales). Assume
{Ft } is complete and right-continuous. Suppose M is a cadlag local martin-
gale and c > 0. Then there exist cadlag local martingales M̃ and A such that
the jumps of M̃ are bounded by c, A is an FV process, and M = M̃ + A.
If M is continuous, then so is [M ].
≤ E[ (M^{τn}_t − M^τ_t)² ] + 2 E[ [M^{τn} − M^τ]_t ]^{1/2} E[ [M^τ]_t ]^{1/2}
= E[ (M_{τn∧t} − M_{τ∧t})² ] + 2 E[ (M_{τn∧t} − M_{τ∧t})² ]^{1/2} E[ M²_{τ∧t} ]^{1/2}
≤ ( E{M²_{τn∧t}} − E{M²_{τ∧t}} ) + 2 ( E{M²_{τn∧t}} − E{M²_{τ∧t}} )^{1/2} E[ M²_t ]^{1/2}.
In the last step we used (3.3) in two ways. First, for a martingale it gives equality,
and so
E[ (M_{τn∧t} − M_{τ∧t})² ] = E{M²_{τn∧t}} − 2E{ E(M_{τn∧t} | F_{τ∧t}) M_{τ∧t} } + E{M²_{τ∧t}} = E{M²_{τn∧t}} − E{M²_{τ∧t}}.
Second, we applied (3.3) to the submartingale M² to get
E[ M²_{τ∧t} ]^{1/2} ≤ E[ M²_t ]^{1/2}.
The string of inequalities allows us to conclude that [M^{τn}]_t converges to
[M^τ]_t in L¹ as n → ∞, if we can show that
(3.15)   E{M²_{τn∧t}} → E{M²_{τ∧t}}.
To argue this last limit, first note that by right-continuity, M²_{τn∧t} → M²_{τ∧t}
almost surely. By optional stopping (3.6),
0 ≤ M²_{τn∧t} ≤ E( M²_t | F_{τn∧t} ).
This inequality and Lemma B.12 from the Appendix imply that the sequence
{Mτ2n ∧t : n ∈ N} is uniformly integrable. Under uniform integrability, the
almost sure convergence implies convergence of the expectations (3.15).
To summarize, we have shown that [M τn ]t → [M τ ]t in L1 as n → ∞.
By Step 1, [M τn ]t = [M ]τn ∧t which converges to [M ]τ ∧t by right-continuity
of the process [M ]. Putting these together, we get the almost sure equality
[M τ ]t = [M ]τ ∧t for L2 -martingales.
(3.16a)   = E[ 1_A ( ∑_{i=ℓ}^{m−1} (M_{t_{i+1}} − M_{t_i})² − [M]_t + [M]_s ) ]
(3.16b)   = E[ 1_A ( ∑_{i=0}^{m−1} (M_{t_{i+1}} − M_{t_i})² − [M]_t ) ]
(3.16c)      + E[ 1_A ( [M]_s − ∑_{i=0}^{ℓ−1} (M_{t_{i+1}} − M_{t_i})² ) ].
To apply this, the expectation on line (3.16a) has to be taken apart, the
conditioning applied to individual terms, and then the expectation put back
together. Letting the mesh of the partition tend to zero makes the expec-
tations on lines (3.16b)–(3.16c) vanish by the L1 convergence in (2.8) for
L2 -martingales.
In the limit we have
E[ 1_A ( M²_t − [M]_t ) ] = E[ 1_A ( M²_s − [M]_s ) ].
By Theorem 3.22 and Lemma 2.13 the covariation [M, N ] of two right-
continuous local martingales M and N exists. As a difference of increasing
processes, [M, N ] is a finite variation process.
Lemma 3.25. Let M and N be cadlag L2 -martingales or local L2 -martingales.
Let τ be a stopping time. Then [M τ , N ] = [M τ , N τ ] = [M, N ]τ .
Proof. [M τ , N τ ] = [M, N ]τ follows from Lemma 2.13 and Lemma 3.23. For
the first equality claimed, consider a partition of [0, t]. If 0 < τ ≤ t, let ` be
the index such that t` < τ ≤ t`+1 . Then
∑_i ( M^τ_{t_{i+1}} − M^τ_{t_i} )( N_{t_{i+1}} − N_{t_i} ) = ( M_τ − M_{t_ℓ} )( N_{t_{ℓ+1}} − N_τ ) 1{0 < τ ≤ t} + ∑_i ( M^τ_{t_{i+1}} − M^τ_{t_i} )( N^τ_{t_{i+1}} − N^τ_{t_i} ).
(If τ = 0 the equality above is still true, for both sides vanish.) Let the mesh
of the partition tend to zero. With cadlag paths, the term after the equality
sign converges almost surely to (Mτ − Mτ − )(Nτ + − Nτ )1{0<τ ≤t} = 0. The
convergence of the sums gives [M τ , N ] = [M τ , N τ ].
Theorem 3.26. (a) If M and N are right-continuous L2 -martingales, then
M N − [M, N ] is a martingale.
(b) If M and N are right-continuous local L2 -martingales, then M N −
[M, N ] is a local martingale.
3.5. Doob-Meyer decomposition
Definition 3.29. For 0 < u < ∞, let Tu be the collection of stopping times
τ that satisfy τ ≤ u. A process X is of class DL if the random variables
{Xτ : τ ∈ Tu } are uniformly integrable for each 0 < u < ∞.
and integer k,
E[ M²_{kt} − M²_0 ] = ∑_{j=0}^{k−1} E[ M²_{(j+1)t} − M²_{jt} ] = ∑_{j=0}^{k−1} E[ (M_{(j+1)t} − M_{jt})² ].
With this proposition we can easily handle our two basic examples.
Example 3.34. For a standard Brownian motion ⟨B⟩_t = [B]_t = t. For a
compensated Poisson process M_t = N_t − αt,
⟨M⟩_t = t E[M²_1] = t E[(N_1 − α)²] = αt.
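A quick Monte Carlo check of the compensated Poisson example (a sketch assuming NumPy; it verifies the moments at one fixed time):

```python
import numpy as np

# Compensated Poisson process M_t = N_t - alpha*t at a fixed time t:
# E[M_t] = 0 and E[M_t^2] = <M>_t = alpha*t.
rng = np.random.default_rng(6)
alpha, t = 3.0, 2.0
N = rng.poisson(alpha * t, size=500_000)    # N_t ~ Poisson(alpha*t)
M = N - alpha * t
mean, msq = float(M.mean()), float((M ** 2).mean())
print(mean, msq)   # near 0 and near alpha*t = 6
```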
To achieve this, start with n₀ = 1, and assuming n_{k−1} has been chosen, pick
n_k > n_{k−1} so that
‖M^{(m)} − M^{(n)}‖_{M²} ≤ 2^{−3k}
for m, n ≥ n_k. Then for m ≥ n_k,
1 ∧ E[ (M^{(m)}_k − M^{(n_k)}_k)² ]^{1/2} ≤ 2^k ‖M^{(m)} − M^{(n_k)}‖_{M²} ≤ 2^{−2k},
and the minimum with 1 is superfluous since 2^{−2k} < 1. Substituting this
back into (3.22) with ε = 2^{−k} gives (3.23) with 2^{−2k} on the right-hand side.
By the Borel-Cantelli lemma, there exists an event Ω₁ with P(Ω₁) = 1
such that for ω ∈ Ω₁,
sup_{0≤t≤k} | M^{(n_{k+1})}_t(ω) − M^{(n_k)}_t(ω) | < 2^{−k}
for all but finitely many k's. It follows that the sequence of cadlag functions
t ↦ M^{(n_k)}_t(ω) is Cauchy in the uniform metric over any bounded time
interval [0, T]. By Lemma A.3 in the Appendix, for each T < ∞ there exists
a cadlag process {N^{(T)}_t(ω) : 0 ≤ t ≤ T} such that M^{(n_k)}_t(ω) converges to
N^{(T)}_t(ω) uniformly on the time interval [0, T], as k → ∞, for any ω ∈ Ω₁.
N^{(S)}_t(ω) and N^{(T)}_t(ω) must agree for t ∈ [0, S ∧ T], since both are limits of
the same sequence. Thus we can define one cadlag function t ↦ M_t(ω) on
R₊ for ω ∈ Ω₁, such that M^{(n_k)}_t(ω) converges to M_t(ω) uniformly on each
bounded time interval [0, T]. To have M defined on all of Ω, set M_t(ω) = 0
for ω ∉ Ω₁.
The event Ω₁ lies in F_t by the assumption of completeness of the filtration.
Since M^{(n_k)}_t → M_t on Ω₁ while M_t = 0 on Ω₁ᶜ, it follows that M_t is
F_t-measurable. The almost sure limit M_t and the L² limit Y_t of the sequence
{M^{(n_k)}_t} must coincide almost surely. Consequently (3.21) becomes
(3.24) E[1A Mt ] = E[1A Ms ]
for all A ∈ F_s and gives the martingale property for M. To summarize, M
is now a square-integrable cadlag martingale, in other words an element of
M². The final piece, namely ‖M^{(n)} − M‖_{M²} → 0, follows because we can
replace Y_t by M_t in (3.20) due to the almost sure equality M_t = Y_t.
If all M (n) are continuous martingales, the uniform convergence above
produces a continuous limit M . This shows that Mc2 is a closed subspace of
M2 under the metric dM2 .
The convergence defined by (3.25) for all T < ∞ and ε > 0 is called
uniform convergence in probability on compact intervals.
We shall write M2,loc for the space of cadlag local L2 -martingales with
respect to a given filtration {Ft } on a given probability space (Ω, F, P ). We
do not introduce a distance function on this space.
Exercises
Exercise 3.1. Let A be an increasing process, and φ : R₊ × Ω → R a
bounded B_{R₊} ⊗ F-measurable function. Let T < ∞. Show that
gφ(ω) = ∫_{(0,T]} φ(t, ω) dA_t(ω)
is an F-measurable function. Show also that, for any B_{R₊} ⊗ F-measurable
nonnegative function φ : R₊ × Ω → R₊,
gφ(ω) = ∫_{(0,∞)} φ(t, ω) dA_t(ω)
is an F-measurable function. The integrals are Lebesgue-Stieltjes integrals,
evaluated separately for each ω. The only point in separating the two cases
is that if φ takes both positive and negative values, the integral over the
entire interval [0, ∞) might not be defined.
Hint: One can start with φ(t, ω) = 1(a,b]×Γ (t, ω) for 0 ≤ a < b < ∞ and
Γ ∈ F. Then apply Theorem B.2 from the Appendix.
Exercise 3.2. Let N = {N (t) : 0 ≤ t < ∞} be a homogeneous rate α
Poisson process with respect to {Ft } and Mt = Nt − αt the compensated
Poisson process. We have seen that the quadratic variation is [M ]t = Nt
while hM it = αt. It follows that N cannot be a natural increasing process.
In this exercise you show that the naturalness condition fails for N .
(a) Let λ > 0. Show that
X(t) = exp{−λN (t) + αt(1 − e−λ )}
is a martingale.
(b) Show that N is not a natural increasing process, by showing that for
X defined above, the condition
E[ ∫_{(0,t]} X(s) dN(s) ] = E[ ∫_{(0,t]} X(s−) dN(s) ]
fails. (In case you protest that X is not a bounded martingale, fix T > t
and consider X(s ∧ T ).)
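For part (a), one can at least check numerically that E[X(t)] = X(0) = 1 at a fixed time (a sketch assuming NumPy; it verifies only the constant-expectation consequence of the martingale property):

```python
import numpy as np

# X(t) = exp(-lambda*N(t) + alpha*t*(1 - e^{-lambda})) with N a rate-alpha
# Poisson process: a martingale with X(0) = 1, so E[X(t)] should equal 1.
rng = np.random.default_rng(7)
alpha, lam, t = 2.0, 0.7, 1.5
N = rng.poisson(alpha * t, size=500_000)
X = np.exp(-lam * N + alpha * t * (1.0 - np.exp(-lam)))
print(float(X.mean()))   # close to 1
```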
Chapter 4
Stochastic Integral
with respect to
Brownian Motion
Lemma 4.1. Fix a number u ∈ [0, 1]. Given a partition π = {0 = t₀ < t₁ <
· · · < t_{m(π)} = t}, let s_i = (1 − u)t_i + u t_{i+1}, and define
S(π) = ∑_{i=0}^{m(π)−1} B_{s_i} ( B_{t_{i+1}} − B_{t_i} ).
Then
lim_{mesh(π)→0} S(π) = ½ B_t² − ½ t + ut   in L²(P).
and
Var[S₂(π)] = ∑_i Var[ (B_{s_i} − B_{t_i})² ] = 2 ∑_i (s_i − t_i)² ≤ 2 ∑_i (t_{i+1} − t_i)² ≤ 2t mesh(π).
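Lemma 4.1 can be illustrated by computing S(π) for two evaluation points on the same paths: u = 0 (left endpoints, the Itô choice) and u = 1/2 (midpoints). The sketch below (assuming NumPy) estimates the L²(P) errors against the respective limits, and the mean gap between the two sums, which should be ut = t/2:

```python
import numpy as np

# Riemann sums S(pi) with s_i = (1-u) t_i + u t_{i+1}: the L^2 limit
# 0.5*B_t^2 - 0.5*t + u*t depends on u. Simulate on a grid of mesh t/(2n)
# so that both partition points and midpoints are grid points.
rng = np.random.default_rng(8)
t, n, paths = 1.0, 1000, 5_000
dB = rng.normal(0.0, np.sqrt(t / (2 * n)), size=(paths, 2 * n))
B = np.concatenate([np.zeros((paths, 1)), np.cumsum(dB, axis=1)], axis=1)
Bt = B[:, -1]

P = B[:, ::2]                              # values at partition points t_i = i*t/n
incr = P[:, 1:] - P[:, :-1]                # increments B_{t_{i+1}} - B_{t_i}
S0 = np.sum(P[:, :-1] * incr, axis=1)      # u = 0: left endpoints
Sh = np.sum(B[:, 1::2] * incr, axis=1)     # u = 1/2: midpoints

err0 = float(np.mean((S0 - (0.5 * Bt**2 - 0.5 * t)) ** 2))   # L^2 error, u = 0
errh = float(np.mean((Sh - 0.5 * Bt**2) ** 2))               # L^2 error, u = 1/2
gap = float(np.mean(Sh - S0))                                # expected gap u*t = t/2
print(err0, errh, gap)
```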
Let L²(B) denote the collection of all measurable, adapted processes X such
that
‖X‖_{L²([0,T]×Ω)} < ∞
for all T < ∞. A metric on L²(B) is defined by d_{L²}(X, Y) = ‖X − Y‖_{L²(B)}
where
(4.2)   ‖X‖_{L²(B)} = ∑_{k=1}^{∞} 2^{−k} ( 1 ∧ ‖X‖_{L²([0,k]×Ω)} ).
The triangle inequality
‖X + Y‖_{L²(B)} ≤ ‖X‖_{L²(B)} + ‖Y‖_{L²(B)}
is valid, and this gives the triangle inequality
d_{L²}(X, Y) ≤ d_{L²}(X, Z) + d_{L²}(Z, Y)
required for d_{L²} to be a genuine metric.
To have a metric, one also needs the property dL2 (X, Y ) = 0 iff X =
Y . We have to adopt the point of view that two processes X and Y are
considered “equal” if the set of points (t, ω) where X(t, ω) 6= Y (t, ω) has
m ⊗ P -measure zero. Equivalently,
(4.3)   ∫₀^∞ P{ X(t) ≠ Y(t) } dt = 0.
This will be the class of processes X for which the stochastic integral process
with respect to Brownian motion, denoted by
(X · B)_t = ∫₀ᵗ X_s dB_s,
is constructed. The building blocks are simple predictable processes of the form
(4.5)   X_t(ω) = ξ₀(ω) 1_{{0}}(t) + ∑_{i=1}^{n−1} ξ_i(ω) 1_{(t_i, t_{i+1}]}(t)
where n is a finite integer, 0 = t₀ = t₁ < t₂ < · · · < t_n are time points, and for
0 ≤ i ≤ n − 1, ξ_i is a bounded F_{t_i}-measurable random variable on (Ω, F, P).
Predictability refers to the fact that the value Xt can be “predicted” from
{Xs : s < t}. Here this point is rather simple because X is left-continuous
so Xt = lim Xs as s % t. In the next chapter we need to deal seriously with
the notion of predictability. Here it is not really needed, and we use the
term only to be consistent with what comes later. The value ξ0 at t = 0
is irrelevant both for the stochastic integral of X and for approximating
general processes. We include it so that the value X(0, ω) is not artificially
restricted.
Proof. We begin by showing that, given T < ∞, we can find simple pre-
dictable processes Y_k^{(T)} that vanish outside [0, T] and satisfy
(4.6)   lim_{k→∞} E[ ∫₀ᵀ ( Y_k^{(T)}(t) − X(t) )² dt ] = 0.
To prove this, start by ignoring the expectation and rewrite the double
integral as follows:
∫₀ᵀ dt ∫₀¹ ds ( Z^{n,s}(t) − X(t) )²
= ∫₀ᵀ dt ∑_{j∈Z} ∫₀¹ ds ( X(s + 2⁻ⁿj, ω) − X(t, ω) )² 1_{(s+2⁻ⁿj, s+2⁻ⁿ(j+1)]}(t)
= ∫₀ᵀ dt ∑_{j∈Z} ∫₀¹ ds ( X(s + 2⁻ⁿj, ω) − X(t, ω) )² 1_{[t−2⁻ⁿ(j+1), t−2⁻ⁿj)}(s).
To complete the proof, create the simple predictable processes {Y_k^{(m)}}
for all T = m ∈ N. For each m, pick k_m such that
E[ ∫₀ᵐ ( Y_{k_m}^{(m)}(t) − X(t) )² dt ] < 1/m.
Then X_m = Y_{k_m}^{(m)} satisfies the requirement of the lemma.
Proof. Let X^{(k)} = (X ∧ k) ∨ (−k). Since |X^{(k)} − X| ≤ |X| and |X^{(k)} − X| → 0
pointwise on R₊ × Ω,
lim_{k→∞} E[ ∫₀ᵐ ( X^{(k)}(t) − X(t) )² dt ] = 0
The integral of the simple predictable process X is defined by
(4.8)   (X · B)_t(ω) = ∑_{i=1}^{n−1} ξ_i(ω) ( B_{t_{i+1}∧t}(ω) − B_{t_i∧t}(ω) ).
Note that our convention is such that the value of X at t = 0 does not
influence the integral. We also write I(X) = X · B when we need a symbol
for the mapping I : X 7→ X · B.
Let S2 denote the space of simple predictable processes. It is a subspace
of L2 (B). An element X of S2 can be represented in the form (4.5) in many
different ways. We need to check that the integral X · B depends only on
the process X and not on the particular representation. Also, we need to
know that S2 is a linear space, and that the integral I(X) is a linear map
on S2 .
Lemma 4.4. (a) Suppose the process X in (4.5) also satisfies
X_t(ω) = η₀(ω) 1_{{0}}(t) + ∑_{j=1}^{m−1} η_j(ω) 1_{(s_j, s_{j+1}]}(t)
for all (t, ω), where 0 = s₀ = s₁ < s₂ < · · · < s_m < ∞ and η_j is F_{s_j}-
measurable for 0 ≤ j ≤ m − 1. Then for each (t, ω),
∑_{i=1}^{n−1} ξ_i(ω) ( B_{t_{i+1}∧t}(ω) − B_{t_i∧t}(ω) ) = ∑_{j=1}^{m−1} η_j(ω) ( B_{s_{j+1}∧t}(ω) − B_{s_j∧t}(ω) ).
Hints for proof. Let {uk } = {sj } ∪ {ti } be the common refinement of
the partitions {sj } and {ti }. Rewrite both representations of X in terms
of {uk }. The same idea can be used for part (b) to write two arbitrary
simple processes in terms of a common partition, which makes adding them
easy.
Next we need some continuity properties for the integral. Recall the
distance measure k · kM2 defined for continuous L2 -martingales by (3.18).
Lemma 4.5. Let X ∈ S2 . Then X · B is a continuous square-integrable
martingale. We have these isometries:
(4.9)   E[ (X · B)_t² ] = E[ ∫₀ᵗ X_s² ds ]   for all t ≥ 0,
and
(4.10)   ‖X · B‖_{M²} = ‖X‖_{L²(B)}.
From the isometry property we can deduce that simple process approx-
imation gives approximation of stochastic integrals.
Lemma 4.6. Let X ∈ L²(B). Then there is a unique continuous L²-
martingale Y such that, for any sequence of simple predictable processes
{X_n} such that
‖X − X_n‖_{L²(B)} → 0,
we have
‖Y − X_n · B‖_{M²} → 0.
Hints for proof. It all follows from these facts: an approximating sequence
of simple predictable processes exists for each process in L2 (B), a convergent
sequence in a metric space is a Cauchy sequence, a Cauchy sequence in a
complete metric space converges, the space Mc2 of continuous L2 -martingales
is complete, the isometry (4.10), and the triangle inequality.
The reader familiar with more abstract principles of analysis should note
that the extension of the stochastic integral X ·B from X ∈ S2 to X ∈ L2 (B)
is an instance of a general, classic argument. A uniformly continuous map
from a metric space into a complete metric space can always be extended
to the closure of its domain. If the spaces are linear, the linear operations
are continuous, and the map is linear, then the extension is a linear map
too. In this case the map is X 7→ X · B, first defined for X ∈ S2 . Uniform
continuity follows from linearity and (4.10). Proposition 4.3 implies that the
closure of S2 in L2 (B) is all of L2 (B).
Some books first define the integral (X · B)t at a fixed time t as a map
from L2 ([0, t]×Ω, m⊗P ) into L2 (P ), utilizing the completeness of L2 -spaces.
Then one needs a separate argument to show that the integrals defined for
different times t can be combined into a continuous martingale t 7→ (X · B)t .
We defined the integral directly as a map into the space of martingales Mc2
to avoid the extra argument. Of course, we did not really save work. We
just did part of the work earlier when we proved that Mc2 is a complete
space (Lemma 3.35).
Example 4.8. In the definition (4.5) of the simple predictable process we
required the ξi bounded because this will be convenient later. For this
section it would have been more convenient to allow square-integrable ξi .
So let us derive the integral for that case. Let
X(t) = ∑_{i=1}^{m−1} η_i 1_{(s_i, s_{i+1}]}(t)
where 0 ≤ s1 < · · · < sm and each ηi ∈ L2 (P ) is Fsi -measurable. Check
that a sequence of approximating simple processes is given by
X_k(t) = ∑_{i=1}^{m−1} η_i^{(k)} 1_{(s_i, s_{i+1}]}(t)
with truncated variables η_i^{(k)} = (η_i ∧ k) ∨ (−k). And then that
∫₀ᵗ X(s) dB_s = ∑_{i=1}^{m−1} η_i ( B_{t∧s_{i+1}} − B_{t∧s_i} ).
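This formula can be combined with the isometry (4.9) in a simulation. The sketch below (assuming NumPy) takes η_i = B_{s_i}, which is F_{s_i}-measurable and square-integrable, computes the integral pathwise, and compares E[(X·B)_t²] with E ∫₀ᵗ X_s² ds; the partition values are our own illustrative choices, not the text's.

```python
import numpy as np

# Integral of the simple process X = sum_i eta_i 1_{(s_i, s_{i+1}]} with
# eta_i = B_{s_i}, evaluated at t. The isometry predicts
#   E[(X.B)_t^2] = sum_i E[eta_i^2] * |(s_i, s_{i+1}] ∩ [0, t]|,
# and here E[eta_i^2] = s_i.
rng = np.random.default_rng(9)
paths, t = 400_000, 1.2
s = [0.2, 0.6, 1.0, 1.4]                       # s_1 < s_2 < s_3 < s_4
times = [s[0], s[1], s[2], min(s[3], t)]       # s_1, s_2, s_3, t ∧ s_4
dts = np.diff([0.0] + times)
B = np.cumsum(rng.normal(0.0, np.sqrt(dts), size=(paths, 4)), axis=1)

eta = B[:, :3]                                 # B at s_1, s_2, s_3
incr = B[:, 1:] - B[:, :-1]                    # B_{t∧s_{i+1}} - B_{t∧s_i}
I = np.sum(eta * incr, axis=1)                 # (X.B)_t path by path

lhs = float(np.mean(I ** 2))
rhs = 0.2 * 0.4 + 0.6 * 0.4 + 1.0 * 0.2        # = 0.52 from the isometry
print(float(I.mean()), lhs, rhs)               # mean near 0, lhs near rhs
```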
= ½ ∑_{i=0}^{n2ⁿ−1} ( t^n_{i+1} − t^n_i )² = ½ n 2⁻ⁿ.
Thus X_n converges to B in L²(B) as n → ∞. By Example 4.8
∫₀ᵗ X_n(s) dB_s = ∑_{i=1}^{n2ⁿ−1} B_{t^n_i} ( B_{t∧t^n_{i+1}} − B_{t∧t^n_i} ).
By the isometry (4.12) in the next Proposition, this integral converges to
∫₀ᵗ B_s dB_s in L² as n → ∞, so by Lemma 4.1,
∫₀ᵗ B_s dB_s = ½ B_t² − ½ t.
Hints for proof. Parts (a)–(b): These properties are inherited from the
integrals of the approximating simple processes Xn . One needs to justify
taking limits in Lemma 4.4(b) and Lemma 4.5.
The proof of part (c) is different from the one that is used in the next
chapter. So we give here more details than in previous proofs.
By considering Z = X −Y , it suffices to prove that if Z ∈ L2 (B) satisfies
Z(t, ω) = 0 for t ≤ τ (ω), then (Z · B)t (ω) = 0 for t ≤ τ (ω).
Assume first that Z is bounded, so |Z(t, ω)| ≤ C. Pick a sequence {Zn }
of simple predictable processes that converge to Z in L2 (B). Let Zn be of
the generic type (recall (4.5))
Z_n(t, ω) = ∑_{i=1}^{m(n)−1} ξ^n_i(ω) 1_{(t^n_i, t^n_{i+1}]}(t).
Estimate
| Z_n(t) 1{τ < t} − Z̃_n(t) | ≤ C ∑_i | 1{τ < t} − 1{τ ≤ t^n_i} | 1_{(t^n_i, t^n_{i+1}]}(t) ≤ C ∑_i 1{ t^n_i < τ < t^n_{i+1} } 1_{(t^n_i, t^n_{i+1}]}(t).
We can artificially add partition points tni to each Zn so that this last quan-
tity converges to 0 as n → ∞, for each fixed T . This verifies (4.14), and
thereby (4.13).
The integral of Z̃_n is given explicitly by
(Z̃_n · B)_t = ∑_{i=1}^{m(n)−1} ξ^n_i 1{τ ≤ t^n_i} ( B_{t∧t^n_{i+1}} − B_{t∧t^n_i} ).
By inspecting each term, we see that (Z̃_n · B)_t = 0 if t ≤ τ. By the definition
of the integral and (4.13), Z_n · B → Z · B in M²_c. Then by Lemma 3.36
there exists a subsequence {Z̃_{n_k} · B} and an event Ω₀ of full probability such
that, for each ω ∈ Ω₀ and T < ∞,
(Z̃_{n_k} · B)_t(ω) → (Z · B)_t(ω)   uniformly for 0 ≤ t ≤ T.
For any ω ∈ Ω₀, in the k → ∞ limit (Z · B)_t(ω) = 0 for t ≤ τ(ω). Part (c)
has been proved for a bounded process.
To complete the proof, given Z ∈ L2 (B), let Z (k) (t, ω) = (Z(t, ω) ∧ k) ∨
(−k), a bounded process in L2 (B) with the same property Z (k) (t, ω) = 0 if
t ≤ τ (ω). Apply the previous step to Z (k) and justify what happens in the
limit.
Lemma 4.11. For almost every ω, if t ≤ τm (ω)∧τn (ω) then (Xm ·B)t (ω) =
(Xn · B)t (ω).
The lemma says that, for a given (t, ω), once n is large enough so that
τn (ω) ≥ t, the value (Xn · B)t (ω) does not change with n. The definition
(4.4) guarantees that τn (ω) % ∞ for almost every ω. These ingredients
almost justify the next extension of the stochastic integral to L(B).
Definition 4.12. Let B be a Brownian motion on a probability space
(Ω, F, P ) with respect to a filtration {Ft }, and X ∈ L(B). Let Ω0 be
the event of full probability on which τn % ∞ and the conclusion of Lemma
4.11 holds. The stochastic integral X · B is defined for ω ∈ Ω0 by
(4.16) (X · B)t (ω) = (Xn · B)t (ω) for any n such that τn (ω) ≥ t.
For ω ∉ Ω₀ define (X · B)_t(ω) ≡ 0. The process X · B is a continuous local
L²-martingale.
One can check that X ∈ L(B) if and only if X has a localizing sequence
{σn }. Lemma 4.11 and Definition 4.12 work equally well with {τn } replaced
by an arbitrary localizing sequence {σn }. Fix such a sequence {σn } and
define X̃_n(t) = 1{t ≤ σ_n} X(t). Let Ω₁ be the event of full probability on
which σ_n ↗ ∞ and, for all pairs m, n, (X̃_m · B)_t = (X̃_n · B)_t for t ≤ σ_m ∧ σ_n.
(In other words, the conclusion of Lemma 4.11 holds for {σn }.) Let Y be
the process defined by
(4.17)   Y_t(ω) = (X̃_n · B)_t(ω) for any n such that σ_n(ω) ≥ t,
for ω ∈ Ω1 , and identically zero outside Ω1 .
This lemma tells us that for X ∈ L(B) the stochastic integral X · B can
be defined in terms of any localizing sequence of stopping times.
Exercises
Exercise 4.1. Show that for any [0, ∞]-valued measurable function Y on
(Ω, F), the set {(s, ω) ∈ R+ × Ω : Y (ω) > s} is BR+ ⊗ F-measurable.
Hint. Start with a simple Y. Show that if Y_n ↗ Y pointwise, then
{(s, ω) : Y(ω) > s} = ⋃_n {(s, ω) : Y_n(ω) > s}.
Exercise 4.2. Suppose η ∈ L²(P) is F_s-measurable and t > s. Show that
E[ η² (B_t − B_s)² ] = E[η²] · E[ (B_t − B_s)² ].
Chapter 5

Stochastic Integration
of Predictable
Processes
The main goal of this chapter is the definition of the stochastic integral
∫_{(0,t]} X(s) dY(s) where the integrator Y is a cadlag semimartingale and X
is a locally bounded predictable process. The most important special case
is the one where the integrand is of the form X(t−) for some cadlag process
X. In this case the stochastic integral ∫_{(0,t]} X(s−) dY(s) can be realized as
the limit of Riemann sums
S_n(t) = ∑_{i=0}^{∞} X(s_i) ( Y(s_{i+1} ∧ t) − Y(s_i ∧ t) )
when the mesh of the partition {si } tends to zero. The convergence is then
uniform on compact time intervals, and happens in probability. Random
partitions of stopping times can also be used.
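For intuition, the convergence of these Riemann sums can be observed numerically. The sketch below (an illustration, not part of the text) simulates one Brownian path B and compares left-endpoint sums of ∫_0^T B dB over coarser sub-grids with the closed form (B_T² − T)/2 known from Itô calculus.

```python
import numpy as np

# Simulate one Brownian path on a fine dyadic grid over [0, T].
rng = np.random.default_rng(0)
T, n = 1.0, 2**16
dt = T / n
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

def left_sum(path, stride):
    # Left-endpoint Riemann sum  sum_i X(s_i)(Y(s_{i+1}) - Y(s_i))
    # with X = Y = path, over the sub-grid with the given stride.
    pts = path[::stride]
    return float(np.sum(pts[:-1] * np.diff(pts)))

# The sums should approach the Ito integral (B_T^2 - T)/2 as the mesh shrinks.
exact = 0.5 * (B[-1] ** 2 - T)
errors = [abs(left_sum(B, 2**k) - exact) for k in (12, 8, 4, 0)]
```

On typical paths the errors shrink as the mesh decreases; as the text notes, the convergence is in probability, not pathwise for every partition sequence.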
These results will be reached in Section 5.3. Before the semimartingale
integral we explain predictable processes and construct the integral with
respect to L2 -martingales and local L2 -martingales. Right-continuity of the
filtration {Ft } is not needed until we define the integral with respect to a
semimartingale. And even there it is needed only for guaranteeing that the
semimartingale has a decomposition whose local martingale part is a local
L2 -martingale. Right-continuity of {Ft } is not needed for the arguments
that establish the integral.
108 5. Stochastic Integral
Xn(t, ω) = X0(ω) 1{0}(t) + Σ_{i=0}^∞ X_{i2^{−n}}(ω) 1_{(i2^{−n}, (i+1)2^{−n}]}(t).
5.1. Square-integrable martingale integrator 109
Then for B ∈ B_R,

{(t, ω) : Xn(t, ω) ∈ B} = ( {0} × {ω : X0(ω) ∈ B} ) ∪ ⋃_{i=0}^∞ ( (i2^{−n}, (i+1)2^{−n}] × {ω : X_{i2^{−n}}(ω) ∈ B} )

which is an event in P, being a countable union of predictable rectangles.
Thus Xn is P-measurable. By left-continuity of X, Xn (t, ω) → X(t, ω) as
n → ∞ for each fixed (t, ω). Since pointwise limits preserve measurability,
X is also P-measurable.
We have shown that P contains σ-fields (a)–(c).
The indicator of a predictable rectangle is itself an adapted caglad pro-
cess, and by definition this subclass of caglad processes generates P. Thus
σ-field (c) contains P. By the same reasoning, also σ-field (b) contains P.
It remains to show that σ-field (a) contains P. We show that all pre-
dictable rectangles lie in σ-field (a) by showing that their indicator functions
are pointwise limits of continuous adapted processes.
If X = 1{0}×F0 for F0 ∈ F0 , let
gn(t) = { 1 − nt, 0 ≤ t < 1/n;  0, t ≥ 1/n },

and then define Xn(t, ω) = 1_{F0}(ω) gn(t). Xn is clearly continuous. For a fixed t, writing

Xn(t) = { gn(t) 1_{F0}, 0 ≤ t < 1/n;  0, t ≥ 1/n },
and noting that F0 ∈ Ft for all t ≥ 0, shows that Xn is adapted. Since
Xn (t, ω) → X(t, ω) as n → ∞ for each fixed (t, ω), {0} × F0 lies in σ-field
(a).
If X = 1_{(u,v]×F} for F ∈ Fu, let

hn(t) = { n(t − u), u ≤ t < u + 1/n;  1, u + 1/n ≤ t < v;  1 − n(t − v), v ≤ t ≤ v + 1/n;  0, t < u or t > v + 1/n }.
Consider only n large enough so that 1/n < v − u. Define Xn (t, ω) =
1F (ω)hn (t), and adapt the previous argument. We leave the missing details
as Exercise 5.3.
The previous lemma tells us that all continuous adapted processes, all
left-continuous adapted processes, and any process that is a pointwise limit
The meaning of formula (5.1) is that first, for each fixed ω, the function
t 7→ 1A (t, ω) is integrated by the Lebesgue-Stieltjes measure Λ[M ](ω) of the
nondecreasing right-continuous function t 7→ [M ]t (ω). The resulting integral
is a measurable function of ω, which is then averaged over the probability
space (Ω, F, P) (Exercise 3.1). Recall that our convention for the measure Λ_{[M](ω)} of the origin is Λ_{[M](ω)}{0} = 0.
For predictable processes X, we define the L2 norm over the set [0, T] × Ω under the measure µM by

(5.3)    ‖X‖_{µM,T} = ( ∫_{[0,T]×Ω} |X|² dµM )^{1/2} = ( E ∫_{[0,T]} |X(t, ω)|² d[M]_t(ω) )^{1/2}.
such that η_i^N(ω) → ξ_i(ω) as N → ∞. Here β_j^{i,N} are constants and F_j^{i,N} ∈ F_{t_i}. Adding these up, we have that

X_t(ω) = lim_{N→∞} [ η_0^N(ω) 1{0}(t) + Σ_{i=1}^{n−1} η_i^N(ω) 1_{(t_i, t_{i+1}]}(t) ]
       = lim_{N→∞} [ Σ_{j=1}^{m(0,N)} β_j^{0,N} 1_{{0}×F_j^{0,N}}(t, ω) + Σ_{i=1}^{n−1} Σ_{j=1}^{m(i,N)} β_j^{i,N} 1_{(t_i, t_{i+1}]×F_j^{i,N}}(t, ω) ].
Definition 5.6. For a simple predictable process of the type (5.6), the
stochastic integral is the process X · M defined by
(5.7)    (X · M)_t(ω) = Σ_{i=1}^{n−1} ξ_i(ω) ( M_{t_{i+1}∧t}(ω) − M_{t_i∧t}(ω) ).
Note that our convention is such that the value of X at t = 0 does not
influence the integral. We also write I(X) = X · M when we need a symbol
for the mapping I : X 7→ X · M .
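Formula (5.7) is just a finite sum along each path, so it can be evaluated directly. The helper below is a hypothetical one written for this sketch (not part of the text): it computes (X · M)_t for one path of M, given the values ξ_i and the partition points.

```python
def simple_integral(xi, t_pts, M, t):
    # (X·M)_t = sum_i xi_i ( M(t_{i+1} ∧ t) - M(t_i ∧ t) ) for the simple
    # predictable process X = sum_i xi_i 1_{(t_i, t_{i+1}]}, evaluated along
    # one path, where M is the path as a function of time.
    total = 0.0
    for i in range(len(t_pts) - 1):
        total += xi[i] * (M(min(t_pts[i + 1], t)) - M(min(t_pts[i], t)))
    return total

# One integrand step of height 2 on (0, 1], evaluated at t = 0.5 along the
# deterministic path M(s) = s: the sum picks up 2 * (0.5 - 0) = 1.
value = simple_integral([2.0], [0.0, 1.0], lambda s: s, 0.5)
```

Note that the value of X at t = 0 never enters the sum, matching the convention above.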
Let S2 denote the subspace of L2 consisting of simple predictable pro-
cesses. Any particular element X of S2 can be represented in the form (5.6)
with many different choices of random variables and time intervals. The
first thing to check is that the integral X · M depends only on the process X
and not on the particular representation (5.6) used. Also, let us check that
the space S2 is a linear space and the integral behaves linearly, since these
properties are not immediately clear from the definitions.
for all (t, ω), where 0 = s0 = s1 < s2 < · · · < sm < ∞ and ηj is F_{s_j}-measurable for 0 ≤ j ≤ m − 1. Then for each (t, ω),

Σ_{i=1}^{n−1} ξ_i(ω) ( M_{t_{i+1}∧t}(ω) − M_{t_i∧t}(ω) ) = Σ_{j=1}^{m−1} η_j(ω) ( M_{s_{j+1}∧t}(ω) − M_{s_j∧t}(ω) ).
For t ∈ (uk , uk+1 ], Xt (ω) = ξi (ω) and Xt (ω) = ηj (ω). So for these particular
i and j, ξi = ηj .
The proof now follows from a reordering of the sums for the stochastic
integrals.
Σ_{i=1}^{n−1} ξ_i (M_{t_{i+1}∧t} − M_{t_i∧t})
  = Σ_{i=1}^{n−1} Σ_{k=1}^{p−1} ξ_i (M_{u_{k+1}∧t} − M_{u_k∧t}) 1{(u_k, u_{k+1}] ⊆ (t_i, t_{i+1}]}
  = Σ_{k=1}^{p−1} (M_{u_{k+1}∧t} − M_{u_k∧t}) Σ_{i=1}^{n−1} ξ_i 1{(u_k, u_{k+1}] ⊆ (t_i, t_{i+1}]}
  = Σ_{k=1}^{p−1} (M_{u_{k+1}∧t} − M_{u_k∧t}) Σ_{j=1}^{m−1} η_j 1{(u_k, u_{k+1}] ⊆ (s_j, s_{j+1}]}
  = Σ_{j=1}^{m−1} η_j Σ_{k=1}^{p−1} (M_{u_{k+1}∧t} − M_{u_k∧t}) 1{(u_k, u_{k+1}] ⊆ (s_j, s_{j+1}]}
  = Σ_{j=1}^{m−1} η_j (M_{s_{j+1}∧t} − M_{s_j∧t}).
The representation

αX_t + βY_t = (αξ_0 + βη_0) 1{0}(t) + Σ_{k=1}^{p−1} (αρ_k + βζ_k) 1_{(u_k, u_{k+1}]}(t)
and
Proof. The cadlag property for each fixed ω is clear from the definition of
X · M , as is the continuity if M is continuous to begin with.
Linear combinations of martingales are martingales. So to prove that
X · M is a martingale it suffices to check this statement: if M is a mar-
tingale, u < v and ξ is a bounded Fu -measurable random variable, then
Zt = ξ(Mt∧v − Mt∧u ) is a martingale. The boundedness of ξ and integrabil-
ity of M guarantee integrability of Zt . Take s < t.
First, if s < u, then
E[Zt | Fs] = E[ ξ(M_{t∧v} − M_{t∧u}) | Fs ]
           = E[ ξ E{ M_{t∧v} − M_{t∧u} | Fu } | Fs ]
           = 0 = Zs
because Mt∧v − Mt∧u = 0 for t ≤ u, and for t > u the martingale property
of M gives
We claim that each term of the last sum has zero expectation. Since i < j,
ti+1 ≤ tj and both ξi and ξj are Ftj -measurable.
E[ ξ_i ξ_j (M_{t∧t_{i+1}} − M_{t∧t_i})(M_{t∧t_{j+1}} − M_{t∧t_j}) ]
  = E[ ξ_i ξ_j (M_{t∧t_{i+1}} − M_{t∧t_i}) E{ M_{t∧t_{j+1}} − M_{t∧t_j} | F_{t_j} } ] = 0
E[(X · M)_t²] = Σ_{i=1}^{n−1} E[ ξ_i² (M_{t∧t_{i+1}} − M_{t∧t_i})² ]
  = Σ_{i=1}^{n−1} E[ ξ_i² E{ (M_{t∧t_{i+1}} − M_{t∧t_i})² | F_{t_i} } ]
  = Σ_{i=1}^{n−1} E[ ξ_i² E{ M²_{t∧t_{i+1}} − M²_{t∧t_i} | F_{t_i} } ]
  = Σ_{i=1}^{n−1} E[ ξ_i² E{ [M]_{t∧t_{i+1}} − [M]_{t∧t_i} | F_{t_i} } ]
  = Σ_{i=1}^{n−1} E[ ξ_i² ( [M]_{t∧t_{i+1}} − [M]_{t∧t_i} ) ]
  = Σ_{i=1}^{n−1} E[ ξ_i² ∫_{[0,t]} 1_{(t_i, t_{i+1}]}(s) d[M]_s ]
  = E ∫_{[0,t]} ( ξ_0² 1{0}(s) + Σ_{i=1}^{n−1} ξ_i² 1_{(t_i, t_{i+1}]}(s) ) d[M]_s
  = E ∫_{[0,t]} ( ξ_0 1{0}(s) + Σ_{i=1}^{n−1} ξ_i 1_{(t_i, t_{i+1}]}(s) )² d[M]_s
  = ∫_{[0,t]×Ω} X² dµM.
In the third last equality we added the term ξ02 1{0} (s) inside the d[M ]s -
integral because this term integrates to zero (recall that Λ[M ] {0} = 0). In
the second last equality we used the equality
n−1
X n−1
X 2
2 2
ξ0 1{0} (s) + ξi 1(ti ,ti+1 ] (s) = ξ0 1{0} (s) + ξi 1(ti ,ti+1 ] (s)
i=1 i=1
which is true due to the pairwise disjointness of the time intervals.
The above calculation checks that
‖(X · M)_t‖_{L²(P)} = ‖X‖_{µM,t}
for any t > 0. Comparison of formulas (3.18) and (5.4) then proves (5.9).
Let us summarize the message of Lemmas 5.7 and 5.8 in words. The
stochastic integral I : X 7→ X · M is a linear map from the space S2 of
predictable simple processes into M2 . Equality (5.9) says that this map is
a linear isometry that maps from the subspace (S2 , dL2 ) of the metric space
(L2 , dL2 ), and into the metric space (M2 , dM2 ). In case M is continuous,
the map goes into the space (Mc2 , dM2 ).
A consequence of (5.9) is that if X and Y satisfy (5.5) then X · M and
Y · M are indistinguishable. For example, we may have Yt = Xt + ζ1{t = 0}
for a bounded F0 -measurable random variable ζ. Then the integrals X · M
and Y · M are indistinguishable, in other words the same process. This is
no different from the analytic fact that changing the value of a function f
on [a, b] at a single point (or even at countably many points) does not affect the value of the integral ∫_a^b f(x) dx.
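The isometry can also be checked by simulation. The sketch below (an illustration under stated assumptions, not from the text) uses the simple random walk martingale M with ±1 increments, for which [M]_k = k, and the predictable integrand X_k = M_{k−1}; both sides of the isometry then equal Σ_{k=1}^{n} (k − 1) = 10 for n = 5 steps.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_paths = 5, 200_000

# ±1 increments make M a martingale with [M]_k = k (each squared jump is 1).
dM = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
M = np.cumsum(dM, axis=1)

# Predictable integrand X_k = M_{k-1} (known one step ahead of the increment).
X = np.concatenate([np.zeros((n_paths, 1)), M[:, :-1]], axis=1)

integral = np.sum(X * dM, axis=1)             # (X · M)_n, path by path
lhs = float(np.mean(integral ** 2))           # E[(X · M)_n^2]
rhs = float(np.mean(np.sum(X ** 2, axis=1)))  # E[sum_k X_k^2 ([M]_k - [M]_{k-1})]
```

With 200,000 paths both Monte Carlo estimates land close to 10, consistent with the isometry; the agreement is statistical, not exact.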
We come to the approximation step.
Lemma 5.9. For any X ∈ L2 there exists a sequence Xn ∈ S2 such that ‖X − Xn‖_{L²} → 0.
Proof. Let L̃2 denote the class of X ∈ L2 for which this approximation is possible. Of course S2 itself is a subset of L̃2.
Indicator functions of time-bounded predictable rectangles are of the
form
1{0}×F0 (t, ω) = 1F0 (ω)1{0} (t),
or
1(u,v]×F (t, ω) = 1F (ω)1(u,v] (t),
for F0 ∈ F0 , 0 ≤ u < v < ∞, and F ∈ Fu . They are elements of S2 due to
(5.2). Furthermore, since S2 is a linear space, it contains all simple functions
of the form

(5.10)    X(t, ω) = Σ_{i=0}^n c_i 1_{R_i}(t, ω)
where {ci } are finite constants and {Ri } are time-bounded predictable rect-
angles.
Now we do the actual approximation of predictable processes, beginning
with constant multiples of indicator functions of predictable sets.
lie in L̃2.
Consequently

lim sup_{m→∞} ‖X − Xm‖_{L²} ≤ Σ_{k=1}^{n} 2^{−k} lim sup_{m→∞} ‖X − Xm‖_{µM,k} + ε/3 = ε/3.

Fix m large enough so that ‖X − Xm‖_{L²} ≤ ε/2. Using Step 1 find a process Z ∈ S2 such that ‖Xm − Z‖_{L²} < ε/2. Then by the triangle inequality ‖X − Z‖_{L²} ≤ ε. We have shown that an arbitrary process X ∈ L2 can be approximated by simple predictable processes in the L2-distance.
if and only if
E[(N_t^{(j)} − N_t)²] → 0
for each t ≥ 0.
In particular, at each time t ≥ 0 the integral (X · M )t is the mean-square
limit of the integrals (Xn · M )t of approximating simple processes. These
observations are used in the extension of the isometric property of the inte-
gral.
Proposition 5.11. Let M ∈ M2 and X ∈ L2 (M, P). Then we have the
isometries
(5.12)    E[(X · M)_t²] = ∫_{[0,t]×Ω} X² dµM    for all t ≥ 0,

and

(5.13)    ‖X · M‖_{M²} = ‖X‖_{L²}.
In particular, if X, Y ∈ L2 (M, P) are µM -equivalent in the sense (5.5), then
X · M and Y · M are indistinguishable.
Proof. As already observed, the triangle inequality is valid for the distance
measures ‖·‖_{L²} and ‖·‖_{M²}. From this we get a continuity property. Let Z, W ∈ L2. Then

‖Z‖_{L²} − ‖W‖_{L²} ≤ ( ‖Z − W‖_{L²} + ‖W‖_{L²} ) − ‖W‖_{L²} = ‖Z − W‖_{L²}.

This and the same inequality with Z and W switched give

(5.14)    | ‖Z‖_{L²} − ‖W‖_{L²} | ≤ ‖Z − W‖_{L²}.

This same calculation applies to ‖·‖_{M²} also, and of course equally well to the L2 norms on Ω and [0, T] × Ω.
Let Xn ∈ S2 be a sequence such that ‖Xn − X‖_{L²} → 0. As we proved in Lemma 5.8, the isometries hold for Xn ∈ S2. Consequently to prove the proposition we need only let n → ∞ in the equalities

E[(Xn · M)_t²] = ∫_{[0,t]×Ω} Xn² dµM

and

‖Xn · M‖_{M²} = ‖Xn‖_{L²}
that come from Lemma 5.8. Each term converges to the corresponding term
with Xn replaced by X.
The last statement of the proposition follows because ‖X − Y‖_{L²} = 0 iff X and Y are µM-equivalent, and ‖X · M − Y · M‖_{M²} = 0 iff X · M and Y · M are indistinguishable.
for each ε > 0 and T < ∞. By the Borel-Cantelli lemma, along some
subsequence {nj } there is almost sure convergence uniformly on compact
time intervals: for P -almost every ω
It is more accurate to use the interval (0, t] above rather than [0, t] because
the integral does not take into consideration any jump of the martingale at
the origin. Precisely, if ζ is an F0-measurable random variable and M̃_t = ζ + M_t, then [M̃] = [M], the spaces L2(M̃, P) and L2(M, P) coincide, and X · M̃ = X · M for each admissible integrand.
An integral of the type

∫_{(u,v]} G(s, ω) d[M]_s(ω)
and
∫_{(0,t]} 1_{(u,v]} X dM = (X · M)_{v∧t} − (X · M)_{u∧t}

(5.19)                   = ∫_{(u∧t, v∧t]} X dM.
The inclusion or exclusion of the origin in the interval [0, v] is immaterial be-
cause a process of the type 1{0} (t)X(t, ω) for X ∈ L2 (M, P) is µM -equivalent
to the identically zero process, and hence has zero stochastic integral.
(c) For s < t, we have a conditional form of the isometry:

(5.20)    E[ ( (X · M)_t − (X · M)_s )² | Fs ] = E[ ∫_{(s,t]} X_u² d[M]_u | Fs ].
Part (c). First we check this for the simple process Xn in (5.16). This
is essentially a redoing of the calculations in the proof of Lemma 5.8. Let
s < t. If s ≥ sk then both sides of (5.20) are zero. Otherwise, fix an index 1 ≤ m ≤ k − 1 such that tm ≤ s < tm+1. Then

(Xn · M)_t − (Xn · M)_s = ξ_m (M_{t_{m+1}∧t} − M_s) + Σ_{i=m+1}^{k−1} ξ_i (M_{t_{i+1}∧t} − M_{t_i∧t})
                        = Σ_{i=m}^{k−1} ξ_i (M_{u_{i+1}∧t} − M_{u_i∧t}),

where u_m = s and u_i = t_i for m < i ≤ k.
We claim that the cross terms vanish under the conditional expectation.
Since i < j, u_{i+1} ≤ u_j and both ξ_i and ξ_j are F_{u_j}-measurable. Then

E[ ξ_i ξ_j (M_{u_{i+1}∧t} − M_{u_i∧t})(M_{u_{j+1}∧t} − M_{u_j∧t}) | Fs ]
  = E[ ξ_i ξ_j (M_{u_{i+1}∧t} − M_{u_i∧t}) E{ M_{u_{j+1}∧t} − M_{u_j∧t} | F_{u_j} } | Fs ] = 0
because the inner conditional expectation vanishes by the martingale prop-
erty of M .
Now we can compute the conditional expectation of the square.
E[ ( (Xn · M)_t − (Xn · M)_s )² | Fs ] = Σ_{i=m}^{k−1} E[ ξ_i² (M_{u_{i+1}∧t} − M_{u_i∧t})² | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² E{ (M_{u_{i+1}∧t} − M_{u_i∧t})² | F_{u_i} } | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² E{ M²_{u_{i+1}∧t} − M²_{u_i∧t} | F_{u_i} } | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² E{ [M]_{u_{i+1}∧t} − [M]_{u_i∧t} | F_{u_i} } | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² ( [M]_{u_{i+1}∧t} − [M]_{u_i∧t} ) | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² ∫_{(s,t]} 1_{(u_i, u_{i+1}]}(u) d[M]_u | Fs ]
  = E[ ∫_{(s,t]} ( ξ_0² 1{0}(u) + Σ_{i=1}^{k−1} ξ_i² 1_{(t_i, t_{i+1}]}(u) ) d[M]_u | Fs ]
  = E[ ∫_{(s,t]} ( ξ_0 1{0}(u) + Σ_{i=1}^{k−1} ξ_i 1_{(t_i, t_{i+1}]}(u) )² d[M]_u | Fs ]
  = E[ ∫_{(s,t]} Xn(u, ω)² d[M]_u(ω) | Fs ].
Inside the d[M ]u integral above we replaced the ui ’s with ti ’s because for
u ∈ (s, t], 1(ui ,ui+1 ] (u) = 1(ti ,ti+1 ] (u). Also, we brought in the terms for
i < m because these do not influence the integral, as they are supported on
[0, tm ] which is disjoint from (s, t].
Next let X ∈ L2 be general, and Xn → X in L2. The limit n → ∞ is best taken with expectations, so we rewrite the conclusion of the previous calculation as

E[ ( (Xn · M)_t − (Xn · M)_s )² 1_A ] = E[ 1_A ∫_{(s,t]} Xn²(u) d[M]_u ]
for every nonnegative Borel function h. This can be justified by the π-λ
Theorem. For any interval (s, t],

Λ_{G^u}(s, t] = G^u(t) − G^u(s) = G(u∧t) − G(u∧s) = Λ_G( (s, t] ∩ (0, u] ).

Then by Lemma B.3 the measures Λ_{G^u} and Λ_G( · ∩ (0, u]) coincide on all Borel sets of (0, ∞). The equality extends to [0, ∞) if we set G(0−) = G(0) so that the measure of {0} is zero under both measures.
Now fix ω and apply the preceding. By Lemma 3.23, [M τ ] = [M ]τ , and
so

∫_{[0,∞)} Y(s, ω) d[M^τ]_s(ω) = ∫_{[0,∞)} Y(s, ω) d[M]^τ_s(ω)
                              = ∫_{[0,∞)} 1_{[0,τ(ω)]}(s) Y(s, ω) d[M]_s(ω).
Part (b). We prove the first equality in (5.23). Let τn = 2−n ([2n τ ] + 1)
be the usual discrete approximation that converges down to τ as n → ∞.
Let ℓ(n) = [2^n t] + 1. Since τ ≥ k2^{−n} iff τn ≥ (k + 1)2^{−n},

(X · M)_{τn∧t} = Σ_{k=0}^{ℓ(n)} 1{τ ≥ k2^{−n}} ( (X · M)_{(k+1)2^{−n}∧t} − (X · M)_{k2^{−n}∧t} )
  = Σ_{k=0}^{ℓ(n)} 1{τ ≥ k2^{−n}} ∫_{(0,t]} 1_{(k2^{−n}, (k+1)2^{−n}]} X dM
  = Σ_{k=0}^{ℓ(n)} ∫_{(0,t]} 1{τ ≥ k2^{−n}} 1_{(k2^{−n}, (k+1)2^{−n}]} X dM
  = ∫_{(0,t]} ( 1{0} X + Σ_{k=0}^{ℓ(n)} 1{τ ≥ k2^{−n}} 1_{(k2^{−n}, (k+1)2^{−n}]} X ) dM
  = ∫_{(0,t]} 1_{[0,τn]} X dM.
In the calculation above, the second equality comes from (5.19), the third
from (5.22) where Z is the Fk2−n -measurable 1{τ ≥ k2−n }. The next to
last equality uses additivity and adds in the term 1{0} X that integrates to
zero. The last equality follows from the identity
1_{[0,τn]}(t, ω) = 1{0}(t) + Σ_{k=0}^{ℓ(n)} 1{τ(ω) ≥ k2^{−n}} 1_{(k2^{−n}, (k+1)2^{−n}]}(t).
and τn(ω) ↘ τ(ω). The integrand is bounded by |X|² for all n, and

∫_{[0,t]×Ω} |X|² dµM < ∞
Example 5.17. Let us record some simple integrals as consequences of the
properties.
(a) Let σ ≤ τ be two stopping times, and ξ a bounded Fσ -measurable
random variable. Define X = ξ1(σ,τ ] , or more explicitly,
Xt (ω) = ξ(ω)1(σ(ω),τ (ω)] (t).
As a caglad process, X is predictable. Let M be an L2 -martingale. Pick a
constant C ≥ |ξ(ω)|. Then for any T < ∞,
∫_{[0,T]×Ω} X² dµM = E[ ξ² ( [M]_{τ∧T} − [M]_{σ∧T} ) ] ≤ C² E[ [M]_{τ∧T} ]
Proof of Proposition 5.18. The Lemma shows that X ∈ L2 (αM +βN, P).
Replace the measure µM in the proof of Lemma 5.9 with the measure
µ̃ = µM + µN. The proof works exactly as before, and gives a sequence of simple predictable processes Xn such that

∫_{[0,T]×Ω} |X − Xn|² d(µM + µN) → 0
for each T < ∞. This combined with the previous lemma says that Xn → X
simultaneously in spaces L2 (M, P), L2 (N, P), and L2 (αM + βN, P). (5.25)
holds for Xn by the explicit formula for the integral of a simple predictable
process, and the general conclusion follows by taking the limit.
5.2. Local square-integrable martingale integrator 135
Definition 5.22. Let M ∈ M2,loc , X ∈ L(M, P), and let {τk } be a localiz-
ing sequence for (X, M ). Define the event Ω0 as in the previous paragraph.
The stochastic integral X · M is the cadlag local L2 -martingale defined as
follows: on the event Ω0 set
(5.27)    (X · M)_t(ω) = ( (1_{[0,τk]} X) · M^{τk} )_t(ω) for any k such that τk(ω) ≥ t.
Outside the event Ω0 set (X · M )t = 0 for all t.
This definition is independent of the localizing sequence {τk } in the sense
that using any other localizing sequence of stopping times gives a process
indistinguishable from X · M defined above.
(iv) X is adapted, has almost surely left-continuous paths, and X*_T < ∞ almost surely for each T < ∞. Assume the underlying filtration {Ft} is right-continuous. Take
σk = inf{t ≥ 0 : |Xt | > k}.
Remark 5.26. Category (ii) is a special case of (iii), and category (iii) is a
special case of (iv). Category (iii) seems artificial but will be useful. Notice
that every caglad X satisfies X(t) = Y (t−) for the cadlag process Y defined
by Y (t) = X(t+), but this Y may fail to be adapted. Y is adapted if {Ft }
is right-continuous. But then we find ourselves in Category (iv).
and other notational conventions exactly as for the L2 integral. The stochas-
tic integral with respect to a local martingale inherits the path properties
of the L2 integral, as we observe in the next proposition. Expectations and
conditional expectations of (X · M )t do not necessarily exist any more so we
cannot even contemplate their properties.
Proposition 5.28. Let M, N ∈ M2,loc , X ∈ L(M, P), and let τ be a
stopping time.
(a) Linearity continues to hold: if also Y ∈ L(M, P), then
(αX + βY ) · M = α(X · M ) + β(Y · M ).
Furthermore,

(5.32)    ( (1_{[0,τ]} X) · M )_t = (X · M)_{τ∧t} = (X · M^τ)_t.
(c) Let Y ∈ L(N, P). Suppose Xt (ω) = Yt (ω) and Mt (ω) = Nt (ω) for
0 ≤ t ≤ τ (ω). Then
(X · M )τ ∧t = (Y · N )τ ∧t .
(d) Suppose X ∈ L(M, P) ∩ L(N, P). Then for α, β ∈ R, X ∈ L(αM +
βN, P) and
X · (αM + βN ) = α(X · M ) + β(X · N ).
Proof. The proofs are short exercises in localization. We show the way by
doing (5.31) and the first equality in (5.32).
Let {σk } be a localizing sequence for the pair (X, M ). Then {σk } is
a localizing sequence also for the pairs (1(σ,∞) X, M ) and (Z1(σ,∞) X, M ).
Given ω and t, pick k large enough so that σk (ω) ≥ t. Then by the definition
of the stochastic integrals for localized processes,
and
( (Z1_{(σ,∞)} X) · M )_t(ω) = ( (1_{[0,σk]} Z1_{(σ,∞)} X) · M^{σk} )_t(ω).
The right-hand sides of the two equalities above coincide, by an application
of (5.22) to the L2 -martingale M σk and the process 1[0,σk ] X in place of X.
This verifies (5.31).
The sequence {σk } works also for (1[0,τ ] X, M ). If t ≤ σk (ω), then
The first and the last equality are the definition of the local integral, the
middle equality an application of (5.23). This checks the first equality in
(5.32).
We come to a very helpful result for later development. The most im-
portant processes are usually either caglad or cadlag. The next proposition
shows that for left-continuous processes the integral can be realized as a
limit of Riemann sum-type approximations. For future benefit we include
random partitions in the result.
However, a cadlag process X is not necessarily predictable and therefore
not an admissible integrand. Nevertheless it turns out that the Riemann
sums still converge. They cannot converge to X · M because this integral
might not exist. Instead, these sums converge to the integral X− · M of the
caglad process X− defined by
(a) Assume X is left-continuous and satisfies (5.29). Then for each fixed
T < ∞ and ε > 0,
lim_{n→∞} P{ sup_{0≤t≤T} |Rn(t) − (X · M)_t| ≥ ε } = 0.
Thus {σk } is a localizing sequence for (Yn , M ). On the event {σk > T },
for 0 ≤ t ≤ T, by definition (5.27) and Proposition 5.15(b)–(c),

(Yn · M)_t = ( (1_{[0,σk]} Yn) · M^{σk} )_t = ( Yn^{(k)} · M^{σk} )_t.
Fix ε > 0. In the next bound we apply martingale inequality (3.8) and
the isometry (5.12).
P{ sup_{0≤t≤T} |(Yn · M)_t| ≥ ε } ≤ P{σk ≤ T} + P{ sup_{0≤t≤T} |( Yn^{(k)} · M^{σk} )_t| ≥ ε }.
Since ε1 > 0 can be taken arbitrarily small, the limit above must actually
equal zero.
At this point we have proved
(5.34)    lim_{n→∞} P{ sup_{0≤t≤T} |Rn(t) − (X− · M)_t| ≥ ε } = 0
sup_{0≤t<∞} |R̃n(t) − Rn(t)| ≤ |X̃(0)| · sup_{0≤t≤δn} |M(t) − M(0)|.
= E{ [M^{σk}]_k } = E{ (M_k^{σk})² − (M_0^{σk})² } < ∞.
Along the way we used Lemma 3.23 and then the square-integrability of
M σk .
The following alternative characterization of membership in L(M, P) will
be useful for extending the stochastic integral to non-predictable integrands
in Section 5.5.
Lemma 5.31. Let M be a local L2 -martingale and X a predictable process.
Then X ∈ L(M, P) iff there exist stopping times ρk % ∞ (a.s.) such that
for each k,
∫_{[0,T]×Ω} 1_{[0,ρk]} |X|² dµM < ∞ for all T < ∞.
5.3. Semimartingale integrator 145
We leave the proof of this lemma as an exercise. The key point is that
for both L2 -martingales and local L2 -martingales, and a stopping time τ ,
µM τ (A) = µM (A ∩ [0, τ ]) for A ∈ P. (Just check that the proof of Lemma
5.14 applies without change to local L2 -martingales.)
Furthermore, we leave as an exercise the proof of the result that if X, Y ∈ L(M, P) are µM-equivalent, which means again that
µM{ (t, ω) : X(t, ω) ≠ Y(t, ω) } = 0,
then X · M = Y · M in the sense of indistinguishability.
Justification of the definition. The first item to check is that the integral does not depend on the decomposition of Y chosen. Suppose Y = Y0 + M̃ + Ṽ is another decomposition of Y into a local L2-martingale M̃ and an FV process Ṽ. We need to show that

∫_{(0,t]} Xs dMs + ∫_{(0,t]} Xs ΛV(ds) = ∫_{(0,t]} Xs dM̃s + ∫_{(0,t]} Xs Λ_Ṽ(ds)

in the sense that the processes on either side of the equality sign are indistinguishable. By Proposition 5.28(d) and the additivity of Lebesgue-Stieltjes measures, this is equivalent to

∫_{(0,t]} Xs d(M − M̃)s = ∫_{(0,t]} Xs Λ_{Ṽ−V}(ds).

From Y = M + V = M̃ + Ṽ we get

M − M̃ = Ṽ − V.
On the left is the stochastic integral, on the right the Lebesgue-Stieltjes in-
tegral evaluated separately for each fixed ω.
gives

∫_{(0,t]} X(s, ω) dZs(ω) = ∫_{(0,t]} X(s, ω) Λ_{Z(ω)}(ds)
almost surely. By Theorem B.2, H contains all bounded P-measurable pro-
cesses.
This completes Step 1: (5.38) has been verified for the case where Z ∈
M2 and X is bounded.
Step 2. Now consider the case of a local L2 -martingale Z. By the
assumption on X we may pick a localizing sequence {τk } such that Z τk is
an L2 -martingale and 1(0,τk ] X is bounded. Then by Step 1,
(5.40)    ∫_{(0,t]} 1_{(0,τk]}(s) X(s) dZs^{τk} = ∫_{(0,t]} 1_{(0,τk]}(s) X(s) Λ_{Z^{τk}}(ds).
On the event {τk ≥ t} the left and right sides of (5.40) coincide almost
surely with the corresponding sides of (5.38). The union over k of the
events {τk ≥ t} equals almost surely the whole space Ω. Thus we have
verified (5.38) almost surely, for this fixed t.
The left-hand side of (5.40) coincides almost surely with ( (1_{[0,τk]} X) · Z^{τk} )_t due to the irrelevance of the time origin. By the definition (5.27) of the stochastic integral, this agrees with (X · Z)_t on the event {τk ≥ t}.
On the right-hand side of (5.40) we only need to observe that if τk ≥ t, then on the interval (0, t] the function 1_{(0,τk]}(s)X(s) coincides with X(s) and Z^{τk}_s coincides with Zs. So it is clear that the integrals on the right-hand sides of (5.40) and (5.39) coincide.
Returning to the justification of the definition, we now know that the process ∫ X dY does not depend on the choice of the decomposition Y = Y0 + M + V.
X · M is a local L2-martingale, and for a fixed ω the function

t ↦ ∫_{(0,t]} Xs(ω) Λ_{V(ω)}(ds)

has bounded variation on every compact interval (Lemma 1.16). Thus the definition (5.37) provides the semimartingale decomposition of ∫ X dY.
though to avoid confusing the issue that for a cadlag process X the limit
is not necessarily the stochastic integral of X. The integral X · Y may fail
to exist, and even if it exists, it does not necessarily coincide with X− · Y .
This is not a consequence of the stochastic aspect, but can happen also for
Lebesgue-Stieltjes integrals. (Find examples!)
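A concrete deterministic example of the phenomenon, responding to the "Find examples!" prompt (an illustrative sketch, not part of the text): take X = Y = 1_{[1,∞)}. Then ∫_{(0,2]} X dY = X(1)∆Y(1) = 1 while ∫_{(0,2]} X(s−) dY(s) = X(1−)∆Y(1) = 0, so the two integrals genuinely differ.

```python
def step(t):
    # Cadlag indicator 1_{[1, ∞)}(t): a single unit jump at t = 1.
    return 1.0 if t >= 1.0 else 0.0

def ls_integral(f, jump_times, jump_sizes, t):
    # Lebesgue-Stieltjes integral over (0, t] against a pure-jump
    # integrator: sum of f(s) * (jump at s) over jump times s in (0, t].
    return sum(f(s) * dy for s, dy in zip(jump_times, jump_sizes) if 0.0 < s <= t)

# Integrator Y = step has one jump of size 1 at time 1.
with_right_value = ls_integral(step, [1.0], [1.0], 2.0)   # integrand X(s)
with_left_limit = ls_integral(lambda s: 1.0 if s > 1.0 else 0.0,
                              [1.0], [1.0], 2.0)          # integrand X(s-)
```

Left-endpoint Riemann sums evaluate the integrand just before each jump, which is why they recover the integral of X− rather than of X.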
Proposition 5.34. Let X be a real-valued process and Y a cadlag semimartingale. Suppose 0 = τ0^n ≤ τ1^n ≤ τ2^n ≤ τ3^n ≤ · · · are stopping times such that for each n, τi^n → ∞ almost surely as i → ∞, and δn = sup_{0≤i<∞}(τ_{i+1}^n − τi^n) tends to zero almost surely as n → ∞. Define

(5.41)    Sn(t) = Σ_{i=0}^∞ X(τi^n) ( Y(τ_{i+1}^n ∧ t) − Y(τi^n ∧ t) ).
(a) Assume X is left-continuous and satisfies (5.36). Then for each fixed
T < ∞ and ε > 0,
lim_{n→∞} P{ sup_{0≤t≤T} |Sn(t) − (X · Y)_t| ≥ ε } = 0.
Furthermore,

(5.43)    ( (1_{[0,τ]} G) · Y )_t = (G · Y)_{τ∧t} = (G · Y^τ)_t.
(b) Suppose Gt(ω) = Ht(ω) and Yt(ω) = Zt(ω) for 0 ≤ t ≤ τ(ω). Then
(G · Y)_{τ∧t} = (H · Z)_{τ∧t}.
uniformly for t ∈ [0, T], for almost every ω. This implies the convergence of the sum of squares, so the quadratic variation [Y] exists and satisfies
[Y]_t = Y_t² − Y_0² − S(t).
By the cadlag path of Y, ∆Sn(t) → 2Y_{t−}∆Y_t. Equality of the two limits of Sn(t) gives
2Y_{t−} ∆Y_t = ∆(Y²)_t − ∆[Y]_t,
which rearranges to ∆[Y]_t = (∆Y_t)².
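For pure-jump paths the identity ∆[Y]_t = (∆Y_t)² can be seen directly: once a partition is fine enough that each jump sits in its own subinterval, the sum of squared increments equals the sum of squared jumps exactly. A small sketch (illustration, not from the text):

```python
import numpy as np

def sum_of_squared_increments(path, pts):
    # Sum of (Y(t_{i+1}) - Y(t_i))^2 over the partition pts.
    return sum((path(pts[i + 1]) - path(pts[i])) ** 2 for i in range(len(pts) - 1))

def pure_jump(t):
    # Piecewise-constant cadlag path: jump +2 at t = 0.5, jump -1 at t = 1.5.
    return (2.0 if t >= 0.5 else 0.0) + (-1.0 if t >= 1.5 else 0.0)

# Any partition separating the two jumps gives exactly (+2)^2 + (-1)^2 = 5.
qv = [sum_of_squared_increments(pure_jump, np.linspace(0.0, 2.0, n + 1))
      for n in (10, 100, 1000)]
```

Here [Y]_2 = 5 and its two jumps, at times 0.5 and 1.5, have sizes (∆Y)² = 4 and 1, in line with the identity.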
Theorem 5.37. Let Y be a cadlag semimartingale, X a predictable process
that satisfies the local boundedness condition (5.36), and X · Y the stochastic
integral. Then for all ω in a set of probability one,
∆(X · Y )(t) = X(t)∆Y (t) for all 0 < t < ∞.
Fix an ω on which both almost sure limits (5.44) and (5.45) hold. For
any t ∈ (0, T ], the uniform convergence in (5.45) implies ∆(Xnj · M )t →
∆(X · M )t . Also, since a path of Xnj · M is a step function, ∆(Xnj · M )t =
Xnj (t)∆Mt . (The last two points were justified explicitly in the proof of
Lemma 5.36 above.) Combining these with the limit (5.44) shows that, for
this fixed ω and all t ∈ (0, T ],
and

| ∫_{(s,t)} f dU | ≤ ‖f‖_∞ Λ_{V_U}( (s, t) ).
Proof of Theorem 5.37 follows from combining Lemmas 5.38 and 5.39.
We introduce the following notation for the left limit of a stochastic integral:
(5.46)    ∫_{(0,t)} H dY = lim_{s↗t, s<t} ∫_{(0,s]} H dY
converges to zero in probability, for each fixed 0 < T < ∞. Then for any
cadlag semimartingale Y , Hn · Y → 0 in probability, uniformly on compact
time intervals.
Let Hn^{(1)} = (Hn ∧ 1) ∨ (−1) denote the bounded process obtained by truncation. If t ≤ σk ∧ ρn then Hn · M = Hn · M^{σk} by part (c) of Proposition 5.28. As a bounded process Hn^{(1)} ∈ L2(M^{τk}, P), so by martingale inequality
5.4. Further properties of stochastic integrals 155
As k and T stay fixed and n → ∞, the last expectation above tends to zero.
This follows from the dominated convergence theorem under convergence
in probability (Theorem B.9). The integrand is bounded by the integrable
random variable [M^{σk}]_T. Given δ > 0, pick K > δ so that

P{ [M^{σk}]_T ≥ K } < δ/2.

Then

P{ [M^{σk}]_T ( G*_n(T)² ∧ 1 ) ≥ δ } ≤ δ/2 + P{ G*_n(T) ≥ √(δ/K) }
The second equality in (5.48) is the definition of the integral over (σ, σ +
t]. The proof of Theorem 5.41 follows after two lemmas.
Lemma 5.42. For any P-measurable function G, Ḡ(t, ω) = G(σ(ω) + t, ω)
is P̄-measurable.
Proof. Let {τk} localize (G, M). Let νk = (τk − σ)+. For any 0 ≤ t < ∞,
{νk ≤ t} = {τk ≤ σ + t} ∈ F_{σ+t} = F̄_t by Lemma 2.2(ii).
Observe that

(5.50)    σ + T ∧ (u − σ)+ = { σ, u < σ;  u, σ ≤ u < σ + T;  σ + T, u ≥ σ + T }.
where the last inequality is a consequence of the assumption that {τk } lo-
calizes (G, M ).
To summarize thus far: we have shown that {νk } localizes (Ḡ, M̄ ). This
checks Ḡ ∈ L(M̄ , P̄).
Fix again k and continue denoting the L2-martingales by Z = M^{τk} and Z̄ = M̄^{νk}. Consider a simple P-predictable process
Hn(t) = Σ_{i=0}^{m−1} ξ_i 1_{(u_i, u_{i+1}]}(t).

Let k denote the index that satisfies u_{k+1} > σ ≥ u_k. (If there is no such k then H̄n = 0.) Then

H̄n(t) = Σ_{i≥k} ξ_i 1_{(u_i−σ, u_{i+1}−σ]}(t).
The i = k term above develops differently from the others because Z̄_{(u_k−σ)+∧t} = Z̄_0 = 0. By (5.50),
At this point we have proved the lemma in the L2 case and a localization
argument remains. Given t, pick νk > t. Then τk > σ + t. Use the fact that
{νk } and {τk } are localizing sequences for their respective integrals.
In other words, the process Y has been stopped just prior to the stopping
time. This type of stopping is useful for processes with jumps. For example,
if
τ = inf{t ≥ 0 : |Y (t)| ≥ r or |Y (t−)| ≥ r}
then |Y τ | ≤ r may fail if Y jumped exactly at time τ , but |Y τ − | ≤ r is true.
For continuous processes Y τ − and Y τ coincide. More precisely, the
relation between the two stoppings is that
Y τ (t) = Y τ − (t) + ∆Y (τ )1{t ≥ τ }.
In other words, only a jump of Y at τ can produce a difference, and that is
not felt until t reaches τ .
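A toy path makes the relation between Y^τ and Y^{τ−} concrete (an illustrative sketch, not from the text): a single jump of size 3 at time 1, stopped at level r = 2.

```python
JUMP_TIME, JUMP_SIZE, R = 1.0, 3.0, 2.0

def Y(t):
    # Cadlag path: 0 before the jump, 3 from the jump time onward.
    return JUMP_SIZE if t >= JUMP_TIME else 0.0

# tau = inf{t : |Y(t)| >= R or |Y(t-)| >= R} is the jump time itself.
tau = JUMP_TIME
jump_at_tau = Y(tau) - 0.0   # ΔY(tau): the left limit at the jump is 0

def Y_stopped(t):
    # Y^tau(t) = Y(tau ∧ t): still carries the jump, so it exceeds R.
    return Y(min(t, tau))

def Y_stopped_before(t):
    # Y^{tau-}(t) = Y^tau(t) - ΔY(tau) 1{t >= tau}: the jump is removed.
    return Y_stopped(t) - (jump_at_tau if t >= tau else 0.0)
```

Here |Y^τ| reaches 3 > r, while Y^{τ−} is identically zero and so stays within the level r, which is exactly why stopping just before τ is useful for jump processes.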
The next example shows that stopping just before τ can fail to preserve
the martingale property. But it does preserve a semimartingale, because a
single jump can be moved to the FV part, as evidenced in the proof of the
lemma after the example.
The key facts that underlie the extension of the stochastic integral are
assembled in the next lemma.
Lemma 5.48. Let X be an adapted, measurable process. Then there exists a P-measurable process X̄ such that

(5.59)    m ⊗ P{ (t, ω) ∈ R+ × Ω : X(t, ω) ≠ X̄(t, ω) } = 0.

In particular, all measurable adapted processes are P*-measurable.
Under the assumption µM ≪ m ⊗ P, we also have

(5.60)    µ*M{ (t, ω) ∈ R+ × Ω : X(t, ω) ≠ X̄(t, ω) } = 0.
Following our earlier conventions, we say that X and X̄ are µ∗M -equivalent.
By Lemma 5.31, this is just another way of expressing X̄ ∈ L(M, P). It fol-
lows that the integral X̄ · M exists and is a member of M2,loc . In particular,
we can define X · M = X̄ · M as an element of M2,loc .
If we choose another P-measurable process Ȳ that is µ∗M -equivalent to
X, then X̄ and Ȳ are µM -equivalent, and the integrals X̄ · M and Ȳ · M are
indistinguishable by Exercise 5.7.
Exercises
Exercise 5.1. Show that if X : R+ ×Ω → R is P-measurable, then Xt (ω) =
X(t, ω) is a process adapted to the filtration {Ft− }.
Hint: Given B ∈ BR , let A = {(t, ω) : X(t, ω) ∈ B} ∈ P. The event
{Xt ∈ B} equals the t-section At = {ω : (t, ω) ∈ A}, so it suffices to show
that an arbitrary A ∈ P satisfies At ∈ Ft− for all t ∈ R+ . This follows
from checking that predictable rectangles have this property, and that the
collection of sets in BR+ ⊗ F with this property form a sub-σ-field.
Exercise 5.2. (a) Show that for any Borel function h : R+ → R, the
deterministic process X(t, ω) = h(t) is predictable. Hint: Intervals of the
type (a, b] generate the Borel σ-field of R+ .
(b) Evaluate
E ∫_{[0,T]} Ns d[M]_s
where the inner integral is the pathwise Lebesgue-Stieltjes integral, in accordance with the interpretation of definition (5.1) of µM. Conclude that N cannot be P-measurable.
(c) For comparison, evaluate explicitly
E ∫_{(0,T]} N_{s−} dNs.
Here N_{s−}(ω) = lim_{u↗s} Nu(ω) is the left limit. Explain why we know without calculation that the answer must agree with part (a).
(c) Use the formula you obtained in part (b) to check that the process ∫ N(s−) dM(s) is a martingale. (Of course, this conclusion is part of the theory but the point here is to obtain it through computation. Part (a) and Exercise 2.9 take care of parts of the work.)
(d) Suppose N were predictable. Then the stochastic integral ∫ N dM would exist and be a martingale. Show that this is not true and conclude that N cannot be predictable.
Hints: It might be easiest to find

∫_{(0,t]} N(s) dM(s) − ∫_{(0,t]} N(s−) dM(s) = ∫_{(0,t]} ( N(s) − N(s−) ) dM(s)

and use the fact that the integral of N(s−) is a martingale.
Exercise 5.10 (Extended stochastic integral of the Poisson process). Let
Nt be a rate α Poisson process, Mt = Nt − αt and N− (t) = N (t−). Show
that N− is a modification of N , and
µ∗_M {(t, ω) : N_t(ω) ≠ N_{t−}(ω)} = 0.
Thus the stochastic integral N · M can be defined according to the extension
in Section 5.5 and this N · M must agree with N− · M .
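Exercise 5.10 turns on the fact that N and N− differ only on the Lebesgue-null set of jump times, while dN-integrals do see that set. A pathwise sketch of the distinction, with an assumed list of jump times for one sample path (any finite increasing list works, since the identities are pathwise):

```python
# Hypothetical jump times of one Poisson sample path on (0, T].
jump_times = [0.3, 0.7, 1.1, 2.4, 2.9]
n = len(jump_times)

# int_(0,T] N(s-) dN(s): at the k-th jump, the left limit N(s-) equals k-1.
left_integral = sum(k for k in range(n))
# int_(0,T] N(s) dN(s): at the k-th jump, N(s) equals k.
right_integral = sum(k + 1 for k in range(n))

assert left_integral == n * (n - 1) // 2
assert right_integral == n * (n + 1) // 2
# The difference n is the sum of squared jumps, i.e. the quadratic variation
# [N]_T: replacing N by N- changes a dN-integral, even though the two
# processes agree off a Lebesgue-null set of times.
```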
Exercise 5.11 (Riemann sum approximation in M2 ). Let M be an L2
martingale, X ∈ L2 (M, P), and assume X also satisfies the hypotheses of
Proposition 5.29. Let π^m = {0 = t^m_1 < t^m_2 < t^m_3 < ···} be partitions such
that mesh π^m → 0. Let ξ^{m,k}_i = (X_{t^m_i} ∧ k) ∨ (−k), and define the simple
predictable processes
W^{k,m,n}_t = Σ_{i=1}^{n} ξ^{m,k}_i 1_{(t^m_i , t^m_{i+1}]}(t).
Then there exist subsequences {m(k)} and {n(k)} such that
lim_{k→∞} ‖X · M − W^{k,m(k),n(k)} · M‖_{M2} = 0.
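The convergence asserted in Exercise 5.11 rests on a summation-by-parts identity that holds for arbitrary real path values, not just martingale samples. A minimal sketch (the path values below are arbitrary placeholders):

```python
# For ANY values B_0, ..., B_n along a partition,
#   sum_i B_i (B_{i+1} - B_i) = (B_n^2 - B_0^2 - sum_i (B_{i+1} - B_i)^2) / 2,
# so left-endpoint Riemann sums converge exactly when the quadratic
# variation sum does.
path = [0.0, 0.4, -0.1, 0.5, 0.2, 0.9]

riemann = sum(path[i] * (path[i + 1] - path[i]) for i in range(len(path) - 1))
qv = sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))
closed = (path[-1] ** 2 - path[0] ** 2 - qv) / 2

assert abs(riemann - closed) < 1e-12
```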
Exercise 5.12. Let 0 < a < b < ∞ be constants and M ∈ M2,loc . Find
the stochastic integral
∫_{(0,t]} 1_{[a,b)}(s) dM_s .
Hint: Check that if M ∈ M2 then 1_{(a−1/n, b−1/n]} converges to 1_{[a,b)} in
L2 (M, P).
Chapter 6
Itô’s formula
where π = {0 = t_0 < t_1 < ··· < t_{m(π)} = t} is a partition of [0, t]. The limit
is assumed to hold in probability. The quadratic variation [X] of a single
process X is then defined by [X] = [X, X]. When these processes exist, they
are tied together by the identity
(6.2)
[X, Y] = ½( [X + Y] − [X] − [Y] ).
For right-continuous martingales and local martingales M and N , [M] and
[M, N] exist. [M] is an increasing process, which means that almost every
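The polarization identity (6.2) is exact already at the level of discrete increments, which gives a quick mechanical check. A sketch with two arbitrary increment lists standing in for the increments of X and Y:

```python
dX = [0.5, -0.2, 0.3, 0.1]
dY = [0.1, 0.4, -0.3, 0.2]

bracket = lambda d: sum(v * v for v in d)          # discrete analogue of [X]
cov_direct = sum(a * b for a, b in zip(dX, dY))    # discrete analogue of [X, Y]
cov_polarized = 0.5 * (bracket([a + b for a, b in zip(dX, dY)])
                       - bracket(dX) - bracket(dY))

assert abs(cov_direct - cov_polarized) < 1e-12
```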
The second equality above used Lemma 3.25. ξ moves freely in and out of
the integrals because they are path-by-path Lebesgue-Stieltjes integrals. By
additivity of the covariation, conclusion (6.3) follows for G that are simple
predictable processes of the type (5.6).
Now take a general G ∈ L2 (M, P). Pick simple predictable processes
Gn such that Gn → G in L2 (M, P). Then (Gn · M )t → (G · M )t in L2 (P ).
By Lemma 6.1 [Gn · M, L]t → [G · M, L]t in L1 (P ). On the other hand the
previous lines showed
[G_n · M, L]_t = ∫_{(0,t]} G_n(s) d[M, L]_s .
Step 2. Now the case L, M ∈ M2,loc and G ∈ L(M, P). Pick stopping
times {τk } that localize both L and (G, M ). Abbreviate Gk = 1(0,τk ] G.
Then if τ_k(ω) ≥ t,
[G · M, L]_t = [G · M, L]_{τ_k ∧ t} = [(G · M)^{τ_k}, L^{τ_k}]_t = [(G^k · M^{τ_k}), L^{τ_k}]_t
 = ∫_{(0,t]} G^k_s d[M^{τ_k}, L^{τ_k}]_s = ∫_{(0,t]} G_s d[M, L]_s .
Proof. Follows from Proposition 6.2 above, and a general property of [M, N]
for (local) L2-martingales M and N , stated as Theorem 3.26 in Chapter 3.
Proof. Part (a). Fix t. Let {τk } be a localizing sequence for M . Then for
s ≤ t, [M τk ]s ≤ [M τk ]t = [M ]τk ∧t ≤ [M ]t = 0 almost surely by Lemma 3.23
and the t-monotonicity of [M ]. Consequently E{(Msτk )2 } = E{[M τk ]s } = 0,
from which Msτk = 0 almost surely. Taking k large enough so that τk (ω) ≥ s,
Proof. Let {τk } be a localizing sequence for (G, M ) and (H, N ). By part
(b) of Proposition 5.28 N τk = (G · M )τk = G · M τk , and so Proposition
6.2 gives the equality of Lebesgue-Stieltjes measures d[N τk ]s = G2s d[M τk ]s .
Then for any T < ∞,
E ∫_{[0,T]} 1_{[0,τ_k]}(t) H_t² G_t² d[M^{τ_k}]_t = E ∫_{[0,T]} 1_{[0,τ_k]}(t) H_t² d[N^{τ_k}]_t < ∞
because {τk } is assumed to localize (H, N ). This checks that {τk } localizes
(HG, M ) and so HG ∈ L(M, P).
Let L ∈ M2,loc . Equation (6.3) gives Gs d[M, L]s = d[N, L]s , and so
[(HG) · M, L]_t = ∫_{(0,t]} H_s G_s d[M, L]_s = ∫_{(0,t]} H_s d[N, L]_s = [H · N, L]_t .
Proof. For the same reason as in the proof of Proposition 6.2, it suffices to
show
[G · Y, Z]_t = ∫_{(0,t]} G_s d[Y, Z]_s .
Applying Proposition 5.34 to each sum gives the claimed type of convergence
to the limit
∫_{(0,t]} G_{s−} d(Y Z)_s − ∫_{(0,t]} G_{s−} Y_{s−} dZ_s − ∫_{(0,t]} G_{s−} Z_{s−} dY_s ,
which by (6.6) equals ∫_{(0,t]} G_{s−} d[Y, Z]_s .
6.2. Itô’s formula 179
Itô’s formula contains a term which is a sum over the jumps of the
process. This sum has at most countably many terms because a cadlag path
has at most countably many discontinuities (Lemma A.5). It is also possible
to define rigorously what is meant by a convergent sum of uncountably many
terms, and arrive at the same value (see the discussion around (A.4) in the
appendix).
Theorem 6.11. Fix 0 < T < ∞. Let D be an open subset of R and
f ∈ C²(D). Let Y be a cadlag semimartingale with quadratic variation
process [Y]. Assume that, for all ω outside some event of probability zero,
the closure of Y[0, T] is contained in D. Then
(6.8)
f(Y_t) = f(Y_0) + ∫_{(0,t]} f′(Y_{s−}) dY_s + ½ ∫_{(0,t]} f″(Y_{s−}) d[Y]_s
 + Σ_{s∈(0,t]} { f(Y_s) − f(Y_{s−}) − f′(Y_{s−})∆Y_s − ½ f″(Y_{s−})(∆Y_s)² }.
Part of the conclusion is that the last sum over s ∈ (0, t] converges absolutely.
Both sides of the equality above are cadlag processes, and the meaning of
the equality is that these processes are indistinguishable on [0, T ]. In other
words, there exists an event Ω0 of full probability such that for ω ∈ Ω0 , (6.8)
holds for all 0 ≤ t ≤ T .
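For a pure-jump path, every term on the right of (6.8) is a sum over jumps and the identity telescopes, which makes it easy to verify mechanically. A sketch with f(x) = x³ and an assumed list of jump sizes:

```python
f = lambda x: x ** 3
fp = lambda x: 3 * x ** 2      # f'
fpp = lambda x: 6 * x          # f''

y0 = 1.0
jumps = [0.5, -0.3, 0.8, -0.1]   # hypothetical jump sizes of Y on (0, t]

dY_term = qv_term = corr = 0.0
y = y0
for dy in jumps:
    y_new = y + dy
    dY_term += fp(y) * dy                 # contribution to int f'(Y_{s-}) dY_s
    qv_term += 0.5 * fpp(y) * dy * dy     # contribution to (1/2) int f'' d[Y]_s
    corr += f(y_new) - f(y) - fp(y) * dy - 0.5 * fpp(y) * dy * dy
    y = y_new

# (6.8): f(Y_t) = f(Y_0) + integral terms + jump-correction sum.
assert abs(f(y) - (f(y0) + dY_term + qv_term + corr)) < 1e-12
```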
Fix ω so that the limits in items (i)–(iii) above happen. Lastly we apply
the scalar case of Lemma A.11 to the cadlag function s → Ys (ω) on [0, t] and
the sequence of partitions π ` chosen above. For the closed set K in Lemma
A.11 take K = Y [0, T ]. For the continuous function φ in Lemma A.11 take
φ(x, y) = γ(x, y)(y − x)2 . By hypothesis, K is a subset of D. Consequently,
as verified above, the function
γ(x, y) = (x − y)^{−2} φ(x, y)  if x ≠ y,   and   γ(x, y) = 0  if x = y,
is continuous on K ×K. Assumption (A.8) of Lemma A.11 holds by item (iii)
above. The hypotheses of Lemma A.11 have been verified. The conclusion
is that for this fixed ω and each t ∈ [0, T ], the sum on line (6.12) converges
to
Σ_{s∈(0,t]} φ(Y_{s−}, Y_s) = Σ_{s∈(0,t]} γ(Y_{s−}, Y_s)(Y_s − Y_{s−})²
 = Σ_{s∈(0,t]} { f(Y_s) − f(Y_{s−}) − f′(Y_{s−})∆Y_s − ½ f″(Y_{s−})(∆Y_s)² }.
Lemma A.11 also contains the conclusion that this last sum is absolutely
convergent.
To summarize, given 0 < T < ∞, we have shown that for almost every
ω, (6.8) holds for all 0 ≤ t ≤ T .
Proof. Part (a). Continuity eliminates the sum over jumps, and renders
endpoints of intervals irrelevant for integration.
Part (b). By Corollary A.9 the quadratic variation of a cadlag path
consists exactly of the squares of the jumps. Consequently
½ ∫_{(0,t]} f″(Y_{s−}) d[Y]_s = ½ Σ_{s∈(0,t]} f″(Y_{s−})(∆Y_s)²
The open set D in the hypotheses of Itô’s formula does not have to be
an interval, so it can be disconnected.
The important hypothesis that the closure of Y[0, T] is contained in D
prevents the process from reaching the boundary. Precisely speaking, the
hypothesis implies that for some δ > 0, dist(Y(s), D^c) ≥ δ for all s ∈ [0, T].
To prove this, assume the contrary, namely the existence of s_i ∈ [0, T] such
that dist(Y(s_i), D^c) → 0. Since [0, T] is compact, we may pass to a convergent
subsequence s_i → s. And then by the cadlag property, along a further
subsequence Y(s_i) converges to some point y. Since dist(y, D^c) = 0 and D^c
is a closed set, y ∈ D^c. But y lies in the closure of Y[0, T], and we have
contradicted the hypothesis.
But note that the δ (the distance to the boundary of D) can depend on
ω. So the hypothesis does not require that there exist a fixed closed subset
H of D such that P{Y(t) ∈ H for all t ∈ [0, T]} = 1.
The hypothesis is needed because otherwise a "blow-up" at the boundary
can cause problems. The next example illustrates why we need to assume
the containment in D of the closure of Y[0, T], and not merely Y[0, T] ⊆ D.
Example 6.13. Let D = (−∞, 1) ∪ (3/2, ∞), and define
f(x) = √(1 − x) for x < 1,  and  f(x) = 0 for x > 3/2,
a C²-function on D. Define the deterministic process
Y_t = t for 0 ≤ t < 1,  and  Y_t = 1 + t for t ≥ 1.
Y_t ∈ D for all t ≥ 0. However, if t > 1,
∫_{(0,t]} f′(Y_{s−}) dY_s = ∫_{(0,1)} f′(s) ds + f′(Y_{1−}) ∆Y_1 + ∫_{(1,t]} f′(Y_{s−}) dY_s
 = −1 + (−∞) + 0.
6.2. Itô’s formula 183
As the calculation shows, the integral is not finite. The problem is that the
closure Y [0, t] contains the point 1 which lies at the boundary of D, and the
derivative f 0 blows up there.
x = [x_1, x_2, ..., x_d]^T ,
Df(t, x) = [ f_{x_1}(t, x), f_{x_2}(t, x), ..., f_{x_d}(t, x) ]^T ,
and D²f(t, x) is the d × d matrix whose (j, k) entry is the second partial
derivative f_{x_j,x_k}(t, x).
Proof. Let us write Y^k_t = Y_k(t) in the proof. The pattern is the same as in
the scalar case. Define a function φ on [0, T]² × D² by the equality (6.17).
Then
f(t, Y_t) = f(0, Y_0)
(6.18)
 + Σ_i f_t(t ∧ t_i, Y_{t∧t_i}) ((t ∧ t_{i+1}) − (t ∧ t_i))
(6.19)
 + Σ_{k=1}^{d} Σ_i f_{x_k}(t ∧ t_i, Y_{t∧t_i}) (Y^k_{t∧t_{i+1}} − Y^k_{t∧t_i})
(6.20)
 + ½ Σ_{1≤j,k≤d} Σ_i f_{x_j,x_k}(t ∧ t_i, Y_{t∧t_i}) (Y^j_{t∧t_{i+1}} − Y^j_{t∧t_i})(Y^k_{t∧t_{i+1}} − Y^k_{t∧t_i})
(6.21)
 + Σ_i φ(t ∧ t_i, t ∧ t_{i+1}, Y_{t∧t_i}, Y_{t∧t_{i+1}}).
happens for 1 ≤ k ≤ d.
Fix ω such that Y [0, T ] ⊆ D and the limits in items (i)–(iv) hold. By the
above paragraph and by hypothesis these conditions hold for almost every
ω.
We wish to apply Lemma A.11 to the Rd -valued cadlag function s 7→
Ys (ω) on [0, t], the function φ defined by (6.17), the closed set K = Y [0, T ],
and the sequence of partitions π ` chosen above. We need to check that φ and
the set K satisfy the hypotheses of Lemma A.11. Continuity of φ follows
from the definition (6.17). Next we argue that if (sn , tn , xn , yn ) → (u, u, z, z)
in [0, T ]2 × K 2 while for each n, either sn 6= tn or xn 6= yn , then
(6.22)
φ(s_n, t_n, x_n, y_n) / ( |t_n − s_n| + |y_n − x_n|² ) → 0.
Given ε > 0, let I be an interval around u in [0, T ] and let B be an open
ball centered at z and contained in D such that
Thus
φ(s_n, t_n, x_n, y_n) / ( |t_n − s_n| + |y_n − x_n|² ) ≤ 2ε
for large enough n, and we have verified (6.22).
The function
(s, t, x, y) ↦ φ(s, t, x, y) / ( |t − s| + |y − x|² )
is continuous at points where either s 6= t or x 6= y, as a quotient of two
continuous functions. Consequently the function γ defined by (A.7) is con-
tinuous on [0, T ]2 × K 2 .
Hypothesis (A.8) of Lemma A.11 is a consequence of the limit in item
(iii) above.
The hypotheses of Lemma A.11 have been verified. By this lemma, for
this fixed ω and each t ∈ [0, T ], the sum on line (6.21) converges to
Σ_{s∈(0,t]} φ(s, s, Y_{s−}, Y_s) = Σ_{s∈(0,t]} { f(s, Y_s) − f(s, Y_{s−})
 − Df(s, Y_{s−}) · ∆Y_s − ½ ∆Y_s^T D²f(s, Y_{s−}) ∆Y_s }.
This completes the proof of Theorem 6.14.
Remark 6.15 (Notation). Often Itô’s formula is expressed in terms of dif-
ferential notation which is more economical than the integral notation. As
an example, if Y is a continuous Rd -valued semimartingale, equation (6.16)
can be written as
(6.23)
df(t, Y(t)) = f_t(t, Y(t)) dt + Σ_{j=1}^{d} f_{x_j}(t, Y(t−)) dY_j(t)
 + ½ Σ_{1≤j,k≤d} f_{x_j,x_k}(t, Y(t−)) d[Y_j, Y_k](t).
As mentioned already, these “stochastic differentials” have no rigorous mean-
ing. The formula above is to be regarded only as an abbreviation of the
integral formula (6.16).
A function f is harmonic in D if ∆f = 0 on D.
Proof. Formula (6.24) comes directly from Itô’s formula, because [Bi , Bj ] =
δi,j t.
The process B^τ is a (vector) L2 martingale that satisfies B^τ[0, T] ⊆ D
for all T < ∞. Thus Itô's formula applies. Note that [B^τ_i, B^τ_j] = [B_i, B_j]^τ =
δ_{i,j}(t ∧ τ). Hence ∆f = 0 in D eliminates the second-order term, and the
formula simplifies to
f(B^τ(t)) = f(z) + ∫_0^t Df(B^τ(s))^T dB^τ(s)
One can use Itô’s formula to find martingales, which in turn are useful
for calculations, as the next lemma and example show.
At this point we need to decide whether µ = 0 or not. Let us work the case
µ ≠ 0. Solving for h gives
h(x) = C_1 exp(−2µσ^{−2} x) + C_2 .
Since Bt is a mean zero normal with variance t, one can verify that (6.26)
holds for all T < ∞.
Now Mt = h(Xt ) is a martingale. By optional stopping, Mτ ∧t is also a
martingale, and so EMτ ∧t = EM0 = h(0). By path continuity and τ < ∞,
Mτ ∧t → Mτ almost surely as t → ∞. Furthermore, the process Mτ ∧t is
bounded, because up to time τ process Xt remains in [a, b], and so |Mτ ∧t | ≤
C ≡ supa≤x≤b |h(x)|. Dominated convergence gives EMτ ∧t → EMτ as t →
∞. We have verified that Eh(Xτ ) = h(0).
Finally, we can choose the constants C_1 and C_2 so that h(b) = 1 and
h(a) = 0. After some details,
P(X_τ = b) = h(0) = (e^{−2µa/σ²} − 1) / (e^{−2µa/σ²} − e^{−2µb/σ²}).
Can you explain what you see as you let either a → −∞ or b → ∞? (Decide
first whether µ is positive or negative.)
We leave the case µ = 0 as an exercise. You should get P (Xτ = b) =
(−a)/(b − a).
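The algebra above can be verified directly: with assumed numerical parameters, h(x) = C_1 exp(−2µσ^{−2}x) + C_2 fitted to h(a) = 0 and h(b) = 1 solves the generator equation ½σ²h″ + µh′ = 0, and h(0) reproduces the displayed exit probability. A sketch:

```python
import math

mu, sigma, a, b = 0.7, 1.3, -2.0, 3.0      # assumed parameters, a < 0 < b
r = -2.0 * mu / sigma ** 2                 # exponent in h(x) = C1 e^{r x} + C2

C1 = 1.0 / (math.exp(r * b) - math.exp(r * a))   # enforces h(b) = 1
C2 = -C1 * math.exp(r * a)                       # enforces h(a) = 0
h = lambda x: C1 * math.exp(r * x) + C2
hp = lambda x: C1 * r * math.exp(r * x)
hpp = lambda x: C1 * r * r * math.exp(r * x)

# (1/2) sigma^2 h'' + mu h' = 0 pointwise:
assert abs(0.5 * sigma ** 2 * hpp(0.5) + mu * hp(0.5)) < 1e-9

# h(0) equals the displayed exit probability:
target = (math.exp(r * a) - 1.0) / (math.exp(r * a) - math.exp(r * b))
assert abs(h(0.0) - target) < 1e-9
```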
such that B_{τ(j+1)} − B_{τ(j)} = 4^{j+1} and B_{τ(k+1)} − B_{τ(k)} = −4^{k+1}. But then,
since
|B_{τ(j)}| ≤ Σ_{i=1}^{j} 4^i ≤ (4^{j+1} − 1)/(4 − 1) ≤ 4^{j+1}/2 ,
we get B_{τ(j+1)} ≥ 4^{j+1}/2, and by the same argument B_{τ(k+1)} ≤ −4^{k+1}/2.
Thus limsup_{t→∞} B_t = ∞ and liminf_{t→∞} B_t = −∞ almost surely.
Almost every Brownian path visits every point infinitely often due to a
special property of one dimension: it is impossible to go “around” a point.
Proposition 6.21. Let Bt be Brownian motion in Rd , and let P z denote
the probability measure when the process Bt is started at point z ∈ Rd . Let
τr = inf{t ≥ 0 : |Bt | ≤ r}
be the first time Brownian motion hits the ball of radius r around the origin.
(a) If d = 2, P z (τr < ∞) = 1 for all r > 0 and z ∈ Rd .
(b) If d ≥ 3, then for z outside the ball of radius r,
P^z(τ_r < ∞) = (r/|z|)^{d−2} .
There will be an almost surely finite time T such that |Bt | > r for all t ≥ T .
(c) For d ≥ 2 and any z, y ∈ Rd ,
P^z [ B_t ≠ y for all 0 < t < ∞ ] = 1.
Note that z = y is allowed. That is why t = 0 is not included in the event.
The annulus A and stopping times σR and ζ are defined as above. The same
reasoning now leads to
(6.28)
P^z(τ_r < σ_R) = (R^{2−d} − |z|^{2−d}) / (R^{2−d} − r^{2−d}).
Letting R → ∞ gives
P^z(τ_r < ∞) = |z|^{2−d} / r^{2−d} = (r/|z|)^{d−2}
as claimed. Part (c) follows now because the quantity above tends to zero
as r → 0.
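The passage R → ∞ in (6.28) can be checked numerically; d = 3, r = 1, |z| = 4 below are assumed example values.

```python
d, r, z = 3, 1.0, 4.0                       # dimension and radii, r < |z|
prob = lambda R: (R ** (2 - d) - z ** (2 - d)) / (R ** (2 - d) - r ** (2 - d))

limit = (r / z) ** (d - 2)                  # claimed limit (r/|z|)^{d-2}
assert abs(prob(1e9) - limit) < 1e-6
# As r -> 0 the limit tends to 0, which is the content of part (c).
```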
It remains to show that after some finite time, the ball of radius r is no
longer visited. Let r < R. Define σ^1_R = σ_R , and for n ≥ 2,
τ^n_r = inf{t > σ^{n−1}_R : |B_t| ≤ r}
and
σ^n_R = inf{t > τ^n_r : |B_t| ≥ R}.
In other words, σ^1_R < τ^2_r < σ^2_R < τ^3_r < ··· are the successive visits to radius
R and back to radius r. Let α = (r/R)^{d−2} < 1. We claim that for n ≥ 2,
P^z(τ^n_r < ∞) = α^{n−1} .
For n = 2, since σ^1_R < τ^2_r , use the strong Markov property to restart the
Brownian motion at time σ^1_R . Then by induction,
P^z(τ^n_r < ∞) = P^z(τ^{n−1}_r < σ^{n−1}_R < τ^n_r < ∞)
 = E^z [ 1{τ^{n−1}_r < σ^{n−1}_R < ∞} P^{B(σ^{n−1}_R)}{τ_r < ∞} ]
This says that, conditioned on Fs , the increment M (t) − M (s) has normal
distribution with mean zero and covariance matrix identity. In particular,
M (t) − M (s) is independent of Fs . Thus M has all the properties of Brown-
ian motion. (For the last technical point, see Lemma B.14 in the appendix.)
Exercises
Exercise 6.1. Check that for a Poisson process N Itô’s formula needs no
proof, in other words it reduces immediately to an obvious identity.
Chapter 7

Stochastic Differential Equations
Since the integral term vanishes at time zero, equation (7.1) contains
the initial value X(0) = H(0). The integral equation (7.1) can be written
in the differential form
(7.2) dX(t) = dH(t) + F (t, X) dY (t), X(0) = H(0)
where the initial value must then be displayed explicitly. The notation can
be further simplified by dropping the superfluous time variables:
(7.3) dX = dH + F (t, X) dY, X(0) = H(0).
Equations (7.2) and (7.3) have no other interpretation except as abbrevi-
ations for (7.1). Equations such as (7.3) are known as SDE’s (stochas-
tic differential equations) even though rigorously speaking they are integral
equations.
and then identify the left-hand side zx0 + azx as the derivative (zx)0 . The
equation becomes
(d/dt) [ x(t) exp{ ∫_0^t a(s) ds } ] = g(t) exp{ ∫_0^t a(s) ds }.
Integrating from 0 to t gives
x(t) exp{ ∫_0^t a(s) ds } − x(0) = ∫_0^t g(s) exp{ ∫_0^s a(u) du } ds ,
which rearranges into
x(t) = x(0) exp{ −∫_0^t a(s) ds }
 + exp{ −∫_0^t a(s) ds } ∫_0^t g(s) exp{ ∫_0^s a(u) du } ds.
Now one can check by differentiation that this formula gives a solution.
We leave this to the reader. The process defined by the SDE (7.8) or by the
formula (7.9) is known as the Ornstein-Uhlenbeck process.
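The differentiation check left to the reader can also be done numerically. A sketch with assumed constant coefficients, where the integral formula above reduces to a closed form:

```python
import math

aa, g, x0, t = 0.8, 1.5, 2.0, 1.7    # assumed constants: a(s) = aa, g(s) = g

# With constant coefficients the formula gives
#   x(t) = x0 e^{-a t} + (g/a)(1 - e^{-a t}).
closed = x0 * math.exp(-aa * t) + (g / aa) * (1.0 - math.exp(-aa * t))

# Evaluate the displayed formula directly: midpoint rule for
#   e^{-a t} * (x0 + int_0^t g e^{a s} ds).
n = 200000
integral = sum(g * math.exp(aa * (i + 0.5) * t / n) for i in range(n)) * (t / n)
formula = math.exp(-aa * t) * (x0 + integral)
assert abs(formula - closed) < 1e-6

# Differentiation check: x'(t) = -a x(t) + g.
h = 1e-6
x = lambda s: x0 * math.exp(-aa * s) + (g / aa) * (1.0 - math.exp(-aa * s))
xp = (x(t + h) - x(t)) / h
assert abs(xp - (-aa * closed + g)) < 1e-3
```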
Solutions of the previous two equations are defined for all time. Our
last example of a linear equation is not.
Example 7.4 (Brownian bridge). Fix 0 < T < 1. The SDE is now
(7.12)
dX = −X/(1 − t) dt + dB, for 0 ≤ t ≤ T , with X_0 = 0.
The integrating factor
Z_t = exp{ ∫_0^t ds/(1 − s) } = 1/(1 − t)
works, and we arrive at the solution
(7.13)
X_t = (1 − t) ∫_0^t 1/(1 − s) dB_s .
To check that this solves (7.12), apply the product formula d(UV) = U dV +
V dU + d[U, V] with U_t = 1 − t and V_t = ∫_0^t (1 − s)^{−1} dB_s = (1 − t)^{−1} X_t .
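One consequence of (7.13), via the Itô isometry, is the familiar Brownian bridge variance Var X_t = (1−t)² ∫_0^t (1−s)^{−2} ds = t(1−t). The deterministic integral can be checked by quadrature (t = 0.6 is an assumed example value):

```python
t = 0.6                                   # any 0 < t < T < 1
n = 100000
ds = t / n

# Midpoint rule for int_0^t (1-s)^{-2} ds; the exact value is 1/(1-t) - 1.
integral = sum(1.0 / (1.0 - (i + 0.5) * ds) ** 2 for i in range(n)) * ds
variance = (1.0 - t) ** 2 * integral

assert abs(variance - t * (1.0 - t)) < 1e-6
```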
Proof. The uniqueness of the solution of (7.15) will follow from the general
uniqueness theorem for solutions of semimartingale equations.
has only finitely many factors because a cadlag path has only finitely many
jumps exceeding a given size in a bounded time interval. Hence this part is
piecewise constant, cadlag and in particular FV. Let ξs = ∆Ys 1{|∆Y (s)|<1/2}
denote a jump of magnitude below 1/2. It remains to show that
H_t = Π_{s∈(0,t]} (1 + ξ_s) exp(−ξ_s) = exp{ Σ_{s∈(0,t]} ( log(1 + ξ_s) − ξ_s ) }
It follows that log Ht is a cadlag FV process (see Example 1.14 in this con-
text). Since the exponential function is locally Lipschitz, Ht = exp(log Ht )
is also a cadlag FV process.
To summarize, we have shown that eWt in (7.16) is a semimartingale
and Ut in (7.17) is a well-defined real-valued FV process. Consequently
Zt = eWt Ut is a semimartingale.
The second part of the proof is to show that Z satisfies equation (7.15).
Let f(w, u) = e^w u, and find
f_w = f ,  f_u = e^w ,  f_{uu} = 0,  f_{uw} = e^w  and  f_{ww} = f .
Note that ∆Ws = ∆Ys because the jump in [Y ] at s equals exactly (∆Ys )2 .
A straightforward application of Itô's formula gives
Z_t = 1 + ∫_{(0,t]} e^{W(s−)} dU_s + ∫_{(0,t]} Z_{s−} dW_s
 + ½ ∫_{(0,t]} Z_{s−} d[W]_s + ∫_{(0,t]} e^{W(s−)} d[W, U]_s
 + Σ_{s∈(0,t]} { ∆Z_s − Z_{s−}∆Y_s − e^{W(s−)}∆U_s − ½ Z_{s−}(∆Y_s)² − e^{W(s−)}∆Y_s ∆U_s }.
Since
W_t = Y_t − ½( [Y]_t − Σ_{s∈(0,t]} (∆Y_s)² ),
(i) F is a map from the space R+ ×Ω×DRd [0, ∞) into the space Rd×m
of d×m matrices. F satisfies a spatial Lipschitz condition uniformly
in the other variables: there exists a finite constant L such that this
holds for all (t, ω) ∈ R+ × Ω and all η, ζ ∈ DRd [0, ∞):
The Lipschitz condition in part (i) of Assumption 7.7 implies that F (t, ω, · )
is a function of the stopped path
η^{t−}(s) = η(0) for all s ≥ 0 when t = 0;  η^{t−}(s) = η(s) for 0 ≤ s < t;
and η^{t−}(s) = η(t−) for s ≥ t > 0.
In other words, the function F (t, ω, · ) only depends on the path on the time
interval [0, t).
Part (ii) guarantees that the stochastic integral ∫ F(s, X) dY(s) exists
for an arbitrary adapted cadlag process X and semimartingale Y . The
existence of the stopping times {νk } in part (ii) can be verified via this local
boundedness condition.
Lemma 7.9. Assume F satisfies parts (i) and (ii) of Assumption 7.7.
Suppose there exists a path ζ̄ ∈ DRd [0, ∞) such that for all T < ∞,
(7.20)
c(T) = sup_{t∈[0,T], ω∈Ω} |F(t, ω, ζ̄)| < ∞.
Then for any adapted Rd -valued cadlag process X there exist stopping times
νk % ∞ such that 1(0,νk ] (t)F (t, X) is bounded for each k.
Proof. Define
ν_k = inf{t ≥ 0 : |X(t)| ≥ k} ∧ inf{t > 0 : |X(t−)| ≥ k} ∧ k.
These are bounded stopping times by Lemma 2.8. |X(s)| ≤ k for 0 ≤ s < ν_k
(if ν_k = 0 we cannot claim that |X(0)| ≤ k). The stopped process
X^{ν_k−}(t) = X(0) if ν_k = 0;  X^{ν_k−}(t) = X(t) for 0 ≤ t < ν_k ;
and X^{ν_k−}(t) = X(ν_k−) for t ≥ ν_k > 0
The last line above is a finite quantity because ζ̄ is locally bounded, being
a cadlag path.
Here is how to apply the theorem to an equation that is not defined for
all time.
Proof. Extend the filtration, H, Y and F to all time in this manner: for
t ∈ (T, ∞) and ω ∈ Ω define Ft = FT , Ht (ω) = HT (ω), Yt (ω) = YT (ω), and
F (t, ω, η) = 0. Then the extended processes H and Y and the coefficient F
satisfy all the original assumptions on all of [0, ∞). Note in particular that
if G(t) = F (t, X) is a predictable process for 0 ≤ t ≤ T , then extending
it as a constant to (T, ∞) produces a predictable process for 0 ≤ t < ∞.
And given a predictable process X on [0, ∞), let {σk } be the stopping times
given by the assumption, and then define
ν_k = σ_k if σ_k < T,  and  ν_k = ∞ if σ_k = T.
These stopping times satisfy part (ii) of Assumption 7.7 for [0, ∞). Now
Theorem 7.18 gives a solution X for all time 0 ≤ t < ∞ for the extended
equation, and on [0, T ] X solves the equation with the original H, Y and F .
For the uniqueness part, given a solution X of the equation on [0, T ],
extend it to all time by defining Xt = XT for t ∈ (T, ∞). Then we have
a solution of the extended equation on [0, ∞), and the uniqueness theorem
applies to that.
Corollary 7.11. Let the assumptions be as in Theorem 7.8, except that the
Lipschitz assumption is weakened to this: for each 0 < T < ∞ there exists
a finite constant L(T ) such that this holds for all (t, ω) ∈ [0, T ] × Ω and all
η, ζ ∈ DRd [0, ∞):
Proof. For k ∈ N, the function 1{0≤t≤k} F(t, ω, η) satisfies the original
hypotheses. By Theorem 7.8 there exists a process X_k that satisfies the
equation
(7.22)
X_k(t) = H^k(t) + ∫_{(0,t]} 1_{[0,k]}(s) F(s, X_k) dY^k(s).
Since this holds for all k, X is a solution to the original SDE (7.18).
Uniqueness works similarly. If X and X̃ solve (7.18), then X(k ∧ t) and
X̃(k ∧ t) solve (7.22). By the uniqueness theorem X(k ∧ t) = X̃(k ∧ t) for
all t, and since k can be taken arbitrary, X = X̃.
Example 7.12. Here are ways by which a coefficient F satisfying Assump-
tion 7.7 can arise.
(i) Let f (t, ω, x) be a P ⊗ BRd -measurable function from (R+ × Ω) × Rd
into d × m-matrices. Assume f satisfies the Lipschitz condition
|f (t, ω, x) − f (t, ω, y)| ≤ L(T )|x − y|
for (t, ω) ∈ [0, T ] × Ω and x, y ∈ Rd , and the local boundedness condition
sup{ |f(t, ω, 0)| : 0 ≤ t ≤ T, ω ∈ Ω } < ∞
for all 0 < T < ∞. Then put F (t, ω, η) = f (t, ω, η(t−)).
Next, let us specialize the existence and uniqueness theorem to Itô equa-
tions.
Corollary 7.13. Let Bt be a standard Brownian motion in Rm with respect
to a right-continuous filtration {Ft } and ξ an Rd -valued F0 -measurable ran-
dom variable. Fix 0 < T < ∞. Assume the functions b : [0, T ] × Rd → Rd
and σ : [0, T ] × Rd → Rd×m satisfy the Lipschitz condition
|b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ L|x − y|
and the bound
|b(t, x)| + |σ(t, x)| ≤ L(1 + |x|)
for a constant L and all 0 ≤ t ≤ T , x, y ∈ Rd .
Then there exists a unique continuous process X on [0, T ] that is adapted
to {Ft } and satisfies
(7.24)
X_t = ξ + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dB_s
for 0 ≤ t ≤ T .
Proof. To fit this into Theorem 7.8, let Y(t) = [t, B_t]^T , H(t) = ξ, and
F(t, ω, η) = ( b(t, η(t−)), σ(t, η(t−)) ).
Next we present the obligatory basic examples from ODE theory that
illustrate the loss of existence and uniqueness when the Lipschitz assumption
on F is weakened.
Example 7.14. Consider the equation
x(t) = ∫_0^t 2 √(x(s)) ds.
The function f(x) = √x is not Lipschitz on [0, 1] because f′(x) blows up
at the origin. The equation has infinitely many solutions. Two of them are
x(t) = 0 and x(t) = t².
The equation
x(t) = 1 + ∫_0^t x²(s) ds
does not have a solution for all time. The unique solution starting at t = 0
is x(t) = (1 − t)−1 which exists only for 0 ≤ t < 1.
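Both claims are easy to confirm by quadrature; the midpoint grid below is an assumed discretization.

```python
n = 100000
t = 0.9
dt = t / n

# Non-uniqueness: x(t) = t^2 satisfies x(t) = int_0^t 2 sqrt(x(s)) ds,
# since 2 sqrt(s^2) = 2s on [0, t].  (x = 0 is another solution.)
integral = sum(2.0 * ((i + 0.5) * dt) for i in range(n)) * dt
assert abs(integral - t * t) < 1e-6

# Blow-up: x(t) = 1/(1-t) satisfies x(t) = 1 + int_0^t x(s)^2 ds for t < 1,
# and the right side grows without bound as t -> 1.
integral2 = sum((1.0 / (1.0 - (i + 0.5) * dt)) ** 2 for i in range(n)) * dt
assert abs(1.0 + integral2 - 1.0 / (1.0 - t)) < 1e-4
```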
Proof. The indefinite integral ∫_a^t g(s) ds of an integrable function is an absolutely
continuous (AC) function. At Lebesgue-almost every t it is differentiable
and the derivative equals g(t). Consequently the equation
(d/dt) [ e^{−Bt} ∫_a^t g(s) ds ] = −B e^{−Bt} ∫_a^t g(s) ds + e^{−Bt} g(t)
Proof. As {tni } goes through partitions of [0, t] with mesh tending to zero,
{γ(tni )} goes through partitions of [0, γ(t)] with mesh tending to zero. By
Proposition 5.34, both sides of (7.25) equal the limit of the sums
Σ_i X(γ(t^n_i)) ( Y(t^n_{i+1}) − Y(t^n_i) ).
Lemma 7.18. Suppose A is a nondecreasing cadlag function such that
A(0) = 0, and Z is a nondecreasing real-valued cadlag function. Then
γ(u) = inf{t ≥ 0 : A(t) > u}
Lemma 7.19. Let X be an adapted cadlag process and α > 0. Let τ1 <
τ2 < τ3 < · · · be the times of successive jumps in X of magnitude above α:
with τ0 = 0,
τk = inf{t > τk−1 : |X(t) − X(t−)| > α}.
Then the {τk } are stopping times.
from which
X(u_i t/n) − X(u_i t/n − t/n) > α + 1/m.
This shows ω ∈ A.
(If ε > 0 satisfies σ + ν < t − ε, then any rational r ∈ (ν, ν + ε) will do.) By
the definition of F̄ν ,
A ∩ {ν < r} = ∪_{m∈N} ( A ∩ {ν ≤ r − 1/m} ) ∈ F̄_r = F_{σ+r} ,
m∈N
and consequently by the definition of Fσ+r ,
A ∩ {ν < r} ∩ {σ + r ≤ t} ∈ Ft .
If A = Ω, we have shown that {σ + ν < t} ∈ F_t . By Lemma 2.5 and the
right-continuity of {F_t}, this implies that σ + ν is an {F_t}-stopping time.
For the general A ∈ F̄_ν we have shown that
A ∩ {σ + ν ≤ t} = ∩_{m≥n} ( A ∩ {σ + ν < t + 1/m} ) ∈ F_{t+(1/n)} .
m≥n
If A(t) ≥ u then by strict monotonicity A(s) > u for all s > t, which implies
γ(u) ≤ t.
Strict monotonicity of A gives continuity of γ. Right continuity of γ
was already argued in Lemma 7.18. For left continuity, let s < γ(u). Then
A(s) < u, because A(s) ≥ u together with strict increasingness would imply
the existence of a point t ∈ (s, γ(u)) such that A(t) > u, contradicting
t < γ(u). Then γ(v) ≥ s for v ∈ (A(s), u), which shows left continuity.
In summary: u ↦ γ(u) is continuous and nondecreasing, each γ(u) is a
bounded stopping time, and γ(u) ≤ u. For any given ω and T , once u > A(T)
we have γ(u) ≥ T , and so γ(u) → ∞ as u → ∞.
If Y is of type (δ0 , K0 ) for any δ0 ≤ δ and K0 ≤ K then all jumps of A
satisfy |∆A(t)| ≤ c. This follows because the jumps of quadratic variation
and total variation obey ∆[M ](t) = (∆M (t))2 and ∆VSj (t) = |∆Sj (t)|.
For ℓ = 1, 2, let H_ℓ , X_ℓ , and Z_ℓ be adapted R^d-valued cadlag processes.
Assume they satisfy the equations
(7.35)
Z_ℓ(t) = H_ℓ(t) + ∫_{(0,t]} F(s, X_ℓ) dY(s),  ℓ = 1, 2,
and
(7.37)
φ_X(u) = E[ D_X ∘ γ(u) ] = E[ sup_{0≤s≤γ(u)} |X_1(s) − X_2(s)|² ].
(7.41)
φ_X(u) ≤ ( 2φ_H(u) / (1 − c) ) exp{ u / (1 − c) }.
≤ ½ L^{−2}(u + c) ‖G‖²_∞ .
(7.43)
| ∫_{(0,t]} G(s) dY(s) |² ≤ 2 Σ_{i=1}^{d} Σ_{j=1}^{m} | ∫_{(0,t]} G_{i,j}(s) dM_j(s) |²
 + 2 Σ_{i=1}^{d} Σ_{j=1}^{m} | ∫_{(0,t]} G_{i,j}(s) dS_j(s) |².
of stochastic integrals,
(7.44)
E[ sup_{0≤t≤γ(u)} | Σ_{j=1}^{m} ∫_{(0,t]} G_{i,j}(s) dM_j(s) |² ]
 ≤ 4 E| Σ_{j=1}^{m} ∫_{(0,γ(u)]} G_{i,j}(s) dM_j(s) |²
 ≤ 4m Σ_{j=1}^{m} E| ∫_{(0,γ(u)]} G_{i,j}(s) dM_j(s) |²
 ≤ 4m Σ_{j=1}^{m} E ∫_{(0,γ(u)]} G_{i,j}(s)² d[M_j](s).

 ≤ m Σ_{j=1}^{m} E[ V_{S_j}(γ(u)) ∫_{(0,γ(u)]} G_{i,j}(s)² dV_{S_j}(s) ].
Now we prove (7.42). Equations (7.43), (7.44) and (7.45), together with
the hypothesis VSj (t) ≤ K, imply that
E[ sup_{0≤t≤γ(u)} | ∫_{(0,t]} G(s) dY(s) |² ]
 ≤ 8dm Σ_{j=1}^{m} E ∫_{(0,γ(u)]} |G(s)|² d[M_j](s)
 + 2Kdm Σ_{j=1}^{m} E ∫_{(0,γ(u)]} |G(s)|² dV_{S_j}(s)
 ≤ ½ L^{−2} E ∫_{(0,γ(u)]} |G(s)|² dA(s)
 ≤ ½ L^{−2} E[ sup_{0≤s≤γ(u)} |G(s)|² · A(γ(u)) ]
 ≤ ½ L^{−2}(u + c) ‖G‖²_∞ .
From A(γ(u)−) ≤ u and the bound c on the jumps in A came the bound
A(γ(u)) ≤ u + c used above. This completes the proof of Lemma 7.24.
≤ 2 L^{−2}(u + c) C_0² .
Now (7.47) follows from a combination of inequality (7.46), assumption
(7.38), and bound (7.48). Note that (7.47) does not require a bound on
X, due to the boundedness assumption on F and (7.38).
The Lipschitz assumption on F gives
| F(s, X_1) − F(s, X_2) |² ≤ L² D_X(s−).
Apply (7.42) together with the Lipschitz bound to get
(7.49)
E[ sup_{0≤t≤γ(u)} | ∫_{(0,t]} ( F(s, X_1) − F(s, X_2) ) dY(s) |² ]
 ≤ ½ L^{−2} E ∫_{(0,γ(u)]} | F(s, X_1) − F(s, X_2) |² dA(s)
 ≤ ½ E ∫_{(0,γ(u)]} D_X(s−) dA(s).
Now we prove part (a) under the assumption of bounded F . Take supremum
over 0 ≤ t ≤ γ(u) in (7.46), take expectations, and apply (7.49) to
write
(7.50)
φ_Z(u) = E[ D_Z ∘ γ(u) ] = E[ sup_{0≤t≤γ(u)} |Z_1(t) − Z_2(t)|² ]
 ≤ 2φ_H(u) + E ∫_{(0,γ(u)]} D_X(s−) dA(s).
To the dA-integral above apply first the change of variable from Lemma 7.16
and then inequality (7.26). This gives
(7.51)
φ_Z(u) ≤ 2φ_H(u) + E[ ( A(γ(u)) − u ) D_X ∘ γ(u−) ] + E ∫_{(0,u]} D_X ∘ γ(s−) ds.
For a fixed ω, cadlag paths are bounded on bounded time intervals, so applying
Lemmas 7.16 and 7.18 to the path-by-path integral ∫_{(0,γ(u)]} D_X(s−) dA(s)
is not problematic. And then, since the resulting terms are nonnegative,
their expectations exist.
By the definition of γ(u), A(s) ≤ u for s < γ(u), and so A(γ(u)−) ≤ u.
Thus by the bound c on the jumps of A,
A(γ(u)) − u ≤ A(γ(u)) − A(γ(u)−) ≤ c.
Applying this to (7.51) gives
(7.52)
φ_Z(u) ≤ 2φ_H(u) + c E[ D_X ∘ γ(u−) ] + E ∫_{(0,u]} D_X ∘ γ(s−) ds
 ≤ 2φ_H(u) + c φ_X(u) + ∫_{(0,u]} φ_X(s) ds.
This is the desired conclusion (7.39), and the proof of part (a) for the case
of bounded F is complete.
We prove part (b) for bounded F . By assumption, Z_ℓ = X_ℓ and c < 1.
Now φ_X(u) = φ_Z(u), and by (7.47) this function is finite. Since it is nondecreasing,
it is bounded on bounded intervals. Inequality (7.52) becomes
(1 − c) φ_X(u) ≤ 2φ_H(u) + ∫_0^u φ_X(s) ds.
An application of Gronwall’s inequality (Lemma 7.15) gives the desired con-
clusion (7.41). This completes the proof of the proposition for the case of
bounded F .
Step 2. Return to the original hypotheses: Assumption 7.7 for F with-
out additional boundedness and Definition 7.22 for Y with type (δ, K − δ).
By part (ii) of Assumption 7.7, we can pick stopping times σk % ∞ and
constants B_k such that
| 1_{[0,σ_k]}(s) F(s, X_ℓ) | ≤ B_k  for ℓ = 1, 2.
while
lim_{k→∞} E[ sup_{0≤s≤γ(u)} |X_1^{σ_k−}(s) − X_2^{σ_k−}(s)|² ] = φ_X(u)
Using the previous inequality, the outcome from Step 1 can be written
for part (a) as
E[ sup_{0≤s≤γ(u)} |Z_1^{σ_k−}(s) − Z_2^{σ_k−}(s)|² ] ≤ 2φ_H(u) + c φ_X(u) + ∫_0^u φ_X(s) ds,
and for part (b) as
E[ sup_{0≤s≤γ(u)} |X_1^{σ_k−}(s) − X_2^{σ_k−}(s)|² ] ≤ ( 2φ_H(u) / (1 − c) ) exp{ u / (1 − c) }.
As k % ∞ the left-hand sides of the inequalities above converge to the
desired expectations. Parts (a) and (b) both follow, and the proof is com-
plete.
and
Past lemmas guarantee that τ1 and τ2 are stopping times. For τ3 , observe
that since VGj (t) is nondecreasing and cadlag,
{τ_3 ≤ t} = ∪_{j=1}^{m} { V_{G_j}(t) ≥ K − 2δ }.
(7.55) σ = τ1 ∧ τ2 ∧ τ3 ∧ T.
Stopping does not introduce new jumps, so the jumps of M σ are still bounded
by δ/2. The jumps of Y σ− are bounded by δ/2 since σ ≤ τ2 . On [0, σ) the
FV part S(t) = G^{σ−}(t) − ∆M(σ) 1{t ≥ σ} has the jumps of G^{σ−}. These are
bounded by δ because ∆G^{σ−}(t) = ∆Y^{σ−}(t) − ∆M^{σ−}(t). At time σ, S has
the jump ∆M(σ), bounded by δ/2. The total variation of a component S_j
of S is
V_{S_j}(t) ≤ V_{G_j^{σ−}}(t) + |∆M_j(σ)| ≤ V_{G_j}(τ_3−) + δ/2 ≤ K − 2δ + δ/2
 ≤ K − δ.
All the hypotheses of part (b) of Proposition 7.23 are satisfied, and we get
E[ sup_{0≤t≤γ(u)} |X_1^{σ−}(t) − X_2^{σ−}(t)|² ] = 0
for any u > 0, where γ(u) is the stopping time defined by (7.33). As we
let u % ∞ we get X1σ− (t) = X2σ− (t) for all 0 ≤ t < ∞. This implies that
X1 (t) = X2 (t) for 0 ≤ t < σ.
At time σ,
X_1(σ) = H(σ) + ∫_{(0,σ]} F(s, X_1) dY(s)
(7.58)
 = H(σ) + ∫_{(0,σ]} F(s, X_2) dY(s)
 = X_2(σ),
because the integrand F (s, X1 ) depends only on {X1 (s) : 0 ≤ s < σ} which
agrees with the corresponding segment of X2 , as established in the previous
paragraph. Now we know that X1 (t) = X2 (t) for 0 ≤ t ≤ σ. This concludes
the proof of Step 1.
Proof of Lemma 7.26. We check that the new F̄ satisfies all the hypothe-
ses. The Lipschitz property is immediate. Let Z̄ be a cadlag process adapted
to {F̄_t}. Define the process Z by
Z(t) = X(t) for t < σ,  and  Z(t) = X(σ) + Z̄(t − σ) for t ≥ σ.
Then Z is a cadlag process adapted to {Ft } by Lemma 7.21. F̄ (t, Z̄) =
F (σ + t, Z) is predictable under {F̄t } by Lemma 5.42. Find stopping times
νk % ∞ such that 1(0,νk ] (s)F (s, Z) is bounded for each k. Define ρk =
(νk − σ)+ . Then ρk % ∞, by Lemma 7.21 ρk is a stopping time for {F̄t },
and 1(0,ρk ] (s)F̄ (s, Z̄) = 1(0,νk ] (σ + s)F (σ + s, Z) which is bounded.
Ȳ is a semimartingale by Theorem 5.41. X̄ and H̄ are adapted to {F̄t }
by part (iii) of Lemma 2.2 (recall that cadlag paths imply progressive mea-
surability).
The equation for X̄ checks as follows:
X̄(t) = X(σ + t) − X(σ)
 = H(σ + t) − H(σ) + ∫_{(σ,σ+t]} F(s, X) dY(s)
 = H̄(t) + ∫_{(0,t]} F(σ + s, X) dȲ(s)
 = H̄(t) + ∫_{(0,t]} F̄(s, X̄) dȲ(s).
The next to last equality is from (5.48), and the last equality from the
definition of F̄ and ζ ω,X̄ = X.
\[
\le \tfrac{1}{2} L^{-2}(u+c)B_k^2.
\]
Part (a) of Proposition 7.23 applied to $(Z_1, Z_2) = (X_n^{\nu_k-}, X_{n+1}^{\nu_k-})$ and $(H_1, H_2) = (H^{\nu_k-}, H^{\nu_k-})$ gives
\[
(7.62)\qquad \phi_{n+1}(u) \le c\,\phi_n(u) + \int_0^u \phi_n(s)\, ds.
\]
Lemma 7.30. Fix $0 < T < \infty$. Let $\{\phi_n\}$ be nonnegative measurable functions on $[0,T]$ such that $\phi_0 \le B$ for some constant $B$, and inequality (7.62) is satisfied for all $n \ge 0$ and $0 \le u \le T$. Then for all $n$ and $0 \le u \le T$,
\[
(7.63)\qquad \phi_n(u) \le B \sum_{k=0}^n \frac{1}{k!}\binom{n}{k} u^k c^{n-k}.
\]
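A quick sanity check on Lemma 7.30, outside the text's argument: iterating (7.62) with equality, starting from the constant function $\phi_0 = B$, reproduces the right-hand side of (7.63) exactly, which exact rational arithmetic confirms (the function names here are ad hoc):

```python
from fractions import Fraction
from math import comb, factorial

def iterate(phi, c):
    # phi_n given as coefficient list [a0, a1, ...] of a polynomial in u;
    # return the coefficients of c*phi_n(u) + integral_0^u phi_n(s) ds
    integ = [Fraction(0)] + [a / (i + 1) for i, a in enumerate(phi)]
    scaled = [c * a for a in phi] + [Fraction(0)]
    return [x + y for x, y in zip(scaled, integ)]

B, c = Fraction(3), Fraction(1, 2)
phi = [B]                                   # phi_0(u) = B
for n in range(1, 8):
    phi = iterate(phi, c)
    # coefficients of B * sum_{k=0}^n (1/k!) C(n,k) u^k c^{n-k}
    bound = [B * comb(n, k) * c**(n - k) / factorial(k) for k in range(n + 1)]
    assert phi == bound                     # (7.63) holds, here with equality
```

The Pascal-rule step $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$ that drives the induction is exactly what makes the coefficient lists match.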
Proof. First check this auxiliary equality for $0 < x < 1$ and $k \ge 0$:
\[
(7.64)\qquad \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\, x^m = \frac{k!}{(1-x)^{k+1}}.
\]
One way to see this is to write the left-hand sum as
\[
\begin{aligned}
\sum_{m=0}^\infty \frac{d^k}{dx^k}\, x^{m+k}
&= \frac{d^k}{dx^k} \sum_{m=0}^\infty x^{m+k}
= \frac{d^k}{dx^k}\, \frac{x^k}{1-x}\\
&= \sum_{j=0}^k \binom{k}{j} \frac{d^j}{dx^j}\, \frac{1}{1-x} \cdot \frac{d^{k-j}}{dx^{k-j}}\, x^k\\
&= \sum_{j=0}^k \binom{k}{j} \frac{j!}{(1-x)^{j+1}} \cdot k(k-1)\cdots(j+1)\, x^j\\
&= \frac{k!}{1-x} \sum_{j=0}^k \binom{k}{j} \Big(\frac{x}{1-x}\Big)^j
= \frac{k!}{1-x} \Big(1 + \frac{x}{1-x}\Big)^k\\
&= \frac{k!}{(1-x)^{k+1}}.
\end{aligned}
\]
For an alternative proof of (7.64) see Exercise 7.2.
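Identity (7.64) is also easy to confirm numerically by truncating the series; a small sketch:

```python
from math import factorial, prod

def lhs(x, k, terms=2000):
    # partial sum of sum_{m>=0} (m+1)(m+2)...(m+k) x^m
    return sum(prod(range(m + 1, m + k + 1)) * x**m for m in range(terms))

def rhs(x, k):
    return factorial(k) / (1 - x)**(k + 1)

for k in range(5):
    for x in (0.1, 0.5, 0.9):
        assert abs(lhs(x, k) - rhs(x, k)) < 1e-9 * rhs(x, k)
```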
After changing the order of summation, the sum in the statement of the lemma becomes
\[
\begin{aligned}
\sum_{k=0}^\infty \frac{u^k}{(k!)^2} \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\,\delta^{n-k}
&= \sum_{k=0}^\infty \frac{u^k}{(k!)^2} \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\,\delta^m\\
&= \sum_{k=0}^\infty \frac{u^k}{(k!)^2} \cdot \frac{k!}{(1-\delta)^{k+1}}
= \frac{1}{1-\delta} \sum_{k=0}^\infty \frac{1}{k!}\Big(\frac{u}{1-\delta}\Big)^k\\
&= \frac{1}{1-\delta}\cdot \exp\Big\{\frac{u}{1-\delta}\Big\}.
\end{aligned}
\]
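The closed form just computed agrees with the double sum numerically; a quick check (truncating at 300 terms, which is ample for the parameter values used):

```python
from math import comb, factorial, exp

def double_sum(u, delta, N=300):
    # sum_{n>=0} sum_{k=0}^n (1/k!) C(n,k) u^k delta^{n-k}, truncated at n = N
    return sum(comb(n, k) / factorial(k) * u**k * delta**(n - k)
               for n in range(N) for k in range(n + 1))

for u, delta in [(0.5, 0.2), (1.0, 0.5), (2.0, 0.1)]:
    closed = exp(u / (1 - delta)) / (1 - delta)
    assert abs(double_sum(u, delta) - closed) < 1e-9 * closed
```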
It follows by the next Borel-Cantelli argument that for almost every ω, the
cadlag functions {Xnνk − (t) : n ∈ Z+ } on the interval t ∈ [0, γ(u)] form a
Cauchy sequence in the uniform norm. (Recall that we are still holding k
fixed.) Pick $\alpha \in (c, 1)$. By Chebychev's inequality and (7.63),
\[
\sum_{n=0}^\infty P\Big\{\sup_{0\le t\le\gamma(u)} |X_{n+1}^{\nu_k-}(t) - X_n^{\nu_k-}(t)| \ge \alpha^{n/2}\Big\}
\le \sum_{n=0}^\infty \alpha^{-n}\phi_n(u)
\le B \sum_{n=0}^\infty \sum_{k=0}^n \frac{1}{k!}\binom{n}{k}\Big(\frac{u}{\alpha}\Big)^k\Big(\frac{c}{\alpha}\Big)^{n-k}.
\]
under uniform distance (Lemma A.3), for almost every $\omega$ there exists a cadlag function $t \mapsto \widetilde X_k(t)$ on the interval $[0, \gamma(u)]$ such that
For this we let $n \to \infty$ in (7.61) to obtain (7.68) in the limit. The left side of (7.61) converges to the left side of (7.68) almost surely, uniformly on compact time intervals by (7.67). For the right side of (7.61) we apply Theorem 5.40. From the Lipschitz property of $F$,
\[
\big| F(t, \widetilde X_k) - F(t, X_n^{\nu_k-}) \big| \le L \cdot \sup_{0\le s< t} \big| \widetilde X_k(s) - X_n^{\nu_k-}(s) \big|.
\]
(Note the range of the supremum over s.) The convergence in (7.67) gives
the hypothesis in Theorem 5.40, and we conclude that the right side of
(7.61) converges to the right side of (7.68), in probability, uniformly over
t in compact time intervals. We can get almost sure convergence along a
subsequence. Equation (7.68) has been verified.
This can be repeated for all values of $k$. The limit (7.67) implies that if $k < m$, then $\widetilde X_m = \widetilde X_k$ on $[0, \nu_k)$. Since $\nu_k \nearrow \infty$, we conclude that there is a single well-defined cadlag process $X$ on $[0,\infty)$ such that $X = \widetilde X_k$ on $[0,\nu_k)$. On $[0,\nu_k)$ equation (7.68) agrees term by term with the equation
\[
(7.69)\qquad X(t) = H(t) + \int_{(0,t]} F(s, X)\, dY(s).
\]
For the integral term this follows from part (b) of Proposition 5.46 and the manner of dependence of $F$ on the path: since $X = \widetilde X_k$ on $[0,\nu_k)$, $F(s, X) = F(s, \widetilde X_k)$ on $[0,\nu_k]$.
Define
\[
X_1(t) =
\begin{cases}
\widetilde X(t), & 0 \le t < \rho_1\\
X_1(\rho_1-) + \Delta H(\rho_1) + F(\rho_1, \widetilde X)\,\Delta Y(\rho_1), & t \ge \rho_1.
\end{cases}
\]
The equality of stochastic integrals of the last two lines above is an instance
of the general identity G · Y τ = (G · Y )τ (Proposition 5.35). The case k = 1
of the lemma has been proved.
Assume a process $X_k(t)$ solves (7.70). Define $\bar{\mathcal F}_t = \mathcal F_{\rho_k+t}$,
\[
\bar H(t) = H(\rho_k + t) - H(\rho_k), \qquad \bar Y(t) = Y(\rho_k + t) - Y(\rho_k),
\]
and $\bar F(t, \omega, \eta) = F(\rho_k + t, \omega, \zeta^{\omega,\eta})$, where the cadlag path $\zeta^{\omega,\eta}$ is defined by
\[
\zeta^{\omega,\eta}(s) =
\begin{cases}
X_k(s), & 0 \le s < \rho_k\\
X_k(\rho_k) + \eta(s-\rho_k), & s \ge \rho_k.
\end{cases}
\]
under the filtration {F̄t }. We need to check that this equation is the type
to which Proposition 7.28 applies. Semimartingale Ȳ σk+1 − satisfies the as-
sumption of Proposition 7.28, again by the argument already used in the
uniqueness proof. F̄ satisfies Assumption 7.7 exactly as was proved earlier
for Lemma 7.26.
The hypotheses of Proposition 7.28 have been checked, and so there
exists a process $\bar X$ that solves (7.72). Note that $\bar X(0) = \bar H(0) = 0$. Define
\[
X_{k+1}(t) =
\begin{cases}
X_k(t), & t < \rho_k\\
X_k(\rho_k) + \bar X(t-\rho_k), & \rho_k \le t < \rho_{k+1}\\
X_{k+1}(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1}), & t \ge \rho_{k+1}.
\end{cases}
\]
The last case of the definition above makes sense because it depends on the
segment {Xk+1 (s) : 0 ≤ s < ρk+1 } defined by the two preceding cases.
By induction $X_{k+1}$ satisfies the equation (7.70) for $k+1$ on $[0,\rho_k]$. From the definition of $\bar F$, $\bar F(s, \bar X) = F(\rho_k+s, X_{k+1})$ for $0 \le s \le \sigma_{k+1}$. Then for $\rho_k < t < \rho_{k+1}$,
\[
\begin{aligned}
X_{k+1}(t) &= X_k(\rho_k) + \bar X(t-\rho_k)\\
&= X_k(\rho_k) + \bar H(t-\rho_k) + \int_{(0,t-\rho_k]} \bar F(s, \bar X)\, d\bar Y(s)\\
&= X_k(\rho_k) + H(t) - H(\rho_k) + \int_{(\rho_k,t]} F(s, X_{k+1})\, dY(s)\\
&= H(t) + \int_{(0,t]} F(s, X_{k+1})\, dY(s)\\
&= H^{\rho_{k+1}}(t) + \int_{(0,t]} F(s, X_{k+1})\, dY^{\rho_{k+1}}(s).
\end{aligned}
\]
The last line of the definition of $X_{k+1}$ extends the validity of the equation to $t \ge \rho_{k+1}$:
\[
\begin{aligned}
X_{k+1}(t) &= X_{k+1}(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1})\\
&= H(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + \int_{(0,\rho_{k+1})} F(s, X_{k+1})\, dY(s) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1})\\
&= H(\rho_{k+1}) + \int_{(0,\rho_{k+1}]} F(s, X_{k+1})\, dY(s)\\
&= H^{\rho_{k+1}}(t) + \int_{(0,t]} F(s, X_{k+1})\, dY^{\rho_{k+1}}(s).
\end{aligned}
\]
We are ready to finish off the proof of Theorem 7.32. If $k < m$, stopping the processes of the equation
\[
X_m(t) = H^{\rho_m}(t) + \int_{(0,t]} F(s, X_m)\, dY^{\rho_m}(s)
\]
at $\rho_k$ gives the equation
\[
X_m^{\rho_k}(t) = H^{\rho_k}(t) + \int_{(0,t]} F(s, X_m^{\rho_k})\, dY^{\rho_k}(s).
\]
By the uniqueness theorem, $X_m^{\rho_k} = X_k$ for $k < m$. Consequently there exists a process $X$ that satisfies $X = X_k$ on $[0,\rho_k]$ for each $k$. Then for $0 \le t \le \rho_k$, equation (7.70) agrees term by term with the desired equation
\[
X(t) = H(t) + \int_{(0,t]} F(s, X)\, dY(s).
\]
Hence this equation is valid on every [0, ρk ], and thereby on [0, ∞). The
existence and uniqueness theorem has been proved.
Exercises
Exercise 7.1. (a) Show that for any $g \in C[0,1]$,
\[
\lim_{t \nearrow 1}\, (1-t) \int_0^t \frac{g(s)}{(1-s)^2}\, ds = g(1).
\]
(b) Let the process Xt be defined by (7.13) for 0 ≤ t < 1. Show that
Xt → 0 as t → 1.
Hint. Apply Exercise 6.2 and then part (a).
Exercise 7.2. Here is an alternative inductive proof of the identity (7.64) used in the existence proof for solutions of SDE's. Fix $-1 < x < 1$ and let
\[
a_k = \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\, x^m
\]
and
\[
b_k = \sum_{m=0}^\infty m(m+1)(m+2)\cdots(m+k)\, x^m.
\]
Compute $a_1$ explicitly, then derive the identities $b_k = x\,a_{k+1}$ and $a_{k+1} = (k+1)a_k + b_k$.
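The identities in Exercise 7.2 can be confirmed numerically before proving them (this checks, but of course does not replace, the induction; the helper names are ad hoc):

```python
def prod_range(lo, hi):
    # lo*(lo+1)*...*hi; empty product (hi < lo) is 1
    p = 1
    for i in range(lo, hi + 1):
        p *= i
    return p

def series(coeff, x, terms=3000):
    return sum(coeff(m) * x**m for m in range(terms))

x = 0.4
assert abs(series(lambda m: m + 1, x) - 1 / (1 - x)**2) < 1e-9   # a_1 = (1-x)^{-2}
for k in range(1, 6):
    a_k  = series(lambda m: prod_range(m + 1, m + k), x)
    a_k1 = series(lambda m: prod_range(m + 1, m + k + 1), x)
    b_k  = series(lambda m: m * prod_range(m + 1, m + k), x)
    assert abs(b_k - x * a_k1) < 1e-9 * a_k1                     # b_k = x a_{k+1}
    assert abs(a_k1 - ((k + 1) * a_k + b_k)) < 1e-9 * a_k1       # a_{k+1} = (k+1)a_k + b_k
```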
Appendix A
Analysis
(b) Fix $t \in [a,b]$. We first show $f(s) \to f(t)$ as $s \searrow t$. (If $t = b$ no approach from the right is possible and there is nothing to prove.) Let $\varepsilon > 0$. Pick $n$ so that
\[
\sup_{x\in[a,b]} |f_n(x) - f(x)| \le \varepsilon.
\]
and since $\varepsilon > 0$ was arbitrary, the existence of the limit $f(t-) = \lim_{s\nearrow t} f(s)$ follows.
(c) Right continuity of f follows from part (b), and left continuity is
proved by the same argument.
Lemma A.5. Suppose f has left and right limits at all points in [0, T ]. Let
α > 0. Define the set of jumps of magnitude at least α by
(A.2) U = {t ∈ [0, T ] : |f (t+) − f (t−)| ≥ α}
with the interpretations f (0−) = f (0) and f (T +) = f (T ). Then U is finite.
Consequently f can have at most countably many jumps in [0, T ].
Proof. If U were infinite, it would have a limit point s ∈ [0, T ]. This means
that every interval (s − δ, s + δ) contains a point of U , other than s itself.
But since the limits f (s±) both exist, we can pick δ small enough so that
|f (r) − f (t)| < α/2 for all pairs r, t ∈ (s − δ, s), and all pairs r, t ∈ (s, s + δ).
Then also |f (t+) − f (t−)| ≤ α/2 for all t ∈ (s − δ, s) ∪ (s, s + δ), and so these
intervals cannot contain any point from U .
if the limit exists as the mesh of the partition π = {0 = s0 < · · · < sm(π) = t}
tends to zero. The quadratic variation of f is [f ] = [f, f ].
In the next development we write down sums of the type $\sum_{\alpha\in A} x(\alpha)$ where $A$ is an arbitrary set and $x : A \to \mathbf{R}$ a function. Such a sum can be defined as follows: the sum has a finite value $c$ if for every $\varepsilon > 0$ there exists a finite set $B \subseteq A$ such that if $E$ is a finite set with $B \subseteq E \subseteq A$ then
\[
(A.4)\qquad \Big| \sum_{\alpha\in E} x(\alpha) - c \Big| \le \varepsilon.
\]
If $\sum_A x(\alpha)$ has a finite value, then $x(\alpha) \ne 0$ for at most countably many $\alpha$-values. In the above condition, the set $B$ must contain all $\alpha$ such that $|x(\alpha)| > 2\varepsilon$, for otherwise adding on one such term violates the inequality. In other words, the set $\{\alpha : |x(\alpha)| \ge \eta\}$ is finite for any $\eta > 0$.
If $x(\alpha) \ge 0$ always, then
\[
\sum_{\alpha\in A} x(\alpha) = \sup\Big\{ \sum_{\alpha\in B} x(\alpha) : B \text{ is a finite subset of } A \Big\}
\]
gives a value in $[0,\infty]$ which agrees with the definition above if it is finite. As for familiar series, absolute convergence implies convergence. In other words, if
\[
\sum_{\alpha\in A} |x(\alpha)| < \infty,
\]
then the sum $\sum_A x(\alpha)$ has a finite value.
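For instance, with $A = \mathbf{Z}_+$ and $x(m) = (-1/2)^m$ the absolute sums are bounded by $2$, and every summation over a large enough finite subset lands near the value $2/3$ regardless of the order of the terms; a small sketch of the definition in action:

```python
import random

x = lambda m: (-0.5) ** m          # sum_m |x(m)| = 2 < infinity
value = 2 / 3                      # geometric series: sum_m (-1/2)^m = 2/3

random.seed(1)
for _ in range(5):
    E = list(range(200))           # a finite E containing a suitable B
    random.shuffle(E)              # enumeration order is irrelevant
    assert abs(sum(x(m) for m in E) - value) < 1e-12
```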
Lemma A.7. Let $f$ be a function with left and right limits on $[0,T]$. Then
\[
\sum_{s\in[0,T]} |f(s+) - f(s-)| \le V_f(T).
\]
The sum is actually over a countable set because f has at most countably
many jumps.
and
\[
[f_n, g_n]_T = \sum_{s\in(0,T]} \big(f_n(s) - f_n(s-)\big)\big(g_n(s) - g_n(s-)\big)
\]
Shrink $\delta$ further so that $\delta < \varepsilon/C_0$. Then for any finite $H \supseteq U_n(\delta)$,
\[
\begin{aligned}
\Big| [f_n, g_n]_T - \sum_{s\in H} \big(f_n(s) - f_n(s-)\big)\big(g_n(s) - g_n(s-)\big) \Big|
&\le \sum_{s\in(0,T]\setminus H} |f_n(s) - f_n(s-)| \cdot |g_n(s) - g_n(s-)|\\
&\le \delta \sum_{s\in(0,T]} |f_n(s) - f_n(s-)| \le \delta C_0 \le \varepsilon.
\end{aligned}
\]
Now if s ∈ Un (δ), |gn (s) − gn (s−)| ≥ δ and the above inequality imply
|g(s) − g(s−)| ≥ δ − α/2. This jump cannot fall in the forbidden range
(δ − α, δ), so in fact it must satisfy |g(s) − g(s−)| ≥ δ and then s ∈ U (δ).
Now we can complete the argument. Take $n$ large enough so that $U(\delta) \supset U_n(\delta)$, and take $H = U(\delta)$ in the estimate above. Putting the estimates together gives
\[
\big| [f,g] - [f_n, g_n] \big| \le 2\varepsilon
+ \Big| \sum_{s\in U(\delta)} \big(f(s) - f(s-)\big)\big(g(s) - g(s-)\big)
- \sum_{s\in U(\delta)} \big(f_n(s) - f_n(s-)\big)\big(g_n(s) - g_n(s-)\big) \Big|.
\]
As U (δ) is a fixed finite set, the difference of two sums over U tends to zero
as n → ∞. Since ε > 0 was arbitrary, the proof is complete.
The Euclidean norm is $|x| = (x_1^2 + \cdots + x_d^2)^{1/2}$, and we apply this also to matrices in the form
\[
|A| = \Big( \sum_{1\le i,j\le d} a_{i,j}^2 \Big)^{1/2}.
\]
\[
(A.7)\qquad
\gamma(s,t,x,y) =
\begin{cases}
\dfrac{\phi(s,t,x,y)}{|t-s| + |y-x|^2}, & s \ne t \text{ or } x \ne y\\[1ex]
0, & s = t \text{ and } x = y
\end{cases}
\]
is also continuous on $[0,T]^2 \times K^2$. Let $\pi^\ell = \{0 = t_0^\ell < t_1^\ell < \cdots < t_{m(\ell)}^\ell = T\}$ be a sequence of partitions of $[0,T]$ such that $\operatorname{mesh}(\pi^\ell) \to 0$ as $\ell \to \infty$, and
\[
(A.8)\qquad C_0 = \sup_\ell \sum_{i=0}^{m(\ell)-1} \big| g(t_{i+1}^\ell) - g(t_i^\ell) \big|^2 < \infty.
\]
Then
\[
(A.9)\qquad \lim_{\ell\to\infty} \sum_{i=0}^{m(\ell)-1} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) = \sum_{s\in(0,T]} \phi\big(s, s, g(s-), g(s)\big).
\]
by the cadlag property, because for each $\ell$, $t_{i(k)}^\ell < s_k \le t_{i(k)+1}^\ell$, while both extremes converge to $s_k$. The sum on the left-hand side of (A.10) is by definition the supremum of sums over finite sets, hence the inequality in (A.10) follows.
By continuity of $\gamma$ there exists a constant $C_1$ such that
\[
(A.11)\qquad |\phi(s,t,x,y)| \le C_1\big( |t-s| + |y-x|^2 \big)
\]
for all $s,t \in [0,T]$ and $x,y \in K$. From (A.11) and (A.10) we get the bound
\[
\sum_{s\in(0,T]} \big| \phi\big(s,s,g(s-),g(s)\big) \big| \le C_0 C_1 < \infty.
\]
This absolute convergence implies that the sum on the right-hand side of (A.9) can be approximated by finite sums. Given $\varepsilon > 0$, pick $\alpha > 0$ small enough so that
\[
\Big| \sum_{s\in U_\alpha} \phi\big(s,s,g(s-),g(s)\big) - \sum_{s\in(0,T]} \phi\big(s,s,g(s-),g(s)\big) \Big| \le \varepsilon
\]
where
\[
U_\alpha = \{ s \in (0,T] : |g(s) - g(s-)| \ge \alpha \}.
\]
\[
\begin{aligned}
\Big| \sum_{i=0}^{m(\ell)-1} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) - \sum_{s\in(0,T]} \phi\big(s,s,g(s-),g(s)\big) \Big|
&\le \sum_{i\in I^\ell} \big| \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) \big|\\
&\quad + \Big| \sum_{i\in J^\ell} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) - \sum_{s\in U_\alpha} \phi\big(s,s,g(s-),g(s)\big) \Big| + \varepsilon.
\end{aligned}
\]
The first sum after the inequality above is bounded above by $\varepsilon$, by (A.8), (A.12) and (A.13). The difference of two sums in absolute values vanishes as $\ell \to \infty$, because for large enough $\ell$ each interval $(t_i^\ell, t_{i+1}^\ell]$ for $i \in J^\ell$ contains a unique $s \in U_\alpha$, and as $\ell \to \infty$,
by the cadlag property. (Note that $U_\alpha$ is finite by Lemma A.5 and for large enough $\ell$ the index set $J^\ell$ has exactly one term for each $s \in U_\alpha$.) We conclude
\[
\limsup_{\ell\to\infty} \Big| \sum_{i=0}^{m(\ell)-1} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) - \sum_{s\in(0,T]} \phi\big(s,s,g(s-),g(s)\big) \Big| \le 2\varepsilon.
\]
\[
f_{x_1} = \frac{\partial f}{\partial x_1} \qquad\text{and}\qquad f_{x_1,x_2} = \frac{\partial^2 f}{\partial x_1 \partial x_2}.
\]
These will always be applied to functions with continuous partial derivatives, so the order of differentiation does not matter. The gradient $Df$ is the column vector of first-order partial derivatives:
\[
Df(x) = \big( f_{x_1}(x), f_{x_2}(x), \ldots, f_{x_d}(x) \big)^T.
\]
satisfies
\[
\psi(s,x,y) = f(s,y) - f(s,x) - f_x(s,x)(y-x).
\]
By the mean value theorem there exists a point $\tau$ between $s$ and $t$ such that
\[
f(t,y) - f(s,y) = f_t(\tau,y)(t-s).
\]
By the intermediate value theorem there exists a point $\theta$ between $x$ and $y$ such that
\[
\psi(x,y) = \int_x^y f''(z)(y-z)\, dz = \tfrac{1}{2} f''(\theta)(y-x)^2.
\]
The application of the intermediate value theorem goes like this. Let $f''(a)$ and $f''(b)$ be the minimum and maximum of $f''$ in $[x,y]$ (or $[y,x]$ if $y < x$). Then
\[
f''(a) \le \frac{\psi(x,y)}{\tfrac{1}{2}(y-x)^2} \le f''(b).
\]
The intermediate value theorem gives a point $\theta$ between $a$ and $b$ such that
\[
f''(\theta) = \frac{\psi(x,y)}{\tfrac{1}{2}(y-x)^2}.
\]
Proof. Let $\widetilde\nu$ denote the measure restricted to the subspace $B$, and $\widetilde g_n$ and $\widetilde g$ denote functions restricted to this space. Since
\[
\int_B |\widetilde g_n - \widetilde g|^p\, d\widetilde\nu = \int_B |g_n - g|^p\, d\nu \le \int_X |g_n - g|^p\, d\nu
\]
we have $L^p(\widetilde\nu)$ convergence $\widetilde g_n \to \widetilde g$. $L^p$ norms (as all norms) are subject to the triangle inequality, and so
\[
\big|\, \|\widetilde g_n\|_{L^p(\widetilde\nu)} - \|\widetilde g\|_{L^p(\widetilde\nu)} \big| \le \|\widetilde g_n - \widetilde g\|_{L^p(\widetilde\nu)} \to 0.
\]
Consequently
\[
\int_B |g_n|^p\, d\nu = \|\widetilde g_n\|_{L^p(\widetilde\nu)}^p \to \|\widetilde g\|_{L^p(\widetilde\nu)}^p = \int_B |g|^p\, d\nu.
\]
Proof. Check that the property is true for a step function of the type (A.17).
Then approximate an arbitrary f ∈ Lp (R) with a step function.
Proposition A.18. Let $T$ be an invertible linear transformation on $\mathbf{R}^n$ and $f$ a Borel or Lebesgue measurable function on $\mathbf{R}^n$. Then if $f$ is either in $L^1(\mathbf{R}^n)$ or nonnegative,
\[
(A.18)\qquad \int_{\mathbf{R}^n} f(x)\, dx = |\det T| \int_{\mathbf{R}^n} f(T(x))\, dx.
\]
Exercises
Exercise A.1. Let $A$ be a set and $x : A \to \mathbf{R}$ a function. Suppose
\[
c_1 \equiv \sup\Big\{ \sum_{\alpha\in B} |x(\alpha)| : B \text{ is a finite subset of } A \Big\} < \infty.
\]
Show that then the sum $\sum_{\alpha\in A} x(\alpha)$ has a finite value in the sense of the definition stated around equation (A.4).
Hint. Pick finite sets $B_k$ such that $\sum_{B_k} |x(\alpha)| > c_1 - 1/k$. Show that the sequence $a_k = \sum_{B_k} x(\alpha)$ is a Cauchy sequence. Show that $c = \lim a_k$ is the value of the sum.
Appendix B
Probability
For a proof, see the Appendix in [2]. The π-λ-theorem has the following
version for functions.
Theorem B.2. Let $R$ be a $\pi$-system on a space $X$ such that $X = \bigcup_i B_i$ for some pairwise disjoint sequence $B_i \in R$. Let $H$ be a linear space of bounded
functions on X. Assume that 1B ∈ H for all B ∈ R, and assume that H is
closed under bounded, increasing pointwise limits: if f1 ≤ f2 ≤ f3 ≤ · · · are
elements of H and supn,x fn (x) ≤ c for some constant c, then f = lim fn
lies in H. Then H contains all bounded σ(R)-measurable functions.
Proof. It suffices to check that $\mu(A) = \nu(A)$ for all $A \in \sigma(R)$ that lie inside some $R_i$. Then for a general $B \in \sigma(R)$,
\[
\mu(B) = \sum_i \mu(B \cap R_i) = \sum_i \nu(B \cap R_i) = \nu(B).
\]
Inside a fixed Rj , let
D = {A ∈ σ(R) : A ⊆ Rj , µ(A) = ν(A)}.
D is a λ-system. Checking property (2) uses the fact that Rj has finite
measure under µ and ν so we can subtract: if A ⊆ B and both lie in D, then
µ(B \ A) = µ(B) − µ(A) = ν(B) − ν(A) = ν(B \ A),
so B \A ∈ D. By hypothesis D contains the π-system {A ∈ R : A ⊆ Rj }. By
the π-λ-theorem D contains all the σ(R)-sets that are contained in Rj .
Lemma B.4. Let $\nu$ and $\mu$ be two finite Borel measures on a metric space $(S,d)$. Assume that
\[
(B.1)\qquad \int_S f\, d\mu = \int_S f\, d\nu
\]
for all bounded continuous functions $f$ on $S$. Then $\mu = \nu$.
Here are two definitions of classes of sets that are more primitive than $\sigma$-algebras, and hence easier to deal with. A collection $\mathcal A$ of subsets of $\Omega$ is an algebra if
(i) Ω ∈ A.
(ii) Ac ∈ A whenever A ∈ A.
(iii) A ∪ B ∈ A whenever A ∈ A and B ∈ A.
A collection $\mathcal S$ of subsets of $\Omega$ is a semialgebra if
(i) $\emptyset \in \mathcal S$.
(ii) If A, B ∈ S then also A ∩ B ∈ S.
(iii) If A ∈ S, then Ac is a finite disjoint union of elements of S.
\[
A^c = \bigcap_{1\le i\le m} S_i^c = \bigcup_{(k(1),\ldots,k(m))}\ \bigcap_{1\le i\le m} R_{i,k(i)}.
\]
Proof. Since the tail of a convergent series can be made arbitrarily small, we have
\[
P\Big( \bigcap_{m=1}^\infty \bigcup_{n=m}^\infty A_n \Big) \le \sum_{n=m}^\infty P(A_n) \to 0
\]
as $m \nearrow \infty$.
Proof. It suffices to show that every subsequence {nk } has a further sub-
subsequence {nkj } such that EXnkj → EX as j → ∞. So let {nk } be
given. Convergence in probability Xnk → X implies almost sure conver-
gence Xnkj → X along some subsubsequence {nkj }. The standard domi-
nated convergence theorem now gives EXnkj → EX.
Proof. Part (i). By the monotonicity of the sequence $E(X_n|\mathcal A)$ and the ordinary monotone convergence theorem, for $A \in \mathcal A$,
\[
E\big[ \mathbf 1_A \cdot \lim_{n\to\infty} E(X_n|\mathcal A) \big]
= \lim_{n\to\infty} E\big[ \mathbf 1_A\, E(X_n|\mathcal A) \big]
= \lim_{n\to\infty} E\big[ \mathbf 1_A X_n \big]
= E\big[ \mathbf 1_A X \big]
= E\big[ \mathbf 1_A\, E(X|\mathcal A) \big].
\]
Since $A \in \mathcal A$ is arbitrary, this implies the almost sure equality of the $\mathcal A$-measurable random variables $\lim_{n\to\infty} E(X_n|\mathcal A)$ and $E(X|\mathcal A)$.
Part (ii). The sequence $Y_k = \inf_{m\ge k} X_m$ increases up to $\varliminf X_n$. Thus by part (i),
\[
E\big( \varliminf_{n\to\infty} X_n \,\big|\, \mathcal A \big) = \lim_{n\to\infty} E\big( \inf_{k\ge n} X_k \,\big|\, \mathcal A \big) \le \varliminf_{n\to\infty} E(X_n|\mathcal A).
\]
Equivalently, the following two conditions are satisfied. (i) $\sup_\alpha E|X_\alpha| < \infty$. (ii) Given $\varepsilon > 0$, there exists a $\delta > 0$ such that for every event $B$ such that $P(B) \le \delta$,
\[
\sup_{\alpha\in A} \int_B |X_\alpha|\, dP \le \varepsilon.
\]
Proof. We have
lim E E[ |Xn − X| |A] = lim E |Xn − X| = 0,
n→∞ n→∞
and since
| E[Xn |A] − E[X|A] | ≤ E[ |Xn − X| |A],
we conclude that E[Xn |A] → E[X|A] in L1 . L1 convergence implies a.s.
convergence along some subsequence.
Lemma B.14. Let $X$ be a random $d$-vector and $\mathcal A$ a sub-$\sigma$-field on the probability space $(\Omega, \mathcal F, P)$. Let
\[
\phi(\theta) = \int_{\mathbf{R}^d} e^{i\theta^T x}\, \mu(dx) \qquad (\theta \in \mathbf{R}^d)
\]
that satisfies the Hölder property. The distribution of this extension will be
the Wiener measure on C.
Let
\[
(B.4)\qquad g_t(x) = \frac{1}{\sqrt{2\pi t}} \exp\Big\{ -\frac{x^2}{2t} \Big\}
\]
be the density of the normal distribution with mean zero and variance t (g for
Gaussian). For an increasing n-tuple of positive times 0 < t1 < t2 < · · · < tn ,
let t = (t1 , t2 , . . . , tn ). We shall write x = (x1 , . . . , xn ) for vectors in Rn ,
and abbreviate dx = dx1 dx2 · · · dxn to denote integration with respect to
Lebesgue measure on $\mathbf{R}^n$. Define a probability measure $\mu_{\mathbf t}$ on $\mathbf{R}^n$ by
\[
(B.5)\qquad \mu_{\mathbf t}(A) = \int_{\mathbf{R}^n} \mathbf 1_A(\mathbf x)\, g_{t_1}(x_1) \prod_{i=2}^n g_{t_i - t_{i-1}}(x_i - x_{i-1})\, d\mathbf x
\]
for A ∈ BRn . Before proceeding further, we check that this definition is
the right one, namely that µt is the distribution we want for the vector
(Bt1 , Bt2 , . . . , Btn ).
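Definition (B.5) says the coordinates are produced by adding independent mean-zero Gaussian increments with variances $t_i - t_{i-1}$. The simulation sketch below (function names ad hoc) samples from $\mu_{\mathbf t}$ this way and checks the Brownian covariance $\operatorname{Cov}(B_s, B_t) = s \wedge t$ up to Monte Carlo error:

```python
import random

def sample_mu_t(times, rng):
    # draw (B_{t_1}, ..., B_{t_n}) from mu_t by summing independent
    # N(0, t_i - t_{i-1}) increments, exactly the density in (B.5)
    b, prev, out = 0.0, 0.0, []
    for t in times:
        b += rng.gauss(0.0, (t - prev) ** 0.5)
        prev = t
        out.append(b)
    return out

rng = random.Random(0)
samples = [sample_mu_t([0.5, 1.0, 2.0], rng) for _ in range(20000)]
var1 = sum(s[1] ** 2 for s in samples) / len(samples)    # Var(B_1) = 1
cov = sum(s[0] * s[2] for s in samples) / len(samples)   # Cov(B_.5, B_2) = 0.5
assert abs(var1 - 1.0) < 0.05 and abs(cov - 0.5) < 0.05
```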
Lemma B.16. If a one-dimensional standard Brownian motion B exists,
then for $A \in \mathcal B_{\mathbf{R}^n}$,
\[
P\big( (B_{t_1}, B_{t_2}, \ldots, B_{t_n}) \in A \big) = \mu_{\mathbf t}(A).
\]
Let us convince ourselves again that this definition is the one we want.
Qs should represent the distribution of the vector (Bs1 , Bs2 , . . . , Bsn ), and
indeed this follows from Lemma B.16:
\[
\int_{\mathbf{R}^n} f(\pi^{-1}(\mathbf x))\, d\mu_{\pi\mathbf s}
= E\big[ (f\circ\pi^{-1})(B_{s_{\pi(1)}}, B_{s_{\pi(2)}}, \ldots, B_{s_{\pi(n)}}) \big]
= E\big[ f(B_{s_1}, B_{s_2}, \ldots, B_{s_n}) \big].
\]
For the second equality above, apply to yi = Bsi the identity
π −1 (yπ(1) , yπ(2) , . . . , yπ(n) ) = (y1 , y2 , . . . , yn )
which is a consequence of the way we defined the action of a permutation
on a vector in Section 1.2.4.
Let us check the consistency properties (i) and (ii) required for the Ex-
tension Theorem 1.27. Suppose t = ρs for two distinct n-tuples s and t
from Q2 and a permutation ρ. If π orders t then π ◦ ρ orders s, because
tπ(1) < tπ(2) implies sρ(π(1)) < sρ(π(2)) . One must avoid confusion over how
the action of permutations is composed:
\[
\pi(\rho\mathbf s) = \big( s_{\rho(\pi(1))}, s_{\rho(\pi(2))}, \ldots, s_{\rho(\pi(n))} \big)
\]
because $\big(\pi(\rho\mathbf s)\big)_i = (\rho\mathbf s)_{\pi(i)} = s_{\rho(\pi(i))}$. Then
This checks (i). Property (ii) will follow from this lemma.
Lemma B.17. Let $\mathbf t = (t_1, \ldots, t_n)$ be an ordered $n$-tuple, and let $\hat{\mathbf t} = (t_1, \ldots, t_{j-1}, t_{j+1}, \ldots, t_n)$ be the $(n-1)$-tuple obtained by removing $t_j$ from $\mathbf t$. Then for $A \in \mathcal B_{\mathbf{R}^{j-1}}$ and $B \in \mathcal B_{\mathbf{R}^{n-j}}$,
\[
\mu_{\mathbf t}(A \times \mathbf{R} \times B) = \mu_{\hat{\mathbf t}}(A \times B).
\]
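Behind Lemma B.17 is the convolution identity $\int g_{t_j - t_{j-1}}(x - y)\, g_{t_{j+1} - t_j}(z - x)\, dx = g_{t_{j+1} - t_{j-1}}(z - y)$: integrating out the removed coordinate merges two Gaussian factors into one. A numerical check with the density (B.4):

```python
from math import exp, pi, sqrt

def g(t, x):
    # the Gaussian density (B.4)
    return exp(-x * x / (2 * t)) / sqrt(2 * pi * t)

def convolve(t1, t2, z, lo=-30.0, hi=30.0, n=60000):
    # midpoint-rule approximation of integral g_{t1}(x) g_{t2}(z - x) dx
    h = (hi - lo) / n
    return h * sum(g(t1, lo + (i + 0.5) * h) * g(t2, z - (lo + (i + 0.5) * h))
                   for i in range(n))

for z in (-1.0, 0.0, 2.5):
    assert abs(convolve(0.7, 1.3, z) - g(2.0, z)) < 1e-9
```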
index sets. The next proposition is the main step of the so-called Kolmogorov-
Centsov criterion for continuity. We shall discuss this at the end of the sec-
tion. The process {Xq } referred to in the proposition is completely general,
while we will of course apply the result to the particular {Xq } defined above
on the probability space (Ω2 , G2 , Q).
Proposition B.18. Suppose $\{X_q : q \in Q_2^0\}$ is a stochastic process defined on some probability space $(\Omega, \mathcal F, P)$ with the following property: there exist constants $K < \infty$ and $\alpha, \beta > 0$ such that
\[
(B.8)\qquad E\big[ |X_s - X_r|^\beta \big] \le K|s-r|^{1+\alpha} \qquad \text{for all } r,s \in Q_2^0.
\]
Let $0 < \gamma < \alpha/\beta$ and $T < \infty$. Then for almost every $\omega$ there exists a constant $C(\omega) < \infty$ such that
\[
(B.9)\qquad |X_s(\omega) - X_r(\omega)| \le C(\omega)|s-r|^\gamma \qquad \text{for all } r,s \in Q_2^0 \cap [0,T].
\]
We prove the claim and after that return to the main thread of the proof. Let $q, r \in Q_2^0 \cap [0, 2^M]$ satisfy $0 < r - q < 2^{-N(1-\eta)}$. Pick an integer $m \ge N$ such that
\[
2^{-(m+1)(1-\eta)} \le r - q < 2^{-m(1-\eta)}.
\]
Pick integers $i$ and $j$ such that
\[
(i-1)2^{-m} < q \le i2^{-m} \qquad\text{and}\qquad j2^{-m} \le r < (j+1)2^{-m}.
\]
Then necessarily $i \le j$ because by (B.10),
\[
r - q \ge 2^{-(m+1)(1-\eta)} > 2^{-m},
\]
so there must be at least one dyadic rational of the type $k2^{-m}$ in the interval $(q, r)$. On the other hand $j - i \le 2^m(r-q) < 2^{m\eta}$, so $(i,j) \in I_m$.
We can express the dyadic rationals $q$ and $r$ as
\[
r = j2^{-m} + 2^{-r(1)} + \cdots + 2^{-r(k)} \qquad\text{and}\qquad q = i2^{-m} - 2^{-q(1)} - \cdots - 2^{-q(\ell)}
\]
for integers
\[
m < r(1) < r(2) < \cdots < r(k) \qquad\text{and}\qquad m < q(1) < q(2) < \cdots < q(\ell).
\]
To see this for $r$, let $r = h2^{-L}$, so that from $j2^{-m} \le r < (j+1)2^{-m}$ follows $j2^{L-m} \le h < (j+1)2^{L-m}$. Then $h$ is of the form
\[
h = j2^{L-m} + \sum_{p=0}^{L-m-1} a_p 2^p
\]
for $a_p \in \{0,1\}$, and dividing this by $2^L$ gives the expression for $r$.
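The expansion used here is just the binary expansion of a dyadic rational beyond position $m$; a small sketch (the function name is ad hoc) recovers $j$ and the exponents $r(1) < r(2) < \cdots$:

```python
def dyadic_split(r, m):
    # write the dyadic rational r as j*2**-m + 2**-r1 + 2**-r2 + ...
    # with m < r1 < r2 < ..., returning (j, [r1, r2, ...])
    j = int(r // 2**-m)            # j*2**-m <= r < (j+1)*2**-m
    rest, exps, p = r - j * 2**-m, [], m
    while rest > 0:                # terminates: r has a finite binary expansion
        p += 1
        if rest >= 2**-p:
            rest -= 2**-p
            exps.append(p)
    return j, exps

j, exps = dyadic_split(0.8125, 2)  # 13/16 = 3*(1/4) + 1/16
assert (j, exps) == (3, [4])
assert j * 2**-2 + sum(2**-p for p in exps) == 0.8125
```

Dyadic rationals are represented exactly in binary floating point, so the equalities above hold without rounding error.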
We bound the difference in three parts:
\[
|X_q - X_r| \le |X_q - X_{i2^{-m}}| + |X_{i2^{-m}} - X_{j2^{-m}}| + |X_{j2^{-m}} - X_r|.
\]
The middle term satisfies
\[
|X_{i2^{-m}} - X_{j2^{-m}}| \le (j-i)^\gamma 2^{-m\gamma} \le 2^{-m(1-\eta)\gamma}
\]
because we are on the event $H_N$ which lies inside $G_m$. For the first term,
\[
|X_q - X_{i2^{-m}}| \le \sum_{h=1}^{\ell} \big| X_{i2^{-m} - 2^{-q(1)} - \cdots - 2^{-q(h)}} - X_{i2^{-m} - 2^{-q(1)} - \cdots - 2^{-q(h-1)}} \big|
\le \sum_{h=1}^{\ell} \big( 2^{-q(h)} \big)^\gamma \le \sum_{p=m+1}^\infty (2^{-\gamma})^p = \frac{2^{-\gamma(m+1)}}{1 - 2^{-\gamma}}.
\]
and by using the definition of the event $G_{q(h)}$. A similar argument gives the same bound for the third term. Together we have
\[
\begin{aligned}
|X_q - X_r| &\le 2^{-m(1-\eta)\gamma} + \frac{2\cdot 2^{-\gamma(m+1)}}{1 - 2^{-\gamma}}\\
&\le 2^{-(m+1)(1-\eta)\gamma} \cdot \frac{2^{(1-\eta)\gamma}}{1 - 2^{-\gamma}} + 2^{-(m+1)(1-\eta)\gamma} \cdot \frac{2}{1 - 2^{-\gamma}}\\
&\le |r-q|^\gamma \cdot \frac{2^{(1-\eta)\gamma} + 2}{1 - 2^{-\gamma}}.
\end{aligned}
\]
This completes the proof of (B.11).
Now we finish the proof of the proposition with the help of (B.11). First,
\[
\sum_{N\ge1} P(H_N^c) \le \sum_{N\ge1} \sum_{n\ge N} P(G_n^c) \le \sum_{N\ge1} \sum_{n\ge N} K 2^M 2^{-n\lambda} = \frac{K 2^M}{1 - 2^{-\lambda}} \sum_{N\ge1} 2^{-N\lambda} < \infty.
\]
Corollary B.19. Let $\{X_q\}$ be the process defined by (B.7) on the probability space $(\Omega_2, \mathcal G_2, Q)$, where $Q$ is the probability measure whose existence came from Kolmogorov's Extension Theorem. Let $0 < \gamma < \tfrac12$. Then there is an event $\Gamma$ such that $Q(\Gamma) = 1$ with this property: for all $\xi \in \Gamma$ and $T < \infty$, there exists a finite constant $C_T(\xi)$ such that
\[
(B.12)\qquad |X_s(\xi) - X_r(\xi)| \le C_T(\xi)|s-r|^\gamma \qquad \text{for all } r,s \in Q_2^0 \cap [0,T].
\]
In particular, for $\xi \in \Gamma$ the function $q \mapsto X_q(\xi)$ is uniformly continuous on $Q_2^0 \cap [0,T]$ for every $T < \infty$.
Proof. We need to check the hypothesis (B.8). Due to the definition of the
finite-dimensional distributions of Q, this reduces to computing a moment
of the Gaussian distribution. Fix an integer $m \ge 2$ large enough so that $\tfrac12 - \tfrac{1}{2m} > \gamma$. Let $0 \le q < r$ be indices in $Q_2^0$. In the next calculation, note that after changing variables in the $dy_2$-integral it no longer depends on $y_1$, and the $y_1$-variable can be integrated away.
\[
\begin{aligned}
E^Q\big[ (X_r - X_q)^{2m} \big]
&= \iint_{\mathbf{R}^2} (y_2 - y_1)^{2m} g_q(y_1)\, g_{r-q}(y_2 - y_1)\, dy_1\, dy_2\\
&= \int_{\mathbf{R}} dy_1\, g_q(y_1) \int_{\mathbf{R}} dy_2\, (y_2 - y_1)^{2m} g_{r-q}(y_2 - y_1)\\
&= \int_{\mathbf{R}} dy_1\, g_q(y_1) \int_{\mathbf{R}} dx\, x^{2m} g_{r-q}(x) = \int_{\mathbf{R}} dx\, x^{2m} g_{r-q}(x)\\
&= \frac{1}{\sqrt{2\pi(r-q)}} \int_{\mathbf{R}} x^{2m} \exp\Big\{ -\frac{x^2}{2(r-q)} \Big\}\, dx\\
&= (r-q)^m \frac{1}{\sqrt{2\pi}} \int_{\mathbf{R}} z^{2m} \exp\Big\{ -\frac{z^2}{2} \Big\}\, dz = C_m |r-q|^m,
\end{aligned}
\]
where $C_m = 1\cdot3\cdot5\cdots(2m-1)$, the product of the odd integers less than $2m$. We have verified the hypothesis (B.8) for the values $\alpha = m-1$ and $\beta = 2m$, and by choice of $m$,
\[
0 < \gamma < \alpha/\beta = \tfrac12 - \tfrac{1}{2m}.
\]
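The even moments of the standard Gaussian used above, $\frac{1}{\sqrt{2\pi}}\int z^{2m} e^{-z^2/2}\,dz = 1\cdot3\cdots(2m-1)$, are easy to confirm by direct numerical integration:

```python
from math import exp, pi, sqrt

def gauss_moment(m, lo=-12.0, hi=12.0, n=24000):
    # midpoint rule for (2*pi)**-0.5 * integral z**(2m) * exp(-z**2/2) dz
    h = (hi - lo) / n
    total = sum((lo + (i + 0.5) * h) ** (2 * m) * exp(-(lo + (i + 0.5) * h) ** 2 / 2)
                for i in range(n))
    return total * h / sqrt(2 * pi)

def odd_double_factorial(m):
    # C_m = 1 * 3 * 5 * ... * (2m - 1)
    p = 1
    for i in range(1, 2 * m, 2):
        p *= i
    return p

for m in range(1, 6):
    assert abs(gauss_moment(m) - odd_double_factorial(m)) < 1e-6 * odd_double_factorial(m)
```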
Proposition B.18 now implies the following. For each T < ∞ there exists
an event ΓT ⊆ Ω2 such that Q(ΓT ) = 1 and for every ξ ∈ ΓT there exists a
finite constant C(ξ) such that
(B.13) |Xr (ξ) − Xq (ξ)| ≤ C(ξ)|r − q|γ
for any $q, r \in [0,T] \cap Q_2^0$. Take
\[
\Gamma = \bigcap_{T=1}^\infty \Gamma_T.
\]
With the uniform continuity in hand, we can now extend the definition
of the process to the entire time line. By Lemma A.12, for each ξ ∈ Γ there
is a unique continuous function t 7→ Xt (ξ) for t ∈ [0, ∞) that coincides with
the earlier values $X_q(\xi)$ for $q \in Q_2^0$. The value $X_t(\xi)$ for any $t \notin Q_2^0$ can be defined by
\[
X_t(\xi) = \lim_{i\to\infty} X_{q_i}(\xi)
\]
for any sequence {qi } from Q02 such that qi → t. This tells us that the
random variables {Xt : 0 ≤ t < ∞} are measurable on Γ. To have a
continuous process Xt defined on all of Ω2 set
\[
X_t(\xi) = 0 \qquad \text{for } \xi \notin \Gamma \text{ and all } t \ge 0.
\]
Comparison with Lemma B.16 shows that (Xt1 , Xt2 , . . . , Xtn ) has the dis-
tribution that Brownian motion should have. An application of Lemma
B.4 is also needed here, to guarantee that it is enough to check continuous
functions φ. It follows that {Xt } has independent increments, because this
property is built into the definition of the distribution µt .
To complete the construction of Brownian motion and finish the proof of Theorem 2.21, we define the measure $P^0$ on $C$ as the distribution of the process $X = \{X_t\}$:
\[
P^0(A) = Q\{\xi \in \Omega_2 : X(\xi) \in A\} \qquad \text{for } A \in \mathcal B_C.
\]
define Yt = lim Xqi along any sequence qi ∈ Q02 ∩ [0, T ] that converges to
t. The hypothesis implies that then Xqi → Xt in probability, and hence
Xt = Yt almost surely. The limits extend the Hölder property (B.9) from
dyadic rational time points to all of $[0,T]$, as claimed in (B.15).
Bibliography
[1] R. M. Dudley. Real analysis and probability. The Wadsworth & Brooks/Cole Mathe-
matics Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove,
CA, 1989.
[2] R. Durrett. Probability: theory and examples. Duxbury Press, Belmont, CA, second
edition, 1996.
[3] G. B. Folland. Real analysis: Modern techniques and their applications. Pure and
Applied Mathematics. John Wiley & Sons Inc., New York, second edition, 1999.
[4] P. E. Protter. Stochastic integration and differential equations. Springer-Verlag, Berlin,
second edition, 2004.
[5] S. Resnick. Adventures in stochastic processes. Birkhäuser Boston Inc., Boston, MA,
1992.
[6] K. R. Stromberg. Introduction to classical real analysis. Wadsworth International, Bel-
mont, Calif., 1981. Wadsworth International Mathematics Series.
Index
filtration
augmented, 36
definition, 35
left-continuous, 41
right-continuous, 41
usual conditions, 41
path space, 50
C–space, 50
D–space, 50
Poisson process
compensated, 66, 68
homogeneous, 66
martingales, 68
not predictable, 167
on an abstract space, 64
predictable