Basics of Stochastic Analysis
Chapter 3. Martingales 69
§3.1. Optional Stopping 71
§3.2. Inequalities 75
§3.3. Local martingales and semimartingales 78
§3.4. Quadratic variation for local martingales 79
§3.5. Doob-Meyer decomposition 84
§3.6. Spaces of martingales 86
Exercises 90
Exercises 105
Chapter 5. Stochastic Integration of Predictable Processes 107
§5.1. Square-integrable martingale integrator 108
§5.2. Local square-integrable martingale integrator 135
§5.3. Semimartingale integrator 145
§5.4. Further properties of stochastic integrals 150
§5.5. Integrator with absolutely continuous Doléans measure 162
Exercises 167
Chapter 6. Itô’s formula 171
§6.1. Quadratic variation 171
§6.2. Itô’s formula 179
§6.3. Applications of Itô’s formula 187
Exercises 193
Chapter 7. Stochastic Differential Equations 195
§7.1. Examples of stochastic equations and solutions 196
§7.2. Existence and uniqueness for a semimartingale equation 203
§7.3. Proof of the existence and uniqueness theorem 208
Exercises 234
Appendix A. Analysis 235
§A.1. Continuous, cadlag and BV functions 236
§A.2. Differentiation and integration 247
Exercises 250
Appendix B. Probability 251
§B.1. General matters 251
§B.2. Construction of Brownian motion 257
Bibliography 269
Index 271
Chapter 1. Measures, Integrals, and Foundations of Probability Theory
In this chapter we sort out the integrals one typically encounters in courses
on calculus, analysis, measure theory, probability theory and various applied
subjects such as statistics and engineering. These are the Riemann integral,
the Riemann-Stieltjes integral, the Lebesgue integral and the Lebesgue-Stieltjes
integral. The starting point is the general Lebesgue integral on an
abstract measure space. The other integrals are special cases, even though
they might have different definitions.
This chapter is not a complete treatment of the basics of measure theory.
It provides a brief unified explanation for readers who have prior familiarity
with various notions of integration. To avoid unduly burdening this chapter,
many technical matters that we need later in the book have been relegated
to the appendix. For details that we have omitted and for proofs the reader
should turn to any of the standard textbook sources, such as Folland [3].
For students of probability, the Appendix of [2] is convenient.
In the second part of the chapter we go over the measure-theoretic foundations
of probability theory. Readers who know basic measure theory and
measure-theoretic probability can safely skip this chapter.
2 1. Measures, Integrals, and Foundations of Probability Theory
It is possible that F (∞) = ∞ and F (−∞) = −∞, but this will not hurt
the definition. One can show that µ0 satisfies the hypotheses of Theorem
1.5, and consequently there exists a measure µ on (R, BR ) that gives mass
F(b) − F(a) to each interval (a, b]. This measure is called the Lebesgue-Stieltjes
measure of the function F, and we shall denote µ by Λ_F to indicate
the connection with F.
The most important special case is Lebesgue measure which we shall
denote by m, obtained by taking F (x) = x.
On the other hand, if µ is a Borel measure on R such that µ(B) < ∞
for all bounded Borel sets, we can define a right-continuous nondecreasing
function by
$$G(0) = 0, \quad\text{and}\quad G(x) = \begin{cases} \mu(0,x], & x > 0, \\ -\mu(x,0], & x < 0, \end{cases}$$
and then µ = Λ_G. Thus Lebesgue-Stieltjes measures give us all the Borel
measures that are finite on bounded sets.
1.1.3. The integral. Let (X, A, µ) be a fixed measure space. To say that a
function f : X → R or f : X → [−∞, ∞] is measurable is always interpreted
with the Borel σ-algebra on R or [−∞, ∞]. In either case, it suffices to
check that {f ≤ t} ∈ A for each real t. The Lebesgue integral is defined
in several stages, starting with cases for which the integral can be written
out explicitly. This same pattern of proceeding from simple cases to general
cases will also be used to define stochastic integrals.
These steps complete the construction of the integral. Along the way
one proves that the integral has all the necessary properties, such as linearity
$$\int (\alpha f + \beta g)\,d\mu = \alpha \int f\,d\mu + \beta \int g\,d\mu,$$
monotonicity:
$$f \le g \quad\text{implies}\quad \int f\,d\mu \le \int g\,d\mu,$$
and the important inequality
$$\Big| \int f\,d\mu \Big| \le \int |f|\,d\mu.$$
Various notations are used for the integral $\int f\,d\mu$. Sometimes it is
desirable to indicate the space over which one integrates by $\int_X f\,d\mu$. Then one
can indicate integration over a subset A by defining
$$\int_A f\,d\mu = \int_X \mathbf{1}_A f\,d\mu.$$
To make the integration variable explicit, one can write
$$\int_X f(x)\,\mu(dx) \quad\text{or}\quad \int_X f(x)\,d\mu(x).$$
Since the integral is bilinear in both the function and the measure, the linear
functional notation hf, µi is used. Sometimes the notation is simplified to
µ(f ), or even to µf .
we can give the following explicit limit expression for the integral $\int f\,d\mu$ of
a [0, ∞]-valued function f. Define simple functions
$$f_n(x) = \sum_{k=0}^{n2^n - 1} 2^{-n}k \cdot \mathbf{1}_{\{2^{-n}k \le f < 2^{-n}(k+1)\}}(x) + n \cdot \mathbf{1}_{\{f \ge n\}}(x).$$
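As a numerical illustration (not from the text; the function and names are my choice), the simple functions fₙ can be integrated exactly for f(x) = x² on [0, 1] with Lebesgue measure m: since f is increasing, each level set {2⁻ⁿk ≤ f < 2⁻ⁿ(k+1)} is an interval whose length is computable in closed form, and the integrals of fₙ increase to ∫ f dm = 1/3.

```python
import math

def simple_approx_integral(n):
    """Integral of the simple function f_n approximating f(x) = x^2
    on [0, 1] with Lebesgue measure m.  Each term is
    2^-n k * m{ 2^-n k <= f < 2^-n (k+1) }; because f is increasing,
    that level set is [sqrt(2^-n k), sqrt(2^-n (k+1))) clipped to [0, 1]."""
    total = 0.0
    for k in range(n * 2**n):
        lo, hi = k * 2.0**-n, (k + 1) * 2.0**-n
        a = min(math.sqrt(lo), 1.0)
        b = min(math.sqrt(hi), 1.0)
        total += lo * (b - a)
    # the cap n * 1{f >= n} contributes nothing here, since f <= 1 on [0, 1]
    return total
```

The values increase with n and converge to 1/3, in line with the monotone convergence of fₙ to f.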
Typically one proves then that every continuous function is Riemann integrable.
The definition of the Riemann integral is fundamentally different from
the definition of the Lebesgue integral. For the Riemann integral there is
one recipe for all functions, instead of a step-by-step definition that proceeds
from simple to complex cases. For the Riemann integral we partition the
domain [a, b], whereas the Lebesgue integral proceeds by partitioning the
range of f, as formula (1.3) makes explicit. This latter difference is sometimes
illustrated by counting the money in your pocket: the Riemann way
picks one coin at a time from the pocket, adds its value to the total, and
repeats this until all coins are counted. The Lebesgue way first partitions
the coins into pennies, nickels, dimes, etc., and then counts the piles. As the
coin-counting picture suggests, the Lebesgue way is more efficient (it leads
to a more general integral with superior properties) but when both apply,
the answers are the same. The precise relationship is the following, which
also gives the exact domain of applicability of the Riemann integral.
Theorem 1.10. Suppose f is a bounded function on [a, b].
(a) If f is a Riemann integrable function on [a, b], then f is Lebesgue
measurable, and the Riemann integral of f coincides with the Lebesgue
integral of f with respect to Lebesgue measure m on [a, b].
for a Borel or Lebesgue measurable function f on [a, b], even if the function
f is not Riemann integrable.
measure takes only values in [0, ∞]. If this point needs emphasizing, we use
the term positive measure as a synonym for measure.
For any signed measure ν, there exist unique positive measures ν⁺ and
ν⁻ such that ν = ν⁺ − ν⁻ and ν⁺ ⊥ ν⁻. (The statement ν⁺ ⊥ ν⁻ reads "ν⁺
and ν⁻ are mutually singular," and means that there exists a measurable
set A such that ν⁺(A) = ν⁻(Aᶜ) = 0.) The measure ν⁺ is the positive
variation of ν, ν⁻ is the negative variation of ν, and ν = ν⁺ − ν⁻ the
Jordan decomposition of ν. There exist measurable sets P and N such that
P ∪ N = X, P ∩ N = ∅, and ν⁺(A) = ν(A ∩ P) and ν⁻(A) = −ν(A ∩ N).
(P, N) is called the Hahn decomposition of ν. The total variation of ν is the
positive measure |ν| = ν⁺ + ν⁻. We say that the signed measure ν is σ-finite
if |ν| is σ-finite.
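For a signed measure with finitely many point masses these decompositions can be written out concretely. The sketch below (a hypothetical example; the weights and names are mine) takes P to be the atoms of positive mass and checks the defining identities.

```python
# A signed measure on the finite space {0, 1, 2, 3} given by point
# masses.  The Hahn decomposition takes P = atoms of positive mass and
# N = the rest; then nu+(A) = nu(A & P), nu-(A) = -nu(A & N), and the
# total variation is |nu| = nu+ + nu-.
weights = {0: 2.0, 1: -3.0, 2: 0.5, 3: -0.25}

P = {x for x, w in weights.items() if w > 0}
N = set(weights) - P

def nu(A):        return sum(weights[x] for x in A)
def nu_plus(A):   return nu(A & P)
def nu_minus(A):  return -nu(A & N)
def total_var(A): return nu_plus(A) + nu_minus(A)
```

For every subset A one gets ν(A) = ν⁺(A) − ν⁻(A), and ν⁺, ν⁻ are mutually singular because ν⁺ vanishes on N and ν⁻ vanishes on P.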
Integration with respect to a signed measure is defined by
$$(1.9)\qquad \int f\,d\nu = \int f\,d\nu^+ - \int f\,d\nu^-,$$
valid for all f for which the integrals on the right are finite.
Note for future reference that integrals with respect to |ν| can be
expressed in terms of ν by
$$(1.11)\qquad \int f\,d|\nu| = \int f\,d\nu^+ + \int f\,d\nu^- = \int (\mathbf{1}_P - \mathbf{1}_N)\,f\,d\nu.$$
The supremum above is taken over partitions of the interval [a, x]. F has
bounded variation on [a, b] if V_F(b) < ∞. BV[a, b] denotes the space of
functions with bounded variation on [a, b] (BV functions).
V_F is a nondecreasing function with V_F(a) = 0. F is a BV function iff
it is the difference of two bounded nondecreasing functions, and in case F
is BV, one way to write this decomposition is
$$F = \tfrac{1}{2}(V_F + F) - \tfrac{1}{2}(V_F - F).$$
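The decomposition can be checked on a sampled path. The sketch below (an illustration with a function of my choosing) computes the cumulative variation on a grid and verifies that both halves of the decomposition are nondecreasing and that their difference recovers F.

```python
def total_variation(values):
    """Cumulative variation of a function sampled on a grid:
    V[j] = sum of |values[i+1] - values[i]| for i < j.  For a piecewise
    monotone function sampled at its turning points this is exact."""
    V = [0.0]
    for prev, curr in zip(values, values[1:]):
        V.append(V[-1] + abs(curr - prev))
    return V

# sample F(x) = x^2 - x on [0, 2]: decreasing then increasing, hence BV
grid = [i / 1000 for i in range(2001)]
F = [x * x - x for x in grid]
V = total_variation(F)

# the two nondecreasing parts of F = (V_F + F)/2 - (V_F - F)/2
G = [0.5 * (v + f) for v, f in zip(V, F)]
H = [0.5 * (v - f) for v, f in zip(V, F)]
```

Here V_F(2) = 1/4 + 9/4 = 5/2, since F falls by 1/4 on [0, 1/2] and rises by 9/4 on [1/2, 2].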
F is BV [6, page 282]. The next lemma gives a version of this limit that
will be used frequently in the sequel.
Lemma 1.12. Let ν be a finite signed measure on (0, T]. Let f be a bounded
Borel function on [0, T] for which the left limit f(t−) exists at all 0 < t ≤ T.
Let $\pi^n = \{0 = s^n_1 < \cdots < s^n_{m(n)} = T\}$ be partitions of [0, T] such that
mesh(πⁿ) → 0. Then
$$\lim_{n\to\infty}\ \sup_{0\le t\le T}\ \Big|\ \sum_i f(s^n_i)\,\nu\big(s^n_i \wedge t,\ s^n_{i+1} \wedge t\big] - \int_{(0,t]} f(s-)\,\nu(ds)\ \Big| = 0.$$
In particular, for a right-continuous function G ∈ BV[0, T],
$$\lim_{n\to\infty}\ \sup_{0\le t\le T}\ \Big|\ \sum_i f(s^n_i)\big(G(s^n_{i+1} \wedge t) - G(s^n_i \wedge t)\big) - \int_{(0,t]} f(s-)\,dG(s)\ \Big| = 0.$$
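The second limit can be watched numerically in a simple case (an illustration of my choosing, not from the text): take f(s) = s and G(s) = s² on [0, 1], so the Lebesgue-Stieltjes integral is ∫₀¹ s · 2s ds = 2/3, and form the left-endpoint sums over uniform partitions.

```python
def stieltjes_sum(f, G, n, T=1.0):
    """Left-endpoint sum sum_i f(s_i)(G(s_{i+1}) - G(s_i)) over the
    uniform partition of [0, T] with mesh T/n, as in the lemma with t = T."""
    s = [T * i / n for i in range(n + 1)]
    return sum(f(s[i]) * (G(s[i + 1]) - G(s[i])) for i in range(n))

f = lambda s: s
G = lambda s: s * s          # continuous, hence right-continuous, BV on [0, 1]
approx = stieltjes_sum(f, G, 4000)
# limiting value: integral of f(s-) dG(s) = integral of s * 2s ds = 2/3
```

As the mesh shrinks the sums approach 2/3, consistent with the lemma.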
each A ∈ A,
$$(1.14)\qquad \nu(A) = \int_A f\,d\mu.$$
Some remarks are in order. Since either $\int_A f^+\,d\mu$ or $\int_A f^-\,d\mu$ is finite,
the integral $\int_A f\,d\mu$ has a well-defined value in [−∞, ∞]. The equality of
integrals (1.14) extends to measurable functions, so that
$$(1.15)\qquad \int g\,d\nu = \int g f\,d\mu$$
for all A-measurable functions g for which the integrals make sense. The
precise sense in which f is unique is this: if f̃ also satisfies (1.14) for all
A ∈ A, then µ{f ≠ f̃} = 0.
The function f is the Radon-Nikodym derivative of ν with respect to µ,
and denoted by f = dν/dµ. The derivative notation is very suggestive. It
leads to dν = f dµ, which tells us how to do the substitution in the integral.
Also, it suggests that
$$(1.16)\qquad \frac{d\nu}{d\rho}\cdot\frac{d\rho}{d\mu} = \frac{d\nu}{d\mu},$$
which is a true theorem under the right assumptions: suppose ν is a signed
measure, ρ and µ positive measures, all σ-finite, ν ≪ ρ and ρ ≪ µ. Then
$$\int g\,d\nu = \int g\cdot\frac{d\nu}{d\rho}\,d\rho = \int g\cdot\frac{d\nu}{d\rho}\cdot\frac{d\rho}{d\mu}\,d\mu$$
by two applications of (1.15). Since the Radon-Nikodym derivative is unique,
the equality above proves (1.16).
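The chain rule (1.16) can be checked numerically for concrete absolutely continuous measures (a sketch with measures of my choosing): on [0, 1] let µ be Lebesgue measure, ρ(dx) = 2x dx and ν(dx) = x ρ(dx), so the distribution functions are known in closed form and a difference quotient recovers each derivative.

```python
# Measures on [0, 1]: mu = Lebesgue, rho(dx) = 2x dx, nu(dx) = x rho(dx).
F_rho = lambda x: x**2             # rho(0, x]
F_nu  = lambda x: 2 * x**3 / 3     # nu(0, x] = int_0^x t * 2t dt

def derivative(F, x, h=1e-5):
    """Symmetric difference quotient; for these absolutely continuous
    measures it recovers the Radon-Nikodym derivative w.r.t. mu."""
    return (F(x + h) - F(x - h)) / (2 * h)

x = 0.4
dnu_dmu  = derivative(F_nu, x)     # should be 2x^2
drho_dmu = derivative(F_rho, x)    # should be 2x
dnu_drho = dnu_dmu / drho_dmu      # chain rule (1.16) rearranged: should be x
```

The computed ratio dν/dµ divided by dρ/dµ returns the density x of ν with respect to ρ, as (1.16) predicts.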
The last integral vanishes as tn & t because 1(t,tn ] (s) → 0 for each point s,
and the integral converges by dominated convergence. Thus F (t+) = F (t).
For any partition 0 = s₀ < s₁ < · · · < sₙ = T,
$$\sum_i \big|F(s_{i+1}) - F(s_i)\big| = \sum_i \Big|\int_{(s_i,s_{i+1}]} g\,d\nu\Big| \le \sum_i \int_{(s_i,s_{i+1}]} |g|\,d|\nu| = \int_{(0,T]} |g|\,d|\nu|.$$
By the assumption g ∈ L1 (ν) the last quantity above is a finite upper bound
on the sums of F -increments over all partitions. Hence F ∈ BV [0, T ].
The last issue is the equality of the two measures ΛF and g dν on (0, T ].
By Lemma B.3 it suffices to check the equality of the two measures for
intervals (a, b], because these types of intervals generate the Borel σ-algebra.
$$\Lambda_F(a,b] = F(b) - F(a) = \int_{[0,b]} g(s)\,\nu(ds) - \int_{[0,a]} g(s)\,\nu(ds) = \int_{(a,b]} g(s)\,\nu(ds).$$
This suffices.
On the other hand, the conclusion of the lemma on (0, T] would not change
if we defined F(0) = 0 and
$$F(t) = \int_{(0,t]} g(s)\,\nu(ds), \qquad 0 < t \le T.$$
This changes F by a constant and hence does not affect its total variation
or Lebesgue-Stieltjes measure.
coin tosses are independent (a term we discuss below) and fair (heads and
tails equally likely). Let S be the class of events of the form
A = {ω : (x1 , . . . , xn ) = (a1 , . . . , an )}
as n varies over N and (a1 , . . . , an ) varies over n-tuples of zeroes and ones.
Include ∅ and Ω to make S a semialgebra. Our assumptions dictate
that the probability of the event A should be P₀(A) = 2⁻ⁿ. One needs
to check that P0 satisfies the hypotheses of Theorem 1.5, and then the
mathematical machinery takes over. There exists a probability measure
P on (Ω, F) that agrees with P0 on S. This is a mathematical model of
a sequence of independent fair coin tosses. Natural random variables to
define on Ω are first the coordinate variables Xi (ω) = xi , and then variables
derived from these such as Sn = X1 + · · · + Xn , the number of ones among
the first n tosses. The random variables {Xi } are an example of an i.i.d.
sequence, which is short for independent and identically distributed.
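For a fixed n this model is small enough to enumerate completely. The sketch below (an illustration; the choice n = 6 is mine) assigns each n-tuple its mass 2⁻ⁿ and computes the distribution of Sₙ, which comes out binomial as independence and fairness dictate.

```python
from itertools import product

n = 6
# Each n-tuple of zeroes and ones determines an event A as above,
# with probability P0(A) = 2**-n under independent fair tosses.
outcomes = list(product([0, 1], repeat=n))
p = 2.0**-n

# distribution of S_n = X_1 + ... + X_n, the number of ones
dist = {k: 0.0 for k in range(n + 1)}
for omega in outcomes:
    dist[sum(omega)] += p
```

Summing the masses over the tuples with exactly k ones gives P(Sₙ = k) = C(n, k) 2⁻ⁿ.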
then Xn → X in L1 .
conditional expectation works with integrals on the real line with respect to
the distribution µ_Y of Y: for any B ∈ B_R,
$$(1.28)\qquad E\big[\mathbf{1}_B(Y)X\big] = \int_B E(X\,|\,Y=y)\,\mu_Y(dy).$$
The next theorem lists the main properties of the conditional expectation.
Equalities and inequalities concerning conditional expectations are
almost sure statements, although we did not indicate this below, because
the conditional expectation is defined only up to null sets.
Theorem 1.25. Let (Ω, F, P ) be a probability space, X and Y integrable
random variables on Ω, and A and B sub-σ-fields of F.
(i) E[E(X|A)] = EX.
(ii) E[αX + βY |A] = αE[X|A] + βE[Y |A] for α, β ∈ R.
(iii) If X ≥ Y then E[X|A] ≥ E[Y |A].
(iv) If X is A-measurable, then E[X|A] = X.
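On a finite probability space, where A is generated by a partition of Ω, the conditional expectation simply averages X over the block containing each outcome. The sketch below (a hypothetical example; the space and partition are mine) verifies properties (i)-(iv) in that setting.

```python
# Finite sample space with equal weights; the sub-sigma-field A is
# generated by a partition of Omega, and E[X|A] averages X over the
# block containing each outcome.
omega = list(range(8))
P = {w: 1.0 / 8 for w in omega}
blocks = [{0, 1, 2}, {3, 4}, {5, 6, 7}]     # partition generating A

def cond_exp(X):
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        avg = sum(X[w] * P[w] for w in B) / pB
        for w in B:
            out[w] = avg                    # constant on blocks: A-measurable
    return out

def expect(X):
    return sum(X[w] * P[w] for w in omega)

X = {w: float(w * w) for w in omega}        # X >= Y pointwise here
Y = {w: float(w) for w in omega}
CX, CY = cond_exp(X), cond_exp(Y)
```

Property (iv) shows up as the fact that conditioning a block-constant function returns it unchanged.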
Proof. The proofs must appeal to the definition of the conditional expectation.
We leave them mostly as exercises or to be looked up in any graduate
probability textbook. Let us prove (v) and (vii) as examples.
Proof of part (v). We need to check that X · E[Y |A] satisfies the definition
of E[XY |A]. The A-measurability of X · E[Y |A] is true because X
is A-measurable by assumption, E[Y |A] is A-measurable by definition, and
multiplication preserves A-measurability. Then we need to check that
$$(1.33)\qquad E\big[\mathbf{1}_A XY\big] = E\big[\mathbf{1}_A X\,E[Y|\mathcal{A}]\big]$$
for an arbitrary A ∈ A. If X were bounded, this would be a special case of
(1.27) with Z replaced by 1_A X. For the general case we need to check the
integrability of X E[Y |A] before we can really write down the right-hand
side of (1.33).
Let us assume first that both X and Y are nonnegative. Then also
E(Y |A) ≥ 0 by (iii), because E(0|A) = 0 by (iv). Let X⁽ᵏ⁾ = X ∧ k be a
truncation of X. We can apply (1.27) to get
$$(1.34)\qquad E\big[\mathbf{1}_A X^{(k)} Y\big] = E\big[\mathbf{1}_A X^{(k)} E[Y|\mathcal{A}]\big].$$
$+\ E\big[\mathbf{1}_A X^- E(Y^-|\mathcal{A})\big].$
Consequently E[X|A] is in L2 (P ).
We refer the reader to [1, Chapter 12] for a proof of Kolmogorov's theorem
in this generality. The appendix in [2] gives a proof for the case where
I is countable and X_t = R for each t. The main idea of the proof is no
different for the more abstract result.
We will not discuss the proof. Let us observe that hypotheses (i) and
(ii) are necessary for the existence of P, so nothing unnecessary is assumed
in the theorem. Property (ii) is immediate from (1.35) because
$(x_{t_1}, x_{t_2}, \ldots, x_{t_{n-1}}) \in A$ iff $(x_{t_1}, x_{t_2}, \ldots, x_{t_n}) \in A \times X_{t_n}$. Property (i) is
also clear on intuitive grounds because all it says is that if the coordinates
are permuted, their distribution gets permuted too. Here is a rigorous
justification. Take a bounded measurable function f on $X^t$. Note that f ∘ π is
then a function on $X^s$, because
$$x = (x_1, \ldots, x_n) \in X^s \iff x_i \in X_{s_i}\ (1 \le i \le n) \iff x_{\pi(i)} \in X_{t_i}\ (1 \le i \le n) \iff \pi x = (x_{\pi(1)}, \ldots, x_{\pi(n)}) \in X^t.$$
Compute as follows, assuming P exists:
$$\int_{X^t} f\,dQ_t = \int_X f(x_{t_1}, x_{t_2}, \ldots, x_{t_n})\,P(dx) = \int_X f(x_{s_{\pi(1)}}, x_{s_{\pi(2)}}, \ldots, x_{s_{\pi(n)}})\,P(dx)$$
$$= \int_X (f\circ\pi)(x_{s_1}, x_{s_2}, \ldots, x_{s_n})\,P(dx) = \int_{X^s} (f\circ\pi)\,dQ_s = \int_{X^s} f\,d(Q_s\circ\pi^{-1}).$$
Exercises
Exercise 1.1. Here is a useful formula for computing expectations. Suppose
X is a nonnegative random variable, and h is a nondecreasing function on
R₊ such that h(0) = 0 and h is absolutely continuous on each bounded
interval. (This last hypothesis is for ensuring that $h(a) = \int_0^a h'(s)\,ds$ for all
(a) Fix two points x and y of the underlying space. Suppose for each
A ∈ E, {x, y} ⊆ A or {x, y} ⊆ Ac . Show that the same property is true for
all A ∈ B. In other words, if the generating sets do not separate x and y,
neither does the σ-field.
(c) In the setting of part (b), suppose two points x and y of X satisfy
f (x) = f (y) for all f ∈ Φ. Show that for each B ∈ B, {x, y} ⊆ B or
{x, y} ⊆ B c .
Exercise 1.4. (Product σ-algebras) Recall the setting of Example 1.3. For
a subset L ⊆ I of indices, let B_L = σ{f_i : i ∈ L} denote the σ-algebra
generated by the projections f_i for i ∈ L. So in particular, $B_I = \bigotimes_{i\in I} \mathcal{A}_i$ is
the full product σ-algebra.
(a) Show that for each B ∈ BI there exists a countable set L ⊆ I such
that B ∈ BL . Hint. Do not try to reason starting from a particular set
B ∈ BI . Instead, try to say something useful about the class of sets for
which a countable L exists.
(b) Let R[0,∞) be the space of all functions x : [0, ∞) → R, with the
product σ-algebra generated by the projections x 7→ x(t), t ∈ [0, ∞). Show
that the set of continuous functions is not measurable.
Exercise 1.5. (a) Let E1 , . . . , En be collections of measurable sets on (Ω, F, P ),
each closed under intersections (if A, B ∈ Ei then A ∩ B ∈ Ei ). Suppose
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 ) · P (A2 ) · · · P (An )
for all A1 ∈ E1 , . . . , An ∈ En . Show that the σ-algebras σ(E1 ), . . . , σ(En ) are
independent. Hint. A straightforward application of the π–λ Theorem B.1.
(b) Let {Ai : i ∈ I} be a collection of independent σ-algebras. Let I1 ,
. . . , In be pairwise disjoint subsets of I, and let Bk = σ{Ai : i ∈ Ik } for
1 ≤ k ≤ n. Show that B1 , . . . , Bn are independent.
(c) Let A, B, and C be sub-σ-algebras of F. Assume σ{B, C} is indepen-
dent of A, and C is independent of B. Show that A, B and C are independent,
and so in particular C is independent of σ{A, B}.
(d) Show by example that the independence of C and σ{A, B} does not
necessarily follow from having B independent of A, C independent of A, and
C independent of B. This last assumption is called pairwise independence of
A, B and C. Hint. An example can be built from two independent fair coin
tosses.
Exercise 1.6. Independence allows us to average separately. Here is a
special case that will be used in a later proof. Let (Ω, F, P ) be a probability
Hints. Start with functions of the type f (x, y) = g(x)h(y). Use Theorem
B.2 from the appendix.
Exercise 1.9. Let (X, Y ) be an R2 -valued random vector with joint density
f (x, y). This means that for any bounded Borel function φ on R2 ,
ZZ
E[φ(X, Y )] = φ(x, y)f (x, y) dx dy.
R2
Let
$$f(x|y) = \begin{cases} \dfrac{f(x,y)}{f_Y(y)}, & \text{if } f_Y(y) > 0, \\[4pt] 0, & \text{if } f_Y(y) = 0. \end{cases}$$
(a) Show that f (x|y)fY (y) = f (x, y) for almost every (x, y), with respect
to Lebesgue measure on R2 . Hint: Let
Show that m(Hy ) = 0 for each y-section of H, and use Tonelli’s theorem.
Chapter 2. Stochastic Processes
This chapter first covers general matters in the theory of stochastic processes,
and then discusses the two most important processes, Brownian motion and
Poisson processes.
36 2. Stochastic Processes
Proof. Part (i). Let A ∈ Fσ . For the first statement, we need to show that
(A ∩ {σ ≤ τ }) ∩ {τ ≤ t} ∈ Ft . Write
(A ∩ {σ ≤ τ }) ∩ {τ ≤ t}
= (A ∩ {σ ≤ t}) ∩ {σ ∧ t ≤ τ ∧ t} ∩ {τ ≤ t}.
All terms above lie in Ft . (i) The first by the definition of A ∈ Fσ . (ii) The
second because both σ ∧ t and τ ∧ t are Ft -measurable random variables: for
any u ∈ R, {σ ∧ t ≤ u} equals Ω if u ≥ t and {σ ≤ u} if u < t, a member of
Ft in both cases. (iii) {τ ≤ t} ∈ Ft since τ is a stopping time.
In particular, if σ ≤ τ , then Fσ ⊆ Fτ .
To show A ∩ {σ < τ} ∈ F_τ, write
$$A \cap \{\sigma < \tau\} = \bigcup_{n\ge 1} \big( A \cap \{\sigma + \tfrac{1}{n} \le \tau\} \big).$$
All members of the union on the right lie in F_τ by the first part of the proof,
because σ ≤ σ + 1/n implies A ∈ F_{σ+1/n}.
All the stochastic processes we study will have some regularity properties
as functions of t, when ω is fixed. These are regularity properties of paths.
A stochastic process X = {X_t : t ∈ R₊} is continuous if for each ω ∈ Ω,
the path t ↦ X_t(ω) is continuous as a function of t. The properties
left-continuous and right-continuous have the obvious analogous meaning. X is
right continuous with left limits (or cadlag, as the French acronym for this
property goes) if the following is true for all ω ∈ Ω and t ∈ R₊:
$$X_t(\omega) = \lim_{s \searrow t} X_s(\omega), \quad\text{and the left limit}\quad X_{t-}(\omega) = \lim_{s \nearrow t} X_s(\omega) \quad\text{exists.}$$
Above s ↘ t means that s approaches t from above (from the right), and
s ↗ t approach from below (from the left). Finally, we also need to consider
the reverse situation, namely a process that is left continuous with right
limits, and for that we use the term caglad.
X is a finite variation process (FV process) if the path t 7→ Xt (ω) has
bounded variation on each compact interval [0, T ].
We shall use all these terms also of a process that has a particular path
property for almost every ω. For example, if t ↦ X_t(ω) is cadlag for all ω
in a set Ω₀ of probability 1, then we can define X̃_t(ω) = X_t(ω) for ω ∈ Ω₀
and X̃_t(ω) = 0 for ω ∉ Ω₀. Then X̃ has all paths cadlag, and X̃
and X are indistinguishable. Since we regard indistinguishable processes
as equal, it makes sense to regard X itself as a cadlag process. When
we prove results under hypotheses of path regularity, we assume that the
path condition holds for each ω. Typically the result will be the same for
processes that are indistinguishable.
Note, however, that processes that are modifications of each other can
have quite different path properties (Exercise 2.3).
The next two lemmas record some technical benefits of path regularity.
Hence Xt (ω) = Yt (ω) for all t ∈ R+ and ω ∈ Ω0 , and this says X and Y are
indistinguishable.
For the left-continuous case the origin t = 0 needs a separate assumption
because it cannot be approached from the left.
{F_{t+}} is a new filtration, and F_{t+} ⊇ F_t. If F_t = F_{t+} for all t, we say {F_t}
is right-continuous. Similarly, we can define
$$(2.3)\qquad \mathcal{F}_{0-} = \mathcal{F}_0 \quad\text{and}\quad \mathcal{F}_{t-} = \sigma\Big(\bigcup_{s<t} \mathcal{F}_s\Big) \quad\text{for } t > 0.$$
for all values except s = 0, but 0 is among the rationals so it gets taken care
of.) Thus we have
$$\{\tau_G < t\} = \bigcup_{q \in \mathbf{Q}^+\cap[0,t)} \{X_q \in G\} \in \sigma\{X_s : 0 \le s < t\} \subseteq \mathcal{F}_t.$$
Example 2.7. Assuming X continuous would not improve the conclusion
to {τ_G ≤ t} ∈ F_t. To see this, let G = (b, ∞) for some b > 0, let t > 0, and
consider the two paths
$$X_s(\omega_0) = X_s(\omega_1) = bs/t \quad\text{for } 0 \le s \le t,$$
while
$$X_s(\omega_0) = bs/t \quad\text{and}\quad X_s(\omega_1) = b(2t-s)/t \quad\text{for } s \ge t.$$
Now τ_G(ω₀) = t while τ_G(ω₁) = ∞. Since X_s(ω₀) and X_s(ω₁) agree for
s ∈ [0, t], the points ω₀ and ω₁ must be together either inside or outside any
event in F_t^X [Exercise 1.2(c)]. But clearly ω₀ ∈ {τ_G ≤ t} while ω₁ ∉ {τ_G ≤ t}.
This shows that {τ_G ≤ t} ∉ F_t^X.
(ii) If uk < s for all k then both X(uk ) and X(uk −) converge to X(s−),
which thus lies in H by the closedness of H.
(iii) If uk > s for all k then both X(uk ) and X(uk −) converge to X(s),
which thus lies in H.
The equality above is checked.
Let
$$H_n = \{y : \text{there exists } x \in H \text{ such that } |x - y| < n^{-1}\}$$
be the n⁻¹-neighborhood of H. Let U contain all the rationals in [0, t] and
the point t itself. Next we claim
$$\{X(0) \in H\} \cup \{X(s) \in H \text{ or } X(s-) \in H \text{ for some } s \in (0,t]\} = \bigcap_{n=1}^{\infty} \bigcup_{q \in U} \{X(q) \in H_n\}.$$
To justify this, note first that if X(s) = y ∈ H for some s ∈ [0, t] or X(s−) =
y ∈ H for some s ∈ (0, t], then we can find a sequence qj ∈ U such that
X(qj ) → y, and then X(qj ) ∈ Hn for all large enough j. Conversely, suppose
we have qn ∈ U such that X(qn ) ∈ Hn for all n. Extract a convergent
subsequence qn → s. By the cadlag property a further subsequence of X(qn )
converges to either X(s) or X(s−). By the closedness of H, one of these
lies in H.
Combining the set equalities proved shows that {σH ≤ t} ∈ Ft .
Lemma 2.8 fails for caglad processes, unless the filtration is assumed
right-continuous (Exercise 2.7). For a continuous process X and a closed
set H the random times defined by (2.4) and (2.5) coincide. So we get this
corollary.
Corollary 2.9. Assume X is continuous and H is closed. Then τH is a
stopping time.
Remark 2.10 (A look ahead). The stopping times discussed above will
play a role in the development of the stochastic integral in the following
way. To integrate an unbounded real-valued process X we need stopping
times ζk % ∞ such that Xt (ω) stays bounded for 0 < t ≤ ζk (ω). Caglad
processes will be an important class of integrands. For a caglad X Lemma
2.6 shows that
ζk = inf{t ≥ 0 : |Xt | > k}
are stopping times, provided {Ft } is right-continuous. Left-continuity of X
then guarantees that |Xt | ≤ k for 0 < t ≤ ζk .
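Remark 2.10 can be illustrated with a concrete caglad step path (a hypothetical example; the jump times and values are mine): for X_t = Y_{t−} built from a cadlag step path Y, the time ζ_k is the left endpoint of the first interval where |X| exceeds k, and left-continuity keeps |X_t| ≤ k up to and including ζ_k.

```python
import bisect

# Y is a cadlag step path with value vals[j] on [times[j], times[j+1]);
# X_t = Y_{t-} is the associated caglad path (for t > 0).
times = [0.0, 0.5, 1.0, 1.5, 2.0]
vals  = [1.0, 3.0, -6.0, 2.0, 9.0]

def X(t):
    """X_t = Y_{t-}: the value of Y on the interval (times[j], times[j+1]]
    containing t."""
    return vals[bisect.bisect_left(times, t) - 1]

def zeta(k):
    """zeta_k = inf{ t >= 0 : |X_t| > k }.  Because X is left continuous,
    |X_t| <= k still holds for 0 < t <= zeta_k."""
    for j, v in enumerate(vals):
        if abs(v) > k:
            return times[j]     # |X| first exceeds k just after this time
    return float("inf")
```

For k = 5 the bound fails first on the interval where Y took the value −6, so ζ₅ = 1.0, yet X(ζ₅) is still the left limit 3.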
Of particular interest will be a caglad process X that satisfies Xt = Yt−
for t > 0 for some adapted cadlag process Y . Then by Lemma 2.8 we get
The same idea is used to define the corresponding object between two
processes.
Definition 2.12. The (quadratic) covariation process [X, Y] = {[X, Y]_t :
t ∈ R₊} of two stochastic processes X and Y is defined by the following
limits, provided these limits exist for all t:
$$(2.9)\qquad \lim_{\mathrm{mesh}(\pi)\to 0}\ \sum_i (X_{t_{i+1}} - X_{t_i})(Y_{t_{i+1}} - Y_{t_i}) = [X,Y]_t \quad\text{in probability.}$$
By definition [Y, Y] = [Y].
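The sums in (2.9) can be watched in a seeded simulation (an illustration; the grid and names are mine). For a standard Brownian path the sums of squared increments over [0, 1] concentrate near 1, while the cross sums for two independent Brownian paths concentrate near 0.

```python
import random

random.seed(7)

# Two independent Brownian paths sampled on a fine grid of [0, 1]:
# increments are independent N(0, dt) variables.
N = 200_000
dt = 1.0 / N
B, W = [0.0], [0.0]
for _ in range(N):
    B.append(B[-1] + random.gauss(0.0, dt**0.5))
    W.append(W[-1] + random.gauss(0.0, dt**0.5))

step = 20                    # coarser partition, mesh = step * dt
idx = range(0, N, step)
qv  = sum((B[i + step] - B[i]) ** 2 for i in idx)                    # near [B]_1 = 1
cov = sum((B[i + step] - B[i]) * (W[i + step] - W[i]) for i in idx)  # near [B, W]_1 = 0
```

Refining the partition (smaller `step`) tightens both sums around their limits.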
or
$$[X,Y] = \tfrac{1}{2}\big([X] + [Y] - [X-Y]\big).$$
jumps of the processes are copied exactly in the covariation. For any cadlag
process Z, the jump at t is denoted by ∆Z(t) = Z(t) − Z(t−).
Proposition 2.15. Suppose X and Y are cadlag processes, and [X] and
[Y ] exist. Then there exists a modification of [X, Y ] that is a cadlag process.
For any t, ∆[X, Y ]t = (∆Xt )(∆Yt ) almost surely.
To bound the jump at u, return to the partition π first chosen right after
(2.12). Let s = t_{m(π)−1}. Keeping s fixed, refine π sufficiently in [t, s] so that,
with probability at least 1 − 2δ,
$$[X]_u - [X]_t \le \sum_{i=0}^{m(\pi)-1} (X_{t_{i+1}} - X_{t_i})^2 + \varepsilon = (X_u - X_s)^2 + \sum_{i=0}^{m(\pi)-2} (X_{t_{i+1}} - X_{t_i})^2 + \varepsilon \le (X_u - X_s)^2 + [X]_s - [X]_t + 2\varepsilon,$$
which rearranges, through ∆[X]_u ≤ [X]_u − [X]_s, to give
$$P\big( \Delta[X]_u \le (X_u - X_s)^2 + 2\varepsilon \big) > 1 - 2\delta.$$
Proof. Once ω is fixed, the result is an analytic lemma, and the depen-
dence of G and H on ω is irrelevant. We included this dependence so that
the statement better fits its later applications. It is a property of product-
measurability that for a fixed ω, G(t, ω) and H(t, ω) are measurable func-
tions of t.
Consider first step functions
$$g(t) = \alpha_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{m-1} \alpha_i \mathbf{1}_{(s_i,s_{i+1}]}(t)$$
and
$$h(t) = \beta_0 \mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{m-1} \beta_i \mathbf{1}_{(s_i,s_{i+1}]}(t)$$
where 0 = s₁ < · · · < s_m = T is a partition of [0, T]. (Note that g and h
can be two arbitrary step functions. If they come with distinct partitions,
{s_i} is the common refinement of these partitions.) Then
$$\Big| \int_{[0,T]} g(t)h(t)\,d[X,Y]_t \Big| = \Big| \sum_i \alpha_i\beta_i \big([X,Y]_{s_{i+1}} - [X,Y]_{s_i}\big) \Big|$$
$$\le \sum_i |\alpha_i\beta_i|\,\big([X]_{s_{i+1}} - [X]_{s_i}\big)^{1/2}\big([Y]_{s_{i+1}} - [Y]_{s_i}\big)^{1/2}$$
$$\le \Big( \sum_i |\alpha_i|^2 \big([X]_{s_{i+1}} - [X]_{s_i}\big) \Big)^{1/2} \Big( \sum_i |\beta_i|^2 \big([Y]_{s_{i+1}} - [Y]_{s_i}\big) \Big)^{1/2}$$
$$= \Big( \int_{[0,T]} g(t)^2\,d[X]_t \Big)^{1/2} \Big( \int_{[0,T]} h(t)^2\,d[Y]_t \Big)^{1/2},$$
where we applied (2.10) separately on each partition interval (s_i, s_{i+1}], and
then the Schwarz inequality.
Let g and h be two arbitrary bounded Borel functions on [0, T], and pick
0 < C < ∞ so that |g| ≤ C and |h| ≤ C. Let ε > 0. Define the bounded
Borel measure
$$\mu = \Lambda_{[X]} + \Lambda_{[Y]} + |\Lambda_{[X,Y]}|$$
on [0, T]. Above, Λ_{[X]} is the positive Lebesgue-Stieltjes measure of the
function t ↦ [X]_t (for the fixed ω under consideration), same for Λ_{[Y]},
and |Λ_{[X,Y]}| is the positive total variation measure of the signed Lebesgue-Stieltjes
measure Λ_{[X,Y]}. By Lemma A.16 we can choose step functions g̃
and h̃ so that |g̃| ≤ C, |h̃| ≤ C, and
$$\int \big( |g - \tilde g| + |h - \tilde h| \big)\,d\mu < \frac{\varepsilon}{2C}.$$
On the one hand
$$\Big| \int_{[0,T]} gh\,d[X,Y]_t - \int_{[0,T]} \tilde g\tilde h\,d[X,Y]_t \Big| \le \int_{[0,T]} \big| gh - \tilde g\tilde h \big|\,d|\Lambda_{[X,Y]}|$$
$$\le C\int_{[0,T]} |g - \tilde g|\,d|\Lambda_{[X,Y]}| + C\int_{[0,T]} |h - \tilde h|\,d|\Lambda_{[X,Y]}| \le \varepsilon.$$
with a similar bound for h. Putting these together with the inequality
already proved for step functions gives
$$\Big| \int_{[0,T]} gh\,d[X,Y]_t \Big| \le \varepsilon + \Big( \varepsilon + \int_{[0,T]} g^2\,d[X]_t \Big)^{1/2} \Big( \varepsilon + \int_{[0,T]} h^2\,d[Y]_t \Big)^{1/2}.$$
Since ε > 0 was arbitrary, we can let ε → 0. The inequality as stated in the
proposition is obtained by choosing g(t) = G(t, ω) and h(t) = H(t, ω).
Remark 2.17. Inequality (2.14) has the following corollary. As in the proof,
let |Λ_{[X,Y](ω)}| be the total variation measure of the signed Lebesgue-Stieltjes
measure Λ_{[X,Y](ω)} on [0, T]. For a fixed ω, (1.11) implies that |Λ_{[X,Y](ω)}| ≪
Λ_{[X,Y](ω)} and the Radon-Nikodym derivative
$$\varphi(t) = \frac{d|\Lambda_{[X,Y](\omega)}|}{d\Lambda_{[X,Y](\omega)}}(t)$$
on [0, T] satisfies |φ(t)| ≤ 1. For an arbitrary bounded Borel function g on
[0, T],
$$\int_{[0,T]} g(t)\,|\Lambda_{[X,Y](\omega)}|(dt) = \int_{[0,T]} g(t)\varphi(t)\,d[X,Y]_t(\omega).$$
For the definition of a Markov process, X can take its values in an
abstract space, but R^d is sufficiently general for us. An R^d-valued process X
is a Markov process with respect to {F_t} if
A martingale represents a fair gamble in the sense that, given all the
information up to the present time s, the expectation of the future fortune
Xt is the same as the current fortune Xs . Stochastic analysis relies heavily
on martingale theory, parts of which are covered in the next chapter. The
Markov property is a notion of causality. It says that, given the present
state Xs , future events are independent of the past.
Requirement (a) in the definition says that x is the initial state under
the measure P x . Requirement (b) is for technical purposes. Requirement
(c) is the Markov property. E x stands for expectation under the measure
P x.
Next we discuss the two most important processes, Brownian motion
and the Poisson process.
independent of X, and B̃_t − B̃_s is independent of F̃_s, Exercise 1.5(c) implies
that B_t − B_s is independent of F_s. Conversely, if a process B_t satisfies parts
(ii) and (iii) of the definition, then B̃_t = B_t − B_0 is a standard Brownian
motion, independent of B_0.
The construction (proof of existence) of Brownian motion is rather tech-
nical, and hence relegated to Section B.2 in the Appendix. For the un-
derlying probability space the construction uses the “canonical” path space
C = CR [0, ∞). Let Bt (ω) = ω(t) be the coordinate projections on C, and
FtB = σ{Bs : 0 ≤ s ≤ t} the filtration generated by the coordinate process.
Theorem 2.21. There exists a Borel probability measure P 0 on C = CR [0, ∞)
such that the process B = {Bt : 0 ≤ t < ∞} on the probability space
(C, BC , P 0 ) is a standard one-dimensional Brownian motion with respect to
the filtration {FtB }.
Proof. Follows from the properties of Brownian increments and basic prop-
erties of conditional expectations. Let s < t.
$$E[B_t\,|\,\mathcal{F}_s] = E[B_t - B_s\,|\,\mathcal{F}_s] + E[B_s\,|\,\mathcal{F}_s] = B_s,$$
and
$$E[B_t^2\,|\,\mathcal{F}_s] = E\big[(B_t - B_s + B_s)^2\,|\,\mathcal{F}_s\big] = E\big[(B_t - B_s)^2\,|\,\mathcal{F}_s\big] + 2B_s\,E[B_t - B_s\,|\,\mathcal{F}_s] + B_s^2 = (t-s) + B_s^2.$$
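A consequence of these computations is that B_t and B_t² − t have constant expectation zero, which a seeded Monte Carlo sketch can confirm (an illustration; the sample sizes and names are mine), using only the fact that B_t has the N(0, t) distribution.

```python
import random

random.seed(11)

def terminal_values(t, n_paths=50_000):
    """Sample B_t directly: B_t ~ N(0, t) for standard Brownian motion."""
    return [random.gauss(0.0, t**0.5) for _ in range(n_paths)]

t = 1.5
sample = terminal_values(t)
mean_B = sum(sample) / len(sample)                       # E[B_t] = 0
mean_M = sum(b * b - t for b in sample) / len(sample)    # E[B_t^2 - t] = 0
```

Both sample means sit within Monte Carlo error of zero, consistent with B and B² − t being martingales started at 0.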
Proof. Part (a). Definition (2.1) shows how to complete the filtration. Of
course, the adaptedness of B to the filtration is not harmed by enlarging
the filtration; the issue is the independence of F̄_s and B_t − B_s. If G ∈ F̄_s has
A ∈ F_s such that P(A △ G) = 0, then P(G ∩ H) = P(A ∩ H) for any event
H. In particular, the independence of F̄_s from B_t − B_s follows.
To check the independence of Fs+ and Bt − Bs , let f be a bounded
continuous function on R, and suppose Z is a bounded Fs+ -measurable
random variable. Z is Fs+h -measurable for each h > 0. By path continuity,
and by the independence of Bt − Bs+h and Fs+h for s + h < t, we get
$$E\big[Z \cdot f(B_t - B_s)\big] = \lim_{h\to 0} E\big[Z \cdot f(B_t - B_{s+h})\big] = \lim_{h\to 0} E[Z]\cdot E\big[f(B_t - B_{s+h})\big] = E[Z]\cdot E\big[f(B_t - B_s)\big].$$
This implies the independence of Fs+ and Bt − Bs (Lemma B.4 extends the
equality from continuous f to bounded Borel f ).
Part (b). Fix 0 = t0 < t1 < t2 < · · · < tn and abbreviate the vector of
Brownian increments by
ξ = (Bs+t1 − Bs , Bs+t2 − Bs+t1 , . . . , Bs+tn − Bs+tn−1 ).
We first claim that ξ is independent of Fs . Pick Borel sets A1 , . . . , An of
R and a bounded Fs -measurable random variable Z. In the next calcula-
tion, separate out the factor 1An (Bs+tn − Bs+tn−1 ) and note that the rest is
Fs+tn−1 -measurable and so independent of 1An (Bs+tn − Bs+tn−1 ):
$$E\big[Z \cdot \mathbf{1}_{A_1\times A_2\times\cdots\times A_n}(\xi)\big] = E\Big[Z \cdot \prod_{i=1}^{n} \mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}})\Big]$$
$$= E\Big[Z \cdot \prod_{i=1}^{n-1} \mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}}) \cdot \mathbf{1}_{A_n}(B_{s+t_n} - B_{s+t_{n-1}})\Big]$$
$$= E\Big[Z \cdot \prod_{i=1}^{n-1} \mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}})\Big] \cdot E\big[\mathbf{1}_{A_n}(B_{s+t_n} - B_{s+t_{n-1}})\big].$$
Repeat this argument to separate all the factors, until the expectation becomes
$$E[Z] \cdot \prod_{i=1}^{n} E\big[\mathbf{1}_{A_i}(B_{s+t_i} - B_{s+t_{i-1}})\big] = E[Z]\cdot E\big[\mathbf{1}_{A_1\times A_2\times\cdots\times A_n}(\xi)\big].$$
Now consider the class of all Borel sets G ∈ BRn such that
E[ Z · 1G (ξ)] = E[Z] · E[ 1G (ξ)].
The above argument shows that this class contains all products A1 × A2 ×
· · · × An of Borel sets from R. We leave it to the reader to check that this
class is a λ-system. Thus by the π-λ Theorem B.1 the equality is true for
all G ∈ B_{Rⁿ}. Since Z was an arbitrary bounded F_s-measurable random
variable, we have proved that ξ is independent of F_s.
The vector ξ also satisfies ξ = (Yt1 , Yt2 − Yt1 , . . . , Ytn − Ytn−1 ). Since
the vector η = (Yt1 , Yt2 , . . . , Ytn ) is a function of ξ, we conclude that η is
independent of Fs . This being true for all choices of time points 0 < t1 <
t2 < · · · < tn implies that the entire process Y is independent of Fs .
It remains to check that Y is a standard Brownian motion with respect
to Gt = Fs+t . These details are straightforward and we leave them as an
exercise.
Parts (a) and (b) of the lemma together assert that Y is a standard
Brownian motion, independent of F̄_{t+}, the filtration obtained by replacing
{F_t} with the augmented right-continuous version. (The order of the
two operations on the filtration is immaterial; in other words, the σ-algebra
⋂_{s:s>t} F̄_s agrees with the augmentation of ⋂_{s:s>t} F_s, see Exercise 2.2.)
Next we develop some properties of Brownian motion by concentrating
on the “canonical setting.” The underlying probability space is the path
space C = C_R[0, ∞) with the coordinate process B_t(ω) = ω(t) and the
filtration F^B_t = σ{B_s : 0 ≤ s ≤ t} generated by the coordinates. For each
x ∈ R there is a probability measure P^x on C under which B = {B_t} is
Brownian motion started at x. Expectation under P^x is denoted by E^x and
satisfies
E^x[H] = E^0[H(x + B)]
for any bounded B_C-measurable function H. On the right x + B is a sum
of a point and a process, interpreted as the process whose value at time t is
x + B_t. (In Theorem 2.21 we constructed P^0, and the equality above can
be taken as the definition of P^x.)
On C we have the shift maps {θs : 0 ≤ s < ∞} defined by (θs ω)(t) =
ω(s + t) that move the time origin to s. The shift acts on the process B by
θs B = {Bs+t : t ≥ 0}.
A consequence of Lemma 2.23(a) is that the coordinate process B is a
Brownian motion also relative to the larger filtration F^B_{t+} = ⋂_{s:s>t} F^B_s. We
shall show that members of F^B_t and F^B_{t+} differ only by null sets. (These
σ-algebras are different, see Exercise 2.6.) This will have interesting conse-
quences when we take t = 0. We begin with the Markov property.
quences when we take t = 0. We begin with the Markov property.
Proposition 2.24. Let H be a bounded B_C-measurable function on C.
(a) E^x[H] is a Borel measurable function of x.
(b) For each x ∈ R
(2.18)   E^x[ H ∘ θ_s | F^B_{s+} ](ω) = E^{B_s(ω)}[H]   for P^x-almost every ω.
Proof. The σ-algebra F^B_0 satisfies the 0–1 law under P^x, because P^x{B_0 ∈
G} = 1_G(x). Then every P^x-conditional expectation with respect to F^B_0
equals the expectation (Exercise 1.8). The following equalities are valid
P^x-almost surely for A ∈ F^B_{0+}:
1_A = E^x( 1_A | F^B_{0+} ) = E^x( 1_A | F^B_0 ) = P^x(A).
Thus there must exist points ω ∈ C such that 1A (ω) = P x (A), and so the
only possible values for P x (A) are 0 and 1.
From the 0–1 law we get a fact that suggests something about the fast
oscillation of Brownian motion: if it starts at the origin, then in any nontriv-
ial time interval (0, ε) the process is both positive and negative, and hence
by continuity also zero. To make this precise, define
(2.20)   σ = inf{t > 0 : B_t > 0},   τ = inf{t > 0 : B_t < 0},   and T₀ = inf{t > 0 : B_t = 0}.
Since this is true for every n ∈ N, {σ = 0} ∈ ⋂_{n∈N} F^B_{1/n} = F^B_{0+}. The same
argument shows {τ = 0} ∈ F^B_{0+}.
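The conclusion σ = τ = 0 almost surely can be illustrated by simulation. The sketch below (a hypothetical illustration assuming NumPy) samples Brownian paths on finer and finer grids over [0, 1]; the fraction of paths observed to take both signs should rise toward 1 as the grid refines, since the continuum path takes both signs immediately.

```python
import numpy as np

# Fraction of simulated Brownian paths on [0, 1] that take both signs,
# observed on grids of n points; finer grids catch more of the early
# oscillation, so the fraction climbs toward 1.
rng = np.random.default_rng(1)
paths = 10_000
fracs = []
for n in (10, 100, 1000):
    dB = rng.normal(0.0, np.sqrt(1.0 / n), size=(paths, n))
    B = np.cumsum(dB, axis=1)
    both = (B.max(axis=1) > 0) & (B.min(axis=1) < 0)
    fracs.append(float(both.mean()))
print(fracs)
```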
Proof. Fix γ > 1/2. Since only increments of Brownian motion are involved,
we can assume that the process in question is a standard Brownian motion.
(B_t and B̃_t = B_t − B₀ have the same increments.) In the proof we want to
deal only with a bounded time interval. So define
H_k(C, ε) = { there exists s ∈ [k, k+1] such that |B_t − B_s| ≤ C|t − s|^γ for all t ∈ [s − ε, s + ε] ∩ [k, k+1] }.
G(γ, C, ε) is contained in ⋃_k H_k(C, ε), so it suffices to show P[H_k(C, ε)] = 0
for all k. Since Y_t = B_{k+t} − B_k is a standard Brownian motion, P[H_k(C, ε)] =
P[H₀(C, ε)] for each k. Finally, what we show is P[H₀(C, ε)] = 0.
Fix m ∈ N such that m(γ − 1/2) > 1. Let ω ∈ H₀(C, ε), and pick s ∈ [0, 1]
so that the condition of the event is satisfied. Consider n large enough so
that m/n < ε. Imagine partitioning [0, 1] into intervals of length 1/n. Let
X_{n,k} = max{ |B_{(j+1)/n} − B_{j/n}| : k ≤ j ≤ k + m − 1 }   for 0 ≤ k ≤ n − m.
The point s has to lie in one of the intervals [k/n, (k+m)/n], for some 0 ≤ k ≤ n − m.
For this particular k, and for each j with k ≤ j ≤ k + m − 1,
|B_{(j+1)/n} − B_{j/n}| ≤ |B_{(j+1)/n} − B_s| + |B_s − B_{j/n}| ≤ C( |(j+1)/n − s|^γ + |s − j/n|^γ ) ≤ 2C(m/n)^γ.
Thus if the integer M satisfies M > |ξ| + 1, we can find another integer k
such that for all t ∈ [s − k −1 , s + k −1 ],
−M ≤ ( B_t(ω) − B_s(ω) ) / (t − s) ≤ M  ⟹  |B_t(ω) − B_s(ω)| ≤ M|t − s|.
Consequently ω ∈ G(1, M, k⁻¹).
This reasoning shows that if t ↦ B_t(ω) is differentiable at even a single
time point, then ω lies in the union ⋃_M ⋃_k G(1, M, k⁻¹). This union has
probability zero by the previous theorem.
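The heuristic behind the theorem can be seen numerically: an increment over a window of length h has size of order √h, so difference quotients over mesh h are of order h^{−1/2} and blow up as h → 0. A small sketch (an illustration assuming NumPy):

```python
import numpy as np

# Average difference quotient |B_{t+h} - B_t| / h over many independent
# increments; since E|B_{t+h} - B_t| = sqrt(2h/pi), the mean quotient
# grows like sqrt(2/(pi*h)) as the mesh h shrinks.
rng = np.random.default_rng(2)
qs = []
for h in (1e-2, 1e-4, 1e-6):
    inc = rng.normal(0.0, np.sqrt(h), size=100_000)
    qs.append(float(np.mean(np.abs(inc)) / h))
print(qs)
```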
By Chebyshev's inequality,
P{ | ∑_{i=0}^{m(πⁿ)−1} (B_{t^n_{i+1}} − B_{t^n_i})² − t | ≥ ε } ≤ ε⁻² E[ ( ∑_{i=0}^{m(πⁿ)−1} (B_{t^n_{i+1}} − B_{t^n_i})² − t )² ] ≤ 2t ε⁻² mesh(πⁿ).
If ∑_n mesh(πⁿ) < ∞, these bounds have a finite sum over n (in short,
they are summable). Hence the asserted convergence follows from the Borel-
Cantelli Lemma.
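The concentration of the squared increments around t can be illustrated with one fine partition (a sketch assuming NumPy; the tolerance reflects the variance bound of order t²/n implied by the estimate above):

```python
import numpy as np

# Quadratic variation sum of a Brownian path: squared increments over a
# uniform partition of [0, t] with mesh t/n concentrate at t.
rng = np.random.default_rng(3)
t, n = 2.0, 2**16
dB = rng.normal(0.0, np.sqrt(t / n), size=n)    # increments on mesh t/n
qv = float(np.sum(dB ** 2))
print(qv)   # close to t = 2.0
```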
Corollary 2.34. The following is true almost surely for a Brownian motion
B: the path t 7→ Bt (ω) is not a member of BV [0, T ] for any 0 < T < ∞.
Since the maximum in braces vanishes as n → ∞, the last sum must converge
to ∞. Consequently the path t 7→ Bt (ω) is not BV in any interval [0, k −1 ].
Any other nontrivial interval [0, T ] contains an interval [0, k −1 ] for some k,
and so this path cannot have bounded variation on any interval [0, T ].
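Numerically, the total variation over a mesh-t/n partition grows like √(2tn/π) and has no finite limit. The sketch below (assuming NumPy, and using independent samples for each mesh rather than refinements of one fixed path) shows the orders of magnitude:

```python
import numpy as np

# Sum of absolute increments over finer and finer partitions of [0, 1]:
# expectation sqrt(2*t*n/pi) diverges, consistent with paths not being BV.
rng = np.random.default_rng(4)
t = 1.0
tv = []
for n in (100, 10_000, 1_000_000):
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)
    tv.append(float(np.sum(np.abs(dB))))
print(tv)
```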
Observe that items (i) and (ii) give a complete description of all the
finite-dimensional distributions of {N (A)}. For arbitrary B1 , B2 , . . . , Bm ∈
A, we can find disjoint A1 , A2 , . . . , An ∈ A so that each Bj is a union of
some of the Ai ’s. Then each N (Bj ) is a certain sum of N (Ai )’s, and we see
that the joint distribution of N (B1 ), N (B2 ), . . . , N (Bm ) is determined by
the joint distribution of N (A1 ), N (A2 ), . . . , N (An ).
As the formula reveals, Ki decides how many points to place in Si , and the
{Xji } give the locations of the points in Si . We leave it as an exercise to
check that Ni is a Poisson point process whose mean measure is µ restricted
to Si , defined by µi (B) = µ(B ∩ Si ).
We can repeat this construction for each Si, and take the resulting random
processes Ni mutually independent (by a suitable product space construction).
Finally, define
N(A) = ∑_i Ni(A).
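The two-step recipe above — draw the number of points, then place them i.i.d. according to the normalized mean measure — can be sketched as follows (an illustration assuming NumPy, with Lebesgue mean measure on an interval; the function name is ours, not the text's):

```python
import numpy as np

def poisson_point_process(rate, lo, hi, rng):
    """Points on [lo, hi) with constant intensity `rate`: draw the count
    K ~ Poisson(mu(S)) and then K i.i.d. locations with law mu / mu(S)."""
    K = rng.poisson(rate * (hi - lo))
    return rng.uniform(lo, hi, size=K)

rng = np.random.default_rng(5)
# The count N(B) for B = [0, 0.3) should be Poisson with mean rate*|B| = 1.5,
# so its sample mean and variance should both be near 1.5.
counts = np.array([np.sum(poisson_point_process(5.0, 0.0, 1.0, rng) < 0.3)
                   for _ in range(20_000)])
print(counts.mean(), counts.var())
```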
Exercises
Exercise 2.1. Let {Ft } be a filtration, and let Gt = Ft+ . Show that Gt− =
Ft− for t > 0.
Exercise 2.2. Assume the probability space (Ω, F, P) is complete. Let
{F_t} be a filtration, G_t = F_{t+} its right-continuous version, and H_t = F̄_t
its augmentation. Augment {G_t} to get the filtration {Ḡ_t}, and define also
H_{t+} = ⋂_{s:s>t} H_s. Show that Ḡ_t = H_{t+}. In other words, it is immaterial
whether we augment before or after making the filtration right-continuous.
Hints. Ḡ_t ⊆ H_{t+} should be easy. For the other direction, if C ∈ H_{t+},
then for each s > t there exists C_s ∈ F_s such that P(C △ C_s) = 0. For any
sequence s_i ↘ t, the set C̃ = ⋂_{m≥1} ⋃_{i≥m} C_{s_i} lies in F_{t+}. Use Exercise 1.3.
Exercise 2.3. Let the underlying probability space be Ω = [0, 1] with P
given by Lebesgue measure. Define two processes
Xt (ω) = 0 and Yt (ω) = 1{t=ω} .
X is continuous, Y does not have a single continuous path, but they are
modifications of each other.
Exercise 2.4 (Example of an adapted but not progressively measurable
process). Let Ω = [0, 1], and for each t let Ft be the σ-field generated by
singletons on Ω. (Equivalently, Ft consists of all countable sets and their
complements.) Let Xt (ω) = 1{ω = t}. Then {Xt : 0 ≤ t ≤ 1} is adapted.
But X on [0, 1] × Ω is not B[0,1] ⊗ F1 -measurable.
Hint. Show that elements of B_{[0,1]} ⊗ F₁ are of the type
⋃_{s∈I} ( B_s × {s} ) ∪ ( H × Iᶜ )
Chapter 3

Martingales
By the bounds
c ≤ M_{s+n⁻¹} ∨ c ≤ E[ M_t ∨ c | F_{s+n⁻¹} ]
and Lemma B.12 from the Appendix, for a fixed c the random variables
{M_{s+n⁻¹} ∨ c} are uniformly integrable. Let n → ∞. Right-continuity of
paths implies M_{s+n⁻¹} ∨ c → M_s ∨ c. Uniform integrability then gives con-
vergence in L¹. By Lemma B.13 there exists a subsequence {n_j} such that
conditional expectations converge almost surely:
Consequently
Proposition 3.3. Suppose the filtration {Ft } satisfies the usual conditions,
in other words (Ω, F, P ) is complete, F0 contains all null events, and Ft =
Ft+ . Let M be a submartingale such that t 7→ EMt is right-continuous.
Then there exists a cadlag modification of M that is an {Ft }-submartingale.
3.1. Optional Stopping
The conclusion from the previous lemma needed next is that for any
stopping time τ and T ∈ R₊, the stopped variable M_{τ∧T} is integrable.
Theorem 3.6. Let M be a submartingale with right-continuous paths, and
let σ and τ be two stopping times. Then for T < ∞,
(3.3) E[Mτ ∧T |Fσ ] ≥ Mσ∧τ ∧T .
Proof. As pointed out before the theorem, M_{τ∧T} and M_{σ∧τ∧T} are integrable
random variables. In particular, the conditional expectation is well-defined.
Define approximating discrete stopping times by σn = 2−n ([2n σ]+1) and
τn = 2−n ([2n τ ] + 1). The interpretation for infinite values is that σn = ∞ if
σ = ∞, and similarly for τn and τ .
Let c ∈ R. The function x 7→ x ∨ c is convex and nondecreasing, hence
Mt ∨ c is also a submartingale. Applying Theorem 3.4 to this submartingale
and the stopping times σn and τn gives
E[Mτn ∧T ∨ c|Fσn ] ≥ Mσn ∧τn ∧T ∨ c.
Since σ ≤ σn , Fσ ⊆ Fσn , and if we condition both sides of the above
inequality on Fσ , we get
(3.4)   E[ M_{τn∧T} ∨ c | F_σ ] ≥ E[ M_{σn∧τn∧T} ∨ c | F_σ ],
and
c ≤ Mσn ∧τn ∧T ∨ c ≤ E[MT ∨ c|Fσn ∧τn ].
Together with Lemma B.12 from the Appendix, these bounds imply that the
sequences {Mτn ∧T ∨ c : n ∈ N} and {Mσn ∧τn ∧T ∨ c : n ∈ N} are uniformly
integrable. Since these sequences converge almost surely (as argued above),
uniform integrability implies that they converge in L1 . By Lemma B.13
there exists a subsequence {nj } along which the conditional expectations
converge almost surely:
E[Mτnj ∧T ∨ c|Fσ ] → E[Mτ ∧T ∨ c|Fσ ]
and
E[Mσnj ∧τnj ∧T ∨ c|Fσ ] → E[Mσ∧τ ∧T ∨ c|Fσ ].
(To get a subsequence that works for both limits, extract a subsequence for
the first limit by Lemma B.13, and then apply Lemma B.13 again to extract
a further subsubsequence for the second limit.) Taking these limits in (3.4)
gives
E[Mτ ∧T ∨ c|Fσ ] ≥ E[Mσ∧τ ∧T ∨ c|Fσ ].
M is right-continuous by assumption, hence progressively measurable, and
so Mσ∧τ ∧T is Fσ∧τ ∧T -measurable. This is a sub-σ-field of Fσ , and so
E[Mτ ∧T ∨ c|Fσ ] ≥ E[Mσ∧τ ∧T ∨ c|Fσ ] = Mσ∧τ ∧T ∨ c ≥ Mσ∧τ ∧T .
As c & −∞, Mτ ∧T ∨c → Mτ ∧T pointwise, and for all c we have the integrable
bound |Mτ ∧T ∨ c| ≤ |Mτ ∧T |. Thus by the dominated convergence theorem
for conditional expectations, almost surely
lim E[Mτ ∧T ∨ c|Fσ ] = E[Mτ ∧T |Fσ ].
c→−∞
Proof. For u < v and T ≥ σ(v), (3.3) gives E[M_{σ(v)} | F_{σ(u)}] ≥ M_{σ(u)}. If
M is a martingale, we can apply this to both M and −M. And if M is
an L²-martingale, Lemma 3.5 applied to the submartingale M² implies that
E[M²_{σ(u)}] ≤ 2E[M²_T] + E[M²_0].
3.2. Inequalities
Lemma 3.9. Let M be a submartingale, 0 < T < ∞, and H a finite subset
of [0, T]. Then for r > 0,
(3.6)   P{ max_{t∈H} M_t ≥ r } ≤ r⁻¹ E[M_T⁺]
and
(3.7)   P{ min_{t∈H} M_t ≤ −r } ≤ r⁻¹ ( E[M_T⁺] − E[M_0] ).
from which
−r P{ min_{t∈H} M_t ≤ −r } = −r P{τ < ∞} ≥ E[ M_τ 1{τ<∞} ] ≥ E[M_0] − E[ M_T 1{τ=∞} ] ≥ E[M_0] − E[M_T⁺],
and
(3.9)   P{ inf_{0≤t≤T} M_t ≤ −r } ≤ r⁻¹ ( E[M_T⁺] − E[M_0] ).
We shall also use the shorter term local L2 -martingale for a local square-
integrable martingale.
Lemma 3.14. Suppose M is a local martingale and σ is an arbitrary stop-
ping time. Then M σ is also a local martingale. Similarly, if M is a local
L2 -martingale, then so is M σ . In both cases, if {τk } is a localizing sequence
for M , then it is also a localizing sequence for M σ .
Only large jumps can prevent a cadlag local martingale from being a
local L2 -martingale.
Lemma 3.15. Suppose M is a cadlag local martingale, and there is a con-
stant c such that |Mt − Mt− | ≤ c for all t. Then M is a local L2 -martingale.
Recall that the usual conditions on the filtration {Ft } meant that the
filtration is complete (each Ft contains every subset of a P -null event in F)
and right-continuous (Ft = Ft+ ).
Theorem 3.16 (Fundamental Theorem of Local Martingales). Assume
{Ft } is complete and right-continuous. Suppose M is a cadlag local martin-
gale and c > 0. Then there exist cadlag local martingales M̃ and A such that
the jumps of M̃ are bounded by c, A is an FV process, and M = M̃ + A.
If M is continuous, then so is [M ].
≤ E[ (M^{τn}_t − M^τ_t)² ] + 2 E[ [M^{τn} − M^τ]_t ]^{1/2} E[ [M^τ]_t ]^{1/2}
= E[ (M_{τn∧t} − M_{τ∧t})² ] + 2 E[ (M_{τn∧t} − M_{τ∧t})² ]^{1/2} E[ M²_{τ∧t} ]^{1/2}
≤ ( E{M²_{τn∧t}} − E{M²_{τ∧t}} ) + 2 ( E{M²_{τn∧t}} − E{M²_{τ∧t}} )^{1/2} E[ M²_t ]^{1/2}.
In the last step we used (3.3) in two ways. First, for a martingale it gives equality,
and so
E[ (M_{τn∧t} − M_{τ∧t})² ] = E{M²_{τn∧t}} − 2E{ E(M_{τn∧t} | F_{τ∧t}) M_{τ∧t} } + E{M²_{τ∧t}} = E{M²_{τn∧t}} − E{M²_{τ∧t}}.
Second, we applied (3.3) to the submartingale M² to get
E[ M²_{τ∧t} ]^{1/2} ≤ E[ M²_t ]^{1/2}.
The string of inequalities allows us to conclude that [M^{τn}]_t converges to
[M^τ]_t in L¹ as n → ∞, if we can show that
(3.15)   E{M²_{τn∧t}} → E{M²_{τ∧t}}.
To argue this last limit, first note that by right-continuity, M²_{τn∧t} → M²_{τ∧t}
almost surely. By optional stopping (3.6),
0 ≤ M²_{τn∧t} ≤ E( M²_t | F_{τn∧t} ).
This inequality and Lemma B.12 from the Appendix imply that the sequence
{Mτ2n ∧t : n ∈ N} is uniformly integrable. Under uniform integrability, the
almost sure convergence implies convergence of the expectations (3.15).
To summarize, we have shown that [M τn ]t → [M τ ]t in L1 as n → ∞.
By Step 1, [M τn ]t = [M ]τn ∧t which converges to [M ]τ ∧t by right-continuity
of the process [M ]. Putting these together, we get the almost sure equality
[M τ ]t = [M ]τ ∧t for L2 -martingales.
(3.16a)   = E[ 1_A ( ∑_{i=ℓ}^{m−1} (M_{t_{i+1}} − M_{t_i})² − [M]_t + [M]_s ) ]
(3.16b)   = E[ 1_A ( ∑_{i=0}^{m−1} (M_{t_{i+1}} − M_{t_i})² − [M]_t ) ]
(3.16c)      + E[ 1_A ( [M]_s − ∑_{i=0}^{ℓ−1} (M_{t_{i+1}} − M_{t_i})² ) ].
To apply this, the expectation on line (3.16a) has to be taken apart, the
conditioning applied to individual terms, and then the expectation put back
together. Letting the mesh of the partition tend to zero makes the expec-
tations on lines (3.16b)–(3.16c) vanish by the L1 convergence in (2.8) for
L2 -martingales.
In the limit we have
E[ 1_A ( M²_t − [M]_t ) ] = E[ 1_A ( M²_s − [M]_s ) ].
By Theorem 3.22 and Lemma 2.13 the covariation [M, N ] of two right-
continuous local martingales M and N exists. As a difference of increasing
processes, [M, N ] is a finite variation process.
Lemma 3.25. Let M and N be cadlag L2 -martingales or local L2 -martingales.
Let τ be a stopping time. Then [M τ , N ] = [M τ , N τ ] = [M, N ]τ .
Proof. [M τ , N τ ] = [M, N ]τ follows from Lemma 2.13 and Lemma 3.23. For
the first equality claimed, consider a partition of [0, t]. If 0 < τ ≤ t, let ` be
the index such that t` < τ ≤ t`+1 . Then
∑_i ( M^τ_{t_{i+1}} − M^τ_{t_i} )( N_{t_{i+1}} − N_{t_i} ) = ( M_τ − M_{t_ℓ} )( N_{t_{ℓ+1}} − N_τ ) 1{0 < τ ≤ t} + ∑_i ( M^τ_{t_{i+1}} − M^τ_{t_i} )( N^τ_{t_{i+1}} − N^τ_{t_i} ).
(If τ = 0 the equality above is still true, for both sides vanish.) Let the mesh
of the partition tend to zero. With cadlag paths, the term after the equality
sign converges almost surely to (Mτ − Mτ − )(Nτ + − Nτ )1{0<τ ≤t} = 0. The
convergence of the sums gives [M τ , N ] = [M τ , N τ ].
Theorem 3.26. (a) If M and N are right-continuous L2 -martingales, then
M N − [M, N ] is a martingale.
(b) If M and N are right-continuous local L2 -martingales, then M N −
[M, N ] is a local martingale.
3.5. Doob-Meyer decomposition
Definition 3.29. For 0 < u < ∞, let Tu be the collection of stopping times
τ that satisfy τ ≤ u. A process X is of class DL if the random variables
{Xτ : τ ∈ Tu } are uniformly integrable for each 0 < u < ∞.
and integer k,
E[ M²_{kt} − M²_0 ] = ∑_{j=0}^{k−1} E[ M²_{(j+1)t} − M²_{jt} ] = ∑_{j=0}^{k−1} E[ (M_{(j+1)t} − M_{jt})² ].
With this proposition we can easily handle our two basic examples.
Example 3.34. For a standard Brownian motion ⟨B⟩_t = [B]_t = t. For a
compensated Poisson process M_t = N_t − αt,
⟨M⟩_t = t E[M²_1] = t E[(N_1 − α)²] = αt.
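A quick Monte Carlo check of the compensated Poisson example (a sketch assuming NumPy; it verifies the moments at one fixed time):

```python
import numpy as np

# Compensated Poisson process M_t = N_t - alpha*t at a fixed time t:
# E[M_t] = 0 and E[M_t^2] = <M>_t = alpha*t.
rng = np.random.default_rng(6)
alpha, t = 3.0, 2.0
N = rng.poisson(alpha * t, size=500_000)    # N_t ~ Poisson(alpha*t)
M = N - alpha * t
mean, msq = float(M.mean()), float((M ** 2).mean())
print(mean, msq)   # near 0 and near alpha*t = 6
```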
To achieve this, start with n₀ = 1, and assuming n_{k−1} has been chosen, pick
n_k > n_{k−1} so that
‖M^{(m)} − M^{(n)}‖_{M²} ≤ 2^{−3k}
for m, n ≥ n_k. Then for m ≥ n_k,
1 ∧ E[ (M^{(m)}_k − M^{(n_k)}_k)² ]^{1/2} ≤ 2^k ‖M^{(m)} − M^{(n_k)}‖_{M²} ≤ 2^{−2k},
and the minimum with 1 is superfluous since 2^{−2k} < 1. Substituting this
back into (3.22) with ε = 2^{−k} gives (3.23) with 2^{−2k} on the right-hand side.
By the Borel-Cantelli lemma, there exists an event Ω₁ with P(Ω₁) = 1
such that for ω ∈ Ω₁,
sup_{0≤t≤k} | M^{(n_{k+1})}_t(ω) − M^{(n_k)}_t(ω) | < 2^{−k}
for all but finitely many k's. It follows that the sequence of cadlag functions
t ↦ M^{(n_k)}_t(ω) is Cauchy in the uniform metric over any bounded time
interval [0, T]. By Lemma A.3 in the Appendix, for each T < ∞ there exists
a cadlag process {N^{(T)}_t(ω) : 0 ≤ t ≤ T} such that M^{(n_k)}_t(ω) converges to
N^{(T)}_t(ω) uniformly on the time interval [0, T], as k → ∞, for any ω ∈ Ω₁.
N^{(S)}_t(ω) and N^{(T)}_t(ω) must agree for t ∈ [0, S ∧ T], since both are limits of
the same sequence. Thus we can define one cadlag function t ↦ M_t(ω) on
R₊ for ω ∈ Ω₁, such that M^{(n_k)}_t(ω) converges to M_t(ω) uniformly on each
bounded time interval [0, T]. To have M defined on all of Ω, set M_t(ω) = 0
for ω ∉ Ω₁.
The event Ω₁ lies in F_t by the assumption of completeness of the filtration.
Since M^{(n_k)}_t → M_t on Ω₁ while M_t = 0 on Ω₁ᶜ, it follows that M_t is
F_t-measurable. The almost sure limit M_t and the L² limit Y_t of the sequence
{M^{(n_k)}_t} must coincide almost surely. Consequently (3.21) becomes
(3.24) E[1A Mt ] = E[1A Ms ]
for all A ∈ F_s and gives the martingale property for M. To summarize, M
is now a square-integrable cadlag martingale, in other words an element of
M². The final piece, namely ‖M^{(n)} − M‖_{M²} → 0, follows because we can
replace Y_t by M_t in (3.20) due to the almost sure equality M_t = Y_t.
If all M (n) are continuous martingales, the uniform convergence above
produces a continuous limit M . This shows that Mc2 is a closed subspace of
M2 under the metric dM2 .
The convergence defined by (3.25) for all T < ∞ and ε > 0 is called
uniform convergence in probability on compact intervals.
We shall write M2,loc for the space of cadlag local L2 -martingales with
respect to a given filtration {Ft } on a given probability space (Ω, F, P ). We
do not introduce a distance function on this space.
Exercises
Exercise 3.1. Let A be an increasing process, and φ : R₊ × Ω → R a
bounded B_{R₊} ⊗ F-measurable function. Let T < ∞. Show that
gφ(ω) = ∫_{(0,T]} φ(t, ω) dA_t(ω)
is an F-measurable function. Show also that, for any B_{R₊} ⊗ F-measurable
nonnegative function φ : R₊ × Ω → R₊,
gφ(ω) = ∫_{(0,∞)} φ(t, ω) dA_t(ω)
is an F-measurable function. The integrals are Lebesgue-Stieltjes integrals,
evaluated separately for each ω. The only point in separating the two cases
is that if φ takes both positive and negative values, the integral over the
entire interval [0, ∞) might not be defined.
Hint: One can start with φ(t, ω) = 1(a,b]×Γ (t, ω) for 0 ≤ a < b < ∞ and
Γ ∈ F. Then apply Theorem B.2 from the Appendix.
Exercise 3.2. Let N = {N (t) : 0 ≤ t < ∞} be a homogeneous rate α
Poisson process with respect to {Ft } and Mt = Nt − αt the compensated
Poisson process. We have seen that the quadratic variation is [M ]t = Nt
while hM it = αt. It follows that N cannot be a natural increasing process.
In this exercise you show that the naturalness condition fails for N .
(a) Let λ > 0. Show that
X(t) = exp{−λN (t) + αt(1 − e−λ )}
is a martingale.
(b) Show that N is not a natural increasing process, by showing that for
X defined above, the condition
E[ ∫_{(0,t]} X(s) dN(s) ] = E[ ∫_{(0,t]} X(s−) dN(s) ]
fails. (In case you protest that X is not a bounded martingale, fix T > t
and consider X(s ∧ T ).)
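For part (a), one can at least check numerically that E[X(t)] = X(0) = 1 at a fixed time (a sketch assuming NumPy; it verifies only the constant-expectation consequence of the martingale property):

```python
import numpy as np

# X(t) = exp(-lambda*N(t) + alpha*t*(1 - e^{-lambda})) with N a rate-alpha
# Poisson process: a martingale with X(0) = 1, so E[X(t)] should equal 1.
rng = np.random.default_rng(7)
alpha, lam, t = 2.0, 0.7, 1.5
N = rng.poisson(alpha * t, size=500_000)
X = np.exp(-lam * N + alpha * t * (1.0 - np.exp(-lam)))
print(float(X.mean()))   # close to 1
```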
Chapter 4
Stochastic Integral
with respect to
Brownian Motion
Lemma 4.1. Fix a number u ∈ [0, 1]. Given a partition π = {0 = t₀ < t₁ <
· · · < t_{m(π)} = t}, let s_i = (1 − u)t_i + u t_{i+1}, and define
S(π) = ∑_{i=0}^{m(π)−1} B_{s_i} ( B_{t_{i+1}} − B_{t_i} ).
Then
lim_{mesh(π)→0} S(π) = ½ B_t² − ½ t + ut   in L²(P).
and
Var[S₂(π)] = ∑_i Var[ (B_{s_i} − B_{t_i})² ] = 2 ∑_i (s_i − t_i)² ≤ 2 ∑_i (t_{i+1} − t_i)² ≤ 2t mesh(π).
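Lemma 4.1 can be illustrated by computing S(π) for two evaluation points on the same paths: u = 0 (left endpoints, the Itô choice) and u = 1/2 (midpoints). The sketch below (assuming NumPy) estimates the L²(P) errors against the respective limits, and the mean gap between the two sums, which should be ut = t/2:

```python
import numpy as np

# Riemann sums S(pi) with s_i = (1-u) t_i + u t_{i+1}: the L^2 limit
# 0.5*B_t^2 - 0.5*t + u*t depends on u. Simulate on a grid of mesh t/(2n)
# so that both partition points and midpoints are grid points.
rng = np.random.default_rng(8)
t, n, paths = 1.0, 1000, 5_000
dB = rng.normal(0.0, np.sqrt(t / (2 * n)), size=(paths, 2 * n))
B = np.concatenate([np.zeros((paths, 1)), np.cumsum(dB, axis=1)], axis=1)
Bt = B[:, -1]

P = B[:, ::2]                              # values at partition points t_i = i*t/n
incr = P[:, 1:] - P[:, :-1]                # increments B_{t_{i+1}} - B_{t_i}
S0 = np.sum(P[:, :-1] * incr, axis=1)      # u = 0: left endpoints
Sh = np.sum(B[:, 1::2] * incr, axis=1)     # u = 1/2: midpoints

err0 = float(np.mean((S0 - (0.5 * Bt**2 - 0.5 * t)) ** 2))   # L^2 error, u = 0
errh = float(np.mean((Sh - 0.5 * Bt**2) ** 2))               # L^2 error, u = 1/2
gap = float(np.mean(Sh - S0))                                # expected gap u*t = t/2
print(err0, errh, gap)
```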
Let L²(B) denote the collection of all measurable, adapted processes X such
that
‖X‖_{L²([0,T]×Ω)} < ∞
for all T < ∞. A metric on L²(B) is defined by d_{L²}(X, Y) = ‖X − Y‖_{L²(B)}
where
(4.2)   ‖X‖_{L²(B)} = ∑_{k=1}^{∞} 2^{−k} ( 1 ∧ ‖X‖_{L²([0,k]×Ω)} ).
The triangle inequality
‖X + Y‖_{L²(B)} ≤ ‖X‖_{L²(B)} + ‖Y‖_{L²(B)}
is valid, and this gives the triangle inequality
d_{L²}(X, Y) ≤ d_{L²}(X, Z) + d_{L²}(Z, Y)
required for d_{L²} to be a genuine metric.
To have a metric, one also needs the property dL2 (X, Y ) = 0 iff X =
Y . We have to adopt the point of view that two processes X and Y are
considered “equal” if the set of points (t, ω) where X(t, ω) 6= Y (t, ω) has
m ⊗ P -measure zero. Equivalently,
(4.3)   ∫₀^∞ P{ X(t) ≠ Y(t) } dt = 0.
This will be the class of processes X for which the stochastic integral process
with respect to Brownian motion, denoted by
(X · B)_t = ∫₀ᵗ X_s dB_s,
is constructed. The building blocks are simple predictable processes of the form
(4.5)   X_t(ω) = ξ₀(ω) 1_{{0}}(t) + ∑_{i=1}^{n−1} ξ_i(ω) 1_{(t_i, t_{i+1}]}(t)
where n is a finite integer, 0 = t₀ = t₁ < t₂ < · · · < t_n are time points, and for
0 ≤ i ≤ n − 1, ξ_i is a bounded F_{t_i}-measurable random variable on (Ω, F, P).
Predictability refers to the fact that the value Xt can be “predicted” from
{Xs : s < t}. Here this point is rather simple because X is left-continuous
so Xt = lim Xs as s % t. In the next chapter we need to deal seriously with
the notion of predictability. Here it is not really needed, and we use the
term only to be consistent with what comes later. The value ξ0 at t = 0
is irrelevant both for the stochastic integral of X and for approximating
general processes. We include it so that the value X(0, ω) is not artificially
restricted.
Proof. We begin by showing that, given T < ∞, we can find simple pre-
dictable processes Y_k^{(T)} that vanish outside [0, T] and satisfy
(4.6)   lim_{k→∞} E[ ∫₀ᵀ ( Y_k^{(T)}(t) − X(t) )² dt ] = 0.
To prove this, start by ignoring the expectation and rewrite the double
integral as follows:
∫₀ᵀ dt ∫₀¹ ds ( Z^{n,s}(t) − X(t) )²
= ∫₀ᵀ dt ∑_{j∈Z} ∫₀¹ ds ( X(s + 2⁻ⁿj, ω) − X(t, ω) )² 1_{(s+2⁻ⁿj, s+2⁻ⁿ(j+1)]}(t)
= ∫₀ᵀ dt ∑_{j∈Z} ∫₀¹ ds ( X(s + 2⁻ⁿj, ω) − X(t, ω) )² 1_{[t−2⁻ⁿ(j+1), t−2⁻ⁿj)}(s).
To complete the proof, create the simple predictable processes {Y_k^{(m)}}
for all T = m ∈ N. For each m, pick k_m such that
E[ ∫₀ᵐ ( Y_{k_m}^{(m)}(t) − X(t) )² dt ] < 1/m.
Then X_m = Y_{k_m}^{(m)} satisfies the requirement of the lemma.
Proof. Let X^{(k)} = (X ∧ k) ∨ (−k). Since |X^{(k)} − X| ≤ |X| and |X^{(k)} − X| → 0
pointwise on R₊ × Ω,
lim_{k→∞} E[ ∫₀ᵐ ( X^{(k)}(t) − X(t) )² dt ] = 0
The integral of the simple predictable process X is defined by
(4.8)   (X · B)_t(ω) = ∑_{i=1}^{n−1} ξ_i(ω) ( B_{t_{i+1}∧t}(ω) − B_{t_i∧t}(ω) ).
Note that our convention is such that the value of X at t = 0 does not
influence the integral. We also write I(X) = X · B when we need a symbol
for the mapping I : X 7→ X · B.
Let S2 denote the space of simple predictable processes. It is a subspace
of L2 (B). An element X of S2 can be represented in the form (4.5) in many
different ways. We need to check that the integral X · B depends only on
the process X and not on the particular representation. Also, we need to
know that S2 is a linear space, and that the integral I(X) is a linear map
on S2 .
Lemma 4.4. (a) Suppose the process X in (4.5) also satisfies
X_t(ω) = η₀(ω) 1_{{0}}(t) + ∑_{j=1}^{m−1} η_j(ω) 1_{(s_j, s_{j+1}]}(t)
for all (t, ω), where 0 = s₀ = s₁ < s₂ < · · · < s_m < ∞ and η_j is F_{s_j}-
measurable for 0 ≤ j ≤ m − 1. Then for each (t, ω),
∑_{i=1}^{n−1} ξ_i(ω) ( B_{t_{i+1}∧t}(ω) − B_{t_i∧t}(ω) ) = ∑_{j=1}^{m−1} η_j(ω) ( B_{s_{j+1}∧t}(ω) − B_{s_j∧t}(ω) ).
Hints for proof. Let {uk } = {sj } ∪ {ti } be the common refinement of
the partitions {sj } and {ti }. Rewrite both representations of X in terms
of {uk }. The same idea can be used for part (b) to write two arbitrary
simple processes in terms of a common partition, which makes adding them
easy.
Next we need some continuity properties for the integral. Recall the
distance measure k · kM2 defined for continuous L2 -martingales by (3.18).
Lemma 4.5. Let X ∈ S2 . Then X · B is a continuous square-integrable
martingale. We have these isometries:
(4.9)   E[ (X · B)_t² ] = E[ ∫₀ᵗ X_s² ds ]   for all t ≥ 0,
and
(4.10)   ‖X · B‖_{M²} = ‖X‖_{L²(B)}.
From the isometry property we can deduce that simple process approx-
imation gives approximation of stochastic integrals.
Lemma 4.6. Let X ∈ L²(B). Then there is a unique continuous L²-
martingale Y such that, for any sequence of simple predictable processes
{X_n} such that
‖X − X_n‖_{L²(B)} → 0,
we have
‖Y − X_n · B‖_{M²} → 0.
Hints for proof. It all follows from these facts: an approximating sequence
of simple predictable processes exists for each process in L2 (B), a convergent
sequence in a metric space is a Cauchy sequence, a Cauchy sequence in a
complete metric space converges, the space Mc2 of continuous L2 -martingales
is complete, the isometry (4.10), and the triangle inequality.
The reader familiar with more abstract principles of analysis should note
that the extension of the stochastic integral X ·B from X ∈ S2 to X ∈ L2 (B)
is an instance of a general, classic argument. A uniformly continuous map
from a metric space into a complete metric space can always be extended
to the closure of its domain. If the spaces are linear, the linear operations
are continuous, and the map is linear, then the extension is a linear map
too. In this case the map is X 7→ X · B, first defined for X ∈ S2 . Uniform
continuity follows from linearity and (4.10). Proposition 4.3 implies that the
closure of S2 in L2 (B) is all of L2 (B).
Some books first define the integral (X · B)t at a fixed time t as a map
from L2 ([0, t]×Ω, m⊗P ) into L2 (P ), utilizing the completeness of L2 -spaces.
Then one needs a separate argument to show that the integrals defined for
different times t can be combined into a continuous martingale t 7→ (X · B)t .
We defined the integral directly as a map into the space of martingales Mc2
to avoid the extra argument. Of course, we did not really save work. We
just did part of the work earlier when we proved that Mc2 is a complete
space (Lemma 3.35).
Example 4.8. In the definition (4.5) of the simple predictable process we
required the ξi bounded because this will be convenient later. For this
section it would have been more convenient to allow square-integrable ξi .
So let us derive the integral for that case. Let
X(t) = ∑_{i=1}^{m−1} η_i 1_{(s_i, s_{i+1}]}(t)
where 0 ≤ s1 < · · · < sm and each ηi ∈ L2 (P ) is Fsi -measurable. Check
that a sequence of approximating simple processes is given by
X_k(t) = ∑_{i=1}^{m−1} η_i^{(k)} 1_{(s_i, s_{i+1}]}(t)
with truncated variables η_i^{(k)} = (η_i ∧ k) ∨ (−k). And then that
∫₀ᵗ X(s) dB_s = ∑_{i=1}^{m−1} η_i ( B_{t∧s_{i+1}} − B_{t∧s_i} ).
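This formula can be combined with the isometry (4.9) in a simulation. The sketch below (assuming NumPy) takes η_i = B_{s_i}, which is F_{s_i}-measurable and square-integrable, computes the integral pathwise, and compares E[(X·B)_t²] with E ∫₀ᵗ X_s² ds; the partition values are our own illustrative choices, not the text's.

```python
import numpy as np

# Integral of the simple process X = sum_i eta_i 1_{(s_i, s_{i+1}]} with
# eta_i = B_{s_i}, evaluated at t. The isometry predicts
#   E[(X.B)_t^2] = sum_i E[eta_i^2] * |(s_i, s_{i+1}] ∩ [0, t]|,
# and here E[eta_i^2] = s_i.
rng = np.random.default_rng(9)
paths, t = 400_000, 1.2
s = [0.2, 0.6, 1.0, 1.4]                       # s_1 < s_2 < s_3 < s_4
times = [s[0], s[1], s[2], min(s[3], t)]       # s_1, s_2, s_3, t ∧ s_4
dts = np.diff([0.0] + times)
B = np.cumsum(rng.normal(0.0, np.sqrt(dts), size=(paths, 4)), axis=1)

eta = B[:, :3]                                 # B at s_1, s_2, s_3
incr = B[:, 1:] - B[:, :-1]                    # B_{t∧s_{i+1}} - B_{t∧s_i}
I = np.sum(eta * incr, axis=1)                 # (X.B)_t path by path

lhs = float(np.mean(I ** 2))
rhs = 0.2 * 0.4 + 0.6 * 0.4 + 1.0 * 0.2        # = 0.52 from the isometry
print(float(I.mean()), lhs, rhs)               # mean near 0, lhs near rhs
```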
= ½ ∑_{i=0}^{n2ⁿ−1} ( t^n_{i+1} − t^n_i )² = ½ n 2⁻ⁿ.
Thus X_n converges to B in L²(B) as n → ∞. By Example 4.8
∫₀ᵗ X_n(s) dB_s = ∑_{i=1}^{n2ⁿ−1} B_{t^n_i} ( B_{t∧t^n_{i+1}} − B_{t∧t^n_i} ).
By the isometry (4.12) in the next Proposition, this integral converges to
∫₀ᵗ B_s dB_s in L² as n → ∞, so by Lemma 4.1,
∫₀ᵗ B_s dB_s = ½ B_t² − ½ t.
Hints for proof. Parts (a)–(b): These properties are inherited from the
integrals of the approximating simple processes Xn . One needs to justify
taking limits in Lemma 4.4(b) and Lemma 4.5.
The proof of part (c) is different from the one that is used in the next
chapter. So we give here more details than in previous proofs.
By considering Z = X −Y , it suffices to prove that if Z ∈ L2 (B) satisfies
Z(t, ω) = 0 for t ≤ τ (ω), then (Z · B)t (ω) = 0 for t ≤ τ (ω).
Assume first that Z is bounded, so |Z(t, ω)| ≤ C. Pick a sequence {Zn }
of simple predictable processes that converge to Z in L2 (B). Let Zn be of
the generic type (recall (4.5))
Z_n(t, ω) = ∑_{i=1}^{m(n)−1} ξ^n_i(ω) 1_{(t^n_i, t^n_{i+1}]}(t).
Estimate
| Z_n(t) 1{τ < t} − Z̃_n(t) | ≤ C ∑_i | 1{τ < t} − 1{τ ≤ t^n_i} | 1_{(t^n_i, t^n_{i+1}]}(t) ≤ C ∑_i 1{ t^n_i < τ < t^n_{i+1} } 1_{(t^n_i, t^n_{i+1}]}(t).
We can artificially add partition points tni to each Zn so that this last quan-
tity converges to 0 as n → ∞, for each fixed T . This verifies (4.14), and
thereby (4.13).
The integral of Z̃_n is given explicitly by
(Z̃_n · B)_t = ∑_{i=1}^{m(n)−1} ξ^n_i 1{τ ≤ t^n_i} ( B_{t∧t^n_{i+1}} − B_{t∧t^n_i} ).
By inspecting each term, we see that (Z̃_n · B)_t = 0 if t ≤ τ. By the definition
of the integral and (4.13), Z_n · B → Z · B in M²_c. Then by Lemma 3.36
there exists a subsequence {Z̃_{n_k} · B} and an event Ω₀ of full probability such
that, for each ω ∈ Ω₀ and T < ∞,
(Z̃_{n_k} · B)_t(ω) → (Z · B)_t(ω)   uniformly for 0 ≤ t ≤ T.
For any ω ∈ Ω₀, in the k → ∞ limit (Z · B)_t(ω) = 0 for t ≤ τ(ω). Part (c)
has been proved for a bounded process.
To complete the proof, given Z ∈ L2 (B), let Z (k) (t, ω) = (Z(t, ω) ∧ k) ∨
(−k), a bounded process in L2 (B) with the same property Z (k) (t, ω) = 0 if
t ≤ τ (ω). Apply the previous step to Z (k) and justify what happens in the
limit.
Lemma 4.11. For almost every ω, if t ≤ τm (ω)∧τn (ω) then (Xm ·B)t (ω) =
(Xn · B)t (ω).
The lemma says that, for a given (t, ω), once n is large enough so that
τn (ω) ≥ t, the value (Xn · B)t (ω) does not change with n. The definition
(4.4) guarantees that τn (ω) % ∞ for almost every ω. These ingredients
almost justify the next extension of the stochastic integral to L(B).
Definition 4.12. Let B be a Brownian motion on a probability space
(Ω, F, P ) with respect to a filtration {Ft }, and X ∈ L(B). Let Ω0 be
the event of full probability on which τn % ∞ and the conclusion of Lemma
4.11 holds. The stochastic integral X · B is defined for ω ∈ Ω0 by
(4.16) (X · B)t (ω) = (Xn · B)t (ω) for any n such that τn (ω) ≥ t.
For ω ∉ Ω₀ define (X · B)_t(ω) ≡ 0. The process X · B is a continuous local
L²-martingale.
One can check that X ∈ L(B) if and only if X has a localizing sequence
{σn }. Lemma 4.11 and Definition 4.12 work equally well with {τn } replaced
by an arbitrary localizing sequence {σn }. Fix such a sequence {σn } and
define X̃_n(t) = 1{t ≤ σ_n} X(t). Let Ω₁ be the event of full probability on
which σ_n ↗ ∞ and, for all pairs m, n, (X̃_m · B)_t = (X̃_n · B)_t for t ≤ σ_m ∧ σ_n.
(In other words, the conclusion of Lemma 4.11 holds for {σn }.) Let Y be
the process defined by
(4.17)   Y_t(ω) = (X̃_n · B)_t(ω) for any n such that σ_n(ω) ≥ t,
for ω ∈ Ω1 , and identically zero outside Ω1 .
This lemma tells us that for X ∈ L(B) the stochastic integral X · B can
be defined in terms of any localizing sequence of stopping times.
Exercises
Exercise 4.1. Show that for any [0, ∞]-valued measurable function Y on
(Ω, F), the set {(s, ω) ∈ R+ × Ω : Y (ω) > s} is BR+ ⊗ F-measurable.
Hint. Start with a simple Y. Show that if Y_n ↗ Y pointwise, then
{(s, ω) : Y(ω) > s} = ⋃_n {(s, ω) : Y_n(ω) > s}.
Exercise 4.2. Suppose η ∈ L²(P) is F_s-measurable and t > s. Show that
E[ η² (B_t − B_s)² ] = E[η²] · E[ (B_t − B_s)² ].
Chapter 5

Stochastic Integration
of Predictable
Processes
The main goal of this chapter is the definition of the stochastic integral
∫_{(0,t]} X(s) dY(s) where the integrator Y is a cadlag semimartingale and X
is a locally bounded predictable process. The most important special case
is the one where the integrand is of the form X(t−) for some cadlag process
X. In this case the stochastic integral ∫_{(0,t]} X(s−) dY(s) can be realized as
the limit of Riemann sums
S_n(t) = ∑_{i=0}^{∞} X(s_i) ( Y(s_{i+1} ∧ t) − Y(s_i ∧ t) )
when the mesh of the partition {si } tends to zero. The convergence is then
uniform on compact time intervals, and happens in probability. Random
partitions of stopping times can also be used.
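For intuition, the convergence of these Riemann sums can be observed numerically. The sketch below (an illustration, not part of the text) simulates one Brownian path B and compares left-endpoint sums of ∫_0^T B dB over coarser sub-grids with the closed form (B_T² − T)/2 known from Itô calculus.

```python
import numpy as np

# Simulate one Brownian path on a fine dyadic grid over [0, T].
rng = np.random.default_rng(0)
T, n = 1.0, 2**16
dt = T / n
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])

def left_sum(path, stride):
    # Left-endpoint Riemann sum  sum_i X(s_i)(Y(s_{i+1}) - Y(s_i))
    # with X = Y = path, over the sub-grid with the given stride.
    pts = path[::stride]
    return float(np.sum(pts[:-1] * np.diff(pts)))

# The sums should approach the Ito integral (B_T^2 - T)/2 as the mesh shrinks.
exact = 0.5 * (B[-1] ** 2 - T)
errors = [abs(left_sum(B, 2**k) - exact) for k in (12, 8, 4, 0)]
```

On typical paths the errors shrink as the mesh decreases; as the text notes, the convergence is in probability, not pathwise for every partition sequence.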
These results will be reached in Section 5.3. Before the semimartingale
integral we explain predictable processes and construct the integral with
respect to L2 -martingales and local L2 -martingales. Right-continuity of the
filtration {Ft } is not needed until we define the integral with respect to a
semimartingale. And even there it is needed only for guaranteeing that the
semimartingale has a decomposition whose local martingale part is a local
L2 -martingale. Right-continuity of {Ft } is not needed for the arguments
that establish the integral.
108 5. Stochastic Integral
Xn(t, ω) = X0(ω) 1{0}(t) + Σ_{i=0}^∞ X_{i2^{−n}}(ω) 1_{(i2^{−n}, (i+1)2^{−n}]}(t).
5.1. Square-integrable martingale integrator 109
Then for B ∈ B_R,

{(t, ω) : Xn(t, ω) ∈ B} = ( {0} × {ω : X0(ω) ∈ B} ) ∪ ⋃_{i=0}^∞ ( (i2^{−n}, (i+1)2^{−n}] × {ω : X_{i2^{−n}}(ω) ∈ B} )

which is an event in P, being a countable union of predictable rectangles.
Thus Xn is P-measurable. By left-continuity of X, Xn (t, ω) → X(t, ω) as
n → ∞ for each fixed (t, ω). Since pointwise limits preserve measurability,
X is also P-measurable.
We have shown that P contains σ-fields (a)–(c).
The indicator of a predictable rectangle is itself an adapted caglad pro-
cess, and by definition this subclass of caglad processes generates P. Thus
σ-field (c) contains P. By the same reasoning, also σ-field (b) contains P.
It remains to show that σ-field (a) contains P. We show that all pre-
dictable rectangles lie in σ-field (a) by showing that their indicator functions
are pointwise limits of continuous adapted processes.
If X = 1{0}×F0 for F0 ∈ F0 , let
gn(t) = { 1 − nt, 0 ≤ t < 1/n;  0, t ≥ 1/n },

and then define Xn(t, ω) = 1_{F0}(ω) gn(t). Xn is clearly continuous. For a fixed t, writing

Xn(t) = { gn(t) 1_{F0}, 0 ≤ t < 1/n;  0, t ≥ 1/n },
and noting that F0 ∈ Ft for all t ≥ 0, shows that Xn is adapted. Since
Xn (t, ω) → X(t, ω) as n → ∞ for each fixed (t, ω), {0} × F0 lies in σ-field
(a).
If X = 1_{(u,v]×F} for F ∈ Fu, let

hn(t) = { n(t − u), u ≤ t < u + 1/n;  1, u + 1/n ≤ t < v;  1 − n(t − v), v ≤ t ≤ v + 1/n;  0, t < u or t > v + 1/n }.
Consider only n large enough so that 1/n < v − u. Define Xn (t, ω) =
1F (ω)hn (t), and adapt the previous argument. We leave the missing details
as Exercise 5.3.
The previous lemma tells us that all continuous adapted processes, all
left-continuous adapted processes, and any process that is a pointwise limit
The meaning of formula (5.1) is that first, for each fixed ω, the function
t 7→ 1A (t, ω) is integrated by the Lebesgue-Stieltjes measure Λ[M ](ω) of the
nondecreasing right-continuous function t 7→ [M ]t (ω). The resulting integral
is a measurable function of ω, which is then averaged over the probability
space (Ω, F, P) (Exercise 3.1). Recall that our convention for the measure Λ_{[M](ω)} of the origin is Λ_{[M](ω)}{0} = 0.
For predictable processes X, we define the L2 norm over the set [0, T] × Ω under the measure µM by

(5.3)    ‖X‖_{µM,T} = ( ∫_{[0,T]×Ω} |X|² dµM )^{1/2} = ( E ∫_{[0,T]} |X(t, ω)|² d[M]_t(ω) )^{1/2}.
such that η_i^N(ω) → ξ_i(ω) as N → ∞. Here β_j^{i,N} are constants and F_j^{i,N} ∈ F_{t_i}. Adding these up, we have that

X_t(ω) = lim_{N→∞} [ η_0^N(ω) 1{0}(t) + Σ_{i=1}^{n−1} η_i^N(ω) 1_{(t_i, t_{i+1}]}(t) ]
       = lim_{N→∞} [ Σ_{j=1}^{m(0,N)} β_j^{0,N} 1_{{0}×F_j^{0,N}}(t, ω) + Σ_{i=1}^{n−1} Σ_{j=1}^{m(i,N)} β_j^{i,N} 1_{(t_i, t_{i+1}]×F_j^{i,N}}(t, ω) ].
Definition 5.6. For a simple predictable process of the type (5.6), the
stochastic integral is the process X · M defined by
(5.7)    (X · M)_t(ω) = Σ_{i=1}^{n−1} ξ_i(ω) ( M_{t_{i+1}∧t}(ω) − M_{t_i∧t}(ω) ).
Note that our convention is such that the value of X at t = 0 does not
influence the integral. We also write I(X) = X · M when we need a symbol
for the mapping I : X 7→ X · M .
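Formula (5.7) is just a finite sum along each path, so it can be evaluated directly. The helper below is a hypothetical one written for this sketch (not part of the text): it computes (X · M)_t for one path of M, given the values ξ_i and the partition points.

```python
def simple_integral(xi, t_pts, M, t):
    # (X·M)_t = sum_i xi_i ( M(t_{i+1} ∧ t) - M(t_i ∧ t) ) for the simple
    # predictable process X = sum_i xi_i 1_{(t_i, t_{i+1}]}, evaluated along
    # one path, where M is the path as a function of time.
    total = 0.0
    for i in range(len(t_pts) - 1):
        total += xi[i] * (M(min(t_pts[i + 1], t)) - M(min(t_pts[i], t)))
    return total

# One integrand step of height 2 on (0, 1], evaluated at t = 0.5 along the
# deterministic path M(s) = s: the sum picks up 2 * (0.5 - 0) = 1.
value = simple_integral([2.0], [0.0, 1.0], lambda s: s, 0.5)
```

Note that the value of X at t = 0 never enters the sum, matching the convention above.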
Let S2 denote the subspace of L2 consisting of simple predictable pro-
cesses. Any particular element X of S2 can be represented in the form (5.6)
with many different choices of random variables and time intervals. The
first thing to check is that the integral X · M depends only on the process X
and not on the particular representation (5.6) used. Also, let us check that
the space S2 is a linear space and the integral behaves linearly, since these
properties are not immediately clear from the definitions.
for all (t, ω), where 0 = s0 = s1 < s2 < · · · < sm < ∞ and ηj is F_{s_j}-measurable for 0 ≤ j ≤ m − 1. Then for each (t, ω),

Σ_{i=1}^{n−1} ξ_i(ω) ( M_{t_{i+1}∧t}(ω) − M_{t_i∧t}(ω) ) = Σ_{j=1}^{m−1} η_j(ω) ( M_{s_{j+1}∧t}(ω) − M_{s_j∧t}(ω) ).
For t ∈ (uk , uk+1 ], Xt (ω) = ξi (ω) and Xt (ω) = ηj (ω). So for these particular
i and j, ξi = ηj .
The proof now follows from a reordering of the sums for the stochastic
integrals.
Σ_{i=1}^{n−1} ξ_i (M_{t_{i+1}∧t} − M_{t_i∧t})
  = Σ_{i=1}^{n−1} Σ_{k=1}^{p−1} ξ_i (M_{u_{k+1}∧t} − M_{u_k∧t}) 1{(u_k, u_{k+1}] ⊆ (t_i, t_{i+1}]}
  = Σ_{k=1}^{p−1} (M_{u_{k+1}∧t} − M_{u_k∧t}) Σ_{i=1}^{n−1} ξ_i 1{(u_k, u_{k+1}] ⊆ (t_i, t_{i+1}]}
  = Σ_{k=1}^{p−1} (M_{u_{k+1}∧t} − M_{u_k∧t}) Σ_{j=1}^{m−1} η_j 1{(u_k, u_{k+1}] ⊆ (s_j, s_{j+1}]}
  = Σ_{j=1}^{m−1} η_j Σ_{k=1}^{p−1} (M_{u_{k+1}∧t} − M_{u_k∧t}) 1{(u_k, u_{k+1}] ⊆ (s_j, s_{j+1}]}
  = Σ_{j=1}^{m−1} η_j (M_{s_{j+1}∧t} − M_{s_j∧t}).
The representation

αX_t + βY_t = (αξ_0 + βη_0) 1{0}(t) + Σ_{k=1}^{p−1} (αρ_k + βζ_k) 1_{(u_k, u_{k+1}]}(t)
and
Proof. The cadlag property for each fixed ω is clear from the definition of
X · M , as is the continuity if M is continuous to begin with.
Linear combinations of martingales are martingales. So to prove that
X · M is a martingale it suffices to check this statement: if M is a mar-
tingale, u < v and ξ is a bounded Fu -measurable random variable, then
Zt = ξ(Mt∧v − Mt∧u ) is a martingale. The boundedness of ξ and integrabil-
ity of M guarantee integrability of Zt . Take s < t.
First, if s < u, then
E[Zt | Fs] = E[ ξ(M_{t∧v} − M_{t∧u}) | Fs ]
           = E[ ξ E{ M_{t∧v} − M_{t∧u} | Fu } | Fs ]
           = 0 = Zs
because Mt∧v − Mt∧u = 0 for t ≤ u, and for t > u the martingale property
of M gives
We claim that each term of the last sum has zero expectation. Since i < j,
ti+1 ≤ tj and both ξi and ξj are Ftj -measurable.
E[ ξ_i ξ_j (M_{t∧t_{i+1}} − M_{t∧t_i})(M_{t∧t_{j+1}} − M_{t∧t_j}) ]
  = E[ ξ_i ξ_j (M_{t∧t_{i+1}} − M_{t∧t_i}) E{ M_{t∧t_{j+1}} − M_{t∧t_j} | F_{t_j} } ] = 0
E[(X · M)_t²] = Σ_{i=1}^{n−1} E[ ξ_i² (M_{t∧t_{i+1}} − M_{t∧t_i})² ]
  = Σ_{i=1}^{n−1} E[ ξ_i² E{ (M_{t∧t_{i+1}} − M_{t∧t_i})² | F_{t_i} } ]
  = Σ_{i=1}^{n−1} E[ ξ_i² E{ M²_{t∧t_{i+1}} − M²_{t∧t_i} | F_{t_i} } ]
  = Σ_{i=1}^{n−1} E[ ξ_i² E{ [M]_{t∧t_{i+1}} − [M]_{t∧t_i} | F_{t_i} } ]
  = Σ_{i=1}^{n−1} E[ ξ_i² ( [M]_{t∧t_{i+1}} − [M]_{t∧t_i} ) ]
  = Σ_{i=1}^{n−1} E[ ξ_i² ∫_{[0,t]} 1_{(t_i, t_{i+1}]}(s) d[M]_s ]
  = E ∫_{[0,t]} ( ξ_0² 1{0}(s) + Σ_{i=1}^{n−1} ξ_i² 1_{(t_i, t_{i+1}]}(s) ) d[M]_s
  = E ∫_{[0,t]} ( ξ_0 1{0}(s) + Σ_{i=1}^{n−1} ξ_i 1_{(t_i, t_{i+1}]}(s) )² d[M]_s
  = ∫_{[0,t]×Ω} X² dµM.
In the third last equality we added the term ξ02 1{0} (s) inside the d[M ]s -
integral because this term integrates to zero (recall that Λ[M ] {0} = 0). In
the second last equality we used the equality
n−1
X n−1
X 2
2 2
ξ0 1{0} (s) + ξi 1(ti ,ti+1 ] (s) = ξ0 1{0} (s) + ξi 1(ti ,ti+1 ] (s)
i=1 i=1
which is true due to the pairwise disjointness of the time intervals.
The above calculation checks that
‖(X · M)_t‖_{L²(P)} = ‖X‖_{µM,t}
for any t > 0. Comparison of formulas (3.18) and (5.4) then proves (5.9).
Let us summarize the message of Lemmas 5.7 and 5.8 in words. The
stochastic integral I : X 7→ X · M is a linear map from the space S2 of
predictable simple processes into M2 . Equality (5.9) says that this map is
a linear isometry that maps from the subspace (S2 , dL2 ) of the metric space
(L2 , dL2 ), and into the metric space (M2 , dM2 ). In case M is continuous,
the map goes into the space (Mc2 , dM2 ).
A consequence of (5.9) is that if X and Y satisfy (5.5) then X · M and
Y · M are indistinguishable. For example, we may have Yt = Xt + ζ1{t = 0}
for a bounded F0 -measurable random variable ζ. Then the integrals X · M
and Y · M are indistinguishable, in other words the same process. This is
no different from the analytic fact that changing the value of a function f
on [a, b] at a single point (or even at countably many points) does not affect the value of the integral ∫_a^b f(x) dx.
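The isometry can also be checked by simulation. The sketch below (an illustration under stated assumptions, not from the text) uses the simple random walk martingale M with ±1 increments, for which [M]_k = k, and the predictable integrand X_k = M_{k−1}; both sides of the isometry then equal Σ_{k=1}^{n} (k − 1) = 10 for n = 5 steps.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_paths = 5, 200_000

# ±1 increments make M a martingale with [M]_k = k (each squared jump is 1).
dM = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
M = np.cumsum(dM, axis=1)

# Predictable integrand X_k = M_{k-1} (known one step ahead of the increment).
X = np.concatenate([np.zeros((n_paths, 1)), M[:, :-1]], axis=1)

integral = np.sum(X * dM, axis=1)             # (X · M)_n, path by path
lhs = float(np.mean(integral ** 2))           # E[(X · M)_n^2]
rhs = float(np.mean(np.sum(X ** 2, axis=1)))  # E[sum_k X_k^2 ([M]_k - [M]_{k-1})]
```

With 200,000 paths both Monte Carlo estimates land close to 10, consistent with the isometry; the agreement is statistical, not exact.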
We come to the approximation step.
Lemma 5.9. For any X ∈ L2 there exists a sequence Xn ∈ S2 such that ‖X − Xn‖_{L²} → 0.
Proof. Let L̃2 denote the class of X ∈ L2 for which this approximation is possible. Of course S2 itself is a subset of L̃2.
Indicator functions of time-bounded predictable rectangles are of the
form
1{0}×F0 (t, ω) = 1F0 (ω)1{0} (t),
or
1(u,v]×F (t, ω) = 1F (ω)1(u,v] (t),
for F0 ∈ F0 , 0 ≤ u < v < ∞, and F ∈ Fu . They are elements of S2 due to
(5.2). Furthermore, since S2 is a linear space, it contains all simple functions
of the form

(5.10)    X(t, ω) = Σ_{i=0}^n c_i 1_{R_i}(t, ω)
where {ci } are finite constants and {Ri } are time-bounded predictable rect-
angles.
Now we do the actual approximation of predictable processes, beginning
with constant multiples of indicator functions of predictable sets.
lie in L̃2.
Consequently

lim sup_{m→∞} ‖X − Xm‖_{L²} ≤ Σ_{k=1}^{n} 2^{−k} lim sup_{m→∞} ‖X − Xm‖_{µM,k} + ε/3 = ε/3.

Fix m large enough so that ‖X − Xm‖_{L²} ≤ ε/2. Using Step 1 find a process Z ∈ S2 such that ‖Xm − Z‖_{L²} < ε/2. Then by the triangle inequality ‖X − Z‖_{L²} ≤ ε. We have shown that an arbitrary process X ∈ L2 can be approximated by simple predictable processes in the L2-distance.
if and only if
E[(N_t^{(j)} − N_t)²] → 0
for each t ≥ 0.
In particular, at each time t ≥ 0 the integral (X · M )t is the mean-square
limit of the integrals (Xn · M )t of approximating simple processes. These
observations are used in the extension of the isometric property of the inte-
gral.
Proposition 5.11. Let M ∈ M2 and X ∈ L2 (M, P). Then we have the
isometries
(5.12)    E[(X · M)_t²] = ∫_{[0,t]×Ω} X² dµM    for all t ≥ 0,

and

(5.13)    ‖X · M‖_{M²} = ‖X‖_{L²}.
In particular, if X, Y ∈ L2 (M, P) are µM -equivalent in the sense (5.5), then
X · M and Y · M are indistinguishable.
Proof. As already observed, the triangle inequality is valid for the distance
measures ‖·‖_{L²} and ‖·‖_{M²}. From this we get a continuity property. Let Z, W ∈ L2. Then

‖Z‖_{L²} − ‖W‖_{L²} ≤ ( ‖Z − W‖_{L²} + ‖W‖_{L²} ) − ‖W‖_{L²} = ‖Z − W‖_{L²}.

This and the same inequality with Z and W switched give

(5.14)    | ‖Z‖_{L²} − ‖W‖_{L²} | ≤ ‖Z − W‖_{L²}.

This same calculation applies to ‖·‖_{M²} also, and of course equally well to the L2 norms on Ω and [0, T] × Ω.
Let Xn ∈ S2 be a sequence such that ‖Xn − X‖_{L²} → 0. As we proved in Lemma 5.8, the isometries hold for Xn ∈ S2. Consequently to prove the proposition we need only let n → ∞ in the equalities

E[(Xn · M)_t²] = ∫_{[0,t]×Ω} Xn² dµM

and

‖Xn · M‖_{M²} = ‖Xn‖_{L²}
that come from Lemma 5.8. Each term converges to the corresponding term
with Xn replaced by X.
The last statement of the proposition follows because ‖X − Y‖_{L²} = 0 iff X and Y are µM-equivalent, and ‖X · M − Y · M‖_{M²} = 0 iff X · M and Y · M are indistinguishable.
for each ε > 0 and T < ∞. By the Borel-Cantelli lemma, along some
subsequence {nj } there is almost sure convergence uniformly on compact
time intervals: for P -almost every ω
It is more accurate to use the interval (0, t] above rather than [0, t] because
the integral does not take into consideration any jump of the martingale at
the origin. Precisely, if ζ is an F0-measurable random variable and M̃_t = ζ + M_t, then [M̃] = [M], the spaces L2(M̃, P) and L2(M, P) coincide, and X · M̃ = X · M for each admissible integrand.
An integral of the type

∫_{(u,v]} G(s, ω) d[M]_s(ω)
and
∫_{(0,t]} 1_{(u,v]} X dM = (X · M)_{v∧t} − (X · M)_{u∧t}

(5.19)                   = ∫_{(u∧t, v∧t]} X dM.
The inclusion or exclusion of the origin in the interval [0, v] is immaterial be-
cause a process of the type 1{0} (t)X(t, ω) for X ∈ L2 (M, P) is µM -equivalent
to the identically zero process, and hence has zero stochastic integral.
(c) For s < t, we have a conditional form of the isometry:

(5.20)    E[ ( (X · M)_t − (X · M)_s )² | Fs ] = E[ ∫_{(s,t]} X_u² d[M]_u | Fs ].
Part (c). First we check this for the simple process Xn in (5.16). This
is essentially a redoing of the calculations in the proof of Lemma 5.8. Let
s < t. If s ≥ sk then both sides of (5.20) are zero. Otherwise, fix an index 1 ≤ m ≤ k − 1 such that tm ≤ s < tm+1. Then

(Xn · M)_t − (Xn · M)_s = ξ_m (M_{t_{m+1}∧t} − M_s) + Σ_{i=m+1}^{k−1} ξ_i (M_{t_{i+1}∧t} − M_{t_i∧t})
                        = Σ_{i=m}^{k−1} ξ_i (M_{u_{i+1}∧t} − M_{u_i∧t}),

where u_m = s and u_i = t_i for m < i ≤ k.
We claim that the cross terms vanish under the conditional expectation.
Since i < j, u_{i+1} ≤ u_j and both ξ_i and ξ_j are F_{u_j}-measurable. Then

E[ ξ_i ξ_j (M_{u_{i+1}∧t} − M_{u_i∧t})(M_{u_{j+1}∧t} − M_{u_j∧t}) | Fs ]
  = E[ ξ_i ξ_j (M_{u_{i+1}∧t} − M_{u_i∧t}) E{ M_{u_{j+1}∧t} − M_{u_j∧t} | F_{u_j} } | Fs ] = 0
because the inner conditional expectation vanishes by the martingale prop-
erty of M .
Now we can compute the conditional expectation of the square.
E[ ( (Xn · M)_t − (Xn · M)_s )² | Fs ] = Σ_{i=m}^{k−1} E[ ξ_i² (M_{u_{i+1}∧t} − M_{u_i∧t})² | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² E{ (M_{u_{i+1}∧t} − M_{u_i∧t})² | F_{u_i} } | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² E{ M²_{u_{i+1}∧t} − M²_{u_i∧t} | F_{u_i} } | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² E{ [M]_{u_{i+1}∧t} − [M]_{u_i∧t} | F_{u_i} } | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² ( [M]_{u_{i+1}∧t} − [M]_{u_i∧t} ) | Fs ]
  = Σ_{i=m}^{k−1} E[ ξ_i² ∫_{(s,t]} 1_{(u_i, u_{i+1}]}(u) d[M]_u | Fs ]
  = E[ ∫_{(s,t]} ( ξ_0² 1{0}(u) + Σ_{i=1}^{k−1} ξ_i² 1_{(t_i, t_{i+1}]}(u) ) d[M]_u | Fs ]
  = E[ ∫_{(s,t]} ( ξ_0 1{0}(u) + Σ_{i=1}^{k−1} ξ_i 1_{(t_i, t_{i+1}]}(u) )² d[M]_u | Fs ]
  = E[ ∫_{(s,t]} Xn(u, ω)² d[M]_u(ω) | Fs ].
Inside the d[M ]u integral above we replaced the ui ’s with ti ’s because for
u ∈ (s, t], 1(ui ,ui+1 ] (u) = 1(ti ,ti+1 ] (u). Also, we brought in the terms for
i < m because these do not influence the integral, as they are supported on
[0, tm ] which is disjoint from (s, t].
Next let X ∈ L2 be general, and Xn → X in L2. The limit n → ∞ is best taken with expectations, so we rewrite the conclusion of the previous calculation as

E[ ( (Xn · M)_t − (Xn · M)_s )² 1_A ] = E[ 1_A ∫_{(s,t]} Xn²(u) d[M]_u ]
for every nonnegative Borel function h. This can be justified by the π-λ
Theorem. For any interval (s, t],

Λ_{G^u}(s, t] = G^u(t) − G^u(s) = G(u∧t) − G(u∧s) = Λ_G( (s, t] ∩ (0, u] ).

Then by Lemma B.3 the measures Λ_{G^u} and Λ_G( · ∩ (0, u]) coincide on all Borel sets of (0, ∞). The equality extends to [0, ∞) if we set G(0−) = G(0) so that the measure of {0} is zero under both measures.
Now fix ω and apply the preceding. By Lemma 3.23, [M τ ] = [M ]τ , and
so

∫_{[0,∞)} Y(s, ω) d[M^τ]_s(ω) = ∫_{[0,∞)} Y(s, ω) d[M]^τ_s(ω)
                              = ∫_{[0,∞)} 1_{[0,τ(ω)]}(s) Y(s, ω) d[M]_s(ω).
Part (b). We prove the first equality in (5.23). Let τn = 2−n ([2n τ ] + 1)
be the usual discrete approximation that converges down to τ as n → ∞.
Let ℓ(n) = [2^n t] + 1. Since τ ≥ k2^{−n} iff τn ≥ (k + 1)2^{−n},

(X · M)_{τn∧t} = Σ_{k=0}^{ℓ(n)} 1{τ ≥ k2^{−n}} ( (X · M)_{(k+1)2^{−n}∧t} − (X · M)_{k2^{−n}∧t} )
  = Σ_{k=0}^{ℓ(n)} 1{τ ≥ k2^{−n}} ∫_{(0,t]} 1_{(k2^{−n}, (k+1)2^{−n}]} X dM
  = Σ_{k=0}^{ℓ(n)} ∫_{(0,t]} 1{τ ≥ k2^{−n}} 1_{(k2^{−n}, (k+1)2^{−n}]} X dM
  = ∫_{(0,t]} ( 1{0} X + Σ_{k=0}^{ℓ(n)} 1{τ ≥ k2^{−n}} 1_{(k2^{−n}, (k+1)2^{−n}]} X ) dM
  = ∫_{(0,t]} 1_{[0,τn]} X dM.
In the calculation above, the second equality comes from (5.19), the third
from (5.22) where Z is the Fk2−n -measurable 1{τ ≥ k2−n }. The next to
last equality uses additivity and adds in the term 1{0} X that integrates to
zero. The last equality follows from the identity
1_{[0,τn]}(t, ω) = 1{0}(t) + Σ_{k=0}^{ℓ(n)} 1{τ(ω) ≥ k2^{−n}} 1_{(k2^{−n}, (k+1)2^{−n}]}(t).
and τn(ω) ↘ τ(ω). The integrand is bounded by |X|² for all n, and

∫_{[0,t]×Ω} |X|² dµM < ∞
Example 5.17. Let us record some simple integrals as consequences of the
properties.
(a) Let σ ≤ τ be two stopping times, and ξ a bounded Fσ -measurable
random variable. Define X = ξ1(σ,τ ] , or more explicitly,
Xt (ω) = ξ(ω)1(σ(ω),τ (ω)] (t).
As a caglad process, X is predictable. Let M be an L2 -martingale. Pick a
constant C ≥ |ξ(ω)|. Then for any T < ∞,
∫_{[0,T]×Ω} X² dµM = E[ ξ² ( [M]_{τ∧T} − [M]_{σ∧T} ) ] ≤ C² E[ [M]_{τ∧T} ]
Proof of Proposition 5.18. The Lemma shows that X ∈ L2 (αM +βN, P).
Replace the measure µM in the proof of Lemma 5.9 with the measure
µ̃ = µM + µN. The proof works exactly as before, and gives a sequence of simple predictable processes Xn such that

∫_{[0,T]×Ω} |X − Xn|² d(µM + µN) → 0
for each T < ∞. This combined with the previous lemma says that Xn → X
simultaneously in spaces L2 (M, P), L2 (N, P), and L2 (αM + βN, P). (5.25)
holds for Xn by the explicit formula for the integral of a simple predictable
process, and the general conclusion follows by taking the limit.
5.2. Local square-integrable martingale integrator 135
Definition 5.22. Let M ∈ M2,loc , X ∈ L(M, P), and let {τk } be a localiz-
ing sequence for (X, M ). Define the event Ω0 as in the previous paragraph.
The stochastic integral X · M is the cadlag local L2 -martingale defined as
follows: on the event Ω0 set
(5.27)    (X · M)_t(ω) = ( (1_{[0,τk]} X) · M^{τk} )_t(ω) for any k such that τk(ω) ≥ t.
Outside the event Ω0 set (X · M )t = 0 for all t.
This definition is independent of the localizing sequence {τk } in the sense
that using any other localizing sequence of stopping times gives a process
indistinguishable from X · M defined above.
(iv) X is adapted, has almost surely left-continuous paths, and X*_T < ∞ almost surely for each T < ∞. Assume the underlying filtration {Ft} is right-continuous. Take
σk = inf{t ≥ 0 : |Xt | > k}.
Remark 5.26. Category (ii) is a special case of (iii), and category (iii) is a
special case of (iv). Category (iii) seems artificial but will be useful. Notice
that every caglad X satisfies X(t) = Y (t−) for the cadlag process Y defined
by Y (t) = X(t+), but this Y may fail to be adapted. Y is adapted if {Ft }
is right-continuous. But then we find ourselves in Category (iv).
and other notational conventions exactly as for the L2 integral. The stochas-
tic integral with respect to a local martingale inherits the path properties
of the L2 integral, as we observe in the next proposition. Expectations and
conditional expectations of (X · M )t do not necessarily exist any more so we
cannot even contemplate their properties.
Proposition 5.28. Let M, N ∈ M2,loc , X ∈ L(M, P), and let τ be a
stopping time.
(a) Linearity continues to hold: if also Y ∈ L(M, P), then
(αX + βY ) · M = α(X · M ) + β(Y · M ).
Furthermore,

(5.32)    ( (1_{[0,τ]} X) · M )_t = (X · M)_{τ∧t} = (X · M^τ)_t.
(c) Let Y ∈ L(N, P). Suppose Xt (ω) = Yt (ω) and Mt (ω) = Nt (ω) for
0 ≤ t ≤ τ (ω). Then
(X · M )τ ∧t = (Y · N )τ ∧t .
(d) Suppose X ∈ L(M, P) ∩ L(N, P). Then for α, β ∈ R, X ∈ L(αM +
βN, P) and
X · (αM + βN ) = α(X · M ) + β(X · N ).
Proof. The proofs are short exercises in localization. We show the way by
doing (5.31) and the first equality in (5.32).
Let {σk } be a localizing sequence for the pair (X, M ). Then {σk } is
a localizing sequence also for the pairs (1(σ,∞) X, M ) and (Z1(σ,∞) X, M ).
Given ω and t, pick k large enough so that σk (ω) ≥ t. Then by the definition
of the stochastic integrals for localized processes,
and
( (Z1_{(σ,∞)} X) · M )_t(ω) = ( (1_{[0,σk]} Z1_{(σ,∞)} X) · M^{σk} )_t(ω).
The right-hand sides of the two equalities above coincide, by an application
of (5.22) to the L2 -martingale M σk and the process 1[0,σk ] X in place of X.
This verifies (5.31).
The sequence {σk } works also for (1[0,τ ] X, M ). If t ≤ σk (ω), then
The first and the last equality are the definition of the local integral, the
middle equality an application of (5.23). This checks the first equality in
(5.32).
We come to a very helpful result for later development. The most im-
portant processes are usually either caglad or cadlag. The next proposition
shows that for left-continuous processes the integral can be realized as a
limit of Riemann sum-type approximations. For future benefit we include
random partitions in the result.
However, a cadlag process X is not necessarily predictable and therefore
not an admissible integrand. Nevertheless it turns out that the Riemann
sums still converge. They cannot converge to X · M because this integral
might not exist. Instead, these sums converge to the integral X− · M of the
caglad process X− defined by
(a) Assume X is left-continuous and satisfies (5.29). Then for each fixed
T < ∞ and ε > 0,
lim_{n→∞} P{ sup_{0≤t≤T} |Rn(t) − (X · M)_t| ≥ ε } = 0.
Thus {σk } is a localizing sequence for (Yn , M ). On the event {σk > T },
for 0 ≤ t ≤ T, by definition (5.27) and Proposition 5.15(b)–(c),

(Yn · M)_t = ( (1_{[0,σk]} Yn) · M^{σk} )_t = ( Yn^{(k)} · M^{σk} )_t.
Fix ε > 0. In the next bound we apply martingale inequality (3.8) and
the isometry (5.12).
P{ sup_{0≤t≤T} |(Yn · M)_t| ≥ ε } ≤ P{σk ≤ T} + P{ sup_{0≤t≤T} |( Yn^{(k)} · M^{σk} )_t| ≥ ε }.
Since ε1 > 0 can be taken arbitrarily small, the limit above must actually
equal zero.
At this point we have proved
(5.34)    lim_{n→∞} P{ sup_{0≤t≤T} |Rn(t) − (X− · M)_t| ≥ ε } = 0
sup_{0≤t<∞} |R̃n(t) − Rn(t)| ≤ |X̃(0)| · sup_{0≤t≤δn} |M(t) − M(0)|.
= E{ [M^{σk}]_k } = E{ (M_k^{σk})² − (M_0^{σk})² } < ∞.
Along the way we used Lemma 3.23 and then the square-integrability of
M σk .
The following alternative characterization of membership in L(M, P) will
be useful for extending the stochastic integral to non-predictable integrands
in Section 5.5.
Lemma 5.31. Let M be a local L2 -martingale and X a predictable process.
Then X ∈ L(M, P) iff there exist stopping times ρk % ∞ (a.s.) such that
for each k,
∫_{[0,T]×Ω} 1_{[0,ρk]} |X|² dµM < ∞ for all T < ∞.
5.3. Semimartingale integrator 145
We leave the proof of this lemma as an exercise. The key point is that
for both L2 -martingales and local L2 -martingales, and a stopping time τ ,
µM τ (A) = µM (A ∩ [0, τ ]) for A ∈ P. (Just check that the proof of Lemma
5.14 applies without change to local L2 -martingales.)
Furthermore, we leave as an exercise the proof of the result that if X, Y ∈ L(M, P) are µM-equivalent, which means again that
µM{ (t, ω) : X(t, ω) ≠ Y(t, ω) } = 0,
then X · M = Y · M in the sense of indistinguishability.
Justification of the definition. The first item to check is that the integral does not depend on the decomposition of Y chosen. Suppose Y = Y0 + M̃ + Ṽ is another decomposition of Y into a local L2-martingale M̃ and an FV process Ṽ. We need to show that

∫_{(0,t]} Xs dMs + ∫_{(0,t]} Xs ΛV(ds) = ∫_{(0,t]} Xs dM̃s + ∫_{(0,t]} Xs Λ_Ṽ(ds)

in the sense that the processes on either side of the equality sign are indistinguishable. By Proposition 5.28(d) and the additivity of Lebesgue-Stieltjes measures, this is equivalent to

∫_{(0,t]} Xs d(M − M̃)s = ∫_{(0,t]} Xs Λ_{Ṽ−V}(ds).

From Y = M + V = M̃ + Ṽ we get

M − M̃ = Ṽ − V.
On the left is the stochastic integral, on the right the Lebesgue-Stieltjes in-
tegral evaluated separately for each fixed ω.
gives

∫_{(0,t]} X(s, ω) dZs(ω) = ∫_{(0,t]} X(s, ω) Λ_{Z(ω)}(ds)
almost surely. By Theorem B.2, H contains all bounded P-measurable pro-
cesses.
This completes Step 1: (5.38) has been verified for the case where Z ∈
M2 and X is bounded.
Step 2. Now consider the case of a local L2 -martingale Z. By the
assumption on X we may pick a localizing sequence {τk } such that Z τk is
an L2 -martingale and 1(0,τk ] X is bounded. Then by Step 1,
(5.40)    ∫_{(0,t]} 1_{(0,τk]}(s) X(s) dZs^{τk} = ∫_{(0,t]} 1_{(0,τk]}(s) X(s) Λ_{Z^{τk}}(ds).
On the event {τk ≥ t} the left and right sides of (5.40) coincide almost
surely with the corresponding sides of (5.38). The union over k of the
events {τk ≥ t} equals almost surely the whole space Ω. Thus we have
verified (5.38) almost surely, for this fixed t.
The left-hand side of (5.40) coincides almost surely with ( (1_{[0,τk]} X) · Z^{τk} )_t due to the irrelevance of the time origin. By the definition (5.27) of the stochastic integral, this agrees with (X · Z)_t on the event {τk ≥ t}.
On the right-hand side of (5.40) we only need to observe that if τk ≥ t, then on the interval (0, t] the function 1_{(0,τk]}(s)X(s) coincides with X(s) and Z^{τk}_s coincides with Zs. So it is clear that the integrals on the right-hand sides of (5.40) and (5.39) coincide.
Returning to the justification of the definition, we now know that the process ∫ X dY does not depend on the choice of the decomposition Y = Y0 + M + V.
X · M is a local L2-martingale, and for a fixed ω the function

t ↦ ∫_{(0,t]} Xs(ω) Λ_{V(ω)}(ds)

has bounded variation on every compact interval (Lemma 1.16). Thus the definition (5.37) provides the semimartingale decomposition of ∫ X dY.
though to avoid confusing the issue that for a cadlag process X the limit
is not necessarily the stochastic integral of X. The integral X · Y may fail
to exist, and even if it exists, it does not necessarily coincide with X− · Y .
This is not a consequence of the stochastic aspect, but can happen also for
Lebesgue-Stieltjes integrals. (Find examples!)
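A concrete deterministic example of the phenomenon, responding to the "Find examples!" prompt (an illustrative sketch, not part of the text): take X = Y = 1_{[1,∞)}. Then ∫_{(0,2]} X dY = X(1)∆Y(1) = 1 while ∫_{(0,2]} X(s−) dY(s) = X(1−)∆Y(1) = 0, so the two integrals genuinely differ.

```python
def step(t):
    # Cadlag indicator 1_{[1, ∞)}(t): a single unit jump at t = 1.
    return 1.0 if t >= 1.0 else 0.0

def ls_integral(f, jump_times, jump_sizes, t):
    # Lebesgue-Stieltjes integral over (0, t] against a pure-jump
    # integrator: sum of f(s) * (jump at s) over jump times s in (0, t].
    return sum(f(s) * dy for s, dy in zip(jump_times, jump_sizes) if 0.0 < s <= t)

# Integrator Y = step has one jump of size 1 at time 1.
with_right_value = ls_integral(step, [1.0], [1.0], 2.0)   # integrand X(s)
with_left_limit = ls_integral(lambda s: 1.0 if s > 1.0 else 0.0,
                              [1.0], [1.0], 2.0)          # integrand X(s-)
```

Left-endpoint Riemann sums evaluate the integrand just before each jump, which is why they recover the integral of X− rather than of X.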
Proposition 5.34. Let X be a real-valued process and Y a cadlag semimartingale. Suppose 0 = τ0^n ≤ τ1^n ≤ τ2^n ≤ τ3^n ≤ · · · are stopping times such that for each n, τi^n → ∞ almost surely as i → ∞, and δn = sup_{0≤i<∞}(τ_{i+1}^n − τi^n) tends to zero almost surely as n → ∞. Define

(5.41)    Sn(t) = Σ_{i=0}^∞ X(τi^n) ( Y(τ_{i+1}^n ∧ t) − Y(τi^n ∧ t) ).
(a) Assume X is left-continuous and satisfies (5.36). Then for each fixed
T < ∞ and ε > 0,
lim_{n→∞} P{ sup_{0≤t≤T} |Sn(t) − (X · Y)_t| ≥ ε } = 0.
Furthermore,

(5.43)    ( (1_{[0,τ]} G) · Y )_t = (G · Y)_{τ∧t} = (G · Y^τ)_t.
(b) Suppose Gt(ω) = Ht(ω) and Yt(ω) = Zt(ω) for 0 ≤ t ≤ τ(ω). Then
(G · Y)_{τ∧t} = (H · Z)_{τ∧t}.
uniformly for t ∈ [0, T], for almost every ω. This implies the convergence of the sum of squares, so the quadratic variation [Y] exists and satisfies
[Y]_t = Y_t² − Y_0² − S(t).
By the cadlag path of Y, ∆Sn(t) → 2Y_{t−}∆Y_t. Equality of the two limits of Sn(t) gives
2Y_{t−} ∆Y_t = ∆(Y²)_t − ∆[Y]_t,
which rearranges to ∆[Y]_t = (∆Y_t)².
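For pure-jump paths the identity ∆[Y]_t = (∆Y_t)² can be seen directly: once a partition is fine enough that each jump sits in its own subinterval, the sum of squared increments equals the sum of squared jumps exactly. A small sketch (illustration, not from the text):

```python
import numpy as np

def sum_of_squared_increments(path, pts):
    # Sum of (Y(t_{i+1}) - Y(t_i))^2 over the partition pts.
    return sum((path(pts[i + 1]) - path(pts[i])) ** 2 for i in range(len(pts) - 1))

def pure_jump(t):
    # Piecewise-constant cadlag path: jump +2 at t = 0.5, jump -1 at t = 1.5.
    return (2.0 if t >= 0.5 else 0.0) + (-1.0 if t >= 1.5 else 0.0)

# Any partition separating the two jumps gives exactly (+2)^2 + (-1)^2 = 5.
qv = [sum_of_squared_increments(pure_jump, np.linspace(0.0, 2.0, n + 1))
      for n in (10, 100, 1000)]
```

Here [Y]_2 = 5 and its two jumps, at times 0.5 and 1.5, have sizes (∆Y)² = 4 and 1, in line with the identity.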
Theorem 5.37. Let Y be a cadlag semimartingale, X a predictable process
that satisfies the local boundedness condition (5.36), and X · Y the stochastic
integral. Then for all ω in a set of probability one,
∆(X · Y )(t) = X(t)∆Y (t) for all 0 < t < ∞.
Fix an ω on which both almost sure limits (5.44) and (5.45) hold. For
any t ∈ (0, T ], the uniform convergence in (5.45) implies ∆(Xnj · M )t →
∆(X · M )t . Also, since a path of Xnj · M is a step function, ∆(Xnj · M )t =
Xnj (t)∆Mt . (The last two points were justified explicitly in the proof of
Lemma 5.36 above.) Combining these with the limit (5.44) shows that, for
this fixed ω and all t ∈ (0, T ],
and

| ∫_{(s,t)} f dU | ≤ ‖f‖_∞ Λ_{V_U}( (s, t) ).
Proof of Theorem 5.37 follows from combining Lemmas 5.38 and 5.39.
We introduce the following notation for the left limit of a stochastic integral:
(5.46)    ∫_{(0,t)} H dY = lim_{s↗t, s<t} ∫_{(0,s]} H dY
converges to zero in probability, for each fixed 0 < T < ∞. Then for any
cadlag semimartingale Y , Hn · Y → 0 in probability, uniformly on compact
time intervals.
Let Hn^{(1)} = (Hn ∧ 1) ∨ (−1) denote the bounded process obtained by truncation. If t ≤ σk ∧ ρn then Hn · M = Hn · M^{σk} by part (c) of Proposition 5.28. As a bounded process Hn^{(1)} ∈ L2(M^{τk}, P), so by martingale inequality
5.4. Further properties of stochastic integrals 155
As k and T stay fixed and n → ∞, the last expectation above tends to zero.
This follows from the dominated convergence theorem under convergence
in probability (Theorem B.9). The integrand is bounded by the integrable
random variable [M^{σk}]_T. Given δ > 0, pick K > δ so that

P{ [M^{σk}]_T ≥ K } < δ/2.

Then

P{ [M^{σk}]_T ( G*_n(T)² ∧ 1 ) ≥ δ } ≤ δ/2 + P{ G*_n(T) ≥ √(δ/K) }
The second equality in (5.48) is the definition of the integral over (σ, σ +
t]. The proof of Theorem 5.41 follows after two lemmas.
Lemma 5.42. For any P-measurable function G, Ḡ(t, ω) = G(σ(ω) + t, ω)
is P̄-measurable.
Proof. Let {τk} localize (G, M). Let νk = (τk − σ)+. For any 0 ≤ t < ∞,
{νk ≤ t} = {τk ≤ σ + t} ∈ F_{σ+t} = F̄_t by Lemma 2.2(ii).
Observe that

(5.50)    σ + T ∧ (u − σ)+ = { σ, u < σ;  u, σ ≤ u < σ + T;  σ + T, u ≥ σ + T }.
where the last inequality is a consequence of the assumption that {τk } lo-
calizes (G, M ).
To summarize thus far: we have shown that {νk } localizes (Ḡ, M̄ ). This
checks Ḡ ∈ L(M̄ , P̄).
Fix again k and continue denoting the L2-martingales by Z = M^{τk} and Z̄ = M̄^{νk}. Consider a simple P-predictable process
Hn(t) = Σ_{i=0}^{m−1} ξ_i 1_{(u_i, u_{i+1}]}(t).

Let k denote the index that satisfies u_{k+1} > σ ≥ u_k. (If there is no such k then H̄n = 0.) Then

H̄n(t) = Σ_{i≥k} ξ_i 1_{(u_i−σ, u_{i+1}−σ]}(t).
The i = k term above develops differently from the others because Z̄_{(u_k−σ)+∧t} = Z̄_0 = 0. By (5.50),
At this point we have proved the lemma in the L2 case and a localization
argument remains. Given t, pick νk > t. Then τk > σ + t. Use the fact that
{νk } and {τk } are localizing sequences for their respective integrals.
In other words, the process Y has been stopped just prior to the stopping
time. This type of stopping is useful for processes with jumps. For example,
if
τ = inf{t ≥ 0 : |Y (t)| ≥ r or |Y (t−)| ≥ r}
then |Y τ | ≤ r may fail if Y jumped exactly at time τ , but |Y τ − | ≤ r is true.
For continuous processes Y τ − and Y τ coincide. More precisely, the
relation between the two stoppings is that
Y τ (t) = Y τ − (t) + ∆Y (τ )1{t ≥ τ }.
In other words, only a jump of Y at τ can produce a difference, and that is
not felt until t reaches τ .
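A toy path makes the relation between Y^τ and Y^{τ−} concrete (an illustrative sketch, not from the text): a single jump of size 3 at time 1, stopped at level r = 2.

```python
JUMP_TIME, JUMP_SIZE, R = 1.0, 3.0, 2.0

def Y(t):
    # Cadlag path: 0 before the jump, 3 from the jump time onward.
    return JUMP_SIZE if t >= JUMP_TIME else 0.0

# tau = inf{t : |Y(t)| >= R or |Y(t-)| >= R} is the jump time itself.
tau = JUMP_TIME
jump_at_tau = Y(tau) - 0.0   # ΔY(tau): the left limit at the jump is 0

def Y_stopped(t):
    # Y^tau(t) = Y(tau ∧ t): still carries the jump, so it exceeds R.
    return Y(min(t, tau))

def Y_stopped_before(t):
    # Y^{tau-}(t) = Y^tau(t) - ΔY(tau) 1{t >= tau}: the jump is removed.
    return Y_stopped(t) - (jump_at_tau if t >= tau else 0.0)
```

Here |Y^τ| reaches 3 > r, while Y^{τ−} is identically zero and so stays within the level r, which is exactly why stopping just before τ is useful for jump processes.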
The next example shows that stopping just before τ can fail to preserve
the martingale property. But it does preserve a semimartingale, because a
single jump can be moved to the FV part, as evidenced in the proof of the
lemma after the example.
The key facts that underlie the extension of the stochastic integral are
assembled in the next lemma.
Lemma 5.48. Let X be an adapted, measurable process. Then there exists a P-measurable process X̄ such that

(5.59)    m ⊗ P{ (t, ω) ∈ R+ × Ω : X(t, ω) ≠ X̄(t, ω) } = 0.

In particular, all measurable adapted processes are P*-measurable.
Under the assumption µM ≪ m ⊗ P, we also have

(5.60)    µ*M{ (t, ω) ∈ R+ × Ω : X(t, ω) ≠ X̄(t, ω) } = 0.
Following our earlier conventions, we say that X and X̄ are µ∗M -equivalent.
By Lemma 5.31, this is just another way of expressing X̄ ∈ L(M, P). It fol-
lows that the integral X̄ · M exists and is a member of M2,loc . In particular,
we can define X · M = X̄ · M as an element of M2,loc .
If we choose another P-measurable process Ȳ that is µ∗M -equivalent to
X, then X̄ and Ȳ are µM -equivalent, and the integrals X̄ · M and Ȳ · M are
indistinguishable by Exercise 5.7.
Exercises
Exercise 5.1. Show that if X : R+ ×Ω → R is P-measurable, then Xt (ω) =
X(t, ω) is a process adapted to the filtration {Ft− }.
Hint: Given B ∈ BR , let A = {(t, ω) : X(t, ω) ∈ B} ∈ P. The event
{Xt ∈ B} equals the t-section At = {ω : (t, ω) ∈ A}, so it suffices to show
that an arbitrary A ∈ P satisfies At ∈ Ft− for all t ∈ R+ . This follows
from checking that predictable rectangles have this property, and that the
collection of sets in BR+ ⊗ F with this property form a sub-σ-field.
Exercise 5.2. (a) Show that for any Borel function h : R+ → R, the
deterministic process X(t, ω) = h(t) is predictable. Hint: Intervals of the
type (a, b] generate the Borel σ-field of R+ .
(b) Evaluate
E ∫_{[0,T]} Ns d[M]_s
where the inner integral is the pathwise Lebesgue-Stieltjes integral, in accordance with the interpretation of definition (5.1) of µM. Conclude that N cannot be P-measurable.
(c) For comparison, evaluate explicitly
E ∫_{(0,T]} N_{s−} dNs.
Here N_{s−}(ω) = lim_{u↗s} Nu(ω) is the left limit. Explain why we know without calculation that the answer must agree with part (a).
(c) Use the formula you obtained in part (b) to check that the process ∫ N(s−) dM(s) is a martingale. (Of course, this conclusion is part of the theory but the point here is to obtain it through computation. Part (a) and Exercise 2.9 take care of parts of the work.)
(d) Suppose N were predictable. Then the stochastic integral ∫ N dM would exist and be a martingale. Show that this is not true and conclude that N cannot be predictable.
Hints: It might be easiest to find

∫_{(0,t]} N(s) dM(s) − ∫_{(0,t]} N(s−) dM(s) = ∫_{(0,t]} ( N(s) − N(s−) ) dM(s)

and use the fact that the integral of N(s−) is a martingale.
Exercise 5.10 (Extended stochastic integral of the Poisson process). Let
Nt be a rate α Poisson process, Mt = Nt − αt and N− (t) = N (t−). Show
that N− is a modification of N , and
µ∗_M {(t, ω) : N_t(ω) ≠ N_{t−}(ω)} = 0.
Thus the stochastic integral N · M can be defined according to the extension
in Section 5.5 and this N · M must agree with N− · M .
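Exercise 5.10 turns on the fact that N and N− differ only on the Lebesgue-null set of jump times, while dN-integrals do see that set. A pathwise sketch of the distinction, with an assumed list of jump times for one sample path (any finite increasing list works, since the identities are pathwise):

```python
# Hypothetical jump times of one Poisson sample path on (0, T].
jump_times = [0.3, 0.7, 1.1, 2.4, 2.9]
n = len(jump_times)

# int_(0,T] N(s-) dN(s): at the k-th jump, the left limit N(s-) equals k-1.
left_integral = sum(k for k in range(n))
# int_(0,T] N(s) dN(s): at the k-th jump, N(s) equals k.
right_integral = sum(k + 1 for k in range(n))

assert left_integral == n * (n - 1) // 2
assert right_integral == n * (n + 1) // 2
# The difference n is the sum of squared jumps, i.e. the quadratic variation
# [N]_T: replacing N by N- changes a dN-integral, even though the two
# processes agree off a Lebesgue-null set of times.
```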
Exercise 5.11 (Riemann sum approximation in M2 ). Let M be an L2
martingale, X ∈ L2 (M, P), and assume X also satisfies the hypotheses of
Proposition 5.29. Let π^m = {0 = t^m_1 < t^m_2 < t^m_3 < ···} be partitions such
that mesh π^m → 0. Let ξ^{m,k}_i = (X_{t^m_i} ∧ k) ∨ (−k), and define the simple
predictable processes
W^{k,m,n}_t = Σ_{i=1}^{n} ξ^{m,k}_i 1_{(t^m_i , t^m_{i+1}]}(t).
Then there exist subsequences {m(k)} and {n(k)} such that
lim_{k→∞} ‖X · M − W^{k,m(k),n(k)} · M‖_{M2} = 0.
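The convergence asserted in Exercise 5.11 rests on a summation-by-parts identity that holds for arbitrary real path values, not just martingale samples. A minimal sketch (the path values below are arbitrary placeholders):

```python
# For ANY values B_0, ..., B_n along a partition,
#   sum_i B_i (B_{i+1} - B_i) = (B_n^2 - B_0^2 - sum_i (B_{i+1} - B_i)^2) / 2,
# so left-endpoint Riemann sums converge exactly when the quadratic
# variation sum does.
path = [0.0, 0.4, -0.1, 0.5, 0.2, 0.9]

riemann = sum(path[i] * (path[i + 1] - path[i]) for i in range(len(path) - 1))
qv = sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))
closed = (path[-1] ** 2 - path[0] ** 2 - qv) / 2

assert abs(riemann - closed) < 1e-12
```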
Exercise 5.12. Let 0 < a < b < ∞ be constants and M ∈ M2,loc . Find
the stochastic integral
∫_{(0,t]} 1_{[a,b)}(s) dM_s .
Hint: Check that if M ∈ M2 then 1_{(a−1/n, b−1/n]} converges to 1_{[a,b)} in
L2 (M, P).
Chapter 6
Itô’s formula
where π = {0 = t_0 < t_1 < ··· < t_{m(π)} = t} is a partition of [0, t]. The limit
is assumed to hold in probability. The quadratic variation [X] of a single
process X is then defined by [X] = [X, X]. When these processes exist, they
are tied together by the identity
(6.2)
[X, Y] = ½( [X + Y] − [X] − [Y] ).
For right-continuous martingales and local martingales M and N , [M] and
[M, N] exist. [M] is an increasing process, which means that almost every
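The polarization identity (6.2) is exact already at the level of discrete increments, which gives a quick mechanical check. A sketch with two arbitrary increment lists standing in for the increments of X and Y:

```python
dX = [0.5, -0.2, 0.3, 0.1]
dY = [0.1, 0.4, -0.3, 0.2]

bracket = lambda d: sum(v * v for v in d)          # discrete analogue of [X]
cov_direct = sum(a * b for a, b in zip(dX, dY))    # discrete analogue of [X, Y]
cov_polarized = 0.5 * (bracket([a + b for a, b in zip(dX, dY)])
                       - bracket(dX) - bracket(dY))

assert abs(cov_direct - cov_polarized) < 1e-12
```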
The second equality above used Lemma 3.25. ξ moves freely in and out of
the integrals because they are path-by-path Lebesgue-Stieltjes integrals. By
additivity of the covariation, conclusion (6.3) follows for G that are simple
predictable processes of the type (5.6).
Now take a general G ∈ L2 (M, P). Pick simple predictable processes
Gn such that Gn → G in L2 (M, P). Then (Gn · M )t → (G · M )t in L2 (P ).
By Lemma 6.1 [Gn · M, L]t → [G · M, L]t in L1 (P ). On the other hand the
previous lines showed
[G_n · M, L]_t = ∫_{(0,t]} G_n(s) d[M, L]_s .
Step 2. Now the case L, M ∈ M2,loc and G ∈ L(M, P). Pick stopping
times {τk } that localize both L and (G, M ). Abbreviate Gk = 1(0,τk ] G.
Then if τ_k(ω) ≥ t,
[G · M, L]_t = [G · M, L]_{τ_k ∧ t} = [(G · M)^{τ_k}, L^{τ_k}]_t = [(G^k · M^{τ_k}), L^{τ_k}]_t
 = ∫_{(0,t]} G^k_s d[M^{τ_k}, L^{τ_k}]_s = ∫_{(0,t]} G_s d[M, L]_s .
Proof. Follows from Proposition 6.2 above, and a general property of [M, N]
for (local) L2-martingales M and N , stated as Theorem 3.26 in Chapter 3.
Proof. Part (a). Fix t. Let {τk } be a localizing sequence for M . Then for
s ≤ t, [M τk ]s ≤ [M τk ]t = [M ]τk ∧t ≤ [M ]t = 0 almost surely by Lemma 3.23
and the t-monotonicity of [M ]. Consequently E{(Msτk )2 } = E{[M τk ]s } = 0,
from which Msτk = 0 almost surely. Taking k large enough so that τk (ω) ≥ s,
Proof. Let {τk } be a localizing sequence for (G, M ) and (H, N ). By part
(b) of Proposition 5.28 N τk = (G · M )τk = G · M τk , and so Proposition
6.2 gives the equality of Lebesgue-Stieltjes measures d[N τk ]s = G2s d[M τk ]s .
Then for any T < ∞,
E ∫_{[0,T]} 1_{[0,τ_k]}(t) H_t² G_t² d[M^{τ_k}]_t = E ∫_{[0,T]} 1_{[0,τ_k]}(t) H_t² d[N^{τ_k}]_t < ∞
because {τk } is assumed to localize (H, N ). This checks that {τk } localizes
(HG, M ) and so HG ∈ L(M, P).
Let L ∈ M2,loc . Equation (6.3) gives Gs d[M, L]s = d[N, L]s , and so
[(HG) · M, L]_t = ∫_{(0,t]} H_s G_s d[M, L]_s = ∫_{(0,t]} H_s d[N, L]_s = [H · N, L]_t .
Proof. For the same reason as in the proof of Proposition 6.2, it suffices to
show
[G · Y, Z]_t = ∫_{(0,t]} G_s d[Y, Z]_s .
Applying Proposition 5.34 to each sum gives the claimed type of convergence
to the limit
∫_{(0,t]} G_{s−} d(Y Z)_s − ∫_{(0,t]} G_{s−} Y_{s−} dZ_s − ∫_{(0,t]} G_{s−} Z_{s−} dY_s ,
which by (6.6) equals ∫_{(0,t]} G_{s−} d[Y, Z]_s .
6.2. Itô’s formula 179
Itô’s formula contains a term which is a sum over the jumps of the
process. This sum has at most countably many terms because a cadlag path
has at most countably many discontinuities (Lemma A.5). It is also possible
to define rigorously what is meant by a convergent sum of uncountably many
terms, and arrive at the same value (see the discussion around (A.4) in the
appendix).
Theorem 6.11. Fix 0 < T < ∞. Let D be an open subset of R and
f ∈ C²(D). Let Y be a cadlag semimartingale with quadratic variation
process [Y]. Assume that, for all ω outside some event of probability zero,
the closure of Y[0, T] is contained in D. Then
(6.8)
f(Y_t) = f(Y_0) + ∫_{(0,t]} f′(Y_{s−}) dY_s + ½ ∫_{(0,t]} f″(Y_{s−}) d[Y]_s
 + Σ_{s∈(0,t]} { f(Y_s) − f(Y_{s−}) − f′(Y_{s−})∆Y_s − ½ f″(Y_{s−})(∆Y_s)² }.
Part of the conclusion is that the last sum over s ∈ (0, t] converges absolutely.
Both sides of the equality above are cadlag processes, and the meaning of
the equality is that these processes are indistinguishable on [0, T ]. In other
words, there exists an event Ω0 of full probability such that for ω ∈ Ω0 , (6.8)
holds for all 0 ≤ t ≤ T .
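For a pure-jump path, every term on the right of (6.8) is a sum over jumps and the identity telescopes, which makes it easy to verify mechanically. A sketch with f(x) = x³ and an assumed list of jump sizes:

```python
f = lambda x: x ** 3
fp = lambda x: 3 * x ** 2      # f'
fpp = lambda x: 6 * x          # f''

y0 = 1.0
jumps = [0.5, -0.3, 0.8, -0.1]   # hypothetical jump sizes of Y on (0, t]

dY_term = qv_term = corr = 0.0
y = y0
for dy in jumps:
    y_new = y + dy
    dY_term += fp(y) * dy                 # contribution to int f'(Y_{s-}) dY_s
    qv_term += 0.5 * fpp(y) * dy * dy     # contribution to (1/2) int f'' d[Y]_s
    corr += f(y_new) - f(y) - fp(y) * dy - 0.5 * fpp(y) * dy * dy
    y = y_new

# (6.8): f(Y_t) = f(Y_0) + integral terms + jump-correction sum.
assert abs(f(y) - (f(y0) + dY_term + qv_term + corr)) < 1e-12
```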
Fix ω so that the limits in items (i)–(iii) above happen. Lastly we apply
the scalar case of Lemma A.11 to the cadlag function s → Ys (ω) on [0, t] and
the sequence of partitions π ` chosen above. For the closed set K in Lemma
A.11 take K = Y [0, T ]. For the continuous function φ in Lemma A.11 take
φ(x, y) = γ(x, y)(y − x)2 . By hypothesis, K is a subset of D. Consequently,
as verified above, the function
γ(x, y) = (x − y)^{−2} φ(x, y)  if x ≠ y,   and   γ(x, y) = 0  if x = y,
is continuous on K ×K. Assumption (A.8) of Lemma A.11 holds by item (iii)
above. The hypotheses of Lemma A.11 have been verified. The conclusion
is that for this fixed ω and each t ∈ [0, T ], the sum on line (6.12) converges
to
Σ_{s∈(0,t]} φ(Y_{s−}, Y_s) = Σ_{s∈(0,t]} γ(Y_{s−}, Y_s)(Y_s − Y_{s−})²
 = Σ_{s∈(0,t]} { f(Y_s) − f(Y_{s−}) − f′(Y_{s−})∆Y_s − ½ f″(Y_{s−})(∆Y_s)² }.
Lemma A.11 also contains the conclusion that this last sum is absolutely
convergent.
To summarize, given 0 < T < ∞, we have shown that for almost every
ω, (6.8) holds for all 0 ≤ t ≤ T .
Proof. Part (a). Continuity eliminates the sum over jumps, and renders
endpoints of intervals irrelevant for integration.
Part (b). By Corollary A.9 the quadratic variation of a cadlag path
consists exactly of the squares of the jumps. Consequently
½ ∫_{(0,t]} f″(Y_{s−}) d[Y]_s = ½ Σ_{s∈(0,t]} f″(Y_{s−})(∆Y_s)²
The open set D in the hypotheses of Itô’s formula does not have to be
an interval, so it can be disconnected.
The important hypothesis that the closure of Y[0, T] is contained in D
prevents the process from reaching the boundary. Precisely speaking, the
hypothesis implies that for some δ > 0, dist(Y(s), D^c) ≥ δ for all s ∈ [0, T].
To prove this, assume the contrary, namely the existence of s_i ∈ [0, T] such
that dist(Y(s_i), D^c) → 0. Since [0, T] is compact, we may pass to a convergent
subsequence s_i → s. And then by the cadlag property, along a further
subsequence Y(s_i) converges to some point y. Since dist(y, D^c) = 0 and D^c
is a closed set, y ∈ D^c. But y lies in the closure of Y[0, T], and we have
contradicted the hypothesis.
But note that the δ (the distance to the boundary of D) can depend on
ω. So the hypothesis does not require that there exist a fixed closed subset
H of D such that P{Y(t) ∈ H for all t ∈ [0, T]} = 1.
The hypothesis is needed because otherwise a "blow-up" at the boundary
can cause problems. The next example illustrates why we need to assume
the containment in D of the closure of Y[0, T], and not merely Y[0, T] ⊆ D.
Example 6.13. Let D = (−∞, 1) ∪ (3/2, ∞), and define
f(x) = √(1 − x) for x < 1,  and  f(x) = 0 for x > 3/2,
a C²-function on D. Define the deterministic process
Y_t = t for 0 ≤ t < 1,  and  Y_t = 1 + t for t ≥ 1.
Y_t ∈ D for all t ≥ 0. However, if t > 1,
∫_{(0,t]} f′(Y_{s−}) dY_s = ∫_{(0,1)} f′(s) ds + f′(Y_{1−}) ∆Y_1 + ∫_{(1,t]} f′(Y_{s−}) dY_s
 = −1 + (−∞) + 0.
6.2. Itô’s formula 183
As the calculation shows, the integral is not finite. The problem is that the
closure Y [0, t] contains the point 1 which lies at the boundary of D, and the
derivative f 0 blows up there.
x = [x_1, x_2, ..., x_d]^T ,
Df(t, x) = [ f_{x_1}(t, x), f_{x_2}(t, x), ..., f_{x_d}(t, x) ]^T ,
and D²f(t, x) is the d × d matrix whose (j, k) entry is the second partial
derivative f_{x_j,x_k}(t, x).
Proof. Let us write Y^k_t = Y_k(t) in the proof. The pattern is the same as in
the scalar case. Define a function φ on [0, T]² × D² by the equality (6.17).
Then
f(t, Y_t) = f(0, Y_0)
(6.18)
 + Σ_i f_t(t ∧ t_i, Y_{t∧t_i}) ((t ∧ t_{i+1}) − (t ∧ t_i))
(6.19)
 + Σ_{k=1}^{d} Σ_i f_{x_k}(t ∧ t_i, Y_{t∧t_i}) (Y^k_{t∧t_{i+1}} − Y^k_{t∧t_i})
(6.20)
 + ½ Σ_{1≤j,k≤d} Σ_i f_{x_j,x_k}(t ∧ t_i, Y_{t∧t_i}) (Y^j_{t∧t_{i+1}} − Y^j_{t∧t_i})(Y^k_{t∧t_{i+1}} − Y^k_{t∧t_i})
(6.21)
 + Σ_i φ(t ∧ t_i, t ∧ t_{i+1}, Y_{t∧t_i}, Y_{t∧t_{i+1}}).
happens for 1 ≤ k ≤ d.
Fix ω such that Y [0, T ] ⊆ D and the limits in items (i)–(iv) hold. By the
above paragraph and by hypothesis these conditions hold for almost every
ω.
We wish to apply Lemma A.11 to the Rd -valued cadlag function s 7→
Ys (ω) on [0, t], the function φ defined by (6.17), the closed set K = Y [0, T ],
and the sequence of partitions π ` chosen above. We need to check that φ and
the set K satisfy the hypotheses of Lemma A.11. Continuity of φ follows
from the definition (6.17). Next we argue that if (sn , tn , xn , yn ) → (u, u, z, z)
in [0, T ]2 × K 2 while for each n, either sn 6= tn or xn 6= yn , then
(6.22)
φ(s_n, t_n, x_n, y_n) / ( |t_n − s_n| + |y_n − x_n|² ) → 0.
Given ε > 0, let I be an interval around u in [0, T ] and let B be an open
ball centered at z and contained in D such that
Thus
φ(s_n, t_n, x_n, y_n) / ( |t_n − s_n| + |y_n − x_n|² ) ≤ 2ε
for large enough n, and we have verified (6.22).
The function
(s, t, x, y) ↦ φ(s, t, x, y) / ( |t − s| + |y − x|² )
is continuous at points where either s 6= t or x 6= y, as a quotient of two
continuous functions. Consequently the function γ defined by (A.7) is con-
tinuous on [0, T ]2 × K 2 .
Hypothesis (A.8) of Lemma A.11 is a consequence of the limit in item
(iii) above.
The hypotheses of Lemma A.11 have been verified. By this lemma, for
this fixed ω and each t ∈ [0, T ], the sum on line (6.21) converges to
Σ_{s∈(0,t]} φ(s, s, Y_{s−}, Y_s) = Σ_{s∈(0,t]} { f(s, Y_s) − f(s, Y_{s−})
 − Df(s, Y_{s−}) · ∆Y_s − ½ ∆Y_s^T D²f(s, Y_{s−}) ∆Y_s }.
This completes the proof of Theorem 6.14.
Remark 6.15 (Notation). Often Itô’s formula is expressed in terms of dif-
ferential notation which is more economical than the integral notation. As
an example, if Y is a continuous Rd -valued semimartingale, equation (6.16)
can be written as
(6.23)
df(t, Y(t)) = f_t(t, Y(t)) dt + Σ_{j=1}^{d} f_{x_j}(t, Y(t−)) dY_j(t)
 + ½ Σ_{1≤j,k≤d} f_{x_j,x_k}(t, Y(t−)) d[Y_j, Y_k](t).
As mentioned already, these “stochastic differentials” have no rigorous mean-
ing. The formula above is to be regarded only as an abbreviation of the
integral formula (6.16).
A function f is harmonic in D if ∆f = 0 on D.
Proof. Formula (6.24) comes directly from Itô’s formula, because [Bi , Bj ] =
δi,j t.
The process B^τ is a (vector) L2 martingale that satisfies B^τ[0, T] ⊆ D
for all T < ∞. Thus Itô's formula applies. Note that [B^τ_i, B^τ_j] = [B_i, B_j]^τ =
δ_{i,j}(t ∧ τ). Hence ∆f = 0 in D eliminates the second-order term, and the
formula simplifies to
f(B^τ(t)) = f(z) + ∫_0^t Df(B^τ(s))^T dB^τ(s)
One can use Itô’s formula to find martingales, which in turn are useful
for calculations, as the next lemma and example show.
At this point we need to decide whether µ = 0 or not. Let us work the case
µ ≠ 0. Solving for h gives
h(x) = C_1 exp(−2µσ^{−2} x) + C_2 .
Since Bt is a mean zero normal with variance t, one can verify that (6.26)
holds for all T < ∞.
Now Mt = h(Xt ) is a martingale. By optional stopping, Mτ ∧t is also a
martingale, and so EMτ ∧t = EM0 = h(0). By path continuity and τ < ∞,
Mτ ∧t → Mτ almost surely as t → ∞. Furthermore, the process Mτ ∧t is
bounded, because up to time τ process Xt remains in [a, b], and so |Mτ ∧t | ≤
C ≡ supa≤x≤b |h(x)|. Dominated convergence gives EMτ ∧t → EMτ as t →
∞. We have verified that Eh(Xτ ) = h(0).
Finally, we can choose the constants C_1 and C_2 so that h(b) = 1 and
h(a) = 0. After some details,
P(X_τ = b) = h(0) = (e^{−2µa/σ²} − 1) / (e^{−2µa/σ²} − e^{−2µb/σ²}).
Can you explain what you see as you let either a → −∞ or b → ∞? (Decide
first whether µ is positive or negative.)
We leave the case µ = 0 as an exercise. You should get P (Xτ = b) =
(−a)/(b − a).
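The algebra above can be verified directly: with assumed numerical parameters, h(x) = C_1 exp(−2µσ^{−2}x) + C_2 fitted to h(a) = 0 and h(b) = 1 solves the generator equation ½σ²h″ + µh′ = 0, and h(0) reproduces the displayed exit probability. A sketch:

```python
import math

mu, sigma, a, b = 0.7, 1.3, -2.0, 3.0      # assumed parameters, a < 0 < b
r = -2.0 * mu / sigma ** 2                 # exponent in h(x) = C1 e^{r x} + C2

C1 = 1.0 / (math.exp(r * b) - math.exp(r * a))   # enforces h(b) = 1
C2 = -C1 * math.exp(r * a)                       # enforces h(a) = 0
h = lambda x: C1 * math.exp(r * x) + C2
hp = lambda x: C1 * r * math.exp(r * x)
hpp = lambda x: C1 * r * r * math.exp(r * x)

# (1/2) sigma^2 h'' + mu h' = 0 pointwise:
assert abs(0.5 * sigma ** 2 * hpp(0.5) + mu * hp(0.5)) < 1e-9

# h(0) equals the displayed exit probability:
target = (math.exp(r * a) - 1.0) / (math.exp(r * a) - math.exp(r * b))
assert abs(h(0.0) - target) < 1e-9
```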
such that B_{τ(j+1)} − B_{τ(j)} = 4^{j+1} and B_{τ(k+1)} − B_{τ(k)} = −4^{k+1}. But then,
since
|B_{τ(j)}| ≤ Σ_{i=1}^{j} 4^i ≤ (4^{j+1} − 1)/(4 − 1) ≤ 4^{j+1}/2 ,
we get B_{τ(j+1)} ≥ 4^{j+1}/2, and by the same argument B_{τ(k+1)} ≤ −4^{k+1}/2.
Thus limsup_{t→∞} B_t = ∞ and liminf_{t→∞} B_t = −∞ almost surely.
Almost every Brownian path visits every point infinitely often due to a
special property of one dimension: it is impossible to go “around” a point.
Proposition 6.21. Let Bt be Brownian motion in Rd , and let P z denote
the probability measure when the process Bt is started at point z ∈ Rd . Let
τr = inf{t ≥ 0 : |Bt | ≤ r}
be the first time Brownian motion hits the ball of radius r around the origin.
(a) If d = 2, P z (τr < ∞) = 1 for all r > 0 and z ∈ Rd .
(b) If d ≥ 3, then for z outside the ball of radius r,
P^z(τ_r < ∞) = (r/|z|)^{d−2} .
There will be an almost surely finite time T such that |Bt | > r for all t ≥ T .
(c) For d ≥ 2 and any z, y ∈ Rd ,
P^z [ B_t ≠ y for all 0 < t < ∞ ] = 1.
Note that z = y is allowed. That is why t = 0 is not included in the event.
The annulus A and stopping times σR and ζ are defined as above. The same
reasoning now leads to
(6.28)
P^z(τ_r < σ_R) = (R^{2−d} − |z|^{2−d}) / (R^{2−d} − r^{2−d}).
Letting R → ∞ gives
P^z(τ_r < ∞) = |z|^{2−d} / r^{2−d} = (r/|z|)^{d−2}
as claimed. Part (c) follows now because the quantity above tends to zero
as r → 0.
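The passage R → ∞ in (6.28) can be checked numerically; d = 3, r = 1, |z| = 4 below are assumed example values.

```python
d, r, z = 3, 1.0, 4.0                       # dimension and radii, r < |z|
prob = lambda R: (R ** (2 - d) - z ** (2 - d)) / (R ** (2 - d) - r ** (2 - d))

limit = (r / z) ** (d - 2)                  # claimed limit (r/|z|)^{d-2}
assert abs(prob(1e9) - limit) < 1e-6
# As r -> 0 the limit tends to 0, which is the content of part (c).
```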
It remains to show that after some finite time, the ball of radius r is no
longer visited. Let r < R. Define σ^1_R = σ_R , and for n ≥ 2,
τ^n_r = inf{t > σ^{n−1}_R : |B_t| ≤ r}
and
σ^n_R = inf{t > τ^n_r : |B_t| ≥ R}.
In other words, σ^1_R < τ^2_r < σ^2_R < τ^3_r < ··· are the successive visits to radius
R and back to radius r. Let α = (r/R)^{d−2} < 1. We claim that for n ≥ 2,
P^z(τ^n_r < ∞) = α^{n−1} .
For n = 2, since σ^1_R < τ^2_r , use the strong Markov property to restart the
Brownian motion at time σ^1_R . Then by induction,
P^z(τ^n_r < ∞) = P^z(τ^{n−1}_r < σ^{n−1}_R < τ^n_r < ∞)
 = E^z [ 1{τ^{n−1}_r < σ^{n−1}_R < ∞} P^{B(σ^{n−1}_R)}{τ_r < ∞} ]
This says that, conditioned on Fs , the increment M (t) − M (s) has normal
distribution with mean zero and covariance matrix identity. In particular,
M (t) − M (s) is independent of Fs . Thus M has all the properties of Brown-
ian motion. (For the last technical point, see Lemma B.14 in the appendix.)
Exercises
Exercise 6.1. Check that for a Poisson process N Itô’s formula needs no
proof, in other words it reduces immediately to an obvious identity.
Chapter 7

Stochastic Differential Equations
Since the integral term vanishes at time zero, equation (7.1) contains
the initial value X(0) = H(0). The integral equation (7.1) can be written
in the differential form
(7.2) dX(t) = dH(t) + F (t, X) dY (t), X(0) = H(0)
where the initial value must then be displayed explicitly. The notation can
be further simplified by dropping the superfluous time variables:
(7.3) dX = dH + F (t, X) dY, X(0) = H(0).
Equations (7.2) and (7.3) have no other interpretation except as abbrevi-
ations for (7.1). Equations such as (7.3) are known as SDE’s (stochas-
tic differential equations) even though rigorously speaking they are integral
equations.
and then identify the left-hand side zx0 + azx as the derivative (zx)0 . The
equation becomes
(d/dt) [ x(t) exp{ ∫_0^t a(s) ds } ] = g(t) exp{ ∫_0^t a(s) ds }.
Integrating from 0 to t gives
x(t) exp{ ∫_0^t a(s) ds } − x(0) = ∫_0^t g(s) exp{ ∫_0^s a(u) du } ds ,
which rearranges into
x(t) = x(0) exp{ −∫_0^t a(s) ds }
 + exp{ −∫_0^t a(s) ds } ∫_0^t g(s) exp{ ∫_0^s a(u) du } ds.
Now one can check by differentiation that this formula gives a solution.
We leave this to the reader. The process defined by the SDE (7.8) or by the
formula (7.9) is known as the Ornstein-Uhlenbeck process.
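The differentiation check left to the reader can also be done numerically. A sketch with assumed constant coefficients, where the integral formula above reduces to a closed form:

```python
import math

aa, g, x0, t = 0.8, 1.5, 2.0, 1.7    # assumed constants: a(s) = aa, g(s) = g

# With constant coefficients the formula gives
#   x(t) = x0 e^{-a t} + (g/a)(1 - e^{-a t}).
closed = x0 * math.exp(-aa * t) + (g / aa) * (1.0 - math.exp(-aa * t))

# Evaluate the displayed formula directly: midpoint rule for
#   e^{-a t} * (x0 + int_0^t g e^{a s} ds).
n = 200000
integral = sum(g * math.exp(aa * (i + 0.5) * t / n) for i in range(n)) * (t / n)
formula = math.exp(-aa * t) * (x0 + integral)
assert abs(formula - closed) < 1e-6

# Differentiation check: x'(t) = -a x(t) + g.
h = 1e-6
x = lambda s: x0 * math.exp(-aa * s) + (g / aa) * (1.0 - math.exp(-aa * s))
xp = (x(t + h) - x(t)) / h
assert abs(xp - (-aa * closed + g)) < 1e-3
```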
Solutions of the previous two equations are defined for all time. Our
last example of a linear equation is not.
Example 7.4 (Brownian bridge). Fix 0 < T < 1. The SDE is now
(7.12)
dX = −X/(1 − t) dt + dB, for 0 ≤ t ≤ T , with X_0 = 0.
The integrating factor
Z_t = exp{ ∫_0^t ds/(1 − s) } = 1/(1 − t)
works, and we arrive at the solution
(7.13)
X_t = (1 − t) ∫_0^t 1/(1 − s) dB_s .
To check that this solves (7.12), apply the product formula d(UV) = U dV +
V dU + d[U, V] with U_t = 1 − t and V_t = ∫_0^t (1 − s)^{−1} dB_s = (1 − t)^{−1} X_t .
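One consequence of (7.13), via the Itô isometry, is the familiar Brownian bridge variance Var X_t = (1−t)² ∫_0^t (1−s)^{−2} ds = t(1−t). The deterministic integral can be checked by quadrature (t = 0.6 is an assumed example value):

```python
t = 0.6                                   # any 0 < t < T < 1
n = 100000
ds = t / n

# Midpoint rule for int_0^t (1-s)^{-2} ds; the exact value is 1/(1-t) - 1.
integral = sum(1.0 / (1.0 - (i + 0.5) * ds) ** 2 for i in range(n)) * ds
variance = (1.0 - t) ** 2 * integral

assert abs(variance - t * (1.0 - t)) < 1e-6
```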
Proof. The uniqueness of the solution of (7.15) will follow from the general
uniqueness theorem for solutions of semimartingale equations.
has only finitely many factors because a cadlag path has only finitely many
jumps exceeding a given size in a bounded time interval. Hence this part is
piecewise constant, cadlag and in particular FV. Let ξs = ∆Ys 1{|∆Y (s)|<1/2}
denote a jump of magnitude below 1/2. It remains to show that
H_t = Π_{s∈(0,t]} (1 + ξ_s) exp(−ξ_s) = exp{ Σ_{s∈(0,t]} ( log(1 + ξ_s) − ξ_s ) }
It follows that log Ht is a cadlag FV process (see Example 1.14 in this con-
text). Since the exponential function is locally Lipschitz, Ht = exp(log Ht )
is also a cadlag FV process.
To summarize, we have shown that eWt in (7.16) is a semimartingale
and Ut in (7.17) is a well-defined real-valued FV process. Consequently
Zt = eWt Ut is a semimartingale.
The second part of the proof is to show that Z satisfies equation (7.15).
Let f(w, u) = e^w u, and find
f_w = f ,  f_u = e^w ,  f_{uu} = 0,  f_{uw} = e^w  and  f_{ww} = f .
Note that ∆Ws = ∆Ys because the jump in [Y ] at s equals exactly (∆Ys )2 .
A straightforward application of Itô's formula gives
Z_t = 1 + ∫_{(0,t]} e^{W(s−)} dU_s + ∫_{(0,t]} Z_{s−} dW_s
 + ½ ∫_{(0,t]} Z_{s−} d[W]_s + ∫_{(0,t]} e^{W(s−)} d[W, U]_s
 + Σ_{s∈(0,t]} { ∆Z_s − Z_{s−}∆Y_s − e^{W(s−)}∆U_s − ½ Z_{s−}(∆Y_s)² − e^{W(s−)}∆Y_s ∆U_s }.
Since
W_t = Y_t − ½( [Y]_t − Σ_{s∈(0,t]} (∆Y_s)² ),
(i) F is a map from the space R+ ×Ω×DRd [0, ∞) into the space Rd×m
of d×m matrices. F satisfies a spatial Lipschitz condition uniformly
in the other variables: there exists a finite constant L such that this
holds for all (t, ω) ∈ R+ × Ω and all η, ζ ∈ DRd [0, ∞):
The Lipschitz condition in part (i) of Assumption 7.7 implies that F (t, ω, · )
is a function of the stopped path
η^{t−}(s) = η(0) for all s ≥ 0 when t = 0;  η^{t−}(s) = η(s) for 0 ≤ s < t;
and η^{t−}(s) = η(t−) for s ≥ t > 0.
In other words, the function F (t, ω, · ) only depends on the path on the time
interval [0, t).
Part (ii) guarantees that the stochastic integral ∫ F(s, X) dY(s) exists
for an arbitrary adapted cadlag process X and semimartingale Y . The
existence of the stopping times {νk } in part (ii) can be verified via this local
boundedness condition.
Lemma 7.9. Assume F satisfies parts (i) and (ii) of Assumption 7.7.
Suppose there exists a path ζ̄ ∈ DRd [0, ∞) such that for all T < ∞,
(7.20)
c(T) = sup_{t∈[0,T], ω∈Ω} |F(t, ω, ζ̄)| < ∞.
Then for any adapted Rd -valued cadlag process X there exist stopping times
νk % ∞ such that 1(0,νk ] (t)F (t, X) is bounded for each k.
Proof. Define
ν_k = inf{t ≥ 0 : |X(t)| ≥ k} ∧ inf{t > 0 : |X(t−)| ≥ k} ∧ k.
These are bounded stopping times by Lemma 2.8. |X(s)| ≤ k for 0 ≤ s < ν_k
(if ν_k = 0 we cannot claim that |X(0)| ≤ k). The stopped process
X^{ν_k−}(t) = X(0) if ν_k = 0;  X^{ν_k−}(t) = X(t) for 0 ≤ t < ν_k ;
and X^{ν_k−}(t) = X(ν_k−) for t ≥ ν_k > 0
The last line above is a finite quantity because ζ̄ is locally bounded, being
a cadlag path.
Here is how to apply the theorem to an equation that is not defined for
all time.
Proof. Extend the filtration, H, Y and F to all time in this manner: for
t ∈ (T, ∞) and ω ∈ Ω define Ft = FT , Ht (ω) = HT (ω), Yt (ω) = YT (ω), and
F (t, ω, η) = 0. Then the extended processes H and Y and the coefficient F
satisfy all the original assumptions on all of [0, ∞). Note in particular that
if G(t) = F (t, X) is a predictable process for 0 ≤ t ≤ T , then extending
it as a constant to (T, ∞) produces a predictable process for 0 ≤ t < ∞.
And given a predictable process X on [0, ∞), let {σk } be the stopping times
given by the assumption, and then define
ν_k = σ_k if σ_k < T,  and  ν_k = ∞ if σ_k = T.
These stopping times satisfy part (ii) of Assumption 7.7 for [0, ∞). Now
Theorem 7.18 gives a solution X for all time 0 ≤ t < ∞ for the extended
equation, and on [0, T ] X solves the equation with the original H, Y and F .
For the uniqueness part, given a solution X of the equation on [0, T ],
extend it to all time by defining Xt = XT for t ∈ (T, ∞). Then we have
a solution of the extended equation on [0, ∞), and the uniqueness theorem
applies to that.
Corollary 7.11. Let the assumptions be as in Theorem 7.8, except that the
Lipschitz assumption is weakened to this: for each 0 < T < ∞ there exists
a finite constant L(T ) such that this holds for all (t, ω) ∈ [0, T ] × Ω and all
η, ζ ∈ DRd [0, ∞):
Proof. For k ∈ N, the function 1{0≤t≤k} F(t, ω, η) satisfies the original
hypotheses. By Theorem 7.8 there exists a process X_k that satisfies the
equation
(7.22)
X_k(t) = H^k(t) + ∫_{(0,t]} 1_{[0,k]}(s) F(s, X_k) dY^k(s).
Since this holds for all k, X is a solution to the original SDE (7.18).
Uniqueness works similarly. If X and X̃ solve (7.18), then X(k ∧ t) and
X̃(k ∧ t) solve (7.22). By the uniqueness theorem X(k ∧ t) = X̃(k ∧ t) for
all t, and since k can be taken arbitrary, X = X̃.
Example 7.12. Here are ways by which a coefficient F satisfying Assump-
tion 7.7 can arise.
(i) Let f (t, ω, x) be a P ⊗ BRd -measurable function from (R+ × Ω) × Rd
into d × m-matrices. Assume f satisfies the Lipschitz condition
|f (t, ω, x) − f (t, ω, y)| ≤ L(T )|x − y|
for (t, ω) ∈ [0, T ] × Ω and x, y ∈ Rd , and the local boundedness condition
sup{ |f(t, ω, 0)| : 0 ≤ t ≤ T, ω ∈ Ω } < ∞
for all 0 < T < ∞. Then put F (t, ω, η) = f (t, ω, η(t−)).
Next, let us specialize the existence and uniqueness theorem to Itô equa-
tions.
Corollary 7.13. Let Bt be a standard Brownian motion in Rm with respect
to a right-continuous filtration {Ft } and ξ an Rd -valued F0 -measurable ran-
dom variable. Fix 0 < T < ∞. Assume the functions b : [0, T ] × Rd → Rd
and σ : [0, T ] × Rd → Rd×m satisfy the Lipschitz condition
|b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ L|x − y|
and the bound
|b(t, x)| + |σ(t, x)| ≤ L(1 + |x|)
for a constant L and all 0 ≤ t ≤ T , x, y ∈ Rd .
Then there exists a unique continuous process X on [0, T ] that is adapted
to {Ft } and satisfies
(7.24)
X_t = ξ + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dB_s
for 0 ≤ t ≤ T .
Proof. To fit this into Theorem 7.8, let Y(t) = [t, B_t]^T , H(t) = ξ, and
F(t, ω, η) = ( b(t, η(t−)), σ(t, η(t−)) ).
Next we present the obligatory basic examples from ODE theory that
illustrate the loss of existence and uniqueness when the Lipschitz assumption
on F is weakened.
Example 7.14. Consider the equation
x(t) = ∫_0^t 2 √(x(s)) ds.
The function f(x) = √x is not Lipschitz on [0, 1] because f′(x) blows up
at the origin. The equation has infinitely many solutions. Two of them are
x(t) = 0 and x(t) = t².
The equation
x(t) = 1 + ∫_0^t x²(s) ds
does not have a solution for all time. The unique solution starting at t = 0
is x(t) = (1 − t)−1 which exists only for 0 ≤ t < 1.
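Both claims are easy to confirm by quadrature; the midpoint grid below is an assumed discretization.

```python
n = 100000
t = 0.9
dt = t / n

# Non-uniqueness: x(t) = t^2 satisfies x(t) = int_0^t 2 sqrt(x(s)) ds,
# since 2 sqrt(s^2) = 2s on [0, t].  (x = 0 is another solution.)
integral = sum(2.0 * ((i + 0.5) * dt) for i in range(n)) * dt
assert abs(integral - t * t) < 1e-6

# Blow-up: x(t) = 1/(1-t) satisfies x(t) = 1 + int_0^t x(s)^2 ds for t < 1,
# and the right side grows without bound as t -> 1.
integral2 = sum((1.0 / (1.0 - (i + 0.5) * dt)) ** 2 for i in range(n)) * dt
assert abs(1.0 + integral2 - 1.0 / (1.0 - t)) < 1e-4
```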
Proof. The indefinite integral ∫_a^t g(s) ds of an integrable function is an absolutely
continuous (AC) function. At Lebesgue-almost every t it is differentiable
and the derivative equals g(t). Consequently the equation
(d/dt) [ e^{−Bt} ∫_a^t g(s) ds ] = −B e^{−Bt} ∫_a^t g(s) ds + e^{−Bt} g(t)
Proof. As {tni } goes through partitions of [0, t] with mesh tending to zero,
{γ(tni )} goes through partitions of [0, γ(t)] with mesh tending to zero. By
Proposition 5.34, both sides of (7.25) equal the limit of the sums
Σ_i X(γ(t^n_i)) ( Y(t^n_{i+1}) − Y(t^n_i) ).
Lemma 7.18. Suppose A is a nondecreasing cadlag function such that
A(0) = 0, and Z is a nondecreasing real-valued cadlag function. Then
γ(u) = inf{t ≥ 0 : A(t) > u}
Lemma 7.19. Let X be an adapted cadlag process and α > 0. Let τ1 <
τ2 < τ3 < · · · be the times of successive jumps in X of magnitude above α:
with τ0 = 0,
τk = inf{t > τk−1 : |X(t) − X(t−)| > α}.
Then the {τk } are stopping times.
from which
X(u_i t/n) − X(u_i t/n − t/n) > α + 1/m.
This shows ω ∈ A.
(If ε > 0 satisfies σ + ν < t − ε, then any rational r ∈ (ν, ν + ε) will do.) By
the definition of F̄ν ,
A ∩ {ν < r} = ∪_{m∈N} ( A ∩ {ν ≤ r − 1/m} ) ∈ F̄_r = F_{σ+r} ,
m∈N
and consequently by the definition of Fσ+r ,
A ∩ {ν < r} ∩ {σ + r ≤ t} ∈ Ft .
If A = Ω, we have shown that {σ + ν < t} ∈ F_t . By Lemma 2.5 and the
right-continuity of {F_t}, this implies that σ + ν is an {F_t}-stopping time.
For the general A ∈ F̄_ν we have shown that
A ∩ {σ + ν ≤ t} = ∩_{m≥n} ( A ∩ {σ + ν < t + 1/m} ) ∈ F_{t+(1/n)} .
m≥n
If A(t) ≥ u then by strict monotonicity A(s) > u for all s > t, which implies
γ(u) ≤ t.
Strict monotonicity of A gives continuity of γ. Right continuity of γ
was already argued in Lemma 7.18. For left continuity, let s < γ(u). Then
A(s) < u, because A(s) ≥ u together with strict increasingness would imply
the existence of a point t ∈ (s, γ(u)) such that A(t) > u, contradicting
t < γ(u). Then γ(v) ≥ s for v ∈ (A(s), u), which shows left continuity.
In summary: u ↦ γ(u) is continuous and nondecreasing, each γ(u) is a
bounded stopping time, and γ(u) ≤ u. For any given ω and T , once u > A(T)
we have γ(u) ≥ T , and so γ(u) → ∞ as u → ∞.
If Y is of type (δ0 , K0 ) for any δ0 ≤ δ and K0 ≤ K then all jumps of A
satisfy |∆A(t)| ≤ c. This follows because the jumps of quadratic variation
and total variation obey ∆[M ](t) = (∆M (t))2 and ∆VSj (t) = |∆Sj (t)|.
For ℓ = 1, 2, let H_ℓ , X_ℓ , and Z_ℓ be adapted R^d-valued cadlag processes.
Assume they satisfy the equations
(7.35)
Z_ℓ(t) = H_ℓ(t) + ∫_{(0,t]} F(s, X_ℓ) dY(s),  ℓ = 1, 2,
and
(7.37)
φ_X(u) = E[ D_X ∘ γ(u) ] = E[ sup_{0≤s≤γ(u)} |X_1(s) − X_2(s)|² ].
(7.41)
φ_X(u) ≤ ( 2φ_H(u) / (1 − c) ) exp{ u / (1 − c) }.
≤ ½ L^{−2}(u + c) ‖G‖²_∞ .
(7.43)
| ∫_{(0,t]} G(s) dY(s) |² ≤ 2 Σ_{i=1}^{d} Σ_{j=1}^{m} | ∫_{(0,t]} G_{i,j}(s) dM_j(s) |²
 + 2 Σ_{i=1}^{d} Σ_{j=1}^{m} | ∫_{(0,t]} G_{i,j}(s) dS_j(s) |².
of stochastic integrals,
(7.44)
E[ sup_{0≤t≤γ(u)} | Σ_{j=1}^{m} ∫_{(0,t]} G_{i,j}(s) dM_j(s) |² ]
 ≤ 4 E| Σ_{j=1}^{m} ∫_{(0,γ(u)]} G_{i,j}(s) dM_j(s) |²
 ≤ 4m Σ_{j=1}^{m} E| ∫_{(0,γ(u)]} G_{i,j}(s) dM_j(s) |²
 ≤ 4m Σ_{j=1}^{m} E ∫_{(0,γ(u)]} G_{i,j}(s)² d[M_j](s).

 ≤ m Σ_{j=1}^{m} E[ V_{S_j}(γ(u)) ∫_{(0,γ(u)]} G_{i,j}(s)² dV_{S_j}(s) ].
Now we prove (7.42). Equations (7.43), (7.44) and (7.45), together with
the hypothesis VSj (t) ≤ K, imply that
E[ sup_{0≤t≤γ(u)} | ∫_{(0,t]} G(s) dY(s) |² ]
 ≤ 8dm Σ_{j=1}^{m} E ∫_{(0,γ(u)]} |G(s)|² d[M_j](s)
 + 2Kdm Σ_{j=1}^{m} E ∫_{(0,γ(u)]} |G(s)|² dV_{S_j}(s)
 ≤ ½ L^{−2} E ∫_{(0,γ(u)]} |G(s)|² dA(s)
 ≤ ½ L^{−2} E[ sup_{0≤s≤γ(u)} |G(s)|² · A(γ(u)) ]
 ≤ ½ L^{−2}(u + c) ‖G‖²_∞ .
From A(γ(u)−) ≤ u and the bound c on the jumps in A came the bound
A(γ(u)) ≤ u + c used above. This completes the proof of Lemma 7.24.
≤ 2 L^{−2}(u + c) C_0² .
Now (7.47) follows from a combination of inequality (7.46), assumption
(7.38), and bound (7.48). Note that (7.47) does not require a bound on
X, due to the boundedness assumption on F and (7.38).
The Lipschitz assumption on F gives
| F(s, X_1) − F(s, X_2) |² ≤ L² D_X(s−).
Apply (7.42) together with the Lipschitz bound to get
(7.49)
E[ sup_{0≤t≤γ(u)} | ∫_{(0,t]} ( F(s, X_1) − F(s, X_2) ) dY(s) |² ]
 ≤ ½ L^{−2} E ∫_{(0,γ(u)]} | F(s, X_1) − F(s, X_2) |² dA(s)
 ≤ ½ E ∫_{(0,γ(u)]} D_X(s−) dA(s).
Now we prove part (a) under the assumption of bounded F . Take supremum
over 0 ≤ t ≤ γ(u) in (7.46), take expectations, and apply (7.49) to
write
(7.50)
φ_Z(u) = E[ D_Z ∘ γ(u) ] = E[ sup_{0≤t≤γ(u)} |Z_1(t) − Z_2(t)|² ]
 ≤ 2φ_H(u) + E ∫_{(0,γ(u)]} D_X(s−) dA(s).
To the dA-integral above apply first the change of variable from Lemma 7.16
and then inequality (7.26). This gives
(7.51)
φ_Z(u) ≤ 2φ_H(u) + E[ ( A(γ(u)) − u ) D_X ∘ γ(u−) ] + E ∫_{(0,u]} D_X ∘ γ(s−) ds.
For a fixed ω, cadlag paths are bounded on bounded time intervals, so applying
Lemmas 7.16 and 7.18 to the path-by-path integral ∫_{(0,γ(u)]} D_X(s−) dA(s)
is not problematic. And then, since the resulting terms are nonnegative,
their expectations exist.
By the definition of γ(u), A(s) ≤ u for s < γ(u), and so A(γ(u)−) ≤ u.
Thus by the bound c on the jumps of A,
A(γ(u)) − u ≤ A(γ(u)) − A(γ(u)−) ≤ c.
Applying this to (7.51) gives
(7.52)
φ_Z(u) ≤ 2φ_H(u) + c E[ D_X ∘ γ(u−) ] + E ∫_{(0,u]} D_X ∘ γ(s−) ds
 ≤ 2φ_H(u) + c φ_X(u) + ∫_{(0,u]} φ_X(s) ds.
This is the desired conclusion (7.39), and the proof of part (a) for the case
of bounded F is complete.
We prove part (b) for bounded F . By assumption, Z_ℓ = X_ℓ and c < 1.
Now φ_X(u) = φ_Z(u), and by (7.47) this function is finite. Since it is nondecreasing,
it is bounded on bounded intervals. Inequality (7.52) becomes
(1 − c) φ_X(u) ≤ 2φ_H(u) + ∫_0^u φ_X(s) ds.
An application of Gronwall’s inequality (Lemma 7.15) gives the desired con-
clusion (7.41). This completes the proof of the proposition for the case of
bounded F .
Step 2. Return to the original hypotheses: Assumption 7.7 for F with-
out additional boundedness and Definition 7.22 for Y with type (δ, K − δ).
By part (ii) of Assumption 7.7, we can pick stopping times σk % ∞ and
constants B_k such that
| 1_{[0,σ_k]}(s) F(s, X_ℓ) | ≤ B_k  for ℓ = 1, 2.
while
lim_{k→∞} E[ sup_{0≤s≤γ(u)} |X_1^{σ_k−}(s) − X_2^{σ_k−}(s)|² ] = φ_X(u)
Using the previous inequality, the outcome from Step 1 can be written
for part (a) as
E[ sup_{0≤s≤γ(u)} |Z_1^{σ_k−}(s) − Z_2^{σ_k−}(s)|² ] ≤ 2φ_H(u) + c φ_X(u) + ∫_0^u φ_X(s) ds,
and for part (b) as
E[ sup_{0≤s≤γ(u)} |X_1^{σ_k−}(s) − X_2^{σ_k−}(s)|² ] ≤ ( 2φ_H(u) / (1 − c) ) exp{ u / (1 − c) }.
As k % ∞ the left-hand sides of the inequalities above converge to the
desired expectations. Parts (a) and (b) both follow, and the proof is com-
plete.
and
Past lemmas guarantee that τ1 and τ2 are stopping times. For τ3 , observe
that since VGj (t) is nondecreasing and cadlag,
{τ_3 ≤ t} = ∪_{j=1}^{m} { V_{G_j}(t) ≥ K − 2δ }.
(7.55) σ = τ1 ∧ τ2 ∧ τ3 ∧ T.
Stopping does not introduce new jumps, so the jumps of M σ are still bounded
by δ/2. The jumps of Y σ− are bounded by δ/2 since σ ≤ τ2 . On [0, σ) the
FV part S(t) = G^{σ−}(t) − ∆M(σ) 1{t ≥ σ} has the jumps of G^{σ−}. These are
bounded by δ because ∆G^{σ−}(t) = ∆Y^{σ−}(t) − ∆M^{σ−}(t). At time σ, S has
the jump ∆M(σ), bounded by δ/2. The total variation of a component S_j
of S is
V_{S_j}(t) ≤ V_{G_j^{σ−}}(t) + |∆M_j(σ)| ≤ V_{G_j}(τ_3−) + δ/2 ≤ K − 2δ + δ/2
 ≤ K − δ.
All the hypotheses of part (b) of Proposition 7.23 are satisfied, and we get
E[ sup_{0≤t≤γ(u)} |X_1^{σ−}(t) − X_2^{σ−}(t)|² ] = 0
for any u > 0, where γ(u) is the stopping time defined by (7.33). As we
let u % ∞ we get X1σ− (t) = X2σ− (t) for all 0 ≤ t < ∞. This implies that
X1 (t) = X2 (t) for 0 ≤ t < σ.
At time σ,
X_1(σ) = H(σ) + ∫_{(0,σ]} F(s, X_1) dY(s)
(7.58)
 = H(σ) + ∫_{(0,σ]} F(s, X_2) dY(s)
 = X_2(σ),
because the integrand F (s, X1 ) depends only on {X1 (s) : 0 ≤ s < σ} which
agrees with the corresponding segment of X2 , as established in the previous
paragraph. Now we know that X1 (t) = X2 (t) for 0 ≤ t ≤ σ. This concludes
the proof of Step 1.
Proof of Lemma 7.26. We check that the new F̄ satisfies all the hypothe-
ses. The Lipschitz property is immediate. Let Z̄ be a cadlag process adapted
to {F̄_t}. Define the process Z by
Z(t) = X(t) for t < σ,  and  Z(t) = X(σ) + Z̄(t − σ) for t ≥ σ.
Then Z is a cadlag process adapted to {Ft } by Lemma 7.21. F̄ (t, Z̄) =
F (σ + t, Z) is predictable under {F̄t } by Lemma 5.42. Find stopping times
νk % ∞ such that 1(0,νk ] (s)F (s, Z) is bounded for each k. Define ρk =
(νk − σ)+ . Then ρk % ∞, by Lemma 7.21 ρk is a stopping time for {F̄t },
and 1(0,ρk ] (s)F̄ (s, Z̄) = 1(0,νk ] (σ + s)F (σ + s, Z) which is bounded.
Ȳ is a semimartingale by Theorem 5.41. X̄ and H̄ are adapted to {F̄t }
by part (iii) of Lemma 2.2 (recall that cadlag paths imply progressive mea-
surability).
The equation for X̄ checks as follows:
X̄(t) = X(σ + t) − X(σ)
 = H(σ + t) − H(σ) + ∫_{(σ,σ+t]} F(s, X) dY(s)
 = H̄(t) + ∫_{(0,t]} F(σ + s, X) dȲ(s)
 = H̄(t) + ∫_{(0,t]} F̄(s, X̄) dȲ(s).
The next to last equality is from (5.48), and the last equality from the
definition of F̄ and ζ ω,X̄ = X.
\[
\le \tfrac{1}{2} L^{-2}(u+c)B_k^2.
\]
Part (a) of Proposition 7.23 applied to $(Z_1, Z_2) = (X_n^{\nu_k-}, X_{n+1}^{\nu_k-})$ and $(H_1, H_2) = (H^{\nu_k-}, H^{\nu_k-})$ gives
\[
(7.62)\qquad \phi_{n+1}(u) \le c\,\phi_n(u) + \int_0^u \phi_n(s)\, ds.
\]
Lemma 7.30. Fix $0 < T < \infty$. Let $\{\phi_n\}$ be nonnegative measurable functions on $[0,T]$ such that $\phi_0 \le B$ for some constant $B$, and inequality (7.62) is satisfied for all $n \ge 0$ and $0 \le u \le T$. Then for all $n$ and $0 \le u \le T$,
\[
(7.63)\qquad \phi_n(u) \le B \sum_{k=0}^n \frac{1}{k!}\binom{n}{k} u^k c^{n-k}.
\]
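A quick sanity check on Lemma 7.30, outside the text's argument: iterating (7.62) with equality, starting from the constant function $\phi_0 = B$, reproduces the right-hand side of (7.63) exactly, which exact rational arithmetic confirms (the function names here are ad hoc):

```python
from fractions import Fraction
from math import comb, factorial

def iterate(phi, c):
    # phi_n given as coefficient list [a0, a1, ...] of a polynomial in u;
    # return the coefficients of c*phi_n(u) + integral_0^u phi_n(s) ds
    integ = [Fraction(0)] + [a / (i + 1) for i, a in enumerate(phi)]
    scaled = [c * a for a in phi] + [Fraction(0)]
    return [x + y for x, y in zip(scaled, integ)]

B, c = Fraction(3), Fraction(1, 2)
phi = [B]                                   # phi_0(u) = B
for n in range(1, 8):
    phi = iterate(phi, c)
    # coefficients of B * sum_{k=0}^n (1/k!) C(n,k) u^k c^{n-k}
    bound = [B * comb(n, k) * c**(n - k) / factorial(k) for k in range(n + 1)]
    assert phi == bound                     # (7.63) holds, here with equality
```

The Pascal-rule step $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$ that drives the induction is exactly what makes the coefficient lists match.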
Proof. First check this auxiliary equality for $0 < x < 1$ and $k \ge 0$:
\[
(7.64)\qquad \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\, x^m = \frac{k!}{(1-x)^{k+1}}.
\]
One way to see this is to write the left-hand sum as
\[
\begin{aligned}
\sum_{m=0}^\infty \frac{d^k}{dx^k}\, x^{m+k}
&= \frac{d^k}{dx^k} \sum_{m=0}^\infty x^{m+k}
= \frac{d^k}{dx^k}\, \frac{x^k}{1-x}\\
&= \sum_{j=0}^k \binom{k}{j} \frac{d^j}{dx^j}\, \frac{1}{1-x} \cdot \frac{d^{k-j}}{dx^{k-j}}\, x^k\\
&= \sum_{j=0}^k \binom{k}{j} \frac{j!}{(1-x)^{j+1}} \cdot k(k-1)\cdots(j+1)\, x^j\\
&= \frac{k!}{1-x} \sum_{j=0}^k \binom{k}{j} \Big(\frac{x}{1-x}\Big)^j
= \frac{k!}{1-x} \Big(1 + \frac{x}{1-x}\Big)^k\\
&= \frac{k!}{(1-x)^{k+1}}.
\end{aligned}
\]
For an alternative proof of (7.64) see Exercise 7.2.
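Identity (7.64) is also easy to confirm numerically by truncating the series; a small sketch:

```python
from math import factorial, prod

def lhs(x, k, terms=2000):
    # partial sum of sum_{m>=0} (m+1)(m+2)...(m+k) x^m
    return sum(prod(range(m + 1, m + k + 1)) * x**m for m in range(terms))

def rhs(x, k):
    return factorial(k) / (1 - x)**(k + 1)

for k in range(5):
    for x in (0.1, 0.5, 0.9):
        assert abs(lhs(x, k) - rhs(x, k)) < 1e-9 * rhs(x, k)
```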
After changing the order of summation, the sum in the statement of the lemma becomes
\[
\begin{aligned}
\sum_{k=0}^\infty \frac{u^k}{(k!)^2} \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\,\delta^{n-k}
&= \sum_{k=0}^\infty \frac{u^k}{(k!)^2} \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\,\delta^m\\
&= \sum_{k=0}^\infty \frac{u^k}{(k!)^2} \cdot \frac{k!}{(1-\delta)^{k+1}}
= \frac{1}{1-\delta} \sum_{k=0}^\infty \frac{1}{k!}\Big(\frac{u}{1-\delta}\Big)^k\\
&= \frac{1}{1-\delta}\cdot \exp\Big\{\frac{u}{1-\delta}\Big\}.
\end{aligned}
\]
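The closed form just computed agrees with the double sum numerically; a quick check (truncating at 300 terms, which is ample for the parameter values used):

```python
from math import comb, factorial, exp

def double_sum(u, delta, N=300):
    # sum_{n>=0} sum_{k=0}^n (1/k!) C(n,k) u^k delta^{n-k}, truncated at n = N
    return sum(comb(n, k) / factorial(k) * u**k * delta**(n - k)
               for n in range(N) for k in range(n + 1))

for u, delta in [(0.5, 0.2), (1.0, 0.5), (2.0, 0.1)]:
    closed = exp(u / (1 - delta)) / (1 - delta)
    assert abs(double_sum(u, delta) - closed) < 1e-9 * closed
```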
It follows by the next Borel-Cantelli argument that for almost every ω, the
cadlag functions {Xnνk − (t) : n ∈ Z+ } on the interval t ∈ [0, γ(u)] form a
Cauchy sequence in the uniform norm. (Recall that we are still holding k
fixed.) Pick $\alpha \in (c, 1)$. By Chebychev's inequality and (7.63),
\[
\sum_{n=0}^\infty P\Big\{\sup_{0\le t\le\gamma(u)} |X_{n+1}^{\nu_k-}(t) - X_n^{\nu_k-}(t)| \ge \alpha^{n/2}\Big\}
\le \sum_{n=0}^\infty \alpha^{-n}\phi_n(u)
\le B \sum_{n=0}^\infty \sum_{k=0}^n \frac{1}{k!}\binom{n}{k}\Big(\frac{u}{\alpha}\Big)^k\Big(\frac{c}{\alpha}\Big)^{n-k}.
\]
under uniform distance (Lemma A.3), for almost every $\omega$ there exists a cadlag function $t \mapsto \widetilde X_k(t)$ on the interval $[0, \gamma(u)]$ such that
For this we let $n \to \infty$ in (7.61) to obtain (7.68) in the limit. The left side of (7.61) converges to the left side of (7.68) almost surely, uniformly on compact time intervals by (7.67). For the right side of (7.61) we apply Theorem 5.40. From the Lipschitz property of $F$,
\[
\big| F(t, \widetilde X_k) - F(t, X_n^{\nu_k-}) \big| \le L \cdot \sup_{0\le s< t} \big| \widetilde X_k(s) - X_n^{\nu_k-}(s) \big|.
\]
(Note the range of the supremum over s.) The convergence in (7.67) gives
the hypothesis in Theorem 5.40, and we conclude that the right side of
(7.61) converges to the right side of (7.68), in probability, uniformly over
t in compact time intervals. We can get almost sure convergence along a
subsequence. Equation (7.68) has been verified.
This can be repeated for all values of $k$. The limit (7.67) implies that if $k < m$, then $\widetilde X_m = \widetilde X_k$ on $[0, \nu_k)$. Since $\nu_k \nearrow \infty$, we conclude that there is a single well-defined cadlag process $X$ on $[0,\infty)$ such that $X = \widetilde X_k$ on $[0,\nu_k)$. On $[0,\nu_k)$ equation (7.68) agrees term by term with the equation
\[
(7.69)\qquad X(t) = H(t) + \int_{(0,t]} F(s, X)\, dY(s).
\]
For the integral term this follows from part (b) of Proposition 5.46 and the manner of dependence of $F$ on the path: since $X = \widetilde X_k$ on $[0,\nu_k)$, $F(s, X) = F(s, \widetilde X_k)$ on $[0,\nu_k]$.
Define
\[
X_1(t) =
\begin{cases}
\widetilde X(t), & 0 \le t < \rho_1\\
X_1(\rho_1-) + \Delta H(\rho_1) + F(\rho_1, \widetilde X)\,\Delta Y(\rho_1), & t \ge \rho_1.
\end{cases}
\]
The equality of stochastic integrals of the last two lines above is an instance
of the general identity G · Y τ = (G · Y )τ (Proposition 5.35). The case k = 1
of the lemma has been proved.
Assume a process $X_k(t)$ solves (7.70). Define $\bar{\mathcal F}_t = \mathcal F_{\rho_k+t}$,
\[
\bar H(t) = H(\rho_k + t) - H(\rho_k), \qquad \bar Y(t) = Y(\rho_k + t) - Y(\rho_k),
\]
and $\bar F(t, \omega, \eta) = F(\rho_k + t, \omega, \zeta^{\omega,\eta})$, where the cadlag path $\zeta^{\omega,\eta}$ is defined by
\[
\zeta^{\omega,\eta}(s) =
\begin{cases}
X_k(s), & 0 \le s < \rho_k\\
X_k(\rho_k) + \eta(s-\rho_k), & s \ge \rho_k.
\end{cases}
\]
under the filtration {F̄t }. We need to check that this equation is the type
to which Proposition 7.28 applies. Semimartingale Ȳ σk+1 − satisfies the as-
sumption of Proposition 7.28, again by the argument already used in the
uniqueness proof. F̄ satisfies Assumption 7.7 exactly as was proved earlier
for Lemma 7.26.
The hypotheses of Proposition 7.28 have been checked, and so there
exists a process $\bar X$ that solves (7.72). Note that $\bar X(0) = \bar H(0) = 0$. Define
\[
X_{k+1}(t) =
\begin{cases}
X_k(t), & t < \rho_k\\
X_k(\rho_k) + \bar X(t-\rho_k), & \rho_k \le t < \rho_{k+1}\\
X_{k+1}(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1}), & t \ge \rho_{k+1}.
\end{cases}
\]
The last case of the definition above makes sense because it depends on the
segment {Xk+1 (s) : 0 ≤ s < ρk+1 } defined by the two preceding cases.
By induction $X_{k+1}$ satisfies the equation (7.70) for $k+1$ on $[0,\rho_k]$. From the definition of $\bar F$, $\bar F(s, \bar X) = F(\rho_k+s, X_{k+1})$ for $0 \le s \le \sigma_{k+1}$. Then for $\rho_k < t < \rho_{k+1}$,
\[
\begin{aligned}
X_{k+1}(t) &= X_k(\rho_k) + \bar X(t-\rho_k)\\
&= X_k(\rho_k) + \bar H(t-\rho_k) + \int_{(0,t-\rho_k]} \bar F(s, \bar X)\, d\bar Y(s)\\
&= X_k(\rho_k) + H(t) - H(\rho_k) + \int_{(\rho_k,t]} F(s, X_{k+1})\, dY(s)\\
&= H(t) + \int_{(0,t]} F(s, X_{k+1})\, dY(s)\\
&= H^{\rho_{k+1}}(t) + \int_{(0,t]} F(s, X_{k+1})\, dY^{\rho_{k+1}}(s).
\end{aligned}
\]
The last line of the definition of $X_{k+1}$ extends the validity of the equation to $t \ge \rho_{k+1}$:
\[
\begin{aligned}
X_{k+1}(t) &= X_{k+1}(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1})\\
&= H(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + \int_{(0,\rho_{k+1})} F(s, X_{k+1})\, dY(s) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1})\\
&= H(\rho_{k+1}) + \int_{(0,\rho_{k+1}]} F(s, X_{k+1})\, dY(s)\\
&= H^{\rho_{k+1}}(t) + \int_{(0,t]} F(s, X_{k+1})\, dY^{\rho_{k+1}}(s).
\end{aligned}
\]
We are ready to finish off the proof of Theorem 7.32. If $k < m$, stopping the processes of the equation
\[
X_m(t) = H^{\rho_m}(t) + \int_{(0,t]} F(s, X_m)\, dY^{\rho_m}(s)
\]
at $\rho_k$ gives the equation
\[
X_m^{\rho_k}(t) = H^{\rho_k}(t) + \int_{(0,t]} F(s, X_m^{\rho_k})\, dY^{\rho_k}(s).
\]
By the uniqueness theorem, $X_m^{\rho_k} = X_k$ for $k < m$. Consequently there exists a process $X$ that satisfies $X = X_k$ on $[0,\rho_k]$ for each $k$. Then for $0 \le t \le \rho_k$, equation (7.70) agrees term by term with the desired equation
\[
X(t) = H(t) + \int_{(0,t]} F(s, X)\, dY(s).
\]
Hence this equation is valid on every [0, ρk ], and thereby on [0, ∞). The
existence and uniqueness theorem has been proved.
Exercises
Exercise 7.1. (a) Show that for any $g \in C[0,1]$,
\[
\lim_{t \nearrow 1}\, (1-t) \int_0^t \frac{g(s)}{(1-s)^2}\, ds = g(1).
\]
(b) Let the process Xt be defined by (7.13) for 0 ≤ t < 1. Show that
Xt → 0 as t → 1.
Hint. Apply Exercise 6.2 and then part (a).
Exercise 7.2. Here is an alternative inductive proof of the identity (7.64) used in the existence proof for solutions of SDE's. Fix $-1 < x < 1$ and let
\[
a_k = \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\, x^m
\]
and
\[
b_k = \sum_{m=0}^\infty m(m+1)(m+2)\cdots(m+k)\, x^m.
\]
Compute $a_1$ explicitly, then derive the identities $b_k = x\,a_{k+1}$ and $a_{k+1} = (k+1)a_k + b_k$.
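The identities in Exercise 7.2 can be confirmed numerically before proving them (this checks, but of course does not replace, the induction; the helper names are ad hoc):

```python
def prod_range(lo, hi):
    # lo*(lo+1)*...*hi; empty product (hi < lo) is 1
    p = 1
    for i in range(lo, hi + 1):
        p *= i
    return p

def series(coeff, x, terms=3000):
    return sum(coeff(m) * x**m for m in range(terms))

x = 0.4
assert abs(series(lambda m: m + 1, x) - 1 / (1 - x)**2) < 1e-9   # a_1 = (1-x)^{-2}
for k in range(1, 6):
    a_k  = series(lambda m: prod_range(m + 1, m + k), x)
    a_k1 = series(lambda m: prod_range(m + 1, m + k + 1), x)
    b_k  = series(lambda m: m * prod_range(m + 1, m + k), x)
    assert abs(b_k - x * a_k1) < 1e-9 * a_k1                     # b_k = x a_{k+1}
    assert abs(a_k1 - ((k + 1) * a_k + b_k)) < 1e-9 * a_k1       # a_{k+1} = (k+1)a_k + b_k
```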
Appendix A
Analysis
(b) Fix $t \in [a,b]$. We first show $f(s) \to f(t)$ as $s \searrow t$. (If $t = b$ no approach from the right is possible and there is nothing to prove.) Let $\varepsilon > 0$. Pick $n$ so that
\[
\sup_{x\in[a,b]} |f_n(x) - f(x)| \le \varepsilon.
\]
and since $\varepsilon > 0$ was arbitrary, the existence of the limit $f(t-) = \lim_{s\nearrow t} f(s)$ follows.
(c) Right continuity of f follows from part (b), and left continuity is
proved by the same argument.
Lemma A.5. Suppose f has left and right limits at all points in [0, T ]. Let
α > 0. Define the set of jumps of magnitude at least α by
(A.2) U = {t ∈ [0, T ] : |f (t+) − f (t−)| ≥ α}
with the interpretations f (0−) = f (0) and f (T +) = f (T ). Then U is finite.
Consequently f can have at most countably many jumps in [0, T ].
Proof. If U were infinite, it would have a limit point s ∈ [0, T ]. This means
that every interval (s − δ, s + δ) contains a point of U , other than s itself.
But since the limits f (s±) both exist, we can pick δ small enough so that
|f (r) − f (t)| < α/2 for all pairs r, t ∈ (s − δ, s), and all pairs r, t ∈ (s, s + δ).
Then also |f (t+) − f (t−)| ≤ α/2 for all t ∈ (s − δ, s) ∪ (s, s + δ), and so these
intervals cannot contain any point from U .
if the limit exists as the mesh of the partition π = {0 = s0 < · · · < sm(π) = t}
tends to zero. The quadratic variation of f is [f ] = [f, f ].
In the next development we write down sums of the type $\sum_{\alpha\in A} x(\alpha)$ where $A$ is an arbitrary set and $x : A \to \mathbf{R}$ a function. Such a sum can be defined as follows: the sum has a finite value $c$ if for every $\varepsilon > 0$ there exists a finite set $B \subseteq A$ such that if $E$ is a finite set with $B \subseteq E \subseteq A$ then
\[
(A.4)\qquad \Big| \sum_{\alpha\in E} x(\alpha) - c \Big| \le \varepsilon.
\]
If $\sum_A x(\alpha)$ has a finite value, then $x(\alpha) \ne 0$ for at most countably many $\alpha$-values. In the above condition, the set $B$ must contain all $\alpha$ such that $|x(\alpha)| > 2\varepsilon$, for otherwise adding on one such term violates the inequality. In other words, the set $\{\alpha : |x(\alpha)| \ge \eta\}$ is finite for any $\eta > 0$.
If $x(\alpha) \ge 0$ always, then
\[
\sum_{\alpha\in A} x(\alpha) = \sup\Big\{ \sum_{\alpha\in B} x(\alpha) : B \text{ is a finite subset of } A \Big\}
\]
gives a value in $[0,\infty]$ which agrees with the definition above if it is finite. As for familiar series, absolute convergence implies convergence. In other words, if
\[
\sum_{\alpha\in A} |x(\alpha)| < \infty,
\]
then the sum $\sum_A x(\alpha)$ has a finite value.
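For instance, with $A = \mathbf{Z}_+$ and $x(m) = (-1/2)^m$ the absolute sums are bounded by $2$, and every summation over a large enough finite subset lands near the value $2/3$ regardless of the order of the terms; a small sketch of the definition in action:

```python
import random

x = lambda m: (-0.5) ** m          # sum_m |x(m)| = 2 < infinity
value = 2 / 3                      # geometric series: sum_m (-1/2)^m = 2/3

random.seed(1)
for _ in range(5):
    E = list(range(200))           # a finite E containing a suitable B
    random.shuffle(E)              # enumeration order is irrelevant
    assert abs(sum(x(m) for m in E) - value) < 1e-12
```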
Lemma A.7. Let $f$ be a function with left and right limits on $[0,T]$. Then
\[
\sum_{s\in[0,T]} |f(s+) - f(s-)| \le V_f(T).
\]
The sum is actually over a countable set because f has at most countably
many jumps.
and
\[
[f_n, g_n]_T = \sum_{s\in(0,T]} \big(f_n(s) - f_n(s-)\big)\big(g_n(s) - g_n(s-)\big)
\]
Shrink $\delta$ further so that $\delta < \varepsilon/C_0$. Then for any finite $H \supseteq U_n(\delta)$,
\[
\begin{aligned}
\Big| [f_n, g_n]_T - \sum_{s\in H} \big(f_n(s) - f_n(s-)\big)\big(g_n(s) - g_n(s-)\big) \Big|
&\le \sum_{s\in(0,T]\setminus H} |f_n(s) - f_n(s-)| \cdot |g_n(s) - g_n(s-)|\\
&\le \delta \sum_{s\in(0,T]} |f_n(s) - f_n(s-)| \le \delta C_0 \le \varepsilon.
\end{aligned}
\]
Now if s ∈ Un (δ), |gn (s) − gn (s−)| ≥ δ and the above inequality imply
|g(s) − g(s−)| ≥ δ − α/2. This jump cannot fall in the forbidden range
(δ − α, δ), so in fact it must satisfy |g(s) − g(s−)| ≥ δ and then s ∈ U (δ).
Now we can complete the argument. Take $n$ large enough so that $U(\delta) \supset U_n(\delta)$, and take $H = U(\delta)$ in the estimate above. Putting the estimates together gives
\[
\big| [f,g] - [f_n, g_n] \big| \le 2\varepsilon
+ \Big| \sum_{s\in U(\delta)} \big(f(s) - f(s-)\big)\big(g(s) - g(s-)\big)
- \sum_{s\in U(\delta)} \big(f_n(s) - f_n(s-)\big)\big(g_n(s) - g_n(s-)\big) \Big|.
\]
As U (δ) is a fixed finite set, the difference of two sums over U tends to zero
as n → ∞. Since ε > 0 was arbitrary, the proof is complete.
The Euclidean norm is $|x| = (x_1^2 + \cdots + x_d^2)^{1/2}$, and we apply this also to matrices in the form
\[
|A| = \Big( \sum_{1\le i,j\le d} a_{i,j}^2 \Big)^{1/2}.
\]
\[
(A.7)\qquad
\gamma(s,t,x,y) =
\begin{cases}
\dfrac{\phi(s,t,x,y)}{|t-s| + |y-x|^2}, & s \ne t \text{ or } x \ne y\\[1ex]
0, & s = t \text{ and } x = y
\end{cases}
\]
is also continuous on $[0,T]^2 \times K^2$. Let $\pi^\ell = \{0 = t_0^\ell < t_1^\ell < \cdots < t_{m(\ell)}^\ell = T\}$ be a sequence of partitions of $[0,T]$ such that $\operatorname{mesh}(\pi^\ell) \to 0$ as $\ell \to \infty$, and
\[
(A.8)\qquad C_0 = \sup_\ell \sum_{i=0}^{m(\ell)-1} \big| g(t_{i+1}^\ell) - g(t_i^\ell) \big|^2 < \infty.
\]
Then
\[
(A.9)\qquad \lim_{\ell\to\infty} \sum_{i=0}^{m(\ell)-1} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) = \sum_{s\in(0,T]} \phi\big(s, s, g(s-), g(s)\big).
\]
by the cadlag property, because for each $\ell$, $t_{i(k)}^\ell < s_k \le t_{i(k)+1}^\ell$, while both extremes converge to $s_k$. The sum on the left-hand side of (A.10) is by definition the supremum of sums over finite sets, hence the inequality in (A.10) follows.
By continuity of $\gamma$ there exists a constant $C_1$ such that
\[
(A.11)\qquad |\phi(s,t,x,y)| \le C_1\big( |t-s| + |y-x|^2 \big)
\]
for all $s,t \in [0,T]$ and $x,y \in K$. From (A.11) and (A.10) we get the bound
\[
\sum_{s\in(0,T]} \big| \phi\big(s,s,g(s-),g(s)\big) \big| \le C_0 C_1 < \infty.
\]
This absolute convergence implies that the sum on the right-hand side of (A.9) can be approximated by finite sums. Given $\varepsilon > 0$, pick $\alpha > 0$ small enough so that
\[
\Big| \sum_{s\in U_\alpha} \phi\big(s,s,g(s-),g(s)\big) - \sum_{s\in(0,T]} \phi\big(s,s,g(s-),g(s)\big) \Big| \le \varepsilon
\]
where
\[
U_\alpha = \{ s \in (0,T] : |g(s) - g(s-)| \ge \alpha \}.
\]
\[
\begin{aligned}
\Big| \sum_{i=0}^{m(\ell)-1} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) - \sum_{s\in(0,T]} \phi\big(s,s,g(s-),g(s)\big) \Big|
&\le \sum_{i\in I^\ell} \big| \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) \big|\\
&\quad + \Big| \sum_{i\in J^\ell} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) - \sum_{s\in U_\alpha} \phi\big(s,s,g(s-),g(s)\big) \Big| + \varepsilon.
\end{aligned}
\]
The first sum after the inequality above is bounded above by $\varepsilon$, by (A.8), (A.12) and (A.13). The difference of two sums in absolute values vanishes as $\ell \to \infty$, because for large enough $\ell$ each interval $(t_i^\ell, t_{i+1}^\ell]$ for $i \in J^\ell$ contains a unique $s \in U_\alpha$, and as $\ell \to \infty$,
by the cadlag property. (Note that $U_\alpha$ is finite by Lemma A.5 and for large enough $\ell$ the index set $J^\ell$ has exactly one term for each $s \in U_\alpha$.) We conclude
\[
\limsup_{\ell\to\infty} \Big| \sum_{i=0}^{m(\ell)-1} \phi\big(t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell)\big) - \sum_{s\in(0,T]} \phi\big(s,s,g(s-),g(s)\big) \Big| \le 2\varepsilon.
\]
\[
f_{x_1} = \frac{\partial f}{\partial x_1} \qquad\text{and}\qquad f_{x_1,x_2} = \frac{\partial^2 f}{\partial x_1 \partial x_2}.
\]
These will always be applied to functions with continuous partial derivatives, so the order of differentiation does not matter. The gradient $Df$ is the column vector of first-order partial derivatives:
\[
Df(x) = \big( f_{x_1}(x), f_{x_2}(x), \ldots, f_{x_d}(x) \big)^T.
\]
satisfies
\[
\psi(s,x,y) = f(s,y) - f(s,x) - f_x(s,x)(y-x).
\]
By the mean value theorem there exists a point $\tau$ between $s$ and $t$ such that
\[
f(t,y) - f(s,y) = f_t(\tau,y)(t-s).
\]
By the intermediate value theorem there exists a point $\theta$ between $x$ and $y$ such that
\[
\psi(x,y) = \int_x^y f''(z)(y-z)\, dz = \tfrac{1}{2} f''(\theta)(y-x)^2.
\]
The application of the intermediate value theorem goes like this. Let $f''(a)$ and $f''(b)$ be the minimum and maximum of $f''$ in $[x,y]$ (or $[y,x]$ if $y < x$). Then
\[
f''(a) \le \frac{\psi(x,y)}{\tfrac{1}{2}(y-x)^2} \le f''(b).
\]
The intermediate value theorem gives a point $\theta$ between $a$ and $b$ such that
\[
f''(\theta) = \frac{\psi(x,y)}{\tfrac{1}{2}(y-x)^2}.
\]
Proof. Let $\widetilde\nu$ denote the measure restricted to the subspace $B$, and $\widetilde g_n$ and $\widetilde g$ denote functions restricted to this space. Since
\[
\int_B |\widetilde g_n - \widetilde g|^p\, d\widetilde\nu = \int_B |g_n - g|^p\, d\nu \le \int_X |g_n - g|^p\, d\nu
\]
we have $L^p(\widetilde\nu)$ convergence $\widetilde g_n \to \widetilde g$. $L^p$ norms (as all norms) are subject to the triangle inequality, and so
\[
\big|\, \|\widetilde g_n\|_{L^p(\widetilde\nu)} - \|\widetilde g\|_{L^p(\widetilde\nu)} \big| \le \|\widetilde g_n - \widetilde g\|_{L^p(\widetilde\nu)} \to 0.
\]
Consequently
\[
\int_B |g_n|^p\, d\nu = \|\widetilde g_n\|_{L^p(\widetilde\nu)}^p \to \|\widetilde g\|_{L^p(\widetilde\nu)}^p = \int_B |g|^p\, d\nu.
\]
Proof. Check that the property is true for a step function of the type (A.17).
Then approximate an arbitrary f ∈ Lp (R) with a step function.
Proposition A.18. Let $T$ be an invertible linear transformation on $\mathbf{R}^n$ and $f$ a Borel or Lebesgue measurable function on $\mathbf{R}^n$. Then if $f$ is either in $L^1(\mathbf{R}^n)$ or nonnegative,
\[
(A.18)\qquad \int_{\mathbf{R}^n} f(x)\, dx = |\det T| \int_{\mathbf{R}^n} f(T(x))\, dx.
\]
Exercises
Exercise A.1. Let $A$ be a set and $x : A \to \mathbf{R}$ a function. Suppose
\[
c_1 \equiv \sup\Big\{ \sum_{\alpha\in B} |x(\alpha)| : B \text{ is a finite subset of } A \Big\} < \infty.
\]
Show that then the sum $\sum_{\alpha\in A} x(\alpha)$ has a finite value in the sense of the definition stated around equation (A.4).
Hint. Pick finite sets $B_k$ such that $\sum_{B_k} |x(\alpha)| > c_1 - 1/k$. Show that the sequence $a_k = \sum_{B_k} x(\alpha)$ is a Cauchy sequence. Show that $c = \lim a_k$ is the value of the sum.
Appendix B
Probability
For a proof, see the Appendix in [2]. The π-λ-theorem has the following
version for functions.
Theorem B.2. Let $R$ be a $\pi$-system on a space $X$ such that $X = \bigcup_i B_i$ for some pairwise disjoint sequence $B_i \in R$. Let $H$ be a linear space of bounded
functions on X. Assume that 1B ∈ H for all B ∈ R, and assume that H is
closed under bounded, increasing pointwise limits: if f1 ≤ f2 ≤ f3 ≤ · · · are
elements of H and supn,x fn (x) ≤ c for some constant c, then f = lim fn
lies in H. Then H contains all bounded σ(R)-measurable functions.
Proof. It suffices to check that $\mu(A) = \nu(A)$ for all $A \in \sigma(R)$ that lie inside some $R_i$. Then for a general $B \in \sigma(R)$,
\[
\mu(B) = \sum_i \mu(B \cap R_i) = \sum_i \nu(B \cap R_i) = \nu(B).
\]
Inside a fixed Rj , let
D = {A ∈ σ(R) : A ⊆ Rj , µ(A) = ν(A)}.
D is a λ-system. Checking property (2) uses the fact that Rj has finite
measure under µ and ν so we can subtract: if A ⊆ B and both lie in D, then
µ(B \ A) = µ(B) − µ(A) = ν(B) − ν(A) = ν(B \ A),
so B \A ∈ D. By hypothesis D contains the π-system {A ∈ R : A ⊆ Rj }. By
the π-λ-theorem D contains all the σ(R)-sets that are contained in Rj .
Lemma B.4. Let $\nu$ and $\mu$ be two finite Borel measures on a metric space $(S,d)$. Assume that
\[
(B.1)\qquad \int_S f\, d\mu = \int_S f\, d\nu
\]
for all bounded continuous functions $f$ on $S$. Then $\mu = \nu$.
Here are two definitions of classes of sets that are more primitive than $\sigma$-algebras, and hence easier to deal with. A collection $\mathcal A$ of subsets of $\Omega$ is an algebra if
(i) Ω ∈ A.
(ii) Ac ∈ A whenever A ∈ A.
(iii) A ∪ B ∈ A whenever A ∈ A and B ∈ A.
A collection $\mathcal S$ of subsets of $\Omega$ is a semialgebra if
(i) $\emptyset \in \mathcal S$.
(ii) If A, B ∈ S then also A ∩ B ∈ S.
(iii) If A ∈ S, then Ac is a finite disjoint union of elements of S.
\[
A^c = \bigcap_{1\le i\le m} S_i^c = \bigcup_{(k(1),\ldots,k(m))}\ \bigcap_{1\le i\le m} R_{i,k(i)}.
\]
Proof. Since the tail of a convergent series can be made arbitrarily small, we have
\[
P\Big( \bigcap_{m=1}^\infty \bigcup_{n=m}^\infty A_n \Big) \le \sum_{n=m}^\infty P(A_n) \to 0
\]
as $m \nearrow \infty$.
Proof. It suffices to show that every subsequence {nk } has a further sub-
subsequence {nkj } such that EXnkj → EX as j → ∞. So let {nk } be
given. Convergence in probability Xnk → X implies almost sure conver-
gence Xnkj → X along some subsubsequence {nkj }. The standard domi-
nated convergence theorem now gives EXnkj → EX.
Proof. Part (i). By the monotonicity of the sequence $E(X_n|\mathcal A)$ and the ordinary monotone convergence theorem, for $A \in \mathcal A$,
\[
E\big[ \mathbf 1_A \cdot \lim_{n\to\infty} E(X_n|\mathcal A) \big]
= \lim_{n\to\infty} E\big[ \mathbf 1_A\, E(X_n|\mathcal A) \big]
= \lim_{n\to\infty} E\big[ \mathbf 1_A X_n \big]
= E\big[ \mathbf 1_A X \big]
= E\big[ \mathbf 1_A\, E(X|\mathcal A) \big].
\]
Since $A \in \mathcal A$ is arbitrary, this implies the almost sure equality of the $\mathcal A$-measurable random variables $\lim_{n\to\infty} E(X_n|\mathcal A)$ and $E(X|\mathcal A)$.
Part (ii). The sequence $Y_k = \inf_{m\ge k} X_m$ increases up to $\varliminf X_n$. Thus by part (i),
\[
E\big( \varliminf_{n\to\infty} X_n \,\big|\, \mathcal A \big) = \lim_{n\to\infty} E\big( \inf_{k\ge n} X_k \,\big|\, \mathcal A \big) \le \varliminf_{n\to\infty} E(X_n|\mathcal A).
\]
Equivalently, the following two conditions are satisfied. (i) $\sup_\alpha E|X_\alpha| < \infty$. (ii) Given $\varepsilon > 0$, there exists a $\delta > 0$ such that for every event $B$ such that $P(B) \le \delta$,
\[
\sup_{\alpha\in A} \int_B |X_\alpha|\, dP \le \varepsilon.
\]
Proof. We have
lim E E[ |Xn − X| |A] = lim E |Xn − X| = 0,
n→∞ n→∞
and since
| E[Xn |A] − E[X|A] | ≤ E[ |Xn − X| |A],
we conclude that E[Xn |A] → E[X|A] in L1 . L1 convergence implies a.s.
convergence along some subsequence.
Lemma B.14. Let $X$ be a random $d$-vector and $\mathcal A$ a sub-$\sigma$-field on the probability space $(\Omega, \mathcal F, P)$. Let
\[
\phi(\theta) = \int_{\mathbf{R}^d} e^{i\theta^T x}\, \mu(dx) \qquad (\theta \in \mathbf{R}^d)
\]
that satisfies the Hölder property. The distribution of this extension will be
the Wiener measure on C.
Let
\[
(B.4)\qquad g_t(x) = \frac{1}{\sqrt{2\pi t}} \exp\Big\{ -\frac{x^2}{2t} \Big\}
\]
be the density of the normal distribution with mean zero and variance t (g for
Gaussian). For an increasing n-tuple of positive times 0 < t1 < t2 < · · · < tn ,
let t = (t1 , t2 , . . . , tn ). We shall write x = (x1 , . . . , xn ) for vectors in Rn ,
and abbreviate dx = dx1 dx2 · · · dxn to denote integration with respect to
Lebesgue measure on $\mathbf{R}^n$. Define a probability measure $\mu_{\mathbf t}$ on $\mathbf{R}^n$ by
\[
(B.5)\qquad \mu_{\mathbf t}(A) = \int_{\mathbf{R}^n} \mathbf 1_A(\mathbf x)\, g_{t_1}(x_1) \prod_{i=2}^n g_{t_i - t_{i-1}}(x_i - x_{i-1})\, d\mathbf x
\]
for A ∈ BRn . Before proceeding further, we check that this definition is
the right one, namely that µt is the distribution we want for the vector
(Bt1 , Bt2 , . . . , Btn ).
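Definition (B.5) says the coordinates are produced by adding independent mean-zero Gaussian increments with variances $t_i - t_{i-1}$. The simulation sketch below (function names ad hoc) samples from $\mu_{\mathbf t}$ this way and checks the Brownian covariance $\operatorname{Cov}(B_s, B_t) = s \wedge t$ up to Monte Carlo error:

```python
import random

def sample_mu_t(times, rng):
    # draw (B_{t_1}, ..., B_{t_n}) from mu_t by summing independent
    # N(0, t_i - t_{i-1}) increments, exactly the density in (B.5)
    b, prev, out = 0.0, 0.0, []
    for t in times:
        b += rng.gauss(0.0, (t - prev) ** 0.5)
        prev = t
        out.append(b)
    return out

rng = random.Random(0)
samples = [sample_mu_t([0.5, 1.0, 2.0], rng) for _ in range(20000)]
var1 = sum(s[1] ** 2 for s in samples) / len(samples)    # Var(B_1) = 1
cov = sum(s[0] * s[2] for s in samples) / len(samples)   # Cov(B_.5, B_2) = 0.5
assert abs(var1 - 1.0) < 0.05 and abs(cov - 0.5) < 0.05
```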
Lemma B.16. If a one-dimensional standard Brownian motion B exists,
then for $A \in \mathcal B_{\mathbf{R}^n}$,
\[
P\big( (B_{t_1}, B_{t_2}, \ldots, B_{t_n}) \in A \big) = \mu_{\mathbf t}(A).
\]
Let us convince ourselves again that this definition is the one we want.
Qs should represent the distribution of the vector (Bs1 , Bs2 , . . . , Bsn ), and
indeed this follows from Lemma B.16:
\[
\int_{\mathbf{R}^n} f(\pi^{-1}(\mathbf x))\, d\mu_{\pi\mathbf s}
= E\big[ (f\circ\pi^{-1})(B_{s_{\pi(1)}}, B_{s_{\pi(2)}}, \ldots, B_{s_{\pi(n)}}) \big]
= E\big[ f(B_{s_1}, B_{s_2}, \ldots, B_{s_n}) \big].
\]
For the second equality above, apply to yi = Bsi the identity
π −1 (yπ(1) , yπ(2) , . . . , yπ(n) ) = (y1 , y2 , . . . , yn )
which is a consequence of the way we defined the action of a permutation
on a vector in Section 1.2.4.
Let us check the consistency properties (i) and (ii) required for the Ex-
tension Theorem 1.27. Suppose t = ρs for two distinct n-tuples s and t
from Q2 and a permutation ρ. If π orders t then π ◦ ρ orders s, because
tπ(1) < tπ(2) implies sρ(π(1)) < sρ(π(2)) . One must avoid confusion over how
the action of permutations is composed:
\[
\pi(\rho\mathbf s) = \big( s_{\rho(\pi(1))}, s_{\rho(\pi(2))}, \ldots, s_{\rho(\pi(n))} \big)
\]
because $\big(\pi(\rho\mathbf s)\big)_i = (\rho\mathbf s)_{\pi(i)} = s_{\rho(\pi(i))}$. Then
This checks (i). Property (ii) will follow from this lemma.
Lemma B.17. Let $\mathbf t = (t_1, \ldots, t_n)$ be an ordered $n$-tuple, and let $\hat{\mathbf t} = (t_1, \ldots, t_{j-1}, t_{j+1}, \ldots, t_n)$ be the $(n-1)$-tuple obtained by removing $t_j$ from $\mathbf t$. Then for $A \in \mathcal B_{\mathbf{R}^{j-1}}$ and $B \in \mathcal B_{\mathbf{R}^{n-j}}$,
\[
\mu_{\mathbf t}(A \times \mathbf{R} \times B) = \mu_{\hat{\mathbf t}}(A \times B).
\]
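Behind Lemma B.17 is the convolution identity $\int g_{t_j - t_{j-1}}(x - y)\, g_{t_{j+1} - t_j}(z - x)\, dx = g_{t_{j+1} - t_{j-1}}(z - y)$: integrating out the removed coordinate merges two Gaussian factors into one. A numerical check with the density (B.4):

```python
from math import exp, pi, sqrt

def g(t, x):
    # the Gaussian density (B.4)
    return exp(-x * x / (2 * t)) / sqrt(2 * pi * t)

def convolve(t1, t2, z, lo=-30.0, hi=30.0, n=60000):
    # midpoint-rule approximation of integral g_{t1}(x) g_{t2}(z - x) dx
    h = (hi - lo) / n
    return h * sum(g(t1, lo + (i + 0.5) * h) * g(t2, z - (lo + (i + 0.5) * h))
                   for i in range(n))

for z in (-1.0, 0.0, 2.5):
    assert abs(convolve(0.7, 1.3, z) - g(2.0, z)) < 1e-9
```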
index sets. The next proposition is the main step of the so-called Kolmogorov-
Centsov criterion for continuity. We shall discuss this at the end of the sec-
tion. The process {Xq } referred to in the proposition is completely general,
while we will of course apply the result to the particular {Xq } defined above
on the probability space (Ω2 , G2 , Q).
Proposition B.18. Suppose $\{X_q : q \in Q_2^0\}$ is a stochastic process defined on some probability space $(\Omega, \mathcal F, P)$ with the following property: there exist constants $K < \infty$ and $\alpha, \beta > 0$ such that
\[
(B.8)\qquad E\big[ |X_s - X_r|^\beta \big] \le K|s-r|^{1+\alpha} \qquad \text{for all } r,s \in Q_2^0.
\]
Let $0 < \gamma < \alpha/\beta$ and $T < \infty$. Then for almost every $\omega$ there exists a constant $C(\omega) < \infty$ such that
\[
(B.9)\qquad |X_s(\omega) - X_r(\omega)| \le C(\omega)|s-r|^\gamma \qquad \text{for all } r,s \in Q_2^0 \cap [0,T].
\]
We prove the claim and after that return to the main thread of the proof. Let $q, r \in Q_2^0 \cap [0, 2^M]$ satisfy $0 < r - q < 2^{-N(1-\eta)}$. Pick an integer $m \ge N$ such that
\[
2^{-(m+1)(1-\eta)} \le r - q < 2^{-m(1-\eta)}.
\]
Pick integers $i$ and $j$ such that
\[
(i-1)2^{-m} < q \le i2^{-m} \qquad\text{and}\qquad j2^{-m} \le r < (j+1)2^{-m}.
\]
Then necessarily $i \le j$ because by (B.10),
\[
r - q \ge 2^{-(m+1)(1-\eta)} > 2^{-m},
\]
so there must be at least one dyadic rational of the type $k2^{-m}$ in the interval $(q, r)$. On the other hand $j - i \le 2^m(r-q) < 2^{m\eta}$, so $(i,j) \in I_m$.
We can express the dyadic rationals $q$ and $r$ as
\[
r = j2^{-m} + 2^{-r(1)} + \cdots + 2^{-r(k)} \qquad\text{and}\qquad q = i2^{-m} - 2^{-q(1)} - \cdots - 2^{-q(\ell)}
\]
for integers
\[
m < r(1) < r(2) < \cdots < r(k) \qquad\text{and}\qquad m < q(1) < q(2) < \cdots < q(\ell).
\]
To see this for $r$, let $r = h2^{-L}$, so that from $j2^{-m} \le r < (j+1)2^{-m}$ follows $j2^{L-m} \le h < (j+1)2^{L-m}$. Then $h$ is of the form
\[
h = j2^{L-m} + \sum_{p=0}^{L-m-1} a_p 2^p
\]
for $a_p \in \{0,1\}$, and dividing this by $2^L$ gives the expression for $r$.
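The expansion used here is just the binary expansion of a dyadic rational beyond position $m$; a small sketch (the function name is ad hoc) recovers $j$ and the exponents $r(1) < r(2) < \cdots$:

```python
def dyadic_split(r, m):
    # write the dyadic rational r as j*2**-m + 2**-r1 + 2**-r2 + ...
    # with m < r1 < r2 < ..., returning (j, [r1, r2, ...])
    j = int(r // 2**-m)            # j*2**-m <= r < (j+1)*2**-m
    rest, exps, p = r - j * 2**-m, [], m
    while rest > 0:                # terminates: r has a finite binary expansion
        p += 1
        if rest >= 2**-p:
            rest -= 2**-p
            exps.append(p)
    return j, exps

j, exps = dyadic_split(0.8125, 2)  # 13/16 = 3*(1/4) + 1/16
assert (j, exps) == (3, [4])
assert j * 2**-2 + sum(2**-p for p in exps) == 0.8125
```

Dyadic rationals are represented exactly in binary floating point, so the equalities above hold without rounding error.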
We bound the difference in three parts:
\[
|X_q - X_r| \le |X_q - X_{i2^{-m}}| + |X_{i2^{-m}} - X_{j2^{-m}}| + |X_{j2^{-m}} - X_r|.
\]
The middle term satisfies
\[
|X_{i2^{-m}} - X_{j2^{-m}}| \le (j-i)^\gamma 2^{-m\gamma} \le 2^{-m(1-\eta)\gamma}
\]
because we are on the event $H_N$ which lies inside $G_m$. For the first term,
\[
|X_q - X_{i2^{-m}}| \le \sum_{h=1}^{\ell} \big| X_{i2^{-m} - 2^{-q(1)} - \cdots - 2^{-q(h)}} - X_{i2^{-m} - 2^{-q(1)} - \cdots - 2^{-q(h-1)}} \big|
\le \sum_{h=1}^{\ell} \big( 2^{-q(h)} \big)^\gamma \le \sum_{p=m+1}^\infty (2^{-\gamma})^p = \frac{2^{-\gamma(m+1)}}{1 - 2^{-\gamma}}.
\]
and by using the definition of the event $G_{q(h)}$. A similar argument gives the same bound for the third term. Together we have
\[
\begin{aligned}
|X_q - X_r| &\le 2^{-m(1-\eta)\gamma} + \frac{2\cdot 2^{-\gamma(m+1)}}{1 - 2^{-\gamma}}\\
&\le 2^{-(m+1)(1-\eta)\gamma} \cdot \frac{2^{(1-\eta)\gamma}}{1 - 2^{-\gamma}} + 2^{-(m+1)(1-\eta)\gamma} \cdot \frac{2}{1 - 2^{-\gamma}}\\
&\le |r-q|^\gamma \cdot \frac{2^{(1-\eta)\gamma} + 2}{1 - 2^{-\gamma}}.
\end{aligned}
\]
This completes the proof of (B.11).
Now we finish the proof of the proposition with the help of (B.11). First,
\[
\sum_{N\ge1} P(H_N^c) \le \sum_{N\ge1} \sum_{n\ge N} P(G_n^c) \le \sum_{N\ge1} \sum_{n\ge N} K 2^M 2^{-n\lambda} = \frac{K 2^M}{1 - 2^{-\lambda}} \sum_{N\ge1} 2^{-N\lambda} < \infty.
\]
Corollary B.19. Let $\{X_q\}$ be the process defined by (B.7) on the probability space $(\Omega_2, \mathcal G_2, Q)$, where $Q$ is the probability measure whose existence came from Kolmogorov's Extension Theorem. Let $0 < \gamma < \tfrac12$. Then there is an event $\Gamma$ such that $Q(\Gamma) = 1$ with this property: for all $\xi \in \Gamma$ and $T < \infty$, there exists a finite constant $C_T(\xi)$ such that
\[
(B.12)\qquad |X_s(\xi) - X_r(\xi)| \le C_T(\xi)|s-r|^\gamma \qquad \text{for all } r,s \in Q_2^0 \cap [0,T].
\]
In particular, for $\xi \in \Gamma$ the function $q \mapsto X_q(\xi)$ is uniformly continuous on $Q_2^0 \cap [0,T]$ for every $T < \infty$.
Proof. We need to check the hypothesis (B.8). Due to the definition of the
finite-dimensional distributions of Q, this reduces to computing a moment
of the Gaussian distribution. Fix an integer $m \ge 2$ large enough so that $\tfrac12 - \tfrac{1}{2m} > \gamma$. Let $0 \le q < r$ be indices in $Q_2^0$. In the next calculation, note that after changing variables in the $dy_2$-integral it no longer depends on $y_1$, and the $y_1$-variable can be integrated away.
\[
\begin{aligned}
E^Q\big[ (X_r - X_q)^{2m} \big]
&= \iint_{\mathbf{R}^2} (y_2 - y_1)^{2m} g_q(y_1)\, g_{r-q}(y_2 - y_1)\, dy_1\, dy_2\\
&= \int_{\mathbf{R}} dy_1\, g_q(y_1) \int_{\mathbf{R}} dy_2\, (y_2 - y_1)^{2m} g_{r-q}(y_2 - y_1)\\
&= \int_{\mathbf{R}} dy_1\, g_q(y_1) \int_{\mathbf{R}} dx\, x^{2m} g_{r-q}(x) = \int_{\mathbf{R}} dx\, x^{2m} g_{r-q}(x)\\
&= \frac{1}{\sqrt{2\pi(r-q)}} \int_{\mathbf{R}} x^{2m} \exp\Big\{ -\frac{x^2}{2(r-q)} \Big\}\, dx\\
&= (r-q)^m \frac{1}{\sqrt{2\pi}} \int_{\mathbf{R}} z^{2m} \exp\Big\{ -\frac{z^2}{2} \Big\}\, dz = C_m |r-q|^m,
\end{aligned}
\]
where $C_m = 1\cdot3\cdot5\cdots(2m-1)$, the product of the odd integers less than $2m$. We have verified the hypothesis (B.8) for the values $\alpha = m-1$ and $\beta = 2m$, and by choice of $m$,
\[
0 < \gamma < \alpha/\beta = \tfrac12 - \tfrac{1}{2m}.
\]
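The even moments of the standard Gaussian used above, $\frac{1}{\sqrt{2\pi}}\int z^{2m} e^{-z^2/2}\,dz = 1\cdot3\cdots(2m-1)$, are easy to confirm by direct numerical integration:

```python
from math import exp, pi, sqrt

def gauss_moment(m, lo=-12.0, hi=12.0, n=24000):
    # midpoint rule for (2*pi)**-0.5 * integral z**(2m) * exp(-z**2/2) dz
    h = (hi - lo) / n
    total = sum((lo + (i + 0.5) * h) ** (2 * m) * exp(-(lo + (i + 0.5) * h) ** 2 / 2)
                for i in range(n))
    return total * h / sqrt(2 * pi)

def odd_double_factorial(m):
    # C_m = 1 * 3 * 5 * ... * (2m - 1)
    p = 1
    for i in range(1, 2 * m, 2):
        p *= i
    return p

for m in range(1, 6):
    assert abs(gauss_moment(m) - odd_double_factorial(m)) < 1e-6 * odd_double_factorial(m)
```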
Proposition B.18 now implies the following. For each T < ∞ there exists
an event ΓT ⊆ Ω2 such that Q(ΓT ) = 1 and for every ξ ∈ ΓT there exists a
finite constant C(ξ) such that
(B.13) |Xr (ξ) − Xq (ξ)| ≤ C(ξ)|r − q|γ
for any $q, r \in [0,T] \cap Q_2^0$. Take
\[
\Gamma = \bigcap_{T=1}^\infty \Gamma_T.
\]
With the uniform continuity in hand, we can now extend the definition
of the process to the entire time line. By Lemma A.12, for each ξ ∈ Γ there
is a unique continuous function t 7→ Xt (ξ) for t ∈ [0, ∞) that coincides with
the earlier values $X_q(\xi)$ for $q \in Q_2^0$. The value $X_t(\xi)$ for any $t \notin Q_2^0$ can be defined by
\[
X_t(\xi) = \lim_{i\to\infty} X_{q_i}(\xi)
\]
for any sequence {qi } from Q02 such that qi → t. This tells us that the
random variables {Xt : 0 ≤ t < ∞} are measurable on Γ. To have a
continuous process Xt defined on all of Ω2 set
\[
X_t(\xi) = 0 \qquad \text{for } \xi \notin \Gamma \text{ and all } t \ge 0.
\]
Comparison with Lemma B.16 shows that (Xt1 , Xt2 , . . . , Xtn ) has the dis-
tribution that Brownian motion should have. An application of Lemma
B.4 is also needed here, to guarantee that it is enough to check continuous
functions φ. It follows that {Xt } has independent increments, because this
property is built into the definition of the distribution µt .
To complete the construction of Brownian motion and finish the proof of Theorem 2.21, we define the measure $P^0$ on $C$ as the distribution of the process $X = \{X_t\}$:
\[
P^0(A) = Q\{\xi \in \Omega_2 : X(\xi) \in A\} \qquad \text{for } A \in \mathcal B_C.
\]
define Yt = lim Xqi along any sequence qi ∈ Q02 ∩ [0, T ] that converges to
t. The hypothesis implies that then Xqi → Xt in probability, and hence
Xt = Yt almost surely. The limits extend the Hölder property (B.9) from
dyadic rational time points to all of $[0,T]$, as claimed in (B.15).
Bibliography
[1] R. M. Dudley. Real analysis and probability. The Wadsworth & Brooks/Cole Mathe-
matics Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove,
CA, 1989.
[2] R. Durrett. Probability: theory and examples. Duxbury Press, Belmont, CA, second
edition, 1996.
[3] G. B. Folland. Real analysis: Modern techniques and their applications. Pure and
Applied Mathematics. John Wiley & Sons Inc., New York, second edition, 1999.
[4] P. E. Protter. Stochastic integration and differential equations. Springer-Verlag, Berlin,
second edition, 2004.
[5] S. Resnick. Adventures in stochastic processes. Birkhäuser Boston Inc., Boston, MA,
1992.
[6] K. R. Stromberg. Introduction to classical real analysis. Wadsworth International, Bel-
mont, Calif., 1981. Wadsworth International Mathematics Series.
Index
filtration
augmented, 36
definition, 35
left-continuous, 41
right-continuous, 41
usual conditions, 41
path space, 50
C–space, 50
D–space, 50
Poisson process
compensated, 66, 68
homogeneous, 66
martingales, 68
not predictable, 167
on an abstract space, 64
predictable