
STOCHASTIC ANALYSIS

NOTES

Ivan F Wilde
Chapter 1

Introduction

We know from elementary probability theory that probabilities of disjoint


events “add up”, that is, if A and B are events with A ∩ B = ∅, then
P (A ∪ B) = P (A) + P (B). For infinitely-many events the situation is a little
more subtle. For example, suppose that X is a random variable with uniform
distribution on the interval (0, 1). Then { X ∈ (0, 1) } = ⋃_{s∈(0,1)} { X = s }
and P (X ∈ (0, 1)) = 1 but P (X = s) = 0 for every s ∈ (0, 1). So the
probabilities on the right hand side do not “add up” to that on the left
hand side.
A satisfactory theory must be able to cope with this.
Continuing with this uniform distribution example, given an arbitrary
subset S ⊂ (0, 1), we might wish to know the value of P (X ∈ S). This
seems a reasonable request but can we be sure that there is an answer, even
in principle? We will consider the following similar question.
Does it make sense to talk about the length of every subset of R ? More
precisely, does there exist a “length” or “measure” m defined on all subsets
of R such that the following hold?

1. m(A) ≥ 0 for all A ⊂ R and m(∅) = 0.

2. If A1 , A2 , . . . is any sequence of pairwise disjoint subsets of R, then
m( ⋃_n An ) = Σ_n m(An ).

3. If I is an interval [a, b], then m(I) = ℓ(I) = b − a, the “length” of I.

4. m(A + a) = m(A) for any A ⊂ R and a ∈ R (translation invariance).

The answer to this question is “no”, it is not possible. This is a famous


“no-go” result.

Proof. To see this, we first construct a special collection of subsets of [0, 1)


defined via the following equivalence relation. For x, y ∈ [0, 1), we say that
x ∼ y if and only if x − y ∈ Q. Evidently this is an equivalence relation on
[0, 1) and so [0, 1) is partitioned into a family of equivalence classes.


One such is Q ∩ [0, 1). No other equivalence class can contain a rational.
Indeed, if a class contains one rational, then it must contain all the rationals
which are in [0, 1). Choose one point from each equivalence class to form the
set A0 , say. Then no two elements of A0 are equivalent and every equivalence
class has a representative which belongs to A0 .
For each q ∈ Q ∩ [0, 1), we construct a subset Aq of [0, 1) as follows. Let
Aq = { x + q − [x + q] : x ∈ A0 } where [t] denotes the integer part of the
real number t. In other words, Aq is got from A0 + q by translating that
portion of A0 + q which lies in the interval [1, 2) back to [0, 1) simply by
subtracting 1 from each such element.
If we write A0 = Bq′ ∪ Bq′′ where Bq′ = { x ∈ A0 : x + q < 1 } and
Bq′′ = A0 \ Bq′ , then we see that Aq = (Bq′ + q) ∪ (Bq′′ + q − 1). The translation
invariance of m implies that m(Aq ) = m(A0 ).


Claim. If r, s ∈ Q ∩ [0, 1) with r ≠ s, then Ar ∩ As = ∅.
Proof. Suppose that x ∈ Ar ∩ As . Then there is α ∈ A0 such that x =
α + r − [α + r] and β ∈ A0 such that x = β + s − [β + s]. It follows that
α ∼ β which is not possible unless α = β which would mean that r = s.
This proves the claim.
Claim. ⋃_{q∈Q∩[0,1)} Aq = [0, 1).
Proof. Since Aq ⊂ [0, 1), we need only show that the right hand side is a
subset of the left hand side. To establish this, let x ∈ [0, 1). Then there is
a ∈ A0 such that x ∼ a, that is x − a ∈ Q.
Case 1. x ≥ a. Put q = x − a ∈ Q ∩ [0, 1). Then x = a + q so that x ∈ Aq .
Case 2. x < a. Since both x ∈ [0, 1) and a ∈ [0, 1), it follows that 2 >
1 + x > a. Put r = 1 + x − a. Then 0 < r < 1 (because x < a) and r ∈ Q.
Hence x = a + r − 1 so that x ∈ Ar . The claim is proved.

Finally, we get our contradiction. We have m([0, 1)) = ℓ([0, 1)) = 1. But
m([0, 1)) = m( ⋃_{q∈Q∩[0,1)} Aq ) = Σ_{q∈Q∩[0,1)} m(Aq )

which is not possible since m(Aq ) = m(A0 ) for all q ∈ Q ∩ [0, 1): the sum on the right is either 0 (if m(A0 ) = 0) or ∞, but never 1.
We have seen that the notion of length simply cannot be assigned to every
subset of R. We might therefore expect that it is not possible to assign a
probability to every subset of the sample space, in general.
The following result is a further very dramatic indication that we may
not be able to do everything that we might like.
Theorem 1.1 (Banach-Tarski). It is possible to cut-up a sphere of radius one
(in R3 ) into a finite number of pieces and then reassemble these pieces (via
standard Euclidean motions in R3 ) into a sphere of radius 2.
Moral - all the fuss with σ-algebras and so on really is necessary if we want
to develop a robust (and rigorous) theory.


Some useful results


Frequent use is made of the following.

Proposition 1.2. Let (An ) be a sequence of events such that P (An ) = 1 for
all n. Then P ( ⋂_n An ) = 1.

Proof. We note that P (An^c ) = 0 and so

P ( ⋃_n An^c ) = lim_{m→∞} P ( ⋃_{n=1}^m An^c ) = 0

because P ( ⋃_{n=1}^m An^c ) ≤ Σ_{n=1}^m P (An^c ) = 0 for every m. But then

P ( ⋂_n An ) = 1 − P ( ( ⋂_n An )^c ) = 1 − P ( ⋃_n An^c ) = 1

as required.

Suppose that (An ) is a sequence of events in a probability space (Ω, Σ, P ).


We define the event { An infinitely-often } to be the event

{ An infinitely-often } = { ω ∈ Ω : ∀N ∃n > N such that ω ∈ An } .


Evidently, { An infinitely-often } = ⋂_n ⋃_{k≥n} Ak and so { An infinitely-often }
really is an event, that is, it belongs to Σ.

Lemma 1.3 (Borel-Cantelli).

(i) (First Lemma) Suppose that (An ) is a sequence of events such that
Σ_n P (An ) is convergent. Then P (An infinitely-often) = 0.

(ii) (Second Lemma) Let (An ) be a sequence of independent events such
that Σ_n P (An ) is divergent. Then P (An infinitely-often) = 1.

Proof. (i) By hypothesis, it follows that for any ε > 0 there is N ∈ N such
that Σ_{n=N}^∞ P (An ) < ε. Now

P (An infinitely-often) = P ( ⋂_n ⋃_{k≥n} Ak ) ≤ P ( ⋃_{k≥N} Ak ) .

But for any m

P ( ⋃_{k=N}^m Ak ) ≤ Σ_{k=N}^m P (Ak ) < ε

and so, taking the limit m → ∞, it follows that

P (An infinitely-often) ≤ P ( ⋃_{k≥N} Ak ) = lim_m P ( ⋃_{k=N}^m Ak ) ≤ ε

and therefore P (An infinitely-often) = 0, as required.


(ii) We have

P (An infinitely-often) = P ( ⋂_n ⋃_{k≥n} Ak )
                        = lim_n P ( ⋃_{k≥n} Ak )
                        = lim_n lim_m P ( ⋃_{k=n}^{m+n} Ak ) .

The proof now proceeds by first taking complements, then invoking the
independence hypothesis and finally by the inspirational use of the inequality
1 − x ≤ e^{−x} for any x ≥ 0. Indeed, we have

P ( ( ⋃_{k=n}^{m+n} Ak )^c ) = P ( ⋂_{k=n}^{m+n} Ak^c )
                             = ∏_{k=n}^{m+n} P (Ak^c ) , by independence,
                             ≤ ∏_{k=n}^{m+n} e^{−P (Ak )} , since P (Ak^c ) = 1 − P (Ak ) ≤ e^{−P (Ak )} ,
                             = e^{−Σ_{k=n}^{m+n} P (Ak )}
                             → 0

as m → ∞, since Σ_k P (Ak ) is divergent. It follows that P ( ⋃_{k≥n} Ak ) = 1
and so P (An infinitely-often) = 1.

Example 1.4. A fair coin is tossed repeatedly. Let An be the event that
the outcome at the nth play is “heads”. Then P (An ) = 1/2 and evidently
Σ_n P (An ) is divergent (and A1 , A2 , . . . are independent). It follows that
P (An infinitely-often) = 1. In other words, in a sequence of coin tosses,
there will be an infinite number of “heads” with probability one.
Now let Bn be the event that the outcomes of the five consecutive plays
at the times 5n, 5n + 1, 5n + 2, 5n + 3 and 5n + 4 are all “heads”. Then
P (Bn ) = (1/2)^5 and so Σ_n P (Bn ) is divergent. Moreover, the events B1 , B2 , . . .
are independent and so P (Bn infinitely-often) = 1. In particular, it follows
that, with probability one, there is an infinite number of “5 heads in a row”.
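A quick simulation (my own illustration, not part of the notes) makes the second Borel–Cantelli conclusion tangible: the number of disjoint all-heads blocks keeps growing as we keep playing, consistent with P (Bn infinitely-often) = 1. The function name and parameters below are invented for the sketch.

```python
import random

def count_blocks_of_heads(num_blocks, block_len=5, p=0.5, seed=0):
    """Count how many disjoint blocks of block_len tosses are all heads.

    Block B_n consists of tosses 5n, ..., 5n+4 as in Example 1.4, so the
    blocks are independent and P(B_n) = (1/2)^5.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_blocks):
        if all(rng.random() < p for _ in range(block_len)):
            hits += 1
    return hits

for n_blocks in (10**3, 10**4, 10**5):
    print(n_blocks, count_blocks_of_heads(n_blocks))
# The count of all-heads blocks grows roughly like n_blocks/32 and never
# stops growing, in line with P(B_n infinitely-often) = 1.
```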

Functions of a random variable


If X is a random variable and g : R → R is a Borel function, then Y = g(X)
is σ(X)-measurable (where σ(X) denotes the σ-algebra generated by X).
The converse is true. If X is discrete, then one can proceed fairly directly.
Suppose, by way of illustration, that X assumes the finite number of distinct
values x1 , . . . , xm and that Ω = ⋃_{k=1}^m Ak where X = xk on Ak . Then σ(X) is
generated by the finite collection { A1 , . . . , Am } and so any σ(X)-measurable
random variable Y must have the form Y = Σ_k yk 1_{Ak} for some y1 , . . . , ym
in R. Define g : R → R by setting g(x) = Σ_{k=1}^m yk 1_{{xk}} (x). Then g is a
Borel function and Y = g(X). To prove this for the general case, one takes
a far less direct approach.
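Before the general argument, here is a minimal numerical sketch (my own, with invented data) of the discrete case just described: Y is constant on the cells where X is constant, so the lookup table x_k ↦ y_k defines the required g.

```python
import numpy as np

omega = np.arange(6)                               # a six-point sample space
X = np.array([1.0, 1.0, 2.5, 2.5, 4.0, 4.0])       # X is constant on {0,1}, {2,3}, {4,5}
Y = np.array([7.0, 7.0, -1.0, -1.0, 0.5, 0.5])     # Y is sigma(X)-measurable: constant on the same cells

# Build g as a lookup table x_k -> y_k.
g = {x: Y[X == x][0] for x in np.unique(X)}

print(all(g[X[w]] == Y[w] for w in omega))         # True: Y = g(X) pointwise
```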
Theorem 1.5. Let X be a random variable on (Ω, S, P ) and suppose that Y
is σ(X)-measurable. Then there is a Borel function g : R → R such that
Y = g(X).
Proof. Let C denote the class of Borel functions of X,
C = { ϕ(X) : ϕ : R → R a Borel function } .
Then C has the following properties.
(i) 1A ∈ C for any A ∈ σ(X).
To see this, we note that σ(X) = X −1 (B) and so there is B ∈ B such that
A = X −1 (B). Hence 1A (ω) = 1B (X(ω)) and so 1A ∈ C because 1B : R → R
is Borel.
(ii) Clearly, any linear combination of members of C also belongs to C.
This fact, together with (i) means that C contains all simple functions on
(Ω, σ(X)).
(iii) C is closed under pointwise convergence.
Indeed, suppose that (ϕn ) is a sequence of Borel functions on R such that
ϕn (X(ω)) → Z(ω) for each ω ∈ Ω. We wish to show that Z ∈ C, that is,
that Z = ϕ(X) for some Borel function ϕ : R → R.
Let B = { s ∈ R : lim_n ϕn (s) exists }. Then

B = { s ∈ R : (ϕn (s)) is a Cauchy sequence in R }
  = ⋂_{k∈N} ⋃_{N∈N} ⋂_{m,n>N} { ϕm (s) − 1/k < ϕn (s) < ϕm (s) + 1/k }

(the last set being { ϕn (s) < ϕm (s) + 1/k } ∩ { ϕm (s) − 1/k < ϕn (s) })

and it follows that B ∈ B. Furthermore, by hypothesis, ϕn (X(ω)) converges


(to Z(ω)) and so X(ω) ∈ B for all ω ∈ Ω.
Let ϕ : R → R be given by
ϕ(s) = lim_n ϕn (s) if s ∈ B, and ϕ(s) = 0 if s ∉ B.
We claim that ϕ is a Borel function on R. To see this, let ψn (s) = ϕn (s) 1B (s).
Each ψn is Borel and ψn (s) converges to ϕ(s) for each s ∈ R. It follows that
ϕ is also a Borel function on R. Indeed,
{ s : ϕ(s) < α } = ⋃_{k∈N} ⋃_{N∈N} ⋂_{n≥N} { s : ψn (s) < α − 1/k }

is a Borel subset of R for any α ∈ R.


To complete the proof of (iii), we note that for any ω ∈ Ω, X(ω) ∈ B and
so ϕn (X(ω)) → ϕ(X(ω)). But ϕn (X(ω)) → Z(ω) and so we conclude that
Z = ϕ(X) and is of the required form.


Finally, to complete the proof of the theorem, we note that any σ(X)-measurable
function Y : Ω → R is a pointwise limit of simple functions on (Ω, σ(X)) and
so must belong to C, by (ii) and (iii) above. In other words, any such Y has
the form Y = g(X) for some Borel function g : R → R.

𝓛p versus Lp
Let (Ω, S, P ) be a probability space. For any 1 ≤ p < ∞, the space
𝓛p (Ω, S, P ) is the collection of measurable functions (random variables)
f : Ω → R (or C) such that E(|f |p ) < ∞, i.e., ∫_Ω |f (ω)|^p dP is finite.
One can show that 𝓛p is a linear space. Let ‖f‖p = ( ∫_Ω |f (ω)|^p dP )^{1/p} .
Then (using Minkowski’s inequality), one can show that

‖f + g‖p ≤ ‖f‖p + ‖g‖p

for any f, g ∈ 𝓛p . As a consequence, ‖ · ‖p is a semi-norm on 𝓛p — but not a
norm. Indeed, if g = 0 almost surely, then ‖g‖p = 0 even though g need not
be the zero function. It is interesting to note that if q obeys 1/p + 1/q = 1
(q is called the exponent conjugate to p), then

‖f‖p = sup{ ∫ |f g| dP : ‖g‖q ≤ 1 }.

Of further interest is Hölder’s inequality


∫ |f g| dP ≤ ‖f‖p ‖g‖q

for any f ∈ 𝓛p and g ∈ 𝓛q with 1/p + 1/q = 1. This reduces (essentially)
to the Cauchy-Schwarz inequality when p = 2 (so that q = 2 also).
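The following small numerical sanity check (my own, not from the notes) verifies Minkowski's and Hölder's inequalities on a finite probability space with uniform weights; the conjugate pair (3, 3/2) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
P = np.full(n, 1.0 / n)            # uniform probability on n points
f = rng.normal(size=n)
g = rng.normal(size=n)

def norm(h, p):
    """||h||_p = (E|h|^p)^(1/p) with respect to the measure P."""
    return (np.sum(np.abs(h) ** p * P)) ** (1.0 / p)

p, q = 3.0, 1.5                    # conjugate exponents: 1/3 + 1/1.5 = 1
print(norm(f + g, p) <= norm(f, p) + norm(g, p) + 1e-12)             # Minkowski
print(np.sum(np.abs(f * g) * P) <= norm(f, p) * norm(g, q) + 1e-12)  # Hoelder
```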
To “correct” the norm-seminorm issue, consider equivalence classes of
functions, as follows. Declare functions f and g in 𝓛p to be equivalent,
written f ∼ g, if f = g almost surely. In other words, if N denotes the
collection of random variables equal to 0 almost surely, then f ∼ g if and
only if f − g ∈ N . (Note that every element h of N belongs to every 𝓛p and
obeys ‖h‖p = 0.) One sees that ∼ is an equivalence relation. (Clearly f ∼ f ,
and if f ∼ g then g ∼ f . To see that f ∼ g and g ∼ h implies that f ∼ h,
note that { f = g } ∩ { g = h } ⊂ { f = h }. But P (f = g) = P (g = h) = 1
so that P (f = h) = 1, i.e., f ∼ h.)
For any f ∈ 𝓛p , let [f ] be the equivalence class containing f , so that
[f ] = f + N . Let Lp (Ω, S, P ) denote the collection of equivalence classes
{ [f ] : f ∈ 𝓛p }. Lp (Ω, S, P ) is a linear space equipped with the rule

α[f ] + β[g] = [αf + βg].

One notes that if f1 ∼ f and g1 ∼ g, then there are elements h′ , h′′ of N


such that f1 = f +h′ and g1 = g +h′′ . Then αf1 +βg1 = αf +βg +αh′ +βh′′


which shows that αf1 + βg1 ∼ αf + βg. In other words, the definition of
α[f ] + β[g] above does not depend on the particular choice of f ∈ [f ] or
g ∈ [g] used, so this really does determine a linear structure on Lp .
Next, we define |||[f ]|||p = ‖f‖p . If f ∼ f1 , then ‖f‖p = ‖f1‖p , so ||| · |||p
is well-defined on Lp . In fact, ||| · |||p is a norm on Lp . (If |||[f ]|||p = 0, then
‖f‖p = 0 so that f ∈ N , i.e., f ∼ 0 so that [f ] is the zero element of Lp .) It
is usual to write ||| · |||p as just ‖ · ‖p . One can think of Lp and 𝓛p as “more or
less the same thing” except that in Lp one simply identifies functions which
are almost surely equal.
Note: this whole discussion applies to any measure space — not just
probability spaces. The fact that the measure has total mass one is irrelevant
here.

Riesz-Fischer Theorem
Theorem 1.6 (Riesz-Fischer). Let 1 ≤ p < ∞ and suppose that (fn ) is a
Cauchy sequence in Lp , then there is some f ∈ Lp such that fn → f in
Lp and there is some subsequence (fnk ) such that fnk → f almost surely as
k → ∞.

Proof. Put d(N ) = sup_{n,m≥N} ‖fn − fm‖p . Since (fn ) is a Cauchy sequence
in Lp , it follows that d(N ) → 0 as N → ∞. Hence there is some sequence
n1 < n2 < n3 < . . . such that d(nk ) < 1/2^k . In particular, it is true
that ‖f_{n_{k+1}} − f_{n_k}‖p < 1/2^k . Let us write gk for f_{n_k} , simply for notational
convenience.
Let Ak = { ω : |g_{k+1} (ω) − gk (ω)| ≥ 1/k^2 }. By Chebyshev’s inequality,

P (Ak ) ≤ k^{2p} E(|g_{k+1} − gk |^p ) = k^{2p} ‖g_{k+1} − gk ‖_p^p ≤ k^{2p}/2^k .

This estimate implies that Σ_k P (Ak ) < ∞ and so, by the Borel-Cantelli
Lemma,
P (Ak infinitely-often) = 0.

In other words, if B = { ω : ω ∈ Ak for at most finitely many k }, then


clearly B = { Ak infinitely often }c so that P (B) = 1. But for ω ∈ B, there
is some N (which may depend on ω) such that |g_{k+1} (ω) − gk (ω)| < 1/k^2 for
all k > N . It follows that

g_{j+1} (ω) = g1 (ω) + Σ_{k=1}^j (g_{k+1} (ω) − gk (ω))

converges (absolutely) for each ω ∈ B as j → ∞. Define the function f on


Ω by f (ω) = limj gj (ω) for ω ∈ B and f (ω) = 0, otherwise. Then gj → f
almost surely. (Note that f is measurable because f = limj gj 1B on Ω.)


We claim that fn → f in Lp . To see this, first observe that

‖g_{m+1} − gj ‖p = ‖ Σ_{k=j}^m (g_{k+1} − gk ) ‖p ≤ Σ_{k=j}^m ‖g_{k+1} − gk ‖p < Σ_{k=j}^∞ 2^{−k} .

Hence, by Fatou’s Lemma, letting m → ∞, we find

‖f − gj ‖p ≤ Σ_{k=j}^∞ 2^{−k} .

In particular, f − gj ∈ Lp and so f = (f − gj ) + gj ∈ Lp .
Finally, let ε > 0 be given. Let N be such that d(N ) < ε/2 and choose
any j such that nj > N and Σ_{k=j}^∞ 2^{−k} < ε/2. Then for any n > N , we have

‖f − fn ‖p ≤ ‖f − f_{n_j} ‖p + ‖f_{n_j} − fn ‖p ≤ Σ_{k=j}^∞ 2^{−k} + d(N ) < ε ,

that is, fn → f in Lp .

Proposition 1.7 (Chebyshev’s inequality). For any c > 0 and 1 ≤ p < ∞

P (|X| ≥ c) ≤ c^{−p} ‖X‖_p^p

for any X ∈ Lp .

Proof. Let A = { ω : |X| ≥ c }. Then

‖X‖_p^p = ∫ |X|^p dP
        = ∫_A |X|^p dP + ∫_{Ω\A} |X|^p dP
        ≥ ∫_A |X|^p dP
        ≥ c^p P (A)

as required.
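As a numerical illustration (my own, not part of the notes), the bound can be checked against a standard normal sample; the constants c and p below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=200_000)

for c in (1.0, 2.0, 3.0):
    for p in (1, 2, 4):
        lhs = np.mean(np.abs(X) >= c)              # empirical P(|X| >= c)
        rhs = np.mean(np.abs(X) ** p) / c ** p     # c^{-p} E|X|^p
        print(f"c={c}, p={p}: P={lhs:.4f} <= bound={rhs:.4f}", lhs <= rhs + 1e-3)
```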

Remark 1.8. For any random variable g which is bounded almost surely, let

‖g‖∞ = inf{ M : |g| ≤ M almost surely }.

Then ‖g‖p ≤ ‖g‖∞ and ‖g‖∞ = lim_{p→∞} ‖g‖p . To see this, suppose that g
is bounded with |g| ≤ M almost surely. Then

‖g‖_p^p = ∫ |g|^p dP ≤ M^p


and so ‖g‖p is a lower bound for the set { M : |g| ≤ M almost surely }. It
follows that ‖g‖p ≤ ‖g‖∞ .
For the last part, note that if ‖g‖∞ = 0, then g = 0 almost surely. (g ≠ 0
on the set ⋃_n { |g| > 1/n } = A, say. But if for each n ∈ N, |g| ≤ 1/n almost
surely, then P (A) = 0 because P (|g| > 1/n) = 0 for all n.) It follows that
‖g‖p = 0 and there is nothing more to prove. So suppose that ‖g‖∞ > 0.
Then by replacing g by g/‖g‖∞ , we see that we may suppose that ‖g‖∞ = 1.
Let 0 < r < 1 be given and choose δ such that r < δ < 1. By definition of
the ‖ · ‖∞ -norm, there is a set B with P (B) > 0 such that |g| > δ on B. But
then

1 = ‖g‖∞ ≥ ‖g‖p   and   ‖g‖_p^p = ∫ |g|^p dP ≥ δ^p P (B)

so that 1 ≥ ‖g‖p ≥ δ P (B)^{1/p} . But P (B)^{1/p} → 1 as p → ∞, and so
1 ≥ ‖g‖p ≥ r for all sufficiently large p. The result follows.
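The limit ‖g‖p → ‖g‖∞ is easy to see numerically; the following quick illustration (mine, not from the notes) uses a bounded uniform variable whose essential sup norm is 3.

```python
import numpy as np

rng = np.random.default_rng(2)
g = rng.uniform(-3.0, 3.0, size=100_000)      # bounded: ||g||_inf is (essentially) 3

def p_norm(h, p):
    return np.mean(np.abs(h) ** p) ** (1.0 / p)

for p in (1, 2, 8, 32, 128):
    print(p, round(p_norm(g, p), 4))
# The values climb towards 3.0 = ||g||_inf as p grows.
```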

Monotone Class Theorem


It is notoriously difficult, if not impossible, to extend properties of collections
of sets directly to the σ-algebra they generate, that is, “from the inside”.
One usually has to resort to a somewhat indirect approach. The so-called
Monotone Class Theorem plays the rôle of the cavalry in this respect and
can usually be depended on to come to the rescue.

Definition 1.9. A collection A of subsets of a set X is an algebra if


(i) X ∈ A,
(ii) if A ∈ A and B ∈ A, then A ∪ B ∈ A,
(iii) if A ∈ A, then Ac ∈ A.
Note that it follows that if A, B ∈ A, then A ∩ B = (Ac ∪ B c )c ∈ A.
Also, for any finite family A1 , . . . , An ∈ A, it follows by induction that
⋃_{i=1}^n Ai ∈ A and ⋂_{i=1}^n Ai ∈ A.

Definition 1.10. A collection M of subsets of X is a monotone class if

(i) whenever A1 ⊆ A2 ⊆ . . . is an increasing sequence in M, then
⋃_{i=1}^∞ Ai ∈ M,

(ii) whenever B1 ⊇ B2 ⊇ . . . is a decreasing sequence in M, then
⋂_{i=1}^∞ Bi ∈ M.

One can show that the intersection of an arbitrary family of monotone


classes of subsets of X is itself a monotone class. Thus, given a collection
C of subsets of X, we may consider M(C), the monotone class generated by
the collection C — it is the “smallest” monotone class containing C, i.e., it
is the intersection of all those monotone classes which contain C.


Theorem 1.11 (Monotone Class Theorem). Let A be an algebra of subsets


of X. Then M(A) = σ(A), the σ-algebra generated by A.

Proof. It is clear that any σ-algebra is a monotone class and so σ(A) is a


monotone class containing A. Hence M(A) ⊆ σ(A). The proof is complete
if we can show that M(A) is a σ-algebra, for then we would deduce that
σ(A) ⊆ M(A).
If a monotone class M is an algebra, then it is a σ-algebra. To see this,
let A1 , A2 , · · · ∈ M. For each n ∈ N, set Bn = A1 ∪ · · · ∪ An . Then Bn ∈ M,
if M is an algebra. But then ⋃_{i=1}^∞ Ai = ⋃_{n=1}^∞ Bn ∈ M if M is a monotone
class. Thus the algebra M is also a σ-algebra. It remains to prove that M
is, in fact, an algebra. We shall verify the three requirements.
(i) We have X ∈ A ⊆ M(A).
(iii) Let A ∈ M(A). We wish to show that A^c ∈ M(A). To show this, let

M̃ = { B : B ∈ M(A) and B^c ∈ M(A) }.

Since A is an algebra, if A ∈ A then A^c ∈ A and so

A ⊆ M̃ ⊆ M(A).

We shall show that M̃ is a monotone class. Let (Bn ) be a sequence in
M̃ with B1 ⊆ B2 ⊆ . . . . Then Bn ∈ M(A) and Bn^c ∈ M(A). Hence
⋃_n Bn ∈ M(A) and also ⋂_n Bn^c ∈ M(A), since M(A) is a monotone class
(and (Bn^c ) is a decreasing sequence).
But ⋂_n Bn^c = ( ⋃_n Bn )^c and so both ⋃_n Bn and ( ⋃_n Bn )^c belong to
M(A), i.e., ⋃_n Bn ∈ M̃.
Similarly, if B1 ⊇ B2 ⊇ . . . belong to M̃, then ⋂_n Bn ∈ M(A) and
( ⋂_n Bn )^c = ⋃_n Bn^c ∈ M(A) so that ⋂_n Bn ∈ M̃. It follows that M̃ is a
monotone class. Since A ⊆ M̃ ⊆ M(A) and M(A) is the monotone class
generated by A, we conclude that M̃ = M(A). But then this means that
for any B ∈ M(A), we also have B^c ∈ M(A).
(ii) We wish to show that if A and B belong to M(A) then so does A ∪ B.
Now, by (iii), it is enough to show that A ∩ B ∈ M(A) (using A ∪ B =
(Ac ∩ B c )c ). To this end, let A ∈ M(A) and let

MA = { B : B ∈ M(A) and A ∩ B ∈ M(A) }.

Then for B1 ⊆ B2 ⊆ . . . in MA , we have



A ∩ ⋃_{i=1}^∞ Bi = ⋃_{i=1}^∞ (A ∩ Bi ) ∈ M(A)

since each A ∩ Bi ∈ M(A) by the definition of MA .


Similarly, if B1 ⊇ B2 ⊇ . . . belong to MA , then



A ∩ ⋂_{i=1}^∞ Bi = ⋂_{i=1}^∞ (A ∩ Bi ) ∈ M(A).

Therefore MA is a monotone class.


Suppose A ∈ A. Then for any B ∈ A, we have A ∩ B ∈ A, since A is
an algebra. Hence A ⊆ MA ⊆ M(A) and therefore MA = M(A) for each
A ∈ A.
Now, for any B ∈ M(A) and A ∈ A, we have

A ∈ MB ⇐⇒ A ∩ B ∈ M(A) ⇐⇒ B ∈ MA = M(A).

Hence, for every B ∈ M(A),

A ⊆ MB ⊆ M(A)

and so (since MB is a monotone class) we have MB = M(A) for every


B ∈ M(A).
Now let A, B ∈ M(A). We have seen that MB = M(A) and therefore
A ∈ M(A) means that A ∈ MB so that A ∩ B ∈ M(A) and the proof is
complete.

Example 1.12. Suppose that P and Q are two probability measures on B(R)
which agree on sets of the form (−∞, a] with a ∈ R. Then P = Q on B(R).

Proof. Let S = { A ∈ B(R) : P (A) = Q(A) }. Then S includes sets of the


form (a, b], for a < b, (−∞, a] and (a, ∞), and so contains the algebra A of
subsets generated by those of the form (−∞, a]. However, one sees that S is
a monotone class (because P and Q are σ-additive) and so S contains σ(A).
The proof is now complete since σ(A) = B(R).

Chapter 2

Conditional expectation

Consider a probability space (Ω, S, P ). The conditional expectation of an


integrable random variable X with respect to a sub-σ-algebra G of S is a
G-measurable random variable, denoted by E(X | G), obeying
∫_A E(X | G) dP = ∫_A X dP

for every set A ∈ G.


Where does this come from?
Suppose that X ≥ 0 and define ν(A) = ∫_A X dP for any A ∈ G. Then

ν(A) = ∫ X 1_A dP = E(X 1_A ) .

Proposition 2.1. ν is a finite measure on (Ω, G).

Proof. Since X 1A ≥ 0 almost surely for any A ∈ G, we see that ν(A) ≥ 0


for all A ∈ G. Also, ν(Ω) = E(X) which is finite by hypothesis (X is
integrable).
Now suppose that A1 , A2 , . . . is a sequence of pairwise disjoint events in G.
Set Bn = A1 ∪ · · · ∪ An . Then
ν(Bn ) = ∫_Ω X 1_{Bn} dP = ∫_Ω X(1_{A1} + · · · + 1_{An} ) dP = ν(A1 ) + · · · + ν(An ) .

Letting n → ∞, 1_{Bn} ↑ 1_{⋃_n An} on Ω and so by Lebesgue’s Monotone Conver-
gence Theorem,

∫_Ω X 1_{Bn} dP ↑ ∫_Ω X 1_{⋃_n An} dP = ν( ⋃_n An ) .

It follows that Σ_k ν(Ak ) is convergent and ν( ⋃_n An ) = Σ_n ν(An ). Hence
ν is a finite measure on (Ω, G).


If P (A) = 0, then X 1A = 0 almost surely and (since X1A ≥ 0) we see


that ν(A) = 0. Thus P (A) = 0 =⇒ ν(A) = 0 for A ∈ G. We say that ν is
absolutely continuous with respect to P on G (written ν << P ).
The following theorem is most relevant in this connection.
Theorem 2.2 (Radon-Nikodym). Suppose that µ1 and µ2 are finite measures
(on some (Ω, G)) with µ1 << µ2 . Then there is a G-measurable µ2 -integrable
function g (g ∈ L1 (G, dµ2 )) on Ω such that g ≥ 0 (µ1 -almost everywhere)
and µ1 (A) = ∫_A g dµ2 for any A ∈ G.
With µ1 = ν and µ2 = P , we obtain the conditional expectation E(X | G)
as above. Note that if X is not necessarily positive, then we can write
X = X+ − X− with X± ≥ 0 to give E(X | G) = E(X+ | G) − E(X− | G).
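When G is generated by a finite partition, the Radon-Nikodym construction reduces to averaging over the cells. The following hands-on sketch (my own, with invented data) checks the defining property ∫_A E(X | G) dP = ∫_A X dP on each generating cell.

```python
import numpy as np

P = np.array([0.1, 0.2, 0.3, 0.15, 0.25])          # probabilities of 5 outcomes
X = np.array([2.0, -1.0, 4.0, 0.5, 3.0])           # an integrable random variable
cells = [np.array([0, 1]), np.array([2, 3, 4])]    # partition generating G

condE = np.empty_like(X)
for A in cells:
    condE[A] = np.sum(X[A] * P[A]) / np.sum(P[A])  # P-weighted average of X over the cell

# Defining property on each generating set A in G:
for A in cells:
    assert np.isclose(np.sum(condE[A] * P[A]), np.sum(X[A] * P[A]))
print(condE)
```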

Properties of the Conditional Expectation


Various basic properties of the conditional expectation are contained in the
following proposition.
Proposition 2.3. The conditional expectation enjoys the following properties.
(i) E(X | G) is unique, almost surely.

(ii) If X ≥ 0, then E(X | G) ≥ 0 almost surely, i.e., the conditional


expectation is (almost surely) positivity preserving.

(iii) E(X + Y | G) = E(X | G) + E(Y | G) almost surely.

(iv) For any a ∈ R, E(aX | G) = aE(X | G) almost surely. Also E(a | G) =


a almost surely.

(v) If G = { Ω, ∅ }, the trivial σ-algebra, then E(X | G) = E(X) every-


where, i.e., E(X | G)(ω) = E(X) for every ω ∈ Ω.

(vi) (Tower Property) If G1 ⊆ G2 , then E(E(X | G2 ) | G1 ) = E(X | G1 ).

(vii) If σ(X) and G are independent σ-algebras, then E(X | G) = E(X)


almost surely.

(viii) For any G, E(E(X | G)) = E(X).


Proof. (i) Suppose that f and g are positive G-measurable and satisfy
∫_A f dP = ∫_A g dP

for all A ∈ G. Then ∫_A (f − g) dP = 0 for all A ∈ G and so f − g = 0 almost
surely, that is, if A = { ω : f (ω) ≠ g(ω) }, then A ∈ G and P (A) = 0. This
last assertion follows from the following lemma.


Lemma 2.4. If h is integrable on some finite measure space (Ω, S, µ) and
satisfies ∫_A h dµ = 0 for all A ∈ S, then h = 0 µ-almost everywhere.

Proof of Lemma. Let An = { ω : h(ω) ≥ 1/n }. Then

0 = ∫_{An} h dµ ≥ ∫_{An} (1/n) dµ = (1/n) µ(An )

and so µ(An ) = 0 for all n ∈ N. But then, since { ω : h(ω) > 0 } = ⋃_n An , it follows that

µ( { ω : h(ω) > 0 } ) = lim_n µ(An ) = 0 .

Let Bn = { ω : h(ω) ≤ −1/n }. Then

0 = ∫_{Bn} h dµ ≤ ∫_{Bn} (−1/n) dµ = −(1/n) µ(Bn )

so that µ(Bn ) = 0 for all n ∈ N and therefore, since { ω : h(ω) < 0 } = ⋃_n Bn ,

µ( { ω : h(ω) < 0 } ) = lim_n µ(Bn ) = 0 .

Hence

µ({ ω : h(ω) ≠ 0 }) = µ(h < 0) + µ(h > 0) = 0 ,

that is, h = 0 µ-almost everywhere.

Remark 2.5. Note that we have used the standard shorthand notation such
as µ(h > 0) for µ({ ω : h(ω) > 0 }). There is unlikely to be any confusion.

(ii) For notational convenience, let X̂ denote E(X | G). Then for any A ∈ G,

∫_A X̂ dP = ∫_A X dP ≥ 0.

Let Bn = { ω : X̂(ω) ≤ −1/n }. Then

∫_{Bn} X̂ dP ≤ ∫_{Bn} (−1/n) dP = −(1/n) P (Bn ) .

However, the left hand side is equal to ∫_{Bn} X dP ≥ 0 which forces P (Bn ) = 0.
But then P (X̂ < 0) = lim_n P (Bn ) = 0 and so P (X̂ ≥ 0) = 1.

(iii) For any choices of conditional expectation, we have

∫_A E(X + Y | G) dP = ∫_A (X + Y ) dP
                    = ∫_A X dP + ∫_A Y dP
                    = ∫_A E(X | G) dP + ∫_A E(Y | G) dP
                    = ∫_A ( E(X | G) + E(Y | G) ) dP

for all A ∈ G. We conclude that E(X + Y | G) = E(X | G) + E(Y | G) almost


surely, as required.

(iv) This is just as the proof of (iii).

(v) With A = Ω, we have

∫_A X dP = ∫_Ω X dP = E(X)
         = ∫_Ω E(X) dP
         = ∫_A E(X) dP .

Now with A = ∅,

∫_A X dP = ∫_∅ X dP = 0 = ∫_A E(X) dP .

So the everywhere constant function X̂ : ω ↦ E(X) is { Ω, ∅ }-measurable
and obeys

∫_A X̂ dP = ∫_A X dP

for every A ∈ { Ω, ∅ }. Hence ω ↦ E(X) is a conditional expectation of
X. If X′ is another, then X′ = X̂ almost surely, so that P (X′ = X̂) = 1.
But the set { X′ = X̂ } is { Ω, ∅ }-measurable and so is equal to either ∅ or
to Ω. Since P (X′ = X̂) = 1, it must be the case that { X′ = X̂ } = Ω so
that X̂(ω) = X′(ω) for all ω ∈ Ω.

(vi) Suppose that G1 and G2 are σ-algebras satisfying G1 ⊆ G2 . Then, for


any A ∈ G1 ,
∫_A E(X | G1 ) dP = ∫_A X dP
                 = ∫_A E(X | G2 ) dP   since A ∈ G2 ,
                 = ∫_A E(E(X | G2 ) | G1 ) dP   since A ∈ G1


and we conclude that E(X | G1 ) = E(E(X | G2 ) | G1 ) almost surely, as claimed.

(vii) For any A ∈ G,


∫_A E(X | G) dP = ∫_A X dP = ∫_Ω X 1_A dP
                = E(X 1_A ) = E(X) E(1_A )
                = ∫_A E(X) dP.

The result follows.

(viii) Denote E(X | G) by X̂. Then for any A ∈ G,

∫_A X̂ dP = ∫_A X dP .

In particular, with A = Ω, we get

∫_Ω X̂ dP = ∫_Ω X dP ,

that is, E(X̂) = E(X).

The next result is a further characterization of the conditional expecta-


tion.
Theorem 2.6. Let f ∈ L1 (Ω, S, P ). The conditional expectation f̂ = E(f | G)
is characterized (almost surely) by

∫_Ω f g dP = ∫_Ω f̂ g dP    (∗)

for all bounded G-measurable functions g.

Proof. With g = 1_A , we see that (∗) implies that

∫_A f dP = ∫_A f̂ dP

for all A ∈ G. It follows that f̂ is the required conditional expectation.
For the converse, suppose that we know that (∗) holds for all f ≥ 0.
Then, writing a general f as f = f⁺ − f⁻ (where f ± ≥ 0 and f⁺ f⁻ = 0
almost surely), we get

∫_Ω f g dP = ∫_Ω f⁺ g dP − ∫_Ω f⁻ g dP
           = ∫_Ω f̂⁺ g dP − ∫_Ω f̂⁻ g dP

           = ∫_Ω ( f̂⁺ − f̂⁻ ) g dP
           = ∫_Ω f̂ g dP

and we see that (∗) holds for general f ∈ L1 . Similarly, we note that by
decomposing g as g = g⁺ − g⁻ , it is enough to prove that (∗) holds for g ≥ 0.
So we need only show that (∗) holds for f ≥ 0 and g ≥ 0. In this case, we
know that there is a sequence (sn ) of simple G-measurable functions such
that 0 ≤ sn ≤ g and sn → g everywhere. For fixed n, let sn = Σ_j aj 1_{Aj}
(finite sum). Then

∫_Ω f sn dP = Σ_j aj ∫_{Aj} f dP = Σ_j aj ∫_{Aj} f̂ dP = ∫_Ω f̂ sn dP

giving the equality

∫_Ω f sn dP = ∫_Ω f̂ sn dP .

By Lebesgue’s Dominated Convergence Theorem, the left hand side con-
verges to ∫_Ω f g dP and the right hand side converges to ∫_Ω f̂ g dP as n → ∞,
which completes the proof.

Jensen’s Inequality

We begin with a definition.

Definition 2.7. The function ϕ : R → R is said to be convex if

ϕ(a + s(b − a)) ≤ ϕ(a) + s(ϕ(b) − ϕ(a)),

that is,
ϕ((1 − s)a + sb) ≤ (1 − s)ϕ(a) + sϕ(b) (2.1)
for any a, b ∈ R and all 0 ≤ s ≤ 1. The point a + s(b − a) = (1 − s)a + sb
lies between a and b and the inequality (2.1) is the statement that the chord
between the points (a, ϕ(a)) and (b, ϕ(b)) on the graph y = ϕ(x) lies above
the graph itself.

Let u < v < w. Then v = u + s(w − u) = (1 − s)u + sw for some 0 < s < 1
and from (2.1) we have

(1 − s)ϕ(v) + sϕ(v) = ϕ(v) ≤ (1 − s)ϕ(u) + sϕ(w) (2.2)

which can be rearranged to give

(1 − s)(ϕ(v) − ϕ(u)) ≤ s(ϕ(w) − ϕ(v)). (2.3)


But v = (1 − s)u + sw = u + s(w − u) = (1 − s)(u − w) + w so that


s = (v − u)/(w − u) and 1 − s = (v − w)/(u − w) = (w − v)/(w − u).
Inequality (2.3) can therefore be rewritten as

(ϕ(v) − ϕ(u))/(v − u) ≤ (ϕ(w) − ϕ(v))/(w − v) . (2.4)
Again, from (2.2), we get

ϕ(v) − ϕ(u) ≤ (1 − s)ϕ(u) + sϕ(w) − ϕ(u) = s(ϕ(w) − ϕ(u))

and so, substituting for s,

(ϕ(v) − ϕ(u))/(v − u) ≤ (ϕ(w) − ϕ(u))/(w − u) . (2.5)
Once more, from (2.2),

ϕ(v) − ϕ(w) ≤ (1 − s)ϕ(u) + sϕ(w) − ϕ(w) = (s − 1)(ϕ(w) − ϕ(u))

which gives
(ϕ(w) − ϕ(u))/(w − u) ≤ (ϕ(w) − ϕ(v))/(w − v) . (2.6)
These inequalities are readily suggested from a diagram.
Now fix v = v0 . Then by inequality (2.5), we see that the ratio (Newton
quotient) (ϕ(w) − ϕ(v0 ))/(w − v0 ) decreases as w ↓ v0 and, by (2.4), is
bounded below by (ϕ(v0 ) − ϕ(u))/(v0 − u) for any u < v0 . Hence ϕ has a
right derivative at v0 , i.e.,

∃ lim_{w↓v0} (ϕ(w) − ϕ(v0 ))/(w − v0 ) ≡ D+ ϕ(v0 ).

Next, we consider u ↑ v0 . By (2.6), the ratio (ϕ(v0 ) − ϕ(u))/(v0 − u) is


increasing as u ↑ v0 and, by the inequality (2.4), is bounded above (by
(ϕ(w) − ϕ(v0 ))/(w − v0 ) for any w > v0 ). It follows that ϕ has a left
derivative at v0 , i.e.,

∃ lim_{u↑v0} (ϕ(v0 ) − ϕ(u))/(v0 − u) ≡ D− ϕ(v0 ).
It follows that ϕ is continuous at v0 because
ϕ(w) − ϕ(v0 ) = (w − v0 ) ( (ϕ(w) − ϕ(v0 ))/(w − v0 ) ) → 0   as w ↓ v0

and

ϕ(v0 ) − ϕ(u) = (v0 − u) ( (ϕ(v0 ) − ϕ(u))/(v0 − u) ) → 0   as u ↑ v0 .


By (2.5), letting u ↑ v0 , we get

D− ϕ(v0 ) ≤ (ϕ(w) − ϕ(v0 ))/(w − v0 )
          ≤ (ϕ(w) − ϕ(λ))/(w − λ)   for any v0 ≤ λ ≤ w, by (2.6)
          ↑ D− ϕ(w)   as λ ↑ w.

Hence

D− ϕ(v0 ) ≤ D− ϕ(w)

whenever v0 ≤ w. Similarly, letting w ↓ v in (2.6), we find

(ϕ(v) − ϕ(u))/(v − u) ≤ D+ ϕ(v)

and so

(ϕ(λ) − ϕ(u))/(λ − u) ≤ (ϕ(v) − ϕ(u))/(v − u) ≤ D+ ϕ(v).

Letting λ ↓ u, we get

D+ ϕ(u) ≤ D+ ϕ(v)

whenever u ≤ v. That is, both D+ ϕ and D− ϕ are non-decreasing functions.
Furthermore, letting u ↑ v0 and w ↓ v0 in (2.4), we see that

D− ϕ(v0 ) ≤ D+ ϕ(v0 )

at each v0 .

Claim. Fix v and let m satisfy D− ϕ(v) ≤ m ≤ D+ ϕ(v). Then

m(x − v) + ϕ(v) ≤ ϕ(x) (2.7)

for any x ∈ R.

Proof. For x > v, (ϕ(x) − ϕ(v))/(x − v) ↓ D+ ϕ(v) and so

(ϕ(x) − ϕ(v))/(x − v) ≥ D+ ϕ(v) ≥ m

which means that ϕ(x) − ϕ(v) ≥ m(x − v) for x > v.


Now let x < v. Then (ϕ(v) − ϕ(x))/(v − x) ↑ D− ϕ(v) ≤ m and so we see
that ϕ(v) − ϕ(x) ≤ m(v − x), i.e., ϕ(x) − ϕ(v) ≥ m(x − v) for x < v and the
claim is proved.

Note that the inequality in the claim becomes equality for x = v.


Let A = { (α, β) : ϕ(x) ≥ αx + β for all x }. From the remark above (and
using the same notation), we see that for any x ∈ R,

ϕ(x) = sup{ αx + β : (α, β) ∈ A }.

We wish to show that it is possible to replace the uncountable set A with a


countable one.
For each q ∈ Q, let m(q) be a fixed choice such that

D− ϕ(q) ≤ m(q) ≤ D+ ϕ(q)

(for example, we could systematically choose m(q) = D− ϕ(q)). Set α(q) =


m(q) and β(q) = ϕ(q) − m(q)q. Let A0 = { (α(q), β(q)) : q ∈ Q }. Then the
discussion above shows that

αx + β ≤ ϕ(x)

for all x ∈ R and that ϕ(q) = sup{ αq + β : (α, β) ∈ A0 }.

Claim. For any x ∈ R, ϕ(x) = sup{ αx + β : (α, β) ∈ A0 }.

Proof. Given x ∈ R, fix u < x < w. Then we know that for any q ∈ Q with
u < q < w,
D− ϕ(u) ≤ D− ϕ(q) ≤ D+ ϕ(q) ≤ D+ ϕ(w). (2.8)

Hence D− ϕ(u) ≤ m(q) ≤ D+ ϕ(w). Let (qn ) be a sequence in Q with


u < qn < w and such that qn → x as n → ∞. Then by (2.8), (m(qn )) is a
bounded sequence. We have

0 ≤ ϕ(x) − (α(qn )x + β(qn ))


= ϕ(x) − ϕ(qn ) + ϕ(qn ) − (α(qn )x + β(qn ))
= (ϕ(x) − ϕ(qn )) + α(qn )qn + β(qn ) − (α(qn )x + β(qn ))
= (ϕ(x) − ϕ(qn )) + m(qn )(qn − x) .

Since ϕ is continuous, the first term on the right hand side converges to 0
as n → ∞ and so does the second term, because the sequence (m(qn )) is
bounded. It follows that (α(qn )x + β(qn )) → ϕ(x) and therefore we see that
ϕ(x) = sup{ αx + β : (α, β) ∈ A0 }, as claimed.
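The claim is easy to see numerically for a concrete convex function. The sketch below (my own illustration) takes ϕ(x) = x², where m(q) = 2q, α(q) = 2q and β(q) = −q², and checks that the supremum over lines attached to a grid of rationals recovers ϕ(x) up to the grid spacing.

```python
import numpy as np
from fractions import Fraction

phi = lambda x: x ** 2
qs = [Fraction(k, 10) for k in range(-100, 101)]   # a grid of rationals in [-10, 10]

def sup_over_lines(x):
    # alpha(q) x + beta(q) = 2 q x - q^2 for the tangent line at q
    return max(float(2 * q) * x - float(q) ** 2 for q in qs)

for x in (-2.3, 0.0, 1.7, np.pi):
    print(x, phi(x), sup_over_lines(x))
# The supremum over the rational supporting lines matches phi(x)
# to within the grid spacing of the chosen rationals.
```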

We are now in a position to discuss Jensen’s Inequality.


Theorem 2.8 (Jensen’s Inequality). Suppose that ϕ is convex on R and that


both X and ϕ(X) are integrable. Then

ϕ(E(X | G)) ≤ E(ϕ(X) | G) almost surely.

Proof. As above, let A = { (α, β) : ϕ(x) ≥ αx + β for all x ∈ R }. We have


seen that there is a countable subset A0 ⊂ A such that

ϕ(x) = sup_{(α,β)∈A0} (αx + β).

For any (α, β) ∈ A,


αX(ω) + β ≤ ϕ(X(ω))
for all ω ∈ Ω. In other words, ϕ(X) − (αX + β) ≥ 0 on Ω. Hence, taking
conditional expectations,

E(ϕ(X) − (αX + β) | G) ≥ 0 almost surely.

It follows that

E((αX + β) | G) ≤ E(ϕ(X) | G) almost surely

and so

αX̂ + β ≤ E(ϕ(X) | G)   almost surely,

where X̂ is any choice of E(X | G).
Now, for each (α, β) ∈ A0 , let A(α, β) be a set in G such that P (A(α, β)) = 1
and

αX̂(ω) + β ≤ E(ϕ(X) | G)(ω)

for every ω ∈ A(α, β). Let A = ⋂_{(α,β)∈A0} A(α, β). Since A0 is countable, P (A) = 1
and

αX̂(ω) + β ≤ E(ϕ(X) | G)(ω)

for all ω ∈ A. Taking the supremum over (α, β) ∈ A0 , and noting that
sup_{(α,β)∈A0} (αX̂(ω) + β) = ϕ(X̂(ω)) = ϕ(X̂)(ω), we get

ϕ(X̂)(ω) ≤ E(ϕ(X) | G)(ω)

on A, that is,

ϕ(X̂) ≤ E(ϕ(X) | G)

almost surely and the proof is complete.
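As a small numerical check (my own, not from the notes), Jensen's inequality can be verified pointwise when G is generated by a finite partition, using the cell-averaging form of the conditional expectation from earlier.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
P = np.full(n, 1.0 / n)
X = rng.normal(size=n)
cells = [np.arange(0, 4), np.arange(4, 9), np.arange(9, 12)]   # partition generating G

def cond_exp(Z):
    out = np.empty_like(Z)
    for A in cells:
        out[A] = np.sum(Z[A] * P[A]) / np.sum(P[A])
    return out

phi = np.square                       # a convex function
lhs = phi(cond_exp(X))                # phi(E(X|G))
rhs = cond_exp(phi(X))                # E(phi(X)|G)
print(np.all(lhs <= rhs + 1e-12))     # True, pointwise on Omega
```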


Proposition 2.9. Suppose that X ∈ Lr where r ≥ 1. Then

‖ E(X | G) ‖r ≤ ‖ X ‖r .

In other words, the conditional expectation is a contraction on every Lr with


r ≥ 1.

Proof. Let ϕ(s) = |s|^r and let X̂ be any choice for the conditional expecta-
tion E(X | G). Then ϕ is convex and so by Jensen’s Inequality

ϕ(X̂) ≤ E(ϕ(X) | G) ,   almost surely,

that is

|X̂|^r ≤ E( |X|^r | G)   almost surely.

Taking expectations, we get

E( |X̂|^r ) ≤ E( E( |X|^r | G) ) = E( |X|^r ) .

Taking rth roots gives

‖X̂‖r ≤ ‖X‖r

as claimed.

Functional analytic approach to the conditional expectation


Consider a vector u = ai + bj + ck ∈ R3 , where i, j, k denote the unit
vectors in the Ox, Oy and Oz directions, respectively. Then the distance
between u and the x-y plane is just |c|. The vector u can be written as
v + w, where v = ai + bj and w = ck. The vector v lies in the x-y plane
and is orthogonal to the vector w.
The generalization to general Hilbert spaces is as follows. Recall that a
Hilbert space is a complete inner product space. (We consider only real
Hilbert spaces, but it is more usual to discuss complex Hilbert spaces.)
Let X be a real linear space equipped with an “inner product”, that is, a
map x, y 7→ (x, y) ∈ R such that
(i) (x, y) ∈ R and (x, x) ≥ 0, for all x ∈ X,
(ii) (x, y) = (y, x), for all x, y ∈ X,
(iii) (ax + by, z) = a(x, z) + b(y, z), for all x, y, z ∈ X and a, b ∈ R.
Note: it is usual to also require that if x ≠ 0, then (x, x) > 0 (which is
why we have put the term “inner product” in quotation marks). This, of
course, leads us back to the discussion of the distinction between 𝓛2 and L2 .


Example 2.10. Let X = 𝓛2 (Ω, S, P ) with (f, g) = ∫_Ω f (ω)g(ω) dP for any
f, g ∈ 𝓛2 .
In general, set ‖x‖ = (x, x)^{1/2} for x ∈ X. Then in the example above, we
observe that ‖f‖ = ( ∫_Ω f^2 dP )^{1/2} = ‖f‖2 . It is important to note that ‖ · ‖
is not quite a norm. It can happen that ‖x‖ = 0 even though x ≠ 0. Indeed,
an example in 𝓛2 is provided by any function which is zero almost surely.
Proposition 2.11 (Parallelogram law). For any x, y ∈ X,
‖x + y‖² + ‖x − y‖² = 2( ‖x‖² + ‖y‖² ).
Proof. This follows by direct calculation using ‖w‖² = (w, w).
Definition 2.12. A subspace V ⊂ X is said to be complete if every Cauchy
sequence in V converges to an element of V , i.e., if (vn ) is a Cauchy sequence
in V (so that ‖vn − vm‖ → 0 as m, n → ∞) then there is some v ∈ V such
that vn → v, as n → ∞.
Theorem 2.13. Let x ∈ X and suppose that V is a complete subspace of X.
Then there is some v ∈ V such that ‖x − v‖ = inf_{y∈V} ‖x − y‖, that is, there
is v ∈ V so that ‖x − v‖ = dist(x, V ), the distance between x and V .
Proof. Let (vn ) be any sequence in V such that

‖x − vn‖ → d = inf_{y∈V} ‖x − y‖ .

We claim that (vn ) is a Cauchy sequence. Indeed, the parallelogram law
gives

‖vn − vm‖² = ‖(x − vm ) + (vn − x)‖²
           = 2‖x − vm‖² + 2‖vn − x‖² − ‖ (x − vm ) − (vn − x) ‖²
           = 2‖x − vm‖² + 2‖vn − x‖² − 4‖x − ½(vm + vn )‖²    (∗)

since (x − vm ) − (vn − x) = 2(x − ½(vm + vn )). Now

d ≤ ‖x − ½( vm + vn )‖ = ‖ ½(x − vm ) + ½(x − vn )‖
  ≤ ½ ‖x − vm‖ + ½ ‖x − vn‖ → ½ d + ½ d = d

(note that ½(vm + vn ) ∈ V ), so that ‖x − ½(vm + vn )‖ → d as m, n → ∞.
Hence (∗) → 2d² + 2d² − 4d² = 0 as m, n → ∞. It follows that (vn ) is a
Cauchy sequence. By hypothesis, V is complete and so there is v ∈ V such
that vn → v. But then

d ≤ ‖x − v‖ ≤ ‖x − vn‖ + ‖vn − v‖ → d + 0

and so d = ‖x − v‖.


Proposition 2.14. If v′ ∈ V satisfies d = ‖x − v′‖, then x − v′ ⊥ V and
‖v − v′‖ = 0 (where v is as in the previous theorem).
Conversely, if v′ ∈ V and x − v′ ⊥ V , then ‖x − v′‖ = d and ‖v − v′‖ = 0.

Proof. Suppose that there is w ∈ V such that (x − v′ , w) = λ ≠ 0. Let
u = v′ + αw where α ∈ R. Then u ∈ V and

‖x − u‖² = ‖x − v′ − αw‖² = ‖x − v′‖² − 2α(x − v′ , w) + α²‖w‖²
         = d² − 2αλ + α²‖w‖² .

But the left hand side is ≥ d², so by the definition of d we obtain

−2αλ + α²‖w‖² = α(−2λ + α‖w‖²) ≥ 0

for any α ∈ R. This is impossible. (The graph of y = x(xb² − a) lies below
the x-axis for all x between 0 and a/b².)
We conclude that there is no such w and therefore x − v′ ⊥ V . But then
we have

d² = ‖x − v‖² = ‖(x − v′ ) + (v′ − v)‖²
   = ‖x − v′‖² + 2 (x − v′ , v′ − v) + ‖v′ − v‖² = d² + ‖v′ − v‖² ,

since (x − v′ , v′ − v) = 0 because v′ − v ∈ V . It follows that ‖v′ − v‖ = 0.
For the converse, suppose that x − v′ ⊥ V . Then we calculate

d² = ‖x − v‖² = ‖(x − v′ ) + (v′ − v)‖²
   = ‖x − v′‖² + 2 (x − v′ , v′ − v) + ‖v − v′‖²
   = ‖x − v′‖² + ‖v − v′‖²    (∗)
   ≥ d² + ‖v − v′‖² ,

using (x − v′ , v′ − v) = 0 (since v′ − v ∈ V ) and ‖x − v′‖² ≥ d². It follows
that ‖v − v′‖ = 0 and then (∗) implies that ‖x − v′‖ = d.

Suppose now that ‖ · ‖ satisfies the condition that ‖x‖ = 0 if and only if
x = 0. Thus we are supposing that ‖ · ‖ really is a norm not just a seminorm
on X. Then the equality ‖v − v′‖ = 0 is equivalent to v = v′ . In this case,
we can summarize the preceding discussion as follows.
Given any x ∈ X, there is a unique v ∈ V such that x − v ⊥ V . With
x = v + x − v we see that we can write x as x = v + w where v ∈ V
and w ⊥ V . This decomposition of x as the sum of an element v ∈ V
and an element w ⊥ V is unique and means that we can define a map
P : X → V by the formula P : x 7→ P x = v. One checks that this is a linear


map from X onto V . (Indeed, P v = v for all v ∈ V .) We also see that


P 2 x = P P x = P v = v = P x, so that P 2 = P . Moreover, for any x, x′ ∈ X,
write x = v + w and x′ = v ′ + w′ with v, v ′ ∈ V and w, w′ ⊥ V . Then

(P x, x′ ) = (v, x′ ) = (v, v ′ + w′ ) = (v, v ′ ) = (v + w, v ′ ) = (x, P x′ ).

A linear map P with the properties that P 2 = P and (P x, x′ ) = (x, P x′ )


for all x, x′ ∈ X is called an orthogonal projection. We say that P projects
onto the subspace V = { P x : x ∈ X }.
Suppose that Q is a linear map obeying these two conditions. Then we
note that Q(1−Q) = Q−Q2 = 0 and writing any x ∈ X as x = Qx+(1−Q)x,
we find that the terms Qx and (1 − Q)x are orthogonal

(Qx, (1 − Q)x) = (Qx, x) − (Qx, Qx) = (Qx, x) − (Q2 x, Qx) = 0.

So Q is the orthogonal projection onto the linear subspace { Qx : x ∈ X }.


We now wish to apply this to the linear space 𝓛2 (Ω, S, P ). Suppose
that G is a sub-σ-algebra of S. Just as we construct 𝓛2 (Ω, S, P ), so we
may construct 𝓛2 (Ω, G, P ). Since every G-measurable function is also S-
measurable, it follows that 𝓛2 (Ω, G, P ) is a linear subspace of 𝓛2 (Ω, S, P ).
The Riesz-Fischer Theorem tells us that 𝓛2 (Ω, G, P ) is complete and so the
above analysis is applicable, where now V = 𝓛2 (Ω, G, P ).
Thus, any element f ∈ 𝓛2 (Ω, S, P ) can be written as

f = f̂ + g

where f̂ ∈ 𝓛2 (Ω, G, P ) and g ⊥ 𝓛2 (Ω, G, P ). Because ‖ · ‖2 is not a norm but
only a seminorm on 𝓛2 (Ω, S, P ), the functions f̂ and g are unique only in
the sense that if also f = f′ + g′ with f′ ∈ 𝓛2 (Ω, G, P ) and g′ ⊥ 𝓛2 (Ω, G, P )
then ‖f̂ − f′‖2 = 0, that is, f̂ ∈ 𝓛2 (Ω, G, P ) is unique almost surely.
If we were to apply this discussion to L2 (Ω, S, P ) and L2 (Ω, G, P ), then
‖ · ‖2 is a norm and the object corresponding to f̂ should now be unique and
be the image of [f ] ∈ L2 (Ω, S, P ) under an orthogonal projection. However,
there is a subtle point here. For this idea to go through, we must be able to
identify L2 (Ω, G, P ) as a subspace of L2 (Ω, S, P ). It is certainly true that
any element of 𝓛2 (Ω, G, P ) is also an element of 𝓛2 (Ω, S, P ), but is every
element of L2 (Ω, G, P ) also an element of L2 (Ω, S, P )?
The answer lies in the sets of zero probability. Any element of L2 (Ω, G, P )
is a set (equivalence class) of the form [f ] = f + N (G), where N (G)
denotes the set of null functions that are G-measurable. On the other hand,
the corresponding element [f ] ∈ L2 (Ω, S, P ) is the set f + N (S), where
now N (S) is the set of S-measurable null functions. It is certainly true that
f + N (G) ⊂ f + N (S), but in general there need not be equality. The
notion of almost sure equivalence depends on the underlying σ-algebra. If
we demand that G contain all the null sets of S, then we do have equality


f + N (G) = f + N (S) and in this case it is true that L2 (Ω, G, P )
really is a subspace of L2 (Ω, S, P ). For any x ∈ L2 (Ω, S, P ), there is a
unique element v ∈ L2 (Ω, G, P ) such that x − v ⊥ L2 (Ω, G, P ). Indeed, if
x = [f ], with f ∈ 𝓛2 (Ω, S, P ), then v is given by v = [ f̂ ].
Definition 2.15. For given f ∈ 𝓛2 (Ω, S, P ), any element f̂ ∈ 𝓛2 (Ω, G, P ) such
that f − f̂ ⊥ 𝓛2 (Ω, G, P ) is called a version of the conditional expectation
of f and is denoted E(f | G).

On square-integrable random variables, the conditional expectation map


f ↦ f̂ is an orthogonal projection (subject to the ambiguities of sets of
probability zero). We now wish to show that it is possible to recover the
usual properties of the conditional expectation from this L2 (inner product
business) approach.
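On a finite sample space this projection picture can be computed directly. The sketch below (my own, with invented data) takes G generated by a partition, finds the orthogonal projection of f onto the subspace of G-measurable functions by weighted least squares, and confirms it agrees with cell averaging.

```python
import numpy as np

P = np.array([0.1, 0.2, 0.3, 0.15, 0.25])
f = np.array([2.0, -1.0, 4.0, 0.5, 3.0])
cells = [np.array([0, 1]), np.array([2, 3, 4])]

# Basis of V (the G-measurable functions): the indicator of each cell.
B = np.stack([np.isin(np.arange(5), A).astype(float) for A in cells], axis=1)

# Weighted least squares: minimise E|f - B c|^2 over coefficients c.
W = np.diag(P)
c = np.linalg.solve(B.T @ W @ B, B.T @ W @ f)
proj = B @ c

averages = np.concatenate(
    [np.full(len(A), np.sum(f[A] * P[A]) / np.sum(P[A])) for A in cells])
print(np.allclose(proj, averages))   # True: the projection is the cell-average function
```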
Proposition 2.16. For f ∈ 𝓛2 (Ω, S, P ), (any version of ) the conditional ex-
pectation f̂ = E(f | G) satisfies ∫_Ω f (ω)g(ω) dP = ∫_Ω f̂(ω)g(ω) dP for any
g ∈ 𝓛2 (Ω, G, P ). In particular,

∫_A f dP = ∫_A f̂ dP

for any A ∈ G.

Proof. By construction, f − f̂ ⊥ 𝓛2 (Ω, G, P ) so that

∫ (f − f̂) g dP = 0

for any g ∈ 𝓛2 (Ω, G, P ). In particular, for any A ∈ G, set g = 1_A to get the
equality

∫_A f dP = ∫_Ω f g dP = ∫_Ω f̂ g dP = ∫_A f̂ dP.

This is the defining property of the conditional expectation except that
f belongs to 𝓛2 (Ω, S, P ) rather than 𝓛1 (Ω, S, P ). However, we can extend
the result to cover the 𝓛1 case as we now show. First note that if f ≥ g
almost surely, then f̂ ≥ ĝ almost surely. Indeed, by considering h = f − g,
it is enough to show that ĥ ≥ 0 almost surely whenever h ≥ 0 almost surely.
But this follows directly from

∫_A ĥ dP = ∫_A h dP ≥ 0

for all A ∈ G. (If Bn = { ĥ < −1/n } for n ∈ N, then the inequalities
0 ≤ ∫_{Bn} ĥ dP ≤ −(1/n) P (Bn ) imply that P (Bn ) = 0. But then P (ĥ < 0) =
lim_n P (Bn ) = 0.)


Proposition 2.17. For f ∈ 𝓛1 (Ω, S, P ), there exists f̂ ∈ 𝓛1 (Ω, G, P ) such
that

∫_A f dP = ∫_A f̂ dP

for any A ∈ G. The function f̂ is unique almost surely.

Proof. By writing f as f⁺ − f⁻ , it is enough to consider the case with f ≥ 0
almost surely. For n ∈ N, set fn (ω) = f (ω) ∧ n. Then 0 ≤ fn ≤ n almost
surely and so fn ∈ 𝓛2 (Ω, S, P ). Let f̂n be any version of the conditional
expectation of fn with respect to G. Now, if n > m, then fn ≥ fm and
so f̂n ≥ f̂m almost surely. That is, there is some event Bmn ∈ G with
P (Bmn ) = 0 and such that f̂n ≥ f̂m on Bmn^c (and P (Bmn^c ) = 1). Let
B = ⋃_{m,n} Bmn . Then P (B) = 0, P (B^c ) = 1 and f̂n ≥ f̂m on B^c for all m, n
with m ≤ n. Set

f̂(ω) = sup_n f̂n (ω) for ω ∈ B^c , and f̂(ω) = 0 for ω ∈ B.

Then for any A ∈ G,

∫_A fn dP = ∫_Ω fn 1_A dP = ∫_Ω f̂n 1_A dP = ∫_Ω f̂n 1_{B^c} 1_A dP

because P (B^c ) = 1. The left hand side → ∫_Ω f 1_A dP by Lebesgue’s Domi-
nated Convergence Theorem. Applying Lebesgue’s Monotone Convergence
Theorem to the right hand side, we see that

∫_Ω f̂n 1_{B^c} 1_A dP → ∫_Ω f̂ 1_{B^c} 1_A dP = ∫_A f̂ dP

which gives the equality ∫_A f dP = ∫_A f̂ dP .
Taking A = Ω, we see that f̂ ∈ 𝓛1 (Ω, G, P ).

Of course, the function f̂ is called (a version of) the conditional expecta-


tion E(f | G) and we have recovered our original construction as given via
the Radon-Nikodym Theorem.

Chapter 3

Martingales

Let (Ω, S, P ) be a probability space. We consider a sequence (Fn )n∈Z+ of


sub-σ-algebras of a fixed σ-algebra F ⊂ S obeying

F0 ⊂ F1 ⊂ F2 ⊂ · · · ⊂ F .

Such a sequence is called a filtration (upward filtering). The intuitive idea is


to think of Fn as those events associated with outcomes of interest occurring
up to “time n”. Think of n as (discrete) time.

Definition 3.1. A martingale (with respect to (Fn )) is a sequence (ξn ) of


random variables such that

1. Each ξn is integrable: E(|ξn |) < ∞ for all n ∈ Z+ .

2. ξn is measurable with respect to Fn (we say (ξn ) is adapted ).

3. E(ξn+1 | Fn ) = ξn , almost surely, for all n ∈ Z+ .

Note that (1) is required in order for (3) to make sense. (The conditional
expectation E(ξ | G) is not defined unless ξ is integrable.)

Remark 3.2. Suppose that (ξn ) is a martingale. For any n > m, we have

E(ξn | Fm ) = E(E(ξn | Fn−1 ) | Fm ) almost surely, by the tower property


= E(ξn−1 | Fm ) almost surely
= ···
= E(ξm+1 | Fm ) almost surely
= ξm almost surely.

That is,
E(ξn | Fm ) = ξm almost surely
for all n ≥ m. (This could have been taken as part of the definition.)


Example 3.3. Let X0 , X1 , X2 , . . . be independent integrable random vari-


ables with mean zero. For each n ∈ Z+ let Fn = σ(X0 , X1 , . . . , Xn ) and let
ξn = X0 + X1 + · · · + Xn . Evidently (ξn ) is adapted and

E(|ξn |) = E(|X0 + X1 + · · · + Xn |)
≤ E(|X0 |) + E(|X1 |) + · · · + E(|Xn |)
<∞

so that ξn is integrable. Finally, we note that (almost surely)

E(ξn+1 | Fn ) = E(Xn+1 + ξn | Fn )
= E(Xn+1 | Fn ) + E(ξn | Fn )
= E(Xn+1 ) + ξn
since ξn is adapted and Xn+1 and Fn are independent
= ξn

and so (ξn ) is a martingale.
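The following simulation sketch (my own, not from the notes) illustrates Example 3.3: partial sums of independent mean-zero variables keep a constant expectation, and an increment is uncorrelated with the past (compare the orthogonality of martingale increments proved later in this chapter).

```python
import numpy as np

rng = np.random.default_rng(4)
paths, steps = 100_000, 20
X = rng.uniform(-1.0, 1.0, size=(paths, steps))    # independent, mean zero
xi = np.cumsum(X, axis=1)                          # xi_n = X_1 + ... + X_n (columns are 0-based)

print(np.abs(xi.mean(axis=0)).max())               # ~0: E(xi_n) = E(xi_0) for every n
increment = xi[:, 10] - xi[:, 5]                   # a later increment of the walk
past = xi[:, 5]                                    # the walk's value at the earlier time
print(np.cov(increment, past)[0, 1])               # ~0: the increment is orthogonal to the past
```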

Example 3.4. Let X ∈ L1 (F) and let ξn = E(X | Fn ). Then ξn ∈ L1 (Fn )


and

E(ξn+1 | Fn ) = E(E(X | Fn+1 ) | Fn )


= E(X | Fn ) almost surely
= ξn almost surely

so that (ξn ) is a martingale.

Proposition 3.5. For any martingale (ξn ), E(ξn ) = E(ξ0 ).

Proof. We note that E(X) = E(X | G), where G is the trivial σ-algebra,
G = { Ω, ∅ }. Since G ⊂ Fn for all n, we can apply the tower property to
deduce that

E(ξn ) = E(ξn | G) = E(E(ξn | F0 ) | G) = E(ξ0 | G) = E(ξ0 )

as required.

Definition 3.6. The sequence X0 , X1 , X2 , . . . of random variables is said to


be a supermartingale with respect to the filtration (Fn ) if the following three
conditions hold.

1. Each Xn is integrable.

2. (Xn ) is adapted to (Fn ).

3. For each n ∈ Z+ , Xn ≥ E(Xn+1 | Fn ) almost surely.


The sequence X0 , X1 , . . . is said to be a submartingale if both (1) and (2)


above hold and also the following inequalities hold.
3′. For each n ∈ Z+ , Xn ≤ E(Xn+1 | Fn ) almost surely.
Evidently, (Xn ) is a submartingale if and only if (−Xn ) is a supermartingale.
Example 3.7. Let X = (Xn )n∈Z+ be a supermartingale (with respect to a
given filtration (Fn )) such that E(Xn ) = E(X0 ) for all n ∈ Z+ . Then
X is a martingale. Indeed, let Y = Xn − E(Xn+1 | Fn ). Since (Xn ) is a
supermartingale, Y ≥ 0 almost surely. Taking expectations, we find that
E(Y ) = E(Xn ) − E(Xn+1 ) = 0, since E(Xn+1 ) = E(Xn ) = E(X0 ). It
follows that Y = 0, almost surely, that is, (Xn ) is a martingale.
Example 3.8. Let (Yn ) be a sequence of independent random variables such
that Yn and eYn are integrable for all n ∈ Z+ and such that E(Yn ) ≥ 0. For
n ∈ Z+ , let
Xn = exp(Y0 + Y1 + · · · + Yn ) .
Then (Xn ) is a submartingale with respect to the filtration (Fn ) where
Fn = σ(Y0 , . . . , Yn ), the σ-algebra generated by Y0 , Y1 , . . . , Yn .
To see this, we first show that each Xn is integrable. We have
Xn = e^{Y0 + ··· + Yn} = e^{Y0} e^{Y1} · · · e^{Yn} .

Each term e^{Yj} is integrable, by hypothesis, but why should the product be?
It is the independence of the Yj s which does the trick — indeed, we have

E(Xn ) = E(e^{Y0 +···+Yn} )
       = E(e^{Y0} e^{Y1} · · · e^{Yn} )
       = E(e^{Y0} ) E(e^{Y1} ) · · · E(e^{Yn} ) ,   by independence.

[ Let fj = e^{Yj} and for m ∈ N, set fjm = fj ∧ m. Each fjm is bounded and
the fjm s are independent. Note also that each fj is non-negative. Then
E(f0m · · · fnm ) = E(f0m ) E(f1m ) · · · E(fnm ), by independence. Letting m → ∞
and applying Lebesgue’s Monotone Convergence Theorem, we deduce that
the product f0 · · · fn is integrable (and its integral is given by the product
E(f0 ) · · · E(fn )). ]
By construction, (Xn ) is adapted. To verify the submartingale inequality,
we consider

E(Xn+1 | Fn ) = E(e^{Y0 +Y1 +···+Yn +Yn+1} | Fn ) = E(Xn e^{Yn+1} | Fn )
             = Xn E(e^{Yn+1} | Fn ),   since Xn is Fn -measurable,
             = Xn E(e^{Yn+1} ),   by independence,
             ≥ Xn e^{E(Yn+1 )} ,   by Jensen’s inequality (s ↦ e^s is convex),
             ≥ Xn ,   since E(Yn+1 ) ≥ 0.


Proposition 3.9.
(i) Suppose that (Xn ) and (Yn ) are submartingales. Then (Xn ∨ Yn ) is
also a submartingale.

(ii) Suppose that (Xn ) and (Yn ) are supermartingales. Then (Xn ∧ Yn )
is also a supermartingale.
Proof. (i) Set Zn = Xn ∨ Yn . Then Zk ≥ Xk and Zk ≥ Yk for all k and so

E(Zn+1 | Fn ) ≥ E(Xn+1 | Fn ) ≥ Xn almost surely

and similarly E(Zn+1 | Fn ) ≥ E(Yn+1 | Fn ) ≥ Yn almost surely. It follows


that E(Zn+1 | Fn ) ≥ Zn almost surely, as required.
(If A = { E(Zn+1 | Fn ) ≥ Xn } and B = { E(Zn+1 | Fn ) ≥ Yn }, then
A ∩ B = { E(Zn+1 | Fn ) ≥ Zn }. However, P (A) = P (B) = 1 and so
P (A ∩ B) = 1.)

(ii) Set Un = Xn ∧ Yn , so that Uk ≤ Xk and Uk ≤ Yk for all k. It follows


that
E(Un+1 | Fn ) ≤ E(Xn+1 | Fn ) ≤ Xn almost surely
and similarly E(Un+1 | Fn ) ≤ E(Yn+1 | Fn ) ≤ Yn almost surely. We conclude
that E(Xn+1 ∧ Yn+1 | Fn ) ≤ Xn ∧ Yn almost surely, as required.

Proposition 3.10. Suppose that (ξn )n∈Z+ is a martingale and ξn ∈ L2 for


each n. Then (ξn2 ) is a submartingale.
Proof. The function ϕ : t 7→ t2 is convex and so by Jensen’s inequality

ϕ( E(ξn+1 | Fn ) ) ≤ E(ϕ(ξn+1 ) | Fn )   almost surely,

and E(ξn+1 | Fn ) = ξn . That is,

ξn² ≤ E(ξn+1² | Fn )   almost surely
as required.

Theorem 3.11 (Orthogonality of martingale increments). Suppose that (Xn )


is an L2 -martingale with respect to the filtration (Fn ). Then

E( (Xn − Xm ) Y ) = 0

whenever n ≥ m and Y ∈ L2 (Fm ). In particular,

E( (Xk − Xj ) (Xn − Xm ) ) = 0

for any 0 ≤ j ≤ k ≤ m ≤ n. In other words, the increments (Xk − Xj ) and


(Xn − Xm ) are orthogonal in L2 .


Proof. Note first that (Xn − Xm )Y is integrable. Next, using the “tower
property” and the Fm -measurability of Y , we see that

E((Xn − Xm ) Y ) = E( E((Xn − Xm ) Y | Fm ) )
                 = E( E(Xn − Xm | Fm ) Y )
                 = 0

since E(Xn − Xm | Fm ) = Xm − Xm = 0.
The orthogonality of the martingale increments follows immediately by
taking Y = Xk − Xj .

Gambling
It is customary to mention gambling. Consider a sequence η1 , η2 , . . . of
random variables where ηn is thought of as the “winnings per unit stake” at
game play n. If a gambler places a unit stake at each game, then the total
winnings after n games is ξn = η1 + η2 + · · · + ηn .
For n ∈ N, let Fn = σ(η1 , . . . , ηn ) and set ξ0 = 0 and F0 = { Ω, ∅ }. To
say that (ξn ) is a martingale is to say that

E(ξn+1 | Fn ) = ξn almost surely,

or
E(ξn+1 − ξn | Fn ) = 0 almost surely.

We can interpret this as saying that knowing everything up to play n, we


expect the gain at the next play to be zero. We gain no advantage nor suffer
any disadvantage at play n + 1 simply because we have already played n
games. In other words, the game is “fair”.
On the other hand, if (ξn ) is a supermartingale, then

ξn ≥ E(ξn+1 | Fn ) almost surely.

This can be interpreted as telling us that the knowledge of everything up to


(and including) game n suggests that our winnings after game n + 1 will be
less than the winnings after n games. It is an unfavourable game for us.
If (ξn ) is a submartingale, then arguing as above, we conclude that the
game is in our favour.
Note that if (ξn ) is a martingale, then

E(ξn ) = E(ξ1 ) ( = E(ξ0 ) = 0 in this case)

so the expected total winnings never changes.


Definition 3.12. A stochastic process (αn )n∈N is said to be previsible (or


predictable) with respect to a filtration (Fn ) if αn is Fn−1 -measurable for
each n ∈ N.
Note that one often extends the labelling to Z+ and sets α0 = 0, mainly for
notational convenience.

Using the gambling scenario, we might think of αn as the number of stakes


bought at game n. Our choice for αn can be based on all events up to and
including game n − 1. The requirement that αn be Fn−1 -measurable is quite
reasonable and natural in this setting.
The total winnings after n games will be

ζn = α1 η1 + · · · + αn ηn .

However, ξn = η1 + · · · + ηn = ξn−1 + ηn and so

ζn = α1 (ξ1 − ξ0 ) + α2 (ξ2 − ξ1 ) + · · · + αn (ξn − ξn−1 ) .

This can also be expressed as

ζn+1 = ζn + αn+1 ηn+1 = ζn + αn+1 (ξn+1 − ξn ) .

Example 3.13. For each k ∈ Z+ , let Bk be a Borel subset of Rk+1 and set
αk+1 = 1 if (η0 , η1 , . . . , ηk ) ∈ Bk , and αk+1 = 0 otherwise.

Then αk+1 is σ(η0 , η1 , . . . , ηk )-measurable. We have a “gaming strategy”.


One decides whether to play at game k + 1 depending on the outcomes up to
and including game k, namely, whether the outcome belongs to Bk or not.

In the following discussion, let F0 = { Ω, ∅ } and set ζ0 = 0.

Theorem 3.14. Let (αn ) be a predictable process and as above, let (ζn ) be the
process ζn = α1 (ξ1 − ξ0 ) + · · · + αn (ξn − ξn−1 ).

(i) Suppose that (αn ) is bounded and that (ξn ) is a martingale. Then
(ζn ) is a martingale.

(ii) Suppose that (αn ) is bounded, αn ≥ 0 and that the process (ξn ) is
a supermartingale. Then (ζn ) is a supermartingale.

(iii) Suppose that (αn ) is bounded, αn ≥ 0 and that the process (ξn ) is
a submartingale. Then (ζn ) is a submartingale.


Proof. First note that

|ζn | ≤ |α1 | (|ξ1 | + |ξ0 |) + · · · + |αn | (|ξn | + |ξn−1 |)

and so ζn is integrable because each ξk is and (αn ) is bounded.


Next, we have

E(ζn | Fn−1 ) = E(ζn−1 + αn (ξn − ξn−1 ) | Fn−1 )
             = ζn−1 + E(αn (ξn − ξn−1 ) | Fn−1 )
             = ζn−1 + αn E(ξn − ξn−1 | Fn−1 ) ,

where E(ξn − ξn−1 | Fn−1 ) = E(ξn | Fn−1 ) − ξn−1 , which we denote by Φn . That is,

E(ζn | Fn−1 ) − ζn−1 = αn Φn .

Now, in case (i), Φn = 0 almost surely, so E(ζn | Fn−1 ) = ζn−1 almost surely.
In case (ii), Φn ≤ 0 almost surely and therefore αn Φn ≤ 0 almost surely. It
follows that ζn−1 ≥ E(ζn | Fn−1 ) almost surely.
In case (iii), Φn ≥ 0 almost surely and so αn Φn ≥ 0 almost surely and
therefore ζn−1 ≤ E(ζn | Fn−1 ) almost surely.

Remarks 3.15.

1. This last result can be interpreted as saying that no matter what strategy
one adopts, it is not possible to make a fair game “unfair”, a favourable
game unfavourable or an unfavourable game favourable.

2. The formula

ζn = α1 (ξ1 − ξ0 ) + α2 (ξ2 − ξ1 ) + · · · + αn (ξn − ξn−1 )

is the martingale transform of ξ by α. The general definition follows.

Definition 3.16. Given an adapted process X and a predictable process C,


the (martingale) transform of X by C is the process C · X with

(C · X)0 = 0
X
(C · X)n = Ck (Xk − Xk1 )
1≤k≤n

(which means that (C · X)n − (C · X)n−1 = Cn (Xn − Xn−1 )).

We have seen that if C is bounded and X is a martingale, then C · X


is also a martingale. Moreover, if C ≥ 0 almost surely, then C · X is a
submartingale if X is and C · X is a supermartingale if X is.

November 22, 2004


36 Chapter 3

Stopping Times

A map τ : Ω → Z+ ∪ { ∞ } is said to be a stopping time (or a Markov time)


with respect to the filtration (Fn ) if

{ τ ≤ n } ∈ Fn for each n ∈ Z+ .

One can think of this as saying that the information available by time n
should be sufficient to tell us whether something has “stopped by time n”or
not. For example, we should not need to be watching out for a company’s
profits in September if we only want to know whether it went bust in May.

Proposition 3.17. Let τ : Ω → Z+ ∪ { ∞ }. Then τ is a stopping time (with


respect to the filtration (Fn )) if and only if

{ τ = n } ∈ Fn for every n ∈ Z+ .

Proof. If τ is a stopping time, then { τ ≤ n } ∈ Fn for all n ∈ Z+ . But


{ τ = 0 } = { τ ≤ 0 } and

{ τ = n } = { τ ≤ n } ∩ { τ ≤ n − 1 }c
| {z } | {z }
∈Fn ∈Fn−1 ⊂Fn

for any n ∈ N. Hence { τ = n } ∈ Fn for all n ∈ Z+ .


For the converse, suppose that { τ = n } ∈ Fn for all n ∈ Z+ . Then
[
{τ ≤ n} = {τ = k}.
0≤k≤n

Each event { τ = k } belongs to Fk ⊂ Fn if k ≤ n and therefore it follows


that { τ ≤ n } ∈ Fn .

Proposition 3.18. For any fixed k ∈ Z+ , the constant time τ (ω) = k is a


stopping time.

Proof. If k > n, then { τ ≤ n } = ∅ ∈ Fn . On the other hand, if k ≤ n,


then { τ ≤ n } = Ω ∈ Fn .

Proposition 3.19. Let σ and τ be stopping times (with respect to (Fn )). Then
σ + τ , σ ∨ τ and σ ∧ τ are also stopping times.

Proof. Since σ(ω) ≥ 0 and τ (ω) ≥ 0, we see that


[
{σ + τ = n} = {σ = k} ∩ {τ = n − k}.
0≤k≤n

Hence { σ + τ ≤ n } ∈ Fn .

ifwilde Notes
Martingales 37

Next, we note that

{ σ ∨ τ ≤ n } = { σ ≤ n } ∩ { τ ≤ n } ∈ Fn

and so σ ∨ τ is a stopping time.


Finally, we have

{ σ ∧ τ ≤ n } = { σ ∧ τ > n }c = ({ σ > n } ∩ { τ > n })c ∈ Fn

and therefore σ ∧ τ is a stopping time.

Definition 3.20. Let X be a process and τ a stopping time. The process X


stopped by τ is the process (Xnτ ) with

Xnτ (ω) = Xτ (ω)∧n (ω) for ω ∈ Ω.

So if the outcome is ω and say, τ (ω) = 23, then Xnτ (ω) is given by

Xτ (ω)∧1 (ω), Xτ (ω)∧2 (ω), . . .


= X1 (ω), X2 (ω), . . . , X22 (ω), X23 (ω), X23 (ω), X23 (ω), . . .

Xτ ∧n is constant for n ≥ 23. Of course, for another outcome ω ′ , with


τ (ω ′ ) = 99 say, then Xτ ∧n takes values

X1 (ω ′ ), X2 (ω ′ ), . . . , X98 (ω ′ ), X99 (ω ′ ), X99 (ω ′ ), X99 (ω ′ ), . . .

Proposition 3.21. If (Xn ) is a martingale (resp., submartingale, supermartin-


gale), so is the stopped process (Xnτ ).

Proof. Firstly, we note that


(
Xk (ω) if τ (ω) = k with k ≤ n
Xnτ (ω) = Xτ (ω)∧n (ω) =
Xn (ω) if τ (ω) > n,

that is,
n
X
Xnτ = Xk 1{ τ =k } + Xn 1{ τ ≤n }c .
k=0

Now, { τ = k } ∈ Fk and { τ ≤ n }c ∈ Fn and so it follows that Xnτ is


Fn -measurable. Furthermore, each term on the right hand side is integrable
and so we deduce that (Xnτ ) is adapted and integrable.

November 22, 2004


38 Chapter 3

Next, we see that


n+1
X
τ
Xn+1 − Xnτ = Xk 1{ τ =k } + Xn+1 1{ τ >n+1 }
k=0 n
X
− Xk 1{ τ =k } − Xn 1{ τ >n }
k=0
= Xn+1 1{ τ =n+1 } + Xn+1 1{ τ >n+1 } − Xn 1{ τ >n }
= Xn+1 1{ τ ≥n+1 } − Xn 1{ τ >n }
= (Xn+1 − Xn ) 1{ τ ≥n+1 } .
Taking the conditional expectation, the left hand side gives
τ
E(Xn+1 − Xnτ | Fn ) = E(Xn+1
τ
| Fn ) − Xnτ
since Xnτ is adapted. The right hand side gives
E((Xn+1 − Xn ) 1{ τ ≥n+1 } | Fn ) = 1{ τ ≥n+1 } E((Xn+1 − Xn ) | Fn )
since 1{ τ ≥n+1 } ∈ Fn
= 1{ τ ≥n+1 } (E(Xn+1 | Fn ) − Xn )


 = 0, if (Xn ) is a martingale
≥ 0, if (Xn ) is a submartingale


≤ 0, if (Xn ) is a supermartingale
and so it follows that (Xnτ ) is a martingale (respectively, submartingale,
supermartingale) if (Xn ) is.

Definition 3.22. Let (Xn )n∈Z+ be an adapted process with respect to a given
filtration (Fn ) built on a probability space (Ω, S, P ) and let τ be a stopping
time such that τ < ∞ almost surely. The random variable stopped by τ is
defined to be
(
Xτ (ω) (ω), for ω ∈ { τ ∈ Z+ }
Xτ (ω) =
X∞ , / Z+ ,
if τ ∈
where X∞ is any arbitrary but fixed constant. Then Xτ really is a random
variable, that is, itS
is measurable
 with respect to S (in fact, it is measurable
with respect to σ n Fn ). To see this, let B be any Borel set in R. Then
(on { τ < ∞ })
[ [
{ Xτ ∈ B } = { X τ ∈ B } ∩ {τ = k} = ({ Xτ ∈ B } ∩ { τ = k })
k∈Z+ k∈Z+
[ [
= { Xk ∈ B } ∈ Fk
k∈Z+ k∈Z+

which shows that Xτ is a bone fide random variable.

ifwilde Notes
Martingales 39

Example 3.23 (Wald’s equation). Let (Xj )j∈N be a sequence of independent


identically distributed random variables with finite expectation. For each
n ∈ N, let Sn = X1 + · · · + Xn . Then clearly E(Sn ) = n E(X1 ).
For n ∈ N, let Fn = σ(X1 , . . . , Xn ) be the filtration generated by the Xj s
and suppose that N is a bounded stopping time (with respect to (Fn )).
We calculate

E(SN ) = E(S1 1{ N =1 } + S2 1{ N =2 } + . . . )
= E(X1 1{ N =1 } + (X1 + X2 )1{ N =2 } +
+ (X1 + X2 + X3 )1{ N =3 } + . . . )
= E(X1 1{ N ≥1 } + X2 1{ N ≥2 } + X3 1{ N ≥3 } + . . . )
= E(X1 g1 ) + E(X2 g2 ) + . . .

where gj = 1{ N ≥j } . But 1{ N ≥j } = 1 − 1{ N <j } and so (gj ) is predictable


and, by independence,

E(Xj gj ) = E(Xj ) E(gj ) = E(Xj ) P (N ≥ j) = E(X1 ) P (N ≥ j)

since E(Xj ) = E(X1 ) for all j. Therefore

E(SN ) = E(X1 ) ( P (N ≥ 1) + P (N ≥ 2) + P (N ≥ 3) + . . . )
= E(X1 ) (P (N = 1) + 2 P (N = 2) + 3 P (N = 3) + . . . )
= E(X1 ) E(N )

— known as Wald’s equation.

Theorem 3.24 (Optional Stopping Theorem). Let (Xn )n∈Z+ be a martingale


and τ a stopping time such that:

(1) τ < ∞ almost surely (i.e., P (τ ∈ Z+ ) = 1);

(2) Xτ is integrable;

(3) E(Xn 1{ τ >n } ) → 0 as n → ∞.

Then E(Xτ ) = E(X0 ).


(
Xτ on { τ ≤ n }
Proof. We have Xτ ∧n =
Xn on { τ > n }
and so

Xt = Xτ ∧n + (Xτ − Xτ ∧n )
= Xτ ∧n + (Xτ − Xτ ∧n )(1τ ≤n + 1τ >n )
= Xτ ∧n + (Xτ − Xn )1τ >n .

November 22, 2004


40 Chapter 3

Taking expectations, we get

E(Xτ ) = E(Xτ ∧n ) + E(Xτ 1{ τ >n } ) − E(Xn 1{ τ >n } ) (∗)

But we know (Xτ ∧n ) is a martingale and so E(Xτ ∧n ) = E(Xτ ∧0 ) = E(X0 )


since τ (ω) ≥ 0 always.
The middle term in (∗) gives (using 1{ τ =∞ } = 0 almost surely)


X
E(Xτ 1{ τ >n } ) = E(Xk 1{ τ =k } )
k=n+1
→ 0 as n → ∞

because, by (2),

X
E(Xτ ) = E(Xk 1{ τ =k } ) < ∞ .
k=0

Finally, by hypothesis, we know that E(Xn 1{ τ >n } ) → 0 and so letting


n → ∞ in (∗) gives the desired result.

Remark 3.25. If X is a submartingale and the conditions (1), (2) and (3) of
the theorem hold, then we have

E(Xτ ) ≥ E(X0 ) .

This follows just as for the case when X is a martingale except that one
now uses the inequality E(Xτ ∧n ) ≥ E(Xτ ∧0 ) = E(X0 ) in equation (∗). In
particular, we note these conditions hold if τ is a bounded stopping time.

Remark 3.26. We can recover Wald’s equation as a corollary. Indeed, let


X1 , X2 , . . . be a sequence of independent identically distributed random vari-
ables with means E(Xj ) = µ and let Yj = Xj − µ. Let ξn = Y1 + · · · + Yn .
Then we know that (ξn ) is a martingale with respect to the filtration (Fn )
where Fn = σ(X1 , . . . , Xn ) = σ(Y1 , . . . , Yn ). By the Theorem (but with
index set now N rather than Z+ ), we see that E(ξN ) = E(ξ1 ) = E(Y1 ) = 0,
where N is a stopping time obeying the conditions of the theorem. If
Sn = X1 + · · · + Xn , then we see that ξN = SN − N µ and we conclude
that
E(SN ) = E(N ) µ = E(N ) E(X1 )

which is Wald’s equation.

ifwilde Notes
Martingales 41

Lemma 3.27. Let (Xn ) be a submartingale with respect to the filtration (Fn )
and suppose that τ is a bounded stopping time with τ ≤ m where m ∈ Z.
Then
E(Xm ) ≥ E(Xτ ) .

Proof. We have
m Z
X
E(Xm ) = Xm 1{ τ =j } dP
j=0 Ω
Xm Z
= Xm dP
j=0 { τ =j }
Xm Z
= E(Xm | Fj ) dP, since { τ = j } ∈ Fj ,
j=0 { τ =j }
Xm Z
≥ Xj dP, since E(Xm | Fj ) ≥ Xj almost surely,
j=0 { τ =j }

= E(Xτ )

as required.

Theorem 3.28 (Doob’s Maximal Inequality). Suppose that (Xn ) is a non-


negative submartingale with respect to the filtration (Fn ). Then, for any
m ∈ Z+ and λ > 0,

λ P (max Xk ≥ λ) ≤ E(Xm 1{ maxk≤m Xk ≥λ } ) .


k≤m

Proof. Fix m ∈ Z+ and let Xm ∗ = max


k≤m Xk .
For λ > 0, let
(
min{ k ≤ m : Xk (ω) ≥ λ }, if { k : k ≤ m and Xk (ω) ≥ λ } =
6 ∅
τ (ω) =
m, otherwise.

Then evidently τ ≤ n almost surely (and takes values in Z+ ). We shall show


that τ is a stopping time. Indeed, for j < m

{ τ = j } = { X0 < λ } ∩ { X1 < λ } ∩ · · · ∩ { Xj−1 < λ } ∩ { Xj ≥ λ } ∈ Fj

and

{ τ = m } = { X0 < λ } ∩ { X1 < λ } ∩ · · · ∩ { Xm−1 < λ } ∈ Fm

and so we see that τ is a stopping time as claimed.

November 22, 2004


42 Chapter 3

Next, we note that by the Lemma (since τ ≤ m)

E(Xm ) ≥ E(Xτ ) .
∗ ≥ λ }. Then
Now, set A = { Xm
Z Z
E(Xτ ) = Xτ dP + Xτ dP .
A Ac

If Xm∗ ≥ λ, then there is some 0 ≤ k ≤ m with X ≥ λ and τ = k ≤ k in


k 0
this case. (τ is the minimum of such ks so Xk0 ≥ λ.) Therefore

X τ = X k0 ≥ λ .
∗ < λ, then there is no j with 0 ≤ j ≤ m and
On the other hand, if Xm
Xj ≥ λ. Thus τ = m, by construction. Hence
Z Z Z
Xm dP = E(Xm ) ≥ E(Xτ ) = Xτ dP + Xτ dP
Ω Ac
ZA
≥ λ P (A) + Xm dP .
Ac

Rearranging, we find that


Z
λ P (A) ≤ Xm dP
A

and the proof is complete.

Corollary 3.29. Let (Xn ) be an L2 -martingale. Then

λ2 P (max |Xk | ≥ λ) ≤ E(Xm


2
)
k≤m

for any λ ≥ 0.
Proof. Since (Xn ) is an L2 -martingale, it follows that the process (Xn2 ) is a
submartingale (Proposition 3.10). Applying Doob’s Maximal Inequality to
the submartingale (Xn2 ) (and with λ2 rather than λ), we get
Z
2 2 2 2
λ P (max Xk ≥ λ ) ≤ Xm dP
k≤m { maxk≤m Xk2 ≥λ2 }
Z
2
≤ Xm dP

that is,
λ2 P (max |Xk | ≥ λ) ≤ E(Xm
2
)
k≤m

as required.

ifwilde Notes
Martingales 43

Proposition 3.30. Suppose that X ≥ 0 and X 2 is integrable. Then


Z ∞
2
E(X ) = 2 t P (X ≥ t) dt .
0

Proof. For x ≥ 0, we can write


Z x
2
x =2 t dt
0
Z ∞
=2 t 1{ x≥t } dt .
0

Setting x = X(ω), we get


Z ∞
2
X(ω) = 2 t 1X(ω)≥t dt
0

so that
Z ∞
2
E(X ) = 2 t E(1X≥t ) dt
Z0 ∞
=2 t P (X ≥ t) dt .
0

Theorem 3.31 (Doob’s Maximal L2 -inequality). Let (Xn ) be a non-negative


L2 -submartingale. Then

E( (Xn∗ )2 ) ≤ 4 E( Xn2 )

where Xn∗ = maxk≤n Xk . In alternative notation,

k Xn∗ k2 ≤ 2 kXn k2 .

Proof. Using the proposition, we find that

k Xn∗ k22 = E( (Xn∗ )2 )


Z ∞
=2 t P (Xn 1{ Xn∗ ≥t } ) dt
Z0 ∞
≤2 E(Xn 1{ Xn∗ ≥t } ) dt , by the Maximal Inequality,
0
Z ∞ Z 
=2 Xn dP dt
0 { Xn∗ ≥t }
Z Z Xn∗ 
=2 Xn dt dP , by Tonelli’s Theorem,
ZΩ 0

=2 Xn Xn∗ dP

November 22, 2004


44 Chapter 3

= 2 E(Xn Xn∗ )
≤ 2 kXn k2 kXn∗ k2 , by Schwarz’ inequality.

It follows that
k Xn∗ k2 ≤ 2 k Xn k2
or
E( (Xn∗ )2 ) ≤ 4 E( Xn2 )
and the proof is complete.

Doob’s Up-crossing inequality


We wish to discuss convergence properties of martingales and an important
first result in this connection is Doob’s so-called up-crossing inequality. By
way of motivation, consider a sequence (xn ) of real numbers. If xn → α as
n → ∞, then for any ε > 0, xn is eventually inside the interval (α − ε, α + ε).
In particular, for any a < b, there can be only a finite number of pairs
(n, n+k) for which xn < a and xn+k > b. In other words, the graph of points
(n, xn ) can only cross the semi-infinite strip { (x, y) : x > 0, a ≤ y ≤ b }
finitely-many times. We look at such crossings for processes (Xn ).
Consider a process (Xn ) with respect to the filtration (Fn ). Fix a < b.
Now fix ω ∈ Ω and consider the sequence X0 (ω), X1 (ω), X2 (ω), . . . (although
the value of X0 (ω) will play no rôle here). We wish to count the number of
times the path (Xn (ω)) crosses the band { (x, y) : a ≤ y ≤ b } from below a
to above b. Such a crossing is called an up-crossing of [a, b] by (Xn (ω)).

X7 (ω)
X13 (ω)

y=b

y=a

X11 (ω) X16 (ω)


X3 (ω)

Figure 3.1: Up-crossings of [a, b] by (Xn (ω)).

In the example in the figure 3.1, the path sequences X3 (ω), . . . , X7 (ω) and
X11 (ω), X12 (ω), X13 (ω) each constitute an up-crossing. The path sequence

ifwilde Notes
Martingales 45

X16 (ω), X17 (ω), X18 (ω), X19 (ω) forms a partial up-crossing (which, in fact,
will never be completed if Xn (ω) ≤ b remains valid for all n > 16).
As an aid to counting such up-crossings, we introduce the process (gn )n∈N
defined as follows:

g1 (ω) = 0
(
1, if X1 (ω) < a
g2 (ω) =
0, otherwise


1, if g2 (ω) = 0 and X2 (ω) < a
g3 (ω) = 1, if g2 (ω) = 1 and X2 (ω) ≤ b


0, otherwise
..
.


1, if gn (ω) = 0 and Xn (ω) < a
gn+1 (ω) = 1, if gn (ω) = 1 and Xn (ω) ≤ b


0, otherwise.

We see that gn = 1 when an up-crossing is in progress. Fix m and consider


the sum (up to time m) (— the transform of X by g at time m)

m
X
gj (ω) (Xj (ω) − Xj−1 (ω) )
j=1

= g1 (ω) (X1 (ω) − X0 (ω) ) + · · · + gm (ω) (Xm (ω) − Xm−1 (ω) ) . (∗)

The gj s take the values 1 or 0 depending on whether an up-crossing is in


progress or not. Indeed, if

gr (ω) = 0, gr+1 (ω) = · · · = gr+s (ω) = 1, gr+s+1 (ω) = 0,

then the path Xr (ω), . . . , Xr+s (ω) forms an up-crossing of [a, b] and we see
that
r+s
X r+s
X
gj (ω)(Xj (ω) − Xj−1 (ω)) = (Xj (ω) − Xj−1 (ω) )
j=r+1 j=r+1

= Xr+s (ω) − Xr (ω)


>b−a

because Xr < a and Xr+s > b.


Let Um [a, b](ω) denote the number of completed up-crossings of (Xn (ω)) up
to time m. If gj (ω) = 0 for each 1 ≤ j ≤ m, then the sum (∗) is zero. If not
all gj s are zero, we can say that the path has made Um [a, b](ω) up-crossings

November 22, 2004


46 Chapter 3

(which may be zero) followed possibly by a partial up-crossing (which will


be the case if and only if gm (ω) = gm+1 (ω) = 1). Hence we may estimate
(∗) by
m
X
gj (ω) (Xj (ω) − Xj−1 (ω) ) ≥ Um [a, b](ω)(b − a) + R
j=1

where R = 0 if there is no residual partial up-crossing but otherwise is given


by
Xm
R= gj (ω) (Xj (ω) − Xj−1 (ω) )
j=k

where k is the largest integer for which gj (ω) = 1 for all k ≤ j ≤ m + 1 and
gk−1 = 0. This means that Xk−1 (ω) < a and Xj (ω) ≤ b for k ≤ j ≤ m.
The sequence Xk−1 (ω) < a, Xk (ω) ≤ b, . . . , Xm (ω) ≤ b forms the partial
up-crossing at the end of the path X0 (ω), X1 (ω), . . . , Xm (ω). Since we have
gk (ω) = · · · = gm (ω) = 1, we see that

R = Xm (ω) − Xk−1 (ω) > Xm (ω) − a

where the inequality follows because gk (ω) = 1 and so Xk−1 (ω) < a, by
construction. Now, any real-valued function f can be written as f = f + −f −
where f ± are the positive and negative parts of f , defined by f ± = 12 (|f |±f ).
Evidently f ± ≥ 0. The inequality f ≥ −f − allows us to estimate R by

R ≥ −(Xm − a)− (ω)

which is valid also when there is no partial up-crossing (so R = 0).


Putting all this together, we obtain the estimate
m
X
gj (Xj − Xj−1 ) ≥ Um [a, b](b − a) − (Xm − a)− (∗∗)
j=1

on Ω.

Proposition 3.32. The process (gn ) is predictable.

Proof. It is clear that g1 is F0 -measurable. Furthermore, g2 = 1{ X1 <a }


and so g2 is F1 -measurable. For the general case, let us suppose that gm is
Fm−1 -measurable. Then

gm+1 = 1{ gm =0 }∩{ Xm <a } + 1{ gm =1 }∩{ Xm ≤b }

which is Fm -measurable. By induction, it follows that (gn )n∈N is predictable,


as required.

ifwilde Notes
Martingales 47

Theorem 3.33. For any supermartingale (Xn ), we have

(b − a) E( Um [a, b] ) ≤ E((Xm − a)− ) .

Proof. The non-negative, bounded process (gn ) is predictable and so the


martingale transform ((g · X)n ) is a supermartingale. The estimate (∗∗)
becomes
(g · X)m ≥ (b − a) Um [a, b] − (Xm − a)− .
Taking expectations, we obtain the inequality

E( (g · X)m ) ≥ (b − a) E( Um [a, b] ) − E((Xm − a)− ) .

But E( (g · X)m ) ≤ E( g · X)1 ) since ((g · X)n ) is a supermartingale, and


(g · X)1 = 0, by construction, so we may say that

0 ≥ (b − a) E( Um [a, b] ) − E((Xm − a)− )

and the result follows.

Lemma 3.34. Suppose that (fn ) is a sequence of random variables such that
fn ≥ 0, fn ↑ and such that E(fn ) ≤ K for all n. Then

P (lim fn < ∞) = 1 .
n

Proof. Set gn = fn ∧ m. Then gn ↑ and 0 ≤ gn ≤ m for all n. It follows that


g = limn gn exists and obeys 0 ≤ g ≤ m. Let B = { ω : fn (ω) → ∞ }. Then
g = m on B (because fn (ω) is eventually > m if ω ∈ B). Now

0 ≤ gn ≤ fn =⇒ E(gn ) ≤ E(fn ) ≤ K .

By Lebesgue’s Monotone Convergence Theorem, E(gn ) ↑ E(g) and therefore


E(g) ≤ K. Hence
m P (B) ≤ E(g) ≤ K .
This holds for any m and so it follows that P (B) = 0.

Theorem 3.35 (Doob’s Martingale Convergence Theorem). Let (Xn ) be a


supermartingale such that E(Xn ) < M for all n ∈ Z+ . Then there is an
integrable random variable X such that Xn → X almost surely as n → ∞.
Proof. We have seen that
E( (Xn − a)− )
E( Un [a, b] ) ≤
b−a
for any a < b. However,

(Xn − a)− = 1
2 (|Xn − a| − (Xn − a))
≤ |Xn | + |a|

November 22, 2004


48 Chapter 3

and so, by hypothesis,

E( (Xn − a)− ) ≤ E|Xn | + |a|


≤ M + |a|

giving
M + |a|
E( Un [a, b] ) ≤
b−a
+
for any n ∈ Z . However, by its very construction, Un [a, b] ≤ Un+1 [a, b] and
M + |a|
so limn E( Un [a, b] ) exists and obeys limn E( Un [a, b] ) ≤ . By the
b−a
lemma, it follows
T that Un [a, b] converges almost surely (to a finite value).
Let A = a<b { limn Un [a, b] < ∞ }. Then A is a countable intersection
a,b∈Q
of sets of probability 1 and so P (A) = 1.
Claim: (Xn ) converges almost surely.
For, if not, then

B = { lim inf Xn < lim sup Xn } ⊂ Ω

would have positive probability. For any ω ∈ B, there exist a, b ∈ Q with


a < b such that

lim inf Xn (ω) < a < b < lim sup Xn (ω)

which means that limn Un [a, b](ω) = ∞ (because (Xn (ω)) would cross [a, b]
infinitely-many times). Hence B ∩ A = ∅ which means that B ⊂ Ac and so
P (B) ≤ P (B c ) = 0. It follows that Xn converges almost surely as claimed.
Denote this limit by X, with X(ω) = 0 for ω ∈ / A. Then

E( |X| ) = E( lim inf |Xn | )


n
≤ lim inf E(|Xn | , by Fatou’s Lemma,
n
≤M

and the proof is complete.

Corollary 3.36. Let (Xn ) be a positive supermartingale. Then there is some


X ∈ L1 such that Xn → X almost surely.
Proof. Since Xn ≥ 0 almost surely (and is a supermartingale), it follows
that
0 ≤ E(Xn ) ≤ E(X0 )
and so (E(Xn )) is bounded. Now apply the theorem.

Remark 3.37. These results also hold for martingales and submartingales
(because (−Xn ) is a supermartingale whenever (Xn ) is a submartingale).

ifwilde Notes
Martingales 49

Remark 3.38. The supermartingale (Xn ) is L1 -bounded if and only if the


process (Xn− ) is, that is

sup E(|Xn |) < ∞ ⇐⇒ sup E(Xn− ) < ∞ .


n n

We can see this as follows. Since E(Xn− ) ≤ E(Xn+ + Xn− ) = E(|Xn |), we
see immediately that if E(|Xn |) ≤ M for all n, then also E(Xn− ) ≤ M
for all n. Conversely, suppose there is some positive constant M such that
E(Xn− ) ≤ M for all n. Then

E(|Xn |) = E(Xn+ + Xn− ) = E(Xn ) + 2 E(Xn− ) ≤ E(X0 ) + 2M

and so (Xn ) is L1 -bounded. (E(Xn ) ≤ E(X0 ) because (Xn ) is a super-


martingale.)
Remark 3.39. Note that the theorem gives convergence almost surely. In
general, almost sure convergence does not necessarily imply L1 convergence.
For example, suppose that Y has a uniform distribution on (0, 1) and for each
n ∈ N let Xn = n1{ Y <1/n } . Then Xn → 0 almost surely but kXn k1 = 1
for all n and so it is false that Xn → 0 in L1 . However, under an extra
condition, L1 convergence is assured.
Definition 3.40. The collection { Yα : α ∈ I } of integrable random variables
labeled by some index set I is said to be uniformly integrable if for any given
ε > 0 there is some M > 0 such that
Z
E( |Yα |1{ |Yα |>M } ) = |Yα | dP < ε
{ |Yα |>M }

for all α ∈ I.
Remark 3.41. Note that any uniformly integrable family { Yα : α ∈ I } is
L 1
R -bounded. Indeed, by definition, for any ε > 0 there is M such that
{ |Yα |>M } |Yα | dP < ε for all α. But then
Z Z Z
|Yα | dP = |Yα | dP + |Yα | dP ≤ M + ε
Ω { |Yα |≤M } { |Yα |>M }

for all α ∈ I.
Before proceeding, we shall establish the following result.
Lemma 3.42. Let X ∈ L1 . Then for
R any ε > 0 there is δ > 0 such that if A
is any event with P (A) < δ then A |X| dP < ε.
Proof. Set ξ = |X| and for n ∈ N let ξn = ξ 1{ ξ≤n } . Then ξn → ξ almost
surely and by Lebesgue’s Monotone Convergence Theorem
Z Z Z Z
ξ dP = ξ (1 − 1{ ξ≤n } ) dP = ξ dP − ξn dP → 0
{ ξ>n } Ω Ω Ω

November 22, 2004


50 Chapter 3

R
as n → ∞. Let ε > 0 be given and let n be so large that { ξ>n } ξ dP < 12 ε.
For any event A, we have
Z Z Z
ξ dP = ξ dP + ξ dP
A A∩{ ξ≤n } A∩{ ξ>n }
Z Z
≤ n dP + ξ dP
A { ξ>n }
1
< n P (A) + 2ε

whenever P (A) < δ where δ = ε/2n.

Theorem 3.43 (L1 -convergence Theorem). Suppose that (Xn ) is a uniformly


integrable supermartingale. Then there is an integrable random variable X
such that Xn → X in L1 .

Proof. By hypothesis, the family { Xn : n ∈ Z+ } is uniformly integrable


and we know that Xn converges almost surely to some X ∈ L1 . We shall
show that these two facts imply that Xn → X in L1 . Let ε > 0 be given.
We estimate
Z
kXn − Xk1 = |Xn − X| dP
ZΩ Z
= |Xn − X| dP + |Xn − X| dP
{ |Xn |≤M } { |Xn |>M }
Z
≤ |Xn − X| dP
{ |Xn |≤M }
Z Z
+ |Xn | dP + |X| dP .
{ |Xn |>M } { |Xn |>M }

We shall consider separately the three terms on the right hand side.
By Rthe hypothesis of uniform integrability, we may say that the second
term { |Xn |>M } |Xn | dP < ε for all sufficiently large M .
As for the
R third term, the integrability of X means that there is δ > 0
such that A |X| dP < ε whenever P (A) < δ. Now we note that (provided
M > 1)
Z Z
P ( |Xn | > M ) = 1{ |Xn |>M } dP ≤ 1{ |Xn |>M } |Xn | dP < δ
Ω Ω

for large enough M , by uniform integrability. Hence, for all sufficiently


large M , Z
|X| dP < ε .
{ |Xn |>M }

ifwilde Notes
Martingales 51

Finally, we consider the first term. Fix M > 0 so that the above bounds
on the second and third terms hold. The random variable 1{ |Xn |≤M } |Xn −X|
is bounded by the integrable random variable M + |X| and so it follows from
Lebesgue’s Dominated Convergence Theorem that
Z Z
|Xn − X| dP = 1{ |Xn |≤M } |Xn − X| dP → 0
{ |Xn |≤M } Ω

as n → ∞ and the result follows.

Proposition 3.44. Let (Xn ) be a martingale with respect to the filtration (Fn )
such that Xn → X in L1 . Then, for each n, Xn = E(X | Fn ) almost surely.

Proof. The conditional expectation is a contraction on L1 , that is,

kE(X | Fm )k1 ≤ kXk1

for any X ∈ F 1 . Therefore, for fixed m,

kE(X − Xn | Fm )k1 ≤ kX − Xn k1 → 0

as n → ∞. But the martingale property implies that for any n ≥ m

E(X − Xn | Fm ) = E(X | Fm ) − E(Xn | Fm ) = E(X | Fm ) − Xm

which does not depend on n. It follows that kE(X | Fm ) − Xm k1 = 0 and


so E(X | Fm ) = Xm almost surely.

Theorem 3.45 (L2 -martingale Convergence Theorem). Suppose that (Xn ) is


an L1 -martingale such that E(Xn2 ) < K for all n. Then there is X ∈ L2
such that

(1) Xn → X almost surely,

(2) E( (X − Xn )2 ) → 0.

In other words, Xn → X almost surely and also in L2 .

Proof. Since kXn k1 ≤ kXn k2 , it follows that (Xn ) is a bounded L1 -martingale


and so there is some X ∈ L1 such that Xn → X almost surely. We must
show that X ∈ L2 and that Xn → X in L2 . We have

kXn − X0 k22 = E( (Xn − X0 )2 )


Xn n
X 
=E (Xk − Xk−1 ) (Xj − Xj−1 )
k=1 j=1
n
X 
= E (Xk − Xk−1 )2 , by orthogonality of increments.
k=1

November 22, 2004


52 Chapter 3

However, by the martingale property,

E( (Xn − X0 )2 ) = E(Xn2 ) − E(X02 ) < K


P 
and so nk=1 E (Xk − Xk−1 )2 increases with n but is bounded, and so
must converge.
Let ε > 0 be given. Then (as above)
n
X 
E( (Xn − Xm )2 ) = E (Xk − Xk−1 )2 < ε
k=m+1

for all sufficiently large m, n. Letting n → ∞ and applying Fatou’s Lemma,


we get

E( (X − Xm )2 ) = E( lim inf (Xn − Xm )2 )


n
≤ lim inf E( (Xn − Xm )2 )
n
≤ε

for all sufficiently large m. Hence X − Xn ∈ L2 and so X ∈ L2 and Xn → X


in L2 , as required.

Doob-Meyer Decomposition
The (continuous time formulation of the) following decomposition is a crucial
idea in the abstract development of stochastic integration.

Theorem 3.46. Suppose that (Xn ) is an adapted L1 process. Then (Xn ) has
the decomposition
X = X0 + M + A

where (Mn ) is a martingale, null at 0, and (An ) is a predictable process, null


at 0. Such a decomposition is unique, that is, if also X = X0 + M ′ + A′ ,
then M = M ′ almost surely and A = A′ almost surely.
Furthermore, if X is a supermartingale, then A is increasing.

Proof. We define the process (An ) by

A0 = 0
An = An−1 + E(Xn − Xn−1 | Fn−1 ) .

Evidently, A is null at 0 and An is Fn−1 -measurable, so (An ) is predictable.


Furthermore, we see (by induction) that An is integrable.

ifwilde Notes
Martingales 53

Now, from the construction of A, we find that

E( ( Xn − An ) | Fn−1 ) = E(Xn | Fn−1 )


− E(An−1 | Fn−1 ) − E(( Xn − Xn−1 ) | Fn−1 )
= Xn−1 − An−1 a.s.

which means that (Xn −An ) is an L1 -martingale. Let Mn = (Xn −An )−X0 .
Then M is a martingale, M is null at 0 and we have the Doob decomposition

X = X0 + M + A .

To establish uniqueness, suppose that

X = X 0 + M ′ + A′ = X 0 + M + A .

Then

E(Xn − Xn−1 | Fn−1 ) = E( ( Mn + An ) − ( Mn−1 + An−1 ) | Fn−1 )


= An − An−1 a.s.

since M is a martingale and A is predictable. Similarly,

E(Xn − Xn−1 | Fn−1 ) = A′n − A′n−1 a.s.

and so
An − An−1 = A′n − A′n−1 a.s.
Now, both A and A′ are null at 0 and so A0 = A′0 (= 0) almost surely and
therefore
A1 = A′1 + (A0 − A′0 ) =⇒ A1 = A′1 a.s.
| {z }
=0a.s.

Continuing in this way, we see that An = A′n a.s. for each n. It follows that
there is some Λ ⊂ Ω with P (Λ) = 1 and such that An (ω) = A′n (ω) for all n
for all ω ∈ Λ. However, M = X − X0 − A and M ′ = X − X0 − A′ and so
Mn (ω) = Mn′ (ω) for all n for all ω ∈ Λ, that is M = M ′ almost surely.
Now suppose that X = X0 +M +A is a submartingale. Then A = X−M −X0
is also a submartingale so, since A is predictable,

An = E(An | Fn−1 ) ≥ An−1 a.s.

Once again, it follows that there is some Λ ⊂ Ω with P (Λ) = 1 and such
that
An (ω) ≥ An−1 (ω)
for all n and all ω ∈ Λ, that is, A is increasing almost surely.

November 22, 2004


54 Chapter 3

Remark 3.47. Note that if A is any predictable, increasing, integrable process,


then E(An | Fn−1 ) = An ≥ An−1 so A is a submartingale. It follows that if
X = X0 + M + A is the Doob decomposition of X and if the predictable
part A is increasing, then X must be a submartingale (because M is a
martingale).

Corollary 3.48. Let X be an L2 -martingale. Then X 2 has decomposition

X 2 = X02 + M + A

where M is an L1 -martingale, null at 0 and A is an increasing, predictable


process, null at 0.

Proof. We simply note that X 2 is a submartingale and apply the theorem.

Definition 3.49. The almost surely increasing (and null at 0) process A is


called the quadratic variation of the L2 -process X.

The following result is an application of a Monotone Class argument.

Proposition 3.50 (Lévy’s Upward Theorem). LetS (Fn )n∈Z+ be a filtration


on a probability space (Ω, S, P ). Set F∞ = σ( n Fn ) and let X ∈ L2 be
F∞ -measurable. For n ∈ Z+ , let Xn = E(X | Fn ). Then Xn → X almost
surely.

Proof. The family (Xn ) is an L2 -martingale with respect to the filtration


(Fn ) and so the Martingale Convergence Theorem (applied to the filtered
probability space (Ω, F∞ , P, (Fn ))) tells us that there is some Y ∈ L2 (F∞ )
such that Xn → Y almost surely, and also Xn → Y in L2 (F∞ ). We shall
show that Y = X almost surely.
For any n ∈ Z+ and any B ∈ Fn , we have
Z Z Z
Xn dP = E(X | Fn ) dP = X dP.
B B B

On the other hand,


Z Z Z

Xn dP − Y dP = 1B (Xn − Y ) dP ≤ kXn − Y k2 .
B B Ω

by the Cauchy-Schwarz
R kXn − Y k2 → 0, we must have the
inequality. Since S
equality B (X − Y ) dP = 0 for any B ∈ n Fn .

ifwilde Notes
Martingales 55

Let M denote the subset of F∞ given


Z
M = { B ∈ F∞ : (X − Y ) dP = 0 } .
B

We claim that M is a monotone class. Indeed, suppose that B1S⊂ B2 ⊂ . . .


is an increasing sequence in M. Then 1Bn ↑ 1B , where B = n Bn . The
function X −Y is integrable and so we may appeal to Lebesgue’s Dominated
Convergence Theorem to deduce that
Z Z
0= (X − Y ) dP = 1Bn (X − Y ) dP
ZBn Ω
Z
→ 1B (X − Y ) dP = (X − Y ) dP
Ω B
S
which proves the claim. Now, n Fn is an algebra and belongs to M. By
the Monotone Class Theorem, M contains Rthe σ-algebra generated by this
algebra, namely F∞ . But then we are done, B (X −Y ) dP for every B ∈ F∞
and so we must have X = Y almost surely.

Remark 3.51. This result is also valid in L1 . In fact, if X ∈ L1 (F∞ ) then


(E(X | Fn )) is uniformly integrable and the Martingale Convergence The-
orem applies. Xn converges almost surely and also in L1 to some Y . An
analogous proof shows that X = Y almost surely.
In the L2 case, one would expect the result to be true on the following
grounds. The subspaces L2 (Fn ) increase and would seem to “exhaust” the
space L2 (F∞ ). The projections Pn from L2 (F∞ ) onto L2 (Fn ) should there-
fore increase to the identity operator, Pn g → g for any g ∈ L2 (F∞ ). But
Xn = Pn X, so we would expect Xn → X. Of course, to work this argument
through would bring us back to the start.

November 22, 2004


56 Chapter 3

ifwilde Notes
Chapter 4

Stochastic integration - informally

We suppose that we are given a probability space (Ω, S, P ) together with a


filtration (Ft )t∈R+ of sub-σ-algebras indexed by R+ = [0, ∞) (so Fs ⊂ Ft if
s < t). (Recall that our notation A ⊂ B means that A ∩ B c = ∅ so that
Fs = Ft is permissible here.) Just as for a discrete index, one can define
martingales etc.

Definition 4.1. The process (Xt )t∈R+ is adapted with respect to the filtration
(Ft )t∈R+ if Xt is Ft -measurable for each t ∈ R+ .
The adapted process (Xt )t∈R+ is a martingale with respect to the filtration
(Ft )t∈R+ if Xt is integrable for each t ∈ R+ and if

E(Xt | Fs ) = Xs almost surely for any s ≤ t.

Supermartingales indexed by R+ are defined as above but with the inequality


E(Xt | Fs ) ≤ Xs almost surely for any s ≤ t instead of equality above. A
process is a submartingale if its negative is a supermartingale.

Remark 4.2. If (Xt ) is an L2 -martingale, then (Xt2 ) is a submartingale. The


proof of this is just as for a discrete index, as in Proposition 3.10.

Remark 4.3. It must be stressed that although it may at first appear quite
innocuous, the change from a discrete index to a continuous one is anything
but. There are enormous technical complications involved in the theory with
a continuous index. Indeed, one might immediately anticipate measure-
theoretic difficulties simply because R+ is not countable.

Remark 4.4. Stochastic processes (Xn ) and (Yn ), indexed by Z+ , are said to
be indistinguishable if

P (Xn = Yn for all n) = 1.

The process (Yn )n∈Z+ is said to be a version (or a modification) of (Xn )n∈Z+
if Xn = Yn almost surely for every n ∈ Z+ .

57
58 Chapter 4

If (Yn ) is a version of (Xn ) then (Xn ) and (Yn ) are indistinguishable.


Indeed, for each k ∈ Z+ , let Ak = { Xk = Yk }. We know that P (Ak ) = 1,
since (Yn ) is a version of (Xn ), by hypothesis. But then
T
P (Xn = Yn for all n ∈ Z+ ) = P ( n An ) = 1.

Warning: the extension of these definitions to processes indexed by R+ is


clear, but the above implication can be false in the situation of continuous
time, as the following example shows.
Example 4.5. Let Ω = [0, 1] equipped with the Borel σ-algebra and the
probability measure P determined by (Lebesgue measure) P ([a, b]) = b − a
for any 0 ≤ a ≤ b ≤ 1. For t ∈ R+ , let Xt (ω) = 0 for all ω ∈ Ω = [0, 1] and
let (
0, ω 6= t
Yt (ω) =
1, ω = t.
Fix t ∈ R+ . Then Xt (ω) = Yt (ω) unless ω = t. Hence { Xt = Yt } = Ω \ { t }
or Ω depending on whether t ∈ [0, 1] or not. So P (Xt = Yt ) = 1. On the
other hand, { Xt = Yt for all t ∈ R+ } = ∅ and so we have

P (Xt = Yt ) = 1 , for each t ∈ R+ ,


P (Xt = Yt for all t ∈ R+ ) = 0 ,

which is to say that (Yt ) is a version of (Xt ) but these processes are far from
indistinguishable.
Note also that the path t 7→ Xt (ω) is constant for every ω, whereas for
every ω, the path t 7→ Yt (ω) has a jump at t = ω. So the paths t 7→ Xt (ω) are
continuous almost surely, whereas with probability one, no path t 7→ Yt (ω)
is continuous.

T + of sub-σ-algebras+of a σ-algebra S is said


Example 4.6. A filtration (Gt )t∈R
to be right-continuous if Gt = s>t Gs for any
T t∈R .
For anyTgiven filtration (Ft ), set Gt = s>t Fs . Fix t ≥ 0 and suppose
that A ∈ Ts>t Gs . For any r > t, choose s with t < s < r. TThen we see that
A ∈ Gs = v>s Fv and, in particular, A ∈ Fr . Hence A ∈ r>t Fr = Gt and
so (Gt )t∈R+ is right-continuous.
Remark 4.7. Let (Xt )t∈R+ be a process indexed by R+ . Then one might be
interested in the process given by Yt = sups≤t Xt for t ∈ R+ . Even though
each Xt is a random variable, it is not at all clear that Yt is measurable.
However, suppose that t 7→ Xt is almost surely continuous and let Ω0 ⊂ Ω
be such that P (Ω0 ) = 1 and t 7→ Xt (ω) is continuous for each ω ∈ Ω0 . Then
we can define (
sups≤t Xt (ω), for ω ∈ Ω0 ,
Yt (ω) =
0, otherwise.

ifwilde Notes
Stochastic integration - informally 59

Claim: Yt is measurable.

Proof of claim. Suppose that ϕ : [a, b] → R is a continuous function. If


D = { t0 , t1 , . . . , tk } with a = t0 < t1 < · · · < tk = b is a partition of [a, b],
let maxD ϕ denote max{ ϕ(t) : t ∈ D }. Let
(n) (n)
Dn = { a = t 0 < t1 < · · · < t(n)
mn = b }

be a sequence of partitions of [a, b] such that mesh(Dn ) → 0, where mesh(D)


is given by mesh(D) = max{ tj+1 − tj : tj ∈ D }.
Then it follows that sup{ ϕ(s) : s ∈ [a, b] } = limn maxDn ϕ. This is
because a continuous function ϕ on a closed interval [a, b] is uniformly
continuous there and so for any ε > 0, there is some δ > 0 such that
|ϕ(t) − ϕ(s)| < ε whenever |t − s| < δ. Suppose then that n is sufficiently
large that mesh(Dn ) < δ. Then for any s ∈ [a, b], there is some tj ∈ Dn
such that |s − tj | < δ. This means that |ϕ(s) − ϕ(tj )| < ε. It follows that

maxDn ϕ ≤ sup { ϕ(s) : s ∈ [a, b] } < maxDn ϕ + ε

and so sup{ ϕ(s) : s ∈ [a, b] } = limn maxDn ϕ.


Now fix t ≥ 0, set a = 0, b = t, fix ω ∈ Ω and let ϕ(s) = 1Ω0 (ω) Xs (ω).
Evidently s 7→ ϕ(s) is continuous on [0, t] and so

Yt (ω) = sup{ ϕ(s) : s ∈ [0, t] } = lim maxDn ϕ


n

as above. But for each n, ω 7→ maxDn ϕ = max{ Xt (ω) : t ∈ Dn } is


measurable and so the pointwise limit ω 7→ Yt (ω) is measurable, which
completes the proof of the claim.

Example 4.8 (Doob’s Maximal Inequality). Suppose now that (Xt ) is a non-
negative submartingale with respect to a filtration (Ft ) such that F0 contains
all sets of probability zero. Then Ω0 ∈ F0 and so the process (ξt ) = (1Ω0 Xt )
is also a non-negative submartingale. Fix t ≥ 0 and n ∈ N. For 0 ≤ j ≤ 2n ,
let tj = tj/2n . Set ηj = ξtj and Gj = Ftj for j ≤ 2n and ηj = ηt and Gj = Ft
for j > 2n . Then (ηj ) is a discrete parameter non-negative submartingale
with respect to the filtration (Gj ). According to the discussion above,

Yt = sup ξs = lim maxn ηj = sup maxn ηj


s≤t n j≤2 n j≤2

on Ω. Now, by adjusting the inequalities in the proof of Doob’s Maximal


inequality, Theorem 3.28, we see that the result is also valid if the inequality
maxk Xk ≥ λ is replaced by maxk Xk > λ on both sides and so

λ P (maxj≤2n ηj > λ) ≤ E( η2n 1{ maxj≤2n ηj >λ } ) .

November 22, 2004


60 Chapter 4

Since η2n = Xt , we may say that

λ E(1{ maxj≤2n ηj >λ } ) ≤ E( Xt 1{ maxj≤2n ηj >λ } ) .

But maxj≤2n ηj ≤ Yt and maxj≤2n ηj → Yt on Ω as n → ∞ and therefore


1{ maxj≤2n ηj >λ } → 1{ Yt >λ } . Hence, letting n → ∞, we find that

λ E(1{ Yt >λ } ) ≤ E( Xt 1{ Yt >λ } ) (∗)

Replacing λ by λn in (∗) where now λn ↓ λ > 0 and letting n → ∞, we


obtain
λ E(1{ Yt ≥λ } ) ≤ E( Xt 1{ Yt ≥λ } ) ,

a continuous version of Doob’s Maximal inequality.

Similarly, we can obtain a version of Corollary 3.29 for continuous time.


Suppose that (Xt )t∈R+ is an L2 martingale such that the map t 7→ Xt (ω) is
almost surely continuous. Then for any λ > 0

1
P ( sup |Xs | ≥ λ ) ≤ kXt k22 .
s≤t λ2

Proof. Let (Dn ) be the sequence of partitions of [0, t] as above and let
fn (ω) = maxj≤2n |ξj (ω)|. Then fn → f = sups≤t |Xs | almost surely and
since fn ≤ f almost surely, it follows that 1{ fn >µ } → 1{ f >µ } almost
surely. By Lebesgue’s Dominated Convergence Theorem, it follows that
E(1{ fn >µ } ) → E(1{ f >µ } ), that is, P (fn > µ) → P (f > µ).
Let µ < λ. Applying Doob’s maximal inequality, corollary 3.29, for the
discrete-time filtration (Fs ) with s ∈ { 0 = tn0 , tn1 , . . . , tnmn = t }, we get

1
P ( fn > µ ) ≤ P ( fn ≥ µ ) ≤ kXt k22 .
µ2

Letting n → ∞, gives

µ2 P ( f > µ ) ≤ kXt k22 . (∗)

For j ∈ N, let µj ↑ λ and set Aj = { f > µj }. Then Aj ↓ { f ≥ λ } so that


P (Aj ) ↓ P (f ≥ λ). Replacing µ in (∗) by µj and letting j → ∞, we get the
inequality
λ2 P ( f ≥ λ ) ≤ kXt k22

as required.

ifwilde Notes
Stochastic integration - informally 61

Stochastic integration – first steps


We wish to try to perform some kind of abstract integration with respect to
martingales. It is usually straightforward to integrate step functions, so we
shall try to do that here. First, we recall that if Z is a random variable, then
its (cumulative) distribution function F is defined to be F (t) = P (Z ≤ t)
and we have

P (a < Z ≤ b) = P (Z ≤ b) − P (Z ≤ a) = F (b) − F (a) .

This can be written as E(1{ a<Z≤b } ) = F (b) − F (a) or as


Z ∞
1(a,b] (s) dF (s) = F (b) − F (a) .
−∞

Notice that step-functions of the form 1(a,b] (left-continuous) appear quite


naturally here.
Now, suppose that (Xt )t∈R+ is a square-integrable martingale (that is,
RT
Xt ∈ L2 for each t). We want to set-up something like 0 f dXt so we begin
with integrands which are step-functions.
Definition 4.9. A process g : R+ × Ω → R is said to be elementary if it has
the form
g(s, ω) = h(ω) 1(a,b] (s)
for some 0 ≤ a < b and some bounded random variable h(ω) which is Fa -
measurable. Note that g is piecewise-constant in s (and continuous from the
left).
RT
The stochastic integral 0 g dX of the elementary process g with respect
the L2 -martingale (Xt ) is defined to be the random variable
Z T Z T
g dX = h 1(a,b] (s) dXs
0 0

h (Xb − Xa ), if T ≥ b

= h (XT − Xa ), if a < T < b


0, if T ≤ a.
Rt
Proposition 4.10. For a fixed elementary process g, the family ( 0 g dX)t∈R+
is an L2 -martingale.
Proof. Suppose that g = h 1(a,b] and let
Z t
Yt = g dX = h (Xb∧t − Xa∧t ) .
0

Now, if t ≥ a, then h is Ft -measurable and so is Xb∧t − Xa∧t . Since Yt = 0


for t < a, we see that (Yt ) is adapted. Furthermore, since h is bounded, it

November 22, 2004


62 Chapter 4

follows that Yt ∈ L2 for each t ∈ R+ . To show that Yt is a martingale, let


0 ≤ s ≤ t and consider three cases.
Case 1. s ≤ a.
Using the tower property E(· | Fs ) = E(E(· | Fa ) | Fs ), we find that
E(Yt | Fs ) = E(h (Xt∧b − Xt∧a ) | Fs )
= E(E(h (Xt∧b − Xt∧a ) | Fa ) | Fs )
= E(h E((Xt∧b − Xt∧a ) | Fa ) Fs | ) a.s.
| {z }
= 0 since (Xt ) is a martingale

But s ≤ a implies that Ys = h (Xs∧b − Xs∧a ) = 0 and so


E(Yt | Fs ) = 0 = Ys almost surely.

Case 2. a ≤ s ≤ b.
We have
E(Yt | Fs ) = E(h (Xt∧b − Xa ) | Fs )
= h E((Xt∧b − Xa ) | Fs ) a.s.
= h (Xt∧b∧s − Xa ) a.s.
= h (Xs∧b − Xs∧a )
= Ys .
Case 3. b < s.
We find
E(Yt | Fs ) = E(h (Xb − Xa ) | Fs )
= h E((Xb − Xa ) | Fs ) a.s.
= h (Xb − Xa ) a.s.
= h (Xs∧b − Xs∧a )
= Ys
and the proof is complete.

Notation Let E denote the real linear span of the set of elementary processes.
So any element h ∈ E has the form
n
X
h(ω, s) = gi (ω)1(ai ,bi ] (s)
i=1

for some n, pairs ai < bi and bounded random variables gi where each gi is
Fai -measurable. Notice that h(ω, 0) = 0. In fact, we are not interested in
the value of h at s = 0 as far as integration is concerned. We could have
included random variables of the form g0 (ω)1{ 0 } (s) in the construction of E,
where g0 is F0 -measurable, but such elements play no rôle.

ifwilde Notes
Stochastic integration - informally 63

P
Definition 4.11. For h = ni=1 gi 1(ai ,bi ] ∈ E and T ≥ 0, the stochastic integral
RT
0 h dX is defined to be the random variable

Z T n Z
X T n
X
h dX = hi dX = gi (XT ∧bi − XT ∧ai )
0 i=1 0 i=1

where hi = gi 1(ai ,bi ] .

Remark 4.12. It is crucial to note that the stochastic integral is a sum of


terms gi (XT ∧bi − XT ∧ai ) where gi is Fai -measurable and bi > ai . The
“increment” (XT ∧bi − XT ∧ai ) “points to the future”. This will play a central
rôle in the development of the theory.

Rt
Proposition 4.13. For h ∈ E, the process ( 0 g dX)t∈R+ ) is an L2 -martingale.
Pn
Proof. If h = i=1 hi , where each hi is elementary, as above, then
Z t n Z
X t
h dX = hi dX
0 i=1 0

which is a linear combination of L2 -martingales, by Proposition 4.10. The


result follows.
Rt
So far so good. Now if Yt = 0 h dX, can anything be said about kYt k2
(or E(Yt2 ))? We pursue this now.
Let h ∈ E. Then h can be written as
n
X
h= gi 1(ti ,ti+1 ]
i=1

for some m, 0 = t1 < · · · R< tm and Fti -measurable random variables gi


t
(which may be 0). Let Yt = 0 h dX and consider E(Yt2 ). First we note that
Yt is unchanged if h is replaced by h 1(0,t] which means that we may assume,
without loss of generality, that tm = t. This done, we consider

m−1
X 2
E(Yt2 ) = E( gi (Xti +1 − Xti ) )
i=1
X m−1
m−1 X
= E( gi gj (Xti +1 − Xti )(Xtj +1 − Xtj ) ) .
j=1 i=1

November 22, 2004


64 Chapter 4

Now, suppose i 6= j, say i < j. Writing ∆Xi for (Xti +1 − Xti ), we find that

E( gi gj ∆Xi ∆Xj ) = E( E(gi gj ∆Xi ∆Xj | Ftj ) )


= E( gi gj ∆Xi E(∆Xj | Ftj ) ) , a.s.,
since gi gj ∆i is Fti -measurable,
= 0, since (Xt ) is a martingale.

Of course, this also holds for i > j (simply interchange i and j) and so we
may say that
Z t m−1
X m−1
X
2
E( g dX ) = E( gj2 ∆Xj2 ) = E( gj2 ∆Xj2 ) .
0 j=1 j=1

Next, consider

E( gj2 ∆Xj2 ) = E( gj2 (Xtj+1 − Xtj )2 )


= E( gj2 (Xt2j+1 + Xt2j − 2Xtj+1 Xtj ) )
= E( gj2 (Xt2j+1 + Xt2j ) ) − 2E( gj2 Xtj+1 Xtj )
= E( gj2 (Xt2j+1 + Xt2j ) ) − 2E( E(gj2 Xtj+1 Xtj | Ftj ) )
= E( gj2 (Xt2j+1 + Xt2j ) ) − 2E( gj2 Xtj E( Xtj+1 | Ftj ) )
= E( gj2 (Xt2j+1 + Xt2j ) ) − 2E( gj2 Xtj Xtj ),
since (Xt ) is a martingale,
= E( gj2 (Xt2j+1 − Xt2j ) ) .

What now? We have already seen that the square of a discrete-time L2 -


martingale has a decomposition as the sum of an L1 -martingale and a pre-
dictable increasing process. So let us assume that (Xt2 ) has the Doob-Meyer
decomposition
Xt2 = Mt + At
where (Mt ) is an L1 -martingale and (At ) is an L1 -process such that A0 = 0
and As (ω) ≤ At (ω) almost surely whenever s ≤ t.
How does this help? Setting ∆Mj = Mtj+1 − Mtj and similarly ∆Aj =
Atj+1 − Atj , we see that

E( gj2 (Xt2j+1 − Xt2j ) ) = E( gj2 ∆Mj ) + E( gj2 ∆Aj )


= E( E(gj2 ∆Mj | Ftj ) ) + E( gj2 ∆Aj )
= E( gj2 E((Mtj+1 − Mtj ) | Ftj ) ) + E( gj2 ∆Aj )
| {z }
= 0 since (Mt ) is a martingale

= E( gj2 ∆Aj ) .

ifwilde Notes
Stochastic integration - informally 65

Hence
Z t m−1
X
2
E( g dX )= E( gj2 (Atj+1 − Atj ) ) .
0 j=1

Now, on a set of probability one, At (ω) is an increasing function of t and we


can consider Stieltjes integration using this. Indeed,
Z t XZ t
m−1
2
g(s, ω) dAs (ω) = gj2 (ω) 1(tj ,tj+1 ] (s) dAs (ω)
0 j=1 0

m−1
X
= gj2 (ω) ( Atj+1 (ω) − Atj (ω) ) .
j=1

Taking expectations,
Z t X
E( g 2 dA ) = E( gj2 ∆A ))
0

and this leads us finally to the formula

Z t Z t
2
E( g dXs ) = E( g 2 dAs )
0 0

known as the isometry property. This isometry relation allows one to extend
the class of integrands in the stochastic integral. Indeed, suppose that (gn )
+
R t from2E which converges to a map h : R × Ω → R in the sense
is a sequence
that E( 0 (gn − h) dAs ) → 0. The isometry property then tells us that the
Rt
sequence ( 0 gn dX) of random variables is a Cauchy sequence in L2 and so
Rt
converges to some Yt in L2 . This allows us to define 0 h dX as this Yt . We
will consider this again for the case when (Xt ) is a Wiener process.

Remark 4.14. The Doob-Meyer decomposition of Xt2 as Mt + At is far from


straightforward and involves further technical assumptions. The reader
should consult the text of Meyer for the full picture.

November 22, 2004


66 Chapter 4

ifwilde Notes
Chapter 5

Wiener process

We begin, without further ado, with the definition.


Definition 5.1. An adapted process (Wt )t∈R+ on a filtered probability space
(Ω, S, P, (Ft )) is said to be a Wiener process starting at 0 if it obeys the
following:
(a) W0 = 0 almost surely and the map t 7→ Wt is continuous almost
surely.

(b) For any 0 ≤ s < t, the random variable Wt − Ws has a normal


distribution with mean 0 and variance t − s.

(c) For all 0 ≤ s < t, Wt − Ws is independent of Fs .


Remarks 5.2.
1. From (b), we see that the distribution of Ws+t − Ws is normal with
mean 0 and variance t; so
Z
1 2
P (Ws+t − Ws ∈ A) = √ e−x /2t dx
2πt A
for any Borel set A in R. Furthermore, by (a), Wt = Wt+0 − W0 almost
surely, so each Wt has a normal distribution with mean 0 and variance t.
2. For each fixed ω ∈ Ω, think of the map t 7→ Wt (ω) as the path of a
particle (with t interpreted as time). Then (a) says that all paths start
at 0 (almost surely) and are continuous (almost surely).
3. The independence property (c) implies that any collection of increments
Wt1 − Ws1 , Wt2 − Ws2 , . . . , Wtn − Wsn
with 0 ≤ s1 < t1 ≤ s2 < t2 ≤ s3 < · · · ≤ sn < tn are independent.
Indeed, for any Borel sets A1 , . . . , An in R, we have
P (Wt1 − Ws1 ∈ A1 , . . . , Wtn − Wsn ∈ An )
| {z } | {z }
∆W1 ∆Wn
= P ({ ∆W1 ∈ A1 , . . . , ∆Wn−1 ∈ An−1 } ∩ { ∆Wn ∈ An })

67
68 Chapter 5

= P (∆W1 ∈ A1 , . . . , ∆Wn−1 ∈ An−1 ) P (∆Wn ∈ An ),


since { ∆Wj ∈ Aj } ∈ Fsn for all 1 ≤ j ≤ n − 1
and ∆Wn is independent of Fsn ,
= ···
= P (∆W1 ∈ A1 ) P (∆W2 ∈ A2 ) . . . P (∆Wn ∈ An )

as required.

4. A d-dimensional Wiener process (starting from 0 ∈ Rd ) is a d-tuple


(Wt1 , . . . Wtd ) where the (Wt1 ), . . . (Wtd ) are independent Wiener processes
in R (starting at 0).

5. A Wiener process is also referred to as Brownian motion.

1 2
6. Such a Wiener process exists. In fact, let p(x, t) = √2πt e−x /2t denote
the density of Wt = Wt − W0 and let 0 < t1 < · · · < tn . Then to
say that Wt1 = x1 , Wt2 = x2 , . . . , Wtn = xn is to say that Wt1 = x1 ,
Wt2 − Wt2 = x2 − x1 , . . . , Wtn − Wtn−1 = xn − xn−1 . Now, the random
variables Wt1 , Wt2 −Wt1 , . . . , Wtn −Wtn−1 are independent so their joint
density is a product of individual densities. This suggests that the joint
probability density of Wt1 , . . . , Wtn is

p(x1 , x2 , . . . , xn ; t1 , t2 , . . . , tn )
= p(x1 , t1 ) p(x2 − x1 , t2 − t1 ) . . . p(xn − xn−1 , tn − tn−1 ) .
Q
Let Ωt = Ṙ, the one-point compactification of R, and let Ω = t∈R+ Ωt
be the (compact) product space. Suppose that f ∈ C(Ω) depends only
on a finite number of coordinates in Ω, f (ω) = f (xt1 , . . . , xtn ), say. Then
we define
Z
ρ(f ) = p(x1 , . . . , xn ; t1 , . . . , tn )f (x1 , . . . , xn ) dx1 . . . dxn .
Rn

It can be shown that ρ can be extended into a (normalized) positive


linear functional on the algebra C(Ω) and so one can apply the Riesz-
Markov Theorem to deduce the existence of a probability measure µ on
Ω such that Z
ρ(f ) = f (ω) dµ .

Then Wt (ω) = ωt , the t-th component of ω ∈ Ω.


(This elegant construction is due to Edward Nelson.)

ifwilde Notes
Wiener process 69

We collect together some basic facts in the following theorem.


Theorem 5.3. The Wiener process enjoys the following properties.
(i) E( Ws Wt ) = min{ s, t } = s ∧ t.

(ii) E( (Wt − Ws )2 ) = |t − s|.

(iii) E( Wt4 ) = 3t2 .

(iv) (Wt ) is an L2 -martingale.

(v) If Mt = Wt2 − t, then (Mt ) is a martingale, null at 0.


Proof. (i) Suppose s ≤ t. Using independence, we calculate

E( Ws Wt ) = E( Ws (Wt − Ws ) ) + E( Ws2 )
= E( Ws ) E( Wt − Ws ) + var Ws
| {z } | {z } | {z }
=0 =0 since E(Ws ) = 0

= s.

(ii) Again, suppose s ≤ t. Then

E( (Wt − Ws )2 ) = var(Wt − Ws ) , since E(Wt − Ws ) = 0,


= t − s , by definition.

(iii) We have Z ∞
2 /2 √
e−αx dx = α−1/2 2π .
−∞
Differentiating both sides twice with respect to α gives
Z ∞ √
2
x4 e−αx /2 dx = 3α−5/2 2π .
−∞

Replacing α by 1/t and rearranging, we get


Z ∞
4 1 2
E( Wt ) = √ x4 e−x /2t dx = 3t2 .
2πt −∞
and the proof is complete.
(iv) For 0 ≤ s < t, we have (using independence)

E(Wt | Fs ) = E(Wt − Ws | Fs ) + E(Ws | Fs )


= E(Wt − Ws ) + Ws a.s.
= Ws a.s.

Wt = Wt − W0 has a normal distribution, so Wt ∈ L2 .

November 22, 2004


70 Chapter 5

(v) Let 0 ≤ s < t. Then, with probability one,

E(Wt2 | Fs ) = E((Wt − Ws )2 | Fs ) + 2E(Wt Ws | Fs ) − E(Ws2 | Fs )


= E((Wt − Ws )2 ) + 2Ws E(Wt | Fs ) − Ws2
= t − s + 2Ws2 − Ws2 , by (ii) and (iv),
=t−s+ Ws2

so that E( (Wt2 − t) | Fs ) = (Ws2 − s) almost surely.

1 2 1
Example 5.4. For a ∈ R, ( eaWt − 2 a t ) (and so ( eWt − 2 t ), in particular) is a
martingale. Indeed, for s ≤ t,

1 2 1 2
E(eaWt − 2 a t | Fs ) = E(ea(Wt −Ws )+aWs − 2 a t | Fs )
1 2
= e− 2 a t E(ea(Wt −Ws ) eaWs | Fs )
1 2
= e− 2 a t eaWs E(ea(Wt −Ws ) | Fs )
1 2
= e− 2 a t eaWs E(ea(Wt −Ws ) ) , by independence,
1 2 1 2 (t−s)
= e− 2 a t eaWs e 2 a
1 2s
= eaWs − 2 a

since we know that Wt − Ws has a normal distribution with mean zero and
variance t − s.

(2k)! tk
Example 5.5. For k ∈ N, E(Wt2k ) = .
2k k!
To see this, let Ik = E(Wt2k ) and for n ∈ N, let P (n) be the statement that
(2n)! tn
E(Wt2n ) = . Since I1 = t, we see that P (1) is true. Integration by
2n n!
parts, gives
E(Wt2k ) = E(Wt2k+2 )/t(2k + 1)

and so the truth of P (k) implies that

(2k)! tk (2(k + 1))! t(k+1)


E(Wt2k+2 ) = t(2k + 1) E(Wt2k ) = t(2k + 1) =
2k k! 2k+1 (k + 1)!

which says that P (k + 1) is true. By induction, P (n) is true for all n ∈ N.

ifwilde Notes
Wiener process 71

(n) (n) (n)


Example 5.6. Let Dn = { 0 = t0 < t1 < · · · < tmn = T } be a sequence of
partitions of the interval [0, T ] such that mesh(Dn ) → 0, where mesh(D) =
max{ tj+1 − tj : tj ∈ D }. For each n and 0 ≤ j ≤ mn − 1, let ∆n Wj denote
(n) (n)
the increment ∆n Wj = Wt(n) − Wt(n) and set ∆n tj = tj+1 − tj . Then
j+1 j

mn
X
( ∆n Wj )2 → T
j=0

in L2 as n → ∞.
To see this, we calculate

mn
X X
mn mn
X 2 
k ( ∆n Wj )2 − T k2 = E ( ∆n Wj )2 − ∆n t j
j=0 j=0 j=0
X 
= E ( ( ∆n Wi )2 − ∆n ti )( ( ∆n Wj )2 − ∆n tj )
i,j
X 
= E ( ( ∆n Wj )2 − ∆n tj )2 ,
j

(off-diagonal terms vanish by independence),


X 
= E ( ∆n Wj )4 − 2 ( ∆n Wj )2 ∆n tj + (∆n tj )2
j
X
= 2 (∆n tj )2
j
X
≤ 2 mesh(Dn ) ∆n t j
j

= 2 mesh(Dn ) T → 0

as required. This is the quadratic variation of the Wiener process.

Example 5.7. For any c > 0, Yt = 1c Wc2 t is a Wiener process with respect
to the filtration generated by the Yt s.
We can see this as follows. Clearly, Y0 = 0 almost surely and the map
t 7→ Yt (ω) = Wc2 t (ω)/c is almost surely continuous because t 7→ Wt (ω)
is. Also, for any 0 ≤ s < t, the distribution of the increment Yt − Ys
is that of (Wc2 t − Wc2 s )/c, namely, normal with mean zero and variance
(c2 t−c2 s)/c2 = t−s. Let Gt = Fc2 t which is equal to the σ-algebra generated
by the random variables { Ys : s ≤ t }. Then c(Yt − Ys ) = (Wc2 t − Wc2 s ) is
independent of Fc2 s = Gs , and so therefore is Yt − Ys . Hence (Yt ) is a Wiener
process with respect to (Gt ).

November 22, 2004


72 Chapter 5

Remark 5.8. For t > 0, let Yt = t X1/t and set Y0 = 0. Then for any
0<s<t

Yt − Ys = t X1/t − s X1/s = (t − s) X1/t − s (X1/s − X1/t ) .

Now, 0 < 1/t < 1/s and so X1/t and (X1/s − X1/t ) are independent normal
random variables with zero means and variances given by 1/t and 1/s − 1/t,
respectively. It follows that Yt − Ys is a normal random variable with mean
zero and variance (t − s)2 /t + s2 (1/s − 1/t) = (t − s). When s = 0, we see
that Yt − Y0 = Yt = t X1/t which is a normal random variable with mean
zero and variance t2 /t = t.
Let (Gt ) be the filtration where Gt is the σ-algebra generated by the family
{ Ys : s ≤ t }. Again, suppose that 0 < s < t. Then for any r < s

E((Yt − Ys ) Yr ) = E( (t X1/t − s X1/s ) r X1/r )


= rt E(X1/t X1/r ) − rs E(X1/s X1/r )
= rt E(X1/t (X1/r − X1/t )) + rt E(X1/t X1/t )
− rsE(X1/s (X1/r − X1/s )) − rs E(X1/s X1/s )
= rt E(X1/t ) E(X1/r − X1/t ) + rt var(X1/t )
− rsE(X1/s ) E(X1/r − X1/s ) − rs var(X1/s )
= 0 + rt (1/t) − 0 − rs (1/s)
= 0.

This shows that (Yt − Ys ) is orthogonal in L2 to each Yr . But orthogonality


for jointly normal random variables implies independence and so it follows
that the increment Yt − Ys is independent of Gs .
Finally, it can be shown that the map t 7→ Yt on R+ is continuous almost
surely. In fact, it is evidently continuous almost surely on (0, ∞) because
this is true of t 7→ Xt . The continuity at t = 0 requires additional argument.
Notice that it is easy to show that t 7→ Yt is L2 -continuous. For s > 0 and
t > 0, we find that

kYt − Ys k2 = k t X1/t − s X1/s k2


≤ k t X1/t − t X1/s k2 + k t X1/s − s X1/s k2
= t |1/s − 1/t|1/2 + |t − s| (1/s)1/2
→ 0 as s → t.

At t = 0, we have

kYt − Y0 k22 = kYt k22 = t2 kX1/t k22 = t2 var X1/t = t → 0

as t ↓ 0. This is example is useful in that it relates large time behaviour of


the Wiener process to small time behaviour. The behaviour of Xt for large t
is related to that of Ys for small s and both Xt and Ys are Wiener processes.

ifwilde Notes
Wiener process 73

Example 5.9. Let (Wt ) be a Wiener process and let Xt = µt + σWt for t ≥ 0
(where µ and σ are constants). Then (Xt ) is a martingale if µ = 0 but is a
submartingale if µ ≥ 0.
We see that for 0 ≤ s < t,

E(Xt | Fs ) = E(µt + σWt | Fs ) = µt + σWs > µs + σWs = Xs .

Non-differentiability of Wiener process paths


Whilst the sample paths Wt (ω) are almost surely continuous in t, they are
nevertheless extremely jagged, as the next result indicates.

Theorem 5.10. With probability one, the sample path t 7→ Wt (ω) is nowhere
differentiable.

Proof. First we just consider 0 ≤ t ≤ 1. Fix β > 0. Now, if a given function


f is differentiable at some point s with f ′ (s) ≤ β, then certainly we can say
that
|f (t) − f (s)| ≤ 2β |t − s|
whenever |t − s| is sufficiently small. (If t is sufficiently close to s, then
(f (t)−f (s))/(t−s) is within β of f ′ (s) and so is smaller than f ′ (s)+β ≤ 2β).
So let

An = { ω : ∃s ∈ [0, 1) such that |Wt (ω) − Ws (ω)| ≤ 2β |t − s| ,


2
whenever |t − s| ≤ n }.
S
Evidently An ⊂ An+1 and n An includes all ω ∈ Ω for which the function
t 7→ Wt (ω) has a derivative at some point in [0, 1) with value ≤ β.
Once again, suppose f is some function such that for given s there is some
δ > 0 such that
|f (t) − f (s)| ≤ 2β |t − s| (∗)
if |t − s| ≤ 2δ. Let n ≥ 1/δ and let k be the largest integer such that
k/n ≤ s. Then

max{ | f ( k+2 k+1 k+1 k k k−1
n ) − f( n ) | , | f( n ) − f(n) | , | f(n) − f( n ) | } ≤ n (∗∗)

To verify this, note first that k−1 k k+1 k+2


n < n ≤ s < n < n and so we may
estimate each of the three terms involved in (∗∗) with the help of (∗). We
find

| f ( k+2 k+1 k+2 k+1


n ) − f ( n ) | ≤ | f ( n ) − f (s) | + | f (s) − f ( n ) |
≤ 2β| k+2
n − s| + 2β|s −
k+1
n |

≤ n

November 22, 2004


74 Chapter 5

and

| f ( k+1 k k+1 k
n ) − f ( n ) | ≤ | f ( n ) − f (s) | + | f (s) − f ( n ) |
≤ 2β| k+1 k
n − s| + 2β|s − n |

≤ n

and

| f ( nk ) − f ( k−1 k k−1
n ) | ≤ | f ( n ) − f (s) | + | f (s) − f ( n ) |
≤ 2β| nk − s| + 2β|s − k−1
n |

≤ n

which establishes the inequality (∗∗).


For given ω, let

gk (ω) = max{ |W k+2 (ω) − W k+1 (ω)| , |W k+1 (ω) − W k (ω)| ,


n n n n

|W k (ω) − W k−1 (ω)| }


n n

and let

Bn = { ω : gk (ω) ≤ n for some k ≤ n − 2 } .
Now, if ω ∈ An , then Wt (ω) is differentiable at some s and furthermore
|Wt (ω) − Ws (ω)| ≤ 2β |t − s| if |t − s| ≤ 2/n. However, according to our
discussion above, this means that gk (ω) ≤ 6β/n where k is the largest
integer with k/n ≤ s. Hence ω ∈ Bn and so An ⊂ Bn . Now,
n−2
[ 6β
Bn = { ω : gk (ω) ≤ n }
k=1

and so
n−2
X 6β
P (Bn ) ≤ P ( gk ≤ n ).
k=1
We estimate

P ( gk ≤ n ) = P ( max{ |W k+2 − W k+1 | ,
n n

|W k+1 − W k | , |W k − W k−1 | } ≤ n )
n n n n

= P ( { |W k+2 − W k+1 | ≤ n }∩
n n
6β 6β
{ |W k+1 − W k | ≤ n } ∩ { |W k − W k−1 | } ≤ n })
n n n n

= P ( { |W k+2 − W k+1 | ≤ n }) ×
n n
6β 6β
× P ( { |W k+1 − W k | ≤ n } ) P ( { |W k − W k−1 | ≤ n }),
n n n n

by independence of increments,

ifwilde Notes
Wiener process 75

q Z 6β/n 3
2 /2
= n
2π e−n x dx
−6β/n
q 3
n 12 β
≤ 2π n

= C n−3/2

where C is independent of n. Hence

P (Bn ) ≤ n C n−3/2 = C n−1/2 → 0

as n → ∞. Now for fixed j, Aj ⊂ An for all n ≥ j, so

P (Aj ) ≤ P (An ) ≤ P (Bn ) → 0


S
as n → ∞, and this forces P (Aj ) = 0. But then P ( j Aj ) = 0.
S
What have we achieved so far? We have shown that P ( n An ) = 0 and
so, with probability one, Wt has no derivative of absolute value at most β at any
s ∈ [0, 1). Now consider any unit interval [j, j + 1) with j ∈ Z+ .
Repeating the previous argument but with Wt replaced by Xt = Wt+j , we
deduce that almost surely Xt has no derivative of absolute value at most β at any
s ∈ [0, 1). But to say that Xt has a derivative at s is to say that
Wt has a derivative at s + j and so it follows that with probability one, Wt
has no derivative of absolute value at most β in [j, j + 1). We have shown
that if
that if

    Cj = { ω : Wt (ω) has a derivative at some s ∈ [j, j + 1) with absolute value ≤ β }

then P (Cj ) = 0 and so P ( ⋃_{j∈Z+} Cj ) = 0, which means that, with probability
one, Wt has no derivative of absolute value ≤ β at any point in R+ . This holds
for any given β > 0.
To complete the proof, for m ∈ N, let

    Sm = { ω : Wt (ω) has a derivative at some s ∈ R+ with absolute value ≤ m } .


Then we have seen above that P (Sm ) = 0 and so P ( ⋃m Sm ) = 0. It follows
that, with probability one, Wt has no derivative anywhere.
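The scaling behind this result can be seen numerically: increments of the Wiener process over a step h have size of order √h, so difference quotients blow up like h^{−1/2} as h → 0. The following Python sketch is a purely numerical illustration and is not part of the proof; the grid size, the step sizes and the random seed are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# One simulated Wiener path on [0, 1] on a fine grid.
n_fine = 2**18
dt = 1.0 / n_fine
W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n_fine))))

# Difference quotients |W_{t+h} - W_t| / h at coarser and coarser steps h = 2^{-k}.
for k in (4, 8, 12, 16):
    step = n_fine // 2**k
    h = step * dt
    quot = np.abs(W[step::step] - W[:-step:step]) / h
    print(f"h = 2^-{k:2d}:  median |dW|/h = {np.median(quot):9.1f},  "
          f"max = {quot.max():9.1f}   (compare 1/sqrt(h) = {1/np.sqrt(h):9.1f})")
```

The quotients grow without bound as h shrinks, in line with the theorem.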


Itô integration
We wish to indicate how one can construct stochastic integrals with respect
to the Wiener process. The resulting integral is called the Itô integral. We
follow the strategy discussed earlier, namely, we first set-up the integral
for integrands which are step-functions. Next, we establish an appropriate
isometry property and it then follows that the definition can be extended
abstractly by continuity considerations.
We shall consider integration over the time interval [0, T ], where T > 0 is
fixed throughout.
For an elementary process h ∈ E, we define the stochastic integral ∫_0^T h dW by

    ∫_0^T h dW ≡ I(h)(ω) = Σ_{i=1}^n g_{i+1}(ω) ( Wti+1 (ω) − Wti (ω) ) ,

where h = Σ_{i=1}^n g_{i+1} 1_{(ti ,ti+1 ]} with 0 = t1 < · · · < tn+1 = T and where each g_{i+1}
is bounded and Fti -measurable. Now, we know that Wt² has Doob-Meyer
decomposition Wt2 = Mt + t, where (Mt ) is an L1 -martingale. Using this,
we can calculate E(I(h)2 ) as in the abstract set-up and we find that
    E( I(h)² ) = E( ∫_0^T h²(t, ω) dt ) ,

which is the isometry property for the Itô-integral.
For any 0 ≤ t ≤ T , we define the stochastic integral ∫_0^t h dW by

    ∫_0^t h dW = I(h 1_{(0,t]}) = Σ_{i=1}^n g_{i+1}(ω) ( Wti+1 ∧t (ω) − Wti ∧t (ω) )

and it is convenient to denote this by It (h). We see (as in the abstract


theory discussed earlier) that for any 0 ≤ s < t ≤ T
E(It (h) | Fs ) = Is (h) ,
that is, (It (h))0≤t≤T is an L2 -martingale.
Next, we wish to extend the family of allowed integrands. The right hand
side of the isometry property suggests the way forward. Let KT denote the
linear subspace of L2 ((0, T ] × Ω) of adapted processes f (t, ω) such that there
is some sequence (hn ) of elementary processes such that
    E( ∫_0^T ( f (s, ω) − hn (s, ω) )² ds ) → 0 .    (∗)

We construct the stochastic integral I(f ) (and It (f )) for any f ∈ KT via the
isometry property. Indeed, we have
    E( (I(hn ) − I(hm ))² ) = E( I(hn − hm )² ) = E( ∫_0^T (hn − hm )² ds ) .


But by (∗), (hn ) is a Cauchy sequence in KT (with respect to the norm
‖h‖KT = ( E( ∫_0^T h² ds ) )^{1/2}) and so (I(hn )) is a Cauchy sequence in L2 . It follows that
there is some F ∈ L2 (FT ) such that

    E( (F − I(hn ))² ) → 0 .

We denote F by I(f ) or by ∫_0^T f dWs . One checks that this construction
does not depend on the particular choice of the sequence (hn ) converging to
f in KT . The Itô-stochastic integral obeys the isometry property

    E( ( ∫_0^T f dWs )² ) = ∫_0^T E(f ²) ds

for any f ∈ KT .
For any f, g ∈ KT , we can apply the isometry property to f ± g to get
    E( (I(f ) ± I(g))² ) = E( (I(f ± g))² ) = E( ∫_0^T (f ± g)² ds ) .

On subtraction and division by 4, we find that


    E( I(f ) I(g) ) = E( ∫_0^T f (s) g(s) ds ) .

Replacing f by f 1(0,t] in the discussion above and using hn 1(0,t] ∈ E rather


than hn , we construct It (f ) for any 0 ≤ t ≤ T . Taking the limit in L2 , the
martingale property E(It (hn ) | Fs ) = Is (hn ) gives the martingale property
of It (f ), namely,

E(It (f ) | Fs ) = Is (f ) , 0≤s≤t≤T.
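This construction can be mimicked numerically: sample the adapted integrand at the left endpoints of a partition (an elementary approximation) and sum against the Wiener increments. The sketch below is only an illustration under these conventions, not part of the notes; it takes f (t, ω) = Wt (ω) and checks the isometry E( I(f )² ) = ∫_0^T E(Wt²) dt = T²/2 by Monte Carlo. All parameter values and the seed are chosen here.

```python
import numpy as np

rng = np.random.default_rng(1)

T, n_steps, n_paths = 1.0, 1000, 20000
dt = T / n_steps

# Wiener increments and paths: W[:, j] approximates W at time j*dt.
dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Elementary approximation of f(t, w) = W_t: left-endpoint values times increments.
I_f = np.sum(W[:, :-1] * dW, axis=1)          # ~ int_0^T W_s dW_s

print("E[I(f)]   ~", I_f.mean(), "   (martingale, so ~ 0)")
print("E[I(f)^2] ~", (I_f**2).mean(), " vs  int_0^T E[W_t^2] dt = T^2/2 =", T**2 / 2)
```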

For the following important result, we make a further assumption about


the filtration (Ft ).
Assumption: each Ft contains all events of probability zero.
Note that by taking complements, it follows that each Ft contains all
events with probability one.
Theorem 5.11. Let f ∈ KT . Then there is a continuous modification of It (f ),
0 ≤ t ≤ T , that is, there is an adapted process Jt (ω), 0 ≤ t ≤ T , for which
t 7→ Jt (ω) is continuous for all ω ∈ Ω and such that for each t, It (f ) = Jt ,
almost surely.
Proof. Let (hn ) in E be a sequence of approximations to f in KT , that is
    E( ∫_0^T (hn − f )² ds ) → 0


so that I(hn ) → I(f ) in L2 (and also It (hn ) → It (f ) in L2 ). It follows that


(I(hn )) is a Cauchy sequence in L2 ,
    ‖I(hn ) − I(hm )‖₂² = ∫_0^T E( (hn − hm )² ) ds → 0

as n, m → ∞. Hence, for any k ∈ N, there is Nk such that if n, m ≥ Nk


then
    ‖I(hn ) − I(hm )‖₂² < (1/2^k)(1/4^k) .

If we set nk = N1 + · · · + Nk , then clearly nk+1 > nk ≥ Nk and

    ‖I(h_{n_{k+1}}) − I(h_{n_k})‖₂² < (1/2^k)(1/4^k)

for k ∈ N. Now, the map t 7→ It (hn ) is almost surely continuous (because


this is true of the map t 7→ Wt ) and It (hn )−It (hm ) = It (hn −hm ) is an almost
surely continuous L2 -martingale. So by Doob’s Martingale Inequality, we
have
    P ( sup_{0≤t≤T} | It (hn − hm ) | > ε ) ≤ (1/ε²) ‖I(hn − hm )‖₂²

for any ε > 0. In particular, if we set ε = 1/2^k and denote by Ak the event

    Ak = { sup_{0≤t≤T} | It (h_{n_{k+1}} − h_{n_k}) | > 1/2^k }

then we get
    P (Ak ) ≤ (2^k)² ‖I(h_{n_{k+1}} − h_{n_k})‖₂² < (2^k)² (1/2^k)(1/4^k) = 1/2^k .
But then Σk P (Ak ) < ∞ and so by the Borel-Cantelli Lemma (Lemma 1.3),
it follows that

    P (B) = 0 ,   where B denotes the event { Ak infinitely-often } .
For ω ∈ B c , we must have

    sup_{0≤t≤T} | It (h_{n_{k+1}})(ω) − It (h_{n_k})(ω) | ≤ 1/2^k

for all k > k0 , where k0 may depend on ω. Hence, for j > k > k0 ,

    sup_{0≤t≤T} | It (h_{n_j})(ω) − It (h_{n_k})(ω) |
        ≤ sup_{0≤t≤T} | It (h_{n_j})(ω) − It (h_{n_{j−1}})(ω) |
          + sup_{0≤t≤T} | It (h_{n_{j−1}})(ω) − It (h_{n_{j−2}})(ω) |
          + · · · + sup_{0≤t≤T} | It (h_{n_{k+1}})(ω) − It (h_{n_k})(ω) |


        ≤ 1/2^{j−1} + 1/2^{j−2} + · · · + 1/2^k < 2/2^k .

This means that for each ω ∈ B c , the sequence of functions (It (hnk )(ω))
of t is a Cauchy sequence with respect to the norm kϕk = sup0≤t≤T |ϕ(t)|.
In other words, it is uniformly Cauchy and so must converge uniformly on
[0, T ] to some function of t, say Jt (ω). Now, for each k, there is a set Ek with
P (Ek ) = 1 such that if ω ∈ Ek then t 7→ It (hnk )(ω) is continuous on [0, T ].
Set E = ∩k Ek , so P (E) = 1. Then P (B c ∩ E) = 1 and if ω ∈ B c ∩ E then
t 7→ Jt (ω) is continuous on [0, T ]. We set Jt (ω) = 0 for all t if ω ∉ B c ∩ E,
which means that t 7→ Jt (ω) is continuous for all ω ∈ Ω.
However, It (hnk ) → It (f ) in L2 and so there is some subsequence (It (hnkj ))
such that It (hnkj )(ω) → It (f )(ω) almost surely, say, on St with P (St ) = 1.
But It (hnkj )(ω) → Jt (ω) on B c ∩E and so It (hnkj )(ω) → Jt (ω) on B c ∩E ∩St
and therefore Jt (ω) = It (f )(ω) for ω ∈ B c ∩E ∩St . Since P (B c ∩E ∩St ) = 1,
we may say that Jt = It (f ) almost surely.
We still have to show that the process (Jt ) is adapted. This is where we use
the hypothesis that Ft contains all events of zero probability. Indeed, by
construction, we know that

    Jt = lim_{k→∞} It (h_{n_k}) 1_{B c ∩ E} ,

where both It (h_{n_k}) and 1_{B c ∩ E} are Ft -measurable (the latter because B c ∩ E has probability one),

and so it follows that Jt is Ft -measurable and the proof is complete.

Chapter 6

Itô’s Formula

We have seen that E((Wt − Ws )2 ) = t − s for any 0 ≤ s ≤ t. In particular, if


t−s is small, then we see that (Wt −Ws )2 is of first order (on average) rather
than second order. This might suggest that we should not expect stochastic
calculus to be simply calculus with ω ∈ Ω playing a kind of parametric rôle.
Indeed, the Itô stochastic integral is not an integral in the usual sense.
It is constructed via limits of sums in which the integrator points to the
future. There is no reason to suppose that there is a stochastic fundamental
theorem of calculus. This, of course, makes it difficult to evaluate stochastic
integrals. After all, we know, for example, that the derivative of x3 is 3x2
and so the usual fundamental theorem of calculus tells us that the integral
of 3x2 is x3 . By differentiating many functions, one can consequently build
up a list of integrals. The following theorem allows a similar construction
for stochastic integrals.
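The heuristic that (Wt − Ws )² behaves like t − s can also be seen numerically: the sum of squared increments over a partition of [0, T ] settles down to T as the mesh shrinks. The following sketch is an illustration only (grid sizes and seed are choices made here); it computes Σj (∆j W )² for a single simulated path at several resolutions.

```python
import numpy as np

rng = np.random.default_rng(2)

T = 2.0
n_fine = 2**20
dt = T / n_fine
# One Wiener path on [0, T] sampled on a fine grid.
W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n_fine))))

# Quadratic variation over coarser sub-grids of the same path.
for k in (7, 10, 13, 16):
    n = 2**k
    step = n_fine // n
    incr = np.diff(W[::step])                 # increments over a grid of n steps
    print(f"n = {n:6d} steps:  sum (dW_j)^2 = {np.sum(incr**2):.4f}   (T = {T})")
```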

Theorem 6.1 (Itô’s Formula). Let F (t, x) be a function such that the partial
derivatives ∂t F and ∂xx F are continuous. Suppose that ∂x F (t, Wt ) ∈ KT .
Then
    F (T, WT ) = F (0, W0 ) + ∫_0^T ( ∂t F (t, Wt ) + ½ ∂xx F (t, Wt ) ) dt + ∫_0^T ∂x F (t, Wt ) dWt

almost surely.

Proof. Suppose first that F (t, x) is such that the partial derivatives ∂x F and
∂xx F are bounded on [0, T ]×R, say, |∂x F | < C and |∂xx F | < C. Let Ω0 ⊂ Ω
be such that P (Ω0 ) = 1 and t 7→ Wt (ω) is continuous for each ω ∈ Ω0 . Fix
ω ∈ Ω0 . Let tj^(n) = jT /n, so that 0 = t0^(n) < t1^(n) < · · · < tn^(n) = T partitions
the interval [0, T ] into n equal subintervals. Suppressing the n dependence,


let ∆j t = tj+1^(n) − tj^(n) and ∆j W = Wtj+1 (ω) − Wtj (ω). Then we have

    F (T, WT (ω)) − F (0, W0 (ω))
        = Σ_{j=0}^{n−1} [ F (tj+1 , Wtj+1 (ω)) − F (tj , Wtj (ω)) ]
        = Σ_{j=0}^{n−1} [ F (tj+1 , Wtj+1 (ω)) − F (tj , Wtj+1 (ω)) ]
          + Σ_{j=0}^{n−1} [ F (tj , Wtj+1 (ω)) − F (tj , Wtj (ω)) ]
        = Σ_{j=0}^{n−1} ∂t F (τj , Wtj+1 (ω)) ∆j t
          + Σ_{j=0}^{n−1} [ F (tj , Wtj+1 (ω)) − F (tj , Wtj (ω)) ] ,
              for some τj ∈ [tj , tj+1 ], by Taylor's Theorem (to 1st order),
        = Σ_{j=0}^{n−1} ∂t F (τj , Wtj+1 (ω)) ∆j t
          + Σ_{j=0}^{n−1} [ ∂x F (tj , Wtj (ω)) ∆j W + ½ ∂xx F (tj , zj ) (∆j W )² ] ,
              for some zj between Wtj (ω) and Wtj+1 (ω), by Taylor's Theorem (to 2nd order),
        ≡ Γ1 (n) + Γ2 (n) + Γ3 (n) .

We shall consider separately the behaviour of these three terms as n → ∞.


Consider first Γ1 (n). We write

    ∂t F (τj , Wtj+1 (ω)) ∆j t = [ ∂t F (τj , Wtj+1 (ω)) − ∂t F (tj+1 , Wtj+1 (ω)) ] ∆j t
                                + ∂t F (tj+1 , Wtj+1 (ω)) ∆j t .

By hypothesis, ∂t F (t, x) is continuous and so is uniformly continuous on any


rectangle in R+ × R. Also t 7→ Wt (ω) is continuous and so is bounded on
the interval [0, T ]; say, |Wt (ω)| ≤ M on [0, T ]. In particular, then, ∂t F (t, x)
is uniformly continuous on [0, T ] × [−M, M ] and so, for any given ε > 0,

| ∂t F (τj , Wtj+1 (ω)) − ∂t F (tj+1 , Wtj+1 (ω)) | < ε

for sufficiently large n (so that |τj − tj+1 | ≤ 1/n is sufficiently small).


Hence, for sufficiently large n,


    Γ1 (n) = Σ_{j=0}^{n−1} [ ∂t F (τj , Wtj+1 (ω)) − ∂t F (tj+1 , Wtj+1 (ω)) ] ∆j t
             + Σ_{j=0}^{n−1} ∂t F (tj+1 , Wtj+1 (ω)) ∆j t .

For large n, the first summation on the right hand side is bounded by
Σj ε ∆j t = ε T and, as n → ∞, the second summation converges to the
integral ∫_0^T ∂s F (s, Ws (ω)) ds. It follows that

    Γ1 (n) → ∫_0^T ∂s F (s, Ws (ω)) ds

almost surely as n → ∞.
Next, consider term Γ2 (n). This is
    Σ_{j=0}^{n−1} ∂x F (tj , Wtj (ω)) ∆j W = I(gn )(ω)

where gn ∈ E is given by

    gn (s, ω) = Σ_{j=0}^{n−1} ∂x F (tj , Wtj (ω)) 1_{(tj ,tj+1 ]}(s) .

Now, the function t 7→ ∂x F (t, Wt (ω)) is continuous (and so uniformly
continuous) on [0, T ] and gn (·, ω) → g(·, ω) uniformly on (0, T ], where
g(s, ω) = ∂x F (s, Ws (ω)). It follows that

    ∫_0^T |gn (s, ω) − g(s, ω)|² ds → 0

as n → ∞ for each ω ∈ Ω0 . But ∫_0^T |gn (s, ω) − g(s, ω)|² ds ≤ 4T C² and
therefore (since P (Ω0 ) = 1)
    E( ∫_0^T |gn − g|² ds ) → 0

as n → ∞, by Lebesgue’s Dominated Convergence Theorem.


Applying the Isometry Property, we see that

kI(gn ) − I(g)k2 → 0 ,

that is, Γ2 (n) = I(gn ) → I(g) in L2 and so there is a subsequence (gnk ) such
that I(gnk ) → I(g) almost surely. That is, Γ2 (nk ) → I(g) almost surely.


We now turn to term Γ3 (n) = ½ Σ_{j=0}^{n−1} ∂xx F (tj , zj )(∆j W )² . For any ω ∈ Ω,
write

    ½ ∂xx F (ti , zi )(∆i W )² = ½ ∂xx F (ti , Wi (ω)) ( (∆i W )² − ∆i t )
                                + ½ ∂xx F (ti , Wi (ω)) ∆i t
                                + ½ ( ∂xx F (ti , zi ) − ∂xx F (ti , Wi (ω)) ) (∆i W )² .

Summing over i, we write Γ3 (n) as

    Γ3 (n) ≡ Φ1 (n) + Φ2 (n) + Φ3 (n)


where Φ1 (n) = Σ_{i=0}^{n−1} ½ ∂xx F (ti , Wi (ω)) ( (∆i W )² − ∆i t ) etc. As before, we
see that for ω ∈ Ω0

    Φ2 (n) = Σ_{i=0}^{n−1} ½ ∂xx F (ti , Wi (ω)) ∆i t → ½ ∫_0^T ∂xx F (s, Ws (ω)) ds

as n → ∞. Hence Φ2 (nk ) → ½ ∫_0^T ∂xx F (s, Ws (ω)) ds almost surely as
k → ∞.
To discuss Φ1 (nk ), consider

    E(Φ1 (nk )²) = E( ( Σi αi )² ) = E( Σ_{i,j} αi αj )

where αi = ∂xx F (ti , Wi (ω)) ( (∆i W )² − ∆i t ) . By independence, if i < j,

    E(αi αj ) = E( αi ∂xx F (tj , Wj ) ) E( (∆j W )² − ∆j t ) ,   and the last factor is 0,

and so
    E(Φ1 (nk )²) = Σi E( αi² )
                 = Σi E( (∂xx F (ti , Wi ))² ( (∆i W )² − ∆i t )² )
                 = Σi E( (∂xx F (ti , Wi ))² ) E( ( (∆i W )² − ∆i t )² ) ,   by independence,
                 ≤ C² Σi E( ( (∆i W )² − ∆i t )² ) ,   using (∂xx F )² ≤ C²,
                 = C² Σi E( (∆i W )⁴ − 2 ∆i t (∆i W )² + (∆i t)² )
                 = C² Σi ( E( (∆i W )⁴ ) − (∆i t)² )

                 = C² Σi ( 3 (∆i t)² − (∆i t)² )
                 = C² Σi 2 (∆i t)²
                 = C² · 2 nk (T /nk )²
                 = 2 C² T ²/nk → 0 ,

as n → ∞. Hence Φ1 (nk ) → 0 in L2 .
Now we consider Φ3 (n) = ½ Σi ( ∂xx F (ti , zi ) − ∂xx F (ti , Wi (ω)) )(∆i W )² .
Fix ω ∈ Ω0 . Then (just as we have argued before), ∂xx F is uniformly continuous
on [0, T ] × [−M, M ], where |Wt (ω)| ≤ M on [0, T ], and the path t 7→ Wt (ω) is
uniformly continuous on [0, T ]. Since each zi lies between Wti (ω) and Wti+1 (ω),
it follows that for any given ε > 0,

    |∂xx F (ti , zi ) − ∂xx F (ti , Wi (ω))| < ε

for all 0 ≤ i ≤ n − 1, for sufficiently large n. Hence, for all sufficiently
large n,

    | Σ_{i=0}^{n−1} ( ∂xx F (ti , zi ) − ∂xx F (ti , Wi (ω)) )(∆i W )² | ≤ ε Sn    (∗)

where Sn = Σ_{i=0}^{n−1} (∆i W )² .
Claim: Sn → T in L2 as n → ∞.
To see this, we consider

    ‖Sn − T ‖₂² = E( (Sn − T )² )
                = E( (Sn − Σi ∆i t)² )
                = E( ( Σi { (∆i W )² − ∆i t } )² )
                = Σ_{i,j} E(βi βj ) ,   where βi = (∆i W )² − ∆i t,
                = Σi E(βi²) + Σ_{i≠j} E(βi βj )
                = Σi E(βi²) + Σ_{i≠j} E(βi ) E(βj ) ,   by independence,
                = Σi E(βi²) ,   since E(βi ) = 0,
                = Σi E( (∆i W )⁴ − 2 ∆i t (∆i W )² + (∆i t)² )
                = Σi 2 (∆i t)²
                = 2 T ²/n
                → 0

as n → ∞, which establishes the claim.


It follows that Snk → T in L2 and so there is a subsequence such that


both Φ1 (nm ) → 0 almost surely and Snm → T almost surely. It follows from
(∗) that
Φ3 (nm ) → 0
almost surely as m → ∞.
Combining these results, we see that

    Γ1 (nm ) + Γ2 (nm ) + Γ3 (nm )
        → ∫_0^T ∂s F (s, Ws (ω)) ds + ∫_0^T ∂x F (s, Ws ) dWs + ½ ∫_0^T ∂xx F (s, Ws (ω)) ds

almost surely, which concludes the proof for the case when ∂x F and ∂xx F
are both bounded. To remove this restriction, let ϕn : R → R be a smooth
function such that ϕn (x) = 1 for |x| ≤ n + 1 and ϕn (x) = 0 for |x| ≥ n + 2.
Set Fn (t, x) = ϕn (x) F (t, x). Then ∂x Fn and ∂xx Fn are continuous and
bounded on [0, T ] × R. Moreover, Fn = F , ∂t Fn = ∂t F , ∂x Fn = ∂x F and
∂xx Fn = ∂xx F whenever |x| ≤ n.
We can apply the previous argument to deduce that for each n there is
some set Bn ⊂ Ω with P (Bn ) = 1 such that
    Fn (T, WT ) = Fn (0, W0 ) + ∫_0^T ( ∂t Fn (t, Wt ) + ½ ∂xx Fn (t, Wt ) ) dt + ∫_0^T ∂x Fn (t, Wt ) dWt    (∗∗)

for ω ∈ Bn . Let A = Ω0 ∩ ( ∩n Bn ) so that P (A) = 1. Fix ω ∈ A. Then ω ∈ Ω0
and so t 7→ Wt (ω) is continuous. It follows that there is some N ∈ N such
that |Wt (ω)| < N for all t ∈ [0, T ] and so FN (t, Wt (ω)) = F (t, Wt (ω)) for
t ∈ [0, T ] and the same remark applies to the partial derivatives of FN and F .
But ω ∈ BN and so (∗∗) holds with Fn replaced by FN which in turn means
that (∗∗) holds with now FN replaced by F – simply because FN and F
(and the derivatives) agree for such ω and all t in the relevant range.
We conclude that
    F (T, WT ) = F (0, W0 ) + ∫_0^T ( ∂t F (t, Wt ) + ½ ∂xx F (t, Wt ) ) dt + ∫_0^T ∂x F (t, Wt ) dWt

for all ω ∈ A and the proof is complete.


Example 6.2. Let F (s, x) = s x. Then we find that


    F (t, Wt ) − F (0, W0 ) = t Wt − 0 = ∫_0^t Ws ds + ∫_0^t s dWs ,

that is,

    ∫_0^t s dWs = t Wt − ∫_0^t Ws ds .

Example 6.3. Let F (t, x) = x2 . Then Itô’s Formula tells us that


    Wt² = W0² + ½ ∫_0^t 2 ds + ∫_0^t 2 Ws dWs ,

that is,

    Wt² = t + 2 ∫_0^t Ws dWs .

This can also be expressed as

    ∫_0^t Ws dWs = ½ Wt² − ½ t .
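This identity is easy to check by simulation: left-endpoint sums Σj Wtj (Wtj+1 − Wtj ) converge to ½Wt² − ½t, whereas a trapezoidal (Stratonovich-type) sum telescopes exactly to ½Wt². The comparison and all numerical parameters below are choices made here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

t, n = 1.0, 100000
dt = t / n
dW = np.sqrt(dt) * rng.standard_normal(n)
W = np.concatenate(([0.0], np.cumsum(dW)))

ito_sum = np.sum(W[:-1] * dW)                       # left-endpoint (Ito) sum
trap_sum = np.sum(0.5 * (W[:-1] + W[1:]) * dW)      # trapezoidal sum, telescopes to W_t^2/2
print("Ito sum            :", ito_sum)
print("W_t^2/2 - t/2      :", 0.5 * W[-1]**2 - 0.5 * t)
print("Trapezoidal sum    :", trap_sum, "  ( = W_t^2/2 =", 0.5 * W[-1]**2, ")")
```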

Example 6.4. Let F (t, x) = sin x, so we find that ∂t F = 0, ∂x F = cos x and


∂xx F = − sin x. Applying Itô’s Formula, we get
    F (t, Wt ) − F (0, W0 ) = sin Wt = ∫_0^t cos(Ws ) dWs − ½ ∫_0^t sin(Ws ) ds

or

    ∫_0^t cos(Ws ) dWs = sin Wt + ½ ∫_0^t sin(Ws ) ds .

Example 6.5. Suppose that the function F (t, x) obeys ∂t F = − 21 ∂xx F . Then
if ∂x F (t, Wt ) ∈ KT , Itô’s formula gives
    F (T, WT ) = F (0, W0 ) + ∫_0^T ( ∂t F (t, Wt ) + ½ ∂xx F (t, Wt ) ) dt + ∫_0^T ∂x F (t, Wt ) dWt ,

where the first integrand vanishes (by the assumption on F ) and the stochastic integral,
as a function of its upper limit, is a martingale; hence (F (t, Wt )) is a martingale.
Taking F (t, x) = e^{αx − ½tα²}, we may say that e^{αWt − ½tα²} is a martingale.
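In particular the martingale property forces E( e^{αWt − ½α²t} ) = 1 for every t. A quick Monte Carlo check (the value of α, the time points and the seed are arbitrary choices made here):

```python
import numpy as np

rng = np.random.default_rng(4)

alpha, n_paths = 0.7, 10**6
for t in (0.5, 1.0, 2.0, 4.0):
    W_t = np.sqrt(t) * rng.standard_normal(n_paths)       # W_t ~ N(0, t)
    M_t = np.exp(alpha * W_t - 0.5 * alpha**2 * t)         # exponential martingale at time t
    print(f"t = {t}:  E[exp(a W_t - a^2 t/2)] ~ {M_t.mean():.4f}   (exact value 1)")
```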


Itô’s formula also holds when Wt is replaced by the somewhat more general
processes of the form
    Xt = X0 + ∫_0^t u(s, ω) ds + ∫_0^t v(s, ω) dWs

where u and v obey P ( ∫_0^t |u|² ds < ∞ ) = 1 and similarly for v. One has

    F (t, Xt ) − F (0, X0 ) = ∫_0^t { ∂s F (s, Xs ) + u ∂x F (s, Xs ) + ½ v² ∂xx F (s, Xs ) } ds
                              + ∫_0^t v(s, ω) ∂x F (s, Xs ) dWs .

In symbolic differential form, this can be written as

    dF (t, Xt ) = ∂t F dt + u ∂x F dt + ½ v² ∂xx F dt + v ∂x F dW .    (∗)

On the other hand, on the grounds that dt dX and dt dt are negligible, we


might also write

    dF (t, X) = ∂t F dt + ∂x F dX + ½ ∂xx F dX dX .    (∗∗)

However, in the same spirit, we may write

dX = u dt + v dW

which leads to

    dX dX = u² dt² + 2uv dt dW + v² dW dW ,   with dt² = 0 and dt dW = 0.

But we know that E((∆W )2 ) = ∆t and so if we replace dW dW by dt and


substitute these expressions for dX and dX dX into (∗∗), then we recover
the expression (∗).
It would appear that standard practice is to manipulate such symbolic
differential expressions (rather than the proper integral equation form) in
conjunction with the following Itô table:

Itô table

dt2 = 0
dt dW = 0
dW 2 = dt


Example 6.6. Consider Zt = e^{∫_0^t g dW − ½ ∫_0^t g² ds} . Let

    Xt = X0 + ∫_0^t g dW − ½ ∫_0^t g² ds

so that dX = −½ g² ds + g dW . Then, with F (t, x) = ex , we have ∂t F = 0,
∂x F = ex and ∂xx F = ex so that

    dF = ∂t F (X) dt + ∂x F (X) dX + ½ ∂xx F (X) dX dX
       = 0 + e^X ( −½ g² dt + g dW ) + ½ e^X dX dX ,   where dX dX = g² dt,
       = e^X g dW .

We deduce that F (Xt ) = eXt = Zt obeys dZ = eX g dW = Z g dW , so that


    Zt = Z0 + ∫_0^t g Zs dWs .

It can be shown that this process is a martingale for suitable g – such a
condition is given by the requirement that E( e^{½ ∫_0^T g² ds} ) < ∞, the so-called
Novikov condition.
Example 6.7. Suppose that dX = u dt + dW , let Mt = e^{−∫_0^t u dW − ½ ∫_0^t u² ds} and
consider the process Yt = Xt Mt . Then

    dYt = Xt dMt + Mt dXt + dXt dMt .

The question now is what is dMt ? Let dVt = −½ u² dt − u dWt so that
Mt = e^{Vt} . Applying Itô's formula (equation (∗) above) with F (t, x) = ex
(so ∂t F = 0 and ∂x F = ∂xx F = ex and F (Vt ) = Mt ), we obtain

    dMt = dF (Vt ) = ( −½ u² e^V + ½ u² e^V ) ds − u e^V dW = −u e^{Vt} dWt .

So

    dYt = −Xt u e^{Vt} dWt + Mt (u dt + dWt ) − u e^{Vt} dWt (u dt + dWt )
        = −u Xt Mt dWt + u Mt dt + Mt dWt − u e^{Vt} dt
        = (−u Xt Mt + Mt ) dWt ,   since Mt = e^{Vt} ,

or, in integral form,

    Yt = Y0 + ∫_0^t (Ms − u Xs Ms ) dWs

which is a martingale.


Remark 6.8. There is also an m-dimensional version of Itô’s formula. Let


(W (1) , W (2) , . . . , W (m) ) be an m-dimensional Wiener process (that is, the
W (j) are independent Wiener processes) and suppose that

dX1 = u1 dt + v11 dW (1) + · · · + v1m dW (m)


..
.
dXn = un dt + vn1 dW (1) + · · · + vnm dW (m)

and let Yt (ω) = F (t, X1 (ω), . . . , Xn (ω)). Then


    dY = ∂t F (t, X) dt + Σ_{i=1}^n ∂xi F (t, X) dXi + ½ Σ_{i,j} ∂xi xj F (t, X) dXi dXj

where the Itô table is enhanced by the extra rule that dW (i) dW (j) = δij dt.

Stochastic Differential Equations


The purpose of this section is to illustrate how Itô’s formula can be used
to obtain solutions to so-called stochastic differential equations. These are
usually obtained from physical considerations in essentially the same way
as are “usual” differential equations but when a random influence is to be
taken into account. Typically, one might wish to consider the behaviour of
a certain quantity as time progresses. Then one theorizes that the change of
such a quantity over a small period of time is approximately given by such
and such terms (depending on the physical problem under consideration)
plus some extra factor which is supposed to take into account some kind of
random influence.

Example 6.9. Consider the “stochastic differential equation”

dXt = µ Xt dt + σ Xt dWt (∗)

with X0 = x0 > 0. Here, µ and σ are constants with σ > 0. The quantity
Xt is the object under investigation and the second term on the right is
supposed to represent some random input to the change dXt . Of course,
mathematically, this is just a convenient shorthand for the corresponding
integral equation. Such an equation has been used in financial mathematics
(to model a risky asset).
To solve this stochastic differential equation, let us seek a solution of the
form Xt = f (t, Wt ) for some suitable function f (t, x). According to Itô’s
formula, such a process would satisfy

    dXt = df = ( ft (t, Wt ) + ½ fxx (t, Wt ) ) dt + fx (t, Wt ) dWt    (∗∗)


Comparing coefficients in (∗) and (∗∗), we find


    µ f (t, Wt ) = ft (t, Wt ) + ½ fxx (t, Wt )
    σ f (t, Wt ) = fx (t, Wt ) .

So we try to find a function f (t, x) satisfying

    µ f (t, x) = ft (t, x) + ½ fxx (t, x)    (i)
    σ f (t, x) = fx (t, x)                   (ii)

Equation (ii) leads to f (t, x) = e^{σx} C(t). Substitution into equation (i)
requires C(t) = e^{t(µ − ½σ²)} C(0), which leads to the solution

    f (t, x) = C(0) e^{σx + t(µ − ½σ²)} .

But then X0 = f (0, W0 ) = f (0, 0) = C(0) so that C(0) = x0 and we have
found the required solution

    Xt = x0 e^{σ Wt + t(µ − ½σ²)} .
From this, we can calculate the expectation E(Xt ). We find
    E(Xt ) = x0 e^{tµ} E( e^{σWt − ½tσ²} ) = x0 e^{tµ} ,   since E( e^{σWt − ½tσ²} ) = 1,
and so we see that if µ > 0 then the expectation E(Xt ) grows exponentially
as t → ∞.
However, we can write
    Wn /n = [ (Wn − Wn−1 ) + (Wn−1 − Wn−2 ) + · · · + (W1 − W0 ) ] / n = (Z1 + Z2 + · · · + Zn )/n
where Z1 , . . . , Zn are independent, identically distributed random variables
with mean zero (in fact, standard normal). So we can apply the Law of
Large Numbers to deduce that
    P ( Wn /n → 0 as n → ∞ ) = 1 .
Now, we have
    Xn = x0 e^{σWn + n(µ − ½σ²)} = x0 e^{n ( σ Wn /n − (½σ² − µ) )} .
If σ² > 2µ, the right hand side above → 0 almost surely as n → ∞. So, for 0 < 2µ < σ²,
Xn has the somewhat surprising behaviour that Xn → 0 with probability one, whereas
E(Xn ) → ∞ (exponentially quickly).
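This behaviour is easy to observe in simulation: with 0 < 2µ < σ² almost every path of the exact solution decays, while the sample mean is dominated by a few enormous paths. A rough sketch follows; the parameter values, time points and seed are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

x0, mu, sigma = 1.0, 0.1, 1.0          # note 2*mu < sigma**2
n_paths = 100000
for t in (1.0, 10.0, 50.0):
    W_t = np.sqrt(t) * rng.standard_normal(n_paths)
    X_t = x0 * np.exp(sigma * W_t + (mu - 0.5 * sigma**2) * t)   # exact solution
    print(f"t = {t:5.1f}:  median X_t = {np.median(X_t):.3e},  "
          f"sample mean = {X_t.mean():.3e}   (exact mean = {x0*np.exp(mu*t):.3e})")
```

At large times the sample mean is a poor estimate of the exact mean, precisely because the expectation is carried by very rare, very large paths.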


Example 6.10. Ornstein-Uhlenbeck Process


Consider
dXt = −α Xt dt + σ dWt
with X0 = x0 , where α and σ are positive constants. We look for a solution
of the form Xt = g(t) Yt where dYt = h(t) dWt . In differential form, this
becomes (using the Itô table)

dXt = g dYt + dg Yt + dg dYt


= gh dWt + g ′ Yt dt + 0 .

Comparing this with the original equation, we require

g ′ Y = −αg Y
gh = σ .

The first of these equations leads to the solution g(t) = C e−αt . This gives
h = σ/g = σ e^{αt}/C and so dY = (σ/C) e^{αt} dW , or

    Yt = Y0 + (σ/C) ∫_0^t e^{αs} dWs .

Hence
    Xt = C e^{−αt} ( Y0 + (σ/C) ∫_0^t e^{αs} dWs )
       = e^{−αt} ( C Y0 + σ ∫_0^t e^{αs} dWs ) .

When t = 0, X0 = x0 and so x0 = CY0 and we finally obtain the solution


    Xt = x0 e^{−αt} + σ ∫_0^t e^{α(s−t)} dWs .
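For simulation one usually discretizes the equation directly; the Euler-Maruyama scheme X_{k+1} = X_k − αX_k ∆t + σ ∆W_k is the obvious choice, and its output can be compared with the mean x0 e^{−αt} and the variance σ²(1 − e^{−2αt})/(2α) read off from the explicit solution (the variance via the isometry property). The sketch below is only an illustration; the scheme, step size and parameter values are choices made here and are not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(6)

x0, alpha, sigma = 2.0, 1.5, 0.8
T, n_steps, n_paths = 3.0, 3000, 50000
dt = T / n_steps

# Euler-Maruyama for dX = -alpha X dt + sigma dW, all paths updated at once.
X = np.full(n_paths, x0)
for _ in range(n_steps):
    X = X - alpha * X * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

print("sample mean    :", X.mean(), "  vs  x0 e^{-aT}              =", x0 * np.exp(-alpha * T))
print("sample variance:", X.var(), "  vs  s^2 (1 - e^{-2aT})/(2a) =",
      sigma**2 * (1 - np.exp(-2 * alpha * T)) / (2 * alpha))
```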

Example 6.11. Let (W^(1) , W^(2) ) be a two-dimensional Wiener process and
let Mt = ϕ(Wt^(1) , Wt^(2) ). Let f (t, x, y) = ϕ(x, y) so that ∂t f = 0. Itô's
formula then says that

    dMt = df = ∂x ϕ(Wt^(1) , Wt^(2) ) dWt^(1) + ∂y ϕ(Wt^(1) , Wt^(2) ) dWt^(2)
               + ½ ( ∂x² ϕ(Wt^(1) , Wt^(2) ) + ∂y² ϕ(Wt^(1) , Wt^(2) ) ) dt .

In particular, if ϕ is harmonic (so ∂x² ϕ + ∂y² ϕ = 0) then

    dMt = ∂x ϕ(Wt^(1) , Wt^(2) ) dWt^(1) + ∂y ϕ(Wt^(1) , Wt^(2) ) dWt^(2)

and so Mt = ϕ(Wt^(1) , Wt^(2) ) is a martingale.
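As a concrete illustration, ϕ(x, y) = x² − y² is harmonic, so E ϕ(Wt^(1), Wt^(2)) should remain equal to ϕ(0, 0) = 0 for all t, while the non-harmonic choice ϕ(x, y) = x² + y² picks up the drift 2t. The two test functions and all numerical parameters below are chosen here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

n_paths = 10**6
for t in (0.5, 1.0, 2.0):
    W1 = np.sqrt(t) * rng.standard_normal(n_paths)   # W_t^(1)
    W2 = np.sqrt(t) * rng.standard_normal(n_paths)   # W_t^(2), independent of W1
    print(f"t = {t}:  E[W1^2 - W2^2] ~ {np.mean(W1**2 - W2**2):+.4f}  (harmonic, stays 0),  "
          f"E[W1^2 + W2^2] ~ {np.mean(W1**2 + W2**2):.4f}  (= 2t = {2*t})")
```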


Feynman-Kac Formula
Consider the diffusion equation
    ut (t, x) = ½ uxx (t, x)

with u(0, x) = f (x) (where f is well-behaved). The solution can be written as

    u(t, x) = (1/√(2πt)) ∫_{−∞}^{∞} f (v) e^{−(v−x)²/2t} dv .

However, the right hand side here is equal to the expectation E( f (x + Wt ) )


and x + Wt is a Wiener process started at x, rather than at 0.

Theorem 6.12 (Feynman-Kac). Let q : R → R and f : R → R be bounded.


Then (the unique) solution to the initial-value problem
    ∂t u(t, x) = ½ ∂xx u(t, x) + q(x) u(t, x) ,

with u(0, x) = f (x), has the representation

    u(t, x) = E( f (x + Wt ) e^{∫_0^t q(x+Ws ) ds} ) .

Proof. Consider Itô's formula applied to the function g(s, y) = u(t − s, x + y).
In this case ∂s g = −∂1 u = −u1 , ∂y g = ∂2 u = u2 , ∂yy g = ∂22 u = u22 and so
we get

    dg(s, Ws ) = −u1 (t − s, x + Ws ) ds + ½ u22 (t − s, x + Ws ) dWs dWs + u2 (t − s, x + Ws ) dWs
               = −q(x + Ws ) u(t − s, x + Ws ) ds + u2 (t − s, x + Ws ) dWs ,

where we have used dWs dWs = ds and the partial differential equation satisfied by u.


For 0 ≤ s ≤ t, set
    Ms = u(t − s, x + Ws ) e^{∫_0^s q(x+Wv ) dv} .

Applying the rule d(XY ) = (dX) Y + X dY + dX dY together with the Itô


table, we get
    dMs = dg · e^{∫_0^s q(x+Wv ) dv} + g · d( e^{∫_0^s q(x+Wv ) dv} ) + dg · d( e^{∫_0^s q(x+Wv ) dv} )
        = −q(x + Ws ) u(t − s, x + Ws ) e^{∫_0^s q(x+Wv ) dv} ds
          + u2 (t − s, x + Ws ) e^{∫_0^s q(x+Wv ) dv} dWs
          + g q(x + Ws ) e^{∫_0^s q(x+Wv ) dv} ds + 0
        = u2 (t − s, x + Ws ) e^{∫_0^s q(x+Wv ) dv} dWs ,

since g = u(t − s, x + Ws ), so that the two ds terms cancel.


It follows that
    Mτ = M0 + ∫_0^τ u2 (t − s, x + Ws ) e^{∫_0^s q(x+Wv ) dv} dWs

and so (Mτ )0≤τ≤t is a martingale and, in particular, E(Mt ) = E(M0 ). However,
by construction, M0 = u(t, x) almost surely and so E(M0 ) = u(t, x) and

    E(Mt ) = E( u(0, x + Wt ) e^{∫_0^t q(x+Wv ) dv} ) = E( f (x + Wt ) e^{∫_0^t q(x+Wv ) dv} ) ,

by the initial condition. The result follows.
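The representation lends itself to Monte Carlo: simulate paths x + Ws on a time grid, approximate ∫_0^t q(x + Ws ) ds by a Riemann sum and average f (x + Wt ) times the exponential weight. The sketch below is only an illustrative estimator written here (the function name, discretization and test data are all choices made for this sketch); it is checked against a case where the answer is known in closed form, namely constant q ≡ c and f (x) = cos x, for which u(t, x) = e^{(c − ½)t} cos x.

```python
import numpy as np

rng = np.random.default_rng(8)

def feynman_kac(f, q, t, x, n_steps=200, n_paths=200000, rng=rng):
    """Monte Carlo estimate of E[ f(x + W_t) * exp( int_0^t q(x + W_s) ds ) ]."""
    dt = t / n_steps
    pos = np.full(n_paths, float(x))
    q_integral = np.zeros(n_paths)
    for _ in range(n_steps):
        q_integral += q(pos) * dt                          # left-endpoint Riemann sum
        pos += np.sqrt(dt) * rng.standard_normal(n_paths)  # advance the Wiener paths
    return np.mean(f(pos) * np.exp(q_integral))

c, t, x = 0.3, 1.0, 0.5
estimate = feynman_kac(np.cos, lambda y: c * np.ones_like(y), t, x)
exact = np.exp((c - 0.5) * t) * np.cos(x)    # closed form for constant q = c, f = cos
print("Monte Carlo:", estimate, "   exact:", exact)
```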

Martingale Representation Theorems


Let (Fs ) be the standard Wiener process filtration (the minimal filtration
generated by the Wt ’s).

Theorem 6.13 (L2 -martingale representation theorem). Let X ∈ L2 (Ω, FT , P ).


Then there is f ∈ KT (unique almost surely) such that
    X = α + ∫_0^T f (s, ω) dWs

where α = E(X).

If (Xt )t≤T is an L2 -martingale, then there is f ∈ KT (unique almost surely)


such that

    Xt = X0 + ∫_0^t f dWs
for all 0 ≤ t ≤ T .

We will not prove this here but will just make some remarks. To say
that X in the Hilbert space L2 (FT ) obeys E(X) = 0 is to say that X is
orthogonal to 1, so the theorem says that every element of L2 (FT ) orthogonal
to 1 is a stochastic integral. The uniqueness of f ∈ KT in the theoremRT is
a consequence of the isometry property. Moreover, if we denote 0 f dW
by I(f ), then the isometry property tells us that f 7→ I(f ) is an isometric
isomorphism between KT and the subspace of L2 (FT ) orthogonal to 1.
Note, further, that the second part of the theorem follows from the first
part by setting X = XT . Indeed, by the first part, XT = α + ∫_0^T f dW and so

    Xt = E(XT | Ft ) = α + ∫_0^t f dWs
for 0 ≤ t ≤ T . Evidently, α = X0 = E(XT ).


Recall that if dXt = u dt + v dWt then Itô’s formula is


df (t, Xt ) = ∂t f (t, Xt ) dt + 12 ∂xx f (t, Xt ) dXt dXt + ∂x f (t, Xt ) dXt
where, according to the Itô table, dXt dXt = v 2 dt.
Set u = 0 and let f (t, x) = x2 . Then ∂t f = 0, ∂x f = 2x and ∂xx f = 2 so
that
    d(Xt²) = 0 + dXt dXt + 2Xt dXt ,   where dXt dXt = v² dt.

That is,

    Xt² = X0² + ∫_0^t 2Xs dXs + ∫_0^t v² ds .
But now dXs = u ds + v dWs = 0 + v dWs and Xs is a martingale and
    Xt² = X0² + ∫_0^t 2Xs v dWs + ∫_0^t v² ds ,

where the first integral is the martingale part, Zt , and the second is the increasing
process part, At . This is the Doob-Meyer decomposition of the submartingale (Xt²).
Note that Xt² − ∫_0^t v² ds is the martingale part. If v = 1, then ∫_0^t v² ds = t
and so this says that Xt2 − t is the martingale part. But if v = 1, we have
dX = v dW = dW so that Xt = X0 + Wt and in this case (Xt ) is a Wiener
process.
The converse is true (Lévy’s characterization) as we show next using the
following result.
Proposition 6.14. Let G ⊂ F be σ-algebras. Suppose that X is F-measurable
and for each θ ∈ R
    E( e^{iθX} | G ) = e^{−θ²σ²/2}
almost surely. Then X is independent of G and X has a normal distribution
with mean 0 and variance σ 2 .
Proof. For any A ∈ G,
    E( e^{iθX} 1A ) = E( e^{−θ²σ²/2} 1A ) = e^{−θ²σ²/2} E(1A ) = e^{−θ²σ²/2} P (A) .

In particular, with A = Ω, we see that

    E( e^{iθX} ) = e^{−θ²σ²/2}
and so it follows that X is normal with mean zero and variance σ 2 .
To show that X is independent of G, suppose that A ∈ G with P (A) > 0.
Define Q on F by
    Q(B) = P (B ∩ A)/P (A) = E(1A 1B )/P (A) .


Then Q is a probability measure on F and


    EQ (e^{iθX} ) = ∫_Ω e^{iθX} dQ = ∫_Ω e^{iθX} (1A /P (A)) dP = E( e^{iθX} 1A )/P (A) = e^{−θ²σ²/2}

from the above. It follows that X also has a normal distribution, with mean
zero and variance σ 2 with respect to Q. Hence, if Φ denotes the standard
normal distribution function, then for any x ∈ R we have

    Q(X ≤ x) = Φ(x/σ)  =⇒  P ({ X ≤ x } ∩ A)/P (A) = Φ(x/σ) = P (X ≤ x)
                        =⇒  P ({ X ≤ x } ∩ A) = P (X ≤ x) P (A)

for any A ∈ G with P (A) ≠ 0. This trivially also holds for A ∈ G with
P (A) = 0 and so we may conclude that this holds for all A ∈ G which means
that X is independent of G.

Using this, we can now discuss Lévy’s theorem where as before, we work
with the minimal Wiener filtration.

Theorem 6.15 (Lévy’s characterization of the Wiener process).


Let (Xt )t≤T be a (continuous) L2 -martingale with X0 = 0 and such that
Xt2 − t is an L1 -martingale. Then (Xt ) is a Wiener process.

Proof. By the L2 -martingale representation theorem, there is β ∈ KT such


that

    Xt = ∫_0^t β(s, ω) dWs .

We have seen that d(Xt2 ) = dZt + β 2 dt where (Zt ) is a martingale and so by


hypothesis and the uniqueness of the Doob-Meyer decomposition d(X 2 ) =
dZ + dA, it follows that β 2 dt = dt (the increasing part is At = t).
Let f (t, x) = eiθx so that ∂t f = 0, ∂x f = iθ eiθx and ∂xx f = −θ2 eiθx and
apply Itô’s formula to f (t, Xt ), where now dX = β dW , to get

    d(e^{iθXt}) = −½ θ² e^{iθXt} dXt dXt + iθ e^{iθXt} dXt ,

where dXt dXt = β² dt = dAt = dt and dXt = β dWt .


So

    e^{iθXt} − e^{iθXs} = −½ θ² ∫_s^t e^{iθXu} du + iθ ∫_s^t e^{iθXu} β dWu

and therefore

    e^{iθ(Xt −Xs )} − 1 = −½ θ² ∫_s^t e^{iθ(Xu −Xs )} du + iθ ∫_s^t e^{iθ(Xu −Xs )} β dWu    (∗)

Let Yt = ∫_0^t e^{iθXu} β dWu . Then (Yt ) is a martingale and the second term on
the right hand side of equation (∗) is equal to iθ e−iθXs (Yt − Ys ). Taking the
conditional expectation with respect to Fs , this term then drops out and we
find that
    E( e^{iθ(Xt −Xs )} | Fs ) − 1 = −½ θ² ∫_s^t E( e^{iθ(Xu −Xs )} | Fs ) du .

Letting ϕ(t) = E(eiθ(Xt −Xs ) | Fs ), we have


    ϕ(t) − 1 = −½ θ² ∫_s^t ϕ(u) du
        =⇒ ϕ′(t) = −½ θ² ϕ(t) and ϕ(s) = 1
        =⇒ ϕ(t) = e^{−(t−s)θ²/2} ,   the constant being fixed by ϕ(s) = 1,

for t ≥ s. The result now follows from the proposition.

Cameron-Martin, Girsanov change of measure


The process Wt +µt is a Wiener process “with drift” and is not a martingale.
However, it becomes a martingale if the underlying probability measure is
changed. To discuss this, we consider the process
    Xt = Wt + ∫_0^t µ(s, ω) ds

where µ is bounded and adapted on [0, T ]. Let Mt = e^{−∫_0^t µ dW − ½ ∫_0^t µ² ds} .
Claim: (Mt )t≤T is a martingale.

Proof. Apply Itô's formula to f (t, Yt ) where f (t, x) = ex and where Yt is the
process Yt = −∫_0^t µ dW − ½ ∫_0^t µ² ds, so Mt = e^{Yt} and dY = −µ dW − ½ µ² ds.
We see that ∂t f = 0 and ∂x f = ∂xx f = ex and therefore

    dM = d(e^Y ) = 0 + ½ e^Y dY dY + e^Y dY
       = ½ e^Y µ² ds + e^Y ( −µ dW − ½ µ² ds )
       = −µ e^Y dW

and so Mt = eYt is a martingale, as claimed.


Claim: (Xt Mt )t≤T is a martingale.

Proof. Itô’s product formula gives


    d(XM ) = (dX) M + X dM + dX dM
           = M (dW + µ ds) + X(−µM dW ) + (−µM ) ds
           = (M − µXM ) dW

and therefore

    Xt Mt = X0 M0 + ∫_0^t (M − µXM ) dW
is a martingale, as required.

Theorem 6.16. Let Q be the measure on F given by Q(A) = E(1A MT ) for


A ∈ F. Then Q is a probability measure and (Xt )t≤T is a martingale with
respect to Q.
Proof. Since Q is the map A 7→ Q(A) = ∫_A MT dP on F, it is clearly a
measure. Furthermore, Q(Ω) = E(MT ) = E(M0 ) since Mt is a martingale.
But E(M0 ) = 1 and so we see that Q is a probability measure on F.
We can write Q(A) = ∫_Ω 1A MT dP and therefore ∫_Ω f dQ = ∫_Ω f MT dP
for Q-integrable f (symbolically, dQ = MT dP .) (Note, incidentally, that
Q(A) = 0 if and only if P (A) = 0.)
To show that Xt is a martingale with respect to Q, let 0 ≤ s ≤ t ≤ T
and let A ∈ Fs . Then, using the facts shown above that Mt and Xt Mt are
martingales, we find that
    ∫_A Xt dQ = ∫_A Xt MT dP
              = ∫_A E(Xt MT | Ft ) dP ,   since A ∈ Fs ⊂ Ft ,
              = ∫_A Xt Mt dP ,            since E(MT | Ft ) = Mt ,
              = ∫_A Xs Ms dP ,            since Xt Mt is a martingale and A ∈ Fs ,
              = ∫_A E(Xs MT | Fs ) dP
              = ∫_A Xs MT dP
              = ∫_A Xs dQ

and so

    EQ (Xt | Fs ) = Xs
and the proof is complete.
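A numerical sanity check of this change of measure, at the terminal time and with a constant drift µ (both simplifications chosen here for illustration): under P the reweighted average E(XT MT ) should equal X0 = 0, even though E(XT ) = µT ≠ 0, and E(MT ) should equal 1.

```python
import numpy as np

rng = np.random.default_rng(9)

mu, T, n_paths = 0.8, 1.0, 10**6
W_T = np.sqrt(T) * rng.standard_normal(n_paths)

X_T = W_T + mu * T                               # Wiener process with drift, at time T
M_T = np.exp(-mu * W_T - 0.5 * mu**2 * T)        # density of Q with respect to P

print("E_P[X_T]     ~", X_T.mean(), "   (drift visible under P: mu*T =", mu * T, ")")
print("E_P[M_T]     ~", M_T.mean(), "   (Q is a probability measure: should be 1)")
print("E_P[X_T M_T] ~", (X_T * M_T).mean(), "   (X is a Q-martingale: E_Q[X_T] = X_0 = 0)")
```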

Textbooks on Probability and Stochastic Analysis

There are now many books available covering the highly technical mathe-
matical subject of probability and stochastic analysis. Some of these are
very instructional.

L. Arnold, Stochastic Differential Equations, Theory and Applications, John


Wiley, 1974. Very nice — user friendly.

R. Ash, Real Analysis and Probability, Academic Press, 1972. Excellent for
background probability theory and functional analysis.

L. Breiman, Probability, Addison-Wesley, 1968. Very good background


reference for probability theory.

P. Billingsley, Probability and Measure, 3rd edition, John Wiley, 1995.


Excellent, well-written and highly recommended.

N. H. Bingham and R. Kiesel, Risk-Neutral Valuation, Pricing and Hedging


of Financial Derivatives, Springer, 1998. Useful reference for those interested
in applications in mathematical finance, but, as the authors point out, many
proofs are omitted.

Z. Brzeźniak and Tomasz Zastawniak, Basic Stochastic Processes, SUMS,


Springer, 1999. Absolutely first class textbook — written to be understood
by the reader. This book is a must.

K. L. Chung and R. J. Williams, Introduction to Stochastic Integration,


2nd edition, Birkhäuser, 1990. Advanced monograph with many references to
the first author’s previous works.

R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury


Press, 1996. The author clearly enjoys his subject — but be prepared to fill
in many gaps (as exercises) for yourself.


R. Durrett, Stochastic Calculus, A Practical Introduction, CRC Press, 1996.


Good reference book — with constant reference to the author’s other book
(and with many exercises).

R. J. Elliott, Stochastic Calculus and Applications, Springer, 1982. Advanced


no-nonsense monograph.

R. J. Elliott and P. E. Kopp, Mathematics of Financial Markets, Springer,


1999. For the specialist in financial mathematics.

T. Hida, Brownian Motion, Springer, 1980. Especially concerned with


generalized stochastic processes.

N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion


Processes, North-Holland, 2nd edition, 1989. Advanced monograph.

I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus,


2nd edition, Springer, 1991. Many references to other texts and with proofs
left to the reader.

F. Klebaner, Introduction to Stochastic Calculus with Applications, Imperial


College Press, 1998. User-friendly — but lots of gaps.

P. E. Kopp, Martingales and Stochastic Integrals, Cambridge University


Press, 1984. For those who enjoy functional analysis.

R. Korn and E. Korn, Option Pricing and Portfolio Optimization, Modern


Methods of Financial Mathematics, American Mathematical Society, 2000.
Very good — recommended.

D. Lamberton and B. Lapeyre, Introduction to Stochastic Calculus Applied


to Finance, Chapman and Hall/CRC 1996. For the practitioners.

R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes I:
General Theory, 2nd edition, Springer, 2001. Advanced treatise.

X. Mao, Stochastic Differential Equations and their Applications, Horwood


Publishing, 1997. Good — but sometimes very brisk.

A. V. Mel'nikov, Financial Markets, Stochastic Analysis and the Pricing


of Derivative Securities, American Mathematical Society, 1999. Well worth a
look.


P. A. Meyer, Probability and Potentials, Blaisdell Publishing Company, 1966.


Top-notch work from one of the masters (but not easy).

T. Mikosch, Elementary Stochastic Calculus with Finance in View, World


Scientific, 1998. Excellent — a very enjoyable and accessible account.

B. Øksendal, Stochastic Differential Equations, An Introduction with


Applications, 5th edition, Springer, 2000. Good — but there are many results
of interest which only appear as exercises.

D. Revuz and M. Yor, Continuous martingales and Brownian motion,


Springer, 1991. Advanced treatise.

J. M. Steele, Stochastic calculus and Financial Applications, Springer, 2001.


Very good — recommended reading.

H. von Weizsäcker and G. Winkler, Stochastic Integrals, Vieweg and Son,


1990. Advanced mathematical monograph.

D. Williams, Diffusions, Markov Processes and Martingales, John Wiley,


1979. Very amusing — but expect to look elsewhere if you would like detailed
explanations.

D. Williams, Probability with Martingales, Cambridge University Press, 1991.


Very good, with proofs of lots of background material — recommended.

J. Yeh, Martingales and Stochastic Analysis, World Scientific, 1995. Excellent


— very well-written careful logical account of the theory. This book is like
a breath of fresh air for those interested in the mathematics.
