Stochastic Analysis Notes
Ivan F Wilde
Chapter 1
Introduction
2. If A1 , A2 , . . . is any sequence of pairwise disjoint subsets of R, then
m( ∪n An ) = Σn m(An ) .
One such is Q ∩ [0, 1). No other equivalence class can contain a rational.
Indeed, if a class contains one rational, then it must contain all the rationals
which are in [0, 1). Choose one point from each equivalence class to form the
set A0 , say. Then no two elements of A0 are equivalent and every equivalence
class has a representative which belongs to A0 .
For each q ∈ Q ∩ [0, 1), we construct a subset Aq of [0, 1) as follows. Let
Aq = { x + q − [x + q] : x ∈ A0 } where [t] denotes the integer part of the
real number t. In other words, Aq is got from A0 + q by translating that
portion of A0 + q which lies in the interval [1, 2) back to [0, 1) simply by
subtracting 1 from each such element.
If we write A0 = Bq′ ∪ Bq′′ where Bq′ = { x ∈ A0 : x + q < 1 } and
Bq′′ = A0 \ Bq′ , then we see that Aq = (Bq′ + q) ∪ (Bq′′ + q − 1). The translation
invariance of m then gives m(Aq ) = m(Bq′ ) + m(Bq′′ ) = m(A0 ). One checks,
moreover, that the sets Aq , q ∈ Q ∩ [0, 1), are pairwise disjoint and that
their union is all of [0, 1).
Finally, we get our contradiction. We have m([0, 1)) = ℓ([0, 1)) = 1. But
m([0, 1)) = m( ∪_{q∈Q∩[0,1)} Aq ) = Σ_{q∈Q∩[0,1)} m(Aq ) ,
which is not possible since m(Aq ) = m(A0 ) for all q ∈ Q ∩ [0, 1): a countable
sum of equal terms is either 0 or ∞, never 1.
We have seen that the notion of length simply cannot be assigned to every
subset of R. We might therefore expect that it is not possible to assign a
probability to every subset of the sample space, in general.
The following result is a further very dramatic indication that we may
not be able to do everything that we might like.
Theorem 1.1 (Banach-Tarski). It is possible to cut up a sphere of radius one
(in R³) into a finite number of pieces and then reassemble these pieces (via
standard Euclidean motions in R³) into a sphere of radius 2.
Moral: all the fuss with σ-algebras and so on really is necessary if we want
to develop a robust (and rigorous) theory.
as required.
(ii) (Second Lemma) Let (An ) be a sequence of independent events such
that Σn P (An ) is divergent. Then P (An infinitely-often) = 1.
(ii) We have
P (An infinitely-often) = P ( ∩n ∪_{k≥n} Ak )
 = limn P ( ∪_{k≥n} Ak )
 = limn limm P ( ∪_{k=n}^{m+n} Ak ) .
The proof now proceeds by first taking complements, then invoking the
independence hypothesis and finally by the inspirational use of the inequality
1 − x ≤ e−x for any x ≥ 0. Indeed, we have
P ( ( ∪_{k=n}^{m+n} Ak )ᶜ ) = P ( ∩_{k=n}^{m+n} Akᶜ )
 = ∏_{k=n}^{m+n} P (Akᶜ ) , by independence,
 ≤ ∏_{k=n}^{m+n} e^{−P (Ak )} , since P (Akᶜ ) = 1 − P (Ak ) ≤ e^{−P (Ak )} ,
 = e^{−Σ_{k=n}^{m+n} P (Ak )}
 → 0
as m → ∞, since Σk P (Ak ) is divergent. It follows that P ( ∪_{k≥n} Ak ) = 1
and so P (An infinitely-often) = 1.
Example 1.4. A fair coin is tossed repeatedly. Let An be the event that
the outcome at the nth play is “heads”. Then P (An ) = 1/2 and evidently
Σn P (An ) is divergent (and A1 , A2 , . . . are independent). It follows that
P (An infinitely-often) = 1. In other words, in a sequence of coin tosses,
there will be an infinite number of “heads” with probability one.
Now let Bn be the event that the outcomes of the five consecutive plays
at the times 5n, 5n + 1, 5n + 2, 5n + 3 and 5n + 4 are all “heads”. Then
P (Bn ) = (1/2)⁵ and so Σn P (Bn ) is divergent. Moreover, the events B1 , B2 , . . .
are independent and so P (Bn infinitely-often) = 1. In particular, it follows
that, with probability one, there is an infinite number of “5 heads in a row”.
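As a sanity check on this conclusion, one can simulate the blocks Bn directly. The following sketch is our own Python illustration (not part of the original notes; the function name is ad hoc): the count of all-heads five-toss blocks keeps growing as more blocks are played, as P (Bn infinitely-often) = 1 suggests.

    import random

    def count_head_blocks(num_blocks, block_size=5, seed=0):
        """Count how many disjoint blocks of block_size consecutive fair-coin
        tosses come up all heads; each block succeeds with prob (1/2)**block_size."""
        rng = random.Random(seed)
        hits = 0
        for _ in range(num_blocks):
            if all(rng.random() < 0.5 for _ in range(block_size)):
                hits += 1
        return hits

    # The number of all-heads blocks grows without bound as more are played.
    for n in (10**3, 10**4, 10**5):
        print(n, count_head_blocks(n))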
Borel function and Y = g(X). To prove this for the general case, one takes
a far less direct approach.
Theorem 1.5. Let X be a random variable on (Ω, S, P ) and suppose that Y
is σ(X)-measurable. Then there is a Borel function g : R → R such that
Y = g(X).
Proof. Let C denote the class of Borel functions of X,
C = { ϕ(X) : ϕ : R → R a Borel function } .
Then C has the following properties.
(i) 1A ∈ C for any A ∈ σ(X).
To see this, we note that σ(X) = X −1 (B) and so there is B ∈ B such that
A = X −1 (B). Hence 1A (ω) = 1B (X(ω)) and so 1A ∈ C because 1B : R → R
is Borel.
(ii) Clearly, any linear combination of members of C also belongs to C.
This fact, together with (i) means that C contains all simple functions on
(Ω, σ(X)).
(iii) C is closed under pointwise convergence.
Indeed, suppose that (ϕn ) is a sequence of Borel functions on R such that
ϕn (X(ω)) → Z(ω) for each ω ∈ Ω. We wish to show that Z ∈ C, that is,
that Z = ϕ(X) for some Borel function ϕ : R → R.
Let B = { s ∈ R : limn ϕn (s) exists }. Then
B = { s ∈ R : (ϕn (s)) is a Cauchy sequence in R }
 = ∩_{k∈N} ∪_{N∈N} ∩_{m,n>N} { ϕm (s) − 1/k < ϕn (s) < ϕm (s) + 1/k } ,
where each set in the intersection is the Borel set
{ ϕn (s) < ϕm (s) + 1/k } ∩ { ϕm (s) − 1/k < ϕn (s) } .
Hence B is a Borel set. Define ϕ(s) = limn ϕn (s) for s ∈ B and ϕ(s) = 0
otherwise. Then ϕ is a Borel function and, since X(ω) ∈ B for every ω ∈ Ω,
we have Z(ω) = limn ϕn (X(ω)) = ϕ(X(ω)), that is, Z = ϕ(X) ∈ C.
Finally, to complete the proof of the theorem, we note that any σ(X)-
measurable function Y : Ω → R is a pointwise limit of simple functions on
(Ω, σ(X)) and so must belong to C, by (ii) and (iii) above. In other words,
any such Y has the form Y = g(X) for some Borel function g : R → R.
𝓛p versus Lp
Let (Ω, S, P ) be a probability space. For any 1 ≤ p < ∞, the space
𝓛p (Ω, S, P ) is the collection of measurable functions (random variables)
f : Ω → R (or C) such that E(|f |p ) < ∞, i.e., such that ∫_Ω |f (ω)|p dP is finite.
One can show that 𝓛p is a linear space. Let ‖f ‖p = ( ∫_Ω |f (ω)|p dP )^{1/p} .
Then (using Minkowski’s inequality), one can show that
‖f + g‖p ≤ ‖f ‖p + ‖g‖p
for any f, g ∈ 𝓛p . There is also Hölder’s inequality, E(|f g|) ≤ ‖f ‖p ‖g‖q for
any f ∈ 𝓛p and g ∈ 𝓛q with 1/p + 1/q = 1; this reduces (essentially) to the
Cauchy-Schwarz inequality when p = 2 (so that q = 2 also). Note, however,
that ‖ · ‖p is only a seminorm on 𝓛p : ‖f ‖p = 0 says merely that f = 0
almost surely.
To “correct” the norm-seminorm issue, consider equivalence classes of
functions, as follows. Declare functions f and g in 𝓛p to be equivalent,
written f ∼ g, if f = g almost surely. In other words, if N denotes the
collection of random variables equal to 0 almost surely, then f ∼ g if and
only if f − g ∈ N . (Note that every element h of N belongs to every 𝓛p and
obeys ‖h‖p = 0.) One sees that ∼ is an equivalence relation. (Clearly f ∼ f ,
and if f ∼ g then g ∼ f . To see that f ∼ g and g ∼ h implies that f ∼ h,
note that { f = g } ∩ { g = h } ⊂ { f = h }. But P (f = g) = P (g = h) = 1
so that P (f = h) = 1, i.e., f ∼ h.)
For any f ∈ 𝓛p , let [f ] be the equivalence class containing f , so that
[f ] = f + N . Let Lp (Ω, S, P ) denote the collection of equivalence classes
{ [f ] : f ∈ 𝓛p }. Lp (Ω, S, P ) is a linear space equipped with the rule
α[f ] + β[g] = [αf + βg] .
(If f1 ∼ f and g1 ∼ g, then αf1 + βg1 = αf + βg almost surely,
which shows that αf1 + βg1 ∼ αf + βg. In other words, the definition of
α[f ] + β[g] above does not depend on the particular choice of f ∈ [f ] or
g ∈ [g] used, so this really does determine a linear structure on Lp .
Next, we define |||[f ]|||p = ‖f ‖p . If f ∼ f1 , then ‖f ‖p = ‖f1 ‖p , so ||| · |||p
is well-defined on Lp . In fact, ||| · |||p is a norm on Lp . (If |||[f ]|||p = 0, then
‖f ‖p = 0 so that f ∈ N , i.e., f ∼ 0 so that [f ] is the zero element of Lp .) It
is usual to write ||| · |||p as just ‖ · ‖p . One can think of Lp and 𝓛p as “more or
less the same thing” except that in Lp one simply identifies functions which
are almost surely equal.
Note: this whole discussion applies to any measure space — not just
probability spaces. The fact that the measure has total mass one is irrelevant
here.
Riesz-Fischer Theorem
Theorem 1.6 (Riesz-Fischer). Let 1 ≤ p < ∞ and suppose that (fn ) is a
Cauchy sequence in Lp . Then there is some f ∈ Lp such that fn → f in
Lp , and there is some subsequence (fnk ) such that fnk → f almost surely as
k → ∞.
In particular, f − gj ∈ Lp and so f = (f − gj ) + gj ∈ Lp .
Finally, let ε > 0 be given. Let N be such that d(N ) < ε/2 and choose
any j such that nj > N and Σ_{k=j}^{∞} 2^{−k} < ε/2. Then for any n > N , we have
‖f − fn ‖p ≤ ‖f − fnj ‖p + ‖fnj − fn ‖p ≤ Σ_{k=j}^{∞} 2^{−k} + d(N ) < ε ,
that is, fn → f in Lp .
Remark 1.8. For any random variable g which is bounded almost surely, let
‖g‖∞ = inf{ M : |g| ≤ M almost surely } .
Then ‖g‖p ≤ ‖g‖∞ and ‖g‖∞ = limp→∞ ‖g‖p . To see this, suppose that g
is bounded with |g| ≤ M almost surely. Then
‖g‖pᵖ = ∫_Ω |g|p dP ≤ M p
and so ‖g‖p is a lower bound for the set { M : |g| ≤ M almost surely }. It
follows that ‖g‖p ≤ ‖g‖∞ .
A ∈ MB ⇐⇒ A ∩ B ∈ M(A) ⇐⇒ B ∈ MA = M(A).
A ⊆ MB ⊆ M(A)
Example 1.12. Suppose that P and Q are two probability measures on B(R)
which agree on sets of the form (−∞, a] with a ∈ R. Then P = Q on B(R).
Chapter 2
Conditional expectation
Hence
µ({ ω : h(ω) ≠ 0 }) = µ(h < 0) + µ(h > 0) = 0 ,
that is, µ(h = 0) = 1.
Remark 2.5. Note that we have used the standard shorthand notation such
as µ(h > 0) for µ({ ω : h(ω) > 0 }). There is unlikely to be any confusion.
Let Bn = { ω : X̂(ω) ≤ −1/n }. Then
∫_{Bn} X̂ dP ≤ −(1/n) ∫_{Bn} dP = −(1/n) P (Bn ) .
However, the left hand side is equal to ∫_{Bn} X dP ≥ 0, which forces P (Bn ) = 0.
But then P (X̂ < 0) = limn P (Bn ) = 0 and so P (X̂ ≥ 0) = 1.
 = ∫_A E(X + Y | G) dP .
Now with A = ∅,
∫_A X dP = ∫_∅ X dP = 0 = ∫_A E(X) dP .
So the everywhere constant function X̂ : ω ↦ E(X) is { Ω, ∅ }-measurable
and obeys
∫_A X̂ dP = ∫_A X dP
for every A ∈ { Ω, ∅ }. Hence ω ↦ E(X) is a conditional expectation of
X. If X′ is another, then X′ = X̂ almost surely, so that P (X′ = X̂) = 1.
But the set { X′ = X̂ } is { Ω, ∅ }-measurable and so is equal to either ∅ or
to Ω. Since P (X′ = X̂) = 1, it must be the case that { X′ = X̂ } = Ω so
that X̂(ω) = X′(ω) for all ω ∈ Ω.
that is, E(X̂) = E(X).
 = ∫_Ω f̂ g dP
and we see that (∗) holds for general f ∈ L1 . Similarly, we note that by
decomposing g as g = g⁺ − g⁻, it is enough to prove that (∗) holds for g ≥ 0.
So we need only show that (∗) holds for f ≥ 0 and g ≥ 0. In this case, we
know that there is a sequence (sn ) of simple G-measurable functions such
that 0 ≤ sn ≤ g and sn → g everywhere. For fixed n, let sn = Σj aj 1Aj
(finite sum). Then
∫_Ω f sn dP = Σj aj ∫_{Aj} f dP = Σj aj ∫_{Aj} f̂ dP = ∫_Ω f̂ sn dP
Jensen’s Inequality
Recall that a function ϕ : R → R is convex if each chord of its graph lies
above the graph, that is,
ϕ((1 − s)a + sb) ≤ (1 − s)ϕ(a) + sϕ(b) (2.1)
for any a, b ∈ R and all 0 ≤ s ≤ 1. The point a + s(b − a) = (1 − s)a + sb
lies between a and b and the inequality (2.1) is the statement that the chord
between the points (a, ϕ(a)) and (b, ϕ(b)) on the graph y = ϕ(x) lies above
the graph itself.
Let u < v < w. Then v = u + s(w − u) = (1 − s)u + sw for some 0 < s < 1
and from (2.1) we have
ifwilde Notes
Conditional expectation 19
which gives
(ϕ(w) − ϕ(u)) / (w − u) ≤ (ϕ(w) − ϕ(v)) / (w − v) . (2.6)
These inequalities are readily suggested from a diagram.
Now fix v = v0 . Then by inequality (2.5), we see that the ratio (Newton
quotient) (ϕ(w) − ϕ(v0 ))/(w − v0 ) decreases as w ↓ v0 and, by (2.4), is
bounded below by (ϕ(v0 ) − ϕ(u))/(v0 − u) for any u < v0 . Hence ϕ has a
right derivative at v0 , i.e.,
∃ lim_{w↓v0} (ϕ(w) − ϕ(v0 )) / (w − v0 ) ≡ D⁺ϕ(v0 ) .
Similarly, ϕ has a left derivative at v0 ,
∃ lim_{u↑v0} (ϕ(v0 ) − ϕ(u)) / (v0 − u) ≡ D⁻ϕ(v0 ) .
It follows that ϕ is continuous at v0 because
ϕ(w) − ϕ(v0 ) = (w − v0 ) · (ϕ(w) − ϕ(v0 ))/(w − v0 ) → 0 as w ↓ v0
and
ϕ(v0 ) − ϕ(u) = (v0 − u) · (ϕ(v0 ) − ϕ(u))/(v0 − u) → 0 as u ↑ v0 .
Moreover, for v0 ≤ w,
D⁻ϕ(v0 ) ≤ (ϕ(w) − ϕ(v0 )) / (w − v0 )
 ≤ (ϕ(w) − ϕ(λ)) / (w − λ) for any v0 ≤ λ ≤ w, by (2.6),
 ↑ D⁻ϕ(w) as λ ↑ w.
Hence
D⁻ϕ(v0 ) ≤ D⁻ϕ(w)
whenever v0 ≤ w. Similarly, letting w ↓ v in (2.6), we find
(ϕ(v) − ϕ(u)) / (v − u) ≤ D⁺ϕ(v)
and so, for u ≤ λ ≤ v,
(ϕ(λ) − ϕ(u)) / (λ − u) ≤ (ϕ(v) − ϕ(u)) / (v − u) ≤ D⁺ϕ(v) .
Letting λ ↓ u, we get
D⁺ϕ(u) ≤ D⁺ϕ(v)
whenever u ≤ v. That is, both D+ ϕ and D− ϕ are non-decreasing functions.
Furthermore, letting u ↑ v0 and w ↓ v0 in (2.4), we see that
D⁻ϕ(v0 ) ≤ D⁺ϕ(v0 )
at each v0 .
for any x ∈ R.
(ϕ(x) − ϕ(v)) / (x − v) ≥ D⁺ϕ(v) ≥ m
Let A = { (α, β) : ϕ(x) ≥ αx + β for all x }. From the remark above (and
using the same notation), we see that for any x ∈ R,
αx + β ≤ ϕ(x)
Proof. Given x ∈ R, fix u < x < w. Then we know that for any q ∈ Q with
u < q < w,
D− ϕ(u) ≤ D− ϕ(q) ≤ D+ ϕ(q) ≤ D+ ϕ(w). (2.8)
Since ϕ is continuous, the first term on the right hand side converges to 0
as n → ∞ and so does the second term, because the sequence (m(qn )) is
bounded. It follows that (α(qn )x + β(qn )) → ϕ(x) and therefore we see that
ϕ(x) = sup{ αx + β : (α, β) ∈ A0 }, as claimed.
It follows that
αX̂ + β ≤ E(ϕ(X) | G) almost surely,
where X̂ is any choice of E(X | G).
Now, for each (α, β) ∈ A0 , let A(α, β) be a set in G such that P (A(α, β)) = 1
and
αX̂(ω) + β ≤ E(ϕ(X) | G)(ω)
for every ω ∈ A(α, β). Let A = ∩_{(α,β)∈A0} A(α, β). Since A0 is countable,
P (A) = 1 and
αX̂(ω) + β ≤ E(ϕ(X) | G)(ω)
for all ω ∈ A. Taking the supremum over (α, β) ∈ A0 , we get
sup_{(α,β)∈A0} ( αX̂(ω) + β ) ≤ E(ϕ(X) | G)(ω)
on A. The left hand side is ϕ(X̂(ω)) = ϕ(X̂)(ω), that is,
ϕ(X̂) ≤ E(ϕ(X) | G)
almost surely and the proof is complete.
‖ E(X | G) ‖r ≤ ‖ X ‖r .
Indeed, applying the above with the convex function ϕ(x) = |x|ʳ (r ≥ 1),
ϕ(X̂) ≤ E(ϕ(X) | G) almost surely,
that is,
|X̂|ʳ ≤ E( |X|ʳ | G) almost surely.
Taking expectations,
E( |X̂|ʳ ) ≤ E( E( |X|ʳ | G) ) = E( |X|ʳ ) ,
as claimed.
Example 2.10. Let X = 𝓛2 (Ω, S, P ) with (f, g) = ∫_Ω f (ω)g(ω) dP for any
f, g ∈ 𝓛2 .
In general, set ‖x‖ = (x, x)^{1/2} for x ∈ X. Then in the example above, we
observe that ‖f ‖ = ( ∫_Ω f ² dP )^{1/2} = ‖f ‖2 . It is important to note that ‖ · ‖
is not quite a norm. It can happen that ‖x‖ = 0 even though x ≠ 0. Indeed,
an example in 𝓛2 is provided by any function which is zero almost surely.
Proposition 2.11 (Parallelogram law). For any x, y ∈ X,
‖x + y‖² + ‖x − y‖² = 2( ‖x‖² + ‖y‖² ) .
Proof. This follows by direct calculation using ‖w‖² = (w, w).
Definition 2.12. A subspace V ⊂ X is said to be complete if every Cauchy
sequence in V converges to an element of V , i.e., if (vn ) is a Cauchy sequence
in V (so that ‖vn − vm ‖ → 0 as m, n → ∞) then there is some v ∈ V such
that vn → v, as n → ∞.
Theorem 2.13. Let x ∈ X and suppose that V is a complete subspace of X.
Then there is some v ∈ V such that ‖x − v‖ = inf_{y∈V} ‖x − y‖, that is, there
is v ∈ V so that ‖x − v‖ = dist(x, V ), the distance between x and V .
Proof. Let (vn ) be any sequence in V such that
‖x − vn ‖ → d = inf_{y∈V} ‖x − y‖ ,
and so d = ‖x − v‖.
It follows that ‖v′ − v‖ = 0.
For the converse, suppose that x − v′ ⊥ V . Then for any v ∈ V we calculate
‖x − v‖² = ‖(x − v′) + (v′ − v)‖² = ‖x − v′‖² + ‖v′ − v‖²
 ≥ d² + ‖v − v′‖² .
Suppose now that k · k satisfies the condition that kxk = 0 if and only if
x = 0. Thus we are supposing that k · k really is a norm not just a seminorm
on X. Then the equality kv − v ′ k = 0 is equivalent to v = v ′ . In this case,
we can summarize the preceding discussion as follows.
Given any x ∈ X, there is a unique v ∈ V such that x − v ⊥ V . Writing
x = v + (x − v) we see that we can write x as x = v + w where v ∈ V
and w ⊥ V . This decomposition of x as the sum of an element v ∈ V
and an element w ⊥ V is unique and means that we can define a map
P : X → V by the formula P : x ↦ P x = v. One checks that this is a linear
map, the orthogonal projection onto V .
Now take X = 𝓛2 (Ω, S, P ) and V = 𝓛2 (Ω, G, P ), and write f ∈ 𝓛2 as
f = f̂ + g
where f̂ = P f ∈ 𝓛2 (Ω, G, P ) and g ⊥ 𝓛2 (Ω, G, P ). Then
∫_A f̂ dP = ∫_A f dP
for any A ∈ G.
Proof. By construction, f − f̂ ⊥ 𝓛2 (Ω, G, P ) so that
∫_Ω (f − f̂) g dP = 0
for every G-measurable g ∈ 𝓛2 . Taking g = 1A gives ∫_A f̂ dP = ∫_A f dP
for all A ∈ G. (If Bn = { ĥ < −1/n } for n ∈ N, then the inequalities
0 ≤ ∫_{Bn} ĥ dP ≤ −(1/n) P (Bn ) imply that P (Bn ) = 0. But then P (ĥ < 0) =
limn P (Bn ) = 0.)
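When G is generated by a finite partition of Ω, this projection description becomes completely concrete: on each cell of the partition, E(f | G) is the average of f over that cell. The following sketch is our own numerical illustration (none of the names below come from the notes); the printed equalities hold by construction, mirroring ∫_A f̂ dP = ∫_A f dP for each cell A.

    import numpy as np

    rng = np.random.default_rng(1)
    omega = rng.standard_normal(100_000)          # sample points ω
    f = omega**2 + omega                          # a random variable f(ω)
    cells = np.digitize(omega, [-1.0, 0.0, 1.0])  # G = σ(partition into 4 cells)

    # E(f | G): on each cell, replace f by its cell average.
    fhat = np.empty_like(f)
    for c in np.unique(cells):
        idx = cells == c
        fhat[idx] = f[idx].mean()

    # fhat is G-measurable (constant on cells) and integrates like f on cells.
    for c in np.unique(cells):
        idx = cells == c
        print(c, f[idx].mean(), fhat[idx].mean())
    print(f.mean(), fhat.mean())                  # tower property E(E(f|G)) = E(f)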
Chapter 3
Martingales
F0 ⊂ F1 ⊂ F2 ⊂ · · · ⊂ F .
Note that (1) is required in order for (3) to make sense. (The conditional
expectation E(ξ | G) is not defined unless ξ is integrable.)
Remark 3.2. Suppose that (ξn ) is a martingale. For any n > m, we have
E(ξn | Fm ) = E( E(ξn | Fn−1 ) | Fm ) = E(ξn−1 | Fm ) = · · · = ξm almost surely.
That is,
E(ξn | Fm ) = ξm almost surely
for all n ≥ m. (This could have been taken as part of the definition.)
E(|ξn |) = E(|X0 + X1 + · · · + Xn |)
 ≤ E(|X0 |) + E(|X1 |) + · · · + E(|Xn |)
 < ∞
and
E(ξn+1 | Fn ) = E(Xn+1 + ξn | Fn )
 = E(Xn+1 | Fn ) + E(ξn | Fn )
 = E(Xn+1 ) + ξn , since (ξn ) is adapted and Xn+1 and Fn are independent,
 = ξn , since E(Xn+1 ) = 0.
Proof. We note that E(X) = E(X | G), where G is the trivial σ-algebra,
G = { Ω, ∅ }. Since G ⊂ Fn for all n, we can apply the tower property to
deduce that
E(ξn ) = E( E(ξn | F0 ) ) = E(ξ0 )
as required.
1. Each Xn is integrable.
Proposition 3.9.
(i) Suppose that (Xn ) and (Yn ) are submartingales. Then (Xn ∨ Yn ) is
also a submartingale.
(ii) Suppose that (Xn ) and (Yn ) are supermartingales. Then (Xn ∧ Yn )
is also a supermartingale.
Proof. (i) Set Zn = Xn ∨ Yn . Then Zk ≥ Xk and Zk ≥ Yk for all k and so
E(Zn+1 | Fn ) ≥ E(Xn+1 | Fn ) ≥ Xn and, similarly, E(Zn+1 | Fn ) ≥ Yn .
Hence E(Zn+1 | Fn ) ≥ Xn ∨ Yn = Zn , that is, (Zn ) is a submartingale.
Part (ii) is proved in the same way.
That is,
ξn² ≤ E(ξ²n+1 | Fn ) almost surely,
as required.
E( (Xn − Xm ) Y ) = 0
for any bounded Fm -measurable random variable Y (with m ≤ n), and, in
particular, for j ≤ k ≤ m ≤ n,
E( (Xk − Xj ) (Xn − Xm ) ) = 0
(the orthogonality of martingale increments).
Proof. Note first that (Xn − Xm )Y is integrable. Next, using the “tower
property” and the Fm -measurability of Y , we see that
E((Xn − Xm ) Y ) = E( E((Xn − Xm ) Y | Fm ) )
 = E( E(Xn − Xm | Fm ) Y )
 = 0
since E(Xn − Xm | Fm ) = Xm − Xm = 0.
The orthogonality of the martingale increments follows immediately by
taking Y = Xk − Xj .
Gambling
It is customary to mention gambling. Consider a sequence η1 , η2 , . . . of
random variables where ηn is thought of as the “winnings per unit stake” at
game play n. If a gambler places a unit stake at each game, then the total
winnings after n games is ξn = η1 + η2 + · · · + ηn .
For n ∈ N, let Fn = σ(η1 , . . . , ηn ) and set ξ0 = 0 and F0 = { Ω, ∅ }. To
say that (ξn ) is a martingale is to say that
E(ξn+1 | Fn ) = ξn almost surely,
or
E(ξn+1 − ξn | Fn ) = 0 almost surely.
Suppose now that the gambler places a stake αn at game play n, where αn
may depend on the outcomes of the first n − 1 plays. The total winnings
after n games then become
ζn = α1 η1 + · · · + αn ηn .
Example 3.13. For each k ∈ Z+ , let Bk be a Borel subset of R^{k+1} and set
αk+1 = 1 if (η0 , η1 , . . . , ηk ) ∈ Bk , and αk+1 = 0 otherwise.
Theorem 3.14. Let (αn ) be a predictable process and as above, let (ζn ) be the
process ζn = α1 (ξ1 − ξ0 ) + · · · + αn (ξn − ξn−1 ).
(i) Suppose that (αn ) is bounded and that (ξn ) is a martingale. Then
(ζn ) is a martingale.
(ii) Suppose that (αn ) is bounded, αn ≥ 0 and that the process (ξn ) is
a supermartingale. Then (ζn ) is a supermartingale.
(iii) Suppose that (αn ) is bounded, αn ≥ 0 and that the process (ξn ) is
a submartingale. Then (ζn ) is a submartingale.
That is,
E(ζn | Fn−1 ) − ζn−1 = αn Φn ,
where Φn = E(ξn | Fn−1 ) − ξn−1 .
Now, in case (i), Φn = 0 almost surely, so E(ζn | Fn−1 ) = ζn−1 almost surely.
In case (ii), Φn ≤ 0 almost surely and therefore αn Φn ≤ 0 almost surely. It
follows that ζn−1 ≥ E(ζn | Fn−1 ) almost surely.
In case (iii), Φn ≥ 0 almost surely and so αn Φn ≥ 0 almost surely and
therefore ζn−1 ≤ E(ζn | Fn−1 ) almost surely.
Remarks 3.15.
1. This last result can be interpreted as saying that no matter what strategy
one adopts, it is not possible to make a fair game “unfair”, a favourable
game unfavourable or an unfavourable game favourable.
2. The formula
(C · X)0 = 0 , (C · X)n = Σ_{1≤k≤n} Ck (Xk − Xk−1 )
defines the martingale transform of X by C, the discrete analogue of the
stochastic integral; a small numerical experiment in this spirit is sketched
below.
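The following sketch (our own Python illustration; the betting rule is arbitrary) transforms a fair ±1 game by a bounded predictable strategy, in the spirit of Theorem 3.14: the sample mean of (C · X)n stays near 0, so the strategy does not make the fair game favourable.

    import numpy as np

    rng = np.random.default_rng(2)
    n_paths, n_steps = 200_000, 20
    steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))  # fair increments
    # Predictable strategy: C_1 = 1 and, for k >= 2, C_k depends only on the
    # outcome of play k-1 (bet 2 after a win, 1 after a loss).
    C = np.ones_like(steps)
    C[:, 1:] = np.where(steps[:, :-1] > 0, 2.0, 1.0)

    transform = np.cumsum(C * steps, axis=1)                  # (C · X)_n
    print(transform[:, -1].mean())  # ~ 0: still a fair game in expectation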
Stopping Times
A map τ : Ω → Z+ ∪ { ∞ } is called a stopping time (with respect to the
filtration (Fn )) if
{ τ ≤ n } ∈ Fn for each n ∈ Z+ .
One can think of this as saying that the information available by time n
should be sufficient to tell us whether something has “stopped by time n” or
not. For example, we should not need to be watching out for a company’s
profits in September if we only want to know whether it went bust in May.
Equivalently, τ is a stopping time if and only if
{ τ = n } ∈ Fn for every n ∈ Z+ .
Indeed,
{ τ = n } = { τ ≤ n } ∩ { τ ≤ n − 1 }ᶜ ,
where { τ ≤ n } ∈ Fn and { τ ≤ n − 1 } ∈ Fn−1 ⊂ Fn .
Proposition 3.19. Let σ and τ be stopping times (with respect to (Fn )). Then
σ + τ , σ ∨ τ and σ ∧ τ are also stopping times.
Proof. For each n ∈ Z+ ,
{ σ + τ ≤ n } = ∪_{j=0}^{n} ( { σ = j } ∩ { τ ≤ n − j } ) .
Hence { σ + τ ≤ n } ∈ Fn . Also,
{ σ ∨ τ ≤ n } = { σ ≤ n } ∩ { τ ≤ n } ∈ Fn
and { σ ∧ τ ≤ n } = { σ ≤ n } ∪ { τ ≤ n } ∈ Fn .
So if the outcome is ω and, say, τ (ω) = 23, then Xnτ (ω) is given by Xn (ω)
for n ≤ 23 and by X23 (ω) for n > 23, that is,
Xnτ = Σ_{k=0}^{n} Xk 1{ τ =k } + Xn 1{ τ ≤n }ᶜ .
Definition 3.22. Let (Xn )n∈Z+ be an adapted process with respect to a given
filtration (Fn ) built on a probability space (Ω, S, P ) and let τ be a stopping
time such that τ < ∞ almost surely. The random variable stopped by τ is
defined to be
Xτ (ω) = X_{τ (ω)} (ω) for ω ∈ { τ ∈ Z+ }, and Xτ (ω) = X∞ otherwise,
where X∞ is any arbitrary but fixed constant. Then Xτ really is a random
variable, that is, it is measurable with respect to S (in fact, it is measurable
with respect to σ( ∪n Fn )). To see this, let B be any Borel set in R. Then
(on { τ < ∞ })
{ Xτ ∈ B } = ∪_{k∈Z+} ( { Xτ ∈ B } ∩ { τ = k } ) = ∪_{k∈Z+} ( { Xk ∈ B } ∩ { τ = k } ) ,
and each { Xk ∈ B } ∩ { τ = k } belongs to Fk , so the union belongs to
σ( ∪_{k∈Z+} Fk ).
E(SN ) = E( S1 1{ N =1 } + S2 1{ N =2 } + · · · )
 = E( X1 1{ N =1 } + (X1 + X2 ) 1{ N =2 } + (X1 + X2 + X3 ) 1{ N =3 } + · · · )
 = E( X1 1{ N ≥1 } + X2 1{ N ≥2 } + X3 1{ N ≥3 } + · · · )
 = E(X1 g1 ) + E(X2 g2 ) + · · · ,
where gk = 1{ N ≥k } . Since N is a stopping time for the natural filtration
of the Xk , the event { N ≥ k } = { N ≤ k − 1 }ᶜ depends only on
X1 , . . . , Xk−1 and so gk and Xk are independent. Hence E(Xk gk ) =
E(Xk ) P (N ≥ k) = E(X1 ) P (N ≥ k), and therefore
E(SN ) = E(X1 ) ( P (N ≥ 1) + P (N ≥ 2) + P (N ≥ 3) + · · · )
 = E(X1 ) ( P (N = 1) + 2 P (N = 2) + 3 P (N = 3) + · · · )
 = E(X1 ) E(N ) .
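This is Wald's equation, E(SN ) = E(X1 ) E(N ). A quick Monte Carlo check (our own sketch, with an arbitrary stopping rule): take the Xk to be independent Exp(1) variables and let N be the first time the partial sums exceed a level, so that { N ≥ k } depends only on X1 , . . . , Xk−1 as required.

    import numpy as np

    rng = np.random.default_rng(3)

    def sample_once(level=5.0):
        """Return (S_N, N), where N is the first n with S_n > level."""
        s, n = 0.0, 0
        while s <= level:
            s += rng.exponential(1.0)   # X_k ~ Exp(1), so E(X_1) = 1
            n += 1
        return s, n

    samples = [sample_once() for _ in range(20_000)]
    mean_S_N = np.mean([s for s, _ in samples])
    mean_N = np.mean([n for _, n in samples])
    print(mean_S_N, 1.0 * mean_N)       # E(S_N) ~ E(X_1) E(N)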
(2) Xτ is integrable;
Xτ = Xτ ∧n + (Xτ − Xτ ∧n )
 = Xτ ∧n + (Xτ − Xτ ∧n )( 1{ τ ≤n } + 1{ τ >n } )
 = Xτ ∧n + (Xτ − Xn ) 1{ τ >n } .
Furthermore,
E( Xτ 1{ τ >n } ) = Σ_{k=n+1}^{∞} E( Xk 1{ τ =k } ) → 0 as n → ∞
because, by (2),
E(Xτ ) = Σ_{k=0}^{∞} E( Xk 1{ τ =k } ) < ∞ .
Remark 3.25. If X is a submartingale and the conditions (1), (2) and (3) of
the theorem hold, then we have
E(Xτ ) ≥ E(X0 ) .
This follows just as for the case when X is a martingale except that one
now uses the inequality E(Xτ ∧n ) ≥ E(Xτ ∧0 ) = E(X0 ) in equation (∗). In
particular, we note that these conditions hold if τ is a bounded stopping time.
Lemma 3.27. Let (Xn ) be a submartingale with respect to the filtration (Fn )
and suppose that τ is a bounded stopping time with τ ≤ m where m ∈ Z+ .
Then
E(Xm ) ≥ E(Xτ ) .
Proof. We have
E(Xm ) = Σ_{j=0}^{m} ∫_Ω Xm 1{ τ =j } dP
 = Σ_{j=0}^{m} ∫_{{ τ =j }} Xm dP
 = Σ_{j=0}^{m} ∫_{{ τ =j }} E(Xm | Fj ) dP , since { τ = j } ∈ Fj ,
 ≥ Σ_{j=0}^{m} ∫_{{ τ =j }} Xj dP , since E(Xm | Fj ) ≥ Xj almost surely,
 = E(Xτ )
as required.
and
E(Xm ) ≥ E(Xτ ) .
Now, set A = { Xm∗ ≥ λ }, where Xm∗ = maxk≤m Xk . Then
E(Xτ ) = ∫_A Xτ dP + ∫_{Aᶜ} Xτ dP .
If ω ∈ A, then Xj (ω) ≥ λ for some j ≤ m and, taking k0 to be the first
such j, τ = k0 so that, on A,
Xτ = Xk0 ≥ λ .
On the other hand, if Xm∗ < λ, then there is no j with 0 ≤ j ≤ m and
Xj ≥ λ. Thus τ = m, by construction. Hence
∫_Ω Xm dP = E(Xm ) ≥ E(Xτ ) = ∫_A Xτ dP + ∫_{Aᶜ} Xτ dP
 ≥ λ P (A) + ∫_{Aᶜ} Xm dP ,
and rearranging gives
λ P ( Xm∗ ≥ λ ) ≤ ∫_A Xm dP
for any λ ≥ 0.
Proof. Since (Xn ) is an L²-martingale, it follows that the process (Xn²) is a
submartingale (Proposition 3.10). Applying Doob’s Maximal Inequality to
the submartingale (Xn²) (and with λ² rather than λ), we get
λ² P ( maxk≤m Xk² ≥ λ² ) ≤ ∫_{{ maxk≤m Xk² ≥λ² }} Xm² dP
 ≤ ∫_Ω Xm² dP ,
that is,
λ² P ( maxk≤m |Xk | ≥ λ ) ≤ E(Xm²)
as required.
so that
E(X ²) = 2 ∫_0^∞ t E( 1{ X≥t } ) dt = 2 ∫_0^∞ t P (X ≥ t) dt .
E( (Xn∗ )² ) ≤ 4 E( Xn² ) , that is, ‖ Xn∗ ‖2 ≤ 2 ‖Xn ‖2 .
Indeed, continuing the computation above,
E( (Xn∗ )² ) ≤ 2 ∫_Ω Xn Xn∗ dP = 2 E(Xn Xn∗ ) ≤ 2 ‖Xn ‖2 ‖Xn∗ ‖2 ,
by Schwarz’ inequality. It follows that
‖ Xn∗ ‖2 ≤ 2 ‖ Xn ‖2
or
E( (Xn∗ )² ) ≤ 4 E( Xn² )
and the proof is complete.
[Figure 3.1: a sample path (Xn (ω)) crossing the levels y = a and y = b,
with values such as X7 (ω) and X13 (ω) marked.]
In the example in figure 3.1, the path sequences X3 (ω), . . . , X7 (ω) and
X11 (ω), X12 (ω), X13 (ω) each constitute an up-crossing. The path sequence
X16 (ω), X17 (ω), X18 (ω), X19 (ω) forms a partial up-crossing (which, in fact,
will never be completed if Xn (ω) ≤ b remains valid for all n > 16).
As an aid to counting such up-crossings, we introduce the process (gn )n∈N
defined as follows:
g1 (ω) = 0 ;
g2 (ω) = 1 if X1 (ω) < a, and g2 (ω) = 0 otherwise;
g3 (ω) = 1 if g2 (ω) = 0 and X2 (ω) < a, or if g2 (ω) = 1 and X2 (ω) ≤ b,
and g3 (ω) = 0 otherwise;
and, generally,
gn+1 (ω) = 1 if gn (ω) = 0 and Xn (ω) < a, or if gn (ω) = 1 and Xn (ω) ≤ b,
and gn+1 (ω) = 0 otherwise.
Consider the sum
Σ_{j=1}^{m} gj (ω) ( Xj (ω) − Xj−1 (ω) )
 = g1 (ω) (X1 (ω) − X0 (ω)) + · · · + gm (ω) (Xm (ω) − Xm−1 (ω)) . (∗)
If gj (ω) = 1 precisely for r + 1 ≤ j ≤ r + s (so that gr (ω) = 0 and
gr+s+1 (ω) = 0), then the path Xr (ω), . . . , Xr+s (ω) forms an up-crossing of
[a, b] and we see that
Σ_{j=r+1}^{r+s} gj (ω) ( Xj (ω) − Xj−1 (ω) ) = Σ_{j=r+1}^{r+s} ( Xj (ω) − Xj−1 (ω) )
 = Xr+s (ω) − Xr (ω) > b − a .
It may be that the final run of 1s is not followed by a 0, that is, gj (ω) = 1
for all k ≤ j ≤ m + 1, where k is the largest integer for which gk−1 (ω) = 0.
This means that Xk−1 (ω) < a and Xj (ω) ≤ b for k ≤ j ≤ m.
The sequence Xk−1 (ω) < a, Xk (ω) ≤ b, . . . , Xm (ω) ≤ b forms the partial
up-crossing at the end of the path X0 (ω), X1 (ω), . . . , Xm (ω). Since we have
gk (ω) = · · · = gm (ω) = 1, we see that its contribution to the sum (∗) is
R = Σ_{j=k}^{m} gj (ω) ( Xj (ω) − Xj−1 (ω) ) = Xm (ω) − Xk−1 (ω) > Xm (ω) − a ,
where the inequality follows because gk (ω) = 1 and so Xk−1 (ω) < a, by
construction. Now, any real-valued function f can be written as f = f ⁺ − f ⁻
where f ± are the positive and negative parts of f , defined by f ± = ½(|f | ± f ).
Evidently f ± ≥ 0. The inequality f ≥ −f ⁻ allows us to estimate R by
R ≥ −(Xm − a)⁻
on Ω.
Lemma 3.34. Suppose that (fn ) is a sequence of random variables such that
fn ≥ 0, fn ↑ and such that E(fn ) ≤ K for all n. Then
P ( limn fn < ∞ ) = 1 .
(The proof rests on the observation that 0 ≤ gn ≤ fn implies E(gn ) ≤
E(fn ) ≤ K.)
Now,
(Xn − a)⁻ = ½( |Xn − a| − (Xn − a) ) ≤ |Xn | + |a| ,
giving
E( Un [a, b] ) ≤ (M + |a|) / (b − a)
for any n ∈ Z+ . However, by its very construction, Un [a, b] ≤ Un+1 [a, b] and
so limn E( Un [a, b] ) exists and obeys limn E( Un [a, b] ) ≤ (M + |a|)/(b − a).
By the lemma, it follows that Un [a, b] converges almost surely (to a finite
value).
Let A = ∩_{a<b, a,b∈Q} { limn Un [a, b] < ∞ }. Then A is a countable intersection
of sets of probability 1 and so P (A) = 1.
Claim: (Xn ) converges almost surely.
For, if not, then there would be a set B with P (B) > 0 and rationals a < b
such that lim infn Xn (ω) < a < b < lim supn Xn (ω) for ω ∈ B,
which means that limn Un [a, b](ω) = ∞ (because (Xn (ω)) would cross [a, b]
infinitely-many times). Hence B ∩ A = ∅ which means that B ⊂ Aᶜ and so
P (B) ≤ P (Aᶜ) = 0. It follows that Xn converges almost surely as claimed.
Denote this limit by X, with X(ω) = 0 for ω ∉ A. Then
Remark 3.37. These results also hold for martingales and submartingales
(because (−Xn ) is a supermartingale whenever (Xn ) is a submartingale).
We can see this as follows. Since E(Xn− ) ≤ E(Xn+ + Xn− ) = E(|Xn |), we
see immediately that if E(|Xn |) ≤ M for all n, then also E(Xn− ) ≤ M
for all n. Conversely, suppose there is some positive constant M such that
E(Xn− ) ≤ M for all n. Then
A family { Yα : α ∈ I } of random variables is said to be uniformly integrable
if for any ε > 0 there is M > 0 such that
∫_{{ |Yα |>M }} |Yα | dP < ε
for all α ∈ I.
Remark 3.41. Note that any uniformly integrable family { Yα : α ∈ I } is
L¹-bounded. Indeed, by definition, for any ε > 0 there is M such that
∫_{{ |Yα |>M }} |Yα | dP < ε for all α. But then
∫_Ω |Yα | dP = ∫_{{ |Yα |≤M }} |Yα | dP + ∫_{{ |Yα |>M }} |Yα | dP ≤ M + ε
for all α ∈ I.
Before proceeding, we shall establish the following result.
Lemma 3.42. Let X ∈ L¹. Then for any ε > 0 there is δ > 0 such that if A
is any event with P (A) < δ then ∫_A |X| dP < ε.
Proof. Set ξ = |X| and for n ∈ N let ξn = ξ 1{ ξ≤n } . Then ξn ↑ ξ almost
surely and, by Lebesgue’s Monotone Convergence Theorem, ∫_Ω ξn dP →
∫_Ω ξ dP , so that
∫_{{ ξ>n }} ξ dP = ∫_Ω ξ (1 − 1{ ξ≤n } ) dP = ∫_Ω ξ dP − ∫_Ω ξn dP → 0
as n → ∞. Let ε > 0 be given and let n be so large that ∫_{{ ξ>n }} ξ dP < ε/2.
Set δ = ε/2n. Then for any event A with P (A) < δ, we have
∫_A ξ dP = ∫_{A∩{ ξ≤n }} ξ dP + ∫_{A∩{ ξ>n }} ξ dP
 ≤ ∫_A n dP + ∫_{{ ξ>n }} ξ dP
 < n P (A) + ε/2
 < ε .
Consider
∫_Ω |Xn − X| dP ≤ ∫_{{ |Xn |≤M }} |Xn − X| dP + ∫_{{ |Xn |>M }} |Xn | dP
 + ∫_{{ |Xn |>M }} |X| dP .
We shall consider separately the three terms on the right hand side.
By the hypothesis of uniform integrability, we may say that the second
term ∫_{{ |Xn |>M }} |Xn | dP < ε for all sufficiently large M .
As for the third term, the integrability of X means that there is δ > 0
such that ∫_A |X| dP < ε whenever P (A) < δ. Now we note that (provided
M > 1)
P ( |Xn | > M ) = ∫_Ω 1{ |Xn |>M } dP ≤ ∫_Ω 1{ |Xn |>M } |Xn | dP < δ
Finally, we consider the first term. Fix M > 0 so that the above bounds
on the second and third terms hold. The random variable 1{ |Xn |≤M } |Xn − X|
is bounded by the integrable random variable M + |X| and so it follows from
Lebesgue’s Dominated Convergence Theorem that
∫_{{ |Xn |≤M }} |Xn − X| dP = ∫_Ω 1{ |Xn |≤M } |Xn − X| dP → 0
Proposition 3.44. Let (Xn ) be a martingale with respect to the filtration (Fn )
such that Xn → X in L1 . Then, for each n, Xn = E(X | Fn ) almost surely.
‖E(X − Xn | Fm )‖1 ≤ ‖X − Xn ‖1 → 0
(2) E( (X − Xn )2 ) → 0.
Doob-Meyer Decomposition
The (continuous time formulation of the) following decomposition is a crucial
idea in the abstract development of stochastic integration.
Theorem 3.46. Suppose that (Xn ) is an adapted L¹ process. Then (Xn ) has
the decomposition
X = X0 + M + A ,
where M is a martingale and A is a predictable process, both null at 0, and
this decomposition is unique (almost surely). The process A is given by the
recursion
A0 = 0 ,
An = An−1 + E(Xn − Xn−1 | Fn−1 ) .
which means that (Xn −An ) is an L1 -martingale. Let Mn = (Xn −An )−X0 .
Then M is a martingale, M is null at 0 and we have the Doob decomposition
X = X0 + M + A .
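In simple models the recursion for A can be computed by hand. For example (our own illustration, not from the notes), if ξn is a simple symmetric random walk then Xn = ξn² is a submartingale with E(Xn − Xn−1 | Fn−1 ) = 1, so An = n and Mn = ξn² − n. The sketch below checks E(Mn ) = 0 by simulation.

    import numpy as np

    rng = np.random.default_rng(4)
    steps = rng.choice([-1, 1], size=(100_000, 30))
    xi = np.cumsum(steps, axis=1)           # simple random walk ξ_n
    X = xi**2                               # submartingale X_n = ξ_n²

    # Doob decomposition: A_n = n (predictable, increasing), M_n = X_n - n.
    M = X - np.arange(1, 31)
    print(M.mean(axis=0)[[0, 9, 29]])       # ~ 0 for each n: M is a martingale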
To see uniqueness, suppose that
X = X0 + M′ + A′ = X0 + M + A .
Then Mn − M′n = A′n − An for each n. Taking E( · | Fn−1 ) and using that
M and M′ are martingales while A and A′ are predictable, we find
A′n − An = Mn−1 − M′n−1 = A′n−1 − An−1 ,
and so
An − An−1 = A′n − A′n−1 a.s.
Now, both A and A′ are null at 0 and so A0 = A′0 (= 0) almost surely and
therefore
A1 = A′1 + (A0 − A′0 ) = A′1 a.s.
Continuing in this way, we see that An = A′n a.s. for each n. It follows that
there is some Λ ⊂ Ω with P (Λ) = 1 and such that An (ω) = A′n (ω) for all n
for all ω ∈ Λ. However, M = X − X0 − A and M ′ = X − X0 − A′ and so
Mn (ω) = Mn′ (ω) for all n for all ω ∈ Λ, that is M = M ′ almost surely.
Now suppose that X = X0 + M + A is a submartingale. Then A = X − M − X0
is also a submartingale so, since A is predictable,
An−1 ≤ E(An | Fn−1 ) = An a.s.
Once again, it follows that there is some Λ ⊂ Ω with P (Λ) = 1 and such
that
An (ω) ≥ An−1 (ω)
for all n and all ω ∈ Λ, that is, A is increasing almost surely.
X ² = X0² + M + A .
by the Cauchy-Schwarz inequality. Since ‖Xn − Y ‖2 → 0, we must have the
equality ∫_B (X − Y ) dP = 0 for any B ∈ ∪n Fn .
Chapter 4
Stochastic integration - informally
Definition 4.1. The process (Xt )t∈R+ is adapted with respect to the filtration
(Ft )t∈R+ if Xt is Ft -measurable for each t ∈ R+ .
The adapted process (Xt )t∈R+ is a martingale with respect to the filtration
(Ft )t∈R+ if Xt is integrable for each t ∈ R+ and if
E(Xt | Fs ) = Xs almost surely, whenever 0 ≤ s ≤ t.
Remark 4.3. It must be stressed that although it may at first appear quite
innocuous, the change from a discrete index to a continuous one is anything
but. There are enormous technical complications involved in the theory with
a continuous index. Indeed, one might immediately anticipate measure-
theoretic difficulties simply because R+ is not countable.
Remark 4.4. Stochastic processes (Xn ) and (Yn ), indexed by Z+ , are said to
be indistinguishable if
P ( Xn = Yn for all n ∈ Z+ ) = 1 .
The process (Yn )n∈Z+ is said to be a version (or a modification) of (Xn )n∈Z+
if Xn = Yn almost surely for every n ∈ Z+ .
which is to say that (Yt ) is a version of (Xt ) but these processes are far from
indistinguishable.
Note also that the path t 7→ Xt (ω) is constant for every ω, whereas for
every ω, the path t 7→ Yt (ω) has a jump at t = ω. So the paths t 7→ Xt (ω) are
continuous almost surely, whereas with probability one, no path t 7→ Yt (ω)
is continuous.
Claim: Yt is measurable.
Example 4.8 (Doob’s Maximal Inequality). Suppose now that (Xt ) is a non-
negative submartingale with respect to a filtration (Ft ) such that F0 contains
all sets of probability zero. Then Ω0 ∈ F0 and so the process (ξt ) = (1Ω0 Xt )
is also a non-negative submartingale. Fix t ≥ 0 and n ∈ N. For 0 ≤ j ≤ 2ⁿ,
let tj = jt/2ⁿ. Set ηj = ξtj and Gj = Ftj for j ≤ 2ⁿ and ηj = ξt and Gj = Ft
for j > 2ⁿ. Then (ηj ) is a discrete parameter non-negative submartingale
with respect to the filtration (Gj ). According to the discussion above, we
obtain the following estimate.
P ( sups≤t |Xs | ≥ λ ) ≤ (1/λ²) ‖Xt ‖2² .
Proof. Let (Dn ) be the sequence of partitions of [0, t] as above and let
fn (ω) = maxj≤2ⁿ |ξtj (ω)|. Then fn → f = sups≤t |Xs | almost surely and,
since fn ≤ f almost surely, it follows that 1{ fn >µ } → 1{ f >µ } almost
surely. By Lebesgue’s Dominated Convergence Theorem, it follows that
E(1{ fn >µ } ) → E(1{ f >µ } ), that is, P (fn > µ) → P (f > µ).
Let µ < λ. Applying Doob’s maximal inequality, corollary 3.29, for the
discrete-time filtration (Fs ) with s ∈ { 0 = tⁿ0 , tⁿ1 , . . . , tⁿmn = t }, we get
P ( fn > µ ) ≤ P ( fn ≥ µ ) ≤ (1/µ²) ‖Xt ‖2² .
Letting n → ∞ and then µ ↑ λ gives the bound,
as required.
Case 2. a ≤ s ≤ b.
We have
E(Yt | Fs ) = E(h (Xt∧b − Xa ) | Fs )
= h E((Xt∧b − Xa ) | Fs ) a.s.
= h (Xt∧b∧s − Xa ) a.s.
= h (Xs∧b − Xs∧a )
= Ys .
Case 3. b < s.
We find
E(Yt | Fs ) = E(h (Xb − Xa ) | Fs )
= h E((Xb − Xa ) | Fs ) a.s.
= h (Xb − Xa ) a.s.
= h (Xs∧b − Xs∧a )
= Ys
and the proof is complete.
Notation. Let E denote the real linear span of the set of elementary processes.
So any element h ∈ E has the form
h(ω, s) = Σ_{i=1}^{n} gi (ω) 1_{(ai ,bi ]} (s)
for some n, pairs ai < bi and bounded random variables gi where each gi is
Fai -measurable. Notice that h(ω, 0) = 0. In fact, we are not interested in
the value of h at s = 0 as far as integration is concerned. We could have
included random variables of the form g0 (ω) 1{ 0 } (s) in the construction of E,
where g0 is F0 -measurable, but such elements play no rôle.
Definition 4.11. For h = Σ_{i=1}^{n} gi 1_{(ai ,bi ]} ∈ E and T ≥ 0, the stochastic integral
∫_0^T h dX is defined to be the random variable
∫_0^T h dX = Σ_{i=1}^{n} ∫_0^T hi dX = Σ_{i=1}^{n} gi ( XT ∧bi − XT ∧ai ) ,
where hi = gi 1_{(ai ,bi ]} .
Proposition 4.13. For h ∈ E, the process ( ∫_0^t h dX )t∈R+ is an L²-martingale.
Proof. If h = Σ_{i=1}^{n} hi , where each hi is elementary, as above, then
∫_0^t h dX = Σ_{i=1}^{n} ∫_0^t hi dX
Writing Yt = ∫_0^t g dX with g = Σ_{i=1}^{m−1} gi 1_{(ti ,ti+1 ]} , we have
E(Yt²) = E( ( Σ_{i=1}^{m−1} gi (Xti+1 − Xti ) )² )
 = E( Σ_{i,j=1}^{m−1} gi gj (Xti+1 − Xti )(Xtj+1 − Xtj ) ) .
Now, suppose i ≠ j, say i < j. Writing ∆Xi for (Xti+1 − Xti ), we find that
E( gi gj ∆Xi ∆Xj ) = E( gi gj ∆Xi E( ∆Xj | Ftj ) ) = 0 ,
by the martingale property of (Xt ). Of course, this also holds for i > j
(simply interchange i and j) and so we may say that
E( ( ∫_0^t g dX )² ) = E( Σ_{j=1}^{m−1} gj² ∆Xj² ) = Σ_{j=1}^{m−1} E( gj² ∆Xj² ) .
Next, consider E( gj² ∆Xj² ). With X ² = M + A as in the Doob-Meyer
decomposition, E( ∆Xj² | Ftj ) = E( Atj+1 − Atj | Ftj ) and, writing ∆Aj for
Atj+1 − Atj , it follows that
E( gj² ∆Xj² ) = E( gj² ∆Aj ) .
Hence
E( ( ∫_0^t g dX )² ) = Σ_{j=1}^{m−1} E( gj² (Atj+1 − Atj ) ) .
On the other hand, the pathwise integral ∫_0^t g² dA is
∫_0^t g²(ω) dA(ω) = Σ_{j=1}^{m−1} gj²(ω) ( Atj+1 (ω) − Atj (ω) ) .
Taking expectations,
E( ∫_0^t g² dA ) = E( Σ_j gj² ∆Aj ) ,
and so we arrive at
E( ( ∫_0^t g dXs )² ) = E( ∫_0^t g² dAs ) ,
known as the isometry property. This isometry relation allows one to extend
the class of integrands in the stochastic integral. Indeed, suppose that (gn )
is a sequence from E which converges to a map h : R+ × Ω → R in the sense
that E( ∫_0^t (gn − h)² dAs ) → 0. The isometry property then tells us that the
sequence ( ∫_0^t gn dX ) of random variables is a Cauchy sequence in L² and so
converges to some Yt in L². This allows us to define ∫_0^t h dX as this Yt . We
will consider this again for the case when (Xt ) is a Wiener process.
Chapter 5
Wiener process
as required.
6. Such a Wiener process exists. In fact, let p(x, t) = (1/√(2πt)) e^{−x²/2t} denote
the density of Wt = Wt − W0 and let 0 < t1 < · · · < tn . Then to
say that Wt1 = x1 , Wt2 = x2 , . . . , Wtn = xn is to say that Wt1 = x1 ,
Wt2 − Wt1 = x2 − x1 , . . . , Wtn − Wtn−1 = xn − xn−1 . Now, the random
variables Wt1 , Wt2 −Wt1 , . . . , Wtn −Wtn−1 are independent so their joint
density is a product of individual densities. This suggests that the joint
probability density of Wt1 , . . . , Wtn is
p(x1 , x2 , . . . , xn ; t1 , t2 , . . . , tn )
 = p(x1 , t1 ) p(x2 − x1 , t2 − t1 ) · · · p(xn − xn−1 , tn − tn−1 ) .
Let Ωt = Ṙ, the one-point compactification of R, and let Ω = ∏_{t∈R+} Ωt
be the (compact) product space. Suppose that f ∈ C(Ω) depends only
on a finite number of coordinates in Ω, f (ω) = f (xt1 , . . . , xtn ), say. Then
we define
ρ(f ) = ∫_{Rⁿ} p(x1 , . . . , xn ; t1 , . . . , tn ) f (x1 , . . . , xn ) dx1 · · · dxn .
For s ≤ t,
E( Ws Wt ) = E( Ws (Wt − Ws ) ) + E( Ws² )
 = E( Ws ) E( Wt − Ws ) + var Ws , by independence of increments,
 = 0 + s
 = s ,
since E(Ws ) = 0 = E(Wt − Ws ) and var Ws = s.
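The covariance E(Ws Wt ) = s ∧ t can be confirmed by Monte Carlo, building W on a grid from independent N(0, ∆t) increments (a minimal sketch with our own parameter choices):

    import numpy as np

    rng = np.random.default_rng(5)
    n_paths, n_steps, T = 200_000, 100, 1.0
    dt = T / n_steps
    # Brownian paths: cumulative sums of independent N(0, dt) increments.
    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

    # Columns 29 and 79 correspond to s = 0.3 and t = 0.8.
    print((W[:, 29] * W[:, 79]).mean())   # ~ min(s, t) = 0.3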
(iii) We have
∫_{−∞}^{∞} e^{−αx²/2} dx = α^{−1/2} √(2π) .
Differentiating both sides twice with respect to α gives
∫_{−∞}^{∞} x⁴ e^{−αx²/2} dx = 3 α^{−5/2} √(2π) .
Example 5.4. For a ∈ R, ( e^{aWt − a²t/2} ) (and so ( e^{Wt − t/2} ), in particular) is a
martingale. Indeed, for s ≤ t,
E( e^{aWt − a²t/2} | Fs ) = E( e^{a(Wt −Ws )+aWs − a²t/2} | Fs )
 = e^{−a²t/2} E( e^{a(Wt −Ws )} e^{aWs} | Fs )
 = e^{−a²t/2} e^{aWs} E( e^{a(Wt −Ws )} | Fs )
 = e^{−a²t/2} e^{aWs} E( e^{a(Wt −Ws )} ) , by independence,
 = e^{−a²t/2} e^{aWs} e^{a²(t−s)/2}
 = e^{aWs − a²s/2} ,
since we know that Wt − Ws has a normal distribution with mean zero and
variance t − s.
Example 5.5. For k ∈ N, E(Wt^{2k} ) = (2k)! t^k / (2^k k!).
To see this, let Ik = E(Wt^{2k} ) and for n ∈ N, let P (n) be the statement that
E(Wt^{2n} ) = (2n)! t^n / (2^n n!). Since I1 = t, we see that P (1) is true. Integration
by parts gives
Ik+1 = E(Wt^{2k+2} ) = (2k + 1) t E(Wt^{2k} ) = (2k + 1) t Ik ,
and the induction proceeds.
Σ_{j=0}^{mn} ( ∆n Wj )² → T
in L² as n → ∞.
To see this, we calculate (using T = Σ_{j=0}^{mn} ∆n tj )
‖ Σ_{j=0}^{mn} ( ∆n Wj )² − T ‖2² = E( ( Σ_{j=0}^{mn} ( ( ∆n Wj )² − ∆n tj ) )² )
 = E( Σ_{i,j} ( ( ∆n Wi )² − ∆n ti )( ( ∆n Wj )² − ∆n tj ) )
 = Σ_j E( ( ( ∆n Wj )² − ∆n tj )² ) , by independence of the increments,
 = Σ_j 2 ( ∆n tj )² , since E( ( ∆n Wj )⁴ ) = 3 ( ∆n tj )² ,
 ≤ 2 mesh(Dn ) T → 0
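The convergence of the quadratic variation is plainly visible numerically: as the mesh of the partition of [0, T ] shrinks, the sum of squared increments of a simulated path concentrates at T (a minimal sketch, parameters ours):

    import numpy as np

    rng = np.random.default_rng(6)
    T = 2.0
    for n_steps in (10, 100, 1_000, 10_000):
        dt = T / n_steps
        dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)  # increments of one path
        print(n_steps, (dW**2).sum())                    # -> T = 2.0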
Example 5.7. For any c > 0, Yt = (1/c) W_{c²t} is a Wiener process with respect
to the filtration generated by the Yt ’s.
We can see this as follows. Clearly, Y0 = 0 almost surely and the map
t ↦ Yt (ω) = W_{c²t} (ω)/c is almost surely continuous because t ↦ Wt (ω)
is. Also, for any 0 ≤ s < t, the distribution of the increment Yt − Ys
is that of (W_{c²t} − W_{c²s} )/c, namely, normal with mean zero and variance
(c²t − c²s)/c² = t − s. Let Gt = F_{c²t} , which is equal to the σ-algebra generated
by the random variables { Ys : s ≤ t }. Then c(Yt − Ys ) = (W_{c²t} − W_{c²s} ) is
independent of F_{c²s} = Gs , and so therefore is Yt − Ys . Hence (Yt ) is a Wiener
process with respect to (Gt ).
Remark 5.8. For t > 0, let Yt = t X1/t and set Y0 = 0. Then for any
0 < s < t,
Yt − Ys = t X1/t − s X1/s = (t − s) X1/t − s ( X1/s − X1/t ) .
Now, 0 < 1/t < 1/s and so X1/t and (X1/s − X1/t ) are independent normal
random variables with zero means and variances given by 1/t and 1/s − 1/t,
respectively. It follows that Yt − Ys is a normal random variable with mean
zero and variance (t − s)²/t + s²(1/s − 1/t) = (t − s). When s = 0, we see
that Yt − Y0 = Yt = t X1/t which is a normal random variable with mean
zero and variance t²/t = t.
Let (Gt ) be the filtration where Gt is the σ-algebra generated by the family
{ Ys : s ≤ t }. Again, suppose that 0 < s < t. Then for any r < s
At t = 0, we have
Example 5.9. Let (Wt ) be a Wiener process and let Xt = µt + σWt for t ≥ 0
(where µ and σ are constants). Then (Xt ) is a martingale if µ = 0 but is a
submartingale if µ ≥ 0.
We see that for 0 ≤ s < t,
E(Xt | Fs ) = µt + σ E(Wt | Fs ) = µt + σWs = Xs + µ(t − s) ≥ Xs
whenever µ ≥ 0.
Theorem 5.10. With probability one, the sample path t 7→ Wt (ω) is nowhere
differentiable.
and
| f ((k+1)/n) − f (k/n) | ≤ | f ((k+1)/n) − f (s) | + | f (s) − f (k/n) |
 ≤ 2β |(k+1)/n − s| + 2β |s − k/n|
 ≤ 4β/n
and
| f (k/n) − f ((k−1)/n) | ≤ | f (k/n) − f (s) | + | f (s) − f ((k−1)/n) |
 ≤ 2β |k/n − s| + 2β |s − (k−1)/n|
 ≤ 6β/n ,
and similarly | f ((k+2)/n) − f ((k+1)/n) | ≤ 6β/n. Now, with
gk = max{ |W(k+2)/n − W(k+1)/n | , |W(k+1)/n − Wk/n | , |Wk/n − W(k−1)/n | } ,
let
Bn = { ω : gk (ω) ≤ 6β/n for some k ≤ n − 2 } .
Now, if ω ∈ An , then Wt (ω) is differentiable at some s and furthermore
|Wt (ω) − Ws (ω)| ≤ 2β |t − s| if |t − s| ≤ 2/n. However, according to our
discussion above, this means that gk (ω) ≤ 6β/n where k is the largest
integer with k/n ≤ s. Hence ω ∈ Bn and so An ⊂ Bn . Now,
Bn = ∪_{k=1}^{n−2} { ω : gk (ω) ≤ 6β/n }
and so
P (Bn ) ≤ Σ_{k=1}^{n−2} P ( gk ≤ 6β/n ) .
We estimate
P ( gk ≤ 6β/n ) = P ( max{ |W(k+2)/n − W(k+1)/n | , |W(k+1)/n − Wk/n | ,
  |Wk/n − W(k−1)/n | } ≤ 6β/n )
 = P ( { |W(k+2)/n − W(k+1)/n | ≤ 6β/n } ∩ { |W(k+1)/n − Wk/n | ≤ 6β/n }
  ∩ { |Wk/n − W(k−1)/n | ≤ 6β/n } )
 = P ( |W(k+2)/n − W(k+1)/n | ≤ 6β/n ) P ( |W(k+1)/n − Wk/n | ≤ 6β/n )
  × P ( |Wk/n − W(k−1)/n | ≤ 6β/n ) ,
by independence of increments,
 = ( √(n/2π) ∫_{−6β/n}^{6β/n} e^{−nx²/2} dx )³
 ≤ ( √(n/2π) · 12β/n )³
 = C n^{−3/2}
Itô integration
We wish to indicate how one can construct stochastic integrals with respect
to the Wiener process. The resulting integral is called the Itô integral. We
follow the strategy discussed earlier, namely, we first set up the integral
for integrands which are step-functions. Next, we establish an appropriate
isometry property and it then follows that the definition can be extended
abstractly by continuity considerations.
We shall consider integration over the time interval [0, T ], where T > 0 is
fixed throughout. For an elementary process h ∈ E, we define the stochastic
integral ∫_0^T h dW by
∫_0^T h dW ≡ I(h)(ω) = Σ_{i=1}^{n} gi (ω) ( Wti+1 (ω) − Wti (ω) ) ,
where h = Σ_{i=1}^{n} gi 1_{(ti ,ti+1 ]} with 0 = t1 < · · · < tn+1 = T and where gi
is bounded and Fti -measurable. Now, we know that Wt² has Doob-Meyer
decomposition Wt² = Mt + t, where (Mt ) is an L¹-martingale. Using this,
we can calculate E(I(h)²) as in the abstract set-up and we find that
E( I(h)² ) = E( ∫_0^T h²(t, ω) dt ) ,
which is the isometry property for the Itô integral.
For any 0 ≤ t ≤ T , we define the stochastic integral ∫_0^t h dW by
∫_0^t h dW = I( h 1_{(0,t]} ) = Σ_{i=1}^{n} gi (ω) ( Wti+1 ∧t (ω) − Wti ∧t (ω) ) .
We construct the stochastic integral I(f ) (and It (f )) for any f ∈ KT via the
isometry property. Indeed, we have
E( ( I(hn ) − I(hm ) )² ) = E( I(hn − hm )² ) = E( ∫_0^T (hn − hm )² ds ) .
But by (∗), (hn ) is a Cauchy sequence in KT (with respect to the norm
‖h‖KT = E( ∫_0^T h² ds )^{1/2} ) and so (I(hn )) is a Cauchy sequence in L². It
follows that there is some F ∈ L²(FT ) such that
E( (F − I(hn ))² ) → 0 .
We denote F by I(f ) or by ∫_0^T f dWs . One checks that this construction
does not depend on the particular choice of the sequence (hn ) converging to
f in KT . The Itô stochastic integral obeys the isometry property
E( ( ∫_0^T f dWs )² ) = ∫_0^T E(f ²) ds
for any f ∈ KT .
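The isometry can be checked numerically for a simple integrand, say f = Ws itself, for which ∫_0^T E(Ws²) ds = T ²/2. The sketch below (our own illustration) approximates I(f ) by the non-anticipating left-endpoint sums used for elementary processes.

    import numpy as np

    rng = np.random.default_rng(7)
    n_paths, n_steps, T = 100_000, 500, 1.0
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.cumsum(dW, axis=1)

    # Left-endpoint (non-anticipating) sums approximating I = int_0^T W dW.
    W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])
    I = (W_left * dW).sum(axis=1)

    # Isometry: E(I^2) = int_0^T E(W_s^2) ds = int_0^T s ds = T^2/2.
    print((I**2).mean(), T**2 / 2)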
For any f, g ∈ KT , we can apply the isometry property to f ± g to get
E( ( I(f ) ± I(g) )² ) = E( ( I(f ± g) )² ) = E( ∫_0^T (f ± g)² ds ) .
E(It (f ) | Fs ) = Is (f ) , 0≤s≤t≤T.
for any ε > 0. In particular, if we set ε = 1/2^k and denote by Ak the event
Ak = { sup_{0≤t≤T} | It (hnk+1 − hnk ) | > 1/2^k } ,
then we get
P (Ak ) ≤ (2^k )² ‖ I(hnk+1 − hnk ) ‖2² < (2^k )² · 1/(2^k 4^k ) = 1/2^k .
But then Σk P (Ak ) < ∞ and so by the Borel-Cantelli Lemma (Lemma 1.3),
it follows that
P ( Ak infinitely-often ) = 0 .
Write B = { Ak infinitely-often }.
For ω ∈ Bᶜ, we must have
sup_{0≤t≤T} | It (hnk+1 )(ω) − It (hnk )(ω) | ≤ 1/2^k
for all sufficiently large k.
Hence, for j > k (and such ω),
sup_{0≤t≤T} | It (hnj )(ω) − It (hnk )(ω) | ≤ 1/2^{j−1} + 1/2^{j−2} + · · · + 1/2^k
 < 2/2^k .
This means that for each ω ∈ Bᶜ, the sequence of functions ( It (hnk )(ω) )
of t is a Cauchy sequence with respect to the norm ‖ϕ‖ = sup_{0≤t≤T} |ϕ(t)|.
In other words, it is uniformly Cauchy and so must converge uniformly on
[0, T ] to some function of t, say Jt (ω). Now, for each k, there is a set Ek with
P (Ek ) = 1 such that if ω ∈ Ek then t ↦ It (hnk )(ω) is continuous on [0, T ].
Set E = ∩k Ek , so P (E) = 1. Then P (Bᶜ ∩ E) = 1 and if ω ∈ Bᶜ ∩ E then
t ↦ Jt (ω) is continuous on [0, T ]. We set Jt (ω) = 0 for all t if ω ∉ Bᶜ ∩ E,
which means that t ↦ Jt (ω) is continuous for all ω ∈ Ω.
However, It (hnk ) → It (f ) in L² and so there is some subsequence ( It (hnkj ) )
such that It (hnkj )(ω) → It (f )(ω) almost surely, say, on St with P (St ) = 1.
But It (hnkj )(ω) → Jt (ω) on Bᶜ ∩ E and so It (hnkj )(ω) → Jt (ω) on Bᶜ ∩ E ∩ St
and therefore Jt (ω) = It (f )(ω) for ω ∈ Bᶜ ∩ E ∩ St . Since P (Bᶜ ∩ E ∩ St ) = 1,
we may say that Jt = It (f ) almost surely.
We still have to show that the process (Jt ) is adapted. This is where we use
the hypothesis that Ft contains all events of zero probability. Indeed, by
construction, we know that
Jt = limk→∞ It (hnk ) 1_{Bᶜ∩E} ,
where each It (hnk ) 1_{Bᶜ∩E} is Ft -measurable.
Chapter 6
Itô’s Formula
Theorem 6.1 (Itô’s Formula). Let F (t, x) be a function such that the partial
derivatives ∂t F and ∂xx F are continuous. Suppose that ∂x F (t, Wt ) ∈ KT .
Then
F (T, WT ) = F (0, W0 ) + ∫_0^T ( ∂t F (t, Wt ) + ½ ∂xx F (t, Wt ) ) dt
  + ∫_0^T ∂x F (t, Wt ) dWt
almost surely.
Proof. Suppose first that F (t, x) is such that the partial derivatives ∂x F and
∂xx F are bounded on [0, T ]×R, say, |∂x F | < C and |∂xx F | < C. Let Ω0 ⊂ Ω
be such that P (Ω0 ) = 1 and t 7→ Wt (ω) is continuous for each ω ∈ Ω0 . Fix
ω ∈ Ω0 . Let tj^{(n)} = jT /n, so that 0 = t0^{(n)} < t1^{(n)} < · · · < tn^{(n)} = T partitions
the interval [0, T ] into n equal subintervals. Suppressing the n dependence,
let ∆j t = tj+1 − tj and ∆j W = Wtj+1 (ω) − Wtj (ω). Then we have
F (T, WT (ω)) − F (0, W0 (ω)) = Σ_{j=0}^{n−1} ( F (tj+1 , Wtj+1 (ω)) − F (tj , Wtj (ω)) )
 = Σ_{j=0}^{n−1} ∂t F (τj , Wtj+1 (ω)) ∆j t + Σ_{j=0}^{n−1} ∂x F (tj , Wtj (ω)) ∆j W
  + ½ Σ_{j=0}^{n−1} ∂xx F (tj , zj ) (∆j W )²
for some zj between Wtj (ω) and Wtj+1 (ω) and some τj ∈ [tj , tj+1 ], by
Taylor’s Theorem (to 2nd order),
 ≡ Γ1 (n) + Γ2 (n) + Γ3 (n) .
By uniform continuity, | ∂t F (τj , Wtj+1 (ω)) − ∂t F (tj , Wtj (ω)) | < ε
for sufficiently large n (so that |τj − tj+1 | ≤ 1/n is sufficiently small).
Writing
Γ1 (n) = Σ_{j=0}^{n−1} ( ∂t F (τj , Wtj+1 (ω)) − ∂t F (tj , Wtj (ω)) ) ∆j t
  + Σ_{j=0}^{n−1} ∂t F (tj , Wtj (ω)) ∆j t ,
for large n, the first summation on the right hand side is bounded by
Σ_j ε ∆j t = ε T and, as n → ∞, the second summation converges to the
integral ∫_0^T ∂s F (s, Ws (ω)) ds. It follows that
Γ1 (n) → ∫_0^T ∂s F (s, Ws (ω)) ds
almost surely as n → ∞.
Next, consider the term Γ2 (n). This is
Γ2 (n) = Σ_{j=0}^{n−1} ∂x F (tj , Wtj (ω)) ∆j W = I(gn )(ω) ,
where gn ∈ E is given by
gn (s, ω) = Σ_{j=0}^{n−1} ∂x F (tj , Wtj (ω)) 1_{(tj ,tj+1 ]} (s) .
One shows that ‖I(gn ) − I(g)‖2 → 0, where g(s, ω) = ∂x F (s, Ws (ω)),
that is, Γ2 (n) = I(gn ) → I(g) in L², and so there is a subsequence (gnk ) such
that I(gnk ) → I(g) almost surely. That is, Γ2 (nk ) → I(g) almost surely.
We now turn to the term Γ3 (n) = ½ Σ_{j=0}^{n−1} ∂xx F (tj , zj )(∆j W )². For any
ω ∈ Ω, write
½ ∂xx F (ti , zi )(∆i W )² = ½ ∂xx F (ti , Wti (ω)) ( (∆i W )² − ∆i t )
  + ½ ∂xx F (ti , Wti (ω)) ∆i t
  + ½ ( ∂xx F (ti , zi ) − ∂xx F (ti , Wti (ω)) ) (∆i W )²
and denote the corresponding sums over i by Φ1 (n), Φ2 (n) and Φ3 (n). The
middle sum Φ2 (n) is a Riemann sum, so that
Φ2 (n) → ½ ∫_0^T ∂xx F (s, Ws (ω)) ds
as n → ∞. Hence Φ2 (nk ) → ½ ∫_0^T ∂xx F (s, Ws (ω)) ds almost surely as
k → ∞.
To discuss Φ1 (nk ), consider
E( Φ1 (nk )² ) = E( ( Σ_i αi )² ) = E( Σ_{i,j} αi αj ) ,
where αi = ∂xx F (ti , Wti (ω)) ( (∆i W )² − ∆i t ). By independence, if i < j,
E( αi αj ) = E( αi ∂xx F (tj , Wtj ) ) E( (∆j W )² − ∆j t ) = 0 ,
and so
E( Φ1 (nk )² ) = E( Σ_i αi² )
 = Σ_i E( ( ∂xx F (ti , Wti ) )² ( (∆i W )² − ∆i t )² )
 = Σ_i E( ( ∂xx F (ti , Wti ) )² ) E( ( (∆i W )² − ∆i t )² ) , by independence,
 ≤ C ² Σ_i E( ( (∆i W )² − ∆i t )² ) , using ( ∂xx F )² ≤ C ²,
 = C ² Σ_i E( (∆i W )⁴ − 2 ∆i t (∆i W )² + (∆i t)² )
 = C ² Σ_i ( E( (∆i W )⁴ ) − (∆i t)² )
 = C ² Σ_i ( 3 (∆i t)² − (∆i t)² )
 = 2 C ² Σ_i (∆i t)²
 = 2 C ² Σ_i ( T /nk )²
 = 2 C ² T ²/nk → 0
as k → ∞. Hence Φ1 (nk ) → 0 in L².
Now we consider Φ3 (n) = ½ Σ_i ( ∂xx F (ti , zi ) − ∂xx F (ti , Wti (ω)) ) (∆i W )².
Fix ω ∈ Ω0 . Then (just as we have argued for ∂x F ), we may say that the
function t ↦ ∂xx F (t, Wt (ω)) is uniformly continuous on [0, T ]. It follows
that for any given ε > 0, | ∂xx F (ti , zi ) − ∂xx F (ti , Wti (ω)) | < ε for all i,
for all sufficiently large n, so that |Φ3 (n)| ≤ ½ ε Σ_i (∆i W )²; along the
subsequence (nk ) the latter converges to ½ ε T , and so Φ3 (nk ) → 0
almost surely, which concludes the proof for the case when ∂x F and ∂xx F
are both bounded.
are both bounded. To remove this restriction, let ϕn : R → R be a smooth
function such that ϕn (x) = 1 for |x| ≤ n + 1 and ϕn (x) = 0 for |x| ≥ n + 2.
Set Fn (t, x) = ϕn (x) F (t, x). Then ∂x Fn and ∂xx Fn are continuous and
bounded on [0, T ] × R. Moreover, Fn = F , ∂t Fn = ∂t F , ∂x Fn = ∂x F and
∂xx Fn = ∂xx F whenever |x| ≤ n.
We can apply the previous argument to deduce that for each n there is
some set Bn ⊂ Ω with P (Bn ) = 1 such that
Fn (T, WT ) = Fn (0, W0 ) + ∫_0^T ( ∂t Fn (t, Wt ) + ½ ∂xx Fn (t, Wt ) ) dt
  + ∫_0^T ∂x Fn (t, Wt ) dWt (∗∗)
for ω ∈ Bn . Let A = Ω0 ∩ ( ∩n Bn ) so that P (A) = 1. Fix ω ∈ A. Then ω ∈ Ω0
and so t 7→ Wt (ω) is continuous. It follows that there is some N ∈ N such
that |Wt (ω)| < N for all t ∈ [0, T ] and so FN (t, Wt (ω)) = F (t, Wt (ω)) for
t ∈ [0, T ] and the same remark applies to the partial derivatives of FN and F .
But ω ∈ BN and so (∗∗) holds with Fn replaced by FN which in turn means
that (∗∗) holds with now FN replaced by F – simply because FN and F
(and the derivatives) agree for such ω and all t in the relevant range.
We conclude that
F (T, WT ) = F (0, W0 ) + ∫_0^T ( ∂t F (t, Wt ) + ½ ∂xx F (t, Wt ) ) dt
  + ∫_0^T ∂x F (t, Wt ) dWt
that is,
∫_0^t s dWs = t Wt − ∫_0^t Ws ds ,
and, similarly,
Wt² = t + 2 ∫_0^t Ws dWs .
or
∫_0^t cos(Ws ) dWs = sin Wt + ½ ∫_0^t sin(Ws ) ds .
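Such identities lend themselves to a pathwise numerical check. The sketch below (ours, with arbitrary parameters) compares the two sides of ∫_0^t s dWs = t Wt − ∫_0^t Ws ds along a single simulated path.

    import numpy as np

    rng = np.random.default_rng(8)
    n_steps, t = 100_000, 1.0
    dt = t / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    W = np.cumsum(dW)
    s = np.arange(n_steps) * dt        # left endpoints s_j

    lhs = (s * dW).sum()               # Ito sum for int_0^t s dW_s
    rhs = t * W[-1] - W.sum() * dt     # t W_t - Riemann sum for int_0^t W_s ds
    print(lhs, rhs)                    # agree up to discretization error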
Example 6.5. Suppose that the function F (t, x) obeys ∂t F = −½ ∂xx F . Then
if ∂x F (t, Wt ) ∈ KT , Itô’s formula gives
F (T, WT ) = F (0, W0 ) + ∫_0^T ( ∂t F (t, Wt ) + ½ ∂xx F (t, Wt ) ) dt
  + ∫_0^T ∂x F (t, Wt ) dWt ,
where the first integral vanishes and the second, as a function of T , is a
martingale. Taking F (t, x) = e^{αx − tα²/2}, we may say that e^{αWt − tα²/2} is a
martingale.
Itô’s formula also holds when Wt is replaced by the somewhat more general
processes of the form
Xt = X0 + ∫_0^t u(s, ω) ds + ∫_0^t v(s, ω) dWs ,
where u and v obey P ( ∫_0^t |u|² ds < ∞ ) = 1 and similarly for v. One has
F (t, Xt ) − F (0, X0 ) = ∫_0^t { ∂s F (s, Xs ) + u ∂x F (s, Xs )
  + ½ v ² ∂xx F (s, Xs ) } ds
  + ∫_0^t v(s, ω) ∂x F (s, Xs ) dWs .
In differential notation,
dX = u dt + v dW ,
which leads to
dX dX = u² dt² + 2uv dt dW + v ² dW dW = v ² dt ,
according to the Itô table:
dt² = 0 , dt dW = 0 , dW ² = dt .
Example 6.6. Consider Zt = e^{ ∫_0^t g dW − ½ ∫_0^t g² ds }. Let
Xt = X0 + ∫_0^t g dW − ½ ∫_0^t g² ds
with X0 = 0, so that Zt = e^{Xt}. By Itô’s formula (with F (x) = eˣ),
dZ = eˣ dX + ½ eˣ dX dX = e^X ( g dW − ½ g² ds ) + ½ e^X g² ds
 = e^X g dW .
dMt = dF (Vt ) = ( 0 − ½ u² e^V + ½ u² e^V ) ds − u e^V dW
 = −u e^{Vt} dWt .
So
Mt = M0 − ∫_0^t u e^{Vs} dWs ,
which is a martingale.
where the Itô table is enhanced by the extra rule that dW ^{(i)} dW ^{(j)} = δij dt.
Consider the stochastic differential equation
dXt = µ Xt dt + σ Xt dWt
with X0 = x0 > 0. Here, µ and σ are constants with σ > 0. The quantity
Xt is the object under investigation and the second term on the right is
supposed to represent some random input to the change dXt . Of course,
mathematically, this is just a convenient shorthand for the corresponding
integral equation. Such an equation has been used in financial mathematics
(to model a risky asset).
To solve this stochastic differential equation, let us seek a solution of the
form Xt = f (t, Wt ) for some suitable function f (t, x). According to Itô’s
formula, such a process would satisfy
dXt = df = ( ft (t, Wt ) + ½ fxx (t, Wt ) ) dt + fx (t, Wt ) dWt . (∗∗)
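Anticipating the outcome of this calculation: the solution is the geometric Brownian motion Xt = x0 exp( (µ − σ²/2) t + σWt ), a standard fact which the reader can recover from Itô's formula. The sketch below (our own illustration, arbitrary parameters) compares an Euler discretization of the equation with this closed form along one path.

    import numpy as np

    rng = np.random.default_rng(9)
    mu, sigma, x0, T, n = 0.05, 0.2, 1.0, 1.0, 10_000
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    W = np.cumsum(dW)

    # Euler scheme for dX = mu X dt + sigma X dW ...
    X = x0
    for k in range(n):
        X += mu * X * dt + sigma * X * dW[k]

    # ... against X_T = x0 exp((mu - sigma^2/2) T + sigma W_T).
    print(X, x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W[-1]))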
g′ Y = −α g Y and g h = σ .
The first of these equations leads to the solution g(t) = C e^{−αt}. This gives
h = σ/g = σ e^{αt}/C and so dY = (σ/C) e^{αt} dW , or
Yt = Y0 + (σ/C) ∫_0^t e^{αs} dWs .
Hence
Xt = C e^{−αt} ( Y0 + (σ/C) ∫_0^t e^{αs} dWs )
 = e^{−αt} ( C Y0 + σ ∫_0^t e^{αs} dWs ) .
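The same comparison as before works for this Langevin-type equation dX = −αX dt + σ dW and the solution just obtained (a sketch with our own parameter choices, taking C Y0 = x0 ):

    import numpy as np

    rng = np.random.default_rng(10)
    alpha, sigma, x0, T, n = 1.5, 0.3, 1.0, 1.0, 10_000
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    s = np.arange(n) * dt              # left endpoints

    # Euler scheme for dX = -alpha X dt + sigma dW ...
    X = x0
    for k in range(n):
        X += -alpha * X * dt + sigma * dW[k]

    # ... against X_T = exp(-alpha T)(x0 + sigma int_0^T e^{alpha s} dW_s).
    stoch_int = (np.exp(alpha * s) * dW).sum()
    print(X, np.exp(-alpha * T) * (x0 + sigma * stoch_int))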
Feynman-Kac Formula
Consider the diffusion equation
ut (t, x) = ½ uxx (t, x) + q(x) u(t, x) .
Proof. Consider Itô’s formula applied to the function f (s, y) = u(t − s, x + y).
In this case ∂s f = −∂1 u = −u1 , ∂y f = ∂2 u = u2 , ∂yy f = ∂22 u = u22 and so
we get
df (s, Ws ) = −u1 (t − s, x + Ws ) ds + ½ u22 (t − s, x + Ws ) dWs dWs
  + u2 (t − s, x + Ws ) dWs
 = −q(x + Ws ) u(t − s, x + Ws ) ds + u2 (t − s, x + Ws ) dWs ,
using dWs dWs = ds and the equation u1 = ½ u22 + q u. Now set
Ms = f (s, Ws ) e^{ ∫_0^s q(x+Wv ) dv } .
Then
dMs = e^{ ∫_0^s q(x+Wv ) dv } df (s, Ws ) + f q(x + Ws ) e^{ ∫_0^s q(x+Wv ) dv } ds
 = u2 (t − s, x + Ws ) e^{ ∫_0^s q(x+Wv ) dv } dWs .
It follows that
Mτ = M0 + ∫_0^τ u2 (t − s, x + Ws ) e^{ ∫_0^s q(x+Wv ) dv } dWs
where α = E(X).
We will not prove this here but will just make some remarks. To say
that X in the Hilbert space L²(FT ) obeys E(X) = 0 is to say that X is
orthogonal to 1, so the theorem says that every element of L²(FT ) orthogonal
to 1 is a stochastic integral. The uniqueness of f ∈ KT in the theorem is
a consequence of the isometry property. Moreover, if we denote ∫_0^T f dW
by I(f ), then the isometry property tells us that f ↦ I(f ) is an isometric
isomorphism between KT and the subspace of L²(FT ) orthogonal to 1.
Note, further, that the second part of the theorem follows from the first
part by setting X = XT . Indeed, by the first part, XT = α + ∫_0^T f dW and
so
Xt = E(XT | Ft ) = α + ∫_0^t f dWs
for 0 ≤ t ≤ T . Evidently, α = X0 = E(XT ).
from the above. It follows that X also has a normal distribution, with mean
zero and variance σ² with respect to Q. Hence, if Φ denotes the standard
normal distribution function, then for any x ∈ R we have
Q(X ≤ x) = Φ(x/σ) ⟹ P ({ X ≤ x } ∩ A) / P (A) = Φ(x/σ) = P (X ≤ x)
 ⟹ P ({ X ≤ x } ∩ A) = P (X ≤ x) P (A)
for any A ∈ G with P (A) ≠ 0. This trivially also holds for A ∈ G with
P (A) = 0 and so we may conclude that this holds for all A ∈ G, which means
that X is independent of G.
Using this, we can now discuss Lévy’s theorem where as before, we work
with the minimal Wiener filtration.
So
e^{iθXt} − e^{iθXs} = −½ θ² ∫_s^t e^{iθXu} du + iθ ∫_s^t e^{iθXu} β dWu
and therefore
e^{iθ(Xt −Xs )} − 1 = −½ θ² ∫_s^t e^{iθ(Xu −Xs )} du + iθ ∫_s^t e^{iθ(Xu −Xs )} β dWu . (∗)
Let Yt = ∫_0^t e^{iθXu} β dWu . Then (Yt ) is a martingale and the second term on
the right hand side of equation (∗) is equal to iθ e^{−iθXs} (Yt − Ys ). Taking the
conditional expectation with respect to Fs , this term then drops out and we
find that
E( e^{iθ(Xt −Xs )} | Fs ) − 1 = −½ θ² ∫_s^t E( e^{iθ(Xu −Xs )} | Fs ) du .
dM = d(e^Y ) = ½ e^Y dY dY + e^Y dY
 = ½ e^Y µ² ds + e^Y ( −µ dW − ½ µ² ds )
 = −µ e^Y dW
 = ∫_A Xs Ms dP
 = ∫_A E(Xs MT | Fs ) dP
 = ∫_A Xs MT dP
 = ∫_A Xs dQ ,
and so
EQ (Xt | Fs ) = Xs
and the proof is complete.
Textbooks on Probability and Stochastic Analysis
There are now many books available covering the highly technical mathe-
matical subject of probability and stochastic analysis. Some of them are
particularly instructive.
R. Ash, Real Analysis and Probability, Academic Press, 1972. Excellent for
background probability theory and functional analysis.