
Draft Notes

This document outlines the contents of a course on stochastic differential equations (SDEs). It covers 13 weeks of material, including: - The difference between ordinary differential equations (ODEs) and stochastic differential equations (SDEs) - Applications of SDEs - Probability theory concepts needed for SDEs like probability spaces, stochastic processes - Continuous-time martingales and theorems regarding their convergence - Properties and distributions of Brownian motion as it relates to SDEs - Ito calculus including the Ito integral and Ito's formula for stochastic integrals and differential equations - Solving stochastic differential equations The course progresses from introducing SDEs and underlying probability concepts in

Uploaded by

Junrong Lin

A FIRST COURSE

IN
STOCHASTIC DIFFERENTIAL EQUATIONS
Contents

1 Week 1
  1.1 Tuesday
    1.1.1 Difference between ODE and SDE
    1.1.2 Applications of SDE
    1.1.3 Reviewing for Probability Space
  1.2 Thursday
    1.2.1 More on Probability Theory
    1.2.2 Stochastic Process
2 Week 2
  2.1 Tuesday
    2.1.1 More on Stochastic Process
    2.1.2 Conditional Expectation
    2.1.3 Tips about Probability Theory
    2.1.4 Reviewing on Real Analysis
  2.2 Thursday
    2.2.1 Uniform Integrability
3 Week 3
  3.1 Tuesday
    3.1.1 Reviewing
    3.1.2 Necessary and Sufficient Conditions for UI
    3.1.3 Convergence of random variables
    3.1.4 Martingales in Discrete Time
  3.2 Thursday
    3.2.1 Stopping Time
4 Week 4
  4.1 Tuesday
    4.1.1 Martingales in Discrete Time
    4.1.2 Doob's Inequalities
  4.2 Thursday
    4.2.1 Doob's Maximal Inequality
5 Week 5
  5.1 Tuesday
    5.1.1 Convergence of Martingales
    5.1.2 Continuous-time Martingales
  5.2 Thursday
    5.2.1 Theorems for Continuous Time Martingales
6 Week 6
  6.1 Tuesday
    6.1.1 Localization
    6.1.2 Introduction to Brownian Motion
  6.2 Thursday
    6.2.1 Properties of Brownian Motion
7 Week 7
  7.1 Tuesday
    7.1.1 Reflection Principle
    7.1.2 Distributions of Brownian Motion
  7.2 Thursday
    7.2.1 Unbounded Variation of Brownian Motion
8 Week 8
  8.1 Thursday
    8.1.1 Quadratic Variation
9 Week 9
  9.1 Tuesday
    9.1.1 Introduction to Ito Calculus
  9.2 Thursday
    9.2.1 Approximation by simple processes
10 Week 10
  10.1 Tuesday
    10.1.1 Square Integrable Process
  10.2 Thursday
    10.2.1 Introduction to Ito Integral
    10.2.2 Properties of Ito Integral
11 Week 11
  11.1 Tuesday
    11.1.1 Quadratic Variation of Ito Integral
  11.2 Thursday
    11.2.1 Quadratic Covariation
    11.2.2 Ito Integral for General Processes
12 Week 12
  12.1 Tuesday
    12.1.1 Ito's Formula
    12.1.2 Applications of Ito's Formula
  12.2 Thursday
    12.2.1 Introduction to SDE
13 Week 13
  13.1 Thursday
    13.1.1 Fundamental Theorems in SDE
Chapter 1

Week 1

1.1. Tuesday

1.1.1. Difference between ODE and SDE

We first discuss the difference between deterministic differential equations and stochastic ones by considering several real-life problems.

Problem 1: Population Growth Model. Consider the first-order ODE

    dN(t)/dt = a(t) N(t),    N(0) = N_0,

where N(t) denotes the size of the population at time t; a(t) is the given (deterministic) function describing the rate of growth of the population at time t; and N_0 is a given constant.

In practice, a(t) may not be completely known, e.g.,

    a(t) = r(t) · noise,    or    a(t) = r(t) + noise,

with r(t) a deterministic function of t, and the "noise" term modeling something random. The question arises: how can we rigorously describe the "noise" term and solve the resulting equation?
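Later chapters make the "noise" rigorous by writing the model as the SDE dN(t) = r N(t) dt + α N(t) dB(t), with B a Brownian motion. As a preview of what "solving" such an equation numerically looks like, here is a minimal Euler-Maruyama sketch; the scheme, function name, and parameter names are our illustration, not part of the lecture:

```python
import math
import random

def euler_maruyama_growth(n0, r, alpha, T=1.0, steps=1000, seed=0):
    """Simulate dN = r*N dt + alpha*N dB on [0, T] with the
    Euler-Maruyama scheme; returns N(T) for one sample path."""
    rng = random.Random(seed)
    dt = T / steps
    n = n0
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment ~ N(0, dt)
        n += r * n * dt + alpha * n * dB
    return n

# With alpha = 0 the noise vanishes and the scheme reduces to the
# deterministic model, so N(T) should be close to N0 * exp(r*T).
deterministic = euler_maruyama_growth(100.0, 0.5, 0.0)
```

With α > 0 each run produces a different random trajectory, which is exactly the sense in which the solution of an SDE "involves randomness."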

Problem 2: Electric Circuit. Let Q(t) denote the charge at time t in an electrical circuit, which admits the following ODE:

    L Q''(t) + R Q'(t) + (1/C) Q(t) = F(t),    Q(0) = Q_0,  Q'(0) = Q_0',

where L denotes the inductance, R denotes the resistance, C denotes the capacity, and F(t) denotes the potential source.

Now consider the scenario where F (t) is not completely known, e.g.,

F (t) = G (t) + noise

where G (t) is deterministic. The question is how to solve the problem.

Remark. The differential equations above, involving non-deterministic coefficients, are called stochastic differential equations (SDEs). Clearly, the solution to an SDE involves randomness.

1.1.2. Applications of SDE


Now we discuss some applications of SDEs arising in the area of finance.

Problem 3: Optimal Stopping Problem. Suppose someone holds an asset (e.g., a stock or a house). He plans to sell it at some future time. Denote by X(t) the price of the asset at time t, satisfying the dynamics

    dX(t)/dt = r X(t) + a X(t) · noise,

where r, a are given constants. The goal of this person is to maximize the expected selling price:

    sup_{t ≥ 0} E[X(t)],

where the optimal solution t* is the optimal stopping time.

Problem 4: Portfolio Selection Problem. Suppose a person is interested in two types of assets:

• A risk-free asset which generates a deterministic return r, whose price X_1(t) follows the deterministic dynamics

    dX_1(t)/dt = r X_1(t);

• A risky asset whose price X_2(t) satisfies the following SDE:

    dX_2(t)/dt = µ X_2(t) + σ X_2(t) · noise,

where µ, σ > 0 are given constants.

The investment policy is as follows. The wealth at time t is denoted v(t). This person decides to invest a fraction u(t) of his wealth into the risky asset, with the remaining 1 − u(t) part invested into the risk-free asset. Suppose that the utility function of this person is U(·); his goal is to maximize the expected utility of wealth at the terminal time T:

    max_{u(t), 0 ≤ t ≤ T} E[U(v_u(T))],

where the decision variable is the portfolio function u(t) over the whole horizon [0, T].

Problem 5: Option Pricing Problem. Financial derivatives are products in the market whose value depends on an underlying asset. The European call option is a typical financial derivative. Suppose that the underlying asset is stock A, whose price at time t is X(t). The call option gives the option holder the right (not the obligation) to buy one unit of stock A at a specified price (the strike price) K at the maturity date T. The task is to infer the fair price of the option at the current time. The price of the option is given by

    c_0 = E[(X(T) − K)^+],

which leads to the famous Black–Scholes–Merton formula.
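Once a model for X(T) is fixed, the expectation E[(X(T) − K)^+] can be estimated by Monte Carlo. Below is a toy sketch assuming X follows a geometric Brownian motion with zero interest rate; the function names and the closed-form comparison are our additions, not from the notes:

```python
import math
import random

def mc_call_price(x0, K, sigma, T, n_paths=200_000, seed=1):
    """Monte Carlo estimate of c0 = E[(X(T) - K)^+], assuming X follows a
    geometric Brownian motion X(T) = x0*exp(-sigma^2*T/2 + sigma*B(T))
    (zero interest rate for simplicity)."""
    rng = random.Random(seed)
    sq_t = math.sqrt(T)
    total = 0.0
    for _ in range(n_paths):
        b_t = rng.gauss(0.0, sq_t)                            # B(T) ~ N(0, T)
        x_t = x0 * math.exp(-0.5 * sigma ** 2 * T + sigma * b_t)
        total += max(x_t - K, 0.0)
    return total / n_paths

def bs_call(x0, K, sigma, T):
    """Closed-form Black-Scholes-Merton value with r = 0, for comparison."""
    d1 = (math.log(x0 / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return x0 * cdf(d1) - K * cdf(d2)
```

The Monte Carlo estimate and the closed-form value should agree up to sampling error, which is the numerical content of the pricing formula above.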

1.1.3. Reviewing for Probability Space

Firstly, we review some basic concepts in real analysis.

Definition 1.1 [σ-Algebra] A set F containing subsets of Ω is called a σ-algebra if:

1. Ω ∈ F;

2. F is closed under complement, i.e., A ∈ F implies Ω \ A ∈ F;

3. F is closed under countable union, i.e., A_i ∈ F, i ≥ 1 implies ∪_{i=1}^∞ A_i ∈ F.

Definition 1.2 [Probability Measure] A function P : F → R is called a probability measure on (Ω, F) if

• P(Ω) = 1;

• P(A) ≥ 0, ∀A ∈ F;

• P is σ-additive, i.e., when A_i ∈ F, i ≥ 1 and A_i ∩ A_j = ∅, ∀i ≠ j,

    P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i),

where P(A) is called the probability of the event A. ⌅

Definition 1.3 [Probability Space] A probability space is a triplet (Ω, F, P) defined as follows:

1. Ω denotes the sample space, and a point ω ∈ Ω is called a sample point;

2. F is a σ-algebra of Ω, which is a collection of subsets of Ω. An element A ∈ F is called an "event"; and

3. P is a probability measure defined on the space (Ω, F).

Definition 1.4 [Almost Surely True] A statement S is said to be almost surely (a.s.) true, or true with probability 1, if

• B := {ω : S(ω) is true} ∈ F;

• P(B) = 1.

Definition 1.5 [Topological Space] A topological space (X, T) consists of a (non-empty) set X and a family T of subsets of X ("open sets") such that

1. ∅, X ∈ T;

2. U, V ∈ T implies U ∩ V ∈ T;

3. if U_α ∈ T for all α ∈ A, then ∪_{α∈A} U_α ∈ T.

When A ∈ T, A is called an open subset of X, and T is called a topology on X. ⌅

Definition 1.6 [Borel σ-Algebra] Consider a topological space Ω, with U being the topology of Ω. The Borel σ-algebra B(Ω) on Ω is defined to be the minimal σ-algebra containing U:

    B(Ω) := σ(U).

Any element B ∈ B(Ω) is called a Borel set. ⌅

Definition 1.7 [F-Measurable / Random Variable]

1. A function f : (Ω, F) → (R^n, B(R^n)) is called F-measurable if

    f^{-1}(B) = {ω | f(ω) ∈ B} ∈ F

for any B ∈ B(R^n).

2. A random variable X is an F-measurable function X : (Ω, F) → (R^n, B(R^n)).

Definition 1.8 [Generated σ-Algebra] Suppose X is a random variable on (Ω, F, P). Then the σ-algebra generated by X, denoted H_X, is defined to be the minimal σ-algebra on Ω that makes X measurable. ⌅

Proposition 1.1 H_X = {X^{-1}(B) : B ∈ B(R^n)}.

Proof. Since X is H_X-measurable, for any B ∈ B(R^n), X^{-1}(B) ∈ H_X. Thus H_X ⊇ {X^{-1}(B) : B ∈ B(R^n)}. It suffices to show that {X^{-1}(B) : B ∈ B(R^n)} is itself a σ-algebra making X measurable, which holds because taking preimages commutes with complements and countable unions. ⌅
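On a finite sample space, these definitions can be checked mechanically: the σ-algebra generated by X consists of all unions of the level sets {X = v}, matching Proposition 1.1. The sketch below is our illustration (both function names are hypothetical):

```python
from itertools import combinations

def generated_sigma_algebra(omega, X):
    """Sigma-algebra generated by X on a finite sample space: by
    Proposition 1.1 it consists of the preimages X^{-1}(B), which here
    are exactly the unions of the level sets {X = v}."""
    atoms = {}
    for w in omega:
        atoms.setdefault(X(w), set()).add(w)   # level sets of X
    blocks = list(atoms.values())
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset(set().union(*combo)))
    return events

def is_sigma_algebra(omega, F):
    """Check Definition 1.1 directly (on a finite space, countable
    union reduces to pairwise union)."""
    omega = frozenset(omega)
    if omega not in F:
        return False
    if any(frozenset(omega - A) not in F for A in F):        # complements
        return False
    return all(frozenset(A | B) in F for A in F for B in F)  # unions
```

For example, Ω = {1,...,6} with X(ω) = ω mod 2 yields the four events {∅, odds, evens, Ω}.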

1.2. Thursday
Reviewing for Probability Space.

• (Ω, F, P);

• random variable;

• generated σ-algebra.

1.2.1. More on Probability Theory


Definition 1.9 [Distribution] The probability measure µ_X on R^n induced by the random variable X is defined as

    µ_X(B) = P(X^{-1}(B)),

where B ∈ B(R^n). The measure µ_X is called the distribution of X. ⌅

Definition 1.10 [Expectation] The expectation of X is given by

    E[X] = ∫_Ω X(ω) dP(ω).

When X takes values in R^n, the expectation can be written in terms of the distribution:

    E[X] = ∫_{R^n} y dµ_X(y).

Note that the expectation of the random variable X is well-defined when X is integrable:

Definition 1.11 [Integrable] The random variable X is integrable if

    ∫_Ω |X(ω)| dP(ω) < ∞.

In this case, X is said to be L^1-integrable, denoted X ∈ L^1(Ω, F, P). ⌅

⌅ Example 1.1 If f : R^n → R is Borel measurable and ∫_Ω |f(X(ω))| dP(ω) < ∞, then

    E[f(X)] = ∫_Ω f(X(ω)) dP(ω) = ∫_{R^n} f(y) dµ_X(y).

Definition 1.12 [L^p Space] Suppose X : Ω → R is a random variable and p ≥ 1.

• Define the L^p-norm of X as

    ||X||_p = ( ∫_Ω |X(ω)|^p dP )^{1/p}.

If p = ∞, define

    ||X||_∞ = inf{N ∈ R : |X(ω)| ≤ N a.s.}.

• A random variable X is said to be in the L^p space (p-th integrable) if

    ∫_Ω |X(ω)|^p dP(ω) < ∞,

denoted X ∈ L^p(Ω, F, P).

Proposition 1.2 If p ≤ q, then ||X||_p ≤ ||X||_q. Thus L^q(Ω, F, P) ⊆ L^p(Ω, F, P).

Proof. The inequality follows from Hölder's inequality (applied with exponent q/p ≥ 1):

    ||X||_p^p = ∫_Ω |X|^p dP ≤ ( ∫_Ω (|X|^p)^{q/p} dP )^{p/q} = ( ∫_Ω |X|^q dP )^{p/q} = ||X||_q^p. ⌅
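For a discrete random variable, the norm monotonicity of Proposition 1.2 can be checked numerically; the helper below is our illustration, with an arbitrary toy distribution:

```python
def lp_norm(values, probs, p):
    """||X||_p = (E|X|^p)^(1/p) for a discrete random variable taking
    value values[i] with probability probs[i]."""
    return sum(pr * abs(v) ** p for v, pr in zip(values, probs)) ** (1.0 / p)

# A toy distribution on three points:
vals, probs = [0.5, 2.0, 7.0], [0.5, 0.3, 0.2]
norms = [lp_norm(vals, probs, p) for p in (1, 2, 3, 4)]
# On a probability space the L^p-norms are non-decreasing in p:
# ||X||_1 <= ||X||_2 <= ||X||_3 <= ||X||_4
```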

Then we discuss how to define independence between two random variables, by

the following three steps:

Definition 1.13 [Independence]

1. Two events A_1, A_2 ∈ F are said to be independent if P(A_1 ∩ A_2) = P(A_1) P(A_2).

2. Two σ-algebras F_1, F_2 are said to be independent if F_1 and F_2 are independent events for all F_1 ∈ F_1, F_2 ∈ F_2.

3. Two random variables X, Y are said to be independent if H_X and H_Y, the σ-algebras generated by X and Y respectively, are independent.

Remark. The independence defined above can be generalized from two events to any finite number of events.

Proposition 1.3 If X and Y are two independent random variables with E[|X|] < ∞ and E[|Y|] < ∞, then

    E[XY] = E[X] E[Y] < ∞.

Proof. The first step is to simplify the distribution of the product random variable (X, Y), i.e., µ_{X,Y}.

Remark. From now on, we also write the event X^{-1}(B) as {X ∈ B} for B ∈ B(R^n).

By the definition of independence, we have the following:

    µ_{X,Y}(A_1 × A_2) := P({(X, Y) ∈ A_1 × A_2}) = P({X ∈ A_1, Y ∈ A_2})
                        = P({X ∈ A_1}) P({Y ∈ A_2}) = µ_X(A_1) µ_Y(A_2).

Now we begin to simplify the expectation of the product:

    E[XY] = ∫ xy dµ_{X,Y}(x, y) = ∫∫ xy dµ_X(x) dµ_Y(y)
          = ∫ y [ ∫ x dµ_X(x) ] dµ_Y(y) = ∫ E[X] y dµ_Y(y) = E[X] E[Y]. ⌅
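Proposition 1.3 is easy to check by Monte Carlo. The sketch below is our illustration; the distributions U(0,1) and N(2,1) are arbitrary choices:

```python
import random

def estimate_moments(n=100_000, seed=0):
    """Monte Carlo estimates of E[X], E[Y], E[XY] for independent draws
    X ~ Uniform(0,1) and Y ~ Normal(2,1)."""
    rng = random.Random(seed)
    sx = sy = sxy = 0.0
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = rng.gauss(2.0, 1.0)      # drawn independently of x
        sx += x
        sy += y
        sxy += x * y
    return sx / n, sy / n, sxy / n

ex, ey, exy = estimate_moments()
# Independence gives E[XY] = E[X]E[Y]; here that product is 0.5 * 2.0 = 1.0.
```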

1.2.2. Stochastic Process
Consider a set T of time indices, e.g., the non-negative integers or a time interval [0, ∞). We will discuss discrete- and continuous-time stochastic processes.

Definition 1.14 [Stochastic Process] A collection of random variables {X_t}_{t∈T}, defined on (Ω, F, P) and taking values in R^n, is called a stochastic process. ⌅

Remark. A stochastic process {X_t}_{t∈T} can also be viewed as a random function, since it is a mapping Ω × T → R^n. Sometimes we omit the subscript and denote a stochastic process by {X_t}.

Definition 1.15 [Sample Path] Fixing ω ∈ Ω, the function {X_t(ω)}_{t∈T} (denoted X_·(ω)) is called a sample path, or trajectory. ⌅

Definition 1.16 [Continuous] A stochastic process {X_t} is said to be continuous (right-continuous, left-continuous, resp.) a.s. if t ↦ X_t(ω) is continuous (right-continuous, left-continuous, resp.) a.s., i.e.,

    P({ω : t ↦ X_t(ω) is continuous (right-continuous, left-continuous, resp.)}) = 1.

⌅ Example 1.2 [Poisson Process] Consider (ξ_j, j = 1, 2, ...), a sequence of i.i.d. exponential random variables with rate λ > 0. Let T_0 = 0 and T_n = Σ_{j=1}^n ξ_j. Define X_t = n if T_n ≤ t < T_{n+1}. One can verify that {X_t} is a stochastic process which is right-continuous with left limits. Instead of giving a mathematical proof, we provide a numerical simulation of {X_t}, plotted in Figure 1.1.^a ⌅

^a The corresponding MATLAB code can be found at
https://github.com/WalterBabyRudin/Courseware/tree/master/MAT4500/week1

Figure 1.1: One simulation of {X_t} with intensity λ = 1.2 and 500 samples.
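In the same spirit as the MATLAB code referenced above, a Python sketch of the construction (interarrival sums T_n and the counting path X_t) might look as follows; the function names are ours:

```python
import random

def poisson_jump_times(lam, horizon, seed=42):
    """Jump times T_n of a Poisson process: the interarrival times xi_j
    are i.i.d. Exponential(lam), and T_n = xi_1 + ... + xi_n."""
    rng = random.Random(seed)
    t, jumps = 0.0, []
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            return jumps
        jumps.append(t)

def path_value(jumps, t):
    """X_t = n on T_n <= t < T_{n+1}, i.e. the number of jumps up to t."""
    return sum(1 for T in jumps if T <= t)

jumps = poisson_jump_times(1.2, 100.0)
# E[X_100] = 1.2 * 100 = 120, so the final count is of that order.
```

Plotting t ↦ path_value(jumps, t) as a step function reproduces the staircase shape of Figure 1.1: right-continuous, with left limits at every jump.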

Chapter 2

Week 2

2.1. Tuesday

2.1.1. More on Stochastic Process


For simplicity of notation, we write

    {X ∈ F} := {ω : X(ω) ∈ F} = X^{-1}(F).

Definition 2.1 [Joint Distribution of a Stochastic Process] Let {X_t} be a stochastic process, and let 0 = t_0 ≤ t_1 ≤ ··· ≤ t_k. The joint distribution of the random variables X_{t_1}, ..., X_{t_k} is defined as

    µ_{t_1,...,t_k}(F_1 × ··· × F_k) = P(X_{t_1} ∈ F_1, ..., X_{t_k} ∈ F_k),

where F_1, ..., F_k are all Borel sets in R^n. ⌅

Remark. The measure µ_{t_1,...,t_k} is a finite-dimensional distribution. In particular, µ_{t_1,...,t_k} is a probability measure on the product space R^n × ··· × R^n.

⌅ Example 2.1 [Brownian Motion] Consider a probability space (Ω, F, P). Define the function

    P(t, x, y) = (1/√(2πt)) exp( −(y − x)² / (2t) ),    x, y ∈ R, t > 0.

The Brownian motion^a is denoted {B_t}_{t≥0}. The joint distribution of {B_t} at times t_1, t_2, ..., t_k is given by:

    P(B_{t_1} ∈ F_1, ..., B_{t_k} ∈ F_k)
      = ∫_{F_1×···×F_k} P(t_1, 0, x_1) P(t_2 − t_1, x_1, x_2) ··· P(t_k − t_{k−1}, x_{k−1}, x_k) dx_1 dx_2 ··· dx_k.

^a For now we consider only Brownian motion with independent, normally distributed increments.
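The one-dimensional marginal of this joint distribution says that B_t has density P(t, 0, ·), i.e., B_t ~ N(0, t). A quick Monte Carlo sanity check (our illustration) compares an empirical probability with the corresponding integral of P(t, 0, y):

```python
import math
import random

def sample_bt(t, n=200_000, seed=7):
    """Draw samples of B_t: under the density P(t, 0, y), B_t ~ N(0, t)."""
    rng = random.Random(seed)
    s = math.sqrt(t)
    return [rng.gauss(0.0, s) for _ in range(n)]

def normal_cdf(y, t):
    """Integral of P(t, 0, x) over (-inf, y]."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0 * t)))

ys = sample_bt(2.0)
empirical = sum(1 for y in ys if -1.0 <= y <= 1.0) / len(ys)
exact = normal_cdf(1.0, 2.0) - normal_cdf(-1.0, 2.0)
# empirical P(B_2 in [-1,1]) should match the integral of P(2,0,y) over [-1,1]
```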

Definition 2.2 [Measurable Space] Let (S, F) be a pair, with S a set and F a σ-algebra on S. Then the pair (S, F) is called a measurable space, and an element of F is called an F-measurable subset of S. ⌅

Remark. Consider a stochastic process {X_t} in continuous time, e.g., a Brownian motion. Consider the space (R^{[0,∞)}, B(R^{[0,∞)})), and define the collection of outcomes

    F = {ω ∈ Ω | X_t(ω) ∈ [0, 1], ∀t ≤ 1}.

The issue is that this event F is not necessarily B(R^{[0,∞)})-measurable, since it constrains uncountably many coordinates at once. Sometimes we need extra conditions on the stochastic process to make F measurable. The significance of F will also be discussed later.

Proposition 2.1 Suppose that {X_t} is a continuous-time stochastic process, and let T be a countable subset of [0, ∞). Then, given B ∈ B(R^n),

• the set {ω : X_t(ω) ∈ B for all t ∈ T} is measurable;

• the function h = sup_{t∈T} |X_t| is F-measurable.

Proof. For fixed t ∈ T, by the F-measurability of X_t, the set {X_t ∈ B} := {ω : X_t(ω) ∈ B} is measurable. Hence the countable intersection ∩_{t∈T} {X_t ∈ B} is measurable as well.

For the second assertion, it suffices to check that, for every a, the set {h ≤ a} = ∩_{t∈T} {|X_t| ≤ a} is measurable. ⌅

However, when T is uncountable, it is problematic to show the measurability. It is

even difficult to show that for almost all w, t 7! Xt (w ) is continuous. In order to obtain

a “continuous” process, we need the following important concept:

Definition 2.3 [Equivalent Random Variables] Let {X_t}_{t≥0} and {Y_t}_{t≥0} be two stochastic processes on (Ω, F, P). Then {Y_t} is called equivalent to (a version of) {X_t} if

    P({ω | X_t(ω) = Y_t(ω)}) = 1,    for every time t.

Remark. It is easy to see that when {X_t}_{t≥0} is a version of {Y_t}_{t≥0}, they have the same finite-dimensional distributions, but their path properties may differ; e.g., for almost all ω, t ↦ X_t(ω) may be continuous while t ↦ Y_t(ω) is not.

2.1.2. Conditional Expectation


Definition 2.4 [Conditional Expectation] Suppose that (Ω, F, P) is a probability space and G is a sub-σ-algebra of F, i.e., G ⊆ F. Let X : Ω → R^n be an integrable random variable. The conditional expectation of X given G, denoted E[X | G], is a random variable satisfying the following conditions:

1. E[X | G] is G-measurable;

2. for any event A ∈ G,

    ∫_A E[X | G] dP = ∫_A X dP,

in other words,

    E[ E[X | G] 1_A ] = E[X 1_A].

Remark. Let X be an integrable random variable. Then for each sub-σ-algebra G ⊆ F, the conditional expectation E[X | G] exists and is unique up to sets of probability zero. The proof is based on the Radon-Nikodym theorem. In other words, suppose that Y is another random variable satisfying the conditions in Definition 2.4, i.e.,

• Y is G-measurable;

• E[Y 1_A] = E[X 1_A] for any A ∈ G;

then we can assert that Y = E[X | G] a.s., and Y is called a version of E[X | G].

Conditional expectation has many of the same properties as ordinary expectation:

Theorem 2.1 — Properties of Conditional Expectation. Let X be a random variable defined on (Ω, F, P), and let G be a sub-σ-algebra of F. Then the following hold:

1. E[E[X | G]] = E[X].

2. If X is G-measurable, then E[X | G] = X a.s.

3. (Linearity) For any a_1, a_2 ∈ R,

    E[a_1 X_1 + a_2 X_2 | G] = a_1 E[X_1 | G] + a_2 E[X_2 | G]  a.s.

4. (Positivity) If X ≥ 0, then E[X | G] ≥ 0 a.s.

5. (Jensen's Inequality) If φ : R → R is a convex function, then

    E[φ(X) | G] ≥ φ(E[X | G]).

6. (Tower Property) Let H be a sub-σ-algebra of G. Then

    E[ E[X | G] | H ] = E[X | H]  a.s.

7. (Conditional Independence) Suppose that H is a σ-algebra independent of σ(σ(X), G). Then

    E[X | σ(G, H)] = E[X | G].

In particular, E[X | H] = E[X] if H is independent of X.

Proof. 1. Recall the definition of E[X | G] and take A = Ω:

    E[E[X | G]] = E[ E[X | G] 1_Ω ] = E[X].

2. It suffices to verify that X satisfies conditions 1) and 2) in Definition 2.4; the result then holds by the uniqueness of conditional expectation.

3. Again, verify that the RHS satisfies conditions 1) and 2) in Definition 2.4, and the result holds by the uniqueness of conditional expectation.

4. Let A = {E[X | G] < 0} ∈ G. Since X ≥ 0,

    0 ≤ E[X 1_A] = E[ E[X | G] 1_A ] ≤ 0,

which forces P(A) = 0, i.e., E[X | G] ≥ 0 a.s.

5. • Assume that we can construct a collection of affine functions L = {L(x) : L(x) = ax + b} such that φ(x) = sup_{L∈L} L(x). As a result, for any L ∈ L,

    E[φ(X) | G] ≥ E[L(X) | G] = L(E[X | G]).

Taking the supremum over all L ∈ L, the desired result holds.

• Here we give an explicit construction of L:

    L = { x ↦ φ(x_0) + gᵀ(x − x_0) | x_0 ∈ dom(φ), g ∈ ∂φ(x_0) }.

Note that L(x) ≤ φ(x) for any L ∈ L, since the subgradient inequality holds for convex functions. Conversely, [φ(x_0) + gᵀ(x − x_0)]|_{x=x_0} = φ(x_0). Therefore, φ(x) = sup_{L∈L} L(x).

6. It suffices to show that E[X | H] is a version of E[ E[X | G] | H ]. The key is to show that for all A ∈ H,

    E[ E[X | H] 1_A ] = E[ E[X | G] 1_A ].

Verify that both sides equal E[X 1_A] (using A ∈ H ⊆ G for the right-hand side).

7. It suffices to show that E[X | G] is a version of E[X | σ(G, H)], i.e., for any A ∈ σ(G, H),

    E[X 1_A] = E[ E[X | G] 1_A ]. ⌅
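On a finite sample space where the σ-algebras are generated by partitions, E[X | G] is simply the block-average function, and properties 1, 2 and 6 can be verified exactly. The sketch below is our illustration; the uniform 12-point space and the two partitions are arbitrary choices:

```python
from statistics import mean

# Uniform probability on a 12-point sample space; X(w) = w.
omega = list(range(12))
X = {w: float(w) for w in omega}

def cond_exp(Z, partition):
    """E[Z | G] when G is generated by a partition of a finite uniform
    sample space: constant on each block, equal to the block average."""
    out = {}
    for block in partition:
        avg = mean(Z[w] for w in block)
        for w in block:
            out[w] = avg
    return out

G = [[2 * i, 2 * i + 1] for i in range(6)]               # finer partition
H = [[4 * i + j for j in range(4)] for i in range(3)]    # coarser: H ⊆ G

tower_lhs = cond_exp(cond_exp(X, G), H)   # E[ E[X|G] | H ]
tower_rhs = cond_exp(X, H)                # E[X|H]
```

Since every H-block is a union of G-blocks, averaging over pairs and then over quadruples agrees with averaging over quadruples directly, which is exactly the tower property.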

2.1.3. Tips about Probability Theory

Suppose that {E_n} is a sequence of events. We aim to define the limit of this sequence. A key issue is that two sets need not be comparable: for instance, it is possible that neither A ⊆ B nor B ⊆ A. Therefore, based on a sequence of events, we first define monotone sequences of events as follows:

    Ē_m = ∪_{n≥m} E_n,    E̲_m = ∩_{n≥m} E_n.

Then {Ē_m} and {E̲_m} are monotone decreasing and increasing, respectively, and it is easy to define their limits:

    lim sup_{n→∞} E_n = ∩_m Ē_m,    lim inf_{n→∞} E_n = ∪_m E̲_m.

According to this definition, we have:

    lim sup_{n→∞} E_n = {ω : ω ∈ E_n for infinitely many n},
    lim inf_{n→∞} E_n = {ω : ω ∈ E_n for all large enough n}.

Theorem 2.2 — Borel-Cantelli Lemma. If {E_n} is a sequence of events satisfying Σ_{n=1}^∞ P(E_n) < ∞, then

    P( lim sup_{n→∞} E_n ) = 0.

Proof. Define Ē_m as above, so that lim sup_{n→∞} E_n = ∩_m Ē_m. As a result, for any m,

    P( lim sup_{n→∞} E_n ) = P(∩_m Ē_m) ≤ P(Ē_m) ≤ Σ_{n=m}^∞ P(E_n).

Because of the condition Σ_{n=1}^∞ P(E_n) < ∞, as m → ∞,

    Σ_{n=m}^∞ P(E_n) → 0  ⟹  P( lim sup_{n→∞} E_n ) = 0. ⌅
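The Borel-Cantelli lemma can be illustrated numerically: with independent events satisfying P(E_n) = 1/n² (summable), almost every realization sees only finitely many of the E_n. The simulation below is our illustration:

```python
import random

def last_event_index(seed, n_max=10_000):
    """Simulate independent events E_n with P(E_n) = 1/n^2 (so that
    sum P(E_n) < infinity) and return the largest n <= n_max at which
    E_n occurs.  Borel-Cantelli says only finitely many E_n occur a.s.,
    so this index should typically be tiny compared to n_max."""
    rng = random.Random(seed)
    last = 0
    for n in range(1, n_max + 1):
        if rng.random() < 1.0 / n ** 2:
            last = n
    return last

lasts = [last_event_index(seed) for seed in range(50)]
# Across 50 runs, the last occurrence is almost always far below n_max.
```

By contrast, with P(E_n) = 1/n (not summable) the events would keep occurring all the way up to n_max on essentially every run, which is the content of the second Borel-Cantelli lemma for independent events.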

2.1.4. Reviewing on Real Analysis

Theorem 2.3 — Monotone Convergence Theorem. Let {f_n} be a sequence of non-negative measurable functions on (S, S, µ) satisfying

• f_1(x) ≤ f_2(x) ≤ ··· for almost all x ∈ S;

• f_n(x) → f(x) for almost all x ∈ S, for some measurable function f.

Then

    ∫_S f dµ = lim_{n→∞} ∫_S f_n dµ.

The proof of the monotone convergence theorem (MCT) can be found in:

Daniel Wong, Jie Wang. (2019) Lecture Notes for MAT3006: Real Analysis, Lecture 21. Available at
https://walterbabyrudin.github.io/information/Updates/Updates.html

We can apply the MCT to prove Fatou's lemma, whose hypotheses are weaker:

Theorem 2.4 — Fatou's Lemma. Suppose that {f_n} is a sequence of measurable, non-negative functions. Then

    ∫_S lim inf_{n→∞} f_n dµ ≤ lim inf_{n→∞} ∫_S f_n dµ.

Proof. Define the functions g_n = inf_{k≥n} f_k. Then {g_n} is a non-decreasing sequence of non-negative functions, and

    ∫ lim inf_{n→∞} f_n dµ = ∫ lim_{n→∞} g_n dµ = lim_{n→∞} ∫ g_n dµ
                           = lim inf_{n→∞} ∫ g_n dµ
                           ≤ lim inf_{n→∞} ∫ f_n dµ,

where the second equality is by the MCT, and the last inequality is because g_n ≤ f_n for all n. ⌅

⌅ Example 2.2 The inequality in Fatou's lemma can be strict: the integral of the limit-inf can be strictly smaller than the limit-inf of the integrals. For instance, consider the sequence of functions on R:

    f_n(x) = 1_{[0,1/2]}(x)  when n is odd,
    f_n(x) = 1_{[1/2,1]}(x)  when n is even.

Then

    lim inf_{n→∞} f_n = 1_{{1/2}}  ⟹  ∫ lim inf_{n→∞} f_n dm = 0,

while ∫_{[0,1]} f_n dm = 1/2 for each n. ⌅
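The gap in Example 2.2 can be reproduced with a midpoint Riemann sum; this numerical sketch is our illustration, and the grid size is an arbitrary choice:

```python
def f(n, x):
    """f_n = 1_[0,1/2] for odd n and 1_[1/2,1] for even n."""
    if n % 2 == 1:
        return 1.0 if 0.0 <= x <= 0.5 else 0.0
    return 1.0 if 0.5 <= x <= 1.0 else 0.0

# Midpoint Riemann sum over [0,1]; the grid never hits x = 1/2 exactly.
N = 10_000
grid = [(k + 0.5) / N for k in range(N)]

# Since the sequence alternates between two functions,
# liminf_n f_n(x) = min(f_1(x), f_2(x)).
int_liminf = sum(min(f(1, x), f(2, x)) for x in grid) / N   # integral of liminf
int_f1 = sum(f(1, x) for x in grid) / N                     # integral of each f_n
```

The first integral is 0 (the liminf is nonzero only at the single point 1/2, a null set), while every ∫ f_n dm is 1/2: Fatou's inequality 0 ≤ 1/2, with a strict gap.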

Remark. We also have the reversed Fatou lemma: for a sequence {f_n} dominated by an integrable function, the integral of the limit-sup dominates the limit-sup of the integrals:

    ∫_S lim sup_{n→∞} f_n dµ ≥ lim sup_{n→∞} ∫_S f_n dµ.

Theorem 2.5 — Dominated Convergence Theorem. Let {f_n} be a sequence of measurable functions on (S, S, µ) satisfying:

1. f_n is dominated by an integrable function g, i.e.,

    |f_n(x)| ≤ g(x)

for almost all x ∈ S, with ∫_S |g| dµ < ∞;

2. f_n converges to f almost everywhere, for some measurable function f.

Then f is integrable and f_n → f in L^1, i.e., lim_{n→∞} ∫_S |f_n − f| dµ = 0, which implies that

    ∫_S f dµ = lim_{n→∞} ∫_S f_n dµ.

Proof.

• The integrability of f holds because |f| ≤ g a.e.

• The L^1-convergence of f_n follows from the reversed Fatou lemma (the sequence |f_n − f| is dominated by the integrable function 2g):

    lim sup_{n→∞} ∫ |f_n − f| dm ≤ ∫ lim sup_{n→∞} |f_n − f| dm = 0.

• The remaining details follow by applying Fatou's lemma to the sequences {g + f_n} and {g − f_n}; see the reference

Daniel Wong, Jie Wang. (2019) Lecture Notes for MAT3006: Real Analysis, Lecture 23. Available at
https://walterbabyrudin.github.io/information/Updates/Updates.html
2.2. Thursday

2.2.1. Uniform Integrability


In this lecture, we discuss uniform integrability, which is a useful tool for handling the convergence of random variables in L^1.

Definition 2.5 [L^1-Convergence] Given a sequence of functions {f_n}, we say f_n → f in L^1 if

    lim_{n→∞} ∫_S |f_n − f| dµ = 0.

Proposition 2.2 Suppose that a random variable X is integrable, denoted X ∈ L^1(Ω, F, P). Then for any ε > 0, there exists δ > 0 such that for any F ∈ F with P(F) < δ, we have

    E[|X|; F] := E[|X| 1_F] = ∫_F |X| dP < ε.

Proof. Suppose, on the contrary, that there exist some ε_0 > 0 and a sequence of events {F_n} with each F_n ∈ F such that

    P(F_n) < 1/2^n,    but    E[|X|; F_n] ≥ ε_0.

As a result, Σ_{n=1}^∞ P(F_n) < ∞. By applying Theorem 2.2 (Borel-Cantelli),

    P(H) = 0,    where H := lim sup_{n→∞} F_n.

On the other hand, by the reversed Fatou lemma (the integrands are dominated by the integrable |X|),

    E[|X|; H] = ∫ |X| 1_H dP ≥ lim sup_{n→∞} ∫ |X| 1_{F_n} dP = lim sup_{n→∞} E[|X|; F_n] ≥ ε_0,

which contradicts the fact that P(H) = 0. ⌅

Corollary 2.1 Suppose that X ∈ L^1(Ω, F, P). Then for any ε > 0, there exists K > 0 such that

    E[|X|; |X| > K] := ∫_{|X|>K} |X| dP < ε.

Proof. The idea is to choose K such that {|X| > K} happens with small probability.

• First we have the Markov inequality P({|X| > K}) ≤ E[|X|]/K, since the following inequality holds:

    E[|X|] = E[|X|; |X| > K] + E[|X|; |X| ≤ K]
           ≥ E[K; |X| > K] = K E[1_{|X|>K}] = K P(|X| > K).

• Applying Proposition 2.2, we choose K large enough such that E[|X|]/K < δ, which implies P(|X| > K) < δ. The desired result follows immediately. ⌅

Definition 2.6 A collection C of random variables is said to be uniformly integrable (UI) if and only if for any given ε > 0, there exists K ≥ 0 such that

    E[|X|; |X| > K] < ε,    ∀X ∈ C.

Remark. A uniformly integrable (UI) class C is also L^1-bounded.

Proof. Choose ε = 1; then there exists K > 0 such that for any X ∈ C,

    E[|X|] = E[|X|; |X| > K] + E[|X|; |X| ≤ K] ≤ ε + K = 1 + K. ⌅

However, the converse of this statement is not necessarily true. See Example 2.3 for a counterexample.
⌅ Example 2.3 Consider the probability space (W, F , P ) = ([0, 1], B([0, 1]), Leb), and

the collection C = { Xn }, with Xn = n · 1En and En = (0, 1/n).

• It is easy to show that E [ Xn ] = 1, 8n, which means that C is L1 -bounded.

• However, C is not UI. Take # = 1, and for any K > 0, as long as n > K,

E [| Xn |; | Xn | > K ] = 1

• Moreover, L1 -boundedness does not mean L1 -convergence. Observe that Xn ! 0

a.s., but
Z
| Xn 0| dP = 1, 8n.

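The counter-example above is easy to probe numerically. The sketch below (a Monte Carlo illustration, with the sample size and the cutoff $K$ chosen arbitrarily) checks that $\mathbb{E}[X_n] \approx 1$ for every $n$, while the tail mass $\mathbb{E}[X_n; X_n > K]$ stays at $1$ once $n > K$, so no single $K$ works for the whole family:

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(0.0, 1.0, size=200_000)  # samples from Leb on [0, 1]

def X(n, w):
    # X_n = n * 1_{(0, 1/n)}
    return n * ((w > 0) & (w < 1.0 / n)).astype(float)

K = 10.0
for n in [2, 100, 10_000]:
    xn = X(n, omega)
    mean = xn.mean()                      # approximates E[X_n] = 1
    tail = xn[xn > K].sum() / len(omega)  # approximates E[X_n; X_n > K]
    print(n, mean, tail)
```

Since $X_n$ only takes the values $0$ and $n$, the tail term coincides with the full expectation as soon as $n > K$, which is exactly the failure of uniform integrability.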
Although $L^1$-boundedness does not imply UI, $L^p$-boundedness for $p > 1$ does.

Theorem 2.6 Let $p > 1$. Suppose that a class $\mathcal{C}$ of random variables is uniformly bounded in $L^p$, i.e.,

$$\mathbb{E}[|X|^p] = \int_\Omega |X|^p \,dP < M < \infty, \quad \forall X \in \mathcal{C},$$

where $M$ is some finite constant. Then the class $\mathcal{C}$ is uniformly integrable (UI).

Proof. Choose some $K > 0$; the idea is to bound the term $\mathbb{E}[|X|; |X| > K]$ for any $X \in \mathcal{C}$:

$$\int_{\{|X|>K\}} |X| \,dP = \int_{\{|X|>K\}} \frac{|X|^p}{|X|^{p-1}} \,dP \leq \int_{\{|X|>K\}} \frac{|X|^p}{K^{p-1}} \,dP = \frac{1}{K^{p-1}} \int_{\{|X|>K\}} |X|^p \,dP \leq \frac{1}{K^{p-1}} \int_\Omega |X|^p \,dP \leq \frac{M}{K^{p-1}},$$

where the last inequality is by the $L^p$-boundedness. Therefore, for any given $\epsilon > 0$, the desired result holds by choosing $K$ large enough such that $\frac{M}{K^{p-1}} \leq \epsilon$. ⌅

Uniform integrability also has the dominance property:

Theorem 2.7 Suppose that a class $\mathcal{C}$ of random variables is dominated by an integrable random variable $Y$, i.e., $\forall X \in \mathcal{C}$,

$$|X(\omega)| \leq Y(\omega), \quad \forall \omega \in \Omega, \qquad \mathbb{E}|Y| < \infty.$$

Then the class $\mathcal{C}$ is UI.

Proof. The idea is to bound the term $\mathbb{E}[|X|; |X| > K]$ to show the UI:

$$\int_{\{|X|>K\}} |X| \,dP \leq \int_{\{|Y|>K\}} |X| \,dP \leq \int_{\{|Y|>K\}} |Y| \,dP,$$

where the first inequality is because $\{|X| > K\} \subseteq \{|Y| > K\}$, and the second is because $|X| \leq Y$. The desired result holds by applying Corollary 3.1 such that

$$\int_{\{|Y|>K\}} |Y| \,dP < \epsilon.$$

⌅

Chapter 3

Week 3

3.1. Tuesday

3.1.1. Reviewing

Definition 3.1 For $p \geq 1$, we say a random variable $X \in L^p$ if

$$\|X\|_p^p \triangleq \mathbb{E}[|X|^p] < \infty.$$

Particularly, when $X \in L^1$, the random variable $X$ is said to be integrable. ⌅

A useful property of integrability is the following:

Proposition 3.1 Suppose that a random variable $X$ is integrable. Then for any $\epsilon > 0$, there exists $\delta > 0$ such that for any $F \in \mathcal{F}$ with $P(F) < \delta$, we have

$$\mathbb{E}[|X|; F] \triangleq \mathbb{E}[|X|\mathbf{1}_F] = \int_F |X| \,dP < \epsilon.$$

Since $\{|X| > K\}$ happens with small probability, we have the following corollary:

Corollary 3.1 Suppose that $X \in L^1(\Omega, \mathcal{F}, P)$. Then for any $\epsilon > 0$, there exists $K > 0$ such that

$$\mathbb{E}[|X|; |X| > K] := \int_{\{|X|>K\}} |X| \,dP < \epsilon.$$

Definition 3.2 Consider a collection of random variables instead, denoted as $\mathcal{C}$:

• $\mathcal{C}$ is said to be $L^p$-bounded if there exists a finite $M$ such that

$$\mathbb{E}[|X|^p] < M, \quad \forall X \in \mathcal{C}.$$

• $\mathcal{C}$ is said to be uniformly integrable if for any given $\epsilon > 0$, there exists a $K \geq 0$ such that

$$\mathbb{E}[|X|\mathbf{1}_{\{|X|>K\}}] < \epsilon, \quad \forall X \in \mathcal{C}.$$

R UI implies $L^1$-boundedness: try to upper bound $\mathbb{E}[|X|]$. However, the converse is not true: one counter-example is $\mathcal{C} = \{X_n\}_n$ with $X_n = n \cdot \mathbf{1}_{(0,1/n)}$.

Proposition 3.2 • $L^p$-boundedness for $p > 1$ implies UI;

• The class of random variables dominated by an integrable random variable is UI.

Recall the proofs stated in Theorem 2.6 and Theorem 2.7 in detail.

Proof Outline. 1. The first statement is by applying the $L^p$-boundedness to the following formula:

$$\mathbb{E}[|X|\mathbf{1}_{\{|X|>K\}}] = \int_{\{|X|>K\}} |X| \,dP \leq \frac{1}{K^{p-1}} \int_{\{|X|>K\}} |X|^p \,dP.$$

2. Firstly show that

$$\mathbb{E}[|X|\mathbf{1}_{\{|X|>K\}}] = \int_{\{|X|>K\}} |X| \,dP \leq \int_{\{|Y|>K\}} |Y| \,dP.$$

Applying Corollary 3.1 concludes the proof. ⌅
3.1.2. Necessary and Sufficient Conditions for UI

Our first result is about sufficient conditions for the UI of a collection of conditional expectations:

Theorem 3.1 Suppose that $X \in L^1(\Omega, \mathcal{F}, P)$ and $\{\mathcal{G}_\alpha\}_{\alpha\in A}$ is a collection of $\sigma$-algebras such that $\mathcal{G}_\alpha \subseteq \mathcal{F}$. Then the collection of random variables

$$\mathcal{C} = \big\{\mathbb{E}[X \mid \mathcal{G}_\alpha] : \alpha \in A\big\}$$

is uniformly integrable.

Proof. • Apply Proposition 3.1 to $X$: for given $\epsilon > 0$, there exists $\delta > 0$ such that when $P(F) < \delta$ with $F \in \mathcal{F}$, $\mathbb{E}[|X| \cdot \mathbf{1}_F] < \epsilon$.

• Define $Y_\alpha = \mathbb{E}[X \mid \mathcal{G}_\alpha]$. By Jensen's inequality, $|Y_\alpha| \leq \mathbb{E}[|X| \mid \mathcal{G}_\alpha]$, which motivates us to upper bound the following integral:

$$\mathbb{E}\big[|\mathbb{E}[X \mid \mathcal{G}_\alpha]|; |\mathbb{E}[X \mid \mathcal{G}_\alpha]| > K\big] = \int_{\{|Y_\alpha|>K\}} |Y_\alpha| \,dP \leq \int_{\{|Y_\alpha|>K\}} \mathbb{E}[|X| \mid \mathcal{G}_\alpha] \,dP \leq \int_{\{\mathbb{E}[|X|\mid\mathcal{G}_\alpha]>K\}} \mathbb{E}[|X| \mid \mathcal{G}_\alpha] \,dP = \int_{\{\mathbb{E}[|X|\mid\mathcal{G}_\alpha]>K\}} |X| \,dP,$$

where the last equality is because of the definition of conditional expectation and the fact that $\{\mathbb{E}[|X| \mid \mathcal{G}_\alpha] > K\} \in \mathcal{G}_\alpha$.

• Then consider upper bounding $P\{\mathbb{E}[|X| \mid \mathcal{G}_\alpha] > K\}$ using the Markov inequality:

$$P\{\mathbb{E}[|X| \mid \mathcal{G}_\alpha] > K\} \leq \frac{\mathbb{E}[\mathbb{E}[|X| \mid \mathcal{G}_\alpha]]}{K} = \frac{\mathbb{E}[|X|]}{K},$$

where the equality is by the tower property of conditional expectation. Here we choose $K$ such that $\frac{\mathbb{E}[|X|]}{K} < \delta$, which implies $P\{\mathbb{E}[|X| \mid \mathcal{G}_\alpha] > K\} < \delta$. By applying the result in the first part, we have

$$\mathbb{E}\big[|\mathbb{E}[X \mid \mathcal{G}_\alpha]|; |\mathbb{E}[X \mid \mathcal{G}_\alpha]| > K\big] \leq \int_{\{\mathbb{E}[|X|\mid\mathcal{G}_\alpha]>K\}} |X| \,dP \leq \epsilon.$$

⌅

R A class $\mathcal{C}$ of random variables is uniformly integrable if and only if

$$\lim_{K\to\infty} \sup_{X\in\mathcal{C}} \int_{\{|X|>K\}} |X| \,dP = 0.$$

3.1.3. Convergence of random variables

In the following part we study several versions of convergence that appear in probability theory.

Definition 3.3 [Convergence in probability] Let $\{X_n\}$ be a sequence of random variables.

• We say $\{X_n\}$ converges to a random variable $X$ in probability, denoted as $X_n \to X$ in prob., if for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\big(|X_n - X| > \epsilon\big) = 0.$$

• We say $\{X_n\}$ converges to a random variable $X$ a.s., denoted as $X_n \to X$ a.s., if

$$P\big(\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\big) = 1.$$

• We say $\{X_n\}$ converges to a random variable $X$ in $L^1$, denoted as $X_n \to X$ in $L^1$, if

$$\lim_{n\to\infty} \|X_n - X\|_1 = 0.$$

R

• $X_n \to X$ a.s. implies $X_n \to X$ in prob.;

• $X_n \to X$ in $L^1$ implies $X_n \to X$ in prob.;

• A natural question is what is the connection between convergence a.s. and convergence in $L^1$. The dominated convergence theorem provides the following characterization:

$$\left.\begin{aligned} &X_n \to X \text{ a.s.} \\ &|X_n| \leq Y \\ &\mathbb{E}(Y) < \infty \end{aligned}\right\} \implies X_n \xrightarrow{L^1} X$$

Then we provide sufficient conditions for convergence in probability to imply convergence in $L^1$:

Theorem 3.2 — Bounded Convergence Theorem. Let $\{X_n\}$ be a sequence of random variables converging to $X$ in probability. Suppose that $\{X_n\}$ is bounded by $M$, i.e., $|X_n(\omega)| \leq M$, $\forall \omega \in \Omega$, $n \geq 1$. Then $\{X_n\}$ converges to $X$ in $L^1$:

$$\lim_{n\to\infty} \mathbb{E}[|X_n - X|] = 0.$$

R Note that this is a stronger version of the bounded convergence theorem compared with the one studied in MAT3006. In the theorem above, we only require convergence in probability rather than convergence a.s.

The relevance of uniform integrability to convergence of random variables is explained by the following theorem:

Theorem 3.3 Let $\{X_n\}$ be a sequence of random variables with $X_n \in L^1$, and let $X \in L^1$. The sequence $\{X_n\}$ converges to $X$ in $L^1$ if and only if

1. $X_n \to X$ in probability, and

2. $\{X_n\}$ is uniformly integrable.

Proof for the Reverse Direction. For $K > 0$, construct a function $\phi_K : \mathbb{R} \to [-K, K]$:

$$\phi_K(x) = \begin{cases} K, & \text{if } x > K \\ x, & \text{if } |x| \leq K \\ -K, & \text{if } x < -K \end{cases}$$

By the triangle inequality,

$$|X_n - X| \leq |X_n - \phi_K(X_n)| + |\phi_K(X_n) - \phi_K(X)| + |\phi_K(X) - X|.$$

It suffices to upper bound the three terms on the RHS of the following formula:

$$\begin{aligned}
\mathbb{E}[|X_n - X|] &\leq \mathbb{E}[|X_n - \phi_K(X_n)|] + \mathbb{E}[|\phi_K(X_n) - \phi_K(X)|] + \mathbb{E}[|\phi_K(X) - X|] \\
&= \int_{\{|X|>K\}} [|X| - K] \,dP + \mathbb{E}[|\phi_K(X_n) - \phi_K(X)|] + \int_{\{|X_n|>K\}} [|X_n| - K] \,dP \\
&\leq \int_{\{|X|>K\}} |X| \,dP + \mathbb{E}[|\phi_K(X_n) - \phi_K(X)|] + \int_{\{|X_n|>K\}} |X_n| \,dP \quad (3.1)
\end{aligned}$$

• For the first term, by choosing sufficiently large $K$, by Corollary 3.1, it can be upper bounded by $\epsilon/3$;

• For the third term, when $K$ is large enough, by the uniform integrability of $\{X_n\}$, it can be upper bounded by $\epsilon/3$;

• Observe that the following inequality holds:

$$|\phi_K(x) - \phi_K(y)| \leq |x - y|, \ \forall x, y \implies \{|\phi_K(X_n) - \phi_K(X)| > \epsilon\} \subseteq \{|X_n - X| > \epsilon\},$$

which means that $P(\{|\phi_K(X_n) - \phi_K(X)| > \epsilon\}) \leq P(\{|X_n - X| > \epsilon\})$. As a result, $X_n \to X$ in prob. implies $\phi_K(X_n) \to \phi_K(X)$ in prob.¹

By the Bounded Convergence Theorem 3.2, $\lim_{n\to\infty} \mathbb{E}[|\phi_K(X_n) - \phi_K(X)|] = 0$. Thus for sufficiently large $n$,

$$\mathbb{E}[|\phi_K(X_n) - \phi_K(X)|] < \frac{\epsilon}{3}.$$

Combining the three bounds above, for fixed $\epsilon > 0$, we can pick $K > 0$ and sufficiently large $n$ such that

$$\mathbb{E}[|X_n - X|] \leq \epsilon.$$

¹ Following the similar method, we can show that as long as $f$ is continuous and $X_n \to X$ in prob., we have $f(X_n) \to f(X)$ in prob.

Proof for the Forward Direction. • Firstly we show that $\{X_n\}$ is $L^1$-bounded, for which it suffices to show that $\mathbb{E}[|X_n|] \to \mathbb{E}[|X|]$. This is because of the following observation:

$$\big|\mathbb{E}[|X_n|] - \mathbb{E}[|X|]\big| \leq \mathbb{E}\big[\big||X_n| - |X|\big|\big] \leq \mathbb{E}|X_n - X| \to 0.$$

• Then we show the uniform integrability result. By the $L^1$-convergence, for fixed $\epsilon > 0$, there exists $N_0 > 0$ such that

$$\mathbb{E}|X_n - X| < \frac{\epsilon}{2}, \quad \forall n > N_0.$$

Similar to the previous proofs of uniform integrability, we apply Proposition 3.1 to finitely many random variables: for fixed $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $P(F) < \delta$, $F \in \mathcal{F}$,

$$\mathbb{E}[|X|\mathbf{1}_F] < \frac{\epsilon}{2} \quad (3.2a)$$
$$\mathbb{E}[|X_n|\mathbf{1}_F] < \frac{\epsilon}{2}, \quad \forall n \leq N_0 \quad (3.2b)$$

• Construct a $K$ such that $P(|X_n| > K)$ is small for any $n$:

$$P(|X_n| > K) \leq \frac{\mathbb{E}|X_n|}{K} \leq \frac{\sup_n \mathbb{E}|X_n|}{K}.$$

Therefore, we choose $K$ such that $\frac{\sup_n \mathbb{E}|X_n|}{K} < \delta$, and then $P(|X_n| > K) < \delta$.

• Now we can conclude the uniform integrability result. For $n \leq N_0$, by the construction of $K$ and (3.2b),

$$\mathbb{E}[|X_n|\mathbf{1}_{\{|X_n|>K\}}] < \epsilon.$$

For $n > N_0$,

$$\mathbb{E}\big[|X_n|\mathbf{1}_{\{|X_n|>K\}}\big] \leq \mathbb{E}\big[|X - X_n|\mathbf{1}_{\{|X_n|>K\}}\big] + \mathbb{E}\big[|X|\mathbf{1}_{\{|X_n|>K\}}\big] \leq \mathbb{E}|X - X_n| + \mathbb{E}\big[|X|\mathbf{1}_{\{|X_n|>K\}}\big] < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$

where the last inequality is because of the $L^1$-convergence and (3.2a).

• Finally, the convergence in probability can be shown by the Markov inequality:

$$P\big(|X_n - X| > \epsilon\big) \leq \frac{\mathbb{E}[|X_n - X|]}{\epsilon} \to 0, \quad \text{as } n \to \infty.$$

⌅

3.1.4. Martingales in Discrete Time

Definition 3.4 [Stochastic Process] Let $(\Omega, \mathcal{F}, P)$ be a probability space. We describe random phenomena in discrete time by a collection of random variables $\{X_n : n \geq 1\}$ and an increasing sequence of sub-$\sigma$-fields $\{\mathcal{F}_n : \mathcal{F}_n \subseteq \mathcal{F}\}$.

• $X(\cdot) \triangleq \{X_n : n \geq 1\}$ is called a stochastic process;

• $\mathbb{F} \triangleq \{\mathcal{F}_n : \mathcal{F}_n \subseteq \mathcal{F}\}$ is called a filtration.

A probability space $(\Omega, \mathcal{F}, P)$ associated with a filtration $\mathbb{F}$ is called a filtered probability space, written as $(\Omega, \mathcal{F}, \mathbb{F}, P)$. ⌅

R A typical example of $\mathbb{F}$ is defined by generated $\sigma$-algebras:

$$\mathcal{F}_n^X \triangleq \sigma(X_t : t \leq n), \quad \forall n \geq 0.$$

This natural filtration is the sequence of smallest $\sigma$-algebras such that $X_n$ is $\mathcal{F}_n$-measurable for all $n$.

Definition 3.5 [Predictable process]

• A stochastic process $X(\cdot) \triangleq \{X_n : n \geq 1\}$ is said to be adapted to the filtration $\mathbb{F} \triangleq \{\mathcal{F}_n : n \geq 1\}$ if $X_n$ is $\mathcal{F}_n$-measurable for each $n$. We call $X$ an adapted process with respect to $\mathbb{F}$.

• If $X_n$ is $\mathcal{F}_{n-1}$-measurable for each $n \geq 1$ and $X_0$ is $\mathcal{F}_0$-measurable, $X$ is said to be a predictable process.

3.2. Thursday

3.2.1. Stopping Time

Definition 3.6 [Stopping Time] A mapping $T : \Omega \to \{0, 1, 2, \ldots, \infty\}$ is called a stopping time with respect to the filtration $\{\mathcal{F}_n\}_{n\geq 0}$ if

$$\{T \leq n\} \triangleq \{\omega \in \Omega : T(\omega) \leq n\} \in \mathcal{F}_n, \quad \forall n.$$

1. $T$ can take the infinite value.

2. An equivalent definition for a stopping time $T$ is $\{T = n\} \in \mathcal{F}_n$, $\forall n$.

Proof. (a) Suppose that $\{T \leq n\} \in \mathcal{F}_n$, $\forall n$; then

$$\{T \leq n-1\} \in \mathcal{F}_{n-1} \subseteq \mathcal{F}_n \implies \{T = n\} = \{T \leq n\} \setminus \{T \leq n-1\} \in \mathcal{F}_n.$$

(b) Suppose that $\{T = n\} \in \mathcal{F}_n$, $\forall n$; then

$$\{T = k\} \in \mathcal{F}_k \subseteq \mathcal{F}_n, \ \forall k \leq n \implies \{T \leq n\} = \bigcup_{k\leq n} \{T = k\} \in \mathcal{F}_n.$$

⌅

3. A constant mapping $T \equiv N$ with $N \in \mathbb{Z}_+$ is always a stopping time.

⌅ Example 3.1 Let $\{X_n\}_{n\geq 0}$ be an adapted process on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n\geq 0}, P)$. Let $B \in \mathcal{B}(\mathbb{R})$ be a Borel set. Define

$$T(\omega) \triangleq \inf\{n \geq 0 : X_n(\omega) \in B\}.$$

Here $T$ denotes the first time that $\{X_n\}_{n\geq 0}$ enters the set $B$. Define $\inf(\emptyset) = \infty$ by default, i.e., $T = \infty$ when $\{X_n\}_{n\geq 0}$ never enters $B$. To check that $T$ is a stopping time, observe that

$$\{T = n\} = \{X_0 \in B^c, X_1 \in B^c, \ldots, X_{n-1} \in B^c, X_n \in B\} = \{X_n \in B\} \cap \Big(\bigcap_{0\leq k\leq n-1} \{X_k \in B^c\}\Big).$$

Since $\{X_n\}$ is adapted, $\{X_k \in B^c\} \in \mathcal{F}_k \subseteq \mathcal{F}_n$ for $0 \leq k \leq n-1$. Moreover, $\{X_n \in B\} \in \mathcal{F}_n$. Therefore, $\{T = n\} \in \mathcal{F}_n$ for each $n$. ⌅
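The first-entry time above is easy to compute path by path; a minimal sketch (the random walk and the set $B = [3, \infty)$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def first_entry_time(path, B):
    """T = inf{n >= 0 : X_n in B}, with inf(empty set) = +infinity."""
    for n, x in enumerate(path):
        if B(x):
            return n
    return np.inf

# One sample path of a simple symmetric random walk started at 0.
steps = rng.choice([-1, 1], size=50)
path = np.concatenate([[0], np.cumsum(steps)])

T = first_entry_time(path, B=lambda x: x >= 3)
# Deciding whether {T <= n} holds only requires X_0, ..., X_n --
# that is exactly the stopping-time property checked in Example 3.1.
print(T)
```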

Definition 3.7 [Stopping Time $\sigma$-algebra] Define the stopping time $\sigma$-algebra for a given stopping time $T$ as the following:

$$\mathcal{F}_T = \{A \in \mathcal{F} : A \cap \{T \leq n\} \in \mathcal{F}_n, \ \forall n\}.$$

Here $\mathcal{F}_T$ represents the information available up to the random time $T$. ⌅

Proposition 3.3 1. $\mathcal{F}_T$ is a $\sigma$-algebra;

2. $T$ is $\mathcal{F}_T$-measurable;

3. When $T_1, T_2$ are two stopping times with $T_1 \leq T_2$ a.s., $\mathcal{F}_{T_1} \subseteq \mathcal{F}_{T_2}$.

Proof. 1. It is trivial that $\emptyset \in \mathcal{F}_T$. Suppose that $A \in \mathcal{F}_T$; then $(A \cap \{T \leq n\})^c \in \mathcal{F}_n$, which implies that

$$A^c \cap \{T \leq n\} = \big(A \cap \{T \leq n\}\big)^c \cap \{T \leq n\} \in \mathcal{F}_n.$$

Suppose that $A_k \in \mathcal{F}_T$, $k \geq 1$; then

$$\Big(\bigcup_{k\geq 1} A_k\Big) \cap \{T \leq n\} = \bigcup_{k\geq 1} \big(A_k \cap \{T \leq n\}\big) \in \mathcal{F}_n.$$

2. It suffices to show that $\{T \leq m\} \in \mathcal{F}_T$ for any $m$. This is true because for any $n$,

$$\{T \leq m\} \cap \{T \leq n\} = \{T \leq m \wedge n\} \in \mathcal{F}_{m\wedge n} \subseteq \mathcal{F}_n.$$

3. Consider any $A \in \mathcal{F}_{T_1}$; then $A \cap \{T_1 \leq n\} \in \mathcal{F}_n$ for any $n$. Moreover,

$$\{T_2 \leq n\} \subseteq \{T_1 \leq n\} \implies A \cap \{T_2 \leq n\} = \big(A \cap \{T_1 \leq n\}\big) \cap \{T_2 \leq n\} \in \mathcal{F}_n,$$

which implies the desired result. ⌅

Theorem 3.4 Let $\{X_n\}_{n\geq 0}$ be an adapted process on $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n\geq 0}, P)$. Let $T$ be a stopping time w.r.t. $\{\mathcal{F}_n\}_{n\geq 0}$. Define a random variable $X_T$:

$$X_T(\omega) \triangleq X_{T(\omega)}(\omega), \quad \forall \omega \in \Omega.$$

Then $X_T$ is $\mathcal{F}_T$-measurable.

Proof. It suffices to check $\{X_T \leq a\} \in \mathcal{F}_T$, $\forall a \in \mathbb{R}$. By definition of the stopping time $\sigma$-algebra, it suffices to check

$$\{X_T \leq a\} \cap \{T \leq n\} \in \mathcal{F}_n, \ \forall n \quad\Longleftarrow\quad \{X_T \leq a\} \cap \{T \leq n\} = \bigcup_{0\leq k\leq n} \{X_k \leq a\} \cap \{T = k\}.$$

Since $\{X_n\}$ is adapted, $\{X_k \leq a\} \in \mathcal{F}_k$, $\forall k$. By definition of the stopping time, $\{T = k\} \in \mathcal{F}_k$, $\forall k$. Therefore,

$$\{X_k \leq a\} \cap \{T = k\} \in \mathcal{F}_k \subseteq \mathcal{F}_n.$$

The proof is complete. ⌅

Definition 3.8 [Martingale] Let $\{X_n\}_{n\geq 0}$ be an adapted process on $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n\geq 0}, P)$. The process $\{X_n\}_{n\geq 0}$ is called a martingale if

1. $X_n \in L^1$, $\forall n$;

2. $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ a.s., for all $n$.

If in the last condition "$=$" is replaced by "$\leq$" or "$\geq$", then $\{X_n\}_{n\geq 0}$ is said to be a supermartingale or submartingale, respectively. ⌅

R • A supermartingale goes downward on average, and a submartingale goes upward on average.

• $\{X_n\}_{n\geq 0}$ is a supermartingale if and only if $\{-X_n\}_{n\geq 0}$ is a submartingale.

• $\{X_n\}_{n\geq 0}$ is a martingale if and only if it is both a supermartingale and a submartingale.

⌅ Example 3.2 Let $\{Y_n\}_{n\geq 1}$ be a sequence of independent random variables with $\mathbb{E}[|Y_k|] < \infty$ and $\mathbb{E}[Y_k] = 0$, $\forall k$. Define $\mathcal{F}_n = \sigma(Y_1, Y_2, \ldots, Y_n)$ for $n \geq 1$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Define $X_n = Y_1 + Y_2 + \cdots + Y_n$, $\forall n \geq 1$ and $X_0 = 0$. Then $\{X_n\}_{n\geq 0}$ is a martingale:

1. $\mathbb{E}[|X_n|] \leq \sum_{i=1}^n \mathbb{E}[|Y_i|] < \infty$, which means that $X_n$ is integrable;

2. Check that

$$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[X_n + Y_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[X_n \mid \mathcal{F}_n] + \mathbb{E}[Y_{n+1} \mid \mathcal{F}_n] = X_n + \mathbb{E}[Y_{n+1}] = X_n,$$

where the third equality is because $X_n$ is $\mathcal{F}_n$-measurable and $Y_{n+1}$ is independent of $\mathcal{F}_n$. ⌅

⌅ Example 3.3 Let $\{Y_n\}_{n\geq 1}$ be a sequence of independent random variables with $Y_k \geq 0$ a.s. and $\mathbb{E}[Y_k] = 1$, $\forall k$. Define $\mathcal{F}_n = \sigma(Y_1, Y_2, \ldots, Y_n)$ for $n \geq 1$ and $\mathcal{F}_0 = \{\emptyset, \Omega\}$. Define $X_n = Y_1 \cdot Y_2 \cdots Y_n$, $\forall n \geq 1$ and $X_0 = 1$. Then $\{X_n\}_{n\geq 0}$ is a martingale:

1. $\mathbb{E}[|X_n|] = \mathbb{E}[X_n] = \prod_{k=1}^n \mathbb{E}[Y_k] = 1 < \infty$, which means that $X_n$ is integrable;

2. Check that

$$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[X_n \cdot Y_{n+1} \mid \mathcal{F}_n] = X_n \cdot \mathbb{E}[Y_{n+1} \mid \mathcal{F}_n] = X_n \cdot \mathbb{E}[Y_{n+1}] = X_n,$$

where the second equality is because $X_n$ is $\mathcal{F}_n$-measurable; the third equality is because $Y_{n+1}$ is independent of $\mathcal{F}_n$. ⌅

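Both constructions can be illustrated by simulation; a martingale has constant expectation, $\mathbb{E}[X_n] = \mathbb{E}[X_0]$ for all $n$. The sketch below uses two arbitrary illustrative distributions: $Y_k = \pm 1$ fair coin for the additive case, and $Y_k$ uniform on $\{0.5, 1.5\}$ (so $Y_k \geq 0$, $\mathbb{E}[Y_k] = 1$) for the multiplicative case:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 100_000, 20

# Example 3.2: X_n = Y_1 + ... + Y_n with E[Y_k] = 0 (here Y_k = +/-1).
Y = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
X_sum = np.cumsum(Y, axis=1)

# Example 3.3: X_n = Y_1 * ... * Y_n with Y_k >= 0 and E[Y_k] = 1
# (here Y_k uniform on {0.5, 1.5}).
Z = rng.choice([0.5, 1.5], size=(n_paths, n_steps))
X_prod = np.cumprod(Z, axis=1)

# Constant expectation across time, up to Monte Carlo noise:
print(X_sum.mean(axis=0)[[0, 9, 19]])   # all close to 0
print(X_prod.mean(axis=0)[[0, 9, 19]])  # all close to 1
```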
39
Chapter 4

Week 4

4.1. Tuesday

4.1.1. Martingales in Discrete Time

⌅ Example 4.1 Let $\mathbb{F} \triangleq \{\mathcal{F}_n\}_{n\geq 0}$ be a filtration and consider a random variable $\zeta \in L^1$. Define $X_n \triangleq \mathbb{E}[\zeta \mid \mathcal{F}_n]$; we can check that $\{X_n\}_n$ is a martingale with respect to $\mathbb{F}$:

• Firstly we need to show the integrability of $X_n$ for any $n$:

$$\mathbb{E}[|X_n|] = \mathbb{E}[|\mathbb{E}[\zeta \mid \mathcal{F}_n]|] \leq \mathbb{E}[\mathbb{E}[|\zeta| \mid \mathcal{F}_n]] = \mathbb{E}[|\zeta|] < \infty,$$

where the inequality is by Jensen's inequality.

• Then we check that $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ for any $n$:

$$\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = \mathbb{E}[\mathbb{E}[\zeta \mid \mathcal{F}_{n+1}] \mid \mathcal{F}_n] = \mathbb{E}[\zeta \mid \mathcal{F}_n] = X_n,$$

by the tower property. ⌅
⌅ Example 4.2 [Martingale Transform] Let $C_n$ be the stake to be bet on game $n$, and $X_n - X_{n-1}$ the net winnings per unit stake in game $n$, for $n \geq 1$. Suppose that the process $\{C_n\}_{n\geq 1}$ is predictable, and the total winnings up to time $n$ are $Y_n = \sum_{1\leq k\leq n} C_k(X_k - X_{k-1})$. Define $Y_0 := 0$. If $\{X_n\}$ is a martingale w.r.t. $\{\mathcal{F}_n\}$ and $\{C_n\}$ is bounded a.s.ᵃ, then we can show that $\{Y_n\}$ is also a martingale.

Firstly note that $Y_n$ is $\mathcal{F}_n$-measurable since $X_k, C_k$ are all $\mathcal{F}_n$-measurable for $1 \leq k \leq n$. Then we check $\{Y_n\}$ is a martingale w.r.t. $\{\mathcal{F}_n\}$:

• For any $n$, we have

$$\mathbb{E}[|Y_n|] \leq \sum_{1\leq k\leq n} \mathbb{E}[|C_k(X_k - X_{k-1})|] \leq M \cdot \sum_{1\leq k\leq n} \mathbb{E}[|X_k - X_{k-1}|] \leq M \cdot \sum_{1\leq k\leq n} \big(\mathbb{E}[|X_k|] + \mathbb{E}[|X_{k-1}|]\big) < \infty,$$

where the second inequality is by the boundedness of $\{C_n\}$.

• Moreover,

$$\begin{aligned}
\mathbb{E}[Y_{n+1} \mid \mathcal{F}_n] &= \mathbb{E}\Big[\sum_{1\leq k\leq n+1} C_k(X_k - X_{k-1}) \,\Big|\, \mathcal{F}_n\Big] \\
&= \mathbb{E}[Y_n + C_{n+1}(X_{n+1} - X_n) \mid \mathcal{F}_n] \\
&= \mathbb{E}[Y_n \mid \mathcal{F}_n] + C_{n+1}\,\mathbb{E}[X_{n+1} - X_n \mid \mathcal{F}_n] \\
&= Y_n + C_{n+1}\,\mathbb{E}[X_{n+1} - X_n \mid \mathcal{F}_n] = Y_n,
\end{aligned}$$

where the third equality is by the $\mathcal{F}_n$-measurability of $C_{n+1}$ (predictability); the fourth equality is by the $\mathcal{F}_n$-measurability of $Y_n$; and the last equality is by $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$. ⌅

ᵃ Here the boundedness means that $|C_n(\omega)| \leq M$ for some $M > 0$ and almost all $\omega$.

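The gambling interpretation can be checked empirically: no bounded predictable strategy changes the expected winnings in a fair game. A minimal sketch (the $\pm 1$ increments and the "double after a win" rule are arbitrary illustrative choices; the key point is that $C_k$ looks only at the past):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 50_000, 30

# Martingale increments: X_k - X_{k-1} = +/-1 fair coin.
dX = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))

# A predictable, bounded betting strategy: C_k may depend only on
# information up to time k-1.  Here: stake 2 after a win, 1 otherwise.
C = np.ones_like(dX)
C[:, 1:] = np.where(dX[:, :-1] > 0, 2.0, 1.0)

# Martingale transform Y_n = sum_{k<=n} C_k (X_k - X_{k-1}).
Y = np.cumsum(C * dX, axis=1)
print(Y[:, -1].mean())  # close to 0: E[Y_n] = E[Y_0] = 0
```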
Theorem 4.1 Suppose that $\{X_n\}$ is a martingale with respect to $\mathcal{F}_n$, and $\phi : \mathbb{R} \to \mathbb{R}$ is a convex function such that $\phi(X_n)$ is integrable for all $n$. Then $\{\phi(X_n)\}$ is a sub-martingale with respect to $\mathcal{F}_n$.

Proof. By Jensen's inequality and $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$, we have

$$\mathbb{E}[\phi(X_{n+1}) \mid \mathcal{F}_n] \geq \phi(\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]) = \phi(X_n).$$

⌅

By a similar proof, we can show the following theorem:

Theorem 4.2 Suppose that $\{X_n\}$ is a sub-martingale with respect to $\mathcal{F}_n$, and $\phi$ is an increasing convex function such that $\phi(X_n)$ is integrable for all $n$. Then $\{\phi(X_n)\}$ is a sub-martingale with respect to $\mathcal{F}_n$.

A direct example is the following:

⌅ Example 4.3 Suppose that $\{X_n\}$ is a sub-martingale. Applying the increasing convex function $\phi(x) = x^+ := \max(x, 0)$, the process $\{X_n^+\}$ is also a sub-martingale. ⌅

Theorem 4.3 Let $\{X_n\}$ be a martingale and $T$ be a stopping time. Define the stopped process $\{X_{n\wedge T}\}$ as

$$X_{n\wedge T}(\omega) \triangleq X_{n\wedge T(\omega)}(\omega), \quad \forall \omega \in \Omega, \ \forall n.$$

Then $\{X_{n\wedge T}\}$ is a martingale. In particular,

$$\mathbb{E}[X_{n\wedge T}] = \mathbb{E}[X_0], \quad \forall n.$$

Proof. We will show this result by applying the martingale transform technique mentioned in Example 4.2. Define the stake process $\{C_n^T\}_{n\geq 1}$ as

$$C_n^T(\omega) = \mathbf{1}\{n \leq T(\omega)\}, \quad \forall \omega \in \Omega, \ \forall n.$$

Note that $\{C_n^T\}_{n\geq 1}$ is predictable since $\{C_n^T = 0\} = \{T \leq n-1\} \in \mathcal{F}_{n-1}$. Now we simplify $\sum_{1\leq k\leq n} C_k^T(X_k - X_{k-1})$ by telescoping:

$$\sum_{1\leq k\leq n} C_k^T(X_k - X_{k-1}) = \sum_{k=1}^{n} \mathbf{1}\{T \geq k\}(X_k - X_{k-1}) = \sum_{k=1}^{n\wedge T} (X_k - X_{k-1}) = X_{n\wedge T} - X_0.$$

By the boundedness of $\{C_n^T\}$ and the result in Example 4.2, $\{X_{n\wedge T} - X_0\}$ is a martingale, i.e., $\{X_{n\wedge T}\}$ is a martingale. Therefore,

$$\mathbb{E}[X_{n\wedge T}] = \mathbb{E}[\mathbb{E}[X_{n\wedge T} \mid \mathcal{F}_{n-1}]] = \mathbb{E}[X_{(n-1)\wedge T}] = \cdots = \mathbb{E}[X_{0\wedge T}] = \mathbb{E}[X_0].$$

⌅
Note that $\mathbb{E}[X_T]$ does not necessarily equal $\lim_{n\to\infty} \mathbb{E}[X_{n\wedge T}] = \mathbb{E}[X_0]$. The following provides a counter-example:

⌅ Example 4.4 Let $\{X_n\}$ be a simple symmetric random walk on the integers with $X_0 = 0$. Then $\{X_n\}$ is a martingale. Define the stopping time

$$T \triangleq \inf\{n \geq 0 : X_n = 1\}.$$

Then $\mathbb{E}[X_{n\wedge T}] = \mathbb{E}[X_0] = 0$, $\forall n$. Since the random walk is recurrent, $P(T < \infty) = 1$, and $X_T = 1$ a.s., which implies that

$$1 = \mathbb{E}[X_T] \neq \lim_{n\to\infty} \mathbb{E}[X_{n\wedge T}] = \mathbb{E}[X_0] = 0.$$

⌅
The Doob's optional stopping theorem provides sufficient conditions for $\mathbb{E}[X_T] = \mathbb{E}[X_0]$:

Theorem 4.4 — Doob's Optional Stopping Theorem. Let $\{X_n\}$ be a martingale and $T$ be a stopping time. Then $X_T$ is integrable and $\mathbb{E}[X_T] = \mathbb{E}[X_0]$ if any of the following conditions hold:

1. $T$ is bounded a.s.;

2. $\{X_n\}$ is bounded and $T$ is finite a.s. ($P(T < \infty) = 1$)ᵃ;

3. $\mathbb{E}[T] < \infty$ᵇ and $\{|X_n - X_{n-1}|\}$ is bounded.

ᵃ Finiteness is a weaker condition, which does not imply boundedness.
ᵇ $\mathbb{E}[T] < \infty$ is a slightly stronger condition than finiteness, but still does not imply boundedness.

Proof. 1. Suppose that $T$ is bounded a.s., which means that there exists $K$ such that $P\{\omega : T(\omega) \leq K\} = 1$. Therefore, $X_{K\wedge T} = X_T$ a.s. By Theorem 4.3, $X_T$ is integrable with $\mathbb{E}[X_T] = \mathbb{E}[X_{K\wedge T}] = \mathbb{E}[X_0]$.

2. Suppose that $T$ is finite a.s.; then we can show that $X_{n\wedge T} \to X_T$ a.s.: note that $P\{T < \infty\} = 1$, and

$$\omega \in \{T < \infty\} \implies \lim_{n\to\infty} X_{n\wedge T}(\omega) = X_T(\omega).$$

Since $\{X_n\}$ is bounded a.s., $\{X_{n\wedge T}\}$ is bounded a.s. as well. By the Bounded Convergence Theorem, $X_T$ is integrable and $X_{n\wedge T} \to X_T$ in $L^1$, which implies that

$$\mathbb{E}[X_T] = \lim_{n\to\infty} \mathbb{E}[X_{n\wedge T}] = \mathbb{E}[X_0].$$

3. We first show that $\{X_{T\wedge n} - X_0\}$ is dominated by an integrable random variable:

$$|X_{T\wedge n} - X_0| = \Big|\sum_{k=1}^{T\wedge n} (X_k - X_{k-1})\Big| \leq \sum_{k=1}^{T\wedge n} |X_k - X_{k-1}| \leq M \cdot (T\wedge n) \leq MT,$$

where the second inequality is by the boundedness of $\{|X_n - X_{n-1}|\}$, i.e., for any $n$ and $\omega \in \Omega$, $|X_n(\omega) - X_{n-1}(\omega)| \leq M$. Since $\mathbb{E}[T] < \infty$, $T$ is finite a.s., which implies that $X_{n\wedge T} \to X_T$ a.s. Applying the dominated convergence theorem, $X_{T\wedge n} \to X_T$ in $L^1$, and

$$\mathbb{E}[X_T] = \lim_{n\to\infty} \mathbb{E}[X_{n\wedge T}] = \mathbb{E}[X_0].$$

⌅

4.1.2. Doob’s Inequalities


Theorem 4.5 — Doob’s Optional Sampling Theorem. Let { Xn } be a martingale and

S, T be two bounded stopping times, with S  T a.s., then E [ XT | FS ] = XS a.s.

Moreover, if instead { Xn } is assumed to be a sub-martingale or super-martingale,

then the equality in the result is replaced by or , respectively.

Proof. We only show the result based on the assumption that { Xn } is a sub-martingale,

since the remaining part follows the similar logic. Since S, T are bounded a.s., random

variables XT and XS are integrable. In order to simplify E [ XT | FS ], we need to

study the structure of FS : For any A 2 FS , by the definition of stopping-time s-

algebra, A \ {S  j} 2 F j . Considering that {S > j 1} = { S  j 1} c 2 F j 1 ✓ F j and


{ T > j} = { T  j}c 2 F j ,

A \ {S  j} \ {S > j 1} \ { T > j } = A \ { S = j } \ { T > j } 2 F j , 8 j.

• Assume that 0  T S  1, a.s., it follows that

Z N Z

A
( XT XS ) dP = Â ( XT XS ) dP (4.1a)
j=0 A\{S= j}
N Z N Z
=Â ( XT XS ) dP + Â ( XT XS ) dP
j=0 A\{S= j}\{ T > j} j=0 A\{S= j}\{ T = j}

(4.1b)
N Z
=Â ( XT XS ) dP (4.1c)
j=0 A\{S= j}\{ T > j}

where (4.1a) is by the assumption that S is bounded a.s., i.e., |S|  N a.s.; (4.1b)

46
is by the assumption that T S; (4.1c) is because 1{S = j, T = j} · ( XT XS ) = 0.

Since E [ X j+1 | F j ]  X j a.s. for any j,

Z Z
(Xj X j+1 ) dP 0 =) ( XS XT ) dP 0, 8 A 2 FS . (4.2)
A\{S= j}\{ T > j} A

R
For two F -measurable random variables, if A
(X Y ) dP  0, 8 A 2 F , then one

can assert that X  Y a.s. Therefore, (4.2) implies E [ XT | FS ]  XS a.s.

• Now suppose that T S 0 a.s., and construct intermediate variables R j =

T ^ (S + j), j = 1, 2, . . . , N. It follows that R j is a stopping time and S  R1  R2 

· · ·  R N  T a.s., with

0  R1 S  1, 0  Rj Rj 1  1, 8 j 0  T R N  1.

Consider any A 2 FS , and since 0  R1 S  1 a.s.,

Z
( XS XR1 ) dP 0.
A

By definition of stopping time s-algebra, A \ {S  j} 2 F j , which implies that

A \ {S  j} \ { R1  j} = A \ { R1  j} 2 F j =) A 2 F R1 .

Considering that 0  R2 R1  1 a.s.,

Z
( X R1 XR2 ) dP 0.
A

Similarly,

Z Z
( XR j 1
XR j ) dP 0, j = 2, . . . , N, ( XR N XT ) dP 0.
A A

R
Adding those integrals above, A
( XS XT ) dP 0, 8 A 2 FS , i.e.,

E [ X T | F S ]  XS , a.s.

47

R We can assume the uniform integrability of { Xn } and the conclusion still


holds, without assuming that T, S are bounded.

4.2. Thursday

4.2.1. Doob's Maximal Inequality

Theorem 4.6 — Doob's Maximal Inequality. Let $\{X_n\}$ be a super-martingale. Choose some $N > 0$; then for any $\lambda > 0$,

1. $\displaystyle \lambda \cdot P\Big(\sup_{k\leq N} X_k \geq \lambda\Big) \leq \mathbb{E}[X_0] - \mathbb{E}\Big[X_N \cdot \mathbf{1}\Big\{\sup_{k\leq N} X_k < \lambda\Big\}\Big]$;

2. $\displaystyle \lambda \cdot P\Big(\inf_{k\leq N} X_k \leq -\lambda\Big) \leq -\mathbb{E}\Big[X_N \cdot \mathbf{1}\Big\{\inf_{k\leq N} X_k \leq -\lambda\Big\}\Big]$.

Proof. 1. Define a stopping time $R$:

$$R(\omega) = \inf\{k \geq 0 : X_k(\omega) \geq \lambda\}, \quad \forall \omega \in \Omega.$$

Take $T = R \wedge N$, which is a bounded stopping time. Apply the Optional Sampling Theorem 4.5:

$$\begin{aligned}
\mathbb{E}[X_0] &\geq \mathbb{E}\big[\mathbb{E}[X_T \mid \mathcal{F}_0]\big] = \mathbb{E}[X_T] \\
&= \int \mathbf{1}\Big\{\sup_{k\leq N} X_k \geq \lambda\Big\} X_T \,dP + \int \mathbf{1}\Big\{\sup_{k\leq N} X_k < \lambda\Big\} X_T \,dP \\
&\geq \lambda \cdot P\Big(\sup_{k\leq N} X_k \geq \lambda\Big) + \int \mathbf{1}\Big\{\sup_{k\leq N} X_k < \lambda\Big\} X_N \,dP,
\end{aligned}$$

where the first inequality is because $X_0 \geq \mathbb{E}[X_T \mid \mathcal{F}_0]$ a.s.; and the last inequality is because $X_T \geq \lambda$ on the event $\big\{\sup_{k\leq N} X_k \geq \lambda\big\}$, while $X_T \equiv X_N$ on the event $\big\{\sup_{k\leq N} X_k < \lambda\big\}$. Thus the desired result holds.

2. Let $Y_n = -X_n$; then $\{Y_n\}$ is a sub-martingale. Define the stopping time

$$R(\omega) = \inf\{k \geq 0 : Y_k(\omega) \geq \lambda\}, \quad \forall \omega \in \Omega.$$

Take $T = R \wedge N$, which is a bounded stopping time. Apply the Optional Sampling Theorem 4.5:

$$\mathbb{E}[Y_N] \geq \mathbb{E}[Y_T] \geq \lambda P\Big(\sup_{k\leq N} Y_k \geq \lambda\Big) + \mathbb{E}\Big[Y_N \mathbf{1}\Big\{\sup_{k\leq N} Y_k < \lambda\Big\}\Big].$$

It follows that

$$\lambda \cdot P\Big(\inf_{k\leq N} X_k \leq -\lambda\Big) = \lambda \cdot P\Big(\sup_{k\leq N} Y_k \geq \lambda\Big) \leq \mathbb{E}[Y_N] - \mathbb{E}\Big[Y_N \mathbf{1}\Big\{\sup_{k\leq N} Y_k < \lambda\Big\}\Big] = \mathbb{E}\Big[(-X_N)\mathbf{1}\Big\{\inf_{k\leq N} X_k \leq -\lambda\Big\}\Big].$$

⌅

R Summing up the two results in Theorem 4.6, we obtain

$$\lambda \cdot P\Big(\sup_{k\leq N} |X_k| \geq \lambda\Big) \leq \mathbb{E}[X_0] - \mathbb{E}\Big[X_N \cdot \mathbf{1}\Big\{\sup_{k\leq N} X_k < \lambda\Big\}\Big] - \mathbb{E}\Big[X_N \cdot \mathbf{1}\Big\{\inf_{k\leq N} X_k \leq -\lambda\Big\}\Big] \leq \mathbb{E}[X_0] + 2\mathbb{E}[X_N^-],$$

where $X^- \triangleq \max(-X, 0)$.

Theorem 4.7 Let $\{X_n\}$ be a martingale. Choose some $N > 0$ and let $X_N \in L^2$, i.e., $\mathbb{E}[X_N^2] < \infty$. Then for any $\lambda > 0$,

$$P\Big(\sup_{k\leq N} |X_k| \geq \lambda\Big) \leq \frac{1}{\lambda^2} \mathbb{E}[X_N^2].$$

Proof. We can show that $\{X_n^2\}$ is a sub-martingale by applying Jensen's inequality:

$$\mathbb{E}[X_{n+1}^2 \mid \mathcal{F}_n] \geq (\mathbb{E}[X_{n+1} \mid \mathcal{F}_n])^2 = X_n^2.$$

As a result, $\mathbb{E}[X_k^2] \leq \mathbb{E}[X_N^2] < \infty$, $\forall k \leq N$, and $\{-X_k^2\}_{k\leq N}$ is a super-martingale. Applying the second part of Theorem 4.6 completes the proof:

$$\lambda^2 \cdot P\Big(\inf_{k\leq N} (-X_k^2) \leq -\lambda^2\Big) = \lambda^2 \cdot P\Big(\sup_{k\leq N} |X_k| \geq \lambda\Big) \leq \mathbb{E}\Big[X_N^2 \cdot \mathbf{1}\Big\{\inf_{k\leq N} (-X_k^2) \leq -\lambda^2\Big\}\Big] \leq \mathbb{E}[X_N^2].$$

⌅

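Theorem 4.7 is easy to check empirically for a concrete martingale. A sketch with a simple symmetric random walk (so $\mathbb{E}[X_N^2] = N$; the horizon and threshold are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, N = 100_000, 50

# Martingale: simple symmetric random walk, X_0 = 0.
X = np.cumsum(rng.choice([-1.0, 1.0], size=(n_paths, N)), axis=1)

lam = 20.0
lhs = (np.abs(X).max(axis=1) >= lam).mean()  # P(sup_{k<=N} |X_k| >= lam)
rhs = (X[:, -1] ** 2).mean() / lam**2        # E[X_N^2] / lam^2
print(lhs, rhs)
```

The bound controls the running maximum of the whole path using only the second moment at the terminal time; for this walk the bound is quite loose, which is typical of Chebyshev-type maximal inequalities.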
Theorem 4.8 — Doob’s L p -inequality. 1. Suppose that { Xn } is a sub-martingale,

then for any p > 1,

 !p ✓ ◆p
p
E sup Xk+  E [( Xn+ ) p ].
kn p 1

2. Suppose that { Xn } is a martingale, then for any p > 1,

 ✓ ◆p
p p
E sup | Xk |  E [| Xn | p ].
kn p 1

Proof. 1. W.l.o.g., assume that { Xn } is non-negative, and we may replace Xn+ by Xn .

Consider a continuous increasing function f : R + ! [0, +•) with f(0) = 0, and

we evaluate the expectation for f( Z ), where Z is a given random variable:

Z
E [f( Z )] = f( Z (w )) dP (w )
W
Z Z Z (w )
= df(y) dP (w )
W 0
Z Z
= 1{y  Z (w )} df(y) dP (w )
W [0,•)
Z Z
= 1{y  Z (w )} dP (w ) df(y) (4.3a)
[0,•) W
Z
= P(Z y) df(y)
[0,•)

where (4.3a) is by the Fubini’s theorem.

50
Take f(y) ⌘ y p and define Xn⇤ = supkn Xn 1 for notation simplifcation. As a result,

using a little bit calculus gives

Z
E [( Xn⇤ ) p ] = P ( Xn⇤ l) dl p
[0,•)
Z
" ( )#
1
 E | Xn |1 sup | Xk | l dl p (4.3b)
[0,•) l kn
Z • Z
1
= Xn (w )1 { Xn⇤ (w ) l} dP (w ) dl p (4.3c)
0 l W
Z Z •
1
= Xn ( w ) 1 { Xn⇤ (w ) l} dl p dP (w ) (4.3d)
W 0 l
Z Z X ⇤ (w )
n
2
= Xn ( w ) pl p dl dP (w )
W 0
Z
p 1
= Xn ( w ) [ Xn⇤ (w )] p dP (w )
W p 1
p ⇣ ⌘1/q
 (E [( Xn ) p ])1/p E [( Xn⇤ )( p 1) q
] , with 1/q = 1 1/p
p 1
(4.3e)
p
= (E [( Xn ) p ])1/p (E [( Xn⇤ ) p ])( p 1)/p
(4.3f)
p 1

where (4.3b) is by the Doob’s maximal inequality; (4.3c) is by the assumption that

Xn 0; (4.3d) is by Fubini’s theorem; (4.3e) is by Holder’s inequality. If dividing


1)/p
both sides in (4.3f) by (E [( Xn⇤ ) p ])( p , we get the desired result.

2. The second inequality follows by applying the first and replacing { Xn } with

{| Xn |}.

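For $p = 2$ the $L^p$-inequality reads $\mathbb{E}\big[\sup_{k\leq n} X_k^2\big] \leq 4\,\mathbb{E}[X_n^2]$, which can be checked by simulation; a sketch with a symmetric random walk (the horizon is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, n = 100_000, 40

# Martingale: symmetric random walk.  For p = 2 the inequality reads
#   E[ sup_{k<=n} X_k^2 ] <= (p/(p-1))^p * E[X_n^2] = 4 * E[X_n^2].
X = np.cumsum(rng.choice([-1.0, 1.0], size=(n_paths, n)), axis=1)

lhs = (np.abs(X).max(axis=1) ** 2).mean()
rhs = 4.0 * (X[:, -1] ** 2).mean()
print(lhs, rhs)
```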
⌅ Example 4.5 Let $\{X_n\}$ be a non-negative sub-martingale. Then we can apply a similar procedure to show the following upper bound:

$$\mathbb{E}\Big[\sup_{k\leq n} X_k\Big] \leq \frac{e}{e-1}\Big(1 + \sup_{k\leq n} \mathbb{E}[X_k \log^+ X_k]\Big),$$

where $\log^+ x \triangleq (\log x)\mathbf{1}\{x \geq 1\}$.

Take $\phi(y) = (y-1)^+$; then, writing $X_n^* = \sup_{k\leq n} X_k$,

$$\begin{aligned}
\mathbb{E}[(X_n^* - 1)^+] &= \int_0^\infty P(X_n^* \geq \lambda) \,d\phi(\lambda) \\
&\leq \int_0^\infty \frac{1}{\lambda} \mathbb{E}[X_n \mathbf{1}\{X_n^* \geq \lambda\}] \,d\phi(\lambda) \quad (4.4a)\\
&= \int_\Omega X_n \int_0^{X_n^*} \frac{1}{\lambda} \,d\phi(\lambda) \,dP \quad (4.4b)\\
&= \int_\Omega X_n \int_0^{X_n^*} \frac{1}{\lambda} \mathbf{1}\{\lambda \geq 1\} \,d\lambda \,dP \\
&= \int_\Omega X_n \mathbf{1}\{X_n^* \geq 1\} \log X_n^* \,dP \\
&= \mathbb{E}[X_n \log^+ X_n^*],
\end{aligned}$$

where (4.4a) is by Doob's maximal inequality, and (4.4b) is by Fubini's theorem. As a result,

$$\mathbb{E}[X_n^*] - 1 \leq \mathbb{E}[(X_n^* - 1)^+] \leq \mathbb{E}[X_n \log^+ X_n^*].$$

We can use a bit of calculus to show that

$$a\log^+ b \leq a\log^+ a + \frac{b}{e}, \quad (4.4c)$$

which implies that

$$\mathbb{E}[X_n^*] - 1 \leq \mathbb{E}[X_n \log^+ X_n] + \frac{1}{e}\mathbb{E}[X_n^*].$$

Rearranging and taking the supremum over $k \leq n$ completes the proof. ⌅

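The elementary inequality (4.4c) that the example relies on can be verified directly; a sketch of the calculus argument:

```latex
% Claim (4.4c): for a, b >= 0,  a log^+ b <= a log^+ a + b/e.
% If b <= 1 the left-hand side vanishes, so assume b > 1.
% Key fact: \log x \le x/e for all x > 0, since x \mapsto \log x - x/e
% is maximized at x = e, where it equals 0.
%
% Case a >= 1: apply the key fact with x = b/a:
%   \log(b/a) \le \frac{b}{ea}
%   \;\Longrightarrow\; a\log b \le a\log a + \frac{b}{e}
%                      = a\log^+ a + \frac{b}{e}.
% Case 0 <= a < 1: here a\log^+ a = 0, and
%   a\log b \le \log b \le \frac{b}{e}.
\log x \le \frac{x}{e}\ (x>0)
\quad\Longrightarrow\quad
a\log^+ b \le a\log^+ a + \frac{b}{e}.
```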
Chapter 5

Week 5

5.1. Tuesday

5.1.1. Convergence of Martingales

Let $\{X_n\}_{n\geq 0}$ be an adapted process on $(\Omega, \mathcal{F}, \{\mathcal{F}_n\}_{n\geq 0}, P)$, and $[a, b]$ be a closed interval. Define $T_0 = \inf\{n \geq 0 : X_n \leq a\}$, and

$$T_{2k-1} = \inf\{n > T_{2k-2} : X_n \geq b\}, \qquad T_{2k} = \inf\{n > T_{2k-1} : X_n \leq a\}.$$

See Figure 5.1 for an illustration of $T_k$, $k \geq 0$.

Figure 5.1: Upcrossings of $[a, b]$

We may check that $\{T_k\}_{k\geq 0}$ is an increasing sequence of stopping times:

Proof. The increasing property is trivial. To check that $T_k$ is a stopping time, observe that

$$\{T_{2k-1} = m\} = \Big(\bigcap_{t=T_{2k-2}+1}^{m-1} \{X_t < b\}\Big) \cap \{X_m \geq b\} \in \mathcal{F}_m,$$

$$\{T_{2k} = m\} = \Big(\bigcap_{t=T_{2k-1}+1}^{m-1} \{X_t > a\}\Big) \cap \{X_m \leq a\} \in \mathcal{F}_m.$$

⌅

Definition 5.1 [Upcrossing]

• If $T_{2k-1} < \infty$ a.s., then the sequence $X_{T_0}, X_{T_1}, \ldots, X_{T_{2k-1}}$ is said to upcross the interval $[a, b]$ $k$ times.

• Define $U_a^b[X; n]$ to be the number of upcrossings of the interval $[a, b]$ by the process $X \triangleq \{X_k\}_{k\geq 0}$ up to time $n$. We can check that $U_a^b[X; n]$ is $\mathcal{F}_n$-measurable:

$$\{U_a^b[X; n] = j\} = \{T_{2j-1} \leq n < T_{2j+1}\} = \{T_{2j-1} \leq n\} \cap \{T_{2j+1} \leq n\}^c \in \mathcal{F}_n.$$

We can also assert that $X_{T_{2j}} \leq a$ if $T_{2j} < \infty$ a.s.; and $X_{T_{2j+1}} \geq b$ if $T_{2j+1} < \infty$ a.s. ⌅

Theorem 5.1 — Doob’s Upcrossing Theorem. 1. Suppose that { Xn }n 0 is a super-

martingale, then for any n 1, k 0,

⇣ ⌘ 1 h i
P Uab [ X; n] k+1  E ( Xn a) 1{Uab [ X; n] = k} .
b a

1
As a result, E [Uab [ X; n]]  b a E [( Xn a ) ].

2. Suppose that { Xn }n 0 is a sub-martingale, then for any n 1, k 0,

⇣ ⌘ 1 h i
P Uab [ X; n] k  E ( Xn a)+ 1{Uab [ X; n] = k } .
b a

1
As a result, E [Uab [ X; n]]  b a E [( Xn a ) + ].

Proof. 1. Considering that $\{X_n\}$ is a super-martingale and $T_{(2k+1)}\wedge n$, $T_{2k}\wedge n$ are two bounded stopping times, by Doob's Optional Sampling Theorem 4.5,

$$\begin{aligned}
0 &\geq \mathbb{E}[X_{T_{(2k+1)}\wedge n} - X_{T_{2k}\wedge n}] \\
&= \mathbb{E}[(X_{T_{(2k+1)}\wedge n} - X_{T_{2k}\wedge n})\mathbf{1}\{n < T_{2k}\}] + \mathbb{E}[(X_{T_{(2k+1)}\wedge n} - X_{T_{2k}\wedge n})\mathbf{1}\{T_{2k} \leq n < T_{2k+1}\}] \\
&\quad + \mathbb{E}[(X_{T_{(2k+1)}\wedge n} - X_{T_{2k}\wedge n})\mathbf{1}\{n \geq T_{2k+1}\}] \\
&= \mathbb{E}[(X_n - X_{T_{2k}})\mathbf{1}\{T_{2k} \leq n < T_{2k+1}\}] + \mathbb{E}[(X_{T_{2k+1}} - X_{T_{2k}})\mathbf{1}\{n \geq T_{2k+1}\}] \\
&\geq \mathbb{E}[(X_n - a)\mathbf{1}\{T_{2k} \leq n < T_{2k+1}\}] + \mathbb{E}[(b-a)\mathbf{1}\{n \geq T_{2k+1}\}] \quad (5.1a)\\
&\geq -\mathbb{E}[(X_n - a)^-\,\mathbf{1}\{T_{2k} \leq n < T_{2k+1}\}] + (b-a)P\{n \geq T_{2k+1}\} \\
&\geq -\mathbb{E}[(X_n - a)^-\,\mathbf{1}\{T_{2k-1} \leq n < T_{2k+1}\}] + (b-a)P\{n \geq T_{2k+1}\},
\end{aligned}$$

where (5.1a) is by the fact that $X_{T_{2k}} \leq a$, $X_{T_{2k+1}} \geq b$ a.s. Therefore,

$$P\{n \geq T_{2k+1}\} \leq \frac{1}{b-a}\,\mathbb{E}[(X_n - a)^-\,\mathbf{1}\{T_{2k-1} \leq n < T_{2k+1}\}]. \quad (5.1b)$$

Note that $\{U_a^b[X; n] = k\} = \{T_{2k-1} \leq n < T_{2k+1}\}$ and

$$\{U_a^b[X; n] \geq k+1\} = \bigcup_{j\geq k+1} \{U_a^b[X; n] = j\} \subseteq \{T_{2k+1} \leq n\}.$$

Therefore, applying these two conditions to (5.1b) gives the desired inequality:

$$P\{U_a^b[X; n] \geq k+1\} \leq P\{n \geq T_{2k+1}\} \leq \frac{1}{b-a}\,\mathbb{E}[(X_n - a)^-\,\mathbf{1}\{U_a^b[X; n] = k\}].$$

Summing up the inequality above over $k \geq 0$, we imply

$$\sum_{k\geq 0} P\{U_a^b[X; n] \geq k+1\} \leq \frac{1}{b-a}\,\mathbb{E}[(X_n - a)^-].$$

The LHS is essentially $\mathbb{E}[U_a^b[X; n]]$:

$$\sum_{k\geq 0} P\{U_a^b[X; n] \geq k+1\} = \sum_{k\geq 0} \sum_{j=k+1}^{\infty} P\{U_a^b[X; n] = j\} = \sum_{j=1}^{\infty} \sum_{k=0}^{j-1} P\{U_a^b[X; n] = j\} = \sum_{j=1}^{\infty} jP\{U_a^b[X; n] = j\} = \mathbb{E}[U_a^b[X; n]].$$

The proof is complete.

2. We may use a similar technique to finish the proof of the second part. Applying Doob's Optional Sampling Theorem 4.5 to $T_{(2k-1)}\wedge n$, $T_{2k}\wedge n$ gives

$$\begin{aligned}
0 &\geq \mathbb{E}[X_{T_{(2k-1)}\wedge n} - X_{T_{2k}\wedge n}] \\
&= \mathbb{E}[(X_{T_{(2k-1)}\wedge n} - X_{T_{2k}\wedge n})\mathbf{1}\{n < T_{2k-1}\}] + \mathbb{E}[(X_{T_{(2k-1)}\wedge n} - X_{T_{2k}\wedge n})\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] \\
&\quad + \mathbb{E}[(X_{T_{(2k-1)}\wedge n} - X_{T_{2k}\wedge n})\mathbf{1}\{n \geq T_{2k}\}] \\
&= \mathbb{E}[(X_{T_{2k-1}} - X_n)\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] + \mathbb{E}[(X_{T_{2k-1}} - X_{T_{2k}})\mathbf{1}\{n \geq T_{2k}\}] \\
&\geq \mathbb{E}[(b - X_n)\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] + \mathbb{E}[(b-a)\mathbf{1}\{n \geq T_{2k}\}] \\
&= \mathbb{E}[(a - X_n)\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] + \mathbb{E}[(b-a)\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] + \mathbb{E}[(b-a)\mathbf{1}\{n \geq T_{2k}\}] \\
&= \mathbb{E}[(a - X_n)\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] + \mathbb{E}[(b-a)\mathbf{1}\{n \geq T_{2k-1}\}] \\
&\geq -\mathbb{E}[(X_n - a)^+\,\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}] + (b-a)P\{n \geq T_{2k-1}\}.
\end{aligned}$$

Or equivalently,

$$P\{n \geq T_{2k-1}\} \leq \frac{1}{b-a}\,\mathbb{E}[(X_n - a)^+\,\mathbf{1}\{T_{2k-1} \leq n < T_{2k}\}].$$

Considering that $\{T_{2k-1} \leq n < T_{2k}\} \subseteq \{U_a^b[X; n] = k\}$ and $\{U_a^b[X; n] \geq k\} \subseteq \{n \geq T_{2k-1}\}$, we imply

$$P\{U_a^b[X; n] \geq k\} \leq \frac{1}{b-a}\,\mathbb{E}[(X_n - a)^+\,\mathbf{1}\{U_a^b[X; n] = k\}].$$

Summing both sides over $k \geq 1$, we conclude the desired result. ⌅

R The number of upcrossings of $\{X_n\}$ on the interval $[a, b]$ is the same as the number of upcrossings of $\{-X_n\}$ on $[-b, -a]$. Using this fact, we can assert that:

• If $\{X_n\}$ is a super-martingale, for any $n \geq 1$, $k \geq 1$,

$$P\Big(U_a^b[X; n] \geq k\Big) \leq \frac{1}{b-a}\,\mathbb{E}\Big[(X_n - b)^-\,\mathbf{1}\{U_a^b[X; n] = k\}\Big].$$

• If $\{X_n\}$ is a sub-martingale, for any $n \geq 1$, $k \geq 1$,

$$P\Big(U_a^b[X; n] \geq k+1\Big) \leq \frac{1}{b-a}\,\mathbb{E}\Big[(X_n - b)^+\,\mathbf{1}\{U_a^b[X; n] = k\}\Big].$$

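The upcrossing count $U_a^b[X; n]$ is straightforward to compute along a sample path, which makes the upcrossing inequality easy to probe numerically. A sketch (using the sub-martingale $|S_k|$ for a symmetric random walk $S_k$, with the interval $[1, 3]$ chosen arbitrarily):

```python
import numpy as np

def count_upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b] by the sequence `path`."""
    count, waiting_for_low = 0, True
    for x in path:
        if waiting_for_low and x <= a:
            waiting_for_low = False      # reached level a; now wait for level b
        elif not waiting_for_low and x >= b:
            count += 1                   # one upcrossing completed
            waiting_for_low = True
    return count

# Empirical check of E[U_a^b] <= E[(X_n - a)^+] / (b - a) for the
# sub-martingale X_k = |S_k| (absolute value of a random walk).
rng = np.random.default_rng(7)
n_paths, n = 20_000, 100
X = np.abs(np.cumsum(rng.choice([-1, 1], size=(n_paths, n)), axis=1))

a, b = 1, 3
mean_up = np.mean([count_upcrossings(p, a, b) for p in X])
bound = np.maximum(X[:, -1] - a, 0).mean() / (b - a)
print(mean_up, bound)
```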
From the upcrossing inequality, we can easily get the result for the convergence of

a martingale.

Theorem 5.2 — Martingale Convergence Theorem. Suppose that $\{X_n\}$ is a super-martingale which is $L^1$-bounded, i.e., $\sup_n \mathbb{E}[|X_n|] < \infty$. Then there exists a random variable $X_\infty$ such that $X_\infty \in L^1$ and $X_n \to X_\infty$ a.s.

If we further assume that $\{X_n\}$ is bounded below by zero, then $\mathbb{E}[X_\infty \mid \mathcal{F}_n] \le X_n$ a.s., for any $n$.

Proof. • Firstly, we study the limit of $\{U_a^b[X;n]\}_{n\ge 1}$, which is guaranteed to exist since $U_a^b[X;n]$ is increasing in $n$. Define $U_a^b[X] \triangleq \lim_{n\to\infty} U_a^b[X;n]$; then

$$\begin{aligned}
\mathbb{E}\big[U_a^b[X]\big] &\le \liminf_{n\to\infty}\mathbb{E}\big[U_a^b[X;n]\big] &&(5.2a)\\
&\le \frac{1}{b-a}\liminf_{n\to\infty}\mathbb{E}\big[(X_n - a)^-\big] &&(5.2b)\\
&\le \frac{1}{b-a}\sup_n \mathbb{E}\big[(X_n - a)^-\big] \le \frac{1}{b-a}\Big(\sup_n \mathbb{E}[|X_n|] + |a|\Big) < \infty,
\end{aligned}$$

where (5.2a) is by Fatou's lemma and (5.2b) is by Doob's upcrossing Theorem 5.1. As a result, $U_a^b[X] < \infty$ a.s.

• Note that the result in the first part holds for any rational $a, b$ with $a < b$, which means that $\mathbb{P}(U_a^b[X] < \infty) = 1$ for all $a, b \in \mathbb{Q}$ with $a < b$. Therefore, we can show that $\mathbb{P}(N) = 0$, where

$$N = \bigcup_{a,b\in\mathbb{Q},\ a<b}\Big\{\liminf_{n\to\infty} X_n < a < b < \limsup_{n\to\infty} X_n\Big\}.$$

• Then we construct $X_\infty$ as follows. Because of the denseness of $\mathbb{Q}$, for $\omega \notin N$, $X_n(\omega)$ is convergent, and we define $X_\infty(\omega) = \lim_{n\to\infty} X_n(\omega)$; otherwise, define $X_\infty(\omega) = 0$. Then the almost sure convergence $X_n \to X_\infty$ is obtained.

Also, we can check that $X_\infty \in L^1$ by Fatou's lemma:

$$\mathbb{E}[|X_\infty|] \le \liminf_{n\to\infty}\mathbb{E}[|X_n|] \le \sup_n \mathbb{E}[|X_n|] < \infty.$$

• Given that $\{X_n\}$ is bounded below by zero, the remaining result can be shown by upper bounding the integral $\int_A X_\infty\,d\mathbb{P}$ for any $A \in \mathcal{F}_n$:

$$\begin{aligned}
\int_A \mathbb{E}[X_\infty \mid \mathcal{F}_n]\,d\mathbb{P} &= \int_A X_\infty\,d\mathbb{P}\\
&\le \liminf_{m\to\infty}\int_A X_m\,d\mathbb{P} &&(5.3a)\\
&= \liminf_{m\to\infty}\int_A \mathbb{E}[X_m \mid \mathcal{F}_n]\,d\mathbb{P} &&(5.3b)\\
&\le \liminf_{m\to\infty}\int_A X_n\,d\mathbb{P} = \int_A X_n\,d\mathbb{P}, &&(5.3c)
\end{aligned}$$

where (5.3a) is by Fatou's lemma, (5.3b) is by the definition of conditional expectation, and (5.3c) is by the definition of a super-martingale.
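As a numerical illustration of the Martingale Convergence Theorem (not part of the original lecture), consider $X_n = \prod_{i\le n} 2U_i$ with i.i.d. $U_i \sim \mathrm{Uniform}(0,1)$: since $\mathbb{E}[2U_i]=1$, this is a nonnegative martingale (hence a super-martingale) with $\sup_n \mathbb{E}[|X_n|] = 1$, and it converges a.s. to $0$ because $\mathbb{E}[\log(2U_i)] = \log 2 - 1 < 0$. A minimal simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# X_n = prod_{i<=n} 2*U_i, U_i ~ Uniform(0,1): E[2U_i] = 1, so {X_n} is a
# nonnegative martingale with sup_n E[|X_n|] = 1, hence L^1-bounded.
# Theorem 5.2 guarantees an a.s. limit; here the limit is 0, since
# log X_n is a random walk with negative drift log(2) - 1 < 0.
n_paths, n_steps = 2000, 400
U = rng.uniform(size=(n_paths, n_steps))
X = np.cumprod(2.0 * U, axis=1)

assert abs(X[:, 0].mean() - 1.0) < 0.1   # E[X_1] = 1
assert X[:, -1].max() < 1e-15            # every sampled path has collapsed to ~0
```

Note that $\mathbb{E}[X_n] = 1$ for every $n$ while $X_n \to 0$ a.s., so the convergence cannot be in $L^1$ here; this is exactly why the theorem only bounds $\mathbb{E}[X_\infty \mid \mathcal{F}_n]$ by $X_n$ rather than asserting equality.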

5.1.2. Continuous-time Martingales

Now we discuss the concepts of martingales, super-martingales, and sub-martingales in continuous time.

Definition 5.2 [Martingale] Let $\{X_t\}_{t\ge 0}$ be an adapted process on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, \mathbb{P})$. The stochastic process $\{X_t\}_{t\ge 0}$ is called a martingale if

1. $X_t \in L^1$ for all $t$;
2. $\mathbb{E}[X_t \mid \mathcal{F}_s] = X_s$ a.s., for all $0 \le s \le t$.

If in the last definition "$=$" is replaced by "$\le$" or "$\ge$", then $\{X_t\}_{t\ge 0}$ is said to be a supermartingale or submartingale, respectively. ⌅

Definition 5.3 [Optional Time]

1. A mapping $T : \Omega \to [0,\infty]$ is called an $\{\mathcal{F}_t\}$-stopping time if $\{T \le t\} \in \mathcal{F}_t$ for all $t \ge 0$.
2. A mapping $T : \Omega \to [0,\infty]$ is called an $\{\mathcal{F}_t\}$-optional time if $\{T < t\} \in \mathcal{F}_t$ for all $t > 0$.

It is easy to check that a stopping time is always an optional time. Now we discuss an

example about a specific optional time.

⌅ Example 5.1 Let $T$ be an optional time. For each $n \ge 1$, define the step-function mapping

$$T_n = \begin{cases} \dfrac{k}{2^n}, & \text{if } (k-1)/2^n \le T < k/2^n,\quad k = 1, 2, \dots,\\[4pt] \infty, & \text{if } T = \infty. \end{cases}$$

Then $\{T_n\}_{n\ge 1}$ are stopping times with $T_n \downarrow T$:

• To show $T_n$ is a stopping time, study the set

$$\{T_n \le t\} = \bigcup_{k\ge 1}\Big(\{T_n \le t\} \cap \Big\{\frac{k-1}{2^n}\le T < \frac{k}{2^n}\Big\}\Big) = \bigcup_{k=1}^{\lfloor t\cdot 2^n\rfloor}\Big\{\frac{k-1}{2^n}\le T < \frac{k}{2^n}\Big\}.$$

Since $T$ is an optional time,

$$\Big\{\frac{k-1}{2^n}\le T < \frac{k}{2^n}\Big\} \in \mathcal{F}_{k/2^n} \subseteq \mathcal{F}_t,\ \forall k \le t\cdot 2^n \implies \{T_n \le t\} \in \mathcal{F}_t,\ \forall t.$$

• The result $T_n \downarrow T$ can be found in the MAT3006 material: Daniel Wong, Jie Wang. (2019) Lecture Notes for MAT3006: Real Analysis, Lecture 19, Proposition 10.4. Available at https://walterbabyrudin.github.io/information/Updates/Updates.html
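The dyadic approximation in Example 5.1 is easy to check numerically. A small sketch (the function name is ours, not from the notes):

```python
import numpy as np

def dyadic_approx(T, n):
    """T_n = k/2^n on {(k-1)/2^n <= T < k/2^n}: round T up to the next level-n dyadic point."""
    return (np.floor(T * 2.0**n) + 1) / 2.0**n

rng = np.random.default_rng(1)
T = rng.exponential(size=5)                      # a few sample values of an optional time
approx = np.array([dyadic_approx(T, n) for n in range(1, 12)])

assert np.all(approx > T)                        # T_n > T always
assert np.all(np.diff(approx, axis=0) <= 0)      # T_n is nonincreasing in n
assert np.all(approx[-1] - T <= 2.0**-11)        # T_n - T <= 2^{-n}, so T_n converges to T
```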

Definition 5.4 1. Let $\{\mathcal{F}_t\}_{t\ge 0}$ be a filtration. Define $\mathcal{F}_{t+} \triangleq \bigcap_{s>t}\mathcal{F}_s$. Then $\{\mathcal{F}_{t+}\}_{t\ge 0}$ is also a filtration. A filtration $\{\mathcal{F}_t\}_{t\ge 0}$ is said to be right-continuous if $\mathcal{F}_t = \mathcal{F}_{t+}$ for all $t$.

R $\mathcal{F}_{t+}$ can be interpreted as the information available immediately after time $t$. We can show that $\mathcal{F}_{t+}$ is a $\sigma$-algebra and $\mathcal{F}_{t+} \supseteq \mathcal{F}_t$. When the filtration is right-continuous, a stopping time is the same as an optional time.

2. A filtration $\{\mathcal{F}_t\}_{t\ge 0}$ is said to be complete if each $\mathcal{F}_t$ contains all $\mathbb{P}$-null sets in $\mathcal{F}$.
3. A filtration $\{\mathcal{F}_t\}_{t\ge 0}$ is called an augmented filtration, or said to satisfy the usual conditions, if it is complete and right-continuous.

5.2. Thursday

5.2.1. Theorems for Continuous Time Martingales


Definition 5.5 [Stopping Time $\sigma$-algebra] Let $T$ be an $\{\mathcal{F}_t\}$-stopping time. Define

$$\mathcal{F}_T \triangleq \{A \in \mathcal{F} : A \cap \{T \le t\} \in \mathcal{F}_t,\ \forall t \ge 0\}.$$

Then $\mathcal{F}_T$ is the $\sigma$-algebra for $T$, containing the information available up to time $T$. ⌅

Proposition 5.1 Let $\{\mathcal{F}_t\}$ be a filtration and $T$ an optional time. Define

$$\mathcal{F}_{T+} \triangleq \{A \in \mathcal{F} : A \cap \{T \le t\} \in \mathcal{F}_{t+},\ \forall t \ge 0\},$$
$$\mathcal{G}_T \triangleq \{A \in \mathcal{F} : A \cap \{T < t\} \in \mathcal{F}_t,\ \forall t > 0\}.$$

We can show that $\mathcal{F}_{T+} = \mathcal{G}_T$.

Theorem 5.3 1. Let $\{X_t\}_{t\ge 0}$ be a martingale and suppose the filtration $\{\mathcal{F}_t\}_{t\ge 0}$ satisfies the usual conditions. Then there exists a version of $\{X_t\}_{t\ge 0}$ which is right-continuous with left limits, denoted $\{\tilde X_t\}_{t\ge 0}$. Moreover, $\{\tilde X_t\}_{t\ge 0}$ is a right-continuous martingale w.r.t. $\{\mathcal{F}_t\}_{t\ge 0}$.

R The martingales we encounter are basically right-continuous by construction.

2. (Maximal Inequality) Denote by $X_t^* \triangleq \sup_{s\le t}|X_s|$ the running maximum. Then for any $t > 0$ and $\lambda > 0$,

$$\mathbb{P}(X_t^* \ge \lambda) \le \frac{1}{\lambda}\,\mathbb{E}[|X_t|].$$

3. (Convergence Theorem) Let $\{X_t\}_{t\ge 0}$ be a right-continuous super-martingale which is $L^1$-bounded. Then there exists a random variable $X_\infty$ such that $X_\infty \in L^1$ and $X_t \to X_\infty$ a.s.

Theorem 5.4 — Doob's Optional Sampling Theorem for Continuous-time Martingales. Let $\{X_t\}_{t\ge 0}$ be a right-continuous martingale with last element $X_\infty$, i.e., $X_\infty \in L^1$ and $\mathbb{E}[X_\infty \mid \mathcal{F}_t] = X_t$ a.s. for any $t \ge 0$. Let $S \le T$ be two $\{\mathcal{F}_t\}$-optional times. Then $\mathbb{E}[X_T \mid \mathcal{F}_{S+}] = X_S$ a.s. Specifically, if $S$ is a stopping time, we may replace $\mathcal{F}_{S+}$ by $\mathcal{F}_S$. In particular, $\mathbb{E}[X_T] = \mathbb{E}[X_0]$.

Let's first show a necessary and sufficient condition for the existence of $X_\infty$:

Proposition 5.2 A last element $X_\infty$ exists if and only if $\{X_t\}_{t\ge 0}$ is uniformly integrable.

Proof. Assume that $\{X_t\}_{t\ge 0}$ is uniformly integrable, which implies that $\{X_t\}_{t\ge 0}$ is $L^1$-bounded. By the martingale convergence theorem in Theorem 5.3, there exists a random variable $X_\infty$ such that $X_t \to X_\infty$ a.s. and $X_\infty \in L^1$. In particular, $X_t \to X_\infty$ in probability. Together with the uniform integrability of $\{X_t\}_{t\ge 0}$, we can assert that $X_t \to X_\infty$ in $L^1$. Finally, we check that $\mathbb{E}[X_\infty \mid \mathcal{F}_u] = X_u$ a.s. for any $u \ge 0$, i.e., for any $A \in \mathcal{F}_u$,

$$\int_A X_\infty\,d\mathbb{P} = \lim_{t\to\infty}\int_A X_t\,d\mathbb{P} = \lim_{t\to\infty}\int_A \mathbb{E}[X_t \mid \mathcal{F}_u]\,d\mathbb{P} = \lim_{t\to\infty}\int_A X_u\,d\mathbb{P} = \int_A X_u\,d\mathbb{P}.$$

Now assume that the last element exists. Then $\{X_t\}$ is a collection of conditional expectations of $X_\infty$. Applying Theorem 3.1 gives the desired result.

Now we begin to prove Theorem 5.4.

Proof. • Firstly, construct approximations of $S$, $T$ and argue that similar optional sampling results hold:

$$S_n = \begin{cases} \dfrac{k}{2^n}, & \text{if } (k-1)/2^n \le S < k/2^n,\ k = 1,2,\dots,\\[4pt] \infty, & \text{if } S = \infty, \end{cases} \qquad T_n = \begin{cases} \dfrac{k}{2^n}, & \text{if } (k-1)/2^n \le T < k/2^n,\ k = 1,2,\dots,\\[4pt] \infty, & \text{if } T = \infty. \end{cases}$$

By Example 5.1, $\{S_n\}, \{T_n\}$ are two sequences of stopping times with $S_n \downarrow S$, $T_n \downarrow T$. Moreover, for each $n \ge 1$, $S_n \le T_n$ a.s., each taking values in a countable set. Since $\{X_t\}_{t\ge 0}$ is uniformly integrable, applying the discrete-time optional sampling theorem,

$$\mathbb{E}[X_{T_n} \mid \mathcal{F}_{S_n}] = X_{S_n},\quad \text{a.s.}$$

Therefore, for any $A \in \mathcal{F}_{S_n}$, $\int_A X_{T_n}\,d\mathbb{P} = \int_A X_{S_n}\,d\mathbb{P}$.

• We claim that $\mathcal{F}_{S+} = \bigcap_{n\ge 1}\mathcal{F}_{S_n}$. Therefore, for any $A \in \mathcal{F}_{S+}$,

$$\int_A X_{T_n}\,d\mathbb{P} = \int_A X_{S_n}\,d\mathbb{P}. \qquad (5.4)$$

Here $\{X_{S_n}\}_{n\ge 1}$ is called a backward (discrete) martingale w.r.t. $\{\mathcal{F}_{S_n}\}_{n=1}^{\infty}$, i.e., $\mathbb{E}[X_{S_n} \mid \mathcal{F}_{S_{n+1}}] = X_{S_{n+1}}$. Therefore, for any $A \in \mathcal{F}_{S_{n+1}}$,

$$\int_A X_{S_n}\,d\mathbb{P} = \int_A X_{S_{n+1}}\,d\mathbb{P}.$$

Thus $\mathbb{E}[X_{S_{n+1}}] = \mathbb{E}[X_{S_n}] = \mathbb{E}[X_0] > -\infty$, and in particular $\lim_{n\to\infty}\mathbb{E}[X_{S_n}] > -\infty$.

• We also claim that $\{X_{S_n}\}_{n\ge 1}$ is uniformly integrable; the same argument shows that $\{X_{T_n}\}_{n\ge 1}$ is uniformly integrable. Since $\{X_t\}$ is right-continuous and $T_n \downarrow T$, $S_n \downarrow S$, the limits of $\{X_{T_n}\}$ and $\{X_{S_n}\}$ always exist:

$$X_T \triangleq \lim_{n\to\infty} X_{T_n}\ \text{a.s.},\qquad X_S \triangleq \lim_{n\to\infty} X_{S_n}\ \text{a.s.}$$

In particular, $X_{T_n} \to X_T$ in probability and $X_{S_n} \to X_S$ in probability. By Theorem 3.3, $X_{T_n} \to X_T$ in $L^1$ and $X_{S_n} \to X_S$ in $L^1$.

• Then we can show that $\mathbb{E}[X_T \mid \mathcal{F}_{S+}] = X_S$ a.s. as follows. For any $A \in \mathcal{F}_{S+}$,

$$\int_A \mathbb{E}[X_T \mid \mathcal{F}_{S+}]\,d\mathbb{P} = \int_A X_T\,d\mathbb{P} = \lim_{n\to\infty}\int_A X_{T_n}\,d\mathbb{P} = \lim_{n\to\infty}\int_A X_{S_n}\,d\mathbb{P} = \int_A X_S\,d\mathbb{P},$$

where the second and the last equalities are because of the $L^1$ convergence, and the third equality is because of (5.4).

• Provided that $S$ is a stopping time, $S \le S_n$ implies $\mathcal{F}_S \subseteq \mathcal{F}_{S_n}$. Therefore, for any $A \in \mathcal{F}_S$, $\int_A X_{T_n}\,d\mathbb{P} = \int_A X_{S_n}\,d\mathbb{P}$, and the same limiting argument applies. The proof is complete.

Chapter 6

Week 6

6.1. Tuesday

At the beginning of this lecture, let's fill the gap in Theorem 5.3.

Proposition 6.1 Suppose that each $T_n$ is a positive stopping time and that $T < T_n$ on the event $\{T < \infty\}$ for all $n \ge 1$, where $T \triangleq \inf_n T_n$. Then $\mathcal{F}_{T+} = \bigcap_{n=1}^{\infty}\mathcal{F}_{T_n}$, where

$$\mathcal{F}_{T+} \triangleq \{A \in \mathcal{F} : A \cap \{T \le t\} \in \mathcal{F}_{t+},\ \forall t \ge 0\} = \{A \in \mathcal{F} : A \cap \{T < t\} \in \mathcal{F}_t,\ \forall t > 0\}.$$

Proof. Firstly, we show that $T$ is an optional time:

$$\{T < t\} = \{T \ge t\}^c = \Big(\bigcap_n \{T_n > t\}\Big)^c = \bigcup_n \{T_n > t\}^c = \bigcup_n \{T_n \le t\}.$$

Since each $T_n$ is a stopping time, $\{T_n \le t\} \in \mathcal{F}_t$ for all $t$, which implies that $T$ is an optional time.

• Suppose that $A \in \bigcap_{n=1}^{\infty}\mathcal{F}_{T_n}$; then $A \cap \{T_n \le t\} \in \mathcal{F}_t$ for all $t$ and all $n \ge 1$. As a result,

$$\mathcal{F}_t \ni \bigcup_n\big(A \cap \{T_n \le t\}\big) = A \cap \bigcup_n\{T_n \le t\} = A \cap \{T < t\},\quad \forall t.$$

In other words, $A \in \mathcal{F}_{T+}$.

• Suppose that $A \in \mathcal{F}_{T+}$; then $A \cap \{T < t\} \in \mathcal{F}_t$ for all $t > 0$. Moreover, $\{T_n \le t\} \subseteq \{T < t\}$ and $\{T_n \le t\} \in \mathcal{F}_t$. Therefore,

$$\mathcal{F}_t \ni \big(A \cap \{T < t\}\big) \cap \{T_n \le t\} = A \cap \{T_n \le t\},\quad \forall t > 0.$$

In other words, $A \in \bigcap_{n=1}^{\infty}\mathcal{F}_{T_n}$.

Proposition 6.2 Let $\{\mathcal{F}_n\}_{n=1}^{\infty}$ be a decreasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$:

$$\mathcal{F} \supseteq \mathcal{F}_1 \supseteq \cdots \supseteq \mathcal{F}_n \supseteq \cdots.$$

Suppose that $\{X_n\}_{n=1}^{\infty}$ is a backward submartingale w.r.t. $\{\mathcal{F}_n\}_{n=1}^{\infty}$, i.e., i) $X_n$ is $\mathcal{F}_n$-measurable, ii) $\mathbb{E}[|X_n|] < \infty$, iii) $\mathbb{E}[X_n \mid \mathcal{F}_{n+1}] \ge X_{n+1}$ a.s. If $\lim_{n\to\infty}\mathbb{E}[X_n] > -\infty$, then the sequence $\{X_n\}_{n=1}^{\infty}$ is UI.

Proof. Note that the limit of $\mathbb{E}[X_n]$ exists since it is decreasing in $n$. Denote $c \triangleq \lim_{n\to\infty}\mathbb{E}[X_n]$. We can argue the uniform convergence of $\mathbb{P}(|X_n| > \lambda)$ as follows: for all $\lambda > 0$,

$$\mathbb{P}(|X_n| > \lambda) \le \frac{\mathbb{E}[|X_n|]}{\lambda} = \frac{1}{\lambda}\big(2\,\mathbb{E}[X_n^+] - \mathbb{E}[X_n]\big) \le \frac{1}{\lambda}\big(2\,\mathbb{E}[X_1^+] - c\big) < \infty,$$

where the last inequality is because $\mathbb{E}[X_n] \ge c$ and $\{X_n^+\}$ is a backward submartingale, so $\mathbb{E}[X_n^+] \le \mathbb{E}[X_1^+]$. In other words, $\mathbb{P}(|X_n| > \lambda)$ converges to $0$ uniformly in $n$ as $\lambda \to \infty$.

Now we begin to show the UI of $\{X_n\}$. Applying the useful Proposition 3.1 to $X_1$: for any $\varepsilon > 0$, there exists $\delta > 0$ such that for any $A \in \mathcal{F}_1$ with $\mathbb{P}(A) < \delta$,

$$\int_A |X_1|\,d\mathbb{P} < \varepsilon. \qquad (6.1)$$

We can choose $\lambda$ sufficiently large such that $\mathbb{P}(|X_n| > \lambda) < \delta$ for all $n$. As a result,

$$\int_{\{|X_n|>\lambda\}} |X_n|\,d\mathbb{P} \le \int_{\{|X_n|>\lambda\}} |X_{n-1}|\,d\mathbb{P} \le \cdots \le \int_{\{|X_n|>\lambda\}} |X_1|\,d\mathbb{P} < \varepsilon,$$

where the first inequality is because $\mathbb{E}[|X_{n-1}| \mid \mathcal{F}_n] \ge |X_n|$ and $\{|X_n| > \lambda\} \in \mathcal{F}_n$, and the last inequality is because of (6.1). Therefore, $\int_{\{|X_n|>\lambda\}} |X_n|\,d\mathbb{P} < \varepsilon$ for any $n$. The proof is complete. ⌅

6.1.1. Localization

The concept of stopping times provides a tool for "localizing" quantities.

Definition 6.1 [Stopped Process] Suppose that $\{X_t\}_{t\ge 0}$ is an $\{\mathcal{F}_t\}$-adapted process on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$, and $T$ is a stopping time. Define the stopped process $\{X_{t\wedge T}\}_{t\ge 0}$ by

$$X_{t\wedge T}(\omega) = X_{t\wedge T(\omega)}(\omega),\quad \forall\omega \in \Omega.$$

Note that $\{X_{t\wedge T}\}_{t\ge 0}$ is also an $\{\mathcal{F}_t\}$-adapted process.

Definition 6.2 [Local Martingale] An $\{\mathcal{F}_t\}$-adapted process $\{X_t\}_{t\ge 0}$ is called a local martingale if there is an increasing sequence of stopping times $\{T_n\}_{n\ge 0}$ with $T_n \uparrow \infty$ a.s. such that $\{X_{t\wedge T_n}\}_{t\ge 0}$ is a martingale w.r.t. $\{\mathcal{F}_t\}$ for each $n$. ⌅

Note that a martingale is a local martingale. Now we give a sufficient condition for a local martingale to be a martingale.

Theorem 6.1 Suppose that $\{X_t\}_{t\ge 0}$ is an $\{\mathcal{F}_t\}$-adapted local martingale, and there is a sequence $\{T_n\}_{n\ge 0}$ that reduces $\{X_t\}_{t\ge 0}$: $\{X_{t\wedge T_n}\}_{t\ge 0}$ is a martingale for each $n$. If $\mathbb{E}[\sup_n |X_{t\wedge T_n}|] < \infty$ for each $t$, then $\{X_t\}_{t\ge 0}$ is a martingale.

Proof. Considering that i) $X_{t\wedge T_n} \to X_t$ a.s. because $T_n \uparrow \infty$, and ii) $|X_{t\wedge T_n}| \le \sup_n |X_{t\wedge T_n}|$ with the random variable $\sup_n |X_{t\wedge T_n}|$ integrable, we can apply the dominated convergence theorem to show that $X_{t\wedge T_n} \xrightarrow{L^1} X_t$ for each $t$ as $n\to\infty$.

Now we check that $\{X_t\}_{t\ge 0}$ is a martingale, i.e., for any $A \in \mathcal{F}_s$, $0 \le s \le t$, we have

$$\int_A \mathbb{E}[X_t \mid \mathcal{F}_s]\,d\mathbb{P} = \int_A X_t\,d\mathbb{P} = \lim_{n\to\infty}\int_A X_{t\wedge T_n}\,d\mathbb{P} = \lim_{n\to\infty}\int_A X_{s\wedge T_n}\,d\mathbb{P} = \int_A X_s\,d\mathbb{P},$$

where the third equality is because $\mathbb{E}[X_{t\wedge T_n} \mid \mathcal{F}_s] = X_{s\wedge T_n}$. ⌅
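The stopped-process construction can be sanity-checked on a discrete example (ours, not from the lecture). Below, a symmetric random walk, which is a martingale, is frozen at the exit time $T$ of $(-5, 5)$; by optional stopping at the bounded time $t \wedge T$, the mean of the stopped process stays at $0$ for every $t$:

```python
import numpy as np

rng = np.random.default_rng(2)

n_paths, horizon, level = 20000, 200, 5
steps = rng.choice([-1, 1], size=(n_paths, horizon))

X = np.zeros(n_paths)                  # X_{t ∧ T} for each sample path
frozen = np.zeros(n_paths, dtype=bool)
for t in range(horizon):
    X = np.where(frozen, X, X + steps[:, t])   # only unstopped paths keep moving
    frozen |= np.abs(X) >= level               # T = first exit time of (-5, 5)

assert abs(X.mean()) < 0.15            # E[X_{t∧T}] = E[X_0] = 0 at every t
assert frozen.mean() > 0.95            # E[T] = 25 here, so nearly all paths stop
```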

6.1.2. Introduction to Brownian Motion

Brownian motion is a mathematical model of the random movements observed by the botanist Robert Brown. Now we give a way of constructing Brownian motion.

Definition 6.3 [Brownian Motion] A stochastic process $B = \{B_t\}_{t\ge 0}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, taking values in $\mathbb{R}$, is called a Brownian motion if:

1. $\mathbb{P}(B_0 = 0) = 1$;
2. (Independent Increments) For every $0 \le t_1 < \cdots < t_k < \infty$ and $x_1, x_2, \dots, x_{k-1} \in \mathbb{R}$,

$$\mathbb{P}\big(B_{t_2}-B_{t_1} \le x_1, \dots, B_{t_k}-B_{t_{k-1}} \le x_{k-1}\big) = \prod_{2\le j\le k}\mathbb{P}\big(B_{t_j}-B_{t_{j-1}} \le x_{j-1}\big);$$

3. (Normal Distribution) For each $0 \le s < t$, $B_t - B_s$ follows a normal distribution with mean $0$ and variance $\sigma^2(t-s)$, where $\sigma > 0$;
4. Almost all sample paths of $\{B_t\}_{t\ge 0}$ are continuous.

In particular, when $\sigma = 1$, we call it the standard Brownian motion.

R In some situations, the first condition may not be satisfied. Instead, the process may start at a non-zero point $x$. Then we write such a process as $\{x + B_t\}$.

Definition 6.4 [Canonical Wiener Measure] Let the sample space be $\Omega = C[0,\infty)$ with its associated topology $\mathcal{T}$, and define the Borel $\sigma$-algebra $\mathcal{B} = \sigma(\mathcal{T})$. Thus $\omega \in \Omega$ is a continuous function with support $[0,\infty)$. Define $B_t(\omega) = \omega(t)$. A probability measure $\mathbb{P}$ on $(C[0,\infty), \mathcal{B})$ is called a Wiener measure if conditions (1)-(3) in Definition 6.3 are satisfied. With such a probability measure, $\{B_t\}_{t\ge 0}$ is said to be a Brownian motion on $(C[0,\infty), \mathcal{B}, \mathbb{P})$. ⌅

Theorem 6.2 — Existence and Uniqueness of the Wiener Measure. For each $\sigma > 0$, there exists a unique Wiener measure as in Definition 6.4.
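Definition 6.3 suggests a direct way to sample Brownian paths on a finite grid: cumulative sums of independent $N(0,\Delta t)$ increments. A quick sketch (grid sizes and tolerances are ad hoc) checking the marginal distributions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample paths of standard Brownian motion on [0, 1] via independent
# N(0, dt) increments: conditions (1)-(3) of Definition 6.3 on a grid.
n_paths, n_steps, T = 50000, 256, 1.0
dt = T / n_steps
dB = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)

# B_1 should be N(0, 1); B_{1/2} should be N(0, 1/2)
assert abs(B[:, -1].mean()) < 0.02
assert abs(B[:, -1].var() - 1.0) < 0.03
assert abs(B[:, n_steps // 2 - 1].var() - 0.5) < 0.03
```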

6.2. Thursday

6.2.1. Properties of Brownian Motion

Proposition 6.3 Suppose that $\{B_t\}_{t\ge 0}$ is a standard Brownian motion. Then it satisfies the following properties:

1. Joint distribution: Fix $0 \le t_1 < t_2 < \cdots < t_k$. Given $x_1, x_2, \dots, x_k \in \mathbb{R}$, the joint density of $(B_{t_1}, B_{t_2}, \dots, B_{t_k})$ at $(x_1, x_2, \dots, x_k)$ equals the joint density of $(B_{t_1}, B_{t_2}-B_{t_1}, \dots, B_{t_k}-B_{t_{k-1}})$ at $(x_1, x_2-x_1, \dots, x_k-x_{k-1})$, which is

$$\prod_{j=1}^{k}\frac{1}{\sqrt{2\pi(t_j-t_{j-1})}}\exp\left(-\frac{(x_j-x_{j-1})^2}{2(t_j-t_{j-1})}\right),$$

with the convention $t_0 = 0$, $x_0 = 0$.

2. Stationarity: For any $s > 0$, define $B_t^s = B_{t+s} - B_s$, $t \ge 0$. Then $\{B_t^s\}_{t\ge 0}$ is a Brownian motion.

3. Scaling:
• For each $c \ne 0$, $\{cB_t\}_{t\ge 0}$ is a Brownian motion with variance parameter $c^2$;
• For each $c > 0$, $\{B_{t/c}\}_{t\ge 0}$ is a Brownian motion with variance parameter $1/c$;
• (Scaling invariance / self-similarity) By the previous two properties, $\{\sqrt{c}\,B_{t/c}\}_{t\ge 0}$ is a standard Brownian motion for any $c > 0$.

4. Covariance: for fixed $0 \le s \le t$, $\mathrm{cov}(B_t, B_s) = s$.

5. Time reversal: Given a standard Brownian motion $\{B_t\}$, define a new process $\{\hat B_t\}$ with $\hat B_t = tB_{1/t}$ for $t > 0$ and $\hat B_0 = 0$. Then $\{\hat B_t\}$ is a standard Brownian motion.

Proof of the first four parts. 1) can be shown by the independent increments and normal distribution properties of Brownian motion; 2) and 3) can be shown by checking the definition of Brownian motion; 4) can be shown by directly computing the covariance:

$$\begin{aligned}
\mathrm{cov}(B_t, B_s) &= \mathbb{E}[B_t B_s] - \mathbb{E}[B_t]\,\mathbb{E}[B_s]\\
&= \mathbb{E}[(B_t - B_s + B_s)B_s]\\
&= \mathbb{E}[(B_t - B_s)B_s] + \mathbb{E}[B_s^2]\\
&= \mathbb{E}[B_t - B_s]\,\mathbb{E}[B_s] + \mathbb{E}[B_s^2]\\
&= s.
\end{aligned}$$
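The covariance and scaling properties lend themselves to a quick Monte Carlo check (a sketch with ad hoc grid sizes and tolerances, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)

n_paths, n_steps, T = 100000, 200, 2.0
dt = T / n_steps
B = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

# cov(B_t, B_s) = s for s <= t; grid index k corresponds to time (k+1)*dt,
# so indices 49 and 149 are the times s = 0.5 and t = 1.5
cov = np.mean(B[:, 49] * B[:, 149])
assert abs(cov - 0.5) < 0.02

# scaling: sqrt(c) * B_{t/c} is standard, e.g. Var(2 * B_{1/4}) = 1 for c = 4
assert abs(np.var(2.0 * B[:, 24]) - 1.0) < 0.02
```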

Proof of the time reversal part. We need to check that the four conditions in Definition 6.3 are satisfied. Condition (1) is trivial.

• Now check condition (3). Fix $0 < s < t$; then

$$\hat B_t - \hat B_s = tB_{1/t} - sB_{1/s} = (t-s)B_{1/t} + s(B_{1/t} - B_{1/s}).$$

Since $B_{1/t} - B_{1/s} \sim N(0, 1/s - 1/t)$, we imply $s(B_{1/t} - B_{1/s}) \sim N(0, s^2(1/s - 1/t))$. Moreover, $(t-s)B_{1/t} \sim N(0, (t-s)^2/t)$. By the independent increments property, this term is independent of $s(B_{1/t} - B_{1/s})$. Therefore, $\hat B_t - \hat B_s$ is normally distributed with mean $0$ and variance

$$\frac{(t-s)^2}{t} + s^2\Big(\frac{1}{s} - \frac{1}{t}\Big) = t - s.$$

• In order to check condition (2), fix $t_1 < t_2 < t_3$. It suffices to check that $\hat B_{t_3} - \hat B_{t_2}$ and $\hat B_{t_2} - \hat B_{t_1}$ are independent. Considering that these two r.v.'s are jointly normal, it suffices to verify that their covariance is zero:

$$\begin{aligned}
t_3 - t_1 &= \mathrm{Var}(\hat B_{t_3} - \hat B_{t_1})\\
&= \mathrm{Var}(\hat B_{t_3} - \hat B_{t_2} + \hat B_{t_2} - \hat B_{t_1})\\
&= \mathrm{Var}(\hat B_{t_3} - \hat B_{t_2}) + \mathrm{Var}(\hat B_{t_2} - \hat B_{t_1}) + 2\,\mathrm{Cov}(\hat B_{t_3} - \hat B_{t_2},\ \hat B_{t_2} - \hat B_{t_1})\\
&= t_3 - t_2 + t_2 - t_1 + 2\,\mathrm{Cov}(\hat B_{t_3} - \hat B_{t_2},\ \hat B_{t_2} - \hat B_{t_1}),
\end{aligned}$$

which implies the desired result.

• Finally we check condition (4). Since the continuity of $\{\hat B_t\}$ holds at any $t > 0$, it suffices to check that $t = 0$ is also a continuity point, i.e., almost surely $\lim_{t\to 0}\hat B_t = \lim_{t\to 0} tB_{1/t}(\omega) = 0$.

– Firstly we show that $\hat B_t \to 0$ along $t = 1/n$, $n \to \infty$. For fixed $n \in \mathbb{Z}^+$, $B_n = \sum_{j=1}^n (B_j - B_{j-1})$, i.e., $B_n$ is a sum of i.i.d. standard normal random variables. By the strong law of large numbers, $B_n/n \to 0$ a.s. as $n \to \infty$.

– Then we show that $\hat B_t \to 0$ along the other values of $t$. Fix any $s \in (n, n+1)$ and note that

$$\left|\frac{B_s}{s} - \frac{B_n}{n}\right| \le \left|\frac{B_s}{s} - \frac{B_n}{s}\right| + \left|\frac{B_n}{s} - \frac{B_n}{n}\right| = \frac{1}{s}|B_s - B_n| + \left|\frac{1}{s} - \frac{1}{n}\right||B_n| \le \frac{1}{n}\sup_{n\le s\le n+1}|B_s - B_n| + \frac{1}{n^2}|B_n|.$$

Since $B_n/n \to 0$ a.s., we have $B_n/n^2 \to 0$ a.s. Define $Z_n \triangleq \sup_{n\le s\le n+1}|B_s - B_n|$; then

$$\sup_{n<s<n+1}\left|\frac{B_s}{s} - \frac{B_n}{n}\right| \le \frac{Z_n}{n} + \frac{1}{n^2}|B_n|.$$

We claim that for any $\varepsilon > 0$,

$$\mathbb{P}\left(\omega \in \Omega : \frac{Z_n(\omega)}{n} > \varepsilon,\ \text{infinitely often}\right) = 0.$$

Then $\frac{Z_n(\omega)}{n} \to 0$ for almost all $\omega \in \Omega$. As a result,

$$\sup_{n<s<n+1}\left|\frac{B_s}{s} - \frac{B_n}{n}\right| \to 0\quad \text{a.s.},$$

which implies the desired continuity result.

– It remains to show the correctness of our claim. By the stationary and independent increments of Brownian motion, $\sup_{n\le s\le n+1}|B_s - B_n|$ has the same distribution as $Z_0 = \sup_{0\le s\le 1}|B_s|$. Then, for any $\varepsilon > 0$,

$$\begin{aligned}
\mathbb{E}[Z_0] = \mathbb{E}\Big[\sup_{0\le s\le 1}|B_s|\Big] &= \int_0^{\infty}\mathbb{P}(Z_0 > x)\,dx = \sum_{n=0}^{\infty}\int_{n\varepsilon}^{(n+1)\varepsilon}\mathbb{P}(Z_0 > x)\,dx\\
&\ge \sum_{n=0}^{\infty}\int_{n\varepsilon}^{(n+1)\varepsilon}\mathbb{P}\big(Z_0 > (n+1)\varepsilon\big)\,dx = \sum_{n=0}^{\infty}\varepsilon\,\mathbb{P}\big(Z_0 > (n+1)\varepsilon\big)\\
&= \varepsilon\sum_{n=1}^{\infty}\mathbb{P}(Z_0 > n\varepsilon) = \varepsilon\sum_{n=1}^{\infty}\mathbb{P}(Z_0/n > \varepsilon) = \varepsilon\sum_{n=1}^{\infty}\mathbb{P}(Z_n/n > \varepsilon).
\end{aligned}$$

We claim that $\mathbb{E}[Z_0] < \infty$ (which will be shown in the next lecture), which implies that

$$\sum_{n=1}^{\infty}\mathbb{P}(Z_n/n > \varepsilon) < \infty.$$

Applying the Borel-Cantelli Lemma gives the desired result.

Chapter 7

Week 7

7.1. Tuesday

7.1.1. Reflection Principle

Consider a Brownian motion $\{B_t\}_{t\ge 0}$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let the filtration $\{\mathcal{F}_t\}_{t\ge 0}$ be the natural filtration, i.e., $\mathcal{F}_t = \sigma(B_u : u \le t)$. For $a > 0$, define

$$T_a \triangleq \inf\{t \ge 0 : B_t = a\}.$$

Then $T_a$ is the first time that the Brownian motion hits level $a$. By convention, $\inf\emptyset = +\infty$.

Theorem 7.1 The hitting time is finite almost surely:

$$\mathbb{P}(T_a < \infty) = 1.$$

Proof. Based on $B_t$, define a new stochastic process $\{Z_t^{\theta}\}$ with

$$Z_t^{\theta} = \exp\left(\theta B_t - \frac{\theta^2 t}{2}\right),\quad t \ge 0,\ \theta > 0.$$

As a result,

• Since $\mathbb{E}[e^{\theta X}] = e^{\theta^2\sigma^2/2}$ for $X \sim N(0, \sigma^2)$,

$$\mathbb{E}[|Z_t^{\theta}|] = \mathbb{E}\left[\exp\left(\theta B_t - \frac{\theta^2 t}{2}\right)\right] = 1,\quad \forall t.$$

• For any $0 \le u < t$, we have

$$\begin{aligned}
\mathbb{E}[Z_t^{\theta} \mid \mathcal{F}_u] &= \mathbb{E}\left[\exp\left(\theta B_t - \frac{\theta^2 t}{2}\right)\,\Big|\,\mathcal{F}_u\right]\\
&= \mathbb{E}\left[\exp\left(\theta(B_t - B_u) - \frac{\theta^2(t-u)}{2}\right)\exp\left(\theta B_u - \frac{\theta^2 u}{2}\right)\,\Big|\,\mathcal{F}_u\right]\\
&= \exp\left(\theta B_u - \frac{\theta^2 u}{2}\right)\cdot\mathbb{E}\left[\exp\left(\theta(B_t - B_u) - \frac{\theta^2(t-u)}{2}\right)\right]\\
&= \exp\left(\theta B_u - \frac{\theta^2 u}{2}\right)\cdot\mathbb{E}\left[\exp\left(\theta B_{t-u} - \frac{\theta^2(t-u)}{2}\right)\right] = Z_u^{\theta}.
\end{aligned}$$

Therefore, $\{Z_t^{\theta}\}$ is a martingale w.r.t. $\{\mathcal{F}_t\}_{t\ge 0}$. Now we compute $\lim_{t\to\infty}\mathbb{E}[Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a < \infty\}]$ as follows:

• Since i) $Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a < \infty\} \xrightarrow{a.s.} Z_{T_a}^{\theta}\mathbf{1}\{T_a < \infty\}$ as $t\to\infty$, and ii) $|Z_{t\wedge T_a}^{\theta}| \le e^{\theta a}$ for any $t$ (because $B_{t\wedge T_a} \le a$), by the bounded convergence theorem,

$$\lim_{t\to\infty}\mathbb{E}\big[Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a < \infty\}\big] = \mathbb{E}\big[Z_{T_a}^{\theta}\mathbf{1}\{T_a < \infty\}\big].$$

• Since $Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a = \infty\} = Z_t^{\theta}\mathbf{1}\{T_a = \infty\}$ for any $t$, and

$$Z_t^{\theta}\mathbf{1}\{T_a = \infty\} \le e^{\theta a - \theta^2 t/2} \to 0,$$

by the bounded convergence theorem,

$$\lim_{t\to\infty}\mathbb{E}\big[Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a = \infty\}\big] = \lim_{t\to\infty}\mathbb{E}\big[Z_t^{\theta}\mathbf{1}\{T_a = \infty\}\big] = 0.$$

On the other hand, since the stopped process $\{Z_{t\wedge T_a}^{\theta}\}$ is a martingale,

$$\mathbb{E}[Z_{t\wedge T_a}^{\theta}] = \mathbb{E}[Z_0^{\theta}] = 1,\quad \forall t.$$

It follows that

$$1 = \lim_{t\to\infty}\mathbb{E}\big[Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a < \infty\}\big] + \lim_{t\to\infty}\mathbb{E}\big[Z_{t\wedge T_a}^{\theta}\mathbf{1}\{T_a = \infty\}\big] = \mathbb{E}\big[Z_{T_a}^{\theta}\mathbf{1}\{T_a < \infty\}\big] = \mathbb{E}\big[e^{\theta a - \theta^2 T_a/2}\mathbf{1}\{T_a < \infty\}\big].$$

Therefore, $\mathbb{E}[e^{-\theta^2 T_a/2}\mathbf{1}\{T_a < \infty\}] = e^{-\theta a}$. Since $e^{-\theta^2 T_a/2}\mathbf{1}\{T_a < \infty\}$ is increasing as $\theta \downarrow 0$, by the monotone convergence theorem,

$$1 = \lim_{\theta\downarrow 0} e^{-\theta a} = \lim_{\theta\downarrow 0}\mathbb{E}\big[e^{-\theta^2 T_a/2}\mathbf{1}\{T_a < \infty\}\big] = \mathbb{E}[\mathbf{1}\{T_a < \infty\}] = \mathbb{P}(T_a < \infty).$$

R The stationarity property shows that $\{B_{t+s} - B_s\}_{t\ge 0}$ is also a Brownian motion for any $s > 0$. Given that $T_a$ is a stopping time and finite a.s., we can assert (by the strong Markov property) that $\{B_{t+T_a} - B_{T_a}\}_{t\ge 0}$ is also a Brownian motion, independent of $\mathcal{F}_{T_a}$.

Theorem 7.2 Let $\{B_t\}_{t\ge 0}$ be a standard Brownian motion, and let $M_t = \sup_{0\le u\le t} B_u$ be the running maximum of Brownian motion. For any $a \ge 0$,

$$\mathbb{P}(M_t \ge a) = 2\,\mathbb{P}(B_t \ge a) = \frac{2}{\sqrt{2\pi t}}\int_a^{\infty} e^{-\frac{x^2}{2t}}\,dx.$$

Proof. Firstly simplify $\mathbb{P}(B_t \ge a)$ as follows:

$$\begin{aligned}
\mathbb{P}(B_t \ge a) &= \mathbb{P}(B_t \ge a,\ M_t \ge a) + \mathbb{P}(B_t \ge a,\ M_t < a)\\
&= \mathbb{P}(B_t \ge a,\ M_t \ge a)\\
&= \mathbb{P}(B_t \ge a \mid M_t \ge a)\,\mathbb{P}(M_t \ge a)\\
&= \mathbb{P}(B_t \ge a \mid T_a \le t)\,\mathbb{P}(M_t \ge a)\\
&= \mathbb{P}(B_t - B_{T_a} \ge 0 \mid T_a \le t)\,\mathbb{P}(M_t \ge a).
\end{aligned}$$

Since, conditioned on $\{T_a \le t\}$, $B_t - B_{T_a}$ is normally distributed with mean $0$, we imply $\mathbb{P}(B_t - B_{T_a} \ge 0 \mid T_a \le t) = \frac{1}{2}$. Therefore,

$$\mathbb{P}(M_t \ge a) = 2\,\mathbb{P}(B_t \ge a) = \frac{2}{\sqrt{2\pi t}}\int_a^{\infty} e^{-\frac{x^2}{2t}}\,dx.$$
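The reflection identity is easy to probe by Monte Carlo (a sketch with ad hoc parameters; the time discretization slightly undercounts the running maximum, hence the loose tolerance):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

n_paths, n_steps, t, a = 20000, 1000, 1.0, 1.0
dB = rng.normal(scale=np.sqrt(t / n_steps), size=(n_paths, n_steps))
M = np.cumsum(dB, axis=1).max(axis=1)        # discrete proxy for M_t

lhs = (M >= a).mean()                        # P(M_t >= a)
Phi = 0.5 * (1 + erf(a / sqrt(2 * t)))       # standard normal CDF at a/sqrt(t)
rhs = 2 * (1 - Phi)                          # 2 P(B_t >= a)
assert abs(lhs - rhs) < 0.03
```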

7.1.2. Distributions of Brownian Motion

Theorem 7.3 The joint distribution of Brownian motion and its running maximum satisfies: for $a, y \ge 0$,

$$\mathbb{P}(M_t \ge a,\ B_t \le a - y) = \mathbb{P}(B_t \ge a + y) = \frac{1}{\sqrt{2\pi t}}\int_{a+y}^{\infty} e^{-\frac{x^2}{2t}}\,dx.$$

Proof. Simplify $\mathbb{P}(B_t \ge a + y)$ as follows:

$$\begin{aligned}
\mathbb{P}(B_t \ge a+y) &= \mathbb{P}(B_t \ge a+y \mid M_t \ge a)\,\mathbb{P}(M_t \ge a)\\
&= \mathbb{P}(B_t \ge a+y \mid T_a \le t)\,\mathbb{P}(M_t \ge a)\\
&= \mathbb{P}(B_t - B_{T_a} \ge y \mid T_a \le t)\,\mathbb{P}(M_t \ge a)\\
&= \mathbb{P}(B_t - B_{T_a} \le -y \mid T_a \le t)\,\mathbb{P}(M_t \ge a)\\
&= \mathbb{P}(B_t \le a-y \mid M_t \ge a)\,\mathbb{P}(M_t \ge a).
\end{aligned}$$
Theorem 7.4 For any $\lambda > 0$,

$$\mathbb{E}[e^{-\lambda T_a}] = e^{-\sqrt{2\lambda}\,a}.$$

Proof. By Theorem 7.2, the density of $T_a$ is

$$f_{T_a}(t) = \frac{a}{t\sqrt{2\pi t}}\exp\left(-\frac{a^2}{2t}\right),\quad t > 0.$$

Computing the integral $\int_0^{\infty} f_{T_a}(t)\,e^{-\lambda t}\,dt$ gives the desired result. ⌅

Another quick proof. Since $\mathbb{P}(T_a < \infty) = 1$, substituting $\theta = \sqrt{2\lambda}$ into $\mathbb{E}[e^{-\theta^2 T_a/2}\mathbf{1}\{T_a < \infty\}] = e^{-\theta a}$ gives the desired result. ⌅
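The first proof can also be finished numerically: integrating the density of $T_a$ against $e^{-\lambda t}$ on a fine grid reproduces $e^{-\sqrt{2\lambda}a}$. A sketch (grid choices are ours):

```python
import numpy as np

a, lam = 1.0, 0.7
t = np.linspace(1e-6, 200.0, 2_000_000)
f = a / (t * np.sqrt(2 * np.pi * t)) * np.exp(-a**2 / (2 * t))   # density of T_a
g = f * np.exp(-lam * t)

lhs = np.sum((g[:-1] + g[1:]) * np.diff(t)) / 2   # trapezoidal rule
rhs = np.exp(-np.sqrt(2 * lam) * a)               # Theorem 7.4
assert abs(lhs - rhs) < 1e-3
```

The truncation at $t = 200$ is harmless because $e^{-\lambda t}$ kills the tail of the heavy-tailed density.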

Theorem 7.5 Consider the Brownian motion with drift:

$$X_t \triangleq \mu t + \sigma B_t,$$

where $\mu \ne 0$, $\sigma > 0$.

1. For any $0 \le s < t$, $X_t - X_s$ is normally distributed with mean $\mu(t-s)$ and variance $\sigma^2(t-s)$. The independent increments property also holds.

2. Law of large numbers: $\lim_{t\to\infty} X_t/t = \mu$ a.s.

3. For $\mu < 0$, define $M_\infty = \sup_{t\ge 0} X_t$ as the all-time maximum of the drifted Brownian motion. Then $M_\infty$ is exponentially distributed with parameter $\frac{2|\mu|}{\sigma^2}$:

$$\mathbb{P}(M_\infty > y) = \exp\Big(-\frac{2|\mu|}{\sigma^2}\,y\Big),\quad y \ge 0.$$

The first two parts are straightforward, and we give a proof of the last part:

Proof. Choose some $\theta \ne 0$ and define the random process $\{V_t^{\theta}\}_{t\ge 0}$ by

$$V_t^{\theta} = \exp\Big(\theta X_t - \mu\theta t - \frac{\sigma^2\theta^2}{2}t\Big).$$

It follows that $\{V_t^{\theta}\}_{t\ge 0}$ is a martingale:

• Since $X_t \sim N(\mu t, \sigma^2 t)$, we have $\mathbb{E}[|V_t^{\theta}|] = 1$;

• For any $0 \le u < t$, we have

$$\begin{aligned}
\mathbb{E}[V_t^{\theta} \mid \mathcal{F}_u] &= \mathbb{E}\Big[\exp\Big(\theta X_t - \big(\mu\theta + \tfrac{\sigma^2\theta^2}{2}\big)t\Big)\,\Big|\,\mathcal{F}_u\Big]\\
&= \mathbb{E}\Big[\exp\Big(\theta(X_t - X_u) - \big(\mu\theta + \tfrac{\sigma^2\theta^2}{2}\big)(t-u)\Big)\,e^{\theta X_u - (\mu\theta + \sigma^2\theta^2/2)u}\,\Big|\,\mathcal{F}_u\Big]\\
&= e^{\theta X_u - (\mu\theta + \sigma^2\theta^2/2)u}\,\mathbb{E}\Big[\exp\Big(\theta(X_t - X_u) - \big(\mu\theta + \tfrac{\sigma^2\theta^2}{2}\big)(t-u)\Big)\Big]\\
&= V_u^{\theta}.
\end{aligned}$$

For $a < 0 < b$, define $T_{a,b} = \inf\{t \ge 0 : X_t = a \text{ or } X_t = b\}$, and choose $\theta = -\frac{2\mu}{\sigma^2} > 0$ so that $\mu\theta + \frac{1}{2}\sigma^2\theta^2 = 0$. Considering that $T_{a,b} \le T_a$ with $T_a$ finite a.s., we imply $V_{t\wedge T_{a,b}}^{\theta} \xrightarrow{a.s.} V_{T_{a,b}}^{\theta}$. Moreover, $|V_{t\wedge T_{a,b}}^{\theta}| \le \max(e^{\theta a}, e^{\theta b})$. By the dominated convergence theorem,

$$\mathbb{E}[V_{T_{a,b}}^{\theta}] = \lim_{t\to\infty}\mathbb{E}[V_{t\wedge T_{a,b}}^{\theta}] = \mathbb{E}[V_0^{\theta}] = 1.$$

Therefore, since $V_{T_{a,b}}^{\theta} = e^{\theta X_{T_{a,b}}}$ (the time term vanishes by the choice of $\theta$),

$$1 = \mathbb{E}[V_{T_{a,b}}^{\theta}] = \mathbb{E}\big[V_{T_{a,b}}^{\theta}\mathbf{1}\{X_{T_{a,b}} = a\}\big] + \mathbb{E}\big[V_{T_{a,b}}^{\theta}\mathbf{1}\{X_{T_{a,b}} = b\}\big] = e^{\theta a}\,\mathbb{P}(X_{T_{a,b}} = a) + e^{\theta b}\,\mathbb{P}(X_{T_{a,b}} = b).$$

Together with the fact that $\mathbb{P}(X_{T_{a,b}} = a) + \mathbb{P}(X_{T_{a,b}} = b) = 1$, we assert that

$$\mathbb{P}(X_{T_{a,b}} = a) = \frac{e^{\theta b} - 1}{e^{\theta b} - e^{\theta a}}.$$

Now define the event $A_a = \{T_{a,b} = T_a\}$, i.e., the path hits level $a$ before level $b$; the family $\{A_a\}$ is monotone in $a$:

$$a_2 < a_1 < 0 \implies A_{a_2} \subseteq A_{a_1}.$$

Define the event

$$\Lambda \triangleq \{M_\infty < b\} = \{\omega \in \Omega : X_t(\omega) \text{ never hits level } b\} = \bigcap_{a<0,\ a\in\mathbb{Q}} A_a.$$

Thus, recalling $\theta = 2|\mu|/\sigma^2 > 0$,

$$\mathbb{P}\Big(\bigcap_{a<0,\ a\in\mathbb{Q}} A_a\Big) = \lim_{a\to-\infty}\mathbb{P}(A_a) = \lim_{a\to-\infty}\frac{e^{\theta b} - 1}{e^{\theta b} - e^{\theta a}} = 1 - e^{-\theta b}.$$

Then we conclude that

$$\mathbb{P}(M_\infty < b) = \mathbb{P}(\Lambda) = 1 - e^{-2|\mu| b/\sigma^2}.$$

In particular, we imply $\mathbb{P}(M_\infty < \infty) = 1$. ⌅
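Part 3 can be probed by simulation: for $\mu = -1$, $\sigma = 1$ the claim reads $\mathbb{P}(M_\infty > y) = e^{-2y}$. The horizon and step size below are ad hoc, and the discrete maximum slightly undershoots the continuous one, hence the loose tolerance:

```python
import numpy as np

rng = np.random.default_rng(6)

n_paths, dt, horizon = 20000, 0.002, 30.0
n_steps = int(horizon / dt)

X = np.zeros(n_paths)
M = np.zeros(n_paths)                 # running maximum; X_0 = 0, so M starts at 0
for _ in range(n_steps):
    X += -dt + rng.normal(scale=np.sqrt(dt), size=n_paths)   # mu = -1, sigma = 1
    np.maximum(M, X, out=M)

for y in (0.5, 1.0):
    assert abs((M > y).mean() - np.exp(-2 * y)) < 0.04       # P(M_inf > y) = e^{-2y}
```

Truncating at $t = 30$ is acceptable because with drift $-1$ the path is far below any fixed level $y$ long before the horizon.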

7.2. Thursday

7.2.1. Unbounded Variation of Brownian Motion

Definition 7.1 [Partition] Consider a closed interval $[a,b]$. A sequence

$$a = t_0 < t_1 < \cdots < t_n = b$$

is called a partition of $[a,b]$, denoted $P = P(t_0, t_1, \dots, t_n)$. ⌅

Definition 7.2 [Total Variation] Let $f : [a,b] \to \mathbb{R}$ be a continuous function. The total variation of $f$ is defined as

$$TV(f)[a,b] = \sup_P \sum_k |f(t_k) - f(t_{k-1})|,$$

where the supremum is taken over all possible partitions $P$ of the interval $[a,b]$. Since $\sum_k |f(t_k) - f(t_{k-1})|$ increases as the partition is refined,

$$TV(f)[a,b] = \lim_{\|P\|\to 0}\sum_k |f(t_k) - f(t_{k-1})|,$$

with $\|P\| = \max_k |t_k - t_{k-1}|$. ⌅

R Real analysis shows that a function of bounded variation, i.e., one whose total variation is finite, is differentiable almost everywhere. Hence, if a function is nowhere differentiable, it is not of bounded variation.
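The blow-up of variation sums for Brownian motion (Theorem 7.6 below) is easy to see numerically on one sampled path: over dyadic partitions of $[0,1]$ the sums $\sum_k |B_{t_k} - B_{t_{k-1}}|$ grow like $2^{n/2}\sqrt{2/\pi}$ instead of converging. A sketch with an arbitrary grid depth:

```python
import numpy as np

rng = np.random.default_rng(7)

# One Brownian path sampled on the dyadic grid of [0, 1] with 2^16 increments.
n = 16
dB = rng.normal(scale=np.sqrt(2.0**-n), size=2**n)
B = np.concatenate(([0.0], np.cumsum(dB)))

sums = []
for level in (8, 10, 12, 14, 16):
    pts = B[:: 2 ** (n - level)]          # restrict the path to 2^level + 1 grid points
    sums.append(np.abs(np.diff(pts)).sum())

assert all(a < b for a, b in zip(sums, sums[1:]))   # variation sums keep growing
assert sums[-1] > 4 * sums[0]                       # no sign of a finite limit
```

By contrast, for a smooth path the same sums would stabilize at the finite total variation.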

Theorem 7.6 Brownian motion is not of bounded variation almost surely, i.e.,

$$\mathbb{P}\big(\{\omega \in \Omega : TV(B_\cdot(\omega))[0,t] = \infty\}\big) = 1,\quad \forall t > 0.$$

This result is based on the fact that Brownian motion is nowhere differentiable almost surely.

Theorem 7.7 Brownian motion is nowhere differentiable almost surely. In particular,

$$\mathbb{P}\left(\left\{\omega \in \Omega : \limsup_{h\to 0}\left|\frac{B_{t+h}(\omega) - B_t(\omega)}{h}\right| = \infty,\ \forall t \in [0,\infty)\right\}\right) = 1.$$

R If a stochastic process $\{A_t\}_{t\ge 0}$ has bounded variation, then the integral $\int_a^b f(t)\,dA_t(\omega)$ can be defined $\omega$-wise in the Riemann-Stieltjes sense. However, Brownian motion is not of bounded variation. The stochastic integral $\int_a^b f(t)\,dB_t(\omega)$ must therefore be defined in a new manner.

Proof. Choose any $T > 0$ and $M > 0$, and define the set

$$A^{(M)} \triangleq \left\{\omega \in \Omega : \exists t \in [0,T] \text{ such that } \limsup_{h\to 0}\left|\frac{B_{t+h}(\omega) - B_t(\omega)}{h}\right| \le M\right\}.$$

It suffices to show that $\mathbb{P}(A^{(M)}) = 0$. If $\omega \in A^{(M)}$, there exist $t \in [0,T]$ and $n_0$ such that when $n \ge n_0$,

$$\left|\frac{B_u(\omega) - B_t(\omega)}{u - t}\right| \le 2M,\quad \forall u \in (t - 2/n,\ t + 2/n).$$

Decompose $A^{(M)}$ into smaller sets. Define the set

$$A_n^{(M)} \triangleq \Big\{\omega \in \Omega : \exists t \in [0,T] \text{ such that } |B_u(\omega) - B_t(\omega)| \le 2M|u - t|,\ \forall u \in (t - 2/n,\ t + 2/n)\Big\}. \qquad (7.1)$$

Then i) $A^{(M)} \subseteq \bigcup_n A_n^{(M)}$, and ii) $\{A_n^{(M)}\}$ is monotone: $A_n^{(M)} \subseteq A_{n+1}^{(M)}$.

Suppose that $\omega \in A_n^{(M)}$ with $t$ having the property in (7.1). Let $k = \sup\{j \in \mathbb{Z} : j/n \le t\}$, so that $k/n$ is close to $t$. Define $Y_k$ as the maximum of three independent increments:

$$Y_k = \max\Big\{\big|B_{(k+2)/n} - B_{(k+1)/n}\big|,\ \big|B_{(k+1)/n} - B_{k/n}\big|,\ \big|B_{k/n} - B_{(k-1)/n}\big|\Big\}.$$

We can show that $Y_k(\omega) \le 6M/n$ for all $\omega \in A_n^{(M)}$ as follows. Firstly,

$$\big|B_{(k+2)/n}(\omega) - B_{(k+1)/n}(\omega)\big| \le \big|B_{(k+2)/n}(\omega) - B_t(\omega)\big| + \big|B_t(\omega) - B_{(k+1)/n}(\omega)\big| \le 2M\Big|\frac{k+2}{n} - t\Big| + 2M\Big|\frac{k+1}{n} - t\Big| \le 2M\cdot\frac{2}{n} + 2M\cdot\frac{1}{n} = \frac{6M}{n},$$

where the last inequality is because $k/n \le t < (k+1)/n$. Following the same technique, we can show that

$$\big|B_{(k+1)/n}(\omega) - B_{k/n}(\omega)\big|,\ \big|B_{k/n}(\omega) - B_{(k-1)/n}(\omega)\big| \le \frac{6M}{n} \implies Y_k(\omega) \le \frac{6M}{n}.$$

Now define a new set based on the consequence of the claim about $A_n^{(M)}$:

$$E_n^{(M)} \triangleq \Big\{\omega \in \Omega : \exists j \in [1, Tn]\cap\mathbb{Z} \text{ such that } Y_j(\omega) \le \frac{6M}{n}\Big\},$$

with

$$Y_j = \max\Big\{\big|B_{(j+2)/n} - B_{(j+1)/n}\big|,\ \big|B_{(j+1)/n} - B_{j/n}\big|,\ \big|B_{j/n} - B_{(j-1)/n}\big|\Big\}.$$

Directly, $A_n^{(M)} \subseteq E_n^{(M)}$ for each $n$. Now we upper bound $\mathbb{P}(E_n^{(M)})$:

$$\begin{aligned}
\mathbb{P}(E_n^{(M)}) &\le \sum_{1\le j\le Tn}\mathbb{P}\Big(Y_j \le \frac{6M}{n}\Big)\\
&\le Tn\cdot\mathbb{P}\Big(\max\Big\{\big|B_{(j+2)/n} - B_{(j+1)/n}\big|,\ \big|B_{(j+1)/n} - B_{j/n}\big|,\ \big|B_{j/n} - B_{(j-1)/n}\big|\Big\} \le \frac{6M}{n}\Big)\\
&= Tn\cdot\prod_{i=j-1}^{j+1}\mathbb{P}\Big(\big|B_{(i+1)/n} - B_{i/n}\big| \le \frac{6M}{n}\Big) &&(7.2a)\\
&= Tn\cdot\Big[\mathbb{P}\Big(|B_{1/n}| \le \frac{6M}{n}\Big)\Big]^3, &&(7.2b)
\end{aligned}$$

where (7.2a) is because of the independent increments of Brownian motion, and (7.2b) is because of its stationary increments property. In particular,

$$\begin{aligned}
\mathbb{P}\Big(|B_{1/n}| \le \frac{6M}{n}\Big) &= \mathbb{P}\Big(-\frac{6M}{n} \le B_{1/n} \le \frac{6M}{n}\Big)\\
&= \mathbb{P}\Big(-\frac{6M}{\sqrt{n}} \le B_1 \le \frac{6M}{\sqrt{n}}\Big) &&(7.2c)\\
&= \frac{1}{\sqrt{2\pi}}\int_{-6M/\sqrt{n}}^{6M/\sqrt{n}} e^{-x^2/2}\,dx \le \frac{2}{\sqrt{2\pi}}\cdot\frac{6M}{\sqrt{n}}, &&(7.2d)
\end{aligned}$$

where (7.2c) is by the scaling property, and (7.2d) is by upper bounding $e^{-x^2/2} \le 1$. It follows that

$$\mathbb{P}(E_n^{(M)}) \le Tn\cdot\Big(\frac{2}{\sqrt{2\pi}}\cdot\frac{6M}{\sqrt{n}}\Big)^3 \to 0.$$

Since $A_n^{(M)} \subseteq E_n^{(M)}$, $\mathbb{P}(A_n^{(M)}) \to 0$. Since $A^{(M)} \subseteq \bigcup_n A_n^{(M)}$ and $\{A_n^{(M)}\}$ is increasing,

$$\mathbb{P}(A^{(M)}) \le \mathbb{P}\Big(\bigcup_n A_n^{(M)}\Big) = \lim_{n\to\infty}\mathbb{P}(A_n^{(M)}) = 0.$$

The proof is complete. ⌅

Chapter 8

Week 8

8.1. Thursday

8.1.1. Quadratic Variation

Definition 8.1 [Quadratic Variation] Consider a partition $P$ of the interval $[0,T]$. The quadratic variation of $\{B_t(\omega)\}_{0\le t\le T}$ over the partition $P$ is defined as

$$Q(P, \omega) = \sum_k |B_{t_k}(\omega) - B_{t_{k-1}}(\omega)|^2.$$

Theorem 8.1 Consider a sequence of partitions $\{P^{(n)}\}$ with $\|P^{(n)}\| \to 0$, where $\|P\| \triangleq \max_k |t_k - t_{k-1}|$. Then

$$\lim_{n\to\infty}\mathbb{E}\big[(Q(P^{(n)}) - T)^2\big] = 0.$$

Proof. Given a partition $P$ of the interval $[0,T]$, define

$$q_k = (B_{t_k} - B_{t_{k-1}})^2 - (t_k - t_{k-1}) \implies Q(P) = T + \sum_k q_k.$$

We claim that $q_j$, $q_k$ are uncorrelated for $j \ne k$:

$$\begin{aligned}
\mathbb{E}[q_j q_k] &= \mathbb{E}\Big[\big((B_{t_j} - B_{t_{j-1}})^2 - (t_j - t_{j-1})\big)\big((B_{t_k} - B_{t_{k-1}})^2 - (t_k - t_{k-1})\big)\Big]\\
&= \mathbb{E}\big[(B_{t_j} - B_{t_{j-1}})^2 (B_{t_k} - B_{t_{k-1}})^2\big] - (t_j - t_{j-1})\,\mathbb{E}\big[(B_{t_k} - B_{t_{k-1}})^2\big]\\
&\quad - (t_k - t_{k-1})\,\mathbb{E}\big[(B_{t_j} - B_{t_{j-1}})^2\big] + (t_j - t_{j-1})(t_k - t_{k-1})\\
&= \mathbb{E}\big[(B_{t_j} - B_{t_{j-1}})^2\big]\,\mathbb{E}\big[(B_{t_k} - B_{t_{k-1}})^2\big] - (t_j - t_{j-1})\,\mathbb{E}\big[(B_{t_k} - B_{t_{k-1}})^2\big]\\
&\quad - (t_k - t_{k-1})\,\mathbb{E}\big[(B_{t_j} - B_{t_{j-1}})^2\big] + (t_j - t_{j-1})(t_k - t_{k-1})\\
&= (t_j - t_{j-1})(t_k - t_{k-1}) - (t_j - t_{j-1})(t_k - t_{k-1}) - (t_k - t_{k-1})(t_j - t_{j-1}) + (t_j - t_{j-1})(t_k - t_{k-1}) = 0.
\end{aligned}$$

Then we begin to simplify $\mathbb{E}[(Q(P^{(n)}) - T)^2]$:

$$\begin{aligned}
\mathbb{E}\big[(Q(P^{(n)}) - T)^2\big] &= \mathbb{E}\Big[\Big(\sum_k q_k\Big)^2\Big] = \sum_k \mathbb{E}[q_k^2] + \sum_{j\ne k}\mathbb{E}[q_j q_k]\\
&= \sum_k \mathbb{E}\Big[\big((B_{t_k} - B_{t_{k-1}})^2 - (t_k - t_{k-1})\big)^2\Big]\\
&= \sum_k \mathbb{E}\big[(B_{t_k} - B_{t_{k-1}})^4\big] - 2\sum_k (t_k - t_{k-1})\,\mathbb{E}\big[(B_{t_k} - B_{t_{k-1}})^2\big] + \sum_k (t_k - t_{k-1})^2\\
&= 3\sum_k (t_k - t_{k-1})^2 - 2\sum_k (t_k - t_{k-1})^2 + \sum_k (t_k - t_{k-1})^2\\
&= 2\sum_k (t_k - t_{k-1})^2 \le 2\|P^{(n)}\|\sum_k (t_k - t_{k-1}) = 2T\cdot\|P^{(n)}\| \to 0.
\end{aligned}$$

R Theorem 8.1 shows that the quadratic variation of Brownian motion on the interval $[0,T]$ converges to $T$ in $L^2$ for any $T$. This implies that $Q(P^{(n)}) \to T$ in probability as $\|P^{(n)}\| \to 0$. Then there exists a subsequence of $\{P^{(n)}\}$ along which $Q(P^{(n)}) \to T$ almost surely.

Theorem 8.2 If $\|P^{(n)}\| \to 0$ faster than $1/n^2$, i.e.,

$$\lim_{n\to\infty} n^2\cdot\|P^{(n)}\| = 0,$$

then $Q(P^{(n)}) \to T$ almost surely.

Proof. Take $\delta_n \triangleq n^2\cdot\|P^{(n)}\|$; then by the Markov inequality,

$$\mathbb{P}\Big((Q(P^{(n)}) - T)^2 > 2\delta_n\Big) \le \frac{\mathbb{E}\big[(Q(P^{(n)}) - T)^2\big]}{2\delta_n} \le \frac{2T\|P^{(n)}\|}{2\delta_n} = \frac{T}{n^2}.$$

Considering that $\sum_n \frac{T}{n^2} < \infty$,

$$\sum_n \mathbb{P}\Big((Q(P^{(n)}) - T)^2 > 2\delta_n\Big) < \infty.$$

By the Borel-Cantelli Lemma,

$$\mathbb{P}\Big((Q(P^{(n)}) - T)^2 > 2\delta_n,\ \text{infinitely often}\Big) = 0.$$

Therefore, for almost all $\omega \in \Omega$, $|Q(P^{(n)}, \omega) - T| > \sqrt{2\delta_n}$ holds for only finitely many $n$, i.e.,

$$|Q(P^{(n)}, \omega) - T| \le \sqrt{2\delta_n}\quad \text{for all large } n.$$

By the assumption that $\delta_n \to 0$ as $n \to \infty$, we conclude that

$$|Q(P^{(n)}, \omega) - T| \to 0.$$

The proof is complete. ⌅
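Theorems 8.1-8.2 can be watched in action on a single simulated path: the quadratic variation over refining dyadic partitions of $[0,T]$ settles at $T$. A sketch with arbitrary grid depths:

```python
import numpy as np

rng = np.random.default_rng(8)

# One Brownian path on [0, T] sampled at 2^18 dyadic points.
T, n = 2.0, 18
dB = rng.normal(scale=np.sqrt(T / 2**n), size=2**n)
B = np.concatenate(([0.0], np.cumsum(dB)))

Q = []
for level in (6, 10, 14, 18):            # coarser partitions keep every 2^(n-level)-th point
    pts = B[:: 2 ** (n - level)]
    Q.append(np.sum(np.diff(pts) ** 2))

# At the finest level the mesh is 2^{-18} T and Q is tightly concentrated at T.
assert abs(Q[-1] - T) < 0.05
```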

Brownian motion is nowhere differentiable almost surely and does not have bounded variation. However, it turns out to have a finite quadratic variation limit. Using this quadratic variation property, we can go back and show that Brownian motion does not have bounded variation.

A Direct Proof of Theorem 7.6. Since Brownian motion is almost surely continuous, on the closed interval $[0,T]$, $B_t(\omega)$ is uniformly continuous for almost all $\omega \in \Omega$:

$$\max_k |B_{t_k}(\omega) - B_{t_{k-1}}(\omega)| \to 0,\quad \text{as } \max_k |t_k - t_{k-1}| \to 0.$$

Assume on the contrary that there exists $t > 0$ such that

$$\mathbb{P}(TV(B)[0,t] = \infty) < 1 \implies \mathbb{P}(TV(B)[0,t] < \infty) > 0.$$

Define the set $\Lambda = \{\omega \in \Omega : TV(B)[0,t] < \infty\}$. Then for any partition $P$, if $\omega \in \Lambda$,

$$\sum_k |B_{t_k}(\omega) - B_{t_{k-1}}(\omega)| \le TV(B_\cdot(\omega))[0,t] < \infty.$$

As a result, for $\omega \in \Lambda$, the quadratic variation converges to $0$ as $\|P\| \to 0$:

$$Q(P, \omega) = \sum_k (B_{t_k}(\omega) - B_{t_{k-1}}(\omega))^2 \le \max_k |B_{t_k}(\omega) - B_{t_{k-1}}(\omega)|\cdot\sum_k |B_{t_k}(\omega) - B_{t_{k-1}}(\omega)| \to 0,$$

by the uniform continuity above. Then $\mathbb{P}(\lim_{n\to\infty} Q(P^{(n)}) = 0) \ge \mathbb{P}(\Lambda) > 0$, where $\|P^{(n)}\| \to 0$. Choose some $\varepsilon \in (0,t)$. If $\omega \in \{\lim_{n\to\infty} Q(P^{(n)}) = 0\}$, there exists $n_0$ such that for $n \ge n_0$,

$$\omega \in \{|Q(P^{(n)}) - t| > \varepsilon\} \implies \lim_{n\to\infty}\mathbb{P}(|Q(P^{(n)}) - t| > \varepsilon) \ge \mathbb{P}(\Lambda) > 0,$$

which contradicts the fact that $Q(P^{(n)}) \to t$ in probability. ⌅

Chapter 9

Week 9

9.1. Tuesday

9.1.1. Introduction to Ito Calculus

Throughout this chapter, we consider a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$, i.e., for any $A \in \mathcal{F}$ with $\mathbb{P}(A) = 0$ and any $B \subseteq A$, we have $B \in \mathcal{F}$. Let $\{B_t\}_{t\ge 0}$ be a standard Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$ and $\{\mathcal{F}_t\}_{t\ge 0}$ the natural filtration, i.e., $\mathcal{F}_t \triangleq \sigma(\{B_u : u \le t\})$.

Suppose that $\{X_t\}_{t\ge 0}$ is an $\{\mathcal{F}_t\}_{t\ge 0}$-adapted stochastic process. One typical example of such a process is $X_t = f(B_t)$ for some Borel measurable function $f : \mathbb{R} \to \mathbb{R}$. In this chapter, we aim to define integrals of the following form:

$$\int_0^t X_s\,dB_s,\quad t \ge 0. \qquad (9.1)$$

A naive idea is to define this integral using the Riemann-sum approach:

$$\int_0^t X_s\,dB_s \triangleq \lim_{\|\pi\|\to 0}\sum_k X_{t_{k-1}}\cdot(B_{t_k} - B_{t_{k-1}}), \qquad (9.2)$$

where the limit is taken in the $L^2$ sense along partitions $\pi$ of the interval $[0,t]$, and $\|\pi\| \triangleq \max_k |t_k - t_{k-1}|$.

It is reasonable to study what the limit in (9.2) looks like by first considering simple stochastic processes $\{X_t\}_{t\ge 0}$, and then extending by an approximation procedure.

87
Definition 9.1 [Simple Stochastic Process]

1. Let L2 be the space of all adapted stochastic process { Xt }t 0 satisfying

Z T
E Xt2 dt < •, 8 T > 0.
0

2. An adapted stochastic process { Xt }t 0 2 L2 is called simple if for any w 2 W,

Xt ( w ) = Xt j ( w ), t 2 [t j , t j+1 ), j = 0, 1, . . . ,

where 0 = t0 < t1 < · · · < tn < · · · is an increasing sequence with limn!• tn = •.

Denote L20 be the class of all simple processes.

Remark. Not every piecewise-constant process is a simple process. Consider an $\{\mathcal{F}_t\}_{t\ge 0}$-adapted stochastic process $\{X_t\}_{t\ge 0}$ and define
$$Y_t(\omega) = X_{t_{j+1}}(\omega), \quad t \in (t_j, t_{j+1}].$$
Then $Y_t$ is not $\mathcal{F}_t$-measurable, and thus $\{Y_t\}_{t\ge 0}$ is not an $\{\mathcal{F}_t\}_{t\ge 0}$-adapted process.

Definition 9.2 [Ito Integral for Simple Stochastic Process] Suppose that $\{X_t\}_{t\ge 0}$ is a simple process. Given $T > 0$, define, for each $\omega \in \Omega$,
$$\int_0^T X_t(\omega)\, dB_t(\omega) = \sum_{k=0}^{n-1} X_{t_k}(\omega) \cdot \big[B_{t_{k+1}}(\omega) - B_{t_k}(\omega)\big] + X_{t_n}(\omega) \cdot \big[B_T(\omega) - B_{t_n}(\omega)\big], \tag{9.3}$$
where $n := \max\{j \in \mathbb{N} : t_j \le T\}$. ∎
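The sum in (9.3) can be sketched in code. The helper below (hypothetical names, not from the notes) evaluates the Ito integral of a simple process along one sample path, given the partition, the constant values on each subinterval, and a callable returning the path's value at a time.

```python
def ito_integral_simple(X_vals, t_grid, B, T):
    """Evaluate the sum in (9.3) along one sample path.

    X_vals[j] is the constant value of the simple process on [t_j, t_{j+1}),
    t_grid is the increasing sequence 0 = t_0 < t_1 < ..., and B is a callable
    giving the path's value at a time.  (Illustrative helper, not the notes'.)
    """
    n = max(j for j, tj in enumerate(t_grid) if tj <= T)   # n = max{j : t_j <= T}
    total = 0.0
    for k in range(n):                                     # full subintervals
        total += X_vals[k] * (B(t_grid[k + 1]) - B(t_grid[k]))
    total += X_vals[n] * (B(T) - B(t_grid[n]))             # last piece [t_n, T]
    return total
```

As a hand-checkable sanity test, with the deterministic "path" $B(t) = t$ (not Brownian, of course), `X_vals = [1, 2, 3]`, `t_grid = [0, 1, 2, 3]`, and $T = 2.5$, the sum is $1\cdot 1 + 2\cdot 1 + 3\cdot 0.5 = 4.5$.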

Proposition 9.1 The Ito integral for simple stochastic processes admits the following properties:

1. Linearity: suppose that $\{X_t\}_{t\ge 0}, \{Y_t\}_{t\ge 0} \in \mathcal{L}_0^2$ and $\alpha, \beta \in \mathbb{R}$; then
$$\int_0^T (\alpha X_t + \beta Y_t)\, dB_t = \alpha \int_0^T X_t\, dB_t + \beta \int_0^T Y_t\, dB_t.$$
2. Ito isometry: for $\{X_t\}_{t\ge 0} \in \mathcal{L}_0^2$,
$$\mathbb{E}\left[\left(\int_0^T X_t\, dB_t\right)^2\right] = \mathbb{E}\left[\int_0^T X_t^2\, dt\right].$$
3. Define the random variable $I_t[X] := \int_0^t X_u\, dB_u$; then the process $\{I_t[X]\}_{t\ge 0}$ is an almost surely continuous martingale, with square integrability:
$$\mathbb{E}\big[(I_t[X])^2\big] < \infty, \quad \forall t \ge 0.$$

Proof for Part 1). Denote by $\{t_k^{(1)}\}_{k\ge 0}$ and $\{t_k^{(2)}\}_{k\ge 0}$ the partitions corresponding to the simple processes $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$. Consider the partition $\{t_k\}_{k\ge 0}$ obtained as the union of these two partitions. With respect to this new sequence $\{t_k\}$, the processes $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$ are still simple, and $\{\alpha X_t + \beta Y_t\}$ is also a simple process corresponding to $\{t_k\}$. The linearity property follows by checking the definition in (9.3). ∎

Proof for Part 2). The left-hand side can be expanded as follows:
$$\begin{aligned}
\mathbb{E}\left[\left(\int_0^T X_t\, dB_t\right)^2\right] &= \mathbb{E}\left[\left(\sum_{k=0}^{n-1} X_{t_k}\,[B_{t_{k+1}} - B_{t_k}] + X_{t_n}\,[B_T - B_{t_n}]\right)^2\right]\\
&= \sum_{0\le k_1,k_2\le n-1}\mathbb{E}\left[X_{t_{k_1}}X_{t_{k_2}}(B_{t_{k_1+1}} - B_{t_{k_1}})(B_{t_{k_2+1}} - B_{t_{k_2}})\right]\\
&\quad + \mathbb{E}\left[X_{t_n}^2\,(B_T - B_{t_n})^2\right] + 2\sum_{k=0}^{n-1}\mathbb{E}\left[X_{t_k}X_{t_n}(B_{t_{k+1}} - B_{t_k})(B_T - B_{t_n})\right],
\end{aligned}$$
where the second term equals $\mathbb{E}[X_{t_n}^2(T - t_n)]$ by the independent increments of Brownian motion, and the third term vanishes: since $X_{t_k}$, $X_{t_n}$, and $B_{t_{k+1}} - B_{t_k}$ (for $k \le n-1$) are all $\mathcal{F}_{t_n}$-measurable,
$$\mathbb{E}\left[X_{t_k}X_{t_n}(B_{t_{k+1}} - B_{t_k})(B_T - B_{t_n})\right] = \mathbb{E}\left[X_{t_k}X_{t_n}(B_{t_{k+1}} - B_{t_k})\,\mathbb{E}[B_T - B_{t_n}\mid\mathcal{F}_{t_n}]\right] = 0.$$
Following the same trick, for $0 \le k_1 < k_2 \le n-1$ we have
$$\mathbb{E}\left[X_{t_{k_1}}X_{t_{k_2}}(B_{t_{k_1+1}} - B_{t_{k_1}})(B_{t_{k_2+1}} - B_{t_{k_2}})\right] = \mathbb{E}\left[X_{t_{k_1}}(B_{t_{k_1+1}} - B_{t_{k_1}})\,X_{t_{k_2}}\,\mathbb{E}[B_{t_{k_2+1}} - B_{t_{k_2}}\mid\mathcal{F}_{t_{k_2}}]\right] = 0,$$
because all random variables except $B_{t_{k_2+1}}$ are $\mathcal{F}_{t_{k_2}}$-measurable, and the increment $B_{t_{k_2+1}} - B_{t_{k_2}}$ is independent of $\mathcal{F}_{t_{k_2}}$ with mean zero. When $k_1 = k_2 \equiv k$, we have
$$\mathbb{E}\left[X_{t_k}^2(B_{t_{k+1}} - B_{t_k})^2\right] = \mathbb{E}\left[X_{t_k}^2\,\mathbb{E}[(B_{t_{k+1}} - B_{t_k})^2\mid\mathcal{F}_{t_k}]\right] = \mathbb{E}\left[X_{t_k}^2\,(t_{k+1} - t_k)\right].$$
Therefore,
$$\mathbb{E}\left[\left(\int_0^T X_t\, dB_t\right)^2\right] = \sum_{k=0}^{n-1}\mathbb{E}\left[X_{t_k}^2(t_{k+1} - t_k)\right] + \mathbb{E}\left[X_{t_n}^2(T - t_n)\right] = \mathbb{E}\left[\sum_{k=0}^{n-1}X_{t_k}^2(t_{k+1} - t_k) + X_{t_n}^2(T - t_n)\right] = \mathbb{E}\left[\int_0^T X_t^2\,dt\right]. \quad ∎$$
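The Ito isometry can be checked by simulation. The sketch below (grid sizes and the choice $X_t = B_{t_k}$ on $[t_k, t_{k+1})$ are illustrative assumptions) compares Monte Carlo estimates of both sides of the isometry; for this simple process both should be close to $T^2/2$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 128, 50_000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B_left = np.cumsum(dB, axis=1) - dB              # B at left endpoints t_k (B_{t_0} = 0)
lhs = ((B_left * dB).sum(axis=1) ** 2).mean()    # E[(I_T[X])^2]
rhs = ((B_left**2).sum(axis=1) * dt).mean()      # E[int_0^T X_t^2 dt]
print(lhs, rhs)                                  # both close to T^2/2
```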

Proof of Part 3). Since $\{B_t\}_{t\ge 0}$ is almost surely continuous, by definition $I_t[X]$ is also a.s. continuous. For the square integrability, by the Ito isometry property,
$$\mathbb{E}\big[(I_t[X])^2\big] = \mathbb{E}\left[\left(\int_0^t X_u\,dB_u\right)^2\right] = \mathbb{E}\left[\int_0^t X_u^2\,du\right].$$
Since $\{X_t\}_{t\ge 0} \in \mathcal{L}_0^2$, we get $\mathbb{E}[(I_t[X])^2] < \infty$ for all $t \ge 0$.

Now we show that $\{I_t[X]\}_{t\ge 0}$ is a martingale with respect to $\{\mathcal{F}_t\}_{t\ge 0}$. For any $0 \le s < t$, take $n_0 := \max\{j \in \mathbb{N} : t_j \le s\}$; then
$$\mathbb{E}[I_t[X] \mid \mathcal{F}_s] = \mathbb{E}\left[\sum_{k=0}^{n-1} X_{t_k}(B_{t_{k+1}} - B_{t_k}) + X_{t_n}(B_t - B_{t_n}) \,\Big|\, \mathcal{F}_s\right].$$
Splitting the summation at the index $n_0$, we further have
$$\begin{aligned}
\mathbb{E}[I_t[X] \mid \mathcal{F}_s] &= \mathbb{E}\left[\sum_{k=0}^{n_0-1} X_{t_k}(B_{t_{k+1}} - B_{t_k}) + X_{t_{n_0}}(B_s - B_{t_{n_0}}) \,\Big|\, \mathcal{F}_s\right] + \mathbb{E}\left[X_{t_{n_0}}(B_{t_{n_0+1}} - B_s) \mid \mathcal{F}_s\right]\\
&\quad + \mathbb{E}\left[\sum_{k=n_0+1}^{n-1} X_{t_k}(B_{t_{k+1}} - B_{t_k}) + X_{t_n}(B_t - B_{t_n}) \,\Big|\, \mathcal{F}_s\right],
\end{aligned}$$
where the first term is $\mathcal{F}_s$-measurable and hence equals
$$\sum_{k=0}^{n_0-1} X_{t_k}(B_{t_{k+1}} - B_{t_k}) + X_{t_{n_0}}(B_s - B_{t_{n_0}}),$$
the second term equals
$$X_{t_{n_0}}\,\mathbb{E}\big[B_{t_{n_0+1}} - B_s \mid \mathcal{F}_s\big] = 0,$$
and the third term equals, by the tower property,
$$\sum_{k=n_0+1}^{n-1}\mathbb{E}\big[X_{t_k}\,\mathbb{E}[B_{t_{k+1}} - B_{t_k}\mid\mathcal{F}_{t_k}]\,\big|\,\mathcal{F}_s\big] + \mathbb{E}\big[X_{t_n}\,\mathbb{E}[B_t - B_{t_n}\mid\mathcal{F}_{t_n}]\,\big|\,\mathcal{F}_s\big] = 0.$$
As a result,
$$\mathbb{E}[I_t[X]\mid\mathcal{F}_s] = \sum_{k=0}^{n_0-1}X_{t_k}(B_{t_{k+1}} - B_{t_k}) + X_{t_{n_0}}(B_s - B_{t_{n_0}}) = \int_0^s X_u\,dB_u = I_s[X].$$
Since $\{I_t[X]\}$ is square integrable, it is $L^1$-integrable. Therefore $\{I_t[X]\}$ is a martingale. The proof is completed. ∎

9.2. Thursday

9.2.1. Approximation by simple processes

Remark. Before defining the Ito integral for a general adapted process $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$, we show that any such $\{X_t\}_{t\ge 0}$ can be approximated by a sequence of simple processes $\{X_t^{(n)}\}_{t\ge 0} \in \mathcal{L}_0^2$, $n = 1, 2, \ldots$.

Theorem 9.1
1. Let $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$ be an almost surely bounded and continuous process, i.e.,
$$\mathbb{P}\left(\left\{\omega \in \Omega : \sup_{t\ge 0}|X_t(\omega)| \le M \text{ and } X_t(\omega) \text{ is continuous in } t \ge 0\right\}\right) = 1.$$
Then for given $T > 0$, there exists a sequence of simple processes $\{X_t^{(n)}\}_{t\ge 0} \in \mathcal{L}_0^2$ such that
$$\lim_{n\to\infty}\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] = 0.$$
2. Let $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$ be an almost surely bounded (but not necessarily continuous) process. Then for given $T > 0$, there exists a sequence of almost surely bounded and continuous processes $\{X_t^{(n)}\}_{t\ge 0} \in \mathcal{L}^2$, $n \ge 1$, such that
$$\lim_{n\to\infty}\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] = 0.$$
3. Let $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$. Then for given $T > 0$, there exists a sequence of almost surely bounded processes $\{X_t^{(n)}\}_{t\ge 0} \in \mathcal{L}^2$, $n \ge 1$, such that
$$\lim_{n\to\infty}\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] = 0.$$

Proof for Part 1). Construct $\{X_t^{(n)}\}_{t\ge 0}$ as follows. Pick a sequence of partitions $\{P^{(n)}\}$ of the interval $[0,T]$ with $\|P^{(n)}\| \to 0$, where $P^{(n)} := \{0 = t_0^{(n)} < t_1^{(n)} < \cdots < T\}$. For each $n$, define the stochastic process $\{X_t^{(n)}\}$ by
$$X_t^{(n)}(\omega) = X_{t_j^{(n)}}(\omega), \quad \text{for } t \in [t_j^{(n)}, t_{j+1}^{(n)}).$$
It is clear that $\{X_t^{(n)}\}_{t\ge 0}$ is a simple process. Define the set
$$\Lambda = \left\{\omega \in \Omega : \sup_{t\ge 0}|X_t(\omega)| \le M \text{ and } X_t(\omega)\text{ is continuous in } t \ge 0\right\}.$$
For each $\omega \in \Lambda$, $X_t(\omega)$ is continuous (and thus uniformly continuous) on $[0,T]$: for $\varepsilon > 0$ there exists $\delta > 0$ such that
$$|X_s(\omega) - X_t(\omega)| < \sqrt{\frac{\varepsilon}{T}}, \quad \text{for any } |s - t| < \delta.$$
Choose $n$ large enough that $\|P^{(n)}\| < \delta$, which implies
$$\forall t \in [0,T],\quad |X_t^{(n)}(\omega) - X_t(\omega)| < \sqrt{\frac{\varepsilon}{T}} \implies \int_0^T\big[X_t^{(n)}(\omega) - X_t(\omega)\big]^2\,dt < \varepsilon.$$
Therefore the random variable $\int_0^T[X_t^{(n)} - X_t]^2\,dt \to 0$ almost surely. For $\omega \in \Lambda$, $\{X_t(\omega)\}$ is bounded by $M$, and thus $\{X_t^{(n)}(\omega)\}$ is bounded by $M$ as well, so the random variable $\int_0^T[X_t^{(n)} - X_t]^2\,dt$ is upper bounded by $(2M)^2\cdot T$ almost surely. By the bounded convergence theorem,
$$\lim_{n\to\infty}\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] = 0.$$
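The construction of part 1) is easy to visualize numerically. The sketch below (an illustration, not the notes' construction; the bounded continuous process $X_t = \sin(B_t)$ and all grid sizes are assumptions) freezes the process at the left endpoints of coarser partitions and estimates the $L^2$ error, which shrinks with the mesh.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_fine, n_paths = 1.0, 4096, 2_000
dt = T / n_fine
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_fine)), axis=1)
X = np.sin(B)                                   # bounded, continuous process
for step in (64, 16, 4):                        # partition mesh = step * dt
    Xn = np.repeat(X[:, ::step], step, axis=1)  # freeze X at left endpoints
    err = (((X - Xn) ** 2).sum(axis=1) * dt).mean()   # E int_0^T (X^(n)-X)^2 dt
    print(f"mesh={step * dt:.1e}  error={err:.2e}")
```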

Proof for Part 2). Construct $\{X_t^{(n)}\}_{t\ge 0}$ as follows. For each $n$, pick a non-negative continuous function $\varphi_n : \mathbb{R} \to \mathbb{R}$ satisfying

1. $\varphi_n(x) = 0$ for $x \in (-\infty, -\tfrac{1}{n}] \cup [0, \infty)$;
2. $\int_{-\infty}^{\infty}\varphi_n(x)\,dx = 1$.

Then define the process $\{X_t^{(n)}\}_{t\ge 0}$ by
$$X_t^{(n)}(\omega) \equiv (\varphi_n * X_\cdot(\omega))\big|_0^t := \int_0^t \varphi_n(s - t)\,X_s(\omega)\,ds, \quad \forall \omega \in \Omega.$$
Define the set
$$\Lambda = \left\{\omega \in \Omega : \sup_{t\ge 0}|X_t(\omega)| \le M\right\}.$$
Then for each $\omega \in \Lambda$,
$$|X_t^{(n)}(\omega)| \le \int_0^t \varphi_n(s-t)|X_s(\omega)|\,ds \le M\cdot\int_0^t\varphi_n(s-t)\,ds \le M, \quad \forall t \ge 0,\ \forall n.$$
Therefore the process $\{X_t^{(n)}\}_{t\ge 0}$ is almost surely bounded. Moreover, by definition, $\{X_t^{(n)}\}_{t\ge 0}$ is a.s. continuous and $\{\mathcal{F}_t\}$-adapted: $\varphi_n(s-t)$ vanishes unless $s \in (t - \tfrac1n, t)$, so the convolution only uses the path up to time $t$.

Now we show that $\int_0^T[X_t^{(n)} - X_t]^2\,dt \to 0$ almost surely. Take $\omega \in \Lambda$; then
$$\int_0^T\big[X_t^{(n)}(\omega) - X_t(\omega)\big]^2\,dt \le 2M\cdot\int_0^T|X_t^{(n)}(\omega) - X_t(\omega)|\,dt,$$
where the integral on the RHS can be upper bounded as follows (with the convention $X_u \equiv 0$ for $u < 0$):
$$\begin{aligned}
\int_0^T|X_t^{(n)}(\omega) - X_t(\omega)|\,dt &= \int_0^T\left|\int_0^t\varphi_n(s-t)X_s(\omega)\,ds - X_t(\omega)\right|dt \quad &(9.4a)\\
&= \int_0^T\left|\int_0^{\infty}\varphi_n(-s)X_{t-s}(\omega)\,ds - X_t(\omega)\right|dt \quad &(9.4b)\\
&= \int_0^T\left|\int_0^{\infty}\varphi_n(-s)X_{t-s}(\omega)\,ds - \int_0^{\infty}\varphi_n(-s)X_t(\omega)\,ds\right|dt \quad &(9.4c)\\
&\le \int_0^T\int_0^{\infty}\varphi_n(-s)\,|X_t(\omega) - X_{t-s}(\omega)|\,ds\,dt \quad &(9.4d)\\
&= \int_0^{\infty}\varphi_n(-s)\int_0^T|X_t(\omega) - X_{t-s}(\omega)|\,dt\,ds, \quad &(9.4e)
\end{aligned}$$
where (9.4b) is by the change of variable $s' = t - s$; (9.4c) is because $\int_0^{\infty}\varphi_n(-s)\,ds = \int_{-\infty}^0\varphi_n(s)\,ds = 1$; and (9.4e) is by Fubini's theorem. We claim that the term $\int_0^T|X_t(\omega) - X_{t-s}(\omega)|\,dt$ is small when $s$ is small, i.e., for any $\varepsilon > 0$ there exists $\delta$ such that when $s < \delta$,
$$\int_0^T|X_t(\omega) - X_{t-s}(\omega)|\,dt < \varepsilon. \tag{9.5}$$
We can then apply (9.5) to upper bound the term (9.4e):
$$\begin{aligned}
\int_0^{\infty}\varphi_n(-s)\int_0^T|X_t(\omega) - X_{t-s}(\omega)|\,dt\,ds &= \int_0^{\delta}\varphi_n(-s)\int_0^T|X_t(\omega) - X_{t-s}(\omega)|\,dt\,ds + \int_{\delta}^{\infty}\varphi_n(-s)\int_0^T|X_t(\omega) - X_{t-s}(\omega)|\,dt\,ds\\
&\le \varepsilon\int_0^{\delta}\varphi_n(-s)\,ds + 2MT\cdot\int_{\delta}^{\infty}\varphi_n(-s)\,ds = \varepsilon,
\end{aligned}$$
where the last equality holds when we choose $n$ large enough that $\tfrac1n \le \delta$: then $\varphi_n(-s) = 0$ for $s \ge \delta$, so the second integral vanishes and the first equals $\varepsilon$. Thus
$$\int_0^T|X_t^{(n)}(\omega) - X_t(\omega)|\,dt \to 0 \quad \text{for } \omega \in \Lambda.$$
The remaining part follows the same logic as in part 1).
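The mollification step can be sketched numerically. Below, a discrete stand-in for $\varphi_n$ is a uniform density on the trailing window $(t - \tfrac1n, t]$ (one admissible choice, an assumption of this sketch), and $X$ is a deterministic step function with one jump; the trailing average is continuous, uses only past values, and its $L^2$ distance to $X$ shrinks with the window.

```python
import numpy as np

n_grid = 4096
dt = 1.0 / n_grid
t = np.arange(n_grid) * dt
X = (t < 0.5).astype(float)               # bounded, one jump at t = 0.5
for width in (256, 64, 16):               # window length 1/n = width * dt
    kernel = np.ones(width) / width       # uniform phi_n with unit mass
    Xn = np.convolve(X, kernel)[:n_grid]  # trailing average: only X_s, s <= t
    err = ((X - Xn) ** 2).sum() * dt      # int_0^1 (X^(n) - X)^2 dt
    print(f"window={width * dt:.2e}  error={err:.4f}")
```

For this step function the error works out to roughly $\tfrac{2}{3}$ of the window length, so halving the window halves the error.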

Finally, we show the claim in (9.5) by discussing the cases of continuous and discontinuous $X_t$:

- When $X_t$ is continuous in $t \in [0,T]$: since $X_t$ is continuous (and thus uniformly continuous) on $[0,T]$, for any $\varepsilon > 0$ there exists $\delta < \frac{\varepsilon}{2M}$ such that for any $s < \delta$,
$$|X_t - X_{t-s}| < \frac{\varepsilon}{2T}, \quad \forall t \in [s, T].$$
As a result (using the convention $X_{t-s} = 0$ for $t < s$, so the integrand is bounded by $M$ there),
$$\int_0^T|X_t - X_{t-s}|\,dt = \int_0^s|X_t - X_{t-s}|\,dt + \int_s^T|X_t - X_{t-s}|\,dt \le M\delta + T\cdot\frac{\varepsilon}{2T} < \varepsilon.$$
- When $X_t$ is not continuous in $t \in [0,T]$, the same bound still holds: because the continuous functions are dense in $L^p$ ($1 \le p < \infty$), for any $\varepsilon > 0$ there exists a continuous function $\hat X_t$ such that
$$\int_0^T|X_t - \hat X_t|\,dt < \frac{\varepsilon}{3}. \tag{9.7}$$
As a result,
$$\int_0^T|X_t - X_{t-s}|\,dt \le \int_0^T|X_t - \hat X_t|\,dt + \int_0^T|X_{t-s} - \hat X_{t-s}|\,dt + \int_0^T|\hat X_{t-s} - \hat X_t|\,dt < \varepsilon,$$
where the first two terms are each bounded by $\varepsilon/3$ because of (9.7), and the last term is also bounded by $\varepsilon/3$ for $s$ small enough, since $\hat X_t$ is (uniformly) continuous on $[0,T]$.

Proof of Part 3). For each $n$, we construct the almost surely bounded process $\{X_t^{(n)}\}_{t\ge 0}$ using the truncation method:
$$X_t^{(n)}(\omega) = \begin{cases} n, & \text{if } X_t(\omega) \ge n,\\ X_t(\omega), & \text{if } -n < X_t(\omega) < n,\\ -n, & \text{if } X_t(\omega) \le -n.\end{cases}$$
Therefore $|X_t^{(n)}(\omega)| \le |X_t(\omega)|$ for all $\omega \in \Omega$. Together with the inequality $(a+b)^2 \le 2a^2 + 2b^2$, we have
$$\int_0^T(X_t^{(n)} - X_t)^2\,dt \le 2\int_0^T(X_t^{(n)})^2\,dt + 2\int_0^T X_t^2\,dt \le 4\int_0^T X_t^2\,dt < \infty \quad \text{a.s.}$$
Therefore $\int_0^T(X_t^{(n)} - X_t)^2\,dt$ is dominated by an integrable random variable. Moreover, substituting the form of $X_t^{(n)}$ gives $(X_t^{(n)} - X_t)^2 \le X_t^2\,\mathbf{1}\{|X_t| \ge n\}$, and since $\int_0^T X_t^2\,dt < \infty$ a.s.,
$$\int_0^T(X_t^{(n)} - X_t)^2\,dt \le \int_0^T X_t^2\,\mathbf{1}\{|X_t| \ge n\}\,dt \to 0, \quad \text{as } n \to \infty.$$
Applying the dominated convergence theorem gives the desired result. ∎
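The truncation bound is straightforward to see numerically. In the sketch below (the heavy-tailed sample and the truncation levels are arbitrary assumptions), the squared truncation error is always dominated by the tail term $\mathbb{E}[X^2\mathbf{1}\{|X|\ge n\}]$ and both shrink as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_t(df=5, size=100_000)        # unbounded sample, finite variance
for n in (1, 2, 4, 8):
    Xn = np.clip(X, -n, n)                    # truncated version X^(n)
    err = ((X - Xn) ** 2).mean()
    tail = (X**2 * (np.abs(X) >= n)).mean()   # dominating tail term
    print(f"n={n}  error={err:.4f}  tail bound={tail:.4f}")
```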

Combining parts 1) to 3) of Theorem 9.1, we conclude that for $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$, there exists a sequence of simple processes $\{X_t^{(n)}\}_{t\ge 0} \in \mathcal{L}_0^2$ such that
$$\lim_{n\to\infty}\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] = 0.$$

Chapter 10

Week 10

10.1. Tuesday
Recall property 3) in Proposition 9.1, where we showed that when $\{X_u\}_{u\ge 0}$ is a simple process, $\left\{\int_0^t X_u\,dB_u\right\}_{t\ge 0}$ is a square integrable, almost surely continuous martingale. In this lecture, we first review some basic ideas about square integrability.

10.1.1. Square Integrable Process


Definition 10.1 [Square Integrable Martingales]

1. A stochastic process $\{X_t\}_{t\ge 0}$ is said to be square integrable if $\mathbb{E}[X_t^2] < \infty$ for all $t \ge 0$.
2. Let $\mathcal{U}^2$ be the class of square integrable, right-continuous martingales with left limits.
3. Let $\mathcal{U}_c^2$ be the class of square integrable, almost surely continuous martingales.

In particular, we denote by $L^2$ the set of square integrable random variables.

Definition 10.2 [Norm on $\mathcal{U}^2$] For given $T > 0$, define a norm $\|\cdot\|$ on $\mathcal{U}^2$:
$$\|X\| := \left(\mathbb{E}[X_T^2]\right)^{1/2}, \quad \{X_t\}_{t\ge 0} \in \mathcal{U}^2.$$

Theorem 10.1 — Completeness of Square Integrable Martingales. With respect to the norm $\|\cdot\|$:

1. $\mathcal{U}^2$ is a complete metric space;
2. $\mathcal{U}_c^2$ is a closed subspace of $\mathcal{U}^2$.

Proof. 1. It is easy to see that $(\mathcal{U}^2, \|\cdot\|)$ is a metric space. To show completeness, it suffices to show that for any Cauchy sequence $\{X_t^{(n)}\}_{t\ge 0} \in \mathcal{U}^2$, $n = 1, 2, \ldots$, there exists a process $\{X_t\}_{t\ge 0} \in \mathcal{U}^2$ such that
$$\|X^{(n)} - X\| \to 0, \quad \text{as } n \to \infty.$$
We first construct such an $\{X_t\}_{t\ge 0}$:
- Since $\{X^{(n)}\}_n$ is a Cauchy sequence, for any $\varepsilon > 0$ there exists $N$ such that $\|X^{(m)} - X^{(n)}\| < \varepsilon$ whenever $m, n > N$. Since $\{X_t^{(m)} - X_t^{(n)}\}_{t\ge 0}$ is a martingale, by convexity of the quadratic function, $\{(X_t^{(m)} - X_t^{(n)})^2\}_{t\ge 0}$ is a sub-martingale, which implies
$$\forall t \le T,\quad \mathbb{E}\big[(X_t^{(m)} - X_t^{(n)})^2\big] \le \mathbb{E}\big[(X_T^{(m)} - X_T^{(n)})^2\big] = \|X^{(m)} - X^{(n)}\|^2 < \varepsilon^2.$$
This means that for fixed $t \le T$, $\{X_t^{(n)}\}_n$ is a Cauchy sequence in $L^2$, with the metric $\|X\|_{L^2} := (\mathbb{E}[X^2])^{1/2}$. By the completeness of $L^2$, there exists a random variable $X_t \in L^2$ such that $X_t^{(n)} \xrightarrow{L^2} X_t$ as $n \to \infty$.

Next, we show that $\{X_t\}_{t\ge 0} \in \mathcal{U}^2$:

(a) To show that $\{X_t\}_{t\ge 0}$ is $\{\mathcal{F}_t\}$-adapted, we need the almost sure convergence of some process to $\{X_t\}$, as the definition of almost sure convergence is sample-path based. The details are as follows. The $L^2$ convergence implies $X_t^{(n)} \xrightarrow{\mathbb{P}} X_t$, and thus there exists a subsequence $\{X_t^{(n_k)}\}_k$ such that
$$X_t^{(n_k)} \xrightarrow{\text{a.s.}} X_t \quad \text{as } k \to \infty.$$

Then define the sample-path-based set
$$\Lambda = \left\{\omega \in \Omega : \lim_{k\to\infty}X_t^{(n_k)}(\omega) = X_t(\omega)\right\}.$$
Re-define $X_t(\omega) = 0$ for $\omega \in \Lambda^c$. Note that $\Lambda^c \in \mathcal{F}_t$ since we assume $\{\mathcal{F}_t\}$ satisfies the usual conditions (see Definition 5.4); in particular $\Lambda \in \mathcal{F}_t$ as well. It follows that for $a < 0$,
$$\{\omega \in \Omega : X_t(\omega) \le a\} = \Lambda \cap \bigcap_j\bigcup_m\bigcap_{k>m}\left\{\omega \in \Omega : X_t^{(n_k)}(\omega) < a + \frac{1}{j}\right\} \in \mathcal{F}_t.$$
For $a \ge 0$, since $X_t = 0 \le a$ on $\Lambda^c$,
$$\{\omega \in \Omega : X_t(\omega) \le a\} = \Lambda^c \cup \left[\Lambda \cap \bigcap_j\bigcup_m\bigcap_{k>m}\left\{\omega \in \Omega : X_t^{(n_k)}(\omega) < a + \frac{1}{j}\right\}\right] \in \mathcal{F}_t.$$
Hence $X_t$ is $\mathcal{F}_t$-measurable.

(b) The integrability of $X_t$ holds because $X_t \in L^2$. We then show that $\{X_t\}_{t\ge 0}$ satisfies the martingale property, i.e., for fixed $0 \le s < t$, we need to show $\int_A X_t\,d\mathbb{P} = \int_A X_s\,d\mathbb{P}$ for all $A \in \mathcal{F}_s$. Direct calculation, together with the martingale property of $\{X_t^{(n)}\}_{t\ge 0}$, gives
$$\int_A X_t\,d\mathbb{P} - \int_A X_s\,d\mathbb{P} = \int_A (X_t - X_t^{(n)})\,d\mathbb{P} - \int_A (X_s - X_s^{(n)})\,d\mathbb{P}, \quad \forall n.$$
By the $L^2$ convergence of $X_t^{(n)}$,
$$\int_A |X_t - X_t^{(n)}|\,d\mathbb{P} \le \mathbb{E}\big[|X_t - X_t^{(n)}|\big] \le \left(\mathbb{E}\big[(X_t - X_t^{(n)})^2\big]\right)^{1/2} \to 0, \quad n \to \infty.$$
Similarly, $\int_A|X_s - X_s^{(n)}|\,d\mathbb{P} \to 0$. The martingale property of $\{X_t\}_{t\ge 0}$ follows since
$$\left|\int_A X_t\,d\mathbb{P} - \int_A X_s\,d\mathbb{P}\right| = \lim_{n\to\infty}\left|\int_A (X_t - X_t^{(n)})\,d\mathbb{P} - \int_A (X_s - X_s^{(n)})\,d\mathbb{P}\right| \le \lim_{n\to\infty}\left[\int_A|X_t - X_t^{(n)}|\,d\mathbb{P} + \int_A|X_s - X_s^{(n)}|\,d\mathbb{P}\right] = 0.$$
(c) To make $\{X_t\}_{t\ge 0}$ right-continuous with left limits, apply part 1) of Theorem 5.3.

2. Consider a sequence $\{X_t^{(n)}\}_t \in \mathcal{U}_c^2$, $n = 1, 2, \ldots$. By the result in part 1), there exists a limit $\{X_t\}_t \in \mathcal{U}^2$. It suffices to show that $\{X_t\}_t \in \mathcal{U}_c^2$. To obtain the continuity, we first construct an (almost surely) uniformly convergent sequence in $\mathcal{U}_c^2$ with limit $\{X_t\}_t$.
- Note that $\{X_t^{(n)} - X_t\}_t$ is a martingale for each $n$; then by Doob's inequality,
$$\mathbb{P}\left(\sup_{t\le T}|X_t^{(n)} - X_t| > \varepsilon\right) \le \frac{1}{\varepsilon^2}\,\mathbb{E}\big[(X_T^{(n)} - X_T)^2\big] \to 0, \quad \text{as } n \to \infty.$$
Since $\sup_{t\le T}|X_t^{(n)} - X_t| \xrightarrow{\mathbb{P}} 0$, there exists a subsequence $\{\sup_{t\le T}|X_t^{(n_k)} - X_t|\}_k$ that converges to $0$ almost surely. Define
$$\Omega_1 = \bigcap_k\left\{\omega \in \Omega : X_t^{(n_k)}(\omega)\text{ is continuous}\right\}, \quad \Omega_2 = \left\{\omega \in \Omega : \lim_{k\to\infty}\sup_{t\le T}|X_t^{(n_k)} - X_t| = 0\right\},$$
and $\Omega^* = \Omega_1 \cap \Omega_2$, so that $\mathbb{P}(\Omega^*) = 1$. Take $\omega \in \Omega^*$; then for any $\varepsilon > 0$ there exists $N > 0$ such that whenever $k > N$ and $t \le T$,
$$|X_t^{(n_k)}(\omega) - X_t(\omega)| \le \sup_{t\le T}|X_t^{(n_k)}(\omega) - X_t(\omega)| < \frac{\varepsilon}{3}.$$
In other words, $X_t^{(n_k)}(\omega)$ converges uniformly to $X_t(\omega)$ for $\omega \in \Omega^*$.

Fix such a $k > N$. The continuity of $X_t^{(n_k)}(\omega)$ (and hence uniform continuity on $[0,T]$) implies there exists $\delta$ such that whenever $h < \delta$,
$$|X_{t+h}^{(n_k)}(\omega) - X_t^{(n_k)}(\omega)| < \frac{\varepsilon}{3}, \quad \forall t \le T.$$
Hence we conclude the continuity of $X_t(\omega)$ for $\omega \in \Omega^*$:
$$|X_{t+h}(\omega) - X_t(\omega)| \le |X_{t+h}(\omega) - X_{t+h}^{(n_k)}(\omega)| + |X_{t+h}^{(n_k)}(\omega) - X_t^{(n_k)}(\omega)| + |X_t^{(n_k)}(\omega) - X_t(\omega)| < \varepsilon.$$
The proof is completed. ∎
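The Doob inequality used in part 2) is easy to probe numerically. The sketch below (arbitrary sizes; Brownian motion stands in as a square integrable continuous martingale) estimates $\mathbb{P}(\sup_{t\le T}|B_t| > \varepsilon)$ and the bound $\mathbb{E}[B_T^2]/\varepsilon^2$, and the empirical probability indeed stays below the bound.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n_steps, n_paths, eps = 1.0, 256, 20_000, 1.5
B = np.cumsum(rng.normal(0.0, np.sqrt(T / n_steps),
                         size=(n_paths, n_steps)), axis=1)
lhs = (np.abs(B).max(axis=1) > eps).mean()    # P(sup_{t<=T} |B_t| > eps)
rhs = (B[:, -1] ** 2).mean() / eps**2         # E[B_T^2] / eps^2
print(lhs, rhs)                               # lhs stays below rhs
```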

10.2. Thursday

10.2.1. Introduction to Ito Integral


By the conclusion of Section 9.2.1, for any process $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$ and fixed $T > 0$, there exists a sequence of simple processes $\{X_t^{(n)}\}_{t\ge 0}$, $n = 1, 2, \ldots$ such that
$$\lim_{n\to\infty}\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] = 0. \tag{10.1}$$
The Ito integral of a simple process $\{\tilde X_t\}_{t\ge 0}$ is well-defined as in (9.3):
$$\int_0^T \tilde X_t(\omega)\,dB_t(\omega) = \sum_{k=0}^{n-1}\tilde X_{t_k}(\omega)\cdot\big[B_{t_{k+1}}(\omega) - B_{t_k}(\omega)\big] + \tilde X_{t_n}(\omega)\cdot\big[B_T(\omega) - B_{t_n}(\omega)\big].$$
Denote $I_t(\tilde X) := \left\{\int_0^t \tilde X_s(\omega)\,dB_s(\omega)\right\}_{\omega\in\Omega}$.

Theorem 10.2 Given a process $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$ and fixing $T > 0$, there exists a unique process $\{Z_t\}_{t\ge 0} \in \mathcal{U}_c^2$ such that
$$\lim_{n\to\infty}\mathbb{E}\left[\big(I_t(X^{(n)}) - Z_t\big)^2\right] = 0, \quad \forall t \in [0,T]. \tag{10.2}$$

Remark. The process $\{Z_t\}_{t\ge 0}$ is unique in the following sense: if there is another sequence of simple processes $\{\tilde X_t^{(n)}\}_{t\ge 0}$ approximating $\{X_t\}_{t\ge 0}$, and there exists $\{\tilde Z_t\}_{t\ge 0} \in \mathcal{U}_c^2$ such that
$$\lim_{n\to\infty}\mathbb{E}\left[\big(I_t(\tilde X^{(n)}) - \tilde Z_t\big)^2\right] = 0, \quad \forall t \in [0,T],$$
then
$$\mathbb{P}\left(Z_t = \tilde Z_t \text{ for all } t \in [0,T]\right) = 1.$$

Proof. The existence of $\{Z_t\}_{t\ge 0}$ follows once we show that $\{I_t(X^{(n)})\}_{t\ge 0}$, $n = 1, 2, \ldots$ is a Cauchy sequence:
- We first show that $I_T(X^{(n)})$, $n = 1, 2, \ldots$ is Cauchy with respect to the $L^2$ norm:
$$\begin{aligned}
\mathbb{E}\left[\big(I_T(X^{(j)}) - I_T(X^{(k)})\big)^2\right] &= \mathbb{E}\left[\left(\int_0^T X_t^{(j)}\,dB_t - \int_0^T X_t^{(k)}\,dB_t\right)^2\right] \quad &(10.3a)\\
&= \mathbb{E}\left[\int_0^T\big(X_t^{(j)} - X_t^{(k)}\big)^2\,dt\right] \quad &(10.3b)\\
&= \mathbb{E}\left[\int_0^T\big(X_t^{(j)} - X_t + X_t - X_t^{(k)}\big)^2\,dt\right] \quad &(10.3c)\\
&\le 2\,\mathbb{E}\left[\int_0^T\big(X_t^{(j)} - X_t\big)^2\,dt\right] + 2\,\mathbb{E}\left[\int_0^T\big(X_t^{(k)} - X_t\big)^2\,dt\right], \quad &(10.3d)
\end{aligned}$$
where (10.3b) is by the linearity and isometry properties in Proposition 9.1, and (10.3d) is by the inequality $(a+b)^2 \le 2a^2 + 2b^2$.
Recall (10.1): for any $\varepsilon > 0$ there exists $N$ such that whenever $j, k > N$,
$$\mathbb{E}\left[\int_0^T\big(X_t^{(j)} - X_t\big)^2\,dt\right] < \frac{\varepsilon}{4}, \quad \mathbb{E}\left[\int_0^T\big(X_t^{(k)} - X_t\big)^2\,dt\right] < \frac{\varepsilon}{4}.$$
Thus $\mathbb{E}[(I_T(X^{(j)}) - I_T(X^{(k)}))^2] < \varepsilon$ for large $j, k$.
- Then we show that $I_t(X^{(n)})$, $n = 1, 2, \ldots$ is Cauchy for $t < T$. Since $\{I_t(X^{(j)})\}_{t\ge 0}$ and $\{I_t(X^{(k)})\}_{t\ge 0}$ are martingales, together with the convexity of the quadratic function, $\{(I_t(X^{(j)}) - I_t(X^{(k)}))^2\}_{t\ge 0}$ is a sub-martingale, which means for large $j, k$,
$$\mathbb{E}\left[\big(I_t(X^{(j)}) - I_t(X^{(k)})\big)^2\right] \le \mathbb{E}\left[\big(I_T(X^{(j)}) - I_T(X^{(k)})\big)^2\right] < \varepsilon, \quad \forall t < T.$$
By part 3) of Proposition 9.1, $\{I_t(X^{(n)})\}_{t\ge 0} \in \mathcal{U}_c^2$ for each $n$. By the Cauchy property of $\{I_t(X^{(n)})\}_{t\ge 0}$, $n = 1, 2, \ldots$ and the closedness of $\mathcal{U}_c^2$ with respect to the norm $\|X\| = (\mathbb{E}[X_T^2])^{1/2}$, there exists a limit $\{Z_t\}_{t\ge 0} \in \mathcal{U}_c^2$ such that
$$\lim_{n\to\infty}\mathbb{E}\left[\big(I_T(X^{(n)}) - Z_T\big)^2\right] = 0.$$
We can further show that (10.2) holds by the sub-martingale property of $\{(I_t(X^{(n)}) - Z_t)^2\}_{t\ge 0}$.

Now we show the uniqueness of $\{Z_t\}_{t\ge 0}$. Suppose there is another sequence of simple processes $\{\tilde X_t^{(n)}\}_{t\ge 0}$ approximating $\{X_t\}_{t\ge 0}$, and $\{\tilde Z_t\}_{t\ge 0}$ with $\lim_{n\to\infty}\mathbb{E}[(I_t(\tilde X^{(n)}) - \tilde Z_t)^2] = 0$ for all $t \in [0,T]$. Observe that $Z_T - \tilde Z_T \in L^2$. By Doob's inequality,
$$\forall \varepsilon > 0, \quad \mathbb{P}\left(\sup_{t\le T}|Z_t - \tilde Z_t| > \varepsilon\right) \le \frac{1}{\varepsilon^2}\,\mathbb{E}\big[(Z_T - \tilde Z_T)^2\big].$$
This suggests that, to show $\{\tilde Z_t\}_{t\ge 0}$ is a version of $\{Z_t\}_{t\ge 0}$, we can start by bounding their $L^2$ distance:
$$\begin{aligned}
\mathbb{E}\big[(Z_T - \tilde Z_T)^2\big] &= \mathbb{E}\left[\big(Z_T - I_T(X^{(n)}) + I_T(X^{(n)}) - I_T(\tilde X^{(n)}) + I_T(\tilde X^{(n)}) - \tilde Z_T\big)^2\right]\\
&\le 3\,\mathbb{E}\left[\big(Z_T - I_T(X^{(n)})\big)^2\right] + 3\,\mathbb{E}\left[\big(I_T(X^{(n)}) - I_T(\tilde X^{(n)})\big)^2\right] + 3\,\mathbb{E}\left[\big(I_T(\tilde X^{(n)}) - \tilde Z_T\big)^2\right],
\end{aligned}$$
where the inequality follows from $(\sum_{i=1}^n a_i)^2 \le n\sum_{i=1}^n a_i^2$. The first and last terms vanish as $n \to \infty$. Moreover, the second term can be upper bounded as
$$\mathbb{E}\left[\big(I_T(X^{(n)}) - I_T(\tilde X^{(n)})\big)^2\right] = \mathbb{E}\left[\left(\int_0^T(X_t^{(n)} - \tilde X_t^{(n)})\,dB_t\right)^2\right] = \mathbb{E}\left[\int_0^T(X_t^{(n)} - \tilde X_t^{(n)})^2\,dt\right] \le 2\,\mathbb{E}\left[\int_0^T(X_t^{(n)} - X_t)^2\,dt\right] + 2\,\mathbb{E}\left[\int_0^T(\tilde X_t^{(n)} - X_t)^2\,dt\right],$$
and thus $\mathbb{E}[(I_T(X^{(n)}) - I_T(\tilde X^{(n)}))^2] \to 0$ as $n \to \infty$. Putting things together, we can assert that
$$\mathbb{E}\big[(Z_T - \tilde Z_T)^2\big] = 0 \implies \mathbb{P}\left(\sup_{t\le T}|Z_t - \tilde Z_t| > \varepsilon\right) = 0, \quad \forall \varepsilon > 0.$$
Define $\Lambda = \{\omega : Z_t(\omega) - \tilde Z_t(\omega) \ne 0 \text{ for some } t \in [0,T]\}$; then
$$\Lambda \subseteq \bigcup_{n=1}^{\infty}\left\{\sup_{t\le T}|Z_t - \tilde Z_t| > \frac{1}{n}\right\},$$
which implies $\mathbb{P}(\Lambda) = 0$, i.e.,
$$\mathbb{P}\left(Z_t = \tilde Z_t \text{ for all } t \in [0,T]\right) = \mathbb{P}(\Lambda^c) = 1. \quad ∎$$

Definition 10.3 [Ito Integral for Square Integrable Process] For any process $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$, define the Ito integral
$$I_t(X) = Z_t, \quad t \in [0,T], \tag{10.4}$$
where $\{Z_t\}_{t\ge 0}$ is as constructed in Theorem 10.2. ∎

Proposition 10.1 We find the Ito integral
$$\int_0^T B_t\,dB_t = \frac{1}{2}\left[B_T^2 - T\right].$$

Proof. Before the computation, we need to check that $\{B_t\}_{t\ge 0}$ satisfies the assumption for the Ito integral, i.e., $\{B_t\}_{t\ge 0} \in \mathcal{L}^2$, which is trivial.

1. First we construct simple processes $\{B_t^{(n)}\}_{t\ge 0}$ approximating the integrand $\{B_t\}_{t\ge 0}$. Let $P^{(n)} = \{0, t_1^{(n)}, t_2^{(n)}, \ldots, T\}$ be a partition of $[0,T]$ and construct
$$B_t^{(n)} = B_{t_j^{(n)}}, \quad \forall t \in [t_j^{(n)}, t_{j+1}^{(n)}).$$
As a consequence,
$$\begin{aligned}
\mathbb{E}\left[\int_0^T\big(B_t^{(n)} - B_t\big)^2\,dt\right] &= \sum_j\int_{t_j^{(n)}}^{t_{j+1}^{(n)}}\mathbb{E}\left[\big(B_{t_j^{(n)}} - B_t\big)^2\right]dt = \sum_j\int_{t_j^{(n)}}^{t_{j+1}^{(n)}}\big(t - t_j^{(n)}\big)\,dt\\
&= \frac{1}{2}\sum_j\big(t_{j+1}^{(n)} - t_j^{(n)}\big)^2 \le \frac{1}{2}\,\|P^{(n)}\|\sum_j\big(t_{j+1}^{(n)} - t_j^{(n)}\big) = \frac{1}{2}\,\|P^{(n)}\|\,T,
\end{aligned}$$
which indicates that as long as we construct $\{B_t^{(n)}\}_{t\ge 0}$ with $\|P^{(n)}\| \to 0$, we have $\mathbb{E}\big[\int_0^T(B_t^{(n)} - B_t)^2\,dt\big] \to 0$ as $n \to \infty$.

2. The Ito integral of the simple process $\{B_t^{(n)}\}_{t\ge 0}$ is
$$I_T(B^{(n)}) = \sum_j B_{t_j^{(n)}}\big(B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}\big).$$
We will show this term converges to $\frac{1}{2}[B_T^2 - T]$ in $L^2$. Observe that
$$B_{t_j^{(n)}}\big(B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}\big) = \frac{1}{2}\left[B_{t_{j+1}^{(n)}}^2 - B_{t_j^{(n)}}^2 - \big(B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}\big)^2\right],$$
which implies
$$I_T(B^{(n)}) = \frac{1}{2}\left[\sum_j\big(B_{t_{j+1}^{(n)}}^2 - B_{t_j^{(n)}}^2\big) - \sum_j\big(B_{t_{j+1}^{(n)}} - B_{t_j^{(n)}}\big)^2\right] = \frac{1}{2}\left[(B_T^2 - B_0^2) - Q(P^{(n)})\right] = \frac{1}{2}\left[B_T^2 - Q(P^{(n)})\right],$$
where $Q(P^{(n)})$ is the quadratic variation of the Brownian motion $\{B_t\}_{t\ge 0}$ over the partition $P^{(n)}$. Recall that $Q(P^{(n)}) \xrightarrow{L^2} T$, which implies $I_T(B^{(n)}) \xrightarrow{L^2} \frac{1}{2}[B_T^2 - T]$. So we conclude that
$$I_T(B) := \int_0^T B_t\,dB_t = \frac{1}{2}\left[B_T^2 - T\right]. \quad ∎$$
10.2.2. Properties of Ito Integral

Proposition 10.2 For any $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$, the Ito integral $\int_0^T X_t\,dB_t$ has the following properties:

1. Linearity: let $\{X_t\}_{t\ge 0}, \{Y_t\}_{t\ge 0} \in \mathcal{L}^2$; for any $\alpha, \beta$,
$$\int_0^T(\alpha X_t + \beta Y_t)\,dB_t = \alpha\int_0^T X_t\,dB_t + \beta\int_0^T Y_t\,dB_t.$$
2. Ito isometry: for any $0 \le s < t \le T$,
$$\mathbb{E}\left[\left(\int_s^t X_u\,dB_u\right)^2\,\Big|\,\mathcal{F}_s\right] = \mathbb{E}\left[\int_s^t X_u^2\,du\,\Big|\,\mathcal{F}_s\right].$$
3. $\{I_t(X)\}_{t\ge 0} \in \mathcal{U}_c^2$.

Proof. The linearity property is trivial to show, and the third property comes from Theorem 10.2. It suffices to show the second property. Let $\{X_t^{(n)}\}_{t\ge 0}$, $n = 1, 2, \ldots$ be the sequence of simple processes approximating $\{X_t\}_{t\ge 0}$, and let $A \in \mathcal{F}_s$. It follows that
$$\begin{aligned}
\mathbb{E}\left[\left(\int_s^t X_u\,dB_u\right)^2\mathbf{1}_A\right] &= \mathbb{E}\left[\big(I_t(X) - I_s(X)\big)^2\mathbf{1}_A\right]\\
&= \mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)}) + I_t(X^{(n)}) - I_s(X^{(n)}) + I_s(X^{(n)}) - I_s(X)\big)^2\mathbf{1}_A\right]\\
&= \underbrace{\mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)})\big)^2\mathbf{1}_A\right]}_{(a)} + \underbrace{\mathbb{E}\left[\big(I_t(X^{(n)}) - I_s(X^{(n)})\big)^2\mathbf{1}_A\right]}_{(b)} + \underbrace{\mathbb{E}\left[\big(I_s(X^{(n)}) - I_s(X)\big)^2\mathbf{1}_A\right]}_{(c)}\\
&\quad + \underbrace{2\,\mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)})\big)\big(I_t(X^{(n)}) - I_s(X^{(n)})\big)\mathbf{1}_A\right]}_{(d)} + \underbrace{2\,\mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)})\big)\big(I_s(X^{(n)}) - I_s(X)\big)\mathbf{1}_A\right]}_{(e)}\\
&\quad + \underbrace{2\,\mathbb{E}\left[\big(I_t(X^{(n)}) - I_s(X^{(n)})\big)\big(I_s(X^{(n)}) - I_s(X)\big)\mathbf{1}_A\right]}_{(f)}.
\end{aligned}$$
It is easy to show that $(a)$ and $(c)$ vanish as $n \to \infty$:
$$\mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)})\big)^2\mathbf{1}_A\right] \le \mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)})\big)^2\right] \to 0.$$
It is also easy to show that $(d)$, $(e)$, $(f)$ vanish as $n \to \infty$. For instance, by the Cauchy-Schwarz inequality,
$$\mathbb{E}\left[\big(I_t(X) - I_t(X^{(n)})\big)\big(I_t(X^{(n)}) - I_s(X^{(n)})\big)\mathbf{1}_A\right] \le \left(\mathbb{E}\big[(I_t(X) - I_t(X^{(n)}))^2\big]\right)^{1/2}\left(\mathbb{E}\big[(I_t(X^{(n)}) - I_s(X^{(n)}))^2\big]\right)^{1/2} \to 0,$$
since $\mathbb{E}[(I_t(X) - I_t(X^{(n)}))^2] \to 0$ as $n \to \infty$ while the second factor stays bounded.

It remains to show that the term $(b)$ in fact converges to $\mathbb{E}\big[\big(\int_s^t X_u^2\,du\big)\mathbf{1}_A\big]$ as $n \to \infty$. By the isometry for simple processes,
$$\mathbb{E}\left[\big(I_t(X^{(n)}) - I_s(X^{(n)})\big)^2\mathbf{1}_A\right] = \mathbb{E}\left[\left(\int_s^t X_u^{(n)}\,dB_u\right)^2\mathbf{1}_A\right] = \mathbb{E}\left[\left(\int_s^t\big(X_u^{(n)}\big)^2\,du\right)\mathbf{1}_A\right],$$
and
$$\begin{aligned}
\left|\mathbb{E}\left[\left(\int_s^t\big(X_u^{(n)}\big)^2\,du\right)\mathbf{1}_A\right] - \mathbb{E}\left[\left(\int_s^t X_u^2\,du\right)\mathbf{1}_A\right]\right| &= \left|\mathbb{E}\left[\left(\int_s^t\big(X_u^{(n)} + X_u\big)\big(X_u^{(n)} - X_u\big)\,du\right)\mathbf{1}_A\right]\right|\\
&\le \mathbb{E}\left[\int_s^t\big|X_u^{(n)} + X_u\big|\big|X_u^{(n)} - X_u\big|\,du\right]\\
&\le \left(\mathbb{E}\left[\int_s^t\big(X_u^{(n)} + X_u\big)^2du\right]\right)^{1/2}\left(\mathbb{E}\left[\int_s^t\big(X_u^{(n)} - X_u\big)^2du\right]\right)^{1/2}.
\end{aligned}$$
Note that the first factor is bounded:
$$\mathbb{E}\left[\int_s^t\big(X_u^{(n)} + X_u\big)^2du\right] = \mathbb{E}\left[\int_s^t\big(X_u^{(n)} - X_u + 2X_u\big)^2du\right] \le 2\,\mathbb{E}\left[\int_s^t\big(X_u^{(n)} - X_u\big)^2du\right] + 2\,\mathbb{E}\left[\int_s^t(2X_u)^2du\right] < \infty.$$
Together with the fact that the second factor vanishes as $n \to \infty$, we assert that
$$\mathbb{E}\left[\left(\int_s^t\big(X_u^{(n)}\big)^2du\right)\mathbf{1}_A\right] - \mathbb{E}\left[\left(\int_s^t X_u^2\,du\right)\mathbf{1}_A\right] \to 0.$$
Equivalently,
$$\lim_{n\to\infty}\mathbb{E}\left[\big(I_t(X^{(n)}) - I_s(X^{(n)})\big)^2\mathbf{1}_A\right] = \lim_{n\to\infty}\mathbb{E}\left[\left(\int_s^t\big(X_u^{(n)}\big)^2du\right)\mathbf{1}_A\right] = \mathbb{E}\left[\left(\int_s^t X_u^2\,du\right)\mathbf{1}_A\right].$$
Since $A \in \mathcal{F}_s$ is arbitrary, this proves the conditional isometry. The proof is completed. ∎

Chapter 11

Week 11

11.1. Tuesday

11.1.1. Quadratic Variation of Ito Integral


Recall that the Ito integral of a process $\{X_t\}_{t\ge 0} \in \mathcal{L}^2$ is denoted
$$I_t(X) = \int_0^t X_u\,dB_u.$$
It is a random function of $t$, and it is continuous and adapted. However, computing $I_t(X)$ directly from the definition is involved. The quadratic variation formula for the Ito integral plays a central role in simplifying this calculation. In this lecture, we give an introduction to this topic.

Definition 11.1 [Quadratic Variation] Suppose that $X$ is a function of time $t$. Define its quadratic variation over the interval $[0,t]$ as the limit (when it exists) of
$$Q(P^{(n)}; X)[0,t] = \sum_{i=1}^n\left[X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}}\right]^2, \tag{11.1}$$
where the limit is taken over the partitions
$$P^{(n)} := \{0 = t_0^{(n)}, t_1^{(n)}, \ldots, t_n^{(n)} = t\}$$
with $\|P^{(n)}\| = \max_j\big(t_j^{(n)} - t_{j-1}^{(n)}\big) \to 0$. ∎

(To simplify the notation, we sometimes write $X[t]$ for $X_t$, and $X[t](\omega)$ for $X_t(\omega)$.)

We first discuss how to compute the quadratic variation for $\{X_t\}_{t\ge 0} \in \mathcal{U}_c^2$ by applying the Doob-Meyer decomposition:

Definition 11.2 [Doob-Meyer Decomposition] Consider $\{X_t\}_{t\ge 0} \in \mathcal{U}^2$; then $\{X_t^2\}_{t\ge 0}$ is a non-negative sub-martingale. The process $X_\cdot^2$ admits the unique Doob-Meyer decomposition
$$X_t^2 = M_t + A_t, \quad 0 \le t < \infty,$$
with $\{M_t\}_{t\ge 0}$ a right-continuous martingale and $\{A_t\}_{t\ge 0}$ an increasing predictable process. In particular, when $\{X_t\}_{t\ge 0} \in \mathcal{U}_c^2$, the processes $\{M_t\}_{t\ge 0}$ and $\{A_t\}_{t\ge 0}$ are continuous, and we write
$$\langle X\rangle := A, \quad \langle X\rangle_0 = 0.$$

Example 11.1 Consider the Brownian motion $\{B_t\}_{t\ge 0} \in \mathcal{U}_c^2$; then the Doob-Meyer decomposition is
$$B_t^2 = (B_t^2 - t) + t,$$
where $M_t := B_t^2 - t$ is a martingale and $\langle B\rangle_t = t$ is increasing. ∎
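Example 11.1 can be eyeballed by simulation. The sketch below (arbitrary grid sizes) averages $M_t = B_t^2 - t$ over many paths; the average stays near $0$ at every grid time, consistent with $M$ being a mean-zero martingale and $\langle B\rangle_t = t$.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n_steps, n_paths = 1.0, 200, 50_000
B = np.cumsum(rng.normal(0.0, np.sqrt(T / n_steps),
                         size=(n_paths, n_steps)), axis=1)
t = np.arange(1, n_steps + 1) * (T / n_steps)
M = B**2 - t                          # candidate martingale part
print(np.abs(M.mean(axis=0)).max())   # stays near 0 at every grid time
```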

The following theorem shows that $\langle X\rangle$ can be constructed by taking the limit of quadratic variations:

Theorem 11.1 Let $\{X_t\}_{t\ge 0} \in \mathcal{U}_c^2$ and let $\{P^{(n)}\}_{n=1}^{\infty}$ be a sequence of partitions of $[0, \infty)$ with $\|P^{(n)}\| \to 0$. Then for any $t > 0$, the quadratic variation
$$Q(P^{(n)}; X)[0,t] = \sum_j\left[X_{t_{j+1}^{(n)}\wedge t} - X_{t_j^{(n)}\wedge t}\right]^2$$
converges to $\langle X\rangle_t$ in probability. We call $\{\langle X\rangle_t\}_{t\ge 0}$ the quadratic variation process of $X$.

We first consider the case of bounded $X_\cdot$ and $\langle X\rangle_\cdot$, and then extend the result to the general case by the localization technique for martingales.

Proof. 1. We first consider the bounded case on $[0,t]$: $|X_s| \le K$ almost surely for all $s \in [0,t]$, and $\langle X\rangle_s \le K$ almost surely for all $s \in [0,t]$. We will in fact show the stronger $L^2$ result
$$\mathbb{E}\left[Q(P^{(n)};X)[0,t] - \langle X\rangle_t\right]^2 \to 0.$$
Writing $\Delta_j X = X[t_{j+1}^{(n)}\wedge t] - X[t_j^{(n)}\wedge t]$ and $\Delta_j\langle X\rangle = \langle X\rangle[t_{j+1}^{(n)}\wedge t] - \langle X\rangle[t_j^{(n)}\wedge t]$, we have
$$\begin{aligned}
\mathbb{E}\left[Q(P^{(n)};X)[0,t] - \langle X\rangle_t\right]^2 &= \mathbb{E}\left[\sum_j(\Delta_j X)^2 - \sum_j\Delta_j\langle X\rangle\right]^2 = \mathbb{E}\left[\sum_j\left\{(\Delta_j X)^2 - \Delta_j\langle X\rangle\right\}\right]^2\\
&= \sum_j\mathbb{E}\left[\left\{(\Delta_j X)^2 - \Delta_j\langle X\rangle\right\}^2\right] + 2\sum_{j<k}\mathbb{E}\left[\left\{(\Delta_j X)^2 - \Delta_j\langle X\rangle\right\}\left\{(\Delta_k X)^2 - \Delta_k\langle X\rangle\right\}\right]. \quad (11.2)
\end{aligned}$$
Now we show that the cross terms vanish. For any $0 \le s < r \le u < v$,
$$\mathbb{E}\left[\left\{(X_r - X_s)^2 - (\langle X\rangle_r - \langle X\rangle_s)\right\}\left\{(X_v - X_u)^2 - (\langle X\rangle_v - \langle X\rangle_u)\right\}\right] = \mathbb{E}\left[\left\{(X_r - X_s)^2 - (\langle X\rangle_r - \langle X\rangle_s)\right\}\mathbb{E}\left[(X_v - X_u)^2 - (\langle X\rangle_v - \langle X\rangle_u)\,\big|\,\mathcal{F}_r\right]\right]. \quad (11.3)$$
Moreover,
$$\begin{aligned}
\mathbb{E}\big[(X_v - X_u)^2\mid\mathcal{F}_r\big] &= \mathbb{E}\big[X_v^2 + X_u^2 - 2X_vX_u\mid\mathcal{F}_r\big] = \mathbb{E}\big[X_v^2 + X_u^2 - 2\,\mathbb{E}[X_vX_u\mid\mathcal{F}_u]\mid\mathcal{F}_r\big]\\
&= \mathbb{E}\big[X_v^2 + X_u^2 - 2X_u^2\mid\mathcal{F}_r\big] = \mathbb{E}\big[X_v^2 - X_u^2\mid\mathcal{F}_r\big] = \mathbb{E}\big[\langle X\rangle_v - \langle X\rangle_u\mid\mathcal{F}_r\big],
\end{aligned}$$
where the last equality holds because $\{X^2 - \langle X\rangle\}$ is a martingale. This implies the conditional expectation in (11.3) vanishes, and therefore the cross terms in (11.2) vanish.
By the elementary inequality $(a - b)^2 \le 2a^2 + 2b^2$, the diagonal part of (11.2) can be upper bounded as
$$\sum_j\mathbb{E}\left[\left\{(\Delta_j X)^2 - \Delta_j\langle X\rangle\right\}^2\right] \le 2\sum_j\mathbb{E}\left[(\Delta_j X)^4\right] + 2\sum_j\mathbb{E}\left[(\Delta_j\langle X\rangle)^2\right]. \quad (11.4)$$
We claim that:
(a) $\sum_j\mathbb{E}\big[(\Delta_j X)^4\big] \to 0$ as $n \to \infty$;
(b) $\sum_j\mathbb{E}\big[(\Delta_j\langle X\rangle)^2\big] \to 0$ as $n \to \infty$.
Then $\mathbb{E}\big[Q(P^{(n)};X)[0,t] - \langle X\rangle_t\big]^2 \to 0$ as $n \to \infty$, i.e., $Q(P^{(n)};X)[0,t] \xrightarrow{L^2} \langle X\rangle_t$, and the desired result holds since $L^2$ convergence implies convergence in probability.

2. Now consider the case where $\{X_t\}_{t\ge 0}$ and $\langle X\rangle$ are unbounded. We argue by the technique of localization. Define a sequence of stopping times
$$T_k = \inf\{t \ge 0 : |X_t| \ge k \text{ or } \langle X\rangle_t \ge k\}.$$
The stopped process $\{X[t\wedge T_k]\}_{t\ge 0}$ is then a bounded martingale, denoted $X^{(k)}[t] \equiv X[t\wedge T_k]$. By the Doob-Meyer decomposition,
$$\big(X_t^{(k)}\big)^2 = \left\{\big(X_t^{(k)}\big)^2 - \langle X^{(k)}\rangle_t\right\} + \langle X^{(k)}\rangle_t.$$
We now identify $\langle X^{(k)}\rangle$ via the uniqueness of the Doob-Meyer decomposition (intuitively we should have $\langle X^{(k)}\rangle[t] = \langle X\rangle[t\wedge T_k]$):
- The stopped process $\{X[t\wedge T_k]^2 - \langle X\rangle[t\wedge T_k]\}$ is a martingale, because $\{X_t^2 - \langle X\rangle_t\}$ is a martingale and $T_k$ is a stopping time.
- Moreover, $\langle X\rangle[t\wedge T_k]$ is increasing and predictable.
Hence $X^{(k)} \equiv X[\cdot\wedge T_k]$ admits the Doob-Meyer decomposition
$$\big(X^{(k)}\big)^2 = \left\{X[t\wedge T_k]^2 - \langle X\rangle[t\wedge T_k]\right\} + \langle X\rangle[t\wedge T_k],$$
and by uniqueness, $\langle X^{(k)}\rangle[t] = \langle X\rangle[t\wedge T_k]$. Applying the result of part 1) to the bounded process $X^{(k)}$: for any $\varepsilon > 0$, $\eta > 0$ there exists $N$ such that for all $n > N$,
$$\mathbb{P}\left(\big|Q(P^{(n)};X^{(k)})[0,t] - \langle X^{(k)}\rangle[t]\big| > \varepsilon\right) = \mathbb{P}\left(\big|Q(P^{(n)};X^{(k)})[0,t] - \langle X\rangle[t\wedge T_k]\big| > \varepsilon\right) < \frac{\eta}{2}. \quad (11.5)$$
Since $|X|$ and $\langle X\rangle$ are finite, $T_k \uparrow \infty$ a.s., so we may also fix $k$ large enough that $\mathbb{P}(T_k < t) < \eta/2$. Now we show that $Q(P^{(n)};X)[0,t] \xrightarrow{\mathbb{P}} \langle X\rangle_t$:
$$\begin{aligned}
\mathbb{P}\left(\big|Q(P^{(n)};X)[0,t] - \langle X\rangle[t]\big| > \varepsilon\right) &= \mathbb{P}\left(\big|Q(P^{(n)};X)[0,t] - \langle X\rangle[t]\big| > \varepsilon,\ T_k \ge t\right) + \mathbb{P}\left(\big|Q(P^{(n)};X)[0,t] - \langle X\rangle[t]\big| > \varepsilon,\ T_k < t\right)\\
&\le \mathbb{P}\left(\big|Q(P^{(n)};X)[0,t] - \langle X\rangle[t]\big| > \varepsilon,\ T_k \ge t\right) + \mathbb{P}(T_k < t) \quad (11.6a)\\
&= \mathbb{P}\left(\Big|\sum_j\big(X[t_{j+1}^{(n)}\wedge t] - X[t_j^{(n)}\wedge t]\big)^2 - \langle X\rangle[t\wedge T_k]\Big| > \varepsilon,\ T_k \ge t\right) + \mathbb{P}(T_k < t)\\
&= \mathbb{P}\left(\Big|\sum_j\big(X[t_{j+1}^{(n)}\wedge T_k\wedge t] - X[t_j^{(n)}\wedge T_k\wedge t]\big)^2 - \langle X\rangle[t\wedge T_k]\Big| > \varepsilon,\ T_k \ge t\right) + \mathbb{P}(T_k < t)\\
&\le \mathbb{P}\left(\Big|Q(P^{(n)};X^{(k)})[0,t] - \langle X\rangle[t\wedge T_k]\Big| > \varepsilon\right) + \mathbb{P}(T_k < t)\\
&\le \frac{\eta}{2} + \frac{\eta}{2} = \eta, \quad (11.6b)
\end{aligned}$$
where (11.6a) uses the choice of $k$ with $\mathbb{P}(T_k < t) < \eta/2$, and (11.6b) makes use of the upper bound (11.5).

To complete the proof, it remains to establish the two claims, which we do in the following two propositions.

Proposition 11.1 Let $\{X_t\}_{t\ge 0} \in \mathcal{U}_c^2$ with $|X_s| \le K$ almost surely for all $s \in [0,t]$. Let $\{P^{(n)}\}_{n=1}^{\infty}$ be a sequence of partitions of $[0,t]$ with $\|P^{(n)}\| \to 0$. Then
$$\lim_{n\to\infty}\mathbb{E}\left[\sum_j\big(X[t_{j+1}^{(n)}] - X[t_j^{(n)}]\big)^4\right] = 0.$$

Proof. 1. We first show that $\mathbb{E}\big[Q(P;X)[0,t]\big]^2$ is bounded for any partition $P$:
$$\mathbb{E}\big[Q(P;X)[0,t]\big]^2 = \mathbb{E}\left[\sum_j\big(X[t_{j+1}] - X[t_j]\big)^2\right]^2 = \mathbb{E}\left[\sum_j\big(X[t_{j+1}] - X[t_j]\big)^4\right] + 2\,\mathbb{E}\left[\sum_k\sum_{j>k}\big(X[t_{j+1}] - X[t_j]\big)^2\big(X[t_{k+1}] - X[t_k]\big)^2\right]. \quad (11.7)$$
In particular,
$$\begin{aligned}
\mathbb{E}\left[\sum_{j>k}\big(X[t_{j+1}] - X[t_j]\big)^2\,\Big|\,\mathcal{F}_{t_{k+1}}\right] &= \mathbb{E}\left[\sum_{j>k}\big(X[t_{j+1}]^2 + X[t_j]^2 - 2X[t_{j+1}]X[t_j]\big)\,\Big|\,\mathcal{F}_{t_{k+1}}\right]\\
&= \mathbb{E}\left[\sum_{j>k}\big(X[t_{j+1}]^2 + X[t_j]^2 - 2\,\mathbb{E}[X[t_{j+1}]X[t_j]\mid\mathcal{F}_{t_j}]\big)\,\Big|\,\mathcal{F}_{t_{k+1}}\right]\\
&= \mathbb{E}\left[\sum_{j>k}\big(X[t_{j+1}]^2 - X[t_j]^2\big)\,\Big|\,\mathcal{F}_{t_{k+1}}\right] = \mathbb{E}\left[X[t]^2 - X[t_{k+1}]^2\,\Big|\,\mathcal{F}_{t_{k+1}}\right] \le K^2,
\end{aligned}$$
where the sum telescopes to the final partition point $t$. The same computation over the full sum gives $\mathbb{E}\big[\sum_j(X[t_{j+1}] - X[t_j])^2\big] \le K^2$. We can apply these inequalities to upper bound the second term in (11.7):
$$\begin{aligned}
\mathbb{E}\left[\sum_k\sum_{j>k}\big(X[t_{j+1}] - X[t_j]\big)^2\big(X[t_{k+1}] - X[t_k]\big)^2\right] &= \mathbb{E}\left[\sum_k\mathbb{E}\left\{\sum_{j>k}\big(X[t_{j+1}] - X[t_j]\big)^2\big(X[t_{k+1}] - X[t_k]\big)^2\,\Big|\,\mathcal{F}_{t_{k+1}}\right\}\right]\\
&= \mathbb{E}\left[\sum_k\big(X[t_{k+1}] - X[t_k]\big)^2\,\mathbb{E}\left\{\sum_{j>k}\big(X[t_{j+1}] - X[t_j]\big)^2\,\Big|\,\mathcal{F}_{t_{k+1}}\right\}\right]\\
&\le \mathbb{E}\left[\sum_k\big(X[t_{k+1}] - X[t_k]\big)^2\right]K^2 \le K^4.
\end{aligned}$$
Then we upper bound the first term in (11.7):
$$\mathbb{E}\left[\sum_j\big(X[t_{j+1}] - X[t_j]\big)^4\right] = \mathbb{E}\left[\sum_j\big(X[t_{j+1}] - X[t_j]\big)^2\big(X[t_{j+1}] - X[t_j]\big)^2\right] \le \mathbb{E}\left[\sum_j 4K^2\big(X[t_{j+1}] - X[t_j]\big)^2\right] \le 4K^4,$$
where the first inequality uses $|X[t_{j+1}] - X[t_j]| \le 2K$ and the second uses $\mathbb{E}\big[\sum_j(X[t_{j+1}] - X[t_j])^2\big] \le K^2$. We can therefore assert that $\mathbb{E}\big[Q(P;X)[0,t]\big]^2 \le 6K^4$.

2. To obtain the desired result, we start by bounding $\sum_j(X[t_{j+1}] - X[t_j])^4$:
$$\sum_j\big(X[t_{j+1}] - X[t_j]\big)^4 = \sum_j\big(X[t_{j+1}] - X[t_j]\big)^2\big(X[t_{j+1}] - X[t_j]\big)^2 \le \left(\sup_j|X[t_{j+1}] - X[t_j]|\right)^2\sum_j\big(X[t_{j+1}] - X[t_j]\big)^2 \le A_P^2\sum_j\big(X[t_{j+1}] - X[t_j]\big)^2,$$
in which we define $A_P = \sup\{|X_y - X_s| : 0 \le s < y \le t,\ |y - s| \le \|P\|\}$. By the Cauchy-Schwarz inequality,
$$\mathbb{E}\left[\sum_j\big(X[t_{j+1}] - X[t_j]\big)^4\right] \le \mathbb{E}\left[A_P^2\sum_j\big(X[t_{j+1}] - X[t_j]\big)^2\right] \le \left\{\mathbb{E}A_P^4\right\}^{1/2}\left\{\mathbb{E}\big[Q(P;X)[0,t]\big]^2\right\}^{1/2} \le \left\{\mathbb{E}A_P^4\right\}^{1/2}\left\{6K^4\right\}^{1/2}.$$
Since $X$ is continuous, and thus uniformly continuous on $[0,t]$, $A_P \to 0$ as $\|P\| \to 0$; moreover $A_P \le 2K$. Applying the bounded convergence theorem gives $\mathbb{E}A_P^4 \to 0$. The proof is completed. ∎


Proposition 11.2 Let \(\{X_t\}_{t\ge 0} \in \mathcal{U}_c^2\) with \(\langle X\rangle_s \le K\) almost surely for every \(s \in [0,t]\). Suppose that \(\{P^{(n)}\}\) is a sequence of partitions of \([0,t]\) such that \(\|P^{(n)}\| \to 0\); then we have
\[
\lim_{n\to\infty} E\sum_j \Bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\Bigr)^2 = 0.
\]

Proof. The proof is by directly computing and upper bounding the term \(E\sum_j \bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\bigr)^2\):
\[
\begin{aligned}
E\sum_j \Bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\Bigr)^2
&\le E\Bigl[\sup_j \Bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\Bigr) \cdot \sum_j \Bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\Bigr)\Bigr] \\
&= E\Bigl[\sup_j \Bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\Bigr) \cdot \langle X\rangle_t\Bigr]
\le E\bigl[\tilde{A}_P \cdot \langle X\rangle_t\bigr],
\end{aligned}
\]
where we define \(\tilde{A}_P = \sup\{|\langle X\rangle_y - \langle X\rangle_s| : 0 \le s < y \le t,\ |y - s| \le \|P\|\}\). Since \(\langle X\rangle\) is continuous, and therefore uniformly continuous on \([0,t]\), we have \(\tilde{A}_P \to 0\) as \(\|P\|\to 0\). It is easy to see that \(|\tilde{A}_P \cdot \langle X\rangle_t| \le 2K^2\). By the bounded convergence theorem, as \(\|P^{(n)}\|\to 0\),
\[
E\sum_j \Bigl(\langle X\rangle[t^{(n)}_{j+1}] - \langle X\rangle[t^{(n)}_j]\Bigr)^2 \le E\bigl[\tilde{A}_{P^{(n)}} \cdot \langle X\rangle_t\bigr] \to 0.
\]
The proof is completed. ∎

Now we begin to characterize the quadratic variation of the Ito integral \(I_t(X)\) for \(X_\bullet \in L^2\). However, we do not prove this result by taking the limit of the formula presented in Theorem 11.1 (which is somewhat complicated); instead, we make use of the Doob–Meyer decomposition.

Theorem 11.2 The quadratic variation of the Ito integral \(I_t(X)\) for \(\{X_t\}_{t\ge 0} \in L^2\) on the interval \([0,T]\) is
\[
\int_0^T X_t^2\, dt.
\]

118
Proof. The Doob–Meyer decomposition for \(\{I_t(X)\}_{t\ge 0} \in \mathcal{U}_c^2\) is
\[
I_t^2(X) = \bigl(I_t^2(X) - \langle I(X)\rangle_t\bigr) + \langle I(X)\rangle_t.
\]
We will characterize this decomposition as follows. On the one hand,
\[
\begin{aligned}
E\bigl[(I_t(X) - I_s(X))^2 \mid \mathcal{F}_s\bigr]
&= E\bigl[I_t^2(X) - 2I_t(X)I_s(X) + I_s^2(X) \mid \mathcal{F}_s\bigr] \\
&= E\bigl[I_t^2(X) \mid \mathcal{F}_s\bigr] - 2I_s(X)\,E\bigl[I_t(X) \mid \mathcal{F}_s\bigr] + I_s^2(X) \\
&= E\bigl[I_t^2(X) \mid \mathcal{F}_s\bigr] - I_s^2(X).
\end{aligned}
\]
On the other hand,
\[
E\bigl[(I_t(X) - I_s(X))^2 \mid \mathcal{F}_s\bigr]
= E\Bigl[\int_s^t X_u^2\, du \,\Big|\, \mathcal{F}_s\Bigr]
= E\Bigl[\int_0^t X_u^2\, du \,\Big|\, \mathcal{F}_s\Bigr] - \int_0^s X_u^2\, du.
\]
Equating the two expressions and re-arranging gives
\[
E\Bigl[I_t^2(X) - \int_0^t X_u^2\, du \,\Big|\, \mathcal{F}_s\Bigr] = I_s^2(X) - \int_0^s X_u^2\, du.
\]
Hence, \(\bigl\{I_t^2(X) - \int_0^t X_u^2\, du\bigr\}_{t\ge 0}\) is a martingale. Moreover, \(\int_0^t X_u^2\, du\) is a predictable increasing process in \(t\). By the uniqueness of the Doob–Meyer decomposition,
\[
\langle I(X)\rangle_t = \int_0^t X_u^2\, du. \qquad\blacksquare
\]
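As a quick sanity check on Theorem 11.2, the identity \(\langle I(X)\rangle_t = \int_0^t X_u^2\,du\) can be observed numerically. The sketch below is not from the notes; it assumes NumPy, takes \(X_t = B_t\), and approximates the Ito integral by left-point sums on a fine grid, comparing the sum of squared increments of \(I_t(B) = \int_0^t B_u\,dB_u\) with the discretized \(\int_0^t B_u^2\,du\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 200_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)          # Brownian increments
B = np.concatenate(([0.0], np.cumsum(dB)))    # B at the grid points

# I_t = int_0^t B_u dB_u, approximated by left-point (Ito) sums
I = np.concatenate(([0.0], np.cumsum(B[:-1] * dB)))

qv = np.sum(np.diff(I) ** 2)                  # Q(P; I)[0, t]
predicted = np.sum(B[:-1] ** 2) * dt          # discretized int_0^t B_u^2 du

rel_err = abs(qv - predicted) / predicted
```

On a fine partition the two quantities agree to within a few percent for a typical path.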

11.2. Thursday

11.2.1. Quadratic Covariation


Note that the quadratic variation characterizes the sum of squared increments of a single process. Now we aim to quantify the sum of products of increments of two processes \(\{X_t\}_{t\ge 0}, \{Y_t\}_{t\ge 0} \in \mathcal{U}_c^2\):

Definition 11.3 [Quadratic Covariation] Define the quadratic covariation of \(X_\bullet, Y_\bullet \in \mathcal{U}_c^2\) as
\[
Q_c(P; X, Y)[0,t] = \sum_j \Bigl(X[t^{(n)}_{j+1}\wedge t] - X[t^{(n)}_j\wedge t]\Bigr)\Bigl(Y[t^{(n)}_{j+1}\wedge t] - Y[t^{(n)}_j\wedge t]\Bigr).
\]
Define the quadratic covariational process of \(X\) and \(Y\) as
\[
\langle X, Y\rangle = \frac14\bigl(\langle X+Y\rangle - \langle X-Y\rangle\bigr),
\]
which is well-defined since \(X+Y,\ X-Y \in \mathcal{U}_c^2\). ∎

Similarly to Theorem 11.1, we can show that the quadratic covariational process is the limit of the quadratic covariation as \(\|P\|\to 0\).

Theorem 11.3 Let \(\{P^{(n)}\}_{n=1}^\infty\) be a sequence of partitions with \(\|P^{(n)}\| \to 0\). Then the quadratic covariation \(Q_c(P^{(n)}; X, Y)[0,t]\) converges to \(\langle X, Y\rangle_t\) in probability.

Proof. It suffices to show that
\[
Q_c(P; X, Y)[0,t] = \frac14\bigl(Q(P; X+Y)[0,t] - Q(P; X-Y)[0,t]\bigr), \tag{11.8}
\]
and the rest follows by taking the limits of the quadratic variations \(Q(P^{(n)}; X+Y)[0,t]\) and \(Q(P^{(n)}; X-Y)[0,t]\). Here we begin to simplify the RHS of (11.8). Writing \(\Delta X_j = X[t_{j+1}\wedge t] - X[t_j\wedge t]\) and \(\Delta Y_j = Y[t_{j+1}\wedge t] - Y[t_j\wedge t]\),
\[
Q(P; X+Y)[0,t] = \sum_j \bigl(\Delta X_j + \Delta Y_j\bigr)^2
= \sum_j (\Delta X_j)^2 + \sum_j (\Delta Y_j)^2 + 2\sum_j \Delta X_j\, \Delta Y_j.
\]
Similarly,
\[
Q(P; X-Y)[0,t] = \sum_j (\Delta X_j)^2 + \sum_j (\Delta Y_j)^2 - 2\sum_j \Delta X_j\, \Delta Y_j.
\]
Subtracting the two identities, it is clear that (11.8) holds. The proof is completed. ∎
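The polarization identity (11.8) is a purely algebraic fact about increments, so it can be checked to machine precision on any pair of discretized paths. A minimal sketch (assuming NumPy; the correlated random-walk paths and step sizes below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
dX = rng.normal(0.0, 0.01, n)                     # increments of X over a partition
dY = 0.5 * dX + rng.normal(0.0, 0.01, n)          # correlated increments of Y

Qc = np.sum(dX * dY)                              # quadratic covariation Q_c(P; X, Y)
Q_plus = np.sum((dX + dY) ** 2)                   # Q(P; X + Y)
Q_minus = np.sum((dX - dY) ** 2)                  # Q(P; X - Y)

gap = abs(Qc - 0.25 * (Q_plus - Q_minus))         # identity (11.8), up to round-off
```

The gap is pure floating-point round-off, independent of the partition width.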

Proposition 11.3 — Properties of the Quadratic Covariational Process. Let the stochastic processes \(X_\bullet, Y_\bullet, X^{(1)}_\bullet, X^{(2)}_\bullet \in \mathcal{U}_c^2\); then the following properties hold:

1. Symmetry: \(\langle X, Y\rangle = \langle Y, X\rangle\);

2. For any \(a, b \in \mathbb{R}\), \(\langle aX^{(1)} + bX^{(2)}, Y\rangle = a\langle X^{(1)}, Y\rangle + b\langle X^{(2)}, Y\rangle\);

3. Cauchy–Schwarz inequality: \(|\langle X, Y\rangle_t|^2 \le \langle X\rangle_t \langle Y\rangle_t\).

Proof. Part 1) and part 2) can be shown by definition. For part 3), we first apply the Cauchy–Schwarz inequality to the quadratic covariation of \(X\) and \(Y\): writing \(\Delta X_j = X[t^{(n)}_{j+1}\wedge t] - X[t^{(n)}_j\wedge t]\) and likewise for \(\Delta Y_j\),
\[
\bigl(Q_c(P; X, Y)[0,t]\bigr)^2
= \Bigl(\sum_j \Delta X_j\, \Delta Y_j\Bigr)^2
\le \Bigl(\sum_j (\Delta X_j)^2\Bigr)\Bigl(\sum_j (\Delta Y_j)^2\Bigr)
= Q(P; X)[0,t]\, Q(P; Y)[0,t].
\]
Then taking the limit on both sides gives the desired result. ∎

R Recall that a monotone function has bounded variation over any finite interval. Hence \(\langle X\rangle\) has bounded variation, since it is increasing. It follows that the quadratic covariational process \(\langle X, Y\rangle\) has bounded variation, since it is a linear combination of two quadratic variational processes.

Next, we show that the product process \(XY\) admits a decomposition similar to the Doob–Meyer decomposition, though in this case \(\langle X, Y\rangle\) may not be increasing:

■ Example 11.2 We can show that \(XY - \langle X, Y\rangle\) is a martingale as follows. Since the stochastic processes \(X+Y,\ X-Y \in \mathcal{U}_c^2\), the Doob–Meyer decomposition implies that \((X+Y)^2 - \langle X+Y\rangle\) and \((X-Y)^2 - \langle X-Y\rangle\) are martingales. Then we can express the term \(XY - \langle X, Y\rangle\) as a linear combination of two martingales, which is a martingale as well:
\[
\begin{aligned}
\frac14\Bigl\{\bigl[(X+Y)^2 - \langle X+Y\rangle\bigr] - \bigl[(X-Y)^2 - \langle X-Y\rangle\bigr]\Bigr\}
&= \frac14\bigl[X^2 + 2XY + Y^2 - X^2 + 2XY - Y^2\bigr] - \frac14\bigl[\langle X+Y\rangle - \langle X-Y\rangle\bigr] \\
&= XY - \frac14\bigl[\langle X+Y\rangle - \langle X-Y\rangle\bigr] = XY - \langle X, Y\rangle.
\end{aligned}
\]
∎

11.2.2. Ito Integral for General Processes


Ito Integral for Simple Processes. Let \(M_\bullet \in \mathcal{U}_c^2\), and let \(\{X_t\}_{t\ge 0}\) be a simple process with corresponding partition \(P = \{0 = t_0 < t_1 < \cdots < t_n \le T\}\). Then the Ito integral of \(X_t\) w.r.t. \(M_t\) is defined as
\[
\int_0^T X_t\, dM_t = \sum_{j=0}^{n-1} X[t_j]\bigl(M[t_{j+1}] - M[t_j]\bigr) + X[t_n]\bigl(M[T] - M[t_n]\bigr),
\]
denoted as \(I_T(X)\). Immediately we have the following properties:

1. \(\{I_t(X)\}_{t\ge 0} \in \mathcal{U}_c^2\);

2. The quadratic variational process of \(I_\bullet(X)\) is
\[
\langle I(X)\rangle_t = \int_0^t X_u^2\, d\langle M\rangle_u.
\]
The process \(\bigl\{I_t^2(X) - \int_0^t X_u^2\, d\langle M\rangle_u\bigr\}_{t\ge 0}\) is a martingale;

3. Ito isometry: for any \(t \ge 0\) we have
\[
E\Bigl[\int_0^t X_u\, dM_u\Bigr]^2 = E\int_0^t X_u^2\, d\langle M\rangle_u.
\]

We need some conditions, such as integrability conditions on \(X\), to define the Ito integral \(\int_0^T X_t\, dM_t\) for general stochastic processes \(X_\bullet\).

Definition 11.4 [Ito Integrable Space] The set \(L^2(M)\) denotes the space containing all adapted stochastic processes \(\{X_t\}_{t\ge 0}\) such that there exists a sequence of simple processes \(\{X^{(n)}_t\}_{t\ge 0}\) satisfying
\[
\lim_{n\to\infty} E\int_0^T \bigl(X^{(n)}_t - X_t\bigr)^2\, d\langle M\rangle_t = 0, \quad \forall T > 0.
\]

Ito's Integral on \(L^2(M)\). For any \(X_\bullet \in L^2(M)\) and associated simple processes \(X^{(n)}_\bullet\) approximating \(X_\bullet\), the Ito integrals \(\{I_t(X^{(n)})\}_{t\ge 0}\) have a limit in the closed space \(\mathcal{U}_c^2\). Then we take
\[
\int_0^T X_t\, dM_t = \lim_{n\to\infty} \int_0^T X^{(n)}_t\, dM_t.
\]

Ito's Integral on Locally Bounded Processes. The question is whether we can extend the Ito integral to other processes, such as a locally bounded process \(\{X_t\}_{t\ge 0}\) and a continuous square-integrable local martingale \(\{M_t\}_{t\ge 0}\).

Definition 11.5 [Local Martingale] A process \(X_\bullet\) is called a local martingale if there is a sequence of finite stopping times \(\{\tau_n\}\) with \(\tau_n \uparrow \infty\) so that \(X^{\tau_n} \equiv \{X(\tau_n \wedge t)\}_{t\ge 0}\) is a martingale for each \(n \ge 1\). ∎

Definition 11.6 [Locally Bounded Process] A process \(X_\bullet\) is said to be locally bounded if there is a sequence of finite stopping times \(\{\sigma_n\}\) with \(\sigma_n \uparrow \infty\) so that \(X^{\sigma_n}\) is bounded for each \(n \ge 1\). ∎

The answer to the question above is yes. Take \(\tilde{\tau}_n = \tau_n \wedge \sigma_n\); then \(\tilde{\tau}_n \uparrow \infty\) and \(M^{\tilde{\tau}_n} \in \mathcal{U}_c^2\). Construct \(X^{(n)}_t = X_t \mathbf{1}\{t \le \tilde{\tau}_n\}\); then \(X^{(n)}_\bullet \in L^2(M^{\tilde{\tau}_n})\). Then we define the Ito integral
\[
\int_0^T X_t\, dM_t = \int_0^T X^{(n)}_t\, dM^{\tilde{\tau}_n}_t, \quad T \in [0, \tilde{\tau}_n].
\]
Note that this definition is consistent, i.e., it does not depend on the particular choice of the sequence \(\{\tilde{\tau}_n\}\).

Ito's Integral on the Class of Semi-martingales. Finally, we wonder whether it is possible to define the stochastic integral on the class of semi-martingales. We define the semi-martingale in the continuous case as follows:

Definition 11.7 [Semi-martingale] We say \(X_\bullet\) is a semi-martingale if it admits the decomposition
\[
X_t = A_t + M_t,
\]
where \(M_\bullet\) is a continuous local martingale, and \(A_\bullet\) is an adapted process of finite variation:
\[
|A|(t) \equiv \sup_{\delta>0,\ t_0=0}\ \sup_{t_n - t_{n-1} \le \delta}\ \sum_{n=1}^{\infty} \mathbf{1}(t_n \le t)\, |A[t_n] - A[t_{n-1}]| < \infty, \quad \forall t \ge 0.
\]

Let \(\{X_t\}_{t\ge 0}\) be a left-continuous adapted process, and \(\{Y_t\}_{t\ge 0}\) a continuous semi-martingale with the decomposition \(Y_t = A_t + M_t\). Then define the stochastic integral
\[
\int_0^T X_t\, dY_t = \int_0^T X_t\, dM_t + \int_0^T X_t\, dA_t.
\]
Note that a left-continuous adapted process is locally bounded. Hence the first term \(\int_0^T X_t\, dM_t\) is the Ito integral w.r.t. a continuous local martingale. The second term is the Riemann integration defined pathwise for each \(\omega \in \Omega\).

Chapter 12

Week12

12.1. Tuesday

12.1.1. Ito’s Formula


In this lecture, we will study Ito's formula, which is very useful for evaluating Ito integrals. The following elementary identity will be used frequently:

\[
X[t_{j+1}]^2 - X[t_j]^2 = \bigl(X[t_{j+1}] - X[t_j]\bigr)^2 + 2X[t_j]\bigl(X[t_{j+1}] - X[t_j]\bigr). \tag{12.1}
\]
Now we show how to compute the Ito integral of \(X_t^2\) based on this identity:

■ Example 12.1 Suppose that \(\{X_t\}_{t\ge 0} \in \mathcal{U}_c^2\), and \(P = \{t_0 < t_1 < \cdots < t\}\) is a partition of the interval \([0,t]\). Then summing both sides of (12.1) yields
\[
X_t^2 - X_0^2 = 2\sum_j X[t_j]\bigl(X[t_{j+1}] - X[t_j]\bigr) + \sum_j \bigl(X[t_{j+1}] - X[t_j]\bigr)^2.
\]
Taking the limit on both sides as \(\|P\|\to 0\), we have
\[
X_t^2 - X_0^2 = 2\int_0^t X_s\, dX_s + \langle X\rangle_t.
\]
This is Ito's formula applied to \(X_t^2\). ∎
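Before any limits are taken, the summed identity above is exact for every discretized path (it is just telescoping), which makes a simple numerical check. A minimal sketch, assuming NumPy; the random walk below is an arbitrary stand-in for \(X\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
dX = rng.normal(0.0, 0.005, n)
X = np.concatenate(([2.0], 2.0 + np.cumsum(dX)))   # any path with X_0 = 2

lhs = X[-1] ** 2 - X[0] ** 2
ito_sum = 2.0 * np.sum(X[:-1] * np.diff(X))        # 2 * sum_j X[t_j](X[t_{j+1}] - X[t_j])
qv_sum = np.sum(np.diff(X) ** 2)                   # sum_j (X[t_{j+1}] - X[t_j])^2

gap = abs(lhs - (ito_sum + qv_sum))                # zero up to floating-point round-off
```

As \(\|P\|\to 0\), the first sum tends to the Ito integral and the second to the quadratic variation, while their total stays pinned to \(X_t^2 - X_0^2\).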

Theorem 12.1 Let \(\{B_t\}_{t\ge 0}\) be a Brownian motion and \(f \in C^2(\mathbb{R})\). Then
\[
f(B_t) = f(0) + \int_0^t f'(B_u)\, dB_u + \frac12\int_0^t f''(B_u)\, du,
\]
almost surely for any \(t \ge 0\).

Proof. Let \(P = \{t_0 < t_1 < \cdots < t_n = t\}\) be a partition of \([0,t]\); then \(f(B_t)\) admits the expansion
\[
f(B_t) = f(0) + \sum_{j=0}^{n-1} \bigl[f(B[t_{j+1}]) - f(B[t_j])\bigr]. \tag{12.2}
\]
By Taylor expansion of the RHS, there exists \(\theta_{t_j}(\omega)\) between \(B[t_j](\omega)\) and \(B[t_{j+1}](\omega)\) such that
\[
f(B[t_{j+1}]) - f(B[t_j]) = f'(B[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr) + \frac12 f''(\theta_{t_j})\bigl(B[t_{j+1}] - B[t_j]\bigr)^2. \tag{12.3}
\]
Substituting (12.3) into (12.2) yields
\[
f(B_t) - f(0) = \sum_j f'(B[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr) + \frac12\sum_j f''(\theta_{t_j})\bigl(B[t_{j+1}] - B[t_j]\bigr)^2.
\]
As \(\|P\|\to 0\), the first term \(\sum_j f'(B[t_j])(B[t_{j+1}] - B[t_j]) \to \int_0^t f'(B_u)\, dB_u\) in probability. Then we begin to compute the limit of the second term on the RHS.

• We first consider the case where \(\theta_{t_j}(\omega) \equiv B[t_j]\). Then
\[
\begin{aligned}
&E\Bigl[\sum_j f''(B[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - \sum_j f''(B[t_j])(t_{j+1} - t_j)\Bigr]^2 \\
&\quad= E\Bigl[\sum_j f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\Bigr]^2 \\
&\quad= \sum_j E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\Bigr]^2 \\
&\qquad+ 2\sum_{j<k} E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\, f''(B[t_k])\Bigl(\bigl(B[t_{k+1}] - B[t_k]\bigr)^2 - (t_{k+1} - t_k)\Bigr)\Bigr].
\end{aligned}
\]
The cross terms vanish because, for any \(j < k\), by the tower property,
\[
\begin{aligned}
&E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\, f''(B[t_k])\Bigl(\bigl(B[t_{k+1}] - B[t_k]\bigr)^2 - (t_{k+1} - t_k)\Bigr)\Bigr] \\
&\quad= E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\, f''(B[t_k])\, E\Bigl(\bigl(B[t_{k+1}] - B[t_k]\bigr)^2 - (t_{k+1} - t_k) \,\Big|\, \mathcal{F}_{t_k}\Bigr)\Bigr] = 0,
\end{aligned}
\]
where the inner conditional expectation vanishes because \(\{B_t^2 - t\}_{t\ge 0}\) is a martingale. To simplify the diagonal terms, observe that
\[
E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\Bigr]^2
= E\Bigl[f''(B[t_j])^2\, E\Bigl(\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)^2 \,\Big|\, \mathcal{F}_{t_j}\Bigr)\Bigr]
= 2(t_{j+1} - t_j)^2\, E\bigl[f''(B[t_j])^2\bigr],
\]
since the centered squared increment has conditional second moment \(2(t_{j+1} - t_j)^2\). It follows that
\[
\sum_j E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\Bigr]^2
= 2\sum_j (t_{j+1} - t_j)^2\, E\bigl[f''(B[t_j])^2\bigr]
\le 2\|P\| \sum_j (t_{j+1} - t_j)\, E\bigl[f''(B[t_j])^2\bigr]
\le 2t\|P\| \max_j E\bigl[f''(B[t_j])^2\bigr].
\]
(a) Suppose that \(f''\) is bounded, i.e., \(|f''(x)| \le K\) for any \(x \in \mathbb{R}\). Then as \(\|P\|\to 0\),
\[
\sum_j E\Bigl[f''(B[t_j])\Bigl(\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr)\Bigr]^2 \le 2t\|P\|K^2 \to 0,
\]
which means that \(\sum_j f''(B[t_j])(B[t_{j+1}] - B[t_j])^2 - \sum_j f''(B[t_j])(t_{j+1} - t_j) \to 0\) in \(L^2\), and therefore in probability. Also, by Lebesgue integration knowledge, since \(f''\) is continuous and \(B_t\) is almost surely continuous, the Riemann sum \(\sum_j f''(B[t_j])(t_{j+1} - t_j)\) converges to \(\int_0^t f''(B_u)\, du\) almost surely. As a result,
\[
\sum_j f''(B[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 \xrightarrow{P} \int_0^t f''(B_u)\, du.
\]

(b) Now consider the case where \(f''\) is unbounded. We wish to show that
\[
\sum_j f''(B[t_j])\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr] \xrightarrow{P} 0.
\]
We apply the truncation technique, so that for any \(K\),
\[
\begin{aligned}
&\sum_j f''(B[t_j])\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr] \\
&\quad= \sum_j f''(B[t_j])\mathbf{1}\{|f''(B[t_j])| \le K\}\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr] \\
&\qquad+ \sum_j f''(B[t_j])\mathbf{1}\{|f''(B[t_j])| > K\}\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr].
\end{aligned}
\]
By applying the result in part (a), as \(\|P\|\to 0\),
\[
\sum_j f''(B[t_j])\mathbf{1}\{|f''(B[t_j])| \le K\}\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr] \xrightarrow{P} 0.
\]
The remainder term can be upper bounded as follows:
\[
\begin{aligned}
&\Bigl|\sum_j f''(B[t_j])\mathbf{1}\{|f''(B[t_j])| > K\}\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr]\Bigr| \\
&\quad\le \max_{0\le u\le t} |f''(B_u)|\mathbf{1}\{|f''(B_u)| > K\} \cdot \Bigl[\sum_j \bigl(B[t_{j+1}] - B[t_j]\bigr)^2 + t\Bigr].
\end{aligned}
\]
Applying Theorem 8.1 gives \(\sum_j (B[t_{j+1}] - B[t_j])^2 \to t\) in probability. By the continuity of \(f''\) and of \(B_t\) on the interval \([0,t]\),
\[
\max_{0\le u\le t} |f''(B_u)|\mathbf{1}\{|f''(B_u)| > K\} \to 0, \quad \text{as } K \to \infty.
\]
Hence, letting first \(\|P\|\to 0\) and then \(K\to\infty\),
\[
\sum_j f''(B[t_j])\Bigl[\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - (t_{j+1} - t_j)\Bigr] \xrightarrow{P} 0.
\]
As a result,
\[
\sum_j f''(B[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 \xrightarrow{P} \int_0^t f''(B_u)\, du.
\]

• Now consider the case where \(\theta_{t_j} \ne B[t_j]\). It remains to show that \(\sum_j f''(\theta[t_j])(B[t_{j+1}] - B[t_j])^2\) converges to \(\int_0^t f''(B_u)\, du\) in probability:
\[
\begin{aligned}
\Bigl|\sum_j f''(\theta[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 - \sum_j f''(B[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr)^2\Bigr|
&= \Bigl|\sum_j \bigl[f''(\theta[t_j]) - f''(B[t_j])\bigr]\bigl(B[t_{j+1}] - B[t_j]\bigr)^2\Bigr| \\
&\le \max_j \bigl|f''(\theta[t_j]) - f''(B[t_j])\bigr| \cdot \sum_j \bigl(B[t_{j+1}] - B[t_j]\bigr)^2.
\end{aligned}
\]
By Theorem 8.1 we can see that \(\sum_j (B[t_{j+1}] - B[t_j])^2 \xrightarrow{P} t\). For almost all \(\omega \in \Omega\), by the continuity and thus uniform continuity of \(f''(B_\cdot)\) on \([0,t]\),
\[
\max_j \bigl|f''(\theta[t_j]) - f''(B[t_j])\bigr| \to 0.
\]
Hence,
\[
\sum_j f''(\theta[t_j])\bigl(B[t_{j+1}] - B[t_j]\bigr)^2 \xrightarrow{P} \int_0^t f''(B_u)\, du. \qquad\blacksquare
\]
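Theorem 12.1 can also be illustrated numerically: on a fine grid, \(f(B_t)\) minus the two discretized integrals should be small. A sketch of my own (assuming NumPy, left-point Ito sums, and the arbitrary choice \(f = \cos\), so \(f' = -\sin\) and \(f'' = -\cos\)):

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 400_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

lhs = np.cos(B[-1])                                # f(B_t)
stoch = np.sum(-np.sin(B[:-1]) * dB)               # int_0^t f'(B_u) dB_u (left-point sum)
drift = 0.5 * np.sum(-np.cos(B[:-1])) * dt         # (1/2) int_0^t f''(B_u) du

err = abs(lhs - (np.cos(0.0) + stoch + drift))
```

The residual shrinks as the grid is refined, in line with the convergence in probability established above.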

Definition 12.1 [Ito Processes] An Ito process is a stochastic process \(X_\bullet\) on \((\Omega, \mathcal{F}, \mathbb{F}, P)\) of the form
\[
X_t = X_0 + \int_0^t \mu[u]\, du + \int_0^t \sigma[u]\, dB[u], \quad 0 \le t \le T, \tag{12.4}
\]
where \(X_0\) is an \(\mathcal{F}_0\)-measurable random variable; \(\{\mu_t\}_{t\ge 0}\) is \(\{\mathcal{F}_t\}\)-adapted so that
\[
P\Bigl(\int_0^t |\mu_u|\, du < \infty, \ \text{for all } t \ge 0\Bigr) = 1;
\]
and \(\{\sigma_t\}_{t\ge 0}\) is \(\{\mathcal{F}_t\}\)-adapted so that
\[
P\Bigl(\int_0^t \sigma_u^2\, du < \infty, \ \text{for all } t \ge 0\Bigr) = 1.
\]
Sometimes we also write (12.4) in the stochastic differential form
\[
dX_t = \mu_t\, dt + \sigma_t\, dB_t.
\]

Now we are ready to show the main result of this lecture:

Theorem 12.2 — Ito Formula for 1 Dimension. Let \(\{X_t\}_{t\ge 0}\) be an Ito process with stochastic differential
\[
dX_t = \mu_t\, dt + \sigma_t\, dB_t.
\]
Let \(f \in C^2(\mathbb{R})\); then \(f(X_t)\) is again an Ito process, with the following equality holding almost surely:
\[
f(X_t) = f(X_0) + \int_0^t f'(X_u)\, dX_u + \frac12\int_0^t f''(X_u)(dX_u)^2.
\]
In particular, \((dX_t)^2\) can be computed according to the rules
\[
(dt)^2 = (dt)(dB_t) = (dB_t)(dt) = 0, \quad (dB_t)^2 = dt.
\]
So we further have the representation
\[
f(X_t) = f(X_0) + \int_0^t f'(X_u)\mu_u\, du + \frac12\int_0^t f''(X_u)\sigma_u^2\, du + \int_0^t f'(X_u)\sigma_u\, dB_u, \quad \forall t \ge 0.
\]

R The Ito process \(\{f(X_t)\}_{t\ge 0}\) has the stochastic differential form
\[
df(X_t) = \Bigl(f'(X_t)\mu_t + \frac12 f''(X_t)\sigma_t^2\Bigr)\, dt + f'(X_t)\sigma_t\, dB_t.
\]

The proof follows a similar idea to Theorem 12.1; here we only provide a sketch of the proof:

Outline of Proof. Let \(P = \{t_0 < t_1 < \cdots < t_n = t\}\) be a partition of \([0,t]\); then \(f(X_t)\) admits the expansion
\[
\begin{aligned}
f(X_t) &= f(X_0) + \sum_j \bigl[f(X[t_{j+1}]) - f(X[t_j])\bigr] \\
&= f(X_0) + \sum_j \Bigl[f'(X[t_j])\bigl(X[t_{j+1}] - X[t_j]\bigr) + \frac12 f''(\theta[t_j])\bigl(X[t_{j+1}] - X[t_j]\bigr)^2\Bigr]. \tag{12.5}
\end{aligned}
\]
(a) First consider the case where \(\{\mu_t\}\) and \(\{\sigma_t\}\) are simple processes; then
\[
X[t_{j+1}] - X[t_j] = \mu[t_j](t_{j+1} - t_j) + \sigma[t_j]\bigl(B[t_{j+1}] - B[t_j]\bigr).
\]
Substituting this into (12.5) gives
\[
f(X_t) = f(X_0) + \sum_j f'(X[t_j])\mu[t_j](t_{j+1} - t_j) + \sum_j f'(X[t_j])\sigma[t_j]\bigl(B[t_{j+1}] - B[t_j]\bigr) + \frac12\sum_j f''(\theta[t_j])\bigl(X[t_{j+1}] - X[t_j]\bigr)^2.
\]
Assume first that \(\theta[t_j] = X[t_j]\); then we show \(\sum_j f''(X[t_j])(X[t_{j+1}] - X[t_j])^2 \xrightarrow{P} \sum_j f''(X[t_j])\sigma[t_j]^2(t_{j+1} - t_j)\), followed by
\[
\frac12\sum_j f''(X[t_j])\sigma[t_j]^2(t_{j+1} - t_j) \xrightarrow{a.s.} \frac12\int_0^t f''(X_u)\sigma_u^2\, du.
\]
For the case where \(\theta[t_j] \ne X[t_j]\), we show that
\[
\sum_j f''(\theta[t_j])\bigl(X[t_{j+1}] - X[t_j]\bigr)^2 - \sum_j f''(X[t_j])\bigl(X[t_{j+1}] - X[t_j]\bigr)^2 \xrightarrow{a.s.} 0.
\]
(b) For general processes \(\{\mu_t\}_{t\ge 0}\) and \(\{\sigma_t\}_{t\ge 0}\), we use approximation by simple processes. ∎

■ Example 12.2 Let \(\{X_t\}_{t\ge 0}\) be the drifted Brownian motion
\[
X_t = \mu t + B_t, \quad \mu \in \mathbb{R}.
\]
We apply Ito's formula to compute the stochastic differential form of \(X^2\):
• Take \(f(x) = x^2\), so \(f'(x) = 2x\) and \(f''(x) = 2\). Hence,
\[
df(X_t) = dX_t^2 = (2\mu X_t + 1)\, dt + 2X_t\, dB_t.
\]
∎
The Ito’s formula can also be generalized into multiple processes:

(1) (d)
Theorem 12.3 — Ito’s Formula. Let { Xt }t 0 , . . . , { Xt }t 0 be continuous semi-

martingales, and f 2 C 2 (R ), then

(1) (d) (1) (d)


f ( X t , . . . , X t ) = f ( X0 , . . . , X0 )
d Z t
∂f (1) (d) ( j)
+Â ( Xu , . . . , Xu ) dXu
j =1 0
∂X j
Z t
1 d ∂2 f (1) (d)
2 j,kÂ
+ ( Xu , . . . , Xu ) dh X ( j ) , X ( k ) i u
=1 0 ∂X j ∂Xk

In particular, dh Bi , Bj i = 1(i = j) dt and dh Bi , ti = dht, Bi i = 0.

Recall that in the discrete case, any real-valued process is a semi-martingale, while this is not true in the continuous case. We define the semi-martingale in the continuous case as follows:

Definition 12.2 [Semi-martingale] We say \(X_\bullet\) is a semi-martingale if it admits the decomposition
\[
X_t = A_t + M_t,
\]
where \(M_\bullet\) is a continuous local martingale, and \(A_\bullet\) is an adapted process of finite variation:
\[
|A|(t) \equiv \sup_{\delta>0,\ t_0=0}\ \sup_{t_n - t_{n-1} \le \delta}\ \sum_{n=1}^{\infty} \mathbf{1}(t_n \le t)\, |A[t_n] - A[t_{n-1}]| < \infty, \quad \forall t \ge 0.
\]

R Suppose that \(X^{(j)}\) admits the decomposition \(X^{(j)}_t = A^{(j)}_t + M^{(j)}_t\); then
\[
\sum_{j=1}^d \int_0^t \frac{\partial f}{\partial X_j}(X^{(1)}_u, \dots, X^{(d)}_u)\, dX^{(j)}_u
= \sum_{j=1}^d \int_0^t \frac{\partial f}{\partial X_j}(X^{(1)}_u, \dots, X^{(d)}_u)\, dM^{(j)}_u
+ \sum_{j=1}^d \int_0^t \frac{\partial f}{\partial X_j}(X^{(1)}_u, \dots, X^{(d)}_u)\, dA^{(j)}_u.
\]
Moreover, we can show that \(\langle X^{(j)}, X^{(k)}\rangle = \langle M^{(j)}, M^{(k)}\rangle\). Then the process \(\{f(X^{(1)}_t, \dots, X^{(d)}_t)\}_{t\ge 0}\) also admits the semi-martingale decomposition
\[
f(X^{(1)}_t, \dots, X^{(d)}_t) = A^f_t + M^f_t,
\]
where
\[
\begin{aligned}
M^f_t &= f(X^{(1)}_0, \dots, X^{(d)}_0) + \sum_{j=1}^d \int_0^t \frac{\partial f}{\partial X_j}(X^{(1)}_u, \dots, X^{(d)}_u)\, dM^{(j)}_u, \\
A^f_t &= \sum_{j=1}^d \int_0^t \frac{\partial f}{\partial X_j}(X^{(1)}_u, \dots, X^{(d)}_u)\, dA^{(j)}_u
+ \frac12\sum_{j,k=1}^d \int_0^t \frac{\partial^2 f}{\partial X_j \partial X_k}(X^{(1)}_u, \dots, X^{(d)}_u)\, d\langle M^{(j)}, M^{(k)}\rangle_u.
\end{aligned}
\]

■ Example 12.3 We can also apply Theorem 12.3 to obtain an "integration by parts" formula. Suppose that \(X_\bullet\) and \(Y_\bullet\) are semi-martingales; then by direct computation with \(f(X_t, Y_t) = X_t Y_t\),
\[
X_t Y_t - X_0 Y_0 = \int_0^t X_s\, dY_s + \int_0^t Y_s\, dX_s + \langle X, Y\rangle_t.
\]
Suppose that \(X_t\) and \(Y_t\) admit the semi-martingale decompositions
\[
X_t = M_t + A_t, \quad Y_t = N_t + W_t;
\]
then \(XY\) also admits the semi-martingale decomposition
\[
X_t Y_t = X_0 Y_0 + \Bigl[\int_0^t X_s\, dN_s + \int_0^t Y_s\, dM_s\Bigr] + \Bigl[\int_0^t X_s\, dW_s + \int_0^t Y_s\, dA_s + \langle M, N\rangle_t\Bigr].
\]
∎

12.1.2. Applications of Ito’s Formula


Here we present some examples of how to use Ito's formula.

■ Example 12.4 We aim to solve the following stochastic differential equation, with \(X_t = M_t + A_t\) being a continuous semi-martingale:
\[
dZ_t = Z_t\, dX_t, \quad Z_0 = 1. \tag{12.6}
\]
This equation is called the stochastic exponential of \(X\), and it can be re-written in integral form:
\[
Z_t = 1 + \int_0^t Z_u\, dX_u, \tag{12.7}
\]
where the integration refers to the Ito integral. We guess that the solution should be \(Z_t = \exp(X_t + V_t)\), with \(V_t\) to be determined. Applying Ito's formula to \(Z_t\) with \(f(z) = \exp(z)\) (and \(z = X + V\)) gives
\[
Z_t = 1 + \int_0^t Z_u\, d(X_u + V_u) + \frac12\int_0^t Z_u\, d\langle X+V\rangle_u
= 1 + \int_0^t Z_u\, dX_u + \int_0^t Z_u\, dV_u + \frac12\int_0^t Z_u\, d\langle M\rangle_u,
\]
where we used that \(V\) and \(A\) have bounded variation, so \(\langle X+V\rangle = \langle M\rangle\). In order to satisfy (12.7), we take \(V_t = -\frac12\langle M\rangle_t\). As a result, \(\exp\bigl(X_t - \frac12\langle M\rangle_t\bigr)\) is the solution to (12.7), which is called the stochastic exponential. ∎
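For the special case \(X_t = B_t\) (so \(M = B\) and \(\langle M\rangle_t = t\)), the stochastic exponential is \(Z_t = \exp(B_t - t/2)\), and the integral equation (12.7) can be checked numerically with left-point Ito sums. A sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
n, t = 400_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))
grid = np.linspace(0.0, t, n + 1)

Z = np.exp(B - 0.5 * grid)               # candidate solution exp(X_t - <M>_t / 2)
integral = np.sum(Z[:-1] * dB)           # left-point sum for int_0^t Z_u dB_u

err = abs(Z[-1] - (1.0 + integral))      # residual of Z_t = 1 + int_0^t Z_u dX_u
```

The residual is small on a fine grid and vanishes as the mesh is refined.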

■ Example 12.5 [Levy's Characterization of Brownian Motion] Levy's theorem states that the quadratic variational process of a continuous local martingale characterizes Brownian motion uniquely:

Theorem 12.4 — Levy's Theorem. Consider a filtered probability space \((\Omega, \mathcal{F}, \mathbb{F}, P)\) satisfying the usual conditions. Let \(\{M_t\}_{t\ge 0}\) be a stochastic process on this filtered probability space, with \(M_0 = 0\) almost surely. Then the process \(\{M_t\}_{t\ge 0}\) is a Brownian motion if and only if:

1. \(\{M_t\}_{t\ge 0}\) is a continuous local martingale with respect to \(\mathbb{F}\);

2. The quadratic variation \(\langle M\rangle_t = t\) almost surely.

Proof. In order to show the sufficiency, we construct \(Z^{(\zeta)}_t = \exp\bigl(\zeta M_t - \frac{\zeta^2}{2}t\bigr)\). By Ito's formula applied to \(Z^{(\zeta)}_t\) with \(f(x) = e^x\), for \(0 \le s < t\), we have
\[
dZ^{(\zeta)}_t = \zeta Z^{(\zeta)}_t\, dM_t.
\]
This indicates that \(Z^{(\zeta)}_t\) is a martingale, by the uniqueness of the semi-martingale decomposition. Re-arranging the identity \(E[Z^{(\zeta)}_t \mid \mathcal{F}_s] = Z^{(\zeta)}_s\) yields
\[
E\bigl[\exp(\zeta(M_t - M_s)) \mid \mathcal{F}_s\bigr] = \exp\Bigl(\frac{\zeta^2}{2}(t - s)\Bigr),
\]
which, together with the uniqueness of the characteristic function, shows that \(M_t\) is a Brownian motion. ∎


A third application shows that a continuous local martingale, after a suitable time-change, is a Brownian motion:

Theorem 12.5 Consider a stochastic process \(\{M_t\}_{t\ge 0}\) on \((\Omega, \mathcal{F}, \mathbb{F}, P)\), where \(\{M_t\}_{t\ge 0}\) is a continuous local martingale with initial value equal to 0 and \(\langle M\rangle_\infty = \infty\). Let \(\tau_t = \inf\{u : \langle M\rangle_u > t\}\). Then for every \(t \ge 0\), \(\tau_t\) is a stopping time, and \(B_t \equiv M_{\tau_t}\) is an \(\mathcal{F}_{\tau_t}\)-Brownian motion with \(M_t = B_{\langle M\rangle_t}\).

Proof Outline. We call the increasing sequence of stopping times \(\{\tau_t\}_{t\ge 0}\) the time-change. Since \(\langle M\rangle_\infty = \infty\) almost surely, we can assert that each \(\tau_t\) is finite almost surely. By the continuity of \(\langle M\rangle\), we have \(\langle M\rangle_{\tau_t} = t\). Applying the optional sampling theorem to \(\{M_{u\wedge\tau_t}\}_{u\ge 0}\) gives
\[
E[M_{\tau_t} \mid \mathcal{F}_{\tau_u}] = M_{\tau_u}, \quad u \le t,
\]
which implies that \(B_t\) is an \(\mathcal{F}_{\tau_t}\)-local martingale. Applying the optional sampling theorem to \(\{M^2_{u\wedge\tau_t} - \langle M\rangle_{u\wedge\tau_t}\}_{u\ge 0}\) gives
\[
E\bigl[M^2_{\tau_t} - \langle M\rangle_{\tau_t} \mid \mathcal{F}_{\tau_u}\bigr] = M^2_{\tau_u} - \langle M\rangle_{\tau_u}.
\]
This implies that \(\{B^2_t - t\}\) is an \(\mathcal{F}_{\tau_t}\)-local martingale. By Levy's characterization, together with the continuity of \(B_t\), we conclude that \(\{B_t\}_{t\ge 0}\) is an \(\mathcal{F}_{\tau_t}\)-Brownian motion. ∎

■ Example 12.6 Consider a probability space \((\Omega, \mathcal{F}, \mathbb{F}, P)\). Let \(T > 0\) and \(Q\) be a probability measure on \((\Omega, \mathcal{F}_T)\) that is absolutely continuous with respect to \(P\). Denote \(\zeta = \frac{dQ}{dP}\big|_{\mathcal{F}_T}\). Then for any \(\mathcal{F}_T\)-measurable bounded random variable \(X\), we have
\[
\int_\Omega X(\omega)\, dQ(\omega) = \int_\Omega X(\omega)\,\zeta(\omega)\, dP(\omega),
\]
or equivalently, \(E_{X\sim Q}[X] = E_{X\sim P}[\zeta X]\).

Conversely, let \(T > 0\) and \(\{Z_t\}_{t\ge 0}\) be a continuous martingale on \((\Omega, \mathcal{F}, \mathbb{F}, P)\) with \(Z_0 \equiv 1\) and \(Z_t(\omega) > 0\) for all \((t, \omega) \in [0,T]\times\Omega\). Define a probability measure \(Q\) on \((\Omega, \mathcal{F}_T)\) by \(Q(A) = E[Z_T \mathbf{1}_A]\), \(A \in \mathcal{F}_T\). Then we have \(\frac{dQ}{dP}\big|_{\mathcal{F}_T} = Z_T\). Because \(\{Z_t\}_{t\ge 0}\) is a martingale, for any \(t \le T\) we have
\[
\frac{dQ}{dP}\Big|_{\mathcal{F}_t} = Z_t.
\]

Theorem 12.6 — Girsanov. Let \(M_\bullet\) be a continuous local martingale on \((\Omega, \mathcal{F}, \mathbb{F}, P)\) with \(0 \le t \le T\), and let \(Z_\bullet\) be a continuous martingale, strictly positive, with initial value equal to 1. Then the process
\[
X_t \equiv M_t - \int_0^t \frac{1}{Z_u}\, d\langle M, Z\rangle_u
\]
is a continuous local martingale on \((\Omega, \mathcal{F}, \mathbb{F}, Q)\).

Proof Outline. By the technique of localization, we may assume that \(M\), \(Z\), and \(\frac{1}{Z}\) are all bounded. It suffices to show that \(X\) is a martingale w.r.t. the probability measure \(Q\), i.e., for any \(0 \le u < t \le T\), \(E_Q[\mathbf{1}_A(X_t - X_u)] = 0\) for all \(A \in \mathcal{F}_u\). By definition,
\[
E_Q[\mathbf{1}_A(X_t - X_u)] = E_P[\mathbf{1}_A(Z_t X_t - Z_u X_u)].
\]
It remains to show that \(\{Z_t X_t\}\) is a martingale w.r.t. the probability measure \(P\). Applying the integration by parts gives
\[
\begin{aligned}
Z_t X_t &= Z_0 X_0 + \int_0^t Z_s\, dX_s + \int_0^t X_s\, dZ_s + \langle X, Z\rangle_t \\
&= Z_0 X_0 + \int_0^t Z_s\Bigl(dM_s - \frac{1}{Z_s}\, d\langle M, Z\rangle_s\Bigr) + \int_0^t X_s\, dZ_s + \langle X, Z\rangle_t \\
&= Z_0 X_0 + \int_0^t Z_s\, dM_s + \int_0^t X_s\, dZ_s,
\end{aligned}
\]
using \(\langle X, Z\rangle = \langle M, Z\rangle\). By the uniqueness of the semi-martingale decomposition, we can see that \(\{Z_t X_t\}\) is a martingale. ∎

Motivation for the Martingale Representation Theorem. The last application we will discuss is the martingale representation theorem. Previously we noticed that if the stochastic process \(\{f_t\}_{t\ge 0} \in L^2\), i.e., is square-integrable, then the process \(X_t = X_0 + \int_0^t f_s\, dB_s\) is always a martingale w.r.t. \(\mathcal{F}_t\).¹ The martingale representation theorem states that the converse is also true: any \(\mathcal{F}_t\)-martingale can be represented as an Ito integral.

Theorem 12.7 — Martingale Representation Theorem. Let \(B_\bullet\) be a standard Brownian motion on a complete probability space \((\Omega, \mathcal{F}, P)\), and let \(\mathbb{F}^0 \equiv \{\mathcal{F}^0_t\}_{t\ge 0}\) be the natural filtration generated by \(B_\bullet\), with \(\mathcal{F}^0_\infty = \sigma\bigl(\bigcup_{t\ge 0} \mathcal{F}^0_t\bigr)\). Let \(\mathcal{F}_\infty, \mathcal{F}_t\) be the completions of \(\mathcal{F}^0_\infty, \mathcal{F}^0_t\), respectively, and denote \(\mathbb{F} \equiv \{\mathcal{F}_t\}_{t\ge 0}\). Then consider a square-integrable martingale \(M_\bullet\) on the filtered probability space \((\Omega, \mathcal{F}, \mathbb{F}, P)\). There exists a stochastic process \(\{f_t\}_{t\ge 0} \in L^2\) so that
\[
M_t = E[M_0] + \int_0^t f_u\, dB_u.
\]

This theorem relies on an important auxiliary result:

Let \(\zeta \in L^2(\Omega, \mathcal{F}_T, P)\); then there exists \(f_\bullet \in L^2\) so that
\[
\zeta = E[\zeta] + \int_0^T f_t\, dB_t.
\]

Proof for Theorem 12.7. Assume the claim is true; then for the martingale \(M_\bullet\) with \(M_T \in L^2(\Omega, \mathcal{F}_T, P)\), there exists \(\{f_t\} \in L^2\) so that
\[
M_T = E[M_T] + \int_0^T f_t\, dB_t = E[M_0] + \int_0^T f_t\, dB_t.
\]
Hence for any \(t \in [0,T]\), by the martingale property of \(M_\bullet\),
\[
M_t = E[M_T \mid \mathcal{F}_t] = E[M_0] + \int_0^t f_u\, dB_u. \qquad\blacksquare
\]

This key auxiliary result can be shown by applying Ito’s formula:

¹This can be shown either by directly applying the definition or by the uniqueness of the semi-martingale decomposition.

Proof of the Claim. Let \(T > 0\) and define the stochastic exponential martingale for any \(f \in L^2([0,T])\):
\[
\mathcal{E}[f]_t = \exp\Bigl\{\int_0^t f(u)\, dB_u - \frac12\int_0^t f^2(u)\, du\Bigr\}, \quad t \in [0,T].
\]
Then we can show that \(\ell \equiv \mathrm{span}\{\mathcal{E}[f]_T : f \in L^2([0,T])\}\) is dense in the space \(L^2(\Omega, \mathcal{F}_T, P)\). It remains to verify the claim when \(\zeta = \mathcal{E}[f]_T\) for some \(f \in L^2([0,T])\). Directly applying Ito's formula gives
\[
d\mathcal{E}[f]_t = f(t)\mathcal{E}[f]_t\, dB_t \implies \mathcal{E}[f]_T = 1 + \int_0^T f(t)\mathcal{E}[f]_t\, dB_t. \qquad\blacksquare
\]

12.2. Thursday

12.2.1. Introduction to SDE


This week, we will introduce some concepts of stochastic differential equations (SDEs). We will talk about how to solve some simple SDEs. We will also study some important SDEs from applications that admit no explicit solution, and establish the theorem on existence and uniqueness of solutions.

Motivation. An SDE is usually regarded as an ODE plus a stochastic perturbation driven by Brownian motion, also called "noise". The following ODE characterizes population growth in practice:
\[
\frac{dS_t}{S_t} = r\, dt.
\]
This simple ODE admits a deterministic solution \(S_t\). The corresponding SDE is \(\frac{dS_t}{S_t} = r\, dt + \sigma\, dB_t\), called the Black–Scholes equation, in which the growth rate is a constant plus a random perturbation.

In this lecture, we will consider SDEs of Markovian type, i.e., the parameters \(\mu\) and \(\sigma\) at time index \(t\) depend only on \((t, X_t)\) instead of the whole past \(\{X_u\}_{u<t}\):
\[
dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dB_t, \tag{12.8}
\]
where \(B_\bullet\) is a standard Brownian motion, and \(X_\bullet\) is the unknown continuous process. The differential form of an SDE has no intrinsic meaning; only the integral form does. Therefore, the above equation refers to the following equation in integral form:
\[
X_t - X_0 = \int_0^t \mu(u, X_u)\, du + \int_0^t \sigma(u, X_u)\, dB_u, \tag{12.9}
\]
where \(\mu: [0,\infty)\times\mathbb{R} \to \mathbb{R}\) and \(\sigma: [0,\infty)\times\mathbb{R} \to \mathbb{R}\). A particular situation is when \(\mu(t, X_t) = \mu(X_t)\) and \(\sigma(t, X_t) = \sigma(X_t)\), i.e., \(\mu: \mathbb{R} \to \mathbb{R}\) and \(\sigma: \mathbb{R} \to \mathbb{R}\). We call SDEs of this type time-homogeneous Markovian, or of Ito type.

Definition 12.3 [Solution to an SDE] Given the functions \(\mu, \sigma\) defined above, a solution to (12.8) is a pair \((X, B)\) on a filtered probability space \((\Omega, \mathcal{F}, \mathbb{F}, P)\) satisfying:

• \(B_\bullet\) is a standard \(\mathbb{F}\)-Brownian motion;

• equation (12.9) holds.

R In addition to the unknown process \(X_\bullet\), the probability space and the Brownian motion are also parts of the solution. Only the coefficients \(\mu\) and \(\sigma\) are given.

There are two notions of uniqueness for an SDE:

Definition 12.4 [Pathwise Uniqueness] The solution to (12.8) has pathwise uniqueness if, given the initial value \(X_0\) and the Brownian motion \(\{B_t\}_{t\ge 0}\) on a filtered probability space \((\Omega, \mathcal{F}, \mathbb{F}, P)\), there is a unique stochastic process \(X_\bullet\), pathwise, satisfying the equation. In other words, if two solutions \((X, B)\) and \((X', B')\) have the same initial value and driving noise, i.e., \(X_0 = X'_0\) and \(B = B'\), then \(X_t = X'_t\) almost surely for any \(t \ge 0\). ∎

Definition 12.5 [Uniqueness in Law] The solution to (12.8) has uniqueness in law if two solutions \(X, X'\) with the same initial distribution are equivalent in law, i.e., \(X\) and \(X'\) have the same finite-dimensional distributions. ∎

Definition 12.6 [Strong/Weak Solution] A solution \((X, B)\) to equation (12.8) is called a strong solution if \(X\) is adapted to the filtration \(\mathbb{F}^B\) generated by the Brownian motion \(B_\bullet\) (with completion). A solution that is not a strong solution is called a weak solution. ∎

R If a strong solution exists, then the SDE always has a solution for any given probability space and Brownian motion. In other words, when a strong solution exists, any initial value and Brownian motion correspond to at least one solution. If one has pathwise uniqueness together with the existence of a strong solution, then any initial value and Brownian motion correspond to a unique solution.

The following example illustrates an important phenomenon for SDEs: sometimes a solution exists but there is no strong solution; sometimes pathwise uniqueness fails while uniqueness in law still holds.

■ Example 12.7 [The Tanaka Equation] Consider the 1-dimensional equation
\[
X_t = \int_0^t \mathrm{sign}(X_u)\, dB_u, \quad 0 \le t < \infty, \tag{12.10}
\]
where \(\mathrm{sign}(x) = \mathbf{1}\{x \ge 0\} - \mathbf{1}\{x < 0\}\) denotes the sign function. It corresponds to the SDE
\[
dX_t = \mathrm{sign}(X_t)\, dB_t, \quad X_0 = 0. \tag{12.11}
\]
We have the following conclusions on this SDE:

1. The solution to (12.11) has uniqueness in law: take \(X\) to be a standard Brownian motion. Any other solution \(X'\) satisfying (12.10) is a continuous local martingale with quadratic variation \(\langle X'\rangle_t = \int_0^t \mathrm{sign}(X'_u)^2\, du = \int_0^t du = t\). By Levy's theorem, \(X\) and \(X'\) share the same distribution.

2. A weak solution exists: choose \(X\) to be any Brownian motion \(\{W_t\}_{t\ge 0}\), and define \(\tilde{B}_t = \int_0^t \mathrm{sign}(W_s)\, dW_s\), i.e., \(d\tilde{B}_t = \mathrm{sign}(X_t)\, dX_t\). Then \(\tilde{B}_\bullet\) is also a Brownian motion. Moreover, since \(\mathrm{sign}(x)^2 = 1\),
\[
dX_t = \mathrm{sign}(X_t)\, d\tilde{B}_t.
\]
Hence, the pair \((W, \tilde{B})\) is a weak solution.

3. Pathwise uniqueness does not hold: when \((X, B)\) is a solution, then \((-X, B)\) is also a solution.

4. There is no strong solution. ∎
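The weak-solution construction in point 2 is easy to simulate. The sketch below is my own illustration (assuming NumPy): it builds \(\tilde{B}\) from a Brownian path \(W\) by discrete Ito sums. Since \(\mathrm{sign}(x)^2 = 1\), the relation \(dW = \mathrm{sign}(W)\,d\tilde{B}\) holds exactly increment-by-increment, while \(\tilde{B}_1\) is empirically standard normal across paths, consistent with \(\tilde{B}\) being a Brownian motion:

```python
import numpy as np

rng = np.random.default_rng(7)
paths, n, t = 5_000, 1_000, 1.0
dt = t / n
dW = rng.normal(0.0, np.sqrt(dt), (paths, n))
W = np.hstack([np.zeros((paths, 1)), np.cumsum(dW, axis=1)])

sgn = np.where(W[:, :-1] >= 0.0, 1.0, -1.0)   # sign(W) at left endpoints
dB_tilde = sgn * dW                           # increments of B~ = int sign(W) dW
B1 = np.sum(dB_tilde, axis=1)                 # B~ at time 1, for each path

mean_B1, var_B1 = B1.mean(), B1.var()         # should be close to 0 and 1

# sign(x)^2 = 1, so reconstructing X from dX = sign(X) dB~ returns W exactly
recon = np.cumsum(sgn * dB_tilde, axis=1)[:, -1]
recon_gap = np.max(np.abs(recon - W[:, -1]))
```

Note the simulation cannot distinguish \((W, \tilde{B})\) from \((-W, \tilde{B})\) in law, which is exactly the failure of pathwise uniqueness in point 3.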

Next we will discuss how to solve some simple SDEs:

■ Example 12.8 [Ornstein–Uhlenbeck Process] Consider solving the SDE
\[
dX_t = -aX_t\, dt + \sigma\, dB_t, \quad X_0 = 1,
\]
where \(a, \sigma\) are non-negative constants.
• Take \(Y_t = e^{at}X_t\), where \(e^{at}\) can be viewed as an integrating factor. Applying Ito's formula to \(Y_t\) with \(f(x, t) = e^{at}x\) gives
\[
dY_t = ae^{at}X_t\, dt + e^{at}\, dX_t
= ae^{at}X_t\, dt + e^{at}(-aX_t)\, dt + e^{at}\sigma\, dB_t
= \sigma e^{at}\, dB_t.
\]
It follows that
\[
Y_t = Y_0 + \sigma\int_0^t e^{au}\, dB_u \implies X_t = e^{-at}Y_t = e^{-at} + \sigma\int_0^t e^{a(u-t)}\, dB_u.
\]
∎
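The closed form can be compared against a direct Euler–Maruyama discretization of the SDE driven by the same Brownian increments. A sketch of my own (assuming NumPy; the parameter values a = 2, σ = 0.5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
a, sigma, t, n = 2.0, 0.5, 1.0, 10_000
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
grid = np.linspace(0.0, t, n + 1)

# Euler-Maruyama for dX = -a X dt + sigma dB, X_0 = 1
X = np.empty(n + 1)
X[0] = 1.0
for j in range(n):
    X[j + 1] = X[j] - a * X[j] * dt + sigma * dB[j]

# closed form X_t = e^{-at} + sigma * int_0^t e^{a(u-t)} dB_u, discretized
X_exact = np.exp(-a * t) + sigma * np.sum(np.exp(a * (grid[:-1] - t)) * dB)

gap = abs(X[-1] - X_exact)
```

The two values agree up to the (small) discretization error of the Euler scheme.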

⌅ Example 12.9 [Geometric Brownian Motion] Consider the SDE

$$dX_t = \mu X_t\, dt + \sigma X_t\, dB_t, \quad X_0 = 1,$$

where $\mu, \sigma$ are constants. We claim that there is a unique strong solution $X_t = e^{(\mu - \sigma^2/2)t + \sigma B_t}$.

• To check it is indeed a solution, applying Ito's formula to $X_t$ with $f(t, B) = e^{(\mu - \sigma^2/2)t + \sigma B}$ gives

$$
\begin{aligned}
dX_t &= \left(\mu - \tfrac12 \sigma^2\right) e^{(\mu - \sigma^2/2)t + \sigma B_t}\, dt + \sigma e^{(\mu - \sigma^2/2)t + \sigma B_t}\, dB_t + \tfrac12 \sigma^2 e^{(\mu - \sigma^2/2)t + \sigma B_t}\, dt \\
&= \mu e^{(\mu - \sigma^2/2)t + \sigma B_t}\, dt + \sigma e^{(\mu - \sigma^2/2)t + \sigma B_t}\, dB_t \\
&= \mu X_t\, dt + \sigma X_t\, dB_t.
\end{aligned}
$$
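To see that the closed form matches the SDE pathwise (an illustration, not part of the original notes), one can drive an Euler-Maruyama scheme and the formula $e^{(\mu - \sigma^2/2)t + \sigma B_t}$ with the same Brownian increments; the two should agree up to discretization error. The parameters below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, T = 0.05, 0.2, 1.0
n_steps = 4000
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
B = dB.cumsum()

# Euler-Maruyama for dX = mu X dt + sigma X dB, X_0 = 1
X = 1.0
for inc in dB:
    X += mu * X * dt + sigma * X * inc

# Claimed strong solution evaluated on the same Brownian path
X_exact = np.exp((mu - 0.5 * sigma**2) * T + sigma * B[-1])
print(round(X, 3), round(X_exact, 3))
```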

⌅ Example 12.10 Consider the simple SDE

$$dX_t = b(t, X_t)\, dt + dB_t,$$

where $b(t, x) : [0, \infty) \times \mathbb{R} \to \mathbb{R}$ is a bounded Borel-measurable function.

• We can apply the change-of-probability-measure trick to solve this SDE. Let $\{W_t\}_{t \ge 0}$ be a standard Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, P)$. Define a new probability measure $Q$ with $\frac{dQ}{dP}\big|_{\mathcal{F}_t} = Z_t$, $t \ge 0$, where $Z_t$ is the stochastic exponential of $N_t \equiv \int_0^t b(u, W_u)\, dW_u$:

$$Z_t = \exp\left\{ \int_0^t b(u, W_u)\, dW_u - \frac{1}{2} \int_0^t b^2(u, W_u)\, du \right\}.$$

Since $b$ is bounded, $N_\bullet$ is a martingale w.r.t. $P$, and by Novikov's condition $Z_\bullet$ is also a martingale. Applying Ito's formula to $Z_t$ with $f(z) = e^z$ and $z = N_t - \frac{1}{2}\langle N \rangle_t$ gives

$$dZ_t = d\exp\left( N_t - \frac12 \langle N \rangle_t \right) = e^{N_t - \frac12 \langle N \rangle_t}\, dN_t - \frac12 e^{N_t - \frac12 \langle N \rangle_t}\, d\langle N \rangle_t + \frac12 e^{N_t - \frac12 \langle N \rangle_t}\, d\langle N \rangle_t = Z_t\, dN_t.$$

Hence, the process $Z_t$ admits the integral equation

$$Z_t = 1 + \int_0^t Z_u\, dN_u.$$

• We recover the original solution by the Girsanov theorem. Define $\tilde B_t = W_t - W_0 - \int_0^t Z_u^{-1}\, d\langle W, Z \rangle_u$. In particular,

$$\langle W, Z \rangle_t = \left\langle \int_0^\cdot dW_u,\; \int_0^\cdot Z_u\, dN_u \right\rangle_t = \int_0^t Z_u\, d\langle W, N \rangle_u,$$

which implies that $\int_0^t Z_u^{-1}\, d\langle W, Z \rangle_u = \langle W, N \rangle_t$. Therefore, $\tilde B_t = W_t - W_0 - \langle W, N \rangle_t$ is a martingale w.r.t. the probability measure $Q$. Moreover, $\langle \tilde B \rangle_t = \langle W \rangle_t = t$. By Lévy's theorem, $\tilde B_\bullet$ is a standard Brownian motion w.r.t. $Q$. Furthermore,

$$\langle W, N \rangle_t = \left\langle \int_0^\cdot dW_u,\; \int_0^\cdot b(u, W_u)\, dW_u \right\rangle_t = \int_0^t b(u, W_u)\, du.$$

Substituting this form into $\tilde B_t$ (with $W_0 = 0$) yields

$$W_t = W_0 + \int_0^t b(u, W_u)\, du + \tilde B_t = \int_0^t b(u, W_u)\, du + \tilde B_t.$$

As a result, $(W, \tilde B)$ is a solution on the probability space $(\Omega, \mathcal{F}, Q)$. This solution is a weak solution.
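A small numerical check of the change-of-measure identity, not in the original notes, with the simplifying illustrative choice $b \equiv 1/2$ (constant, hence bounded): then $Z_T = \exp(bW_T - \tfrac12 b^2 T)$, and under $Q$ the process $W$ has drift $b$, so weighting $W_T$ by $Z_T$ under $P$ should give $E_P[Z_T W_T] = E_Q[W_T] = bT$. The martingale property $E_P[Z_T] = 1$ can be checked at the same time.

```python
import numpy as np

rng = np.random.default_rng(3)
b, T = 0.5, 1.0
W = rng.normal(0.0, np.sqrt(T), size=200000)  # W_T under P

# Stochastic exponential Z_T = exp(b W_T - b^2 T / 2) for constant b
Z = np.exp(b * W - 0.5 * b**2 * T)

# Under Q, W_t = b t + (Q-Brownian motion), so E_Q[W_T] = b T
print(round(Z.mean(), 2), round((Z * W).mean(), 2), b * T)
```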

Chapter 13

Week 13

13.1. Thursday

13.1.1. Fundamental Theorems in SDE


The solution to an SDE can be either a strong solution or a weak solution. If we require the Brownian motion to be pre-determined, then we need to consider the strong solution; otherwise, if we only care about the distribution of a process or the construction of some process, a weak solution is enough. Only when we need to construct different solutions w.r.t. the same Brownian motion is the strong solution needed. The first theorem indicates that the existence of a solution, together with pathwise uniqueness, implies the existence and uniqueness of a strong solution.

Theorem 13.1 — Existence and Uniqueness for SDEs. Suppose that the SDE in (12.8) has pathwise uniqueness. Then

1. It also has uniqueness in law;

2. The existence of a (weak) solution implies the existence of a strong solution, i.e., there exists a functional $F : \mathbb{R} \times C[0, \infty) \to C[0, \infty)$ so that $X = F(X_0, B)$.

Pathwise uniqueness is difficult to verify directly. The following theorem gives a different way to check existence and uniqueness:

Theorem 13.2 Suppose that the SDE in (12.8) satisfies

1. The coefficients $\mu, \sigma$ are Lipschitz in $x$ uniformly in $t \in [0, T]$, i.e., there exists a constant $L$ so that

$$|\mu(t, x) - \mu(t, y)| \le L|x - y|, \quad |\sigma(t, x) - \sigma(t, y)| \le L|x - y|, \quad \forall x, y,\; t \in [0, T].$$

2. The coefficients $\mu, \sigma$ satisfy the linear growth condition in $x$ uniformly in $t$, i.e., there exists a constant $C$ so that

$$|\mu(t, x)| \le C(1 + |x|), \quad |\sigma(t, x)| \le C(1 + |x|), \quad \forall x,\; t \in [0, T].$$

3. $Z$ is a random variable on $(\Omega, \mathcal{F}_0, P)$, independent of the Brownian motion $\{B_t\}_{t \ge 0}$, satisfying $E[Z^2] < \infty$.

Then the SDE with the initial condition $X_0 = Z$ admits a strong solution $\{X_t\}_{t \ge 0}$ adapted to the filtration $\{\mathcal{F}_t\}_{t \ge 0}$ with $X \in L^2$. Furthermore, the solution has pathwise uniqueness.

Proof. The uniqueness follows from Ito's isometry, the Lipschitz condition, and Gronwall's inequality. The existence of a strong solution can be shown by the Picard iteration method used for ODEs. ⌅
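The Picard iteration in the existence proof can be mimicked numerically on a fixed discretized Brownian path (an illustrative sketch, not in the original notes): iterate $X^{(n+1)}_t = X_0 + \int_0^t \mu(X^{(n)}_u)\, du + \int_0^t \sigma(X^{(n)}_u)\, dB_u$ and watch the sup-distance between successive iterates shrink. The Lipschitz coefficients below ($\mu(x) = -x$, $\sigma(x) = 0.1\cos x$) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_steps = 1.0, 1000
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)

mu = lambda x: -x                  # Lipschitz drift
sigma = lambda x: 0.1 * np.cos(x)  # Lipschitz diffusion

x0 = 1.0
X = np.full(n_steps + 1, x0)  # X^{(0)} := constant path
diffs = []
for _ in range(12):
    # One Picard step: X_next(t) = x0 + int_0^t mu(X) du + int_0^t sigma(X) dB
    drift = np.concatenate(([0.0], np.cumsum(mu(X[:-1]) * dt)))
    noise = np.concatenate(([0.0], np.cumsum(sigma(X[:-1]) * dB)))
    X_next = x0 + drift + noise
    diffs.append(float(np.max(np.abs(X_next - X))))
    X = X_next

# Successive sup-distances should decay rapidly (contraction)
print([round(d, 6) for d in diffs])
```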

We can use Ito's formula to solve the time-homogeneous SDE:

$$dX_t = \mu(X_t)\, dt + \sigma(X_t)\, dB_t. \qquad (13.1)$$

Suppose that $\mu, \sigma \in C^\infty(\mathbb{R})$ are smooth functions with bounded derivatives. Let $X_\bullet$ be the unique strong solution. Take $f \in C^2(\mathbb{R})$; by Ito's formula,

$$f(X_t) - f(X_0) = \int_0^t \frac{\partial f}{\partial x}(X_u)\, dX_u + \frac12 \int_0^t \frac{\partial^2 f}{\partial x^2}(X_u)\, d\langle X \rangle_u.$$

The SDE in (13.1) can also be written in integral form:

$$X_t = X_0 + \int_0^t \mu(X_u)\, du + \int_0^t \sigma(X_u)\, dB_u \implies \langle X \rangle_t = \int_0^t \sigma^2(X_u)\, du.$$

It follows that

$$f(X_t) - f(X_0) = \int_0^t \left[ \frac{\partial f}{\partial x}(X_u)\mu(X_u) + \frac12 \frac{\partial^2 f}{\partial x^2}(X_u)\sigma^2(X_u) \right] du + \int_0^t \frac{\partial f}{\partial x}(X_u)\sigma(X_u)\, dB_u.$$

Theorem 13.3 Let $\mu, \sigma$ be bounded Borel-measurable functions, and assume there exists a constant $\lambda > 0$ so that $\lambda < \sigma(\cdot) < \lambda^{-1}$. Define the operator

$$\mathcal{L} = \frac12 \sigma^2(x) \frac{d^2}{dx^2} + \mu(x) \frac{d}{dx}.$$

If $X_\bullet$ is a continuous process on $(\Omega, \mathcal{F}, P)$ such that for any $f \in C^2(\mathbb{R})$, the process

$$W^f_t = f(X_t) - f(X_0) - \int_0^t (\mathcal{L} f)(X_s)\, ds$$

is a continuous local martingale, then $X_\bullet$ solves the SDE (13.1) on the space $(\Omega, \mathcal{F}, P)$.

Proof. In order to show that $X_\bullet$ is a weak solution on $(\Omega, \mathcal{F}, P)$, it suffices to construct a Brownian motion $B_\bullet$ so that

$$X_t = X_0 + \int_0^t \sigma(X_u)\, dB_u + \int_0^t \mu(X_u)\, du.$$

We set $f(x) = x$ and $B_t = \int_0^t \frac{1}{\sigma(X_u)}\, dW^f_u$. Then we show it is a Brownian motion by Lévy's characterization. ⌅
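As an illustrative check of the martingale property in Theorem 13.3, not part of the original notes, take the Ornstein-Uhlenbeck coefficients $\mu(x) = -ax$, $\sigma(x) = s$ from Example 12.8 and $f(x) = x^2$, so $(\mathcal{L}f)(x) = s^2 - 2ax^2$. Then $W^f_T = X_T^2 - X_0^2 - \int_0^T (s^2 - 2aX_u^2)\, du$ should have mean approximately zero; the parameters below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
a, s, T = 1.0, 0.5, 1.0
n_paths, n_steps = 20000, 500
dt = T / n_steps

# OU coefficients and test function f(x) = x^2, so (Lf)(x) = s^2 - 2 a x^2
X = np.ones(n_paths)
integral = np.zeros(n_paths)  # running int_0^t (Lf)(X_u) du
for _ in range(n_steps):
    integral += (s**2 - 2 * a * X**2) * dt
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X += -a * X * dt + s * dB

Wf = X**2 - 1.0 - integral  # W^f_T, a (local) martingale started at 0
print(round(Wf.mean(), 2))
```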

