
Brownian motion and stochastic integration

Yuzhao Wang
UoB 2024
November 14, 2024

Contents

1 Preliminaries
  1.1 Random variables
  1.2 Gaussian random variable
  1.3 Gaussian random vectors

2 Brownian motion
  2.1 Motivation and definition
  2.2 Existence
  2.3 Gaussian process
  2.4 Invariance
  2.5 Continuity
  2.6 Non-differentiability
  2.7 Quadratic variation
  2.8 Markov and martingale properties
    2.8.1 Filtrations
    2.8.2 Markov property
    2.8.3 Strong Markov property*
    2.8.4 Martingale property

3 Stochastic integral
  3.1 Wiener Integral
    3.1.1 Construction
  3.2 Stochastic integrals
  3.3 Properties
  3.4 Ito Calculus and Finance
  3.5 No perfect foresight assumption

4 Ito formula
  4.1 Motivation
  4.2 Proof of Ito formula
  4.3 Generalized Ito formula
  4.4 Multidimensional Ito's formula

5 SDEs and applications in finance
  5.1 Existence and uniqueness
  5.2 Mathematical models for stock prices
  5.3 Mathematical models for interest rate
  5.4 Evolution of portfolio value
  5.5 Black-Scholes Equation
    5.5.1 Evolution of Option Value
    5.5.2 Equating the Evolutions
    5.5.3 Conclusion

1 Preliminaries
In this chapter we recall basic definitions and results used in this lecture.

1.1 Random variables


A measurable space is a pair (Ω, F ) where
• Ω is a nonempty set;
• F is a σ-field, also called a σ-algebra, of subsets of Ω.¹
A probability space is a triple (Ω, F , P) where
• (Ω, F ) is a measurable space;
• P is a positive measure on F such that P(Ω) = 1.
If (Ω, F) and (E, G) are two measurable spaces, then a mapping X from Ω to E such that² X⁻¹(A) ∈ F whenever A ∈ G is called a measurable mapping, or a random variable (r.v.) from (Ω, F) to (E, G), or an E-valued r.v. In these lectures we will mainly be concerned with real-valued random variables (or random vectors), i.e. E = R (or R^d) and G = B(R) (or B(R^d)), the Borel σ-algebra, i.e. the smallest σ-algebra containing all open subsets of R (or R^d).
If X is a real-valued random variable defined on (Ω, F) and P a probability measure on Ω, then we denote by L(X) the image of P under the mapping X:

L(X)(A) = P(X⁻¹(A)) = P(ω ∈ Ω : X(ω) ∈ A),  ∀A ∈ B(R). (1)

The measure µ_X = L(X) is called the distribution or the law of X. Let X : (Ω, F, P) → (R, B) be a random variable and denote by σ(X) the σ-algebra generated by X, i.e. the smallest sub-σ-algebra of F with respect to which X is measurable.
The cumulative distribution function of X with respect to the probability measure P is defined as

F(x) = P(X ≤ x).

If there is a function f : R → [0, ∞] such that for each interval [a, b] ⊂ R we have

P(a ≤ X ≤ b) = ∫_a^b f(x) dx,

then f is called the probability density function of X.
Let X be integrable. We define the expectation of X by

E[X] = ∫_Ω X(ω) P(dω) = ∫_R x µ_X(dx). (2)

If the random variable X has density function f, then we have

E[X] = ∫_R x f(x) dx.
¹ This is to say that the family F contains the set Ω and is closed under the operations of taking complements and countable unions.
² Here X⁻¹(A) = {ω ∈ Ω : X(ω) ∈ A}.

Let X be a real square-integrable random variable. Its variance is the quantity

Var(X) = E[(X − E[X])²] = E[X²] − (E[X])². (3)

The covariance of two r.v.'s X and Y is defined as

Cov(X, Y) = E[XY] − E[X]E[Y]. (4)

If Cov(X, Y) = 0, X and Y are said to be uncorrelated.


The events A1, · · · , Am ∈ F are said to be independent if and only if

P(A_{i1} ∩ · · · ∩ A_{iℓ}) = P(A_{i1}) · · · P(A_{iℓ})

for every choice of 1 ≤ ℓ ≤ m and of 1 ≤ i1 < i2 < · · · < iℓ ≤ m. The random variables X1, · · · , Xm, taking values respectively in (E1, E1), · · · , (Em, Em), are said to be independent if for every A′1 ∈ E1, · · · , A′m ∈ Em, the events X1⁻¹(A′1), · · · , Xm⁻¹(A′m) are independent. If F1, · · · , Fm are sub-σ-algebras of F, we say that they are independent if, for every A1 ∈ F1, · · · , Am ∈ Fm, we have

P(A1 ∩ · · · ∩ Am) = P(A1) · · · P(Am).

Therefore, the random variables X1, · · · , Xm are independent if and only if so are the generated σ-algebras σ(X1), · · · , σ(Xm).
Let (Xi)_{i∈I} be a (possibly infinite) family of random variables. They are said to be independent if and only if every finite sub-family of (Xi)_{i∈I} is independent. Similarly, we can define the independence of an infinite family of σ-algebras.
Denote by µi the law of Xi, define E = E1 × · · · × Em and E = E1 ⊗ · · · ⊗ Em, and let µ_X be the law of X = (X1, · · · , Xm) on (E, E). Then the random variables X1, · · · , Xm are independent if and only if µ_X = µ1 ⊗ · · · ⊗ µm.
Proposition 1.1. If X and Y are real independent integrable random variables,
then
E[XY ] = E[X]E[Y ].
In particular, real integrable and independent random variables are uncorrelated. The converse is not true in general, unless the random variables are jointly Gaussian.

1.2 Gaussian random variable


We say a real-valued random variable X is a Gaussian random variable (or normally distributed with mean µ and variance σ²) if

P(X > x) = (1/√(2πσ²)) ∫_x^∞ e^{−(y−µ)²/(2σ²)} dy,

from which we see that the probability density function of X is given by

f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}.

We may write X ∼ N(µ, σ²). If µ = 0 and σ = 1, i.e. X ∼ N(0, 1), then the Gaussian random variable X is called standard normally distributed.

Lemma 1.2. Suppose X is standard normally distributed. Then for all x > 0, we have

P(X > x) ≤ (1/(x√(2π))) e^{−x²/2}. (5)
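The tail bound (5) is easy to check numerically. A minimal sketch using only the standard library (function names are ours): for a standard Gaussian, P(X > x) can be written via the complementary error function as ½·erfc(x/√2), which we compare against the right-hand side of (5).

```python
import math

def normal_tail(x):
    """P(X > x) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def tail_bound(x):
    """Right-hand side of (5): e^{-x^2/2} / (x * sqrt(2*pi))."""
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

# the bound holds for every x > 0, and gets tight as x grows
for x in [0.5, 1.0, 2.0, 4.0]:
    assert normal_tail(x) <= tail_bound(x)
```

For large x the two sides are close (the ratio tends to 1), which is why this estimate is sharp enough for most tail computations with Brownian increments.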
Recall that the characteristic function of a random variable X is defined as the expected value of e^{itX}, i.e.

φ_X(t) = E[e^{itX}]. (6)

The characteristic function of a real-valued random variable always exists, since it is an integral of a bounded continuous function over a space whose measure is finite. We then have the following result.
Lemma 1.3. Let X1 and X2 be two random variables with density functions f_{X1} and f_{X2}, respectively. Then f_{X1} = f_{X2} if and only if φ_{X1} = φ_{X2}.
The above lemma can be generalized to more general random variables. In fact, no two distinct distributions can have the same characteristic function.
If X ∼ N(µ, σ²), then we have

φ_X(t) = e^{itµ} e^{−σ²t²/2}, (7)

which has an interesting consequence.


Lemma 1.4. Let X ∼ N (µ, σ 2 ). Then we have E[(X − µ)n ] = 0 when n is
odd, and E[(X − µ)n ] = σ n (n − 1)!! when n is even.
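Lemma 1.4 can be sanity-checked numerically. A sketch (our own function names) that computes E[(X − µ)^n] by composite Simpson quadrature over [µ − 10σ, µ + 10σ], where the remaining Gaussian mass is negligible, and compares with σⁿ(n − 1)!!:

```python
import math

def central_moment(n, mu=0.0, sigma=1.0):
    """E[(X - mu)^n] for X ~ N(mu, sigma^2), by composite Simpson's rule."""
    steps = 4000  # even number of subintervals
    a, b = -10 * sigma, 10 * sigma
    h = (b - a) / steps
    def g(y):  # y = x - mu; integrand y^n times the centred Gaussian density
        return y**n * math.exp(-y * y / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
    s = g(a) + g(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def double_factorial(m):
    """m!! with the convention (-1)!! = 1."""
    return 1 if m <= 0 else m * double_factorial(m - 2)

assert abs(central_moment(3)) < 1e-8                      # odd moments vanish
assert abs(central_moment(4) - double_factorial(3)) < 1e-6  # (4-1)!! = 3
assert abs(central_moment(6, sigma=2.0) - 2.0**6 * double_factorial(5)) < 1e-3
```

The odd moments cancel exactly by symmetry of the integrand, and the even moments match σⁿ(n − 1)!! to quadrature accuracy.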
We also have the following useful lemma.
Lemma 1.5 (Kac’s theorem). Let X and Y be two random variables with
characteristic functions φX and φY . X and Y are independent if and only if
φ(X,Y ) (s, t) = φX (s)φY (t) for all (s, t) ∈ R2 .
Lemma 1.6. Let X and Y be independent random variables that are normally distributed (and therefore also jointly so); then any linear combination of them is also normally distributed. That is, if X ∼ N(µ_X, σ_X²), Y ∼ N(µ_Y, σ_Y²), and Z = aX + bY, then

Z ∼ N(aµ_X + bµ_Y, a²σ_X² + b²σ_Y²).
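Lemma 1.6 can be observed directly by seeded Monte Carlo simulation; the following sketch (parameter values are ours, chosen for illustration) checks the predicted mean and variance of Z = aX + bY:

```python
import random

# Z = aX + bY for independent X ~ N(1, 2^2), Y ~ N(2, 3^2), with a = 2, b = -1.
# Lemma 1.6 predicts Z ~ N(2*1 - 2, 4*4 + 1*9) = N(0, 25).
rng = random.Random(7)  # fixed seed: the run is reproducible
a, b = 2.0, -1.0
zs = [a * rng.gauss(1.0, 2.0) + b * rng.gauss(2.0, 3.0) for _ in range(20000)]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)

assert abs(mean - 0.0) < 0.2
assert abs(var - 25.0) < 1.5
```

The tolerances are several standard errors wide for 20,000 samples, so the check is robust rather than tight.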

1.3 Gaussian random vectors


When dealing simultaneously with more than one random variable, the joint cumulative distribution function can be defined analogously.
We then turn to random vectors with normally distributed components,
which are the higher-dimensional analogue of the normal distribution. The
main motivation is that they are building blocks of the increments of Brownian
motion.
Definition 1.7 (Standard Gaussian random vectors). A random vector X = (X1, X2, · · · , Xd)^T with values in R^d has the d-dimensional standard Gaussian distribution if its coordinates {Xi}_{i=1}^d are independent and standard normally distributed.

More general Gaussian random variables can be constructed via linear images of standard Gaussians.
Definition 1.8 (Gaussian random variable). A random variable Y ∈ Rd is
called Gaussian if there exists an m-dimensional standard Gaussian X, a d × m
matrix A, and a d-dimensional vector µ such that Y = AX + µ.
The covariance matrix of the vector Y is then given by

Cov(Y) = E[(Y − EY)(Y − EY)^T] = AA^T. (8)

The characteristic function of a random vector Y is defined as the expected value of e^{i⟨t,Y⟩}, where t ∈ R^d and

⟨t, Y⟩ = ∑_{k=1}^d t_k Y_k.

More precisely, we have

φ_Y(t) = E[e^{i⟨t,Y⟩}].
If Y is a Gaussian random variable as in Definition 1.8, then we have

φ_Y(t) = e^{i⟨t,µ⟩} E[e^{i⟨t,AX⟩}]
       = e^{i⟨t,µ⟩} ∏_{j=1}^m E[exp(i (∑_{k=1}^d t_k a_{kj}) X_j)]
       = e^{i⟨t,µ⟩} ∏_{j=1}^m exp(−(1/2)(∑_{k=1}^d t_k a_{kj})²)      (9)
       = e^{i⟨t,µ⟩} exp(−(1/2) ∑_{k,ℓ} t_k t_ℓ ∑_{j=1}^m a_{kj} a_{ℓj})
       = exp(i⟨t, µ⟩ − (1/2)⟨Ct, t⟩),

where C = AA^T is the covariance matrix of the vector Y defined in (8). By Fourier inversion, the density function can be recovered from the characteristic function. Therefore, we can generalize Lemma 1.3 to higher dimensions.
Lemma 1.9. Let X and Y be two random vectors with density function fX and
fY respectively. Then fX = fY if and only if φX = φY .
We then have the following corollary, which shows that the distribution of a
Gaussian random vector is determined by its expectation and covariance matrix.
Corollary 1.10. If X and Y are d-dimensional Gaussian vectors with EX =
EY and Cov(X) = Cov(Y ), then X and Y have the same distribution.
More precisely, we have
Lemma 1.11. Let X be a d-dimensional Gaussian random vector with mean µ ∈ R^d and a positive definite covariance matrix C = Cov(X). Then X has the density function

f_X(x) = (2π)^{−d/2} (det C)^{−1/2} exp(−(1/2)⟨x − µ, C⁻¹(x − µ)⟩).

It turns out that an orthogonal d × d matrix does not change the distribution of a standard Gaussian random vector, as recorded in the following lemma.

Lemma 1.12. If A is an orthogonal d × d matrix, i.e. AA^T = I_d, and X is a d-dimensional standard Gaussian vector, then AX is also a d-dimensional standard Gaussian vector.
We have the following useful consequence.

Corollary 1.13. Let X1 and X2 be independent and normally distributed with zero expectation and variance σ² > 0. Then X1 + X2 and X1 − X2 are independent and normally distributed with expectation 0 and variance 2σ².
Lemma 1.14. Suppose {Xn}_{n∈N} is a sequence of Gaussian random vectors with lim_n Xn = X almost surely. If µ = lim_n µn := lim_n EXn and C = lim_n Cn := lim_n Cov(Xn) exist, then X is Gaussian with mean µ and covariance matrix C.

2 Brownian motion
A large part of probability theory is devoted to describing the macroscopic pic-
ture emerging from random systems defined by microscopic phenomena. Brown-
ian motion is the macroscopic picture emerging from a particle moving randomly
in d-dimensional space without making very big jumps. On the microscopic
level, at any time step, the particle receives a random displacement, caused by other particles hitting it or by an external force, so that, if its position at time zero is x0, its position at time n is given as

x_n = x0 + ∑_{i=1}^n x_i,

where the displacements x1, x2, x3, · · · are assumed to be independent, identically distributed random variables with values in R^d. The process {x_n}_{n≥0} is a random walk; the displacements represent the microscopic inputs. When we think about the macroscopic picture, we would like to know:
• Does xn drift to infinity?
• Does xn return to the neighbourhood of the origin infinitely often?
• What is the speed of growth of max{|x1 |, · · · , |xn |} as n → ∞?
It turns out that not all the features of the microscopic inputs contribute to
the macroscopic picture. Indeed, if they exist, only the mean and covariance of
the displacements shape the picture. In other words, all random walks whose
displacements have the same mean and covariance matrix give rise to the same
macroscopic process, and even the assumption that the displacements have to be
independent and identically distributed can be substantially relaxed. This effect
is called universality, and the macroscopic process is often called a universal
object. It is a common approach in probability to study various phenomena
through the associated universal objects.

2.1 Motivation and definition


Let (Ω, F, P) be a probability space. A stochastic process {X(t) : t ≥ 0} is a family of (uncountably many) random variables ω ↦ X(t, ω) defined on (Ω, F, P). At the same time, a stochastic process can also be interpreted as a random function, with sample functions defined by t ↦ X(t, ω). The latter random-function perspective is one of our main concerns in this module.
Einstein (1905) considered the Brownian motion in the following way. Con-
sider a long, thin tube filled with clear water, into which we inject at time t = 0
a unit amount of ink, at the location x = 0. Now, let u = u(x, t) denote the
density of ink particles at position x ∈ R and time t ≥ 0. Initially, we have
u(x, 0) = δ0 ,
the Dirac measure at 0. Now, suppose that the probability density of the event that an ink particle moves from x to x + y in a short time τ is f(y, τ). Then

u(x, t + τ) = ∫_R u(x − y, t) f(y, τ) dy
            = ∫_R [u − y ∂_x u + (1/2) y² ∂_xx u + · · ·] f(y, τ) dy. (10)
Here, we have

• f is a probability density, i.e. ∫_R f dy = 1;

• symmetry, i.e. f(−y, τ) = f(y, τ); thus, ∫_R y f dy = 0;

• the variance of f is linear in time τ, i.e. ∫_R y² f(y, τ) dy = Dτ.

We insert these into (10) to get

(u(x, t + τ) − u(x, t))/τ = (D/2) ∂_xx u(x, t) + {higher-order terms in τ}.

Taking τ → 0, we see that

∂_t u = (D/2) ∂_xx u, (11)

with initial condition u(·, 0) = δ_0. The solution to this equation is

u(x, t) = (1/√(2πDt)) e^{−x²/(2Dt)}.

This shows that the density of the diffusing ink at time t is N(0, Dt), the normal distribution, for some constant D. Einstein further computed

D = RT/(N_A f),

where R is the gas constant, T is the absolute temperature, f is the friction coefficient, and N_A is Avogadro's number. This equation and the observed properties of Brownian motion helped J. Perrin to compute N_A (≈ 6 × 10²³, the number of molecules in a mole) and led to the atomic theory of matter.
We now introduce Brownian motion, for which we take D = 1.
Definition 2.1 (Brownian motion). A real-valued stochastic process {B(t) :
t ≥ 0} is called a (linear) Brownian motion with start in x ∈ R if the following
holds
• B(0) = x,
• the process has independent increments, i.e. for all times 0 ≤ t1 ≤ t2 ≤
. . . ≤ tn the increments B(tn ) − B(tn−1 ), B(tn−1 ) − B(tn−2 ), . . . , B(t2 ) −
B(t1 ) are independent random variables,
• for all t ≥ 0 and h > 0 the increments B(t + h) − B(t) are normally
distributed with expectation zero and variance h, i.e. the process has
stationary increments,
• almost surely, the function t 7→ B(t) is continuous.

We say that {B(t) : t ≥ 0} is a standard Brownian motion if x = 0.
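Definition 2.1 translates directly into a simulation recipe: on a uniform grid of mesh h, sample independent N(0, h) increments and add them up. A minimal seeded sketch (function names are ours, not from the notes):

```python
import math
import random

def brownian_path(t_max, n_steps, x0=0.0, seed=0):
    """Sample B(t) on a grid of n_steps equal intervals, started at x0.

    Uses independent N(0, h) increments with h = t_max / n_steps, matching
    the stationary-independent-increments conditions of Definition 2.1.
    """
    rng = random.Random(seed)
    h = t_max / n_steps
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(h)))
    return path

path = brownian_path(t_max=1.0, n_steps=1000)
assert path[0] == 0.0 and len(path) == 1001

# E[B(1)^2] = 1: estimate over many independent seeded paths
samples = [brownian_path(1.0, 100, seed=s)[-1] for s in range(2000)]
var = sum(b * b for b in samples) / len(samples)
assert abs(var - 1.0) < 0.15
```

Since B(1) is a sum of 100 independent N(0, 1/100) increments, it is exactly N(0, 1), and the empirical second moment over 2000 paths confirms this within a few standard errors.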

Figure 1: Illustration of a standard Brownian motion. Each coloured path
represents a sample (realisation) path of the Brownian motion (source of figure:
https://dlsun.github.io/probability/brownian-motion.html)

N. Wiener, in the 1920s and later, put the theory on a firm mathematical
basis, which we will discuss in the next subsection.
See Figure 1 for an illustration of a standard Brownian motion.
Lemma 2.2. Suppose B(t) is a one-dimensional Brownian motion. Then

E[B(t)] = 0,  E[B(t)²] = t,  for t ≥ 0;

and

E[B(t)B(s)] = min{s, t},  for t, s ≥ 0.

2.2 Existence
The existence of a Brownian motion is a nontrivial question. It is not obvious
that the conditions imposed on the finite-dimensional distributions in the defi-
nitions of Brownian motion allow the process to have continuous sample paths,
or whether there is a contradiction.
Theorem 2.3 (Wiener 1923). Standard Brownian motion exists.
We shall prove this theorem by explicitly constructing a Brownian motion
(this construction is due to Lévy). Lévy’s construction is a bit more complicated
but offers several useful properties that will be used in the later sections (e.g.
for the continuity property). To be more precise, we construct Brownian motion
as a uniform limit of continuous functions, to ensure that it automatically has
continuous paths. Recall that we only need to construct a standard Brownian
motion {B(t) : t ≥ 0} since X(t) = x + B(t) is a Brownian motion with starting
point x.

Step 1. We first construct Brownian motion on the interval [0, 1] as a random element of the space C([0, 1]) of continuous functions on [0, 1]. To this end, we construct the Brownian motion on the set of dyadic points

D = ∪_{n=0}^∞ D_n,  where D_n = {k/2^n : 0 ≤ k ≤ 2^n},

and interpolate between them. The sets D0 ⊂ D1 ⊂ · · · discretely approximate the interval [0, 1] (in other words, D is a dense subset of [0, 1]). Below are explicit lists for the first few Dn:

D0 = {0, 1},
D1 = {0, 1/2, 1},
D2 = {0, 1/4, 1/2, 3/4, 1},
D3 = {0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8, 1},
...

Step 2. Let (Ω, F, P) be a probability space on which a collection {Z_t : t ∈ D} of independent, standard normally distributed random variables can be defined. We now construct, by induction on n, random variables B(d), d ∈ Dn, in the following way: for n = 0, D0 = {0, 1}, and we define B(0) := 0 and B(1) := Z1. Suppose that we have succeeded in doing this for some n − 1. We then define B(d) for d ∈ Dn \ Dn−1 by

B(d) = (B(d − 2⁻ⁿ) + B(d + 2⁻ⁿ))/2 + Z_d / 2^{(n+1)/2}. (12)
Explicit computations for the first few n:

• n = 0:
  B(0) = 0, B(1) = Z1.

• n = 1, D1 \ D0 = {1/2}:
  B(1/2) = (B(0) + B(1))/2 + Z_{1/2}/2.

• n = 2, D2 \ D1 = {1/4, 3/4}:
  B(1/4) = (B(0) + B(1/2))/2 + Z_{1/4}/√8,
  B(3/4) = (B(1/2) + B(1))/2 + Z_{3/4}/√8.

• n = 3, D3 \ D2 = {1/8, 3/8, 5/8, 7/8}:
  B(1/8) = (B(0) + B(1/4))/2 + Z_{1/8}/4,
  B(3/8) = (B(1/4) + B(1/2))/2 + Z_{3/8}/4,
  B(5/8) = (B(1/2) + B(3/4))/2 + Z_{5/8}/4,
  B(7/8) = (B(3/4) + B(1))/2 + Z_{7/8}/4.
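The inductive rule (12) is easy to implement. A sketch of the dyadic part of Lévy's construction (function names are ours), which builds the values B(d) level by level exactly as in the explicit computations above:

```python
import random

def levy_dyadic_levels(n_levels, seed=0):
    """Levy's construction on the dyadic points of [0, 1].

    Returns a dict t -> B(t) for t in D_{n_levels}, built level by level
    with the midpoint refinement rule (12).
    """
    rng = random.Random(seed)
    B = {0.0: 0.0, 1.0: rng.gauss(0.0, 1.0)}  # n = 0: B(0) = 0, B(1) = Z_1
    for n in range(1, n_levels + 1):
        step = 2.0 ** (-n)
        scale = 2.0 ** (-(n + 1) / 2)  # coefficient of Z_d at level n
        for k in range(1, 2 ** n, 2):  # new points d in D_n \ D_{n-1} (odd k)
            d = k * step
            B[d] = 0.5 * (B[d - step] + B[d + step]) + scale * rng.gauss(0.0, 1.0)
    return B

B = levy_dyadic_levels(6)
assert len(B) == 2 ** 6 + 1 and B[0.0] == 0.0
```

Dyadic rationals with small denominators are exactly representable as floats, so the dictionary lookups B[d − step] and B[d + step] always hit points constructed at the previous level. Lemma 2.4 asserts that the resulting increments are Gaussian with the correct variances.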

We now show that the following two properties are satisfied.

Lemma 2.4. Let Dn be given as above, and B(d) be defined in (12). Then,
(i) for all r < s < t in Dn the random variable B(t) − B(s) is normally
distributed with mean zero and variance t−s, and is independent of B(s)−
B(r),
(ii) the vectors (B(d) : d ∈ Dn ) and (Zt : t ∈ D\Dn ) are independent.
Having defined the values of the process on all dyadic points, we now interpolate them. Define

F0(t) = Z1 for t = 1,
        0 for t = 0,
        linear in between,

and for each n ≥ 1,

Fn(t) = 2^{−(n+1)/2} Z_t for t ∈ Dn \ Dn−1,
        0 for t ∈ Dn−1,
        linear between consecutive points in Dn.

Lemma 2.5. These functions are continuous on [0, 1], and for all n and d ∈ Dn,

B(d) = ∑_{i=0}^n F_i(d) = ∑_{i=0}^∞ F_i(d). (13)

Moreover, the function series

∑_{n=0}^∞ F_n(t)

is uniformly convergent on [0, 1] almost surely.

We denote the continuous limit by B(t), i.e.

B(t) = ∑_{n=0}^∞ F_n(t),  t ∈ [0, 1]. (14)

Finally, it remains to check that B(t) defined in (14) is a Brownian motion.

Lemma 2.6. B(t) defined in (14) is a standard Brownian motion.

Step 3. Finally, we extend the domain from [0, 1] to [0, ∞). We now take a sequence B0, B1, . . . of independent C[0, 1]-valued random variables with the distribution of this process and define {B(t) : t ≥ 0} by gluing together the parts:

B(t) := B_{⌊t⌋}(t − ⌊t⌋) + ∑_{i=0}^{⌊t⌋−1} B_i(1),  for all t ≥ 0.

To show that this defines a continuous random function B : [0, ∞) → R, we only need to verify continuity at the positive integer points k ∈ N. Let k ≤ t_m < k + 1 with t_m ↓ k, and k − 1 ≤ t_{m′} < k with t_{m′} ↑ k. Then

lim_{m→∞} B(t_m) = lim_{m→∞} [B_k(t_m − k) + ∑_{i=0}^{k−1} B_i(1)]
                 = B_k(0) + ∑_{i=0}^{k−1} B_i(1) = ∑_{i=0}^{k−1} B_i(1),

and

lim_{m′→∞} B(t_{m′}) = lim_{m′→∞} [B_{k−1}(t_{m′} − k + 1) + ∑_{i=0}^{k−2} B_i(1)]
                     = B_{k−1}(1) + ∑_{i=0}^{k−2} B_i(1) = ∑_{i=0}^{k−1} B_i(1).

Hence

lim_{m→∞} B(t_m) = lim_{m′→∞} B(t_{m′}) = B(k).

Therefore B is continuous at t = k (and thus continuous on [0, ∞)) and is a standard Brownian motion.

2.3 Gaussian process


Next we introduce another very important stochastic process, namely a Gaus-
sian process.
Definition 2.7. A stochastic process {Y (t) : t ≥ 0} is called a Gaussian process
if for all t1 < t2 < . . . < tn the vector (Y (t1 ), Y (t2 ), . . . , Y (tn ))T is a Gaussian
random vector.
Note that, equivalently, {Y (t) : t ≥ 0} is a Gaussian process if every finite linear combination ∑_{t∈F} a_t Y_t (F a finite set) is either identically zero or has a Gaussian distribution on R. The covariance function of a Gaussian process {Y (t) : t ≥ 0} is the bivariate function

R(s, t) = Cov(Y(s), Y(t)) = E[(Y(s) − EY(s))(Y(t) − EY(t))].

Proposition 2.8. A Brownian motion {B(t) : t ≥ 0} is a Gaussian process.

If {B(t) : t ≥ 0} is a standard Brownian motion and 0 < t1 < t2 < . . . < tn, then for i < j we have

E[B(ti)] = E[B(ti) − B(ti−1) + . . . + B(t1) − B(0) + B(0)] = 0,

and

Cov(B(ti), B(tj)) = E[B(ti)B(tj)]
                  = E[B(ti)(B(tj) − B(ti))] + E[B(ti)²] = ti.

Similarly, if j < i then Cov(B(ti), B(tj)) = tj. Hence, in general, we obtain

Cov(B(t), B(s)) = min{t, s}.
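The identity Cov(B(s), B(t)) = min{s, t} can be checked with a seeded Monte Carlo sketch (function names and the particular s, t are ours): sample the pair (B(s), B(t)) using independent increments and average the product.

```python
import math
import random

def sample_pair(s, t, rng):
    """Sample (B(s), B(t)) for s < t using independent Gaussian increments."""
    bs = rng.gauss(0.0, math.sqrt(s))          # B(s) ~ N(0, s)
    bt = bs + rng.gauss(0.0, math.sqrt(t - s))  # B(t) - B(s) ~ N(0, t - s)
    return bs, bt

rng = random.Random(1)
s, t, n = 0.7, 1.3, 20000
acc = 0.0
for _ in range(n):
    bs, bt = sample_pair(s, t, rng)
    acc += bs * bt
cov_est = acc / n  # estimates E[B(s)B(t)]; the means are zero

assert abs(cov_est - min(s, t)) < 0.05
```

The estimator's standard error for these parameters is under 0.01, so the 0.05 tolerance leaves a wide margin.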

Because Gaussian random vectors are characterised by their expectation and covariance matrix, if two Gaussian processes have the same expectation and covariance function for all sets of times 0 < t1 < . . . < tn, then the two Gaussian processes have the same distribution, by the Kolmogorov extension theorem.

2.4 Invariance
In this section, we show that if we perform certain transformations on a Brow-
nian motion we still get a Brownian motion.
Lemma 2.9 (Symmetry). Suppose that {B(t) : t ≥ 0} is a standard Brownian motion. Then {−B(t) : t ≥ 0} is also a standard Brownian motion.
Lemma 2.10 (Scaling invariance). Suppose that {B(t) : t ≥ 0} is a standard Brownian motion and let a > 0. Then the process {X(t) : t ≥ 0} defined by

X(t) := (1/a) B(a²t)

is also a standard Brownian motion.
Theorem 2.11 (Time inversion). Suppose that {B(t) : t ≥ 0} is a standard Brownian motion. Then the process {X(t) : t ≥ 0} defined by

X(t) = 0 for t = 0,  and  X(t) = t B(1/t) for t > 0,

is also a standard Brownian motion.


An interesting application of the time inversion principle is the following result.

Lemma 2.12 (Law of large numbers). Almost surely, lim_{t→∞} B(t)/t = 0.

2.5 Continuity
In this section, we study continuity properties of a Brownian motion. We have
known that a Brownian motion is almost surely continuous. The following the-
orem states a stronger statement providing an upper estimate for the quantity
|B(t + h) − B(t)|.
Theorem 2.13. There exists a constant C > 0 such that, almost surely, for every sufficiently small h > 0 and all 0 ≤ t ≤ 1 − h,

|B(t + h) − B(t)| ≤ C √(h log(1/h)). (15)
Theorem 2.13 has an important consequence: the paths are α-Hölder continuous, which is stronger than mere continuity.

Definition 2.14. A function f : [0, ∞) → R is said to be locally α-Hölder
continuous at x ≥ 0 if there exists ε > 0 and c > 0 such that

|f (x) − f (y)| ≤ c|x − y|α , for all y ≥ 0 with |y − x| < ε.

We refer to α > 0 as the Hölder exponent and to c > 0 as the Hölder constant.
It is easy to see that α-Hölder continuity gets stronger as the exponent α gets larger.

Lemma 2.15. For h > 0 sufficiently small and all 0 < α < 1/2, we have

log(1/h) ≤ (1/h)^{1−2α}. (16)
Theorem 2.16. If α < 1/2 then, almost surely, Brownian motion is everywhere
locally α-Hölder continuous.

2.6 Non-differentiability
In the previous section, we showed that a Brownian motion is almost surely locally α-Hölder continuous for any α < 1/2.
In this section, we show that almost surely Brownian motion is nowhere
differentiable. This is a striking property of Brownian motion.
Let f : R → R. Define, for any limit point a ∈ R,

lim sup_{x→a} f(x) = lim_{ε→0} sup{f(x) : x ∈ B(a; ε) \ {a}}

and

lim inf_{x→a} f(x) = lim_{ε→0} inf{f(x) : x ∈ B(a; ε) \ {a}},

where B(a; ε) denotes the ball of radius ε about a. For a function f, we define the upper and lower right derivatives

D⁺f(t) = lim sup_{h→0⁺} (f(t + h) − f(t))/h,

and

D⁻f(t) = lim inf_{h→0⁺} (f(t + h) − f(t))/h.

It then follows that if f is differentiable at t ∈ R, then both D⁺f(t) and D⁻f(t) exist and

f′(t) = D⁺f(t) = D⁻f(t).

We then have the following theorem.
Theorem 2.17 (Paley, Wiener and Zygmund 1933). Almost surely, Brownian
motion is nowhere differentiable. Furthermore, for all t

either D+ B(t) = +∞ or D− B(t) = −∞ or both.

2.7 Quadratic variation
In this section, we show that Brownian motion has finite quadratic variation,
which is crucially important for the development of stochastic integration stud-
ied later on.
For a function f : [a, b] → R and a partition Pn = {t0, · · · , tn} of the finite interval [a, b] of the form a = t0 < t1 < . . . < tn = b, the variation and the quadratic variation of f over [a, b] with respect to Pn are defined respectively by

V_{Pn}(f)[a, b] = ∑_{k=1}^n |f(tk) − f(tk−1)|

and

Q_{Pn}(f)[a, b] = ∑_{k=1}^n |f(tk) − f(tk−1)|².

Let ∥Pn∥ = max_{1≤k≤n}{tk − tk−1} denote the maximum interval length of a partition. The total variation and the quadratic variation of f over [a, b] are defined respectively by

V(f)[a, b] = lim_{∥Pn∥→0} V_{Pn}(f)[a, b]

and

Q(f)[a, b] = lim_{∥Pn∥→0} Q_{Pn}(f)[a, b].

Lemma 2.18. If f is differentiable then

V(f)[a, b] = ∫_a^b |f′(x)| dx < ∞,  Q(f)[a, b] = 0.

Lemma 2.18 shows that a differentiable function has finite total variation and zero quadratic variation.
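For a smooth function these partition sums are easy to compute numerically. A sketch on uniform partitions (function name is ours), illustrating Lemma 2.18 with f(x) = x² on [0, 1], where V(f) = ∫₀¹ |2x| dx = 1 and Q(f) = 0:

```python
def variation_sums(f, a, b, n):
    """V_{P_n} and Q_{P_n} of f over [a, b] on the uniform n-interval partition."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    diffs = [f(ts[k]) - f(ts[k - 1]) for k in range(1, n + 1)]
    return sum(abs(d) for d in diffs), sum(d * d for d in diffs)

for n in [10, 100, 1000]:
    v, q = variation_sums(lambda x: x * x, 0.0, 1.0, n)
    assert abs(v - 1.0) < 1e-9  # f increasing: V_{P_n} telescopes to f(1) - f(0) = 1
    assert q < 3.0 / n          # Q_{P_n} shrinks like 1/n, so Q(f)[0, 1] = 0
```

Since x² is increasing on [0, 1], the variation sum telescopes exactly, while each squared increment is of order 1/n², so the quadratic variation sum vanishes as the mesh shrinks.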
Recall from Theorem 2.17 that Brownian motion is nowhere differentiable. We will show that a Brownian motion has unbounded total variation but finite quadratic variation.
Now we consider the total variation and quadratic variation of a Brownian motion {B(t) : t ≥ 0}, defined as follows: the total variation

V(B)[0, t] = lim_{∥Pn∥→0} V_{Pn}(B)[0, t],  where V_{Pn}(B)[0, t] = ∑_{k=1}^n |B(tk) − B(tk−1)|,

and the quadratic variation

Q(B)[0, t] = lim_{∥Pn∥→0} Q_{Pn}(B)[0, t],  where Q_{Pn}(B)[0, t] = ∑_{k=1}^n |B(tk) − B(tk−1)|².

Note that the total variation and quadratic variation of a Brownian motion are both random variables.

Theorem 2.19. The quadratic variation of a Brownian motion {B(s) : s ∈ [0, t]} satisfies

Q(B)[0, t] = lim_{∥Pn∥→0} Q_{Pn}(B)[0, t] = t,

where the convergence is in L²(Ω), that is,

E[(Q_{Pn}(B)[0, t] − t)²] → 0, (17)

as ∥Pn∥ → 0. As a consequence, the convergence holds in probability as well: for each t > 0 and every ε > 0,

P(|Q_{Pn}(B)[0, t] − t| > ε) → 0, (18)

as ∥Pn∥ → 0. Furthermore, for sufficiently refined sequences of partitions {Pn : n ≥ 1} with ∥Pn∥ → 0 as n → ∞, one obtains almost sure convergence: for each t > 0, almost surely,

Q_{Pn}(B)[0, t] → t. (19)

Theorem 2.19 shows that the quadratic variation of a Brownian motion is finite (and equal to t on [0, t]), which will be crucial in studying stochastic integration. The following lemma demonstrates that, in contrast, a Brownian motion has unbounded total variation.

Lemma 2.20. For all t > 0, almost surely V (B)[0, t] = ∞.
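Theorem 2.19 and Lemma 2.20 are easy to observe on simulated paths: refine a uniform partition and watch Q_{P_n} settle at t while V_{P_n} blows up. A seeded sketch (function names are ours):

```python
import math
import random

def variation_estimates(t, n_steps, seed=0):
    """(V_{P_n}, Q_{P_n}) of a simulated Brownian path on a uniform partition."""
    rng = random.Random(seed)
    h = t / n_steps
    v = q = 0.0
    for _ in range(n_steps):
        inc = rng.gauss(0.0, math.sqrt(h))  # B(t_k) - B(t_{k-1}) ~ N(0, h)
        v += abs(inc)
        q += inc * inc
    return v, q

v1, q1 = variation_estimates(2.0, 1000, seed=3)
v2, q2 = variation_estimates(2.0, 100000, seed=3)

assert abs(q2 - 2.0) < 0.1  # quadratic variation concentrates around t = 2
assert v2 > 5 * v1          # total variation grows without bound as the mesh shrinks
```

On a uniform mesh, E[V_{P_n}] grows like √n (each |increment| has mean of order √h and there are n of them), while Var(Q_{P_n}) = 2t²/n shrinks, which is exactly the dichotomy the theorem and lemma describe.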

2.8 Markov and martingale properties


In the previous sections we defined a Brownian motion {B(t) : t ≥ 0} in one-dimensional space (a linear Brownian motion). We now extend it to higher-dimensional spaces. To this end, we require that each component is a linear Brownian motion and that the components are independent.
Definition 2.21 (n-dimensional Brownian motion). An n-dimensional Brownian motion {B(t) : t ≥ 0} started at x = (x1, . . . , xn) is a stochastic process given by

B(t) = (B1(t), . . . , Bn(t))^T,

where B1, . . . , Bn are independent linear (1-dimensional) Brownian motions started at x1, . . . , xn respectively. A Brownian motion started at the origin is also called a standard Brownian motion.

Throughout this chapter, we write Px for the probability measure which makes the n-dimensional process {B(t) : t ≥ 0} a Brownian motion started at x ∈ R^n, and Ex for the corresponding expectation.

2.8.1 Filtrations
We equip our measurable space (Ω, F ) with a filtration, i.e., a nondecreasing
family {Ft ; t ≥ 0} of sub-σ-fields of F .

Definition 2.22 (Filtration).

1. A filtration on a probability space (Ω, F , P) is a family (F (t) : t ≥ 0) of
σ-algebras such that F (s) ⊂ F (t) ⊂ F for all s < t.
2. A probability space together with a filtration is called a filtered probability
space.
3. A stochastic process {X(t) : t ≥ 0} defined on a filtered probability space
with filtration (F (t) : t ≥ 0) is called adapted if X(t) is F (t)-measurable
for any t ≥ 0.
We now introduce two natural filtrations for a Brownian motion (B(t) : t ≥
0).
Definition 2.23 (Two natural filtrations for Brownian motions). Let {B(t) :
t ≥ 0} be a Brownian motion defined on a probability space (Ω, F , P).
1. We denote by (F⁰(t) : t ≥ 0) the filtration defined by

F⁰(t) := σ(B(s) : 0 ≤ s ≤ t),

which is the σ-algebra generated by the random variables B(s) for 0 ≤ s ≤ t.

2. We denote by (F⁺(t) : t ≥ 0) the filtration defined by

F⁺(t) := ∩_{s>t} F⁰(s) = ∩_{s>t} σ(B(τ) : 0 ≤ τ ≤ s).

For each t, F⁰(t) is the smallest σ-algebra such that each B(s), 0 ≤ s ≤ t, is measurable. Intuitively, this is the σ-algebra containing all the information available from observing the process up to time t. Clearly F⁰(t) ⊆ F⁺(t). Intuitively, the filtration F⁺(t) allows an "infinitesimal glance into the future". In particular, the σ-algebra F⁺(0) is called the germ σ-algebra.
Definition 2.24 (right-continuous filtration). A filtration (F(t) : t ≥ 0) is said to be right-continuous if

F(t) = ∩_{ε>0} F(t + ε).

The crucial property which distinguishes the filtration (F⁺(t) : t ≥ 0) from the filtration (F⁰(t) : t ≥ 0) is that the former is right-continuous, that is,

F⁺(t) = ∩_{ε>0} F⁺(t + ε).

This is because

∩_{ε>0} F⁺(t + ε) = ∩_{k=1}^∞ ∩_{m=1}^∞ F⁺(t + 1/k + 1/m) = F⁺(t).

Example 2.25. Let (B(t) : t ≥ 0) be a standard Brownian motion. We define

τ := inf{t > 0 : B(t) > 0},

then the event {τ = 0} belongs to F + (0) because

{τ = 0} = ∩_{n≥1} {∃ 0 < ε < 1/n, B(ε) > 0} ∈ F + (0).

2.8.2 Markov property
Definition 2.26 (independence of two stochastic processes). Two stochastic
processes {X(t) : t ≥ 0} and {Y (t) : t ≥ 0} are called independent if for any
sets t1 , . . . , tm ≥ 0 and s1 , . . . , sk ≥ 0 of times the vectors (X(t1 ), . . . , X(tm ))
and (Y (s1 ), . . . , Y (sk )) are independent.
Suppose that {X(t) : t ≥ 0} is a stochastic process. Intuitively, the Markov property says that if we know the process {X(t) : t ≥ 0} on the interval [0, s], then for the prediction of the future {X(t) : t ≥ s} we only need to know the information about the end point (the present) X(s), but not necessarily the information about the whole path (the history) {X(t) : 0 ≤ t ≤ s}.
The basic Markov property for a Brownian motion is the following.
Theorem 2.27 (Markov property I). Let {B(t) : t ≥ 0} be an n-dimensional
Brownian motion started at x ∈ Rn . Let s ≥ 0. Then the process {B(t +
s) − B(s)}t≥0 is a standard Brownian motion and is independent of the process
{B(u) : 0 ≤ u ≤ s}.
In fact, we have a stronger result:
Theorem 2.28 (Markov property II). Let {B(t) : t ≥ 0} be an n-dimensional
Brownian motion started at x ∈ Rn . Let s ≥ 0. Then the process {B(t + s) −
B(s)}t≥0 is a standard Brownian motion and is independent of F + (s).

2.8.3 Strong Markov property*


Definition 2.29 (stopping time). A random variable T with values in [0, ∞]
defined on a probability space with filtration (F (t) : t ≥ 0) is called a stopping
time with respect to this filtration if {T ≤ t} ∈ F (t) for every t ≥ 0.
Intuitively, a stopping time is a moment when a random event related to the
process happens.
For every stopping time T we define the σ-algebra

F + (T ) = {A ∈ F : A ∩ {T ≤ t} ∈ F + (t) for all t ≥ 0}.

Intuitively, this is the collection of all events that happened before the stopping
time T .
Theorem 2.30 (Strong Markov property). For every almost surely finite stop-
ping time T , the process

{B(T + t) − B(T ) : t ≥ 0}

is a standard Brownian motion independent of F + (T ).


One interesting application of the strong Markov property is the following reflection principle.
Theorem 2.31 (Reflection principle). Let {B(t) : t ≥ 0} be a standard Brownian motion and T be a stopping time. Then the process

B(t)1{t≤T } + (2B(T ) − B(t))1{t≥T } ,

called the Brownian motion reflected at T , is also a standard Brownian motion.

A consequence of the reflection principle is the following.
Proposition 2.32. Let {B(t) : t ≥ 0} be a standard linear Brownian motion and let

M (t) := max_{0≤s≤t} B(s).

Then if a > 0,

P(M (t) ≥ a) = 2P(B(t) ≥ a) = P(|B(t)| ≥ a).
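Proposition 2.32 can also be checked by simulation. The sketch below is an illustration only (not part of the notes' formal development); the step count, number of paths, level a = 1, horizon t = 1, seed, and tolerance are all arbitrary choices, and the discrete-time maximum slightly underestimates the continuous one.

```python
import math
import random

def simulate_max_and_end(n_steps, t, rng):
    """One discretized Brownian path on [0, t]; return (running max, B(t))."""
    dt = t / n_steps
    b, m = 0.0, 0.0
    for _ in range(n_steps):
        b += rng.gauss(0.0, math.sqrt(dt))
        m = max(m, b)
    return m, b

rng = random.Random(0)
a, t, n_paths = 1.0, 1.0, 10000
hit_max = hit_end = 0
for _ in range(n_paths):
    m, b = simulate_max_and_end(400, t, rng)
    hit_max += (m >= a)
    hit_end += (b >= a)

p_max = hit_max / n_paths   # estimates P(M(t) >= a)
p_end = hit_end / n_paths   # estimates P(B(t) >= a)
print(p_max, 2 * p_end)     # Proposition 2.32 says these should be close
```

The two printed numbers should agree up to Monte Carlo and discretization error.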

2.8.4 Martingale property


In this section, we study the martingale property of Brownian motion.
Definition 2.33. A real-valued stochastic process {X(t)} is called a martingale
with respect to a filtration {F (t) : t ≥ 0} if it is adapted to the filtration,
E|X(t)| < ∞ for all t ≥ 0 and for any 0 ≤ s ≤ t we have

E[X(t)|F (s)] = X(s) almost surely.

The process is called a submartingale if ≥ holds and a supermartingale if ≤


holds.

Intuitively, a martingale describes fair games in the sense that the current
state is always the best prediction for future states.
Theorem 2.34. A Brownian motion is a martingale (with respect to the filtra-
tion {F + (t) : t ≥ 0}).

We now present two useful facts about martingales: the optional stopping theorem and Doob's maximal inequality. The proofs of the two following theorems, which can be found in [1], will be omitted.
Theorem 2.35 (Optional stopping theorem). Suppose {X(t) : t ≥ 0} is a
continuous martingale and 0 ≤ S ≤ T are stopping times. If the process {X(t ∧
T ) : t ≥ 0} is dominated by an integrable random variable X, that is |X(t∧T )| ≤
X almost surely for all t ≥ 0, then

E[X(T )|F (S)] = X(S), almost surely.

Theorem 2.35 says that, under certain conditions, the expected value of a martingale at a stopping time is equal to its initial expected value. Martingales are useful in modeling the wealth of a gambler participating in a fair game; the optional stopping theorem says that, on average, nothing can be gained by stopping play based on the information obtainable so far (i.e., without looking into the future).

Theorem 2.36 (Doob’s maximal inequality). Suppose {X(t) : t ≥ 0} is a continuous martingale. Then for any t ≥ 0 and any p > 1, we have

E[( sup_{0≤s≤t} |X(s)|)^p ] ≤ (p/(p − 1))^p E[|X(t)|^p ].
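A quick numerical sanity check of Theorem 2.36 (illustrative only; the grid size, path count, p = 2, and seed are arbitrary choices): Brownian motion is a continuous martingale, so the simulated left-hand side should stay below the bound (p/(p − 1))^p E|B(t)|^p.

```python
import math
import random

def doob_estimates(t=1.0, n_steps=400, n_paths=4000, p=2, seed=1):
    """Monte Carlo estimates of E[(sup_{s<=t}|B(s)|)^p] and (p/(p-1))^p E[|B(t)|^p]."""
    rng = random.Random(seed)
    dt = t / n_steps
    lhs = rhs = 0.0
    for _ in range(n_paths):
        b, m = 0.0, 0.0
        for _ in range(n_steps):
            b += rng.gauss(0.0, math.sqrt(dt))
            m = max(m, abs(b))
        lhs += m ** p
        rhs += abs(b) ** p
    lhs /= n_paths
    bound = (p / (p - 1)) ** p * (rhs / n_paths)
    return lhs, bound

lhs, bound = doob_estimates()
print(lhs, bound)  # Doob's inequality says the first should not exceed the second
```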

3 Stochastic integral
In this section, we will study the stochastic integral, also known as the Ito integral, which has the form

∫_a^b f (t, ω)dB(t, ω),

where f (t, ω) is a stochastic process adapted to the filtration Ft = σ{B(s); s ≤ t} and ∫_a^b |f (t)|^2 dt < ∞.
We first consider the case where f is a deterministic function in the next subsection; in this case the integral is called the Wiener integral.

3.1 Wiener Integral


In this section, we consider the following Riemann-Stieltjes type integral:

∫_a^b f (t)dB(t, ω), (20)

where f is a deterministic function (i.e. does not depend on ω) and B(t, ω) is a Brownian motion. From Lemma 2.20 we see that the Brownian motion B(t) has unbounded total variation. Therefore, one may propose to define this integral in the following sense: for each ω ∈ Ω,

∫_a^b f (t)dB(t, ω) = f (t)B(t, ω)|_a^b − ∫_a^b B(t, ω)df (t). (21)

However, the class of functions f (t) for which the integral ∫_a^b B(t, ω)df (t) is defined for each ω ∈ Ω is rather limited, as we need f to have bounded total variation.
To extend the integral (20) for a wider class of functions f (t), we need a new
idea rather than the Riemann-Stieltjes integral. This new integral, called the
Wiener integral of f , is defined for all functions f ∈ L2 ([a, b]), where L2 ([a, b])
denotes the Hilbert space of all real-valued square integrable functions on [a, b].
Example 3.1. Let f : [0, 1] → R be defined as

f (t) = 0 if t = 0, and f (t) = t sin(1/t) if 0 < t ≤ 1.

Then it is easy to check that f is continuous with unbounded variation. Moreover, f ∈ L2 ([0, 1]).
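The behaviour in Example 3.1 can be seen numerically. The sketch below (illustrative only; the choice of evaluation points and grid sizes is ours) sums the increments of f over the extremal points t_k = 2/((2k+1)π), where sin(1/t_k) = ±1; this lower bound for the total variation grows like a harmonic series, while a Riemann-sum approximation of the squared L2 norm stays bounded by 1/3 since |f (t)| ≤ t.

```python
import math

def f(t):
    return t * math.sin(1.0 / t) if t > 0 else 0.0

def variation_lower_bound(n):
    """Sum of |increments| of f over t_k = 2/((2k+1)*pi), k = n..0, plus t = 1:
    a lower bound for the total variation of f on [0, 1]."""
    pts = [2.0 / ((2 * k + 1) * math.pi) for k in range(n, -1, -1)] + [1.0]
    vals = [f(p) for p in pts]
    return sum(abs(y - x) for x, y in zip(vals, vals[1:]))

def l2_norm_sq(n=100000):
    """Midpoint Riemann-sum approximation of the squared L2 norm of f on [0, 1]."""
    dt = 1.0 / n
    return sum(f((i + 0.5) * dt) ** 2 for i in range(n)) * dt

v_small, v_big = variation_lower_bound(10), variation_lower_bound(100000)
l2 = l2_norm_sq()
print(v_small, v_big)  # grows without bound as more oscillations are included
print(l2)              # finite: bounded by 1/3
```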

3.1.1 Construction
We first suppose f is a step function given by

f = Σ_{i=1}^n a_i 1_{[t_{i−1}, t_i)} ,

where t0 = a and tn = b. In this case, define the integral of the step function f by

I(f ) = ∫_a^b f (t)dB(t) = Σ_{i=1}^n a_i (B(t_i ) − B(t_{i−1} )). (22)

It is easy to check that I is linear, i.e. I(af +bg) = aI(f )+bI(g) for any a, b ∈ R
and step functions f and g. Moreover, we have

Lemma 3.2. For a step function f , the random variable I(f ) is Gaussian with mean 0 and variance

E(I(f )^2 ) = ∫_a^b f (t)^2 dt. (23)

We use L2 (Ω) to denote the Hilbert space of square integrable real-valued random variables on Ω with inner product ⟨X, Y ⟩ = E[XY ]. Let f ∈ L2 ([a, b]) and take a sequence {fn }_{n=1}^∞ of step functions such that fn → f in L2 ([a, b]). By Lemma 3.2 the sequence {I(fn )}_{n=1}^∞ is Cauchy in L2 (Ω). Therefore, it converges in L2 (Ω). Define

I(f ) = lim_{n→∞} I(fn ), in L2 (Ω). (24)

In order to show I(f ) is well-defined, we also need to prove that the limit in (24) is independent of the choice of the sequence {fn }. Suppose {gm } is another sequence of step functions and gm → f in L2 ([a, b]). Then by the linearity of the mapping I, (23), and the triangle inequality, we have

E[(I(fn ) − I(gm ))^2 ] = E[(I(fn − gm ))^2 ] = ∫_a^b (fn (t) − gm (t))^2 dt
≤ (∥fn − f ∥_{L2([a,b])} + ∥gm − f ∥_{L2([a,b])} )^2 ,

which converges to 0 as n, m → ∞. Therefore, it follows that

lim_{n→∞} I(fn ) = lim_{m→∞} I(gm )

in L2 (Ω). This shows that I(f ) is well-defined.


Theorem 3.3. For each f ∈ L2 ([a, b]), the Wiener integral defined in (24) is
a Gaussian random variable with mean 0 and variance ∥f ∥2L2 ([a,b]) .

Thus the Wiener integral I : L2 ([a, b]) → L2 (Ω) is an isometry. In fact, it preserves the inner product.
Corollary 3.4. If f, g ∈ L2 ([a, b]), then

E[I(f )I(g)] = ∫_a^b f (t)g(t)dt. (25)

In particular, if f and g are orthogonal, then the Gaussian random variables I(f ) and I(g) are independent.

Example 3.5. The Wiener integral

∫_0^1 s dB(s)

is a Gaussian random variable with mean 0 and variance ∫_0^1 s^2 ds = 1/3.
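Example 3.5 can be reproduced by Monte Carlo (illustrative only; the grid, sample count, and seed are arbitrary choices): approximate ∫_0^1 s dB(s) by a sum of the form (22) with a_i = t_{i−1} on a uniform partition and check that the samples have mean ≈ 0 and variance ≈ 1/3.

```python
import math
import random

rng = random.Random(9)
n, n_samples = 500, 4000
dt = 1.0 / n

samples = []
for _ in range(n_samples):
    acc = 0.0
    for i in range(n):
        # one term of the approximating sum (22): a_i = t_{i-1} = i*dt
        acc += (i * dt) * rng.gauss(0.0, math.sqrt(dt))
    samples.append(acc)

mean = sum(samples) / n_samples
var = sum((x - mean) ** 2 for x in samples) / n_samples
print(mean, var)  # theory: mean 0, variance 1/3
```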
It turns out that the Wiener integral defined in (24) coincides with the Riemann-Stieltjes integral defined in (21) almost surely, which means that the former is indeed an extension of the latter.
Theorem 3.6. Let f be a continuous function of bounded variation. Then for almost all ω ∈ Ω,

(∫_a^b f (t)dB(t))(ω) = ∫_a^b f (t)dB(t, ω),

where the left-hand side is the Wiener integral of f and the right-hand side is the Riemann-Stieltjes integral of f defined by (21).
Let f ∈ L2 ([a, b]) and consider the stochastic process defined by

Mt = ∫_a^t f (s)dB(s) (26)

for a ≤ t ≤ b. See Definition 2.33 for the definition of a martingale. In what follows, if the filtration is not explicitly specified, then the filtration {Ft } is understood to be the one given by Ft = σ{Xs ; s ≤ t}. Recall that a Brownian motion B(t) is a martingale. We now show that the stochastic process Mt defined in (26) is also a martingale.
Theorem 3.7. Let f ∈ L2 ([a, b]). Then the stochastic process

Mt = ∫_a^t f (s)dB(s), a ≤ t ≤ b,

is a martingale with respect to Ft = σ{B(s); a ≤ s ≤ t}.
In what follows, we write Ft for F 0 (t).

3.2 Stochastic integrals


Let B(t) be a Brownian motion and {Ft ; a ≤ t ≤ b} a filtration satisfying the following conditions:

• For each t, B(t) is Ft -measurable.

• For any s ≤ t, the random variable B(t) − B(s) is independent of the σ-field Fs .

Definition 3.8. In what follows, we use L2ad ([a, b]×Ω) to denote the space of all
stochastic process f (t, ω), a ≤ t ≤ b, ω ∈ Ω, satisfying the following conditions:
1. f (t, ω) is adapted to the filtration {Ft };

2. ∫_a^b E[|f (t)|^2 ]dt < ∞.
The main purpose of this subsection is to define the stochastic integral

∫_a^b f (t)dB(t), (27)

for f ∈ L2ad ([a, b] × Ω). We will split this construction into three steps.
Step 1. f is a step stochastic process in L2ad ([a, b] × Ω).
Suppose f is a step stochastic process given by

f (t, ω) = Σ_{i=1}^n ξ_{i−1} (ω)1_{[t_{i−1}, t_i)} (t), (28)

where t0 = a and tn = b, ξ_{i−1} is F_{t_{i−1}}-measurable and E[ξ_{i−1}^2 ] < ∞. In this case, we define

I(f ) = ∫_a^b f (t)dB(t) = Σ_{i=1}^n ξ_{i−1} (B(t_i ) − B(t_{i−1} )). (29)

It is easy to see that I is a linear mapping, i.e. I(af + bg) = aI(f ) + bI(g) for
any a, b ∈ R and any such step stochastic processes f and g. Moreover, we have
Lemma 3.9. Let I(f ) be defined by (29). Then, E[I(f )] = 0 and

E[I(f )^2 ] = ∫_a^b E[f (t)^2 ]dt. (30)

Step 2. An approximation lemma.


We need an approximation lemma to define the stochastic integral (27) for
general stochastic functions f ∈ L2ad ([a, b] × Ω).
Lemma 3.10. Suppose f ∈ L2ad ([a, b] × Ω). Then there exists a sequence {fn (t)} of step stochastic processes in L2ad ([a, b] × Ω) such that

lim_{n→∞} ∫_a^b E[|f (t) − fn (t)|^2 ]dt = 0. (31)

Case 2: f is bounded.
In this case, define a stochastic process gn by

gn (t, ω) = ∫_0^{n(t−a)} e^{−τ} f (t − n^{−1} τ, ω)dτ.

Note that gn is adapted to Ft and ∫_a^b E[|gn (t)|^2 ]dt < ∞.
Claim 3.11. For each n, E[gn (t)gn (s)] is a continuous function of (t, s).
And we also have
Claim 3.12. We have

lim_{n→∞} ∫_a^b E[|f (t) − gn (t)|^2 ]dt = 0.

Step 3. The stochastic integral ∫_a^b f (t)dB(t) for f ∈ L2ad ([a, b] × Ω).
Now we can use what we proved in Steps 1 and 2 to define the stochastic integral

∫_a^b f (t)dB(t),

for f ∈ L2ad ([a, b] × Ω).


Apply Lemma 3.10 to get a sequence {fn (t, ω)} of adapted step stochastic processes such that (31) holds. For each n, I(fn ) is defined by Step 1. By Lemma 3.9 we have

lim_{n,m→∞} E[|I(fn ) − I(fm )|^2 ] = lim_{n,m→∞} ∫_a^b E[|fn (t) − fm (t)|^2 ]dt = 0.

Hence the sequence {I(fn )} is a Cauchy sequence in L2 (Ω). Define

I(f ) = lim_{n→∞} I(fn ) (32)

in L2 (Ω). We can then use arguments similar to those in Section 3.1.1 for the Wiener integral to show that the above I(f ) is well-defined.
Definition 3.13. The limit I(f ) defined in (32) is called the Ito integral of f and is denoted by ∫_a^b f (t)dB(t).
From our construction it is easy to check the mapping I defined on f ∈
L2ad ([a, b]×Ω) is linear. Furthermore, the Ito integral I : L2ad ([a, b]×Ω) → L2 (Ω)
is an isometry.
Theorem 3.14. Suppose f ∈ L2ad ([a, b] × Ω). Then the Ito integral I(f ) = ∫_a^b f (t)dB(t) is a random variable with E[I(f )] = 0 and

E[|I(f )|^2 ] = ∫_a^b E[|f (t)|^2 ]dt. (33)

Since I is linear, we also have the following corollary, whose proof is similar to
that of Corollary 3.4.
Corollary 3.15. For any f, g ∈ L2ad ([a, b] × Ω), the following equality holds:

E[(∫_a^b f (t)dB(t))(∫_a^b g(t)dB(t))] = ∫_a^b E[f (t)g(t)]dt.

In the rest of this subsection, we discuss some examples of stochastic inte-


grals.
Example 3.16. We have

∫_a^b B(t)dB(t) = (1/2)[B(b)^2 − B(a)^2 − (b − a)].

As a consequence of Example 3.16, we have

∫_a^t B(s)dB(s) = (1/2)[B(t)^2 − B(a)^2 − (t − a)]. (34)
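Formula (34) can be checked against a left-endpoint Riemann sum along a simulated path (an illustration only; the grid size and seed are arbitrary choices). With a = 0 and t = 1, the sum Σ B(t_{i−1})(B(t_i) − B(t_{i−1})) should approach (1/2)[B(1)^2 − 1].

```python
import math
import random

rng = random.Random(2)
n, t = 2000, 1.0
dt = t / n
b = 0.0
ito_sum = 0.0
for _ in range(n):
    db = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += b * db      # integrand evaluated at the LEFT endpoint
    b += db

lhs = ito_sum
rhs = 0.5 * (b ** 2 - t)   # formula (34) with a = 0
print(lhs, rhs)            # should agree up to discretization error
```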

Example 3.17. We have

∫_a^b B(t)^2 dB(t) = (1/3)(B(b)^3 − B(a)^3 ) − ∫_a^b B(t)dt,

where the integral on the right-hand side is the Riemann integral of B(t, ω) for almost all ω ∈ Ω.

3.3 Properties
We consider the continuity property of the stochastic process defined by the stochastic integral

Xt = ∫_a^t f (s)dB(s), (35)

for a ≤ t ≤ b and f ∈ L2ad ([a, b] × Ω). Note that the stochastic integral is not defined for each fixed ω as a Riemann, Riemann-Stieltjes, or even Lebesgue integral. Even in the Wiener integral case, it is not defined pathwise. Therefore, the continuity of the stochastic process in (35) is not a trivial fact as in elementary real analysis.
Theorem 3.18. Suppose f ∈ L2ad ([a, b] × Ω). Then the stochastic process Xt
given in (35) is continuous, namely, almost all of its sample paths are continuous
functions on the interval [a, b].
The proof of Theorem 3.18 needs the Doob submartingale inequality, which is beyond the scope of this module; thus we omit it here.
We record the following, which says that if f is continuous, the approxima-
tion sequence fn can be explicitly constructed.
Theorem 3.19. Suppose f is a continuous {Ft }-adapted stochastic process. Then f ∈ L2ad ([a, b] × Ω) and

∫_a^b f (t)dB(t) = lim_{∥Pn ∥→0} Σ_{i=1}^n f (t_{i−1} )(B(t_i ) − B(t_{i−1} )),

in probability, where Pn = {t0 , · · · , tn } is a partition of the finite interval [a, b].


This integral inherits the properties of Ito integrals of simple processes. We
summarize these in the next theorem.
Theorem 3.20. Let f, g ∈ L2ad ([a, b] × Ω). Then the stochastic integral (35)
defined by (32) has the following properties:
1. (Continuity) As a function of t, the paths of X(t) are continuous a.s.
2. (Adaptivity) For each t, X(t) is Ft -measurable.
3. (Linearity) If Y (t) = ∫_a^t g(s)dB(s), then

X(t) + Y (t) = ∫_a^t (f (s) + g(s))dB(s) a.s.;

furthermore, for every constant c,

cX(t) = ∫_a^t cf (s)dB(s) a.s.

4. (Martingale) X(t) is a martingale with respect to the filtration {Ft ; a ≤
t ≤ b}.
5. (Ito isometry) E[X^2 (t)] = E[∫_a^t f^2 (s)ds].

6. (Quadratic variation) We have

[X]_t = lim_{∥Pn ∥→0} Σ_{k=1}^n (X_{t_k} − X_{t_{k−1}} )^2 = ∫_a^t f^2 (s)ds, (36)

in probability, where [X]_t is the quadratic variation of X.
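Property 6 can be illustrated numerically (a sketch only; the choices f(s) = s, the grid, and the seed are ours): for X_t = ∫_0^t s dB(s), the quadratic variation at t = 1 should be close to ∫_0^1 s^2 ds = 1/3.

```python
import math
import random

rng = random.Random(5)
n, t = 20000, 1.0
dt = t / n
qv = 0.0
for i in range(n):
    s = i * dt                                  # left endpoint of the subinterval
    dx = s * rng.gauss(0.0, math.sqrt(dt))      # increment of X_t = ∫ s dB(s)
    qv += dx * dx

print(qv)  # should be close to 1/3
```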

3.4 Ito Calculus and Finance


The Ito calculus created by the Japanese mathematician Kiyoshi Ito is pervasive in quantitative finance. In particular, it plays a crucial role in the derivation of the Black-Scholes formula.
If Xt represents the price of a non-dividend paying stock and f (t) denotes the number of shares held in that same stock through time, the Riemann-Stieltjes integral

∫_0^T f (t)dXt = lim_{∥Pn ∥→0} Σ_{i=1}^n f (t_{i−1} )(X_{t_i} − X_{t_{i−1}} ) (37)

is the cumulative gain or loss in the single-stock portfolio due to changes in the price of the stock from time 0 to time T, when trading takes place continuously in time. In particular,

f (t_{i−1} )(X_{t_i} − X_{t_{i−1}} )

represents the (approximate) gain or loss over the time interval [t_{i−1}, t_i ], and the limit lim_{∥Pn ∥→0} means the trading takes place continuously in time. For the Riemann-Stieltjes integral (37) to exist, i.e. for the limit in (37) to converge, we usually require Xt to have bounded variation on the interval [0, T ], which prevents us from using Brownian motion to model the price in the stock market.
Competition in the liquid capital markets is fierce as millions of traders try
to predict future prices based on assessments of available information, which
makes the stock prices change by seemingly random movements. In 1900, the French mathematician Louis Bachelier was the first to model stock prices with a Wiener process. It turns out that Brownian motion is a perfect candidate to help in modeling stock price movements.
However, Brownian motion has unbounded total variation, which leads to a major issue for the use of the Riemann-Stieltjes integral (21). To overcome this difficulty, Ito proposed a different method of convergence, i.e. the mean square limit:

lim_{∥Pn ∥→0} E[( Σ_{i=1}^n f (t_{i−1} )(B_{t_i} − B_{t_{i−1}} ) − I(f ) )^2 ] = 0,

where B(t) represents a Brownian motion. Built upon this idea, more complex Ito integrals can be constructed. In particular, if the stock price X(t) is modeled

as a geometric Brownian motion, we can make sense of the integral

∫_0^T f (t)dXt

in a proper way, which plays a fundamental role in the derivation of the famous Black-Scholes model.

3.5 No perfect foresight assumption


One may wonder why we cannot use other ways to construct the Ito integral, for instance,

lim_{∥Pn ∥→0} Σ_{i=1}^n f (τ_i )(X_{t_i} − X_{t_{i−1}} ),

where τ_i ∈ [t_{i−1}, t_i ], which is allowed in the Riemann–Stieltjes integral. In particular, if we choose

τ_i = (t_i + t_{i−1} )/2,

we may construct a notion of “integral" Ĩ(f ) through the following limit

lim_{∥Pn ∥→0} E[( Σ_{i=1}^n f ((t_i + t_{i−1} )/2)(B_{t_i} − B_{t_{i−1}} ) − Ĩ(f ) )^2 ] = 0.

This integral Ĩ(f ) is called the Stratonovich integral, and it is important in physics.
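The gap between the two conventions is visible numerically. The sketch below (illustrative only; grid size and seed are arbitrary choices) simulates one path on a fine grid and evaluates ∫_0^1 B dB with the left-endpoint rule (Ito) and the midpoint rule (Stratonovich); the respective limits are (1/2)B(1)^2 − 1/2 and (1/2)B(1)^2, differing by 1/2.

```python
import math
import random

rng = random.Random(3)
n = 2000                    # number of integration subintervals
dt = 1.0 / (2 * n)          # simulate on a grid twice as fine to get midpoints
b = [0.0]
for _ in range(2 * n):
    b.append(b[-1] + rng.gauss(0.0, math.sqrt(dt)))

# left-endpoint (Ito) and midpoint (Stratonovich) sums over the coarse grid
ito = sum(b[2 * i] * (b[2 * i + 2] - b[2 * i]) for i in range(n))
strat = sum(b[2 * i + 1] * (b[2 * i + 2] - b[2 * i]) for i in range(n))
bt = b[-1]
print(ito, 0.5 * bt ** 2 - 0.5)   # Ito limit: (1/2)B(1)^2 - 1/2
print(strat, 0.5 * bt ** 2)       # Stratonovich limit: (1/2)B(1)^2
```

Note that the two sums use the same Brownian increments; only the evaluation point of the integrand differs.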
The reason that we cannot choose τ_i other than the left endpoint t_{i−1} is that we should assume no one can see the future with certainty, which is one of the major assumptions in modeling financial markets. This means that to predict the loss or gain on the time interval [t_{i−1}, t_i ], the amount of shares we hold f (t) should only depend on the information that we have had up to t_{i−1}.
The American economist Robert C. Merton introduced the Ito calculus into finance in his seminal paper, where he realized that choosing any other input τ_i ∈ [t_{i−1}, t_i ], rather than the left endpoint τ_i = t_{i−1}, would imply perfect foresight, i.e. the correct prediction of future events. If there is no uncertainty, then an agent can have perfect foresight if they know all relevant information and have a correct model to use for prediction. When there is uncertainty it is not possible to have perfect foresight. As such, this is an invalid assumption when modeling financial decisions. For example, in the single-stock portfolio case, computing the cumulative gain by the Stratonovich integral would require assuming the portfolio manager has information about the future at the time of selecting the trading strategy, that is, at the time of selecting the number of shares to hold, which is impossible.

4 Ito formula
The chain rule in calculus is the formula

(d/dt) f (g(t)) = f ′ (g(t))g ′ (t)

for differentiable functions f and g. It can be rewritten in integral form as

f (g(t)) − f (g(a)) = ∫_a^t f ′ (g(s))g ′ (s)ds. (38)

In this section, we shall develop the chain rule for stochastic calculus.

4.1 Motivation
Let f be a differentiable function, and consider the composite function f (B(t)). Since almost all sample paths of B(t) are nowhere differentiable, the equation (38) obviously has no meaning. However, when we rewrite B ′ (s)ds as an integrator dB(s) in the Ito integral, (38) leads to the following question: does the equality

f (B(t)) − f (B(a)) = ∫_a^t f ′ (B(s))dB(s) (39)

hold for any differentiable function f ?
The integral ∫_a^t f ′ (B(s))dB(s) is an Ito integral. For f (x) = x^2, the formula (39) would give

B(t)^2 − B(a)^2 = 2 ∫_a^t B(s)dB(s),

which contradicts (34), namely

B(t)^2 − B(a)^2 − (t − a) = 2 ∫_a^t B(s)dB(s).

Therefore, the answer to the above question is negative.


Then is there a chain rule for the composite function f (B(t)) in integral form as in (38)? To answer this question, we consider a partition Pn = {t0 , · · · , tn } of [a, t]. Then we write

f (B(t)) − f (B(a)) = Σ_{i=1}^n [f (B(t_i )) − f (B(t_{i−1} ))]. (40)

Let f be a C 2 function, i.e. it is twice differentiable and the second derivative f ′′ is continuous. Then by Taylor expansion we have

f (B(t_i )) − f (B(t_{i−1} )) = f ′ (B(t_{i−1} ))(B(t_i ) − B(t_{i−1} ))
+ (1/2) f ′′ (B(t_{i−1} ) + λ_i (B(t_i ) − B(t_{i−1} )))(B(t_i ) − B(t_{i−1} ))^2,

where 0 < λ_i < 1, which together with (40) yields

f (B(t)) − f (B(a)) = Σ_{i=1}^n f ′ (B(t_{i−1} ))(B(t_i ) − B(t_{i−1} ))
+ (1/2) Σ_{i=1}^n f ′′ (B(t_{i−1} ) + λ_i (B(t_i ) − B(t_{i−1} )))(B(t_i ) − B(t_{i−1} ))^2. (41)

From Theorem 3.19 we note that

lim_{∥Pn ∥→0} Σ_{i=1}^n f ′ (B(t_{i−1} ))(B(t_i ) − B(t_{i−1} )) = ∫_a^t f ′ (B(s))dB(s) (42)

in probability. As for the second summation in (41), we may guess from Theorem 2.19 that

lim_{∥Pn ∥→0} Σ_{i=1}^n f ′′ (B(t_{i−1} ) + λ_i (B(t_i ) − B(t_{i−1} )))(B(t_i ) − B(t_{i−1} ))^2
= ∫_a^t f ′′ (B(s))ds. (43)

We shall prove (43) later. By collecting (41), (42), and (43), we have the
following result, which Ito proved in 1944.
Theorem 4.1. Let f (x) be a C 2 -function. Then

f (B(t)) − f (B(a)) = ∫_a^t f ′ (B(s))dB(s) + (1/2) ∫_a^t f ′′ (B(s))ds, (44)

where the first integral is an Ito integral, and the second integral is a Riemann integral for each sample path of B(s).
We remark that the appearance of the second term in (44) is a consequence of the nonzero quadratic variation of the Brownian motion B(t). This extra term is the key difference between Ito calculus and Leibniz-Newton calculus.
Example 4.2. Take the function f (x) = x^2 in (44) to get

B(t)^2 − B(a)^2 = 2 ∫_a^t B(s)dB(s) + (t − a),

which coincides with (34). If we take f (x) = x^3, then (44) with t = b gives

B(b)^3 − B(a)^3 = 3 ∫_a^b B(s)^2 dB(s) + 3 ∫_a^b B(s)ds,

which coincides with Example 3.17.
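The second identity in Example 4.2 can be checked along a simulated path (an illustration only; the grid size, seed, and the choices a = 0, b = 1 are ours): compare B(1)^3 against the left-endpoint Ito sum for 3 ∫ B^2 dB plus the Riemann sum for 3 ∫ B ds.

```python
import math
import random

rng = random.Random(4)
n = 40000
dt = 1.0 / n
b = 0.0
ito_part = 0.0       # left-endpoint sum for ∫_0^1 B(s)^2 dB(s)
riemann_part = 0.0   # Riemann sum for ∫_0^1 B(s) ds
for _ in range(n):
    db = rng.gauss(0.0, math.sqrt(dt))
    ito_part += b * b * db
    riemann_part += b * dt
    b += db

lhs = b ** 3
rhs = 3.0 * ito_part + 3.0 * riemann_part
print(lhs, rhs)  # Ito's formula with f(x) = x^3 predicts these agree
```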

4.2 Proof of Ito formula


In this subsection, we will prove Theorem 4.1, for which it suffices to show (43). To simplify the argument, we only show (43) when f ′′ is a bounded function.

Lemma 4.3. Let g(x) be a continuous function on R. For each n ≥ 1, let Pn = {t0 , t1 , · · · , tn } be a partition of [a, t] and let 0 < λ_i < 1 for 1 ≤ i ≤ n. Then, along a subsequence, the sums

Σ_{i=1}^n [g(B(t_{i−1} ) + λ_i (B(t_i ) − B(t_{i−1} ))) − g(B(t_{i−1} ))] (B(t_i ) − B(t_{i−1} ))^2 (45)

converge to 0 almost surely as ∥Pn ∥ → 0.


Lemma 4.4. Let g be a bounded measurable function and let P = {t0 , · · · , tn } be a partition of [a, t]. Then

lim_{∥P ∥→0} Σ_{i=1}^n g(B(t_{i−1} )) [(B(t_i ) − B(t_{i−1} ))^2 − (t_i − t_{i−1} )] = 0, (46)

in L2 (Ω).
Remark 4.5. If g is a continuous function on R, i.e. with no boundedness assumption, then (46) converges almost surely. We omit the proof.

4.3 Generalized Ito formula


Let f (t, x) be a continuous function of t and x with continuous partial derivatives ∂f /∂t, ∂f /∂x, and ∂^2 f /∂x^2. Then by Taylor expansion we get

f (t, x) − f (s, x0 ) = [f (t, x) − f (s, x)] + [f (s, x) − f (s, x0 )]
= (∂f /∂t)(s + ρ(t − s), x)(t − s) + (∂f /∂x)(s, x0 )(x − x0 ) (47)
+ (1/2)(∂^2 f /∂x^2 )(s, x0 + λ(x − x0 ))(x − x0 )^2,

where 0 < ρ, λ < 1.
Put x = Bt = B(t) to get a stochastic process f (t, Bt ). Then, similarly to (41), we can write

f (t, Bt ) − f (a, Ba ) = Σ_{i=1}^n [f (t_i , B_{t_i} ) − f (t_{i−1} , B_{t_{i−1}} )] = I + II + III, (48)

where I, II, and III are the summations corresponding to ∂f /∂t, ∂f /∂x, and ∂^2 f /∂x^2, respectively, in (47). By the continuity of ∂f /∂t and the Brownian motion Bt , we have

I = Σ_{i=1}^n (∂f /∂t)(t_{i−1} + ρ(t_i − t_{i−1} ), B_{t_{i−1}} )(t_i − t_{i−1} ) → ∫_a^t (∂f /∂t)(s, Bs )ds,

almost surely as ∥Pn ∥ → 0. By Theorem 3.19 we see that


II = Σ_{i=1}^n (∂f /∂x)(t_{i−1} , B_{t_{i−1}} )(B_{t_i} − B_{t_{i−1}} ) → ∫_a^t (∂f /∂x)(s, Bs )dBs ,

almost surely as ∥Pn ∥ → 0. Finally, a similar argument as in Lemmas 3.9 and 4.4 yields

III = (1/2) Σ_{i=1}^n (∂^2 f /∂x^2 )(t_{i−1} , B_{t_{i−1}} + λ(B_{t_i} − B_{t_{i−1}} ))(B_{t_i} − B_{t_{i−1}} )^2 → (1/2) ∫_a^t (∂^2 f /∂x^2 )(s, Bs )ds,

almost surely as ∥Pn ∥ → 0, up to some subsequence. Therefore, we have shown the following formula.
Theorem 4.6. Let f (t, x) be a continuous function with continuous partial derivatives ∂f /∂t, ∂f /∂x, and ∂^2 f /∂x^2. Then

f (t, Bt ) = f (a, Ba ) + ∫_a^t (∂f /∂x)(s, Bs )dBs
+ ∫_a^t [(∂f /∂t)(s, Bs ) + (1/2)(∂^2 f /∂x^2 )(s, Bs )] ds. (49)

To further generalize the Ito formula, we introduce some notation.
Definition 4.7. We say f ∈ Lad (Ω, Lp ([a, b])) if f is an {Ft }-adapted stochastic process such that ∫_a^b |f (t)|^p dt < ∞ almost surely.
Definition 4.8. An Ito process is a stochastic process of the form

Xt = Xa + ∫_a^t σ_s dBs + ∫_a^t µ_s ds, (50)

for a ≤ t ≤ b, where Xa is Fa -measurable, σ_t ∈ Lad (Ω, L2 ([a, b])), and µ_t ∈ Lad (Ω, L1 ([a, b])).
A convenient shorthand for writing (50) is the following “stochastic differential”

dXt = σ_t dBt + µ_t dt. (51)

It should be pointed out that the stochastic differential has no meaning by


itself since Brownian paths are nowhere differentiable. It is merely a symbolic
expression to mean the equality in (50).
The following theorem give the general form of Ito’s formula.
Theorem 4.9. Let Xt be an Ito process given by Definition 4.8, and let f be a continuous function with continuous partial derivatives ∂f /∂t, ∂f /∂x, and ∂^2 f /∂x^2. Then f (t, Xt ) is also an Ito process and

f (t, Xt ) = f (a, Xa ) + ∫_a^t σ_s (∂f /∂x)(s, Xs )dBs
+ ∫_a^t [(∂f /∂t)(s, Xs ) + µ_s (∂f /∂x)(s, Xs ) + (1/2)σ_s^2 (∂^2 f /∂x^2 )(s, Xs )] ds. (52)
It would be intuitive to show (52) through the symbolic derivation in terms of a stochastic differential, by using Taylor expansion to first order in dt and to second order in dXt , together with the following table:

×       dB(t)   dt
dB(t)   dt      0
dt      0       0

From this table we see that

(dXt )^2 = σ_t^2 (dBt )^2 + 2σ_t µ_t dBt dt + µ_t^2 (dt)^2 = σ_t^2 dt. (53)

Now we are ready to “prove" (52). First by Taylor expansion to get

∂f ∂f 1 ∂2f
df (t, Xt ) = (t, Xt )dt + (t, Xt )dXt + (t, Xt )(dXt )2 . (54)
∂t ∂x 2 ∂x2
Then we plug (50) and (53) to get

∂f ∂f 1 ∂2f
df (t, Xt ) = (t, Xt )dt + (t, Xt )(σt dB(t) + µt dt) + (t, Xt )σt2 dt
∂t ∂x 2 ∂x2
(55)
1 ∂2f
 
∂f ∂f ∂f
= σt dB(t) + + µt + σt2 2 dt.
∂x ∂t ∂x 2 ∂x

Here we omit the variable (t, Xt ) for simplicity. Finally, we can convert this
differential equation into integral form and get (52).
We remark that the above computation using the symbolic derivation of
stochastic differentials yields the correct Ito formula, however, this derivation is
not a proof.

4.4 Multidimensional Ito’s formula


We now turn to the situation in higher dimensions. Let B(t) = (B1 (t), . . . , Bm (t)) denote m-dimensional Brownian motion. For some stochastic processes σij ∈ Lad (Ω, L2 ([a, b])) and µi ∈ Lad (Ω, L1 ([a, b])), for 1 ≤ i ≤ n and 1 ≤ j ≤ m, consider the following n Ito processes:

dX1 = µ1 dt + σ11 dB1 + · · · + σ1m dBm
  ⋮
dXn = µn dt + σn1 dB1 + · · · + σnm dBm (56)

Or, in matrix notation, simply

dX(t) = M dt + S dB(t),

where

X(t) = (X1 (t), . . . , Xn (t))^T , M = (µ1 , . . . , µn )^T ,

S is the n × m matrix with entries σij , and dB(t) = (dB1 (t), . . . , dBm (t))^T.

Such a process X(t) is called an n-dimensional Ito process (or just an Ito process). The stochastic differentials (56) should be understood in integral form, i.e.,

Xi (t) = Xi (0) + ∫_0^t µi ds + ∫_0^t σi1 dB1 (s) + · · · + ∫_0^t σim dBm (s), 1 ≤ i ≤ n. (57)

Or in matrix form,

X(t) = X(0) + ∫_0^t M(s)ds + ∫_0^t S(s)dB(s).

Let f (t, x) be a C 2 function defined on [0, ∞) × Rn. Then the process f (t, X(t)) is again an Ito process, whose differential is given by

df (t, X(t)) = (∂f /∂t)(t, X)dt + Σ_i (∂f /∂xi )(t, X)dXi + (1/2) Σ_{i,j} (∂^2 f /∂xi ∂xj )(t, X)dXi dXj ,

where dXi dXj should be computed using the following table,

×        dBj (t)   dt
dBi (t)  δij dt    0
dt       0         0

where δij = 1 if i = j and δij = 0 if i ≠ j.
The product dBi (t)dBj (t) = 0 for i ≠ j is the symbolic expression of the following fact: let B1 (t) and B2 (t) be two independent Brownian motions and let Pn = {t0 , · · · , tn } be a partition of [a, b]. Then,

lim_{∥Pn ∥→0} Σ_{i=1}^n (B1 (t_i ) − B1 (t_{i−1} ))(B2 (t_i ) − B2 (t_{i−1} )) = 0

in L2 (Ω). The proof of this fact is left as an exercise.
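This fact is easy to see in simulation (a sketch only; the mesh sizes, seed, and tolerance are our choices): the cross sum Σ ΔB1 ΔB2 has mean 0 and a variance that shrinks with the mesh, so the fine-mesh sum clusters near 0.

```python
import math
import random

rng = random.Random(6)

def cross_variation(n, t=1.0):
    """Sum of products of increments of two independent Brownian motions on [0, t]."""
    dt = t / n
    sd = math.sqrt(dt)
    return sum(rng.gauss(0.0, sd) * rng.gauss(0.0, sd) for _ in range(n))

coarse = cross_variation(100)
fine = cross_variation(100000)
print(abs(coarse), abs(fine))  # the fine-mesh sum is typically much closer to 0
```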


Example 4.10. Let B = (B1 , . . . , Bn ) be Brownian motion in Rn, n ≥ 2, and consider

|B(t)| = (B1^2 (t) + · · · + Bn^2 (t))^{1/2} ,

i.e. the distance of B(t) to the origin. The function f (t, x) = |x| is not C 2 at the origin, but since Bt almost surely never hits the origin when n ≥ 2, Ito’s formula still works and we get

d|B(t)| = Σ_{i=1}^n (Bi (t)/|B(t)|) dBi (t) + ((n − 1)/(2|B(t)|)) dt. (58)

The process |B(t)| is called the n-dimensional Bessel process.

5 SDEs and applications in finance
A stochastic differential equation (SDE) is a differential equation in which one or more of the terms is a stochastic process. Solutions to an SDE, if they exist, are themselves stochastic processes. SDEs are used to model various phenomena such as stock prices or physical systems subject to thermal fluctuations.

5.1 Existence and uniqueness


In the last part of this module we will derive the Black-Scholes equation for the price of an option on an asset modeled as a geometric Brownian motion, which is a stochastic differential equation. To this end, we first study stochastic differential equations.
By a solution X(t, ω) of the stochastic differential equation

dX(t) = b(t, X(t))dt + σ(t, X(t))dB(t), b(t, x) ∈ R, σ(t, x) ∈ R, (59)

with initial condition X(0) = X0 , we mean that X(t) satisfies the stochastic integral equation

X(t) = X(0) + ∫_0^t b(s, X(s))ds + ∫_0^t σ(s, X(s))dB(s). (60)

Theorem 5.1 (Existence and uniqueness theorem for stochastic differential


equations). Let T > 0 and b(·, ·) : [0, T ] × R → R, σ(·, ·) : [0, T ] × R → R be
continuous functions satisfying

|b(t, x)| + |σ(t, x)| ≤ C(1 + |x|); x ∈ R, t ∈ [0, T ]

for some constant C, and such that

|b(t, x) − b(t, y)| + |σ(t, x) − σ(t, y)| ≤ D|x − y|; x, y ∈ R, t ∈ [0, T ]

for some constant D. Then the stochastic differential equation

dX(t) = b (t, X(t)) dt + σ (t, X(t)) dB(t), 0 ≤ t ≤ T, X(0) = X0

has a unique t-continuous solution X(t, ω) with the property that X(t, ω) is adapted to the filtration Ft generated by B(s); s ≤ t, and

E[∫_0^T |X(t)|^2 dt] < ∞.

It is the Ito formula that is the key to the solution of many stochastic dif-
ferential equations. The method is illustrated in the following examples.
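Although Theorem 5.1 is an existence and uniqueness statement, SDEs satisfying its hypotheses can also be approximated numerically. Below is a minimal Euler–Maruyama sketch (not from these notes; the test equation dX = −X dt + 0.5 dB, the step and path counts, and the seed are our own choices). For this linear example E[X(1)] = e^{−1} X(0), which the Monte Carlo average should roughly reproduce.

```python
import math
import random

def euler_maruyama(b, sigma, x0, t, n_steps, rng):
    """One Euler-Maruyama path for dX = b(t, X)dt + sigma(t, X)dB(t)."""
    dt = t / n_steps
    x = x0
    for i in range(n_steps):
        s = i * dt
        x += b(s, x) * dt + sigma(s, x) * rng.gauss(0.0, math.sqrt(dt))
    return x

rng = random.Random(7)
endpoints = [
    euler_maruyama(lambda s, x: -x, lambda s, x: 0.5, 1.0, 1.0, 500, rng)
    for _ in range(4000)
]
mc_mean = sum(endpoints) / len(endpoints)
print(mc_mean, math.exp(-1.0))  # should be close
```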

5.2 Mathematical models for stock prices


A geometric Brownian motion (GBM) (also known as exponential Brownian mo-
tion) is a continuous-time stochastic process, whose logarithm follows a Brown-
ian motion with drift. It is used in mathematical finance to model stock prices

Figure 2: Stock index (1979 - 2019). (source: https://www.stlouisfed.org/on-the-economy/2021/january/irrational-exuberance-look-stock-prices)

Figure 3: Illustration of geometric Brownian motions. (source: https://en.wikipedia.org/wiki/Geometric_Brownian_motion)

in the Black–Scholes model. A GBM process only assumes positive values, just like real stock prices, which is one of its main advantages compared to directly using Brownian motion to model stock prices, as Brownian motion may take negative values. See the above figure for an example of stock prices.
A stochastic process S(t) is a GBM if it satisfies the following stochastic
differential equation (SDE):

dS(t) = µS(t)dt + σS(t)dBt , (61)

where Bt is a Brownian motion. Here µ is the percentage drift, which is used


to model deterministic trends; and σ is the percentage volatility, which is of-
ten used to model a set of unpredictable events occurring during this motion
(randomness). For simplicity, we assume both µ and σ are constants.
Given initial data

S(0) = S0 , (62)

we expect that there exist solutions to (61) satisfying this initial condition.


Theorem 5.2. Given initial value S0 and constants µ and σ, the above SDE
(61) has a solution:

S(t) = S0 exp( (µ − σ²/2) t + σBt ) (63)

in the Ito sense.
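To illustrate the theorem numerically, here is a short sketch (not part of the original notes) that discretizes (61) with the Euler–Maruyama scheme and compares the result with the closed-form solution (63) driven by the same Brownian increments. All parameter values are illustrative.

```python
import math
import random

# Sketch (not from the notes): discretize (61) with the Euler-Maruyama
# scheme and compare against the closed-form solution (63) driven by the
# SAME Brownian path.  Parameter values are illustrative.
random.seed(0)
mu, sigma, S0, T, n = 0.1, 0.2, 1.0, 1.0, 100_000
dt = T / n

S_euler, B = S0, 0.0
for _ in range(n):
    dB = random.gauss(0.0, math.sqrt(dt))
    S_euler += mu * S_euler * dt + sigma * S_euler * dB  # dS = mu*S dt + sigma*S dB
    B += dB

S_exact = S0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * B)  # formula (63)
rel_err = abs(S_euler - S_exact) / S_exact
print(f"Euler: {S_euler:.4f}  closed form: {S_exact:.4f}  rel. error: {rel_err:.2e}")
```

With this many steps the two values agree to a small relative error, consistent with the strong order 1/2 of the Euler–Maruyama scheme.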


We can use our knowledge about the behaviour of Bt (by the law of the iterated
logarithm, Bt grows at most at the order √(2t log log t) when t is large) to
gain information on these solutions. For example, for the Ito solution S(t)
we get the following:

1. If µ > σ²/2 then S(t) → ∞ as t → ∞, a.s.

2. If µ < σ²/2 then S(t) → 0 as t → ∞, a.s.

3. If µ = σ²/2 then S(t) will fluctuate between arbitrarily large and
arbitrarily small values as t → ∞, a.s.

These conclusions are direct consequences of the formula (63). We also have
the following.
Corollary 5.3. The above solution S(t) is a log-normally distributed random
variable with expected value and variance given by

E(S(t)) = S0 e^{µt},
Var(S(t)) = S0² e^{2µt} ( e^{σ²t} − 1 ).
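The moment formulas in Corollary 5.3 can be checked by Monte Carlo, sampling S(t) from the closed form (63) with Bt ~ N(0, t). The following sketch (illustrative parameters, not from the notes) does so.

```python
import math
import random

# Sketch (not from the notes): Monte Carlo check of Corollary 5.3 by
# sampling S(t) from the closed form (63), with B_t ~ N(0, t).
# Parameter values are illustrative.
random.seed(1)
mu, sigma, S0, t, N = 0.05, 0.3, 1.0, 1.0, 200_000

samples = [
    S0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * random.gauss(0.0, math.sqrt(t)))
    for _ in range(N)
]
mean_mc = sum(samples) / N
var_mc = sum((x - mean_mc) ** 2 for x in samples) / (N - 1)

mean_th = S0 * math.exp(mu * t)
var_th = S0**2 * math.exp(2 * mu * t) * (math.exp(sigma**2 * t) - 1.0)
print(f"mean: MC {mean_mc:.4f} vs theory {mean_th:.4f}")
print(f"var : MC {var_mc:.4f} vs theory {var_th:.4f}")
```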

From the above we see that the expectation of S(t) is determined by the
deterministic drift µ. If we use GBM to model the stock price, this shows that
the expected rate of return is independent of the value of the process (the
stock price), which agrees with what we would expect in reality.
However, GBM is not a completely realistic model, in particular:

• In real stock prices, volatility changes over time (possibly stochastically),
but in GBM, volatility σ is assumed constant.
• In real life, stock prices often show jumps caused by unpredictable events
or news, but in GBM, the path is continuous.
Example 5.4 (Generalized geometric Brownian motion). In an attempt to
make GBM more realistic as a model for stock prices, one can drop the assump-
tion that the volatility σ is constant. Let B(t), t ≥ 0, be a Brownian motion,
Ft , t ≥ 0 be an associated filtration, and µ(t) and σ(t) be adapted processes.
Define the Ito process,

X(t) = ∫₀ᵗ σ(s) dB(s) + ∫₀ᵗ ( µ(s) − σ²(s)/2 ) ds,

and its differential form

dX(t) = σ(t) dB(t) + ( µ(t) − σ²(t)/2 ) dt.

From Ito's table (4.3) we see that dX(t)dX(t) = σ²(t)dB(t)dB(t) = σ²(t)dt.
Consider an asset price process given by

S(t) = S(0)e^{X(t)} = S(0) exp( ∫₀ᵗ σ(s) dB(s) + ∫₀ᵗ ( µ(s) − σ²(s)/2 ) ds ) (64)

where S(0) is nonrandom and positive. We may write S(t) = f(X(t)), where
f(x) = S(0)e^x. According to the Ito formula,

dS(t) = df(X(t))
      = f′(X(t)) dX(t) + (1/2) f″(X(t)) dX(t)dX(t)
      = S(0)e^{X(t)} dX(t) + (1/2) S(0)e^{X(t)} dX(t)dX(t)
      = S(t) dX(t) + (1/2) S(t) dX(t)dX(t)                                 (65)
      = S(t)( σ(t) dB(t) + ( µ(t) − σ²(t)/2 ) dt ) + (1/2) σ²(t)S(t) dt
      = µ(t)S(t) dt + σ(t)S(t) dB(t),

which shows that the process S(t) satisfies

dS(t) = µ(t)S(t)dt + σ(t)S(t)dB(t) . (66)

We note that (61) is a special case of (66) with µ(t) and σ(t) taking
deterministic constant values. In (66), the asset price S(t) has instantaneous
mean rate of return µ(t) and volatility σ(t), both of which are allowed to be
time-varying and random.
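As a sketch of Example 5.4 (not from the notes), one can simulate S(T) from (64) with deterministic, time-varying coefficients by discretizing the two integrals in the exponent. The schedules µ(t) and σ(t) below are purely illustrative choices.

```python
import math
import random

# Sketch of Example 5.4 (not from the notes): simulate S(T) from (64) with
# deterministic, time-varying coefficients by discretizing the two
# integrals in the exponent.  The schedules below are illustrative.
random.seed(2)

def mu(t):
    return 0.05 + 0.02 * t           # illustrative drift schedule

def sigma(t):
    return 0.2 + 0.1 * math.sin(t)   # illustrative volatility schedule

S0, T, n = 1.0, 1.0, 10_000
dt = T / n

X = 0.0  # X(T) = int_0^T sigma dB + int_0^T (mu - sigma^2/2) ds
for k in range(n):
    t = k * dt
    dB = random.gauss(0.0, math.sqrt(dt))
    X += sigma(t) * dB + (mu(t) - 0.5 * sigma(t) ** 2) * dt

S_T = S0 * math.exp(X)
print(f"one sample of S(T): {S_T:.4f}")
```

Note that, just as in the constant-coefficient case, the exponential form guarantees the simulated price stays positive.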

5.3 Mathematical models for interest rate


It is observed that interest rates tend to return to some stable region; this
tendency is known as mean reversion. It is widely believed that interest rates
are mean reverting: very high or negative interest rates either lead to a
downward economic spiral or snap quickly back to more normal levels. A
reasonable model of interest rate behavior should therefore incorporate mean
reversion. The main purpose of this subsection is to introduce a mathematical
model with this mean-reverting property.
Example 5.5 (Model of interest rate). Let B(t), t ≥ 0, be a Brownian motion.
The Vasicek model for the interest rate process R(t) is

dR(t) = (α − βR(t))dt + σdB(t) (67)

where α, β, and σ are positive constants. Find a solution to this equation.

Equation (67) is an example of a stochastic differential equation. It defines
a random process, R(t) in this case, by giving a formula for its differential,
and the formula involves the random process itself and the differential of a
Brownian motion. Applying the Ito formula to e^{βt}R(t) (the integrating-factor
trick), one finds the solution

R(t) = e^{−βt}R(0) + (α/β)(1 − e^{−βt}) + σe^{−βt} ∫₀ᵗ e^{βs} dB(s).

Theorem 3.3 implies that the random variable

∫₀ᵗ e^{βs} dB(s)

appearing on the right-hand side is normally distributed with mean zero and
variance

∫₀ᵗ e^{2βs} ds = (e^{2βt} − 1)/(2β).
Therefore, R(t) is normally distributed with mean

E[R(t)] = e^{−βt}R(0) + (α/β)(1 − e^{−βt})

and variance

Var[R(t)] = (σ²/(2β))(1 − e^{−2βt}).


The Vasicek model has the desirable property that the interest rate is
mean-reverting. When R(t) = α/β, the drift term (the dt term) in (67) is zero.
When R(t) > α/β, this term is negative, which pushes R(t) back toward α/β.
When R(t) < α/β, this term is positive, which again pushes R(t) back toward
α/β. If R(0) = α/β, then E[R(t)] = α/β for all t ≥ 0. If R(0) ≠ α/β, then

lim_{t→∞} E[R(t)] = α/β.
There is one main disadvantage of the Vasicek model: no matter how the
parameters α > 0, β > 0, and σ > 0 are chosen, there is positive probability
that R(t) is negative, an undesirable property for an interest rate model. We
therefore introduce a similar but different model to get around this.
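Since R(t) is Gaussian with the mean and variance computed above, the positive probability of negative rates can be evaluated exactly via the standard normal CDF. A sketch with illustrative parameters (not from the notes):

```python
import math

# Sketch (illustrative parameters, not from the notes): under the Vasicek
# model R(t) is Gaussian with the mean and variance computed above, so
# P(R(t) < 0) can be evaluated exactly.
alpha, beta, sigma, R0, t = 0.05, 0.5, 0.1, 0.03, 5.0

mean = math.exp(-beta * t) * R0 + (alpha / beta) * (1.0 - math.exp(-beta * t))
var = (sigma**2 / (2.0 * beta)) * (1.0 - math.exp(-2.0 * beta * t))

# standard normal CDF expressed through the error function
p_negative = 0.5 * (1.0 + math.erf((0.0 - mean) / math.sqrt(2.0 * var)))
print(f"P(R({t}) < 0) = {p_negative:.4f}")  # strictly positive, as claimed
```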
Example 5.6 (Cox-Ingersoll-Ross (CIR) interest rate model). Let B(t), t ≥ 0,
be a Brownian motion. The Cox-Ingersoll-Ross model for the interest rate
process R(t) is

dR(t) = (α − βR(t))dt + σ√(R(t)) dB(t) (68)

where α, β, and σ are positive constants. Find the expectation and variance of
R(t).

Figure 4: 50 years of US inflation vs interest rates. (source of figure:
https://www.gzeromedia.com/the-graphic-truth-50-years-of-us-inflation-vs-interest-rates)

Figure 5: Illustration of Vasicek model for the interest rate. (source of
figure: https://en.wikipedia.org/wiki/Vasicek_model)

Unlike the Vasicek equation (67), the solution to the CIR equation (68) cannot
be written explicitly. The advantage of (68) over the Vasicek model is that
the interest rate in the CIR model does not become negative: if R(t) reaches
zero, the term multiplying dB(t) vanishes and the positive drift term αdt in
equation (68) drives the interest rate back into positive territory. Like the
Vasicek model, the CIR model is mean-reverting.

Theorem 5.7. The CIR model is mean-reverting.

Although we cannot write the solution of (68) explicitly, we can still find
the expected value and variance of R(t); since the expectation of an Ito
integral is 0, the calculation simplifies.
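Indeed, taking expectations in (68) removes the Ito integral (it has mean zero), so m(t) = E[R(t)] solves the ODE m′(t) = α − βm(t), giving the same mean formula as in the Vasicek model. A truncated Euler simulation (a sketch with illustrative parameters, not from the notes) confirms this numerically while keeping the simulated rates nonnegative:

```python
import math
import random

# Sketch (not from the notes): the CIR equation (68) has no closed-form
# solution, but E[R(t)] solves m'(t) = alpha - beta*m(t) because the Ito
# integral has mean zero.  A truncated Euler scheme (paths clipped at 0)
# checks this numerically.  Parameter values are illustrative.
random.seed(3)
alpha, beta, sigma, R0 = 0.05, 0.5, 0.1, 0.03
T, n, N = 1.0, 100, 10_000
dt = T / n

total = 0.0
min_R = float("inf")
for _ in range(N):
    R = R0
    for _ in range(n):
        dB = random.gauss(0.0, math.sqrt(dt))
        R += (alpha - beta * R) * dt + sigma * math.sqrt(max(R, 0.0)) * dB
        R = max(R, 0.0)  # truncation at zero keeps the path nonnegative
        min_R = min(min_R, R)
    total += R

mean_mc = total / N
mean_ode = math.exp(-beta * T) * R0 + (alpha / beta) * (1.0 - math.exp(-beta * T))
print(f"E[R(T)]: MC {mean_mc:.5f} vs ODE {mean_ode:.5f}; min over paths {min_R:.5f}")
```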

5.4 Evolution of portfolio value


Consider an agent who at each time t has a portfolio valued at X(t). This
portfolio invests in a money market account paying a constant rate of interest
r and in a stock modeled by the geometric Brownian motion

dS(t) = µS(t)dt + σS(t)dB(t) . (69)

Suppose at each time t, the investor holds H(t) shares of stock. The position
H(t) can be random but must be adapted to the filtration associated with the
Brownian motion B(t), t ≥ 0. The remainder of the portfolio value, X(t) −
H(t)S(t), is invested in the money market account.
The differential dX(t) for the investor’s portfolio value at each time t is due
to two factors, the capital gain H(t)dS(t) on the stock position and the interest
earnings r(X(t) − H(t)S(t))dt on the cash position. In other words,

dX(t) = H(t)dS(t) + r(X(t) − H(t)S(t))dt
      = H(t)(µS(t)dt + σS(t)dB(t)) + r(X(t) − H(t)S(t))dt          (70)
      = rX(t)dt + H(t)(µ − r)S(t)dt + H(t)σS(t)dB(t) .
The three terms appearing in the last line of (70) can be understood as follows:
1. an average underlying rate of return r on the portfolio, which is reflected
by the term rX(t)dt;
2. a risk premium µ − r for investing in the stock, which is reflected by the
term H(t)(µ − r)S(t)dt, and
3. a volatility term proportional to the size of the stock investment, which is
the term H(t)σS(t)dB(t).
We shall often consider the discounted stock price e−rt S(t) and the dis-
counted portfolio value of an agent, e−rt X(t). According to the Ito formula
with f (t, x) = e−rt x, the differential of the discounted stock price is

d(e^{−rt}S(t)) = df(t, S(t))
             = ft(t, S(t)) dt + fx(t, S(t)) dS(t) + (1/2) fxx(t, S(t)) dS(t)dS(t)
             = −re^{−rt}S(t) dt + e^{−rt} dS(t)                          (71)
             = (µ − r)e^{−rt}S(t) dt + σe^{−rt}S(t) dB(t)

and the differential of the discounted portfolio value is

d(e^{−rt}X(t)) = df(t, X(t))
             = ft(t, X(t)) dt + fx(t, X(t)) dX(t) + (1/2) fxx(t, X(t)) dX(t)dX(t)
             = −re^{−rt}X(t) dt + e^{−rt} dX(t)                          (72)
             = H(t)(µ − r)e^{−rt}S(t) dt + H(t)σe^{−rt}S(t) dB(t)
             = H(t) d(e^{−rt}S(t)) .


Discounting the stock price reduces the mean rate of return from µ, the
term multiplying S(t)dt in (69), to µ − r, the term multiplying e−rt S(t)dt in
(71). Discounting the portfolio value removes the underlying rate of return r;
compare the last line of (70) to the next-to-last line of (72). The last line of
(72) shows that change in the discounted portfolio value is solely due to change
in the discounted stock price.

5.5 Black-Scholes Equation


In mathematical finance, the Black–Scholes equation is a partial differential
equation (PDE) governing the price evolution of a European call under the
Black–Scholes model.
The key financial insight behind the equation is that, under the model
assumption of a frictionless market, one can perfectly hedge the option by
buying and selling H(t) shares of the underlying asset in just the right way
and consequently “eliminate risk". This hedge, in turn, implies that there is
only one right price for the option, as returned by the Black–Scholes formula.
The main purpose of this subsection is to derive the Black-Scholes equation.
For simplicity, we make the following assumptions:

1. a money market account paying a constant rate of interest r;

2. the stock price is modeled by the geometric Brownian motion with constant
µ and σ in (61), i.e.

dS(t) = µS(t)dt + σS(t)dB(t);

3. the investor holds H(t) shares of stock;

4. a European call option³ with strike price K and expiration T.

5.5.1 Evolution of Option Value


Consider a European call option that pays (S(T ) − K)+ at time T. The strike
price⁴ K is some nonnegative constant. The value of this call at any time
should depend on
³A European call option is a financial contract that gives the option buyer
the right, but not the obligation, to buy a stock. For an investor to profit
from a call option, the stock's price at expiry has to be trading high enough
above the strike price to cover the cost of the option premium.
⁴For call options, the strike price is the price at which the security can be
bought by the option holder.

(i) the time to expiration, i.e. T ;
(ii) the value of the stock price at that time, i.e. S(t);

(iii) current time t;


(iv) the model parameters µ and σ in (69);
(v) the contractual strike price K.

Black, Scholes, and Merton argued that only two of these quantities, time t and
stock price S(t), are variable.
Following this reasoning, we let c(t, x) denote the value of the call at time t
if the stock price at that time is S(t) = x. There is nothing random about the
function c(t, x). However, the value of the option is random; it is the stochastic
process c(t, S(t)) obtained by replacing the dummy variable x by the random
stock price S(t) in this function. At the initial time, we do not know the future
stock prices S(t) and hence do not know the future option values c(t, S(t)). Our
goal is to determine the function c(t, x) so we at least have a formula for the
future option values in terms of the future stock prices.
We begin by computing the differential of c(t, S(t)). According to the Ito
formula, it is

dc(t, S(t))
 = ct(t, S(t)) dt + cx(t, S(t)) dS(t) + (1/2) cxx(t, S(t)) dS(t)dS(t)
 = ct(t, S(t)) dt + cx(t, S(t))(µS(t) dt + σS(t) dB(t))
   + (1/2) cxx(t, S(t)) σ²S²(t) dt                                       (73)
 = ( ct(t, S(t)) + µS(t)cx(t, S(t)) + (1/2) σ²S²(t)cxx(t, S(t)) ) dt
   + σS(t)cx(t, S(t)) dB(t) .

We next compute the differential of the discounted option price e−rt c(t, S(t)).
Let f (t, x) = e−rt x. According to the Ito formula,

d(e^{−rt}c(t, S(t))) = df(t, c(t, S(t)))
 = ft(t, c(t, S(t))) dt + fx(t, c(t, S(t))) dc(t, S(t))
   + (1/2) fxx(t, c(t, S(t))) dc(t, S(t))dc(t, S(t))
 = −re^{−rt}c(t, S(t)) dt + e^{−rt} dc(t, S(t))                          (74)
 = e^{−rt}[ −rc(t, S(t)) + ct(t, S(t)) + µS(t)cx(t, S(t))
   + (1/2) σ²S²(t)cxx(t, S(t)) ] dt + e^{−rt}σS(t)cx(t, S(t)) dB(t) .

5.5.2 Equating the Evolutions


A (short option) hedging portfolio starts with some initial capital X(0) and
invests in the stock and money market account so that the portfolio value X(t) at

each time t ∈ [0, T ] agrees with c(t, S(t)). This happens if and only if e−rt X(t) =
e−rt c(t, S(t)) for all t. One way to ensure this equality is to make sure that

d(e^{−rt}X(t)) = d(e^{−rt}c(t, S(t))) for all t ∈ [0, T ) (75)

and X(0) = c(0, S(0)). Integration of (75) from 0 to t then yields

e^{−rt}X(t) − X(0) = e^{−rt}c(t, S(t)) − c(0, S(0)) for all t ∈ [0, T ) . (76)

If X(0) = c(0, S(0)), then we can cancel this term in (76) and get the desired
equality.
Comparing (72) and (74), we see that (75) holds if and only if

H(t)(µ − r)S(t) dt + H(t)σS(t) dB(t)
 = ( −rc(t, S(t)) + ct(t, S(t)) + µS(t)cx(t, S(t))
   + (1/2) σ²S²(t)cxx(t, S(t)) ) dt                                      (77)
   + σS(t)cx(t, S(t)) dB(t) .

We examine what is required in order for (77) to hold. We first equate the dB(t)
terms in (77), which gives

H(t) = cx (t, S(t)) for all t ∈ [0, T ) . (78)

This is called the delta-hedging rule. At each time t prior to expiration, the
number of shares held by the hedge of the short option position is the partial
derivative with respect to the stock price of the option value at that time. This
quantity, cx (t, S(t)), is called the delta of the option. We next equate the dt
terms in (77), using (78), to obtain

(µ − r)S(t)cx(t, S(t))
 = −rc(t, S(t)) + ct(t, S(t)) + µS(t)cx(t, S(t))
   + (1/2) σ²S²(t)cxx(t, S(t))                                           (79)

for all t ∈ [0, T ). The term µS(t)cx(t, S(t)) appears on both sides of (79),
and after canceling it, we obtain

rc(t, S(t)) = ct(t, S(t)) + rS(t)cx(t, S(t)) + (1/2) σ²S²(t)cxx(t, S(t)), (80)
for all t ∈ [0, T ). In conclusion, we should seek a continuous function c(t, x)
that is a solution to the Black-Scholes-Merton partial differential equation
ct(t, x) + rxcx(t, x) + (1/2) σ²x²cxx(t, x) = rc(t, x) , (81)
for all t ∈ [0, T ) and x ≥ 0, and satisfies the terminal condition

c(T, x) = (x − K)+ . (82)
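For reference, the well-known closed-form solution of (81)–(82) is c(t, x) = xΦ(d₁) − Ke^{−r(T−t)}Φ(d₂), where d₁,₂ = [ln(x/K) + (r ± σ²/2)(T − t)]/(σ√(T − t)); it is not derived in these notes. The sketch below (illustrative parameters) verifies by central finite differences that this function satisfies the PDE (81) at an interior point.

```python
import math

# Sketch (closed form quoted, not derived in the notes): verify by central
# finite differences that the Black-Scholes call price satisfies (81) at an
# interior point.  All parameter values are illustrative.
def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

K, T, r, s = 100.0, 1.0, 0.05, 0.2  # strike, expiry, rate, volatility

def c(t, x):
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * s * s) * tau) / (s * math.sqrt(tau))
    d2 = d1 - s * math.sqrt(tau)
    return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

t0, x0, h = 0.3, 110.0, 1e-3
ct = (c(t0 + h, x0) - c(t0 - h, x0)) / (2 * h)
cx = (c(t0, x0 + h) - c(t0, x0 - h)) / (2 * h)
cxx = (c(t0, x0 + h) - 2 * c(t0, x0) + c(t0, x0 - h)) / h**2
residual = ct + r * x0 * cx + 0.5 * s**2 * x0**2 * cxx - r * c(t0, x0)
print(f"PDE residual at (t0, x0): {residual:.2e}")  # ~0 up to FD error
```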

5.5.3 Conclusion
Suppose we have found this function. If an investor starts with initial capital
X(0) = c(0, S(0)) and uses the hedge H(t) = cx (t, S(t)), then (77) will hold for

all t ∈ [0, T ). Indeed, the dB(t) terms on the left and right sides of (77) agree
because H(t) = cx (t, S(t)), and the dt terms agree because (81) guarantees
(80). Equality in (77) gives us (76). Canceling X(0) = c(0, S(0)) and e−rt in
this equation, we see that X(t) = c(t, S(t)) for all t ∈ [0, T ). Taking the limit
as t ↑ T and using the fact that both X(t) and c(t, S(t)) are continuous, we
conclude that X(T ) = c(T, S(T )) = (S(T ) − K)+ .
This means that the short position⁵ has been successfully hedged. No matter
which of its possible paths the stock price follows, when the option expires, the
agent hedging the short position has a portfolio whose value agrees with the
option payoff.
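This conclusion can be illustrated by a discrete-time delta-hedging experiment (a sketch, not from the notes): start with X(0) = c(0, S(0)), rebalance to H(t) = cx(t, S(t)) = Φ(d₁) at each step, and check that X(T) approximates the payoff even though the stock drifts at µ ≠ r. The closed-form call price c(t, x) = xΦ(d₁) − Ke^{−r(T−t)}Φ(d₂) is quoted without derivation, and all parameters are illustrative.

```python
import math
import random

# Sketch (not from the notes): discrete-time delta hedging.  Start with
# X(0) = c(0, S(0)), hold H = c_x(t, S(t)) = Phi(d1) shares at each step,
# invest the rest at rate r, and check that X(T) approximates (S(T)-K)^+
# even though the stock drifts at mu != r.  Illustrative parameters.
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

K, T, r, s, mu, S0 = 100.0, 1.0, 0.05, 0.2, 0.1, 100.0

def d1(t, x):
    tau = T - t
    return (math.log(x / K) + (r + 0.5 * s * s) * tau) / (s * math.sqrt(tau))

def call(t, x):
    tau = T - t
    return x * Phi(d1(t, x)) - K * math.exp(-r * tau) * Phi(d1(t, x) - s * math.sqrt(tau))

random.seed(4)
n = 2_000
dt = T / n
S, X = S0, call(0.0, S0)
for k in range(n):
    t = k * dt
    H = Phi(d1(t, S))                   # delta-hedging rule (78)
    dB = random.gauss(0.0, math.sqrt(dt))
    dS = mu * S * dt + s * S * dB       # stock moves with real-world drift mu
    X += H * dS + r * (X - H * S) * dt  # portfolio evolution (70)
    S += dS

payoff = max(S - K, 0.0)
print(f"hedged X(T) = {X:.3f}, payoff (S(T)-K)^+ = {payoff:.3f}, error = {X - payoff:+.3f}")
```

The residual error shrinks as the rebalancing frequency n increases; with continuous rebalancing it would vanish, as the argument above shows.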


⁵A short position is used when an investor anticipates that the value of a
stock will decrease in the short term, perhaps in the next few days or weeks.
In a short-sale transaction the investor borrows the shares of stock from the
investment firm to sell to another investor.

