4SP Lecture Notes
Yuzhao Wang
UoB 2024
November 14, 2024
Contents
1 Preliminaries
  1.1 Random variables
  1.2 Gaussian random variable
  1.3 Gaussian random vectors
2 Brownian motion
  2.1 Motivation and definition
  2.2 Existence
  2.3 Gaussian process
  2.4 Invariance
  2.5 Continuity
  2.6 Non-differentiability
  2.7 Quadratic variation
  2.8 Markov and martingale properties
    2.8.1 Filtrations
    2.8.2 Markov property
    2.8.3 Strong Markov property*
    2.8.4 Martingale property
3 Stochastic integral
  3.1 Wiener Integral
    3.1.1 Construction
  3.2 Stochastic integrals
  3.3 Properties
  3.4 Ito Calculus and Finance
  3.5 No perfect foresight assumption
4 Ito formula
  4.1 Motivation
  4.2 Proof of Ito formula
  4.3 Generalized Ito formula
  4.4 Multidimensional Ito's formula
5 SDEs and applications in finance
1 Preliminaries
In this chapter we recall basic definitions and results used in this lecture.
If there is a function f : R → [0, ∞] such that for each interval [a, b] ⊂ R we have
\[ P(a \le X \le b) = \int_a^b f(x)\,dx, \]
then f is called the probability density function of X.
Let X be integrable. We define the expectation of X by
\[ E[X] = \int_\Omega X(\omega)\,P(d\omega) = \int_{\mathbb{R}} x\,\mu_X(dx). \tag{2} \]
Let X be a real square-integrable random variable. Its variance is the quantity
\[ \mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2. \]
Lemma 1.2. Suppose X is standard normally distributed. Then for all x > 0, we have
\[ P(X > x) \le \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}. \tag{5} \]
Recall the characteristic function of a random variable X is defined as the expected value of $e^{itX}$, i.e.
\[ \varphi_X(t) = E[e^{itX}], \qquad t \in \mathbb{R}. \]
In particular, if $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent and $a, b \in \mathbb{R}$, then
\[ Z = aX + bY \sim N\big( a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 \big). \]
More general Gaussian random variables can be constructed as linear images of standard Gaussians.
Definition 1.8 (Gaussian random variable). A random variable Y ∈ Rd is
called Gaussian if there exists an m-dimensional standard Gaussian X, a d × m
matrix A, and a d-dimensional vector µ such that Y = AX + µ.
The covariance matrix of the vector Y is then given by
\[ \mathrm{Cov}(Y) = E\big[ (Y - EY)(Y - EY)^T \big] = AA^T. \tag{8} \]
It turns out that multiplication by an orthogonal d × d matrix does not change the distribution of a standard Gaussian random vector, which is recorded in the following lemma.
2 Brownian motion
A large part of probability theory is devoted to describing the macroscopic pic-
ture emerging from random systems defined by microscopic phenomena. Brown-
ian motion is the macroscopic picture emerging from a particle moving randomly
in d-dimensional space without making very big jumps. On the microscopic
level, at any time step, the particle receives a random displacement, caused by other particles hitting it or by an external force, so that, if its position at time zero is $x_0$, its position at time $n$ is given as $x_n = x_0 + \sum_{i=1}^{n} \xi_i$, where the displacements $\xi_1, \xi_2, \xi_3, \ldots$ are assumed to be independent, identically distributed random variables with values in $\mathbb{R}^d$. The process $\{x_n\}_{n \ge 0}$ is a random walk;
the displacements represent the microscopic inputs. When we think about the
macroscopic picture, we would like to know
• Does xn drift to infinity?
• Does xn return to the neighbourhood of the origin infinitely often?
• What is the speed of growth of max{|x1 |, · · · , |xn |} as n → ∞?
It turns out that not all the features of the microscopic inputs contribute to
the macroscopic picture. Indeed, if they exist, only the mean and covariance of
the displacements shape the picture. In other words, all random walks whose
displacements have the same mean and covariance matrix give rise to the same
macroscopic process, and even the assumption that the displacements have to be
independent and identically distributed can be substantially relaxed. This effect
is called universality, and the macroscopic process is often called a universal
object. It is a common approach in probability to study various phenomena
through the associated universal objects.
• f is a probability density, i.e. $\int_{\mathbb{R}} f(y)\,dy = 1$;
\[ \frac{u(x, t+\tau) - u(x,t)}{\tau} = \frac{D}{2}\,\partial_x^2 u(x,t) + \{\text{higher-order terms in } \tau\}. \]
Taking τ → 0, we see that
\[ \partial_t u = \frac{D}{2}\,\partial_x^2 u, \tag{11} \]
with initial condition u(·, 0) = δ₀. The solution to this equation is
\[ u(x,t) = \frac{1}{(2\pi D t)^{1/2}}\, e^{-\frac{x^2}{2Dt}}. \]
This shows the density of the diffusing ink at time t is N(0, Dt), the normal distribution, for some constant D. Einstein further computed
\[ D = \frac{RT}{N_A f}, \]
where R is the gas constant, T is the absolute temperature, f is the friction coefficient, and $N_A$ is Avogadro's number. This equation and the observed properties of Brownian motion helped J. Perrin to compute $N_A$ (≈ 6 × 10²³, the number of molecules in a mole) and lent support to the atomic theory of matter.
We now introduce Brownian motion, for which we take D = 1.
Definition 2.1 (Brownian motion). A real-valued stochastic process {B(t) :
t ≥ 0} is called a (linear) Brownian motion with start in x ∈ R if the following
holds
• B(0) = x,
• the process has independent increments, i.e. for all times 0 ≤ t1 ≤ t2 ≤
. . . ≤ tn the increments B(tn ) − B(tn−1 ), B(tn−1 ) − B(tn−2 ), . . . , B(t2 ) −
B(t1 ) are independent random variables,
• for all t ≥ 0 and h > 0 the increments B(t + h) − B(t) are normally
distributed with expectation zero and variance h, i.e. the process has
stationary increments,
• almost surely, the function t 7→ B(t) is continuous.
Figure 1: Illustration of a standard Brownian motion. Each coloured path
represents a sample (realisation) path of the Brownian motion (source of figure:
https://dlsun.github.io/probability/brownian-motion.html)
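Sample paths such as those in Figure 1 can be generated directly from Definition 2.1: simulate independent N(0, dt) increments on a grid and take cumulative sums. Below is a minimal simulation sketch in Python (the grid size, number of paths and random seed are arbitrary choices, not part of the definition).

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate standard Brownian motion on [0, 1]: B(0) = 0 and independent,
# stationary increments B(t + dt) - B(t) ~ N(0, dt), as in Definition 2.1.
rng = np.random.default_rng(0)
n_steps, n_paths, T = 1000, 5, 1.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

plt.plot(t, B.T)
plt.xlabel("t")
plt.ylabel("B(t)")
plt.show()
```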
N. Wiener, in the 1920s and later, put the theory on a firm mathematical
basis, which we will discuss in the next subsection.
See Figure 1 for an illustration of a standard Brownian motion.
Lemma 2.2. Suppose B(t) is a standard one-dimensional Brownian motion. Then E[B(t)] = 0 and
\[ E[B(t)B(s)] = \min\{s, t\} \]
for all t, s ≥ 0.
2.2 Existence
The existence of a Brownian motion is a nontrivial question. It is not obvious
that the conditions imposed on the finite-dimensional distributions in the definition of Brownian motion allow the process to have continuous sample paths, or whether there is a contradiction.
Theorem 2.3 (Wiener 1923). Standard Brownian motion exists.
We shall prove this theorem by explicitly constructing a Brownian motion
(this construction is due to Lévy). Lévy’s construction is a bit more complicated
but offers several useful properties that will be used in the later sections (e.g.
for the continuity property). To be more precise, we construct Brownian motion
as a uniform limit of continuous functions, to ensure that it automatically has
continuous paths. Recall that we only need to construct a standard Brownian
motion {B(t) : t ≥ 0} since X(t) = x + B(t) is a Brownian motion with starting
point x.
Step 1. We first construct Brownian motion on the interval [0, 1] as a random
element of the space C([0, 1]) of continuous functions on [0, 1]. To this end, we
construct the Brownian motion on the sets of dyadic points
\[ D = \bigcup_{n=0}^{\infty} D_n \qquad \text{where} \qquad D_n = \Big\{ \frac{k}{2^n} : 0 \le k \le 2^n \Big\}. \]
D0 = {0, 1},
D1 = {0, 1/2, 1},
D2 = {0, 1/4, 1/2, 3/4, 1},
D3 = {0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8, 1},
…
• n = 1, D1 \ D0 = {1/2}
• n = 2, D2 \ D1 = {1/4, 3/4}
Lemma 2.4. Let Dn be given as above, and B(d) be defined in (12). Then,
(i) for all r < s < t in Dn the random variable B(t) − B(s) is normally
distributed with mean zero and variance t−s, and is independent of B(s)−
B(r),
(ii) the vectors (B(d) : d ∈ Dn ) and (Zt : t ∈ D\Dn ) are independent.
Having defined the values of the process on all dyadic points, we now interpolate them. Define
\[ F_0(t) = \begin{cases} Z_1 & \text{for } t = 1, \\ 0 & \text{for } t = 0, \\ \text{linear} & \text{in between,} \end{cases} \]
Lemma 2.5. These functions are continuous on [0, 1] and for all n and d ∈ Dn,
\[ B(d) = \sum_{i=0}^{n} F_i(d) = \sum_{i=0}^{\infty} F_i(d). \tag{13} \]
Step 3. Finally, we extend the domain from [0, 1] to [0, ∞). We now take
a sequence B0 , B1 , . . . of independent C[0, 1]-valued random variables with the
distribution of this process and define {B(t) : t ≥ 0} by gluing together the
parts, by
\[ B(t) := B_{\lfloor t \rfloor}\big( t - \lfloor t \rfloor \big) + \sum_{i=0}^{\lfloor t \rfloor - 1} B_i(1), \qquad \text{for all } t \ge 0. \]
To see that this defines a continuous process, it suffices to check continuity at the integer points: let k ≥ 1 be an integer and take sequences $t_m \ge k$ with $t_m \downarrow k$, and $k - 1 \le t_{m'} < k$ with $t_{m'} \uparrow k$. Then
\[ \lim_{m\to\infty} B(t_m) = \lim_{m\to\infty}\Big[ B_k(t_m - k) + \sum_{i=0}^{k-1} B_i(1) \Big] = B_k(0) + \sum_{i=0}^{k-1} B_i(1) = \sum_{i=0}^{k-1} B_i(1), \]
and
\[ \lim_{m'\to\infty} B(t_{m'}) = \lim_{m'\to\infty}\Big[ B_{k-1}(t_{m'} - k + 1) + \sum_{i=0}^{k-2} B_i(1) \Big] = B_{k-1}(1) + \sum_{i=0}^{k-2} B_i(1) = \sum_{i=0}^{k-1} B_i(1). \]
Hence
\[ \lim_{m\to\infty} B(t_m) = \lim_{m'\to\infty} B(t_{m'}) = B(k). \]
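For illustration, here is a small numerical sketch of the dyadic refinement behind Lévy's construction on [0, 1]. It uses the equivalent midpoint (Brownian bridge) recursion: for a new dyadic point d ∈ Dn \ Dn−1, the value B(d) is the average of its two neighbours in Dn−1 plus an independent centred Gaussian of variance 2^(−(n+1)). This is one standard way to realise the construction; the formulation via the functions Fi in (13) gives the same process.

```python
import numpy as np

def levy_construction(levels: int, rng=None) -> np.ndarray:
    """Values of Brownian motion at the dyadic points D_levels of [0, 1],
    built by successive midpoint (Brownian bridge) refinement."""
    rng = rng or np.random.default_rng()
    B = np.array([0.0, rng.normal()])        # level 0: B(0) = 0, B(1) ~ N(0, 1)
    for n in range(1, levels + 1):
        mid = 0.5 * (B[:-1] + B[1:])         # averages of neighbouring values in D_{n-1}
        mid += rng.normal(0.0, np.sqrt(2.0 ** -(n + 1)), size=mid.size)
        out = np.empty(2 * B.size - 1)
        out[0::2] = B                        # old dyadic points D_{n-1}
        out[1::2] = mid                      # new points D_n \ D_{n-1}
        B = out
    return B                                 # B(k / 2**levels), k = 0, ..., 2**levels

B = levy_construction(10)
print(B.shape)                               # (1025,) = 2**10 + 1 dyadic points
```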
2.3 Gaussian process

Similarly, if $j < i$ then $\mathrm{Cov}(B(t_i), B(t_j)) = t_j$. Hence, in general, we obtain $\mathrm{Cov}(B(t_i), B(t_j)) = \min\{t_i, t_j\}$.
2.4 Invariance
In this section, we show that if we perform certain transformations on a Brow-
nian motion we still get a Brownian motion.
Lemma 2.9 (Symmetry). Suppose that {B(t) : t ≥ 0} is a standard Brownian
motion. Then {−B(t) : t ≥ 0} is also a standard Brownian motion.
Lemma 2.10 (Scaling invariance). Suppose that {B(t) : t ≥ 0} is a standard
Brownian motion and let a > 0. Then the process {X(t) : t ≥ 0} defined by
\[ X(t) := \frac{1}{a}\, B(a^2 t) \]
is also a standard Brownian motion.
Theorem 2.11 (Time inversion). Suppose that {B(t) : t ≥ 0} is a standard
Brownian motion. Then the process {X(t) : t ≥ 0} defined by
\[ X(t) = \begin{cases} 0 & \text{for } t = 0, \\ t\, B(1/t) & \text{for } t > 0, \end{cases} \]
is also a standard Brownian motion.
2.5 Continuity
In this section, we study continuity properties of a Brownian motion. We already know that a Brownian motion is almost surely continuous. The following theorem states a stronger result, providing an upper estimate for the quantity |B(t + h) − B(t)|.
Theorem 2.13. There exists a constant C > 0 such that, almost surely, for
every sufficiently small h > 0 and all 0 ≤ t ≤ 1 − h,
\[ |B(t + h) - B(t)| \le C \sqrt{h \log \frac{1}{h}}. \tag{15} \]
Theorem 2.13 has one important consequence saying that the paths are α-
Hölder continuous, which is stronger than continuity.
Definition 2.14. A function f : [0, ∞) → R is said to be locally α-Hölder continuous at x ≥ 0 if there exist ε > 0 and c > 0 such that
\[ |f(t) - f(x)| \le c\, |t - x|^{\alpha} \qquad \text{for all } t \ge 0 \text{ with } |t - x| < \varepsilon. \]
We refer to α > 0 as the Hölder exponent and to c > 0 as the Hölder constant.
It is easy to see that α-Hölder continuity gets stronger as the exponent α gets larger.
Lemma 2.15. For h > 0 sufficiently small and all 0 < α < 1/2 we have
\[ \log \frac{1}{h} \le \Big( \frac{1}{h} \Big)^{1 - 2\alpha}. \tag{16} \]
Theorem 2.16. If α < 1/2 then, almost surely, Brownian motion is everywhere
locally α-Hölder continuous.
2.6 Non-differentiability
In the previous section, we have shown that a Brownian motion is almost surely locally α-Hölder continuous for any α < 1/2.
In this section, we show that almost surely Brownian motion is nowhere
differentiable. This is a striking property of Brownian motion.
Let f : R → R. Define, for any limit point a ∈ R,
\[ \limsup_{x \to a} f(x) = \lim_{\varepsilon \downarrow 0}\, \sup\{ f(x) : x \in B(a;\varepsilon)\setminus\{a\} \} \]
and
\[ \liminf_{x \to a} f(x) = \lim_{\varepsilon \downarrow 0}\, \inf\{ f(x) : x \in B(a;\varepsilon)\setminus\{a\} \}, \]
where B(a; ε) denotes the ball of radius ε about a. For a function f, we define the upper and lower right derivatives
\[ D^+ f(t) = \limsup_{h \to 0^+} \frac{f(t+h) - f(t)}{h}, \]
and
\[ D^- f(t) = \liminf_{h \to 0^+} \frac{f(t+h) - f(t)}{h}. \]
It then follows that if f is differentiable at t ∈ R, then both $D^+ f(t)$ and $D^- f(t)$ exist and
\[ f'(t) = D^+ f(t) = D^- f(t). \]
We then have the following theorem.
Theorem 2.17 (Paley, Wiener and Zygmund 1933). Almost surely, Brownian motion is nowhere differentiable. Furthermore, for all t, either $D^+ B(t) = +\infty$ or $D^- B(t) = -\infty$.
2.7 Quadratic variation
In this section, we show that Brownian motion has finite quadratic variation,
which is crucially important for the development of stochastic integration stud-
ied later on.
For a function f : [a, b] → R and a partition Pn = {t0, · · · , tn} of the finite interval [a, b] of the form a = t0 < t1 < . . . < tn = b, the variation and the quadratic variation of f over [a, b] with respect to Pn are defined respectively by
\[ V_{P_n}(f)[a,b] = \sum_{k=1}^{n} |f(t_k) - f(t_{k-1})| \]
and
\[ Q_{P_n}(f)[a,b] = \sum_{k=1}^{n} |f(t_k) - f(t_{k-1})|^2. \]
Let $\|P_n\| = \max_{1 \le k \le n}(t_k - t_{k-1})$ denote the maximum interval length of a partition. The total variation and the quadratic variation of f over [a, b] are defined respectively by
\[ V(f)[a,b] = \lim_{\|P_n\|\to 0} V_{P_n}(f)[a,b] \qquad \text{and} \qquad Q(f)[a,b] = \lim_{\|P_n\|\to 0} Q_{P_n}(f)[a,b]. \]
Lemma 2.18 shows that the total variation and the quadratic variation of a
differentiable function are both finite.
Recall from Theorem 2.17 that a Brownian motion is nowhere differentiable. We will show that a Brownian motion has unbounded total variation but finite quadratic variation.
Now we consider the total variation and quadratic variation of a Brownian motion {B(t) : t ≥ 0}, defined as follows: the total variation
\[ V(B)[0,t] = \lim_{\|P_n\|\to 0} V_{P_n}(B)[0,t], \qquad \text{where} \quad V_{P_n}(B)[0,t] = \sum_{k=1}^{n} |B(t_k) - B(t_{k-1})|, \]
and the quadratic variation
\[ Q(B)[0,t] = \lim_{\|P_n\|\to 0} Q_{P_n}(B)[0,t], \qquad \text{where} \quad Q_{P_n}(B)[0,t] = \sum_{k=1}^{n} |B(t_k) - B(t_{k-1})|^2. \]
Note that the total variation and quadratic variation of a Brownian motion are
both random variables.
Theorem 2.19. The quadratic variation of a Brownian motion {B(s) : s ∈ [0, t]} satisfies
\[ Q(B)[0,t] = \lim_{\|P_n\|\to 0} Q_{P_n}(B)[0,t] = t, \]
where the limit is taken in $L^2(\Omega)$.
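Theorem 2.19 (and the unbounded total variation) can be illustrated numerically: over a partition of [0, 1] with mesh 1/n, the quadratic variation stays near t = 1 while the total variation grows roughly like √n. A rough sketch (each mesh uses a freshly sampled path, which is enough for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
t = 1.0
for n in (10**2, 10**3, 10**4, 10**5):
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)   # increments over a mesh of size t/n
    V = np.sum(np.abs(dB))                          # V_{P_n}(B)[0, t], blows up as n grows
    Q = np.sum(dB**2)                               # Q_{P_n}(B)[0, t], stays close to t
    print(f"n={n:>6}  V={V:10.2f}  Q={Q:.4f}")
```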
2.8 Markov and martingale properties

2.8.1 Filtrations
We equip our measurable space (Ω, F ) with a filtration, i.e., a nondecreasing
family {Ft ; t ≥ 0} of sub-σ-fields of F .
1. A filtration on a probability space (Ω, F , P) is a family (F (t) : t ≥ 0) of
σ-algebras such that F (s) ⊂ F (t) ⊂ F for all s < t.
2. A probability space together with a filtration is called a filtered probability
space.
3. A stochastic process {X(t) : t ≥ 0} defined on a filtered probability space
with filtration (F (t) : t ≥ 0) is called adapted if X(t) is F (t)-measurable
for any t ≥ 0.
We now introduce two natural filtrations for a Brownian motion (B(t) : t ≥
0).
Definition 2.23 (Two natural filtrations for Brownian motions). Let {B(t) :
t ≥ 0} be a Brownian motion defined on a probability space (Ω, F , P).
1. We denote (F 0 (t) : t ≥ 0) the filtration defined by
F 0 (t) := σ(B(s) : 0 ≤ s ≤ t),
which is the σ-algebra generated by the random variables B(s) for 0 ≤
s ≤ t.
2. We denote (F + (t) : t ≥ 0) the filtration defined by
\[ \mathcal{F}^+(t) := \bigcap_{s > t} \mathcal{F}^0(s) = \bigcap_{s > t} \sigma\big( B(\tau) : 0 \le \tau \le s \big). \]
This is because
\[ \bigcap_{\varepsilon > 0} \mathcal{F}^+(t + \varepsilon) = \bigcap_{k=1}^{\infty} \bigcap_{m=1}^{\infty} \mathcal{F}^+\big( t + 1/k + 1/m \big) = \mathcal{F}^+(t). \]
2.8.2 Markov property
Definition 2.26 (independence of two stochastic processes). Two stochastic
processes {X(t) : t ≥ 0} and {Y (t) : t ≥ 0} are called independent if for any
sets t1 , . . . , tm ≥ 0 and s1 , . . . , sk ≥ 0 of times the vectors (X(t1 ), . . . , X(tm ))
and (Y (s1 ), . . . , Y (sk )) are independent.
Suppose that {X(t) : t ≥ 0} is a stochastic process. Intuitively, the Markov property says that if we know the process {X(t) : t ≥ 0} on the interval [0, s], then for the prediction of the future {X(t) : t ≥ s} we only need to know the information about the end point (the present) X(s), but not necessarily the information about the whole path (the history) {X(t) : 0 ≤ t ≤ s}.
The basic Markov property for a Brownian motion is the following.
Theorem 2.27 (Markov property I). Let {B(t) : t ≥ 0} be an n-dimensional
Brownian motion started at x ∈ Rn . Let s ≥ 0. Then the process {B(t +
s) − B(s)}t≥0 is a standard Brownian motion and is independent of the process
{B(u) : 0 ≤ u ≤ s}.
In fact, we have a stronger result:
Theorem 2.28 (Markov property II). Let {B(t) : t ≥ 0} be an n-dimensional
Brownian motion started at x ∈ Rn . Let s ≥ 0. Then the process {B(t + s) −
B(s)}t≥0 is a standard Brownian motion and is independent of F + (s).
Intuitively, this is the collection of all events that happened before the stopping
time T .
Theorem 2.30 (Strong Markov property). For every almost surely finite stopping time T, the process
\[ \{ B(T + t) - B(T) : t \ge 0 \} \]
is a standard Brownian motion independent of $\mathcal{F}^+(T)$.
A consequence of the reflection principle is the following.
Proposition 2.32. Let {B(t) : t ≥ 0} be a standard linear Brownian motion and let
\[ M(t) := \max_{0 \le s \le t} B(s). \]
Then, if a > 0,
\[ P\big( M(t) > a \big) = 2\, P\big( B(t) > a \big) = P\big( |B(t)| > a \big). \]
2.8.4 Martingale property

Intuitively, a martingale describes fair games in the sense that the current state is always the best prediction for future states.
Theorem 2.34. A Brownian motion is a martingale (with respect to the filtra-
tion {F + (t) : t ≥ 0}).
We now present two useful facts about martingales: the optional stopping theorem and Doob's maximal inequality. The proofs of the following two theorems, which can be found in [1], will be omitted.
Theorem 2.35 (Optional stopping theorem). Suppose {X(t) : t ≥ 0} is a continuous martingale and 0 ≤ S ≤ T are stopping times. If the process {X(t ∧ T) : t ≥ 0} is dominated by an integrable random variable X, that is, |X(t ∧ T)| ≤ X almost surely for all t ≥ 0, then
\[ E[X(T)] = E[X(S)]. \]
Theorem 2.35 says that, under certain conditions, the expected value of a martingale at a stopping time is equal to its initial expected value. Martingales are useful in modeling the wealth of a gambler participating in a fair game; the optional stopping theorem says that, on average, nothing can be gained by stopping play based on the information obtainable so far (i.e., without looking into the future).
3 Stochastic integral
In this section, we will study the stochastic integral, also known as the Ito
integral, which has the form
\[ \int_a^b f(t, \omega)\,dB(t, \omega), \]
\[ f(t) = \begin{cases} 0, & t = 0, \\ t \sin\frac{1}{t}, & 0 < t \le 1. \end{cases} \]
3.1.1 Construction
We first suppose f is a step function given by
\[ f = \sum_{i=1}^{n} a_i \mathbf{1}_{[t_{i-1}, t_i)}, \]
where t0 = a and tn = b. In this case, define the integral of the step function f by
\[ I(f) = \int_a^b f(t)\,dB(t) = \sum_{i=1}^{n} a_i\big( B(t_i) - B(t_{i-1}) \big). \tag{22} \]
It is easy to check that I is linear, i.e. I(af +bg) = aI(f )+bI(g) for any a, b ∈ R
and step functions f and g. Moreover, we have
Lemma 3.2. For a step function f , the random variable I(f ) is Gaussian with
mean 0 and variance
\[ E[I(f)^2] = \int_a^b f(t)^2\,dt. \tag{23} \]
Let $L^2(\Omega)$ denote the Hilbert space of square-integrable random variables on Ω with inner product ⟨X, Y⟩ = E[XY]. Let $f \in L^2([a, b])$ and take a sequence $\{f_n\}_{n=1}^{\infty}$ of step functions such that $f_n \to f$ in $L^2([a, b])$. By Lemma 3.2 the sequence $\{I(f_n)\}_{n=1}^{\infty}$ is Cauchy in $L^2(\Omega)$. Therefore, it converges in $L^2(\Omega)$. Define
\[ I(f) := \lim_{n\to\infty} I(f_n) \quad \text{in } L^2(\Omega). \tag{24} \]
In order to show I(f ) is well-defined, we also need to prove that the limit in
(24) is independent of the choice of the sequence {fn }. Suppose {gm } is another
sequence of step functions and gm → f in L2 ([a, b]). Then by the linearity of
the mapping I, (23), and triangle inequality, we have
\[ E\big[ (I(f_n) - I(g_m))^2 \big] = E\big[ (I(f_n - g_m))^2 \big] = \int_a^b \big( f_n(t) - g_m(t) \big)^2\,dt \le \Big( \| f_n - f \|_{L^2([a,b])} + \| g_m - f \|_{L^2([a,b])} \Big)^2, \]
Example 3.5. The Wiener integral
\[ \int_0^1 s\,dB(s) \]
is a Gaussian random variable with mean 0 and variance $\int_0^1 s^2\,ds = \frac{1}{3}$.
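Example 3.5 can be checked by Monte Carlo: approximate the integral by the sums (22) over a fine partition with left endpoints and compare the sample mean and variance with 0 and 1/3. A minimal sketch (partition size and sample count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_samples = 1000, 20000
t = np.linspace(0.0, 1.0, n + 1)
left = t[:-1]                                        # left endpoints t_{i-1}

dB = rng.normal(0.0, np.sqrt(1.0 / n), size=(n_samples, n))
I = dB @ left                                        # sum_i t_{i-1} (B(t_i) - B(t_{i-1}))

print(I.mean(), I.var())                             # close to 0 and 1/3
```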
It turns out that the Wiener integral defined in (24) coincides with the Riemann-Stieltjes integral defined in (21) almost surely, which means that the former is indeed an extension of the latter.
Theorem 3.6. Let f be a continuous function of bounded variation. Then for almost all ω ∈ Ω,
\[ \Big( \int_a^b f(t)\,dB(t) \Big)(\omega) = \int_a^b f(t)\,dB(t, \omega), \]
where the left-hand side is the Wiener integral of f and the right-hand side is the Riemann-Stieltjes integral of f defined by (21).
Let f ∈ L2 ([a, b]) and consider the stochastic process defined by
\[ M_t = \int_a^t f(s)\,dB(s). \tag{26} \]
Definition 3.8. In what follows, we use $L^2_{ad}([a, b] \times \Omega)$ to denote the space of all stochastic processes f(t, ω), a ≤ t ≤ b, ω ∈ Ω, satisfying the following conditions:
1. f(t, ω) is adapted to the filtration {Ft};
2. $\int_a^b E[|f(t)|^2]\,dt < \infty$.
The main purpose of this subsection is to define the stochastic integral
\[ \int_a^b f(t)\,dB(t), \tag{27} \]
for f ∈ L2ad ([a, b] × Ω). We will split this construction into three steps.
Step 1. f is a step stochastic process in L2ad ([a, b] × Ω).
Suppose f is a step stochastic process given by
\[ f(t, \omega) = \sum_{i=1}^{n} \xi_{i-1}(\omega)\, \mathbf{1}_{[t_{i-1}, t_i)}(t), \tag{28} \]
where each $\xi_{i-1}$ is $\mathcal{F}_{t_{i-1}}$-measurable and square-integrable. In this case, define
\[ I(f) = \int_a^b f(t)\,dB(t) = \sum_{i=1}^{n} \xi_{i-1}\big( B(t_i) - B(t_{i-1}) \big). \tag{29} \]
It is easy to see that I is a linear mapping, i.e. I(af + bg) = aI(f ) + bI(g) for
any a, b ∈ R and any such step stochastic processes f and g. Moreover, we have
Lemma 3.9. Let I(f ) be defined by (29). Then, E[I(f )] = 0 and
\[ E[I(f)^2] = \int_a^b E[f(t)^2]\,dt. \tag{30} \]
Case 2: f is bounded.
In this case, define a stochastic process $g_n$ by
\[ g_n(t, \omega) = \int_0^{n(t-a)} e^{-\tau} f\big( t - n^{-1}\tau, \omega \big)\,d\tau. \]
Note that $g_n$ is adapted to $\{\mathcal{F}_t\}$ and $\int_a^b E[|g_n(t)|^2]\,dt < \infty$.
Claim 3.11. For each n, E[gn (t)gn (s)] is a continuous function of (t, s).
And we also have
Claim 3.12. We have
\[ \lim_{n\to\infty} \int_a^b E\big[ |f(t) - g_n(t)|^2 \big]\,dt = 0. \]
Step 3. Stochastic integral $\int_a^b f(t)\,dB(t)$ for $f \in L^2_{ad}([a, b] \times \Omega)$.
Now we can use what we proved in Steps 1 and 2 to define the stochastic integral $\int_a^b f(t)\,dB(t)$: take a sequence $\{f_n\}$ of step stochastic processes in $L^2_{ad}([a, b] \times \Omega)$ approximating f and set
\[ I(f) := \lim_{n\to\infty} I(f_n) \tag{32} \]
in $L^2(\Omega)$. We can then use arguments similar to those in Section 3.1.1 for the Wiener integral to show that the above I(f) is well-defined.
Definition 3.13. The limit I(f) defined in (32) is called the Ito integral of f and is denoted by $\int_a^b f(t)\,dB(t)$.
From our construction it is easy to check the mapping I defined on f ∈
L2ad ([a, b]×Ω) is linear. Furthermore, the Ito integral I : L2ad ([a, b]×Ω) → L2 (Ω)
is an isometry.
Theorem 3.14. Suppose $f \in L^2_{ad}([a, b] \times \Omega)$. Then the Ito integral $I(f) = \int_a^b f(t)\,dB(t)$ is a random variable with E[I(f)] = 0 and
\[ E\big[ |I(f)|^2 \big] = \int_a^b E\big[ |f(t)|^2 \big]\,dt. \tag{33} \]
Since I is linear, we also have the following corollary, whose proof is similar to
that of Corollary 3.4.
Corollary 3.15. For any f, g ∈ L2ad ([a, b] × Ω), the following equality holds:
\[ E\Big[ \int_a^b f(t)\,dB(t) \int_a^b g(t)\,dB(t) \Big] = \int_a^b E[f(t)g(t)]\,dt. \]
Example 3.16. We have
\[ \int_a^b B(t)\,dB(t) = \frac{1}{2}\big[ B(b)^2 - B(a)^2 - (b - a) \big]. \]
As a consequence of Example 3.16, we have
\[ \int_a^t B(s)\,dB(s) = \frac{1}{2}\big[ B(t)^2 - B(a)^2 - (t - a) \big]. \tag{34} \]
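The identity (34) can also be verified numerically: the left-endpoint sums that define the Ito integral approach ½[B(t)² − B(0)² − t]. A sketch with a = 0 and t = 1:

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 10**5, 1.0
dB = rng.normal(0.0, np.sqrt(t / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])   # B(t_0), ..., B(t_n) with B(0) = 0

ito_sum = np.sum(B[:-1] * dB)                # left-endpoint sums, as in the Ito integral
closed_form = 0.5 * (B[-1] ** 2 - t)         # right-hand side of (34) with a = 0
print(ito_sum, closed_form)                  # nearly equal for large n
```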
Example 3.17. We have
\[ \int_a^b B(t)^2\,dB(t) = \frac{1}{3}\big( B(b)^3 - B(a)^3 \big) - \int_a^b B(t)\,dt, \]
where the integral on the right-hand side is the Riemann integral of B(t, ω) for almost all ω ∈ Ω.
3.3 Properties
We consider the continuity property of the stochastic process defined by the
stochastic integral
\[ X_t = \int_a^t f(s)\,dB(s), \tag{35} \]
4. (Martingale) X(t) is a martingale with respect to the filtration {Ft ; a ≤
t ≤ b}.
5. (Ito isometry) $E[X^2(t)] = E\Big[ \int_a^t f^2(s)\,ds \Big]$.
is the cumulative gain or loss in the single-stock portfolio due to changes in the
price of the stock from time 0 to time T, when trading takes place continuously
in time. In particular,
$f(t_{i-1})(X_{t_i} - X_{t_{i-1}})$
represents the (approximate) gain or loss over the time interval $[t_{i-1}, t_i]$, and the limit $\lim_{\|P_n\|\to 0}$ means the trading takes place continuously in time. For the Riemann-Stieltjes integral (37) to exist, i.e. the limit in (37) to converge, we usually require $X_t$ to have bounded variation on the interval [0, T], which prevents us from using Brownian motion to model the price in the stock market.
Competition in the liquid capital markets is fierce as millions of traders try
to predict future prices based on assessments of available information, which
makes the stock prices change by seemingly random movements. In 1900, French
mathematician Louis Bachelier was the first to model stock prices with a Wiener process. It turns out that Brownian motion is a perfect candidate to help in modeling stock price movements.
However, Brownian motion has unbounded total variation, which leads to a major issue for the use of the Riemann-Stieltjes integral (21). To overcome this difficulty, Ito proposed a different mode of convergence, namely the mean square limit:
\[ \lim_{\|P_n\|\to 0} E\Big[ \Big( \sum_{i=1}^{n} f(t_{i-1})\big( B_{t_i} - B_{t_{i-1}} \big) - I(f) \Big)^2 \Big] = 0, \]
where B(t) represents a Brownian motion. Building on this idea, more complex Ito integrals can be constructed. In particular, if the stock price X(t) is modeled
as a Geometric Brownian motion, we can make sense of the integral
\[ \int_0^T f(t)\,dX_t, \]
4 Ito formula
The chain rule in Calculus is the formula
\[ \frac{d}{dt} f(g(t)) = f'(g(t))\, g'(t) \]
for differentiable functions f and g. It can be rewritten in the integral form as
\[ f(g(t)) - f(g(a)) = \int_a^t f'(g(s))\, g'(s)\,ds. \tag{38} \]
In this section, we shall develop the chain rule for stochastic calculus.
4.1 Motivation
Let f be a differentiable function, and consider the composite function f(B(t)). Since almost all sample paths of B(t) are nowhere differentiable, the equation (38) obviously has no meaning. However, when we rewrite B′(s)ds as an integrator dB(s) in the Ito integral, (38) leads to the following question: does the equality
\[ f(B(t)) - f(B(a)) = \int_a^t f'(B(s))\,dB(s) \tag{39} \]
hold?
where 0 < λi < 1, which together with (40) yields
\[ f(B(t)) - f(B(a)) = \sum_{i=1}^{n} f'(B(t_{i-1}))\big( B(t_i) - B(t_{i-1}) \big) + \frac{1}{2} \sum_{i=1}^{n} f''\big( B(t_{i-1}) + \lambda_i (B(t_i) - B(t_{i-1})) \big)\big( B(t_i) - B(t_{i-1}) \big)^2. \tag{41} \]
The first summation converges to the Ito integral,
\[ \lim_{\|P_n\|\to 0} \sum_{i=1}^{n} f'(B(t_{i-1}))\big( B(t_i) - B(t_{i-1}) \big) = \int_a^t f'(B(s))\,dB(s), \tag{42} \]
in probability. As for the second summation in (41), we may guess from Theorem 2.19 that
\[ \lim_{\|P_n\|\to 0} \sum_{i=1}^{n} f''\big( B(t_{i-1}) + \lambda_i (B(t_i) - B(t_{i-1})) \big)\big( B(t_i) - B(t_{i-1}) \big)^2 = \int_a^t f''(B(s))\,ds. \tag{43} \]
We shall prove (43) later. By collecting (41), (42), and (43), we have the
following result, which Ito proved in 1944.
Theorem 4.1. Let f (x) be a C 2 -function. Then
\[ f(B(t)) - f(B(a)) = \int_a^t f'(B(s))\,dB(s) + \frac{1}{2} \int_a^t f''(B(s))\,ds, \tag{44} \]
where the first integral is an Ito integral, and the second integral is a Riemann
integral for each sample path of B(s).
We remark that the appearance of the second term in (44) is a consequence
of the nonzero quadratic variation of the Brownian motion B(t). This extra
term is the key difference between Ito calculus and Leibniz-Newton calculus.
Example 4.2. Take the function f(x) = x² in (44) to get
\[ B(t)^2 - B(a)^2 = 2 \int_a^t B(s)\,dB(s) + (t - a), \]
which coincides with (34). If we take f(x) = x³, then (44) with t = b gives
\[ B(b)^3 - B(a)^3 = 3 \int_a^b B(s)^2\,dB(s) + 3 \int_a^b B(s)\,ds, \]
Lemma 4.3. Let g(x) be a continuous function on R. For each n ≥ 1, let Pn = {t0, t1, · · · , tn} be a partition of [a, t] and let 0 < λi < 1 for 1 ≤ i ≤ n. Then, along a subsequence,
\[ \lim_{\|P_n\|\to 0} \sum_{i=1}^{n} \big[ g\big( B(t_{i-1}) + \lambda_i (B(t_i) - B(t_{i-1})) \big) - g(B(t_{i-1})) \big]\big( B(t_i) - B(t_{i-1}) \big)^2 = 0 \tag{45} \]
in $L^2(\Omega)$.
Remark 4.5. If g is a continuous function on R, i.e. with no boundedness assumption, then (46) converges almost surely. We omit the proof.
almost surely as ∥Pn∥ → 0. Finally, a similar argument as in Lemmas 3.9 and 4.4 yields
\[ III = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x^2}\big( t_{i-1},\, B_{t_{i-1}} + \lambda (B_{t_i} - B_{t_{i-1}}) \big)\big( B_{t_i} - B_{t_{i-1}} \big)^2 \;\longrightarrow\; \int_a^t \frac{\partial^2 f}{\partial x^2}(s, B_s)\,ds, \]
\[ f(t, B_t) = f(a, B_a) + \int_a^t \frac{\partial f}{\partial x}(s, B_s)\,dB_s + \int_a^t \Big( \frac{\partial f}{\partial t}(s, B_s) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(s, B_s) \Big)\,ds. \tag{49} \]
To further generalize the Ito formula, we introduce some notation.
Definition 4.7. We say $f \in L_{ad}(\Omega, L^p([a, b]))$ if f is an {Ft}-adapted stochastic process such that $\int_a^b |f(t)|^p\,dt < \infty$ almost surely.
Definition 4.8. An Ito process is a stochastic process of the form
\[ X_t = X_a + \int_a^t \sigma_s\,dB_s + \int_a^t \mu_s\,ds, \tag{50} \]
The products of the stochastic differentials dB(t) and dt obey the following Ito table:

×         dB(t)    dt
dB(t)     dt       0
dt        0        0

\[ df(t, X_t) = \frac{\partial f}{\partial t}(t, X_t)\,dt + \frac{\partial f}{\partial x}(t, X_t)\,dX_t + \frac{1}{2}\,\frac{\partial^2 f}{\partial x^2}(t, X_t)\,(dX_t)^2. \tag{54} \]
Then we plug (50) and (53) into (54) to get
\[ df(t, X_t) = \frac{\partial f}{\partial t}(t, X_t)\,dt + \frac{\partial f}{\partial x}(t, X_t)\big( \sigma_t\,dB(t) + \mu_t\,dt \big) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(t, X_t)\,\sigma_t^2\,dt = \frac{\partial f}{\partial x}\,\sigma_t\,dB(t) + \Big( \frac{\partial f}{\partial t} + \mu_t \frac{\partial f}{\partial x} + \frac{1}{2}\sigma_t^2 \frac{\partial^2 f}{\partial x^2} \Big)\,dt. \tag{55} \]
Here we omit the variable (t, Xt ) for simplicity. Finally, we can convert this
differential equation into integral form and get (52).
We remark that the above computation using the symbolic derivation of stochastic differentials yields the correct Ito formula; however, this derivation is not a proof.
where
\[ X(t) = \begin{pmatrix} X_1(t) \\ \vdots \\ X_n(t) \end{pmatrix}, \qquad M = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}, \]
and
\[ S = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1m} \\ \vdots & & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nm} \end{pmatrix}, \qquad dB(t) = \begin{pmatrix} dB_1(t) \\ \vdots \\ dB_m(t) \end{pmatrix}. \]
Such a process X(t) is called an n-dimensional Ito process (or just an Ito
process). The stochastic differentials (56) should be understood in integral form,
i.e.,
\[ \begin{aligned} X_1(t) &= X_1(0) + \int_0^t \mu_1\,ds + \int_0^t \sigma_{11}\,dB_1(s) + \cdots + \int_0^t \sigma_{1m}\,dB_m(s), \\ &\;\;\vdots \\ X_n(t) &= X_n(0) + \int_0^t \mu_n\,ds + \int_0^t \sigma_{n1}\,dB_1(s) + \cdots + \int_0^t \sigma_{nm}\,dB_m(s). \end{aligned} \tag{57} \]
Or in matrix form,
\[ X(t) = X(0) + \int_0^t M(s)\,ds + \int_0^t S(s)\,dB(s). \]
The corresponding multidimensional Ito table is:

×          dB_j(t)    dt
dB_i(t)    δ_ij dt    0
dt         0          0

where
\[ \delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases} \]
The product dBi (t)dBj (t) = 0 for i ̸= j is the symbolic expression of the
following fact: let B1 (t) and B2 (t) be two independent Brownian motions and
let Pn = {t0 , · · · , tn } be a partition of [a, b]. Then,
\[ \lim_{\|P_n\|\to 0} \sum_{i=1}^{n} \big( B_1(t_i) - B_1(t_{i-1}) \big)\big( B_2(t_i) - B_2(t_{i-1}) \big) = 0 \]
5 SDEs and applications in finance
A stochastic differential equation (SDE) is a differential equation in which one or more of the terms is a stochastic process. Solutions to an SDE, if they exist, are also stochastic processes. SDEs are used to model various phenomena such as stock prices or physical systems subject to thermal fluctuations.
By a solution to the SDE
\[ dX(t) = b(t, X(t))\,dt + \sigma(t, X(t))\,dB(t) \]
with initial condition X(0) = X0, we mean that X(t) satisfies the stochastic integral equation
\[ X(t) = X(0) + \int_0^t b(s, X(s))\,ds + \int_0^t \sigma(s, X(s))\,dB(s). \tag{60} \]
has a unique t-continuous solution X(t, ω) with the property that X(t, ω) is adapted to the filtration $\mathcal{F}_t$ generated by B(s), s ≤ t, and
\[ E\Big[ \int_0^T |X(t)|^2\,dt \Big] < \infty. \]
It is the Ito formula that is the key to the solution of many stochastic dif-
ferential equations. The method is illustrated in the following examples.
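Although the examples below are solved in closed form via the Ito formula, a solution of (60) can also be approximated numerically. A minimal Euler–Maruyama sketch, in which dt and dB(t) are replaced by small increments (the coefficients used at the end are placeholders, not taken from these notes):

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng=None):
    """Approximate a solution of dX = b(t, X) dt + sigma(t, X) dB on [0, T]."""
    rng = rng or np.random.default_rng()
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))                         # Brownian increment ~ N(0, dt)
        x[i + 1] = x[i] + b(t[i], x[i]) * dt + sigma(t[i], x[i]) * dB
    return t, x

# Example with placeholder coefficients b(t, x) = -x and sigma(t, x) = 0.3:
t, x = euler_maruyama(lambda t, x: -x, lambda t, x: 0.3, x0=1.0, T=1.0, n=1000)
```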
Figure 2: Stock index (1979 - 2019). (source of figure:
https://www.stlouisfed.org/on-the-economy/2021/january/
irrational-exuberance-look-stock-prices)
in the Black–Scholes model. A GBM process only assumes positive values, just like real stock prices, which is one of its main advantages compared to directly using Brownian motion to model stock prices, since Brownian motion may take negative values. See the above figure for an example of stock prices.
A stochastic process S(t) is a GBM if it satisfies the stochastic differential equation (SDE)
\[ dS(t) = \mu S(t)\,dt + \sigma S(t)\,dB(t), \tag{61} \]
\[ S(0) = S_0. \tag{62} \]
By the Ito formula, the solution is
\[ S(t) = S_0 \exp\Big( \big( \mu - \frac{\sigma^2}{2} \big)t + \sigma B_t \Big). \tag{63} \]
These conclusions are direct consequences of the formula (63). We also have
the following.
Corollary 5.3. The above solution S(t) is a log-normally distributed random variable with expected value and variance given by
\[ E(S(t)) = S_0 e^{\mu t}, \qquad \mathrm{Var}(S(t)) = S_0^2 e^{2\mu t}\big( e^{\sigma^2 t} - 1 \big). \]
From the above we see that the expectation of S is determined by the deterministic trend µ. If we use GBM to model the stock price, this shows that the expected returns are independent of the value of the process (stock price), which agrees with what we would expect in reality.
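Corollary 5.3 can be checked by sampling the closed-form solution (63) at a fixed time and comparing the empirical mean and variance with S₀e^{µt} and S₀²e^{2µt}(e^{σ²t} − 1). A sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
S0, mu, sigma, t = 1.0, 0.05, 0.2, 2.0
n_samples = 10**6

Bt = rng.normal(0.0, np.sqrt(t), size=n_samples)            # B(t) ~ N(0, t)
S = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * Bt)     # formula (63)

print(S.mean(), S0 * np.exp(mu * t))                                     # empirical vs exact E[S(t)]
print(S.var(), S0**2 * np.exp(2 * mu * t) * (np.exp(sigma**2 * t) - 1))  # empirical vs exact Var[S(t)]
```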
However, GBM is not a completely realistic model, in particular:
• In real stock prices, volatility changes over time (possibly stochastically),
but in GBM, volatility σ is assumed constant.
• In real life, stock prices often show jumps caused by unpredictable events
or news, but in GBM, the path is continuous.
Example 5.4 (Generalized geometric Brownian motion). In an attempt to
make GBM more realistic as a model for stock prices, one can drop the assump-
tion that the volatility σ is constant. Let B(t), t ≥ 0, be a Brownian motion,
Ft , t ≥ 0 be an associated filtration, and µ(t) and σ(t) be adapted processes.
Define the Ito process,
\[ X(t) = \int_0^t \sigma(s)\,dB(s) + \int_0^t \Big( \mu(s) - \frac{1}{2}\sigma^2(s) \Big)\,ds, \]
and its differential form
\[ dX(t) = \sigma(t)\,dB(t) + \Big( \mu(t) - \frac{1}{2}\sigma^2(t) \Big)\,dt. \]
From the Ito table in Section 4.3 we see that dX(t)dX(t) = σ²(t)dB(t)dB(t) = σ²(t)dt.
Consider an asset price process given by
\[ S(t) = S(0)\, e^{X(t)} = S(0) \exp\Big( \int_0^t \sigma(s)\,dB(s) + \int_0^t \big( \mu(s) - \frac{1}{2}\sigma^2(s) \big)\,ds \Big), \tag{64} \]
where S(0) is nonrandom and positive. We may write S(t) = f (X(t)), where
$f(x) = S(0)e^x$. According to the Ito formula,
\[ \begin{aligned} dS(t) = df(X(t)) &= f'(X(t))\,dX(t) + \frac{1}{2} f''(X(t))\,dX(t)\,dX(t) \\ &= S(0)e^{X(t)}\,dX(t) + \frac{1}{2} S(0)e^{X(t)}\,dX(t)\,dX(t) \\ &= S(t)\,dX(t) + \frac{1}{2} S(t)\,dX(t)\,dX(t) \\ &= S(t)\Big( \sigma(t)\,dB(t) + \big( \mu(t) - \frac{1}{2}\sigma^2(t) \big)\,dt \Big) + \frac{1}{2}\sigma^2(t) S(t)\,dt \\ &= \mu(t) S(t)\,dt + \sigma(t) S(t)\,dB(t), \end{aligned} \tag{65} \]
that is,
\[ dS(t) = \mu(t) S(t)\,dt + \sigma(t) S(t)\,dB(t). \tag{66} \]
We note that (61) is a special case of (66) with µ(t) and σ(t) taking deterministic constant values. In (66), the asset price S(t) has instantaneous mean rate of return µ(t) and volatility σ(t), both of which are allowed to be time-varying and random.
It is widely believed that interest rates are mean reverting, as very high or negative interest rates either lead to a downward economic spiral or snap quickly back to more normal levels. Therefore, the only reasonable mathematical interpretation of interest rate behavior is a mean reverting one. The main purpose of this subsection is to introduce a mathematical model with the mean reverting property.
Example 5.5 (Model of interest rate). Let B(t), t ≥ 0, be a Brownian motion.
The Vasicek model for the interest rate process R(t) is
dR(t) = (α − βR(t))dt + σdB(t) (67)
where α, β, and σ are positive constants. Find a solution to this equation.
Equation (67) is an example of a stochastic differential equation. It defines
a random process, R(t) in this case, by giving a formula for its differential, and
the formula involves the random process itself and the differential of a Brownian
motion.
Applying the Ito formula to $e^{\beta t} R(t)$ gives the solution
\[ R(t) = e^{-\beta t} R(0) + \frac{\alpha}{\beta}\big( 1 - e^{-\beta t} \big) + \sigma e^{-\beta t} \int_0^t e^{\beta s}\,dB(s). \]
Theorem 3.3 implies that the random variable
\[ \int_0^t e^{\beta s}\,dB(s) \]
appearing on the right-hand side is normally distributed with mean zero and variance
\[ \int_0^t e^{2\beta s}\,ds = \frac{1}{2\beta}\big( e^{2\beta t} - 1 \big). \]
Therefore, R(t) is normally distributed with mean
\[ E[R(t)] = e^{-\beta t} R(0) + \frac{\alpha}{\beta}\big( 1 - e^{-\beta t} \big) \]
and variance
\[ \mathrm{Var}[R(t)] = \frac{\sigma^2}{2\beta}\big( 1 - e^{-2\beta t} \big). \]
The Vasicek model has the desirable property that the interest rate is mean-reverting. When R(t) = α/β, the drift term (the dt term) in (67) is zero. When R(t) > α/β, this term is negative, which pushes R(t) back toward α/β. When R(t) < α/β, this term is positive, which again pushes R(t) back toward α/β. If R(0) = α/β, then E[R(t)] = α/β for all t ≥ 0. If R(0) ≠ α/β, then
\[ \lim_{t\to\infty} E[R(t)] = \frac{\alpha}{\beta}. \]
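A short Euler–Maruyama sketch of the Vasicek dynamics (67): it illustrates the mean reversion towards α/β and compares the sample mean at time T with the formula for E[R(t)] above. The parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, sigma = 0.5, 2.0, 0.3
R0, T, n, n_paths = 0.0, 5.0, 1000, 5000
dt = T / n

R = np.full(n_paths, R0)
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    R = R + (alpha - beta * R) * dt + sigma * dB     # Euler step for (67)

mean_exact = np.exp(-beta * T) * R0 + (alpha / beta) * (1 - np.exp(-beta * T))
print(R.mean(), mean_exact, alpha / beta)            # sample mean, E[R(T)], long-run mean α/β
```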
There is one main disadvantage of the Vasicek model, which is that no matter
how the parameters α > 0, β > 0, and σ > 0 are chosen, there is positive
probability that R(t) is negative, an undesirable property for an interest rate
model. We therefore introduce a similar but different model to get around this.
Example 5.6 (Cox-Ingersoll-Ross (CIR) interest rate model). Let B(t), t ≥ 0,
be a Brownian motion. The Cox-Ingersoll-Ross model for the interest rate
process R(t) is
\[ dR(t) = \big( \alpha - \beta R(t) \big)\,dt + \sigma \sqrt{R(t)}\,dB(t), \tag{68} \]
where α, β, and σ are positive constants. Find the expectation and variance of
R(t).
Figure 4: 50 years of US inflation vs interest
rates. (source of figure: https://www.gzeromedia.com/
the-graphic-truth-50-years-of-us-inflation-vs-interest-rates)
Figure 5: Illustration of Vasicek model for the interest rate. (source of figure:
https://en.wikipedia.org/wiki/Vasicek_model)
Unlike the Vasicek equation (67), the solution to the CIR equation (68) cannot be written explicitly. The advantage of (68) over the Vasicek model is that the
interest rate in the CIR model does not become negative. If R(t) reaches zero,
the term multiplying dB(t) vanishes and the positive drift term αdt in equation
(68) drives the interest rate back into positive territory. Like the Vasicek model,
the CIR model is mean-reverting.
Theorem 5.7. The CIR model is mean-reverting.
Although we cannot write the solution of (68) explicitly, let us try to find the expected value and variance of R(t); since the expectation of an Ito integral is 0, this simplifies the calculation.
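For instance, the expectation can be found as follows (a sketch, using only that the Ito integral in the integral form of (68) has zero expectation). Writing m(t) = E[R(t)], the integral form of (68) is
\[ R(t) = R(0) + \int_0^t \big( \alpha - \beta R(s) \big)\,ds + \sigma \int_0^t \sqrt{R(s)}\,dB(s), \]
so taking expectations,
\[ m(t) = R(0) + \int_0^t \big( \alpha - \beta m(s) \big)\,ds, \qquad \text{i.e.} \qquad m'(t) = \alpha - \beta m(t), \]
whose solution is
\[ E[R(t)] = m(t) = e^{-\beta t} R(0) + \frac{\alpha}{\beta}\big( 1 - e^{-\beta t} \big) \;\longrightarrow\; \frac{\alpha}{\beta} \quad \text{as } t \to \infty, \]
the same mean as in the Vasicek model; this is the mean reversion asserted in Theorem 5.7. The variance can be obtained similarly by applying the Ito formula to R(t)² and taking expectations.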
Suppose at each time t, the investor holds H(t) shares of stock. The position
H(t) can be random but must be adapted to the filtration associated with the
Brownian motion B(t), t ≥ 0. The remainder of the portfolio value, X(t) −
H(t)S(t), is invested in the money market account.
The differential dX(t) for the investor’s portfolio value at each time t is due
to two factors, the capital gain H(t)dS(t) on the stock position and the interest
earnings r(X(t) − H(t)S(t))dt on the cash position. In other words,
\[ dX(t) = H(t)\,dS(t) + r\big( X(t) - H(t)S(t) \big)\,dt = rX(t)\,dt + H(t)(\mu - r)S(t)\,dt + H(t)\sigma S(t)\,dB(t). \tag{70} \]
With f(t, x) = e^{-rt}x, the differential of the discounted stock price $e^{-rt}S(t)$ is
\[ d\big( e^{-rt} S(t) \big) = f_t(t, S(t))\,dt + f_x(t, S(t))\,dS(t) + \frac{1}{2} f_{xx}(t, S(t))\,dS(t)\,dS(t) = -r e^{-rt} S(t)\,dt + e^{-rt}\,dS(t) = (\mu - r) e^{-rt} S(t)\,dt + \sigma e^{-rt} S(t)\,dB(t), \tag{71} \]
and the differential of the discounted portfolio value is
\[ d\big( e^{-rt} X(t) \big) = f_t(t, X(t))\,dt + f_x(t, X(t))\,dX(t) + \frac{1}{2} f_{xx}(t, X(t))\,dX(t)\,dX(t) = -r e^{-rt} X(t)\,dt + e^{-rt}\,dX(t) = H(t)(\mu - r) e^{-rt} S(t)\,dt + H(t)\sigma e^{-rt} S(t)\,dB(t) = H(t)\,d\big( e^{-rt} S(t) \big). \tag{72} \]
Discounting the stock price reduces the mean rate of return from µ, the
term multiplying S(t)dt in (69), to µ − r, the term multiplying e−rt S(t)dt in
(71). Discounting the portfolio value removes the underlying rate of return r;
compare the last line of (70) to the next-to-last line of (72). The last line of
(72) shows that change in the discounted portfolio value is solely due to change
in the discounted stock price.
3 A call option gives the holder the right, but not the obligation, to buy a stock. For an investor to profit from a call option, the stock's price, at expiry, has to be trading high enough above the strike price to cover the cost of the option premium.
4 For call options, the strike price is where the security can be bought by the option holder.
(i) the time to expiration, i.e. T ;
(ii) the value of the stock price at that time, i.e. S(t);
Black, Scholes, and Merton argued that only two of these quantities, time t and
stock price S(t), are variable.
Following this reasoning, we let c(t, x) denote the value of the call at time t
if the stock price at that time is S(t) = x. There is nothing random about the
function c(t, x). However, the value of the option is random; it is the stochastic
process c(t, S(t)) obtained by replacing the dummy variable x by the random
stock price S(t) in this function. At the initial time, we do not know the future
stock prices S(t) and hence do not know the future option values c(t, S(t)). Our
goal is to determine the function c(t, x) so we at least have a formula for the
future option values in terms of the future stock prices.
We begin by computing the differential of c(t, S(t)). According to the Ito
formula, it is
\[ \begin{aligned} dc(t, S(t)) &= c_t(t, S(t))\,dt + c_x(t, S(t))\,dS(t) + \frac{1}{2} c_{xx}(t, S(t))\,dS(t)\,dS(t) \\ &= c_t(t, S(t))\,dt + c_x(t, S(t))\big( \mu S(t)\,dt + \sigma S(t)\,dB(t) \big) + \frac{1}{2} c_{xx}(t, S(t))\,\sigma^2 S^2(t)\,dt \\ &= \Big( c_t(t, S(t)) + \mu S(t)\, c_x(t, S(t)) + \frac{1}{2}\sigma^2 S^2(t)\, c_{xx}(t, S(t)) \Big)\,dt + \sigma S(t)\, c_x(t, S(t))\,dB(t). \end{aligned} \tag{73} \]
We next compute the differential of the discounted option price e−rt c(t, S(t)).
Let f (t, x) = e−rt x. According to the Ito formula,
We would like to choose the initial capital X(0) and the portfolio process H(t) so that the portfolio value X(t) at each time t ∈ [0, T] agrees with c(t, S(t)). This happens if and only if $e^{-rt} X(t) = e^{-rt} c(t, S(t))$ for all t. One way to ensure this equality is to make sure that
\[ e^{-rt} X(t) - X(0) = e^{-rt} c(t, S(t)) - c(0, S(0)) \qquad \text{for all } t \in [0, T). \tag{76} \]
If X(0) = c(0, S(0)), then we can cancel this term in (76) and get the desired equality.
Comparing (72) and (74), we see that (75) holds if and only if
We examine what is required in order for (77) to hold. We first equate the dB(t) terms in (77), which gives
\[ H(t) = c_x(t, S(t)). \tag{78} \]
This is called the delta-hedging rule. At each time t prior to expiration, the
number of shares held by the hedge of the short option position is the partial
derivative with respect to the stock price of the option value at that time. This
quantity, cx (t, S(t)), is called the delta of the option. We next equate the dt
terms in (77), using (78), to obtain
5.5.3 Conclusion
Suppose we have found this function. If an investor starts with initial capital
X(0) = c(0, S(0)) and uses the hedge H(t) = cx (t, S(t)), then (77) will hold for
all t ∈ [0, T ). Indeed, the dB(t) terms on the left and right sides of (77) agree
because H(t) = cx (t, S(t)), and the dt terms agree because (81) guarantees
(80). Equality in (77) gives us (76). Canceling X(0) = c(0, S(0)) and e−rt in
this equation, we see that X(t) = c(t, S(t)) for all t ∈ [0, T ). Taking the limit
as t ↑ T and using the fact that both X(t) and c(t, S(t)) are continuous, we
conclude that X(T ) = c(T, S(T )) = (S(T ) − K)+ .
This means that the short position5 has been successfully hedged. No matter
which of its possible paths the stock price follows, when the option expires, the
agent hedging the short position has a portfolio whose value agrees with the
option payoff.
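To illustrate the delta-hedging rule (78) concretely, here is a small simulation sketch. It quotes the standard Black–Scholes call price (a solution of the pricing PDE referred to above; the explicit formula is stated here as a known fact rather than derived in these notes), simulates one GBM path for S(t) as in (61), rebalances the hedge H(t) = cx(t, S(t)) on a discrete grid, and compares the terminal portfolio value with the payoff (S(T) − K)⁺. All parameter values are arbitrary.

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(t, x, K, r, sigma, T):
    """Black-Scholes call value c(t, x) and its delta c_x(t, x)."""
    tau = T - t
    d1 = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return x * norm_cdf(d1) - K * exp(-r * tau) * norm_cdf(d2), norm_cdf(d1)

rng = np.random.default_rng(6)
S0, K, r, mu, sigma, T, n = 100.0, 100.0, 0.02, 0.08, 0.2, 1.0, 10**4
dt = T / n

S = S0
X, _ = bs_call(0.0, S0, K, r, sigma, T)          # initial capital X(0) = c(0, S(0))
for i in range(n):
    _, H = bs_call(i * dt, S, K, r, sigma, T)    # delta-hedging rule (78): H(t) = c_x(t, S(t))
    S_new = S * exp((mu - 0.5 * sigma**2) * dt + sigma * rng.normal(0.0, sqrt(dt)))
    X += H * (S_new - S) + r * (X - H * S) * dt  # capital gain plus money-market interest
    S = S_new

print(X, max(S - K, 0.0))                        # portfolio value vs option payoff, nearly equal
```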
References
[1] P. Mörters, Y. Peres, (2010), Brownian Motion, Cambridge: Cambridge
University Press.
5 The Short Position is a technique used when an investor anticipates that the value of
a stock will decrease in the short term, perhaps in the next few days or weeks. In a short
sell transaction the investor borrows the shares of stock from the investment firm to sell to
another investor.