
Financial Mathematics

How to make money with probability, but not almost surely!

Pasquale Cirillo
TU Delft

June 18, 2020


Contents

I Stochastic Calculus and Financial Mathematics 3

1 What you need to know about Itō Calculus 7


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Definition of the integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Some basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 The Itō-Doeblin formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Girsanov Theorem and FTAPs 21


2.1 Self-financing portfolios, risk-neutral measures and arbitrage . . . . . . . . . 22
2.2 Change of measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 The Radon-Nikodym derivative . . . . . . . . . . . . . . . . . . . . . 25
2.3 Fundamental theorems of Asset Pricing . . . . . . . . . . . . . . . . . . . . 27
2.4 The Cameron-Martin theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 A simple model for foreign exchange rates . . . . . . . . . . . . . . . . . . . 31
2.6 Moving towards Girsanov theorem . . . . . . . . . . . . . . . . . . . . . . . 33
2.6.1 Girsanov theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3 Black-Scholes-Merton demystified 37
3.1 Black-Scholes-Merton (BSM) demystified . . . . . . . . . . . . . . . . . . . 37
3.1.1 A little digression: two useful results . . . . . . . . . . . . . . . . . . 38
3.1.2 Self-financing portfolios for BSM . . . . . . . . . . . . . . . . . . . . 39
3.2 Pricing options in BSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Volatility and the BSM model . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.1 Historical Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.2 Implied Volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Some extensions via exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.1 Back to Bachelier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4.2 The Value-at-Risk for a simple portfolio . . . . . . . . . . . . . . . . 47
3.4.3 No arbitrage conditions . . . . . . . . . . . . . . . . . . . . . . . . . 49


II Advanced Topics in Financial Mathematics 53

4 A step forward in derivatives 55


4.1 Dividend-paying assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Barrier options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.1 A little digression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2.2 Back to pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 American Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.1 American Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3.2 American Puts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3.3 Perpetual American Puts . . . . . . . . . . . . . . . . . . . . . . . . 62

5 Fixed-income and interest rates 67


5.1 Some basics of fixed-income . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.1.1 Intro to interest rate derivatives and the T-forward measure . . . . . 68
5.2 Models for interest rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.1 Vasicek model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.2 CIR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.3 Affine models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.3 Girsanov I, II and III. The Revenge. . . . . . . . . . . . . . . . . . . . . . . 72
5.4 Pricing interest rate derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.4.1 Pricing a general claim . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.4.2 Bond options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

III Probabilistic Appendix 79

6 Constructing probability spaces 83


6.1 The construction on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.1.1 The equipped space (R, B(R)) . . . . . . . . . . . . . . . . . . . . . . 83
6.1.2 An appropriate probability measure . . . . . . . . . . . . . . . . . . 86
6.1.3 Random variables and densities . . . . . . . . . . . . . . . . . . . . . 90
6.2 The construction on R^n . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.3 The construction on R^∞ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.3.1 A probability on (R^∞, B(R^∞)) . . . . . . . . . . . . . . . . . . . . . 93
6.4 The construction on R^T . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

7 The Brownian motion 99


7.1 The Wiener Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.2 Defining the Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.2.1 The Brownian motion and (R^[0,+∞), B(R^[0,+∞))) . . . . . . . . . . . 101
7.2.2 The Brownian motion as Gaussian process . . . . . . . . . . . . . . . 104

7.3 Main properties of the Brownian motion . . . . . . . . . . . . . . . . . . . . 104


7.3.1 Markovianity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.3.2 Martingality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.3.3 The Maximum of {B(t)} . . . . . . . . . . . . . . . . . . . . . . . . 107
7.4 About the total variation of the Brownian motion . . . . . . . . . . . . . . . 109
7.5 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Preface

I have written these lecture notes for the TU Delft course in Financial Mathematics of the
Master Program in Applied Mathematics, Financial Engineering Specialization.
The aim of these pages is to provide a short yet rigorous introduction to stochastic
calculus and its applications in finance.
These lecture notes are self-contained and cover the entire course. No extra reading
is typically required. However, now and then, I will provide extra references for those
interested in deepening a particular subject.
The lecture notes can be divided into four parts. In the first one, the mathematical
foundations of stochastic calculus are given, from the construction of probability spaces, to
the resolution of those stochastic differential equations that are commonly used in finance.
The second part is then devoted to the most important financial (and sometimes economic)
applications, from Black-Scholes-Merton to arbitrage. In the third and fourth parts, more
advanced topics are discussed, from derivatives to interest rates modeling and risk theory.
In writing the first part of these notes, I have made large use of the notes I personally
took for the course on Stochastic Processes during my PhD at Bocconi University in Milan,
Italy. The course was held by Prof. Michele Donato Cifarelli, a renowned scholar, author
of important results in Bayesian statistics, and a wonderful teacher.
Thanks to the help of my former students, I have tried my best to minimize errors and
typos, but I am pretty sure that something wrong is still present. Naturally, all faults are
mine.
Every comment and correction is highly appreciated.

Pasquale Cirillo.
Delft 2020.

Part I

Stochastic Calculus and Financial Mathematics


In the next three chapters, we deal with the basics of financial mathematics in a brief
yet rigorous way, starting from a review of Itō Calculus. The fundamental theorems of
asset pricing, Girsanov theorem and the well-known Black-Scholes-Merton model are the
main topics of this part.
All the contents are part of the final exam.
Chapter 1

What you need to know about Itō Calculus

In this chapter we introduce the famous Itō integral, a building block of stochastic calculus. The necessity to introduce this new type of integral emerges because we cannot use standard calculus techniques to study objects of the form $\int_0^t f(s)\,dB(s)$, since $B(s)$ is not a BV function.
We will first introduce the new integral and study its basic properties. Then we will move
to the well-known Itō-Doeblin formula (or lemma), which we can use to find the differential
of a time-dependent function of a stochastic process. Later on, we will use this formula to
study the fundamental Black-Scholes-Merton model.
The topics discussed in the present chapter constitute the necessary basis for studying
stochastic differential equations.
The interested reader can find additional materials in [2] and [8].

1.1 Introduction
Imagine we own a portfolio whose value at time t is P (t). For simplicity we assume that our
portfolio only contains securities whose value depends on the interest rate on the market.
Let r(t) be the instantaneous interest rate at time t. It may be interesting to study an
equation like
$$\frac{d}{dt}P(t) = r(t)P(t), \qquad t \ge 0, \tag{1.1}$$
which tells us how the value of our portfolio varies over time, when we consider infinitesimal
time variations.¹ Equation (1.1) is a common example of a differential equation, whose
solution is studied in any calculus course.

¹If you are familiar with the Greeks in finance, this should remind you of Theta.


Now assume that we can express $r(t)$ as follows
$$r(t) = r + aW(t),$$
where $r$ and $a$ are two constants, and where $W(t)$ is a “white noise”, that is to say a stochastic process such that

• $W(t)$ and $W(s)$ are independent for $t \neq s$;

• for $t_1 < \ldots < t_h$ and $k > 0$, we have that $(W(t_1), \ldots, W(t_h)) \stackrel{d}{=} (W(t_1+k), \ldots, W(t_h+k))$;

• $E(W(t)) = 0$, $\forall t$.


Equation (1.1) can thus be rewritten as
$$\frac{d}{dt}P(t) = rP(t) + aP(t)W(t). \tag{1.2}$$
This equation now includes random quantities expressed by the process $W(t)$. As a consequence we cannot solve equation (1.2) in the usual way, since we should consider it as an ordinary differential equation for every trajectory of $W(t)$, which is a rather “irregular” object.
In any case, similarly to what we usually do with equations like (1.1), a solution for (1.2) could be thought of as the solution of the following (stochastic) integral equation
$$P(t) = P(0) + r\int_0^t P(s)\,ds + a\int_0^t P(s)W(s)\,ds.$$
Unfortunately, the last term on the right-hand side of the previous equation cannot be easily defined.
A possibility would be to substitute $\int_0^t P(s)W(s)\,ds$ with something like $\int_0^t P(s)\,dB(s)$, where $B(s)$ is a Brownian motion and $W(s) \approx dB(s)/ds$, so that $dB(s) = W(s)\,ds$.
However, even this solution is not feasible: if we interpret $\int_0^t P(s)\,dB(s)$ as a Riemann-Stieltjes integral, we immediately have to remember that $B(s)$ is not a BV function.
Finally, we can notice that the quantity
$$\sum_{i=1}^{n} P(\delta_i)\,[B(s_i) - B(s_{i-1})],$$
which should approximate $\int_0^t P(s)\,dB(s)$, converges to different values depending on the choice of $\delta_i \in [s_{i-1}, s_i]$. The main reason for such behavior is the irregularity of the realizations of $B(s)$.
So, how can we solve the problem of evaluating (1.2)?
The only solution seems to be to define another type of integral. Over the years different possibilities have been introduced, but the most successful - and now widely accepted - one is due to Kiyoshi Itō.
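As a quick numerical illustration of this dependence on the evaluation point, here is a minimal Python sketch (purely illustrative parameters, and the simple choice $P(s) = B(s)$): the approximating sums built with the left endpoint $\delta_i = s_{i-1}$ and with the right endpoint $\delta_i = s_i$ differ, on average, by roughly $t$, so their limits cannot coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n, n_paths = 1.0, 500, 20_000
dt = t / n

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))          # Brownian increments
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

left_sums = np.sum(B[:, :-1] * dB, axis=1)    # delta_i = s_{i-1} (the Ito choice)
right_sums = np.sum(B[:, 1:] * dB, axis=1)    # delta_i = s_i

print("mean with left endpoints :", left_sums.mean())                 # close to 0
print("mean with right endpoints:", right_sums.mean())                # close to t
print("mean difference          :", (right_sums - left_sums).mean())  # close to t
```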

1.2 Definition of the integral


Following the steps of Itō, given a Brownian motion $B(t,\omega)$, we want to define the integral
$$I(f)(\omega) := \int_0^T f(t,\omega)\,dB(t,\omega), \tag{1.3}$$
for a sufficiently large class of functions $f$. Notice that the presence of $\omega \in \Omega$ in the arguments of $f$ indicates that $f$ can be a random function (but naturally also a traditional, deterministic one).
Mimicking what happens in standard calculus, a natural way of defining $I(f)(\omega)$ is to set
$$I(f)(\omega) := \int_0^T f(t,\omega)\,dB(t,\omega) = \lim_{n\to\infty} \sum_{k=0}^{n-1} f(t_k,\omega)\,[B(t_{k+1},\omega) - B(t_k,\omega)],$$
for $0 = t_0 < t_1 < \ldots < t_n = T$, with $\max_k (t_{k+1} - t_k) \to 0$ as $n \to \infty$.


Let us now characterize the class $M^2[0,T]$ of functions $f$ for which we can define the Itō integral.

Definition 1. A function $f : [0,+\infty) \times \Omega \to \mathbb{R}$ belongs to the class $M^2[0,T]$ if

• $f(t,\omega)$ is $\mathcal{B} \times \mathcal{F}$-measurable, where $\mathcal{B} = \mathcal{B}(\mathbb{R}^{[0,+\infty)})$, and $\mathcal{F}$ is given by the space $(\Omega, \mathcal{F}, P)$ on which $B(t)$ is defined;

• $f(t,\omega)$ is $\mathcal{F}_t$-measurable, i.e. $f(t,\omega)$ is non-anticipating, i.e. adapted to $\mathcal{F}_t$;

• $E\left[\int_0^T f^2(t,\omega)\,dt\right] < +\infty$.

Definition 2. A function $\phi \in M^2[0,T]$ is called a simple process (or elementary function) if
$$\phi(t,\omega) = \sum_{j=0}^{n-1} a_j(\omega)\,\mathbf{1}_{[t_j, t_{j+1}]}(t),$$
where $0 = t_0 < \ldots < t_n = T$.


In the previous definition, please notice that

• $a_j(\omega)$ can be a random quantity;

• since $\phi \in M^2[0,T]$, each $a_j(\omega)$ must be $\mathcal{F}_{t_j}$-measurable!

Definition 3. If $\phi \in M^2[0,T]$ is a simple process (or elementary function), then its Itō integral is
$$\int_0^T \phi(t,\omega)\,dB(t,\omega) := \sum_{j=0}^{n-1} a_j(\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)].$$

Proposition 1 (Properties of simple processes). Let $\phi_1$ and $\phi_2$ be two simple processes in $M^2[0,T]$. Then

1. $\int_0^T (\phi_1 + \phi_2)\,dB = \int_0^T \phi_1\,dB + \int_0^T \phi_2\,dB$;

2. $\int_0^T c\phi_1\,dB = c\int_0^T \phi_1\,dB$, for any constant $c$;

3. $E\left[\int_0^T \phi_1\,dB\right] = 0$;

4. $E\left[\int_0^T \phi_1\,dB \cdot \int_0^T \phi_2\,dB\right] = E\left[\int_0^T (\phi_1 \cdot \phi_2)\,dt\right]$.

The last property is known as the Itō isometry.


Proof. Points 1., 2. and 3. are left as exercises.
To prove point 4. we start with the case $\phi_1 = \phi_2$. Hence
$$E\left[\left(\int_0^T \phi_1(t,\omega)\,dB(t,\omega)\right)^2\right] = E\Bigg[\sum_{j=0}^{n-1} a_j^2(\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)]^2 \tag{1.4}$$
$$+\; 2\underbrace{\sum_{0\le j<k\le n-1} a_j(\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)]\, a_k(\omega)\,[B(t_{k+1},\omega) - B(t_k,\omega)]}_{C}\Bigg].$$
In order to further work with equation (1.4), let us consider the following:

• $E[(B(t_{j+1},\omega) - B(t_j,\omega))^2] = t_{j+1} - t_j$;

• for $k > j$ the elements $a_j(\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)]\,a_k(\omega)$ are independent from $B(t_{k+1},\omega) - B(t_k,\omega)$, whose expectation is zero. Hence $C = 0$.

Thus, equation (1.4) simplifies to
$$\begin{aligned}
E\left[\left(\int_0^T \phi_1(t,\omega)\,dB(t,\omega)\right)^2\right] &= E\left[\sum_{j=0}^{n-1} a_j^2(\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)]^2\right]\\
&= \sum_{j=0}^{n-1} E[a_j^2(\omega)]\, E\left[(B(t_{j+1},\omega) - B(t_j,\omega))^2\right]\\
&= \sum_{j=0}^{n-1} E[a_j^2(\omega)](t_{j+1} - t_j) = E\left[\sum_{j=0}^{n-1} a_j^2(\omega)(t_{j+1} - t_j)\right]\\
&= E\left[\int_0^T \phi_1^2(t,\omega)\,dt\right].
\end{aligned}$$
For the case $\phi_1 \neq \phi_2$, we can consider $\phi_3 = \phi_1 + \phi_2$. Thus
$$\begin{aligned}
E\left[\left(\int_0^T \phi_3\,dB\right)^2\right] &= E\left[\left(\int_0^T (\phi_1+\phi_2)\,dB\right)^2\right] = E\left[\left(\int_0^T \phi_1\,dB + \int_0^T \phi_2\,dB\right)^2\right]\\
&= E\left[\int_0^T \phi_1^2\,dt\right] + 2E\left[\int_0^T \phi_1\,dB \cdot \int_0^T \phi_2\,dB\right] + E\left[\int_0^T \phi_2^2\,dt\right].
\end{aligned}$$
But we also know that
$$\begin{aligned}
E\left[\left(\int_0^T \phi_3\,dB\right)^2\right] &= E\left[\int_0^T \phi_3^2\,dt\right] = E\left[\int_0^T (\phi_1^2 + 2\phi_1\phi_2 + \phi_2^2)\,dt\right]\\
&= E\left[\int_0^T \phi_1^2\,dt\right] + 2E\left[\int_0^T \phi_1\phi_2\,dt\right] + E\left[\int_0^T \phi_2^2\,dt\right].
\end{aligned}$$
Then, by equating the last two equations, we get the final result.

Exercise 1. Prove points 1.,2. and 3. of Proposition 1.
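Point 4. is also easy to check by simulation. The following minimal Python sketch (an illustrative example of mine, with arbitrary discretisation parameters) takes the simple process with $a_j(\omega) = B(t_j,\omega)$ on a uniform grid and compares the two sides of the Itō isometry by Monte Carlo; both are close to $T^2/2$ for this particular choice.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, n_paths = 1.0, 200, 50_000
dt = T / n

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))     # Brownian increments
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
a = B[:, :-1]                                            # a_j = B(t_j): F_{t_j}-measurable

ito_integrals = np.sum(a * dB, axis=1)                   # Ito integrals of the simple process
lhs = np.mean(ito_integrals**2)                          # E[(int phi dB)^2]
rhs = np.mean(np.sum(a**2, axis=1) * dt)                 # E[int phi^2 dt]

print(lhs, rhs)   # both approximately T^2/2 = 0.5
```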

In what follows we want to define the Itō integral for a general $f \in M^2[0,T]$. In order to do so, we need to introduce some extra definitions and results.
In particular, we will follow this procedure:

1. We start from the definition of the Itō integral for simple processes, as given in Definition 3.

2. We then show that any function $f \in M^2[0,T]$ can be approximated by a proper sequence of simple processes $\{\phi_n\}$.

3. Finally we define $\int_0^T f(t,\omega)\,dB(t,\omega)$ as the limit of $\int_0^T \phi_n(t,\omega)\,dB(t,\omega)$, for $\phi_n \to f$ as $n \to \infty$.

Definition 4 (Approximating sequence in $M^2$). Consider $f(t,\omega) \in M^2[0,T]$. A collection $\{\phi_n\}$ of simple processes in $M^2[0,T]$ is an approximating sequence for $f$ if
$$\lim_{n\to\infty} E\left[\int_0^T |f(t,\omega) - \phi_n(t,\omega)|^2\,dt\right] = 0. \tag{1.5}$$

Now, let us consider the following sequence of Itō integrals for simple processes
$$I(\phi_n)(\omega) = \int_0^T \phi_n(t,\omega)\,dB(t,\omega);$$
and let us also assume that $\{\phi_n\}$ is an approximating sequence for $f \in M^2[0,T]$, so that equation (1.5) holds. Then we can state the following.

Theorem 1. For $f \in M^2[0,T]$, let $\{\phi_n\} \subset M^2[0,T]$ be an approximating sequence for $f$. Then the sequence of integrals $\{I(\phi_n)(\omega)\}$ converges in quadratic mean.

Proof. We simply have to show that $\{I(\phi_n)(\omega)\}$ is a Cauchy sequence in quadratic mean.
Consider two elements $\phi_l$ and $\phi_m$ of the sequence $\{\phi_n\}$. Clearly
$$E\left[\left(\int_0^T \phi_l\,dB - \int_0^T \phi_m\,dB\right)^2\right] = E\left[\left(\int_0^T (\phi_l - \phi_m)\,dB\right)^2\right].$$
Since $\phi_l$ and $\phi_m$ are simple processes, it is easy to see that $\phi_l - \phi_m$ is a simple process in $M^2[0,T]$ as well. Hence, by what we have seen in Proposition 1,
$$E\left[\left(\int_0^T (\phi_l - \phi_m)\,dB\right)^2\right] = E\left[\int_0^T (\phi_l - \phi_m)^2\,dt\right] = E\left[\int_0^T [(\phi_l - f) + (f - \phi_m)]^2\,dt\right].$$
But this implies that
$$E\left[\left(\int_0^T (\phi_l - \phi_m)\,dB\right)^2\right] \le E\left[2\left(\int_0^T |\phi_l - f|^2\,dt + \int_0^T |f - \phi_m|^2\,dt\right)\right] \to 0,$$
as $l, m \to \infty$.

It is worth noticing that the previous theorem implies that if $\{\phi_n\}$ and $\{\eta_m\}$ are two different approximating sequences for $f$, then the limits $\lim_{n\to\infty} I(\phi_n)(\omega)$ and $\lim_{m\to\infty} I(\eta_m)(\omega)$ exist and they must be the same.
We can now introduce the most important result of the present section, with which we show that every $f \in M^2[0,T]$ can be approximated by a properly chosen approximating sequence, so that we can define the Itō integral of $f$ in terms of limits.

Theorem 2 (Itō's approximation). For every $f \in M^2[0,T]$, there exists at least one sequence of functions $\{h_n\}$ in $M^2[0,T]$ such that $\lim_{n\to\infty} E\left[\int_0^T |f(t,\omega) - h_n(t,\omega)|^2\,dt\right] = 0$. In other words, $\{h_n\}$ is an approximating sequence for $f$.
To prove this theorem we will make use of three different lemmas. Our aim is to show
that
1. If a function g ∈ M2 [0, T ] is bounded and continuous, it can be approximated by a
proper sequence of simple processes {φn }.

2. If a function h ∈ M2 [0, T ] is bounded, it can be approximated by a proper sequence


of bounded and continuous functions {gn }.

3. Every function f ∈ M2 [0, T ] can be approximated by a proper sequence of bounded


functions {hn }.

Lemma 1. For every $g \in M^2[0,T]$ which is bounded and continuous, there exists a sequence $\{\phi_n\}$ of simple processes in $M^2[0,T]$ such that
$$\lim_{n\to\infty} E\left[\int_0^T |g(t,\omega) - \phi_n(t,\omega)|^2\,dt\right] = 0.$$

Proof. Since $g \in M^2[0,T]$ is continuous, the function
$$\phi_n(t,\omega) := \sum_{j=0}^{n-1} g(t_j,\omega)\,\mathbf{1}_{[t_j,t_{j+1})}(t)$$
is a simple process and it converges uniformly to $g$ as the mesh of the partition goes to zero, i.e. $(g - \phi_n)^2 \to 0$ uniformly. Then
$$\lim_{n\to\infty} \int_0^T (g - \phi_n)^2\,dt = \int_0^T \lim_{n\to\infty} (g - \phi_n)^2\,dt = 0.$$
Now, given that $g - \phi_n$ is bounded, we have that $\int_0^T (g - \phi_n)^2\,dt \le D < \infty$, and we can apply the dominated convergence theorem to complete the proof.

Dominated Convergence Theorem

If a sequence of functions $\{f_n\}$ on $(\Omega, \mathcal{F}, \mu)$ converges to a function $f$, and it is dominated by another $\mu$-integrable function $g$, so that $|f_n(x)| \le g(x)$ for all $x \in \Omega$ and $n = 1, 2, \ldots$, then $\lim_{n\to\infty} \int_\Omega |f_n - f|\,d\mu = 0$.

Lemma 2. Let $h \in M^2[0,T]$ be a bounded function. Then there exists a sequence $\{g_n\} \subset M^2[0,T]$ of functions, bounded and continuous for every $\omega$ and $n$, such that
$$\lim_{n\to\infty} E\left[\int_0^T |h(t,\omega) - g_n(t,\omega)|^2\,dt\right] = 0.$$

We omit the proof of this lemma; however it may be interesting to know that the sequence $\{g_n\}$ can be defined as
$$g_n(t,\omega) = n \int_{\max(t - \frac{1}{n},\,0)}^{t} h(s,\omega)\,ds, \qquad 0 \le t \le T.$$

Lemma 3. Let $f \in M^2[0,T]$ be a generic function. Then there exists a sequence $\{h_n\} \subset M^2[0,T]$ of bounded functions, such that
$$\lim_{n\to\infty} E\left[\int_0^T |f(t,\omega) - h_n(t,\omega)|^2\,dt\right] = 0.$$

Proof. Let us assume that $f$ is non-negative, i.e. $f \ge 0$, and define the sequence
$$h_n(t,\omega) = \begin{cases} f(t,\omega) & f \le n \\ n & f > n \end{cases}.$$
Since $f \in M^2[0,T]$, we know that $f^2$ is integrable a.s. and $h_n^2 \le f^2$. Hence we get $\int_0^T h_n^2(t,\omega)\,dt < \infty$ a.s. Moreover $|f - h_n|^2 \le f^2$, and $\lim_{n\to\infty} h_n = f$. The dominated convergence theorem thus guarantees that $\int_0^T |f - h_n|^2\,dt \to 0$.
However, $\int_0^T |f - h_n|^2\,dt \le \int_0^T f^2\,dt$, with $E\left(\int_0^T f^2\,dt\right) < \infty$. The dominated convergence theorem then tells us that $\lim_{n\to\infty} E\left[\int_0^T |f - h_n|^2\,dt\right] = 0$.
For a general $f$, it is sufficient to use the decomposition $f = f_+ - f_-$, where $f_+ = \max(f,0) \ge 0$ and $f_- = \max(-f,0) \ge 0$.

Combining the previous three lemmas we can thus prove Theorem 2. The next (and last) step for the definition of the Itō integral for every $f \in M^2[0,T]$ is therefore contained in the following definition.

Definition 5 (The fundamental definition). Let $f$ be a function belonging to the class $M^2[0,T]$. Let $\{\phi_n\}$ be an approximating sequence for $f$. We define the Itō integral of $f$ as the following limit
$$I(f)(\omega) = \int_0^T f(t,\omega)\,dB(t,\omega) := \lim_{n\to\infty} I(\phi_n)(\omega) = \lim_{n\to\infty} \int_0^T \phi_n(t,\omega)\,dB(t,\omega).$$

1.3 Some basic properties


In this section we analyze some properties of the integral I(f )(ω) we have defined so far.
For example, we can start by showing that I(f )(ω) inherits some important properties
from I(φ)(ω), where φ is a simple process in M2 [0, T ].
Proposition 2. If $f \in M^2[0,T]$ then
$$E\left[\int_0^T f(t,\omega)\,dB(t,\omega)\right] = 0.$$

Proof. Let $\{\phi_n\}$ be such that $\lim_{n\to\infty} E\left[\int_0^T |f - \phi_n|^2\,dt\right] = 0$. Then
$$\lim_{n\to\infty} \int_0^T \phi_n(t,\omega)\,dB(t,\omega) = \int_0^T f(t,\omega)\,dB(t,\omega)$$
in quadratic mean. As a consequence
$$E\left[\int_0^T \phi_n(t,\omega)\,dB(t,\omega)\right] \to E\left[\int_0^T f(t,\omega)\,dB(t,\omega)\right].$$
But we know that $E\left[\int_0^T \phi_n(t,\omega)\,dB(t,\omega)\right] = 0$, hence $E\left[\int_0^T f(t,\omega)\,dB(t,\omega)\right] = 0$ as well.

Proposition 3. If $f \in M^2[0,T]$ then
$$E\left[\left(\int_0^T f(t,\omega)\,dB(t,\omega)\right)^2\right] = E\left[\int_0^T f^2(t,\omega)\,dt\right].$$

Proof. The proof of this proposition is left as an exercise; see below.

Exercise 2. Prove Proposition 3.
Hint: just notice that, for what we have seen so far,
$$E\left[\left(\int_0^T \phi_n(t,\omega)\,dB(t,\omega)\right)^2\right] \to E\left[\left(\int_0^T f(t,\omega)\,dB(t,\omega)\right)^2\right].$$
Now you can apply point 4. of Proposition 1 to conclude the proof.


Proposition 4. If $f \in M^2[0,T]$ then
$$Q(t) = \int_0^t f(s,\omega)\,dB(s,\omega), \qquad t \ge 0,$$
is a martingale with respect to the natural filtration $\mathcal{F}_t$ generated by $B(t,\omega)$, and the probability measure $P$ of $(\Omega, \mathcal{F}, P)$.

Proof. We have to check the different properties of martingales, in order to complete our proof.
The fact that $Q(t)$ is $\mathcal{F}_t$-measurable is evident.
We also know that (we drop $(s,\omega)$ in the notation)
$$E[|Q(t)|] = E\left[\left|\int_0^t f\,dB\right|\right] \le E\left[\left(\int_0^t f\,dB\right)^2\right]^{\frac{1}{2}} = E\left[\int_0^t f^2\,ds\right]^{\frac{1}{2}} < \infty.$$
Now, let us take into consideration $I_n(t) = \int_0^t \phi_n(s,\omega)\,dB(s,\omega)$, where $\{\phi_n\}$ is an approximating sequence for our $f$. For $t \ge s$, we have
$$E[I_n(t)|\mathcal{F}_s] = E\left[\sum_{j=0}^{m-1} \phi_n(t_j,\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)] + \sum_{k=m}^{n-1} \phi_n(t_k,\omega)\,[B(t_{k+1},\omega) - B(t_k,\omega)] \,\Bigg|\, \mathcal{F}_{t_m}\right],$$
where $0 = t_0 < t_1 < \ldots < t_m = s < t_{m+1} < \ldots < t_n = t$.
Hence
$$E[I_n(t)|\mathcal{F}_s] = \sum_{j=0}^{m-1} \phi_n(t_j,\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)] \quad \text{(for measurability)} \tag{1.6}$$
$$+\; \underbrace{\sum_{k\ge m} E\left[\left.\phi_n(t_k,\omega)\,[B(t_{k+1},\omega) - B(t_k,\omega)]\,\right|\, \mathcal{F}_{t_m}\right]}_{A}.$$
Now notice that the second addend $A$ in equation (1.6) is equal to $0$. In fact, for $t \ge s$, the increments $B(t_{k+1},\omega) - B(t_k,\omega)$ are independent from $\phi_n(t_k,\omega)$, and
$$E\left[\left.\phi_n(t_k,\omega)\,[B(t_{k+1},\omega) - B(t_k,\omega)]\,\right|\,\mathcal{F}_{t_m}\right] = E[\phi_n(t_k,\omega)]\,\underbrace{E\left[B(t_{k+1},\omega) - B(t_k,\omega)\,|\,\mathcal{F}_{t_m}\right]}_{=0} = 0.$$
This means that, for $t \ge s$,
$$E[I_n(t)|\mathcal{F}_s] = \sum_{j=0}^{m-1} \phi_n(t_j,\omega)\,[B(t_{j+1},\omega) - B(t_j,\omega)] = \int_0^s \phi_n(r,\omega)\,dB(r,\omega) = I_n(s).$$
In order to conclude the proof, it is sufficient to apply a well-known result about conditional expectations, i.e. if $X_n \to_r X$, $r \ge 1$, and $\mathcal{G}$ is a $\sigma$-algebra, then $E[X_n|\mathcal{G}] \to_r E[X|\mathcal{G}]$, where “$\to_r$” indicates convergence in the $r$-th mean (or in the $L^r$-norm, if you prefer).
Now, $\int_0^t \phi_n\,dB \to_2 \int_0^t f\,dB$, so that $E[\int_0^t \phi_n\,dB|\mathcal{F}_s] \to_2 E[\int_0^t f\,dB|\mathcal{F}_s]$. But $E[\int_0^t \phi_n\,dB|\mathcal{F}_s] = \int_0^s \phi_n\,dB \to_2 \int_0^s f\,dB$. Therefore $E[\int_0^t f\,dB|\mathcal{F}_s] = \int_0^s f\,dB$.

We conclude this section by stating an important proposition that, for the moment, we do not prove. We will come back to it later on, when needed.

Proposition 5. If $f$ belongs to the class $M^2[0,T]$, then $Q(t) = \int_0^t f(s,\omega)\,dB(s,\omega)$ always admits a continuous modification for every $T \ge t \ge 0$.

1.4 The Itō-Doeblin formula


The definition of Itō’s integral as given in Definition 5 is essential for the development of
stochastic calculus, as much as the fundamental theorem of calculus is for the definition
and the study of “standard” integrals. However, Definition 5 alone (and the construction
behind it) is not very useful for the actual computation of stochastic integrals. We need in
fact some sort of rule, or formula, we can apply to practically compute these new objects.
The aim of this section is to introduce this formula, known as the Itō-Doeblin formula (or
Itō-Doeblin lemma).

Before introducing the formula, it is interesting to stress that the class M2 [0, T ] of functions,
for which the integral I(f )(ω) can be defined, can be extended to a slightly more general
class, called H2 [0, T ].

Definition 6. A function $f$ belongs to the class $H^2[0,T]$ if

• $f(t,\omega)$ is $\mathcal{B} \times \mathcal{F}$-measurable;

• there exists a filtration $\{\mathcal{M}_t\} \subseteq \mathcal{F}$ such that $B(t,\omega)$ is a martingale with respect to $\{\mathcal{M}_t\}$;

• $f(t,\omega)$ is $\mathcal{M}_t$-measurable;

• $P\left(\int_0^T f^2(t,\omega)\,dt < +\infty\right) = 1$.

For this new class of functions we can show that there always exists a sequence $\{\phi_n\}$ of simple processes such that $\int_0^T |f - \phi_n|^2\,dt \to 0$ in probability. Hence we can set
$$I(f)(\omega) = \operatorname{P\text{-}lim}_n \int_0^T \phi_n\,dB.$$
From now on, all the functions $f$ will belong to $H^2[0,T]$.


A couple of definitions will be very useful to introduce the Itō-Doeblin formula.

Definition 7 (Stochastic integral). Let $B(t,\omega)$ be a Brownian motion on $(\Omega, \mathcal{F}, P)$. A stochastic integral is a process $X(t)$ on $(\Omega, \mathcal{F}, P)$ such that
$$X(t) = X(0) + \int_0^t u(s,\omega)\,ds + \int_0^t v(s,\omega)\,dB(s,\omega),$$
where $u, v \in H^2[0,T]$ and $P\left(\int_0^t |u(s,\omega)|\,ds < +\infty\right) = 1$.

Definition 8 (Stochastic differential). If $X(t)$ is a stochastic integral, its stochastic differential is
$$dX(t) = u(t,\omega)\,dt + v(t,\omega)\,dB(t,\omega).$$

Theorem 3 (Itō-Doeblin formula). Let $X(t)$ be a stochastic integral with stochastic differential
$$dX(t) = u(t,\omega)\,dt + v(t,\omega)\,dB(t,\omega),$$
with $u, v \in H^2[0,T]$ and $P\left(\int_0^t |u(s,\omega)|\,ds < +\infty\right) = 1$.
Let $g(t,x) : [0,+\infty) \times \mathbb{R} \to \mathbb{R}$ be a function such that $g'_t$, $g'_x$ and $g''_{xx}$ exist and are continuous.
The process $Y(t) = g(t, X(t))$ is still a stochastic integral with stochastic differential
$$dY(t) = g'_t(t,X(t))\,dt + g'_x(t,X(t))\,dX(t) + \frac{1}{2} g''_{xx}(t,X(t))\,(dX(t))^2.$$

We will not give a formal proof of the previous theorem, since it goes beyond the scope of these lecture notes.
However it is interesting to observe that the Itō-Doeblin formula can also be written as
$$dY(t) = \left(g'_t + g'_x u + \frac{1}{2} g''_{xx} v^2\right) dt + g'_x v\,dB. \tag{1.7}$$
Equation (1.7) can be obtained by writing the Taylor series of $Y(t) = g(t,X(t))$ up to the second order, always remembering that $dX(t) = u(t,\omega)\,dt + v(t,\omega)\,dB(t,\omega)$. This leads to
$$dY(t) = g'_x\,(u(t,\omega)\,dt + v(t,\omega)\,dB(t,\omega)) + g'_t\,dt + \frac{1}{2} g''_{xx}\,(u(t,\omega)\,dt + v(t,\omega)\,dB(t,\omega))^2.$$
The term $\frac{1}{2} g''_{xx}(u(t,\omega)\,dt + v(t,\omega)\,dB(t,\omega))^2$ can be re-written as
$$\frac{1}{2} g''_{xx}\left((u(t,\omega))^2 (dt)^2 + (v(t,\omega))^2 (dB(t,\omega))^2 + 2u(t,\omega)v(t,\omega)\,dt\,dB(t,\omega)\right).$$
Now, for $dt \to 0$, we have $(dt)^2 = 0 = dt\,dB$, while $(dB)^2 = dt$ by what we have seen before. Rearranging the terms (and dropping $(t,\omega)$ in the notation) gives us equation (1.7).
We finish this section and chapter by giving two examples of the use of the Itō-Doeblin
formula. Later on, when introducing the model of Black-Scholes-Merton, or when dealing
with risk-neutral pricing, we will see many interesting financial applications of the formula.

Example 1. Assume we want to compute
$$\int_0^t B^n(s,\omega)\,dB(s,\omega), \qquad n \ge 1.$$
Set $u(t,\omega) = 0$, $v(t,\omega) = 1$, $g(t,x) = \frac{1}{n+1} x^{n+1}$. We have that $g'_t = 0$, $g'_x = x^n$ and $g''_{xx} = n x^{n-1}$. The Itō-Doeblin formula then gives
$$dg(t, B(t,\omega)) = \frac{1}{2} n B^{n-1}(t,\omega)\,dt + B^n(t,\omega)\,dB(t,\omega).$$
From this we get
$$\int_0^t B^n(s,\omega)\,dB(s,\omega) = \frac{1}{n+1} B^{n+1}(t,\omega) - \frac{n}{2}\int_0^t B^{n-1}(s,\omega)\,ds.$$
For $n = 1$ we obtain
$$\int_0^t B(s,\omega)\,dB(s,\omega) = \frac{1}{2} B^2(t,\omega) - \frac{t}{2},$$
which represents an interesting case we will see later on in applications.
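As a sanity check (an illustrative numerical sketch, with arbitrary discretisation parameters), the following Python snippet verifies the case $n = 2$ on a single simulated path: the left-endpoint sums approximating $\int_0^t B^2\,dB$ and the right-hand side $B^3(t)/3 - \int_0^t B\,ds$ agree up to discretisation error.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 200_000
dt = t / n

dB = rng.normal(0.0, np.sqrt(dt), n)            # Brownian increments on a fine grid
B = np.concatenate(([0.0], np.cumsum(dB)))

lhs = np.sum(B[:-1]**2 * dB)                    # left-endpoint (Ito) sum for int B^2 dB
rhs = B[-1]**3 / 3 - np.sum(B[:-1]) * dt        # B(t)^3/3 - int B ds  (case n = 2)

print(lhs, rhs)   # the two values agree up to discretisation error
```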

Example 2. Let $f$ be a non-random function with bounded variation. How to compute $\int_0^t f(s)\,dB(s,\omega)$?
Let us set $u(t,\omega) = 0$, $v(t,\omega) = 1$, $g(t,x) = x f(t)$. Then $g'_t\,dt = x\,df(t)$, $g'_x = f(t)$ and $g''_{xx} = 0$. Hence
$$B(t,\omega)\,f(t) = \int_0^t B(s,\omega)\,df(s) + \int_0^t f(s)\,dB(s,\omega),$$
that is
$$\int_0^t f(s)\,dB(s,\omega) = B(t,\omega)\,f(t) - \int_0^t B(s,\omega)\,df(s).$$
Chapter 2

Girsanov Theorem and the Fundamental Theorems of Asset Pricing

The aim of this chapter is to introduce Girsanov theorem, a fundamental result of financial
mathematics. This theorem tells us how the dynamics of a stochastic process changes when
the original measure is changed to an equivalent probability measure. In financial words,
when dealing with derivatives, it tells us how to pass from the physical or market measure,
which characterizes the probability that an underlying asset will have a particular value
as we can observe it on the market, to the risk-neutral measure (or equivalent martingale
measure) that governs the pricing of a derivative (whose value depends on the underlying
asset) under the hypothesis of no arbitrage.
The procedure we will follow to prove Girsanov theorem is the following: we first
introduce exponential martingales and we show how these can be used to generate new
measures on a probability space (Ω, F, P ). This means that we will speak about abso-
lutely continuous measures and the Radon-Nikodym derivative. We will then introduce
the Cameron-Martin theorem, which gives us important clues about the relationships be-
tween the standard Brownian motion and Brownian motion with drift. Finally, after giving
some extra conditions (Novikov conditions), we will be able to discuss Girsanov theorem,
which generalizes the results of Cameron-Martin.
In this chapter we also introduce the two fundamental theorems of asset pricing. These
theorems give us important insights about the properties of efficient and complete markets,
which are the essential starting point for all pricing procedures.


2.1 Self-financing portfolios, risk-neutral measures and arbitrage
Let M be a T −period market (model) with traded assets Ak , k = 1, 2, .., K. Let StA (ω) be
the price of asset A at time t, under the market scenario ω ∈ Ω.
It is plausible to assume that a generic trader will not just hold single assets, but a more
realistic portfolio, consisting of shares (even negative ones, for short positions) of each of
the traded assets. Let θtA (ω) be the shares of asset A in our trader’s portfolio during trading
period t (that is to say the period between actual trading at time t and the beginning of
the next trading session at t + 1), under scenario ω. If we assume that the trader adjusts
his/her portfolio over time, on the basis of the different scenarios, what we get is a so-called
dynamically rebalanced portfolio. Such a portfolio is said to be bounded when each $\theta_t^{A_k}(\omega)$ is bounded, for $k = 1, \ldots, K$. It is not difficult to see that if a market only allows finitely many scenarios, then all dynamically rebalanced portfolios are bounded.
According to what we have seen so far, we require the sequence {θtA }0≤t≤T to be adapted
to the natural filtration generated by the corresponding price process.
The total value of portfolio $\theta$ (the vector of shares in the different assets), after rebalancing at time $t$ under scenario $\omega$, is
$$V_t^\theta = V_t^\theta(\omega) = \sum_{k=1}^{K} \theta_t^{A_k}(\omega)\,S_t^{A_k}(\omega).$$

Notice that, given the changes in prices, we generally have $V_t^\theta \neq V_{t+1}^\theta$.
If we assume that the trader neither invests new resources in the portfolio nor withdraws any at time $t+1$, the total value of the portfolio just before rebalancing at time $t+1$ must be the same as the value after rebalancing, that is
$$\sum_{k=1}^{K} \theta_t^{A_k}(\omega)\,S_{t+1}^{A_k}(\omega) = \sum_{k=1}^{K} \theta_{t+1}^{A_k}(\omega)\,S_{t+1}^{A_k}(\omega). \tag{2.1}$$

An equivalent way of writing this is
$$V_{t+1}^\theta(\omega) - V_t^\theta(\omega) = \sum_{k=1}^{K} \theta_t^{A_k}(\omega)\left(S_{t+1}^{A_k}(\omega) - S_t^{A_k}(\omega)\right). \tag{2.2}$$

Definition 9 (Self-financing portfolio). A dynamically rebalanced portfolio that satisfies


(2.1), or equivalently (2.2), is called self-financing, since it needs no investment or with-
drawal, apart from those at t = 0.
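As a small numerical illustration (the prices and the rebalancing rule below are made up for the example, nothing more), the following Python sketch builds a dynamically rebalanced portfolio in which, at every date, the value available just before rebalancing is fully reinvested into new, arbitrary weights, and then checks that the equivalent condition (2.2) holds at every step.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 10, 2                                    # trading dates and number of assets
S = np.cumprod(1 + 0.02 * rng.normal(size=(T + 1, K)), axis=0)  # toy price paths

theta = np.zeros((T + 1, K))
theta[0] = [1.0, 1.0]                           # initial holdings
V = [theta[0] @ S[0]]

for t in range(1, T + 1):
    wealth = theta[t - 1] @ S[t]                # value just before rebalancing: LHS of (2.1)
    w = rng.dirichlet(np.ones(K))               # arbitrary new portfolio weights
    theta[t] = wealth * w / S[t]                # reinvest everything: no cash in or out
    V.append(theta[t] @ S[t])                   # equals `wealth`: RHS of (2.1)

# check the equivalent self-financing condition (2.2)
for t in range(T):
    gain = theta[t] @ (S[t + 1] - S[t])
    assert np.isclose(V[t + 1] - V[t], gain)
print("self-financing condition (2.2) holds at every date")
```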

Let us now assume that our market M possesses a risk-free asset, whose riskless rate of return is r per trading period under continuous compounding.

Definition 10 (Risk-neutral probability measure). A probability measure $P$ on the $\sigma$-algebra $\mathcal{F}_T$ is defined risk-neutral (or equilibrium measure) if, for every bounded self-financing portfolio $\theta$, the value $V_0^\theta$ at $t = 0$ is equal to the expectation, under $P$, of the value $V_T^\theta$ at time $t = T$ discounted with respect to the risk-free rate, i.e.
$$V_0^\theta = e^{-rT} E_P\left[V_T^\theta\right].$$

Theorem 4. If a market M is characterized by a risk-free asset with rate r = 0, then


under every risk-neutral probability measure, the discounted price process of any traded
asset {e−rt St = St }0≤t≤T is a martingale w.r.t. the natural filtration.

Proof. W.l.o.g. we want to show that, for every t = 0, 1, ..., T − 1, we have

E[St+1 |Ft ] = St ,

where St is the price of a stock.


Equivalently we can show that, for every F ∈ Ft ,

E[St+1 1F ] = E[St 1F ].

In doing this we will exploit the fact that the value of every self-financing portfolio θ at
time t = 0 is the (discounted, but r = 0) expected value of the portfolio value at t = T ,
under any equilibrium distribution.
Now, assume that at times τ = 0, 1, 2, ..., t we hold no position in any of the assets on the
market. This means neither long nor short positions.
Assume that, at time τ = t, if the event F manifests itself, we short sell St shares of the
risk-free asset, which we call bond. The money we receive is used to buy one share of stock.
We then hold this position for a trading period, i.e. until t + 1. No matter what happens,
in t + 1 we sell the share of the stock, we collect St+1 and we invest this amount in the
risk-free bond. From now on, we hold a (St+1 − St ) position in the bond until t = T .
In all the scenarios in which the event F does not happen at time t, we hold no position
at τ = t, ..., T in any of the assets.
The portfolio we have just considered is self-financing, and its value at $t = 0$ is $0$. This implies that under any risk-neutral probability, the expected value of the portfolio at time $t = T$ must be $0$ as well. But the final value of the portfolio at time $T$ is $(S_{t+1} - S_t)\mathbf{1}_F$, hence
$$0 = E[(S_{t+1} - S_t)\mathbf{1}_F] \;\Longrightarrow\; E[S_{t+1}\mathbf{1}_F] = E[S_t\mathbf{1}_F].$$

Corollary 5. If a market M is characterized by a risk-free asset with rate r > 0, then


under every risk-neutral probability measure, the discounted price process of any traded
asset {e−rt St }0≤t≤T is a martingale w.r.t. the natural filtration.

Notice that Theorem 4 and Corollary 5 also hold for the discounted value process of any
bounded self-financing portfolio. In other words, the discounted value process of a bounded
self-financing portfolio is a martingale under the risk-neutral measure. This comes from
the fact that a linear combination of martingales is a martingale.

Exercise 3. Prove Corollary 5.

Before moving to the next sections, leading to Girsanov theorem, we conclude our
financial introduction by giving the definition of arbitrage, or free lunch.
The concept of arbitrage (or to be more exact the absence of arbitrage) will be used a lot
in the following pages.

Definition 11 (Arbitrage). An arbitrage is a portfolio with value Vt such that V0 = 0 and,


for some T > 0,
P (VT ≥ 0) = 1 and P (VT > 0) > 0. (2.3)

Since we do not like to lose money for sure, it is clear that we should never offer prices deriving from a model that admits arbitrage.¹
Later in this chapter, we will see that the first fundamental theorem of asset pricing gives us a simple condition to verify that the model we use does not allow arbitrage.

¹If we buy, on the contrary, it is fantastic to find an arbitrage.

2.2 Change of measure


In financial models, we always consider a probability space (Ω, F, P ), where P is the prob-
ability measure that assigns a probability to every scenario in Ω. The probability measure
P is known as the real world measure.
However, when pricing derivative securities, we will see that we never use P in our compu-
tations, but rather another measure Q, which is risk-neutral, as per Definition 10. This is
due to the fact that we need the discounted prices of assets to be martingales, something
that P cannot guarantee, while Q does.
Naturally, in passing from P to Q, we need to be sure that these two measures are equiv-
alent, according to the following definition.

Definition 12 (Equivalent measure). Given an equipped space (Ω, F), two measures P and Q on (Ω, F) are said to be equivalent if they agree on which sets in F have probability zero.

In simple words, if two measures are equivalent, they must agree on which scenarios in Ω are possible, and which are not. For the rest, they may assign different probabilities to the possible scenarios. This is a very important point, because it suggests a natural question: are prices computed under the risk-neutral measure Q appropriate for the real world characterized by P? We will answer this question later on, when analyzing the Black-Scholes-Merton model.
In what follows we will consider a way for passing from P to a new measure Q, through an
operation which is called “change of measure”. We will see that this operation guarantees
that P and Q are equivalent.
As you can imagine, the new measure Q will be a risk-neutral measure.

2.2.1 The Radon-Nikodym derivative


Let us now consider a positive random variable Z on the probability space (Ω, F, P ), such
that E[Z] = EP [Z] = 1. This variable can be used to define a new probability measure on
the equipped space (Ω, F). In fact, for every F ∈ F, it is sufficient to set

Q(F ) = EP [Z1F ]. (2.4)

Proposition 6. The quantity Q, defined in equation (2.4), is a probability measure on


(Ω, F).

Proof. The fact that Q is always nonnegative is trivial, given that Z is always positive.
Further we can notice that

Q(Ω) = EP [Z1Ω ] = EP [Z · 1] = EP [Z] = 1.

Finally, to show $\sigma$-additivity, just consider $F_1, F_2, \ldots$ to be pairwise disjoint events in $\mathcal{F}$, such that $\cup_{n=1}^{\infty} F_n = F \in \mathcal{F}$. Then $\mathbf{1}_F = \sum_{n=1}^{\infty} \mathbf{1}_{F_n}$, and the monotone convergence theorem guarantees that
$$Q(F) = E_P[Z\mathbf{1}_F] = E_P\left[Z \sum_{n=1}^{\infty} \mathbf{1}_{F_n}\right] = \sum_{n=1}^{\infty} E_P[Z\mathbf{1}_{F_n}] = \sum_{n=1}^{\infty} Q(F_n).$$

Proposition 7. Given the measures P and Q, and a nonnegative random variable Y , the
expectations EP and EQ behave as follows:

EQ [Y ] = EP [Y Z], (2.5)

EP [Y ] = EQ [Y /Z]. (2.6)

Proof. We just sketch the proof, by explaining its main steps.


To show equation (2.5), we can start by taking Y = 1F . In that case equation (2.5) is
nothing more than the definition of the measure Q.
Now, assume that Y is a simple random variable, i.e. it can be obtained as linear combi-
nation of indicators. Since the expectation is a linear operator, equation (2.5) still holds.

Finally, take a general nonnegative Y . Nonnegativity guarantees that Y is the monotone


limit of simple random variables. Hence the monotone convergence theorem guarantees
that equation (2.5) also holds for the limit.
Regarding equation (2.6), it is sufficient to re-write Y = Ỹ Z, where Ỹ is a nonnegative
random variable. Notice that every nonnegative random variable Y can be expressed in
that way, being Z strictly positive. The proof is then completed by plugging Y = Ỹ Z in
(2.6), also considering equation (2.5).

Exercise 4. Show that the measures P and Q, as defined above, are equivalent. In other
words, show that, for every A ∈ F, if Q(A) = 0 then P (A) = 0 as well.
Hint: just consider Proposition 7.
In the case in which two probability measures P and Q satisfy equations (2.5) and (2.6) for a given positive r.v. Z, we say that the two measures are mutually absolutely continuous (and we write P, Q a.c.). The random variable Z is called the Radon-Nikodym derivative of Q with respect to P, or the likelihood ratio of Q w.r.t. P. The typical notation is $Z := \frac{dQ}{dP}$. This notation makes particular sense if we consider the following equation
$$E_Q[Y] = \int_\Omega Y\,dQ = \int_\Omega Y\,\frac{dQ}{dP}\,dP = \int_\Omega Y Z\,dP = E_P[Y Z].$$
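A minimal Monte Carlo sketch of this mechanism, under illustrative assumptions: take $X \sim N(0,1)$ under $P$, choose $Z = \exp(\xi X - \xi^2/2)$ (positive, with $E_P[Z] = 1$), and compare $E_Q[Y]$ computed as $E_P[YZ]$ with a direct simulation from the tilted law $N(\xi, 1)$, which is what $Q$ turns out to be in this particular example (this is exactly the exponential tilting we will meet again in Section 2.4).

```python
import numpy as np

rng = np.random.default_rng(4)
n, xi = 1_000_000, 0.7
X = rng.normal(0.0, 1.0, n)                    # sample of X under P
Z = np.exp(xi * X - xi**2 / 2)                 # positive r.v. with E_P[Z] = 1

def Y(x):                                      # an arbitrary nonnegative payoff of X
    return np.maximum(x - 0.5, 0.0)

print("E_P[Z]            ≈", Z.mean())                              # ≈ 1
print("E_Q[Y] = E_P[YZ]  ≈", (Y(X) * Z).mean())
print("direct sampling Q ≈", Y(rng.normal(xi, 1.0, n)).mean())      # tilted law N(xi, 1)
```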

It is very important to notice that the Radon-Nikodym derivative of Q w.r.t. P depends on


(i.e. it is measurable with respect to) the σ−algebra F on which the measures are defined.
At this point, a natural question arises: what happens when we have different σ−algebras
available?
Proposition 8. Let $P$ and $Q$ be mutually a.c. probability measures on $(\Omega, \mathcal{F})$, with Radon-Nikodym derivative $Z = (dQ/dP)_{\mathcal{F}}$. Suppose that $\mathcal{G}$ is a $\sigma$-algebra such that $\mathcal{G} \subset \mathcal{F}$. Then the Radon-Nikodym derivative of $Q$ w.r.t. $P$ on the $\sigma$-algebra $\mathcal{G}$ is
$$\left(\frac{dQ}{dP}\right)_{\mathcal{G}} = E_P[Z|\mathcal{G}].$$

Proof. For every G ∈ G, we aim to show that Q(G) = EP [1G EP [Z|G]].


By assumption we have G ⊂ F, therefore G ∈ F and 1G is G− and F−measurable. By
hypothesis we also know that Z is the Radon-Nikodym derivative of Q w.r.t. P on the
σ−algebra F. This means that we can apply equation (2.5) with Y = 1G , getting

Q(G) = EQ [1G ] = EP [Z1G ].

Using the law of total expectation and the fact that 1G is G−measurable, we finally have

Q(G) = EP [Z1G ] = EP [EP [Z1G |G]] = EP [1G EP [Z|G]].



Proposition 8 tells us that, if {Ft } ⊂ F is a filtration on (Ω, F), and if P and Q are
mutually absolutely continuous probability measures on FT , for some T ≤ ∞, then P
and Q are mutually a.c. on every Ft with t ≤ T , and the Radon-Nikodym derivatives
(dQ/dP )Ft define a martingale (under P ) for 0 ≤ t ≤ T .

2.3 Fundamental theorems of Asset Pricing


We are finally ready to introduce and discuss the two fundamental theorems of asset pric-
ing. These theorems are very important to understand the behavior of financial markets,
at least in their theoretical representation.
In this section we will give a formal proof of the first theorem (Theorem 6), while we omit
the proof of the second one (Theorem 7), which we will consider later on, when dealing
with hedging.

Theorem 6 (First fundamental theorem of asset pricing). If a market is characterized by


a risk-neutral measure Q, equivalent to the physical/market measure P , and a risk-free rate
r, then it does not allow arbitrage.

The proof of Theorem 6 is rather simple, once we know the concepts of arbitrage² and equivalent measures.

Proof. If a market is characterized by an equivalent risk-neutral measure $Q$ and a risk-free rate $r$, then every discounted value process of a bounded self-financing portfolio is a martingale under $Q$, and its value at time $t$ is the discounted expectation of the value at maturity. In particular, let $V(t)$ be the value of the portfolio at time $t$, such that $V(0) = 0$. Then we have that
$$E_Q\left[e^{-rT} V(T)\right] = 0.$$
Now suppose that $P(V(T) < 0) = 0$, where $P$ is the real world measure.
Since $Q$ is equivalent to $P$, we have that $Q(V(T) < 0) = 0$. If we take into account that $E_Q\left[e^{-rT} V(T)\right] = 0$, we clearly have that $Q(V(T) > 0) = 0$. Otherwise we would have $Q\left(e^{-rT} V(T) > 0\right) > 0$, that is $E_Q\left[e^{-rT} V(T)\right] > 0$. In other terms, $Q$ would not be risk-neutral.
Now, thanks to equivalence, if $Q(V(T) > 0) = 0$, then $P(V(T) > 0) = 0$. This means that a portfolio with $V(0) = 0$ and $P(V(T) \ge 0) = 1$ must have $P(V(T) > 0) = 0$, so condition (2.3) cannot hold: the portfolio of value $V(t)$ cannot be an arbitrage.

Given the first fundamental theorem, for the moment we only state the second one.
The proof will be discussed in Part III of the lecture notes.
²I suggest to have again a look at Definition 11.

Theorem 7 (Second fundamental theorem of asset pricing). Consider a market on which


we define a risk-neutral measure. The market model is complete, that is every risk position
can be hedged and every security exchanged, if and only if the risk-neutral measure is
unique.

Theorem 7 is particularly important in terms of risk management, because it tells us that in a complete market not only can we hedge all risky positions, but there is also no ambiguity about the risk-neutral measure we use.

2.4 The Cameron-Martin theorem

In this section, we introduce the important Cameron-Martin theorem, which we use as


a starting point to move to Girsanov theorem, the final goal of this chapter. Cameron-
Martin theorem tells us that there is a very interesting relationship between the standard
Brownian motion and a Brownian motion with drift, once we make the right change of
measure.
Let³ $\{B_t = B(t)\}_{t\ge 0}$ be a standard Brownian motion defined on $(\Omega, \mathcal{F}, P)$. Let $\{\mathcal{F}_t\}_{t\ge 0}$ be its natural filtration.
For every $\xi \in \mathbb{R}$, define the process $\{Z_\xi(t)\}_{t\ge 0}$ such that
$$Z_\xi(t) = \exp\left(\xi B(t) - \frac{\xi^2 t}{2}\right). \tag{2.7}$$

Proposition 9. For every ξ ∈ R, the process {Zξ (t)}t≥0 is a positive martingale w.r.t.
{Ft }t≥0 .

Proof. The fact that $Z_\xi(t)$ is always positive is evident.
We want to show that, for any $s, t \ge 0$, $E(Z_\xi(t+s)|\mathcal{F}_s) = Z_\xi(s)$. This is just one of the different ways in which we can prove martingality (with the purpose of speeding up computations).
In order to get our result, we will make use of a very well known property of normal random variables. In particular, if $X \sim N(0,\sigma)$, where $\sigma$ is the standard deviation, then, for every $\xi \in \mathbb{R}$, we know that $E[\exp(\xi X)] = \exp(\xi^2 \sigma^2 / 2)$. This is nothing more than one of the properties of the expected value of a lognormal random variable, which we get by taking the exponential of a normal r.v.

³Please notice that in this chapter we often interchange the notation $X(t)$ and $X_t$ for convenience.

Hence we have the following:
$$\begin{aligned}
E[Z_\xi(t+s)|\mathcal{F}_s] &= E\left[\exp\left(\xi B(t+s) - \xi^2(t+s)/2\right)\big|\mathcal{F}_s\right]\\
&= E\left[\exp(\xi B(s) - \xi^2 s/2)\exp(\xi(B(t+s) - B(s)) - \xi^2 t/2)\big|\mathcal{F}_s\right]\\
&= \exp(\xi B(s) - \xi^2 s/2)\,E\left[\exp(\xi(B(t+s) - B(s)) - \xi^2 t/2)\big|\mathcal{F}_s\right]\\
&= Z_\xi(s)\,E\left[\exp(\xi(B(t+s) - B(s)))\exp(-\xi^2 t/2)\big|\mathcal{F}_s\right]\\
&= Z_\xi(s)\,\underbrace{E\left[\exp(\xi(B(t+s) - B(s)))\right]}_{\text{lognormal expected value}}\exp(-\xi^2 t/2)\\
&= Z_\xi(s)\,\underbrace{\exp(\xi^2 t/2)\exp(-\xi^2 t/2)}_{=1} = Z_\xi(s).
\end{aligned}$$

Thanks to Proposition 9, process {Zξ (t)}t≥0 is also known as the (basic) exponential
martingale.
For each ξ ∈ R and every T > 0, it is easy to see that the quantity Zξ (T ), defined
in equation (2.7), is a positive random variable with expectation 1 under P (remember
that {B(t)}t≥0 is a standard Brownian motion on (Ω, F, P ), and {Ft }t≥0 is its natural
filtration). This implies that Zξ (T ) can be a Radon-Nikodym derivative. Set Pξ and Eξ to
be the probability measure and the expectation operator determined by Zξ (T ) on (Ω, FT ).
In other terms, for every F ∈ FT , and for every nonnegative FT −measurable random
variable Y , we have

Pξ (F ) = E0 [Zξ (T )1F ], Eξ [Y ] = E0 [Zξ (T )Y ],

and
P0 (F ) = Eξ [Zξ (T )−1 1F ], E0 [Y ] = Eξ [Zξ (T )−1 Y ],
with P0 = P .
We now have all the concepts we need to introduce the Cameron-Martin theorem that,
as we have said before, is nothing more than a special case (the most important one,
actually) of the more general Girsanov theorem, which we will consider in the next sections.
In particular, the Cameron-Martin theorem characterizes the distribution of the random
process {B(t)}t≥0 under the (tilted) measure Pξ .

Theorem 8 (Cameron-Martin). Under Pξ , the process {Bt = B(t)}0≤t≤T has the same
law as a Brownian motion with drift ξ under P0 . In other terms, the stochastic process
{Bt }0≤t≤T has the same law under Pξ as the process {Bt + ξt}0≤t≤T under P0 .

Proof. Let us start with the simplest case, that is to say the one involving a single random variable $U = B_T$ under $P_\xi$. W.l.o.g. we also assume $T = 1$.
For every $y \in \mathbb{R}$, we have
$$\begin{aligned}
P_\xi(U \le y) &= E_0[Z_\xi(1)\mathbf{1}_{U\le y}] = E_0[\exp(\xi U - \xi^2/2)\mathbf{1}_{U\le y}]\\
&= \int_{-\infty}^{y} \exp(\xi u - \xi^2/2)\,\frac{1}{\sqrt{2\pi}}\exp(-u^2/2)\,du\\
&= \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}}\exp(-(u-\xi)^2/2)\,du\\
&= \int_{-\infty}^{y-\xi} \frac{1}{\sqrt{2\pi}}\exp(-v^2/2)\,dv\\
&= P_0(U + \xi \le y).
\end{aligned}$$
This implies that, under $P_\xi$, the random variable $U = B_1$ has the same distribution as the random variable $B_1 + \xi$ under $P_0$.
To really prove the theorem, we are supposed to show that, for $0 = t_0 < t_1 < \ldots < t_n = T$, the joint distribution of the increments $\Delta B_1, \Delta B_2, \ldots, \Delta B_n$ is the same, under $P_\xi$, as that of $\Delta B_1 + \xi\Delta t_1, \Delta B_2 + \xi\Delta t_2, \ldots, \Delta B_n + \xi\Delta t_n$ under $P_0$. Notice that we are using the notation $\Delta B_k = B_{t_k} - B_{t_{k-1}}$.
It is known that, if we have two distributions for which the moment generating functions (m.g.f.) exist, the two distributions are the same if and only if the two m.g.f. are the same. This is what we show in the following equation, given that the m.g.f. of a Brownian motion fortunately exists.
$$\begin{aligned}
E_\xi\left[\exp\left(\sum_{k=1}^n \lambda_k \Delta B_k\right)\right] &= E_0\left[Z_\xi(T)\exp\left(\sum_{k=1}^n \lambda_k \Delta B_k\right)\right]\\
&= E_0\left[\exp(\xi B(t_n) - \xi^2 t_n/2)\exp\left(\sum_{k=1}^n \lambda_k \Delta B_k\right)\right]\\
&= E_0\left[\exp\left(\sum_{k=1}^n (\lambda_k + \xi)\Delta B_k\right)\exp(-\xi^2 t_n/2)\right]\\
&= \exp(-\xi^2 t_n/2)\prod_{k=1}^n \underbrace{E_0\left[\exp\left((\lambda_k + \xi)\Delta B_k\right)\right]}_{\text{lognormal expected value}}\\
&= \exp(-\xi^2 t_n/2)\prod_{k=1}^n \exp\left((\lambda_k + \xi)^2\Delta t_k/2\right)\\
&= \prod_{k=1}^n \exp\big(\underbrace{\lambda_k^2\Delta t_k/2}_{*} + \xi\lambda_k\Delta t_k\big)\\
&= E_0\left[\exp\left(\sum_{k=1}^n \lambda_k(\Delta B_k + \xi\Delta t_k)\right)\right].
\end{aligned}$$
In $*$, we “reverse” the lognormal expected value. Hence the proof is complete.
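The statement can also be checked numerically. In the following illustrative Python sketch, standard Brownian paths are simulated under $P_0$, reweighted with $Z_\xi(T)$ (self-normalised importance sampling), and the mean and variance of $B(t)$ under the tilted measure $P_\xi$ are compared with those of a Brownian motion with drift $\xi$, namely $\xi t$ and $t$. All parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
T, n, n_paths, xi = 1.0, 200, 50_000, 0.8
dt = T / n

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(dB, axis=1)                      # B(t_k), k = 1..n, under P_0
Z = np.exp(xi * B[:, -1] - xi**2 * T / 2)      # Radon-Nikodym weights Z_xi(T)
w = Z / Z.sum()                                # normalised importance weights

k = n // 2                                     # inspect an intermediate time
t_k = (k + 1) * dt
mean_tilted = np.sum(w * B[:, k])              # E_xi[B(t_k)]
var_tilted = np.sum(w * B[:, k]**2) - mean_tilted**2

print("E_xi[B(t)]   ≈", mean_tilted, " theory:", xi * t_k)
print("Var_xi[B(t)] ≈", var_tilted, " theory:", t_k)
```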

It is thus clear that the Cameron-Martin theorem builds a very interesting relation
between standard Brownian motion and Brownian motion with drift. Since a standard
Brownian motion can be seen as a Brownian motion with drift when the drift ξ is equal
to 0, the Cameron-Martin theorem implicitly connects Brownian motions with different
drifts. The following corollary clarifies this.
Corollary 9. Thanks to the Cameron-Martin theorem, we have that, if $\xi$ and $\eta$ are two drifts,
$$\left(\frac{dP_\xi}{dP_\eta}\right)_{\mathcal{F}_T} = \frac{Z_\xi(T)}{Z_\eta(T)} = \exp\left((\xi - \eta)B_T - (\xi^2 - \eta^2)T/2\right).$$

Exercise 5. Prove Corollary 9.

2.5 A simple model for foreign exchange rates


Assume that there are two risk-free assets on the market. The first one, called EUmoney,
is expressed in euros and gives a riskless rate rE . The second, named USmoney, pays rU
and is given in dollars.
In reality, given the uncertainty related to exchange rates, the EUmoney asset is not
completely risk-free for a dollar investor, and vice versa for the USmoney and a euro
investor. The choice of the numeraire, that is to say the currency with respect to which
all the evaluations will be performed, finally determines the risk-free asset. In what follow,
we will take the point of view of a euro investor.
Let Yt be the exchange rate at time t, i.e. the amount of euros we can buy with 1 dollar in
t. Following a basic model by Merton (the same guy of the Black-Scholes-Merton formula,
which we will consider later in this course), we assume that the exchange rates follows a
stochastic differential equation like
dYt = µYt dt + σYt dB(t) (2.8)
where B(t) is a standard Brownian motion.
Let now Ut and Et be the share prices of USmoney and EUmoney, expressed in terms of
dollars and euros, and normalized so that the share prices in t=0 are both equal to 1.
Hence we have
Ut = exp(rU t),

and
Et = exp(rE t).
The share price of USmoney at time t in euros is simply Ut Yt . Combining all the ingredients,
we have the following explicit formula

Ut Yt = Y0 exp(rU t + µt − σ 2 t/2 + σB(t)). (2.9)

Proposition 10. Let QE be the risk-neutral probability for the euro investor. If the dol-
lar/euro exchange rate follows a stochastic differential equation of the form (2.8), and if
the risk-free rates are rU and rE for dollar and euro investors respectively, then under QE
we must have
µ = rE − rU .
This implies that
Yt = Y0 exp((rE − rU )t − σ 2 t/2 + σB(t)). (2.10)
where under QE the process B(t) is a standard Brownian motion.
Proof. Under QE , the discounted share price of USmoney in euros must be a martingale.
But the discounted share price is

exp(−rE t)Ut Yt

Given equation (2.9), this equals

Y0 exp(((rU − rE ) + µ)t) exp(−σ 2 t/2 + σB(t)).

The first exponent is nonrandom, while the second one defines a martingale (it is very easy
to show). To guarantee that the product of the two exponentials is still a martingale we
then need to impose µ = rE − rU .
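A quick simulation sketch of Proposition 10, with illustrative parameter values: setting µ = r_E − r_U in equation (2.9), the sample mean of the discounted euro share price e^{−r_E t} U_t Y_t stays (approximately) constant in t, as expected for a martingale.

```python
import numpy as np

rng = np.random.default_rng(6)
r_E, r_U, sigma, Y0 = 0.03, 0.01, 0.2, 1.1     # illustrative parameters
mu = r_E - r_U                                 # the drift imposed by Proposition 10
T, n, n_paths = 1.0, 100, 200_000
dt = T / n

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(dB, axis=1)
t = np.arange(1, n + 1) * dt

UY = Y0 * np.exp(r_U * t + mu * t - sigma**2 * t / 2 + sigma * B)   # equation (2.9)
discounted = np.exp(-r_E * t) * UY

print(discounted[:, [0, n // 2, -1]].mean(axis=0))   # all three values close to Y0
```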

What happens if we take into account the point of view of the dollar investor?
Proposition 11. Let QU be the risk-neutral probability for the dollar investor. If the
dollar/euro exchange rate follows a stochastic differential equation of the form (2.8), where
B(t) is a standard Brownian motion under QU , and if the risk-free rates are rU and rE for
dollar and euro investors respectively, then under QU we must have

µ = rE − rU + σ 2 .

Proof. The proof is left as an exercise.

Proposition 12. The measures $Q_E$ and $Q_U$ are mutually absolutely continuous. In particular, they are related by the Radon-Nikodym derivative
$$\left(\frac{dQ_U}{dQ_E}\right)_{\mathcal{F}_T} = \exp\left(\sigma B_T - \sigma^2 T/2\right).$$

Proof. Let $V_T$ be the value at time $T$ of a contingent claim in dollars. Naturally we have that
$$V_0 = e^{-r_U T} E_U[V_T]. \tag{2.11}$$
If $W_T$ is the value of $V_T$ in euros, we have $W_T = V_T Y_T$. Hence $W_0 = V_0 Y_0 = e^{-r_E T} E_E[V_T Y_T]$. From this we have
$$V_0 = e^{-r_U T} E_E\left[V_T (Y_T/Y_0)\exp((r_U - r_E)T)\right]. \tag{2.12}$$
By comparing equations (2.11) and (2.12), we get
$$E_U[V_T] = E_E\left[V_T (Y_T/Y_0)\exp((r_U - r_E)T)\right].$$
This last formula holds for any nonnegative variable $V_T$ that is $\mathcal{F}_T$-measurable. Hence, using equation (2.10), we get
$$\left(\frac{dQ_U}{dQ_E}\right)_{\mathcal{F}_T} = (Y_T/Y_0)\,e^{(r_U - r_E)T} = \exp\left(\sigma B_T - \sigma^2 T/2\right).$$

It is interesting to notice that the Radon-Nikodym derivative of Proposition 12 recalls what we have seen in Corollary 9. Changing the point of view, the risk-free nature of an asset changes.

2.6 Moving towards Girsanov theorem


Let $\{B(t)\}_{t\ge 0}$ be a standard Brownian motion under the probability $P$ on $(\Omega, \mathcal{F}, P)$. Let $\{\mathcal{F}_t\}_{t\ge 0}$ be its natural filtration.
From Section 2.4, we know that $Z_\xi(t) = \exp(\xi B_t - \xi^2 t/2)$ is a martingale with respect to $\{\mathcal{F}_t\}$. These martingales constitute the Radon-Nikodym derivatives on which the Cameron-Martin theorem is based.
The exponential martingales $Z_\xi(t)$ can be easily generalized to a larger class of martingales, very surprisingly called generalized exponential martingales.
Let $\{\xi_s\}$ be an adapted process (to $\{\mathcal{F}_t\}$, naturally), which belongs to the class $H^2[0,T]$ that we have seen in Definition 6. This second assumption can be even further relaxed by taking a locally-$H^2$ process, meaning that only the truncated process $\{\xi_s \mathbf{1}_{s\le t}\}$ belongs to $H^2$.
In any case, given these assumptions, we have that the Itō integral $\int_0^t \xi_s\,dB(s)$ is well-defined.
Now set
$$Z(t) = \exp\left(\int_0^t \xi_s\,dB(s) - \frac{1}{2}\int_0^t \xi_s^2\,ds\right). \tag{2.13}$$

Theorem 10 (Novikov condition). Assume that, for every $t \ge 0$, we have
$$E\left[\exp\left(\frac{1}{2}\int_0^t \xi_s^2\,ds\right)\right] < +\infty.$$
Then, for every $t \ge 0$,
$$E[Z(t)] = 1.$$
In this case, the process $\{Z(t)\}_{t\ge 0}$ is a positive martingale w.r.t. $\{\mathcal{F}_t\}$.

Novikov's theorem can be proven in different ways (at least three); for example we can use Itō's formula, as shown in Exercise 7 below. However, in order to simplify our narration, we will just prove the theorem for the very special case in which $\xi_s$ is nonrandom and continuous in $s$.
Proof. Since $\xi_s$ is nonrandom, the random quantity $\int_0^t \xi_s\,dB(s)$ is normally distributed with mean $0$ and variance $\int_0^t \xi_s^2\,ds$. For $s < t$, also the quantity $\int_s^t \xi_u\,dB(u)$ is normally distributed and independent from $\mathcal{F}_s$.
Hence, for $s < t$,
$$\begin{aligned}
E\left[\exp\left(\int_0^t \xi_u\,dB(u)\right)\Big|\mathcal{F}_s\right] &= \exp\left(\int_0^s \xi_u\,dB(u)\right) E\left[\exp\left(\int_s^t \xi_u\,dB(u)\right)\Big|\mathcal{F}_s\right]\\
&= \exp\left(\int_0^s \xi_u\,dB(u)\right) E\left[\exp\left(\int_s^t \xi_u\,dB(u)\right)\right]\\
&= \exp\left(\int_0^s \xi_u\,dB(u)\right) \exp\left(\frac{1}{2}\int_s^t \xi_u^2\,du\right). \tag{2.14}
\end{aligned}$$
This shows that $Z(t)$ is a martingale with respect to $\{\mathcal{F}_t\}$, and that its expected value is $1$ for every $t < \infty$. In fact it is sufficient to compute $E[Z(t)|\mathcal{F}_s]$ and to substitute (2.14).
1 for every t < ∞. In fact it is sufficient to compute E[Z(t)|Fs ] and to substitute (2.14).

Exercise 6. Prove the first statement of the previous proof, i.e. “since $\xi_s$ is nonrandom, the random quantity $\int_0^t \xi_s\,dB(s)$ is normally distributed with mean $0$ and variance $\int_0^t \xi_s^2\,ds$”.

Exercise 7. Given a deterministic ξs, set the following:
$$g(x, t) = \exp\left(x - \frac{1}{2}\int_0^t \xi_s^2\, ds\right) \quad \text{and} \quad Y_t = \int_0^t \xi_s\, dB(s).$$
Clearly we have Z(t) = g(Yt, t). Use these facts to give an alternative proof of Novikov's theorem.
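Before moving to Girsanov's theorem, a quick numerical sanity check may help. The following sketch is my own illustration (the deterministic integrand ξs = sin(s), the grid and the seed are arbitrary choices of mine): it discretizes the Itō integral and verifies that the sample mean of Z(t) stays close to 1, as guaranteed under Novikov's condition.

```python
import numpy as np

# Monte Carlo check that E[Z(t)] = 1 for the exponential martingale
# Z(t) = exp( int_0^t xi_s dB(s) - 0.5 * int_0^t xi_s^2 ds ),
# with the (arbitrary) deterministic integrand xi_s = sin(s).
rng = np.random.default_rng(seed=42)
t, n_steps, n_paths = 1.0, 500, 20_000
dt = t / n_steps
s = np.linspace(0.0, t, n_steps + 1)[:-1]      # left endpoints of the grid
xi = np.sin(s)

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
ito_integral = dB @ xi                          # sum_j xi(s_j) (B(s_{j+1}) - B(s_j))
quad_term = 0.5 * np.sum(xi**2) * dt            # 0.5 * int_0^t xi_s^2 ds
Z_t = np.exp(ito_integral - quad_term)

print("sample mean of Z(t):", Z_t.mean())       # should be close to 1
```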

2.6.1 Girsanov theorem

Let us assume that {ξt } is an adapted process satisfying Novikov condition. Moreover, set
Z(t) to be defined as in (2.13). Since E[Z(t)] = 1 for every t, we have that, for T > 0,
Z(T ) is a Radon-Nikodym derivative, so that

Q(F ) = EP [Z(T )1F ]

defines a new probability measure on (Ω, F).


Now, given the process {B(t)}t≥0, for every t let us define
$$\tilde{B}(t) = B(t) - \int_0^t \xi_s\, ds.$$

We then have the following.

Theorem 11 (Girsanov). Under the probability measure Q, the stochastic process {B̃(t)}0≤t≤T
is a standard Brownian motion.

Given the theorem of Novikov, Girsanov theorem is just a straightforward generalization


of the results of Cameron-Martin.

Proof. What we aim to show is that the process B̃t is a standard Brownian motion under Q,
that is to say it has independent, normally distributed increments, with the right variances.
We can do this by showing that the moment generating function of the increments

B̃(t1 ), B̃(t2 ) − B̃(t1 ), ..., B̃(tn ) − B̃(tn−1 )

is equal to the one of n independent Gaussian random variables with expectation 0 and
variances t1 , t2 − t1 , t3 − t2 , ...
In other terms, we want to show that the following is true:
$$E_Q\left[\exp\left(\sum_{k=1}^n \alpha_k\left(\tilde{B}(t_k) - \tilde{B}(t_{k-1})\right)\right)\right] = \prod_{k=1}^n \exp\left(\frac{\alpha_k^2}{2}(t_k - t_{k-1})\right).$$

In what follows we just focus on the simple case n = 1, but the reasoning is exactly the
same (just a little bit more cumbersome) for a general n. We can thus notice the following.
$$E_Q\left[\exp(\alpha \tilde{B}(t))\right] = E_Q\left[\exp\left(\alpha B(t) - \alpha\int_0^t \xi_s\, ds\right)\right]$$
$$= E_P\left[\exp\left(\alpha B(t) - \alpha\int_0^t \xi_s\, ds\right) \exp\left(\int_0^t \xi_s\, dB(s) - \int_0^t \xi_s^2\, ds/2\right)\right]$$
$$= E_P\left[\exp\left(\int_0^t (\alpha + \xi_s)\, dB(s) - \int_0^t (2\alpha\xi_s + \xi_s^2)\, ds/2\right)\right]$$
$$= \exp\left(\frac{\alpha^2}{2}t\right) E_P\left[\exp\left(\int_0^t (\alpha + \xi_s)\, dB(s) - \int_0^t (\alpha + \xi_s)^2\, ds/2\right)\right]$$
$$= \exp\left(\frac{\alpha^2}{2}t\right).$$

It is worth noticing that the last step is essentially an application of Novikov theorem.

From the proof we have just seen, it is therefore evident that Girsanov “simply” gen-
eralizes Cameron-Martin.
In the next chapters we will see the importance of Girsanov theorem for pricing financial
products correctly. A complete proof of Girsanov’s theorem, in the most general formula-
tion, will be given in Part II of the lecture notes.
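As a small numerical companion to the theorem (again a sketch of my own, with a constant ξ and arbitrary parameter values): we sample B(T) under P, use Z(T) as a Radon-Nikodym weight, and check that B̃(T) = B(T) − ξT has mean approximately 0 and second moment approximately T under Q, as Girsanov's theorem predicts.

```python
import numpy as np

# Girsanov check for constant xi: under Q(dw) = Z(T) P(dw), the variable
# B~(T) = B(T) - xi*T should again be centred Gaussian with variance T.
rng = np.random.default_rng(seed=0)
T, xi, n_paths = 1.0, 0.7, 1_000_000

B_T = rng.normal(0.0, np.sqrt(T), size=n_paths)    # B(T) under P
Z_T = np.exp(xi * B_T - 0.5 * xi**2 * T)           # Radon-Nikodym weights
B_tilde = B_T - xi * T

mean_Q = np.mean(Z_T * B_tilde)                    # estimates E_Q[B~(T)], ~ 0
second_moment_Q = np.mean(Z_T * B_tilde**2)        # estimates E_Q[B~(T)^2], ~ T
print(mean_Q, second_moment_Q)
```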
Chapter 3

Black-Scholes-Merton (BSM)
demystified

In this chapter, thanks to the tools introduced in the previous pages, we can finally deal
with the fundamental model of Black-Scholes-Merton for pricing European options and, in
later chapters, some more advanced derivatives.
In writing this chapter, I used [4] as a reference.

3.1 Black-Scholes-Merton (BSM) demystified


In the model of BSM, the behavior of prices is well represented by a continuous time model,
which takes into account a risky asset, e.g. a share1 with price St at time t, and a risk-free
asset, e.g. a zero-coupon bond with price St0 and paying a riskless rate r (under continuous
compounding).
For what concerns the risk-free asset, we assume that the evolution of its price is given by dS_t^0 = r S_t^0 dt. In what follows we assume to be able to rescale everything, so that S_0^0 = 1 and S_t^0 = e^{rt}.
For what concerns the stock price, we assume the following stochastic differential equation
$$dS_t = S_t(\mu\, dt + \sigma\, dB_t), \qquad (3.1)$$
with µ ∈ R, σ > 0, and where Bt is a standard Brownian motion.
The model is assumed to be valid over the period [0, T], where T is called the maturity. For what concerns the information about the state of the markets, we assume that it is fully contained in the natural filtration generated by the Brownian motion through the price process2, i.e. Ft := σ(Sr, r ≤ t).
1 We will omit ω in the notation, but please remember that it is always there.
2 Notice that here σ(·) is the σ-algebra!


Remark 1. Consider equation (3.1). If we apply Itō's formula to log St, we easily get a closed-form solution, i.e.
$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma B_t\right). \qquad (3.2)$$

Notice that this process inherits many interesting properties from Bt . These properties are
totally consistent with the perfect markets’ assumption of BSM. In particular:

• The sample paths of the price process are continuous a.s.;

• The so-called “relative” increments (St − Su )/Su , for u ≤ t, are independent from
Fu ;

• The relative increments are stationary.
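As a quick illustration of the closed-form solution (3.2), here is a minimal simulation sketch; the parameter values, seed and grid are arbitrary choices of mine.

```python
import numpy as np

# Sample paths of the geometric Brownian motion (3.2):
# S_t = S_0 * exp((mu - sigma^2/2) t + sigma B_t).
rng = np.random.default_rng(seed=1)
S0, mu, sigma, T = 100.0, 0.10, 0.40, 1.0
n_steps, n_paths = 252, 5
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)                      # Brownian paths on the grid
t = np.linspace(dt, T, n_steps)
S = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B)

print("terminal prices:", np.round(S[:, -1], 2))
```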

3.1.1 A little digression: two useful results


In what follows, we will make use of two important results of probability theory and
stochastic calculus.
The first one is the Itō representation theorem, which tells us that any square integrable
random variable, which is measurable with respect to a Brownian motion (i.e. w.r.t. its
natural filtration), can be expressed as a stochastic integral involving this Brownian motion.
We will give this theorem without proof.
Theorem 12 (Itō Representation). Let F ∈ L²(F_T, P). Then there exists a unique adapted process f(t, ω) such that $E\left[\int_0^T f_s^2\, ds\right] < \infty$ and
$$F(\omega) = E[F] + \int_0^T f(t, \omega)\, dB_t(\omega).$$

Taken the Itō representation theorem for granted, we can then prove a second very
important result about Brownian martingales, i.e. martingales involving the Brownian
motion.

Theorem 13 (Martingale Representation). Let {Mt}0≤t≤T be a square-integrable martingale with respect to {Ft}0≤t≤T. There exists a unique adapted process {θt}0≤t≤T, such that $E\left[\int_0^T \theta_s^2\, ds\right] < \infty$, and
$$M_t = M_0 + \int_0^t \theta_s\, dB_s \quad a.s., \; \forall t \in [0, T].$$
Proof. It is known that, if {θt}0≤t≤T is an adapted process such that $E\left[\int_0^T \theta_s^2\, ds\right] < \infty$, then the process $\left\{\int_0^t \theta_s\, dB_s\right\}$ is a square-integrable martingale, null at 0.
Applying the Itō representation theorem to F = Mt (which is FT-measurable), we get that, for all t, there exists a unique h^{(t)}(s, ω) ∈ L²(F_T, P) such that
$$M_t(\omega) = E[M_t] + \int_0^t h^{(t)}(s)\, dB_s = E[M_0] + \int_0^t h^{(t)}(s)\, dB_s.$$
Let us now consider 0 ≤ t1 < t2. We have
$$M_{t_1} = E[M_{t_2}|\mathcal{F}_{t_1}] = E[M_0] + E\left[\int_0^{t_2} h^{(t_2)}(s)\, dB_s \,\Big|\, \mathcal{F}_{t_1}\right] = E[M_0] + \int_0^{t_1} h^{(t_2)}(s)\, dB_s.$$
But we also have that
$$M_{t_1} = E[M_0] + \int_0^{t_1} h^{(t_1)}(s)\, dB_s.$$
This implies that h^{(t_1)}(s) = h^{(t_2)}(s), for all (s, ω) ∈ [0, t1] × Ω. Therefore, in general, we can always set
$$\theta_s = h^{(N)}(s), \quad s \in [0, N],$$
getting
$$M_t = E[M_0] + \int_0^t h^{(t)}(s)\, dB_s = M_0 + \int_0^t \theta_s\, dB_s, \quad \forall t \geq 0.$$

3.1.2 Self-financing portfolios for BSM


A strategy or portfolio for the BSM model is a vector process θ = {θ t }0≤t≤T = (θt0 , θt ),
with values in R2 , and adapted to Ft . Consistently with what we have seen before, θt0 and
θt are the amounts of the risk-free and risky assets we have in our portfolio at time t. The
value of the portfolio in t is clearly

Vtθ = θt0 St0 + θt St .

In continuous time, the self-financing condition of equations (2.1) and (2.2) can be restated
as
dVtθ = θt0 dSt0 + θt dSt . (3.3)
Now, let us assume that
$$\int_0^T |\theta_t^0|\, dt + \int_0^T \theta_t^2\, dt < \infty \quad a.s. \qquad (3.4)$$

This guarantees that both
$$\int_0^T \theta_t^0\, dS_t^0 = \int_0^T \theta_t^0\, r e^{rt}\, dt$$
and
$$\int_0^T \theta_t\, dS_t = \int_0^T S_t \mu \theta_t\, dt + \int_0^T \sigma \theta_t S_t\, dB_t$$
are well-defined.
Definition 13 (Self-financing portfolio or strategy (in continuous time)). A portfolio θt = (θt0, θt) satisfying (3.4), and such that
$$\theta_t^0 S_t^0 + \theta_t S_t = \theta_0^0 S_0^0 + \theta_0 S_0 + \int_0^t \theta_u^0\, dS_u^0 + \int_0^t \theta_u\, dS_u \quad a.s., \; \forall t \in [0, T],$$
is called a self-financing portfolio.


From now on, we will use the “∼” symbol to indicate discounted quantities; for example
S̃t = e−rt St will be the discounted price of the risky asset.
Proposition 13. Let θt = (θt0, θt) be an adapted process with values in R², satisfying equation (3.4). Then θt defines a self-financing strategy if and only if
$$\tilde{V}_t^{\theta} = V_0^{\theta} + \int_0^t \theta_u\, d\tilde{S}_u \quad a.s., \; \forall t \in [0, T].$$
Proof. The proof is left as an exercise.

3.2 Pricing options in BSM


We now have all the ingredients we need to show that, given the probability space (Ω, F, P ),
there exists a probability measure equivalent to P , under which the discounted price process
S̃t is a martingale.
From equation (3.1), we get

dS̃t = S̃t ((µ − r)dt + σdBt ).


Now, set $W_t = B_t + \frac{(\mu - r)t}{\sigma}$. This gives
$$d\tilde{S}_t = \tilde{S}_t \sigma\, dW_t. \qquad (3.5)$$

From Girsanov's theorem, if we set ξt = −(µ − r)/σ, there exists a probability measure Q, equivalent to P, under which Wt is a standard Brownian motion. Then, under the probability measure Q, the process
$$\tilde{S}_t = \tilde{S}_0 \exp\left(\sigma W_t - \sigma^2 t/2\right)$$
is a martingale. Notice that all this also implies that Q is a risk-neutral probability.

On the basis of this information, we can finally price European options. We will focus
our attention on European calls, but the reasoning is exactly the same for puts.
For us a European call is a non-negative, FT-measurable random variable h = f(ST), where f(x) = (x − K)+, and where K is the so-called strike price3.

Definition 14. A portfolio θ t , 0 ≤ t ≤ T , is admissible if it is self-financing and if the


corresponding discounted value Ṽtθ is non-negative for all t, and such that supt∈[0,T ] Ṽtθ is
square-integrable under Q.

In what follows, we will always refer to admissible portfolios/strategies.

Definition 15. An option is said to be replicable if its payoff at maturity (i.e. in T) is equal to
the final value of an admissible strategy.

Definition 15 essentially tells us that an option h must be square-integrable under Q.


For a European call (but also for a EU put) this is always true, given that EQ [ST2 ] < ∞.

Theorem 14 (Black-Scholes-Merton Theorem). In the Black-Scholes-Merton model, any


option h, which is a nonnegative, FT −measurable random variable, square-integrable under
Q, is replicable by the means of an admissible strategy, and the value at time t of any
replicating portfolio θ is equal to
$$V_t^{\theta} = E_Q\left[e^{-r(T-t)}\, h \,\big|\, \mathcal{F}_t\right].$$
This simply means that, at time t, the value of the option is equal to $E_Q\left[e^{-r(T-t)}\, h \,|\, \mathcal{F}_t\right]$.

Proof. Assume that h is replicable, i.e. we have an admissible strategy (θt0 , θt ) which
reproduces the option. At time t, we have that the value of such a portfolio (for which we
omit θ in the notation) is
Vt = θt0 St0 + θt St ,
and by hypothesis VT = h. Now, let us consider the discounted value Ṽt = Vt e−rt , i.e.4

Ṽt = θt0 + θt S̃t .

Since the portfolio is self-financing, we have that
$$\tilde{V}_t = V_0 + \int_0^t \theta_u\, d\tilde{S}_u = V_0 + \int_0^t \sigma \theta_u \tilde{S}_u\, dW_u. \qquad (3.6)$$
3 For a put, we will simply have g(x) = (K − x)+.
4 Remember S_0^0 = 1 and S_t^0 = e^{rt}.

Under Q, we know that supt∈[0,T ] Ṽt is square-integrable, given the admissibility of the
strategy. Moreover, equation (3.6) implies that {Ṽt } is a stochastic integral involving Wt ,
which under Q is a standard Brownian motion. Therefore it follows that {Ṽt } is a square-
integrable martingale under Q, so that

Ṽt = EQ [ṼT |Ft ],

that is
$$V_t = E_Q\left[e^{-r(T-t)}\, h \,\big|\, \mathcal{F}_t\right].$$

Hence the portfolio (θt0 , θt ) actually replicates h.


In order to complete the proof we have to show that such a portfolio exists, that is to say
that h is replicable. In other words, we have to find two processes {θt0 } and {θt } such that
$$\theta_t^0 S_t^0 + \theta_t S_t = E_Q\left[e^{-r(T-t)}\, h \,\big|\, \mathcal{F}_t\right].$$

Under Q, the process $M_t = E_Q\left[e^{-rT}\, h \,|\, \mathcal{F}_t\right]$ is a square-integrable martingale (Exercise 8 below). Now notice that the natural filtration Ft of Bt is also the natural filtration for Wt. The martingale representation theorem then guarantees the existence of an adapted process {Kt}0≤t≤T, such that $E_Q\left[\int_0^T K_s^2\, ds\right] < \infty$, and
$$M_t = M_0 + \int_0^t K_s\, dW_s \quad a.s., \; \forall t \in [0, T].$$

The strategy θ = (θt0, θt), with θt0 = Mt − θt S̃t and θt = Kt/(σ S̃t), is then a self-financing strategy (see Exercise 9), and its value at time t is given by
$$V_t^{\theta} = e^{rt} M_t = E_Q\left[e^{-r(T-t)}\, h \,\big|\, \mathcal{F}_t\right].$$

The previous expression tells us that Vtθ is a nonnegative random variable, with supt∈[0,T ] Ṽtθ
square-integrable under Q, and such that VTθ = h.
Hence the proof is complete.

Exercise 8. Prove that the process {Mt } in the BSM Theorem is a square integrable
martingale.
Exercise 9. Prove that the strategy θ = (θt0, θt), with θt0 = Mt − θt S̃t and θt = Kt/(σ S̃t), which we use in the second part of the BSM theorem, is self-financing.

Thanks to Theorem 14 we are now able to explicitly compute the price of a European
call.

Consider h = f(ST), with f(x) = (x − K)+. We can then express Vt as a function of St and t, i.e.
$$V_t = E_Q\left[e^{-r(T-t)} f(S_T) \,\big|\, \mathcal{F}_t\right] = E_Q\left[e^{-r(T-t)} f\left(S_t\, e^{r(T-t)}\, e^{\sigma(W_T - W_t) - \frac{\sigma^2}{2}(T-t)}\right) \,\Big|\, \mathcal{F}_t\right].$$
Under Q, we have that Wt is a standard Brownian motion, hence WT − Wt is independent of Ft. Moreover, we know that St is Ft-measurable.
Now set
$$F(t, x) = E_Q\left[e^{-r(T-t)} f\left(x\, e^{r(T-t)}\, e^{\sigma(W_T - W_t) - \frac{\sigma^2}{2}(T-t)}\right)\right]. \qquad (3.7)$$

Clearly we have that Vt = F (t, St ).


The increment WT − Wt ∼ N(0, T − t) under Q. As a consequence,
$$F(t, x) = e^{-r(T-t)} \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, f\left(x\, e^{\left(r - \frac{\sigma^2}{2}\right)(T-t) + \sigma y \sqrt{T-t}}\right) e^{-\frac{y^2}{2}}\, dy.$$
Substituting for f(·), and using equation (3.7), we have
$$F(t, x) = E_Q\left[e^{-r(T-t)}\left(x\, e^{\left(r - \frac{\sigma^2}{2}\right)(T-t) + \sigma(W_T - W_t)} - K\right)^+\right]$$
$$= E_Q\left[\left(x\, e^{z\sigma\sqrt{\tau} - \frac{\sigma^2}{2}\tau} - K e^{-r\tau}\right)^+\right]$$
$$= E\Big[\big(\underbrace{x\, e^{z\sigma\sqrt{\tau} - \frac{\sigma^2}{2}\tau} - K e^{-r\tau}}_{A}\big)^+\Big],$$

where τ = T − t and z ∼ N (0, 1).


At this point, set
$$d_1 = \frac{\log(x/K) + (r + \sigma^2/2)\tau}{\sigma\sqrt{\tau}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{\tau}.$$
These quantities can be easily derived by solving the inequality A ≥ 0 for z. In particular,
we find that A ≥ 0 for z ≥ −d2 .
We can then write the following:
$$F(t, x) = E\left[\left(x\, e^{z\sigma\sqrt{\tau} - \frac{\sigma^2}{2}\tau} - K e^{-r\tau}\right) 1_{\{z \geq -d_2\}}\right]$$
$$= \int_{-d_2}^{+\infty} \frac{1}{\sqrt{2\pi}}\left(x\, e^{y\sigma\sqrt{\tau} - \frac{\sigma^2}{2}\tau} - K e^{-r\tau}\right) e^{-\frac{y^2}{2}}\, dy$$
$$= \int_{-\infty}^{d_2} \frac{1}{\sqrt{2\pi}}\left(x\, e^{-y\sigma\sqrt{\tau} - \frac{\sigma^2}{2}\tau} - K e^{-r\tau}\right) e^{-\frac{y^2}{2}}\, dy.$$

This last expression can be given as the difference of two integrals involving the standard Gaussian c.d.f., i.e. $\Phi(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^u e^{-\frac{x^2}{2}}\, dx$. In fact, by setting $z = y + \sigma\sqrt{\tau}$,
$$F(t, x) = x\,\Phi(d_1) - K e^{-r\tau}\,\Phi(d_2). \qquad (3.8)$$

Therefore the value in t of a EU call is explicitly
$$C_t = F(t, S_t) = S_t\,\Phi(d_1) - K e^{-r\tau}\,\Phi(d_2).$$
Similar steps also allow us to derive the price of a European put, i.e.
$$G(t, x) = K e^{-r\tau}\,\Phi(-d_2) - x\,\Phi(-d_1).$$
It is sufficient to use g(x) in place of f(x) (see Footnote 3).


European calls and puts are frequently referred to as plain vanilla options by practitioners.

Exercise 10. Derive d1 and d2 in Equation (3.8).

Exercise 11. Compute the price of a European put.
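Before discussing volatility, here is a minimal numerical sketch of the closed-form prices just derived; the function name and the example inputs are arbitrary choices of mine.

```python
import numpy as np
from scipy.stats import norm

def bsm_call_put(S, K, r, sigma, tau):
    """European call and put prices in the BSM model, as in (3.8)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    call = S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)
    put = K * np.exp(-r * tau) * norm.cdf(-d2) - S * norm.cdf(-d1)
    return call, put

print(bsm_call_put(S=100.0, K=105.0, r=0.02, sigma=0.25, tau=1.0))
```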

3.3 Volatility and the BSM model


One of the main characteristics of the BSM model is that all the pricing formulas only
depend on one non-observable parameter: σ. In fact the drift µ disappears thanks to the
probability change from P to Q.
The parameter σ is commonly referred to as “volatility” by practitioners. Is there a way
of estimating σ?
The answer is clearly yes, and the estimation depends on the preferred method. We here
consider the two most popular ones.

3.3.1 Historical Volatility


As the name suggests, in order to estimate the historical volatility we make use of past
market data (therefore: be aware of the possible historical bias!). The trick is to exploit a
nice property of the Brownian motion, when we consider its quadratic variation. To make
things funnier (!), let’s consider the following review exercise.

Exercise 12 (Quadratic Variation). Let Π = {t0, t1, ..., tn} be a partition of [0, T]. Now define the (sampled) quadratic variation of B(t) on Π as
$$Q_\Pi = \sum_{j=0}^{n-1} \left(B(t_{j+1}) - B(t_j)\right)^2.$$

Show that QΠ converges to T as ||Π|| → 0, that is when the mesh (or norm) of the partition
tends to zero, i.e. when the distance among the points in the partition becomes smaller and
smaller, tending towards 0.
Hint: Show that E(QΠ ) = T , and that V ar(QΠ ) → 0 as ||Π|| → 0. In fact, if V ar(QΠ ) → 0
as ||Π|| → 0, we obtain that lim||Π||→0 QΠ = E(QΠ ) = T .
In the BSM model, as we know, the price of the underlying asset follows a geometric Brownian motion
$$S_t = S(t) = S(0) \exp\left(\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma B(t)\right).$$
Let 0 ≤ T1 < T2 be a given time window and assume that we observe the process S(t) on
[T1 , T2 ].
Now consider a partition T1 = t0 < t1 < ... < tm = T2 and, if S(t) represents prices,
compute the so-called log-returns, over each time interval [tj, tj+1], as
$$\log\frac{S(t_{j+1})}{S(t_j)} = \left(\mu - \frac{1}{2}\sigma^2\right)(t_{j+1} - t_j) + \sigma\left(B(t_{j+1}) - B(t_j)\right).$$
In finance, the sum of the squares of the log-returns is known as realised volatility, RV. Hence we have
$$RV_{[T_1, T_2]} = \sum_{j=0}^{m-1} \left(\log\frac{S(t_{j+1})}{S(t_j)}\right)^2 \qquad (3.9)$$
$$= \underbrace{\left(\mu - \frac{1}{2}\sigma^2\right)^2 \sum_{j=0}^{m-1} (t_{j+1} - t_j)^2}_{A} + \underbrace{\sigma^2 \sum_{j=0}^{m-1} \left(B(t_{j+1}) - B(t_j)\right)^2}_{B} \qquad (3.10)$$
$$+ \underbrace{2\sigma\left(\mu - \frac{1}{2}\sigma^2\right) \sum_{j=0}^{m-1} (t_{j+1} - t_j)\left(B(t_{j+1}) - B(t_j)\right)}_{C}. \qquad (3.11)$$

Now assume that the mesh of our partition is small, that is maxj=0,...,m−1 (tj+1 − tj ) → 0.
In other terms assume that we have a very refined partition Π, thanks to which we can
observe our prices very often in [T1 , T2 ].
As a consequence, the term B in (3.10) is approximately equal to σ²(T2 − T1), where (T2 − T1) is the quadratic variation of B(t) over [T1, T2] (this is simply an extension of Exercise 12).
The term A tends to 0, because
$$\sum_{j=0}^{m-1} (t_{j+1} - t_j)^2 \leq \max_{0 \leq j \leq m-1} (t_{j+1} - t_j) \sum_{j=0}^{m-1} (t_{j+1} - t_j) = \|\Pi\|\,(T_2 - T_1) \to 0.$$

Finally, C also tends to 0, because of the properties of B(t) (see Exercise 13).
Now, rearranging the terms in (3.9) we then get, for ||Π|| → 0,
$$\sigma^2 = \frac{1}{T_2 - T_1} \sum_{j=0}^{m-1} \left(\log\frac{S(t_{j+1})}{S(t_j)}\right)^2.$$
If prices really follow a geometric Brownian motion S(t) with constant volatility σ, then σ can be estimated by computing $\frac{1}{T_2 - T_1}\sum_{j=0}^{m-1}\left(\log\frac{S(t_{j+1})}{S(t_j)}\right)^2$ and taking the square root.
This is very convenient from a statistical point of view.
In practice it is not possible to have a very refined partition, because of physical limits: we
can only observe prices at discrete times, there are regulations on the market, etc. This tells
us that what we can get for σ using the historical volatility is only a good approximation.
Exercise 13. Show that the term C in equation (3.11) tends to 0 as ||Π|| → 0.
Exercise 14. Since we are dealing with volatility, why don’t you compute the variance of
the price process in equation (3.2), that is V ar(S(t))? It can be useful later on.
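To see the estimator at work, here is a small sketch of mine on simulated data (true σ = 0.40, the other values are arbitrary); with real market data one would simply replace the simulated log-returns with observed ones.

```python
import numpy as np

# Historical volatility via realised volatility (3.9):
# sigma_hat = sqrt( (1/(T2 - T1)) * sum of squared log-returns ).
rng = np.random.default_rng(seed=7)
mu, sigma_true, T1, T2, m = 0.10, 0.40, 0.0, 1.0, 10_000
dt = (T2 - T1) / m

log_returns = (mu - 0.5 * sigma_true**2) * dt + sigma_true * rng.normal(0.0, np.sqrt(dt), m)
sigma_hat = np.sqrt(np.sum(log_returns**2) / (T2 - T1))
print("estimated sigma:", sigma_hat)            # should be close to 0.40
```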

3.3.2 Implied Volatility


While historical volatility is backward-looking, being estimated on past data, the implied
volatility approach is considered forward-looking, because it gives us an idea of the mar-
ketplace views about the value of volatility in the future, as reflected by the structure of
prices today.
The computation of the implied volatility requires numerical methods, but it is con-
ceptually very simple. Let’s consider a EU Call: we know that its value at time t is
Ct = St Φ(d1) − Ke^{−rτ} Φ(d2), where
$$d_1 = \frac{\log(S_t/K) + (r + \sigma^2/2)\tau}{\sigma\sqrt{\tau}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{\tau}.$$
Now imagine that we know Ct, St, K and all the other quantities apart from σ. Clearly σ can be found numerically as the solution of $C_t - \left(S_t\,\Phi(d_1) - K e^{-r\tau}\,\Phi(d_2)\right) = 0$.
This is the implied volatility, that is the volatility value implied by the BSM model, i.e.
the volatility value such that the model is consistent.
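A minimal root-finding sketch (my own, with arbitrary inputs) makes the idea concrete; the bracket [1e-6, 5] for σ is an assumption that should cover all realistic volatility levels.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def bsm_call(S, K, r, sigma, tau):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

def implied_vol(C_obs, S, K, r, tau):
    # Solve C_obs - BSM(sigma) = 0 for sigma on a wide bracket.
    return brentq(lambda s: C_obs - bsm_call(S, K, r, s, tau), 1e-6, 5.0)

# Round-trip check on a synthetic quote: we should recover sigma = 0.25.
quote = bsm_call(S=100.0, K=105.0, r=0.02, sigma=0.25, tau=1.0)
print("implied volatility:", implied_vol(quote, 100.0, 105.0, 0.02, 1.0))
```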

3.4 Some extensions via exercises


This section contains some exercises that help us introduce some additional interesting results about what we have seen in this chapter. We will see how the tools we have learnt can help us understand some very important concepts of mathematical finance, such as the uniqueness of the risk-free rate, and some of the conditions for the absence of arbitrage. We will also show how we can price EU options on different price processes, such as, for example, the one of Bachelier.

3.4.1 Back to Bachelier


In his seminal work, published in 1900, Bachelier introduced the following model for prices,
probably the first formal one:

S(t) = S(0) + µt + σB(t).

What is the price of a EU call on such an asset?

In what follows, we assume r = 0, so that the discount factor plays no role, while we
leave the more general setting for r ≥ 0 as a simple exercise.
Using Cameron-Martin we know that, under a change of measure (let us call the new
measure Q),
S(t) = S(0) + σW (t),
where W (t) is a standard Brownian motion under Q.
Hence we have that
$$C(0) = E_Q\left[(S_T - K)^+\right] = E_Q\left[(S(0) + \sigma W(T) - K)^+\right] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \left(S(0) + \sigma y\sqrt{T} - K\right)^+ e^{-\frac{y^2}{2}}\, dy.$$
Now, the only condition for which $S(0) + \sigma y\sqrt{T} - K \geq 0$ is $y \geq \frac{1}{\sigma\sqrt{T}}(K - S(0)) = d$. Thus
$$C(0) = \frac{1}{\sqrt{2\pi}}\int_d^{+\infty} \left(S(0) + \sigma y\sqrt{T} - K\right) e^{-\frac{y^2}{2}}\, dy$$
$$= \frac{S(0) - K}{\sqrt{2\pi}}\int_d^{+\infty} e^{-\frac{y^2}{2}}\, dy + \frac{\sigma\sqrt{T}}{\sqrt{2\pi}}\int_d^{+\infty} y\, e^{-\frac{y^2}{2}}\, dy$$
$$= (S(0) - K)\left(1 - \Phi(d)\right) + \frac{\sigma\sqrt{T}}{\sqrt{2\pi}}\, e^{-\frac{d^2}{2}}.$$
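A two-line implementation of the formula just obtained (my own sketch, with r = 0 as above and arbitrary example inputs):

```python
import numpy as np
from scipy.stats import norm

def bachelier_call(S0, K, sigma, T):
    """Time-0 call price in the Bachelier model with r = 0, as derived above."""
    d = (K - S0) / (sigma * np.sqrt(T))
    # norm.pdf(d) = exp(-d^2/2)/sqrt(2*pi), i.e. the last term of the formula.
    return (S0 - K) * (1.0 - norm.cdf(d)) + sigma * np.sqrt(T) * norm.pdf(d)

print(bachelier_call(S0=100.0, K=100.0, sigma=20.0, T=1.0))   # at-the-money example
```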

3.4.2 The Value-at-Risk for a simple portfolio


Let L be a random variable accounting for the losses we can expect on some investment.
The quantity
$$v = \inf\left\{x \in \mathbb{R} : P(L \leq x) \geq \frac{\alpha}{100}\right\} = \inf\left\{x \in \mathbb{R} : P(L > x) \leq \frac{100 - \alpha}{100}\right\}$$

is called value-at-risk at confidence level α%, and it is generally indicated as V aRα% . Most
of the times, losses are considered over a standard time horizon, say 1 day or 1 year, hence
we can write V aRα1d and V aRα1y . In what follows we consider a 1-year time window.
Value-at-risk is a fundamental tool of risk management.
1y
Assume we aim to compute the V aR5% for losses on a single investment in a single share
of a stock, modeled according to BSM, purchased at S0 = 100, sold at ST , with µ = 10%,
σ = 40% and T = 1.

The first thing we have to do is to define what is a loss for us. We have three possi-
bilities
• L = ST − S0 , i.e. we assume there is no cost for money and for missing investment
opportunities;
• L = ST e−rT − S0 , i.e. we take into consideration the cost of liquidity, but not the
cost of missing other investments, which could guarantee an average return higher
than µ;
• L = ST e−µT − S0 , i.e. we take into consideration both the cost of money and that
of not investing in other products, since we implicitly assume that µ > r, given that
otherwise it would not make sense to invest in such an asset, less competitive than
the risk-free and riskier.
Let us continue our analyses under the third definition. Hence
$$P(L \leq x) = P\left(S_T e^{-\mu T} - S_0 \leq x\right) = P\left(S_0\, e^{-\frac{\sigma^2}{2}T + \sigma B_T} \leq x + S_0\right)$$
$$= P\left(B_T \leq \frac{1}{\sigma}\left(\log\left(\frac{x}{S_0} + 1\right) + \frac{\sigma^2}{2}T\right)\right) = \Phi\left(\frac{1}{\sigma\sqrt{T}}\left(\log\left(\frac{x}{S_0} + 1\right) + \frac{\sigma^2}{2}T\right)\right).$$
This means that we look for the value v such that
$$\frac{\alpha}{100} = 0.05 = \Phi\left(\frac{1}{\sigma\sqrt{T}}\left(\log\left(\frac{v}{S_0} + 1\right) + \frac{\sigma^2}{2}T\right)\right).$$
Using the properties of the standard Gaussian distribution, the quantile v is thus given by
$$v = S_0\left(\exp\left(\sigma\sqrt{T}\,\Phi^{-1}\left(\frac{\alpha}{100}\right) - \frac{\sigma^2}{2}T\right) - 1\right).$$
Substituting the values of the different variables and parameters, we get $VaR^{1y}_{5\%} \approx -52.19$. In words, the probability of observing a loss that, in absolute terms, is bigger than or equal to 52.19 is equal to 0.05.
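The computation is easy to reproduce numerically. The sketch below is my own (it uses the parameters of the example): it evaluates the closed-form quantile and cross-checks it by simulation, and the two numbers should agree (roughly −52.19 with these inputs).

```python
import numpy as np
from scipy.stats import norm

# VaR at the 5% level for L = S_T * exp(-mu*T) - S_0 under BSM dynamics.
S0, mu, sigma, T, alpha = 100.0, 0.10, 0.40, 1.0, 0.05

v = S0 * (np.exp(sigma * np.sqrt(T) * norm.ppf(alpha) - 0.5 * sigma**2 * T) - 1.0)

rng = np.random.default_rng(seed=3)                 # Monte Carlo cross-check
B_T = rng.normal(0.0, np.sqrt(T), size=1_000_000)
L = S0 * np.exp(sigma * B_T - 0.5 * sigma**2 * T) - S0
print("closed form:", round(v, 2), " Monte Carlo:", round(np.quantile(L, alpha), 2))
```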

3.4.3 No arbitrage conditions


The next three exercises are meant to discuss conditions for the absence of arbitrage on
the market. Prosaically, arbitrage is nothing more than a “free lunch” situation, that is to
say the possibility of implementing a trading strategy that can profit without cost or risk.
A typical situation for arbitrage is when an asset is mispriced, possibly having different
prices on different markets.
Recall that there are two main types of arbitrage:
• Type 1: A trading strategy that has positive initial cash flow and nonnegative payoff
under all future scenarios.
• Type 2: A trading strategy that costs nothing initially, has nonnegative payoff under
all future scenarios and has a strictly positive expected payoff.
In what follows we show that, in order not to have arbitrage, the following conditions must
be fulfilled
1. We cannot have two or more risk-free assets providing different risk-free rates;
2. If a risky asset mimics the risk-free asset, it must coincide with the risk-free asset;
3. The Put-Call parity must be respected.
Naturally, other conditions can be considered and discussed, but they are not taken into consideration in these lecture notes.

3.4.3.1 The uniqueness of the risk-free asset


The following exercise shows that we cannot have more than one risk-free asset on the
market, in order not to have arbitrage. In more details, we show that if a second risk-free
asset is introduced, it must replicate the already-existing one. In terms of rates, this means
that the new risk-free asset cannot provide an alternative risk-free rate.
Starting from equation (3.2), let us consider the degenerate case in which σ = 0. This
implies
dSt = µSt dt.
We assume S00 = S0 = 1. A necessary condition not to have arbitrage is that µ = r.

We look for a portfolio (θ0 , θ) such that


θ00 S00 + θ0 S0 = 0 and θT0 ST0 + θT ST > 0 a.s.
Let us assume that µ > r.
In t = 0 we buy θ0 = 1 shares of stock and short-sell one unit of risk-free, i.e. θ00 = −1.
This costs 0 since S00 = S0 = 1. Hence we have
θ00 S00 + θ0 S0 = −1 × 1 + 1 × 1 = 0.

Given the assumptions of the exercise, and since µ > r, we know that, for every t ∈ (0, T ),
St = eµt > ert = St0 . Given this information, until T we do nothing, so that θt0 = −1 and
θt = 1 for all t ∈ (0, T ). This is trivially a self-financing strategy.
At maturity5 T we then have
θT0 ST0 + θT ST = −erT + eµT > 0 a.s.,
hence we have arbitrage.
If we assume µ < r, we can obtain similar results by buying the risk-free and short-selling
the stock. As a consequence, the only condition not to have arbitrage is that µ = r, since
in that case
θT0 ST0 + θT ST = −erT + eµT = −erT + erT = 0.

3.4.3.2 No one like the risk-free


Show that if the price process P (t) of an asset satisfies
dP (t) = g(t)P (t)dt,
where g is a stochastic process, then g(t) = r a.s. ∀t ≥ 0.
The aim of this exercise is to show that, once again, if we have an asset that mimics the
behavior of the risk-free asset, then it must coincide with the risk-free asset, in order not
to have arbitrage.

We know that the risk-free asset S 0 (t) satisfies


dS 0 (t) = rS 0 (t)dt,
and we can also set S 0 (0) = 1.
Now, w.l.o.g. let us also assume that P (0) = 1.
Consider a strategy (x(t), y(t)) consisting of x(t) units of P(t) and y(t) units of S⁰(t), such that
$$x(t) = \begin{cases} 1 & P(t) > S^0(t) \\ 0 & P(t) = S^0(t) \\ -1 & P(t) < S^0(t) \end{cases}, \qquad y(t) = -x(t).$$
For what concerns the value of such a strategy, we can easily observe that it is always nonnegative; in fact
$$0 \leq V(t) = x(t)P(t) + y(t)S^0(t) = \begin{cases} P(t) - S^0(t) & P(t) > S^0(t) \\ 0 & P(t) = S^0(t) \\ S^0(t) - P(t) & P(t) < S^0(t) \end{cases}.$$
5
Notice that in most exercises about arbitrage, we clearly identify 3 steps: the creation of the strategy
in t = 0, the maintenance in t ∈ (0, T ), and the liquidation in T .

Moreover, V(t) satisfies
$$\frac{d}{dt}V(t) = x(t)\frac{d}{dt}P(t) + y(t)\frac{d}{dt}S^0(t),$$
almost everywhere with respect to t ≥ 0. That means

dV (t) = x(t)dP (t) + y(t)dS 0 (t).

At the end of the day, we have a self-financing strategy such that V (0) = 0 and V (t) ≥ 0
for all t ≥ 0. The only condition not to have arbitrage is that V (t) = 0 for all t ≥ 0. But
this implies
P (t) = S 0 (t) = ert ,
so that we have g(t) = r for all t ≥ 0.

3.4.3.3 The Put-Call parity as a condition of no arbitrage


Let Ct and Pt be the price of a EU call and a EU put at time t respectively. The Put-Call
parity is expressed as
Ct − Pt = St − Ke−r(T −t) .
In words: in every time instant t, the price difference between a call and a put on the
same underlying asset (modeled according to BSM, with strike price K, etc.) is equal to
the difference between the price of the asset and the discounted value of the strike price.
We can show that the Put-Call parity is another necessary condition for the absence of
arbitrage.
To simplify the treatment, let us assume that our call and put are at-the-money forward, i.e. K = S0 e^{rT}. In other terms, we are expressing naive expectations about the behavior of the underlying asset.
Given all these assumptions, we have that

C0 − P0 = 0.

Let us show that if C0 − P0 = X > 0, i.e. C0 = P0 + X, we have arbitrage.

The simplest way of showing the possibility of arbitrage is to implement the following
strategy:

• At time t = 0,

1. We buy a Put for P0 ;


2. We short-sell the call, getting P0 + X;
3. We invest X in the risk-free asset, at rate r;

4. We enter into a forward contract, in which we accept to receive ST − K at


maturity (if the quantity is positive, we get it, if it is negative, we pay its
absolute value to our counterparty). Notice that the price of such a forward is
equal to 0, given that S0 − e−rT K = 0.

The total cost of this strategy is P0 − P0 − X + X + 0 = 0.

• For t ∈ (0, T ), we hold our positions as they are, doing nothing.

• At T , the situation is as follows:

1. We get (K − ST )+ from the Put;


2. We pay (ST − K)+ for the Call;
3. We get/pay ST − K because of the forward;
4. We obtain XerT from the investment in the risk-free.

The value at time T for our portfolio is thus

$$V_T = (K - S_T)^+ - (S_T - K)^+ + S_T - K + Xe^{rT} = K - S_T + S_T - K + Xe^{rT} = Xe^{rT} > 0 \quad a.s.$$

Hence we have an arbitrage. The Put-Call parity is therefore a very important relation-
ship that goes beyond the simple link between two different securities, given that it has
implications in terms of the possibility of no arbitrage.
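The parity is also easy to verify numerically with the closed-form BSM prices of this chapter; the following sketch (mine, with arbitrary inputs) checks that Ct − Pt and St − Ke^{−r(T−t)} coincide.

```python
import numpy as np
from scipy.stats import norm

def bsm_call_put(S, K, r, sigma, tau):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    call = S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)
    put = K * np.exp(-r * tau) * norm.cdf(-d2) - S * norm.cdf(-d1)
    return call, put

S, K, r, sigma, tau = 100.0, 110.0, 0.03, 0.30, 0.75
C, P = bsm_call_put(S, K, r, sigma, tau)
print(C - P, S - K * np.exp(-r * tau))     # the two values should coincide
```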
Part II

Advanced Topics in Financial


Mathematics

Chapter 4

A step forward in derivatives

Starting from what we have seen in Chapter 3, we deal with options on dividend-paying
assets, barrier options and American options. These types of options can be seen as direct
derivations of EU options under BSM.
More complex derivatives, like Parisian options, are - from a probabilistic point of view -
just generalizations, where the numerical part is much more interesting than the theoretical
framework.
Good references for this Chapter are [3], [4] and [8].

4.1 Dividend-paying assets


Most of the times, the holders of ordinary shares do receive dividends. A dividend is a cash
payment paid on specific dates, typically once or twice a year. The common expression is
“a dividend of x cents per share will be distributed on day DD”.
If we think about a stock paying dividends, it is meaningful to consider the so-called
dividend yield. In financial terms, the dividend yield is nothing more than the dividend per
share per unit time, divided by the price of the share itself. From a mathematical point
of view, we will hypothesize that a dividend is a continuous time payment stream, such
that the dividend paid in an infinitesimal time interval dt by stock S(t) is equal to qS(t)dt,
where q is the dividend yield.
The use of the dividend yield is particularly important when dealing with stock indices,
since their constituent stocks may pay different dividends at different times.
In what follows we consider the simple case in which q is a fixed constant.
Using the usual notation, the value of a BSM portfolio with a dividend-paying asset is given by
$$dV_t = \theta_t\, dS_t + q\theta_t S_t\, dt + (V_t - \theta_t S_t) r\, dt \qquad (4.1)$$
$$= \theta_t S_t(\mu + q - r)\, dt + r V_t\, dt + \theta_t S_t \sigma\, dB_t,$$


with V0 = v0 . Taking into consideration the discounted value, we get

dṼt = θt S̃t (µ + q − r)dt + θt S̃t σdBt .

As done before, we can change to a risk-neutral measure Q such that
$$dW_t = dB_t + \frac{\mu + q - r}{\sigma}\, dt,$$
i.e. Wt is a Q-Brownian motion (the quantity (µ + q − r)/σ now plays the role of the drift adjustment under Q). This lets us re-write
$$d\tilde{V}_t = \theta_t \tilde{S}_t \sigma\, dW_t.$$
Now, let us come back to equation (4.1), and let us assume that Vt − θt St = 0, for all t ≥ 0,
i.e. we invest everything in the risky asset. A clear consequence is that θt = Vt /St . Hence

dVt = Vt ((µ + q)dt + σdBt ).

Now, notice that the dividend yield q can be seen as an implicit rate of return for the
risky asset, since it takes into consideration all the payments generated by the asset itself
in continuous time. We can then introduce the following quantity

Ŝt = e−qt St ,

and then observe that


dVt = Ŝt ((µ + q)dt + σdBt ).
If we set V0 = e−qT S0 , we can then state the following proposition about the price of a EU
call for a dividend-paying asset.

Proposition 14. In the BSM universe, if we take into consideration an asset with a
constant dividend yield q, the value at time t = 0 of a European call option with maturity
T and strike price K is equal to

F (0, AT ) = e−rT (AT Φ(d1 ) − KΦ(d2 )) ,

where AT = e^{(r−q)T} S0,
$$d_1 = \frac{\log(A_T/K) + \sigma^2 T/2}{\sigma\sqrt{T}}, \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T}.$$
Proof. The proof is simply a manipulation of the standard derivation of the price for a
basic EU call under BSM.

Exercise 15. Can you build any connection between EU options on dividend-paying assets,
as we have just discussed above, and the simple exchange rate model of Section 2.5?
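A short numerical sketch of Proposition 14 (my own illustration, with arbitrary inputs); note that with q = 0 it collapses to the standard BSM call price.

```python
import numpy as np
from scipy.stats import norm

def call_with_dividend_yield(S0, K, r, q, sigma, T):
    """Time-0 EU call price for an asset with constant dividend yield q (Prop. 14)."""
    A_T = np.exp((r - q) * T) * S0
    d1 = (np.log(A_T / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return np.exp(-r * T) * (A_T * norm.cdf(d1) - K * norm.cdf(d2))

print(call_with_dividend_yield(S0=100.0, K=100.0, r=0.02, q=0.03, sigma=0.20, T=1.0))
```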

4.2 Barrier options


Let St be a price process of a risky asset according to the basic settings of BSM. Set
Mt = max0≤u≤t Su to be the maximum price to date.
An up-and-out or knock-out option is an option with exercise value

[ST − K]+ 1MT <L . (4.2)

Such an option pays the standard payoff of a EU call if St < L for all t ∈ [0, T ], and
zero otherwise. The quantity L is called level or barrier. In order to have economically
meaningful computations, we will assume S0 < L and K < L.
An up-and-in option, a.k.a. knock-in option, has the opposite behavior. Its payoff is

[ST − K]+ 1MT ≥L . (4.3)

Up-and-out and up-and-in options belong to the large class of barrier options, that is to
say options whose value and behavior depends on the possibility of the underlying asset to
cross a pre-determined threshold value. Barrier options are one of the main constituents of
the larger exotic class, the class of all non vanilla options. Other common barrier options
are down-and-out and down-and-in.
The first thing to be noticed, when dealing with up-and-out and up-and-in options, is that
the sum of the two payoffs in equations (4.2) and (4.3) above corresponds to the payoff of
a standard European call. Hence it is sufficient to value just one of the two types.

4.2.1 A little digression


To price up-and-out options, we will rely on the reflection equality or principle of the
Brownian Motion (see equation (7.3) in the Appendix).
Let Bt be a standard Brownian motion starting at 0, and set mt = maxs≤t Bs . The
reflection principle states that, for 0 < x ≤ y,
 
$$P(m_t \geq y, B_t < x) = \Phi\left(\frac{x - 2y}{\sqrt{t}}\right). \qquad (4.4)$$
Now, let us observe the following fact:
$$P(m_t \geq y, B_t \geq 2y - x) = P(B_t \geq 2y - x) = 1 - \Phi\left(\frac{2y - x}{\sqrt{t}}\right) = \Phi\left(\frac{x - 2y}{\sqrt{t}}\right).$$

Clearly, for the event [Bt < x] we have

[Bt < x] = [mt < y, Bt < x] ∪ [mt ≥ y, Bt < x].



Hence we have
$$\Phi\left(\frac{x}{\sqrt{t}}\right) = P(B_t < x) = P(m_t < y, B_t < x) + P(m_t \geq y, B_t < x).$$
But this implies the following proposition.

Proposition 15. The joint distribution of Bt and its maximum-to-date mt is given by
$$H_0(y, x) = P(m_t < y, B_t < x) = \Phi\left(\frac{x}{\sqrt{t}}\right) - \Phi\left(\frac{x - 2y}{\sqrt{t}}\right). \qquad (4.5)$$

4.2.2 Back to pricing


The argument we have just used to obtain Proposition 15 only works for a standard Brow-
nian motion, whereas it is not valid in presence of drift. However, we now have all the tools
we need to overcome such a difficulty. Once again we can make use of Cameron-Martin!
Let Pη be the probability measure of a Brownian motion with drift η, say Wt = ηt + Bt. In Chapter 2, we have seen that we can define the Radon-Nikodym derivative
$$\frac{dP_\eta}{dP_0} = \exp\left(\eta W_T - \frac{1}{2}\eta^2 T\right),$$
where P0 is the original measure of Bt.


For any integrable function f , we have that, thanks to equation (4.5),
 
dPη
Eη [1mT <y f (WT )] = E0 1mT <y f (WT ) (4.6)
dP0
  
1
= E0 1mT <y f (WT ) exp ηWT − η 2 T
2
Z y √ √ 
 
1
= f (x) exp ηx − η 2 T φ(x/ T ) − φ((x − 2y)/ T ) dx,
−∞ 2

where φ(x) is the density function of a standard Gaussian.


Now, we first take into account that
$$\frac{1}{\sqrt{2\pi T}} \exp\left(-\frac{1}{2T}x^2 + \eta x - \frac{1}{2}\eta^2 T\right) = \frac{1}{\sqrt{T}}\,\phi\left(\frac{x - \eta T}{\sqrt{T}}\right),$$
and after some computations we also get
$$\frac{1}{\sqrt{2\pi T}} \exp\left(-\frac{1}{2T}(x - 2y)^2 + \eta x - \frac{1}{2}\eta^2 T\right) = \frac{1}{\sqrt{T}}\, e^{2\eta y}\,\phi\left(\frac{x - 2y - \eta T}{\sqrt{T}}\right).$$
2πT 2T 2 T T

This leads us to give the following joint distribution for a Brownian motion with drift η
and its maximum to date:
   
$$H_\eta(y, x) = P_\eta(m_t < y, W_t < x) = \Phi\left(\frac{x - \eta t}{\sqrt{t}}\right) - e^{2\eta y}\,\Phi\left(\frac{x - 2y - \eta t}{\sqrt{t}}\right).$$
Notice that when η = 0, Hη (y, x) = H0 (y, x).

We can now come back to the pricing of barrier options. Under the risk-neutral measure we know that
$$S_T = S_0 \exp\left(\left(r - \frac{\sigma^2}{2}\right)T + \sigma B_T\right),$$
or
$$S_T = S_0\, e^{\sigma W_T},$$
where WT = ηT + BT is a Brownian motion with drift η under the risk-neutral measure (what's the explicit formula for η?).
The price ST is in the money (i.e. the option is profitable, given that ST > K), but below the barrier L, when WT ∈ (l1, l2), with
$$l_1 = \frac{1}{\sigma}\log(K/S_0) \quad \text{and} \quad l_2 = \frac{1}{\sigma}\log(L/S_0). \qquad (4.7)$$
Let us now define the marginal
$$h(y, x) = \frac{\partial H_\eta(y, x)}{\partial x}.$$
Notice that we have already implicitly used this quantity in equation (4.6).
The option value at time t = 0 can then be obtained as
$$E_\eta\left[e^{-rT}\left[S_T - K\right]^+ 1_{M_T < L}\right] = e^{-rT} \int_{l_1}^{l_2} \left(S_0 e^{\sigma x} - K\right) h(y, x)\, dx.$$
We "only" need to apply the BSM theorem.
For an up-and-out option, the fair (risk-neutral) price at t = 0 is given by
$$S_0\left(\Phi(d_1) - \Phi(x_1) + \left(\frac{L}{S_0}\right)^{2\lambda}\left(\Phi(-y) - \Phi(-y_1)\right)\right)$$
$$+\, K e^{-rT}\left(-\Phi(d_2) + \Phi(x_1 - \sigma\sqrt{T}) - \left(\frac{L}{S_0}\right)^{2\lambda - 2}\left(\Phi(-y + \sigma\sqrt{T}) - \Phi(-y_1 + \sigma\sqrt{T})\right)\right),$$
where d1 and d2 are as in the basic BSM model, and where
$$x_1 = \frac{\log(S_0/L)}{\sigma\sqrt{T}} + \lambda\sigma\sqrt{T}, \qquad y_1 = \frac{\log(L/S_0)}{\sigma\sqrt{T}} + \lambda\sigma\sqrt{T},$$
$$y = \frac{\log\left(L^2/(S_0 K)\right)}{\sigma\sqrt{T}} + \lambda\sigma\sqrt{T}, \qquad \lambda = \frac{r + \sigma^2/2}{\sigma^2}.$$

The complexity of the formulas related to exotic options is the main cause for them to be
called “exotic”.

Exercise 16. Derive the bounds l1 and l2 of equation 4.7.
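Closed-form expressions like the one above are easy to mistype, so a crude Monte Carlo cross-check can be useful. The sketch below is my own (arbitrary inputs); since the barrier is only monitored at the grid dates, the estimate is biased slightly upwards with respect to the continuously monitored price.

```python
import numpy as np

# Monte Carlo price of an up-and-out call under the risk-neutral GBM dynamics.
rng = np.random.default_rng(seed=11)
S0, K, L, r, sigma, T = 100.0, 100.0, 130.0, 0.02, 0.25, 1.0
n_steps, n_paths = 252, 20_000
dt = T / n_steps

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
log_S = np.log(S0) + np.cumsum((r - 0.5 * sigma**2) * dt + sigma * dB, axis=1)
S = np.exp(log_S)

payoff = np.maximum(S[:, -1] - K, 0.0) * (S.max(axis=1) < L)   # knocked out if max >= L
print("up-and-out call (discretely monitored):", np.exp(-r * T) * payoff.mean())
```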

4.3 American Options


Describing an American option1 is rather simple: it is just like a European option, BUT
it can be exercised at any time t before maturity T , and not only in T . From a practical
point of view, this apparently small change makes everything much more complicated, given
the infinitely many scenarios and exercise policies. American options constitute another
fundamental type of vanilla options2 .
An interesting phenomenon, when dealing with American (US) options, is that the call
and the put options tend to behave differently. For a US call we can show that the optimal
strategy is to only exercise at maturity, while for a US put we usually find an optimal time
t = τ ≤ T (τ will be our optimal stopping time), even if its identification is not at all a
simple task.
As usual, in what follows, we assume that the BSM framework holds, and that the price
process of the underlying asset is a geometric Brownian motion. To simplify computations,
we also consider an asset that does not pay any dividend (in case of dividends, most of the
things that we are going to say still hold, but formulas become extremely cumbersome).
As far as notation is concerned, τ represents the optimal stopping time at which we exercise
a given option (call or put). By definition τ ∈ [0, T ].

4.3.1 American Calls


In a call option, the payoff is (S − K)+ , where S is the stock price (at the time of exercise)
and K is the strike price. If we fix K, it is easy to see that (S − K)+ is convex in S (show
it!). This observation is what we need to state the following Theorem.

Theorem 15. The optimal strategy for the owner of a US call is to hold the option until
its expiration, so that τ = T , where T is the maturity of the option.
1
In these lecture notes we deal with the modeling of American options in continuous time. The modeling
in discrete time, using binomial trees, is not covered. For the interested reader, a good reference is [7].
2
At the intersection between European and American options we can find Bermuda options. In a
Bermuda option, the corresponding right (to buy or to sell) can be only exercised at one of a finite number
of times, for example “every first Monday of every month until expiration”. Let E be the set of the exercise
times allowed in a Bermuda option. If E = {T }, a Bermuda option is nothing more than a European
option. If E = {k∆ : k = 1, 2, ..., bT /∆c} ∪ {T }, with ∆ → 0, then a Bermuda option converges towards an
American option. The pricing of Bermuda options is performed using backward induction.
A further generalization are Canary options, which are Bermuda options with an initial deferment period.
For example the option cannot be exercised during the first year, and then it can be exercised once a month.

To prove theorem 15, it is convenient to introduce the following lemma, based on a simple application of Jensen's inequality for conditional expectations.

Lemma 4. Let {Mt } be a martingale w.r.t. the filtration {Ft }t∈[0,T ] . Let τ ≤ T be a
stopping time. Finally, let φ be a convex function. Then

E[φ(Mτ )] ≤ E[φ(MT )].

Proof. We have that


E[MT |Fτ ] = Mτ a.s.

Applying Jensen’s inequality, we thus obtain

E[φ(MT )] = E[E[φ(MT )|Fτ ]] ≥ E[φ(E[MT |Fτ ])] = E[φ(Mτ )].

Let’s come back to theorem 15. We are now ready to prove it.

Proof of theorem 15. Take τ ≤ T. The payoff of a US call exercised at τ is (Sτ − K)+. Its discounted expected value in t = 0 is thus
$$E\left[e^{-r\tau}(S_\tau - K)^+\right] \leq E\left[\left(S_\tau e^{-r\tau} - K e^{-rT}\right)^+\right].$$
The first FTAP tells us that the discounted price process {e^{−rt} St}t≥0 is a martingale w.r.t. its natural filtration (which coincides with that of the embedded Brownian motion). Now, since y → (y − Ke^{−rT})+ is convex, Lemma 4 tells us that
$$E\left[\left(S_\tau e^{-r\tau} - K e^{-rT}\right)^+\right] \leq E\left[\left(S_T e^{-rT} - K e^{-rT}\right)^+\right].$$

Therefore: be patient and wait until maturity T .

4.3.2 American Puts


The payoff of a put takes the form (K − S)+ . With VtU S , respectively VtEU , we indicate
the value of an American, respectively European, put option at time t ∈ [0, T ].

Proposition 16. Let the risk-free rate r be strictly positive. Then, for all t < T , we have
that
VtEU < VtU S .

Proof. A sufficient strategy to prove the proposition is the following one, which is however
not necessarily optimal.
Exercise the American put at time min(τ, T), where
$$\tau = \min\left\{t \geq 0 : S_t \leq K - K e^{-r(T - t)}\right\}.$$
Then... the rest of the proof is left as an exercise.

Exercise 17. Complete the proof of proposition 16.


Hint: just consider the two scenarios τ < T and τ ≥ T . What happens?
Proposition 16 tells us that the optimal exercise policy for a US put is not necessarily
to wait until maturity. In particular, one can show that the optimal time to exercise an
American put is the first τ ≤ T such that

Sτ ≤ S ∗ (τ ), (4.8)

where S∗(τ) is a quantity called the exercise boundary. If the condition in equation (4.8) is not satisfied before T, one should let the option expire, or "get worthless", using some financial jargon.

The big problem when dealing with American puts is that the differentiable and strictly
increasing function S ∗ (t) (from which we obtain S ∗ (τ )) cannot be expressed in a simple
closed form. For a general US put option, it is hard to prove even basic qualitative prop-
erties of the boundary, an example being smoothness.
Dealing with American puts thus becomes an optimization problem, whose technicalities
go beyond the scope of these lecture notes. However, since I do not want to disappoint
you3 , it is possible to have a flavor of the procedure used to identify the exercise boundary,
by playing with a special type of American options: Perpetual Puts.

4.3.3 Perpetual American Puts


Perpetual American Puts (PAPs) were first studied by Merton [4]. A PAP works exactly like a standard American put, apart from one single big difference: there is no expiration date, i.e. T = +∞. In simple words: you can hold it forever, up till infinity, if you live that long. If you decide to exercise it, what you get is (K − Sτ)+, with τ < ∞.
3 I know you love financial mathematics.

Theorem 16. For a Perpetual American Put with strike price K, the optimal strategy is to exercise it at the optimal time
$$\tau = \inf\{t : S_t \leq S^*\}, \qquad (4.9)$$
where
$$S^* = \frac{2Kr}{2r + \sigma^2}.$$
The value VtPAP of the unexercised perpetual put at time t only depends on the current price St of the underlying asset, being equal to
$$V_t^{PAP} = K\left(1 - \frac{2r}{2r + \sigma^2}\right)\left(\frac{K}{S_t}\,\frac{2r}{2r + \sigma^2}\right)^{\frac{2r}{\sigma^2}} = u(S_t). \qquad (4.10)$$

To prove theorem 16, we will make use of several propositions and lemmas.
The first proposition tells us more about the value function u(·) in equation 4.10.

Proposition 17. Let VtP AP be the value of a not-yet-exercised perpetual American put
option at time t. Then VtP AP is a function only of the price level St , i.e.

VtP AP = u(St ).

Heuristic Proof. In order to avoid excessive complications, we can give a heuristic proof of
proposition 17.
Suppose that at t = 0 we observed S0 = 10, and that now, in t > 0, the stock price is
again (it can change in-between) St = 10. In the meanwhile the corresponding PAP has
not been exercised.
Given the properties of the geometric Brownian motion, we know that {St+l |St = 10}l≥0
has the same law of {Sl |S0 = 10}l≥0 . Now, since the risk-free r is assumed constant, there
is no reason to behave differently at time t > 0 w.r.t. t = 0. This tells us that the value of
the option must be the same, because the price level is the same.

Exercise 18. Link the reasoning of the previous heuristic proof with the absence of arbi-
trage on the market.

The following proposition characterizes the value function u(·).

Proposition 18. The value function u(S) is:

1. strictly positive.

2. non-increasing in S.

3. bounded from below by (K − S)+ .

4. Lipschitz-continuous4 in S.
4
Actually the value function u(S) is everywhere differentiable, but proving it is extremely technical.

Proof. Points 1-3 are left as an exercise.
Regarding point 4, let's consider two different initial points, s0^{(1)} and s0^{(2)}, for our price process in t = 0, such that |s0^{(2)} − s0^{(1)}| < ε, where ε > 0 is small enough.
The value of the price process in t is therefore
$$S_t^{(i)} = s_0^{(i)} \exp\left(\sigma B_t + \left(r - \frac{\sigma^2}{2}\right)t\right) = s_0^{(i)} Z_t, \quad i = 1, 2.$$
Now assume that, for both S0 = s0^{(1)} and S0 = s0^{(2)}, the PAP is exercised at the same optimal time τ. This tells us that the difference in the magnitude of the payoffs is
$$\left|\left(K - s_0^{(1)} Z_\tau\right)^+ - \left(K - s_0^{(2)} Z_\tau\right)^+\right| \leq \left|s_0^{(1)} - s_0^{(2)}\right| Z_\tau\, 1_{Z_\tau \leq \max\left(K/s_0^{(1)},\, K/s_0^{(2)}\right)}. \qquad (4.11)$$
By taking discounted expectations we thus get
$$\left|u\left(s_0^{(1)}\right) - u\left(s_0^{(2)}\right)\right| \leq \varepsilon\, \max\left(K/s_0^{(1)},\, K/s_0^{(2)}\right). \qquad (4.12)$$

Exercise 19. Prove points 1-3 of proposition 18.


Exercise 20. Verify that equation 4.11 implies equation 4.12.
The value function u(S) is so important to us, because it tells us when to exercise our
PAP.
Let’s assume that an optimal strategy exists (please have a look at exercise 22), so that we
can define an optimal stopping time to exercise our option.
Proposition 18 tells us that u(S) ≥ (K − S)+ for all possible values of S. But, if u(S) >
(K − S)+ , it is sub-optimal to exercise the option, because we are getting the smaller value
of (K − S)+ for an option worth u(S). This means that an optimal strategy can only
allow5 to exercise the option when u(S) = (K − S)+ . Stating all this formally, we get the
following proposition.
Proposition 19. An optimal strategy for a PAP is to exercise it at time τ , with

τ = inf{t : u(St ) = (K − St )+ }.

The following proposition gives us some important information about the price values
for which an optimal strategy can be defined and implemented.
Proposition 20. There exists 0 < S ∗ = S ∗ (K) ≤ K such that

u(S) = (K − S)+ iff S ≤ S ∗ .


5
Actually, an optimal strategy imposes to exercise the option when u(S) = (K − S)+ , because it could
only lose value, if not exercised.

Proof. First of all we know that u(S) > 0 for all S > 0, so that u(S) = (K − S)+ only if
S < K.
Assume that S < S 0 ≤ K, and that u(S) > (K −S). We want to show that u(S 0 ) > K −S 0 .
If u(S) > (K − S), there exists τ such that

E[e−rτ (K − SZτ )+ 1τ <∞ ] > K − S.

In other words, there exists an exercise policy for which the expected discounted payoff is
strictly greater than the payoff for immediate exercise.
W.l.o.g. we can assume that τ is such that SZτ < K (why?). Hence we have

E[e−rτ (K − SZτ )+ 1τ <∞ ] > K − S =⇒


S(1 − E[e−rτ 1τ <∞ ]) > K − KE[e−rτ 1τ <∞ ] =⇒
S 0 (1 − E[e−rτ 1τ <∞ ]) > K − KE[e−rτ 1τ <∞ ] =⇒
E[erτ (K − S 0 Zτ )+ 1τ <∞ ] > K − S 0 .

The last inequality implies that u(S 0 ) > K − S 0 , and the proof is complete.

Corollary 17. The optimal exercise time for a PAP is

τ = inf{t : St = S ∗ }.

In order to complete our step-by-step (long) proof of theorem 16, we need to find S ∗ .
In reality, this is the easy part. The trick is to calculate the expected discounted payoff of
a perpetual American put for all possible price levels S, and then to maximize it to get S ∗ .
Following corollary 17, we can just take into consideration values of S ∗ < K (why?). The
expected discounted payoff in t = 0 of a PAP with initial value S0 ≥ K is equal to
$$E_Q\left[e^{-r\tau}(K - S_\tau)^+ 1_{\tau < \infty}\right] = (K - S^*)\, E_Q\left[e^{-r\tau} 1_{\tau < \infty}\right] = (K - S^*)\left(\frac{S^*}{S_0}\right)^{\frac{2r}{\sigma^2}}, \qquad (4.13)$$

where EQ is the expectation under the risk-neutral measure.


The right-hand side of equation (4.13) is then maximized for
$$S^* = \frac{2Kr}{2r + \sigma^2}.$$
If we now substitute the value for S ∗ in equation 4.13, and we rearrange, we can finally
obtain the value function u(·) in equation 4.10.

Exercise 21. Compute the payoff of a perpetual American option in t = 0 as per equation (4.13).
Hint: To show that $E_Q\left[e^{-r\tau} 1_{\tau < \infty}\right] = \left(\frac{S^*}{S_0}\right)^{\frac{2r}{\sigma^2}}$, remember that $e^{-rt} S_t = S_0 \exp\left(\sigma B_t - \frac{\sigma^2}{2}t\right)$, and think about the definition of τ.
Then apply Cameron-Martin (you should see a Radon-Nikodym derivative somewhere...) and use the fact that, if Bt is a standard Brownian motion under the measure P, then
$$P\left(B_t + \gamma t = \eta \text{ for some } t \geq 0\right) = e^{2\gamma\eta},$$
for any γ > 0 and η < 0.

Exercise 22. To show that an optimal strategy may not exist, in general, consider the following fictitious perpetual option: its owner has the right to exercise it, at any t > 0, for a payoff of 1 − 1/t. Assume r = 0. Why is it not possible to have an optimal strategy?
In other words, with PAPs we are lucky, but not all options allow for optimal strategies, i.e. strategies that maximize our payoffs and that cannot be beaten.
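To close the chapter, a tiny numerical sketch of Theorem 16 (my own, with arbitrary inputs): it computes the exercise boundary S* and the value function u(S), and shows that u(S) stays above the immediate-exercise payoff (K − S)+.

```python
import numpy as np

def perpetual_put_value(S, K, r, sigma):
    """Value u(S) of a perpetual American put, as in Theorem 16."""
    S_star = 2.0 * K * r / (2.0 * r + sigma**2)
    if S <= S_star:
        return K - S                                  # immediate exercise region
    return (K - S_star) * (S_star / S) ** (2.0 * r / sigma**2)

K, r, sigma = 100.0, 0.05, 0.30
print("exercise boundary S*:", 2.0 * K * r / (2.0 * r + sigma**2))
for S in (40.0, 80.0, 120.0):
    print(S, perpetual_put_value(S, K, r, sigma), max(K - S, 0.0))
```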
Chapter 5

Fixed-income and interest rates

In this chapter we introduce the basics of fixed income mathematics. Our goal is to discuss
some fundamental results for interest rates.

5.1 Some basics of fixed-income


Let us consider a money market account (MMA), i.e. a financial account paying interests
based on current interest rates in the money markets. The value R(t) of a 1-euro (we
created it in t = 0 with 1 euro) MMA at time t is defined as
$$R(t) = e^{\int_0^t r(u)\, du},$$
where r(u) is the interest rate process.
Let us now consider the price P(t, T) of a 1-euro1 zero-coupon bond with maturity T. Such a price is given by
$$P(t, T) = E_{Q_R}\left[e^{-\int_t^T r(u)\, du} \,\Big|\, \mathcal{F}_t\right],$$

under a risk-neutral martingale equivalent measure QR that we will define in a few lines.
Notice that, if interest rates are deterministic, the previous equation simplifies to
$$P(t, T) = e^{-\int_t^T r(u)\, du} = \frac{R(t)}{R(T)}.$$

And this further simplifies for constant rates, i.e. P (t, T ) = e−r(T −t) .
In the no-arbitrage setting, when pricing derivatives, the value V (t) of a derivative on an
underlying asset S(t) is given by selecting a numeraire N (t), and by taking the expectation
1
In what follows, unless differently specified, we always play with 1-euro investments.


with respect to the equivalent martingale measure QN , under which the discounted value
of the derivative is a martingale (cfr. Section 4.3). In mathematical terms,
$$\frac{V(t)}{N(t)} = E_{Q_N}\left[\frac{V(T)}{N(T)} \,\Big|\, \mathcal{F}_t\right].$$

Numeraire
A numeraire is a basic standard by which value is computed. A typical example is
money.
If we choose the quantity G as the numeraire, we simply assume that all prices and
economic quantities are normalized by G’s price.
.
If we take the MMA to be the numeraire, we have
$$V(t) = E_{Q_R}\left[\frac{R(t)\, V(T)}{R(T)} \,\Big|\, \mathcal{F}_t\right] = E_{Q_R}\left[e^{-\int_t^T r(u)\, du}\, V(T) \,\Big|\, \mathcal{F}_t\right], \qquad (5.1)$$
where QR is the equivalent martingale measure according to which V(t)/R(t) is a martingale.
Exercise 23. How does equation (5.1) simplify if the interest rates are deterministic?
And if they are constant?

5.1.1 Intro to interest rate derivatives and the T-forward measure


If we consider interest rate derivatives, such as caps, floors and swaptions, the value of
the derivative is linked to r(t), i.e. V (t) = V (t, T, r(t)). Moreover, interest rates cannot
be assumed to be constant (or even deterministic), since that would be a too restrictive
hypothesis, compatible with the BSM universe, but not with products that are directly
dependent on interest rates. This consideration implies that the value of an interest rate
derivative is obtained as
$$V(t) = V(t, T, r(t)) = E_{Q_R}\left[e^{-\int_t^T r(u)\, du}\, V(T, T, r(T)) \,\Big|\, \mathcal{F}_t\right]. \qquad (5.2)$$

The expected value in equation (5.2) can be evaluated using P (t, T ) as numeraire. The
corresponding equivalent martingale measure QT is known as the T-forward measure.

Caps and floors


A (interest rate) cap is a derivative in which the buyer receives payments at the end
of each period (coupon time) in which the reference interest rate L exceeds an agreed
strike level K. An example of a cap would be an agreement to receive a payment for
each month the reference rate exceeds 3%. A cap can be seen as a series of European
call options, known as caplets, which exist for each (coupon) period the cap agreement

is in existence. In mathematical terms, a caplet payoff is

$$\theta\,\max(L - K,\, 0),$$
where θ is the so-called accrual factor.


A floor is a series of European put options or floorlets on a specified reference rate
L. The buyer of the floor receives money if on the maturity of any of the floorlets,
the reference rate is below the agreed strike level K of the floor. The payoff of each
floorlet is therefore
$$\theta\,\max(K - L,\, 0).$$
Other types of interest rate derivatives are swaptions, CMS, CTS, etc. See [3] for more
details.
.

We know that P(T, T) = 1. Hence
$$\frac{V(t)}{P(t, T)} = E_{Q_T}\left[\frac{V(T)}{P(T, T)} \,\Big|\, \mathcal{F}_t\right] = E_{Q_T}\left[V(T) \,|\, \mathcal{F}_t\right],$$
so that2
$$V(t) = P(t, T)\, E_{Q_T}\left[V(T) \,|\, \mathcal{F}_t\right] = E_{Q_R}\left[e^{-\int_t^T r(u)\, du} \,\Big|\, \mathcal{F}_t\right] E_{Q_T}\left[V(T) \,|\, \mathcal{F}_t\right]. \qquad (5.3)$$

Equation (5.3) is very important: it allows us to pass from the expected value under QR
of a product of two terms, as in (5.2), to the product of two expectations, one under QR
and the other under QT . This is much simpler to evaluate.
Now, let us consider
$$V(t) = E_{Q_R}\left[e^{-\int_t^T r(u)\, du}\, V(T) \,\Big|\, \mathcal{F}_t\right] = P(t, T)\, E_{Q_T}\left[V(T) \,|\, \mathcal{F}_t\right].$$
The previous equation can be re-written as
$$V(t) = E_{Q_R}\left[\frac{R(t)}{R(T)} V(T) \,\Big|\, \mathcal{F}_t\right] = E_{Q_T}\left[P(t, T)\, V(T) \,|\, \mathcal{F}_t\right].$$
Using again the fact that P(T, T) = 1, we can then write
$$E_{Q_R}\left[\frac{R(t)}{R(T)} V(T) \,\Big|\, \mathcal{F}_t\right] = E_{Q_T}\left[\frac{P(t, T)}{P(T, T)} V(T) \,\Big|\, \mathcal{F}_t\right]. \qquad (5.4)$$
2
Remember V (t) = V (t, T, r(t)).

Equation (5.4) is the key to obtain the Radon-Nikodym derivative dQT/dQR, which allows us to pass from QR to QT. In particular, using the same reasoning of Subsection 4.1.1, we obtain
$$\frac{dQ_T}{dQ_R} = \frac{R(t)/R(T)}{P(t, T)/P(T, T)} = \frac{e^{-\int_t^T r(u)\, du}}{P(t, T)}.$$
All this tells us that, in order to value an interest rate derivative, it is convenient to choose
as a numeraire the price of a zero-coupon bond with the same maturity T of the derivative.

Exercise 24. Consider the quantity
$$fw(t, T) = -\frac{\partial \log P(t, T)}{\partial T}.$$
Such a quantity is known as the forward rate, and it represents the future yield of a bond, as inferable from the current rate structure. Show that fw(t, T) is a martingale under QT.

5.2 Models for interest rates


Empirical evidence shows that interest rates are characterized by mean reversion. This
means that, on a short period, rates may diverge, but, on the long run, they tend to move
back towards a certain constant mean value. A direct consequence of this empirical fact
is that the geometric Brownian motion is not an appropriate model for interest rates. In
fact, if we assume the rate rt to follow a GBM, we simply have
$$dr_t = \mu r_t\, dt + \sigma r_t\, dB_t,$$

something that is not compatible with reversion towards a constant mean. Why?
In the literature (for more details, [8]), several models have been proposed to deal with
interest rates. Here below we consider some of the most famous ones.

5.2.1 Vasicek model


Vasicek model is the basic model when dealing with interest rates. Introduced in 1977, it is
a model in which the movements of interest rates are driven by only one source of market
risk (one-factor model), represented by a Brownian motion component as usual.
Vasicek model can be naturally used to evaluate interest rate derivatives, but also as a
way of modeling interest rates in the more general BSM framework. In risk management,
different Vasicek-like models have been proposed to model both market and credit risk.
Historically, Vasicek model has been criticized for allowing negative interest rates. However,
this previously-unwanted feature is now becoming one of the drivers of the revival of the
model in financial mathematics, given the present bizarre situation with actual interest

rates on the market.


Vasicek model is nothing more than a special type of Ornstein-Uhlenbeck process, i.e.

drt = (a − brt )dt + σdBt ,

where a, b, σ ∈ R, with b > 0 so that the process is mean reverting.
Using Itō formula, we can immediately verify that
$$r_t = r_0 e^{-bt} + \frac{a}{b}\left(1 - e^{-bt}\right) + \sigma\int_0^t e^{-b(t-s)}\,dB_s.$$
This tells us that
$$r_t \sim N\!\left(r_0 e^{-bt} + \frac{a}{b}\left(1 - e^{-bt}\right),\ \frac{\sigma^2}{2b}\left(1 - e^{-2bt}\right)\right).$$
For t → ∞, we then get the asymptotic behavior of $r_t$, i.e.
$$r_t \sim N\!\left(\frac{a}{b},\ \frac{\sigma^2}{2b}\right).$$
Hence interest rates will tend to come back to the value a/b in the long run.
Exercise 25. Verify that in the Vasicek model $r_t \sim N\!\left(r_0 e^{-bt} + \frac{a}{b}(1 - e^{-bt}),\ \frac{\sigma^2}{2b}(1 - e^{-2bt})\right)$.
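To see mean reversion at work, here is a minimal Monte Carlo sketch of the Vasicek dynamics; the parameter values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)
a, b, sigma = 0.06, 0.8, 0.02          # long-run mean a/b = 7.5%
r0, T, n_steps, n_paths = 0.01, 10.0, 1000, 10_000
dt = T / n_steps

r = np.full(n_paths, r0)
for _ in range(n_steps):
    # Euler step of dr = (a - b r) dt + sigma dB
    r += (a - b * r) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

print("sample mean of r_T :", r.mean())       # should approach a/b = 0.075
print("sample var  of r_T :", r.var())        # should approach sigma^2/(2b) = 2.5e-4
print("theoretical limits :", a / b, sigma**2 / (2 * b))
```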

5.2.2 CIR model


Cox-Ingersoll-Ross model was introduced in 1985 as an extension and correction of Vasicek
model. In particular, the CIR model does not allow for negative interest rates, a charac-
teristic of Vasicek model that used to be criticized in the past.
According to this model, we have that

$$dr_t = \beta(\alpha - r_t)\,dt + \sigma\sqrt{r_t}\,dB_t,$$

with α ≥ 0 and β, σ > 0.


Standard probabilistic arguments tell us that the rate $r_t$ will tend to follow a χ² distribution (why?), which in the limit, that is for t → ∞, becomes a Gamma$\left(\frac{2\alpha\beta}{\sigma^2}, \frac{\sigma^2}{2\beta}\right)$.
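A minimal simulation sketch of the CIR dynamics, using a full-truncation Euler scheme so that the square root stays well-defined; parameters and seeds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
beta, alpha, sigma = 0.8, 0.05, 0.1
r0, T, n_steps, n_paths = 0.02, 20.0, 2000, 10_000
dt = T / n_steps

r = np.full(n_paths, r0)
for _ in range(n_steps):
    r_pos = np.maximum(r, 0.0)   # full truncation: never take the sqrt of a negative value
    r += beta * (alpha - r_pos) * dt + sigma * np.sqrt(r_pos * dt) * rng.standard_normal(n_paths)

r = np.maximum(r, 0.0)
# For t large, r_t should be close to a Gamma(2*alpha*beta/sigma^2, sigma^2/(2*beta)) law
print("sample mean:", r.mean(), " vs alpha =", alpha)
print("sample var :", r.var(),  " vs alpha*sigma^2/(2*beta) =", alpha * sigma**2 / (2 * beta))
```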

5.2.3 Affine models


Vasicek and CIR models are just special cases of a more general class of interest rate
models, which goes under the name of affine models. In an affine model, we assume
$$dr(t) = \big(\eta(t) - \lambda(t)r(t)\big)\,dt + \sqrt{\delta(t) + \gamma(t)r(t)}\,dB(t), \qquad (5.5)$$

where all quantities like η(t) or δ(t) can be either deterministic or stochastic, i.e. η(t, ω).
In the affine class we also find other important models, like the time-dependent extension
of Vasicek’s, that is the model of Hull and White. Developed in 1990, Hull-White model
is the reference model to price things like Bermudan swaptions:

dr(t) = (η(t) − λ(t)r(t))dt + σ(t)dB(t).

5.3 Girsanov I, II and III. The Revenge.


All the interest rate models we have briefly listed above are characterized by one common feature: the drift is not constant.
This tells us that the Cameron-Martin theorem is not sufficient to "remove" the drift and play with risk-neutrality. And if we consider the most general affine models, as per equation (5.5), the drift could also be random, so that not even Girsanov with deterministic drift would suffice.
It is therefore time to come back to Chapter 2 and to complete our treatment of Girsanov, analyzing in more detail what happens when we have a random drift.
In what follows we will state three different versions of Girsanov theorem. The first one is nothing more than a restatement, in a more formal way, of what we saw at the end of Chapter 2. The other two are important generalizations, which not only allow for the treatment of the most advanced interest rate models, but also give us the tools to price all types of financial products on the markets, even in the presence of stochastic interest rates. In other words, we could restate what we have seen so far in a more general framework, and still be able to deal with it, at least theoretically, given that in practice we always need numerical methods to evaluate non-trivial securities and products.
In all three versions of Girsanov theorem, a fundamental assumption is the Novikov condition, as already seen in Chapter 2. It is the only assumption we cannot drop.
Theorem 18 (Girsanov I). Let Y(t) ∈ Rⁿ be a stochastic process on the space $(\Omega, \mathcal{F}_T^{(n)}, P)$, such that
$$dY(t) = a(t,\omega)\,dt + dB(t),$$
with Y(0) = 0 and 0 ≤ t ≤ T < ∞. The elements B₁(t), ..., Bₙ(t) in B(t) are independent.
Define the process
$$Z(t) = \exp\left(-\int_0^t a(s,\omega)\,dB(s) - \frac{1}{2}\int_0^t a^2(s,\omega)\,ds\right), \qquad \forall t \le T.$$
Assume that a(t, ω) is adapted and that the following condition (aka Novikov condition) holds:
$$E_P\left[\exp\left(\frac{1}{2}\int_0^T a^2(s,\omega)\,ds\right)\right] < \infty.$$

Define a new measure Q on $(\Omega, \mathcal{F}_T^{(n)})$, equivalent to P, by setting

dQ(ω) = ZT (ω)dP (ω).

Then, under the new measure Q, the process Y (t) is a standard Brownian motion.

Remark 2. We can notice the following:

• If a(t, ω) = a, ∀ω and ∀t, we are in the Cameron-Martin framework of Theorem 11.

• If a(t, ω) = a(t), ∀ω, we are just in the basic Girsanov framework we have considered
in Chapter 2.

• In general, the process {Z(t)} is a martingale. In fact we can prove (you can try by
applying Itō formula with g(t, x) = ex ) that

dZ(t) = −Z(t)\,a(t, ω)\,dB(t).

We will use this fact in the proof of the theorem.

Proof (you can skip it for the exam). For simplicity, let’s consider the case a(t, ω) bounded.
We want to verify that Y (t) is a standard Brownian motion under Q. In other words, under
Q, we have that

• Y (t) = (Y1 (t), ..., Yn (t)) is a martingale.

• Yᵢ(t)Yⱼ(t) − δᵢⱼt is a martingale³, where δᵢⱼ is the Kronecker delta.

We prove the first bullet, while we leave the second one to the reader (the proof is essentially
the same).
Set K(t) = Z(t)Y (t). Consider one of the elements of the vector K(t), that is Ki (t),
i = 1, ..., n.
We have that, using Itō,
$$dK_i(t) = Z(t)\,dY_i(t) + Y_i(t)\,dZ(t) + dY_i(t)\,dZ(t)$$
$$= Z(t)\big(a_i(t,\omega)\,dt + dB_i(t)\big) + Y_i(t)Z(t)\left(-\sum_{k=1}^n a_k(t,\omega)\,dB_k(t)\right) + dB_i(t)\left(-Z(t)\sum_{k=1}^n a_k(t,\omega)\,dB_k(t)\right)$$
$$= \ldots\ \text{(by collecting the terms and using the vector representation)}\ \ldots$$
$$= Z(t)\,\gamma^{(i)}(t)\,dB(t),$$
³ Notice that E[Yᵢ(t)Yⱼ(t)] is the cross-covariance.
 
where, with a little abuse of notation, $\gamma^{(i)}(t)$ is a vector $\big(\gamma_1^{(i)}(t), ..., \gamma_n^{(i)}(t)\big)$, in which each component is such that
$$\gamma_j^{(i)}(t) = \begin{cases} -Y_i(t)\,a_j(t,\omega) & j \ne i \\ 1 - Y_i(t)\,a_i(t,\omega) & j = i. \end{cases}$$

Using what we said in Remark 2, then Ki (t) is a martingale w.r.t. P . And so is the vector
K(t).
Now, for s < t, consider
$$E_Q\left[Y_i(t)\,|\,\mathcal{F}_s\right] = \frac{E\left[Z(t)Y_i(t)\,|\,\mathcal{F}_s\right]}{E\left[Z(t)\,|\,\mathcal{F}_s\right]} = \frac{E\left[K_i(t)\,|\,\mathcal{F}_s\right]}{Z(s)} = \frac{K_i(s)}{Z(s)} = Y_i(s).$$

Hence, the proof is complete, given that every component of Y (t) is a martingale under
Q.

The first Girsanov theorem can be further generalized to deal with more general pro-
cesses with random diffusion parameter. Girsanov II is probably the most useful theorem
in financial mathematics, when dealing with changes of measures. Girsanov I is included
in Girsanov II as an intermediate step.
Theorem 19 (Girsanov II). Let Y(t) ∈ Rⁿ be a stochastic process on the space $(\Omega, \mathcal{F}_T^{(n)}, P)$,
such that
dY (t) = β(t, ω)dt + θ(t, ω)dB(t),
with Y (0) = 0 and 0 ≤ t ≤ T < ∞.
Suppose there exist two adapted processes u(t, ω) and α(t, ω) such that

θ(t, ω)u(t, ω) = β(t, ω) − α(t, ω).

Assume u(t, ω) satisfies the Novikov condition, and set
$$Z(t) = \exp\left(-\int_0^t u(s,\omega)\,dB(s) - \frac{1}{2}\int_0^t u^2(s,\omega)\,ds\right), \qquad \forall t \le T.$$
Define a new measure Q on $(\Omega, \mathcal{F}_T^{(n)})$, equivalent to P, as

dQ(ω) = ZT (ω)dP (ω).


Then, under the new measure Q, the process $\tilde{B}(t) = B(t) + \int_0^t u(s,\omega)\,ds$ is a standard Brownian motion, and⁴

dY (t) = α(t, ω)dt + θ(t, ω)dB̃(t).


⁴ The quantity α(t, ω) here can also be replaced by a more general a(t, ω).

Proof. The proof mimics the one of Girsanov I and it is left to the reader.

Finally, a third more general version of Girsanov can be given. This version is particu-
larly useful in numerical mathematics, as it allows for the identification of weak solutions
(if they exist). For our purposes, it can be considered even too general, hence we just state
it.

Theorem 20 (Girsanov III). Consider two processes like

dX(t) = b(X(t))dt + σ(X(t))dB(t),

and
dY (t) = [γ(t, ω) + b(Y (t))] dt + σ(Y (t))dB(t).
If there exists an adapted process u(t, ω), satisfying Novikov condition, such that

σ(Y (t))u(t, ω) = γ(t, ω),

and which we can use to define an appropriate Radon-Nikodym derivative Z(t), then the
law of X(t) under P is the same as that of Y (t) under Q.
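To see the change of measure in action, here is a minimal Monte Carlo sketch of the constant-drift (Cameron-Martin) special case of Girsanov I; the drift value and sample size are arbitrary choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_paths, a = 1.0, 500_000, 0.8        # constant drift a, horizon T

# Under P: B_T ~ N(0, T); the drifted process is Y_T = a*T + B_T
B_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
Y_T = a * T + B_T

# Radon-Nikodym weights: for constant a, Z_T = exp(-a*B_T - 0.5*a^2*T)
Z_T = np.exp(-a * B_T - 0.5 * a**2 * T)

# Reweighting by Z_T ("under Q"), Y should look like a standard Brownian motion at time T
print("E_P[Z_T]   ≈", Z_T.mean())             # Z is a martingale: ≈ 1
print("E_Q[Y_T]   ≈", (Z_T * Y_T).mean())     # ≈ 0
print("E_Q[Y_T^2] ≈", (Z_T * Y_T**2).mean())  # ≈ T = 1
```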

5.4 Pricing interest rate derivatives


To price interest rate derivatives, let's start by considering the so-called tenor structure,
i.e. let’s split the usual time horizon [0, T ] into non-overlapping intervals

0 = T0 < T1 < ... < Tn = T.

The tenor structure is a common representation for interest rates, when dealing with the
zero-rates yield curve.
Let’s now consider the bond price P (t, Ti ), i = 1, ..., n, which–we know–can be used as a
numeraire. We assume
$$\frac{dP(t, T_i)}{P(t, T_i)} = r_t\,dt + \xi_i(t)\,dB_t, \qquad (5.6)$$
where both rt and ξi (t) are adapted to the natural filtration generated by Bt .
From equation (5.6) we then get (recall t ≤ Tᵢ)
$$P(t, T_i) = P(0, T_i)\exp\left(\int_0^t r_s\,ds + \int_0^t \xi_i(s)\,dB_s - \frac{1}{2}\int_0^t |\xi_i(s)|^2\,ds\right).$$

Remark 3. The process
$$W_t^i = B_t - \int_0^t \xi_i(s)\,ds, \qquad 0 \le t \le T_i,$$
is a standard Brownian motion under the forward measure $Q_{T_i}$, for i = 1, ..., n.
This is just an application of Girsanov theorem, once we choose the numeraire $N_t = P(t, T_i)$.
It is in fact sufficient to notice that
$$dW_t^i = dB_t - \frac{1}{N_t}\,dN_t\,dB_t = \ldots = dB_t - \xi_i(t)\,dt.$$
Exercise 26. Complete the ... part in Remark 3.
Moreover, look at the term $\frac{1}{N_t}dN_t$. What is it? Why do we use it here?
The process $W_t^i$ is usually referred to as the Forward Brownian Motion, i.e. a Brownian motion with random drift under the original (risk-neutral) measure, which behaves as a standard Brownian motion under the appropriate forward measure.
Now, for i = 1, ..., n, we know that
dWti = dBt − ξi (t)dt.
If we consider i, j = 1, ..., n with i ≠ j, simple manipulations give us
dWtj = dBt − ξj (t)dt = dWti + (ξi (t) − ξj (t)) dt. (5.7)
Equation (5.7) is very important: it tells us that the process $W_t^j$, a standard Brownian motion under $Q_{T_j}$, is nothing more than a Brownian motion with drift $(\xi_i(t) - \xi_j(t))$ under $Q_{T_i}$. This suggests that, when dealing with interest rate derivatives, it is very important to choose the right measure, carefully looking at the tenor structure.

5.4.1 Pricing a general claim


Let C be a generic random claim with maturity $T_i$, so that at time t we aim to compute its discounted expected value using equation (5.3), i.e.
$$E_{Q_R}\!\left[e^{-\int_t^{T_i} r_s\,ds}\,C\,\Big|\,\mathcal{F}_t\right] = P(t, T_i)\,E_{Q_{T_i}}\left[C\,|\,\mathcal{F}_t\right],$$
where $Q_R$ is the risk-neutral measure and $Q_{T_i}$ the forward measure with horizon $T_i \ge t$.
First notice that under $Q_{T_i}$, we have for j ≠ i
$$\frac{dP(t, T_j)}{P(t, T_j)} = r_t\,dt + \xi_i(t)\xi_j(t)\,dt + \xi_j(t)\,dW_t^i.$$
Moreover
$$P(t, T_j) = P(0, T_j)\exp\left(\int_0^t r_s\,ds + \int_0^t \xi_j(s)\,dB_s - \frac{1}{2}\int_0^t |\xi_j(s)|^2\,ds\right) \qquad (5.8)$$
$$= P(0, T_j)\exp\left(\int_0^t r_s\,ds + \int_0^t \xi_j(s)\,dW_s^j + \frac{1}{2}\int_0^t |\xi_j(s)|^2\,ds\right)$$
$$= P(0, T_j)\exp\left(\int_0^t r_s\,ds + \int_0^t \xi_j(s)\,dW_s^i - \frac{1}{2}\int_0^t |\xi_j(s) - \xi_i(s)|^2\,ds + \frac{1}{2}\int_0^t |\xi_i(s)|^2\,ds\right).$$

From this we can derive in t the forward price (or rate) for the period $T_i \le T_j$ as
$$P(T_i, T_j) = \frac{P(t, T_j)}{P(t, T_i)} = \frac{P(0, T_j)}{P(0, T_i)}\underbrace{\exp\left(\int_0^t \big(\xi_j(s) - \xi_i(s)\big)\,dW_s^i - \frac{1}{2}\int_0^t |\xi_j(s) - \xi_i(s)|^2\,ds\right)}_{A} \qquad (5.9)$$
The forward price is the future price of a bond with time horizon $[T_i, T_j]$ as inferable from today's rate structure (today we are in t).
The term A in equation (5.9) should remind you of something we have recently encountered.

5.4.2 Bond options


Set $0 \le T_i \le T_j$ and assume, for k = i, j,
$$\frac{dP(t, T_k)}{P(t, T_k)} = r_t\,dt + \xi_k(t)\,dB_t,$$
where $\xi_k(t)$ is from now on taken to be a deterministic volatility function.

Proposition 21. The price of a call option on $P(T_i, T_j)$ with pay-off $C = (P(T_i, T_j) - K)^+$ is
$$E_{Q_R}\!\left[e^{-\int_t^{T_i} r_s\,ds}(P(T_i, T_j) - K)^+\,\Big|\,\mathcal{F}_t\right] = P(t, T_j)\,\Phi\!\left(\frac{v(t, T_i)}{2} + \frac{1}{v(t, T_i)}\log\frac{P(t, T_j)}{K\,P(t, T_i)}\right) - K\,P(t, T_i)\,\Phi\!\left(-\frac{v(t, T_i)}{2} + \frac{1}{v(t, T_i)}\log\frac{P(t, T_j)}{K\,P(t, T_i)}\right),$$
where $v^2(t, T_i) = \int_t^{T_i} |\xi_i(s) - \xi_j(s)|^2\,ds$.
Proof. First we know that
$$E_{Q_R}\!\left[e^{-\int_t^{T_i} r_s\,ds}(P(T_i, T_j) - K)^+\,\Big|\,\mathcal{F}_t\right] = P(t, T_i)\,E_{Q_{T_i}}\!\left[(P(T_i, T_j) - K)^+\,|\,\mathcal{F}_t\right].$$
Secondly,
$$P(T_i, T_j) = \frac{P(t, T_j)}{P(t, T_i)}\underbrace{\exp\left(\int_t^{T_i} \big(\xi_j(s) - \xi_i(s)\big)\,dW_s^i - \frac{1}{2}\int_t^{T_i} |\xi_j(s) - \xi_i(s)|^2\,ds\right)}_{B}.$$

Therefore
$$E_{Q_R}\!\left[e^{-\int_t^{T_i} r_s\,ds}(P(T_i, T_j) - K)^+\,\Big|\,\mathcal{F}_t\right] = P(t, T_i)\,E_{Q_{T_i}}\!\left[\left(\frac{P(t, T_j)}{P(t, T_i)}\times B - K\right)^+\Big|\,\mathcal{F}_t\right] = E_{Q_{T_i}}\!\left[\big(P(t, T_j)\times B - K\,P(t, T_i)\big)^+\,\big|\,\mathcal{F}_t\right].$$
At this point we can apply the BSM theorem to obtain the pricing formula.

Remark 4. Thanks to Proposition 21 we can make some interesting observations, comparing the pricing formula of the bond option with that of a standard EU call option:

• The forward price P (Ti , Tj ) is lognormally distributed (remember that ξk (t) is deter-
ministic for k = i, j), as the price process in the EU call.

• In the pricing formula P (t, Tj ) plays the role of the underlying asset.

• KP (t, Ti ) is the new strike price.



• The volatility is $v(t, T_i)/\sqrt{T_i - t}$.

• The time to maturity is naturally Ti − t.

• The “risk neutral rate” is just set to be r = 0 (Why? Think about the setting we are
in).

• The replicating strategy for a bond option on the forward price P (Ti , Tj ) thus con-
tains the zero-coupon bonds P (t, Tk ), k = i, j, with the two limiting maturities of the
forward rate.

Exercise 27. Using the BSM theorem, verify that the pricing formula in Proposition 21
is correct, i.e. try to obtain it from scratch.
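For concreteness, here is a small sketch of the pricing formula in Proposition 21; the numerical inputs are hypothetical and only meant to show how the formula is used.

```python
from math import log
from statistics import NormalDist

Phi = NormalDist().cdf

def bond_call_price(P_t_Ti, P_t_Tj, K, v):
    """Price at time t of a call on P(Ti, Tj) with strike K,
    where v = v(t, Ti) is the integrated volatility of xi_i - xi_j."""
    d1 =  v / 2 + (1 / v) * log(P_t_Tj / (K * P_t_Ti))
    d2 = -v / 2 + (1 / v) * log(P_t_Tj / (K * P_t_Ti))
    return P_t_Tj * Phi(d1) - K * P_t_Ti * Phi(d2)

# Hypothetical inputs: discount bonds for Ti = 1y and Tj = 2y, strike on the forward bond price
print(bond_call_price(P_t_Ti=0.97, P_t_Tj=0.93, K=0.955, v=0.02))
```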
Part III

Probabilistic Appendix


The aim of this appendix is to give you the basic knowledge to deal with Parts I and
II of the lecture notes. If you are attending the Financial Engineering Specialization, you
should already have this knowledge, in particular for what concerns the Brownian Motion.
Reading the appendix is optional, but it can be a useful reference.
Chapter 6

Constructing probability spaces

In this chapter we show how a probability space can be constructed. We introduce useful
concepts and tools such as the Borel σ-algebra, Kolmogorov’s cylinders, etc.
Sometimes you will find some boxes. The idea is to recall definitions and results that are
not discussed in detail, but that are useful to understand what is going on.
Chapters 1 and 2 in Shreve [8] are good additional references.

6.1 The construction on R


We start by constructing a proper probability space on the extended real line R∪{−∞, +∞}.

6.1.1 The equipped space (R, B(R))


Let us consider R = (−∞, +∞) and the intervals (a, b] = {x ∈ R : a < x ≤ b}, with
−∞ ≤ a < b < +∞, and (c, +∞), with −∞ ≤ c < +∞.
Let us take into account the class of all the subsets of R that are a finite union of disjoint
intervals in R. Let us also include ∅ in that class, which we call A.
The class A is an algebra (or field).

Algebra:
Let X be some set; with 2X we indicate its power set (all subsets+∅+X itself). Then
a subset C ⊆ 2X is called an algebra if it is non-empty, closed under complementation
and closed under finite unions (or intersections, using De Morgan).

Let us check:

• R ∈ A.
Just consider the case c = −∞.1
¹ Also notice that ∅ ∈ A by construction.


• If A ∈ A, then Aᶜ ∈ A.
Consider the case $A = \bigcup_{i=1}^n (a_i, b_i]$. W.l.o.g. we assume that a₁ < a₂ < ... < aₙ.
Since A ∈ A, we know that A is the finite union of disjoint sets, hence

A = (a1 , b1 ] ∪ (a2 , b2 ] ∪ ... ∪ (an , bn ].

As a consequence,

Ac = (−∞, a1 ] ∪ (b1 , a2 ] ∪ ... ∪ (bn−1 , an ] ∪ (bn , +∞) ∈ A.

• If A, B ∈ A then A ∪ B ∈ A.
We consider $A = \bigcup_{i=1}^n (a_i, b_i]$ and $B = \bigcup_{j=1}^m (a_j^*, b_j^*]$. Then
$$A \cap B = \left(\bigcup_{i=1}^n (a_i, b_i]\right) \cap \left(\bigcup_{j=1}^m (a_j^*, b_j^*]\right) = \bigcup_i \bigcup_j (a_i, b_i] \cap (a_j^*, b_j^*].$$

Since all the intervals (ai , bi ] and (a∗j , b∗j ] are disjoint, we have that the intersection
(ai , bi ] ∩ (a∗j , b∗j ] is either empty or an interval of type (a, b], hence we easily derive
A ∩ B ∈ A. This is closure under intersection.
Now we can notice that A ∪ B = (Ac ∩ B c )c . Closure under complementation and
intersection then tell us that A ∪ B ∈ A.
The reasoning can naturally be extended to any finite union of elements of A.
Good, but...the class A does not constitute a σ-algebra!!! A σ-algebra is what we need to
develop our theory.
Just consider the case $A_n = \left(0, \frac{n-1}{n}\right] \in \mathcal{A}$. Then $\bigcup_{n=1}^{+\infty} A_n = (0, 1) \notin \mathcal{A}$.

σ-algebra:
A σ-algebra is nothing more than an algebra which is also closed under countable
unions (or intersections).
Please notice that this is just an intuitive and unorthodox definition. Every serious topologist and probabilist would kill me for this.

The smallest σ-algebra that contains A, also known as the σ-algebra generated by A, or
σ(A), is the Borel σ-algebra of R . In what follows we will indicate the Borel σ-algebra as
B(R) = σ(A).
From a topological point of view, given a topological space Y , the Borel σ-algebra is
the smallest σ-algebra containing all open sets (or, equivalently, all closed sets) in Y . If
Y = R, B(R) is the smallest σ-algebra containing all open (or respectively closed) intervals.

The couple (R, B(R)) defines a so-called measurable space (also known as equipped
space) .

Other definitions for B(R)


Are there other ways of defining B(R)?
There are many ways! Here below we consider two of the simplest ones.
Let us define I as the class of the intervals (a, b], with −∞ ≤ a < b < +∞, and (c, +∞),
with −∞ ≤ c < +∞. Please notice that with I we are not restricting our attention to the
case of finite unions of disjoint intervals.
The smallest σ-algebra containing I, i.e. σ(I), coincides with B(R), so that σ(I) = B(R) =
σ(A).
In fact we have that
I ⊂ B(R) ⇒ σ(I) ⊂ B(R),
given that B(R) is a σ-algebra. At the same time A ⊂ σ(I), hence σ(A) ⊂ σ(I).
Another possibility to obtain B(R) requires us to introduce the metric
$$d(x, y) = \frac{|x - y|}{1 + |x - y|}$$
on R, which we then use to define the open neighborhood of x₀ ∈ R as
$$S_p(x_0) = \{x \in \mathbb{R} : d(x_0, x) < p\}, \qquad p > 0.$$
Let B0 (R) be the σ-algebra generated by the open sets Sp (x0 ), p > 0, x0 ∈ R, then
B0 (R) = B(R). The way of showing this is quite simple. The open sets correspond to the
open intervals (a, b) with a = x0 − p/(1 − p) and b = x0 + p/(1 − p). Hence, letting x0 and
p vary, we can reproduce the class of open intervals (a, b), which - as we know - generates
B(R).

A remark about B(R)


It is interesting to notice that B(R) contains many different types of intervals:
• $(a, b) = \bigcup_{i=1}^{+\infty} \left(a, b - \frac{1}{i}\right] \in \mathcal{B}(\mathbb{R})$;
• $[a, b] = \bigcap_{i=1}^{+\infty} \left(a - \frac{1}{i}, b\right] \in \mathcal{B}(\mathbb{R})$;
• $\{a\} = \bigcap_{i=1}^{+\infty} \left(a - \frac{1}{i}, a\right] \in \mathcal{B}(\mathbb{R})$;
• $[a, b) = \{a\} \cup (a, b) \in \mathcal{B}(\mathbb{R})$;
• $(-\infty, b) \in \mathcal{B}(\mathbb{R})$, as a special case of (a, b).
In other words, B(R) contains all the different types of intervals of R, together with all the
finite and countable sets.

6.1.2 An appropriate probability measure


Given the equipped space (R, B(R)), we now want to define an appropriate measure over
it. We will do that by introducing a so-called repartition function F .

Definition 16. A function F : R → [0, 1] is called a repartition function (i.e. a distribution function) if it satisfies the following properties:
1. F is non-decreasing, i.e. F (x1 ) ≤ F (x2 ) for every x1 < x2 ∈ R;

2. limx→−∞ F (x) = 0 and limx→+∞ F (x) = 1;

3. F is right-continuous, i.e. ∀x ∈ R, limt→x+ F (t) = F (x).


Let us now consider all the intervals of type (a, b] and (c, +∞), and let us define the
following probabilities:

P0 ((a, b]) = F (b) − F (a) and P0 ((c, +∞)) = 1 − F (c).

Using additivity we extend P₀ on the algebra A, getting
$$P_1(A) := \sum_{i=1}^n P_0((a_i, b_i]).$$

Please notice that the domain of P₀ is represented by the class of the intervals, whereas the domain of P₁ is A. For an element A = (a, b] ∈ A, we naturally obtain P₁(A) = P₀(A), and this is why P₁ is called an extension of P₀.
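As a small numerical illustration of how P₀ and P₁ act on intervals and finite unions, here is a sketch; the choice of F as the standard normal distribution function is just an example, not part of the construction.

```python
from statistics import NormalDist

F = NormalDist(mu=0.0, sigma=1.0).cdf   # an example repartition function

def P0(a, b):
    """Probability of the interval (a, b] under F."""
    return F(b) - F(a)

def P1(intervals):
    """Extension of P0 to a finite union of disjoint intervals (a_i, b_i]."""
    return sum(P0(a, b) for a, b in intervals)

A = [(-1.0, 0.0), (1.0, 2.0)]   # a disjoint union, i.e. an element of the algebra A
print(P0(-1.0, 0.0))            # ≈ 0.3413
print(P1(A))                    # ≈ 0.3413 + 0.1359 ≈ 0.4772
```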

Measures and Probabilities:


Let X be a set and Σ an algebra over X. A function µ : Σ → R (where R is to be
intended as the extended real number line) is called a premeasure if it is non-negative,
additive and µ(∅) = 0. If µ : Σ → [0, 1], then µ is called probability.
If Σ is a σ−algebra, then additivity is substituted with countable additivity, also known
as σ−additivity (see in the text). When µ : Σ → [0, 1], then µ is called probability
measure.
The first question we may want to answer is then the following: is P1 well-defined? In
other words, if an element A ∈ A has different representations, does P1 always assign the
same probability to A?
The answer is naturally yes. It is sufficient to set $A = \bigcup_{i=1}^n (a_i, b_i] = \bigcup_{j=1}^m (a_j^*, b_j^*]$. Now we can observe that
$$(a_i, b_i] = (a_i, b_i] \cap A = (a_i, b_i] \cap \left(\bigcup_{j=1}^m (a_j^*, b_j^*]\right) = \bigcup_{j=1}^m \big((a_i, b_i] \cap (a_j^*, b_j^*]\big).$$

But the intervals $(a_i, b_i] \cap (a_j^*, b_j^*]$ are disjoint, hence
$$P_0((a_i, b_i]) = \sum_{j=1}^m P_0\big((a_i, b_i] \cap (a_j^*, b_j^*]\big),$$
and thus
$$P_1(A) = \sum_{i=1}^n \sum_{j=1}^m P_0\big((a_i, b_i] \cap (a_j^*, b_j^*]\big) = \sum_{i=1}^n P_0((a_i, b_i]).$$
Now, if we start from $(a_j^*, b_j^*]$, so that
$$(a_j^*, b_j^*] = \bigcup_{i=1}^n \big((a_j^*, b_j^*] \cap (a_i, b_i]\big),$$
it is easy to verify that we obtain the same value for P₁(A).

Proposition 22. P1 is a probability on A.

Proof. We have to show that

1. P1 (R) = 1;

2. ∀A ∈ A, P1 (A) ≥ 0;

3. For A₁, A₂, ..., Aₙ ∈ A pairwise disjoint, the probability P₁ is (finitely) additive, i.e.
$$P_1\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P_1(A_i).$$
For point 1, just set R = (−∞, +∞). Then
$$P_1((-\infty, +\infty)) = \lim_{c \to -\infty} P_1((c, +\infty)) = \lim_{c \to -\infty} (1 - F(c)) = 1.$$

Point 2 is given by the fact that F is non-decreasing, and P1 ((ai , bi ]) = F (bi ) − F (ai ) ≥ 0.
Point 3 is left as an exercise [Hint: consider $A = \bigcup_{i=1}^n (a_i, b_i]$ and $B = \bigcup_{j=1}^m (a_j^*, b_j^*]$, then verify P₁(A ∪ B) = P₁(A) + P₁(B)].

However, what is really important for us is that P₁ is also σ-additive on A. This means that, for A₁, A₂, ... ∈ A, with Aᵢ ∩ Aⱼ = ∅ for i ≠ j, if $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$ (notice that this is a strict requirement for an algebra), then $P_1\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P_1(A_i)$.
To prove this we will use the following proposition.

Proposition 23. If P is a probability on an algebra E, and if P is continuous from above


in ∅, then P is also σ−additive on the algebra E.

Continuity from above:


A measure µ is continuous from above if, for measurable sets E₁ ⊃ E₂ ⊃ ... ⊃ Eₙ ⊃ ... (an infinite decreasing collection of events), such that µ(E₁) < ∞, we have $\mu\left(\bigcap_{i=1}^{\infty} E_i\right) = \lim_{i\to\infty} \mu(E_i)$.

Proof. Let E₁, E₂, ... ∈ E be pairwise disjoint sets. Let us define $E = \bigcup_{i=1}^{\infty} E_i \in \mathcal{E}$ and $B_n = \bigcup_{i=1}^n E_i$. Clearly we have that E = Bₙ ∪ (E ∩ Bₙᶜ). Since P is a probability on E, it is additive, thus P(E) = P(Bₙ) + P(E ∩ Bₙᶜ).
Now we can observe that Bₙᶜ ↓ Eᶜ, given that Bₙ ↑ E. Hence E ∩ Bₙᶜ ↓ E ∩ Eᶜ = ∅.
But P is continuous from above in ∅, so we know that $\lim_{n\to+\infty} P(E \cap B_n^c) = P(\emptyset) = 0$.
As a consequence:
$$P(E) = P\left(\bigcup_{i=1}^{\infty} E_i\right) = \lim_{n\to+\infty} P(B_n) = \lim_{n\to+\infty} \sum_{i=1}^n P(E_i) = \sum_{i=1}^{\infty} P(E_i).$$

We are now ready to show that P1 is σ−additive on A.


Proposition 24. Given its construction, P1 is a (σ−additive) probability measure on A.
Proof. We know that P1 is a probability on A (non-negative, finitely additive, etc.). Hence
we simply have to show countable additivity.
We divide the proof into 2 parts.

Part 1: Let us assume that $\mathcal{A} \ni A_1 \supset A_2 \supset ... \supset A_n \supset ...$ with $A_n \downarrow \emptyset = \bigcap_{n=1}^{\infty} A_n$, and also that $A_n \subset [-M, M]$, M > 0.
Since $A_n = \bigcup_{i=1}^{m} \big(a_i^{(n)}, b_i^{(n)}\big]$, we can define $B_n = \bigcup_{i=1}^{m} \big(a_i^{(n)} + \delta, b_i^{(n)}\big] \subset A_n$, and also $\bar{B}_n = \bigcup_{i=1}^{m} \big[a_i^{(n)} + \delta, b_i^{(n)}\big] \subset A_n$, δ > 0.
Then we have
$$P_1(A_n \setminus B_n) = P_1(A_n) - P_1(B_n) = \sum_{i=1}^{m} \left[\Big(F\big(b_i^{(n)}\big) - F\big(a_i^{(n)}\big)\Big) - \Big(F\big(b_i^{(n)}\big) - F\big(a_i^{(n)} + \delta\big)\Big)\right] = \sum_{i=1}^{m} \Big(F\big(a_i^{(n)} + \delta\big) - F\big(a_i^{(n)}\big)\Big) \le \frac{\epsilon}{2^n},$$
for every ε > 0, provided that we choose δ properly.


By hypothesis, we know that An ↓ ∅, so that B̄n ↓ ∅ as well. As a consequence

[−M, M ] ∩ Acn ↑ [−M, M ] and [−M, M ] ∩ B̄nc ↑ [−M, M ].

By definition we get
∪∞ c
i=n [−M, M ] ∩ B̄n = [−M, M ],

where [−M, M] is a closed and bounded set.
However, for every n, the set [−M, M] ∩ B̄ₙᶜ is open in [−M, M]. We can then apply the Heine-Borel-Pincherle theorem (a well-known result in topology), for which every open cover of a closed and bounded set admits a finite subcover, i.e.
$$\bigcup_{n=1}^{n_0} \big([-M, M] \cap \bar{B}_n^c\big) = [-M, M].$$

Given this important information, we proceed as follows:
$$\emptyset = [-M, M] \cap [-M, M]^c = [-M, M] \cap \left(\bigcup_{n=1}^{n_0} \big([-M, M] \cap \bar{B}_n^c\big)\right)^{\!c} = [-M, M] \cap \bigcap_{n=1}^{n_0} \big([-M, M]^c \cup \bar{B}_n\big) = [-M, M] \cap \bigcap_{n=1}^{n_0} \bar{B}_n.$$
This implies that $\bigcap_{n=1}^{n_0} \bar{B}_n = \emptyset$.
Hence
$$P_1(A_{n_0}) = P_1\!\left(A_{n_0} \cap \Big(\bigcap_{n=1}^{n_0} B_n\Big)^{\!c}\right) = P_1\!\left(A_{n_0} \cap \bigcup_{n=1}^{n_0} B_n^c\right) \le P_1\!\left(\bigcup_{n=1}^{n_0} (A_n \cap B_n^c)\right).$$
Therefore
$$P_1(A_{n_0}) \le \sum_{n=1}^{n_0} P_1(A_n \cap B_n^c) = \sum_{n=1}^{n_0} P_1(A_n \setminus B_n) \le \sum_{n=1}^{n_0} \frac{\epsilon}{2^n} < \epsilon.$$
In other words, P₁(Aₙ) < ε for n ≥ n₀.

Part 2: Let us discard the hypothesis that Aₙ ⊂ [−M, M]. Let us choose M such that P₁([−M, M]) = P₀([−M, M]) > 1 − ε/2, ε > 0.
We can simply observe that Aₙ = (Aₙ ∩ [−M, M]) ∪ (Aₙ ∩ [−M, M]ᶜ). Hence
$$P_1(A_n) = P_1(A_n \cap [-M, M]) + P_1(A_n \cap [-M, M]^c) \le P_1(A_n \cap [-M, M]) + P_0([-M, M]^c) \le P_1(A_n \cap [-M, M]) + \frac{\epsilon}{2}.$$
The reasoning of Part 1 applies to Aₙ ∩ [−M, M], from which we derive that P₁(Aₙ ∩ [−M, M]) ≤ ε/2, for n ≥ n₀.
Part 1 and Part 2 then simply imply that P₁(Aₙ) ↓ 0 in any case.

The fact that P1 is σ−additive on A is very important. In fact it allows us to use the
powerful Carathéodory Extension Theorem, which we state without proof (a nice one in
Ash[2]).

Theorem 21 (Carathéodory Extension Theorem). Any σ−additive measure defined on an


algebra E can be uniquely extended to the σ−algebra σ(E) generated by the algebra itself.

For us this means that P1 can be uniquely extended to σ(A) = B(R). In other words,
there exists one and only one probability measure P on the equipped space (R, B(R)), such
that P (A) = P1 (A) for every A ∈ A, i.e. such that P ((a, b]) = P1 ((a, b]) = P0 ((a, b]) =
F (b) − F (a). The measure P that we obtain as extension of P1 on σ(A) = B(R) is the
famous Lebesgue-Stieltjes probability measure.
We have so concluded the construction of the probability space (R, B(R), P ) .

6.1.3 Random variables and densities


From now on, we will call random variable any measurable function X : (Ω, F) → (R, B(R)),
that is to say a function defined on Ω with values in R, such that {ω : X(ω) ∈ B} ∈ F,
∀B ∈ B(R).
The probability distribution (or law) of X on (R, B(R)) is then defined as
$$P_X(A) := P(\{\omega : X(\omega) \in A\}) = P\big(X^{-1}(A)\big), \qquad A \in \mathcal{B}(\mathbb{R}).$$

The (cumulative) distribution function is given by
$$F_X(x) := P_X((-\infty, x]) = P\big(X^{-1}((-\infty, x])\big), \qquad x \in \mathbb{R}.$$

At this point, a natural question arises. Are there any probability space (Ω, F, Q) and
random variable X such that

QX ((−∞, x]) = Q({ω : X(ω) ≤ x}) = F (x)?

The answer is clearly yes, since it is sufficient to set the following:


• Ω = R;
• F = B(R);
• Q((a,b])=P((a,b])=F(b)-F(a);
• And finally X(ω) = ω.

6.2 The construction on Rn


The construction of a probability space on Rn simply generalizes what we have seen so far
for R.
Let Rn = R × ... × R represent the space of the ordered n-tuples (x1 , x2 , ..., xn ), −∞ <
xk < +∞, k = 1, 2, ..., n. From now on we will use the notation x = (x1 , x2 , ..., xn ).
Set Ik = (ak , bk ] and define I = I1 × I2 × ... × In . The set I is called rectangle of size n,
and I1 , I2 , ..., In are its edges. Naturally we have

I = {x ∈ Rn : x1 ∈ I1 , ..., xn ∈ In }.

Similarly to what we have seen in the unidimensional case, if I is the class of all the
rectangles of size n, which also forms an algebra, we call Borel σ−algebra of Rn the smallest
σ−algebra containing I, i.e. σ(I) = B(Rn ).
There is an interesting relationship between B(Rn ) and B(R), namely
n
O
B(R) ⊗ B(R) ⊗ ... ⊗ B(R) = B(R) = σ(I) = B(Rn ).
k=1

The proof is left as an exercise.


In order to define a proper probability measure on (Rn , B(Rn )) we can exactly follow the
procedure we used for the univariate case.
Set Fn : Rn → [0, 1] to be a multivariate repartition function such that:

1. limxi →−∞ Fn (x1 , ..., xi , ..., xn ) = 0, for i = 1, 2, ..., n;

2. limx1 →+∞,...,xn →+∞ Fn (x1 , ..., xi , ..., xn ) = 1;

3. limx1 ↓c1 ,...,xn ↓cn Fn (x1 , ..., xn ) = Fn (c1 , ..., cn ).

4. $\Delta_{a_1}^{b_1}\Delta_{a_2}^{b_2}\cdots\Delta_{a_n}^{b_n} F_n(x_1, ..., x_n) \ge 0$, with aᵢ < bᵢ, i = 1, 2, ..., n, and where
$$\Delta_{a_i}^{b_i} F_n(x_1, ..., x_i, ..., x_n) = F_n(x_1, ..., b_i, ..., x_n) - F_n(x_1, ..., a_i, ..., x_n).$$
Now define the following probability on the intervals (rectangles) of Rⁿ:
$$P_0(I_1 \times I_2 \times \cdots \times I_n) := \Delta_{a_1}^{b_1}\Delta_{a_2}^{b_2}\cdots\Delta_{a_n}^{b_n} F_n(x_1, ..., x_n) \ge 0.$$

The same reasoning we used for the univariate case makes us now conclude that there
exists a unique probability measure P on (Rn , B(Rn )) such that

P ((a1 , b1 ] × ... × (an , bn ]) = Pn (I1 × ... × In ) = P0 (I1 × ... × In ).

In a nutshell: using additivity we extend P0 on I, defining Pn , then we can show that


such a probability is also σ−additive and, using the Carathéodory Extension Theorem, we
define a unique probability measure on σ(I) = B(Rn ).

Given a general probability space (Ω, F, P ), we call random vector every multivariate func-
tion X from Ω to Rn , such that {ω : (X1 (ω), ..., Xn (ω)) ∈ A} ∈ F, for every A ∈ B(Rn ).

Exercise: Given the repartition function F (x1 , ..., xn ) of a random vector X, is there
any specific space (Ω, F, P ) such that

P ({ω : X1 (ω) ≤ x1 , ..., Xn (ω) ≤ xn }) = F (x1 , ..., xn )?

6.3 The construction on R∞


We now turn our attention on Ω = R∞ = R × R × ..., that is to say the space of the real
sequences (x1 , x2 , ...) with xk ∈ R ∀k ≥ 1.
The present section can be considered a step by step constructive proof of the famous
Kolmogorov Extension Theorem , which we will state at the end of the section. The
procedure is more or less the same, but there are some important elements we need to take

into careful consideration.


In this case the building block is represented by the so-called cylinder of basis A ∈ B(Rn )
(also known as the cylinder of Kolmogorov) :

In (A) := {(x1 , x2 , ..., xn , ...) ∈ R∞ : (x1 , x2 , ..., xn ) ∈ A}.

By letting A vary in B(Rⁿ), together with n = 1, 2, ..., we obtain a class C of cylinders (i.e. sets of sequences), which forms an algebra (once we enrich it with ∅). In fact:
• R∞ ∈ C, since it is sufficient to consider A = Rⁿ.
• (Iₙ(A))ᶜ = Iₙ(Aᶜ) ∈ C, given that, if A ∈ B(Rⁿ), then Aᶜ ∈ B(Rⁿ).

• If A ∈ B(Rʰ) and B ∈ B(Rᵏ), k ≥ h, then
$$I_h(A) \cup I_k(B) = \{(x_1, x_2, ..., x_n, ...) \in \mathbb{R}^{\infty} : (x_1, ..., x_h) \in A \ \vee\ (x_1, ..., x_k) \in B\}$$
$$= \{(x_1, x_2, ..., x_n, ...) \in \mathbb{R}^{\infty} : (x_1, ..., x_k) \in A \times \mathbb{R}^{k-h} \ \vee\ (x_1, ..., x_k) \in B\}$$
$$= \{(x_1, x_2, ..., x_n, ...) \in \mathbb{R}^{\infty} : (x_1, ..., x_k) \in (A \times \mathbb{R}^{k-h}) \cup B\} \in \mathcal{C},$$
given that $(A \times \mathbb{R}^{k-h}) \cup B \in \mathcal{B}(\mathbb{R}^k)$.


Now, let B(R∞ ) be the σ−algebra generated by C. It is very important to observe the
following.
Proposition 25. The σ-algebra B(R∞) contains all the cylinders of C, but also all the following sets of sequences (a and c are scalars):
1. {(x₁, x₂, ..., xₙ, ...) ∈ R∞ : supₙ xₙ > a};
2. {(x₁, x₂, ..., xₙ, ...) ∈ R∞ : supₙ xₙ < a};
3. {(x₁, x₂, ..., xₙ, ...) ∈ R∞ : (xₙ)ₙ≥₁ converges};
4. {(x₁, x₂, ..., xₙ, ...) ∈ R∞ : $\sum_{i=1}^n x_i$ converges as n → +∞};
5. {(x₁, x₂, ..., xₙ, ...) ∈ R∞ : $n^{-1}\sum_{i=1}^n x_i \to c$ for n → +∞}.
Proof. We will prove points 1 and 5. The remaining ones are left as exercise.
Point 1: We have that

{(x1 , x2 , ..., xn , ...) ∈ R∞ : sup xn > a} =


n
= ∪∞
n=1 {(x1 , x2 , ..., xn , ...) ∈ R

: xn > a}
= ∪∞
n=1 {(x1 , x2 , ..., xn , ...) ∈ R

: (x1 , x2 , ..., xn ) ∈ |R × {z
... × R} ×(a, +∞)}
n−1 times
= ∪∞
n=1 In (R
| × {z
... × R} ×(a, +∞)) ∈ σ(C) = B(R ). ∞

n−1 times

Pn
= ∩∞ ∞ ∞ ∞ : |n−1

Point 5: Set D P k=1 ∪N =1 ∩n=N (x1 , x2 , ..., xn , ...) ∈ R i=1 xi − c| < 1/k . We
that n−1 ni=1 xi → c, when n → +∞, iff ∀K > 0 ∃N = N (K) such that, ∀n ≥ N ,
have P
|n−1 ni=1 xi − c| < 1/K.
But
n
( )
X
∞ −1
(x1 , x2 , ..., xn , ...) ∈ R : |n xi − c| < 1/K = In (Bn )
i=1

is a cylinder with basis


n
( )
X
Bn = (x1 , x2 , ..., xn ) ∈ Rn : |n−1 xi − c| < 1/K ∈ B(Rn ).
i=1

That means that D corresponds to the countable union and intersection of cylinders, hence
D ∈ B(R∞ ).

6.3.1 A probability on (R∞ , B(R∞ ))


In order to define a proper probability measure on (R∞ , B(R∞ )) we will proceed as follows:

1. Let us assume that for the equipped spaces (R, B(R)), (R2 , B(R2 )), ..., (Rn , B(Rn )),
we have constructed the corresponding probability measures P1 , P2 , ..., Pn (the index
n identifies the equipped space of reference).

2. Let us impose the following compatibility condition, also known as consistency , which
was introduced by Kolmogorov:

Pn (A × R) = Pn−1 (A), ∀A ∈ B(Rn−1 ), n ≥ 2. (6.1)

3. Let us require that the probability P , which we are building on (R∞ , B(R∞ )), is such
that, for each cylinder In (A) of basis A ∈ B(Rn ),

P (In (A)) = Pn (A), ∀A ∈ B(Rn ), n ≥ 1.

Proposition 26. P is well-defined, that is to say it always assigns the same mass to the
same cylinder, notwithstanding its possible alternative representations.

Proof. Suppose that a cylinder In (A), with basis A ∈ B(Rn ), may also be represented as
In+k (B), with basis B ∈ B(Rn+k ), k ≥ 0.
This means that we can write In (A) = In+k (B) = In+k (A × Rk ). As a consequence

Pn+k (A × Rk ) = Pn+k (B).



The compatibility condition2 then tells us that

Pn+k (A × Rk ) = Pn+k−1 (A × Rk−1 ) = ... = Pn+1 (A × R) = Pn (A).

Hence
Pn (A) = P (In (A)) = Pn+k (B) = P (In+k (B)).

The next step, as before, is to show that P is additive on C, with P(Iₙ(A)) = Pₙ(A).
Let us assume that I_m(A₁), I_m(A₂), ..., I_m(A_k) are disjoint cylinders with disjoint bases A₁, ..., A_k (if they are not, they can always be transformed into a new set of disjoint cylinders with standard set operations). Without any loss of generality, let us also assume that Aᵢ ∈ B(Rᵐ).
Then we have
$$\bigcup_{j=1}^k I_m(A_j) = \left\{(x_1, x_2, ..., x_n, ...) \in \mathbb{R}^{\infty} : (x_1, ..., x_m) \in \bigcup_{j=1}^k A_j\right\} = I_m\!\left(\bigcup_{j=1}^k A_j\right).$$
Therefore
$$P\!\left(\bigcup_{j=1}^k I_m(A_j)\right) = P\!\left(I_m\!\left(\bigcup_{j=1}^k A_j\right)\right) = P_m\!\left(\bigcup_{j=1}^k A_j\right) = \sum_{j=1}^k P_m(A_j) = \sum_{j=1}^k P(I_m(A_j)).$$

Additivity is a very useful property but, unfortunately, it is not sufficient to apply Carathéodory
Extension Theorem, as we plan to do. This is why we will now prove that P is also
σ−additive on C, that is to say P is a proper probability measure on C.
To show this, it is sufficient to prove that P is continuous from above in ∅.

Proposition 27. Let (Iₙ(Aₙ))ₙ≥₁ be a sequence of cylinders in C such that Iₙ(Aₙ) ↓ ∅, Aₙ ∈ B(Rⁿ). Then $\lim_{n\to+\infty} P(I_n(A_n)) = 0$.
Proof. To prove our proposition we will show that if $\lim_{n\to+\infty} P(I_n(A_n)) = \lambda > 0$, then $\bigcap_{n=1}^{\infty} I_n(A_n) \ne \emptyset$, and this is in contradiction with the statement of the proposition.
For every basis Aₙ, let us now consider a compact set Bₙ ⊂ Aₙ, such that $P_n(A_n \setminus B_n) \le \lambda/2^{n+1}$. This is nothing more than a property of the measure on (Rⁿ, B(Rⁿ)) that we can derive from P₁ on (R, B(R)).
A direct consequence is then
$$P_n(A_n \setminus B_n) = P(I_n(A_n) \setminus I_n(B_n)) \le \lambda/2^{n+1}.$$


² It is worth noticing the following. If Fₙ is the repartition function used to define Pₙ, we have that, ∀A ∈ B(Rⁿ⁻¹), $P_n(A \times \mathbb{R}) = \int_{A\times\mathbb{R}} dF_n(x_1, ..., x_n) = \int_A dF_n(x_1, ..., x_{n-1}, +\infty)$. This means that the compatibility condition can also be expressed in terms of Fₙ, i.e. $F_n(x_1, ..., x_{n-1}, +\infty) = F_{n-1}(x_1, ..., x_{n-1})$, for $(x_1, ..., x_{n-1}) \in \mathbb{R}^{n-1}$, n ≥ 2.

Let us now consider the cylinder $\hat{C}_n = \bigcap_{j=1}^n I_j(B_j)$. Then
$$P(I_n(A_n) \setminus \hat{C}_n) = P\!\left(I_n(A_n) \cap \Big(\bigcap_{i=1}^n I_i(B_i)\Big)^{\!c}\right) = P\!\left(\bigcup_{i=1}^n I_n(A_n) \cap (I_i(B_i))^c\right) = P\!\left(\bigcup_{i=1}^n I_n(A_n) \cap I_i(B_i^c)\right).$$
By hypothesis, the sequence of cylinders (Iₙ(Aₙ))ₙ≥₁ is decreasing, thus
$$P(I_n(A_n) \setminus \hat{C}_n) \le P\!\left(\bigcup_{i=1}^n I_i(A_i) \cap I_i(B_i^c)\right) \le \sum_{i=1}^n P\big(I_i(A_i) \cap I_i(B_i^c)\big) \le \sum_{i=1}^n P_i(A_i \cap B_i^c) = \sum_{i=1}^n P_i(A_i \setminus B_i) \le \sum_{i=1}^{\infty} P_i(A_i \setminus B_i) \le \frac{\lambda}{2}.$$

By choice Bᵢ ⊂ Aᵢ, so that Iᵢ(Bᵢ) ⊂ Iᵢ(Aᵢ) and $\bigcap_{i=1}^n I_i(B_i) \subset \bigcap_{i=1}^n I_i(A_i) \subset I_n(A_n)$. Then we also have Ĉₙ ⊂ Iₙ(Aₙ). Therefore
$$P(I_n(A_n) \setminus \hat{C}_n) = P(I_n(A_n)) - P(\hat{C}_n) \le \frac{\lambda}{2},$$
and
$$P(\hat{C}_n) \ge P(I_n(A_n)) - \frac{\lambda}{2}, \qquad \forall n \ge 1,$$
i.e. Ĉₙ is not empty for n ≥ 1. We are thus ready to show the contradiction.
(n) (n)
Since Ĉn is not empty for n ≥ 1, there exists a sequence xn = (x1 , x2 , ...) ∈ Ĉn ⊂ Ij (Bj ),
(n) (n) (n)
for j = 1, 2, ..., n, such that (x1 , x2 , ..., xj ) ∈ Bj , for n ≥ j.
(n)
But when n varies, x1 belongs to B1 , i.e. a compact set. This means that we can identify
(n )
a subsequence of {n}, which we call {n1 }, such that x1 1 → x1 ∈ B1 . The sequence
(n ) (n )
(x1 1 , x2 1 ) belongs to B2 , hence there exists another subsequence of {n1 }, say {n2 },
(n ) (n )
such that (x1 2 , x2 2 ) → (x1 , x2 ) ∈ B2 . Going on with this reasoning, we can obtain the
subsequence {nj } of {n} for which
(nj ) (nj ) (nj )
(x1 , x2 , ..., xj ) → (x1 , x2 , ..., xj ) ∈ Bj , j = 1, 2, ...
This means that we have generated a sequence (x1 , x2 , ...) ∈ R∞ such that (x1 , ..., xj ) ∈ Bj .
But this is a cylinder, i.e. (x1 , ..., xj ) ∈ Ij (Bj ) ∀j. As a consequence, (x1 , x2 , ...) ∈ Ĉn for
every n ≥ 1. In other words Ĉn does not tend to ∅, and this is the contradiction we were
waiting for.
Given all the previous results, we can now apply Carathéodory Extension Theorem to
(R∞ , B(R∞ )), finally obtaining the measure P we were looking for.
As anticipated at the beginning of the section, this long construction is nothing more than
the formal proof of the following theorem.

Theorem 22 (Kolmogorov Extension Theorem). Let Pn be a probability measure on


(Rn , B(Rn )), n = 1, 2, ..., which satisfies the compatibility condition of equation (6.1). Then
there exists a unique probability measure P on (R∞ , B(R∞ )) such that

P (In (A)) = Pn (A), ∀A ∈ B(Rn ), n ≥ 1.

The Kolmogorov Extension Theorem allows us to build a discrete-time stochastic process (Xₙ)ₙ≥₁ with a given probability law. In fact, we can build a generic probability space (Ω, F, P) and a sequence of random variables (Xₙ)ₙ≥₁, defined on it, such that
$$P(\{\omega \in \Omega : X_1(\omega) \le x_1, ..., X_n(\omega) \le x_n\}) = F_n(x_1, ..., x_n).$$
Using Fₙ we can construct a unique probability measure Pₙ on (Rⁿ, B(Rⁿ)), i.e.
$$P_n(\{(\omega_1, ..., \omega_n) \in \mathbb{R}^n : \omega_1 \le x_1, ..., \omega_n \le x_n\}) = F_n(x_1, ..., x_n).$$
Thanks to the Kolmogorov Extension Theorem we can then build a measure P on (R∞, B(R∞)) such that P({ω ∈ R∞ : (ω₁, ..., ωₙ) ∈ A}) = Pₙ(A), A ∈ B(Rⁿ). If Xₙ(ω) = ωₙ we then have
$$P(\{\omega \in \Omega : X_1(\omega) \le x_1, ..., X_n(\omega) \le x_n\}) = P(\{\omega \in \mathbb{R}^{\infty} : (\omega_1, ..., \omega_n) \in (-\infty, x_1] \times \cdots \times (-\infty, x_n]\}) = F_n(x_1, ..., x_n).$$

Example: Let us consider the following multivariate repartition function
$$F_n(x_1, ..., x_n) = \prod_{i=1}^n F(x_i),$$
where F is a standard repartition function. Let us use Fₙ to build Pₙ on (Rⁿ, B(Rⁿ)).

where F is a standard repartition function. Let us use Fn to build Pn on (Rn , B(Rn )).
It is easy to verify that:
1. Fn+1 (x1 , ..., xn , xn+1 ) = Fn (x1 , ..., xn )F (xn+1 );

2. limxn+1 →+∞ Fn+1 (x1 , ..., xn+1 ) = Fn (x1 , ..., xn ).


This guarantees that all the Pn are compatible for n ≥ 1. Hence, there exists a unique
probability measure on (R∞ , B(R∞ )) such that

P ({(x1 , x2 , ..., xn , ...) ∈ R∞ : (x1 , ..., xn ) ∈ (−∞, y1 ] × ... × (−∞, yn ]}) = Fn (y1 , ..., yn ).

The sequence of random variables (Xₙ)ₙ≥₁, such that Xₙ(x) = xₙ, for x = (x₁, ..., xₙ, ...) ∈ R∞, is characterized by the n-dimensional probability law $F_n(x_1, ..., x_n) = \prod_{i=1}^n F(x_i)$, which is exactly the law of a sequence of i.i.d. random variables.

6.4 The construction on RT


What we have seen so far can be further generalized by considering a construction on $\mathbb{R}^T$, where T may represent very different sets. For example, if T = {1, 2, ..., n} then $\mathbb{R}^T = \mathbb{R}^n$, if T = {1, 2, ...} then $\mathbb{R}^T = \mathbb{R}^{\infty}$, and if T = [0, +∞) then $\mathbb{R}^T = \mathbb{R}^{[0,+\infty)}$. In general $\mathbb{R}^T = \prod_{t \in T} \mathbb{R}$.
Here the basic object is a cylinder of basis A , defined as

It1 ,t2 ,...,tn (A) := {(xt )t∈T ∈ RT : (xt1 , xt2 , ..., xtn ) ∈ A},

where A ∈ B(Rn ), t1 , t2 , ..., tn ∈ T .


The class that we obtain by letting A vary in B(Rn ), {t1 , ..., tn } ∈ T and n = 1, 2, ... is an
algebra (to prove this is a good exercise). We call it CT .
The smallest σ−algebra containing CT is σ(CT ), and it is called Borel σ−algebra of CT , or
B(RT ).
It is interesting to notice the following (the proof is in [1]): absolutely no set in $\mathcal{B}(\mathbb{R}^T)$ may depend on an uncountable infinity of coordinates. This means that the set
$$B = \{(x_t)_{t \in T} \in \mathbb{R}^T : x_t \text{ is continuous for } 0 \le t \le 1\} \notin \mathcal{B}(\mathbb{R}^T).$$

For what concerns the probability measure, let us assume that, for every t1 , ..., tn ∈ T , n ≥
1, we have defined P t1 ,...,tn on (Rn , B(Rn )). Let also assume that the following compatibility
conditions hold:

1. P t1 ,...,tn (A1 ×...×An ) = P ts1 ,...,tsn (As1 ×...×Asn ), where (s1 , ..., sn ) is any permutation
of (1, ..., n), n ≥ 1.

2. P t1 ,...,tn (A × R) = P t1 ,...,tn−1 (A), ∀A ∈ B(Rn−1 ).

Exercise: Let Ft1 ,...,tn (x1 , ..., xn ) be the repartition function that generates P t1 ,...,tn . How
can we redefine the compatibility conditions in terms of Ft1 ,...,tn ?

From now on the procedure is more or less the same we have used in the R∞ case. There-
fore, we leave the next steps to the reader, as stated in the following exercise.

Exercise: Show that P t1 ,...,tn is well-defined, additive on CT and, finally, σ−additive (for
this last point, it is sufficient to show that P = P t1 ,...,tn is continuous from above in ∅).

Even in this case we can use the Kolmogorov Extension Theorem to define a unique prob-
ability measure on (RT , B(RT )), starting from P = P t1 ,...,tn on (Rn , B(Rn )). Naturally,
P = P t1 ,...,tn must fulfill the compatibility conditions we have given in points 1. and 2.
The obvious consequence of all this is that we can now define a family of random variables

{Xt }t∈T with a given distribution, that is to say a stochastic process. In particular, if
Xt (ω) = ωt , ω ∈ RT and t ∈ T , then we have

P ({ω ∈ RT : Xt1 (ω) ≤ x1 , ..., Xtn (ω) ≤ xn }) = Ft1 ,...,tn (x1 , ..., xn ).
Chapter 7

The Brownian motion

This chapter is devoted to the introduction of the Brownian motion (BM), one of the build-
ing blocks of stochastic calculus and of many fundamental models of mathematical finance.
We first introduce the Wiener Measure as a natural probability measure on (R[0,+∞) , B(R[0,+∞) )).
We then pass to the construction of the Brownian motion, through the use of equivalent
processes, in order to overcome some problems about continuity. Finally, we study the
most important properties of BM, its variation and how to extend it to the multivariate
case.

7.1 The Wiener Measure


Let us consider the equipped space (R[0,+∞) , B(R[0,+∞) )).
Let t0 = 0 < t1 < ... < tn be n arbitrary points in [0, +∞) and let us define
$$F_{t_1,t_2,...,t_n}(x_1, x_2, ..., x_n) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}\cdots\int_{-\infty}^{x_n} p(t_1; 0, y_1)\,p(t_2 - t_1; y_1, y_2)\cdots p(t_n - t_{n-1}; y_{n-1}, y_n)\,dy_1\cdots dy_n, \qquad (7.1)$$
where
$$p(t; x, y) = \frac{1}{\sqrt{2\pi t}}\,e^{-\frac{1}{2t}(y-x)^2}, \qquad t > 0.$$
Lemma 5. The function
$$\prod_{i=1}^n p(t_i - t_{i-1}; y_{i-1}, y_i) = \frac{1}{\sqrt{2\pi t_1}}\,e^{-\frac{1}{2t_1}y_1^2}\cdots\frac{1}{\sqrt{2\pi(t_n - t_{n-1})}}\,e^{-\frac{(y_n - y_{n-1})^2}{2(t_n - t_{n-1})}}$$
is a probability density in (y₁, ..., yₙ) ∈ Rⁿ.


Proof. Just notice that:
• The quantity $\prod_{i=1}^n p(t_i - t_{i-1}; y_{i-1}, y_i)$ is always non-negative.
• $\prod_{i=1}^n p(t_i - t_{i-1}; y_{i-1}, y_i)$ is measurable and continuous.
• $\int_{\mathbb{R}^n} \prod_{i=1}^n p(t_i - t_{i-1}; y_{i-1}, y_i)\,dy_1\cdots dy_n = 1$.
To show this last statement, just take into consideration
$$\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty} \prod_{i=1}^n p(t_i - t_{i-1}; y_{i-1}, y_i)\,dy_1\cdots dy_n.$$
Set x₁ = y₁, x₂ = y₂ − y₁ and so on up to xₙ = yₙ − yₙ₋₁, so that y₁ = x₁, y₂ = x₁ + x₂, ..., yₙ = $\sum_{i=1}^n x_i$. This leads to
$$\prod_{i=1}^n \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi(t_i - t_{i-1})}}\,e^{-\frac{x_i^2}{2(t_i - t_{i-1})}}\,dx_i = \prod_{i=1}^n 1 = 1.$$

Now, let us consider (s1 , s2 , ..., sn ) distinct points in (0, +∞) (to be more exact we can
take s0 = 0, but si > 0 for all i = 1, ..., n). Then let us define the following probability
$$P^{(s_1,...,s_n)}(A_1 \times \cdots \times A_n) = \int_{A_{(1)} \times \cdots \times A_{(n)}} p(s_{(1)}; 0, y_1)\cdots p(s_{(n)} - s_{(n-1)}; y_{n-1}, y_n)\,dy_1\cdots dy_n, \qquad (7.2)$$
where $(s_{(1)}, ..., s_{(n)})$ is the permutation of $(s_1, s_2, ..., s_n)$ that puts the elements in increasing order (think about order statistics, if you wish), and $A_{(1)}, ..., A_{(n)}$ is the corresponding ordered n-tuple of the sets A₁, ..., Aₙ.
It is not difficult to see that the probabilities $P^{(s_1,...,s_n)}$, for n varying, are compatible. In fact our definition of $P^{(s_1,...,s_n)}$ is naturally immune to permutations of the elements (s₁, s₂, ..., sₙ), and
$$P^{(s_1,...,s_n)}(A_1 \times \cdots \times A_{n-1} \times \mathbb{R}) = P^{(s_1,...,s_{n-1})}(A_1 \times \cdots \times A_{n-1}).$$

At this point, Kolmogorov Extension Theorem tells us that there exists a unique probability
measure on (R[0,+∞) , B(R[0,+∞) )) with all the properties we know. This measure is known
as Wiener Measure, and it is a fundamental part of what we are going to see in the next
sections.

7.2 Defining the Brownian motion


We call Brownian Motion or Wiener Process a family of random variables {B(t, ω)}t≥0 ,
which are defined on a probability space (Ω, F, P ), with values in R, such that1
¹ Sometimes, here below, in order to simplify the notation, we may drop ω, so that B(t, ω) = B(t).

1. For 0 = t0 < t1 < ... < tn < +∞, the increments B(t1 ), B(t2 ) − B(t1 ), ...., B(tn ) −
B(tn−1 ) are independent random variables;
2. Each increment B(t) − B(s), s < t, follows a normal distribution with mean 0 and
variance t − s, i.e. E[B(t) − B(s)] = 0 and V ar[B(t) − B(s)] = t − s;
3. The trajectories t → B(t) are continuous almost surely, that is to say with unitary
probability.
All these three properties are fundamental for the definition of a Brownian motion2 . Figure
7.1 gives an example of Brownian motion.

Figure 7.1: Example of Brownian Motion B(t) and increment B(tj ) − B(ti ).
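As a quick sanity check of properties 1. and 2., here is a minimal simulation on a discrete grid; the horizon, step size and sample size are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 1000, 10_000
dt = T / n_steps

# B(0) = 0 and independent N(0, dt) increments on each step
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)              # B[:, k] approximates B((k+1)*dt)

# Increment B(1) - B(0.5): should be N(0, 0.5) and independent of B(0.5)
incr = B[:, -1] - B[:, n_steps // 2 - 1]
print("mean ≈", incr.mean(), "  var ≈", incr.var())                               # ≈ 0 and ≈ 0.5
print("corr with B(0.5) ≈", np.corrcoef(incr, B[:, n_steps // 2 - 1])[0, 1])      # ≈ 0
```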

Given the approach we have followed so far, we are immediately inclined to ask whether
there really exists a probability space (Ω, F, P ), on which we can define the family {B(t)}t≥0 ,
which satisfies the points 1.-3. above.
The answer is naturally yes, but some considerations are needed.

7.2.1 The Brownian motion and (R[0,+∞) , B(R[0,+∞) ))


In Section 7.1 we have seen how to build a probability measure P , the Wiener measure, on
the space of the functions defined in [0, +∞) with values in R. That P is constructed via
² We will see later on that, most of the time, we also ask for the additional property B(0) = 0.

the repartition function given in equation (7.1). For every A ∈ B(Rⁿ) and 0 < t₁ < ... < tₙ < +∞, we have
$$P\left((x_t)_{t\ge0} \in \mathbb{R}^{[0,+\infty)} : (x_{t_1}, ..., x_{t_n}) \in A\right) = \int_A p(t_1; 0, y_1)\cdots p(t_n - t_{n-1}; y_{n-1}, y_n)\,dy_1\cdots dy_n.$$

If we set ω = (xt )t≥0 , the family B(t, ω) := xt , t ≥ 0, satisfies points 1. and 2. in the
definition of the Brownian motion.
In fact the density of the vector (B(t1 ), ..., B(tn )) in (x1 , ..., xn ) is equal to
n
!
1 1 X (xi − xi−1 )2
p(t1 ; 0, x1 ) · · · p(tn −tn−1 ; xn−1 , xn ) = qQ exp − ,
(2π)n/2 n
(t − t ) 2 ti − ti−1
j=1 j j−1 i=1

with t0 = 0 and x0 = 0.
The density of (B(t₁), B(t₂) − B(t₁), ..., B(tₙ) − B(tₙ₋₁)) is then obtained using the transformation x = Jy, with x = (x₁, ..., xₙ), y = (y₁, ..., yₙ), and where
$$J = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ 1 & 1 & 0 & \cdots & 0\\ 1 & 1 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & 1 & 1 & \cdots & 1 \end{pmatrix}$$
is a (lower) triangular matrix. First notice that with this transformation y₁ = x₁, y₂ = x₂ − x₁, and so on. Moreover, notice that the Jacobian determinant |J| is equal to 1.
Hence the density of the increments in y₁, ..., yₙ is
$$\phi(y_1, ..., y_n) = \frac{1}{\sqrt{2\pi t_1}}\,e^{-\frac{y_1^2}{2t_1}}\,\frac{1}{\sqrt{2\pi(t_2 - t_1)}}\,e^{-\frac{y_2^2}{2(t_2 - t_1)}}\cdots\frac{1}{\sqrt{2\pi(t_n - t_{n-1})}}\,e^{-\frac{y_n^2}{2(t_n - t_{n-1})}}.$$
This shows that the increments are independent, normally distributed and

E[B(t) − B(s)] = 0 and V ar[B(t) − B(s)] = t − s, t > s.

Points 1. and 2. are thus satisfied. For what concerns point 3., on the contrary, things are
not that easy.
In fact, the process that we have defined using the Wiener measure, is a process on R[0,+∞) ,
the space of functions that are defined in [0, +∞) with values in R. But this space is NOT
the space of continuous functions C[0,+∞) , as required by point 3., which states: “the
trajectories t → B(t) are continuous almost surely”.
Naturally we could solve the problem by showing that $P(C_{[0,+\infty)}) = 1$. However, $C_{[0,+\infty)} \notin \mathcal{B}(\mathbb{R}^{[0,+\infty)})$, therefore $P(C_{[0,+\infty)})$ is not even defined! So what?! Is point 3. completely meaningless?
Fortunately we can solve this problem by introducing the concept of equivalent process.

Definition 17 (Modification). Let {Xt }t∈T and {Yt }t∈T be two stochastic processes defined
on the same probability space (Ω, F, P ). Then we say that {Xt }t∈T is a modification of
{Yt }t∈T , if
P ({ω ∈ Ω : Xt (ω) = Yt (ω)}) = 1, ∀t ∈ T.

In other words, if {Xt }t∈T is a modification of {Yt }t∈T , then the two processes have the
same distribution, even if their trajectories can be quite different.
Two processes are called equivalent when they are one the modification of the other.

Lemma 6. If {Xt }t∈T and {Yt }t∈T are two equivalent processes defined on (Ω, F, P ), then
they possess the same finite dimensional laws.

Proof. Set Aₜ = {ω ∈ Ω : Xₜ(ω) ≠ Yₜ(ω)}. If {Xₜ}t∈T and {Yₜ}t∈T are equivalent, then P(Aₜ) = 0 for all t ∈ T.
Let us consider

Act1 ∩ Act2 ∩ ... ∩ Actn = {ω ∈ Ω : Xt1 (ω) = Yt1 (ω), Xt2 (ω) = Yt2 (ω), ..., Xtn (ω) = Ytn (ω)}.

Then
$$P(A_{t_1}^c \cap A_{t_2}^c \cap ... \cap A_{t_n}^c) = 1 - P(A_{t_1} \cup A_{t_2} \cup ... \cup A_{t_n}) \ge 1 - \sum_{i=1}^n P(A_{t_i}) = 1.$$

Therefore

P (Act1 ∩ ... ∩ Actn ) = P ({ω ∈ Ω : Xt1 (ω) − Yt1 (ω) = 0, ..., Xtn (ω) − Ytn (ω) = 0}) = 1.

As a consequence
$$P\left(\omega \in \Omega : \sum_{i=1}^n \alpha_i X_{t_i}(\omega) = \sum_{i=1}^n \alpha_i Y_{t_i}(\omega)\right) = 1,$$
where αᵢ is a weight, i = 1, ..., n.
This means that $\sum_{i=1}^n \alpha_i X_{t_i}(\omega)$ and $\sum_{i=1}^n \alpha_i Y_{t_i}(\omega)$ have the same distributions, that is (in terms of characteristic functions)
$$E\left[e^{i\tau\sum_{j=1}^n \alpha_j X_{t_j}(\omega)}\right] = E\left[e^{i\tau\sum_{j=1}^n \alpha_j Y_{t_j}(\omega)}\right].$$
Now, set ταⱼ = zⱼ, so that
$$E\left[e^{i\sum_{j=1}^n z_j X_{t_j}(\omega)}\right] = E\left[e^{i\sum_{j=1}^n z_j Y_{t_j}(\omega)}\right].$$

This implies that the vectors (Xt1 (ω), ..., Xtn (ω)) and (Yt1 (ω), ..., Ytn (ω)) follow the same
probability law.

We are finally ready to solve the problem related to point 3. in the definition of the
Brownian motion. The answer is given by the following theorem, which we do not prove.

Theorem 23 (Continuous modification of Kolmogorov). Let {Xt (ω)}t≥0 be a process for


which, for every T > 0, there exist positive constants a, b, D, such that

E[|Xt (ω) − Xs (ω)|a ] ≤ D|t − s|1+b , 0 ≤ s ≤ t ≤ T.

Then {Xt (ω)}t≥0 admits a continuous modification with probability 1.

It is now sufficient to notice that, for a = 4, b = 1 and D = 3, the process {B(t)}t≥0 is


such that
E[|B(t) − B(s)|4 ] = 3V ar[B(t) − B(s)]2 = 3(t − s)2 .
Hence Theorem 23 tells us that B(t) admits, with probability 1, a modification with con-
tinuous trajectories.
From now on, when we speak about the Brownian motion, we will refer to its continuous
modification, so that point 3. is respected.

7.2.2 The Brownian motion as Gaussian process


Let us consider the following definition.

Definition 18 (Gaussian process). A process {Xt }t≥0 is called Gaussian if, for every
integer k ≥ 1 and 0 < t1 < t2 < ... < tk < +∞, the joint distribution of the random vector
(Xt1 (ω), ..., Xtk (ω)) is Gaussian.

The finite dimensional distributions of a Gaussian process are determined by the vector
(µ(t1 ), ..., µ(tk )), where µ(t) = E[Xt ], t ≥ 0, and by the variance-covariance matrix Σ, with
components ρ(ti , tj ) = E[(Xti − µ(ti ))(Xtj − µ(tj ))], ti , tj ≥ 0, 1 ≤ i, j ≤ k.
Hence a Brownian motion is nothing more than a Gaussian process where µ(t) = 0 for
t ≥ 0, and ρ(s, t) = min(s, t) for s, t ≥ 0.
Conversely, every almost surely continuous Gaussian process, with E[Xt ] = 0 and Cov(Xt , Xs ) =
E(Xt Xs ) = min(s, t) for s, t ≥ 0, is a Brownian motion.

7.3 Main properties of the Brownian motion


The importance of the Brownian motion B(t) in stochastic calculus and financial math-
ematics is essentially due to the many properties that this stochastic process possesses.
Here below we list some of the most important ones.

1. The process $c\,B(t/c^2)$, t ≥ 0, c > 0, is a Brownian motion (scaling property).

2. The process −B(t), t ≥ 0, is a Brownian motion.



3. The process tB(1/t), t > 0, is a Brownian motion.


4. The process Bs (t) = B(s + t) − B(s), s > 0, t ≥ 0, is again a Brownian motion.
5. Set a > 0 and define Tₐ = inf{t : B(t) = a} as the first passage time in a. Then the process
$$B^*(t) = \begin{cases} B(t) & t \le T_a\\ 2a - B(t) & t > T_a \end{cases}, \qquad \text{with } T_a < +\infty \text{ a.s.},$$
is a Brownian motion. A graphical representation is given in Figure 7.2.
For what concerns this process, it is worth noticing the following. Let x be a level
for B(t) such that x < a. Then
P (Ta ≤ t, B(t) ≤ x) = P (B(t) ≥ 2a − x). (7.3)
Equation (7.3) is known as the reflection principle (or reflection equality) .

Figure 7.2: Example of Brownian path and reflected path.

Exercise 28. Prove points 1. and 2. in the previous list.


Naturally, many other properties can be found (see for example [8]), but these constitute
the most useful for us. Or, to be more exact, the most useful ones together with the
fundamental properties we deal with in the next three subsections.
However, we first need to give the following definitions.

Definition 19 (Filtration). Let (Ω, F) be an equipped space. A filtration {Ft }t≥0 is a


collection of σ−algebras such that Ft ⊆ F for every t, and Fs ⊆ Ft for s ≤ t.

Definition 20 (Adaptivity). A stochastic process {Xt }t≥0 is said to be adapted to the


filtration {Ft }t≥0 , if the random variable Xt is Ft −measurable for every t ≥ 0.

Let us now refine the previous definitions for our special process {B(t)}t≥0 .

Definition 21 (Filtration for a BM). Let (Ω, F, P ) be a probability space on which a


Brownian motion {B(t)}t≥0 is given. A filtration for {B(t)}t≥0 is thus a collection of
σ−algebras {Ft }t≥0 such that:

• For 0 ≤ s < t, every set in Fs is also in Ft . Later on, we will see that in finance a
filtration can be seen as a process of knowledge accumulation. Hence we are requiring
that at time t > s we have at least the same information about the markets that was
available at time s.

• For every t ≥ 0, the Brownian motion B(t) at time t is Ft −measurable. This means
that at time t we have a sufficient amount of information to evaluate B(t).

• For 0 ≤ s < t, the increment B(t) − B(s) is independent of Fs . After time s, every
new increment will not depend on the information available at time s.

7.3.1 Markovianity
The Brownian motion is a Markov process with homogeneous transition probability
$$p_t(x, A) = P\left[B(s+t) \in A \,|\, B(s) = x\right] = \int_A \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{(y-x)^2}{2t}\right)dy,$$
and initial distribution µ = δ₀, where δₓ denotes the Dirac (point-mass) measure at x. A proof of this can be found in [8], pages 107 and 108.
The following definitions can be useful to recall what a Markov process is.

Definition 22 (Markov Property). Let (Ω, F, P ) be a probability space with a filtration


{Ft }t∈T . A process {Xt }t∈T , which is adapted to the filtration {Ft }, is said to have a
Markov property with respect to {Ft }, if

P (Xt ∈ A|Fs ) = P (Xt ∈ A|Xs ), s < t ≤ T.

Definition 23 (Markov Process). A process {Xt }t∈T on the space (Ω, F, P ), with values
in S, is a Markov process if it has the Markov property with respect to its natural filtration,
that it to say the filtration generated by the process itself, i.e.

Ft = σ({X −1 (A) : s ≤ t, A ∈ S}).



7.3.2 Martingality
Before showing that the BM is a martingale, let us recall the definition.

Definition 24 (Martingale). A process {Xt }t∈T on the space (Ω, F, P ), with values in S,
is a martingale with respect to the filtration {Ft } ⊆ F and the probability measure P , if
the following properties are satisfied:

1. {Xt }t∈T is adapted to {Ft };

2. For every t, EP (|Xt |) < +∞, where EP is the expectation according to the probability
measure P ;

3. For s < t, EP (Xt |Fs ) = Xs .

It is very important to stress that martingality is always defined with respect to a


filtration and a probability measure. The same process can be a martingale under a given
measure, but it may not be under another measure. When dealing with risk-neutral pricing,
we will see that Girsanov theorem gives us a way to find a probability measure with respect
to which an Itō process is a martingale.

Proposition 28. The Brownian motion is a martingale with respect to its natural filtration,
under the probability measure P defined in Equation (7.2).

Proof. Every stochastic process is adapted to its natural filtration, hence point 1. in the
definition of a martingale is respected.
For point 2. we can simply refer to what we have seen before, when working with the
Wiener measure.
For point 3., let us consider 0 ≤ s ≤ t. Then, by setting E[.] = EP [.],

E[B(t)|Fs ] = E[B(t) − B(s) + B(s)|Fs ].

But the increment B(t) − B(s) is independent of Fs, while B(s) is Fs−measurable, hence

E[B(t) − B(s) + B(s)|Fs ] = E[B(t) − B(s)|Fs ] + E[B(s)|Fs ]


= E[B(t) − B(s)] + B(s) = 0 + B(s) = B(s).
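
The martingale property can also be illustrated numerically: fix a realisation of B(s) and average many independent continuations of the path up to time t; the average should be close to B(s). A minimal Python sketch (values are illustrative):

import numpy as np

# Illustration of E[B(t) | F_s] = B(s): fix one value of B(s),
# then average many independent continuations of the motion up to time t.
rng = np.random.default_rng(seed=0)
s, t = 1.0, 3.0
b_s = rng.normal(0.0, np.sqrt(s))                        # one realisation of B(s)
continuations = b_s + rng.normal(0.0, np.sqrt(t - s), size=500_000)
print(b_s, continuations.mean())                         # conditional mean ~ B(s)
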

7.3.3 The Maximum of {B(t)}


In pricing barrier options, it is very useful to consider a function of B(t), i.e. the so-called maximum to date

M(t) = \max_{0 \le s \le t} B(s).

To study this quantity, let us come back to equation (7.3). For a > 0, M(t) ≥ a if and only if T_a ≤ t. This implies

P(M(t) ≥ a, B(t) ≤ x) = P(B(t) ≥ 2a − x),   x ≤ a, a > 0.

We can thus give the following results. We omit the proofs, since we will give them later on, when dealing with barrier options.

Proposition 29. The joint density of (M(t), B(t)) is

f_{M(t),B(t)}(a, x) = \frac{2(2a - x)}{t\sqrt{2\pi t}} \exp\left(-\frac{(2a - x)^2}{2t}\right).

Proposition 30. The conditional distribution of M(t) given B(t) = x is

f_{M(t)|B(t)}(a|x) = \frac{2(2a - x)}{t} \exp\left(-\frac{2a(a - x)}{t}\right).

Figure 7.3: Representation of the maximum to date M(t), together with the path B(t), for a Brownian motion.
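
The reflection identity above lends itself to a simple Monte Carlo check: simulate discretised Brownian paths, record their running maximum, and compare the empirical frequency of {M(t) ≥ a, B(t) ≤ x} with P(B(t) ≥ 2a − x). The following Python sketch is only a rough illustration, since the discretisation slightly underestimates the true maximum.

import numpy as np
from scipy.stats import norm

# Monte Carlo check of P(M(t) >= a, B(t) <= x) = P(B(t) >= 2a - x), x <= a, a > 0.
rng = np.random.default_rng(seed=7)
t, a, x = 1.0, 1.0, 0.5
n_paths, n_steps = 20_000, 500
dt = t / n_steps
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)
running_max = paths.max(axis=1)              # discrete approximation of M(t)
b_t = paths[:, -1]
lhs = np.mean((running_max >= a) & (b_t <= x))
rhs = 1.0 - norm.cdf((2 * a - x) / np.sqrt(t))
print(lhs, rhs)                              # close, up to discretisation and MC error
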

7.4 About the total variation of the Brownian motion


Let f be a real-valued function defined on the interval [a, b], a < b, and consider partitions (x_0, ..., x_n) of this interval, with a = x_0 < x_1 < ... < x_n = b. Then

V_{[a,b]}(f) = \sup \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|,

where the supremum is taken over all such partitions, is the total variation of f on [a, b]. For instance, if f is monotonic on [a, b], then V_{[a,b]}(f) = |f(b) - f(a)|.


From analysis we know that every function whose total variation is finite (bounded) can be written as the difference of two monotonic functions. As a consequence, a function of bounded variation (also known as a BV function) is differentiable almost everywhere, i.e. the set of points at which the function is not differentiable has measure zero.
In this section we want to show that the paths of the Brownian motion are not BV functions on [0, t].

Let {B(u), 0 ≤ u ≤ t} be a Brownian motion on the interval [0, t]. Now let us define the following sum

\sum_{k=1}^{2^n} W_{n,k} = \sum_{k=1}^{2^n} \left[ \left( B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right)^2 - \frac{t}{2^n} \right].   (7.4)

This sum is clearly based on a (dyadic) partition of the interval [0, t]. It is then worth noticing that

E[W_{n,k}] = E\left[ \left( B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right)^2 - \frac{t}{2^n} \right] = 0,

for k = 1, 2, ..., 2^n. Moreover, and the proof is left as an exercise (use the fact that B(kt/2^n) − B((k−1)t/2^n) ∼ N(0, t/2^n)),

E[W_{n,k}^2] = Var[W_{n,k}] = \frac{2t^2}{4^n}.
As we know, the increments of a Brownian motion are independent; this implies that E(W_{n,k} W_{n,h}) = E(W_{n,k}) E(W_{n,h}) = 0 for k ≠ h. Hence

E\left[ \left( \sum_{k=1}^{2^n} W_{n,k} \right)^{2} \right] = E\left[ \sum_{k=1}^{2^n} W_{n,k}^{2} + \sum_{k \neq h} W_{n,k} W_{n,h} \right]
= E\left[ \sum_{k=1}^{2^n} W_{n,k}^{2} \right] = \sum_{k=1}^{2^n} E\left[ W_{n,k}^{2} \right]
= 2^{n} \cdot \frac{2t^{2}}{4^{n}} = \frac{2t^{2}}{2^{n}},
and

Var\left( \sum_{k=1}^{2^n} W_{n,k} \right) = \frac{2t^{2}}{2^{n}}, \qquad E\left( \sum_{k=1}^{2^n} W_{n,k} \right) = 0.

Using the famous Tchebycheff inequality we then get

P\left[ \left| \sum_{k=1}^{2^n} W_{n,k} \right| > \varepsilon \right] \le \frac{2t^{2}}{2^{n} \varepsilon^{2}}, \qquad \varepsilon > 0.
Clearly we have that \sum_{n=1}^{\infty} \frac{2t^{2}}{2^{n} \varepsilon^{2}} < \infty. Setting E_n = \left\{ \left| \sum_{k=1}^{2^n} W_{n,k} \right| > \varepsilon \right\}, we can then apply the first Borel-Cantelli lemma (see the box below), so that

P\left[ \left| \sum_{k=1}^{2^n} W_{n,k} \right| > \varepsilon \text{ infinitely often} \right] = 0 = P\left( \bigcap_{n \ge 1} \bigcup_{m \ge n} \left\{ \left| \sum_{k=1}^{2^m} W_{m,k} \right| > \varepsilon \right\} \right).

In other terms

P\left( \lim_{n \to \infty} \sum_{k=1}^{2^n} W_{n,k} = 0 \right) = 1,

so that, from (7.4),

\sum_{k=1}^{2^n} \left( B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right)^{2} \to t \quad \text{a.s.}

Let us now consider the following inequality:

\sum_{k=1}^{2^n} \left( B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right)^{2} \le \left( \sum_{k=1}^{2^n} \left| B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right| \right) \max_{1 \le j \le 2^n} \left| B\left(\frac{j}{2^n} t\right) - B\left(\frac{j-1}{2^n} t\right) \right|.

Rearranging the terms we finally get

\sum_{k=1}^{2^n} \left| B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right| \ge \frac{ \sum_{k=1}^{2^n} \left( B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right)^{2} }{ \max_{1 \le j \le 2^n} \left| B\left(\frac{j}{2^n} t\right) - B\left(\frac{j-1}{2^n} t\right) \right| }.   (7.5)

Since B(u) is continuous on the compact interval [0, t], it is also uniformly continuous (every continuous function on a compact set is uniformly continuous). This means that, by taking n large enough, the oscillation of B(u) over any interval of length t/2^n can be made as small as we want. But then the denominator on the right-hand side of (7.5) can be made arbitrarily close to 0, while the numerator almost surely converges to t. As a consequence, the left-hand side diverges to +∞. But
\sup_{n} \sum_{k=1}^{2^n} \left| B\left(\frac{k}{2^n} t\right) - B\left(\frac{k-1}{2^n} t\right) \right|

is nothing more than a lower bound for the total variation of our Brownian motion {B(u), 0 ≤ u ≤ t}, which must therefore be unbounded.
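
The contrast between the (finite) quadratic variation and the (unbounded) total variation is easy to visualise numerically: along finer and finer dyadic partitions of [0, t], the sum of squared increments of a simulated path stabilises around t, while the sum of absolute increments keeps growing. A small Python sketch (grid sizes are illustrative):

import numpy as np

# Along dyadic partitions the quadratic variation of a Brownian path stabilises
# around t, while the sum of absolute increments blows up with n.
rng = np.random.default_rng(seed=1)
t = 1.0
n_max = 16
# simulate one path on the finest dyadic grid with 2**n_max intervals
fine_increments = rng.normal(0.0, np.sqrt(t / 2**n_max), size=2**n_max)
path = np.concatenate(([0.0], np.cumsum(fine_increments)))
for n in (4, 8, 12, 16):
    step = 2**(n_max - n)                    # coarsen the fine grid to 2**n intervals
    incr = np.diff(path[::step])
    print(n, np.sum(incr**2), np.sum(np.abs(incr)))
# the second column approaches t = 1, the third keeps growing with n
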
Exercise 29. Prove that E[W_{n,k}^{2}] = Var[W_{n,k}] = \frac{2t^{2}}{4^{n}}.

First Borel-Cantelli lemma

Let E_1, E_2, ... be a sequence of events such that \sum_{i=1}^{\infty} P(E_i) < +\infty. Then

P(\limsup_{n} E_n) = P\left( \bigcap_{n \ge 1} \bigcup_{m \ge n} E_m \right) = P(E_n \text{ infinitely often}) = 0.

In probability, the "lim sup" is the limit superior of a sequence of events, where each event is a set of possible outcomes. Hence \limsup_{n} E_n is the set of outcomes that occur infinitely many times within the infinite sequence of events (E_n)_{n \ge 1}.

7.5 Generalizations
Take µ ∈ R and σ > 0 and define

B*(t) = σB(t) + µt,

where B(t) is a Brownian motion. Then the process B*(t) is a Brownian motion with drift µ and diffusion parameter σ. It is then easy to verify that

E[B*(t)] = µt,
Var[B*(t)] = σ^2 t,
Cov[B*(t), B*(s)] = σ^2 min(s, t).
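
These moments are easy to check by simulation. A minimal Python sketch (parameter values are illustrative):

import numpy as np

# Quick check of the moments of B*(t) = sigma * B(t) + mu * t at a fixed time t.
rng = np.random.default_rng(seed=3)
mu, sigma, t = 0.7, 1.5, 2.0
b_t = rng.normal(0.0, np.sqrt(t), size=1_000_000)    # samples of B(t)
b_star = sigma * b_t + mu * t
print(b_star.mean(), mu * t)                         # ~ mu * t
print(b_star.var(), sigma**2 * t)                    # ~ sigma^2 * t
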

Exercise 30. What about the increments of B ∗ (t)? What about their expected values and
variances?

And now a definition.



Definition 25. If B_1(t), B_2(t), ..., B_k(t) are k independent Brownian motions, then the vector

B(t) = [B_1(t), B_2(t), ..., B_k(t)]^T

is a k−dimensional Brownian motion.

Let B(t) be a k−dimensional Brownian motion. Now consider a vector

µ = [µ_1, ..., µ_k] ∈ R^k,

and a non-singular k × k matrix A. Then the process

B*(t) = AB(t) + µt

is called a k−dimensional Brownian motion with drift µ and diffusion matrix D = AA^T.

(Please be careful: the notation [·]^T means "transposed", it is not a power!)
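
As a final illustration, the covariance structure of B*(t) = AB(t) + µt at a fixed time t can be checked by simulation: the sample covariance of B*(t) should be close to t·AA^T. A short Python sketch (the matrix A and the vector µ are illustrative choices):

import numpy as np

# Simulate B*(t) = A B(t) + mu t for a k-dimensional Brownian motion at a fixed
# time t, and check that the covariance of B*(t) is close to t * A A^T.
rng = np.random.default_rng(seed=5)
k, t = 3, 1.0
mu = np.array([0.1, -0.2, 0.3])
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])            # non-singular (lower triangular)
n_samples = 200_000
B_t = rng.normal(0.0, np.sqrt(t), size=(n_samples, k))   # independent components
B_star = B_t @ A.T + mu * t
print(np.cov(B_star, rowvar=False))        # should be close to t * A @ A.T
print(t * A @ A.T)
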
Bibliography

[1] R.B. Ash (1972). Real Analysis and Probability. Academic Press, New York.

[2] R.B. Ash (1999). Probability and Measure Theory. Academic Press, New York.

[3] J.C. Hull (2005). Options, Futures and Other Derivatives, 6th edition. Prentice Hall, New York.

[4] R.C. Merton (1990). Continuous-Time Finance. Basil Blackwell, Oxford and Cambridge.

[5] A.J. McNeil, R. Frey, P. Embrechts (2005). Quantitative Risk Management. Princeton University Press.

[6] J. Shao (2003). Mathematical Statistics. Springer, New York.

[7] S.E. Shreve (2004). Stochastic Calculus for Finance I: The Binomial Asset Pricing Model. Springer, New York.

[8] S.E. Shreve (2004). Stochastic Calculus for Finance II: Continuous-Time Models. Springer, New York.

Index

Accrual, 69
Adaptivity, 106
Algebra, 83
    σ−, 84
    Sigma, 84
Approximating Sequence, 11
Arbitrage, 24, 49
Arbitrage, absence of, 49–51
Asset Pricing, 27
Bachelier, 47
Bond Option, 77
Borel
    σ−algebra, 84
Brownian Motion
    as Gaussian Process, 104
    Definition, 100
    Forward, 76
    k−dimensional, 112
    Markovianity, 106
    Martingality, 107
    Maximum of, 57, 58, 107
    Properties, 104
    Quadratic Variation, 44
    Reflection Principle, 57, 105
    Total Variation, 109
Cap, 68
Caplet, 69
CIR model, 71
Compatibility Condition
    or Consistency, 93
Cylinder
    for R^T, 97
    Kolmogorov, 92
Dividend yield, 55
Dividend-paying assets, 55
Equivalent Process, 102
Exercise Boundary, 62
Filtration
    Definition, 106
    for the BM, 106
Fixed Income, 67
Floor, 68
Floorlet, 69
Forward Price, 77
Interest rate derivatives, 68
Interest Rates
    Models, 70
Itō's Integral
    Approximation, 12
    Class
        H^2, 17
        M^2, 9
    Definition, 9, 14
    for Simple Processes, 9
    Itō's Isometry, 10
    Itō-Doeblin formula, 16
    Martingality, 15
    Properties, 14
Itō-Doeblin formula, 16
Market
    Complete, 28
    Definition, 22
    Scenario, 22
Martingale
    Brownian Motion, 107
    Definition, 107
    Exponential, 29
    Itō's Integral, 15
    Representation Theorem, 38
Measure
    Change of, 24
    Definition, 86
    Equilibrium, 23
    Equivalent, 24
    Market, 21
    Mutually absolutely continuous, 26
    Physical, 21
    Pre-, 86
    Probability, 86
    Risk-Neutral, 21, 23
    T-forward, 68
    Wiener, 99
Model
    Affine, 71
    CIR, 71
    Vasicek, 70
Modification
    Continuous, 104
    Definition, 103
Novikov Condition, 34, 72
Numeraire, 68
Option
    American, 60
    American Call, 60
    American Put, 61
    Barrier, 57
    Bermuda, 60
    Bond, 77
    Canary, 60
    Caps and floors, 68
    European, 41
    European Call, 41
    European Put, 41
    Exotic, 60
    knock-in, 57
    knock-out, 57
    on dividend-paying asset, 55
    Perpetual American Put, 62
    Perpetual Put, 62
    Pricing, 43
    Replicable, 41
    Up-and-in, 57
    Up-and-out, 57
    Vanilla, 44
Portfolio
    Admissible, 41
    Dynamically rebalanced, 22
    in continuous time, 40
    Self-financing, 22, 39
Probability, 86
Put-Call Parity, 51
Radon-Nikodym derivative, 25
Random Variable, 90
Risk Neutral Measure, 23
Simple Process
    Definition, 9
Space
    Equipped, 84
    Probability, 83, 89
Stochastic Differential, 17
Stochastic Integral, 17
Stopping Time, 60
Tenor Structure, 75
Theorem
    Asset Pricing I, 27
    Asset Pricing II, 28
    Black-Scholes-Merton, 41
    Cameron-Martin, 29
    Carathéodory Extension, 89
    Dominated Convergence, 13
    Girsanov, 35
    Girsanov I, 72
    Girsanov II, 74
    Girsanov III, 75
    Itō Representation, 38
    Itō's Approximation, 12
    Kolmogorov Extension, 91, 96
    Martingale Representation, 38
Time
    Optimal Stopping, 60
    Stopping, 60
Value-at-Risk, 47
Vasicek model, 70
Volatility
    Historical, 44
    Implied, 46
    Realised, 45