Mathematical Finance
1. Measure
The language of option pricing involves that of probability, which in turn
involves that of measure theory. This originated with Henri LEBESGUE
(1875-1941), in his 1902 thesis, ‘Intégrale, longueur, aire’. We begin with
the simplest case.
Length.
The length µ(I) of an interval I = (a, b), [a, b], [a, b) or (a, b] should be b − a: µ(I) = b − a. The length of a disjoint union I = I_1 ∪ · · · ∪ I_n of intervals I_r should be the sum of their lengths:
\[
\mu\Bigl(\bigcup_{r=1}^{n} I_r\Bigr) = \sum_{r=1}^{n} \mu(I_r) \qquad \text{(finite additivity)}.
\]
Not all subsets of the line are measurable: the Axiom of Choice, or some variant of it such as Zorn’s Lemma, is needed to demonstrate the existence of non-measurable sets – but all such proofs are highly non-constructive. So: some but not all subsets of the line have a length.¹ These are called the Lebesgue-measurable sets; they form the class F on which length can be sensibly defined, and length, defined on F, is called Lebesgue measure µ (on the real line, R).
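As a concrete illustration, here is a minimal Python sketch (not part of the original notes; the function name length_of_union is ours): the length of a finite union of intervals is computed by merging overlaps into disjoint pieces and then adding lengths – finite additivity in action.

```python
def length_of_union(intervals):
    """Total length of a finite union of open intervals, given as (a, b) pairs.

    Overlapping intervals are merged first, so length is added only over
    disjoint pieces (finite additivity). End-points don't matter: single
    points have measure zero.
    """
    total, cur_a, cur_b = 0.0, None, None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:       # disjoint from the current piece
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                                # overlap: extend the current piece
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

print(length_of_union([(0, 1), (2, 4)]))     # disjoint: 1 + 2 = 3.0
print(length_of_union([(0, 2), (1, 3)]))     # overlapping: mu((0,3)) = 3.0
```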
Area.
The area of a rectangle R = (a1 , b1 ) × (a2 , b2 ) – with or without any of
its perimeter included – should be µ(R) = (b1 − a1 ) × (b2 − a2 ). The area of
a finite or countably infinite union of disjoint rectangles should be the sum
of their areas:
\[
\mu\Bigl(\bigcup_{n=1}^{\infty} R_n\Bigr) = \sum_{n=1}^{\infty} \mu(R_n) \qquad \text{(countable additivity)}.
\]
Let F be the smallest class of sets, containing the rectangles, closed under
finite or countably infinite unions, closed under complements, and complete
(containing all subsets of sets of area 0 as sets of area 0). Lebesgue showed
that area can be sensibly defined on the sets in F and no others. The sets
A ∈ F are called the Lebesgue-measurable sets in the plane R2 ; area, defined
on F, is called Lebesgue measure in the plane. So: some but not all sets in
the plane have an area.
Volume.
Similarly in three-dimensional space R3, starting with the volume of a cuboid C = (a1, b1) × (a2, b2) × (a3, b3) as
\[
\mu(C) = (b_1 - a_1)(b_2 - a_2)(b_3 - a_3).
\]
Euclidean space.
Similarly in k-dimensional Euclidean space Rk . We start with
\[
\mu\Bigl(\prod_{i=1}^{k} (a_i, b_i)\Bigr) = \prod_{i=1}^{k} (b_i - a_i).
\]
2. Integral.
1. Indicators.
We start in dimension k = 1 for simplicity, and consider the simplest calculus formula ∫_a^b 1 dx = b − a. We rewrite this as
\[
I(f) := \int_{-\infty}^{\infty} f(x)\,dx = b - a \qquad \text{if } f(x) = I_{[a,b)}(x),
\]
the indicator function of [a, b) (1 on [a, b), 0 outside it), and similarly for the other three choices of end-points.
2. Simple functions.
A function
Pn f is called simple if it is a finite linear combination of indica-
tors: f = i=1 ci fi for constants ci and indicator functions fi of intervals Ii .
One then extends the definition of the integral from indicator functions to
simple functions by linearity:
n
! n
X X
I ci fi := ci I(fi )
i=1 i=1
3. Non-negative measurable functions.
Call f a (Lebesgue-)measurable function if, for all c, the set {x : f(x) ≤ c} is a Lebesgue-measurable set (§1). If f is a non-negative measurable function, we quote that it is possible to construct f as the increasing limit of a sequence of simple functions fn:
\[
f_n(x) \uparrow f(x) \qquad (n \to \infty),
\]
and we then define
\[
I(f) := \lim_{n \to \infty} I(f_n)
\]
(we quote that this does indeed define I(f): the value does not depend on which approximating sequence (fn) we use). Since fn increases in n, so does I(fn) (the integral is order-preserving), so either I(fn) increases to a finite limit, or diverges to ∞. In the first case, we say f is (Lebesgue-)integrable with (Lebesgue-)integral I(f) = lim I(fn), or ∫ f(x) dx = lim ∫ fn(x) dx, or simply ∫ f = lim ∫ fn.
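The construction above can be made concrete numerically. The following Python sketch (ours, not from the notes) uses the standard staircase approximation f_n := min(n, ⌊2^n f⌋/2^n), a simple function increasing to f, and approximates the measure of each level set by counting grid cells; it is an illustration under these assumptions, not a rigorous computation.

```python
import numpy as np

def lebesgue_integral(f, a, b, n, grid=1_000_000):
    """Approximate I(f) for non-negative f on [a, b] via the simple function
    f_n = min(n, floor(2^n f) / 2^n), which increases to f as n grows."""
    x = np.linspace(a, b, grid)
    dx = (b - a) / grid                       # crude cell width for mu(level set)
    fn = np.minimum(n, np.floor(2**n * f(x)) / 2**n)
    values, counts = np.unique(fn, return_counts=True)
    return float(np.sum(values * counts * dx))  # sum of value * mu({f_n = value})

# Example: f(x) = x^2 on [0, 1], where I(f) = 1/3.
for n in (2, 4, 8):
    print(n, lebesgue_integral(lambda x: x**2, 0.0, 1.0, n))
# The values increase towards 1/3, mirroring I(f_n) increasing to I(f).
```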
4. Measurable functions.
If f is a measurable function that may change sign, we split it into its positive and negative parts, f±:
\[
f_\pm := \max(\pm f, 0), \qquad f = f_+ - f_-, \qquad |f| = f_+ + f_-.
\]
If both I(f+) and I(f−) are finite, we call f (Lebesgue-)integrable, with integral I(f) := I(f+) − I(f−). Note that integrability of f requires integrability of |f| = f+ + f−: convergence is absolute. So, for example, the improper Riemann integral
\[
\int_0^\infty \frac{\sin x}{x}\,dx = \frac{\pi}{2}
\]
has no meaning for Lebesgue integrals, since ∫_1^∞ |sin x / x| dx diverges to +∞ like ∫_1^∞ (1/x) dx. It has to be replaced by the limit relation
\[
\int_0^X \frac{\sin x}{x}\,dx \to \frac{\pi}{2} \qquad (X \to \infty).
\]
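A quick numerical check of this (a Python sketch, not part of the notes): the signed integral settles down to π/2, while the integral of |sin x / x| keeps growing, roughly like (2/π) log X.

```python
import numpy as np

def trap(f, a, b, n=2_000_000):
    """Plain trapezoidal rule; adequate for this smooth integrand."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

# np.sinc(x / pi) equals sin(x)/x, with the correct value 1 at x = 0.
for X in (10.0, 100.0, 1000.0):
    signed = trap(lambda x: np.sinc(x / np.pi), 0.0, X)
    absolute = trap(lambda x: np.abs(np.sinc(x / np.pi)), 0.0, X)
    print(f"X = {X:6.0f}   signed = {signed:.6f}   absolute = {absolute:.3f}")
# signed -> pi/2 = 1.570796...; absolute diverges, so sin(x)/x is not
# Lebesgue-integrable on (0, infinity).
```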
Lp spaces.
For p ≥ 1, the Lp spaces Lp(Rk) on Rk are the spaces of measurable functions f with Lp-norm
\[
\|f\|_p := \Bigl(\int |f|^p\Bigr)^{1/p} < \infty.
\]
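For instance (a Python sketch of ours; the example f is an assumption, not from the notes): for f(x) = e^{−|x|} on R, ∫ |f|^p = ∫ e^{−p|x|} dx = 2/p, so ‖f‖_p = (2/p)^{1/p}, which the numerics confirm.

```python
import numpy as np

def lp_norm(f, p, a=-50.0, b=50.0, n=1_000_000):
    """L^p norm of f over R, truncated to [a, b] (the tail here is negligible)."""
    x = np.linspace(a, b, n + 1)
    y = np.abs(f(x)) ** p
    h = (b - a) / n
    return (h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)) ** (1.0 / p)

for p in (1, 2, 4):
    print(p, lp_norm(lambda x: np.exp(-np.abs(x)), p), (2 / p) ** (1 / p))
# p = 1: 2.0;  p = 2: 1.0;  p = 4: 0.8409...; numerics match (2/p)^{1/p}.
```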
Riemann integrals.
Our first exposure to integration is the ‘Sixth-Form integral’, taught non-
rigorously at school. Mathematics undergraduates are taught a rigorous in-
tegral (in their first or second years), the Riemann integral [G.B. RIEMANN
(1826-1866)] – essentially this is just a rigorisation of the school integral.
It is much easier to set up than the Lebesgue integral, but much harder to
manipulate.
For finite intervals [a, b], we quote:
(i) for any function f Riemann-integrable on [a, b], it is Lebesgue-integrable to the same value (but many more functions are Lebesgue-integrable: the indicator of the rationals in [0, 1], being everywhere discontinuous, is not Riemann-integrable, but has Lebesgue integral 0);
(ii) f is Riemann-integrable on [a, b] iff it is continuous a.e. on [a, b]. Thus the question, “Which functions are Riemann-integrable?” cannot be answered without the language of measure theory – which then gives one the technically superior Lebesgue integral anyway.
Note. Integration is like summation (which is why Leibniz gave us the integral sign ∫, as an elongated S). Lebesgue was a very practical man – his
father was a tradesman – and used to think about integration in the follow-
ing way. Think of a shopkeeper totalling up his day’s takings. The Riemann
integral is like adding up the takings – notes and coins – in the order in
which they arrived. By contrast, the Lebesgue integral is like totalling up
the takings in order of size – from the smallest coins up to the largest notes.
This is obviously better! In mathematical effect, it exchanges ‘integrating by
x-values’ (abscissae) with ‘integrating by y-values’ (ordinates).
Lebesgue-Stieltjes integral.
Suppose that F(x) is a non-decreasing function on R:
\[
F(x) \le F(y) \quad \text{if } x \le y.
\]
Such an F defines a measure µ_F on the line via µ_F((x, y]) := F(y) − F(x), and hence, exactly as before, an integral ∫ f dF, the Lebesgue-Stieltjes integral of f with respect to F. More generally, for any function F, define its variation over [a, b] by
\[
V_a^b(F) := \sup \sum_{i=1}^{n} |F(x_i) - F(x_{i-1})|,
\]
the supremum being taken over all partitions a = x_0 < x_1 < \cdots < x_n = b of [a, b].
This may be +∞; but if V_a^b(F) < ∞, F is said to be of finite variation (FV) on [a, b], F ∈ FV_a^b (bounded variation, BV, is also used). If F is of finite variation on all finite intervals, F is said to be locally of finite variation, F ∈ FV_loc; if F is of finite variation on the real line, F is of finite variation, F ∈ FV.
We quote (Jordan’s theorem) that the following are equivalent:
(i) F is locally of finite variation;
(ii) F is the difference F = F1 − F2 of two monotone functions.
So the above procedure defines the integral ∫ f dF when the integrator F is of finite variation.
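To make the variation and the integral ∫ f dF concrete, here is a small Python sketch (ours; the smooth integrator is our choice of example): for F(x) = x² on [0, 1], F is increasing, so V_0^1(F) = F(1) − F(0) = 1, and ∫_0^1 x dF = ∫_0^1 x · 2x dx = 2/3.

```python
import numpy as np

def variation(F, a, b, n=100_000):
    """Approximate V_a^b(F) by the sum of |increments| over a fine partition."""
    x = np.linspace(a, b, n + 1)
    return float(np.sum(np.abs(np.diff(F(x)))))

def stieltjes(f, F, a, b, n=100_000):
    """Approximate the integral of f dF by sums f(x_{i-1}) (F(x_i) - F(x_{i-1}))."""
    x = np.linspace(a, b, n + 1)
    return float(np.sum(f(x[:-1]) * np.diff(F(x))))

F = lambda x: x**2
print(variation(F, 0.0, 1.0))               # ~ 1.0 = F(1) - F(0), F increasing
print(stieltjes(lambda x: x, F, 0.0, 1.0))  # ~ 2/3
```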
3. Probability.
Probability spaces.
The mathematical theory of probability can be traced to 1654, to corre-
spondence between PASCAL (1623-1662) and FERMAT (1601-1665). How-
ever, the theory remained both incomplete and non-rigorous till the 20th
century. It turns out that the Lebesgue theory of measure and integral
sketched above is exactly the machinery needed to construct a rigorous the-
ory of probability adequate for modelling reality (option pricing, etc.) for
us. This was realised by the great Russian mathematician and probabilist
A.N.KOLMOGOROV (1903-1987), whose classic book of 1933, Grundbegriffe
der Wahrscheinlichkeitsrechnung [Foundations of probability theory] inaugu-
rated the modern era in probability.
Recall from your first course on probability that, to describe a random
experiment mathematically, we begin with the sample space Ω, the set of all
possible outcomes. Each point ω of Ω, or sample point, represents a possible
– random – outcome of performing the random experiment. For a set A ⊆ Ω
of points ω we want to know the probability P (A) (or Pr(A), pr(A)). We
clearly want
1. P (∅) = 0, P (Ω) = 1.
2. P (A) ≥ 0 for all A.
3. If A1, A2, . . . , An are disjoint,
\[
P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \sum_{i=1}^{n} P(A_i) \qquad \text{(finite additivity – fa)},
\]
which, as above, we will strengthen to
3*. If A1, A2, . . . (ad inf.) are disjoint,
\[
P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i) \qquad \text{(countable additivity – ca)}.
\]
4. If B ⊆ A and P (A) = 0, then P (B) = 0 (completeness).
Then by 1 and 3 (with A1 = A, A2 = Ω \ A),
\[
P(A^c) = P(\Omega \setminus A) = 1 - P(A).
\]
Random variables.
Next, recall random variables X from your first probability course. Given
a random outcome ω, you can calculate the value X(ω) of X (a scalar – a
real number, say; similarly for vector-valued random variables, or random
vectors). So, X is a function from Ω to R, X : Ω → R,
\[
X : \omega \mapsto X(\omega) \qquad (\omega \in \Omega).
\]
For each x, we want to know the probability P(X ≤ x) = P({ω : X(ω) ≤ x}) – that is, the distribution function of X at x. We can only deal with functions X for which all these probabilities are defined. So, for each x, we need {ω : X(ω) ≤ x} ∈ F. We summarize this by saying that X is measurable with respect to the σ-field F (of events); briefly, X is F-measurable. Then X is called a random variable [non-F-measurable X cannot be handled, and so are left out]. So,
(i) a random variable X is an F-measurable function on Ω;
(ii) a function on Ω is a random variable (is measurable) iff its distribution
function is defined.
Generated σ-fields.
The smallest σ-field containing all the sets {ω : X(ω) ≤ x} for all real x
[equivalently, {X < x}, {X ≥ x}, {X > x}]² is called the σ-field generated
by X, written σ(X).
When the (random) value X(ω) is known, we know which of the events in the
σ-field generated by X have happened: these are the events {ω : X(ω) ∈ B},
where B runs through the Borel σ-field [the σ-field generated by the intervals
– it makes no difference whether open, closed etc.] on the line.
Interpretation.
Think of σ(X) as representing what we know when we know X, or in
other words the information contained in X (or in knowledge of X). This is
from the following result, due to J. L. DOOB (1910-2004), which we quote: Y is a σ(X)-measurable random variable if and only if Y = g(X) for some (measurable) function g – that is, iff Y is a (deterministic) function of X.
Expectation.
A measure (II.1) determines an integral (II.2). A probability measure P ,
being a special kind of measure [a measure of total mass one] determines a
special kind of integral, called an expectation.
Definition. The expectation E of a random variable X on (Ω, F, P) is defined by
\[
E[X] := \int_\Omega X\,dP, \quad \text{or} \quad \int_\Omega X(\omega)\,dP(\omega);
\]
² Here, and in Measure Theory, whether intervals are open, closed or half-open doesn’t matter. In Topology, such distinctions are crucial. One can combine Topology and Measure Theory, but we must leave this here.
or, if X is discrete, taking values xn (n = 1, 2, . . .) with probability function f(xn) (≥ 0, with Σ_n f(xn) = 1),
\[
E[X] := \sum_n x_n f(x_n).
\]
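As a small numerical illustration (Python; our own toy distribution, not from the notes): the expectation is the probability-weighted sum Σ x_n f(x_n), and a Monte Carlo sample mean approaches it, by the law of large numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, 2.0, 5.0])   # values x_n
f = np.array([0.5, 0.3, 0.2])   # probability function f(x_n), summing to 1

exact = float(np.sum(x * f))    # E[X] = sum_n x_n f(x_n) = 2.1
sample = rng.choice(x, size=1_000_000, p=f)
print(exact, sample.mean())     # the sample mean is close to 2.1
```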
The Radon-Nikodym theorem.
For measures P, Q on (Ω, F), say that P is absolutely continuous with respect to Q, written
\[
P << Q,
\]
if Q(A) = 0 implies P(A) = 0 (A ∈ F). We quote the Radon-Nikodym theorem: P << Q if and only if there is a non-negative measurable function f with
\[
P(A) = \int_A f\,dQ \qquad \forall A \in F
\]
(note that since the integral of anything over a null set is zero, any P so
representable is certainly absolutely continuous with respect to Q – the point is that the converse holds). Since P(A) = ∫_A dP, this says that
\[
\int_A dP = \int_A f\,dQ \qquad \forall A \in F.
\]
By analogy with the chain rule of ordinary calculus, we write dP/dQ for f; then
\[
\int_A dP = \int_A \frac{dP}{dQ}\,dQ \qquad \forall A \in F.
\]
Symbolically,
\[
\text{if } P << Q, \qquad dP = \frac{dP}{dQ}\,dQ.
\]
The measurable function (= random variable) dP/dQ is called the Radon-
Nikodym derivative (RN-derivative) of P with respect to Q.
If P << Q and also Q << P , we call P and Q equivalent measures,
written P ∼ Q. Then dP/dQ and dQ/dP both exist, and
\[
\frac{dP}{dQ} = 1 \Big/ \frac{dQ}{dP}.
\]
For P ∼ Q, P(A) = 0 iff Q(A) = 0: P and Q have the same null sets. Taking negations: P ∼ Q iff P, Q have the same sets of positive measure. Taking complements: P ∼ Q iff P, Q have the same sets of probability one [the same a.s. sets]. To summarize: P ∼ Q iff P, Q have the same null sets, iff they have the same a.s. sets, iff they have the same sets of positive measure.
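A numerical illustration of the Radon-Nikodym derivative in action (Python; the choice of measures is our own example, not from the notes): take P = N(1, 1) and Q = N(0, 1), which are equivalent on R, with dP/dQ(x) = exp(x − 1/2), the ratio of the densities. Sampling under Q and reweighting by dP/dQ recovers P-expectations – exactly the change-of-measure mechanism behind risk-neutral pricing.

```python
import numpy as np

rng = np.random.default_rng(1)

# P = N(1, 1), Q = N(0, 1): equivalent measures (same null sets).
# dP/dQ(x) = exp(-(x-1)^2/2) / exp(-x^2/2) = exp(x - 1/2).
z = rng.standard_normal(1_000_000)   # samples under Q
lr = np.exp(z - 0.5)                 # likelihood ratio dP/dQ at the samples

print(lr.mean())                     # E_Q[dP/dQ] = 1: P has total mass one
print((z * lr).mean())               # E_Q[X dP/dQ] = E_P[X] = 1
```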
Note. Far from being an abstract theoretical result, the Radon-Nikodym
theorem is of key practical importance, in two ways:
(a) It is the key to the concept of conditioning (‘using what we know’ – §5, §6 below), which is of central importance throughout;
(b) The concept of equivalent measures is central to the key idea of math-
ematical finance, risk-neutrality, and hence to its main results, the Black-
Scholes formula, the Fundamental Theorem of Asset Pricing (FTAP), etc.
The key to all this is that prices should be the discounted expected values
under the equivalent martingale measure. Thus equivalent measures, and
the operation of change of measure, are of central economic and financial
importance. We shall return to this later in connection with the main math-
ematical result on change of measure, Girsanov’s theorem (VII.4).
Recall that we first met the phrase ‘equivalent martingale measure’ in
II.5 above. We now know what a measure is, and what equivalent measures
are; we will learn about martingales in III.3 below.
1. Discrete case. Suppose (X, Y) is discrete, with joint probability function f(xi, yj), and that X takes the value xi with positive probability. The conditional distribution of Y given X = xi is
\[
f_{Y|X}(y_j | x_i) := P(Y = y_j \mid X = x_i) = f(x_i, y_j) \Big/ \sum_j f(x_i, y_j).
\]
Its expectation is
\[
E[Y \mid X = x_i] = \sum_j y_j f_{Y|X}(y_j | x_i) = \sum_j y_j f(x_i, y_j) \Big/ \sum_j f(x_i, y_j).
\]
But this approach only works when the events on which we condition have
positive probability, which only happens in the discrete case.
2. Density case. Formally replacing the sums above by integrals: if (X, Y) has density f(x, y), then
\[
X \text{ has density } f_1(x) := \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad Y \text{ has density } f_2(y) := \int_{-\infty}^{\infty} f(x, y)\,dx.
\]
The conditional density of Y given X = x is f_{Y|X}(y|x) := f(x, y)/f_1(x). Its expectation is
\[
E[Y \mid X = x] = \int_{-\infty}^{\infty} y f_{Y|X}(y|x)\,dy = \int_{-\infty}^{\infty} y f(x, y)\,dy \Big/ \int_{-\infty}^{\infty} f(x, y)\,dy.
\]
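A numerical sketch (Python; the bivariate normal joint density is our choice of example, not from the notes): computing E[Y | X = x] by the ratio-of-integrals formula above. For standard bivariate normal (X, Y) with correlation ρ, the closed form is E[Y | X = x] = ρx, so the numerics can be checked.

```python
import numpy as np

rho = 0.6

def joint(x, y):
    """Bivariate normal density, standard margins, correlation rho."""
    c = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    return c * np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

def cond_exp(x, ymax=10.0, n=20_000):
    """E[Y | X = x] = (integral of y f(x,y) dy) / (integral of f(x,y) dy)."""
    y = np.linspace(-ymax, ymax, n)
    fy = joint(x, y)
    return float(np.sum(y * fy) / np.sum(fy))   # the dy's cancel in the ratio

for x in (-1.0, 0.0, 2.0):
    print(x, cond_exp(x), rho * x)              # numeric vs closed form rho * x
```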
3. General case. Suppose Y is a non-negative (integrable) random variable on (Ω, F, P), and B ⊆ F is a sub-σ-field. The set function
\[
Q(B) := \int_B Y\,dP \qquad (B \in \mathcal{B})
\]
is non-negative (because Y is), σ-additive – because
\[
\int_B Y\,dP = \sum_n \int_{B_n} Y\,dP
\]
for B = ⋃_n B_n, B_n disjoint – and absolutely continuous with respect to P (the integral of anything over a P-null set is zero). By the Radon-Nikodym theorem, there exists a B-measurable random variable, written E[Y|B] and called the conditional expectation of Y given B, with
\[
\int_B Y\,dP = \int_B E[Y|\mathcal{B}]\,dP \qquad \forall B \in \mathcal{B}. \qquad (*)
\]
For general Y, write
\[
Y = Y_+ - Y_-
\]
and define E[Y|B] := E[Y+|B] − E[Y−|B].
Note.
1. To check that something is a conditional expectation: we have to check
that it integrates the right way over the right sets [i.e., as in (*)].
2. From (*): if two things integrate the same way over all sets B ∈ B, they have the same conditional expectation given B.
3. For notational convenience, we use E[Y|B] and E_B Y interchangeably.
4. The conditional expectation thus defined coincides with any we may have already encountered – in regression or multivariate analysis, for example.
However, this may not be immediately obvious. The conditional expectation
defined above – via σ-fields and the Radon-Nikodym theorem – is rightly
called by Williams ([W], p.84) ‘the central definition of modern probability’.
It may take a little getting used to. As with all important but non-obvious
definitions, it proves its worth in action: see III.6 below for properties of con-
ditional expectations, and Chapter IV for stochastic processes, particularly
martingales [defined in terms of conditional expectations].
Examples. 1. B = {∅, Ω}. Here B is the smallest possible σ-field (any σ-field of subsets of Ω contains ∅ and Ω), and represents ‘knowing nothing’: E[Y|B] = E[Y].
2. B = F. Here B is the largest possible σ-field, and represents ‘knowing everything’: E[Y|F] = Y, P-a.s.
Proof. We have to check (*) for all sets B ∈ F. The only integrand that
integrates like Y over all sets is Y itself, or a function agreeing with Y except
on a set of measure zero.
Note. When we condition on F (‘knowing everything’), we know Y (because
we know everything). There is thus no uncertainty left in Y to average out,
so taking the conditional expectation (averaging out remaining randomness)
has no effect, and leaves Y unaltered.
3. If Y is B-measurable, E[Y|B] = Y, P-a.s.
Proof. Recall that Y is always F-measurable (this is the definition of Y being
a random variable). For B ⊂ F, Y may not be B-measurable, but if it is,
the proof above applies with B in place of F.
Note. If Y is B-measurable, when we are given B (that is, when we condition
on it), we know Y . That makes Y effectively a constant, and when we take
the expectation of a constant, we get the same constant.
5. If C ⊆ B ⊆ F, then E[ E[Y|B] | C ] = E[Y|C], P-a.s.
5'. If C ⊆ B ⊆ F, then E[ E[Y|C] | B ] = E[Y|C], P-a.s.
Note. 5, 5’ are the two forms of the iterated conditional expectations property. When conditioning on two σ-fields, one larger (finer), one smaller (coarser), the coarser rubs out the effect of the finer, either way round. This is also called the coarse-averaging property, or (Williams [W]) the tower property.
Note. Compare this with the Conditional Variance Formula of Statistics: see
e.g. SMF, IV.6, or Ch. VIII.
Note. In the elementary definition P(A|B) := P(A ∩ B)/P(B) (if P(B) > 0), if A and B are independent (that is, if P(A ∩ B) = P(A)P(B)), then P(A|B) = P(A): conditioning on something independent has no effect. One would expect this familiar and elementary fact to hold in this more general situation also. It does – and the proof of this rests on the proof above.
\[
E[\,E[X|C]\,|\,C\,] = E[X|C].
\]
This says that the operation of taking conditional expectation given a sub-
σ-field C is idempotent – doing it twice is the same as doing it once. Also,
taking conditional expectation is a linear operation (it is defined via an in-
tegral, and integration is linear). Recall from Linear Algebra that we have
met such idempotent linear operations before. They are the projections.
(Example: (x, y, z) 7→ (x, y, 0) projects from 3-dimensional space onto the
(x, y)-plane.) This view of conditional expectation as projection is useful
and powerful; see e.g. [BK], [BF] or
[N] J. Neveu, Discrete-parameter martingales (North-Holland, 1975), I.2.
It is particularly useful when one has not yet got used to conditional expec-
tation defined measure-theoretically as above, as it gives us an alternative
(and perhaps more familiar) way to think.
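To see the projection picture concretely, here is a Python sketch (our own toy example on a finite sample space, not from the notes): E[Y | σ(X)] replaces Y by its average over each level set of X; applying the operation twice changes nothing (idempotence), and the overall mean is preserved.

```python
import numpy as np

rng = np.random.default_rng(2)

# A finite sample space of 12 equally likely points.
X = rng.integers(0, 3, size=12)       # X takes the values 0, 1, 2
Y = rng.standard_normal(12)

def cond_exp(Y, X):
    """E[Y | sigma(X)]: average Y over each level set {X = v}."""
    out = np.empty_like(Y)
    for v in np.unique(X):
        out[X == v] = Y[X == v].mean()
    return out

E1 = cond_exp(Y, X)
E2 = cond_exp(E1, X)                       # project a second time
print(np.allclose(E1, E2))                 # True: idempotent, like a projection
print(np.isclose(E1.mean(), Y.mean()))     # True: E[E[Y|X]] = E[Y]
```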