Mathematical Finance

This document provides an overview of measure-theoretic probability and Lebesgue integration. It begins by defining Lebesgue measure for intervals on the real line based on length, then extends this to higher dimensions based on area and volume. It introduces Lebesgue-measurable sets and defines the Lebesgue integral for non-negative functions using simple approximations, then extends this to signed functions by splitting them into positive and negative parts. Finally, it discusses Lp spaces and compares the Lebesgue and Riemann integrals.


Chapter III: MEASURE-THEORETIC PROBABILITY

1. Measure
The language of option pricing involves that of probability, which in turn
involves that of measure theory. This originated with Henri LEBESGUE
(1875-1941), in his 1902 thesis, ‘Intégrale, longueur, aire’. We begin with
the simplest case.
Length.
The length µ(I) of an interval I = (a, b), [a, b], [a, b) or (a, b] should be
b − a: µ(I) = b − a. The length of the disjoint union I = ⋃_{r=1}^n I_r of intervals
I_r should be the sum of their lengths:

   µ( ⋃_{r=1}^n I_r ) = Σ_{r=1}^n µ(I_r)     (finite additivity).

Consider now an infinite sequence I_1, I_2, . . . (ad infinitum) of disjoint intervals.
Letting n → ∞ suggests that length should again be additive over disjoint
intervals:

   µ( ⋃_{r=1}^∞ I_r ) = Σ_{r=1}^∞ µ(I_r)     (countable additivity).

For I an interval and A ⊆ I a subset of length µ(A), the length of the
complement I \ A := I ∩ A^c of A in I should be

   µ(I \ A) = µ(I) − µ(A)     (complementation).

If A ⊆ B and B has length µ(B) = 0, then A should have length 0 also:

A ⊆ B & µ(B) = 0 ⇒ µ(A) = 0 (completeness).

Let F be the smallest class of sets A ⊂ R containing the intervals, closed


under countable disjoint unions and complements, and complete (containing
all subsets of sets of length 0 as sets of length 0). The above suggests – what
Lebesgue showed – that length can be sensibly defined on the sets F on the
line, but on no others. There are others – but they are hard to construct
(in technical language: the Axiom of Choice (AC), or some variant of it such

as Zorn’s Lemma, is needed to demonstrate the existence of non-measurable
sets – but all such proofs are highly non-constructive). So: some but not all
subsets of the line have a length.¹ These are called the Lebesgue-measurable
sets, and form the class F described above; length, defined on F, is called
Lebesgue measure µ (on the real line, R).
Area.
The area of a rectangle R = (a1 , b1 ) × (a2 , b2 ) – with or without any of
its perimeter included – should be µ(R) = (b1 − a1 ) × (b2 − a2 ). The area of
a finite or countably infinite union of disjoint rectangles should be the sum
of their areas:

   µ( ⋃_{n=1}^∞ R_n ) = Σ_{n=1}^∞ µ(R_n)     (countable additivity).

If R is a rectangle and A ⊆ R with area µ(A), the area of the complement


R \ A should be

µ(R \ A) = µ(R) − µ(A) (complementation).

If A ⊆ B and B has area 0, A should have area 0:

   A ⊆ B & µ(B) = 0 ⇒ µ(A) = 0     (completeness).

Let F be the smallest class of sets, containing the rectangles, closed under
finite or countably infinite unions, closed under complements, and complete
(containing all subsets of sets of area 0 as sets of area 0). Lebesgue showed
that area can be sensibly defined on the sets in F and no others. The sets
A ∈ F are called the Lebesgue-measurable sets in the plane R2 ; area, defined
on F, is called Lebesgue measure in the plane. So: some but not all sets in
the plane have an area.
Volume.
Similarly in three-dimensional space R3 , starting with the volume of a
cuboid C = (a1 , b1 ) × (a2 , b2 ) × (a3 , b3 ) as

µ(C) = (b1 − a1 ) · (b2 − a2 ) · (b3 − a3 ).


¹ There are alternatives to AC, under which all sets are measurable. So it is not so much
a question of whether AC is true or not, but of what axioms of Set Theory we assume.
Background: Model Theory in Mathematical Logic, etc.

Euclidean space.
Similarly in k-dimensional Euclidean space Rk . We start with
   µ( ∏_{i=1}^k (a_i, b_i) ) = ∏_{i=1}^k (b_i − a_i),

and obtain the class F of Lebesgue-measurable sets in Rk, and Lebesgue measure µ in Rk.
Probability.
The unit cube [0, 1]k in Rk has Lebesgue measure 1. It can be used to
model the uniform distribution (density f (x) = 1 if x ∈ [0, 1]k , 0 otherwise),
with probability = length/area/volume if k = 1/2/3.
Note. If a property holds everywhere except on a set of measure zero, we
say it holds almost everywhere (a.e.) [French: presque partout, p.p.; German:
fast überall, f.u.]. If it holds everywhere except on a set of probability zero,
we say it holds almost surely (a.s.) [or, with probability one].

2. Integral.
1. Indicators.
We start in dimension k = 1 for simplicity, and consider the simplest
calculus formula ∫_a^b 1 dx = b − a. We rewrite this as

   I(f) := ∫_{−∞}^{∞} f(x) dx = b − a     if f(x) = I_{[a,b]}(x),

the indicator function of [a, b] (1 in [a, b], 0 outside it), and similarly for the
other three choices about end-points.
2. Simple functions.
A function f is called simple if it is a finite linear combination of indica-
tors: f = Σ_{i=1}^n c_i f_i for constants c_i and indicator functions f_i of intervals I_i.
One then extends the definition of the integral from indicator functions to
simple functions by linearity:
   I( Σ_{i=1}^n c_i f_i ) := Σ_{i=1}^n c_i I(f_i)

for constants c_i and indicators f_i of intervals I_i.
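By way of illustration, here is a small Python sketch (not part of the notes; the intervals and coefficients are arbitrary) of this linearity: the integral of a simple function f = Σ c_i f_i is just Σ c_i µ(I_i).

# Integral of a simple function by linearity: sum of c_i * length(I_i).
# The disjoint intervals I_i and coefficients c_i below are made-up examples.
intervals = [(0.0, 1.0), (1.0, 3.0), (3.0, 4.5)]   # the I_i
coeffs    = [2.0, -1.0, 0.5]                        # the c_i

I_f = sum(c * (b - a) for c, (a, b) in zip(coeffs, intervals))
print(I_f)   # 2*1 + (-1)*2 + 0.5*1.5 = 0.75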


3. Non-negative measurable functions.

Call f a (Lebesgue-) measurable function if, for all c, the set {x : f(x) ≤
c} is a Lebesgue-measurable set (§1). If f is a non-negative measurable
function, we quote that it is possible to construct f as the increasing limit
of a sequence of simple functions fn :

fn (x) ↑ f (x) for all x ∈ R (n → ∞), fn simple.

We then define the integral of f as

   I(f) := lim_{n→∞} I(f_n)     (≤ ∞)

(we quote that this does indeed define I(f ): the value does not depend on
which approximating sequence (fn ) we use). Since fn increases in n, so does
I(fn ) (the integral is order-preserving), so either I(fn ) increases to a finite
limit, or diverges to ∞. In the first case, we say f is (Lebesgue-) integrable
with (Lebesgue-) integral I(f) = lim I(f_n), or ∫ f(x) dx = lim ∫ f_n(x) dx, or
simply ∫ f = lim ∫ f_n.
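As a concrete instance of this construction (an illustrative sketch, not from the notes), take f(x) = x on [0, 1] and the standard staircase approximations f_n(x) = ⌊2^n x⌋/2^n; each f_n is simple, f_n ↑ f, and I(f_n) increases to the Lebesgue integral 1/2:

# Staircase (dyadic) approximation of f(x) = x on [0,1] by simple functions:
# f_n takes the value k/2^n on [k/2^n, (k+1)/2^n), so I(f_n) is a finite sum.
for n in range(1, 8):
    I_fn = sum((k / 2**n) * (1 / 2**n) for k in range(2**n))
    print(n, I_fn)   # 0.25, 0.375, ... increasing towards 1/2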
4. Measurable functions.
If f is a measurable function that may change sign, we split it into its
positive and negative parts, f± :

   f+(x) := max(f(x), 0),     f−(x) := − min(f(x), 0),

   f(x) = f+(x) − f−(x),     |f(x)| = f+(x) + f−(x).

If both f+ and f− are integrable, we say that f is too, and define


   ∫ f := ∫ f+ − ∫ f−.

Then, in particular, |f | is also integrable, and


   ∫ |f| = ∫ f+ + ∫ f−.

Note. The Lebesgue integral is, by construction, an absolute integral: f is


integrable iff |f | is integrable. Thus, for instance, the well-known formula
   ∫_0^∞ (sin x)/x dx = π/2

has no meaning for Lebesgue integrals, since ∫_1^∞ |sin x / x| dx diverges to +∞
like ∫_1^∞ (1/x) dx. It has to be replaced by the limit relation

   ∫_0^X (sin x)/x dx → π/2     (X → ∞).
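A quick numerical check (an illustrative sketch, not part of the notes) contrasts the two behaviours: the truncated integrals of sin x / x settle down to π/2, while those of |sin x / x| keep growing.

import numpy as np

# Riemann sums on a fine grid: integral of sin(x)/x over [0, X] tends to pi/2,
# but the integral of |sin(x)/x| grows without bound (roughly like (2/pi) log X).
for X in (10, 100, 1000):
    x = np.linspace(1e-8, X, 2_000_000)
    dx = x[1] - x[0]
    f = np.sin(x) / x
    print(X, np.sum(f) * dx, np.sum(np.abs(f)) * dx)
print(np.pi / 2)   # 1.5707...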

The class of (Lebesgue-) integrable functions f on R is written L(R) or (for


reasons explained below) L1 (R) – abbreviated to L1 or L.
Higher dimensions.
In Rk, we start instead from k-dimensional boxes. If f is the indicator of
a box B = [a1, b1] × [a2, b2] × · · · × [ak, bk], ∫ f := ∏_{i=1}^k (bi − ai). We then ex-
tend to simple functions by linearity, to non-negative measurable functions
by taking increasing limits, and to measurable functions by splitting into
positive and negative parts.

Lp spaces.
For p ≥ 1, the Lp spaces Lp (Rk ) on Rk are the spaces of measurable
functions f with Lp -norm
   ‖f‖_p := ( ∫ |f|^p )^{1/p} < ∞.
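For instance (an illustrative numerical sketch, not from the notes), for f(x) = e^{−x} on (0, ∞) the L1 and L2 norms are 1 and 1/√2, which a truncated Riemann sum reproduces:

import numpy as np

# L^p norms of f(x) = exp(-x) on (0, infinity), approximated on [0, 50];
# exact values: ||f||_1 = 1, ||f||_2 = 1/sqrt(2) ~ 0.7071.
x = np.linspace(0.0, 50.0, 1_000_000)
dx = x[1] - x[0]
f = np.exp(-x)
for p in (1, 2):
    print(p, (np.sum(f**p) * dx) ** (1 / p))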

Riemann integrals.
Our first exposure to integration is the ‘Sixth-Form integral’, taught non-
rigorously at school. Mathematics undergraduates are taught a rigorous in-
tegral (in their first or second years), the Riemann integral [B. RIEMANN
(1826-1866)] – essentially this is just a rigorisation of the school integral.
It is much easier to set up than the Lebesgue integral, but much harder to
manipulate.
For finite intervals [a, b], we quote:
(i) for any function f Riemann-integrable on [a, b], it is Lebesgue-integrable
to the same value (but many more functions are Lebesgue integrable);
(ii) f is Riemann-integrable on [a, b] iff it is continuous a.e. on [a, b]. Thus the
question, “Which functions are Riemann-integrable?” cannot be answered
without the language of measure theory – which then gives one the techni-
cally superior Lebesgue integral anyway.
Note. Integration is like summation (which is why Leibniz gave us the in-
tegral sign ∫, as an elongated S). Lebesgue was a very practical man – his

father was a tradesman – and used to think about integration in the follow-
ing way. Think of a shopkeeper totalling up his day’s takings. The Riemann
integral is like adding up the takings – notes and coins – in the order in
which they arrived. By contrast, the Lebesgue integral is like totalling up
the takings in order of size - from the smallest coins up to the largest notes.
This is obviously better! In mathematical effect, it exchanges ‘integrating by
x-values’ (abscissae) with ‘integrating by y-values’ (ordinates).

Lebesgue-Stieltjes integral.
Suppose that F (x) is a non-decreasing function on R:

   F(x) ≤ F(y)     if x ≤ y

(prime example: F a probability distribution function). Such functions can


have at most countably many discontinuities, which are at worst jumps. We
may without loss re-define F at jumps so as to be right-continuous.
We now generalise the starting points above:
(i) Measure. We take µ((a, b]) := F(b) − F(a).
(ii) Integral. We take ∫_a^b 1 dF := F(b) − F(a).
We may now follow through the successive extension procedures used above.
We obtain:
(i) Lebesgue-Stieltjes measure µ, or µF,
(ii) Lebesgue-Stieltjes integral ∫ f dµ, or ∫ f dµF, or even ∫ f dF.
Similarly in higher dimensions; we omit further details.
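In the simplest case F is the distribution function of a discrete random variable, with jumps p_i at points x_i; then ∫ f dF collapses to a weighted sum of the jumps. A small sketch (the jump points and sizes below are made up for illustration):

# Lebesgue-Stieltjes integral against a purely discrete, right-continuous F:
# F jumps by p_i at x_i, so the integral of f dF is sum_i f(x_i) * p_i.
xs    = [0.0, 1.0, 2.5]    # jump locations (hypothetical)
jumps = [0.2, 0.5, 0.3]    # jump sizes, summing to 1, so F is a c.d.f.

def integral_dF(f):
    return sum(f(x) * p for x, p in zip(xs, jumps))

print(integral_dF(lambda x: x))        # the mean: 0.2*0 + 0.5*1 + 0.3*2.5 = 1.25
print(integral_dF(lambda x: x ** 2))   # the second moment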
Finite variation (FV).
If instead of being monotone non-decreasing, F is the difference of two
such functions, F = F1 − F2, we can define the integrals ∫ f dF1, ∫ f dF2 as
above, and then define

   ∫ f dF = ∫ f d(F1 − F2) := ∫ f dF1 − ∫ f dF2.

If [a, b] is a finite interval and F is defined on [a, b], a finite collection of
points x0, x1, . . . , xn with a = x0 < x1 < · · · < xn = b is called a partition
of [a, b], P say. The sum Σ_{i=1}^n |F(xi) − F(xi−1)| is called the variation of
F over the partition. The least upper bound of this over all partitions P is
called the variation of F over the interval [a, b], V_a^b(F):

   V_a^b(F) := sup_P Σ_i |F(xi) − F(xi−1)|.

This may be +∞; but if V_a^b(F) < ∞, F is said to be of finite variation (FV)
on [a, b], F ∈ FV_a^b (bounded variation, BV, is also used). If F is of finite
variation on all finite intervals, F is said to be locally of finite variation,
F ∈ FV_loc; if F is of finite variation on the real line, F is of finite variation,
F ∈ FV.
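As an illustrative sketch (not in the notes), take F(x) = sin x on [0, 2π], which is piecewise monotone with V_0^{2π}(F) = ∫_0^{2π} |cos x| dx = 4; every partition sum is at most 4, and finer partitions approach it:

import numpy as np

# Partition sums of |F(x_i) - F(x_{i-1})| for F = sin on [0, 2*pi]; each sum is
# bounded by the variation V = 4, and refining the partition approaches it.
F = np.sin
a, b = 0.0, 2 * np.pi
for n in (2, 3, 5, 50, 500):
    x = np.linspace(a, b, n + 1)
    print(n, np.sum(np.abs(np.diff(F(x)))))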
We quote (Jordan’s theorem) that the following are equivalent:
(i) F is locally of finite variation;
(ii) F is the difference F = F1 − F2 of two monotone functions.
So the above procedure defines the integral ∫ f dF when the integrator F is
of finite variation.

3. Probability.
Probability spaces.
The mathematical theory of probability can be traced to 1654, to corre-
spondence between PASCAL (1623-1662) and FERMAT (1601-1665). How-
ever, the theory remained both incomplete and non-rigorous till the 20th
century. It turns out that the Lebesgue theory of measure and integral
sketched above is exactly the machinery needed to construct a rigorous the-
ory of probability adequate for modelling reality (option pricing, etc.) for
us. This was realised by the great Russian mathematician and probabilist
A.N.KOLMOGOROV (1903-1987), whose classic book of 1933, Grundbegriffe
der Wahrscheinlichkeitsrechnung [Foundations of probability theory] inaugu-
rated the modern era in probability.
Recall from your first course on probability that, to describe a random
experiment mathematically, we begin with the sample space Ω, the set of all
possible outcomes. Each point ω of Ω, or sample point, represents a possible
– random – outcome of performing the random experiment. For a set A ⊆ Ω
of points ω we want to know the probability P (A) (or Pr(A), pr(A)). We
clearly want
1. P (∅) = 0, P (Ω) = 1.
2. P (A) ≥ 0 for all A.
3. If A1, A2, . . . , An are disjoint, P( ⋃_{i=1}^n Ai ) = Σ_{i=1}^n P(Ai)
(finite additivity – fa), which, as above, we will strengthen to
3*. If A1, A2, . . . (ad inf.) are disjoint,

   P( ⋃_{i=1}^∞ Ai ) = Σ_{i=1}^∞ P(Ai)     (countable additivity – ca).

4. If B ⊆ A and P (A) = 0, then P (B) = 0 (completeness).
Then by 1 and 3 (with A = A1 , Ω \ A = A2 ),

P (Ac ) = P (Ω \ A) = 1 − P (A).

So the class F of subsets of Ω whose probabilities P (A) are defined should


be closed under countable, disjoint unions and complements, and contain the
empty set ∅ and the whole space Ω. Such a class is called a σ-field of subsets
of Ω [or sometimes a σ-algebra, which one would write A]. For each A ∈ F,
P (A) should be defined (and satisfy 1, 2, 3∗, 4 above). So, P : F → [0, 1] is a
set-function,
P : A 7→ P (A) ∈ [0, 1] (A ∈ F).
The sets A ∈ F are called events. Finally, 4 says that all subsets of null sets
– events with probability zero (we will call the empty set ∅ empty, not null) –
should themselves be null sets (completeness). A probability space, or Kolmogorov triple,
is a triple (Ω, F, P ) satisfying these Kolmogorov axioms 1,2,3*,4 above. A
probability space is a mathematical model of a random experiment.
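The simplest illustration (hypothetical, not from the notes) is a fair die: Ω = {1, . . . , 6}, F the power set, P(A) = |A|/6, and the axioms can be checked directly:

from itertools import combinations

# A finite Kolmogorov triple: Omega = {1,...,6}, F = all subsets, P(A) = |A|/6.
Omega = frozenset(range(1, 7))
F = [frozenset(s) for r in range(7) for s in combinations(Omega, r)]
P = lambda A: len(A) / len(Omega)

print(len(F))                            # 64 = 2^6 events
print(P(frozenset()), P(Omega))          # axiom 1: 0.0 and 1.0
A1, A2 = frozenset({1, 2}), frozenset({5, 6})
print(P(A1 | A2), P(A1) + P(A2))         # finite additivity: both 2/3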

Random variables.
Next, recall random variables X from your first probability course. Given
a random outcome ω, you can calculate the value X(ω) of X (a scalar – a
real number, say; similarly for vector-valued random variables, or random
vectors). So, X is a function from Ω to R, X : Ω → R,

X : ω 7→ X(ω) (ω ∈ Ω).

Recall also that the distribution function of X is defined by


 
   F(x), or FX(x), := P({ω : X(ω) ≤ x}), or P(X ≤ x)     (x ∈ R).

We can only deal with functions X for which all these probabilities are de-
fined. So, for each x, we need {ω : X(ω) ≤ x} ∈ F. We summarize this by
saying that X is measurable with respect to the σ-field F (of events), briefly,
X is F-measurable. Then, X is called a random variable [non-F-measurable
X cannot be handled, and so are left out]. So,
(i) a random variable X is an F-measurable function on Ω;
(ii) a function on Ω is a random variable (is measurable) iff its distribution
function is defined.

Generated σ-fields.
The smallest σ-field containing all the sets {ω : X(ω) ≤ x} for all real x
[equivalently, {X < x}, {X ≥ x}, {X > x}]² is called the σ-field generated
by X, written σ(X). Thus,

X is F-measurable [is a random variable] iff σ(X) ⊆ F.

When the (random) value X(ω) is known, we know which of the events in the
σ-field generated by X have happened: these are the events {ω : X(ω) ∈ B},
where B runs through the Borel σ-field [the σ-field generated by the intervals
– it makes no difference whether open, closed etc.] on the line.
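For a finite Ω and a discrete X this is very concrete (an illustrative sketch, not part of the notes): σ(X) consists of all unions of the level sets {ω : X(ω) = value}, so 'knowing X' means knowing which of these events occurred.

from itertools import combinations

# sigma(X) for a finite Omega: all unions of the level sets of X.
# Hypothetical example: Omega = {1,...,6}, X = parity of the outcome.
Omega = [1, 2, 3, 4, 5, 6]
X = lambda w: w % 2

levels = {}
for w in Omega:
    levels.setdefault(X(w), set()).add(w)
blocks = list(levels.values())     # the partition of Omega generating sigma(X)

sigma_X = [set().union(*c) for r in range(len(blocks) + 1)
           for c in combinations(blocks, r)]
print(blocks)      # [{1, 3, 5}, {2, 4, 6}]
print(sigma_X)     # [set(), {1, 3, 5}, {2, 4, 6}, {1, 2, 3, 4, 5, 6}]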

Interpretation.
Think of σ(X) as representing what we know when we know X, or in
other words the information contained in X (or in knowledge of X). This is
from the following result, due to J. L. DOOB (1910-2004), which we quote:

σ(X) ⊆ σ(Y ) iff X = g(Y )

for some measurable function g. For, knowing Y means we know X := g(Y )


– but not vice-versa, unless the function g is one-to-one [injective], when the
inverse function g −1 exists, and we can go back via Y = g −1 (X).

Expectation.
A measure (II.1) determines an integral (II.2). A probability measure P ,
being a special kind of measure [a measure of total mass one] determines a
special kind of integral, called an expectation.
Definition. The expectation E of a random variable X on (Ω, F, P ) is
defined by

   E[X] := ∫_Ω X dP,     or     ∫_Ω X(ω) dP(ω).

If X is real-valued, say, with distribution function F , recall (Ch. I) that EX


is defined in your first course on probability by
   E[X] := ∫ x f(x) dx     if X has a density f,

or, if X is discrete, taking values xn (n = 1, 2, . . .) with probability function
f(xn) (≥ 0), Σ f(xn) = 1,

   E[X] := Σ xn f(xn)

(weighted average of possible values, weighted according to their probability).

² Here, and in Measure Theory, whether intervals are open, closed or half-open doesn't
matter. In Topology, such distinctions are crucial. One can combine Topology and
Measure Theory, but we must leave this here.


These two formulae are the special cases (for the density and discrete cases)
of the general formula

   E[X] := ∫_{−∞}^∞ x dF(x),

where the integral on the right is a Lebesgue-Stieltjes integral. This in turn


agrees with the definition above, since if F is the distribution function of X,
   ∫_Ω X dP = ∫_{−∞}^∞ x dF(x)

follows by the change of variable formula for the measure-theoretic integral,


on applying the map X : Ω → R (we quote this: see any book on Measure
Theory).
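A quick Monte Carlo sanity check of the discrete and density forms of EX (an illustrative sketch; the distributions below are chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(0)

# Discrete case: X takes values 0, 1, 3 with probabilities 0.5, 0.3, 0.2.
xs, ps = np.array([0.0, 1.0, 3.0]), np.array([0.5, 0.3, 0.2])
print((xs * ps).sum())                               # E[X] = 0.9
print(rng.choice(xs, p=ps, size=1_000_000).mean())   # sample mean, ~0.9

# Density case: X ~ Exp(1), E[X] = integral of x exp(-x) dx = 1.
x = np.linspace(0.0, 50.0, 1_000_000)
print(np.sum(x * np.exp(-x)) * (x[1] - x[0]))        # ~1
print(rng.exponential(1.0, size=1_000_000).mean())   # ~1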
Glossary. We now have two parallel languages, measure-theoretic and prob-
abilistic:
Measure                      Probability
Integral                     Expectation
Measurable set               Event
Measurable function          Random variable
almost everywhere (a.e.)     almost surely (a.s.)

§4. Equivalent Measures and Radon-Nikodym derivatives.


Given two measures P and Q defined on the same σ-field F, we say that
P is absolutely continuous with respect to Q, written

P << Q,

if P (A) = 0 whenever Q(A) = 0, A ∈ F. We quote from measure theory the


vitally important Radon-Nikodym theorem: P << Q iff there exists an (F-)
measurable function f such that
   P(A) = ∫_A f dQ     ∀A ∈ F

(note that since the integral of anything over a null set is zero, any P so
representable is certainly absolutely continuous with respect to Q – the point
is that the converse holds). Since P(A) = ∫_A dP, this says that

   ∫_A dP = ∫_A f dQ     ∀A ∈ F.

By analogy with the chain rule of ordinary calculus, we write dP/dQ for f ;
then

   ∫_A dP = ∫_A (dP/dQ) dQ     ∀A ∈ F.

Symbolically,

   if P << Q,     dP = (dP/dQ) dQ.
The measurable function (= random variable) dP/dQ is called the Radon-
Nikodym derivative (RN-derivative) of P with respect to Q.
If P << Q and also Q << P , we call P and Q equivalent measures,
written P ∼ Q. Then dP/dQ and dQ/dP both exist, and
   (dP/dQ) = 1 / (dQ/dP).
For P ∼ Q, P (A) = 0 iff Q(A) = 0: P and Q have the same null sets. Taking
negations: P ∼ Q iff P, Q have the same sets of positive measure. Taking
complements: P ∼ Q iff P, Q have the same sets of probability one [the same
a.s. sets]. Thus the following are equivalent: P ∼ Q iff P , Q have the same
null sets/the same a.s. sets/the same sets of positive measure.
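A concrete illustration (a sketch with hypothetical measures, not from the notes): P = N(1, 1) and Q = N(0, 1) are equivalent on R, with dP/dQ(x) = exp(x − 1/2), the ratio of their densities; the Radon-Nikodym relation P(A) = E_Q[1_A dP/dQ] can be checked by simulation.

import numpy as np

rng = np.random.default_rng(0)

# P = N(1,1), Q = N(0,1): equivalent measures with dP/dQ(x) = exp(x - 1/2).
dPdQ = lambda x: np.exp(x - 0.5)
A = lambda x: x > 1.0                    # the event A = {X > 1}

x_P = rng.normal(1.0, 1.0, 1_000_000)    # sample from P
x_Q = rng.normal(0.0, 1.0, 1_000_000)    # sample from Q

print(np.mean(A(x_P)))                   # P(A) directly: ~0.5
print(np.mean(A(x_Q) * dPdQ(x_Q)))       # P(A) = E_Q[1_A dP/dQ]: ~0.5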
Note. Far from being an abstract theoretical result, the Radon-Nikodym
theorem is of key practical importance, in two ways:
(a) It is the key to the concept of conditioning (‘using what we know’ – §5,
§6 below), which is of central importance throughout,
(b) The concept of equivalent measures is central to the key idea of math-
ematical finance, risk-neutrality, and hence to its main results, the Black-
Scholes formula, the Fundamental Theorem of Asset Pricing (FTAP), etc.
The key to all this is that prices should be the discounted expected values
under the equivalent martingale measure. Thus equivalent measures, and
the operation of change of measure, are of central economic and financial
importance. We shall return to this later in connection with the main math-
ematical result on change of measure, Girsanov’s theorem (VII.4).

Recall that we first met the phrase ‘equivalent martingale measure’ in
II.5 above. We now know what a measure is, and what equivalent measures
are; we will learn about martingales in III.3 below.

§5. Conditional Expectations.


Suppose that X is a random variable, whose expectation exists (i.e.
E[|X|] < ∞, or X ∈ L1 ). Then E[X], the expectation of X, is a scalar
(a number) – non-random. The expectation operator E averages out all the
randomness in X, to give its mean (a weighted average of the possible values
of X, weighted according to their probability, in the discrete case).
It often happens that we have partial information about X – for instance,
we may know the value of a random variable Y which is associated with X,
i.e. carries information about X. We may want to average out over the
remaining randomness. This is an expectation conditional on our partial in-
formation, or more briefly a conditional expectation.
This idea will be familiar already from elementary courses, in two cases
(see e.g. [BF]):
1. Discrete case, based on the formula

P (A|B) := P (A ∩ B)/P (B) if P (B) > 0.

If X takes values x1 , · · · , xm with probabilities f1 (xi ) > 0, Y takes values


y1 , · · · , yn with probabilities f2 (yj ) > 0, (X, Y ) takes values (xi , yj ) with
probabilities f(xi, yj) > 0, then
(i) f1(xi) = Σ_j f(xi, yj),   f2(yj) = Σ_i f(xi, yj),
(ii) P(Y = yj | X = xi) = P(X = xi, Y = yj)/P(X = xi) = f(xi, yj)/f1(xi)
     = f(xi, yj) / Σ_j f(xi, yj).

This is the conditional distribution of Y given X = xi , written


   fY|X(yj | xi) = f(xi, yj)/f1(xi) = f(xi, yj) / Σ_j f(xi, yj).

Its expectation is
   E[Y | X = xi] = Σ_j yj fY|X(yj | xi)
                 = Σ_j yj f(xi, yj) / Σ_j f(xi, yj).

But this approach only works when the events on which we condition have
positive probability, which only happens in the discrete case.
2. Density case. Formally replacing the sums above by integrals: if (X, Y )
has density f (x, y),
   X has density f1(x) := ∫_{−∞}^∞ f(x, y) dy,     Y has density f2(y) := ∫_{−∞}^∞ f(x, y) dx.

We define the conditional density of Y given X = x by the continuous ana-


logue of the discrete formula above:
   fY|X(y | x) := f(x, y)/f1(x) = f(x, y) / ∫_{−∞}^∞ f(x, y) dy.

Its expectation is
   E[Y | X = x] = ∫_{−∞}^∞ y fY|X(y | x) dy = ∫_{−∞}^∞ y f(x, y) dy / ∫_{−∞}^∞ f(x, y) dy.

Example: Bivariate normal distribution, N(µ1, µ2, σ1², σ2², ρ). Here

   E[Y | X = x] = µ2 + ρ (σ2/σ1) (x − µ1),
the familiar regression line of statistics (linear model: [BF, Ch. 1]). See I.4.
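This formula is easy to check by simulation (an illustrative sketch; the parameter values are arbitrary): condition on X landing in a narrow band around x and compare the empirical mean of Y with the regression line.

import numpy as np

rng = np.random.default_rng(0)

# Bivariate normal with hypothetical parameters; check E[Y | X = x0] numerically.
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.6
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
X, Y = rng.multivariate_normal([mu1, mu2], cov, size=2_000_000).T

x0 = 0.5
band = np.abs(X - x0) < 0.01                 # condition on X close to x0
print(Y[band].mean())                        # empirical E[Y | X ~ x0]
print(mu2 + rho * (s2 / s1) * (x0 - mu1))    # regression line value: 1.6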

Kolmogorov’s approach: conditional expectations via σ-fields


The problem is that joint densities need not exist – do not exist, in general.
One of the great contributions of Kolmogorov’s classic book of 1933 was the
realization that measure theory – specifically, the Radon-Nikodym theorem
– provides a way to treat conditioning in general, without assuming that we
are in the discrete case or density case above.
Recall that the probability triple is (Ω, F, P ). Take B a sub-σ-field of F,
B ⊂ F (recall: a σ-field represents information; the big σ-field F represents
‘knowing everything’, the small σ-field B represents ‘knowing something’).
Suppose that Y is a non-negative random variable whose expectation
exists: E[Y ] < ∞. The set-function
   Q(B) := ∫_B Y dP     (B ∈ B)

is non-negative (because Y is), σ-additive – because
   ∫_B Y dP = Σ_n ∫_{Bn} Y dP

if B = ∪n Bn , Bn disjoint – and defined on the σ-algebra B, so is a measure


on B. If P (B) = 0, then Q(B) = 0 also (the integral of anything over a
null set is zero), so Q << P . By the Radon-Nikodym theorem (III.4), there
exists a Radon-Nikodym derivative of Q with respect to P on B, which is
B-measurable [in the Radon-Nikodym theorem as stated in III.4, we had F in
place of B, and got a random variable, i.e. an F-measurable function. Here,
we just replace F by B.] Following Kolmogorov (1933), we call this Radon-
Nikodym derivative the conditional expectation of Y given (or conditional on)
B, E[Y |B]: this is B-measurable, integrable, and satisfies
   ∫_B Y dP = ∫_B E[Y |B] dP     ∀B ∈ B.     (∗)

In the general case, where Y is a random variable whose expectation exists


(E[|Y |] < ∞) but which can take values of both signs, decompose Y as

Y = Y+ − Y−

and define E[Y |B] by linearity as

E[Y |B] := E[Y+ |B] − E[Y− |B].

Suppose now that B is the σ-field generated by a random variable X:


B = σ(X) (so B represents the information contained in X, or what we
know when we know X). Then E[Y |B] = E[Y |σ(X)], which is written more
simply as E[Y |X]. Its defining property is
   ∫_B Y dP = ∫_B E[Y |X] dP     ∀B ∈ σ(X).

Similarly, if B = σ(X1 , · · · , Xn ) (B is the information in (X1 , · · · , Xn )) we


write E[Y | σ(X1, · · · , Xn)] as E[Y | X1, · · · , Xn]:

   ∫_B Y dP = ∫_B E[Y | X1, · · · , Xn] dP     ∀B ∈ σ(X1, · · · , Xn).
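On a finite Ω the defining property (*) can be verified by hand; here is a small sketch (a hypothetical example, not from the notes) with Ω = {0, . . . , 5}, P uniform, X the parity of the outcome and Y(ω) = ω:

import numpy as np

# E[Y | X] on a finite probability space: constant on each level set of X,
# equal there to the P-weighted average of Y; then (*) holds for B in sigma(X).
Omega = np.arange(6)
P = np.full(6, 1 / 6)
X = Omega % 2
Y = Omega.astype(float)

EY_given_X = np.array([(Y[X == X[w]] * P[X == X[w]]).sum() / P[X == X[w]].sum()
                       for w in Omega])

for v in (0, 1):                    # the generating events B = {X = v}
    B = (X == v)
    print((Y[B] * P[B]).sum(), (EY_given_X[B] * P[B]).sum())   # equal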

Note.
1. To check that something is a conditional expectation: we have to check
that it integrates the right way over the right sets [i.e., as in (*)].
2. From (*): if two things integrate the same way over all sets B ∈ B, they
have the same conditional expectation given B.
3. For notational convenience, we use E[Y |B] and E_B Y interchangeably.
4. The conditional expectation thus defined coincides with any we may have
already encountered - in regression or multivariate analysis, for example.
However, this may not be immediately obvious. The conditional expectation
defined above – via σ-fields and the Radon-Nikodym theorem – is rightly
called by Williams ([W], p.84) ‘the central definition of modern probability’.
It may take a little getting used to. As with all important but non-obvious
definitions, it proves its worth in action: see III.6 below for properties of con-
ditional expectations, and Chapter IV for stochastic processes, particularly
martingales [defined in terms of conditional expectations].

§6. Properties of Conditional Expectations.

1. B = {∅, Ω}. Here B is the smallest possible σ-field (any σ-field of subsets
of Ω contains ∅ and Ω), and represents ‘knowing nothing’.

E[Y |{∅, Ω}] = EY.

Proof. We have to check (*) of §5 for B = ∅ and B = Ω. For B = ∅ both


sides are zero; for B = Ω both sides are EY . //

2. B = F. Here B is the largest possible σ-field: ‘knowing everything’.

   E[Y |F] = Y     P-a.s.

Proof. We have to check (*) for all sets B ∈ F. The only integrand that
integrates like Y over all sets is Y itself, or a function agreeing with Y except
on a set of measure zero.
Note. When we condition on F (‘knowing everything’), we know Y (because
we know everything). There is thus no uncertainty left in Y to average out,
so taking the conditional expectation (averaging out remaining randomness)
has no effect, and leaves Y unaltered.

3. If Y is B-measurable, E[Y |B] = Y   P-a.s.
Proof. Recall that Y is always F-measurable (this is the definition of Y being
a random variable). For B ⊂ F, Y may not be B-measurable, but if it is,
the proof above applies with B in place of F.
Note. If Y is B-measurable, when we are given B (that is, when we condition
on it), we know Y . That makes Y effectively a constant, and when we take
the expectation of a constant, we get the same constant.

4. If Y is B-measurable, E[Y Z|B] = Y E[Z|B]   P-a.s.


We refer for the proof of this to [W], p.90, proof of (j).
Note. Williams calls this property ‘taking out what is known’. To remem-
ber it: if Y is B-measurable, then given B we know Y , so Y is effectively a
constant, so can be taken out through the integration signs in (*), which is
what we have to check (with Y Z in place of Y ).

5. If C ⊂ B, E[E[Y |B]|C] = E[Y |C] a.s.


Proof. E_C[E_B Y] is C-measurable, and for C ∈ C ⊂ B,

   ∫_C E_C[E_B Y] dP = ∫_C E_B Y dP     (definition of E_C, as C ∈ C)
                     = ∫_C Y dP         (definition of E_B, as C ∈ B).

So E_C[E_B Y] satisfies the defining relation for E_C Y. Being also C-measurable,
it is E_C Y (a.s.). //

5’. If C ⊂ B, E[E[Y |C]|B] = E[Y |C] a.s.


Proof. E[Y |C] is C-measurable, so B-measurable as C ⊂ B, so E[.|B] has no
effect on it, by 3.

Note. 5, 5’ are the two forms of the iterated conditional expectations property.
When conditioning on two σ-fields, one larger (finer), one smaller (coarser),
the coarser rubs out the effect of the finer, either way round. This is also
called the coarse-averaging property, or (Williams [W]) the tower property.

6. Conditional Mean Formula. E[E[Y |B]] = EY   P-a.s.


Proof. Take C = {∅, Ω} in 5 and use 1. //
Example. Check this for the bivariate normal distribution considered above.
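A simulation sketch of this check (illustrative; same hypothetical parameters as before): for the bivariate normal, E[Y |X] = µ2 + ρ(σ2/σ1)(X − µ1), and its mean should equal EY = µ2.

import numpy as np

rng = np.random.default_rng(0)

# Conditional Mean Formula for the bivariate normal: E[E[Y|X]] = E[Y] = mu2.
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.6
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
X, Y = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000).T

E_Y_given_X = mu2 + rho * (s2 / s1) * (X - mu1)
print(E_Y_given_X.mean(), Y.mean())    # both ~ mu2 = 1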

Note. Compare this with the Conditional Variance Formula of Statistics: see
e.g. SMF, IV.6, or Ch. VIII.

7. Role of independence. If Y is independent of B,

E[Y |B] = E[Y ] a.s.

Proof. See [W], p.88, 90, property (k).

Note. In the elementary definition P (A|B) := P (A∩B)/P (B) (if P (B) > 0),
if A and B are independent (that is, if P (A ∩ B) = P (A).P (B)), then
P (A|B) = P (A): conditioning on something independent has no effect. One
would expect this familiar and elementary fact to hold in this more general
situation also. It does – and the proof of this rests on the proof above.

Projections. In Property 5 (tower property), take B = C:

E[E[X|C]|C] = E[X|C].

This says that the operation of taking conditional expectation given a sub-
σ-field C is idempotent – doing it twice is the same as doing it once. Also,
taking conditional expectation is a linear operation (it is defined via an in-
tegral, and integration is linear). Recall from Linear Algebra that we have
met such idempotent linear operations before. They are the projections.
(Example: (x, y, z) 7→ (x, y, 0) projects from 3-dimensional space onto the
(x, y)-plane.) This view of conditional expectation as projection is useful
and powerful; see e.g. [BK], [BF] or
[N] J. Neveu, Discrete-parameter martingales (North-Holland, 1975), I.2.
It is particularly useful when one has not yet got used to conditional expec-
tation defined measure-theoretically as above, as it gives us an alternative
(and perhaps more familiar) way to think.
