
Background notes to course ‘Stochastic Processes’

Spring 2013
always under construction as well

Abstract

Here we provide some background material for the Lecture Notes.

1 Review of definitions

Recall the following definitions: let a space E be given. A collection A of subsets of E is called an
algebra (or a field) if the following three conditions hold:

i) E ∈ A;

ii) A ∈ A ⇒ A^c = E \ A ∈ A;

iii) A, B ∈ A ⇒ A ∪ B ∈ A.

A collection A is called a σ-algebra on E if A is an algebra such that A_1, A_2, . . . ∈ A ⇒ ∪_{i=1}^∞ A_i ∈ A.
Hence, an algebra is closed under taking finite unions and finite intersections, whereas a σ-algebra
is closed under taking countable unions and intersections. The pair (E, A) is called a measurable space,
if A is a σ-algebra on E. Contrast this with the definition of a topology.

Definition 1.1 A collection T of so-called open subsets of the space E is called a topology if

i) ∅, E ∈ T ;

ii) the intersection of finitely many members of T belongs to T : A_1, . . . , A_n ∈ T implies ∩_{k=1}^n A_k ∈ T ;

iii) the union of arbitrarily many members belongs to T : if A_α ∈ T , α ∈ B, then ∪_{α∈B} A_α ∈ T .

We call E a topological space.


Let C be a collection of subsets of E. By σ(C), the σ-algebra generated by C, we understand the
smallest σ-algebra that contains C. The Borel-σ-algebra on the topological space E is the smallest
σ-algebra that contains the open sets of E. A set A ⊂ E is called a G_δ-set if it is a countable
intersection of open sets in E. It is an F_σ-set if it is a countable union of closed sets (i.e. complements
of open sets) in E.
If E is a metric space, equipped say with metric ρ, then the Borel-σ-algebra B(E) is the σ-algebra
generated by the open sets induced by the metric ρ. A set A ⊂ E is open if for each x ∈ A
there exists r_x > 0 such that B_{r_x}(x) = {y | ρ(x, y) < r_x} is contained in A. If, in addition, the metric
space E is separable (i.e. there is a countable dense subset), then B(E) = σ(B_q(x) : q ∈ Q, q > 0, x in a
countable dense subset of E).
We use B to denote the Borel sets in R.
Many statements concerning σ-algebras can be reduced to statements on certain generic collections
of sets: a main one is the notion of π-system. A collection A of subsets of E is called a π-system if
A is invariant under finite intersections, i.e. A, B ∈ A ⇒ A ∩ B ∈ A.
Desirable properties of π-systems are the following (see Williams, PwM).

Lemma 1.2 i) Let I be a π-system on E generating the σ-algebra A = σ(I). Suppose that µ_1, µ_2 are
measures on (E, σ(I)), such that µ_1(E) = µ_2(E) < ∞, and µ_1(A) = µ_2(A) for all A ∈ I.
Then µ_1(A) = µ_2(A) for all A ∈ σ(I);

ii) Let a probability space (Ω, F, P) be given. Let I, J be π-systems of subsets of Ω with σ(I), σ(J) ⊆ F.
Then σ(I) and σ(J) are independent if I and J are independent, that is, if P{A ∩ B} =
P{A}P{B} for all A ∈ I, B ∈ J.

Suppose that we have measurable spaces (Ω, F) and (E, E). Then the function X : Ω → E is called
a random element if X is F/E-measurable, in other words, if X^{-1}(A) ∈ F for all A ∈ E. The
following lemma is helpful in checking measurability.

Lemma 1.3 Let (Ω, F) and (E, E) be measurable spaces. Let C be a collection of subsets of E, such
that σ(C) = E. Let X : Ω → E be a map. Then X is F/E-measurable if X^{-1}(A) ∈ F for all
A ∈ C.
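
Lemma 1.3 can be made concrete on a finite space, where σ(C) is computable by brute force. The
following sketch (a hypothetical toy example, not from the notes) generates the σ-algebra from a
collection C by closing it under complements and unions, and then checks measurability of a map X
by testing preimages of a generating collection only, as the lemma allows.

```python
E = frozenset({1, 2, 3, 4})

def sigma(generators):
    """Smallest family containing `generators`, E and the empty set that is
    closed under complement and union (finite = countable here, E is finite)."""
    sets = {frozenset(), E} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for A in list(sets):
            for B in list(sets):
                for C in (E - A, A | B):
                    if C not in sets:
                        sets.add(C)
                        changed = True
    return sets

C = [{1}, {1, 2}]
A = sigma(C)                                 # sigma-algebra generated by C
X = {1: 'a', 2: 'a', 3: 'b', 4: 'b'}         # a map X : E -> {'a', 'b'}
# Checking X^{-1} on the generator {'a'} of the power set of {'a','b'} suffices:
preimage = frozenset(e for e in E if X[e] == 'a')
print(preimage in A)                         # True: X^{-1}({'a'}) = {1,2} lies in sigma(C)
```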

2 σ-algebra for a stochastic process

Let the probability space (Ω, F, P) be given, as well as the measurable space (E, E). Let further
X = (X_t)_{t∈T} be a s.p. (stochastic process) with state space E and index set T. This means that X_t, t ∈ T,
are all E-valued, F/E-measurable random elements. We assume that T is an interval of R.
Given ω ∈ Ω, the corresponding realisation X(ω) = (X_t(ω))_{t∈T} of the s.p. X is called a sample path
or trajectory of X.
One can also view X(ω) as an E-valued function on T, i.e. there exists f : T → E, such that
X(ω) = f, and X_t(ω) = f(t), t ∈ T. Hence, the process X takes values in the function space E^T.
The question is: how should one define a σ-algebra on E^T in some consistent way, such that we
can answer questions concerning the probability of events of interest that concern the whole path?
We take the following approach.
A finite-dimensional rectangle in E^T is a set of the form

S = {x ∈ E^T | x_{t_1} ∈ E_1, . . . , x_{t_n} ∈ E_n},

for {t_1, . . . , t_n} ⊂ T, E_i ∈ E, i = 1, . . . , n, for some finite integer n > 0.

It is natural to view such sets as the building blocks of our σ-algebra, so we define

E^T = σ{1-dimensional rectangles}

(which equals the σ-algebra generated by all finite-dimensional rectangles, since these are finite
intersections of 1-dimensional ones).

For understanding the structure of this σ-algebra, let us recall the concept of product-σ-algebras E^n,
n = 1, 2, . . . , ∞.
For n finite, this is simply the σ-algebra generated by all n-dimensional rectangles:

E^n = σ{E_1 × · · · × E_n | E_1, . . . , E_n ∈ E}.

For n = ∞, it is the σ-algebra generated by all finite-dimensional rectangles (defined in E^∞ in a
natural way):

E^∞ = σ{E_1 × · · · × E_n × E × E × · · · | E_1, . . . , E_n ∈ E, n = 1, 2, . . .}.

Now we shall call a set A ∈ E^T a cylinder, if there exists a finite subset {t_1, . . . , t_n} of T and a set
B ∈ E^n, such that
A = {x | (x_{t_1}, . . . , x_{t_n}) ∈ B}. (2.1)
It will be called a σ-cylinder, if there exists an (at most countable, not necessarily ordered) subset
{t_1, . . .} of T and a set B ∈ E^∞, such that (2.1) holds.
We have the following characterisation.

Lemma 2.1 E^T is precisely the collection of σ-cylinder sets.

Proof. First note that each σ-cylinder is contained in the σ-algebra generated by the finite-dimensional
cylinders. Hence E^T contains all σ-cylinders. Notice that by definition E^T is the smallest σ-algebra
containing all 1-dimensional cylinders. Hence it is the smallest σ-algebra containing all σ-cylinders.
So it is sufficient to show that the σ-cylinders themselves form a σ-algebra.
First, E^T = {x | x_t ∈ E} for arbitrary, but fixed t. Hence it is a σ-cylinder.
Let A = {x | (x_{t_1}, x_{t_2}, . . .) ∈ B}, B ∈ E^∞. Then A^c = {x | (x_{t_1}, x_{t_2}, . . .) ∈ E^∞ \ B} is a σ-cylinder.
Finally, let A_n = {x | (x_{t_{n1}}, x_{t_{n2}}, . . .) ∈ B_n}, B_n ∈ E^∞, n = 1, 2, . . .. It is sufficient to show that the
intersection is a σ-cylinder. To this end, we need to describe all A_n in terms of the same ‘time’
points. We therefore need to form the concatenation of the time points that define the different
σ-cylinders A_1, . . .. Let S_n = {t_{nk}}_{k=1,...}, and put S = ∪_n S_n. S is countable.
Let f : S → {1, 2, . . .} be a bijective map (representing an enumeration of the points of S). The set
A_n can be defined in terms of all time points f^{-1}(1), f^{-1}(2), . . .. At time points f^{-1}(m) ∉ {t_{nk}}_k any
path x ∈ A_n is allowed to take any value. At time points f^{-1}(m) ∈ {t_{nk}}_k the values are prescribed.
To formalise this, define B̃_n ⊂ E^∞ by

B̃_n = {x = (x_1, x_2, . . .) ∈ E^∞ | (x_{f(t_{n1})}, x_{f(t_{n2})}, . . .) ∈ B_n}.

Note that B̃_n ∈ E^∞, since B_n ∈ E^∞. Then A_n can be expressed as

A_n = {x ∈ E^T | (x_{f^{-1}(1)}, x_{f^{-1}(2)}, . . .) ∈ B̃_n}.

But now ∩_n A_n = {x | (x_{f^{-1}(1)}, x_{f^{-1}(2)}, . . .) ∈ ∩_n B̃_n}, and hence is a σ-cylinder.

We conclude that E^T must be equal to the σ-algebra of σ-cylinders. QED

Remark A stochastic process X = (X_t)_t, with each X_t F/E-measurable, is F/E^T-measurable by con-
struction. Hence, the induced probability measure on (E^T, E^T) is given by:

P_X{A} = P{ω | X(ω) ∈ A} = P{X^{-1}(A)},

for any σ-cylinder A. This probability measure is uniquely determined by any π-system generating
E^T. A convenient π-system is given by

I = {A | ∃n ∈ Z_+, t_1 < · · · < t_n, t_i ∈ T, i = 1, . . . , n, A_1, . . . , A_n ∈ E,
such that A = {x ∈ E^T | x_{t_i} ∈ A_i, i = 1, . . . , n}}.

Note that a random variable X_t can be obtained from X by projection or co-ordinate maps: let
S ⊂ T, and let x ∈ E^T. For any S ⊆ T: E^S = {x = (x_t)_{t∈S} : x_t ∈ E}. The precise indices
belonging to S do not disappear in this notation. For instance, if T = {1, 2, 3, 4, 5} and S = {1, 5},
then R^S = {x = (x_1, x_5) | x_1, x_5 ∈ R}. This looks artificial, but is natural in the context of viewing
the index set as the collection of observation instants of a process.
Then the projection π_S : E^T → E^S is defined by

π_S(x) = (x_t)_{t∈S},

and π_t(x) = x_t is simply the projection on one co-ordinate. It follows that X_t = π_t ◦ X.


Note that E^T is in fact defined as the smallest σ-algebra that makes all projections on one co-ordinate
measurable:

E^T = σ(π_t, t ∈ T).

One can view σ-algebras as the amount of information on individual states that is available to the
observer: of individual states one can only observe the measurable sets to which a state belongs. By
virtue of Lemma 2.1, one can only observe the values of a function in E^T at at most countably many
time instants in T.

Corollary 2.2 Let T = [0, ∞), E = R, E = B(R), and C = {f : T → E | f continuous}.

Then C ∉ E^T.

Problem 2.1 Prove this corollary.

It is often convenient to complete a σ-algebra. Consider the probability space (Ω, F, P). The
completion F̄ of F w.r.t. P is the σ-algebra generated by F and the sets A ⊂ Ω such that there exist
sets A_1, A_2 ∈ F with A_1 ⊂ A ⊂ A_2 and P{A_2 \ A_1} = 0. Put P{A} = P{A_1}. In words, the completion
of F is the σ-algebra obtained by adding all subsets of null sets to F.

Problem 2.2 Show that the set of measurable real-valued functions on T = [0, ∞) is not measurable,
in other words, show that

M = {x ∈ R^T | x is B[0, ∞)/B-measurable} ∉ B^T.

Moreover, show that there is no probability measure P on (R^T, B^T) such that M ∈ B̄^T, that is, such
that M belongs to the completion of B^T w.r.t. P. Hint: consider M^c; show that no σ-cylinder can
contain only measurable functions x.

σ-Algebra generated by a random variable or a stochastic process
In a similar fashion one can consider the σ-algebra generated by random variables or by a stochastic
process.
Let Y : Ω → E be an F/E-measurable random variable. Then F^Y ⊂ F is the σ-algebra generated
by Y. This is the minimal σ-algebra that makes Y a measurable random variable. Suppose that C
is a π-system generating E; then Y^{-1}C = {Y^{-1}(C) | C ∈ C} is a π-system generating F^Y.
Let us consider a stochastic process X = (X_t)_{t∈T} with T = Z_+ or T = R_+. The σ-algebra F_t^X is the
σ-algebra generated by X up to time t. It is the minimal σ-algebra that makes the random variables
X_s, s ≤ t, measurable, and so it contains each σ-algebra F_s^X, for s ≤ t.
How can one generate this σ-algebra? If C is a π-system for E, then F_t^X = σ(X_s^{-1}(C) : C ∈ C, s ≤ t).
Unfortunately, the collection of sets {X_s^{-1}(C) | C ∈ C, s ≤ t} itself is not a π-system. However, the
following is a π-system generating F_t^X:

C^X = {X_{t_1}^{-1}(C_1) ∩ · · · ∩ X_{t_n}^{-1}(C_n) | t_1 < · · · < t_n ≤ t, C_1, . . . , C_n ∈ C, n = 1, 2, . . .}.

This can be used in checking independence of σ-algebras, measurability issues, etc. What we see
here again is that the information contained in the minimal σ-algebra F_t^X concerns the behaviour
of paths of X up to time t, observed at at most countably many time points.

3 Measurable maps

Continuous maps between two metric spaces are always measurable with respect to the Borel-σ-
algebra on these spaces.

Corollary 3.1 Let E′, E be metric spaces. Let f : E′ → E be a continuous map, i.e. f^{-1}(B) is
open in E′ for each open set B ⊂ E. Then f is B(E′)/B(E)-measurable.

Proof. The open sets of E generate B(E). The statement follows from Lemma 1.3. QED

In stochastic process theory, right-continuous functions on the positive real line R_+ play an important
role. We want to show that such functions are measurable with respect to the Borel-σ-algebra on
R_+. We will be more specific.
Let E be a Polish space, with metric ρ say, and let E be the Borel-σ-algebra on E.

Lemma 3.2 There exists a countable class H of continuous functions f : E → [0, 1], such that
xn → x in E iff f (xn ) → f (x) in [0, 1] for all f ∈ H.

Proof. Take a countable, dense subset y1 , y2 , . . . of E. Let fk,n : E → [0, 1] be a continuous function
with fk,n (y) = 1 for ρ(y, yk ) ≤ 1/n, and fk,n (y) = 0 for ρ(y, yk ) ≥ 2/n. Then H = {fk,n | k, n = 1, . . .}
has the desired property. QED

Lemma 3.3 If x : [0, ∞) → E is right-continuous, then x is continuous except at at most a countable
collection of points.

Proof. In view of the previous lemma, it is sufficient to show that t ↦ f(x_t) has the desired property
for each f ∈ H. That is, it is sufficient to consider the case E = [0, 1].
For t > 0 let

y_t^1 = lim sup_{s↑t} x_s − lim inf_{s↑t} x_s (≥ 0 !),    y_t^2 = |x_t − lim_{s↑t} x_s| if y_t^1 = 0, and y_t^2 = 0 otherwise.

If x is discontinuous at t, we have either y_t^1 > 0 or y_t^2 > 0. Hence, it is sufficient to show that for any
ε > 0 the sets A^i = {t | y_t^i > ε} are at most countable (why?).
By right-continuity,

0 = lim_{s↓t} y_s^1 = lim_{s↓t} y_s^2,    t ≥ 0. (3.1)

In particular, taking t = 0 gives that A^i ∩ (0, δ] = ∅ for some δ > 0. It follows that

τ^i = sup{δ > 0 | A^i ∩ (0, δ] is at most countable} > 0.

But if A^i is uncountable, we would have τ^i < ∞. This implies the existence of a sequence τ^i_n ∈
A^i ∩ (τ^i, ∞) with τ^i_n ↓ τ^i. Then lim sup_{s↓τ^i} y_s^i ≥ ε, contradicting (3.1). QED

Let now X = {X_t}_t be a stochastic process defined on the probability space (Ω, F, P) with right-
continuous paths. Define, for f ∈ H, with H a convergence determining class, and y_t^i, i = 1, 2, as in
the above proof,

Y^{(i)}_{t,f}(ω) = y_t^i, computed for the path s ↦ f(X_s(ω)),

C_u = {ω | X_t(ω) is continuous at t = u} = ∩_{f∈H} {Y^{(1)}_{u,f} = 0, Y^{(2)}_{u,f} = 0}.

It follows by right-continuity that lim inf_{t↑u} f(X_t) and lim sup_{t↑u} f(X_t) are both measurable. Hence
Y^{(i)}_{t,f} and C_u are as well. It makes sense to define u to be a fixed discontinuity of X if P{C_u} < 1.

Corollary 3.4 Let X = {X_t}_t be a stochastic process defined on the probability space (Ω, F, P) with
right-continuous paths. Then X has at most countably many fixed discontinuities.

Proof. By (3.1) and dominated convergence we have lim_{s↓t} E Y^{(i)}_{s,f} = 0 for all t. Exactly as in the
proof of the previous lemma we may conclude that E Y^{(i)}_{t,f} = 0 except for t in an at most countable
set N^{(i)}_f. But when t ∉ ∪_{i,f} N^{(i)}_f, we have Y^{(i)}_{t,f} = 0 a.s. for all f, implying that P{C_t} = 1. QED

Lemma 3.5 If x : [0, ∞) → E is right-continuous, then x is B(R_+)/E-measurable.

Proof. It is sufficient to show for any open set A ∈ E that the set C = {t | x_t ∈ A} ∈ B(R_+). Let
D be the at most countable set of discontinuities of x. Let t ∈ C \ D. Then there exist q_t^1 < t < q_t^2,
q_t^i ∈ Q, such that x_s ∈ A for s ∈ (q_t^1, q_t^2). Let further D ∩ C = C_D. Then C = C_D ∪ ∪_{t∈C\D} (q_t^1, q_t^2).
Since the collection of sets {(q_1, q_2) | q_1, q_2 ∈ Q, q_1 < q_2} is at most countable, C is a countable union
of measurable sets. QED

Consider again a metric space E, with metric ρ say. Then ρ(x, A) = inf_{y∈A} ρ(x, y) is a continuous
function of x ∈ E, and hence it is B(E)/B(R)-measurable by Corollary 3.1. Let (Ω, F, P) be a
probability space, and let X be an E-valued r.v. Then ρ(X, A) is F/B(R)-measurable and so it is a
r.v.
Now, is ρ(X, Y) F/B(R)-measurable, for Y another E-valued r.v. on Ω? This map is a composition of
the map (X, Y) : Ω → E^2 and the map ρ : E^2 → R. The map (X, Y) is F/B(E) × B(E)-measurable,
where B(E) × B(E) is the product-σ-algebra on E^2. Consider on E^2 the topology generated by the
rectangles A × B with A, B open in E. This generates the Borel-σ-algebra B(E^2) of E^2. The function
ρ(·, ·) : E^2 → R is continuous, hence B(E^2)/B(R)-measurable. As a consequence, for measurability
of ρ(X, Y), it is sufficient that B(E^2) = B(E) × B(E). This is guaranteed if E is separable!

Lemma 3.6 Let E, E′ be separable metric spaces. Then B(E) × B(E′) = B(E × E′).

It is clear that B(E) × B(E) ⊂ B(E^2). Can you define a space E and metric ρ, such that B(E^2) ≠
B(E) × B(E)?

Monotone class theorems Statements about σ-algebras can often be deduced from the statement
on a π-system. We have seen this with respect to measurability and independence issues, where we
have theorems asserting this.
Suppose one does not have such a theorem at one’s disposal. Then the idea is to show that the sets
satisfying a certain condition form a monotone class or a d-system, see below for the definition. One
then shows that the monotone class or the d-system contains a π-system generating the σ-algebra
of interest. By Lemma 3.8 below, the d-system then contains the σ-algebra of interest and so the
desired property applies to our σ-algebra.

Definition 3.7 A collection S of subsets of a space Ω, say, is called a d-system or a monotone
class if

i) Ω ∈ S;

ii) A, B ∈ S with A ⊆ B implies B \ A ∈ S;

iii) if A_n is an increasing sequence of sets in S, then ∪_n A_n = lim_{n→∞} A_n ∈ S.

Lemma 3.8 If a d-system contains a π-system, I say, then the d-system contains the σ-algebra
σ(I) generated by the π-system.

Proof. See Appendix A.1 of Williams, PwithM. QED

Part of this lemma is known as Dynkin’s lemma (there are many of these).


Suppose we want to deduce results on general (real-valued) measurable functions. The ‘standard
machinery’ is the following procedure.

Standard machinery

i) Show that the desired result holds for indicator functions.

ii) Argue by linearity that this implies the result to hold for step functions, that is, finite linear
combinations of indicator functions.

iii) Consider non-negative (measurable) functions. Any such function can be approximated by
a non-decreasing sequence of step functions. Use monotone convergence to deduce the desired
result for non-negative measurable functions (see the sketch after this list).

iv) Consider general measurable functions (if appropriate) and write such functions as the difference
of two non-negative functions. Apply (iii).
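
The approximation in step (iii) is the usual dyadic staircase. A small numerical illustration
(assuming, for concreteness, f = exp on [0, 1]; this example is not in the notes) of the
non-decreasing step functions f_n = min(n, ⌊2^n f⌋/2^n) ↑ f:

```python
import numpy as np

def staircase(f_vals, n):
    # dyadic step-function approximation from below: min(n, floor(2^n f)/2^n)
    return np.minimum(n, np.floor((2.0**n) * f_vals) / 2.0**n)

x = np.linspace(0.0, 1.0, 1001)
f = np.exp(x)                               # a non-negative (measurable) function
for n in (1, 2, 4, 8):
    print(n, np.max(f - staircase(f, n)))   # sup-error, eventually <= 2^{-n}
```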

It is sometimes not easy to show the first step (i). Instead one would like to only consider indicator
functions of elements of a π-system generating the σ-algebra. The following theorem allows one to
deduce results on general (real-valued) measurable functions from results on indicators of the elements
of a π-system.

Theorem 3.9 ((Halmos) Monotone Class Theorem: elementary version) Let H be a class
of bounded functions from a set S to R, satisfying the following conditions:

i) H is a vector space over R (i.e. it is an Abelian group w.r.t. addition of functions and it is closed
under scalar multiplication, such that (αβ)f = α(βf), (−1)f = −f and (α + β)f = αf + βf,
for f ∈ H, α, β ∈ R);

ii) if f_n, n = 1, . . ., is a sequence of non-negative functions in H, such that f_n ↑ f for a bounded
function f, then f ∈ H;

iii) the constant functions belong to H.

If H contains the indicator function of every set in a π-system I, then H contains every bounded
σ(I)-measurable function.

Proof. See Williams, Diffusions, Markov Processes and Martingales I. QED

We can now characterise measurable functions on the path space E^T.

Lemma 3.10 A function f : E^T → R is E^T-measurable if and only if f = g ◦ π_S for some countable
subset S of T and some E^S-measurable function g : E^S → R.

Proof. One should use the Monotone Class theorem, applied to bounded functions. Then pass to
non-negative functions, with f = Σ_{n≥0} f_n, where f_n = f 1_{n≤f<n+1}. Notice that π_S is E^T/E^S-
measurable as a consequence of Lemma 2.1. QED

4 Multivariate normal distribution

Definition. The random vector X = (X_1, . . . , X_n)^T has a multivariate normal distribution iff
there exist an n × k matrix B, a vector a ∈ R^n and independent random variables Z_1, . . . , Z_k, with
Z_i =ᵈ N(0, σ²(Z_i)), σ²(Z_i) ≥ 0, defined on the same probability space (Ω, F, P), such that

X = a + BZ, (4.1)

where we write Z = (Z_1, Z_2, . . . , Z_k)^T. In this case, X is said to have the multivariate normal
distribution N(a, Σ), where a = (E(X_1), . . . , E(X_n))^T is the mean vector and

Σ = B diag(σ²(Z_1), σ²(Z_2), . . . , σ²(Z_k)) B^T, (4.2)

the covariance matrix.
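
A quick numerical check of this definition (the parameters below are an assumed toy example):
sample Z with independent N(0, σ²(Z_i)) components and compare the empirical covariance of
X = a + BZ with (4.2).

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, -2.0])                 # mean vector, n = 2
B = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0]])           # n x k matrix with k = 3
sig2 = np.array([1.0, 4.0, 0.25])         # variances sigma^2(Z_i)

Z = rng.standard_normal((100_000, 3)) * np.sqrt(sig2)   # independent N(0, sig2_i)
X = a + Z @ B.T                           # each row is a sample of X = a + BZ

Sigma = B @ np.diag(sig2) @ B.T           # theoretical covariance matrix (4.2)
print(np.round(np.cov(X.T), 2))           # empirical covariance
print(Sigma)                              # agreement up to Monte Carlo error
```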

Problem 4.1 Show that a = (E(X_1), . . . , E(X_n))^T, and that Σ = (cov(X_i, X_j))_{1≤i,j≤n} is the covari-
ance matrix.

Corollary 4.1 Suppose that the random vector X = (X_1, . . . , X_n)^T has a multivariate normal dis-
tribution with mean vector a and covariance matrix Σ_X. The following holds:

i) X_i =ᵈ N(a_i, Σ_{X,ii});

ii) Let B be an m × n matrix. Then the random vector BX has a multivariate normal distribution
with mean Ba and covariance matrix B Σ_X B^T.

Due to the above result, we will write Σ_X = Σ for the covariance matrix of X. Then

Σ_Z = diag(σ²(Z_1), σ²(Z_2), . . . , σ²(Z_k)).

Problem 4.1.a Prove Corollary 4.1. You may use that a linear combination of independent normally
distributed random variables is normally distributed with the appropriate mean and variance.

So far, it is not clear that the multivariate normal distribution is completely determined by mean
vector and covariance matrix. In order that this be true, for any l ≥ 1, any n × l matrix C, and
independent, normally distributed, non-degenerate, random variables Z′_1, . . . , Z′_l with

C Σ_{Z′} C^T = B Σ_Z B^T, (4.3)

one should have

CZ′ =ᵈ BZ, (4.4)

where we write Z′ = (Z′_1, . . . , Z′_l)^T. The mean vector is merely a linear term.

Lemma 4.2 i) Suppose that X = (X_1, . . . , X_n)^T has a multivariate normal distribution with mean
vector a and covariance matrix

Σ_X = diag(σ_1², σ_2², . . . , σ_n²).

Then X_1, . . . , X_n are independent random variables, with X_i =ᵈ N(a_i, σ_i²), i = 1, . . . , n.

ii) For any l ≥ 1, any n × l matrix C and independent, normally distributed random variables
Z′_1, . . . , Z′_l satisfying (4.3), assertion (4.4) holds. In other words: if the random vectors X =
(X_1, . . . , X_n)^T and Y = (Y_1, . . . , Y_n)^T have a multivariate normal distribution with the same
mean vector and covariance matrix, then X =ᵈ Y.

Proof. We first prove (i). To this end, let Z_1, . . . , Z_k be independent, normally distributed random
variables on (Ω, F, P), with variances σ²(Z_1), . . . , σ²(Z_k), and let B be an n × k matrix, such that
(4.1) holds, with mean vector a = 0. By assumption, (4.2) equals the above diagonal matrix, that is,

diag(σ_1², . . . , σ_n²) = B diag(σ²(Z_1), . . . , σ²(Z_k)) B^T.

Further, for j ≠ i,

cov(X_i, X_j) = Σ_{m=1}^k B_{im} B_{jm} σ²(Z_m) = 0. (4.5)

The following observations can be made.

• No column of B is identically equal to 0.

• No row of B is identically equal to 0. If this were true, say for row i, then X_i would be identically
equal to 0, and then X_i would be independent of (X_1, . . . , X_{i−1}, X_{i+1}, . . . , X_n). Without loss of
generality, one thus can ‘delete’ the rows of B that are identically equal to 0.

• If n < k, one can add k − n rows to the matrix B in the following manner. Consider the n × k
matrix B̃ with elements

B̃_{ij} = B_{ij} √(σ²(Z_j)), 1 ≤ i ≤ n, 1 ≤ j ≤ k.

Then the rows of B̃ are mutually orthogonal by (4.5). Add k − n rows to B̃, such that the
resulting k × k matrix, again denoted B̃, has mutually orthogonal rows. Define the k × k extended
matrix B by B_{ij} = B̃_{ij}/√(σ²(Z_j)) when σ²(Z_j) > 0, and arbitrarily otherwise, for i = n + 1, . . . , k.
Extending X = BZ with the corresponding k − n components, we obtain that Σ_X is still a diagonal
matrix.

• If n > k, the X_i are linearly dependent and Σ_X is not a diagonal matrix.

The conclusion is that, without loss of generality, we may assume that n = k and that B is non-singular.
By virtue of Corollary 4.1 (i), X_i has a normal distribution with mean 0 and variance

σ_i² = Σ_{X,ii} = Σ_j B_{ij}² σ²(Z_j),

for i = 1, . . . , n. We want to show that P_X = P_{X_1} × P_{X_2} × · · · × P_{X_n}, with the latter denoting the
product measure.
Suppose first that the Z_i are all non-degenerate, i.e. σ²(Z_i) > 0, i = 1, . . . , n.
Since P_Z = P_{Z_1} × · · · × P_{Z_n}, the product measure has a density, f say, where

f(z_1, . . . , z_n) = (2π)^{-n/2} (Π_i σ²(Z_i))^{-1/2} e^{−Σ_i z_i²/(2σ²(Z_i))} = ((2π)^n det(Σ_Z))^{-1/2} e^{−½ z^T Σ_Z^{-1} z},    z ∈ R^n.
It holds that Z = B^{-1}X. Using a change of variables, we get that X has the density f_X with

f_X(x) = f(B^{-1}x) |det(B^{-1})|,    x ∈ R^n.

Notice that |det(B^{-1})| det(Σ_Z)^{-1/2} = det(Σ_X)^{-1/2}. Further,

(B^{-1}x)^T Σ_Z^{-1} (B^{-1}x) = x^T Σ_X^{-1} x.

Putting this together yields

f_X(x) = ((2π)^n det(Σ_X))^{-1/2} e^{−½ x^T Σ_X^{-1} x}.

Since

f_{X_i}(x_i) = (2πσ_i²)^{-1/2} e^{−x_i²/(2σ_i²)},    i = 1, . . . , n,

we have obtained that

f_X(x) = Π_i f_{X_i}(x_i),    x ∈ R^n,

and so the distribution of X is a product measure of the distributions of the components.
Suppose next that σ²(Z_i) = 0 for i ≤ s (we can always reorder the indices). If s = n, then the X_i
are all degenerate, and so the independence follows trivially. Thus, assume that s < n. The
distribution of X is not affected by putting the first s columns of B equal to 0. Two cases can
occur.
First, if σ_i² > 0 for i = 1, . . . , n, then we may restrict the matrix B to the submatrix consisting of
the last n − s columns, say this is the matrix B′. Then put Z′ = (Z_{s+1}, . . . , Z_n)^T. For X′ = B′Z′ it
holds that the components are all independent, as we have shown in the foregoing. But X and X′
are equal almost surely, so that the components of X are independent as well.
Secondly, suppose that σ_i² = 0 for i ≤ r. Then X_1, . . . , X_r are all equal to 0 with probability 1.
Hence,

P_X = P_{X_1} × · · · × P_{X_r} × P_{(X_{r+1}, . . . , X_n)}.

It is sufficient to study the distribution of (X_{r+1}, . . . , X_n)^T. But this follows similarly to the argu-
ments above.
We prove part (ii). Since Σ_X = Σ_Y is symmetric, we can write Σ_X = U D U^T, where U^T = U^{-1}.
Then V = U^T B Z is an n-dimensional random vector with a multivariate normal distribution and a
diagonal covariance matrix. Thus, V has independent components by part (i), and its distribution is
a product measure of the laws of the separate components. Similarly, W = U^T C Z′ is an n-dimensional
random vector with a multivariate normal distribution, with the same diagonal covariance matrix as V.
Thus the distribution of W is the product measure of the laws of the separate components. Since
corresponding components of V and W have the same normal distributions, and the distributions of
V and W are product measures, the distributions of V and W are equal.
Since X = UV and Y = UW, it is straightforward to check that X and Y have the same distribution.
Indeed, for any product set A = A_1 × · · · × A_n ∈ B(R^n), taking the integral over the set {UV ∈ A}
and {UW ∈ A} with respect to the probability distribution of V and W respectively yields the same
value, since the distributions of V and W are equal. As product sets form a π-system generating
B(R^n), the probability distributions of X = UV and Y = UW are equal.
QED

By a similar transformation argument as in the proof of the above lemma, X has a density whenever
det Σ ≠ 0, given by

f_X(x) = ((2π)^n det(Σ))^{-1/2} exp{−½ (x − a)^T Σ^{-1} (x − a)},    x = (x_1, . . . , x_n)^T ∈ R^n.

In this case, X is said to have a non-singular or non-degenerate multivariate normal distribution.

If e.g. σ²(X_i) = Σ_{ii} = 0, then X_i has a degenerate distribution, in other words P{X_i = a_i} = 1, and
P_X = P_{X_i} × P_{(X_1, . . . , X_{i−1}, X_{i+1}, . . . , X_n)}. We can repeat this for any component of the vector X with
variance equal to 0.

Remark 4.1 Word of caution: in general it is not true that two normally distributed random
variables X and Y with cov(X, Y ) = 0, are independent!

The following lemma shows that sub-vectors also have a multivariate normal distribution.

Lemma 4.3 i) The vector (X_{ν_1}, . . . , X_{ν_r}), ν_1 < . . . < ν_r ≤ n, has a N(a′, Σ′) distribution, where
a′ = (a_{ν_1}, . . . , a_{ν_r})^T and Σ′ = (Σ_{X,ij})_{i,j∈{ν_1,...,ν_r}}.

ii) Suppose that Σ_X has a block diagonal structure, in other words, there are indices r_0 = 0 <
r_1 < · · · < r_{m−1} < n = r_m, such that Σ_{X,ij} ≠ 0 implies that r_{s−1} + 1 ≤ i, j ≤ r_s for some
s ∈ {1, . . . , m}.
Then the vectors (X_1, . . . , X_{r_1}), (X_{r_1+1}, . . . , X_{r_2}), . . . , (X_{r_{m−1}+1}, . . . , X_n) are mutually inde-
pendent.

Proof. For the proof of (i), take the restriction of the mean vector a and the matrix B to the rows
corresponding to the indices ν_1, . . . , ν_r. The result follows immediately.
We prove (ii). Denote a^s = (a_{r_{s−1}+1}, . . . , a_{r_s}), and let Σ^s = (Σ_{X,ij})_{r_{s−1}+1≤i,j≤r_s} be the s-th diagonal
block of Σ_X.
By (i), (X_{r_{s−1}+1}, . . . , X_{r_s}) =ᵈ N(a^s, Σ^s). Also note that det(Σ_X) = Π_{s=1}^m det(Σ^s). By the symmetry of
Σ^s, diagonalisation leads to the decomposition

Σ^s = U_s D_s U_s^T,

where U_s, D_s are (r_s − r_{s−1}) × (r_s − r_{s−1}) matrices, s = 1, . . . , m. Let U be the matrix with the same
block structure as Σ_X, but with s-th diagonal block equal to U_s, and let D be the diagonal matrix
with diagonal equal to the successive diagonal elements of D_1, . . . , D_m. Then U D U^T = Σ_X.
Following the notation and construction in the proof of Lemma 4.2, V = U^T B Z is a multivariate
normally distributed random vector with independent components and X = UV. Thus, for A_i ∈ B(R),
i = 1, . . . , n,

P{X_i ∈ A_i, i = 1, . . . , n}
  = ∫_{z∈R^n : Uz ∈ A_1×···×A_n} dP_{V_1}(z_1) × · · · × dP_{V_n}(z_n)
  = ∫_{z^1∈R^{r_1} : U_1 z^1 ∈ A_1×···×A_{r_1}} dP_{(V_1,...,V_{r_1})}(z^1) · · · ∫_{z^m∈R^{r_m−r_{m−1}} : U_m z^m ∈ A_{r_{m−1}+1}×···×A_n} dP_{(V_{r_{m−1}+1},...,V_n)}(z^m)
  = P{(X_1, . . . , X_{r_1}) ∈ A_1 × · · · × A_{r_1}} × · · · × P{(X_{r_{m−1}+1}, . . . , X_n) ∈ A_{r_{m−1}+1} × · · · × A_n}.

The result follows. QED

Problem 4.2 An n × n matrix Γ is the covariance matrix of random variables Y_1, . . . , Y_n iff Γ is
symmetric and non-negative definite (i.e. x^T Γ x ≥ 0 for all x ∈ R^n). Show this.

It is often convenient to use other characterisations of the multivariate normal distribution.

Theorem 4.4 X = (X_1, . . . , X_n) has a multivariate normal distribution if and only if

i) for all vectors c ∈ R^n, Σ_i c_i X_i = (c, X) has a normal distribution;

if and only if

ii) there exist a vector a ∈ R^n and a symmetric, non-negative definite n × n matrix Γ, such that
for all θ ∈ R^n

E e^{i(θ,X)} = e^{i(θ,a) − θ^T Γ θ/2}.
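
A numerical sanity check of characterisation (ii) (with an assumed a and Γ, and sampling via a
Cholesky factor): the empirical value of E e^{i(θ,X)} should match the stated formula.

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([0.5, -1.0])
Gamma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                 # symmetric, non-negative definite
L = np.linalg.cholesky(Gamma)                  # Gamma = L L^T
X = a + rng.standard_normal((200_000, 2)) @ L.T

theta = np.array([0.3, -0.7])
empirical = np.mean(np.exp(1j * (X @ theta)))  # E exp(i (theta, X))
exact = np.exp(1j * (theta @ a) - theta @ Gamma @ theta / 2)
print(empirical, exact)                        # close up to Monte Carlo error
```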

Using characteristic functions calls for inversion formulae.

Theorem 4.5 (Lévy’s inversion formula (see Williams, PwithM))
Let φ(θ) = ∫_R e^{iθx} dF(x) be the characteristic function of a random variable X with distribution
function F. Then for a < b

½(F(b) + F(b−)) − ½(F(a) + F(a−)) = lim_{T↑∞} (1/2π) ∫_{−T}^{T} ((e^{−iθa} − e^{−iθb})/(iθ)) φ(θ) dθ.

Moreover, if ∫_R |φ(θ)| dθ < ∞, then X has a continuous probability density function f with

f(x) = (1/2π) ∫_R e^{−iθx} φ(θ) dθ.
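
The inversion formula lends itself to direct numerical verification. A sketch for the standard normal
distribution (assumed here as the test case), using the fact that the integrand’s real part is even
in θ, so one may integrate over θ > 0 and double:

```python
import numpy as np
from math import erf, sqrt, pi

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))   # N(0,1) distribution function
phi = lambda t: np.exp(-t**2 / 2)                  # N(0,1) characteristic function

a, b, T, dt = -1.0, 2.0, 50.0, 1e-4
t = np.arange(dt, T, dt)                           # skip the removable singularity at 0
integrand = ((np.exp(-1j*t*a) - np.exp(-1j*t*b)) / (1j*t) * phi(t)).real
approx = 2.0 * integrand.sum() * dt / (2.0 * pi)   # Riemann sum for the limit in the theorem
print(approx, Phi(b) - Phi(a))                     # both approximately 0.8186
```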

In the case of random variables assuming non-negative values only, it may be convenient to work with
the Laplace transform. Let g : [0, ∞) → R. Then the Laplace transform Lg is defined by

Lg(s) = ∫_{0−}^{∞} e^{−st} g(t) dt,

with s ∈ C, and very often Re(s) ≥ 0 (in the case of random variables), provided the integral exists.
An inversion formula exists in this case as well.

Theorem 4.6 (Mellin’s inverse formula)

g(t) = (1/2πi) lim_{r→∞} ∫_{γ−ir}^{γ+ir} e^{st} Lg(s) ds,

with γ > Re(σ) for all singularities σ of Lg.

The inverse can often be calculated by Cauchy’s residue theorem.
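
As a hedged illustration of numerical inversion: for g(t) = e^{−t}, with Lg(s) = 1/(s + 1), a
Bromwich-type inversion can be carried out numerically. The sketch below assumes mpmath’s
invertlaplace helper is available; Theorem 4.6 is the analytic statement behind such algorithms.

```python
import mpmath as mp

Lg = lambda s: 1 / (s + 1)        # Laplace transform of g(t) = exp(-t)
for t in (0.5, 1.0, 2.0):
    # numerical inverse Laplace transform vs the exact value exp(-t)
    print(t, mp.invertlaplace(Lg, t, method='talbot'), mp.exp(-t))
```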

Remark It is a consequence of both theorems that, for distribution functions F and G, φ_F(θ) = φ_G(θ)
for all θ ∈ R implies that F ≡ G. Here φ_F and φ_G denote the characteristic functions of F and G.
The same conclusion holds for F and G the distributions of non-negative r.v.’s: if LF(s) = LG(s) for
all s ≥ 0, then F ≡ G.

5 Addendum on Measurability

In the proof of Lemma 1.6.11 in the LN we need that the limit of a certain sequence of measurable
maps is measurable. We will show this.

Lemma 5.1 Let (E, d) be a metric space and let B(E) be the Borel-σ-algebra of open sets compatible
with d. Let (X_s^n)_{s≤t}, n = 1, . . ., be (E, B(E))-valued adapted stochastic processes on the filtered
probability space (Ω, F, (F_s)_{s≤t}, P), such that the map

(s, ω) ↦ X_s^n(ω)

is measurable as a map from ([0, t] × Ω, B([0, t]) × F_t) to (E, B(E)). Suppose that X_s^n(ω) converges
to X_s(ω), n → ∞, for every (s, ω). Then the map (s, ω) ↦ X_s(ω) is measurable.

Proof. Note that B(E) = σ(G | G open in E). Hence it is sufficient to check that

X^{-1}(G) = {(s, ω) | X_s(ω) ∈ G} ∈ B([0, t]) × F_t,

for all G open in E. Write G = ∪_{k=1}^∞ F_k, with F_k = {x ∈ E | d(x, G^c) ≥ k^{-1}}. We need the closed
sets F_k to exclude that X_s^m(ω) ∈ G for all m but lim_{m→∞} X_s^m(ω) ∉ G. By pointwise convergence of
X_s^m(ω) to X_s(ω) we have that

X^{-1}(G) = ∪_{k=1}^∞ ∪_{n=1}^∞ ∩_{m=n}^∞ (X^m)^{-1}(F_k).

Since (X^m)^{-1}(F_k) ∈ B([0, t]) × F_t, it follows that X^{-1}(G) ∈ B([0, t]) × F_t. QED

We now give an example of a stochastic process X = (X_t)_{0≤t≤1} that is not progressively measurable.
Moreover, for this example we can construct a stopping time T, for which X_T is not F_T-measurable.

Example 5.1 Let Ω = [0, 1], F = B[0, 1], and let P be the Lebesgue measure. Let A ⊂ [0, 1] be non-
measurable, i.e. A ∉ B. Put

X_t(ω) = t + ω if t ∈ A,    X_t(ω) = −t − ω if t ∉ A.

Note that σ(X_t) = B for all t. Hence the natural filtration is F_t^X = B. Further, |X_t(ω)| = t + ω is
continuous in t and ω.
Now {(s, ω) | X_s(ω) ≥ 0} = A × Ω is not measurable. Consequently, X is not progressively measurable.
Define the stopping time T by

T = inf{t | 2t ≥ |X_t|},

so that T(ω) = ω. Clearly {ω : T(ω) ≤ t} = {ω ≤ t} ∈ B = F_t^X, so that indeed T is a stopping time.
One has F_T = B. However,

X_{T(ω)}(ω) = 2ω if ω ∈ A,    X_{T(ω)}(ω) = −2ω if ω ∉ A,

and so {X_T > 0} is not B-measurable, hence not F_T-measurable.

Comment on the σ-algebra F_τ

Let (Ω, F, (F_t)_t, P) be a filtered probability space. Let τ be a stopping time. Then F_τ is defined by:

F_τ = {A ∈ F | A ∩ {τ ≤ t} ∈ F_t for all t}.

Let X be a progressively measurable stochastic process, and let τ be a finite stopping time. Then
X_τ is an F_τ-measurable r.v. Hence σ(X_τ) ⊂ F_τ. In general σ(X_τ) ≠ F_τ!

Next, define the hitting time of x: τ_x = inf{t > 0 | W_t = x} (W a Brownian motion). A direct proof
of measurability of τ_x for x ≠ 0 follows here (cf. LN Example 1.6.9). To show F/B(R)-measurability
of τ_x, it is sufficient to show that τ_x^{-1}(t, ∞) is measurable for each t. This is because the sets
{(t, ∞) | t ∈ R} generate B(R).
For x ≠ 0 one has

{τ_x > t} = {|W_s − x| > 0, 0 ≤ s ≤ t}
          = ∪_n {|W_s − x| > 1/n, 0 ≤ s ≤ t}
          = ∪_n ∩_{q∈Q∩[0,t]} {|W_q − x| > 1/n}.

We have in fact proved that {τ_x > t} ∈ σ(W_s, s ≤ t), hence {τ_x ≤ t} ∈ σ(W_s, s ≤ t) and so τ_x is a
stopping time.
It is slightly more involved to show that {τ_0 > t} is a measurable event, since W_0 = 0 a.s., and so
there does not exist a uniformly positive distance between the path identically equal to 0 and W_s(ω)
on (0, t].

6 Convergence of probability measures

Let (Ω, F, P) be a probability space. First of all, let µ, µ_n, n = 1, . . ., be probability measures on (Ω, F).

Definition 6.1 Suppose additionally that Ω is a metric space and F the Borel-σ-algebra. Then
µ_n converges weakly to µ (in formula: µ_n →_w µ), if ∫ f dµ_n → ∫ f dµ for all f ∈ C(Ω) =
{g | g continuous, bounded}.

The following theorem gives equivalent criteria for weak convergence.

Theorem 6.2 (Portmanteau Theorem) µ_n →_w µ, n → ∞, if and only if one (and hence all) of
the following criteria holds:

i) lim sup_n µ_n(F) ≤ µ(F) for all closed sets F;

ii) lim inf_n µ_n(G) ≥ µ(G) for all open sets G;

iii) lim_n µ_n(A) = µ(A) for all sets A such that µ(∂A) = 0. Here ∂A = A⁻ \ A⁰, where

A⁻ = ∩_{F⊃A, F closed} F,    A⁰ = ∪_{G⊂A, G open} G.

Assume that (E, E) is a measurable space, with E a separable metric space (with metric ρ) and
E = B(E) the σ-algebra generated by the open sets. Let X, X_1, X_2, . . . : Ω → E be F/E-measurable
random variables.

Definition 6.3 i) X_n converges a.s. to X (in formula: X_n →_{a.s.} X), if P{lim_{n→∞} X_n = X} = 1;

ii) X_n converges in probability to X (in formula: X_n →_P X), if P{ρ(X_n, X) ≥ ε} → 0, n → ∞, for
each ε > 0;

iii) X_n converges to X in L¹ (in formula: X_n →_{L¹} X), if E ρ(X_n, X) → 0, n → ∞;

iv) X_n converges to X in distribution (in formula: X_n →_D X), iff P_{X_n} →_w P_X, where P_{X_n} (P_X) is
the distribution of X_n (X).

In Williams PwithM, Appendix to Chapter 13, you can find the following characterisation of con-
vergence in probability.

Lemma 6.4 i) X_n →_{a.s.} X implies X_n →_P X, which implies X_n →_D X.

ii) X_n →_D a implies X_n →_P a, where a is a constant (or degenerate r.v.).

iii) X_n →_P X if and only if every subsequence {X_{n_k}}_{k=1,2,...} of {X_n}_n contains a further
subsequence {X_{n′_k}}_k such that X_{n′_k} → X, a.s.

Suppose that the X_n are i.i.d. integrable r.v.’s. Then the (Strong) Law of Large Numbers states that

(Σ_{i=1}^n X_i)/n →_{a.s.} E X_1.

If also E X_n² < ∞, then the Central Limit Theorem states that

(σ²(X_1) n)^{-1/2} Σ_{i=1}^n (X_i − E X_1) →_D X*,

where X* =ᵈ N(0, 1).
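
Both limit theorems are easy to visualise by simulation (exponential(1) samples are assumed below,
so that E X_1 = 1 and σ²(X_1) = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
n, paths = 1_000, 5_000
X = rng.exponential(1.0, size=(paths, n))

means = X.mean(axis=1)                    # LLN: sample means concentrate near 1
Z = (X.sum(axis=1) - n) / np.sqrt(n)      # CLT: rescaled, centred sums
print(means.mean(), means.std())          # approx 1 and 1/sqrt(n)
print(Z.mean(), Z.std())                  # approx 0 and 1
```
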
Suppose that the X_n are i.i.d. random variables, defined on a common probability space, with values
in a real, separable Banach space. A famous paper by Itô and Nisio contains the following surprising
convergence results. Write S_n = Σ_{i=1}^n X_i.
Theorem 6.5 • The following are equivalent:

1. S_n, n = 1, 2, . . ., converges in probability;
2. S_n, n = 1, 2, . . ., converges in distribution;
3. S_n, n = 1, 2, . . ., converges a.s.

• If S_n, n = 1, 2, . . ., are uniformly tight, i.e. for each ε > 0 there exists a compact set K ⊂ E,
such that

P{S_n ∈ K} ≥ 1 − ε, n = 1, 2, . . . ,

then there exist c_1, c_2, . . . ∈ E, such that S_n − c_n converges a.s.

If the random variables are symmetric, i.e. X_i =ᵈ −X_i, then even more can be said. First, let E* be
the collection of bounded continuous functions on E.

Theorem 6.6 If X_i, i = 1, 2, . . ., are all symmetric random variables, then a.s. convergence is
equivalent to uniform tightness. In particular, the following additional equivalences hold.

3. S_n, n = 1, 2, . . ., converges in probability;

4. S_n, n = 1, 2, . . ., is uniformly tight;

5. f(S_n), n = 1, 2, . . ., converges in probability, for each f ∈ E*;

6. there exists an E-valued random variable S, such that E e^{i f(S_n)} → E e^{i f(S)} for every f ∈ E*.

These convergence results play an important role in the analysis of processes with independent
increments.

7 Conditional expectation

Theorem 7.1 (Fundamental Theorem and Definition of Kolmogorov 1933)


Suppose we have a probability space (Ω, F, P). Let X be a real-valued, integrable random variable,
i.e. E(|X|) < ∞. Let A be a sub-σ-algebra of F. Then there exists a random variable Y such that

i) Y is A-measurable;

ii) E(|Y |) < ∞;

iii) for each A ∈ A we have

∫_A Y dP = ∫_A X dP.

If Y′ is another r.v. with properties (i), (ii), (iii), then Y′ = Y with probability 1, i.e. P{Y′ = Y} = 1.
We call Y a version of the conditional expectation E(X|A) of X given A and we write Y = E(X|A)
a.s.

Proof. To prove existence, suppose first that X ≥ 0. Define the measure µ on (Ω, A) by

µ(A) = ∫_A X dP, A ∈ A.

Since X is integrable, µ is finite. Now for all A ∈ A we have that P(A) = 0 implies µ(A) = 0. In
other words, µ is absolutely continuous with respect to P. By the Radon-Nikodym theorem, there
exists an A-measurable random variable Y on (Ω, A) such that

µ(A) = ∫_A Y dP, A ∈ A.

Clearly, Y has the desired properties. The general case follows by linearity.
As regards a.s. unicity of Y, suppose that we have two random variables Y, Y′ satisfying (i,ii,iii).
Let A_n = {Y − Y′ ≥ 1/n}, n = 1, 2, . . .. Then A_n ∈ A and so

0 = ∫_{A_n} (Y − Y′) dP ≥ (1/n) P{A_n},

so that P(A_n) = 0. It follows that

P{Y − Y′ > 0} = P{∪_{n=1}^∞ A_n} ≤ Σ_{n=1}^∞ P{A_n} = 0.

Hence Y ≤ Y′ a.s. Interchanging the roles of Y and Y′ yields that Y′ ≤ Y a.s. Thus Y = Y′ a.s.
QED

N.B.1 Conditional expectations are random variables!

Conditional probabilities are conditional expectations: P{X ∈ A | F} = E{1_{X∈A} | F}. E(X | Y)
stands for E(X | σ(Y)), where Y is any random variable.

N.B.2 Suppose we have constructed an A-measurable r.v. Z, with E(|Z|) < ∞, such that (iii) holds
for all A in a π-system generating A and containing the whole space Ω. Then (iii) holds for all A ∈ A,
and so Z is a version of the conditional expectation E(X|A). This follows from the interpretation of
A ↦ ∫_A X dP as a measure on A and the fact that two measures that are equal on a π-system are
equal on the generated σ-algebra, provided they assign equal mass to the whole space.

This can be used in determining conditional expectations: make a guess and check that it is the right
one on a π-system.

Elementary properties The following properties follow either immediately from the definition,
or from the corresponding properties of ordinary expectations.

Lemma 7.2 i) If X is A-measurable and E|X| < ∞, then E(X | A) = X a.s.;

ii) E(X | {∅, Ω}) = EX, a.s.;

iii) Linearity: E(aX + bY | A) = aE(X | A) + bE(Y | A) a.s.;

iv) Positivity: if X ≥ 0 a.s., then E(X | A) ≥ 0 a.s.;

v) Monotone convergence: if X_n ↑ X a.s., then E(X_n | A) ↑ E(X | A) a.s.;

vi) Fatou: if X_n ≥ 0 a.s., then E(lim inf X_n | A) ≤ lim inf E(X_n | A) a.s.;

vii) Dominated convergence: suppose that |X_n| ≤ Y a.s. and EY < ∞. Then X_n → X a.s. implies
E(X_n | A) → E(X | A) a.s.;

viii) Jensen: if φ is a convex function such that E|φ(X)| < ∞, then E(φ(X) | A) ≥ φ(E(X | A)) a.s.;

ix) Tower property: if A ⊂ B, then E(X | A) = E(E(X | B) | A) a.s.;

x) Taking out what is known: if Y is A-measurable and bounded, then E(YX | A) = Y E(X | A) a.s.
The same assertion holds as well if X, Y ≥ 0 a.s. and E(XY), EX < ∞, or if E|X|^q, E|Y|^p < ∞,
with 1/q + 1/p = 1, p, q ≠ 1;

xi) Role of independence: if B is independent of σ(σ(X), A), then E(X | σ(B, A)) = E(X | A) a.s.

Proof. Exercise! The proofs of parts (iv), (viii), (x) and (xi) are the most challenging; see Williams,
PwithM. QED

[Something on independence is still to be added here!!]


A simple method for checking whether two conditional expectations are (a.s.) equal relies on the
following lemma.

Lemma 7.2A Suppose X, Y are r.v.’s defined on the probability space (Ω, F, P). Then X ≥ Y a.s.
if and only if

∫_F X dP = E(1_F X) ≥ E(1_F Y) = ∫_F Y dP

for all F ∈ F.

Conditioning on σ-algebras generated by random variables Let (Ω_i, F_i), i = 1, 2, be two
measurable spaces. Suppose that X : Ω_1 → Ω_2 is F_1/F_2-measurable. Then σ(X) = σ(X^{-1}(A), A ∈
F_2) = {X^{-1}(A) | A ∈ F_2}, i.e. the σ-algebra generated by X is simply the collection of inverse images
under X of sets in F_2. Hence, A ∈ σ(X) iff there exists a set B ∈ F_2 with A = X^{-1}(B). The following
helps to understand conditional expectations.

Theorem 7.3 Let (Ω_1, F_1) be a measurable space with the property that F_1 is the σ-algebra generated
by a map g on Ω_1 with values in a measurable space (Ω_2, F_2), i.e. F_1 = σ(g). Then a real-valued
function f on (Ω_1, F_1) is measurable if and only if there exists a real-valued measurable function h
on (Ω_2, F_2), such that f = h(g).

Proof. If there exists such a function h, then f^{-1}(B) = g^{-1}(h^{-1}(B)) ∈ F_1, by measurability of g
and h. Suppose therefore that f is measurable. We have to show the existence of a function h with
the required properties.
The procedure is by going from indicator functions through elementary functions to non-negative
functions and then to general functions f:

i) Assume that f = 1_A for some set A ∈ F_1. Since F_1 = σ(g) = {g^{-1}(B) | B ∈ F_2}, there exists
B ∈ F_2 with A = g^{-1}(B). Put h = 1_B; then f = h(g).

ii) Let f = Σ_{i=1}^n a_i 1_{A_i}, A_i ∈ F_1. It follows from (i) that 1_{A_i} = h_i(g) with h_i = 1_{B_i}
measurable on Ω_2, where A_i = g^{-1}(B_i). But then also h = Σ_{i=1}^n a_i h_i is measurable on Ω_2,
and f = h(g).

iii) Let f ≥ 0. Then there is a sequence of elementary functions f_n, with f(ω) = lim_{n→∞} f_n(ω),
ω ∈ Ω_1. By (ii), f_n = h_n(g) with h_n measurable on Ω_2. It follows that lim_n h_n(ω_2) exists for
all ω_2 ∈ {g(ω_1) : ω_1 ∈ Ω_1}. Define h(ω_2) = lim_n h_n(ω_2) if this limit exists, and h(ω_2) = 0
otherwise; then h is measurable on Ω_2 and f = h(g).

iv) Write f = f⁺ − f⁻ for general f, and apply (iii).

QED

Corollary 7.4 (Doob-Dynkin) Let Y = (Y_1, . . . , Y_k), Y_i : Ω → E, be a random vector and let X be
an integrable real-valued r.v., both defined on the same probability space (Ω, F, P). Then there exists
a real-valued E^k/B(R)-measurable function h : E^k → R, such that E(X | Y_1, . . . , Y_k) = h(Y_1, . . . , Y_k).
Let Y = (Y_t)_{t∈T} be an (E^T, E^T)-valued random variable defined on the probability space (Ω, F, P).
Then there exist a countable set S = {t_1, . . .} ⊆ T and a measurable function h : E^S → R, such that
E(X | Y) = h(Y_{t_1}, . . .).

Proof. By Theorem 7.3, E(X | Y) = h(Y) for a measurable function h : E^T → R. By Lemma 3.10,
h = h′ ◦ π_S for a countable subset S ⊂ T and an E^S-measurable function h′ : E^S → R. As a
consequence, h(Y) = h′(π_S(Y)) = h′(Y_{t_1}, . . .), where S = {t_1, . . .}. QED

Suppose X is a (measurable) map from (Ω, F, P) to (E, E). Let A ∈ F. You are accustomed to
interpreting

P(X ∈ B | A) = P({X ∈ B} ∩ A) / P(A), (7.1)

provided P(A) > 0. Formally we get

P(X ∈ B | A) = E(1_{X∈B} | A),

which we can interpret as follows. Let A = {∅, Ω, A, A^c}; then E(1_{X∈B} | A) is A-measurable.
Hence E(1_{X∈B} | A) is constant on A and on A^c: a.s. we have

E(1_{X∈B} | A)(ω) = c_1 for ω ∈ A,    E(1_{X∈B} | A)(ω) = c_2 for ω ∈ A^c,

for some constants c_1 and c_2. Hence,

P({X ∈ B} ∩ A) = ∫_A 1_{X∈B} dP = ∫_A E(1_{X∈B} | A) dP = c_1 P(A),

and so we obtain

E(1_{X∈B} | A)(ω) = P({X ∈ B} ∩ A) / P(A),    ω ∈ A,

if P(A) > 0. In view of (7.1) we see that P(X ∈ B | A) stands for the value of E(1_{X∈B} | A)(ω) on A.
How should one choose E(1_{X∈B} | A)(ω) on A, if P(A) = 0?
The following observation is useful for computations.

Lemma 7.5 Let (Ω, F, P) be a probability space. Let F_1, F_2 ⊆ F be independent σ-algebras. Let
(E_1, E_1), (E_2, E_2) be measurable spaces. Suppose that X : Ω → E_1 is F_1-measurable and that
Y : Ω → E_2 is F_2-measurable. Let further f : E_1 × E_2 → R be E_1 × E_2-measurable with
E|f(X, Y)| < ∞. Then there exists an E_2-measurable function g : E_2 → R, such that
E(f(X, Y) | F_2) = g(Y), where

g(y) = ∫_Ω f(X(ω), y) dP(ω) = ∫ f(x, y) dP_X(x). (7.2)

Proof. Define H as the class of bounded, measurable functions f : E_1 × E_2 → R, with the property
that for each f ∈ H the function g : E_2 → R given by (7.2) is E_2-measurable, with E(f(X, Y) | F_2) =
g(Y). It is straightforward to check that H is a monotone class in the sense of Theorem 3.9.
We will check that f = 1_{B_1×B_2} ∈ H, for B_1 ∈ E_1, B_2 ∈ E_2. By virtue of Theorem 3.9, H then
contains all bounded E_1 × E_2-measurable functions.
Now

E(1_{B_1×B_2}(X, Y) | F_2) = E(1_{B_1}(X) 1_{B_2}(Y) | F_2)
                          = 1_{B_2}(Y) E(1_{B_1}(X) | F_2)
                          = 1_{B_2}(Y) E(1_{B_1}(X)).

On the other hand,

g(y) = E(1_{B_1×B_2}(X, y)) = 1_{B_2}(y) E(1_{B_1}(X)).

Hence g(Y) is a version of E(1_{B_1×B_2}(X, Y) | F_2).
Derive the result for unbounded functions f yourself. QED

Example
Let Ω = [0, 1], F = B[0, 1], P = λ.

Let X : [0, 1] → R be defined by X(ω) = ω². X is a measurable function on (Ω, F, P).

Let F_0 = σ{[0, 1/2], (1/2, 1]}. Since E(X | F_0) is F_0-measurable, it must be constant on the sets
[0, 1/2] and (1/2, 1]. We get for ω ∈ [0, 1/2] that

(1/2) E(X | F_0)(ω) = ∫_{[0,1/2]} E(X | F_0) dP = ∫_{[0,1/2]} X dP = 1/24, a.s.,

and for ω ∈ (1/2, 1]

(1/2) E(X | F_0)(ω) = ∫_{(1/2,1]} E(X | F_0) dP = ∫_{(1/2,1]} X dP = 7/24, a.s.

Put

Z(ω) = 1/12 for ω ∈ [0, 1/2],    Z(ω) = 7/12 for ω ∈ (1/2, 1].

The sets [0, 1/2], (1/2, 1] form a π-system for F_0. Since ∫_A Z dP = ∫_A X dP for the sets A in a
π-system for F_0, and thus also for A = Ω, we have that Z is a version of E(X | F_0), i.o.w.
Z = E(X | F_0) a.s.
Next let Y : Ω → R be given by Y(ω) = (1/2 − ω)². We want to determine E(X | Y) = E(X | σ(Y)).
σ(Y) is the σ-algebra generated by the π-system {[0, ω] ∪ [1 − ω, 1] | ω ∈ [0, 1/2]}. By Corollary 7.4
there exists a B(R)/B(R)-measurable function h : R → R, such that E(X | Y) = h(Y). Since Y is
constant on sets of the form {ω, 1 − ω}, necessarily E(X | Y) is constant on these sets!
The easiest way to determine E(X | Y) is by introducing a ‘help’ random variable and subsequently
applying Lemma 7.5. Say V : Ω → {0, 1} is defined by V(ω) = 0 for ω ≤ 1/2 and V(ω) = 1 for
ω > 1/2. Then σ(V) and σ(Y) are independent, and σ(Y, V) = F (check this). Hence
E(X | Y, V) = X, a.s.
By Corollary 7.4 there exists a B(R_+) × σ({0}, {1})-measurable function f, such that X = f(Y, V).
By computation we get f(y, v) = (−√y + 1/2)² 1_{v=0} + (√y + 1/2)² 1_{v=1}.
By Lemma 7.5,

g(Y) = E(X | Y) = E(f(Y, V) | Y),

where

g(y) = ∫ f(y, v) dP_V(v) = ½(−√y + 1/2)² + ½(√y + 1/2)² = y + 1/4.

It follows that E(X | Y) = Y + 1/4 a.s.
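
A Monte Carlo cross-check of this answer (with uniform ω, as in the example): average X over
narrow bins of Y and compare with y + 1/4.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.uniform(0.0, 1.0, 1_000_000)
X = omega**2
Y = (0.5 - omega)**2                          # Y takes values in [0, 1/4]

bins = np.linspace(0.0, 0.25, 26)
idx = np.digitize(Y, bins)
for k in (5, 15, 25):
    y_mid = 0.5 * (bins[k-1] + bins[k])       # bin midpoint
    print(y_mid + 0.25, X[idx == k].mean())   # columns nearly agree: E(X|Y) = Y + 1/4
```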

Problem 7.1 Let X and Y be two r.v.’s defined on the same probability space, such that X =ᵈ exp(λ),
i.e. P{X > t} = e^{−λt}; Y ≥ 0 a.s.; and X and Y are independent. Show the memoryless property of
the (negative) exponential distribution:

P{X > t + Y | X > Y} = P{X > t}, t ≥ 0.

Problem 7.2 A rather queer example. Let Ω = (0, 1]. Let A be the σ-algebra generated by all
one-point sets {x}, x ∈ (0, 1]. Then A ⊂ B(0, 1].

i) Classify A.

ii) Let λ be the Lebesgue measure on ((0, 1], B(0, 1]). Let X : ((0, 1], B(0, 1]) → (R, B) be any
integrable r.v. Determine E(X|A). Explain heuristically.

Gambling systems A casino offers the following game, consisting of n rounds. In every round t the
gambler bets α_t ≥ 0. His bet in round t may depend on his knowledge of the game’s past.
The outcomes η_t, t = 1, . . ., of the game are i.i.d. r.v.’s with values in {−1, 1} and P{η_t = 1} = 1/2 =
P{η_t = −1}. The gambler’s capital at time t is therefore X_t = Σ_{i=1}^t α_i η_i.
A gambling strategy α_1, α_2, . . . is called admissible (or predictable) if α_t is σ(η_1, η_2, . . . , η_{t−1})-
measurable. In words this means that the gambler has no prophetic abilities. His bet at time t
depends exclusively on the observed past history.
Example: α_t = 1_{η_t>0}, “only bet if you will win”, is not admissible.

Problem 7.3 By the distribution of the outcomes, one has E(X_t) = 0. Prove this.

One has that T = min{t | X_t ≤ α} is a stopping time, since {T ≤ t} = ∪_{l=0}^t {X_l ≤ α} and

{X_l ≤ α} ∈ σ(η_1, . . . , η_l) ⊂ σ(η_1, . . . , η_t), l ≤ t.

Now, α_t = 1_{T>t−1} = 1_{T≥t} ∈ σ(η_1, . . . , η_{t−1}) defines an admissible gambling strategy with

X_t = Σ_{j=1}^t α_j η_j = Σ_{j=1}^t 1_{T≥j} η_j = Σ_{j=1}^{min{t,T}} η_j = S_{min{t,T}},

where S_t = Σ_{j=1}^t η_j. Hence E S_{min{t,T}} = 0 if T is a stopping time.
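
A simulation sketch of this stopped strategy (a threshold α = −3 and horizon 100 are assumed for
illustration): the bet α_t = 1_{T ≥ t} uses only information available before round t, and the empirical
mean of X_t = S_{min{t,T}} stays at 0.

```python
import numpy as np

rng = np.random.default_rng(4)
paths, t_max, alpha = 200_000, 100, -3
eta = rng.choice([-1, 1], size=(paths, t_max))    # i.i.d. fair signs
S = np.cumsum(eta, axis=1)

hit_by = np.cumsum(S <= alpha, axis=1) > 0        # at column t: whether T <= t
hit_before = np.zeros_like(hit_by)
hit_before[:, 1:] = hit_by[:, :-1]                # whether T <= t-1: known before round t
bets = np.where(hit_before, 0, eta)               # admissible: alpha_t = 1_{T >= t}

X_final = bets.sum(axis=1)                        # equals S_{min(t_max, T)}
print(X_final.mean())                             # approximately 0: no free lunch
```
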
Hedging We have seen that the above gambling strategies cannot modify the expectation: on
average the gambler wins and loses nothing. Apart from that, which payoffs can one obtain by
gambling?
We discuss a simple model for stock options. Assume that the stock price either increases by 1 or
decreases by 1 every day, with probability 1/2, independently from day to day. Suppose I own α_t
units of stock at time t. Then the value of my portfolio increases by α_t η_t every day (the η_t are defined
as in the gambling section).
Suppose the bank offers the following contract, a “European option”: at a given time t one has the
choice to buy 1 unit of stock for price C or not to buy it. C is specified in advance. Our pay-off per
unit of stock is (S_t − C)⁺. In exchange, the bank receives a deterministic amount E((S_t − C)⁺).
Can one generate the pay-off by an appropriate gambling strategy? The answer is yes, and in fact
much more is true.

Lemma 7.6 Let Y be a σ(η_1, . . . , η_n)-measurable function. Then there is an admissible gambling
strategy α_1, . . . , α_n such that

Y − E(Y) = Σ_{j=1}^n α_j η_j.

Proof. Write F_n = σ(η_1, . . . , η_n). Define α_j by

α_j η_j = E(Y | F_j) − E(Y | F_{j−1}),

where F_0 = {∅, Ω}. We have to show that α_j is F_{j−1}-measurable. It is clearly F_j-measurable and
so α_j = E(α_j | F_j). By Corollary 7.4 there exists a measurable function f : R^j → R such that
α_j = E(α_j | F_j) = f(η_1, . . . , η_j).

Problem 7.4 i) Show that E(α_j η_j | F_{j−1}) = 0.

ii) Use (i) to show that

f(η_1, . . . , η_{j−1}, 1) = f(η_1, . . . , η_{j−1}, −1).

Hint: use Corollary 7.4.

iii) Explain now why α_j is F_{j−1}-measurable. Conclude the proof of Lemma 7.6.

QED
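
A sketch of the construction in the proof, on a small binary tree (n = 4 and the European-option
payoff Y = (S_n − C)⁺ with C = 1 are assumed): compute M_j = E(Y | F_j) by averaging over the
remaining signs, set α_j η_j = M_j − M_{j−1}, and verify the replication identity on every path.

```python
from itertools import product

n, C = 4, 1
payoff = lambda path: max(sum(path) - C, 0)        # Y = (S_n - C)^+

def cond_exp(past):
    """E(Y | eta_1..eta_j = past): average the payoff over all continuations."""
    m = n - len(past)
    conts = list(product([-1, 1], repeat=m))
    return sum(payoff(past + c) for c in conts) / len(conts)

for path in product([-1, 1], repeat=n):            # check the identity pathwise
    total = 0.0
    for j in range(1, n + 1):
        past = path[:j-1]
        # alpha_j = (M_j(+1) - M_j(-1)) / 2 depends on eta_1..eta_{j-1} only
        alpha_j = (cond_exp(past + (1,)) - cond_exp(past + (-1,))) / 2
        total += alpha_j * path[j-1]
    assert abs(total - (payoff(path) - cond_exp(()))) < 1e-12
print("replicated: Y - E(Y) = sum_j alpha_j eta_j on every path")
```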

8 Uniform integrability

Suppose that {X_n}_n is a sequence of integrable random variables with X_n → X a.s. for some integrable
random variable X. Under what conditions does EX_n → EX, n → ∞, or, even stronger, when does
X_n →_{L¹} X?
If X_n ≤ X_{n+1} a.s., both are true by monotone convergence. If this is not the case, we need additional
properties to hold.
For an integrable random variable X, the map A ↦ ∫_A |X| dP is “uniformly continuous” in the
following sense.
following sense.

Lemma 8.1 Let X be an integrable random variable. Then for every ε > 0 there exists a δ > 0 such
that for all A ∈ F, P{A} ≤ δ implies

∫_A |X| dP < ε.

Proof. See for instance Williams, PwithM. QED

Definition 8.2 Let C be an arbitrary collection of random variables on a probability space (Ω, F, P).
We call the collection uniformly integrable (UI) if for every ε > 0 there exists a constant K ≥ 0 such
that

∫_{|X|>K} |X| dP ≤ ε, for all X ∈ C.
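
A standard non-example (the family X_n = n·1_{[0,1/n]} on ([0, 1], Lebesgue) is assumed here; it is
not from the notes) shows why the constant K must work uniformly: each E|X_n| = 1, yet the tail
mass E(|X_n|; |X_n| > K) equals 1 as soon as n > K, so the family is L¹-bounded but not UI.

```python
def tail_mass(n, K):
    # E(|X_n| 1_{|X_n| > K}) computed exactly: X_n = n on a set of measure 1/n
    return n * (1.0 / n) if n > K else 0.0

for K in (10, 100, 1000):
    # for every K some member of the family still carries full tail mass 1
    print(K, [tail_mass(n, K) for n in (K // 2, 2 * K)])
```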

The following lemma gives an important example of uniformly integrable classes.

Lemma 8.3 Let C be a collection of random variables on a probability space (Ω, F, P).

i) If the class C is bounded in Lp (P) for some p > 1, then C is uniformly integrable.

ii) If C is uniformly integrable, then C is bounded in L1 (P).

Proof. Exercise, see Williams, PwithM. QED

Another useful characterisation of UI is the following. It allows for instance to conclude that the
sum of two UI sequences of random variables is UI.

Lemma 8.4 Let C be a collection of random variables on a probability space (Ω, F, P). C is UI if
and only if C is bounded in L¹ and

sup_{A∈F: P{A}<ε} sup_{X∈C} E(1_A |X|) → 0, ε → 0. (8.1)

Proof. Let C be a UI collection. We have to show that for all δ > 0 there exists ε > 0 such that

sup_{A∈F: P{A}<ε} sup_{X∈C} E(1_A |X|) ≤ δ. (8.2)

Notice that

E(1_A |X|) ≤ x P{A} + E{1_{(x,∞)}(|X|) |X|}. (8.3)

Let x be large enough, so that sup_{X∈C} E{1_{(x,∞)}(|X|) |X|} ≤ δ/2. Next, choose ε with xε ≤ δ/2. The
result follows.
Boundedness in L¹ follows by putting A = Ω in (8.3) and taking x large enough, so that
sup_{X∈C} E{1_{(x,∞)}(|X|) |X|} < ∞.
For the reverse statement, assume L¹-boundedness of C as well as (8.1). By Chebyshev’s inequality,

sup_{X∈C} P{|X| > x} ≤ (1/x) sup_{X∈C} E|X| → 0, x → ∞. (8.4)

To show UI, we have to show that for each δ > 0 there exists x ∈ R such that

E{1_{(x,∞)}(|X|) |X|} ≤ δ, X ∈ C.

Fix δ > 0. By assumption there exists ε > 0 for which (8.2) holds. By virtue of (8.4), there exists x
such that P{|X| > x} ≤ ε for X ∈ C. Put A_X = {|X| > x}. Then (8.2) implies that

sup_X E{1_{(x,∞)}(|X|) |X|} = sup_X E{1_{A_X} |X|} ≤ sup_{A: P{A}≤ε} sup_X E{1_A |X|} ≤ δ.

QED

Conditional expectations give us the following important example of a uniformly integrable class.

Lemma 8.5 Let X be an integrable random variable. Then the class

C = {E(X | A) | A is a sub-σ-algebra of F}

is uniformly integrable.

Proof. By conditional Jensen (Lemma 7.2), E|E(X | A)| ≤ E E(|X| | A) = E|X|. This yields L¹-
boundedness of C. Let F ∈ F. Then

E(|E(X | A)| 1_F) ≤ E(E(|X| | A) 1_F)
                 = E(E(E(|X| | A) 1_F | A))
                 = E(E(|X| | A) E(1_F | A)) = E(|X| E(1_F | A)).

Let δ > 0. By virtue of Lemma 8.4 it is sufficient to show the existence of ε > 0 such that

sup_{F∈F: P{F}≤ε} sup_A E(|X| E(1_F | A)) ≤ δ.

Suppose that this is not true. Then for every n there exist F_n ∈ F with P{F_n} ≤ 1/n and A_n, such
that

E(|X| E(1_{F_n} | A_n)) > δ.

Since P{F_n} ≤ 1/n, we have E(E(1_{F_n} | A_n)) = P{F_n} → 0 as n → ∞. Hence E(1_{F_n} | A_n) →_P 0
(see Theorem 8.6 below). The sequence has a subsequence converging a.s. to 0 by virtue of
Lemma 6.4 (iii). By dominated convergence it follows that E(|X| E(1_{F_n} | A_n)) → 0 along this
subsequence. Contradiction. QED

Uniform integrability is the necessary and sufficient additional property for strengthening convergence
in probability to convergence in L¹ (see §6).

Theorem 8.6 Let (X_n)_{n∈Z_+} and X be integrable random variables. Then X_n →_{L¹} X if and only if

i) X_n →_P X, and

ii) the sequence (X_n)_n is uniformly integrable.

Proof. See Williams PwithM, §13.7. QED

9 Augmentation of a filtration

In the LN we assume the ‘usual conditions’ in constructing right-continuous (super/sub)martingales.
This is related to making the filtration right-continuous. The following lemma gives the construction.
Let (Ω, F, {F_t}_t, P) be a filtered probability space. The usual augmentation is the minimal enlarge-
ment that satisfies the usual conditions (note that the usual conditions in the LN are not always the
usual ones, but the essence of the construction does not change!).
Let N be the collection of P-null sets in the P-completion of F_∞ = σ(F_t, t ∈ T), and put G_t =
∩_{u>t} σ(F_u, N). Then (Ω, G_∞, {G_t}_t, P) is the desired usual augmentation.

Lemma 9.1 Gt = σ(Ft+ , N ). Moreover, if t ≥ 0 and G ∈ Gt , then there exists F ∈ Ft+ , such that

F ∆G := (F \ G) ∪ (G \ F ) ∈ N .

Problem 9.1 i) Let (Ω, A, P) be a probability space, such that A is P-complete. Let N be the
collection of P-null sets in A. Let K be a sub-σ-algebra of A. Prove that

σ(K, N) = {U ∈ A | ∃K ∈ K for which U ∆ K ∈ N}
        = {U ∈ A | ∃K_1, K_2 ∈ K, with K_1 ⊆ U ⊆ K_2, and K_2 \ K_1 ∈ N}.

ii) Prove Lemma 9.1. Hint: use (i). Note that it amounts to proving that ∩_{u>t} σ(F_u, N) =
σ(∩_{u>t} F_u, N)!

The problem with augmentation is that crucial properties of the processes under consideration might
change. The next lemma shows that supermartingales with cadlag paths stay supermartingales (with
cadlag paths) after augmentation.

Lemma 9.2 Suppose that X is a supermartingale with cadlag paths relative to the filtered probability
space (Ω, F, {Ft }t , P). Then X is also a supermartingale with cadlag paths relative to the usual
augmentation (Ω, G∞ , {Gt }t , P).

Problem 9.2 Prove Lemma 9.2. You may use the previous exercise.

For Markov process theory it is useful to see that by augmentation of σ-algebras, certain measurability
properties do not change intrinsically. Let (Ω, F, P) be a probability space. Let A ⊂ F be a sub-σ-
algebra, and let N be the collection of P-null sets.

Lemma 9.3 Let G = σ(N , A).

i) Let X be an F-measurable, integrable random variable. Then E(X | G) = E(X | A), P-a.s.

ii) Suppose that Z is G-measurable. Then there exist A-measurable random variables Z1 , Z2 with
Z1 ≤ Z ≤ Z2 .

Problem 9.3 Prove Lemma 9.3.

10 Elements of functional analysis

Recall that a (real) vector space V is called a normed linear space, if there exists a norm on V, i.e. a
map ||·|| : V → [0, ∞) such that

i) ||v + w|| ≤ ||v|| + ||w||;

ii) ||av|| = |a| ||v|| for all a ∈ R and v ∈ V;

iii) ||v|| = 0 iff v = 0.

A normed linear space may be regarded as a metric space, the distance between the vectors v, w ∈ V
being given by ||v − w||.
If V, W are two normed linear spaces and A : V → W is a linear map, we define the norm of the
operator A by

||A|| = sup{||Av|| | v ∈ V and ||v|| = 1}.

If ||A|| < ∞, we call A a bounded linear transformation from V to W. Observe that by construction
||Av|| ≤ ||A|| ||v|| for all v ∈ V. A bounded linear transformation from V to R is called a bounded linear
functional on V.

Hahn-Banach theorem We can now state the Hahn-Banach theorem.

Theorem 10.1 Let W be a linear subspace of a normed linear space V and let A be a bounded linear
functional on W . Then A can be extended to a bounded linear functional on V without increasing
its norm.

Proof. See for instance Rudin (1987), pp. 104-107. QED

In Chapter 3 we use the following corollary to the Hahn-Banach theorem.

Corollary 10.2 Let W be a linear subspace of a normed linear space V. If every bounded linear
functional on V that vanishes on W vanishes on the whole space V, then W̄ = V, i.e. W is dense
in V.

Proof. Suppose that W is not dense in V. Then there exist v ∈ V and ε > 0, such that ||v − w|| > ε
for all w ∈ W. Let W′ be the subspace generated by W and v, and define a linear functional A on
W′ by putting A(w + λv) = λ for w ∈ W and λ ∈ R. For λ ≠ 0, ||w + λv|| = |λ| ||v − (−λ^{-1} w)|| ≥ |λ| ε.
Hence |A(w + λv)| = |λ| ≤ ||w + λv||/ε. It follows that ||A|| ≤ 1/ε, with A considered as a linear
functional on W′. Hence A is bounded on W′. By the Hahn-Banach theorem, A can be extended
to a bounded linear functional on V. Since A vanishes on W and A(v) = 1, the proof is complete.
QED

Riesz representation theorem Let E ⊆ R^d be an arbitrary set and consider the class C_0(E)
of continuous functions on E that become small outside compacta. We endow C_0(E) with the
supremum norm

||f||_∞ = sup_{x∈E} |f(x)|.

This turns C_0(E) into a normed linear space, even a Banach space. The version of the Riesz
representation theorem that we consider here describes the bounded linear functionals on C_0(E).
If µ is a finite Borel measure on E, then clearly the map

f ↦ ∫_E f dµ

is a bounded linear functional on C_0(E), with norm equal to µ(E). The Riesz representation theorem
states that every bounded linear functional on C_0(E) can be represented as the difference of two
functionals of this type.

Theorem 10.3 Let A be a bounded linear functional on C_0(E). Then there exist two finite Borel
measures µ and ν such that

A(f) = ∫_E f dµ − ∫_E f dν,

for every f ∈ C_0(E).

Proof. See Rudin (1987), pp. 130-132. QED

Banach-Steinhaus Theorem or Principle of Uniform Boundedness Suppose that V is a
Banach space and W a normed linear space. Let T_α, α ∈ A, for some index set A, be a collection of
bounded linear transformations from V into W.

Theorem 10.4 Either there exists M < ∞ such that

||T_α|| ≤ M, α ∈ A,

or

sup_{α∈A} ||T_α f|| = ∞

for all f in some dense G_δ-set in V.

Proof. See Rudin (1986), pp. 103-104. QED

Bochner integral This has yet to be included!

11 Generalised convergence theorems

These theorems are both taken from Royden’s book on Real Analysis. Let (Ω, F) be a measurable
space. Let {µ_n}_n be a sequence of measures on (Ω, F). Then µ_n is said to converge setwise to the
measure µ on (Ω, F), if µ_n(B) → µ(B) for every B ∈ F.

Lemma 11.1 (Generalised Fatou) Let (Ω, F) be a measurable space, {µ_n}_n a sequence of mea-
sures that converges setwise to a measure µ and {f_n : (Ω, F) → (R, B)}_n a sequence of non-negative
functions that converges pointwise to the function f : (Ω, F) → (R, B). Then

lim inf_{n→∞} ∫ f_n dµ_n ≥ ∫ f dµ.

Theorem 11.2 (Generalised Dominated Convergence) Let (Ω, F) be a measurable space,
{µ_n}_n a sequence of measures that converges setwise to a measure µ, and {f_n : (Ω, F) → (R, B)}_n
and {g_n : (Ω, F) → (R, B)}_n two sequences of functions that converge pointwise to the functions f
and g : (Ω, F) → (R, B). Suppose that |f_n| ≤ g_n, n ≥ 1, and that

lim_{n→∞} ∫ g_n dµ_n = ∫ g dµ < ∞.

Then

lim_{n→∞} ∫ f_n dµ_n = ∫ f dµ.
