0% found this document useful (0 votes)
85 views

Skript 2022

This document outlines lecture notes on probability theory. It covers basics topics like probability spaces, random variables, expectation, independence, and convergence of random variables. It also covers laws of large numbers, weak convergence and characteristic functions, conditional expectation, martingales, and the central limit theorem. The notes are based on existing textbooks and are intended for students taking a course in probability theory.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
85 views

Skript 2022

This document outlines lecture notes on probability theory. It covers basics topics like probability spaces, random variables, expectation, independence, and convergence of random variables. It also covers laws of large numbers, weak convergence and characteristic functions, conditional expectation, martingales, and the central limit theorem. The notes are based on existing textbooks and are intended for students taking a course in probability theory.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 112

Probability theory

Lecture notes based on [Dur05] and on Silke Rolles’ lecture notes


Winter 2022, Nina Gantert 1
October 28, 2022

Contents
1 Basics 4
1.1 Probability spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.6 Construction of independent random variables . . . . . . . . . . . . . . . . 20
1.6.1 Finitely many . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.6.2 Infinitely many . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Convergence of sequences of random variables . . . . . . . . . . . . . . . . 21
1.8 Uniform integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Laws of large numbers 27


2.1 Weak law of large numbers (WLLN) . . . . . . . . . . . . . . . . . . . . . 27
2.2 First Borel-Cantelli lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3 Strong law of large numbers (SLLN) . . . . . . . . . . . . . . . . . . . . . 30
2.4 Second Borel-Cantelli lemma . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5 The strong law of large numbers for L1 random variables . . . . . . . . . . 38
2.6 Kolmogorov’s 0-1-law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3 Weak convergence, characteristic functions and the central limit theo-


rem 43
3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Weak convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Weakly convergent subsequences . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4 Characteristic functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Uniqueness theorem and inversion theorems . . . . . . . . . . . . . . . . . 58
3.6 Lévy’s continuity theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.7 Characteristic functions and moments . . . . . . . . . . . . . . . . . . . . . 63
3.8 Convergence theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
1
Zentrum Mathematik, Bereich M14, Technische Universität München, D-85747 Garching bei
München, Germany. e-mail: [email protected]
4 Conditional expectation 72
4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2 Conditional expectation for a σ-field generated by atoms . . . . . . . . . . 72
4.3 Conditional expectation for general σ-fields . . . . . . . . . . . . . . . . . . 74
4.4 Properties of conditional expectations . . . . . . . . . . . . . . . . . . . . . 76

5 Martingales 79
5.1 Definition and examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.1.1 Sums of independent centered random variables . . . . . . . . . . . 79
5.1.2 Successive predictions . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.1.3 Radon-Nikodym derivatives on increasing sequences of σ-fields . . . 80
5.1.4 Harmonic functions of Markov chains . . . . . . . . . . . . . . . . . 81
5.1.5 Growth rate of branching processes . . . . . . . . . . . . . . . . . . 81
5.2 Supermartingales and Submartingales . . . . . . . . . . . . . . . . . . . . . 83
5.3 Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4 Stopped martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.5 The martingale convergence theorem . . . . . . . . . . . . . . . . . . . . . 88
5.6 Uniform integrability and L1 -convergence for martingales . . . . . . . . . . 95
5.7 Optional stopping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.8 Backwards martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.9 Polya’s urn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

2
Preface. Large parts of these notes are very close to the books [Dur05] and [Kle06].
These notes are written for the students of the course “Probability theory” (MA 2409) at
TUM. They do not replace any existing books. Please do not distribute them.

Preliminaries.

• A dictionary English – German can be found in [Kle06], page 587.

• I assume that you followed the courses “Einführung in die Wahrscheinlichkeitsthe-


orie” (MA1401) and “Maß- und Integrationstheorie” (MA2003). Measure theory
is a tool we will need throughout the course. Good references are [BK10] and the
appendix of [Dur05].

• Probability theory is a prerequisite for stochastic analysis (MA4405) and many


lectures in financial and actuarial mathematics.

Contents of the course:

• Notions of convergence for random variables, laws of large numbers

• Weak convergence, characteristic functions and the central limit theorem

• Conditional expectation and martingales

Motivation. Probability theory is a very active area of research. It has many connec-
tions with other fields of mathematics and also many applications. Random phenomena
are omnipresent. Stochastic models are used in many areas, e.g. physics, biology, eco-
nomics, sociology, etc. Think for instance about models for the spread of a pandemic -
these are stochastic processes and a building block in such models is the branching process
which we will also encounter in this lecture course.

3
1 Basics
1.1 Probability spaces
The material for this section is taken from Chapter 1 in [Dur05].

Definition 1.1 [Probability space] A probability space (Ω, F, P) consists of a set Ω 6= ∅,


a σ-algebra F on Ω, and a probability measure P on (Ω, F). We recall that a σ-algebra
(or σ-field) is a set F of subsets of Ω such that

(i) Ω ∈ F

(ii) A ∈ F ⇒ Ac ∈ F

S
(iii) A1 , A2 , . . . ∈ F ⇒ Ai ∈ F
i=1

and P : F → [0, 1] is a probability measure on (Ω, F) if

• P[Ω] = 1

• P[ Ai ] = i∈I P[Ai ] for any collection (Ai )i∈I of pairwise disjoint elements of F,
S P
i∈I
where I is a finite or countably infinite index set. The sets A ⊆ Ω, A ∈ F are called
events.

Example 1.2 (Discrete probability spaces) Let Ω be a finite or countably infinite


set, and let F = P(Ω) be the power Pset of Ω (i.e. the set of all subsets of Ω). Given a
weight function p : Ω → [0, 1] with ω∈Ω p(ω) = 1, define
X
P(A) = p(ω) for A ∈ F. (1.1)
ω∈A

Special cases:

• Ω is finite and p(ω) = |Ω|−1 for all ω, i.e. P is the uniform distribution on Ω.

• Binomial distribution, geometric distribution, Poisson distribution, etc. See the


“Introduction to probability” course.

Definition 1.3 For a collection G ⊆ P(Ω), we define


\
σ(G) := F.
F σ−algebra, G⊆F

Show that σ(G) is indeed a σ-algebra, by checking (i), (ii) and (iii) in Definition 1.1.
σ(G) is called the “σ-algebra generated by G”. Show that σ(G) is the smallest σ-algebra
containing G, i.e. if H is a σ-algebra and G ⊆ H, then σ(G) ⊆ H.

4
Example 1.4 (Probability measures with density with respect to the Lebesgue
measure)

• Let Ω = R.

• Let F = B(R) be the σ-algebra of Borel sets on R, σ-algebra generated by the


collection of all open sets ⊆ R.

• Let λ be the Lebesgue measure on R.

• Let f : R → [0, ∞) be measurable with


R
R
f (x) λ(dx) = 1. Then,
Z
P(A) = f (x) λ(dx) (1.2)
A

for A ∈ F defines a probability measure.

Special cases:

• f (x) = 1[0,1] (x) uniform distribution; here and in the following, 1A denotes the
indicator function of the set A, i.e. 1A (x) = 1 if x ∈ A and 1A (x) = 0 otherwise.

• f (x) = e−x 1[0,∞) (x) exponential distribution (modelling for instance the lifetime of
a lightbulb)

• f (x) =
2
√1 e−x /2 standard normal distribution (arising in the central limit theorem).

Example 1.5 (Finite product spaces)


Let (Ωi , Fi , Pi ), 1 ≤ i ≤ n, be probability spaces and define

• Ω = Ω1 × Ω2 × · · · × Ωn = {(ω1 , . . . , ωn ) : ωi ∈ Ωi for all 1 ≤ i ≤ n}

• F = F1 ⊗ F2 ⊗ · · · ⊗ Fn = σ-algebra generated by all cylinder sets, i.e. all sets of


the form {A1 × · · · × An : Ai ∈ Fi for all 1 ≤ i ≤ n}

• P = P1 ⊗ P2 ⊗ · · · ⊗ Pn = unique probability measure such that


n
Y
P(A1 × · · · × An ) = Pi (Ai ). (1.3)
i=1

Special case: Roll a die n times.


|Ai |
Ωi = {1, . . . , 6}, Fi = P(Ωi ), Pi (Ai ) = for Ai ⊆ {1, . . . , 6}, 1 ≤ i ≤ n.
6
|A|
This yields Ω = {1, . . . , 6}n , F = P(Ω) because all singletons are in F, P(A) = .
6n

5
1.2 Random variables
Definition 1.6 A measurable space (Ω, F) consists of a set Ω 6= ∅, and a σ-algebra F
on Ω. Let (Ω, F) and (S, S) be measurable spaces. A function X : Ω → S is measurable
if for any A ∈ S, X −1 (A) := {ω ∈ Ω : X(ω) ∈ A} ∈ F. A measurable function
X : Ω → S is called a random element of S, or a S-valued random variable. For a
probability space (Ω, F, P) and a S-valued random variable X, the distribution of X is the
following probability measure µX on (S, S):

µX (A) := P(X −1 (A)) = P(X ∈ A) for A ∈ S. (1.4)

Here we abbreviate {X ∈ A} := {ω ∈ Ω : X(ω) ∈ A}.

Special cases:

• For S = Rd and S = B(Rd ) we call a measurable X = (X1 , . . . , Xd ) : Ω → Rd , a


random vector. Its distribution is a probability measure on (Rd , B(Rd )) called the
joint distribution of (X1 , . . . , Xd ).

• For d = 1, we call a measurable X : Ω → R a random variable.

Definition 1.7 (Distribution function) The distribution function of a random vari-


able X : Ω → R is defined by

F : R → [0, 1], x 7→ P(X ≤ x) = µ((−∞, x]). (1.5)

Theorem 1.8 If F : R → R is increasing and right-continuous, i.e. limy↓x F (y) = F (x)


for all x ∈ R , with limx→−∞ F (x) = 0 and limx→∞ F (x) = 1, then there is a unique
probability measure µ on (R, B(R)) such that

µ((a, b]) = F (b) − F (a) for all a < b. (1.6)

In particular, the distribution function characterizes the distribution uniquely.

Proof. Existence: We proved in the “Introduction to probability theory” course that


there exists a random variable X with distribution function F . (We can take X = Q(Y )
with Y ∼ Uniform(0, 1) and the generalized inverse Q of F , see exercises!)) Let µ be the
distribution of X. Then,

µ((a, b]) = P(X ∈ (a, b]) = P(X ≤ b) − P(X ≤ a) = F (b) − F (a).

Uniqueness: Let µ and ν be two probability measures on (R, B(R)) satisfying (1.6). Then,

(a) µ(E) = ν(E) ∀E ∈ E := {(a, b] : a ≤ b},

(b) µ(R) = limn→∞ µ((−n, n]) = limn→∞ (F (n) − F (−n)) = 1 = ν(R).

6
Since E is a π-system (i.e. A, B ∈ E ⇒ A ∩ B ∈ E) generating B(R), µ = ν follows from
the uniqueness theorem for probability measures, see later.
Since every distribution function has the required properties, the last claim follows.

Definition 1.9 If X and Y have the same distribution, we say that they are equal in
distribution and we write
d
X = Y.
d
Note: X = Y ; X(ω) = Y (ω) for all ω. Consider e.g. Ω = [0, 1], F = B([0, 1]), P = λ,
X = I[0,1/2) , Y = I[1/2,1] . Then, X(ω) 6= Y (ω) for all ω. But X and Y have the same
distribution, namely Bernoulli( 12 ).

Theorem 1.10 Let (Ω, F) and (S, S) be measurable spaces, let X : Ω → S, and let
A ⊆ S be such that σ(A) = S, i.e. A generates S. If {ω : X(ω) ∈ A} ∈ F for all A ∈ A,
then X is a random element of S. In other words, it is enough to check the condition
“X −1 (A) ∈ F” for sets A ∈ A instead of considering A ∈ S.

Proof. Consider C := {C : {X ∈ C} ∈ F}. C is a σ-algebra:

(σ1) C ∈ C =⇒ {X ∈ C} ∈ F
=⇒ {X ∈ C c } = {X ∈ C}c ∈ F
=⇒ C c ∈ C.

(σ2) Similarly, Cn ∈ C =⇒ ∞
S
n=1 Cn ∈ C.

By assumption, A ⊆ C. Hence, S = σ(A) ⊆ C because C is a σ-algebra.


=⇒ X −1 (C) ∈ F for all C ∈ S
=⇒ X is measurable.

Example 1.11 B(R) = σ((a, b); a < b, a, b ∈ R) = σ((a, b); a < b, a, b ∈ Q).

Definition 1.12 Let (Ω, F) and (S, S) be measurable spaces and X : Ω → S a function.
Then σ(X) := {X −1 (B) : B ∈ S} is called the σ-algebra generated by X.

Note that σ(X) is a σ-algebra, and σ(X) ⊆ F if X is a S-valued random variable.

Example 1.13 Let Ω := {1, 2, 3, 4}, F = P(Ω), S = R and S = B. Let X(ω) :=


I{ω is even} . Which subsets of Ω are in σ(X)?
Answer: σ(X) = {Ω, ∅, {1, 3}, {2, 4}}.

Recall some facts from measure theory:

Theorem 1.14 Let (Ω, F), (S, S), and (T, T ) be measurable spaces. If X : Ω → S and
f : S → T are measurable, then the composition f (X) : Ω → T is measurable as well, i.e.
f (X) is a random element of T .

7
Special cases:

• If X1 , . . . , Xn are random variables and f : Rn → R is Borel-measurable, then


f (X1 , . . . , Xn ) is a random variable.

• X 2 , X k , eX , sin X, n1 ni=1 Xi , ni=1 Xi2 are random variables.


P P

Theorem 1.15 If Xn , n ≥ 1, are random variables, then

inf Xn , sup Xn , lim Xn , lim Xn .


n≥1 n≥1 n→∞ n→∞

are random elements of ([−∞, ∞], B([−∞, ∞])).

Remark 1.16 Often, with a slight abuse of notation, when we say “random variable”
without mentioning the possible values, we mean a random element of
([−∞, ∞], B([−∞, ∞])).

Corollary 1.17 {ω : limn→∞ Xn (ω) exists} ∈ F.

Proof. {ω : limn→∞ Xn (ω) exists} = {ω : limn→∞ Xn (ω) = limn→∞ Xn (ω)}.

Remark 1.18 If Xt , t ∈ [0, 1], are random variables, inf 0≤t≤1 Xt need not be a random
variable. For instance, set Ω = [0, 1], F = B([0, 1]). Take A ⊆ [0, 1], and define
(
0 if t = ω and ω ∈ / A,
Xt (ω) = (1.7)
1 otherwise,

i.e.
(
Xt (ω) = 1 ∀ω if t ∈ A,
(1.8)
Xt (ω) = 1[0,1]\{t} (ω) if t ∈
/ A.

In particular, all Xt , t ∈ [0, 1], are random variables. Note that inf 0≤t≤1 Xt (ω) = 1A (ω)
is not measurable if A ∈ / F.

Definition 1.19 We say that Xn converges almost surely (a.s.), if


h i
P lim Xn exists = 1. (1.9)
n→∞

Example 1.20 Take Ω = [0, 1], F = B([0, 1]), P = λ, and consider the random variables
Xn = (n + 1)ω n , n = 1, 2, . . .. Then Xn → X a.s. where X(ω) = 0, ∀ω.

8
1.3 Expectation
Definition 1.21 Let X be a random variable on a probability space (Ω, F, P). The ex-
pectation (expected value, mean) of X is defined by
Z
E[X] = X(ω) P(dω), (1.10)

whenever the r.h.s. of (1.10) is well-defined.

More precisely, set for x ∈ R,

x+ = max{x, 0} = x ∨ 0 (positive part of x) and (1.11)


x− = max{−x, 0} (negative part of x). (1.12)

Then
R X + and X − are both non-negative random variables. If X ≥ 0, then E[X] =
X dP ∈ [0, ∞] is always well-defined, but it could have the value +∞. Now, E[X] is
well defined if E[X + ] < ∞ or E[X − ] < ∞ (or both), and in this case we have

E[X] = E[X + ] − E[X − ]. (1.13)

Convergence theorems.
The convergence theorems (Fatou’s lemma, monontone convergence theorem, dominated
convergence theorem, bounded convergence theorem) will be used again and again during
the lecture. Therefore, it is important that you know the statements of these theorems
by heart and are able to apply them.

Theorem 1.22 (Fatou’s lemma) If Xn ≥ 0, then


 
E lim Xn ≤ lim E[Xn ]. (1.14)
n→∞ n→∞

Example 1.23 Take Ω = [0, 1], F = B([0, 1]), P = λ, and consider the random variables
Xn = (n + 1)ω n , n = 1, 2, . . .. Then lim inf n→∞ Xn (ω) = limn→∞ Xn (ω) = 0, ∀ω ∈ [0, 1),
and we have
 
0 = E lim Xn < lim E[Xn ] = 1 . (1.15)
n→∞ n→∞

Theorem 1.24 (Monotone convergence theorem) If Xn ≥ 0, Xn ↑ X, then


n→∞

E[Xn ] ↑ E[X] ∈ [0, ∞] . (1.16)


n→∞

Theorem 1.25 (Dominated convergence theorem) Suppose that


(a) Xn −→ X a.s. and
n→∞

9
(b) there exists Y with E[|Y |] < ∞ such that
|Xn | ≤ Y for all n. (1.17)

Then,
lim E[Xn ] = E[X]. (1.18)
n→∞

If in Theorem 1.25(b) Y can be replaced by a constant, we call the theorem bounded


convergence theorem.
Theorem 1.26 (Expectation of a function of a random variable) Let X : Ω → S
be a random element of (S, S) with distribution µ and let h : (S, S) → (R, B(R)) be
measurable. Then
Z
E[h(X)] = h(x) µ(dx), (1.19)
S

whenever l.h.s or r.h.s. are well-defined.


In particular, for a random vector X = (X1 , . . . , Xd ) : Ω → Rd with distribution µ and
h : Rd → R Borel-measurable, one has
Z
E[h(X1 , . . . , Xd )] = h(x1 , . . . , xd ) µ(dx1 . . . dxd ), (1.20)
Rd

whenever l.h.s or r.h.s. are well-defined.


Proof. Case 1: h = 1B with B ∈ S (indicator function).
Then,
Z Z
E[h(X)] = E[1B (X)] = P[X ∈ B] = µ(B) = 1B (x) µ(dx) = h(x) µ(dx).
S S
Pn
Case 2: h = i=1 bi 1Bi with bi ∈ R, Bi ∈ S (simple function).
Essentially the same argument.
Case 3: h ≥ 0 measurable.
There exist simple functions hn such that hn ≥ 0, hn ↑ h. Then,
E[h(X)] = E[ lim hn (X)]
n→∞
= lim E[hn (X)] by the monotone convergence theorem, hn ≥ 0
n→∞
Z
= lim hn (x) µ(dx) by Case 2
n→∞
Z
= lim hn (x) µ(dx) by the monotone convergence theorem
n→∞
Z
= h(x) µ(dx).

Case 4: general h.
Write h = h+ − h− .

10
Example 1.27 (Discrete random variables)
Consider a random variable X such that

X
P[X = an ] = pn , n = 1, 2, . . . , where pn ≥ 0, pn = 1. (1.21)
n=1
P∞
Then, the distribution of X is given by µ = n=1 pn δan where the Dirac measure δx (·) is
given by δx (A) = 1 if x ∈ A and δx (A) = 0 otherwise. Consequently,

X ∞
X
E[h(X)] = h(an )pn , E[X] = an p n , (1.22)
n=1 n=1

whenever the sums are well-defined.

Example 1.28 If the distribution of X has a density f with respect to the Lebesgue
measure, then
Z ∞ Z ∞
E[h(X)] = h(y)f (y) dy, E[X] = yf (y) dy, (1.23)
−∞ −∞

whenever the integral is well-defined.

E[X n ] is called the nth moment of X, n = 1, 2, 3, . . . If E[|X|] < ∞, we define the


variance of X by
2
Var(X) = E (X − E[X])2 = E[X 2 ] − E[X] ∈ [0, ∞] .
 
(1.24)

Example. X ∼ N (m, σ 2 ), i.e. X has density


1 (x−m)2
f (x) = √ e− 2σ 2 , x ∈ R. (1.25)
2πσ 2
Then,

E[X] = m (1.26)

and

Var(X) = σ 2 . (1.27)

See the “Introduction to Probability” course for a proof. The key fact is that if X has
the law N (0, 1), then Y = σX + m has the law N (m, σ 2 ).

11
1.4 Inequalities
See [Dur05], Section 1.3 and appendix A.5.

Theorem 1.29 (Jensen’s inequality) Let ϕ : R → R be convex. If E[|X|] < ∞, then


E[ϕ(X)] is well-defined and one has
  
E ϕ(X) ≥ ϕ E[X] . (1.28)

Proof. See literature.

Application 1.30

E[|X|] ≥ |E[X]| ϕ(x) = |x| . (1.29)


E |X|p ≥ |E[X]|p p
 
ϕ(x) = |x| , p ≥ 1. (1.30)

Apply inequality (1.30) with X = |Z|α , α > 0.


  p
α
≤ E |Z|αp
 
E |Z|
   α1   1
 αp
=⇒ E |Z|α ≤ E |Z|αp by taking the αpth root
p > 1 =⇒ αp > α
   α1
=⇒ E |Z|α is increasing in α. (1.31)

In particular, we know:

E |Z|α < ∞ =⇒ E |Z|β < ∞


   
for all 0 < β < α, (1.32)

for instance: nth moment of Z is finite =⇒ (n − 1)st , (n − 2)nd , . . . , 1st moments of Z are
finite.

Theorem 1.31 (Chebyshev’s inequality) Let ϕ : R → [0, ∞) be measurable and A ∈


B(R). Then,

inf{ϕ(y) : y ∈ A}P(X ∈ A) ≤ E[ϕ(X)1A (X)] ≤ E[ϕ(X)]. (1.33)

In particular, if ϕ is increasing then


E[ϕ(X)]
P(X ≥ c) ≤ (1.34)
ϕ(c)
Proof. We have:

inf{ϕ(y) : y ∈ A}1A (X) ≤ ϕ(X)1A (X) ≤ ϕ(X).

Taking expectations yields (1.33). Then (1.34) follows by taking A = [c, ∞).

12
E[|X|]
Example 1.32 (a) P(|X| ≥ a) ≤ for a > 0 ( Take A = (−∞, −a] ∪ [a, ∞),
a
ϕ(x) = |x| in (1.33) or |X| instead of X and ϕ(x) = x in (1.34)).

E |X|p
 
(b) P(|X| ≥ a) ≤ p
for a > 0, p > 0 (Take A = (−∞, −a]∪[a, ∞), ϕ(x) = |x|p
a
in (1.33) or |X| instead of X and ϕ(x) = xp in (1.34)).
Var(X)
(c) P(|X − E[X]| ≥ a) ≤ for a > 0 (follows from (b) with p = 2)
a2
E |X − E[X]|4
 
(d) P(|X − E[X]| ≥ a) ≤ for a > 0 (follows from (b) with p = 4)
a4
E eλX
 
(e) P(X ≥ a) ≤ for λ > 0, a > 0 (Take ϕ(x) = eλx ) in (1.34)).
eλa

1.5 Independence
See [Dur05], Section 1.4.

Definition 1.33 Let (Ω, F, P) be a probability space and (Si , Si ), 1 ≤ i ≤ n be measurable


spaces.

(a) Events A1 , . . . , Am are independent if for all ∅ =


6 I ⊆ {1, . . . , m}
!
\ Y
P Ai = P(Ai ). (1.35)
i∈I i∈I

(b) Random elements Xi of Si , 1 ≤ i ≤ n are independent if


n
! n
\ Y
P {Xi ∈ Bi } = P(Xi ∈ Bi ) (1.36)
i=1 i=1

for all Bi ∈ Si .

(c) σ-algebras F1 , . . . , Fm ⊆ F are independent if


m
! m
\ Y
P Ai = P(Ai ) (1.37)
i=1 i=1

for all Ai ∈ Fi .

Remark 1.34 Independence is not a property of the events/random elements/σ-algebras


per se, but it depends on the probability measure P.

13
The following proposition shows that in Definition 1.33 (a) is a special case of (b) and
(b) is a special case of (c).

Proposition 1.35 (a) Events A1 , . . . , An are independent if and only if 1A1 , . . . , 1An
are independent.

(b) Random variables X1 , . . . , Xn are independent if and only if the σ-algebras σ(X1 ),
. . ., σ(Xn ) are independent.

Proof.

(a) Suppose 1A1 , . . . , 1An are independent. Let ∅ =


6 I ⊆ {1, . . . , n}. Then,
! !
\ \ \
P Ai = P {1Ai = 1} ∩ {1Aj ∈ {0, 1}}
| {z }
i∈I i∈I
| {z }
j∈{1,...,n}\I
=Ai =Ω
Y Y
= P(Ai ) P(Ω) by the independence assumption.
i∈I j ∈I
/
| {z }
=1

=⇒ A1 , . . . , An are independent.
Suppose A1 , . . . , An are independent. Let ki ∈ {0, 1}, 1 ≤ i ≤ n. Then,

Ai if ki = 1,
{1Ai = ki } =
Aci if ki = 0.

Hence,
 
n
!
\ \ \
P {1Ai = ki } =P  Ai ∩ Acj 
i=1 i:ki =1 j:kj =0

Y Y n
Y
= P(Ai ) P(Acj ) = P(1Ai = ki ).
i:ki =1 j:kj =0 i=1

Here we have used the independence of Aj , j ∈ {i : ki = 1}, Acj , j ∈ {i : ki = 0}.

(b) follows immediately from the definition of σ(Xi ).

Definition 1.36 Collections of events A1 , . . . , An are independent if for all ∅ 6= I ⊆


{1, . . . , n} and Ai ∈ Ai
!
\ Y
P Ai = P(Ai ). (1.38)
i∈I i∈I

14
Whenever Ω ∈ Ai for all i, it suffices to consider I = {1, . . . , n}.

Definition 1.37 Infinitely many collections of events A1 , A2 , . . . are independent if for


all ∅ =
6 J ⊂ N, J finite, the collections Ai , i ∈ J are independent.

Definition 1.38 (a) A is a π-system or ∩-stable if A, B ∈ A =⇒ A ∩ B ∈ A.

(b) D is a Dynkin system (or λ-system) if

(D1) Ω ∈ D,
(D2) A ∈ D =⇒ Ac ∈ D,
[
(D3) A1 , A2 , . . . ∈ D, A1 , A2 , . . . pairwise disjoint =⇒ Ai ∈ D.
i≥1

Each σ-field is a Dynkin system. On the other hand, there are Dynkin systems which
are not σ-fields.

Example 1.39 Ω = {1, 2, 3, 4}, D = {∅, Ω, {1, 2}, {1, 4}, {3, 4}, {2, 3}}. D is a Dynkin
system, but not a σ-field.

S
Note that a Dynkin system which is ∩-stable is a σ-field: For A1 , A2 , . . . ∈ D, An =
n=1
∞ ∞
(An ∩ Ac1 ∩ . . . ∩ Acn−1 ) =⇒
S S
A1 ∪ An ∈ D.
n=2 n=1
T
Lemma 1.40 Let I 6= ∅ be an index set. If Ai is a σ-field on Ω, ∀i ∈ I, then Ai =
i∈I
{A ⊆ Ω | A ∈ Ai , ∀i ∈ I} is a σ-field on Ω.
The same statement holds for Dynkin systems.

Proof. Exercise.

Definition 1.41 For a collection G ⊆ P(Ω), we define


\
D(G) := F.
F Dynkin system, G⊆F

Show that D(G) is indeed a Dynkin system, by checking (D1), (D2) and (D3) in Definition
1.38. D(G) is called the “Dynkin system generated by G”. Show that D(G) is the smallest
Dynkin system containing G, i.e. if H is a Dynkin system and G ⊆ H, then D(G) ⊆ H.

Lemma 1.42 If M is ∩-stable, then D(M) = σ(M).

Proof. We will show that D(M) is ∩-stable.

15
(1) We show A ∈ D(M), B ∈ M =⇒ A ∩ B ∈ D(M).
Proof: For B ∈ M, DB = {A ⊆ Ω | A ∩ B ∈ D(M)} is a Dynkin system,
M ∩ -stable =⇒ M ⊆ DB
DB Dynkin system =⇒ D(M) ⊆ DB

(2) Let A ∈ D(M), B ∈ D(M). We show A ∩ B ∈ D(M).


Proof: DA = {C ⊆ Ω | A ∩ C ∈ D(M)} is a Dynkin system, M ⊆ DA due to (1)
=⇒ D(M) ⊆ DA .

Hence, D(M) is a σ -algebra =⇒ D(M) = σ(M).

Corollary 1.43 (Uniqueness theorem for probability measures) Let P1 , P2 be two


probability measures on the measurable space (Ω, F). Assume that G ⊆ F, G is a π-system
and σ(G) = F. If
P1 (A) = P2 (A) for all A ∈ G
then P1 = P2 , i.e.
P1 (A) = P2 (A) for all A ∈ F.

Proof. P1 (A) = P2 (A) for all A ∈ G implies, due to the properties of probability measures,
P1 (A) = P2 (A) for all A ∈ D(G). But, due to Lemma 1.42, D(G) = σ(G) and σ(G) = F
due to our assumptions, hence we conclude P1 = P2 .

Lemma 1.44 (Dynkin’s lemma) Let M be a π-system and D a Dynkin system. If


M ⊆ D, then

σ(M) ⊆ D. (1.39)

Proof. Due to Lemma 1.42, σ(M) ⊆ D(M). But D(M) ⊆ D since M ⊆ D.

Theorem 1.45 Suppose A1 , . . . , An are independent π-systems. Then σ(A1 ), . . . , σ(An )


are independent.

Proof. Let Ji := Ai ∪ {Ω}. Then, each Ji is still a π-system and


n
! n
\ Y
P Bi = P(Bi ) for all Bi ∈ Ji . (1.40)
i=1 i=1

Fix B2 ∈ J2 , . . . , Bn ∈ Jn . Let

D := {A ∈ F : P(A ∩ B2 ∩ · · · ∩ Bn ) = P(A)P(B2 ) · · · P(Bn )}.

Then,

(a) J1 ⊆ D by (1.40);

(b) D is a Dynkin system by the following argument.

16
(D1) Ω ∈ D because Ω ∈ J1 .
(D2) Let A ∈ D. Then,

P(Ac ∩ B2 ∩ · · · ∩ Bn )

= P (Ω ∩ B2 ∩ · · · ∩ Bn ) \ (A ∩ B2 ∩ · · · ∩ Bn )
= P(Ω ∩ B2 ∩ · · · ∩ Bn ) − P(A ∩ B2 ∩ · · · ∩ Bn )

= P(Ω) − P(A) P(B2 ) · · · P(Bn ) because Ω ∈ D and A ∈ D
= P(Ac )P(B2 ) · · · P(Bn ).

=⇒ Ac ∈ D.
(D3) Let A1 , A2 , . . . ∈ D be pairwise disjoint. Then, Ai ∩ B2 ∩ · · · ∩ Bn , i ≥ 1, are
pairwise disjoint. Hence,
[  ! !
[
P Ai ∩ B2 ∩ · · · ∩ Bn = P (Ai ∩ B2 ∩ · · · ∩ Bn )
i≥1 i≥1
X
= P(Ai ∩ B2 ∩ · · · ∩ Bn )
i≥1
X
= P(Ai )P(B2 ) · · · P(Bn ) because Ai ∈ D ∀i
i≥1
[ 
=P Ai P(B2 ) · · · P(Bn ).
i≥1
S
=⇒ i≥1 Ai ∈ D.

Hence, D is a Dynkin system.

By Dynkin’s lemma, σ(J1 ) ⊆ D, i.e.



σ(A1 ) = σ(J1 ) , J2 , . . . , Jn are independent.

Repeat the argument.

Corollary 1.46 If for all x1 , . . . , xn ∈ (−∞, ∞]


n
Y
P(X1 ≤ x1 , . . . , Xn ≤ xn ) = P(Xi ≤ xi ), (1.41)
i=1

then X1 , . . . , Xn are independent. (Note that {Xi ≤ ∞} = Ω.)



Proof. Let Ai = {Xi ≤ xi } : xi ∈ (−∞, ∞] . Ai is a π-system, and σ(Ai ) = σ(Xi ).
Furthermore, A1 , . . . , An are independent.
=⇒ σ(X1 ), . . . , σ(Xn ) are independent by Theorem 1.45.

17
Corollary 1.47 (i) Suppose F11 , . . . , F1m1 ,
F21 , . . . , F2m2 ,
..
.
Fn1 , . . . , Fnmn
are independent σ-algebras. Let Ji = σ m
S i 
j=1 F ij . Then, J1 , . . . , Jn are indepen-
dent.

(ii) More general, assume Fj , j ∈ I are independent σ-algebras where I is a (not nec-
essarily finite) index set, which is partitioned as follows:

[
I= Ii where Ij ∩ Ik = ∅ for j 6= k .
i=1
S 
Let Ji = σ j∈Ii Fj . Then, J1 , J2 , . . . are independent.

Proof.
 Tmi
(i) Let Ai = j=1 Aij : Aij ∈ Fij . Ai is a π-system, and A1 , . . . , An are independent
by assumption.
=⇒ σ(A1 ), . . . , σ(An ) are independent by Theorem 1.45.
But Ji ⊆ σ(Ai ) because Ω ∈ Fij .
=⇒ J1 , . . . , Jn are independent.

(ii) Recall Definition 1.37 and rerun the proof of (i).

Corollary 1.48 Suppose X11 , . . . , X1m1 ,


X21 , . . . , X2m2 ,
..
.
Xn1 , . . . , Xnmn
are independent random variables. If fi : Rmi → R are measurable (i = 1, . . . , n), then

f1 (X11 , . . . , X1m1 ), . . . , fn (Xn1 , . . . , Xnmn ) (1.42)

are independent random variables.


In particular, if X1 , . . . , Xn are independent and fi : R → R is measurable, then f1 (X1 ),
. . . ,fn (Xn ) are independent.

Proof. Set Fij = σ(Xij ). By Corollary 1.47, Ji = σ m


S i 
j=1 F ij , 1 ≤ i ≤ n, are indepen-
dent. Since fi (Xi1 , . . . , Ximi ) is Ji -measurable, the first claim follows.

Example 1.49 Let X1 , . . . , Xn be independent, and Sm = X1 + · · · + Xm . For m < n,


Sn − Sm = Xm+1 + · · · + Xn is independent of max1≤k≤m Sk .

18
Theorem 1.50 Suppose Xi : Ω → Si , 1 ≤ i ≤ n, are independent. Then (X1 , . . . , Xn )
has distribution µ1 ⊗ · · · ⊗ µn , where µi is the distribution of Xi .

Proof. For all Ai ∈ Si , 1 ≤ i ≤ n, we have that



P (X1 , . . . , Xn ) ∈ A1 × · · · × An = P(X1 ∈ A1 , . . . , Xn ∈ An )
n
Y
= P(Xi ∈ Ai ) by independence
i=1
n
Y
= µi (Ai ) = (µ1 ⊗ · · · ⊗ µn )(A1 × · · · × An ).
i=1

So the distribution of (X1 , . . . , Xn ) and µ1 ⊗ · · · ⊗ µn agree on

{A1 × · · · × An : Ai ∈ Si ∀i}

which is a π-system generating S1 ⊗ . . . ⊗ Sn , hence they agree on S1 ⊗ . . . ⊗ Sn by the


uniqueness theorem for probability measures.

Theorem 1.51 Suppose X : Ω → S and Y : Ω → T are independent with distributions



µ and ν, respectively. If h : S × T → R is measurable and h ≥ 0 or E |h(X, Y )| < ∞,
then
Z Z
E[h(X, Y )] = h(x, y) µ(dx) ν(dy). (1.43)
T S

In the special case h(x, y) = f (x)g(y), we obtain

E[f (X)g(Y )] = E[f (X)]E[g(Y )] (1.44)

Proof. Theorem 1.26 yields


Z
E[h(X, Y )] = h d(µ ⊗ ν)
S×T
Z Z
= h(x, y) µ(dx) ν(dy) by Fubini’s theorem.
T S

In the special case h(x, y) = f (x)g(y), one has


Z Z
E[f (X)g(Y )] = f (x)g(y) µ(dx) ν(dy)
ZT S

= g(y)E[f (X)] ν(dy)


T
= E[f (X)]E[g(Y )].

19
1.6 Construction of independent random variables
See [Dur05], Section 1.4 (c).

1.6.1 Finitely many


Given: distribution functions F1 , . . . , Fn .
Want: Independent random variables X1 , . . . , Xn with P(Xi ≤ x) = Fi (x) for all x ∈ R,
1 ≤ i ≤ n.
Let µi be the unique measure on (R, B(R)) with

µi ((a, b]) = Fi (b) − Fi (a) for all a < b.

Set Ω = Rn , F = B(Rn ), µ = µ1 ⊗ · · · ⊗ µn , and let

Xi : Ω → R, Xi (ω1 , . . . , ωn ) = ωi (1.45)

be the projection to the i-th coordinate. Then, for all Ai ∈ B(R),


n
Y n
Y
P(X1 ∈ A1 , . . . , Xn ∈ An ) = µ(A1 × . . . × An ) = µi (Ai ) = P(Xi ∈ Ai ).
i=1 i=1

In particular, X1 , . . . , Xn are independent and each Xi has distribution µi and hence


distribution function Fi .

1.6.2 Infinitely many


Definition 1.52 Random variables Xi , i ≥ 1, are called independent if X1 , . . . , Xn are
independent for all n ∈ N.

Theorem 1.53 (Kolmogorov’s extension theorem) Let νn be probability measures


on (Rn , B(Rn )) which are consistent in the sense that
 
νn+1 (a1 , b1 ] × · · · × (an , bn ] × R = νn (a1 , b1 ] × · · · × (an , bn ] (1.46)

for all n ≥ 1 and ai < bi . Then, there exists a unique probability measure P on
(RN , B(R)⊗N ) such that
 
P ω : ωi ∈ (ai , bi ] for 1 ≤ i ≤ n = νn (a1 , b1 ] × · · · × (an , bn ] (1.47)

for all ai < bi , n ∈ N. Here, B(R)⊗N denotes the product σ-algebra, i.e. the smallest
σ-algebra generated by the sets

{ω : ωi ∈ Bi , 1 ≤ i ≤ n}, Bi ∈ B(R), n ∈ N. (1.48)

Proof. The proof uses the measure extension theorem. See Appendix A.7 in [Dur05].

20
Example 1.54 Given distribution functions Fi , i ≥ 1, we want independent random
variables Xi , i ≥ 1 with P(Xi ≤ x) = Fi (x) for all x ∈ R, i ≥ 1.
Let µi be the unique measure on (R, B(R)) with

µi ((a, b]) = Fi (b) − Fi (a) for all a < b.

For n ∈ N, let νn = µ1 ⊗ · · · ⊗ µn . Then, for all n and ai < bi , 1 ≤ i ≤ n, one has

νn+1 ((a1 , b1 ] × · · · × (an , bn ] × R) =(µ1 ⊗ · · · ⊗ µn+1 ) ((a1 , b1 ] × · · · × (an , bn ] × R)


n
Y n
Y
= µi ((ai , bi ]) · µn+1 (R) = µi ((ai , bi ])
i=1 i=1
=(µ1 ⊗ · · · ⊗ µn ) ((a1 , b1 ] × · · · × (an , bn ])
=νn ((a1 , b1 ] × · · · × (an , bn ]) .

Hence, the measures νn , n ≥ 1, are consistent. Let P be the unique measure induced by
Kolmogorov’s extension theorem on (Ω = RN , F = B(R)⊗N ), and let

Xi : Ω → R, Xi (ω) = ωi , i = 1, 2, . . . (1.49)

be the projection to the i-th coordinate. Then, by construction, the joint distribution of
(X1 , . . . , Xn ) is given by νn = µ1 ⊗ · · · ⊗ µn . Thus, Xi , i ≥ 1, are independent and Xi has
distribution µi .

1.7 Convergence of sequences of random variables


See [Kle06], 6.1.

Definition 1.55 Let X, Xi , i ≥ 1, be random variables on the same probability space


(Ω, F, P).
(a) Xn −→ X almost surely (a.s.) if

P(ω : lim Xn (ω) = X(ω)) = 1. (1.50)


n→∞

(b) Xn −→ X in probability if for all ε > 0

lim P(|Xn − X| > ε) = 0. (1.51)


n→∞

P
We write Xn −−−→ X.
n→∞

(c) For p ≥ 1 we say Y ∈ Lp if E |Y |p < ∞.


 

Xn −→ X in Lp if Xn ∈ Lp for all n, X ∈ Lp , and

lim E |Xn − X|p = 0.


 
(1.52)
n→∞

21
weak

convergence
in
probability
if dominated

almost
convergence
sure
in Lp
convergence

if dominated strong

Theorem 1.56 (a) If p1 < p2 , then Xn −→ X in Lp2 implies Xn −→ X in Lp1 .

(b) Xn −→ X in Lp implies Xn −→ X in probability.

(c) Xn −→ X almost surely implies Xn −→ X in probability.

(d) Suppose there exists Y ∈ Lp such that |Xn | ≤ Y for all n. If Xn −→ X in probability
and X ∈ Lp , then Xn −→ X in Lp .

Proof.

(a) Assume Xn −→ X in Lp2 . Then,


1 1
E |Xn − X|p1 p1 ≤E |Xn − X|p2 p2 by statement (1.31) of Application 1.30
 

−−−→ 0 by assumption.
n→∞

Hence, Xn −→ X in Lp1 .

(b) Assume Xn −→ X in Lp . By Chebyshev’s inequality,

 E |Xn − X|p
 
P |Xn − X| > ε ≤ −−−→ 0 ∀ε > 0.
εp n→∞

Hence, Xn −→ X in probability.

22
(c) Assume Xn −→ X almost surely. Then, for P-almost all ω ∈ Ω and all ε > 0 there
exists n = n(ω) ∈ N such that for all m ≥ n one has |Xm (ω) − X(ω)| ≤ ε. Hence,
∞ \ ∞
!
[
P {|Xm − X| ≤ ε} = 1.
n=1 m=n

Consequently, for all ε > 0,


∞ [

!
\
0=P {|Xm − X| > ε}
n=1 m=n

!
[
= lim P {|Xm − X| > ε} by σ-continuity of P
n→∞
m=n
≥ lim P(|Xn − X| > ε).
n→∞
=⇒ lim P(|Xn − X| > ε) = 0.
n→∞

Thus, Xn −→ X in probability.

(d) Suppose there exists Y ∈ Lp such that |Xn | ≤ Y for all n.


Special case: Xn −→ 0 in probability. In this case, one has for every ε > 0,
Z Z
p p
|Xn |p dP

E |Xn | = |Xn | dP +
{|Xn |≤ε} {|Xn |>ε}
Z
≤ εp + |Y |p dP.
{|Xn |>ε}

We use the following result (which will be proved later, see Lemma 1.64):
Z
p
E[|Y | ] < ∞ ⇒ (∀η > 0) (∃δ > 0) (∀A with P(A) < δ) |Y |p dP < η. (1.53)
A

Let η > 0 and choose δ as above. Since Xn −→ 0 in probability,

(∃n0 ∈ N) (∀n ≥ n0 ) P(|Xn | > ε) < δ


Z
=⇒ (∃n0 ∈ N) (∀n ≥ n0 ) |Y |p dP < η
{|Xn |>ε}
Z
=⇒ lim sup |Y |p dP = 0.
n→∞ {|Xn |>ε}

Thus,

lim sup E |Xn |p ≤ εp .


 
n→∞

23
Since ε > 0 was arbitrary,
lim E |Xn |p = 0.
 
n→∞

General case: Assume Xn −→ X in probability and X ∈ Lp .


We have |Xn − X| ≤ |Xn | + |X| ≤ |Y | + |X|. By Minkowski’s inequality,
E[(|Y | + |X|)p ]1/p ≤ E[|Y |p ]1/p + E[|X|p ]1/p < ∞.
Since Xn − X → 0 in probability, we can apply the special case to Xn − X and
obtain Xn − X → 0 in Lp or equivalently Xn → X in Lp .

Example 1.57 For Xn −→ X in probability 6=⇒ Xn −→ X almost surely


and Xn −→ X in Lp 6=⇒ Xn −→ X almost surely.
Let Ω = [0, 1], F = B([0, 1]), and P = λ|[0,1] . Set
h 1i h1 i
A1 = 0, , A2 = , 1 ,
2 2
h 1i h1 1i h1 3i h3 i
A3 = 0, , A4 = , , A5 = , , A6 = , 1 ,
4 4 2 2 4 4
h 1i
A7 = 0, , etc.,
8
and Xn = 1An , n ≥ 1. Then,
E |Xn |p = P(An ) −−−→ 0,
 
n→∞

i.e. Xn −→ 0 in Lp and hence in probability. But, for all ω ∈ [0, 1], Xn (ω) takes infinitely
often the value 0 and infinitely often the value 1. Hence, for all ω, the sequence (Xn (ω))n∈N
does not converge. More precisely, lim supn→∞ Xn (ω) = 1 and lim inf n→∞ Xn (ω) = 0 for
all ω ∈ [0, 1].

Example 1.58 For Xn −→ X in probability 6=⇒ Xn −→ X almost surely


and Xn −→ X in Lp 6=⇒ Xn −→ X almost surely.
Let Y1 , Y2 , . . . be iid random variables with P(Yk = 1) = P(Yk = −1) = 21 and Sn =
P n
k=1 Yk , n = 1, 2, . . .. We considered some properties of (Sn ) in the ”Introduction to
Probability” course: (Sn ), n = 1, 2, . . . is called ”simple random walk”. Define Xn =
1{Sn =0} for n = 1, 2, . . ., and X ≡ 0. Then, E[|Xn − X|p ] = P(Xn = 1) = P(Sn = 0).
We showed in the ”Introduction to Probability” course that P(S2n = 0) ∼ √1πn and that
lim supn Xn = 1, P-a.s. (If n is odd, P(Sn = 0) = 0). Hence, Xn → X in in Lp , for all
p ≥ 1, but Xn does not converge almost surely.

Theorem 1.59 (Convergence in probability with subsequences) Let (Ω, F, P) be


a probability space, X a random variable and (Xn ) a sequence of random variables. Then
the following are equivalent:

24
P
(i) Xn −−−→ X.
n→∞

(ii) Every subsequence (Xnk ) of (Xn ) has another subsequence (Xnfk ) such that Xnfk con-
verges to X almost surely for k → ∞.

Proof. (i) =⇒ (ii): Let (Xnk ) be a subsequence of (Xn ). Then, there is a subsequence
(Xnfk ) of (Xnk ) such that
 
1 1
P |Xnfk − X| ≥ ≤ 2 ∀k ≥ 1 . (1.54)
k k

We will show that for all ε > 0,


 
P sup |Xnfk − X| ≥ ε −−−→ 0 . (1.55)
k≥n n→∞

Due to the following lemma (proof see exercises), (1.55) implies that Xnfk converges almost
surely to X.

Lemma 1.60 For a random variable Z and a sequence of random variables (Zn ), the
following are equivalent

(a) Zn → Z a.s.
 
(b) For all ε > 0, P sup |Zn − Z| ≥ ε −−−→ 0.
k≥n n→∞

2
To show (1.55), fix ε > 0 and take n large enough such that n
< ε. Then
  ∞
X  ε
P sup |Xnfk − X| ≥ ε ≤ P |Xnfk − X| ≥
k≥n
k=n
2
∞  
X 1
≤ P |Xnfk − X| ≥
k=n
k

X 1

k=n
k2

and the last term goes to 0 for n → ∞.


(ii) =⇒ (i): Let ε > 0 and bn := P (|Xn − X| ≥ ε). Then 0 ≤ bn ≤ 1 for all n. Let bnk
be a convergent subsequence of (bn ). Our assumption says that there is a subsequence
bnfk such that Xnfk converges a.s. to X. But this implies that bnfk → 0 (since almost sure
convergence implies convergence in probability). Hence, bnk → 0 as well. We conclude
that bn → 0 since every convergent subsequence goes to 0.

25
1.8 Uniform integrability
Definition 1.61 We say that a sequence (Xn )n∈N of random variables is uniformly
integrable if Z
lim sup |Xn |dP = 0, (1.56)
c→∞ n∈N
{|Xn |≥c}
R
where |Xn |dP = E[|Xn |1{|Xn |≥c} ].
{|Xn |≥c}

Remark 1.62 (i)


1
R If the sequence (Xn )n∈N is uniformly integrable, it
R is bounded in
L , i.e. sup |Xn |dP < ∞. Indeed, for any c > 0 and all n, |Xn |dP ≤ c +
R n∈N
|Xn |dP, and taking the supremum over n, the claim follows.
{|Xn |≥c}

(ii) If the random variables Xn are dominated by an integrable random variable, i.e.
there exists a random variable Y ∈ L1 such that |Xn | ≤ Y almost surely, then the
sequence (Xn )n∈N is uniformly integrable. Indeed, for all n,
E[|Xn |1{|Xn |≥c} ] ≤ E[Y 1{Y ≥c} ] = E[Yc ] where Yc = Y 1{Y ≥c} . But E[Yc ] → 0 for
c → ∞ by dominated convergence.
The following theorem extends the dominated convergence theorem and is one of the
reasons why uniform integrability is an important notion.
Theorem 1.63 (Extension of the dominated convergence theorem) If a sequence
(Xn )n∈N of random variables converges in probability to a random variable X and is uni-
formly integrable, then Xn → X in L1 . In particular, if (Xn )n∈N converges in probability
to X and is uniformly integrable, we have lim E[Xn ] = E[lim Xn ] = E[X].
n∈N n∈N
For the proof, we need the following lemma.
Lemma 1.64 If the sequence (Xn )n∈N is uniformly
R integrable, then there is for any ε > 0
some δ = δ(ε) > 0 such that P(A) ≤ δ =⇒ |Xn |dP ≤ ε, for all n.
A
Proof of Lemma 1.64. We have
Z Z Z Z
|Xn |dP = |Xn |dP + |Xn |dP ≤ cP(A) + |Xn |dP (1.57)
A A∩{|Xn <c} A∩{|Xn ≥c} A∩{|Xn ≥c}

and for c = c(ε) large enough and δ < 2cε , both terms on the r.h.s. of (1.57) are ≤ 2ε , for
all n.
Proof of Theorem 1.63. Assume that Xn → X in probability.
R Assume without loss of
generality that X ≡ 0. Then we have E[|Xn |] ≤ ε + |Xn |dP (take c = ε, A = Ω in
{|Xn ≥ε}
(1.57)). But, for any δ > 0, there is some N0 = N0 (δ, ε) such that P(|Xn | R≥ ε) ≤ δ for all
n ≥ N0 , since Xn → 0 in probability. Hence, choosing δ small enough, |Xn |dP ≤ ε
{|Xn ≥ε}
for n ≥ N0 due to Lemma 1.64. Since ε was arbitrary, we conclude that E[|Xn |] → 0 for
n → ∞.

26
2 Laws of large numbers
The material for this section is taken from Chapter 1 in [Dur05].

2.1 Weak law of large numbers (WLLN)


See [Dur05], Section 1.5.

Lemma 2.1 Let X1 , . . . , Xn be random variables on (Ω, F, P). Assume that E[Xi2 ] < ∞,
for all i and that they are uncorrelated, i.e.

E[Xi Xj ] = E[Xi ]E[Xj ] for all i 6= j. (2.1)

(Note that E[Xi ]E[Xj ] is well-defined due to the Cauchy-Schwarz inequality). Then
n
! n
X X
Var Xi = Var(Xi ). (2.2)
i=1 i=1

Proof. Exercise.
Note that independent random variables which are in L2 are uncorrelated.

Theorem 2.2 (Weak law of large numbers (WLLN))


Let Xi ∈ L2 , i ≥ 1, be uncorrelated with E[Xi ] = E[X1 ] for all i. If there exists C > 0
such that E[Xi2 ] ≤ C for all i, then

Sn P
−−−→ E[X1 ]. (2.3)
n n→∞
Proof. Set m = E[X1 ]. Then,
" 2 #    
Sn Sn Sn
E −m = Var because m = E
n n n
1
= Var(Sn )
n2
Cn
≤ 2 by Lemma 2.1
n
−−−→ 0.
n→∞

Sn
Hence, −−−→
n n→∞
m in L2 and hence in probability.
In particular, the weak law of large numbers applies if Xn , n ≥ 1, are i.i.d. (independent
and identically distributed) and in L2 .

27
Example 2.3 (Random permutations) Let n ∈ N and consider

Ωn := set of all permutations of {1, . . . , n}, |Ωn | = n!,


Fn := P(Ωn ),
1
Pn ({π}) := for all π ∈ Ωn .
n!
Every permutation has a unique decomposition into cycles. For example for n = 8,

π = (1 5 4)(2 7)(3)(6 8).

Decompose π ∈ Ωn into cycles as follows:

• Consider

1, π(1), π 2 (1), π 3 (1), . . .

Eventually, π k (1) = 1, and a cycle of length k is completed.

• Repeat with the smallest i not in the first cycle replacing 1, etc.

Set

th
1 if a right parenthesis occurs after the k number

Xn,k := in the cycle decomposition,

0 otherwise.

Claim For each n, Xn,1 , Xn,2 , . . . , Xn,n are independent with respect to Pn . (Proof see
exercises).
1
Pn (Xn,k = 1) = . (2.4)
n−k+1
For example, in the case n = 8, the event

A := {X8,1 = 0, X8,2 = 0, X8,3 = 1, X8,4 = 0, X8,5 = 1, X8,6 = 1, X8,7 = 0, X8,8 = 1}

contains all permutations with the following cycle decomposition produced by the above
algorithm:

(· , · , · ) (· , · ) (· ) (· , · ).

One has

|A| = (1 · 7 · 6) · (1 · 4) · (1) · (1 · 1) = 7 · 6 · 4

28
Hence,
p1 := P8 (X8,1 = 0, X8,2 = 0, X8,3 = 1, X8,4 = 0, X8,5 = 1, X8,6 = 1, X8,7 = 0, X8,8 = 1)
|A| 7·6·4
= = ,
|Ωn | 8!
and
p2 := P8 (X8,1 = 0)P8 (X8,2 = 0) · · · P8 (X8,8 = 1)
 1  1 1  1 1 1  1
= 1− · 1− · · 1− · · · 1− · 1 = p1 .
8 7 6 5 4 3 2
Pnexpectation with respect to Pn and Varn for the variance with respect
We write En for the
to Pn . Let Sn := k=1 Xn,k = number of cycles. Then,
n n
X 1 X1
En [Sn ] = = ∼ ln n (2.5)
k=1
n − k + 1 k=1 k
n
X
Varn (Sn ) = Varn (Xn,k ) by independence
k=1
Xn
En (Xn,k )2
 

k=1
n
X
= En [Xn,k ] because Xn,k is an indicator function
k=1
= En [Sn ] ∼ ln n. (2.6)
Hence,
" 2 #  
Sn Sn Varn (Sn ) 1
En −1 = Varn = 2 ≤ −−−→ 0. (2.7)
E[Sn ] E[Sn ] En [Sn ] En [Sn ] n→∞

2.2 First Borel-Cantelli lemma


Definition 2.4 For An ⊆ Ω, we define
∞ [
\ ∞
lim An = Ak (2.8)
n→∞
n=1 k=n
[∞ \ ∞
lim An = Ak . (2.9)
n→∞ n=1 k=n

Fact 2.5
ω ∈ lim An ⇐⇒ ω ∈ Ak for infinitely many k, (2.10)
n→∞
ω ∈ lim An ⇐⇒ ω ∈ Ak for all but finitely many k. (2.11)
n→∞

29
Lemma 2.6 (First Borel-Cantelli lemma) Let An , ≥ 1, be events. If

X
P(An ) < ∞, then P(An infinitely often) = 0. (2.12)
n=1

Proof.
∞ [

!
\
P(An infinitely often) = P Ak
n=1 k=n
∞ ∞
!
[ [
= lim P Ak by σ-continuity, because Ak ↓ .
n→∞ n
k=n k=n

X
≤ lim P(Ak ) by σ-subadditivity
n→∞
k=n

X
=0 because P(Ak ) < ∞.
k=1

2.3 Strong law of large numbers (SLLN)


Lemma 2.7

Xn −→ 0 almost surely ⇐⇒ (∀ε > 0) P(|Xn | > ε infinitely often) = 0. (2.13)

Proof.

“⇒”: See proof of Theorem 1.56, part (c).

“⇐”: Note that {Xn 6−−−→ 0} = ∞ 1


S
n→∞ k=1 {|Xn | > k
infinitely often}
Using σ-subadditivity, we get

X 1
P(Xn 6−−−→ 0) ≤ P(|Xn | > infinitely often) = 0.
n→∞
k=1
k

Theorem 2.8 (SLLN) Assume that

• Xi ∈ L2 , i ≥ 1, are uncorrelated,

• E[Xi ] = m for all i,

• there exists C ∈ R such that Var(Xi ) ≤ C for all i.

30
Pn
Set Sn := i=1 Xi . Then,

Sn
−−−→ m almost surely. (2.14)
n n→∞
Proof. Without loss of generality m = 0 (otherwise consider Yi = Xi − E[Xi ]).
Fix ε > 0.
h i
  E Sn 2
Sn n
P > ε ≤ by Chebyshev
n ε2
n
(m=0) Var(Sn ) 1 X nC C
= 2 2
= 2 2 Var(Xk ) ≤ 2 2 = 2 .
nε n ε k=1 nε nε

Idea: Look along the subsequence (n2 )n≥1 .


∞   X∞
X Sn2 C
P 2 >ε ≤

2 ε2
< ∞.
n=1
n n=1
n

 
Sn2
Borel-Cantelli I =⇒ P 2 > ε infinitely often = 0 ∀ε > 0
n
Sn2
=⇒ 2 −→ 0 almost surely.
n

Control fluctuations between terms of subsequence: Let

Dn := max |Sk − Sn2 | .


n2 <k≤(n+1)2

Then,
(n+1)2
X
E[Dn2 ] E |Sk − Sn2 |2
  P
≤ (max ≤ )
k=n2 +1
(n+1)2
X
= Var(Xn2 +1 + Xn2 +2 + · · · + Xk ) because m = 0
k=n2 +1
(n+1)2 k
X X
= Var(Xi )
k=n2 +1 i=n2 +1
(n+1)2
X
(n + 1)2 − n2 C = (2n + 1)2 C.


| {z }
k=n2 +1
=2n+1

31
By Chebyshev’s inequality,
 2
E D (2n + 1)2 C C0
P(Dn > n2 ε) ≤ 4 n2 ≤ ≤ .
nε n 4 ε2 n 2 ε2
Hence, Borel-Cantelli I implies P( Dn2n > ε infinitely often) = 0. Since ε > 0 was arbitrary,
using Lemma 2.7
Dn
−−−→ 0 almost surely.
n2 n→∞

Convergence of the whole sequence: Given k, ∃ unique n = n(k) such that n2 <
k ≤ (n + 1)2 . Note that k → ∞ ⇒ n → ∞. Furthermore,
2
(n + 1)2

k 1
1≤ 2 ≤ = 1+
n n2 n
and hence
k
lim = 1.
k→∞ n2

Consequently,
|Sk | |Sn2 + Sk − Sn2 |
=
k k
|Sn2 | + Dn

k
|Sn2 | n2 Dn n2
= 2 · + 2 · −−−→ 0 almost surely.
n
| {z } |{z}k n k
|{z} |{z} k→∞
−−→0 −k→∞
a.s.
−−→1 − −→0 −k→∞
a.s.
−−→1

In particular, the SLLN holds if Xi , i ≥ 1, are i.i.d (independent and identically


distributed) and in L2 .

Application 2.9 (Borel’s law of normal numbers)


This is the first application of the SLLN, due to Borel 1908. Set

Ω := [0, 1], F := B([0, 1]), P := λ|[0,1] .

Expand ω ∈ [0, 1] in the decimal system:

ω = .ξ1 (ω)ξ2 (ω)ξ3 (ω) . . .

with ξi (ω) ∈ {0, 1, . . . , 9}. This representation is unique except for


nm o
n
ω ∈ E := : n ≥ 1, m ∈ {0, 1, . . . , 10 }
10n

32
where there are two representations. We always take the one which terminates. Note:
P(E) = 0. Let
(n)

νk (ω) = i ∈ {1, . . . , n} : ξi (ω) = k , k ∈ {0, 1, . . . , 9}
= number of occurrences of the digit k among the first n digits.
ω ∈ [0, 1] is called simply normal if for all k ∈ {0, 1, . . . , 9}
(n)
νk (ω)
lim (= asymptotic relative frequency of the digit k) (2.15)
n→∞ n
1
exists and equals 10 .
Theorem 2.10 (Borel’s law of normal numbers)
P(ω ∈ [0, 1] : ω is simply normal) = 1. (2.16)
Proof. ξi : Ω → R are random variables: ξ1 (ω) = [10ω], where [x] = largest integer ≤ x,
ξ2 (ω) = [100ω − 10ξ1 (ω)], etc.
ξi , i ≥ 1, are independent with
1
P(ξi = k) = ∀i ≥ 1, k ∈ {0, . . . , 9}.
10
For example,
 1
P(ξ1 = 0, ξ2 = 2) = P ω ∈ [0.02000 . . . , 0.02999 . . . ) =
| {z } 100
=0.03
 1
P(ξ1 = 0) = P ω ∈ [0, 0.0999
| {z . .}.) = 10
=0.1
9
!
[ 1 1
P(ξ2 = 2) = P ω ∈ [0.i2, 0.i2999
| {z . .}.) = 10 · 100 = 10
i=0 =0.i3
=⇒ P(ξ1 = 0, ξ2 = 2) = P(ξ1 = 0)P (ξ2 = 2).
Fix k ∈ {0, . . . , 9}. For n ≥ 1, set
Xn := 1{ξn =k} .
Then Xn , n ≥ 1, are independent and identically distributed, because ξn , n ≥ 1, are
independent and identically distributed.
1
E[Xn ] = P(ξn = k) = , and
10
Var(Xn ) = Var(X1 ) < ∞ ∀n.
Hence, the assumptions of the SLLN are satisfied, and
(n) n
νk 1X 1
relative frequency of k = = Xi −−−→ almost surely.
n n i=1 n→∞ 10


Conjecture 2.11 π and 2 are simply normal.

33
2.4 Second Borel-Cantelli lemma
Recall the first Borel-Cantelli lemma:

X
P(An ) < ∞ =⇒ P(An infinitely often) = 0. (2.17)
n=1

How about

X
P(An ) = ∞ =⇒ P(An infinitely often) = 1?
n=1

This is wrong as the following example shows: Ω = [0, 1], F = B([0, 1]), P = λ|[0,1] ,
An = (0, an ) with an −−−→ 0.
n→∞

=⇒ {An infinitely often} = ∅ (every x ∈ [0, 1] is in at most finitely many An ).

But, if an = n1 , then
∞ ∞
X X 1
P(An ) = = ∞.
n=1 n=1
n

Lemma 2.12 (Second Borel-Cantelli lemma) Let An , n ≥ 1, be independent events.


If

X
P(An ) = ∞, then P(An infinitely often) = 1. (2.18)
n=1

Proof.

Step 1
∞ [

!
\
P(An infinitely often) = P Ak
n=1 k=n
∞ ∞
!
[ [
= lim P Ak by σ-continuity, because Ak ↓ .
n→∞ n
k=n k=n

34
Step 2
" ∞
#c ! ∞
!
[ \
P Ak =P Ack
k=n k=n
N
!
\
≤P Ack for all N ≥ n
k=n
N
Y 
= 1 − P(Ak ) by independence
k=n
N
!
X
≤ exp − P(Ak ) (recall: 1 − x ≤ e−x for x)
k=n
−−−→ 0.
N →∞

Hence, by Step 1,

!
[
P Ak = 1 ∀n =⇒ P(An infinitely often) = 1.
k=n

Fact 2.13
Z ∞
E[|X|] = P(|X| > x) dx. (2.19)
0

Proof. One has


Z ∞ Z ∞Z
P(|X| > x) dx = 1{|X(ω)|>x} P(dω) dx
0
Z0 Z ∞

= 1{|X(ω)|>x} dx P(dω) by Fubini’s theorem


Ω 0
Z Z |X(ω)|
= 1 dx P(dω)
Ω 0
Z
= |X(ω)| P(dω) = E[|X|].

Corollary 2.14
∞  X ∞ 
X 1
P(|X| > ck) ≤ E |X| ≤ P(|X| > ck) ∀c > 0. (2.20)
k=1
c k=0

35
Proof. This follows from
  Z ∞   ∞ k
1 1 X Z
E |X| = P |X| > x dx = P(|X| > cx) dx .
c 0 c k=1 k−1

| {z }

≤ P |X| > c(k − 1)

≥ P(|X| > ck)


P
Applications 2.15 (a) Xn , n ≥ 1, identically distributed =⇒ Xnn −
→ 0, since
 
Xn 
P > ε = P |X1 | > εn −−−→ 0 ∀ε > 0.

n n→∞

Xn
(b) Let Xn , n ≥ 1, be independent and identically distributed. When does n
−−−→ 0
n→∞
almost surely?
Claim
Xn
−→ 0 almost surely ⇐⇒ E[|X1 |] < ∞ .
n
Proof.
 
Xn Xn
−→ 0 almost surely ⇐⇒ (∀ε > 0) P > ε infinitely often = 0
n n
∞  
Borel-Cantelli I + II
X Xn
⇐⇒ (∀ε > 0) P > ε < ∞
n=1
n

X
⇐⇒ (∀ε > 0) P (|X1 | > εn) < ∞
n=1
⇐⇒ E[|X1 |] < ∞ by Corollary 2.14.

If E[|X1 |] = ∞, then by Corollary 2.14


∞  
X Xn
(∀c > 0) P > c = ∞
n=0
n
 
Borel-Cantelli II Xn
=⇒ (∀c > 0) P > c infinitely often = 1
n
|Xn |
=⇒ lim = ∞ almost surely.
n→∞ n

36
Corollary 2.16 Let Xn , n ≥ 1, be independent and identically distributed. If E[|X1 |] =
∞, then
 
Sn
P lim exists in (−∞, ∞) = 0 (2.21)
n→∞ n

Proof.
Sn+1 Sn + Xn+1 Sn n Xn+1
= = · + .
n+1 n+1 n n+1 n+1
Hence,
   
Sn Xn
lim exists in (−∞, ∞) ⊆ −→ 0 .
n→∞ n n
But if E[|X1 |] = ∞, then
|Xn |
lim = ∞ almost surely.
n→∞ n

Example 2.17 Let Xi , i ≥ 1, be independent and identically distributed with


1
P(Xi = +1) = = P(Xi = −1).
2
Set
(
max{k ≥ 1 : Xn−k+1 = · · · = Xn = +1} if Xn = +1
`n =
0 if Xn = −1
= length of the run of +1’s at time n.

(a) P(`n = 0 infinitely often) =?


The events {`n = 0} = {Xn = −1}, n ≥ 1, are independent.

X
P(`n = 0) = ∞ =⇒ P(`n = 0 infinitely often) = 1 by Borel-Cantelli II.
| {z }
n=1
= 21

(b) P(`n = 1 infinitely often) =?


{`n = 1} = {Xn = +1, Xn−1 = −1}, n ≥ 2, are not independent. But {`2 = 1},
{`4 = 1}, {`6 = 1}, . . . are independent.

X
P(`2n = 1) = ∞ =⇒ P(`n = 1 infinitely often) = 1 by Borel-Cantelli II.
| {z }
n=1
= 14

37
The example can be generalized:
• for Xi , i ≥ 1, i.i.d. with P(Xi = +1) = p ∈ (0, 1), P(Xi = −1) = 1 − p one can show
the following:
– P(`n = k infinitely often) = 1 for all k ∈ N;
– any finite sequence (k1 , k2 , . . . , km ) ∈ {−1, 1}m occurs infinitely often with
probability one.
• More generally, for any i.i.d. sequence of random variables taking values in a finite
alphabet, any finite pattern which has positive probability to occur at the beginning
of the sequence occurs infinitely often with probability one.
In particular: A monkey types randomly on the keyboard of a computer. We
assume that the letters he types form an i.i.d. sequence and every character has a
strictly positive probability to appear. Then, any finite pattern (e.g. the constitution
of Bavaria, the bible, the works of Shakespeare, the final exam for this course, a
solution for the final exam, etc.) occurs infinitely often with probability one.

2.5 The strong law of large numbers for L1 random variables


There are two ways to strenghten the WLLN. Once can have weaker assumptions (only
existence of the first instead of the second moment) or one can have a stronger statement
(almost sure convergence instead of convergence in probability). Indeed, we will later
show the following.
Theorem 2.18 (Strong law of large numbers) Let Xi , iP ≥ 1, be independent and
identically distributed with E[|X1 |] < ∞. For n ∈ N, let Sn = ni=1 Xi . Then,
Sn
−→ E[X1 ] almost surely. (2.22)
n
Corollary 2.19 Let Xi , i ≥ 1, be independent and identically distributed with E[Xi− ] <
∞ and E[Xi+ ] = ∞. Then,
Sn
−→ ∞ almost surely. (2.23)
n
Sn 1
Pn + 1
Pn −
Proof. Without loss of generality X i ≥ 0 (because n
= n i=1 X i − n i=1 Xi and
1
P n − −
n i=1 Xi −→ E[X1 ]).
(c) (c)
Let Xn := Xn 1{Xn ≤c} . Then, Xn , n ≥ 1, are i.i.d. in L1 and consequently, we can apply
the strong law of large numbers: For any c > 0, one has
n n
1X 1 X (c) (c)
Xi ≥ Xi −−−→ E[X1 ].
n i=1 n i=1 n→∞

n
1X
=⇒ lim Xi ≥ E[X1 1{X1 ≤c} ] ↑ E[X1 ] = ∞ by monotone convergence.
n→∞ n c→∞
i=1

38
2.6 Kolmogorov’s 0-1-law
Random Series. We P∞start with the following question. Assume Xn , n ≥ 1 are inde-
pendent. When does n=1 Xn converge?

independent with E[Xn ] = 0 ∀n and ∞


P  2
Theorem 2.20 Let Xn , n ≥ 1, beP n=1 E Xn < ∞.
Then, the random variables Sn = ni=1 Xi converge in probability. We say that

X
Xn converges in probability. (2.24)
n=1
Pn
Proof. For Sn = i=1 Xi , we have
n+m
! n+m n+m
X X X
E (Sn+m − Sm )2 = Var E[Xi2 ].
 
Xi = Var(Xi ) =
i=m+1 i=m+1 i=m+1
P∞
Since i=1 E[Xi2 ] < ∞, (Sn )n≥1 is a Cauchy sequence in L2 . But L2 is complete

=⇒ (Sn )n≥1 converges in L2 and hence in probability.

How about almost sure convergence? P∞ 


Question: X1 , X2 , . . . independent =⇒ P n=1 Xn converges ∈ {0, 1}?

Theorem 2.21 (Kolmogorov’s 0-1-law) Let (Ω, F, P) be a probability space. Let (Gi )i∈I
be a countable collection of independent σ-algebras (we assume Gi ⊆ F, ∀i). Further let
T S 
G∞ := σ Gi denote the corresponding tail σ-algebra. Then we have A ∈ G∞ ⇒
J⊆I i∈J
/
|J|<∞
P(A) ∈ {0, 1}.

Interpretation of G∞
(1) Dynamical:
If we interpret I as a sequence {1, 2, 3, . . . } of points in time and Gn as the σ-

T  S 
algebra of all events observable at time n ∈ N, then we have G∞ = σ Gk =
n=1 k≥n

T
σ(Gn , Gn+1 , . . . ). Then G∞ can be interpreted as the σ-algebra of all events
n=1
observable ”in the infinitely distant future”.
(2) Static:
We interpret I as a set of ”subsystems” which act independently of each other
and Gi as the σ-algebra of events which only depend on the i’th subsystem. Then
G∞ is the collection of all ”macroscopic” events which do not depend on finitely
many subsystems. Thus, if the subsystems are independent, we know that on this
”macroscopic scale” the whole system is deterministic.

39
Example 2.22 Let (Xn )n∈N be a sequence of random variables on (Ω, F).

We define Fn := σ(X1 , X2 , . . . , Xn ) and F ∗ = F ∗ (Xi , i ≥ 1) =
T
σ(Xn , Xn+1 , . . .). Then
n=1
F ∗ = G∞ , where Gi = σ(Xi ).
n Pn o  Pn

k=1 Xk k=1 Xk
The events lim cn
exists , lim sup cn ≤ t for cn , t ∈ R with cn % ∞ are
n→∞ n→∞
elements of F ∗ .
Due to Kolmogorov’s 0-1-law we have P(A) ∈ {0, 1}, ∀A ∈ F ∗ provided that the random
variables X1 , X2 , . . . are independent.

Proof.
S
Step 1 The collection of sets Gj (j ∈ J), Gi are independent for every finite set J ⊆ I.
i∈J
/ S 
Due to Corollary 1.47 we have that Gj (j ∈ J), σ Gi are also independent.
S  i∈J
/

Since G∞ ⊆ σ Gi for all finite sets J ⊆ I we have that Gi (i ∈ I), G∞ are


i∈J
/
independent.
S 
Step 2 Again with the help of Corollary 1.47, we can conclude that σ Gi and G∞ are
i∈I
independent.
S 
Step 3 Let A ∈ G∞ . Then A is also an element of σ Gi , since
! i∈I
T S  S
σ Gi ( σ Gi .
J⊆I i∈J
/ i∈I
|J|<∞
Therefore, Step 2 implies P(A) = P(A ∩ A) = P(A)P(A) (A is independent of itself)
⇒ P(A) ∈ {0, 1}.

Applications 2.23
(a) An , n ≥ 1, independent =⇒ P (An infinitely often) ∈ {0, 1},
because {An infinitely often} belongs to the tail-σ-algebra generated by Xn = 1An .
This also follows from the Borel-Cantelli lemmas:
X
P(An ) < ∞ =⇒ P(An infinitely often) = 0,
X
P(An ) = ∞ =⇒ P(An infinitely often) = 1.


!
X
(b) Xn , n ≥ 1, independent =⇒ P Xn converges ∈ {0, 1}.
n=1

40
Example 2.24 (Percolation)

Zd , p ∈ [0, 1]. (
blue with probability p,
1 Every bond is colored
red with probability 1 − p,
0 1 independently of all other bonds.
Consider the random subgraph of Zd containing only
the blue bonds. Its connected components are called
blue clusters.
Claim 2.25 Pp (∃ an infinite blue cluster) ∈ {0, 1}.
 
blue with probability p
Proof. Let Xe = , e ∈ E = set of bonds in Zd , be
red with probability 1 − p
independent. Enumerate the bonds in an arbitrary way such that E = {en : n ∈ N}.
Then,
{∃ an infinite blue cluster} ∈ F ∗ (Xen , n ∈ N).
(Whether there exists an infinite cluster or not doesn’t depend on the state of finitely many
bonds). Use Kolmogorov’s 0-1-law.
You can find more about percolation in [Kle06], section 2.4., or in the lecture course
“Probability on Graphs”.
Lemma 2.26 If Xn , n ≥ 1, are independent and Y is measurable with respect to the
σ-algebra F ∗ (Xn , n ≥ 1) (defined in Example 2.22) then there exists c ∈ [−∞, ∞] such
that P(Y = c) = 1. In other words, Y is almost surely constant.
Proof. By assumption, {Y ≤ a} ∈ F ∗ (Xn , n ≥ 1) for all a ∈ R. Hence, by Kolmogorov’s
0-1 law,
P(Y ≤ a) ∈ {0, 1} for all a ∈ R.
• P(Y ≤ a) = 0 ∀a ∈ R =⇒ P(Y = +∞) = 1.
• P(Y ≤ a) = 1 ∀a ∈ R =⇒ P(Y = −∞) = 1.
• Otherwise c 7→ P(Y ≤ a) equals 1[c,∞) for some c ∈ R; in particular, it is the
distribution function of a constant random variable and hence P(Y = c) = 1.

Corollary 2.27 If Xn , n ≥ 1, are independent, then


 
Sn Sn
either P lim exists = 1 and lim = c ∈ [−∞, ∞] a.s., (2.25)
n→∞ n n→∞ n
 
Sn
or P lim exists = 0. (2.26)
n→∞ n

Proof. limn→∞ Snn and limn→∞ Snn are measurable with respect to F ∗ (Xn , n ≥ 1). Hence,
by Lemma 2.26 they are constant almost surely. If the two constants are equal, then
limn→∞ Snn exists a.s. Otherwise, the limit a.s. does not exist.

41
What is the right almost sure normalization?

Theorem 2.28 Let X_i ∈ L², i ≥ 1, be independent and identically distributed with E[X_i] = 0 and E[X_i²] = σ² < ∞, and let S_n = ∑_{i=1}^n X_i. For all ε > 0,

S_n / (√n (ln n)^{1/2+ε}) → 0 almost surely as n → ∞.   (2.27)

This is stronger than the SLLN!

Proof. [Dur05], Theorem (8.7) in Chapter 1.


This is close to the best possible. One can show that

limsup_{n→∞} S_n / √(n ln ln n) = σ √2 almost surely   (2.28)

(law of the iterated logarithm).

3 Weak convergence, characteristic functions and the
central limit theorem
The material for this chapter is taken from [Dur05], Chapter 2.

3.1 Motivation
Theorem 3.1 (Central limit theorem) Let X_i, i ≥ 1, be independent and identically distributed with E[X_1] = 0 and E[X_1²] = 1. For n ∈ N, set S_n = ∑_{i=1}^n X_i. Then, ∀x ∈ R,

P(S_n/√n ≤ x) → (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt as n → ∞   (3.1)
              = P(Z ≤ x) for Z ∼ N(0, 1).   (3.2)

In other words, if F_n = distribution function of S_n/√n and F = distribution function of N(0, 1), then

F_n(x) → F(x) as n → ∞, ∀x ∈ R.   (3.3)

(Note: E[S_n/√n] = 0, Var(S_n/√n) = 1.)

We will need some tools to prove this.


See [Dur05], Theorem (1.5) in Chapter 2 for a proof of the special case
(
−1 with probability 12 ,
Xi =
+1 with probability 12 .

This is called de Moivre-Laplace theorem.
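As an illustration of the de Moivre-Laplace theorem (this simulation is not part of the notes; the sample sizes and the seed are arbitrary choices), one can compare a Monte Carlo estimate of P(S_n/√n ≤ x) for ±1-valued X_i with the standard normal distribution function.

import random, math

def de_moivre_laplace_demo(n=500, trials=5000, x=1.0, seed=1):
    """Estimate P(S_n / sqrt(n) <= x) for X_i = +/-1 with probability 1/2 each
    and compare with the N(0,1) distribution function at x."""
    random.seed(seed)
    hits = 0
    for _ in range(trials):
        s = sum(1 if random.random() < 0.5 else -1 for _ in range(n))
        if s / math.sqrt(n) <= x:
            hits += 1
    estimate = hits / trials
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))   # standard normal cdf
    return estimate, phi

print(de_moivre_laplace_demo())   # the two numbers should be close, up to Monte Carlo error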

3.2 Weak convergence


Definition 3.2 Let S be a metric space and S be its Borel σ field. If µn , µ are probability
w
measures on (S, S), we say that µn converges weakly to µ (µn −→ µ) if
Z Z
f dµn −−−→ f dµ ∀f : S → R bounded and continuous. (3.4)
n→∞

w w
We say Xn −
→ X if L(Xn ) −
→ L(X), where L(Xn ) denotes the distribution of Xn .

Often we will work with (S, S) = (R, B(R)).

Example 3.3 Assume xn ∈ R, xn −−−→ x ∈ R. Let δy be the point mass in y, i.e.
n→∞
(
1 if y ∈ A,
δy (A) = (3.5)
0 otherwise.
Then,
w
δx n −
→ δx , (3.6)
R R
because f dδxn = f (xn ) −−−→ f (x) = f dδx for any continuous f : R → R.
n→∞
w
Remarks 3.4 (a) Xn − → X ⇐⇒ E[f (Xn )] −−−→ E[f (X)] ∀f : S → R bounded
n→∞
and continuous. R
Proof. E[f (Y )] = f (y) µ(dy) with µ = L(Y ).
w
(b) Xn −
→ X is possible even if all Xn ’s are defined on different probability spaces.
w w
(c) Xn −
→ X =⇒ h(Xn ) − → h(X) ∀h continuous.
Proof. f ◦ h is bounded and continuous if f is bounded and continuous.
Theorem 3.5 (Portmanteau theorem)
Let µ_n, µ be probability measures on (R^d, B(R^d)). The following are equivalent:

(i) µ_n →^w µ;

(ii) ∫ f dµ_n → ∫ f dµ ∀f : R^d → R bounded and uniformly continuous;

(iii) limsup_{n→∞} µ_n(F) ≤ µ(F) ∀ closed F ⊆ R^d;

(iv) liminf_{n→∞} µ_n(G) ≥ µ(G) ∀ open G ⊆ R^d;

(v) µ_n(A) → µ(A) as n → ∞, ∀A ∈ B(R^d) with µ(∂A) = 0, where ∂A = Ā \ A° = closure minus interior = boundary of A.

A good example to remember the inequalities in (iii) and (iv) is the following: µ_n = δ_{1/n} →^w µ = δ_0. For F = {0} and G = (0, 1) there is strict inequality in (iii) and (iv), respectively. Since µ(∂F) = µ({0}) = 1 and µ(∂G) = µ({0, 1}) = 1, this is in accordance with (v).
Proof. The implication (i) ⇒ (ii) is trivial. Below we show (iii) ⇔ (iv), (ii) ⇒ (iii), (iii) ⇒ (i), (iii) ⇒ (v) and (v) ⇒ (iii); together these give the equivalence of (i)-(v).
• (iii) ⇔ (iv): F closed ⇔ F^c open.

limsup_{n→∞} µ_n(F) ≤ µ(F)
⇔ limsup_{n→∞} (1 − µ_n(F^c)) ≤ 1 − µ(F^c)
⇔ 1 − liminf_{n→∞} µ_n(F^c) ≤ 1 − µ(F^c)
⇔ µ(F^c) ≤ liminf_{n→∞} µ_n(F^c).

• (ii) ⇒ (iii): Let F be closed, δ > 0. Since µ is outer regular, µ(F) = inf{µ(U) : F ⊆ U open}. For ε > 0, consider G = G_ε := {x : dist(x, F) < ε}. Then, G is open, F ⊆ G and G_ε ↓ F for ε ↓ 0. Hence, for sufficiently small ε > 0,

µ(G) < µ(F) + δ.

Put

φ(t) := 1 if t ≤ 0,   φ(t) := 1 − t if 0 ≤ t ≤ 1,   φ(t) := 0 if t ≥ 1.

The function f(x) := φ(dist(x, F)/ε) is uniformly continuous on R^d with 0 ≤ f(x) ≤ 1 ∀x and

f(x) = 1 if x ∈ F,   f(x) = 0 if x ∈ G^c.

So,

µ_n(F) = ∫_F f dµ_n ≤ ∫_{R^d} f dµ_n → ∫_{R^d} f dµ by (ii)
⇒ limsup_{n→∞} µ_n(F) ≤ ∫_{R^d} f dµ.

Furthermore,

∫_{R^d} f dµ = ∫_G f dµ ≤ µ(G) < µ(F) + δ.

Hence,

limsup_{n→∞} µ_n(F) < µ(F) + δ.

δ > 0 arbitrary ⇒ limsup_{n→∞} µ_n(F) ≤ µ(F).

• (iii) ⇒ (i): Let f be bounded and continuous.

Claim: limsup_{n→∞} ∫ f dµ_n ≤ ∫ f dµ.

Proof of the claim. Without loss of generality 0 < f(x) < 1 ∀x ∈ R^d. (Given f bounded, there exist a > 0, b ∈ R such that g(x) := af(x) + b satisfies 0 < g(x) < 1 ∀x. If the claim holds for g, it holds for f.)

Define for k ∈ N, F_i := {x : f(x) ≥ i/k}, i = 0, 1, . . . , k. All F_i are closed because f is continuous. Since (i−1)/k ≤ f < i/k on F_{i−1} \ F_i,

∑_{i=1}^k ((i−1)/k) µ({x : (i−1)/k ≤ f(x) < i/k}) ≤ ∫ f dµ ≤ ∑_{i=1}^k (i/k) µ({x : (i−1)/k ≤ f(x) < i/k}),

where µ({x : (i−1)/k ≤ f(x) < i/k}) = µ(F_{i−1} \ F_i) = µ(F_{i−1}) − µ(F_i). Rearranging the sums and using µ(F_0) = 1, µ(F_k) = 0,

∑_{i=1}^k ((i−1)/k) (µ(F_{i−1}) − µ(F_i)) = ∑_{i=0}^{k−1} (i/k) µ(F_i) − ∑_{i=1}^k ((i−1)/k) µ(F_i) = ∑_{i=1}^k µ(F_i)/k,
∑_{i=1}^k (i/k) (µ(F_{i−1}) − µ(F_i)) = ∑_{i=0}^{k−1} ((i+1)/k) µ(F_i) − ∑_{i=1}^k (i/k) µ(F_i) = 1/k + ∑_{i=1}^k µ(F_i)/k.

Hence,

∑_{i=1}^k µ(F_i)/k ≤ ∫ f dµ ≤ 1/k + ∑_{i=1}^k µ(F_i)/k.   (3.7)

Since (iii) holds,

limsup_{n→∞} µ_n(F_i) ≤ µ(F_i) ∀i = 0, 1, . . . , k.   (3.8)

So,

limsup_{n→∞} ∫ f dµ_n ≤ limsup_{n→∞} (1/k + ∑_{i=1}^k µ_n(F_i)/k)   by the upper bound in (3.7)
                     ≤ 1/k + ∑_{i=1}^k µ(F_i)/k   by (3.8)
                     ≤ 1/k + ∫ f dµ   by the lower bound in (3.7).

Let k → ∞. The claim follows.

Apply the claim to −f to get

−liminf_{n→∞} ∫ f dµ_n = limsup_{n→∞} ∫ (−f) dµ_n ≤ −∫ f dµ
⇒ liminf_{n→∞} ∫ f dµ_n ≥ ∫ f dµ ≥ limsup_{n→∞} ∫ f dµ_n   by the claim
                                 ≥ liminf_{n→∞} ∫ f dµ_n
⇒ lim_{n→∞} ∫ f dµ_n = ∫ f dµ.

• (iii) ⇒ (v): Let A° = interior of A, Ā = closure of A. Then

µ(Ā) ≥ limsup_{n→∞} µ_n(Ā) ≥ limsup_{n→∞} µ_n(A) ≥ liminf_{n→∞} µ_n(A) ≥ liminf_{n→∞} µ_n(A°) ≥ µ(A°),

where the first inequality uses (iii) and the last one uses (iv), which follows from (iii). If µ(∂A) = 0, then µ(Ā) = µ(A) = µ(A°), and hence

lim_{n→∞} µ_n(A) = µ(A).

• (v) ⇒ (iii): For F closed,

∂{x : dist(x, F) ≤ δ} ⊆ {x : dist(x, F) = δ}.

These sets are disjoint for distinct δ. Hence, at most countably many of them have positive µ-measure. Consequently,

∃ δ_k ↓ 0 such that F_k := {x : dist(x, F) ≤ δ_k} satisfy µ(∂F_k) = 0 ∀k.

For each k,

limsup_{n→∞} µ_n(F) ≤ lim_{n→∞} µ_n(F_k) = µ(F_k)   by (v).

Since F is closed, F_k ↓ F and thus µ(F_k) ↓ µ(F).

Theorem 3.6 Let µ_n, n ≥ 1, and µ be probability measures on (R, B(R)) with distribution functions F_n, F (i.e. F_n(x) = µ_n((−∞, x]) etc.). Then,

µ_n →^w µ  ⇔  F_n(x) → F(x) as n → ∞ for all continuity points x of F.   (3.9)

In general, one cannot remove the restriction to continuity points of F. Example: µ_n = δ_{1/n} →^w µ = δ_0. Then, F_n = 1_{[1/n,∞)}, F = 1_{[0,∞)}. For x = 0, one has F_n(0) = 0, F(0) = 1.

CLT: Let X_i, i ≥ 1, be independent and identically distributed with E[X_1] = m and Var(X_1) = σ² ∈ (0, ∞). Then,

P((S_n − nm)/(√n σ) ≤ x) → (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt as n → ∞, ∀x ∈ R,

i.e. (S_n − nm)/(√n σ) →^w Y where Y ∼ N(0, 1).

Proof of Theorem 3.6.
"⇒": Let x be a continuity point of F. Then,

µ(∂(−∞, x]) = µ({x}) = F(x) − F(x−) = 0.

Apply the Portmanteau theorem:

F_n(x) = µ_n((−∞, x]) → µ((−∞, x]) = F(x) as n → ∞.

"⇐": Consider G = (a, b). Take a < a_k < b_k < b such that a_k ↓ a, b_k ↑ b, and F is continuous at a_k, b_k. Then,

µ((a_k, b_k]) = F(b_k) − F(a_k)
            = lim_{n→∞} (F_n(b_k) − F_n(a_k))   by assumption
            = lim_{n→∞} µ_n((a_k, b_k]) = liminf_{n→∞} µ_n((a_k, b_k])
            ≤ liminf_{n→∞} µ_n((a, b))   because (a_k, b_k] ⊆ (a, b).

Take the limit k → ∞. Since (a_k, b_k] ↑ (a, b),

µ((a, b)) ≤ liminf_{n→∞} µ_n((a, b)).

Now, take G open. Write G = ⋃_k I_k with I_k disjoint open intervals. Then,

µ(G) = ∑_k µ(I_k) ≤ ∑_k liminf_{n→∞} µ_n(I_k)
     ≤ liminf_{n→∞} ∑_k µ_n(I_k)   by Fatou
     = liminf_{n→∞} µ_n(G).

Portmanteau theorem ⇒ µ_n →^w µ.


Recall Fatou's lemma:

∫_S liminf_{n→∞} f_n dν ≤ liminf_{n→∞} ∫_S f_n dν

if f_n ≥ 0 and ν is a measure on (S, S).

Special case: ν = counting measure on (N, P(N)), f_n : N → [0, ∞):

∑_k liminf_{n→∞} f_n(k) ≤ liminf_{n→∞} ∑_k f_n(k).

Example 3.7 (Geometric distributions) Let p ∈ (0, 1). Let X_p^{(i)}, i ≥ 1, be iid (independent and identically distributed) with

X_p^{(i)} = 1 with probability p,  X_p^{(i)} = 0 with probability 1 − p,

and let X_p := min{n : X_p^{(n)} = 1} = first time a success occurs. Then, X_p has a geometric distribution:

P(X_p = k) = (1 − p)^{k−1} p,  k = 1, 2, 3, . . .   (3.10)

Claim: P(pX_p > x) → P(X > x) as p → 0 for all x ∈ R, where X has an exponential distribution, i.e. L(X) has the density

f(x) = e^{−x} 1_{[0,∞)}(x).

Theorem 3.6 implies

pX_p →^w X as p → 0.

Proof. One has

P(X > x) = ∫_x^∞ e^{−t} dt = e^{−x}  ∀x > 0

and P(X > x) = 1 for all x ≤ 0. Furthermore, P(pX_p > x) = 1 for all x ≤ 0 and

P(X_p ≥ n) = ∑_{k=n}^∞ (1 − p)^{k−1} p = p (1 − p)^{n−1} · (1/p) = (1 − p)^{n−1}  for n = 1, 2, . . .

Hence, for x > 0,

P(pX_p > x) = P(X_p > x/p) = P(X_p ≥ ⌊x/p⌋ + 1) = (1 − p)^{⌊x/p⌋} → e^{−x} as p → 0

by the following lemma.

Lemma 3.8 If cj → 0, aj → ∞, and cj aj → λ, then

(1 + cj )aj −→ eλ . (3.11)

Proof. ln [(1 + cj )aj ] = aj ln(1 + cj ) ∼ aj cj −→ λ.
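A quick numerical check of the claim (purely illustrative, with arbitrary values of x and p) compares P(pX_p > x) = (1 − p)^{⌊x/p⌋} with the exponential tail e^{−x}.

import math

def geometric_vs_exponential_tail(x=1.5):
    """Example 3.7: the rescaled geometric tail approaches the Exp(1) tail as p -> 0."""
    for p in (0.1, 0.01, 0.001):
        rescaled_tail = (1 - p) ** math.floor(x / p)
        print(p, rescaled_tail, math.exp(-x))

geometric_vs_exponential_tail()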

Example 3.9 (Empirical distributions) Let X_i, i ≥ 1, be iid with distribution function F and let

ρ_n(ω) = (1/n) ∑_{i=1}^n δ_{X_i(ω)}   (empirical distribution).   (3.12)

The probability measure ρ_n(ω) has the distribution function

F_n(x, ω) = ρ_n(ω)((−∞, x]) = (1/n) ∑_{i=1}^n 1_{{X_i(ω)≤x}} = |{i ∈ {1, . . . , n} : X_i(ω) ≤ x}| / n.

Due to the SLLN, for all x ∈ R, F_n(x, ω) converges to F(x) for n → ∞, almost surely.

Theorem 3.6 ⇒ For almost all ω, ρ_n(ω) →^w L(X_1).   (3.13)

A stronger statement was proved by Glivenko-Cantelli: sup_{x∈R} |F_n(x, ω) − F(x)| → 0 as n → ∞ for almost all ω, where F is the distribution function of X_1.
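The Glivenko-Cantelli statement can be illustrated numerically. The following sketch is not part of the notes; the exponential distribution, the sample sizes and the seed are arbitrary choices. It computes sup_x |F_n(x, ω) − F(x)| for one sample ω.

import random, math

def empirical_sup_distance(n=2000, seed=2):
    """Draw n iid Exp(1) samples and compute sup_x |F_n(x) - F(x)| with
    F(x) = 1 - exp(-x); the supremum is attained at a sample point."""
    random.seed(seed)
    xs = sorted(random.expovariate(1.0) for _ in range(n))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = 1.0 - math.exp(-x)
        # compare F with the empirical cdf just before and at x
        d = max(d, abs(i / n - F), abs((i - 1) / n - F))
    return d

for n in (100, 1000, 10000):
    print(n, empirical_sup_distance(n))   # decreases as n grows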

Theorem 3.10 (Continuous mapping theorem) Let h : R → R be measurable,
Dh := {x : h is discontinuous at x}. (3.14)
Dh ∈ B(R), see [Dur05], Exercise 2.6 in Chapter 1. If
w
Xn −
→X and P(X ∈ Dh ) = 0, (3.15)
then
w
h(Xn ) −
→ h(X). (3.16)
Proof. Let µ_n, µ be the distributions of X_n, X, and ν_n, ν the distributions of h(X_n), h(X). Note:

ν_n(A) = P(h(X_n) ∈ A) = P(X_n ∈ h^{−1}(A)) = µ_n(h^{−1}(A)).

By the Portmanteau theorem, it suffices to show that

limsup_{n→∞} ν_n(F) ≤ ν(F) ∀F closed
⇔ limsup_{n→∞} µ_n(h^{−1}(F)) ≤ µ(h^{−1}(F)) ∀F closed.

Let F be closed. Then,

limsup_{n→∞} µ_n(h^{−1}(F)) ≤ limsup_{n→∞} µ_n(cl(h^{−1}(F))) ≤ µ(cl(h^{−1}(F)))

by the Portmanteau theorem, since µ_n →^w µ; here cl(·) denotes the closure.

Claim: cl(h^{−1}(F)) ⊆ D_h ∪ h^{−1}(F).

Let x ∈ cl(h^{−1}(F)). Then there exist x_n ∈ h^{−1}(F) with x_n → x. We have h(x_n) ∈ F.
• If h is continuous at x, then h(x_n) → h(x). F closed ⇒ h(x) ∈ F ⇒ x ∈ h^{−1}(F).
• Otherwise, x ∈ D_h.

So, since P(X ∈ D_h) = 0,

µ(cl(h^{−1}(F))) ≤ µ(D_h ∪ h^{−1}(F)) = µ(h^{−1}(F)).
 

Theorem 3.11 Let Xn , X be random variables defined on the same probability space.
P w
(a) Xn −
→X =⇒ Xn −
→ X.
w P
(b) Xn −
→ X and X is constant almost surely =⇒ Xn −
→ X.
(c) (b) is false without the assumption that X is constant almost surely.

Proof.
(a) Let f : R → R be bounded and uniformly continuous.
Let ε > 0. There exists δ > 0 such that ∀x, y ∈ R with |x − y| < δ one has
|f (x) − f (y)| < ε.
   
E f (Xn ) − E f (X)
Z
≤ |f (Xn ) − f (X)| dP
Z Z
= |f (Xn ) − f (X)| dP + |f (Xn ) − f (X)| dP
{|Xn −X|<δ} | {z } {|Xn −X|≥δ}


≤ ε + 2 kf k∞ P |Xn − X| ≥ δ .
P
Since Xn −
→ X, we get
   
lim E f (Xn ) − E f (X) ≤ ε.
n→∞

ε > 0 was arbitrary


   
=⇒ lim E f (Xn ) = E f (X) .
n→∞
w
By the Portmanteau theorem, Xn −
→ X.
w
(b) Let X = c almost surely. Since Xn −
→ X, we have for x 6= c
(
0 if x < c,
P(Xn ≤ x) −−−→ P(X ≤ x) =
n→∞ 1 if x > c.
For ε > 0, one has
 
P |Xn − X| > ε = P |Xn − c| > ε
= P(Xn > c + ε) + P(Xn < c − ε)
≤ 1 − P(Xn ≤ c + ε) + P(Xn ≤ c − ε)
| {z } | {z }
−→1 −→0
−−−→ 0.
n→∞
P
Hence, Xn −
→ X.
w
(c) For n ∈ N, let X_n ∼ (1/2)δ_{1/n} + (1/2)δ_{1−1/n} and X ∼ (1/2)δ_0 + (1/2)δ_1. Then X_n →^w X.

Assume X_n, n ≥ 1, and X are defined on the same probability space such that X is independent of all X_n. Then, for all ε ∈ (0, 1/4) and all n large enough,

P(|X_n − X| > ε) = P(X = 0, |X_n| > ε) + P(X = 1, |X_n − 1| > ε)
                 = P(X = 0) P(X_n = 1 − 1/n) + P(X = 1) P(X_n = 1/n)
                 = (1/2)(1/2) + (1/2)(1/2) = 1/2,

which does not converge to 0 as n → ∞.

3.3 Weakly convergent subsequences
Definition 3.12 (a) A collection Π of probability measures on (R, B(R)) is relatively
compact, if every sequence in Π contains a weakly convergent subsequence.

(b) Π is tight, if (∀ε > 0) (∃M < ∞) such that ∀µ ∈ Π



µ [−M, M ] > 1 − ε. (3.17)

Note that any finite collection of probability measures is tight.

Example 3.13 (a) Π = {δ_n : n = 1, 2, . . .}. The corresponding distribution functions are F_n(x) = 0 if x < n and F_n(x) = 1 if x ≥ n. Then, lim_{n→∞} F_n(x) = 0 ∀x. Π is not relatively compact and not tight.

(b) Π = {δ_{1/n} : n = 1, 2, . . .}. δ_{1/n} →^w δ_0 ⇒ Π is relatively compact. Π is tight: µ([−1, 1]) = 1 ∀µ ∈ Π.

Theorem 3.14 (Prokhorov’s theorem) Π is relatively compact ⇐⇒ Π is tight.

The proof uses

Theorem 3.15 (Helly’s selection theorem) For every sequence (Fn )n∈N of distribu-
tion functions, there exist a subsequence (Fnk )k∈N and a function F which is non-decreasing
and right-continuous for which

Fnk (x) −→ F (x) ∀ continuity points of F. (3.18)

Proof. See [Dur05], (2.5) in Chapter 2.


Note: F need not be a distribution function.

Proof of Theorem 3.14.

“⇐=” Suppose Π is tight.


Let (µn )n≥1 be a sequence in Π, Fn the corresponding distribution functions. By
Helly’s selection theorem, there exist a subsequence (Fnk )k≥1 and a non-decreasing
and right-continuous function F such that

Fnk (x) −−−→ F (x) ∀ continuity points of F.


k→∞

Let µ be the unique measure satisfying

µ(a, b] = F (b) − F (a) ∀a < b.

Given ε > 0, choose a, b so that

µn (a, b] > 1 − ε ∀n (3.19)

(possible by tightness). We can assume a, b are continuity points of F (if necessary


decrease a, increase b). Then,

µnk (a, b] = Fnk (b) − Fnk (a)


−−−→ F (b) − F (a) = µ(a, b].
k→∞

Hence, (3.19) implies µ(R) ≥ µ(a, b] ≥ 1 − ε.


ε arbitrary =⇒ µ is a probability measure.
w
Hence, F is a distribution function. By Theorem 3.6, µnk −
→ µ.

“=⇒” Suppose Π is relatively compact.


We give a proof by contradiction. Suppose Π is not tight.

=⇒ (∃ε > 0) (∀M ) (∃µ ∈ Π) such that µ(−M, M ] ≤ 1 − ε.

(∀k) choose µk ∈ Π such that

µk (−k, k] ≤ 1 − ε. (3.20)
w
There is a subsequence such that µk(j) −−−→ µ for some probability measure µ by
j→∞
relative compactness.
Choose a, b so that µ{a} = µ{b} = 0 and
ε
µ(a, b] > 1 − . (3.21)
2

For large enough j, (a, b] ⊆ − k(j), k(j) . Then,

(3.20) 
1 − ε ≥ µk(j) − k(j), k(j) ≥ µk(j) (a, b] −−−→ µ(a, b].
j→∞

=⇒ µ(a, b] ≤ 1 − ε contradicts (3.21)

Corollary 3.16 If (µn )n≥1 is tight and if each subsequence that converges weakly con-
w
verges to the same probability measure µ, then µn −
→ µ.

Lemma 3.17 Let xn , n ∈ N, and x be real numbers. If each subsequence (xnk )k≥1 con-
tains a further subsequence (xnk (j) )j≥1 with xnk (j) −−−→ x, then xn −−−→ x.
j→∞ n→∞

Proof. Suppose xn 6→ x. Then, (∃ε > 0) such that (∀n0 ) (∃n ≥ n0 ) such that |xn −x| ≥ ε.
In particular, (∀k) (∃nk ≥ k) with
|xnk − x| ≥ ε. (3.22)
By assumption, there exists a subsequence (xnk (j) )j≥1 with xnk (j) −−−→ x. This contradicts
j→∞
(3.22).
R
Proof of Corollary 3.16. Let f be bounded and continuous and set xn = f dµn .
Since (µn )n≥1 is tight, it is relatively compact. RThus, (xn )n≥1 satisfies the assumptions of
R w
Lemma 3.17 and it follows that f dµn −−−→ f dµ. Thus, µn − → µ.
n→∞

Theorem 3.18 (Sufficient condition for tightness) If for some p > 0,

sup_n E[|X_n|^p] < ∞,

then (L(X_n))_{n≥1} is tight.

Proof. Let ε > 0. By Chebyshev's inequality,

P(|X_n| > M) ≤ E[|X_n|^p]/M^p ≤ sup_k E[|X_k|^p] / M^p < ε   ∀M sufficiently large.

⇒ P(|X_n| ≤ M) ≥ 1 − ε ∀n.

Example 3.19 Let X_i, i ≥ 1, be iid (independent and identically distributed) with E[X_i] = 0, E[X_i²] = 1.

⇒ E[(S_n/√n)²] = (1/n) Var(S_n) = 1.   (3.23)
⇒ (L(S_n/√n))_{n≥1} is tight.   (3.24)

3.4 Characteristic functions


Definition 3.20 The characteristic function of a random variable X is defined by
ϕ : R → C, ϕ(t) = E eitX .
 
(3.25)
For a complex valued random variable Z define
   
E[Z] = E Re(Z) + iE Im(Z) . (3.26)
   
So ϕ(t) = E cos(tX) + iE sin(tX) .

Note that we have
Z
E[f (X)] = f (x) µ(dx)
R

with µ = L(X) = distribution of X. Hence, if L(X) = L(Y ), then

E[f (X)] = E[f (Y )]

for all measurable maps f : R → C such that E[f (X)] is well-defined. In particular,
Z
itX
ϕX (t) = E[e ] = eitx µ(dx)
R

is the characteristic function of X. Thus, if X and Y have the same distribution, then
they have the same characteristic function.

Proposition 3.21 (Properties of characteristic functions) (a) ϕ(0) = 1.

(b) ϕ−X (t) = ϕX (−t) = ϕX (t).


L(X) = L(−X) ⇐⇒ ϕX is real.

(c) |ϕ(t)| ≤ 1 ∀t ∈ R.

(d) ϕ is uniformly continuous on R.

(e) ϕaX+b (t) = eitb ϕX (at) ∀t ∈ R, a, b ∈ R.

(f ) If X and Y are independent, then

ϕX+Y (t) = ϕX (t)ϕY (t) ∀t ∈ R. (3.27)

Proof.

(a) ϕ(0) = E[e0 ] = 1.


   
(b) ϕ−X (t) = ϕX (−t) = E cos(−tX) +iE sin(−tX) = ϕX (t).
| {z } | {z }
=cos(tX) =− sin(tX)
“⇒”: If L(X) = L(−X), then ϕX (t) = ϕ−X (t) = ϕX (t). Hence, ϕX is real.
“⇐”: If ϕX is real, then ϕX (t) = ϕX (t) = ϕ−X (t). The claim follows from the
uniqueness theorem 3.23, below.
h i
(c) |ϕ(t)| = E eitX ≤ E eitX = 1.
 

h i
(d) |ϕ(t + h) − ϕ(t)| = E ei(t+h)X − eitX

h  i
itX
 ihX
= E e · e −1

h i
≤ E eitX · eihX − 1

h i
ihX
=E e − 1 independent of t
| {z }
≤2
−−→ 0 by the bounded convergence theorem.
h→0

(e) ϕaX+b (t) = E eit(aX+b) = eitb E eiatX = eitb ϕX (at).


   

(f) ϕX+Y (t) = E |eit(X+Y )


   itX   itY 
{z } = E e E e = ϕX (t) · ϕY (t).
=eitX ·eitY
independent

A table with the characteristic functions of many distributions can be found in Theorem
15.12 in [Kle06].
Example 3.22 If X ∼ N(m, σ²), then

ϕ_X(t) = exp(imt − σ²t²/2),  t ∈ R.   (3.28)

Proof.
Special case: X ∼ N(0, 1).

ϕ(t) = ϕ_X(t) = (1/√(2π)) ∫_{−∞}^∞ e^{itx} e^{−x²/2} dx = (1/√(2π)) ∫_{−∞}^∞ cos(tx) e^{−x²/2} dx,

because e^{itx} = cos(tx) + i sin(tx) and x ↦ sin(tx) e^{−x²/2} is an odd function. In particular, ϕ(t) is real. Differentiate with respect to t:

ϕ'(t) = (1/√(2π)) ∫_{−∞}^∞ (d/dt)[cos(tx) e^{−x²/2}] dx = −(1/√(2π)) ∫_{−∞}^∞ sin(tx) · x e^{−x²/2} dx.

Integration by parts with u(x) = sin(tx), v'(x) = −x e^{−x²/2}, so u'(x) = t cos(tx), v(x) = e^{−x²/2}:

ϕ'(t) = (1/√(2π)) ( [sin(tx) e^{−x²/2}]_{x=−∞}^{x=∞} − t ∫_{−∞}^∞ cos(tx) e^{−x²/2} dx ) = −t ϕ(t),

since the boundary term vanishes. Hence,

ϕ'(t)/ϕ(t) = −t
⇒ (ln ϕ(t))' = −t
⇒ ln ϕ(t) = −t²/2 + c for some c ∈ R
⇒ ϕ(t) = e^{−t²/2 + c}.

Since ϕ(0) = 1, we conclude c = 0, so ϕ(t) = e^{−t²/2}.

General case: X ∼ N(0, 1) ⇒ σX + m ∼ N(m, σ²), and

ϕ_{σX+m}(t) = e^{itm} ϕ_X(σt) = e^{itm} e^{−σ²t²/2}.
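A Monte Carlo illustration of (3.28) in the standard normal case (only a sketch, not part of the notes; the sample size and seed are arbitrary): since ϕ is real here, E[cos(tX)] should be close to e^{−t²/2}.

import random, math

def empirical_char_function(t=1.0, n=100000, seed=3):
    """For X ~ N(0,1), estimate E[cos(tX)] and compare with exp(-t^2/2)."""
    random.seed(seed)
    acc = sum(math.cos(t * random.gauss(0.0, 1.0)) for _ in range(n))
    return acc / n, math.exp(-t * t / 2.0)

print(empirical_char_function())   # the two values should agree up to Monte Carlo error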

3.5 Uniqueness theorem and inversion theorems


Theorem 3.23 (Uniqueness theorem) Let X, Y be random variables. If

ϕX (t) = ϕY (t) ∀t ∈ R, (3.29)

then L(X) = L(Y ).

For any distribution µ, let

µ(σ) := µ ∗ N (0, σ 2 ), (3.30)

i.e. µ(σ) is the distribution of X + Y where X ∼ µ and Y ∼ N (0, σ 2 ) are independent.

Lemma 3.24 Let X ∼ µ have characteristic function ϕ. Then, µ(σ) has density

σ 2 t2
 
1
Z
(σ)
f (x) = ϕ(t) exp −ixt − dt, x ∈ R. (3.31)
2π R 2

Proof. If Y ∼ N 0, σ12 , then




σ
2
Z
− s2 t2 σ 2
e 2σ = ϕY (s) = eist √ e− 2 dt. (3.32)
R 2π

By the formula for the density of convolutions,
1
Z
(x−y)2
(σ)
f (x) = √ e− 2σ2 µ(dy)
2πσ
Z ZR
1 t2 σ 2
= ei(y−x)t · e− 2 dt µ(dy) where we used (3.32) for s = y − x
2π R R
1
Z Z
t2 σ 2
= eiyt µ(dy) e−ixt e− 2 dt by Fubini’s theorem.
2π R R
| {z }
=ϕ(t)

Lemma 3.25 Let X ∼ µ. Then


w
µ(σ) −
→ µ as σ → 0. (3.33)

Proof. Let Y ∼ N (0, 1), independent of X. Then, σY ∼ N (0, σ 2 ), so X + σY ∼ µ(σ) .


As σ → 0,

X + σY −→ X almost surely,

hence in probability, hence weakly by Theorem 3.11.


w
=⇒ µ(σ) −
→ µ.

Proof of the uniqueness theorem. Assume that ϕX (t) = ϕY (t) ∀t ∈ R.


Let µ, ν be the distributions of X, Y , respectively.

Lemma 3.24 =⇒ µ(σ) = ν (σ) ∀σ > 0,


↓w ↓w
Lemma 3.25 =⇒ µ ν as σ → 0.

Weak limit is unique =⇒ µ = ν.

Theorem 3.26 (An inversion theorem) Let X ∼ µ with characteristic function ϕ.


 
(a) Let a and b be points with µ {a} = µ {b} = 0. Then,
Z ∞
1 σ 2 t2 e−itb − e−ita
µ(a, b] = lim ϕ(t)e− 2 · dt. (3.34)
σ→0 2π −∞ −it
R
(b) If ϕ satisfies R
|ϕ(t)| dt < ∞, then µ has a density f given by

1
Z  
f (x) = e−itx ϕ(t) dt = lim f (σ) (x) , x ∈ R. (3.35)
2π σ→0

Proof. Let Y ∼ N (0, σ 2 ) be independent of X, f (σ) density of X + Y .
By Lemma 3.24,

σ 2 t2
 
1
Z
(σ)
f (x) = ϕ(t) exp −ixt − dt.
2π 2

(a) Integrate both sides over (a, b]:


Z bZ ∞
1 σ 2 t2
µ (σ)
(a, b] = e−ixt ϕ(t)e− 2 dt dx
2π a −∞
∞ b
1
Z Z
σ 2 t2
= e−ixt dx ϕ(t)e− 2 dt by Fubini’s theorem.
2π −∞
|a {z }
h −ixt ix=b −ibt −e−iat
= − e it =e −it
x=a

So,
w
µ(a, b] = lim µ(σ) (a, b] because µ(σ) − →µ
σ→0
Z ∞
1 2 2
− σ 2t e−ibt − e−iat
= lim ϕ(t)e · dt.
σ→0 2π −∞ −it

(b) By the dominated convergence theorem,

lim f (σ) (x) = f (x) ∀x ∈ R.


σ→0

Furthermore,
1
Z
(σ)
|f (x)| ≤ |ϕ(t)| dt =: c < ∞
2π R
 
for all σ > 0, x ∈ R. Thus, for a < b with µ {a} = µ {b} = 0, the bounded
convergence theorem implies
Z b Z b
(σ) (σ)
µ(a, b] = lim µ (a, b] = lim f (x) dx = f (x) dx.
σ→0 σ→0 a a

Hence, µ has density f .

Example 3.27 (Cauchy distribution) Let X have characteristic function ϕ(t) = e^{−|t|}, t ∈ R. Then, ∫_R |ϕ(t)| dt < ∞. Hence, X has density

f(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} e^{−|t|} dt
     = (1/2π) ∫_{−∞}^0 e^{t(1−ix)} dt + (1/2π) ∫_0^∞ e^{−t(1+ix)} dt
     = (1/2π) [e^{t(1−ix)}/(1−ix)]_{t=−∞}^{t=0} + (1/2π) [e^{−t(1+ix)}/(−(1+ix))]_{t=0}^{t=∞}
     = (1/2π) (1/(1−ix) + 1/(1+ix))
     = (1/2π) · 2/(1 − (ix)²) = (1/π) · 1/(1 + x²) = density of a Cauchy distribution.

Suppose X_1, . . . , X_n are independent, Cauchy distributed. Let S_n = X_1 + · · · + X_n. Then

ϕ_{S_n}(t) = ϕ_{X_1}(t) · ϕ_{X_2}(t) · · · ϕ_{X_n}(t) = e^{−n|t|},
ϕ_{S_n/n}(t) = ϕ_{S_n}(t/n) = e^{−|t|}.

Uniqueness theorem ⇒ S_n/n ∼ Cauchy.
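The fact that S_n/n is again Cauchy, so the law of large numbers fails, is easy to see in a simulation. The sketch below is not part of the notes (sample size and seed are arbitrary); it uses that tan(π(U − 1/2)) is standard Cauchy for U uniform on (0, 1).

import random, math

def cauchy_running_means(n=100000, seed=4):
    """Print S_k/k along one trajectory of iid standard Cauchy variables:
    the running means do not settle down (they are Cauchy for every k)."""
    random.seed(seed)
    s = 0.0
    for k in range(1, n + 1):
        s += math.tan(math.pi * (random.random() - 0.5))
        if k in (10, 100, 1000, 10000, 100000):
            print(k, s / k)   # keeps fluctuating on scale O(1)

cauchy_running_means()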

3.6 Lévy’s continuity theorem


The following theorem shows the connection between weak convergence and characteristic
functions.

Theorem 3.28 (Lévy’s continuity theorem) Let X, Xi , i ≥ 1, be random variables


with characteristic functions ϕ, ϕi , i ≥ 1. Then,
w
Xn −
→X ⇐⇒ ϕn (t) −−−→ ϕ(t) ∀t ∈ R. (3.36)
n→∞

The following lemma relates the behavior of ϕ near 0 to the tail behavior of X.

Lemma 3.29 Let X ∼ µ have characteristic function ϕ. Then,

P(|X| ≥ 2/u) = µ({x : |x| ≥ 2/u}) ≤ (1/u) ∫_{−u}^u (1 − ϕ(t)) dt  ∀u > 0.   (3.37)

Proof. We have
1 u 1 u
Z Z  Z 
itx

1 − ϕ(t) dt = 1 − e µ(dx) dt
u −u u −u
Z Z u
1
1 − eitx dt µ(dx)

= by Fubini’s theorem.
u −u

We calculate the inner integral:
Z u Z u Z u
itx

1−e dt = (1 − cos(tx)) dt − i sin(tx) dt
−u −u −u
| {z }
=0
u  t=u
sin(tx)
Z
=2 (1 − cos(tx)) dt = 2 t −
0 x t=0
 
sin(ux)
= 2u 1 − .
ux
This yields
1 u
Z  
sin(ux)
Z

1 − ϕ(t) dt = 2 1− µ(dx)
u −u ux
sin(ux)
Note: |sin(x)| ≤ |x| ∀x ∈ R =⇒ 1− ≥0
  ux
sin(ux)
Z
≥2 1− µ(dx).
{x:|x|≥ u2 } ux

Use |sin(ux)| ≤ 1:
1 u
   
1 2
Z Z

1 − ϕ(t) dt ≥ 2 1− µ(dx) ≥ µ x : |x| ≥ .
u −u {x:|x|≥ u2 } |ux| u
|{z}
≤ 12
| {z }
≥ 12

Proof of Lévy’s continuity theorem.


w
“=⇒”: Assume Xn −
→ X. Let µn = L(Xn ), µ = L(X).
Z Z
ϕn (t) = e µn (dx) −−−→ eitx µ(dx) = ϕ(t) ∀t,
itx
n→∞

because x 7→ eitx = cos(tx) + i sin(tx) has bounded continuous real and imaginary
parts.
“⇐=”: Assume ϕn (t) −−−→ ϕ(t) ∀t ∈ R.
n→∞
Let ε > 0.
As a characteristic function, ϕ is continuous at 0 with ϕ(0) = 1. Hence, we can find
u > 0 so that for |t| < u one has |1 − ϕ(t)| < 2ε . Hence,
1 u
Z

1 − ϕ(t) dt < ε.
u −u

By the dominated convergence theorem,
Z u Z u
 
1 − ϕn (t) dt −→ 1 − ϕ(t) dt.
−u −u

Hence, ∃ N such that ∀ n ≥ N ,


1 u
 
2
Z

2ε ≥ 1 − ϕn (t) dt ≥ µn x : |x| ≥ by Lemma 3.29.
u −u u
Find x1 , x2 , . . . , xN −1 such that

µn {x : |x| ≥ xn } ≤ 2ε ∀n = 1, . . . , N − 1.

Let M = max x1 , x2 , . . . , xN −1 , u2 . Then,





µn [−M, M ] = 1 − µn ({x : |x| > M }) ≥ 1 − 2ε ∀n = 1, 2, . . .

Hence, (µn )n≥1 is tight.


w
Let (µnk )k≥1 be a weakly convergent subsequence, µnk −−−→ ν.
k→∞
Then, by “=⇒”, ϕnk (t) −−−→ ϕν (t) ∀t.
k→∞
By assumption, ϕnk (t) −−−→ ϕ(t) ∀t.
k→∞
=⇒ ϕν (t) = ϕ(t) ∀t ∈ R.
=⇒ ϕ is the characteristic function of ν.
By the uniqueness theorem, ν is unique.
w
Thus, Corollary 3.16 implies µn −
→ ν.

Remark 3.30 We proved slightly more:


Let Xi , i ≥ 1, be random variables with characteristic functions ϕi , i ≥ 1. If

ϕn (t) −−−→ ϕ(t) ∀t ∈ R (3.38)


n→∞

and ϕ is continuous at 0, then ϕ is the characteristic function of some random variable


X and
w
Xn −
→ X. (3.39)

3.7 Characteristic functions and moments


Lemma 3.31 For all x ∈ R and n = 0, 1, 2, . . .,

| e^{ix} − ∑_{j=0}^n (ix)^j/j! | ≤ min{ |x|^{n+1}/(n+1)!, 2|x|^n/n! }.   (3.40)

Proof. See [Dur05], (3.6) in Chapter 2.

Theorem 3.32 Let X be a random variable with characteristic function ϕ. Suppose E[|X|^k] < ∞ for some k ≥ 1. Then, ϕ is k times differentiable and its k-th derivative

ϕ^{(k)}(t) = i^k E[X^k e^{itX}]  (t ∈ R)   (3.41)

is uniformly continuous. We have

E[X^m] = (1/i^m) ϕ^{(m)}(0),  m = 0, 1, . . . , k,   (3.42)

where ϕ^{(0)} = ϕ, and

ϕ(t) = ∑_{j=0}^k (ϕ^{(j)}(0)/j!) t^j + o(|t|^k) = ∑_{j=0}^k (E[X^j]/j!) (it)^j + o(|t|^k)   as t → 0.   (3.43)
j=0
j! j=0
j!

Proof.
• We show the formula (3.41) for ϕ(k) by induction over k. By the definition of ϕ, it
is true for k = 0.
Induction step: Suppose (3.41) is true for k ∈ N0 .
Suppose E |X|k+1 < ∞. Then E |X|k < ∞ and by induction assumption


ϕ(k) (t) = ik E X k eitX .


 

One has
(k)
ϕ (t + h) − ϕ(k) (t) k k ei(t+h)X − eitX
 
k+1
 k+1 itX  k+1 k+1 itX

− i E X e ≤ E i X − i X e
h h
ihX 
−1

e
= E |X|k ·

− iX =: Ih .
| h {z }
=:ψh (X)

Lemma 3.31 for n = 1 yields:


ihx
e − 1 − hix (xh)2 |x2 h|

|ψh (x)| =
≤ = .
h |2h| 2
=⇒ lim ψh (x) = 0 ∀x ∈ R.
h→0
Lemma 3.31 for n = 0 yields:
ihx
e − 1
|ψh (x)| ≤ + |x| ≤ |hx| + |x| = 2 |x| .
h |h|
Thus,
|X|k · |ψh (X)| ≤ 2 |X|k+1 .
Hence, by the dominated convergence theorem, Ih −−→ 0 and we have shown (3.41).
h→0

• Uniform continuity of ϕ(k) is proved the same way as uniform continuity of ϕ.
• Taylor expansion: Suppose E[|X|k ] < ∞.
#
k
" k
(j) j
X ϕ (0) X (itX)
ϕ(t) − tj ≤ E eitX −

j! j!


j=0 j=0
" ( )#
|tX|k+1 2 |tX|k
≤ E min ,
(k + 1)! k!
" ( )#
k |t| · |X|k+1 2 |X|k
= |t| E min ,
(k + 1)! k!
| {z }
2|X|k
−−−→ 0 by dominated convergence, min{. . .} ≤ k!
integrable
t→0

= o tk .


Remark 3.33 ϕ0 (0) exists 6=⇒ E[|X|] < ∞.


See [Dur05], Exercise 3.17 in Chapter 2.

Theorem 3.34 If

limsup_{h↓0} (ϕ(h) − 2ϕ(0) + ϕ(−h))/h² > −∞,

then E[X²] < ∞. In particular, E[X²] < ∞ if ϕ'' exists and is continuous at 0.

Proof. Since e^{ihX} − 2 + e^{−ihX} = −2(1 − cos(hX)),

∞ > −limsup_{h↓0} (ϕ(h) − 2ϕ(0) + ϕ(−h))/h² = liminf_{h↓0} E[ 2(1 − cos(hX))/h² ]
  ≥ E[ liminf_{h↓0} 2(1 − cos(hX))/h² ]   by Fatou's lemma
  = E[X²]   because 1 − cos(hx) ∼ (hx)²/2 as h → 0.
2

3.8 Convergence theorems


Theorem 3.35 Let Xi , i ≥ 1, be independent and identically distributed and assume
E[|X1 |] < ∞. Then,
Sn P

→ E[X1 ]. (3.44)
n

Proof. Let m = E[X_1], ϕ = ϕ_{X_1}. Then,

ϕ_{S_n/n}(t) = E[e^{itS_n/n}] = E[∏_{j=1}^n e^{itX_j/n}] = ∏_{j=1}^n ϕ_{X_j}(t/n)   by independence
            = (ϕ(t/n))^n = (1 + ϕ'(0) t/n + o(1/n))^n   as n → ∞ (Taylor expansion for ϕ)
            = (1 + itm/n + o(1/n))^n   because E[|X_1|] < ∞
            → e^{imt} = characteristic function of δ_m   by Lemma 3.8.

Lévy's continuity theorem ⇒ S_n/n →^w m. Hence, by Theorem 3.11, S_n/n →^P m.

Theorem 3.36 (Central limit theorem) Let Xi , i ≥ 1, be independent and identically


distributed with E[Xi ] = m, Var(Xi ) = σ 2 ∈ (0, ∞), Sn = X1 + · · · + Xn . Then,
Sn − nm w
√ −
→ N (0, 1). (3.45)
σ n

Remark 3.37 (3.45) stands for

Sn − nm
 
w
L √ −
→ N (0, 1). (3.46)
σ n

Proof. Without loss of generality m = 0. (Otherwise consider Y_i = X_i − m.) Then, E[X_1²] = σ². Let ϕ be the characteristic function of X_1. Then,

ϕ(t) = 1 + ϕ'(0) t + (ϕ''(0)/2) t² + o(t²) = 1 − (σt)²/2 + o(t²)   as t → 0,

because ϕ'(0) = im = 0 and ϕ''(0) = i²σ² = −σ². As before,

ϕ_{S_n/(σ√n)}(t) = ϕ_{S_n}(t/(σ√n)) = (ϕ(t/(σ√n)))^n = (1 − t²/(2n) + o(1/n))^n   as n → ∞
               → e^{−t²/2} = characteristic function of N(0, 1).

Lévy's continuity theorem ⇒ S_n/(σ√n) →^w N(0, 1) as n → ∞.

Theorem 3.38 (Lindeberg-Feller theorem) Assume

(i) for each n, X_{n,1}, X_{n,2}, . . . , X_{n,n} are independent,

(ii) E[X_{n,j}] = 0, E[X_{n,j}²] < ∞ ∀n, j,

(iii) ∑_{j=1}^n E[X_{n,j}²] → σ² ∈ (0, ∞) as n → ∞,

(iv) ∀ε > 0: ∑_{j=1}^n E[X_{n,j}² 1_{{|X_{n,j}|>ε}}] → 0 as n → ∞.

Then,

S_n = ∑_{j=1}^n X_{n,j} →^w N(0, σ²).   (3.47)

Remark 3.39 This contains our previous CLT:

Let Y_i, i ≥ 1, be independent and identically distributed with E[Y_i] = 0, E[Y_i²] = σ² ∈ (0, ∞). Set X_{n,j} = Y_j/√n. (i) and (ii) are clearly satisfied. We check (iii) and (iv):

• ∑_{j=1}^n E[X_{n,j}²] = ∑_{j=1}^n (1/n) E[Y_j²] = E[Y_1²] = σ².

• ∑_{j=1}^n E[X_{n,j}² 1_{{|X_{n,j}|>ε}}] = n ∫_{{|Y_1|>ε√n}} (Y_1²/n) dP = ∫_{{|Y_1|>ε√n}} Y_1² dP → 0 as n → ∞ by the dominated convergence theorem, because E[Y_1²] < ∞.

Hence, (i)-(iv) are satisfied, and we conclude

∑_{j=1}^n X_{n,j} = (1/√n) ∑_{i=1}^n Y_i →^w N(0, σ²).

2
 2 
Proof of Lindeberg-Feller theorem. Let σn,j = E Xn,j .
2
Claim 1: lim max σn,j = 0.
n→∞ 1≤j≤n
Proof. For all ε > 0,
n
X h i
2
 2  2
 2  2 2
σn,j = E Xn,j ≤ ε + E Xn,j 1{|Xn,j |>ε} ≤ ε + E Xn,j 01
{|X |>ε}
n,j 0
j 0 =1
2
=⇒ lim max σn,j ≤ ε2 by (iv).
n→∞ 1≤j≤n

Since ε > 0 is arbitrary,
 itX claim
 1 follows.
Let ϕn,j (t) = E e n,j
.

Claim 2: It suffices to prove that



n n  2
2 t
Y Y
lim ϕn,j (t) − 1 − σn,j = 0. (3.48)

n→∞
j=1 j=1
2

Proof. Recall that ln(1 + x) = x + O(x2 ) as x → 0. Hence,


n  2
! X n  2

2 t 2 t
Y
ln 1 − σn,j = ln 1 − σn,j
j=1
2 j=1
2
n
X
2 t2 σ 2 t2
=− σn,j + error(n) −−−→ −
j=1
2 n→∞ 2

n  2 2
 n
2 t
X X
0 2 2
because |error(n)| ≤ c σn,j ≤ c max σn,j σn,j −−−→ 0.
j=1
2 1≤j≤n
j=1
n→∞
| {z }
−→0 | {z }
−→σ 2
n  2

2 t σ 2 t2
Y
=⇒ 1 − σn,j −−−→ e− 2 .
j=1
2 n→∞
If (3.48) holds, then
n
σ 2 t2
Y
ϕ Pn
j=1 Xn,j
(t) = ϕn,j (t) −−−→ e− 2
n→∞
j=1

n
w
X
Lévy’s continuity theorem implies: → N (0, σ 2 ). This completes the proof of
Xn,j −
j=1
claim 2.

Lemma 3.40 For zj , wj ∈ C, 1 ≤ j ≤ n, |zj | ≤ 1, |wj | ≤ 1 ∀j, one has



Yn Yn X n
zj − wj ≤ |zj − wj | . (3.49)


j=1 j=1 j=1

Proof. See [Dur05], Lemma (4.3) in Chapter 2.


2 t2
We want to apply this with zj = ϕn,j (t), wj = 1 − σn,j 2
. Check that the assumptions
of the lemma are satisfied for n sufficiently large:
2

2 t

|ϕn,j (t)| ≤ 1, 1 − σn,j ≤ 1 ∀n large enough.

2

Hence,
 X
n n  2 n 2

2 t t
Y Y
2

Jn := ϕn,j (t) − 1 − σn,j ≤ ϕn,j (t) − 1 + σn,j


j=1 j=1
2
j=1
2
n  2

X itX 2 t 2
E e n,j − 1 − itXn,j − i Xn,j

= because E[Xn,j ] = 0.
j=1
2

Using Lemma 3.31, we obtain


n
" ( )#
X |t|3
Jn ≤ E min |Xn,j |3 , t2 Xn,j
2

j=1
6
n
( " # )
3
X |t|
|Xn,j |3 1{|Xn,j |≤ε} + t2 E Xn,j
2
 
≤ E 1{|Xn,j |>ε} for all ε > 0
j=1
6
n n
X |t|3  2  X
t2 E Xn,j
 2 
≤ εE Xn,j 1{|Xn,j |≤ε} + 1{|Xn,j |>ε}
j=1
6 j=1

Assumptions (iii) and (iv) yield


n n
|t|3 X  2  2
X  2  |t|3 2
lim Jn ≤ ε lim E Xn,j + t lim E Xn,j 1{|Xn,j |>ε} = ε σ .
n→∞ 6 n→∞ j=1 n→∞
j=1
6

Since Jn ≥ 0 and ε > 0 is arbitrary, we conclude limn→∞ Jn = 0. This proves (3.48) and
completes the proof of the Lindeberg-Feller theorem.
Example 3.41 Pick a permutation uniformly at random from the set of all permutations of {1, . . . , n} and consider the cycle decomposition from Example 2.3. Recall the random variables X_{n,1}, X_{n,2}, . . . , X_{n,n}. Assume X_{n,1}, X_{n,2}, . . . , X_{n,n}, n = 1, 2, . . ., are independent with

P(X_{n,k} = 1) = 1 − P(X_{n,k} = 0) = 1/(n − k + 1),   (3.50)

S_n = ∑_{k=1}^n X_{n,k} = number of cycles of a uniformly chosen permutation of {1, . . . , n}.   (3.51)

We know

E[S_n] = ∑_{k=1}^n 1/k ∼ ln n as n → ∞.   (3.52)

Now,
 2
1 1
− (E[Xn,k ])2 =
 2 
Var(Xn,k ) = E Xn,k − . (3.53)
n−k+1 n−k+1

By independence,
n n n
X X 1 X 1
Var(Sn ) = Var(Xn,k ) = − ∼ ln n as n → ∞. (3.54)
k=1 k=1
k k=1 k 2

Sn P
We know: −
→ 1. Hence,
E[Sn ]
 
Sn P Sn

→ 1, i.e. P − 1 > ε −−−→ 0 ∀ε > 0 (3.55)
ln n ln n n→∞

⇐⇒ P (1 − ε) ln n ≤ Sn ≤ (1 + ε) ln n −−−→ 1 ∀ε > 0. (3.56)
n→∞

We check the assumptions of Lindeberg-Feller for


1
Xn,k − n−k+1
Yn,k = √ . (3.57)
ln n
(i) ∀n, Yn,1 , Yn,2 , . . . , Yn,n are independent;
 2 
(ii) E[Yn,k ] = 0, E Yn,k < ∞ ∀n, k;
n
X  2  1
(iii) E Yn,k = Var(Sn ) −−−→ 1;
k=1
ln n n→∞

n
X h i
2
(iv) E Yn,k 1{|Yn,k |>ε} = 0 ∀n large enough and ε > 0,
k=1 √
Xn,k − 1 > ε ln n, false for all k if n is large

because |Yn,k | > ε ⇐⇒ n−k+1
enough.

Hence, Lindeberg-Feller implies


n
X Sn − E[Sn ] w
Yn,k = √ −
→ N (0, 1). (3.58)
k=1
ln n

Note that
n−1 n n−1 n
1 dx X 1 X1
X Z
≥ = ln n ≥ = . (3.59)
k=1
k 1 x k=1
k + 1 k=2 k

Hence,
1
− ≥ ln n − E[Sn ] ≥ −1 (3.60)
n

and we conclude |E[Sn ] − ln n| ≤ 1.

⇒ (E[S_n] − ln n)/√(ln n) → 0 as n → ∞.   (3.61)

⇒ (S_n − ln n)/√(ln n) →^w N(0, 1).   (3.62)

Thus,

P(ln n + a√(ln n) ≤ S_n ≤ ln n + b√(ln n)) = P(a ≤ (S_n − ln n)/√(ln n) ≤ b)   (3.63)
→ (1/√(2π)) ∫_a^b e^{−x²/2} dx as n → ∞, ∀a < b.   (3.64)
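The cycle-count CLT can be checked by simulation. The following sketch is illustrative only (the permutation size, the number of trials and the seed are arbitrary): it samples random permutations, counts their cycles and normalizes as in (3.62).

import random, math

def cycle_count(n, rng):
    """Number of cycles of a uniformly random permutation of {1,...,n}."""
    perm = list(range(n))
    rng.shuffle(perm)
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

def normalized_cycle_counts(n=2000, trials=2000, seed=5):
    rng = random.Random(seed)
    vals = [(cycle_count(n, rng) - math.log(n)) / math.sqrt(math.log(n))
            for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum(v * v for v in vals) / trials - mean ** 2
    return mean, var

# roughly 0 and 1 for large n; the convergence is slow, of order 1/sqrt(log n)
print(normalized_cycle_counts())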

4 Conditional expectation
4.1 Motivation
Assume X is random variable on some probability space (Ω, F, P) and X ∈ L1 . The
expectation E[X] can be interpreted as a prediction for the unknown (random) value of
X. Assume F0 ⊆ F, F0 is a σ-field and assume we ”have the information in F0 ”, i.e. for
each A0 ∈ F0 we know if A0 will occur or not. How does this partial information modify
the prediction of X?

Example 4.1 If X is measurable with respect to F0 , then {X ≤ c} ∈ F0 , ∀c and we


know for each c if {X(ω) ≤ c} or {X(ω) > c} occurs ⇒ we know the value of X(ω).

Example 4.2 X1 , X2 , . . . i.i.d. with E[|X1 |] < ∞ and m = E[X1 ]. How should we modify
the prediction E[X1 ] = m if we know the value Sn (ω) = X1 (ω) + . . . + Xn (ω)?

The solution of the prediction problem is to pass from the constant E[X] = m to
a random variable E[X|F0 ], which is measurable with respect to F0 , the conditional
expectation of X, given F0 .

4.2 Conditional expectation for a σ-field generated by atoms



S
Let G = (Ai )i=1,2,... be a countable partition of Ω into ”atoms” Ai , i.e. Ω = Ai where
i=1
Ai ∈ F, ∀i, Ai ∩ Aj = ∅ if i 6= j, and F0 = σ(G). If P(Ai ) > 0, consider the conditional
law P(·|Ai ) defined by
P(B ∩ Ai )
P(B|Ai ) = (B ∈ F)
P(Ai )
1 1
R R
and define E[X|Ai ] = X dP(·|Ai ) = P(A i)
X dP = P(A i)
E[X1Ai ]. Now define
Ai

X 1
E[X|F0 ](ω) := E[X1Ai ]1Ai (ω). (4.1)
P(Ai )
i : P(Ai )>0

(If ω ∈ Ai and P(Ai ) = 0, we set E[X|F0 ](ω) = 0.) (4.1) gives for each ω ∈ Ω a prediction
E[X|F0 ](ω) which uses only the information in which atom ω is.
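Formula (4.1) has a direct empirical counterpart: average X over each atom. The following Python sketch is only an illustration (the function names and the toy example with two Bernoulli variables are choices made here); it computes such an empirical conditional expectation and recovers E[X_1 | σ(S_2)] = S_2/2.

import random

def cond_expectation_on_partition(samples, atom_of):
    """Empirical version of (4.1): given pairs (omega, X(omega)) and a map
    atom_of(omega) identifying the atom containing omega, return the function
    omega -> average of X over the atom of omega (0 for unseen atoms, as in the notes)."""
    sums, counts = {}, {}
    for omega, x in samples:
        a = atom_of(omega)
        sums[a] = sums.get(a, 0.0) + x
        counts[a] = counts.get(a, 0) + 1
    averages = {a: sums[a] / counts[a] for a in sums}
    return lambda omega: averages.get(atom_of(omega), 0.0)

# toy example: omega = (X1, X2) with X1, X2 iid Bernoulli(1/2), X = X1,
# and F0 generated by the atoms {S2 = 0}, {S2 = 1}, {S2 = 2}
rng = random.Random(6)
samples = []
for _ in range(100000):
    x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
    samples.append(((x1, x2), x1))
f = cond_expectation_on_partition(samples, atom_of=lambda w: w[0] + w[1])
print([round(f((a, b)), 2) for a, b in [(0, 0), (1, 0), (1, 1)]])   # approx [0.0, 0.5, 1.0]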

Definition 4.3 If F0 is generated by a countable partition G of Ω, the random variable


E[X|F0 ] defined in (4.1) is the conditional expectation of X given F0 .

Theorem 4.4 The random variable E[X|F0 ] (defined in (4.1)) has the following proper-
ties:

(i) E[X|F0 ] is measurable with respect to F0

(ii) For each random variable Y0 ≥ 0, which is measurable with respect to F0 , we have

E[XY0 ] = E [E[X|F0 ]Y0 ] . (4.2)

In particular,
E[X] = E [E(X|F0 ]] . (4.3)

Proof. (i) follows from (4.1).


To show (ii), take first
h IA j
(Aj ∈ F0 ). i
1 1
  P
E E[X|F0 ]1Aj = E P(Ai )
E[X1Aj ] 1Ai 1Aj = P(Aj )
E[X1Aj ]P(Aj )
i : P(Ai )>0 
| {z }
1A

i=j
j
=
0

else

= E[X1Aj ] if P(Aj ) >P 0. Hence (4.2) follows in this case from (4.1). Next, we consider
functions of the form ci 1Ai (ci ≥ 0), then monotone limits of such functions as in the
definition of the integral ⇒ (4.2) holds true for all Y0 ≥ 0, Y0 measurable with respect to
F0 . Taking Y0 ≡ 1, (4.3) follows.

Definition 4.5 If F0 = σ(Y ) for some random variable Y , we write E[X|Y ] instead of
E[X|σ(Y )].

Example 4.6 Take p ∈ [0, 1], let X_1, X_2, . . . be i.i.d. Bernoulli random variables with parameter p, i.e. P(X_i = 1) = p = 1 − P(X_i = 0).
Question: What is E[X_1|S_n]?
Answer: E[X_1|S_n] = ∑_{k=0}^n P(X_1 = 1|S_n = k) 1_{{S_n=k}} and

P(X_1 = 1|S_n = k) = P(X_1 = 1, S_n = k)/P(S_n = k) = p · binom(n−1, k−1) p^{k−1}(1−p)^{(n−1)−(k−1)} / (binom(n, k) p^k (1−p)^{n−k}) = k/n

⇒ E[X_1|S_n] = S_n/n.   (4.4)
Remark 4.7 E[X1 |Sn ] does not depend on the ”success parameter” p.

Example 4.8 (Randomized sums) Let X_1, X_2, . . . be random variables with X_i ≥ 0 ∀i and E[X_i] = m ∀i. Let N : Ω → {0, 1, . . .} be a random variable which is independent of (X_1, X_2, . . .) and set S_{N(ω)}(ω) := ∑_{k=1}^{N(ω)} X_k(ω). Then, according to (4.1),

E[S_N|N] = ∑_{k≥0: P(N=k)>0} (1/P(N = k)) E[S_N 1_{{N=k}}] 1_{{N=k}}.

But E[S_N 1_{{N=k}}] = E[S_k 1_{{N=k}}] = E[S_k] E[1_{{N=k}}] = k · m · P(N = k), using that N is independent of S_k. Hence

E[S_N|N] = m ∑_{k≥0} k 1_{{N=k}} = m · N.

Now, with (4.3) we conclude that

E[S_N] = m · E[N]   (Wald's identity).   (4.5)
4.3 Conditional expectation for general σ-fields
We recall the following facts from measure theory.
Definition 4.9 Let P and Q be measures on (Ω, F). Q is called absolutely continuous
with respect to P (we write Q  P) if one has

P(A) = 0 =⇒ Q(A) = 0. (4.6)

Example 4.10 Let λ be the Lebesgue measure on R and ν the law of a standard normal
random variable, i.e.
1
Z
x2
ν(A) = √ e− 2 dx for all A ∈ B(R). (4.7)
A 2π
Then, ν  λ. In fact, µ  λ for all probability measures µ on R which have a density.

Theorem 4.11 (Radon-Nikodym) Let P and Q be measures on (Ω, F). If P is σ-


finite, then the following are equivalent:
(1) Q  P.

(2) Q has a density with respect to P, i.e. there exists an F, B [0, ∞] -measurable
function f : Ω → [0, ∞] such that
Z
Q(A) = f dP ∀A ∈ F. (4.8)
A

The function f is called Radon-Nikodym derivative or Radon-Nikodym density and one


writes f = dQ
dP
.

Proof. [Bau90], Satz 17.10.

Remark 4.12 The Radon-Nikodym density is unique up to P-nullsets, i.e. if f and f˜ are
two functions satisfying (2) in Theorem 4.11, then P(f 6= f˜) = 0.

Assume X is a random variable on (Ω, F, P), X ≥ 0, F0 ⊆ F, F0 σ-field.


Definition 4.13 A random variable X0 ≥ 0 is a (version of the) conditional expectation
of X, given F0 , if it satisfies
(i) X0 is measurable with respect to F0 .

(ii) For each random variable Y0 ≥ 0, Y0 measurable with respect to F0 , we have

E[XY0 ] = E[X0 Y0 ] (4.9)

We write X0 = E[X|F0 ].

Remark 4.14 (a) To check (ii) in Definition 4.13, it suffices to check E[X1A0 ] =
E[X0 1A0 ], ∀A0 ∈ F0 .

(b) (4.9) implies, with Y0 ≡ 1,


E[X] = E [E[X|F0 ]] . (4.10)

Theorem 4.15 (Existence and uniqueness of conditional expectations) The con-


ditional expectation E[X|F0 ] of a random variable X ≥ 0 given a σ-field F0 exists and is
unique in the following sense: If X0 and X̃0 are two random variables satisfying (i) and
(ii) in Definition 4.13, then X0 = X̃0 P-a.s.

Remark 4.16 If X = X + − X − and (at least) one of X + , X − is in L1 , we define


E[X|F0 ] = E[X + |F0 ] − E[X − |F0 ] ∈ [−∞, +∞]. Then, X ∈ L1 ⇒ E[X|F0 ] ∈ L1 .

Proof of Theorem 4.15.

(a) Let Q(A0 ) := E[X1 ] (A0 ∈F0 ). Then


 ∞ A0  Q is∞a measure on∞(Ω, F0 ): if Ai ∩Aj = ∅ for
S P∞ P P
i 6= j, then Q Ai = E X IAi = E[XIAi ] = Q(Ai ). The measure Q
i=1 i=1 i=1 i=1
is absolutely continuous with respect to P on (Ω, F0 ), i.e. we have, ∀A ∈ F0 , P(A0 ) =
0 ⇒ Q(A0 ) = 0.
Due to Radon Nikodym’s Theorem (P is a probability measure and hence σ-finite!)
there is a function
R X0 ≥ 0, which is measurable with respect to F0 such that
Q(A0 ) = X0 dP = E[X0 1A0 ] (A0 ∈ F0 ). Remark 4.14 (a) implies that (ii) in
| {z } A0
=E[X1A0 ]
Definition 4.13 is satisfied for X0 .

(b) If X0 and X̃0 are random variables which satisfy (ii) in Definition 4.13, then A0 =
{X0 > X̃0 } ∈ F0 . (4.9) implies that E[X0 1A0 ] = E[X̃0 1A0 ] ⇒ E[(X0 − X̃0 )1A0 ] =
0 ⇒ P(A0 ) = 0. In the same way, P(X0 < X̃0 ) = 0 ⇒ X0 = X̃0 P-a.s.

Now, we can generalize the explicit computation in Example 4.6.


n
Lemma 4.17 Assume that X1 , . . . , Xn are i.i.d. and X1 ∈ L1 and let Sn =
P
Xi . Then,
i=1

Sn
E[Xi |Sn ] = , i = 1, . . . n. (4.11)
n
For the proof, we will need the following lemma.

Lemma 4.18 Let X and Y be random variables on some probability space (Ω, F, P).
Then, the following statements are equivalent:

(a) Y is measurable with respect to σ(X).

(b) There is a measurable function h : (R, B) → (R, B) such that Y = h(X).

Proof of Lemma 4.18. (b) ⇒ (a) is clear because the composition of measurable
functions is measurable.
(a) ⇒ (b): Take first Y = 1A , A ∈ σ(X). Then, A = {X ∈ B} for some B ∈P B and
Y = h(X) = 1B (X) (i.e. h(z) = 1 if z ∈ B, h(z) = 0 otherwise.) Then, take Y = ci IAi
i
(with constants ci ≥ 0), then monotone limits of such functions etc.
We now give the
Proof of Lemma 4.17. Let Y0 ≥ 0, Y0 measurable with respect to σ(Sn ). Hence,
with
R RLemma 4.18, Y0 = h(Sn ) for a measurable function h. Hence, E[Xi h(Sn )] =
· · · xi h(x1 + . . . + xn )µ(dx1 ) . . . µ(dxn ), where µ is the law of X1 . But E[Xi h(Sn )] is in-
variant under permutations of the indices {1, . . . , n} ⇒ E[Xi h(Sn )] = E[Xj h(Sn )], ∀i, j ⇒
n
E[Xi h(Sn )] = n1 E[Xk h(Sn )] = E[ Snn h(Sn )] ⇒ E[Xi Y0 ] = E[ Snn Y0 ] and we showed that
P
k=1
Sn
n
satisfies property (ii) in Definition 4.13.

Remark 4.19 The proof used not independence but only the weaker property that the
joint law of (X1 , . . . , Xn ) is invariant under permutations of the indices.

4.4 Properties of conditional expectations


Conditional expectation satisfies the same ”rules” as expectation. In some situations, this
is obvious since conditional expectation is an expectation with respect to some conditional
distribution.

Theorem 4.20 X1 , X2 random variables with X1 ≥ 0, X2 ≥ 0. Then

(a) E[X1 + X2 |F0 ] = E[X1 |F0 ] + E[X2 |F0 ] P-a.s.


E[cX1 |F0 ) = cE[X1 |F0 ] P-a.s.
(linearity)

(b) X1 ≤ X2 P-a.s. ⇒ E[X1 |F0 ] ≤ E[X2 |F0 ] P-a.s. (monotonicity)

(c) 0 ≤ X1 ≤ X2 ≤ . . . P-a.s. ⇒ E[lim Xn |F0 ] = lim E[Xn |F0 ] P-a.s.


n n

Remark 4.21 Concerning (c) in Theorem 4.20, lim E[Xn |F0 ] is defined as follows. Let
T n
A := {Xn < Xn+1 }. Due to the hypothesis, P(A) = 1 and (b) implies P(A0 ) = 1
n T
where A0 = {E[Xn |F0 ] ≤ E[Xn+1 |F0 ]} (for all versions E[Xn |F0 ], E[Xn+1 |F0 ]). We
n
now set lim Xn (ω) = lim Xn (ω)1A (ω) and lim E[X|F0 ] = lim E[Xn |F0 ]1A0 (ω). Then (c)
n n n n
says that lim E[Xn |F0 ] is (a version of ) the conditional expectation of lim Xn , given F0 ,
n n
i.e. a random variable with properties 4.13 (i) and 4.13 (ii) with F0 and X = lim Xn .
n

Proof.

(a) For each choice of a version E[Xi |F0 ] (i = 1, 2) we have that E[X1 |F0 ] + E[X2 |F0 ] is
a random variable which is measurable with respect to F0 and for Y0 ≥ 0, Y0 mea-
surable with respect to F0 , we have E[Y0 (E[X1 |F0 ] + E[X2 |F0 ])] = E[Y0 E[X1 |F0 ]] +
4.13 (ii)
E[Y0 E[X2 |F0 ]] =
E[Y0 X1 ] + E[Y0 X2 ] = E[Y0 (X1 + X2 )].
R
(b) Let B0 = {E[X1 |F0 ] > E[X2 |F0 )]}. Then, B ∈ F0 and (E[X1 |F0 ]−E[X2 |F0 ]) dP =
B0
4.13 (ii) R X1 ≤X2
E(1B0 E[X1 |F0 ] − E[X2 |F0 ]) = E[1B0 (X1 − X2 )] = (X1 − X2 ) dP ≤ 0 ⇒
B0 P-a.s.
P(B0 ) = 0.
mon.
(c) Let Y0 ≥ 0, Y0 measurable with respect to F0 . Then E[Y0 lim E[Xn |F0 ]] =
n conv.
4.13 (ii) mon.
lim E[Y0 E[Xn |F0 ]] = lim E[Y0 Xn ] = E[Y0 lim Xn ].
n n conv. n

The following theorem gives two important cases where the conditional expectation
simplifies.

Theorem 4.22 (a) Let Z0 ≥ 0 be a random variable which is measurable with respect
to F0 . Then
E[Z0 X|F0 ] = Z0 E[X|F0 ]. (4.12)

(b) Assume that σ(X) and F0 are independent. Then,

E[X|F0 ] = E[X]. (4.13)

Proof.

(a) The right hand side of (4.12) is measurable with respect to F0 and for Y0 ≥
4.13 (ii)
0, Y0 measurable with respect to F0 , we have E[Y0 (Z0 X)] = E[(Y0 Z0 )X] =
E[Y0 Z0 E[X|F0 ]] = E[Y0 (Z0 E[X|F0 ])].

(b) see exercises.

Theorem 4.20 implies the following.

Lemma 4.23 (Fatou’s Lemma for conditional expectation) Assume Xn ≥ Y , P-


a.s. ∀n for some Y ∈ L1 . Then
h i
E lim inf Xn |F0 ≤ lim inf E[Xn |F0 ] P-a.s. (4.14)
n→∞ n→∞

Theorem 4.24 (Lebesgue’s Theorem for conditional expectations) Assume |Xn | ≤
Y P-a.s. ∀n for some Y ∈ L1 , Xn → X P-a.s. Then
h i
E[X|F0 ] = E lim Xn |F0 = lim E[Xn |F0 ] P-a.s. (4.15)
n n

Moreover, we have
Theorem 4.25 (Jensen’s inequality for conditional expectations) Assume X ∈ L1
and f : R → R convex. Then, E[f (X)] is well-defined and E[f (X)|F0 ] ≥ f (E[X|F0 ]),
P-a.s.
Sketch of proof of Theorem 4.25. Each convex function f is of the form f (x) =
sup `n (x) ∀x with linear functions `n (x) = an x + bn . In particular, f ≥ `n , `n (X) ∈
n
4.20(b) 4.20(a)
L1 . Since E[f (X)|F0 ] ≥ E[`n (X)|F0 ] = `n (E[X|F0 ]), we have E[f (X)|F0 ] ≥
sup `n (E[X|F0 ]) = f (E[X|F0 ]), P-a.s.
n
Corollary 4.26 For p ≥ 1, conditional expectation is a contraction of Lp in the following
sense: X ∈ Lp ⇒ E[X|F0 ] ∈ Lp and kE[X|F0 ]kp ≤ kXkp .
Proof. With f (x) = |x|p , Jensen’s inequality for conditional expectations implies that
|E[X|F0 ]|p ≤ E[|X|p |F0 ] ⇒ E[|E[X|F0 ]|p ] ≤ E[|X|p ] ⇒ kE[X|F0 ]kp ≤ kXkp .
In particular, if X ∈ L2 , then E[X|F0 ] ∈ L2 and E[X|F0 ] can be interpreted as the
”best” prediction of X, given F0 , in the following sense.
Theorem 4.27 Assume X ∈ L2 , Y0 is measurable with respect to F0 and Y0 ∈ L2 .
Then E [(X − E[X|F0 ])2 ] ≤ E[(X − Y0 )2 ] and we have ”=” if and only if Y0 = E[X|F0 ],
P-a.s.
Proof. Assume X0 is a version of E[X|F0 ]. Then E[(X − Y0 )2 ] = E[X 2 ] − 2E[XY0 ] +
E[Y02 ]. For Y0 = X0 , we conclude E[(X − X0 )2 ] = E[X 2 ] − E[X02 ]. Hence E[(X − Y0 )2 ] =
E[(X − X0 )2 ] + E[(X0 − Y0 )2 ] ⇒ E[(X − Y0 )2 ] ≥ E[(X − X0 )2 ] with ”=” if and only if
X0 = Y0 , P-a.s.
Remark 4.28 Theorem 4.27 says that the conditional expectation E[X|F0 ] is the projec-
tion of the element X in the Hilbert space L2 (Ω, F, P) on the closed subspace L2 (Ω, F0 , P).
Theorem 4.29 (Tower property of conditional expectation) Let F0 , F1 be σ-fields
with F0 ⊆ F1 ⊆ F and X a random variable with X ≥ 0. Then,
E[E[X|F1 ]|F0 ] = E[X|F0 ] P-a.s. (4.16)
and
E[E[X|F0 ]|F1 ] = E[X|F0 ] P-a.s. (4.17)
Proof. To show (4.16), we have to prove (see (4.9) in the definition of conditional ex-
pectation) that for Y0 ≥ 0, Y0 measurable with respect to F0 , E[Y0 E[X|F1 ]] = E[Y0 E[X|F0 ]].
But, again due to (4.9), E[Y0 E[X|F0 ]] = E[Y0 X]]. Now, since Y0 is F1 -measurable as well,
again due to (4.9), E[Y0 E[X|F1 ]] = E[Y0 X] and we conclude.
(4.17) is clear since E[X|F0 ] is measurable with respect to F1 : use (4.12).

5 Martingales
5.1 Definition and examples
(Ω, F, P) probability space, F0 ⊆ F1 ⊆ F2 ⊆ . . . increasing sequence of σ-fields with
Fi ⊆ F, ∀i. Such a sequence (Fn ) is called a filtration. Interpretation: Fn is the
collection of events observable until time n.

Definition 5.1 A martingale is a sequence (Mn )n=0,1,... of random variables with


(M1) Mn is measurable with respect to Fn , ∀n ≥ 0.

(M2) Mn ∈ L1 , ∀n .

(M3) E[Mn+1 |Fn ] = Mn ∀n ≥ 0.

Remarks 5.2 (a) Under the assumptions (M1) and (M2), (M3) is equivalent to

E[Mn+k − Mn |Fn ] = 0, ∀n ≥ 0. (5.1)


k
P
Proof: (5.1) implies that for n, k ≥ 0, E[Mn+k −Mn |Fn ] = E[Mn+` −Mn+`−1 |Fn ] =
`=1
k
P
E[E(Mn+` − Mn+`−1 |Fn+`−1 ]|Fn ] = 0.
`=1

(b) We are now omitting ”P -a.s.” in (M3).

(c) We say that (Mn ) is adapted to (Fn ) (meaning that for each n, Mn is measurable
with respect to Fn ).

We consider five important (classes of) examples.

5.1.1 Sums of independent centered random variables


Assume Y_1, Y_2, . . . are independent random variables with Y_i ∈ L¹ ∀i. Let F_n := σ(Y_1, . . . , Y_n), n ≥ 1, F_0 := {∅, Ω}. Let M_n := ∑_{i=1}^n (Y_i − E[Y_i]), n ≥ 1, and M_0 = 0. Then, (M_n) is a martingale with respect to (F_n): (M1) and (M2) are satisfied and we have E[M_{n+1} − M_n|F_n] = E[Y_{n+1} − E[Y_{n+1}]|F_n] = E[Y_{n+1}|F_n] − E[Y_{n+1}] = E[Y_{n+1}] − E[Y_{n+1}] = 0.

Example 5.3 Let p ∈ (0, 1), Y_1, Y_2, . . . i.i.d. Bernoulli-type random variables with parameter p, i.e. P(Y_i = 1) = p = 1 − P(Y_i = −1). Let S_n := ∑_{i=1}^n Y_i, n ≥ 1 (S_0 = 0), be the partial sums and F_n = σ(Y_1, . . . , Y_n), F_0 = {∅, Ω}. Define M_n := S_n − n(2p − 1) (n = 0, 1, . . .). Then (M_n) is a martingale with respect to (F_n). In the same way, for x ∈ R, M̃_n = x + S_n − n(2p − 1) (n = 0, 1, . . .) is a martingale with respect to (F_n). Note that (S_n) is a martingale with respect to (F_n) if and only if p = 1/2.
5.1.2 Successive predictions
Let (Fn ) be a filtration and X ∈ L1 a random variable. Set

Mn := E[X|Fn ], n = 0, 1, 2, . . . (5.2)

Then, (Mn ) is a martingale with respect to (Fn ).


(M1): By definition, Mn is measurable with respect to Fn , ∀n.
(M2): Taking expectations in (5.2), we see that Mn ∈ L1 , ∀n.
(M3): E[Mn+1 |Fn ] = E[E[X|Fn+1 ]|Fn ] = E[X|Fn ] = Mn , where we used the tower
property in the second to last equation.

5.1.3 Radon-Nikodym derivatives on increasing sequences of σ-fields


Assume P and Q are probability measures on (Ω, F) with Q  P. Let (Fn ) be a filtration.
Let X := dQdP
be the Radon-Nikodym derivative of Q with respect to P, and let Mn := dQ |
dP Fn
be the Radon-Nikodym derivative of Q|Fn with respect to P|Fn . Here Q|Fn denotes the
restriction of Q to (Ω, Fn ), defined by Q|Fn (A) = Q(A) for all A ∈ Fn . And P|Fn denotes
the restriction of P to (Ω, Fn ).
Claim:
Mn = E[X|Fn ], n = 0, 1, 2, . . . (5.3)
Consequence: (Mn ) is a martingale with respect to (Fn ) on Ω, P (it falls into the class of
successive predictions).
Proof of the claim. Let Zn := E[X|Fn ].

(a) Zn is measurable with respect to Fn , ∀n.

(b) We show that Q(A) = Zn dP, ∀A ∈ Fn (and this implies Zn = dQ


R
| ). Take
dP Fn
A
A ∈ Fn . Then, Q(A) = X dP since X = dQ
R R R
dP
. But X dP = X1A dP =
R R A A
E[X|Fn ]1A dP = Zn dP.
A

But Radon-Nikodym derivatives on increasing σ-fields form a martingale even if they


are not of the form E[X|Fn ].

Theorem 5.4 Let P and Q be probability measures on (Ω, F) and F0 ⊆ F1 ⊆ . . . an


increasing sequence of σ-fields with Fi ⊆ F, ∀i. We assume Q|Fi  P|Fi , ∀i. Let
Mn = dQ | (n = 0, 1, . . . ). Then, (Mn ) is a martingale with respect to (Fn ) on (Ω, P).
dP Fn

Proof.

(M1) Mn is measurable with respect to Fn , ∀n and

(M2) Mn ∈ L1 , ∀n: Mn ≥ 0 and Mn dP = 1, ∀n.


R

(M3) E[Mn+1 |Fn ] = Mn , ∀n, with the same argument as before, i.e. we show that Q(A) =
R A∈Fn+1
E[Mn+1 |Fn ] dP, ∀A ∈ Fn . More precisely: take A ∈ Fn . Then, Q(A) =
A
R A∈F R R
1A Mn+1 dP = n 1A E[Mn+1 |Fn ] dP = E[Mn+1 |Fn ] dP ⇒ Mn = E[Mn+1 |Fn ].
A

5.1.4 Harmonic functions of Markov chains


Let S be countable. Consider a Markov chain with state space S and transition matrix
K(x, y). A function h : (S, S) → (R, B) is harmonic if it satisfies, ∀x ∈ S, the mean value
property X
h(x) = h(y)K(x, y). (5.4)
y∈S

Fix x ∈ S and consider the Markov chain (Xn ) with X0 = x and transition matrix K.
Assume h is harmonic. Let Fn = σ(X0 , . . . , Xn ), n = 0, 1, . . . Then, Mn := h(Xn ), (n =
0, 1, . . .) is a martingale with respect to (Fn ).
(M1) is true by definition.
We check (M3):
X
E[h(Xn+1 )|Fn ] = h(y)K(Xn , y) = h(Xn ) a.s. (5.5)
y∈S

(Exercise: show that (5.5) follows from (5.4) and the Markov property). In particular,
taking expectations in (5.5), E[h(Xn+1 )] = E[h(Xn )], ∀n.
By induction, E[h(Xn )] = h(x) and (M2) follows.

Example 5.5 Assume p ∈ (0, 1), x ∈ Z, Y_1, Y_2, . . . i.i.d. with P(Y_i = 1) = p = 1 − P(Y_i = −1), S_n = ∑_{i=1}^n Y_i (S_0 = 0). Consider the random walk X_n = x + S_n (n = 0, 1, . . .). (X_n) is a Markov chain with starting point x and transition matrix K(z, z+1) = p, K(z, z−1) = 1 − p, K(z, y) = 0 if |y − z| ≠ 1. Take h(y) = ((1−p)/p)^y (y ∈ Z). Then, h is a harmonic function for K:

∑_{y∈Z} h(y) K(z, y) = p h(z+1) + (1−p) h(z−1) = ((1−p)/p)^z = h(z)

⇒ (h(X_n))_{n=0,1,...} is a martingale with respect to (F_n).
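A quick simulation of Example 5.5 (illustrative only; the parameter values and the seed are arbitrary) confirms that E[h(X_n)] stays equal to h(x).

import random

def harmonic_martingale_mean(p=0.6, x=0, steps=10, trials=50000, seed=10):
    """For the asymmetric walk X_n with P(step = +1) = p, h(y) = ((1-p)/p)^y is
    harmonic, so E[h(X_n)] should stay equal to h(x) = 1 for x = 0."""
    rng = random.Random(seed)
    r = (1 - p) / p
    total = 0.0
    for _ in range(trials):
        pos = x
        for _ in range(steps):
            pos += 1 if rng.random() < p else -1
        total += r ** pos
    return total / trials

print(harmonic_martingale_mean())   # close to 1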

5.1.5 Growth rate of branching processes


Let pk ≥ 0, k ∈ N0 , with ∞
P
k=0 pk = 1. We consider a population where each individual
has k children with probability pk , k ∈ N0 , independently of all other individuals. More
formally: Let Yn,i , i, n ∈ N, be i.i.d. random variables with

P (Yn,i = k) = pk ∀k ∈ N0 . (5.6)

Yn,i is the number of children of the i-th individual in generation n − 1 (if this individual
exists). Set

Z0 := 1, (5.7)
Zn
X
Zn+1 := Yn+1,i ∀n ∈ N, (5.8)
i=1

and Zn+1 = 0 if Zn = 0. Interpretation: Zn is the number of individuals in the nth


generation. (Zn )n∈N0 is called Galton-Watson process.
Let m := E [Yn,i ] ∈ [0, ∞]. Take Fn = σ (Yi,j : j ∈ N, i ∈ {1, . . . , n}).
 
Zn
Lemma 5.6 If m ∈ (0, ∞), then is a martingale with respect to (Fn ).
mn n∈N0

Proof. Set Fn = σ (Yi,j : j ∈ N, i ∈ {1, . . . , n}).


Zn
(M1) is Fn -measurable for all n.
mn
Claim 1: E[Zn+1 |Fn ] = mZn ∀n ∈ N0 .
(Note that since Zn+1 ≥ 0, E[Zn+1 |Fn ] is well-defined.)
Proof of Claim 1.
#  

"Z
n
X X
E[Zn+1 |Fn ] = E Yn+1,i Fn = E  1{i≤Zn } Yn+1,i Fn 

| {z }
i=1 i=1 ∈Fn

independent

X  .&  XZn
= 1{i≤Zn } E Yn+1,i Fn =
m = Zn m.
i=1
| {z } i=1
=E[Yn+1,i ]=m

Claim 2: E[Zn ] = mn .
Proof of Claim 2 by induction over n.
n = 0: E[Z0 ] = 1 = m0 .
n→n+1:
 
E[Zn+1 ] = E E[Zn+1 |Fn ] = E[mZn ] by Claim 1
n
=m·m by the induction assumption
n+1
=m .

The claims have the following consequences:

(M2) E[Z_n/m^n] = (1/m^n) E[Z_n] = 1 < ∞, by Claim 2.

(M3) E[Z_{n+1}/m^{n+1} | F_n] = (1/m^{n+1}) E[Z_{n+1}|F_n] = (1/m^{n+1}) m Z_n = Z_n/m^n, by Claim 1.
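A short simulation of Lemma 5.6 (not part of the notes; the offspring distribution and the seed are arbitrary choices) prints Z_n together with the martingale Z_n/m^n.

import random

def galton_watson_martingale(offspring_probs=(0.25, 0.25, 0.5), generations=15, seed=8):
    """Simulate a Galton-Watson process with P(Y = k) = offspring_probs[k]
    and print Z_n and Z_n / m^n (whose expectation stays equal to 1)."""
    rng = random.Random(seed)
    m = sum(k * p for k, p in enumerate(offspring_probs))   # mean number of children
    z = 1
    for n in range(1, generations + 1):
        z = sum(rng.choices(range(len(offspring_probs)), weights=offspring_probs)[0]
                for _ in range(z))
        print(n, z, z / m ** n)
        if z == 0:   # extinction
            break

galton_watson_martingale()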

5.2 Supermartingales and Submartingales


 5.7 A stochasticprocess (i.e. a sequence of random variables) (Mn )n∈N0 is
Definition
 martingale 
called a supermartingale with respect to a filtration (Fn )n∈N0 if the following condi-
submartingale
 
tions are satisfied:

(M1) (Mn )n∈N0 is adapted to (Fn )n∈N0 .

(M2) E[|Mn |] < ∞ ∀n ∈ N0 .


 
 = Mn 
(M3) E[Mn+1 |Fn ] ≤ Mn ∀n ∈ N0 .
≥ Mn
 

The following is an easy consequence of the definition.


   
 martingale   constant 
Lemma 5.8 If (Mn )n∈N0 is a supermartingale , then n 7→ E[Mn ] is non-increasing .
submartingale non-decreasing
   

Proof in the submartingale case. Let n ∈ N0 . Then, one has


 
E[Mn+1 ] = E E[Mn+1 |Fn ] ≥ E[Mn ].
| {z }
≥Mn by (M3)

 
 martingale 
Lemma 5.9 If (Mn )n∈N0 is a supermartingale with respect to a filtration (Fn )n∈N0 ,
submartingale
 
 
 martingale 
then (Mn )n∈N0 is a supermartingale with respect to the filtration generated by M .
submartingale
 

Proof in the martingale case. For n ∈ N0 , set Gn = σ(Mk , k ≤ n). Assume that
(Mn )n∈N0 is a martingale with respect to some filtration (Fn )n∈N0 .

(M1) Mn is Gn -measurable.

(M2) E[|Mn |] < ∞ ∀n ∈ N0 .

(M3) Since Mk is Fk -measurable, σ(Mk ) ⊆ Fk ⊆ Fn for all k ≤ n, and we have Gn ⊆ Fn .


Consequently, for k < n, we get
 
E[Mn |Gk ] = E E[Mn |Fk ] Gk (The smaller σ-algebra wins.)
| {z }
= Mk by assumption
= E[Mk |Gk ]
= Mk because Mk is Gk -measurable.

Remark 5.10 If one says “(Mn )n∈N0 is a martingale” without specifying a filtration, then
one means “(Mn )n∈N0 is a martingale with respect to the filtration generated by M ”.

Remark 5.11 (a) (Mn )n∈N0 is a submartingale ⇐⇒ (−Mn )n∈N0 is a supermartin-


gale.

(b) (Mn )n∈N0 is a martingale ⇐⇒ (Mn )n∈N0 is a submartingale and a supermartin-


gale.
 
Lemma 5.12 Let ϕ : R → R be convex and E |ϕ(Mn )| < ∞ ∀n ∈ N0 .

(a) If (Mn )n∈N0 is a martingale with respect to a filtration (Fn )n∈N0 , then ϕ(Mn ) n∈N0
is a submartingale with respect to (Fn )n∈N0 .

(b) If (Mn )n∈N


 0 is a submartingale with respect to (Fn )n∈N0 and ϕ is increasing, then
ϕ(Mn ) n∈N0 is a submartingale with respect to (Fn )n∈N0 .

Proof.

(M1) ϕ(Mn ) is Fn -measurable because Mn is Fn -measurable.


 
(M2) E |ϕ(Mn )| < ∞ by assumption.

(M3) For k < n, one has


  
E ϕ(Mn ) Fk ≥ ϕ E[Mn |Fk ] by Jensen’s inequality for conditional expectations
= ϕ(Mk ) in the case that (Mn )n∈N0 is a martingale.

In the submartingale case, one has ϕ E[Mn |Fk ] ≥ ϕ(Mk ) because ϕ is increasing.

Example 5.13 If (Mn )n∈N0 is a martingale, then (Mn2 )n∈N0 , (|Mn |)n∈N0 and (Mn+ )n∈N0
are submartingales if the required integrability conditions are satisfied.

Example 5.14 You buy H0 stocks of a company at time 0.


For n ∈ N0 ,
let Vn be the value of the portfolio at time n,
Xn be the value of one stock at time n,
Hn be the number of stocks you hold for the period (n − 1, n].
Then,
n
X
Vn = V0 + Hk (Xk − Xk−1 ). (5.9)
k=1

 Hn is measurable
Natural requirement: with respect to σ(X0 , X1 , .. . , Xn−1 ). 
 martingale   fair 
If (Xn )n∈N0 is a supermartingale , it seems that this is a non-profitable in-
submartingale profitable
   
vestment strategy. This is confirmed by Theorem 5.16 below.

Definition 5.15 (Hn )n∈N is previsible with respect to (Fn )n∈N0 if Hn is Fn−1 -measurable
∀n ∈ N.

For n ∈ N, set
n
X
(H.X)n := Hk (Xk − Xk−1 ). (5.10)
k=1
 
 martingale 
Theorem 5.16 Let (Xn )n∈N0 be a supermartingale with respect to (Fn )n∈N0 . Sup-
submartingale
 
all n ∈ N there is a
pose (Hn )n∈Nis previsible withrespect to (Fn )n∈N0 and for  constant
 |H n | ≤ c n    martingale 
cn such that 0 ≤ Hn ≤ cn . Then, (H.X)n n∈N0 is a supermartingale .
0 ≤ Hn ≤ cn submartingale
   

Proof in the supermartingale case.


n
X
(M1) (H.X)n = Hk (Xk − Xk−1 ) is Fn -measurable ∀n.
k=1

(M2) E |(H.X)n | < ∞ because Hk is bounded and Xk ∈ L1 for all n.


 

(M3) For all n ∈ N0 , one has
   
E (H.X)n+1 Fn = E (H.X)n + Hn+1 (Xn+1 − Xn ) Fn
| {z } | {z }
- %
Fn -measurable

= (H.X)n + Hn+1 E[Xn+1 − Xn |Fn ]


| {z } | {z }
≥0 ≤0

≤ (H.X)n .

5.3 Stopping times


Definition 5.17 A random variable τ : Ω → N0 ∪ {∞} is called a stopping time with
respect to a filtration (Fn )n∈N0 if
{τ ≤ n} ∈ Fn for all n ∈ N0 . (5.11)
Intuitively speaking, τ is a stopping time if we can decide whether τ ≤ n holds knowing
only Fn , the information up to time n.
Example 5.18 Let (Xn )n∈N0 be a stochastic process adapted to (Fn )n∈N0 .
(a) Then, for any A ∈ B(R),
τA = inf {n ∈ N0 : Xn ∈ A} (5.12)
is a stopping time. Here, we use the convention inf ∅ = ∞.
Proof. For any n ∈ N0 , we have
n
[
{τA ≤ n} = {X ∈ A} ∈ Fn .
| k{z }
k=0 ∈Fk ⊆Fn

We call τA the first hitting time of A.


(b) Consider
τ̃A = sup{k ∈ N0 : Xk ∈ A}. (5.13)
In general, τ̃A is not a stopping time.
Proof. For all n ∈ N0 ,

\
{τ̃A ≤ n} = {Xk ∈
/ A} in general not in Fn .
| {z }
k=n+1 ∈Fk ⊇Fn

Lemma 5.19
τ is a stopping time ⇐⇒ {τ = n} ∈ Fn for all n ∈ N0 (5.14)

Proof.
“=⇒”: {τ = n} = {τ ≤ n} \ {τ ≤ n − 1} ∈ Fn .
| {z } | {z }
∈Fn ∈Fn−1 ⊆Fn

n
[
“⇐=”: {τ ≤ n} = {τ = k} ∈ Fn .
| {z }
k=0 ∈F ⊆F
k n

5.4 Stopped martingales


 
 martingale 
Theorem 5.20 Let (Mn )n∈N0 be a supermartingale with respect to (Fn )n∈N0 and
submartingale
 
let T be a stopping time with respect to (Fn )n∈N0 . Define
MT ∧n : Ω → R, ω 7→ MT (ω)∧n (ω).
 
 martingale 
Then, (MT ∧n )n∈N0 is a supermartingale with respect to (Fn )n∈N0 .
submartingale
 

Proof in the supermartingale case. Set H_n = 1_{{T≥n}}. Since

{T ≥ n} = {T ≤ n − 1}^c ∈ F_{n−1},

(H_n)_{n∈N} is previsible. Furthermore 0 ≤ H_n ≤ 1 for all n. One has

(H.M)_n = ∑_{k=1}^n H_k (M_k − M_{k−1}) = ∑_{k=1}^n 1_{{T≥k}} (M_k − M_{k−1})
        = ∑_{k=1}^n 1_{{T≥k}} M_k − ∑_{k=0}^{n−1} 1_{{T≥k+1}} M_k
        = ∑_{k=1}^{n−1} 1_{{T=k}} M_k + 1_{{T≥n}} M_n − 1_{{T≥1}} M_0
        = ∑_{k=0}^{n−1} 1_{{T=k}} M_k + 1_{{T≥n}} M_n − M_0   (since 1_{{T≥1}} M_0 = M_0 − 1_{{T=0}} M_0)
        = M_{T∧n} − M_0.

Theorem 5.16 =⇒ (H.M ) is a supermartingale
=⇒ (MT ∧n − M0 )n∈N0 is a supermartingale.
Since the constant sequence (Yn := M0 )n∈N0 is a martingale and the sum of a supermartingale and a martingale
is a supermartingale, it follows that (MT ∧n )n∈N0 is a supermartingale.

5.5 The martingale convergence theorem


Let (Mn )n∈N0 be a submartingale, a < b. Set

N0 := 0 (5.15)
N2k−1 := inf{i > N2k−2 : Mi ≤ a}, (5.16)
N2k := inf{i > N2k−1 : Mi ≥ b} for k ∈ N. (5.17)

This means: N1 is the first time when (Mn )n∈N0 reaches a value ≤ a,
N2 is the first time after N1 when (Mn )n∈N0 reaches a value ≥ b,
N3 is the first time after N2 when (Mn )n∈N0 reaches a value ≤ a,
etc. 
In particular, Nk , k ∈ N0 , are stopping times with respect to Fn = σ(M0 , . . . , Mn ) n∈N0
and for k ∈ N one has

MN2k−1 ≤ a and MN2k ≥ b, (5.18)

provided N2k−1 < ∞ and N2k < ∞.


Between the times N2k−1 and N2k , (Mn )n∈N0 crosses from below a to above b for the
k-th time. We call this an upcrossing. Let

Un (a, b) = sup{k ∈ N0 : N2k < n} = number of upcrossings of the interval [a, b] up to time n.

[Figure: a sample path of (Mn )n∈N0 with the times N1 , N2 , N3 , N4 marked on the time axis; each pair (N2k−1 , N2k ) encloses one upcrossing of [a, b].]
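The number Un (a, b) can be computed from a path by following the alternating times N2k−1 (reach a value ≤ a) and N2k (then reach a value ≥ b). A small illustrative sketch (the path and the interval are arbitrary choices):

```python
import numpy as np

def upcrossings(path, a, b):
    """Count completed upcrossings of [a, b]: the path first reaches a value <= a
    (a time N_{2k-1}) and afterwards reaches a value >= b (the time N_{2k})."""
    count = 0
    below = False              # True between N_{2k-1} and the next N_{2k}
    for x in path:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            count += 1
            below = False
    return count

rng = np.random.default_rng(2)
path = np.cumsum(rng.choice([-1, 1], size=200))
print("number of upcrossings of [-2, 2]:", upcrossings(path, a=-2, b=2))
```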

Theorem 5.21 (Upcrossing inequality) Let (Mn )n∈N0 be a submartingale. For all n and all a < b, one has

(b − a)E[Un (a, b)] ≤ E[(Mn − a)+ ] − E[(M0 − a)+ ].        (5.19)

Proof. By Lemma 5.12 (b), (Yn := a + (Mn − a)+ )n∈N0 is a submartingale. (Mn )n∈N0 and (Yn )n∈N0 have the same number of upcrossings of the interval [a, b]. Set

Hi = 1 if N2k−1 < i ≤ N2k for some k ∈ N, and Hi = 0 else.

This means Hi = 1 ⇐⇒ i belongs to an upcrossing. One has


(b − a)Un (a, b) ≤ (H.Y )n = Σ_{s=1}^{n} Hs (Ys − Ys−1 ),

because every completed upcrossing contributes a profit ≥ b − a to (H.Y )n and an unfinished upcrossing at the end contributes something nonnegative to (H.Y )n (since Ys ≥ a for all s). Taking expectations yields

(b − a)E[Un (a, b)] ≤ E[(H.Y )n ].        (5.20)

Set Ki = 1 − Hi for all i. One has

E[(H.Y )n ] = E[((1 − K).Y )n ] = E[ Σ_{s=1}^{n} (1 − Ks )(Ys − Ys−1 ) ]
            = E[Yn − Y0 − (K.Y )n ]
            = E[(Mn − a)+ − (M0 − a)+ − (K.Y )n ].

Inserting this into (5.20) yields

(b − a)E[Un (a, b)] ≤ E[(Mn − a)+ ] − E[(M0 − a)+ ] − E[(K.Y )n ].        (5.21)

It remains to show that

E[(K.Y )n ] ≥ 0.

Observe that

{Hi = 1} = ∪_{k=1}^{∞} {N2k−1 < i ≤ N2k } = ∪_{k=1}^{∞} ( {N2k−1 ≤ i − 1} ∩ {N2k ≤ i − 1}c ) ∈ Fi−1 ,

because N2k−1 and N2k are stopping times. Hence, (Hi )i∈N is previsible. Consequently, (Ki )i∈N is previsible. Since 0 ≤ Ki ≤ 1 ∀i, Theorem 5.16 implies that ((K.Y )n )n∈N is a submartingale. Consequently, by Lemma 5.8,

E[(K.Y )n ] ≥ E[(K.Y )1 ] = E[(1 − H1 )E[Y1 − Y0 |F0 ]] ≥ 0

due to the submartingale property of (Yn ). This completes the proof of the upcrossing
inequality.

Theorem 5.22 (Martingale convergence theorem) Let (Mn )n∈N0 be a submartingale with sup_{n∈N0} E[Mn+ ] < ∞. Then, (Mn )n∈N0 converges almost surely for n → ∞ to a limit M∞ ∈ L1 .

Proof. Fix a < b. By the upcrossing inequality, we get for all n ∈ N:

E[Un (a, b)] ≤ ( E[(Mn − a)+ ] − E[(M0 − a)+ ] ) / (b − a)
            ≤ ( E[Mn+ ] + |a| ) / (b − a)     because (Mn − a)+ ≤ Mn+ + |a| .
We have Un (a, b) ↑ U (a, b) := number of upcrossings of the interval [a, b] by the whole process (Mn )n∈N0 as n → ∞. By the monotone convergence theorem,

E[U (a, b)] = lim_{n→∞} E[Un (a, b)] ≤ sup_{n} ( E[Mn+ ] + |a| ) / (b − a) < ∞.

Since E[U (a, b)] < ∞, it follows that U (a, b) < ∞ almost surely, and consequently,

P( lim inf_{n→∞} Mn ≤ a < b ≤ lim sup_{n→∞} Mn ) = 0.

Hence,

P( lim inf_{n→∞} Mn < lim sup_{n→∞} Mn ) = P( ∪_{a<b, a,b∈Q} { lim inf_{n→∞} Mn ≤ a < b ≤ lim sup_{n→∞} Mn } ) = 0.

Thus,

lim inf_{n→∞} Mn = lim sup_{n→∞} Mn almost surely,

which means that (Mn )n∈N0 converges almost surely to a random variable M∞ . Fatou's lemma implies

E[M∞+ ] = E[ lim_{n→∞} Mn+ ] ≤ lim inf_{n→∞} E[Mn+ ] < ∞ by assumption.

Similarly,

E[M∞− ] ≤ lim inf_{n→∞} E[Mn− ] = lim inf_{n→∞} ( E[Mn+ ] − E[Mn ] )
        ≤ sup_{n∈N0} E[Mn+ ] − E[M0 ] < ∞ by assumption,

where we used that E[Mn ] ≥ E[M0 ] because (Mn )n∈N0 is a submartingale.

Hence,

E[|M∞ |] = E[M∞+ ] + E[M∞− ] < ∞.

Corollary 5.23 A non-negative supermartingale (Mn )n∈N0 converges almost surely to a limit M∞ ∈ L1 with E[M∞ ] ≤ E[M0 ].
Proof. (−Mn )n∈N0 is a submartingale with (−Mn )+ = 0. Hence, the martingale convergence theorem implies

Mn → M∞ almost surely as n → ∞

and M∞ ∈ L1 . By Fatou's lemma,

E[M∞ ] = E[ lim_{n→∞} Mn ] ≤ lim inf_{n→∞} E[Mn ] ≤ E[M0 ] because (Mn )n∈N0 is a supermartingale.

Example 5.24 (Growth rate of branching processes) We saw that for a branching process (Zn ) with m ∈ (0, ∞), (Zn /m^n )n∈N0 is a martingale. Since Zn /m^n ≥ 0 ∀n, the martingale convergence theorem implies that

(Zn /m^n )n∈N0 converges almost surely.        (5.22)
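The almost sure convergence in (5.22) can be observed in simulation. A minimal sketch, assuming a Poisson offspring distribution with mean m (an arbitrary illustrative choice; the text only requires m ∈ (0, ∞)):

```python
import numpy as np

rng = np.random.default_rng(3)

m = 1.5          # offspring mean (illustrative)
Z = 1            # Z_0 = 1
for n in range(1, 21):
    # Z_n is the sum of Z_{n-1} i.i.d. offspring numbers
    Z = int(rng.poisson(lam=m, size=Z).sum()) if Z > 0 else 0
    print(n, Z, Z / m**n)     # Z_n / m^n settles down to a limit (or is absorbed at 0)
```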

Theorem 5.25 If m ≤ 1 and p1 ≠ 1, then

P(Zn = 0 for all n sufficiently large) = 1, (5.23)

i.e. with probability one, the population dies out.

Proof.
Case m < 1: P(Zn > 0) ≤ E[Zn 1{Zn >0} ] = E[Zn ] = m^n . Hence,

Σ_{n=0}^{∞} P(Zn > 0) ≤ Σ_{n=0}^{∞} m^n < ∞ because m < 1.

The first Borel-Cantelli lemma implies

P(Zn > 0 for infinitely many n) = 0


=⇒ P(Zn = 0 for all but finitely many n) = 1.

Case m = 1: In this case, (Zn )n∈N0 converges almost surely by (5.22). Since Zn takes only values in N0 , there exists a (random) index N0 such that (Zn )n≥N0 is constant. For k ∈ N0 , let

Ak = {(∃N0 ) (∀n ≥ N0 ) Zn = k}.

By the above, P(∪_{k=0}^{∞} Ak ) = 1. We prove by contradiction that P(A0 ) = 1.
Assume there exists k ≥ 1 with P(Ak ) > 0.

Case 1: p` = 1 for some `. Since m = 1, we can only have ` = 1. But this is excluded by assumption.

Case 2: ∃` ≠ j with p` > 0 and pj > 0.
One has ` ≠ 1 or j ≠ 1. Without loss of generality ` ≠ 1.
Set Bn = {Yn,1 = `, Yn,2 = `, . . . , Yn,k = `}. Then, Bn , n ∈ N, are independent with

P(Bn ) = Π_{i=1}^{k} P(Yn,i = `) = p`^k > 0.
Hence, Σ_{n=1}^{∞} P(Bn ) = ∞ and the second Borel-Cantelli lemma implies

P(Bn for infinitely many n) = 1.

On Ak , one has Zn = k for all sufficiently large n, and then on Ak ∩ Bn+1 ,

Zn+1 = Σ_{i=1}^{Zn} Yn+1,i = Σ_{i=1}^{k} Yn+1,i = k`,

which equals 0 ≠ k if ` = 0 and is > k if ` ≥ 2 (recall ` ≠ 1). Since P(Ak ∩ {Bn for infinitely many n}) = P(Ak ) > 0, this is a contradiction.
Thus P(Ak ) = 0 for all k ≥ 1 and we conclude P(A0 ) = 1.
Remark 5.26 Later, we will show that for m > 1
P(Zn > 0, ∀n) > 0, (5.24)
i.e. with positive probability, the population survives.
Definition 5.27 (Conditional probability) For A ∈ F, we define the conditional
probability of A given F0 by
P(A|F0 ) = E[1A |F0 ].
Example 5.28 (Polya’s urn) An urn contains a > 0 red and b > 0 blue balls. A ball is
drawn uniformly at random, its color is observed and it is put back into the urn together
with an additional ball of the same color. Let Rn denote the number of red balls drawn in
the first n drawings.
Claim: Mn = (a + Rn )/(a + b + n) = proportion of red balls after n drawings, n ∈ N0 , is a martingale.
Proof. Let Fn = σ(R0 , R1 , . . . , Rn ).
(M1) Mn is a function of Rn and hence Fn -measurable for all n.
(M2) 0 ≤ Mn ≤ 1 =⇒ Mn ∈ L1 .
(M3) We have that

E[Mn+1 |Fn ] = E[ (a + Rn+1 )/(a + b + n + 1) | Fn ]
            = E[ ((a + Rn + 1)/(a + b + n + 1)) 1{Rn+1 =Rn +1} + ((a + Rn )/(a + b + n + 1)) 1{Rn+1 =Rn } | Fn ]
            = ((a + Rn + 1)/(a + b + n + 1)) P(Rn+1 = Rn + 1|Fn ) + ((a + Rn )/(a + b + n + 1)) P(Rn+1 = Rn |Fn ),

since the factors in front of the indicators are Fn -measurable. Using P(Rn+1 = Rn + 1|Fn ) = Mn and P(Rn+1 = Rn |Fn ) = 1 − Mn , this equals

Mn ( (a + Rn + 1)/(a + b + n + 1) − (a + Rn )/(a + b + n + 1) ) + (a + Rn )/(a + b + n + 1)
= Mn /(a + b + n + 1) + (a + b + n)Mn /(a + b + n + 1) = Mn .

Hence, (Mn )n∈N0 is a martingale.
Since Mn ≥ 0 ∀n, the martingale convergence theorem (Corollary 5.23) implies

Mn → M∞ almost surely as n → ∞.        (5.25)

M∞ is the asymptotic proportion of red balls in the urn. One can show that

M∞ ∼ beta(a, b). (5.26)

In particular, for a = b = 1,

M∞ ∼ uniform(0, 1). (5.27)
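Both (5.25) and the Beta limit (5.26)/(5.27) are easy to observe numerically by simulating the urn directly. A minimal sketch (the parameters and the horizon are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def polya_final_proportion(a, b, n_draws):
    """Simulate n_draws steps of Polya's urn and return M_n = (a + R_n)/(a + b + n),
    the proportion of red balls in the urn after n_draws drawings."""
    red, blue = a, b
    for _ in range(n_draws):
        if rng.random() < red / (red + blue):
            red += 1           # a red ball is drawn and returned with one extra red ball
        else:
            blue += 1
    return red / (red + blue)

# For a = b = 1 the limit M_infinity is uniform(0, 1); compare moments with 1/2 and 1/12.
samples = [polya_final_proportion(1, 1, 2000) for _ in range(1000)]
print(np.mean(samples), np.var(samples))
```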

5.6 Uniform integrability and L1 -convergence for martingales
Recall Definition 1.61. In the same way,

Definition 5.29 A family (Xi )i∈I of random variables is called uniformly integrable if

lim_{K→∞} sup_{i∈I} E[ |Xi | 1{|Xi |>K} ] = 0.        (5.28)

Lemma 5.30 If a family (Xi )i∈I is bounded in Lp for some p > 1, i.e. supi∈I E [|Xi |p ] <
∞, the family is uniformly integrable.

Proof. Let p > 1 be such that

C := sup_{i∈I} E[|Xi |^p ] < ∞.

Let q > 1 satisfy 1/p + 1/q = 1. Then, by Hölder's inequality, one has for all i ∈ I and all K > 0

E[ |Xi | 1{|Xi |>K} ] ≤ E[|Xi |^p ]^{1/p} E[ 1{|Xi |>K} ]^{1/q} ≤ C^{1/p} P(|Xi | > K)^{1/q} .

Chebyshev’s inequality yields

E [|Xi |p ] C
P (|Xi | > K) ≤ p
≤ p.
K K
We conclude 1 1
  C p+q
lim sup E |Xi |1{|Xi |>K} ≤ lim = 0.
K→∞ i∈I K→∞ K p/q

Theorem 5.31 Let X ∈ L1 (Ω, F, P). Then, the family

{ E[X|G] : G is a σ-algebra with G ⊆ F }        (5.29)

is uniformly integrable.

Proof. Let ε > 0. There exists δ > 0 such that for all A ∈ F with P(A) ≤ δ one has

E[|X| 1A ] ≤ ε. (5.30)

Choose K0 > 0 such that

E[|X|] / K0 ≤ δ.        (5.31)

For all K ≥ K0 , Jensen's inequality for the conditional expectation yields

E[ |E[X|G]| 1{|E[X|G]|>K} ] ≤ E[ E[|X| |G] 1{E[|X||G]>K} ] = E[ |X| 1{E[|X||G]>K} ],        (5.32)

using |E[X|G]| ≤ E[|X| |G], the inclusion {|E[X|G]| > K} ⊆ {E[|X| |G] > K}, and, for the last equality, that {E[|X| |G] > K} ∈ G.

Since we would like to apply (5.30), we estimate the probability

P( E[|X| |G] > K ) ≤ (1/K) E[ E[|X| |G] ]     by Chebyshev's inequality
                  = (1/K) E[|X|] ≤ (1/K0 ) E[|X|]     for all K ≥ K0
                  ≤ δ     by (5.31).

Hence, we can apply (5.30) to (5.32) and obtain

(∀ε > 0) (∃K0 ) (∀K ≥ K0 ) (∀ σ-algebras G ⊆ F)   E[ |E[X|G]| 1{|E[X|G]|>K} ] ≤ ε.

Thus, we have shown uniform integrability.

Theorem 5.32 For a martingale (Mn )n∈N0 with respect to (Fn )n∈N0 , the following are
equivalent:

(a) (Mn )n∈N0 is uniformly integrable.

(b) (Mn )n∈N0 converges almost surely and in L1 .

(c) (Mn )n∈N0 converges in L1 .

(d) ∃M ∈ L1 such that Mn = E[M |Fn ] for all n.

(a) =⇒ (b) =⇒ (c) holds also if (Mn )n∈N0 is a submartingale.

Proof. (a) =⇒ (b): Let (Mn )n∈N0 be a uniformly integrable submartingale. By Re-
mark 1.62 (i), uniform integrability implies supn∈N0 E[|Mn |] < ∞. Applying the mar-
tingale convergence theorem 5.22, we obtain that Mn → M almost surely as n → ∞ for some M ∈ L1 . But, due to Theorem 1.63, if a sequence of random variables is uniformly
integrable and converges almost surely, then it converges in L1 .
(b) =⇒ (c): trivial.
(c) =⇒ (d): Assume (Mn )n∈N0 is a martingale and Mn → M in L1 as n → ∞. Fix n ∈ N0 .
Claim: Mn = E[M |Fn ].
By the definition of the conditional expectation, it suffices to prove the following:

(∀A ∈ Fn )   E[Mn 1A ] = E[M 1A ].

For all i > n, one has

E[Mi 1A ] = E[ E[Mi |Fn ] 1A ]     by def. of the conditional expectation because A ∈ Fn
          = E[Mn 1A ]              by the martingale property.        (5.33)

Furthermore,

|E[Mi 1A ] − E[M 1A ]| ≤ E[|Mi − M |] → 0 as i → ∞,

because Mi → M in L1 . Thus, lim_{i→∞} E[Mi 1A ] = E[M 1A ]. Hence, taking the limit i → ∞ in (5.33), we get

E[Mn 1A ] = E[M 1A ].

(d) =⇒ (a): follows from Theorem 5.31.


Theorem 5.33 Let (Fn )n∈N0 be a filtration and set F∞ = σ( ∪_{n=0}^{∞} Fn ). For X ∈ L1 , one has

E[X|Fn ] → E[X|F∞ ] almost surely and in L1 as n → ∞.        (5.34)

Proof. We know already (successive predictions!) that (Mn = E[X|Fn ])n∈N0 is a martin-
gale. By Theorem 5.31 it is uniformly integrable. By Theorem 5.32, (Mn )n∈N0 converges
almost surely and in L1 to a limit M ∈ L1 . In the proof of Theorem 5.32, (c) =⇒ (d), we
have shown that

Mn = E[M |Fn ] ∀n.

Hence,

E[X|Fn ] = E[M |Fn ] ∀n.

This means: (∀n) (∀A ∈ Fn )

∫_A X dP = ∫_A E[X|Fn ] dP = ∫_A E[M |Fn ] dP = ∫_A M dP.        (5.35)

The collection of sets A such that (5.35) holds is a Dynkin system which contains the π-system ∪_{n=0}^{∞} Fn . Hence, by Dynkin's lemma, (5.35) holds for all A ∈ F∞ . Since M is
F∞ -measurable, (5.35) implies

M = E[X|F∞ ] almost surely.
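Theorem 5.33 can be illustrated concretely on [0, 1) with Lebesgue measure and the dyadic filtration Fn generated by the intervals [k 2^{-n} , (k + 1) 2^{-n} ): there E[X|Fn ] replaces X by its average over the dyadic interval containing the point, and these averages converge to X. The following sketch is only a discretized illustration of this particular example (the function X and the grid are arbitrary choices, not from the text):

```python
import numpy as np

def conditional_expectation_dyadic(x_values, n):
    """E[X | F_n] on [0, 1) where F_n is generated by the 2^n dyadic intervals:
    replace X on each interval by its average (a discretized Lebesgue average)."""
    pieces = np.array_split(x_values, 2**n)              # one block per dyadic interval
    return np.concatenate([np.full(len(p), p.mean()) for p in pieces])

grid = np.linspace(0.0, 1.0, 2**10, endpoint=False)      # grid of points in [0, 1)
X = np.sin(2 * np.pi * grid) + grid**2                    # an integrable "random variable"

for n in (1, 3, 5, 8, 10):
    Mn = conditional_expectation_dyadic(X, n)
    print(n, float(np.max(np.abs(Mn - X))))               # distance to X shrinks with n
```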

5.7 Optional stopping
If (Mn )n∈N0 is a submartingale, then n 7→ E[Mn ] is nondecreasing. This means

E[M0 ] ≤ E[Mn ] (5.36)

for all n ≥ 0. What happens if T is a stopping time?

Theorem 5.34 Let (Mn )n∈N0 be a submartingale and let T be a bounded stopping time
(i.e. ∃N ∈ N with P(T ≤ N ) = 1). Then, one has

E[M0 ] ≤ E[MT ] ≤ E[MN ]. (5.37)

Proof. By Theorem 5.20, (MT ∧n )n∈N0 is a submartingale. Consequently, n 7→ E[MT ∧n ]


is nondecreasing and we know that

E[M0 ] = E[MT ∧0 ] ≤ E[MT ∧N ] = E[MT ].

To prove the second inequality, consider Hn = 1{T ≤n−1} , n ∈ N. Since T is a stopping time, (Hn )n∈N is previsible. Furthermore, 0 ≤ Hn ≤ 1 for all n. One has

(H · M )n = Σ_{i=1}^{n} Hi (Mi − Mi−1 ) = Σ_{i=1}^{n} 1{T ≤i−1} Mi − Σ_{i=0}^{n−1} 1{T ≤i} Mi
          = 1{T ≤n−1} Mn − Σ_{i=1}^{n−1} 1{T =i} Mi − 1{T =0} M0
          = Mn − 1{T ≥n} Mn − Σ_{i=0}^{n−1} 1{T =i} Mi     (since 1{T ≤n−1} = 1 − 1{T ≥n} and 1{T ≤0} = 1{T =0} )
          = Mn − MT ∧n

for all n ∈ N0 . By Theorem 5.16, (H · M ) is a submartingale. We conclude that


 
E[MN ] − E[MT ] = E[MN ] − E[MT ∧N ] = E[(H · M )N ] ≥ E[(H · M )0 ] = 0.

Hence, for a submartingale (Mn )n∈N0 , we know the following for all n ∈ N0 :

• E[Mi ] ≤ E[Mn ] ∀i ∈ N0 with i ≤ n;

• E[M0 ] ≤ E[MT ] ≤ E[MN ] ∀ stopping times T with P(T ≤ N ) = 1 (Theorem 5.34).

We want to generalize this and allow in the last inequality a stopping time instead of the
deterministic time N .

Theorem 5.35 Let (Mn )n∈N0 be adapted with Mn ∈ L1 for all n. Then, the following
holds:

(Mn )n∈N0 is a martingale ⇐⇒ E[MT ] = E[M0 ] for all bounded stopping times T.

Proof.

“=⇒”: E[MT ] = E[M0 ] for any bounded stopping time T by Theorem 5.34.

“⇐=”: Claim: E[Mn+1 |Fn ] = Mn for all n ∈ N0 .


By the definition of the conditional expectation, it suffices to show that for all
A ∈ Fn

E[Mn+1 1A ] = E[Mn 1A ].

Set T = n1A + (n + 1)1Ac . Then, for all k ∈ N0 ,

{T = k} = ∅ if k ∈ N0 \ {n, n + 1},   {T = k} = A if k = n,   {T = k} = Ac if k = n + 1,

and in each case {T = k} ∈ Fk .

Thus, T is a bounded stopping time, and it follows from our assumption that

E[M0 ] = E[MT ] = E[Mn 1A ] + E[Mn+1 1Ac ] = E[Mn 1A ] + E[Mn+1 ] − E[Mn+1 1A ],

where E[Mn+1 ] = E[M0 ] because T ≡ n + 1 is a bounded stopping time. This is equivalent to

E[Mn+1 1A ] = E[Mn 1A ].

Theorem 5.36 Let T be a stopping time. Assume

(a) (Mn )n∈N0 is a uniformly integrable submartingale


or

(b) T < ∞ almost surely, E[|MT |] < ∞, and (Mn 1{T >n} )n∈N0 is uniformly integrable.

Then, (MT ∧n )n∈N0 is uniformly integrable.

Proof.

(a) (Mn+ )n∈N0 is a submartingale. Since T ∧ n is a bounded stopping time, Theorem 5.34 gives

E[(MT ∧n )+ ] ≤ E[Mn+ ].

Since (Mn )n∈N0 is uniformly integrable, it is L1 -bounded and hence, (Mn+ )n∈N0 is L1 -bounded as well. Consequently,

sup_{n∈N0} E[(MT ∧n )+ ] ≤ sup_{n∈N0} E[Mn+ ] < ∞.

Since (MT ∧n )n∈N0 is a submartingale, the martingale convergence theorem implies that

MT ∧n → M̃ almost surely as n → ∞, with M̃ ∈ L1 .

(Mn )n∈N0 converges almost surely to a limit M∞ ∈ L1 by the martingale convergence theorem. Define MT = Mn on {T = n} and MT = M∞ on {T = ∞}. It follows that

MT = M̃ ∈ L1 .

For K > 0, we have

E[ |MT ∧n | 1{|MT ∧n |>K} ]
  = E[ |MT | 1{|MT |>K} 1{T ≤n} ] + E[ |Mn | 1{|Mn |>K} 1{T >n} ]        (5.38)
  ≤ E[ |MT | 1{|MT |>K} ] + E[ |Mn | 1{|Mn |>K} ].

Taking the supremum over n yields

sup_{n∈N0} E[ |MT ∧n | 1{|MT ∧n |>K} ] ≤ E[ |MT | 1{|MT |>K} ] + sup_{n∈N0} E[ |Mn | 1{|Mn |>K} ].

Since E[|MT |] < ∞, the first term goes to 0 as K → ∞. Since (Mn )n∈N0 is uniformly
integrable, the second term goes to 0 as K → ∞. This shows that (MT ∧n )n∈N0 is
uniformly integrable.

(b) follows from (5.38).

Theorem 5.37 Let (Mn )n∈N0 be a uniformly integrable submartingale, and let M∞ =
limn→∞ Mn . Then, for any stopping time T ≤ ∞, one has

E[M0 ] ≤ E[MT ] ≤ E[M∞ ]. (5.39)

Proof. By Theorem 5.34, we have

E[M0 ] ≤ E[MT ∧n ] ≤ E[Mn ] ∀n ∈ N0 . (5.40)

(MT ∧n )n∈N0 is uniformly integrable by Theorem 5.36. Theorem 5.32 yields

Mn → M∞ in L1 and MT ∧n → MT in L1 as n → ∞.

Hence, we can take the limit as n → ∞ in (5.40) and the claim follows.

Definition 5.38 For a stopping time T , let

FT = { A ∈ F : A ∩ {T = n} ∈ Fn ∀n ∈ N0 }        (5.41)

be the σ-algebra of the information up to time T .

Lemma 5.39 FT is a σ-algebra.

Proof.
• Ω ∩ {T = n} = {T = n} ∈ Fn ∀n ∈ N0 . Hence Ω ∈ FT .

• Let A ∈ FT . For n ∈ N0 , one has

Ac ∩ {T = n} = (A ∪ {T ≠ n})c = ((A ∩ {T = n}) ∪ {T ≠ n})c ∈ Fn ,

since A ∩ {T = n} ∈ Fn (because A ∈ FT ) and {T ≠ n} ∈ Fn . Hence Ac ∈ FT .

• Let Ai ∈ FT , i ≥ 1. For n ∈ N0 , one has

( ∪_{i=1}^{∞} Ai ) ∩ {T = n} = ∪_{i=1}^{∞} ( Ai ∩ {T = n} ) ∈ Fn .

Hence ∪_{i=1}^{∞} Ai ∈ FT .

Example 5.40 of an event in FT :

{T ≤ k} ∈ FT for all k ∈ N0 .

Proof.
 
{T ≤ k} ∩ {T = n} = {T = n} if n ≤ k, and = ∅ if n > k; in both cases it is in Fn , ∀n ∈ N0 .

Theorem 5.41 (Optional stopping theorem) Let S ≤ T < ∞ be stopping times,
and let (MT ∧n )n∈N0 be a uniformly integrable submartingale. Then, one has
E[MS ] ≤ E[MT ] and MS ≤ E[MT |FS ]. (5.42)
Proof.
• We apply Theorem 5.37 with M̃n = MT ∧n and T̃ = S:

E[M̃0 ] ≤ E[M̃T̃ ] ≤ E[M̃∞ ],

where E[M̃0 ] = E[M0 ], E[M̃T̃ ] = E[MS ] and E[M̃∞ ] = E[MT ]. Hence, E[MS ] ≤ E[MT ].


• Let A ∈ FS and let U = S1A + T 1Ac . U is a stopping time, because for all n ∈ N0

{U = n} = (A ∩ {S = n}) ∪ (Ac ∩ {T = n})
        = (A ∩ {S = n}) ∪ ( ( ∪_{i=0}^{n} (Ac ∩ {S = i}) ) ∩ {T = n} ) ∈ Fn ,

since A ∩ {S = n} ∈ Fn and Ac ∩ {S = i} ∈ Fi ⊆ Fn (because A ∈ FS ) and {T = n} ∈ Fn .

Since U ≤ T , the first part of the theorem implies

E[MU ] ≤ E[MT ],   i.e.   E[MS 1A ] + E[MT 1Ac ] ≤ E[MT 1A ] + E[MT 1Ac ],

which is equivalent to E[MS 1A ] ≤ E[MT 1A ].
We can rewrite this as follows:

∫_A MS dP ≤ ∫_A MT dP = ∫_A E[MT |FS ] dP

by the definition of the conditional expectation. We apply this inequality with Aε = {MS − E[MT |FS ] > ε} ∈ FS . This yields

εP(Aε ) ≤ ∫_{Aε} ( MS − E[MT |FS ] ) dP ≤ 0
=⇒ P(Aε ) = 0 ∀ε > 0
=⇒ MS ≤ E[MT |FS ] almost surely.

Example 5.42 (Random walk on Z) Let p ∈ (0, 1) and let Yi , i ≥ 1, be independent and identically distributed with P(Yi = 1) = p, P(Yi = −1) = 1 − p. We set

S0 = x ∈ Z,   Sn = x + Σ_{i=1}^{n} Yi ,   n ∈ N.

[Figure: the integers . . . , −3, −2, −1, 0, 1, 2, 3, . . . with a step to the right taken with probability p and a step to the left with probability 1 − p.]

Lemma 5.43 (Mn = ((1 − p)/p)^{Sn} )n∈N0 is a martingale.

Proof. We saw already that this is true since (Mn ) falls in the class of harmonic functions
of Markov chains. We include a direct proof for the convenience of the reader. For n ∈ N0 ,
set Fn = σ(Yi , 1 ≤ i ≤ n).

(M1) Mn is Fn -measurable.

(M2) Since |Sn | ≤ |x| + n almost surely, E[|Mn |] < ∞.

(M3) Using Sn+1 = Sn + Yn+1 , we calculate

E[Mn+1 |Fn ] = E[ ((1 − p)/p)^{Sn+1} | Fn ] = ((1 − p)/p)^{Sn} E[ ((1 − p)/p)^{Yn+1} | Fn ]
            = Mn E[ ((1 − p)/p)^{Yn+1} ]     (Yn+1 is independent of Fn )
            = Mn ( p · (1 − p)/p + (1 − p) · p/(1 − p) ) = Mn .

We write Px and Ex instead of P and E to indicate that the random walk starts in x.
For y ∈ Z, set τy = inf{n ∈ N0 : Sn = y}.
Theorem 5.44 For p ≠ 1/2 and a, b, x ∈ Z with a < x < b, one has

Px (τa < τb ) = ( ((1 − p)/p)^x − ((1 − p)/p)^b ) / ( ((1 − p)/p)^a − ((1 − p)/p)^b ).        (5.43)

Proof. Let a < x < b, and let T = τa ∧ τb = inf{n ∈ N0 : Sn ∈ {a, b}}. T is a stopping time.
Claim: Px (T < ∞) = 1.
The number of points in (a, b) ∩ Z equals (b − 1) − (a + 1) + 1 = b − a − 1. Let

Ak = {Yi = 1 ∀i ∈ {k, k + 1, . . . , k + b − a − 1}}

be the event that the random walker takes b − a consecutive steps to the right starting at time k. Then, A`(b−a)+1 , ` ∈ N0 , are independent with

Px (A`(b−a)+1 ) = p^{b−a} > 0   ⇒   Σ_{`=0}^{∞} Px (A`(b−a)+1 ) = ∞.

Hence, by the second Borel-Cantelli lemma,

Px (A`(b−a)+1 for infinitely many l) = 1.

If A`(b−a)+1 holds and S0 = x ∈ (a, b), the random walker leaves the interval (a, b) and
T < ∞. Thus,

1 = Px (A`(b−a)+1 for infinitely many `) ≤ Px (T < ∞).


Consider the martingale (Mn = ((1 − p)/p)^{Sn} )n∈N0 . Since S0 = x ∈ [a, b], the martingale (MT ∧n )n∈N0 is bounded. In particular, it is uniformly integrable. Applying the optional stopping theorem, we obtain Ex [M0 ] = Ex [MT ], i.e.

((1 − p)/p)^x = Ex [ ((1 − p)/p)^{ST} ] = ((1 − p)/p)^a Px (ST = a) + ((1 − p)/p)^b Px (ST = b),

where Px (ST = b) = 1 − Px (ST = a).

Hence,

Px (τa < τb ) = Px (ST = a) = ( ((1 − p)/p)^x − ((1 − p)/p)^b ) / ( ((1 − p)/p)^a − ((1 − p)/p)^b ).

Fix a and x. Clearly, {τa < τb } ⊆ {τa < τb+1 } for all b. Hence, as b ↑ ∞,

{τa < τb } ↑ ∪_{b=1}^{∞} {τa < τb } = {τa < ∞}.

Using the σ-continuity of P, we get

lim_{b→∞} Px (τa < τb ) = Px (τa < ∞).

Case p > 1/2: In this case, 0 < (1 − p)/p < 1 and hence,

lim_{b→∞} ((1 − p)/p)^b = 0.

Consequently,

Px (τa < ∞) = ((1 − p)/p)^{x−a} ∈ (0, 1).

Note that by the strong law of large numbers,

lim_{n→∞} Sn /n = E[Y1 ] = p − (1 − p) = 2p − 1 > 0   =⇒   lim_{n→∞} Sn = ∞ almost surely.
Case p < 1/2: In this case, (1 − p)/p > 1 and hence,

lim_{b→∞} ((1 − p)/p)^b = ∞.

Consequently,

Px (τa < ∞) = lim_{b→∞} Px (τa < τb ) = 1.

By the strong law of large numbers,

lim_{n→∞} Sn /n = E[Y1 ] = 2p − 1 < 0   =⇒   lim_{n→∞} Sn = −∞ almost surely.
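Formula (5.43) can be checked against a direct simulation of the walk. A minimal sketch (the parameters p, a, x, b and the number of trials are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def exits_at_a(x, a, b, p):
    """Run the walk started at x until it hits a or b; return True if it hits a first."""
    s = x
    while a < s < b:
        s += 1 if rng.random() < p else -1
    return s == a

p, a, x, b = 0.6, -3, 0, 4
trials = 20000
empirical = np.mean([exits_at_a(x, a, b, p) for _ in range(trials)])

r = (1 - p) / p
theoretical = (r**x - r**b) / (r**a - r**b)
print("simulated:", empirical, " formula (5.43):", theoretical)
```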

Example 5.45 (Symmetric random walk on Z) Let Yi , i ≥ 1, be independent and identically distributed with P(Yi = 1) = 1/2 = P(Yi = −1). We set

S0 = 0,   Sn = Σ_{i=1}^{n} Yi ,   n ∈ N.

Theorem 5.46 Consider symmetric random walk on Z. Let a, b ∈ Z with a < 0 < b, and set T = τa ∧ τb . One has

P(τa < τb ) = b/(b − a)   and   E[T ] = −ab.        (5.44)
E[T ] is the expected amount of time which the random walker needs to leave the interval (a, b). Since a < 0 < b, −ab > 0. Note that

P(τa < ∞) = lim_{b→∞} P(τa < τb ) = lim_{b→∞} b/(b − a) = 1.

By the strong law of large numbers,

lim_{n→∞} Sn /n = E[Y1 ] = 0.

We will also show that

lim inf_{n→∞} Sn = −∞,   lim sup_{n→∞} Sn = ∞   almost surely.        (5.45)

Proof of Theorem 5.46.

• T is a stopping time with P(T < ∞) = 1. Since E[Yi ] = 0, (Sn )n∈N0 is a martingale. 0 ∈ (a, b) implies a ≤ ST ∧n ≤ b for all n ∈ N0 . Hence, (ST ∧n )n∈N0 is uniformly integrable and the optional stopping theorem implies

E[S0 ] = E[ST ]   ⇐⇒   0 = a P(ST = a) + b P(ST = b) = a P(τa < τb ) + b (1 − P(τa < τb ))
   =⇒   P(τa < τb ) = b/(b − a).

• (Sn^2 − n)n∈N0 is a martingale with respect to Fn = σ(Yi , 1 ≤ i ≤ n):

(M1) Sn^2 − n is Fn -measurable.
(M2) E[|Sn^2 − n|] < ∞.
(M3) E[Sn+1^2 − (n + 1) | Fn ] = E[(Sn + Yn+1 )^2 − (n + 1) | Fn ]
                             = E[Sn^2 + 2Sn Yn+1 + Yn+1^2 − n − 1 | Fn ]
                             = Sn^2 + 2Sn E[Yn+1 ] + E[Yn+1^2 ] − n − 1 = Sn^2 − n,
     since E[Yn+1 ] = 0 and E[Yn+1^2 ] = 1.

• Fix N ∈ N, and set T̃ = T ∧ N . Since S0 ∈ (a, b), we have

|(ST̃ ∧n )^2 − T̃ ∧ n| ≤ max{ (ST̃ ∧n )^2 , T̃ ∧ n } ≤ max{ a^2 , b^2 , N }

for all n ∈ N0 . Hence, the martingale ((ST̃ ∧n )^2 − T̃ ∧ n)n∈N0 is bounded and therefore uniformly integrable. The optional stopping theorem implies

0 = E[(ST̃ )^2 − T̃ ] = E[(ST ∧N )^2 ] − E[T ∧ N ].

By the bounded convergence theorem, E [ST2 ∧N ] −−−→ E [ST2 ]. By the monotone


N →∞
convergence theorem, E[T ∧ N ] −−−→ E[T ]. We conclude that
N →∞

E[T ] = E ST2 = a2 P(ST = a) + b2 P(ST = b)


 

ab(a − b)
 
2 b 2 a
=a +b − = = −ab.
b−a b−a b−a

5.8 Backwards martingales
We want to consider martingales with index set −N0 = {..., −2, −1, 0}. Recall: σ-algebras
Fn , n ∈ −N0 , form a filtration if · · · ⊆ F−n ⊆ F−n+1 ⊆ · · · ⊆ F−1 ⊆ F0 .
Definition 5.47 A martingale (Mn )n∈−N0 with index set −N0 is called a backwards mar-
tingale. In other words, a backwards martingale is a sequence (Mn )n∈−N0 of random
variables with
(M1) Mn is measurable with respect to Fn , ∀n ∈ −N0 .
(M2) Mn ∈ L1 , ∀n ∈ −N0 .
(M3) E[M−n+1 |F−n ] = M−n ∀n ∈ N.
Remark 5.48 Every backwards martingale is uniformly integrable because
M−n = E[M0 |F−n ] for all n ∈ N.
Theorem 5.49 Let (Mn )n∈−N0 be a backwards martingale with respect to (Fn )n∈−N0 .
Then, the limit M−∞ = limn→∞ M−n exists almost surely and in L1 and satisfies

M−∞ = E[M0 |F−∞ ]   with   F−∞ = ∩_{n=0}^{∞} F−n .

Proof. For n ∈ N and a, b ∈ R with a < b let U−n (a, b) denote the number of upcrossings of (Mi )i∈[−n,0] of the interval [a, b]. The upcrossing inequality yields

(b − a)E[U−n (a, b)] ≤ E[(M0 − a)+ ] − E[(M−n − a)+ ] ≤ E[(M0 − a)+ ].

Since U−n (a, b) ↑ U−∞ (a, b) = number of upcrossings of (Mi )i∈−N0 of the interval [a, b], the monotone convergence theorem implies

E[U−∞ (a, b)] ≤ (1/(b − a)) E[(M0 − a)+ ] < ∞.
Hence, U−∞ (a, b) < ∞ almost surely. The same argument as in the proof of the martingale
convergence theorem yields that (Mn )n∈−N0 converges almost surely to a limit M−∞ ∈
L1 . Convergence in L1 follows as in the proof of Theorem 5.32 (a)⇒(b) using uniform
integrability.
Finally, let A ∈ ∩_{n=0}^{∞} F−n . Then, for every n ∈ N, A ∈ F−n and by the martingale property, we obtain

∫_A M−n dP = ∫_A E[M0 |F−n ] dP = ∫_A M0 dP.

Taking the limit n → ∞ (using the L1 -convergence) we obtain

∫_A M−∞ dP = ∫_A M0 dP.

Since M−∞ is F−∞ -measurable, it follows that M−∞ = E[M0 |F−∞ ].

Example 5.50 Let Xi , i ≥ 1, be independent and identically distributed with E[|X1 |] < ∞. For n ∈ N, we set Sn = Σ_{i=1}^{n} Xi and

F−n = σ(Sk , k ≥ n).

Claim: (M−n = (1/n) Sn )n∈N is a backwards martingale with respect to (F−n )n∈N .

(M1) M−n is F−n -measurable.

(M2) Since Xi ∈ L1 , M−n ∈ L1 .

(M3) Fix n ∈ N. One has

E[M−n |F−(n+1) ] = (1/n) E[Sn |F−(n+1) ] = (1/n) E[Sn+1 − Xn+1 |F−(n+1) ]
                = (1/n) Sn+1 − (1/n) E[Xn+1 |F−(n+1) ],

because Sn+1 is F−(n+1) -measurable. Note that

F−(n+1) = σ(Sk , k ≥ n + 1) = σ(Sn+1 , Xk , k ≥ n + 2).

Hence, by symmetry, one has for all 1 ≤ i, j ≤ n + 1,

E[Xi |F−n−1 ] = E[Xj |F−n−1 ].

Consequently,

E[Xn+1 |F−n−1 ] = (1/(n + 1)) Σ_{i=1}^{n+1} E[Xi |F−n−1 ] = (1/(n + 1)) E[Sn+1 |F−n−1 ] = (1/(n + 1)) Sn+1 .

We conclude

E[M−n |F−(n+1) ] = (1/n) Sn+1 − (1/(n(n + 1))) Sn+1 = (1/(n + 1)) Sn+1 = M−n−1 .

By the convergence theorem for backwards martingales, it follows that ((1/n) Sn )n∈N converges almost surely and in L1 . The limit is measurable with respect to the tail σ-algebra of the Xi 's, which is trivial by Kolmogorov's 0-1-law. Hence, M−∞ = lim_{n→∞} (1/n) Sn is constant almost surely and the constant is given by E[M−∞ ]. Using the L1 -convergence, we find

E[M−∞ ] = lim_{n→∞} (1/n) E[Sn ] = E[X1 ].

This proves the strong law of large numbers:

lim_{n→∞} (1/n) Sn = E[X1 ] almost surely.

5.9 Polya’s urn
Consider Polya’s urn from Example 5.28. An urn contains a > 0 red and b > 0 blue balls.
A ball is drawn uniformly at random, its color is observed and it is put back into the urn
together with an additional ball of the same color.
Let Xi be the color of the ball drawn at time i ≥ 1. Clearly, Xi is a random variable
with values in {R, B}. Note that the stochastic process (Xi )i≥1 is not memoryless: The
more red balls we have drawn up to time n, the larger the probability to draw a red ball
at time n + 1.
Assume a = b = 2. In this special case we calculate some probabilities:

P(X1 = B, X2 = B, X3 = R) = (2/4) · (3/5) · (2/6) = 1/10,
P(X1 = B, X2 = R, X3 = B) = (2/4) · (2/5) · (3/6) = 1/10.
In both cases, the same colors are drawn, but in different order. Nevertheless, the two
probabilities coincide. This is not a coincidence, as the following theorem shows.
Lemma 5.51 Let n ∈ N, xi ∈ {R, B}, 1 ≤ i ≤ n, and let k := |{i ∈ {1, . . . , n} : xi = R}|
be the number of xi being equal to R. Then, one has
P(Xi = xi ∀1 ≤ i ≤ n) = ca,b · (a + k − 1)!(b + n − k − 1)! / (a + b + n − 1)!

with the constant

ca,b := (a + b − 1)! / ((a − 1)!(b − 1)!).
In particular, the probability to observe a given sequence of colors depends only on k
(=total number of red balls drawn) and n − k (=total number of blue balls drawn). It does
not depend on the order in which the balls are drawn. This is called exchangeability.
Proof. One has

P(Xi = xi ∀1 ≤ i ≤ n) = ( Π_{i=0}^{k−1} (a + i) · Π_{i=0}^{n−k−1} (b + i) ) / Π_{i=0}^{n−1} (a + b + i).

This is the claimed expression.
Consider the infinite product space Ω = {B, R}^N , F = {B, R}^{⊗N} . For p ∈ (0, 1), let

Pp = ⊗_{i∈N} ( pδR + (1 − p)δB )

denote the infinite product measure. For i ∈ N, let


Yi : Ω → {B, R}
be the projection to the i-th coordinate. For any fixed p, Yi , i ≥ 1, are independent and
identically distributed with
Pp (Yi = R) = p, Pp (Yi = B) = 1 − p.

Theorem 5.52 For n ∈ N, xi ∈ {R, B}, 1 ≤ i ≤ n, and k := |{i ∈ {1, . . . , n} : xi = R}|,
one has
P(Xi = xi ∀1 ≤ i ≤ n) = ∫_0^1 p^k (1 − p)^{n−k} ϕa,b (p) dp = ∫_0^1 Pp (Yi = xi ∀1 ≤ i ≤ n) ϕa,b (p) dp,

where ϕa,b denotes the density of the beta distribution with parameters a and b:

ϕa,b (p) = ca,b · pa−1 (1 − p)b−1 , p ∈ (0, 1).

Proof. The second equality is clear. To prove the first equality we calculate

∫_0^1 p^k (1 − p)^{n−k} ca,b · p^{a−1} (1 − p)^{b−1} dp = ca,b ∫_0^1 p^{k+a−1} (1 − p)^{n−k+b−1} dp
  = ca,b / ca+k,b+n−k = ca,b · (a + k − 1)!(b + n − k − 1)! / (a + b + n − 1)!
  = P(Xi = xi ∀1 ≤ i ≤ n)

by Lemma 5.51.
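For integer a, b the identity of Theorem 5.52 can be verified numerically: the closed form from Lemma 5.51 agrees with the Beta-mixture integral. A minimal sketch (the midpoint-rule integration and the chosen values are illustrative):

```python
import numpy as np
from math import factorial

def prob_sequence(a, b, n, k):
    """P(X_i = x_i for all i <= n) for a fixed colour sequence with k red draws (Lemma 5.51)."""
    c_ab = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))
    return c_ab * factorial(a + k - 1) * factorial(b + n - k - 1) / factorial(a + b + n - 1)

def prob_sequence_mixture(a, b, n, k, grid=200_000):
    """The same probability via the integral of p^k (1-p)^(n-k) against the Beta(a, b) density."""
    p = (np.arange(grid) + 0.5) / grid                    # midpoint rule on (0, 1)
    c_ab = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))
    density = c_ab * p**(a - 1) * (1 - p)**(b - 1)
    return float(np.mean(p**k * (1 - p)**(n - k) * density))

# a = b = 2, three draws, one of them red: both expressions give 1/10 as computed above.
print(prob_sequence(2, 2, 3, 1), prob_sequence_mixture(2, 2, 3, 1))
```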

Theorem 5.53 (De Finetti’s theorem for the Polya urn)


For all events A ∈ {B, R}⊗N one has
P((Xi )i≥1 ∈ A) = ∫_0^1 Pp ((Yi )i≥1 ∈ A) ϕa,b (p) dp.

Thus, to calculate the probability of an event for the sequence of drawings (Xi )i≥1
from the Polya urn, we can calculate the probability of this event for every i.i.d. sequence
under Pp and then we take the average of all these probabilities with respect to a beta
distribution. One says: (Xi )i≥1 is a mixture of i.i.d. sequences.
De Finetti’s theorem is useful in Bayesian statistics. Suppose your data is an i.i.d.
sequence sampled from Pp with unknown p. In the Bayesian approach, one puts a prior
distribution on the unknown p, for instance a beta distribution, and assumes that the data is sampled from the measure

A ↦ ∫_0^1 Pp ((Yi )i≥1 ∈ A) ϕa,b (p) dp.

The interpretation of this measure in terms of the Polya urn simplifies many calculations.
Proof of Theorem 5.53. The events {ω = (ωi )i≥1 ∈ Ω : ωi = xi ∀1 ≤ i ≤ n} are stable
under intersections and generate the product σ-algebra. Hence, the claim follows from
Theorem 5.52.

Consider the following random variables:

Kn := |{i ∈ {1, . . . , n} : Xi = R}| = number of red balls drawn up to time n,
αn := Kn /n = fraction of red balls drawn up to time n.

Note that

α∞ := lim_{n→∞} αn = lim_{n→∞} Kn /n = lim_{n→∞} (a + Kn )/(a + b + n).

Using a martingale argument, we have shown that the last limit exists almost surely.
Hence, limn→∞ αn exists almost surely.

Theorem 5.54 The limit α∞ is a Beta(a, b) distributed random variable.

Proof. For A ∈ B(R), one has

P(α∞ ∈ A) = P( lim_{n→∞} αn ∈ A ) = P( lim_{n→∞} (1/n) Σ_{i=1}^{n} 1{R} (Xi ) ∈ A )
          = ∫_0^1 Pp ( lim_{n→∞} (1/n) Σ_{i=1}^{n} 1{R} (Yi ) ∈ A ) ϕa,b (p) dp.

The strong law of large numbers implies

Pp ( lim_{n→∞} (1/n) Σ_{i=1}^{n} 1{R} (Yi ) ∈ A ) = 1 if p ∈ A, and = 0 otherwise.

We conclude

P(α∞ ∈ A) = ∫_A ϕa,b (p) dp.

References
[Bau90] Heinz Bauer. Maß- und Integrationstheorie. (Measure and integration theory).
Berlin etc.: Walter de Gruyter., 1990.

[BK10] Martin Brokate and Götz Kersting. Maß und Integral. Birkhäuser, 2010.

[Dur05] Richard Durrett. Probability: theory and examples. Duxbury Press, third edition,
2005.

[Kle06] Achim Klenke. Wahrscheinlichkeitstheorie. Springer, 2006.

