Skript 2022
Contents
1 Basics
  1.1 Probability spaces
  1.2 Random variables
  1.3 Expectation
  1.4 Inequalities
  1.5 Independence
  1.6 Construction of independent random variables
    1.6.1 Finitely many
    1.6.2 Infinitely many
  1.7 Convergence of sequences of random variables
  1.8 Uniform integrability
5 Martingales
  5.1 Definition and examples
    5.1.1 Sums of independent centered random variables
    5.1.2 Successive predictions
    5.1.3 Radon-Nikodym derivatives on increasing sequences of σ-fields
    5.1.4 Harmonic functions of Markov chains
    5.1.5 Growth rate of branching processes
  5.2 Supermartingales and Submartingales
  5.3 Stopping times
  5.4 Stopped martingales
  5.5 The martingale convergence theorem
  5.6 Uniform integrability and L1-convergence for martingales
  5.7 Optional stopping
  5.8 Backwards martingales
  5.9 Polya's urn
Preface. Large parts of these notes are very close to the books [Dur05] and [Kle06].
These notes are written for the students of the course “Probability theory” (MA 2409) at
TUM. They do not replace any existing books. Please do not distribute them.
Preliminaries.
Motivation. Probability theory is a very active area of research. It has many connec-
tions with other fields of mathematics and also many applications. Random phenomena
are omnipresent. Stochastic models are used in many areas, e.g. physics, biology, eco-
nomics, sociology, etc. Think for instance about models for the spread of a pandemic -
these are stochastic processes and a building block in such models is the branching process
which we will also encounter in this lecture course.
1 Basics
1.1 Probability spaces
The material for this section is taken from Chapter 1 in [Dur05].
(i) Ω ∈ F
(ii) A ∈ F ⇒ A^c ∈ F
(iii) A_1, A_2, ... ∈ F ⇒ ⋃_{i=1}^∞ A_i ∈ F
• P[Ω] = 1
• P[⋃_{i∈I} A_i] = ∑_{i∈I} P[A_i] for any collection (A_i)_{i∈I} of pairwise disjoint elements of F, where I is a finite or countably infinite index set.
The sets A ⊆ Ω, A ∈ F are called events.
Special cases:
• Ω is finite and p(ω) = |Ω|−1 for all ω, i.e. P is the uniform distribution on Ω.
Show that σ(G) is indeed a σ-algebra, by checking (i), (ii) and (iii) in Definition 1.1.
σ(G) is called the “σ-algebra generated by G”. Show that σ(G) is the smallest σ-algebra
containing G, i.e. if H is a σ-algebra and G ⊆ H, then σ(G) ⊆ H.
Example 1.4 (Probability measures with density with respect to the Lebesgue
measure)
• Let Ω = R.
Special cases:
• f (x) = 1[0,1] (x) uniform distribution; here and in the following, 1A denotes the
indicator function of the set A, i.e. 1A (x) = 1 if x ∈ A and 1A (x) = 0 otherwise.
• f (x) = e−x 1[0,∞) (x) exponential distribution (modelling for instance the lifetime of
a lightbulb)
• f(x) = (1/√(2π)) e^{−x²/2} standard normal distribution (arising in the central limit theorem).
1.2 Random variables
Definition 1.6 A measurable space (Ω, F) consists of a set Ω ≠ ∅ and a σ-algebra F on Ω. Let (Ω, F) and (S, S) be measurable spaces. A function X : Ω → S is measurable if for any A ∈ S, X^{−1}(A) := {ω ∈ Ω : X(ω) ∈ A} ∈ F. A measurable function X : Ω → S is called a random element of S, or an S-valued random variable. For a probability space (Ω, F, P) and an S-valued random variable X, the distribution of X is the following probability measure µ_X on (S, S):
µ_X(A) := P[X ∈ A] = P[X^{−1}(A)], A ∈ S.
Special cases:
Uniqueness: Let µ and ν be two probability measures on (R, B(R)) satisfying (1.6). Then,
Since E is a π-system (i.e. A, B ∈ E ⇒ A ∩ B ∈ E) generating B(R), µ = ν follows from
the uniqueness theorem for probability measures, see later.
Since every distribution function has the required properties, the last claim follows.
Definition 1.9 If X and Y have the same distribution, we say that they are equal in distribution and we write X =^d Y.
Note: X =^d Y does not imply X(ω) = Y(ω) for all ω. Consider e.g. Ω = [0, 1], F = B([0, 1]), P = λ, X = 1_{[0,1/2)}, Y = 1_{[1/2,1]}. Then, X(ω) ≠ Y(ω) for all ω. But X and Y have the same distribution, namely Bernoulli(1/2).
Theorem 1.10 Let (Ω, F) and (S, S) be measurable spaces, let X : Ω → S, and let
A ⊆ S be such that σ(A) = S, i.e. A generates S. If {ω : X(ω) ∈ A} ∈ F for all A ∈ A,
then X is a random element of S. In other words, it is enough to check the condition
“X −1 (A) ∈ F” for sets A ∈ A instead of considering A ∈ S.
Consider C := {C ∈ S : {X ∈ C} ∈ F}; by assumption A ⊆ C, and C is a σ-algebra:
(σ1) C ∈ C =⇒ {X ∈ C} ∈ F =⇒ {X ∈ C^c} = {X ∈ C}^c ∈ F =⇒ C^c ∈ C.
(σ2) Similarly, C_n ∈ C, n ≥ 1 =⇒ ⋃_{n=1}^∞ C_n ∈ C.
Example 1.11 B(R) = σ((a, b); a < b, a, b ∈ R) = σ((a, b); a < b, a, b ∈ Q).
Definition 1.12 Let (Ω, F) and (S, S) be measurable spaces and X : Ω → S a function.
Then σ(X) := {X −1 (B) : B ∈ S} is called the σ-algebra generated by X.
Theorem 1.14 Let (Ω, F), (S, S), and (T, T ) be measurable spaces. If X : Ω → S and
f : S → T are measurable, then the composition f (X) : Ω → T is measurable as well, i.e.
f (X) is a random element of T .
Special cases:
Remark 1.16 Often, with a slight abuse of notation, when we say “random variable”
without mentioning the possible values, we mean a random element of
([−∞, ∞], B([−∞, ∞])).
Remark 1.18 If X_t, t ∈ [0, 1], are random variables, inf_{0≤t≤1} X_t need not be a random variable. For instance, set Ω = [0, 1], F = B([0, 1]). Take A ⊆ [0, 1], and define
X_t(ω) = 0 if t = ω and ω ∉ A,  X_t(ω) = 1 otherwise,   (1.7)
i.e.
X_t(ω) = 1 ∀ω if t ∈ A,  and X_t(ω) = 1_{[0,1]\{t}}(ω) if t ∉ A.   (1.8)
In particular, all X_t, t ∈ [0, 1], are random variables. Note that inf_{0≤t≤1} X_t(ω) = 1_A(ω) is not measurable if A ∉ F.
Example 1.20 Take Ω = [0, 1], F = B([0, 1]), P = λ, and consider the random variables X_n(ω) = (n + 1)ω^n, n = 1, 2, .... Then X_n → X a.s., where X(ω) = 0 ∀ω.
1.3 Expectation
Definition 1.21 Let X be a random variable on a probability space (Ω, F, P). The ex-
pectation (expected value, mean) of X is defined by
E[X] = ∫_Ω X(ω) P(dω),   (1.10)
whenever the integral is well-defined. Write X = X^+ − X^−, where X^+ = max(X, 0) and X^− = max(−X, 0). Then X^+ and X^− are both non-negative random variables. If X ≥ 0, then E[X] = ∫ X dP ∈ [0, ∞] is always well-defined, but it could have the value +∞. Now, E[X] is well-defined if E[X^+] < ∞ or E[X^−] < ∞ (or both), and in this case we have E[X] = E[X^+] − E[X^−].
Convergence theorems.
The convergence theorems (Fatou's lemma, monotone convergence theorem, dominated
convergence theorem, bounded convergence theorem) will be used again and again during
the lecture. Therefore, it is important that you know the statements of these theorems
by heart and are able to apply them.
Example 1.23 Take Ω = [0, 1], F = B([0, 1]), P = λ, and consider the random variables X_n(ω) = (n + 1)ω^n, n = 1, 2, .... Then lim inf_{n→∞} X_n(ω) = lim_{n→∞} X_n(ω) = 0 ∀ω ∈ [0, 1), and we have
0 = E[ lim_{n→∞} X_n ] < lim_{n→∞} E[X_n] = 1.   (1.15)
(b) there exists Y with E[|Y|] < ∞ such that
|X_n| ≤ Y for all n.   (1.17)
Then,
lim_{n→∞} E[X_n] = E[X].   (1.18)
Case 4: general h.
Write h = h+ − h− .
Example 1.27 (Discrete random variables)
Consider a random variable X such that
P[X = a_n] = p_n, n = 1, 2, ..., where p_n ≥ 0, ∑_{n=1}^∞ p_n = 1.   (1.21)
Then, the distribution of X is given by µ = ∑_{n=1}^∞ p_n δ_{a_n}, where the Dirac measure δ_x(·) is given by δ_x(A) = 1 if x ∈ A and δ_x(A) = 0 otherwise. Consequently,
E[h(X)] = ∑_{n=1}^∞ h(a_n) p_n,   E[X] = ∑_{n=1}^∞ a_n p_n,   (1.22)
Example 1.28 If the distribution of X has a density f with respect to the Lebesgue
measure, then
E[h(X)] = ∫_{−∞}^∞ h(y) f(y) dy,   E[X] = ∫_{−∞}^∞ y f(y) dy,   (1.23)
E[X] = m (1.26)
and
Var(X) = σ 2 . (1.27)
See the “Introduction to Probability” course for a proof. The key fact is that if X has
the law N (0, 1), then Y = σX + m has the law N (m, σ 2 ).
1.4 Inequalities
See [Dur05], Section 1.3 and appendix A.5.
Application 1.30
In particular, we know:
for instance: nth moment of Z is finite =⇒ (n − 1)st , (n − 2)nd , . . . , 1st moments of Z are
finite.
Taking expectations yields (1.33). Then (1.34) follows by taking A = [c, ∞).
Example 1.32 (a) P(|X| ≥ a) ≤ E[|X|]/a for a > 0 (take A = (−∞, −a] ∪ [a, ∞), ϕ(x) = |x| in (1.33), or |X| instead of X and ϕ(x) = x in (1.34)).
(b) P(|X| ≥ a) ≤ E[|X|^p]/a^p for a > 0, p > 0 (take A = (−∞, −a] ∪ [a, ∞), ϕ(x) = |x|^p in (1.33), or |X| instead of X and ϕ(x) = x^p in (1.34)).
(c) P(|X − E[X]| ≥ a) ≤ Var(X)/a^2 for a > 0 (follows from (b) with p = 2).
(d) P(|X − E[X]| ≥ a) ≤ E[|X − E[X]|^4]/a^4 for a > 0 (follows from (b) with p = 4).
(e) P(X ≥ a) ≤ E[e^{λX}]/e^{λa} for λ > 0, a > 0 (take ϕ(x) = e^{λx} in (1.34)).
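These bounds are easy to check numerically. The following minimal sketch (not part of the notes; it assumes Python with numpy, and the exponential distribution and the levels a are arbitrary illustrative choices) compares the empirical tail probability P(|X − E[X]| ≥ a) with the Chebyshev bound Var(X)/a² from (c).

import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1, Var(X) = 1

for a in [1.0, 2.0, 3.0]:
    empirical = np.mean(np.abs(X - 1.0) >= a)    # estimate of P(|X - E[X]| >= a)
    chebyshev = 1.0 / a**2                       # Var(X)/a^2 with Var(X) = 1
    print(f"a={a}: empirical={empirical:.4f}  Chebyshev bound={chebyshev:.4f}")

The bound is typically far from sharp, which is consistent with Chebyshev being a worst-case inequality over all distributions with the given variance.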
1.5 Independence
See [Dur05], Section 1.4.
for all Bi ∈ Si .
for all Ai ∈ Fi .
The following proposition shows that in Definition 1.33 (a) is a special case of (b) and
(b) is a special case of (c).
Proposition 1.35 (a) Events A1 , . . . , An are independent if and only if 1A1 , . . . , 1An
are independent.
(b) Random variables X1 , . . . , Xn are independent if and only if the σ-algebras σ(X1 ),
. . ., σ(Xn ) are independent.
Proof.
=⇒ A1 , . . . , An are independent.
Suppose A1 , . . . , An are independent. Let ki ∈ {0, 1}, 1 ≤ i ≤ n. Then,
{1_{A_i} = k_i} = A_i if k_i = 1, and {1_{A_i} = k_i} = A_i^c if k_i = 0.
Hence,
P( ⋂_{i=1}^n {1_{A_i} = k_i} ) = P( ⋂_{i: k_i=1} A_i ∩ ⋂_{j: k_j=0} A_j^c )
= ∏_{i: k_i=1} P(A_i) ∏_{j: k_j=0} P(A_j^c) = ∏_{i=1}^n P(1_{A_i} = k_i).
Whenever Ω ∈ Ai for all i, it suffices to consider I = {1, . . . , n}.
(D1) Ω ∈ D,
(D2) A ∈ D =⇒ Ac ∈ D,
(D3) A_1, A_2, ... ∈ D, A_1, A_2, ... pairwise disjoint =⇒ ⋃_{i≥1} A_i ∈ D.
Each σ-field is a Dynkin system. On the other hand, there are Dynkin systems which
are not σ-fields.
Example 1.39 Ω = {1, 2, 3, 4}, D = {∅, Ω, {1, 2}, {1, 4}, {3, 4}, {2, 3}}. D is a Dynkin
system, but not a σ-field.
Note that a Dynkin system which is ∩-stable is a σ-field: For A_1, A_2, ... ∈ D,
⋃_{n=1}^∞ A_n = A_1 ∪ ⋃_{n=2}^∞ (A_n ∩ A_1^c ∩ ... ∩ A_{n−1}^c) =⇒ ⋃_{n=1}^∞ A_n ∈ D.
Lemma 1.40 Let I ≠ ∅ be an index set. If A_i is a σ-field on Ω ∀i ∈ I, then ⋂_{i∈I} A_i = {A ⊆ Ω | A ∈ A_i ∀i ∈ I} is a σ-field on Ω.
The same statement holds for Dynkin systems.
Proof. Exercise.
Show that D(G) is indeed a Dynkin system, by checking (D1), (D2) and (D3) in Definition
1.38. D(G) is called the “Dynkin system generated by G”. Show that D(G) is the smallest
Dynkin system containing G, i.e. if H is a Dynkin system and G ⊆ H, then D(G) ⊆ H.
(1) We show A ∈ D(M), B ∈ M =⇒ A ∩ B ∈ D(M).
Proof: For B ∈ M, DB = {A ⊆ Ω | A ∩ B ∈ D(M)} is a Dynkin system,
M ∩ -stable =⇒ M ⊆ DB
DB Dynkin system =⇒ D(M) ⊆ DB
Proof. P1 (A) = P2 (A) for all A ∈ G implies, due to the properties of probability measures,
P1 (A) = P2 (A) for all A ∈ D(G). But, due to Lemma 1.42, D(G) = σ(G) and σ(G) = F
due to our assumptions, hence we conclude P1 = P2 .
σ(M) ⊆ D. (1.39)
Fix B2 ∈ J2 , . . . , Bn ∈ Jn . Let
Then,
(a) J1 ⊆ D by (1.40);
(D1) Ω ∈ D because Ω ∈ J1 .
(D2) Let A ∈ D. Then,
P(A^c ∩ B_2 ∩ ··· ∩ B_n) = P( (Ω ∩ B_2 ∩ ··· ∩ B_n) \ (A ∩ B_2 ∩ ··· ∩ B_n) )
= P(Ω ∩ B_2 ∩ ··· ∩ B_n) − P(A ∩ B_2 ∩ ··· ∩ B_n)
= (P(Ω) − P(A)) P(B_2) ··· P(B_n)   because Ω ∈ D and A ∈ D
= P(A^c) P(B_2) ··· P(B_n)
=⇒ A^c ∈ D.
(D3) Let A_1, A_2, ... ∈ D be pairwise disjoint. Then, A_i ∩ B_2 ∩ ··· ∩ B_n, i ≥ 1, are pairwise disjoint. Hence,
P( (⋃_{i≥1} A_i) ∩ B_2 ∩ ··· ∩ B_n ) = P( ⋃_{i≥1} (A_i ∩ B_2 ∩ ··· ∩ B_n) )
= ∑_{i≥1} P(A_i ∩ B_2 ∩ ··· ∩ B_n)
= ∑_{i≥1} P(A_i) P(B_2) ··· P(B_n)   because A_i ∈ D ∀i
= P( ⋃_{i≥1} A_i ) P(B_2) ··· P(B_n)
=⇒ ⋃_{i≥1} A_i ∈ D.
Corollary 1.47 (i) Suppose F_{11}, ..., F_{1m_1}, F_{21}, ..., F_{2m_2}, ..., F_{n1}, ..., F_{nm_n} are independent σ-algebras. Let J_i = σ( ⋃_{j=1}^{m_i} F_{ij} ). Then, J_1, ..., J_n are independent.
(ii) More generally, assume F_j, j ∈ I, are independent σ-algebras where I is a (not necessarily finite) index set, which is partitioned as follows:
I = ⋃_{i=1}^∞ I_i where I_j ∩ I_k = ∅ for j ≠ k.
Let J_i = σ( ⋃_{j∈I_i} F_j ). Then, J_1, J_2, ... are independent.
Proof.
(i) Let A_i = { ⋂_{j=1}^{m_i} A_{ij} : A_{ij} ∈ F_{ij} }. A_i is a π-system, and A_1, ..., A_n are independent by assumption.
=⇒ σ(A_1), ..., σ(A_n) are independent by Theorem 1.45.
But J_i ⊆ σ(A_i) because Ω ∈ F_{ij}.
=⇒ J_1, ..., J_n are independent.
Theorem 1.50 Suppose Xi : Ω → Si , 1 ≤ i ≤ n, are independent. Then (X1 , . . . , Xn )
has distribution µ1 ⊗ · · · ⊗ µn , where µi is the distribution of Xi .
{A1 × · · · × An : Ai ∈ Si ∀i}
1.6 Construction of independent random variables
See [Dur05], Section 1.4 (c).
Xi : Ω → R, Xi (ω1 , . . . , ωn ) = ωi (1.45)
for all n ≥ 1 and ai < bi . Then, there exists a unique probability measure P on
(RN , B(R)⊗N ) such that
P( ω : ω_i ∈ (a_i, b_i] for 1 ≤ i ≤ n ) = ν_n( (a_1, b_1] × ··· × (a_n, b_n] )   (1.47)
for all ai < bi , n ∈ N. Here, B(R)⊗N denotes the product σ-algebra, i.e. the smallest
σ-algebra generated by the sets
Proof. The proof uses the measure extension theorem. See Appendix A.7 in [Dur05].
Example 1.54 Given distribution functions Fi , i ≥ 1, we want independent random
variables Xi , i ≥ 1 with P(Xi ≤ x) = Fi (x) for all x ∈ R, i ≥ 1.
Let µ_i be the unique measure on (R, B(R)) with µ_i((−∞, x]) = F_i(x) for all x ∈ R, and set ν_n := µ_1 ⊗ ··· ⊗ µ_n.
Hence, the measures ν_n, n ≥ 1, are consistent. Let P be the unique measure induced by
Kolmogorov’s extension theorem on (Ω = RN , F = B(R)⊗N ), and let
Xi : Ω → R, Xi (ω) = ωi , i = 1, 2, . . . (1.49)
be the projection to the i-th coordinate. Then, by construction, the joint distribution of
(X1 , . . . , Xn ) is given by νn = µ1 ⊗ · · · ⊗ µn . Thus, Xi , i ≥ 1, are independent and Xi has
distribution µi .
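In practice, independent random variables with prescribed distribution functions F_i can be simulated by the quantile (inverse-transform) method applied to independent uniforms, which mirrors the coordinate construction above. A minimal sketch (not from the notes; Python with numpy, and the particular distribution functions F_1, F_2 below are only illustrative choices with explicit quantile functions):

import numpy as np

rng = np.random.default_rng(1)

# Quantile functions (generalized inverses) of two illustrative distribution functions:
# F_1(x) = 1 - exp(-x) for x >= 0 (exponential), F_2(x) = 1/(1 + exp(-x)) (logistic).
def quantile_exponential(u):
    return -np.log(1.0 - u)

def quantile_logistic(u):
    return np.log(u / (1.0 - u))

U = rng.uniform(size=(2, 100_000))      # independent uniforms play the role of the coordinates omega_i
X1 = quantile_exponential(U[0])         # X1 has distribution function F_1
X2 = quantile_logistic(U[1])            # X2 has distribution function F_2

# Sanity check: empirical P(X1 <= 1) vs F_1(1), empirical P(X2 <= 0) vs F_2(0) = 1/2
print(np.mean(X1 <= 1.0), 1.0 - np.exp(-1.0))
print(np.mean(X2 <= 0.0), 0.5)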
1.7 Convergence of sequences of random variables
We write X_n −→_P X for convergence in probability as n → ∞.
[Diagram: implications between modes of convergence — almost sure (strong) convergence and convergence in L^p each imply convergence in probability, which implies weak convergence; under a domination assumption, convergence in probability implies convergence in L^p.]
(d) Suppose there exists Y ∈ Lp such that |Xn | ≤ Y for all n. If Xn −→ X in probability
and X ∈ Lp , then Xn −→ X in Lp .
Proof.
−−−→ 0 by assumption.
n→∞
Hence, Xn −→ X in Lp1 .
P( |X_n − X| > ε ) ≤ E[|X_n − X|^p] / ε^p −−−→ 0 as n → ∞, ∀ε > 0.
Hence, Xn −→ X in probability.
(c) Assume Xn −→ X almost surely. Then, for P-almost all ω ∈ Ω and all ε > 0 there
exists n = n(ω) ∈ N such that for all m ≥ n one has |Xm (ω) − X(ω)| ≤ ε. Hence,
P( ⋃_{n=1}^∞ ⋂_{m=n}^∞ {|X_m − X| ≤ ε} ) = 1.
Thus, Xn −→ X in probability.
We use the following result (which will be proved later, see Lemma 1.64):
E[|Y|^p] < ∞ ⇒ (∀η > 0) (∃δ > 0) (∀A with P(A) < δ) ∫_A |Y|^p dP < η.   (1.53)
Thus,
Since ε > 0 was arbitrary,
lim_{n→∞} E[|X_n|^p] = 0,
i.e. Xn −→ 0 in Lp and hence in probability. But, for all ω ∈ [0, 1], Xn (ω) takes infinitely
often the value 0 and infinitely often the value 1. Hence, for all ω, the sequence (Xn (ω))n∈N
does not converge. More precisely, lim supn→∞ Xn (ω) = 1 and lim inf n→∞ Xn (ω) = 0 for
all ω ∈ [0, 1].
(i) X_n −→_P X as n → ∞.
(ii) Every subsequence (X_{n_k}) of (X_n) has a further subsequence (X_{ñ_k}) such that X_{ñ_k} converges to X almost surely as k → ∞.
Proof. (i) =⇒ (ii): Let (X_{n_k}) be a subsequence of (X_n). Then, there is a subsequence (X_{ñ_k}) of (X_{n_k}) such that
P( |X_{ñ_k} − X| ≥ 1/k ) ≤ 1/k^2   ∀k ≥ 1.   (1.54)
Due to the following lemma (proof see exercises), (1.55) implies that X_{ñ_k} converges almost surely to X.
Lemma 1.60 For a random variable Z and a sequence of random variables (Zn ), the
following are equivalent
(a) Zn → Z a.s.
(b) For all ε > 0, P( sup_{k≥n} |Z_k − Z| ≥ ε ) −−−→ 0 as n → ∞.
To show (1.55), fix ε > 0 and take n large enough such that 2/n < ε. Then
P( sup_{k≥n} |X_{ñ_k} − X| ≥ ε ) ≤ ∑_{k=n}^∞ P( |X_{ñ_k} − X| ≥ ε/2 )
≤ ∑_{k=n}^∞ P( |X_{ñ_k} − X| ≥ 1/k )
≤ ∑_{k=n}^∞ 1/k^2 −−−→ 0 as n → ∞, which proves (1.55).
1.8 Uniform integrability
Definition 1.61 We say that a sequence (X_n)_{n∈N} of random variables is uniformly integrable if
lim_{c→∞} sup_{n∈N} ∫_{{|X_n|≥c}} |X_n| dP = 0,   (1.56)
where ∫_{{|X_n|≥c}} |X_n| dP = E[ |X_n| 1_{{|X_n|≥c}} ].
(ii) If the random variables Xn are dominated by an integrable random variable, i.e.
there exists a random variable Y ∈ L1 such that |Xn | ≤ Y almost surely, then the
sequence (Xn )n∈N is uniformly integrable. Indeed, for all n,
E[|Xn |1{|Xn |≥c} ] ≤ E[Y 1{Y ≥c} ] = E[Yc ] where Yc = Y 1{Y ≥c} . But E[Yc ] → 0 for
c → ∞ by dominated convergence.
The following theorem extends the dominated convergence theorem and is one of the
reasons why uniform integrability is an important notion.
Theorem 1.63 (Extension of the dominated convergence theorem) If a sequence
(Xn )n∈N of random variables converges in probability to a random variable X and is uni-
formly integrable, then Xn → X in L1 . In particular, if (Xn )n∈N converges in probability
to X and is uniformly integrable, we have lim_{n→∞} E[X_n] = E[lim_{n→∞} X_n] = E[X].
For the proof, we need the following lemma.
Lemma 1.64 If the sequence (X_n)_{n∈N} is uniformly integrable, then for any ε > 0 there is some δ = δ(ε) > 0 such that P(A) ≤ δ =⇒ ∫_A |X_n| dP ≤ ε, for all n.
Proof of Lemma 1.64. We have
∫_A |X_n| dP = ∫_{A∩{|X_n|<c}} |X_n| dP + ∫_{A∩{|X_n|≥c}} |X_n| dP ≤ c P(A) + ∫_{A∩{|X_n|≥c}} |X_n| dP   (1.57)
and for c = c(ε) large enough and δ < ε/(2c), both terms on the r.h.s. of (1.57) are ≤ ε/2, for all n.
Proof of Theorem 1.63. Assume that X_n → X in probability. Assume without loss of generality that X ≡ 0. Then we have E[|X_n|] ≤ ε + ∫_{{|X_n|≥ε}} |X_n| dP (take c = ε, A = Ω in (1.57)). But, for any δ > 0, there is some N_0 = N_0(δ, ε) such that P(|X_n| ≥ ε) ≤ δ for all n ≥ N_0, since X_n → 0 in probability. Hence, choosing δ small enough, ∫_{{|X_n|≥ε}} |X_n| dP ≤ ε for n ≥ N_0 due to Lemma 1.64. Since ε was arbitrary, we conclude that E[|X_n|] → 0 for n → ∞.
2 Laws of large numbers
The material for this section is taken from Chapter 1 in [Dur05].
Lemma 2.1 Let X1 , . . . , Xn be random variables on (Ω, F, P). Assume that E[Xi2 ] < ∞,
for all i and that they are uncorrelated, i.e.
(Note that E[Xi ]E[Xj ] is well-defined due to the Cauchy-Schwarz inequality). Then
Var( ∑_{i=1}^n X_i ) = ∑_{i=1}^n Var(X_i).   (2.2)
Proof. Exercise.
Note that independent random variables which are in L2 are uncorrelated.
S_n/n −→_P E[X_1] as n → ∞.   (2.3)
Proof. Set m = E[X_1]. Then,
E[ (S_n/n − m)^2 ] = Var(S_n/n)   because m = E[S_n/n]
= (1/n^2) Var(S_n)
≤ Cn/n^2   by Lemma 2.1
−−−→ 0 as n → ∞.
Hence, S_n/n −−−→ m in L^2 and hence in probability as n → ∞.
In particular, the weak law of large numbers applies if Xn , n ≥ 1, are i.i.d. (independent
and identically distributed) and in L2 .
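The statement is easy to observe in simulation. A minimal sketch (not part of the notes; Python with numpy; the uniform summands, the tolerance ε and the sample sizes are arbitrary illustrative choices) estimates P(|S_n/n − E[X_1]| > ε) for growing n:

import numpy as np

rng = np.random.default_rng(2)
eps, mean = 0.05, 0.5                       # X_i ~ Uniform[0,1], E[X_1] = 1/2
for n in [10, 100, 1000, 10000]:
    X = rng.uniform(size=(2000, n))         # 2000 independent copies of (X_1, ..., X_n)
    Sn_over_n = X.mean(axis=1)
    prob = np.mean(np.abs(Sn_over_n - mean) > eps)
    print(f"n={n}: estimated P(|S_n/n - 1/2| > {eps}) = {prob:.3f}")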
Example 2.3 (Random permutations) Let n ∈ N and consider
• Consider
• Repeat with the smallest i not in the first cycle replacing 1, etc.
Set
X_{n,k} := 1 if a right parenthesis occurs after the k-th number in the cycle decomposition, and X_{n,k} := 0 otherwise.
Claim For each n, Xn,1 , Xn,2 , . . . , Xn,n are independent with respect to Pn . (Proof see
exercises).
P_n(X_{n,k} = 1) = 1/(n − k + 1).   (2.4)
For example, in the case n = 8, the event
contains all permutations with the following cycle decomposition produced by the above
algorithm:
(· , · , · ) (· , · ) (· ) (· , · ).
One has
|A| = (1 · 7 · 6) · (1 · 4) · (1) · (1 · 1) = 7 · 6 · 4
Hence,
p_1 := P_8(X_{8,1} = 0, X_{8,2} = 0, X_{8,3} = 1, X_{8,4} = 0, X_{8,5} = 1, X_{8,6} = 1, X_{8,7} = 0, X_{8,8} = 1) = |A|/|Ω_n| = (7 · 6 · 4)/8!,
and
p_2 := P_8(X_{8,1} = 0) P_8(X_{8,2} = 0) ··· P_8(X_{8,8} = 1)
= (1 − 1/8) · (1 − 1/7) · (1/6) · (1 − 1/5) · (1/4) · (1/3) · (1 − 1/2) · 1 = p_1.
We write E_n for the expectation with respect to P_n and Var_n for the variance with respect to P_n. Let S_n := ∑_{k=1}^n X_{n,k} = number of cycles. Then,
E_n[S_n] = ∑_{k=1}^n 1/(n − k + 1) = ∑_{k=1}^n 1/k ∼ ln n   (2.5)
Var_n(S_n) = ∑_{k=1}^n Var_n(X_{n,k})   by independence
≤ ∑_{k=1}^n E_n[(X_{n,k})^2]
= ∑_{k=1}^n E_n[X_{n,k}]   because X_{n,k} is an indicator function
= E_n[S_n] ∼ ln n.   (2.6)
Hence,
E_n[ (S_n/E_n[S_n] − 1)^2 ] = Var_n( S_n/E_n[S_n] ) = Var_n(S_n)/E_n[S_n]^2 ≤ 1/E_n[S_n] −−−→ 0 as n → ∞.   (2.7)
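The asymptotics E_n[S_n] ∼ ln n and Var_n(S_n) ∼ ln n can be checked by sampling uniform permutations directly. A minimal sketch (not from the notes; Python with numpy; the sample sizes are arbitrary choices) counts cycles of uniformly random permutations:

import numpy as np

def cycle_count(perm):
    # Count cycles by following the permutation until every index has been visited.
    seen = np.zeros(len(perm), dtype=bool)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

rng = np.random.default_rng(3)
for n in [10, 100, 1000]:
    counts = [cycle_count(rng.permutation(n)) for _ in range(2000)]
    print(f"n={n}: mean cycles = {np.mean(counts):.2f}, ln n = {np.log(n):.2f}, "
          f"variance = {np.var(counts):.2f}")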
Fact 2.5
ω ∈ lim sup_{n→∞} A_n ⇐⇒ ω ∈ A_k for infinitely many k,   (2.10)
ω ∈ lim inf_{n→∞} A_n ⇐⇒ ω ∈ A_k for all but finitely many k.   (2.11)
Lemma 2.6 (First Borel-Cantelli lemma) Let A_n, n ≥ 1, be events. If
∑_{n=1}^∞ P(A_n) < ∞, then P(A_n infinitely often) = 0.   (2.12)
Proof.
P(A_n infinitely often) = P( ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k )
= lim_{n→∞} P( ⋃_{k=n}^∞ A_k )   by σ-continuity, because ⋃_{k=n}^∞ A_k ↓ in n
≤ lim_{n→∞} ∑_{k=n}^∞ P(A_k)   by σ-subadditivity
= 0   because ∑_{k=1}^∞ P(A_k) < ∞.
Proof.
• Xi ∈ L2 , i ≥ 1, are uncorrelated,
Set S_n := ∑_{i=1}^n X_i. Then,
S_n/n −−−→ m almost surely as n → ∞.   (2.14)
Proof. Without loss of generality m = 0 (otherwise consider Yi = Xi − E[Xi ]).
Fix ε > 0.
P( |S_n/n| > ε ) ≤ E[(S_n/n)^2]/ε^2   by Chebyshev
= Var(S_n)/(n^2 ε^2)   (using m = 0)
= (1/(n^2 ε^2)) ∑_{k=1}^n Var(X_k) ≤ nC/(n^2 ε^2) = C/(n ε^2).
Borel-Cantelli I =⇒ P( |S_{n^2}/n^2| > ε infinitely often ) = 0 ∀ε > 0
=⇒ S_{n^2}/n^2 −→ 0 almost surely.
Set D_n := max_{n^2 < k ≤ (n+1)^2} |S_k − S_{n^2}|. Then,
E[D_n^2] ≤ ∑_{k=n^2+1}^{(n+1)^2} E[ |S_k − S_{n^2}|^2 ]   (max ≤ sum)
= ∑_{k=n^2+1}^{(n+1)^2} Var( X_{n^2+1} + X_{n^2+2} + ··· + X_k )   because m = 0
= ∑_{k=n^2+1}^{(n+1)^2} ∑_{i=n^2+1}^k Var(X_i)
≤ ∑_{k=n^2+1}^{(n+1)^2} ((n+1)^2 − n^2) C = (2n + 1)^2 C,
since (n+1)^2 − n^2 = 2n + 1.
By Chebyshev's inequality,
P(D_n > n^2 ε) ≤ E[D_n^2]/(n^4 ε^2) ≤ (2n + 1)^2 C/(n^4 ε^2) ≤ C'/(n^2 ε^2).
Hence, Borel-Cantelli I implies P( D_n/n^2 > ε infinitely often ) = 0. Since ε > 0 was arbitrary, using Lemma 2.7,
D_n/n^2 −−−→ 0 almost surely as n → ∞.
Convergence of the whole sequence: Given k, ∃ a unique n = n(k) such that n^2 < k ≤ (n + 1)^2. Note that k → ∞ ⇒ n → ∞. Furthermore,
1 ≤ k/n^2 ≤ (n + 1)^2/n^2 = (1 + 1/n)^2
and hence
lim_{k→∞} k/n^2 = 1.
Consequently,
|S_k|/k = |S_{n^2} + S_k − S_{n^2}|/k ≤ (|S_{n^2}| + D_n)/k = (|S_{n^2}|/n^2)·(n^2/k) + (D_n/n^2)·(n^2/k) −−−→ 0 almost surely as k → ∞,
since |S_{n^2}|/n^2 → 0 a.s., D_n/n^2 → 0 a.s., and n^2/k → 1.
where there are two representations. We always take the one which terminates. Note: P(E) = 0. Let
ν_k^{(n)}(ω) = #{ i ∈ {1, ..., n} : ξ_i(ω) = k },   k ∈ {0, 1, ..., 9}
= number of occurrences of the digit k among the first n digits.
ω ∈ [0, 1] is called simply normal if for all k ∈ {0, 1, ..., 9}
lim_{n→∞} ν_k^{(n)}(ω)/n   (= asymptotic relative frequency of the digit k)   (2.15)
exists and equals 1/10.
Theorem 2.10 (Borel’s law of normal numbers)
P(ω ∈ [0, 1] : ω is simply normal) = 1. (2.16)
Proof. ξi : Ω → R are random variables: ξ1 (ω) = [10ω], where [x] = largest integer ≤ x,
ξ2 (ω) = [100ω − 10ξ1 (ω)], etc.
ξ_i, i ≥ 1, are independent with
P(ξ_i = k) = 1/10   ∀i ≥ 1, k ∈ {0, ..., 9}.
For example,
P(ξ_1 = 0, ξ_2 = 2) = P( ω ∈ [0.02000..., 0.02999...) ) = 1/100,
P(ξ_1 = 0) = P( ω ∈ [0, 0.0999...) ) = 1/10,
P(ξ_2 = 2) = P( ω ∈ ⋃_{i=0}^9 [0.i2, 0.i2999...) ) = 10 · (1/100) = 1/10
=⇒ P(ξ_1 = 0, ξ_2 = 2) = P(ξ_1 = 0) P(ξ_2 = 2).
Fix k ∈ {0, . . . , 9}. For n ≥ 1, set
Xn := 1{ξn =k} .
Then Xn , n ≥ 1, are independent and identically distributed, because ξn , n ≥ 1, are
independent and identically distributed.
E[X_n] = P(ξ_n = k) = 1/10, and Var(X_n) = Var(X_1) < ∞ ∀n.
Hence, the assumptions of the SLLN are satisfied, and
relative frequency of k = ν_k^{(n)}/n = (1/n) ∑_{i=1}^n X_i −−−→ 1/10 almost surely as n → ∞.
Conjecture 2.11 π and √2 are simply normal.
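Borel's theorem is easy to illustrate empirically: under λ the digits ξ_1, ξ_2, ... are i.i.d. uniform on {0, ..., 9}, as shown in the proof, so one may sample them directly. A minimal sketch (not part of the notes; Python with numpy; the number of digits is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(4)
n = 100_000
digits = rng.integers(0, 10, size=n)        # xi_1, ..., xi_n: i.i.d. uniform on {0, ..., 9}
freq = np.bincount(digits, minlength=10) / n
for k in range(10):
    print(f"digit {k}: relative frequency {freq[k]:.4f}  (limit 1/10)")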
2.4 Second Borel-Cantelli lemma
Recall the first Borel-Cantelli lemma:
∑_{n=1}^∞ P(A_n) < ∞ =⇒ P(A_n infinitely often) = 0.   (2.17)
How about
∑_{n=1}^∞ P(A_n) = ∞ =⇒ P(A_n infinitely often) = 1?
This is wrong as the following example shows: Ω = [0, 1], F = B([0, 1]), P = λ|_{[0,1]}, A_n = (0, a_n) with a_n −−−→ 0 as n → ∞.
But, if a_n = 1/n, then
∑_{n=1}^∞ P(A_n) = ∑_{n=1}^∞ 1/n = ∞.
Proof.
Step 1
P(A_n infinitely often) = P( ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k )
= lim_{n→∞} P( ⋃_{k=n}^∞ A_k )   by σ-continuity, because ⋃_{k=n}^∞ A_k ↓ in n.
Step 2
P( [⋃_{k=n}^∞ A_k]^c ) = P( ⋂_{k=n}^∞ A_k^c )
≤ P( ⋂_{k=n}^N A_k^c )   for all N ≥ n
= ∏_{k=n}^N (1 − P(A_k))   by independence
≤ exp( − ∑_{k=n}^N P(A_k) )   (recall: 1 − x ≤ e^{−x} for all x)
−−−→ 0 as N → ∞.
Hence, by Step 1,
P( ⋃_{k=n}^∞ A_k ) = 1 ∀n =⇒ P(A_n infinitely often) = 1.
Fact 2.13
E[|X|] = ∫_0^∞ P(|X| > x) dx.   (2.19)
Corollary 2.14
∑_{k=1}^∞ P(|X| > ck) ≤ (1/c) E[|X|] ≤ ∑_{k=0}^∞ P(|X| > ck)   ∀c > 0.   (2.20)
Proof. This follows from
(1/c) E[|X|] = (1/c) ∫_0^∞ P(|X| > x) dx = ∑_{k=1}^∞ ∫_{k−1}^k P(|X| > cx) dx,
where each summand satisfies P(|X| > ck) ≤ ∫_{k−1}^k P(|X| > cx) dx ≤ P(|X| > c(k − 1)).
Applications 2.15 (a) X_n, n ≥ 1, identically distributed =⇒ X_n/n −→_P 0, since
P( |X_n/n| > ε ) = P( |X_1| > εn ) −−−→ 0 as n → ∞, ∀ε > 0.
(b) Let X_n, n ≥ 1, be independent and identically distributed. When does X_n/n −−−→ 0 almost surely as n → ∞?
Claim
X_n/n −→ 0 almost surely ⇐⇒ E[|X_1|] < ∞.
Proof.
X_n/n −→ 0 almost surely ⇐⇒ (∀ε > 0) P( |X_n/n| > ε infinitely often ) = 0
⇐⇒ (∀ε > 0) ∑_{n=1}^∞ P( |X_n/n| > ε ) < ∞   (Borel-Cantelli I + II)
⇐⇒ (∀ε > 0) ∑_{n=1}^∞ P( |X_1| > εn ) < ∞
⇐⇒ E[|X_1|] < ∞   by Corollary 2.14.
Corollary 2.16 Let X_n, n ≥ 1, be independent and identically distributed. If E[|X_1|] = ∞, then
P( lim_{n→∞} S_n/n exists in (−∞, ∞) ) = 0.   (2.21)
Proof.
S_{n+1}/(n+1) = (S_n + X_{n+1})/(n+1) = (S_n/n)·(n/(n+1)) + X_{n+1}/(n+1).
Hence,
{ lim_{n→∞} S_n/n exists in (−∞, ∞) } ⊆ { X_n/n −→ 0 }.
But if E[|X_1|] = ∞, then
lim sup_{n→∞} |X_n|/n = ∞ almost surely.
The example can be generalized:
• for Xi , i ≥ 1, i.i.d. with P(Xi = +1) = p ∈ (0, 1), P(Xi = −1) = 1 − p one can show
the following:
– P(`n = k infinitely often) = 1 for all k ∈ N;
– any finite sequence (k1 , k2 , . . . , km ) ∈ {−1, 1}m occurs infinitely often with
probability one.
• More generally, for any i.i.d. sequence of random variables taking values in a finite
alphabet, any finite pattern which has positive probability to occur at the beginning
of the sequence occurs infinitely often with probability one.
In particular: A monkey types randomly on the keyboard of a computer. We
assume that the letters he types form an i.i.d. sequence and every character has a
strictly positive probability to appear. Then, any finite pattern (e.g. the constitution
of Bavaria, the bible, the works of Shakespeare, the final exam for this course, a
solution for the final exam, etc.) occurs infinitely often with probability one.
=⇒ lim inf_{n→∞} (1/n) ∑_{i=1}^n X_i ≥ E[ X_1 1_{{X_1 ≤ c}} ] ↑ E[X_1] = ∞ as c → ∞, by monotone convergence.
2.6 Kolmogorov’s 0-1-law
Random Series. We start with the following question. Assume X_n, n ≥ 1, are independent. When does ∑_{n=1}^∞ X_n converge?
Theorem 2.21 (Kolmogorov's 0-1-law) Let (Ω, F, P) be a probability space. Let (G_i)_{i∈I} be a countable collection of independent σ-algebras (we assume G_i ⊆ F ∀i). Further let
G_∞ := ⋂_{J⊆I, |J|<∞} σ( ⋃_{i∉J} G_i )
denote the corresponding tail σ-algebra. Then we have A ∈ G_∞ ⇒ P(A) ∈ {0, 1}.
Interpretation of G∞
(1) Dynamical:
If we interpret I as a sequence {1, 2, 3, ...} of points in time and G_n as the σ-algebra of all events observable at time n ∈ N, then we have
G_∞ = ⋂_{n=1}^∞ σ( ⋃_{k≥n} G_k ) = ⋂_{n=1}^∞ σ(G_n, G_{n+1}, ...).
Then G_∞ can be interpreted as the σ-algebra of all events observable "in the infinitely distant future".
(2) Static:
We interpret I as a set of ”subsystems” which act independently of each other
and Gi as the σ-algebra of events which only depend on the i’th subsystem. Then
G∞ is the collection of all ”macroscopic” events which do not depend on finitely
many subsystems. Thus, if the subsystems are independent, we know that on this
”macroscopic scale” the whole system is deterministic.
Example 2.22 Let (X_n)_{n∈N} be a sequence of random variables on (Ω, F). We define F_n := σ(X_1, X_2, ..., X_n) and F* = F*(X_i, i ≥ 1) = ⋂_{n=1}^∞ σ(X_n, X_{n+1}, ...). Then F* = G_∞, where G_i = σ(X_i).
The events { lim_{n→∞} (∑_{k=1}^n X_k)/c_n exists } and { lim sup_{n→∞} (∑_{k=1}^n X_k)/c_n ≤ t } for c_n, t ∈ R with c_n ↑ ∞ are elements of F*.
Due to Kolmogorov's 0-1-law we have P(A) ∈ {0, 1} ∀A ∈ F*, provided that the random variables X_1, X_2, ... are independent.
Proof.
Step 1 The collection of sets G_j (j ∈ J), ⋃_{i∉J} G_i are independent for every finite set J ⊆ I. Due to Corollary 1.47 we have that G_j (j ∈ J), σ( ⋃_{i∉J} G_i ) are also independent.
Applications 2.23
(a) A_n, n ≥ 1, independent =⇒ P(A_n infinitely often) ∈ {0, 1}, because {A_n infinitely often} belongs to the tail σ-algebra generated by X_n = 1_{A_n}. This also follows from the Borel-Cantelli lemmas:
∑ P(A_n) < ∞ =⇒ P(A_n infinitely often) = 0,
∑ P(A_n) = ∞ =⇒ P(A_n infinitely often) = 1.
(b) X_n, n ≥ 1, independent =⇒ P( ∑_{n=1}^∞ X_n converges ) ∈ {0, 1}.
Example 2.24 (Percolation)
Consider the lattice Z^d and p ∈ [0, 1]. Every bond is colored blue with probability p and red with probability 1 − p, independently of all other bonds. Consider the random subgraph of Z^d containing only the blue bonds. Its connected components are called blue clusters.
[Figure: a finite portion of the bond lattice Z^2 with blue and red bonds.]
Claim 2.25 Pp (∃ an infinite blue cluster) ∈ {0, 1}.
Proof. Let X_e = blue with probability p and X_e = red with probability 1 − p, e ∈ E = set of bonds in Z^d, be independent. Enumerate the bonds in an arbitrary way such that E = {e_n : n ∈ N}.
Then,
{∃ an infinite blue cluster} ∈ F ∗ (Xen , n ∈ N).
(Whether there exists an infinite cluster or not doesn’t depend on the state of finitely many
bonds). Use Kolmogorov’s 0-1-law.
You can find more about percolation in [Kle06], section 2.4., or in the lecture course
“Probability on Graphs”.
Lemma 2.26 If Xn , n ≥ 1, are independent and Y is measurable with respect to the
σ-algebra F ∗ (Xn , n ≥ 1) (defined in Example 2.22) then there exists c ∈ [−∞, ∞] such
that P(Y = c) = 1. In other words, Y is almost surely constant.
Proof. By assumption, {Y ≤ a} ∈ F ∗ (Xn , n ≥ 1) for all a ∈ R. Hence, by Kolmogorov’s
0-1 law,
P(Y ≤ a) ∈ {0, 1} for all a ∈ R.
• P(Y ≤ a) = 0 ∀a ∈ R =⇒ P(Y = +∞) = 1.
• P(Y ≤ a) = 1 ∀a ∈ R =⇒ P(Y = −∞) = 1.
• Otherwise a ↦ P(Y ≤ a) equals 1_{[c,∞)}(a) for some c ∈ R; in particular, it is the
distribution function of a constant random variable and hence P(Y = c) = 1.
Proof. lim sup_{n→∞} S_n/n and lim inf_{n→∞} S_n/n are measurable with respect to F*(X_n, n ≥ 1). Hence, by Lemma 2.26 they are constant almost surely. If the two constants are equal, then lim_{n→∞} S_n/n exists a.s. Otherwise, the limit a.s. does not exist.
What is the right almost sure normalization?
Theorem 2.28 Let X_i ∈ L^2, i ≥ 1, be independent and identically distributed with E[X_i] = 0 and E[X_i^2] = σ^2 < ∞, S_n = ∑_{i=1}^n X_i. For all ε > 0,
S_n / ( √n (ln n)^{1/2+ε} ) −−−→ 0 almost surely as n → ∞.   (2.27)
3 Weak convergence, characteristic functions and the
central limit theorem
The material for this chapter is taken from [Dur05], Chapter 2.
3.1 Motivation
Theorem 3.1 (Central limit theorem) Let X_i, i ≥ 1, be independent and identically distributed with E[X_1] = 0 and E[X_1^2] = 1. For n ∈ N, set S_n = ∑_{i=1}^n X_i. Then, ∀x ∈ R,
P( S_n/√n ≤ x ) −−−→ (1/√(2π)) ∫_{−∞}^x e^{−t^2/2} dt   (3.1)
= P(Z ≤ x) for Z ∼ N(0, 1), as n → ∞.   (3.2)
In other words, if F_n = distribution function of S_n/√n and F = distribution function of N(0, 1), then F_n(x) → F(x) for all x ∈ R.
We say X_n −→_w X if L(X_n) −→_w L(X), where L(X_n) denotes the distribution of X_n.
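The convergence in Theorem 3.1 can be checked numerically by comparing the empirical distribution function of S_n/√n with Φ. A minimal sketch (not from the notes; Python, using numpy and the error function from the standard library; the centered ±1 summands, n and the number of repetitions are arbitrary illustrative choices):

import math
import numpy as np

def Phi(x):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = np.random.default_rng(5)
n, reps = 1000, 20_000
X = rng.choice([-1.0, 1.0], size=(reps, n))     # E[X_1] = 0, E[X_1^2] = 1
Z = X.sum(axis=1) / math.sqrt(n)                # S_n / sqrt(n)
for x in [-1.0, 0.0, 1.0, 2.0]:
    print(f"x={x:+.1f}: P(S_n/sqrt(n) <= x) = {np.mean(Z <= x):.4f},  Phi(x) = {Phi(x):.4f}")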
Example 3.3 Assume x_n ∈ R, x_n −−−→ x ∈ R as n → ∞. Let δ_y be the point mass in y, i.e.
δ_y(A) = 1 if y ∈ A, and δ_y(A) = 0 otherwise.   (3.5)
Then,
δ_{x_n} −→_w δ_x,   (3.6)
because ∫ f dδ_{x_n} = f(x_n) −−−→ f(x) = ∫ f dδ_x for any continuous f : R → R.
Remarks 3.4 (a) X_n −→_w X ⇐⇒ E[f(X_n)] −−−→ E[f(X)] as n → ∞, ∀f : S → R bounded and continuous.
Proof. E[f(Y)] = ∫ f(y) µ(dy) with µ = L(Y).
(b) X_n −→_w X is possible even if all X_n's are defined on different probability spaces.
(c) X_n −→_w X =⇒ h(X_n) −→_w h(X) ∀h continuous.
Proof. f ∘ h is bounded and continuous if f is bounded and continuous.
Theorem 3.5 (Portmanteau theorem)
Let µn , µ be probability measures on (Rd , B(Rd )). The following are equivalent:
(i) µ_n −→_w µ;
(ii) ∫ f dµ_n −→ ∫ f dµ ∀f : R^d → R bounded and uniformly continuous;
(iii) lim sup_{n→∞} µ_n(F) ≤ µ(F) for all closed sets F ⊆ R^d;
(iv) lim inf_{n→∞} µ_n(G) ≥ µ(G) for all open sets G ⊆ R^d;
(v) µ_n(A) −→ µ(A) for all A ∈ B(R^d) with µ(∂A) = 0.
• (iii)⇐⇒(iv): F closed ⇐⇒ F^c open.
lim sup_{n→∞} µ_n(F) ≤ µ(F)
⇐⇒ lim sup_{n→∞} (1 − µ_n(F^c)) ≤ 1 − µ(F^c)
⇐⇒ 1 − lim inf_{n→∞} µ_n(F^c) ≤ 1 − µ(F^c)
⇐⇒ µ(F^c) ≤ lim inf_{n→∞} µ_n(F^c).
[Figure: graph of the auxiliary function ϕ, with ϕ(t) = 1 for t ≤ 0, decreasing linearly to ϕ(1) = 0, and ϕ(t) = 0 for t ≥ 1.]
The function f(x) := ϕ( dist(x, F)/ε ) is uniformly continuous on R^d with 0 ≤ f(x) ≤ 1 ∀x and
f(x) = 1 if x ∈ F,  f(x) = 0 if x ∈ G^c.
So,
µ_n(F) = ∫_F f dµ_n ≤ ∫_{R^d} f dµ_n −−−→ ∫_{R^d} f dµ   by (ii)
=⇒ lim sup_{n→∞} µ_n(F) ≤ ∫_{R^d} f dµ.
Furthermore,
∫_{R^d} f dµ = ∫_G f dµ ≤ µ(G) < µ(F) + δ.
Hence, lim sup_{n→∞} µ_n(F) ≤ µ(F) + δ, and since δ > 0 was arbitrary, (iii) follows.
• (iii)=⇒(i): Let f be bounded and continuous.
Claim: lim sup_{n→∞} ∫ f dµ_n ≤ ∫ f dµ.
Proof of the claim. Without loss of generality 0 < f(x) < 1 ∀x ∈ R^d.
(Given f bounded, ∃ a > 0, b ∈ R such that g(x) := a f(x) + b satisfies 0 < g(x) < 1 ∀x. If the claim holds for g, it holds for f.)
Define for k ∈ N, F_i := {x : f(x) ≥ i/k}, i = 0, 1, ..., k. All F_i are closed because f is continuous. One has
∑_{i=1}^k ((i−1)/k) µ( {x : (i−1)/k ≤ f(x) < i/k} ) ≤ ∫ f dµ ≤ ∑_{i=1}^k (i/k) µ( {x : (i−1)/k ≤ f(x) < i/k} ),
where µ( {x : (i−1)/k ≤ f(x) < i/k} ) = µ(F_{i−1} \ F_i) = µ(F_{i−1}) − µ(F_i). For the left-hand side,
∑_{i=1}^k ((i−1)/k) (µ(F_{i−1}) − µ(F_i)) = ∑_{i=0}^{k−1} (i/k) µ(F_i) − ∑_{i=1}^k ((i−1)/k) µ(F_i) = ∑_{i=1}^k µ(F_i)/k,
and for the right-hand side,
∑_{i=1}^k (i/k) (µ(F_{i−1}) − µ(F_i)) = ∑_{i=0}^{k−1} ((i+1)/k) µ(F_i) − ∑_{i=1}^k (i/k) µ(F_i) = 1/k + ∑_{i=1}^k µ(F_i)/k,
because µ(F_0) = 1 and µ(F_k) = 0. Hence,
∑_{i=1}^k µ(F_i)/k ≤ ∫ f dµ ≤ 1/k + ∑_{i=1}^k µ(F_i)/k.   (3.7)
So,
lim sup_{n→∞} ∫ f dµ_n ≤ lim sup_{n→∞} ( 1/k + ∑_{i=1}^k µ_n(F_i)/k )   by the upper bound in (3.7)
≤ 1/k + ∑_{i=1}^k µ(F_i)/k   by (3.8)
≤ 1/k + ∫ f dµ   by the lower bound in (3.7).
Let k → ∞. The claim follows.
Apply the claim to −f to get
− lim inf_{n→∞} ∫ f dµ_n = lim sup_{n→∞} ∫ (−f) dµ_n ≤ − ∫ f dµ
=⇒ lim inf_{n→∞} ∫ f dµ_n ≥ ∫ f dµ ≥ lim sup_{n→∞} ∫ f dµ_n   by the claim
≥ lim inf_{n→∞} ∫ f dµ_n
=⇒ lim_{n→∞} ∫ f dµ_n = ∫ f dµ.
So, these boundaries are distinct for distinct δ. Hence, at most countably many
have positive µ-measure. Consequently,
For each k,
lim sup_{n→∞} µ_n(F) ≤ lim_{n→∞} µ_n(F_k) = µ(F_k)   by (v).
Take the limit k → ∞. Since (a_k, b_k] ↑ (a, b),
Portmanteau theorem =⇒ µ_n −→_w µ.
Recall Fatou's lemma:
∫_S lim inf_{n→∞} f_n dν ≤ lim inf_{n→∞} ∫_S f_n dν.
Claim: P(p X_p > x) −−−→ P(X > x) as p → 0, for all x ∈ R, where X has an exponential distribution, i.e. L(X) has the density
Proof. One has
P(X > x) = ∫_x^∞ e^{−t} dt = e^{−x}   ∀x > 0
and P(X > x) = 1 for all x ≤ 0. Furthermore, P(p X_p > x) = 1 for all x ≤ 0 and
P(X_p ≥ n) = ∑_{k=n}^∞ (1 − p)^{k−1} p = p (1 − p)^{n−1} · (1/p) = (1 − p)^{n−1}   for n = 1, 2, ....
For x > 0,
P(p X_p > x) = P( X_p > x/p ) = P( X_p ≥ [x/p] + 1 ) = (1 − p)^{[x/p]}.
(1 + cj )aj −→ eλ . (3.11)
Due to the SLLN, for all x ∈ R, F_n(x, ω) converges to F(x) as n → ∞, almost surely.
Theorem 3.6 =⇒ For almost all ω, ρ_n(ω) −→_w L(X_1).   (3.13)
A stronger statement was proved by Glivenko-Cantelli: sup_{x∈R} |F_n(x, ω) − F(x)| −−−→ 0 as n → ∞ for almost all ω, where F is the distribution function of X_1.
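The uniform convergence sup_x |F_n(x, ω) − F(x)| → 0 is easy to observe numerically. A minimal sketch (not part of the notes; Python with numpy; exponential samples are an arbitrary choice, for which F(x) = 1 − e^{−x}) evaluates the supremum at the sample points, where it is attained for the right-continuous empirical step function:

import numpy as np

rng = np.random.default_rng(6)
F = lambda x: 1.0 - np.exp(-x)               # distribution function of X_1 (exponential)
for n in [10, 100, 1000, 10000]:
    x = np.sort(rng.exponential(size=n))
    Fn_right = np.arange(1, n + 1) / n       # F_n at the order statistics
    Fn_left = np.arange(0, n) / n            # left limits of F_n there
    Dn = max(np.max(np.abs(Fn_right - F(x))), np.max(np.abs(Fn_left - F(x))))
    print(f"n={n}: sup_x |F_n(x) - F(x)| = {Dn:.4f}")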
Theorem 3.10 (Continuous mapping theorem) Let h : R → R be measurable,
Dh := {x : h is discontinuous at x}. (3.14)
Dh ∈ B(R), see [Dur05], Exercise 2.6 in Chapter 1. If
X_n −→_w X and P(X ∈ D_h) = 0,   (3.15)
then
h(X_n) −→_w h(X).   (3.16)
Proof. Let µ_n, µ be the distributions of X_n, X, and ν_n, ν the distributions of h(X_n), h(X). Note:
ν_n(A) = P( h(X_n) ∈ A ) = P( X_n ∈ h^{−1}(A) ) = µ_n( h^{−1}(A) ).
Theorem 3.11 Let Xn , X be random variables defined on the same probability space.
P w
(a) Xn −
→X =⇒ Xn −
→ X.
w P
(b) Xn −
→ X and X is constant almost surely =⇒ Xn −
→ X.
(c) (b) is false without the assumption that X is constant almost surely.
Proof.
(a) Let f : R → R be bounded and uniformly continuous.
Let ε > 0. There exists δ > 0 such that ∀x, y ∈ R with |x − y| < δ one has
|f (x) − f (y)| < ε.
|E[f(X_n)] − E[f(X)]| ≤ ∫ |f(X_n) − f(X)| dP
= ∫_{{|X_n−X|<δ}} |f(X_n) − f(X)| dP + ∫_{{|X_n−X|≥δ}} |f(X_n) − f(X)| dP
≤ ε + 2‖f‖_∞ P( |X_n − X| ≥ δ ),
where the first integrand is < ε on {|X_n − X| < δ}. Since X_n −→_P X, we get
lim sup_{n→∞} |E[f(X_n)] − E[f(X)]| ≤ ε.
Assume X_n, n ≥ 1, and X are defined on the same probability space such that X is independent of all X_n. Then, for all ε ∈ (0, 1/4),
P( |X_n − X| > ε ) = P(X = 0, |X_n| > ε) + P(X = 1, |X_n − 1| > ε)
= P(X = 0) P( X_n = 1 − 1/n ) + P(X = 1) P( X_n = 1/n )   ∀n large enough
= (1/2)·(1/2) + (1/2)·(1/2) = 1/2, which does not converge to 0 as n → ∞.
3.3 Weakly convergent subsequences
Definition 3.12 (a) A collection Π of probability measures on (R, B(R)) is relatively
compact, if every sequence in Π contains a weakly convergent subsequence.
Theorem 3.15 (Helly’s selection theorem) For every sequence (Fn )n∈N of distribu-
tion functions, there exist a subsequence (Fnk )k∈N and a function F which is non-decreasing
and right-continuous for which
Let µ be the unique measure satisfying
µ_k( (−k, k] ) ≤ 1 − ε.   (3.20)
There is a subsequence such that µ_{k(j)} −→_w µ as j → ∞ for some probability measure µ, by relative compactness. Choose a, b so that µ{a} = µ{b} = 0 and
µ(a, b] > 1 − ε/2.   (3.21)
For large enough j, (a, b] ⊆ (−k(j), k(j)]. Then, using (3.20),
1 − ε ≥ µ_{k(j)}( (−k(j), k(j)] ) ≥ µ_{k(j)}( (a, b] ) −−−→ µ(a, b] as j → ∞,
contradicting (3.21).
Corollary 3.16 If (µ_n)_{n≥1} is tight and if each subsequence that converges weakly converges to the same probability measure µ, then µ_n −→_w µ.
Lemma 3.17 Let x_n, n ∈ N, and x be real numbers. If each subsequence (x_{n_k})_{k≥1} contains a further subsequence (x_{n_k(j)})_{j≥1} with x_{n_k(j)} −−−→ x as j → ∞, then x_n −−−→ x as n → ∞.
Proof. Suppose x_n ↛ x. Then, (∃ε > 0) such that (∀n_0) (∃n ≥ n_0) such that |x_n − x| ≥ ε. In particular, (∀k) (∃n_k ≥ k) with
|x_{n_k} − x| ≥ ε.   (3.22)
By assumption, there exists a subsequence (x_{n_k(j)})_{j≥1} with x_{n_k(j)} −−−→ x as j → ∞. This contradicts (3.22).
Proof of Corollary 3.16. Let f be bounded and continuous and set x_n = ∫ f dµ_n. Since (µ_n)_{n≥1} is tight, it is relatively compact. Thus, (x_n)_{n≥1} satisfies the assumptions of Lemma 3.17 and it follows that ∫ f dµ_n −−−→ ∫ f dµ as n → ∞. Thus, µ_n −→_w µ.
Note that we have
E[f(X)] = ∫_R f(x) µ(dx)
for all measurable maps f : R → C such that E[f(X)] is well-defined. In particular,
ϕ_X(t) = E[e^{itX}] = ∫_R e^{itx} µ(dx)
is the characteristic function of X. Thus, if X and Y have the same distribution, then they have the same characteristic function.
(c) |ϕ(t)| ≤ 1 ∀t ∈ R.
Proof.
(d) |ϕ(t + h) − ϕ(t)| = |E[ e^{i(t+h)X} − e^{itX} ]|
= |E[ e^{itX} · (e^{ihX} − 1) ]|
≤ E[ |e^{itX}| · |e^{ihX} − 1| ]
= E[ |e^{ihX} − 1| ]   (independent of t, and |e^{ihX} − 1| ≤ 2)
−−→ 0 as h → 0, by the bounded convergence theorem.
A table with the characteristic functions of many distributions can be found in Theorem
15.12 in [Kle06].
Example 3.22 If X ∼ N(m, σ^2), then
ϕ_X(t) = exp( imt − σ^2 t^2/2 ),   t ∈ R.   (3.28)
Proof.
Hence,
ϕ'(t)/ϕ(t) = −t
=⇒ (ln ϕ(t))' = −t
=⇒ ln ϕ(t) = −t^2/2 + c for some c ∈ R
=⇒ ϕ(t) = e^{−t^2/2 + c}.
Lemma 3.24 Let X ∼ µ have characteristic function ϕ. Then, µ^{(σ)} has density
f^{(σ)}(x) = (1/2π) ∫_R ϕ(t) exp( −ixt − σ^2 t^2/2 ) dt,   x ∈ R.   (3.31)
e^{−s^2/(2σ^2)} = ϕ_Y(s) = ∫_R e^{ist} (σ/√(2π)) e^{−t^2 σ^2/2} dt.   (3.32)
By the formula for the density of convolutions,
f^{(σ)}(x) = ∫_R (1/(√(2π)σ)) e^{−(x−y)^2/(2σ^2)} µ(dy)
= (1/2π) ∫_R ∫_R e^{i(y−x)t} e^{−t^2σ^2/2} dt µ(dy)   where we used (3.32) for s = y − x
= (1/2π) ∫_R ( ∫_R e^{iyt} µ(dy) ) e^{−ixt} e^{−t^2σ^2/2} dt   by Fubini's theorem,
and ∫_R e^{iyt} µ(dy) = ϕ(t).
X + σY −→ X almost surely,
f(x) = (1/2π) ∫ e^{−itx} ϕ(t) dt = lim_{σ→0} f^{(σ)}(x),   x ∈ R.   (3.35)
Proof. Let Y ∼ N(0, σ^2) be independent of X, f^{(σ)} the density of X + Y. By Lemma 3.24,
f^{(σ)}(x) = (1/2π) ∫ ϕ(t) exp( −ixt − σ^2 t^2/2 ) dt.
So,
µ(a, b] = lim_{σ→0} µ^{(σ)}(a, b]   because µ^{(σ)} −→_w µ
= lim_{σ→0} (1/2π) ∫_{−∞}^∞ ϕ(t) e^{−σ^2 t^2/2} · (e^{−ibt} − e^{−iat})/(−it) dt.
Furthermore,
|f^{(σ)}(x)| ≤ (1/2π) ∫_R |ϕ(t)| dt =: c < ∞
for all σ > 0, x ∈ R. Thus, for a < b with µ{a} = µ{b} = 0, the bounded convergence theorem implies
µ(a, b] = lim_{σ→0} µ^{(σ)}(a, b] = lim_{σ→0} ∫_a^b f^{(σ)}(x) dx = ∫_a^b f(x) dx.
Example 3.27 (Cauchy distribution) Let X have characteristic function ϕ(t) = e−|t| ,
t ∈ R. Then,
∫_R |ϕ(t)| dt < ∞.
Hence, X has density
f(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} e^{−|t|} dt
= (1/2π) ∫_{−∞}^0 e^{t(1−ix)} dt + (1/2π) ∫_0^∞ e^{−t(1+ix)} dt
= (1/2π) [ e^{t(1−ix)}/(1−ix) ]_{t=−∞}^{t=0} + (1/2π) [ e^{−t(1+ix)}/(−(1+ix)) ]_{t=0}^{t=∞}
= (1/2π) ( 1/(1−ix) + 1/(1+ix) )
= (1/2π) · 2/(1−(ix)^2) = (1/π) · 1/(1+x^2) = density of a Cauchy distribution.
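The inversion formula used here can also be verified numerically, by integrating e^{−itx} e^{−|t|} over a large window and comparing with 1/(π(1+x²)). A minimal sketch (not from the notes; Python with numpy; the truncation level and the grid size are arbitrary numerical choices):

import numpy as np

t = np.linspace(-50.0, 50.0, 200_001)                    # truncated integration grid
phi = np.exp(-np.abs(t))                                 # characteristic function e^{-|t|}
for x in [0.0, 1.0, 2.0]:
    integrand = np.exp(-1j * t * x) * phi
    f_num = np.trapz(integrand, t).real / (2.0 * np.pi)  # (1/2 pi) * integral of e^{-itx} phi(t) dt
    f_exact = 1.0 / (np.pi * (1.0 + x**2))
    print(f"x={x}: numerical {f_num:.6f}, exact {f_exact:.6f}")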
Suppose X1 , . . . , Xn are independent, Cauchy distributed. Let Sn = X1 + · · · + Xn .
The following lemma relates the behavior of ϕ near 0 to the tail behavior of X.
Proof. We have
(1/u) ∫_{−u}^u (1 − ϕ(t)) dt = (1/u) ∫_{−u}^u ∫ (1 − e^{itx}) µ(dx) dt
= (1/u) ∫ ∫_{−u}^u (1 − e^{itx}) dt µ(dx)   by Fubini's theorem.
We calculate the inner integral:
∫_{−u}^u (1 − e^{itx}) dt = ∫_{−u}^u (1 − cos(tx)) dt − i ∫_{−u}^u sin(tx) dt   (the second integral is 0)
= 2 ∫_0^u (1 − cos(tx)) dt = 2 [ t − sin(tx)/x ]_{t=0}^{t=u}
= 2u ( 1 − sin(ux)/(ux) ).
This yields
(1/u) ∫_{−u}^u (1 − ϕ(t)) dt = 2 ∫ ( 1 − sin(ux)/(ux) ) µ(dx)
(Note: |sin(x)| ≤ |x| ∀x ∈ R =⇒ 1 − sin(ux)/(ux) ≥ 0)
≥ 2 ∫_{{x : |x| ≥ 2/u}} ( 1 − sin(ux)/(ux) ) µ(dx).
Use |sin(ux)| ≤ 1:
(1/u) ∫_{−u}^u (1 − ϕ(t)) dt ≥ 2 ∫_{{x : |x| ≥ 2/u}} ( 1 − 1/|ux| ) µ(dx) ≥ µ( x : |x| ≥ 2/u ),
since 1/|ux| ≤ 1/2 on {|x| ≥ 2/u}, so the integrand there is ≥ 1/2.
because x ↦ e^{itx} = cos(tx) + i sin(tx) has bounded continuous real and imaginary parts.
"⇐=": Assume ϕ_n(t) −−−→ ϕ(t) as n → ∞, ∀t ∈ R. Let ε > 0. As a characteristic function, ϕ is continuous at 0 with ϕ(0) = 1. Hence, we can find u > 0 so that for |t| < u one has |1 − ϕ(t)| < ε/2. Hence,
(1/u) ∫_{−u}^u |1 − ϕ(t)| dt < ε.
By the dominated convergence theorem,
∫_{−u}^u (1 − ϕ_n(t)) dt −→ ∫_{−u}^u (1 − ϕ(t)) dt.
µ_n([−M, M]) = 1 − µ_n({x : |x| > M}) ≥ 1 − 2ε   ∀n = 1, 2, ...
Proof. See [Dur05], (3.6) in Chapter 2.
Proof.
• We show the formula (3.41) for ϕ(k) by induction over k. By the definition of ϕ, it
is true for k = 0.
Induction step: Suppose (3.41) is true for k ∈ N0 .
Suppose E[|X|^{k+1}] < ∞. Then E[|X|^k] < ∞ and by the induction assumption ϕ^{(k)}(t) = E[ (iX)^k e^{itX} ]. One has
| (ϕ^{(k)}(t + h) − ϕ^{(k)}(t))/h − i^{k+1} E[ X^{k+1} e^{itX} ] | ≤ E[ | i^k X^k (e^{i(t+h)X} − e^{itX})/h − i^{k+1} X^{k+1} e^{itX} | ]
= E[ |X|^k · | (e^{ihX} − 1)/h − iX | ] =: I_h,
where ψ_h(X) := | (e^{ihX} − 1)/h − iX |.
• Uniform continuity of ϕ(k) is proved the same way as uniform continuity of ϕ.
• Taylor expansion: Suppose E[|X|k ] < ∞.
| ϕ(t) − ∑_{j=0}^k (ϕ^{(j)}(0)/j!) t^j | ≤ E[ | e^{itX} − ∑_{j=0}^k (itX)^j/j! | ]
≤ E[ min{ |tX|^{k+1}/(k+1)!, 2|tX|^k/k! } ]
= |t|^k E[ min{ |t|·|X|^{k+1}/(k+1)!, 2|X|^k/k! } ]
= o(t^k) as t → 0,
since the expectation tends to 0 by dominated convergence (min{...} ≤ 2|X|^k/k!, which is integrable).
Theorem 3.34 If
lim inf_{h↓0} (ϕ(h) − 2ϕ(0) + ϕ(−h))/h^2 > −∞,
then E[X^2] < ∞. In particular, E[X^2] < ∞ if ϕ'' exists and is continuous at 0.
Proof.
∞ > − lim inf_{h↓0} (ϕ(h) − 2ϕ(0) + ϕ(−h))/h^2 = lim sup_{h↓0} E[ −(e^{ihX} − 2 + e^{−ihX})/h^2 ]
(note −(e^{ihX} − 2 + e^{−ihX})/h^2 = 2(1 − cos(hX))/h^2 ≥ 0)
≥ E[ lim inf_{h↓0} 2(1 − cos(hX))/h^2 ]   by Fatou's lemma
= E[X^2]   because 1 − cos(hx) ∼ (hx)^2/2 as h → 0.
Proof. Let m = E[X_1], ϕ = ϕ_{X_1}. Then,
ϕ_{S_n/n}(t) = E[ e^{itS_n/n} ] = E[ ∏_{j=1}^n e^{itX_j/n} ] = ∏_{j=1}^n ϕ_{X_j}(t/n)   by independence
= ϕ(t/n)^n = ( 1 + ϕ'(0) t/n + o(1/n) )^n   as n → ∞ (Taylor expansion for ϕ)
= ( 1 + itm/n + o(1/n) )^n   because E[|X_1|] < ∞
−−−→ e^{imt} = characteristic function of δ_m,   by Lemma 3.8.
Lévy's continuity theorem =⇒ S_n/n −→_w m.
Hence, by Theorem 3.11, S_n/n −→_P m.
L( (S_n − nm)/(σ√n) ) −→_w N(0, 1).   (3.46)
ϕ(t) = 1 + ϕ'(0) t + (ϕ''(0)/2) t^2 + o(t^2)   (with ϕ'(0) = im = 0 and ϕ''(0) = i^2 σ^2 = −σ^2)
= 1 − (σt)^2/2 + o(t^2)   as t → 0.
As before,
ϕ_{S_n/(σ√n)}(t) = ϕ_{S_n}( t/(σ√n) ) = ϕ( t/(σ√n) )^n
= ( 1 − t^2/(2n) + o(1/n) )^n   as n → ∞
−−−→ e^{−t^2/2} = characteristic function of N(0, 1).
Lévy's continuity theorem =⇒ S_n/(σ√n) −−−→_w N(0, 1) as n → ∞.
Theorem 3.38 (Lindeberg-Feller theorem) Assume
(i) for each n, X_{n,1}, ..., X_{n,n} are independent;
(ii) E[X_{n,j}] = 0 for all n, j;
(iii) ∑_{j=1}^n E[X_{n,j}^2] −−−→ σ^2 > 0 as n → ∞;
(iv) (∀ε > 0) ∑_{j=1}^n E[ X_{n,j}^2 1_{{|X_{n,j}|>ε}} ] −−−→ 0 as n → ∞.
Then,
S_n = ∑_{j=1}^n X_{n,j} −→_w N(0, σ^2).   (3.47)
Set X_{n,j} = (1/√n) Y_j. (i) and (ii) are clearly satisfied. We check (iii) and (iv):
• ∑_{j=1}^n E[X_{n,j}^2] = ∑_{j=1}^n (1/n) E[Y_j^2] = E[Y_1^2] = σ^2.
• ∑_{j=1}^n E[ X_{n,j}^2 1_{{|X_{n,j}|>ε}} ] = n ∫_{{|Y_1|>ε√n}} (Y_1^2/n) dP = ∫_{{|Y_1|>ε√n}} Y_1^2 dP −−−→ 0 as n → ∞, by the dominated convergence theorem, because E[Y_1^2] < ∞.
Proof of Lindeberg-Feller theorem. Let σ_{n,j}^2 = E[X_{n,j}^2].
Claim 1: lim_{n→∞} max_{1≤j≤n} σ_{n,j}^2 = 0.
Proof. For all ε > 0,
σ_{n,j}^2 = E[X_{n,j}^2] ≤ ε^2 + E[ X_{n,j}^2 1_{{|X_{n,j}|>ε}} ] ≤ ε^2 + ∑_{j'=1}^n E[ X_{n,j'}^2 1_{{|X_{n,j'}|>ε}} ]
=⇒ lim sup_{n→∞} max_{1≤j≤n} σ_{n,j}^2 ≤ ε^2   by (iv).
Since ε > 0 is arbitrary, claim 1 follows.
Let ϕ_{n,j}(t) = E[ e^{itX_{n,j}} ].
because |error(n)| ≤ c ∑_{j=1}^n (σ_{n,j}^2 t^2/2)^2 ≤ c' max_{1≤j≤n} σ_{n,j}^2 · ∑_{j=1}^n σ_{n,j}^2 −−−→ 0 as n → ∞ (the first factor tends to 0, the second to σ^2).
=⇒ ∏_{j=1}^n ( 1 − σ_{n,j}^2 t^2/2 ) −−−→ e^{−σ^2 t^2/2} as n → ∞.
If (3.48) holds, then
ϕ_{∑_{j=1}^n X_{n,j}}(t) = ∏_{j=1}^n ϕ_{n,j}(t) −−−→ e^{−σ^2 t^2/2} as n → ∞.
Lévy's continuity theorem implies: ∑_{j=1}^n X_{n,j} −→_w N(0, σ^2). This completes the proof of claim 2.
Hence,
J_n := | ∏_{j=1}^n ϕ_{n,j}(t) − ∏_{j=1}^n ( 1 − σ_{n,j}^2 t^2/2 ) | ≤ ∑_{j=1}^n | ϕ_{n,j}(t) − ( 1 − σ_{n,j}^2 t^2/2 ) |
= ∑_{j=1}^n | E[ e^{itX_{n,j}} − 1 − itX_{n,j} − (itX_{n,j})^2/2 ] |   because E[X_{n,j}] = 0
≤ ∑_{j=1}^n E[ (|t|^3/6) |X_{n,j}|^3 1_{{|X_{n,j}|≤ε}} + t^2 X_{n,j}^2 1_{{|X_{n,j}|>ε}} ]   for all ε > 0
≤ (|t|^3/6) ε ∑_{j=1}^n E[ X_{n,j}^2 1_{{|X_{n,j}|≤ε}} ] + t^2 ∑_{j=1}^n E[ X_{n,j}^2 1_{{|X_{n,j}|>ε}} ].
Since Jn ≥ 0 and ε > 0 is arbitrary, we conclude limn→∞ Jn = 0. This proves (3.48) and
completes the proof of the Lindeberg-Feller theorem.
Example 3.41 Pick a permutation uniformly at random from the set of all permutations
of {1, . . . , n} and consider the cycle decomposition from Example 2.3. Recall the random
variables Xn,1 , Xn,2 , . . . , Xn,n . Assume Xn,1 , Xn,2 , . . . , Xn,n , n = 1, 2, . . . are independent
with
P(X_{n,k} = 1) = 1 − P(X_{n,k} = 0) = 1/(n − k + 1),   (3.50)
S_n = ∑_{k=1}^n X_{n,k} = number of cycles of a uniformly chosen permutation of {1, ..., n}.   (3.51)
We know
E[S_n] = ∑_{k=1}^n 1/k ∼ ln n as n → ∞.   (3.52)
Now,
Var(X_{n,k}) = E[X_{n,k}^2] − (E[X_{n,k}])^2 = 1/(n − k + 1) − (1/(n − k + 1))^2.   (3.53)
By independence,
Var(S_n) = ∑_{k=1}^n Var(X_{n,k}) = ∑_{k=1}^n 1/k − ∑_{k=1}^n 1/k^2 ∼ ln n as n → ∞.   (3.54)
We know: S_n/E[S_n] −→_P 1. Hence,
S_n/ln n −→_P 1, i.e. P( |S_n/ln n − 1| > ε ) −−−→ 0 as n → ∞, ∀ε > 0   (3.55)
⇐⇒ P( (1 − ε) ln n ≤ S_n ≤ (1 + ε) ln n ) −−−→ 1 as n → ∞, ∀ε > 0.   (3.56)
(iv) ∑_{k=1}^n E[ Y_{n,k}^2 1_{{|Y_{n,k}|>ε}} ] = 0 ∀n large enough and ε > 0,
because |Y_{n,k}| > ε ⇐⇒ |X_{n,k} − 1/(n−k+1)| > ε √(ln n), which is false for all k if n is large enough.
Note that
∑_{k=1}^{n−1} 1/k ≥ ∫_1^n dx/x = ln n ≥ ∑_{k=1}^{n−1} 1/(k+1) = ∑_{k=2}^n 1/k.   (3.59)
Hence,
−1/n ≥ ln n − E[S_n] ≥ −1   (3.60)
and we conclude |E[S_n] − ln n| ≤ 1.
=⇒ (E[S_n] − ln n)/√(ln n) −−−→ 0 as n → ∞.   (3.61)
=⇒ (S_n − ln n)/√(ln n) −→_w N(0, 1).   (3.62)
Thus,
P( ln n + a√(ln n) ≤ S_n ≤ ln n + b√(ln n) ) = P( a ≤ (S_n − ln n)/√(ln n) ≤ b )   (3.63)
−−−→ (1/√(2π)) ∫_a^b e^{−x^2/2} dx as n → ∞, ∀a < b.   (3.64)
4 Conditional expectation
4.1 Motivation
Assume X is random variable on some probability space (Ω, F, P) and X ∈ L1 . The
expectation E[X] can be interpreted as a prediction for the unknown (random) value of
X. Assume F0 ⊆ F, F0 is a σ-field and assume we ”have the information in F0 ”, i.e. for
each A0 ∈ F0 we know if A0 will occur or not. How does this partial information modify
the prediction of X?
Example 4.2 X1 , X2 , . . . i.i.d. with E[|X1 |] < ∞ and m = E[X1 ]. How should we modify
the prediction E[X1 ] = m if we know the value Sn (ω) = X1 (ω) + . . . + Xn (ω)?
The solution of the prediction problem is to pass from the constant E[X] = m to
a random variable E[X|F0 ], which is measurable with respect to F0 , the conditional
expectation of X, given F0 .
E[X|F_0](ω) := ∑_{i : P(A_i)>0} (1/P(A_i)) E[X 1_{A_i}] 1_{A_i}(ω).   (4.1)
(If ω ∈ Ai and P(Ai ) = 0, we set E[X|F0 ](ω) = 0.) (4.1) gives for each ω ∈ Ω a prediction
E[X|F0 ](ω) which uses only the information in which atom ω is.
Theorem 4.4 The random variable E[X|F0 ] (defined in (4.1)) has the following proper-
ties:
(i) E[X|F_0] is measurable with respect to F_0.
(ii) For each random variable Y_0 ≥ 0, which is measurable with respect to F_0, we have
E[Y_0 X] = E[ Y_0 E[X|F_0] ].   (4.2)
In particular,
E[X] = E[ E[X|F_0] ].   (4.3)
Proof. For Y_0 = 1_{A_j}, (4.1) gives E[ Y_0 E[X|F_0] ] = E[X 1_{A_j}] if P(A_j) > 0. Hence (4.2) follows in this case from (4.1). Next, we consider functions of the form ∑ c_i 1_{A_i} (c_i ≥ 0), then monotone limits of such functions as in the definition of the integral ⇒ (4.2) holds true for all Y_0 ≥ 0, Y_0 measurable with respect to F_0. Taking Y_0 ≡ 1, (4.3) follows.
Definition 4.5 If F0 = σ(Y ) for some random variable Y , we write E[X|Y ] instead of
E[X|σ(Y )].
Example 4.6 Take p ∈ [0, 1], let X_1, X_2, ... be i.i.d. Bernoulli random variables with parameter p, i.e. P(X_i = 1) = p = 1 − P(X_i = 0).
Question: What is E[X_1|S_n]?
Answer: E[X_1|S_n] = ∑_{k=0}^n P(X_1 = 1|S_n = k) 1_{{S_n = k}} and
P(X_1 = 1|S_n = k) = P(X_1 = 1, S_n = k)/P(S_n = k) = p (n−1 choose k−1) p^{k−1}(1−p)^{n−1−(k−1)} / ( (n choose k) p^k (1−p)^{n−k} ) = k/n
⇒ E[X_1|S_n] = S_n/n.   (4.4)
Remark 4.7 E[X1 |Sn ] does not depend on the ”success parameter” p.
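The identity E[X_1|S_n] = S_n/n can be checked by conditioning empirically on the value of S_n. A minimal sketch (not part of the notes; Python with numpy; n, p and the sample size are arbitrary choices) averages X_1 over those samples with S_n = k and compares with k/n; the answer does not involve p, in line with Remark 4.7:

import numpy as np

rng = np.random.default_rng(7)
n, p, reps = 10, 0.3, 200_000
X = rng.random((reps, n)) < p                  # i.i.d. Bernoulli(p) rows (X_1, ..., X_n)
S = X.sum(axis=1)
for k in [2, 3, 5]:
    rows = S == k
    est = X[rows, 0].mean()                    # empirical E[X_1 | S_n = k]
    print(f"k={k}: estimate {est:.3f}, k/n = {k/n:.3f}")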
4.3 Conditional expectation for general σ-fields
We recall the following facts from measure theory.
Definition 4.9 Let P and Q be measures on (Ω, F). Q is called absolutely continuous with respect to P (we write Q ≪ P) if one has
P(A) = 0 =⇒ Q(A) = 0 for all A ∈ F.
Example 4.10 Let λ be the Lebesgue measure on R and ν the law of a standard normal random variable, i.e.
ν(A) = ∫_A (1/√(2π)) e^{−x^2/2} dx for all A ∈ B(R).   (4.7)
Then, ν ≪ λ. In fact, µ ≪ λ for all probability measures µ on R which have a density.
Remark 4.12 The Radon-Nikodym density is unique up to P-nullsets, i.e. if f and f˜ are
two functions satisfying (2) in Theorem 4.11, then P(f 6= f˜) = 0.
We write X0 = E[X|F0 ].
Remark 4.14 (a) To check (ii) in Definition 4.13, it suffices to check E[X1A0 ] =
E[X0 1A0 ], ∀A0 ∈ F0 .
(b) If X0 and X̃0 are random variables which satisfy (ii) in Definition 4.13, then A0 =
{X0 > X̃0 } ∈ F0 . (4.9) implies that E[X0 1A0 ] = E[X̃0 1A0 ] ⇒ E[(X0 − X̃0 )1A0 ] =
0 ⇒ P(A0 ) = 0. In the same way, P(X0 < X̃0 ) = 0 ⇒ X0 = X̃0 P-a.s.
E[X_i|S_n] = S_n/n,   i = 1, ..., n.   (4.11)
For the proof, we will need the following lemma.
Lemma 4.18 Let X and Y be random variables on some probability space (Ω, F, P).
Then, the following statements are equivalent:
(a) Y is measurable with respect to σ(X).
(b) There is a measurable function h : (R, B) → (R, B) such that Y = h(X).
Proof of Lemma 4.18. (b) ⇒ (a) is clear because the composition of measurable
functions is measurable.
(a) ⇒ (b): Take first Y = 1_A, A ∈ σ(X). Then, A = {X ∈ B} for some B ∈ B and Y = h(X) = 1_B(X) (i.e. h(z) = 1 if z ∈ B, h(z) = 0 otherwise). Then, take Y = ∑_i c_i 1_{A_i} (with constants c_i ≥ 0), then monotone limits of such functions etc.
We now give the
Proof of Lemma 4.17. Let Y_0 ≥ 0, Y_0 measurable with respect to σ(S_n). Hence, with Lemma 4.18, Y_0 = h(S_n) for a measurable function h. Hence, E[X_i h(S_n)] = ∫···∫ x_i h(x_1 + ... + x_n) µ(dx_1)...µ(dx_n), where µ is the law of X_1. But E[X_i h(S_n)] is invariant under permutations of the indices {1, ..., n} ⇒ E[X_i h(S_n)] = E[X_j h(S_n)] ∀i, j ⇒ E[X_i h(S_n)] = (1/n) ∑_{k=1}^n E[X_k h(S_n)] = E[ (S_n/n) h(S_n) ] ⇒ E[X_i Y_0] = E[ (S_n/n) Y_0 ], and we showed that S_n/n satisfies property (ii) in Definition 4.13.
Remark 4.19 The proof used not independence but only the weaker property that the
joint law of (X1 , . . . , Xn ) is invariant under permutations of the indices.
Remark 4.21 Concerning (c) in Theorem 4.20, lim_n E[X_n|F_0] is defined as follows. Let A := ⋂_n {X_n ≤ X_{n+1}}. Due to the hypothesis, P(A) = 1 and (b) implies P(A_0) = 1, where A_0 = ⋂_n {E[X_n|F_0] ≤ E[X_{n+1}|F_0]} (for all versions E[X_n|F_0], E[X_{n+1}|F_0]). We now set lim_n X_n(ω) = lim_n X_n(ω) 1_A(ω) and lim_n E[X_n|F_0] = lim_n E[X_n|F_0] 1_{A_0}(ω). Then (c) says that lim_n E[X_n|F_0] is (a version of) the conditional expectation of lim_n X_n, given F_0, i.e. a random variable with properties 4.13 (i) and 4.13 (ii) with F_0 and X = lim_n X_n.
Proof.
(a) For each choice of a version E[X_i|F_0] (i = 1, 2) we have that E[X_1|F_0] + E[X_2|F_0] is a random variable which is measurable with respect to F_0, and for Y_0 ≥ 0, Y_0 measurable with respect to F_0, we have E[Y_0(E[X_1|F_0] + E[X_2|F_0])] = E[Y_0 E[X_1|F_0]] + E[Y_0 E[X_2|F_0]] = E[Y_0 X_1] + E[Y_0 X_2] = E[Y_0(X_1 + X_2)], using 4.13 (ii).
(b) Let B_0 = {E[X_1|F_0] > E[X_2|F_0]}. Then, B_0 ∈ F_0 and ∫_{B_0} (E[X_1|F_0] − E[X_2|F_0]) dP = E[1_{B_0}(E[X_1|F_0] − E[X_2|F_0])] = E[1_{B_0}(X_1 − X_2)] = ∫_{B_0} (X_1 − X_2) dP ≤ 0 (by 4.13 (ii) and X_1 ≤ X_2 P-a.s.) ⇒ P(B_0) = 0.
(c) Let Y_0 ≥ 0, Y_0 measurable with respect to F_0. Then E[Y_0 lim_n E[X_n|F_0]] = lim_n E[Y_0 E[X_n|F_0]] (monotone convergence) = lim_n E[Y_0 X_n] (by 4.13 (ii)) = E[Y_0 lim_n X_n] (monotone convergence).
The following theorem gives two important cases where the conditional expectation
simplifies.
Theorem 4.22 (a) Let Z0 ≥ 0 be a random variable which is measurable with respect
to F0 . Then
E[Z0 X|F0 ] = Z0 E[X|F0 ]. (4.12)
Proof.
(a) The right hand side of (4.12) is measurable with respect to F_0, and for Y_0 ≥ 0, Y_0 measurable with respect to F_0, we have E[Y_0(Z_0 X)] = E[(Y_0 Z_0)X] = E[Y_0 Z_0 E[X|F_0]] = E[Y_0(Z_0 E[X|F_0])], using 4.13 (ii).
Theorem 4.24 (Lebesgue’s Theorem for conditional expectations) Assume |Xn | ≤
Y P-a.s. ∀n for some Y ∈ L1 , Xn → X P-a.s. Then
E[X|F_0] = E[ lim_n X_n | F_0 ] = lim_n E[X_n|F_0]   P-a.s.   (4.15)
Moreover, we have
Theorem 4.25 (Jensen’s inequality for conditional expectations) Assume X ∈ L1
and f : R → R convex. Then, E[f (X)] is well-defined and E[f (X)|F0 ] ≥ f (E[X|F0 ]),
P-a.s.
Sketch of proof of Theorem 4.25. Each convex function f is of the form f(x) = sup_n ℓ_n(x) ∀x with linear functions ℓ_n(x) = a_n x + b_n. In particular, f ≥ ℓ_n, ℓ_n(X) ∈ L^1. Since E[f(X)|F_0] ≥ E[ℓ_n(X)|F_0] = ℓ_n(E[X|F_0]) (by 4.20(b) and 4.20(a)), we have E[f(X)|F_0] ≥ sup_n ℓ_n(E[X|F_0]) = f(E[X|F_0]), P-a.s.
Corollary 4.26 For p ≥ 1, conditional expectation is a contraction of Lp in the following
sense: X ∈ Lp ⇒ E[X|F0 ] ∈ Lp and kE[X|F0 ]kp ≤ kXkp .
Proof. With f (x) = |x|p , Jensen’s inequality for conditional expectations implies that
|E[X|F0 ]|p ≤ E[|X|p |F0 ] ⇒ E[|E[X|F0 ]|p ] ≤ E[|X|p ] ⇒ kE[X|F0 ]kp ≤ kXkp .
In particular, if X ∈ L2 , then E[X|F0 ] ∈ L2 and E[X|F0 ] can be interpreted as the
”best” prediction of X, given F0 , in the following sense.
Theorem 4.27 Assume X ∈ L2 , Y0 is measurable with respect to F0 and Y0 ∈ L2 .
Then E [(X − E[X|F0 ])2 ] ≤ E[(X − Y0 )2 ] and we have ”=” if and only if Y0 = E[X|F0 ],
P-a.s.
Proof. Assume X_0 is a version of E[X|F_0]. Then E[(X − Y_0)^2] = E[X^2] − 2E[XY_0] + E[Y_0^2], and E[XY_0] = E[X_0 Y_0] by the defining property of the conditional expectation. For Y_0 = X_0, we conclude E[(X − X_0)^2] = E[X^2] − E[X_0^2]. Hence E[(X − Y_0)^2] = E[(X − X_0)^2] + E[(X_0 − Y_0)^2] ⇒ E[(X − Y_0)^2] ≥ E[(X − X_0)^2], with "=" if and only if X_0 = Y_0, P-a.s.
Remark 4.28 Theorem 4.27 says that the conditional expectation E[X|F0 ] is the projec-
tion of the element X in the Hilbert space L2 (Ω, F, P) on the closed subspace L2 (Ω, F0 , P).
Theorem 4.29 (Tower property of conditional expectation) Let F0 , F1 be σ-fields
with F0 ⊆ F1 ⊆ F and X a random variable with X ≥ 0. Then,
E[E[X|F1 ]|F0 ] = E[X|F0 ] P-a.s. (4.16)
and
E[E[X|F0 ]|F1 ] = E[X|F0 ] P-a.s. (4.17)
Proof. To show (4.16), we have to prove (see (4.9) in the definition of conditional ex-
pectation) that for Y0 ≥ 0, Y0 measurable with respect to F0 , E[Y0 E[X|F1 ]] = E[Y0 E[X|F0 ]].
But, again due to (4.9), E[Y0 E[X|F0 ]] = E[Y0 X]. Now, since Y0 is F1 -measurable as well,
again due to (4.9), E[Y0 E[X|F1 ]] = E[Y0 X] and we conclude.
(4.17) is clear since E[X|F0 ] is measurable with respect to F1 : use (4.12).
5 Martingales
5.1 Definition and examples
(Ω, F, P) probability space, F0 ⊆ F1 ⊆ F2 ⊆ . . . increasing sequence of σ-fields with
Fi ⊆ F, ∀i. Such a sequence (Fn ) is called a filtration. Interpretation: Fn is the
collection of events observable until time n.
(M2) Mn ∈ L1 , ∀n .
Remarks 5.2 (a) Under the assumptions (M1) and (M2), (M3) is equivalent to
(c) We say that (Mn ) is adapted to (Fn ) (meaning that for each n, Mn is measurable
with respect to Fn ).
Example 5.3 Let p ∈ (0, 1), Y_1, Y_2, ... i.i.d. Bernoulli RV with parameter p, i.e. P(Y_i = 1) = p = 1 − P(Y_i = −1). Let S_n := ∑_{i=1}^n Y_i, n ≥ 1 (S_0 = 0), be the partial sums and F_n = σ(Y_1, ..., Y_n), F_0 = {∅, Ω}. Define M_n := S_n − n(2p − 1) (n = 0, 1, ...). Then (M_n) is a martingale with respect to (F_n). In the same way, for x ∈ R, M̃_n = x + S_n − n(2p − 1) (n = 0, 1, ...) is a martingale with respect to (F_n). Note that (S_n) is a martingale with respect to (F_n) if and only if p = 1/2.
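The compensation by n(2p − 1) can be seen numerically: E[M_n] stays at M_0 = 0 for every n, while E[S_n] drifts when p ≠ 1/2. A minimal sketch (not from the notes; Python with numpy; p and the horizon are illustrative choices):

import numpy as np

rng = np.random.default_rng(8)
p, n, reps = 0.7, 50, 100_000
Y = np.where(rng.random((reps, n)) < p, 1.0, -1.0)    # i.i.d. steps with P(Y = 1) = p
S = Y.cumsum(axis=1)
M = S - np.arange(1, n + 1) * (2 * p - 1)             # compensated walk M_n = S_n - n(2p - 1)
for m in [1, 10, 50]:
    print(f"n={m}: E[S_n] = {S[:, m-1].mean():.3f},  E[M_n] = {M[:, m-1].mean():.3f}")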
5.1.2 Successive predictions
Let (Fn ) be a filtration and X ∈ L1 a random variable. Set
Mn := E[X|Fn ], n = 0, 1, 2, . . . (5.2)
Proof.
(M3) E[M_{n+1}|F_n] = M_n ∀n, with the same argument as before, i.e. we show that Q(A) = ∫_A E[M_{n+1}|F_n] dP ∀A ∈ F_n. More precisely: take A ∈ F_n. Then, Q(A) = ∫ 1_A M_{n+1} dP (since A ∈ F_{n+1}) = ∫ 1_A E[M_{n+1}|F_n] dP (since A ∈ F_n) = ∫_A E[M_{n+1}|F_n] dP ⇒ M_n = E[M_{n+1}|F_n].
Fix x ∈ S and consider the Markov chain (Xn ) with X0 = x and transition matrix K.
Assume h is harmonic. Let Fn = σ(X0 , . . . , Xn ), n = 0, 1, . . . Then, Mn := h(Xn ), (n =
0, 1, . . .) is a martingale with respect to (Fn ).
(M1) is true by definition.
We check (M3):
E[h(X_{n+1})|F_n] = ∑_{y∈S} h(y) K(X_n, y) = h(X_n)   a.s.   (5.5)
(Exercise: show that (5.5) follows from (5.4) and the Markov property). In particular,
taking expectations in (5.5), E[h(Xn+1 )] = E[h(Xn )], ∀n.
By induction, E[h(Xn )] = h(x) and (M2) follows.
P (Yn,i = k) = pk ∀k ∈ N0 . (5.6)
Yn,i is the number of children of the i-th individual in generation n − 1 (if this individual
exists). Set
Z_0 := 1,   (5.7)
Z_{n+1} := ∑_{i=1}^{Z_n} Y_{n+1,i}   ∀n,   (5.8)
Claim 2: E[Z_n] = m^n.
Proof of Claim 2 by induction over n.
n = 0: E[Z_0] = 1 = m^0.
n → n + 1:
E[Z_{n+1}] = E[ E[Z_{n+1}|F_n] ] = E[m Z_n]   by Claim 1
= m · m^n   by the induction assumption
= m^{n+1}.
(M2) E[ Z_n/m^n ] = (1/m^n) E[Z_n] = 1 < ∞, since E[Z_n] = m^n.
(M3) E[ Z_{n+1}/m^{n+1} | F_n ] = (1/m^{n+1}) m Z_n = Z_n/m^n.
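The martingale Z_n/m^n can be simulated directly. A minimal sketch (not part of the notes; Python with numpy; a Poisson offspring law with mean m = 1.5 is an arbitrary choice of (p_k) with ∑ k p_k = m) shows that the empirical mean of Z_n/m^n stays close to 1 while Z_n itself grows like m^n:

import numpy as np

rng = np.random.default_rng(9)
m, generations, reps = 1.5, 10, 20_000
Z = np.ones(reps, dtype=np.int64)                   # Z_0 = 1 for every realisation
for n in range(1, generations + 1):
    # Z_n = sum of Z_{n-1} i.i.d. Poisson(m) offspring counts, per realisation
    Z = np.array([rng.poisson(m, size=z).sum() if z > 0 else 0 for z in Z])
    print(f"n={n}: E[Z_n] = {Z.mean():.2f}, m^n = {m**n:.2f}, E[Z_n/m^n] = {(Z / m**n).mean():.3f}")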
Lemma 5.9 If (M_n)_{n∈N_0} is a martingale (supermartingale, submartingale) with respect to a filtration (F_n)_{n∈N_0}, then (M_n)_{n∈N_0} is a martingale (supermartingale, submartingale) with respect to the filtration generated by M.
Proof in the martingale case. For n ∈ N_0, set G_n = σ(M_k, k ≤ n). Assume that (M_n)_{n∈N_0} is a martingale with respect to some filtration (F_n)_{n∈N_0}.
(M1) Mn is Gn -measurable.
Remark 5.10 If one says “(Mn )n∈N0 is a martingale” without specifying a filtration, then
one means “(Mn )n∈N0 is a martingale with respect to the filtration generated by M ”.
Lemma 5.12 Let φ : R → R be convex. (a) If (Mn)n∈N0 is a martingale and φ(Mn) ∈ L1 ∀n, then (φ(Mn))n∈N0 is a submartingale. (b) If (Mn)n∈N0 is a submartingale, φ is nondecreasing and φ(Mn) ∈ L1 ∀n, then (φ(Mn))n∈N0 is a submartingale.
Proof. Jensen's inequality for conditional expectations: in case (a), E[φ(Mn+1)|Fn] ≥ φ(E[Mn+1|Fn]) = φ(Mn); in case (b), E[φ(Mn+1)|Fn] ≥ φ(E[Mn+1|Fn]) ≥ φ(Mn), using that φ is nondecreasing and E[Mn+1|Fn] ≥ Mn.
Example 5.13 If (Mn )n∈N0 is a martingale, then (Mn2 )n∈N0 , (|Mn |)n∈N0 and (Mn+ )n∈N0
are submartingales if the required integrability conditions are satisfied.
Consider an investment strategy in which Hn denotes the stake in round n (e.g. the number of shares of a stock with price Xn held between time n − 1 and time n). Natural requirement: Hn is measurable with respect to σ(X0, X1, . . . , Xn−1).
If (Xn)n∈N0 is a martingale / supermartingale / submartingale, it seems that this is a fair / non-profitable / profitable investment strategy. This is confirmed by Theorem 5.16 below.
Definition 5.15 (Hn )n∈N is previsible with respect to (Fn )n∈N0 if Hn is Fn−1 -measurable
∀n ∈ N.
For n ∈ N, set
(H.X)n := Σ_{k=1}^n Hk (Xk − Xk−1), (5.10)
and (H.X)0 := 0; (H.X)n is the accumulated gain after n rounds.
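A simulation sketch (not part of the script; the strategy H is an arbitrary previsible rule) of the martingale transform (5.10): for a bounded previsible strategy and a martingale X (here a simple symmetric random walk), the average gain E[(H.X)n] stays at 0, which is exactly the content of Theorem 5.16 below.

    import random

    def transform_gain(n_steps, rng):
        """(H.X)_n for a previsible strategy: H_k may use only X_0, ..., X_{k-1}."""
        x_prev, gain = 0, 0.0
        for _ in range(n_steps):
            h = 1.0 if x_prev < 0 else 0.5                        # previsible and bounded
            x_new = x_prev + (1 if rng.random() < 0.5 else -1)    # martingale increment
            gain += h * (x_new - x_prev)
            x_prev = x_new
        return gain

    rng = random.Random(2)
    n_runs = 100_000
    print(round(sum(transform_gain(20, rng) for _ in range(n_runs)) / n_runs, 3))   # ≈ 0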
Theorem 5.16 Let (Xn)n∈N0 be a martingale / supermartingale / submartingale with respect to (Fn)n∈N0. Suppose (Hn)n∈N is previsible with respect to (Fn)n∈N0 and for all n ∈ N there is a constant cn such that |Hn| ≤ cn / 0 ≤ Hn ≤ cn / 0 ≤ Hn ≤ cn, respectively. Then, ((H.X)n)n∈N0 is a martingale / supermartingale / submartingale, respectively.
Proof. (M1) is clear. (M2): |(H.X)n| ≤ Σ_{k=1}^n ck (|Xk| + |Xk−1|) ∈ L1.
(M3) For all n ∈ N0, one has
E[(H.X)n+1 | Fn] = E[(H.X)n + Hn+1 (Xn+1 − Xn) | Fn] = (H.X)n + Hn+1 (E[Xn+1|Fn] − Xn) ≤ (H.X)n
in the supermartingale case, since (H.X)n and Hn+1 are Fn-measurable and Hn+1 ≥ 0 (with equality in the martingale case and "≥" in the submartingale case).
5.3 Stopping times
A map τ : Ω → N0 ∪ {∞} is called a stopping time with respect to (Fn)n∈N0 if {τ ≤ n} ∈ Fn for all n ∈ N0.
Lemma 5.19
τ is a stopping time ⇐⇒ {τ = n} ∈ Fn for all n ∈ N0. (5.14)
Proof.
"=⇒": {τ = n} = {τ ≤ n} \ {τ ≤ n − 1} ∈ Fn, since {τ ≤ n} ∈ Fn and {τ ≤ n − 1} ∈ Fn−1 ⊆ Fn.
"⇐=": {τ ≤ n} = ∪_{k=0}^n {τ = k} ∈ Fn, since {τ = k} ∈ Fk ⊆ Fn for all k ≤ n.
5.4 Stopped martingales
Let (Mn)n∈N0 be a supermartingale with respect to (Fn)n∈N0 and let T be a stopping time. Set Hn := 1{T≥n} = 1 − 1{T≤n−1} for n ∈ N; then (Hn)n∈N is previsible with 0 ≤ Hn ≤ 1, and
(H.M)n = Σ_{k=1}^n 1{T≥k} (Mk − Mk−1) = Σ_{k=0}^{n−1} 1{T=k} Mk + 1{T≥n} Mn − M0 = MT∧n − M0.
Theorem 5.16 =⇒ (H.M) is a supermartingale
=⇒ (MT∧n − M0)n∈N0 is a supermartingale.
Since (Yn = −M0)n∈N0 is a martingale and the sum of a supermartingale and a martingale is a supermartingale, it follows that (MT∧n)n∈N0 is a supermartingale.
5.5 The martingale convergence theorem
Let (Mn)n∈N0 be a sequence of random variables and let a, b ∈ R with a < b. Set
N0 := 0 (5.15)
N2k−1 := inf{i > N2k−2 : Mi ≤ a}, (5.16)
N2k := inf{i > N2k−1 : Mi ≥ b} for k ∈ N. (5.17)
This means: N1 is the first time when (Mn )n∈N0 reaches a value ≤ a,
N2 is the first time after N1 when (Mn )n∈N0 reaches a value ≥ b,
N3 is the first time after N2 when (Mn )n∈N0 reaches a value ≤ a,
etc.
In particular, Nk, k ∈ N0, are stopping times with respect to (Fn)n∈N0, where Fn = σ(M0, . . . , Mn), and for k ∈ N one has M_{N_{2k−1}} ≤ a and M_{N_{2k}} ≥ b on the event that these times are finite.
[Figure: a path of (Mn) plotted against n, with the crossing times 0, N1, N2, N3, N4 of the levels a and b marked on the n-axis.]
Theorem 5.21 (Upcrossing inequality) Let (Mn)n∈N0 be a submartingale and let Un(a, b) denote the number of upcrossings of the interval [a, b] completed by (Mi)i≤n. For all n and all a < b, one has
(b − a) E[Un(a, b)] ≤ E[(Mn − a)+] − E[(M0 − a)+].
Proof. By Lemma 5.12 (b), (Yn := a + (Mn − a)+ )n∈N0 is a submartingale. (Mn )n∈N0
and (Yn )n∈N0 have the same number of upcrossings of the interval [a, b]. Set
Hi = 1 if N2k−1 < i ≤ N2k for some k ∈ N, and Hi = 0 else.
Each completed upcrossing increases (H.Y)n by at least b − a, hence (b − a) Un(a, b) ≤ (H.Y)n. Set Ki = 1 − Hi for all i. One has
E[(H.Y)n] = E[((1 − K).Y)n]
= E[Σ_{s=1}^n (1 − Ks)(Ys − Ys−1)]
= E[Yn − Y0 − (K.Y)n]
= E[(Mn − a)+ − (M0 − a)+ − (K.Y)n].
Observe that
{Hi = 1} = ∪_{k=1}^∞ {N2k−1 < i ≤ N2k} = ∪_{k=1}^∞ ({N2k−1 ≤ i − 1} ∩ {N2k ≤ i − 1}^c) ∈ Fi−1,
because N2k−1 and N2k are stopping times. Hence, (Hi)i∈N is previsible. Consequently, (Ki)i∈N is previsible. Since 0 ≤ Ki ≤ 1 ∀i, Theorem 5.16 implies that ((K.Y)n)n∈N is a submartingale. Consequently, by Lemma 5.8,
E[(K.Y)n] ≥ E[(K.Y)1] = E[(1 − H1) E[Y1 − Y0|F0]] ≥ 0
due to the submartingale property of (Yn ). This completes the proof of the upcrossing
inequality.
Theorem 5.22 (Martingale convergence theorem) Let (Mn)n∈N0 be a submartingale with supn∈N0 E[Mn+] < ∞. Then (Mn)n∈N0 converges almost surely to a random variable M∞ with E[|M∞|] < ∞.
Proof. For a < b, let U(a, b) := limn→∞ Un(a, b) be the total number of upcrossings of [a, b] by (Mn)n∈N0. By the upcrossing inequality and monotone convergence,
(b − a) E[U(a, b)] ≤ sup_n E[(Mn − a)+] ≤ |a| + sup_n E[Mn+] < ∞.
Since E[U(a, b)] < ∞, it follows that U(a, b) < ∞ almost surely, and consequently,
P(lim inf_{n→∞} Mn ≤ a < b ≤ lim sup_{n→∞} Mn) = 0.
Hence,
P(lim inf_{n→∞} Mn < lim sup_{n→∞} Mn) = P(∪_{a<b, a,b∈Q} {lim inf_{n→∞} Mn ≤ a < b ≤ lim sup_{n→∞} Mn}) = 0.
Thus,
lim inf_{n→∞} Mn = lim sup_{n→∞} Mn almost surely,
which means that (Mn)n∈N0 converges almost surely to a random variable M∞. Fatou's lemma implies
E[M∞+] = E[lim_{n→∞} Mn+] ≤ lim inf_{n→∞} E[Mn+] < ∞
by assumption.
Similarly,
E[M∞−] ≤ lim inf_{n→∞} E[Mn−] = lim inf_{n→∞} (E[Mn+] − E[Mn]) ≤ sup_n E[Mn+] − E[M0] < ∞,
because E[Mn] ≥ E[M0] for all n, since (Mn)n∈N0 is a submartingale. Hence,
E[|M∞|] = E[M∞+] + E[M∞−] < ∞.
Corollary 5.23 Every nonnegative supermartingale (Mn)n∈N0 converges almost surely to a random variable M∞ ∈ L1 (apply Theorem 5.22 to the submartingale (−Mn)n∈N0, for which E[(−Mn)+] = 0).
Example 5.24 (Growth rate of branching processes) We saw that for a branching process (Zn) with m ∈ (0, ∞), Zn/m^n is a martingale. Since Zn/m^n ≥ 0 ∀n, the martingale convergence theorem implies that Zn/m^n converges almost surely to a limit W ≥ 0 with E[W] < ∞.
Theorem 5.25 If m < 1, or if m = 1 and p1 < 1, then almost surely Zn = 0 for all sufficiently large n, i.e. the population dies out.
Proof.
Case m < 1: P(Zn > 0) ≤ E[Zn 1{Zn>0}] = E[Zn] = m^n. Hence,
Σ_{n=0}^∞ P(Zn > 0) ≤ Σ_{n=0}^∞ m^n < ∞ because m < 1.
By the Borel–Cantelli lemma, almost surely Zn > 0 for only finitely many n, i.e. Zn = 0 eventually.
Case m = 1: In this case, (Zn )n∈N0 converges almost surely. Since Zn takes only values
in N0, there exists a (random) N0 such that (Zn)n≥N0 is constant. For k ∈ N0, let
Ak := {Zn = k for all n ≥ n0, for some n0 ∈ N}
be the event that the population size eventually equals k. Fix k ≥ 1 and ℓ ∈ N0 \ {1} with pℓ > 0 (such an ℓ exists because p1 < 1), and set Bn := {Yn,1 = · · · = Yn,k = ℓ}, n ∈ N. The events Bn are independent with P(Bn) = pℓ^k > 0, so by the second Borel–Cantelli lemma almost surely infinitely many of them occur.
For all n ≥ n0, one has on Ak ∩ Bn+1 that Zn = k and Yn+1,i = ℓ for i = 1, . . . , k, hence
Zn+1 = Σ_{i=1}^{Zn} Yn+1,i = Σ_{i=1}^{k} Yn+1,i = kℓ, which is = 0 ≠ k if ℓ = 0 and > k if ℓ ≥ 2.
Since P(Ak ∩ {Bn for infinitely many n}) > 0, this is a contradiction.
Thus P(Ak ) = 0 for all k ≥ 1 and we conclude P(A0 ) = 1.
Remark 5.26 Later, we will show that for m > 1
P(Zn > 0, ∀n) > 0, (5.24)
i.e. with positive probability, the population survives.
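A simulation sketch (not part of the script; the three offspring laws are ad-hoc choices with means 0.8, 1.0 and 1.5) comparing survival frequencies, in line with the extinction result and Remark 5.26.

    import random

    def survives(law, n_gen, rng, cap=2_000):
        """True if the branching process with offspring law `law` is alive after n_gen generations."""
        ks = list(range(len(law)))
        z = 1
        for _ in range(n_gen):
            if z == 0:
                return False
            if z > cap:                     # population has clearly taken off; count as survival
                return True
            z = sum(rng.choices(ks, weights=law, k=z))
        return z > 0

    rng = random.Random(3)
    laws = {"m=0.8": [0.4, 0.4, 0.2], "m=1.0": [0.3, 0.4, 0.3], "m=1.5": [0.1, 0.3, 0.6]}
    for name, law in laws.items():
        runs = 2_000
        freq = sum(survives(law, 30, rng) for _ in range(runs)) / runs
        # essentially 0 for m=0.8, small for m=1.0 (and it tends to 0 as n_gen grows),
        # clearly positive for m=1.5
        print(name, round(freq, 3))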
Definition 5.27 (Conditional probability) For A ∈ F, we define the conditional
probability of A given F0 by
P(A|F0 ) = E[1A |F0 ].
Example 5.28 (Polya’s urn) An urn contains a > 0 red and b > 0 blue balls. A ball is
drawn uniformly at random, its color is observed and it is put back into the urn together
with an additional ball of the same color. Let Rn denote the number of red balls drawn in
the first n drawings.
Claim: Mn = (a + Rn)/(a + b + n) = proportion of red balls after n drawings, n ∈ N0, is a martingale.
Proof. Let Fn = σ(R0 , R1 , . . . , Rn ).
(M1) Mn is a function of Rn and hence Fn -measurable for all n.
(M2) 0 ≤ Mn ≤ 1 =⇒ Mn ∈ L1 .
(M3) We have that
E[Mn+1|Fn] = E[(a + Rn+1)/(a + b + n + 1) | Fn]
= E[1{Rn+1 = Rn + 1} (a + Rn + 1)/(a + b + n + 1) + 1{Rn+1 = Rn} (a + Rn)/(a + b + n + 1) | Fn]
= (a + Rn + 1)/(a + b + n + 1) · P(Rn+1 = Rn + 1|Fn) + (a + Rn)/(a + b + n + 1) · P(Rn+1 = Rn|Fn)
= (a + Rn + 1)/(a + b + n + 1) · Mn + (a + Rn)/(a + b + n + 1) · (1 − Mn)
= Mn [(a + Rn + 1)/(a + b + n + 1) − (a + Rn)/(a + b + n + 1)] + (a + Rn)/(a + b + n + 1)
= Mn/(a + b + n + 1) + (a + b + n) Mn/(a + b + n + 1) = Mn,
where we used that (a + Rn + 1)/(a + b + n + 1) and (a + Rn)/(a + b + n + 1) are Fn-measurable, P(Rn+1 = Rn + 1|Fn) = Mn and P(Rn+1 = Rn|Fn) = 1 − Mn.
Hence, (Mn )n∈N0 is a martingale.
Since Mn ≥ 0 ∀n, the martingale convergence theorem (Corollary 5.23) implies that Mn converges almost surely to a random variable M∞. M∞ is the asymptotic proportion of red balls in the urn. One can show that M∞ has the beta distribution with parameters a and b (see Section 5.9). In particular, for a = b = 1, M∞ is uniformly distributed on [0, 1].
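A simulation sketch (not part of the script; the initial composition is arbitrary) of Polya's urn: the proportion Mn of red balls has constant mean a/(a + b), and each individual path settles down, as the convergence theorem predicts.

    import random

    def proportion_path(a, b, n_draws, rng):
        """Path of M_n = (a + R_n)/(a + b + n) for Polya's urn started with a red and b blue balls."""
        red, blue = a, b
        path = [red / (red + blue)]
        for _ in range(n_draws):
            if rng.random() < red / (red + blue):
                red += 1
            else:
                blue += 1
            path.append(red / (red + blue))
        return path

    rng = random.Random(4)
    paths = [proportion_path(2, 3, 1_000, rng) for _ in range(4_000)]
    print(round(sum(p[500] for p in paths) / len(paths), 3))   # ≈ 2/5 = M_0 = E[M_n]
    print([round(v, 3) for v in paths[0][::250]])              # one path: the proportion levels off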
5.6 Uniform integrability and L1 -convergence for martingales
Recall Definition 1.61. In the same way,
Definition 5.29 A family (Xi )i∈I of random variables is called uniformly integrable if
lim_{K→∞} sup_{i∈I} E[|Xi| 1{|Xi|>K}] = 0. (5.28)
Lemma 5.30 If a family (Xi )i∈I is bounded in Lp for some p > 1, i.e. supi∈I E [|Xi |p ] <
∞, the family is uniformly integrable.
Proof. Set C := sup_{i∈I} E[|Xi|^p] < ∞ and let q > 1 satisfy 1/p + 1/q = 1. Then, by Hölder's inequality, one has for all i ∈ I and all K > 0
E[|Xi| 1{|Xi|>K}] ≤ E[|Xi|^p]^{1/p} E[1{|Xi|>K}]^{1/q} ≤ C^{1/p} P(|Xi| > K)^{1/q}.
By Markov's inequality,
P(|Xi| > K) ≤ E[|Xi|^p]/K^p ≤ C/K^p.
We conclude
lim_{K→∞} sup_{i∈I} E[|Xi| 1{|Xi|>K}] ≤ lim_{K→∞} C^{1/p + 1/q}/K^{p/q} = 0.
Lemma 5.31 Let X ∈ L1(Ω, F, P). Then the family {E[X|G] : G ⊆ F a sub-σ-field} is uniformly integrable.
Proof. Let ε > 0. There exists δ > 0 such that for all A ∈ F with P(A) ≤ δ one has
E[|X| 1A ] ≤ ε. (5.30)
By Markov's inequality, P(E[|X| |G] > K) ≤ E[E[|X| |G]]/K = E[|X|]/K for every sub-σ-field G ⊆ F. Choose K0 such that E[|X|]/K0 ≤ δ. For all K ≥ K0, Jensen's inequality for the conditional expectation yields
E[|E[X|G]| 1{|E[X|G]|>K}] ≤ E[E[|X| |G] 1{E[|X||G]>K}] = E[|X| 1{E[|X||G]>K}], (5.32)
using |E[X|G]| ≤ E[|X| |G], {|E[X|G]| > K} ⊆ {E[|X| |G] > K} and the fact that the latter event lies in G. Since P(E[|X| |G] > K) ≤ δ, (5.30) yields E[|X| 1{E[|X||G]>K}] ≤ ε. Hence sup_G E[|E[X|G]| 1{|E[X|G]|>K}] ≤ ε for all K ≥ K0, which proves the claim.
Theorem 5.32 For a martingale (Mn)n∈N0 with respect to (Fn)n∈N0, the following are equivalent:
(a) (Mn)n∈N0 is uniformly integrable.
(b) (Mn)n∈N0 converges almost surely and in L1 to a random variable M.
(c) (Mn)n∈N0 converges in L1 to a random variable M.
(d) There exists M ∈ L1 such that Mn = E[M|Fn] for all n ∈ N0.
Proof. (a) =⇒ (b): Let (Mn)n∈N0 be a uniformly integrable submartingale. By Remark 1.62 (i), uniform integrability implies supn∈N0 E[|Mn|] < ∞. Applying the martingale convergence theorem 5.22, we obtain that Mn → M almost surely as n → ∞ for some M ∈ L1. But, due to Theorem 1.63, if a sequence of random variables is uniformly integrable and converges almost surely, then it converges in L1.
(b) =⇒ (c): trivial.
(c) =⇒ (d): Assume (Mn)n∈N0 is a martingale and Mn → M in L1 as n → ∞. Fix n ∈ N0.
Claim: Mn = E[M |Fn ].
By the definition of the conditional expectation, it suffices to prove that E[M 1A] = E[Mn 1A] for all A ∈ Fn.
For all i > n, one has
E[Mi 1A] = E[E[Mi|Fn] 1A]  (by def. of the conditional expectation, because A ∈ Fn)
= E[Mn 1A]  (by the martingale property). (5.33)
Furthermore,
|E[Mi 1A] − E[M 1A]| ≤ E[|Mi − M|] → 0 as i → ∞,
because Mi → M in L1. Thus, lim_{i→∞} E[Mi 1A] = E[M 1A]. Hence, taking the limit i → ∞ in (5.33), we get
E[Mn 1A] = E[M 1A].
Hence,
The collection of sets A such that (5.35) holds is a Dynkin system which contains the π-system ∪_{n=0}^∞ Fn. Hence, by Dynkin's lemma, (5.35) holds for all A ∈ F∞. Since M is F∞-measurable, (5.35) implies
5.7 Optional stopping
If (Mn)n∈N0 is a submartingale, then n ↦ E[Mn] is nondecreasing. This means E[M0] ≤ E[M1] ≤ E[M2] ≤ · · · .
Theorem 5.34 Let (Mn)n∈N0 be a submartingale and let T be a bounded stopping time (i.e. ∃N ∈ N with P(T ≤ N) = 1). Then, one has
E[M0] ≤ E[MT] ≤ E[MN].
Proof. Set Kk := 1{T≤k−1} = 1 − 1{T≥k} for k ∈ N. Then (Kk)k∈N is previsible with 0 ≤ Kk ≤ 1, so by Theorem 5.16, ((K.M)n)n∈N0 is a submartingale. Moreover,
(K.M)n = Σ_{k=1}^n 1{T≤k−1} (Mk − Mk−1)
= 1{T≤n−1} Mn − Σ_{i=1}^{n−1} 1{T=i} Mi − 1{T≤0} M0
= Mn − 1{T≥n} Mn − Σ_{i=0}^{n−1} 1{T=i} Mi
= Mn − MT∧n,
using 1{T≤n−1} = 1 − 1{T≥n} and {T ≤ 0} = {T = 0}. Hence, by Lemma 5.8, E[Mn] − E[MT∧n] = E[(K.M)n] ≥ E[(K.M)0] = 0, i.e. E[MT∧n] ≤ E[Mn]; taking n = N gives E[MT] ≤ E[MN]. Similarly, with Hk := 1{T≥k}, ((H.M)n)n∈N0 = (MT∧n − M0)n∈N0 is a submartingale, so E[MT] − E[M0] = E[(H.M)N] ≥ E[(H.M)0] = 0, i.e. E[M0] ≤ E[MT].
Hence, for a submartingale (Mn)n∈N0, we know the following for all n ∈ N0 and every stopping time T:
E[M0] ≤ E[MT∧n] ≤ E[Mn].
We want to generalize this and allow in the last inequality a stopping time instead of the deterministic time N.
Theorem 5.35 Let (Mn )n∈N0 be adapted with Mn ∈ L1 for all n. Then, the following
holds:
(Mn )n∈N0 is a martingale ⇐⇒ E[MT ] = E[M0 ] for all bounded stopping times T.
Proof.
“=⇒”: E[MT ] = E[M0 ] for any bounded stopping time T by Theorem 5.34.
"⇐=": Fix n ∈ N0 and A ∈ Fn. By Remarks 5.2 (a), it suffices to show that
E[Mn+1 1A] = E[Mn 1A].
Define T := n 1A + (n + 1) 1{A^c}. For k < n one has {T ≤ k} = ∅ ∈ Fk, {T ≤ n} = A ∈ Fn, and {T ≤ k} = Ω ∈ Fk for k ≥ n + 1. Thus, T is a bounded stopping time, and it follows from our assumption that
E[M0] = E[MT] = E[Mn 1A] + E[Mn+1 1{A^c}].
Since also E[M0] = E[Mn+1] = E[Mn+1 1A] + E[Mn+1 1{A^c}] (take the bounded stopping time T ≡ n + 1), this is equivalent to
E[Mn+1 1A] = E[Mn 1A].
Theorem 5.36 Let (Mn)n∈N0 be a uniformly integrable submartingale and let T be a stopping time with T ≤ ∞, where we set MT := limn→∞ Mn on {T = ∞}. Then (a) MT ∈ L1 and (b) (MT∧n)n∈N0 is uniformly integrable.
Proof.
(a) (Mn+)n∈N0 is a submartingale. Since T ∧ n is a bounded stopping time, Theorem 5.34 gives
E[(MT∧n)+] ≤ E[Mn+].
Since (Mn)n∈N0 is uniformly integrable, it is L1-bounded and hence, (Mn+)n∈N0 is
L1-bounded as well. Consequently, supn E[(MT∧n)+] < ∞, and the martingale convergence theorem yields that MT∧n converges almost surely as n → ∞ to a random variable M̃ ∈ L1. On {T < ∞} one has MT∧n = MT for n ≥ T, and on {T = ∞} one has MT∧n = Mn → M∞ = MT. Hence
MT = M̃ ∈ L1.
(b) For all K > 0 and n ∈ N0,
E[|MT∧n| 1{|MT∧n|>K}] ≤ E[|MT| 1{|MT|>K}] + sup_m E[|Mm| 1{|Mm|>K}].
Since E[|MT|] < ∞, the first term goes to 0 as K → ∞. Since (Mn)n∈N0 is uniformly integrable, the second term goes to 0 as K → ∞. This shows that (MT∧n)n∈N0 is uniformly integrable.
Theorem 5.37 Let (Mn)n∈N0 be a uniformly integrable submartingale, and let M∞ = limn→∞ Mn. Then, for any stopping time T ≤ ∞ (with MT := M∞ on {T = ∞}), one has
E[M0] ≤ E[MT] ≤ E[M∞].
Proof. By Theorem 5.34, we have
E[M0] ≤ E[MT∧n] ≤ E[Mn] for all n ∈ N0. (5.40)
Hence, using the uniform integrability from Theorem 5.36, we can take the limit as n → ∞ in (5.40) and the claim follows.
For a stopping time T with respect to (Fn)n∈N0, define the σ-field of events observable up to time T by
FT := {A ∈ F : A ∩ {T = n} ∈ Fn ∀n ∈ N0}.
FT is indeed a σ-algebra.
Proof.
• Ω ∩ {T = n} = {T = n} ∈ Fn ∀n ∈ N0. Hence Ω ∈ FT.
• Let A ∈ FT. For all n ∈ N0,
A^c ∩ {T = n} = (A ∪ {T ≠ n})^c = ((A ∩ {T = n}) ∪ {T ≠ n})^c ∈ Fn,
because A ∩ {T = n} ∈ Fn (since A ∈ FT) and {T ≠ n} ∈ Fn. Hence A^c ∈ FT.
• For A1, A2, . . . ∈ FT and n ∈ N0, (∪_i Ai) ∩ {T = n} = ∪_i (Ai ∩ {T = n}) ∈ Fn. Hence ∪_i Ai ∈ FT.
Moreover, T is FT-measurable, i.e. {T ≤ k} ∈ FT for all k ∈ N0.
Proof.
{T ≤ k} ∩ {T = n} = {T = n} if n ≤ k and = ∅ if n > k; in both cases the intersection lies in Fn, ∀n ∈ N0.
Theorem 5.41 (Optional stopping theorem) Let S ≤ T < ∞ be stopping times,
and let (MT ∧n )n∈N0 be a uniformly integrable submartingale. Then, one has
E[MS ] ≤ E[MT ] and MS ≤ E[MT |FS ]. (5.42)
Proof.
• We apply Theorem 5.37 with M̃n := MT∧n and T̃ := S:
E[M̃0] ≤ E[M̃T̃] ≤ E[M̃∞],
where E[M̃0] = E[M0], E[M̃T̃] = E[MT∧S] = E[MS] (since S ≤ T) and E[M̃∞] = E[limn MT∧n] = E[MT]. This proves E[MS] ≤ E[MT].
• The second inequality in (5.42) means that for all A ∈ FS,
E[MS 1A] ≤ E[MT 1A].
We can rewrite this as follows:
∫_A MS dP ≤ ∫_A MT dP = ∫_A E[MT|FS] dP
[Figure: simple random walk on Z — from each integer site the walk jumps one step to the right with probability p and one step to the left with probability 1 − p.]
Let p ∈ (0, 1) and let Y1, Y2, . . . be i.i.d. with P(Yi = 1) = p = 1 − P(Yi = −1); the random walk is Sn = S0 + Σ_{i=1}^n Yi, n ∈ N0.
Lemma 5.43 Mn = ((1 − p)/p)^{Sn}, n ∈ N0, is a martingale.
Proof. We saw already that this is true since (Mn ) falls in the class of harmonic functions
of Markov chains. We include a direct proof for the convenience of the reader. For n ∈ N0 ,
set Fn = σ(Yi , 1 ≤ i ≤ n).
(M1) Mn is Fn-measurable.
(M2) For each n, Sn takes only finitely many values, hence Mn ∈ L1.
(M3) Since Yn+1 is independent of Fn,
E[Mn+1|Fn] = ((1 − p)/p)^{Sn} E[((1 − p)/p)^{Yn+1}] = Mn (p · (1 − p)/p + (1 − p) · p/(1 − p)) = Mn ((1 − p) + p) = Mn.
We write Px and Ex instead of P and E to indicate that the random walk starts in x.
For y ∈ Z, set τy = inf{n ∈ N0 : Sn = y}.
Theorem 5.44 For p ≠ 1/2 and a, b, x ∈ Z with a < x < b, one has
Px(τa < τb) = [((1 − p)/p)^x − ((1 − p)/p)^b] / [((1 − p)/p)^a − ((1 − p)/p)^b]. (5.43)
Proof. Let a < x < b, and let T = τa ∧ τb = inf{n ∈ N0 : Sn ∈ {a, b}}. T is a stopping time.
Claim: Px (T < ∞) = 1
The number of points in (a, b) ∩ Z equals (b − 1) − (a + 1) + 1 = b − a − 1. Let
Ak := {Yk = Yk+1 = · · · = Yk+b−a−1 = 1}
be the event that the random walker takes b − a consecutive steps to the right starting at
time k. Then, the events A_{ℓ(b−a)+1}, ℓ ∈ N0, are independent with
Px(A_{ℓ(b−a)+1}) = p^{b−a} > 0  ⇒  Σ_{ℓ=1}^∞ Px(A_{ℓ(b−a)+1}) = ∞.
By the second Borel–Cantelli lemma, almost surely infinitely many of these events occur. If A_{ℓ(b−a)+1} holds and S0 = x ∈ (a, b), the random walker leaves the interval (a, b) (either it has left (a, b) earlier, or the b − a consecutive steps to the right carry it above b) and T < ∞. Thus, Px(T < ∞) = 1.
Since a ≤ ST∧n ≤ b, the stopped martingale (MT∧n)n∈N0 is bounded and hence uniformly integrable, so the optional stopping theorem gives Ex[M0] = Ex[MT], i.e.
((1 − p)/p)^x = Ex[((1 − p)/p)^{ST}] = Px(ST = a) ((1 − p)/p)^a + (1 − Px(ST = a)) ((1 − p)/p)^b.
Hence,
Px(τa < τb) = Px(ST = a) = [((1 − p)/p)^x − ((1 − p)/p)^b] / [((1 − p)/p)^a − ((1 − p)/p)^b].
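A simulation sketch (not part of the script; p, a, x, b are arbitrary) comparing the empirical exit probability with formula (5.43).

    import random

    def hits_a_before_b(x, a, b, p, rng):
        s = x
        while a < s < b:
            s += 1 if rng.random() < p else -1
        return s == a

    p, a, x, b = 0.6, -3, 1, 4
    rng = random.Random(5)
    runs = 200_000
    empirical = sum(hits_a_before_b(x, a, b, p, rng) for _ in range(runs)) / runs
    r = (1 - p) / p
    exact = (r ** x - r ** b) / (r ** a - r ** b)      # formula (5.43)
    print(round(empirical, 4), round(exact, 4))        # the two values agree closely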
Fix a and x. Clearly, {τa < τb} ⊆ {τa < τb+1} for all b. Hence, as b ↑ ∞,
{τa < τb} ↑ ∪_{b=1}^∞ {τa < τb} = {τa < ∞}.
Case p > 1/2: In this case, 0 < (1 − p)/p < 1 and hence,
lim_{b→∞} ((1 − p)/p)^b = 0.
Consequently,
Px(τa < ∞) = lim_{b→∞} Px(τa < τb) = ((1 − p)/p)^{x−a} ∈ (0, 1).
Note that by the strong law of large numbers,
lim_{n→∞} Sn/n = E[Y1] = p − (1 − p) = 2p − 1 > 0
=⇒ lim_{n→∞} Sn = ∞ almost surely.
Case p < 1/2: In this case, (1 − p)/p > 1 and hence,
lim_{b→∞} ((1 − p)/p)^b = ∞.
Consequently,
Px(τa < ∞) = lim_{b→∞} Px(τa < τb) = 1.
By the strong law of large numbers,
lim_{n→∞} Sn/n = E[Y1] = 2p − 1 < 0
=⇒ lim_{n→∞} Sn = −∞ almost surely.
Theorem 5.46 Consider symmetric random walk on Z. Let a, b ∈ Z with a < 0 < b,
and set T = τa ∧ τb . One has
P(τa < τb) = b/(b − a) and E[T] = −ab. (5.44)
E[T] is the expected amount of time which the random walker needs to leave the interval (a, b). Since a < 0 < b, −ab > 0. Note that
P(τa < ∞) = lim_{b→∞} P(τa < τb) = lim_{b→∞} b/(b − a) = 1.
Proof of Theorem 5.46.
• T is a stopping time with P(T < ∞) = 1. Since E[Yi] = 0, (Sn)n∈N0 is a martingale. 0 ∈ (a, b) implies a ≤ ST∧n ≤ b for all n ∈ N0. Hence, (ST∧n)n∈N0 is uniformly integrable and the optional stopping theorem implies
0 = E[S0] = E[ST] = a P(ST = a) + b (1 − P(ST = a)),
hence P(τa < τb) = P(ST = a) = b/(b − a).
• (Sn² − n)n∈N0 is a martingale (exercise). Fix N ∈ N and set T̃ := T ∧ N. Then
|S²_{T̃∧n} − T̃ ∧ n| ≤ max(a², b²) + N
for all n ∈ N0. Hence, the martingale (S²_{T̃∧n} − T̃ ∧ n)n∈N0 is bounded and therefore uniformly integrable. The optional stopping theorem implies
0 = E[S²_{T̃} − T̃] = E[S²_{T∧N}] − E[T ∧ N].
Letting N → ∞ (monotone convergence for E[T ∧ N], dominated convergence for E[S²_{T∧N}], which is bounded by max(a², b²)), we get
E[T] = E[S²_T] = a² · b/(b − a) + b² · (−a)/(b − a) = ab(a − b)/(b − a) = −ab.
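A simulation sketch (not part of the script; a and b are arbitrary) checking both statements of Theorem 5.46 for the symmetric walk.

    import random

    def exit_time_and_side(a, b, rng):
        """Exit time T and whether the symmetric walk started at 0 leaves (a, b) at a."""
        s, t = 0, 0
        while a < s < b:
            s += 1 if rng.random() < 0.5 else -1
            t += 1
        return t, s == a

    a, b = -4, 3
    rng = random.Random(6)
    runs = 100_000
    samples = [exit_time_and_side(a, b, rng) for _ in range(runs)]
    print(round(sum(hit_a for _, hit_a in samples) / runs, 3), b / (b - a))   # P(tau_a < tau_b) = b/(b-a)
    print(round(sum(t for t, _ in samples) / runs, 2), -a * b)                # E[T] = -ab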
5.8 Backwards martingales
We want to consider martingales with index set −N0 = {. . . , −2, −1, 0}. Recall: σ-algebras Fn, n ∈ −N0, form a filtration if · · · ⊆ F−n ⊆ F−n+1 ⊆ · · · ⊆ F−1 ⊆ F0.
Definition 5.47 A martingale (Mn )n∈−N0 with index set −N0 is called a backwards mar-
tingale. In other words, a backwards martingale is a sequence (Mn )n∈−N0 of random
variables with
(M1) Mn is measurable with respect to Fn , ∀n ∈ −N0 .
(M2) Mn ∈ L1 , ∀n ∈ −N0 .
(M3) E[M−n+1 |F−n ] = M−n ∀n ∈ N.
Remark 5.48 Every backwards martingale is uniformly integrable because
M−n = E[M0 |F−n ] for all n ∈ N.
Theorem 5.49 Let (Mn )n∈−N0 be a backwards martingale with respect to (Fn )n∈−N0 .
Then, the limit M−∞ = limn→∞ M−n exists almost surely and in L1 and satisfies
M−∞ = E[M0|F−∞] with F−∞ = ∩_{n=0}^∞ F−n.
Proof. For n ∈ N and a, b ∈ R with a < b let U−n(a, b) denote the number of upcrossings of (Mi)i∈[−n,0] of the interval [a, b]. The upcrossing inequality yields
(b − a) E[U−n(a, b)] ≤ E[(M0 − a)+] − E[(M−n − a)+] ≤ E[(M0 − a)+].
Since U−n(a, b) ↑ U−∞(a, b) = number of upcrossings of (Mi)i∈−N0 of the interval [a, b], the monotone convergence theorem implies
E[U−∞(a, b)] ≤ (1/(b − a)) E[(M0 − a)+] < ∞.
Hence, U−∞ (a, b) < ∞ almost surely. The same argument as in the proof of the martingale
convergence theorem yields that (Mn )n∈−N0 converges almost surely to a limit M−∞ ∈
L1 . Convergence in L1 follows as in the proof of Theorem 5.32 (a)⇒(b) using uniform
integrability.
Finally, let A ∈ ∩_{n=0}^∞ F−n. Then, for every n ∈ N, A ∈ F−n and by the martingale property, we obtain
∫_A M−n dP = ∫_A E[M0|F−n] dP = ∫_A M0 dP.
Letting n → ∞ and using the L1-convergence M−n → M−∞, we get ∫_A M−∞ dP = ∫_A M0 dP for all A ∈ F−∞. Since M−∞ is F−∞-measurable, this shows M−∞ = E[M0|F−∞].
Example 5.50 Let Xi, i ≥ 1, be independent and identically distributed with E[|X1|] < ∞. For n ∈ N, we set Sn = Σ_{i=1}^n Xi and
F−n := σ(Sn, Xn+1, Xn+2, . . .), M−n := Sn/n.
Then F−n−1 ⊆ F−n, and by symmetry E[Xi|F−n−1] is the same for every i ∈ {1, . . . , n + 1}. Consequently,
E[Xn+1|F−n−1] = (1/(n + 1)) Σ_{i=1}^{n+1} E[Xi|F−n−1] = (1/(n + 1)) E[Sn+1|F−n−1] = Sn+1/(n + 1).
We conclude
E[M−n|F−(n+1)] = (1/n) E[Sn|F−(n+1)] = (1/n)(Sn+1 − E[Xn+1|F−(n+1)]) = (1/n) Sn+1 − (1/(n(n + 1))) Sn+1 = (1/(n + 1)) Sn+1 = M−n−1,
so (M−n) is a backwards martingale.
By the convergence theorem for backwards martingales, it follows that ( n1 Sn )n∈N con-
verges almost surely and in L1 . The limit is measurable with respect to the tail σ-algebra
of the Xi ’s, which is trivial by Kolmogorov’s 0-1-law. Hence, M−∞ = limn→∞ n1 Sn is
constant almost surely and the constant is given by E[M−∞ ]. Using the L1 -convergence,
we find
E[M−∞] = lim_{n→∞} (1/n) E[Sn] = E[X1].
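A small numerical illustration (not part of the script; the distribution of the Xi is an arbitrary choice) of the conclusion Sn/n → E[X1].

    import random

    rng = random.Random(7)
    checkpoints = (10, 1_000, 100_000)
    s = 0.0
    for n in range(1, checkpoints[-1] + 1):
        s += rng.random()                  # X_n ~ Uniform(0, 1), so E[X_1] = 0.5
        if n in checkpoints:
            print(n, round(s / n, 4))      # S_n / n approaches 0.5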
5.9 Polya’s urn
Consider Polya’s urn from Example 5.28. An urn contains a > 0 red and b > 0 blue balls.
A ball is drawn uniformly at random, its color is observed and it is put back into the urn
together with an additional ball of the same color.
Let Xi be the color of the ball drawn at time i ≥ 1. Clearly, Xi is a random variable
with values in {R, B}. Note that the stochastic process (Xi )i≥1 is not memoryless: The
more red balls we have drawn up to time n, the larger the probability to draw a red ball
at time n + 1.
Assume a = b = 2. In this special case we calculate some probabilities:
P(X1 = B, X2 = B, X3 = R) = (2/4) · (3/5) · (2/6) = 1/10,
P(X1 = B, X2 = R, X3 = B) = (2/4) · (2/5) · (3/6) = 1/10.
In both cases, the same colors are drawn, but in a different order. Nevertheless, the two probabilities coincide. This is not a coincidence, as the following lemma shows.
Lemma 5.51 Let n ∈ N, xi ∈ {R, B}, 1 ≤ i ≤ n, and let k := |{i ∈ {1, . . . , n} : xi = R}|
be the number of xi being equal to R. Then, one has
P(Xi = xi ∀1 ≤ i ≤ n) = ca,b · (a + k − 1)!(b + n − k − 1)!/(a + b + n − 1)!
with the constant
ca,b := (a + b − 1)!/((a − 1)!(b − 1)!).
In particular, the probability to observe a given sequence of colors depends only on k
(=total number of red balls drawn) and n − k (=total number of blue balls drawn). It does
not depend on the order in which the balls are drawn. This is called exchangeability.
Proof. One has
P(Xi = xi ∀1 ≤ i ≤ n) = [Π_{i=0}^{k−1}(a + i) · Π_{i=0}^{n−k−1}(b + i)] / Π_{i=0}^{n−1}(a + b + i).
This is the claimed expression.
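A quick check (not part of the script) of the formula of Lemma 5.51 against the direct computation for a = b = 2 carried out above.

    from math import factorial

    def sequence_probability(a, b, colors):
        """P(X_i = colors[i] for 1 <= i <= n) for Polya's urn, via the formula of Lemma 5.51."""
        n, k = len(colors), colors.count("R")
        c_ab = factorial(a + b - 1) // (factorial(a - 1) * factorial(b - 1))
        return c_ab * factorial(a + k - 1) * factorial(b + n - k - 1) / factorial(a + b + n - 1)

    print(sequence_probability(2, 2, ["B", "B", "R"]))   # 0.1, matching (2/4)*(3/5)*(2/6)
    print(sequence_probability(2, 2, ["B", "R", "B"]))   # 0.1 again: only k and n matter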
Consider the infinite product space Ω = {B, R}^N with the product σ-algebra F, and let Yi : Ω → {B, R} denote the i-th coordinate. For p ∈ (0, 1), let
Pp := ⊗_{i∈N} (p δR + (1 − p) δB),
the law of an i.i.d. sequence which is red with probability p in each coordinate.
Theorem 5.52 For n ∈ N, xi ∈ {R, B}, 1 ≤ i ≤ n, and k := |{i ∈ {1, . . . , n} : xi = R}|,
one has
P(Xi = xi ∀1 ≤ i ≤ n) = ∫_0^1 p^k (1 − p)^{n−k} ϕa,b(p) dp = ∫_0^1 Pp(Yi = xi ∀1 ≤ i ≤ n) ϕa,b(p) dp,
where ϕa,b denotes the density of the beta distribution with parameters a and b:
ϕa,b(p) = ca,b p^{a−1} (1 − p)^{b−1}, p ∈ (0, 1).
Proof. The second equality is clear. To prove the first equality we calculate
∫_0^1 p^k (1 − p)^{n−k} ca,b p^{a−1} (1 − p)^{b−1} dp = ca,b ∫_0^1 p^{k+a−1} (1 − p)^{n−k+b−1} dp
= ca,b / ca+k,b+n−k = ca,b · (a + k − 1)!(b + n − k − 1)!/(a + b + n − 1)! = P(Xi = xi ∀1 ≤ i ≤ n)
by Lemma 5.51.
Thus, to calculate the probability of an event for the sequence of drawings (Xi)i≥1 from the Polya urn, we can calculate the probability of this event under the i.i.d. measure Pp for each fixed p and then average these probabilities with respect to the beta distribution. One says: (Xi)i≥1 is a mixture of i.i.d. sequences.
Theorem 5.53 For every A ∈ F, one has P((Xi)i≥1 ∈ A) = ∫_0^1 Pp((Yi)i≥1 ∈ A) ϕa,b(p) dp.
De Finetti's theorem is useful in Bayesian statistics. Suppose your data is an i.i.d.
sequence sampled from Pp with unknown p. In the Bayesian approach, one puts a prior distribution on the unknown p, for instance a beta distribution, and assumes the data is sampled from the measure
A ↦ ∫_0^1 Pp((Yi)i≥1 ∈ A) ϕa,b(p) dp.
The interpretation of this measure in terms of the Polya urn simplifies many calculations.
Proof of Theorem 5.53. The events {ω = (ωi )i≥1 ∈ Ω : ωi = xi ∀1 ≤ i ≤ n} are stable
under intersections and generate the product σ-algebra. Hence, the claim follows from
Theorem 5.52.
Consider the following random variables:
αn := (1/n) Σ_{i=1}^n 1{Xi = R} = Rn/n, the fraction of red balls drawn in the first n drawings.
Since Rn/n − (a + Rn)/(a + b + n) → 0 as n → ∞, limn→∞ αn exists whenever limn→∞ (a + Rn)/(a + b + n) does. Using a martingale argument, we have shown that the last limit exists almost surely. Hence, limn→∞ αn =: α∞ exists almost surely. Under Pp, the strong law of large numbers gives (1/n) Σ_{i=1}^n 1{Yi = R} → p almost surely. Applying Theorem 5.53 to the event A′ := {ω ∈ Ω : limn (1/n) Σ_{i=1}^n 1{ωi = R} exists and lies in A}, A ∈ B([0, 1]), we conclude
P(α∞ ∈ A) = ∫_0^1 Pp((Yi)i≥1 ∈ A′) ϕa,b(p) dp = ∫_A ϕa,b(p) dp.
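A simulation sketch (not part of the script; a = 3, b = 1 chosen for illustration): the long-run fraction of red draws has mean close to a/(a + b), the mean of the Beta(a, b) distribution, and a histogram of the simulated fractions would resemble ϕ_{3,1}(p) = 3p².

    import random

    def red_fraction(a, b, n_draws, rng):
        """Fraction of red draws among the first n_draws in Polya's urn."""
        red, blue, red_drawn = a, b, 0
        for _ in range(n_draws):
            if rng.random() < red / (red + blue):
                red += 1
                red_drawn += 1
            else:
                blue += 1
        return red_drawn / n_draws

    a, b = 3, 1
    rng = random.Random(8)
    fractions = [red_fraction(a, b, 1_000, rng) for _ in range(4_000)]
    print(round(sum(fractions) / len(fractions), 3), a / (a + b))   # both ≈ 0.75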
References
[Bau90] Heinz Bauer. Maß- und Integrationstheorie (Measure and Integration Theory). Berlin: Walter de Gruyter, 1990.
[BK10] Martin Brokate and Götz Kersting. Maß und Integral. Birkhäuser, 2010.
[Dur05] Richard Durrett. Probability: theory and examples. Duxbury Press, third edition,
2005.