Stats 310 Notes
Sourav Chatterjee
Contents
Chapter 1. Measures
1.1. Measurable spaces
1.2. Measure spaces
1.3. Dynkin's π-λ theorem
1.4. Outer measures
1.5. Carathéodory's extension theorem
1.6. Construction of Lebesgue measure
1.7. Example of a non-measurable set
1.8. Completion of a measure space
Chapter 7. Independence
7.1. Definition
7.2. Expectation of a product under independence
7.3. The second Borel–Cantelli lemma
7.4. The Kolmogorov zero-one law
7.5. Zero-one laws for i.i.d. random variables
7.6. Random vectors
7.7. Convolutions
CHAPTER 1
Measures
The above definition makes it possible to define the following important class
of σ-algebras.
D EFINITION 1.1.8. Let Ω be a set endowed with a topology. The Borel σ-
algebra on Ω is the σ-algebra generated by the collection of open subsets of Ω. It
is sometimes denoted by B(Ω).
In particular, the Borel σ-algebra on the real line R is the σ-algebra generated
by all open subsets of R.
E XERCISE 1.1.9. Prove that the Borel σ-algebra on R is also generated by
the set of all open intervals (or half-open intervals, or closed intervals). (Hint:
Show that any open set is a countable union of open — or half-open, or closed —
intervals.)
E XERCISE 1.1.10. Show that intervals of the form (x, ∞) also generated B(R),
as do intervals of the form (−∞, x).
EXERCISE 1.1.11. Let (Ω, F) be a measurable space and take any Ω′ ∈ F. Consider the set F′ := {A ∩ Ω′ : A ∈ F}. Show that this is a σ-algebra. Moreover, show that if F is generated by a collection of sets A, then F′ is generated by the collection A′ := {A ∩ Ω′ : A ∈ A}. (The σ-algebra F′ is called the restriction of F to Ω′.)
EXERCISE 1.1.12. As a corollary of the above exercise, show that if Ω is a topological space and Ω′ is a Borel subset of Ω endowed with the topology inherited from Ω, then the restriction of B(Ω) to Ω′ equals the Borel σ-algebra of Ω′.
PROOF. Let G be the collection of all A ∈ F for which the stated property holds. Clearly, A ⊆ G. In particular, Ω ∈ G. Take A ∈ G and any ε > 0. Find B ∈ A such that µ(A∆B) < ε. Since Ac∆Bc = A∆B, we get µ(Ac∆Bc) < ε. Thus, Ac ∈ G. Finally, take any A1, A2, . . . ∈ G. Let A := ∪Ai and take any ε > 0. Since µ is a probability measure, there is some n large enough such that
µ(A \ (A1 ∪ · · · ∪ An)) < ε/2.
For each i, find Bi ∈ A such that µ(Ai∆Bi) < 2^{−i−1}ε. Let A′ := A1 ∪ · · · ∪ An and B′ := B1 ∪ · · · ∪ Bn. It is not hard to see that
A′∆B′ ⊆ (A1∆B1) ∪ · · · ∪ (An∆Bn).
Thus,
µ(A′∆B′) ≤ ∑_{i=1}^{n} µ(Ai∆Bi) ≤ ε/2.
Since A∆B′ ⊆ (A \ A′) ∪ (A′∆B′), it follows that µ(A∆B′) ≤ ε, and hence A ∈ G.
An important corollary of the π-λ theorem is the following result about unique-
ness of measures.
T HEOREM 1.3.6. Let P be a π-system. If µ1 and µ2 are measures on σ(P)
that agree on P, and there is a sequence A1 , A2 , . . . ∈ P such that An increases
to Ω and µ1 (An ) and µ2 (An ) are both finite for every n, then µ1 = µ2 on σ(P).
P ROOF. Take any A ∈ P such that µ1 (A) = µ2 (A) < ∞. Let
L := {B ∈ σ(P) : µ1 (A ∩ B) = µ2 (A ∩ B)}.
Clearly, Ω ∈ L. If B ∈ L, then
µ1 (A ∩ B c ) = µ1 (A) − µ1 (A ∩ B)
= µ2 (A) − µ2 (A ∩ B) = µ2 (A ∩ B c ),
and hence Bc ∈ L. If B1, B2, . . . ∈ L are disjoint and B is their union, then
µ1(A ∩ B) = ∑_{i=1}^{∞} µ1(A ∩ Bi) = ∑_{i=1}^{∞} µ2(A ∩ Bi) = µ2(A ∩ B),
and hence B ∈ L.
PROOF. By monotonicity of φ, the left side dominates the right. For the opposite inequality, let Bn := An ∩ (A1 ∪ · · · ∪ An−1)c for each n, so that the sets B1, B2, . . . are disjoint and An = B1 ∪ · · · ∪ Bn for each n. By Lemma 1.4.4, Bn ∈ F for each n. Thus, by Lemma 1.4.5,
φ(E ∩ An) = ∑_{i=1}^{n} φ(E ∩ Bi).
Consequently,
lim_{n→∞} φ(E ∩ An) = ∑_{i=1}^{∞} φ(E ∩ Bi).
Since φ is countably subadditive, this completes the proof of the lemma.
We are now ready to prove Theorem 1.4.3.
PROOF OF THEOREM 1.4.3. Let A1, A2, . . . ∈ F and let A := ∪Ai. For each n, let
Bn := ∪_{i=1}^{n} Ai.
Take any E ⊆ Ω and any n. By Lemma 1.4.4, Bn ∈ F and hence
φ(E) = φ(E ∩ Bn) + φ(E ∩ Bnc).
By monotonicity of φ, φ(E ∩ Bnc) ≥ φ(E ∩ Ac). Thus,
φ(E) ≥ φ(E ∩ Bn) + φ(E ∩ Ac).
By Lemma 1.4.6,
lim_{n→∞} φ(E ∩ Bn) = φ(E ∩ A).
Then {Aij}_{i,j=1}^{∞} is a countable cover for A, and so
µ∗(A) ≤ ∑_{i,j=1}^{∞} µ(Aij) ≤ ∑_{i=1}^{∞} (µ∗(Ai) + 2^{−i}ε) = ε + ∑_{i=1}^{∞} µ∗(Ai).

µ∗(E ∩ A) + µ∗(E ∩ Ac) ≤ ∑_{i=1}^{∞} (µ(A ∩ Ai) + µ(Ac ∩ Ai)) = ∑_{i=1}^{∞} µ(Ai).
P ROOF. By Exercise 1.6.2, the algebra A generates the Borel σ-algebra. The
existence and uniqueness of the extension now follows by Proposition 1.6.4 and
Carathéodory’s extension theorem.
D EFINITION 1.6.6. The unique extension of λ given by Corollary 1.6.5 is
called the Lebesgue measure on the real line. The outer measure defined by the
formula (1.5.1) in this case is called Lebesgue outer measure, and the σ-algebra
induced by this outer measure is called the Lebesgue σ-algebra.
E XERCISE 1.6.7. Prove that for any −∞ < a ≤ b < ∞, λ([a, b]) = b − a.
(1) Let Z be the set of all subsets of the zero µ-measure sets in F.
(2) Let F′ := {A ∪ B : A ∈ F, B ∈ Z}.
(3) If C ∈ F′ equals A ∪ B for some A ∈ F and B ∈ Z, let µ′(C) := µ(A). (First, show that this is well-defined.)
E XERCISE 1.8.2. Prove that the completion of the Borel σ-algebra of R (or
Rn ) obtained via the above prescription is the Lebesgue σ-algebra.
Recall that Lebesgue measure is defined on the Lebesgue σ-algebra. We will,
however, work with the Borel σ-algebra most of the time. When we say that a func-
tion defined on R is ‘measurable’, we will mean Borel measurable unless otherwise
mentioned. On the other hand, abstract probability spaces on which we will define
our random variables (measurable maps), will usually be taken to be complete.
CHAPTER 2
Measurable functions and integration
PROOF. It is easy to verify that the set of all B ⊆ Ω′ such that f−1(B) ∈ F is a σ-algebra. Since this set contains A, it must also contain the σ-algebra generated by A, which is F′. Thus, f is measurable.
Essentially all functions that arise in practice are measurable. Let us now see
why that is the case by identifying some large classes of measurable functions.
PROPOSITION 2.1.4. Suppose that Ω and Ω′ are topological spaces, and F and F′ are their Borel σ-algebras. Then any continuous function from Ω into Ω′ is measurable.
PROOF. Use Lemma 2.1.3 with A = the set of all open subsets of Ω′.
A combination of Exercise 2.1.2 and Proposition 2.1.4 shows, for example,
that sums and products of real-valued measurable functions are measurable, since
addition and multiplication are continuous maps from R2 to R, and if f, g : Ω → R
are measurable, then (f, g) : Ω → R2 is measurable with respect to B(R2 ) (easy
to show).
E XERCISE 2.1.5. Following the above sketch, show that sums and products of
measurable functions are measurable.
PROOF. For any t ∈ R∗, g(ω) ≥ t if and only if fn(ω) ≥ t for all n. Thus,
g−1([t, ∞]) = ∩_{n=1}^{∞} fn−1([t, ∞]),
which shows that g−1([t, ∞]) ∈ F. It is straightforward to verify that sets of the form [t, ∞] generate the σ-algebra of R∗. Thus, g is measurable. The proof for h is similar.
The following exercises are useful consequences of Proposition 2.1.9.
E XERCISE 2.1.10. If {fn }n≥1 is a sequence of R∗ -valued measurable func-
tions defined on the same measure space, show that the functions lim inf n→∞ fn
and lim supn→∞ fn are also measurable. (Hint: Write the lim sup as an infimum
of suprema and the lim inf as a supremum of infima.)
E XERCISE 2.1.11. If {fn }n≥1 is a sequence of R∗ -valued measurable func-
tions defined on the same measure space, and fn → f pointwise, show that f is
measurable. (Hint: Under the given conditions, f = lim supn→∞ fn .)
EXERCISE 2.1.12. If {fn}n≥1 is a sequence of [0, ∞]-valued measurable functions defined on the same measure space, show that ∑_{n=1}^{∞} fn is measurable.
E XERCISE 2.1.13. If {fn }n≥1 is a sequence of measurable R∗ -valued func-
tions defined on the same measurable space, show that the set of all ω where
lim fn (ω) exists is a measurable set.
It is sometimes useful to know that Exercise 2.1.11 has a generalization to
functions taking value in arbitrary separable metric spaces. In that setting, however,
the proof using lim sup does not work, and a different argument is needed. This is
given in the proof of the following result.
P ROPOSITION 2.1.14. Let (Ω, F) be a measurable space and let S be a sepa-
rable metric space endowed with its Borel σ-algebra. If {fn }n≥1 is a sequence of
measurable functions from Ω into S that converge pointwise to a limit function f ,
then f is also measurable.
P ROOF. Let B(x, r) denote the open ball with center x and radius r in S.
Since S is separable, any open set is a countable union of open balls. Therefore
by Lemma 2.1.3 it suffices to check that f −1 (B) ∈ F for any open ball B. Take
such a ball B(x, r). For any ω ∈ Ω, if f (ω) ∈ B(x, r), then there is some large
enough integer k such that f (ω) ∈ B(x, r − k −1 ). Since fn (ω) → f (ω), this
implies that fn (ω) ∈ B(x, r − k −1 ) for all large enough n. On the other hand, if
there exists k ≥ 1 such that fn (ω) ∈ B(x, r − k −1 ) for all large enough n, then
f (ω) ∈ B(x, r). Thus, we have shown that f (ω) ∈ B(x, r) if and only if there is
some integer k ≥ 1 such that fn (ω) ∈ B(x, r − k −1 ) for all large enough n. In set
theoretic notation, this statement can be written as
f−1(B(x, r)) = ∪_{k=1}^{∞} ∪_{N=1}^{∞} ∩_{n=N}^{∞} fn−1(B(x, r − k−1)).
Since each fn is measurable, and σ-algebras are closed under countable unions and
intersections, this shows that f −1 (B(x, r)) ∈ F, completing the proof.
Measurable maps generate σ-algebras of their own, which are important for
various purposes.
DEFINITION 2.1.15. Let (Ω, F) and (Ω′, F′) be two measurable spaces and let f : Ω → Ω′ be a measurable function. The σ-algebra generated by f is defined as
σ(f) := {f−1(A) : A ∈ F′}.
It is not difficult to prove that if f is a nonnegative simple function, then this defi-
nition gives the same answer as (2.2.2), so there is no inconsistency.
Finally, take any measurable f : Ω → R∗. Define f+(ω) = max{f(ω), 0} and f−(ω) = −min{f(ω), 0}. Then f+ and f− are nonnegative measurable functions
space into R∗ . The integral of f over the set S can then be defined as the integral
of fS on the measure space (S, FS , µS ), provided that the integral exists. When
S = ∅, the integral can be defined to be zero.
EXERCISE 2.2.3. Show that the two definitions of ∫_S f dµ discussed above are the same, in the sense that one is defined if and only if the other is defined, and in that case they are equal.
Since an infinite series with nonnegative summands may be rearranged any way we like without altering the result, this gives
ν(S) = ∑_{j=1}^{∞} ∑_{i=1}^{n} ai µ(Ai ∩ Sj) = ∑_{j=1}^{∞} ν(Sj).
P ROOF. Take any n. If k2−n ≤ f (ω) < (k + 1)2−n for some integer 0 ≤
k < n2n , let fn (ω) = k2−n . If f (ω) ≥ n, let fn (ω) = n. It is easy to check that
the sequence {fn }n≥1 is a sequence of nonnegative simple functions that increase
pointwise to f .
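The construction above is easy to implement. The following Python sketch is our illustration (the function name and the example f are our choices, not part of the notes); it evaluates the n-th approximating simple function at a few points and shows the pointwise increase towards f.

```python
import numpy as np

def f_n(f_vals, n):
    """n-th simple approximation of a nonnegative function, given its values f_vals:
    values >= n are capped at n; values < n are rounded down to a multiple of 2^-n."""
    f_vals = np.asarray(f_vals, dtype=float)
    return np.where(f_vals >= n, float(n), np.floor(f_vals * 2.0**n) / 2.0**n)

# Example: f(x) = 1/x on (0, 1] is unbounded, yet f_n increases to f pointwise.
x = np.array([0.01, 0.3, 0.9])
for n in [1, 4, 8, 12]:
    print(n, f_n(1.0 / x, n))
```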
(Hint: Start with f = 1A and use Proposition 2.3.6 and the monotone convergence
theorem to generalize.)
PROOF. It is easy to check that additivity of the integral holds for nonnegative simple functions. Take any measurable f, g : Ω → [0, ∞]. Then by Proposition 2.3.6, there exist sequences of nonnegative simple functions {un}n≥1 and {vn}n≥1 increasing to f and g pointwise. Then un + vn increases to f + g pointwise. Thus, by the linearity of the integral for nonnegative simple functions and the monotone convergence theorem,
∫(f + g)dµ = lim_{n→∞} ∫(un + vn)dµ = lim_{n→∞} (∫un dµ + ∫vn dµ) = ∫f dµ + ∫g dµ.
Next, take any α ≥ 0 and any measurable f : Ω → [0, ∞]. Any element of SF+(αf) must be of the form αg for some g ∈ SF+(f). Thus
∫αf dµ = sup_{g∈SF+(f)} ∫αg dµ = α sup_{g∈SF+(f)} ∫g dµ = α ∫f dµ.
This completes the proofs of the assertions about nonnegative functions. Next, take any integrable f and g. It is easy to see that a consequence of integrability is that the set where f or g takes infinite values is a set of measure zero. The only problem in defining the function f + g is that at some ω, it may happen that one of f(ω) and g(ω) is ∞ and the other is −∞. At any such point, define (f + g)(ω) = 0. Then f + g is defined everywhere, is measurable, and (f + g)+ ≤ f+ + g+. Therefore, by the additivity of integration for nonnegative functions and the integrability of f and g, ∫(f + g)+ dµ is finite. Similarly, ∫(f + g)− dµ is finite. Thus, f + g is integrable. Now,
(f + g)+ − (f + g)− = f + g = f+ − f− + g+ − g−,
which can be written as
(f + g)+ + f− + g− = (f + g)− + f+ + g+.
Note that the above identity holds even if one or more of the summands on either side are infinity. Thus, by additivity of integration for nonnegative functions,
∫(f + g)+ dµ + ∫f− dµ + ∫g− dµ = ∫(f + g)− dµ + ∫f+ dµ + ∫g+ dµ,
which rearranges to give ∫(f + g)dµ = ∫f dµ + ∫g dµ. Finally, if f is integrable and α ≥ 0, then (αf)+ = αf+ and (αf)− = αf−, which shows that αf is integrable and ∫αf dµ = α∫f dµ. Similarly, if α ≤ 0, then (αf)+ = −αf− and (αf)− = −αf+, and the same conclusion follows.
Now replace fn by −fn and f by −f. All the conditions still hold, and therefore the same deduction shows that ∫(−f)dµ ≤ lim inf ∫(−fn)dµ, which is the same as ∫f dµ ≥ lim sup ∫fn dµ. Thus, we get ∫f dµ = lim ∫fn dµ.
Finally, to show that lim ∫|fn − f|dµ = 0, observe that |fn − f| → 0 pointwise, and |fn − f| is bounded by the integrable function 2h everywhere. Then apply the first part.
E XERCISE 2.5.3. Produce a counterexample to show that the nonnegativity
condition in Fatou’s lemma cannot be dropped.
E XERCISE 2.5.4. Produce a counterexample to show that the domination con-
dition of the dominated convergence theorem cannot be dropped.
A very important application of the dominated convergence theorem is in giv-
ing conditions for differentiating under the integral sign.
P ROPOSITION 2.5.5. Let I be an open subset of R, and let (Ω, F, µ) be a
measure space. Suppose that f : I × Ω → R satisfies the following conditions:
(i) f (x, ω) is an integrable function of ω for each x ∈ I,
(ii) for all ω ∈ Ω, the derivative fx of f with respect to x exists for all x ∈ I,
and
(iii) there is an integrable function h : Ω → R such that |fx (x, ω)| ≤ h(ω)
for all x ∈ I and ω ∈ Ω.
Then for all x ∈ I,
(d/dx) ∫_Ω f(x, ω)dµ(ω) = ∫_Ω fx(x, ω)dµ(ω).
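As a quick numerical sanity check (ours, not part of the notes), one can take µ to be the uniform probability measure on a finite set of points, so that both sides are finite averages; the particular f below is an arbitrary choice satisfying (i)-(iii).

```python
import numpy as np

# mu = uniform probability measure on 50 points; f(x, omega) = sin(x*omega)*exp(-omega),
# so f_x(x, omega) = omega*cos(x*omega)*exp(-omega) is bounded by an integrable function.
rng = np.random.default_rng(0)
omegas = rng.uniform(0.0, 3.0, size=50)

def F(x):                     # x -> integral of f(x, .) d mu
    return np.mean(np.sin(x * omegas) * np.exp(-omegas))

def int_fx(x):                # integral of f_x(x, .) d mu
    return np.mean(omegas * np.cos(x * omegas) * np.exp(-omegas))

x, h = 0.7, 1e-6
print((F(x + h) - F(x - h)) / (2 * h))   # numerical derivative of the integral
print(int_fx(x))                          # integral of the derivative; should match
```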
PROOF. First suppose that µ({ω : f(ω) > 0}) = 0. Take any nonnegative measurable simple function g such that g ≤ f everywhere. Then g = 0 whenever f = 0. Thus, µ({ω : g(ω) > 0}) = 0. Since g is a simple function, this implies that ∫g dµ = 0. Taking supremum over all such g gives ∫f dµ = 0. Conversely, suppose that µ({ω : f(ω) > 0}) > 0. Then
µ({ω : f(ω) > 0}) = µ(∪_{n=1}^{∞} {ω : f(ω) > n−1}) = lim_{n→∞} µ({ω : f(ω) > n−1}) > 0,
where the second equality follows from the observation that the sets in the union form an increasing sequence. Therefore for some n, µ(An) > 0, where An := {ω : f(ω) > n−1}. But then
∫f dµ ≥ ∫f 1An dµ ≥ n−1 ∫1An dµ = n−1 µ(An) > 0.
The following exercises show how to use this rule of thumb to get slightly better versions of the convergence theorems of this chapter.
E XERCISE 2.6.4. If {fn }n≥1 is a sequence of measurable functions and f is a
function such that fn → f a.e., show that there is a measurable function g such that
g = f a.e. Moreover, if the σ-algebra is complete, show that f itself is measurable.
E XERCISE 2.6.5. Show that in the monotone convergence theorem and the
dominated convergence theorem, it suffices to have fn → f a.e. and f measurable.
EXERCISE 2.6.6. Let f : Ω → I be a measurable function, where I is an interval in R∗. The interval I is allowed to be finite or infinite, open, closed or half-open. In all cases, show that ∫f dµ ∈ I if µ is a probability measure. (Hint: Use Exercise 2.6.3.)
EXERCISE 2.6.7. If f1, f2, . . . are measurable functions from Ω into R∗ such that ∑∫|fi|dµ < ∞, then show that ∑ fi exists a.e., and ∫∑ fi dµ = ∑∫ fi dµ.
E XERCISE 2.6.8. Show that in Proposition 2.5.5, every occurrence of ‘for all
ω ∈ Ω’ can be replaced by ‘for almost all ω ∈ Ω’.
CHAPTER 3
Product spaces
for each A1 ∈ F1 , . . . , An ∈ Fn .
PROOF. It is easy to check that the collection of finite disjoint unions of sets of the form A1 × · · · × An (sometimes called 'rectangles') forms an algebra. Therefore by Carathéodory's theorem, it suffices to show that µ defined through (3.1.1) is a countably additive measure on this algebra.
We will prove this by induction on n. It is true for n = 1 by the definition of a
measure. Suppose that it holds for n − 1. Take any rectangular set A1 × · · · × An .
Suppose that this set is a disjoint union of Ai,1 × · · · × Ai,n for i = 1, 2, . . ., where
Ai,j ∈ Fj for each i and j. It suffices to show that
µ(A1 × · · · × An) = ∑_{i=1}^{∞} µ(Ai,1 × · · · × Ai,n).
On the other hand, if y ∈ Ai,n for some i ∈ I, then (x, y) ∈ Ai,1 × · · · × Ai,n .
Thus, (x, y) ∈ A1 × · · · × An , and therefore y ∈ An . This shows that An is the
union of Ai,n over i ∈ I. Now, if y ∈ Ai,n ∩ Aj,n for some distinct i, j ∈ I, then
since x ∈ (Ai,1 × · · · × Ai,n−1 ) ∩ (Aj,1 × · · · × Aj,n−1 ), we get that (x, y) ∈
(Ai,1 × · · · × Ai,n ) ∩ (Aj,1 × · · · × Aj,n ), which is impossible. Thus, the sets Ai,n ,
as i ranges over I, are disjoint. This gives
µn(An) = ∑_{i∈I} µn(Ai,n).
The desired result now follows by applying the induction hypothesis to µ0 on both
sides.
The measure µ of Proposition 3.1.1 is called the product of µ1, . . . , µn, and is usually denoted by µ1 × · · · × µn.
E XERCISE 3.1.2. Let (Ωi , Fi , µi ) be σ-finite measure spaces for i = 1, 2, 3.
Show that (µ1 × µ2 ) × µ3 = µ1 × µ2 × µ3 = µ1 × (µ2 × µ3 ).
E XERCISE 3.1.3. Let S be a separable metric space, endowed with its Borel
σ-algebra. Then S × S comes with its product topology, which defines its own
Borel σ-algebra. Show that this is the same as the product σ-algebra on S × S.
E XERCISE 3.1.4. Let S be as above, and let (Ω, F) be a measurable space. If
f and g are measurable functions from Ω into S, show that (f, g) : Ω → S × S is
a measurable function.
E XERCISE 3.1.5. Let S and Ω be as above. If ρ is the metric on S, show that
ρ : S × S → R is a measurable map.
EXERCISE 3.1.6. Let (Ω, F) be a measurable space and let S be a complete separable metric space endowed with its Borel σ-algebra. Let {fn}n≥1 be a sequence of measurable functions from Ω into S. Show that
{ω ∈ Ω : lim_{n→∞} fn(ω) exists}
is a measurable set.
This completes the proof of Fubini’s theorem for all indicator functions. By linear-
ity, this extends to all simple functions. Using the monotone convergence theorem
and Proposition 2.3.6, it is straightforward the extend the result to all nonnegative
measurable functions.
Now take any integrable f : Ω1 × Ω2 → R∗. Applying Fubini's theorem to f+ and f−, we get
∫f dµ = ∫_{Ω1}∫_{Ω2} f+(x, y)dµ2(y)dµ1(x) − ∫_{Ω1}∫_{Ω2} f−(x, y)dµ2(y)dµ1(x) = ∫_{Ω1} g(x)dµ1(x) − ∫_{Ω1} h(x)dµ1(x),
where
g(x) := ∫_{Ω2} f+(x, y)dµ2(y), h(x) := ∫_{Ω2} f−(x, y)dµ2(y).
On the other hand, the set where g(x) and h(x) are both infinite is a set of measure zero, and therefore we may define
∫_{Ω2} f(x, y)dµ2(y) = 0
P ROOF. First note that by the existence of finite dimensional product mea-
sures, the measure νn := µ1 × · · · × µn is defined for each n.
Next, define Ω(n) := Ωn+1 × Ωn+2 × · · · . Let A ∈ F be called a cylinder set
if it is of the form B × Ω(n) for some n and some B ∈ F1 × · · · × Fn . Define
µ(A) := νn (B). It is easy to see that this is a well-defined functional on A, and
that it satisfies the product property for rectangles.
Let A be the collection of all cylinder sets. It is not difficult to check that
A is an algebra, and that µ is finitely additive and σ-finite on A. Therefore by
Carathéodory’s theorem, we only need to check that µ is countably additive on A.
Suppose that A ∈ A is the union of a sequence of disjoint sets A1 , A2 , . . . ∈ A.
For each n, let Bn := A \ (A1 ∪ · · · ∪ An ). Since A is an algebra, each Bn ∈ A.
Since µ is finitely additive on A, µ(A) = µ(Bn ) + µ(A1 ) + · · · + µ(An ) for each
n. Therefore we have to show that lim µ(Bn ) = 0. Suppose that this is not true.
Since {Bn }n≥1 is a decreasing sequence of sets, this implies that there is some
ε > 0 such that µ(Bn) ≥ ε for all n. Using this, we will now get a contradiction to
the fact that ∩Bn = ∅.
For each n, let A(n) be the algebra of all cylinder sets in Ω(n) , and let µ(n) be
defined on A(n) using µn+1 , µn+2 , . . . , the same way we defined µ on A. For any
n, m, and (x1 , . . . , xm ) ∈ Ω1 × · · · × Ωm , define
Bn (x1 , . . . , xm ) := {(xm+1 , xm+2 , . . .) ∈ Ω(m) : (x1 , x2 , . . .) ∈ Bn }.
Since Bn is a cylinder set, it is of the form Cn × Ω(m) for some m and some
Cn ∈ F1 × · · · × Fm . Therefore by Lemma 3.2.1, Bn (x1 ) ∈ A(1) for any x1 ∈ Ω1 ,
and by Fubini’s theorem, the map x1 7→ µ(1) (Bn (x1 )) is measurable. (Although
we have not yet shown that µ(1) is a measure on the product σ-algebra of Ω(1) , it is
evidently a measure on the σ-algebra G of all sets of the form D × Ω(m) ⊆ Ω(1) ,
where D ∈ F2 ×· · ·×Fm . Moreover, Bn ∈ F1 ×G. This allows us to use Fubini’s
theorem to reach the above conclusion.) Thus, the set
Fn := {x1 ∈ Ω1 : µ(1)(Bn(x1)) ≥ ε/2}
is an element of F1. But again by Fubini's theorem,
µ(Bn) = ∫ µ(1)(Bn(x1))dµ1(x1) = ∫_{Fn} µ(1)(Bn(x1))dµ1(x1) + ∫_{Fn^c} µ(1)(Bn(x1))dµ1(x1) ≤ µ1(Fn) + ε/2.
Since µ(Bn) ≥ ε, this shows that µ1(Fn) ≥ ε/2. Since {Fn}n≥1 is a decreasing sequence of sets, this shows that ∩Fn ≠ ∅. Choose a point x∗1 ∈ ∩Fn.
Now note that {Bn(x∗1)}n≥1 is a decreasing sequence of sets in F(1), such that µ(1)(Bn(x∗1)) ≥ ε/2 for each n. Repeating the above argument for the product space Ω(1) and the sequence {Bn(x∗1)}n≥1, we see that there exists x∗2 ∈ Ω2 such that µ(2)(Bn(x∗1, x∗2)) ≥ ε/4 for every n.
Proceeding like this, we obtain a point x = (x∗1, x∗2, . . .) ∈ Ω such that for any m and n,
µ(m)(Bn(x∗1, . . . , x∗m)) ≥ 2^{−m}ε.
Take any n. Since Bn is a cylinder set, it is of the form Cn × Ω(m) for some m
and some Cn ∈ F1 × · · · × Fm . Since µ(m) (Bn (x∗1 , . . . , x∗m )) > 0, there is some
(xm+1 , xm+2 , . . .) ∈ Ω(m) such that (x∗1 , . . . , x∗m , xm+1 , xm+2 , . . .) ∈ Bn . But by
the form of Bn , this implies that x ∈ Bn . Thus, x ∈ ∩Bn . This completes the
proof.
Having constructed products of countably many probability spaces, one may
now wonder about uncountable products. Surprisingly, this is quite simple, given
that we know how to handle the countable case. Suppose that {(Ωi , Fi , µi )}i∈I
is an arbitrary collection of probability spaces. Let Ω := ×i∈I Ωi , and let F :=
×i∈I Fi be defined using rectangles as in the countable case. Now take any count-
able set J ⊆ I, and let FJ := ×j∈J Fj . Consider a set A ∈ F of the form
{(ωi )i∈I : (ωj )j∈J ∈ B}, where B is some element of FJ . Let G be the collection
of all such A, as B varies in FJ and J varies over all countable subsets of I. It is
34 3. PRODUCT SPACES
P ROOF. Let C be the collection of Borel sets B for which the first and second
identities displayed above hold true. It is easy to see that C contains all closed sets,
since any closed set C can be written as the decreasing intersection of the open sets
Uk := {x ∈ Rn : |x − y| < 1/k for some y ∈ C},
where |x − y| denotes the Euclidean norm of x − y. Thus, the proof of the first two
identities will be complete if we can show that C is a σ-algebra.
It is easy to see that ∅ ∈ C. Next, if B ∈ C, then it is not hard to see that Bc ∈ C, using the observation that if C ⊆ B is a closed set and U ⊇ B is an open set, then Cc ⊇ Bc is an open set and Uc ⊆ Bc is a closed set. Next, take any B1, B2, . . . ∈ C and let B be their union. Take any ε > 0. Find closed sets C1 ⊆ B1, C2 ⊆ B2, . . . such that µ(Ci) ≥ µ(Bi) − 2^{−i}ε for each i, and open sets U1 ⊇ B1, U2 ⊇ B2, . . . such that µ(Ui) ≤ µ(Bi) + 2^{−i}ε for each i. Let U := ∪_{i≥1} Ui, so that U is an open set, and
µ(U) − µ(B) ≤ ∑_{i=1}^{∞} (µ(Ui) − µ(Bi)) ≤ ε.
Similarly, let C := ∪_{i≥1} Ci, so that C ⊆ B and µ(B) − µ(C) ≤ ε. The problem is, C may not be closed. But this is easily resolved by taking k large enough so that µ(C) − µ(∪_{i=1}^{k} Ci) ≤ ε. Then C′ := ∪_{i=1}^{k} Ci is a closed set contained in B, and µ(B) − µ(C′) ≤ 2ε. This proves that C is a σ-algebra.
Finally, to prove the third identity, take any B ∈ B(Rn), any ε > 0, and a closed set C ⊆ B such that µ(B) − µ(C) ≤ ε. Let Br denote the closed ball of radius r centered at the origin. Then µ(C ∩ Br) → µ(C) as r → ∞, and C ∩ Br is a compact subset of C. Thus, it is possible to find a compact set K ⊆ C such that µ(B) − µ(K) ≤ 2ε.
this assumption, we will show that ∩n≥1 Bn 6= ∅, which will give a contradiction,
because we know that the intersection is empty.
Since the Bn's are cylinder sets, there exist integers m1 ≤ m2 ≤ · · · , and Borel sets D1 ⊆ (Rd)^{m1}, D2 ⊆ (Rd)^{m2}, . . . such that for each n,
Bn = Dn × Rd × Rd × · · · .
Moreover, since Bn ⊇ Bn+1 for each n, we must have that
Dn+1 ⊆ Dn × (Rd)^{mn+1−mn} (3.4.1)
for each n.
Now recall the uniform positive lower bound µ(Bn) ≥ ε. By Lemma 3.4.2, we can find a compact set Kn ⊆ Dn for each n, such that
µmn(Dn \ Kn) ≤ 2^{−n−1}ε.
Define
Kn′ := (K1 × (Rd)^{mn−m1}) ∩ (K2 × (Rd)^{mn−m2}) ∩ · · · ∩ (Kn−1 × (Rd)^{mn−mn−1}) ∩ Kn.
Then Kn′, being a closed subset of Kn, is compact. Moreover,
µmn(Dn \ Kn′) ≤ ∑_{i=1}^{n} µmn(Dn \ (Ki × (Rd)^{mn−mi})) ≤ ∑_{i=1}^{n} µmn((Di × (Rd)^{mn−mi}) \ (Ki × (Rd)^{mn−mi})) = ∑_{i=1}^{n} µmi(Di \ Ki) ≤ ∑_{i=1}^{n} 2^{−i−1}ε ≤ ε/2.
Since µmn(Dn) = µ(Bn) ≥ ε, this shows that µmn(Kn′) ≥ ε/2 for each n. In particular, each Kn′ is nonempty. Now, from the definition of Kn′, it is easy to see that Kn+1′ ⊆ Kn′ × (Rd)^{mn+1−mn} for each n. Thus, if we choose (x^n_1, . . . , x^n_{mn}) ∈ Kn′, then for each i < n, (x^n_1, . . . , x^n_{mi}) ∈ Ki′. In particular, (x^n_1, . . . , x^n_{m1}) ∈ K1′ for each n, and so, by compactness, we can choose a subsequence of {(x^n_1, . . . , x^n_{m1})}n≥1 converging to some (x1, . . . , xm1) ∈ K1′. Passing to a further subsequence, we can have (x^n_1, . . . , x^n_{m2}) converging to some (x1, . . . , xm2) ∈ K2′, and so on. Then, by the standard diagonal argument, we can find a subsequence along which (x^n_1, . . . , x^n_{mi}) converges to some (x1, . . . , xmi) ∈ Ki′ for each i. Consider the vector x = (x1, x2, . . .) ∈ (Rd)^N. We claim that x ∈ ∩n≥1 Bn. Indeed, this is true, because for each n,
x = (x1, . . . , xmn, xmn+1, . . .) ∈ Kn′ × Rd × Rd × · · · ⊆ Dn × Rd × Rd × · · · = Bn.
This completes the proof of the theorem.
CHAPTER 4
Norms and inequalities
The main goal of this chapter is to introduce Lp spaces and discuss some of their properties. In particular, we will discuss some inequalities for Lp spaces that are useful in probability theory.
Jensen's inequality is not really needed for this proof; it can be done by simply using the definition of convexity and induction on n.)
A function φ is called concave if −φ is convex. Clearly, the opposite of
Jensen’s inequality holds for concave functions. An important concave function
is the logarithm, whose concavity can be verified by simply noting that its second
derivative is negative everywhere on the positive real line. A consequence of the
concavity of the logarithm is Young’s inequality, which we will use later in this
chapter to prove Hölder’s inequality.
EXERCISE 4.2.8 (Young's inequality). If x and y are positive real numbers, and p, q ∈ (1, ∞) are numbers such that 1/p + 1/q = 1, show that
xy ≤ x^p/p + y^q/q.
(Hint: Take the logarithm of the right side and apply the definition of concavity.)
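A quick numerical spot-check of the inequality (our illustration; it is not a substitute for the proof the exercise asks for):

```python
import numpy as np

# Spot-check of Young's inequality: xy <= x^p/p + y^q/q when 1/p + 1/q = 1.
rng = np.random.default_rng(1)
for _ in range(5):
    p = rng.uniform(1.1, 5.0)
    q = p / (p - 1.0)                      # conjugate exponent, so 1/p + 1/q = 1
    x, y = rng.uniform(0.01, 10.0, size=2)
    lhs, rhs = x * y, x**p / p + y**q / q
    print(f"p={p:.2f}  xy={lhs:.4f}  bound={rhs:.4f}  ok={lhs <= rhs + 1e-12}")
```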
The space Lp(Ω, F, µ) (or simply Lp(Ω) or Lp(µ)) is the set of all measurable f : Ω → R∗ such that ‖f‖Lp is finite. In addition to the above, there is also the L∞ norm, defined as
‖f‖L∞ := inf{K ∈ [0, ∞] : |f| ≤ K a.e.}.
The right side is called the 'essential supremum' of the function |f|. It is not hard to see that |f| ≤ ‖f‖L∞ a.e. As before, L∞(Ω, F, µ) is the set of all f with finite L∞ norm.
It turns out that for any 1 ≤ p ≤ ∞, Lp(Ω, F, µ) is actually a vector space and ‖·‖Lp is actually a norm on this space, provided that we first quotient out the space by the equivalence relation of being equal almost everywhere. Moreover, these norms are complete (that is, Cauchy sequences converge). The only thing that is obvious is that ‖αf‖Lp = |α|‖f‖Lp for any α ∈ R and any p and f. Proving the other claims, however, requires some work, which we do below.
THEOREM 4.4.1 (Hölder's inequality). Take any measurable f, g : Ω → R∗, and any p ∈ [1, ∞]. Let q be the solution of 1/p + 1/q = 1. Then ‖fg‖L1 ≤ ‖f‖Lp ‖g‖Lq.
If p = ∞, then q = 1 and the proof is exactly the same. So assume that p ∈ (1, ∞), which implies that q ∈ (1, ∞). Let α := 1/‖f‖Lp and β := 1/‖g‖Lq, and let u := αf and v := βg. Then ‖u‖Lp = α‖f‖Lp = 1 and ‖v‖Lq = β‖g‖Lq = 1. Now, by Young's inequality,
|uv| ≤ |u|^p/p + |v|^q/q
everywhere. Integrating both sides, we get
‖uv‖L1 ≤ ‖u‖Lp^p/p + ‖v‖Lq^q/q = 1/p + 1/q = 1.
It is now easy to see that this is precisely the inequality that we wanted to prove.
E XERCISE 4.4.2. If p, q ∈ (1, ∞) and f ∈ Lp (µ) and g ∈ Lq (µ), then show
that Hölder’s inequality becomes an equality if and only if there exist α, β ≥ 0, not
both of them zero, such that α|f |p = β|g|q a.e.
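For instance, here is a quick numerical check of Hölder's inequality itself for the empirical (uniform) probability measure on n points; the choice of random f and g below is ours and purely illustrative.

```python
import numpy as np

# Hölder's inequality for mu = uniform probability measure on n points:
# (1/n) sum |f_i g_i| <= ((1/n) sum |f_i|^p)^(1/p) * ((1/n) sum |g_i|^q)^(1/q).
rng = np.random.default_rng(2)
n, p = 1000, 3.0
q = p / (p - 1.0)
f, g = rng.normal(size=n), rng.normal(size=n)
lhs = np.mean(np.abs(f * g))
rhs = np.mean(np.abs(f)**p)**(1 / p) * np.mean(np.abs(g)**q)**(1 / q)
print(lhs, rhs, lhs <= rhs)
```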
Using Hölder’s inequality, it is now easy to prove that the Lp norms satisfy the
triangle inequality.
THEOREM 4.4.3 (Minkowski's inequality). For any 1 ≤ p ≤ ∞ and any measurable f, g : Ω → R∗, ‖f + g‖Lp ≤ ‖f‖Lp + ‖g‖Lp.
PROOF. If the right side is infinite, there is nothing to prove. So assume that the right side is finite. First, consider the case p = ∞. Since |f| ≤ ‖f‖L∞ a.e. and |g| ≤ ‖g‖L∞ a.e., it follows that |f + g| ≤ ‖f‖L∞ + ‖g‖L∞ a.e., which completes the proof by the definition of essential supremum.
On the other hand, if p = 1, then the claimed inequality follows trivially from the triangle inequality and the additivity of integration.
So let us assume that p ∈ (1, ∞). First, observe that by Exercise 4.2.7,
|(f + g)/2|^p ≤ (|f|^p + |g|^p)/2,
which shows that ‖f + g‖Lp is finite. Next note that by Hölder's inequality,
∫|f + g|^p dµ = ∫|f + g||f + g|^{p−1} dµ ≤ ∫|f||f + g|^{p−1} dµ + ∫|g||f + g|^{p−1} dµ
The fact that is somewhat nontrivial, however, is that the Lp norm is complete. That is, any sequence of functions that is Cauchy in the Lp norm converges to a limit in Lp space. A first step towards the proof is the following lemma, which is important in its own right.
LEMMA 4.4.5. If {fn}n≥1 is a Cauchy sequence in the Lp norm for some 1 ≤ p ≤ ∞, then there is a function f ∈ Lp(Ω, F, µ), and a subsequence {fnk}k≥1, such that fnk → f a.e. as k → ∞.
PROOF. First suppose that p ∈ [1, ∞). It is not difficult to see that using the Cauchy criterion, we can extract a subsequence {fnk}k≥1 such that ‖fnk − fnk+1‖Lp ≤ 2^{−k} for every k. Define the event
Ak := {ω : |fnk(ω) − fnk+1(ω)| ≥ 2^{−k/2}}.
Then by Markov's inequality,
µ(Ak) = µ({ω : |fnk(ω) − fnk+1(ω)|^p ≥ 2^{−kp/2}}) ≤ 2^{kp/2} ∫|fnk − fnk+1|^p dµ ≤ 2^{kp/2} · 2^{−kp} = 2^{−kp/2}.
PROOF. Take any sequence {fn}n≥1 that is Cauchy in Lp. Then note that by Lemma 4.4.5, there is a subsequence {fnk}k≥1 that converges a.e. to a function f which is also in Lp. First, suppose that p ∈ [1, ∞). Take any ε > 0, and find N such that for all m, n ≥ N, ‖fn − fm‖Lp < ε. Take any n ≥ N. Then by the a.e. version of Fatou's lemma,
∫|fn − f|^p dµ = ∫ lim_{k→∞} |fn − fnk|^p dµ ≤ lim inf_{k→∞} ∫|fn − fnk|^p dµ ≤ ε^p.
This shows that fn → f in the Lp norm. Next, suppose that p = ∞. Take any ε > 0 and find N as before. Let
E := {ω : |fn(ω) − fm(ω)| ≤ ε for all m, n ≥ N},
so that µ(Ec) = 0. Take any n ≥ N. Then for any ω such that ω ∈ E and fnk(ω) → f(ω),
|fn(ω) − f(ω)| = lim_{k→∞} |fn(ω) − fnk(ω)| ≤ ε.
This shows that ‖fn − f‖L∞ ≤ ε, completing the proof that fn → f in the L∞ norm.
The following fact about Lp spaces of probability measures is very important
in probability theory. It does not hold for general measures.
THEOREM 4.4.7 (Monotonicity of Lp norms for probability measures). Suppose that µ is a probability measure. Then for any measurable f : Ω → R∗ and any 1 ≤ p ≤ q ≤ ∞, ‖f‖Lp ≤ ‖f‖Lq.
which proves the claim. So let q also be finite. First assume that ‖f‖Lp and ‖f‖Lq are both finite. Applying Jensen's inequality with the convex function φ(x) = |x|^{q/p}, we get the desired inequality
(∫|f|^p dµ)^{q/p} ≤ ∫|f|^q dµ.
Finally, let us drop the assumption of finiteness of ‖f‖Lp and ‖f‖Lq. Take a sequence of nonnegative simple functions {gn}n≥1 increasing pointwise to |f| (which exists by Proposition 2.3.6). Since µ is a probability measure, ‖gn‖Lp and ‖gn‖Lq are both finite. Therefore ‖gn‖Lp ≤ ‖gn‖Lq for each n. We can now complete the proof by applying the monotone convergence theorem to both sides.
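This monotonicity is easy to observe for the empirical measure of a finite sample, which is itself a probability measure. The following check is our illustration, not part of the notes.

```python
import numpy as np

# For a probability measure (here: uniform on n sample points), the L^p norms are
# nondecreasing in p. This can fail for infinite measures such as Lebesgue measure on R.
rng = np.random.default_rng(3)
f = rng.standard_cauchy(size=10_000)          # any measurable f works
for p in [1, 2, 4]:
    print(p, np.mean(np.abs(f)**p)**(1 / p))  # values are nondecreasing in p
```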
E XERCISE 4.4.8. When Ω = R and µ is the Lebesgue measure, produce a
function f whose L2 norm is finite but L1 norm is infinite.
The space L2 holds a special status among all the Lp spaces, because it has a natural rendition as a Hilbert space with respect to the inner product
(f, g) := ∫ fg dµ. (4.4.1)
It is easy to verify that this is indeed an inner product on L2(Ω, F, µ), and the norm generated by this inner product is the L2 norm (after quotienting out by the equivalence relation of a.e. equality, as usual). The completeness of the L2 norm guarantees that this is indeed a Hilbert space. The Cauchy–Schwarz inequality on this Hilbert space is a special case of Hölder's inequality with p = q = 2.
CHAPTER 5
Random variables
In this chapter we will study the basic properties of random variables and re-
lated functionals.
5.1. Definition
If (Ω, F) is a measurable space, a measurable function from Ω into R or R∗ is called a random variable. More generally, if (Ω′, F′) is another measurable space, then a measurable function from Ω into Ω′ is called an Ω′-valued random variable defined on Ω. Unless otherwise mentioned, we will assume that any random variable that we are talking about is real-valued.
Let (Ω, F, P) be a probability space and let X be a random variable defined on
Ω. It is the common convention in probability theory to write P(X ∈ A) instead
of P({ω ∈ Ω : X(ω) ∈ A}). Similarly, we write {X ∈ A} to denote the set
{ω ∈ Ω : X(ω) ∈ A}. Similarly, if X and Y are two random variables, the event
{X ∈ A, Y ∈ B} is the set {ω : X(ω) ∈ A, Y (ω) ∈ B}.
Another commonly used convention is that if f : R → R is a measurable
function and X is a random variable, f (X) denotes the random variable f ◦X. The
σ-algebra generated by a random variable X is denoted by σ(X), and if {Xi }i∈I
is a collection of random variables defined on the same probability space, then
the σ-algebra σ({Xi }i∈I ) generated by the collection {Xi }i∈I is defined to be the
σ-algebra generated by the union of the sets σ(Xi ), i ∈ I.
E XERCISE 5.1.1. If X1 , . . . , Xn are random variables defined on the same
probability space, and f : Rn → R is a measurable function, show that the random
variable f (X1 , . . . , Xn ) is measurable with respect to the σ-algebra generated by
the random variables X1 , . . . , Xn .
E XERCISE 5.1.2. Let {Xn }∞ n=1 be a sequence of random variables. Show that
the random variables sup Xn , inf Xn , lim sup Xn and lim inf Xn are all measur-
able with respect to σ(X1 , X2 , . . .).
EXERCISE 5.1.3. Let {Xn}∞n=1 be a sequence of random variables defined on a probability space (Ω, F, P). For any A ∈ σ(X1, X2, . . .) and any ε > 0, show that there is some n ≥ 1 and some B ∈ σ(X1, . . . , Xn) such that P(A∆B) < ε. (Hint: Use Theorem 1.2.6.)
E XERCISE 5.1.4. Let {Xi }i∈I be a finite or countable collection of real-valued
random variables defined on a probability space (Ω, F, P). Show that any A ∈
σ({Xi }i∈I ) can be expressed as X −1 (B) for some measurable set B ⊆ RI , where
X : Ω → RI is the map X(ω) = (Xi (ω))i∈I .
E XERCISE 5.1.5. If X1 , . . . , Xn are random variables defined on the same
probability space, and X is a random variable that is measurable with respect to
σ(X1 , . . . , Xn ), then there is a measurable function f : Rn → R such that X =
f (X1 , . . . , Xn ). Hint: Start with indicator random variables.
E XERCISE 5.1.6. Extend the previous exercise to σ-algebras generated by
countably many random variables.
P ROOF. Take Ω = (0, 1), F = the restriction of B(R) to (0, 1), and P =
the restriction of Lebesgue measure to (0, 1), which is a probability measure. For
ω ∈ Ω, define X(ω) := inf{t ∈ R : F (t) ≥ ω}. Since F (t) → 1 as t → ∞
and F (t) → 0 as t → −∞, X is well-defined on Ω. Now, if ω ≤ F (t) for some
ω and t, then by the definition of X, X(ω) ≤ t. Conversely, if X(ω) ≤ t, then
there is a sequence tn ↓ t such that F (tn ) ≥ ω for each n. By the right-continuity
of F , this implies that F (t) ≥ ω. Thus, we have shown that X(ω) ≤ t if and
only if F (t) ≥ ω. By either Exercise 2.1.6 or Exercise 2.1.7, F is measurable.
Hence, the previous sentence shows that X is also measurable, and moreover that
P(X ≤ t) = P((0, F (t)]) = F (t). Thus, F is the c.d.f. of the random variable X.
Conversely, if F is the c.d.f. of a random variable, it is easy to show that F
is right-continuous and satisfies (5.2.1) by the continuity of probability measures
under increasing unions and decreasing intersections. Monotonicity of F follows
by the monotonicity of probability measures.
Because of Proposition 5.2.2, any function satisfying the three conditions stated
in the statement of the proposition is called a cumulative distribution function (or
just distribution function or c.d.f.).
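The construction in the proof above, often called inverse transform sampling, is easy to simulate. The sketch below is our illustration and not part of the notes; the Exp(1) c.d.f. is an arbitrary concrete choice of F.

```python
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(-1.0, 20.0, 20001)                 # fine grid of candidate values t
F = np.where(grid >= 0, 1.0 - np.exp(-grid), 0.0)     # Exp(1) c.d.f. on the grid

def X(omega):
    """Crude numerical version of X(omega) = inf{t : F(t) >= omega}."""
    idx = np.searchsorted(F, omega, side="left")
    return grid[min(idx, grid.size - 1)]

samples = np.array([X(u) for u in rng.uniform(size=5000)])
t = 1.3
print(np.mean(samples <= t), 1.0 - np.exp(-t))   # empirical P(X <= t) vs F(t)
```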
P ROOF. If X and Y have the same law, it is clear that they have the same c.d.f.
Conversely, let X and Y be two random variables that have the same distribution
function. Let A be the set of all Borel sets A ⊆ R such that µX (A) = µY (A).
It is easy to see that A is a λ-system. Moreover, A contains all intervals of the
form (a, b], a, b ∈ R, which is a π-system that generates the Borel σ-algebra on R.
Therefore, by Dynkin’s π-λ theorem, A ⊇ B(R), and hence µX = µY .
The above proposition shows that there is a one-to-one correspondence be-
tween cumulative distribution functions and probability measures on R. Moreover,
the following is true.
E XERCISE 5.3.2. If X and Y have the same law, and g : R → R is a measur-
able function, show that g(X) and g(Y ) also have the same law.
P ROOF. One implication is trivial. For the other, let F be the c.d.f. of X and
suppose that (5.4.1) is satisfied for every set A of the form [a, b] where a and b are
continuity points of F . We then claim that (5.4.1) holds for every [a, b], even if a
and b are not continuity points of F . This is easily established using Exercise 5.2.3
and the dominated convergence theorem. Once we know this, the result can be
completed by the π-λ theorem, observing that the set of closed intervals is a π-
system, and the set of all Borel sets A for which the identity (5.4.1) holds is a
λ-system.
The following exercise relates the p.d.f. and the c.d.f. It is a simple consequence of the above proposition.
EXERCISE 5.4.2. If f is a p.d.f. on R, show that the function
F(x) := ∫_{−∞}^{x} f(y)dy (5.4.2)
is the c.d.f. of the probability measure ν defined by f as in (5.4.1). Conversely,
if F is a c.d.f. on R for which there exists a nonnegative measurable function f
satisfying (5.4.2) for all x, show that f is a p.d.f. which generates the probability
measure corresponding to F .
The next exercise establishes that the p.d.f. is essentially unique when it exists.
E XERCISE 5.4.3. If f and g are two probability density functions for the same
probability measure on R, show that f = g a.e. with respect to Lebesgue measure.
Because of the above exercise, we will generally treat the probability density
function of a random variable X as a unique function (although it is only unique
up to almost everywhere equality), and refer to it as ‘the p.d.f.’ of X.
E XERCISE 5.5.1. Verify that the p.d.f. of the standard normal distribution is
indeed a p.d.f., that is, its integral over the real line equals 1. Then use a change
of variable to prove that the normal p.d.f. is indeed a p.d.f. for any µ and σ. (Hint:
Square the integral and pass to polar coordinates.)
E XERCISE 5.5.2. If X ∼ N (µ, σ 2 ), show that for any a, b ∈ R, aX + b ∼
N (aµ + b, a2 σ 2 ).
Another class of densities that occur quite frequently are the exponential den-
sities. The exponential distribution with rate parameter λ has p.d.f.
f (x) = λe−λx 1{x≥0} ,
where 1{x≥0} denotes the function that is 1 when x ≥ 0 and 0 otherwise. It is easy
to see that this is indeed a p.d.f., and its c.d.f. is given by
F (x) = (1 − e−λx )1{x≥0} .
If X is a random variable with this c.d.f., we write X ∼ Exp(λ).
E XERCISE 5.5.3. If X ∼ Exp(λ), show that for any a > 0, aX ∼ Exp(λ/a).
The Gamma distribution with rate parameter λ > 0 and shape parameter r > 0 has probability density function
f(x) = (λ^r x^{r−1}/Γ(r)) e^{−λx} 1{x≥0},
where Γ denotes the standard Gamma function:
Γ(r) = ∫_0^∞ x^{r−1} e^{−x} dx.
(Recall that Γ(r) = (r − 1)! if r is a positive integer.) If a random variable X has
this distribution, we write X ∼ Gamma(r, λ).
Yet another important class of distributions that have densities are uniform dis-
tributions. The uniform distribution on an interval [a, b] has the probability density
that equals 1/(b − a) in this interval and 0 outside. If a random variable X has this
distribution, we write X ∼ Unif[a, b].
exists in the Lebesgue sense, and in that case the two quantities are equal.
in the sense that both sides exist and are equal. (Of course, here f (t, X) denotes
the random variable f (t, X(ω)).)
A corollary of the above exercise is the following fact, which shows that the
expected value is a property of the law of a random variable rather than the random
variable itself.
E XERCISE 6.1.8. Prove that if two random variables X and Y have the same
law, then E(X) exists if and only if E(Y ) exists, and in this case they are equal.
where Re(z) and Im(z) denote the real and imaginary parts of a complex number z. Since |E(Z)|2 is real, the above identity shows that |E(Z)|2 = E(Re(ᾱZ)). But Re(ᾱZ) ≤ |ᾱZ| = |α||Z|. Thus, |E(Z)|2 ≤ |α|E|Z|, which completes the proof.
The above proposition shows in particular that |φX (t)| ≤ 1 for any random
variable X and any t.
EXERCISE 6.4.2. Show that the characteristic function can be written as a power series in t if (6.3.1) holds.
E XERCISE 6.4.3. Show that the characteristic function of any random vari-
able is a uniformly continuous function. (Hint: Use the dominated convergence
theorem.)
Perhaps somewhat surprisingly, the characteristic function also gives a tail
bound. This bound is not very useful, but we will see at least one fundamental
application in a later chapter.
PROPOSITION 6.4.4. Let X be a random variable with characteristic function φX. Then for any t > 0,
P(|X| ≥ t) ≤ (t/2) ∫_{−2/t}^{2/t} (1 − φX(s)) ds.
CHAPTER 7
Independence
7.1. Definition
D EFINITION 7.1.1. Let (Ω, F, P) be a probability space, and suppose that
G1 , . . . , Gn are sub-σ-algebras of F. We say that G1 , . . . , Gn are independent if
for any A1 ∈ G1, A2 ∈ G2, . . . , An ∈ Gn,
P(A1 ∩ · · · ∩ An) = ∏_{i=1}^{n} P(Ai).
More generally, an arbitrary collection {Gi }i∈I of sub-σ-algebras are called inde-
pendent if any finitely many of them are independent.
The independence of σ-algebras is used to define the independence of random
variables and events.
DEFINITION 7.1.2. A collection of events {Ai}i∈I in F are said to be independent if the σ-algebras Gi := {∅, Ai, Aci, Ω} generated by the Ai's are independent. Moreover, an event A is said to be independent of a σ-algebra G if the σ-algebras {∅, A, Ac, Ω} and G are independent.
E XERCISE 7.1.3. Show that a collection of events {Ai }i∈I are independent if
and only if for any finite J ⊆ I,
P(∩_{j∈J} Aj) = ∏_{j∈J} P(Aj).
A sequence {Xi}∞i=1 is said to be i.i.d. if the Xi's are independent and all have the same distribution.
We end this section with a sequence of important exercises about independent
random variables and events.
E XERCISE 7.1.5. Let {Xn }∞ n=1 be a sequence of random variables defined on
the same probability space. If Xn+1 is independent of σ(X1 , . . . , Xn ) for each n,
prove that the whole collection is independent.
E XERCISE 7.1.6. If X1 , . . . , Xn are independent random variables and f :
Rn → R is a measurable function, show that the law of f (X1 , . . . , Xn ) is deter-
mined by the laws of X1 , . . . , Xn .
E XERCISE 7.1.7. If {Xi }i∈I is a collection of independent random variables
and {Ai }i∈I is a collection of measurable subsets of R, show that the events {Xi ∈
Ai }, i ∈ I are independent.
E XERCISE 7.1.8. If {Fi }i∈I is a family of cumulative distribution functions,
show that there is a probability space (Ω, F, P) and independent random variables
{Xi }i∈I defined on Ω such that for each i, Fi is the c.d.f. of Xi . (Hint: Use product
spaces.)
(The above exercise allows us to define arbitrary families of independent ran-
dom variables on the same probability space. The usual convention in probability
theory is to always have a single probability space (Ω, F, P) in the background, on
which all random variables are defined. For convenience, this probability space is
usually assumed to be complete.)
E XERCISE 7.1.9. If A is a π-system of sets and B is an event such that B and
A are independent for every A ∈ A, show that B and σ(A) are independent.
E XERCISE 7.1.10. If {Xi }i∈I is a collection of independent random variables,
then show that for any disjoint subsets J, K ⊆ I, the σ-algebras generated by
{Xi }i∈J and {Xi }i∈K are independent. (Hint: Use Exercise 7.1.9.)
E XERCISE 7.1.11. Let {Xi }i∈I and {Yj }j∈J be two collections of random
variables defined on the same probability space. Suppose that for any finite F ⊆ I
and G ⊆ J, the collections {Xi }i∈F and {Yj }j∈G are independent. Then prove
that {Xi }i∈I and {Yj }j∈J are independent. (Hint: Use Exercise 7.1.9.)
Next take any nonnegative X and Y that are independent. Construct nonnegative
simple random variables Xn and Yn increasing to X and Y , using the method from
the proof of Proposition 2.3.6. From the construction, it is easy to see that each Xn
is σ(X)-measurable and each Yn is σ(Y )-measurable. Therefore Xn and Yn are
independent, and hence E(Xn Yn ) = E(Xn )E(Yn ), since we have already proved
this identity for nonnegative simple random variables. Now note that since Xn ↑ X
and Yn ↑ Y , we have Xn Yn ↑ XY . Therefore by the monotone convergence
theorem,
E(XY) = lim_{n→∞} E(Xn Yn) = lim_{n→∞} E(Xn)E(Yn) = E(X)E(Y).
Finally, take any independent X and Y . It is easy to see that X + and X − are
σ(X)-measurable, and Y + and Y − are σ(Y )-measurable. Therefore
E|XY | = E((X + + X − )(Y + + Y − ))
= E(X + Y + ) + E(X + Y − ) + E(X − Y + ) + E(X − Y − )
= E(X + )E(Y + ) + E(X + )E(Y − ) + E(X − )E(Y + ) + E(X − )E(Y − )
= E|X|E|Y |.
Since E|X| and E|Y | are both finite, this shows that XY is integrable. Repeating
the steps in the above display starting with E(XY ) instead of E|XY |, we get
E(XY ) = E(X)E(Y ).
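The identity is also easy to see in simulation. The following Monte Carlo check is our illustration; the particular distributions of X and Y are arbitrary choices.

```python
import numpy as np

# Monte Carlo illustration (not a proof) of E(XY) = E(X)E(Y) for independent X and Y.
rng = np.random.default_rng(5)
n = 10**6
X = rng.exponential(scale=2.0, size=n)        # E(X) = 2
Y = rng.normal(loc=1.0, scale=3.0, size=n)    # E(Y) = 1, generated independently of X
print(np.mean(X * Y), np.mean(X) * np.mean(Y))  # both close to E(X)E(Y) = 2
```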
The following exercise generalizes the above proposition. It is provable easily
by induction, starting with the case n = 2.
E XERCISE 7.2.2. If X1 , X2 , . . . , Xn are independent integrable random vari-
ables, show that the product X1 X2 · · · Xn is also integrable and
E(X1 X2 · · · Xn ) = E(X1 )E(X2 ) · · · E(Xn ).
Moreover, show that the identity also holds if the Xi ’s are nonnegative but not
necessarily integrable.
The following are some important consequences of the above exercise.
E XERCISE 7.2.3. If X and Y are independent integrable random variables,
show that Cov(X, Y ) = 0. In other words, independent random variables are
uncorrelated.
E XERCISE 7.2.4. Give an example of a pair of uncorrelated random variables
that are not independent.
A collection of random variables {Xi }i∈I is called pairwise independent if for
any two distinct i, j ∈ I, Xi and Xj are independent.
E XERCISE 7.2.5. Give an example of three random variables X1 , X2 , X3 that
are pairwise independent but not independent.
PROOF. Let B denote the event {An i.o.}. Then Bc is the set of all ω that belong to only finitely many of the An's. In set theoretic notation,
Bc = ∪_{n=1}^{∞} ∩_{k=n}^{∞} Akc.
Therefore
P(Bc) = lim_{n→∞} P(∩_{k=n}^{∞} Akc).
Take any 1 ≤ n ≤ m. Then by independence,
P(∩_{k=n}^{∞} Akc) ≤ P(∩_{k=n}^{m} Akc) = ∏_{k=n}^{m} (1 − P(Ak)).
The following result is often useful for proving that some random variable is actu-
ally a constant.
T HEOREM 7.4.1 (Kolmogorov’s zero-one law). If {Xn }∞ n=1 is a sequence of
independent random variables defined on a probability space (Ω, F, P), and T is
the tail σ-algebra of this sequence, then for any A ∈ T , P(A) is either 0 or 1.
P ROOF. Take any n. Since A ∈ σ(Xn+1 , Xn+2 , . . .) and the Xi ’s are indepen-
dent, it follows by Exercise 7.1.10 that the event A is independent of the σ-algebra
σ(X1 , . . . , Xn ). Let
∞
[
A := σ(X1 , . . . , Xn ).
n=1
It is easy to see that A is an algebra, and σ(A) = σ(X1 , X2 , . . .). From the first
paragraph we know that A is independent of B for every B ∈ A. Therefore by
Exercise 7.1.9, A is independent of σ(X1 , X2 , . . .). In particular, A is independent
of itself, which implies that P(A) = P(A ∩ A) = P(A)2 . This proves that P(A) is
either 0 or 1.
Another way to state Kolmogorov’s zero-one law is to say that the tail σ-
algebra of a sequence of independent random variables is ‘trivial’, in the sense
that any event in it has probability 0 or 1. Trivial σ-algebras have the following
useful property.
E XERCISE 7.4.2. If G is a trivial σ-algebra for a probability measure P, show
that any random variable X that is measurable with respect to G must be equal to
a constant almost surely.
In particular, if {Xn}∞n=1 is a sequence of independent random variables and
X is a random variable that is measurable with respect to the tail σ-algebra of this
sequence, then show that there is some constant c such that X = c a.e. Some
important consequences are the following.
E XERCISE 7.4.3. If {Xn }∞n=1 is a sequence of independent random variables,
show that the random variables lim sup Xn and lim inf Xn are equal to constants
almost surely.
P ROOF. Clearly Γ satisfies property (ii) in Theorem 7.5.1 for any chosen enu-
meration of the edges. Property (i) is satisfied by the hypothesis of the corol-
lary.
The following exercise demonstrates an important application of Corollary
7.5.2.
E XERCISE 7.5.3. Let {Xe }e∈E be a collection of i.i.d. Ber(p) random vari-
ables, for some p ∈ [0, 1]. Define a random subgraph of Zd by keeping an edge e
if Xe = 1 and deleting it if Xe = 0. Let N be the number of infinite connected
components of this random graph. (These are known as infinite percolation clus-
ters.) Exercise 3.3.2 shows that N is a random variable. Using Theorem 7.5.1 and
the above discussion, prove that N equals a constant in {0, 1, 2, . . .} ∪ {∞} almost
surely.
Another consequence of Theorem 7.5.1 is the following result.
P ROOF. Here Γ is the set of all permutations of the positive integers that fix
all but finitely many indices. Then clearly Γ satisfies the hypotheses of Theorem
7.5.1.
A typical application of the Hewitt–Savage zero-one law is the following.
E XERCISE 7.5.5. Suppose that we have m boxes, and an infinite sequence of
balls are dropped into the boxes independently and uniformly at random. Set this
up as a problem in measure-theoretic probability, and prove that with probability
one, each box has the maximum number of balls among all boxes infinitely often.
7.7. Convolutions
Given two probability measures µ1 and µ2 on R, their convolution µ1 ∗ µ2 is
defined to be the push-forward of µ1 × µ2 under the addition map from R2 to R.
That is, if φ(x, y) := x + y, then for any A ∈ B(R),
µ1 ∗ µ2 (A) := µ1 × µ2 (φ−1 (A)).
Exercise 7.6.2 shows that in the language of random variables, the convolution of
two probability measures has the following description: If X and Y are indepen-
dent random variables with laws µ1 and µ2 , then µ1 ∗ µ2 is the law of X + Y .
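For instance, the convolution of two copies of the Uniform[0, 1] distribution is the triangular distribution on [0, 2]. The sketch below (our illustration, not part of the notes) compares the empirical c.d.f. of X + Y against the exact convolution c.d.f.

```python
import numpy as np

# If X, Y ~ Uniform[0,1] are independent, then X + Y has the triangular density
# f(z) = z on [0,1] and f(z) = 2 - z on [1,2].
rng = np.random.default_rng(6)
n = 10**6
Z = rng.uniform(size=n) + rng.uniform(size=n)

def triangular_cdf(z):
    z = np.clip(z, 0.0, 2.0)
    return np.where(z <= 1.0, z**2 / 2.0, 1.0 - (2.0 - z)**2 / 2.0)

for z in [0.5, 1.0, 1.5]:
    print(z, np.mean(Z <= z), triangular_cdf(z))   # empirical vs exact c.d.f.
```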
PROOF. It is easy to show from the given condition that |X| ≤ c a.s. Take any ε > 0. Then
E|Xn − X|^p ≤ E(|Xn − X|^p; |Xn − X| > ε) + ε^p ≤ (2c)^p P(|Xn − X| > ε) + ε^p.
Sending n → ∞, we get lim sup E|Xn − X|^p ≤ ε^p. Since ε is arbitrary, this completes the proof.
Interestingly, there is no direct connection between convergence in Lp and
almost sure convergence.
E XERCISE 8.2.10. Take any p > 0. Give counterexamples to show that almost
sure convergence does not imply Lp convergence, and Lp convergence does not
imply almost sure convergence.
PROOF. Take any ε > 0, and find K such that E(|Xn|; |Xn| > K) ≤ ε for all n. This implies, in particular, that
E|Xn| ≤ E(|Xn|; |Xn| > K) + E(|Xn|; |Xn| ≤ K) ≤ ε + K.
P ROOF. Suppose that {Xn }n≥1 is uniformly integrable. First, choose a > 0
such that
sup E(|Xn |; |Xn | > a) ≤ 1.
n≥1
Then for any n,
E|Xn | = E(|Xn |; |Xn | ≤ a) + E(|Xn |; |Xn | > a) ≤ a + 1,
which shows that supn≥1 E|Xn | < ∞. Next, for any a > 0, any event A, and
any n,
E(|Xn |; A) = E(|Xn |; A ∩ {|Xn | ≤ a}) + E(|Xn |; A ∩ {|Xn | > a})
≤ aP(A) + E(|Xn |; |Xn | > a).
By uniform integrability, the right side can be made uniformly small by choosing
a large enough and P(A) small enough.
Conversely, suppose that the alternative criterion holds. Then for any a > 0,
aP(|Xn | > a) ≤ E(|Xn |; |Xn | > a) ≤ sup E|Xn | < ∞.
n≥1
Thus, P(|Xn | > a) can be made uniformly small by choosing a large enough.
Thus, E(|Xn |; |Xn | > a) can be made uniformly small by choosing a large enough,
which proves uniform integrability.
An interesting corollary of the above result is the following. We will have
occasion to use it later.
COROLLARY 8.3.4. For any integrable random variable X, and for any ε > 0, there is some δ > 0 such that E(|X|; A) < ε whenever P(A) < δ.
THEOREM 8.5.1 (Etemadi's strong law of large numbers). Let {Xn}n≥1 be a sequence of pairwise independent and identically distributed random variables, with E|X1| < ∞. Then n^{−1} ∑_{i=1}^{n} Xi converges almost surely to E(X1) as n tends to ∞.
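Before the proof, here is a small simulation illustrating the theorem along a single sample path; the choice of the Exp(1) distribution (so that E(X1) = 1) is ours and purely illustrative.

```python
import numpy as np

# Running averages of i.i.d. (hence pairwise independent) Exp(1) variables
# approach E(X_1) = 1 along a single sample path.
rng = np.random.default_rng(7)
X = rng.exponential(scale=1.0, size=10**6)
running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
for n in [10, 10**3, 10**5, 10**6]:
    print(n, running_mean[n - 1])
```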
P ROOF. Splitting each Xi into its positive and negative parts, we see that it
suffices to prove the theorem for nonnegative random variables. So assume that
the Xi ’s are nonnegative random variables.
The next step is to truncate the Xi's to produce random variables that are more well-behaved with respect to variance computations. Define Yi := Xi 1{Xi<i}. We claim that it suffices to show that n^{−1} ∑_{i=1}^{n} Yi → µ a.s., where µ := E(X1). To see why this suffices, notice that by Exercise 6.1.9 and the fact that the Xi's are identically distributed,
∑_{i=1}^{∞} P(Xi ≠ Yi) ≤ ∑_{i=1}^{∞} P(Xi ≥ i) = ∑_{i=1}^{∞} P(X1 ≥ i) ≤ E(X1) < ∞.
Take any α > 1. Let kn := [α^n], where [x] denotes the integer part of a real number x. The penultimate step in Etemadi's proof is to show that for any choice of α > 1, Zkn → 0 a.s.
To show this, take any ε > 0. Recall that the Xi's are pairwise independent. Therefore so are the Yi's, and hence Cov(Yi, Yj) = 0 for any i ≠ j. Thus for any n, by Theorem 8.4.1,
P(|Zkn| > ε) ≤ (1/(ε^2 kn^2)) ∑_{i=1}^{kn} Var(Yi).
Therefore
∑_{n=1}^{∞} P(|Zkn| > ε) ≤ ∑_{n=1}^{∞} (1/(ε^2 kn^2)) ∑_{i=1}^{kn} Var(Yi) = (1/ε^2) ∑_{i=1}^{∞} Var(Yi) ∑_{n : kn ≥ i} (1/kn^2).
It is easy to see that there is some β > 1, depending only on α, such that kn+1/kn ≥ β for all n large enough. Therefore, for any i,
∑_{n : kn ≥ i} (1/kn^2) ≤ (1/i^2) ∑_{n=0}^{∞} β^{−n} ≤ C/i^2,
where C depends only on α.
EXERCISE 8.5.7 (SLLN under bounded fourth moment). Let {Xn}∞n=1 be a sequence of independent random variables with mean zero and uniformly bounded fourth moment. Prove that n^{−1} ∑_{i=1}^{n} Xi → 0 a.s. (Hint: Use a fourth moment version of Chebychev's inequality.)
E XERCISE 8.5.8 (Random matrices). Let {Xij }1≤i≤j<∞ be a collection of
i.i.d. random variables with mean zero and all moments finite. Let Xji := Xij
if j > i. Let Wn be the n × n symmetric random matrix whose (i, j)th entry is
n−1/2 Xij . A matrix like Wn is called a Wigner matrix. Let λn,1 ≥ · · · ≥ λn,n be
the eigenvalues of Wn , repeated by multiplicities. For any integer k ≥ 1, show that
(1/n) ∑_{i=1}^{n} λ_{n,i}^k − E[(1/n) ∑_{i=1}^{n} λ_{n,i}^k] → 0 a.s. as n → ∞.
(Hint: Express the sum as the trace of a power of Wn , and use Theorem 8.4.1. The
tail bound is strong enough to prove almost sure convergence.)
E XERCISE 8.5.9. If all the random graphs in Exercise 8.4.5 are defined on the
same probability space, show that the convergence is almost sure.
PROOF. First, suppose that Ef(Xn) → Ef(X) for every bounded continuous function f. Take any continuity point t of FX. Take any ε > 0. Let f be the function that is 1 below t, 0 above t + ε, and goes down linearly from 1 to 0 in the interval [t, t + ε]. Then f is a bounded continuous function, and so
lim sup_{n→∞} FXn(t) ≤ lim sup_{n→∞} Ef(Xn) = Ef(X) ≤ FX(t + ε).
Since this is true for all ε > 0 and F_X is right-continuous, this gives
\limsup_{n→∞} F_{X_n}(t) ≤ F_X(t).
A similar argument shows that for any ε > 0,
\liminf_{n→∞} F_{X_n}(t) ≥ F_X(t − ε).
Since t is a continuity point of F_X, this proves that \liminf_{n→∞} F_{X_n}(t) ≥ F_X(t). Thus,
Xn → X in distribution.
Conversely, suppose that X_n → X in distribution. Let f be a bounded continuous function. Take any ε > 0. By Exercise 8.6.2 there exists K such that P(|X_n| ≥ K) ≤ ε for all n. Choose K so large that we also have P(|X| ≥ K) ≤ ε. Let M be a number such that |f(x)| ≤ M for all x.
Since f is uniformly continuous on [−K, K], there is some δ > 0 such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ and x, y ∈ [−K, K]. By Exercise 5.2.3, we may assume that −K and K are continuity points of F_X, and we can pick out a set of points x_1 ≤ x_2 ≤ · · · ≤ x_m ∈ [−K, K] such that each x_i is a continuity point of F_X, x_1 = −K, x_m = K, and x_{i+1} − x_i ≤ δ for each i. Now note that
Ef(X_n) = E(f(X_n); X_n > K) + E(f(X_n); X_n ≤ −K) + \sum_{i=1}^{m−1} E(f(X_n); x_i < X_n ≤ x_{i+1}),
which implies that
\Big| Ef(X_n) − \sum_{i=1}^{m−1} f(x_i) P(x_i < X_n ≤ x_{i+1}) \Big| ≤ Mε + \sum_{i=1}^{m−1} E(|f(X_n) − f(x_i)|; x_i < X_n ≤ x_{i+1}) ≤ Mε + ε \sum_{i=1}^{m−1} P(x_i < X_n ≤ x_{i+1}) ≤ (M + 1)ε.
A similar argument gives
\Big| Ef(X) − \sum_{i=1}^{m−1} f(x_i) P(x_i < X ≤ x_{i+1}) \Big| ≤ (M + 1)ε.
Since x_1, . . . , x_m are continuity points of F_X and X_n → X in distribution,
\lim_{n→∞} P(x_i < X_n ≤ x_{i+1}) = \lim_{n→∞} (F_{X_n}(x_{i+1}) − F_{X_n}(x_i)) = F_X(x_{i+1}) − F_X(x_i) = P(x_i < X ≤ x_{i+1})
for each i. Combining the above observations, we get
\limsup_{n→∞} |Ef(X_n) − Ef(X)| ≤ 2(M + 1)ε.
Since ε was arbitrary, this completes the proof.
P ROOF. If two random variables have the same law, then they obviously have
the same characteristic function. Conversely, suppose that X and Y have the same
characteristic function. Then by Theorem 8.8.1, E(g(X)) = E(g(Y )) for every
bounded continuous g. Therefore by Proposition 8.7.1, X and Y have the same
law.
E XERCISE 8.8.3. Let φ be the characteristic function of a random variable X.
If |φ(t)| = 1 for all t, show that X is a degenerate random variable — that is,
there is a constant c such that P(X = c) = 1. (Hint: Start by showing that |φ(t)|2
is the characteristic function of X − X 0 , where X 0 has the same law as X and is
independent of X. Then apply Corollary 8.8.2.)
E XERCISE 8.8.4. Let X be a non-degenerate random variable. If aX and bX
have the same distribution for some positive a and b, show that a = b. (Hint: Use
the previous exercise.)
E XERCISE 8.8.5. Let X be any random variable. If X + a and X + b have the
same distribution for some a, b ∈ R, show that a = b.
E XERCISE 8.8.6. Suppose that a random variable X has the same distribution
as the sum of n i.i.d. copies of X, for some n ≥ 2. Prove that X = 0 with
probability one. (Hint: Prove that the characteristic function takes value in a finite
set and must therefore be equal to 1 everywhere.)
Another important corollary of Theorem 8.8.1 is the following simplified in-
version formula.
C OROLLARY 8.8.7. Let X be a random variable with characteristic function φ. Suppose that
\int_{−∞}^{∞} |φ(t)| dt < ∞.
Then X has a probability density function f, given by
f(x) = \frac{1}{2π} \int_{−∞}^{∞} e^{−itx} φ(t) dt.
P ROOF. Recall from the proof of Theorem 8.8.1 that fθ is the p.d.f. of X + Zθ ,
where Zθ ∼ N (0, 2θ). If φ is integrable, then it is easy to see by the dominated
convergence theorem that for every x,
f(x) = \lim_{θ→0} f_θ(x).
Therefore if a and b are continuity points of the c.d.f. of X, then Slutsky’s theorem
implies that
P(a ≤ X ≤ b) = \int_{a}^{b} f(x) dx.
By Proposition 5.4.1, this completes the proof.
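As a quick numerical sanity check of Corollary 8.8.7 (an illustrative sketch, not part of the text), one can take the standard normal characteristic function φ(t) = e^{−t²/2}, which is integrable, and evaluate the inversion integral by quadrature; the result matches the standard normal density. The quadrature limits below are arbitrary choices.

```python
import numpy as np

def density_from_cf(x, phi, t_max=40.0, n_pts=20001):
    """Approximate f(x) = (1/2*pi) * integral of e^{-itx} phi(t) dt by the trapezoidal rule."""
    t = np.linspace(-t_max, t_max, n_pts)
    integrand = np.exp(-1j * t * x) * phi(t)
    return np.trapz(integrand, t).real / (2 * np.pi)

phi_normal = lambda t: np.exp(-t ** 2 / 2)
for x in [0.0, 1.0, 2.0]:
    approx = density_from_cf(x, phi_normal)
    exact = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
    print(x, approx, exact)   # the two columns agree to high accuracy
```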
For integer-valued random variables, a different inversion formula is often use-
ful.
T HEOREM 8.8.8. Let X be an integer-valued random variable with character-
istic function φ. Then for any x ∈ Z,
P(X = x) = \frac{1}{2π} \int_{−π}^{π} e^{−itx} φ(t) dt.
Thus, P(|X_n| ≥ t) ≤ 2ε for all large enough n. This allows us to choose K large enough such that P(|X_n| ≥ K) ≤ 2ε for all n. Since ε is arbitrary, this proves that {X_n}_{n≥1} is a tight family.
Suppose that X_n does not converge in distribution to X. Then there is a bounded continuous function f such that Ef(X_n) does not converge to Ef(X). Passing to a subsequence if necessary, we may assume that there is some ε > 0 such that |Ef(X_n) − Ef(X)| ≥ ε for all n. By tightness, there exists some subsequence {X_{n_k}}_{k≥1} that converges in distribution to a limit Y. Then Ef(X_{n_k}) → Ef(Y) and hence |Ef(Y) − Ef(X)| ≥ ε. But by the first part of this theorem and the hypothesis that φ_{X_n} → φ_X pointwise, we have that φ_Y = φ_X everywhere. Therefore Y and X must have the same law by Corollary 8.8.2, and we get a contradiction by the second assertion of Proposition 8.7.1 applied to this X and Y.
E XERCISE 8.9.2. If a sequence of characteristic functions {φn }n≥1 converges
pointwise to a characteristic function φ, prove that the convergence is uniform on
any bounded interval. (Hint: It suffices to show that if tn → t, then φn (tn ) → φ(t).
Deduce this from Lévy’s continuity theorem.)
E XERCISE 8.9.3. If a sequence of characteristic functions {φn }n≥1 converges
pointwise to some function φ, and φ is continuous at zero, prove that φ is also a
characteristic function. (Hint: Prove tightness and proceed from there.)
T HEOREM 8.10.1 (CLT for i.i.d. sums). Let X1 , X2 , . . . be i.i.d. random vari-
ables with mean µ and variance σ 2 . Then the random variable
\frac{\sum_{i=1}^{n} X_i − nµ}{\sqrt{n}\, σ}
converges weakly to the standard Gaussian distribution as n → ∞.
We need a few simple lemmas to prepare for the proof. The main argument is
based on Lévy’s continuity theorem.
P ROOF. This follows easily from Taylor expansion, noting that all derivatives
of the map x 7→ eix are uniformly bounded by 1 in magnitude.
\Big| e^{ix} − 1 − ix + \frac{x^2}{2} \Big| ≤ \frac{|x|^3}{6}.
On the other hand, by Lemma 8.10.2 we also have
\Big| e^{ix} − 1 − ix + \frac{x^2}{2} \Big| ≤ |e^{ix} − 1 − ix| + \frac{x^2}{2} ≤ \frac{x^2}{2} + \frac{x^2}{2} = x^2.
The proof is completed by combining the two bounds.
P ROOF. Writing the difference of the products as a telescoping sum and ap-
plying the triangle inequality, we get
\Big| \prod_{i=1}^{n} a_i − \prod_{i=1}^{n} b_i \Big| ≤ \sum_{i=1}^{n} |a_1 \cdots a_{i−1} b_i \cdots b_n − a_1 \cdots a_i b_{i+1} \cdots b_n| = \sum_{i=1}^{n} |a_1 \cdots a_{i−1} (b_i − a_i) b_{i+1} \cdots b_n| ≤ \sum_{i=1}^{n} |a_i − b_i|,
where the last inequality holds because |ai | and |bi | are ≤ 1 for each i.
We are now ready to prove Theorem 8.10.1.
P ROOF OF T HEOREM 8.10.1. Replacing X_i by (X_i − µ)/σ, let us assume without loss of generality that µ = 0 and σ = 1. Let S_n := n^{−1/2} \sum_{i=1}^{n} X_i. Take any t ∈ R. By Lévy's continuity theorem and the formula for the characteristic function of the standard normal distribution (Proposition 6.5.1), it is sufficient to show that φ_{S_n}(t) → e^{−t^2/2} as n → ∞, where φ_{S_n} is the characteristic function of S_n. By the i.i.d. nature of the summands,
φ_{S_n}(t) = \prod_{i=1}^{n} φ_{X_i}(t/\sqrt{n}) = (φ_{X_1}(t/\sqrt{n}))^n.
Therefore by Lemma 8.10.4, when n is so large that t^2 ≤ 2n,
\Big| φ_{S_n}(t) − \Big(1 − \frac{t^2}{2n}\Big)^n \Big| ≤ n \Big| φ_{X_1}(t/\sqrt{n}) − \Big(1 − \frac{t^2}{2n}\Big) \Big|.
Thus, we only need to show that the right side tends to zero as n → ∞. To prove this, note that by Corollary 8.10.3,
n \Big| φ_{X_1}(t/\sqrt{n}) − 1 + \frac{t^2}{2n} \Big| = n \Big| E\Big( e^{itX_1/\sqrt{n}} − 1 − \frac{itX_1}{\sqrt{n}} + \frac{t^2 X_1^2}{2n} \Big) \Big| ≤ E \min\Big\{ t^2 X_1^2, \frac{|t|^3 |X_1|^3}{6\sqrt{n}} \Big\}.
By the finiteness of E(X_1^2) and the dominated convergence theorem, the above expectation tends to zero as n → ∞.
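Theorem 8.10.1 is easy to visualize by simulation. The sketch below (illustrative only; the Exponential(1) distribution, sample size, and repetition count are arbitrary choices) standardizes sums of i.i.d. variables and compares a few values of the empirical distribution function with the standard Gaussian one.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(1)
n, reps = 500, 20000
x = rng.exponential(1.0, size=(reps, n))          # i.i.d. with mean 1, variance 1
z = (x.sum(axis=1) - n * 1.0) / np.sqrt(n * 1.0)  # (sum - n*mu) / (sqrt(n)*sigma)

def std_normal_cdf(t):
    return 0.5 * (1 + erf(t / np.sqrt(2)))

for t in [-1.0, 0.0, 1.0, 2.0]:
    print(t, np.mean(z <= t), std_normal_cdf(t))  # empirical vs Gaussian CDF
```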
E XERCISE 8.10.6. Let Xn ∼ Bin(n, p). Prove a central limit theorem for
Xn . (Hint: Use Exercise 7.7.4 and the CLT for i.i.d. random variables.)
E XERCISE 8.10.7. Let Xn ∼ Gamma(n, λ). Prove a central limit theorem
for Xn . (Hint: Use Exercise 7.7.5 and the CLT for i.i.d. random variables.)
E XERCISE 8.10.8. Suppose that X_n ∼ Gamma(n, λ_n), where {λ_n}_{n=1}^{∞} is any sequence of positive constants. Prove a CLT for X_n.
Just like the strong law of large numbers, one can prove a central limit theorem
for a stationary m-dependent sequence of random variables.
T HEOREM 8.10.9 (CLT for stationary m-dependent sequences). Suppose that
X1 , X2 , . . . is a stationary m-dependent sequence of random variables with mean
µ and finite second moment. Let
σ^2 := Var(X_1) + 2 \sum_{i=2}^{m+1} Cov(X_1, X_i).
Then the random variable
\frac{\sum_{i=1}^{n} X_i − nµ}{\sqrt{n}\, σ}
converges weakly to the standard Gaussian distribution as n → ∞.
P ROOF. The proof of this result uses the technique of 'big blocks and little blocks', which is also useful for other things. Without loss of generality, assume that µ = 0. Take any r ≥ m. Divide up the set of positive integers into 'big blocks' of size r, with intermediate 'little blocks' of size m. For example, the first big block is {1, . . . , r}, which is followed by the little block {r + 1, . . . , r + m}. Enumerate the big blocks as B_1, B_2, . . . and the little blocks as L_1, L_2, . . .. Let
Y_j := \sum_{i∈B_j} X_i,   Z_j := \sum_{i∈L_j} X_i.
then {R_n − R_n'}_{n≥1} is a tight family of random variables. Therefore by Exercise 8.6.3, (R_n − R_n')/\sqrt{n} → 0 in probability. But again by the CLT for i.i.d. variables,
R_n'/\sqrt{n} →^d N(0, τ_r^2),
where
τ_r^2 = \frac{Var(Z_1)}{r + m}.
Therefore by Slutsky's theorem, R_n/\sqrt{n} also has the same weak limit.
Now let φ_n be the characteristic function of S_n/\sqrt{n} and ψ_{n,r} be the characteristic function of T_n/\sqrt{n}. Then for any t,
|φ_n(t) − ψ_{n,r}(t)| = |E(e^{itS_n/\sqrt{n}} − e^{itT_n/\sqrt{n}})| ≤ E|e^{itR_n/\sqrt{n}} − 1|.
Letting n → ∞ on both sides and using the observations made above, we get
\limsup_{n→∞} |φ_n(t) − e^{−t^2 σ_r^2/2}| ≤ E|e^{itξ_r} − 1|,
s_n^2 := \sum_{i=1}^{k_n} σ_{n,i}^2.
Suppose that for any ε > 0,
\lim_{n→∞} \frac{1}{s_n^2} \sum_{i=1}^{k_n} E((X_{n,i} − µ_{n,i})^2; |X_{n,i} − µ_{n,i}| ≥ ε s_n) = 0.  (8.11.1)
P ROOF OF T HEOREM 8.11.1. Replacing X_{n,i} by (X_{n,i} − µ_{n,i})/s_n, let us assume without loss of generality that µ_{n,i} = 0 and s_n = 1 for each n and i. Then the condition (8.11.1) becomes
\lim_{n→∞} \sum_{i=1}^{k_n} E(X_{n,i}^2; |X_{n,i}| ≥ ε) = 0.  (8.11.2)
Note that for any ε > 0 and any n,
\max_{1≤i≤k_n} σ_{n,i}^2 = \max_{1≤i≤k_n} \big( E(X_{n,i}^2; |X_{n,i}| < ε) + E(X_{n,i}^2; |X_{n,i}| ≥ ε) \big) ≤ ε^2 + \max_{1≤i≤k_n} E(X_{n,i}^2; |X_{n,i}| ≥ ε).
Therefore by (8.11.2),
\limsup_{n→∞} \max_{1≤i≤k_n} σ_{n,i}^2 ≤ ε^2.
Since ε is arbitrary, this shows that
\lim_{n→∞} \max_{1≤i≤k_n} σ_{n,i}^2 = 0.  (8.11.3)
Let S_n := \sum_{i=1}^{k_n} X_{n,i}. Then by Exercise 7.7.7,
φ_{S_n}(t) = \prod_{i=1}^{k_n} φ_{X_{n,i}}(t).
By Corollary 8.10.3,
\Big| φ_{X_{n,i}}(t) − 1 + \frac{t^2 σ_{n,i}^2}{2} \Big| = \Big| E\Big( e^{itX_{n,i}} − 1 − itX_{n,i} + \frac{t^2 X_{n,i}^2}{2} \Big) \Big| ≤ E \min\Big\{ t^2 X_{n,i}^2, \frac{|t|^3 |X_{n,i}|^3}{6} \Big\}.
By the central limit theorem, the standard normal distribution is a stable law.
The goal of this section is to characterize the set of all stable laws. We need the
following lemma.
L EMMA 8.12.2. Suppose that X_n →^d X, where X is non-degenerate. Let α_n > 0 and β_n ∈ R be such that α_n X_n + β_n →^d Y for some non-degenerate Y. Then there exist α > 0 and β ∈ R such that α_n → α, β_n → β, and Y =^d αX + β.
P ROOF. If Y has the stated property, then it is obvious that Y is stable. Conversely, suppose that Y is stable. Let X_n, a_n and b_n be as in Definition 8.12.1. Let
Z_n := \frac{X_1 + \cdots + X_n}{a_n} − b_n.
Take any k ≥ 1. For 1 ≤ j ≤ k, let
Z_n^{(j)} := \frac{X_{n(j−1)+1} + \cdots + X_{nj}}{a_n} − b_n,
so that
Z_{nk} = \frac{a_n}{a_{nk}} \sum_{j=1}^{k} Z_n^{(j)} + \frac{k a_n}{a_{nk}} b_n − b_{nk}.
The next result shows that the constant An in Theorem 8.12.3 must necessarily
have a specific form.
We need the following lemma for the proof of Theorem 8.12.4. A random variable X is called 'symmetric' if X =^d −X.
Thus, by symmetry,
P(M > t) = \frac{1}{2} P(|M| > t) = \frac{1}{2} P\Big( \max_{1≤i≤n} |X_i| > t \Big).
By the inequality 1−x ≤ e−x , this proves the lemma when t is a point of continuity
of F . If t is not a point of continuity, then the result can be obtained by taking a
sequence of continuity points decreasing to t.
Thus,
E(Z^2) = \int_{0}^{∞} 2x P(|Z| ≥ x) dx ≤ t^2 + 4Bt^{α} \int_{t}^{∞} x^{1−α} dx,
which is finite since α > 2. But then, by the central limit theorem, (Z_1 + \cdots + Z_n)/n^{1/α} cannot converge in distribution as n → ∞. This gives a contradiction which proves that α ≤ 2.
E XERCISE 8.12.6. Show that if X is a symmetric stable random variable, it must have characteristic function φ(t) = e^{−θ|t|^{α}} for some θ > 0 and α ∈ (0, 2].
CHAPTER 9
Conditional expectation and martingales
In this chapter we will learn about the measure theoretic definition of condi-
tional expectation, about martingales and their properties, and some applications.
P ROOF. It is easy to see from the definition that aE(X1 |G) + bE(X2 |G) is
a valid candidate for E(aX1 + bX2 |G), and therefore the result follows by the
uniqueness of conditional expectation.
L EMMA 9.1.5. If X is a nonnegative simple random variable, then E(X|G) is
also a nonnegative random variable.
assertion of the dominated convergence theorem, we get that E(Xn |G) → E(X|G)
in L1 .
The above properties of conditional expectation are all analogues of properties
of unconditional expectation. Conditional expectation has a few properties that
have no unconditional analogue. One example is the tower property, which says
that if we have two σ-algebras G 0 ⊂ G, then
E(X|G 0 ) = E(E(X|G)|G 0 ) a.s.
To prove this, take any B ∈ G 0 . Then B ∈ G, and hence
E(E(X|G); B) = E(X; B) = E(E(X|G 0 ); B).
Thus, E(X|G 0 ) satisfies the defining property of the conditional expectation of
E(X|G) given G 0 , which proves the claim.
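On a finite probability space, conditional expectations are just averages over partition blocks, so the tower property can be verified by direct computation. The following sketch (an illustrative toy example, not from the text) uses three fair coin flips, with G generated by the first two flips and G' by the first flip alone, and checks that E(E(X|G)|G') = E(X|G').

```python
import itertools

# Sample space: three fair coin flips; X depends on all three flips.
omega = list(itertools.product([0, 1], repeat=3))
prob = {w: 1 / 8 for w in omega}
X = {w: (w[0] + w[1] + w[2]) ** 2 for w in omega}

def cond_exp(f, key):
    """Conditional expectation of f given the partition induced by `key`:
    two outcomes lie in the same block iff key(w) agrees."""
    blocks = {}
    for w in omega:
        blocks.setdefault(key(w), []).append(w)
    out = {}
    for b in blocks.values():
        p = sum(prob[w] for w in b)
        val = sum(prob[w] * f[w] for w in b) / p
        for w in b:
            out[w] = val
    return out

G = lambda w: w[:2]    # G  = sigma(first two flips)
Gp = lambda w: w[:1]   # G' = sigma(first flip), a sub-sigma-algebra of G

lhs = cond_exp(cond_exp(X, G), Gp)   # E(E(X|G) | G')
rhs = cond_exp(X, Gp)                # E(X | G')
print(all(abs(lhs[w] - rhs[w]) < 1e-12 for w in omega))  # True
```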
A second example is the property that if Y is G-measurable and XY is inte-
grable (in addition to X being integrable), then
E(XY |G) = Y E(X|G) a.s. (9.2.1)
To prove this, let us first prove the weaker statement
E(XY ) = E(Y E(X|G)). (9.2.2)
Note that this follows from the definition of conditional expectation if Y = 1B
for some B ∈ G. Therefore it holds for any simple G-measurable Y . Next, if
X and Y are both nonnegative, we can take a sequence of nonnegative, simple, G-
measurable random variables increasing to Y and apply the monotone convergence
theorem on both sides to get (9.2.2). In particular, this shows that if X and Y are
nonnegative and XY is integrable, then so is Y E(X|G).
Finally, in the general case, note that the integrability of XY implies that
X + Y + , X + Y − , X − Y + and X − Y − are all integrable. By linearity, this gives
us (9.2.2) for any X and Y such that X and XY are integrable, and Y is G-
measurable.
Next, to get (9.2.1), replace Y by Y 1B in (9.2.2), where B ∈ G. Since the
integrability of XY implies that of XY 1B , we get
E(XY ; B) = E(Y E(X|G); B).
Since this holds for any B ∈ G, we get (9.2.1).
The following result is another basic property of conditional expectation that
is often useful.
P ROPOSITION 9.2.1. Let S and T be two measurable spaces. Let X be an
S-valued random variable and Y be a T -valued random variable, defined on the
same probability space, such that X and Y are independent. Let φ : S ×T → R be
a measurable function (with respect to the product σ-algebra) such that φ(X, Y )
is an integrable random variable. For each x ∈ S, define
ψ(x) := E(φ(x, Y )).
Then ψ is a measurable function, ψ(X) is an integrable random variable and
ψ(X) = E(φ(X, Y )|X) a.s.
9.2. BASIC PROPERTIES OF CONDITIONAL EXPECTATION 107
P ROOF. Take any A ∈ σ(X). Then A = X −1 (B) for some B in the σ-algebra
of S. Let µ be the law of X and ν be the law of Y . Then by Exercise 6.1.2, the
integrability of φ(X, Y ), and Fubini’s theorem,
E(φ(X, Y); A) = \int_S \int_T φ(x, y) 1_B(x) dν(y) dµ(x) = \int_B \int_T φ(x, y) dν(y) dµ(x) = \int_B ψ(x) dµ(x) = \int_S ψ(x) 1_B(x) dµ(x) = E(ψ(X); A).
This shows that ψ(X) is indeed a version of E(φ(X, Y)|X). The measurability of ψ and the integrability of ψ(X) are also consequences of Fubini's theorem.
In the following exercises, (Ω, F, P) is a probability space on which all our
random variables are defined, and G is an arbitrary sub-σ-algebra of F.
E XERCISE 9.2.2. If Xn → X in L1 , prove that E(Xn |G) → E(X|G).
E XERCISE 9.2.3. On the unit interval with Lebesgue measure, let X(ω) = ω
and for an arbitrary positive integer n let Y (ω) = nω − [nω], where [x] denotes
the largest integer ≤ x. What is E(X|Y )? Explain your answer.
E XERCISE 9.2.5. Suppose X and Y are square integrable. Show that for every
sub-σ-algebra G ⊂ F, E[XE(Y |G)] = E[E(X|G)Y ].
E XERCISE 9.2.8. Let S be a random variable with P(S > t) = exp(−t) for
t > 0. Find E(S| max{S, t}) and E(S| min{S, t}) for each t > 0.
E XERCISE 9.2.10. Suppose that E(Xn ) exists for all n, E(X) exists and X1 ≤
X2 ≤ · · · → X with probability one. Show that E(Xn |G) → E(X|G) a.e. on
{E(X1 |G) > −∞} as n → ∞. Hint: Consider B = {E(X1 |G) ≥ −b} and the
random variables Xn 1B .
9.4. Martingales
Let (Ω, F, P) be a probability space. An increasing sequence of σ-algebras
F0 ⊆ F1 ⊆ F2 ⊆ · · · contained in F is called a filtration. A sequence of ran-
dom variables {Xn }n≥0 is said to be adapted to this filtration if for each n, Xn is
Fn -measurable. An adapted sequence is called a martingale if for each n, Xn is
integrable, and
E(Xn+1 |Fn ) = Xn a.s.
Note that by the tower property, E(Xn |Fm ) = Xm a.s. whenever n ≥ m. Also
note that E(Xn ) = E(X0 ) for all n.
E XAMPLE 9.4.1 (Simple random walk). Perhaps the simplest example of a
nontrivial martingale is a random walk with mean zero increments. Let X1 , X2 , . . .
be independent, integrable random variables with E(Xi ) = 0 for all i. Let S0 := 0
and Sn := X1 + · · · + Xn . Let Fn be the σ-algebra generated by X1 , . . . , Xn
for n ≥ 1, and let F0 be the trivial σ-algebra. Then it is easy to see that {Sn }n≥0
is a martingale adapted to the filtration {F_n}_{n≥0}. To see this, first note that since E|X_i| < ∞ for each i, the triangle inequality shows that S_n is integrable for each n. It is obvious that S_n is F_n-measurable. Next, by linearity of conditional expectation, and the fact that S_{n−1} is F_{n−1}-measurable, we have
E(S_n|F_{n−1}) = E(S_{n−1} + X_n|F_{n−1}) = S_{n−1} + E(X_n|F_{n−1}) = S_{n−1} a.s.,
where the last equality holds because X_n is independent of F_{n−1} and E(X_n) = 0.
E XERCISE 9.4.4. Let (Ω, F, P) be a probability space, and let {Fn }n≥0 be
a filtration of sub-σ-algebras of F. Let X be an integrable random variable de-
fined on Ω. Let Yn := E(X|Fn ). Show that {Yn }n≥0 is a martingale adapted to
{Fn }n≥0 .
Although most stopping times that occur in practice are unbounded, there is a simple trick that allows us to apply the optional stopping theorem in a great variety of situations. Let T be a stopping time with respect to a filtration {F_n}_{n≥0}. Take any n ≥ 0, and let T ∧ n denote the minimum of T and n. That is, T ∧ n is a random variable defined as T ∧ n(ω) := min{T(ω), n}. Then T ∧ n is a stopping time. To see this, note that for any k ≥ 0,
{T ∧ n > k} = {T > k} if n > k, and {T ∧ n > k} = ∅ if n ≤ k,
and apply Exercise 9.5.4. But note that T ∧ n is also bounded. Thus, we can apply the optional stopping theorem with T ∧ n, and later take n → ∞ to recover T, because \lim_{n→∞} T ∧ n = T on the set {T < ∞}.
To see how this works, consider the following. Let {Sn }n≥0 be a simple sym-
metric random walk on Z, starting at the origin. Take any a < 0 < b, and let
T := min{n : Sn = a or Sn = b}. Let Fn be the σ-algebra generated by
S0 , . . . , Sn . As noted in Section 9.4, {Sn }n≥0 is a martingale adapted to the filtra-
tion {Fn }n≥0 . Take any n. Then T ∧ n is a bounded stopping time, and hence, by
random variables adapted to a filtration {F_n}_{n≥0}. Define M_0 := 0 and for each n ≥ 1,
M_n := \sum_{k=0}^{n−1} (X_{k+1} − E(X_{k+1}|F_k)),
and
A_n := \sum_{k=0}^{n−1} (E(X_{k+1}|F_k) − X_k).
Then note that by definition,
X_n = X_0 + M_n + A_n.  (9.7.1)
It is easy to see that M_n and A_n are both integrable for each n. Next, note that M_n is F_n-measurable for each n, and
E(M_n|F_{n−1}) = \sum_{k=0}^{n−2} (X_{k+1} − E(X_{k+1}|F_k)) = M_{n−1}.
Thus, {Mn }n≥0 is a martingale adapted to {Fn }n≥0 . Finally, note that {An }n≥0
is not only an adapted process, but actually, An is Fn−1 -measurable for each n.
Such processes are called ‘predictable’. Thus, the Doob decomposition (9.7.1) ex-
presses any adapted process as the sum of the initial value, plus a martingale, plus
a predictable process. Additionally, if {Xn }n≥0 is a submartingale, we have that
E(Xn+1 |Fn ) ≥ Xn for each n, and hence {An }n≥0 is a nonnegative and increas-
ing process. Similarly, if {Xn }n≥0 is a supermartingale, {An }n≥0 is a nonpositive
and decreasing process.
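For a concrete instance of the decomposition (9.7.1), one can take X_n = S_n^2 with S_n a simple symmetric random walk (an illustrative choice, not made in the text); then E(X_{k+1}|F_k) = S_k^2 + 1, so the predictable part is A_n = n and the martingale part is M_n = S_n^2 − n. The sketch below computes both parts along a simulated path and checks the decomposition.

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps = 1000
steps = rng.choice([-1, 1], size=n_steps)
S = np.concatenate([[0], np.cumsum(steps)])   # S_0, ..., S_n
X = S ** 2                                    # adapted process X_n = S_n^2

# For this example E(X_{k+1} | F_k) = S_k^2 + 1, so each compensator increment is 1.
cond_exp_next = X[:-1] + 1
M = np.concatenate([[0], np.cumsum(X[1:] - cond_exp_next)])   # martingale part
A = np.concatenate([[0], np.cumsum(cond_exp_next - X[:-1])])  # predictable part

print(np.allclose(X, X[0] + M + A))   # Doob decomposition X_n = X_0 + M_n + A_n
print(A[:5], M[:5])                   # A_n = n and M_n = S_n^2 - n
```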
The following construction is another method of obtaining submartingales, su-
permartingales and martingales from arbitrary adapted processes, which is often
useful in applications.
P ROPOSITION 9.7.2. Let {Xn }n≥0 be a sequence of integrable random vari-
ables adapted to a filtration {Fn }n≥0 . Let T be a stopping time for this filtration.
Suppose that for each n, E(Xn+1 |Fn ) ≥ Xn a.s. on the set {T > n}, that is,
P({E(Xn+1 |Fn ) < Xn } ∩ {T > n}) = 0.
Then {XT ∧n }n≥0 is a submartingale adapted to {Fn }n≥0 . Similarly, if we have
that E(Xn+1 |Fn ) ≤ Xn a.s. on {T > n}, then {XT ∧n }n≥0 is a supermartingale,
and if E(Xn+1 |Fn ) = Xn a.s. on {T > n}, then {XT ∧n }n≥0 is a martingale.
(The process {XT ∧n }n≥0 is often called the ‘stopped’ version of the process
{Xn }n≥0 , since it progresses as Xn up to time T , and then gets frozen at XT .)
is a trivial consequence of the above definition that Vn ≥ E(Vn+1 |Fn ) a.s. Note
also that Vn ≥ Yn a.s. for each n. In fact, the following holds (which is why it’s
called an ‘envelope’):
E XERCISE 9.8.2. Prove that {Vn }1≤n≤N is the ‘smallest’ supermartingale
dominating {Yn }1≤n≤N , in the sense that if {Un }1≤n≤N is any other supermartin-
gale adapted to {Fn }1≤n≤N such that Un ≥ Yn a.s. for all n, then Un ≥ Vn a.s. for
all n.
Finally, define
τ := min{n : Vn = Yn }.
Note that τ ∈ {1, . . . , N }, since there is always at least one n (namely, N ) such
that Vn = Yn . Note also that τ is a stopping time with respect to {Fn }1≤n≤N ,
since
{τ = n} = {V_n = Y_n} ∩ {V_k ≠ Y_k for all k < n}.
We now arrive at the main result of this section, which is that τ is the solution of
the optimal stopping problem mentioned above.
T HEOREM 9.8.3. The stopping time τ maximizes E(Xτ ) among all stopping
times adapted to the filtration {Fn }1≤n≤N . Moreover, this maximum value is equal
to E(V1 ).
To inject randomness, let us make the quite reasonable assumption that (r_1, . . . , r_N) is uniformly distributed over the set of all permutations of {1, . . . , N}. Since the employer knows only r_1^n, . . . , r_n^n at time n, let F_n be the σ-algebra generated by this vector. Clearly, F_1 ⊆ F_2 ⊆ · · · ⊆ F_N. We want to find a stopping time τ with respect to this filtration that maximizes P(r_τ = 1). To put this into the framework of Theorem 9.8.3, let X_n := 1_{\{r_n = 1\}}, so that P(r_T = 1) = E(X_T) for any stopping time T.
First, let us compute Y_n = E(X_n|F_n). If r_n^n ≠ 1, then clearly r_n ≠ 1. On the other hand, given r_n = 1, the remaining r_i^n's form a uniform random permutation of 2, . . . , n. In particular, given r_n = 1, all possible rankings of the first n − 1 candidates are equally likely. Consequently, for any permutation σ_1, . . . , σ_{n−1} of 2, . . . , n,
P(r_1^n = σ_1, . . . , r_{n−1}^n = σ_{n−1}, r_n^n = 1 | r_n = 1) = \frac{1}{(n − 1)!}.
By Bayes’ rule, this gives
P(rn = 1|r1n = σ1 , . . . , rn−1
n
= σn−1 , rnn = 1)
n
P(r1n = σ1 , . . . , rn−1 = σn−1 , rnn = 1|rn = 1)P(rn = 1)
= n n
P(r1 = σ1 , . . . , rn−1 = σn−1 , rnn = 1)
(1/(n − 1)!)(1/N ) n
= = .
(1/n!) N
The above argument shows that
Y_n = E(X_n|F_n) = P(r_n = 1|F_n) = \frac{n}{N} 1_{\{r_n^n = 1\}}.
In other words, Yn = n/N if candidate n is the best among the first n candidates,
and 0 otherwise. The fact that makes it easy to apply Theorem 9.8.3 in this problem
is the following.
L EMMA 9.8.4. For each 2 ≤ n ≤ N , Yn is independent of Fn−1 . Moreover,
P(Yn = n/N ) = 1/n.
P ROOF. Since Y_n can take only two values, 0 and n/N, it suffices to show that the random variable P(Y_n = n/N | F_{n−1}), which is the same as P(r_n^n = 1 | F_{n−1}), is actually nonrandom. Take any permutation σ_1, . . . , σ_{n−1} of 1, . . . , n − 1. Then
P(r_n^n = 1 | r_1^{n−1} = σ_1, . . . , r_{n−1}^{n−1} = σ_{n−1}) = \frac{P(r_n^n = 1, r_1^{n−1} = σ_1, . . . , r_{n−1}^{n−1} = σ_{n−1})}{P(r_1^{n−1} = σ_1, . . . , r_{n−1}^{n−1} = σ_{n−1})} = \frac{P(r_n^n = 1, r_1^n = σ_1 + 1, . . . , r_{n−1}^n = σ_{n−1} + 1)}{P(r_1^{n−1} = σ_1, . . . , r_{n−1}^{n−1} = σ_{n−1})} = \frac{1/n!}{1/(n − 1)!} = \frac{1}{n}.
Thus, P(Y_n = n/N | F_{n−1}) ≡ 1/n. This completes the proof of the lemma.
Now let V1 , . . . , VN be defined by backward induction as in (9.8.1), starting
with VN = YN . Lemma 9.8.4 yields the following corollary.
for all n ≥ t_N. In particular, for any n < t_N, Y_n ≤ n/N < v_n^N, and therefore (9.8.2) shows that v_{n−1}^N = v_n^N. On the other hand, for t_N ≤ n ≤ N − 1, (9.8.2) and Lemma 9.8.4 imply that
v_{n−1}^N = \frac{n}{N} P(Y_n = n/N) + v_n^N P(Y_n = 0) = \frac{1}{N} + \Big(1 − \frac{1}{n}\Big) v_n^N.
Using this, and the fact that v_{N−1}^N = 1/N, it is now easy to prove by backward induction that for t_N − 1 ≤ n ≤ N − 1,
v_n^N = \frac{1}{N} + \frac{n}{N} \sum_{k=n+1}^{N−1} \frac{1}{k}.  (9.8.3)
That is, we should choose the first candidate from time t_N onwards who is better than everyone who came before him. Moreover, with this optimal rule, the probability of choosing the best candidate is v_1^N, which is equal to v_{t_N − 1}^N. By the characterization of t_N given above, and the formula (9.8.3), this value is asymptotically 1/e.
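The 1/e limit is easy to test by simulation. The sketch below (illustrative only; the threshold is taken to be ⌈N/e⌉, an approximation consistent with the asymptotics above rather than the exact t_N) rejects the first t − 1 candidates and then accepts the first candidate better than all previous ones.

```python
import numpy as np

def secretary_success(N, t, rng):
    """One run of the threshold rule: skip candidates 1..t-1, then accept the first
    candidate who is the best seen so far. Returns True if the overall best is chosen."""
    ranks = rng.permutation(N)               # ranks[i] = quality of candidate i (N-1 is best)
    best_seen = ranks[:t - 1].max() if t > 1 else -1
    for i in range(t - 1, N):
        if ranks[i] > best_seen:
            return ranks[i] == N - 1         # accepted candidate i
    return False                             # never accepted anyone

rng = np.random.default_rng(3)
N = 100
t = int(np.ceil(N / np.e))
wins = sum(secretary_success(N, t, rng) for _ in range(20000))
print(wins / 20000, 1 / np.e)                # both close to 0.368
```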
And we keep going like this. That is, taking T0 = −1, we have that for all k ≥ 1,
Sk = inf{n > Tk−1 : Xn ≤ a},
Tk = inf{n > Sk : Xn ≥ b}.
In the above, we follow the convention that the infimum of an empty set is infinity.
The intervals {Sk , Sk + 1, . . . , Tk }, for all k such that Sk is finite, are called the
upcrossings of [a, b] by {Xn }n≥0 .
L EMMA 9.9.1 (Upcrossing lemma). Let {Xn }n≥0 be a submartingale sequence
adapted to some filtration {Fn }n≥0 . Take any interval [a, b] with a < b. Let Um
be the number of upcrossings of this interval by the sequence {Xn }n≥0 that are
completed by time m. Then
E(U_m) ≤ \frac{E(X_m − a)^+ − E(X_0 − a)^+}{b − a}.
Thus,
(b − a) E(U_m) ≤ \sum_{n=1}^{m} E(Y_n − Y_{n−1}) = E(Y_m − Y_0).
P ROOF. Take any interval [a, b] with a < b. Let Um be the number of up-
crossings of this interval by the sequence {Xn }n≥0 that are completed by time m.
Then {Um }m≥0 is an increasing sequence of random variables, whose limit U is
the total number of upcrossings of this interval. But by the upcrossing lemma and
the condition that supn≥0 E(Xn+ ) < ∞, we get that E(Um ) is uniformly bounded.
Thus by the monotone convergence theorem, E(U ) < ∞ and hence U < ∞
a.s. But this is true for any [a, b], and so it holds almost surely for all intervals
with rational endpoints. This implies that lim inf n→∞ Xn cannot be strictly less
than lim supn→∞ Xn , because otherwise there would exist an interval with ratio-
nal endpoints which is upcrossed infinitely many times. This proves the existence
of the limit X.
By Fatou’s lemma, E(X + ) ≤ lim inf n→∞ E(Xn+ ) < ∞. On the other hand,
since {Xn }n≥0 is a submartingale sequence,
E(Xn− ) = E(Xn+ ) − E(Xn ) ≤ E(Xn+ ) − E(X0 ),
which proves the uniform boundedness of E(Xn− ). Thus again by Fatou’s lemma,
E(X − ) < ∞ and hence E|X| < ∞.
E XERCISE 9.9.3. Prove that a nonnegative supermartingale converges almost
surely to an integrable random variable.
E XERCISE 9.9.4 (Pólya’s urn). Consider an urn which initially has one black
ball and one white ball. At each turn, a ball is picked uniformly at random from
the urn, and replaced back into the urn with an additional ball of the same color.
Let Wn be the fraction of white balls at time n (so that W0 = 1/2). Prove that
{Wn }n≥0 is a martingale with respect to a suitable filtration, and show that it con-
verges almost surely to a limit. Show that the limit is uniformly distributed in [0, 1].
Hint: For the last part, prove by induction that each Wn is uniformly distributed on
a certain set depending on n.
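A short simulation sketch of Exercise 9.9.4 (illustrative only): along each run the fraction of white balls settles down to a limit, and across many runs the limiting fractions look uniform on [0, 1].

```python
import numpy as np

def polya_limit(n_draws, rng):
    """Run a Polya urn starting from one white and one black ball and
    return the fraction of white balls after n_draws draws."""
    white, black = 1, 1
    for _ in range(n_draws):
        if rng.random() < white / (white + black):
            white += 1
        else:
            black += 1
    return white / (white + black)

rng = np.random.default_rng(4)
limits = np.array([polya_limit(5000, rng) for _ in range(2000)])
# Counts in ten equal bins should be roughly equal (uniform limit law).
print(np.histogram(limits, bins=10, range=(0, 1))[0])
```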
E XERCISE 9.9.5. Show by a counterexample that the convergence in the sub-
martingale convergence theorem need not hold in L1 . Hint: Consider the martin-
gale {ST ∧n }n≥0 , where {Sn }n≥0 is a simple symmetric random walk starting at 0,
and T is the first time to hit 1.
P ROOF. Let Xn := E(X|Fn ). Take any interval [a, b] with a < b. For each n,
let Un be the number of complete upcrossings of this interval by the finite sequence
Xn , Xn−1 , . . . , X0 . Note that this sequence is a martingale with respect to the
filtration Fn , Fn−1 , . . . , F0 . Therefore by the upcrossing lemma,
E(U_n) ≤ \frac{E(X_0 − a)^+}{b − a}.
Since X0 = E(X|F0 ) is integrable, the quantity on the right is finite. Moreover,
it does not depend on n. Thus, supn≥0 E(Un ) < ∞. But Un increases to a limit
as n → ∞, and if this limit is finite, then the sequence {Xn }n≥0 cannot attain
infinitely many values in both the intervals (−∞, a] and [b, ∞). Therefore, with
probability one, this cannot happen for any interval with rational endpoints. This
implies that Xn converges almost surely to a limit X ∗ .
The next step is to show that the sequence {Xn }n≥0 is uniformly integrable.
Since
|Xn | = |E(X|Fn )| ≤ E(|X||Fn ),
we have
E(|Xn |; |Xn | > K) ≤ E(E(|X||Fn ); E(|X||Fn ) > K)
= E(|X|; E(|X||Fn ) > K). (9.10.1)
Take any ε > 0. By Corollary 8.3.4, there is some δ > 0 such that for any event A with P(A) < δ, we have E(|X|; A) < ε. Fixing K, let A_n be the event {E(|X| | F_n) > K}. By Markov's inequality,
P(A_n) ≤ \frac{E(E(|X| | F_n))}{K} = \frac{E|X|}{K},
which can be made less than δ by choosing K large enough. Therefore, by (9.10.1),
E(|X_n|; |X_n| > K) ≤ E(|X|; A_n) ≤ ε.
This proves the uniform integrability of {Xn }n≥0 . By Proposition 8.3.1, we con-
clude that Xn → X ∗ in L1 . This implies that for any B ∈ F ∗ ,
E(X^*; B) = \lim_{n→∞} E(X_n; B) = E(X; B),
where the last identity holds because Xn = E(X|Fn ) and B ∈ Fn for every n.
So, to complete the proof we only have to show that X ∗ is F ∗ -measurable. To
P ROOF. For any Borel set A ⊆ R, it is easy to check that the set
{ω : f (X1 (ω), X2 (ω), . . .) ∈ A}
is in En , because it is equal to X −1 (B), where B = f −1 (A), and B is invariant
under permutations of the first n coordinates.
L EMMA 9.11.4. For any n, any bounded measurable function ϕ : Rn → R,
and any permutation π of 1, . . . , n,
E(ϕ(Xπ(1) , . . . , Xπ(n) )|En ) = E(ϕ(X1 , . . . , Xn )|En ) a.s.
P ROOF. Let Y and Z denote the two conditional expectations displayed above,
that have to be shown to be equal. Take any A ∈ En , so that A = X −1 (B) for
some measurable set B ⊆ RN that is invariant under permutations of the first n
coordinates. Let f := 1B . Then
E(Y ; A) = E(ϕ(Xπ(1) , . . . , Xπ(n) )f (X1 , X2 , . . .))
= E(ϕ(Xπ(1) , . . . , Xπ(n) )f (Xπ(1) , Xπ(2) , . . . , Xπ(n) , Xn+1 , Xn+2 , . . .))
= E(ϕ(X1 , . . . , Xn )f (X1 , X2 , . . . , Xn , Xn+1 , Xn+2 , . . .))
= E(Z; A),
where the second identity holds because f is invariant under permutations of the
first n coordinates, and the third identity holds because X1 , X2 , . . . are exchange-
able.
L EMMA 9.11.5. For any bounded measurable function ϕ : R → R,
\frac{1}{n} \sum_{i=1}^{n} ϕ(X_i) → E(ϕ(X_1)|E)
a.s. as n → ∞.
a.s. as n → ∞. By Lemma 9.11.4, E(ϕ(X_1)|E_n) = E(ϕ(X_i)|E_n) a.s. for any 1 ≤ i ≤ n. Thus,
E(ϕ(X_1)|E_n) = \frac{1}{n} \sum_{i=1}^{n} E(ϕ(X_i)|E_n) = E\Big( \frac{1}{n} \sum_{i=1}^{n} ϕ(X_i) \,\Big|\, E_n \Big) a.s.
But by Lemma 9.11.3, n^{−1} \sum_{i=1}^{n} ϕ(X_i) is E_n-measurable. Thus,
E\Big( \frac{1}{n} \sum_{i=1}^{n} ϕ(X_i) \,\Big|\, E_n \Big) = \frac{1}{n} \sum_{i=1}^{n} ϕ(X_i) a.s.
Since the ϕi ’s are bounded functions, a simple calculation shows that Yn −Zn → 0
as n → ∞. By equation (9.11.1), E(ϕ1 (X1 ) · · · ϕk (Xk )|En ) = E(Yn |En ), and by
Lemma 9.11.3, Yn is En -measurable. Combining these observations completes the
proof.
Finally, we are ready to prove De Finetti’s theorem.
P ROOF OF T HEOREM 9.11.2. Take any k, and any bounded measurable functions ϕ_1, . . . , ϕ_k : R → R. By the downwards convergence theorem,
E(ϕ_1(X_1) · · · ϕ_k(X_k)|E_n) → E(ϕ_1(X_1) · · · ϕ_k(X_k)|E)
P ROOF. Let Xn := E(X|Fn ). Note that the sequence {Xn }n≥0 is a martin-
gale with respect to the filtration {Fn }n≥0 . Moreover, by Jensen’s inequality for
conditional expectation, E|Xn | ≤ E|X| for all n. Therefore by Theorem 9.9.2,
there is an integrable random variable Y such that Xn → Y a.s. By a similar argu-
ment as in the proof of the backward martingale convergence theorem, we see that
{Xn }n≥0 is uniformly integrable. Thus, Xn → Y in L1 . Finally, we need to show
that Y = E(X|F ∗ ) a.s. To prove this, take any A ∈ ∪n≥1 Fn . Then A ∈ Fn for
some n, and so
E(X; A) = E(Xn ; A).
But Xn = E(Xm |Fn ) for any m ≥ n. Thus, E(X; A) = E(Xm ; A) for any
m ≥ n. Since Xm → Y in L1 as m → ∞, this shows that E(X; A) = E(Y ; A). It
is now a simple exercise using the π-λ theorem to show that E(X; A) = E(Y ; A)
for all A ∈ F ∗ , which proves that Y = E(X|F ∗ ).
P ROOF. For each 0 ≤ i ≤ n, let Ai be the event that Xj < t for j < i and
Xi ≥ t. Then the event {Mn ≥ t} is the union of the disjoint events A0 , . . . , An .
Thus,
E(X_n; M_n ≥ t) = \sum_{i=0}^{n} E(X_n; A_i).
But A_i is F_i-measurable and E(X_n|F_i) = X_i. Thus,
E(X_n; M_n ≥ t) = \sum_{i=0}^{n} E(X_i; A_i).
But if A_i happens, then X_i ≥ t. Thus,
E(X_n; M_n ≥ t) ≥ t \sum_{i=0}^{n} P(A_i) = t P(M_n ≥ t),
which is what we wanted.
Doob’s maximal inequality implies the following inequality, known as Doob’s
Lp inequality.
T HEOREM 9.13.2 (Doob’s Lp inequality). Let {Xi }0≤i≤n be a martingale
or nonnegative submartingale adapted to a filtration {Fi }0≤i≤n . Let Mn :=
max0≤i≤n |Xi |. Then for any p > 1,
E(M_n^p) ≤ \Big( \frac{p}{p − 1} \Big)^p E|X_n|^p.
P ROOF. The converse implication is trivial, so let us prove the forward direc-
tion only. Since the Lp norm of Xn is uniformly bounded, so is the L1 norm.
Therefore, by the submartingale convergence theorem, there exists a random vari-
able X such that Xn → X a.s. By Fatou’s lemma,
E|X|^p = E(\liminf_{n→∞} |X_n|^p) ≤ \liminf_{n→∞} E|X_n|^p < ∞.
Thus, X ∈ L^p. For each n, let
Z_n := \sup_{0≤m≤n} |X_m − X|^p,
and
Z := \lim_{n→∞} Z_n = \sup_{n≥0} |X_n − X|^p.
By Jensen's inequality,
|X_n − X|^p = 2^p \Big| \frac{X_n − X}{2} \Big|^p ≤ 2^p \frac{|X_n|^p + |X|^p}{2}.
and using this, give a direct proof that Xn converges in L2 to some limit random
variable.
For the necessary and sufficient condition for L1 convergence of martingales,
recall the definition of uniform integrability from Section 8.3 of Chapter 8.
T HEOREM 9.13.5. A submartingale converges in L1 if and only if it is uni-
formly integrable, and in this situation, it also converges a.s. to the same limit.
assumptions are that the individuals give birth independently, and the distribution
of the number of offspring per individual is the same throughout. Mathematically,
if Zn denotes the number of individuals in generation n, then
Z_{n+1} = \sum_{i=1}^{Z_n} X_{n,i},
where Xn,i are i.i.d. random variables from the offspring distribution, which are
independent of everything up to generation n. Suppose that the mean number of
offspring per individual, µ, is finite. Then:
(1) Show that Mn := µ−n Zn is a martingale adapted to a suitable filtration.
(2) Verifying the required conditions, conclude that limn→∞ Mn exists a.s. and
is integrable. From this, show that if µ < 1, then Zn → 0 a.s.
(3) If µ = 1, use the above fact and the fact that Zn is integer-valued to show
that Zn → 0 a.s. when the number of offspring is not always exactly
equal to 1.
(4) Suppose that the variance of the offspring distribution, σ^2, is finite. Then show that E(Z_{n+1}^2 | F_n) = µ^2 Z_n^2 + σ^2 Z_n.
(5) Use the above result to show that if µ ≠ 1,
N_n := M_n^2 − \frac{σ^2}{µ^{n+1}} \cdot \frac{µ^n − 1}{µ − 1} M_n
is a martingale.
(6) Using the above, show that \sup_{n≥1} E(M_n^2) < ∞ if µ > 1 and σ^2 < ∞. From this, conclude that P(\lim_{n→∞} M_n > 0) > 0, that is, there is positive probability that Z_n behaves like a positive multiple of µ^n as n → ∞.
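A simulation sketch of this exercise (with Poisson offspring as an arbitrary illustrative choice): the normalized population M_n = µ^{−n} Z_n fluctuates early on and then stabilizes, with some runs frozen at 0 (extinction) and others settling at a positive limit when µ > 1.

```python
import numpy as np

def branching_final(mu, generations, rng):
    """Galton-Watson process with Poisson(mu) offspring, started from one individual.
    Uses the fact that a sum of z independent Poisson(mu) variables is Poisson(mu*z).
    Returns M_n = Z_n / mu^n at the final generation."""
    z = 1
    for _ in range(generations):
        z = rng.poisson(mu * z) if z > 0 else 0
    return z / mu ** generations

rng = np.random.default_rng(5)
mu, gens = 1.5, 25
final = np.array([branching_final(mu, gens, rng) for _ in range(2000)])
print((final == 0).mean())   # fraction of extinct runs (positive even though mu > 1)
print(final.mean())          # close to E(M_n) = 1, since M_n is a martingale with M_0 = 1
```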
E XERCISE 9.13.10. Using the above exercise, show that C[0, 1] is a dense subset of L^2[0, 1]. (Hint: First show that step functions are dense, and then approximate step functions by continuous functions.)
(2) Take any 0 < a < b < 1, and any N. Define a function f_N : [0, 1] → [0, ∞) as
f_N(x) := \sqrt{N} \int_{a}^{b} \Big( \frac{1 + \cos 2π(u − x)}{2} \Big)^N du.
Show that f_N is uniformly bounded above by a constant that does not depend on N, and that as N → ∞, f_N(x) → 0 if x ∉ [a, b], and f_N(x) → c if x ∈ (a, b), where c is a positive real number.
(3) Using the above and Exercise 9.12.2, show that if g ∈ L1 [0, 1] is orthogo-
nal to sin 2πnx and cos 2πnx for all n (under the L2 inner product), then
g = 0 a.e. on [0, 1].
P ROOF. Let
Y_n := X_n − \sum_{k=0}^{n−1} (ξ_k − ζ_k).
Then note that
E(Y_{n+1}|F_n) = E(X_{n+1}|F_n) − \sum_{k=0}^{n} (ξ_k − ζ_k) ≤ X_n + ξ_n − ζ_n − \sum_{k=0}^{n} (ξ_k − ζ_k) = X_n − \sum_{k=0}^{n−1} (ξ_k − ζ_k) = Y_n.
Thus, {Y_n}_{n≥0} is a supermartingale. Take any a > 0. Let τ := \inf\{n : \sum_{k=0}^{n} ξ_k ≥ a\}. Note that τ is allowed to be infinity. By the optional stopping theorem, {Y_{τ∧n}}
For example, we can take a_n = n^{−3/4}. The following theorem shows that under mild conditions, X_n converges almost surely to x^* as n → ∞.
Further, suppose that Var(Yn |Xn ) is uniformly bounded above by a constant. Then
Xn → x∗ almost surely.
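Although the precise recursion and hypotheses appear earlier in this section, the flavor of the result can be illustrated with a standard Robbins–Monro-type iteration. The update rule, the target function, and the noise model in the sketch below are assumptions made for illustration, not the theorem's exact setting; only the step size a_n = n^{−3/4} is taken from the text.

```python
import numpy as np

def robbins_monro(f, x0, n_iter, rng, noise_sd=1.0):
    """Stochastic approximation X_{n+1} = X_n - a_n * Y_n with a_n = n^(-3/4),
    where Y_n = f(X_n) + noise is an unbiased noisy evaluation of f at X_n."""
    x = x0
    for n in range(1, n_iter + 1):
        a_n = n ** (-0.75)
        y_n = f(x) + noise_sd * rng.standard_normal()
        x = x - a_n * y_n
    return x

rng = np.random.default_rng(6)
f = lambda x: x - 2.0                       # increasing function with root x* = 2
estimates = [robbins_monro(f, 0.0, 20000, rng) for _ in range(20)]
print(np.mean(estimates), np.std(estimates))  # concentrates near x* = 2
```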
The following exercise is a simple fact from real analysis that will be needed
in the subsequent problems.
E XERCISE 9.14.4 (Strong law of large numbers for martingales). Let {S_n}_{n≥0} be a mean zero, square-integrable martingale adapted to a filtration {F_n}_{n≥0}. Let {c_n}_{n≥1} be a predictable and increasing sequence of random variables (that is, c_n is F_{n−1}-measurable and c_n ≤ c_{n+1} for each n). Let σ_n^2 := Var(S_n|F_{n−1}). Then, show that S_n/c_n → 0 a.s. on the set
\Big\{ \sum_{n=1}^{∞} \frac{σ_n^2}{c_n^2} < ∞ \text{ and } \lim_{n→∞} c_n = ∞ \Big\}.
(Hint: Define Z_n := S_n^2/c_n^2 and show that it is an almost supermartingale, and then apply the Robbins–Siegmund theorem.)
E XERCISE 9.14.6. Let {X_n}_{n≥0} be a sequence of {0, 1}-valued random variables adapted to a filtration {F_n}_{n≥0}. Let p_n := P(X_n = 1|F_{n−1}) (with p_0 := P(X_0 = 1)) and let S_n := \sum_{k=0}^{n} X_k. Then show that the limit
\lim_{n→∞} \frac{S_n}{\sum_{k=0}^{n} p_k}
exists a.s. Moreover, show that the limit is equal to 1 almost surely on the set \{\sum_{n=0}^{∞} p_n = ∞\}. (Hint: Define a suitable almost supermartingale.)
CHAPTER 10
Ergodic theory
Ergodic theory is the study of measure preserving transforms and their proper-
ties. It is useful in probability theory for the study of random processes whose laws
are invariant under various groups of transformations. This chapter gives an intro-
duction to the basic notions and results of ergodic theory and some applications in
probability.
P ROOF. By Dynkin’s π-λ theorem, it suffices to check that the set L of all
A ∈ F such that P(ϕ−1 A) = P(A) is a λ-system. First, note that ϕ−1 Ω = Ω.
Thus, Ω ∈ L. Next, if A ∈ L, then
P(ϕ−1 Ac ) = P((ϕ−1 A)c ) = 1 − P(ϕ−1 A) = 1 − P(A) = P(Ac ),
which shows that A^c ∈ L. Finally, if A_1, A_2, . . . are disjoint elements of L, and A = \cup_{i=1}^{∞} A_i, then
P(ϕ^{−1} A) = P\Big( \cup_{i=1}^{∞} ϕ^{−1} A_i \Big) = \sum_{i=1}^{∞} P(ϕ^{−1} A_i) = \sum_{i=1}^{∞} P(A_i) = P(A).
Thus, L is a λ-system.
E XAMPLE 10.1.2 (Bernoulli shift). Let Ω = {0, 1}N be the set of all infinite
sequences of 0’s and 1’s. Let F be the product σ-algebra on this space, and P be
the product of Ber(p) measures. The ‘shift operator’ on this space is defined as
ϕ(ω0 , ω1 , . . .) = (ω1 , ω2 , . . .).
We now show that this map is measure preserving. Let A be the collection of all
sets of the form
{ω : ω0 = x0 , . . . , ωn = xn }
for some n ≥ 0 and some x0 , . . . , xn ∈ {0, 1}. Clearly, this is a π-system that
generates the product σ-algebra. Let A denote the set displayed above. Then ϕ−1 A
is the set of all sequences (ω0 , ω1 , . . .) such that (ω1 , ω2 , . . .) ∈ A, that is, the set
of all ω such that ω1 = x0 , ω2 = x1 , . . . ωn+1 = xn . Under the product measure,
both A and ϕ−1 A have the same measure. By Lemma 10.1.1, this shows that ϕ is
measure preserving.
E XAMPLE 10.1.3 (Rotations of the circle). Let Ω = [0, 1) with the Borel σ-
algebra and Lebesgue measure. Take any θ ∈ [0, 1), and let
ϕ(x) := x + θ (mod 1) = x + θ if x + θ < 1, and x + θ − 1 if x + θ ≥ 1.
One way to view ϕ is to consider [0, 1) as the unit circle in the complex plane via the map x ↦ e^{2πix}, which makes ϕ a 'rotation by angle θ'. We claim that ϕ is measure preserving. To see this, simply note that the map ϕ is a bijection with
ϕ^{−1}(x) = x − θ (mod 1) = x − θ if x − θ ≥ 0, and x − θ + 1 if x − θ < 0,
and therefore, for any interval [a, b] ⊆ [0, 1),
ϕ^{−1}([a, b]) = [a − θ, b − θ] if a ≥ θ;  [0, b − θ] ∪ [1 + a − θ, 1] if a < θ ≤ b;  [1 + a − θ, 1 + b − θ] if b < θ.
In all three cases, the Lebesgue measure of ϕ−1 ([a, b]) equals b − a. Since the
set of closed intervals is a π-system that generates the Borel σ-algebra on [0, 1),
Lemma 10.1.1 shows that ϕ is measure preserving.
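For an irrational angle θ the rotation is in fact ergodic (a standard fact), so Birkhoff averages of an integrable function converge to its integral over [0, 1). The following sketch (illustrative only; the test function and angle are arbitrary choices) shows the orbit averages approaching the space average.

```python
import numpy as np

def birkhoff_average(f, theta, x0, n):
    """Average of f along the first n points of the orbit of x -> x + theta (mod 1)."""
    x = (x0 + theta * np.arange(n)) % 1.0
    return f(x).mean()

f = lambda x: np.cos(2 * np.pi * x) ** 2     # integral over [0, 1) is 1/2
theta_irrational = np.sqrt(2) - 1
for n in [100, 10000, 1000000]:
    print(n, birkhoff_average(f, theta_irrational, 0.3, n))  # tends to 0.5
```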
E XERCISE 10.1.4. Show that the map ϕ(x) := 2x (mod 1) preserves Lebesgue
measure on [0, 1).
Measure preserving transforms possess the following important property.
P ROPOSITION 10.1.5. If ϕ is a measure preserving map on a probability space
(Ω, F, P), then for any integrable random variable X defined on this space, X ◦ ϕ
has the same distribution as X, and hence, has the same expected value as X.
By the last three displays, we see that E(X; Mk > 0) must be nonnegative.
We are now ready to prove Birkhoff’s ergodic theorem.
P ROOF OF T HEOREM 10.3.1. First, we claim that it suffices to prove the the-
orem under the assumption that E(X|I) = 0 a.s. To see this, suppose that the the-
orem holds under this assumption. Let Y := E(X|I). Since Y is I-measurable,
it satisfies Y ◦ ϕ^m = Y for any m, by Exercise 10.2.4. Let Z := X − Y, so that E(Z|I) = 0 a.s. Thus, by the assumed version of the theorem,
\frac{1}{n} \sum_{m=0}^{n−1} Z ◦ ϕ^m → 0
Thus, the previous display shows that Sk∗ (ω) > 0 for some k, and hence, Mk∗ (ω) >
0 for some k. In other words, ω ∈ F . Thus, D ⊆ F , completing the proof of our
claim that F = D.
Since M_k^* is an increasing sequence, so is F_k. Thus, X^* 1_{F_k} → X^* 1_F pointwise as k → ∞. Moreover, |X^* 1_{F_k}| ≤ |X^*| ≤ |X| + ε for all k, and X is integrable. So, by the dominated convergence theorem, we get that
\lim_{k→∞} E(X^*; F_k) = E(X^*; F).
By the maximal ergodic theorem, E(X^*; F_k) ≥ 0 for any k. Since F = D, this shows that
E(X^*; F) = E(X^*; D) ≥ 0.
But
E(X^*; D) = E((X − ε) 1_D) = E(E(X|I); D) − εP(D).
Since E(X|I) = 0 a.s., this shows that −εP(D) ≥ 0. Thus, P(D) = 0. Looking back at the definition of D, we conclude that \limsup S_n/(n + 1) ≤ 0. Replacing X by −X throughout the proof, we conclude that \liminf S_n/(n + 1) ≥ 0. This completes the proof of the a.s. convergence of S_n/n to 0.
It remains to prove convergence in L^1. Fix any M > 0. Define X'_M := X 1_{\{|X|≤M\}} and X''_M := X − X'_M. Then by the almost sure part of the theorem,
\frac{1}{n} \sum_{m=0}^{n−1} X'_M ◦ ϕ^m → E(X'_M|I)
a.s. as n → ∞. Since the random variable on the right is uniformly bounded by M in absolute value, the dominated convergence theorem implies that the convergence also holds in L^1. Now, since E(X|I) = 0 a.s., we have
E\Big| \frac{1}{n} \sum_{m=0}^{n−1} X ◦ ϕ^m \Big| ≤ E\Big| \frac{1}{n} \sum_{m=0}^{n−1} X'_M ◦ ϕ^m − E(X'_M|I) \Big| + E\Big| \frac{1}{n} \sum_{m=0}^{n−1} X''_M ◦ ϕ^m − E(X''_M|I) \Big|.
We have already shown that the first term on the right tends to zero as n → ∞. On the other hand,
E\Big| \frac{1}{n} \sum_{m=0}^{n−1} X''_M ◦ ϕ^m − E(X''_M|I) \Big| ≤ \frac{1}{n} \sum_{m=0}^{n−1} E|X''_M ◦ ϕ^m| + E|E(X''_M|I)|.
By Proposition 10.1.5, E|X''_M ◦ ϕ^m| = E|X''_M| for any m, and by Jensen's inequality,
E|E(X''_M|I)| ≤ E(E(|X''_M| \,|\, I)) = E|X''_M|.
Combining everything, we see that
\limsup_{n→∞} E\Big| \frac{1}{n} \sum_{m=0}^{n−1} X ◦ ϕ^m \Big| ≤ 2 E|X''_M|.
But M is arbitrary, and by the dominated convergence theorem, E|X''_M| → 0 as M → ∞. This proves the L^1 part of the theorem.
Now let X denote the sequence (Xn )n≥0 . Let I be the invariant σ-algebra
of the shift operator on RN . Then J := X −1 (I) is a sub-σ-algebra of F, where
(Ω, F, P) is the probability space on which X is defined. Let us call this the ‘in-
variant σ-algebra of X’. A stationary sequence is called ‘ergodic’ if its invariant
σ-algebra is trivial.
As a corollary of Birkhoff’s ergodic theorem, we get the following theorem for
infinite stationary sequences of real-valued random variables.
T HEOREM 10.4.2. Let X = (Xn )n≥0 be a stationary sequence of integrable
real-valued random variables, with invariant σ-algebra J . Then
\frac{1}{n} \sum_{m=0}^{n−1} X_m → E(X_0|J)
this sequence, and evaluate the distribution of E(X0 |J ). (This gives an example
of a stationary sequence where the limit in Theorem 10.4.2 is not a constant.)
CHAPTER 11
Markov chains
In this chapter, we define and prove the existence of Markov chains on Eu-
clidean spaces and discuss some of their basic properties.
P ROOF. First of all note that by Lemma 3.2.1, for any given x1 , . . . , xn−1 ,
the map xn 7→ f (x1 , . . . , xn ) is bounded and measurable. This proves that g is
well-defined. Next, note that if we can prove the claim when f = 1B for any mea-
surable set B ⊆ S n , the general claim will follow by the usual route of going from
indicator functions to simple functions and then to general measurable functions.
Thus, let us prove it when f = 1B . Suppose first that B = B1 × · · · × Bn for some
B_1, . . . , B_n ∈ S. Then
g(x_1, . . . , x_{n−1}) = 1_{B_1}(x_1) · · · 1_{B_{n−1}}(x_{n−1}) K(x_{n−1}, B_n),
which is measurable by the properties of K. Now consider the set of all measurable
B for which the claim is true. Using the properties of K, it is not hard to see that
this is a λ-system, and by the above deduction, it contains the π-system of all
rectangular sets. The proof is now completed by invoking the π-λ theorem.
Using the properties of K and the monotone convergence theorem, it is not hard to
show that µn is a probability measure on S n and moreover, that the family {µn }n≥1
is consistent, in the sense that for any n and any A ∈ S n , µn+1 (A × S) = µn (A).
We will now show that there is a probability measure µ on the infinite product
space such that µn is the marginal of µ on S n for each n.
Note that we cannot apply Kolmogorov's extension theorem, since S is not assumed to be a Euclidean space or a Polish space. Instead, consider the algebra A of all cylinder sets, that is, sets of the form B = A × S × S × · · · for some A ∈ S^n for some n. Define µ(B) = µ_n(A), which is well-defined due to the consistency of the family {µ_n}_{n≥1}. Then µ is finitely additive on A. By Carathéodory's extension theorem, we only need to show that µ is countably additive on A.
As in the proof of Theorem 3.3.1, this can be reduced to showing that for any sequence of sets {A_n}_{n≥1} in A decreasing to ∅, µ(A_n) → 0. So, let us take any such sequence, and suppose that µ(A_n) ≥ ε > 0 for all n.
Take any B ∈ A. Take any n such that B = A×S ×S ×· · · for some A ∈ S n .
For 1 ≤ k ≤ n − 1, define the “conditional probability of the chain being in B
given that the first k coordinates are x1 , . . . , xk ” to be
µ(B|x_1, . . . , x_k) := \int \cdots \int 1_A(x_1, . . . , x_n) K(x_{n−1}, dx_n) \cdots K(x_k, dx_{k+1}).
on S^k (by Lemma 11.1.4), and that for all 1 ≤ j < k < ∞ and x_1, . . . , x_k ∈ S,
µ(B|x_1, . . . , x_j) = \int \cdots \int µ(B|x_1, . . . , x_k) K(x_{k−1}, dx_k) \cdots K(x_j, dx_{j+1}),  (11.1.1)
and moreover,
µ(B) = \int µ(B|x) dσ(x).  (11.1.2)
The limit exists because {An }n≥1 is a decreasing sequence, and is measurable
because it is the limit of a sequence of measurable functions. Analogously, define
m_0 := \lim_{n→∞} µ(A_n). By (11.1.1), (11.1.2), and the dominated convergence theorem, we get that for any 1 ≤ j < k,
m_j(x_1, . . . , x_j) = \int \cdots \int m_k(x_1, . . . , x_k) K(x_{k−1}, dx_k) \cdots K(x_j, dx_{j+1}),
and
m_0 = \int m_1(x) dσ(x).
By assumption, m_0 > 0. Thus, there exists x_1 such that m_1(x_1) > 0. Therefore, there exists x_2 such that m_2(x_1, x_2) > 0, and so on. Let x := (x_1, x_2, . . .). We claim that x ∈ \cap_{n=1}^{∞} A_n. To see this, take any n. Then A_n = B × S × S × · · · for some B ∈ S^N, for some N (depending on A_n). Since m_N(x_1, . . . , x_N) > 0, we have µ(A_k|x_1, . . . , x_N) > 0 for all k. In particular, µ(A_n|x_1, . . . , x_N) > 0. But
µ(A_n|x_1, . . . , x_N) = 1_B(x_1, . . . , x_N).
Thus, x ∈ B × S × S × · · · = A_n. This proves that x ∈ \cap_{n=1}^{∞} A_n, giving us the desired contradiction. Thus, µ extends to a probability measure on the infinite product σ-algebra.
Finally, to produce an infinite Markov chain with transition kernel K and ini-
tial distribution σ, take the space S N with its product σ-algebra, and let X1 , X2 , . . .
be the coordinate maps. Let F_n := σ(X_1, . . . , X_n). Then note that for any measurable D ⊆ S^n and B ⊆ S,
\int_{S^{n+1}} 1_D(x_1, . . . , x_n) K(x_n, B) dµ_{n+1}(x_1, . . . , x_{n+1}) = \int_{S^n} 1_D(x_1, . . . , x_n) K(x_n, B) dµ_n(x_1, . . . , x_n) = \int_{S^{n+1}} 1_{D×B}(x_1, . . . , x_{n+1}) K(x_n, dx_{n+1}) dµ_n(x_1, . . . , x_n) = µ_{n+1}(D × B) = \int_{S^{n+1}} 1_D(x_1, . . . , x_n) 1_B(x_{n+1}) dµ_{n+1}(x_1, . . . , x_{n+1}).
This shows that P(Xn+1 ∈ B|Fn ) = K(Xn , B). It is clear that X1 ∼ σ. This
completes the proof.
P ROOF. First note that by applying the definition of Markov chain in the third step below,
P(X_0 = i_0, . . . , X_n = i_n) = P(X_0 = i_0) \prod_{m=1}^{n} P(X_m = i_m | X_0 = i_0, . . . , X_{m−1} = i_{m−1}) = P(X_0 = i_0) \prod_{m=1}^{n} P(X_m = i_m | X_{m−1} = i_{m−1}) = P(X_0 = i_0) p_{i_0 i_1} p_{i_1 i_2} \cdots p_{i_{n−1} i_n}.
This shows that
P(X_n = j | X_0 = i) = \sum_{i_1, \ldots, i_{n−1} ∈ S} P(X_1 = i_1, . . . , X_{n−1} = i_{n−1}, X_n = j | X_0 = i) = \sum_{i_1, \ldots, i_{n−1} ∈ S} p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n−1} j}.
But the last expression is just the (i, j)th element of P^n. Thus, P^{(n)} = P^n, as we wanted to show.
P ROOF. Fix some state i. For proving the above theorem, it is convenient to
define a random variable N which is the number of n ≥ 1 such that Xn = i. (Note
that we are ignoring n = 0. So if the chain starts from i but never comes back to
i, then N = 0. Also note that N is allowed to be ∞, in case the Markov chain
visits i infinitely many times.) The monotone convergence theorem for conditional
expectation implies that
E(N|X_0 = i) = E\Big( \sum_{n=1}^{∞} 1_{\{X_n = i\}} \,\Big|\, X_0 = i \Big) = \sum_{n=1}^{∞} P(X_n = i | X_0 = i) = \sum_{n=1}^{∞} p_{ii}^{(n)}.  (11.2.1)
Define
p = P(X_n = i for some n ≥ 1 | X_0 = i).
The identity (11.2.1) implies that to prove the theorem, we have to show that p < 1 if E(N|X_0 = i) < ∞ and p = 1 if E(N|X_0 = i) = ∞. To prove this, it suffices to show that
E(N|X_0 = i) = \sum_{k=1}^{∞} p^k.  (11.2.2)
Note that by the monotone convergence theorem for conditional expectation,
E(N|X_0 = i) = E\Big( \sum_{k=1}^{∞} 1_{\{N ≥ k\}} \,\Big|\, X_0 = i \Big) = \sum_{k=1}^{∞} P(N ≥ k | X_0 = i).
Thus, to prove (11.2.2), it suffices to show that
P(N ≥ k | X_0 = i) = p^k.  (11.2.3)
By the definition of p, this holds for k = 1. Let us assume that this holds for k − 1.
For any n ≥ 1 and any j1 , . . . , jn ∈ S such that jn = i and exactly k − 2 of the
other jm ’s are equal to i, let Aj1 ,...,jn be the event {X1 = j1 , . . . , Xn = jn }. It is
not hard to see that the event {N ≥ k − 1} is the union of these events. Moreover,
these events are disjoint. Since {N ≥ k} ⊆ {N ≥ k − 1}, this gives
P(N ≥ k | X_0 = i) = P(\{N ≥ k\} ∩ \{N ≥ k − 1\} | X_0 = i) = \sum P(\{N ≥ k\} ∩ A_{j_1,...,j_n} | X_0 = i) = \sum P(N ≥ k | A_{j_1,...,j_n} ∩ \{X_0 = i\}) P(A_{j_1,...,j_n} | X_0 = i),
where the sum is over all n and j1 , . . . , jn as above. Now, for any m and i1 , . . . , im ,
let Bi1 ,...,im be the event {Xn+1 = i1 , . . . , Xn+m = im }. Using the definition of
Markov chain and the fact that jn = i, it is easy to see that
P(B_{i_1,...,i_m} | A_{j_1,...,j_n} ∩ \{X_0 = i\}) = P(B_{i_1,...,i_m} | X_n = i) = p_{i i_1} p_{i_1 i_2} \cdots p_{i_{m−1} i_m}.
Now, P(N ≥ k | A_{j_1,...,j_n} ∩ \{X_0 = i\}) is just the sum of the above quantity over all choices of m and i_1, . . . , i_m such that i_m = i and i_l ≠ i for l < m. But on the other hand, the sum of p_{i i_1} p_{i_1 i_2} \cdots p_{i_{m−1} i_m} over the same set of m and i_1, . . . , i_m equals p. Thus, for any such j_1, . . . , j_n,
P(N ≥ k | A_{j_1,...,j_n} ∩ \{X_0 = i\}) = p.
Therefore,
P(N ≥ k | X_0 = i) = \sum P(N ≥ k | A_{j_1,...,j_n} ∩ \{X_0 = i\}) P(A_{j_1,...,j_n} | X_0 = i) = p \sum P(A_{j_1,...,j_n} | X_0 = i) = p\, P(N ≥ k − 1 | X_0 = i).
By the induction hypothesis, this equals p · p^{k−1} = p^k, which proves (11.2.3).
Theorem 11.2.2 gives necessary and sufficient conditions for recurrence and
transience for a given state. What if we want such information for all states? For-
tunately, we do not usually have to check separately for each state, as we will see
below.
A state j of a Markov chain is said to be accessible from a state i if p_{ij}^{(n)} > 0 for some n ≥ 1. (Note that we are leaving out n = 0.) A Markov chain on a countable state space is called irreducible if every state j is accessible from every state i. Let us say that a state j is accessible from a state i in n steps if p_{ij}^{(n)} > 0. The following lemma is often useful for checking irreducibility.
P ROOF. Take any two states i and j such that j is accessible from i and i is accessible from j. Thus, p_{ij}^{(l)} > 0 and p_{ji}^{(m)} > 0 for some l and m. By the Chapman–Kolmogorov formula (which continues to hold on countable state spaces),
p_{jj}^{(n)} ≥ p_{ji}^{(m)} p_{ii}^{(n−m−l)} p_{ij}^{(l)}
for any n > m + l. Summing both sides over n, we see that the recurrence of i implies the recurrence of j (by Theorem 11.2.2). Similarly, the recurrence of j implies the recurrence of i. For an irreducible chain, every state is accessible from every other state. Thus, either every state is recurrent or every state is transient.
Because of Proposition 11.2.4, an irreducible chain as a whole may be called
recurrent or transient. In the next section, we will see important examples of recur-
rent and transient chains.
P ROOF. Let rn denote the probability that the chain returns to the origin at
time n, given that it started from the origin at time 0. The number of ways this can
happen is counted as follows. First, suppose that the walk takes n_i steps in the ith coordinate direction (either forward or backward). Here n_1 + \cdots + n_d = n. The number of ways of allocating these steps is
\frac{n!}{n_1! n_2! \cdots n_d!}.
Having associated each step of the walk to a coordinate direction, we now have to decide whether the move is forward or backward. If the walk has to return to the origin at time n, the number of forward steps in any coordinate direction must be equal to the number of backward steps. For direction i, this allocation can be done in \binom{n_i}{n_i/2} ways, provided that n_i is even. Otherwise, this is zero. Combining, we see that the number of paths that return to the origin at time n is
\sum_{n_1, \ldots, n_d \ even,\ n_1 + \cdots + n_d = n} \frac{n!}{n_1! n_2! \cdots n_d!} \prod_{i=1}^{d} \binom{n_i}{n_i/2}.  (11.3.1)
Therefore, r_n equals the above quantity times (2d)^{−n}. When d = 1, this gives
r_{2n} = \binom{2n}{n} 2^{−2n},
for any positive integer n. By Stirling's formula, this is asymptotically
\frac{\sqrt{2π}\,(2n)^{2n+\frac{1}{2}} e^{−2n} 2^{−2n}}{(\sqrt{2π}\, n^{n+\frac{1}{2}} e^{−n})^2} = \frac{1}{\sqrt{πn}}.
In other words,
\lim_{n→∞} \frac{r_{2n}}{(πn)^{−1/2}} = 1.
By the ratio test for summability of a series, this proves that
\sum_{n=1}^{∞} r_{2n} = ∞.
Therefore by Theorem 11.2.2, simple symmetric random walk on Z is recurrent.
Next, let us consider d = 2. By the formula (11.3.1), we get that if n is even, then
r_n = \sum_{k, l \ even,\ k + l = n} \frac{n!\, 4^{−n}}{((k/2)!\,(l/2)!)^2}.
This can be rewritten as
r_{2n} = \sum_{k=0}^{n} \frac{(2n)!\, 4^{−2n}}{(k!\,(n − k)!)^2} = \binom{2n}{n} 4^{−2n} \sum_{k=0}^{n} \binom{n}{k}^2.
Now, since
\binom{n}{k} = \binom{n}{n − k},
it follows that
\sum_{k=0}^{n} \binom{n}{k}^2
is the coefficient of x^n in the expansion of (1 + x)^n (1 + x)^n. But the coefficient of x^n in (1 + x)^{2n} is \binom{2n}{n}. Thus,
\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}.
Therefore,
\sum_{n=1}^{∞} r_{2n} = \sum_{m=1}^{∞} (r_{2dm−2d+2} + r_{2dm−2d+4} + \cdots + r_{2dm}) ≤ \sum_{m=1}^{∞} r_{2dm} ((2d)^{2d−2} + (2d)^{2d−4} + \cdots + 1).
Thus, to prove \sum_{n=1}^{∞} r_{2n} < ∞, it suffices to show that
\sum_{n=1}^{∞} r_{2dn} < ∞.
Take any nonnegative integers k_1, . . . , k_d that sum to dn. Writing k_i! as the product of 1, 2, . . . , k_i, we can write k_1! k_2! \cdots k_d! as a product of integers between 1 and dn, possibly with many repeats. Using the condition k_1 + \cdots + k_d = dn, a little thought shows that in this product we can replace the integers bigger than n by integers less than n in such a way that the end result is exactly equal to (n!)^d. (If some k_i is less than n, it 'falls short' of contributing n − k_i terms to n!. On the other hand, if some k_i is bigger than n, it contributes k_i − n extra terms that are all bigger than n. Since k_1 + \cdots + k_d = dn, the number of missing contributions exactly equals the number of excess terms.) This procedure shows that
k_1! k_2! \cdots k_d! ≥ (n!)^d.
Thus,
\sum_{k_1,...,k_d:\ k_1+\cdots+k_d = dn} \Big( \frac{(dn)!\, d^{−dn}}{k_1! k_2! \cdots k_d!} \Big)^2 ≤ \frac{(dn)!\, d^{−dn}}{(n!)^d} \sum_{k_1,...,k_d:\ k_1+\cdots+k_d = dn} \frac{(dn)!\, d^{−dn}}{k_1! k_2! \cdots k_d!} = \frac{(dn)!\, d^{−dn}}{(n!)^d},
where the last identity was obtained by observing that the sum in the previous line is the sum of a multinomial probability mass function. By Stirling's formula, the above quantity is asymptotic to
\frac{\sqrt{2π}\,(dn)^{dn+\frac{1}{2}} e^{−dn} d^{−dn}}{(\sqrt{2π}\, n^{n+\frac{1}{2}} e^{−n})^d} = \frac{\sqrt{d}}{(2π)^{(d−1)/2}\, n^{(d−1)/2}}.
On the other hand, by our calculations in the one-dimensional case, the term
\[
\binom{2dn}{dn} 2^{-2dn}
\]
in (11.3.2) is asymptotic to $(\pi dn)^{-1/2}$. Thus, $r_{2dn}$ is bounded above by a quantity
that is asymptotic to $C(d)\, n^{-d/2}$, where C(d) is a constant that depends only on d.
Since
\[
\sum_{n=1}^{\infty} n^{-d/2} < \infty
\]
when d ≥ 3, this proves the summability of $r_{2dn}$, and so, by our previous discussion, the summability of $r_{2n}$. Since $r_n = 0$ when n is odd, this completes the proof
of transience of simple symmetric random walk in d ≥ 3.
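The recurrence–transience dichotomy is easy to see numerically. The following Python sketch (an illustration, not part of the original text; the truncation horizon, sample size and seed are arbitrary choices) estimates the probability that simple symmetric random walk on $\mathbb{Z}^d$ returns to the origin within a fixed number of steps, for d = 1, 2, 3. For d = 1, 2 the estimate creeps toward 1 as the horizon grows, while for d = 3 it stabilizes well below 1.

import numpy as np

def estimated_return_probability(d, n_steps=10_000, n_walks=2_000, seed=0):
    """Monte Carlo estimate of P(SRW on Z^d returns to 0 within n_steps)."""
    rng = np.random.default_rng(seed)
    returned = 0
    for _ in range(n_walks):
        coords = rng.integers(0, d, size=n_steps)      # coordinate direction of each step
        signs = rng.choice((-1, 1), size=n_steps)      # forward or backward
        steps = np.zeros((n_steps, d), dtype=np.int64)
        steps[np.arange(n_steps), coords] = signs
        path = steps.cumsum(axis=0)                    # positions after each step
        if np.any(~path.any(axis=1)):                  # some position equals the origin
            returned += 1
    return returned / n_walks

for d in (1, 2, 3):
    print(f"d = {d}: estimated return probability {estimated_return_probability(d):.3f}")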
\[
\mu^{(n)} = \mu P^n.
\]
But $\mu P = \mu$. Thus, $\mu^{(n)} = \mu P^n = \mu$ for any n. The following result shows that
a time-homogeneous Markov chain on a finite state space always has at least one
invariant distribution.
\[
\mu_n = \frac{\mu_0 + \mu_0 P + \mu_0 P^2 + \cdots + \mu_0 P^n}{n+1}.
\]
It is an easy consequence of the Chapman–Kolmogorov formula (as observed
above) that µ0 P k is the distribution of Xk if X0 ∼ µ0 . Since the average of a
set of probability distributions is again a probability distribution, µn must be a probability distribution for each n.
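The averaging construction above is easy to carry out numerically. Here is a minimal Python sketch (illustrative only; the 3-state transition matrix and starting distribution are arbitrary assumptions) that computes µn for a few n and checks that the limit is invariant.

import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])   # an arbitrary stochastic matrix
mu0 = np.array([1.0, 0.0, 0.0])   # arbitrary starting distribution

def cesaro_average(mu0, P, n):
    """mu_n = (mu0 + mu0 P + ... + mu0 P^n) / (n + 1)."""
    total = np.zeros_like(mu0)
    current = mu0.copy()
    for _ in range(n + 1):
        total += current
        current = current @ P     # next term mu0 P^{k+1}
    return total / (n + 1)

for n in (10, 100, 1000):
    mu_n = cesaro_average(mu0, P, n)
    print(n, mu_n, "invariance gap:", np.abs(mu_n @ P - mu_n).max())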
P ROOF. Define
\[
Q = \frac{P^k - \varepsilon M}{1 - \varepsilon}.
\]
Since $P^k$ and M are both stochastic matrices, it follows that each row of Q
sums to 1. Moreover, by the Doeblin condition, the entries of Q are nonnegative.
Thus, Q is a stochastic matrix. Now, note that
\[
P^k = (1 - \varepsilon)Q + \varepsilon M.
\]
Also note that since µ is a probability distribution, µP = µ, and P is a stochastic
matrix, we have the identities
\[
M^2 = M, \qquad MP = M, \qquad PM = M.
\]
This implies that QM = M Q = M . Consequently, any product of a sequence
of Q’s and M ’s that contains at least one M must be equal to M . Thus, for any
m ≥ 1, the distributive law for matrix products gives
\[
\begin{aligned}
P^{km} &= \bigl((1-\varepsilon)Q + \varepsilon M\bigr)^m\\
&= (1-\varepsilon)^m Q^m + \sum_{j=1}^{m} \binom{m}{j} (1-\varepsilon)^{m-j}\varepsilon^j M\\
&= (1-\varepsilon)^m Q^m + \bigl(1 - (1-\varepsilon)^m\bigr) M.
\end{aligned}
\]
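Since the entries of $Q^m$ stay in [0, 1], the identity just derived shows that $P^{km} \to M$ geometrically fast in m. A quick numerical sanity check of the decomposition in Python (illustrative only; the matrix, the choice k = 1, and the construction of µ and ε below are assumptions for the example):

import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])            # all entries positive, so Doeblin holds with k = 1

# invariant distribution: left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

M = np.tile(mu, (3, 1))                    # matrix whose rows all equal mu
eps = (P / M).min()                        # largest eps with P >= eps * M entrywise
Q = (P - eps * M) / (1 - eps)

for m in (1, 2, 5, 10):
    Pm = np.linalg.matrix_power(P, m)
    rhs = (1 - eps) ** m * np.linalg.matrix_power(Q, m) + (1 - (1 - eps) ** m) * M
    # the decomposition error should be ~0, and the distance to M should shrink like (1-eps)^m
    print(m, np.abs(Pm - rhs).max(), np.abs(Pm - M).max())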
P ROOF. Take any state i. By Lemma 11.4.4, there is some m such that $p_{ii}^{(n)} > 0$
for all n ≥ m. By irreducibility and the finiteness of the state space, there is some
k ≥ 1 such that any j is accessible from i in less than k steps. Let m0 = m + k.
Take any n ≥ m0 and any state j. Then there is a number r < k such that j
is accessible from i in r steps. Also, since n − r > n − k ≥ m, the state i is
accessible from itself in n − r steps. Therefore by Lemma 11.2.3, j is accessible
from i in n steps. Thus, we have shown that any state is accessible from state i
in n steps whenever n is sufficiently large. From this it follows that when n is
sufficiently large, $p_{ij}^{(n)} > 0$ for all i and j, which is what we wanted to prove.
By the finiteness of the state space, Lemma 11.4.5 immediately yields the fol-
lowing corollary, because if the entries of P n are all strictly positive, then for any
given matrix M , there is some ε > 0 such that $P^n \ge \varepsilon M$.
C OROLLARY 11.4.6. Any irreducible aperiodic Markov chain on a finite state
space satisfies the Doeblin condition.
It is now easy to finish the proof of Theorem 11.4.3. By Theorem 11.4.1, there
is at least one invariant distribution µ. Let M be the matrix whose rows are all
equal to µ. By the above corollary, P n → M as n → ∞. To prove the uniqueness
of the invariant distribution, let ν be another invariant distribution. Then on the one
hand, νP n = ν for every n. On the other hand,
lim νP n = νM = µ,
n→∞
Since this holds for any σ, X1 is again uniformly distributed on Sn . Thus, the uni-
form distribution is an invariant distribution for this chain. The claim now follows
by the uniqueness of the invariant distribution.
distribution is given by
\[
\mu_v = \frac{d_v}{\sum_{u\in V} d_u}.
\]
Let us verify this by direct calculation. First, note that this is indeed a probability
distribution, since $\mu_v \ge 0$ for all v and $\sum_{v\in V}\mu_v = 1$. Next, let $p_{uv}$ be the
probability of transition from u to v. That is,
\[
p_{uv} = \begin{cases} p & \text{if } u = v,\\ (1-p)/d_u & \text{if } v \text{ is a neighbor of } u,\\ 0 & \text{otherwise.} \end{cases}
\]
Then, for any $v \in V$,
\[
\sum_{u\in V} \mu_u p_{uv} = p\mu_v + (1-p)\,\frac{d_v}{\sum_{w\in V} d_w} = p\mu_v + (1-p)\mu_v = \mu_v.
\]
Thus, µ is indeed the unique invariant distribution for this Markov chain.
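As a concrete check, the following Python sketch (illustrative; the small graph and the value p = 0.1 are arbitrary assumptions) builds the lazy random walk on a graph and verifies numerically that $\mu_v = d_v/\sum_u d_u$ is invariant.

import numpy as np

# adjacency of a small undirected graph (assumed for illustration)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n, p = 4, 0.1

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
deg = A.sum(axis=1)

# transition matrix: stay put with probability p, else move to a uniform neighbor
P = p * np.eye(n) + (1 - p) * A / deg[:, None]

mu = deg / deg.sum()                              # claimed invariant distribution
print("max |mu P - mu| =", np.abs(mu @ P - mu).max())   # should be ~0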
CHAPTER 12
Weak convergence on Polish spaces
12.1. Definition
Let S be a Polish space with metric ρ. The notions of almost sure convergence,
convergence in probability and Lp convergence remain the same, with |Xn − X|
replaced by ρ(Xn , X). Convergence in distribution is a bit more complicated,
since cumulative distribution functions do not make sense in a Polish space. It
turns out that the right way to define convergence in distribution on Polish spaces
is to generalize the equivalent criterion given in Proposition 8.7.1.
D EFINITION 12.1.1. Let (S, ρ) be a Polish space. A sequence Xn of S-valued
random variables is said to converge weakly to an S-valued random variable X if
for any bounded continuous function f : S → R,
\[
\lim_{n\to\infty} Ef(X_n) = Ef(X).
\]
Just as on the real line, the law of an S-valued random variable X is the prob-
ability measure µX on S defined as
µX (A) := P(X ∈ A).
Of course, here the σ-algebra on S is the Borel σ-algebra generated by its topology.
By the following exercise, a sequence of random variables on S converges weakly if and only if their laws converge weakly.
E XERCISE 12.1.2. Prove that the assertion of Exercise 6.1.1 holds on Polish
spaces.
For any n, the Euclidean space Rn with the usual Euclidean metric is an exam-
ple of a Polish space. The following exercises give some other examples of Polish
spaces.
E XERCISE 12.1.3. Let C[0, 1] be the space of all continuous functions from
[0, 1] into R, with the metric
\[
\rho(f, g) := \sup_{x\in[0,1]} |f(x) - g(x)|.
\]
Prove that this is a Polish space. (Often the distance ρ(f, g) is denoted by kf −gk∞
or kf − gk[0,1] or kf − gksup , and is called the sup-metric.)
E XERCISE 12.1.4. Let C[0, ∞) be the space of all continuous functions from
[0, ∞) into R, with the metric
\[
\rho(f, g) := \sum_{j=0}^{\infty} 2^{-j}\, \frac{\|f-g\|_{[j,j+1]}}{1 + \|f-g\|_{[j,j+1]}},
\]
where
\[
\|f-g\|_{[j,j+1]} := \sup_{x\in[j,j+1]} |f(x) - g(x)|.
\]
Prove that this is a Polish space. Moreover, show that fn → f on this space if and
only if fn → f uniformly on compact sets.
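To get a feel for this metric, here is a small Python sketch (illustrative only; the grid resolution and truncation level J are arbitrary) that approximates ρ(f, g) by evaluating f and g on a fine grid of each interval [j, j+1] and truncating the series; the discarded tail contributes at most $2^{-(J-1)}$.

import numpy as np

def rho(f, g, J=30, pts_per_unit=1000):
    """Approximate the metric on C[0, infinity) by truncating the series at J terms."""
    total = 0.0
    for j in range(J):
        x = np.linspace(j, j + 1, pts_per_unit)
        sup = np.max(np.abs(f(x) - g(x)))        # ||f - g|| on [j, j+1], on a grid
        total += 2.0 ** (-j) * sup / (1.0 + sup)
    return total

print(rho(np.sin, np.cos))
print(rho(np.exp, np.exp))   # distance from a function to itself is 0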
E XERCISE 12.1.5. Generalize the above exercise to Rn -valued continuous
functions on [0, ∞).
E XERCISE 12.1.6. If X is a C[0, 1]-valued random variable, show that for any
t ∈ [0, 1], X(t) is a real-valued random variable (that is, it’s a measurable map
from the sample space into the real line).
E XERCISE 12.1.7. If X is a C[0, 1]-valued random variable defined on a prob-
ability space (Ω, F, P), show that the map (ω, t) 7→ X(ω)(t) from Ω × [0, 1] into
R is measurable with respect to the product σ-algebra. (Hint: Approximate X by
a sequence of piecewise linear random functions, and use the previous exercise.)
E XERCISE 12.1.8. If X is a C[0, 1]-valued random variable, show that for any
bounded measurable f : R → R, $\int_0^1 f(X(t))\,dt$ is a real-valued random variable.
Also, show that the map t 7→ Ef (X(t)) is measurable. (Hint: Use Exercise 12.1.7
and the measurability assertion from Fubini’s theorem.)
E XERCISE 12.1.9. Let X be a C[0, 1]-valued random variable. Prove that
max0≤t≤1 X(t) is a random variable.
(a) $\mu_n \to \mu$ weakly.
(b) $\int f\, d\mu_n \to \int f\, d\mu$ for every bounded and uniformly continuous f .
(c) $\int f\, d\mu_n \to \int f\, d\mu$ for every bounded and Lipschitz continuous f .
(d) For every closed set F ⊆ S,
\[
\limsup_{n\to\infty} \mu_n(F) \le \mu(F).
\]
(e) For every open set V ⊆ S,
\[
\liminf_{n\to\infty} \mu_n(V) \ge \mu(V).
\]
(f) For every Borel set A ⊆ S such that µ(∂A) = 0 (where ∂A denotes the
topological boundary of A),
\[
\lim_{n\to\infty} \mu_n(A) = \mu(A).
\]
(g) $\int f\, d\mu_n \to \int f\, d\mu$ for every bounded measurable f : S → R that is
continuous a.e. with respect to the measure µ.
P ROOF. It is clear that (a) =⇒ (b) =⇒ (c). Suppose that (c) holds. Take any
closed set F ⊆ S. Define the function
\[
f(x) = \rho(x, F) := \inf_{y\in F} \rho(x, y). \tag{12.2.1}
\]
Note that for any $x, x' \in S$ and y ∈ F , the triangle inequality gives $\rho(x, y) \le \rho(x, x') + \rho(x', y)$. Taking infimum over y gives $f(x) \le \rho(x, x') + f(x')$. Similarly,
$f(x') \le \rho(x, x') + f(x)$. Thus,
\[
|f(x) - f(x')| \le \rho(x, x'). \tag{12.2.2}
\]
In particular, f is a Lipschitz continuous function. Since F is closed, it is easy to
see that f (x) = 0 if and only if x ∈ F . Thus, if we define gk (x) := (1 − kf (x))+ ,
then gk is Lipschitz continuous, takes values in [0, 1], 1F ≤ gk everywhere, and
gk → 1F pointwise as k → ∞. Therefore by (c), for every k,
\[
\limsup_{n\to\infty} \mu_n(F) \le \limsup_{n\to\infty} \int g_k\, d\mu_n = \int g_k\, d\mu.
\]
Letting k → ∞, the dominated convergence theorem gives $\int g_k\, d\mu \to \mu(F)$,
which proves (d). The implication (d) =⇒ (e) follows simply by recalling that
open sets are complements of closed sets. Suppose that (d) and (e) both hold. Take
any A such that µ(∂A) = 0. Let F be the closure of A and V be the interior of A.
Then F is closed, V is open, V ⊆ A ⊆ F , and
µ(F ) − µ(V ) = µ(F \ V ) = µ(∂A) = 0.
Thus, by (d) and (e),
\[
\limsup_{n\to\infty}\mu_n(A) \le \limsup_{n\to\infty}\mu_n(F) \le \mu(F) = \mu(A), \qquad
\liminf_{n\to\infty}\mu_n(A) \ge \liminf_{n\to\infty}\mu_n(V) \ge \mu(V) = \mu(A),
\]
which proves (f). Next, suppose that (f) holds. Take any f as in (g), and let Df
be the set of discontinuity points of f . Since f is bounded, we may apply a linear
transformation and assume without loss of generality that f takes values in [0, 1].
For t ∈ [0, 1], let At := {x : f (x) ≥ t}. Then by Fubini’s theorem, for any
probability measure ν on S,
\[
\int_S f(x)\,d\nu(x) = \int_S \int_0^1 1_{\{f(x)\ge t\}}\,dt\,d\nu(x) = \int_0^1 \int_S 1_{\{f(x)\ge t\}}\,d\nu(x)\,dt = \int_0^1 \nu(A_t)\,dt.
\]
Now take any t and x ∈ ∂At . Then there is a sequence yn → x such that yn ∉ At
for each n, and there is a sequence zn → x such that zn ∈ At for each n. Thus
if x ∉ Df , then f (x) = t. Consequently, ∂At ⊆ Df ∪ {x : f (x) = t}. Since
µ(Df ) = 0, this shows that µ(∂At ) can be strictly positive only if µ({x : f (x) = t}) > 0.
But µ({x : f (x) = t}) can be strictly positive for only countably many t. Thus,
µ(∂At ) = 0 for all but countably many t. Therefore by (f) and the a.e. version of
the dominated convergence theorem (Exercise 2.6.5),
\[
\lim_{n\to\infty} \int f\, d\mu_n = \lim_{n\to\infty} \int_0^1 \mu_n(A_t)\,dt = \int_0^1 \mu(A_t)\,dt = \int f\, d\mu,
\]
proving (g). Finally, if (g) holds then (a) is obvious.
An important corollary of the portmanteau lemma is the following.
C OROLLARY 12.2.2. Let S be a Polish space. If µ and ν are two probability
measures on S such that $\int f\, d\mu = \int f\, d\nu$ for every bounded continuous f : S → R,
then µ = ν.
Exercise 12.2.6 given below requires an application of criterion (g) from the
portmanteau lemma. Although criterion (g) implicitly assumes that the set of dis-
continuity points is a measurable set of measure zero, it is often not trivial to prove
measurability of the set of discontinuity points. The following exercise is helpful.
E XERCISE 12.2.5. Let (S, ρ) be a metric space, and let f : S → R be any
function. Prove that the set of continuity points of f is measurable. (Hint: For any
open set U , define $d(U) := \sup_{x\in U} f(x) - \inf_{x\in U} f(x)$. Let $V_\varepsilon$ be the union of
all open U with $d(U) < \varepsilon$. Show that the set of continuity points of f is exactly
$\cap_{n\ge1} V_{1/n}$.)
E XERCISE 12.2.6. In the setting of the previous exercise, suppose further that
P(X(t) = 0) = 0 for a.e. t ∈ [0, 1]. Then show that
\[
\int_0^1 1_{\{X_n(t)\ge 0\}}\,dt \;\xrightarrow{d}\; \int_0^1 1_{\{X(t)\ge 0\}}\,dt.
\]
(Hint: Use criterion (g) from the portmanteau lemma.)
E XERCISE 12.2.7. In the previous exercise, give a counterexample to show
that the conclusion may not be valid without the condition that P(X(t) = 0) = 0
for a.e. t ∈ [0, 1].
E XERCISE 12.2.8. Suppose that S and S 0 are Polish spaces and h : S → S 0
is a measurable map. Let X1 , X2 , . . . be a sequence of S-valued random variables
converging weakly to an S-valued random variable X. Let D be the set of dis-
continuity points of h. If P(X ∈ D) = 0, show that h(X1 ), h(X2 ), . . . converge
weakly to h(X).
for each n. By the almost sure version of the dominated convergence theorem
(Exercise 2.6.5), E(f (Xn )1A ) → E(f (X)1A ) and E(f (Xn )) → E(f (X)) as
n → ∞. Thus, E(f (X)1A ) = E(f (X))P(A).
Now, for any bounded continuous g, h : R → R, the composition g ◦ f is a
bounded continuous function from S into R, and the random variable h(1A ) is just
a linear transformation of 1A . Thus, by the above deduction, E(g(f (X))h(1A )) =
E(g(f (X)))E(h(1A )). So by Corollary 12.7.4, the random variables f (X) and 1A
are independent.
Now take any closed set F ⊆ S, and let f (x) := min{ρ(x, F ), 1}, where
ρ(x, F ) is the distance from x to F as defined in (12.2.1). We have already seen
in the proof of the Portmanteau lemma that x 7→ ρ(x, F ) is continuous. Thus, f
is a continuous map. Therefore f (X) and 1A are independent. In particular, the
events {f (X) = 0} and A are independent. But f (X) = 0 if and only if X ∈ F .
Thus, the events {X ∈ F } and A are independent. Since this holds for every
closed F , the random variable X is independent of the event A (for example, by
Exercise 7.1.9). And since this holds for all A ∈ G, X and G are independent.
P ROOF. Let (S, ρ) be a Polish space and let $\{\mu_n\}_{n=1}^\infty$ be a sequence of probability measures on S that converge weakly to some µ. First, we claim that if
$\{V_i\}_{i=1}^\infty$ is any increasing sequence of open sets whose union is S, then for each
ε > 0 there is some i such that $\mu_n(V_i) > 1 - \varepsilon$ for all n. If not, then for each i
there exists $n_i$ such that $\mu_{n_i}(V_i) \le 1 - \varepsilon$. There cannot exist k such that $n_i = k$ for
infinitely many i, because then $\mu_k(V_i) \to 1$ as i → ∞. Thus, $n_i \to \infty$ as i → ∞.
Therefore by the portmanteau lemma, for each i,
\[
\mu(V_i) \le \liminf_{j\to\infty} \mu_{n_j}(V_i) \le \liminf_{j\to\infty} \mu_{n_j}(V_j) \le 1 - \varepsilon.
\]
Since $V_i$ increases to S, the left side tends to 1 as i → ∞, a contradiction. This proves the claim.
E XERCISE 12.4.3. Using Theorem 12.4.2, prove that any probability measure
on a Polish space is regular, in the sense of Lemma 3.4.2.
E XERCISE 12.4.4. Using the previous exercise, prove the analogue of Kol-
mogorov’s extension theorem (Theorem 3.4.1) for Polish spaces.
E XERCISE 12.4.5. Using the previous exercise, prove the existence of Markov
chains with any given kernel on a Polish space (i.e., the analogue of Theorem
11.1.3).
Helly’s selection theorem generalizes to Polish spaces. The generalization is
known as Prokhorov’s theorem.
T HEOREM 12.4.6 (Prokhorov's theorem). Let (S, ρ) be a Polish space. Suppose that $\{\mu_n\}_{n=1}^\infty$ is a tight family of probability measures on S. Then there is a
subsequence $\{\mu_{n_k}\}_{k=1}^\infty$ converging weakly to a probability measure µ as k → ∞.
There are various proofs of Prokhorov’s theorem. The proof given below is a
purely measure-theoretic argument. There are other proofs that are more functional
analytic in nature. In the proof below, the measure µ is constructed using the
technique of outer measures, as follows.
Let K1 ⊆ K2 ⊆ · · · be a sequence of compact sets such that µn (Ki ) ≥ 1−1/i
for all n. Such a sequence exists because the family {µn }∞ n=1 is tight. Let D be a
countable dense subset of S, which exists because S is separable. Let B be the set
of all closed balls with centers at elements of D and nonzero rational radii. Then
B is a countable collection. Let C be the collection of all finite unions of sets that
are of the form B ∩ Ki for some B ∈ B and some i. (In particular, C contains the
empty union, which equals ∅.) Then C is also countable. By the standard diagonal
argument, there is a subsequence $\{n_k\}_{k=1}^\infty$ such that
\[
\alpha(C) := \lim_{k\to\infty} \mu_{n_k}(C)
\]
P ROOF. This is obvious from the definition of α and the properties of mea-
sures.
L EMMA 12.4.8. If F is a closed set, V is an open set containing F , and some
C ∈ C also contains F , then there exists D ∈ C such that F ⊆ D ⊆ V .
P ROOF. Take any two open sets V1 and V2 , and any C ∈ C such that C ⊆
V1 ∪ V2 . Define
F1 := {x ∈ C : ρ(x, V1c ) ≥ ρ(x, V2c )},
F2 := {x ∈ C : ρ(x, V2c ) ≥ ρ(x, V1c )},
where ρ(x, A) is defined as in (12.2.1) for any A. It is not hard to see (using
(12.2.2), for instance), that x 7→ ρ(x, A) is a continuous map for any A. Therefore
the sets F1 and F2 are closed. Moreover, if x ∉ V1 and x ∈ F1 , then the definition of F1 implies that x ∉ V2 , which is impossible since F1 ⊆ C ⊆ V1 ∪ V2 .
Thus, F1 ⊆ V1 . Similarly, F2 ⊆ V2 . Moreover F1 and F2 are both subsets of C.
Therefore by Lemma 12.4.8, there exist C1 , C2 ∈ C such that F1 ⊆ C1 ⊆ V1 and
F2 ⊆ C2 ⊆ V2 . But then C = F1 ∪F2 ⊆ C1 ∪C2 , and therefore by Lemma 12.4.7,
α(C) ≤ α(C1 ) + α(C2 ) ≤ β(V1 ) + β(V2 ).
Taking supremum over C completes the proof.
L EMMA 12.4.10. The functional β is countably subadditive on open sets.
P ROOF. It is clear from the definition that µ is monotone and satisfies µ(∅) =
0. We only need to show that µ is countably subadditive. Take any sequence of sets A1 , A2 , . . .
contained in S and let A be their union. Take any ε > 0. For each i, let Vi be an
open set containing Ai such that $\beta(V_i) \le \mu(A_i) + \varepsilon 2^{-i}$. Then by Lemma 12.4.10,
\[
\mu(A) \le \beta\Bigl(\bigcup_{i=1}^{\infty} V_i\Bigr) \le \sum_{i=1}^{\infty}\beta(V_i) \le \sum_{i=1}^{\infty}\bigl(\mu(A_i) + \varepsilon 2^{-i}\bigr) = \varepsilon + \sum_{i=1}^{\infty}\mu(A_i).
\]
Since ε is arbitrary, this completes the proof.
L EMMA 12.4.12. For any open V and closed F ,
β(V ) ≥ µ(V ∩ F ) + µ(V ∩ F c ).
where the second inequality holds because µ is monotone. Now taking infimum
over V shows that F ∈ F. Therefore F contains the Borel σ-algebra. To prove
(b), take any i and observe that by the compactness of Ki , Ki can be covered by
finitely many elements of B. Consequently, Ki itself is an element of C. Therefore
\[
\mu(S) = \beta(S) \ge \alpha(K_i) \ge 1 - \frac{1}{i}.
\]
Since this holds for all i, we get µ(S) ≥ 1. On the other hand, it is clear from the
definition of µ that µ(S) ≤ 1. Thus, µ is a probability measure. Finally, to prove
(c), notice that for any open set V and any C ∈ C, C ⊆ V ,
E XERCISE 12.4.14. Let S be the set of all functions from Z into R. Equip S
with the topology of pointwise convergence. Prove that this topology is metrizable,
and under a compatible metric, it is a Polish space. If (Xi )i∈Z is a sequence of real-
valued random variables defined on some probability space, show that the whole
sequence can be viewed as an S-valued random variable. If (Xn )n≥1 is a sequence
of S-valued random variables, where Xn = (Xn,i )i∈Z , prove that it is tight if and
only if for each i ∈ Z, the sequence of real-valued random variables (Xn,i )n≥1 is
tight.
where B(S) is the Borel σ-algebra of S. Prove that d is a metric on the set of
probability measures on S, and µn → µ weakly if and only if d(µn , µ) → 0.
(Thus, d metrizes weak convergence of probability measures.)
P ROOF. Let B(x, r) denote the closed ball of radius r centered at a point x
in S. Since ∂B(x, r) and ∂B(x, s) are disjoint for any distinct r and s, it follows
that for any x, there can be only countably many r such that µ(∂B(x, r)) > 0. In
particular, for each x we can find $r_x \in (0, \varepsilon/2)$ such that $\mu(\partial B(x, r_x)) = 0$. Then
the interiors of the balls $B(x, r_x)$, x ∈ S, form an open cover of S. Since S is
a separable metric space, it has the property that any open cover has a countable
subcover (the Lindelöf property). Thus, there exist x1 , x2 , . . . such that
\[
S = \bigcup_{i=1}^{\infty} B(x_i, r_{x_i}).
\]
Now choose n so large that
\[
\mu(S) - \mu\Bigl(\bigcup_{i=1}^{n} B(x_i, r_{x_i})\Bigr) < \varepsilon.
\]
i=1
Let Bi := B(xi , rxi ) for i = 1, . . . , n. Define A1 = B1 and
Ai = Bi \ (B1 ∪ · · · ∪ Bi−1 )
for 2 ≤ i ≤ n. Finally, let A0 := S \ (B1 ∪ · · · ∪ Bn ). Then by our choice of
n, µ(A0 ) < ε. By construction, A0 is open and diam(Ai ) ≤ diam(Bi ) ≤ ε for
1 ≤ i ≤ n. Finally, note that ∂Ai ⊆ ∂B1 ∪ · · · ∪ ∂Bi for 1 ≤ i ≤ n, which shows
that µ(∂Ai ) = 0 because µ(∂Bj ) = 0 for every j. Lastly, we merge into A0 those
Ai for which µ(Ai ) = 0, so that µ(Ai ) > 0 for all i that remain.
P ROOF OF T HEOREM 12.5.1. For each j ≥ 1, choose sets $A_0^j, \ldots, A_{k_j}^j$ satisfying the conditions of Lemma 12.5.2 with $\varepsilon = 2^{-j}$ and µ as in the statement of
Theorem 12.5.1.
Next, for each j, find nj such that if n ≥ nj , then
\[
\mu_{n,i}(A) := \frac{\mu_n(A \cap A_i^j)}{\mu_n(A_i^j)}
\]
P ROOF. Let (S, ρ) be a Polish space, and suppose that $\{X_n\}_{n=1}^\infty$ is a sequence
of S-valued random variables converging to a random variable X in probability.
Take any bounded uniformly continuous function f : S → R. Take any ε > 0.
Then there is some δ > 0 such that |f (x) − f (y)| ≤ ε whenever ρ(x, y) ≤ δ. Thus,
\[
P(|f(X_n) - f(X)| > \varepsilon) \le P(\rho(X_n, X) > \delta),
\]
which shows that f (Xn ) → f (X) in probability. Since f is bounded, Proposition
8.2.9 shows that f (Xn ) → f (X) in L1 . In particular, Ef (Xn ) → Ef (X). Thus,
Xn → X in distribution.
P ROPOSITION 12.6.2. Let (S, ρ) be a Polish space. A sequence of S-valued
random variables {Xn }∞ n=1 converges to a constant c ∈ S in probability if and
only if Xn → c in distribution.
P ROOF. There are many metrics that metrize the product topology on S × S.
For example, we can use the metric
d((x, y), (w, z)) := ρ(x, w) + ρ(y, z).
Suppose that f : S ×S → R is a bounded and uniformly continuous function. Take
any ε > 0. Then there is some δ > 0 such that |f (x, y) − f (w, z)| ≤ ε whenever
d((x, y), (w, z)) ≤ δ. Then
\[
P(|f(X_n, Y_n) - f(X_n, c)| > \varepsilon) \le P(\rho(Y_n, c) > \delta),
\]
which implies that f (Xn , Yn )−f (Xn , c) → 0 in probability. By Proposition 8.2.9,
this shows that f (Xn , Yn ) − f (Xn , c) → 0 in L1 . In particular, Ef (Xn , Yn ) −
Ef (Xn , c) → 0. On the other hand, x 7→ f (x, c) is a bounded continuous function
on S, and so Ef (Xn , c) → Ef (X, c). Thus, Ef (Xn , Yn ) → Ef (X, c). By the
portmanteau lemma, this completes the proof.
Just like Corollary 8.8.2, the above theorem has the following corollary about
random vectors.
C OROLLARY 12.7.2. Two random vectors have the same law if and only if
they have the same characteristic function.
P ROOF. Same as the proof of Corollary 8.8.2, using Theorem 12.7.1 and Corol-
lary 12.2.2 instead of Theorem 8.8.1 and Proposition 8.7.1.
P ROOF. If X and Y are independent, we know that by the product rule for ex-
pectation, E(f (X)g(Y )) = E(f (X))E(g(Y )) for any bounded continuous f, g :
R → R. Conversely, suppose that this criterion holds. Considering real and imag-
inary parts, we can assume that the criterion holds even if f and g are complex
valued. Then taking f (x) = eisx and g(y) = eity for arbitrary s, t ∈ R, we get that
the characteristic function of (X, Y ) is the product of the characteristic functions
of X and Y . Therefore by Corollary 12.7.3, X and Y are independent.
E XERCISE 12.7.5. Prove a multivariate version of Corollary 8.8.7.
E XERCISE 12.7.6. Let X be a real-valued random variable defined on a prob-
ability space (Ω, F, P) and let G be a sub-σ-algebra of F. Let φ be a characteristic
function. Prove that X is independent of G and has characteristic function φ if and
only if E(eitX ; A) = φ(t)P(A) for all t ∈ R and all A ∈ G. (Hint: Show that X
and 1A are independent random variables for all A ∈ G.)
P ROOF. If $X_n \xrightarrow{d} X$, then it follows from the definition of weak convergence
that the characteristic functions converge. Conversely, suppose that φXn (t) →
φX (t) for all t ∈ Rm , where m is the dimension of the random vectors. Let
Xn1 , . . . , Xnm denote the coordinates of Xn . Similarly, let X 1 , . . . , X m be the co-
ordinates of X. Then note that the pointwise convergence of φXn to φX automat-
ically implies the pointwise convergence of φXni to φX i for each i. Consequently
by Lévy’s continuity theorem, Xni → X i in distribution for each i. In particular,
for each i, $\{X_n^i\}_{n=1}^\infty$ is a tight family of random variables. Thus, given any ε > 0,
there is some $K^i > 0$ such that $P(X_n^i \in [-K^i, K^i]) \ge 1 - \varepsilon/m$ for all n. Let
$K = \max\{K^1, \ldots, K^m\}$, and let R be the cube $[-K, K]^m$. Then for any n,
\[
P(X_n \notin R) \le \sum_{i=1}^{m} P(X_n^i \notin [-K, K]) \le \varepsilon.
\]
P ROOF. If $X_n \xrightarrow{d} X$, then $Ef(t\cdot X_n) \to Ef(t\cdot X)$ for every bounded continuous function f : R → R and every $t \in \mathbb{R}^m$. This shows that $t\cdot X_n \xrightarrow{d} t\cdot X$ for
every t. Conversely, suppose that $t\cdot X_n \xrightarrow{d} t\cdot X$ for every t. Then
φXn (t) = E(eit·Xn ) → E(eit·X ) = φX (t).
Therefore, $X_n \xrightarrow{d} X$ by the multivariate Lévy continuity theorem.
CHAPTER 13
Brownian motion
P ROOF. Let us first consider the case of C[0, 1]. Let B be the Borel σ-algebra
of C[0, 1] and F be the σ-algebra generated by the finite dimensional projection
maps. Note that by the definition of F, each projection map πt1 ,...,tn is measurable
from (C[0, 1], F) into Rn . Given f ∈ C[0, 1] and some n and some t1 , . . . , tn , we
can reconstruct an ‘approximation’ of f from πt1 ,...,tn (f ) as the function g which
satisfies g(ti ) = f (ti ) for each i and interpolates linearly when t lies between two
consecutive ti ’s. Define g to be constant to the left of the smallest ti and to the right of the
largest ti , so that continuity is preserved. The map ρt1 ,...,tn that constructs g from
πt1 ,...,tn (f ) is a continuous map from Rn into C[0, 1], and therefore measurable
from Rn into (C[0, 1], B). Thus, if we let
ξt1 ,...,tn := ρt1 ,...,tn ◦ πt1 ,...,tn ,
then ξt1 ,...,tn is measurable from (C[0, 1], F) into (C[0, 1], B). Now let {t1 , t2 , . . .}
be a dense subset of [0, 1] chosen in such a way that
\[
\lim_{n\to\infty} \max_{0\le t\le 1} \min_{1\le i\le n} |t - t_i| = 0. \tag{13.1.1}
\]
(It is easy to construct such a sequence.) Let fn := ξt1 ,...,tn (f ). By the uniform
continuity of f , the construction of fn , and the property (13.1.1) of the sequence
$\{t_n\}_{n=1}^\infty$, it is not hard to show that fn → f in the topology of C[0, 1]. Thus, the
map ξt1 ,...,tn converges pointwise to the identity map as n → ∞. So by Proposi-
tion 2.1.14, it follows that the identity map from (C[0, 1], F) into (C[0, 1], B) is
measurable. This proves that F ⊇ B. We have already observed the opposite in-
clusion in the sentence preceding the statement of the proposition. This completes
the argument for C[0, 1].
For C[0, ∞), the argument is exactly the same, except that we have to replace
the maximum over t ∈ [0, 1] in (13.1.1) with maximum over t ∈ [0, K], and then
impose that the condition should hold for every K. The rest of the argument is left
as an exercise for the reader.
Given any probability measure µ on C[0, 1] or C[0, ∞), the push-forwards of
µ under the projection maps are known as the finite-dimensional distributions of
µ. In the language of random variables, if X is a random variable with law µ,
then the finite-dimensional distributions of µ are the laws of random vectors like
(X(t1 ), . . . , X(tn )), where n and t1 , . . . , tn are arbitrary.
P ROPOSITION 13.1.3. On C[0, 1] and C[0, ∞), a probability measure is de-
termined by its finite-dimensional distributions.
such that |f (t)| ≤ M for all f ∈ F and all t ∈ [0, 1]. Finally, recall the Arzelà–
Ascoli theorem, which says that a closed set F ⊆ C[0, 1] is compact if and only if
it is uniformly bounded and equicontinuous.
P ROPOSITION 13.2.1. Let $\{X_n\}_{n=1}^\infty$ be a sequence of C[0, 1]-valued random
variables. The sequence is tight if and only if the following two conditions hold:
(i) For any ε > 0 there is some a > 0 such that for all n,
\[
P(|X_n(0)| > a) \le \varepsilon.
\]
(ii) For any ε > 0 and η > 0, there is some δ > 0 such that for all large
enough n (depending on ε and η),
\[
P(\omega_{X_n}(\delta) > \eta) \le \varepsilon.
\]
Next, we have to prove tightness. For that, the following maximal inequality is
useful.
L EMMA 13.3.3. Let X1 , X2 , . . . be independent random variables with mean
zero and variance one. For each n, let $S_n := \sum_{i=1}^n X_i$. Then for any n ≥ 1 and
t ≥ 0,
\[
P\Bigl(\max_{1\le i\le n}|S_i| \ge t\sqrt{n}\Bigr) \le 2\, P\bigl(|S_n| \ge (t - \sqrt{2})\sqrt{n}\bigr).
\]
P ROOF. Define
\[
A_i := \Bigl\{\max_{1\le j<i}|S_j| < t\sqrt{n} \le |S_i|\Bigr\},
\]
Plugging this upper bound into the right side of the previous display, we get the
desired result.
C OROLLARY 13.3.4. Let Sn be as in Lemma 13.3.3. Then for any t > 3, there
is some n0 such that for all n ≥ n0 ,
\[
P\Bigl(\max_{1\le i\le n}|S_i| \ge t\sqrt{n}\Bigr) \le 6\, e^{-t^2/8}.
\]
P ROOF. Take any t > 3. Then $t - \sqrt{2} \ge t/2$. Let Z ∼ N (0, 1). Then by the
central limit theorem,
\[
\lim_{n\to\infty} P\bigl(|S_n| \ge (t - \sqrt{2})\sqrt{n}\bigr) = P(|Z| \ge t - \sqrt{2}) \le P(|Z| \ge t/2).
\]
To complete the proof, recall that by Exercise 6.3.5, we have the tail bound $P(|Z| \ge t/2) \le 2e^{-t^2/8}$.
P ROOF. Choose any η, ε, δ > 0. Note that for any 0 ≤ s ≤ t ≤ 1 such that
t − s ≤ δ, we have $\lceil nt\rceil - \lfloor ns\rfloor \le \lceil n\delta\rceil + 2$. From this it is not hard to see that
\[
\omega_{B_n}(\delta) \le \max\Bigl\{\frac{|S_l - S_k|}{\sqrt{n}} : 0 \le k \le l \le n,\ l - k \le \lceil n\delta\rceil + 2\Bigr\}.
\]
Let E be the event that the quantity on the right is greater than η. Let 0 = k0 ≤
k1 ≤ · · · ≤ kr = n satisfy $k_{i+1} - k_i = \lceil n\delta\rceil + 2$ for each 0 ≤ i ≤ r − 2, and
$\lceil n\delta\rceil + 2 \le k_r - k_{r-1} \le 2(\lceil n\delta\rceil + 2)$. Let n be so large that $\lceil n\delta\rceil + 2 \le 2n\delta$.
Suppose that the event E happens, and let k ≤ l be a pair that witnesses that.
Then either ki ≤ k ≤ l ≤ ki+1 for some i, or ki ≤ k ≤ ki+1 ≤ l ≤ ki+2 for some
i. In either case, the following event must happen:
\[
E' := \Bigl\{\max_{k_i \le m\le k_{i+1}} |S_m - S_{k_i}| > \frac{\eta\sqrt{n}}{3} \text{ for some } i\Bigr\},
\]
because if it does not happen, then $|S_l - S_k|$ cannot be greater than $\eta\sqrt{n}$. But by
Corollary 13.3.4, we have for any 0 ≤ i ≤ r − 1,
\[
\begin{aligned}
P\Bigl(\max_{k_i\le m\le k_{i+1}}|S_m - S_{k_i}| > \frac{\eta\sqrt{n}}{3}\Bigr)
&= P\Bigl(\max_{k_i\le m\le k_{i+1}}|S_m - S_{k_i}| > \frac{\eta\sqrt{n}}{3\sqrt{k_{i+1}-k_i}}\,\sqrt{k_{i+1}-k_i}\Bigr)\\
&\le P\Bigl(\max_{k_i\le m\le k_{i+1}}|S_m - S_{k_i}| > \frac{\eta}{3\sqrt{2\delta}}\,\sqrt{k_{i+1}-k_i}\Bigr) \le C_1 e^{-C_2\eta^2/\delta},
\end{aligned}
\]
where C1 and C2 do not depend on n, η or δ, and n is large enough, depending on
η and δ. Thus, for all large enough n,
\[
P(\omega_{B_n}(\delta) > \eta) \le P(E) \le P(E') \le C_1 r e^{-C_2\eta^2/\delta} \le \frac{C_1 e^{-C_2\eta^2/\delta}}{\delta},
\]
where the last inequality holds since r is bounded above by a constant multiple
of 1/δ. Thus, given any η, ε > 0 we can choose δ such that condition (ii) of
Proposition 13.2.1 holds for all large enough n. Condition (i) is automatic. This
completes the proof.
We now have all the ingredients to prove Donsker’s theorem.
P ROOF OF T HEOREM 13.3.1. Note that by Lemma 13.3.5 and Prokhorov’s
theorem, any subsequence of {Bn }∞ n=1 has a weakly convergent subsequence. By
Proposition 13.1.3 and Lemma 13.3.2, any two such weak limits must be the same.
This suffices to prove that the whole sequence converges.
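Donsker's theorem is easy to visualize by simulation. The Python sketch below (an illustration, not part of the original text; the two step distributions and the sample sizes are arbitrary) builds the rescaled walk on the grid points i/n and compares the distribution of its maximum for two very different step distributions: for large n the two empirical distributions should agree, in line with the invariance principle.

import numpy as np

def max_of_Bn(n, n_samples, step_sampler, seed=0):
    """Sample max_{0<=i<=n} S_i / sqrt(n), the maximum of the rescaled walk at grid points."""
    rng = np.random.default_rng(seed)
    X = step_sampler(rng, (n_samples, n))          # i.i.d. steps, mean 0, variance 1
    S = np.cumsum(X, axis=1)
    S = np.concatenate([np.zeros((n_samples, 1)), S], axis=1)   # include S_0 = 0
    return S.max(axis=1) / np.sqrt(n)

rademacher = lambda rng, shape: rng.choice((-1.0, 1.0), size=shape)
uniform = lambda rng, shape: rng.uniform(-np.sqrt(3), np.sqrt(3), size=shape)  # variance 1

a = max_of_Bn(2000, 5000, rademacher)
b = max_of_Bn(2000, 5000, uniform, seed=1)
# Both should be close to the law of max_{0<=t<=1} B(t), which equals that of |N(0,1)|.
print(np.quantile(a, [0.25, 0.5, 0.75]))
print(np.quantile(b, [0.25, 0.5, 0.75]))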
E XERCISE 13.3.6. Let X1 , X2 , . . . be a sequence of i.i.d. random variables
with mean zero and variance one. For each n, let $S_n := \sum_{i=1}^n X_i$, with S0 = 0.
Prove that the random variables
\[
\frac{1}{\sqrt{n}}\max_{0\le i\le n} S_i \qquad\text{and}\qquad \frac{1}{n}\sum_{i=1}^{n} 1_{\{S_i\ge 0\}}
\]
converge in law as n → ∞, and the limiting distributions do not depend on the distribution of the Xi ’s. In fact, the first sequence converges in law to $\max_{0\le t\le1} B(t)$,
and the second sequence converges in law to $\int_0^1 1_{\{B(t)\ge 0\}}\,dt$. (Hint: Use Donsker's
theorem and Exercises 12.2.4 and 12.2.6. The second one is technically more chal-
lenging.)
E XERCISE 13.3.7. Prove a version of Donsker’s theorem for sums of station-
ary m-dependent sequences. (Hint: The key challenge is to generalize Lemma
13.3.3. The rest goes through as in the i.i.d. case, applying the CLT for stationary
m-dependent sequences that we derived earlier.)
(3) Using the above, prove that {Xn }n≥1 is almost surely a Cauchy sequence
in C[0, 1], and therefore has a limit. Prove that the limit is Brownian
motion.
where Z ∼ N (0, 1). Since 2P(Z ≥ x) = P(|Z| ≥ x), we arrive at the following
result.
The proof is now completed by the normal tail bound from Exercise 6.3.5.
E XERCISE 13.5.3. Let B be standard Brownian motion. Given t > 0, figure
out the law of max0≤s≤t B(s).
E XERCISE 13.5.4. Let B be standard Brownian motion. Given a > 0, let
T := inf{t : B(t) ≥ a}. Prove that T is a random variable and that it is finite
almost surely. Then compute the probability density function of T . (Hint: Use the
previous problem.)
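Before proving Exercise 13.5.4, one can check the answer numerically. The Python sketch below (illustrative; the discretization step and the values of a and t are arbitrary, and discretization slightly underestimates the maximum) estimates P(T ≤ t) for T = inf{s : B(s) ≥ a} and compares it with 2P(N(0, t) ≥ a), which is what the reflection principle and the law of the maximum derived above predict.

import numpy as np
from math import erf, sqrt

def hitting_prob_estimate(a, t, n_steps=1000, n_paths=10_000, seed=0):
    """Estimate P(T <= t) for T = inf{s : B(s) >= a} from discretized Brownian paths."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    return np.mean(paths.max(axis=1) >= a)

a, t = 1.0, 1.0
estimate = hitting_prob_estimate(a, t)
# P(T <= t) = P(max_{0<=s<=t} B(s) >= a) = P(|N(0,t)| >= a) = 2 P(N(0,t) >= a)
exact = 2 * (1 - 0.5 * (1 + erf(a / sqrt(2 * t))))
print(estimate, exact)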
P ROOF. Since B(1), B(2) − B(1), B(3) − B(2), . . . are i.i.d. standard normal
random variables, B(n)/n → 0 almost surely as n → ∞ by the strong law of large
numbers. Now if n ≤ t ≤ n + 1, then
\[
\frac{|B(t)|}{t} \le \frac{|B(t)|}{n} \le \frac{|B(n)|}{n} + \frac{1}{n}\max_{n\le s\le n+1}|B(s) - B(n)|.
\]
Therefore it suffices to show that Mn /n → 0 almost surely as n → ∞, where
Mn := maxn≤s≤n+1 |B(s) − B(n)|. Take any n. Let W (t) := B(n + t) − B(n)
Thus,
\[
\sum_{n=1}^{\infty} P(M_n \ge \sqrt{n}) \le \sum_{n=1}^{\infty} 4 e^{-n/2} < \infty.
\]
Prove that Brownian motion on the time interval [0, 1] almost surely has infinite
total variation.
where En is the event that B(t + u) = B(t) for some u ∈ (0, 1/n]. Note that by
the continuity of B,
\[
E_n = \bigcup_{k=n}^{\infty}\{B(t+u) = B(t)\text{ for some }u\in[1/k, 1/n]\}
= \bigcup_{k=n}^{\infty}\bigcap_{j=1}^{\infty}\{|B(t+u) - B(t)| < 1/j\text{ for some }u\in[1/k, 1/n]\cap\mathbb{Q}\}.
\]
This shows that $E_n \in F_{t+1/n}$. Thus, for $n \ge 1/\varepsilon$, $E_n \in F_{t+1/n} \subseteq F_{t+\varepsilon}$. Since
this is true for every ε > 0, we see that {τ = t} ∈ Ft+ .
To see that Ft+ may not be equal to Ft , consider the special case t = 0. Sup-
pose that the probability space on which B is defined is simply C[0, 1] endowed
with its Borel σ-algebra and the Wiener measure, and B is the identity map on this
space. Since B(0) = 0, F0 is the trivial σ-algebra consisting of only Ω and ∅. But
clearly, the event {τ = 0}, as a subset of this space, is neither Ω nor ∅.
Then by the previous paragraph, Wn and Fsn are independent. Since Fs+ ⊆ Fsn ,
we get that Wn and Fs+ are independent for each n. But Wn → W in C[0, ∞).
Therefore by Proposition 12.3.1, W is independent of Fs+ .
An interesting consequence of the Markov property is Blumenthal’s zero-one
law, which says that any event in F0+ must have probability 0 or 1 with respect to
Wiener measure. Indeed, take any A ∈ F0+ . Then due to the Markov property of
B, the event A is independent of σ(B). On the other hand, since F0+ ⊆ σ(B), A
is an element of σ(B). Therefore A is independent of itself, and hence P(A) must
be 0 or 1.
Blumenthal’s zero-one law is a basic fact about Brownian motion, with a num-
ber of important consequences. Let us work out a few of these. First, let
τ + := inf{t > 0 : B(t) > 0},
τ − := inf{t > 0 : B(t) < 0},
τ := inf{t > 0 : B(t) = 0}.
It is easy to see (just as we did in the previous section) that the event {τ + = 0} is
in F0+ . Now for any t > 0,
\[
P(\tau^+ \le t) \ge P(B(t) > 0) = \frac{1}{2}.
\]
Since $\{\tau^+ = 0\}$ is the decreasing limit of $\{\tau^+ \le 1/n\}$ as n → ∞, this shows
that $P(\tau^+ = 0) \ge 1/2$. Thus, by Blumenthal's zero-one law, $P(\tau^+ = 0) = 1$. By
the same argument, P(τ − = 0) = 1. But if τ + = 0 and τ − = 0, then B attains
both positive and negative values in arbitrarily small neighborhoods of 0. By the
continuity of B, this implies that τ = 0. Thus, P(τ = 0) = 1.
Combined with the Markov property, the above result implies the more general
statement that for any given t, inf{s > t : B(s) = B(t)} = t almost surely. Note
that this does not imply that there will not exist any exceptional t for which this
fails in a given realization of B. To see this, note that B(1) 6= 0 with probability 1.
Thus, if we let T := sup{t ∈ [0, 1] : B(t) = 0}, then T < 1 almost surely. Now
note that inf{t > T : B(t) = B(T )} > 1 > T .
Similarly, inf{s > t : B(s) > B(t)} = t and inf{s > t : B(s) < B(t)} = t
almost surely. As a consequence, we get the following result.
P ROPOSITION 13.9.2. With probability one, Brownian motion has no interval
of increase or decrease (that is, no nontrivial interval where Brownian motion is
increasing or decreasing monotonically).
P ROOF. For a given interval [a, b], where a < b, we know from the discussion
preceding the proposition that with probability one, B cannot be monotonically in-
creasing or decreasing in [a, b]. Therefore with probability one, this cannot happen
in any interval with rational endpoints. But if there is a nontrivial interval where B
is monotonically increasing or decreasing, then there is a subinterval with rational
endpoints where this happens.
E XERCISE 13.9.3. Prove that with probability one, every local maximum of
Brownian motion is a strict local maximum. (Hint: First prove that the values of
the maxima in two given disjoint intervals are different almost surely. Then take
intersection over intervals with rational endpoints.)
E XERCISE 13.9.4. Prove that with probability one, the set of local maxima
of Brownian motion is countable and dense. (Hint: Use Proposition 13.9.2 and
Exercise 13.9.3.)
E XERCISE 13.9.5. For any fixed s > 0, show that the random function V (t) :=
B(s − t) − B(s), defined for 0 ≤ t ≤ s, is a standard Brownian motion on the time
interval [0, s]. (This is called ‘time reversal’.)
E XERCISE 13.9.6. Prove that for any s ∈ [0, 1],
\[
P\Bigl(\max_{0\le t\le s} B(t) = \max_{s\le t\le 1} B(t)\Bigr) = 0.
\]
Using this, prove that with probability one, Brownian motion has a unique point of
global maximum in the time interval [0, 1].
E XERCISE 13.9.7. Let T be the unique point at which B(t) attains its global
maximum in the time interval [0, 1]. Compute the probability distribution function
and the probability density function of T . (This is known as the ‘arcsine distri-
bution’. Hint: Try to write P(T ≤ s) as a probability involving the maxima of
two independent Brownian motions. You will have to apply time reversal and the
Markov property.)
P ROOF. Using time inversion, it is not difficult to check that it suffices to prove
the result only for t → ∞. Also, the lim inf result follows by changing B to −B,
so it suffices to prove only for lim sup. Let $h(t) := \sqrt{2t\log\log t}$. Take any δ > 0.
Let $t_n := (1+\delta)^n$. Let An be the event
\[
\sup_{s\in[t_n, t_{n+1}]} \frac{B(s)}{h(s)} > 1 + \delta.
\]
Next, take any δ ∈ (0, 1) and let sn := δ −n . Let Xn := B(sn ) − B(sn−1 ) and let
En be the event Xn ≥ (1 − δ)h(sn ). Let
\[
a_n := \frac{(1-\delta)h(s_n)}{\sqrt{s_n - s_{n-1}}} = \sqrt{2(1-\delta)\log\log(\delta^{-n})}. \tag{13.10.2}
\]
Then by Exercise 6.3.6,
\[
P(E_n) \ge \Bigl(\frac{1}{a_n} - \frac{1}{a_n^3}\Bigr)\frac{e^{-a_n^2/2}}{\sqrt{2\pi}}.
\]
From the expression (13.10.2) for $a_n$, it follows that $\sum_{n=1}^\infty P(E_n) = \infty$. Since
the random variables X1 , X2 , . . . are independent, so are the events E1 , E2 , . . ..
Therefore by the second Borel–Cantelli lemma, En occurs infinitely often with
probability one. In other words, with probability one,
\[
\frac{B(s_n) - B(s_{n-1})}{h(s_n)} \ge 1 - \delta \quad \text{i.o.} \tag{13.10.3}
\]
But by (13.10.1) and the fact that −B is also a standard Brownian motion, we have
that
\[
\limsup_{n\to\infty}\frac{-B(s_{n-1})}{h(s_n)} = \limsup_{n\to\infty}\frac{-B(s_{n-1})}{h(s_{n-1})}\cdot\frac{h(s_{n-1})}{h(s_n)}
= \sqrt{\delta}\,\limsup_{n\to\infty}\frac{-B(s_{n-1})}{h(s_{n-1})} \le \sqrt{\delta}.
\]
Thus, with probability one, $-B(s_{n-1})/h(s_n) \le 2\sqrt{\delta}$ for all sufficiently large n.
Plugging this into (13.10.3), we get that with probability one,
\[
\frac{B(s_n)}{h(s_n)} \ge 1 - \delta - 2\sqrt{\delta} \quad \text{i.o.} \tag{13.10.4}
\]
Since δ > 0 is arbitrary, this shows that $\limsup_{t\to\infty} B(t)/h(t) \ge 1$ a.s.
E XERCISE 13.10.2. Prove that as t → ∞, $B(t)/\sqrt{2t\log\log t} \to 0$ in probability but does not tend to zero almost surely.
By a refinement of the above proof, one can prove the following stronger ver-
sion of the law of the iterated logarithm for Brownian motion. We will use it later
to prove the law of the iterated logarithm for sums of i.i.d. random variables.
T HEOREM 13.10.3. Let B be standard Brownian motion. Then with probabil-
ity one, for all sequences {tn }n≥1 such that tn → ∞ and tn+1 /tn → 1,
\[
\limsup_{n\to\infty}\frac{B(t_n)}{\sqrt{2t_n\log\log t_n}} = 1, \qquad \liminf_{n\to\infty}\frac{B(t_n)}{\sqrt{2t_n\log\log t_n}} = -1.
\]
The same statements hold if we replace tn → ∞ by tn → 0, and log log tn by
log log(1/tn ).
(Note that the crucial point in this theorem is that tn is allowed to be random.)
P ROOF. As before, it suffices to prove only for lim sup and only for tn → ∞.
Take a realization of B where the conclusion of the LIL is satisfied, and let {tn }n≥1
be any sequence such that tn → ∞ and tn+1 /tn → 1. Then immediately from
Theorem 13.10.1 we have
\[
\limsup_{n\to\infty}\frac{B(t_n)}{\sqrt{2t_n\log\log t_n}} \le \limsup_{t\to\infty}\frac{B(t)}{\sqrt{2t\log\log t}} = 1.
\]
The opposite inequality requires a bit more work. First, let δ, sn and En be as in
the second half of the proof of Theorem 13.10.1. Let Dn be the event
\[
\min_{s_n\le t\le s_{n+1}} (B(t) - B(s_n)) \ge -\sqrt{s_n}.
\]
By the Markov property of Brownian motion, it is easy to see that the events En and
Dn are independent. Also, by Proposition 13.5.1, it is easy to check that P(Dn ) ≥
c(δ) > 0 for some constant c(δ) that depends on δ but not on n. Therefore from
our previously obtained lower bound on P(En ), we get
\[
\sum_{n=1}^{\infty} P(E_{2n}\cap D_{2n}) = \infty.
\]
But again, note that En and Dn are determined by the process
Wn (t) := B(sn−1 + t) − B(sn−1 ),
while Ek and Dk are determined by {B(t)}t≤sn−1 for k ≤ n − 2. So by successive
applications of the Markov property, we see that the events {E2n ∩ D2n }n≥1 are
independent. Therefore with probability one, these events occur infinitely often.
Take a realization of B that satisfies the above property for all δ in a sequence
tending to zero. Take any sequence {tn }n≥1 such that tn → ∞ and tn+1 /tn → 1.
For each k, let n = n(k) be the smallest number such that sk ≤ tn < sk+1 . Then
by the above property of the particular realization of B, there exist infinitely many
n such that
\[
B(t_n) \ge (1 - \delta - 2\sqrt{\delta})\,h(s_k) - \sqrt{s_k}.
\]
As k → ∞, we have n = n(k) → ∞, and so tn /tn−1 → 1. This means, in
particular, that tn /sk → 1 as k → ∞, because otherwise tn−1 would be between
sk and tn infinitely often. Thus,
\[
\begin{aligned}
\limsup_{n\to\infty}\frac{B(t_n)}{h(t_n)} &\ge \limsup_{k\to\infty}\frac{B(t_{n(k)})}{h(t_{n(k)})}\\
&\ge \lim_{k\to\infty}\Bigl((1-\delta-2\sqrt{\delta})\,\frac{h(s_k)}{h(t_{n(k)})} - \frac{\sqrt{s_k}}{h(t_{n(k)})}\Bigr) = 1 - \delta - 2\sqrt{\delta}.
\end{aligned}
\]
Since this holds for all δ in a sequence tending to zero, this completes the proof of
the theorem.
Any constant is trivially a stopping time. A slightly less trivial example is the
following.
where both identities hold because B is continuous. The last event is clearly in
Ft , and therefore in Ft+ . By the continuity of B, it also follows that T can be
alternately defined as inf{t ≥ 0 : B(t) ≥ a} if a ≥ 0 and as inf{t ≥ 0 : B(t) ≤ a}
if a ≤ 0.
E XERCISE 13.11.3. If T and S are stopping times for Brownian motion, prove
that min{S, T }, max{S, T } and S + T are also stopping times.
Associated with any stopping time T is a σ-algebra known as the stopped σ-
algebra of T . It is defined as
FT+ := {A ∈ σ(B) : A ∩ {T ≤ t} ∈ Ft+ for all t ≥ 0}.
P ROOF. By Proposition 13.11.5, FT+ ⊆ ∩n≥1 FT+n . To prove the opposite in-
clusion, take any A ∈ ∩n≥1 FT+n . Then for any t ≥ 0,
\[
A\cap\{T < t\} = A\cap\bigcup_{n=1}^{\infty}\bigcap_{m=n}^{\infty}\{T_m < t\}
= \bigcup_{n=1}^{\infty}\bigcap_{m=n}^{\infty}\bigl(A\cap\{T_m < t\}\bigr).
\]
But A ∩ {Tm < t} ∈ Ft+ for each m. Thus, A ∩ {T < t} ∈ Ft+ , and so by
Exercise 13.11.4, A ∈ FT+ .
Using the above results, we now prove a very important fact.
P ROPOSITION 13.11.7. Let T be a stopping time for Brownian motion. Define
the ‘stopped process’
\[
U(t) := \begin{cases} B(t) & \text{if } t \le T,\\ B(T) & \text{if } t > T. \end{cases}
\]
Then U is FT+ -measurable.
P ROOF. First, let us prove the claim under the assumption that T takes values
in a countable set {t1 , t2 , . . .}, which may include ∞. Take any A ∈ FT+ and any
E in the Borel σ-algebra of C[0, ∞). For each n such that tn is finite, define a
process Wn (t) := B(tn + t) − B(tn ). If tn = ∞, let Wn be an independent
Brownian motion. If T = tn , then W = Wn . Thus,
{W ∈ E} ∩ A ∩ {T = tn } = {Wn ∈ E} ∩ A ∩ {T = tn }.
Now note that if tn is finite, then A∩{T = tn } = A∩{T ≤ tn }∩{T = tn }. Since
A ∈ FT+ , A ∩ {T ≤ tn } ∈ Ft+n . Also, {T = tn } = {T ≤ tn } \ {T < tn } ∈ Ft+n
(by Exercise 13.11.1). Thus, A ∩ {T = tn } ∈ Ft+n . On the other hand, by the
Markov property, Wn is a standard Brownian motion and is independent of Ft+n .
Thus,
P({Wn ∈ E} ∩ A ∩ {T = tn }) = P(Wn ∈ E)P(A ∩ {T = tn })
= P(B ∈ E)P(A ∩ {T = tn }).
If tn = ∞, then also the above identity holds. Thus, for any n,
P({W ∈ E} ∩ A ∩ {T = tn }) = P(B ∈ E)P(A ∩ {T = tn }).
Summing over n gives
P({W ∈ E} ∩ A) = P(B ∈ E)P(A).
Taking A = Ω gives P(W ∈ E) = P(B ∈ E). Thus, W is a Brownian motion,
and is independent of FT+ .
Now take a general stopping time T . Let Tn be defined as in the proof of
Proposition 13.11.7. Define Wn (t) := B(Tn + t) − B(Tn ) if Tn < ∞, and let Wn
be an independent Brownian motion if Tn = ∞. Since Tn = ∞ implies T = ∞,
it also implies that Tm = ∞ for all m. Thus, we can take Wn to be the same
independent Brownian motion for every n if T = ∞.
Now, since Tn can take only countably many values, we know by the previ-
ous deduction that Wn is a standard Brownian motion and that Wn and FT+n are
independent. Since Tn ≥ T , Proposition 13.11.5 implies that FT+n ⊇ FT+ . Thus,
Wn is independent of FT+ for each n. But note that Tn → T as n → ∞, and
hence Wn → W in C[0, ∞). Thus W is also a standard Brownian motion, and by
Proposition 12.3.1, W is independent of FT+ .
The strong Markov property can sometimes be used to show that certain ran-
dom variables are not stopping times. The following is an example.
Exercise 7.6.5, (M (t), B(t)) has a probability density function, whose value at a
point (a, b) ∈ U is given by
\[
\begin{aligned}
-\frac{\partial^2}{\partial a\,\partial b} P(\sqrt{t}\,Z \ge 2a - b)
&= -\frac{\partial^2}{\partial a\,\partial b}\int_{(2a-b)/\sqrt{t}}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx\\
&= -\frac{\partial}{\partial a}\,\frac{1}{\sqrt{2\pi t}}\,e^{-(2a-b)^2/2t}\\
&= \sqrt{\frac{2}{\pi}}\,\frac{(2a-b)}{t^{3/2}}\,e^{-(2a-b)^2/2t}.
\end{aligned}
\]
E XERCISE 13.12.5. Let B be standard Brownian motion and let M (t) be the
maximum of B up to time t. Prove that for any t, M (t) − B(t) has the same law
as |B(t)|.
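Both the displayed joint density and the claim in Exercise 13.12.5 are easy to probe by simulation before proving them. The following Python sketch (illustrative; the discretization introduces a small bias and the parameters are arbitrary) compares the empirical distribution of M(t) − B(t) with that of |B(t)|.

import numpy as np

rng = np.random.default_rng(0)
t, n_steps, n_paths = 1.0, 1000, 10_000
dt = t / n_steps

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(increments, axis=1)
M = np.maximum(B.max(axis=1), 0.0)       # running maximum, including B(0) = 0

x = M - B[:, -1]                         # samples of M(t) - B(t)
y = np.abs(rng.normal(0.0, np.sqrt(t), size=n_paths))   # samples of |B(t)|
print(np.quantile(x, [0.25, 0.5, 0.75]))
print(np.quantile(y, [0.25, 0.5, 0.75]))  # should be close to the line above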
A stopping time T for a filtration {Ft }t≥0 is a [0, ∞]-valued random variable such
that for any t ∈ [0, ∞), the event {T ≤ t} is in Ft . A stopping time T defines a
stopped σ-algebra FT as
FT := {A ∈ F : A ∩ {T ≤ t} ∈ Ft for all t ≥ 0}.
If S and T are stopping times such that S ≤ T always, then FS ⊆ FT . The proof
is exactly the same as the proof of Proposition 13.11.5. Similarly, if the filtration
is right-continuous, and {Tn }n≥1 is a sequence of stopping times decreasing to a
stopping time T , then
\[
F_T = \bigcap_{n=1}^{\infty} F_{T_n}.
\]
Again, the proof is exactly identical to the proof of Proposition 13.11.6.
A stopping time is called bounded if it is always bounded above by some
deterministic constant.
A collection of random variables {Xt }t≥0 , called a stochastic process, is said
to be adapted to this filtration if Xt is Ft -measurable for each t. The stochastic
process is called right-continuous if for any sample point ω, the map t 7→ Xt (ω) is
right-continuous.
A stochastic process {Xt }t≥0 adapted to a filtration {Ft }t≥0 is called a mar-
tingale if E|Xt | < ∞ for each t, and for any s ≤ t, E(Xt |Fs ) = Xs a.s.
to the Brownian filtration. Again, this is a simple application of the Markov prop-
erty. Take any s ≤ t and define W (u) := B(s + u) − B(s), as before. Then
\[
\begin{aligned}
E\bigl(e^{\lambda B(t)-\frac12\lambda^2 t}\,\big|\,F_{s+}\bigr)
&= e^{\lambda B(s)-\frac12\lambda^2 t}\,E\bigl(e^{\lambda(B(t)-B(s))}\,\big|\,F_{s+}\bigr)\\
&= e^{\lambda B(s)-\frac12\lambda^2 t}\,E\bigl(e^{\lambda W(t-s)}\,\big|\,F_{s+}\bigr)
= e^{\lambda B(s)-\frac12\lambda^2 t}\,E\bigl(e^{\lambda W(t-s)}\bigr)\\
&= e^{\lambda B(s)-\frac12\lambda^2 t}\,e^{\frac12\lambda^2(t-s)} = e^{\lambda B(s)-\frac12\lambda^2 s}.
\end{aligned}
\]
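Since the computation above shows that $e^{\lambda B(t) - \lambda^2 t/2}$ is a martingale, its expectation equals 1 for every t. A quick Monte Carlo check in Python (illustrative; the value of λ, the time points and sample size are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
lam, n_samples = 0.7, 10**6

for t in (0.5, 1.0, 2.0):
    B_t = rng.normal(0.0, np.sqrt(t), size=n_samples)   # B(t) ~ N(0, t)
    m = np.exp(lam * B_t - 0.5 * lam**2 * t)
    print(t, m.mean())    # should be close to 1 for every t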
Let us now work out some applications of the martingales obtained above.
E XAMPLE 14.3.4. Take any a < 0 < b. Let T := inf{t : B(t) ∉ (a, b)}. Then
B(T ) can take only two possible values, namely, a and b. The optional stopping
theorem helps us calculate the respective probabilities, as follows.
Take any t ≥ 0 and let T ∧ t denote the minimum of T and t. This is the
standard notation in this area. By Exercise 13.11.3, T ∧ t is also a stopping time.
Moreover, it is a bounded stopping time. Therefore by the optional stopping theo-
rem for right continuous martingales, E(B(T ∧ t)) = E(B(0)) = 0 for any t.
Now notice that since T ∧ t ≤ T , B(T ∧ t) ∈ [a, b]. Also, letting M (t) :=
max0≤s≤t B(s), we have
\[
P(T = \infty) = \lim_{t\to\infty} P(T \ge t) \le \lim_{t\to\infty} P(M(t) \le b) = 0,
\]
where the last identity follows from the fact that M (t) has the same law as the
absolute value of a N (0, t) random variable. Thus, P(T < ∞) = 1, and so
B(T ∧ t) → B(T ) a.s. as t → ∞. Combining all of these observations, we see
that by the dominated convergence theorem,
E(B(T )) = lim E(B(T ∧ t)) = 0.
t→∞
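Since B(T) takes only the values a and b, the identity E(B(T)) = 0 forces P(B(T) = b) = −a/(b − a). The Python sketch below (illustrative only; it uses a crude Euler discretization of the path and arbitrary a, b, so there is a small overshoot bias) checks this, together with E(B(T)) ≈ 0, by simulation.

import numpy as np

rng = np.random.default_rng(0)
a, b = -1.0, 2.0
dt, n_paths = 1e-3, 2000

hits_b = 0
exit_values = []
for _ in range(n_paths):
    x = 0.0
    while a < x < b:
        x += rng.normal(0.0, np.sqrt(dt))   # discretized Brownian increment
    exit_values.append(b if x >= b else a)  # record which endpoint was hit
    hits_b += x >= b

print("P(B(T) = b) ~", hits_b / n_paths, "predicted:", -a / (b - a))
print("E(B(T)) ~", np.mean(exit_values))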
E XAMPLE 14.3.5. Let us now use the martingale B(t)2 − t to compute E(T ).
As before, the optional stopping theorem gives us
E(B(T ∧ t)2 ) = E(T ∧ t)
for every t. We have already observed that B(T ∧t) ∈ [a, b] for all t, and converges
to B(T ) as t → ∞. Thus, by the dominated convergence theorem, the right side
Fix t1 , . . . , tn ∈ R and let φi (x) := 1{x≤ti } for each i. Suppose that instead of
random (Ui , Vi ), we had deterministic (ui , vi ) and defined Tn and Yn as above
with these ui ’s and vi ’s. Then each Tn would be a stopping time for the Brownian
filtration, and 0 = T0 ≤ T1 ≤ T2 ≤ · · · . Let Wn (t) := B(Tn + t) − B(Tn ).
Then by the strong Markov property, each Wn is a standard Brownian motion, and
is independent of FT+n . Also, note that for each n,
and Y1 , . . . , Yn−1 are FT+n−1 -measurable (because B(Ti ) is FT+i -measurable for all
i and FT+i ⊆ FT+j when i ≤ j). Lastly, note that Yn = Wn−1 (Tn − Tn−1 ). Thus,
\[
\begin{aligned}
E\bigl(\phi_1(Y_1)\cdots\phi_n(Y_n)\,\big|\,F_{T_{n-1}}^+\bigr)
&= \phi_1(Y_1)\cdots\phi_{n-1}(Y_{n-1})\,E\bigl[\phi_n(W_{n-1}(T_n - T_{n-1}))\,\big|\,F_{T_{n-1}}^+\bigr]\\
&= \phi_1(Y_1)\cdots\phi_{n-1}(Y_{n-1})\,E\bigl[\phi_n(W_{n-1}(T_n - T_{n-1}))\bigr].
\end{aligned}
\]
By Lemma 14.5.2 and the observations made above, $E[\phi_n(W_{n-1}(T_n - T_{n-1}))]$ is a
measurable function of $(u_n, v_n)$. Let's call it $\psi_n(u_n, v_n)$. Proceeding inductively,
we get
E(φ1 (Y1 ) · · · φn (Yn )) = ψ1 (u1 , v1 ) · · · ψn (un , vn ).
Now consider the case of random (Ui , Vi ), generated as before. Considering the
random variables Y1 , . . . , Yn as functions of B and (U1 , V1 ), . . . , (Un , Vn ), and
applying Proposition 9.2.1 and the above identity, we get
\[
E(\phi_1(Y_1)\cdots\phi_n(Y_n)) = \prod_{i=1}^{n} E(\psi_i(U_i, V_i)).
\]
But by Skorokhod’s embedding theorem and our definition of the Ti ’s,
E(ψi (Ui , Vi )) = E(φi (Xi )).
This proves (14.5.1). So we have shown that (14.5.1) holds for every n, which
completes the proof of the first claim of the theorem. The proof of the second
claim is exactly similar, using Zi := Ti − Ti−1 instead of Yi .
Strassen’s coupling can also be used to give an alternative proof of Donsker’s
theorem, assuming that we construct Brownian motion by some other means (for
example, by Exercise 13.4.7). This is outlined in the following exercises.
E XERCISE 14.5.3. Let X1 , X2 , . . . be i.i.d. random variables with mean zero
and variance one. Let $S_n := \sum_{i=1}^n X_i$. Using Strassen's coupling, prove that it
is possible to construct {Sn }n≥1 and a standard Brownian motion B on the same
probability space such that n−1/2 max0≤k≤n |Sk − B(k)| → 0 in probability as
n → ∞. (Hint: First, show that n−1 max0≤k≤n |Tk − k| → 0 a.s. This is
a simple consequence of the fact that Tn /n → 1 a.s. Then use this to control
max0≤k≤n |B(Tk ) − B(k)|.)
E XERCISE 14.5.4. Let Bn be the piecewise linear path from Donsker’s theo-
rem. Using the previous exercise, prove that Bn and a standard Brownian motion
B can be constructed on the same probability space such that for any ε > 0,
\[
\lim_{n\to\infty} P(\|B_n - B\|_{[0,1]} > \varepsilon) = 0.
\]
(In other words, Bn → B in probability on C[0, 1].) As a consequence, prove
Donsker’s theorem.
Incidentally, if we have more information about the Xi ’s, such as finiteness
of higher moments or the moment generating function, it is possible to construct
better couplings. For example, under the assumption of finite moment generating
This chapter introduces the basics of stochastic calculus and some applications
of these methods. For technical simplicity, we will not attempt to prove the most
general versions of the results, but only as much as needed.
however, was to realize that convergence takes place in L2 under mild conditions.
We will now carry out this construction.
Fix some t ≥ 0. The first step is to define the Itô integral for simple pro-
cesses. A stochastic process X = {X(s)}s≤t is called simple if there exist some
deterministic M , n and 0 = s0 < s1 < · · · < sn = t such that for each i < n,
X(s) = X(si ) for all s ∈ [si , si+1 ), and |X(si )| ≤ M always. Suppose that
X is adapted to the Brownian filtration, meaning that for each s, X(s) is $F_{s+}$-measurable. Note that this is equivalent to saying that $X(s_i)$ is $F_{s_i+}$-measurable for
each i. We define
\[
\int_0^t X(s)\,dB(s) := \sum_{i=0}^{n-1} X(s_i)\bigl(B(s_{i+1}) - B(s_i)\bigr).
\]
For simplicity, let us denote the integral by I(X). It is not hard to see that this is
well-defined, in the sense that if we take two different representations of X as a
simple process, the value of I(X) would be the same for both.
The Itô integral on simple processes enjoys two important properties. First, it
is linear, meaning that if X and Y are two simple processes, and a and b are real
numbers, then aX + bY is also a simple process, and I(aX + bY ) = aI(X) +
bI(Y ). This is easy to verify from the definition. The second important property is
the Itô isometry property, which says that
\[
E(I(X)^2) = \int_0^t E(X(s)^2)\,ds.
\]
To see this, note that for any 0 ≤ j < i < n, the facts that X is adapted to the
Brownian filtration, X is bounded, and B is a martingale together imply that
\[
E\bigl[X(s_i)(B(s_{i+1})-B(s_i))\,X(s_j)(B(s_{j+1})-B(s_j))\,\big|\,F_{s_i+}\bigr]
= X(s_i)\,X(s_j)(B(s_{j+1})-B(s_j))\,E\bigl[B(s_{i+1})-B(s_i)\,\big|\,F_{s_i+}\bigr] = 0 \text{ a.s.},
\]
and therefore
\[
E\bigl[X(s_i)(B(s_{i+1})-B(s_i))\,X(s_j)(B(s_{j+1})-B(s_j))\bigr] = 0.
\]
Similarly, by the Markov property of Brownian motion,
\[
E\bigl[X(s_i)^2(B(s_{i+1})-B(s_i))^2\,\big|\,F_{s_i+}\bigr] = X(s_i)^2\,E\bigl[(B(s_{i+1})-B(s_i))^2\,\big|\,F_{s_i+}\bigr] = X(s_i)^2(s_{i+1}-s_i).
\]
As a consequence of these two observations, we get
\[
E(I(X)^2) = \sum_{i=0}^{n-1} E(X(s_i)^2)(s_{i+1}-s_i) = \int_0^t E(X(s)^2)\,ds,
\]
which is the Itô isometry. This is called an isometry because of the following
reason. Consider the product space [0, t] × Ω endowed with the product σ-algebra
and the product measure. We may view X as a measurable map (s, ω) 7→ X(s)(ω)
from this product space into the real line. Then the right side of the Itô isometry is
the squared L2 norm of X, and the left side is the L2 norm of I(X). So the map I
is an isometry from a subspace of L2 ([0, t] × Ω) into L2 (Ω).
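The definition for simple processes is directly computable. The Python sketch below (an illustration, not part of the original text; the particular bounded simple process X(s) = sin(B(s_i)) on [s_i, s_{i+1}) is an arbitrary choice) evaluates I(X) on simulated Brownian paths and checks the Itô isometry by Monte Carlo.

import numpy as np

rng = np.random.default_rng(0)
t, n_grid, n_paths = 1.0, 200, 20_000
s = np.linspace(0.0, t, n_grid + 1)                 # partition 0 = s_0 < ... < s_n = t
dB = rng.normal(0.0, np.sqrt(np.diff(s)), size=(n_paths, n_grid))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

X = np.sin(B[:, :-1])            # simple adapted process: X(s) = sin(B(s_i)) on [s_i, s_{i+1})
I = (X * dB).sum(axis=1)         # I(X) = sum_i X(s_i)(B(s_{i+1}) - B(s_i))

lhs = (I ** 2).mean()                              # estimate of E[I(X)^2]
rhs = (X ** 2).mean(axis=0) @ np.diff(s)           # estimate of int_0^t E[X(s)^2] ds
print(lhs, rhs)                                    # the two numbers should be close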
We will now extend the definition of I to continuous adapted processes. It is
possible to further extend it to square-integrable processes, but we will refrain from
doing that to avoid technical complications.
Let {X(s)}s≤t be a continuous stochastic process adapted to the Brownian
filtration, meaning that X(s) is Fs+ -measurable for each s, and for each ω ∈ Ω,
the map s 7→ X(s)(ω) is continuous. Suppose that
\[
\int_0^t E(X(s)^2)\,ds < \infty. \tag{15.2.1}
\]
(We will see below that the integral makes sense because the continuity of X im-
plies that the map s 7→ E(X(s)2 ) is measurable.) Under this condition, we want
to define I(X). The key observation is the following.
L EMMA 15.2.1. Let X be as above. Define X(s, ω) := X(s)(ω). Then X ∈
L2 ([0, t] × Ω), and there is a sequence of simple adapted processes {Xn }n≥1 such
that Xn → X in L2 ([0, t] × Ω).
By the condition (15.2.1) and the dominated convergence theorem, we get that
\[
\lim_{M\to\infty} \|Y_M - X\|_{L^2([0,T]\times\Omega)} = 0.
\]
But YM is a continuous adapted process that is bounded by a constant. Therefore
by our previous deduction, we can find a simple adapted process $X_M$ such that
\[
\|Y_M - X_M\|_{L^2([0,T]\times\Omega)} \le \frac{1}{M}.
\]
We are now ready to define the Itô integral I(X) for any continuous adapted
process X. Take a sequence of simple processes {Xn }n≥1 that converge to X
in L2 ([0, t] × Ω). Then by the Itô isometry and linearity of I, {I(Xn )}n≥1 is a
Cauchy sequence in L2 (Ω). We define I(X) to be its L2 limit. The Itô isom-
etry ensures that this limit does not depend on the choice of the approximating
sequence. Furthermore, since I(X) is constructed as an L2 limit, it is clear that
X 7→ I(X) is linear and the Itô isometry continues to hold. Lastly, note that
I(X) is Ft+ -measurable, since by Lemma 4.4.5, I(Xn ) converges almost surely
to I(X) along a subsequence, and each I(Xn ) is Ft+ -measurable (and then apply
Proposition 2.1.14).
The above procedure defines the integral from 0 to t, for any t. What about
integration from s to t for some 0 ≤ s ≤ t? We can retrace the exact same steps to
define $\int_s^t X(u)\,dB(u)$ for any continuous adapted process X satisfying
\[
\int_s^t E(X(u)^2)\,du < \infty.
\]
Moreover, when X satisfies (15.2.1), the above integral also satisfies the natural
identity
\[
\int_0^t X(u)\,dB(u) = \int_0^s X(u)\,dB(u) + \int_s^t X(u)\,dB(u) \quad \text{a.s.}
\]
All of the above can be proved by first considering simple processes and then pass-
ing to the L2 limit. The details are left to the reader.
E XERCISE 15.2.2. Let X and Y be two continuous adapted processes satisfying (15.2.1) for some t ≥ 0. Suppose that $\int_0^t X(s)\,dB(s) = \int_0^t Y(s)\,dB(s)$ a.s.
Then prove that with probability one, X(s) = Y (s) for all s ∈ [0, t].
E XERCISE 15.2.3. Let X and Y be two continuous adapted processes satisfy-
ing (15.2.1). Prove that for any t ≥ 0, the event
\[
\Bigl\{\int_0^t X(s)\,dB(s) \ne \int_0^t Y(s)\,dB(s)\Bigr\} \cap \{X(s) = Y(s)\text{ for all }s\in[0,t]\}
\]
has probability zero.
A simple application of the first Borel–Cantelli lemma now shows that with prob-
ability one, the sequence {Znk }k≥1 , viewed as a sequence of random continuous
functions on [0, t], is Cauchy with respect to the supremum norm. Let Z denote
its limit. Being the uniform limit of a sequence of continuous functions, Z is
continuous. Since for each s, the restriction of $X_{n_k}$ to [0, s] converges to the restriction of X to [0, s] in $L^2([0,s]\times\Omega)$, we conclude that Z(s) must be a version
of $\int_0^s X(u)\,dB(u)$. Moreover, this also shows that $Z_{n_k}(s) \to Z(s)$ in $L^2$.
Since each Znk (s) is Fs+ -measurable and Znk (s) → Z(s) a.s., the complete-
ness of Fs+ shows that Z(s) is Fs+ -measurable (see Exercise 2.6.4). Finally, to
prove that Z is a martingale, recall that Zn is a martingale for each n. So for any
0 ≤ u ≤ s ≤ t and any k,
E(Znk (s)|Fu+ ) = Znk (u) a.s.
We know that the right side converges to Z(s) a.s. as k → ∞. For the left side,
recall that Znk (s) → Z(s) in L2 and hence also in L1 . Therefore by Exercise 9.2.2,
E(Znk (s)|Fu+ ) → E(Z(s)|Fu+ ) a.s. as k → ∞. Thus, Z is a martingale.
Using the above procedure we can construct, for each n ≥ 1, a continuous
martingale $\{Z^n(t)\}_{0\le t\le n}$ such that for each t ∈ [0, n], $Z^n(t) = \int_0^t X(s)\,dB(s)$
a.s. Take any 1 ≤ m ≤ n. Then with probability one, for each rational t ∈ [0, m],
$Z^m(t) = Z^n(t)$. But $Z^m$ and $Z^n$ are continuous processes. So, with probability
one, $Z^m(t) = Z^n(t)$ for all t ∈ [0, m]. Thus, if we define $Z(t) := Z^n(t)$ for $n = \lceil t\rceil$,
then $\{Z(t)\}_{t\ge0}$ is a continuous martingale satisfying $Z(t) = \int_0^t X(s)\,dB(s)$
a.s. for each given t. Uniqueness follows by continuity.
E XERCISE 15.3.2. Let f : R → R be a continuous function. Let X(t) be the continuous martingale representing \int_0^t f(s)\,dB(s). Prove that X is a centered Gaussian process, meaning that the finite dimensional distributions are jointly normal with mean zero. Show that Cov(X(s), X(t)) = \int_0^{s\wedge t} f(u)^2\,du.
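A quick Monte Carlo check of this covariance formula is easy to carry out. The sketch below is only illustrative (the integrand f(u) = cos u, the times s = 0.3 and t = 0.7, and the grid and sample sizes are arbitrary choices); it approximates X(s) and X(t) by left-endpoint sums driven by the same Brownian increments.

    import numpy as np

    rng = np.random.default_rng(1)
    f = np.cos                          # illustrative deterministic integrand
    s, t = 0.3, 0.7
    n_steps, n_paths = 1000, 200000
    dt = t / n_steps
    grid = np.arange(n_steps) * dt      # left endpoints of the partition of [0, t]
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    incr = f(grid) * dB                 # f(u) dB(u) on each subinterval
    X_t = incr.sum(axis=1)              # approximates int_0^t f(u) dB(u)
    X_s = incr[:, grid < s].sum(axis=1) # approximates int_0^s f(u) dB(u)
    print("empirical Cov(X(s), X(t))   :", np.cov(X_s, X_t)[0, 1])
    print("int_0^{min(s,t)} f(u)^2 du  :", (f(grid[grid < s]) ** 2).sum() * dt)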
T HEOREM 15.4.1. Let Z and [Z] be as above. Then the process Z(t)2 −[Z](t)
is a continuous martingale adapted to the Brownian filtration.
We need the following simple lemma.
L EMMA 15.4.2. If \{U_n\}_{n\ge 1} is a sequence of real-valued random variables converging in L^2 to U, then U_n^2 \to U^2 in L^1.
P ROOF. Let Z(t) be the continuous martingale version of \int_0^t X(s)\,dB(s). By Theorem 15.4.1, Z(t)^2 - [Z](t) is also a continuous martingale. Therefore by the optional stopping theorem for continuous martingales, we get E(Z(T)|F_{S+}) = Z(S) and E(Z(T)^2 - [Z](T)|F_{S+}) = Z(S)^2 - [Z](S). The second identity can be rewritten as
E(Z(T)^2 - Z(S)^2 \,|\, F_{S+}) = E([Z](T) - [Z](S) \,|\, F_{S+}) = E\Big(\int_S^T X(s)^2\,ds \,\Big|\, F_{S+}\Big).
Thus,
E\Big(\Big(\int_S^T X(s)\,dB(s)\Big)^2 \,\Big|\, F_{S+}\Big) = E((Z(T) - Z(S))^2 \,|\, F_{S+})
= E(Z(T)^2 - 2Z(T)Z(S) + Z(S)^2 \,|\, F_{S+})
= E(Z(T)^2 - Z(S)^2 \,|\, F_{S+}),
where the last step uses E(Z(T)Z(S)|F_{S+}) = Z(S)\,E(Z(T)|F_{S+}) = Z(S)^2. Combined with the preceding display, this proves the corollary.
P ROOF. Let us first prove the lemma under the assumption that g is bounded, in addition to being continuous. Since s \mapsto g(B(s)) is a continuous map, the sum
\sum_{i=0}^{n-1} g(B(s_i))(s_{i+1} - s_i)
converges to \int_0^t g(B(s))\,ds as the mesh size approaches zero. So it suffices to prove that
E\bigg[\Big(\sum_{i=0}^{n-1} g(B(s_i))\big((B(s_{i+1}) - B(s_i))^2 - (s_{i+1} - s_i)\big)\Big)^2\bigg] \qquad (15.6.1)
tends to zero as the mesh size goes to zero. When we expand the square and take the expectation inside, the cross terms with j < i vanish due to the Markov property of Brownian motion: the factor indexed by j is F_{s_i+}-measurable, while
E\big((B(s_{i+1}) - B(s_i))^2 \,\big|\, F_{s_i+}\big) = s_{i+1} - s_i \quad a.s.
Since g(B(s_i)) is independent of B(s_{i+1}) - B(s_i) and E\big[\big((B(s_{i+1}) - B(s_i))^2 - (s_{i+1} - s_i)\big)^2\big] = 2(s_{i+1} - s_i)^2, the sum of the diagonal terms is at most 2(\sup|g|)^2 \sum_{i=0}^{n-1} (s_{i+1} - s_i)^2 \le 2(\sup|g|)^2\, t \max_{0\le i\le n-1}(s_{i+1} - s_i). This shows that the quantity displayed in (15.6.1) indeed converges to zero as the mesh size goes to zero.
Next, let us drop the assumption that g is bounded. Take any \epsilon, \delta > 0. Find M so large that P(\max_{0\le s\le t} |B(s)| > M) < \delta/2. Define
h(x) := \begin{cases} g(x) & \text{if } |x| \le M, \\ g(M) & \text{if } x > M, \\ g(-M) & \text{if } x < -M. \end{cases}
Then h is a bounded continuous function, and therefore, by the first part of the proof, the conclusion of the lemma holds with h in place of g as the mesh size tends to zero. On the other hand, note that
P\bigg(\sum_{i=0}^{n-1} h(B(s_i))(s_{i+1} - s_i) \ne \sum_{i=0}^{n-1} g(B(s_i))(s_{i+1} - s_i)\bigg) \le P\Big(\max_{0\le s\le t} |B(s)| > M\Big) < \frac{\delta}{2}.
Similarly,
P\bigg(\int_0^t h(B(s))\,ds \ne \int_0^t g(B(s))\,ds\bigg) \le P\Big(\max_{0\le s\le t} |B(s)| > M\Big) < \frac{\delta}{2}.
Combining the above observations, it follows that
\limsup\, P\bigg(\Big|\sum_{i=0}^{n-1} g(B(s_i))(s_{i+1} - s_i) - \int_0^t g(B(s))\,ds\Big| > \epsilon\bigg) \le \delta,
where the lim sup is taken over a sequence of partitions with mesh size tending to zero. Since \epsilon and \delta are arbitrary, this completes the proof of the lemma.
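The combination of facts used in the bounded case — that \sum_i g(B(s_i))(s_{i+1} - s_i) approximates \int_0^t g(B(s))\,ds, and that replacing (s_{i+1} - s_i) by (B(s_{i+1}) - B(s_i))^2 changes the sum negligibly — is easy to observe numerically. The Python sketch below is only illustrative (the choices g(x) = x^2, t = 1, and the partition size are arbitrary) and compares the two sums on a single simulated Brownian path.

    import numpy as np

    rng = np.random.default_rng(2)
    g = lambda x: x ** 2                # illustrative continuous g
    t, n = 1.0, 200000
    ds = t / n
    dB = rng.normal(0.0, np.sqrt(ds), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    quad_sum = np.sum(g(B[:-1]) * dB ** 2)   # sum g(B(s_i)) (B(s_{i+1}) - B(s_i))^2
    time_sum = np.sum(g(B[:-1]) * ds)        # sum g(B(s_i)) (s_{i+1} - s_i)
    print(quad_sum, time_sum)                # nearly equal for a fine partition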
We also need the analogous lemma for first-order differences.
P ROOF. The proof is immediate from the definition of the Itô integral when g
is bounded and continuous. When g is not bounded, we apply the same truncation
trick as in the proof of Lemma 15.6.2, and then apply Exercise 15.2.3 to get the required upper bound on P\big(\int_0^t h(B(s))\,dB(s) \ne \int_0^t g(B(s))\,dB(s)\big).
With the aid of the above lemmas, we are now ready to prove Itô’s formula.
P ROOF OF T HEOREM 15.6.1. First, fix some t > 0. Take a partition 0 = s0 <
s1 < · · · < sn = t. For each δ, M > 0, define
\omega(\delta, M) := \sup\big\{\, |f''(x) - f''(y)| : x, y \in [-M, M],\ |x - y| \le \delta \,\big\}.
By Taylor expansion, for each i,
\Big| f(B(s_{i+1})) - f(B(s_i)) - f'(B(s_i))(B(s_{i+1}) - B(s_i)) - \frac{1}{2} f''(B(s_i))(B(s_{i+1}) - B(s_i))^2 \Big| \le \frac{1}{2}\, \omega(\delta_B, M_B)\, (B(s_{i+1}) - B(s_i))^2,
where M_B := \max_{0\le s\le t} |B(s)| and \delta_B := \max_{0\le i < n} |B(s_{i+1}) - B(s_i)|. Summing over i and applying the triangle inequality, we get
\Big| f(B(t)) - f(B(0)) - \sum_{i=0}^{n-1} f'(B(s_i))(B(s_{i+1}) - B(s_i)) - \frac{1}{2} \sum_{i=0}^{n-1} f''(B(s_i))(B(s_{i+1}) - B(s_i))^2 \Big|
\le \frac{1}{2}\, \omega(\delta_B, M_B) \sum_{i=0}^{n-1} (B(s_{i+1}) - B(s_i))^2.
Now suppose that we take a sequence of partitions such that the mesh size tends to
zero. Then MB remains fixed, but by the continuity of Brownian motion, δB → 0
and
\sum_{i=0}^{n-1} (B(s_{i+1}) - B(s_i))^2 \to t \quad \text{in probability}.
Finally, by Lemma 15.6.3,
\sum_{i=0}^{n-1} f'(B(s_i))(B(s_{i+1}) - B(s_i)) \to \int_0^t f'(B(s))\,dB(s) \quad \text{in } L^2,
and by Lemma 15.6.2,
\sum_{i=0}^{n-1} f''(B(s_i))(B(s_{i+1}) - B(s_i))^2 \to \int_0^t f''(B(s))\,ds \quad \text{in probability}.
Combining these facts, we conclude that
f(B(t)) - f(B(0)) = \int_0^t f'(B(s))\,dB(s) + \frac{1}{2} \int_0^t f''(B(s))\,ds \quad a.s.,
which is Itô's formula at the fixed time t.
Itô's formula is commonly used to evaluate stochastic integrals. For example, let us evaluate \int_0^t B(s)\,dB(s).
E XAMPLE 15.6.4. To put the problem in the framework of Itô's formula, we choose a function f such that f'(x) = x. Let's take f(x) = x^2/2. Then Itô's formula gives
\frac{B(t)^2}{2} - \frac{B(0)^2}{2} = \int_0^t B(s)\,dB(s) + \frac{1}{2} \int_0^t ds,
which gives
\int_0^t B(s)\,dB(s) = \frac{1}{2}\big(B(t)^2 - t\big).
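The identity is easy to corroborate by simulation: on a fine partition, the left-endpoint sum \sum_i B(s_i)(B(s_{i+1}) - B(s_i)) should be close to (B(t)^2 - t)/2. A minimal Python sketch (illustrative only; t = 1 and the partition size are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(3)
    t, n = 1.0, 100000
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)
    B = np.concatenate([[0.0], np.cumsum(dB)])
    ito_sum = np.sum(B[:-1] * dB)            # left-endpoint sum approximating int_0^t B dB
    print(ito_sum, (B[-1] ** 2 - t) / 2)     # agree up to discretization error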
Theorem 15.6.1 gives the simplest version of Itô’s formula. There are many
generalizations. One version that suffices for most purposes is the following.
T HEOREM 15.6.5. Let d be a positive integer and let B = (B_1, . . . , B_d) be a d-dimensional standard Brownian motion. Let f(t, x_1, . . . , x_d) be a function from [0, ∞) × R^d into R which is continuously differentiable in t (where the derivative at 0 is the right derivative) and twice continuously differentiable in (x_1, . . . , x_d). Suppose that for each t ≥ 0 and each i,
\int_0^t E\bigg[\Big(\frac{\partial f}{\partial x_i}(s, B(s))\Big)^2\bigg]\,ds < \infty.
and
\sum_{i=0}^{n-1} g(s_i, B(s_i))(B_j(s_{i+1}) - B_j(s_i))(B_k(s_{i+1}) - B_k(s_i)) \to 0 \quad \text{in probability}.
P ROOF. The proof goes just as for Lemma 15.6.2, by first assuming that g
is bounded and showing convergence in L2 , and then dropping the boundedness
assumption using the same trick as in the proof of Lemma 15.6.2.
Therefore the process Y satisfies a ‘stochastic integral equation’. The related dif-
ferential equation would be
\frac{dY(t)}{dt} = \frac{1}{2}\, Y(t) + Y(t)\, \frac{dB(t)}{dt}.
However, Brownian motion is not differentiable. So instead, we write
dY(t) = \frac{1}{2}\, Y(t)\,dt + Y(t)\,dB(t). \qquad (15.7.1)
This is known as a ‘stochastic differential equation’ (s.d.e.). The equation is just
shorthand for writing the integral equation displayed above. It should not be inter-
preted as a differential equation. (Note that we wrote dt before dB(t). Although
dt comes after dB(t) in Itô’s formula, it is customary to place the dt term before
the dB(t) term when writing s.d.e.’s.)
Now consider the following generalization of the stochastic differential equa-
tion (15.7.1):
dY (t) = µY (t)dt + σY (t)dB(t), (15.7.2)
with Y (0) = c, where c, µ ∈ R and σ > 0 are some given constants. To solve the
equation, let us try to guess a solution, and then verify that it works. Suppose that
there is some solution of the form X(t) = f (t, B(t)) for some smooth function
f (t, x). Then by Itô’s formula,
dX(t) = \Big(\frac{\partial f}{\partial t}(t, B(t)) + \frac{1}{2}\, \frac{\partial^2 f}{\partial x^2}(t, B(t))\Big)\,dt + \frac{\partial f}{\partial x}(t, B(t))\,dB(t).
Thus, we need f to satisfy
\frac{\partial f}{\partial x} = \sigma f, \qquad \frac{\partial f}{\partial t} + \frac{1}{2}\, \frac{\partial^2 f}{\partial x^2} = \mu f.
From the first equation, we see that f must be of the form A(t)eσx for some func-
tion A. Plugging this into the second equation, we get
A'(t) + \frac{1}{2}\, \sigma^2 A(t) = \mu A(t).
Solving this, we get A(t) = A(0)\, e^{(\mu - \frac{1}{2}\sigma^2)t}. Thus,
f(t, x) = A(0)\, e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma x}.
The condition X(0) = c implies that A(0) = c. Therefore, our guess for the
solution to (15.7.2) is
X(t) = c\, e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma B(t)}. \qquad (15.7.3)
It is easy to check that this is a continuous adapted process that satisfies (15.2.1).
Therefore, we can apply Itô’s formula and check that it is indeed a solution of the
s.d.e. (15.7.2) with initial condition X(0) = c.
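One way to make this concrete is by simulation. The Python sketch below is only illustrative (the parameters c = 1, µ = 0.5, σ = 0.3, the horizon, and the use of the standard Euler–Maruyama discretization scheme are choices made here for the sketch); it drives a direct discretization of (15.7.2) and the closed form (15.7.3) with the same Brownian increments and compares them at time t.

    import numpy as np

    rng = np.random.default_rng(4)
    c, mu, sigma = 1.0, 0.5, 0.3            # illustrative parameters
    t, n = 1.0, 100000
    dt = t / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)

    # Euler-Maruyama discretization of dY = mu*Y dt + sigma*Y dB, Y(0) = c.
    Y = c
    for db in dB:
        Y += mu * Y * dt + sigma * Y * db

    # Closed-form solution (15.7.3) evaluated on the same Brownian path.
    X = c * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * dB.sum())
    print(Y, X)                              # close for a fine time step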
and a similar identity holds for \tilde{Y}(t). Therefore by the Cauchy–Schwarz inequality and Corollary 15.4.3,
E\big[(\tilde{X}(t) - \tilde{Y}(t))^2\big] \le 2\, E\bigg[\Big(\int_0^{t\wedge T_n} (a(s, X(s)) - a(s, Y(s)))\,ds\Big)^2\bigg] + 2\, E\bigg[\Big(\int_0^{t\wedge T_n} (b(s, X(s)) - b(s, Y(s)))\,dB(s)\Big)^2\bigg]
\le 2t\, E\bigg[\int_0^{t\wedge T_n} (a(s, X(s)) - a(s, Y(s)))^2\,ds\bigg] + 2\, E\bigg[\int_0^{t\wedge T_n} (b(s, X(s)) - b(s, Y(s)))^2\,ds\bigg].
By the local Lipschitz condition on a and b, note that |a(s, X(s))−a(s, Y (s))| and
|b(s, X(s)) − b(s, Y (s))| are both bounded by Ln |X(s) − Y (s)| when s ≤ t ∧ Tn .
But if s ≤ t ∧ T_n, then s = s ∧ T_n. Thus, putting K := 2L_n^2 t + 2L_n^2, we have
E\big[(\tilde{X}(t) - \tilde{Y}(t))^2\big] \le K\, E\bigg[\int_0^{t\wedge T_n} (X(s) - Y(s))^2\,ds\bigg]
= K\, E\bigg[\int_0^{t\wedge T_n} (X(s\wedge T_n) - Y(s\wedge T_n))^2\,ds\bigg]
\le K\, E\bigg[\int_0^{t} (X(s\wedge T_n) - Y(s\wedge T_n))^2\,ds\bigg].
Next, note that by the Itô isometry and the Lipschitz condition on b,
E\big[(V_{n+1}(s) - V_n(s))^2\big] = \int_0^s E\big[(b(u, X_n(u)) - b(u, X_{n-1}(u)))^2\big]\,du \le L^2 \int_0^s E\big[(X_n(u) - X_{n-1}(u))^2\big]\,du.
But Vn+1 − Vn is a continuous martingale. So by Doob’s L2 inequality for contin-
uous martingales (Theorem 14.2.1),
E\|V_{n+1} - V_n\|_{[0,s]}^2 \le 4\, E\big[(V_{n+1}(s) - V_n(s))^2\big] \le 4L^2 \int_0^s E\big[(X_n(u) - X_{n-1}(u))^2\big]\,du.
Combining this with (15.7.4) and using the inequality (x + y)2 ≤ 2x2 + 2y 2 , we
get
\Delta_n(s) = E\|X_{n+1} - X_n\|_{[0,s]}^2 \le 2\, E\|U_{n+1} - U_n\|_{[0,s]}^2 + 2\, E\|V_{n+1} - V_n\|_{[0,s]}^2
\le K \int_0^s E\big[(X_n(u) - X_{n-1}(u))^2\big]\,du \le K \int_0^s \Delta_{n-1}(u)\,du,
where K := 2L^2 t + 8L^2. Let C := \Delta_0(t). We claim that
\Delta_n(s) \le \frac{C(Ks)^n}{n!} \qquad (15.7.5)
for each n ≥ 1 and 0 ≤ s ≤ t. This is easily shown by induction. Suppose that
this is true for n − 1. Then the recursive inequality shows that for any s ≤ t,
\Delta_n(s) \le K \int_0^s \Delta_{n-1}(u)\,du \le K \int_0^s \frac{C(Ku)^{n-1}}{(n-1)!}\,du = \frac{C(Ks)^n}{n!}.
Thus, we get
\sum_{n=1}^{\infty} E\|X_{n+1} - X_n\|_{[0,t]} \le \sum_{n=1}^{\infty} \sqrt{\Delta_n(t)} \le \sum_{n=1}^{\infty} \sqrt{\frac{C(Kt)^n}{n!}} < \infty.
By the monotone convergence theorem, this shows that
\sum_{n=1}^{\infty} \|X_{n+1} - X_n\|_{[0,t]} < \infty \quad a.s.
This, in turn, shows that with probability one, the sequence {Xn }n≥1 is Cauchy
in C[0, t] under the supremum norm. Therefore with probability one, {Xn }n≥1
is Cauchy in C[0, M ] for every integer M . Consequently, {Xn }n≥1 is Cauchy in
C[0, ∞) with probability one. Let X denote the limit. Clearly, X is a continuous
adapted process.
Now, for any given t and n,
\int_0^t E\big[(X_{n+1}(s) - X_n(s))^2\big]\,ds \le \int_0^t \Delta_n(s)\,ds.
Using this, and the bound (15.7.5) on ∆n (s), it is easy to see that {Xn }n≥1 is a
Cauchy sequence in L2 ([0, t]×Ω). By Fatou’s lemma, the limit must necessarily be
equal to X. This shows that X satisfies the square-integrability condition (15.2.1)
for every t. Thus, by the Lipschitz properties of a and b, the following processes
are well-defined, continuous and adapted:
U(t) := \int_0^t a(s, X(s))\,ds, \qquad V(t) := \int_0^t b(s, X(s))\,dB(s).
To show that X satisfies the required s.d.e., we need to prove that with probability
one, X(t) = c+U (t)+V (t) for all t. Since X, U and V are continuous processes,
it suffices to show that this holds almost surely for any given t. So let us fix some t.
Since Xn (t) = c + Un (t) + Vn (t) for each n, and Xn (t) → X(t) a.s. as n → ∞,
it suffices to show that Un (t) → U (t) and Vn (t) → V (t) in L2 (because then there
will exist subsequences converging a.s.). To prove this, first observe that
E\big[(U(t) - U_n(t))^2\big] \le t \int_0^t E\big[(a(s, X(s)) - a(s, X_{n-1}(s)))^2\big]\,ds \le L^2 t\, \|X - X_{n-1}\|_{L^2([0,t]\times\Omega)}^2.
But we have already seen above that Xn → X in L2 ([0, t] × Ω). Thus, Un (t) →
U (t) in L2 . Next, by the Itô isometry,
E\big[(V(t) - V_n(t))^2\big] = \int_0^t E\big[(b(s, X(s)) - b(s, X_{n-1}(s)))^2\big]\,ds \le L^2 \|X - X_{n-1}\|_{L^2([0,t]\times\Omega)}^2.
Again, this shows that Vn (t) → V (t) in L2 . Thus, X is indeed a solution to the
required s.d.e. with initial condition X(0) = c.
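The Picard-type iteration X_{n+1} = c + U_n + V_n used in this proof can also be run numerically on a fixed fine grid: starting from X_0 ≡ c, each iterate is obtained by plugging the previous one into the two integrals. The Python sketch below is only an illustration of the scheme (the drift a(t, x) = -x, the diffusion b(t, x) = 1, the horizon, and the grid are arbitrary choices satisfying the Lipschitz hypotheses); the successive iterates are seen to stabilize rapidly, in line with the factorial bound (15.7.5).

    import numpy as np

    rng = np.random.default_rng(5)
    a = lambda t, x: -x                 # illustrative Lipschitz drift
    b = lambda t, x: np.ones_like(x)    # illustrative Lipschitz diffusion
    c, t, n = 1.0, 1.0, 5000
    dt = t / n
    s = np.arange(n) * dt               # left endpoints of the grid
    dB = rng.normal(0.0, np.sqrt(dt), size=n)

    X = np.full(n + 1, c)               # X_0(t) := c
    for k in range(8):                  # Picard iterations
        U = np.concatenate([[0.0], np.cumsum(a(s, X[:-1]) * dt)])
        V = np.concatenate([[0.0], np.cumsum(b(s, X[:-1]) * dB)])
        X_new = c + U + V               # X_{k+1} = c + U_k + V_k on the grid
        print("sup |X_{k+1} - X_k| =", np.max(np.abs(X_new - X)))
        X = X_new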
where {a(t)}t≥0 and {b(t)}t≥0 are continuous adapted processes, with b satisfy-
ing (15.2.1) for each t. Let Y (t) := f (t, X(t)), where f (t, x) is a twice continu-
ously differentiable function. The following result identifies the s.d.e. satisfied by
the process {Y (t)}t≥0 .
Then
dY(t) = \frac{\partial f}{\partial t}(t, X(t))\,dt + \frac{\partial f}{\partial x}(t, X(t))\,dX(t) + \frac{1}{2}\, \frac{\partial^2 f}{\partial x^2}(t, X(t))\,(dX(t))^2,
where dX(t) = a(t)\,dt + b(t)\,dB(t) and (dX(t))^2 stands for b(t)^2\,dt.
P ROOF. The proof is almost exactly the same as for Theorem 15.6.1, except
that we have to substitute Brownian motion with the process X. A sketch of the
proof goes as follows. The details are left to the reader. To deal with potential un-
boundedness of X, we start by ‘localizing’ X as follows. Choose some large
number M , and define
This takes care of the first-order terms in (15.8.1). For the second-order term, we
proceed as follows. First, note that
\big(\tilde{X}(s_{i+1}) - \tilde{X}(s_i)\big)^2 = \Big(\int_{s_i\wedge T}^{s_{i+1}\wedge T} a(u)\,du\Big)^2 + \Big(\int_{s_i\wedge T}^{s_{i+1}\wedge T} b(u)\,dB(u)\Big)^2 + 2\Big(\int_{s_i\wedge T}^{s_{i+1}\wedge T} a(u)\,du\Big)\Big(\int_{s_i\wedge T}^{s_{i+1}\wedge T} b(u)\,dB(u)\Big). \qquad (15.8.3)
Thus,
\sum_{i=0}^{n-1} \Big(\int_{s_i\wedge T}^{s_{i+1}\wedge T} a(u)\,du\Big)^2 \le \max_{0\le i\le n-1}(s_{i+1} - s_i) \int_0^{t\wedge T} a(u)^2\,du \le M^2 t \max_{0\le i\le n-1}(s_{i+1} - s_i),
which tends to zero as the mesh size goes to zero. Similarly, using the Cauchy–Schwarz inequality and Corollary 15.4.3, we get an upper bound for the L^1 norm of the third term on the right side of (15.8.3), which also tends to zero as the mesh size goes to zero.
With all this information at our disposal, we can now proceed as in the proof of
Lemma 15.6.2 to show that
\sum_{i=0}^{n-1} g(s_i \wedge T, \tilde{X}(s_i)) \bigg[\Big(\int_{s_i\wedge T}^{s_{i+1}\wedge T} b(u)\,dB(u)\Big)^2 - \int_{s_i\wedge T}^{s_{i+1}\wedge T} b(u)^2\,du\bigg] \to 0 \quad \text{in probability}
as the mesh size goes to zero. Combined with (15.8.4), this gives
\sum_{i=0}^{n-1} g(s_i \wedge T, \tilde{X}(s_i)) \bigg[\big(\tilde{X}(s_{i+1}) - \tilde{X}(s_i)\big)^2 - \int_{s_i\wedge T}^{s_{i+1}\wedge T} b(u)^2\,du\bigg] \to 0 \quad \text{in probability}.
Additionally, we also have dB_i(t)\,dB_j(t) = 0 for i \ne j. These rules give
(dX(t))^2 = \sum_{i=1}^{d} b_i(t)^2\,dt.
E XERCISE 15.8.3. Formulate and prove a version of Theorem 15.8.1 for pro-
cesses adapted to multidimensional Brownian motion.
with initial condition X(0) = c ∈ R, where µ and σ are strictly positive con-
stants. This looks almost the same as the s.d.e. (15.7.2), except that we do not
have an X(t) in front of dB(t), and the constant in front of X(t)dt is strictly negative.
A simple verification shows that in this problem we cannot find a solution of the
form f (t, B(t)), so we have to implement a different plan. Note that by Theo-
rems 15.7.4 and 15.7.2, an adapted continuous solution exists and is unique. So we
only have to guess the solution and verify that the guess is valid. Taking cue from
the deterministic case σ = 0, we define Y (t) := eµt X(t). By Theorem 15.8.1,
and therefore,
X(t) = c\,e^{-\mu t} + \sigma \int_0^t e^{\mu(s-t)}\,dB(s).
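As with the previous example, this formula can be checked against a direct discretization of the equation dX(t) = −µX(t)dt + σdB(t) described above. The Python sketch below is only illustrative (c = 2, µ = 1, σ = 0.5, the horizon, and the Euler–Maruyama scheme are choices made here); the stochastic integral in the closed form is approximated by a left-endpoint sum over the same Brownian increments.

    import numpy as np

    rng = np.random.default_rng(6)
    c, mu, sigma = 2.0, 1.0, 0.5            # illustrative parameters
    t, n = 1.0, 100000
    dt = t / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    s = np.arange(n) * dt                   # left endpoints

    # Euler-Maruyama discretization of dX = -mu*X dt + sigma dB, X(0) = c.
    X = c
    for db in dB:
        X += -mu * X * dt + sigma * db

    # Closed form: X(t) = c e^{-mu t} + sigma int_0^t e^{mu(s - t)} dB(s).
    X_closed = c * np.exp(-mu * t) + sigma * np.sum(np.exp(mu * (s - t)) * dB)
    print(X, X_closed)                      # close for a fine time step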
P ROOF. It is not hard to see that it’s sufficient to prove that for any 0 ≤ s ≤ t,
X(t) − X(s) ∼ N (0, t − s) and X(t) − X(s) is independent of Fs . By Exer-
cise 12.7.6, we have to show that for any A ∈ Fs and any θ ∈ R,
E\big(e^{i\theta(X(t)-X(s))}; A\big) = P(A)\, e^{-\theta^2 (t-s)/2}. \qquad (15.10.1)
Fix some s ≥ 0 and a positive integer M , and let
T := inf{t ≥ s : |X(t)| > M }.
By the continuity of X, T is a stopping time for the filtration {Ft }t≥0 , and T is
always ≥ s. For each t ≥ 0, let
Y (t) := X(t ∧ T ), Z(t) := X(t ∧ T )2 − (t ∧ T ).
Let Gt := Ft∧T . Then by the results of Section 13.11 (which hold for any right-
continuous filtration), {Gt }t≥0 is also a right-continuous filtration. Moreover, by
the optional stopping theorem for continuous martingales, Y (t) and Z(t) are mar-
tingales adapted to Gt . A consequence of the martingale properties of Y (t) and
Z(t) is that for any 0 ≤ u ≤ t,
E((Y (t) − Y (u))2 |Gu ) = E(Y (t)2 − 2Y (t)Y (u) + Y (u)2 |Gu )
= E(Y (t)2 |Gu ) − Y (u)2
= E(Z(t) + t ∧ T |Gu ) − Y (u)2
= Z(u) + E(t ∧ T |Gu ) − Y (u)2
= E(t ∧ T − u ∧ T |Gu ). (15.10.2)
Now fix some t ≥ s, A ∈ Fs , and θ ∈ R. Let f (x) := eiθx . Take a partition
s = s0 < s1 < · · · < sn = t of the interval [s, t]. By the martingale property of Y
and the fact that Fs ⊆ Gu for any u ≥ s,
\sum_{i=0}^{n-1} E\big[(Y(s_{i+1}) - Y(s_i))\, f'(Y(s_i) - Y(s)); A\big] = 0. \qquad (15.10.3)
By (15.10.2),
\sum_{i=0}^{n-1} E\big[(Y(s_{i+1}) - Y(s_i))^2 f''(Y(s_i) - Y(s)); A\big] = \sum_{i=0}^{n-1} E\big[(s_{i+1}\wedge T - s_i\wedge T)\, f''(Y(s_i) - Y(s)); A\big]
= E\bigg[\sum_{i=0}^{n-1} (s_{i+1}\wedge T - s_i\wedge T)\, f''(Y(s_i) - Y(s)); A\bigg].
As the mesh size goes to zero, the sum inside the expectation approaches
\int_s^{t\wedge T} f''(Y(u) - Y(s))\,du.
Also, by the boundedness of f 00 , the sums are uniformly bounded by a constant.
Therefore by the dominated convergence theorem,
\sum_{i=0}^{n-1} E\big[(Y(s_{i+1}) - Y(s_i))^2 f''(Y(s_i) - Y(s)); A\big] \to E\bigg[\int_s^{t\wedge T} f''(Y(u) - Y(s))\,du; A\bigg] \qquad (15.10.4)
as the mesh size goes to zero.
Next, let δ := max0≤i≤n−1 |Y (si+1 ) − Y (si )|, and let
\omega(\delta) := \max\big\{\, |f''(x) - f''(y)| : x, y \in \mathbb{R},\ |x - y| \le \delta \,\big\}.
and f 00 and the dominated convergence theorem, we can take the limits inside the
expectations in the above identity and get
E\big(f(X(t) - X(s)); A\big) = P(A) + \frac{1}{2}\, E\bigg[\int_s^t f''(X(u) - X(s))\,du; A\bigg].
Let φ(t) := E(f (X(t) − X(s)); A) for t ≥ s. Since f 00 (x) = −θ2 f (x), the above
identity yields the following integral equation for φ:
φ(t) = P(A) - \frac{\theta^2}{2} \int_s^t φ(u)\,du.
The unique solution of this equation is φ(t) = P(A)\, e^{-\theta^2(t-s)/2}. (Uniqueness is
easily established, for example, by Picard iteration.) This proves (15.10.1), and
hence completes the proof of the theorem.
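The Picard iteration mentioned here is easy to visualize numerically: starting from the constant function P(A) and repeatedly applying the map φ ↦ P(A) − (θ^2/2)\int_s^t φ(u)\,du drives the iterates to the exponential solution. A minimal Python sketch (illustrative only; P(A) = 1, s = 0, θ = 1, and the grid are arbitrary choices):

    import numpy as np

    # Picard iteration for phi(t) = K - (theta^2/2) * int_s^t phi(u) du,
    # here with K = 1 (i.e., P(A) = 1), s = 0, theta = 1 (illustrative choices).
    theta, s, t_max, n = 1.0, 0.0, 2.0, 2000
    grid = np.linspace(s, t_max, n)
    du = grid[1] - grid[0]
    phi = np.ones(n)                       # starting guess: the constant K
    for _ in range(50):
        integral = np.concatenate([[0.0], np.cumsum(phi[:-1]) * du])  # int_s^t phi(u) du
        phi = 1.0 - 0.5 * theta ** 2 * integral
    # Maximum deviation from the exponential solution; should be close to zero.
    print(np.max(np.abs(phi - np.exp(-0.5 * theta ** 2 * (grid - s)))))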