
TOPICS FROM ANALYSIS - PART 3/3

T.K.SUBRAHMONIAN MOOTHATHU

Contents

1. Measure theoretic probability: independence and the laws
2. Convergence of a series of random variables
3. Convergence of probability measures
4. Conditional expectation
5. Martingales

This third part of the notes discusses some fundamental results from Probability Theory in the language of Measure Theory, and the treatment is aimed at mathematics students who wish to learn a little bit of Probability Theory for applications in other branches of mathematics.

1. Measure theoretic probability: independence and the laws

The first thing one should learn in Probability Theory is, when to add and when to multiply
probabilities. Roughly speaking, one adds two probabilities when two exclusive events are connected
by OR, and one multiplies two probabilities when two independent events are connected by AND.
For example, assume we randomly choose a pair of numbers (m, n) ∈ {1, . . . , 5} × {1, . . . , 6}. Let
A be the event that m + 4 ≤ n. Since A = {(m = 1 AND n ∈ {5, 6}) OR (m = 2 AND n = 6)},
we obtain prob(A) = (1/5 × 2/6) + (1/5 × 1/6) = 2/30 + 1/30 = 1/10. In this calculation, for
multiplying probabilities we used the fact that the values of m and n are independent of each
other. We will define precisely the notion of independence shortly.
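For readers who like to check such computations mechanically, here is a small illustrative sketch in Python (the brute-force enumeration only confirms the arithmetic above; it is not part of the theory):

    # Enumerate the uniform sample space {1,...,5} x {1,...,6} and count the event m + 4 <= n.
    from fractions import Fraction

    pairs = [(m, n) for m in range(1, 6) for n in range(1, 7)]
    favourable = [(m, n) for (m, n) in pairs if m + 4 <= n]
    prob_A = Fraction(len(favourable), len(pairs))
    print(favourable)   # [(1, 5), (1, 6), (2, 6)]
    print(prob_A)       # 1/10, in agreement with (1/5)(2/6) + (1/5)(1/6)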

Definition: (i) A measure space (X, A, µ) is called a probability space if µ(X) = 1, and in this
case µ is called a probability measure. (ii) Let (X, A, µ) be a probability space. A measurable
function f : (X, A, µ) → (R, B(R)) is called a random variable on X, i.e., we say f : X → R is a
random variable if f −1 (B) ∈ A for every Borel set B ⊂ R, and this is equivalent to demanding
that {x ∈ X : f (x) < c} ∈ A for every c ∈ R. It is possible to define a random variable in a more
general sense as a measurable function f : (X, A, µ) → (Y, B(Y )) for a topological space Y , but we
will consider only the case Y = R. Note that if f is a random variable on X and if g : R → R
is Borel-Borel measurable (in particular, if g is continuous), then g ◦ f is also a random variable
on X. (iii) Let (X, A, µ) be a probability space and f : X → R be a random variable on X.


Then f induces a Borel probability measure P on R by the condition that P (B) = µ(f −1 (B)) for
B ∈ B(R). Any A ∈ A is called an event, and P (B) = µ(f −1 (B)) gives the probability of the event
{x ∈ X : f (x) ∈ B}; see the Example below.

Example: Let X = {1, . . . , 364} and f : X → R be f(n) = the maximum temperature at Hyderabad,
India, on the nth day of the year 2013. Equip X with the normalized counting measure µ, i.e.,
µ(A) = |A|/364 for A ⊂ X. Then f is a random variable on (X, µ) and the induced Borel
probability measure P on R given by P (B) = µ(f −1 (B)) = |{n ∈ X : f (n) ∈ B}|/364 gives the
following information: for reals a < b, the value P ([a, b]) is the probability that the maximum
temperature at Hyderabad on a random day in 2013 is between a and b.

Definition: Let (X, A, µ) be a probability space.


(i) Events A, B ∈ A are independent if µ(A ∩ B) = µ(A)µ(B). Note that A ∩ B = {x ∈ X : x ∈
A and x ∈ B} so that independence is precisely the condition required to have the correspondence
“AND ↔ multiplication”. Independence of events does not mean exclusiveness. For example, if we
consider X = [0, 1] with Lebesgue measure, then the events (0, 1/2) and (1/4, 3/4) are independent,
but (0, 1/2) and (1/2, 1) are not independent as per our definition.

(ii) A collection C = {Cj : j ∈ J} ⊂ A of events is said to be independent if for any finite subset F ⊂ J with |F| ≥ 2, we have µ(⋂_{j∈F} Cj) = ∏_{j∈F} µ(Cj).

(iii) Families Cj ⊂ A (not necessarily σ-algebras) for j ∈ J are said to be independent if for any finite subset F ⊂ J with |F| ≥ 2 and events Cj ∈ Cj, we have µ(⋂_{j∈F} Cj) = ∏_{j∈F} µ(Cj).

(iv) A collection {fj : X → R : j ∈ J} of random variables is said to be independent if the σ-algebras fj^{-1}(B(R)) for j ∈ J are independent. If Pj = µ ◦ fj^{-1} is the induced probability measure on R, then the independence of the random variables fj : X → R for j ∈ J is equivalent to the following: for any finite subset {j1, . . . , jk} ⊂ J with k ≥ 2, the product probability measure P_{j1} ⊗ · · · ⊗ P_{jk} on R^k equals the induced probability measure µ ◦ (f_{j1}, . . . , f_{jk})^{-1}.

(v) A random variable f : X → R is independent with C ⊂ A if f −1 (B(R)) is independent with C.

Remark: Let (X, A, µ) be a probability space. (i) Events A, B ∈ A are independent exactly when
the indicator functions 1A , 1B are independent as random variables (check). Thus the independence
of events is a special case of the independence of random variables. (ii) If random variables f1 , f2 :
X → R are independent and g1 , g2 : R → R are Borel, then it may be verified that the random
variables g1 ◦ f1 , g2 ◦ f2 : X → R are independent.

Example: (i) Let (Xj, Aj, µj) be probability spaces for j = 1, 2 and let (X, A, µ) be the product probability space, i.e., X = X1 × X2, A = A1 ⊗ A2 and µ = µ1 ⊗ µ2. If Aj ∈ Aj for j = 1, 2, then µ((A1 × X2) ∩ (X1 × A2)) = µ(A1 × A2) = µ1(A1)µ2(A2) = µ(A1 × X2)µ(X1 × A2), and hence the events A1 × X2, X1 × A2 are independent in (X, A, µ). (ii) Let µ = µ1 ⊗ µ2 be a product Borel probability measure on R^2. Then the projections f1, f2 : (R^2, µ) → R to the two coordinates are independent random variables, and the induced measures Pj = µ ◦ fj^{-1} for j = 1, 2 satisfy Pj = µj, so that µ = P1 ⊗ P2.

Remark: The above two examples are instructive, and they provide a useful way of looking at
the notion of independence: roughly speaking, we may think of independent events as events
happening in distinct coordinates of a product probability space, and independent random variables
as projection-like functions that depend on distinct coordinates of a product probability space.

Definition: Let (X, A, µ) be a probability space and f : X → R be a random variable. The expectation of f is defined as Ef = ∫_X f dµ ∈ [−∞, ∞] if the integral exists. Intuitively, Ef is the average value of f. Evidently the expectation is linear when defined. If f ∈ L2(X, A, µ) ⊂ L1(X, A, µ), observe that E(f − Ef)^2 = ∫(f − Ef)^2 dµ = ∫ f^2 dµ − 2Ef ∫ f dµ + ∫(Ef)^2 dµ = Ef^2 − 2(Ef)^2 + (Ef)^2 = Ef^2 − (Ef)^2. The variance of f is defined as var(f) = ∥f − Ef∥_2^2 = E(f − Ef)^2 = Ef^2 − (Ef)^2 when f ∈ L2(X, A, µ), and var(f) = ∞ if Ef^2 = ∞. The variance as well as the standard deviation σ(f) := √var(f) measures the amount of deviation of f from the average (expected) value Ef. If f, g ∈ L2(X, A, µ), then the covariance of f and g is defined as cov(f, g) = ⟨f − Ef, g − Eg⟩ = E((f − Ef)(g − Eg)) = E(fg) − Ef·Eg.

Example: Consider X = [0, 1] with Lebesgue measure µ. If f : X → R is f(x) = x^2, then Ef = ∫_0^1 x^2 dµ = 1/3 and var(f) = Ef^2 − (Ef)^2 = ∫_0^1 x^4 dµ − (1/3)^2 = 1/5 − 1/9 = 4/45. If g : [0, 1] → R is g(x) = x^3, then cov(f, g) = E(fg) − Ef·Eg = ∫_0^1 x^5 dµ − (1/3)(1/4) = 1/6 − 1/12 = 1/12 ≠ 0.
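The numbers above are easy to confirm numerically. The following sketch (Python, with numpy assumed available; the sample size and seed are arbitrary) estimates Ef, var(f) and cov(f, g) by Monte Carlo on X = [0, 1] with Lebesgue measure:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=10**6)    # samples from Lebesgue measure on [0,1]
    f, g = x**2, x**3

    print(f.mean())                          # ~ 1/3  (= Ef)
    print(f.var())                           # ~ 4/45 (= Ef^2 - (Ef)^2)
    print(np.mean(f*g) - f.mean()*g.mean())  # ~ 1/12 (= cov(f,g))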

Exercise-31: [Change of variable] Let (X, A, µ) be a probability space, f : X → R be a random variable, and P = µ ◦ f^{-1}. Let g : R → R be Borel, and Pg = µ ◦ (g ◦ f)^{-1}. Then,
(i) ∫_X |g ◦ f| dµ = ∫_R |g| dP = ∫_R |y| dPg(y).
(ii) If ∫_X |g ◦ f| dµ < ∞, then E(g ◦ f) = ∫_X g ◦ f dµ = ∫_R g dP = ∫_R y dPg(y).
(iii) ∫_X |f| dµ = ∫_R |x| dP. If ∫_X |f| dµ < ∞, then Ef = ∫_R x dP(x).
[Hint: (i) By linearity, and approximation by nonnegative simple functions, it is enough to consider the case g = 1_B, where B ∈ B(R). Let A = f^{-1}(B) ⊂ X. Then g ◦ f = 1_A so that ∫_R |g| dP = ∫_R 1_B dP = P(B) = µ(A) = ∫_X |g ◦ f| dµ. Since Pg = µ ◦ (1_A)^{-1} = µ(A^c)µ_0 + µ(A)µ_1 (where µ_0, µ_1 are Dirac measures at 0, 1), we have ∫_R |y| dPg(y) = ∫_{{0}} |y| dPg(y) + ∫_{{1}} |y| dPg(y) = 0 + ∫_{{1}} 1 dPg(y) = Pg({1}) = µ(A). Part (ii) has a similar argument, and (iii) follows by taking g to be the identity map on R.]

Exercise-32: Let (X, A, µ) be a probability space and f1, f2 : X → R be random variables.
(i) [Product rule] If f1, f2 ∈ L1(X, A, µ) and f1, f2 are independent, then f1 f2 ∈ L1(X, A, µ) and E(f1 f2) = (Ef1)(Ef2).
(ii) If f1, f2 ∈ L2(X, A, µ), then var(f1 + f2) = var(f1) + 2cov(f1, f2) + var(f2). If f1, f2 are also independent, then cov(f1, f2) = 0 and hence var(f1 + f2) = var(f1) + var(f2).
[Hint: (i) Let Pj = µ ◦ fj^{-1} on R, and P = µ ◦ (f1, f2)^{-1} on R^2. We have P = P1 ⊗ P2 by independence. Hence by the argument in Exercise-31 and the Fubini-Tonelli theorem, we have E(|f1 f2|) = ∫_{R^2} |xy| dP(x, y) = (∫_R |x| dP1)(∫_R |y| dP2) = (E|f1|)(E|f2|) < ∞ so that E(f1 f2) = ∫_{R^2} xy dP(x, y) = (∫_R x dP1)(∫_R y dP2) = (Ef1)(Ef2). (ii) Let gj = fj − Efj for j = 1, 2. Then var(f1 + f2) = ∥g1 + g2∥_2^2 = ⟨g1 + g2, g1 + g2⟩ = ∥g1∥_2^2 + 2⟨g1, g2⟩ + ∥g2∥_2^2.]

Remark: For f (x) = x2 and g(x) = x3 on [0, 1], E(f g) = 1/6 ̸= 1/12 = Ef Eg; this shows that the
hypothesis of independence is necessary in Exercise-32(i).

Definition: For a sequence (An) of subsets of a set X, let
lim inf_{n→∞} An = ⋃_{k=1}^∞ ⋂_{n=k}^∞ An = {x ∈ X : x ∈ An for all large n ∈ N}, and
lim sup_{n→∞} An = ⋂_{k=1}^∞ ⋃_{n=k}^∞ An = {x ∈ X : x ∈ An for infinitely many n ∈ N}.

[132] [Borel-Cantelli lemma or 0-1 law] Let (X, A, µ) be a probability space and An ∈ A.
(i) If ∑_{n=1}^∞ µ(An) < ∞, then µ(lim sup_{n→∞} An) = 0.
(ii) If ∑_{n=1}^∞ µ(An) = ∞ and the An's are independent, then µ(lim sup_{n→∞} An) = 1.
Proof. Let Bk = ⋃_{n≥k} An and A = ⋂_{k=1}^∞ Bk = lim sup_{n→∞} An.
(i) If ∑_{n=1}^∞ µ(An) < ∞, then µ(Bk) ≤ ∑_{n=k}^∞ µ(An) → 0 as k → ∞ and hence µ(A) = 0.
(ii) Let ∑_{n=1}^∞ µ(An) = ∞. Using independence and the fact that 1 − p ≤ e^{−p} for p ∈ [0, 1], we obtain µ(Bk^c) = µ(⋂_{n≥k} An^c) = ∏_{n≥k} µ(An^c) = ∏_{n≥k} (1 − µ(An)) ≤ ∏_{n≥k} e^{−µ(An)} = e^{−∑_{n≥k} µ(An)} = e^{−∞} = 0. Thus µ(Bk) = 1 for every k ∈ N so that µ(A) = 1. □

Definition: A collection A of subsets of a nonempty set X is called a π-system if A is closed under finite intersections, and is called a λ-system if X ∈ A, A \ B ∈ A whenever A, B ∈ A and B ⊂ A, and A is closed under countable disjoint unions.

Exercise-33: (i) If A is both a π-system and a λ-system on X, then A is a σ-algebra on X.
(ii) Let C be a π-system and D be a λ-system on X. If C ⊂ D, then the σ-algebra σ(C) generated by C satisfies σ(C) ⊂ D.
(iii) Let (X, A, µ) be a probability space, A ∈ A, and C ⊂ A be a π-system. If A is independent with C, then A is independent with the σ-algebra σ(C) generated by C.
[Hint: (i) Since X ∈ A and A is closed under proper differences, A is closed under complements; being also a π-system, A is then closed under finite intersections. Given An ∈ A, let B1 = A1 and B_{n+1} = A_{n+1} \ ⋃_{j=1}^n Aj = A_{n+1} ∩ A1^c ∩ · · · ∩ An^c ∈ A. Then the Bn's are pairwise disjoint, and consequently ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn ∈ A. (ii) Let A be the minimal λ-system containing C, so that C ⊂ A ⊂ D. It is enough to show A is a π-system. Let A1 = {A ⊂ X : A ∩ C ∈ A for every C ∈ C}, which is a λ-system containing C. So A ⊂ A1, and hence A ∩ C ∈ A for A ∈ A and C ∈ C. Let A2 = {B ⊂ X : A ∩ B ∈ A for every A ∈ A}, which is a λ-system containing C. So A ⊂ A2, and hence A ∩ B ∈ A for A, B ∈ A. (iii) Let D = {D ∈ A : µ(A ∩ D) = µ(A)µ(D)}. Then D is a λ-system containing C. Hence σ(C) ⊂ D.]

[133] [Kolmogorov's 0-1 law] (i) Let (fn) be a sequence of independent random variables on a probability space (X, A, µ). Let Tk ⊂ A be the smallest σ-algebra such that fn is measurable for n ≥ k, and let T = ⋂_{k=1}^∞ Tk, the σ-algebra of tail events. Then µ(A) = 0 or µ(A) = 1 for each A ∈ T.
(ii) Let (Xn, An, µn) be probability spaces for n ∈ N, and let (X, A, µ) be their product probability space, i.e., X = ∏_{n=1}^∞ Xn and µ = ⊗_{n=1}^∞ µn. Let πn : X → Xn be the projection to the nth coordinate, and recall that the product σ-algebra A is the smallest σ-algebra on X w.r.to which all the projections πn are measurable. Let Tk be the smallest σ-algebra on X such that the πn's are measurable for n ≥ k, and let T = ⋂_{k=1}^∞ Tk. Then µ(A) = 0 or µ(A) = 1 for each A ∈ T.

Proof. (i) Let Ck be the smallest σ-algebra such that f1, . . . , fk are measurable. Let C = ⋃_{k=1}^∞ Ck, which is a π-system, being an increasing union of σ-algebras. If A ∈ T, then A ∈ T_{k+1} and hence A is independent with Ck for each k ∈ N. Hence A is independent with C, and therefore A is independent with σ(C) by Exercise-33(iii). Since A ∈ T1 ⊂ σ(C), we get A is independent with itself, i.e., µ(A) = µ(A ∩ A) = µ(A)µ(A) = µ(A)^2. Hence µ(A) = 0 or µ(A) = 1.
(ii) Let Ck be the smallest σ-algebra on X such that π1, . . . , πk are measurable, let C = ⋃_{k=1}^∞ Ck, and imitate the argument given for part (i) after noting that Ck is independent with T_{k+1}. □

Remark: Tail events for an independent sequence (fn ) of random variables are those events that are
not affected by changing/removing finitely many fn ’s. For example, {x ∈ X : lim inf n→∞ fn (x) ≥ 0}
is a tail event, but {x ∈ X : inf n∈N fn (x) ≥ 0} is not a tail event.

Exercise-34: Let (fn) be an independent sequence of random variables on a probability space (X, A, µ), and T be the corresponding tail σ-algebra on X.
(i) If f : X → [−∞, ∞] is T-B([−∞, ∞]) measurable, then f is constant µ-almost everywhere.
(ii) Each of the following is a constant in [−∞, ∞] for µ-almost every x ∈ X: lim inf_{n→∞} fn, lim sup_{n→∞} fn, lim inf_{n→∞} (∑_{j=1}^n fj)/n, and lim sup_{n→∞} (∑_{j=1}^n fj)/n.
[Hint: (i) For c ∈ [−∞, ∞], Ac := f^{-1}([−∞, c]) ∈ T so that µ(Ac) = 0 or 1. Let c0 = inf{c : µ(Ac) = 1} and check µ(f^{-1}(c0)) = 1. (ii) The given functions are T-B([−∞, ∞]) measurable.]

Earlier we showed in [129] that the shift map is ergodic. Now we will give another proof of this
using Kolmogorov’s 0-1 law.

[134] [Ergodicity of the shift map] Let (X, A0, µ0) be a probability space, and let (X^N, A, µ) be the product probability space, where A = ⊗_{n=1}^∞ A0 and µ = ⊗_{n=1}^∞ µ0. Then the shift map g : X^N → X^N given by g(x)_n = x_{n+1} for x ∈ X^N and n ∈ N is ergodic.
Proof. Let A ∈ A with g^{-1}(A) = A. Let Tk, T be as in [133](ii). As A is the smallest σ-algebra such that all the projections are measurable, we have A = T1. Thus A ∈ T1. Then A = g^{−(k−1)}(A) ∈ Tk for k > 1 since g is the shift. Hence A ∈ ⋂_{k=1}^∞ Tk = T. So µ(A) ∈ {0, 1} by [133](ii). □

Remark: A little more work will establish the same conclusion for the shift map on X Z , see p.271
of R.M. Dudley, Real Analysis and Probability.

The phrase law of large numbers refers to the convergence of the averages (f1 + · · · + fn )/n when
fn ’s are independent random variables. There are weak and strong laws of large numbers. To
establish the weak law, we start with some inequalities.

Exercise-35: Let (X, A, µ) be a probability space.
(i) [Markov's inequality] If f : X → [0, ∞) is measurable, then µ({x ∈ X : f(x) ≥ t}) ≤ Ef/t = (∫_X f dµ)/t for 0 < t < ∞.
(ii) If g : X → R is a random variable and 0 < r, t < ∞, then µ({x ∈ X : |g(x)| ≥ t}) ≤ (E|g|^r)/t^r.
(iii) [Chebychev's inequality] Let h : X → R be a random variable with Eh^2 < ∞ and var(h) = σ^2(h) ∈ (0, ∞). Then for 0 < t < ∞, we have µ({x ∈ X : |h(x) − Eh| ≥ t}) ≤ var(h)/t^2.
[Hint: For (i), let A = {x : f(x) ≥ t}, note 1_A ≤ f/t, and integrate. For (ii), note |g(x)| ≥ t iff |g(x)|^r ≥ t^r, and apply (i) to f(x) = |g(x)|^r. For (iii), apply (ii) to g(x) = h(x) − Eh with r = 2.]
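A quick sanity check of these inequalities (a Python sketch with numpy assumed; the exponential law used here is just a convenient test case, not part of the exercise):

    import numpy as np

    rng = np.random.default_rng(1)
    f = rng.exponential(scale=1.0, size=10**6)   # f >= 0 with Ef = 1 and var(f) = 1
    t = 3.0

    print(np.mean(f >= t), "<=", f.mean() / t)                       # Markov: P(f >= t) <= Ef/t
    print(np.mean(np.abs(f - f.mean()) >= t), "<=", f.var() / t**2)  # Chebychev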

Definition: A sequence (fn) of random variables on a probability space (X, A, µ) is said to be identically distributed if µ ◦ fm^{-1} = µ ◦ fn^{-1} for every m, n.

Remark: Let (fn) be identically distributed and P = µ ◦ fn^{-1}. If f1 ∈ L1, then Efn = ∫_R x dP = Ef1 for every n by Exercise-31(iii). Similarly, if f1 ∈ L2, then var(fn) = var(f1) for every n.

[135] [Weak law of large numbers] Let (fn) be a sequence of independent, identically distributed random variables on a probability space (X, A, µ) with Ef1^2 < ∞. Then (f1 + · · · + fn)/n → Ef1 in measure, i.e., µ({x ∈ X : |Ef1 − (1/n)∑_{j=1}^n fj(x)| ≥ ε}) → 0 as n → ∞ for each ε > 0.

Proof. Let sn := f1 + · · · + fn. Then E(sn/n) = Ef1 by linearity and the above Remark. Similarly, var(sn/n) = (1/n^2)var(sn) = (1/n^2)·n·var(f1) = var(f1)/n by Exercise-32 and the above Remark. By Chebychev's inequality, µ({x ∈ X : |sn(x)/n − Ef1| ≥ ε}) ≤ var(f1)/(nε^2) → 0 as n → ∞. □

Remark: On a probability space, pointwise convergence a.e. implies convergence in measure; see
Egorov’s theorem in my notes on Measure Theory. Hence [136] below covers [135] above.

[136] [Strong law of large numbers] Let (fn ) be a sequence of independent, identically distributed
random variables on a probability space (X, A, µ). If E|f1 | < ∞, then (f1 + · · · + fn )/n → Ef1
pointwise µ-almost everywhere.
Proof. Let P = ⊗_{n=1}^∞ (µ ◦ fn^{-1}) = ⊗_{n=1}^∞ (µ ◦ f1^{-1}) be the product Borel measure on R^N, and let ϕ : R^N → R be (y1, y2, . . .) ↦ y1. Since f : (X, µ) → (R^N, P) defined as f(x) = (fn(x)) is measure preserving, ∫_{R^N} |ϕ| dP = ∫_X |ϕ ◦ f| dµ = ∫_X |f1| dµ = E|f1| < ∞. Thus ϕ ∈ L1(R^N, B(R^N), P). Let g : (R^N, P) → (R^N, P) be the shift map. By [134], g is ergodic, and by Birkhoff's ergodic theorem, lim_{n→∞} (1/n)∑_{j=0}^{n−1} ϕ(g^j(y)) = ∫_{R^N} ϕ(y) dP(y) for P-almost every y ∈ R^N. Since f is measure preserving, the substitution y = f(x) gives lim_{n→∞} (1/n)∑_{j=0}^{n−1} ϕ(g^j(f(x))) = ∫_X f1(x) dµ(x) = Ef1 for µ-almost every x ∈ X. As ϕ ◦ g^j ◦ f = f_{j+1}, we have (1/n)∑_{j=1}^n fj = (1/n)∑_{j=0}^{n−1} ϕ ◦ g^j ◦ f. Thus lim_{n→∞} (1/n)∑_{j=1}^n fj(x) = Ef1 for µ-almost every x ∈ X. □
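The strong law is easy to visualise by simulation. The sketch below (Python, numpy assumed; the uniform law, the horizon and the number of paths are arbitrary choices for illustration) tracks the running averages of i.i.d. random variables along a few sample paths; they settle near Ef1, in agreement with [136]:

    import numpy as np

    rng = np.random.default_rng(2)
    n, paths = 100_000, 3
    # i.i.d. samples with Ef1 = 0.5 (uniform on [0,1]); any integrable law would do
    samples = rng.uniform(0.0, 1.0, size=(paths, n))
    running_avg = np.cumsum(samples, axis=1) / np.arange(1, n + 1)
    print(running_avg[:, -1])   # each entry is close to 0.5 = Ef1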

Definition: If P1, P2 are finite signed Borel measures on R, then their convolution P1 ∗ P2 is the signed Borel measure on R defined by P1 ∗ P2(A) = ∫_R P2(A − s) dP1(s). The relevance of this definition can be seen from the following:

Exercise-36: (i) Let P1, P2, P3 be finite signed Borel measures on R. Then we have P1 ∗ P2(A) = P1 ⊗ P2({(s, t) ∈ R^2 : s + t ∈ A}). Consequently, P1 ∗ P2 = P2 ∗ P1 and P1 ∗ (P2 ∗ P3) = (P1 ∗ P2) ∗ P3. Moreover, if µ0 is the Dirac measure at 0, then P1 ∗ µ0 = P1 = µ0 ∗ P1.
(ii) [Convolution of induced measures corresponds to the addition of independent random variables] Let f1, f2 : X → R be independent random variables on a probability space (X, A, µ), and Pj = µ ◦ fj^{-1} for j = 1, 2. Then P1 ∗ P2 = µ ◦ (f1 + f2)^{-1}.
[Hint: (i) 1_{A−s}(t) = 1_A(s + t). Hence P1 ∗ P2(A) = ∫_R P2(A − s) dP1(s) = ∫_R ∫_R 1_{A−s}(t) dP2(t) dP1(s) = ∫_{R^2} 1_A(s + t) d(P1 ⊗ P2) = P1 ⊗ P2({(s, t) ∈ R^2 : s + t ∈ A}). (ii) By independence, µ ◦ (f1, f2)^{-1} on R^2 is the product measure P1 ⊗ P2. Hence P1 ∗ P2(A) = P1 ⊗ P2({(s, t) ∈ R^2 : s + t ∈ A}) = µ ◦ (f1, f2)^{-1}({(s, t) ∈ R^2 : s + t ∈ A}) = µ({x ∈ X : f1(x) + f2(x) ∈ A}) = µ ◦ (f1 + f2)^{-1}(A).]
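Exercise-36(ii) can be seen concretely for discrete laws: for two independent fair dice (our own illustrative example), the probability vector of the sum is the discrete convolution of the two probability vectors. A Python sketch, with numpy assumed:

    import numpy as np

    rng = np.random.default_rng(3)
    p = np.full(6, 1/6)                      # law P1 = P2 of one fair die on {1,...,6}
    conv = np.convolve(p, p)                 # P1 * P2, supported on {2,...,12}

    rolls = rng.integers(1, 7, size=(10**6, 2)).sum(axis=1)
    empirical = np.bincount(rolls, minlength=13)[2:] / rolls.size
    print(np.round(conv, 4))
    print(np.round(empirical, 4))            # close to conv, i.e. to mu o (f1+f2)^{-1}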

2. Convergence of a series of random variables

Let (fn) be a sequence of random variables. In the previous section, we analyzed the convergence of the averages (1/n)∑_{j=1}^n fj. In this section we ask: when does the series ∑_{n=1}^∞ fn converge? Equivalently, we wish to know when the sequence sn = f1 + · · · + fn of partial sums converges.

In general, convergence in measure for a sequence of random variables on a probability space


implies only that a subsequence converges pointwise almost everywhere. However, after a little bit
of preparation we will show that for the partial sum sequence mentioned above, convergence in
measure is equivalent to pointwise convergence almost everywhere (Levy’s theorem).

Exercise-37: [Completeness for a.e. convergence] Let (sn) be a sequence of random variables on a probability space (X, A, µ). If for every ε > 0 we have lim_{k→∞} µ({x ∈ X : sup_{n≥k} |sn(x) − sk(x)| ≥ ε}) = 0, then there is a random variable s on (X, A, µ) such that (sn) → s pointwise µ-a.e. [Hint: Let A(k, r) = {x ∈ X : sup_{m,n≥k} |sm(x) − sn(x)| ≤ 1/r}, and A(r) = ⋃_{k=1}^∞ A(k, r), which is an increasing union. Using the triangle inequality for ε = 1/(2r) we may see µ(A(r)) = 1. Then µ(⋂_{r=1}^∞ A(r)) = 1. And for each x ∈ ⋂_{r=1}^∞ A(r), the Cauchy sequence (sn(x)) converges.]

We also need a technical inequality relating the partial sums sj and max_{1≤j≤m} sj for Levy's theorem:

Exercise-38: Let f1, . . . , fm be independent random variables on a probability space (X, A, µ), and sj = ∑_{i=1}^j fi. For ε > 0 and 1 ≤ j ≤ m, let S(j, ε) = {x ∈ X : |sj(x)| ≥ ε} and T(j, ε) = {x ∈ X : |f_{j+1}(x) + · · · + fm(x)| = |sm(x) − sj(x)| > ε}. If there is ε > 0 such that µ(T(j, ε)) ≤ 1/2 for 1 ≤ j ≤ m, then µ(⋃_{j=1}^m S(j, 2ε)) ≤ 2µ(S(m, ε)). [Hint: Let A1 = S(1, 2ε) and A_{j+1} = S(j + 1, 2ε) \ ⋃_{i=1}^j S(i, 2ε), which are disjoint. Note that Aj is independent with T(j, ε) since Aj depends on f1, . . . , fj and T(j, ε) on f_{j+1}, . . . , fm. If |sj(x)| ≥ 2ε and |sm(x) − sj(x)| ≤ ε, then |sm(x)| ≥ ε, and therefore Aj ∩ T(j, ε)^c ⊂ Aj ∩ S(m, ε). Hence (1/2)µ(⋃_{j=1}^m S(j, 2ε)) = (1/2)∑_{j=1}^m µ(Aj) ≤ ∑_{j=1}^m µ(Aj)µ(T(j, ε)^c) = ∑_{j=1}^m µ(Aj ∩ T(j, ε)^c) ≤ ∑_{j=1}^m µ(Aj ∩ S(m, ε)) ≤ µ(S(m, ε)).]

[137] [Levy's theorem] Let (fn) be a sequence of independent random variables on a probability space (X, A, µ). Then the following are equivalent:
(i) ∑_{n=1}^∞ fn converges (to some random variable) in measure.
(ii) ∑_{n=1}^∞ fn converges (to some random variable) pointwise µ-a.e.

Proof. Enough to show (i) ⇒ (ii) since the other implication is always true on a probability space.

Let sn = nj=1 fj . Given ε ∈ (0, 1/2), by (i) choose k ∈ N large enough with µ({x ∈ X : |sn (x) −
sm (x)| > ε} ≤ ε < 1/2 for every n, m ≥ k. Fix m ∈ N, and applying Exercise-38 to fk+1 , · · · , fk+m ,
we get µ({x ∈ X : max1≤j≤m |sk+j (x) − sk (x)| ≥ 2ε}) ≤ 2µ({x ∈ X : |sk+m (x) − sk (x)| > ε}) < ε.
As m ∈ N is arbitrary, we have shown that for every ε ∈ (0, 1/2) there is k ∈ N such that
µ({x ∈ X : supn≥k |sn (x) − sk (x)| ≥ 2ε}) ≤ ε. This is equivalent to saying limk→∞ µ({x ∈ X :
supn≥k |sn (x) − sk (x)| ≥ 2ε}) = 0 for every ε ∈ (0, 1/2). By Exercise-37, we are through. 
In [138] and Exercise-40 below we will see sufficient conditions for the convergence of ∑_{n=1}^∞ fn.

[138] [Khintchine-Kolmogorov] Let (fn) be an L2-sequence of independent random variables on a probability space (X, A, µ).
(i) If Efn = 0 for every n ∈ N and ∑_{n=1}^∞ Efn^2 < ∞, then ∑_{n=1}^∞ fn converges pointwise µ-a.e., and in L2-norm, to some s ∈ L2(X, A, µ). Also, Es^2 = ∑_{n=1}^∞ Efn^2.
(ii) If the real series ∑_{n=1}^∞ Efn and ∑_{n=1}^∞ var(fn) are convergent, then ∑_{n=1}^∞ fn converges pointwise µ-a.e. and in L2-norm to some s ∈ L2(X, A, µ).
Proof. (i) Let sn = ∑_{j=1}^n fj. Since L2-convergence implies convergence in measure, and because of Levy's theorem, it suffices to show (sn) is Cauchy in L2(X, A, µ). Given ε > 0 choose k0 ∈ N such that ∑_{n=k0}^∞ Efn^2 < ε. Using Exercise-32(ii) and the fact Efn = 0, we see for m > k ≥ k0 that ∥sm − sk∥_2^2 = E(∑_{n=k+1}^m fn)^2 = var(∑_{n=k+1}^m fn) = ∑_{n=k+1}^m var(fn) = ∑_{n=k+1}^m Efn^2 < ε. Thus (sn) is Cauchy in L2(X, A, µ). Let s = lim_{n→∞} sn in L2(X, A, µ). Note that ∥sn∥_2^2 = ∑_{j=1}^n Efj^2 as above, and therefore Es^2 = ∥s∥_2^2 = lim_{n→∞} ∥sn∥_2^2 = ∑_{j=1}^∞ Efj^2.
(ii) After noting that E(fn − Efn) = 0 and var(fn) = E(fn − Efn)^2, we apply (i) to fn − Efn to conclude that ∑_{n=1}^∞ (fn − Efn) converges pointwise µ-a.e. and in L2-norm to some s′ ∈ L2(X, A, µ). Then ∑_{n=1}^∞ fn converges pointwise µ-a.e. and in L2-norm to s := s′ + ∑_{n=1}^∞ Efn ∈ L2(X, A, µ). □
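A classical illustration of [138](i): take fn = εn/n with independent signs εn = ±1 (so Efn = 0 and ∑ Efn^2 = ∑ 1/n^2 < ∞); the series ∑ fn then converges a.e. The following Python sketch (numpy assumed; horizon and number of paths are arbitrary) computes a few sample partial sums at several checkpoints:

    import numpy as np

    rng = np.random.default_rng(4)
    n, paths = 10**5, 4
    signs = rng.choice([-1.0, 1.0], size=(paths, n))
    partial = np.cumsum(signs / np.arange(1, n + 1), axis=1)
    # compare the partial sums at a few checkpoints along each path:
    # they change only slightly, reflecting the a.e. convergence of the series
    print(partial[:, [10**3 - 1, 10**4 - 1, 10**5 - 1]])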

Remark: In the previous section, to obtain results about the convergence of (f1 + · · · + fn)/n, we used the hypothesis that the fn's are independent and identically distributed. However, being identically distributed does not help much in the discussion of the convergence of ∑_{n=1}^∞ fn. If the fn's are identically distributed and (fn) satisfies the hypothesis of [138](ii), then we must have Efn = 0 = var(fn) for every n, and this implies Efn^2 = 0, giving fn = 0 µ-a.e. for every n ∈ N.

Exercise-39: Let (fn), (gn) be two sequences of random variables on a probability space (X, A, µ), equivalent in the sense that ∑_{n=1}^∞ µ({x ∈ X : fn(x) ≠ gn(x)}) < ∞. Then ∑_{n=1}^∞ fn converges pointwise µ-a.e. iff ∑_{n=1}^∞ gn converges pointwise µ-a.e. [Hint: Let An = {x ∈ X : fn(x) ≠ gn(x)} and A = lim sup_{n→∞} An. By Borel-Cantelli, µ(A) = 0. And for each x ∈ A^c, we have that fn(x) = gn(x) for all large n ∈ N.]

Exercise-40: [Kolmogorov's three series theorem - sufficiency] Let (fn) be a sequence of independent random variables on a probability space (X, A, µ), and let (gn) be defined as gn(x) = fn(x) when |fn(x)| ≤ 1 and gn(x) = 0 otherwise. Assume that the following three series of real numbers are convergent: (i) ∑_{n=1}^∞ µ({x ∈ X : |fn(x)| > 1}) (ii) ∑_{n=1}^∞ Egn (iii) ∑_{n=1}^∞ var(gn). Then ∑_{n=1}^∞ fn converges pointwise µ-a.e. [Hint: Check that the gn's are also independent. Now, (i) says ∑_{n=1}^∞ µ({x ∈ X : fn(x) ≠ gn(x)}) < ∞. Hence by Exercise-39, it suffices to show ∑_{n=1}^∞ gn converges pointwise µ-a.e. And this follows from (ii), (iii) and [138](ii).]

Remark: The converse (necessity) of Exercise-40 is also true. Also, any bound C > 0 may be used in place of 1 to define gn from fn, as ∑_{n=1}^∞ fn converges iff ∑_{n=1}^∞ (fn/C) converges.

3. Convergence of probability measures

Definition: A complete separable metric space is called a Polish space. If X is a Polish space, let
M (X) denote the collection of all Borel probability measures on X. Also, let Cb (X, R) = {f :
X → R : f is continuous and bounded}. Note that Cb (X, R) is a Banach space with respect to the
supremum norm, and any µ ∈ M(X) induces a bounded linear functional on Cb(X, R) by the rule f ↦ ∫_X f dµ. If µ, µn ∈ M(X), then borrowing the notion of weak* convergence from Functional Analysis, we may say (µn) converges to µ weak* if ∫_X f dµn → ∫_X f dµ for every f ∈ Cb(X, R). However, this convergence is called weak convergence in Probability Theory. That is, we say (µn) → µ weakly in M(X) if ∫_X f dµn → ∫_X f dµ for every f ∈ Cb(X, R).
10 T.K.SUBRAHMONIAN MOOTHATHU

Exercise-41: Let X be a Polish space, Γ ⊂ Cb(X, R) be a dense subset, and µ, µn ∈ M(X). If (∫_X g dµn) → ∫_X g dµ for every g ∈ Γ, then (µn) → µ weakly. [Hint: Given f ∈ Cb(X, R) and ε > 0, choose g ∈ Γ with ∥f − g∥ < ε. Then |∫ f d(µ − µn)| ≤ |∫ (f − g) dµ| + |∫ g d(µ − µn)| + |∫ (g − f) dµn|, where the first and third terms are ≤ ε since ∥f − g∥ < ε and µ(X) = µn(X) = 1.]

[139] Let X be a Polish space and µ, µn ∈ M (X). Then the following are equivalent:
(i) (µn ) → µ weakly.
(ii) lim inf n→∞ µn (U ) ≥ µ(U ) for every open set U ⊂ X.
(iii) lim supn→∞ µn (F ) ≤ µ(F ) for every closed set F ⊂ X.
(iv) If A ∈ B(X) is with µ(∂A) = 0, then limn→∞ µn (A) = µ(A).

Proof. (i) ⇒ (ii): Let Fk = {x ∈ X : dist(x, U^c) ≥ 1/k} and let gk : X → [0, 1] be a continuous function such that gk ≡ 1 on Fk and gk ≡ 0 on U^c. Then U = ⋃_{k=1}^∞ Fk is an increasing union, and (gk) ↗ 1_U pointwise on X. Given ε > 0, choose k ∈ N large with µ(Fk) > µ(U) − ε. Now, µ(U) − ε < µ(Fk) ≤ ∫_X gk dµ = lim inf_{n→∞} ∫_X gk dµn ≤ lim inf_{n→∞} ∫_X 1_U dµn = lim inf_{n→∞} µn(U).
(ii) ⇔ (iii): Just use the fact that U ⊂ X is open iff U^c is closed.
(ii & iii) ⇒ (iv): Let U = int(A) and F = Ā (the closure of A). We have µ(U) = µ(A) = µ(F) since µ(∂A) = 0. Hence lim sup_{n→∞} µn(A) ≤ lim sup_{n→∞} µn(F) ≤ µ(F) = µ(A) = µ(U) ≤ lim inf_{n→∞} µn(U) ≤ lim inf_{n→∞} µn(A). This implies equality throughout, and therefore lim_{n→∞} µn(A) = µ(A).

(iv) ⇒ (i): Consider f ∈ Cb(X, R) and assume f(X) ⊂ (−M, M). The set Y := {y ∈ R : µ(f^{-1}(y)) > 0} is countable since f^{-1}(y1) ∩ f^{-1}(y2) = ∅ for y1 ≠ y2. Given ε > 0, choose a partition −M = a0 < a1 < · · · < a_{k−1} < ak = M of [−M, M] such that aj − a_{j−1} < ε for 1 ≤ j ≤ k and aj ∉ Y for 0 ≤ j ≤ k. If we put Aj = {x ∈ X : a_{j−1} ≤ f(x) < aj}, then X = ⋃_{j=1}^k Aj is a disjoint union. Moreover, µ(∂Aj) = 0 for 1 ≤ j ≤ k because ∂Aj ⊂ f^{-1}(a_{j−1}) ∪ f^{-1}(aj). This implies µ(Aj) = lim_{n→∞} µn(Aj) for 1 ≤ j ≤ k by (iv). Let g : X → R be defined as g = ∑_{j=1}^k aj 1_{Aj}. Then ∫ g dµ = lim_{n→∞} ∫ g dµn since µ(Aj) = lim_{n→∞} µn(Aj). Also, f < g < f + ε, and therefore
(lim sup_{n→∞} ∫ f dµn) − ε ≤ lim sup_{n→∞} ∫ (g − ε) dµn = ∫ (g − ε) dµ ≤ ∫ f dµ ≤ ∫ g dµ = lim inf_{n→∞} ∫ g dµn ≤ lim inf_{n→∞} ∫ (f + ε) dµn = (lim inf_{n→∞} ∫ f dµn) + ε.
Since ε > 0 is arbitrary, we deduce that lim_{n→∞} ∫ f dµn exists and is equal to ∫ f dµ. □

Remark: Let X be a compact metric space. Then C(X, R) = Cb (X, R) is a separable Banach space
w.r.to the supremum norm (for a proof of separability, see the first part of my notes on Topological
Groups). By Alaoglu’s theorem, etc., the closed unit ball Γ of the dual space C(X, R)∗ is compact
and metrizable in the weak* topology (warning: the unit sphere {f ∈ C(X, R) : ∥f ∥ = 1} is not
closed in the weak* topology, and hence is not weak* compact). The collection M (X) of all Borel
probability measures on X can be thought of as a subset of Γ by Riesz representation theorem.

Exercise-42: Let X be a compact metric space and M (X) ⊂ C(X, R)∗ be the collection of all Borel
probability measures on X. Then,
(i) M (X) is compact and metrizable w.r.to the weak* topology on C(X, R)∗ .
(ii) Every sequence (µn ) in M (X) has a subsequence (µnk ) converging weakly to some µ ∈ M (X),
i.e., ∫_X f dµ_{nk} → ∫_X f dµ for every f ∈ C(X, R).
[Hint: (i) Enough to check M(X) is weak* closed, and this is easy: if µn ∈ M(X) and (µn) → µ weakly, then µ(X) = ∫_X 1 dµ = lim_{n→∞} ∫_X 1 dµn = lim_{n→∞} µn(X) = 1. And (ii) follows from (i).]

Remark: Let X be a compact metric space and {fn : n ∈ N} be a countable dense subset of the closed unit ball of C(X, R). Then it may be shown that µ ↦ (∫_X fn dµ)_{n∈N} embeds (M(X), weak) as a closed subset of [−1, 1]^N; this gives another proof that (M(X), weak) is compact and metrizable.

We will now be concerned with the following questions:

Question: Let X be a Polish space and (µn) be a sequence in M(X). When can we say that (µn) has a weakly convergent subsequence? Are there other equivalent formulations of weak convergence?

Exercise-43: Let µ ∈ M (R) and F : R → [0, 1] be the corresponding distribution function defined
as F (x) = µ((−∞, x]) for x ∈ R. Then,
(i) F is continuous from the right, limx→−∞ F (x) = 0 and limx→∞ F (x) = 1.
(ii) x ≤ y ⇒ F (x) ≤ F (y), and consequently, the set of discontinuities of F is at most countable.
(iii) If F is continuous, then F is uniformly continuous.
[Hint: (iii) Given ε > 0, choose M > 0 large so that F (x) < ε/2 for x < −M and F (x) > 1 − ε/2
for x > M . Then choose δ > 0 for ε/2 for the uniformly continuous function F |[−M,M ] .]

[140] [Helly-Bray] Let µn, µ ∈ M(R), let Fn, F : R → [0, 1] be the corresponding distribution functions, and let C = {x ∈ R : F is continuous at x}. Then the following are equivalent:
(i) (µn) → µ weakly, i.e., ∫_R f dµn → ∫_R f dµ for every f ∈ Cb(R, R).
(ii) (µn) → µ in distribution, i.e., (Fn(x)) → F(x) for every x ∈ C.
(iii) There exists a dense subset A ⊂ R such that (Fn(x)) → F(x) for every x ∈ A.
(iv) There exists a dense subset A ⊂ R such that (µn((a, b])) → µ((a, b]) for every a < b in A.

Proof. (i) ⇒ (ii): Fix x ∈ C and let a < x < b. Let f, g ∈ Cb(R, R) be defined by the following conditions: f = 1 on (−∞, a], f = 0 on [x, ∞), and the graph of f is linear on [a, x]; g = 1 on (−∞, x], g = 0 on [b, ∞), and the graph of g is linear on [x, b]. Observe that F(a) = µ((−∞, a]) ≤ ∫ f dµ ≤ ∫ g dµ ≤ µ((−∞, b]) = F(b), and ∫ f dµn ≤ µn((−∞, x]) = Fn(x) ≤ ∫ g dµn. Since ∫ f dµn → ∫ f dµ and ∫ g dµn → ∫ g dµ, we get F(a) ≤ lim inf_{n→∞} Fn(x) ≤ lim sup_{n→∞} Fn(x) ≤ F(b). Letting a ↗ x, b ↘ x and using the continuity of F at x, we see F(x) = lim_{n→∞} Fn(x).

The implication (ii) ⇒ (iii) is trivial, and (iii) ⇒ (iv) is easy.

(iv) ⇒ (i): Let f ∈ Cb(R, R) and ε > 0 be given. Choose a < b in A such that for Y := (a, b] we have µ(R \ Y) < ε. Since (µn(Y)) → µ(Y), we also have (µn(R \ Y)) → µ(R \ Y). Hence
lim sup_{n→∞} |∫_{R\Y} f d(µ − µn)| ≤ lim sup_{n→∞} ∥f∥(µ(R \ Y) + µn(R \ Y)) ≤ 2∥f∥ε. (*)
Since f is uniformly continuous on [a, b], there exist a = y0 < y1 < · · · < yk = b in A such that |f(x) − f(yj)| < ε for x ∈ Yj := (y_{j−1}, yj]. Let g = ∑_{j=1}^k f(yj) 1_{Yj}, and note that |∫_Y g d(µ − µn)| = |∑_{j=1}^k f(yj)(µ(Yj) − µn(Yj))| ≤ ∑_{j=1}^k ∥f∥ |µ(Yj) − µn(Yj)|. Also observe that |∫_Y (f − g) dµn| ≤ ∑_{j=1}^k ∫_{Yj} |f − g| dµn ≤ ∑_{j=1}^k ε µn(Yj) = ε µn(Y) ≤ ε, and similarly |∫_Y (f − g) dµ| ≤ ε. Therefore, |∫_Y f d(µ − µn)| ≤ |∫_Y (f − g) dµ| + |∫_Y (f − g) dµn| + |∫_Y g d(µ − µn)| ≤ ε + ε + ∑_{j=1}^k ∥f∥ |µ(Yj) − µn(Yj)|. Since (µn(Yj)) → µ(Yj), we conclude that lim sup_{n→∞} |∫_Y f d(µ − µn)| ≤ 2ε. (**)
From (*) and (**), we obtain that lim sup_{n→∞} |∫_R f d(µ − µn)| ≤ 2∥f∥ε + 2ε. □
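As an illustration of convergence in distribution in the sense of [140](ii), the sketch below (Python, numpy assumed; it presupposes the central limit theorem, which is not proved in these notes, and the sample sizes are arbitrary) compares the distribution function of a standardized sum of fair coin tosses, estimated by Monte Carlo, with the standard normal distribution function at a few continuity points:

    import math
    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 400, 10**5
    sums = rng.binomial(n, 0.5, size=reps)      # sums of n fair coin tosses
    z = (sums - n/2) / math.sqrt(n/4)           # standardized sums; their law is mu_n

    def Phi(x):                                 # standard normal distribution function F
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    for x in (-1.0, 0.0, 0.5, 1.5):
        print(x, round(float(np.mean(z <= x)), 3), round(Phi(x), 3))   # F_n(x) vs F(x) nearly agree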

[141] Let X be a Polish space. Suppose that a sequence (µn) in M(X) is uniformly tight in the following sense: for every ε > 0, there is a compact subset K ⊂ X with µn(K) > 1 − ε for every n ∈ N. Then there exist µ ∈ M(X) and a subsequence (µ_{nj}) of (µn) such that (µ_{nj}) → µ weakly.

Proof. Step-1: For m ∈ N, let Km ⊂ X be compact with µn(Km) > 1 − 1/m for every n ∈ N. Since Cb(Km, R) = C(Km, R) is separable, we may choose a countable dense subset Γm ⊂ C(Km, R) for each m ∈ N. Now, ((∫_{Km} g dµn)_{m∈N, g∈Γm})_{n=1}^∞ is a sequence in the compact metric space ∏_{m∈N} ∏_{g∈Γm} [min g, max g], and hence has a convergent subsequence. That is, there is (nj) such that (∫_{Km} g dµ_{nj})_{j=1}^∞ converges in R for each m ∈ N and each g ∈ Γm.
Step-2: We claim that (∫_X f dµ_{nj}) converges for every f ∈ Cb(X, R). Consider f ∈ Cb(X, R); it suffices to show (∫_X f dµ_{nj}) is Cauchy. The idea is to choose Km approximating X by uniform tightness and then show (∫_{Km} f dµ_{nj}) is Cauchy. Given ε > 0, let m ∈ N be large with ∥f∥/m < ε/5 and let g ∈ Γm be with sup_{Km} |f − g| < ε/5. Let j0 ∈ N be such that |∫_{Km} g d(µ_{nk} − µ_{nj})| < ε/5 for every k > j ≥ j0. Then for k > j ≥ j0, we have
|∫_{Km} f d(µ_{nk} − µ_{nj})| ≤ ∫_{Km} |f − g| dµ_{nk} + |∫_{Km} g d(µ_{nk} − µ_{nj})| + ∫_{Km} |g − f| dµ_{nj} < 3ε/5,
and hence
|∫_X f d(µ_{nk} − µ_{nj})| ≤ 3ε/5 + ∫_{X\Km} |f| d(µ_{nk} + µ_{nj}) < 3ε/5 + 2∥f∥/m < 3ε/5 + 2ε/5 = ε.
This shows that (∫_X f dµ_{nj}) is Cauchy, proving our claim.

Step-3: (Sketch) Define ϕ : Cb(X, R) → R as ϕ(f) = lim_{j→∞} ∫_X f dµ_{nj}. Then ϕ is linear and ϕ(f) ≥ 0 whenever f ≥ 0. Moreover, if (fm) ↘ 0 pointwise, then it can be shown that ϕ(fm) ↘ 0. Therefore, by a theorem of Stone-Daniell (similar to the Riesz representation theorem), there is a Borel measure µ ≥ 0 on X with ϕ(f) = ∫_X f dµ (see p.294 of Dudley, Real Analysis and Probability). Considering f = 1, we see µ(X) = 1. And (µ_{nj}) → µ weakly by the definition of ϕ. □

In fact, Theorem 11.3.3 and Theorem 11.5.4 of Dudley, Real Analysis and Probability give more
information, which we state as [141′ ] below without proof.

Definition: Let X be a Polish space. Let Lb(X, R) = {f ∈ Cb(X, R) : f is Lipschitz}, and ∥f∥_L = ∥f∥_∞ + inf{λ > 0 : λ is a Lipschitz constant for f} for f ∈ Lb(X, R). For µ1, µ2 ∈ M(X), define D_L(µ1, µ2) = sup{|∫_X f d(µ1 − µ2)| : f ∈ Lb(X, R) and ∥f∥_L ≤ 1}. Also define D(µ1, µ2) = inf{ε > 0 : µ1(A) ≤ µ2(B(A, ε)) + ε ∀ Borel A ⊂ X}, where B(A, ε) = {x ∈ X : dist(x, A) < ε}.

[141′ ] Let X be a Polish space. Then D, DL are metrics on M (X), and (µn ) → µ weakly in M (X)
⇔ D(µn , µ) → 0 ⇔ DL (µn , µ) → 0. Moreover, for Γ ⊂ M (X), the following are equivalent:
(i) Γ is uniformly tight.
(ii) Every sequence in Γ has a subsequence converging weakly to some µ ∈ M (X).
(iii) The closure of Γ is compact in M (X) with respect to the metric D (or DL ).
(iv) Γ is totally bounded in M (X) with respect to the metric D (or DL ).

Remark: Since the elements of a Cauchy sequence form a totally bounded set, it follows from [141′ ]
that (M (X), D) and (M (X), DL ) are complete metric spaces when X is a Polish space.

Exercise-44: Let X be a Polish space, ε > 0 and µ ∈ M(X). Then,
(i) X has a finite Borel partition X = ⋃_{j=0}^k Aj with µ(∂Aj) = 0 for 0 ≤ j ≤ k such that µ(A0) < ε and diam(Aj) < ε for 1 ≤ j ≤ k (i.e., A0 has small measure and A1, . . . , Ak have small diameter).
(ii) There is a finitely supported measure ν ∈ M(X), i.e., ν(X \ F) = 0 for some finite set F ⊂ X, with D(µ, ν) ≤ ε, where D is the metric in [141′].
[Hint: (i) Let {xn : n ∈ N} be dense in X. Since (ε/4, ε/2) is uncountable, for each n there is δn ∈ (ε/4, ε/2) such that for Bn := B(xn, δn) we have µ(∂Bn) = 0. Since X = ⋃_{n=1}^∞ Bn, there is k ∈ N such that A0 := X \ ⋃_{n=1}^k Bn satisfies µ(A0) < ε. Let A1 = B1 and A_{j+1} = B_{j+1} \ ⋃_{n=1}^j Bn for 1 ≤ j < k. (ii) Let the Aj be as above, pick yj ∈ Aj for 0 ≤ j ≤ k and define ν = ∑_{j=0}^k µ(Aj) µ_{yj}, where µ_{yj} is the Dirac measure at yj. Then ν(Aj) = µ(Aj) for 0 ≤ j ≤ k. Given Y ∈ B(X), let J = {1 ≤ j ≤ k : Aj ∩ Y ≠ ∅}. Then ∑_{j∈J} µ(Aj) = ∑_{j∈J} ν(Aj) = ν(⋃_{j∈J} Aj), and ⋃_{j∈J} Aj ⊂ B(Y, ε). Hence µ(Y) ≤ µ(A0) + ∑_{j∈J} µ(Aj) < ε + ν(B(Y, ε)). Thus D(µ, ν) ≤ ε.]

Remark: (i) Finitely supported probability measures are convex combinations of Dirac measures.
(ii) Let Y be a countable dense subset of a Polish space (or just a separable metric space) X. It can
be deduced using Exercise-44(ii) that the set of all finitely supported ν ∈ M (X) with supp(ν) ⊂ Y
and ν taking values in Q is a countable dense subset of (M (X), D). Thus (M (X), D) is separable.
14 T.K.SUBRAHMONIAN MOOTHATHU

4. Conditional expectation

You must have studied the notion of conditional probability in elementary classes with the
expression P (B)P (A|B) = P (A ∩ B), where P (A|B) is the conditional probability of the event A
given event B. We will consider a generalization of this to the measure theoretic setting.

Definition: Let (X, A, µ) be a probability space and f ∈ L1(X, A, µ).
(i) If B ∈ A is with µ(B) > 0, we define the conditional expectation E(f|B) of f given B as E(f|B) = (∫_B f dµ)/µ(B); and we put E(f|B) = 0 if µ(B) = 0. Taking f = 1_A for A ∈ A, we recover the definition of conditional probability since ∫_B 1_A dµ = ∫_{A∩B} 1 dµ = µ(A ∩ B).
(ii) More importantly, if B ⊂ A is a sub σ-algebra of A, then ν(B) := ∫_B f dµ for B ∈ B defines a signed measure on (X, B) absolutely continuous w.r.to µ, and therefore by the Radon-Nikodym theorem there is h ∈ L1(X, B, µ) such that ν(B) = ∫_B h dµ for every B ∈ B. Moreover, this h is unique in the following sense: if h′ also satisfies the same property, then h(x) = h′(x) for µ-almost every x ∈ X. We define the conditional expectation E(f|B) of f given B as E(f|B) = h ∈ L1(X, B, µ). In other words, E(f|B) is defined as the unique h ∈ L1(X, B, µ) satisfying ∫_B h dµ = ∫_B f dµ for every B ∈ B, and the existence of h is guaranteed by the Radon-Nikodym theorem.

(iii) Let f ∈ L1 (X, A, µ) and g : X → R be a random variable. Then B := g −1 (B(R)) is a sub


σ-algebra of A. We define E(f |g) = E(f |B).

Remark: Note that E(f |B) is not a real number but an integrable function. Why is it so? Recall
that in Multivariable Calculus the derivative of a differentiable function at a point is not a real
number but a linear map: this linear map incorporates information about directional derivatives in
all possible directions. Similarly, when we write E(f |B) = h, it has to be observed that the function
h incorporates information about the conditional expectations E(f |B) for all B ∈ B. If E(f |B) = h,
then for every B ∈ B we have ∫_B f dµ = ∫_B h dµ, and in particular E(f|B) = (∫_B h dµ)/µ(B).

Example: (i) Let X = [0, 1], A = B(X), µ be the Lebesgue measure, f : X → R be f(x) = x, and B = {∅, [0, 1/3], (1/3, 1], X}. Note that f is not B-measurable since [0, 1/2] = f^{-1}([0, 1/2]) ∉ B. To compute E(f|B), observe that ∫_0^{1/3} f dµ = 1/18 and ∫_{1/3}^1 f dµ = 4/9. Let h : X → R be h = (1/6)1_{[0,1/3]} + (2/3)1_{(1/3,1]}, which belongs to L1(X, B, µ), and we have ∫_B h dµ = ∫_B f dµ for every B ∈ B. Hence E(f|B) = h. In particular, we cannot expect E(f|B) to be continuous (even after modifying on a null set) even when f is continuous. (ii) More generally, let (X, A, µ) be a probability space, and B ⊂ A be the sub σ-algebra generated by a finite measurable partition X = ⋃_{j=1}^k Aj with µ(Aj) > 0, i.e., B is the smallest sub σ-algebra of A containing {A1, · · · , Ak}. For f ∈ L1(X, A, µ), if we put aj = (∫_{Aj} f dµ)/µ(Aj), then it may be checked that E(f|B) = ∑_{j=1}^k aj 1_{Aj}.
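The finite-partition case in (ii) is easy to compute numerically. The sketch below (Python, numpy assumed; the partition of [0, 1] into three intervals and the choice f(x) = x are our own illustrative choices) approximates E(f|B) by averaging f over each cell of the partition:

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.uniform(0.0, 1.0, size=10**6)     # samples from Lebesgue measure on [0,1]
    f = x                                     # f(x) = x
    cells = np.digitize(x, [1/3, 2/3])        # cells A1=[0,1/3], A2=(1/3,2/3], A3=(2/3,1]

    # E(f|B) is constant on each cell, equal to the cell average of f
    cond_exp = np.array([f[cells == j].mean() for j in range(3)])
    print(cond_exp)   # approximately [1/6, 1/2, 5/6]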
Conditional expectation behaves in many cases like ordinary expectation. But it should be noted
that since h = E(f |B) is defined uniquely only µ-almost everywhere, the properties of conditional
expectation that we state as [142], [143], and [144] below, should be read ‘µ-almost everywhere’.

[142] [Properties of conditional expectation - I] Let (X, A, µ) be a probability space, B ⊂ A be a


sub σ-algebra, and f, f1 , f2 ∈ L1 (X, A, µ). Then,
(i) E(E(f |B)) = Ef . If f ∈ L1 (X, B, µ), then E(f |B) = f . In particular, E(f |A) = f .
(ii) [Linearity] E(af1 + bf2 |B) = aE(f1 |B) + bE(f2 |B) for a, b ∈ R.
(iii) [Positivity] E(f |B) ≥ 0 when f ≥ 0. By linearity, E(f1 |B) ≤ E(f2 |B) when f1 ≤ f2 .
(iv) |E(f |B)| ≤ E(|f | | B).
(v) [Tower property] If C ⊂ B is a sub σ-algebra, then E(E(f |B)|C) = E(f |C).
(vi) [Independence] If f is independent with B, i.e., if µ(f −1 (C) ∩ B) = µ(f −1 (C))µ(B) for every
C ∈ B(R) and B ∈ B, then E(f |B) ≡ Ef µ-a.e. In particular, E(f |{∅, X}) ≡ Ef .

Proof. Properties (i) and (ii) follow essentially from the definition of conditional expectation.
(iii) Let h = E(f|B) and A = {x ∈ X : h(x) < 0}. Write A = ⋃_{n=1}^∞ An, where An = {x ∈ X : h(x) < −1/n}, and note A, An ∈ B. If h ≥ 0 µ-a.e. is false, then µ(A) > 0 and then µ(An) > 0 for some n ∈ N. Then ∫_{An} h dµ ≤ −µ(An)/n < 0 ≤ ∫_{An} f dµ, a contradiction since An ∈ B.
(iv) Since ±f ≤ |f|, we have ±E(f|B) = E(±f|B) ≤ E(|f| | B) by (ii) and (iii).
(v) Let h = E(E(f|B)|C). If C ∈ C ⊂ B, then ∫_C h dµ = ∫_C E(f|B) dµ = ∫_C f dµ.
(vi) By hypothesis, 1_B is independent with f for B ∈ B and hence E(1_B f) = E(1_B)Ef = µ(B)Ef by Exercise-32. Therefore, ∫_B f dµ = ∫_X 1_B f dµ = E(1_B f) = µ(B)Ef = ∫_B Ef dµ for B ∈ B. □

[143] [Properties of conditional expectation - II] Let (X, A, µ) be a probability space, B ⊂ A be a


sub σ-algebra, and f, fn ∈ L1 (X, A, µ) for n ∈ N.
(i) [Monotone convergence theorem] If 0 ≤ fn ↗ f pointwise µ-a.e., then E(fn |B) ↗ E(f |B)
pointwise µ-a.e.
(ii) [Fatou's lemma] If fn ≥ 0 and f := lim inf_{n→∞} fn, then E(f|B) ≤ lim inf_{n→∞} E(fn|B).
(iii) [Lebesgue dominated convergence theorem] If (fn) → f pointwise µ-a.e., and |fn| ≤ g for some g ∈ L1(X, A, µ), then E(fn|B) → E(f|B) pointwise µ-a.e.
(iv) [Product with a random variable] If g : (X, B) → R is a random variable with gf ∈ L1 (X, A, µ),
then E(gf |B) = gE(f |B).

Proof. (i) Let hn = E(fn|B). Since 0 ≤ hn ≤ h_{n+1} ≤ E(f|B) (and this we may assume everywhere after modification on a set of measure 0), there is h : X → [0, ∞) with h(x) = lim_{n→∞} hn(x) for every x ∈ X. Clearly h is B-measurable since the hn's are. Consider B ∈ B. By the ordinary Monotone convergence theorem applied to 1_B fn ↗ 1_B f and 1_B hn ↗ 1_B h, we obtain ∫_B f dµ = lim_{n→∞} ∫_B fn dµ and ∫_B h dµ = lim_{n→∞} ∫_B hn dµ. But ∫_B fn dµ = ∫_B hn dµ since hn = E(fn|B), and thus ∫_B f dµ = ∫_B h dµ. By taking B = X, we see h ∈ L1(X, B, µ). Hence h = E(f|B) µ-a.e.
(ii) Let gm := inf_{n≥m} fn. Note that 0 ≤ gm ↗ f and gm ≤ fm. Hence by (i) we have E(f|B) = lim_{m→∞} E(gm|B) ≤ lim inf_{m→∞} E(fm|B).
(iii) Apply (ii) to g ± fn ≥ 0. Then E(g − f|B) = E(lim inf(g − fn)|B) ≤ lim inf E(g − fn|B) = E(g|B) − lim sup E(fn|B). Removing E(g|B) from both sides we get −E(f|B) ≤ −lim sup E(fn|B), and hence E(f|B) ≥ lim sup E(fn|B). Similarly, E(g + f|B) = E(lim inf(g + fn)|B) ≤ lim inf E(g + fn|B) = E(g|B) + lim inf E(fn|B), and thus E(f|B) ≤ lim inf E(fn|B) as well.
(iv) By linearity, we may assume g ≥ 0 and f ≥ 0. If g = 1_C for some C ∈ B, then for any B ∈ B we have ∫_B 1_C E(f|B) dµ = ∫_{B∩C} E(f|B) dµ = ∫_{B∩C} f dµ = ∫_B 1_C f dµ, and hence gE(f|B) = E(gf|B) in this case. By linearity, gE(f|B) = E(gf|B) holds for all simple functions g ≥ 0. For a general g ≥ 0, find simple functions gn with 0 ≤ gn ↗ g. Then gn E(f|B) ↗ gE(f|B). By what is already proved, and applying (i) to 0 ≤ gn f ↗ gf, we also deduce gn E(f|B) = E(gn f|B) ↗ E(gf|B). □

[144] [Jensen’s inequality] Let (X, A, µ) be a probability space, B ⊂ A be a sub σ-algebra, f ∈


L1 (X, A, µ), and g : R → R be a convex function with g ◦ f ∈ L1 (X, A, µ). Then g(E(f |B)) ≤
E(g ◦ f |B). In particular, (taking B = A) we have g(Ef ) ≤ E(g ◦ f ).

Proof. Let Γ be the collection of all affine functions h : R → R, h(x) = ax + b, with h ≤ g. The
geometric observation is that g(x) = sup{h(x) : h ∈ Γ} for each x ∈ R since g is convex. Consider
h ∈ Γ. By linearity and the fact h ◦ f ≤ g ◦ f , we get h(E(f |B)) = E(h ◦ f |B) ≤ E(g ◦ f |B) by
[142]. Taking supremum over h ∈ Γ yields g(E(f |B)) ≤ E(g ◦ f |B). 

Remark: Since x 7→ x2 is convex, we have (E(f |B))2 ≤ E(f 2 |B) by [144]. Also, [144] remains true
if f (X) ⊂ J for some interval J ⊂ R and g : J → R is convex. Therefore, (E(|f | | B))p ≤ E(|f |p |B)
for 1 ≤ p < ∞ as x 7→ xp is convex on [0, ∞) for 1 ≤ p < ∞.

Exercise-45: [A characterization of conditional expectation of L2 -functions] Let (X, A, µ) be a


probability space, f ∈ L2 (X, A, µ), and B ⊂ A be a sub σ-algebra. Then,
(i) Γ := {g ∈ L2 (X, A, µ) : g is B-measurable} is a closed vector subspace of L2 (X, A, µ).
(ii) If h = E(f |B), then ∥f − h∥2 ≤ ∥f − g∥2 for every g ∈ Γ. In other words, E(f |B) is the
orthogonal projection of f onto the subspace Γ.
[Hint: (ii) ∥f − g∥_2^2 = E(f − g)^2 = E(f − h + h − g)^2 = E(f − h)^2 + E(h − g)^2 + 2s, where s = E((f − h)(h − g)) = E(E((f − h)(h − g)|B)) = E((h − g)E((f − h)|B)) by [142](i) and [143](iv), since h − g is B-measurable. Moreover, E(f − h|B) = E(f|B) − h = 0, and thus s = 0.]
TOPICS FROM ANALYSIS - PART 3/3 17

5. Martingales

Martingales are special sequences of random variables (whose historical origin is from gambling)
enjoying a toolkit of nice properties including convergence under mild assumptions.

Definition: Let (X, A, µ) be a probability space. A sequence (An)_{n=0}^∞ of sub σ-algebras of A is called a filtration if An ⊂ A_{n+1} for every n ≥ 0. We say {fn, An}_{n=0}^∞ is a gambling sequence (or, officially, a stochastic sequence) if (An)_{n=0}^∞ is a filtration on (X, A, µ) and the fn : (X, An) → R are random variables. In this case we also say (fn)_{n=0}^∞ is adapted to the filtration (An)_{n=0}^∞. The terminology gambling sequence is motivated by the following: imagine a gambler playing at a casino; then fn can be thought of as the fortune of the gambler after the nth game, An is the collection of events known to have occurred or not in the gambling up to the nth game, and E(f_{n+1}|An) is the expected fortune of the gambler at the (n + 1)th game having known the outcomes of the first n games.

Question: Does the knowledge of An (events up to the nth game) help the gambler to increase the
expected value of fn+1 , his fortune after the (n + 1)th game? i.e., is fn ≤ E(fn+1 |An )?

Definition: A gambling sequence {fn, An}_{n=0}^∞ on a probability space (X, A, µ) with fn ∈ L1(X, A, µ) for every n ≥ 0 is said to be a:
(i) martingale if fn = E(f_{n+1}|An) µ-a.e. (neutral game)
(ii) submartingale if fn ≤ E(f_{n+1}|An) µ-a.e. (game favors the gambler)
(iii) supermartingale if fn ≥ E(f_{n+1}|An) µ-a.e. (game favors the casino)
for every n ≥ 0.

Remark: (i) If fn is An-measurable, then E(fn|An) = fn by [142](i). Hence a gambling sequence {fn, An}_{n=0}^∞ with fn ∈ L1 is a martingale iff E(f_{n+1} − fn|An) = 0; is a submartingale iff E(f_{n+1} − fn|An) ≥ 0; and is a supermartingale iff E(f_{n+1} − fn|An) ≤ 0 for every n ≥ 0. (ii) {fn, An}_{n=0}^∞ is a submartingale iff {−fn, An}_{n=0}^∞ is a supermartingale.

Exercise-46: Let {fn, An}_{n=0}^∞ be a martingale. Then,
(i) Ef0 = Efn for every n ≥ 0, i.e., the expectation is constant.
(ii) fn = E(fm|An) for every 0 ≤ n ≤ m.
(iii) If g : R → R is convex (concave) with g ◦ fn ∈ L1(X, An, µ), then {g ◦ fn, An}_{n=0}^∞ is a submartingale (supermartingale).
[Hint: (i) & (ii): Efn = E(E(fn+1 |An )) = Efn+1 , and fn = E(fn+1 |An ) = E(E(fn+2 |An+1 )|An ) =
E(fn+2 |An ) = · · · by [142](i) and [142](v). (iii) g ◦ fn = g(E(fn+1 |An )) ≤ E(g ◦ fn+1 |An ) by [144].]

Examples: Let (An)_{n=0}^∞ be a filtration on a probability space (X, A, µ), and f ∈ L1(X, A, µ). Define fn = E(f|An), which is An-measurable. Now E(f_{n+1}|An) = E(E(f|A_{n+1})|An) = E(f|An) = fn by [142](v). Therefore {fn, An}_{n=0}^∞ is a martingale. A partial converse is given below.
18 T.K.SUBRAHMONIAN MOOTHATHU

Exercise-47: Let 1 < p < ∞, and let {fn, An}_{n=0}^∞ be a martingale on a probability space (X, A, µ) such that (fn)_{n=0}^∞ is a bounded sequence in Lp(X, A, µ). Then there is f ∈ Lp(X, A, µ) with fn = E(f|An) for every n ≥ 0. [Hint: Since Lp is reflexive and separable for 1 < p < ∞, the bounded sequence (fn) has a weakly convergent subsequence (f_{nk}) → f ∈ Lp by Alaoglu's theorem, etc. Fix n ∈ N, and consider A ∈ An. Then 1_A ∈ Lq = (Lp)*, where 1/p + 1/q = 1. By weak convergence, ∫_A f dµ = ⟨f, 1_A⟩ = lim_{k→∞} ⟨f_{nk}, 1_A⟩. For nk > n, we have E(f_{nk}|An) = fn by Exercise-46(ii). Hence ⟨f_{nk}, 1_A⟩ = ∫_A f_{nk} dµ = ∫_A fn dµ. Thus ∫_A f dµ = ∫_A fn dµ, giving fn = E(f|An).]

Example: Let gn : X → R be an L1-sequence of independent random variables on a probability space (X, A, µ), and assume there is c ∈ R with Egn = c for every n ≥ 0. Let An = σ(g0, g1, . . . , gn), the smallest sub σ-algebra of A with respect to which g0, g1, . . . , gn are measurable. Then (An)_{n=0}^∞ is a filtration, called the natural filtration of (gn)_{n=0}^∞. Let fn = ∑_{j=0}^n gj. Then fn ∈ L1(X, A, µ), and (fn) is adapted to the natural filtration (An)_{n=0}^∞. Observe that E(f_{n+1} − fn|An) = E(g_{n+1}|An) ≡ Eg_{n+1} = c by [142](vi) since g_{n+1} is independent with An. Therefore {fn, An}_{n=0}^∞ is a martingale or submartingale or supermartingale according to whether c = 0 or c ≥ 0 or c ≤ 0 respectively.

Remark: From the Example above we may extract a general method of constructing martingales as follows. Let (hn)_{n=0}^∞ be an L1-sequence of independent random variables on a probability space (X, A, µ). Let gn = hn − Ehn (then Egn = 0 and the gn's are still independent), An = σ(g0, g1, · · · , gn) = σ(h0, h1, · · · , hn), and fn = ∑_{j=0}^n gj. Then {fn, An}_{n=0}^∞ is a martingale.
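The construction above can be checked exactly on a finite model: with hn ∈ {0, 1} fair coin tosses and gn = hn − 1/2, enumerate all equally likely coin paths and verify that the conditional expectation of f_{n+1} given the first coordinates equals fn on every atom of An. A Python sketch (the small horizon N = 6 is chosen only to keep the enumeration cheap):

    from itertools import product

    N = 6
    paths = list(product([0, 1], repeat=N))          # all coin paths, each of probability 2^{-N}

    def f(path, n):                                  # f_n = sum_{j<=n} (h_j - 1/2)
        return sum(h - 0.5 for h in path[: n + 1])

    for n in range(N - 1):
        # group paths by their first n+1 coordinates (this is conditioning on A_n)
        for prefix in product([0, 1], repeat=n + 1):
            group = [p for p in paths if p[: n + 1] == prefix]
            cond = sum(f(p, n + 1) for p in group) / len(group)   # E(f_{n+1} | A_n) on this atom
            assert abs(cond - f(group[0], n)) < 1e-12             # equals f_n: martingale property
    print("martingale property verified on all atoms")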

Exercise-48: Let (fn)_{n=0}^∞, (gn)_{n=0}^∞ be two sequences of random variables on a probability space (X, A, µ) and assume each gn is bounded. Define h0 = f0 and hn = f0 + ∑_{j=1}^n gj(fj − f_{j−1}) for n ∈ N. Here, (hn) may be called the discrete integral of (gn) w.r.to (fn). Let (An)_{n=0}^∞ be a filtration on (X, A, µ) and assume fn and g_{n+1} (not gn) are An-measurable for n ≥ 0.
(i) If {fn, An}_{n=0}^∞ is a martingale, then {hn, An}_{n=0}^∞ is a martingale.
(ii) If {fn, An}_{n=0}^∞ is a sub(super)martingale and gn ≥ 0, then {hn, An}_{n=0}^∞ is a sub(super)martingale.
[Hint: Evidently, hn is An-measurable, and hn ∈ L1 since fn ∈ L1 and the gn's are bounded. Now E(h_{n+1} − hn|An) = E(g_{n+1}(f_{n+1} − fn)|An) = g_{n+1} E(f_{n+1} − fn|An) by [143](iv).]

Many results about martingales have corresponding versions for submartingales/supermartingales


as well. This is not so surprising in view of the following decomposition theorem.

[145] [Doob decomposition] Let {fn, An}_{n=0}^∞ be a submartingale on a probability space (X, A, µ). Then we may write fn = gn + hn, where
(i) {gn, An}_{n=0}^∞ is a martingale.
(ii) hn ≤ h_{n+1} µ-a.e. for n ≥ 0.
(iii) h_{n+1} is An-measurable for n ≥ 0.
Further, the decomposition fn = gn + hn is unique if we set h0 = 0.
TOPICS FROM ANALYSIS - PART 3/3 19

Proof. First we show uniqueness. Suppose fn = gn + hn . We have E(gn+1 |An ) = gn by (i), and
E(hn+1 |An ) = hn+1 by (iii). Hence E(fn+1 − fn |An ) = E(fn+1 |An ) − fn = E(gn+1 + hn+1 |An ) −
(gn + hn ) = hn+1 − hn . This determines hn ’s uniquely since h0 = 0, and then gn ’s are also uniquely
determined. Now we prove the existence of decomposition. Starting with h0 = 0, we may define
hn ’s inductively using the requirement hn+1 − hn = E(fn+1 − fn |An ), which ensures (iii). We have
hn+1 ≥ hn since E(fn+1 − fn |An ) ≥ 0 by the submartingale property of fn . Letting gn = fn − hn ,
we may verify (i) as well by checking E(gn+1 − gn |An ) = 0. 
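A concrete instance of the decomposition (our own illustrative example): if Sn is a simple symmetric random walk, then {Sn^2} is a submartingale for its natural filtration, E(S^2_{n+1} − S^2_n | An) = 1, so hn = n and gn = S^2_n − n is the martingale part. The sketch below (Python; exact enumeration over a short horizon, chosen only to keep it cheap) checks the compensator increment on every atom:

    from itertools import product

    N = 6
    paths = list(product([-1, 1], repeat=N))               # equally likely +/-1 increments

    def S(path, n):                                        # S_n = sum of the first n increments
        return sum(path[:n])

    for n in range(N - 1):
        for prefix in product([-1, 1], repeat=n):          # an atom of A_n
            group = [p for p in paths if p[:n] == prefix]
            incr = sum(S(p, n + 1)**2 - S(p, n)**2 for p in group) / len(group)
            assert abs(incr - 1.0) < 1e-12                 # E(f_{n+1} - f_n | A_n) = 1 = h_{n+1} - h_n
    print("compensator h_n = n, so g_n = S_n^2 - n is a martingale")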

Remark: For a supermartingale {fn, An}_{n=0}^∞, we similarly get fn = gn + hn with hn ≥ h_{n+1}.

Imagine a game in which the gambler stops playing when a particular favorable event happens.
The theory given below leading up to [146] says that this stopping strategy does not affect the
essential nature (martingale/submartingale/supermartingale) of the gambler’s expected fortune.

Definition: Let (X, A, µ) be a probability space. A random variable t : X → N ∪ {0, ∞} is called a stopping time with respect to a filtration (An)_{n=0}^∞ if {x ∈ X : t(x) ≤ n} ∈ An for every n ∈ N ∪ {0}, and this is equivalent to saying {x ∈ X : t(x) = n} ∈ An for every n ∈ N ∪ {0} (verify). Intuitively, the event {x ∈ X : t(x) ≤ n} happens within the first n units of time, and the event {x ∈ X : t(x) = ∞} never happens. A stopping time t is said to be bounded if there is M ∈ N such that µ({x ∈ X : t(x) ≤ M}) = 1, and is said to be finite if µ({x ∈ X : t(x) < ∞}) = 1. Clearly, a bounded stopping time is finite but the converse is not true; see the Example below.

Example: Consider X = [0, 1] with the Lebesgue measure. Let fn : X → R be fn (x) = xn and
An = σ(f0 , f1 , . . . , fn ) for n ≥ 0. Fix c ∈ (0, 1), and define t : [0, 1] → N ∪ {0, ∞} as t(x) = min{n ∈
N : fn (x) ≤ c} and t(x) = ∞ if no such n exists. Now, t is finite since {x ∈ [0, 1] : t(x) < ∞} = [0, 1),
but t is not bounded since {x ∈ [0, 1] : t(x) ≤ M } = [0, c1/M ] whose Lebesgue measure is < 1.

Definition: Let {fn, An}_{n=0}^∞ be a gambling sequence on a probability space (X, A, µ) and let t : X → N ∪ {0, ∞} be a stopping time w.r.to (An). Let f_{t∧n} : X → R be f_{t∧n}(x) = f_{min{t(x),n}}(x) for n ≥ 0.
For example, if t ≡ M , then (ft∧n ) = (f0 , f1 , . . . , fM −1 , fM , fM , fM , . . .), which corresponds to a
game stopped at time M . If t is finite, let ft : X → R as ft (x) = ft(x) (x), which is defined µ-a.e.
since t < ∞ µ-a.e.. Intuitively, ft carries information about the gambler’s fortune under optional
stopping of a game. Also define At = {A ∈ A : A ∩ {x ∈ X : t(x) ≤ n} ∈ An ∀ n ≥ 0}.

Exercise-49: As per the above Definition, we have:


(i) ft∧n is An -measurable for every n ≥ 0.
(ii) At is a sub σ-algebra of A, and t, ft are At -measurable.
(iii) If t is finite, then ft (x) = ft(x) (x) = ft∧n (x) for all large n ∈ N for µ-a.e. x ∈ X.
(iv) If t is bounded and fn ∈ L1 for every n ≥ 0, then ft∧n , ft ∈ L1 .
[Hint: (i) & (ii): Let Aj = {x ∈ X : t(x) = j} ∈ Aj , En = {x ∈ X : t(x) ≤ n} = nj=0 Aj ∈ An ,
−1 ∪ ∪
and B ∈ B(R). Then ft∧n (B) = [ nj=0 (Aj ∩ fj−1 (B))] [Enc ∩ fn−1 (B)] ∈ An . Since En ∩ t−1 (B) =
∪ −1 −1 ∪n −1
j∈B: 0≤j≤n Aj ∈ An , we have t (B) ∈ At . Since En ∩ft (B) = j=0 [Aj ∩fj (B)] ∈ An , we have
∫ ∑ ∫ ∑M
ft−1 (B) ∈ At . (iv) If t is bounded by M ∈ N, then X |ft |dµ = M j=0 Aj |fj |dµ ≤ j=0 ∥fj ∥ < ∞.]

Example: Let X = [0, 1] with Lebesgue measure µ. Let fn : X → R be fn (x) = x/n and
An = σ(f1 , . . . , fn ) for n ∈ N. Define t : [0, 1] → N ∪ {0, ∞} as t(x) = min{n ∈ N : fn (x) ≤ 1/3},
which is a bounded stopping time w.r.to (An ) since t ≤ 3. Here ft : X → R is given by ft (x) = f1 (x)
for x ∈ [0, 1/3]; ft (x) = f2 (x) for x ∈ (1/3, 2/3]; and ft (x) = f3 (x) for x ∈ (2/3, 1]. Also, ft∧1 = f1 ;
ft∧2 (x) = f1 (x) for x ∈ [0, 1/3] and ft∧2 (x) = f2 (x) for x ∈ (1/3, 1]; and ft∧n = ft for n ≥ 3.

Example: Let X = [0, 1] with Lebesgue measure µ. Let In = [0, 2^{-n}] and fn : X → R be fn = 2^n 1_{In} for n ≥ 0. We have fn ∈ L1(X, A, µ) with ∥fn∥_1 = Efn = 1. Let Cn be the collection of subintervals J ⊂ [0, 1] having end points of the form k/2^n, and let An = σ(Cn). Then fn is An-measurable. Consider J ∈ Cn. Then ∫_J fn dµ = ∫_J f_{n+1} dµ = 1 or 0 depending on whether inf J = 0 or inf J ≥ 2^{-n}. It follows that ∫_A fn dµ = ∫_A f_{n+1} dµ for every A ∈ An. Therefore, E(f_{n+1}|An) = fn, i.e., {fn, An}_{n=0}^∞ is a martingale. Note that (fn(x))_{n=0}^∞ is eventually 0 for each x ∈ (0, 1]. Hence t : X → N ∪ {0, ∞} given by t(0) = ∞ and t(x) = min{n ∈ N : fn(x) = 0} for x ∈ (0, 1], is a finite stopping time. And ft : X → R satisfies ft ≡ 0 on (0, 1], giving Eft = 0 ≠ 1 = Efn for every n ≥ 0. This shows the necessity of the boundedness assumption on t in [146](iii) below.

The result below says that under fairly general conditions, an optional stopping does not really
change the nature of a game; for simplicity we state it only for martingales.

[146] [Doob's optional stopping time theorem] Let {fn, An}_{n=0}^∞ be a martingale on a probability space (X, A, µ), and let t be a stopping time w.r.to (An). Then,
(i) {f_{t∧n}, An}_{n=0}^∞ is a martingale.
(ii) If t is finite and there is M > 0 with |f_{t∧n}| ≤ M µ-a.e. for each n ≥ 0, then Eft = Ef0.
(iii) If t is bounded, say t ≤ M ∈ N, then E(fM|At) = ft µ-a.e. Consequently Eft = EfM = Ef0.
(iv) If s ≤ t are bounded stopping times w.r.to (An), then E(ft|As) = fs µ-a.e.

Proof. (i) Let gn : X → R be gn(x) = 1 if n ≤ t(x) and gn(x) = 0 if t(x) ≤ n − 1. Then g_{n+1} is An-measurable. Since f_{t∧0} = f0 and f_{t∧n} = f0 + ∑_{j=1}^n gj(fj − f_{j−1}) for n ≥ 1, we conclude by Exercise-48 that {f_{t∧n}, An}_{n=0}^∞ is a martingale.
(ii) By Exercise-49(iii), |ft| ≤ M µ-a.e., which implies ft ∈ L1(X, A, µ). Now by (i), Ef0 = Ef_{t∧0} = Ef_{t∧n} for every n ∈ N, and therefore |E(ft − f0)| = |E(ft − f_{t∧n})| ≤ 2Mµ({x ∈ X : t(x) > n}) → 0 as n → ∞ since t < ∞ µ-a.e. Hence E(ft − f0) = 0, or Eft = Ef0.
(iii) Let Aj = {x ∈ X : t(x) = j} ∈ Aj and observe that X = ⋃_{j=0}^M Aj is a measurable partition modulo a null set. Consider A ∈ At. Then A ∩ Aj ∈ Aj. We have ft ∈ L1(X, At, µ) by Exercise-49, and ft = fj on Aj. Also E(fM|Aj) = fj by Exercise-46(ii), implying ∫_{A∩Aj} fj dµ = ∫_{A∩Aj} fM dµ. Therefore, ∫_A ft dµ = ∑_{j=0}^M ∫_{A∩Aj} fj dµ = ∑_{j=0}^M ∫_{A∩Aj} fM dµ = ∫_A fM dµ. Thus E(fM|At) = ft.
(iv) Suppose s ≤ t ≤ M. Applying the tower law of conditional expectation to As ⊂ At, and applying (iii) to t ≤ M and s ≤ M, we obtain E(ft|As) = E(E(fM|At)|As) = E(fM|As) = fs. □
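Numerically, [146](iii) can be observed as follows (a Python sketch with numpy assumed; the particular stopping rule, "stop when the walk hits ±3 or at time 50, whichever comes first", is a bounded stopping time chosen just for illustration):

    import numpy as np

    rng = np.random.default_rng(7)
    reps, M, barrier = 10**5, 50, 3
    steps = rng.choice([-1, 1], size=(reps, M))
    walks = np.cumsum(steps, axis=1)                       # f_1, ..., f_M along each path (f_0 = 0)

    stopped = np.empty(reps)
    for i in range(reps):
        hits = np.nonzero(np.abs(walks[i]) >= barrier)[0]
        t = hits[0] if hits.size > 0 else M - 1            # bounded stopping time t <= M
        stopped[i] = walks[i, t]                           # f_t on this path
    print(stopped.mean())                                  # close to Ef_0 = 0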

Seminar topics: (i) Analogue of [146] for submartingales, (ii) Doob’s maximal and Lp -inequalities.

Remark: If {fn, An}_{n=0}^∞ is a (super)martingale with fn ≥ 0 on a probability space (X, A, µ), and t is a finite stopping time w.r.to (An), then by Exercise-49(iii), Fatou's lemma, and [146](i) we have Eft = E(lim inf_{n→∞} f_{t∧n}) ≤ lim inf_{n→∞} Ef_{t∧n} = (≤) Ef_{t∧0} = Ef0. Thus Eft ≤ Ef0.

If a sequence of real numbers is not convergent in [−∞, ∞], then the sequence should oscillate and
hence should cross some interval [a, b] infinitely often. Building up on this simple observation, the
so called upcrossing inequality of Doob will now lead us to a sufficient condition for the convergence
of an L1 -martingale. This will complement Exercise-47 stated for an Lp -martingale, 1 < p < ∞.

[147] [Doob's upcrossing inequality] Let {fn, An}_{n=0}^∞ be a supermartingale on a probability space (X, A, µ) and fix reals a < b. For n ∈ N, define the upcrossing random variable un : X → R as follows: un(x) is the maximum integer k ∈ N such that there are 0 ≤ p1 < q1 < · · · < pk < qk ≤ n with f_{pi}(x) < a < b < f_{qi}(x) for 1 ≤ i ≤ k; and we put un(x) = 0 if no such k ∈ N exists. Then (b − a)Eun ≤ E|fn − a| for every n ∈ N.

Proof. We will define random variables gj : X → R that are 1 during an upcrossing and 0 during a downcrossing. Fix x ∈ X, and choose integers 0 ≤ p1 < q1 < p2 < q2 < · · · optimally (least in each case) with f_{pi}(x) < a < b < f_{qi}(x). Let g0(x) = 0, g_{j+1}(x) = 0 for 0 ≤ j < p1, g_{j+1}(x) = 1 for pi ≤ j < qi, and g_{j+1}(x) = 0 for qi ≤ j < p_{i+1}. Observe that g_{j+1} is Aj-measurable. Let h0 = 0 and hn = ∑_{j=1}^n gj(fj − f_{j−1}) for n ≥ 1. Then {hn, An}_{n≥0} is a supermartingale by the argument in Exercise-48(ii), and therefore Ehn ≤ Eh0 = E0 = 0. Since each upcrossing of the fj's increases the value of hn by at least b − a, we also have (b − a)un − |fn − a| ≤ hn, where the subtracted term |fn − a| is to take care of any possible incomplete upcrossing at the end with fn(x) < a. Taking expectation we have (b − a)Eun − E|fn − a| ≤ Ehn ≤ 0, and thus (b − a)Eun ≤ E|fn − a|. □

[148] [Martingale convergence theorem] Let {fn, An}_{n=0}^∞ be a (super)martingale on a probability space (X, A, µ) such that (fn)_{n=0}^∞ is a bounded sequence in L1(X, A, µ). Then there is f ∈ L1(X, A, µ) such that (fn) → f pointwise µ-a.e.


Proof. Let sup_{n≥0} ∥fn∥_1 ≤ M < ∞. Fix reals a < b and let the un's be as in [147]. Since 0 ≤ un ≤ u_{n+1}, the limit u(x) = lim_{n→∞} un(x) exists in [0, ∞] for every x ∈ X. We have (b − a)Eun ≤ E|fn − a| ≤ E(|fn| + |a|) ≤ M + |a| by [147], and therefore ∫_X u dµ = lim_{n→∞} ∫_X un dµ ≤ (M + |a|)/(b − a) < ∞ by the Monotone convergence theorem. Hence u(x) < ∞ for µ-a.e. x ∈ X. This in other words means the set Y(a, b) := {x ∈ X : (fn(x))_{n=0}^∞ has infinitely many upcrossings of [a, b]} is µ-null. Since J := {(a, b) ∈ Q^2 : a < b} is countable, the set Y := ⋃_{(a,b)∈J} Y(a, b) is also µ-null. And (fn(x))_{n=0}^∞ must converge in [−∞, ∞] for every x ∈ X \ Y. Writing f(x) = lim_{n→∞} fn(x), we see by Fatou's lemma that ∫_X |f| dµ ≤ lim inf_{n→∞} ∫_X |fn| dµ ≤ M < ∞ and thus f ∈ L1(X, A, µ). □
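A standard example covered by [148] (our own illustrative choice): in a Polya urn, start with one red and one blue ball, repeatedly draw a ball uniformly at random and return it together with one more ball of the same colour; it is easily checked from the definition that the proportion of red balls is a martingale with values in [0, 1], hence L1-bounded, and so it converges almost surely. The Python sketch below simulates a few paths of the proportion:

    import numpy as np

    rng = np.random.default_rng(8)
    checkpoints = [1000, 5000, 20000]
    for path in range(5):
        red, total, snapshot = 1, 2, []
        for n in range(1, checkpoints[-1] + 1):
            if rng.random() < red / total:   # a red ball is drawn and duplicated
                red += 1
            total += 1
            if n in checkpoints:
                snapshot.append(round(red / total, 4))
        print(snapshot)   # along each path the proportion barely moves any more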

Example: We will show that we cannot expect either fn = E(f|An) or ∥f − fn∥_1 → 0 in [148]. Let X = [0, 1] with Lebesgue measure µ. Let In = [0, 2^{-n}] and fn : X → R be fn = 2^n 1_{In} for n ≥ 0. Let An be the σ-algebra on X generated by the subintervals J ⊂ X having end points of the form k/2^n. Then {fn, An}_{n=0}^∞ is a martingale, as shown just before [146]. Now, (fn) → 0 pointwise µ-a.e. but not in L1 since ∥fn∥_1 = 1 for every n ≥ 0. Clearly E(0|An) = 0 ≠ fn also.

The extra hypothesis required to get L1 -convergence in [148] is uniform integrability, and this we
will not discuss here. The situation is better on Lp for 1 < p < ∞. We present only the L2 -case:

[149] Let {fn, An}_{n=0}^∞ be a martingale on a probability space (X, A, µ) such that (fn)_{n=0}^∞ is a bounded sequence in L2(X, A, µ). Then there is f ∈ L2(X, A, µ) such that:
(i) fn = E(f|An) for every n ≥ 0.
(ii) (fn) → f pointwise µ-a.e.
(iii) ∥f − fn∥_1 ≤ ∥f − fn∥_2 → 0 and Ef = Ef0, where ∥ · ∥_p denotes the Lp-norm for p = 1, 2.

Proof. Keep in mind that ∥h∥_1 ≤ ∥h∥_2 ∥1∥_2 = ∥h∥_2 by Cauchy-Schwarz for every h ∈ L2(X, A, µ). Since (i) and (ii) are covered by Exercise-47 and [148], it remains to prove (iii). Let M := sup_{n≥0} ∥fn∥_2 < ∞. For n < m, we have E(fn fm|An) = fn E(fm|An) = fn^2 as fn is An-measurable and by Exercise-46(ii). Taking expectation of both sides we conclude E(fn fm) = Efn^2 for n < m. Let g0 = 0 and g_{n+1} = f_{n+1} − fn. Then Eg_{n+1}^2 = Ef_{n+1}^2 − Efn^2 since 2E(f_{n+1} fn) = 2Efn^2. Therefore, ∑_{j=1}^n Egj^2 = Efn^2 − Ef0^2 ≤ 2M^2 < ∞ for every n and thus ∑_{j=1}^∞ Egj^2 < ∞. For n < m, we have E(fm − fn)^2 = ∑_{j=n+1}^m Egj^2. Since (fm) → f pointwise µ-a.e., we have by Fatou's lemma that
∥f − fn∥_2^2 = E(f − fn)^2 = E(lim inf_{m→∞} (fm − fn)^2) ≤ lim inf_{m→∞} ∑_{j=n+1}^m Egj^2 = ∑_{j=n+1}^∞ Egj^2 → 0 as n → ∞.
Hence |E(f − fn)| ≤ ∥f − fn∥_1 ≤ ∥f − fn∥_2 → 0 as n → ∞, and Efn = Ef0 for n ≥ 0 by Exercise-46(i); therefore Ef = Ef0. □

Topic for self-study: Applications of martingales.

*****
