TOPICS FROM ANALYSIS - PART 3/3
T.K.SUBRAHMONIAN MOOTHATHU
This third part of the notes discusses some fundamental results from Probability Theory in the
language of Measure Theory. The treatment is aimed at mathematics students who wish to learn
a little Probability Theory for applications in other branches of mathematics.
The first thing one should learn in Probability Theory is, when to add and when to multiply
probabilities. Roughly speaking, one adds two probabilities when two exclusive events are connected
by OR, and one multiplies two probabilities when two independent events are connected by AND.
For example, assume we randomly choose a pair of numbers (m, n) ∈ {1, . . . , 5} × {1, . . . , 6}. Let
A be the event that m + 4 ≤ n. Since A = {(m = 1 AND n ∈ {5, 6}) OR (m = 2 AND n = 6)},
we obtain prob(A) = (1/5 × 2/6) + (1/5 × 1/6) = 2/30 + 1/30 = 1/10. In this calculation, for
multiplying probabilities we used the fact that the values of m and n are independent of each
other. We will define precisely the notion of independence shortly.
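Since the sample space here is finite, this computation can also be checked by brute-force enumeration. Below is a minimal sketch in Python (the language and the variable names are our choices, not part of the notes):

```python
# Enumerate the 30 equally likely pairs (m, n) and count those in the event A.
from itertools import product

pairs = list(product(range(1, 6), range(1, 7)))         # {1,...,5} x {1,...,6}
favorable = [(m, n) for (m, n) in pairs if m + 4 <= n]  # the event A
print(len(favorable), "/", len(pairs))                  # prints 3 / 30, i.e., 1/10
```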
Definition: (i) A measure space (X, A, µ) is called a probability space if µ(X) = 1, and in this
case µ is called a probability measure. (ii) Let (X, A, µ) be a probability space. A measurable
function f : (X, A, µ) → (R, B(R)) is called a random variable on X, i.e., we say f : X → R is a
random variable if f −1 (B) ∈ A for every Borel set B ⊂ R, and this is equivalent to demanding
that {x ∈ X : f (x) < c} ∈ A for every c ∈ R. It is possible to define a random variable in a more
general sense as a measurable function f : (X, A, µ) → (Y, B(Y )) for a topological space Y , but we
will consider only the case Y = R. Note that if f is a random variable on X and if g : R → R
is Borel-Borel measurable (in particular, if g is continuous), then g ◦ f is also a random variable.
Example: Let X = {1, . . . , 365} and f : X → R be f(n) = the maximum temperature at Hyderabad,
India, on the nth day of the year 2013. Equip X with the normalized counting measure µ, i.e.,
µ(A) = |A|/365 for A ⊂ X. Then f is a random variable on (X, µ) and the induced Borel
probability measure P on R given by P(B) = µ(f⁻¹(B)) = |{n ∈ X : f(n) ∈ B}|/365 gives the
following information: for reals a < b, the value P([a, b]) is the probability that the maximum
temperature at Hyderabad on a random day in 2013 is between a and b.
Definition: (i) Random variables f_j : X → R for j ∈ J on a probability space (X, A, µ) are said to
be independent if for any finite subset F ⊂ J with |F| ≥ 2 and Borel sets B_j ⊂ R for j ∈ F, we
have µ(∩_{j∈F} f_j⁻¹(B_j)) = ∏_{j∈F} µ(f_j⁻¹(B_j)).
(ii) A collection C = {C_j : j ∈ J} ⊂ A of events is said to be independent if for any finite subset
F ⊂ J with |F| ≥ 2, we have µ(∩_{j∈F} C_j) = ∏_{j∈F} µ(C_j).
(iii) Families C_j ⊂ A (not necessarily σ-algebras) for j ∈ J are said to be independent if for any
finite subset F ⊂ J with |F| ≥ 2 and events C_j ∈ C_j, we have µ(∩_{j∈F} C_j) = ∏_{j∈F} µ(C_j).
Remark: Let (X, A, µ) be a probability space. (i) Events A, B ∈ A are independent exactly when
the indicator functions 1_A, 1_B are independent as random variables (check). Thus the independence
of events is a special case of the independence of random variables. (ii) If random variables
f₁, f₂ : X → R are independent and g₁, g₂ : R → R are Borel, then it may be verified that the
random variables g₁ ◦ f₁, g₂ ◦ f₂ : X → R are independent.
Example: (i) Let (X_j, A_j, µ_j) be probability spaces for j = 1, 2 and let (X, A, µ) be the product
probability space, i.e., X = X₁ × X₂, A = A₁ ⊗ A₂ and µ = µ₁ ⊗ µ₂. If A_j ∈ A_j for j = 1, 2, then
µ((A₁ × X₂) ∩ (X₁ × A₂)) = µ(A₁ × A₂) = µ₁(A₁)µ₂(A₂) = µ(A₁ × X₂)µ(X₁ × A₂), and hence the
events A₁ × X₂, X₁ × A₂ are independent in (X, A, µ). (ii) Let µ be a Borel probability measure
on R². If the projections f₁, f₂ : (R², µ) → R to the two coordinates are independent random
variables, then the induced measures P_j = µ ◦ f_j⁻¹ for j = 1, 2 satisfy µ = P₁ ⊗ P₂.
Remark: The above two examples are instructive, and they provide a useful way of looking at
the notion of independence: roughly speaking, we may think of independent events as events
happening in distinct coordinates of a product probability space, and independent random variables
as projection-like functions that depend on distinct coordinates of a product probability space.
Exercise-32: (i) Let f₁, f₂ ∈ L¹(X, A, µ) be independent random variables on a probability space
(X, A, µ). Then f₁f₂ ∈ L¹(X, A, µ) and E(f₁f₂) = (Ef₁)(Ef₂), where Ef := ∫_X f dµ.
(ii) If f₁, f₂ ∈ L²(X, A, µ), then var(f₁ + f₂) = var(f₁) + 2cov(f₁, f₂) + var(f₂), where
var(f) := E(f − Ef)² and cov(f₁, f₂) := E[(f₁ − Ef₁)(f₂ − Ef₂)]. If f₁, f₂ are also
independent, then cov(f₁, f₂) = 0 and hence var(f₁ + f₂) = var(f₁) + var(f₂).
[Hint: (i) Let P_j = µ ◦ f_j⁻¹ on R, and P = µ ◦ (f₁, f₂)⁻¹ on R². We have P = P₁ ⊗ P₂
by independence. Hence by the argument in Exercise-31 and the Fubini-Tonelli theorem, we have
E(|f₁f₂|) = ∫_{R²} |xy| dP(x, y) = (∫_R |x| dP₁)(∫_R |y| dP₂) = (E|f₁|)(E|f₂|) < ∞, so that
E(f₁f₂) = ∫_{R²} xy dP(x, y) = (∫_R x dP₁)(∫_R y dP₂) = (Ef₁)(Ef₂). (ii) Let g_j = f_j − Ef_j for
j = 1, 2. Then var(f₁ + f₂) = ∥g₁ + g₂∥² = ⟨g₁ + g₂, g₁ + g₂⟩ = ∥g₁∥² + 2⟨g₁, g₂⟩ + ∥g₂∥².]
Remark: For f(x) = x² and g(x) = x³ on [0, 1], E(fg) = 1/6 ≠ 1/12 = (Ef)(Eg); this shows that the
hypothesis of independence is necessary in Exercise-32(i).
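A quick Monte Carlo check of this Remark (a sketch in Python with numpy; the sample size is an arbitrary choice):

```python
# E(fg) vs (Ef)(Eg) for f(x) = x^2, g(x) = x^3 under Lebesgue measure on [0, 1].
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10**6)              # ~ uniform samples from ([0,1], Lebesgue)
f, g = x**2, x**3
print((f * g).mean())              # ~ 1/6  = E(fg)
print(f.mean() * g.mean())         # ~ 1/12 = (Ef)(Eg)
```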
[132] [Borel-Cantelli lemma or 0-1 law] Let (X, A, µ) be a probability space and A_n ∈ A.
(i) If ∑_{n=1}^∞ µ(A_n) < ∞, then µ(lim sup_{n→∞} A_n) = 0.
(ii) If ∑_{n=1}^∞ µ(A_n) = ∞ and the A_n's are independent, then µ(lim sup_{n→∞} A_n) = 1.
Proof. Let B_k = ∪_{n≥k} A_n and A = ∩_{k=1}^∞ B_k = lim sup_{n→∞} A_n.
(i) If ∑_{n=1}^∞ µ(A_n) < ∞, then µ(B_k) ≤ ∑_{n=k}^∞ µ(A_n) → 0 as k → ∞ and hence µ(A) = 0.
(ii) Let ∑_{n=1}^∞ µ(A_n) = ∞. Using independence and the fact that 1 − p ≤ e^{−p} for p ∈ [0, 1], we
obtain µ(B_k^c) = µ(∩_{n≥k} A_n^c) = ∏_{n≥k} µ(A_n^c) = ∏_{n≥k} (1 − µ(A_n)) ≤ ∏_{n≥k} e^{−µ(A_n)} = e^{−∑_{n≥k} µ(A_n)} =
e^{−∞} = 0. Thus µ(B_k) = 1 for every k ∈ N so that µ(A) = 1.
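The dichotomy in [132] can be seen numerically. In the sketch below (Python with numpy; the truncation at N events is only suggestive of the almost sure behavior), independent events with µ(A_n) = 1/n keep occurring as N grows, while with µ(A_n) = 1/n² only a few occur:

```python
# Count how many of the independent events A_1,...,A_N occur along sample points,
# for mu(A_n) = 1/n (divergent series) and mu(A_n) = 1/n^2 (convergent series).
import numpy as np

rng = np.random.default_rng(1)
N = 10**5
n = np.arange(1, N + 1)
for p, label in [(1.0 / n, "sum 1/n   = inf"), (1.0 / n**2, "sum 1/n^2 < inf")]:
    hits = rng.random((5, N)) < p       # 5 sample points x; hits[i, n-1] = 1_{A_n}(x_i)
    print(label, "-> number of A_n that occur:", hits.sum(axis=1))
```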
[133] [Kolmogorov's 0-1 law] (i) Let (f_n) be a sequence of independent random variables on a
probability space (X, A, µ). Let T_k ⊂ A be the smallest σ-algebra such that f_n is measurable for
n ≥ k, and let T = ∩_{k=1}^∞ T_k, the σ-algebra of tail events. Then µ(A) = 0 or µ(A) = 1 for each A ∈ T.
(ii) Let (X_n, A_n, µ_n) be probability spaces for n ∈ N, and let (X, A, µ) be their product probability
space, i.e., X = ∏_{n=1}^∞ X_n and µ = ⊗_{n=1}^∞ µ_n. Let π_n : X → X_n be the projection to the nth
coordinate, and recall that the product σ-algebra A is the smallest σ-algebra on X w.r.to which
all the projections π_n are measurable. Let T_k be the smallest σ-algebra on X such that the π_n's are
measurable for n ≥ k, and let T = ∩_{k=1}^∞ T_k. Then µ(A) = 0 or µ(A) = 1 for each A ∈ T.
Proof. (i) Let C_k be the smallest σ-algebra such that f₁, . . . , f_k are measurable. Let C = ∪_{k=1}^∞ C_k,
which is a π-system, being an increasing union of σ-algebras. If A ∈ T, then A ∈ T_{k+1} and hence
A is independent with C_k for each k ∈ N. Hence A is independent with C, and therefore A is
independent with σ(C) by Exercise-33(iii). Since A ∈ T₁ ⊂ σ(C), we get that A is independent with
itself, i.e., µ(A) = µ(A ∩ A) = µ(A)µ(A) = µ(A)². Hence µ(A) = 0 or µ(A) = 1.
(ii) Let C_k be the smallest σ-algebra on X such that π₁, . . . , π_k are measurable, let C = ∪_{k=1}^∞ C_k,
and imitate the argument given for part (i) after noting that C_k is independent with T_{k+1}.
Remark: Tail events for an independent sequence (f_n) of random variables are those events that are
not affected by changing/removing finitely many f_n's. For example, {x ∈ X : lim inf_{n→∞} f_n(x) ≥ 0}
is a tail event, but {x ∈ X : inf_{n∈N} f_n(x) ≥ 0} is not a tail event.
Earlier we showed in [129] that the shift map is ergodic. Now we will give another proof of this
using Kolmogorov’s 0-1 law.
[134] [Ergodicity of the shift map] Let (X, A₀, µ₀) be a probability space, and let (X^N, A, µ)
be the product probability space, where A = ⊗_{n=1}^∞ A₀ and µ = ⊗_{n=1}^∞ µ₀. Then the shift map
g : X^N → X^N given by g(x)_n = x_{n+1} for x ∈ X^N and n ∈ N is ergodic.
Proof. Let A ∈ A be with g⁻¹(A) = A. Let T_k, T be as in [133](ii). As A is the smallest σ-algebra such
that all the projections are measurable, we have A = T₁. Thus A ∈ T₁. Then A = g^{−(k−1)}(A) ∈ T_k
for k > 1 since g is the shift. Hence A ∈ ∩_{k=1}^∞ T_k = T. So µ(A) ∈ {0, 1} by [133](ii).
Remark: A little more work will establish the same conclusion for the shift map on X Z , see p.271
of R.M. Dudley, Real Analysis and Probability.
The phrase law of large numbers refers to the convergence of the averages (f1 + · · · + fn )/n when
fn ’s are independent random variables. There are weak and strong laws of large numbers. To
establish the weak law, we start with some inequalities.
[135] [Weak law of large numbers] Let (f_n) be a sequence of independent, identically distributed
random variables on a probability space (X, A, µ) with Ef₁² < ∞. Then (f₁ + · · · + f_n)/n → Ef₁
in measure, i.e., µ({x ∈ X : |Ef₁ − (1/n)∑_{j=1}^n f_j(x)| ≥ ε}) → 0 as n → ∞ for each ε > 0.
Proof. Let s_n := f₁ + · · · + f_n. Then E(s_n/n) = Ef₁ by linearity and the above Remark. Similarly,
var(s_n/n) = (1/n²)var(s_n) = (1/n²)·n·var(f₁) = var(f₁)/n by Exercise-32 and the above Remark.
By Chebyshev's inequality, µ({x ∈ X : |s_n(x)/n − Ef₁| ≥ ε}) ≤ var(f₁)/(nε²) → 0 as n → ∞.
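The Chebyshev bound var(f₁)/(nε²) in this proof can be compared against simulated deviation probabilities. A minimal sketch in Python with numpy (the distribution uniform(0, 1), with Ef₁ = 1/2 and var(f₁) = 1/12, and the tolerance ε are arbitrary choices):

```python
# Empirical mu(|s_n/n - Ef_1| >= eps) vs the Chebyshev bound var(f_1)/(n eps^2).
import numpy as np

rng = np.random.default_rng(2)
eps, trials = 0.02, 1000
for n in [100, 1000, 10000]:
    means = rng.random((trials, n)).mean(axis=1)      # samples of s_n/n
    empirical = np.mean(np.abs(means - 0.5) >= eps)
    print(n, empirical, "bound:", (1 / 12) / (n * eps**2))
```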
Remark: On a probability space, pointwise convergence a.e. implies convergence in measure; see
Egorov’s theorem in my notes on Measure Theory. Hence [136] below covers [135] above.
[136] [Strong law of large numbers] Let (fn ) be a sequence of independent, identically distributed
random variables on a probability space (X, A, µ). If E|f1 | < ∞, then (f1 + · · · + fn )/n → Ef1
pointwise µ-almost everywhere.
Proof. Let P = ⊗_{n=1}^∞ (µ ◦ f_n⁻¹) = ⊗_{n=1}^∞ (µ ◦ f₁⁻¹) be the product Borel measure on R^N, and let
ϕ : R^N → R be (y₁, y₂, . . .) ↦ y₁. Since f : (X, µ) → (R^N, P) defined as f(x) = (f_n(x)) is measure
preserving, ∫_{R^N} |ϕ| dP = ∫_X (|ϕ| ◦ f) dµ = ∫_X |f₁| dµ = E|f₁| < ∞. Thus ϕ ∈ L¹(R^N, B(R^N), P). Let
g : (R^N, P) → (R^N, P) be the shift map. By [134], g is ergodic, and by Birkhoff's ergodic theorem,
lim_{n→∞} (1/n)∑_{j=0}^{n−1} ϕ(g^j(y)) = ∫_{R^N} ϕ(y) dP(y) for P-almost every y ∈ R^N. Since f is measure
preserving, the substitution y = f(x) gives lim_{n→∞} (1/n)∑_{j=0}^{n−1} ϕ(g^j(f(x))) = ∫_X f₁(x) dµ(x) = Ef₁
for µ-almost every x ∈ X. As ϕ ◦ g^j ◦ f = f_{j+1}, we have (1/n)∑_{j=1}^n f_j = (1/n)∑_{j=0}^{n−1} ϕ ◦ g^j ◦ f.
Thus lim_{n→∞} (1/n)∑_{j=1}^n f_j(x) = Ef₁ for µ-almost every x ∈ X.
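In contrast with [135], the strong law is about individual sample points. Below, a single path of running averages is computed in Python with numpy (exponential(1) variables, with Ef₁ = 1, are an arbitrary choice):

```python
# Running averages (f_1 + ... + f_n)/n along one sample point, for iid
# exponential(1) random variables with Ef_1 = 1.
import numpy as np

rng = np.random.default_rng(3)
f = rng.exponential(1.0, size=10**6)       # coordinates f_1(x), ..., f_N(x)
averages = np.cumsum(f) / np.arange(1, f.size + 1)
for n in [10, 10**2, 10**4, 10**6]:
    print(n, averages[n - 1])              # settles near Ef_1 = 1
```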
Definition: If P₁, P₂ are finite signed Borel measures on R, then their convolution P₁ ∗ P₂ is the
signed Borel measure on R defined by P₁ ∗ P₂(A) = ∫_R P₂(A − s) dP₁(s). The relevance of this
definition can be seen from the following:
Exercise-36: (i) Let P₁, P₂, P₃ be finite signed Borel measures on R. Then we have P₁ ∗ P₂(A) =
P₁ ⊗ P₂({(s, t) ∈ R² : s + t ∈ A}). Consequently, P₁ ∗ P₂ = P₂ ∗ P₁ and P₁ ∗ (P₂ ∗ P₃) = (P₁ ∗ P₂) ∗ P₃.
Moreover, if µ₀ is the Dirac measure at 0, then P₁ ∗ µ₀ = P₁ = µ₀ ∗ P₁.
(ii) [Convolution of induced measures corresponds to the addition of independent random variables]
Let f₁, f₂ : X → R be independent random variables on a probability space (X, A, µ), and P_j =
µ ◦ f_j⁻¹ for j = 1, 2. Then P₁ ∗ P₂ = µ ◦ (f₁ + f₂)⁻¹.
[Hint: (i) 1_{A−s}(t) = 1_A(s + t). Hence P₁ ∗ P₂(A) = ∫_R P₂(A − s) dP₁(s) = ∫_R ∫_R 1_{A−s}(t) dP₂(t) dP₁(s) =
∫_{R²} 1_A(s + t) d(P₁ ⊗ P₂) = P₁ ⊗ P₂({(s, t) ∈ R² : s + t ∈ A}). (ii) By independence, µ ◦ (f₁, f₂)⁻¹
on R² is the product measure P₁ ⊗ P₂. Hence P₁ ∗ P₂(A) = P₁ ⊗ P₂({(s, t) ∈ R² : s + t ∈ A}) =
µ ◦ (f₁, f₂)⁻¹({(s, t) ∈ R² : s + t ∈ A}) = µ({x ∈ X : f₁(x) + f₂(x) ∈ A}) = µ ◦ (f₁ + f₂)⁻¹(A).]
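Exercise-36(ii) is easy to check in a discrete case. In the Python sketch below (numpy assumed; the fair dice are an arbitrary choice), the law of the sum of two independent fair dice is computed both as a convolution and by direct simulation:

```python
# Law of f_1 + f_2 for two independent fair dice: convolution vs simulation.
import numpy as np

die = np.full(6, 1 / 6)                      # P_1 = P_2 = uniform on {1,...,6}
law = np.convolve(die, die)                  # P_1 * P_2, supported on {2,...,12}
print(np.round(law, 4))

rng = np.random.default_rng(4)
s = rng.integers(1, 7, 10**6) + rng.integers(1, 7, 10**6)
print(np.round(np.bincount(s, minlength=13)[2:] / 10**6, 4))  # ~ same vector
```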
Let (f_n) be a sequence of random variables. In the previous section, we analyzed the convergence
of the averages (1/n)∑_{j=1}^n f_j. In this section we ask: when does the series ∑_{n=1}^∞ f_n converge?
Equivalently, we wish to know when the sequence s_n = f₁ + · · · + f_n of partial sums converges.
Exercise-37: [Completeness for a.e. convergence] Let (s_n) be a sequence of random variables on
a probability space (X, A, µ). If for every ε > 0 we have lim_{k→∞} µ({x ∈ X : sup_{n≥k} |s_n(x) − s_k(x)| ≥ ε}) = 0,
then there is a random variable s on (X, A, µ) such that (s_n) → s pointwise µ-a.e. [Hint: Let
A(k, r) = {x ∈ X : sup_{m,n≥k} |s_m(x) − s_n(x)| ≤ 1/r}, and A(r) = ∪_{k=1}^∞ A(k, r), which is an
increasing union. Using the triangle inequality with ε = 1/(2r), we may see that µ(A(r)) = 1. Then
µ(∩_{r=1}^∞ A(r)) = 1. And for each x ∈ ∩_{r=1}^∞ A(r), the Cauchy sequence (s_n(x)) converges.]
We also need a technical inequality relating the partial sums s_j and max_{1≤j≤m} |s_j| for Levy's theorem:
Exercise-38: Let f₁, . . . , f_m be independent random variables on a probability space (X, A, µ), and
s_j = ∑_{i=1}^j f_i. For ε > 0 and 1 ≤ j ≤ m, let S(j, ε) = {x ∈ X : |s_j(x)| ≥ ε} and T(j, ε) = {x ∈
X : |f_{j+1}(x) + · · · + f_m(x)| = |s_m(x) − s_j(x)| > ε}. If there is ε > 0 such that µ(T(j, ε)) ≤ 1/2
for 1 ≤ j ≤ m, then µ(∪_{j=1}^m S(j, 2ε)) ≤ 2µ(S(m, ε)). [Hint: Let A₁ = S(1, 2ε) and A_{j+1} = S(j +
1, 2ε) \ ∪_{i=1}^j S(i, 2ε), which are disjoint. Note that A_j is independent with T(j, ε) since A_j depends
on f₁, . . . , f_j and T(j, ε) on f_{j+1}, . . . , f_m. If |s_j(x)| ≥ 2ε and |s_m(x) − s_j(x)| ≤ ε, then |s_m(x)| ≥ ε,
and therefore A_j ∩ T(j, ε)^c ⊂ A_j ∩ S(m, ε). Hence (1/2)µ(∪_{j=1}^m S(j, 2ε)) = (1/2)∑_{j=1}^m µ(A_j) ≤
∑_{j=1}^m µ(A_j)µ(T(j, ε)^c) ≤ ∑_{j=1}^m µ(A_j ∩ T(j, ε)^c) ≤ ∑_{j=1}^m µ(A_j ∩ S(m, ε)) ≤ µ(S(m, ε)).]
[137] [Levy's theorem] Let (f_n) be a sequence of independent random variables on a probability
space (X, A, µ). Then the following are equivalent:
(i) ∑_{n=1}^∞ f_n converges (to some random variable) in measure.
(ii) ∑_{n=1}^∞ f_n converges (to some random variable) pointwise µ-a.e.
Proof. It is enough to show (i) ⇒ (ii) since the other implication is always true on a probability space.
Let s_n = ∑_{j=1}^n f_j. Given ε ∈ (0, 1/2), by (i) choose k ∈ N large enough that µ({x ∈ X : |s_n(x) −
s_m(x)| > ε}) ≤ ε < 1/2 for every n, m ≥ k. Fix m ∈ N, and applying Exercise-38 to f_{k+1}, . . . , f_{k+m},
we get µ({x ∈ X : max_{1≤j≤m} |s_{k+j}(x) − s_k(x)| ≥ 2ε}) ≤ 2µ({x ∈ X : |s_{k+m}(x) − s_k(x)| > ε}) ≤ 2ε.
As m ∈ N is arbitrary, we have shown that for every ε ∈ (0, 1/2) there is k ∈ N such that
µ({x ∈ X : sup_{n≥k} |s_n(x) − s_k(x)| ≥ 2ε}) ≤ 2ε. This is equivalent to saying lim_{k→∞} µ({x ∈ X :
sup_{n≥k} |s_n(x) − s_k(x)| ≥ 2ε}) = 0 for every ε ∈ (0, 1/2). By Exercise-37, we are through.
In [138] and Exercise-40 below we will see sufficient conditions for the convergence of ∑_{n=1}^∞ f_n.
[138] Let (f_n) be a sequence of independent random variables on a probability space (X, A, µ)
with f_n ∈ L²(X, A, µ). (i) If Ef_n = 0 for every n ∈ N and ∑_{n=1}^∞ var(f_n) < ∞, then ∑_{n=1}^∞ f_n
converges pointwise µ-a.e. and in L²-norm. (ii) If ∑_{n=1}^∞ Ef_n converges in R and ∑_{n=1}^∞ var(f_n) < ∞,
then ∑_{n=1}^∞ f_n converges pointwise µ-a.e. and in L²-norm.
Proof. (ii) After noting that E(f_n − Ef_n) = 0 and var(f_n) = E(f_n − Ef_n)², we apply (i) to f_n − Ef_n to
conclude that ∑_{n=1}^∞ (f_n − Ef_n) converges pointwise µ-a.e. and in L²-norm to some s′ ∈ L²(X, A, µ).
Then ∑_{n=1}^∞ f_n converges pointwise µ-a.e. and in L²-norm to s := s′ + ∑_{n=1}^∞ Ef_n ∈ L²(X, A, µ).
Remark: In the previous section, to obtain results about the convergence of (f₁ + · · · + f_n)/n,
we used the hypothesis that the f_n's are independent and identically distributed. However, being
identically distributed does not help much in the discussion of the convergence of ∑_{n=1}^∞ f_n. If the
f_n's are identically distributed and (f_n) satisfies the hypothesis of [138](ii), then we must have
Ef_n = 0 = var(f_n), and this implies Ef_n² = 0, giving f_n = 0 µ-a.e. for every n ∈ N.
Exercise-39: Let (f_n), (g_n) be two sequences of random variables on a probability space (X, A, µ),
equivalent in the sense that ∑_{n=1}^∞ µ({x ∈ X : f_n(x) ≠ g_n(x)}) < ∞. Then ∑_{n=1}^∞ f_n converges
pointwise µ-a.e. iff ∑_{n=1}^∞ g_n converges pointwise µ-a.e. [Hint: Let A_n = {x ∈ X : f_n(x) ≠ g_n(x)}
and A = lim sup_{n→∞} A_n. By Borel-Cantelli, µ(A) = 0. And for each x ∈ A^c, we have that
f_n(x) = g_n(x) for all large n ∈ N.]
Exercise-40: [Kolmogorov's three series theorem - sufficiency] Let (f_n) be a sequence of independent
random variables on a probability space (X, A, µ), and let (g_n) be defined as g_n(x) = f_n(x) when
|f_n(x)| ≤ 1 and g_n(x) = 0 otherwise. Assume that the following three series of real numbers
are convergent: (i) ∑_{n=1}^∞ µ({x ∈ X : |f_n(x)| > 1}), (ii) ∑_{n=1}^∞ Eg_n, (iii) ∑_{n=1}^∞ var(g_n). Then
∑_{n=1}^∞ f_n converges pointwise µ-a.e. [Hint: Check that the g_n's are also independent. Now, (i) says
∑_{n=1}^∞ µ({x ∈ X : f_n(x) ≠ g_n(x)}) < ∞. Hence by Exercise-39, it suffices to show that ∑_{n=1}^∞ g_n
converges pointwise µ-a.e. And this follows from (ii), (iii) and [138](ii).]
Remark: The converse (necessity) of Exercise-40 is also true. Also, any bound C > 0 may be used
in the place of 1 to define g_n from f_n, as ∑_{n=1}^∞ f_n converges iff ∑_{n=1}^∞ (f_n/C) converges.
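As an illustration of Exercise-40, consider f_n = r_n/n, where the r_n are independent random signs (±1 with probability 1/2 each): all three series converge, since |f_n| ≤ 1, Eg_n = Ef_n = 0, and ∑ var(f_n) = ∑ 1/n² < ∞, so ∑ f_n converges µ-a.e. A few sample paths in Python with numpy (the finite truncation only suggests the limit):

```python
# Partial sums of the random series sum_n r_n / n for independent signs r_n.
import numpy as np

rng = np.random.default_rng(5)
n = np.arange(1, 10**6 + 1)
for path in range(3):
    signs = rng.choice([-1.0, 1.0], size=n.size)
    s = np.cumsum(signs / n)
    print(path, s[10**3 - 1], s[10**5 - 1], s[-1])  # partial sums stabilizing
```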
Definition: A complete separable metric space is called a Polish space. If X is a Polish space, let
M(X) denote the collection of all Borel probability measures on X. Also, let C_b(X, R) = {f :
X → R : f is continuous and bounded}. Note that C_b(X, R) is a Banach space with respect to the
supremum norm, and any µ ∈ M(X) induces a bounded linear functional on C_b(X, R) by the rule
f ↦ ∫_X f dµ. If µ, µ_n ∈ M(X), then borrowing the notion of weak* convergence from Functional
Analysis, we may say (µ_n) converges to µ weak* if ∫_X f dµ_n → ∫_X f dµ for every f ∈ C_b(X, R).
However, this convergence is called weak convergence in Probability Theory. That is, we say
(µ_n) → µ weakly in M(X) if ∫_X f dµ_n → ∫_X f dµ for every f ∈ C_b(X, R).
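For a concrete instance of weak convergence in M(R), one can take µ_n to be the law of a standardized Binomial(n, 1/2) variable and µ the standard Gaussian law; that (µ_n) → µ weakly is a special case of the central limit theorem. A Monte Carlo sketch in Python with numpy (the bounded continuous test function f is an arbitrary choice):

```python
# Testing int f dmu_n -> int f dmu for one f in C_b(R, R).
import numpy as np

rng = np.random.default_rng(6)
f = lambda x: np.cos(x) / (1 + x**2)            # bounded continuous test function
target = f(rng.standard_normal(10**6)).mean()   # ~ int f dmu, mu = standard Gaussian
for n in [4, 16, 64, 256]:
    x = (rng.binomial(n, 0.5, 10**6) - n / 2) / np.sqrt(n / 4)  # standardized
    print(n, f(x).mean(), "target:", target)
```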
[139] Let X be a Polish space and µ, µn ∈ M (X). Then the following are equivalent:
(i) (µn ) → µ weakly.
(ii) lim inf n→∞ µn (U ) ≥ µ(U ) for every open set U ⊂ X.
(iii) lim supn→∞ µn (F ) ≤ µ(F ) for every closed set F ⊂ X.
(iv) If A ∈ B(X) is with µ(∂A) = 0, then limn→∞ µn (A) = µ(A).
(ii & iii) ⇒ (iv): Let U = int(A) and F = A. We have µ(U ) = µ(A) = µ(F ) since µ(∂A) = 0.
Hence lim sup µn (A) ≤ lim sup µn (F ) ≤ µ(F ) = µ(A) = µ(U ) ≤ lim inf µn (U ) ≤ lim inf µn (A).
n→∞ n→∞ n→∞ n→∞
This implies equality throughout, and therefore limn→∞ µn (A) = µ(A).
(iv) ⇒ (i): Consider f ∈ C_b(X, R) and assume f(X) ⊂ (−M, M). The set Y := {y ∈ R :
µ(f⁻¹(y)) > 0} is countable since f⁻¹(y₁) ∩ f⁻¹(y₂) = ∅ for y₁ ≠ y₂. Given ε > 0, choose a
partition −M = a₀ < a₁ < · · · < a_{k−1} < a_k = M of [−M, M] such that a_j − a_{j−1} < ε for 1 ≤ j ≤ k
and a_j ∉ Y for 0 ≤ j ≤ k. If we put A_j = {x ∈ X : a_{j−1} ≤ f(x) < a_j}, then X = ∪_{j=1}^k A_j is a
disjoint union. Moreover, µ(∂A_j) = 0 for 1 ≤ j ≤ k because ∂A_j ⊂ f⁻¹(a_{j−1}) ∪ f⁻¹(a_j). This implies
µ(A_j) = lim_{n→∞} µ_n(A_j) for 1 ≤ j ≤ k by (iv). Let g : X → R be defined as g = ∑_{j=1}^k a_j 1_{A_j}.
Then ∫g dµ = lim_{n→∞} ∫g dµ_n since µ(A_j) = lim_{n→∞} µ_n(A_j). Also, f < g < f + ε, and therefore
(lim sup_{n→∞} ∫f dµ_n) − ε ≤ lim sup_{n→∞} ∫(g − ε) dµ_n = ∫(g − ε) dµ ≤ ∫f dµ
≤ ∫g dµ = lim inf_{n→∞} ∫g dµ_n ≤ lim inf_{n→∞} ∫(f + ε) dµ_n = (lim inf_{n→∞} ∫f dµ_n) + ε.
Since ε > 0 is arbitrary, we deduce that lim_{n→∞} ∫f dµ_n exists and is equal to ∫f dµ.
Remark: Let X be a compact metric space. Then C(X, R) = Cb (X, R) is a separable Banach space
w.r.to the supremum norm (for a proof of separability, see the first part of my notes on Topological
Groups). By Alaoglu’s theorem, etc., the closed unit ball Γ of the dual space C(X, R)∗ is compact
and metrizable in the weak* topology (warning: the unit sphere {f ∈ C(X, R) : ∥f ∥ = 1} is not
closed in the weak* topology, and hence is not weak* compact). The collection M (X) of all Borel
probability measures on X can be thought of as a subset of Γ by Riesz representation theorem.
Exercise-42: Let X be a compact metric space and M (X) ⊂ C(X, R)∗ be the collection of all Borel
probability measures on X. Then,
(i) M (X) is compact and metrizable w.r.to the weak* topology on C(X, R)∗ .
(ii) Every sequence (µ_n) in M(X) has a subsequence (µ_{n_k}) converging weakly to some µ ∈ M(X),
i.e., ∫_X f dµ_{n_k} → ∫_X f dµ for every f ∈ C(X, R).
[Hint: (i) Enough to check that M(X) is weak* closed, and this is easy: if µ_n ∈ M(X) and (µ_n) → µ
weakly, then µ(X) = ∫_X 1 dµ = lim_{n→∞} ∫_X 1 dµ_n = lim_{n→∞} µ_n(X) = 1. And (ii) follows from (i).]
Remark: Let X be a compact metric space and {f_n : n ∈ N} be a countable dense subset of the
closed unit ball of C(X, R). Then it may be shown that µ ↦ (∫_X f_n dµ)_{n∈N} embeds (M(X), weak) as a
closed subset of [−1, 1]^N; this gives another proof that (M(X), weak) is compact and metrizable.
Question: Let X be a Polish space and (µ_n) be a sequence in M(X). When can we say that (µ_n) has
a weakly convergent subsequence? Are there other equivalent formulations of weak convergence?
Exercise-43: Let µ ∈ M (R) and F : R → [0, 1] be the corresponding distribution function defined
as F (x) = µ((−∞, x]) for x ∈ R. Then,
(i) F is continuous from the right, limx→−∞ F (x) = 0 and limx→∞ F (x) = 1.
(ii) x ≤ y ⇒ F (x) ≤ F (y), and consequently, the set of discontinuities of F is at most countable.
(iii) If F is continuous, then F is uniformly continuous.
[Hint: (iii) Given ε > 0, choose M > 0 large so that F(x) < ε/2 for x < −M and F(x) > 1 − ε/2
for x > M. Then choose δ > 0 corresponding to ε/2 for the uniformly continuous function F|_{[−M,M]}.]
Proof. (i) ⇒ (ii): Fix x ∈ C, the set of continuity points of F, and let a < x < b. Let f, g ∈ C_b(R, R)
be defined by the following conditions: f = 1 on (−∞, a], f = 0 on [x, ∞), and the graph of f is
linear on [a, x]; g = 1 on (−∞, x], g = 0 on [b, ∞), and the graph of g is linear on [x, b]. Observe
that F(a) = µ((−∞, a]) ≤ ∫f dµ ≤ ∫g dµ ≤ µ((−∞, b]) = F(b) and ∫f dµ_n ≤ µ_n((−∞, x]) =
F_n(x) ≤ ∫g dµ_n. Since ∫f dµ_n → ∫f dµ and ∫g dµ_n → ∫g dµ, we get F(a) ≤ lim inf_{n→∞} F_n(x) ≤
lim sup_{n→∞} F_n(x) ≤ F(b). Letting a ↗ x, b ↘ x and using the continuity of F at x, we see that
F(x) = lim_{n→∞} F_n(x).
(iv) ⇒ (i): Let f ∈ C_b(R, R) and ε > 0 be given. Choose a < b in A such that for Y := (a, b]
we have µ(R \ Y) < ε. Since (µ_n(Y)) → µ(Y), we also have (µ_n(R \ Y)) → µ(R \ Y). Hence
lim sup_{n→∞} |∫_{R\Y} f d(µ − µ_n)| ≤ lim sup_{n→∞} ∥f∥(µ(R \ Y) + µ_n(R \ Y)) ≤ 2∥f∥ε. (*)
Since f is uniformly continuous on [a, b], there exist a = y₀ < y₁ < · · · < y_k = b in A such that
|f(x) − f(y_j)| < ε for x ∈ Y_j := (y_{j−1}, y_j]. Let g = ∑_{j=1}^k f(y_j)1_{Y_j}, and note that |∫_Y g d(µ − µ_n)| =
|∑_{j=1}^k f(y_j)(µ(Y_j) − µ_n(Y_j))| ≤ ∑_{j=1}^k ∥f∥|µ(Y_j) − µ_n(Y_j)|. Also observe that |∫_Y (f − g) dµ_n| ≤
∑_{j=1}^k ∫_{Y_j} |f − g| dµ_n ≤ ∑_{j=1}^k εµ_n(Y_j) = εµ_n(Y) ≤ ε, and similarly |∫_Y (f − g) dµ| ≤ ε. Therefore,
|∫_Y f d(µ − µ_n)| ≤ |∫_Y (f − g) dµ| + |∫_Y (f − g) dµ_n| + |∫_Y g d(µ − µ_n)| ≤ ε + ε + ∑_{j=1}^k ∥f∥|µ(Y_j) − µ_n(Y_j)|.
Since (µ_n(Y_j)) → µ(Y_j), we conclude that lim sup_{n→∞} |∫_Y f d(µ − µ_n)| ≤ 2ε. (**)
From (*) and (**), we obtain lim sup_{n→∞} |∫_R f d(µ − µ_n)| ≤ 2∥f∥ε + 2ε. Since ε > 0 is arbitrary,
this gives ∫_R f dµ_n → ∫_R f dµ.
[141] Let X be a Polish space. Suppose that a sequence (µ_n) in M(X) is uniformly tight in the
following sense: for every ε > 0, there is a compact subset K ⊂ X with µ_n(K) > 1 − ε for every
n ∈ N. Then there exist µ ∈ M(X) and a subsequence (µ_{n_j}) of (µ_n) such that (µ_{n_j}) → µ weakly.
In fact, Theorem 11.3.3 and Theorem 11.5.4 of Dudley, Real Analysis and Probability give more
information, which we state as [141′] below without proof.
[141′] Let X be a Polish space, and let D, D_L be respectively the Prokhorov metric and the bounded
Lipschitz metric on M(X) (see Dudley's book for their definitions). Then D, D_L are metrics on M(X),
and (µ_n) → µ weakly in M(X) ⇔ D(µ_n, µ) → 0 ⇔ D_L(µ_n, µ) → 0. Moreover, for Γ ⊂ M(X), the
following are equivalent:
(i) Γ is uniformly tight.
(ii) Every sequence in Γ has a subsequence converging weakly to some µ ∈ M (X).
(iii) The closure of Γ is compact in M (X) with respect to the metric D (or DL ).
(iv) Γ is totally bounded in M (X) with respect to the metric D (or DL ).
Remark: Since the elements of a Cauchy sequence form a totally bounded set, it follows from [141′ ]
that (M (X), D) and (M (X), DL ) are complete metric spaces when X is a Polish space.
Remark: (i) Finitely supported probability measures are convex combinations of Dirac measures.
(ii) Let Y be a countable dense subset of a Polish space (or just a separable metric space) X. It can
be deduced using Exercise-44(ii) that the set of all finitely supported ν ∈ M (X) with supp(ν) ⊂ Y
and ν taking values in Q is a countable dense subset of (M (X), D). Thus (M (X), D) is separable.
4. Conditional expectation
You must have studied the notion of conditional probability in elementary classes with the
expression P (B)P (A|B) = P (A ∩ B), where P (A|B) is the conditional probability of the event A
given event B. We will consider a generalization of this to the measure theoretic setting.
Definition: Let (X, A, µ) be a probability space and let f ∈ L¹(X, A, µ).
(i) If B ∈ A is with µ(B) > 0, we define the conditional expectation E(f|B) of f given B as
E(f|B) = (∫_B f dµ)/µ(B); and we put E(f|B) = 0 if µ(B) = 0. Taking f = 1_A for A ∈ A, we
recover the definition of conditional probability since ∫_B 1_A dµ = ∫_{A∩B} 1 dµ = µ(A ∩ B).
(ii) More importantly, if B ⊂ A is a sub σ-algebra of A, then ν(B) := ∫_B f dµ for B ∈ B defines a
signed measure on (X, B) absolutely continuous w.r.to µ, and therefore by the Radon-Nikodym theorem
there is h ∈ L¹(X, B, µ) such that ν(B) = ∫_B h dµ for every B ∈ B. Moreover, this h is unique
in the following sense: if h′ also satisfies the same property, then h(x) = h′(x) for µ-almost every
x ∈ X. We define the conditional expectation E(f|B) of f given B as E(f|B) = h ∈ L¹(X, B, µ). In
other words, E(f|B) is defined as the unique h ∈ L¹(X, B, µ) satisfying ∫_B h dµ = ∫_B f dµ for every
B ∈ B, and the existence of h is guaranteed by the Radon-Nikodym theorem.
Remark: Note that E(f |B) is not a real number but an integrable function. Why is it so? Recall
that in Multivariable Calculus the derivative of a differentiable function at a point is not a real
number but a linear map: this linear map incorporates information about directional derivatives in
all possible directions. Similarly, when we write E(f |B) = h, it has to be observed that the function
h incorporates information about the conditional expectations E(f|B) for all B ∈ B. If E(f|B) = h,
then for every B ∈ B we have ∫_B f dµ = ∫_B h dµ, and in particular E(f|B) = (∫_B h dµ)/µ(B)
whenever µ(B) > 0.
Example: (i) Let X = [0, 1], A = B(X), µ be the Lebesgue measure, f : X → R be f(x) = x, and
B = {∅, [0, 1/3], (1/3, 1], X}. Note that f is not B-measurable since [0, 1/2] = f⁻¹([0, 1/2]) ∉ B.
To compute E(f|B), observe that ∫_0^{1/3} f dµ = 1/18 and ∫_{1/3}^1 f dµ = 4/9. Let h : X → R be
h = (1/6)1_{[0,1/3]} + (2/3)1_{(1/3,1]}, which belongs to L¹(X, B, µ), and we have ∫_B h dµ = ∫_B f dµ for
every B ∈ B. Hence E(f|B) = h. In particular, we cannot expect E(f|B) to be continuous (even
after modifying on a null set) even when f is continuous. (ii) More generally, let (X, A, µ) be
a probability space, and B ⊂ A be the sub σ-algebra generated by a finite measurable partition
X = ∪_{j=1}^k A_j with µ(A_j) > 0, i.e., B is the smallest sub σ-algebra of A containing {A₁, · · · , A_k}. For
f ∈ L¹(X, A, µ), if we put a_j = (∫_{A_j} f dµ)/µ(A_j), then it may be checked that E(f|B) = ∑_{j=1}^k a_j 1_{A_j}.
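A numerical sketch of Examples (i)-(ii) in Python with numpy ([0, 1] is approximated by a fine grid of equally likely points, which only approximates Lebesgue measure):

```python
# E(f|B) for f(x) = x and the partition A_1 = [0, 1/3], A_2 = (1/3, 1]:
# the block-average function, matching h = (1/6) 1_{A_1} + (2/3) 1_{A_2}.
import numpy as np

x = (np.arange(10**6) + 0.5) / 10**6        # grid approximation of ([0,1], Lebesgue)
f = x
block = (x > 1 / 3).astype(int)             # 0 on A_1, 1 on A_2
h = np.empty_like(f)
for j in [0, 1]:
    h[block == j] = f[block == j].mean()    # a_j = (int_{A_j} f dmu) / mu(A_j)
print(np.unique(np.round(h, 4)))            # ~ [1/6, 2/3] = [0.1667, 0.6667]
```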
Conditional expectation behaves in many cases like ordinary expectation. But it should be noted
that, since h = E(f|B) is defined uniquely only µ-almost everywhere, the properties of conditional
expectation that we state as [142], [143], and [144] below should be read 'µ-almost everywhere'.
[142] Let (X, A, µ) be a probability space, B ⊂ A be a sub σ-algebra, and f, f₁, f₂ ∈ L¹(X, A, µ). Then:
(i) If f is B-measurable, then E(f|B) = f.
(ii) E(af₁ + bf₂|B) = aE(f₁|B) + bE(f₂|B) for a, b ∈ R.
(iii) If f ≥ 0, then E(f|B) ≥ 0.
(iv) |E(f|B)| ≤ E(|f| | B).
(v) If C ⊂ B is a sub σ-algebra, then E(E(f|B)|C) = E(f|C).
(vi) If f is independent with B, then E(f|B) = Ef.
Proof. Properties (i) and (ii) follow essentially from the definition of conditional expectation.
(iii) Let h = E(f|B) and A = {x ∈ X : h(x) < 0}. Write A = ∪_{n=1}^∞ A_n, where A_n = {x ∈ X :
h(x) < −1/n}, and note A, A_n ∈ B. If h ≥ 0 µ-a.e. is false, then µ(A) > 0 and then µ(A_n) > 0 for
some n ∈ N. Then ∫_{A_n} h dµ ≤ −µ(A_n)/n < 0 ≤ ∫_{A_n} f dµ, a contradiction since A_n ∈ B.
(iv) Since ±f ≤ |f|, we have ±E(f|B) = E(±f|B) ≤ E(|f| | B) by (ii) and (iii).
(v) Let h = E(E(f|B)|C). If C ∈ C ⊂ B, then ∫_C h dµ = ∫_C E(f|B) dµ = ∫_C f dµ.
(vi) By hypothesis, 1_B is independent with f for B ∈ B and hence E(1_B f) = E(1_B)Ef = µ(B)Ef
by Exercise-32. Therefore, ∫_B f dµ = ∫_X 1_B f dµ = E(1_B f) = µ(B)Ef = ∫_B Ef dµ for B ∈ B.
[143] Let (X, A, µ) be a probability space and B ⊂ A be a sub σ-algebra. Then the following hold:
(i) [Conditional monotone convergence] If f_n, f ∈ L¹(X, A, µ) and 0 ≤ f_n ↗ f, then E(f_n|B) ↗ E(f|B) µ-a.e.
(ii) [Conditional Fatou lemma] If f_n ∈ L¹(X, A, µ), f_n ≥ 0, and f := lim inf_{n→∞} f_n ∈ L¹(X, A, µ),
then E(f|B) ≤ lim inf_{n→∞} E(f_n|B) µ-a.e.
(iii) [Conditional dominated convergence] If |f_n| ≤ g ∈ L¹(X, A, µ) and (f_n) → f pointwise, then
E(f_n|B) → E(f|B) µ-a.e.
(iv) If g is B-measurable and f, gf ∈ L¹(X, A, µ), then E(gf|B) = gE(f|B) µ-a.e.
Proof. (i) Let h_n = E(f_n|B). Since 0 ≤ h_n ≤ h_{n+1} ≤ E(f|B) (and this we may assume everywhere
after modification on a set of measure 0), there is h : X → [0, ∞) with h(x) = lim_{n→∞} h_n(x)
for every x ∈ X. Clearly h is B-measurable since the h_n's are. Consider B ∈ B. By the ordinary
Monotone convergence theorem applied to 1_B f_n ↗ 1_B f and 1_B h_n ↗ 1_B h, we obtain ∫_B f dµ =
lim_{n→∞} ∫_B f_n dµ and ∫_B h dµ = lim_{n→∞} ∫_B h_n dµ. But ∫_B f_n dµ = ∫_B h_n dµ since h_n = E(f_n|B), and
thus ∫_B f dµ = ∫_B h dµ. By taking B = X, we see h ∈ L¹(X, B, µ). Hence h = E(f|B) µ-a.e.
(ii) Let g_m := inf_{n≥m} f_n. Note that 0 ≤ g_m ↗ f and g_m ≤ f_m. Hence by (i) we have
E(f|B) = lim_{m→∞} E(g_m|B) ≤ lim inf_{m→∞} E(f_m|B).
(iii) Apply (ii) to g ± f_n ≥ 0. Then E(g − f|B) = E(lim inf(g − f_n)|B) ≤ lim inf E(g − f_n|B) =
E(g|B) − lim sup E(f_n|B). Removing E(g|B) from both sides we get −E(f|B) ≤ −lim sup E(f_n|B),
and hence E(f|B) ≥ lim sup E(f_n|B). Similarly, E(g + f|B) = E(lim inf(g + f_n)|B) ≤ lim inf E(g +
f_n|B) = E(g|B) + lim inf E(f_n|B), and thus E(f|B) ≤ lim inf E(f_n|B) as well.
(iv) By linearity, we may assume g ≥ 0 and f ≥ 0. If g = 1_C for some C ∈ B, then for any B ∈ B
we have ∫_B 1_C E(f|B) dµ = ∫_{B∩C} E(f|B) dµ = ∫_{B∩C} f dµ = ∫_B 1_C f dµ, and hence gE(f|B) = E(gf|B)
in this case. By linearity, gE(f|B) = E(gf|B) holds for all simple functions g ≥ 0. For a general
g ≥ 0, find simple functions g_n with 0 ≤ g_n ↗ g. Then g_n E(f|B) ↗ gE(f|B). By what is already
proved, and applying (i) to 0 ≤ g_n f ↗ gf, we also deduce g_n E(f|B) = E(g_n f|B) ↗ E(gf|B).
[144] [Conditional Jensen's inequality] Let (X, A, µ) be a probability space and B ⊂ A be a sub
σ-algebra. If g : R → R is convex and f, g ◦ f ∈ L¹(X, A, µ), then g(E(f|B)) ≤ E(g ◦ f|B) µ-a.e.
Proof. Let Γ be the collection of all affine functions h : R → R, h(x) = ax + b, with h ≤ g. The
geometric observation is that g(x) = sup{h(x) : h ∈ Γ} for each x ∈ R since g is convex. Consider
h ∈ Γ. By linearity and the fact that h ◦ f ≤ g ◦ f, we get h(E(f|B)) = E(h ◦ f|B) ≤ E(g ◦ f|B) by
[142]. Taking supremum over h ∈ Γ yields g(E(f|B)) ≤ E(g ◦ f|B).
Remark: Since x ↦ x² is convex, we have (E(f|B))² ≤ E(f²|B) by [144]. Also, [144] remains true
if f(X) ⊂ J for some interval J ⊂ R and g : J → R is convex. Therefore, (E(|f| | B))^p ≤ E(|f|^p | B)
for 1 ≤ p < ∞, as x ↦ x^p is convex on [0, ∞) for 1 ≤ p < ∞.
5. Martingales
Martingales are special sequences of random variables (whose historical origin is from gambling)
enjoying a toolkit of nice properties including convergence under mild assumptions.
Question: Does the knowledge of A_n (the events up to the nth game) help the gambler to increase the
expected value of f_{n+1}, his fortune after the (n+1)th game? That is, is f_n ≤ E(f_{n+1}|A_n)?
Remark: (i) If f_n is A_n-measurable, then E(f_n|A_n) = f_n by [142](i). Hence a gambling sequence
{f_n, A_n}_{n=0}^∞ with f_n ∈ L¹ is a martingale iff E(f_{n+1} − f_n|A_n) = 0; is a submartingale iff
E(f_{n+1} − f_n|A_n) ≥ 0; and is a supermartingale iff E(f_{n+1} − f_n|A_n) ≤ 0.
Example: Let f ∈ L¹(X, A, µ), let (A_n)_{n=0}^∞ be an increasing sequence of sub σ-algebras of A, and let
f_n = E(f|A_n), which is A_n-measurable. Now E(f_{n+1}|A_n) = E(E(f|A_{n+1})|A_n) = E(f|A_n) = f_n
by [142](v). Therefore {f_n, A_n}_{n=0}^∞ is a martingale. A partial converse is given below.
Exercise-47: Let {f_n, A_n}_{n=0}^∞ be a martingale on a probability space (X, A, µ) with
sup_{n≥0} ∥f_n∥_p < ∞ for some p ∈ (1, ∞). Then there is f ∈ L^p(X, A, µ) such that f_n =
E(f|A_n) for every n ≥ 0. [Hint: Since L^p is reflexive and separable for 1 < p < ∞, the bounded
sequence (f_n) has a weakly convergent subsequence (f_{n_k}) → f ∈ L^p by Alaoglu's theorem, etc. Fix
n ∈ N, and consider A ∈ A_n. Then 1_A ∈ L^q = (L^p)^*, where 1/p + 1/q = 1. By weak convergence,
∫_A f dµ = ⟨f, 1_A⟩ = lim_{k→∞} ⟨f_{n_k}, 1_A⟩. For n_k > n, we have E(f_{n_k}|A_n) = f_n by Exercise-46(ii).
Hence ⟨f_{n_k}, 1_A⟩ = ∫_A f_{n_k} dµ = ∫_A f_n dµ. Thus ∫_A f dµ = ∫_A f_n dµ, giving f_n = E(f|A_n).]
Remark: From the Example above we may extract a general method of constructing martingales
as follows. Let (h_n)_{n=0}^∞ be an L¹-sequence of independent random variables on a probability space
(X, A, µ). Let g_n = h_n − Eh_n (then Eg_n = 0 and the g_n's are still independent), A_n = σ(g₀, g₁, · · · , g_n) =
σ(h₀, h₁, · · · , h_n), and f_n = ∑_{j=0}^n g_j. Then {f_n, A_n}_{n=0}^∞ is a martingale.
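A simulation sketch of this construction in Python with numpy (uniform(0, 1) variables h_n are an arbitrary choice; conditioning on the value of f₁₀ below is a coarsening of A₁₀, and E(f₁₁|f₁₀) = f₁₀ still holds by the tower property [142](v)):

```python
# f_n = g_0 + ... + g_n for g_n = h_n - 1/2 with h_n iid uniform(0, 1):
# conditionally on f_10 lying in a small bin, the mean of f_11 stays in the bin.
import numpy as np

rng = np.random.default_rng(7)
g = rng.random((100000, 21)) - 0.5          # g_0, ..., g_20 along many sample points
f = np.cumsum(g, axis=1)
f10, f11 = f[:, 10], f[:, 11]
for lo in [-1.0, -0.5, 0.0, 0.5]:
    sel = (f10 >= lo) & (f10 < lo + 0.1)
    print(lo, round(f10[sel].mean(), 4), round(f11[sel].mean(), 4))  # ~ equal
```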
[145] [Doob decomposition] Let {f_n, A_n}_{n=0}^∞ be a submartingale on a probability space (X, A, µ).
Then f_n can be written as f_n = g_n + h_n, where (i) {g_n, A_n}_{n=0}^∞ is a martingale, (ii) 0 = h₀ ≤
h₁ ≤ h₂ ≤ · · · , and (iii) h_{n+1} is A_n-measurable for every n ≥ 0; and this decomposition is unique
up to µ-null sets.
Proof. First we show uniqueness. Suppose f_n = g_n + h_n. We have E(g_{n+1}|A_n) = g_n by (i), and
E(h_{n+1}|A_n) = h_{n+1} by (iii). Hence E(f_{n+1} − f_n|A_n) = E(f_{n+1}|A_n) − f_n = E(g_{n+1} + h_{n+1}|A_n) −
(g_n + h_n) = h_{n+1} − h_n. This determines the h_n's uniquely since h₀ = 0, and then the g_n's are also
uniquely determined. Now we prove the existence of the decomposition. Starting with h₀ = 0, we may
define the h_n's inductively using the requirement h_{n+1} − h_n = E(f_{n+1} − f_n|A_n), which ensures (iii).
We have h_{n+1} ≥ h_n since E(f_{n+1} − f_n|A_n) ≥ 0 by the submartingale property of f_n. Letting
g_n = f_n − h_n, we may verify (i) as well by checking E(g_{n+1} − g_n|A_n) = 0.
Imagine a game in which the gambler stops playing when a particular favorable event happens.
The theory given below leading up to [146] says that this stopping strategy does not affect the
essential nature (martingale/submartingale/supermartingale) of the gambler’s expected fortune.
Example: Consider X = [0, 1] with the Lebesgue measure. Let f_n : X → R be f_n(x) = xⁿ and
A_n = σ(f₀, f₁, . . . , f_n) for n ≥ 0. Fix c ∈ (0, 1), and define t : [0, 1] → N ∪ {0, ∞} as t(x) = min{n ∈
N : f_n(x) ≤ c}, and t(x) = ∞ if no such n exists. Now, t is finite µ-a.e. since {x ∈ [0, 1] : t(x) < ∞} = [0, 1),
but t is not bounded since {x ∈ [0, 1] : t(x) ≤ M} = [0, c^{1/M}], whose Lebesgue measure is < 1.
Example: Let X = [0, 1] with Lebesgue measure µ. Let f_n : X → R be f_n(x) = x/n and
A_n = σ(f₁, . . . , f_n) for n ∈ N. Define t : [0, 1] → N ∪ {0, ∞} as t(x) = min{n ∈ N : f_n(x) ≤ 1/3},
which is a bounded stopping time w.r.to (A_n) since t ≤ 3. Here f_t : X → R is given by f_t(x) = f₁(x)
for x ∈ [0, 1/3]; f_t(x) = f₂(x) for x ∈ (1/3, 2/3]; and f_t(x) = f₃(x) for x ∈ (2/3, 1]. Also, f_{t∧1} = f₁;
f_{t∧2}(x) = f₁(x) for x ∈ [0, 1/3] and f_{t∧2}(x) = f₂(x) for x ∈ (1/3, 1]; and f_{t∧n} = f_t for n ≥ 3.
Example: Let X = [0, 1] with Lebesgue measure µ. Let E_n = [0, 2^{−n}] and f_n : X → R be
f_n = 2ⁿ1_{E_n} for n ≥ 0. We have f_n ∈ L¹(X, A, µ) with ∥f_n∥₁ = Ef_n = 1. Let C_n be the collection
of subintervals J ⊂ [0, 1] having end points of the form k/2ⁿ, and let A_n = σ(C_n). Then f_n is A_n-
measurable. Consider J ∈ C_n. Then ∫_J f_n dµ = ∫_J f_{n+1} dµ = 1 or 0 depending on whether inf J = 0
or inf J ≥ 2^{−n}. It follows that ∫_A f_n dµ = ∫_A f_{n+1} dµ for every A ∈ A_n. Therefore, E(f_{n+1}|A_n) = f_n,
i.e., {f_n, A_n}_{n=0}^∞ is a martingale. Note that (f_n(x))_{n=0}^∞ is eventually 0 for each x ∈ (0, 1]. Hence
t : X → N ∪ {0, ∞} given by t(0) = ∞ and t(x) = min{n ∈ N : f_n(x) = 0} for x ∈ (0, 1], is a
µ-a.e. finite stopping time. And f_t : X → R satisfies f_t ≡ 0 on (0, 1], giving Ef_t = 0 ≠ 1 = Ef_n for every
n ≥ 0. This shows the necessity of the boundedness assumption on t in [146](iii) below.
The result below says that under fairly general conditions, an optional stopping does not really
change the nature of a game; for simplicity we state it only for martingales.
[146] [Optional stopping] Let {f_n, A_n}_{n=0}^∞ be a martingale on a probability space (X, A, µ), and
let t be a stopping time w.r.to (A_n). Then:
(i) {f_{t∧n}, A_n}_{n=0}^∞ is a martingale.
(ii) If t < ∞ µ-a.e. and there is M > 0 with |f_n| ≤ M for every n ≥ 0, then Ef_t = Ef₀.
(iii) If t is bounded, say t ≤ M for some M ∈ N, then E(f_M|A_t) = f_t.
Proof. (i) Let g_n : X → R be g_n(x) = 1 if n ≤ t(x) and g_n(x) = 0 if t(x) ≤ n − 1. Then g_{n+1}
is A_n-measurable. Since f_{t∧0} = f₀ and f_{t∧n} = f₀ + ∑_{j=1}^n g_j(f_j − f_{j−1}) for n ≥ 1, we conclude by
Exercise-48 that {f_{t∧n}, A_n}_{n=0}^∞ is a martingale.
(ii) By Exercise-49(iii), |f_t| ≤ M µ-a.e., which implies f_t ∈ L¹(X, A, µ). Now by (i), Ef₀ = Ef_{t∧0} =
Ef_{t∧n} for every n ∈ N, and therefore |E(f_t − f₀)| = |E(f_t − f_{t∧n})| ≤ 2Mµ({x ∈ X : t(x) > n}) → 0
as n → ∞ since t < ∞ µ-a.e. Hence E(f_t − f₀) = 0, or Ef_t = Ef₀.
(iii) Let D_j = {x ∈ X : t(x) = j} ∈ A_j and observe that X = ∪_{j=0}^M D_j is a measurable partition modulo
a null set. Consider A ∈ A_t. Then A ∩ D_j ∈ A_j. We have f_t ∈ L¹(X, A_t, µ) by Exercise-49,
and f_t = f_j on D_j. Also E(f_M|A_j) = f_j by Exercise-46(ii), implying ∫_{A∩D_j} f_j dµ = ∫_{A∩D_j} f_M dµ.
Therefore, ∫_A f_t dµ = ∑_{j=0}^M ∫_{A∩D_j} f_j dµ = ∑_{j=0}^M ∫_{A∩D_j} f_M dµ = ∫_A f_M dµ. Thus E(f_M|A_t) = f_t.
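A Monte Carlo check of the optional stopping conclusion Ef_t = Ef₀ in Python with numpy (the ±1 random walk, the barrier at |f_n| = 3, and the horizon 50 are arbitrary choices; stopping at min(t, 50) keeps the stopping time bounded as in [146](iii)):

```python
# Simple random walk f_n stopped at the first time |f_n| >= 3, or at n = 50.
import numpy as np

rng = np.random.default_rng(8)
paths = np.cumsum(rng.choice([-1, 1], size=(100000, 50)), axis=1)
stopped = np.empty(paths.shape[0])
for i, p in enumerate(paths):
    hit = np.flatnonzero(np.abs(p) >= 3)
    stopped[i] = p[hit[0]] if hit.size else p[-1]   # value f_{t wedge 50}
print(stopped.mean())    # ~ 0 = Ef_0
```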
Seminar topics: (i) Analogue of [146] for submartingales, (ii) Doob’s maximal and Lp -inequalities.
Remark: If {f_n, A_n}_{n=0}^∞ is a (super)martingale with f_n ≥ 0 on a probability space (X, A, µ), and t
is a finite stopping time w.r.to (A_n), then by Exercise-49(iii), Fatou's lemma, and [146](i) we have
Ef_t = E(lim inf_{n→∞} f_{t∧n}) ≤ lim inf_{n→∞} Ef_{t∧n} = (≤) Ef_{t∧0} = Ef₀. Thus Ef_t ≤ Ef₀.
If a sequence of real numbers is not convergent in [−∞, ∞], then the sequence must oscillate and
hence must cross some interval [a, b] infinitely often. Building on this simple observation, the
so-called upcrossing inequality of Doob will now lead us to a sufficient condition for the convergence
of an L¹-martingale. This will complement Exercise-47, stated for an L^p-martingale, 1 < p < ∞.
[147] [Doob's upcrossing inequality] Let {f_n, A_n}_{n=0}^∞ be a supermartingale on a probability space
(X, A, µ), let a < b be reals, and let u_n(x) be the number of upcrossings of [a, b] completed by
f₀(x), . . . , f_n(x). Then (b − a)Eu_n ≤ E|f_n − a|.
Proof. We will define random variables g_j : X → R that are 1 during an upcrossing and 0 during
a downcrossing. Fix x ∈ X, and choose integers 0 ≤ p₁ < q₁ < p₂ < q₂ < · · · optimally (least in
each case) with f_{p_i}(x) < a < b < f_{q_i}(x). Let g₀(x) = 0, g_{j+1}(x) = 0 for 0 ≤ j < p₁, g_{j+1}(x) = 1 for
p_i ≤ j < q_i, and g_{j+1}(x) = 0 for q_i ≤ j < p_{i+1}. Observe that g_{j+1} is A_j-measurable. Let h₀ = 0
and h_n = ∑_{j=1}^n g_j(f_j − f_{j−1}) for n ≥ 1. Then {h_n, A_n}_{n≥0} is a supermartingale by the argument
in Exercise-48(ii), and therefore Eh_n ≤ Eh₀ = E0 = 0. Since each completed upcrossing of the f_j's
increases the value of h_n by at least b − a, we also have (b − a)u_n − |f_n − a| ≤ h_n, where the subtracted
term |f_n − a| is to take care of any possible incomplete upcrossing at the end with f_n(x) < a. Taking
expectations we have (b − a)Eu_n − E|f_n − a| ≤ Eh_n ≤ 0, and thus (b − a)Eu_n ≤ E|f_n − a|.
[148] [Martingale convergence theorem] Let {f_n, A_n}_{n=0}^∞ be a supermartingale (in particular,
possibly a martingale) on a probability space (X, A, µ) with sup_{n≥0} ∥f_n∥₁ < ∞. Then (f_n)
converges pointwise µ-a.e. to some f ∈ L¹(X, A, µ).
Proof. Let sup_{n≥0} ∥f_n∥₁ ≤ M < ∞. Fix reals a < b and let the u_n's be as in [147]. Since 0 ≤ u_n ≤ u_{n+1},
the limit u(x) = lim_{n→∞} u_n(x) exists in [0, ∞] for every x ∈ X. We have (b − a)Eu_n ≤ E|f_n − a| ≤
E(|f_n| + |a|) ≤ M + |a| by [147], and therefore ∫_X u dµ = lim_{n→∞} ∫_X u_n dµ ≤ (M + |a|)/(b − a) < ∞ by the
Monotone convergence theorem. Hence u(x) < ∞ for µ-a.e. x ∈ X. This in other words means
the set Y(a, b) := {x ∈ X : (f_n(x))_{n=0}^∞ has infinitely many upcrossings of [a, b]} is µ-null. Since
J := {(a, b) ∈ Q² : a < b} is countable, the set Y := ∪_{(a,b)∈J} Y(a, b) is also µ-null. And (f_n(x))_{n=0}^∞
must converge in [−∞, ∞] for every x ∈ X \ Y. Writing f(x) = lim_{n→∞} f_n(x), we see by Fatou's
lemma that ∫_X |f| dµ ≤ lim inf_{n→∞} ∫_X |f_n| dµ ≤ M < ∞ and thus f ∈ L¹(X, A, µ).
Example: We will show that we cannot expect either f_n = E(f|A_n) or ∥f − f_n∥₁ → 0 in [148].
Let X = [0, 1] with Lebesgue measure µ. Let E_n = [0, 2^{−n}] and f_n : X → R be f_n = 2ⁿ1_{E_n} for
n ≥ 0. Let A_n be the σ-algebra on X generated by the subintervals J ⊂ X having end points of the
form k/2ⁿ. Then {f_n, A_n}_{n=0}^∞ is a martingale as shown just before [146]. Now, (f_n) → 0 pointwise
µ-a.e. but not in L¹ since ∥f_n∥₁ = 1 for every n ≥ 0. Clearly E(0|A_n) = 0 ≠ f_n also.
The extra hypothesis required to get L1 -convergence in [148] is uniform integrability, and this we
will not discuss here. The situation is better on Lp for 1 < p < ∞. We present only the L2 -case:
[149] Let {f_n, A_n}_{n=0}^∞ be a martingale on a probability space (X, A, µ) with sup_{n≥0} ∥f_n∥₂ < ∞.
Then there is f ∈ L²(X, A, µ) such that (i) f_n = E(f|A_n) for every n ≥ 0, (ii) (f_n) → f pointwise
µ-a.e., and (iii) ∥f − f_n∥₂ → 0 as n → ∞; in particular Ef = Ef_n = Ef₀ for every n ≥ 0.
Proof. Keep in mind that ∥h∥₁ ≤ ∥h∥₂∥1∥₂ = ∥h∥₂ by Cauchy-Schwarz for every h ∈ L²(X, A, µ).
Since (i) and (ii) are covered by Exercise-47 and [148], it remains to prove (iii). Let M :=
sup_{n≥0} ∥f_n∥₂ < ∞. For n < m, we have E(f_n f_m|A_n) = f_n E(f_m|A_n) = f_n² as f_n is A_n-measurable
and by Exercise-46(ii). Taking expectations on both sides we conclude E(f_n f_m) = Ef_n² for n < m.
Let g₀ = 0 and g_{n+1} = f_{n+1} − f_n. Then Eg_{n+1}² = Ef_{n+1}² − Ef_n² since 2E(f_{n+1}f_n) = 2Ef_n². There-
fore, ∑_{j=1}^n Eg_j² = Ef_n² − Ef₀² ≤ 2M² < ∞ for every n, and thus ∑_{j=1}^∞ Eg_j² < ∞. For n < m, we
have E(f_m − f_n)² = ∑_{j=n+1}^m Eg_j². Since (f_m) → f pointwise µ-a.e., we have by Fatou's lemma that
∥f − f_n∥₂² = E(f − f_n)² = E(lim inf_{m→∞} (f_m − f_n)²) ≤ lim inf_{m→∞} ∑_{j=n+1}^m Eg_j² = ∑_{j=n+1}^∞ Eg_j² → 0 as n → ∞.
Hence |E(f − f_n)| ≤ ∥f − f_n∥₁ ≤ ∥f − f_n∥₂ → 0 as n → ∞. And Ef_n = Ef₀ for n ≥ 0, so Ef = Ef₀.
*****