
TOPICS FROM ANALYSIS - PART 3/3

T.K.SUBRAHMONIAN MOOTHATHU

Contents

1. Measure theoretic probability: independence and the laws
2. Convergence of a series of random variables
3. Convergence of probability measures
4. Conditional expectation
5. Martingales

This third part of the notes discusses some fundamental results from Probability Theory in the language of Measure Theory, and the treatment is aimed at mathematics students who wish to learn a little bit of Probability Theory for applications in other branches of mathematics.

1. Measure theoretic probability: independence and the laws

The first thing one should learn in Probability Theory is, when to add and when to multiply
probabilities. Roughly speaking, one adds two probabilities when two exclusive events are connected
by OR, and one multiplies two probabilities when two independent events are connected by AND.
For example, assume we randomly choose a pair of numbers (m, n) ∈ {1, . . . , 5} × {1, . . . , 6}. Let
A be the event that m + 4 ≤ n. Since A = {(m = 1 AND n ∈ {5, 6}) OR (m = 2 AND n = 6)},
we obtain prob(A) = (1/5 × 2/6) + (1/5 × 1/6) = 2/30 + 1/30 = 1/10. In this calculation, for
multiplying probabilities we used the fact that the values of m and n are independent of each
other. We will define precisely the notion of independence shortly.
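For readers who like to check such computations mechanically, here is a small illustrative sketch in Python (the brute-force enumeration only confirms the arithmetic above; it is not part of the theory):

    # Enumerate the uniform sample space {1,...,5} x {1,...,6} and count the event m + 4 <= n.
    from fractions import Fraction

    pairs = [(m, n) for m in range(1, 6) for n in range(1, 7)]
    favourable = [(m, n) for (m, n) in pairs if m + 4 <= n]
    prob_A = Fraction(len(favourable), len(pairs))
    print(favourable)   # [(1, 5), (1, 6), (2, 6)]
    print(prob_A)       # 1/10, in agreement with (1/5)(2/6) + (1/5)(1/6)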

Definition: (i) A measure space (X, A, µ) is called a probability space if µ(X) = 1, and in this
case µ is called a probability measure. (ii) Let (X, A, µ) be a probability space. A measurable
function f : (X, A, µ) → (R, B(R)) is called a random variable on X, i.e., we say f : X → R is a
random variable if f −1 (B) ∈ A for every Borel set B ⊂ R, and this is equivalent to demanding
that {x ∈ X : f (x) < c} ∈ A for every c ∈ R. It is possible to define a random variable in a more
general sense as a measurable function f : (X, A, µ) → (Y, B(Y )) for a topological space Y , but we
will consider only the case Y = R. Note that if f is a random variable on X and if g : R → R
is Borel-Borel measurable (in particular, if g is continuous), then g ◦ f is also a random variable
on X. (iii) Let (X, A, µ) be a probability space and f : X → R be a random variable on X.


Then f induces a Borel probability measure P on R by the condition that P (B) = µ(f −1 (B)) for
B ∈ B(R). Any A ∈ A is called an event, and P (B) = µ(f −1 (B)) gives the probability of the event
{x ∈ X : f (x) ∈ B}; see the Example below.

Example: Let X = {1, . . . , 364} and f : X → R be f(n) = the maximum temperature at Hyderabad,
India, on the nth day of the year 2013. Equip X with the normalized counting measure µ, i.e.,
µ(A) = |A|/364 for A ⊂ X. Then f is a random variable on (X, µ) and the induced Borel
probability measure P on R given by P (B) = µ(f −1 (B)) = |{n ∈ X : f (n) ∈ B}|/364 gives the
following information: for reals a < b, the value P ([a, b]) is the probability that the maximum
temperature at Hyderabad on a random day in 2013 is between a and b.

Definition: Let (X, A, µ) be a probability space.


(i) Events A, B ∈ A are independent if µ(A ∩ B) = µ(A)µ(B). Note that A ∩ B = {x ∈ X : x ∈
A and x ∈ B} so that independence is precisely the condition required to have the correspondence
“AND ↔ multiplication”. Independence of events does not mean exclusiveness. For example, if we
consider X = [0, 1] with Lebesgue measure, then the events (0, 1/2) and (1/4, 3/4) are independent,
but (0, 1/2) and (1/2, 1) are not independent as per our definition.

(ii) A collection C = {Cj : j ∈ J} ⊂ A of events is said to be independent if for any finite subset F ⊂ J with |F| ≥ 2, we have µ(⋂_{j∈F} Cj) = ∏_{j∈F} µ(Cj).

(iii) Families Cj ⊂ A (not necessarily σ-algebras) for j ∈ J are said to be independent if for any finite subset F ⊂ J with |F| ≥ 2 and events Cj ∈ Cj, we have µ(⋂_{j∈F} Cj) = ∏_{j∈F} µ(Cj).

(iv) A collection {fj : X → R : j ∈ J} of random variables is said to be independent if the σ-algebras fj^{-1}(B(R)) for j ∈ J are independent. If Pj = µ ◦ fj^{-1} is the induced probability measure on R, then the independence of the random variables fj : X → R for j ∈ J is equivalent to the following: for any finite subset {j1, . . . , jk} ⊂ J with k ≥ 2, the product probability measure P_{j1} ⊗ · · · ⊗ P_{jk} on R^k equals the induced probability measure µ ◦ (f_{j1}, . . . , f_{jk})^{-1}.

(v) A random variable f : X → R is independent with C ⊂ A if f −1 (B(R)) is independent with C.

Remark: Let (X, A, µ) be a probability space. (i) Events A, B ∈ A are independent exactly when
the indicator functions 1A , 1B are independent as random variables (check). Thus the independence
of events is a special case of the independence of random variables. (ii) If random variables f1 , f2 :
X → R are independent and g1 , g2 : R → R are Borel, then it may be verified that the random
variables g1 ◦ f1 , g2 ◦ f2 : X → R are independent.

Example: (i) Let (Xj, Aj, µj) be probability spaces for j = 1, 2 and let (X, A, µ) be the product probability space, i.e., X = X1 × X2, A = A1 ⊗ A2 and µ = µ1 ⊗ µ2. If Aj ∈ Aj for j = 1, 2, then µ((A1 × X2) ∩ (X1 × A2)) = µ(A1 × A2) = µ1(A1)µ2(A2) = µ(A1 × X2)µ(X1 × A2), and hence the events A1 × X2, X1 × A2 are independent in (X, A, µ). (ii) Let µ = µ1 ⊗ µ2 be a product Borel probability measure on R^2. Then the projections f1, f2 : (R^2, µ) → R to the two coordinates are independent random variables, and the induced measures Pj = µ ◦ fj^{-1} for j = 1, 2 satisfy Pj = µj, so that µ = P1 ⊗ P2.

Remark: The above two examples are instructive, and they provide a useful way of looking at
the notion of independence: roughly speaking, we may think of independent events as events
happening in distinct coordinates of a product probability space, and independent random variables
as projection-like functions that depend on distinct coordinates of a product probability space.

Definition: Let (X, A, µ) be a probability space and f : X → R be a random variable. The expectation of f is defined as Ef = ∫_X f dµ ∈ [−∞, ∞] if the integral exists. Intuitively, Ef is the average value of f. Evidently the expectation is linear when defined. If f ∈ L2(X, A, µ) ⊂ L1(X, A, µ), observe that E(f − Ef)^2 = ∫(f − Ef)^2 dµ = ∫ f^2 dµ − 2Ef ∫ f dµ + ∫(Ef)^2 dµ = Ef^2 − 2(Ef)^2 + (Ef)^2 = Ef^2 − (Ef)^2. The variance of f is defined as var(f) = ∥f − Ef∥_2^2 = E(f − Ef)^2 = Ef^2 − (Ef)^2 when f ∈ L2(X, A, µ), and var(f) = ∞ if Ef^2 = ∞. The variance as well as the standard deviation σ(f) := √var(f) measures the amount of deviation of f from the average (expected) value Ef. If f, g ∈ L2(X, A, µ), then the covariance of f and g is defined as cov(f, g) = ⟨f − Ef, g − Eg⟩ = E((f − Ef)(g − Eg)) = E(fg) − Ef·Eg.

Example: Consider X = [0, 1] with Lebesgue measure µ. If f : X → R is f(x) = x^2, then Ef = ∫_0^1 x^2 dµ = 1/3 and var(f) = Ef^2 − (Ef)^2 = ∫_0^1 x^4 dµ − (1/3)^2 = 1/5 − 1/9 = 4/45. If g : [0, 1] → R is g(x) = x^3, then cov(f, g) = E(fg) − Ef·Eg = ∫_0^1 x^5 dµ − (1/3)(1/4) = 1/6 − 1/12 = 1/12 ≠ 0.
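The numbers above are easy to confirm numerically. The following sketch (Python, with numpy assumed available; the sample size and seed are arbitrary) estimates Ef, var(f) and cov(f, g) by Monte Carlo on X = [0, 1] with Lebesgue measure:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=10**6)    # samples from Lebesgue measure on [0,1]
    f, g = x**2, x**3

    print(f.mean())                          # ~ 1/3  (= Ef)
    print(f.var())                           # ~ 4/45 (= Ef^2 - (Ef)^2)
    print(np.mean(f*g) - f.mean()*g.mean())  # ~ 1/12 (= cov(f,g))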

Exercise-31: [Change of variable] Let (X, A, µ) be a probability space, f : X → R be a random variable, and P = µ ◦ f^{-1}. Let g : R → R be Borel, and Pg = µ ◦ (g ◦ f)^{-1}. Then,
(i) ∫_X |g ◦ f| dµ = ∫_R |g| dP = ∫_R |y| dPg(y).
(ii) If ∫_X |g ◦ f| dµ < ∞, then E(g ◦ f) = ∫_X g ◦ f dµ = ∫_R g dP = ∫_R y dPg(y).
(iii) ∫_X |f| dµ = ∫_R |x| dP. If ∫_X |f| dµ < ∞, then Ef = ∫_R x dP(x).
[Hint: (i) By linearity, and approximation by nonnegative simple functions, it is enough to consider the case g = 1_B, where B ∈ B(R). Let A = f^{-1}(B) ⊂ X. Then g ◦ f = 1_A so that ∫_R |g| dP = ∫_R 1_B dP = P(B) = µ(A) = ∫_X |g ◦ f| dµ. Since Pg = µ ◦ (1_A)^{-1} = µ(A^c)µ_0 + µ(A)µ_1 (where µ_0, µ_1 are Dirac measures at 0, 1), we have ∫_R |y| dPg(y) = ∫_{{0}} |y| dPg(y) + ∫_{{1}} |y| dPg(y) = 0 + ∫_{{1}} 1 dPg(y) = Pg({1}) = µ(A). Part (ii) has a similar argument, and (iii) follows by taking g to be the identity map on R.]

Exercise-32: Let (X, A, µ) be a probability space and f1, f2 : X → R be random variables.
(i) [Product rule] If f1, f2 ∈ L1(X, A, µ) and f1, f2 are independent, then f1 f2 ∈ L1(X, A, µ) and E(f1 f2) = (Ef1)(Ef2).
(ii) If f1, f2 ∈ L2(X, A, µ), then var(f1 + f2) = var(f1) + 2cov(f1, f2) + var(f2). If f1, f2 are also independent, then cov(f1, f2) = 0 and hence var(f1 + f2) = var(f1) + var(f2).
[Hint: (i) Let Pj = µ ◦ fj^{-1} on R, and P = µ ◦ (f1, f2)^{-1} on R^2. We have P = P1 ⊗ P2 by independence. Hence by the argument in Exercise-31 and the Fubini-Tonelli theorem, we have E(|f1 f2|) = ∫_{R^2} |xy| dP(x, y) = (∫_R |x| dP1)(∫_R |y| dP2) = (E|f1|)(E|f2|) < ∞ so that E(f1 f2) = ∫_{R^2} xy dP(x, y) = (∫_R x dP1)(∫_R y dP2) = (Ef1)(Ef2). (ii) Let gj = fj − Efj for j = 1, 2. Then var(f1 + f2) = ∥g1 + g2∥_2^2 = ⟨g1 + g2, g1 + g2⟩ = ∥g1∥_2^2 + 2⟨g1, g2⟩ + ∥g2∥_2^2.]

Remark: For f (x) = x2 and g(x) = x3 on [0, 1], E(f g) = 1/6 ̸= 1/12 = Ef Eg; this shows that the
hypothesis of independence is necessary in Exercise-32(i).

Definition: For a sequence (An) of subsets of a set X, let
lim inf_{n→∞} An = ⋃_{k=1}^∞ ⋂_{n=k}^∞ An = {x ∈ X : x ∈ An for all large n ∈ N}, and
lim sup_{n→∞} An = ⋂_{k=1}^∞ ⋃_{n=k}^∞ An = {x ∈ X : x ∈ An for infinitely many n ∈ N}.

[132] [Borel-Cantelli lemma or 0-1 law] Let (X, A, µ) be a probability space and An ∈ A.
(i) If ∑_{n=1}^∞ µ(An) < ∞, then µ(lim sup_{n→∞} An) = 0.
(ii) If ∑_{n=1}^∞ µ(An) = ∞ and the An's are independent, then µ(lim sup_{n→∞} An) = 1.
Proof. Let Bk = ⋃_{n≥k} An and A = ⋂_{k=1}^∞ Bk = lim sup_{n→∞} An.
(i) If ∑_{n=1}^∞ µ(An) < ∞, then µ(Bk) ≤ ∑_{n=k}^∞ µ(An) → 0 as k → ∞ and hence µ(A) = 0.
(ii) Let ∑_{n=1}^∞ µ(An) = ∞. Using independence and the fact that 1 − p ≤ e^{−p} for p ∈ [0, 1], we obtain µ(Bk^c) = µ(⋂_{n≥k} An^c) = ∏_{n≥k} µ(An^c) = ∏_{n≥k} (1 − µ(An)) ≤ ∏_{n≥k} e^{−µ(An)} = e^{−∑_{n≥k} µ(An)} = e^{−∞} = 0. Thus µ(Bk) = 1 for every k ∈ N so that µ(A) = 1. □

Definition: A collection A of subsets of a nonempty set X is called a π-system if A is closed under finite intersections, and is called a λ-system if X ∈ A, A \ B ∈ A whenever A, B ∈ A and B ⊂ A, and A is closed under countable disjoint unions.

Exercise-33: (i) If A is both a π-system and a λ-system on X, then A is a σ-algebra on X.
(ii) Let C be a π-system and D be a λ-system on X. If C ⊂ D, then the σ-algebra σ(C) generated by C satisfies σ(C) ⊂ D.
(iii) Let (X, A, µ) be a probability space, A ∈ A, and C ⊂ A be a π-system. If A is independent with C, then A is independent with the σ-algebra σ(C) generated by C.
[Hint: (i) Since X ∈ A and A is closed under proper differences, A is closed under complements; being also a π-system, A is then closed under finite intersections. Given An ∈ A, let B1 = A1 and B_{n+1} = A_{n+1} \ ⋃_{j=1}^n Aj = A_{n+1} ∩ A1^c ∩ · · · ∩ An^c ∈ A. Then the Bn's are pairwise disjoint, and consequently ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn ∈ A. (ii) Let A be the minimal λ-system containing C, so that C ⊂ A ⊂ D. It is enough to show A is a π-system. Let A1 = {A ⊂ X : A ∩ C ∈ A for every C ∈ C}, which is a λ-system containing C. So A ⊂ A1, and hence A ∩ C ∈ A for A ∈ A and C ∈ C. Let A2 = {B ⊂ X : A ∩ B ∈ A for every A ∈ A}, which is a λ-system containing C. So A ⊂ A2, and hence A ∩ B ∈ A for A, B ∈ A. (iii) Let D = {D ∈ A : µ(A ∩ D) = µ(A)µ(D)}. Then D is a λ-system containing C. Hence σ(C) ⊂ D.]

[133] [Kolmogorov's 0-1 law] (i) Let (fn) be a sequence of independent random variables on a probability space (X, A, µ). Let Tk ⊂ A be the smallest σ-algebra such that fn is measurable for n ≥ k, and let T = ⋂_{k=1}^∞ Tk, the σ-algebra of tail events. Then µ(A) = 0 or µ(A) = 1 for each A ∈ T.
(ii) Let (Xn, An, µn) be probability spaces for n ∈ N, and let (X, A, µ) be their product probability space, i.e., X = ∏_{n=1}^∞ Xn and µ = ⊗_{n=1}^∞ µn. Let πn : X → Xn be the projection to the nth coordinate, and recall that the product σ-algebra A is the smallest σ-algebra on X w.r.to which all the projections πn are measurable. Let Tk be the smallest σ-algebra on X such that the πn's are measurable for n ≥ k, and let T = ⋂_{k=1}^∞ Tk. Then µ(A) = 0 or µ(A) = 1 for each A ∈ T.

Proof. (i) Let Ck be the smallest σ-algebra such that f1, . . . , fk are measurable. Let C = ⋃_{k=1}^∞ Ck, which is a π-system, being an increasing union of σ-algebras. If A ∈ T, then A ∈ T_{k+1} and hence A is independent with Ck for each k ∈ N. Hence A is independent with C, and therefore A is independent with σ(C) by Exercise-33(iii). Since A ∈ T1 ⊂ σ(C), we get A is independent with itself, i.e., µ(A) = µ(A ∩ A) = µ(A)µ(A) = µ(A)^2. Hence µ(A) = 0 or µ(A) = 1.
(ii) Let Ck be the smallest σ-algebra on X such that π1, . . . , πk are measurable, let C = ⋃_{k=1}^∞ Ck, and imitate the argument given for part (i) after noting that Ck is independent with T_{k+1}. □

Remark: Tail events for an independent sequence (fn ) of random variables are those events that are
not affected by changing/removing finitely many fn ’s. For example, {x ∈ X : lim inf n→∞ fn (x) ≥ 0}
is a tail event, but {x ∈ X : inf n∈N fn (x) ≥ 0} is not a tail event.

Exercise-34: Let (fn) be an independent sequence of random variables on a probability space (X, A, µ), and T be the corresponding tail σ-algebra on X.
(i) If f : X → [−∞, ∞] is T-B([−∞, ∞]) measurable, then f is constant µ-almost everywhere.
(ii) Each of the following is a constant in [−∞, ∞] for µ-almost every x ∈ X: lim inf_{n→∞} fn, lim sup_{n→∞} fn, lim inf_{n→∞} (∑_{j=1}^n fj)/n, and lim sup_{n→∞} (∑_{j=1}^n fj)/n.
[Hint: (i) For c ∈ [−∞, ∞], Ac := f^{-1}([−∞, c]) ∈ T so that µ(Ac) = 0 or 1. Let c0 = inf{c : µ(Ac) = 1} and check µ(f^{-1}(c0)) = 1. (ii) The given functions are T-B([−∞, ∞]) measurable.]

Earlier we showed in [129] that the shift map is ergodic. Now we will give another proof of this
using Kolmogorov’s 0-1 law.

[134] [Ergodicity of the shift map] Let (X, A0, µ0) be a probability space, and let (X^N, A, µ) be the product probability space, where A = ⊗_{n=1}^∞ A0 and µ = ⊗_{n=1}^∞ µ0. Then the shift map g : X^N → X^N given by g(x)_n = x_{n+1} for x ∈ X^N and n ∈ N is ergodic.
Proof. Let A ∈ A with g^{-1}(A) = A. Let Tk, T be as in [133](ii). As A is the smallest σ-algebra such that all the projections are measurable, we have A = T1. Thus A ∈ T1. Then A = g^{−(k−1)}(A) ∈ Tk for k > 1 since g is the shift. Hence A ∈ ⋂_{k=1}^∞ Tk = T. So µ(A) ∈ {0, 1} by [133](ii). □

Remark: A little more work will establish the same conclusion for the shift map on X Z , see p.271
of R.M. Dudley, Real Analysis and Probability.

The phrase law of large numbers refers to the convergence of the averages (f1 + · · · + fn )/n when
fn ’s are independent random variables. There are weak and strong laws of large numbers. To
establish the weak law, we start with some inequalities.

Exercise-35: Let (X, A, µ) be a probability space.
(i) [Markov's inequality] If f : X → [0, ∞) is measurable, then µ({x ∈ X : f(x) ≥ t}) ≤ Ef/t = (∫_X f dµ)/t for 0 < t < ∞.
(ii) If g : X → R is a random variable and 0 < r, t < ∞, then µ({x ∈ X : |g(x)| ≥ t}) ≤ (E|g|^r)/t^r.
(iii) [Chebychev's inequality] Let h : X → R be a random variable with Eh^2 < ∞ and var(h) = σ^2(h) ∈ (0, ∞). Then for 0 < t < ∞, we have µ({x ∈ X : |h(x) − Eh| ≥ t}) ≤ var(h)/t^2.
[Hint: For (i), let A = {x : f(x) ≥ t}, note 1_A ≤ f/t, and integrate. For (ii), note |g(x)| ≥ t iff |g(x)|^r ≥ t^r, and apply (i) to f(x) = |g(x)|^r. For (iii), apply (ii) to g(x) = h(x) − Eh with r = 2.]
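A quick sanity check of these inequalities (a Python sketch with numpy assumed; the exponential law used here is just a convenient test case, not part of the exercise):

    import numpy as np

    rng = np.random.default_rng(1)
    f = rng.exponential(scale=1.0, size=10**6)   # f >= 0 with Ef = 1 and var(f) = 1
    t = 3.0

    print(np.mean(f >= t), "<=", f.mean() / t)                       # Markov: P(f >= t) <= Ef/t
    print(np.mean(np.abs(f - f.mean()) >= t), "<=", f.var() / t**2)  # Chebychev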

Definition: A sequence (fn) of random variables on a probability space (X, A, µ) is said to be identically distributed if µ ◦ fm^{-1} = µ ◦ fn^{-1} for every m, n.

Remark: Let (fn) be identically distributed and P = µ ◦ fn^{-1}. If f1 ∈ L1, then Efn = ∫_R x dP = Ef1 for every n by Exercise-31(iii). Similarly, if f1 ∈ L2, then var(fn) = var(f1) for every n.

[135] [Weak law of large numbers] Let (fn) be a sequence of independent, identically distributed random variables on a probability space (X, A, µ) with Ef1^2 < ∞. Then (f1 + · · · + fn)/n → Ef1 in measure, i.e., µ({x ∈ X : |Ef1 − (1/n)∑_{j=1}^n fj(x)| ≥ ε}) → 0 as n → ∞ for each ε > 0.

Proof. Let sn := f1 + · · · + fn. Then E(sn/n) = Ef1 by linearity and the above Remark. Similarly, var(sn/n) = (1/n^2)var(sn) = (1/n^2)·n·var(f1) = var(f1)/n by Exercise-32 and the above Remark. By Chebychev's inequality, µ({x ∈ X : |sn(x)/n − Ef1| ≥ ε}) ≤ var(f1)/(nε^2) → 0 as n → ∞. □

Remark: On a probability space, pointwise convergence a.e. implies convergence in measure; see
Egorov’s theorem in my notes on Measure Theory. Hence [136] below covers [135] above.

[136] [Strong law of large numbers] Let (fn ) be a sequence of independent, identically distributed
random variables on a probability space (X, A, µ). If E|f1 | < ∞, then (f1 + · · · + fn )/n → Ef1
pointwise µ-almost everywhere.
Proof. Let P = ⊗_{n=1}^∞ (µ ◦ fn^{-1}) = ⊗_{n=1}^∞ (µ ◦ f1^{-1}) be the product Borel measure on R^N, and let ϕ : R^N → R be (y1, y2, . . .) ↦ y1. Since f : (X, µ) → (R^N, P) defined as f(x) = (fn(x)) is measure preserving, ∫_{R^N} |ϕ| dP = ∫_X |ϕ ◦ f| dµ = ∫_X |f1| dµ = E|f1| < ∞. Thus ϕ ∈ L1(R^N, B(R^N), P). Let g : (R^N, P) → (R^N, P) be the shift map. By [134], g is ergodic, and by Birkhoff's ergodic theorem, lim_{n→∞} (1/n)∑_{j=0}^{n−1} ϕ(g^j(y)) = ∫_{R^N} ϕ(y) dP(y) for P-almost every y ∈ R^N. Since f is measure preserving, the substitution y = f(x) gives lim_{n→∞} (1/n)∑_{j=0}^{n−1} ϕ(g^j(f(x))) = ∫_X f1(x) dµ(x) = Ef1 for µ-almost every x ∈ X. As ϕ ◦ g^j ◦ f = f_{j+1}, we have (1/n)∑_{j=1}^n fj = (1/n)∑_{j=0}^{n−1} ϕ ◦ g^j ◦ f. Thus lim_{n→∞} (1/n)∑_{j=1}^n fj(x) = Ef1 for µ-almost every x ∈ X. □
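The strong law is easy to visualise by simulation. The sketch below (Python, numpy assumed; the uniform law, the horizon and the number of paths are arbitrary choices for illustration) tracks the running averages of i.i.d. random variables along a few sample paths; they settle near Ef1, in agreement with [136]:

    import numpy as np

    rng = np.random.default_rng(2)
    n, paths = 100_000, 3
    # i.i.d. samples with Ef1 = 0.5 (uniform on [0,1]); any integrable law would do
    samples = rng.uniform(0.0, 1.0, size=(paths, n))
    running_avg = np.cumsum(samples, axis=1) / np.arange(1, n + 1)
    print(running_avg[:, -1])   # each entry is close to 0.5 = Ef1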

Definition: If P1, P2 are finite signed Borel measures on R, then their convolution P1 ∗ P2 is the signed Borel measure on R defined by P1 ∗ P2(A) = ∫_R P2(A − s) dP1(s). The relevance of this definition can be seen from the following:

Exercise-36: (i) Let P1, P2, P3 be finite signed Borel measures on R. Then we have P1 ∗ P2(A) = P1 ⊗ P2({(s, t) ∈ R^2 : s + t ∈ A}). Consequently, P1 ∗ P2 = P2 ∗ P1 and P1 ∗ (P2 ∗ P3) = (P1 ∗ P2) ∗ P3. Moreover, if µ0 is the Dirac measure at 0, then P1 ∗ µ0 = P1 = µ0 ∗ P1.
(ii) [Convolution of induced measures corresponds to the addition of independent random variables] Let f1, f2 : X → R be independent random variables on a probability space (X, A, µ), and Pj = µ ◦ fj^{-1} for j = 1, 2. Then P1 ∗ P2 = µ ◦ (f1 + f2)^{-1}.
[Hint: (i) 1_{A−s}(t) = 1_A(s + t). Hence P1 ∗ P2(A) = ∫_R P2(A − s) dP1(s) = ∫_R ∫_R 1_{A−s}(t) dP2(t) dP1(s) = ∫_{R^2} 1_A(s + t) d(P1 ⊗ P2) = P1 ⊗ P2({(s, t) ∈ R^2 : s + t ∈ A}). (ii) By independence, µ ◦ (f1, f2)^{-1} on R^2 is the product measure P1 ⊗ P2. Hence P1 ∗ P2(A) = P1 ⊗ P2({(s, t) ∈ R^2 : s + t ∈ A}) = µ ◦ (f1, f2)^{-1}({(s, t) ∈ R^2 : s + t ∈ A}) = µ({x ∈ X : f1(x) + f2(x) ∈ A}) = µ ◦ (f1 + f2)^{-1}(A).]
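Exercise-36(ii) can be seen concretely for discrete laws: for two independent fair dice (our own illustrative example), the probability vector of the sum is the discrete convolution of the two probability vectors. A Python sketch, with numpy assumed:

    import numpy as np

    rng = np.random.default_rng(3)
    p = np.full(6, 1/6)                      # law P1 = P2 of one fair die on {1,...,6}
    conv = np.convolve(p, p)                 # P1 * P2, supported on {2,...,12}

    rolls = rng.integers(1, 7, size=(10**6, 2)).sum(axis=1)
    empirical = np.bincount(rolls, minlength=13)[2:] / rolls.size
    print(np.round(conv, 4))
    print(np.round(empirical, 4))            # close to conv, i.e. to mu o (f1+f2)^{-1}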

2. Convergence of a series of random variables

Let (fn) be a sequence of random variables. In the previous section, we analyzed the convergence of the averages (1/n)∑_{j=1}^n fj. In this section we ask: when does the series ∑_{n=1}^∞ fn converge? Equivalently, we wish to know when the sequence sn = f1 + · · · + fn of partial sums converges.

In general, convergence in measure for a sequence of random variables on a probability space


implies only that a subsequence converges pointwise almost everywhere. However, after a little bit
of preparation we will show that for the partial sum sequence mentioned above, convergence in
measure is equivalent to pointwise convergence almost everywhere (Levy’s theorem).

Exercise-37: [Completeness for a.e. convergence] Let (sn) be a sequence of random variables on a probability space (X, A, µ). If for every ε > 0 we have lim_{k→∞} µ({x ∈ X : sup_{n≥k} |sn(x) − sk(x)| ≥ ε}) = 0, then there is a random variable s on (X, A, µ) such that (sn) → s pointwise µ-a.e. [Hint: Let A(k, r) = {x ∈ X : sup_{m,n≥k} |sm(x) − sn(x)| ≤ 1/r}, and A(r) = ⋃_{k=1}^∞ A(k, r), which is an increasing union. Using the triangle inequality for ε = 1/(2r) we may see µ(A(r)) = 1. Then µ(⋂_{r=1}^∞ A(r)) = 1. And for each x ∈ ⋂_{r=1}^∞ A(r), the Cauchy sequence (sn(x)) converges.]

We also need a technical inequality relating the partial sums sj and max_{1≤j≤m} sj for Levy's theorem:

Exercise-38: Let f1, . . . , fm be independent random variables on a probability space (X, A, µ), and sj = ∑_{i=1}^j fi. For ε > 0 and 1 ≤ j ≤ m, let S(j, ε) = {x ∈ X : |sj(x)| ≥ ε} and T(j, ε) = {x ∈ X : |f_{j+1}(x) + · · · + fm(x)| = |sm(x) − sj(x)| > ε}. If there is ε > 0 such that µ(T(j, ε)) ≤ 1/2 for 1 ≤ j ≤ m, then µ(⋃_{j=1}^m S(j, 2ε)) ≤ 2µ(S(m, ε)). [Hint: Let A1 = S(1, 2ε) and A_{j+1} = S(j + 1, 2ε) \ ⋃_{i=1}^j S(i, 2ε), which are disjoint. Note that Aj is independent with T(j, ε) since Aj depends on f1, . . . , fj and T(j, ε) on f_{j+1}, . . . , fm. If |sj(x)| ≥ 2ε and |sm(x) − sj(x)| ≤ ε, then |sm(x)| ≥ ε, and therefore Aj ∩ T(j, ε)^c ⊂ Aj ∩ S(m, ε). Hence (1/2)µ(⋃_{j=1}^m S(j, 2ε)) = (1/2)∑_{j=1}^m µ(Aj) ≤ ∑_{j=1}^m µ(Aj)µ(T(j, ε)^c) = ∑_{j=1}^m µ(Aj ∩ T(j, ε)^c) ≤ ∑_{j=1}^m µ(Aj ∩ S(m, ε)) ≤ µ(S(m, ε)).]

[137] [Levy's theorem] Let (fn) be a sequence of independent random variables on a probability space (X, A, µ). Then the following are equivalent:
(i) ∑_{n=1}^∞ fn converges (to some random variable) in measure.
(ii) ∑_{n=1}^∞ fn converges (to some random variable) pointwise µ-a.e.

Proof. Enough to show (i) ⇒ (ii) since the other implication is always true on a probability space.

Let sn = nj=1 fj . Given ε ∈ (0, 1/2), by (i) choose k ∈ N large enough with µ({x ∈ X : |sn (x) −
sm (x)| > ε} ≤ ε < 1/2 for every n, m ≥ k. Fix m ∈ N, and applying Exercise-38 to fk+1 , · · · , fk+m ,
we get µ({x ∈ X : max1≤j≤m |sk+j (x) − sk (x)| ≥ 2ε}) ≤ 2µ({x ∈ X : |sk+m (x) − sk (x)| > ε}) < ε.
As m ∈ N is arbitrary, we have shown that for every ε ∈ (0, 1/2) there is k ∈ N such that
µ({x ∈ X : supn≥k |sn (x) − sk (x)| ≥ 2ε}) ≤ ε. This is equivalent to saying limk→∞ µ({x ∈ X :
supn≥k |sn (x) − sk (x)| ≥ 2ε}) = 0 for every ε ∈ (0, 1/2). By Exercise-37, we are through. 
In [138] and Exercise-40 below we will see sufficient conditions for the convergence of ∑_{n=1}^∞ fn.

[138] [Khintchine-Kolmogorov] Let (fn) be an L2-sequence of independent random variables on a probability space (X, A, µ).
(i) If Efn = 0 for every n ∈ N and ∑_{n=1}^∞ Efn^2 < ∞, then ∑_{n=1}^∞ fn converges pointwise µ-a.e., and in L2-norm, to some s ∈ L2(X, A, µ). Also, Es^2 = ∑_{n=1}^∞ Efn^2.
(ii) If the real series ∑_{n=1}^∞ Efn and ∑_{n=1}^∞ var(fn) are convergent, then ∑_{n=1}^∞ fn converges pointwise µ-a.e. and in L2-norm to some s ∈ L2(X, A, µ).
Proof. (i) Let sn = ∑_{j=1}^n fj. Since L2-convergence implies convergence in measure, and because of Levy's theorem, it suffices to show (sn) is Cauchy in L2(X, A, µ). Given ε > 0 choose k0 ∈ N such that ∑_{n=k0}^∞ Efn^2 < ε. Using Exercise-32(ii) and the fact Efn = 0, we see for m > k ≥ k0 that ∥sm − sk∥_2^2 = E(∑_{n=k+1}^m fn)^2 = var(∑_{n=k+1}^m fn) = ∑_{n=k+1}^m var(fn) = ∑_{n=k+1}^m Efn^2 < ε. Thus (sn) is Cauchy in L2(X, A, µ). Let s = lim_{n→∞} sn in L2(X, A, µ). Note that ∥sn∥_2^2 = ∑_{j=1}^n Efj^2 as above, and therefore Es^2 = ∥s∥_2^2 = lim_{n→∞} ∥sn∥_2^2 = ∑_{j=1}^∞ Efj^2.
(ii) After noting that E(fn − Efn) = 0 and var(fn) = E(fn − Efn)^2, we apply (i) to fn − Efn to conclude that ∑_{n=1}^∞ (fn − Efn) converges pointwise µ-a.e. and in L2-norm to some s′ ∈ L2(X, A, µ). Then ∑_{n=1}^∞ fn converges pointwise µ-a.e. and in L2-norm to s := s′ + ∑_{n=1}^∞ Efn ∈ L2(X, A, µ). □
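A classical illustration of [138](i): take fn = εn/n with independent signs εn = ±1 (so Efn = 0 and ∑ Efn^2 = ∑ 1/n^2 < ∞); the series ∑ fn then converges a.e. The following Python sketch (numpy assumed; horizon and number of paths are arbitrary) computes a few sample partial sums at several checkpoints:

    import numpy as np

    rng = np.random.default_rng(4)
    n, paths = 10**5, 4
    signs = rng.choice([-1.0, 1.0], size=(paths, n))
    partial = np.cumsum(signs / np.arange(1, n + 1), axis=1)
    # compare the partial sums at a few checkpoints along each path:
    # they change only slightly, reflecting the a.e. convergence of the series
    print(partial[:, [10**3 - 1, 10**4 - 1, 10**5 - 1]])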

Remark: In the previous section, to obtain results about the convergence of (f1 + · · · + fn)/n, we used the hypothesis that the fn's are independent and identically distributed. However, being identically distributed does not help much in the discussion of the convergence of ∑_{n=1}^∞ fn. If the fn's are identically distributed and (fn) satisfies the hypothesis of [138](ii), then we must have Efn = 0 = var(fn) for every n, and this implies Efn^2 = 0, giving fn = 0 µ-a.e. for every n ∈ N.

Exercise-39: Let (fn), (gn) be two sequences of random variables on a probability space (X, A, µ), equivalent in the sense that ∑_{n=1}^∞ µ({x ∈ X : fn(x) ≠ gn(x)}) < ∞. Then ∑_{n=1}^∞ fn converges pointwise µ-a.e. iff ∑_{n=1}^∞ gn converges pointwise µ-a.e. [Hint: Let An = {x ∈ X : fn(x) ≠ gn(x)} and A = lim sup_{n→∞} An. By Borel-Cantelli, µ(A) = 0. And for each x ∈ A^c, we have that fn(x) = gn(x) for all large n ∈ N.]

Exercise-40: [Kolmogorov's three series theorem - sufficiency] Let (fn) be a sequence of independent random variables on a probability space (X, A, µ), and let (gn) be defined as gn(x) = fn(x) when |fn(x)| ≤ 1 and gn(x) = 0 otherwise. Assume that the following three series of real numbers are convergent: (i) ∑_{n=1}^∞ µ({x ∈ X : |fn(x)| > 1}) (ii) ∑_{n=1}^∞ Egn (iii) ∑_{n=1}^∞ var(gn). Then ∑_{n=1}^∞ fn converges pointwise µ-a.e. [Hint: Check that the gn's are also independent. Now, (i) says ∑_{n=1}^∞ µ({x ∈ X : fn(x) ≠ gn(x)}) < ∞. Hence by Exercise-39, it suffices to show ∑_{n=1}^∞ gn converges pointwise µ-a.e. And this follows from (ii), (iii) and [138](ii).]

Remark: The converse (necessity) of Exercise-40 is also true. Also, any bound C > 0 may be used in place of 1 to define gn from fn, as ∑_{n=1}^∞ fn converges iff ∑_{n=1}^∞ (fn/C) converges.

3. Convergence of probability measures

Definition: A complete separable metric space is called a Polish space. If X is a Polish space, let
M (X) denote the collection of all Borel probability measures on X. Also, let Cb (X, R) = {f :
X → R : f is continuous and bounded}. Note that Cb (X, R) is a Banach space with respect to the
supremum norm, and any µ ∈ M(X) induces a bounded linear functional on Cb(X, R) by the rule f ↦ ∫_X f dµ. If µ, µn ∈ M(X), then borrowing the notion of weak* convergence from Functional Analysis, we may say (µn) converges to µ weak* if ∫_X f dµn → ∫_X f dµ for every f ∈ Cb(X, R). However, this convergence is called weak convergence in Probability Theory. That is, we say (µn) → µ weakly in M(X) if ∫_X f dµn → ∫_X f dµ for every f ∈ Cb(X, R).
10 T.K.SUBRAHMONIAN MOOTHATHU

Exercise-41: Let X be a Polish space, Γ ⊂ Cb(X, R) be a dense subset, and µ, µn ∈ M(X). If (∫_X g dµn) → ∫_X g dµ for every g ∈ Γ, then (µn) → µ weakly. [Hint: Given f ∈ Cb(X, R) and ε > 0, choose g ∈ Γ with ∥f − g∥ < ε. Then |∫ f d(µ − µn)| ≤ |∫ (f − g) dµ| + |∫ g d(µ − µn)| + |∫ (g − f) dµn|, where the first and third terms are ≤ ε since ∥f − g∥ < ε and µ(X) = µn(X) = 1.]

[139] Let X be a Polish space and µ, µn ∈ M (X). Then the following are equivalent:
(i) (µn ) → µ weakly.
(ii) lim inf n→∞ µn (U ) ≥ µ(U ) for every open set U ⊂ X.
(iii) lim supn→∞ µn (F ) ≤ µ(F ) for every closed set F ⊂ X.
(iv) If A ∈ B(X) is with µ(∂A) = 0, then limn→∞ µn (A) = µ(A).

Proof. (i) ⇒ (ii): Let Fk = {x ∈ X : dist(x, U^c) ≥ 1/k} and let gk : X → [0, 1] be a continuous function such that gk ≡ 1 on Fk and gk ≡ 0 on U^c. Then U = ⋃_{k=1}^∞ Fk is an increasing union, and (gk) ↗ 1_U pointwise on X. Given ε > 0, choose k ∈ N large with µ(Fk) > µ(U) − ε. Now, µ(U) − ε < µ(Fk) ≤ ∫_X gk dµ = lim inf_{n→∞} ∫_X gk dµn ≤ lim inf_{n→∞} ∫_X 1_U dµn = lim inf_{n→∞} µn(U).
(ii) ⇔ (iii): Just use the fact that U ⊂ X is open iff U^c is closed.
(ii & iii) ⇒ (iv): Let U = int(A) and F = Ā (the closure of A). We have µ(U) = µ(A) = µ(F) since µ(∂A) = 0. Hence lim sup_{n→∞} µn(A) ≤ lim sup_{n→∞} µn(F) ≤ µ(F) = µ(A) = µ(U) ≤ lim inf_{n→∞} µn(U) ≤ lim inf_{n→∞} µn(A). This implies equality throughout, and therefore lim_{n→∞} µn(A) = µ(A).

(iv) ⇒ (i): Consider f ∈ Cb(X, R) and assume f(X) ⊂ (−M, M). The set Y := {y ∈ R : µ(f^{-1}(y)) > 0} is countable since f^{-1}(y1) ∩ f^{-1}(y2) = ∅ for y1 ≠ y2. Given ε > 0, choose a partition −M = a0 < a1 < · · · < a_{k−1} < ak = M of [−M, M] such that aj − a_{j−1} < ε for 1 ≤ j ≤ k and aj ∉ Y for 0 ≤ j ≤ k. If we put Aj = {x ∈ X : a_{j−1} ≤ f(x) < aj}, then X = ⋃_{j=1}^k Aj is a disjoint union. Moreover, µ(∂Aj) = 0 for 1 ≤ j ≤ k because ∂Aj ⊂ f^{-1}(a_{j−1}) ∪ f^{-1}(aj). This implies µ(Aj) = lim_{n→∞} µn(Aj) for 1 ≤ j ≤ k by (iv). Let g : X → R be defined as g = ∑_{j=1}^k aj 1_{Aj}. Then ∫ g dµ = lim_{n→∞} ∫ g dµn since µ(Aj) = lim_{n→∞} µn(Aj). Also, f < g < f + ε, and therefore
(lim sup_{n→∞} ∫ f dµn) − ε ≤ lim sup_{n→∞} ∫ (g − ε) dµn = ∫ (g − ε) dµ ≤ ∫ f dµ ≤ ∫ g dµ = lim inf_{n→∞} ∫ g dµn ≤ lim inf_{n→∞} ∫ (f + ε) dµn = (lim inf_{n→∞} ∫ f dµn) + ε.
Since ε > 0 is arbitrary, we deduce that lim_{n→∞} ∫ f dµn exists and is equal to ∫ f dµ. □

Remark: Let X be a compact metric space. Then C(X, R) = Cb (X, R) is a separable Banach space
w.r.to the supremum norm (for a proof of separability, see the first part of my notes on Topological
Groups). By Alaoglu’s theorem, etc., the closed unit ball Γ of the dual space C(X, R)∗ is compact
and metrizable in the weak* topology (warning: the unit sphere {f ∈ C(X, R) : ∥f ∥ = 1} is not
closed in the weak* topology, and hence is not weak* compact). The collection M (X) of all Borel
probability measures on X can be thought of as a subset of Γ by Riesz representation theorem.

Exercise-42: Let X be a compact metric space and M (X) ⊂ C(X, R)∗ be the collection of all Borel
probability measures on X. Then,
(i) M (X) is compact and metrizable w.r.to the weak* topology on C(X, R)∗ .
(ii) Every sequence (µn ) in M (X) has a subsequence (µnk ) converging weakly to some µ ∈ M (X),
i.e., ∫_X f dµ_{nk} → ∫_X f dµ for every f ∈ C(X, R).
[Hint: (i) Enough to check M(X) is weak* closed, and this is easy: if µn ∈ M(X) and (µn) → µ weakly, then µ(X) = ∫_X 1 dµ = lim_{n→∞} ∫_X 1 dµn = lim_{n→∞} µn(X) = 1. And (ii) follows from (i).]

Remark: Let X be a compact metric space and {fn : n ∈ N} be a countable dense subset of the closed unit ball of C(X, R). Then it may be shown that µ ↦ (∫_X fn dµ)_{n∈N} embeds (M(X), weak) as a closed subset of [−1, 1]^N; this gives another proof that (M(X), weak) is compact and metrizable.

We will now be concerned with the following questions:

Question: Let X be a Polish space and (µn) be a sequence in M(X). When can we say that (µn) has a weakly convergent subsequence? Are there other equivalent formulations of weak convergence?

Exercise-43: Let µ ∈ M (R) and F : R → [0, 1] be the corresponding distribution function defined
as F (x) = µ((−∞, x]) for x ∈ R. Then,
(i) F is continuous from the right, limx→−∞ F (x) = 0 and limx→∞ F (x) = 1.
(ii) x ≤ y ⇒ F (x) ≤ F (y), and consequently, the set of discontinuities of F is at most countable.
(iii) If F is continuous, then F is uniformly continuous.
[Hint: (iii) Given ε > 0, choose M > 0 large so that F (x) < ε/2 for x < −M and F (x) > 1 − ε/2
for x > M . Then choose δ > 0 for ε/2 for the uniformly continuous function F |[−M,M ] .]

[140] [Helly-Bray] Let µn, µ ∈ M(R), let Fn, F : R → [0, 1] be the corresponding distribution functions, and let C = {x ∈ R : F is continuous at x}. Then the following are equivalent:
(i) (µn) → µ weakly, i.e., ∫_R f dµn → ∫_R f dµ for every f ∈ Cb(R, R).
(ii) (µn) → µ in distribution, i.e., (Fn(x)) → F(x) for every x ∈ C.
(iii) There exists a dense subset A ⊂ R such that (Fn(x)) → F(x) for every x ∈ A.
(iv) There exists a dense subset A ⊂ R such that (µn((a, b])) → µ((a, b]) for every a < b in A.

Proof. (i) ⇒ (ii): Fix x ∈ C and let a < x < b. Let f, g ∈ Cb(R, R) be defined by the following conditions: f = 1 on (−∞, a], f = 0 on [x, ∞), and the graph of f is linear on [a, x]; g = 1 on (−∞, x], g = 0 on [b, ∞), and the graph of g is linear on [x, b]. Observe that F(a) = µ((−∞, a]) ≤ ∫ f dµ ≤ ∫ g dµ ≤ µ((−∞, b]) = F(b), and ∫ f dµn ≤ µn((−∞, x]) = Fn(x) ≤ ∫ g dµn. Since ∫ f dµn → ∫ f dµ and ∫ g dµn → ∫ g dµ, we get F(a) ≤ lim inf_{n→∞} Fn(x) ≤ lim sup_{n→∞} Fn(x) ≤ F(b). Letting a ↗ x, b ↘ x and using the continuity of F at x, we see F(x) = lim_{n→∞} Fn(x).

The implication (ii) ⇒ (iii) is trivial, and (iii) ⇒ (iv) is easy.

(iv) ⇒ (i): Let f ∈ Cb(R, R) and ε > 0 be given. Choose a < b in A such that for Y := (a, b] we have µ(R \ Y) < ε. Since (µn(Y)) → µ(Y), we also have (µn(R \ Y)) → µ(R \ Y). Hence
lim sup_{n→∞} |∫_{R\Y} f d(µ − µn)| ≤ lim sup_{n→∞} ∥f∥(µ(R \ Y) + µn(R \ Y)) ≤ 2∥f∥ε. (*)
Since f is uniformly continuous on [a, b], there exist a = y0 < y1 < · · · < yk = b in A such that |f(x) − f(yj)| < ε for x ∈ Yj := (y_{j−1}, yj]. Let g = ∑_{j=1}^k f(yj) 1_{Yj}, and note that |∫_Y g d(µ − µn)| = |∑_{j=1}^k f(yj)(µ(Yj) − µn(Yj))| ≤ ∑_{j=1}^k ∥f∥ |µ(Yj) − µn(Yj)|. Also observe that |∫_Y (f − g) dµn| ≤ ∑_{j=1}^k ∫_{Yj} |f − g| dµn ≤ ∑_{j=1}^k ε µn(Yj) = ε µn(Y) ≤ ε, and similarly |∫_Y (f − g) dµ| ≤ ε. Therefore, |∫_Y f d(µ − µn)| ≤ |∫_Y (f − g) dµ| + |∫_Y (f − g) dµn| + |∫_Y g d(µ − µn)| ≤ ε + ε + ∑_{j=1}^k ∥f∥ |µ(Yj) − µn(Yj)|. Since (µn(Yj)) → µ(Yj), we conclude that lim sup_{n→∞} |∫_Y f d(µ − µn)| ≤ 2ε. (**)
From (*) and (**), we obtain that lim sup_{n→∞} |∫_R f d(µ − µn)| ≤ 2∥f∥ε + 2ε. □
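As an illustration of convergence in distribution in the sense of [140](ii), the sketch below (Python, numpy assumed; it presupposes the central limit theorem, which is not proved in these notes, and the sample sizes are arbitrary) compares the distribution function of a standardized sum of fair coin tosses, estimated by Monte Carlo, with the standard normal distribution function at a few continuity points:

    import math
    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 400, 10**5
    sums = rng.binomial(n, 0.5, size=reps)      # sums of n fair coin tosses
    z = (sums - n/2) / math.sqrt(n/4)           # standardized sums; their law is mu_n

    def Phi(x):                                 # standard normal distribution function F
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    for x in (-1.0, 0.0, 0.5, 1.5):
        print(x, round(float(np.mean(z <= x)), 3), round(Phi(x), 3))   # F_n(x) vs F(x) nearly agree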

[141] Let X be a Polish space. Suppose that a sequence (µn) in M(X) is uniformly tight in the following sense: for every ε > 0, there is a compact subset K ⊂ X with µn(K) > 1 − ε for every n ∈ N. Then there exist µ ∈ M(X) and a subsequence (µ_{nj}) of (µn) such that (µ_{nj}) → µ weakly.

Proof. Step-1: For m ∈ N, let Km ⊂ X be compact with µn(Km) > 1 − 1/m for every n ∈ N. Since Cb(Km, R) = C(Km, R) is separable, we may choose a countable dense subset Γm ⊂ C(Km, R) for each m ∈ N. Now, ((∫_{Km} g dµn)_{m∈N, g∈Γm})_{n=1}^∞ is a sequence in the compact metric space ∏_{m∈N} ∏_{g∈Γm} [min g, max g], and hence has a convergent subsequence. That is, there is (nj) such that (∫_{Km} g dµ_{nj})_{j=1}^∞ converges in R for each m ∈ N and each g ∈ Γm.
Step-2: We claim that (∫_X f dµ_{nj}) converges for every f ∈ Cb(X, R). Consider f ∈ Cb(X, R); it suffices to show (∫_X f dµ_{nj}) is Cauchy. The idea is to choose Km approximating X by uniform tightness and then show (∫_{Km} f dµ_{nj}) is Cauchy. Given ε > 0, let m ∈ N be large with ∥f∥/m < ε/5 and let g ∈ Γm be with sup_{Km} |f − g| < ε/5. Let j0 ∈ N be such that |∫_{Km} g d(µ_{nk} − µ_{nj})| < ε/5 for every k > j ≥ j0. Then for k > j ≥ j0, we have
|∫_{Km} f d(µ_{nk} − µ_{nj})| ≤ ∫_{Km} |f − g| dµ_{nk} + |∫_{Km} g d(µ_{nk} − µ_{nj})| + ∫_{Km} |g − f| dµ_{nj} < 3ε/5,
and hence
|∫_X f d(µ_{nk} − µ_{nj})| ≤ 3ε/5 + ∫_{X\Km} |f| d(µ_{nk} + µ_{nj}) < 3ε/5 + 2∥f∥/m < 3ε/5 + 2ε/5 = ε.
This shows that (∫_X f dµ_{nj}) is Cauchy, proving our claim.

Step-3: (Sketch) Define ϕ : Cb(X, R) → R as ϕ(f) = lim_{j→∞} ∫_X f dµ_{nj}. Then ϕ is linear and ϕ(f) ≥ 0 whenever f ≥ 0. Moreover, if (fm) ↘ 0 pointwise, then it can be shown that ϕ(fm) ↘ 0. Therefore, by a theorem of Stone-Daniell (similar to the Riesz representation theorem), there is a Borel measure µ ≥ 0 on X with ϕ(f) = ∫_X f dµ (see p.294 of Dudley, Real Analysis and Probability). Considering f = 1, we see µ(X) = 1. And (µ_{nj}) → µ weakly by the definition of ϕ. □

In fact, Theorem 11.3.3 and Theorem 11.5.4 of Dudley, Real Analysis and Probability give more
information, which we state as [141′ ] below without proof.

Definition: Let X be a Polish space. Let Lb(X, R) = {f ∈ Cb(X, R) : f is Lipschitz}, and ∥f∥_L = ∥f∥_∞ + inf{λ > 0 : λ is a Lipschitz constant for f} for f ∈ Lb(X, R). For µ1, µ2 ∈ M(X), define D_L(µ1, µ2) = sup{|∫_X f d(µ1 − µ2)| : f ∈ Lb(X, R) and ∥f∥_L ≤ 1}. Also define D(µ1, µ2) = inf{ε > 0 : µ1(A) ≤ µ2(B(A, ε)) + ε ∀ Borel A ⊂ X}, where B(A, ε) = {x ∈ X : dist(x, A) < ε}.

[141′ ] Let X be a Polish space. Then D, DL are metrics on M (X), and (µn ) → µ weakly in M (X)
⇔ D(µn , µ) → 0 ⇔ DL (µn , µ) → 0. Moreover, for Γ ⊂ M (X), the following are equivalent:
(i) Γ is uniformly tight.
(ii) Every sequence in Γ has a subsequence converging weakly to some µ ∈ M (X).
(iii) The closure of Γ is compact in M (X) with respect to the metric D (or DL ).
(iv) Γ is totally bounded in M (X) with respect to the metric D (or DL ).

Remark: Since the elements of a Cauchy sequence form a totally bounded set, it follows from [141′ ]
that (M (X), D) and (M (X), DL ) are complete metric spaces when X is a Polish space.

Exercise-44: Let X be a Polish space, ε > 0 and µ ∈ M(X). Then,
(i) X has a finite Borel partition X = ⋃_{j=0}^k Aj with µ(∂Aj) = 0 for 0 ≤ j ≤ k such that µ(A0) < ε and diam(Aj) < ε for 1 ≤ j ≤ k (i.e., A0 has small measure and A1, . . . , Ak have small diameter).
(ii) There is a finitely supported measure ν ∈ M(X), i.e., ν(X \ F) = 0 for some finite set F ⊂ X, with D(µ, ν) ≤ ε, where D is the metric in [141′].
[Hint: (i) Let {xn : n ∈ N} be dense in X. Since (ε/4, ε/2) is uncountable, for each n there is δn ∈ (ε/4, ε/2) such that for Bn := B(xn, δn) we have µ(∂Bn) = 0. Since X = ⋃_{n=1}^∞ Bn, there is k ∈ N such that A0 := X \ ⋃_{n=1}^k Bn satisfies µ(A0) < ε. Let A1 = B1 and A_{j+1} = B_{j+1} \ ⋃_{n=1}^j Bn for 1 ≤ j < k. (ii) Let the Aj be as above, pick yj ∈ Aj for 0 ≤ j ≤ k and define ν = ∑_{j=0}^k µ(Aj) µ_{yj}, where µ_{yj} is the Dirac measure at yj. Then ν(Aj) = µ(Aj) for 0 ≤ j ≤ k. Given Y ∈ B(X), let J = {1 ≤ j ≤ k : Aj ∩ Y ≠ ∅}. Then ∑_{j∈J} µ(Aj) = ∑_{j∈J} ν(Aj) = ν(⋃_{j∈J} Aj), and ⋃_{j∈J} Aj ⊂ B(Y, ε). Hence µ(Y) ≤ µ(A0) + ∑_{j∈J} µ(Aj) < ε + ν(B(Y, ε)). Thus D(µ, ν) ≤ ε.]

Remark: (i) Finitely supported probability measures are convex combinations of Dirac measures.
(ii) Let Y be a countable dense subset of a Polish space (or just a separable metric space) X. It can
be deduced using Exercise-44(ii) that the set of all finitely supported ν ∈ M (X) with supp(ν) ⊂ Y
and ν taking values in Q is a countable dense subset of (M (X), D). Thus (M (X), D) is separable.
14 T.K.SUBRAHMONIAN MOOTHATHU

4. Conditional expectation

You must have studied the notion of conditional probability in elementary classes with the
expression P (B)P (A|B) = P (A ∩ B), where P (A|B) is the conditional probability of the event A
given event B. We will consider a generalization of this to the measure theoretic setting.

Definition: Let (X, A, µ) be a probability space and f ∈ L1(X, A, µ).
(i) If B ∈ A is with µ(B) > 0, we define the conditional expectation E(f|B) of f given B as E(f|B) = (∫_B f dµ)/µ(B); and we put E(f|B) = 0 if µ(B) = 0. Taking f = 1_A for A ∈ A, we recover the definition of conditional probability since ∫_B 1_A dµ = ∫_{A∩B} 1 dµ = µ(A ∩ B).
(ii) More importantly, if B ⊂ A is a sub σ-algebra of A, then ν(B) := ∫_B f dµ for B ∈ B defines a signed measure on (X, B) absolutely continuous w.r.to µ, and therefore by the Radon-Nikodym theorem there is h ∈ L1(X, B, µ) such that ν(B) = ∫_B h dµ for every B ∈ B. Moreover, this h is unique in the following sense: if h′ also satisfies the same property, then h(x) = h′(x) for µ-almost every x ∈ X. We define the conditional expectation E(f|B) of f given B as E(f|B) = h ∈ L1(X, B, µ). In other words, E(f|B) is defined as the unique h ∈ L1(X, B, µ) satisfying ∫_B h dµ = ∫_B f dµ for every B ∈ B, and the existence of h is guaranteed by the Radon-Nikodym theorem.

(iii) Let f ∈ L1 (X, A, µ) and g : X → R be a random variable. Then B := g −1 (B(R)) is a sub


σ-algebra of A. We define E(f |g) = E(f |B).

Remark: Note that E(f |B) is not a real number but an integrable function. Why is it so? Recall
that in Multivariable Calculus the derivative of a differentiable function at a point is not a real
number but a linear map: this linear map incorporates information about directional derivatives in
all possible directions. Similarly, when we write E(f |B) = h, it has to be observed that the function
h incorporates information about the conditional expectations E(f |B) for all B ∈ B. If E(f |B) = h,
then for every B ∈ B we have ∫_B f dµ = ∫_B h dµ, and in particular E(f|B) = (∫_B h dµ)/µ(B).

Example: (i) Let X = [0, 1], A = B(X), µ be the Lebesgue measure, f : X → R be f(x) = x, and B = {∅, [0, 1/3], (1/3, 1], X}. Note that f is not B-measurable since [0, 1/2] = f^{-1}([0, 1/2]) ∉ B. To compute E(f|B), observe that ∫_0^{1/3} f dµ = 1/18 and ∫_{1/3}^1 f dµ = 4/9. Let h : X → R be h = (1/6)1_{[0,1/3]} + (2/3)1_{(1/3,1]}, which belongs to L1(X, B, µ), and we have ∫_B h dµ = ∫_B f dµ for every B ∈ B. Hence E(f|B) = h. In particular, we cannot expect E(f|B) to be continuous (even after modifying on a null set) even when f is continuous. (ii) More generally, let (X, A, µ) be a probability space, and B ⊂ A be the sub σ-algebra generated by a finite measurable partition X = ⋃_{j=1}^k Aj with µ(Aj) > 0, i.e., B is the smallest sub σ-algebra of A containing {A1, · · · , Ak}. For f ∈ L1(X, A, µ), if we put aj = (∫_{Aj} f dµ)/µ(Aj), then it may be checked that E(f|B) = ∑_{j=1}^k aj 1_{Aj}.
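The finite-partition case in (ii) is easy to compute numerically. The sketch below (Python, numpy assumed; the partition of [0, 1] into three intervals and the choice f(x) = x are our own illustrative choices) approximates E(f|B) by averaging f over each cell of the partition:

    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.uniform(0.0, 1.0, size=10**6)     # samples from Lebesgue measure on [0,1]
    f = x                                     # f(x) = x
    cells = np.digitize(x, [1/3, 2/3])        # cells A1=[0,1/3], A2=(1/3,2/3], A3=(2/3,1]

    # E(f|B) is constant on each cell, equal to the cell average of f
    cond_exp = np.array([f[cells == j].mean() for j in range(3)])
    print(cond_exp)   # approximately [1/6, 1/2, 5/6]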
Conditional expectation behaves in many cases like ordinary expectation. But it should be noted
that since h = E(f |B) is defined uniquely only µ-almost everywhere, the properties of conditional
expectation that we state as [142], [143], and [144] below, should be read ‘µ-almost everywhere’.

[142] [Properties of conditional expectation - I] Let (X, A, µ) be a probability space, B ⊂ A be a


sub σ-algebra, and f, f1 , f2 ∈ L1 (X, A, µ). Then,
(i) E(E(f |B)) = Ef . If f ∈ L1 (X, B, µ), then E(f |B) = f . In particular, E(f |A) = f .
(ii) [Linearity] E(af1 + bf2 |B) = aE(f1 |B) + bE(f2 |B) for a, b ∈ R.
(iii) [Positivity] E(f |B) ≥ 0 when f ≥ 0. By linearity, E(f1 |B) ≤ E(f2 |B) when f1 ≤ f2 .
(iv) |E(f |B)| ≤ E(|f | | B).
(v) [Tower property] If C ⊂ B is a sub σ-algebra, then E(E(f |B)|C) = E(f |C).
(vi) [Independence] If f is independent with B, i.e., if µ(f −1 (C) ∩ B) = µ(f −1 (C))µ(B) for every
C ∈ B(R) and B ∈ B, then E(f |B) ≡ Ef µ-a.e. In particular, E(f |{∅, X}) ≡ Ef .

Proof. Properties (i) and (ii) follow essentially from the definition of conditional expectation.
(iii) Let h = E(f|B) and A = {x ∈ X : h(x) < 0}. Write A = ⋃_{n=1}^∞ An, where An = {x ∈ X : h(x) < −1/n}, and note A, An ∈ B. If h ≥ 0 µ-a.e. is false, then µ(A) > 0 and then µ(An) > 0 for some n ∈ N. Then ∫_{An} h dµ ≤ −µ(An)/n < 0 ≤ ∫_{An} f dµ, a contradiction since An ∈ B.
(iv) Since ±f ≤ |f|, we have ±E(f|B) = E(±f|B) ≤ E(|f| | B) by (ii) and (iii).
(v) Let h = E(E(f|B)|C). If C ∈ C ⊂ B, then ∫_C h dµ = ∫_C E(f|B) dµ = ∫_C f dµ.
(vi) By hypothesis, 1_B is independent with f for B ∈ B and hence E(1_B f) = E(1_B)Ef = µ(B)Ef by Exercise-32. Therefore, ∫_B f dµ = ∫_X 1_B f dµ = E(1_B f) = µ(B)Ef = ∫_B Ef dµ for B ∈ B. □

[143] [Properties of conditional expectation - II] Let (X, A, µ) be a probability space, B ⊂ A be a


sub σ-algebra, and f, fn ∈ L1 (X, A, µ) for n ∈ N.
(i) [Monotone convergence theorem] If 0 ≤ fn ↗ f pointwise µ-a.e., then E(fn |B) ↗ E(f |B)
pointwise µ-a.e.
(ii) [Fatou's lemma] If fn ≥ 0 and f := lim inf_{n→∞} fn, then E(f|B) ≤ lim inf_{n→∞} E(fn|B).
(iii) [Lebesgue dominated convergence theorem] If (fn) → f pointwise µ-a.e., and |fn| ≤ g for some g ∈ L1(X, A, µ), then E(fn|B) → E(f|B) pointwise µ-a.e.
(iv) [Product with a random variable] If g : (X, B) → R is a random variable with gf ∈ L1 (X, A, µ),
then E(gf |B) = gE(f |B).

Proof. (i) Let hn = E(fn|B). Since 0 ≤ hn ≤ h_{n+1} ≤ E(f|B) (and this we may assume everywhere after modification on a set of measure 0), there is h : X → [0, ∞) with h(x) = lim_{n→∞} hn(x) for every x ∈ X. Clearly h is B-measurable since the hn's are. Consider B ∈ B. By the ordinary Monotone convergence theorem applied to 1_B fn ↗ 1_B f and 1_B hn ↗ 1_B h, we obtain ∫_B f dµ = lim_{n→∞} ∫_B fn dµ and ∫_B h dµ = lim_{n→∞} ∫_B hn dµ. But ∫_B fn dµ = ∫_B hn dµ since hn = E(fn|B), and thus ∫_B f dµ = ∫_B h dµ. By taking B = X, we see h ∈ L1(X, B, µ). Hence h = E(f|B) µ-a.e.
(ii) Let gm := inf_{n≥m} fn. Note that 0 ≤ gm ↗ f and gm ≤ fm. Hence by (i) we have E(f|B) = lim_{m→∞} E(gm|B) ≤ lim inf_{m→∞} E(fm|B).
(iii) Apply (ii) to g ± fn ≥ 0. Then E(g − f|B) = E(lim inf(g − fn)|B) ≤ lim inf E(g − fn|B) = E(g|B) − lim sup E(fn|B). Removing E(g|B) from both sides we get −E(f|B) ≤ −lim sup E(fn|B), and hence E(f|B) ≥ lim sup E(fn|B). Similarly, E(g + f|B) = E(lim inf(g + fn)|B) ≤ lim inf E(g + fn|B) = E(g|B) + lim inf E(fn|B), and thus E(f|B) ≤ lim inf E(fn|B) as well.
(iv) By linearity, we may assume g ≥ 0 and f ≥ 0. If g = 1_C for some C ∈ B, then for any B ∈ B we have ∫_B 1_C E(f|B) dµ = ∫_{B∩C} E(f|B) dµ = ∫_{B∩C} f dµ = ∫_B 1_C f dµ, and hence gE(f|B) = E(gf|B) in this case. By linearity, gE(f|B) = E(gf|B) holds for all simple functions g ≥ 0. For a general g ≥ 0, find simple functions gn with 0 ≤ gn ↗ g. Then gn E(f|B) ↗ gE(f|B). By what is already proved, and applying (i) to 0 ≤ gn f ↗ gf, we also deduce gn E(f|B) = E(gn f|B) ↗ E(gf|B). □

[144] [Jensen’s inequality] Let (X, A, µ) be a probability space, B ⊂ A be a sub σ-algebra, f ∈


L1 (X, A, µ), and g : R → R be a convex function with g ◦ f ∈ L1 (X, A, µ). Then g(E(f |B)) ≤
E(g ◦ f |B). In particular, (taking B = A) we have g(Ef ) ≤ E(g ◦ f ).

Proof. Let Γ be the collection of all affine functions h : R → R, h(x) = ax + b, with h ≤ g. The
geometric observation is that g(x) = sup{h(x) : h ∈ Γ} for each x ∈ R since g is convex. Consider
h ∈ Γ. By linearity and the fact h ◦ f ≤ g ◦ f , we get h(E(f |B)) = E(h ◦ f |B) ≤ E(g ◦ f |B) by
[142]. Taking supremum over h ∈ Γ yields g(E(f |B)) ≤ E(g ◦ f |B). 

Remark: Since x 7→ x2 is convex, we have (E(f |B))2 ≤ E(f 2 |B) by [144]. Also, [144] remains true
if f (X) ⊂ J for some interval J ⊂ R and g : J → R is convex. Therefore, (E(|f | | B))p ≤ E(|f |p |B)
for 1 ≤ p < ∞ as x 7→ xp is convex on [0, ∞) for 1 ≤ p < ∞.

Exercise-45: [A characterization of conditional expectation of L2 -functions] Let (X, A, µ) be a


probability space, f ∈ L2 (X, A, µ), and B ⊂ A be a sub σ-algebra. Then,
(i) Γ := {g ∈ L2 (X, A, µ) : g is B-measurable} is a closed vector subspace of L2 (X, A, µ).
(ii) If h = E(f |B), then ∥f − h∥2 ≤ ∥f − g∥2 for every g ∈ Γ. In other words, E(f |B) is the
orthogonal projection of f onto the subspace Γ.
[Hint: (ii) ∥f − g∥_2^2 = E(f − g)^2 = E(f − h + h − g)^2 = E(f − h)^2 + E(h − g)^2 + 2s, where s = E((f − h)(h − g)) = E(E((f − h)(h − g)|B)) = E((h − g)E((f − h)|B)) by [142](i) and [143](iv), since h − g is B-measurable. Moreover, E(f − h|B) = E(f|B) − h = 0, and thus s = 0.]
TOPICS FROM ANALYSIS - PART 3/3 17

5. Martingales

Martingales are special sequences of random variables (whose historical origin is from gambling)
enjoying a toolkit of nice properties including convergence under mild assumptions.

Definition: Let (X, A, µ) be a probability space. A sequence (An)_{n=0}^∞ of sub σ-algebras of A is called a filtration if An ⊂ A_{n+1} for every n ≥ 0. We say {fn, An}_{n=0}^∞ is a gambling sequence (or, officially, a stochastic sequence) if (An)_{n=0}^∞ is a filtration on (X, A, µ) and the fn : (X, An) → R are random variables. In this case we also say (fn)_{n=0}^∞ is adapted to the filtration (An)_{n=0}^∞. The terminology gambling sequence is motivated by the following: imagine a gambler playing at a casino; then fn can be thought of as the fortune of the gambler after the nth game, An is the collection of events known to have occurred or not in the gambling up to the nth game, and E(f_{n+1}|An) is the expected fortune of the gambler at the (n + 1)th game having known the outcomes of the first n games.

Question: Does the knowledge of An (events up to the nth game) help the gambler to increase the
expected value of fn+1 , his fortune after the (n + 1)th game? i.e., is fn ≤ E(fn+1 |An )?

Definition: A gambling sequence {fn, An}_{n=0}^∞ on a probability space (X, A, µ) with fn ∈ L1(X, A, µ) for every n ≥ 0 is said to be a:
(i) martingale if fn = E(f_{n+1}|An) µ-a.e. (neutral game)
(ii) submartingale if fn ≤ E(f_{n+1}|An) µ-a.e. (game favors the gambler)
(iii) supermartingale if fn ≥ E(f_{n+1}|An) µ-a.e. (game favors the casino)
for every n ≥ 0.

Remark: (i) If fn is An-measurable, then E(fn|An) = fn by [142](i). Hence a gambling sequence {fn, An}_{n=0}^∞ with fn ∈ L1 is a martingale iff E(f_{n+1} − fn|An) = 0; is a submartingale iff E(f_{n+1} − fn|An) ≥ 0; and is a supermartingale iff E(f_{n+1} − fn|An) ≤ 0 for every n ≥ 0. (ii) {fn, An}_{n=0}^∞ is a submartingale iff {−fn, An}_{n=0}^∞ is a supermartingale.

Exercise-46: Let {fn, An}_{n=0}^∞ be a martingale. Then,
(i) Ef0 = Efn for every n ≥ 0, i.e., the expectation is constant.
(ii) fn = E(fm|An) for every 0 ≤ n ≤ m.
(iii) If g : R → R is convex (concave) with g ◦ fn ∈ L1(X, An, µ), then {g ◦ fn, An}_{n=0}^∞ is a submartingale (supermartingale).
[Hint: (i) & (ii): Efn = E(E(fn+1 |An )) = Efn+1 , and fn = E(fn+1 |An ) = E(E(fn+2 |An+1 )|An ) =
E(fn+2 |An ) = · · · by [142](i) and [142](v). (iii) g ◦ fn = g(E(fn+1 |An )) ≤ E(g ◦ fn+1 |An ) by [144].]

Examples: Let (An)_{n=0}^∞ be a filtration on a probability space (X, A, µ), and f ∈ L1(X, A, µ). Define fn = E(f|An), which is An-measurable. Now E(f_{n+1}|An) = E(E(f|A_{n+1})|An) = E(f|An) = fn by [142](v). Therefore {fn, An}_{n=0}^∞ is a martingale. A partial converse is given below.
18 T.K.SUBRAHMONIAN MOOTHATHU

Exercise-47: Let 1 < p < ∞, and let {fn, An}_{n=0}^∞ be a martingale on a probability space (X, A, µ) such that (fn)_{n=0}^∞ is a bounded sequence in Lp(X, A, µ). Then there is f ∈ Lp(X, A, µ) with fn = E(f|An) for every n ≥ 0. [Hint: Since Lp is reflexive and separable for 1 < p < ∞, the bounded sequence (fn) has a weakly convergent subsequence (f_{nk}) → f ∈ Lp by Alaoglu's theorem, etc. Fix n ∈ N, and consider A ∈ An. Then 1_A ∈ Lq = (Lp)*, where 1/p + 1/q = 1. By weak convergence, ∫_A f dµ = ⟨f, 1_A⟩ = lim_{k→∞} ⟨f_{nk}, 1_A⟩. For nk > n, we have E(f_{nk}|An) = fn by Exercise-46(ii). Hence ⟨f_{nk}, 1_A⟩ = ∫_A f_{nk} dµ = ∫_A fn dµ. Thus ∫_A f dµ = ∫_A fn dµ, giving fn = E(f|An).]

Example: Let gn : X → R be an L1-sequence of independent random variables on a probability space (X, A, µ), and assume there is c ∈ R with Egn = c for every n ≥ 0. Let An = σ(g0, g1, . . . , gn), the smallest sub σ-algebra of A with respect to which g0, g1, . . . , gn are measurable. Then (An)_{n=0}^∞ is a filtration, called the natural filtration of (gn)_{n=0}^∞. Let fn = ∑_{j=0}^n gj. Then fn ∈ L1(X, A, µ), and (fn) is adapted to the natural filtration (An)_{n=0}^∞. Observe that E(f_{n+1} − fn|An) = E(g_{n+1}|An) ≡ Eg_{n+1} = c by [142](vi) since g_{n+1} is independent with An. Therefore {fn, An}_{n=0}^∞ is a martingale or submartingale or supermartingale according to whether c = 0 or c ≥ 0 or c ≤ 0 respectively.

Remark: From the Example above we may extract a general method of constructing martingales as follows. Let (hn)_{n=0}^∞ be an L1-sequence of independent random variables on a probability space (X, A, µ). Let gn = hn − Ehn (then Egn = 0 and the gn's are still independent), An = σ(g0, g1, · · · , gn) = σ(h0, h1, · · · , hn), and fn = ∑_{j=0}^n gj. Then {fn, An}_{n=0}^∞ is a martingale.
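The construction above can be checked exactly on a finite model: with hn ∈ {0, 1} fair coin tosses and gn = hn − 1/2, enumerate all equally likely coin paths and verify that the conditional expectation of f_{n+1} given the first coordinates equals fn on every atom of An. A Python sketch (the small horizon N = 6 is chosen only to keep the enumeration cheap):

    from itertools import product

    N = 6
    paths = list(product([0, 1], repeat=N))          # all coin paths, each of probability 2^{-N}

    def f(path, n):                                  # f_n = sum_{j<=n} (h_j - 1/2)
        return sum(h - 0.5 for h in path[: n + 1])

    for n in range(N - 1):
        # group paths by their first n+1 coordinates (this is conditioning on A_n)
        for prefix in product([0, 1], repeat=n + 1):
            group = [p for p in paths if p[: n + 1] == prefix]
            cond = sum(f(p, n + 1) for p in group) / len(group)   # E(f_{n+1} | A_n) on this atom
            assert abs(cond - f(group[0], n)) < 1e-12             # equals f_n: martingale property
    print("martingale property verified on all atoms")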

Exercise-48: Let (fn)_{n=0}^∞, (gn)_{n=0}^∞ be two sequences of random variables on a probability space (X, A, µ) and assume each gn is bounded. Define h0 = f0 and hn = f0 + ∑_{j=1}^n gj(fj − f_{j−1}) for n ∈ N. Here, (hn) may be called the discrete integral of (gn) w.r.to (fn). Let (An)_{n=0}^∞ be a filtration on (X, A, µ) and assume fn and g_{n+1} (not gn) are An-measurable for n ≥ 0.
(i) If {fn, An}_{n=0}^∞ is a martingale, then {hn, An}_{n=0}^∞ is a martingale.
(ii) If {fn, An}_{n=0}^∞ is a sub(super)martingale and gn ≥ 0, then {hn, An}_{n=0}^∞ is a sub(super)martingale.
[Hint: Evidently, hn is An-measurable, and hn ∈ L1 since fn ∈ L1 and the gn's are bounded. Now E(h_{n+1} − hn|An) = E(g_{n+1}(f_{n+1} − fn)|An) = g_{n+1} E(f_{n+1} − fn|An) by [143](iv).]

Many results about martingales have corresponding versions for submartingales/supermartingales


as well. This is not so surprising in view of the following decomposition theorem.

[145] [Doob decomposition] Let {fn, An}_{n=0}^∞ be a submartingale on a probability space (X, A, µ). Then we may write fn = gn + hn, where
(i) {gn, An}_{n=0}^∞ is a martingale.
(ii) hn ≤ h_{n+1} µ-a.e. for n ≥ 0.
(iii) h_{n+1} is An-measurable for n ≥ 0.
Further, the decomposition fn = gn + hn is unique if we set h0 = 0.
TOPICS FROM ANALYSIS - PART 3/3 19

Proof. First we show uniqueness. Suppose fn = gn + hn . We have E(gn+1 |An ) = gn by (i), and
E(hn+1 |An ) = hn+1 by (iii). Hence E(fn+1 − fn |An ) = E(fn+1 |An ) − fn = E(gn+1 + hn+1 |An ) −
(gn + hn ) = hn+1 − hn . This determines hn ’s uniquely since h0 = 0, and then gn ’s are also uniquely
determined. Now we prove the existence of decomposition. Starting with h0 = 0, we may define
hn ’s inductively using the requirement hn+1 − hn = E(fn+1 − fn |An ), which ensures (iii). We have
hn+1 ≥ hn since E(fn+1 − fn |An ) ≥ 0 by the submartingale property of fn . Letting gn = fn − hn ,
we may verify (i) as well by checking E(gn+1 − gn |An ) = 0. 
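A concrete instance of the decomposition (our own illustrative example): if Sn is a simple symmetric random walk, then {Sn^2} is a submartingale for its natural filtration, E(S^2_{n+1} − S^2_n | An) = 1, so hn = n and gn = S^2_n − n is the martingale part. The sketch below (Python; exact enumeration over a short horizon, chosen only to keep it cheap) checks the compensator increment on every atom:

    from itertools import product

    N = 6
    paths = list(product([-1, 1], repeat=N))               # equally likely +/-1 increments

    def S(path, n):                                        # S_n = sum of the first n increments
        return sum(path[:n])

    for n in range(N - 1):
        for prefix in product([-1, 1], repeat=n):          # an atom of A_n
            group = [p for p in paths if p[:n] == prefix]
            incr = sum(S(p, n + 1)**2 - S(p, n)**2 for p in group) / len(group)
            assert abs(incr - 1.0) < 1e-12                 # E(f_{n+1} - f_n | A_n) = 1 = h_{n+1} - h_n
    print("compensator h_n = n, so g_n = S_n^2 - n is a martingale")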

Remark: For a supermartingale {fn, An}_{n=0}^∞, we similarly get fn = gn + hn with hn ≥ h_{n+1}.

Imagine a game in which the gambler stops playing when a particular favorable event happens.
The theory given below leading up to [146] says that this stopping strategy does not affect the
essential nature (martingale/submartingale/supermartingale) of the gambler’s expected fortune.

Definition: Let (X, A, µ) be a probability space. A random variable t : X → N ∪ {0, ∞} is called a stopping time with respect to a filtration (An)_{n=0}^∞ if {x ∈ X : t(x) ≤ n} ∈ An for every n ∈ N ∪ {0}, and this is equivalent to saying {x ∈ X : t(x) = n} ∈ An for every n ∈ N ∪ {0} (verify). Intuitively, the event {x ∈ X : t(x) ≤ n} happens within the first n units of time, and the event {x ∈ X : t(x) = ∞} never happens. A stopping time t is said to be bounded if there is M ∈ N such that µ({x ∈ X : t(x) ≤ M}) = 1, and is said to be finite if µ({x ∈ X : t(x) < ∞}) = 1. Clearly, a bounded stopping time is finite but the converse is not true; see the Example below.

Example: Consider X = [0, 1] with the Lebesgue measure. Let fn : X → R be fn (x) = xn and
An = σ(f0 , f1 , . . . , fn ) for n ≥ 0. Fix c ∈ (0, 1), and define t : [0, 1] → N ∪ {0, ∞} as t(x) = min{n ∈
N : fn (x) ≤ c} and t(x) = ∞ if no such n exists. Now, t is finite since {x ∈ [0, 1] : t(x) < ∞} = [0, 1),
but t is not bounded since {x ∈ [0, 1] : t(x) ≤ M } = [0, c1/M ] whose Lebesgue measure is < 1.

Definition: Let {fn, An}_{n=0}^∞ be a gambling sequence on a probability space (X, A, µ) and let t : X → N ∪ {0, ∞} be a stopping time w.r.to (An). Let f_{t∧n} : X → R be f_{t∧n}(x) = f_{min{t(x),n}}(x) for n ≥ 0.
For example, if t ≡ M , then (ft∧n ) = (f0 , f1 , . . . , fM −1 , fM , fM , fM , . . .), which corresponds to a
game stopped at time M . If t is finite, let ft : X → R as ft (x) = ft(x) (x), which is defined µ-a.e.
since t < ∞ µ-a.e.. Intuitively, ft carries information about the gambler’s fortune under optional
stopping of a game. Also define At = {A ∈ A : A ∩ {x ∈ X : t(x) ≤ n} ∈ An ∀ n ≥ 0}.

Exercise-49: As per the above Definition, we have:


(i) ft∧n is An -measurable for every n ≥ 0.
(ii) At is a sub σ-algebra of A, and t, ft are At -measurable.
(iii) If t is finite, then ft (x) = ft(x) (x) = ft∧n (x) for all large n ∈ N for µ-a.e. x ∈ X.
(iv) If t is bounded and fn ∈ L1 for every n ≥ 0, then ft∧n , ft ∈ L1 .
[Hint: (i) & (ii): Let Aj = {x ∈ X : t(x) = j} ∈ Aj , En = {x ∈ X : t(x) ≤ n} = nj=0 Aj ∈ An ,
−1 ∪ ∪
and B ∈ B(R). Then ft∧n (B) = [ nj=0 (Aj ∩ fj−1 (B))] [Enc ∩ fn−1 (B)] ∈ An . Since En ∩ t−1 (B) =
∪ −1 −1 ∪n −1
j∈B: 0≤j≤n Aj ∈ An , we have t (B) ∈ At . Since En ∩ft (B) = j=0 [Aj ∩fj (B)] ∈ An , we have
∫ ∑ ∫ ∑M
ft−1 (B) ∈ At . (iv) If t is bounded by M ∈ N, then X |ft |dµ = M j=0 Aj |fj |dµ ≤ j=0 ∥fj ∥ < ∞.]

Example: Let X = [0, 1] with Lebesgue measure µ. Let fn : X → R be fn (x) = x/n and
An = σ(f1 , . . . , fn ) for n ∈ N. Define t : [0, 1] → N ∪ {0, ∞} as t(x) = min{n ∈ N : fn (x) ≤ 1/3},
which is a bounded stopping time w.r.to (An ) since t ≤ 3. Here ft : X → R is given by ft (x) = f1 (x)
for x ∈ [0, 1/3]; ft (x) = f2 (x) for x ∈ (1/3, 2/3]; and ft (x) = f3 (x) for x ∈ (2/3, 1]. Also, ft∧1 = f1 ;
ft∧2 (x) = f1 (x) for x ∈ [0, 1/3] and ft∧2 (x) = f2 (x) for x ∈ (1/3, 1]; and ft∧n = ft for n ≥ 3.

Example: Let X = [0, 1] with Lebesgue measure µ. Let In = [0, 2^{-n}] and fn : X → R be fn = 2^n 1_{In} for n ≥ 0. We have fn ∈ L1(X, A, µ) with ∥fn∥_1 = Efn = 1. Let Cn be the collection of subintervals J ⊂ [0, 1] having end points of the form k/2^n, and let An = σ(Cn). Then fn is An-measurable. Consider J ∈ Cn. Then ∫_J fn dµ = ∫_J f_{n+1} dµ = 1 or 0 depending on whether inf J = 0 or inf J ≥ 2^{-n}. It follows that ∫_A fn dµ = ∫_A f_{n+1} dµ for every A ∈ An. Therefore, E(f_{n+1}|An) = fn, i.e., {fn, An}_{n=0}^∞ is a martingale. Note that (fn(x))_{n=0}^∞ is eventually 0 for each x ∈ (0, 1]. Hence t : X → N ∪ {0, ∞} given by t(0) = ∞ and t(x) = min{n ∈ N : fn(x) = 0} for x ∈ (0, 1], is a finite stopping time. And ft : X → R satisfies ft ≡ 0 on (0, 1], giving Eft = 0 ≠ 1 = Efn for every n ≥ 0. This shows the necessity of the boundedness assumption on t in [146](iii) below.

The result below says that under fairly general conditions, an optional stopping does not really
change the nature of a game; for simplicity we state it only for martingales.

[146] [Doob's optional stopping time theorem] Let {fn, An}_{n=0}^∞ be a martingale on a probability space (X, A, µ), and let t be a stopping time w.r.to (An). Then,
(i) {f_{t∧n}, An}_{n=0}^∞ is a martingale.
(ii) If t is finite and there is M > 0 with |f_{t∧n}| ≤ M µ-a.e. for each n ≥ 0, then Eft = Ef0.
(iii) If t is bounded, say t ≤ M ∈ N, then E(fM|At) = ft µ-a.e. Consequently Eft = EfM = Ef0.
(iv) If s ≤ t are bounded stopping times w.r.to (An), then E(ft|As) = fs µ-a.e.

Proof. (i) Let gn : X → R be gn(x) = 1 if n ≤ t(x) and gn(x) = 0 if t(x) ≤ n − 1. Then g_{n+1} is An-measurable. Since f_{t∧0} = f0 and f_{t∧n} = f0 + ∑_{j=1}^n gj(fj − f_{j−1}) for n ≥ 1, we conclude by Exercise-48 that {f_{t∧n}, An}_{n=0}^∞ is a martingale.
(ii) By Exercise-49(iii), |ft| ≤ M µ-a.e., which implies ft ∈ L1(X, A, µ). Now by (i), Ef0 = Ef_{t∧0} = Ef_{t∧n} for every n ∈ N, and therefore |E(ft − f0)| = |E(ft − f_{t∧n})| ≤ 2Mµ({x ∈ X : t(x) > n}) → 0 as n → ∞ since t < ∞ µ-a.e. Hence E(ft − f0) = 0, or Eft = Ef0.
(iii) Let Aj = {x ∈ X : t(x) = j} ∈ Aj and observe that X = ⋃_{j=0}^M Aj is a measurable partition modulo a null set. Consider A ∈ At. Then A ∩ Aj ∈ Aj. We have ft ∈ L1(X, At, µ) by Exercise-49, and ft = fj on Aj. Also E(fM|Aj) = fj by Exercise-46(ii), implying ∫_{A∩Aj} fj dµ = ∫_{A∩Aj} fM dµ. Therefore, ∫_A ft dµ = ∑_{j=0}^M ∫_{A∩Aj} fj dµ = ∑_{j=0}^M ∫_{A∩Aj} fM dµ = ∫_A fM dµ. Thus E(fM|At) = ft.
(iv) Suppose s ≤ t ≤ M. Applying the tower law of conditional expectation to As ⊂ At, and applying (iii) to t ≤ M and s ≤ M, we obtain E(ft|As) = E(E(fM|At)|As) = E(fM|As) = fs. □
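Numerically, [146](iii) can be observed as follows (a Python sketch with numpy assumed; the particular stopping rule, "stop when the walk hits ±3 or at time 50, whichever comes first", is a bounded stopping time chosen just for illustration):

    import numpy as np

    rng = np.random.default_rng(7)
    reps, M, barrier = 10**5, 50, 3
    steps = rng.choice([-1, 1], size=(reps, M))
    walks = np.cumsum(steps, axis=1)                       # f_1, ..., f_M along each path (f_0 = 0)

    stopped = np.empty(reps)
    for i in range(reps):
        hits = np.nonzero(np.abs(walks[i]) >= barrier)[0]
        t = hits[0] if hits.size > 0 else M - 1            # bounded stopping time t <= M
        stopped[i] = walks[i, t]                           # f_t on this path
    print(stopped.mean())                                  # close to Ef_0 = 0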

Seminar topics: (i) Analogue of [146] for submartingales, (ii) Doob’s maximal and Lp -inequalities.

Remark: If {fn, An}_{n=0}^∞ is a (super)martingale with fn ≥ 0 on a probability space (X, A, µ), and t is a finite stopping time w.r.to (An), then by Exercise-49(iii), Fatou's lemma, and [146](i) we have Eft = E(lim inf_{n→∞} f_{t∧n}) ≤ lim inf_{n→∞} Ef_{t∧n} = (≤) Ef_{t∧0} = Ef0. Thus Eft ≤ Ef0.

If a sequence of real numbers is not convergent in [−∞, ∞], then the sequence should oscillate and
hence should cross some interval [a, b] infinitely often. Building up on this simple observation, the
so called upcrossing inequality of Doob will now lead us to a sufficient condition for the convergence
of an L1 -martingale. This will complement Exercise-47 stated for an Lp -martingale, 1 < p < ∞.

[147] [Doob's upcrossing inequality] Let {fn, An}_{n=0}^∞ be a supermartingale on a probability space (X, A, µ) and fix reals a < b. For n ∈ N, define the upcrossing random variable un : X → R as follows: un(x) is the maximum integer k ∈ N such that there are 0 ≤ p1 < q1 < · · · < pk < qk ≤ n with f_{pi}(x) < a < b < f_{qi}(x) for 1 ≤ i ≤ k; and we put un(x) = 0 if no such k ∈ N exists. Then (b − a)Eun ≤ E|fn − a| for every n ∈ N.

Proof. We will define random variables gj : X → R that are 1 during an upcrossing and 0 during a downcrossing. Fix x ∈ X, and choose integers 0 ≤ p1 < q1 < p2 < q2 < · · · optimally (least in each case) with f_{pi}(x) < a < b < f_{qi}(x). Let g0(x) = 0, g_{j+1}(x) = 0 for 0 ≤ j < p1, g_{j+1}(x) = 1 for pi ≤ j < qi, and g_{j+1}(x) = 0 for qi ≤ j < p_{i+1}. Observe that g_{j+1} is Aj-measurable. Let h0 = 0 and hn = ∑_{j=1}^n gj(fj − f_{j−1}) for n ≥ 1. Then {hn, An}_{n≥0} is a supermartingale by the argument in Exercise-48(ii), and therefore Ehn ≤ Eh0 = E0 = 0. Since each upcrossing of the fj's increases the value of hn by at least b − a, we also have (b − a)un − |fn − a| ≤ hn, where the subtracted term |fn − a| is to take care of any possible incomplete upcrossing at the end with fn(x) < a. Taking expectation we have (b − a)Eun − E|fn − a| ≤ Ehn ≤ 0, and thus (b − a)Eun ≤ E|fn − a|. □

[148] [Martingale convergence theorem] Let {fn, An}_{n=0}^∞ be a (super)martingale on a probability space (X, A, µ) such that (fn)_{n=0}^∞ is a bounded sequence in L1(X, A, µ). Then there is f ∈ L1(X, A, µ) such that (fn) → f pointwise µ-a.e.


Proof. Let sup_{n≥0} ∥fn∥_1 ≤ M < ∞. Fix reals a < b and let the un's be as in [147]. Since 0 ≤ un ≤ u_{n+1}, the limit u(x) = lim_{n→∞} un(x) exists in [0, ∞] for every x ∈ X. We have (b − a)Eun ≤ E|fn − a| ≤ E(|fn| + |a|) ≤ M + |a| by [147], and therefore ∫_X u dµ = lim_{n→∞} ∫_X un dµ ≤ (M + |a|)/(b − a) < ∞ by the Monotone convergence theorem. Hence u(x) < ∞ for µ-a.e. x ∈ X. This in other words means the set Y(a, b) := {x ∈ X : (fn(x))_{n=0}^∞ has infinitely many upcrossings of [a, b]} is µ-null. Since J := {(a, b) ∈ Q^2 : a < b} is countable, the set Y := ⋃_{(a,b)∈J} Y(a, b) is also µ-null. And (fn(x))_{n=0}^∞ must converge in [−∞, ∞] for every x ∈ X \ Y. Writing f(x) = lim_{n→∞} fn(x), we see by Fatou's lemma that ∫_X |f| dµ ≤ lim inf_{n→∞} ∫_X |fn| dµ ≤ M < ∞ and thus f ∈ L1(X, A, µ). □
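A standard example covered by [148] (our own illustrative choice): in a Polya urn, start with one red and one blue ball, repeatedly draw a ball uniformly at random and return it together with one more ball of the same colour; it is easily checked from the definition that the proportion of red balls is a martingale with values in [0, 1], hence L1-bounded, and so it converges almost surely. The Python sketch below simulates a few paths of the proportion:

    import numpy as np

    rng = np.random.default_rng(8)
    checkpoints = [1000, 5000, 20000]
    for path in range(5):
        red, total, snapshot = 1, 2, []
        for n in range(1, checkpoints[-1] + 1):
            if rng.random() < red / total:   # a red ball is drawn and duplicated
                red += 1
            total += 1
            if n in checkpoints:
                snapshot.append(round(red / total, 4))
        print(snapshot)   # along each path the proportion barely moves any more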

Example: We will show that we cannot expect either fn = E(f|An) or ∥f − fn∥_1 → 0 in [148]. Let X = [0, 1] with Lebesgue measure µ. Let In = [0, 2^{-n}] and fn : X → R be fn = 2^n 1_{In} for n ≥ 0. Let An be the σ-algebra on X generated by the subintervals J ⊂ X having end points of the form k/2^n. Then {fn, An}_{n=0}^∞ is a martingale, as shown just before [146]. Now, (fn) → 0 pointwise µ-a.e. but not in L1 since ∥fn∥_1 = 1 for every n ≥ 0. Clearly E(0|An) = 0 ≠ fn also.

The extra hypothesis required to get L1 -convergence in [148] is uniform integrability, and this we
will not discuss here. The situation is better on Lp for 1 < p < ∞. We present only the L2 -case:

[149] Let {fn, An}_{n=0}^∞ be a martingale on a probability space (X, A, µ) such that (fn)_{n=0}^∞ is a bounded sequence in L2(X, A, µ). Then there is f ∈ L2(X, A, µ) such that:
(i) fn = E(f|An) for every n ≥ 0.
(ii) (fn) → f pointwise µ-a.e.
(iii) ∥f − fn∥_1 ≤ ∥f − fn∥_2 → 0 and Ef = Ef0, where ∥ · ∥_p denotes the Lp-norm for p = 1, 2.

Proof. Keep in mind that ∥h∥_1 ≤ ∥h∥_2 ∥1∥_2 = ∥h∥_2 by Cauchy-Schwarz for every h ∈ L2(X, A, µ). Since (i) and (ii) are covered by Exercise-47 and [148], it remains to prove (iii). Let M := sup_{n≥0} ∥fn∥_2 < ∞. For n < m, we have E(fn fm|An) = fn E(fm|An) = fn^2 as fn is An-measurable and by Exercise-46(ii). Taking expectation of both sides we conclude E(fn fm) = Efn^2 for n < m. Let g0 = 0 and g_{n+1} = f_{n+1} − fn. Then Eg_{n+1}^2 = Ef_{n+1}^2 − Efn^2 since 2E(f_{n+1} fn) = 2Efn^2. Therefore, ∑_{j=1}^n Egj^2 = Efn^2 − Ef0^2 ≤ 2M^2 < ∞ for every n and thus ∑_{j=1}^∞ Egj^2 < ∞. For n < m, we have E(fm − fn)^2 = ∑_{j=n+1}^m Egj^2. Since (fm) → f pointwise µ-a.e., we have by Fatou's lemma that
∥f − fn∥_2^2 = E(f − fn)^2 = E(lim inf_{m→∞} (fm − fn)^2) ≤ lim inf_{m→∞} ∑_{j=n+1}^m Egj^2 = ∑_{j=n+1}^∞ Egj^2 → 0 as n → ∞.
Hence |E(f − fn)| ≤ ∥f − fn∥_1 ≤ ∥f − fn∥_2 → 0 as n → ∞, and Efn = Ef0 for n ≥ 0 by Exercise-46(i); therefore Ef = Ef0. □

Topic for self-study: Applications of martingales.

*****
