0% found this document useful (0 votes)
34 views

Lecture Notes

This document defines key concepts related to events and collections of events in probability theory: 1. A field is a collection of subsets of a sample space that is closed under complementation and finite unions. A σ-field is also closed under countable unions. 2. The minimal field/σ-field containing a given collection of sets is the smallest field/σ-field that includes those sets. 3. The Borel σ-field of the real line is the minimal σ-field containing all intervals of the real line, and defines the standard events for probability calculations on the real line.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views

Lecture Notes

This document defines key concepts related to events and collections of events in probability theory: 1. A field is a collection of subsets of a sample space that is closed under complementation and finite unions. A σ-field is also closed under countable unions. 2. The minimal field/σ-field containing a given collection of sets is the smallest field/σ-field that includes those sets. 3. The Borel σ-field of the real line is the minimal σ-field containing all intervals of the real line, and defines the standard events for probability calculations on the real line.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 143

Lecture 1

R = set of real numbers. (1)


R̄ = R ∪ {∞} ∪ {−∞} (extended real numbers) (2)

1. Dealing with ±∞.

a + ∞ = ∞, a − ∞ = −∞,
∞ + ∞ = ∞, −∞ − ∞ = −∞, ∞ − ∞ is undefined.
b × ∞ = ∞ (b > 0),
b × ∞ = −∞ (b < 0)
a a
= = 0, a ∈ R.
∞ −∞
0 × ∞ = 0.

2. lim sup and lim inf of real numbers. {xk } is a sequence of real numbers. Define

lim sup xn = inf sup xk


n k≥n

lim inf xn = sup inf xk .


n k≥n

Exercise 1. Show that;


(a) lim sup xn and lim inf xn are always defined
(b) lim sup xn ≥ lim inf xn .
(c) lim xn exists if and only if lim sup xn = lim inf xn and the common value equals
lim xn .
(d) For the sequence 1, −1, 1, −1, . . ., find lim sup xn and lim inf xn .

3. Operations on sets:

(a) Complement Ac or A0 .
(b) Difference A − B.
(c) Union A ∪ B. In general write ∪kn=1 An
(d) Intersection A ∩ B. In general write ∩kn=1 An
(e) (A ∩ B)c = Ac ∪ B c .
Exercise 2. Which is larger? A ∪ B or A ∩ B?

1
4. Disjointification:
Two sets A1 and A2 .
Define B1 = A1 . B2 = A2 − A1 .
Then
(a) B1 and B2 are disjoint.
(b) B1 ∪ B2 = A1 ∪ A2 .
Extend to more than two sets...
Exercise 3. Sets {An }, n = 1, 2, . . .. Define
Bk = Ak − ∪k−1
j=1 Aj , k ≥ 1 A0 = ∅.

Show that
(a) Bk ⊂ Ak for all k = 1, 2, . . ..
(b) B1 , B2 , . . . are disjoint.
(c) ∪nk=1 Bk = ∪nk=1 Ak for every n.
5. lim sup and lim inf of sets {Ak }. Taking a hint from Item 2,

lim sup An = ∩n=1 (∪∞ ∞


k=n Ak ), lim inf An = ∪n=1 (∩k=n Ak ).
Note that lim sup An and lim inf An are sets.
What type of elements belong to these two sets?
Exercise 4. Show that
(a) x ∈ lim sup An if and only if x belongs to An for infinitely many n’s.
(b) x ∈ lim inf An if and only if for some n (depending on x), x belongs to An , An+1 , . . ..
(c) As a consequence of (a) and (b), lim sup An ⊇ lim inf An .
If the above two sets are equal, then the common set is called lim An .
Exercise 5. Construct sets {An } where lim sup An 6= lim inf An .. (Hint use Exercise 1
(d)).
Exercise 6. Solve problems 1–6, from Page 2 of the textbook.
Exercise 7. If {An } is an increasing (strictly speaking, non-decreasing) sequence of sets
(that is An ⊆ An+1 for all n), then show that lim An = ∪∞ n=1 An .

Exercise 8. If {An } is a decreasing (strictly speaking, non-increasing) sequence of sets


(that is An ⊇ An+1 for all n), then show that lim An = ∩∞ n=1 An .

Exercise 9. Suppose {An }, n = 1, 2, . . . is a sequence of sets. Define


Cn = ∪∞ ∞
k=n Ak , Dn = ∩k=n Ak , n = 1, 2, . . . .

2
Show that
(a) {Cn } is a decreasing sequence of sets.
(b) {Dn } is an increasing sequence of sets.
(c) Cn ⊇ Dn for all n.
(d) lim sup An = ∩∞ ∞
n=1 Cn , lim inf An = ∩n=1 Dn .

3
Lecture 2
Field, σ-field, Borel σ-field
December 2, 2020

Probability calculations are always done for events. For example probability of “four
heads in 10 tosses”. But events can get more complicated (for example by combination
of “simple” events) and soon we face the question “what constitutes the collection of all
events in a given scenario?”
This collection of events will of course vary depending on the problem at hand. But
we may turn around this question and stipulate some “natural” conditions that events
should satisfy, no matter what the probability question we are trying to answer.
We lay down two such sets of natural conditions: field and σ-field.
Sample space. This is a non-empty set, usually denoted by Ω.
For example, I can take Ω = {HH, T T, T H, HT }.
The sample space Ω can be finite, countably infinite or uncountably infinite. We
shall of course see plenty of examples of all kinds later.
Then we have a collection of appropriate subsets of Ω which we shall call events. This
collection is contextual and even with the same Ω, we may have more than one such
relevant collection. This choice may vary depending on our goal. Natural conditions on
this collection are imposed which can be more or, less, strict, again depending on our
goal.
Definition 1 (Field). Let Ω be a non-empty set and F be a collection of subsets of Ω.
Then F is said to be a field (or an algebra) of subsets of Ω, if all the following three
conditions are satisfied:
(a) Ω ∈ F,
(b) If A ∈ F, then Ac ∈ F,
(c) If {Ai }1≤i≤n belong to F, then ∪ni=1 Ai ∈ F.
The elements of F are called events.

Clearly the empty set is a member of F. Note that a field is always defined with respect
to a sample space Ω. Later as we gain expertise, we may not always mention this space
explicitly, since afterall, our main focus will be on events. Also note that this definition
of field/algebra is in no way connected to the definition of field in the theory of algebra.
Exercise 1. Show that a field is always closed under finite intersection and finite
differencing.
Exercise 2. Consider Ω = N = {1, 2, . . .}. Let F be the collection of all subsets of N
which are either finite or their complement are finite. Show that F is a field. Does the
set {2, 4, 6, , . . .} belong to F?

1
Exercise 3. Suppose a non-empty set Ω is given. What is the smallest possible field
(containing Ω)?
Exercise 4. Suppose a non-empty set Ω is given. What is the largest possible field?
Exercise 5. Suppose a non-empty set Ω and a non-empty subset A of Ω is given. What
is the smallest possible field containing A (and Ω)?
Exercise 6. Suppose a non-empty set Ω and two non-empty subsets A1 and A2 of Ω are
given. What is the smallest possible σ-field containing these two sets (and Ω)? (may be
helpful to consider two cases: (i) A1 and A2 are disjoint and (ii) they are not disjoint).
Exercise 7. Show that arbitrary intersection of fields is again a field.
Definition 2 (Minimal field). Suppose C is a collection of subsets of Ω. Then the
smallest field containing all sets of C is called the minimal field containing C or, the
field generated by C, and is written as F(C).

Convince yourself that the minimal σ-field always exists.


Exercise 7. Suppose C = {Ai }, 1 ≤ i ≤ n is a collection of non-empty disjoint subsets
of Ω such that their union is Ω. That is, it is a partition of Ω. Describe the minimal
field containing this collection. How many sets are in this minimal σ-field?
Exercise 8. Now suppose C = {Ai }, i = 1, 2, . . . is a countable partition of Ω. Describe
the minimal field containing this collection. How many sets are there in this field? What
will happen if the partition is uncountable?
Exercise 9. Suppose A is a field. Let B be a subset of Ω. Consider
A ∩ B =: {A ∩ B : A ∈ A}.
Show that this is a field (of subsets of B).
Definition 3 (σ-field). Let Ω be a non-empty set and A is a collection of subsets of Ω.
Then A is said to be a σ-field if in addition to being a field/algebra, it also satisfies:
if {Ai }1≤i<∞ belong to A, then ∪∞i=1 Ai ∈ A. That is A is a field and is closed under
countable unions.

Exercise 10. Give an example of a field which is not a σ-field. Hint: Try Exercise 2.
Exercise 11. Consider Ω = R. Let A be the collection of all subsets of R that are either
countable or their complement are countable. Show that A is a σ-field. Does the set
of all irrational numbers belong to A? Does the set of real numbers between 0 and 1
belong to A?
Exercise 12. Show that a σ-field is always closed under countable intersection.
Exercise 13. Suppose A is a σ-field of subsets of Ω. Let B be a subset of Ω (which may
or may not be in A). Consider
A ∩ B =: {A ∩ B : A ∈ A}.

2
Show that this is a σ-field (of subsets of B).
Exercise 14. Show that arbitrary intersection of σ-fields are again σ-fields.
Definition 4 (Minimal σ-field). Suppose C is a collection of subsets of Ω. Then the
smallest σ-field containing all sets of C is called the minimal σ-field containing C or,
the σ-field generated by C, and is written as σ(C).

Argue that the minimal σ-field always exists.


Exercise 15. Suppose {Ai }, i = 1, 2, . . . is a countable partition of Ω. Describe the
smallest σ-field containing this collection. How many sets are there in this σ-field?
Exercise 16. Show that there cannot exist a countably infinite σ-field A. Hint: if it is
possible, enumerate the events {A1 , A2 , . . .} and consider the collection of all the events
of the form A∗1 ∩ A∗2 ∩ · · · where ∗ denotes presence or absence of the complement sign.
Can you build up A from this collection? Compare with Exercise 15.
Definition 5 (Borel σ-field). The minimal σ-field containing all intervals of R is called
the Borel σ-field of R, denoted by B(R).

The Borel σ-field of other appropriate subsets of R are defined in the natural way.
Exercise 17. Consider Ω = R. Show that the smallest σ-field containing each of these
collections equals B(R).
(a) all intervals of finite length.
(b) all closed intervals.
(c) all open intervals
(d) all left open right closed intervals (including −∞ as left limit)
(d) all left closed right open intervals (including ∞ as right limit)
Exercise 18. Show that B(R) contains the following sets (caution! it contains much
more though):
(a) all singleton sets.
(b) all finite sets
(c) all compact sets.
(d) all open sets
(e) all closed sets.
For which of the above collections is the smallest σ-field equal to B(R)?
The Borel σ-field on R̄, written B(R̄) is the smallest σ-field generated by all intervals of
the form (a, b], a, b ∈ R̄.
Exercise 19. Suppose Ω = R. Let C be the collection of all singleton sets. Describe
σ(C).

3
Lecture 3
Measure, probability measure
December 2, 2020

Whenever we calculate weighted average, length, area of a surface, volume of a solid,


area under the curve of a function, probability, the underlying scale to compute is a
measure—length measure (Lebesgue measure), surface measure, volume measure, prob-
ability measure...
We now give the definition of a measure.
Notation: R+ will denote the set of all non-negative real numbers.
Definition 1 (Measure). (a) Suppose A is a σ-field. A function µ : A → R+ ∪ {∞} is
called a measure (on A) if

X
µ(∪∞
n=1 An ) = µ(An ) for all choices of disjoint {Ai }. (1)
n=1

The above property is called countable additivity (of µ on A). The triplet (Ω, A, µ)
is called a measure space.
(b) Suppose F is a field. A function µ : F → R+ ∪ {∞} is called a measure (on F) if

X
µ(∪∞
n=1 An ) = µ(An ) (2)
n=1

for all choices of disjoint {Ai } from F for which ∪∞


n=1 An ∈ F. The above property is
called countable additivity (of µ on F).

(a) Note that µ(A) must be defined for all A ∈ A (or F). That is, every event must
have a measure. Also note that µ(A) can be ∞. To avoid triviality, we shall assume that
there is at least one set A ∈ A such that µ(A) < ∞. In that case, show that µ(∅) = 0.
(b) There are theories where only the weaker version (finite additivity) is demanded for
a meaure, but countable additivity is a generally accepted required property.
(c) You may wonder, why do not we work on the power set of all subsets of Ω. That
is, declare every subset of Ω as an event. Unfortunately, many reasonable measures (for
example the “length measure” on subsets of R) does NOT have the the above property
on the power set and, must be restricted to a suitable subset of the power set.
(d) There exists theory of measures that are not necessarily non-negative. But we shall
not discuss them.
Notation: #(A) denotes the number of elements of A. P(Ω) will denote the power set
of Ω.
Exercise 1. In each case below, µ is a measure:

1
(a) Suppose Ω = N. µ(A) = #(A) for all subsets of Ω. This is known as the counting
measure.
(b) Suppose {xi } is a sequence of non-negative real numbers. Suppose Ω = N. Let
X
µ(A) = xi , A ⊆ Ω.
i∈A

(c) Suppose λ > 0. Let Suppose Ω = N and let .

X λi
µ(A) = exp{−λ} , A ⊆ Ω.
i!
i∈A

This is the Poisson measure.

Definition 2 (Probability measure). A measure µ on A (or F) is called a


(a) probability measure if
µ(Ω) = 1.

In that case, the triplet (Ω, A, µ) is called a probability space.


(b) finite measure if µ(Ω) < ∞.
(c) a σ-finite measure if there exists {Ai } ∈ A such that µ(Ai ) < ∞ for all i and
∪∞n=1 An = Ω.

The Poisson measure is a probability measure. The counting measure is a σ-finite mea-
sure.
Exercise 2. Suppose Ω = {0, 1, . . .}. Let the σ-field A be P(Ω). Fix 0 < p < 1. Let
q = 1 − p. Define µ for singleton sets as

µ({i}) = pq i , i = 0, 1, . . . .

Show that µ defined in the natural way by extension, on A, is a probability measure. It


is called the geometric measure with parameter p.
Exercise 3. Suppose Ω is a set with finitely many elements. Consider the σ-field as
P(Ω). Define µ as
#(A)
µ(A) = , A ⊆ Ω.
#(Ω)
Show that µ is a probability measure. It is called the uniform measure. What difficulty
would arise if we try to extend this to concept to a set with infinitely many elements?
Here are some easy consequences of the definition of a measure.
1. If µ is a measure of F or A, and A ⊆ B (from F or A respectivey), then µ(A) ≤ µ(B).

2
2. If µ is a measure on A, then for any sequence {Ai } from A,

X
µ(∪∞
n=1 An ) ≤ µ(An ).
n=1

3. Suppose µ is a measure on F and {Ai } is a sequence from F such that ∪∞


n=1 An ∈ F.
Then

X
µ(∪∞ A
n=1 n ) ≤ µ(An ).
n=1

To prove 1, 2, and 3, above, use disjointification.


4. Suppose µ is a measure on A. If {Ai } is a non-decreasing sequence of sets from A
then lim µ(An ) = µ(∪∞ ∞
n=1 An ). This is called continuity from below at A = ∪n=1 An .

5. Suppose µ is a measure on A. If {Ai } is a non-increasing sequence of sets from A such


that µ(Ak ) < ∞ for some k, then lim µ(An ) = µ(∩∞ n=1 An ). This is called continuity
from above at A = ∩∞ A
n=1 n . Note the extra condition needed for this. To prove this,
use Bn = Ak − An , n ≥ k and use additivity, finiteness condition and part 4.

Definition 3 (Finitely additive measure). Suppose F is a field and µ : F → R+ ∪ {∞}.


Then µ is called finitely additive if
n
X
µ(∪nk=1 Ak ) = µ(Ak ) (3)
k=1

for all choices of integers n and disjoint {Ai }, 1 ≤ i ≤ n from F.

Exercise 4. Suppose µ is finitely additive on a field F.


(a) Suppose µ is continuous from below in the sense that whenever {Ai } is a non-
decreasing sequence of sets from F AND ∪∞ ∞
n=1 An ∈ F, then lim µ(An ) = µ(∪n=1 An ).
Show that then µ is a measure on F.
[To prove this, take any disjoint sequence {Ai } from F (and assume that A = ∪∞
n=1 Ai ∈
F). Define Bn = ∪nk=1 Ak . Then Bn is non-decreasing....]
(b) Suppose µ is continuous from above at ∅. That is, whenever {Ai } is a non-increasing
sequence of sets from F AND ∩∞ n=1 An = ∅, then lim µ(An ) = 0. Show that then µ is a
measure on F.
[To prove this, take any disjoint sequence {Ai } from F (and assume that A = ∪∞
n=1 Ai ∈
n
F). Define Bn = ∪k=1 Ak . Note that A − Bn is decreases to ∅....]
Exercise 5. Try the problems on page 10–11 of the book.

3
Lecture 4
Semi-field
Countable sub- and super-additivity
Discussion on “length measure”
December 8, 2020

Notation: (i) For any collection C of subsets of Ω, and any subset A of Ω, define

C ∩ A = {A ∩ B : B ∈ C}.

(ii) For any non-empty set Ω, P(Ω) denotes the collection of all subsets of Ω. It is called
the power set of Ω.

Exercise 1. Suppose C is a class of subsets of Ω. Let A be a non-empty subset of Ω.


Show that
σ(C ∩ A) = σ(C) ∩ A (a σ − field of subsets of A). (1)

Exercise 2. Suppose Ω is countably infinite. Define µ(·) on (the σ-field) P(Ω) as


(
0 if A is finite,
µ(A) = (2)
∞ if A is infinite.

Show that
(a) µ is finitely additive but not countably additive.
(b) µ is not continuous from below at Ω.
(c) µ is not continuous from above at ∅.

Exercise 3. Suppose Ω is countably infinite. Let

F = {A ⊆ Ω : A is finite or its complement is finite} (3)

be the co-finite field of subsets of Ω. Define µ(·) on F as

(
0 if A is finite,
µ(A) = (4)
1 if Ac is finite.

Show that
(a) µ is finitely additive but not countably additive.
(b) µ is not continuous from below at Ω.

Theorem 1. Suppose µ is a finitely additive (non-negative) measure on a field F. Then


(a) µ(∅) = 0. (it is implicitly assumed that there is at least one set A with µ(A) < ∞).

1
(b) µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) for all A, B ∈ F.
(c) If A, B ∈ F, B ⊆ A, then µ(A) = µ(B) + µ(A − B). As a consequence µ(A) ≥ µ(B).
This is called monotincity of measure.
(d) For any sequence of sets {An } from F,
(i) µ(∪nk=1 Ak ) ≤ nk=1 µ(Ak ). This is called finite sub-additivity. Hint: Use disjoin-
P
tification, finite additivity and (c).
P∞
(ii) If {An } are disjoint and ∪∞ ∞
k=1 Ak ∈ F, then µ(∪k=1 Ak ) ≥ k=1 µ(Ak ). This is
known as countable super-additivity.
P∞
(iii) If µ is a countably additive measure then µ(∪∞ k=1 Ak ) ≤ k=1 µ(Ak ), whenever
∪∞ A
k=1 k ∈ F. This is known as countable sub-additivity.

Exercise 4. Suppose µ is a finitely additive and countably sub-additive measure on a


field or a σ-field. Show that µ is countably additive.

Definition 1 (σ-finite measure). A measure µ on a field (or σ-field) A is said to be


σ-finite, if there exists a sequence of sets {An } from A such that µ(Ak ) < ∞ for all k
and ∪∞ k=1 Ak = Ω.

Check that the sets in the definition can be taken to be disjoint without any loss of
generality.

Exercise 5. Suppose C is a class of subsets of Ω. Verify that we can describe the smallest
field F(C) containing C in the following way. Let

D = {A : A = ∩rj=1 Bj , where each Bj or its complement is in C},


G1 = {∪ni=1 Ai : n ≥ 1, where each Ai ∈ D},
G2 = {∪ni=1 Ai : n ≥ 1, where each Ai ∈ D and they are disjoint}.

Clearly G1 and G2 both contain C. Further, both are fields. Hence they must equal F(C).

Definition 2 (Semi-field). Let Ω be a non-empty set. Suppose S is a (non-empty)


collection of subsets of Ω such that,
(i) it is closed under intersection,
(ii) for any A ∈ S, Ac can be written as a finite disjoint union of sets from S.
Then S is called a semi-field.

Clearly any field is a semi-field. A semi-field is not necessarily closed under complemen-
tation.

Exercise 6. A semi-field which is closed under complementation is a field. Caution! A


field must satisfy three conditions.

2
Exercise 7. Let Ω be the interval (0, 1]. Let S be the collection of all sub-intervals of
Ω of the form (a, b]. Show that S is a semi-field and is not a field.
Exercise 8. Let S be a semi-field and let E be any subset of Ω. Show that S ∩ E is a
semi-field (of subsets of E).
Exercise 9. Suppose Si , i = 1, 2 are semi-fields of subsets of Ωi , i = 1, 2 respectively.
Show that S1 × S2 is a semi-field of subsets of Ω1 × Ω2 . Show by an example, that the
analogous statement is not true for either fields or σ-fields.
Exercise 10. Let C be any class of subsets of Ω which contains the empty set. Show
that E ∈ σ(C) iff there exists C1 , C2 , . . . ∈ C (this collection is allowed to depend on E)
such that E ∈ σ(C1 , C2 , . . .).
Definition 3 (Countably generated field and σ-field). A field or a σ-field, A, of subsets of
Ω, is said to be countably generated, if there exists countably many subsets C1 , C2 , . . .
of Ω such that A = σ(C1 , C2 , . . .).
Exercise 11. Show that B(R) is countably generated.
Exercise 12. Let (Ω, A, µ) be a measure space. Let {An } be a sequence of sets in A.
Show that
µ(lim inf An ) ≤ lim inf µ(An ).

If further µ is a finite measure, then show that

µ(lim sup An ) ≥ lim sup µ(An ).

Exercise 13. Let {An } be a sequence of subsets of Ω. Define a sequence of 0 − 1 valued


functions as
(
0 if ω ∈ / A,
fn (ω) = (5)
1 if ω ∈ A.

What is the relation between the two sets lim sup An , lim inf An and the two functions
lim sup fn and lim inf fn ?
Exercise 14. Let (Ω, A, µn ) be a sequence of measure spaces where {µn } is non-decreasing.
That is, for any set A, µn (A) is a non-decreasing sequence. Define

µ(A) = lim µn (A), A ∈ A.

Is µ a measure?

Discussion on construction and existence of measures. We have seen a few


examples of countably additive measures on σ-fields. All these examples have been very
simple. How do we build up a collection of interesting non-trivial measures? Where do
we start?

3
Maybe we can start by checking out the “length measure” on the real-line...Call such a
target measure λ.
1. For any a < b, this λ must satisfy

λ(a, b] = b − a.

2. Moreover, we should be able to define this measure on an appropriate σ-field. Recall


that B(R) is the smallest σ-field containing all such intervals. So we should be able to
make sure that λ can be defined in a countably additive way on B(R) (and probably on
a larger σ-field...).
3. This measure will not be extendable to P(R). As of now, take this on faith.
4. Every singleton set is a Borel set and λ({ω}) = 0 (why?) for every ω ∈ R.
5. Such a measure λ will not be finite, but it will be σ-finite since

R = ∪∞
n=−∞ (n, n + 1].

6. This λ must be translation invariant. That is λ(A + x) = λ(A) for every A ∈ B(R)
and every x ∈ R.
7. Thus we should be able to restrict our attention to say, the interval (0, 1], and
complete our definition of λ on B(0, 1] and then use it to define λ on B(R).
8. But then why restrict to only the length measure λ? There must be a sort of
general programme to build measures by defining them on “simpler sets” (semi-field or
field maybe?) and then get to the σ-field...For example can we take a non-decreasing
continuous function F and define µ(a, b] = F (b)−F (a). Will this give rise to a countably
additive measure on B(R)?
9. There is another interesting question. What about getting from the length measure
λ on R to the area measure on R × R or the volume measure on R × R × R....?

4
Lecture 5
Monotone class theorem
December 9, 2020

We introduce another class of subsets of Ω and a crucial result concerning this class.
This result is going to be a crucial tool and shall be used many times in this course.
Definition 1 (Monotone class). Let Ω be a non-empty set. A class of subsets M of Ω,
is said to be a monotone class if it is closed under increasing and decreasing limits of
sets. That is,
(i) for all {An } from M such that An ↑, lim An = ∪∞
k=1 Ak ∈ M and
(ii) for all {An } from M, such that An ↓, lim An = ∩∞
k=1 Ak ∈ M.

Clearly any σ-field is a monotone class. A field which has only a finite number of sets is
a monotone class.
Exercise 1. Construct examples of fields that are not monotone classes and vice-versa.
Exercise 2. If M is a field and is a monotone class, then it is a σ-field.
Exercise 3. Check that arbitrary intersection of monotone classes is a monotone class.
Hence, given a class of sets C, there is a smallest monotone class that contains C.
It is usually denoted by M(C).

The following theorem links the three notions of field, monotone class and σ-field.
Theorem 1 (Monotone class theorem). Suppose F is a field. Then
(a) M(F) = σ(F).
(b) As a consequence of (i), if M is a monotone class such that M ⊇ F, then M ⊇ σ(F).

Proof. (i) We immediately have


M(F) ⊆ σ(F) (both are monotone classes).
We need to show the other inclusion.
Fix A ∈ M(F). Let

MA = {B ∈ M(F) : A ∩ B, A ∩ B c and Ac ∩ B ∈ M(F}). (1)


Then
(i) MA is a monotone class. This is easy to check.
(ii) Suppose A ∈ F. Then every B ∈ F satisfies the condition in (1). Thus F ⊆ MA .
This implies
MA ⊇ M(F) (both are monotone classes and contain F).

1
On the other hand, by definition, MA ⊆ M(F). Hence

MA = M(F).

(iii) But this means that for any B ∈ M(F) and any A ∈ F,

A ∩ B, A ∩ B c and Ac ∩ B ∈ M(F).

This means that


MB ⊇ F.
But we already know
MB ⊆ M(F).
Hence
MB = M(F).

(iv) We now claim that M(F) is a field. This is because, we have already seen that
when A, B ∈ M(F) = MA then

A ∩ B, A ∩ B c and Ac ∩ B ∈ M(F).

Thus M(F) is a field.


(v) Now by Exercise 2, M(F) is a σ-field.
(vi) But then M(F) and σ(F) must be equal.
Part (b) follows easily from part (a).

Exercise 4. Suppose F is a field and µ1 and µ2 are two finite measures defined on σ(F)
and agree on F. That is µ1 (A) = µ2 (A) for all A ∈ F. Show that they agree on σ(F).
Hint: Define the class of “good sets”, that is

M = {A ∈ σ(F) : µ1 (A) = µ2 (A)}.

Now use monotone class theorem.

2
Lecture 6
Towards an extension theorem
Updated, December 10, 2020

Before we construct some non-trivial measures, we will first explore whether we can
extending a measure defined on a field F to σ(F). As we have seen in the previous
lecture, IF the measure is finite and IF there is an extension, then this extension must
be unique. In this lecture we shall address these issues.

Lemma 1. Suppose (Ω, F, µ) is a finite measure space where F is a field. Suppose


{An } and {Bn } are two sequences of non-decreasing sets from F such that An ↑ A and
Bn ↑ B.
(a) If A ⊆ B, then lim µ(An ) ≤ lim µ(Bn ).
(b) If A = B, then lim µ(An ) = lim µ(Bn ).

Note that the sets A and B need not belong to F. Lemma 1 implies that if we add the
limits of increasing sets to the collection F, then there is only one way of extending µ
to such sets.

Proof. Fix an integer m. Note that Am ∩ Bn ∈ F and as n → ∞, Am ∩ Bn ↑ Am ∈ F.


Since µ is a measure on F,

µ(Am ) = lim µ(Am ∩ Bn ) ≤ lim µ(Bn ) (because Am ∩ Bn ⊆ Bn )


n→∞ n→∞

Now let m → ∞ to complete the proof of (a).


(b) follows from (a). Simply reverse the roles of A and B.

Now suppose (Ω, F, µ) is a finite measure space where F is a field. Define

G = {A : An ↑ A, An ∈ F}. (1)

Extend µ to sets in G by

µ1 (A) = lim µ(An ), where An ↑ A, An ∈ F. (2)

Then we have the following lemma.

Lemma 2. Suppose (Ω, F, µ) is a finite measure space where F is a field.


(a) G ⊇ F. It is closed under finite union and intersection.
(b) The definition (2) is unambiguous. That is, it does not depend on the specific
sequence{An }.
(c) For all sets A ∈ F, µ(A) = µ1 (A).

1
(d) If G1 , G2 ∈ G and G1 ⊆ G2 , then µ1 (G1 ) ≤ µ1 (G2 ).
(e) µ1 (G1 ∪ G2 ) + µ1 (G1 ∩ G2 ) = µ1 (G1 ) + µ1 (G2 ) for all G1 , G2 ∈ G. So µ1 is finitely
additive.
(f) If Gn ∈ G and Gn ↑ G, then G ∈ G and µ1 (Gn ) ↑ µ1 (G).

Proof. (a) Proof of the first part is trivial. To prove the second part, let G1 , G2 ∈ G. Let
An , Bn ∈ F be such that An ↑ G1 and Bn ↑ G2 . Then An ∪ Bn , An ∩ Bn ∈ F. Moreover,
An ∪ Bn ↑ G1 ∪ G2 and An ∩ Bn ↑ G1 ∩ G2 . Hence G1 ∪ G2 and G1 ∩ G2 are in G.
(b), (c) and (d) follow from Lemma 1.
(e) Let G1 , G2 ∈ G. Then there exists non-decreasing An , Bn ∈ F such that An ↑ G1
and Bn ↑ G2 . By additivity of µ on F,

µ(An ∪ Bn ) + µ(An ∩ Bn ) = µ(An ) + µ(Bn ) for all n. (3)

The sets involved increase to G1 ∪ G2 , G1 ∩ G2 , G1 and G2 respectively and their µ-


measures increase to µ1 (G1 ∪ G2 ), µ1 (G1 ∩ G2 ), µ1 (G1 ) and µ1 (G2 ) respectively. This
proves the second part of (e).
(f) This is proved by a diagonalisation argument. For each n, let Anm ∈ F be such
that Anm ↑ Gn as m → ∞. Define

Dm = A1m ∪ A2m ∪ · · · ∪ Amm .

Observe that Dm ∈ F and Dm ↑ ∪∞


k=1 Dk . Further

Anm ⊆ Dm ⊆ Gm for all n ≤ m. (4)

Hence
µ(Anm ) ≤ µ(Dm ) ≤ µ1 (Gm ). (5)
Let m → ∞ in (4) to obtain
Gn ⊆ ∪∞
m=1 Dm ⊆ G. (6)
Now let n → ∞ to conclude that Dm increases to G. Hence

lim µ(Dm ) = µ1 (G).

Now let m → ∞ in (5) to obtain

µ1 (Gn ) ≤ lim µ(Dm ) ≤ lim µ1 (Gm ). (7)


m→∞ m→∞

Now let n → ∞ to obtain

lim µ1 (Gn ) = lim µ(Dm ) = µ1 (G). (8)


n→∞ m→∞

This proves the lemma completely.

2
Lecture 7
Outer measure
December 10, 2020

Definition 1. Let C be any class of subsets of Ω. Any function µ from C to R̄ will be


called a set function.

Lemma 1. Suppose G is a class of subsets of Ω and µ is a non-negative set function on


G. Suppose the pair (G, µ) satisfies:
(i) Ω, ∅ ∈ G and G is closed under finite union, finite intersection, and increasing limits.
(ii) µ(∅) = 0, µ(Ω) < ∞.
(iii) If G1 , G2 ∈ G and G1 ⊆ G2 , then µ(G1 ) ≤ µ(G2 ). That is, µ is monotone.
(iv) µ(G1 ∪ G2 ) + µ(G1 ∩ G2 ) = µ(G1 ) + µ(G2 ) for all G1 , G2 ∈ G.
(v) If Gn ∈ G and Gn ↑ G, then µ(Gn ) ↑ µ(G).
Define
µ∗ (A) = inf{µ(G) : G ∈ G, G ⊇ A}, A ⊆ Ω. (1)
Then
(a) For all A ∈ G, µ∗ (A) = µ(A).
(b) If A ⊆ B, then µ∗ (A) ≤ µ∗ (B).
(c) For all A ⊆ Ω, µ∗ (A) ≤ µ(Ω) < ∞.
(d) for all A, B ⊆ Ω,

µ∗ (A ∪ B) + µ∗ (A ∩ B) ≤ µ∗ (A) + µ∗ (B).

Hence, for any A ⊂ Ω,

µ∗ (A) + µ∗ (Ac ) ≥ µ∗ (Ω) + µ∗ (∅) = µ(Ω) + µ(∅) = µ(Ω).

(e) If An ↑ A, then µ∗ (An ) ↑ µ∗ (A).

Example 1. We know that the pair (G, µ1 ) defined in the previous lecture satisfies the
above conditions (i)–(v).

Proof Lemma 1. Proofs of (a), (b), and (c) are immediate.


(d) Fix  > 0. Choose G1 , G2 ∈ G such that G1 ⊇ A, G2 ⊇ B and

µ(G1 ) ≤ µ∗ (A) + , µ(G2 ) ≤ µ∗ (B) + .


Then

1
µ∗ (A) + µ∗ (B) + 2 ≥ µ(G1 ) + µ(G2 )
= µ(G1 ∪ G2 ) + µ(G1 ∩ G2 )
≥ µ∗ (A ∪ B) + µ∗ (A ∩ B).
Since  is arbitrary, proof of (d) is complete.
(e) Note that µ∗ (An ) is non-decreasing and since An ↑ A, µ∗ (A) ≥ lim µ∗ (An ). We need
to prove the opposite inequality.
As in the proof of part (d), get Gn ∈ G such that
Gn ⊇ An and µ(Gn ) ≤ µ∗ (An ) + 2−n , for all n ≥ 1. (2)
We now claim that
m
X

µ(∪m
k=1 Gk ) ≤ µ (Am ) +  2−k for all m ≥ 1. (3)
k−1

We prove this by induction. Clearly it is true for m = 1. Suppose it holds for all m ≤ n.
We shall prove it for m = n + 1.
First note that
µ∗ (∪nk=1 Gi ) ∩ Gn+1 ≥ µ∗ (Gn ∩ Gn+1 ) ≥ µ∗ (An ∩ An+1 ) = µ∗ (An ).

(4)
Hence,
µ(∪n+1 n n

k=1 G i ) = µ(∪ k=1 G i ) + µ(G n+1 ) − µ (∪ k=1 G i ) ∩ G n+1 (condition (iv))
n
X  
≤ µ∗ (An ) + k
+ µ∗ (An+1 ) + (n+1) − µ∗ (An ) ((3) with m = n, (2), (4))
2 2
k=1
n+1
X

≤ µ (An+1 ) +  2−k .
k=1

This establishes (3). Incidentally, Gn need not be non-decreasing. However,


A = ∪∞ ∞
k=1 Ak ⊆ ∪k=1 Gk .

Hence,
µ∗ (A) ≤ µ∗ (∪∞
k=1 Gk ) (by (b))
= µ(∪∞
k=1 Gk ) (by (a))
= lim µ(∪nk=1 Gk ) (using condition (v))
n→∞
n
X

≤ lim [µ (An ) +  2−k ]
n→∞
k−1

= lim µ (An ) + .
n→∞

Since  is arbitrary, this proves (e) and the proof of the lemma is complete.

2
Note that µ∗ is defined on P(Ω) but need not be a measure, though it has some measure-
like properties. This motivates the following definition.

Definition 2. A set function λ on P(Ω) is said to be an outer measure if the following


three conditions are satisfied.
(a) λ(∅) = 0.
(b) λ is monotone on P(Ω).
(c) λ is countably sub-additive on P(Ω).

Exercise 1. The set function µ∗ defined in (1) is a finite outer measure.

3
Lecture 8
Extension of measure
Completion of a measure December 14, 2020

Recall that an outer measure is defined on P(Ω).

Exercise 1. Construct an example of an outer measure which is not a measure.

Recall the earlier set up where we have a finite measure space (Ω, F, µ). Then we looked
at the class G of all increasing limits of sets from F. This led to an additive set function
µ∗ on G. Then by the previous lemma, this gives an outer measure (on P(Ω)). Let us
keep calling it µ∗ .
Though an outer measure is not a measure in general, we know that our outer measure
µ∗ agrees with µ on F and hence is a measure on this subset of P(Ω).
We have the following theorem for an outer measure of this type. We state and prove it
for a probability measure but the same proof works, with appropriate changes, for any
finite measure.

Theorem 1. Suppose (Ω, F, P ) is a probability space where F is a field. Consider the


corresponding outer measure P ∗ constructed from P . Let

H = {H ⊆ Ω : P ∗ (H) + P ∗ (H c ) = P (Ω) = 1}. (1)

Then H ⊇ F is a σ-field and P ∗ is a probability measure on H.


In particular, if (Ω, F, µ) is a finite measure space, then there is a unique extension to
a measure µ∗ on σ(F).

Before we begin the proof, observe that H can be described as:

H = {H ⊆ Ω : µ∗ (H) + µ∗ (H c ) ≤ P (Ω) = 1}. (2)

Proof of Theorem 1. We prove the theorem in four steps.


(i) Let G be the collection of sets that are increasing limits of sets from F. Then G ⊆ H.
To see this, suppose An ∈ F such that An ↑ G. Then

P ∗ (G) + P ∗ (Gc ) = lim[P (An ) + P ∗ (Gc )] (since An ↑ G)


≤ lim sup[P (An ) + P ∗ (Acn )] (since Gc ⊆ Acn )
= 1

Hence by (2), G ∈ H.
(ii) From the definition of H, it follows that, Ω ∈ H, and H is closed under complemen-
tation.

1
(iii) H is a field and P ∗ is finitely additive on H.
To show this, let H1 , H2 ∈ H. Let

A = H1 ∪ H2 , B = H1 ∩ H2 ,

x = P ∗ (A) + P ∗ (Ac ), y = P ∗ (B) + P ∗ (B c ).

P ∗ (A) + P ∗ (B) ≤ P ∗ (H1 ) + P ∗ (H2 ) (3)


∗ c ∗ c ∗ ∗
P (A ) + P (B ) ≤ P (H1c ) +P (H2c ). (4)

The above two inequalities follow from Lemma 1 (d) of Lecture 7. If we now add (3)
and (4), and use the fact that H1 , H2 ∈ H, then we arrive at

2 ≤ x + y ≤ P ∗ (H1 ) + P ∗ (H2 ) + P ∗ (H1c ) + P ∗ (H2c ) = 2. (5)

Hence either x ≤ 1 or y ≤ 1. But by Lemma 1 (d) of Lecture 7, x ≥ 1 and y ≥ 1. Thus,


they both must equal 1. In other words A = H1 ∪ H2 ∈ H and B = H1 ∩ H2 ∈ H. That
is, H is a field.
Moreover equality holds in (3) and (4) and that proves finite additivity of P ∗ on H. This
establishes (iii).
(iv) H is a σ-field and P ∗ is countably additive on H.
Since H is closed under complementation and under finite union and intersection, to
prove that it is a σ-field, it is enough to show that it is closed under increasing limits.
So suppose Hn ∈ H and Hn ↑ H. By Lemma 1 (d) of Lecture 7, P ∗ (H) + P ∗ (H c ) ≥ 1.
On the other hand, by Lemma 1 (e) of Lecture 7, P ∗ (Hn ) ↑ P ∗ (H). Fix  > 0. Then for
all large n, P ∗ (H) ≤ P ∗ (Hn ) + . Hence, for all large n, since H c ⊆ Hnc ,

1 ≤ P ∗ (H) + P ∗ (H c )
≤ P ∗ (Hn ) +  + P ∗ (Hnc )
= 1 + .

Since  is arbitrary, P ∗ (H) + P ∗ (H c ) = 1. That is, H ∈ H. Thus H is a σ-field.


Since P ∗ is finitely additive on H and is continuous from below, it is countably additive.
So this proves (iv) and the proof of the first part of the theorem is complete. The second
part is an easy consequence.

Remark 1. For a measure space (Ω, F, µ) where µ is a finite measure but not necessarily
a probability measure, the set H is defined as

H = {H ⊆ Ω : µ∗ (H) + µ∗ (H c ) = µ(Ω)}
H = {H ⊆ Ω : µ∗ (H) + µ∗ (H c ) ≤ µ(Ω)}
⊇ σ(F).

Then all conclusions of Theorem 1 continue to hold mutas mutandis.

2
In Theorem 1, since H ⊇ σ(F)), a natural question is how much larger is it? This ques-
tion has a nice answer and hinges on the notions of null sets and complete measures.

Definition 1 (Null set). Suppose (Ω, F, µ) is a measure space where F is a field. A set
A ∈ F is said to be a µ-null set if µ(A) = 0. If there is no scope for confusion we say
null set instead of µ-null set. We will denote the class of µ-null sets by Nµ .

Exercise 2. Show that a countable union of null sets is again a null set. An uncountable
union of null sets need not be a null set.

Definition 2 (Complete measure). Suppose (Ω, F, µ) is a measure space where F is a


field. Then µ is said to be complete if B ⊆ A for some µ-null set A, implies B ∈ F.
We also say that (Ω, F, µ) is a complete measure space.

Exercise 3. Show that there are measures which are not complete.

Example 1. Suppose (Ω, F, µ) is a finite measure space where F is a field. Then


(Ω, H, µ∗ ) is a complete measure space. To see this, suppose A is a µ∗ -null set and let
B ⊆ A. Then

µ(Ω) ≤ µ∗ (B) + µ∗ (B c )
≤ µ∗ (A) + µ∗ (B c ), since B ⊆ A
= µ∗ (B c ), since µ∗ (A) = 0
≤ µ∗ (Ω).

Hence B ∈ H.

If we have a measure space which is not complete, then can we complete it in some way?

Theorem 2. Suppose (Ω, A, µ) is a measure space where A is a σ-field. Then

Aµ = {A ∪ N : A ∈ A, N ⊆ B, B ∈ Nµ }

Then Aµ ⊇ A and is a σ-field.


Further extend µ on A to Aµ as follows: for any set A ∪ N ∈ Aµ define

µ(A ∪ N ) = µ(A). (6)

Then µ is well-defined and (Ω, Aµ , µ) is a complete measure space.

Proof. To show that Aµ is closed under countable unions, it suffices to observe that

∪∞ ∞ ∞
n=1 (An ∪ Nn ) = (∪n=1 An ) ∪ (∪n=1 Nn )

and invoke Exercise 2.

3
To show that Aµ is close under complementation, consider A ∪ N ∈ Aµ where N ⊆ B
with µ(B) = 0. Then

(A ∪ N )c = Ac ∩ N c (7)
c c c c c
= (A ∩ B ) ∪ (A ∩ (N − B ). (8)

Note that Ac ∩ B c ∈ A and (Ac ∩ (N c − B c ) ⊆ B which is µ-null. Hence (A ∪ N )c ∈ Aµ


proving that Aµ is closed under complementation.
To verify (6) is a valid definition, suppose A1 ∪ N1 = A2 ∪ N2 . Then we must show that
µ(A1 ) = µ(A2 ). Note that

µ(A1 ) = µ(A1 ∩ A2 ) + µ(A1 − A2 ) = µ(A1 ∩ A2 ) since A1 − A2 ⊆ N2 .

Thus µ(A1 ) ≤ µ(A2 ) and hence by symmetry µ(A1 ) = µ(A2 ).


We leave it as an exercise to show that µ is countably additive on Aµ .
To show that it a complete measure space, let M ⊆ A∪N ∈ Aµ where A ∈ A, N ⊆ B ∈ A
such that µ(B) = 0. Then M ⊆ A ∪ B ∈ A and µ(A ∪ B) = 0. Hence M ∈ Aµ .

Theorem 3. The measure space (Ω, H, µ∗ ) is the completion of (Ω, A, µ).

4
Lecture 9
Completion of a measure
December 16, 2020

If we have a measure space which is not complete, then can we complete it in some way?
Theorem 1 (Completion of a measure space). Suppose (Ω, A, µ) is a measure space
where A is a σ-field.
(a) Let
Aµ = {A ∪ B : A ∈ A, B ⊆ N, N ∈ Nµ }
where Nµ is the collection of all µ-null sets from A. Then Aµ ⊇ A and is a σ-field.
(b) Extend µ from A to Aµ as follows:

µ(A ∪ B) = µ(A), A ∈ A, B ⊆ N ∈ Nµ . (1)

Then µ is well-defined on Aµ and (Ω, Aµ , µ) is a complete measure space.

Proof. (a) To show that Aµ is closed under countable unions, it suffices to observe that

∪∞ ∞ ∞
n=1 (An ∪ Bn ) = (∪n=1 An ) ∪ (∪n=1 Bn )

and invoke Exercise 2 of Lecture 8.


To show that Aµ is closed under complementation, consider A ∪ B ∈ Aµ where A ∈ A
and B ⊆ N ∈ Nµ . Then

(A ∪ B)c = Ac ∩ B c (2)
c c c c c
= (A ∩ N ) ∪ (A ∩ (B − N ) (3)
c c c
= (A ∩ N ) ∪ (A ∩ (N − B). (4)

Since Ac ∩ N c ∈ A and (Ac ∩ (N − B) ⊆ N ∈ Nµ , we get (A ∪ B)c ∈ Aµ . This proves


that Aµ is closed under complementation.
(b) To verify (1) is a valid definition, suppose A1 ∪ B1 = A2 ∪ B2 . Then we must show
that µ(A1 ) = µ(A2 ). Note that

µ(A1 ) = µ(A1 ∩ A2 ) + µ(A1 − A2 )


= µ(A1 ∩ A2 ) + 0, since A1 − A2 ⊆ N2 , and A1 − A2 ∈ A
≤ µ(A2 ).

Similarly, µ(A2 ) ≤ µ(A1 ) and hence µ(A1 ) = µ(A2 ).


We leave it as an exercise to show that µ is countably additive on Aµ .
We now show that (Ω, Aµ , µ) is a complete measure space. Take any arbitrary µ-null set
from Aµ . Then it can be written as A ∪ B ∈ Aµ where A ∈ A, B ⊆ N ∈ A, such that
µ(A) = 0, and µ(N ) = 0.

1
Let M ⊆ A ∪ B. We have to show that M ∈ Aµ . But M ⊆ A ∪ N where A ∪ N ∈ A
and moreover µ(A ∪ N ) = 0. Hence by the definition of Aµ , M ∈ Aµ .

Theorem 2. Consider the finite measure space (Ω, F, µ) where F is a field. Then
(Ω, H, µ∗ ) is the completion of (Ω, σ(F), µ).

Proof. For convenience, let us use the notation A =: σ(F). Note that (Ω, A, µ∗ ) is a
measure space. We have to show that

Aµ∗ = H.

We already know that


A ⊆ H.

Consider any A∪B ∈ Aµ∗ where A ∈ A, B ⊆ N, N ∈ Nµ∗ and N ∈ A. Since A ⊆ H, we


have A, N ∈ H. On the other hand, (Ω, H, µ∗ ) is a complete measure space by Example
1 of Lecture 8. Hence B ∈ H. This implies A ∪ B ∈ H and so

Aµ∗ ⊆ H.

Now consider any A ∈ H. Recall the definition of approximation from above by sets
in G while defining the outer measure µ∗ . That implies, there exists sequences of sets
{Cn , Dn } from A such that

Cn ⊆ A ⊆ Dn , µ∗ (Cn ) → µ∗ (A), µ∗ (Dn ) → µ∗ (A). (5)

Let
C = ∪∞ ∞
n=1 Cn , D = ∩n=1 Dn .

Then
A = C ∪ (A − C) where C ∈ A. (6)
But
A−C ⊆D−C ∈A (7)
and
µ∗ (D − C) ≤ µ∗ (Dn − Cn ) = µ∗ (Dn ) − µ∗ (Cn ) → 0. (8)
Using (6), (7), and (8), A ∈ Aµ∗ . This proves the result completely.

Exercise 1. Establish equation (5).

2
Lecture 10
Carathéodory extension theorem
Approximation theorem
December 16, 2020

Theorem 1 (Carathéodory Extension Theorem). Suppose (Ω, F, µ) is a countably addi-


tive measure space where F is a field and µ is σ-finite. Then it can be uniquely extended
to the measure space (Ω, σ(F), µ).

Proof. This proof uses the results for extension of finite measures in a predictable way.
Write

Ω = ∪∞
n=1 An , where An are disjoint, An ∈ F, µ(An ) < ∞ for all n ≥ 1. (1)

Define the measures µn on F as

µn (A) = µ(A ∩ An ), A ∈ F. (2)

Then it is easy to check that for each n ≥ 1, (Ω, F, µn ) is a countably additive finite mea-
sure space. Then by Theorem 1 of Lecture 7, there is a unique extension (Ω, σ(F), µ∗n ).
Now it is obvious what we should do: define

X
µ∗ (A) = µ∗n (A), A ∈ σ(F). (3)
n=1

Then
(i) µ∗ is countably additive since each µ∗n is so. One has to observe that addition of
non-negative numbers gives the same result irrespective of the order in which they are
added.
(ii) (Ω, σ(F), µ∗ ) is an extension of (Ω, F, µ). This is easy to establish.
It remains to prove uniqueness. Suppose λ is a measure on σ(F) which agrees with µ
on F. Then we have to prove that λ = µ on σ(F). Define

λn (A) := λ(A ∩ An ), A ∈ σ(F)


Then (Ω, σ(F), λn ) is a finite measure space. Moreover,

λn (A) = µ(A ∩ An ), A ∈ F since λ = µ on F


= µn (A), A ∈ F by (2)
= µ∗n (A), A ∈ F since µ∗n = µn on F.

As a consequence λn = µ∗n on σ(F) (unique extension of finite measures). But then



X
λ(A) = λn (A), A ∈ σ(F)
n=1

1

X
= µ∗n (A), A ∈ σ(F)
n=1
= µ∗ (A), A ∈ σ(F)

and the proof of the theorem is complete.

Measures of sets in σ(F) can be approximated by those in F in the σ-finite case. This
result will come handy in future.
Theorem 2. Suppose (Ω, F, µ) is a countably additive measure space where F is a field
and µ is σ-finite. Then given any and  > 0, and any set A ∈ σ(F), such that µ(A) < ∞,

µ(A∆F ) <  for some F ∈ F. (4)

Proof. We prove it in a few steps.


(i) Recall the class of increasing limits of sets from F:

G = {A : A = ∪∞
n=1 An , An ∈ F}.

Then (4) holds for all A ∈ G for which µ(A) < ∞, since µ is continuous from below.
(ii) Now suppose µ is finite. By Lemma 1 of Lecture 7, the outer measure of any set A can
be approximated by the measure of sets G from G, which in turn can be approximated
by sets from F ∈ F by (i). This proves the theorem for the case where µ is a finite
measure.
(iii) Now suppose µ is σ-finite. Consider a sequence of sets {An } which satisfies (1).
Define µn on σ(F) as usual by µn (A) = µn (A ∩ An ). Then each µn is a finite measure.
Hence by part (ii), there exists sets Bn ∈ F such that

µn (A∆Bn ) < . (5)
2n
Note that

≥ µn (A∆Bn )
2n 
= µ (A∆Bn ) ∩ An
= µ (A∆(Bn ∩ An ) ∩ An )

= µn A∆(Bn ∩ An ) .

The extra property we gain is that Bn ∩ An ⊆ An and is in F. Thus in (5) we may


assume that Bn ⊆ An . Let
C = ∪∞
n=1 Bn .

Then C ∩ An = Bn . Hence

µn (A∆C) = µ (A∆C) ∩ An

2
= µ ((A∆Bn ) ∩ An )
= µn (A∆Bn ).

Hence

X
µ(A∆C) = µn (A∆C) < .
n=1

It remains to take care of the fact that C may not belong to F. Note that

∪nk=1 Bk − A ↑ C − A, A − ∪nk=1 Bk ↓ A − C.

Now note that

µ(∪nk=1 Bk − A) ↑ µ(C − A) (continuity from below).

µ(A − ∪nk=1 Bk ) ↓ µ(A − C) (continuity from above), using µ(A) < ∞.


Hence µ(A∆ ∪nk=1 Bk ) → µ(A∆C) as n → ∞. This proves the result by choosing, for
large enough n, F = ∪nk=1 Bk ∈ F.

3
Lecture 11
Distribution function and probability distribution function
Lebesgue-Stieltjes measure
From measure to distribution function
December 23, 2020

The extension theorem will now lead us to the Lebesgue measure (the length measure)
on R. Indeed we shall exhibit a more general class of measures on R, which are called
the Lebesgue-Stieltjes measures.
Definition 1 (Distribution function and probability distribution function). Any func-
tion F : R → R is called a distribution function if
(i) F is non-decreasing. That is, F (a) ≤ F (b) for all a ≤ b, a, b ∈ R.
(ii) F is right continuous. That is, limy↓x F (y) = F (x) for every x ∈ R.
F is called a probability distribution function if in addition to (i) and (ii), we have
(iii) limx→−∞ F (x) = 0, and limx→∞ F (x) = 1. In short we write F (−∞) = 0 and
F (∞) = 1.
A probability distribution function is also often called a cumulative distribution
function (CDF).
Remark 1. (i) A distribution function can take negative values and can be unbounded.
For example the function F (x) = x, x ∈ R is a distribution function.
(ii) If F is a probability distribution function (CDF) then of course, 0 ≤ F (x) ≤ 1 for
all x.
(iii) Even though all distribution functions are right continuous, a distribution function
need not be left continuous. However, the left limit exists at each point, in the sense
that limy→x,y<x F (y) exists for each x ∈ R.
(iv) We shall soon show how given any distribution function, there is an associated
measure on B(R) and vice versa.
Exercise 1. Suppose Z x
F (x) = exp(−y 2 )dy, x ∈ R.
−∞
Show that F is a distribution function.
Exercise 2. Suppose for α, p > 0,

0 if x ≤ 0
F (x) = R x αp −αy p−1 (1)
 0 e y dy, if x > 0.
Γ(p)
Show that F is a probability distribution function.

1
Exercise 3. Suppose P is the probability measure where P {0} = P {1} = 1/2. Find the
corresponding probability distribution function.

Exercise 4. Suppose P is the probability measure where P {i} = 2−i , i = 1, 2, . . .. Find


the corresponding probability distribution function.

Definition 2 (Lebesgue-Stieltjes measure). A countably additive measure µ on B(R) is


called a Lebesgue-Stieltjes measure if µ(I) < ∞ for every bounded interval I of R.

Note that any Lebesgue-Stieltjes measure is automatically σ-finite.

Theorem 1. Suppose µ is a Lebesgue-Stieltjes measure on B(R).


(a) Fix F (0) arbitrarily and define
(
F (0) + µ(0, x] if x > 0
F (x) = (2)
F (0) − µ(x, 0] if x < 0.

Then F is a distribution function.


(ii) If µ is a finite measure, then we may choose F (0) = µ(−∞, 0]. With this choice
(2) reduces to
F (x) = µ(−∞, x], x ∈ R. (3)
This F is a probability distribution function if µ(R) = 1.

Remark 2. Whenever µ is a finite measure, we shall always choose the corresponding


distribution function as described in (3).

Proof of Theorem 1. Suppose a < b. Then from (2),

F (b) − F (a) = µ(a, b] ≥ 0,

establishing non-decreasingness.
Fix x ∈ R and xn ↓ x. Since µ is finite on every sub-interval, using continuity from
above,
F (xn ) − F (x) = µ(x, xn ] ↓ µ(∅) = 0,
establishing right continuity.

Exercise 5. Suppose µ is a finite measure and µ{x}) > 0 for some x ∈ R. Show that
F is not left continuous at x.

Before we state and prove a converse of the above result, we make a few observations.
(i) Extend the function F : R → R to the function F : R̄ → R̄ by defining

F (∞) = lim F (x), F (−∞) = lim F (x).


x→∞ x→−∞

2
Note that both the above limits exist but they may equal ∞ and −∞ respectively.
(ii) Towards constructing a measure µ from a given F , define the set of all right-
semiclosed intervals of R̄ as

S̄ = A : A = (a, b] or A = [−∞, b], or A = (−∞, b], a, b ∈ R̄, a < b . (4)
Then
(a) R̄ ∈ S̄.
(b) S̄ is closed under finite intersection.
(c) If A ∈ S̄, then its complement is a finite disjoint union of sets from S̄.
We shall refer to S̄ as the class of all (left open) right-semiclosed intervals of R̄. It is a
semi-field.
(iii) Now let
F̄ = {A : A = ∪nk=1 Ik : Ik ∈ S̄ for all k and they are disjoint}. (5)
Then. by using (a), (b) and (c), it is easy to show that F̄ is a field of subsets of R̄. .
Exercise 6. Verify that F̄ contains the smallest field generated by al intervals of R.

(iv) Define µ on S̄ by
µ(a, b] = F (b) − F (a), a, b ∈ R̄, a < b
µ[−∞, b] = F (b) − F (−∞) = µ(−∞, b], b ∈ R̄. (6)
Then µ is non-negative and is defined on all right-semiclosed intervals of R̄.
(v) Define µ on F̄ by:
n
X
µ(∪nk=1 Ik ) = µ(Ik ), I1 , . . . In ∈ S̄ and they are disjoint.
k=1

Note that ∪nk=1 Ik has alternate descriptions as ∪tk=1 Jk where Jk are disjoint elements of
S. Thus we must show that the above definition is meaningful. This is easily done by
using (iv).
Exercise 7. Show that µ is well defined on S̄.

(vi) µ defined above on F̄ is finitely additive. This is also easily proved by using (iv)
and (v).
Exercise 8. Show that µ is finitely additive on F̄.
Lemma 1. Suppose F is a distribution function and the finitely additive µ is defined on
the field F̄ of finite disjoint unions of all right-semiclosed intervals of R̄ as above. Then
µ is countably additive on F̄.

The proof will be given in the next class.

3
Lecture 12
From distribution function to measure
Lebesgue measure
December 23, 2020

Recall our three primary objects:


(i) A distribution function F on R which has been extended to a function on R̄.
(ii) The field F̄ of subsets of R̄ developed via S̄.
(iii) The set function µ on F̄ which is finitely additive.
We now prove that µ is countably additive on F̄.
Lemma 1. Suppose F is a distribution function and µ is the finitely additive measure
defined on F̄ of finite disjoint union of all right-semiclosed intervals of R̄ as in Equation
(6) of Lecture 11. Then µ is countably additive on F̄.

Proof. (a) First assume that F (∞) − F (−∞) < ∞, so that µ is finite. Let {An } be sets
from F̄ such that An ↓ ∅. To prove countable additivity, we need to show that µ(An ) ↓ 0.
Recall that each An is a finite disjoint union of intervals of the form (a, b]. Suppose
an ↓ a. Then for every fixed b, by right continuity,

µ(an , b] = F (b) − F (an ) → F (b) − F (a) = µ(a, b].

Thus for every An , we can find sets Bn ∈ F̄ such that


(i) closures B¯n (in R̄) of Bn are contained in An and
(ii) µ(Bn ) is as close to µ(An ) as we please.
So fix  > 0 and choose Bn such that

0 ≤ µ(An ) − µ(Bn ) ≤ .
2n
Observe that
∩∞ ¯ ∞
k=1 Bk ⊆ ∩k=1 Ak = ∅.

Since each B¯n is compact, it follows that there exists a finite m, such that ∩m ¯
k=1 Bk = ∅.
Then for all n ≥ m,

µ(An ) = µ An − ∩nk=1 Bk + µ(∩nk=1 Bk ), (since Bk ⊆ Ak for all k ≥ 1)




= µ An − ∩nk=1 Bk


≤ µ ∪nk=1 (Ak − Bk ) since Ak ↓




Xn
≤ µ(Ak − Bk ) since µ is finitely additive
k=1
≤ .

1
This shows that µ(An ) ↓ 0 if F (∞) − F (−∞) < ∞. Hence µ is countably additive in
this case.

(b) Now suppose F (∞) − F (−∞) = ∞. Define



F (x) if |x| ≤ n

Fn (x) = F (n) if x ≥ n (1)

F (−n) if x ≤ −n.

Define the set function µn on F̄ corresponding to Fn . These are all finite and hence
countably additive by (a). Moreover,

µn (A) ≤ µ(A) for all A ∈ F̄ and µn (A) → µ(A) as n → ∞. (2)

Now to show countable additivity of µ, let An ∈ F̄ be disjoint such that ∪∞


n=1 An = A ∈
F̄. Then clearly, finite additivity of µ implies,

X
µ(A) ≥ µ(An ). (3)
n=1
P∞
So if n=1 µ(An ) = ∞, the proof would be complete.
If ∞
P
n=1 µ(An ) < ∞, then

µ(A) = lim µn (A)


n→∞

X
= lim µn (Ak ), since each µn is countably additive.
n→∞
k=1

But then

X
0 ≤ µ(A) − µ(Ak ) (by 3))
k=1

X ∞
X
 
= lim µn (Ak ) − µ(Ak ) (note that µ(An ) < ∞)
n→∞
k=1 n=1
≤ 0 since µn ≤ µ.

This completes the proof of the lemma.

Exercise 1. (a) Prove the claim made in (i) and (ii) above in the course of the proof.
(b) Verify the step An − ∩nk=1 Bk ⊆ ∪nk=1 Ak − Bk claimed in the above proof.
 

Exercise 2. Verify the claim made in Equation (2) in the above proof.

Exercise 3. Verify equation (3).

2
Now we can state our theorem.
Theorem 1 (From distribution function to measure). Let F be a distribution function
on R and let µ(a, b] = F (b) − F (a), a < b. Then there is a unique extension µ which is
a Lebesgue-Stieltjes measure on B(R).

Proof. By the previous lemma, get countably additive measure µ on the field F̄. Note
that this is field of subsets of R̄. Define the set function on the field of finite disjoint of
right-semiclosed intervals of R, (treating (a, ∞) as right-semiclosed). Call this field F.
For example
µ(a, ∞) = F (∞) − F (a), a, b ∈ R
µ(−∞, b] = F (b) − F (−∞), a, b ∈ R
µ(R) = F (∞) − F (−∞).
Note that there is no other possible choice of µ on these sets.
Clearly this µ is countably additive on F. Further µ is σ-finite on F (but not necessarily
on F). Now use the Carathéodory extension theorem. Clearly the extended measure is
Lebesgue-Steiltjes. We omit the details.
Exercise 4. Check all the details in the proof of Theorem 1. Why could not we claim
that µ is σ-finite on R̄?

For any distribution function F , define


F (x−) = lim F (y), x ∈ R.
y↑x,y<x

Note that this limit exists since F is non-decreasing.


Exercise 5. Suppose F is a distribution function and µ is the corresponding Lebesgue-
Stieltjes measure. Show that for a < b, a, b ∈ R,
µ(a, b] = F (b) − F (a)
µ[a, b] = F (b) − F (a−)
µ(a, b) = F (b−) − F (a)
µ[a, b) = F (b−) − F (a−).
Further if F is continuous at a and b, then all four expressions above are equal.
Exercise 6. Suppose F is a distribution function and µ is the corresponding Lebesgue-
Stieltjes measure. Show that
µ(−∞, b] = F (b) − F (−∞)
µ[a, ∞) = F (∞) − F (a−)
µ(−∞, b) = F (b−) − F (−∞)
µ(a, ∞) = F (∞) − F (a)
µ(R) = F (∞) − F (−∞).

3
Exercise 7. Show that

µ{x} = F (x) − F (x−), for all x ∈ R.

Hence µ{x} = 0 if and only if F is continuous at x.

Exercise 8. If F is a distribution function show that the number of discontinuity points


of F is countable. Hint. First assume F (∞) − F (−∞) < ∞.

Definition 1 (Lebesgue measure on R). The measure λ which satisfies λ(a, b] = b − a


for all a, b ∈ R, a < b, is said to be the Lebesgue measure on B(R). Usually we simply
say λ is the Lebesgue measure on R.

Note that λ is obtained by taking F (x) = x + c where c is a constant and is uniquely


defined, irrespective of what c is.

4
Lecture 13
More on distributions and Lebesgue-Stieltjes measures
Lebesgue measure on Rn
Approximations from within by compact sets (σ-finite measure)
Approximation from above by open sets (finite or Lebesgue-Stieltjes measure)
December 23, 2020

Exercise 1. Suppose f is a non-negative continuous function on R. Fix F (0) arbitrarily


and define
Z x
F (x) = F (0) + f (t)dt, for x > 0
0
Z x
F (x) = F (0) − f (t)dt, for x < 0.
0

Show that F is a distribution function. The corresponding Lebesgue-Stieltjes measure is


given by
Z b
µ(a, b] = f (t)dt, a, b ∈ R.
a
In particular µ does not depend on the value of F (0).

Exercise 2. If F is a non-decreasing right continuous function on the closed interval


[a, b] then show that there is a unique measure µ defined on the Borel sets of [a, b] such
that µ(x, y] = F (y) − F (x) for all a ≤ x < y ≤ b.

Definition 1. Any measure µ is said to be concentrated on B if µ(Ω − B) = 0.

Remark 1. Let S = {x1 , x2 , . . .} be a countable subset of R. Let µ be a measure which


is concentrated on S and let µ{xi } = ai > 0, i ≥ 1. Then
P
(i) µ is a Lebesgue-Stieltjes measure if for every finite length interval I, xi ∈I ai < ∞.
(ii) µ is a finite measure if ∞
P
i=1 ai < ∞.

(iii) Suppose µ is Lebesgue-Stieltjes. Then the distribution function F corresponding to


µ is continuous at x if and only if x ∈/ S.
(iv) Suppose µ is Lebesgue-Stieltjes. Then at every xi ∈ S, F jumps by an amount ai .
(v) Suppose µ is Lebesgue-Stieltjes. If x < y, x, y ∈ S and there is no point between x
and y which is in S, then F is constatnt on [x, y).
(v) Let S be the set of rationals. The above discussion yields innumerably many non-
decreasing functions F which jump at only the rationals and is continuous everywhere
else.

Exercise 3. Suppose µ is a measure (not necessarily Lebesgue-Stieltjes) on B(R) which


is concentrated on a countable set S = {x1 , x2 , . . .}. Define F (x) = µ(−∞, x], x ∈ R.
Explore the properties of F .

1
Remark 2. (a) The construction of Lebesgue-Stieltjes measure on B(Rn ) can be
carried out in a similar fashion by starting with a function F which is non-decreasing
and right continuous at every co-ordinate. Note that we will first need to extend the
notion of F (b) − F (a) that we had when a ≤ b ∈ R. We omit the details for now.

(b) For the special case where F is of the form

F (a1 , a2 , . . . , an ) = F1 (a1 ) · · · Fn (an ), ai ∈ R, i = 1, 2, . . . , n,

where each Fi (·) is non-decreasing and right continuous, the construction of the measure
is a bit simpler. One can start with rectangles which are product of right-semiclosed
intervals and defined µ in the natural way.

(b) The existence of Lebesgue measure on Rn follows from (b). This is really a special
case of the product measure construction, starting with the Lebesgue measure on
R. We shall cover this later.
Exercise 4. Learn in details from the book, the concepts and related developments on
distribution functions and Lebesgue-Stieltjes measures on Rn .
Definition 2 (Lebesgue measure on Rn). The Lebesgue measure λn on B(Rn) is the unique measure for which

λn((a1, b1] × · · · × (an, bn]) = ∏_{i=1}^n (bi − ai), ai < bi, ai, bi ∈ R, i = 1, . . . , n.

Theorem 1 (Approximation theorem). Suppose µ is a σ-finite measure on B(Rn). Then

(a) (Approximation from within)

µ(B) = sup{µ(K) : K ⊆ B, K compact}, for every B ∈ B(Rn).   (1)

(b) (Approximation from above) If µ is a finite measure, then

µ(B) = inf{µ(V) : V ⊇ B, V open}, for every B ∈ B(Rn).   (2)

(c) If µ is any Lebesgue-Stieltjes measure, then (2) continues to hold.

(d) If µ is not a Lebesgue-Stieltjes measure, then (2) need not hold.

Proof. (a) First suppose µ is finite. Let

C = {B ∈ B(Rn) : (1) holds}.   (3)

Then we claim that C is a monotone class.

Suppose Bn ∈ C, Bn ↑ B. Fix ε > 0. Let Kn ⊆ Bn be compact sets such that

µ(Bn) ≤ µ(Kn) + ε.

By replacing Kn with ∪_{k=1}^n Kk (which is compact), we can assume that {Kn} is a non-decreasing sequence. Then

lim_{n→∞} µ(Kn) ≤ µ(B) = lim_{n→∞} µ(Bn) ≤ lim_{n→∞} µ(Kn) + ε,

so that (1) holds for B and hence B ∈ C.

Now suppose Bn ∈ C, Bn ↓ B. Let Kn be compact subsets of Bn such that

µ(Bn) ≤ µ(Kn) + ε/2^n.

Let K = ∩_{n=1}^∞ Kn. Note that K is compact and K ⊆ B. Then

µ(B) − µ(K) = µ(B − K)   (since K ⊆ B)
≤ µ(∪_{n=1}^∞ (Bn − Kn))   (since B ⊆ Bn for all n, so B − K ⊆ ∪_{n=1}^∞ (Bn − Kn))
≤ Σ_{n=1}^∞ µ(Bn − Kn)   (countable sub-additivity)
≤ ε.

Hence B ∈ C. Thus C is a monotone class.


Now it can be easily checked that C contains all right-semiclosed intervals and also their
finite disjoint unions, which forms a field that generates B(Rn ). Thus by the monotone
class theorem, C = B(Rn ). This proves (a) for the case when µ is finite.
Now suppose µ is σ-finite. Get Bn ∈ B(Rn) such that µ(Bn) < ∞ and Bn ↑ Rn. For any B ∈ B(Rn), µ(B ∩ Bn) ↑ µ(B), so it is enough to approximate each B ∩ Bn from within by compact sets. To see this, consider the finite measures µn(A) = µ(A ∩ Bn), A ∈ B(Rn), and proceed as in the finite case. Along the way, you need to observe that finite unions of compact sets are compact. This completely proves (a).

(b) We have

µ(B) ≤ inf{µ(V) : V ⊇ B, V open}
≤ inf{µ(W) : W ⊇ B, W = K^c, K compact}
= inf{µ(Rn) − µ(W^c) : W^c ⊆ B^c, W^c compact}
= µ(Rn) − sup{µ(K) : K ⊆ B^c, K compact}
= µ(Rn) − µ(B^c)   (by (a), since µ is finite)
= µ(B).

(c) Write Rn = ∪∞
n=1 Bn where {Bn } are disjoint bounded sets. Then for each n, Bn ⊆ Cn
for some bounded open sets Cn . Define the finite measures

µk (A) = µ(A ∩ Ck ), A ∈ B(Rn ).

Fix ε > 0. If B is a Borel subset of Bk, then by (b), there is an open set Wk ⊇ B such that

µk(Wk) ≤ µk(B) + ε/2^k.   (4)

Now note that Vk = Wk ∩ Ck is an open set and B ∩ Ck = B since B ⊆ Bk ⊆ Ck. Hence

µ(Vk) = µk(Wk)   (by definition of µk)
≤ µk(B) + ε/2^k   (by (4))
= µ(B) + ε/2^k   (since B ⊆ Ck).

Now fix any A ∈ B(Rn). By the conclusion reached above, let Vk be an open set such that

Vk ⊇ A ∩ Bk and µ(Vk) ≤ µ(A ∩ Bk) + ε/2^k for all k.

Let V = ∪_{k=1}^∞ Vk. Then using the above inequality, V is open, V ⊇ A and

µ(V) ≤ Σ_{k=1}^∞ µ(Vk) ≤ µ(A) + ε.

This proves (c).

(d) Let S = {1, 1/2, 1/3, . . .} and let µ be the measure concentrated on S with µ{1/n} = 1/n for all n. Clearly µ is σ-finite and µ is not a Lebesgue-Stieltjes measure. Consider the set B = {0}. Then µ(B) = 0, but any open set containing B has infinite measure, so (2) fails.

Exercise 5. Consider the set function

µ(A) = Number of rational points in A.

Show that µ is not a Lebesgue-Stieltjes measure and approximation from above fails.

Lecture 14
Towards integration with respect to measures
Measurable functions
December 23, 2020

If f is a continuous function on a closed interval [a, b], we understand what we mean by the area under the curve of f, that is, the Riemann integral ∫_a^b f(t) dt. Note that the “length measure” is involved in the computation of this integral.

Now consider the function f on [0, 1]:

f(x) = 1 if x ∈ [0, 1] is irrational, and f(x) = 2 if x ∈ [0, 1] is rational.   (1)

Note that f is not Riemann integrable on [0, 1].

On the other hand, f takes the value 2 on the countable set of all rationals and this set has Lebesgue measure 0. It has the constant value 1 on a set which has Lebesgue measure 1. Thus it is tempting to conclude that ∫_0^1 f(t) dt = 1 in some meaningful way.
Our goal will now be to develop a theory of integration. It will tell us which functions
can be integrated with respect to a measure, and the properties of such integrals. This
integral should of course yield the Riemann integral as and when the latter is defined.
Definition 1 (Measurable space and measurable function). If A is a σ-field of subsets of
Ω then the pair (Ω, A) is called a measurable space. If in addition there is a measure
µ on A, then the triplet (Ω, A, µ) is called a measure space.
Suppose (Ωi , Ai ) i = 1, 2 are two measurable spaces. Any function f : Ω1 → Ω2 is said to
be measurable if f −1 (A2 ) ∈ A1 for every A2 ∈ A2 . To indicate that f is measurable,
we write f : (Ω1 , A1 ) → (Ω2 , A2 ). If in particular, Ω2 = R and A2 = B(R) then a
measurable f is called Borel measurable.

Note that the measurability of a function is determined by what the underlying σ-fields
are, and not on any measure that may be defined on the σ-fields. If these σ-fields are
clear from the context, we may not mention them.
Exercise 1. Suppose f : Ω1 → Ω2 is any function. Suppose {Aα, α ∈ I} and {Bα, α ∈ I} are subsets of Ω1 and Ω2 respectively. Show that

f(∪_{α∈I} Aα) = ∪_{α∈I} f(Aα),
f(∩_{α∈I} Aα) ⊆ ∩_{α∈I} f(Aα)   (equality may not hold),
(f(Aα))^c = f(Aα^c)   does not hold in general,
f^{-1}(Bα^c) = (f^{-1}(Bα))^c,
f^{-1}(∪_{α∈I} Bα) = ∪_{α∈I} f^{-1}(Bα),
f^{-1}(∩_{α∈I} Bα) = ∩_{α∈I} f^{-1}(Bα).

Exercise 2. Suppose f : Ω1 → Ω2 with σ-fields Ai , i = 1, 2 respectively. Suppose C is a
class of subsets of Ω2 such that σ(C) = A2 . Then f is measurable if and only if for all
C ∈ C, f −1 (C) ∈ A1 .

Exercise 3. Give an example of a function f : (Ω1 , A1 ) → (Ω2 , A2 ) such that for some
set A1 ∈ A1 , f (A1 ) ∈
/ A2 .

Exercise 4. Suppose f : R → R is continuous. Show that f is measurable with respect


to the Borel σ-fields.

Definition 2 (Indicator function). The function I : Ω → R defined below is called the indicator function of the set A ⊆ Ω:

I(ω) = 1 if ω ∈ A, and I(ω) = 0 if ω ∉ A.   (2)

Exercise 5. When is the indicator function Borel measurable?

Exercise 6. Suppose f : Ω → R is a function and Ω is equipped with the σ-field A.


Show that the following statements are equivalent:

(a) f is Borel measurable


(b) For every c ∈ R, {ω ∈ Ω : f (ω) > c} ∈ A.
(c) For every c ∈ R, {ω ∈ Ω : f (ω) ≥ c} ∈ A.
(d) For every c ∈ R, {ω ∈ Ω : f (ω) < c} ∈ A.
(e) For every c ∈ R, {ω ∈ Ω : f (ω) ≤ c} ∈ A.
(f) For every a, b ∈ R, {ω ∈ Ω : a ≤ f(ω) ≤ b} ∈ A.
Construct other statements that are equivalent to the above.

Terminology: We often say “f is Borel measurable on (Ω, A, µ)”. This means that f : (Ω, A) → (R, B(R)) is measurable and (Ω, A, µ) is a measure space.

Lecture 15
More on Measurable functions
Integration with respect to measures
December 23, 2020

Definition 1 (Simple function). A function s : (Ω, A) → R̄ is called simple if its range is finite. It can always be written as

s = Σ_{i=1}^n xi I_{Ai}, where {Ai} are disjoint sets from A and ∪_{i=1}^n Ai = Ω.

Note that {xi} need not be distinct. In other words a simple function is allowed to have more than one representation. Since {Ai} are disjoint, both ∞ and −∞ are allowed as possible values of {xi}.

Definition 2 (Integration of a simple function). If s = Σ_{i=1}^n xi I_{Ai} is a simple function on (Ω, A, µ) then we define

∫_Ω s dµ = Σ_{i=1}^n xi µ(Ai),

as long as both ∞ and −∞ do not appear in the set of values of {xi}.

Exercise 1. Show that the above integral is well-defined: all representations of s yield the same value.
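A small Python sketch may help fix ideas (illustrative only; representing the simple function by the pairs (xi, µ(Ai)) is just a convenient encoding, and the numbers below are hypothetical). It computes Σ xi µ(Ai) while respecting the convention 0 · ∞ = 0 and refusing the undefined combination ∞ − ∞.

import math

def integral_simple(pairs):
    """Return sum_i x_i * mu(A_i) for pairs (x_i, mu(A_i)), using the
    convention 0 * inf = 0 and rejecting (+inf) + (-inf)."""
    pos, neg = 0.0, 0.0
    for x, m in pairs:
        term = 0.0 if (x == 0 or m == 0) else x * m   # convention 0 * inf = 0
        if term > 0:
            pos += term
        else:
            neg += -term
    if math.isinf(pos) and math.isinf(neg):
        raise ValueError("integral undefined: both +inf and -inf appear")
    return pos - neg

# s takes value 3 on a set of measure 2, value -1 on a set of measure 5, 0 elsewhere
print(integral_simple([(3, 2.0), (-1, 5.0), (0, math.inf)]))   # 3*2 - 1*5 = 1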

Remark 1. Note the following:

(i) The integral may equal either ∞ or −∞.

(ii) If s is a non-negative simple function, then ∫_Ω s dµ ≥ 0 and may equal ∞.

(iii) (Monotonicity) If s1 and s2 are two simple functions whose integrals are defined and s1 ≤ s2, then ∫_Ω s1 dµ ≤ ∫_Ω s2 dµ. Note that the left side may equal −∞ and the right side may equal ∞.

Exercise 2. Suppose s1 and s2 are simple functions.

(a) When are s1 + s2 and s1 s2 well-defined? Are they simple whenever they are well-defined?

(b) Is ∫_Ω (s1 + s2) dµ defined whenever ∫_Ω s1 dµ and ∫_Ω s2 dµ are defined? What is the relation between the three integrals? Note: Be careful about the arithmetic of ±∞.

Definition 3 (Integration of a non-negative measurable function). Suppose f : (Ω, A) → R̄ is a non-negative measurable function. Then we define

∫_Ω f dµ = sup{ ∫_Ω s dµ : s is simple and 0 ≤ s ≤ f }.

So, here we again see the idea of approximating from below. Note that this definition
is unambiguous and is a true extension of the definition of integral for simple functions.
Also, the value of the integral can be equal to ∞.
Exercise 3. Suppose f : (Ω, A) → (R̄, B(R̄)) is a non-negative measurable function.

(a) Show that there exists a sequence of simple functions {sn} such that sn(ω) ↑ f(ω) for all ω ∈ Ω. Hint: Truncate R and split it into intervals with dyadic rational end points and take inverse images.

(b) What do you think will happen to lim_{n→∞} ∫_Ω sn dµ?

Now we will move to define integrals of functions f which are not necessarily non-
negative. We use the familiar idea of splitting f into positive and negative parts.
Definition 4 (Positive and negative parts). For any f : (Ω, A) → (R̄, B(R̄)), define

f^+(ω) = f(ω) if f(ω) ≥ 0, and f^+(ω) = 0 otherwise;   (1)

f^−(ω) = −f(ω) if f(ω) ≤ 0, and f^−(ω) = 0 otherwise.   (2)

The functions f^+ and f^− are called the positive and negative parts of f respectively.
Exercise 4. Show that
(i) Both f^+ and f^− are non-negative measurable functions.
(ii) |f| = f^+ + f^−.
(iii) f = f^+ − f^−.
Exercise 5. If f : (Ω, A) → (R̄, B(R̄)) and A ∈ A, then show that f IA is also measur-
able.
Definition 5. For any f : (Ω, A) → (R̄, B(R̄)), define

∫_Ω f dµ = ∫_Ω f^+ dµ − ∫_Ω f^− dµ, provided not both integrals are ∞.

In this case we say that ∫_Ω f dµ exists. Otherwise we say that ∫_Ω f dµ does not exist. If the integral exists and is finite, then we say that f is µ-integrable, or simply integrable if the measure µ is clear from the context. We shall often suppress Ω in the notation of the integral. For any set A ∈ A, define

∫_A f dµ = ∫_Ω f I_A dµ.

Lecture 16
More on measurable functions
Projection map
Basic properties of the integral
Monotone Convergence Theorem (MCT)
December 30, 2020

Lemma 1. Suppose {fn} is a sequence of Borel measurable functions from Ω to R̄ such that f(ω) := lim_{n→∞} fn(ω) exists for all ω ∈ Ω. Then f is Borel measurable.

Proof. This follows since for every x ∈ R, the following string of equalities holds:

{ω : f(ω) > x} = {ω : lim_{n→∞} fn(ω) > x}
= {ω : fn(ω) is eventually > x + 1/k, for some k = 1, 2, . . .}
= ∪_{k=1}^∞ {ω : fn(ω) > x + 1/k for all but finitely many n}
= ∪_{k=1}^∞ lim inf_{n→∞} {ω : fn(ω) > x + 1/k}
= ∪_{k=1}^∞ ∪_{n=1}^∞ ∩_{t=n}^∞ {ω : ft(ω) > x + 1/k}.
k

Exercise 1. Suppose {fn } is a sequence of Borel measurable functions from Ω to R̄.


Show that the functions lim sup fn and lim inf fn are measurable.
Lemma 2 (Approximation by simple function). (a) Suppose f is a non-negative Borel
measurable function from Ω to R̄. Then there exists a sequence of non-decreasing, non-
negative finite-valued simple functions sn such that sn ↑ f .

(b) Suppose f is any Borel measurable function from Ω to R̄. Then there exists a sequence
of finite-valued simple functions {sn } such that |sn | ≤ |f | and sn → f point-wise.

(c) Suppose f is any bounded Borel measurable function from Ω to R̄. Show that {sn }
in (b) above can be chosen such that sn → f uniformly on Ω.

Proof. (a) Define the simple functions

sn = Σ_{k=1}^{n2^n} ((k−1)/2^n) I_{Ak} + n I_B, where Ak = f^{-1}([(k−1)/2^n, k/2^n)), k = 1, . . . , n2^n, and B = Ω − ∪_{k=1}^{n2^n} Ak.

It can be checked that the sequence {sn } has the desired properties.

(b) Consider the two non-negative measurable functions f + and f − . Choose simple
functions s1n and s2n respectively for them as in (a). Then let sn = s1n − s2n . It is easy
to check that {sn } has the desired properties. This also serves as proof for (c).
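The construction in part (a) is easy to experiment with pointwise. The Python sketch below (illustrative only; the function f is an arbitrary non-negative choice) evaluates sn at a few points and shows the values increasing to f(ω).

import math

def s_n(fw, n):
    """Dyadic approximation of Lemma 2(a): (k-1)/2^n on A_k, and n where f >= n."""
    if fw >= n:
        return n
    k_minus_1 = math.floor(fw * 2**n)     # index of the dyadic interval containing f(w)
    return k_minus_1 / 2**n

f = lambda w: w**2          # an arbitrary non-negative function
for w in (0.3, 1.7, 10.0):
    vals = [s_n(f(w), n) for n in (1, 2, 4, 8, 12)]
    print(f"f(w) = {f(w):.4f}; s_n(w) = {vals}")   # non-decreasing in n, increasing to f(w)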

Exercise 2. Suppose f1 and f2 are the constant functions 1 and 2 respectively. What
are the corresponding approximating simple functions as constructed in the above proof ?
Exercise 3. Using the above results, show that if f1 and f2 are Borel measurable func-
tions from Ω to R̄, then f1 + f2 and f1 /f2 are also so, provided they are defined.
Exercise 4 (Composition of measurable functions). Suppose f1 : (Ω1 , A1 ) → (Ω2 , A2 ),
f2 : (Ω2 , A2 ) → (Ω3 , A3 ) are measurable. Show that f2 ◦ f1 : (Ω1 , A1 ) → (Ω3 , A3 ) is
measurable.
Lemma 3 (Projection maps). (a) Suppose pi : R̄^n → R̄ is defined as pi(x1, . . . , xi, . . . , xn) = xi. Then pi is Borel measurable for 1 ≤ i ≤ n.

(b) Suppose f : Ω → R̄^n. Then f is Borel measurable if and only if fi = pi ◦ f is Borel measurable for all 1 ≤ i ≤ n.

The proof is left as an exercise.


Theorem 1. (a) If ∫ f dµ exists then for every constant c, ∫ cf dµ exists, and ∫ cf dµ = c ∫ f dµ.

(b) Suppose f1, f2 are two measurable functions such that f1 ≥ f2. Then
(i) If ∫ f2 dµ exists and ∫ f2 dµ > −∞, then ∫ f1 dµ exists and ∫ f1 dµ ≥ ∫ f2 dµ.
(ii) If ∫ f1 dµ exists and ∫ f1 dµ < ∞, then ∫ f2 dµ exists and ∫ f1 dµ ≥ ∫ f2 dµ.
(iii) If both integrals ∫ f1 dµ and ∫ f2 dµ exist, then ∫ f1 dµ ≥ ∫ f2 dµ.

(c) If ∫ f dµ exists then |∫ f dµ| ≤ ∫ |f| dµ.

(d) If f is non-negative and A ∈ A, then

∫_A f dµ = sup{ ∫_A s dµ : 0 ≤ s ≤ f, s simple }.

(e) If ∫ f dµ exists, then so does ∫_A f dµ for every A ∈ A. If the first integral is finite, then so is the second integral for every A.

Proof is left as an exercise.


Theorem 2 (Integral is countably additive). Suppose f is Borel measurable such that ∫ f dµ exists. Define

ν(B) = ∫_B f dµ, B ∈ A.

Then ν is a countably additive set function. Hence, if further f is non-negative, then ν is a measure.

Note that in general ν is not non-negative. However, it can take only one value out of ∞ and −∞ since ∫ f dµ exists.

Proof. The claim can be easily verified when f is a simple function.

So then let f be a non-negative measurable function. Fix any simple function s such that 0 ≤ s ≤ f. Let {Bn} be disjoint sets from A and B = ∪_{n=1}^∞ Bn. Then

∫_B s dµ = Σ_{n=1}^∞ ∫_{Bn} s dµ   (since s is simple)
≤ Σ_{n=1}^∞ ∫_{Bn} f dµ   (by monotonicity of the integral).

Now, taking the supremum over all such simple functions,

ν(B) ≤ Σ_{n=1}^∞ ν(Bn).

If ν(B) = ∞, then the proof is complete.

So suppose ν(B) < ∞. Observe that Bn ⊆ B for all n. Hence I_{Bn} ≤ I_B for all n. This implies ν(Bn) ≤ ν(B) < ∞. Fix n and ε > 0. By definition of the integral, and the fact that the maximum of a finite number of simple functions is again simple, get a simple function s, 0 ≤ s ≤ f, such that

∫_{Bi} s dµ ≥ ∫_{Bi} f dµ − ε/n, i = 1, 2, . . . , n.

Then

ν(B1 ∪ · · · ∪ Bn) = ∫_{∪_{i=1}^n Bi} f dµ
≥ ∫_{∪_{i=1}^n Bi} s dµ
= Σ_{i=1}^n ∫_{Bi} s dµ   (since s is simple)
≥ Σ_{i=1}^n ∫_{Bi} f dµ − ε
= Σ_{i=1}^n ν(Bi) − ε.

Hence

ν(B) ≥ ν(B1 ∪ · · · ∪ Bn) ≥ Σ_{i=1}^n ν(Bi) − ε → Σ_{i=1}^∞ ν(Bi) − ε as n → ∞.

Since ε is arbitrary, we have proved that ν is countably additive when f is non-negative.


For general f , we can write f = f + − f − and proceed. Details are left as an exercise
but note that we need the hypothesis that the integral exists (so either f + or f − is
integrable).

In the above proof ν(B) = ν + (B) − ν − (B) where ν + and ν − are measures and at least
one of them is finite.

Theorem 3 (Monotone Convergence Theorem). Suppose {fn} is an increasing sequence of non-negative Borel measurable functions such that fn ↑ f. Then ∫ fn dµ ↑ ∫ f dµ.

Proof. Let v := lim_{n→∞} ∫ fn dµ. Then we know that v ≤ ∫ f dµ.

Fix 0 < b < 1. Suppose s is a non-negative simple function, 0 ≤ s ≤ f. Let

Bn = {ω : fn(ω) ≥ b s(ω)}.

Then Bn ↑ Ω. Now

v ≥ ∫ fn dµ ≥ ∫_{Bn} fn dµ ≥ b ∫_{Bn} s dµ → b ∫_Ω s dµ as n → ∞,

by Theorem 2 and continuity from below. Letting b → 1 gives v ≥ ∫_Ω s dµ. Now taking supremum over all possible s, we get v ≥ ∫ f dµ and the proof is complete.
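For the counting measure on {1, 2, 3, . . .}, integrals are sums and MCT is just the statement that partial sums of a non-negative series increase to the total sum. The Python sketch below (illustrative only; the choice f(k) = 1/k^2 is arbitrary) shows the integrals of fn = f I_{{1,...,n}} increasing to ∫ f dµ.

f = lambda k: 1.0 / k**2              # an arbitrary non-negative function on the integers
total = sum(f(k) for k in range(1, 200001))   # numerical proxy for the integral of f

for n in (1, 10, 100, 1000, 10000):
    partial = sum(f(k) for k in range(1, n + 1))   # integral of f_n = f * I_{1..n}
    print(n, round(partial, 6))                    # increases with n
print("limit (pi^2/6 = %.6f):" % (3.141592653589793**2 / 6), round(total, 6))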

Lecture 17
Further properties of the integral
Almost sure (almost everywhere)
December 31, 2020

Notation:
For any measurable function f we shall write ∫ f dµ for ∫_Ω f dµ.
The acronym MCT will stand for Monotone Convergence Theorem.

Exercise 1. Construct fn such that fn ↑ f, integrals of all these functions exist, but ∫ fn dµ does not increase to ∫ f dµ.

Exercise 2. Construct non-negative fn such that fn ↓ f, but ∫ fn dµ does not decrease to ∫ f dµ.

Exercise 3. Suppose {x_{n,k}}, n, k = 1, 2, . . . is an array of non-negative real numbers such that

x_{n,k} ≤ x_{n+1,k} for all n, k,
x_{n,k} ↑ x_k as n → ∞, k = 1, 2, . . .

Show that

Σ_{k=1}^∞ x_{n,k} ↑ Σ_{k=1}^∞ x_k.

Exercise 4. Suppose {xn} is a sequence of non-decreasing and non-negative real numbers such that xn ↑ x. By using the MCT, show that

(1 + xn/n)^n → e^x.
Theorem 1. Suppose f and g are Borel measurable and f + g is well defined. Further suppose that ∫ f dµ and ∫ g dµ exist and ∫ f dµ + ∫ g dµ is also well-defined. Then ∫(f + g) dµ exists and

∫(f + g) dµ = ∫ f dµ + ∫ g dµ.

In particular, if f and g are integrable then f + g is integrable.

Proof. (i) We already know the result if f and g are simple functions.

(ii) Suppose f and g are non-negative functions. Let an and bn be non-negative simple functions which increase to f and g respectively. Then 0 ≤ sn := an + bn ↑ f + g. Hence by MCT, ∫ sn dµ ↑ ∫(f + g) dµ. Moreover

∫ sn dµ = ∫ an dµ + ∫ bn dµ   (by (i))
↑ ∫ f dµ + ∫ g dµ   (by MCT).

We have already observed that the left side increases to ∫(f + g) dµ. This proves the result for this special case.
(iii) Now suppose f ≥ 0 and g ≤ 0 and h = f + g ≥ 0. This implies g is finite. Note that f = h + (−g) is the sum of two non-negative functions. Hence

∫ f dµ = ∫ h dµ + ∫(−g) dµ   (by (ii))
= ∫ h dµ − ∫ g dµ   (by Theorem 1 (a) of Lecture 16).

If ∫ g dµ is finite then from the above the result follows. If ∫ g dµ = −∞, then since h ≥ 0,

∫ f dµ ≥ −∫ g dµ = ∞,

contradicting the hypothesis that ∫ f dµ + ∫ g dµ is defined. Thus this case does not arise at all. So, the result is proved in this case.

(iv) If f ≥ 0 and g ≤ 0 and h = f + g ≤ 0, then we can work with −f and −g and apply (iii).
(v) Now we prove the general case by splitting the range of the functions so as to apply (ii), (iii) and (iv). Let

E1 = {ω : f(ω) ≥ 0, g(ω) ≥ 0},
E2 = {ω : f(ω) < 0, g(ω) < 0},
E3 = {ω : f(ω) ≥ 0, g(ω) < 0, h(ω) ≥ 0},
E4 = {ω : f(ω) ≥ 0, g(ω) < 0, h(ω) < 0},
E5 = {ω : f(ω) < 0, g(ω) ≥ 0, h(ω) ≥ 0},
E6 = {ω : f(ω) < 0, g(ω) ≥ 0, h(ω) < 0}.

Now,

∫_{Ei} h dµ = ∫_{Ei} f dµ + ∫_{Ei} g dµ, for all i = 1, . . . , 6, by (ii), (iii) and (iv),
∫ f dµ = Σ_{i=1}^6 ∫_{Ei} f dµ   (by Theorem 2 of Lecture 16),
∫ g dµ = Σ_{i=1}^6 ∫_{Ei} g dµ   (by Theorem 2 of Lecture 16),
∫ f dµ + ∫ g dµ = Σ_{i=1}^6 ∫_{Ei} h dµ   (using the above three equations)
= ∫ h dµ   (by Theorem 2 of Lecture 16, provided it exists).

To check that ∫ h dµ exists, we need to show that at least one of ∫ h^+ dµ and ∫ h^− dµ is finite. Suppose, if possible,

∫ h^+ dµ = ∫ h^− dµ = ∞.

In that case there must exist Ei and Ej such that

∫_{Ei} h^+ dµ = ∫_{Ej} h^− dµ = ∞.

But then

∫_{Ei} f dµ = ∞ or ∫_{Ei} g dµ = ∞,

and hence

∫ f dµ = ∞ or ∫ g dµ = ∞.   (1)

Similarly, working with j,

∫ f dµ = −∞ or ∫ g dµ = −∞.   (2)

Equations (1) and (2) go against the assumption that ∫ f dµ + ∫ g dµ is defined. This completes the proof.

The following remark follows from the above theorem.

Remark 1. (a) If {fn} are non-negative measurable then

∫ Σ_{i=1}^∞ fi dµ = Σ_{i=1}^∞ ∫ fi dµ.

This can be proved by using MCT.

(b) For any Borel measurable f, |f| is integrable if and only if f is integrable. This follows since |f| = f^+ + f^−.

(c) If g and h are Borel measurable and |g| ≤ h where h is integrable, then g is integrable. This follows from the monotonicity of integrals and (b).

Exercise 5. Suppose {x_{n,k}}, n, k = 1, 2, . . . is an array of non-negative real numbers. Show by using Remark 1 (a) that

Σ_{k=1}^∞ (Σ_{n=1}^∞ x_{n,k}) = Σ_{n=1}^∞ (Σ_{k=1}^∞ x_{n,k}).

Definition 1 (Almost surely or almost everywhere). A property P(·) defined on (Ω, A, µ) is said to hold almost surely [µ] if there exists a µ-null set A such that it holds for all ω ∈ Ac. We also write µ-almost surely, or simply a.s. or a.e.

Remark 2. The definition just says that the property P holds outside some null set.
It is silent about what happens on the chosen null set. And so, P (ω) is allowed to hold
for some ω in the chosen null set A. It is plausible that the set of ω ∈ Ω for which P (ω)
holds, may not be in A. Also, the almost surely or almost everywhere is with respect to
a given measure µ.

Example 1. Suppose f and g from (Ω, A, µ) to (Rn , B(Rn )) are measurable. Show that
A = {ω : f (ω) 6= g(ω)} ∈ A. Show that the functions f and g are equal almost surely if
and only if µ(A) = 0.

Exercise 6. Consider the measure space ([0, 1], B([0, 1]), λ). Define

f(x) = 4 if x is irrational, f(x) = 2 if x is rational and x < 1, and f(x) = ∞ if x = 1.   (3)

Show that
(i) f is measurable.
(ii) ∫ f dλ = 4.
(iii) f is not Riemann integrable.
(iv) f = 4 almost surely.

Theorem 2. Suppose f and g are Borel measurable functions.

(a) If f = 0 µ-almost surely, then ∫ f dµ = 0.

(b) If f = g almost surely [µ], and ∫ f dµ exists, then ∫ g dµ exists and ∫ f dµ = ∫ g dµ.

(c) If f is µ-integrable, then f is finite almost everywhere [µ].

(d) If f ≥ 0 and ∫ f dµ = 0, then f = 0 a.e. µ.

Proof. (a) Suppose first that f ≥ 0 (everywhere). Take any s, 0 ≤ s ≤ f. Then s = 0 a.s. It is then easy to check that ∫ s dµ = 0. This implies ∫ f dµ = 0.

For the general case first note that |f| = 0 a.e. Hence ∫ |f| dµ = 0. But then |∫ f dµ| ≤ ∫ |f| dµ = 0.
(b) Let A = {ω : f (ω) = g(ω)}, B = Ac . Note that B ∈ A and µ(B) = 0. Now,

g = gIA + gIB ,

f = f IA + f IB ,
gIB = f IB = 0 almost surely.

Now the result follows from the additivity of integrals and part (a).
(c) Let A = {ω : |f|(ω) = ∞}. If µ(A) > 0, then ∫ |f| dµ ≥ ∞ · µ(A) = ∞, which is a contradiction. Hence (c) is proved.

(d) Let B = {ω : f(ω) > 0} and Bn = {ω : f(ω) > 1/n}, n = 1, 2, . . .. Then Bn ↑ B. Now 0 ≤ f I_{Bn} ≤ f I_B. Hence,

0 ≤ ∫_{Bn} f dµ ≤ ∫_B f dµ ≤ ∫ f dµ = 0.

On the other hand,

0 = ∫_{Bn} f dµ ≥ ∫_{Bn} (1/n) dµ = (1/n) µ(Bn).

Hence µ(Bn) = 0 and this in turn implies that µ(B) = 0.

Exercise 7. Suppose f and g are Borel measurable functions such that f ≤ g a.e. µ. Show that ∫ f dµ ≤ ∫ g dµ whenever both the integrals exist.

Exercise 8. Suppose fn is a non-decreasing sequence of non-negative Borel measurable functions such that fn ↑ f almost surely. Show that ∫ fn dµ ↑ ∫ f dµ.

Lecture 18
Fatou’s lemma
Dominated Convergence Theorem (DCT)
January 6, 2021

We have seen that interchanging the order of limit and integration, the order of sum-
mation in a double sum etc. are valid under a “non-negativity” assumption. This is
inadequate for applications and we now extend these results significantly with two very
important results.
We first upgrade the MCT by relaxing the non-negativity assumption. We shall continue
to call this theorem MCT. Before that, a small exercise:

Exercise 1. Suppose fn are non-negative (everywhere) such that fn ↑ f a.e. Then ∫ fn dµ ↑ ∫ f dµ.

Theorem 1 (Extended MCT). Suppose {fn} and g are Borel measurable functions.

(a) If fn ≥ g for all n, ∫ g dµ > −∞, and fn ↑ f, then ∫ fn dµ ↑ ∫ f dµ.

(b) If fn ≤ g for all n, ∫ g dµ < ∞, and fn ↓ f, then ∫ fn dµ ↓ ∫ f dµ.

Proof. (a) If ∫ g dµ = ∞ then by Theorem 1 (b)(i) of Lecture 16, ∫ fn dµ = ∞ and ∫ f dµ = ∞. So assume that ∫ g dµ < ∞. Then g is integrable and by Theorem 2 (c) of Lecture 17, g is finite almost everywhere. Redefine g to be 0 on the set {ω : g(ω) = ±∞} (we continue to call this function g). Then 0 ≤ fn − g ↑ f − g a.e. Hence ∫(fn − g) dµ ↑ ∫(f − g) dµ. Note that now we can apply the additivity Theorem 1 of Lecture 17 (check that the conditions of that theorem are satisfied), and the result follows.

(b) Consider −fn and −g and apply (a).

Exercise 2. Show that the above theorem continues to hold if we add the clause “almost
surely” at every hypothesis.

Recall that for any sequence of measurable functions {fn},

lim sup_{n→∞} fn(ω) = inf_{n≥1} sup_{k≥n} fk(ω),
lim inf_{n→∞} fn(ω) = sup_{n≥1} inf_{k≥n} fk(ω).

We know that these are extended real-valued functions and are Borel measurable.

Theorem 2 (Fatou’s Lemma). Suppose {fn} and f are Borel measurable.

(a) If fn ≥ f for all n and ∫ f dµ > −∞, then

∫ lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫ fn dµ.

(b) If fn ≤ f for all n and ∫ f dµ < ∞, then

lim sup_{n→∞} ∫ fn dµ ≤ ∫ lim sup_{n→∞} fn dµ.

Remark 1. Compare Fatou’s lemma with Exercise 12 of Lecture 4. That will help you
to remember which way the inequalities go when you apply Fatou’s Lemma.

Proof of Theorem 2. (a) Define gn = inf_{k≥n} fk, g = lim inf fn. Then gn ≥ f for all n, ∫ f dµ > −∞, and gn ↑ g. Hence we can apply MCT (Theorem 1) to conclude that ∫ gn dµ ↑ ∫ g dµ. But gn ≤ fn. Hence

∫ lim inf fn dµ = ∫ g dµ = lim ∫ gn dµ = lim inf ∫ gn dµ ≤ lim inf ∫ fn dµ.

(b) Take the negatives and convert limsup to liminf and apply (a).

Theorem 3 (Dominated Convergence Theorem (DCT)). Suppose {fn } and h are Borel
measurable functions such that
(i) |fn | ≤ h for all n,
(ii) h is integrable,
(iii) fn → f a.e.
Then f is integrable and ∫ fn dµ → ∫ f dµ.
Remark 2. (i) As usual, we can add “almost sure” in (i) and the result will still hold.
(ii) A common mistake in applications is to ignore condition (ii). Note that if the limit,
say f , of fn is known to exist, that is not enough for (i) or (ii) to hold.

Proof of Theorem 3. Since (iii) holds, we know that |f| ≤ h a.e., and hence f is integrable. By Fatou’s lemma,

∫ lim inf fn dµ ≤ lim inf ∫ fn dµ ≤ lim sup ∫ fn dµ ≤ ∫ lim sup fn dµ.

But lim inf fn = lim sup fn = f a.e. and the result follows.
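A concrete illustration of the DCT on ([0, 1], B([0, 1]), λ): fn(x) = x^n is dominated by the integrable function h ≡ 1 and fn → 0 a.e., so ∫ fn dλ → 0. The Python sketch below is illustrative only; since the integrands are continuous, a midpoint Riemann sum is used as a numerical proxy for the Lebesgue integral.

def integral_on_01(g, steps=100000):
    # midpoint Riemann sum; for these continuous integrands it agrees with the Lebesgue integral
    h = 1.0 / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h

for n in (1, 5, 25, 125):
    approx = integral_on_01(lambda x: x**n)
    print(n, round(approx, 6), "exact:", round(1.0 / (n + 1), 6))
# The integrals decrease to 0, the integral of the (a.e.) limit function.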

Exercise 3 (Compare with Exercise 4 of Lecture 17). Suppose {xn} is a sequence of real numbers such that xn → x (finite). Show that

(1 + xn/n)^n → e^x.

Lecture 19
Induced measure
Measure preserving transformation
Change of variable formula
Riemann and Lebesgue integrals
January 6, 2021

Exercise 1. Suppose {fn} are Borel measurable functions such that |fn| ≤ h for all n and |h|^p is integrable for some p > 0. If fn → f a.e. and |f|^p is integrable, show that ∫ |fn − f|^p dµ → 0 as n → ∞.

Exercise 2. Suppose (Ω, A, µ) is a measure space. Suppose f and g are Borel measurable functions. Consider the condition

∫_A g dµ ≤ ∫_A f dµ for all A ∈ A.   (1)

Show the following:

(a) If f and g are integrable and (1) holds, then g ≤ f a.e.

(b) If the integrals of f and g exist, (1) holds, and µ is σ-finite, then g ≤ f a.e. Hint: Reduce to the finite measure case and then use the set

An = {ω : g(ω) ≥ f(ω) + 1/n, |f(ω)| ≤ n}.
Exercise 3. Suppose T : (Ω, A) → (Ω1, A1) is a measurable function and µ is a measure on (Ω, A). Define the set function µT^{-1} on A1 by

µT^{-1}(A1) = µ(T^{-1}(A1)), A1 ∈ A1.

Show that µT^{-1} is a measure.


Definition 1 (Induced measure). The measure µT −1 on (Ω1 , A1 ) is called the induced
measure of T . If (Ω, A) = (Ω1 , A1 ) and µ = µT −1 then T is called measure preserv-
ing.
Exercise 4 (Change of variable formula). Suppose T : (Ω, A, µ) → (Ω1, A1, µT^{-1}). Suppose f is any Borel measurable function on Ω1. Show that for any A1 ∈ A1,

∫_{A1} f(ω1) dµT^{-1}(ω1) = ∫_{T^{-1}(A1)} f(T(ω)) dµ(ω),

in the sense that if one of the integrals exists then the other exists and they are equal. Hint: Start with f as an indicator function.
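For measures concentrated on finitely many points, the change of variable formula can be verified directly. The Python sketch below (illustrative only; the masses, the map T and the function f are hypothetical choices) computes both sides of the formula with A1 = Ω1.

mu = {0: 0.2, 1: 0.5, 2: 0.3}        # hypothetical point masses on Omega = {0, 1, 2}
T = lambda w: w % 2                  # T : Omega -> Omega_1 = {0, 1}
f = lambda w1: 10.0 * w1 + 1.0       # a function on Omega_1

# induced measure mu T^{-1} on Omega_1
muT = {}
for w, m in mu.items():
    muT[T(w)] = muT.get(T(w), 0.0) + m

lhs = sum(f(w1) * m for w1, m in muT.items())      # integral of f d(mu T^{-1})
rhs = sum(f(T(w)) * m for w, m in mu.items())      # integral of f(T(.)) d(mu)
print(lhs, rhs, abs(lhs - rhs) < 1e-12)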
Definition 2. (Lebesgue set) The Lebesgue σ-field is the completion of the Borel σ-
field with respect to the Lebesgue measure. Any set in the Lebesgue σ-field is called a
Lebesgue set.

Theorem 1 (Riemann and Lebesgue integration). Let f be a bounded real valued Borel measurable function on [a, b]. Let λ be the Lebesgue measure on [a, b]. Show that
(a) f is Riemann integrable if and only if f is λ-a.e. continuous.
(b) If f is Riemann integrable on [a, b] then f is Lebesgue integrable on [a, b] and the two integrals are equal.

Proof. Suppose supx∈[a, b] |f (x)| ≤ M < ∞. Consider any partition

π = {a = x0 < x1 < · · · < xn = b}

and let
|π| = max (xi − xi−1 ).
1≤i≤n

Define

Mi = sup{f (x) : xi−1 < x ≤ xi }, i = 1, . . . , n,


mi = inf{f (x) : xi−1 < x ≤ xi }, i = 1, . . . , n.

For any π, define the upper and lower functions on [a, b] by

Uπ,n(x0) = f(x0), Uπ,n(x) = Mi, if xi−1 < x ≤ xi, i = 1, . . . , n,
Lπ,n(x0) = f(x0), Lπ,n(x) = mi, if xi−1 < x ≤ xi, i = 1, . . . , n.

Note that

Lπ,n(x) ≤ f(x) ≤ Uπ,n(x) for all x ∈ [a, b]

and these functions are also bounded by M. Define the upper and lower sums on [a, b] by

Uπ,n = Σ_{i=1}^n Mi (xi − xi−1),
Lπ,n = Σ_{i=1}^n mi (xi − xi−1).

Consider the space ([a, b], L, λ) where L is the Lebesgue σ-field (of subsets of [a, b]). Note that Uπ,n(·) and Lπ,n(·) are simple functions and

∫ Uπ,n(x) dλ(x) = Uπ,n,   ∫ Lπ,n(x) dλ(x) = Lπ,n.

Choose any sequence of partitions {πn} such that |πn| → 0, and for each n, πn+1 is a refinement of πn (that is, all points of πn are also points of πn+1). Then {Uπn,n(·)} and {Lπn,n(·)} are respectively non-increasing and non-decreasing sequences of functions, with limits Uπ(·) and Lπ(·) say. Moreover

Lπ(x) ≤ f(x) ≤ Uπ(x) for all x ∈ [a, b]   (2)

and these functions are also bounded by M. Then by the Dominated Convergence Theorem (DCT) (the bounding function is the constant function M),

lim_{n→∞} Uπn,n = lim_{n→∞} ∫ Uπn,n(x) dλ(x) = ∫ Uπ(x) dλ(x),   (3)
lim_{n→∞} Lπn,n = lim_{n→∞} ∫ Lπn,n(x) dλ(x) = ∫ Lπ(x) dλ(x).   (4)

Also note that for all a < x < b,

f is continuous at x if and only if Uπ (x) = Lπ (x) = f (x)

for all such sequence of partitions.


(a) Suppose f is Riemann integrable. Denote the value of the Riemann integral by R(f). Then by definition, for any sequence of partitions {πn} such that |πn| → 0,

R(f) = lim_{n→∞} Uπn,n = lim_{n→∞} Lπn,n.

But then by (2)–(4), for all such sequences of partitions, Uπ(·) = f(·) = Lπ(·) λ-a.e. Hence f is continuous λ-a.e.

Now assume that f is continuous λ-a.e. Then Uπ(·) = f(·) = Lπ(·) λ-a.e. for any sequence of partitions {πn} such that |πn| → 0. Now note that Uπ(·) and Lπ(·) are Borel measurable (since they are limits of simple functions). So f is Lebesgue measurable. Since f is bounded (and Lebesgue measurable) it is integrable with respect to λ. Hence

∫ Uπ(x) dλ(x) = ∫ f(x) dλ(x) = ∫ Lπ(x) dλ(x)   (5)

irrespective of the sequence of partitions we choose. Hence by (3) and (4), lim Uπn,n = lim Lπn,n for every such sequence, and f is Riemann integrable.

(b) Since f is Riemann integrable, it is continuous λ-a.e. But then by (3) and (5), R(f) = ∫ f(x) dλ(x).
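The upper and lower sums in the proof are easy to compute numerically. The Python sketch below (illustrative only; f and the interval are arbitrary, and the sup/inf over each sub-interval are approximated by a fine sample) shows Uπ,n and Lπ,n squeezing together towards the common value R(f) = ∫ f dλ.

import math

f = lambda x: math.sin(x) + 2.0     # an arbitrary continuous function on [a, b]
a, b = 0.0, 3.0

def upper_lower(n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    U = L = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        # crude approximation of sup/inf over each sub-interval via a fine sample
        sample = [f(x0 + (x1 - x0) * j / 50) for j in range(51)]
        U += max(sample) * (x1 - x0)
        L += min(sample) * (x1 - x0)
    return U, L

for n in (4, 16, 64, 256):
    U, L = upper_lower(n)
    print(n, round(L, 5), round(U, 5))
print("exact value:", round(7.0 - math.cos(3.0), 5))   # integral of sin(x) + 2 over [0, 3]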

3
Lecture 20
Jordan-Hahn Decomposition Theorem
Signed measure
Upper, lower and total variation
January 7, 2021

Exercise 1. Give an example of a sequence of functions {fn } on [a, b] such that they
are bounded by 1, each fn is Riemann integrable, fn (x) → f (x) for all x ∈ [a, b] but f
is not Riemann integrable.

Definition 1 (Improper Riemann integral). Suppose f : R → R and suppose its Riemann integral R_{[a,b]}(f) exists for all a < b. The improper integral of f is defined as

R(f) = lim_{a↓−∞, b↑∞} R_{[a,b]}(f),

provided the limit exists and is finite.

Exercise 2. Show that if f has an improper integral, then it is continuous λ-a.e. on R. Show that the converse is not true.

Exercise 3. If f is non-negative and has an improper integral, show that then f is


integrable on R with respect to the completion of the Lebesgue measure and the two
integrals are equal. Show by example that the result need not be true if f is not non-
negative.

Definition 2. Suppose (Ω, A, µ) is a measure space and f is a Borel measurable function on Ω such that ∫_Ω f dµ exists. Then ν defined by ν(A) = ∫_A f dµ, A ∈ A, is called the indefinite integral of f with respect to µ.

Recall that in this case ν is a difference of two measures and at least one of them is
finite. We now target a converse.

Theorem 1. Suppose ν is an extended real valued, countably additive set function on a


measurable space (Ω, A). Then there exist sets C, D ∈ A such that

ν(C) = sup{ν(A) : A ∈ A} and ν(D) = inf{ν(A) : A ∈ A}.

Exercise 4. (i) Suppose ν is a measure. Then C = Ω, D = ∅.

(ii) Suppose ν is the indefinite integral of f. Then it can be easily checked that we can take the sets to be

C = {ω : f(ω) ≥ 0}, D = {ω : f(ω) < 0}.

The set {ω : f(ω) = 0} could be taken out of C and included in D.

Proof of Theorem 1. (i) If ν(A0) = ∞ for some A0 ∈ A, then take C = A0.

(ii) Suppose now that ν(A) < ∞ for all A ∈ A. Let S be the supremum. Get An ∈ A such that ν(An) → S. Let A0 = ∪_{n=1}^∞ An. Fix n and consider the 2^n disjoint sets A1^* ∩ A2^* ∩ · · · ∩ An^*, where each Ai^* is either Ai or A0 \ Ai. Some of them could be empty. Label them as Anm, m = 1, 2, . . . , 2^n. Let

Bn = ∪_m {Anm : ν(Anm) ≥ 0}.

Since each An is a finite disjoint union of some sets Anm, and negative-valued sets have been dropped in the definition of Bn, using additivity of ν, we have

ν(An) ≤ ν(Bn).

Also note that there is “nesting”. If n1 > n2 then each A_{n1 m1} is either a subset of some A_{n2 m} or disjoint from it. This implies that for r ≥ n,

∪_{k=n}^r Bk = Bn ∪ (∪_j Ej) such that for all j, Ej ∩ Bn = ∅ and ν(Ej) ≥ 0.

Hence we have

ν(An) ≤ ν(Bn)
≤ ν(∪_{k=n}^r Bk)   (additivity and the above observation)
→ ν(∪_{k=n}^∞ Bk) as r → ∞   (continuity from below).

Define

C = lim sup Bn.

Then ∪_{k=n}^∞ Bk ↓ C. Also 0 ≤ ν(∪_{k=n}^∞ Bk) < ∞. Thus

S = lim_{n→∞} ν(An)
≤ lim_{n→∞} ν(∪_{k=n}^∞ Bk)
= ν(C)   (continuity from above)
≤ S.

Hence S = ν(C). D can be defined by considering −ν.

Remark 1. We have used continuity from below and above. This follows from the
following exercise. Not sure if I have done this earlier.
Exercise 5. Suppose ν is a countably additive set function on a σ-field A. Show that then the following hold:
(i) If An ∈ A are such that An ↑ A, then ν(An) → ν(A). The convergence may not be monotone.
(ii) If An ∈ A are such that An ↓ A, and |ν(A1)| < ∞, then ν(An) → ν(A). The convergence may not be monotone.
(iii) The results (i) and (ii) hold if ν is defined on a field F and we assume that A ∈ F.

Theorem 2 (Jordan-Hahn Decomposition theorem). Suppose ν is an extended real valued, countably additive set function on a measurable space (Ω, A). Define

ν^+(A) = sup{ν(B) : B ∈ A, B ⊆ A},
ν^−(A) = −inf{ν(B) : B ∈ A, B ⊆ A}.

Then ν^+ and ν^− are measures on A and ν = ν^+ − ν^−.

Proof. By definition, ν does not take both values ±∞. Without loss, assume that ν
does not take the value −∞. Let D be a set with the property described in Theorem 1.
Since ν(∅) = 0, we have −∞ < ν(D) ≤ 0. Take any set A ∈ A. Then

ν(D) = ν(A ∩ D) + ν(Ac ∩ D).

Observe that both terms on the right side are finite. Hence

ν(D) ≤ ν(Ac ∩ D) since D yields the infimum


= ν(D) − ν(A ∩ D).

This implies that


ν(A ∩ D) ≤ 0. (1)
On the other hand

ν(D) ≤ ν(D ∪ (A ∩ Dc )), since D yields the infimum


= ν(D) + ν(A ∩ Dc ) (additivity).

This implies that


ν(A ∩ Dc ) ≥ 0. (2)
Take any B ∈ A, B ⊂ A. Then

ν(B) = ν(B ∩ D) + ν(B ∩ Dc ), by additivity


≤ ν(B ∩ Dc ), by (1)
≤ ν(B ∩ Dc ) + ν((A \ B) ∩ Dc ), by (2)
= ν(A ∩ Dc ), by additivity.

Hence

ν + (A) ≤ ν(A ∩ Dc ) taking supremum above over all B


≤ ν + (A) by definition of ν + .

This shows that


ν + (A) = ν(A ∩ Dc ). (3)
Similarly,

ν(B) = ν(B ∩ D) + ν(B ∩ Dc ), by additivity

≥ ν(B ∩ D), by (2)
≥ ν(B ∩ D) + ν((A \ B) ∩ D), by (1)
= ν(A ∩ D), by additivity.

Hence by taking infimum over all such B,

ν − (A) ≤ −ν(A ∩ D)
≤ ν − (A) by definition of ν − .

This shows that


ν − (A) = −ν(A ∩ D).
Hence for all A ∈ A,

ν + (A) − ν − (A) = ν(A ∩ Dc ) + ν(A ∩ D) = ν(A)

and by construction ν + and ν − are measures on A.

Remark 2. Suppose ν is a countably additive extended real-valued set function on A. The following statements are consequences of the arguments given in the above proof.
We leave this as an exercise.
(i) ν is the difference of two measures, at least one of which is finite.
(ii) If ν is finite (that is |ν(A)| < ∞ for all A ∈ A), then ν is bounded.
(iii) There is a set D ∈ A such that for all A ∈ A, ν(A ∩ D) ≤ 0 and ν(A ∩ Dc ) ≥ 0.
(iv) If D ∈ A is any set such that for all A ∈ A, ν(A ∩ D) ≤ 0 and ν(A ∩ Dc ) ≥ 0, then
ν + (A) = ν(A ∩ Dc ) and ν − (A) = −ν(A ∩ D).
(v) If E is any other set such that conditions in (iv) for D are satisfied for E, then
ν + (D∆E) + ν − (D∆E) = 0

Definition 3. The measures ν + and ν − are called the upper and lower variations of
ν. The measure |ν| := ν + + ν − is called the total variation of ν. A countably additive
set function ν on a σ-field is also called a signed measure.
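For a signed measure concentrated on finitely many points, the Jordan-Hahn decomposition can be written down explicitly: take D to be the set of points of negative mass. The Python sketch below (illustrative only; the masses are hypothetical) computes ν^+, ν^− and |ν| this way and checks ν = ν^+ − ν^− on a sample set A.

nu = {'a': 2.0, 'b': -1.5, 'c': 0.5, 'd': -0.25}    # hypothetical masses nu{x}

D = {x for x, m in nu.items() if m < 0}              # a "negative set" for nu

def nu_plus(A):   # nu^+(A) = nu(A intersect D^c)
    return sum(m for x, m in nu.items() if x in A and x not in D)

def nu_minus(A):  # nu^-(A) = -nu(A intersect D)
    return -sum(m for x, m in nu.items() if x in A and x in D)

A = {'a', 'b', 'd'}
nu_A = sum(nu[x] for x in A)
print("nu(A)  =", nu_A)                              # 2.0 - 1.5 - 0.25 = 0.25
print("nu+(A) =", nu_plus(A), " nu-(A) =", nu_minus(A))
print("nu = nu+ - nu- on A:", abs(nu_A - (nu_plus(A) - nu_minus(A))) < 1e-12)
print("total variation |nu|(Omega) =", nu_plus(nu.keys()) + nu_minus(nu.keys()))  # 4.25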

Lecture 21
Absolute continuity of measures
Radon-Nikodym Theorem
January 12, 2021

Exercise 1. Let P be a probability measure on B(R). Define another probability measure on B(R) by

Q(A) = 1 if 0 ∈ A, and Q(A) = 0 if 0 ∉ A, A ∈ B(R).   (1)

Find the Jordan-Hahn decomposition of ν = P − Q.


Exercise 2 (Jordan-Hahn decomposition of the indefinite integral). Suppose f is Borel measurable and ∫ f dµ exists. Let ν be the indefinite integral of f. [We already know that ν is countably additive.] Show that the Jordan-Hahn decomposition of ν is given by

ν^+(A) = ∫_A f^+ dµ, ν^−(A) = ∫_A f^− dµ, and |ν|(A) = ∫_A |f| dµ.

Exercise 3 (Minimality of the Jordan-Hahn decomposition). Suppose ν is a signed


measure on A. Let ν + and ν − be its upper and lower variations. If ν = ν1 − ν2 where
ν1 and ν2 are measures, then show that

ν1 (A) ≥ ν + (A), and ν2 (A) ≥ ν − (A), for all A ∈ A.

Exercise 4. Suppose ν is a signed measure on A. Show that the total variation |ν| is given by

|ν|(A) = sup{ Σ_{i=1}^n |ν(Ei)| : E1, . . . , En are disjoint measurable subsets of A, n ≥ 1 }.

Exercise 5. If ν1 and ν2 are signed measures, show that |ν1 + ν2 | ≤ |ν1 | + |ν2 |.

Suppose f is a non-negative measurable function on (Ω, A, µ). Recall that its indefinite
integral ν is then also a measure. We may visualise the relation between µ, f and ν as
dν = f dµ. In other words the “derivative of ν with respect to µ equals f ”. We now
make this and related ideas precise.
Definition 1. Suppose µ is a measure and ν is a signed measure on (Ω, A, µ). Then we
say that ν is absolutely continuous with respect to µ if for every A ∈ A, µ(A) = 0
implies ν(A) = 0. We write ν << µ.
Example 1. Suppose f is a Borel measurable function on (Ω, A, µ) such that ∫ f dµ exists. If ν is the indefinite integral of f then ν << µ.

The Radon-Nikodym Theorem provides a converse to the above.

Theorem 1 (Radon-Nikodym theorem). Suppose µ is a σ-finite measure and ν is a signed measure on (Ω, A, µ), and ν << µ. Then there exists a Borel measurable function f : Ω → R̄ such that ν is the indefinite integral of f. That is,

ν(A) = ∫_A f dµ for all A ∈ A.   (2)

The function f is unique: if g is any other function which satisfies (2), then f = g a.e. [µ].

Notation. The function f is called the Radon-Nikodym derivative of ν with respect


to µ and we write (2) as dν = f dµ or dν/dµ = f .

Proof. Once the existence is proved, the uniqueness follows from Exercise 2 of Lecture
19. We prove the existence in a few steps.

Step 1. Suppose that µ and ν are finite measures. Define

S = { f : f ≥ 0, f integrable with respect to µ, and ∫_A f dµ ≤ ν(A) for all A ∈ A }.

Note that S is non-empty. Let

s = sup{ ∫ f dµ : f ∈ S }.

Note that s ≤ ν(Ω) < ∞. Partially order S by declaring that f ≥ g if and only if f ≥ g a.e. µ.

Let f, g ∈ S. Then h = max(f, g) ∈ S. This follows by taking B = {ω : f(ω) ≤ g(ω)}, and observing that for any A ∈ A,

∫_A h dµ = ∫_{A∩B} g dµ + ∫_{A∩B^c} f dµ
≤ ν(A ∩ B) + ν(A ∩ B^c)
= ν(A).

We now identify a maximal element of S. Let {fn} be a sequence in S such that ∫ fn dµ → s. Let gn = max(f1, . . . , fn). Then gn ∈ S and gn is non-decreasing. Let g = lim gn. Then by MCT,

∫ g dµ = lim ∫ gn dµ ≥ lim ∫ fn dµ = s.

Let A ∈ A. Then 0 ≤ gn I_A ↑ g I_A. Hence by MCT,

∫ g I_A dµ = lim ∫ gn I_A dµ.

But we know that ∫ gn I_A dµ ≤ ν(A) for all n. Hence g ∈ S. Now since ∫ g dµ = s, g is a maximal element of S. Now consider the set function

ν1(A) = ν(A) − ∫ g I_A dµ, A ∈ A.

Then clearly ν1 is a measure. Moreover, ν1 << µ. If ν1 is identically 0, then we are done.

Suppose, if possible, ν1 is not identically 0. Then ν1(Ω) > 0. Hence there exists a k > 0 such that

µ(Ω) − k ν1(Ω) < 0.   (3)

Apply Remark 2 (iii) of Lecture 20 to the signed measure µ − k ν1 to obtain a D ∈ A such that for all A ∈ A,

µ(A ∩ D) − k ν1(A ∩ D) ≤ 0,   (4)
µ(A ∩ D^c) − k ν1(A ∩ D^c) ≥ 0.   (5)

Suppose, if possible, µ(D) = 0. Then by absolute continuity, ν(D) = 0. Hence ν1(D) = 0. Use (5) with A = Ω to obtain

0 ≤ µ(D^c) − k ν1(D^c)   (6)
= µ(Ω) − k ν1(Ω)   (since µ(D) = ν1(D) = 0)   (7)
< 0   (by (3)),   (8)

which is a contradiction. Hence µ(D) > 0. Define

h(ω) = 1/k if ω ∈ D, and h(ω) = 0 if ω ∉ D.

If A ∈ A, then

∫_A h dµ = (1/k) µ(A ∩ D)
≤ ν1(A ∩ D)   (by (4))
≤ ν1(A)
= ν(A) − ∫_A g dµ.

This implies

∫_A (h + g) dµ ≤ ν(A),

so h + g ∈ S. But h + g > g on the set D with µ(D) > 0, so ∫(h + g) dµ = ∫ g dµ + µ(D)/k > s. This contradicts the maximality of g. Thus ν1 is identically 0 and the theorem is proved in this special case.

Lecture 22
Radon-Nikodym Theorem, continued
January 12, 2021

Theorem 1 (Radon-Nikodym theorem). Suppose µ is a σ-finite measure and ν is a signed measure on (Ω, A, µ), and ν << µ. Then there exists a Borel measurable function f : Ω → R̄ such that ν is the indefinite integral of f. That is,

ν(A) = ∫_A f dµ for all A ∈ A.   (1)

The function f is unique: if g is any other function which satisfies (1), then f = g a.e. [µ].

Proof of Radon-Nikodym theorem continued. Recall that in Step 1 we have already proved
the theorem when both µ and ν are finite measures.

Step 2. Assume that µ and ν are finite and σ-finite measures respectively. Let {Ωn} be disjoint sets in A such that

∪_{n=1}^∞ Ωn = Ω and ν(Ωn) < ∞ for all n.

Define

νn(A) = ν(A ∩ Ωn), A ∈ A.

Then it trivially follows that for every n, νn is a finite measure and νn << µ. Hence by Step 1, for every n, there exists a non-negative measurable function gn such that νn is the indefinite integral of gn with respect to µ. Take g = Σ_{n=1}^∞ gn. Then ν is the indefinite integral of g with respect to µ.

Step 3. Now suppose that µ and ν are finite and arbitrary measures respectively. For any C ∈ A, define

A_C = {A ∩ C : A ∈ A}.

Then A_C is a σ-field of subsets of C. Define the class of sets

C = { C ∈ A : ν restricted to A_C is a σ-finite measure }.

Note that C is not empty since ∅ ∈ C. Let

s = sup{ µ(A) : A ∈ C }.

Pick Cn ∈ C such that µ(Cn) → s. Let C = ∪_{n=1}^∞ Cn. Then C ∈ C. Since

s ≥ µ(C) ≥ µ(Cn) → s,

we have µ(C) = s.

Now consider the measures µ and ν restricted to A_C for the above choice of C. Since µ and ν are finite and σ-finite on A_C respectively, by Step 2 there exists a non-negative function f_C : C → R̄, which is Borel measurable with respect to A_C, such that ν is the indefinite integral of f_C with respect to µ on A_C. In other words,

ν(A ∩ C) = ∫_{A∩C} f_C dµ for all A ∈ A.

Now consider a set A ∈ A. We have two possibilities:

Case 1. µ(A ∩ C^c) > 0. Suppose, if possible, ν(A ∩ C^c) < ∞. But this implies C ∪ (A ∩ C^c) ∈ C and

s ≥ µ(C ∪ (A ∩ C^c)) = µ(C) + µ(A ∩ C^c) > µ(C) = s.

This is a contradiction. Hence we must have ν(A ∩ C^c) = ∞.

Case 2. µ(A ∩ C^c) = 0. Then by absolute continuity, ν(A ∩ C^c) = 0.

Observe that in either case,

ν(A ∩ C^c) = ∫_{A∩C^c} ∞ dµ.

It follows that

ν(A) = ν(A ∩ C) + ν(A ∩ C^c) = ∫_A f dµ,

where

f(ω) = f_C(ω) if ω ∈ C, and f(ω) = ∞ if ω ∈ C^c.   (2)

Clearly f is a Borel measurable function.

Step 4. Suppose µ and ν are σ-finite and arbitrary measures respectively. Let Ωn be disjoint sets in A such that Ω = ∪_{n=1}^∞ Ωn and µ(Ωn) < ∞ for all n. By Step 3, for every n, there exists gn : Ωn → R̄ which is Borel measurable with respect to A_{Ωn} such that

ν(A ∩ Ωn) = ∫_{A∩Ωn} gn dµ, A ∈ A.

Extend gn to all of Ω by defining it to be 0 on Ωn^c. Call this new function fn. Note that fn is measurable and

ν(A ∩ Ωn) = ∫_A fn dµ, A ∈ A.

Then

ν(A) = Σ_{n=1}^∞ ν(A ∩ Ωn)
= Σ_{n=1}^∞ ∫_A fn dµ
= ∫_A f dµ, where f = Σ_{n=1}^∞ fn.

Step 5. Now assume that µ and ν are σ-finite and signed measures respectively. Write ν = ν^+ − ν^−. Without loss, assume that ν^− is a finite measure. By Step 4, there exist non-negative Borel measurable functions f1 and f2 such that ν^+ and ν^− are the indefinite integrals of f1 and f2 respectively with respect to µ. Since ν^− is finite, f2 is µ-integrable. Hence

ν(A) = ν^+(A) − ν^−(A)
= ∫_A f1 dµ − ∫_A f2 dµ
= ∫_A (f1 − f2) dµ   (by the additivity result).

This completes the proof of the Radon-Nikodym theorem.

Remark 1. The following facts follow from the above theorem and its proof. Suppose ν << µ where ν is a signed measure and µ is σ-finite. The proofs are left as exercises.

(i) If ν is a finite measure then dν/dµ is µ-integrable and hence is finite a.e. µ.

(ii) If |ν| is σ-finite, then dν/dµ is finite a.e. µ.

(iii) If ν is a measure then dν/dµ ≥ 0 a.e µ.


Exercise 1. Give an example where g is finite a.e. µ, ∫ g dµ exists, and if ν is defined by ν(A) = ∫_A g dµ, A ∈ A, then neither of the measures |ν| and µ is σ-finite.

Exercise 2. Give an example to show that the condition µ is σ-finite cannot be dropped
from the Radon-Nikodym theorem.

Exercise 3. Suppose ν is a finite signed measure and µ is a measure. Show that ν << µ
if and only if given any  > 0 there exists a δ > 0 such that for any A ∈ A, µ(A) < δ
implies |ν(A)| < .

Exercise 4. Suppose g is a non-negative measurable function and ν is the indefinite


integral of g with respect to µ. When is µ an indefinite integral of ν?

Exercise 5. Suppose ν1 and ν2 are signed measures which are both absolutely continuous
with respect to a σ-finite measure µ and the signed measure ν1 + ν2 is well-defined. Show
that (ν1 + ν2 ) << µ and d(ν1 + ν2 )/dµ = (dν1 /dµ) + (dν2 /dµ).

Exercise 6. Suppose µ1 , µ2 , and µ3 are three σ-finite measures such that µ1 << µ2 and
µ2 << µ3 . Show that µ1 << µ3 and dµ1 /dµ3 = (dµ1 /dµ2 )(dµ2 /dµ3 ).

Exercise 7. Suppose µ and ν are mutually absolutely continuous σ-finite measures.


Show that dµ/dν = (dν/dµ)−1 a.e µ or equivalently a.e. ν.
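When all the measures are given by point masses on a finite set, the Radon-Nikodym derivatives are simply ratios of masses, and Exercise 6 can be checked by hand. The Python sketch below (illustrative only; the masses are hypothetical) verifies the chain rule pointwise.

omega = ['x', 'y', 'z']
mu3 = {'x': 1.0, 'y': 2.0, 'z': 4.0}
mu2 = {'x': 0.5, 'y': 4.0, 'z': 2.0}     # mu2 << mu3 (no mass where mu3 vanishes)
mu1 = {'x': 1.5, 'y': 2.0, 'z': 1.0}     # mu1 << mu2

d12 = {w: mu1[w] / mu2[w] for w in omega}    # d mu1 / d mu2
d23 = {w: mu2[w] / mu3[w] for w in omega}    # d mu2 / d mu3
d13 = {w: mu1[w] / mu3[w] for w in omega}    # d mu1 / d mu3

for w in omega:
    print(w, d13[w], "==", d12[w] * d23[w])   # chain rule holds pointwise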

Lecture 23
Singularity of measures
January 19, 2021

Definition 1 (Singularity of measures and signed measures). Suppose µ1 and µ2 are


measures. Then we say that they are mutually singular if there exists a set A ∈ A
such that µ1 (A) = µ2 (Ac ) = 0. Then we write µ1 ⊥ µ2 . If ν1 and ν2 are signed measures
then we say that they are mutually singular if |ν1 | ⊥ |ν2 |.
Exercise 1. Suppose ν is a signed measure with Jordan-Hahn decomposition ν = ν + −
ν − . Then show that ν + ⊥ ν − .

The following lemma was earlier given as an exercise.


Lemma 1 (Borel-Cantelli Lemma). Suppose {An} is a sequence of sets on a measure space. Then Σ_{n=1}^∞ µ(An) < ∞ implies that µ(lim sup An) = 0.
Lemma 2. Suppose µ is a measure and λ1 and λ2 are signed measures on A. Then the following hold:
(a) If λ1 ⊥ µ and λ2 ⊥ µ, then λ1 + λ2 ⊥ µ.
(b) λ1 << µ if and only if |λ1| << µ.
(c) If λ1 << µ and λ2 ⊥ µ, then λ1 ⊥ λ2.
(d) If λ1 << µ and λ1 ⊥ µ, then λ1 = 0.
(e) Suppose λ1 is finite. Then λ1 << µ if and only if lim_{µ(A)→0} λ1(A) = 0.

Proof. (a) Suppose A and B are such that

µ(A) = |λ1|(A^c) = 0, and µ(B) = |λ2|(B^c) = 0.

Then µ(A ∪ B) = 0. For every C ⊆ (A ∪ B)^c, C ∈ A,

λ1(C) = λ2(C) = 0, and hence |λ1 + λ2|((A ∪ B)^c) = 0.

(b) Suppose λ1 << µ and µ(A) = 0. If λ1^+(A) > 0, then there exists a B ⊆ A such that λ1(B) > 0. This is a contradiction to λ1 << µ. Hence λ1^+(A) = 0. That is, λ1^+ << µ. Similarly, λ1^− << µ, and hence |λ1| = λ1^+ + λ1^− << µ. The converse is easy.

(c) Suppose A is such that µ(A) = 0 and |λ2|(A^c) = 0. Since λ1 << µ, by (b) we know that |λ1| << µ. Hence |λ1|(A) = 0. That is, λ1 ⊥ λ2.

(d) By (c), λ1 ⊥ λ1. Hence there exists a set A such that |λ1|(A) = |λ1|(A^c) = 0. Hence |λ1|(Ω) = 0.

(e) Suppose λ1 << µ. Suppose, if possible, the condition does not hold. Then there exist ε > 0 and sets An ∈ A such that

µ(An) < 2^{-n}, |λ1|(An) ≥ ε.

Let A = lim sup An. Then by the Borel-Cantelli Lemma, µ(A) = 0. But |λ1|(∪_{k=n}^∞ Ak) ≥ |λ1|(An) ≥ ε. Hence |λ1|(A) = lim_{n→∞} |λ1|(∪_{k=n}^∞ Ak) ≥ ε. Since λ1 << µ and µ(A) = 0 imply |λ1|(A) = 0 by (b), this is a contradiction.

Conversely, if the condition holds and µ(A) = 0, then µ(A) < δ for every δ > 0, so |λ1(A)| < ε for every ε > 0, that is, λ1(A) = 0. Hence λ1 << µ.

Exercise 2. Suppose f is integrable with respect to a measure µ.

(a) Show that lim_{n→∞} ∫_{ {ω : |f(ω)| ≥ n} } f dµ = 0.

(b) By using (a), show that lim_{µ(A)→0} ∫_A f dµ = 0.

Definition 2. A signed measure ν is said to be σ-finite if |ν| is σ-finite.

Theorem 1 (Lebesgue decomposition theorem). Suppose µ is a measure and ν is a


σ-finite signed measure on a measure space. Then ν has a unique decomposition into
ν = ν1 + ν2 such that they are both signed measures and ν1 << µ, ν2 ⊥ µ.

Lecture 24
Lebesgue decomposition theorem
January 19, 2021

Theorem 1 (Lebesgue decomposition theorem). Suppose µ is a measure and ν is a


σ-finite signed measure on a measure space. Then ν has a unique decomposition into
ν = ν1 + ν2 such that they are both signed measures and ν1 << µ, ν2 ⊥ µ.

Proof. (a) First suppose ν is a finite measure. Let

C = {A ∈ A : µ(A) = 0} and s = sup{ν(A) : A ∈ C} ≤ ν(Ω) < ∞.

Suppose {An} ⊆ C are such that ν(An) → s. Then C = ∪_{n=1}^∞ An ∈ C and ν(C) = s. Suppose B ∈ C. Note that C ∪ B ∈ C and hence

s ≥ ν(C ∪ B) = ν(C) + ν(B − C) ≥ s.

It follows that ν(B − C) = 0. Define

ν1(A) = ν(A − C), ν2(A) = ν(A ∩ C), A ∈ A.

Then ν1 << µ. To see this, suppose µ(B) = 0. Then B ∈ C. Hence ν1(B) = ν(B − C) = 0. Now note that µ(C) = 0 and ν2(C^c) = ν(C^c ∩ C) = ν(∅) = 0. So ν2 ⊥ µ. Finally, ν = ν1 + ν2. Uniqueness follows by using Lemma 2 (d) of Lecture 23.

(b) Now suppose ν is a σ-finite measure. Suppose {An} is a disjoint partition of Ω such that ν(An) < ∞ for every n. Define

νn(A) = ν(A ∩ An), A ∈ A.

Then by (a), get {ν1n}, {ν2n} such that for every n, νn = ν1n + ν2n, ν1n << µ, ν2n ⊥ µ. Adding over n we get ν = Σ_{n=1}^∞ ν1n + Σ_{n=1}^∞ ν2n = ν1 + ν2 where ν1 << µ and ν2 ⊥ µ. Uniqueness follows by using the uniqueness proved in (a).

(c) Suppose ν is a σ-finite signed measure. Then use the Jordan-Hahn decomposition and apply (b). Uniqueness follows by using the uniqueness proved in (b).
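For a measure on [0, 1] that is the sum of a part with a density and finitely many point masses, the Lebesgue decomposition with respect to λ is transparent: ν1 is the density part and ν2 the atomic part, which is concentrated on the λ-null set of atoms. The Python sketch below (illustrative only; the density g and the atoms are hypothetical) evaluates the two pieces on (0, 1].

g = lambda x: 2.0 * x                      # hypothetical density of the absolutely continuous part
atoms = {0.25: 0.3, 0.75: 0.7}             # hypothetical point masses

def nu(a, b):
    """Return (nu1, nu2) on (a, b]: the density part and the atomic part."""
    steps = 100000
    h = (b - a) / steps
    dens = sum(g(a + (i + 0.5) * h) for i in range(steps)) * h   # integral of g over (a, b]
    point = sum(m for x, m in atoms.items() if a < x <= b)       # masses of atoms in (a, b]
    return dens, point

nu1, nu2 = nu(0.0, 1.0)
print("nu1(0,1] ~", round(nu1, 4), " nu2(0,1] =", nu2)
# nu1 vanishes on the lambda-null set {0.25, 0.75}; nu2 vanishes off it.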

Lecture 25
Absolutely continuous functions
Functions of bounded variation
January 19, 2021

Definition 1 (Absolutely continuous function). Suppose f : [a, b] → R. Then f is said to be absolutely continuous if for every ε > 0, there exists a δ > 0 such that for any collection of disjoint sub-intervals (ai, bi), i = 1, . . . , n, of [a, b] with Σ_{i=1}^n (bi − ai) < δ, we have Σ_{i=1}^n |f(bi) − f(ai)| < ε.

Note that if f is absolutely continuous, then it is continuous.

Exercise 1. (a) Show that in the above definition it does not make any difference if we
allow countable partitions instead of finite partitions.

(b) Give examples of functions which are continuous but not absolutely continuous.

(c) If f and g are absolutely continuous, show that f − g is also absolutely continuous.

Recall that f is said to be a distribution function if it is non-decreasing and right


continuous.

Theorem 1. Suppose F and G are distribution functions on [a, b] with finite Lebesgue-Stieltjes measures µ1 and µ2 respectively. Let f = F − G and µ = µ1 − µ2. Then µ << λ if and only if f is absolutely continuous. Here λ is the Lebesgue measure on [a, b].

Proof. (a) Suppose first that µ << λ. Fix ε > 0. Then by Lemma 2 (b) and (e) of Lecture 23, there exists δ > 0 such that λ(A) < δ implies |µ|(A) < ε. Consider any collection of disjoint sub-intervals (ai, bi), i = 1, . . . , n, of [a, b], with total length less than δ. Let A = ∪_{i=1}^n (ai, bi]. Then λ(A) < δ. Hence

Σ_{i=1}^n |f(bi) − f(ai)| = Σ_{i=1}^n |µ(ai, bi]|   (by definition of µ)
≤ Σ_{i=1}^n |µ|(ai, bi]   (since |µ(B)| ≤ |µ|(B) for any B ∈ A)
= |µ|(A)   (the intervals (ai, bi] are disjoint with union A)
≤ ε.

Hence by definition f is absolutely continuous.

(b) Now suppose f is absolutely continuous. Hence f is continuous. Thus for any b,

µ{b} = lim µ(b − 1/n, b] = lim [f (b) − f (b − 1/n)] = 0.


n→∞ n→∞

Now fix ε > 0. Choose δ > 0 as in the definition of absolute continuity. We have to show that λ(A) = 0 implies µ(A) = 0. Recall Theorem 1 on approximation of Lebesgue-Stieltjes measures from Lecture 13. By that result,

λ(A) = inf{λ(V) : V ⊇ A, V open},
µi(A) = inf{µi(V) : V ⊇ A, V open}, i = 1, 2.

Note that the above relations in Theorem 1 were proved for measures on R. But one can extend the measures µi to measures on R in the obvious way and deduce the above for Lebesgue-Stieltjes measures on [a, b].

Recall that a finite intersection of open sets is open. So, from the above, we can get {Vn} open such that Vn ⊇ A and both λ(Vn) → λ(A) = 0 and µ(Vn) → µ(A) hold.

Choose n large enough such that for all k ≥ n, λ(Vk) < δ. Now since Vn is open, it is a disjoint union of the form Vn = ∪_{i=1}^∞ (ai, bi). Then

|µ(Vn)| = |Σ_{i=1}^∞ µ(ai, bi)|
≤ Σ_{i=1}^∞ |µ(ai, bi)|
= Σ_{i=1}^∞ |µ(ai, bi]|   (since µ{bi} = 0)
= Σ_{i=1}^∞ |f(bi) − f(ai)|
≤ ε.

Since ε was arbitrary, lim µ(Vn) = 0 = µ(A).

Exercise 2. The absolute continuity of a function f : R → R is defined in the natural


way. Suppose F and G are bounded distribution functions on R with Lebesgue-Stieltjes
measures µ1 and µ2 respectively and let f = F − G and µ = µ1 − µ2 . Show that f is
absolutely continuous if and only if µ << λ where λ is the Lebesgue measure.

Definition 2 (Bounded variation). Suppose f : [a, b] → R and let π : a = x0 < x1 < · · · < xn = b be any partition of [a, b]. Let

V_{π,f}[a, b] = Σ_{i=1}^n |f(xi) − f(xi−1)|.

Then Vf[a, b] := sup_π V_{π,f}[a, b], where the supremum is over all partitions π, is called the variation of f over [a, b]. The function f is said to be of bounded variation if Vf[a, b] < ∞.

Exercise 3. (a) For any interval [a, b] and any a < c < b show that

Vf[a, b] = Vf[a, c] + Vf[c, b].

(b) If F is a monotone function on [a, b] then F is of bounded variation and V_F[a, b] = |F(b) − F(a)|.

(c) If F and G are monotone functions on [a, b] then f = F − G is of bounded variation.


Lemma 1. Suppose f : [a, b] → R is absolutely continuous. Then f is of bounded
variation.

Proof. Fix ε > 0 and choose δ as in the definition of absolute continuity. Suppose π is a partition of [a, b]. Let τ : a = x0 < x1 < · · · < xn = b be a refinement of π consisting of sub-intervals each of length less than δ/2. Let i0 = 0 and let i1 be the largest index such that x_{i1} − x_{i0} < δ; then let i2 be the largest index such that x_{i2} − x_{i1} < δ, and so on, until the process terminates at x_{ir} = xn = b. By construction, x_{ik} − x_{ik−1} ≥ δ/2 for all k = 1, . . . , r − 1. Hence

(r − 1)δ/2 ≤ b − a, so r ≤ 1 + 2(b − a)/δ =: M.

By absolute continuity, the sub-intervals of τ lying in each of the r blocks [x_{ik−1}, x_{ik}] contribute at most ε to V_{τ,f}[a, b], so V_{τ,f}[a, b] ≤ rε ≤ Mε. But V_{π,f}[a, b] ≤ V_{τ,f}[a, b] since τ is a refinement of π. Hence Vf[a, b] ≤ Mε < ∞.

Lemma 2. Suppose f : [a, b] → R is of bounded variation.

(a) Then there exists non-decreasing functions F and G on [a, b] such that f = F − G.

(b) If further f is absolutely continuous, then F and G in (a) can be chosen to be


absolutely continuous.

Proof. (a) Define


F (x) = Vf [a, x], G(x) = F (x) − f (x), a ≤ x ≤ b.
Clearly F is non-decreasing. We now show that G is also non-decreasing. Suppose
x1 < x2 . Then

G(x2 ) − G(x1 ) = F (x2 ) − F (x1 ) − (f (x2 ) − f (x1 ))


= Vf [x1 , x2 ] − (f (x2 ) − f (x1 ))
≥ Vf [x1 , x2 ] − |f (x2 ) − f (x1 )|
≥ 0 by definition of Vf [x1 , x2 ].
(b) Fix ε > 0 and choose δ > 0 as in the definition of absolute continuity of f. Let {(ai, bi)}, i = 1, . . . , n, be disjoint open intervals with total length less than δ. Suppose πi is a partition of [ai, bi]. Then by absolute continuity of f,

Σ_{i=1}^n V_{πi,f}[ai, bi] ≤ ε.

Taking supremum successively over all possible π1, . . . , πn, we get

Σ_{i=1}^n Vf[ai, bi] ≤ ε.

That is the same as saying

Σ_{i=1}^n [F(bi) − F(ai)] ≤ ε.

Therefore F is absolutely continuous. Hence G is also so.

Notation: if f is integrable with respect to the Lebesgue measure λ on [a, b] then we write

∫_{[a,b]} f dλ =: ∫_a^b f(x) dx.

Theorem 2. Suppose f : [a, b] → R. Then f is absolutely continuous on [a, b] if and only if there is a Borel measurable function g : [a, b] → R which is integrable with respect to the Lebesgue measure λ and

f(x) − f(a) = ∫_a^x g(t) dt, a ≤ x ≤ b.   (1)

Proof. (a) First suppose that f is absolutely continuous. Then by Lemma 2 (b), f = F − G where F and G are non-decreasing absolutely continuous functions. It is enough to prove the result assuming that G = 0. Let µ be the Lebesgue-Stieltjes measure corresponding to F. Then µ << λ by Theorem 1. By the Radon-Nikodym theorem, there is a λ-integrable function g such that µ(A) = ∫_A g dλ for all Borel subsets A of [a, b]. Take A = [a, x] to get (1).

(b) Suppose (1) holds. We can assume g ≥ 0; otherwise we can split g = g^+ − g^−. Define a measure µ on the Borel subsets of [a, b] by

µ(A) = ∫_A g dλ.

Then µ << λ. Let F be the distribution function of µ. Then F is absolutely continuous by Theorem 1. But

F(x) − F(a) = µ(a, x]
= ∫_a^x g(t) dt
= f(x) − f(a).

Hence f is absolutely continuous.

Exercise 4. Suppose g is Lebesgue integrable on R. Define

f(x) = ∫_{(−∞, x]} g dλ, x ∈ R.

Show that f is absolutely continuous and hence is continuous on R. Hint: Use DCT or apply the above theorem.

Exercise 5. Suppose g is Lebesgue integrable on [a, b] and continuous at x0 ∈ [a, b]. Suppose

f(x) − f(a) = ∫_a^x g(t) dt, a ≤ x ≤ b.

Show that f is differentiable at x0 and f′(x0) = g(x0).

Lecture 26
Continuous singular distribution function/measure
Almost everywhere differentiable functions
Cantor set
Cantor function
February 17, 2021

Suppose F is a bounded distribution function and µ is the corresponding finite measure.


Then Lebesgue decomposition theorem splits µ as µ = µ1 + µ2 where µ1 << λ and
µ2 ⊥ λ. The following exercise splits µ2 further.
Exercise 1. Suppose F is a bounded distribution function on R. Show that F can be
decomposed uniquely (up to additive constants) as F = F1 +F2 +F3 where Fj are bounded
distribution functions (with measures µj ) and have the following properties:

(a) µ1 << λ. So F1 is absolutely continuous.

(b) µ2 is discrete. So F2 increases only by jumps (countably many) and µ2 ⊥ λ.

(c) F3 is continuous (but not absolutely continuous) and µ3 ⊥ λ. The measure µ3 or


the corresponding distribution F3 is often called continuous singular. Hint: Consider
the jumps of F − F1 and define F2 using these jumps. Then F3 = F − F1 − F2 .

Recall that if µ is a measure on the Borel sets of R which is finite on bounded sets then
it is a Lebesgue-Stieltjes measure.
Definition 1 (Differentiability of measure). Suppose µ is a signed measure on the Borel
sets of R which is finite on every bounded set. Then the upper and lower derivatives of
µ at x is defined as
Du (x) = lim sup_{r→0, Ir} µ(Ir)/λ(Ir),   Dl (x) = lim inf_{r→0, Ir} µ(Ir)/λ(Ir)

where the supremum and infimum are taken over all intervals that include x and are of
lengths less or equal to r. We say that µ is differentiable at x if Du (x) = Dl (x) and
we write the common value as Dµ(x).
Exercise 2. Suppose µ is a discrete signed measure. Show that Dµ(x) = 0 a.e λ.

We now state three results but we will not prove them. Proofs can be found in the book.
Theorem 1. Suppose µ is a signed measure on R that is finite on bounded sets. Let
µ = µ1 + µ2 where µ1 << λ and µ2 ⊥ λ. Then Dµ = dµ1 /dλ a.e. λ.
Theorem 2. Suppose f : [a, b] → R is a non-decreasing function. Then f is differentiable
a.e. λ and Dµ = f′ a.e. λ, where µ is the Lebesgue-Stieltjes measure corresponding to f.
In particular a function of bounded variation is differentiable a.e. λ.

Theorem 3. (a) Suppose f is absolutely continuous on [a, b] with

f (x) − f (a) = ∫_a^x g(t) dt,  a ≤ x ≤ b

as a Lebesgue integral. Then f′ = g a.e. λ.

(b) f is absolutely continuous if and only if

f (x) − f (a) = ∫_a^x f′(t) dt,  a ≤ x ≤ b

as a Lebesgue integral.
Exercise 3 (The Cantor set). Start with the interval [0, 1]. Remove the middle 1/3
open interval E1 = (1/3, 2/3). From each of the two disjoint sub-intervals of [0, 1] \ E1 ,
remove again their middle 1/3 intervals. Call their union E2 . Now consider the four
disjoint closed intervals of [0, 1] \ (E1 ∪ E2 ) and remove the four middle 1/3 intervals.
Continue this process of removal. Let {Ei } be all the intervals that have been removed.
Let C = [0, 1] \ ∪∞n=1 En . Show that

(a) C is uncountable.
(b) λ(C) = 0.
(c) C is a closed set (in the usual topology).
(d) Every point in C is a limit point.
(e) C is nowhere dense.
Exercise 4 (The Cantor function). In the above exercise, note that the removed set
∪_{i=1}^n Ei consists of 2^n − 1 disjoint intervals. Let A1 , . . . , A_{2^n − 1} be their enumeration in
increasing order. Define Fn : [0, 1] → [0, 1] by

Fn (x) = 0 if x = 0;  Fn (x) = k/2^n if x ∈ Ak , k = 1, 2, . . . , 2^n − 1;  Fn (x) = 1 if x = 1.

Complete the definition by linear interpolation at other points. Show that

(a) Each Fn is a non-decreasing continuous function.

(b) Fn (x) converges for every x. Let F (x) = limn→∞ Fn (x).


(c) F is continuous and non-decreasing.

(d) F 0 = 0 a.e. λ.

(e) F is not absolutely continuous.


(f ) Let µ be the Lebesgue-Stieltjes measure corresponding to F . Then µ ⊥ λ and has no
discrete part. That is, µ{x} = 0 for every x ∈ R.

Lecture 27
Complex-valued functions
Convex function
Lp spaces
Hölder Inequality
Minkowski Inequality
February 3, 2021

Definition 1. Suppose I ⊂ R is an open interval and f : I → R is a function. Then f


is said to be convex if for all x, y ∈ I and 0 ≤ α ≤ 1,

f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y).

Exercise 1. Suppose f : I → R where I is an interval. Suppose f″(x) ≥ 0 for all x ∈ I.
Show that then f is convex. In particular − log is a convex function on (0, ∞).

We shall consider functions which are complex-valued. So f = Ref +ιImf . It is said to be


Borel measurable if both the real and imaginary parts are Borel measurable. Integral is
defined in the natural way for complex-valued functions. All relevant results on integrals
that we have established so far on real-valued functions carry over to complex-valued
functions in a natural way.
Definition 2 (Lp space). For any p > 0, define

Lp = Lp (Ω, A, µ) = {f : f is complex-valued Borel measurable such that ∫ |f|^p dµ < ∞},
||f||_p = ( ∫ |f|^p dµ )^{1/p},  f ∈ Lp .

Lemma 1. Suppose f is a complex-valued function which is integrable. Then

| ∫ f dµ | ≤ ∫ |f| dµ.

Proof. We shall skip the detailed proof. It starts by writing f (ω) = r(ω)eιθ(ω) .
Exercise 2. Suppose f is a complex-valued µ-integrable function. Show that | ∫_Ω f dµ | =
∫_Ω |f| dµ if and only if f/|f| is a.e. constant on the set {ω : f (ω) ≠ 0}.

Lemma 2. (a) Suppose a, b, α, β > 0, α + β = 1. Then a^α b^β ≤ αa + βb.

(b) If c, d > 0 and p, q > 1 are such that 1/p + 1/q = 1, then cd ≤ c^p/p + d^q/q.

Proof. (a) Take logs and use the fact that − log is a convex function. For (b), let
α = 1/p, β = 1/q, a = cp , b = dq and apply (a).

Theorem 1 (Hölder Inequality). Suppose p, q > 1 are such that 1/p + 1/q = 1. If f ∈ Lp
and g ∈ Lq then
||fg||_1 ≤ ||f||_p ||g||_q .

Proof. If ||f ||p ||g||q = 0 then one of f and g is 0 a.e. µ. Then the inequality is trivial.
So assume that ||f||_p ||g||_q ≠ 0. In Lemma 2 (b) let

c = |f|/||f||_p ,  d = |g|/||g||_q .

Then

∫ |fg|/(||f||_p ||g||_q) dµ ≤ ∫ [ |f|^p/(p ||f||_p^p) + |g|^q/(q ||g||_q^q) ] dµ = 1/p + 1/q = 1.
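As a quick sanity check of the inequality just proved (an illustration, not part of the notes), take µ to be the counting measure on a finite set, where the integrals become finite sums. The Python sketch below draws random non-negative vectors and verifies ||fg||_1 ≤ ||f||_p ||g||_q for conjugate exponents; the choice p = 3 is arbitrary.

# Sketch: Hölder's inequality under the counting measure on {1,...,n}.
import random

def norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
p = 3.0
q = p / (p - 1)              # conjugate exponent: 1/p + 1/q = 1
for _ in range(1000):
    n = random.randint(1, 10)
    f = [random.uniform(0, 5) for _ in range(n)]
    g = [random.uniform(0, 5) for _ in range(n)]
    lhs = sum(a * b for a, b in zip(f, g))     # ||fg||_1
    rhs = norm(f, p) * norm(g, q)              # ||f||_p ||g||_q
    assert lhs <= rhs + 1e-9
print("Hölder inequality verified on 1000 random examples for p =", p)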

Exercise 3. Show that the Cauchy-Schwartz inequality follows from Hölder Inequality.
Exercise 4. Show that equality holds in Hölder inequality if and only if A|f |p = B|g|q
a.e. µ for some constants A and B, not both zero.
Exercise 5. Suppose (Ω, A, µ) is a finite measure space. Suppose 0 < r < s < ∞.

(a) Show that for any f measurable,


||f ||r ≤ k||f ||s
for some positive constant k. Further k can be taken to be 1 if µ is a probability measure.

(b) As a consequence of (a), show that Ls ⊂ Lr . Show that this is not true if we do not
assume µ is a finite measure.
Lemma 3. If a, b ≥ 0 and p ≥ 1, then (a + b)p ≤ 2p−1 (ap + bp ).

Proof. Consider the function


f (x) = (a + x)p − 2p−1 (ap + xp ), x > 0.
Then
f′(x) = p(a + x)^{p−1} − 2^{p−1} p x^{p−1}.
It is easy to see that

f′(x) > 0 if x < a,  f′(x) = 0 if x = a,  and f′(x) < 0 if x > a.

Hence the maximum of f occurs at x = a. But f (a) = 0.

Exercise 6. Show that the above lemma is not true if p < 1.
Theorem 2 (Minkowski Inequality). If f, g ∈ Lp for some p ≥ 1, then
||f + g||p ≤ ||f ||p + ||g||p .

Proof. We shall use Hölder inequality to prove this. Clearly it is true for p = 1 and so
we assume that p > 1. By Lemma 3,
|f + g|^p ≤ (|f| + |g|)^p ≤ 2^{p−1} (|f|^p + |g|^p).

Hence f + g ∈ Lp . Choose q = p/(p − 1). Note that q > 1 and 1/p + 1/q = 1. Then

|f + g|^p = |f + g| |f + g|^{p−1} ≤ |f| |f + g|^{p−1} + |g| |f + g|^{p−1}.    (1)

Clearly ∫ (|f + g|^{p−1})^q dµ = ∫ |f + g|^p dµ < ∞.

Since f, g ∈ Lp and |f + g|^{p−1} ∈ Lq , we can apply the Hölder inequality to obtain

∫ |f| |f + g|^{p−1} dµ ≤ ||f||_p [ ∫ (|f + g|^{p−1})^q dµ ]^{1/q} = ||f||_p ||f + g||_p^{p/q},  as q = p/(p − 1),
∫ |g| |f + g|^{p−1} dµ ≤ ||g||_p ||f + g||_p^{p/q}.

Hence by using (1) and combining the above two inequalities,

||f + g||_p^p ≤ ( ||f||_p + ||g||_p ) ||f + g||_p^{p/q}.

Dividing both sides by the second factor on the right (the inequality is trivial if that factor
is zero) and noting that p − p/q = 1 completes the proof.

Exercise 7. Show that Minkowski inequality is not true if p < 1.
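Exercise 7 can be seen numerically (a sketch, not a proof): take p = 1/2 and two indicator-like vectors with disjoint supports under the counting measure on two points; then ||f + g||_p = 2^{1/p} = 4 exceeds ||f||_p + ||g||_p = 2.

# Sketch: Minkowski's inequality fails for p = 1/2 (counting measure on two points).
p = 0.5
f = [1.0, 0.0]
g = [0.0, 1.0]

def norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

lhs = norm([a + b for a, b in zip(f, g)], p)   # ||f+g||_p = 2^{1/p} = 4
rhs = norm(f, p) + norm(g, p)                  # ||f||_p + ||g||_p = 1 + 1 = 2
print(lhs, rhs, lhs <= rhs)                    # 4.0 2.0 False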


Exercise 8. Suppose p > 1. Show that equality holds in Minkowski inequality if and
only if Af = Bg a.e. µ for some constants A and B, not both zero. Find condition for
equality when p = 1
Definition 3 (Semi-norm). Let V be a vector space (over the field of real or complex
numbers). A function || · || : V → R is said to be semi-norm if

(i) ||v|| ≥ 0 for all v ∈ V .

(ii) ||αv|| = |α| ||v|| for all v ∈ V and scalar α.

(iii) ||v1 + v2 || ≤ ||v1 || + ||v2 || for all v1 , v2 ∈ V .

A semi-norm is said to be a norm if

(iv) ||v|| = 0 implies v = 0.

If || · || is a norm on a vector space V , then d(v1 , v2 ) = ||v1 − v2 || defines a metric on V .
Note that Lp is a vector space with the usual operations. By Minkowski Inequality,
|| · ||p is a semi-norm on Lp . Note that if f = g a.e. µ, then ||f ||p = ||g||p . Thus if
we identify two functions that are almost everywhere equal to be equal (in other words
we are considering the equivalence classes of the relation f = g a.e. µ) then for p ≥ 1,
|| · ||p becomes a norm on Lp and hence it is also a metric space. It is NOT a norm for
0 < p < 1. Nevertheless, we shall soon see that Lp is a complete metric space for all
0 < p < ∞. We shall also give meaning to L∞ and it will also be a complete metric
space.

Lecture 28
Markov Inequality
Chebeyshev Inequality
February 3, 2021

Inequalites for integrals with respect to general measures and probability measures are
very important. We have seen two such inequalities already. The following inequality is
easy to prove but is of prime importance in statistics and probability where it is used
for a probability measure. It appears in different forms in different areas.

Lemma 1 (Markov Inequality). Let f be a non-negative extended real valued measurable
function. Then for any ε > 0 and any p > 0,

µ{ω : f (ω) ≥ ε} ≤ ε^{−p} ∫ f^p dµ.

Proof. Define the functions

g(ω) = 1 if f (ω) ≥ ε, and g(ω) = 0 if f (ω) < ε;
h(ω) = ε^{−p} f^p (ω).

Note that g(ω) ≤ h(ω) and the result follows from this by integrating.

Definition 1. Suppose P is a probability measure. Let f be an extended real valued


integrable function. Then

m = ∫ f dP

is called the mean of f whenever the integral exists. Note that the mean can be equal
to ±∞.
Suppose that the mean m is finite. Then the variance of f is defined as

σ² = ∫ (f − m)² dP.

Note that the variance is always non-negative and can be equal to ∞.

Exercise 1 (Chebyshev Inequality). Suppose P is a probability measure. Let f be an


extended real valued integrable function with mean m and variance σ². Show that for
any k > 0,
P{ω : |f (ω) − m| ≥ kσ} ≤ k^{−2}.
Hint: Define appropriate functions g and h or use Lemma 1 directly. Note that the
inequality is useful only if k > 1. Further σ can be equal to ∞.
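Both bounds can be watched in a simulation (illustrative only; the choice of the Exp(1) distribution and the values of ε and k are assumptions, not part of the exercise). For X ~ Exp(1) we have mean m = 1 and σ = 1, so Chebyshev gives P(|X − 1| ≥ k) ≤ 1/k², while Markov with p = 1 gives P(X ≥ ε) ≤ 1/ε.

# Sketch: Monte Carlo check of the Markov and Chebyshev inequalities for X ~ Exp(1).
import random
random.seed(1)

N = 200_000
sample = [random.expovariate(1.0) for _ in range(N)]    # mean m = 1, variance sigma^2 = 1

eps, k = 3.0, 2.0
markov_lhs = sum(x >= eps for x in sample) / N           # P(X >= eps)
markov_rhs = 1.0 / eps                                   # (1/eps) * E(X), since E(X) = 1
cheb_lhs = sum(abs(x - 1.0) >= k for x in sample) / N    # P(|X - m| >= k*sigma), sigma = 1
cheb_rhs = 1.0 / k ** 2

print(f"Markov:    {markov_lhs:.4f} <= {markov_rhs:.4f}")
print(f"Chebyshev: {cheb_lhs:.4f} <= {cheb_rhs:.4f}")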

Lecture 29
Lp spaces
February 17, 2021

Note that we have proved that for p ≥ 1, Lp =: Lp (Ω, A, µ) is a vector space with the
norm || · ||p .
We now investigate the space Lp for 0 < p < 1. As before we identify two functions to
be equal if they are equal a.e.
Exercise 1. Fix 0 < p < 1. Show that for any a, b ≥ 0, (a + b)p ≤ ap + bp .
Theorem 1 (Lp is a metric space for 0 < p < 1). Fix 0 < p < 1. Then:

(a) Lp is a vector space;

(b) || · ||p is not a norm on Lp ;

(c) d(f, g) = ∫_Ω |f − g|^p dµ defines a metric on Lp .

Proof. (a) This follows from the above exercise.

(b) For a, b > 0, we have (a + b)1/p > a1/p + b1/p whenever 0 < p < 1. Thus the norm
condition is violated.

(c) This also follows easily from the above exercise.

So, to summarize, for p ≥ 1, Lp is a normed space with the norm || · ||_p and the metric
d(f, g) = [ ∫_Ω |f − g|^p dµ ]^{1/p}. For 0 < p < 1, || · ||_p is not a norm but d(f, g) = ∫_Ω |f − g|^p dµ
is a metric. In either case, for any 0 < p < ∞, {fn ∈ Lp} converges to f ∈ Lp in this
metric/norm if and only if ∫_Ω |fn − f|^p dµ → 0.
Definition 1. Suppose {fn ∈ Lp} and f ∈ Lp are such that ∫_Ω |fn − f|^p dµ → 0. Then we
say fn converges to f in Lp and we write fn → f in Lp .
Exercise 2. Suppose {fn } is a sequence of Borel measurable functions dominated by
g ∈ Lp for some p > 0. Suppose fn → f a.e. µ. Show that

(a) f ∈ Lp . Hint: Use DCT.


(b) fn → f in Lp . Hint: Use DCT.
Theorem 2 (Denseness of simple functions in Lp ). Suppose f ∈ Lp for some p > 0.
Fix any ε > 0. Then there exists a finite valued simple function g such that |g| ≤ |f|,
g ∈ Lp and ||f − g||_p < ε.

Proof. Without loss of generality we can assume that f is a real-valued function. Recall
the construction of simple functions {sn } given in Lemma 2 (b) of Lecture 16. These

sn satisfy (i) sn → f a.e. µ and (ii) |sn | ≤ |f | for all n. Then by Exercise 2 (b),
||sn − f ||p → 0.

We shall now prove that the space Lp is complete. We need the following lemma.

Lemma 1. Suppose {fn } ∈ Lp for some p > 0 and ||fk − fk+1 ||p < 4−k for all k ≥ 1.
Then {fn } converges a.e. µ.

Proof. Define
Ak = {ω : |fk (ω) − fk+1 (ω)| ≥ 2−k }.
Then by Markov’s inequality,

µ(Ak ) ≤ 2pk ||fk − fk+1 ||pp < 2−pk

which is summable. Hence by the Borel-Cantelli lemma, µ(lim sup Ak) = 0. Now if
ω ∉ lim sup Ak then there exists an N (ω) such that for all k ≥ N (ω), |fk (ω) − fk+1 (ω)| ≤ 2^{−k}.
This implies that {fk (ω)} is a Cauchy sequence and hence converges. Since this is true
for all ω ∉ lim sup Ak , the proof is complete.

Theorem 3 (Lp is complete for 0 < p < ∞). Suppose {fn } is Cauchy in Lp for some
0 < p < ∞. Then there is an f such that ||fn − f ||p → 0 as n → ∞.

Proof. (a) First assume that p ≥ 1. Since {fn} is Cauchy in Lp , we can choose increasing
integers {nk} such that

||fn − fm||_p < 4^{−k} for all m, n ≥ nk .
Let gk = fnk , k ≥ 1. Then by Lemma 1, gk converges a.e. µ to some f .
We shall now show that f ∈ Lp and ||fn − f||_p → 0. Fix ε > 0. Since {fn} is Cauchy in
Lp , choose N such that ||fn − fm||_p^p < ε for all n, m ≥ N. Fix n ≥ N and let m → ∞
through the subsequence {nk}. Then

ε ≥ lim inf_{k→∞} ||fn − fnk||_p^p
  = lim inf_{k→∞} ∫_Ω |fn − fnk|^p dµ
  ≥ ∫_Ω lim inf_{k→∞} |fn − fnk|^p dµ   by Fatou's lemma
  = ||fn − f||_p^p   since fnk → f a.e.

Now let n → ∞. Then lim sup_{n→∞} ||fn − f||_p ≤ ε. Since ε was arbitrary, this proves
that ||fn − f||_p → 0. We have not yet shown that f ∈ Lp . But f = (f − fn) + fn and hence
f ∈ Lp by the Minkowski Inequality.

(b) Now assume that 0 < p < 1. Then by the above proof (we had used p ≥ 1 only in the
final step to show that f ∈ Lp) ||fn − f||_p → 0. It remains to show that f ∈ Lp . We cannot
use the Minkowski inequality. But the conclusion follows by writing f = (f − fn) + fn and
then using Exercise 1.

Theorem 4. Consider the measure space (R^n , B(R^n), µ) where µ is a Lebesgue-Stieltjes
measure. Suppose f ∈ Lp =: Lp (R^n , B(R^n), µ) for some 0 < p < ∞. Then given any
ε > 0, there is a continuous function g ∈ Lp such that ||f − g||_p < ε. So the continuous
functions in Lp are dense in Lp .

Proof. By Theorem 2, it suffices to show that we can approximate in || · ||_p any indicator
function I_A ∈ Lp by a continuous function. Note that I_A ∈ Lp implies µ(A) < ∞. Hence
by Theorem 1 of Lecture 13, there exist C closed and V open such that C ⊂ A ⊂ V
and µ(V − C) < ε^p 2^{−p}. Using Urysohn's lemma¹, there exists a continuous function
g : Ω → [0, 1] such that g = 1 on C and g = 0 on V^c. Then

∫_Ω |I_A − g|^p dµ = ∫_{{ω : I_A(ω) ≠ g(ω)}} |I_A − g|^p dµ ≤ ∫_{{ω : I_A(ω) ≠ g(ω)}} 2^p dµ ≤ 2^p µ(V − C) < ε^p.

1
Urysohn’s Lemma: Suppose Ω is a metric space and K is a closed set in Ω and U is an open
neighbourhood of K. Then there exists a continuous function f : Ω → [0, 1] such that IK (ω) ≤ f (ω) ≤
IU (ω) for all ω ∈ Ω

Lecture 30
L∞
lp , 0 ≤ p ≤ ∞
February 3, 2021

Definition 1. Suppose (Ω, A, µ) is a measure space. For any real-valued measurable


function f , define

ess sup f = inf{ c ∈ R̄ : µ{ω : f (ω) > c} = 0 }.

Note that ess sup f is the smallest number c such that f ≤ c a.e. µ.
Example 1. (a) If f is the constant function −1, then ess sup f = −1.

(b) If f is a simple function f = ∑_{i=1}^n ci I_{Ai} then ess sup f = sup{ci : µ(Ai) > 0}.

(c) If f = g a.e. then ess sup f = ess sup g.

(d) If f ≤ g a.e. then ess sup f ≤ ess sup g.

For any complex-valued Borel measurable function f , define

||f ||∞ = ess sup |f |.

L∞ =: L∞ (Ω, A, µ) =: {f : ||f ||∞ < ∞}.

Note that ||f||_∞ is the smallest number c such that |f| ≤ c a.e. µ. So f ∈ L∞ if and
only if f is essentially bounded, that is, it is bounded outside a set of measure 0.
Exercise 1. Show that Hölder inequality holds for p = 1, q = ∞ and Minkowski inequal-
ity holds for p = ∞.
Theorem 1. (a) L∞ is a vector space and || · ||∞ is a norm on it.

(b) Suppose {fn ∈ L∞ }. Then ||fn − f ||∞ → 0 if and only if there exists a set A ∈ A
such that µ(A) = 0 and fn → f uniformly on Ac .

(c) L∞ is complete with respect to || · ||∞ .

Proof. (a) This is easy and we skip the details.

(b) Suppose ||fn − f||_∞ → 0. For any given positive integer m, ||fn − f||_∞ ≤ 1/m
for all large n. Hence there exists Am such that µ(Am) = 0 and, for all ω ∉ Am ,
|fn − f| ≤ 1/m for all large n. Let A = ∪_{m=1}^∞ Am . Then µ(A) = 0 and fn → f uniformly on A^c.

Conversely, suppose A ∈ A is such that µ(A) = 0 and fn → f uniformly on Ac . Then


given ε > 0, for all large n, |fn − f| ≤ ε on A^c. This implies |fn − f| ≤ ε a.e. µ. That is,
||fn − f||_∞ ≤ ε.

(c) This is left as an exercise.

Exercise 2. (a) Show that finite valued simple functions are dense in L∞ .

(b) Give an example to show that continuous functions g ∈ L∞ are not dense in L∞ when
we consider a measure space (Rn , B(Rn ), µ) where µ is a Lebesgue-Stieltjes measure.

Exercise 3. Suppose µ is a probability measure. Show that ||f ||p → ||f ||∞ as p → ∞.
Give an example to show that the result fails if µ is not a probability measure.
Exercise 4. Suppose µ is a finite measure. Show that if fn → f in L∞ then fn → f in Lp for all
0 < p < ∞. Show that the result is not true if µ is not a finite measure.

Definition 2 (The little lp ). Suppose Ω is a countable set, without loss of generality,


Ω = {1, 2, . . . , }. Equip Ω with the power set P(Ω) as the σ-field and the counting
measure µ. Then it is conventional to write Lp (Ω, P(Ω), µ) as lp (Ω) or simply as lp and
it is referred to as “little lp ”. These are spaces of sequences and

X
lp = |xn |p < ∞ , 1 ≤ p < ∞,

{xn } :
n=1


l = {xn } : {xn } is a bounded sequence .

Lecture 31
Convergence in measure
Almost uniform convergence
Egoroff’s Theorem
February 3, 2021

Definition 1. A sequence of complex-valued Borel measurable functions {fn } is said


to converge in µ-measure to f, if for every ε > 0, µ{ω : |fn (ω) − f (ω)| > ε} → 0 as
n → ∞. We say fn converges to f in measure. If µ is a probability measure, we say
fn converges to f in probability. We write fn → f in measure.
Exercise 1. Suppose fn converges to f and also to g in measure. Show that f = g a.e.
Definition 2. A sequence of Borel measurable functions {fn} is said to be Cauchy in
measure if given any ε > 0, µ{ω : |fn (ω) − fm (ω)| > ε} → 0 as n, m → ∞.
Exercise 2. Suppose {fn } is a sequence of functions in Lp (µ).
(a) Show that if fn → f in Lp for some 0 < p < ∞, then fn → f in measure. Hint: Use the Markov Inequality.
(b) Show that if {fn} is Cauchy in Lp for some 0 < p < ∞, then {fn} is Cauchy in
measure.
Exercise 3. Consider the Lebesgue measure on R. Let
fn (x) = 1/n if 0 ≤ x ≤ e^n, and fn (x) = 0 otherwise.

Check that fn → 0 uniformly on R and hence in L∞ and also a.e. Check that fn → 0 in
measure. Check that fn does not converge to 0 in Lp for any 0 < p < ∞.
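The last claim of the exercise can be checked by direct computation (a small sketch, not part of the notes): ∫ |fn|^p dλ = e^n / n^p, which blows up for every fixed p > 0 even though sup |fn| = 1/n → 0.

# Sketch: ||f_n||_p^p = e^n / n^p -> infinity for every fixed p > 0.
import math

for p in (0.5, 1.0, 2.0):
    vals = [math.exp(n) / n ** p for n in (1, 5, 10, 20)]
    print(f"p = {p}:", ["%.3g" % v for v in vals])
# Each row grows without bound, while sup |f_n| = 1/n -> 0.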
Definition 3. A sequence of complex-valued Borel measurable functions {fn } is said
to converge almost uniformly to f if given ε > 0, there exists a set A ∈ A such that
µ(A) < ε and fn → f uniformly on A^c.
Exercise 4. Consider the Lebesgue measure on [0, ∞). Let

fn (x) = 1 if n ≤ x ≤ n + 1/n, and fn (x) = 0 otherwise.

Check that fn → 0 a.e., in measure and in Lp , 0 < p < ∞. Show that fn does not
converge almost uniformly.
Exercise 5. Consider the Lebesgue measure on [0, ∞). Let
fn (x) = 1 if n ≤ x ≤ n + 1, and fn (x) = 0 otherwise.

Check that fn → 0 a.e., but fn does not converge in measure.

Theorem 1. (a) If fn → f almost uniformly, then fn → f in measure and fn → f a.e.

(b) If fn → f in measure then there is a sub-sequence {nk} such that fnk → f almost uniformly.

Proof. (a) Fix ε > 0. Let A be such that µ(A) < ε and fn → f uniformly on A^c. Fix
δ > 0. Then for all large n, |fn (ω) − f (ω)| < δ for all ω ∈ A^c. Therefore for such n,
{ω : |fn (ω) − f (ω)| > δ} ⊆ A. This implies that for all large n,

µ{ω : |fn (ω) − f (ω)| > δ} ≤ µ(A) ≤ ε.

So fn → f in measure.
To prove a.e. convergence, for every integer k ≥ 1, choose a set Ak such that µ(Ak) < 1/k
and fn → f uniformly on Ak^c. Let B = ∪_{k=1}^∞ Ak^c. Then fn → f on B. Moreover,

µ(B^c) = µ(∩_{k=1}^∞ Ak) ≤ µ(Ak) → 0 as k → ∞.

(b) First note that

{ω : |fn (ω) − fm (ω)| ≥ ε} ⊆ {ω : |fn (ω) − f (ω)| ≥ ε/2} ∪ {ω : |fm (ω) − f (ω)| ≥ ε/2}.

Hence as n, m → ∞,

µ{ω : |fn (ω) − fm (ω)| ≥ ε} ≤ µ{ω : |fn (ω) − f (ω)| ≥ ε/2} + µ{ω : |fm (ω) − f (ω)| ≥ ε/2} → 0.

This proves that {fn} is Cauchy in measure. Now for k = 1, 2, . . ., choose {nk} strictly
increasing such that

µ{ω : |fm (ω) − fn (ω)| ≥ 2^{−k}} ≤ 2^{−k},  n, m ≥ nk .




Let gk = fnk and

Ak = {ω : |gk (ω) − gk+1 (ω)| ≥ 2^{−k}},  A = lim sup Ak .    (1)

Then clearly, µ(Ak) ≤ 2^{−k} and hence by the Borel-Cantelli lemma, µ(A) = 0. But for all
ω ∉ A, ω ∈ Ak for only finitely many k and as a consequence, {gk (ω)} is Cauchy. So for
all ω ∉ A, gk (ω) → g(ω) for some g. Since µ(A) = 0, gk → g a.e. But this g must equal
f a.e. since gk → f in measure. Thus gk → f a.e.
Now we prove almost uniform convergence. Fix ε > 0. Let Br = ∪_{k=r}^∞ Ak . Then
µ(Br) < ε for all large r. If ω ∉ Br , by (1),

|gk (ω) − gk+1 (ω)| < 2^{−k},  k = r, r + 1, . . . .

By the Weierstrass M-test, gk converges uniformly on Br^c. Again, the limit must equal
f a.e.

Exercise 6. Show that {fn } is Cauchy in measure if and only if fn converges in measure.
Hint. Borrow from the above proof.

Lemma 1. Suppose µ is a finite measure. Then fn → f a.e. if and only if for every
δ > 0,
lim_{n→∞} µ( ∪_{k=n}^∞ {ω : |fk (ω) − f (ω)| ≥ δ} ) = 0.

Proof. Let
Bn,δ = {ω : |fn (ω) − f (ω)| ≥ δ},  Bδ = lim sup_n Bn,δ .
Then µ(∪_{k=n}^∞ Bk,δ) ↓ µ(Bδ) as n → ∞. Also

{ω : fn (ω) ↛ f (ω)} = ∪_{δ>0} Bδ = ∪_{m=1}^∞ B_{1/m} .

Hence fn → f a.e. if and only if µ(Bδ) = 0 for all δ > 0, if and only if µ(∪_{k=n}^∞ Bk,δ) → 0
as n → ∞ for all δ > 0.

Exercise 7. If µ is finite, show that {fn } is Cauchy a.e. if and only if, for all δ > 0,

lim_{n→∞} µ( ∪_{j,k=n}^∞ {ω : |fk (ω) − fj (ω)| ≥ δ} ) = 0.

Theorem 2 (Egoroff’s Theorem). If µ is a finite measure and fn → f a.e., then fn → f


almost uniformly.

Proof. Fix ε > 0 and integer j ≥ 1. From the above lemma, for sufficiently large
n = n(j),

µ(Aj) = µ( ∪_{k=n(j)}^∞ {|fk − f| ≥ 1/j} ) ≤ ε/2^j.

Let A = ∪_{j=1}^∞ Aj . Then µ(A) < ε. For δ > 0, choose j > 1/δ. Then for any k ≥ n(j)
and ω ∈ A^c,

|fk (ω) − f (ω)| < 1/j < δ.

Thus fn → f uniformly on A^c.

Exercise 8. Suppose µ is a finite measure. Show that if fn → f a.e. then fn → f in


measure. Give an example to show that the result is not true if µ is not a finite measure.
Exercise 9 (Extended DCT). Suppose {fn } is a sequence of measurable functions
such that for all n, |fn| ≤ g for an integrable function g. Suppose fn → f in measure.
Show that ∫ fn dµ → ∫ f dµ. Hint: Use Theorem 1.
Exercise 10 (Convergence in measure is metrizable). Define a metric on the space of
measurable functions as (functions which are equal a.e. are taken to be equal)

d(f, g) = ∫_Ω |f − g| / (1 + |f − g|) dµ.

Show that
µ
(a) d(fn , f ) → 0 implies fn → f .

3
µ
(b) If µ is finite then fn → f implies d(fn , f ) → 0. Hence convergence in measure is
metrizable when µ is finite.

(c) Give an example to show that (b) is not true when µ is not finite.

Exercise 11. Suppose µ is a finite measure.

(a) Suppose for every ε > 0, ∑_{n=1}^∞ µ{|fn − f| ≥ ε} < ∞. Show that fn → f a.e.

(b) Suppose for some 0 < p < ∞, ∑_{n=1}^∞ ||fn − f||_p^p < ∞. Show that fn → f a.e.

Lecture 32
Product σ-field
Transition measure/probability
February 4, 2021

As we have discussed a little bit, the Lebesgue measure on R^n is an "n-fold product" of
the (marginal) Lebesgue measure on R. So, λ_n([a1 , b1] × · · · × [an , bn]) = ∏_{i=1}^n λ([ai , bi])
for all rectangles. We may similarly construct product measures of any given marginal
measures. We shall also see how this idea can be generalised. Moreover, the ideas can
be extended to infinite products.

Definition 1 (Product σ-field). Suppose (Ωi , Ai ), i = 1, . . . , n are measurable spaces.


Let Ω = Ω1 × · · · × Ωn . Then a set A is said to be a measurable rectangle if
A = A1 × · · · × An where Ai ∈ Ai i = 1, . . . , n. The smallest σ-field (of subsets of Ω)
containing all measurable rectangles is called the product σ-field and we write this
σ-field as A1 ⊗ · · · ⊗ An .

Note that the product σ-field is not the Cartesian product of the σ-fields, even though the notation
suggests so. Some authors use the notation A1 × · · · × An to denote the product σ-field.
The Borel σ-field on Rn is the n-fold product σ-field of the Borel σ-field on R.

Exercise 1. (a) Show that the collection of all measurable rectangles may not always be
a field.

(b) Show that the collection of all finite disjoint unions of measurable rectangles is always
a field but may not be a σ-field.
(c) Show that (A1 ⊗ · · · ⊗ An−1 ) ⊗ An = A1 ⊗ · · · ⊗ An .

Exercise 2. Suppose A ∈ A1 ⊗ A2 . Let

A(ω2 ) = {ω1 ∈ Ω1 : (ω1 , ω2 ) ∈ A} ⊆ Ω1 ,


A(ω1 ) = {ω2 ∈ Ω2 : (ω1 , ω2 ) ∈ A} ⊆ Ω2 .

These sets are called sections of A. Note that sections can be empty.

(a) Identify the sections when A is a measurable rectangle.

(b) Show that A(ω2 ) ∈ A1 and A(ω1 ) ∈ A2 for all ω1 and ω2 . Hint: Consider the class
of sets for which the relations are true and show that the class contains all measurable
rectangles and is a σ-field.

(c) If A and B are disjoint in A then show that their sections are also disjoint.

(d) Suppose An ∈ A increases (respectively decreases) to A. Then show that An (ω2 )


increases (respectively decreases) to A(ω2 ). Likewise for the other section.

Definition 2 (Transition measure). Suppose (Ωi , Ai ), i = 1, 2 are two measurable
spaces. Suppose µ2 : Ω1 × A2 → [0, ∞] is a function such that

(a) For every ω1 ∈ Ω1 , µ2 (ω1 , ·) is a measure on A2 .

(b) For every A2 ∈ A2 , the function ω1 → µ2 (ω1 , A2 ) is measurable (with respect to A1 ).

Then µ2 is called a transition measure. In particular, if each measure in (a) is a


probability measure, then µ2 is called a transition probability.

Exercise 3. Here is a toy example. Suppose I have two boxes B1 and B2 . You may
think of them as two values of ω1 . Suppose Box 1 has 3 red balls and 3 white balls. Box
2 has 3 reds, 2 white and 5 black balls. I pick a box by some mechanism (this mechanism
could also be a probability measure on the two values). Once I pick a box, I pick one ball
“at random” from that box (that is, all balls in that box have equal chance to come into
my sample). That is, there are two probability measures on {red, black, white} depending
on which box is chosen. These are the two distinct transition probability measures given
below:

P1 (red) = 1/2, P1 (white) = 1/2, P1 (black) = 0 when Box 1 is picked,


P2 (red) = 3/10, P2 (white) = 1/5, P2 (black) = 1/2 when Box 2 is picked.
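A tiny computational sketch can show how a transition probability combines with a measure on the first coordinate, anticipating formula (1) of Theorem 1 below. The box-picking probabilities µ1(Box 1) = 1/3, µ1(Box 2) = 2/3 are an assumed choice of "mechanism", not specified in the example.

# Sketch: combining mu1 (choice of box) with the transition probability mu2 (ball colour).
mu1 = {"Box1": 1/3, "Box2": 2/3}           # assumed box-picking probabilities
mu2 = {                                     # transition probability mu2(omega1, .)
    "Box1": {"red": 1/2, "white": 1/2, "black": 0.0},
    "Box2": {"red": 3/10, "white": 1/5, "black": 1/2},
}

def mu(A1, A2):
    """mu(A1 x A2) = sum over omega1 in A1 of mu2(omega1, A2) * mu1(omega1)."""
    return sum(sum(mu2[b][c] for c in A2) * mu1[b] for b in A1)

print(mu({"Box1", "Box2"}, {"red"}))                      # 1/3*1/2 + 2/3*3/10 = 11/30
print(mu({"Box2"}, {"red", "black"}))                     # 2/3 * (3/10 + 1/2) = 8/15
print(mu({"Box1", "Box2"}, {"red", "white", "black"}))    # total mass = 1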

Definition 3. A transition measure is said to be uniformly σ-finite if there exists


{Bn } from A2 such that ∪∞
n=1 Bn = Ω2 and µ2 (ω1 , Bn ) < ∞ for all ω1 ∈ Ω1 and n ≥ 1.

Before presenting the next theorem, a word about notation. So far we have used dµ(ω)
to denote integration with respect to the measure µ. Henceforth we will also use µ(dω)
for this purpose. This will be convenient to deal with iterated integrals where one or
more variables are held fixed.

Theorem 1 (Combining transition probabilities). Suppose µ2 is a uniformly σ-finite


transition measure and µ1 is a σ-finite measure. Then there is a unique measure µ on
A = A1 ⊗ A2 such that
µ(A1 × A2) = ∫_{A1} µ2 (ω1 , A2) dµ1 (ω1),  for all A1 ∈ A1 , A2 ∈ A2 .    (1)

The measure of any set A ∈ A is given by


µ(A) = ∫_{Ω1} µ2 (ω1 , A(ω1)) dµ1 (ω1).    (2)

Further µ is σ-finite. If µ1 and the transition measures are all probability measures, then
µ is also a probability measure.

Lecture 33
Combining transition measure/probability
Product measure
February 4, 2021

Theorem 1 (Combining transition probabilities). Suppose µ1 is a σ-finite measure on


(Ω1 , A1) and µ2 : Ω1 × A2 → [0, ∞] is a uniformly σ-finite transition measure. Then there is a
unique measure µ on A = A1 ⊗ A2 such that

µ(A1 × A2) = ∫_{A1} µ2 (ω1 , A2) dµ1 (ω1),  for all A1 ∈ A1 , A2 ∈ A2 .    (1)

The measure of any set A ∈ A is given by


µ(A) = ∫_{Ω1} µ2 (ω1 , A(ω1)) dµ1 (ω1).    (2)

Further µ is σ-finite. If µ1 and the transition measures µ2 (ω1 , ·) are all probabilities,
then µ is also a probability.

Proof. (a) First assume that µ2 (ω1 , ·) are finite.

(i) If A ∈ A, then by Exercise 2 (b) of Lecture 32, we know that A(ω1) ∈ A2 .

(ii) We show that f (ω1 ) =: µ2 (ω1 , A(ω1 )) is a measurable function for all A ∈ A. To
show this, let C be the class of sets in A for which this holds. Suppose A = A1 × A2 ∈ A.
Then
µ2 (ω1 , A(ω1)) = µ2 (ω1 , A2) if ω1 ∈ A1 , and µ2 (ω1 , A(ω1)) = µ2 (ω1 , ∅) = 0 if ω1 ∉ A1 .

Thus µ2 (ω1 , A(ω1 )) = µ2 (ω1 , A2 )IA1 (ω1 ) which is measurable by assumption. Thus C
contains all measurable rectangles. It is easy to see that C contains the field of all finite
disjoint union of rectangles. [Use the fact that finite sums of measurable functions is
measurable]. Then it is easy to see that C is a monotone class. [We use the finiteness
of the transition measures and Exercise 2(c) of Lecture 32.] Thus by monotone class
theorem C = A.

(iii) Define µ(A) = ∫_{Ω1} µ2 (ω1 , A(ω1)) dµ1 (ω1),  A ∈ A.
Note that for all A ∈ A, the integrand is non-negative, and is a measurable function by
(ii). Hence the integral is defined and satisfies (2). In particular, taking A = A1 × A2 ,
it is easily seen that (1) holds. We now prove that this µ is a measure. Suppose {An } is
a sequence of disjoint sets in A. Then
µ(∪_{n=1}^∞ An) = ∫_{Ω1} µ2 (ω1 , (∪_{n=1}^∞ An)(ω1)) dµ1 (ω1)
= ∫_{Ω1} ∑_{n=1}^∞ µ2 (ω1 , An (ω1)) dµ1 (ω1)   since µ2 (ω1 , ·) is a measure
= ∑_{n=1}^∞ ∫_{Ω1} µ2 (ω1 , An (ω1)) dµ1 (ω1)   (the interchange is justified by the MCT, the summands being non-negative)
= ∑_{n=1}^∞ µ(An).

This proves the theorem in the finite case.

Now assume that µ2 (ω1 , ·) is uniformly σ-finite. Then follow the usual route. Split up
Ω2 by the uniform σ-finite condition into {Bn } and consider the restricted transition
measures. We omit the details.

Uniqueness: Suppose if possible ν is another measure on A for which (1) holds. Then
ν = µ on the field of finite disjoint unions of measurable rectangles. On the other hand,
µ is σ-finite. [Since µ1 is σ-finite, split up Ω1 into {Am } and then consider {Am × Bn }].
Thus µ and ν agree on the field and are both σ-finite. So by Caratheodory theorem they
must be equal on A.
Finally, it is easy to see that if µ1 and the transition measures µ2 (ω1 , ·) are all probability
measures, then µ is indeed a probability measure.

Exercise 1 (Product measure). Suppose µi are σ-finite measures on Ai , i = 1, 2.


Show that there is a unique measure on A1 ⊗ A2 such that µ(A1 × A2 ) = µ1 (A1 )µ2 (A2 )
for all Ai ∈ Ai , i = 1, 2. This measure is σ-finite and is known as the product measure
and we write µ = µ1 ⊗ µ2 or µ = µ1 × µ2 . Extend to more than two measures. Use this
to construct the Lebesgue measure on Rn .

Exercise 2. Suppose µ = µ1 ⊗ µ2 where both measures µ1 , µ2 are σ-finite. Suppose


A ∈ A. Show that the following are equivalent.

(a) µ(A) = 0.

(b) µ1 (A(ω2 )) = 0 for µ2 a.e. ω2 ∈ Ω2 .

(c) µ2 (A(ω1 )) = 0 for µ1 a.e. ω1 ∈ Ω1 .

Lecture 35
Infinite products
Product probability measure
Kolmogorov extension theorem
February 10, 2021

As we have remarked, the product measure theorem can be extended to product of n


spaces, n ≥ 3. We state this result, only for probability measures, without proof.
Theorem 1 (Measure on finite product). Suppose (Ωj , Aj ), 1 ≤ j ≤ n are measurable
spaces. Suppose Q1 is a probability measure on A1 and suppose Qj+1 : Ω1 × · · · × Ωj ×
Aj+1 → [0, 1] is a transition probability measure for each 1 ≤ j ≤ n − 1. Then there is
a unique probability measure Pn on A1 ⊗ · · · ⊗ An such that
Pn (A1 × · · · × An) = ∫_{A1} Q1 (dω1) ∫_{A2} Q2 (ω1 , dω2) · · · ∫_{An} Qn (ω1 , . . . , ωn−1 , dωn).

Exercise 1. In the above theorem, show that


Pn (B^n) = ∫_{Ω1} Q1 (dω1) ∫_{Ω2} Q2 (ω1 , dω2) · · · ∫_{Ωn} I_{B^n}(ω1 , . . . , ωn) Qn (ω1 , . . . , ωn−1 , dωn)    (1)

for all B n ∈ A1 ⊗ · · · ⊗ An .

We shall now see how to define a product measure on infinite product spaces. However,
we shall restrict our discussion to only probability measures.
For sets Ωj , j ≥ 1, let

Ω = ∏_{j=1}^∞ Ωj = {(ω1 , ω2 , . . .) : ωj ∈ Ωj for all j ≥ 1}.
Definition 1 (Measurable rectangles and cylinders). Suppose (Ωj , Aj ) are measurable
spaces. For any B n ⊆ Ω1 × · · · × Ωn , the set
Bn = B n × Ωn+1 × Ωn+2 × · · ·
= {ω ∈ Ω : (ω1 , ω2 , . . . , ωn ) ∈ B n } ⊂ Ω
is called a cylinder with n-dimensional base B n . The set Bn is said to be measurable
cylinder if B n ∈ A1 ⊗ · · · ⊗ An . If B n = A1 × · · · × An then Bn is called a (finite
dimensional) rectangle, and a measurable rectangle if Ai ∈ Ai , 1 ≤ i ≤ n.
Exercise 2. Show that:

(a) an n-dimensional cylinder is also an (n + 1) dimensional cylinder.

(b) the finite disjoint unions of measurable rectangles forms a field.


(c) the set of measurable cylinders is a field.

Definition 2 (Infinite product σ-field). The smallest σ-field generated by all measurable
cylinders is called the product σ-field. It is written as ∏_{j=1}^∞ Aj . If all Aj are
equal to A, then we write the product σ-field as A^∞.

Exercise 3. Show that ∏_{j=1}^∞ Aj is also the smallest σ-field generated by all measurable
finite dimensional rectangles.

We now state the product measure theorem for infinite products.

Theorem 2 (Measure on infinite product). Suppose (Ωj , Aj ), j ≥ 1 are measurable


spaces. Suppose Q1 is a probability measure on A1 and suppose that Qj+1 : Ω1 × · · · ×
Ωj × Aj+1 → [0, 1] is a transition probability measure for each j ≥ 1. Then there
exists a unique probability measure P on ∏_{j=1}^∞ Aj such that P agrees with the probability
measures {Pn } given in (1) in the sense that

P (Bn ) = Pn (B n ) for all B n ∈ A1 ⊗ · · · ⊗ An .

Exercise 4. Read the proof of the above theorem from the book. Look out for the steps
which will fail if the measures are not assumed to be probability measures.

Corollary 1. Suppose (Ωj , Aj , Pj), j ≥ 1 are probability spaces. Then there is a unique
probability measure P on ∏_{j=1}^∞ Aj such that

P{ω ∈ Ω : ω1 ∈ A1 , . . . , ωn ∈ An} = ∏_{j=1}^n Pj (Aj),

for all n ≥ 1 and Aj ∈ Aj , 1 ≤ j ≤ n.

The proof is left as an exercise. We denote the above probability as P = ∏_{j=1}^∞ Pj . If all
the probabilities Pj are equal, say to P, then the product probability is written as P^∞. If there
is no scope for confusion, we may write it also as P.

Remark 1 (Probability measure on arbitrary products). The above corollary can be
extended to arbitrary products ∏_{t∈I} Ωt over an index set I, provided we assume that for all t ∈ I,
the Ωt are complete separable metric spaces. This is known as the Kolmogorov extension theorem.

Exercise 5. Suppose for all j ≥ 1, Ωj = {H, T}, Aj = P(Ωj) (the power set), with the probability
P{H} = p, P{T} = 1 − p, 0 < p < 1. [Note that p = 1 and p = 0 are uninteresting cases]. Then
Ω = ∏_{j=1}^∞ Ωj is uncountably infinite, the product σ-field is A = ⊗_{j=1}^∞ Aj , and P^∞
is called the coin tossing probability. It is called symmetric if p = 1/2. Show that

P^∞({ω}) = 0 for all ω ∈ Ω,


P ∞ {ω ∈ Ω : only finitely many ωi = H} = 0
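The first claim can be made plausible by a finite computation (a sketch under the assumed value p = 0.3; any 0 < p < 1 behaves the same way): for a fixed ω, the probability of the cylinder determined by its first n coordinates is p^{#H} (1 − p)^{#T}, which bounds P^∞({ω}) and tends to 0 geometrically.

# Sketch: P-infinity of the cylinder fixing the first n tosses of a given omega.
p = 0.3                        # assumed value of p, 0 < p < 1
omega = "HTHHTHTTHH" * 5       # the first 50 coordinates of some fixed omega

def cylinder_prob(prefix, p):
    heads = prefix.count("H")
    return p ** heads * (1 - p) ** (len(prefix) - heads)

for n in (1, 5, 10, 25, 50):
    print(n, cylinder_prob(omega[:n], p))
# The values decrease geometrically, so P-infinity({omega}) = inf over n of these is 0.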

Lecture 36
Weak convergence of measures
February 17, 2021

We now wish to introduce a concept of convergence of finite measures that are


defined on metric spaces. We motivate this by some examples:
Example 1. Suppose Pn is the probability measure which puts equal mass 1/n on the
elements of the set An = {1/n, 2/n, . . . , 1}. Let P be the uniform probability measure
on the interval [0, 1]. Then any reasonable notion of convergence should imply that Pn
converges to P . Consider the set A = ∪∞n=1 An . Note that Pn (A) = 1 for all n. However,
P (A) = 0. This means that requiring Pn (A) → P (A) for every Borel set A cannot serve as a
convergence concept.

Recall that for any set A in a metric space, ∂A denotes the boundary of A. This is
the set of all points x of the metric space such that any neighbourhood of x contains
elements from both A and A^c.
Definition 1. Suppose µ is a measure on a metric space. Then a Borel set A is said to
be a µ continuity set if µ(∂A) = 0.

Observe that in Example 1, P (∂A) = 1 and hence A is not a P continuity set.


Example 2. Suppose Pn puts probability 1/2 each on the points 0 and n. In this case,
a mass of 1/2 escapes to ∞ and so Pn cannot converge to a probability measure on R.

The following theorem will lead to the correct concept of convergence of measures on a
metric space.
Definition 2. A function f on a metric space is said to be lower semi-continuous
(lsc) at x if lim inf y→x f (y) ≥ f (x). If f is lsc at every x, then we say that it is a lower
semi-continuous function. A function is upper semi continuous if −f is lsc.
Theorem 1 (Portmanteau theorem). Let µ, µ1 , µ2 , . . . be finite measures on the Borel
σ-field of a metric space Ω. Then the following are equivalent:
(a) ∫ f dµn → ∫ f dµ for all real bounded continuous functions f.

(b) lim inf ∫ f dµn ≥ ∫ f dµ for all real bounded lower semi-continuous functions f.

(b') lim sup ∫ f dµn ≤ ∫ f dµ for all real bounded upper semi-continuous functions f.

(c) ∫ f dµn → ∫ f dµ for all real bounded functions f that are continuous a.e. µ.

(d) lim inf µn (A) ≥ µ(A) for every open set A, and µn (Ω) → µ(Ω).

(d’) lim sup µn (A) ≤ µ(A) for every closed set A, and µn (Ω) → µ(Ω).

(e) µn (A) → µ(A) for every µ-continuity set A.

Definition 3. A sequence of measures µn on a metric space is said to converge weakly
(or vaguely) to µ if any of the above equivalent conditions hold. We write µn →w µ or
simply µn → µ.
Example 3 (Discrete and continuous uniform). Recall Example 1. All the probability
measures are defined on the Borel σ-field of the metric space Ω = [0, 1] with the usual
metric. Suppose f is any bounded continuous function on Ω. Then
∫ f dPn = (1/n) ∑_{i=1}^n f (i/n) → ∫_0^1 f (x) dx.

Hence (a) of Theorem 1 is satisfied with P as the uniform probability measure on [0, 1]
w
and hence Pn → P .
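A numerical illustration of the computation above (a sketch; the test function f(x) = cos(πx) is just one choice of bounded continuous function): the averages ∫ f dPn approach ∫_0^1 f(x) dx.

# Sketch: int f dP_n = (1/n) * sum f(i/n) approaches int_0^1 f(x) dx.
import math

f = lambda x: math.cos(math.pi * x)      # a bounded continuous test function
exact = 0.0                               # int_0^1 cos(pi x) dx = 0
for n in (10, 100, 1000, 10000):
    approx = sum(f(i / n) for i in range(1, n + 1)) / n
    print(n, approx, abs(approx - exact))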
Exercise 1. Suppose Pn is the uniform probability measure on An = {1, 2, . . . , n}. Show
that Pn does not converge weakly to any probability measure.
Exercise 2. Suppose Pn puts probability 1/3 and 2/3 at the points 1/n and −1/n re-
spectively. Show that Pn converges weakly to a probability measure P . Identify P .
Definition 4. Suppose P is a probability measure such that P {x} = 1 for some x. Then
P is called the point mass at x and is often written as δx .
Exercise 3 (Binomial to Poisson). Suppose Pn is the binomial probability measure
with parameters n and pn . This means that
 
Pn (k) = (n choose k) p_n^k (1 − p_n)^{n−k},  k = 0, 1, . . . , n.
Suppose as n → ∞, npn → λ, for some 0 < λ < ∞. Show that the probability measures
Pn converge to the Poisson probability measure P with parameter λ, defined by

P (k) = e^{−λ} λ^k / k!,  k = 0, 1, . . . .
Hint: Show that Pn (k) → P (k) for every k.
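The hint can be checked numerically (a sketch with the assumed choices λ = 2 and p_n = λ/n): the binomial masses Pn(k) approach the Poisson masses P(k).

# Sketch: binomial(n, lambda/n) masses converge to Poisson(lambda) masses.
import math

lam = 2.0

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

for n in (10, 100, 1000):
    p = lam / n
    diffs = [abs(binom_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(6)]
    print(n, max(diffs))      # the maximum difference over k = 0..5 shrinks with n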
Exercise 4. Suppose C =: {x1 , x2 , . . .} is a countable set in a metric space and has no
limit points. Suppose {Pn} and P are probability measures on C, so that ∑_{k=1}^∞ P (xk) = 1.
Show that Pn →w P if and only if Pn (xk) → P (xk) for every k ≥ 1.

Proof of Theorem 1. (a) ⇒ (b). Suppose g is any bounded continuous function such that g ≤ f. Then

lim inf ∫ f dµn ≥ lim inf ∫ g dµn = ∫ g dµ.

Now since f is lower semi-continuous (lsc) and |f| ≤ M, there exists a sequence of
continuous functions {gn}, bounded in absolute value by M, such that gn ≤ f and gn → f pointwise. Then by
the DCT (µ is a finite measure!), ∫ gn dµ → ∫ f dµ. This implies (b).

(b) ⇔ (b’): f is lsc if and only if −f is usc.

(b) ⇒ (c): For any measurable f, let

f_* = sup{g : g is lsc, g ≤ f},  f^* = inf{g : g is usc, g ≥ f}

be the lower and upper envelopes of f. Then f_* is lsc and f^* is usc. Further

f_*(x) = lim inf_{y→x} f (y) for all x,
f^*(x) = lim sup_{y→x} f (y) for all x,
f_*(x) = f (x) = f^*(x) for all continuity points x of f.

Hence if f is bounded and continuous a.e. µ, then

∫ f dµ = ∫ f_* dµ
≤ lim inf ∫ f_* dµn   by (b)
≤ lim inf ∫ f dµn    since f_* ≤ f
≤ lim sup ∫ f dµn
≤ lim sup ∫ f^* dµn   since f^* ≥ f
≤ ∫ f^* dµ           by (b')
= ∫ f dµ.

This proves that lim ∫ f dµn = ∫ f dµ.
(c) ⇒ (d): Clearly (c) ⇒ (a) ⇒ (b). Suppose A is an open set. Then IA is a lsc function.
Hence by (b), lim inf µn (A) ≥ µ(A). The second part follows by using IΩ .

(d) ⇔ (d’): This follows by taking complements of sets.

(d) ⇒ (e): Suppose A is a µ-continuity set. Let Ao and Ā respectively be the interior
and closure of A. Then

lim sup µn (A) ≤ lim sup µn (Ā) ≤ µ(Ā)  by (d')  = µ(A)  by assumption.

Similarly, using the interior A°,

lim inf µn (A) ≥ lim inf µn (A°) ≥ µ(A°) = µ(A).

This proves that lim µn (A) = µ(A).

(e) ⇒ (a). Let f be a continuous function on Ω such that |f | < M . Let


A = {c ∈ R : µ(f^{−1}{c}) ≠ 0}.
Since µ is finite, A is countable. If ±M belong to A, increase the value of M so that
they do not belong to A. Note that this is possible since A is countable. Let t0 = −M <
t1 · · · < tj = M be a partition of [−M, M ], where none of the ti belong to A. Let
Bi = {x : ti ≤ f (x) < ti+1 }, 0 ≤ i ≤ j − 1.
Note that f^{−1}(ti , ti+1) is open, ∂f^{−1}[ti , ti+1) ⊆ f^{−1}{ti , ti+1} and µ(f^{−1}{ti , ti+1}) = 0
since ti , ti+1 ∉ A. Now
| ∫ f dµn − ∫ f dµ | ≤ | ∫ f dµn − ∑_{i=0}^{j−1} ti µn (Bi) |
+ | ∑_{i=0}^{j−1} ti µn (Bi) − ∑_{i=0}^{j−1} ti µ(Bi) | + | ∫ f dµ − ∑_{i=0}^{j−1} ti µ(Bi) |
= T1 + T2 + T3, say.
Then
T1 = | ∑_{i=0}^{j−1} ∫_{Bi} (f (x) − ti) dµn (x) | ≤ max_i (ti+1 − ti) µn (Ω)
which can be made arbitrarily small by refining the partition (with partition points never
belonging to A) and also noting that µn (Ω) → µ(Ω) < ∞ as n → ∞ by (e).
Similarly,
T3 ≤ max_i (ti+1 − ti) µ(Ω),
which can also be made arbitrarily small.
Finally T2 → 0 as n → ∞ since each Bi is a continuity set of µ.
Exercise 5. Show that on a metric space, if ∫ f dµn → ∫ f dµ for all real bounded
uniformly continuous functions f, then µn →w µ. Hint: Use the following fact from
real analysis: If F is a closed set in a metric space Ω, then I_F is the limit of a non-
increasing sequence of uniformly continuous functions fn such that 0 ≤ fn ≤ 1.
Exercise 6. Show that if µn (A) → µ(A) for every open µ-continuity set A, then µn →w µ.

Lecture 37
Weak convergence of distribution functions
February 10, 2021

Since every finite measure on R has a distribution function, weak convergence can be
expressed in terms of distribution functions.
Recall that if F is a distribution function then F (±∞) are defined in the natural way:
F (±∞) = limy→±∞ F (y).

Definition 1. For any distribution function F , CF will denote the set of continuity
points of F . This includes the points ±∞.

Theorem 1. Let µ, µ1 , µ2 , . . . be finite measures on B(R) with distribution functions


F, F1 , . . . . Then the following are equivalent.
(a) µn →w µ.

(b) Fn (a, b] = Fn (b) − Fn (a) → F (a, b] = F (b) − F (a) for all a, b ∈ CF .


Moreover, if all the distributions are 0 at −∞, then (b) is equivalent to

Fn (x) → F (x) for all x ∈ CF .

To denote this convergence we often write Fn →w F or simply Fn → F.

Proof. (a) ⇒ (b): Consider the Borel set A = (a, b] for a, b finite. Note that ∂A = {a, b}.
Suppose a and b are continuity points of F. Then µ{a} = µ{b} = 0, so that (a, b] is a
µ-continuity set. Hence Fn (a, b] = µn (a, b] → µ(a, b] = F (a, b]. If a = −∞, then the
argument is the same. If b = ∞, then apply the earlier argument to (a, ∞).

(b) ⇒ (a): Suppose A is an open set. Write A as a countable disjoint union ∪_{i=1}^∞ Ii where
{Ii} are open intervals. Then

lim inf µn (A) = lim inf ∑_k µn (Ik) ≥ ∑_k lim inf µn (Ik)  by Fatou's lemma on the counting measure.

Fix ε > 0. For each k, let Jk be a right semi-closed sub-interval of Ik such that the end
points of Jk are continuity points of F, and

µ(Jk) ≥ µ(Ik) − ε 2^{−k}.

[This can be done since F has only countably many discontinuities]. Then

lim inf µn (Ik ) ≥ lim inf µn (Jk ) = µ(Jk ) by (b).

Hence

lim inf µn (A) ≥ ∑_k µ(Jk) ≥ ∑_k µ(Ik) − ε = µ(A) − ε.

Since ε is arbitrary, we have lim inf µn (A) ≥ µ(A). Since A was an arbitrary open set,
µn →w µ by Theorem 1 (d) of Lecture 36.

Exercise 1. Consider the functions Fn (·) defined by


Fn (x) = 0 if x ≤ 0;  Fn (x) = 1 − (1 − x/n)^n if 0 < x < n;  Fn (x) = 1 if x ≥ n.    (1)

Show that each Fn is a continuous distribution function and they converge to some F
weakly. Identify F .
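A quick numerical look at (1) (a sketch, not required by the exercise) suggests the limit: for fixed x, 1 − (1 − x/n)^n → 1 − e^{−x}, the Exp(1) distribution function.

# Sketch: F_n(x) = 1 - (1 - x/n)^n compared with F(x) = 1 - exp(-x).
import math

def F_n(x, n):
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    return 1.0 - (1.0 - x / n) ** n

for x in (0.5, 1.0, 3.0):
    print(x, [round(F_n(x, n), 6) for n in (10, 100, 1000)], round(1 - math.exp(-x), 6))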
Exercise 2. Suppose for every n, {p_{n,k}} is a sequence of non-negative numbers such that
∑_{k=1}^∞ p_{n,k} = 1. Suppose for every k, lim_{n→∞} p_{n,k} = p_k , where ∑_{k=1}^∞ p_k = 1.

(a) Show that lim_{n→∞} ∑_{k=1}^∞ |p_{n,k} − p_k| = 0. Hint: Use |a − b| = a + b − 2 min(a, b) and
DCT.

(b) Suppose Ω = {1, 2, . . .}. Consider the probability measures Pn and P on Ω defined
by the sequences {pn,k } and {pk }. Show that Pn → P .
Exercise 3. (a) Suppose {fn} and f are non-negative Borel measurable functions on R
such that for all n, ∫ fn (x) dx = ∫ f (x) dx = 1. Suppose further that fn → f a.e. λ. Show
that lim_{n→∞} ∫ |fn (x) − f (x)| dx = 0.

(b) Suppose Fn and F are the distribution functions corresponding to {fn } and f :
F (x) = ∫_{−∞}^x f (y) dy,  Fn (x) = ∫_{−∞}^x fn (y) dy,  x ∈ R, n ≥ 1.

Show that Fn →w F.
Remark 1. One of the pillars of probability theory is the central limit theorem.
This theorem is about convergence of a certain sequence of probability measures to the
Gaussian measure. We shall state and prove this result later.
Remark 2. 1. Theorem 1 can be extended to measures and distributions on Rn .

2. Note that for measures on metric spaces, there is no distribution function.

Lecture 38
Independence
February 14, 2021

Definition 1 (Random experiment, events, outcomes). Any probability space (Ω, A, P )


may be considered as a random experiment. The elements of Ω are called outcomes
and the elements of A are called events.

Example 1. Consider the two-fold product in the coin-tossing example. Then Ω =


{HH, HT, T H, T T } and is equipped with the power σ-field, and

P (HH) = p2 , P (T T ) = (1 − p)2 , P (T H) = (1 − p)p and P (HT ) = p(1 − p).

The random experiment here is tossing a coin two times and the four possible results
of this experiment are laid out by the elements of Ω. These are the outcomes of the
experiment. The experiment is called random since the result of the experiment is not
certain but is governed by the above probabilities of the outcomes. Once the experiment
is performed (the two tosses are complete), we can say if a specific event has occurred
or not.

Example 2 (Example 1 continued). One of the key concepts in probability theory is that
of independence. The idea is already germane in the product probability spaces that we
constructed. Let us revisit the above example. Consider the two events

A = {HT, HH}, B = {HH, T H}.

They can be verbally described as “the first toss is head” and “the second toss is head”.
Their probabilities are given by

P (A) = p and P (B) = p.

Now note that the first toss should not influence the result of the second toss. Since
the events A and B depend respectively on the first and the second tosses, this lack of
influence must be reflected in the calculation of probabilities. Now, it is easy to check
that
P (A ∩ B) = p2 = P (A)P (B).
This leads us to the concept of independence of events.

Definition 2. Two events A and B in a probability space are said to be probabilisti-


cally independent (in short, independent) if

P (A ∩ B) = P (A)P (B).

Exercise 1. (a) Consider the class of events which are either P -null or have probability
1. Show that any two events in this collection are independent. In particular, Ω and ∅
are always independent of each other and each is independent of itself.

(b) Any event with probability 1 or 0 is independent of any other event in the σ-field.

(c) Suppose A and B are disjoint. Then they are independent if and only if at least one
of them is a P-null set.

(d) If A is independent of itself, then either A is a P-null set or A^c is a P-null set. That
is, either P (A) = 1 or P (A^c) = 1.

Exercise 2. If A and B are independent, show that then each of the pair of events
{Ac , B c }, {Ac , B} and {A, B c } are also independent.

In the above example, “intuition” told us that the events A and B cannot “influence”
each other. Indeed, we constructed our product probability space in such a manner.
Sometimes presence of independence is not that intuitive.

Exercise 3. In Example 1 with p = 1/2, let C = {HH, TT} be the event that "the outcomes of both
tosses are the same". Show that A and C are independent and so are B and C. Show that the
events A ∩ B and C are not independent.

Nevertheless, intuitively if we have three events A, B and C that we wish to declare to


be independent, then for example, A ∩ B and C should also be independent. This leads
to the following definition.

Definition 3 (Independence of events). Let I be any index set and let {Ai , i ∈ I} be a
collection of events in a probability space. Then they are said to be independent, if for
all finite sub-collections {i1 , . . . , ik } of distinct indices from I, we have

P (A_{i1} ∩ A_{i2} ∩ · · · ∩ A_{ik}) = P (A_{i1}) P (A_{i2}) · · · P (A_{ik}).

Exercise 4. (a) If {Ai , i ∈ I} are independent, then show that the collection {Bi , i ∈ I}
obtained by replacing some (or all) Ai by Aci is also independent.

(b) If {Ai , i ∈ I} are independent, then clearly any sub-collection is also independent.

Example 3. Note that to check independence one needs to check a lot of conditions.
None of the conditions can actually be dropped. Consider the throw of two dice. Then
we can consider the set of all 36 outcomes and suppose the dice is fair. So, each of the
36 outcomes have the same probability 1/36. Let

A = {second throw shows 1, 2 or 5 }


B = {second throw shows 4, 5 or 6 }
C = {sum of the two faces equal 9 }.

Then it can be checked that P (A) = 1/2, P (B) = 1/2, P (C) = 1/9. Further,
1 1
P (A ∩ B) = 6= P (A)P (B) = ,
6 4

2
1 1
P (A ∩ C) = 6 P (A)P (B) = ,
=
36 18
1 1
P (B ∩ C) = 6 P (B)P (C) = ,
=
12 18
1
P (A ∩ B ∩ C) = = P (A)P (B)P (C).
36
Note that the last equation does not imply that A, B and C are independent, nor does it
imply that A ∩ B and C are independent.
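The probabilities quoted above can be verified by brute-force enumeration of the 36 equally likely outcomes (a small sketch, not part of the notes).

# Sketch: enumerate the 36 outcomes of two fair dice and check the probabilities above.
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = lambda event: Fraction(sum(1 for o in outcomes if event(o)), 36)

A = lambda o: o[1] in (1, 2, 5)        # second throw shows 1, 2 or 5
B = lambda o: o[1] in (4, 5, 6)        # second throw shows 4, 5 or 6
C = lambda o: o[0] + o[1] == 9         # sum of the two faces equals 9

print(P(A), P(B), P(C))                                          # 1/2 1/2 1/9
print(P(lambda o: A(o) and B(o)), P(A) * P(B))                   # 1/6 vs 1/4
print(P(lambda o: A(o) and C(o)), P(A) * P(C))                   # 1/36 vs 1/18
print(P(lambda o: B(o) and C(o)), P(B) * P(C))                   # 1/12 vs 1/18
print(P(lambda o: A(o) and B(o) and C(o)), P(A) * P(B) * P(C))   # 1/36 = 1/36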

Exercise 5. Consider two probability spaces (Ωi , Ai , Pi ), i = 1, 2. Suppose Ai ∈ Ai , i =


1, 2. Check that the events A1 ×Ω2 and Ω1 ×A2 are independent in the product probability
space. In what ways can this be generalized to n-fold product and infinite products?

Lecture 39
Conditional probability
Second Borel-Cantelli Lemma
Independence of σ-fields
February 14, 2021

Suppose B is an event in some probability space (Ω, A, P ). If I am told that the event
B has “occurred”, how does that change the probabilities of events in A?
Definition 1 (Conditional probability measure). Suppose (Ω, A, P ) is a probability
space and P (B) > 0. Then the probability measure P (·|B) defined as
P (A|B) = P (A ∩ B) / P (B),  A ∈ A,
is called the conditional probability measure given B. For every A ∈ A, P (A|B)
is called the conditional probability of A given B.

Note that the conditional probability measure depends on the “condition” B and is
defined only if P (B) > 0. Further, if P (A) > 0, then the conditional probability of B
given A equals P (B|A) = P (A ∩ B)/P (A). This is in general different from P (A|B)
unless P (A) = P (B).
Exercise 1. Check that PB is indeed a probability measure. Moreover, if P (B) = 1,
then PB ≡ P .
Definition 2 (Restricted σ-field). Suppose (Ω, A, P ) is a probability space and B is a
non-empty set in A. Define

AB = {A ∩ B : A ∈ A}.

Note that AB is a σ-field of subsets of B. It is called the restriction of A on B.


Exercise 2. Check that the conditional probability measure PB can be considered as a
probability measure on the restricted σ-field AB .
Exercise 3. Check that any two events A and B are independent whenever at least one
of them is P -null.
Exercise 4. (a) Suppose A and B are two events such that P (A) > 0. Then

P (A ∩ B) = P (B|A)P (A).

Moreover, A and B are independent if and only if P (B|A) = P (B).

(b) Suppose {Ai }, 1 ≤ i ≤ n are events such that P (A1 ∩ A2 ∩ · · · ∩ An−1 ) > 0. Then

P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ) · · · P (An |A1 ∩ A2 ∩ · · · ∩ An−1 ).

This is usually known as the law of successive conditioning.

Exercise 5. Suppose {Bi} is a measurable partition of Ω. Then show that for any event A,

P (A) = ∑_{i=1}^∞ P (A ∩ Bi) = ∑_{i : P (Bi)>0} P (A|Bi) P (Bi).

This is usually known as the law of total probability.


Exercise 6. Suppose Box 1 has 6 red and 4 blue marbles. Box 2 has 5 red, 4 blue and
10 white marbles. Suppose we pick Box 1 or 2 with probabilities 1/3 and 2/3. Once a
box is picked we draw a marble from that box with equal chance to all the marbles in
the box. Formulate this with appropriate probability spaces and use conditional and total
probability to compute the probability that the marble drawn is red.
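The computation asked for is exactly an application of the law of total probability (a sketch; it simply evaluates P(red) = P(red | Box 1)P(Box 1) + P(red | Box 2)P(Box 2)).

# Sketch: law of total probability for the marble experiment of Exercise 6.
from fractions import Fraction as F

P_box = {1: F(1, 3), 2: F(2, 3)}                   # P(Box i)
P_red_given_box = {1: F(6, 10), 2: F(5, 19)}       # 6 red of 10 marbles; 5 red of 19 marbles

P_red = sum(P_red_given_box[b] * P_box[b] for b in (1, 2))
print(P_red)        # 1/3 * 6/10 + 2/3 * 5/19 = 1/5 + 10/57 = 107/285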

Recall the first Borel-Cantelli Lemma which said that ∑_{i=1}^∞ P (Ai) < ∞ implies that
P (lim sup An) = 0. The second Borel-Cantelli Lemma is for independent events and will
be quite useful to us.

Lemma 1 (Second Borel-Cantelli Lemma). Suppose {Ai} are independent events. Then
∑_{i=1}^∞ P (Ai) = ∞ implies P (lim sup_n An) = 1.

Proof.

P (lim sup An) = P (∩_{n=1}^∞ ∪_{k=n}^∞ Ak) = lim_{n→∞} P (∪_{k=n}^∞ Ak) = lim_{n→∞} lim_{m→∞} P (∪_{k=n}^m Ak).

Now

P ((∪_{k=n}^m Ak)^c) = P (∩_{k=n}^m Ak^c)
= ∏_{k=n}^m P (Ak^c)   by independence
≤ ∏_{k=n}^m exp{−P (Ak)}   since 1 − x ≤ e^{−x}
→ 0 as m → ∞, since ∑_{i=1}^∞ P (Ai) = ∞.

Hence lim_{m→∞} P (∪_{k=n}^m Ak) = 1 for every n, and therefore P (lim sup An) = 1.

The independence of events can be extended as follows:


Definition 3. Suppose (Ω, A, P) is a probability space. Sub-σ-fields {Ai , i ∈ I} are said
to be independent if every collection of events {Ei : Ei ∈ Ai , i ∈ I} is independent.

Lecture 40
Random variable
PDF, CDF, PMF
Bernoulli, Exponential, Gaussian
Mean, variance
February 15, 2021

Definition 1 (Random variable). If (Ω, A, P ) is a probability space, then any measurable function X : (Ω, A) → (R, B(R)) is called a random variable. Random variables are usually denoted by capital letters X, Y etc.

Definition 2 (Distribution of a random variable). If X is a random variable defined on some probability space (Ω, A, P ), then the probability measure PX on the Borel σ-field defined by

PX (B) = P {ω : X(ω) ∈ B}

is called the probability distribution or, in short, the distribution of X. Often we suppress ω and write P {X ∈ B}.

The cumulative distribution function (or c.d.f.) of X is defined by

FX (x) = PX (−∞, x] = P {ω : X(ω) ≤ x} = P {X ≤ x}, x ∈ R.

Often we simply say distribution function or d.f., and suppress the subscript X.

Exercise 1. For any random variable X, we have FX (−∞) = 0 and FX (∞) = 1. Con-
versely, suppose we are given a distribution function (right continuous, non-decreasing)
F on R such that F (−∞) = 0, F (∞) = 1. Show that we can always construct a probabil-
ity space and a random variable X on it such that its c.d.f is F . Hint: Think of identity
mapping.
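One standard construction, sketched along the lines of the hint: take Ω = R, A = B(R), and let P be the Lebesgue-Stieltjes probability measure determined by F (so that P (a, b] = F (b) − F (a)). Then X(ω) = ω is a random variable on (Ω, A, P ) with P {X ≤ x} = P (−∞, x] = F (x), so its c.d.f. is F .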

Remark 1. (a) A random variable X is said to be discrete if there is a countable set {xn } such that ∑_{n=1}^∞ P {X = xn } = 1. The number pn = P {X = xn } is called the mass of X at xn and {pn } is collectively called the probability mass function or p.m.f. of X. The p.m.f. which puts entire mass at a fixed x is denoted by δx .

(b) X is said to be absolutely continuous if PX << λ. This is the same as saying the distribution function F of X is absolutely continuous. In that case, there is a non-negative Borel measurable function f such that

P {X ≤ x} = F (x) = ∫_{−∞}^{x} f (t) dt, for all x ∈ R.

The function f is called the probability density function or, simply, the density function or p.d.f. of X.

(c) X is said to be continuous if its distribution function F is continuous. This is the
same as saying P {X = x} = 0 for all x ∈ R. Note that if X is continuous, it need not be
absolutely continuous. An example is given by the Cantor function F given in Exercise
4 of Lecture 26. Extend the Cantor function to the entire R by defining it to be 1 for
all x ≥ 1 and to be 0 for all x ≤ 0. Note that F is indeed a distribution function with
F (−∞) = 0 and F (∞) = 1. Let X be a random variable with distribution function F .
This is possible by Exercise 1. Then X is continuous but not absolutely continuous.
Example 1. (a) X is said to be a Bernoulli random variable if its mass function is
concentrated at two values. Usually the values are 0 and 1 with

P {X = 1} = p, P {X = 0} = q, p + q = 1.

We often write X ∼ Ber(p).


(b) X is said to be an exponential random variable with parameter λ if its density function is given by

f (x) = λ e^{−λx} , x > 0.

Check that it is indeed a density function. We write X ∼ Exp(λ).
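As a quick check (a sketch): ∫_0^∞ λ e^{−λx} dx = [−e^{−λx}]_0^∞ = 1, so f is indeed a density.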
(c) X is said to be a normal or a Gaussian random variable with mean m and variance σ² > 0 if its density function is given by

f (x) = (1/√(2πσ²)) e^{−(x−m)²/(2σ²)} , x ∈ R.

Check that it is indeed a density function. We write X ∼ N (m, σ²). If m = 0 and σ² = 1, then the distribution is called standard normal or standard Gaussian.
Definition 3. Suppose X is a random variable on a probability space (Ω, A, P ). Then its mean (whenever the integral exists) is defined as

E(X) = ∫_Ω X(ω) dP (ω).

If E(X) = m is finite, then the variance of X is defined as

V(X) = ∫_Ω [X(ω) − m]² dP (ω).

Exercise 2. (a) Show that the mean and variance of X can be written in terms of the probability distribution of X as

E(X) = ∫_R x dPX (x) (whenever E(X) exists),

and whenever E(X) is finite,

V(X) = ∫_R (x − m)² dPX (x).

(b) Show that if X ∼ N (m, σ 2 ), then E(X) = m and V(X) = σ 2 .
(c) Find E(X n ) for all positive integers n when X is a Bernoulli or an exponential
random variable.
(d) If X ∼ N (0, 1), find E(X n ) for all positive integers n.

Lecture 41
Projection mapping
Random vectors
Joint and marginal distributions
Independence
February 15, 2021

The concepts of real-valued random variables, cumulative distribution functions etc. can
be extended to vectors in the natural way.

Definition 1 (Random vector, cdf, pdf). Any Rn -valued measurable function on a probability space (Ω, A, P ) is called a random vector. Its distribution and cdf are defined in the natural way:

PX (B) = P {ω : X(ω) ∈ B}, B ∈ B(Rn ),

FX (x) = PX (−∞, x] = P {ω : X(ω) ≤ x} (co-ordinate wise) = P {X ≤ x}, x ∈ Rn .

Random vectors are also usually denoted by capital letters X, Y etc. Discrete, continuous and absolutely continuous random vectors are categorised as before.

Note that if X is a random vector then we can write X = (X1 , . . . , Xn ).

Definition 2 (Projection mapping). The mappings pi : Rn → R defined by

pi (x) = xi , x = (x1 , x2 , . . . , xn ) ∈ Rn , 1 ≤ i ≤ n

are called the projection mappings.

Exercise 1. (a) Check that each pi is a Borel measurable function.


(b) Suppose X = (X1 , . . . , Xn ) : Ω → Rn is a function. Show that X is a random vector
if and only if Xi = pi ◦ X is a random variable for all i.

Definition 3. If X = (X1 , . . . , Xn ) is a random vector, then the probability distributions or measures {PXi }, 1 ≤ i ≤ n, are called the marginal distributions. The measure PX is called the joint probability distribution or measure of {X1 , . . . , Xn }.

Definition 4. For any random vector X on (Ω, A, P ), the σ-field generated by X is


defined as
AX = {X −1 (B) : B ∈ B(Rn )}.

Note that AX is the smallest sub-sigma field of A that makes X measurable.

Definition 5 (Independence of random vectors). Random vectors {Xi , i ∈ I} defined on
(Ω, A, P ) are said to be independent if the σ-fields generated by them are independent.
In particular two random variables X1 and X2 are independent if and only if

P {X1 ∈ B1 , X2 ∈ B2 } = P {X1 ∈ B1 }P {X2 ∈ B2 }, for all B1 , B2 ∈ B(R).

Note that, for n random variables, the above is equivalent to saying that PX1 ,X2 ,...,Xn = PX1 × PX2 × · · · × PXn . That is, the joint probability measure is the product of the marginal probability measures. Note that independent random vectors may not have the same number of co-ordinates.

Exercise 2. Suppose {Fn } are probability distribution functions on R. Show that there is a probability space and a random vector X whose components {Xn } are independent with distributions {Fn }.

Lecture 42
Criterion for independence
Covariance
February 23, 2021

Theorem 1. Suppose X = (X1 , . . . , Xn ) is a random vector on (Ω, A, P ) with distribution function F and marginal distribution functions F1 , F2 , . . . , Fn .

(a) Then {Xi , 1 ≤ i ≤ n} are independent if and only if

F (x1 , . . . , xn ) = ∏_{i=1}^n Fi (xi ), for all xi ∈ R.   (1)

(b) Suppose X is absolutely continuous with density f . Then each Xi is also absolutely continuous with some density fi . In this case, {Xi } are independent if and only if

f (x1 , . . . , xn ) = ∏_{i=1}^n fi (xi ), a.e. λ.   (2)

Conversely, if {Xi } are independent with densities {fi }, then X has density f given by (2) a.e. λ.

Proof. (a) Suppose {Xi } are independent. Then

F (x1 , . . . , xn ) = P {X1 ≤ x1 , . . . , Xn ≤ xn } = ∏_{i=1}^n P {Xi ≤ xi } = ∏_{i=1}^n Fi (xi ).

Conversely, suppose (1) holds. Then

PX (a, b] = ∏_{i=1}^n [Fi (bi ) − Fi (ai )] = ∏_{i=1}^n PXi (ai , bi ].

That is,

P {Xi ∈ Bi , 1 ≤ i ≤ n} = ∏_{i=1}^n P {Xi ∈ Bi }   (3)

for all right semi-closed intervals. We wish to prove that this equation holds for all Borel sets. First fix B2 , . . . , Bn and consider

C = {B1 ∈ B(R) : equation (3) holds}.

It is easy to see that C is a monotone class and contains all finite disjoint unions of right semi-closed intervals. Hence C = B(R). That is, (3) holds for all B1 ∈ B(R) when we fix right semi-closed B2 , . . . , Bn . Now consider the successive co-ordinates to complete the proof of (a).
(b)

F1 (x1 ) = P {X1 ≤ x1 } = P {X1 ≤ x1 , X2 ∈ R, . . . , Xn ∈ R}
        = ∫_{(−∞, x1 ]} ∫_R · · · ∫_R f (t1 , . . . , tn ) dt1 · · · dtn .

Note that by Fubini’s theorem, the above integral is computed iteratively. Now, by definition it follows that X1 has the density

f1 (x1 ) = ∫_R · · · ∫_R f (x1 , t2 , . . . , tn ) dt2 · · · dtn .

Note that it is indeed a Borel measurable function by Fubini’s theorem. Similarly each Xi is absolutely continuous, with marginal density obtained by integrating out the other variables.
Now suppose (2) holds. Then

F (x1 , . . . , xn ) = ∫_{(−∞, x1 ]} · · · ∫_{(−∞, xn ]} f (t1 , . . . , tn ) dt1 · · · dtn = ∏_{i=1}^n Fi (xi ).

Hence by (a), {Xi } are independent.


Conversely, suppose {Xi } are independent. Then

F (x1 , . . . , xn ) = ∏_{i=1}^n Fi (xi ) = ∫_{(−∞, x1 ]} · · · ∫_{(−∞, xn ]} ∏_{i=1}^n fi (ti ) dt1 · · · dtn .   (4)

Define

g(x1 , . . . , xn ) = ∏_{i=1}^n fi (xi ).

Now it is easy to conclude that

PX (B) = ∫_B g(x) dx for all B ∈ B(Rn ).

Since PX (B) also equals ∫_B f (x) dx for all B, it follows that f = g a.e. λ.

For the last part, we can start with equation (4) and argue as above.

Exercise 1. (a) Give an example of a bi-variate random vector (X, Y ) such that both X
and Y are absolutely continuous but (X, Y ) is not so. Hint: Simple curves have Lebesgue
measure 0.
(b) Show by example that even if a bivariate random vector has density, it is not deter-
mined by the marginal densities.
(c) Suppose X = (X1 , . . . , Xn ) is a random vector where each Xi is discrete. Show that X is then discrete. Further, {Xi } are independent if and only if

P {X1 = x1 , . . . , Xn = xn } = ∏_{i=1}^n P {Xi = xi } for all xi ∈ R.

Exercise 2. Suppose {Xi } are independent random vectors. Show that then fi ◦ Xi are
also independent for all Borel measurable functions (of appropriate orders).

Exercise 3. (a) Suppose {Xi }, 1 ≤ i ≤ n, are independent real-valued random variables. If all of them are non-negative or if E(Xi ) is finite for all i, then show that

E(X1 X2 · · · Xn ) = ∏_{i=1}^n E(Xi ).

Hint: Use Fubini’s theorem. Alternatively start with indicator functions.


(b) Suppose {Xi }, 1 ≤ i ≤ n, are independent complex-valued random variables and E(Xi ) is finite for all i. Then show that

E(X1 X2 · · · Xn ) = ∏_{i=1}^n E(Xi ).

Hint: Prove directly for n = 2. Then use induction. Take the last variable to be an indicator function first and then extend.

Exercise 4. Suppose X is a random variable such that E(X) is finite. Show that
(a) V(aX + b) = a² V(X) for all real numbers a and b.
(b) V(X) = E(X²) − [E(X)]². Caution: Both sides may equal ∞. What can you say about a random variable X for which V(X) = 0?

Exercise 5. Suppose (X, Y ) is a bivariate random vector.


(a) If E(X) and E(Y ) are finite, then show that E(X + Y ) is also finite and

E(X + Y ) = E(X) + E(Y ).

(b) If X and Y are independent and E(X 2 ) and E(Y 2 ) are finite, then

V(X + Y ) = V(X) + V(Y ).

Definition 1. If (X, Y ) is a bivariate random vector and E(X²) and E(Y ²) are finite, then the covariance of X and Y is defined as

Cov(X, Y ) = E[(X − E(X))(Y − E(Y ))].

If 0 < V(X), V(Y ) < ∞, then the correlation coefficient between X and Y is defined as

ρ(X, Y ) = Cov(X, Y ) / √(V(X) V(Y )).
Exercise 6. (a) Show that the covariance is finite.
(b) If X and Y are independent then show that their covariance is 0.
(c) Show that −1 ≤ ρ(X, Y ) ≤ 1.
(d) Suppose {Xi } are random variables with V(Xi ) < ∞ for all i. Show that

V(X1 + · · · + Xn ) = ∑_{i=1}^n V(Xi ) + 2 ∑_{1≤i<j≤n} Cov(Xi , Xj ).

(e) If in (d), {Xi } are independent, then

V(X1 + · · · + Xn ) = ∑_{i=1}^n V(Xi ).

Lecture 43
Independence
Moments
Weak law of large numbers
February 23, 2021

Exercise 1. Suppose P is the restriction of the Lebesgue measure (on R²) to the unit disc Ω = {(x, y) : x² + y² ≤ 1}, normalised to be a probability measure (that is, divided by π), with Ω equipped with the Borel σ-field. Consider the random vector X given by the identity mapping X(x, y) = (x, y) on Ω. Denote the component random variables by X1 and X2 . Consider the random variable R = X1² + X2². Find its probability density function and E(R).
Exercise 2. Suppose {Xi }, 1 ≤ i ≤ n, are random variables. Show that they are independent if and only if for all bounded Borel measurable functions g1 , . . . , gn from R to R,

E[ ∏_{i=1}^n gi (Xi ) ] = ∏_{i=1}^n E[gi (Xi )].

Exercise 3. Suppose {Xi }, 1 ≤ i ≤ n are independent and identically distributed (i.i.d.)


each with distribution P {Xi = 1} = p = 1 − P {Xi = 0}. Find the distribution of
X1 + · · · + Xn . This distribution is known as the binomial distribution.
Exercise 4. Suppose X1 and X2 are i.i.d. exponentially distributed with parameter
λ. Find the density of X1 + X2 . Can you extend to n ≥ 3 variables? Hint: Gamma
functions arise.
Definition 1 (Moments). Suppose X is a random variable. Its k-th moment is defined
as E(X k ) whenever the expectation exists.
Exercise 5. If X is a Bernoulli random variable with P {X = 1} = p = 1 − P {X = 0}, then what is E(X^k ) for all k ≥ 1?
Exercise 6. Suppose {Xi }, 1 ≤ i ≤ n are i.i.d with mean 0 and variance 1. Find the
4-th moment of X1 + · · · + Xn in terms of the 3-rd and the 4-th moments of X1 . Hint:
For independent random variables, expectation of product equals product of expectations.
Theorem 1 (Weak law of large numbers (WLLN)). Suppose {Xi } are i.i.d. with mean µ and variance σ². Then

(X1 + · · · + Xn )/n → µ in probability.

The proof is left as an exercise. Hint: Use Chebyshev’s inequality.
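A minimal sketch of the suggested argument: E(Sn /n) = µ and V(Sn /n) = σ²/n, so by Chebyshev’s inequality, for every ε > 0,

P {|Sn /n − µ| > ε} ≤ σ²/(nε²) → 0 as n → ∞,

which is exactly convergence in probability.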


Exercise 7. Suppose {Xi } are i.i.d. with mean 0 and variance 1 and finite fourth moment. Show that

(X1 + · · · + Xn )/n → 0 almost surely.

Hint: Use Exercise 6, Markov’s inequality and the first Borel-Cantelli Lemma.

Exercise 8. Suppose {Xi } are i.i.d. with P {Xi = 1} = p = 1 − P {Xi = 0}. Show that

(X1 + · · · + Xn )/n → p almost surely.

Hint: Use the previous exercise. Or do a direct computation via moments and apply the Borel-Cantelli Lemma.

Lecture 44
Strong law of large numbers (SLLN)
February 24, 2021

Notation: Sn = X1 + · · · + Xn is known as the sequence of partial sums.


We have seen that if {Xi } are i.i.d. with finite variance, then Sn /n → E(X1 ) in probability. Further, if E(X1^4 ) < ∞, then Sn /n → E(X1 ) almost surely. We now state and prove a much stronger result.
Theorem 1 (Strong Law of large Numbers (SLLN)). Suppose {Xi } are pairwise in-
dependent identically distributed random variables. Then E |X1 | < ∞ implies Sn /n →
E(X1 ) almost surely.

This result has been known for about 90 years, and is traditionally proved by establishing an inequality known as Kolmogorov’s maximal inequality. This inequality is extremely useful and its proof contains ideas that are used elsewhere, for example in the theory of martingales. However, we give a simple proof that was discovered by Etemadi in 1981.

Proof of Theorem 1. Consider the sequences {Xi+ } and {Xi− }. They also satisfy the
hypothesis of the theorem. Thus it is enough to prove the theorem when {Xi } are
non-negative. Let F denote the distribution function of (any) Xi . Define

Yi = Xi I{Xi ≤ i}, Sn∗ = Y1 + · · · + Yn .

Note that 0 ≤ Yi ≤ i for all i.


Fix ε > 0 and α > 1. Define kn = [α^n ] (the largest integer contained in α^n ). We will prove the result in three steps:

(i) S*_{kn} /kn → E(X1 ) almost surely (that is, the SLLN holds for {Yi } along the particular sub-sequence of partial sums).

(ii) S_{kn} /kn → E(X1 ) almost surely (that is, the SLLN holds for the particular sub-sequence of partial sums).

(iii) Sn /n → E(X1 ) almost surely.
Proof of Step (i). Below C will denote a constant which depends only on ε and α. By Chebyshev’s inequality,

∑_{n=1}^∞ P { |S*_{kn} − E(S*_{kn})| / kn > ε }
    ≤ (1/ε²) ∑_{n=1}^∞ Var(S*_{kn}) / kn²
    = (1/ε²) ∑_{n=1}^∞ (1/kn²) ∑_{i=1}^{kn} Var(Yi )
    ≤ C ∑_{i=1}^∞ E(Yi²) ∑_{n: kn ≥ i} 1/kn²
    ≤ C ∑_{i=1}^∞ E(Yi²)/i²       (tail of a geometric series and kn^{−2} ≤ i^{−2})
    = C ∑_{i=1}^∞ (1/i²) ∫_{(0, i]} x² dF (x)
    = C ∑_{i=1}^∞ (1/i²) ∑_{k=0}^{i−1} ∫_{(k, k+1]} x² dF (x)
    = C ∑_{k=0}^∞ ∫_{(k, k+1]} x² dF (x) ∑_{i=k+1}^∞ 1/i²       (interchanging the order)
    ≤ C ∑_{k=0}^∞ (1/(k + 1)) ∫_{(k, k+1]} x² dF (x)
    ≤ C ∑_{k=0}^∞ ∫_{(k, k+1]} x dF (x)
    = C E(X1 ) < ∞.

Hence by the first Borel-Cantelli Lemma,

(S*_{kn} − E(S*_{kn})) / kn → 0 almost surely.   (1)

On the other hand, by MCT,

E(Yn ) → E(X1 ).

We also know that if an → a then (a1 + · · · + an )/n → a. Hence

E(S*_{kn}) / kn = (E(Y1 ) + · · · + E(Y_{kn})) / kn → E(X1 ).

Thus by using (1),

S*_{kn} / kn → E(X1 ) almost surely,

thereby completing the proof of Step (i).
Proof of Step (ii). This is proved by showing that the sequences {Xi } and {Yi } differ only at finitely many indices almost surely. To see this, note that

∑_{n=1}^∞ P {Xn ≠ Yn } = ∑_{n=1}^∞ P {Xn > n}
    = ∑_{n=1}^∞ ∫_{(n, ∞)} dF (x)       (identical distribution)
    = ∑_{n=1}^∞ ∑_{i=n}^∞ ∫_{(i, i+1]} dF (x)
    = ∑_{i=1}^∞ i ∫_{(i, i+1]} dF (x)       (interchanging the order of summation)
    ≤ ∑_{i=1}^∞ ∫_{(i, i+1]} x dF (x)
    ≤ ∫_{(0, ∞)} x dF (x) = E(X1 ) < ∞.

Hence, by the first Borel-Cantelli Lemma, Xn ≠ Yn only for finitely many n almost surely. Hence, using Step (i), S_{kn} /kn → E(X1 ) almost surely.
Proof of Step (iii). Since {Xi } are non-negative, Sn is non-decreasing. For every n, let sn be the largest j such that kj is smaller than n. Note that k_{sn +1} ≥ n and

lim_{n→∞} k_{sn +1} / k_{sn} = α.

Then

lim sup Sn /n ≤ lim sup (S_{k_{sn +1}} / k_{sn +1}) (k_{sn +1} / k_{sn}) (k_{sn} / n) ≤ E(X1 ) α almost surely.

Similarly,

lim inf Sn /n ≥ E(X1 )/α almost surely.

Now since α > 1 is arbitrary, letting α → 1 (through a countable sequence, so that null sets are managed),

lim Sn /n = E(X1 ) almost surely.

This completes the proof.

Exercise 1. Suppose X1 and X2 are i.i.d. absolutely continuous random variables with
density f .

(a) Show that P {X1 = X2 } = 0.

(b) Define X(1) = min{X1 , X2 } and X(2) = max{X1 , X2 }. Find the density of X(1) ,
X(2) and (X(1) , X(2) ) in terms of f . Hint: On which Borel subset of R2 will the density
be non-zero? Start with sets for which probabilities are easier to calculate.

(c) Can you generalise to the case where you have n i.i.d. absolutely continuous random variables? Note that you have to define the second minimum, etc. These are called order statistics.

(d) Suppose {Xi } are i.i.d. with the uniform distribution on the interval (0, 1). Let

X(n) = max{X1 , . . . , Xn }.

Show that the distribution function Fn of the random variable n(1 − X(n) ) converges to
the distribution function of an exponential random variable.

Exercise 2. (a) Suppose X is a Poisson random variable with parameter λ. Show that
E(X) = V ar(X) = λ.

(b) Suppose X1 and X2 are independent Poisson random variables with parameters λ1
and λ2 . Show that X1 + X2 also has the Poisson distribution with parameter λ1 + λ2 .

Exercise 3. Suppose {Xi } are i.i.d. Bernoulli random variables with P {X1 = 1} = p =
1 − P {X1 = 0}.

(a) Let
N1 = inf{n ≥ 1 : Xn = 1}.
Find the probability distribution of the random variable N1 . Hint: Start by listing the
possible values of N1 .

(b) Let
N2 = inf{n > N1 : Xn = 1}.
Find the distribution of N2 . Hint: Use total and conditional probability, along with
independence. Check whether N1 and N2 − N1 are independent. Hint: Use the p.m.f.
criterion for independence.

Lecture 45
Kolmogorov’s Zero-one Law
February 24, 2021

We know that A is independent of itself if and only if P (A) = 0 or 1. We now demonstrate a class of sets with this property in the context of independent random variables.
Definition 1. Suppose {Xi } is a sequence of random variables and let

Tn = σ(Xn , Xn+1 , . . .)

be the smallest σ-field that makes Xn , Xn+1 , . . . measurable. Then

T∞ = ∩_{n=1}^∞ Tn

is known as the tail σ-field of {Xi }. The sets in T∞ are known as tail events. A function f : Ω → (R̄, B(R̄)) is called a tail function if it is measurable with respect to T∞ .

Note that {Tn } is a decreasing sequence of σ-fields. Also, arbitrary intersections of σ-fields are σ-fields, and so T∞ is indeed a σ-field. Note also that tail events are always with respect to a given sequence of random variables.
Exercise 1. (a) Verify that the following sets are tail events:

(i) A1 = {lim Xn exists}.

(ii) A2 = {∑_{n=1}^∞ Xn converges}.

(iii) A3 = {lim sup Xn = lim inf Xn }.

(iv) A4 = {Xn < 2 for infinitely many n}.

(b) Show that lim sup Xn , lim inf Xn are tail functions.
Theorem 1 (Kolmogorov’s zero-one law). Suppose {Xi } is a sequence of independent
random variables and T∞ is their tail σ-field. Then for every A ∈ T∞ , P (A) = 0 or 1.
Every function which is measurable with respect to T∞ , is a constant almost surely.

Proof. Fix A ∈ T∞ . Note that T∞ ⊆ T1 . Hence A is of the form

A = {(X1 , X2 , . . .) ∈ A∗ }

for some A∗ ∈ B(R)∞ . For any C ∗ ∈ B(R)∞ , define

C = {(X1 , X2 , . . .) ∈ C ∗ }.

Let

C = {C ∗ ∈ B(R)∞ : C and A are independent}.

Suppose C ∗ is a measurable cylinder. Then

C = {(X1 , . . . , Xn ) ∈ Bn } ∈ σ(X1 , . . . , Xn )

for some n and some Bn ∈ B(Rn ). Note that A ∈ Tn+1 . Since {Xi } are independent, clearly A and C are independent. Thus C contains all measurable cylinders. Clearly C is a monotone class. For example, if Cn∗ ↑ C ∗ with each Cn∗ ∈ C, then P (Cn ) ↑ P (C) and

P (A)P (Cn ) = P (A ∩ Cn ) ↑ P (A ∩ C).

Thus A and C are independent. Similarly for decreasing limits. By the monotone class theorem, C = B(R)∞ and in particular A∗ ∈ C. This means that A is independent of itself, so P (A) = P (A)² and hence P (A) = 0 or 1.

If f is a tail function, then for each c ∈ R̄, {ω : f (ω) < c} is a tail event and hence has probability 0 or 1. Take

k = sup{c ∈ R̄ : P {f < c} = 0};

then f = k almost surely.

Example 1. Suppose {Xi } are i.i.d. with mean 0. Then the 0 − 1 law says that the set

A = {ω : Sn (ω)/n → 0}

has probability 0 or 1, since A is a tail event. The SLLN goes a step further and says that P (A) = 1.

Exercise 2. (a) Suppose {Xi } are i.i.d. non-negative random variables and E(X1 ) = ∞.
Show that Sn /n → ∞ almost surely.

(b) Prove (a) when the non-negativity assumption is dropped.

Exercise 3. Show that if {Xi } are independent, then all sets in the tail σ-field are
independent of each other.

Lecture 46
Hewitt-Savage Zero-one Law
February 24, 2021

The 0 − 1 law can be strengthened if {Xi } are i.i.d.


Definition 1. Suppose {Xi } is a sequence of random variables. Any event A ∈ σ(X1 , X2 , . . .) is said to be symmetric if the occurrence of A is not affected by permuting finitely many of the co-ordinates of (X1 , X2 , . . .).

More formally, suppose A = {(X1 , X2 , . . .) ∈ A∗ } for some A∗ ∈ B(R)∞ , and T : {1, 2, . . .} → {1, 2, . . .} is a permutation that moves only finitely many co-ordinates. Then A is symmetric if A = {(XT (1) , XT (2) , . . .) ∈ A∗ } for every such T .
Exercise 1. (a) Check that any tail event is a symmetric event.

(b) {Xn = 0 for all n} is a symmetric event but not a tail event.
Theorem 1 (Hewitt-Savage Zero-One Law). Suppose {Xi } are i.i.d.. If A is a sym-
metric set in σ(X1 , X2 , . . .) then P (A) = 0 or 1. Further, any symmetric function is
constant almost surely.

Proof. Write X = (X1 , X2 , . . .) and, for a permutation T , X(T ) = (XT (1) , XT (2) , . . .). Let A = {X ∈ A∗ }. Then

P (A) = PX (A∗ ).

Since the probability of any set in the σ-field can be approximated by the probabilities of sets in a generating field, we can get measurable cylinders Ck∗ such that PX (A∗ ∆Ck∗ ) → 0 as k → ∞. Let

Ck = {X ∈ Ck∗ } = {(X1 , . . . , Xnk ) ∈ Bk }.

Suppose Tk is the permutation that interchanges {1, . . . , nk } and {nk + 1, . . . , 2nk }. Since {Xi } are i.i.d., X and X(Tk ) have the same distribution. Hence, writing Ck (Tk ) = {X(Tk ) ∈ Ck∗ },

P (A∆Ck ) = PX (A∗ ∆Ck∗ )
         = PX(Tk ) (A∗ ∆Ck∗ )
         = P [{X(Tk ) ∈ A∗ }∆{X(Tk ) ∈ Ck∗ }]
         = P (A∆Ck (Tk ))       (since A is symmetric).

Hence P (A∆Ck ) → 0 and P (A∆Ck (Tk )) → 0. As a consequence, P (A∆[Ck ∩ Ck (Tk )]) → 0. This implies that P (Ck ), P (Ck (Tk )), P [Ck ∩ Ck (Tk )] all converge to P (A). On the other hand,

P [Ck ∩ Ck (Tk )] = P [{(X1 , . . . , Xnk ) ∈ Bk , (Xnk +1 , . . . , X2nk ) ∈ Bk }]
                = P (Ck )P (Ck (Tk ))       (by independence of {Xi }).

Letting k → ∞, P (A) = P (A)P (A), and hence the first part of the theorem is proved. The second part is left as an exercise.

Lecture 47
Convergence in distribution
Helly’s theorem
Prokhorov’s theorem
February 25, 2021

Definition 1. If X and Y are two random variables (possibly on different probability spaces) with the same induced probability distribution functions, then we say that X and Y are equal in distribution. We write X =D Y .
Exercise 1. (a) If X has a normal distribution with mean 0, then show that X =D −X.

(b) If X has the uniform distribution on the interval [0, 1], then show that X =D 1 − X.

(c) If X is the Bernoulli random variable taking values 0 and 1, when does the equation X =D 1 − X hold?

Definition 2. Let {Xn } and X be random variables (perhaps on different probability spaces) with probability distribution functions {Fn } and F . Then we say Xn converges to X in distribution if Fn converges to F weakly. We write Xn →D X.

Note that if Xn →D X, and Y has the same probability distribution as that of X, then Xn →D Y . Thus only the probability distributions are being identified in the limit.

Exercise 2. Show that

(a) Xn →D X if and only if, for all bounded continuous functions f : R → R, E[f (Xn )] → E[f (X)]. Note: Here E always denotes the expectation with respect to the induced probability measures of the respective random variables. Hint: Use the Portmanteau Theorem of Lecture 36 and Theorem 1 of Lecture 37.

(b) Xn →D X if and only if for all Borel sets A for which P {X ∈ ∂A} = 0, we have P {Xn ∈ A} → P {X ∈ A}.
Exercise 3. Suppose Xn →D X and g : R → R is a continuous function. Show that g(Xn ) →D g(X). In particular, if c is a constant, then Xn + c →D X + c.

Exercise 4. Suppose {Xn } are defined on the same probability space.

(a) If Xn →D c where c is some constant, then show that Xn →P c (convergence in probability).

(b) If Xn →P X, then show that Xn →D X.

(c) Give an example of a sequence {Xn } (defined on the same probability space) such that Xn →D X but Xn →P X does not hold.

Suppose {xn } is a bounded sequence of real numbers. Then it may not converge but
we can always find a subsequence which converges. Here is the analogous result for
probability distribution functions.

Theorem 1 (Helly’s Theorem). Suppose {Fn } is a sequence of probability distribution


functions on R. Then there is a distribution function F and a sub-sequence {nk } such
that Fnk (x) → F (x) at all continuity points of F .

Example 1. Note that F in the above theorem need not be a probability distribution function. We have seen such examples earlier. Consider the probability distribution functions Fn and the distribution function F given by

Fn (x) = 0 if x < 0;   1/2 if 0 ≤ x < n;   1 if x ≥ n,

F (x) = 0 if x < 0;   1/2 if x ≥ 0.

Then F has only one discontinuity (at x = 0) and it is easy to see that Fn (x) → F (x) for all x ≠ 0.

Proof of Theorem 1. This is proved by a diagonalisation argument. Let D be a countable dense subset of R, enumerated as D = {x1 , x2 , . . .}. The sequence {Fn (x1 )} is a bounded sequence. Hence there is a sub-sequence {F1j } of {Fn } such that {F1j (x1 )} converges to, say, y1 . Now consider the sequence {F1j } and extract a sub-sequence {F2j } such that {F2j (x2 )} converges to, say, y2 . Continuing this way, for every m, there is a sub-sequence {Fmj } of {Fm−1,j } such that {Fmj (xm )} converges to, say, ym . Note that 0 ≤ ym ≤ 1 for every m.
that {Fmj (xm )} converges to say ym . Note that 0 ≤ ym ≤ 1 for every m.
Let Fnk = Fkk , k ≥ 1 be the “diagonal sequence”. Define the function FD : D → R as
FD (xj ) = yj , j ≥ 1. Note that by construction

Fnk (x) → FD (x) for all x ∈ D.

Since {Fn } are all non-decreasing functions, it follows that FD (·) is also non-decreasing on
D. Now we construct a distribution function (not necessarily a probability distribution
function) out of FD (·) in the obvious way:

F (x) = inf{FD (y) : y ∈ D, y > x}.

By definition, F is non-decreasing. We now show that F is right-continuous. Let zn ↓ x.


Then lim F (zn ) := b exists since it is a non-increasing sequence. Moreover, b ≥ F (x).
If F (x) < b, then let y ∈ D, y > x such that FD (y) < b. For all large n, x < zn < y.
Hence F (zn ) ≤ FD (y) < b. This is a contradiction. Hence lim F (zn ) = F (x). Thus F is
a distribution function.

It remains to show that Fnk (x) → F (x) at all continuity points of F . Suppose x < y with y ∈ D. Then

lim sup_{k→∞} Fnk (x) ≤ lim sup_{k→∞} Fnk (y) = FD (y).

Now take the infimum over y ∈ D, y > x, to get

lim sup_{k→∞} Fnk (x) ≤ F (x).

Next consider x∗ < y < x with y ∈ D. Then

F (x∗ ) ≤ FD (y) = lim Fnk (y) = lim inf Fnk (y) ≤ lim inf Fnk (x).

Let x∗ → x to obtain

F (x−) ≤ lim inf Fnk (x).

Now if x is a continuity point of F , then F (x−) = F (x) and we get lim Fnk (x) = F (x).
In Example 1, some mass escaped to ∞. The concept of “tightness” prevents this
situation.
Definition 3. A set of probability measures {Pi , i ∈ I} on (R, B(R)) is said to be tight if, given ε > 0, there exists an M > 0 such that Pi ([−M, M ]^c ) < ε for all i ∈ I. A set of probability distribution functions {Fi , i ∈ I} or a set of random variables {Xi , i ∈ I} is said to be tight if their corresponding probability measures are tight.

Note that a finite set of probability measures is always tight.


Definition 4. A set of probability measures {Pi , i ∈ I} is said to be relatively compact
if given any sequence from this collection, there is a further sub-sequence which converges
weakly. This definition extends to probability distribution functions {Fi , i ∈ I} or to
random variables {Xi , i ∈ I} in the obvious way.
Theorem 2 (Prokhorov’s theorem, special case). A set of probability distribution func-
tions {Fi , i ∈ I} is tight if and only if it is relatively compact.

Proof. Suppose {Fi , i ∈ I} is tight. Take a sequence {Fn } from this collection. By Helly’s theorem, there is a sub-sequence {Fnk } and a distribution function F such that Fnk (x) → F (x) at all continuity points x of F . Fix ε > 0. Using tightness, we can choose continuity points a and b of F such that Fn (R \ (a, b]) < ε for all n and F (R \ (a, b]) < ε. This can be used to show that F is a probability distribution function. We omit the details.

Now assume that {Fi , i ∈ I} is relatively compact but not tight. Then for some ε > 0, we can pick {Fn } such that Fn (R \ (−n, n)) ≥ ε. By relative compactness, we can get a sub-sequence, say {Fnk }, which converges weakly to, say, F . Now note that R \ (−n, n) is closed and

lim sup_{k→∞} Fnk (R \ (−n, n)) ≤ F (R \ (−n, n)).

Since Fnk (R \ (−n, n)) ≥ Fnk (R \ (−nk , nk )) ≥ ε once nk ≥ n, it follows that F (R \ (−n, n)) ≥ ε for all n; letting n → ∞, we obtain ε ≤ 0, which is a contradiction. This completes the proof.

Lecture 48
Characteristic function
Inversion formula
February 28, 2021

Definition 1 (Characteristic function). For any finite measure µ on B(Rn ) define its characteristic function µ̂ : Rn → C as

µ̂(t) = ∫_{Rn} e^{ιt′x} dµ(x), t ∈ Rn .

We shall mostly deal with the case n = 1 and where µ is a probability measure. Note that

µ̂(t) = ∫_{Rn} e^{ιt′x} f (x) dλ(x) if µ has density f (·), and
µ̂(t) = ∑_x e^{ιt′x} p(x) if µ has p.m.f. p(·).

If X is a random variable with distribution function F , then the characteristic function of X is defined in the obvious way:

φX (t) := φF (t) = E[e^{ιt′X} ] = ∫_{Rn} e^{ιt′x} dF (x).

Remark 1. The Fourier transform is usually defined for f ∈ L¹(Rn , λ) as

f̂ (t) = (1/(√(2π))^n ) ∫_{Rn} e^{ιt′x} f (x) dλ(x).

Thus, apart from the constant, if µ has a density f , the characteristic function of µ is really the Fourier transform of f .
Exercise 1. (a) Show that

|e^{ιa} − e^{ιb} | ≤ |a − b|.

Hint: Write it as an integral from a to b.

(b) Show that lim_{x→0} (e^{ιx} − 1)/(ιx) = 1.
Exercise 2. Show that φX (·) is a uniformly continuous function.
Theorem 1 (Inversion formula). Suppose φ is the characteristic function of the distribution function F .

(a) Then for all a, b ∈ R, a < b,

[F (b) + F (b−)]/2 − [F (a) + F (a−)]/2 = lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt.

In particular, if a, b are continuity points of F , then

F (b) − F (a) = lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt.

(b) If φ(·) is Lebesgue integrable, then F is absolutely continuous and

f (x) = (1/2π) ∫_{−∞}^{∞} e^{−ιtx} φ(t) dt

is a density for F .

Part (b) is known as the Fourier Inversion Theorem. Using this, the map f → f̂ can be extended to a map from L²(λ) to itself, which is then an isometry. This is one of the fundamental results in Fourier Analysis.

Proof of Theorem 1. For a < b,

Ic := (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt
    = (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] ( ∫_R e^{ιtx} dF (x) ) dt.

Note that by Exercise 1,

| (e^{−ιta} − e^{−ιtb})/(ιt) · e^{ιtx} | ≤ b − a

and

∫_{−c}^{c} ∫_R (b − a) dF (x) dt = 2c(b − a) < ∞.

Hence we can apply Fubini’s theorem to interchange the order of the integration and conclude that

Ic = (1/2π) ∫_R ( ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] e^{ιtx} dt ) dF (x) = ∫_R Jc (x) dF (x),

where

Jc (x) := (1/2π) ∫_{−c}^{c} [sin t(x − a) − sin t(x − b)]/t dt       (the cos integral is 0)
        = (1/2π) ∫_{−c(x−a)}^{c(x−a)} (sin v)/v dv − (1/2π) ∫_{−c(x−b)}^{c(x−b)} (sin w)/w dw.

Recall that ∫_r^s (sin t)/t dt → π as s → ∞ and r → −∞. Using this, it follows that

(i) There is an M < ∞ such that sup_{c,x} |Jc (x)| ≤ M .

(ii)

J(x) := lim_{c→∞} Jc (x) = 0 if x < a or x > b;   1 if a < x < b;   1/2 if x = a or x = b.
Hence by DCT,

lim Ic = ∫_R J(x) dF (x)
       = ∫_{x<a} 0 dF (x) + (1/2) ∫_{x=a} dF (x) + ∫_{a<x<b} dF (x) + (1/2) ∫_{x=b} dF (x) + ∫_{x>b} 0 dF (x)
       = (1/2)[F (a) − F (a−)] + [F (b−) − F (a)] + (1/2)[F (b) − F (b−)]
       = [F (b) + F (b−)]/2 − [F (a) + F (a−)]/2.

(b) Now suppose φ(·) is integrable. Note that then f given below is well-defined:

f (x) = (1/2π) ∫_{−∞}^{∞} e^{−ιtx} φ(t) dt.

We must show that f is a density for F . By an application of DCT, it is easy to see that f is continuous. By Fubini’s theorem,

∫_a^b f (x) dx = (1/2π) ∫_{−∞}^{∞} φ(t) ( ∫_a^b e^{−ιtx} dx ) dt
             = lim_{c→∞} (1/2π) ∫_{−c}^{c} φ(t) ( ∫_a^b e^{−ιtx} dx ) dt       (by DCT)
             = lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt
             = F (b) − F (a)       by part (a), if a, b are continuity points of F.

Thus for all continuity points a < b,

F (b) − F (a) = ∫_a^b f (x) dx.

We claim that this holds for all a < b. Since f is continuous, the integral is a continuous function of a and b. Since the continuity points of F are dense in R, the above holds for all a and b. Since f is continuous everywhere, F is differentiable everywhere and moreover F ′(x) = f (x) for all x. Since F is non-decreasing, f must be non-negative. Thus f is a density of F .

Exercise 3. The following facts were used in the last part of the above theorem. Show that

(a) If g is integrable w.r.t. the Lebesgue measure, then ∫_a^b g(x) dx is a continuous function of a and b.

(b) If g is a continuous function, then ∫_a^b g(x) dx is a differentiable function of both b and a.

(c) If g is a continuous function such that for a distribution function F , F (b) − F (a) = ∫_a^b g(x) dx for all a < b, then show that g is non-negative.

Exercise 4. Suppose P1 and P2 are two probability measures on R such that P̂1 (t) =
P̂2 (t) for all t. Then show that P1 ≡ P2 .

Lecture 49
Basic properties of characteristic function
Moment generating function
Cauchy distribution
February 28, 2021

Definition 1. A random variable X or its distribution function F is said to be symmetric


(about 0) if P {X ∈ B} = P {−X ∈ B} for all B ∈ B(R).

The normal distribution with mean 0 is symmetric about 0.


Theorem 1 (Basic properties of the characteristic function). (a) |φ(t)| ≤ φ(0) = 1.

(b) φ(·) is a uniformly continuous function.

(c) For all t ∈ R, φ(−t) = φ(t) (the complex conjugate).

(d) A random variable X has a symmetric distribution if and only if its characteristic
function φ is real-valued.

(e) If E(|X|n ) < ∞ for some positive integer n, then the n-th derivative of φX (·) exists
and is continuous on R and
Z
(n)
φX (t) = (ιx)n eitx dF (x).
R

In particular,
(n)
ιn E(X n ) = φX (0).

The proof of the theorem is left as an exercise.


Exercise 1. Suppose {Xi } are independent and Sn = X1 + · · · + Xn . Then show that

φ_{Sn} (t) = ∏_{i=1}^n φ_{Xi} (t).

This is one of the most useful properties of the characteristic function. On the left side we have a random variable Sn whose distribution we may not be able to calculate. However, its characteristic function is easy to calculate in terms of the individual characteristic functions. And due to the uniqueness property of the characteristic function this is valuable.
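For instance (a sketch): if the Xi are independent Ber(p) random variables, then φ_{Xi} (t) = E[e^{ιtXi}] = (1 − p) + p e^{ιt}, so φ_{Sn} (t) = ((1 − p) + p e^{ιt})^n ; by the uniqueness property (Exercise 4 of Lecture 48), this pins down the distribution of Sn , namely the binomial distribution of Exercise 3, Lecture 43.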
Exercise 2. (a) If X is a random variable, show that for any real constants a and b, φ_{aX+b} (t) = e^{ιtb} φX (at).
Definition 2. The moment generating function (m.g.f.) of a random variable is defined as

MX (t) = E[e^{tX} ], t ∈ R.

Note that the m.g.f. may equal ∞ for some values of t.
Exercise 3. (a) Show that for a, b ∈ R,

M_{aX+b} (t) = MX (at) e^{bt} , t ∈ R.

(b) If X is a standard normal random variable, then show that

MX (t) = e^{t²/2} .

(c) If X is a standard normal random variable, then show that

φX (t) = e^{−t²/2} .

(d) Show that a random variable X has the normal distribution if and only if its characteristic function is given by

φX (t) = e^{ιta} e^{−t²b/2}

and in that case E(X) = a and V (X) = b.

(e) Show that if X1 and X2 are two independent normal random variables, then X1 + X2 is also a normal random variable.
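One way to verify (b), sketched by completing the square in the exponent:

MX (t) = ∫_R e^{tx} (1/√(2π)) e^{−x²/2} dx = e^{t²/2} ∫_R (1/√(2π)) e^{−(x−t)²/2} dx = e^{t²/2},

since the last integrand is the N (t, 1) density and hence integrates to 1.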
Exercise 4. Consider the function

f (x) = 1/(π(1 + x²)), x ∈ R.

(a) Show that ∫_R f (x) dx = 1. [This is called the (standard) Cauchy density function.]

(b) Suppose X is a random variable with the above density function. Then it is called a (standard) Cauchy random variable. With the help of contour integration, show that its characteristic function is given by

φX (t) = e^{−|t|} , t ∈ R.

(c) Suppose {Xi } are independent standard Cauchy random variables. Show that (X1 + · · · + Xn )/n is also a standard Cauchy random variable for every n.
Exercise 5. (a) Suppose X is a binomial random variable with parameters n and p.
Find its characteristic function.

(b) Suppose X1 and X2 are independent binomial random variables with parameters
(n1 , p) and (n2 , p) respectively. Show that X1 + X2 is also a binomial random variable
and identify the parameters.
Exercise 6. (a) Suppose X is a Poisson random variable with parameter λ. Find its
characteristic function.

(b) Suppose {Xi } are independent Poisson random variables with parameters {λi }. Show
that for every n, X1 +· · ·+Xn is a Poisson random variable with parameter λ1 +· · ·+λn .

Lecture 50
Weak convergence and characteristic function
Central Limit Theorem (CLT)
February 28, 2021

Theorem 1. Suppose {Fn } is a tight sequence of probability distribution functions on R.

(a) If every weakly convergent subsequence of {Fn } converges to the same distribution function F , then Fn → F weakly.

(b) {Fn } converges weakly if and only if φ_{Fn} (t) converges to a finite limit for every t.

Proof. (a) This is left as an exercise.

(b) Suppose φ_{Fn} (t) converges to a finite limit, say g(t), for every t. Since {Fn } is tight, by Helly’s theorem, there is a subsequence {Fnk } that converges weakly to a probability distribution function, say F . If Fn does not converge weakly to F , then by part (a) there is another sub-sequence {Fmk } that converges weakly to some G ≠ F . But this is a contradiction, since g(t) = φF (t) = φG (t) for all t, and characteristic functions determine distributions (Exercise 4 of Lecture 48). Hence Fn converges weakly to F .

The converse follows from the Portmanteau theorem, since x → e^{ιtx} is a bounded continuous function.

Exercise 1. Suppose Xn is a binomial random variable with parameters n and pn , where npn → λ < ∞. Show that Xn converges in distribution to the Poisson distribution with parameter λ.
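One possible route, sketched (using Lévy’s theorem, Theorem 2 below): Xn is a sum of n independent Ber(pn ) variables, so

φ_{Xn} (t) = (1 − pn + pn e^{ιt})^n = (1 + npn (e^{ιt} − 1)/n)^n → exp(λ(e^{ιt} − 1)),

which is the characteristic function of the Poisson distribution with parameter λ (Exercise 6 of Lecture 49).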

Exercise 2. (a) Suppose {Xn } is a sequence of random variables such that

sup_{n≥1} E[|Xn |^α ] < ∞

for some α > 0. Show that {Xn } is tight.

(b) If {Xi } are i.i.d. with E(X1 ) = 0 and V (X1 ) = σ² < ∞, show that {Sn /√n} is tight.

The central limit theorem will claim that in part (b) above, {Sn /√n} converges in distribution and the limit distribution is normal with mean 0 and variance σ². This can be proved by many different methods. Our proof will be based on characteristic functions, since the characteristic function is a key concept in many areas of mathematics, including Harmonic Analysis and Functional Analysis. The following lemma is a crucial technical result.

Lemma 1 (Truncation Inequality). Suppose F is a probability distribution function on R with characteristic function φ. Then there is a universal constant k > 0 such that, for every u > 0,

∫_{{x: |x|≥1/u}} dF (x) ≤ (k/u) ∫_0^u [1 − Re φ(t)] dt.

Proof.

(1/u) ∫_0^u [1 − Re φ(t)] dt = (1/u) ∫_0^u ∫_R (1 − cos tx) dF (x) dt
    = ∫_R [ (1/u) ∫_0^u (1 − cos tx) dt ] dF (x)       (by Fubini’s theorem)
    = ∫_R ( 1 − (sin ux)/(ux) ) dF (x)
    ≥ inf_{t: |t|≥1} ( 1 − (sin t)/t ) ∫_{{x: |ux|≥1}} dF (x)
    = (1/k) ∫_{{x: |x|≥1/u}} dF (x).

Theorem 2 (Lévy’s theorem). Suppose {Fn } is a sequence of probability distribution


functions on R and F is another probability distribution function.

(a) If Fn → F weakly then φFn (t) → φF (t) for all t.

(b) Suppose φ_{Fn} (t) → g(t) for all t, where g is continuous at 0 and g(0) = 1. Then g is the characteristic function of some probability distribution F and Fn → F weakly.

Proof. (a) follows from the definition of weak convergence, since e^{ιtx} is a bounded continuous function.

(b) We first claim that {Fn } is tight. Using Lemma 1,

∫_{{x: |x|≥1/u}} dFn (x) ≤ (k/u) ∫_0^u [1 − Re φ_{Fn} (t)] dt
                        → (k/u) ∫_0^u [1 − Re g(t)] dt       (as n → ∞, by DCT)
                        → 0 as u → 0       (since g is continuous at 0).

This shows that {Fn } is tight. Hence by Theorem 1 Fn converges weakly to a distribution
function F . As a consequence φFn (t) → φF (t) for all t. This implies that φF (t) = g(t)
for all t. Hence g is the characteristic function of F .

Recall the notation Sn = X1 + · · · + Xn .

Theorem 3 (Central Limit Theorem (CLT)). Suppose {Xn } are i.i.d. random variables with mean µ and finite variance σ² > 0. Then (Sn − nµ)/(√n σ) converges weakly to the standard normal distribution.

Proof. Define Yi = (Xi − µ)/σ. Then {Yi } are i.i.d. with mean 0 and variance 1, and

(Sn − nµ)/(√n σ) = (∑_{i=1}^n Yi )/√n =: Zn , say.

Let φ denote the characteristic function of Y1 . Noting that {Yi } are i.i.d.,

φ_{Zn} (t) = E[ e^{ιt n^{−1/2} ∑_{i=1}^n Yi} ] = [ φ(t/√n) ]^n .   (1)

Recall that the characteristic function of the standard normal distribution is ψ(t) = e^{−t²/2}. Hence by Lévy’s theorem it is enough to show that

[ φ(t/√n) ]^n → e^{−t²/2} for all t ∈ R.   (2)

The immediate thought that comes to mind is to take logarithms, and this would lead to a proof that generalises to other situations where {Xi } are not necessarily i.i.d. However, for our purposes we do not take this route; instead, the following easy lemma is an adequate tool.
Lemma 2. Suppose {ai } and {bi } are complex numbers such that

max_{1≤i≤n} {|ai |, |bi |} ≤ 1.

Then

| ∏_{i=1}^n ai − ∏_{i=1}^n bi | ≤ ∑_{i=1}^n |ai − bi |.

In particular, if ai ≡ a and bi ≡ b, then |a^n − b^n | ≤ n|a − b|.

The proof of the lemma follows easily by induction and is left as an exercise.
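One way to organise the induction, sketched: write the difference as a telescoping sum,

∏_{i=1}^n ai − ∏_{i=1}^n bi = ∑_{k=1}^n a1 · · · a_{k−1} (ak − bk ) b_{k+1} · · · bn ,

and note that each product multiplying (ak − bk ) has modulus at most 1.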
We shall need another inequality. Recall that we have already used the inequality |e^{ιx} − 1| ≤ |x|. We shall need a refinement of this inequality.
Lemma 3. (a) For all x ∈ R,

| e^{ιx} − (1 + ιx − x²/2) | ≤ min { |x|³/6 , x² }.

(b) If X is a random variable with E(X²) < ∞, then

φX (t) = 1 + ιt E(X) − (t²/2) E(X²) + o(t²), as t → 0.

Proof. We leave the proof of (a) as an exercise. To prove (b), note that by (a),

| φX (t) − (1 + ιt E(X) − (t²/2) E(X²)) | ≤ E[ min { |tX|³/6 , (tX)² } ]
                                         ≤ t² E[ min { |t| |X|³ , X² } ].

Note that by DCT the expectation goes to 0 as t → 0.

By part (b) of this lemma, Lemma 2, and equation (1),

| φ_{Zn} (t) − e^{−t²/2} | = | [ φ(t/√n) ]^n − [ e^{−t²/(2n)} ]^n |
                          ≤ n | φ(t/√n) − e^{−t²/(2n)} |
                          = n | (1 − t²/(2n) + o(t²/n)) − (1 − t²/(2n) + o(t²/n)) | = o(1).

This establishes (2) and proves the theorem.
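Purely as a numerical illustration (not part of the notes’ formal development), here is a minimal Monte Carlo sketch in Python, assuming NumPy is available: it standardizes sums of i.i.d. Exp(1) variables (µ = 1, σ² = 1) and compares a few empirical probabilities with the standard normal c.d.f.

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, reps = 1000, 20000
X = rng.exponential(scale=1.0, size=(reps, n))   # i.i.d. Exp(1): mean 1, variance 1
Z = (X.sum(axis=1) - n * 1.0) / sqrt(n * 1.0)    # Z_n = (S_n - n*mu) / (sqrt(n)*sigma)

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))) # standard normal c.d.f.
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, (Z <= x).mean(), Phi(x))            # empirical vs limiting probability

The two columns of numbers should be close for large n, in line with Theorem 3.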
Exercise 3 (Riemann-Lebesgue Lemma). Suppose f ∈ L¹(λ). Show that lim_{|t|→∞} f̂ (t) = 0. Hint: Reduce to the case where f is a probability density. Then approximate f by simple functions. Then see what happens to the characteristic functions of these simple functions.
Exercise 4. Show that

lim_{t→∞} ∫_0^t (sin x)/x dx = π/2.

Hint: Verify that

∫_0^t e^{−ux} sin x dx = (1/(1 + u²)) [1 − e^{−ut} (u sin t + cos t)].

Then use Fubini’s theorem after noting that

1/x = ∫_0^∞ e^{−ux} du.
Exercise 5. Show that for all x ∈ R,

(a) |e^{ιx} − 1| ≤ |x|.

(b) |e^{ιx} − (1 + ιx)| ≤ x²/2.

(c) |e^{ιx} − (1 + ιx − x²/2)| ≤ |x|³/6. Hence prove part (a) of Lemma 3.

Hint. For (a), take g1 (t) = e^{ιt}, note that |g1 (t)| ≤ 1, and for x > 0 integrate it over the range 0 ≤ t ≤ x. For (b), take g2 (t) = e^{ιt} − 1, use (a) and integrate as before. For (c), take g3 (t) = e^{ιt} − (1 + ιt).

(d) Generalise (c).
