Lecture Notes
1. Arithmetic conventions in the extended real line R̄ = R ∪ {−∞, ∞} (here a ∈ R, b ∈ R):
a + ∞ = ∞, a − ∞ = −∞,
∞ + ∞ = ∞, −∞ − ∞ = −∞, ∞ − ∞ is undefined,
b × ∞ = ∞ (b > 0), b × ∞ = −∞ (b < 0),
a/∞ = a/(−∞) = 0, a ∈ R,
0 × ∞ = 0.
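For instance, the convention 0 × ∞ = 0 is what makes ∫Ω 0 dµ = 0 valid even when µ(Ω) = ∞; it will be used silently later when we integrate simple functions.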
2. lim sup and lim inf of real numbers. {xk } is a sequence of real numbers. Define
lim sup_{k→∞} xk = inf_{n≥1} sup_{k≥n} xk , lim inf_{k→∞} xk = sup_{n≥1} inf_{k≥n} xk .
3. Operations on sets:
(a) Complement Ac or A′.
(b) Difference A − B.
(c) Union A ∪ B. In general write ∪_{n=1}^{k} An .
(d) Intersection A ∩ B. In general write ∩_{n=1}^{k} An .
(e) (A ∩ B)c = Ac ∪ B c .
Exercise 2. Which is larger? A ∪ B or A ∩ B?
4. Disjointification:
Two sets A1 and A2 .
Define B1 = A1 . B2 = A2 − A1 .
Then
(a) B1 and B2 are disjoint.
(b) B1 ∪ B2 = A1 ∪ A2 .
Extend to more than two sets...
Exercise 3. Sets {An }, n = 1, 2, . . .. Define
Bk = Ak − ∪_{j=1}^{k−1} Aj , k ≥ 1, A0 = ∅.
Show that
(a) Bk ⊂ Ak for all k = 1, 2, . . ..
(b) B1 , B2 , . . . are disjoint.
(c) ∪_{k=1}^{n} Bk = ∪_{k=1}^{n} Ak for every n.
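For instance, with A1 = (0, 2], A2 = (1, 3], A3 = (2, 4] (subsets of R), the recipe gives B1 = (0, 2], B2 = (2, 3], B3 = (3, 4]; these are disjoint and ∪_{k=1}^{3} Bk = (0, 4] = ∪_{k=1}^{3} Ak .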
5. lim sup and lim inf of sets {Ak }. Taking a hint from Item 2, define
Cn = ∪_{k=n}^{∞} Ak and Dn = ∩_{k=n}^{∞} Ak , n = 1, 2, . . . ,
and recall that lim sup An is the set of points belonging to infinitely many of the An , while lim inf An is the set of points belonging to all but finitely many of the An .
Show that
(a) {Cn } is a decreasing sequence of sets.
(b) {Dn } is an increasing sequence of sets.
(c) Cn ⊇ Dn for all n.
(d) lim sup An = ∩_{n=1}^{∞} Cn , lim inf An = ∪_{n=1}^{∞} Dn .
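As an illustration, take An = (0, 1] for odd n and An = (0, 2] for even n. Then Cn = (0, 2] and Dn = (0, 1] for every n, so lim sup An = (0, 2] while lim inf An = (0, 1].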
Lecture 2
Field, σ-field, Borel σ-field
December 2, 2020
Probability calculations are always done for events. For example, the probability of “four heads in 10 tosses”. But events can get more complicated (for example by combination
of “simple” events) and soon we face the question “what constitutes the collection of all
events in a given scenario?”
This collection of events will of course vary depending on the problem at hand. But
we may turn this question around and stipulate some “natural” conditions that events should satisfy, no matter what probability question we are trying to answer.
We lay down two such sets of natural conditions: field and σ-field.
Sample space. This is a non-empty set, usually denoted by Ω.
For example, for two tosses of a coin I can take Ω = {HH, HT, T H, T T }.
The sample space Ω can be finite, countably infinite or uncountably infinite. We
shall of course see plenty of examples of all kinds later.
Then we have a collection of appropriate subsets of Ω which we shall call events. This
collection is contextual and even with the same Ω, we may have more than one such
relevant collection. This choice may vary depending on our goal. Natural conditions are imposed on this collection, which can be more or less strict, again depending on our goal.
Definition 1 (Field). Let Ω be a non-empty set and F be a collection of subsets of Ω.
Then F is said to be a field (or an algebra) of subsets of Ω, if all the following three
conditions are satisfied:
(a) Ω ∈ F,
(b) If A ∈ F, then Ac ∈ F,
(c) If A1 , . . . , An ∈ F, then ∪_{i=1}^{n} Ai ∈ F.
The elements of F are called events.
Clearly the empty set is a member of F. Note that a field is always defined with respect
to a sample space Ω. Later, as we gain expertise, we may not always mention this space explicitly since, after all, our main focus will be on events. Also note that this definition of field/algebra is in no way connected to the definition of a field in abstract algebra.
Exercise 1. Show that a field is always closed under finite intersection and finite
differencing.
Exercise 2. Consider Ω = N = {1, 2, . . .}. Let F be the collection of all subsets of N which are either finite or whose complement is finite. Show that F is a field. Does the set {2, 4, 6, . . .} belong to F?
Exercise 3. Suppose a non-empty set Ω is given. What is the smallest possible field
(containing Ω)?
Exercise 4. Suppose a non-empty set Ω is given. What is the largest possible field?
Exercise 5. Suppose a non-empty set Ω and a non-empty subset A of Ω is given. What
is the smallest possible field containing A (and Ω)?
Exercise 6. Suppose a non-empty set Ω and two non-empty subsets A1 and A2 of Ω are
given. What is the smallest possible σ-field containing these two sets (and Ω)? (may be
helpful to consider two cases: (i) A1 and A2 are disjoint and (ii) they are not disjoint).
Exercise 7. Show that arbitrary intersection of fields is again a field.
Definition 2 (Minimal field). Suppose C is a collection of subsets of Ω. Then the
smallest field containing all sets of C is called the minimal field containing C or, the
field generated by C, and is written as F(C).
Exercise 10. Give an example of a field which is not a σ-field. Hint: Try Exercise 2.
Exercise 11. Consider Ω = R. Let A be the collection of all subsets of R that are either countable or whose complement is countable. Show that A is a σ-field. Does the set
of all irrational numbers belong to A? Does the set of real numbers between 0 and 1
belong to A?
Exercise 12. Show that a σ-field is always closed under countable intersection.
Exercise 13. Suppose A is a σ-field of subsets of Ω. Let B be a subset of Ω (which may
or may not be in A). Consider
A ∩ B =: {A ∩ B : A ∈ A}.
Show that this is a σ-field (of subsets of B).
Exercise 14. Show that an arbitrary intersection of σ-fields is again a σ-field.
Definition 4 (Minimal σ-field). Suppose C is a collection of subsets of Ω. Then the
smallest σ-field containing all sets of C is called the minimal σ-field containing C or,
the σ-field generated by C, and is written as σ(C).
The Borel σ-field B(R) is the σ-field generated by the intervals of R (Exercise 17 below lists several equivalent generating collections). The Borel σ-field of other appropriate subsets of R is defined in the natural way.
Exercise 17. Consider Ω = R. Show that the smallest σ-field containing each of these
collections equals B(R).
(a) all intervals of finite length.
(b) all closed intervals.
(c) all open intervals
(d) all left open right closed intervals (including −∞ as left limit)
(e) all left closed right open intervals (including ∞ as right limit)
Exercise 18. Show that B(R) contains the following sets (caution! it contains much
more though):
(a) all singleton sets.
(b) all finite sets
(c) all compact sets.
(d) all open sets
(e) all closed sets.
For which of the above collections is the smallest σ-field equal to B(R)?
The Borel σ-field on R̄, written B(R̄) is the smallest σ-field generated by all intervals of
the form (a, b], a, b ∈ R̄.
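For instance, for any x ∈ R the singleton {x} = ∩_{n=1}^{∞} (x − 1/n, x] is a countable intersection of generating intervals and hence belongs to the generated σ-field; consequently every countable subset of R is Borel as well.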
Exercise 19. Suppose Ω = R. Let C be the collection of all singleton sets. Describe
σ(C).
Lecture 3
Measure, probability measure
December 2, 2020
(a) Suppose A is a σ-field of subsets of Ω. A function µ : A → R+ ∪ {∞} is called a measure (on A) if for every sequence {An } of disjoint sets from A,
µ(∪_{n=1}^{∞} An ) = Σ_{n=1}^{∞} µ(An ). (1)
The above property is called countable additivity (of µ on A). The triplet (Ω, A, µ) is called a measure space.
(b) Suppose F is a field. A function µ : F → R+ ∪ {∞} is called a measure (on F) if for every sequence {An } of disjoint sets from F with ∪_{n=1}^{∞} An ∈ F,
µ(∪_{n=1}^{∞} An ) = Σ_{n=1}^{∞} µ(An ). (2)
(a) Note that µ(A) must be defined for all A ∈ A (or F). That is, every event must
have a measure. Also note that µ(A) can be ∞. To avoid triviality, we shall assume that
there is at least one set A ∈ A such that µ(A) < ∞. In that case, show that µ(∅) = 0.
(b) There are theories where only the weaker version (finite additivity) is demanded of a measure, but countable additivity is a generally accepted requirement.
(c) You may wonder why we do not simply work with the power set of Ω, that is, declare every subset of Ω to be an event. Unfortunately, many reasonable measures (for example the “length measure” on subsets of R) do NOT have the above property on the power set and must be restricted to a suitable subset of the power set.
(d) There exists theory of measures that are not necessarily non-negative. But we shall
not discuss them.
Notation: #(A) denotes the number of elements of A. P(Ω) will denote the power set
of Ω.
Exercise 1. In each case below, µ is a measure:
(a) Suppose Ω = N. µ(A) = #(A) for all subsets of Ω. This is known as the counting
measure.
(b) Suppose {xi } is a sequence of non-negative real numbers. Suppose Ω = N. Let
µ(A) = Σ_{i∈A} xi , A ⊆ Ω.
(c) Suppose Ω = {0, 1, 2, . . .} and λ > 0. Let
µ(A) = Σ_{i∈A} exp{−λ} λ^i /i! , A ⊆ Ω.
The Poisson measure is a probability measure. The counting measure is a σ-finite mea-
sure.
Exercise 2. Suppose Ω = {0, 1, . . .}. Let the σ-field A be P(Ω). Fix 0 < p < 1. Let
q = 1 − p. Define µ for singleton sets as
µ({i}) = p q^i , i = 0, 1, . . . .
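Assuming µ is then extended to all of A by countable additivity, the total mass is µ(Ω) = Σ_{i=0}^{∞} p q^i = p/(1 − q) = 1, so µ is a probability measure (the geometric distribution).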
2. If µ is a measure on A, then for any sequence {An } from A,
µ(∪_{n=1}^{∞} An ) ≤ Σ_{n=1}^{∞} µ(An ).
Lecture 4
Semi-field
Countable sub- and super-additivity
Discussion on “length measure”
December 8, 2020
Notation: (i) For any collection C of subsets of Ω, and any subset A of Ω, define
C ∩ A = {A ∩ B : B ∈ C}.
(ii) For any non-empty set Ω, P(Ω) denotes the collection of all subsets of Ω. It is called
the power set of Ω.
Show that
(a) µ is finitely additive but not countably additive.
(b) µ is not continuous from below at Ω.
(c) µ is not continuous from above at ∅.
µ(A) = 0 if A is finite, and µ(A) = 1 if Ac is finite. (4)
Show that
(a) µ is finitely additive but not countably additive.
(b) µ is not continuous from below at Ω.
(b) µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) for all A, B ∈ F.
(c) If A, B ∈ F, B ⊆ A, then µ(A) = µ(B) + µ(A − B). As a consequence µ(A) ≥ µ(B).
This is called monotonicity of measure.
(d) For any sequence of sets {An } from F,
(i) µ(∪_{k=1}^{n} Ak ) ≤ Σ_{k=1}^{n} µ(Ak ). This is called finite sub-additivity. Hint: Use disjointification, finite additivity and (c).
(ii) If {An } are disjoint and ∪_{k=1}^{∞} Ak ∈ F, then µ(∪_{k=1}^{∞} Ak ) ≥ Σ_{k=1}^{∞} µ(Ak ). This is known as countable super-additivity.
(iii) If µ is a countably additive measure then µ(∪_{k=1}^{∞} Ak ) ≤ Σ_{k=1}^{∞} µ(Ak ), whenever ∪_{k=1}^{∞} Ak ∈ F. This is known as countable sub-additivity.
Check that the sets in the definition can be taken to be disjoint without any loss of
generality.
Exercise 5. Suppose C is a class of subsets of Ω. Verify that we can describe the smallest
field F(C) containing C in the following way. Let
Clearly G1 and G2 both contain C. Further, both are fields. Hence they must equal F(C).
Clearly any field is a semi-field. A semi-field is not necessarily closed under complemen-
tation.
Exercise 7. Let Ω be the interval (0, 1]. Let S be the collection of all sub-intervals of
Ω of the form (a, b]. Show that S is a semi-field and is not a field.
Exercise 8. Let S be a semi-field and let E be any subset of Ω. Show that S ∩ E is a
semi-field (of subsets of E).
Exercise 9. Suppose Si , i = 1, 2 are semi-fields of subsets of Ωi , i = 1, 2 respectively.
Show that S1 × S2 is a semi-field of subsets of Ω1 × Ω2 . Show by an example, that the
analogous statement is not true for either fields or σ-fields.
Exercise 10. Let C be any class of subsets of Ω which contains the empty set. Show
that E ∈ σ(C) iff there exists C1 , C2 , . . . ∈ C (this collection is allowed to depend on E)
such that E ∈ σ(C1 , C2 , . . .).
Definition 3 (Countably generated field and σ-field). A field or a σ-field, A, of subsets of
Ω, is said to be countably generated, if there exists countably many subsets C1 , C2 , . . .
of Ω such that A = σ(C1 , C2 , . . .).
Exercise 11. Show that B(R) is countably generated.
Exercise 12. Let (Ω, A, µ) be a measure space. Let {An } be a sequence of sets in A.
Show that
µ(lim inf An ) ≤ lim inf µ(An ).
What is the relation between the two sets lim sup An , lim inf An and the two functions
lim sup fn and lim inf fn ?
Exercise 14. Let (Ω, A, µn ) be a sequence of measure spaces where {µn } is non-decreasing.
That is, for any set A, µn (A) is a non-decreasing sequence. Define µ(A) = lim_{n→∞} µn (A), A ∈ A.
Is µ a measure?
Maybe we can start by checking out the “length measure” on the real-line...Call such a
target measure λ.
1. For any a < b, this λ must satisfy
λ(a, b] = b − a.
R = ∪_{n=−∞}^{∞} (n, n + 1].
6. This λ must be translation invariant. That is λ(A + x) = λ(A) for every A ∈ B(R)
and every x ∈ R.
7. Thus we should be able to restrict our attention to say, the interval (0, 1], and
complete our definition of λ on B(0, 1] and then use it to define λ on B(R).
8. But then why restrict to only the length measure λ? There must be a sort of
general programme to build measures by defining them on “simpler sets” (semi-field or
field maybe?) and then get to the σ-field...For example can we take a non-decreasing
continuous function F and define µ(a, b] = F (b)−F (a). Will this give rise to a countably
additive measure on B(R)?
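For instance, F (x) = x recovers the length measure µ(a, b] = b − a, while a bounded choice such as F (x) = arctan(x) would yield a finite measure with total mass π. This is exactly the Lebesgue-Stieltjes construction taken up in Lectures 11 and 12.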
9. There is another interesting question. What about getting from the length measure
λ on R to the area measure on R × R or the volume measure on R × R × R....?
Lecture 5
Monotone class theorem
December 9, 2020
We introduce another class of subsets of Ω and a crucial result concerning this class.
This result is going to be a crucial tool and shall be used many times in this course.
Definition 1 (Monotone class). Let Ω be a non-empty set. A class of subsets M of Ω,
is said to be a monotone class if it is closed under increasing and decreasing limits of
sets. That is,
(i) for all {An } from M such that An ↑, lim An = ∪_{k=1}^{∞} Ak ∈ M, and
(ii) for all {An } from M such that An ↓, lim An = ∩_{k=1}^{∞} Ak ∈ M.
Clearly any σ-field is a monotone class. A field which has only a finite number of sets is
a monotone class.
Exercise 1. Construct examples of fields that are not monotone classes and vice-versa.
Exercise 2. If M is a field and is a monotone class, then it is a σ-field.
Exercise 3. Check that arbitrary intersection of monotone classes is a monotone class.
Hence, given a class of sets C, there is a smallest monotone class that contains C.
It is usually denoted by M(C).
The following theorem links the three notions of field, monotone class and σ-field.
Theorem 1 (Monotone class theorem). Suppose F is a field. Then
(a) M(F) = σ(F).
(b) As a consequence of (a), if M is a monotone class such that M ⊇ F, then M ⊇ σ(F).
On the other hand, by definition, MA ⊆ M(F). Hence
MA = M(F).
(iii) But this means that for any B ∈ M(F) and any A ∈ F,
A ∩ B, A ∩ B c and Ac ∩ B ∈ M(F).
(iv) We now claim that M(F) is a field. This is because, we have already seen that
when A, B ∈ M(F) = MA then
A ∩ B, A ∩ B c and Ac ∩ B ∈ M(F).
Exercise 4. Suppose F is a field and µ1 and µ2 are two finite measures defined on σ(F)
and agree on F. That is µ1 (A) = µ2 (A) for all A ∈ F. Show that they agree on σ(F).
Hint: Define the class of “good sets”, that is
Lecture 6
Towards an extension theorem
Updated, December 10, 2020
Before we construct some non-trivial measures, we will first explore whether we can extend a measure defined on a field F to σ(F). As we have seen in the previous
lecture, IF the measure is finite and IF there is an extension, then this extension must
be unique. In this lecture we shall address these issues.
Note that the sets A and B need not belong to F. Lemma 1 implies that if we add the
limits of increasing sets to the collection F, then there is only one way of extending µ
to such sets.
G = {A : An ↑ A for some sequence {An } from F}. (1)
Extend µ to sets in G by µ1 (A) = lim_{n→∞} µ(An ), where An ∈ F and An ↑ A (by Lemma 1 this limit does not depend on the choice of the sequence {An }).
(d) If G1 , G2 ∈ G and G1 ⊆ G2 , then µ1 (G1 ) ≤ µ1 (G2 ).
(e) µ1 (G1 ∪ G2 ) + µ1 (G1 ∩ G2 ) = µ1 (G1 ) + µ1 (G2 ) for all G1 , G2 ∈ G. So µ1 is finitely
additive.
(f) If Gn ∈ G and Gn ↑ G, then G ∈ G and µ1 (Gn ) ↑ µ1 (G).
Proof. (a) Proof of the first part is trivial. To prove the second part, let G1 , G2 ∈ G. Let
An , Bn ∈ F be such that An ↑ G1 and Bn ↑ G2 . Then An ∪ Bn , An ∩ Bn ∈ F. Moreover,
An ∪ Bn ↑ G1 ∪ G2 and An ∩ Bn ↑ G1 ∩ G2 . Hence G1 ∪ G2 and G1 ∩ G2 are in G.
(b), (c) and (d) follow from Lemma 1.
(e) Let G1 , G2 ∈ G. Then there exists non-decreasing An , Bn ∈ F such that An ↑ G1
and Bn ↑ G2 . By additivity of µ on F,
Hence
µ(Anm ) ≤ µ(Dm ) ≤ µ1 (Gm ). (5)
Let m → ∞ in (4) to obtain
Gn ⊆ ∪_{m=1}^{∞} Dm ⊆ G. (6)
Now let n → ∞ to conclude that Dm increases to G. Hence
Lecture 7
Outer measure
December 10, 2020
µ∗ (A ∪ B) + µ∗ (A ∩ B) ≤ µ∗ (A) + µ∗ (B).
Example 1. We know that the pair (G, µ1 ) defined in the previous lecture satisfies the
above conditions (i)–(v).
µ∗ (A) + µ∗ (B) + 2ε ≥ µ(G1 ) + µ(G2 )
= µ(G1 ∪ G2 ) + µ(G1 ∩ G2 )
≥ µ∗ (A ∪ B) + µ∗ (A ∩ B).
Since ε is arbitrary, the proof of (d) is complete.
(e) Note that µ∗ (An ) is non-decreasing and since An ↑ A, µ∗ (A) ≥ lim µ∗ (An ). We need
to prove the opposite inequality.
As in the proof of part (d), get Gn ∈ G such that
Gn ⊇ An and µ(Gn ) ≤ µ∗ (An ) + ε 2−n , for all n ≥ 1. (2)
We now claim that
µ(∪_{k=1}^{m} Gk ) ≤ µ∗ (Am ) + ε Σ_{k=1}^{m} 2−k for all m ≥ 1. (3)
We prove this by induction. Clearly it is true for m = 1. Suppose it holds for all m ≤ n.
We shall prove it for m = n + 1.
First note that
µ∗ ((∪_{k=1}^{n} Gk ) ∩ Gn+1 ) ≥ µ∗ (Gn ∩ Gn+1 ) ≥ µ∗ (An ∩ An+1 ) = µ∗ (An ). (4)
Hence,
µ(∪_{k=1}^{n+1} Gk ) = µ(∪_{k=1}^{n} Gk ) + µ(Gn+1 ) − µ((∪_{k=1}^{n} Gk ) ∩ Gn+1 ) (condition (iv))
≤ µ∗ (An ) + ε Σ_{k=1}^{n} 2−k + µ∗ (An+1 ) + ε 2−(n+1) − µ∗ (An ) ((3) with m = n, (2), (4))
≤ µ∗ (An+1 ) + ε Σ_{k=1}^{n+1} 2−k .
Hence,
µ∗ (A) ≤ µ∗ (∪_{k=1}^{∞} Gk ) (by (b))
= µ(∪_{k=1}^{∞} Gk ) (by (a))
= lim_{n→∞} µ(∪_{k=1}^{n} Gk ) (using condition (v))
≤ lim_{n→∞} [µ∗ (An ) + ε Σ_{k=1}^{n} 2−k ]
= lim_{n→∞} µ∗ (An ) + ε.
Since ε is arbitrary, this proves (e) and the proof of the lemma is complete.
Note that µ∗ is defined on P(Ω) but need not be a measure, though it has some measure-
like properties. This motivates the following definition.
Lecture 8
Extension of measure
Completion of a measure
December 14, 2020
Recall the earlier set up where we have a finite measure space (Ω, F, µ). Then we looked
at the class G of all increasing limits of sets from F. This led to an additive set function
µ∗ on G. Then by the previous lemma, this gives an outer measure (on P(Ω)). Let us
keep calling it µ∗ .
Though an outer measure is not a measure in general, we know that our outer measure
µ∗ agrees with µ on F and hence is a measure on this subset of P(Ω).
We have the following theorem for an outer measure of this type. We state and prove it
for a probability measure but the same proof works, with appropriate changes, for any
finite measure.
Hence by (2), G ∈ H.
(ii) From the definition of H, it follows that, Ω ∈ H, and H is closed under complemen-
tation.
(iii) H is a field and P ∗ is finitely additive on H.
To show this, let H1 , H2 ∈ H. Let
A = H1 ∪ H2 , B = H1 ∩ H2 ,
The above two inequalities follow from Lemma 1 (d) of Lecture 7. If we now add (3)
and (4), and use the fact that H1 , H2 ∈ H, then we arrive at
1 ≤ P ∗ (H) + P ∗ (H c )
≤ P ∗ (Hn ) + ε + P ∗ (Hnc )
= 1 + ε.
Remark 1. For a measure space (Ω, F, µ) where µ is a finite measure but not necessarily
a probability measure, the set H is defined as
H = {H ⊆ Ω : µ∗ (H) + µ∗ (H c ) = µ(Ω)}
= {H ⊆ Ω : µ∗ (H) + µ∗ (H c ) ≤ µ(Ω)}
⊇ σ(F).
In Theorem 1, since H ⊇ σ(F), a natural question is how much larger it is. This question has a nice answer and hinges on the notions of null sets and complete measures.
Definition 1 (Null set). Suppose (Ω, F, µ) is a measure space where F is a field. A set
A ∈ F is said to be a µ-null set if µ(A) = 0. If there is no scope for confusion we say
null set instead of µ-null set. We will denote the class of µ-null sets by Nµ .
Exercise 2. Show that a countable union of null sets is again a null set. An uncountable
union of null sets need not be a null set.
Exercise 3. Show that there are measures which are not complete.
µ(Ω) ≤ µ∗ (B) + µ∗ (B c )
≤ µ∗ (A) + µ∗ (B c ), since B ⊆ A
= µ∗ (B c ), since µ∗ (A) = 0
≤ µ∗ (Ω).
Hence B ∈ H.
If we have a measure space which is not complete, then can we complete it in some way?
Aµ = {A ∪ N : A ∈ A, N ⊆ B, B ∈ Nµ }
Proof. To show that Aµ is closed under countable unions, it suffices to observe that
∪_{n=1}^{∞} (An ∪ Nn ) = (∪_{n=1}^{∞} An ) ∪ (∪_{n=1}^{∞} Nn )
To show that Aµ is closed under complementation, consider A ∪ N ∈ Aµ where N ⊆ B with µ(B) = 0. Then
(A ∪ N )c = Ac ∩ N c (7)
= (Ac ∩ B c ) ∪ (Ac ∩ (N c − B c )). (8)
Lecture 9
Completion of a measure
December 16, 2020
If we have a measure space which is not complete, then can we complete it in some way?
Theorem 1 (Completion of a measure space). Suppose (Ω, A, µ) is a measure space
where A is a σ-field.
(a) Let
Aµ = {A ∪ B : A ∈ A, B ⊆ N, N ∈ Nµ }
where Nµ is the collection of all µ-null sets from A. Then Aµ ⊇ A and is a σ-field.
(b) Extend µ from A to Aµ as follows:
Proof. (a) To show that Aµ is closed under countable unions, it suffices to observe that
∪_{n=1}^{∞} (An ∪ Bn ) = (∪_{n=1}^{∞} An ) ∪ (∪_{n=1}^{∞} Bn )
(A ∪ B)c = Ac ∩ B c (2)
= (Ac ∩ N c ) ∪ (Ac ∩ (B c − N c )) (3)
= (Ac ∩ N c ) ∪ (Ac ∩ (N − B)). (4)
Let M ⊆ A ∪ B. We have to show that M ∈ Aµ . But M ⊆ A ∪ N where A ∪ N ∈ A
and moreover µ(A ∪ N ) = 0. Hence by the definition of Aµ , M ∈ Aµ .
Theorem 2. Consider the finite measure space (Ω, F, µ) where F is a field. Then
(Ω, H, µ∗ ) is the completion of (Ω, σ(F), µ).
Proof. For convenience, let us use the notation A =: σ(F). Note that (Ω, A, µ∗ ) is a
measure space. We have to show that
Aµ∗ = H.
Aµ∗ ⊆ H.
Now consider any A ∈ H. Recall the definition of approximation from above by sets
in G while defining the outer measure µ∗ . That implies that there exist sequences of sets {Cn } and {Dn } from A such that
Let
C = ∪_{n=1}^{∞} Cn , D = ∩_{n=1}^{∞} Dn .
Then
A = C ∪ (A − C) where C ∈ A. (6)
But
A−C ⊆D−C ∈A (7)
and
µ∗ (D − C) ≤ µ∗ (Dn − Cn ) = µ∗ (Dn ) − µ∗ (Cn ) → 0. (8)
Using (6), (7), and (8), A ∈ Aµ∗ . This proves the result completely.
Lecture 10
Carathéodory extension theorem
Approximation theorem
December 16, 2020
Proof. This proof uses the results for extension of finite measures in a predictable way.
Write
Ω = ∪_{n=1}^{∞} An , where the An are disjoint, An ∈ F, µ(An ) < ∞ for all n ≥ 1. (1)
Then it is easy to check that for each n ≥ 1, (Ω, F, µn ) is a countably additive finite mea-
sure space. Then by Theorem 1 of Lecture 7, there is a unique extension (Ω, σ(F), µ∗n ).
Now it is obvious what we should do: define
µ∗ (A) = Σ_{n=1}^{∞} µ∗n (A), A ∈ σ(F). (3)
Then
(i) µ∗ is countably additive since each µ∗n is so. One has to observe that addition of
non-negative numbers gives the same result irrespective of the order in which they are
added.
(ii) (Ω, σ(F), µ∗ ) is an extension of (Ω, F, µ). This is easy to establish.
It remains to prove uniqueness. Suppose λ is a measure on σ(F) which agrees with µ
on F. Then we have to prove that λ = µ on σ(F). Define
= Σ_{n=1}^{∞} µ∗n (A), A ∈ σ(F)
= µ∗ (A), A ∈ σ(F).
Measures of sets in σ(F) can be approximated by those in F in the σ-finite case. This
result will come in handy in the future.
Theorem 2. Suppose (Ω, F, µ) is a countably additive measure space where F is a field and µ is σ-finite. Then given any ε > 0 and any set A ∈ σ(F) such that µ(A) < ∞, there exists B ∈ F such that
µ(A∆B) < ε. (4)
G = {A : A = ∪_{n=1}^{∞} An , An ∈ F}.
Then (4) holds for all A ∈ G for which µ(A) < ∞, since µ is continuous from below.
(ii) Now suppose µ is finite. By Lemma 1 of Lecture 7, the outer measure of any set A can
be approximated by the measure of sets G from G, which in turn can be approximated
by sets from F ∈ F by (i). This proves the theorem for the case where µ is a finite
measure.
(iii) Now suppose µ is σ-finite. Consider a sequence of sets {An } which satisfies (1). Define µn on σ(F) as usual by µn (A) = µ(A ∩ An ). Then each µn is a finite measure. Hence by part (ii), there exist sets Bn ∈ F such that
µn (A∆Bn ) < ε/2n . (5)
Note that
ε/2n ≥ µn (A∆Bn )
= µ((A∆Bn ) ∩ An )
= µ((A∆(Bn ∩ An )) ∩ An )
= µn (A∆(Bn ∩ An )).
Then C ∩ An = Bn . Hence
µn (A∆C) = µ((A∆C) ∩ An )
= µ((A∆Bn ) ∩ An )
= µn (A∆Bn ).
Hence
µ(A∆C) = Σ_{n=1}^{∞} µn (A∆C) < ε.
It remains to take care of the fact that C may not belong to F. Note that
∪_{k=1}^{n} Bk − A ↑ C − A, A − ∪_{k=1}^{n} Bk ↓ A − C.
Lecture 11
Distribution function and probability distribution function
Lebesgue-Stieltjes measure
From measure to distribution function
December 23, 2020
The extension theorem will now lead us to the Lebesgue measure (the length measure)
on R. Indeed we shall exhibit a more general class of measures on R, which are called
the Lebesgue-Stieltjes measures.
Definition 1 (Distribution function and probability distribution function). Any func-
tion F : R → R is called a distribution function if
(i) F is non-decreasing. That is, F (a) ≤ F (b) for all a ≤ b, a, b ∈ R.
(ii) F is right continuous. That is, limy↓x F (y) = F (x) for every x ∈ R.
F is called a probability distribution function if in addition to (i) and (ii), we have
(iii) limx→−∞ F (x) = 0, and limx→∞ F (x) = 1. In short we write F (−∞) = 0 and
F (∞) = 1.
A probability distribution function is also often called a cumulative distribution
function (CDF).
Remark 1. (i) A distribution function can take negative values and can be unbounded.
For example the function F (x) = x, x ∈ R is a distribution function.
(ii) If F is a probability distribution function (CDF) then of course, 0 ≤ F (x) ≤ 1 for
all x.
(iii) Even though all distribution functions are right continuous, a distribution function
need not be left continuous. However, the left limit exists at each point, in the sense
that limy→x,y<x F (y) exists for each x ∈ R.
(iv) We shall soon show how given any distribution function, there is an associated
measure on B(R) and vice versa.
Exercise 1. Suppose
F (x) = ∫_{−∞}^{x} exp(−y^2 ) dy, x ∈ R.
Show that F is a distribution function.
Exercise 2. Suppose for α, p > 0,
F (x) = 0 if x ≤ 0, and F (x) = ∫_{0}^{x} (α^p /Γ(p)) e^{−αy} y^{p−1} dy if x > 0. (1)
Show that F is a probability distribution function.
Exercise 3. Suppose P is the probability measure where P {0} = P {1} = 1/2. Find the
corresponding probability distribution function.
establishing non-decreasingness.
Fix x ∈ R and xn ↓ x. Since µ is finite on every sub-interval, using continuity from
above,
F (xn ) − F (x) = µ(x, xn ] ↓ µ(∅) = 0,
establishing right continuity.
Exercise 5. Suppose µ is a finite measure and µ({x}) > 0 for some x ∈ R. Show that
F is not left continuous at x.
Before we state and prove a converse of the above result, we make a few observations.
(i) Extend the function F : R → R to the function F : R̄ → R̄ by defining F (∞) = lim_{x→∞} F (x) and F (−∞) = lim_{x→−∞} F (x).
Note that both the above limits exist but they may equal ∞ and −∞ respectively.
(ii) Towards constructing a measure µ from a given F , define the set of all right-
semiclosed intervals of R̄ as
S̄ = {A : A = (a, b] or A = [−∞, b] or A = (−∞, b], a, b ∈ R̄, a < b}. (4)
Then
(a) R̄ ∈ S̄.
(b) S̄ is closed under finite intersection.
(c) If A ∈ S̄, then its complement is a finite disjoint union of sets from S̄.
We shall refer to S̄ as the class of all (left open) right-semiclosed intervals of R̄. It is a
semi-field.
(iii) Now let
F̄ = {A : A = ∪_{k=1}^{n} Ik , Ik ∈ S̄ for all k and they are disjoint}. (5)
Then, by using (a), (b) and (c), it is easy to show that F̄ is a field of subsets of R̄.
Exercise 6. Verify that F̄ contains the smallest field generated by all intervals of R.
(iv) Define µ on S̄ by
µ(a, b] = F (b) − F (a), a, b ∈ R̄, a < b
µ[−∞, b] = F (b) − F (−∞) = µ(−∞, b], b ∈ R̄. (6)
Then µ is non-negative and is defined on all right-semiclosed intervals of R̄.
(v) Define µ on F̄ by:
µ(∪_{k=1}^{n} Ik ) = Σ_{k=1}^{n} µ(Ik ), I1 , . . . , In ∈ S̄ and they are disjoint.
Note that ∪_{k=1}^{n} Ik may have alternate descriptions as ∪_{k=1}^{t} Jk where the Jk are disjoint elements of S̄. Thus we must show that the above definition is meaningful. This is easily done by using (iv).
Exercise 7. Show that µ is well defined on F̄.
(vi) µ defined above on F̄ is finitely additive. This is also easily proved by using (iv)
and (v).
Exercise 8. Show that µ is finitely additive on F̄.
Lemma 1. Suppose F is a distribution function and the finitely additive µ is defined on
the field F̄ of finite disjoint unions of all right-semiclosed intervals of R̄ as above. Then
µ is countably additive on F̄.
Lecture 12
From distribution function to measure
Lebesgue measure
December 23, 2020
Proof. (a) First assume that F (∞) − F (−∞) < ∞, so that µ is finite. Let {An } be sets
from F̄ such that An ↓ ∅. To prove countable additivity, we need to show that µ(An ) ↓ 0.
Recall that each An is a finite disjoint union of intervals of the form (a, b]. Suppose
an ↓ a. Then for every fixed b, by right continuity,
Since each B̄n is compact, it follows that there exists a finite m such that ∩_{k=1}^{m} B̄k = ∅.
Then for all n ≥ m,
µ(An ) = µ(An − ∩_{k=1}^{n} Bk )
≤ Σ_{k=1}^{n} µ(Ak − Bk ) since µ is finitely additive
≤ ε.
This shows that µ(An ) ↓ 0 if F (∞) − F (−∞) < ∞. Hence µ is countably additive in
this case.
Define the set function µn on F̄ corresponding to Fn . These are all finite and hence
countably additive by (a). Moreover,
But then
0 ≤ µ(A) − Σ_{k=1}^{∞} µ(Ak ) (by (3))
= lim_{n→∞} Σ_{k=1}^{∞} µn (Ak ) − Σ_{k=1}^{∞} µ(Ak ) (note that µ(An ) < ∞)
≤ 0 since µn ≤ µ.
Exercise 1. (a) Prove the claim made in (i) and (ii) above in the course of the proof.
(b) Verify the step An − ∩_{k=1}^{n} Bk ⊆ ∪_{k=1}^{n} (Ak − Bk ) claimed in the above proof.
Exercise 2. Verify the claim made in Equation (2) in the above proof.
Now we can state our theorem.
Theorem 1 (From distribution function to measure). Let F be a distribution function
on R and let µ(a, b] = F (b) − F (a), a < b. Then there is a unique extension µ which is
a Lebesgue-Stieltjes measure on B(R).
Proof. By the previous lemma, get a countably additive measure µ on the field F̄. Note that this is a field of subsets of R̄. Define the set function on the field of finite disjoint unions of right-semiclosed intervals of R (treating (a, ∞) as right-semiclosed). Call this field F. For example
µ(a, ∞) = F (∞) − F (a), a ∈ R,
µ(−∞, b] = F (b) − F (−∞), b ∈ R,
µ(R) = F (∞) − F (−∞).
Note that there is no other possible choice of µ on these sets.
Clearly this µ is countably additive on F. Further µ is σ-finite on F (but not necessarily on F̄). Now use the Carathéodory extension theorem. Clearly the extended measure is
Lebesgue-Stieltjes. We omit the details.
Exercise 4. Check all the details in the proof of Theorem 1. Why can we not claim that µ is σ-finite on R̄?
Exercise 7. Show that
Lecture 13
More on distributions and Lebesgue-Stieltjes measures
Lebesgue measure on Rn
Approximations from within by compact sets (σ-finite measure)
Approximation from above by open sets (finite or Lebesgue-Stieltjes measure)
December 23, 2020
Remark 2. (a) The construction of Lebesgue-Stieltjes measure on B(Rn ) can be
carried out in a similar fashion by starting with a function F which is non-decreasing
and right continuous at every co-ordinate. Note that we will first need to extend the
notion of F (b) − F (a) that we had when a ≤ b ∈ R. We omit the details for now.
(b) When F has the product form F (x1 , . . . , xn ) = F1 (x1 ) · · · Fn (xn ), where each Fi (·) is non-decreasing and right continuous, the construction of the measure is a bit simpler. One can start with rectangles which are products of right-semiclosed intervals and define µ in the natural way.
(c) The existence of Lebesgue measure on Rn follows from (b). This is really a special
case of the product measure construction, starting with the Lebesgue measure on
R. We shall cover this later.
Exercise 4. Learn in details from the book, the concepts and related developments on
distribution functions and Lebesgue-Stieltjes measures on Rn .
Definition 2 (Lebesgue measure on Rn ). The Lebesgue measure λn on B(Rn ) is the
unique measure for which
λn ((a1 , b1 ] × · · · × (an , bn ]) = Π_{i=1}^{n} (bi − ai ), ai < bi , ai , bi ∈ R, i = 1, . . . , n.
µ(Bn ) ≤ µ(Kn ) + ε.
By replacing Kn with ∪_{k=1}^{n} Kk (which is compact), we can assume that {Kn } is a non-decreasing sequence. Then
(b) We have
µ(B) ≤ inf{µ(V ) : V ⊇ B, V open}
≤ inf{µ(W ) : W ⊇ B, W = K c , K compact}
(c) Write Rn = ∪_{n=1}^{∞} Bn where {Bn } are disjoint bounded sets. Then for each n, Bn ⊆ Cn
n=1 Bn where {Bn } are disjoint bounded sets. Then for each n, Bn ⊆ Cn
for some bounded open sets Cn . Define the finite measures
Fix ε > 0. If B is a Borel subset of Bk , then by (b), there is an open set Wk ⊇ B such that
µk (Wk ) ≤ µk (B) + ε/2k . (4)
Now note that Vk = Wk ∩ Ck is an open set and B ∩ Ck = B since B ⊆ Bk ⊆ Ck . Hence
Now fix any A ∈ B(Rn ). By the conclusion reached above, let Vk be an open set such
that
Vk ⊇ A ∩ Bk and µ(Vk ) ≤ µ(A ∩ Bk ) + ε/2k for all k.
Let V = ∪_{n=1}^{∞} Vn . Then using the above inequality, V is open, V ⊇ A and
µ(V ) ≤ Σ_{n=1}^{∞} µ(Vn ) ≤ µ(A) + ε.
(d) Let S = {1, 1/2, 1/3, . . .} and let µ be the measure concentrated on S with µ({1/n}) = 1/n for all n. Clearly µ is σ-finite and µ is not a Lebesgue-Stieltjes measure. Consider
the set B = {0}. Any open set containing B has infinite measure.
Show that µ is not a Lebesgue-Stieltjes measure and approximation from above fails.
Lecture 14
Towards integration with respect to measures
Measurable functions
December 23, 2020
Note that the measurability of a function is determined by what the underlying σ-fields
are, and not on any measure that may be defined on the σ-fields. If these σ-fields are
clear from the context, we may not mention them.
Exercise 1. Suppose f : Ω1 → Ω2 is any function. Suppose {Aα , α ∈ I} and {Bα , α ∈ I}
are subsets of Ω1 and Ω2 respectively. Show that
f (∪α∈I Aα ) = ∪α∈I f (Aα ),
f (∩α∈I Aα ) ⊆ ∩α∈I f (Aα ) (equality may not hold),
(f (Aα ))c = f (Acα ) does not hold in general,
f −1 (Bαc ) = (f −1 (Bα ))c ,
f −1 (∪α∈I Bα ) = ∪α∈I f −1 (Bα ),
f −1 (∩α∈I Bα ) = ∩α∈I f −1 (Bα ).
Exercise 2. Suppose f : Ω1 → Ω2 with σ-fields Ai , i = 1, 2 respectively. Suppose C is a
class of subsets of Ω2 such that σ(C) = A2 . Then f is measurable if and only if for all
C ∈ C, f −1 (C) ∈ A1 .
Exercise 3. Give an example of a function f : (Ω1 , A1 ) → (Ω2 , A2 ) such that for some set A1 ∈ A1 , f (A1 ) ∉ A2 .
Lecture 15
More on Measurable functions
Integration with respect to measures
December 23, 2020
Note that {xi } need not be distinct. In other words a simple function is allowed to have
more than one representation. Since {Ai } are disjoint, both ∞ and −∞ are allowed as
possible values of {xi }.
Exercise 1. Show that the above integral is well-defined: all representations of f yield the same value.
So, here we again see the idea of approximating from below. Note that this definition
is unambiguous and is a true extension of the definition of integral for simple functions.
Also, the value of the integral can be equal to ∞.
Exercise 3. Suppose f : (Ω, A) → (R̄, B(R̄)) is a non-negative measurable function.
(a) Show that there exists a sequence of simple functions {sn } such that sn (ω) ↑ f (ω) for all ω ∈ Ω. Hint: Truncate R and split it into intervals with dyadic rational end points and take inverse images.
(b) What do you think will happen to lim_{n→∞} ∫Ω sn dµ?
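One standard construction along the lines of the hint (certainly not the only one) is
sn = Σ_{k=1}^{n2^n} ((k − 1)/2^n ) I_{{(k−1)/2^n ≤ f < k/2^n }} + n I_{{f ≥ n}} .
Each sn is simple, 0 ≤ sn ≤ f , the sequence is non-decreasing, and sn ↑ f pointwise (at points where f = ∞ one gets sn = n ↑ ∞).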
Now we will move to define integrals of functions f which are not necessarily non-
negative. We use the familiar idea of splitting f into positive and negative parts.
Definition 4 (Positive and negative parts). For any f : (Ω, A) → (R̄, B(R̄)), define
f + (ω) = f (ω) if f (ω) ≥ 0, and f + (ω) = 0 otherwise. (1)
f − (ω) = −f (ω) if f (ω) ≤ 0, and f − (ω) = 0 otherwise. (2)
The functions f + and f − are called the positive and negative parts of f respectively.
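For example, if Ω = R and f (ω) = ω, then f + (ω) = max(ω, 0), f − (ω) = max(−ω, 0) and |f |(ω) = |ω|.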
Exercise 4. Show that
(i) Both f + and f − are non-negative measurable functions.
(ii) |f | = f + + f − .
(iii) f = f + − f − .
Exercise 5. If f : (Ω, A) → (R̄, B(R̄)) and A ∈ A, then show that f IA is also measur-
able.
Definition 5. For any f : (Ω, A) → (R̄, B(R̄)), define
∫Ω f dµ = ∫Ω f + dµ − ∫Ω f − dµ, provided not both integrals are ∞.
In this case we say that ∫Ω f dµ exists. Otherwise we say that ∫Ω f dµ does not exist. If the integral exists and is finite, then we say that f is µ-integrable, or simply integrable if the measure µ is clear from the context. We shall often suppress Ω in the notation of the integral. For any set A ∈ A, define
∫A f dµ = ∫Ω f IA dµ.
Lecture 16
More on measurable functions
Projection map
Basic properties of the integral
Monotone Convergence Theorem (MCT)
December 30, 2020
Proof. This follows since for every x ∈ R, the following string of equalities holds:
{ω : f (ω) > x} = {ω : lim_{n→∞} fn (ω) > x}
= {ω : fn (ω) is eventually > x + 1/k for some k = 1, 2, . . .}
= ∪_{k=1}^{∞} {ω : fn (ω) > x + 1/k for all but finitely many n}
= ∪_{k=1}^{∞} lim inf_{n→∞} {ω : fn (ω) > x + 1/k}
= ∪_{k=1}^{∞} ∪_{n=1}^{∞} ∩_{t=n}^{∞} {ω : ft (ω) > x + 1/k}.
(b) Suppose f is any Borel measurable function from Ω to R̄. Then there exists a sequence
of finite-valued simple functions {sn } such that |sn | ≤ |f | and sn → f point-wise.
(c) Suppose f is any bounded Borel measurable function from Ω to R̄. Show that {sn }
in (b) above can be chosen such that sn → f uniformly on Ω.
It can be checked that the sequence {sn } has the desired properties.
(b) Consider the two non-negative measurable functions f + and f − . Choose simple
functions s1n and s2n respectively for them as in (a). Then let sn = s1n − s2n . It is easy
to check that {sn } has the desired properties. This also serves as proof for (c).
Exercise 2. Suppose f1 and f2 are the constant functions 1 and 2 respectively. What
are the corresponding approximating simple functions as constructed in the above proof ?
Exercise 3. Using the above results, show that if f1 and f2 are Borel measurable func-
tions from Ω to R̄, then f1 + f2 and f1 /f2 are also so, provided they are defined.
Exercise 4 (Composition of measurable functions). Suppose f1 : (Ω1 , A1 ) → (Ω2 , A2 ),
f2 : (Ω2 , A2 ) → (Ω3 , A3 ) are measurable. Show that f2 ◦ f1 : (Ω1 , A1 ) → (Ω3 , A3 ) is
measurable.
Lemma 3 (Projection maps). (a) Suppose pi : R̄n → R̄ is defined as pi (x1 , . . . , xi , . . . , xn ) = xi . Then pi is Borel measurable for 1 ≤ i ≤ n.
(b) Suppose f1 , f2 are two measurable functions such that f1 ≥ f2 . Then
(i) If ∫ f2 dµ exists and ∫ f2 dµ > −∞ then ∫ f1 dµ exists and ∫ f1 dµ ≥ ∫ f2 dµ.
(ii) If ∫ f1 dµ exists and ∫ f1 dµ < ∞ then ∫ f2 dµ exists and ∫ f1 dµ ≥ ∫ f2 dµ.
(iii) If both integrals ∫ f1 dµ and ∫ f2 dµ exist, then ∫ f1 dµ ≥ ∫ f2 dµ.
(c) If ∫ f dµ exists then |∫ f dµ| ≤ ∫ |f |dµ.
(d) If f is non-negative and A ∈ A, then
∫A f dµ = sup{∫A s dµ : 0 ≤ s ≤ f, s simple}.
(e) If ∫ f dµ exists, then so does ∫A f dµ for every A ∈ A. If the first integral is finite, then so is the second integral for every A.
Proof. The claim can be easily verified when f is a simple function.
So then let f be a non-negative measurable function. Fix any simple function s such
that 0 ≤ s ≤ f . Let {Bn } be disjoint sets from A and B = ∪_{n=1}^{∞} Bn . Then
∫B s dµ = Σ_{n=1}^{∞} ∫Bn s dµ since s is simple
≤ Σ_{n=1}^{∞} ∫Bn f dµ by monotonicity of the integral.
Then
ν(B1 ∪ . . . ∪ Bn ) = ∫_{∪_{i=1}^{n} Bi} f dµ
≥ ∫_{∪_{i=1}^{n} Bi} sf dµ
= Σ_{i=1}^{n} ∫_{Bi} sf dµ since we have simple functions
≥ Σ_{i=1}^{n} ∫_{Bi} f dµ − ε
= Σ_{i=1}^{n} ν(Bi ) − ε.
Hence
ν(B) ≥ ν(B1 ∪ . . . ∪ Bn )
≥ Σ_{i=1}^{n} ν(Bi ) − ε
→ Σ_{i=1}^{∞} ν(Bi ) − ε.
In the above proof ν(B) = ν + (B) − ν − (B) where ν + and ν − are measures and at least
one of them is finite.
Bn = {ω : fn (ω) ≥ bs(ω)}.
Then Bn ↑ Ω. Now
v ≥ ∫ fn dµ ≥ ∫_{Bn} fn dµ
≥ b ∫_{Bn} s dµ
→ b ∫ s dµ as n → ∞ by Theorem 2, continuity from below
→ ∫ s dµ letting b → 1.
Now taking supremum over all possible s, we get v ≥ ∫ f dµ and the proof is complete.
Lecture 17
Further properties of the integral
Almost sure (almost everywhere)
December 31, 2020
Notation: For any measurable function f we shall write ∫ f dµ for ∫Ω f dµ.
The acronym MCT will stand for Monotone Convergence Theorem.
Exercise 1. Construct fn such that fn ↑ f , integrals of all these functions exist, but ∫ fn dµ does not increase to ∫ f dµ.
Exercise 2. Construct non-negative fn such that fn ↓ f , but ∫ fn dµ does not decrease to ∫ f dµ.
Show that
Σ_{k=1}^{∞} xn,k ↑ Σ_{k=1}^{∞} xk .
Proof. (i) We already know the result if f and g are simple functions.
(ii) Suppose f and g are non-negative functions. Let an and bn be non-negative simple
functions which increase to f and g respectively. Then 0 ≤ sn := an + bn ↑ f + g. Hence by MCT, ∫ sn dµ ↑ ∫ (f + g)dµ. Moreover
∫ sn dµ = ∫ an dµ + ∫ bn dµ by (i)
↑ ∫ f dµ + ∫ g dµ by MCT.
We have already observed that the left side increases to ∫ (f + g)dµ. This proves the result for this special case.
(iii) Now suppose f ≥ 0 and g ≤ 0 and h = f + g ≥ 0. This implies g is finite. Note
that f = h + (−g) is the sum of two non-negative functions. Hence
∫ f dµ = ∫ h dµ + ∫ (−g)dµ by (ii)
= ∫ h dµ − ∫ g dµ by Theorem 1 (a) of Lecture 16.
If ∫ g dµ is finite then from the above the result follows. If ∫ g dµ = −∞, then since h ≥ 0,
∫ f dµ ≥ − ∫ g dµ = ∞,
contradicting the hypothesis that ∫ f dµ + ∫ g dµ is defined. Thus this case does not arise at all. So, the result is proved in this case.
(iv) If f ≥ 0 and g ≤ 0 such that and h = f + g ≤ 0 then we can work with −f and −g
and apply (iii).
(v) Now we prove the general case by splitting the range of the functions so as to apply
(ii), (iii) and (iv). Let
Now,
∫Ei h dµ = ∫Ei f dµ + ∫Ei g dµ, for all i = 1, 2, . . . , by (ii), (iii) and (iv),
∫ f dµ = Σ_{i=1}^{n} ∫Ei f dµ by Theorem 3,
∫ g dµ = Σ_{i=1}^{n} ∫Ei g dµ by Theorem 3,
∫ f dµ + ∫ g dµ = Σ_{i=1}^{n} ∫Ei h dµ using the above three equations
= ∫ h dµ by Theorem 2 of Lecture 16, provided it exists.
To check that ∫ h dµ exists, we need to show that at least one of ∫ h+ dµ and ∫ h− dµ is finite. Suppose
∫ h+ dµ = ∫ h− dµ = ∞, if possible.
But then
∫Ei f dµ = ∞ or ∫Ei g dµ = ∞
and hence
∫ f dµ = ∞ or ∫ g dµ = ∞. (1)
Definition 1 (Almost surely or almost everywhere). A property P (·) defined on (Ω, A, µ) is said to hold almost surely [µ] if there exists a µ-null set A such that it holds for all ω ∈ Ac . We also write µ-almost surely, or simply a.s. or a.e.
Remark 2. The definition just says that the property P holds outside some null set.
It is silent about what happens on the chosen null set. And so, P (ω) is allowed to hold
for some ω in the chosen null set A. It is plausible that the set of ω ∈ Ω for which P (ω)
holds, may not be in A. Also, the almost surely or almost everywhere is with respect to
a given measure µ.
Example 1. Suppose f and g from (Ω, A, µ) to (Rn , B(Rn )) are measurable. Show that
A = {ω : f (ω) ≠ g(ω)} ∈ A. Show that the functions f and g are equal almost surely if
and only if µ(A) = 0.
Exercise 6. Consider the measure space ([0, 1], B([0, 1]), λ). Define
f (x) = 4 if x is irrational, f (x) = 2 if x is rational and x < 1, and f (x) = ∞ if x = 1. (3)
Show that
(i) f is measurable.
(ii) ∫ f dλ = 4.
(iii) f is not Riemann integrable.
(iv) f = 4 almost surely.
g = gIA + gIB ,
f = f IA + f IB ,
gIB = f IB = 0 almost surely.
Now the result follows from the additivity of integrals and part (a).
(c) Let A = {ω : |f |(ω) = ∞}. If µ(A) > 0, then ∫ |f |dµ ≥ ∞ · µ(A) = ∞, which is a contradiction. Hence (c) is proved.
(d) Let B = {ω : f (ω) > 0} and Bn = {ω : f (ω) > 1/n}, n = 1, 2, . . .. Then Bn ↑ B.
Now 0 ≤ f IBn ≤ f IB . Hence,
0 ≤ ∫Bn f dµ ≤ ∫B f dµ ≤ ∫ f dµ = 0.
Lecture 18
Fatou’s lemma
Dominated Convergence Theorem (DCT)
January 6, 2021
We have seen that interchanging the order of limit and integration, the order of sum-
mation in a double sum etc. are valid under a “non-negativity” assumption. This is
inadequate for applications and we now extend these results significantly with two very
important results.
We first upgrade the MCT by relaxing the non-negativity assumption. We shall continue
to call this theorem MCT. Before that, a small exercise
Exercise 1. Suppose fn are non-negative (everywhere) such that fn ↑ f a.e. Then ∫ fn dµ ↑ ∫ f dµ.
Theorem 1 (Extended MCT). Suppose {fn } and g are Borel measurable functions.
(a) If fn ≥ g for all n such that ∫ g dµ > −∞, and fn ↑ f , then ∫ fn dµ ↑ ∫ f dµ.
(b) If fn ≤ g for all n such that ∫ g dµ < ∞, and fn ↓ f , then ∫ fn dµ ↓ ∫ f dµ.
Proof. (a) If ∫ g dµ = ∞ then by Theorem 1 (b)(i) of Lecture 16, ∫ fn dµ = ∞ and ∫ f dµ = ∞. So assume that ∫ g dµ < ∞. Then g is integrable and by Theorem 2 (c) of Lecture 17, g is finite almost everywhere. Redefine g to be 0 on the set {ω : g(ω) = ±∞} (we continue to call this function g). Then 0 ≤ fn − g ↑ f − g a.e. Hence 0 ≤ ∫ (fn − g)dµ ↑ ∫ (f − g)dµ. Note that now we can apply the additivity Theorem 1 of Lecture 17 (check that the conditions of that theorem are satisfied), and the result follows.
(b) Consider −fn and −g and apply (a).
Exercise 2. Show that the above theorem continues to hold if we add the clause “almost
surely” at every hypothesis.
We know that these are extended real-valued functions and are Borel measurable.
(b) If fn ≤ f for all n such that ∫ f dµ < ∞, then
lim sup_{n→∞} ∫ fn dµ ≤ ∫ lim sup_{n→∞} fn dµ.
Remark 1. Compare Fatou’s lemma with Exercise 12 of Lecture 4. That will help you
to remember which way the inequalities go when you apply Fatou’s Lemma.
Proof of Theorem 2. Define gn = inf_{k≥n} fk , g = lim inf fn . Then gn ≥ f for all n, ∫ f dµ > −∞ and gn ↑ g. Hence we can apply MCT (Theorem 1) to conclude that ∫ gn dµ ↑ ∫ g dµ. But gn ≤ fn . Hence
∫ lim inf fn dµ = ∫ g dµ = lim ∫ gn dµ = lim inf ∫ gn dµ ≤ lim inf ∫ fn dµ.
(b) Take the negatives and convert limsup to liminf and apply (a).
Theorem 3 (Dominated Convergence Theorem (DCT)). Suppose {fn } and h are Borel
measurable functions such that
(i) |fn | ≤ h for all n,
(ii) h is integrable,
(iii) fn → f a.e.
Then f is integrable and ∫ fn dµ → ∫ f dµ.
Remark 2. (i) As usual, we can add “almost sure” in (i) and the result will still hold.
(ii) A common mistake in applications is to ignore condition (ii). Note that if the limit,
say f , of fn is known to exist, that is not enough for (i) or (ii) to hold.
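For instance, on ([0, 1], B([0, 1]), λ) take fn = n I_{(0,1/n)} . Then fn → 0 pointwise, but ∫ fn dλ = 1 for every n, so the integrals do not converge to ∫ 0 dλ = 0; indeed no integrable h dominates all the fn .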
Proof of Theorem 3. Since (iii) holds, we know that |f | ≤ h a.e., and hence f is integrable. By Fatou’s lemma
∫ lim inf fn dµ ≤ lim inf ∫ fn dµ
≤ lim sup ∫ fn dµ
≤ ∫ lim sup fn dµ.
But lim inf fn = lim sup fn = f a.e. and the result follows.
Lecture 19
Induced measure
Measure preserving transformation
Change of variable formula
Riemann and Lebesgue integrals
January 6, 2021
Exercise 1. Suppose {fn } are Borel measurable functions such that |fn | ≤ h for all n, and |h|^p is integrable for some p > 0. If fn → f a.e. and |f |^p is integrable, show that ∫ |fn − f |^p dµ → 0 as n → ∞.
Exercise 2. Suppose (Ω, A, µ) is a measure space. Suppose f and g are Borel measurable
functions. Consider the condition
∫A g dµ ≤ ∫A f dµ for all A ∈ A. (1)
(µT −1 )(A1 ) = µ(T −1 (A1 )), A1 ∈ A1 .
Then for any Borel measurable f on (Ω1 , A1 ),
∫_{Ω1} f d(µT −1 ) = ∫_{Ω} (f ◦ T ) dµ,
in the sense that if one of the integrals exists then the other exists and they are equal.
Hint: Start with f as an indicator function.
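Following the hint, for f = I_{A1} with A1 ∈ A1 both sides reduce to the defining relation:
∫_{Ω1} I_{A1} d(µT −1 ) = (µT −1 )(A1 ) = µ(T −1 (A1 )) = ∫_{Ω} I_{A1} (T (ω)) dµ(ω), since I_{A1} ◦ T = I_{T −1 (A1 )} .
Linearity then gives the result for simple functions, MCT gives it for non-negative f , and the general case follows from f = f + − f − .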
Definition 2. (Lebesgue set) The Lebesgue σ-field is the completion of the Borel σ-
field with respect to the Lebesgue measure. Any set in the Lebesgue σ-field is called a
Lebesgue set.
Theorem 1 (Riemann and Lebesgue integration). Let f be a bounded real valued Borel measurable function on [a, b]. Let λ be the Lebesgue measure on [a, b]. Then
(a) f is Riemann integrable if and only if f is λ-a.e. continuous.
(b) If f is Riemann integrable on [a, b] then f is Lebesgue integrable on [a, b] and the two integrals are equal.
and let
|π| = max_{1≤i≤n} (xi − xi−1 ).
Define
Note that
Lπ,n (x) ≤ f (x) ≤ Uπ,n (x) for all x ∈ [a, b]
and these functions are also bounded by M . Define the upper and lower sums on [a, b]
by
Uπ,n = Σ_{i=1}^{n} Mi (xi − xi−1 ),
Lπ,n = Σ_{i=1}^{n} mi (xi − xi−1 ).
Consider the space ([a, b], L, λ) where L is the Lebesgue σ-field (of subsets of [a, b]).
Note that Uπ,n (·) and Lπ,n (·) are simple functions and
∫ Uπ,n (x)dλ(x) = Uπ,n , ∫ Lπ,n (x)dλ(x) = Lπ,n .
Choose any sequence of partitions {πn } such that |πn | → 0, and for each n, πn+1
is a refinement of πn (that is all points of πn are also points of πn+1 ). Then {Uπn ,n (·)}
and {Lπn ,n (·)} are respectively non-increasing and non-decreasing sequence of functions,
with limits Uπ (·) and Lπ (·) say. Moreover
and these functions are also bounded by M . Then by the Dominated Convergence Theorem (DCT) (the bounding function is the constant function M ),
lim_{n→∞} Uπn ,n = lim_{n→∞} ∫ Uπn ,n (x)dλ(x) = ∫ Uπ (x)dλ(x), (3)
lim_{n→∞} Lπn ,n = lim_{n→∞} ∫ Lπn ,n (x)dλ(x) = ∫ Lπ (x)dλ(x). (4)
But then by (2) for all such sequences of partitions, Uπ (·) = f (·) = Lπ (·) λ-a.e. Hence
f is continuous λ-a.e.
Now assume that f is continuous λ-a.e. Then Uπ (·) = f (·) = Lπ (·) λ-a.e. for any
sequence of partitions {πn } such that |πn | → 0. Now note that Uπ (·) and Lπ (·) are Borel
measurable (since they are limits of simple functions). So f is Lebesgue measurable.
Since f is bounded (and Lebesgue measurable) it is integrable with respect to λ. Hence
∫ Uπ (x)dλ(x) = ∫ f (x)dλ(x) = ∫ Lπ (x)dλ(x) (5)
Lecture 20
Jordan-Hahn Decomposition Theorem
Signed measure
Upper, lower and total variation
January 7, 2021
Exercise 1. Give an example of a sequence of functions {fn } on [a, b] such that they
are bounded by 1, each fn is Riemann integrable, fn (x) → f (x) for all x ∈ [a, b] but f
is not Riemann integrable.
Definition 2. Suppose (Ω, A, µ) is a measure space and f is a Borel measurable function on Ω such that ∫Ω f dµ exists. Then ν defined by ν(A) = ∫A f dµ, A ∈ A, is called the indefinite integral of f with respect to µ.
Recall that in this case ν is a difference of two measures and at least one of them is
finite. We now target a converse.
(ii) Suppose now that ν(A) < ∞ for all A ∈ A. Let S be the supremum. Get An ∈ A such that ν(An ) → S. Let A0 = ∪_{n=1}^{∞} An . Fix n and consider the 2^n disjoint sets A∗1 ∩ A∗2 ∩ · · · ∩ A∗n where each A∗i is either Ai or A0 \ Ai . Some of them could be empty. Label them as Anm , m = 1, 2, . . . , 2^n . Let
Since each An is a finite disjoint union of some sets Anm , and negative-valued sets have
been dropped in the definition of Bn , using additivity of ν, we have
ν(An ) ≤ ν(Bn ).
Also note that there is “nesting”. If n1 > n2 then each An1 m1 is either a subset of some
An2 m or disjoint from it. This implies that for r ≥ n,
Hence we have
ν(An ) ≤ ν(Bn )
≤ ν(∪_{k=n}^{r} Bk ), (additivity and the above observation)
→ ν(∪_{k=n}^{∞} Bk ) as r → ∞ (continuity from below).
Define
C = lim sup Bn .
Then ∪_{k=n}^{∞} Bk ↓ C. Also 0 ≤ ν(∪_{k=n}^{∞} Bk ) < ∞. Thus
S = lim_{n→∞} ν(An )
≤ lim_{n→∞} ν(∪_{k=n}^{∞} Bk )
= ν(C) (continuity from above)
≤ S.
Remark 1. We have used continuity from below and above. This follows from the
following exercise. Not sure if I have done this earlier.
Exercise 5. Suppose ν is a countably additive set function on σ-field A. Show that then
the following hold:
(i) If An ∈ A such that An ↑ A, then ν(An ) → ν(A). The convergence may not be
monotone.
(ii) If An ∈ A such that An ↓ A, and ν(Ai ) is finite for some i, then ν(An ) → ν(A). The convergence may not be monotone.
(iii) The results (i) and (ii) hold if ν is defined on a field F and we assume that A ∈ F.
Theorem 2 (Jordan-Hahn Decomposition theorem). Suppose ν is an extended real
valued, countably additive set function on a measurable space (Ω, A). Define
ν + (A) = sup{ν(B) : B ∈ A, B ⊆ A}, ν − (A) = − inf{ν(B) : B ∈ A, B ⊆ A}, A ∈ A.
Then ν + and ν − are measures on A, at least one of them is finite, and ν = ν + − ν − .
Proof. By definition, ν does not take both values ±∞. Without loss, assume that ν
does not take the value −∞. Let D be a set with the property described in Theorem 1.
Since ν(∅) = 0, we have −∞ < ν(D) ≤ 0. Take any set A ∈ A. Then
Observe that both terms on the right side are finite. Hence
Hence
≥ ν(B ∩ D), by (1)
≥ ν(B ∩ D) + ν((A \ B) ∩ D), by (1)
= ν(A ∩ D), by additivity.
ν − (A) ≤ −ν(A ∩ D)
≤ ν − (A) by definition of ν − .
Definition 3. The measures ν + and ν − are called the upper and lower variations of
ν. The measure |ν| := ν + + ν − is called the total variation of ν. A countably additive
set function ν on a σ-field is also called a signed measure.
Lecture 21
Absolute continuity of measures
Radon-Nikodym Theorem
January 12, 2021
Exercise 4. Suppose ν is a signed measure on A. Show that the total variation |ν| of ν is given by
|ν|(A) = sup{Σ_{i=1}^{n} |ν(Ei )| : {Ei } are disjoint measurable subsets of A}.
Exercise 5. If ν1 and ν2 are signed measures, show that |ν1 + ν2 | ≤ |ν1 | + |ν2 |.
Suppose f is a non-negative measurable function on (Ω, A, µ). Recall that its indefinite
integral ν is then also a measure. We may visualise the relation between µ, f and ν as
dν = f dµ. In other words the “derivative of ν with respect to µ equals f ”. We now
make this and related ideas precise.
Definition 1. Suppose µ is a measure and ν is a signed measure on (Ω, A). Then we say that ν is absolutely continuous with respect to µ if for every A ∈ A, µ(A) = 0 implies ν(A) = 0. We write ν << µ.
Example 1. Suppose f is a Borel measurable function on (Ω, A, µ) such that ∫ f dµ exists. If ν is the indefinite integral of f then ν << µ.
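As a concrete illustration, take (Ω, A, µ) = (R, B(R), λ) and f (x) = e^{−|x|} . Then ν(A) = ∫A e^{−|x|} dλ(x) defines a finite measure with ν << λ, and (anticipating the theorem below) dν/dλ = f .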
Theorem 1 (Radon-Nikodym theorem). Suppose µ is a σ-finite measure and ν is a signed measure on (Ω, A), and ν << µ. Then there exists a Borel measurable function f : Ω → R̄ such that ν is the indefinite integral of f . That is
ν(A) = ∫A f dµ for all A ∈ A. (2)
The function f is unique: if g is any other function which satisfies (2), then f = g
a.e.[µ].
Proof. Once the existence is proved, the uniqueness follows from Exercise 2 of Lecture
19. We prove the existence in a few steps.
Note that s ≤ ν(Ω) < ∞. Partially order S by declaring that f ≥ g if and only if f ≥ g a.e. [µ].
Let f, g ∈ S. Then h = max(f, g) ∈ S. This follows by taking B = {ω : f (ω) ≤ g(ω)},
and observing that for any A ∈ A,
∫A h dµ = ∫_{A∩B} g dµ + ∫_{A∩B c} f dµ
≤ ν(A ∩ B) + ν(A ∩ B c )
= ν(A).
We now identify a maximal element of S. Let {fn } be a sequence in S such that ∫ fn dµ → s. Let gn = max(f1 , . . . , fn ). Then gn ∈ S and gn is non-decreasing. Let g = lim gn . Then by MCT,
∫ g dµ = lim ∫ gn dµ ≥ lim ∫ fn dµ = s.
But we know that ∫ gn IA dµ ≤ ν(A) for all n. Hence g ∈ S. Now since ∫ g dµ = s, g is a maximal element of S. Now consider the set function
ν1 (A) = ν(A) − ∫ gIA dµ, A ∈ A.
If A ∈ A, then
∫A h dµ = (1/k) µ(A ∩ D)
≤ ν1 (A ∩ D) by (4)
≤ ν1 (A)
= ν(A) − ∫A g dµ.
This implies
∫A (h + g)dµ ≤ ν(A).
But then h + g > g on the set D with µ(D) > 0. This contradicts the maximality of g.
Thus ν1 is identically 0 and the theorem is proved in this special case.
Lecture 22
Radon-Nikodym Theorem, continued
January 12, 2021
The function f is unique: if g is any other function which satisfies (1), then f = g
a.e.[µ].
Proof of Radon-Nikodym theorem continued. Recall that in Step 1 we have already proved
the theorem when both µ and ν are finite measures.
Step 2. Assume that µ and ν are finite and σ-finite measures respectively. Let {Ωn } be
disjoint sets in A such that
∪_{n=1}^{∞} Ωn = Ω and ν(Ωn ) < ∞ for all n.
Define
νn (A) = ν(A ∩ Ωn ), A ∈ A.
Then it trivially follows that for every n, νn is a finite measure and νn << µ. Hence
by Step 1, for every n, there exists a non-negative measurable function gn such that νn is the indefinite integral of gn with respect to µ. Take g = Σ_{n=1}^{∞} gn . Then ν is the indefinite integral of g with respect to µ.
Step 3. Now suppose that µ and ν are finite and arbitrary measures respectively. For
any C ∈ A, define
AC = {A ∩ C : A ∈ A}.
Then AC is a σ-field of subsets of C.
Define the class of sets
C = {C ∈ A : ν restricted to AC is a σ-finite measure}.
s ≥ µ(C) ≥ µ(Cn ) → s,
we have µ(C) = s.
Now consider the measures µ and ν restricted to AC for the above choice of C. Since µ
and ν are finite and σ-finite on AC respectively, by Step 2 there exists a non-negative
function fC : C → R̄, which is Borel measurable with respect to AC , such that ν restricted to AC is the indefinite integral of fC with respect to µ. In other words,
ν(A ∩ C) = ∫_{A∩C} fC dµ for all A ∈ A.
Case 1. µ(A ∩ C c ) > 0. Then suppose if possible, ν(A ∩ C c ) < ∞. But this implies
C ∪ (A ∩ C c ) ∈ C and
It follows that
where
f (ω) = fC (ω) if ω ∈ C, and f (ω) = ∞ if ω ∈ C c . (2)
Step 4. Suppose µ and ν are σ-finite and arbitrary measures respectively. Let Ωn be disjoint sets in A such that Ω = ∪_{n=1}^{∞} Ωn and µ(Ωn ) < ∞ for all n. By Step 3, for every n, there exists gn : Ωn → R̄ which is Borel measurable with respect to AΩn such that
ν(A ∩ Ωn ) = ∫_{A∩Ωn} gn dµ, A ∈ A.
Extend gn to all of Ω by defining it to be 0 on Ωcn . Call this new function fn . Note that fn is measurable and
ν(A ∩ Ωn ) = ∫A fn dµ, A ∈ A.
Then
ν(A) = Σ_{n=1}^{∞} ν(A ∩ Ωn )
= Σ_{n=1}^{∞} ∫A fn dµ
= ∫A f dµ, where f = Σ_{n=1}^{∞} fn .
Step 5. Now assume that µ and ν are σ-finite and signed measures respectively. Write ν = ν + − ν − . Without loss, assume that ν − is a finite measure. By Step 4, there exist non-negative Borel measurable functions f1 and f2 such that ν + and ν − are the
indefinite integrals of f1 and f2 respectively with respect to µ. Since ν − is finite, f2 is
µ-integrable. Hence
Remark 1. The following facts follow from the above theorem and its proof. Suppose
ν << µ where ν is a signed measure and µ is σ-finite. The proofs are left as exercises.
(i) If ν is a finite measure then dν/dµ is µ-integrable and hence is finite a.e. µ.
Exercise 2. Give an example to show that the condition µ is σ-finite cannot be dropped
from the Radon-Nikodym theorem.
Exercise 3. Suppose ν is a finite signed measure and µ is a measure. Show that ν << µ
if and only if given any ε > 0 there exists a δ > 0 such that for any A ∈ A, µ(A) < δ implies |ν(A)| < ε.
Exercise 5. Suppose ν1 and ν2 are signed measures which are both absolutely continuous
with respect to a σ-finite measure µ and the signed measure ν1 + ν2 is well-defined. Show
that (ν1 + ν2 ) << µ and d(ν1 + ν2 )/dµ = (dν1 /dµ) + (dν2 /dµ).
Exercise 6. Suppose µ1 , µ2 , and µ3 are three σ-finite measures such that µ1 << µ2 and
µ2 << µ3 . Show that µ1 << µ3 and dµ1 /dµ3 = (dµ1 /dµ2 )(dµ2 /dµ3 ).
Lecture 23
Singularity of measures
January 19, 2021
(c) Suppose A is such that µ(A) = 0 and |λ2 |(Ac ) = 0. But since λ1 << µ, by (b) we
know that |λ1 | << µ. Hence |λ1 |(A) = 0. That is λ1 ⊥ λ2 .
(d) By (c) λ1 ⊥ λ1 . Hence there exists a set A such that |λ1 |(A) = |λ1 |(Ac ) = 0. Hence
|λ1 |(Ω) = 0.
(e) Suppose λ1 << µ. Suppose if possible the condition does not hold. Then there exist ε > 0 and sets An ∈ A such that
µ(An ) < 2−n , |λ1 |(An ) ≥ ε.
Let A = lim sup An . Then by the Borel-Cantelli Lemma, µ(A) = 0. But |λ1 |(∪_{k=n}^{∞} Ak ) ≥ |λ1 |(An ) ≥ ε. Hence |λ1 |(A) = lim_{n→∞} |λ1 |(∪_{k=n}^{∞} Ak ) ≥ ε. This is a contradiction.
Lecture 24
Lebesgue decomposition theorem
January 19, 2021
Then ν1 << µ. To see this, suppose µ(B) = 0. Then B ∈ C. Hence ν1 (B) = ν(B − C) =
0. Now note that µ(C) = 0 and ν2 (C c ) = ν(C c ∩ C) = ν(∅) = 0. So ν2 ⊥ µ. Finally,
ν = ν1 + ν2 . Uniqueness follows by using Lemma 1 (d) of Lecture 23.
(b) Now suppose ν is a σ-finite measure. Suppose {Ai } is a disjoint partition of Ω such
that ν(An ) < ∞ for every n. Define
νn (A) = ν(A ∩ An ), A ∈ A.
(c) Suppose ν is a σ-finite signed measure. Then use Jordan-Hahn decomposition and
apply (b). Uniqueness follows by using the uniqueness proved in (c).
1
Lecture 25
Absolutely continuous functions
Functions of bounded variation
January 19, 2021
Exercise 1. (a) Show that in the above definition it does not make any difference if we
allow countable partitions instead of finite partitions.
(b) Give examples of functions which are continuous but not absolutely continuous.
(c) If f and g are absolutely continuous, show that f − g is also absolutely continuous.
Theorem 1. Suppose F and G are distribution functions on [a, b] with finite Lebesgue-
Stieltjes measures µ_1 and µ_2 respectively. Let f = F − G and µ = µ_1 − µ_2. Then µ << λ
if and only if f is absolutely continuous. Here λ is the Lebesgue measure on [a, b].
Proof. (a) Suppose first that µ << λ. Fix ε > 0. Then by Lemma 2 (b) and (e) of
Lecture 23, there exists δ > 0 such that λ(A) < δ implies |µ|(A) < ε. Consider any
collection of disjoint sub-intervals (a_i, b_i), i = 1, . . . , n, of [a, b], with total length less
than δ. Let A = ∪_{i=1}^n (a_i, b_i]. Then λ(A) < δ. Hence
Σ_{i=1}^n |f(b_i) − f(a_i)| = Σ_{i=1}^n |µ(a_i, b_i]|   (by definition of µ)
   ≤ Σ_{i=1}^n |µ|(a_i, b_i]   (since |µ(A)| ≤ |µ|(A) for any A ∈ A)
   ≤ |µ|(A)   (since µ{b_i} = 0 for all i)
   < ε.
(b) Now suppose f is absolutely continuous. Hence f is continuous. Thus for any b, µ{b} = 0.
Now fix ε > 0. Choose δ > 0 as in the definition of absolute continuity. We have to
show that λ(A) = 0 implies µ(A) = 0. Recall Theorem 1 on approximation of Lebesgue-
Stieltjes measure from Lecture 13. By that result, the measures λ, µ_1 and µ_2 of any Borel
set can be approximated from outside by the measures of open sets containing it.
Note that the above relations in Theorem 1 were proved for measures on R. But one
can extend the measures µi to measures on R in the obvious way and deduce the above
for Lebesgue-Stieltjes measures on [a, b].
Recall that finite intersection of open sets is open. So, from above, we can get {Vn } open
such that Vn ⊇ A and both λ(Vn ) → λ(A) = 0 and µ(Vn ) → µ(A) hold.
Choose n large enough such that for all k ≥ n, λ(V_k) < δ. Now since V_n is open, it is a
disjoint union of the form V_n = ∪_{i=1}^∞ (a_i, b_i). Then
|µ(V_n)| = |Σ_{i=1}^∞ µ(a_i, b_i)|
   ≤ Σ_{i=1}^∞ |µ(a_i, b_i)|
   = Σ_{i=1}^∞ |µ(a_i, b_i]|   (since µ{b_i} = 0)
   = Σ_{i=1}^∞ |f(b_i) − f(a_i)|
   ≤ ε.
Then Vf [a, b] =: supπ Vπ,f [a, b], where the supremum is over all partitions π, is called
the variation of f over [a, b]. The function f is said to be of bounded variation if
Vf [a, b] < ∞.
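The supremum over partitions defining V_f[a, b] can be approximated numerically by refining a partition; for a continuously differentiable f it converges to ∫_a^b |f′(x)| dx. A small Python sketch (f = sin on [0, 2π] is just an illustrative choice, for which V_f = 4):

import math

def variation(f, a, b, n):
    # V_{pi,f}[a,b] for the uniform partition of [a,b] into n subintervals
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(f(xs[i + 1]) - f(xs[i])) for i in range(n))

f = math.sin
for n in [10, 100, 1000]:
    print(n, variation(f, 0.0, 2 * math.pi, n))   # increases towards V_f[0, 2*pi] = 4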
2
Exercise 3. (a) For any interval [a, b] and any a < c < b show that
Vf [a, b] = Vf [a, c] + Vf [c, b].
(b) If F is a monotone function on [a, b] then F is of bounded variation and Vf [a, b] =
|F (b) − F (a)|.
(a) Then there exists non-decreasing functions F and G on [a, b] such that f = F − G.
3
Taking supremum successively over all possible π_1, . . . , π_n, we get
Σ_{i=1}^n V_f[a_i, b_i] ≤ ε.
Proof. (a) First suppose that f is absolutely continuous. Then by Lemma 2 (b), f = F − G
where F and G are non-decreasing absolutely continuous functions. It is enough to prove
the result assuming that G = 0. Let µ be the Lebesgue-Stieltjes measure corresponding
to F. Then µ << λ by Theorem 1. By the Radon-Nikodym theorem, there is a λ-integrable
function g such that µ(A) = ∫_A g dλ for all Borel subsets A of [a, b]. Take A = [a, x] to
get (1).
4
Exercise 4. Suppose g is Lebesgue integrable on R. Define
f(x) = ∫_{(−∞, x]} g dλ, x ∈ R.
Show that f is absolutely continuous and hence is continuous on R. Hint: Use DCT or
apply the above theorem.
5
Lecture 26
Continuous singular distribution function/measure
Almost everywhere differentiable functions
Cantor set
Cantor function
February 17, 2021
Recall that if µ is a measure on the Borel sets of R which is finite on bounded sets then
it is a Lebesgue-Stieltjes measure.
Definition 1 (Differentiability of measure). Suppose µ is a signed measure on the Borel
sets of R which is finite on every bounded set. Then the upper and lower derivatives of
µ at x is defined as
D_u(x) = lim sup_{r→0, I_r} µ(I_r)/λ(I_r),   D_l(x) = lim inf_{r→0, I_r} µ(I_r)/λ(I_r),
where the supremum and infimum are taken over all intervals that include x and are of
lengths less or equal to r. We say that µ is differentiable at x if Du (x) = Dl (x) and
we write the common value as Dµ(x).
Exercise 2. Suppose µ is a discrete signed measure. Show that Dµ(x) = 0 a.e λ.
We now state three results but we will not prove them. Proofs can be found in the book.
Theorem 1. Suppose µ is a signed measure on R that is finite on bounded sets. Let
µ = µ_1 + µ_2 where µ_1 << λ and µ_2 ⊥ λ. Then Dµ = dµ_1/dλ a.e. λ.
Theorem 2. Suppose f : [a, b] → R is a non-decreasing function. Then f is differentiable
a.e. λ and Dµ = f′ a.e. λ, where µ is the Lebesgue-Stieltjes measure of f. In particular, a
function of bounded variation is differentiable a.e. λ.
1
Theorem 3. (a) Suppose f is absolutely continuous on [a, b] with
Z x
f (x) − f (a) = g(t)dt, a ≤ x ≤ b
a
(a) C is uncountable.
(b) λ(C) = 0.
(c) C is a closed set (in the usual topology).
(d) Every point in C is a limit point.
(e) C is nowhere dense.
Exercise 4 (The Cantor function). In the above exercise, note that the removed set
∪_{i=1}^n E_i consists of 2^n − 1 disjoint intervals. Let A_1, . . . , A_{2^n−1} be their enumeration in
increasing order. Define F_n : [0, 1] → [0, 1] by
F_n(x) = 0 if x = 0;  k/2^n if x ∈ A_k, k = 1, 2, . . . , 2^n − 1;  1 if x = 1.
(d) F 0 = 0 a.e. λ.
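A numerical illustration of the limit Cantor function F: the Python sketch below computes F via the usual ternary-digit description (an equivalent construction, not a transcription of the F_n above) and shows that F is constant on the removed middle-third intervals, consistent with F′ = 0 a.e. λ.

def cantor(x, depth=40):
    # Standard Cantor function via the ternary digits of x in [0, 1]:
    # digits 0/2 contribute binary digits 0/1; the first digit 1 ends the scan.
    if x >= 1.0:
        return 1.0
    y, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        d = int(x)
        x -= d
        if d == 1:
            return y + scale
        y += scale * (d // 2)
        scale /= 2
    return y

# F is constant (equal to 1/2) on the removed interval (1/3, 2/3):
print(cantor(0.4), cantor(0.5), cantor(0.6))    # all 0.5
print(cantor(0.0), cantor(1.0), cantor(0.25))   # 0.0, 1.0, and F(1/4) = 1/3 approximately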
2
Lecture 27
Complex-valued functions
Convex function
Lp spaces
Hölder Inequality
Minkowski Inequality
February 3, 2021
L^p = L^p(Ω, A, µ) = {f : f is complex-valued Borel measurable such that ∫ |f|^p dµ < ∞},
||f||_p = (∫ |f|^p dµ)^{1/p}, f ∈ L^p.
Proof. We shall skip the detailed proof. It starts by writing f (ω) = r(ω)eιθ(ω) .
R
Exercise
R 2. Suppose f is a complex-valued µ-integrable function. Show that | Ω f dµ =
Ω |f |dµ if and only if f is a.e. constant on the set {ω : f (ω) 6= 0}.
Proof. (a) Take logs and use the fact that − log is a convex function. For (b), let
α = 1/p, β = 1/q, a = cp , b = dq and apply (a).
1
Theorem 1 (Hölder Inequality). Suppose p, q > 1 are such that 1/p + 1/q = 1. If f ∈ L^p
and g ∈ L^q, then
||fg||_1 ≤ ||f||_p ||g||_q.
Proof. If ||f ||p ||g||q = 0 then one of f and g is 0 a.e. µ. Then the inequality is trivial.
So assume that ||f||_p ||g||_q ≠ 0. In Lemma 2 (b) let
c = |f|/||f||_p,  d = |g|/||g||_q.
Then
∫ |fg|/(||f||_p ||g||_q) dµ ≤ ∫ [ |f|^p/(p ||f||_p^p) + |g|^q/(q ||g||_q^q) ] dµ = 1/p + 1/q = 1.
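Hölder's inequality is easy to test numerically on a finite measure space, where the integrals become weighted sums. The Python sketch below uses arbitrary positive weights and functions (illustrative choices only) with the conjugate exponents p = 3 and q = 3/2.

import numpy as np

rng = np.random.default_rng(0)
# Finite measure space: Omega = {0,...,9} with random positive weights (an illustrative choice).
mu = rng.uniform(0.1, 1.0, size=10)
f = rng.normal(size=10)
g = rng.normal(size=10)
p, q = 3.0, 1.5                        # conjugate exponents: 1/3 + 2/3 = 1

def norm(h, r):
    # ||h||_r = ( sum |h|^r * mu )^(1/r) on the finite space
    return (np.sum(np.abs(h) ** r * mu)) ** (1.0 / r)

lhs = np.sum(np.abs(f * g) * mu)       # ||fg||_1
rhs = norm(f, p) * norm(g, q)          # ||f||_p ||g||_q
print(lhs, "<=", rhs, lhs <= rhs + 1e-12)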
Exercise 3. Show that the Cauchy-Schwartz inequality follows from Hölder Inequality.
Exercise 4. Show that equality holds in Hölder inequality if and only if A|f |p = B|g|q
a.e. µ for some constants A and B, not both zero.
Exercise 5. Suppose (Ω, A, µ) is a finite measure space. Suppose 0 < r < s < ∞.
(b) As a consequence of (a), show that Ls ⊂ Lr . Show that this is not true if we do not
assume µ is a finite measure.
Lemma 3. If a, b ≥ 0 and p ≥ 1, then (a + b)p ≤ 2p−1 (ap + bp ).
2
Exercise 6. Show that the above lemma is not true if p < 1.
Theorem 2 (Minkowski Inequality). If f, g ∈ Lp for some p ≥ 1, then
||f + g||p ≤ ||f ||p + ||g||p .
Proof. We shall use Hölder inequality to prove this. Clearly it is true for p = 1 and so
we assume that p > 1. By Lemma 3,
|f + g|^p ≤ (|f| + |g|)^p ≤ 2^{p−1}(|f|^p + |g|^p).
Hence f + g ∈ L^p. Choose q = p/(p − 1). Note that q > 1 and 1/p + 1/q = 1. Then
|f + g|^p = |f + g| |f + g|^{p−1} ≤ |f| |f + g|^{p−1} + |g| |f + g|^{p−1}.   (1)
Clearly
∫ (|f + g|^{p−1})^q dµ = ∫ |f + g|^p dµ < ∞.
3
If || · || is a norm on a vector space V , then d(v1 , v2 ) = ||v1 − v2 || defines a metric on V .
Note that Lp is a vector space with the usual operations. By Minkowski Inequality,
|| · ||p is a semi-norm on Lp . Note that if f = g a.e. µ, then ||f ||p = ||g||p . Thus if
we identify two functions that are almost everywhere equal to be equal (in other words
we are considering the equivalence classes of the relation f = g a.e. µ) then for p ≥ 1,
|| · ||p becomes a norm on Lp and hence it is also a metric space. It is NOT a norm for
0 < p < 1. Nevertheless, we shall soon see that Lp is a complete metric space for all
0 < p < ∞. We shall also give meaning to L∞ and it will also be a complete metric
space.
4
Lecture 28
Markov Inequality
Chebyshev Inequality
February 3, 2021
Inequalities for integrals with respect to general measures and probability measures are
very important. We have seen two such inequalities already. The following inequality is
easy to prove but is of prime importance in statistics and probability where it is used
for a probability measure. It appears in different forms in different areas.
Note that g(ω) ≤ h(ω) and the result follows from this by integrating.
The quantity m = ∫_Ω f dP is called the mean of f whenever the integral exists. Note that the mean can be equal
to ±∞.
Suppose that the mean is finite. Then the variance of f is defined as
σ^2 = ∫_Ω (f − m)^2 dP.
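Chebyshev's inequality in this notation reads P{|f − m| ≥ t} ≤ σ^2/t^2. A quick Monte Carlo sketch in Python (the exponential distribution below is just an illustrative choice of P and f, and the sample size is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)    # samples of f under an illustrative P
m, var = x.mean(), x.var()
for t in [1.0, 2.0, 4.0]:
    empirical = np.mean(np.abs(x - m) >= t)     # estimate of P{|f - m| >= t}
    bound = var / t ** 2                        # Chebyshev bound sigma^2 / t^2
    print(t, empirical, "<=", bound)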
1
Lecture 29
Lp spaces
February 17, 2021
Note that we have proved that for p ≥ 1, Lp =: Lp (Ω, A, µ) is a vector space with the
norm || · ||p .
We now investigate the space Lp for 0 < p < 1. As before we identify two functions to
be equal if they are equal a.e.
Exercise 1. Fix 0 < p < 1. Show that for any a, b ≥ 0, (a + b)p ≤ ap + bp .
Theorem 1 (Lp is a metric space for 0 < p < 1). Fix 0 < p < 1. Then:
(b) For a, b > 0, we have (a + b)^{1/p} > a^{1/p} + b^{1/p} whenever 0 < p < 1. Thus the norm
condition is violated.
So, to summarize, for p ≥ 1, L^p is a normed space with the norm ||·||_p and the metric
d(f, g) = [∫_Ω |f − g|^p dµ]^{1/p}. For 0 < p < 1, ||·||_p is not a norm, but d(f, g) = ∫_Ω |f − g|^p dµ
is a metric. In either case, for any 0 < p < ∞, {f_n ∈ L^p} converges to f ∈ L^p in this
metric/norm if and only if ∫_Ω |f_n − f|^p dµ → 0.
Definition 1. Suppose {f_n ∈ L^p} and f ∈ L^p are such that ∫_Ω |f_n − f|^p dµ → 0. Then we
say f_n converges to f in L^p and we write f_n → f in L^p.
Exercise 2. Suppose {fn } is a sequence of Borel measurable functions dominated by
g ∈ Lp for some p > 0. Suppose fn → f a.e. µ. Show that
Proof. Without loss of generality we can assume that f is a real-valued function. Recall
the construction of simple functions {sn } given in Lemma 2 (b) of Lecture 16. These
1
sn satisfy (i) sn → f a.e. µ and (ii) |sn | ≤ |f | for all n. Then by Exercise 2 (b),
||sn − f ||p → 0.
We shall now prove that the space Lp is complete. We need the following lemma.
Lemma 1. Suppose {fn } ∈ Lp for some p > 0 and ||fk − fk+1 ||p < 4−k for all k ≥ 1.
Then {fn } converges a.e. µ.
Proof. Define
Ak = {ω : |fk (ω) − fk+1 (ω)| ≥ 2−k }.
Then by Markov’s inequality,
Theorem 3 (Lp is complete for 0 < p < ∞). Suppose {fn } is Cauchy in Lp for some
0 < p < ∞. Then there is an f such that ||fn − f ||p → 0 as n → ∞.
Proof. (a) First asssume that p ≥ 1. Since {fn } is Cauchy in Lp , we can choose increasing
integers {nk } such that
1
||fn − fm ||p < for all m, n ≥ nk .
4k
Let gk = fnk , k ≥ 1. Then by Lemma 1, gk converges a.e. µ to some f .
We shall now show that f ∈ Lp and ||fn − f ||p → 0. Fix > 0. Since {fn } is Cauchy in
Lp , choose N such that ||fn − fm ||pp < for all n, m ≥ N . Fix n ≥ N . and let m → ∞
through the subsequence {nk }. Then
Now let n → ∞. Then lim supn→∞ ||fn − f ||p ≤ . Since was arbitrary, this proves
Lp
that fn − f → 0. We have not yet shown that f ∈ Lp . But f = (f − fn ) + fn and hence
f ∈ Lp by Minkowski Inequality.
2
(b) Now assume that 0 < p < 1. Then by the above proof (we had used p ≥ 1 only in the
Lp
final step to show that f ∈ Lp ) fn − f → 0. Remains to show that f ∈ Lp . We cannot
use Minkowski inequality. But the conclusion follows by writing f = (f − fn ) + fn and
then using Exercise (1).
Proof. By Theorem 2, it suffices to show that we can approximate in ||·||_p any indicator
function I_A ∈ L^p by a continuous function. Note that I_A ∈ L^p implies µ(A) < ∞. Hence
by Theorem 1 of Lecture 13, there exist C closed and V open such that C ⊂ A ⊂ V
and µ(V − C) < ε^p 2^{−p}. Using Urysohn's lemma^1, there exists a continuous function
g : Ω → [0, 1] such that g = 1 on C and g = 0 on V^c. Then
∫_Ω |I_A − g|^p dµ = ∫_{{ω: I_A(ω) ≠ g(ω)}} |I_A − g|^p dµ
   ≤ ∫_{{ω: I_A(ω) ≠ g(ω)}} 2^p dµ
   ≤ 2^p µ(V − C)
   < ε^p.
1
Urysohn’s Lemma: Suppose Ω is a metric space and K is a closed set in Ω and U is an open
neighbourhood of K. Then there exists a continuous function f : Ω → [0, 1] such that IK (ω) ≤ f (ω) ≤
IU (ω) for all ω ∈ Ω
3
Lecture 30
L∞
lp , 0 ≤ p ≤ ∞
February 3, 2021
Note that ess sup f is the smallest number c such that f ≤ c a.e. µ.
Example 1. (a) If f is the constant function −1, then ess sup f = −1.
(b) If f is a simple function f = Σ_{i=1}^n c_i I_{A_i}, then ess sup f = sup{c_i : µ(A_i) > 0}.
Note that ||f||_∞ is the smallest number c such that |f| ≤ c a.e. µ. So f ∈ L^∞ if and
only if f is essentially bounded, that is, it is bounded outside a set of measure 0.
Exercise 1. Show that Hölder inequality holds for p = 1, q = ∞ and Minkowski inequal-
ity holds for p = ∞.
Theorem 1. (a) L∞ is a vector space and || · ||∞ is a norm on it.
(b) Suppose {fn ∈ L∞ }. Then ||fn − f ||∞ → 0 if and only if there exists a set A ∈ A
such that µ(A) = 0 and fn → f uniformly on Ac .
(b) Suppose ||f_n − f||_∞ → 0. For any given positive integer m, ||f_n − f||_∞ ≤ 1/m
for all large n. Hence there exists A_m such that µ(A_m) = 0 and for all ω ∉ A_m,
|f_n − f| ≤ 1/m for all large n. Let A = ∪_{m=1}^∞ A_m. Then µ(A) = 0 and f_n → f uniformly on A^c.
1
Exercise 2. (a) Show that finite valued simple functions are dense in L∞ .
(b) Give an example to show that continuous functions g ∈ L∞ are not dense in L∞ when
we consider a measure space (Rn , B(Rn ), µ) where µ is a Lebesgue-Stieltjes measure.
Exercise 3. Suppose µ is a probability measure. Show that ||f ||p → ||f ||∞ as p → ∞.
Give an example to show that the result fails if µ is not a probability measure.
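A quick numerical illustration of Exercise 3: on the probability space ([0, 1], Lebesgue) with f(x) = x (an illustrative choice), ||f||_p = (1/(p+1))^{1/p} increases to ||f||_∞ = 1 as p → ∞. The Python sketch approximates the integral by a Riemann sum over a fine grid.

import numpy as np

x = np.linspace(0.0, 1.0, 200_001)
f = x                                            # illustrative f with ess sup f = 1
for p in [1, 2, 5, 20, 100, 500]:
    norm_p = (np.mean(np.abs(f) ** p)) ** (1.0 / p)   # Riemann-sum approximation of (∫|f|^p dλ)^{1/p}
    print(p, norm_p)                             # increases towards ||f||_infty = 1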
Exercise 4. Suppose µ is a finite measure. Show that if f_n → f in L^∞ then f_n → f in L^p for all
0 < p < ∞. Show that the result is not true if µ is not a finite measure.
2
Lecture 31
Convergence in measure
Almost uniform convergence
Egoroff’s Theorem
February 3, 2021
Check that fn → 0 uniformly on R and hence in L∞ and also a.e. Check that fn → 0 in
measure. Check that fn does not converge to 0 in Lp for any 0 < p < ∞.
Definition 3. A sequence of complex-valued Borel measurable functions {f_n} is said
to converge almost uniformly to f if given ε > 0, there exists a set A ∈ A such that
µ(A) < ε and f_n → f uniformly on A^c.
Exercise 4. Consider the Lebesgue measure on [0, ∞). Let
f_n(x) = 1 if n ≤ x ≤ n + 1/n, and 0 otherwise.
Check that fn → 0 a.e., in measure and in Lp , 0 < p < ∞. Show that fn does not
converge almost uniformly.
Exercise 5. Consider the Lebesgue measure on [0, ∞). Let
f_n(x) = 1 if n ≤ x ≤ n + 1, and 0 otherwise.
Theorem 1. (a) If f_n → f almost uniformly, then f_n → f in measure and f_n → f a.e.
(b) If f_n → f in measure, then there is a sub-sequence {n_k} such that f_{n_k} → f almost uniformly.
Proof. (a) Fix > 0. Let A be such that µ(A) < and fn → f uniformly on Ac . Fix
δ > 0. Then for all large n, |fn ω) − f (ω)| < δ for all ω ∈ Ac . Therefore for such n
{ω : |fn (ω) − f (ω)| > δ} ⊆ A. This implies that for all large n,
µ(B c ) = µ(∩∞
k=1 Ak ) ≤ µ(Ak ) → 0 as k → ∞.
{ω : |f_n(ω) − f_m(ω)| ≥ ε} ⊆ {ω : |f_n(ω) − f(ω)| ≥ ε/2} ∪ {ω : |f_m(ω) − f(ω)| ≥ ε/2}.
Hence as n, m → ∞,
µ{ω : |f_n(ω) − f_m(ω)| ≥ ε} ≤ µ{ω : |f_n(ω) − f(ω)| ≥ ε/2} + µ{ω : |f_m(ω) − f(ω)| ≥ ε/2} → 0.
This proves that fn is Cauchy in measure. Now for k = 1, 2, . . ., choose {nk } strictly
increasing such that
Then clearly, µ(A_k) ≤ 2^{−k} and hence by the Borel-Cantelli lemma, µ(A) = 0. But for all
ω ∉ A, ω ∈ A_k for only finitely many k and as a consequence, {g_k(ω)} is Cauchy. So for
all ω ∉ A, g_k(ω) → g(ω) for some g. Since µ(A) = 0, g_k → g a.e. But this g must equal
f a.e. since g_k → f in measure. Thus g_k → f a.e.
Now we prove almost uniform convergence. Fix ε > 0. Let B_r = ∪_{k=r}^∞ A_k. Then
µ(B_r) < ε for all large r. If ω ∉ B_r, then by (1), |g_k(ω) − g_{k+1}(ω)| < 2^{−k} for all k ≥ r.
By the Weierstrass M-test, g_k converges uniformly on B_r^c. Again, the limit must be equal to
f a.e.
Exercise 6. Show that {fn } is Cauchy in measure if and only if fn converges in measure.
Hint. Borrow from the above proof.
2
Lemma 1. Suppose µ is a finite measure. Then f_n → f a.e. if and only if for every
δ > 0,
lim_{n→∞} µ(∪_{k=n}^∞ {ω : |f_k(ω) − f(ω)| ≥ δ}) = 0.
Proof. Let
Bn,δ = {ω : |fn (ω) − f (ω)| ≥ δ}, Bδ = lim sup Bn,δ .
Then µ(∪∞
k=n Bk,δ ) ↓ µ(Bδ ) as n → ∞.
{ω : fn (ω) 9 f (ω)} = ∪∞
δ>0 Bδ
= ∪∞
m=1 B1/m .
Hence fn → f a.e. if and only if µ(Bδ ) = 0 for all δ > 0 if and only if µ(∪∞
k=n Bk,δ ) → 0
for all δ > 0.
Exercise 7. If µ is finite, show that {fn } is Cauchy a.e. if and only if, for all δ > 0,
lim µ ∪∞
m,n→∞ j,k=n {ω : |fk (ω) − fj (ω)| ≥ δ} = 0.
Proof. Fix ε > 0 and integer j > 1. From the above lemma, for sufficiently large
n = n(j),
µ(A_j) = µ(∪_{k=n(j)}^∞ {|f_k − f| ≥ 1/j}) ≤ ε/2^j.
Let A = ∪_{j=1}^∞ A_j. Then µ(A) < ε. For δ > 0, choose j > 1/δ. Then for any k ≥ n(j)
and ω ∈ A^c,
|f_k(ω) − f(ω)| < 1/j < δ.
Thus f_n → f uniformly on A^c.
d(f, g) = ∫_Ω |f − g| / (1 + |f − g|) dµ.
Show that
(a) d(f_n, f) → 0 implies f_n → f in measure.
(b) If µ is finite then f_n → f in measure implies d(f_n, f) → 0. Hence convergence in measure is
metrizable when µ is finite.
(c) Give an example to show that (b) is not true when µ is not finite.
4
Lecture 32
Product σ-field
Transition measure/probability
February 4, 2021
Note that the product σ-field is not the product of the σ-fields, even though the notation
suggests so. Some authors use the notation A1 ⊗ · · · ⊗ An to denote the product σ-field.
The Borel σ-field on Rn is the n-fold product σ-field of the Borel σ-field on R.
Exercise 1. (a) Show that the collection of all measurable rectangles may not always be
a field.
(b) Show that the collection of all finite disjoint unions of measurable rectangles is always
a field but may not be a σ-field.
(c) Show that (A1 ⊗ · · · ⊗ An−1 ) ⊗ An = A1 ⊗ · · · ⊗ An .
These sets are called sections of A. Note that sections can be empty.
(b) Show that A(ω2 ) ∈ A1 and A(ω1 ) ∈ A2 for all ω1 and ω2 . Hint: Consider the class
of sets for which the relations are true and show that the class contains all measurable
rectangles and is a σ-field.
(c) If A and B are disjoint in A then show that their sections are also disjoint.
1
Definition 2 (Transition measure). Suppose (Ω_i, A_i), i = 1, 2 are two measurable
spaces. Suppose µ_2 : Ω_1 × A_2 → [0, ∞] is a function such that for every fixed ω_1 ∈ Ω_1,
µ_2(ω_1, ·) is a measure on A_2, and for every fixed A_2 ∈ A_2, µ_2(·, A_2) is a measurable
function on (Ω_1, A_1).
Exercise 3. Here is a toy example. Suppose I have two boxes B1 and B2 . You may
think of them as two values of ω1 . Suppose Box 1 has 3 red balls and 3 white balls. Box
2 has 3 reds, 2 white and 5 black balls. I pick a box by some mechanism (this mechanism
could also be a probability measure on the two values). Once I pick a box, I pick one ball
“at random” from that box (that is, all balls in that box have equal chance to come into
my sample). That is, there are two probability measures on {red, black, white} depending
on which box is chosen. These are the two distinct transition probability measures in this example.
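A small Python sketch of this toy example. The box-picking mechanism is taken to be a 50–50 split (an assumption; the notes leave it unspecified), while the transition probabilities come from the stated box contents; the marginal colour probabilities integrate the transition probability against the box-picking measure.

# mu1: measure on Omega1 = {box1, box2}; mu2(box, .): transition probability on the colours.
mu1 = {"box1": 0.5, "box2": 0.5}                               # assumed box-picking probabilities
mu2 = {
    "box1": {"red": 3 / 6, "white": 3 / 6, "black": 0 / 6},    # 3 red, 3 white
    "box2": {"red": 3 / 10, "white": 2 / 10, "black": 5 / 10}, # 3 red, 2 white, 5 black
}
colours = ["red", "white", "black"]
# Combined measure: mu({box} x {colour}) = mu1(box) * mu2(box, colour);
# the marginal of a colour integrates the transition probability over the boxes.
marginal = {c: sum(mu1[b] * mu2[b][c] for b in mu1) for c in colours}
print(marginal)                # e.g. P(red) = 0.5*0.5 + 0.5*0.3 = 0.4
print(sum(marginal.values()))  # 1.0, so the combined measure is a probability measure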
Before presenting the next theorem, a word about notation. So far we have used dµ
to denote integration with respect to the measure µ. Henceforth we will also use dµ(ω)
for this purpose. This will be convenient when dealing with iterated integrals where one or
more variables are held fixed.
Further µ is σ-finite. If µ1 and the transition measures are all probability measures, then
µ is also a probability measure.
2
Lecture 33
Combining transition measure/probability
Product measure
February 4, 2021
Further µ is σ-finite. If µ1 and the transition measures µ2 (ω1 , ·) are all probabilities,
then µ is also a probability.
(ii) We show that f (ω1 ) =: µ2 (ω1 , A(ω1 )) is a measurable function for all A ∈ A. To
show this, let C be the class of sets in A for which this holds. Suppose A = A1 × A2 ∈ A.
Then
µ_2(ω_1, A(ω_1)) = µ_2(ω_1, A_2) if ω_1 ∈ A_1, and µ_2(ω_1, A(ω_1)) = µ_2(ω_1, ∅) = 0 if ω_1 ∉ A_1.
Thus µ2 (ω1 , A(ω1 )) = µ2 (ω1 , A2 )IA1 (ω1 ) which is measurable by assumption. Thus C
contains all measurable rectangles. It is easy to see that C contains the field of all finite
disjoint union of rectangles. [Use the fact that finite sums of measurable functions is
measurable]. Then it is easy to see that C is a monotone class. [We use the finiteness
of the transition measures and Exercise 2(c) of Lecture 32.] Thus by monotone class
theorem C = A.
(iii) Define
µ(A) = ∫_{Ω_1} µ_2(ω_1, A(ω_1)) dµ_1(ω_1), A ∈ A.
Note that for all A ∈ A, the integrand is non-negative, and is a measurable function by
(ii). Hence the integral is defined and satisfies (2). In particular, taking A = A1 × A2 ,
it is easily seen that (1) holds. We now prove that this µ is a measure. Suppose {An } is
a sequence of disjoint sets in A. Then
µ(∪_{n=1}^∞ A_n) = ∫_{Ω_1} µ_2(ω_1, (∪_{n=1}^∞ A_n)(ω_1)) dµ_1(ω_1)
   = ∫_{Ω_1} Σ_{n=1}^∞ µ_2(ω_1, A_n(ω_1)) dµ_1(ω_1)   (since µ_2 is a measure)
   = Σ_{n=1}^∞ ∫_{Ω_1} µ_2(ω_1, A_n(ω_1)) dµ_1(ω_1)
   = Σ_{n=1}^∞ µ(A_n).
Now assume that µ2 (ω1 , ·) is uniformly σ-finite. Then follow the usual route. Split up
Ω2 by the uniform σ-finite condition into {Bn } and consider the restricted transition
measures. We omit the details.
Uniqueness: Suppose if possible ν is another measure on A for which (1) holds. Then
ν = µ on the field of finite disjoint unions of measurable rectangles. On the other hand,
µ is σ-finite. [Since µ1 is σ-finite, split up Ω1 into {Am } and then consider {Am × Bn }].
Thus µ and ν agree on the field and are both σ-finite. So by Caratheodory theorem they
must be equal on A.
Finally, it is easy to see that if µ1 and the transition measures µ2 (ω1 , ·) are all probability
measures, then µ is indeed a probability measure.
(a) µ(A) = 0.
2
Lecture 35
Infinite products
Product probability measure
Kolmogorov extension theorem
February 10, 2021
for all B n ∈ A1 ⊗ · · · ⊗ An .
We shall now see how to define a product measure on infinite product spaces. However,
we shall restrict our discussion to only probability measures.
For sets Ω_j, j ≥ 1, let
Ω = Π_{j=1}^∞ Ω_j = {(ω_1, ω_2, . . .) : ω_j ∈ Ω_j for all j ≥ 1}.
Definition 1 (Measurable rectangles and cylinders). Suppose (Ωj , Aj ) are measurable
spaces. For any B n ⊆ Ω1 × · · · × Ωn , the set
Bn = B n × Ωn+1 × Ωn+2 × · · ·
= {ω ∈ Ω : (ω1 , ω2 , . . . , ωn ) ∈ B n } ⊂ Ω
is called a cylinder with n-dimensional base B n . The set Bn is said to be measurable
cylinder if B n ∈ A1 ⊗ · · · ⊗ An . If B n = A1 × · · · × An then Bn is called a (finite
dimensional) rectangle, and a measurable rectangle if Ai ∈ Ai , 1 ≤ i ≤ n.
Exercise 2. Show that:
1
Definition 2 (Infinite product σ-field). The smallest σ-field Q generated by all mea-
surable cylinders is called the product σ-field. It is written as ∞ j=1 Aj . If all Aj are
equal to A, then we write the product σ-field as A . ∞
Exercise 4. Read the proof of the above theorem from the book. Look out for the steps
which will fail if the measures are not assumed to be probability measures.
P{ω ∈ Ω : ω_1 ∈ A_1, . . . , ω_n ∈ A_n} = Π_{j=1}^n P_j(A_j),
2
Lecture 36
Weak convergence of measures
February 17, 2021
Recall that for any set A in a metric space, ∂A denotes the boundary of A. This is
the set of all points x of the metric space such that any neighbourhood of x contains
elements from both A and A^c.
Definition 1. Suppose µ is a measure on a metric space. Then a Borel set A is said to
be a µ continuity set if µ(∂A) = 0.
The following theorem will lead to the correct concept of convergence of measures on a
metric space.
Definition 2. A function f on a metric space is said to be lower semi-continuous
(lsc) at x if lim inf y→x f (y) ≥ f (x). If f is lsc at every x, then we say that it is a lower
semi-continuous function. A function is upper semi continuous if −f is lsc.
Theorem 1 (Portmanteau theorem). Let µ, µ1 , µ2 , . . . be finite measures on the Borel
σ-field of a metric space Ω. Then the following are equivalent:
(a) ∫ f dµ_n → ∫ f dµ for all real bounded continuous functions f.
(b) lim inf ∫ f dµ_n ≥ ∫ f dµ for all real bounded lower semi-continuous functions f.
(b') lim sup ∫ f dµ_n ≤ ∫ f dµ for all real bounded upper semi-continuous functions f.
(c) ∫ f dµ_n → ∫ f dµ for all real bounded functions f that are continuous a.e. µ.
(d) lim inf µ_n(A) ≥ µ(A) for every open set A, and µ_n(Ω) → µ(Ω).
(d') lim sup µ_n(A) ≤ µ(A) for every closed set A, and µ_n(Ω) → µ(Ω).
(e) µ_n(A) → µ(A) for every µ-continuity set A.
1
Definition 3. A sequence of measures µn on a metric space is said to converge weakly
w
or vaguely to µ if any of the above equivalent conditions hold. We write µn → µ or
simply µn → µ.
Example 3 (Discrete and continuous uniform). Recall Example 1. All the probability
measures are defined on the Borel σ-field of the metric space Ω = [0, 1] with the usual
metric. Suppose f is any bounded continuous function on Ω. Then
∫ f dP_n = (1/n) Σ_{i=1}^n f(i/n) → ∫_0^1 f(x) dx.
Hence (a) of Theorem 1 is satisfied with P as the uniform probability measure on [0, 1]
w
and hence Pn → P .
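Numerically, condition (a) of Theorem 1 can be watched converging for this example: the average of f over the points i/n approaches ∫_0^1 f(x) dx for any bounded continuous f. The test function in the Python sketch below is an arbitrary choice.

import numpy as np

f = lambda x: np.cos(3 * x) + x ** 2            # an arbitrary bounded continuous test function
exact = np.sin(3.0) / 3.0 + 1.0 / 3.0            # ∫_0^1 f(x) dx in closed form
for n in [10, 100, 1000, 10000]:
    grid = np.arange(1, n + 1) / n               # support points of P_n
    print(n, f(grid).mean(), "->", exact)        # ∫ f dP_n approaches ∫ f dP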
Exercise 1. Suppose Pn is the uniform probability measure on An = {1, 2, . . . , n}. Show
that Pn does not converge weakly to any probability measure.
Exercise 2. Suppose Pn puts probability 1/3 and 2/3 at the points 1/n and −1/n re-
spectively. Show that Pn converges weakly to a probability measure P . Identify P .
Definition 4. Suppose P is a probability measure such that P {x} = 1 for some x. Then
P is called the point mass at x and is often written as δx .
Exercise 3 (Binomial to Poisson). Suppose Pn is the binomial probability measure
with parameters n and pn . This means that
P_n(k) = (n choose k) p_n^k (1 − p_n)^{n−k}, k = 0, 1, . . . , n.
Suppose as n → ∞, np_n → λ, for some 0 < λ < ∞. Show that the probability measures
P_n converge to the Poisson probability measure P with parameter λ, defined by
P(k) = e^{−λ} λ^k / k!, k = 0, 1, . . . .
Hint: Show that Pn (k) → P (k) for every k.
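The pointwise convergence P_n(k) → P(k) suggested in the hint can also be seen numerically; the Python sketch below takes p_n = λ/n with λ = 2 (an illustrative choice) and prints the largest discrepancy over the first few values of k.

import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.0
for n in [10, 100, 1000]:
    p = lam / n                                   # so that n * p_n → λ
    gap = max(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(6))
    print(n, gap)                                 # pointwise differences shrink as n grows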
Exercise 4. Suppose C =: {x_1, x_2, . . .} is a countable set in a metric space and has no
limit points. Suppose {P_n} and P are probability measures on C, so that Σ_{k=1}^∞ P(x_k) = 1.
Show that P_n → P weakly if and only if P_n(x_k) → P(x_k) for every k ≥ 1.
Proof of Theorem 1. (a) ⇒ (b). Suppose g is any function such that g ≤ f and g is
bounded continuous. Then
Z Z Z
lim inf f dµn ≥ lim inf gdµn = gdµ.
2
Now since f is lower semi-continuous (lsc) and |f | ≤ M , there exists a sequence of
continuous functions {gn } bounded
R in Rabsolute value by M and gn → f . Then by
DCT(µ is a finite measure!), gn µn → f dµ. This implies (b).
f = sup{g : g is lsc g ≤ f },
f = inf{g : g is usc g ≥ f }
be the lower and upper envelopes of f . Then f is lsc and f is usc. Further
(d) ⇒ (e): Suppose A is a µ-continuity set. Let A° and Ā respectively be the interior
and closure of A. Then
lim sup µ_n(A) ≤ lim sup µ_n(Ā) ≤ µ(Ā)   (by (d'))
   = µ(A)   (by assumption).
Similarly,
lim inf µ_n(A) ≥ lim inf µ_n(A°) ≥ µ(A°) = µ(A).
This proves that lim µ_n(A) = µ(A).
4
Lecture 37
Weak convergence of distribution functions
February 10, 2021
Since every finite measure on R has a distribution function, weak convergence can be
expressed in terms of distribution functions.
Recall that if F is a distribution function then F (±∞) are defined in the natural way:
F (±∞) = limy→±∞ F (y).
Definition 1. For any distribution function F , CF will denote the set of continuity
points of F . This includes the points ±∞.
w
To denote this convergence we often write Fn → F or simply Fn → F .
Proof. (a) ⇒ (b): Consider the Borel set A = (a, b] for a, b finite. Note that ∂A = {a, b}.
Suppose a and b are continuity points of F . Then µ{a} = µ{b} = 0, so that (1, b] is a
µ-continuity set. Hence Fn (a, b] = µn (a, b] → µ(a, b] = F (a, b]. If a = −∞, then the
argument is same. If b = ∞, then apply the earlier argument on (a, ∞).
Fix > 0. For each k, let Jk be a right semi-closed sub-interval of Ik such that the end
points of Jk are continuity points of F , and
[This can be done since F has only countably many discontinuities]. Then
1
Hence
lim inf µ_n(A) ≥ Σ_k µ(J_k) ≥ Σ_k µ(I_k) − ε = µ(A) − ε.
Since ε is arbitrary, we have lim inf µ_n(A) ≥ µ(A). Since A was an arbitrary open set,
µ_n → µ weakly by Theorem 1 (d) of Lecture 36.
Show that each Fn is a continuous distribution function and they converge to some F
weakly. Identify F .
Exercise 2. Suppose for every n, {p_{n,k}} is a sequence of non-negative numbers such that
Σ_{k=1}^∞ p_{n,k} = 1. Suppose for every k, lim_{n→∞} p_{n,k} = p_k, where Σ_{k=1}^∞ p_k = 1.
(a) Show that lim_{n→∞} Σ_{k=1}^∞ |p_{n,k} − p_k| = 0. Hint: Use |a − b| = a + b − 2 min(a, b) and
DCT.
(b) Suppose Ω = {1, 2, . . .}. Consider the probability measures P_n and P on Ω defined
by the sequences {p_{n,k}} and {p_k}. Show that P_n → P weakly.
Exercise 3. (a) Suppose {f_n} and f are non-negative Borel measurable functions on R
such that for all n, ∫ f_n(x) dx = ∫ f(x) dx = 1. Suppose further that f_n → f a.e. λ. Show
that lim_{n→∞} ∫ |f_n(x) − f(x)| dx = 0.
(b) Suppose F_n and F are the distribution functions corresponding to {f_n} and f:
F(x) = ∫_{−∞}^x f(y) dy,  F_n(x) = ∫_{−∞}^x f_n(y) dy,  x ∈ R, n ≥ 1.
Show that F_n → F weakly.
Remark 1. One of the pillars of probability theory is the central limit theorem.
This theorem is about convergence of a certain sequence of probability measures to the
Gaussian measure. We shall state and prove this result later.
Remark 2. 1. Theorem 1 can be extended to measures and distributions on Rn .
2
Lecture 38
Independence
February 14, 2021
The random experiment here is tossing a coin two times and the four possible results
of this experiment are laid out by the elements of Ω. These are the outcomes of the
experiment. The experiment is called random since the result of the experiment is not
certain but is governed by the above probabilities of the outcomes. Once the experiment
is performed (the two tosses are complete), we can say if a specific event has occurred
or not.
Example 2 (Example 1 continued). One of the key concepts in probability theory is that
of independence. The idea is already germane in the product probability spaces that we
constructed. Let us revisit the above example. Consider the two events
They can be verbally described as “the first toss is head” and “the second toss is head”.
Their probabilities are given by
Now note that the first toss should not influence the result of the second toss. Since
the events A and B depend respectively on the first and the second tosses, this lack of
influence must be reflected in the calculation of probabilities. Now, it is easy to check
that
P (A ∩ B) = p2 = P (A)P (B).
This leads us to the concept of independence of events: two events A and B are said to be
independent if
P(A ∩ B) = P(A)P(B).
Exercise 1. (a) Consider the class of events which are either P -null or have probability
1. Show that any two events in this collection are independent. In particular, Ω and ∅
are always independent of each other and each is independent of itself.
1
(b) Any event with probability 1 or 0 is independent of any other event in the σ-field.
(b) Suppose A and B are disjoint. Then they are independent if and only if at least one
of them is a P -null set.
(c) If A is independent of itself, then either A is a P -null set or Ac is a P -null set. That
is, either P (A) = 1 or P (Ac ) = 1.
Exercise 2. If A and B are independent, show that then each of the pair of events
{Ac , B c }, {Ac , B} and {A, B c } are also independent.
In the above example, “intuition” told us that the events A and B cannot “influence”
each other. Indeed, we constructed our product probability space in such a manner.
Sometimes presence of independence is not that intuitive.
Exercise 3. In Example 1, let C = {HH, T T } be the event that “the outcome of both
tosses are same”. Show that A and C are independent and so are B and C. Show that
events A ∩ B and C are not independent.
Definition 3 (Independence of events). Let I be any index set and let {A_i, i ∈ I} be a
collection of events in a probability space. Then they are said to be independent, if for
all finite sub-collections {i_1, . . . , i_k} of distinct indices from I, we have
P(A_{i_1} ∩ · · · ∩ A_{i_k}) = P(A_{i_1}) · · · P(A_{i_k}).
Exercise 4. (a) If {Ai , i ∈ I} are independent, then show that the collection {Bi , i ∈ I}
obtained by replacing some (or all) Ai by Aci is also independent.
(b) If {Ai , i ∈ I} are independent, then clearly any sub-collection is also independent.
Example 3. Note that to check independence one needs to check a lot of conditions.
None of the conditions can actually be dropped. Consider the throw of two dice. Then
we can consider the set of all 36 outcomes and suppose the dice are fair, so each of the
36 outcomes has the same probability 1/36. Let A = {the first die shows 1, 2 or 3},
B = {the first die shows 3, 4 or 5} and C = {the sum of the two dice is 9}.
Then it can be checked that P(A) = 1/2, P(B) = 1/2, P(C) = 1/9. Further,
P(A ∩ B) = 1/6 ≠ P(A)P(B) = 1/4,
P(A ∩ C) = 1/36 ≠ P(A)P(C) = 1/18,
P(B ∩ C) = 1/12 ≠ P(B)P(C) = 1/18,
P(A ∩ B ∩ C) = 1/36 = P(A)P(B)P(C).
Note that the last equation does not imply that A, B and C are independent, nor does it
imply that A ∩ B and C are independent.
3
Lecture 39
Conditional probability
Second Borel-Cantelli Lemma
Independence of σ-fields
February 14, 2021
Suppose B is an event in some probability space (Ω, A, P ). If I am told that the event
B has “occurred”, how does that change the probabilities of events in A?
Definition 1 (Conditional probability measure). Suppose (Ω, A, P ) is a probability
space and P (B) > 0. Then the probability measure P (·|B) defined as
P(A|B) = P(A ∩ B)/P(B), A ∈ A,
is called the conditional probability measure given B. For every A ∈ A, P (A|B)
is called the conditional probability of A given B.
Note that the conditional probability measure depends on the “condition” B and is
defined only if P(B) > 0. Further, if P(A) > 0, then the conditional probability of B
given A equals P(B|A) = P(A ∩ B)/P(A). This is in general different from P(A|B)
unless P(A) = P(B).
Exercise 1. Check that PB is indeed a probability measure. Moreover, if P (B) = 1,
then PB ≡ P .
Definition 2 (Restricted σ-field). Suppose (Ω, A, P ) is a probability space and B is a
non-empty set in A. Define
AB = {A ∩ B : A ∈ A}.
P (A ∩ B) = P (B|A)P (A).
(b) Suppose {Ai }, 1 ≤ i ≤ n are events such that P (A1 ∩ A2 ∩ · · · ∩ An−1 ) > 0. Then
1
Exercise 5. Suppose {B_i} is a measurable partition. Then show that for any event A,
P(A) = Σ_{i=1}^∞ P(A ∩ B_i) = Σ_{{i: P(B_i) > 0}} P(A|B_i)P(B_i).
Proof.
P(lim sup A_n) = P(∩_{n=1}^∞ ∪_{k=n}^∞ A_k) = lim_{n→∞} P(∪_{k=n}^∞ A_k) = lim_{n→∞} lim_{m→∞} P(∪_{k=n}^m A_k).
Now
P((∪_{k=n}^m A_k)^c) = P(∩_{k=n}^m A_k^c)
   = Π_{k=n}^m P(A_k^c)   (by independence)
   ≤ Π_{k=n}^m exp{−P(A_k)}   (since 1 − x ≤ e^{−x})
   → 0 as m → ∞, since Σ_{i=1}^∞ P(A_i) = ∞.
Hence P(∪_{k=n}^m A_k) → 1 as m → ∞ for every n, and therefore P(lim sup A_n) = 1.
2
Lecture 40
Random variable
PDF, CDF, PMF
Bernoulli, Exponential, Gaussian
Mean, variance
February 15, 2021
Often we simply say distribution function or d.f., and suppress the subscript X.
Exercise 1. For any random variable X, we have FX (−∞) = 0 and FX (∞) = 1. Con-
versely, suppose we are given a distribution function (right continuous, non-decreasing)
F on R such that F (−∞) = 0, F (∞) = 1. Show that we can always construct a probabil-
ity space and a random variable X on it such that its c.d.f is F . Hint: Think of identity
mapping.
The function f is called the probability density function or, simply the density
function or p.d.f, of X.
1
(c) X is said to be continuous if its distribution function F is continuous. This is the
same as saying P {X = x} = 0 for all x ∈ R. Note that if X is continuous, it need not be
absolutely continuous. An example is given by the Cantor function F given in Exercise
4 of Lecture 26. Extend the Cantor function to the entire R by defining it to be 1 for
all x ≥ 1 and to be 0 for all x ≤ 0. Note that F is indeed a distribution function with
F (−∞) = 0 and F (∞) = 1. Let X be a random variable with distribution function F .
This is possible by Exercise 1. Then X is continuous but not absolutely continuous.
Example 1. (a) X is said to be a Bernoulli random variable if its mass function is
concentrated at two values. Usually the values are 0 and 1 with
P {X = 1} = p, P {X = 0} = q, p + q = 1.
Exercise 2. (a) Show that the mean and variance of X can be written in terms of the
probability distribution of X as
Z
E(X) = xdPX (x), (whenever E(X) exists),
R
2
(b) Show that if X ∼ N (m, σ 2 ), then E(X) = m and V(X) = σ 2 .
(c) Find E(X n ) for all positive integers n when X is a Bernoulli or an exponential
random variable.
(d) If X ∼ N (0, 1), find E(X n ) for all positive integers n.
3
Lecture 41
Projection mapping
Random vectors
Joint and marginal distributions
Independence
February 15, 2021
The concepts of real-valued random variables, cumulative distribution functions etc. can
be extended to vectors in the natural way.
Random vectors are also usually denoted by capital letters X, Y etc. Discrete, continuous
and absolutely continuous random variables are categorised as before.
pi (x) = xi , x = (x1 , x2 , . . . , xn ) ∈ Rn , 1 ≤ i ≤ n
1
Definition 5 (Independence of random vectors). Random vectors {Xi , i ∈ I} defined on
(Ω, A, P ) are said to be independent if the σ-fields generated by them are independent.
In particular two random variables X1 and X2 are independent if and only if
Note that the above is equivalent to saying that PX1 ,X2 ,...,Xn = PX1 PX2 · · · PXn . That is,
the joint probability measure is a product of the marginal probability measures. Note
that independent random vectors may not have the same number of co-ordinates.
Exercise 2. Suppose {Fn } are probability distributions functions on R. Show that there
is a probability space and a random vector X whose components {Xn } are independent
with distributions {Fn }.
2
Lecture 42
Criterion for independence
Covariance
February 23, 2021
(b) Suppose X is absolutely continuous with density f . Then each Xi is also absolutely
continuous with some density fi . In this case, {Xi } are independent if and only if
f(x_1, . . . , x_n) = Π_{i=1}^n f_i(x_i), a.e. λ.   (2)
Conversely if {X_i} are independent with densities {f_i} then X has density f given by
(2) a.e. λ.
F(x_1, . . . , x_n) = P{X_1 ≤ x_1, . . . , X_n ≤ x_n} = Π_{i=1}^n P{X_i ≤ x_i} = Π_{i=1}^n F_i(x_i).
That is,
P{X_i ∈ B_i, 1 ≤ i ≤ n} = Π_{i=1}^n P{X_i ∈ B_i}   (3)
for all right semi-closed intervals. We wish to prove this equation holds for all Borel sets.
First fix B2 , . . . , Bn and consider
1
It is easy to see that C is a monotone class and contains all finite disjoint unions of right
semi-closed intervals. Hence C = B(R). That is, (3) holds for all B_1 ∈ B(R) when we
fix right semi-closed B_2, . . . , B_n. Now consider the successive co-ordinates to complete
the proof of (a).
(b)
F_1(x_1) = P{X_1 ≤ x_1} = P{X_1 ≤ x_1, X_2 ∈ R, . . . , X_n ∈ R}
   = ∫_{(−∞, x_1]} ∫_R · · · ∫_R f(t_1, . . . , t_n) dt_1 · · · dt_n.
Note that by Fubini's theorem, the above integral is computed iteratively. Now, by
definition it follows that X_1 has the density
f_1(x_1) = ∫_R · · · ∫_R f(x_1, t_2, . . . , t_n) dt_2 · · · dt_n.
Note that it is indeed a Borel measurable function by Fubini’s theorem. Similarly each
Xi is absolutely continuous and with marginal densities obtained by integrating out the
other variables.
Now suppose (2) holds. Then
F(x_1, . . . , x_n) = ∫_{(−∞, x_1]} · · · ∫_{(−∞, x_n]} f(t_1, . . . , t_n) dt_1 · · · dt_n = Π_{i=1}^n F_i(x_i).
Define
g(x_1, . . . , x_n) = Π_{i=1}^n f_i(x_i).
Now it is easy to conclude that
P_X(B) = ∫_B g(x) dx for all B ∈ B(R^n).
Since P_X(B) also equals ∫_B f(x) dx for all B, it follows that f = g a.e. λ.
For the last part, we can start with equation (3) and argue as above.
2
Exercise 1. (a) Give an example of a bi-variate random vector (X, Y ) such that both X
and Y are absolutely continuous but (X, Y ) is not so. Hint: Simple curves have Lebesgue
measure 0.
(b) Show by example that even if a bivariate random vector has density, it is not deter-
mined by the marginal densities.
(c) Suppose X = (X1 , . . . , Xn ) is a random vector where each Xi is discrete. Show that
X is then discrete. Further, {Xi } are independent if and only if
n
Y
P {X1 = x1 , . . . , Xn = xn } = P {Xi = xi } for all xi ∈ R.
i=1
Exercise 2. Suppose {Xi } are independent random vectors. Show that then fi ◦ Xi are
also independent for all Borel measurable functions (of appropriate orders).
Hint: Prove directly for n = 2. Then use induction. Take the last variable to be an
indicator function first and then extend.
Exercise 4. Suppose X is a random variables such that E(X) is finite. Show that
(a) V(aX + b) = a2 V(X) for all real numbers a and b.
(b) V(X) = E(X 2 ) − [E(X)]2 . Caution: Both sides may equal ∞. What can you say
about a random variable X for which V(X) = 0?
(b) If X and Y are independent and E(X 2 ) and E(Y 2 ) are finite, then
3
Definition 1. If (X, Y ) is a bivariate random vector and E(X 2 ) and E(Y 2 ) are finite,
then the covariance of X and Y is defined as
Cov(X, Y) = E[(X − E(X))(Y − E(Y))].
If 0 < V(X), V(Y) < ∞, then the correlation coefficient between X and Y is defined
as
ρ(X, Y) = Cov(X, Y) / √(V(X) V(Y)).
Exercise 6. (a) Show that the covariance is finite.
(b) If X and Y are independent then show that their covariance is 0.
(c) Show that −1 ≤ ρ(X, Y ) ≤ 1.
(d) Suppose {X_i} are random variables with V(X_i) < ∞ for all i. Show that
V(X_1 + · · · + X_n) = Σ_{i=1}^n V(X_i) + 2 Σ_{1≤i<j≤n} Cov(X_i, X_j).
4
Lecture 43
Independence
Moments
Weak law of large numbers
February 23, 2021
Exercise 1. Suppose P is the restriction of the Lebesgue measure (on R^2) to the unit
disc Ω = {(x, y) : x^2 + y^2 ≤ 1}, equipped with the Borel σ-field. Consider the random
vector X given by the identity mapping X(x, y) = (x, y) on Ω. Denote the component
random variables by X_1 and X_2. Consider the random variable R = X_1^2 + X_2^2. Find its
probability density function and E(R).
Exercise 2. Suppose {Xi }, 1 ≤ i ≤ n are random variables. Show that they are inde-
pendent if and only if for all bounded Borel measurable functions from R to R,
E[Π_{i=1}^n g_i(X_i)] = Π_{i=1}^n E[g_i(X_i)].
1
Exercise 8. Suppose {Xi } are i.i.d. with P {Xi = 1} = p = 1 − P {Xi = 0}. Show that
(X_1 + · · · + X_n)/n → p almost surely.
Hint: Use the previous exercise. Or do a direct computation via moments and apply the
Borel-Cantelli Lemma.
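A simulation of Exercise 8: running averages of i.i.d. Bernoulli(p) variables settle near p. In the Python sketch below, the value p = 0.3, the seed and the sample sizes are arbitrary choices.

import numpy as np

rng = np.random.default_rng(7)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)                    # i.i.d. Bernoulli(p) sample
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)    # (X_1 + ... + X_n)/n for each n
for n in [10, 1000, 100_000, 1_000_000]:
    print(n, running_mean[n - 1])                         # settles near p = 0.3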
2
Lecture 44
Strong law of large numbers (SLLN)
February 24, 2021
This result is known for about 90 years, and is traditionally proved by establishing an
inequality, known as Kolmogorov’s maximal inequality. This inequality is extremely
useful and its proof contains ideas that are used elsewhere, for example in the theory of
martingales. However, we give a simple proof that was discovered by Etemadi in 1981.
Proof of Theorem 1. Consider the sequences {Xi+ } and {Xi− }. They also satisfy the
hypothesis of the theorem. Thus it is enough to prove the theorem when {Xi } are
non-negative. Let F denote the distribution function of (any) Xi . Define
1
≤ C Σ_{i=1}^∞ E(Y_i^2) Σ_{{n: k_n ≥ i}} 1/k_n^2
   ≤ C Σ_{i=1}^∞ E(Y_i^2)/i^2   (tail of a geometric series and k_n^{−2} ≤ i^{−2})
   = C Σ_{i=1}^∞ (1/i^2) ∫_{(0, i]} x^2 dF(x)
   = C Σ_{i=1}^∞ (1/i^2) Σ_{k=0}^{i−1} ∫_{(k, k+1]} x^2 dF(x)
   = C Σ_{k=0}^∞ ∫_{(k, k+1]} x^2 dF(x) Σ_{i=k+1}^∞ 1/i^2   (interchanging the order)
   ≤ C Σ_{k=0}^∞ (1/(k+1)) ∫_{(k, k+1]} x^2 dF(x)
   ≤ C Σ_{k=0}^∞ ∫_{(k, k+1]} x dF(x)
   = C E(X_1) < ∞.
(S*_{k_n} − E(S*_{k_n})) / k_n → 0 almost surely.   (1)
Σ_{n=1}^∞ P{X_n ≠ Y_n} = Σ_{n=1}^∞ P{X_n > Y_n}
   = Σ_{n=1}^∞ ∫_{(n, ∞)} dF(x)   (identical distribution)
   = Σ_{n=1}^∞ Σ_{i=n}^∞ ∫_{(i, i+1]} dF(x)
   = Σ_{i=1}^∞ i ∫_{(i, i+1]} dF(x)   (interchanging the order of summation)
   ≤ Σ_{i=1}^∞ ∫_{(i, i+1]} x dF(x)
   ≤ ∫_{(0, ∞)} x dF(x) = E(X_1) < ∞.
Hence, by Borel-Cantelli Lemma, Xn 6= Yn only for finitely many n almost surely. Hence
using Step (i), Skn /kn → E(X1 ) almost surely.
Proof of Step (iii). Since {X_i} are non-negative, S_n is non-decreasing. For every n, let
s_n be the largest j such that k_j is smaller than n. Note that k_{s_n+1} ≥ n and
lim_{n→∞} k_{s_n+1}/k_{s_n} = α.
Then
lim sup S_n/n ≤ lim sup (S_{k_{s_n+1}}/k_{s_n+1}) (k_{s_n+1}/k_{s_n}) (k_{s_n}/n) ≤ E(X_1) α almost surely.
Similarly,
lim inf S_n/n ≥ E(X_1)/α almost surely.
Now since α > 1 is arbitrary, letting α → 1 (through a countable sequence, so that null
sets are managed),
lim_{n→∞} S_n/n = E(X_1) almost surely.
This completes the proof.
Exercise 1. Suppose X1 and X2 are i.i.d. absolutely continuous random variables with
density f .
(b) Define X(1) = min{X1 , X2 } and X(2) = max{X1 , X2 }. Find the density of X(1) ,
X(2) and (X(1) , X(2) ) in terms of f . Hint: On which Borel subset of R2 will the density
be non-zero? Start with sets for which probabilities are easier to calculate.
3
(c) Can you generalise to the case where you have n i.i.d. absolutely continuous random
variables? Note that you have to define second minimum....etc. These are called ordered
statistics.
(d) Suppose {Xi } are i.i.d. with the uniform distribution on the interval (0, 1). Let
X(n) = max{X1 , . . . , Xn }.
Show that the distribution function Fn of the random variable n(1 − X(n) ) converges to
the distribution function of an exponential random variable.
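Part (d) can also be checked by simulation: the empirical distribution function of n(1 − X_(n)) is close to 1 − e^{−x}. In the Python sketch below, the sample size n, the number of replications and the seed are arbitrary choices.

import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 20_000
u = rng.uniform(size=(reps, n))                 # reps independent samples of size n from U(0,1)
t = n * (1.0 - u.max(axis=1))                   # samples of n(1 - X_(n))
for x in [0.5, 1.0, 2.0]:
    print(x, np.mean(t <= x), 1 - np.exp(-x))   # empirical CDF vs the exponential CDF 1 - e^{-x}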
Exercise 2. (a) Suppose X is a Poisson random variable with parameter λ. Show that
E(X) = V ar(X) = λ.
(b) Suppose X1 and X2 are independent Poisson random variables with parameters λ1
and λ2 . Show that X1 + X2 also has the Poisson distribution with parameter λ1 + λ2 .
Exercise 3. Suppose {Xi } are i.i.d. Bernoulli random variables with P {X1 = 1} = p =
1 − P {X1 = 0}.
(a) Let
N1 = inf{n ≥ 1 : Xn = 1}.
Find the probability distribution of the random variable N1 . Hint: Start by listing the
possible values of N1 .
(b) Let
N2 = inf{n > N1 : Xn = 1}.
Find the distribution of N2 . Hint: Use total and conditional probability, along with
independence. Check whether N1 and N2 − N1 are independent. Hint: Use the p.m.f.
criterion for independence.
4
Lecture 45
Kolmogorov’s Zero-one Law
February 24, 2021
Tn = σ(Xn , Xn+1 , . . .)
T∞ = ∩∞
n=1 Tn
is known as the tail σ-field of {Xi }. The sets in T∞ are known as tail events. A
function f : Ω → (R̄, B(R)) is called a tail function if it is measurable with respect to
T∞ .
(ii) A2 = { ∞
P
n=1 Xn converges}.
(b) Show that lim sup Xn , lim inf Xn are tail functions.
Theorem 1 (Kolmogorov’s zero-one law). Suppose {Xi } is a sequence of independent
random variables and T∞ is their tail σ-field. Then for every A ∈ T∞ , P (A) = 0 or 1.
Every function which is measurable with respect to T∞ , is a constant almost surely.
A = {(X1 , X2 , . . .) ∈ A∗ }
C = {(X1 , X2 , . . .) ∈ C ∗ }.
Let
C = {C ∗ ∈ B(R)∞ : C and A are independent}.
1
Suppose C ∗ is a measurable cylinder. Then
C = {(X1 , . . . , Xn ) ∈ Bn } ∈ σ(X1 , . . . , Xn ).
Note that A ∈ Tn+1 . Since {Xi } are independent, clearly A and C are independent.
Thus C contains all measurable cylinders. Clearly C is a monotone class. For example if
Cn ↑ C and Cn ∈ C, then P (Cn ) ↑ P (C) and
Thus A and C are independent. Similarly for decreasing limits. By monotone class
theorem C = B(R)∞ and in particular A∗ ∈ B(R)∞ . This means that A is independent
of itself.
If f is a tail function then for each c ∈ R̄, {ω : f (ω < c} is a tail event and hence has
probability 0 or 1. take
k = sup{c ∈ R̄ : P {f < c} = 0}
then f = k almost surely.
Example 1. Suppose {Xi } are i.i.d. with mean 0. Then 0 − 1 law says that the set
Sn
A = {ω : → 0}
n
has probability 0 or 1, since A is a tail event. The SLLN goes a step further and says
that P (A) = 1.
Exercise 2. (a) Suppose {Xi } are i.i.d. non-negative random variables and E(X1 ) = ∞.
Show that Sn /n → ∞ almost surely.
Exercise 3. Show that if {Xi } are independent, then all sets in the tail σ-field are
independent of each other.
2
Lecture 46
Hewitt-Savage Zero-one Law
February 24, 2021
(b) {Xn = 0 for all n} is a symmetric event but not a tail event.
Theorem 1 (Hewitt-Savage Zero-One Law). Suppose {Xi } are i.i.d.. If A is a sym-
metric set in σ(X1 , X2 , . . .) then P (A) = 0 or 1. Further, any symmetric function is
constant almost surely.
1
Lecture 47
Convergence in distribution
Helly’s theorem
Prokhorov’s theorem
February 25, 2021
D
Note that if Xn → X, and Y has the same probability distribution as that of X, then
D
Xn → Y . Thus only the probability distributions are being identified in the
limit.
(c) Given an example of a sequence {Xn } (defined on the same probability space) such
D P
that Xn → X but Xn → X does not hold.
1
Suppose {xn } is a bounded sequence of real numbers. Then it may not converge but
we can always find a subsequence which converges. Here is the analogous result for
probability distribution functions.
Example 1. Note that F in the above theorem need not be a probability distribution
function. We have seen such examples earlier. Consider the probability distribution
functions Fn and the distribution function F given by
F_n(x) = 0 if x < 0;  1/2 if 0 ≤ x < n;  1 if x ≥ n.
F(x) = 0 if x < 0;  1/2 if x ≥ 0.
Then F has only one discontinuity (at x = 0) and it is easy to see that the Fn (x) → F (x)
for all x 6= 0.
Since {Fn } are all non-decreasing functions, it follows that FD (·) is also non-decreasing on
D. Now we construct a distribution function (not necessarily a probability distribution
function) out of FD (·) in the obvious way:
2
Remains to show that Fn (x) → F (x) at all continuity points of F . Suppose x < y ∈ D.
Then
lim sup Fnk (x) ≤ lim sup Fnk (y) = FD (y).
k→∞ k→∞
Then take infimum over y ∈ D, y > x to get
lim sup Fnk (x) ≤ F (x).
k→∞
Now consider x∗ < y < x, y ∈ D. Then
∗
F (x ) ≤ FD (y) = lim Fnk (y) = lim inf Fnk (y) ≤ lim inf Fnk (x).
Let x∗ → x to obtain
F (x−) ≤ lim inf Fnk (x).
Now if x is a continuity point of F , F (x− ) = F (x) and we get lim Fnk (x) = F (x).
In Example 1, some mass escaped to ∞. The concept of “tightness” prevents this
situation.
Definition 3. A set of probability measures {Pi , i ∈ I} on (R, B(R)) is said to be tight
if given > 0, there exists an M > 0 such that Pi [−M, M ]c < for all i ∈ I. A set of
probability distribution functions {Fi , i ∈ I} or a set of random variables {Xi , i ∈ I} is
said to be tight if their corresponding probability measures are tight.
Proof. Suppose {Fi , i ∈ I} is tight. Take a sequence {Fn } from this collection. By
Helly’s theorem, there is a sub-sequence {Fnk } and a distribution function F such that
Fnk (x) → F (x) at all continuity points x of F . Fix > 0. Using this, we can choose a
and b continuity points of F such that Fn (R \ (a, b]) < and F (R \ (a, b]) < . This
can be used to show that F is a probability distribution function. We omit the details.
Now assume that {Fi , i ∈ I} is relatively compact but not tight. Then for some > 0,
we can pick {Fn } such that Fn (R − (−n n)) ≥ . By relative compactness, we can get a
sub-sequence, say {Fnk } which converges to say F . Now note that R \ (−n, n) is closed
and
lim sup Fnk (R \ (−n, n)) ≤ F (R \ (−n, n)).
k→∞
Thus F (R \ (−n, n)) ≥ for all n and letting n → ∞, we obtain ≤ 0 which is a
contradiction. This completes the proof.
3
Lecture 48
Characteristic function
Inversion formula
February 28, 2021
Definition 1 (Characteristic function). For any finite measure µ on B(Rn ) define its
characteristic function µ̂ : Rn → C as
µ̂(t) = ∫_{R^n} e^{ιt′x} dµ(x), t ∈ R^n.
We shall mostly deal with the case n = 1 and where µ is a probability measure. Note
that
µ̂(t) = ∫_{R^n} e^{ιt′x} f(x) dλ(x) if µ has density f(·), and µ̂(t) = Σ_x e^{ιt′x} p(x) if µ has p.m.f. p(·).
Thus, apart from the constant, if µ has a density f , the characteristic function of µ is
really the Fourier transform of f .
Exercise 1. (a) Show that
|eιa − eιb | ≤ |a − b|.
Hint: Write it as an integral from a to b.
eιx − 1
(b) Show that limx→0 = 1.
ix
Exercise 2. Show that φX (·) is a uniformly continuous function.
Theorem 1 (Inversion formula). Suppose φ is the characteristic function of the distri-
bution function F .
(a) Then for all a, b ∈ R, a < b,
[F(b) + F(b−)]/2 − [F(a) + F(a−)]/2 = lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt.
In particular, if a, b are continuity points of F, then
F(b) − F(a) = lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt.
(b) If φ is integrable with respect to the Lebesgue measure, then
f(x) = (1/2π) ∫_{−∞}^{∞} e^{−ιtx} φ(t) dt
is a density for F.
Part (b) is known as the Fourier Inversion Theorem. Using this, the map f → fˆ
can be extended to a map from L2 (λ) to itself and then is an isometry. This is one of
the fundamental results in Fourier Analysis.
where
J_c(x) := (1/2π) ∫_{−c}^{c} [sin t(x − a) − sin t(x − b)]/t dt   (the cos integral is 0)
   = (1/2π) ∫_{−c(x−a)}^{c(x−a)} (sin v)/v dv − (1/2π) ∫_{−c(x−b)}^{c(x−b)} (sin w)/w dw.
Recall that ∫_r^s (sin t)/t dt → π as s → ∞ and r → −∞. Using this, it follows that
(i) There is an M < ∞ such that supc,x |Jc (x)| ≤ M .
2
(ii) J(x) := lim_{c→∞} J_c(x) = 0 if x < a or x > b;  1 if a < x < b;  1/2 if x = a or x = b.
Hence by DCT,
lim I_c = ∫_R J(x) dF(x)
   = ∫_{x<a} 0 dF(x) + (1/2) ∫_{x=a} dF(x) + ∫_{a<x<b} dF(x) + (1/2) ∫_{x=b} dF(x) + ∫_{x>b} 0 dF(x)
   = (1/2)[F(a) − F(a−)] + [F(b−) − F(a)] + (1/2)[F(b) − F(b−)]
   = [F(b) + F(b−)]/2 − [F(a) + F(a−)]/2.
(b) Now suppose φ(·) is integrable. Note that then f given below is well-defined:
f(x) = (1/2π) ∫_{−∞}^{∞} e^{−ιtx} φ(t) dt.
Then
∫_a^b f(x) dx = (1/2π) ∫_{−∞}^{∞} φ(t) [∫_a^b e^{−ιtx} dx] dt
   = lim_{c→∞} (1/2π) ∫_{−c}^{c} φ(t) [∫_a^b e^{−ιtx} dx] dt   (by DCT)
   = lim_{c→∞} (1/2π) ∫_{−c}^{c} [(e^{−ιta} − e^{−ιtb})/(ιt)] φ(t) dt
   = F(b) − F(a)   (by part (a), if a, b are continuity points of F).
We claim that this holds for all a < b. Since f is continuous, the integral is a continuous
function of a and b. Since the continuity points of F are dense in R, the above holds
for all a and b. Since f is continuous everywhere, F is differentiable everywhere and
moreover F 0 (x) = f (x) for all x. Since F is non-decreasing, f must be non-negative.
Thus f is a density of F .
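The inversion formula of part (a) can be verified numerically for the N(0, 1) distribution, whose characteristic function is φ(t) = e^{−t²/2}. The Python sketch below truncates the integral at c = 60 and uses a midpoint Riemann sum (both arbitrary numerical choices) and compares the result with Φ(b) − Φ(a).

import numpy as np
from math import erf, sqrt, pi

phi = lambda t: np.exp(-t ** 2 / 2)                # characteristic function of N(0,1)
a, b, c = -1.0, 2.0, 60.0
n = 400_000
t = (np.arange(n) + 0.5) * (2 * c / n) - c          # midpoint grid on [-c, c]; never hits t = 0
integrand = (np.exp(-1j * t * a) - np.exp(-1j * t * b)) / (1j * t) * phi(t)
approx = integrand.real.sum() * (2 * c / n) / (2 * pi)
Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))        # N(0,1) distribution function for comparison
print(approx, Phi(b) - Phi(a))                      # the two values should agree closely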
3
Exercise 3. The following facts were used in the last part of the above theorem. Show
that
Rb
(a) If g is integrable wrt the Lebesgue measure, then a g(x)dx is a continuous function
of a and b.
Rb
(b) If g is a continuous function then a g(x)dx is a differentiable function of both b and
a.
(c) If g is a continuous function such that for a distribution function F , F (b) − F (a) =
Rb
a g(x)dx for all a < b, then show that g is non-negative.
Exercise 4. Suppose P1 and P2 are two probability measures on R such that P̂1 (t) =
P̂2 (t) for all t. Then show that P1 ≡ P2 .
4
Lecture 49
Basic properties of characteristic function
Moment generating function
Cauchy distribution
February 28, 2021
(d) A random variable X has a symmetric distribution if and only if its characteristic
function φ is real-valued.
(e) If E(|X|^n) < ∞ for some positive integer n, then the n-th derivative of φ_X(·) exists
and is continuous on R and
φ_X^{(n)}(t) = ∫_R (ιx)^n e^{ιtx} dF(x).
In particular,
ι^n E(X^n) = φ_X^{(n)}(0).
(f) If X_1, . . . , X_n are independent random variables and S_n = X_1 + · · · + X_n, then
φ_{S_n}(t) = φ_{X_1}(t) · · · φ_{X_n}(t), t ∈ R.
This is one of the most useful properties of the characteristic function. On the left side we have a random variable S_n whose distribution we may not be able to calculate. However, its characteristic function is easy to calculate in terms of the individual characteristic functions. And due to the uniqueness property of the characteristic function, this is valuable.
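Neither (e) nor the product rule is hard to check numerically. The Python sketch below (numpy assumed available; the closed form 1/(1 − ιt) for the Exponential(1) characteristic function is a standard fact, used here only for illustration and not derived in these notes) approximates φ'(0) and φ''(0) by central differences and compares them with ι E(X) and ι² E(X²), and then compares the empirical characteristic function of a sum of two independent samples with the product of the individual empirical characteristic functions.

import numpy as np

# (e): for X ~ Exponential(1), phi(t) = 1/(1 - i t), E(X) = 1, E(X^2) = 2
phi = lambda t: 1.0 / (1.0 - 1j * t)
h = 1e-4
d1 = (phi(h) - phi(-h)) / (2 * h)               # approximates phi'(0)  = i * E(X)    = i
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2   # approximates phi''(0) = i^2 * E(X^2) = -2
print(d1, d2)

# product rule: empirical characteristic function of a sum of independent variables
rng = np.random.default_rng(0)
m = 200_000
X1 = rng.exponential(scale=1.0, size=m)
X2 = rng.uniform(-1.0, 2.0, size=m)
ecf = lambda sample, t: np.mean(np.exp(1j * t * sample))
for t in [0.5, 1.0, 2.0]:
    print(t, ecf(X1 + X2, t), ecf(X1, t) * ecf(X2, t))   # agree up to Monte Carlo error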
Exercise 2. (a) If X is a random variable, show that for any real constants a and b,
φ_{aX+b}(t) = e^{ιtb} φ_X(at), t ∈ R.
Definition 2. The moment generating function (m.g.f.) of a random variable is defined as
M_X(t) = E[e^{tX}], t ∈ R.
Note that the m.g.f. may equal ∞ for some values of t.
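For example (this is not in the notes), for a standard normal X one has M_X(t) = e^{t²/2} < ∞ for every t, while for a standard Cauchy X the defining expectation is +∞ for every t ≠ 0. A quick Monte Carlo check of the normal case, assuming numpy is available:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)
for t in [0.5, 1.0, 2.0]:
    print(t, np.mean(np.exp(t * X)), np.exp(t**2 / 2))   # Monte Carlo estimate vs e^{t^2/2}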
Exercise 3. (a) Show that for a, b ∈ R,
M_{aX+b}(t) = M_X(at) e^{bt}, t ∈ R.
(d) Show that a random variable X has the normal distribution if and only if its characteristic function is given by
φ_X(t) = e^{ιta} e^{−t²b/2},
and in that case E(X) = a and V(X) = b.
(e) Show that if X1 and X2 are two independent normal random variables, then X1 + X2
is also a normal random variable.
Exercise 4. Consider the function
f(x) = 1/[π(1 + x²)], x ∈ R.
(a) Show that ∫_R f(x) dx = 1. [This is called the (standard) Cauchy density function.]
(b) Suppose X is a random variable with the above density function. Then it is called
a (standard) Cauchy random variable. With the help of contour integration, show that its
characteristic function is given by
φ_X(t) = e^{−|t|}, t ∈ R.
(c) Suppose {X_i} are independent standard Cauchy random variables. Show that (X_1 + · · · + X_n)/n is again a standard Cauchy random variable for every n.
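Part (c) is worth seeing numerically, since it is in sharp contrast with the law of large numbers: averaging Cauchy variables does not concentrate the distribution at all. A minimal simulation sketch, assuming numpy is available; the empirical characteristic function of the average should stay close to e^{−|t|}.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 20_000
# each row holds n i.i.d. standard Cauchy variables; take the row averages
means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

for t in [0.5, 1.0, 2.0]:
    ecf = np.mean(np.exp(1j * t * means))      # empirical char. function of the average
    print(t, ecf.real, np.exp(-abs(t)))        # close to e^{-|t|}: still standard Cauchy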
Exercise 5. (a) Suppose X is a binomial random variable with parameters n and p.
Find its characteristic function.
(b) Suppose X_1 and X_2 are independent binomial random variables with parameters
(n1 , p) and (n2 , p) respectively. Show that X1 + X2 is also a binomial random variable
and identify the parameters.
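For part (a), the formula one expects is φ_X(t) = (1 − p + p e^{ιt})^n (stated here as the answer to be verified in the exercise, not as part of the notes). The short Python sketch below, assuming numpy and scipy, checks it against a direct computation of E[e^{ιtX}] from the binomial probabilities, and also checks part (b) by comparing the product of the two characteristic functions with that of a binomial with parameters (n_1 + n_2, p).

import numpy as np
from scipy.stats import binom

def cf_direct(n, p, t):
    # E[exp(i t X)] computed directly from the binomial pmf
    k = np.arange(n + 1)
    return np.sum(binom.pmf(k, n, p) * np.exp(1j * t * k))

def cf_formula(n, p, t):
    return (1 - p + p * np.exp(1j * t)) ** n

n1, n2, p, t = 7, 5, 0.3, 1.7
print(cf_direct(n1, p, t), cf_formula(n1, p, t))        # part (a): the two agree
print(cf_formula(n1, p, t) * cf_formula(n2, p, t),
      cf_formula(n1 + n2, p, t))                        # part (b): the sum is Binomial(n1+n2, p)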
Exercise 6. (a) Suppose X is a Poisson random variable with parameter λ. Find its
characteristic function.
(b) Suppose {Xi } are independent Poisson random variables with parameters {λi }. Show
that for every n, X1 +· · ·+Xn is a Poisson random variable with parameter λ1 +· · ·+λn .
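Similarly, the formula one expects in part (a) is φ_X(t) = exp(λ(e^{ιt} − 1)) (again given here as the answer to be verified, not derived). A quick simulation check of part (b) for two summands, assuming numpy is available:

import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, reps = 1.0, 1.5, 100_000
S = rng.poisson(lam1, reps) + rng.poisson(lam2, reps)        # sum of independent Poissons

for t in [0.4, 0.8]:
    ecf = np.mean(np.exp(1j * t * S))                        # empirical char. function of the sum
    target = np.exp((lam1 + lam2) * (np.exp(1j * t) - 1))    # char. function of Poisson(lam1+lam2)
    print(t, ecf, target)                                    # agree up to Monte Carlo error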
Lecture 50
Weak convergence and characteristic function
Central Limit Theorem (CLT)
February 28, 2021
Theorem 1. Suppose {F_n} is a tight sequence of distribution functions.
(a) If every weakly convergent subsequence of {F_n} converges to the same distribution function F, then F_n → F weakly.
(b) {F_n} converges weakly if and only if φ_{F_n}(t) converges to a finite limit for every t.
Proof. (b) Suppose φ_{F_n}(t) converges to a finite limit, say g(t), for every t. Since {F_n} is tight, by Helly's theorem there is a subsequence F_{n_k} that converges weakly to a probability distribution function, say F. If F_n does not converge weakly to F, then by the above exercise there is another subsequence F_{m_k} that converges weakly to some G ≠ F. But this is a contradiction, since g(t) = φ_F(t) = φ_G(t) for all t, and hence F = G by the uniqueness property. Therefore F_n converges weakly to F.
The converse follows from the Portmanteau theorem, since x → e^{ιtx} is a bounded continuous function.
Lemma 1. Let k := [inf_{|t|≥1} (1 − sin t/t)]^{−1} < ∞. Then for every distribution function F with characteristic function φ and every u > 0,
∫_{x: |x|≥1/u} dF(x) ≤ (k/u) ∫_0^u [1 − Re φ(t)] dt.
Proof. For u > 0,
(1/u) ∫_0^u [1 − Re φ(t)] dt = (1/u) ∫_0^u ∫_R (1 − cos tx) dF(x) dt
= ∫_R [(1/u) ∫_0^u (1 − cos tx) dt] dF(x)   by Fubini's theorem
= ∫_R [1 − sin(ux)/(ux)] dF(x)
≥ inf_{t: |t|≥1} (1 − sin t/t) ∫_{x: |ux|≥1} dF(x)
= (1/k) ∫_{x: |x|≥1/u} dF(x),
which rearranges to the stated inequality.
Theorem 2 (Lévy's continuity theorem). Let {F_n} be a sequence of distribution functions with characteristic functions {φ_{F_n}}.
(a) If F_n → F weakly, then φ_{F_n}(t) → φ_F(t) for all t.
(b) If φ_{F_n}(t) → g(t) for all t, where g is continuous at 0 and g(0) = 1, then g is the characteristic function of some probability distribution function F and F_n → F weakly.
Proof. (a) follows from the definition of weak convergence, since x → e^{ιtx} is a bounded continuous function.
(b) By Lemma 1, for every u > 0,
∫_{x: |x|≥1/u} dF_n(x) ≤ (k/u) ∫_0^u [1 − Re φ_{F_n}(t)] dt
→ (k/u) ∫_0^u [1 − Re g(t)] dt   as n → ∞, by DCT,
and this last bound → 0 as u → 0 (since g is continuous at 0 and g(0) = 1).
This shows that {F_n} is tight. Hence by Theorem 1, F_n converges weakly to a distribution function F. As a consequence (by part (a)), φ_{F_n}(t) → φ_F(t) for all t. This implies that φ_F(t) = g(t) for all t. Hence g is the characteristic function of F.
Theorem 3 (Central Limit Theorem (CLT)). Suppose {X_n} are i.i.d. random variables with mean µ and finite variance σ² > 0, and let S_n = X_1 + · · · + X_n. Then (S_n − nµ)/(σ√n) converges weakly to the standard normal distribution.
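Before turning to the proof, a small numerical illustration may be helpful (Python, numpy assumed available): for i.i.d. Uniform(0, 1) variables, the distribution of (S_n − nµ)/(σ√n) is already very close to N(0, 1) for moderate n.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 48, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)                 # mean and sd of Uniform(0,1)

X = rng.random((reps, n))                        # reps independent samples of size n
Z = (X.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

# compare P(Z <= x) with the standard normal cdf values
for x, target in [(-1.96, 0.025), (0.0, 0.5), (1.96, 0.975)]:
    print(x, np.mean(Z <= x), target)            # agree to two or three decimals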
Proof. Define Y_i = (X_i − µ)/σ. Then {Y_i} are i.i.d. with mean 0 and variance 1, and
(S_n − nµ)/(σ√n) = Σ_{i=1}^n Y_i/√n =: Z_n, say.
Let φ denote the characteristic function of Y_1. Noting that {Y_i} are i.i.d.,
φ_{Z_n}(t) = E[e^{ιt n^{−1/2} Σ_{i=1}^n Y_i}] = [φ(t/√n)]^n.   (1)
Recall that the characteristic function of the standard normal distribution is ψ(t) = e^{−t²/2}. Hence by Lévy's theorem it is enough to show that
[φ(t/√n)]^n → e^{−t²/2} for all t ∈ R.   (2)
The immediate thought is to take logarithms; that route leads to a proof which generalises to situations where the {X_i} are not necessarily i.i.d. However, for our purposes we do not take that route; instead, the following easy lemma is an adequate tool.
Lemma 2. Suppose a_1, . . . , a_n and b_1, . . . , b_n are complex numbers such that |a_i| ≤ 1 and |b_i| ≤ 1 for all i. Then
|∏_{i=1}^n a_i − ∏_{i=1}^n b_i| ≤ Σ_{i=1}^n |a_i − b_i|.
The proof of the lemma follows easily by induction and is left as an exercise.
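A two-line numerical check of the lemma (numpy assumed available): draw random complex numbers in the closed unit disc and compare the two sides.

import numpy as np

rng = np.random.default_rng(0)
n = 6
# random complex numbers in the closed unit disc (modulus at most 1)
a = rng.uniform(0, 1, n) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))
b = rng.uniform(0, 1, n) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))
print(abs(np.prod(a) - np.prod(b)), np.sum(np.abs(a - b)))   # left side <= right side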
We shall need another inequality. Recall that we have already used the bound |e^{ιx} − 1| ≤ |x|. The following lemma is a refinement of this inequality.
Lemma 3. (a) For all x ∈ R,
|e^{ιx} − (1 + ιx − x²/2)| ≤ min{|x|³/6, x²}.
(b) If E(X²) < ∞, then
φ_X(t) = 1 + ιt E(X) − (t²/2) E(X²) + o(t²), as t → 0.
Proof. We leave the proof of (a) as an exercise. To prove (b), note that by (a) applied to x = tX, after taking expectations,
|φ_X(t) − (1 + ιt E(X) − (t²/2) E(X²))| ≤ E[min{|tX|³/6, (tX)²}]
≤ t² E[min{|t||X|³, X²}].