Measure Theory
Bent Nielsen
University of Oxford
1/141
Introduction
2/141
Some textbooks
There is a plethora of textbooks on measure-theoretic probability.
Some examples:
Resnick (2019) and Shiryaev (1996): many examples, focus on probability, good for self-study.
Dudley (2018): comprehensive, covering both real analysis and probability.
Lehmann and Casella (1998) and Lehmann and Romano (2005) also contain crash-course style material with particular emphasis on applications in statistics and econometrics.
Liese and Miescke (2008): an appendix with a collection of results.
Axler (2021) has a lot of intuition and many examples of why
the theory is the way it is [why define measures only
on σ-algebras, why Lebesgue integration and not Riemann?].
It may be interesting for self-study. Available for free on
Axler’s website.
Kolmogorov and Fomin (1970): concise, clear overview of
metric spaces, topology, σ-algebras.
3/141
Why the measure-theoretic approach?
6/141
Examples of σ-algebras
The systems
7/141
Some further properties of σ-algebras
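If $\mathcal{A}$ is a σ-algebra in X, then
1 $\emptyset \in \mathcal{A}$.
2 If $A_1, \dots, A_N$ is a finite collection of sets in $\mathcal{A}$, then $\bigcup_{n=1}^N A_n \in \mathcal{A}$.
3 If $A, B \in \mathcal{A}$, then $A \cap B \in \mathcal{A}$.
4 If $A, B \in \mathcal{A}$, then $A \setminus B \in \mathcal{A}$.
5 If $(A_n)_{n\in\mathbb{N}}$ is a sequence of sets in $\mathcal{A}$, then $\bigcap_{n\in\mathbb{N}} A_n \in \mathcal{A}$.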
8/141
Proof of Lemma 2
9/141
Intersection of σ-algebras
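Let $(\mathcal{A}_i)_{i\in I}$ be an arbitrary (possibly uncountable) family of σ-algebras in X. Then
$$\bigcap_{i\in I} \mathcal{A}_i = \{A \subseteq X : A \in \mathcal{A}_i \text{ for all } i \in I\}$$
is a σ-algebra in X.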
10/141
What level of generality do we want?
11/141
General level. In mathematics, basic structures are metric spaces
and topologies (Kolmogorov and Fomin, 1970).
13/141
Theorem 4 (Generated σ-algebra)
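Let $\mathcal{D}$ be a system of subsets of X and let $\sigma(\mathcal{D})$ be the σ-algebra generated by $\mathcal{D}$, i.e. the intersection of all σ-algebras in X containing $\mathcal{D}$. Then $\sigma(\mathcal{D})$ is the smallest σ-algebra in X containing $\mathcal{D}$: for every σ-algebra $\mathcal{A}$ in X,
$$\mathcal{D} \subseteq \mathcal{A} \implies \sigma(\mathcal{D}) \subseteq \mathcal{A}. \qquad (1)$$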
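On a finite set X, σ(D) can be computed directly by closing D under complements and unions until nothing new appears. The sketch below does this (the function name `generate_sigma_algebra` is our own, not from the slides). It then checks properties 1 to 5 of a σ-algebra, the minimality statement (1), and the claim from the next slide that the complements of the generators generate the same σ-algebra. It is only an illustration on finite sets, where countable unions reduce to finite ones.

```python
from itertools import combinations


def generate_sigma_algebra(X, D):
    """Return sigma(D) for a finite set X: the smallest family of subsets of X
    that contains every set in D and is closed under complements and unions.
    On a finite set, closure under pairwise unions already gives closure under
    countable unions, because only finitely many distinct sets exist."""
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(d) for d in D}
    changed = True
    while changed:  # iterate until no new set is produced (a fixed point)
        changed = False
        current = list(sigma)
        for A in current:
            if (X - A) not in sigma:
                sigma.add(X - A)
                changed = True
        for A, B in combinations(current, 2):
            if (A | B) not in sigma:
                sigma.add(A | B)
                changed = True
    return frozenset(sigma)


X = {1, 2, 3, 4}
D = [{1}, {1, 2}]
S = generate_sigma_algebra(X, D)
print(len(S))  # 8: all unions of the "atoms" {1}, {2}, {3, 4}

# Properties 1-5 of a sigma-algebra hold for S.
assert frozenset() in S
assert all((A & B) in S and (A - B) in S for A in S for B in S)
# Complements of the generators generate the same sigma-algebra.
assert generate_sigma_algebra(X, [X - set(d) for d in D]) == S
# Minimality (1): S lies inside any sigma-algebra containing D, e.g. the power set.
assert S <= generate_sigma_algebra(X, [{1}, {2}, {3}])
```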
15/141
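Equation (1) on the previous slide implies that if the system $\mathcal{D}$ is a generating class for a σ-algebra $\mathcal{A}$ in X, then the families
1 $\{D^c : D \in \mathcal{D}\}$
2 $\{\bigcup_{n\in\mathbb{N}} D_n : D_n \in \mathcal{D} \text{ for all } n \in \mathbb{N}\}$
3 $\{\bigcap_{n\in\mathbb{N}} D_n : D_n \in \mathcal{D} \text{ for all } n \in \mathbb{N}\}$
also generate $\mathcal{A}$. You will show this in the exercises.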
16/141
Recall that a topology in a set X is a system τ of subsets D ⊂ X, called open sets (relative to τ), such that
1 ∅ and X belong to τ.
2 Arbitrary (finite or infinite) unions $\bigcup_{i\in I} D_i$ and finite intersections $\bigcap_{i=1}^n D_i$ of open sets belong to τ.
The pair (X, τ) is a topological space.
The complements $D^c = X \setminus D$ of open sets D are closed.
If (X, d) is a metric space, then its open sets form a topology on X. However, not every topological space arises in this way.[1]
By (1) above it follows that the family of open sets G and the family of closed sets F of any topological space generate the same σ-algebra, since every closed set is the complement of an open set and vice versa.
[1] Math topic 3, p. 24; Kolmogorov and Fomin (1970, p. 79)
17/141
Borel algebra - specific level
Let a ≤ b.
Examples of Borel sets of R (with ρ2 ):
Half-open intervals: (a, b] = (−∞, b]\(−∞, a].
Open half-lines: $(-\infty, b) = \bigcup_{n\in\mathbb{N}} (-\infty, b - 1/n]$.
Closed intervals: [a, b] = (−∞, b]\(−∞, a).
Open intervals: (a, b) = (−∞, b)\(−∞, a].
Each of the above families of sets could also be used as a generating class for the Borel-algebra.
Further examples of Borel sets:
Points: {a} = [a, a].
One can construct (complicated) sets that are not Borel in the
sense of Definition 5. The construction requires the axiom of
choice.[2]
[2] Kolmogorov and Fomin (1970, Section 3.7, Problem 26.7).
19/141
Borel algebra - general level
[3] Math topic 3, p. 23-24; Kolmogorov and Fomin (1970, p. 50, 79)
20/141
Definition 5 is a special case of Definition 6. In one dimension, this is because the Borel-algebra in Definition 5 equals the σ-algebra generated by the open intervals (a, b) for all a, b ∈ R. The open intervals generate a topology in which it suffices to consider countable set operations (every open set is a countable union of open intervals). This is a consequence of Theorem 7 below.
As all norms on R^d are equivalent,[4] the open sets of R^d with respect to the Euclidean distance coincide with the open sets with respect to any norm-induced metric.
The trivial σ-algebra {∅, R^d} and the power set of R^d are both Borel-algebras in the sense of Definition 6 (generated by the trivial and the discrete topology, respectively). Nonetheless, we will only use the notation B(R^d) when the open sets are generated by the open hyper-cubes or, equivalently, by a norm-induced metric.
Whether all σ-algebras are Borel-algebras in the sense of Definition 6 appears to be an open question. Kolmogorov and Fomin (1970, p. 35) write that the term Borel-algebra is often used to denote a σ-algebra.
[4] Kolmogorov and Fomin (1970, p. 141)
21/141
Borel algebra on (R^d, ρ2) is countably generated
Theorem 7
For every d ∈ N it holds on (R^d, ρ2) that
$$\mathcal{B}(\mathbb{R}^d) = \sigma\big(\{b_2(x, r) : x \in \mathbb{Q}^d,\ r \in (0, \infty) \cap \mathbb{Q}\}\big).$$
As it turns out, the Borel algebra B(R^d, ρ2) has just the right size in many contexts. For contrast:
The power set of R^d is an uncountably generated σ-algebra.
The trivial σ-algebra {∅, R} is finitely generated.
22/141
Recall: a metric space is separable if it contains a countable dense subset.[5]
To prove Theorem 7, we use the following separability result.
Lemma 8
Let R^d be equipped with a metric ρ so that Q^d is dense in R^d (e.g., ρ2). Let G ⊆ R^d be a non-empty open set. Write the countable set G ∩ Q^d as
G ∩ Q^d = {x_k : k ∈ N}.
We now show that the reverse inclusion holds as well. To this end, fix any x ∈ G and let r ∈ (0, 2] be such that b_ρ(x, r) ⊆ G; such an r exists since G is open. By assumption, Q^d is dense in R^d, so the ball b_ρ(x, r/4) contains a point of Q^d. That point also lies in G, because b_ρ(x, r/4) ⊆ b_ρ(x, r) ⊆ G, so it is one of the x_k. Thus, there exists an n ∈ N such that x_n ∈ b_ρ(x, r/4).[6]
[6] Math topic 3, p. 18-20.
24/141
For any y ∈ b_ρ(x_n, r/2) it holds, by first using the triangle inequality and then the specified radii of the balls, that
r/2 ≤ s_n and so r/4 ≤ s_n/2 ≤ r_n.
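The display for this step did not survive in these notes. As a hedged sketch, here is the standard triangle-inequality argument under the assumptions fixed above (x_n ∈ b_ρ(x, r/4) and b_ρ(x, r) ⊆ G). It deliberately avoids the auxiliary radii s_n and r_n, whose definitions are not shown here.

```latex
\begin{aligned}
y \in b_\rho(x_n, r/2) \;&\Longrightarrow\; \rho(y, x) \le \rho(y, x_n) + \rho(x_n, x) < \tfrac{r}{2} + \tfrac{r}{4} < r,\\
\text{hence}\quad b_\rho(x_n, r/2) &\subseteq b_\rho(x, r) \subseteq G,\\
\text{and}\quad \rho(x, x_n) < \tfrac{r}{4} < \tfrac{r}{2} \;&\Longrightarrow\; x \in b_\rho(x_n, r/2).
\end{aligned}
```

So every x ∈ G lies in a ball centred at a point of G ∩ Q^d that is itself contained in G.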
25/141
Proof of Theorem 7
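$$\sigma\big(\{b_2(x, r) : x \in \mathbb{Q}^d,\ r \in (0, \infty) \cap \mathbb{Q}\}\big) \subseteq \sigma\big(\{b_2(x, r) : x \in \mathbb{R}^d,\ r > 0\}\big) \subseteq \sigma\big(\{G \subseteq \mathbb{R}^d : G \text{ is open}\}\big) = \mathcal{B}(\mathbb{R}^d).$$
Next, by Lemma 8,
$$\mathcal{B}(\mathbb{R}^d) = \sigma\big(\{G : G \text{ is open in } \mathbb{R}^d\}\big) \subseteq \sigma\big(\{b_2(x, r) : x \in \mathbb{Q}^d,\ r \in (0, \infty) \cap \mathbb{Q}\}\big),$$
and combining the two inclusions proves Theorem 7.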
27/141
Definition 10 (Measure and measure spaces)
Note that the left-hand side in (2) is well-defined since $\bigcup_{n\in\mathbb{N}} A_n \in \mathcal{A}$ (by σ3), and the right-hand side is well-defined as it is a sum of non-negative terms (with value in [0, ∞]).
28/141
Some terminology
29/141
Examples of measures
30/141
The Lebesgue measure λ_d on (R^d, B(R^d))[7]
[7] See Shiryaev (1996, p. 151–159).
31/141
Counting measure
Counting measure: Consider the measurable space (X , P(X )),
where P(X ) is the power set. The counting measure
τ : P(X) → [0, ∞] is then defined as τ(A) = the number of elements of A if A is finite, and τ(A) = ∞ otherwise.
That is, τ(A) < ∞ if and only if A has finitely many elements.
32/141
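If, on the other hand, $\tau(\bigcup_{n\in\mathbb{N}} A_n) = \infty$, then at least one of the following holds:
1 There exists an n_0 ∈ N such that τ(A_{n_0}) = ∞.
2 τ(A_n) > 0 for infinitely many n ∈ N.
In both cases $\sum_{n\in\mathbb{N}} \tau(A_n) = \infty$, as desired (in case 2 because τ takes values in {0, 1, 2, ...} ∪ {∞}, so τ(A_n) ≥ 1 for infinitely many n). Thus, τ is a measure on (X, P(X)).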
33/141
Theorem 11 (Fundamental properties of measures)
34/141
Comments
35/141
Proof of Theorem 11
36/141
(4): Let (An )n∈N be an arbitrary sequence of sets in A and define
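$$B_1 = A_1 \quad\text{and}\quad B_n = A_n \setminus \bigcup_{j=1}^{n-1} A_j \quad\text{for } n \ge 2. \qquad (4)$$
Then the sets B_n are pairwise disjoint, and
$$\bigcup_{n\in\mathbb{N}} A_n = \bigcup_{n\in\mathbb{N}} B_n \quad\text{and}\quad \bigcup_{n=1}^N A_n = \bigcup_{n=1}^N B_n \quad\text{for all } N \in \mathbb{N},$$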
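The construction (4) is easy to mirror on finite sets. This sketch (the function name `disjointify` is ours) builds the B_n from a list of sets and checks the three facts used in the proof: the B_n are pairwise disjoint, B_n ⊆ A_n, and the partial unions of the A_n and B_n agree.

```python
def disjointify(sets):
    """B_1 = A_1 and B_n = A_n minus (A_1 u ... u A_{n-1}), as in (4)."""
    seen = set()  # running union A_1 u ... u A_{n-1}
    result = []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result


A = [{1, 2, 3}, {2, 3, 4}, {5}, {1, 5, 6}]
B = disjointify(A)

# The B_n are pairwise disjoint, B_n is a subset of A_n, and the partial unions agree.
assert all(B[i].isdisjoint(B[j]) for i in range(len(B)) for j in range(i + 1, len(B)))
assert all(b <= a for a, b in zip(A, B))
assert all(set().union(*A[:N]) == set().union(*B[:N]) for N in range(1, len(A) + 1))
```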
37/141
(5): Assume in addition that A1 ⊆ A2 ⊆ A3 ⊆ ... and construct disjoint sets B_n as in (4), such that
$$A_N = \bigcup_{n=1}^N A_n = \bigcup_{n=1}^N B_n.$$
Then
$$\mu\Big(\bigcup_{n\in\mathbb{N}} A_n\Big) = \sum_{n\in\mathbb{N}} \mu(B_n) = \lim_{N\to\infty} \sum_{n=1}^N \mu(B_n).$$
Use that the B_n are disjoint to apply part (1) of this theorem to get
$$\cdots = \lim_{N\to\infty} \mu\Big(\bigcup_{n=1}^N B_n\Big) = \lim_{N\to\infty} \mu(A_N).$$
38/141
(6): Assume that A1 ⊇ A2 ⊇ A3 ⊇ ... and, to begin with, that µ(X) < ∞. Hence, µ(A) < ∞ for all A ∈ A.
Clearly A1^c ⊆ A2^c ⊆ A3^c ⊆ ..., and so by part (5) of this theorem
$$\mu\Big(\big(\bigcap_{n\in\mathbb{N}} A_n\big)^c\Big) = \mu\Big(\bigcup_{n\in\mathbb{N}} A_n^c\Big) = \lim_{N\to\infty} \mu(A_N^c).$$
Thus, recalling that all values of µ are finite, we can use part (3) to conclude
$$\mu(A_N) = \mu(X) - \mu(A_N^c) \to \mu(X) - \mu\Big(\big(\bigcap_{n\in\mathbb{N}} A_n\big)^c\Big) = \mu\Big(\bigcap_{n\in\mathbb{N}} A_n\Big).$$
39/141
If µ(X) = ∞ but µ(A1) < ∞, let the measure µ_{A1} : A → [0, ∞) be defined as µ_{A1}(A) := µ(A ∩ A1), A ∈ A.
40/141
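Some finiteness assumption in part (6) cannot simply be dropped. A standard counterexample uses the counting measure τ from above on X = N:

```latex
\begin{gathered}
A_n = \{n, n+1, n+2, \dots\} \subseteq \mathbb{N}, \qquad A_1 \supseteq A_2 \supseteq \cdots, \qquad \tau(A_n) = \infty \ \text{for all } n,\\
\bigcap_{n\in\mathbb{N}} A_n = \emptyset, \qquad\text{so}\qquad \lim_{n\to\infty} \tau(A_n) = \infty \neq 0 = \tau\Big(\bigcap_{n\in\mathbb{N}} A_n\Big).
\end{gathered}
```

Here every A_n has infinite measure, so neither µ(X) < ∞ nor µ(A1) < ∞ holds, and continuity from above fails.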
Measurable functions
41/141
Recall the following properties of functions.[8]
A function associates a unique real number y = f(x) with each element of a set X of real numbers.
X is the domain.
Y = {f(x) : x ∈ X} is the range.
The definitions extend to arbitrary sets X, Y. In this more general context, f may be called a mapping.
If x ∈ X then y = f(x) is the image of x.
Every element of X with a given element y ∈ Y as its image is called a preimage of y.
If the preimage of y is unique, we denote it by f⁻¹(y).
If A is a subset of X then {f(x) : x ∈ A} is the image of A.
If B is a subset of Y then f⁻¹(B) = {x ∈ X : f(x) ∈ B} is the preimage of B.
If no element in B has a preimage then f⁻¹(B) = ∅.
[8] Math topic 1, p. 15-18; Kolmogorov and Fomin (1970, p. 4-5)
42/141
Examples of functions.
f(x) = x² for x ∈ R is a function.
For y > 0, the solutions to y = x² with x ∈ R are x = ±√y. These are not unique, so the inverse relation y ↦ x is not a function.
Examples of preimages.
Let f(x) = x² for x ∈ R. Then, for example, f⁻¹([1, 4]) = [−2, −1] ∪ [1, 2], and f⁻¹((−∞, 0)) = ∅ since no negative number has a preimage.
For general f and sets A, B:[9]
f⁻¹(A ∪ B) = f⁻¹(A) ∪ f⁻¹(B).
f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B).
f(A ∪ B) = f(A) ∪ f(B).
These results extend to unions and intersections of an arbitrary number of sets.
[9] Kolmogorov and Fomin (1970, p. 5-6)
43/141
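The rules above can be checked on a finite stand-in for the domain. In this sketch we restrict f(x) = x² to the integers −3, ..., 3 (an illustrative assumption, since R cannot be enumerated). The last line shows why images, unlike preimages, are not listed as commuting with intersections: f(C ∩ D) can be strictly smaller than f(C) ∩ f(D).

```python
# f(x) = x**2 restricted to a finite grid of integers (a stand-in for R).
X = range(-3, 4)


def f(x):
    return x ** 2


def preimage(B):
    """f^{-1}(B) = {x in X : f(x) in B}."""
    return {x for x in X if f(x) in B}


def image(A):
    """f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}


A, B = {0, 1, 4}, {4, 9}
assert preimage(A | B) == preimage(A) | preimage(B)
assert preimage(A & B) == preimage(A) & preimage(B)
assert preimage({4}) == {-2, 2}  # two preimages: no single-valued f^{-1}(4)
assert preimage({-1}) == set()   # -1 has no preimage

C, D = {-1, 0}, {0, 1}
assert image(C | D) == image(C) | image(D)
# Images need not commute with intersections: f(C & D) = {0}, f(C) & f(D) = {0, 1}.
assert image(C & D) == {0} and image(C) & image(D) == {0, 1}
```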
On metric spaces, continuity is defined as follows.[10]
Let f map (X, ρ_X) into (Y, ρ_Y). Then f is continuous at the point x0 ∈ X if for every ε > 0 there exists δ > 0 such that
ρ_Y(f(x), f(x0)) < ε whenever ρ_X(x, x0) < δ.
45/141
Example of measurable function
Let (X, A) be a measurable space and define for every A ∈ A the indicator function 1_A : X → R as
$$1_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \in A^c. \end{cases}$$
48/141
Lemma 15
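Let X be a non-empty set and suppose that D is a family of subsets of X which all possess a property P. Furthermore, assume that
$$\mathcal{E}(P) := \{A \subseteq X : A \text{ has property } P\}$$
is a σ-algebra in X. Then σ(D) ⊆ E(P), that is, every set in σ(D) has property P.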
49/141
Lemma 16 (Useful rules for measurability)
50/141
Proof of Lemma 16
(1) We want to show that f −1 (B) ∈ A for all B ∈ B. This will
follow from Lemma 15 upon showing that
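P := {B ⊆ Y : f⁻¹(B) ∈ A}
$$f^{-1}\Big(\bigcup_{n\in\mathbb{N}} B_n\Big) = \bigcup_{n\in\mathbb{N}} f^{-1}(B_n) \in \mathcal{A},$$
$$(g \circ f)^{-1}(C) = \{x \in X : g(f(x)) \in C\} = \{x \in X : f(x) \in g^{-1}(C)\} = f^{-1}\big(g^{-1}(C)\big) \in \mathcal{A},$$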
52/141
Continuity implies measurability
Theorem 17 (Continuity implies measurability)
Let (S, ρS ) and (T , ρT ) be metric spaces equipped with the
corresponding Borel σ-algebras B(S) and B(T ), respectively. Then
every continuous mapping f : S → T is B(S)-B(T )-measurable.
M(A) := {f : X → R : f is A-B(R)-measurable}
ϕ(f1 , . . . , fd ) : X → R
cf, f + g, f · g, f ∧ g, f ∨ g
57/141
Measurability and limits
58/141
Functions with values in R
59/141
Measurability is preserved under limits
Define
M(A) = {f : X → R : f is A-B(R)-measurable}
M(A)+ = {f ∈ M(A) : f (x) ≥ 0 for all x ∈ X }
f −1 ([−∞, b]) ∈ A,
60/141
One key advantage of working with measurable functions is
that measurability is preserved under pointwise limits.
As mentioned, this is in contrast to continuity, which is not preserved under pointwise limits.
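A classic illustration of this contrast, sketched numerically: each f_n(x) = x^n is continuous on [0, 1], but the pointwise limit is the indicator of {1}, which is discontinuous at 1 yet still Borel measurable (it is the indicator of a closed set). The second loop shows that the convergence is not uniform, which is the kind of convergence the Riemann integral typically needs.

```python
def f_n(x, n):
    """The continuous function x -> x**n on [0, 1]."""
    return x ** n


def f_limit(x):
    """Pointwise limit of f_n on [0, 1]: the indicator of {1}, discontinuous at 1."""
    return 1.0 if x == 1.0 else 0.0


# Pointwise convergence at a few points (n large).
grid = [0.0, 0.5, 0.9, 0.99, 1.0]
print([abs(f_n(x, 2000) - f_limit(x)) < 1e-6 for x in grid])  # [True, True, True, True, True]

# Not uniform: x_n = 0.5 ** (1 / n) < 1 satisfies f_n(x_n) = 0.5 for every n.
for n in [10, 100, 1000]:
    x_n = 0.5 ** (1 / n)
    print(n, x_n, f_n(x_n, n))
```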
Define the function inf_{n∈N} f_n : X → R as (inf_{n∈N} f_n)(x) := inf{f_n(x) : n ∈ N}, x ∈ X.
61/141
Theorem 19 (Limits and measurability)
1 Let (X , A) be a measurable space and let (fn )n∈N be a
sequence of functions in M(A). Then the functions
62/141
Proof of Theorem 19
[15] Recall that for any A ⊆ R it holds that −inf A = sup(−A), and use this with A = {f_n(x) : n ∈ N} for each x ∈ X.
64/141
A final note on stability of measurability in M(A)
Theorem 20
Let f , g be functions in M(A) and let c be a constant in R. Then
1 the functions cf , f ∧ g , f ∨ g and fg are elements of M(A).
2 if
and
We shall now introduce the Lebesgue integral and study its main
properties. It has several advantages over the Riemann integral.
1 It is defined for a broader class of functions.
2 It is much more stable under pointwise limits of sequences of
functions. That is, pointwise limits and integration can often
be interchanged. For the Riemann integral we typically need
uniform convergence.
3 The Lebesgue integral is easily defined for functions on an
arbitrary measure space (X , A, µ). The Riemann integral is
defined for functions on R. This is important in probability
theory where the random variables are defined on a probability
space (Ω, F, P) and expected values are Lebesgue integrals $\mathbb{E}X = \int X(\omega)\,P(d\omega)$.
66/141
Simple functions
67/141
Given a measure space (X, A, µ), a_i ∈ R and A_i ∈ A, i = 1, ..., n, we call s : X → R defined via
$$s(x) = \sum_{i=1}^n a_i 1_{A_i}(x) \qquad (6)$$
a simple function.
Observe that s is A-B(R)-measurable by part (2) of Theorem 18.
Denote by SM(A) the set of simple functions and by SM(A)+ the set of non-negative simple functions. Clearly SM(A) is a vector space, so we can apply linear operations.
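It is clear that we can always choose the A_i in (6) such that they are disjoint and $\bigcup_{i=1}^n A_i = X$. For example, the function f : R → R given by
$$f = 3 \cdot 1_{[1,3]} + 2 \cdot 1_{[2,4]}$$
can be rewritten with disjoint sets as
$$f = 3 \cdot 1_{[1,2)} + 5 \cdot 1_{[2,3]} + 2 \cdot 1_{(3,4]} + 0 \cdot 1_{\mathbb{R}\setminus[1,4]}.$$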
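The two representations of this example can be compared directly. The sketch below (helper names are ours) encodes each interval as (left, right, left_closed, right_closed), checks that both representations agree pointwise on a grid, and evaluates Σ a_i λ1(A_i) for both. We assume the integral of a simple function is the usual Σ a_i µ(A_i), so both representations should give the same value. The term 0 · 1_{R∖[1,4]} is dropped, since it contributes 0 pointwise and, with the convention 0 · ∞ = 0, also to the integral.

```python
from fractions import Fraction


def indicator(interval):
    """Indicator of an interval given as (left, right, left_closed, right_closed)."""
    lo, hi, lo_closed, hi_closed = interval

    def ind(x):
        above = x >= lo if lo_closed else x > lo
        below = x <= hi if hi_closed else x < hi
        return 1 if (above and below) else 0

    return ind


def evaluate(rep, x):
    """Value at x of the simple function sum_i a_i 1_{A_i}."""
    return sum(a * indicator(I)(x) for a, I in rep)


def integral(rep):
    """sum_i a_i * lambda_1(A_i) for bounded intervals A_i (lambda_1 = length)."""
    return sum(a * (I[1] - I[0]) for a, I in rep)


original = [(3, (1, 3, True, True)), (2, (2, 4, True, True))]
disjoint = [(3, (1, 2, True, False)), (5, (2, 3, True, True)), (2, (3, 4, False, True))]

grid = [Fraction(k, 4) for k in range(-4, 25)]  # -1, -3/4, ..., 6
assert all(evaluate(original, x) == evaluate(disjoint, x) for x in grid)
print(integral(original), integral(disjoint))  # 10 10
```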
71/141
We begin by establishing some fundamental properties of the
integral on SM(A)+ (page 68).
72/141
Proof of Theorem 23
73/141
and so
$$f + g = \sum_{i=1}^n \sum_{j=1}^m a_i 1_{A_i \cap B_j} + \sum_{j=1}^m \sum_{i=1}^n b_j 1_{A_i \cap B_j} = \sum_{i=1}^n \sum_{j=1}^m (a_i + b_j) 1_{A_i \cap B_j}.$$
The third equality uses part (1) of Theorem 11 (page 34).
74/141
(4) Since f, g ∈ SM(A)+ with f ≤ g, the function g − f ≥ 0 is in SM(A)+. We then find by part (3) that
$$\int g\,d\mu = \int \{f + (g - f)\}\,d\mu = \int f\,d\mu + \int (g - f)\,d\mu \ge \int f\,d\mu.$$
75/141
The Lebesgue integral on M(A)+
Lemma 21 (page 69) shows that non-negative functions can be
approximated by simple functions.
Definition 24
Let (X, A, µ) be a measure space and f : X → [0, ∞] a function in M(A)+. We then define the µ-integral ∫f dµ of f via
$$\int f\,d\mu := \sup\Big\{\int s\,d\mu : s \in SM(\mathcal{A})^+,\ s \le f\Big\} \in [0, \infty].$$
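To see the supremum in Definition 24 being approached, take the hypothetical example f(x) = x on [0, 1] with Lebesgue measure and the standard dyadic simple functions s_n = min(n, ⌊2^n f⌋/2^n) ≤ f. We assume this matches the construction referred to as Lemma 21, which is not reproduced here. Since f ≤ 1 ≤ n, the cap at n is inactive, and s_n takes the value k/2^n on [k/2^n, (k+1)/2^n). The integrals are computed exactly with fractions and increase to 1/2.

```python
from fractions import Fraction


def integral_s_n(n):
    """Integral w.r.t. Lebesgue measure on [0, 1] of the simple function
    s_n(x) = floor(2**n * x) / 2**n, which equals k / 2**n on
    [k / 2**n, (k + 1) / 2**n) and satisfies s_n <= f for f(x) = x.
    The point x = 1 (where s_n = 1) is a null set and is ignored."""
    width = Fraction(1, 2 ** n)
    return sum(k * width * width for k in range(2 ** n))  # sum of a_k * lambda(A_k)


for n in range(1, 6):
    print(n, integral_s_n(n))  # increases towards 1/2, the integral of f(x) = x
```

The closed form is 1/2 − 1/2^(n+1), so the supremum over these particular s_n already equals 1/2.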
f1 ≤ f2 ≤ . . .
77/141
Proof of Theorem 25
That f = limn→∞ fn = supn∈N fn ≥ 0 is again an element of
M(A)+ follows from Theorem 19 (page 62).
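Since f_n ≤ sup_{n∈N} f_n = f for all n ∈ N, it holds that
$$\int f_n\,d\mu \le \int f\,d\mu \quad\text{and hence}\quad \sup_{n\in\mathbb{N}} \int f_n\,d\mu \le \int f\,d\mu,$$
Thus, let such an s be given and note that it suffices to show that
$$\alpha \int s\,d\mu \le \sup_{n\in\mathbb{N}} \int f_n\,d\mu \quad\text{for all } \alpha \in (0, 1).$$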
78/141
Thus, let α ∈ (0, 1) be given and set
$$B_n := \{x \in X : \alpha s(x) \le f_n(x)\}.$$
If f(x) > 0 then αs(x) < f(x). Hence, since f_n(x) ↑ f(x), there exists an n_x such that αs(x) ≤ f_n(x) for n ≥ n_x. If f(x) = 0, then also αs(x) = f_n(x) = 0. Since f_1 ≤ f_2 ≤ ..., it follows that B_n ↑ X, i.e. B_1 ⊆ B_2 ⊆ ... and $\bigcup_{n\in\mathbb{N}} B_n = X$. Thus,
and hence
$$\alpha \limsup_{n\to\infty} \int s 1_{B_n}\,d\mu \le \sup_{n\in\mathbb{N}} \int f_n\,d\mu.$$
79/141
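We conclude by showing that the left-hand side of the previous display equals α∫s dµ. Since s ∈ SM(A)+, there exist an N ∈ N and a standard representation with a_1, ..., a_N ≥ 0 such that
$$s = \sum_{i=1}^N a_i 1_{A_i} \quad\text{and so}\quad s 1_{B_n} = \sum_{i=1}^N a_i 1_{A_i \cap B_n}.$$
Therefore,
$$\int s 1_{B_n}\,d\mu = \sum_{i=1}^N a_i \mu(A_i \cap B_n).$$
By part (5) of Theorem 11 (page 34), applied to the increasing sets A_i ∩ B_1 ⊆ A_i ∩ B_2 ⊆ ..., it holds for i = 1, ..., N that
$$\lim_{n\to\infty} \mu(A_i \cap B_n) = \mu\Big(\bigcup_{n\in\mathbb{N}} (A_i \cap B_n)\Big) = \mu\Big(A_i \cap \bigcup_{n\in\mathbb{N}} B_n\Big) = \mu(A_i).$$
It follows that
$$\limsup_{n\to\infty} \int s 1_{B_n}\,d\mu = \sum_{i=1}^N a_i \mu(A_i) = \int s\,d\mu.$$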
80/141
A useful consequence
81/141
Dirichlet’s function
[16] Assume without loss of generality that x1 < ... < xn, so that the smallest distance r between distinct x_i is strictly positive. Hence, in any partition of [0, 1] consisting of intervals of length at most r/2, at most 2n elements contain an x_i (a point on the boundary between two adjacent intervals lies in both), so the upper Darboux sum is at most 2n times the largest interval length. Taking the infimum over such partitions one sees that the upper Riemann integral is 0 [just like the lower Riemann integral clearly is].
82/141
But since D is not Riemann-integrable, we see that pointwise limits do not preserve Riemann-integrability, and it does not make sense to write
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 D(x)\,dx.$$
[17] Alternatively, we can use the Dominated Convergence Theorem, to be introduced in Theorem 33 below.
83/141
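The footnote's counting argument can be checked with exact arithmetic. In this sketch we take f_n to be the indicator of the first n rationals in [0, 1], enumerated by increasing denominator. This enumeration is our own choice; the slides' definition of f_n is not reproduced here. The upper Darboux sum on the uniform partition into m intervals is at most 2n/m, and a point such as 1/2 that sits on a partition boundary is counted in two intervals.

```python
from fractions import Fraction


def first_rationals(n):
    """The first n distinct rationals in [0, 1], enumerated by increasing denominator."""
    out, q = [], 1
    while len(out) < n:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in out:
                out.append(r)
                if len(out) == n:
                    break
        q += 1
    return out


def upper_darboux_sum(points, m):
    """Upper Darboux sum of the indicator of the finite set `points` on the
    uniform partition of [0, 1] into closed intervals [k/m, (k+1)/m]: an
    interval contributes its length 1/m iff it contains at least one point."""
    hit = sum(1 for k in range(m)
              if any(Fraction(k, m) <= x <= Fraction(k + 1, m) for x in points))
    return Fraction(hit, m)


points = first_rationals(5)  # 0, 1, 1/2, 1/3, 2/3
for m in [10, 100, 1000]:
    print(m, upper_darboux_sum(points, m), Fraction(2 * len(points), m))  # sum <= 2n/m
```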
We now use the Monotone Convergence Theorem to show how the
integral on M(A)+ inherits properties from SM(A)+ .
85/141
µ-null set, µ-a.e. and “almost surely”
It turns out that the µ-integral “does not care about null sets”.
Definition 27
A subset N of X is called a µ-null set if there exists an A ∈ A such
that
N ⊆ A and µ(A) = 0.
86/141
Examples of λ1 -null sets
87/141
Consider (X, A, µ). We say that a property holds for µ-almost all x ∈ X if the property holds for all x ∈ X \ N, where either
N is a µ-null set [common in mathematics], or
N ∈ A and µ(N) = 0 [common in probability].
We also say the property holds µ-almost everywhere (µ-a.e.).
In probability, we say almost surely (a.s.) or with probability one; this terminology is standard in probability and statistics.
Examples:
If µ({x ∈ X : f(x) ≠ g(x)}) = 0, we say that f = g µ-almost everywhere, for µ-almost every x, or µ-a.e.
If µ({x ∈ X : lim_{n→∞} f_n(x) does not exist}) = 0, we say that f_n converges µ-almost everywhere, or for µ-almost every x.
88/141
It is often useful that the µ-integral does not register µ-null sets, in the sense of part (4) of the following theorem.
89/141
Proof of Theorem 28
[18] For example, one can use s_n as constructed in the proof of Lemma 21 (page 69).
90/141
For the converse, suppose µ(f > 0) =: ε > 0. We bound
$$\int f\,d\mu \ge \int f 1_{\{f > 1/n\}}\,d\mu \ge \frac{1}{n}\int 1_{\{f > 1/n\}}\,d\mu = \frac{1}{n}\mu(f > 1/n).$$
Since {f > 1/n} ↑ {f > 0} as n → ∞ and µ(f > 0) = ε > 0, part (5) of Theorem 11 (page 34) shows that there exists an n ∈ N such that µ(f > 1/n) > ε/2.
Thus, µ(f > 0) = ε > 0 implies that ∫f dµ > ε/(2n) > 0.
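Part (3): Suppose ∫f dµ < ∞. Define the monotone sequence g_n = n 1_{(f=∞)}. Then f ≥ g ≥ g_n, where g_n ↑ g = ∞ · 1_{(f=∞)}.
Part 4 of Theorem 26 (page 84) then shows
$$\infty > \int f\,d\mu \ge \int g\,d\mu = \lim_{n\to\infty} \int g_n\,d\mu.$$
91/141
Part 2 of Theorem 26 (page 84) then shows
$$\infty > \lim_{n\to\infty} \int n 1_{(f=\infty)}\,d\mu = \lim_{n\to\infty} n \int 1_{(f=\infty)}\,d\mu = \lim_{n\to\infty} n\,\mu(f = \infty),$$
which is only possible if µ(f = ∞) = 0.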
92/141
Integration of real measurable functions
f⁺ = f ∨ 0 and f⁻ = −(f ∧ 0)
|f| = f⁺ + f⁻ and f = f⁺ − f⁻,
94/141
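Because
$$\int |f|\,d\mu = \int f^+\,d\mu + \int f^-\,d\mu \le 2\Big(\int f^+\,d\mu \vee \int f^-\,d\mu\Big)$$
and
$$\int f^+\,d\mu \vee \int f^-\,d\mu \le \int f^+\,d\mu + \int f^-\,d\mu = \int |f|\,d\mu,$$
it follows that ∫f⁺dµ ∨ ∫f⁻dµ < ∞ if and only if ∫|f|dµ < ∞. We therefore define
$$\mathcal{L}^1(\mu) = \Big\{f \in \mathcal{M}(\mathcal{A}) : \int |f|\,d\mu < \infty\Big\}.$$
The following expressions are used interchangeably for ∫f dµ:
$$\int f\,d\mu, \qquad \int_X f(x)\,\mu(dx), \qquad \int f(x)\,\mu(dx), \qquad \int_X f(x)\,d\mu(x).$$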
96/141
Theorem 31 (Linearity and other properties of the integral)
97/141
Proof of Theorem 31
(2): As L¹(µ) is a vector space, f + g ∈ L¹(µ) and
(f + g)⁺ − (f + g)⁻ = f + g = f⁺ − f⁻ + g⁺ − g⁻,
and hence (recall that all function values are real)
(f + g)⁺ + f⁻ + g⁻ = (f + g)⁻ + f⁺ + g⁺.
Thus, by part (2) of Theorem 26 (page 84),
$$\int (f+g)^+\,d\mu + \int f^-\,d\mu + \int g^-\,d\mu = \int (f+g)^-\,d\mu + \int f^+\,d\mu + \int g^+\,d\mu.$$
Since all integrals in the previous display are finite, one gets
$$\int (f+g)\,d\mu = \int (f+g)^+\,d\mu - \int (f+g)^-\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu + \int g^+\,d\mu - \int g^-\,d\mu = \int f\,d\mu + \int g\,d\mu.$$
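The proof can be replayed on a small discrete measure space. In this hypothetical example X = {0, ..., 4} carries point masses w[x], and f, g are given by their values. The sketch checks the pointwise identity (f + g)⁺ + f⁻ + g⁻ = (f + g)⁻ + f⁺ + g⁺ and the resulting linearity ∫(f + g)dµ = ∫f dµ + ∫g dµ, with integrals defined as ∫h⁺dµ − ∫h⁻dµ. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

# A hypothetical finite measure space: X = {0, ..., 4} with point masses mu({x}) = w[x].
w = [Fraction(1, 2), Fraction(1), Fraction(2), Fraction(1, 4), Fraction(3)]
f = [3, -1, 2, -5, 0]
g = [-4, 2, -2, 1, 1]


def pos(h):
    """h^+ = h v 0, pointwise."""
    return [max(v, 0) for v in h]


def neg(h):
    """h^- = -(h ^ 0), pointwise."""
    return [max(-v, 0) for v in h]


def integral_nonneg(h):
    """Integral of a non-negative function w.r.t. the discrete measure."""
    return sum(wx * hx for wx, hx in zip(w, h))


def integral(h):
    """Integral of a real function as the integral of h^+ minus that of h^-."""
    return integral_nonneg(pos(h)) - integral_nonneg(neg(h))


s = [a + b for a, b in zip(f, g)]
# Pointwise identity (f+g)^+ + f^- + g^- = (f+g)^- + f^+ + g^+ used in the proof.
assert all(sp + fn + gn == sn + fp + gp for sp, fn, gn, sn, fp, gp in
           zip(pos(s), neg(f), neg(g), neg(s), pos(f), pos(g)))
assert integral(s) == integral(f) + integral(g)
print(integral(f), integral(g), integral(s))  # 13/4 -3/4 5/2
```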
99/141
Interchanging limits and integration
100/141
Monotone Convergence Theorem
102/141
Prior to illustrating the Monotone Convergence Theorem, let us note that if (Ω, F, P) is a probability space on which a random variable X with values in (R, B(R)) is defined, then we write
$$\mathbb{E}X := \int_\Omega X(\omega)\,P(d\omega) = \int_\Omega X\,dP, \qquad X \in \mathcal{L}(P).$$
103/141
Example: AR(1)
104/141
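Consider first $\sum_{i=0}^\infty |\alpha|^i |\varepsilon_{t-i}|$.
Clearly, $\lim_{N\to\infty} \sum_{i=0}^N |\alpha|^i |\varepsilon_{t-i}|$ exists in [0, ∞].
Since $\sum_{i=0}^N |\alpha|^i |\varepsilon_{t-i}|$ is F-B(R)-measurable (by Theorem 18, page 56) and limits preserve measurability (by Theorem 19, page 62), we conclude that $\lim_{N\to\infty} \sum_{i=0}^N |\alpha|^i |\varepsilon_{t-i}|$ is F-B(R)-measurable.
Since $\sum_{i=0}^N |\alpha|^i |\varepsilon_{t-i}| \uparrow \sum_{i=0}^\infty |\alpha|^i |\varepsilon_{t-i}|$ as N → ∞, the Monotone Convergence Theorem (page 101) and linearity of the integral yield that
$$\mathbb{E}\sum_{i=0}^\infty |\alpha|^i |\varepsilon_{t-i}| = \lim_{N\to\infty} \sum_{i=0}^N |\alpha|^i\,\mathbb{E}|\varepsilon_{t-i}| \le C/(1 - |\alpha|) < \infty.$$
Thus, $\sum_{i=0}^\infty |\alpha|^i |\varepsilon_{t-i}| < \infty$ P-a.s. (see Theorem 28, part 3, page 89).
Since $\sum_{i=0}^\infty \alpha^i \varepsilon_{t-i}$ converges absolutely P-a.s., it also converges P-a.s.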
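A Monte Carlo sanity check of the bound, under assumptions we choose for illustration only (the model assumptions on the previous slide are not reproduced here): |α| < 1 and ε_t i.i.d. N(0, 1), so C = E|ε_t| = √(2/π) and the bound C/(1 − |α|) is attained. The infinite sum is truncated at 60 terms, where |α|^60 is negligible for α = 0.5.

```python
import math
import random


def mc_expectation(alpha, terms=60, reps=10_000, seed=1):
    """Monte Carlo estimate of E sum_{i=0}^{terms-1} |alpha|^i |eps_{t-i}|
    with eps i.i.d. N(0, 1) (an illustrative assumption)."""
    rng = random.Random(seed)
    a = abs(alpha)
    total = 0.0
    for _ in range(reps):
        total += sum(a ** i * abs(rng.gauss(0.0, 1.0)) for i in range(terms))
    return total / reps


alpha = 0.5
C = math.sqrt(2 / math.pi)    # E|eps| for a standard normal
bound = C / (1 - abs(alpha))  # C / (1 - |alpha|), attained in this i.i.d. case
estimate = mc_expectation(alpha)
print(round(estimate, 3), round(bound, 3))
```

The estimate should agree with the bound up to Monte Carlo error (standard error roughly 0.007 here).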
105/141
Lebesgue’s Dominated Convergence Theorem
106/141
Example
Consider the measure space (R, B(R), λ1) and let f ∈ L¹(λ1). We show that
$$\lim_{n\to\infty} \frac{1}{2n}\int_{[-n,n]} f(x)\,\lambda_1(dx) = \lim_{n\to\infty} \frac{1}{2n}\int f(x) 1_{[-n,n]}(x)\,\lambda_1(dx) = 0. \qquad (7)$$
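One possible argument for (7), sketched here because the slide's proof is not reproduced: dominated convergence with |f| as the dominating function. A concrete instance with the hypothetical choice f(x) = e^{−|x|} follows.

```latex
\begin{gathered}
g_n(x) := \tfrac{1}{2n} f(x) 1_{[-n,n]}(x) \to 0 \ \text{for every } x \in \mathbb{R}, \qquad |g_n| \le |f| \in \mathcal{L}^1(\lambda_1),\\
\text{so } \lim_{n\to\infty} \int g_n\,d\lambda_1 = 0 \ \text{by dominated convergence.}\\[4pt]
\text{Example } f(x) = e^{-|x|}: \quad \frac{1}{2n}\int_{[-n,n]} e^{-|x|}\,\lambda_1(dx) = \frac{1}{2n}\cdot 2\big(1 - e^{-n}\big) = \frac{1 - e^{-n}}{n} \longrightarrow 0.
\end{gathered}
```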
108/141
A useful consequence
and so
$$\int_0^\infty x e^{-x}\,\lambda_1(dx) = \lim_{n\to\infty} \int_0^n x e^{-x}\,dx = \lim_{n\to\infty} \big(1 - (n+1)e^{-n}\big) = 1.$$
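A numerical check of the closed form ∫_0^n x e^{−x} dx = 1 − (n+1)e^{−n} used above, via a composite Simpson rule written out by hand (so no external library is assumed). As n grows, both columns approach 1, in line with the monotone limit.

```python
import math


def simpson(h, a, b, m=2000):
    """Composite Simpson rule for the integral of h over [a, b] with m (even) subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    step = (b - a) / m
    total = h(a) + h(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * h(a + k * step)
    return total * step / 3


def integrand(x):
    return x * math.exp(-x)


for n in [1, 5, 10, 20]:
    closed_form = 1 - (n + 1) * math.exp(-n)
    print(n, simpson(integrand, 0, n), closed_form)
```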
110/141
Improper integrals
111/141
Regularity conditions: Example
113/141
For illustration, let us show how one can easily establish a
generalized version of Markov’s inequality within the
measure-theoretic framework we have developed.
Let ψ : R → [0, ∞) be increasing (hence B(R)-B(R)-measurable).
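Then, for all t ∈ (0, ∞) such that ψ(t) ∈ (0, ∞),
$$\mu\big(|f| \ge t\big) \le \mu\big(\psi(|f|) \ge \psi(t)\big) = \int 1_{\{\psi(|f|) \ge \psi(t)\}}\,d\mu \le \int \frac{\psi(|f|)}{\psi(t)} 1_{\{\psi(|f|) \ge \psi(t)\}}\,d\mu \le \frac{\int \psi(|f|)\,d\mu}{\psi(t)}.$$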
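The generalized Markov inequality can be checked on a small hypothetical discrete measure. We take the increasing function ψ(x) = (x ∨ 0)², which is 0 on (−∞, 0] and x² on [0, ∞), and compare both sides for several t with exact fractions.

```python
from fractions import Fraction

# A hypothetical discrete measure: point masses w[k] at points k = 0, 1, 2, 3,
# and a function f given by its values f[k] at those points.
w = [Fraction(1), Fraction(2), Fraction(1, 2), Fraction(3)]
f = [Fraction(-3), Fraction(1, 2), Fraction(2), Fraction(0)]


def psi(x):
    """An increasing function R -> [0, inf): 0 on (-inf, 0] and x**2 on [0, inf)."""
    return max(x, 0) ** 2


def markov_sides(t):
    """Return mu(|f| >= t) and (integral of psi(|f|) d mu) / psi(t)."""
    lhs = sum(wk for wk, fk in zip(w, f) if abs(fk) >= t)
    rhs = sum(wk * psi(abs(fk)) for wk, fk in zip(w, f)) / psi(t)
    return lhs, rhs


for t in [Fraction(1, 4), Fraction(1), Fraction(2), Fraction(5)]:
    lhs, rhs = markov_sides(t)
    print(t, lhs, rhs)
    assert lhs <= rhs
```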
114/141
Jensen’s inequality
$$-g(\mathbb{E}X) \le \mathbb{E}[-g(X)] \iff \mathbb{E}g(X) \le g(\mathbb{E}X).$$
[19] Here, for any x ∈ R^m, $\|x\| = \sqrt{\sum_{i=1}^m x_i^2}$ denotes the Euclidean norm.
115/141
Induced measure and substitution rule
117/141
Observe that (9) can equivalently be written as
$$\int f\,d(\mu \circ T^{-1}) = \int (f \circ T)\,d\mu = \int f(T)\,d\mu.$$
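On a finite space the substitution rule reduces to regrouping a sum, which makes it easy to check. In this hypothetical example µ is a measure on four points, T maps them into {0, 1, 2}, and µ_T = µ ∘ T⁻¹ collects the mass of every preimage. Both sides of the rule are then computed exactly.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite measure space (X, P(X), mu) and a map T: X -> Y.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6), "d": Fraction(2)}
T = {"a": 0, "b": 1, "c": 1, "d": 2}


def pushforward(mu, T):
    """mu_T = mu o T^{-1}: the mass of y is mu(T^{-1}({y}))."""
    mu_T = defaultdict(Fraction)
    for x, mass in mu.items():
        mu_T[T[x]] += mass
    return dict(mu_T)


def f(y):
    return y ** 2 - 1


mu_T = pushforward(mu, T)
lhs = sum(f(y) * m for y, m in mu_T.items())   # integral of f d(mu o T^{-1})
rhs = sum(f(T[x]) * m for x, m in mu.items())  # integral of (f o T) d mu
assert lhs == rhs
print(mu_T, lhs)
```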
118/141
Proof of Lemma 36
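First, we show (9) for indicator functions. Let B ∈ B. By the definition of µ_T, and since $1_{T^{-1}(B)} = 1_B \circ T$,
$$\int 1_B\,d\mu_T = \mu_T(B) = \mu\big(T^{-1}(B)\big) = \int 1_{T^{-1}(B)}\,d\mu = \int (1_B \circ T)\,d\mu.$$
$$\iff \int f^+(T)\,d\mu \wedge \int f^-(T)\,d\mu < \infty$$
$$\iff \int (f \circ T)^+\,d\mu \wedge \int (f \circ T)^-\,d\mu < \infty$$
$$\iff f \circ T \in \mathcal{L}(\mu).$$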
120/141
Similarly, replacing “∧” by “∨” in the previous display, we conclude that
f ∈ L¹(µ_T) ⟺ f ∘ T ∈ L¹(µ).
121/141
Product measures and Tonelli’s Theorem
123/141
Tonelli’s theorem tells us how to integrate non-negative
functions with respect to product measures.
We now discuss how to integrate functions that are not
necessarily non-negative.
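A classic example of why something beyond Tonelli is needed for signed functions: with counting measure on N × N, take a_ij = 1 if j = i, a_ij = −1 if j = i + 1, and 0 otherwise. Every row sums to 0, but the column sums are 1, 0, 0, ..., so the two iterated sums disagree. This is possible because Σ|a_ij| = ∞. Each row and column has finite support, so the inner sums below are exact. Only the outer sums are truncated at N.

```python
def a(i, j):
    """a_{ij} = 1 if j == i, -1 if j == i + 1, and 0 otherwise (i, j >= 1)."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i):
    # Row i has non-zero entries only in columns i and i + 1, so this is the exact row sum.
    return a(i, i) + a(i, i + 1)


def col_sum(j):
    # Column j has non-zero entries only in rows j - 1 (when j >= 2) and j.
    return sum(a(i, j) for i in range(max(1, j - 1), j + 1))


N = 1000
print(sum(row_sum(i) for i in range(1, N + 1)))  # 0: every row sums to 0
print(sum(col_sum(j) for j in range(1, N + 1)))  # 1: only column 1 is non-zero
```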
125/141
Radon-Nikodym
Theorem 41 (Radon-Nikodym)
If µ and ν are σ-finite measures on A and µ ≪ ν, then there exists a ν-a.e. uniquely determined f ∈ M(A)+, called the density of µ with respect to ν, such that for every A ∈ A
$$\mu(A) = \int_A f\,d\nu \quad\text{and}\quad \int h\,d\mu = \int h f\,d\nu \ \text{ for all } h \in \mathcal{M}(\mathcal{A})^+.$$
126/141
Example: Normal distribution
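$$f_{\eta,\sigma^2}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\eta)^2/(2\sigma^2)}, \qquad x \in \mathbb{R},$$
with respect to the Lebesgue measure λ1.
That is,
$$N(\eta, \sigma^2)(A) = \int_A f_{\eta,\sigma^2}(x)\,\lambda_1(dx), \qquad A \in \mathcal{B}(\mathbb{R}).$$
$$f_\lambda(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x \in \mathbb{N}_0,$$
with respect to the counting measure τ.
That is,
$$\mathrm{Poi}(\lambda)(A) = \int_A f_\lambda(x)\,\tau(dx) = \sum_{x \in A} \frac{e^{-\lambda}\lambda^x}{x!}, \qquad A \in \mathcal{P}(\mathbb{N}_0).$$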
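Integrating against the counting measure τ is just summation, so Poi(λ)(A) can be computed directly from the density. The sketch below uses the hypothetical value λ = 2 and checks P(X ≤ 2) = 5e^{−2} and that the total mass is 1, up to a negligible truncation of N_0 at 99.

```python
import math


def poisson_density(x, lam):
    """f_lambda(x) = exp(-lambda) * lambda**x / x! for x in N_0."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_measure(A, lam):
    """Poi(lambda)(A) = integral of f_lambda over A w.r.t. counting measure = sum over A."""
    return sum(poisson_density(x, lam) for x in A)


lam = 2.0
print(poisson_measure({0, 1, 2}, lam))      # P(X <= 2) = 5 e^{-2}, about 0.6767
print(poisson_measure(range(0, 100), lam))  # total mass, about 1 (the tail beyond 99 is negligible)
```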
128/141
Applications to measure-theoretic probability
129/141
Independence
Definition 42
1 Let X_1, ..., X_n be random functions defined on (Ω, F, P) with values in the measurable spaces $(\mathcal{X}_1, \mathcal{A}_1), \dots, (\mathcal{X}_n, \mathcal{A}_n)$, respectively. We say that X_1, ..., X_n are independent if for all A_1, ..., A_n in $\mathcal{A}_1, \dots, \mathcal{A}_n$, respectively, it holds that
$$P(X_1 \in A_1, \dots, X_n \in A_n) = \prod_{i=1}^n P(X_i \in A_i).$$
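For n = 2 and finite value spaces, Definition 42 can be checked exhaustively, since every subset of the value space is an event. The sketch below uses the hypothetical example of two fair dice with uniform P on the 36 outcomes. The two faces are independent, while the first face and the maximum of both faces are not.

```python
from fractions import Fraction
from itertools import chain, combinations, product

# Omega: outcomes of two fair dice, each with probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
P_POINT = Fraction(1, len(OMEGA))


def prob(event):
    """P of the set of outcomes w for which event(w) is True."""
    return sum((P_POINT for w in OMEGA if event(w)), Fraction(0))


def subsets(values):
    """All subsets of a finite set of values (the power set)."""
    values = list(values)
    return chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))


def independent(X1, X2, values1, values2):
    """Check P(X1 in A1, X2 in A2) == P(X1 in A1) P(X2 in A2) for all A1, A2."""
    for A1 in subsets(values1):
        p1 = prob(lambda w: X1(w) in A1)
        for A2 in subsets(values2):
            joint = prob(lambda w: X1(w) in A1 and X2(w) in A2)
            if joint != p1 * prob(lambda w: X2(w) in A2):
                return False
    return True


def first(w):
    return w[0]


def second(w):
    return w[1]


def maximum(w):
    return max(w)


print(independent(first, second, range(1, 7), range(1, 7)))   # True
print(independent(first, maximum, range(1, 7), range(1, 7)))  # False
```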
130/141
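The distribution $P_{X_1,\dots,X_n}$ of (X_1, ..., X_n) on $\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_n$ is defined as the induced measure $P_{X_1,\dots,X_n} := P \circ (X_1, \dots, X_n)^{-1}$.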
131/141
Proof of Theorem 43
[20] Since |X| ≤ 1 + X² and P is a probability measure, it follows that L²(P) ⊆ L¹(P); moreover, |XY| ≤ (X² + Y²)/2, so the covariance is well-defined.
134/141
A law of large numbers
Let us finally see how we can easily prove a law of large numbers
within the measure-theoretic framework.
Theorem 45
Let $(X_i)_{i\in\mathbb{N}}$ be an independent sequence of random variables on a probability space (Ω, F, P). Assume that the X_i are identically distributed, that is $P_{X_i} = P_{X_1}$ for all i ∈ N. If X_1 ∈ L²(P), then for all ε > 0 and n ∈ N
$$P\Big(\Big|\frac{1}{n}\sum_{i=1}^n X_i - \mathbb{E}[X_1]\Big| > \varepsilon\Big) \le \frac{\mathrm{Var}(X_1)}{n\varepsilon^2}. \qquad (11)$$
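A small simulation makes the bound (11) concrete. We assume, for illustration, i.i.d. Bernoulli(p) variables, so Var(X_1) = p(1 − p). With p = 0.5, n = 100 and ε = 0.1, the right-hand side of (11) is 0.25, while the simulated frequency of a deviation larger than ε is typically only a few percent. This shows that the Chebyshev-type bound is valid but crude.

```python
import random


def exceedance_frequency(n, eps, reps=5_000, p=0.5, seed=7):
    """Fraction of simulated samples with |mean of n Bernoulli(p) draws - p| > eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(mean - p) > eps
    return hits / reps


n, eps, p = 100, 0.1, 0.5
bound = p * (1 - p) / (n * eps ** 2)  # Var(X_1) / (n eps^2) from (11)
freq = exceedance_frequency(n, eps)
print(freq, bound)  # the empirical frequency stays below the (crude) bound
```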
$\sigma(T) = \{T^{-1}(B) : B \in \mathcal{T}\}$.
138/141
Let us characterize the function ϕ. By the definition of a conditional expectation it satisfies[21]
$$\int_{T^{-1}(B)} \varphi(T(\omega))\,P(d\omega) = \int_{T^{-1}(B)} X(\omega)\,P(d\omega) \quad\text{for all } B \in \mathcal{T},$$
where P_T = P ∘ T⁻¹.
A $\mathcal{T}$-B(R)-measurable function ϕ satisfying the above two displays is also called a conditional expectation of X given T = t.
One often uses the notation E(X | T = t) := ϕ(t) for any such function ϕ.
[21] Observe that a typical element of σ(T) is of the form T⁻¹(B) for B ∈ $\mathcal{T}$.
139/141
Summarizing advantages of the measure theoretic approach
140/141
References
Axler, S. (2021): Measure, integration and real analysis,
Springer.
Bartle, R. G. and D. R. Sherbert (2011): Introduction to
real analysis, Wiley, 4th ed.
Dudley, R. M. (2018): Real analysis and probability, CRC Press.
Hoffmann-Jørgensen, J. (1994): Probability with a view
toward Statistics, vol. 1, Chapman and Hall.
Kolmogorov, A. N. and S. V. Fomin (1970): Introductory
Real Analysis, Dover.
Lehmann, E. and G. Casella (1998): Theory of Point
Estimation, Springer.
Lehmann, E. and J. Romano (2005): Testing Statistical
Hypotheses, Springer.
Liese, F. and K.-J. Miescke (2008): Statistical Decision
Theory, Springer.
Resnick, S. (2019): A probability path, Springer.
Shiryaev, A. N. (1996): Probability, Springer, 2nd ed.
141/141