
Review of the concepts of probability IV

Presidency University

September, 2024
Introduction

I We have talked about real valued measurable functions.

I We have seen that if f, g are real valued measurable functions, then f ± g, fg and f/g (if g ≠ 0 a.e.) are also measurable functions.

I In fact we can show that min{f , g } and max{f , g } are also


measurable functions. These are called Lattice operations.

I But what about limit operations?


Limit operations

I If {fn}n≥1 is a sequence of real valued measurable functions, then
  I lim sup_{n→∞} fn is measurable.
  I lim inf_{n→∞} fn is measurable.
  I If f = lim_{n→∞} fn exists (pointwise limit), then f is measurable.

I However lim sup_{n→∞} fn can be ∞ and lim inf_{n→∞} fn can be −∞ (unless the fn's are bounded).

I So we need to talk about measurability of functions taking


values in [−∞, ∞] which we call extended real valued
functions.
Extended real valued measurable functions
I Consider a measurable space (Ω, A). A function f : Ω → [−∞, ∞] is called an extended real valued measurable function if f⁻¹(B) ∈ A for every B ∈ B, and also for B = {∞} and B = {−∞}.
I Like real valued measurable functions, here also, we can prove similarly
that it is enough to verify the condition for any generating class C of B.
I Further all the algebraic and lattice properties hold with the conventions:

  0·(∞) = 0·(−∞) = 0.
  For real a ≠ 0: a·(∞) = ∞ if a > 0, and −∞ if a < 0; a·(−∞) = −∞ if a > 0, and ∞ if a < 0.
  For a ∈ R, a + ∞ = ∞ and a + (−∞) = −∞.
  ∞ + ∞ = ∞, −∞ − ∞ = −∞, and −∞ + ∞ is not defined.
I Since −∞ + ∞ is not defined, we cannot say that f, g measurable implies f + g is measurable; rather, we need to say that if f + g is well-defined then it is measurable.
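I As a quick illustration of why these conventions must be stated explicitly, here is a minimal Python sketch (not part of the slides; the helper names are ours): IEEE floating-point arithmetic does not follow the convention 0·(±∞) = 0, so code handling extended real valued functions has to implement it by hand.

    import math

    def ext_mul(a, b):
        # Multiply extended reals with the measure-theoretic convention 0*(+-inf) = 0.
        if a == 0 or b == 0:
            return 0.0              # convention from the slide: 0*(inf) = 0*(-inf) = 0
        return a * b                # usual sign rules otherwise

    def ext_add(a, b):
        # Add extended reals; -inf + inf is left undefined, exactly as on the slide.
        if math.isinf(a) and math.isinf(b) and a != b:
            raise ValueError("-inf + inf is not defined")
        return a + b

    inf = math.inf
    print(0 * inf)                  # nan: plain floating point does NOT use our convention
    print(ext_mul(0, inf))          # 0.0
    print(ext_add(5, -inf))         # -inf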
Extended real valued measurable functions (Contd.)

I f extended real valued on (Ω, A) is measurable if and only if


the following hold:
I f −1 ({∞}) ∈ A.
I f −1 ({−∞}) ∈ A.
I For any interval I ⊆ R, f −1 (I ) ∈ A.
Indicator random variable

I Consider a probability space (Ω, A, P). For A ⊆ Ω, let IA : Ω → R be the function

     IA(ω) = 1 if ω ∈ A,  and  IA(ω) = 0 if ω ∉ A.

I Then IA is a random variable iff A ∈ A.


Simple function

I Consider a measurable space (Ω, A). A simple function on


(Ω, A) is a measurable function which takes only finitely many
real values.

I A function f : (Ω, A) → R is a simple function iff

     f = Σ_{i=1}^{n} ci I_{Ai}

where Ai ∈ A for i = 1, 2, ..., n and ci ∈ R. Further, the Ai's can be chosen to form a partition of Ω.
Proposition

I Suppose f ∈ L is nonnegative. Then there exists an increasing


sequence {fn }n≥1 of simple functions whose pointwise limit is
f.
I We shall divide the range of f into intervals of length 1/2^n.

I More precisely, let

     fn = Σ_{k=0}^{n2^n − 1} (k/2^n) I_{f⁻¹([k/2^n, (k+1)/2^n))} + n I_{f⁻¹((n,∞))}.

I Now both f⁻¹([k/2^n, (k+1)/2^n)) and f⁻¹((n, ∞)) belong to A and hence the fn's are non-negative simple functions.
Proposition (Contd.)

I If f (ω) = 0, then fn (ω) = 0 for all n and hence fn converges to


f.

I For all n, fn ≤ f . This is clear from the construction of fn .

I Let f (ω) > 0. Suppose f (ω) = c.


I Then for all n ≥ [c] + 1, |fn(ω) − f(ω)| < 1/2^n.

I Moreover fn ≤ fn+1 and hence we have fn ↑ f .
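I The construction above is easy to check numerically. Below is a small Python sketch (not from the slides; the function f and the sample point are arbitrary choices for illustration) that evaluates fn at one point and verifies that the values increase towards f(ω).

    import math

    def f_n(f, x, n):
        # Dyadic approximation from the slide:
        # f_n = sum_k (k/2^n) on f^{-1}([k/2^n, (k+1)/2^n)) + n on f^{-1}((n, inf)).
        y = f(x)
        if y > n:
            return float(n)
        k = math.floor(y * 2**n)          # y lies in [k/2^n, (k+1)/2^n)
        return min(k / 2**n, float(n))

    f = lambda x: x**2                    # an arbitrary nonnegative function
    omega = 1.37                          # an arbitrary point of Omega
    vals = [f_n(f, omega, n) for n in range(1, 11)]
    print(vals)                           # non-decreasing, approaching f(omega)
    assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))
    assert abs(vals[-1] - f(omega)) < 2**-9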


I For f : Ω → R, define f + , f − : Ω → R as

f + (x) = max{f (x), 0}

f − (x) = max{−f (x), 0}

I Both f + and f − are nonnegative, and f = f + − f − .

I If f is measurable, so are both f + and f − .

I Every measurable function is the difference of two nonnegative


measurable functions.

I This is useful because by the above theorem, nonnegative


measurable functions are easier to deal with than general
measurable functions.
Defining integration: Roadmap
I Suppose (Ω, A, µ) is a measure space, and f : Ω → R is measurable. How can we define ∫_Ω f dµ?
I If we could define the integral for f⁺ and f⁻, we could define

     ∫_Ω f dµ = ∫_Ω f⁺ dµ − ∫_Ω f⁻ dµ

because we want integration to be a linear operation.
I So we have reduced the problem from general measurable functions to nonnegative measurable functions. If f is nonnegative, how can we define ∫_Ω f dµ?
I If s is a nonnegative simple function, say taking values a1, ..., ak, then we can define

     ∫_Ω s dµ = Σ_{i=1}^{k} ai µ(s⁻¹({ai}))

because this corresponds to our notion of "area under the curve".
Roadmap (Contd.)

I Since we want integration to be monotonic, we should have

     ∫_Ω f dµ ≥ sup{ ∫_Ω s dµ : s is non-negative simple and s ≤ f }.

I Since f is nonnegative, we know that nonnegative simple


functions s with s ≤ f exist, in fact we can get a sequence of
them converging to f .

I Though it is not immediately clear, it makes sense to define the


integral of f by taking the above equation to be an equality.
Defining Integration: Step 1

I Let (Ω, A, µ) be a measure space. If f is a simple non-negative function such that

     f = Σ_{i=1}^{k} ai I_{Ai},

with each ai ≥ 0 and Ai ∈ A, then

     ∫ f dµ = Σ_{i=1}^{k} ai µ(Ai).

I Note that ai µ(Ai) is the "area" of a "rectangle" with height ai and "length" µ(Ai).
I The definition in Step 1 is a valid definition. If we have

     f = Σ_{i=1}^{k} ai I_{Ai} = Σ_{i=1}^{l} bi I_{Bi}

with all ai, bi ≥ 0 and Ai, Bi ∈ A, then we must have

     Σ_{i=1}^{k} ai µ(Ai) = Σ_{i=1}^{l} bi µ(Bi).
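I A minimal numeric sketch of Step 1 in Python (not from the slides; the finite measure space and the sets are invented for illustration): the integral of a nonnegative simple function is Σ ai µ(Ai), and two different representations of the same function give the same value.

    # Finite measure space: Omega = {1,...,6} with counting measure (an assumed example).
    mu = lambda A: len(A)                        # counting measure

    def integral_simple(rep):
        # Step 1: for f = sum_i a_i I_{A_i}, the integral is sum_i a_i * mu(A_i).
        return sum(a * mu(A) for a, A in rep)

    # Two representations of the same simple function: f = 2 on {1,2,3}, 5 on {4}, 0 elsewhere.
    rep1 = [(2.0, {1, 2, 3}), (5.0, {4})]
    rep2 = [(2.0, {1, 2}), (2.0, {3}), (5.0, {4}), (0.0, {5, 6})]
    print(integral_simple(rep1), integral_simple(rep2))    # both 11.0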
Properties

I If f, g are nonnegative simple, then ∫(f + g)dµ = ∫f dµ + ∫g dµ.
I If c ≥ 0 and f is nonnegative simple, then ∫(cf)dµ = c∫f dµ.
I If f, g are nonnegative simple with f ≤ g, then ∫f dµ ≤ ∫g dµ.
Defining Integration: Step 2

I If f is a non-negative measurable function, define

     ∫_Ω f dµ = sup{ ∫_Ω g dµ : g is non-negative simple and g ≤ f }.

I We note that { ∫_Ω g dµ : g is non-negative simple and g ≤ f } is non-empty because at least one g ≤ f exists: g(x) = 0 for all x.

I Moreover ∫_Ω f dµ defined in this way is non-negative but can be ∞.

I If f is nonnegative simple, then ∫f dµ as defined in Step 1 agrees with ∫f dµ defined in Step 2.
I By monotonicity of the integral from Step 1 and using that f is simple, we have

     ∫_Ω f dµ (according to Step 2) = sup{ ∫_Ω g dµ (Step 1) : g is non-negative simple and g ≤ f } = ∫_Ω f dµ (according to Step 1).

I Hence the definition in Step 2 is an extension of the definition in Step 1, as it should be: for simple functions f, we can talk of ∫f dµ unambiguously.
Properties

I If 0 ≤ f1 ≤ f2, then ∫f1 dµ ≤ ∫f2 dµ.

I If g is simple and g ≤ f1, then g ≤ f2.

I Hence

     { ∫ g dµ : g is non-negative simple and g ≤ f1 } ⊆ { ∫ g dµ : g is non-negative simple and g ≤ f2 }.

I Hence ∫f1 dµ ≤ ∫f2 dµ.
Monotone convergence theorem (MCT)

I Suppose {fn}n≥1 is an increasing sequence of nonnegative measurable functions which converges pointwise to f. Then

     ∫ fn dµ ↑ ∫ f dµ as n → ∞.
Properties (Contd.)

I If f, g are nonnegative measurable, then ∫(f + g)dµ = ∫f dµ + ∫g dµ.

I Let {sn }n≥1 and {tn }n≥1 be increasing sequences of


nonnegative simple functions which converge pointwise to f
and g respectively (we know that such sequences exist due to
a result we have studied) .

I Then {sn + tn }n≥1 is an increasing sequence of nonnegative


functions which converges to f + g .
I For each n,

     ∫ sn dµ + ∫ tn dµ = ∫ (sn + tn) dµ

from Step 1.

I Taking the limit as n → ∞ and using the monotone convergence theorem, we get ∫f dµ + ∫g dµ = ∫(f + g)dµ.
Properties (Contd.)

I If c ≥ 0 and f is nonnegative measurable, then ∫(cf)dµ = c∫f dµ.

I If c = 0 then both sides are 0 (note that ∫f dµ could be ∞, so we are using our convention that 0 · ∞ = 0 here).

I If c > 0, let

     A = { ∫ g dµ : g is non-negative simple and g ≤ f }

     B = { ∫ g dµ : g is non-negative simple and g ≤ cf }
I It is easy to see that for any nonnegative real number x, x ∈ A
if and only if cx ∈ B.
I So

     c ∫ f dµ = c sup A = sup B = ∫ (cf) dµ.
Incorporating extended real valued functions

I Step 1 in the definition of integration talked only about


nonnegative simple functions. The notion of a nonnegative
simple function does not change when one moves to extended
real valued functions from real valued functions, because we
required simple functions by definition to take only real values.

I In the context of extended real valued functions, in step 2 we


can define the integral of a nonnegative measurable function in
the same way as we did for real valued functions. This extends
the definition in step 1, is monotone, and respects linear
operations (addition, and scalar multiplication by a
nonnegative number). Further the monotone convergence
theorem continues to hold.

I However we can make some more observations as follows:


Proposition

I Let (Ω, A, µ) be a measure space, and let f , g : Ω → [0, ∞] be


extended real-valued nonnegative measurable functions.
1. If µ({ω : f(ω) ≠ 0}) = 0, then ∫f dµ = 0.
2. If ∫f dµ < ∞, then µ({ω : f(ω) = ∞}) = 0. (The converse is not true.)
3. If µ({ω : f(ω) ≠ g(ω)}) = 0, then ∫f dµ = ∫g dµ.
Defining Integration: Step 3
I For any f : Ω → R measurable, if at least one of ∫f⁺ dµ and ∫f⁻ dµ is finite, define

     ∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ.

If both ∫f⁺ dµ and ∫f⁻ dµ are ∞, then we say that ∫f dµ does not exist. f is said to be integrable if ∫f dµ exists and is finite.

I The definition in Step 3 extends that in Step 2: if f is nonnegative, then f⁺ = f and f⁻ = 0, and so ∫f dµ according to Step 3 equals ∫f⁺ dµ − ∫f⁻ dµ = ∫f dµ according to Step 2.
I Let L1(Ω, A, µ) ⊆ L(Ω, A, µ) be the set of integrable functions. Recall that f is said to be integrable if ∫f dµ exists and is finite (equivalently, if both ∫f⁺ dµ and ∫f⁻ dµ are finite).

I f ∈ L1 ⇔ |f| ∈ L1.
I If f, g ∈ L1 then f + g ∈ L1 and ∫(f + g)dµ = ∫f dµ + ∫g dµ.
I If f ∈ L1 and c ∈ R, then cf ∈ L1 and ∫cf dµ = c∫f dµ.
I If f ∈ L1, then |∫f dµ| ≤ ∫|f| dµ.
Incorporating extended real valued functions

I We can also have Step 3 for extended real valued functions.

I We note that for a, b ∈ [0, ∞], a + b < ∞ if and only if


a < ∞ and b < ∞.

I Let f : Ω → [−∞, ∞] be measurable. Applying the above to ∫f⁺ dµ and ∫f⁻ dµ, here also we get that f is integrable if and only if |f| is.

I Further, as with real-valued functions, for integrable f, we define ∫f dµ = ∫f⁺ dµ − ∫f⁻ dµ.

I However, while proving the properties we need to be cautious


as we illustrate below:
I For f, g : Ω → [−∞, ∞] define f + g : Ω → [−∞, ∞] by

     (f + g)(ω) = 0 if {f(ω), g(ω)} = {−∞, ∞},  and  (f + g)(ω) = f(ω) + g(ω) otherwise.

I We note that defining (f + g)(ω) to be 0 in the first case is an ad hoc definition; we could have put "anything sensible" (like, say, 5) there. We are not saying ∞ − ∞ = 0.

I Then we have the following result: If f and g are integrable, so is f + g, and ∫(f + g)dµ = ∫f dµ + ∫g dµ.

I Let S = {ω : f(ω) ∉ R or g(ω) ∉ R}.
I Since f and g are integrable, S is a subset of

{ω : ∞ ∈ {f + (ω), f − (ω), g + (ω), g − (ω)}}

which we know from the property (2) above has measure 0.

I So µ(S) = 0. Let T = Ω − S.

I Now
0 ≤ |f + g | ≤ f + + f − + g + + g −
and so |f + g | is integrable and f + g is integrable.

I As before, we have, for all h ∈ {f⁺, f⁻, g⁺, g⁻, (f + g)⁺, (f + g)⁻},

     ∫ h dµ = ∫ h I_T dµ.
I Now

     (f + g)⁺ I_T − (f + g)⁻ I_T = f⁺ I_T − f⁻ I_T + g⁺ I_T − g⁻ I_T.

I Since these are all real-valued integrable functions,

     ∫ (f + g)⁺ I_T dµ − ∫ (f + g)⁻ I_T dµ = ∫ f⁺ I_T dµ − ∫ f⁻ I_T dµ + ∫ g⁺ I_T dµ − ∫ g⁻ I_T dµ.
I As seen above, we can drop the I_T multipliers, to get

     ∫ (f + g)⁺ dµ − ∫ (f + g)⁻ dµ = ∫ f⁺ dµ − ∫ f⁻ dµ + ∫ g⁺ dµ − ∫ g⁻ dµ

which implies

     ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ.

I Similarly we have the following result: If f is integrable and c ∈ R, then cf is integrable and ∫cf dµ = c∫f dµ.
Fatou’s Lemma

I Suppose {fn}n≥1 is a sequence of nonnegative measurable functions, and f is nonnegative measurable such that f = lim inf_{n→∞} fn (pointwise). Then

     ∫ (lim inf_{n→∞} fn) dµ ≤ lim inf_{n→∞} ∫ fn dµ.
Dominated convergence theorem (DCT)

I Suppose {fn}n≥1 is a sequence of functions and g is a function, all in L1, with |fn| ≤ g for all n. Suppose f : Ω → R is such that lim_{n→∞} fn = f (pointwise). Then f ∈ L1 and

     lim_{n→∞} ∫ fn dµ = ∫ f dµ,   lim_{n→∞} ∫ |fn − f| dµ = 0.
MCT (once again)

I We shall now look at stronger, “almost everywhere” versions of


MCT and DCT.

I Let (Ω, A, µ) be a measure space, and let {fn}n≥1 be a sequence of measurable functions which are nonnegative a.e., and which increase to f a.e. Then ∫fn dµ → ∫f dµ as n → ∞.

I Let
  I S1 = {ω : ∃n ∈ N, fn(ω) < 0}.
  I S2 = {ω : ∃n ∈ N, fn+1(ω) < fn(ω)}.
  I S3 = {ω : fn(ω) ↛ f(ω) as n → ∞}.
I Let S = S1 ∪ S2 ∪ S3. By the hypothesis, µ(S) = 0.
I Let gn = fn I_{Ω−S} for all n ∈ N and let g = f I_{Ω−S}.

I Since gn = fn a.e. and g = f a.e., we have ∫gn dµ = ∫fn dµ and ∫g dµ = ∫f dµ.

I The sequence {gn}n≥1 is a sequence of nonnegative functions increasing to g: for ω ∈ S, gn(ω) = g(ω) = 0, and for ω ∉ S, this is true by the definition of S1, S2, S3.

I So by the original monotone convergence theorem, ∫gn dµ → ∫g dµ as n → ∞.

I Hence ∫fn dµ → ∫f dµ as n → ∞.
DCT (once again)

I Let (Ω, A, µ) be a measure space, and let g ∈ L1(Ω, A, µ). Let {fn}n≥1 be a sequence of measurable functions, and f be a measurable function, such that |fn| ≤ g a.e., and fn → f as n → ∞ a.e. Then fn, f ∈ L1, and ∫|fn − f| dµ → 0 and ∫fn dµ → ∫f dµ as n → ∞.

I Let
  I S1 = {ω : g(ω) < 0}.
  I S2 = {ω : ∃n ∈ N, |fn(ω)| > g(ω)}.
  I S3 = {ω : fn(ω) ↛ f(ω) as n → ∞}.
I Let S = S1 ∪ S2 ∪ S3. By the hypothesis, µ(S) = 0.

I Let f̃n = fn I_{Ω−S}, f̃ = f I_{Ω−S}, g̃ = g I_{Ω−S}.

I As before, because of almost everywhere equality, ∫f̃n dµ = ∫fn dµ, ∫f̃ dµ = ∫f dµ, ∫g̃ dµ = ∫g dµ.
I As before, because of almost everywhere equality, ∫f̃n dµ = ∫fn dµ, ∫f̃ dµ = ∫f dµ, ∫g̃ dµ = ∫g dµ.

I Moreover g̃ ≤ |g|, so g̃ ∈ L1.

I Now |f̃n| ≤ g̃ for all n ∈ N and f̃n → f̃ as n → ∞.

I By the original dominated convergence theorem, ∫|f̃n − f̃| dµ → 0 as n → ∞.

I Since |f̃n − f̃| = |fn − f| a.e. (they differ at most on S), we have ∫|fn − f| dµ → 0.

I Also, since ∫f̃n dµ → ∫f̃ dµ, we have ∫fn dµ → ∫f dµ.
Properties

I If f ≥ 0, then ∫f dµ ≥ 0.

I If f, g ∈ L1 and f ≤ g, then ∫f dµ ≤ ∫g dµ.
I This is because g − f ∈ L1 and is nonnegative, and so ∫g dµ = ∫f dµ + ∫(g − f)dµ ≥ ∫f dµ.
Properties (Contd.)

I If f ≥ 0 and ∫f dµ = 0, then µ({ω ∈ Ω : f(ω) ≠ 0}) = 0.

I For all n ∈ N, let An = {ω : f(ω) ≥ 1/n}.

I Note that {ω : f(ω) ≠ 0} = ∪_{n∈N} An and (1/n) I_{An} ≤ f, and so

     (1/n) µ(An) = ∫ (1/n) I_{An} dµ ≤ ∫ f dµ = 0.

I Hence µ(An) = 0.

I Now

     µ({ω ∈ Ω : f(ω) ≠ 0}) = µ(∪_{n∈N} An) ≤ Σ_{n∈N} µ(An) = 0.
Properties (Contd.)
I Let (Ω, A, µ) be a measure space, and f : Ω → R measurable. Suppose µ({ω : f(ω) ≠ 0}) = 0. Then f ∈ L1 and ∫f dµ = 0.

I Suppose s is simple with 0 ≤ s ≤ |f|.

I We shall show ∫s dµ = 0.

I Let the canonical expression of s as a nonnegative linear combination of indicators be

     s = Σ_{i=1}^{n} ai I_{Ai}.

I If ai ≠ 0, then ai > 0. |f| takes value at least ai on Ai, so

     Ai ⊆ {ω : |f(ω)| ≠ 0} = {ω : f(ω) ≠ 0}

which implies µ(Ai) = 0.


I Hence for all i, ai µ(Ai) = 0.

I This shows that ∫s dµ = Σ_{i=1}^{n} ai µ(Ai) = 0.

I Since this holds for all simple functions s with 0 ≤ s ≤ |f|, we have ∫|f| dµ = 0, which means |f| ∈ L1, and thus f ∈ L1.

I Moreover |∫f dµ| ≤ ∫|f| dµ, so ∫f dµ = 0.
Finite sums

I Let
I Ω = {1, ..., n}.
I A = 2Ω .
I µ(A) = |A|. It is easy to see that µ is a measure on (Ω, A).
I Every f : Ω → R is simple, and ∫f⁺ dµ and ∫f⁻ dµ are finite, so every f : Ω → R is integrable.

I Here ∫f dµ = Σ_{k=1}^{n} f(k).
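I A quick Python check of this identity (a sketch; the particular f is an arbitrary choice): on Ω = {1, ..., n} with counting measure, grouping points by their value gives the simple-function formula Σ a·µ(f⁻¹({a})), which equals the plain sum of the values.

    from collections import defaultdict

    n = 6
    Omega = range(1, n + 1)
    f = lambda k: k**2                       # an arbitrary real valued f on Omega

    level_sets = defaultdict(set)            # f = sum over distinct values a of a * I_{f^{-1}({a})}
    for k in Omega:
        level_sets[f(k)].add(k)

    integral = sum(a * len(A) for a, A in level_sets.items())   # sum_a a * mu(f^{-1}({a}))
    print(integral, sum(f(k) for k in Omega))                   # both 91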
Infinite sums

I Let
I Ω = N.
I A = 2Ω .
I µ(A) = |A|.
I Every f : Ω → R is measurable.

I We can think of f as a sequence {f (n)}n≥1 .

I Suppose f ≥ 0 with f (n) = 0 for all n > N, for some N.


I Then f is a nonnegative simple function (though not all nonnegative simple functions look like this!), and

     ∫ f dµ = Σ_{i=1}^{N} f(i).
I Now suppose f ≥ 0, without assuming other conditions. For all k ∈ N, define gk : Ω → R by

     gk(n) = f(n) if n ≤ k,  and  gk(n) = 0 otherwise.

I gk is nonnegative, and {gk}k≥1 increases to f.

I So by the monotone convergence theorem:

     lim_{k→∞} ∫ gk dµ = ∫ f dµ.
I But we have ∫ gk dµ = Σ_{i=1}^{k} f(i).

I Hence ∫ f dµ = lim_{k→∞} Σ_{i=1}^{k} f(i) = Σ_{i=1}^{∞} f(i).

I The monotone convergence theorem made our job easier: we did not have to take all nonnegative simple functions below f, calculate their integrals, and take the supremum to get ∫f dµ.
calculate their integrals, and take the supremum to get fdµ.

I Now suppose f : Ω → R is any function. Then f is


measurable.
I f is integrable if and only if |f| is integrable, that is, if and only if

     ∫ |f| dµ = Σ_{i=1}^{∞} |f(i)| < ∞.

I Suppose f is integrable. Define gk as above.

I By the dominated convergence theorem (taking |f| as the dominating function), we have

     lim_{k→∞} ∫ gk dµ = ∫ f dµ,

and so

     ∫ f dµ = Σ_{i=1}^{∞} f(i).

I f is integrable if and only if the series corresponding to the sequence {f(n)}n≥1 is absolutely convergent.

I Consider f given by f(n) = (−1)^{n−1} (1/n).

I We say that Σ_{i=1}^{∞} f(i) exists. But f is not integrable, since Σ_{n=1}^{∞} 1/n = ∞.
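I A numeric illustration of the last point (Python sketch, not from the slides): the signed partial sums of f(n) = (−1)^{n−1}/n settle near log 2, while the partial sums of |f| keep growing, so f is not integrable with respect to counting measure even though Σ f(i) exists as a limit.

    import math

    def partial_sums(N):
        s_signed = sum((-1) ** (n - 1) / n for n in range(1, N + 1))
        s_abs = sum(1 / n for n in range(1, N + 1))
        return s_signed, s_abs

    for N in (10, 1000, 100000):
        s, a = partial_sums(N)
        print(f"N={N:6d}  sum f = {s:.6f}  sum |f| = {a:.2f}")

    print("log 2 =", math.log(2))    # the signed sums approach this; the absolute sums diverge like log N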
Riemann integration

I Consider a closed and bounded interval [a, b] and suppose f is a bounded real valued function on [a, b].

I Let π : a = a0 < a1 < ... < an = b be a partition of [a, b].

I Then we have

     U(f, π) = Σ_{i=1}^{n} ( sup_{a_{i−1} ≤ x ≤ a_i} f(x) ) (a_i − a_{i−1})  =  the upper Riemann sum

and

     L(f, π) = Σ_{i=1}^{n} ( inf_{a_{i−1} ≤ x ≤ a_i} f(x) ) (a_i − a_{i−1})  =  the lower Riemann sum.


I We have the Upper Riemann integral as

inf{U(f , π) : π partition}

and the Lower Riemann integral as

sup{L(f , π) : π partition}.

I Then a function f is said to be Riemann integrable on [a, b] if the upper Riemann integral equals the lower Riemann integral, and we take ∫_a^b f(x)dx to be the common value.
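I For a monotone f the sup and inf over each subinterval sit at the endpoints, so the upper and lower sums are easy to compute. A Python sketch (the function and interval are arbitrary choices) showing U(f, π) and L(f, π) squeezing together as the partition is refined:

    def riemann_sums(f, a, b, n):
        # Upper and lower Riemann sums of an increasing f on [a, b], uniform partition with n pieces.
        pts = [a + (b - a) * i / n for i in range(n + 1)]
        upper = sum(f(pts[i + 1]) * (pts[i + 1] - pts[i]) for i in range(n))  # sup at right endpoint
        lower = sum(f(pts[i]) * (pts[i + 1] - pts[i]) for i in range(n))      # inf at left endpoint
        return upper, lower

    f = lambda x: x * x                      # increasing on [0, 2]
    for n in (4, 16, 64, 256):
        U, L = riemann_sums(f, 0.0, 2.0, n)
        print(n, round(L, 4), round(U, 4))   # both approach 8/3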
I So here is a connection between Riemann integration and
Lebesgue integration.

I A (bounded) function f is Riemann integrable on [a, b] iff f is continuous λ-a.e. on [a, b]. In this case f must be Lebesgue measurable on [a, b] and

     ∫_a^b f(x)dx = ∫_{[a,b]} f dλ.

I Here "f is Lebesgue measurable" means that there is a Borel measurable function g such that f = g λ-a.e.
Example

I Consider the function I[0,1]−Q .

I This is a nonnegative simple function.

I Since any countable set has Lebesgue measure 0, we have

     ∫ I_{[0,1]−Q} dλ = λ([0, 1] − Q) = λ([0, 1]) − λ([0, 1] ∩ Q) = 1.

I However, I[0,1]−Q is not Riemann integrable: every interval of


nonzero length has both rationals and irrationals, so for any
partition of [0, 1], the lower Riemann sum is always 0 and the
upper Riemann sum is always 1.
Riemann-Stieltjes Integral

I Consider a closed and bounded interval [a, b] and suppose G is a function of bounded variation on [a, b].

I G is of bounded variation iff G = G1 − G2 where G1 and G2 are non-decreasing. Hence we assume (WLOG) that G is non-decreasing.

I Suppose f is a bounded real valued function on [a, b].

I Let π : a = a0 < a1 < ... < an = b be a partition of [a, b].

I Then we have

     U(f, G, π) = Σ_{i=1}^{n} ( sup_{a_{i−1} ≤ x ≤ a_i} f(x) ) (G(a_i) − G(a_{i−1}))  =  the upper Riemann-Stieltjes sum

and

     L(f, G, π) = Σ_{i=1}^{n} ( inf_{a_{i−1} ≤ x ≤ a_i} f(x) ) (G(a_i) − G(a_{i−1}))  =  the lower Riemann-Stieltjes sum.


I We have the upper Riemann-Stieltjes integral as

     inf{U(f, G, π) : π partition}

and the lower Riemann-Stieltjes integral as

     sup{L(f, G, π) : π partition}.

I Then a function f is said to be Riemann-Stieltjes integrable on [a, b] if the upper Riemann-Stieltjes integral equals the lower Riemann-Stieltjes integral, and we take ∫_a^b f(x)dG(x) to be the common value.
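I The same computation with G in place of the identity gives Riemann-Stieltjes sums. A Python sketch (f, G and the interval are arbitrary choices; f is taken increasing so the endpoint values give the sup and inf, and G non-decreasing as assumed above):

    def rs_sums(f, G, a, b, n):
        # Upper and lower Riemann-Stieltjes sums of an increasing f w.r.t. non-decreasing G.
        pts = [a + (b - a) * i / n for i in range(n + 1)]
        upper = sum(f(pts[i + 1]) * (G(pts[i + 1]) - G(pts[i])) for i in range(n))
        lower = sum(f(pts[i]) * (G(pts[i + 1]) - G(pts[i])) for i in range(n))
        return upper, lower

    f = lambda x: x                     # increasing on [0, 1]
    G = lambda x: x * x                 # non-decreasing on [0, 1]
    for n in (8, 64, 512):
        U, L = rs_sums(f, G, 0.0, 1.0, n)
        print(n, round(L, 5), round(U, 5))   # both approach 2/3, the integral of x dG(x) on [0, 1]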
I f is Riemann-Stieltjes integrable with respect to G(x) (non-decreasing) on [a, b] iff f is continuous µ-a.e., where µ is the unique measure on [a, b] induced by G. In this case f must be µ-measurable and ∫_a^b f(x)dG(x) = ∫ f dµ.

I Here "induced by G" means: we take Ḡ(x) = lim_{y↓x} G(y) = G(x+) (since G is non-decreasing this right hand limit exists) and µ is the measure induced by Ḡ (as a d.f.).
I Consider a probability measure PX on (R, B) such that PX is
the probability distribution of a continuous random variable X .
Then we have already discussed that PX ({c}) = 0 for any
c ∈ R or more generally

PX (any countable subset of R) = 0.

I Recall that, on the same measurable space (R, B), we can define the Lebesgue measure λ, which indeed generalizes the concept of length, viz. λ((a, b]) = b − a for any a ≤ b in R. Even for the Lebesgue measure we had the same idea that

     λ(any countable subset of R) = 0.

I Thus on the same measurable space (R, B), we can have two
distinct measure spaces (R, B, PX ) and (R, B, λ).
I Further, the two distinct measures PX and λ are such that whenever a set A ∈ B has λ(A) = 0, we shall also have PX(A) = 0.

I In other words, sets A ∈ B which have λ measure 0 also have


PX measure 0.

I We shall say that the measure PX is absolutely continuous


with respect to the measure λ.

I In general, we have the following idea: Consider a measurable


space (Ω, A) and two measures µ and γ defined on A. The
measure γ is said to be absolutely continuous with respect to
µ, if for any measurable set A ∈ A, µ(A) = 0 implies
γ(A) = 0. We denote it by γ << µ.
Construction of absolutely continuous measure

I Suppose we start with a measurable space (Ω, A) and let µ be


a measure on this space.

I Consider a non-negative measurable function f on (Ω, A).

I Then for any set A ∈ A, let us define

     γ(A) = ∫_A f dµ.

I Then γ is a measure on (Ω, A), and f is (Lebesgue) integrable iff γ is a finite measure.
I It can be easily observed that
  I γ(∅) = 0 and γ(A) ≥ 0 for all A ∈ A because f ≥ 0.
  I For mutually disjoint sets A1, A2, .... in A, we have

       γ(∪_k Ak) = ∫_{∪_k Ak} f dµ = Σ_k ∫_{Ak} f dµ = Σ_k γ(Ak),

so that γ defined in this way is a measure on (Ω, A).


I Further, since f ≥ 0, we have

µ(A) = 0 ⇒ γ(A) = 0

or in other words, γ << µ.
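I A small discrete sketch of this construction in Python (the space, µ and the density f below are invented purely for illustration): γ(A) = ∫_A f dµ becomes a weighted sum, and µ(A) = 0 forces γ(A) = 0, i.e. γ << µ.

    # A finite measure space (an assumed example): Omega = {0,...,4}.
    mu = {0: 1.0, 1: 2.0, 2: 0.0, 3: 0.5, 4: 1.5}      # mu({k}) for each point
    f  = {0: 0.2, 1: 0.0, 2: 7.0, 3: 1.0, 4: 2.0}      # a nonnegative density

    def mu_measure(A):
        return sum(mu[k] for k in A)

    def gamma(A):
        # gamma(A) = integral of f over A w.r.t. mu = sum_{k in A} f(k) * mu({k}).
        return sum(f[k] * mu[k] for k in A)

    A = {2}                                             # mu(A) = 0, so gamma(A) = 0 even though f = 7 there
    print(mu_measure(A), gamma(A))                      # 0.0 0.0  ->  gamma << mu
    print(gamma({0, 1, 3}), gamma({0}) + gamma({1}) + gamma({3}))   # additivity over disjoint sets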


I Hence using the concept of Lebesgue integration, we have devised a method for constructing absolutely continuous measures.

I Further, this way of construction depends on the choice of f; that is, given a fixed measure µ, we can construct several measures absolutely continuous w.r.t. µ by selecting different choices of f.

I But is this the only way of constructing such absolutely continuous measures? The following result says so:
Radon-Nikodym theorem
I Let µ and γ be two measures on (Ω, A) and let µ be σ-finite. If γ << µ, then there exists a non-negative Borel function f on Ω such that

     γ(A) = ∫_A f dµ,  A ∈ A.

Moreover such a representation is unique in the sense that if there exists g ≥ 0 such that

     γ(A) = ∫_A g dµ,  A ∈ A

then f = g a.e. µ. The function f is called the Radon-Nikodym derivative or density of γ with respect to µ and we denote it as

     f = dγ/dµ.

I We note that f = dγ/dµ is not a derivative in the usual sense and should be treated only as a symbol. Rather we should keep in mind that f is the Radon-Nikodym derivative of γ with respect to µ if

     γ(A) = ∫_A f dµ,  A ∈ A.

I Further if

     γ(Ω) = ∫ f dµ = 1

for f ≥ 0 a.e. µ, then γ is a probability measure and f is called its density (with respect to µ).
I A random variable X is called absolutely continuous if the probability distribution PX is absolutely continuous with respect to Lebesgue measure λ.

I Then by the Radon-Nikodym theorem, there exists a non-negative Borel measurable function f such that

     PX(B) = ∫_B f dλ,  B ∈ B.

I Here the function f, which we denote by

     f = dPX/dλ,

is the Radon-Nikodym derivative.
I In particular, if we choose B = (−∞, x], we get

     PX((−∞, x]) = ∫_{(−∞,x]} f dλ  ⇒  F(x) = ∫_{−∞}^{x} f(u)du.

I Hence in this case the Radon-Nikodym derivative is actually our usual probability density function (p.d.f.).
I Now we shall look back at the Radon-Nikodym theorem once again. The uniqueness part of the theorem tells us that if there exists g ≥ 0 such that

     γ(A) = ∫_A g dµ,  A ∈ A

then f = g a.e. µ.
I Consider the following example: Suppose we have a random variable X
having U(0, 1) distribution. Then we know that
     f1(x) = 1 for 0 < x < 1, and 0 otherwise;
     f2(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise;
     f3(x) = 1 for 0 < x ≤ 1, and 0 otherwise;
     f4(x) = 1 for 0 ≤ x < 1, and 0 otherwise

are all pdfs of X.
I That is, all these four functions can be considered as the
Radon-Nikodym derivative of PX with respect to the Lebesgue
measure λ.

I Further we note that f1 , f2 , f3 and f4 differ among themselves


in at most two points and hence f1 = f2 = f3 = f4 a.e. λ, as
claimed in the uniqueness part of the theorem.

I We had already argued this using the fact that the Riemann integral remains unchanged if we change the integrand at finitely many points. This is just another way of arriving at the same idea.
I In our previous discussions, we have raised this question: for a continuous cdf FX, how do we guarantee that

     f(x) = dFX(x)/dx

exists? Because continuity does not imply differentiability in general.

I As we already noted, even if FX is not differentiable at a countable number of points, we can modify f at those points to make the relation f(x) = dFX(x)/dx valid (a.e.).

I However this strategy will not work if FX is not differentiable at an uncountable number of points. As we shall see later, not all continuous cdfs have a pdf.

I In fact, for a continuous cdf to admit a density, it must be absolutely continuous with respect to Lebesgue measure.

I Absolute continuity is a stronger condition than continuity, but it is weaker than differentiability. So how does absolute continuity of F guarantee the existence of dF(x)/dx?

I This is due to the following result: if F is the cdf of a probability measure which is absolutely continuous w.r.t. Lebesgue measure, then F is differentiable a.e. λ.
I Consider a discrete random variable X assuming values in the
  countable set {x_1, x_2, ...} and having probability distribution
  P_X, cdf F_X and pmf p(x), with p_i = p(x_i).

I Then for any set B ∈ B, we have

      P_X(B) = P(X ∈ B) = ∑_{i : x_i ∈ B} p_i.

I Recall the counting measure µ defined on any sigma-field as
  µ(A) = number of elements in A.

I Then for any B ∈ B with µ(B) = 0 we have B = ∅, so

      P_X(B) = ∑_{i : x_i ∈ B} p_i = 0

  since {i : x_i ∈ B} = ∅.

I Hence P_X << µ; in other words, any discrete probability measure is
  absolutely continuous with respect to the counting measure.
  (A small numerical sketch follows.)
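A minimal sketch of the displayed sum, using a hypothetical pmf on {0, 1, 2, 3} (the support and the probabilities are made up purely for illustration):

```python
# P_X(B) = sum of p_i over the atoms x_i that fall in B
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # hypothetical pmf

def P_X(B):
    """P_X(B) = P(X in B), obtained by summing the pmf over the atoms in B."""
    return sum(p[x] for x in p if x in B)

print(P_X({1, 3}))      # 0.6
print(P_X({0.5, 2.5}))  # 0.0 -- B contains no atoms of X
print(P_X(set()))       # 0.0 -- a set of counting measure zero gets P_X-measure zero
```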
I Hence by the Radon-Nikodym theorem (with µ taken as the counting measure
  on the countable support of X, which is sigma-finite), there exists a
  function f such that

      P_X(B) = ∫_B f dµ,   B ∈ B,

  which we call the Radon-Nikodym derivative of P_X w.r.t. µ, denoted as

      f = dP_X/dµ.

I Choosing B = (−∞, x], we get

      P_X((−∞, x]) = ∫_{(−∞, x]} f dµ
      ⇒ F(x) = ∑_{y : y ≤ x} f(y),   ∀x,

  the sum running over the support points y ≤ x.

I This implies that the Radon-Nikodym derivative in this case is the
  p.m.f. p(x) of the random variable X (see the sketch below).
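A minimal sketch of the last display, again with a hypothetical support {0, 1, 2, 3}; here f is the pmf, playing the role of dP_X/dµ:

```python
support = [0, 1, 2, 3]
f = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # pmf = Radon-Nikodym derivative w.r.t. counting measure

def F(x):
    """F(x) = integral of f over (-inf, x] w.r.t. counting measure = sum_{y <= x} f(y)."""
    return sum(f[y] for y in support if y <= x)

print([F(x) for x in (-1, 0, 1.5, 3)])   # approximately [0, 0.1, 0.3, 1.0]
```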
Further Example: Pressure

I Absolutely continuous measures are not rare in real life either. From basic
  physics, we know that pressure P can be expressed as P = dF/dA, where F is
  the force applied to an area A. But how does this make sense?

I Treat area as a measure (two-dimensional Lebesgue measure) and the applied
  force also as a measure.

I Then force is absolutely continuous w.r.t. area (why?).

I Thus pressure is the Radon-Nikodym derivative of force w.r.t. area, in the
  sense that

      F = ∫ P dA.

  (A small numerical sketch follows.)
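A small numerical sketch of F = ∫ P dA, using a made-up pressure field on the unit plate (the field P(x, y) = 2 + 3y and its units are hypothetical):

```python
import numpy as np

# Pressure field P(x, y) = 2 + 3*y on [0,1] x [0,1]; the exact force is 2 + 3/2 = 3.5.
n = 1000
grid = (np.arange(n) + 0.5) / n          # midpoints of an n x n grid
X, Y = np.meshgrid(grid, grid)
P = 2.0 + 3.0 * Y
force = P.mean()                         # midpoint rule: each cell has area 1/n**2, total area 1
print(force)                             # ~ 3.5
```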
Proposition

I Consider a measurable space (Ω, A) and two measures µ and γ defined on it
  such that γ << µ, so that

      γ(A) = ∫_A f dµ.

  Suppose g is a real valued measurable function on (Ω, A).
  Then ∫ g dγ = ∫ g f dµ; if either of these integrals exists, then so does
  the other.

I It is convenient to remember this as

      ∫ g dγ = ∫ g (dγ/dµ) dµ = ∫ g f dµ.

  (A numerical check is sketched below.)
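A minimal numerical check of ∫ g dγ = ∫ g f dµ under hypothetical choices: µ = Lebesgue measure on [0, 1], f(x) = 2x (so γ is a probability measure with density 2x) and g(x) = x²; the exact common value is 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Left-hand side: ∫ g dγ, by Monte Carlo with draws from γ (X = sqrt(U) has density 2x on [0,1]).
draws = np.sqrt(rng.uniform(size=1_000_000))
lhs = np.mean(draws**2)

# Right-hand side: ∫ g f dµ, by a midpoint Riemann sum on [0, 1].
x = (np.arange(1_000_000) + 0.5) / 1_000_000
rhs = np.mean(x**2 * 2 * x)

print(lhs, rhs)   # both ~ 0.5
```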
Proof (outline)

I If g = I_B, then ∫ g dγ = γ(B) and
  ∫ g f dµ = ∫ I_B f dµ = ∫_B f dµ = γ(B). Hence the result holds.

I If g is a simple function, we can write g as a linear combination of
  indicators and the proof follows by linearity.

I For non-negative measurable g, we can take a sequence of simple functions
  {g_n}_{n≥1} such that g_n ↑ g and then apply the MCT.

I For general g, we write g = g⁺ − g⁻ and argue similarly.
Further calculus with Radon-Nikodym derivatives

I Let ν be a σ-finite measure on a measure space (Ω, A). All other measures
  discussed in (1)-(2) are defined on (Ω, A).

  1. If λ_i, i = 1, 2 are measures and λ_i << ν, then λ_1 + λ_2 << ν and

         d(λ_1 + λ_2)/dν = dλ_1/dν + dλ_2/dν   a.e. ν.

  2. (Chain rule). If τ is a measure, λ is a σ-finite measure, and
     τ << λ << ν, then

         dτ/dν = (dτ/dλ)(dλ/dν)   a.e. ν.

     In particular, if λ << ν and ν << λ (in which case λ and ν are said to
     be equivalent), then

         dλ/dν = (dν/dλ)^{-1}.

  (A numerical illustration follows.)
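A small numerical illustration of the inverse relation, under hypothetical choices on (0, 1]: ν = Lebesgue measure and λ(A) = ∫_A 2x dν, so dλ/dν = 2x and dν/dλ = 1/(2x).

```python
import numpy as np

# Check: ν((a, b]) = b - a is recovered as ∫_(a,b] (dν/dλ) dλ = ∫_a^b (1/(2x)) * 2x dx.
a, b, n = 0.2, 0.9, 1_000_000
x = a + (np.arange(n) + 0.5) * (b - a) / n     # midpoint grid on (a, b]
integrand = (1.0 / (2.0 * x)) * (2.0 * x)      # (dν/dλ)(dλ/dν) = 1
print(np.sum(integrand) * (b - a) / n)         # ~ 0.7 = b - a
```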
What we get?

I A probability measure P will only admit a density f if it is absolutely
  continuous with respect to a sigma-finite measure µ.

I Further, if the choice of such a µ changes, then the density f also changes.

I For example, a parametric family {P_θ : θ ∈ Θ} dominated by a σ-finite
  measure ν on (Ω, A) is called an exponential family if and only if the
  distributions P_θ have densities of the form

      p_θ(x) = (dP_θ/dν)(x) = exp( ∑_{i=1}^{s} η_i(θ) T_i(x) − B(θ) ) h(x),

  where the η_i and B are real-valued functions of the parameters and the T_i
  are real valued statistics.

I This representation is not unique. If we change the measure ν that dominates
  the family, the above representation will also change. For example, we can
  define a new measure λ(A) = ∫_A h dν for any A ∈ A, and this gives the same
  family the density

      p_θ(x) = (dP_θ/dλ)(x) = exp( ∑_{i=1}^{s} η_i(θ) T_i(x) − B(θ) ).

  (See the Poisson sketch below.)
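A sketch with one concrete (hypothetical) member of an exponential family: Poisson(θ) dominated by the counting measure ν, with T(x) = x, η(θ) = log θ, B(θ) = θ, h(x) = 1/x!:

```python
import math

theta = 2.0
for x in range(5):
    pmf = math.exp(-theta) * theta**x / math.factorial(x)                      # usual pmf
    expfam = math.exp(x * math.log(theta) - theta) * (1.0 / math.factorial(x)) # exponential-family form
    print(x, round(pmf, 6), round(expfam, 6))                                  # identical columns

# If we instead dominate by lambda(A) = sum_{x in A} 1/x!, the factor h(x) = 1/x! is absorbed into
# the measure and the density becomes exp(x*log(theta) - theta): the density depends on the
# choice of dominating measure, the distribution does not.
```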
Expectation

I Consider a probability space (Ω, A, P) and a random variable X : Ω → R.

I We say E(X) exists if ∫ X dP exists, and then E(X) = ∫ X dP.

I That is, E(X) is finite iff X is P-integrable.

I If P_X is the probability distribution of X, then for any measurable
  function h we have

      E(h(X)) = ∫_R h(x) dP_X.

I Further, if X has cdf F_X, then we can write

      E(h(X)) = ∫_R h(x) dP_X = ∫_R h(x) dF_X(x),

  where the last integral is a Riemann-Stieltjes integral.
  (A Monte Carlo reading of E(h(X)) = ∫ h(X) dP is sketched below.)
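A minimal Monte Carlo sketch reading E(h(X)) = ∫ h(X) dP as an "average over ω", under the hypothetical choices X ~ N(0, 1) and h(x) = x², whose exact expectation is 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)    # simulated values X(omega)
print(np.mean(x**2))                  # ~ 1 = E(h(X))
```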
I Now suppose X is an absolutely continuous random variable with probability
  density f_X. Then P_X << λ and we have

      E(h(X)) = ∫_R h(x) dP_X = ∫_R h(x) f_X(x) dλ = ∫_R h(x) f_X(x) dx.

I On the other hand, if X is a discrete random variable taking values in a
  countable set D with pmf p_X(x), then P_X << µ (counting measure) and

      E(h(X)) = ∫_R h(x) dP_X = ∫_R h(x) p_X(x) dµ = ∑_{x ∈ D} h(x) p_X(x).

I We note that we can only express E(X) as E(X) = ∫ x f(x) dx or ∑ x p(x) if
  X is an absolutely continuous or a discrete random variable, whereas the
  forms E(X) = ∫ X dP, ∫_R x dP_X and ∫_R x dF_X(x) are always valid.
  (Both concrete forms are sketched below.)
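A short sketch of the two concrete forms, with hypothetical choices of distribution and h:

```python
import numpy as np

# Discrete case: E h(X) = sum h(x) p(x), with a hypothetical pmf on {0, 1, 2, 3} and h(x) = x**2.
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
print(sum(x**2 * px for x, px in p.items()))      # 5.0

# Absolutely continuous case: E h(X) = integral of h(x) f_X(x) dx, with X ~ Exp(1), h(x) = x**2
# (exact value 2); the integral is truncated at 50 since the tail is negligible.
n, upper = 2_000_000, 50.0
x = (np.arange(n) + 0.5) * (upper / n)
print(np.sum(x**2 * np.exp(-x)) * upper / n)      # ~ 2
```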
Singular measure

I Consider a measurable space (Ω, A) and two measures µ and γ defined on it.
  We say γ is singular w.r.t. µ if there exists a set A ∈ A such that

      µ(A) = 0  and  γ(A^c) = 0,

  and we denote this as γ ⊥ µ.

I It is to be noted that if γ is singular with respect to µ, then µ is also
  singular with respect to γ, and hence we often call µ and γ mutually
  singular.
Example

I Uniform([0, 1]) and Uniform([1, 2]) are mutually singular.

I Uniform([1, 3]) is neither absolutely continuous nor singular with respect
  to Uniform([2, 4]).

I Uniform([1, 2]) is absolutely continuous with respect to Uniform([0, 4]),
  but not conversely.

I All these uniforms are absolutely continuous with respect to Lebesgue
  measure.
Example

I A measure that is purely discrete is singular with respect to Lebesgue
  measure.

I A probability measure on the line with a density (e.g., N(0, 1)) is
  absolutely continuous w.r.t. λ. In fact, N(0, 1) and λ are mutually
  absolutely continuous, since the normal density is positive everywhere.

I However, the exponential distribution is absolutely continuous w.r.t.
  Lebesgue measure, but not conversely (since (−∞, 0) has zero probability
  under the exponential distribution but has positive Lebesgue measure).
Example

I Suppose that θ is uniformly distributed on the interval [0, 2π).
  Let X = cos θ, Y = sin θ.

I Then (X, Y) has a continuous distribution on the circle
  C = {(x, y) : x² + y² = 1}.

I This is because if (x, y) ∈ C then there exists a unique θ_0 ∈ [0, 2π)
  with x = cos θ_0 and y = sin θ_0. Hence

      P[(X, Y) = (x, y)] = P(θ = θ_0) = 0.

I In fact P(Y > X) = 0.5, which means the probability measure P_(X,Y)
  assigns the value 0.5 to the set {ω : Y(ω) > X(ω)}.

I But the distribution of (X, Y) and λ_2 are mutually singular because

      P[(X, Y) ∈ C] = 1  but  λ_2(C) = 0,

  and hence we have no density. (A simulation sketch follows.)
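A simulation sketch of this example: all sampled points of (X, Y) lie on the λ₂-null set C, yet the event {Y > X} has probability 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, size=100_000)
x, y = np.cos(theta), np.sin(theta)
print(np.mean(y > x))                        # ~ 0.5
print(np.max(np.abs(x**2 + y**2 - 1.0)))     # ~ 0: every sampled point lies on the circle C
```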
Example

I Suppose Z = (X, Y) ∼ N_2(0, 0, 1, 1, ρ).

I If ρ = 1, then (X, Y) lies on the straight line Y = X, and if ρ = −1, then
  (X, Y) lies on the straight line Y = −X in the two-dimensional plane.

I Now P_Z is a continuous probability distribution because
  P_Z({(1, 1)}) = P(X = 1) = 0 if ρ = 1 and P_Z({(1, −1)}) = P(X = 1) = 0 if
  ρ = −1, and similarly for any other single point.

I Thus P_Z cannot have a density w.r.t. counting measure, that is, P_Z cannot
  have a p.m.f.

I Moreover, P_Z is singular w.r.t. Lebesgue measure λ_2 because for ρ = 1,
  P(X = Y) = 1 but λ_2({(x, y) : x = y}) = 0, and similarly for ρ = −1,
  P(X = −Y) = 1 but λ_2({(x, y) : x = −y}) = 0.

I Thus P_Z cannot have a density w.r.t. Lebesgue measure either.

I We call such a distribution a singular bivariate normal: it is a
  probability measure without any density. (A simulation sketch follows.)
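A simulation sketch for ρ = 1: the singular covariance matrix [[1, 1], [1, 1]] forces every draw of Z onto the line y = x, a set of λ₂-measure zero.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])    # correlation rho = 1, so Sigma is singular
Z = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=50_000)
print(np.max(np.abs(Z[:, 1] - Z[:, 0])))      # ~ 0 (up to floating point): all mass on {y = x}
```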
Example

I Suppose X ∈ R^p is such that X ∼ N_p(µ, Σ).

I In order to write a density for X, we generally assume Σ > 0 (which means
  Rank(Σ) = p).

I Suppose instead Rank(Σ) = r < p.

I Then X lies in an r-dimensional (affine) subspace of R^p.


I Now P_X is a continuous probability distribution because

      P_X({x}) = 0 for any fixed point x of the r-dimensional subspace.

I Thus P_X cannot have a density w.r.t. counting measure, that is, P_X cannot
  have a p.m.f.

I Moreover, P_X is singular w.r.t. Lebesgue measure λ_p because

      P_X(r-dimensional subspace) = 1  but  λ_p(r-dimensional subspace) = 0.

I Thus P_X cannot have a density w.r.t. Lebesgue measure either.

I We call such a distribution a singular multivariate normal: it is a
  probability measure without any density. (A simulation sketch follows.)
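A simulation sketch with hypothetical dimensions p = 4, r = 2: a rank-deficient Σ concentrates the samples on an r-dimensional subspace of R^p, which has λ_p-measure zero.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r = 4, 2
A = rng.standard_normal((p, r))              # Sigma = A @ A.T has rank r
X = rng.standard_normal((10_000, r)) @ A.T   # X ~ N_p(0, A A^T), supported on the column space of A
Q, _ = np.linalg.qr(A)                       # orthonormal basis of that column space
residual = X - (X @ Q) @ Q.T                 # component of each sample orthogonal to the subspace
print(np.linalg.matrix_rank(A @ A.T), np.abs(residual).max())   # 2, ~ 0
```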
Lebesgue Decomposition Theorem

I Consider a sigma-finite measure space (Ω, A, µ) and let γ be another
  sigma-finite measure on (Ω, A). Then there exist two measures γ_0 and γ_1
  on (Ω, A) such that γ = γ_0 + γ_1, where γ_0 ⊥ µ and γ_1 << µ. Such a
  decomposition is unique. (A concrete decomposition is sketched below.)
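A concrete (hypothetical) illustration with µ = λ: take γ = 0.5·δ_0 + 0.5·N(0, 1). Its Lebesgue decomposition is γ_0 = 0.5·δ_0 (singular, concentrated on the λ-null set {0}) and γ_1 = 0.5·N(0, 1) (absolutely continuous w.r.t. λ).

```python
from math import erf, sqrt

Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))   # standard normal cdf

def gamma(a, b):
    """gamma((a, b]) = singular part + absolutely continuous part."""
    atom = 0.5 if a < 0 <= b else 0.0              # gamma_0((a, b]) = 0.5 * delta_0((a, b])
    return atom + 0.5 * (Phi(b) - Phi(a))          # + gamma_1((a, b])

print(gamma(-1e-9, 1e-9))                              # ~ 0.5: the atom survives on a tiny interval
print(gamma(0.0, 3.0) - 0.5 * (Phi(3.0) - Phi(0.0)))   # 0.0: away from 0 only the a.c. part contributes
```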
Cantor set

I Take I = [0, 1]. Trisect I as [0, 1/3] ∪ [1/3, 2/3] ∪ [2/3, 1] and call the
  three pieces I_11, I_21, I_31 respectively.

I The λ-measure of each piece is 1/3 = λ(I_i1), i = 1, 2, 3.

I Step I: Remove the middle piece I_21 and call it J_11.

I Step II: Next trisect each of I_11, I_31. Remove the middle part from each
  and retain the two extreme parts. The removed parts are denoted J_21, J_22.
  Then λ(J_21) = λ(J_22) = (1/3) λ(J_11) = 1/3².

I Step III: Each retained part is again trisected. As before, the middle
  parts are removed and the extreme parts are retained. The middle parts are
  denoted J_31, J_32, J_33, J_34. Then λ(J_3i) = (1/3) λ(J_2i) = 1/3³.
I Continue the process. At the nth step the removed sets are denoted
  J_nk, k = 1, 2, . . . , 2^{n−1}, and we have λ(J_nk) = 1/3^n.

I Let us write

      D = J_11 ∪ (J_21 ∪ J_22) ∪ (J_31 ∪ J_32 ∪ J_33 ∪ J_34) ∪ . . .
        = D_1 ∪ D_2 ∪ D_3 ∪ D_4 ∪ . . . = ∪_{n=1}^{∞} D_n.

I Here λ(D_n) = 2^{n−1}/3^n.

I Hence λ(D) = ∑_{n=1}^{∞} λ(D_n) = ∑_{n=1}^{∞} 2^{n−1}/3^n = 1.

I Set C = I − D = I ∩ D^c ⇒ λ(C) = 1 − λ(D) = 0.

I C is called the Cantor set (an uncountable set having Lebesgue measure 0).
  (A numerical check of the series is given below.)
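A quick numerical check of the series ∑_{n≥1} 2^{n−1}/3^n, the total length removed from [0, 1]:

```python
# Partial sums approach 1, so lambda(D) = 1 and hence lambda(C) = 0.
for N in (1, 2, 5, 10, 30):
    print(N, sum(2**(n - 1) / 3**n for n in range(1, N + 1)))
```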
I Let us define F(x) on D as follows: relabel the removed intervals up to
  stage n, from left to right, as J_n1, . . . , J_{n, 2^n − 1} (so that, e.g.,
  J_22 = J_11 and J_34 = J_22 = J_11), and set

      F(x) = k/2^n,   x ∈ J_nk,   k = 1, 2, . . . , 2^n − 1,  n ≥ 1.

I For n = 1: F(x) = 1/2 for x ∈ J_11.

I For n = 2: F(x) = 1/4 on J_21, 1/2 on J_22 = J_11, 3/4 on J_23.

I For n = 3: F(x) = 1/8 on J_31, 1/4 on J_32 = J_21, 3/8 on J_33,
  1/2 on J_34 = J_22 = J_11, 5/8 on J_35, 3/4 on J_36 = J_23, 7/8 on J_37.

I And so on. (A small sketch computing F is given below.)
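A small sketch computing the continuous extension of F (the standard ternary-digit algorithm; the name cantor_F and the truncation depth are choices made here for illustration):

```python
def cantor_F(x, depth=60):
    """Ternary-digit evaluation: a digit 1 means x sits in a removed interval J_nk, where F(x) = k/2^n;
    digits 0 and 2 become binary digits 0 and 1 of F(x)."""
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + scale          # constant value on the removed middle third
        value += scale * (digit // 2)
        scale /= 2.0
    return value

print([cantor_F(t) for t in (0.0, 0.4, 0.5, 0.25, 1.0)])   # [0.0, 0.5, 0.5, ~1/3, 1.0]
```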
I Thus the function defined above is non-decreasing and right continuous,
  with F(0) = 0 and lim_{x↑1} F(x) = 1 = F(1).

I Such a function is not a jump function. It is differentiable on D with
  F′(x) = 0 for all x ∈ D, and hence F′(x) = 0 a.e. on [0, 1] (relative to λ),
  since C ∪ D = [0, 1], λ(C) = 0, λ(D) = 1.

I Such a function, extended by continuity to all of [0, 1], is continuous on
  [0, 1]; it is the Cantor function.

I Also

      ∫_0^1 F′(x) dλ = ∫_D F′(x) dλ = 0 < F(1) − F(0) = 1,

  so F′(x) is not the density of F, i.e. F is not absolutely continuous.
