Handbook Unit 1

This handbook serves as an introduction to probability theory for postgraduate students, focusing on conceptual understanding, visual learning, logical development, and practical applications. It includes a structured approach with sequential chapters covering foundational topics, advanced measure-theoretic concepts, and various probability theories. The book also provides guidelines on how to effectively use the material, including working through examples and utilizing margin notes for quick reference.


Preface

This handbook is designed to provide postgraduate students with a rigorous
yet accessible introduction to probability theory. The approach emphasizes:

• Conceptual Understanding: Each definition is accompanied by detailed
  explanations and motivations

• Visual Learning: Extensive use of diagrams and visualizations to
  illustrate abstract concepts

• Logical Development: Careful progression from basic to advanced
  topics with clear connections

• Practical Applications: Examples drawn from statistics, computer
  science, and real-world problems

How to Use This Book:

• Read chapters sequentially as concepts build upon each other

• Work through all examples and problems to reinforce understanding

• Use the margin notes and key idea boxes for quick reference

• Refer to the appendix for proofs and additional technical details

Contents

Preface

1 Foundations of Probability Theory
   1.1 Set Theory Fundamentals
       1.1.1 Basic Concepts and Definitions
       1.1.2 Set Operations
       1.1.3 De Morgan's Laws
       1.1.4 Generalized Set Operations
   1.2 Set Theory in Probability
       1.2.1 Sample Spaces and Events
       1.2.2 Probability Measures

2 Advanced Measure-Theoretic Foundations
   2.1 σ-Algebras and Measurable Spaces
   2.2 Measure Theory Foundations
   2.3 Advanced Limit Concepts
   2.4 Complete Measures and Extensions
   2.5 Borel Sets and Measurable Functions
   2.6 Measurable Functions
   2.7 Probability Measures
   2.8 Random Variables
   2.9 Integration Theory

3 Advanced Probability Theory
   3.1 Inverse Functions and Measurability
   3.2 Probability Spaces
   3.3 Random Variables and Distributions
   3.4 Integration and Expectation
   3.5 Convergence of Random Variables

4 Probability Foundations
   4.1 Classical Probability
   4.2 Empirical Probability
   4.3 Subjective Probability
   4.4 Axiomatic Probability
   4.5 Comparison of Probability Interpretations
   4.6 Basic Properties of Probability Measures
   4.7 Continuity Theorems
   4.8 Inclusion-Exclusion Principle
   4.9 Advanced Properties
   4.10 Applications and Examples

5 Discrete Probability Theory
   5.1 Discrete Probability Spaces
   5.2 Finite Probability Spaces
   5.3 Countable Probability Spaces
   5.4 Discrete Random Variables
   5.5 Important Discrete Distributions
   5.6 Expectation and Variance

6 Continuous Probability Theory
   6.1 Continuous Probability Spaces
   6.2 Important Continuous Distributions
   6.3 Continuous Random Variables
   6.4 Normal Distribution
   6.5 Expectation and Variance

7 Distribution Functions and Their Properties
   7.1 Cumulative Distribution Functions
   7.2 Types of Distributions
   7.3 Decomposition of Distributions
   7.4 Convergence of Distribution Functions
   7.5 Empirical Distribution Functions
   7.6 Applications and Examples

A Detailing on Selected Concepts
   A.1 De Morgan's Laws
   A.2 Validating Borel Sets
       A.2.1 Validation Method
       A.2.2 Non-Borel Examples
       A.2.3 Verification Techniques
   A.3 Borel-Cantelli Lemma
   A.4 Radon-Nikodym Theorem
   A.5 Applications
Chapter 1

Foundations of Probability
Theory

1.1 Set Theory Fundamentals


1.1.1 Basic Concepts and Definitions
Definition 1.1 (Set). A set is a well-defined collection of distinct objects,
called elements or members. Sets are typically denoted by capital letters
(A, B, C, . . .) and their elements by lowercase letters (a, b, c, . . .).
Formally, a set A in a universe Ω can be defined by its characteristic
function:

χ_A(ω) = 1 if ω ∈ A, and χ_A(ω) = 0 if ω ∉ A.

Remark 1.2. The concept of a set is fundamental in probability theory


because:

• Sample spaces are sets of all possible outcomes

• Events are subsets of the sample space

• Probability measures are functions defined on collections of sets

1.1.2 Set Operations


Definition 1.3 (Basic Set Operations). For sets A, B ⊆ Ω:

• Union: A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}


• Intersection: A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}

• Complement: A^c = {ω ∈ Ω : ω ∉ A}

• Set Difference: A \ B = {ω ∈ Ω : ω ∈ A and ω ∉ B}

• Symmetric Difference: A△B = (A \ B) ∪ (B \ A)

Example 1.4. Let Ω = {1, 2, 3, 4, 5}, A = {1, 2, 4}, B = {2, 4, 5}:

A ∪ B = {1, 2, 4, 5}
A ∩ B = {2, 4}
A^c = {3, 5}
A \ B = {1}
A△B = {1, 5}

Figure 1.1: Venn diagram illustrating basic set operations
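The operations above map directly onto Python's built-in `set` type; a quick check of the values in Example 1.4 (the sets are copied from the example, the code is purely illustrative):

```python
# Sets from Example 1.4; Python's set operators mirror the definitions above.
omega = {1, 2, 3, 4, 5}
A = {1, 2, 4}
B = {2, 4, 5}

print(A | B)        # union A ∪ B
print(A & B)        # intersection A ∩ B
print(omega - A)    # complement A^c relative to omega
print(A - B)        # set difference A \ B
print(A ^ B)        # symmetric difference A △ B
```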

1.1.3 De Morgan’s Laws


Theorem 1.5 (De Morgan’s Laws). For any two sets A, B ⊆ Ω:

1. (A ∪ B)^c = A^c ∩ B^c

2. (A ∩ B)^c = A^c ∪ B^c

Proof of First Law. We prove both inclusions:

1. (A ∪ B)^c ⊆ A^c ∩ B^c:

   • Let x ∈ (A ∪ B)^c
   • Then x ∉ A ∪ B by definition of complement
   • Thus x ∉ A and x ∉ B (negation of "or")
   • Therefore x ∈ A^c and x ∈ B^c
   • Hence x ∈ A^c ∩ B^c

2. A^c ∩ B^c ⊆ (A ∪ B)^c:

   • Let x ∈ A^c ∩ B^c
   • Then x ∉ A and x ∉ B
   • Thus x ∉ A ∪ B
   • Therefore x ∈ (A ∪ B)^c

Key Idea

De Morgan’s Laws establish a duality between union and intersection


through complementation. This duality is fundamental in probability
theory when dealing with events and their complements.
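The duality is easy to verify exhaustively on a small universe; the sketch below checks both laws for every pair of subsets of an illustrative six-element universe:

```python
from itertools import combinations

# A small universe; every subset pair is checked against both laws.
omega = frozenset(range(6))

def complement(S):
    return omega - S

subsets = [frozenset(c) for r in range(len(omega) + 1)
           for c in combinations(omega, r)]

for A in subsets:
    for B in subsets:
        assert complement(A | B) == complement(A) & complement(B)
        assert complement(A & B) == complement(A) | complement(B)
print("De Morgan's laws hold for all", len(subsets) ** 2, "pairs")
```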

1.1.4 Generalized Set Operations


Definition 1.6 (Generalized Union and Intersection). For an indexed family
of sets {Ai }i∈I :
• ⋃_{i∈I} A_i = {ω ∈ Ω : ∃i ∈ I such that ω ∈ A_i}

• ⋂_{i∈I} A_i = {ω ∈ Ω : ∀i ∈ I, ω ∈ A_i}

Example 1.7 (Nested Intervals). Consider the sequence of sets A_n = (0, 1 + 1/n)
for n ∈ N:

⋂_{n=1}^∞ A_n = (0, 1]

This shows that countable intersections can behave differently than finite
intersections.

Theorem 1.8 (Extended De Morgan's Laws). For any collection {A_i}_{i∈I}:

(⋃_{i∈I} A_i)^c = ⋂_{i∈I} A_i^c

(⋂_{i∈I} A_i)^c = ⋃_{i∈I} A_i^c

1.2 Set Theory in Probability


1.2.1 Sample Spaces and Events
Definition 1.9 (Sample Space). The sample space Ω is the set of all
possible outcomes of a random experiment.

Definition 1.10 (Event). An event is a subset of the sample space to which


we can assign probability. The collection of all events forms a σ-algebra F .

Example 1.11. For rolling a die:

• Sample space: Ω = {1, 2, 3, 4, 5, 6}

• Events: "Even number" = {2, 4, 6}, "Prime number" = {2, 3, 5}

1.2.2 Probability Measures


Definition 1.12 (Probability Measure). A function P : F → [0, 1] is a
probability measure if:

• P (Ω) = 1

• For countable disjoint events {A_i}, P(⋃_i A_i) = Σ_i P(A_i)

Proposition 1.13 (Basic Properties). For any events A, B:

• P(A^c) = 1 − P(A)

• If A ⊆ B, then P (A) ≤ P (B)

• P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
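These properties can be checked exactly on the die of Example 1.11 using exact rational arithmetic; the events below are the illustrative ones from that example:

```python
from fractions import Fraction

# Uniform probability measure on a fair die (Example 1.11).
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    # event is assumed to be a subset of omega
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})   # "even number"
B = frozenset({2, 3, 5})   # "prime number"

assert P(omega - A) == 1 - P(A)                   # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)         # union formula
C = frozenset({2, 4})
assert C <= A and P(C) <= P(A)                    # monotonicity
print("Proposition 1.13 verified on the die example")
```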
Chapter 2

Advanced Measure-Theoretic
Foundations

2.1 σ-Algebras and Measurable Spaces


Definition 2.1 (σ-Algebra). A collection F of subsets of X is a σ-algebra if:

1. X ∈ F

2. Closed under complements: A ∈ F ⇒ A^c ∈ F

3. Closed under countable unions: {A_n}_{n=1}^∞ ⊆ F ⇒ ⋃_{n=1}^∞ A_n ∈ F

The pair (X, F) is called a measurable space.

Theorem 2.2 (Generated σ-Algebra). For any collection C of subsets of
X, there exists a smallest σ-algebra σ(C) containing C, called the σ-algebra
generated by C.

Proof. Let {F_i}_{i∈I} be the family of all σ-algebras containing C. Then:

σ(C) = ⋂_{i∈I} F_i

We verify the σ-algebra properties:

1. Each F_i contains X, so X ∈ σ(C)

2. If A ∈ σ(C), then A ∈ F_i for all i, hence A^c ∈ F_i for all i, so
   A^c ∈ σ(C)

3. For countable unions {A_n} ⊆ σ(C), each A_n ∈ F_i for all i, hence
   ⋃ A_n ∈ F_i for all i, so ⋃ A_n ∈ σ(C)

Minimality follows since any σ-algebra containing C is one of the F_i.

Example 2.3 (Borel σ-Algebra). The Borel σ-algebra B(R^n) is generated by:

• Open sets: σ(open sets)

• Equivalent generators: closed sets, intervals (a, b], or rectangles
  (a_1, b_1] × · · · × (a_n, b_n]

2.2 Measure Theory Foundations


Definition 2.4 (Measure). A function µ : F → [0, ∞] is a measure if:

1. µ(∅) = 0

2. Countable additivity: For disjoint {A_n}_{n=1}^∞ ⊆ F,

   µ(⋃_{n=1}^∞ A_n) = Σ_{n=1}^∞ µ(A_n)

The triple (X, F, µ) is a measure space.


Theorem 2.5 (Continuity from Below). Let (X, F, µ) be a measure space
and A_n ↑ A (i.e., A_n ⊆ A_{n+1} and A = ⋃_{n=1}^∞ A_n). Then:

lim_{n→∞} µ(A_n) = µ(A)

Proof. Construct disjoint sets B_1 = A_1, B_n = A_n \ A_{n−1} for n ≥ 2. Then:

µ(A) = µ(⋃_{n=1}^∞ A_n) = µ(⋃_{n=1}^∞ B_n)
     = Σ_{n=1}^∞ µ(B_n)   (countable additivity)
     = lim_{N→∞} Σ_{n=1}^N µ(B_n) = lim_{N→∞} µ(⋃_{n=1}^N B_n)
     = lim_{N→∞} µ(A_N)

Theorem 2.6 (Carathéodory’s Extension Theorem). Let µ be a premeasure
on an algebra A. Then there exists a unique extension of µ to a measure on
σ(A).

Proof Sketch. 1. Construct the outer measure:

   µ*(E) = inf { Σ_{i=1}^∞ µ(A_i) : E ⊆ ⋃_{i=1}^∞ A_i, A_i ∈ A }

2. Identify the collection M of µ*-measurable sets, i.e. those M ⊆ X satisfying:

   µ*(E) = µ*(E ∩ M) + µ*(E ∩ M^c)   ∀E ⊆ X

3. Show M is a σ-algebra containing A

4. Prove that the restriction of µ* to M is the desired measure extension

2.3 Advanced Limit Concepts


Definition 2.7 (Set Convergence). For a sequence {A_n} in F:

• lim sup A_n = ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k = {ω : ω ∈ A_n for infinitely many n}

• lim inf A_n = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k = {ω : ω ∈ A_n for all but finitely many n}

When lim sup A_n = lim inf A_n = A, we say A_n → A.


Theorem 2.8 (Borel-Cantelli Lemma). For any measure space (X, F, µ):

1. If Σ_{n=1}^∞ µ(A_n) < ∞, then µ(lim sup A_n) = 0

2. If µ is a probability measure and {A_n} are independent with
   Σ_{n=1}^∞ µ(A_n) = ∞, then µ(lim sup A_n) = 1

Proof. (1) Note that:

µ(lim sup A_n) = µ(⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k) ≤ µ(⋃_{k=n}^∞ A_k) ≤ Σ_{k=n}^∞ µ(A_k) → 0

as n → ∞ by the convergence of the series.

(2) Using independence:

µ(⋂_{k=n}^N A_k^c) = Π_{k=n}^N (1 − µ(A_k)) ≤ exp(−Σ_{k=n}^N µ(A_k)) → 0

as N → ∞. Thus µ(lim inf A_n^c) = 0 and µ(lim sup A_n) = 1.


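A small simulation illustrates the first Borel-Cantelli lemma; the parameters (P(A_n) = 1/n², the horizon, and the number of sample paths) are illustrative choices, not from the text:

```python
import random

random.seed(0)

# Independent events A_n with P(A_n) = 1/n^2, so sum P(A_n) < infinity.
# The first Borel-Cantelli lemma says that almost surely only finitely
# many A_n occur; on a typical path the last index at which an event
# occurs is therefore small compared with the horizon N.
N = 5_000          # horizon (illustrative)
paths = 100        # number of simulated sample paths (illustrative)
last_occurrence = []
for _ in range(paths):
    last = 0
    for n in range(1, N + 1):
        if random.random() < 1 / n**2:
            last = n
    last_occurrence.append(last)

print(sorted(last_occurrence)[-5:])   # even the largest "last" indices are small
```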

2.4 Complete Measures and Extensions


Definition 2.9 (Complete Measure). A measure space (X, F, µ) is complete
if:

∀N ∈ F with µ(N) = 0 and ∀A ⊆ N, we have A ∈ F

Theorem 2.10 (Completion Theorem). For any measure space (X, F, µ),
there exists a smallest complete measure space (X, F̄, µ̄) extending it.

Proof. Define:

F̄ = {E ∪ N : E ∈ F, N ⊆ F₀ for some F₀ ∈ F with µ(F₀) = 0}
µ̄(E ∪ N) = µ(E)

Verification:

• F̄ is a σ-algebra

• µ̄ is well-defined and extends µ

• µ̄ is complete

• Minimality follows from the construction

2.5 Borel Sets and Measurable Functions


Definition 2.11 (Borel σ-Algebra). The Borel σ-algebra B(R) is the smallest
σ-algebra containing all open sets in R. Equivalently, it is generated by:

• Open intervals: σ({(a, b) : a < b})

• Closed intervals: σ({[a, b] : a ≤ b})

• Left-open intervals: σ({(a, b] : a < b})

Theorem 2.12 (Cardinality of Borel Sets). The Borel σ-algebra has cardinality
2^{ℵ₀} (the same as R), strictly smaller than 2^{2^{ℵ₀}} (the cardinality of the
power set of R).

Proof. 1. Construct the Borel hierarchy:

Σ⁰₁ = open sets
Π⁰₁ = closed sets
Σ⁰_α = { ⋃_{n∈N} A_n : A_n ∈ Π⁰_{β_n}, β_n < α }
Π⁰_α = { A^c : A ∈ Σ⁰_α }

2. Show B(R) = ⋃_{α<ω₁} Σ⁰_α, where ω₁ is the first uncountable ordinal.

3. Each level has cardinality 2^{ℵ₀}, and a union over ℵ₁ levels preserves this.

2.6 Measurable Functions


Definition 2.13 (Measurable Function). A function f : (X, F) → (R, B(R))
is measurable if:

∀B ∈ B(R), f^{−1}(B) ∈ F

Equivalently, any of these conditions suffices:

• ∀a ∈ R, {x : f(x) ≤ a} ∈ F

• ∀a ∈ R, {x : f(x) > a} ∈ F

• for every open U ⊆ R, f^{−1}(U) ∈ F

Theorem 2.14 (Operations Preserving Measurability). Let f, g : (X, F) →
(R, B(R)) be measurable. Then:

1. f + g is measurable

2. f · g is measurable

3. max(f, g) and min(f, g) are measurable

4. If g ≠ 0, f/g is measurable

5. For continuous ϕ : R → R, ϕ ∘ f is measurable

Proof for f + g. For any a ∈ R:

{x : f(x) + g(x) > a} = ⋃_{r∈Q} ({x : f(x) > r} ∩ {x : g(x) > a − r}) ∈ F

since:

• The union is countable

• Each {x : f(x) > r} ∈ F and {x : g(x) > a − r} ∈ F

• F is closed under countable unions and finite intersections

2.7 Probability Measures


Definition 2.15 (Probability Measure). A function P : F → [0, 1] is a
probability measure if:

1. P(∅) = 0, P(Ω) = 1

2. Countable additivity: For disjoint {A_n}_{n=1}^∞ ⊆ F,

   P(⋃_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n)

The triple (Ω, F, P) is a probability space.

Theorem 2.16 (Continuity of Probability Measures). For any probability
measure P:

1. If A_n ↑ A (i.e., A_1 ⊆ A_2 ⊆ · · · and A = ⋃ A_n), then

   lim_{n→∞} P(A_n) = P(A)

2. If A_n ↓ A (i.e., A_1 ⊇ A_2 ⊇ · · · and A = ⋂ A_n), then

   lim_{n→∞} P(A_n) = P(A)

Proof for increasing sequences. 1. Construct disjoint sets:

   B_1 = A_1,  B_n = A_n \ A_{n−1} for n ≥ 2

2. Then:

   A = ⋃_{n=1}^∞ B_n   (disjoint union)

3. By countable additivity:

   P(A) = Σ_{n=1}^∞ P(B_n) = lim_{N→∞} Σ_{n=1}^N P(B_n) = lim_{N→∞} P(A_N)

2.8 Random Variables


Definition 2.17 (Random Variable). A random variable is a measurable
function X : (Ω, F, P) → (R, B(R)).

Theorem 2.18 (Measurability of Limits). Let {X_n} be a sequence of random
variables. Then:

1. sup_n X_n and inf_n X_n are random variables

2. lim sup_{n→∞} X_n and lim inf_{n→∞} X_n are random variables

3. If lim_{n→∞} X_n exists pointwise, it is a random variable

Proof for sup_n X_n. For any a ∈ R:

{ω : sup_n X_n(ω) ≤ a} = ⋂_{n=1}^∞ {ω : X_n(ω) ≤ a} ∈ F

since:

• Each {ω : X_n(ω) ≤ a} ∈ F by measurability of X_n

• F is closed under countable intersections

2.9 Integration Theory


Definition 2.19 (Lebesgue Integral). For a random variable X ≥ 0:

1. For simple X = Σ_{i=1}^n a_i I_{A_i}:

   ∫ X dP = Σ_{i=1}^n a_i P(A_i)

2. For general X ≥ 0:

   ∫ X dP = sup { ∫ Y dP : 0 ≤ Y ≤ X, Y simple }

3. For general X:

   ∫ X dP = ∫ X⁺ dP − ∫ X⁻ dP

Theorem 2.20 (Monotone Convergence Theorem). If 0 ≤ X_n ↑ X pointwise,
then:

lim_{n→∞} ∫ X_n dP = ∫ X dP

Proof. 1. ∫ X_n dP ≤ ∫ X dP by monotonicity, so lim sup ∫ X_n dP ≤ ∫ X dP.

2. For any simple Y ≤ X with 0 ≤ Y ≤ c, define:

   Y_n = min(Y, X_n)

Then Y_n ↑ Y and:

   ∫ Y dP = lim_{n→∞} ∫ Y_n dP ≤ lim inf_{n→∞} ∫ X_n dP

3. Take the supremum over all simple Y ≤ X to get:

   ∫ X dP ≤ lim inf_{n→∞} ∫ X_n dP
Chapter 3

Advanced Probability Theory

3.1 Inverse Functions and Measurability


Definition 3.1 (Inverse Function). For a bijective function f : (X, F ) →
(Y, G ) between measurable spaces:

• The inverse f −1 : Y → X satisfies f −1 (y) = x ⇐⇒ f (x) = y

• For any B ⊆ Y , f −1 (B) = {x ∈ X : f (x) ∈ B}

Theorem 3.2 (Measurability of Inverse Functions). If f : (X, F) → (Y, G)
is bijective and measurable, then:

1. f^{−1} is measurable when F = σ(f^{−1}(G))

2. For a continuous bijection f : R → R, f^{−1} is Borel measurable

Proof. 1. Since preimages preserve set operations, f^{−1}(G) is itself a
σ-algebra, so F = σ(f^{−1}(G)) = f^{−1}(G). Any A ∈ F thus has the form
A = f^{−1}(B) with B ∈ G, and by bijectivity:

   (f^{−1})^{−1}(A) = f(A) = f(f^{−1}(B)) = B ∈ G

2. For the Borel case:

   • {A ⊆ R : f(A) is Borel} is a σ-algebra, since images under a bijection
     respect complements and countable unions

   • It contains the open sets: a continuous bijection on R is strictly
     monotone, hence an open map

   • Hence it contains all Borel sets, so f^{−1} is Borel measurable


3.2 Probability Spaces


Definition 3.3 (Complete Probability Space). A probability space (Ω, F, P)
is complete if:

∀N ∈ F with P(N) = 0, ∀A ⊆ N, A ∈ F

Theorem 3.4 (Completion Theorem). For any probability space (Ω, F, P),
there exists a smallest complete extension (Ω, F̄, P̄).

Proof. Define:

F̄ = {E ∪ N : E ∈ F, N ⊆ F₀ for some F₀ ∈ F with P(F₀) = 0}
P̄(E ∪ N) = P(E)

Verification:

• F̄ is a σ-algebra

• P̄ is well-defined and complete

• Minimality follows from the construction

3.3 Random Variables and Distributions


Definition 3.5 (Random Variable). A random variable is a measurable
function:
X : (Ω, F , P ) → (R, B(R))
Its distribution is the pushforward measure:

PX (B) = P (X −1 (B)) = P ({ω : X(ω) ∈ B})

Theorem 3.6 (Properties of Distribution Functions). For any random vari-


able X, its CDF FX (x) = P (X ≤ x) satisfies:

1. Monotonicity: x ≤ y ⇒ FX (x) ≤ FX (y)

2. Right-continuity: limx→a+ FX (x) = FX (a)

3. Limits: limx→−∞ FX (x) = 0, limx→+∞ FX (x) = 1



Proof. 1. Monotonicity follows from {X ≤ x} ⊆ {X ≤ y}.

2. For right-continuity:

   {X ≤ a} = ⋂_{n=1}^∞ {X ≤ a + 1/n}

By continuity of probability measures:

   F_X(a) = lim_{n→∞} F_X(a + 1/n)

3. The limit properties follow from:

   {X ≤ x} ↓ ∅ as x → −∞,  {X ≤ x} ↑ Ω as x → ∞

3.4 Integration and Expectation


Definition 3.7 (Lebesgue Integral). For a random variable X ≥ 0 on (Ω, F, P):

1. For simple X = Σ_{i=1}^n a_i I_{A_i}:

   E[X] = ∫ X dP = Σ_{i=1}^n a_i P(A_i)

2. For general X ≥ 0:

   E[X] = sup {E[Y] : 0 ≤ Y ≤ X, Y simple}

3. For general X:

   E[X] = E[X⁺] − E[X⁻]

Theorem 3.8 (Dominated Convergence Theorem). If X_n → X pointwise
and |X_n| ≤ Y with E[Y] < ∞, then:

lim_{n→∞} E[X_n] = E[X]

Proof Sketch. 1. Apply Fatou's Lemma to Y + X_n and Y − X_n:

   E[X] ≤ lim inf E[X_n]
   E[−X] ≤ lim inf E[−X_n] = − lim sup E[X_n]

2. Combine to get:

   lim sup E[X_n] ≤ E[X] ≤ lim inf E[X_n]

3. Hence the limit exists and equals E[X].

3.5 Convergence of Random Variables


Definition 3.9 (Modes of Convergence). For random variables {X_n} and X:

• Almost sure: P(lim_{n→∞} X_n = X) = 1

• In probability: ∀ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0

• In L^p: lim_{n→∞} E[|X_n − X|^p] = 0

• In distribution: lim_{n→∞} F_{X_n}(x) = F_X(x) at all continuity points of F_X

Theorem 3.10 (Relationships Between Modes of Convergence).

a.s. ⇒ in probability ⇒ in distribution
L^p ⇒ in probability
in probability ⇒ a.s. along a subsequence

Proof that convergence in probability gives an a.s. convergent subsequence.
1. Choose ε_k = 1/k and n_k such that:

   P(|X_{n_k} − X| > 1/k) < 1/2^k

2. By Borel-Cantelli, P(|X_{n_k} − X| > 1/k infinitely often) = 0.

3. Hence X_{n_k} → X almost surely.
Chapter 4

Probability Foundations

4.1 Classical Probability


Definition 4.1 (Classical Probability). For a finite sample space Ω with N
equally likely outcomes:

P(A) = |A|/|Ω| = (number of favorable outcomes)/(total number of outcomes)
Theorem 4.2 (Properties of Classical Probability). For any events A, B ⊆
Ω:
1. 0 ≤ P (A) ≤ 1

2. P (Ω) = 1

3. If A ∩ B = ∅, then P (A ∪ B) = P (A) + P (B)

4. P(A^c) = 1 − P(A)

Proof. 1. Since 0 ≤ |A| ≤ N, we have 0 ≤ |A|/N ≤ 1.

2. P(Ω) = N/N = 1.

3. If A ∩ B = ∅, then |A ∪ B| = |A| + |B|, so:

   P(A ∪ B) = (|A| + |B|)/N = P(A) + P(B)

4. Since A ∩ A^c = ∅ and A ∪ A^c = Ω:

   1 = P(Ω) = P(A) + P(A^c) ⇒ P(A^c) = 1 − P(A)
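A quick exact computation in the classical framework, using two fair dice as an illustrative equiprobable sample space:

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))

def P(pred):
    # classical probability: favorable outcomes over total outcomes
    return Fraction(sum(1 for w in omega if pred(w)), len(omega))

assert P(lambda w: sum(w) == 7) == Fraction(1, 6)       # six favorable pairs
assert P(lambda w: w[0] == w[1]) == Fraction(1, 6)      # doubles
assert P(lambda w: sum(w) != 7) == 1 - Fraction(1, 6)   # complement rule
print("classical probabilities computed exactly")
```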


4.2 Empirical Probability


Definition 4.3 (Empirical Probability). For an event A occurring n_A times
in n trials:

P(A) = lim_{n→∞} n_A/n

Theorem 4.4 (Strong Law of Large Numbers). For i.i.d. trials with true
probability p:

P(lim_{n→∞} n_A/n = p) = 1

Sketch. 1. Show (e.g. via fourth-moment bounds) that for every ε > 0:

   Σ_{n=1}^∞ P(|n_A/n − p| > ε) < ∞

so by the Borel-Cantelli lemma only finitely many such deviations occur
almost surely.

2. Apply Kolmogorov's zero-one law to conclude almost sure convergence.
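A simulation sketch of the empirical frequency settling toward the true probability; the coin bias p = 0.3 and the sample sizes are illustrative choices:

```python
import random

random.seed(42)

# Empirical frequency n_A/n for a biased coin with true p = 0.3.
p = 0.3
for n in (100, 10_000, 100_000):
    n_A = sum(random.random() < p for _ in range(n))
    print(n, n_A / n)   # frequencies approach p as n grows
```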

4.3 Subjective Probability


Definition 4.5 (Subjective Probability). A personal degree of belief P(A) ∈
[0, 1] satisfying:

1. P(Ω) = 1

2. For mutually exclusive A_1, ..., A_n:

   P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i)

Theorem 4.6 (Dutch Book Argument). A subjective probabilist who violates
the probability axioms can be forced into a sure loss (a Dutch book).

Proof. Suppose an agent assigns P(A) + P(A^c) < 1 for some event A. Taking
the agent's prices as fair, a bookmaker buys from the agent a ticket paying 1
if A occurs, at price P(A), and a ticket paying 1 if A^c occurs, at price
P(A^c). The agent collects P(A) + P(A^c) < 1 up front but must pay out
exactly 1 whichever outcome occurs, a guaranteed net loss.

4.4 Axiomatic Probability


Definition 4.7 (Probability Space). A triple (Ω, F , P ) where:

• Ω is the sample space

• F is a σ-algebra on Ω

• P : F → [0, 1] satisfies Kolmogorov’s axioms

Theorem 4.8 (Extension Theorem). Any countably additive P on an algebra
A has a unique extension to σ(A).

Key Steps. 1. Define the outer measure:

   P*(E) = inf { Σ_{i=1}^∞ P(A_i) : E ⊆ ⋃_{i=1}^∞ A_i, A_i ∈ A }

2. Show that M = {E ⊆ Ω : P*(F) = P*(F ∩ E) + P*(F ∩ E^c) ∀F ⊆ Ω} is
a σ-algebra containing A.

3. The restriction of P* to M is the desired extension.

4.5 Comparison of Probability Interpretations


Theorem 4.9 (Equivalence in Regular Cases). For i.i.d. trials with equally
likely outcomes:

P_classical(A) = P_empirical(A) = P_subjective(A)

when all are well-defined.

Proof. 1. Classical: By symmetry, P(A) = |A|/|Ω|.

2. Empirical: By the SLLN, n_A/n → |A|/|Ω| almost surely.

3. Subjective: For symmetric beliefs, one must assign P(A) = |A|/|Ω| to
avoid a Dutch book.

Remark 4.10. The axiomatic framework subsumes all interpretations when
properly specified.

4.6 Basic Properties of Probability Measures


Theorem 4.11 (Fundamental Properties). For any probability space (Ω, F, P)
and events A, B ∈ F:

1. P(∅) = 0

2. P(A^c) = 1 − P(A)

3. If A ⊆ B, then P(B \ A) = P(B) − P(A)

4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

5. Boole's Inequality: P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i)

Proof of Property 3. For A ⊆ B:

B = A ∪ (B \ A)   (disjoint union)
P(B) = P(A) + P(B \ A)   by additivity
⇒ P(B \ A) = P(B) − P(A)

Proof of Property 4. Using the disjoint decomposition:

A ∪ B = (A \ B) ∪ (A ∩ B) ∪ (B \ A)
P(A ∪ B) = P(A \ B) + P(A ∩ B) + P(B \ A)
         = [P(A \ B) + P(A ∩ B)] + [P(B \ A) + P(A ∩ B)] − P(A ∩ B)
         = P(A) + P(B) − P(A ∩ B)

4.7 Continuity Theorems


Theorem 4.12 (Continuity from Below). For any increasing sequence A_1 ⊆
A_2 ⊆ · · · with A_n ↑ A = ⋃_{n=1}^∞ A_n:

lim_{n→∞} P(A_n) = P(A)

Proof. Construct disjoint sets B_1 = A_1, B_n = A_n \ A_{n−1} for n ≥ 2. Then:

A = ⋃_{n=1}^∞ B_n   (disjoint union)
P(A) = Σ_{n=1}^∞ P(B_n)   by countable additivity
     = lim_{N→∞} Σ_{n=1}^N P(B_n) = lim_{N→∞} P(A_N)

Theorem 4.13 (Continuity from Above). For any decreasing sequence A_1 ⊇
A_2 ⊇ · · · with A_n ↓ A = ⋂_{n=1}^∞ A_n:

lim_{n→∞} P(A_n) = P(A)

Proof. Consider the complements A_n^c, which form an increasing sequence:

A_n^c ↑ A^c = ⋃_{n=1}^∞ A_n^c

so by continuity from below:

lim_{n→∞} P(A_n) = 1 − lim_{n→∞} P(A_n^c) = 1 − P(A^c) = P(A)
4.8 Inclusion-Exclusion Principle


Theorem 4.14 (General Inclusion-Exclusion). For events A_1, . . . , A_n:

P(⋃_{i=1}^n A_i) = Σ_{k=1}^n (−1)^{k+1} S_k

where:

S_k = Σ_{1≤i_1<···<i_k≤n} P(A_{i_1} ∩ · · · ∩ A_{i_k})

Proof. By induction on n. The base case (n = 2) is the familiar formula:

P(A_1 ∪ A_2) = P(A_1) + P(A_2) − P(A_1 ∩ A_2)

The inductive step uses:

P(⋃_{i=1}^{n+1} A_i) = P(⋃_{i=1}^n A_i) + P(A_{n+1}) − P(⋃_{i=1}^n (A_i ∩ A_{n+1}))

and applies the inductive hypothesis to both unions.
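The formula can be verified exactly on a small example; the three divisibility events below on a twelve-point uniform space are illustrative choices:

```python
from fractions import Fraction
from itertools import combinations

# Uniform space {0, ..., 11} with three overlapping events.
omega = set(range(12))
events = [{n for n in omega if n % 2 == 0},   # multiples of 2
          {n for n in omega if n % 3 == 0},   # multiples of 3
          {n for n in omega if n % 4 == 0}]   # multiples of 4

def P(A):
    return Fraction(len(A), len(omega))

# Right-hand side: sum over k of (-1)^{k+1} S_k.
rhs = Fraction(0)
m = len(events)
for k in range(1, m + 1):
    for idx in combinations(range(m), k):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (k + 1) * P(inter)

lhs = P(set.union(*events))
assert lhs == rhs
print(lhs)
```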

4.9 Advanced Properties


Theorem 4.15 (Bonferroni Inequalities). For any events A_1, . . . , A_n and
m ≤ n:

1. For odd m: P(⋃_{i=1}^n A_i) ≤ Σ_{k=1}^m (−1)^{k+1} S_k

2. For even m: P(⋃_{i=1}^n A_i) ≥ Σ_{k=1}^m (−1)^{k+1} S_k

Theorem 4.16 (Fréchet Inequalities). For any events A_1, . . . , A_n:

max(0, Σ_{i=1}^n P(A_i) − (n − 1)) ≤ P(⋂_{i=1}^n A_i) ≤ min_i P(A_i)

max_i P(A_i) ≤ P(⋃_{i=1}^n A_i) ≤ min(1, Σ_{i=1}^n P(A_i))

Proof of the Lower Bound. Using induction and:

P(A_1 ∩ A_2) ≥ P(A_1) + P(A_2) − 1

The general case follows by iterating this inequality.

4.10 Applications and Examples


Example 4.17 (Poincaré's Formula). For events A_1, . . . , A_n:

P(⋂_{i=1}^n A_i^c) = 1 − P(⋃_{i=1}^n A_i) = Σ_{k=0}^n (−1)^k S_k

where S_0 = 1. This gives the probability that none of the events occur.

Example 4.18 (Matching Problem). For a random permutation of n items, the
probability of at least one fixed point is:

1 − Σ_{k=0}^n (−1)^k / k! → 1 − e^{−1} ≈ 0.632

as n → ∞, using inclusion-exclusion.
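The matching-problem numbers can be checked directly: brute force over all permutations for small n, compared against the inclusion-exclusion series (the range of n is an illustrative choice):

```python
import math
from itertools import permutations

def p_fixed_brute(n):
    # Exact probability of at least one fixed point, by enumeration.
    perms = list(permutations(range(n)))
    hits = sum(any(p[i] == i for i in range(n)) for p in perms)
    return hits / len(perms)

def p_fixed_series(n):
    # Inclusion-exclusion series from Example 4.18.
    return 1 - sum((-1) ** k / math.factorial(k) for k in range(n + 1))

for n in range(1, 8):
    assert abs(p_fixed_brute(n) - p_fixed_series(n)) < 1e-12

print(abs(p_fixed_series(50) - (1 - math.exp(-1))))   # essentially zero
```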
Chapter 5

Discrete Probability Theory

5.1 Discrete Probability Spaces


Definition 5.1 (Discrete Probability Space). A probability space (Ω, F, P)
is discrete if:

• Ω is countable (finite or countably infinite)

• F = P(Ω) (the power set)

• There exists a probability mass function p : Ω → [0, 1] with:

  P(A) = Σ_{ω∈A} p(ω) for all A ⊆ Ω

Theorem 5.2 (Properties of Discrete Measures). For any discrete probability
space:

1. P({ω}) = p(ω) for all ω ∈ Ω

2. Σ_{ω∈Ω} p(ω) = 1

3. For any A ⊆ Ω, P(A) = Σ_{ω∈A} P({ω})

Proof. 1. By definition with A = {ω}.

2. Since P(Ω) = 1 and Ω is countable:

   1 = P(Ω) = Σ_{ω∈Ω} P({ω}) = Σ_{ω∈Ω} p(ω)

3. Follows directly from countable additivity.

5.2 Finite Probability Spaces


Definition 5.3 (Finite Equiprobable Space). A discrete space with |Ω| =
N < ∞ and:

p(ω) = 1/N for all ω ∈ Ω

Theorem 5.4 (Combinatorial Probability). In a finite equiprobable space:

P(A) = |A|/|Ω| = (number of favorable outcomes)/(total number of outcomes)

Proof.

P(A) = Σ_{ω∈A} p(ω) = Σ_{ω∈A} 1/N = |A|/N

Example 5.5 (Poker Hands). The probability of a full house (three cards of
one rank and two of another) is:

P(Full House) = [13 · C(4,3) · 12 · C(4,2)] / C(52,5) ≈ 0.00144
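The full-house count in Example 5.5 can be reproduced with exact integer arithmetic (`math.comb` supplies the binomial coefficients):

```python
from math import comb
from fractions import Fraction

# Full house: choose the rank for the triple, its three suits,
# then the rank for the pair and its two suits.
favorable = 13 * comb(4, 3) * 12 * comb(4, 2)
total = comb(52, 5)
p = Fraction(favorable, total)
print(favorable, total, float(p))   # about 0.00144
```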

5.3 Countable Probability Spaces


Definition 5.6 (Geometric Distribution). For 0 < p < 1, the geometric
distribution on N has:

p(k) = (1 − p)^{k−1} p for k = 1, 2, 3, . . .

Theorem 5.7 (Memoryless Property). For geometric X and k, m ≥ 1:

P(X > k + m | X > k) = P(X > m)

Proof.

P(X > k + m | X > k) = P(X > k + m)/P(X > k)
                     = (1 − p)^{k+m}/(1 − p)^k = (1 − p)^m = P(X > m)
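The memoryless property can be verified exactly from the tail formula P(X > k) = (1 − p)^k; the success probability p = 1/3 is an illustrative choice:

```python
from fractions import Fraction

# Geometric tail: P(X > k) = (1 - p)^k, computed with exact rationals.
p = Fraction(1, 3)

def tail(k):
    return (1 - p) ** k

# Memoryless property: P(X > k + m | X > k) = P(X > m).
for k in range(1, 6):
    for m in range(1, 6):
        assert tail(k + m) / tail(k) == tail(m)
print("memoryless property verified exactly")
```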

5.4 Discrete Random Variables


Definition 5.8 (Discrete Random Variable). A function X : Ω → R where
X(Ω) is countable.

Theorem 5.9 (Probability Mass Function). For discrete X, the PMF p_X :
R → [0, 1] is:

p_X(x) = P(X = x) = P({ω : X(ω) = x})

satisfying:

1. p_X(x) ≥ 0 for all x

2. Σ_{x∈X(Ω)} p_X(x) = 1

Proof. 1. Since P is non-negative.

2. Since X(Ω) is countable and P(Ω) = 1:

   Σ_{x∈X(Ω)} P(X = x) = P(⋃_{x∈X(Ω)} {ω : X(ω) = x}) = P(Ω) = 1

5.5 Important Discrete Distributions


Definition 5.10 (Poisson Distribution). For λ > 0, the Poisson PMF is:

p(k) = e^{−λ} λ^k / k! for k = 0, 1, 2, . . .

Theorem 5.11 (Poisson Limit Theorem). If n → ∞ and np_n → λ, then:

C(n,k) p_n^k (1 − p_n)^{n−k} → e^{−λ} λ^k / k!

Sketch. 1. Write, with λ_n = np_n:

C(n,k) p_n^k (1 − p_n)^{n−k} = [n(n−1)···(n−k+1)/n^k] · (λ_n^k / k!) · (1 − λ_n/n)^{n−k}

2. Take limits:

n(n−1)···(n−k+1)/n^k → 1

(1 − λ_n/n)^{n−k} → e^{−λ}
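A numerical sketch of the Poisson limit; λ = 2 and the grid of n values are illustrative:

```python
import math

# Binomial(n, lam/n) pmf versus its Poisson(lam) limit.
lam = 2.0

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

errs = []
for n in (10, 100, 10_000):
    errs.append(max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k))
                    for k in range(10)))
print(errs)   # the maximum pmf gap shrinks as n grows
```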

5.6 Expectation and Variance


Definition 5.12 (Expectation). For discrete X with PMF p_X:

E[X] = Σ_{x∈X(Ω)} x p_X(x)

when the sum converges absolutely.

Theorem 5.13 (Linearity of Expectation). For discrete X, Y and a, b ∈ R:

E[aX + bY] = aE[X] + bE[Y]

Proof.

E[aX + bY] = Σ_{ω∈Ω} (aX(ω) + bY(ω)) P({ω})
           = a Σ_ω X(ω) P({ω}) + b Σ_ω Y(ω) P({ω})
           = aE[X] + bE[Y]

Definition 5.14 (Variance).

Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²

Example 5.15 (Binomial Variance). For X ∼ Binomial(n, p):

Var(X) = np(1 − p)
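A Monte Carlo sketch checking the binomial variance formula; the parameters n = 20, p = 0.25 and the trial count are illustrative:

```python
import random

random.seed(7)

# Estimate Var(X) for X ~ Binomial(20, 0.25) and compare with np(1-p).
n, p, trials = 20, 0.25, 50_000
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # near np = 5.0 and np(1-p) = 3.75
```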
Chapter 6

Continuous Probability Theory

6.1 Continuous Probability Spaces


Definition 6.1 (Continuous Probability Space). A probability space (Ω, F, P)
is continuous if:

• Ω is uncountable (typically R^n)

• F contains the Borel σ-algebra

• There exists a probability density function f : Ω → [0, ∞) with:

  P(A) = ∫_A f(x) dx for all measurable A ⊆ Ω

Theorem 6.2 (Properties of Continuous Measures). For any continuous
probability space:

1. P({x}) = 0 for all x ∈ Ω

2. ∫_{−∞}^∞ f(x) dx = 1

3. For any a ≤ b, P([a, b]) = ∫_a^b f(x) dx

Proof. 1. Singletons have zero Lebesgue measure.

2. By normalization:

   P(Ω) = ∫ f(x) dx = 1

3. Follows from the definition of the Lebesgue integral.

6.2 Important Continuous Distributions


Definition 6.3 (Uniform Distribution). For a < b, the uniform PDF on [a, b] is:

f(x) = 1/(b − a) for x ∈ [a, b], and f(x) = 0 otherwise

Theorem 6.4 (Uniform Properties). For X ∼ Uniform(a, b):

1. E[X] = (a + b)/2

2. Var(X) = (b − a)²/12

Proof. 1. E[X] = ∫_a^b x · 1/(b − a) dx = (b² − a²)/(2(b − a)) = (a + b)/2

2. E[X²] = ∫_a^b x² · 1/(b − a) dx = (b³ − a³)/(3(b − a))

   Var(X) = E[X²] − (E[X])² = (b − a)²/12
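A Monte Carlo check of the two uniform moments; the interval [2, 5] and the sample size are illustrative choices:

```python
import random

random.seed(1)

# Estimate E[X] and Var(X) for X ~ Uniform(2, 5) and compare with
# the closed forms (a+b)/2 and (b-a)^2/12.
a, b = 2.0, 5.0
n = 100_000
xs = [random.uniform(a, b) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(mean, (a + b) / 2)        # both near 3.5
print(var, (b - a) ** 2 / 12)   # both near 0.75
```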

6.3 Continuous Random Variables


Definition 6.5 (Continuous Random Variable). A measurable function X :
Ω → R with absolutely continuous distribution.

Theorem 6.6 (Cumulative Distribution Function). For continuous X, the
CDF F_X : R → [0, 1] is:

F_X(x) = P(X ≤ x) = ∫_{−∞}^x f_X(t) dt

and satisfies:

1. lim_{x→−∞} F_X(x) = 0, lim_{x→∞} F_X(x) = 1

2. F_X is absolutely continuous

3. f_X(x) = (d/dx) F_X(x) almost everywhere

6.4 Normal Distribution


Definition 6.7 (Standard Normal). The standard normal PDF is:

ϕ(x) = (1/√(2π)) e^{−x²/2}

with CDF Φ(x) = ∫_{−∞}^x ϕ(t) dt.

Theorem 6.8 (General Normal). For X ∼ N(µ, σ²):

f_X(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}
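Although Φ has no elementary closed form, it can be expressed through the error function in Python's standard library, Φ(x) = (1 + erf(x/√2))/2; a minimal sketch:

```python
import math

# Standard normal density and CDF via the stdlib error function.
def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    return (1 + math.erf(x / math.sqrt(2))) / 2

print(Phi(0))      # 0.5 by symmetry
print(Phi(1.96))   # about 0.975 (the familiar two-sided 95% quantile)
```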

6.5 Expectation and Variance


Definition 6.9 (Expectation). For continuous X with PDF fX :
Z ∞
E[X] = xfX (x)dx
−∞

when the integral converges absolutely.

Theorem 6.10 (Law of the Unconscious Statistician). For measurable g:


Z ∞
E[g(X)] = g(x)fX (x)dx
−∞

Definition 6.11 (Variance).


Var(X) = E[(X − E[X])²] = ∫_{−∞}^{∞} (x − µ)² fX(x) dx

where µ = E[X].
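Theorem 6.10 says E[g(X)] can be estimated directly by averaging g over samples of X, with no need for the density of g(X). A minimal Monte Carlo sketch, with g(x) = x² and X standard normal (so E[g(X)] = Var(X) = 1); the sample size and seed are illustrative.

```python
import random

# LOTUS by Monte Carlo: average g(X_i) over i.i.d. draws of X.
random.seed(1)
n = 200_000
est = sum(random.gauss(0, 1) ** 2 for _ in range(n)) / n

# E[X^2] = 1 for X ~ N(0, 1).
assert abs(est - 1.0) < 0.02
```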
Chapter 7

Distribution Functions and Their Properties

7.1 Cumulative Distribution Functions


Definition 7.1 (Cumulative Distribution Function). For a random variable
X on (Ω, F , P ), the CDF FX : R → [0, 1] is:

FX (x) = P (X ≤ x) = P ({ω ∈ Ω : X(ω) ≤ x})

Theorem 7.2 (Characterization of CDFs). A function F : R → [0, 1] is a


CDF iff:

1. F is non-decreasing

2. F is right-continuous: limy↓x F (y) = F (x)

3. limx→−∞ F (x) = 0 and limx→∞ F (x) = 1

Proof. (⇒) 1. Monotonicity: For x ≤ y, {X ≤ x} ⊆ {X ≤ y} implies


F (x) ≤ F (y).
2. Right-continuity: For xn ↓ x, {X ≤ xn } ↓ {X ≤ x}, so by continuity
from above:
F (xn ) = P (X ≤ xn ) ↓ P (X ≤ x) = F (x)

3. Limits: {X ≤ x} ↓ ∅ as x → −∞ and {X ≤ x} ↑ Ω as x → ∞.
(⇐) Given such an F, define P on intervals by P((a, b]) = F(b) − F(a) and extend to a probability measure via the Carathéodory extension theorem; then X(ω) = ω has CDF F.


7.2 Types of Distributions


Definition 7.3 (Discrete Distribution). A random variable X is discrete
if its range is countable. Its CDF is a step function:
FX(x) = Σ_{xi ≤ x} p(xi)

where p(xi ) = P (X = xi ) is the probability mass function.


Example 7.4 (Bernoulli Distribution). For X ∼ Bernoulli(p):

FX(x) = 0 for x < 0,   1 − p for 0 ≤ x < 1,   1 for x ≥ 1

with jumps of size 1 − p at 0 and p at 1.
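The step-function CDF of Example 7.4 translates directly into code; the value p = 0.3 below is an illustrative choice.

```python
def bernoulli_cdf(x, p):
    """Step-function CDF of Bernoulli(p), as in Example 7.4."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

p = 0.3
assert bernoulli_cdf(-0.5, p) == 0.0
assert bernoulli_cdf(0, p) == 0.7      # right-continuous: jump of 1-p at 0
assert bernoulli_cdf(0.999, p) == 0.7
assert bernoulli_cdf(1, p) == 1.0      # jump of p at 1
```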


Definition 7.5 (Absolutely Continuous Distribution). X is absolutely
continuous if its CDF can be expressed as:

FX(x) = ∫_{−∞}^{x} fX(t) dt

for some density function fX ≥ 0 with ∫_{−∞}^{∞} fX(t) dt = 1.
Theorem 7.6 (Radon-Nikodym for Probability Measures). If FX is abso-
lutely continuous, then fX = FX′ almost everywhere.
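Theorem 7.6 can be illustrated numerically by differentiating a CDF with a central difference and comparing against the known density; the exponential rate λ = 2 and the step size below are illustrative choices.

```python
import math

# F(x) = 1 - e^{-lam*x} is absolutely continuous with density
# f(x) = lam*e^{-lam*x}; a finite difference of F should recover f.
lam = 2.0
F = lambda x: 1 - math.exp(-lam * x) if x >= 0 else 0.0
f = lambda x: lam * math.exp(-lam * x) if x >= 0 else 0.0

h = 1e-6
for x in (0.1, 1.0, 2.5):
    numeric = (F(x + h) - F(x - h)) / (2 * h)  # central difference F'(x)
    assert abs(numeric - f(x)) < 1e-4
```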
Example 7.7 (Normal Distribution). For X ∼ N(µ, σ²):

FX(x) = ∫_{−∞}^{x} (1/(σ√(2π))) e^{−(t−µ)²/(2σ²)} dt

which is differentiable everywhere with:

fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}

7.3 Decomposition of Distributions


Theorem 7.8 (Lebesgue Decomposition). Any CDF F can be uniquely de-
composed as:
F = αFd + βFac + γFsc
where α, β, γ ≥ 0 with α + β + γ = 1 and:

• Fd is discrete (pure jumps)

• Fac is absolutely continuous

• Fsc is singular continuous (continuous but not absolutely continuous)

Example 7.9 (Cantor Distribution). The Cantor function Fsc is:

• Continuous everywhere

• Differentiable almost everywhere with Fsc′ = 0 a.e.

• Not absolutely continuous

• Supported on the Cantor set (measure zero)

7.4 Convergence of Distribution Functions


Definition 7.10 (Weak Convergence). A sequence Fn converges weakly
to F (Fn ⇒ F ) if:
lim Fn (x) = F (x)
n→∞

at all continuity points x of F .

Theorem 7.11 (Helly’s Selection Theorem). Every sequence of CDFs has a


subsequence that converges weakly to some right-continuous, non-decreasing
function F (possibly defective: F (∞) − F (−∞) < 1).

Theorem 7.12 (Lévy Continuity Theorem). For characteristic functions


ϕn (t) = E[eitXn ]:

Fn ⇒ F ⇐⇒ ϕn (t) → ϕ(t) for all t

where ϕ is the characteristic function of F .
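In the spirit of Theorem 7.12, a characteristic function can be estimated empirically as a sample average of e^{itX}. For X ∼ N(0, 1) the target is ϕ(t) = e^{−t²/2}; the sample size, seed, and test point t are illustrative.

```python
import cmath
import random

# Empirical characteristic function: phi_hat(t) = (1/n) sum e^{i t X_k}.
random.seed(6)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]

t = 1.2
phi_hat = sum(cmath.exp(1j * t * x) for x in xs) / n

# For N(0,1), phi(t) = e^{-t^2/2}.
assert abs(phi_hat - cmath.exp(-t * t / 2)) < 0.02
```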

7.5 Empirical Distribution Functions


Definition 7.13 (Empirical CDF). For i.i.d. samples X1 , . . . , Xn from F ,
the empirical CDF is:
F̂n(x) = (1/n) Σ_{i=1}^{n} I_{(−∞,x]}(Xi)

Theorem 7.14 (Glivenko-Cantelli).

sup_{x∈R} |F̂n(x) − F(x)| → 0 almost surely

Proof. 1. For fixed x, F̂n(x) → F(x) a.s. by the SLLN, since the indicators I_{(−∞,x]}(Xi) are i.i.d. with mean F(x).

2. Uniform convergence follows from monotonicity: control the difference on a finite grid of quantiles of F and use that F̂n and F are non-decreasing between grid points.
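A simulation sketch of Glivenko-Cantelli for F(x) = 1 − e^{−x} (Exponential(1)); the seed, sample sizes, and tolerance are illustrative.

```python
import math
import random

random.seed(2)

def sup_discrepancy(n):
    """Kolmogorov-style sup |F_hat_n - F| for n Exponential(1) draws."""
    xs = sorted(random.expovariate(1.0) for _ in range(n))
    F = lambda x: 1 - math.exp(-x)
    # For a sorted sample, the sup is attained at the jump points:
    # compare F(x_i) with both i/n and (i+1)/n.
    return max(max(abs((i + 1) / n - F(x)), abs(i / n - F(x)))
               for i, x in enumerate(xs))

d_small, d_large = sup_discrepancy(100), sup_discrepancy(10_000)
assert d_large < d_small   # discrepancy shrinks as n grows
assert d_large < 0.05
```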

Theorem 7.15 (Donsker's Theorem).

√n (F̂n − F) ⇒ B ◦ F

where B is a standard Brownian bridge.

7.6 Applications and Examples


Example 7.16 (Exponential Distribution Validation). For F (x) = 1 − e−λx
(x ≥ 0):

• Right-continuous: limy↓x F (y) = F (x)

• Monotone: F ′ (x) = λe−λx > 0 for x > 0

• Limits: F (0+ ) = 0, F (∞) = 1


Example 7.17 (Rayleigh Distribution). For F(x) = 1 − e^{−x²/2} (x ≥ 0):

• F′(x) = x e^{−x²/2} is non-negative

• ∫_0^∞ x e^{−x²/2} dx = 1

• Corresponds to X = √(Y1² + Y2²) where Y1, Y2 ∼ N(0, 1) are independent
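A simulation check of Example 7.17: drawing X = √(Y1² + Y2²) from two independent standard normals and comparing the empirical CDF at one point against 1 − e^{−x²/2}; the seed, sample size, and evaluation point are illustrative.

```python
import math
import random

random.seed(3)
n = 100_000
# hypot(y1, y2) = sqrt(y1^2 + y2^2)
xs = [math.hypot(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

x0 = 1.5
empirical = sum(x <= x0 for x in xs) / n
theoretical = 1 - math.exp(-x0 ** 2 / 2)
assert abs(empirical - theoretical) < 0.01
```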
Appendix A

Detailing on selected concepts

A.1 De Morgan’s Laws


Complete Proof of Both Laws. We provide a more detailed proof using logical equivalences.

1. First Law: (A ∪ B)c = Ac ∩ Bc

For any x ∈ Ω:

x ∈ (A ∪ B)c ⇔ x ∉ A ∪ B
⇔ ¬(x ∈ A or x ∈ B)
⇔ x ∉ A and x ∉ B    (by De Morgan's laws in logic)
⇔ x ∈ Ac and x ∈ Bc
⇔ x ∈ Ac ∩ Bc

2. Second Law: (A ∩ B)c = Ac ∪ Bc

For any x ∈ Ω:

x ∈ (A ∩ B)c ⇔ x ∉ A ∩ B
⇔ ¬(x ∈ A and x ∈ B)
⇔ x ∉ A or x ∉ B    (by De Morgan's laws in logic)
⇔ x ∈ Ac or x ∈ Bc
⇔ x ∈ Ac ∪ Bc
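Both laws can be spot-checked on finite sets; the universe Ω and the sets A, B below are arbitrary illustrative choices.

```python
# Finite sanity check of De Morgan's laws with Python sets.
Omega = set(range(10))
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}

complement = lambda S: Omega - S  # complement relative to Omega

assert complement(A | B) == complement(A) & complement(B)  # first law
assert complement(A & B) == complement(A) | complement(B)  # second law
```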


A.2 Validating Borel Sets


Definition A.1 (Borel Set). A Borel set is any set in a topological space
that can be formed from open sets (or equivalently, closed sets) through:

• Countable unions

• Countable intersections

• Complements

The collection of all Borel sets forms a σ-algebra called the Borel σ-algebra,
denoted B(X).

A.2.1 Validation Method


To verify if a set A is Borel:

1. Check if A can be constructed from open/closed sets using countable


operations

2. Verify membership in the Borel σ-algebra:

A ∈ B(X)

3. Alternatively, show it’s measurable with respect to all Borel measures

Example A.2 (Standard Borel Sets). The following are Borel sets in R with
the usual topology:

• All open intervals (a, b)

• All closed intervals [a, b]

• The rational numbers Q (countable union of singletons)

• The Cantor set (closed, hence Borel; a countable intersection of closed sets)

Rationals are Borel.

Q = ⋃_{q∈Q} {q}

Each singleton {q} is closed (hence Borel), and the countable union preserves
Borel measurability.

Example A.3 (Constructing Borel Sets). The set A = (0, 1) ∪ {2} is Borel
because:

• (0, 1) is open (hence Borel)

• {2} is closed (hence Borel)

• The union of two Borel sets is Borel

A.2.2 Non-Borel Examples


Example (Vitali Set). There exists a subset V ⊂ [0, 1] that is not Borel measurable.
Such sets:

• Require the Axiom of Choice to construct

• Are not Lebesgue measurable

• Cannot be obtained through countable operations on open/closed sets

Example (Non-Borel in Product Space). Let A ⊂ R² be the set of points with both
coordinates rational:

A = Q × Q
While this appears simple, if we equip R2 with the σ-algebra generated by
products of Borel sets, A is Borel. However, more exotic sets like:

B = {(x, y) : x − y ∈ V }

where V is a Vitali set, are typically non-Borel.

A.2.3 Verification Techniques


Theorem A.4 (Borel Hierarchy). The Borel σ-algebra can be stratified into
levels:

• Σ01 : Open sets

• Π01 : Closed sets

• Σ02 : Countable unions of closed sets (Fσ )

• Π02 : Countable intersections of open sets (Gδ )

• And so on through transfinite induction



A set is Borel if it appears at some level of this hierarchy.

Example A.5 (Gδ but not Fσ). The set of irrational numbers R \ Q is:

• Gδ: can be written as ⋂_{q∈Q} (R \ {q})

• Not Fσ in R (this requires the Baire category theorem)

but still Borel, as it appears in the hierarchy.

A.3 Borel-Cantelli Lemma


Theorem A.6 (Borel-Cantelli Lemma). Let (Ω, F, P) be a probability space
and {An}_{n=1}^∞ a sequence of events.

1. If Σ_{n=1}^∞ P(An) < ∞, then P(lim sup An) = 0

2. If {An} are independent and Σ_{n=1}^∞ P(An) = ∞, then P(lim sup An) = 1

where lim sup An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak ("infinitely many An occur").

Proof of Part 1.

P(lim sup An) = P(⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak)
= lim_{n→∞} P(⋃_{k=n}^∞ Ak)    (by continuity of measure from above)
≤ lim_{n→∞} Σ_{k=n}^∞ P(Ak)    (subadditivity)
= 0    (since the tail of a convergent series tends to 0)
Proof of Part 2. Using independence and the inequality 1 − x ≤ e^{−x}:

P(⋂_{k=n}^N Ack) = ∏_{k=n}^N (1 − P(Ak)) ≤ exp(−Σ_{k=n}^N P(Ak)) → 0 as N → ∞

Thus:

P(⋃_{k=n}^∞ Ak) = 1 − P(⋂_{k=n}^∞ Ack) = 1

and:

P(lim sup An) = lim_{n→∞} P(⋃_{k=n}^∞ Ak) = 1

A.4 Radon-Nikodym Theorem


Definition A.7 (Absolute Continuity). A measure ν is absolutely continuous with respect to µ (denoted ν ≪ µ) if:

µ(E) = 0 =⇒ ν(E) = 0  for all E ∈ F

Theorem A.8 (Radon-Nikodym). Let (Ω, F) be a measurable space with
σ-finite measures µ and ν. If ν ≪ µ, then there exists a measurable f : Ω →
[0, ∞) such that:

ν(E) = ∫_E f dµ  for all E ∈ F

This f is called the Radon-Nikodym derivative, denoted dν/dµ.

Sketch of Proof. The proof proceeds in several steps:

1. Finite Measures Case:

• Consider the set F = {f ≥ 0 : ∫_E f dµ ≤ ν(E) for all E}

• Take f0 to be a maximal element (exists by Zorn's Lemma)

• Show that E ↦ ν(E) − ∫_E f0 dµ defines the zero measure

2. σ-finite Extension:

• Decompose Ω = ⋃_{n=1}^∞ Ωn with µ(Ωn), ν(Ωn) < ∞

• Apply the finite case to each Ωn to get fn

• Define f = Σ_{n=1}^∞ fn 1_{Ωn}

Corollary A.9 (Existence of Conditional Expectation). For X ∈ L¹(Ω, F, P)
and a sub-σ-algebra G ⊆ F, the conditional expectation E[X|G] is the Radon-
Nikodym derivative of ν(E) = ∫_E X dP with respect to P|G.

A.5 Applications
Example A.10 (Borel-Cantelli in Random Walks). For a simple random
walk {Sn } on Zd , the event {Sn = 0 infinitely often} has probability 1 if
d ≤ 2 and 0 if d ≥ 3.

Example A.11 (Radon-Nikodym in Finance). In risk-neutral pricing, the
Radon-Nikodym derivative connects physical and risk-neutral measures:

dQ/dP = exp(−∫_0^T θt dWt − (1/2) ∫_0^T θt² dt)
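A one-step sketch of this change of measure (not the full Girsanov setting): for a single Gaussian X ∼ N(0, 1) under P, the Radon-Nikodym derivative dQ/dP(x) = exp(θx − θ²/2) makes X ∼ N(θ, 1) under Q, so E_Q[X] = θ can be recovered by reweighting samples drawn under P. The drift θ, sample size, and seed are illustrative.

```python
import math
import random

random.seed(5)
theta, n = 0.8, 200_000

# Draw under P = N(0, 1) and attach likelihood-ratio weights.
xs = [random.gauss(0, 1) for _ in range(n)]
weights = [math.exp(theta * x - theta ** 2 / 2) for x in xs]

# E_Q[X] estimated by reweighting under P; the exact value is theta.
mean_q = sum(w * x for w, x in zip(weights, xs)) / n
assert abs(mean_q - theta) < 0.03
```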
