
Chapter 01: Probability Theory

Jing Xu (RUC)

Renmin University of China


Fall
Jing Xu (RUC) Chapter 01
Modern Language about Probability

What is probability?
Classical interpretation: the long-run frequency with which certain
events occur
- it is a posteriori
- every event is associated with a number in [0, 1]

Modern perspective: probability is a mapping from the set of
"events" to the interval [0, 1], satisfying some basic properties
(Kolmogorov)
- from now on, you may think of probability as a kind of function
that maps events to numbers. The number is defined as "the
probability that a particular event will occur"
- Q: how do we build a model for an "event"?



σ-Algebra: A Model of “Event”

Suppose we are doing a random experiment. Let a set Ω include
all possible outcomes of the experiment. Ω is called the sample
space, and every element in it is called a sample point

Definition (σ-algebra): Given a set Ω, let F be a collection of
subsets of Ω. F is called a σ-algebra if it satisfies the
following properties
- (i) ∅ ∈ F;
- (ii) if A ∈ F, then A^c ∈ F;
- (iii) if A_i ∈ F for i = 1, 2, ..., then ∪_{i=1}^∞ A_i ∈ F.

Basic property: a σ-algebra is closed under the common set
operations, such as intersection, set difference, and so on.



σ-Algebra: A Model of “Event”

Usually, given a collection of subsets of Ω, we can construct a
smallest σ-algebra that includes this collection. This σ-algebra
is said to be generated by the collection

Consider a random experiment of tossing a fair coin twice
- what is the sample space?
- what different σ-algebras can you construct, given different
collections of subsets, like A1 = {{HH}}, A2 = {{HH, HT}},
and A3 = {{HH}, {HT}, {TH}, {TT}}?
- note that we have a clear meaning for every set in the above
σ-algebras
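As a concrete sketch of the coin-tossing exercise (Python; the helper `generate_sigma_algebra` is our own illustration, not part of the lecture), a generated σ-algebra on a finite sample space can be computed by closing the collection under complements and unions until nothing new appears:

```python
def generate_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on a finite sample space `omega`
    containing every set in `collection`: repeatedly close under
    complement and pairwise union until a fixed point is reached."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(a) for a in collection}
    while True:
        new = set(sets)
        new |= {omega - a for a in sets}            # closure under complement
        new |= {a | b for a in sets for b in sets}  # closure under union
        if new == sets:
            return sets
        sets = new

omega = {"HH", "HT", "TH", "TT"}
f1 = generate_sigma_algebra(omega, [{"HH"}])
f2 = generate_sigma_algebra(omega, [{"HH", "HT"}])
f3 = generate_sigma_algebra(omega, [{"HH"}, {"HT"}, {"TH"}, {"TT"}])
print(len(f1), len(f2), len(f3))  # -> 4 4 16
```

A1 and A2 each generate a four-set σ-algebra (the set, its complement, ∅, and Ω), while A3 generates the full power set of 16 subsets.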



Probability Measure

Definition (Probability Measure): Given a set Ω, let F be a
σ-algebra of subsets of Ω. A probability measure P is a
mapping from F to [0, 1], satisfying the following properties
- (i) P(Ω) = 1;
- (ii) for a countable collection of disjoint sets A_i ∈ F, we have
P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) (this implies P(∅) = 0).

Definition (Probability Space): If the above requirements are
satisfied, the triplet (Ω, F, P) is called a probability space
- note: all three elements are necessary
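The two axioms can be checked mechanically on a finite space; the sketch below (Python; `is_probability_measure` is our own illustrative helper) takes per-outcome weights and verifies non-negativity, total mass 1, and additivity over disjoint events:

```python
from itertools import combinations

def is_probability_measure(omega, weights, tol=1e-12):
    """Check the probability axioms for P given by per-outcome weights
    on a finite sample space: P(Omega) = 1 with non-negative weights,
    plus additivity spot-checked on every pair of disjoint events."""
    outcomes = sorted(omega)
    if any(weights[w] < 0 for w in outcomes):
        return False
    if abs(sum(weights[w] for w in outcomes) - 1) > tol:
        return False

    def prob(A):
        return sum(weights[w] for w in A)

    events = [set(s) for r in range(len(outcomes) + 1)
              for s in combinations(outcomes, r)]
    return all(abs(prob(A | B) - prob(A) - prob(B)) <= tol
               for A in events for B in events if A.isdisjoint(B))

omega = {"HH", "HT", "TH", "TT"}
fair = {w: 0.25 for w in omega}
bad = {"HH": 0.5, "HT": 0.5, "TH": 0.5, "TT": -0.5}
print(is_probability_measure(omega, fair), is_probability_measure(omega, bad))  # -> True False
```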



Random Variable and Distribution

Definition (Borel set): Let Ω = R. The Borel σ-algebra B(R) is the
σ-algebra generated by all closed intervals contained in R.
Every set A in the Borel σ-algebra is called a Borel set
- almost every point set on R you can imagine is a Borel set (an
open interval, a single point, some isolated points, and so on)

Definition (Random Variable): Given a probability space
(Ω, F, P), a random variable X (defined on this space) is a
mapping from Ω to the set of real numbers R, X : ω ↦ X(ω),
satisfying: for each Borel set B ∈ B(R), X^{-1}(B) ∈ F.
- comparison: a probability measure maps sets in a σ-algebra to
numbers in [0, 1], while a random variable maps points in a
sample space to numbers in R
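The measurability condition X^{-1}(B) ∈ F has bite even on the coin space. A sketch (Python; the helper and the coarse σ-algebra below are our own illustration): with the σ-algebra that only distinguishes the first toss, the first-toss indicator is measurable but the number of heads is not, because its preimages split {HT} from {TH}.

```python
def is_measurable(X, sigma_algebra):
    """On a finite sample space, X is measurable w.r.t. F iff the
    preimage of each value of X lies in F (the preimage of any Borel
    set is then a union of these, and F is closed under unions)."""
    domain = frozenset().union(*sigma_algebra)
    return all(frozenset(w for w in domain if X[w] == v) in sigma_algebra
               for v in set(X.values()))

omega = {"HH", "HT", "TH", "TT"}
# Coarse sigma-algebra: we can only "see" the first toss.
coarse = {frozenset(), frozenset(omega),
          frozenset({"HH", "HT"}), frozenset({"TH", "TT"})}

first_toss_heads = {"HH": 1, "HT": 1, "TH": 0, "TT": 0}
num_heads = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}

print(is_measurable(first_toss_heads, coarse),  # -> True
      is_measurable(num_heads, coarse))         # -> False
```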



Distribution

Definition (Distribution): A random variable X induces a measure
on the real line R, denoted by µ_X(·), satisfying
µ_X(B) = P(ω : X(ω) ∈ B) for every Borel set B of R
- remark: what is the density function of a random variable? If
there exists a non-negative function f(x) such that, for every
Borel set B ∈ B(R), we have

    µ_X(B) = ∫_B f(x) dx

then f(x) is called the density function of the random variable X



Expectation of Random Variables

Definition (Expectation): Given a probability space (Ω, F, P)
and a random variable X, the expectation of X is defined as the
Lebesgue integral ∫_Ω X(ω) dP(ω)

Construction of the Lebesgue integral ∫_Ω X(ω) dP(ω) (assume
that X(ω) ≥ 0 for all ω for the moment):
- let 0 = y_0 < y_1 < ... < y_k < y_{k+1} < ... be a partition Π of R^+
- let A_k = {ω : y_k ≤ X(ω) < y_{k+1}}; then A_k ∈ F (why?).
Therefore, P(A_k) is defined. We construct the lower Lebesgue
sum

    LS_Π^- = Σ_{k=1}^∞ y_k P(A_k)

- the limit of LS_Π^- as the size of the partition Π approaches 0
is defined as the value of the Lebesgue integral
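The construction can be tried numerically. A sketch (Python; the example X(ω) = ω² on Ω = [0, 1) with the uniform measure is our own choice, not from the lecture) computes lower Lebesgue sums over finer partitions of the range of X; the exact value is 1/3:

```python
import math

def lower_lebesgue_sum(prob_level_set, y_grid):
    """Lower Lebesgue sum sum_k y_k * P(y_k <= X < y_{k+1}) for a
    partition 0 = y_0 < y_1 < ... of the range of X."""
    return sum(y_grid[k] * prob_level_set(y_grid[k], y_grid[k + 1])
               for k in range(len(y_grid) - 1))

# Example: X(omega) = omega**2 on Omega = [0, 1) with P the uniform
# (Lebesgue) measure. Then {y <= X < y'} = [sqrt(y), sqrt(y')), so
# P(y <= X < y') = sqrt(y') - sqrt(y), and the exact integral is 1/3.
def p_level(lo, hi):
    return math.sqrt(min(hi, 1.0)) - math.sqrt(min(lo, 1.0))

approximations = [lower_lebesgue_sum(p_level, [k / n for k in range(n + 1)])
                  for n in (10, 100, 10_000)]
print(approximations)  # increasing toward the true value 1/3
```

Note that the level sets are measured with P directly; no Riemann slicing of the domain is needed, which is exactly the point of the Lebesgue construction.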



More on Lebesgue Integral
Remarks:
- 1: What if X(ω) can assume either positive or negative values?
We can separate its positive and negative parts,

    X = X^+ − X^-

where X^+ = max{X, 0} and X^- = max{−X, 0}. Then we can define

    ∫_Ω X(ω) dP(ω) = ∫_Ω X^+(ω) dP(ω) − ∫_Ω X^-(ω) dP(ω)

provided that at least one of them is finite

- 2: We don't need to consider the upper Lebesgue sum

    LS_Π^+ = Σ_{k=1}^∞ y_{k+1} P(A_k)

- 3: Use the definition to calculate the expectation of a random
variable on the coin-tossing space
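Remark 3 worked out as a sketch (Python; X = number of heads is our own choice of random variable): on a finite sample space the Lebesgue integral defining E[X] reduces to a weighted sum over sample points.

```python
# Expectation on the two-toss fair-coin space directly from the
# definition: the Lebesgue integral collapses to a finite sum.
omega = ["HH", "HT", "TH", "TT"]
P = {w: 0.25 for w in omega}              # fair coin, independent tosses
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}  # X = number of heads

expectation = sum(X[w] * P[w] for w in omega)
print(expectation)  # -> 1.0
```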
Properties of Expectations

The usual properties of expectation continue to hold under the
new definition
- Linearity: E[aX + bY] = aE[X] + bE[Y]
- Comparability: if X ≤ Y a.s., then E[X] ≤ E[Y]
- Jensen's inequality: if φ(x) is a convex function and
|E[X]| < ∞, then φ(E[X]) ≤ E[φ(X)]. The inequality is
reversed if φ(x) is concave (pay attention to the implication of
this inequality for risk aversion and the acceptability of a gamble)
- Cauchy-Schwarz inequality: (E[XY])² ≤ E[X²]E[Y²]
- If X has a density function f(x), then the expectation can be
computed by evaluating the Riemann integral ∫_R x f(x) dx
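Jensen's inequality is easy to see numerically. A sketch (Python; the choice φ(x) = x², sample size, and distribution parameters are ours): for the convex function φ(x) = x², the gap E[X²] − (E[X])² is exactly Var(X) ≥ 0.

```python
import random

# Check phi(E[X]) <= E[phi(X)] for the convex phi(x) = x**2
# on a simulated sample; the gap equals the (sample) variance.
random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(100_000)]

mean = sum(xs) / len(xs)
mean_of_square = sum(x * x for x in xs) / len(xs)
print(mean ** 2, mean_of_square)  # left side <= right side
```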



Convergence of Functions

Let {f_n(x) : n ≥ 0} be a sequence of functions, and let f(x) be a
function, all defined on a common domain X

Given a point y ∈ X, if lim_{n→∞} f_n(y) = f(y), then we say the
sequence {f_n : n ≥ 0} converges to f at the point y

If the sequence {f_n(x) : n ≥ 0} converges to f(x) at every
point x ∈ X, then we say {f_n : n ≥ 0} converges to f
everywhere on X

Almost everywhere: the same, except possibly on a set of measure zero...



Convergence of Integrals
When a sequence of functions f_n(x) converges to a limit
function f(x) almost everywhere, the integral ∫_Ω f_n(x) dx does
not necessarily converge to ∫_Ω f(x) dx
- an example: let f_n(x) be the density function of the normal
distribution N(0, 1/n). Then lim_{n→∞} f_n(x) = 0 almost
everywhere. Does the integral converge as well?

Two convergence theorems
- Monotone Convergence Theorem: if f_n(x) converges to f(x) in
a monotone way, then ∫_Ω f_n(x) dx converges to ∫_Ω f(x) dx
- Dominated Convergence Theorem: if there is an integrable
function g(x) such that |f_n(x)| ≤ g(x) almost everywhere,
then f_n(x) → f(x) implies ∫_Ω f_n(x) dx → ∫_Ω f(x) dx
- remark: these two theorems give sufficient conditions, not
necessary conditions. The above example satisfies the
conditions of neither theorem
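The N(0, 1/n) example can be checked numerically. A sketch (Python; the evaluation point x = 0.5, integration window, and step count are arbitrary choices of ours): the densities vanish pointwise away from 0, yet each still integrates to 1, so the integrals do not converge to ∫ 0 dx = 0.

```python
import math

def normal_density(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

# Pointwise, f_n(x) -> 0 for every x != 0 as var = 1/n shrinks ...
pointwise = [normal_density(0.5, 1 / n) for n in (1, 10, 1000)]
print(pointwise)  # the last value is essentially 0

# ... yet every f_n still integrates to 1 (midpoint Riemann sum).
def integral(var, lo=-10.0, hi=10.0, steps=200_000):
    h = (hi - lo) / steps
    return h * sum(normal_density(lo + (k + 0.5) * h, var) for k in range(steps))

areas = (integral(1.0), integral(0.001))
print(areas)  # both close to 1
```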



Convergence of Expectations

When a sequence of random variables X_n converges to a limit
random variable X almost surely, the expectation E[X_n] does
not necessarily converge to E[X]

Two convergence theorems
- Monotone Convergence Theorem: if X_n converges to X in a
monotone way, then E[X_n] → E[X]
- Dominated Convergence Theorem: if there is an integrable
random variable Y such that |X_n| ≤ Y almost surely, then
X_n → X a.s. implies E[X_n] → E[X]
- again, these two theorems give sufficient conditions, not
necessary conditions



Change of Measure
Since a probability measure is no more than a rule for assigning
numbers, we can assign new numbers to the same events as we
wish, as long as the new assignment satisfies the proper
conditions

One of the most efficient ways: change the measure through a
non-negative random variable Z with E[Z] = 1, according to the
following "algorithm"

    P̃(A) = ∫_A Z(ω) dP(ω) = ∫_Ω Z(ω) 1_A(ω) dP(ω)

- remark 1: P̃(A) can be interpreted as the average value of Z
on the set A, under the probability measure P
- remark 2: if Z > 0 almost surely, then P̃ and P are equivalent
probability measures, in the sense that P̃(A) = 0 if and only if
P(A) = 0 (if Z = 0 with positive P-measure, this relation no
longer holds)
Change of Measure
Task: verify that P̃ is indeed a probability measure

Assume we define a new measure P̃ by

    P̃(A) = ∫_A Z(ω) dP(ω) = ∫_Ω Z(ω) 1_A(ω) dP(ω)

How can we calculate expectations under the new measure P̃?
- note that P̃(A) = Ẽ[1_A] = ∫_Ω 1_A(ω) dP̃(ω). Writing the above
formula in differential form,

    dP̃(ω) = Z(ω) dP(ω)

- according to the definition of expectation, for a random
variable X, we have

    Ẽ[X] = ∫_Ω X(ω) dP̃(ω) = ∫_Ω X(ω) Z(ω) dP(ω) = E[XZ]
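The identity Ẽ[X] = E[XZ] is easy to verify on the coin space. A sketch (Python; this particular Z is made up for illustration, and any non-negative Z with E[Z] = 1 would do):

```python
# Sanity check of E~[X] = E[XZ] on the two-toss fair-coin space:
# reweighting each outcome's probability by Z defines P~, and the
# expectation of X under P~ matches E[XZ] under P.
omega = ["HH", "HT", "TH", "TT"]
P = {w: 0.25 for w in omega}
Z = {"HH": 2.0, "HT": 1.0, "TH": 0.5, "TT": 0.5}  # non-negative, E[Z] = 1
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}          # number of heads

P_tilde = {w: Z[w] * P[w] for w in omega}          # dP~ = Z dP
lhs = sum(X[w] * P_tilde[w] for w in omega)        # E~[X]
rhs = sum(X[w] * Z[w] * P[w] for w in omega)       # E[XZ]
print(lhs, rhs)  # -> 1.375 1.375
```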



Radon-Nikodym Theorem
Definition (Radon-Nikodym Derivative): If we define a new
probability measure P̃ by P̃(A) = ∫_A Z(ω) dP(ω), then Z is
called the Radon-Nikodym derivative of P̃ with respect to P,
and we formally write

    Z = dP̃/dP

Theorem (Radon-Nikodym): Let P and P̃ be equivalent
probability measures defined on a measurable space (Ω, F).
Then there exists an almost surely positive random variable Z
such that E[Z] = 1 and

    P̃(A) = ∫_A Z(ω) dP(ω)

for every A ∈ F.
An Example of Change-of-Measure

Suppose we have a probability space (Ω, F, P), and X is a
standard normal random variable on this space. Fix a constant
θ > 0; clearly, Y = X + θ is not a standard normal random
variable. However, we can construct a new measure P̃ such
that Y is a standard normal random variable under P̃
- we define Z = e^{−θX − θ²/2} and use Z as the Radon-Nikodym
derivative to induce a new measure P̃
- it can be shown that

    P̃(Y ≤ y) = ∫_{−∞}^{y} (1/√(2π)) e^{−t²/2} dt
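This can be checked by Monte Carlo. A sketch (Python; the seed, θ = 0.7, sample size, and test point y = 0.5 are arbitrary choices of ours): sampling X under P and weighting by Z, the estimate of P̃(Y ≤ y) = E[Z · 1{X + θ ≤ y}] should match the standard normal CDF, and E[Z] should be close to 1.

```python
import math
import random

# Monte Carlo check of the change of measure Z = exp(-theta*X - theta**2/2):
# under P~, Y = X + theta should be standard normal.
random.seed(1)
theta, N = 0.7, 400_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]           # X ~ N(0,1) under P
zs = [math.exp(-theta * x - 0.5 * theta**2) for x in xs]  # Radon-Nikodym derivative

mean_z = sum(zs) / N                                      # E[Z], should be near 1

y = 0.5
p_tilde = sum(z for x, z in zip(xs, zs) if x + theta <= y) / N  # P~(Y <= y)
phi = 0.5 * (1 + math.erf(y / math.sqrt(2)))                    # Phi(y), target value
print(mean_z, p_tilde, phi)
```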



After-Class Work

Read Chapter 1 of the textbook

