

LECTURE 3:

RANDOM VARIABLES AND FUNCTIONS THAT DESCRIBE THEM

The goal of the lecture notes is to give an exposition of the material in the same order and
approximately in the same completeness as it is done in class (ORF526, Fall 2024). Always
refer to the textbooks if in doubt and for more complete treatment.
Please do not distribute these notes as they may be incomplete. You are encouraged to send
any questions on the notes and observed typos to [email protected], thanks in advance!
E. Rebrova

Recall that we call a pair (S, F) of an arbitrary set S with a σ-algebra F on it a measurable
space, meaning that one can define a measure on it. If µ is a measure defined on the pair,
then we call (S, F, µ) a measure space (or a probability space if µ is a probability measure).
Definition 3.1 (Random variable). A function f : (Ω, F) → (S, S) is called a measurable
function if for any A ∈ S, the preimage f −1 (A) = {x ∈ Ω : f (x) ∈ A} is also measurable.
A random variable is a measurable function X : (Ω, F, P) → (S, S).
The idea behind this definition is as follows: If we want to know the probability that a
random variable takes a certain value, or is in a certain (measurable) range, we need the
probability function to be well-defined on the preimage of the range or value of interest.
Consider an example: We select a number from Ω = {1, 2, . . . , 100} uniformly at random.
We are interested in the probability that this number (mod 5) gives 3. We naturally
have a function X : Ω → {0, 1, 2, 3, 4} (ω ↦ ω mod 5). The question above asks for
P(X^{−1}({3})) = P({ω : X(ω) = 3}) =: P(X = 3).
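The computation above can be sketched directly: on a finite sample space with the uniform measure, P(A) = |A|/|Ω|, so P(X = 3) is just the size of the preimage X^{−1}({3}) divided by 100. A minimal check (variable names are illustrative):

```python
# Finite sanity check: Omega = {1, ..., 100} with the uniform measure,
# X(omega) = omega mod 5. Compute P(X = 3) by counting the preimage.

omega_space = range(1, 101)      # the sample space Omega
X = lambda w: w % 5              # the random variable (just a function on Omega)

preimage = [w for w in omega_space if X(w) == 3]   # X^{-1}({3})
prob = len(preimage) / len(omega_space)            # uniform: P(A) = |A| / |Omega|
print(prob)   # 0.2
```

The preimage is {3, 8, 13, . . . , 98}, which has 20 elements, so P(X = 3) = 20/100 = 0.2.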
Remark 3.2. (1) Notation: We write P(X ∈ A) to denote P({ω : X(ω) ∈ A}). The
dependence of a random variable on the sample space will frequently be omitted
when it is clear. However, it is important to remember that a random variable X is
a function defined on some underlying space Ω (i.e., its value depends on the “state
of the world” ω ∈ Ω.)
(2) If the target space (S, S) is unspecified, it is typically assumed that it is (R, B(R)),
i.e., X is a real-valued random variable.
The following lemma lists some useful properties of measurable functions that can be used
to claim measurability of newly constructed functions.
Lemma 3.3. Let X : (Ω, F) → (S, S) and Y : (Ω, F) → (S, S) be measurable functions.
(1) (Generated σ-algebras). If S = σ(A), then X is measurable if and only if X −1 (A) is
measurable for any A ∈ A.
(2) (Composition of RVs). If Z : (S, S) → (E, E) is measurable, then Z ◦ X is also
measurable.

Date: Fall 2024.



(3) (Pointwise maxima/minima). If (S, S) = (R, B(R)), then the pointwise maximum
max{X, Y } and minimum min{X, Y } functions are measurable. In particular, the
positive and negative parts X+ = max{X, 0} and X− = − min{X, 0} of X = X+ −X−
are also measurable.
(4) (Sums, differences, products and ratios). If (S, S) = (R, B(R)), then the functions
(defined pointwise) X + Y , X − Y , XY and X/Y (provided Y ̸= 0 everywhere) are
measurable.
(5) (Pointwise limits). If Xn : (Ω, F) → (R, B(R)), n ≥ 1, is a sequence of real-
valued measurable functions, then the functions (defined pointwise) supn Xn , inf n Xn ,
lim supn Xn , lim inf n Xn , and lim Xn (when the limit exists, i.e., lim sup = lim inf)
are also measurable.
(6) (Continuous/monotone functions). Let f : (R, B(R)) → (R, B(R)) be a real-valued function.
If f is left-continuous, right-continuous, or monotone, then f is measurable.
Most of these properties are not hard to check; some of these are homework exercises. For
example, for property (1), consider B := {B ⊂ S : X −1 (B) is measurable}. If the preimages
of the generator, X −1 (A), are measurable, then it can be checked from the definition that B
is a σ-algebra that contains A, and hence it contains σ(A).
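On a finite space, Definition 3.1 can be checked exhaustively: store the σ-algebra F as a set of frozensets and test whether every preimage lands in F. A sketch (the helper name and the example partition are illustrative, not from the notes):

```python
from itertools import chain, combinations

def is_measurable(X, omega, F, image):
    """Check Definition 3.1 on a finite space: X^{-1}(B) must be in F
    for every subset B of the (finite) image of X."""
    subsets = chain.from_iterable(combinations(sorted(image), r)
                                  for r in range(len(image) + 1))
    for B in subsets:
        preimage = frozenset(w for w in omega if X(w) in B)
        if preimage not in F:
            return False
    return True

omega = {1, 2, 3, 4}
# The sigma-algebra generated by the partition {{1, 2}, {3, 4}}:
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}

X_const = lambda w: 0 if w in {1, 2} else 1   # constant on partition blocks
X_id = lambda w: w                            # distinguishes points F cannot see

print(is_measurable(X_const, omega, F, {0, 1}))      # True
print(is_measurable(X_id, omega, F, {1, 2, 3, 4}))   # False
```

The identity function fails because, e.g., X_id^{−1}({1}) = {1} is not an element of F; a coarse σ-algebra only supports random variables that are constant on its atoms.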

Three objects associated with a RV: the σ-algebra σ(X), the law of X, the CDF of X
(1) The law of X is a probability measure on the image space (typically, on (R, B(R))),
denoted as PX . It allows us to talk about random variables without references to the
original space Ω and only focus on the probabilities of the observed values.
Definition 3.4 (Random variables define new probability measures). Let X : (Ω, F) →
(S, S) be a random variable. The law of X is the probability measure PX on the
image space induced by the random variable X, defined as PX (A) := P(X ∈ A) for
any A ∈ S.
Exercise: check that the function PX indeed defines a probability measure.
(2) The generated σ-algebra σ(X) contains exactly those events A for which we can say
whether ω is in A or not, based on the value of X(ω).
Definition 3.5 (Random variables define new σ-algebras). Given a random variable
X, the σ-algebra generated by X, denoted by σ(X), is the minimal σ-algebra such
that X(ω) is a measurable function.
Exercise: Consider an experiment where an infinite sequence of coin tosses is ob-
served. The outcome space has elements ω = (ω1 , ω2 , . . .), where each ωi ∈ {H, T }.
Let’s identify H = 1 and T = 0. Note that any infinite sequence of zeros and ones
is possible, and thus the space can be identified with the segment [0, 1] by binary
expansions. Let X(ω) := ωn (the result of the nth toss). What is the σ-algebra
generated by X, σ(X)? Think this through. (The image of X is {0, 1}. X −1 ({0}) =
all sequences with 0 in the nth position. . . . )
(3) If a random variable maps to R, the cumulative distribution function FX gives a
more convenient representation of its law, recording it only on a generating
set of the Borel σ-algebra.
Definition 3.6. Let X be a random variable on a probability space (Ω, F, P). The
cumulative distribution function (CDF) of X is a function FX : R → [0, 1]
defined as FX (t) := P(X ≤ t).
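For the running example X(ω) = ω mod 5 on Ω = {1, . . . , 100}, the law and the CDF can be computed concretely as the pushforward of the uniform measure (names below are illustrative):

```python
from collections import Counter

# The law P_X (Definition 3.4) and CDF F_X (Definition 3.6) of
# X(omega) = omega mod 5 on Omega = {1, ..., 100}, uniform measure.

omega_space = range(1, 101)
n = 100
counts = Counter(w % 5 for w in omega_space)
law = {k: counts[k] / n for k in range(5)}   # P_X({k}) = P(X = k), pushforward of P

def cdf(t):
    """F_X(t) = P(X <= t): a right-continuous step function on R."""
    return sum(p for k, p in law.items() if k <= t)

print(law[3])    # 0.2
print(cdf(2))    # approximately 0.6 (values 0, 1, 2 each carry probability 0.2)
print(cdf(-1))   # 0
```

Note that both the law and the CDF are expressed entirely on the image space; the original Ω no longer appears.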

Theorem 3.7. Two random variables X, Y have the same CDF if and only if they have the
same law.
Proof. Consider the set of all intervals Π := {(−∞, b] : b ∈ R}. This is a π-system
(check this!) of subsets of R that generates B(R). By definition, the law of X satisfies
PX ((−∞, b]) = P(X ≤ b) = FX (b). Hence, if any other law PY has the same CDF, then
PX = PY on B(R) by the corollary of the Dynkin π-λ theorem from Lecture 2. □
Theorem 3.8. Any CDF F : R → [0, 1] satisfies the following properties:
(1) F is non-decreasing
(2) F is right-continuous
(3) limt→−∞ F (t) = 0 and limt→∞ F (t) = 1.
If a function F satisfies (1)–(3), then there exists a probability space and a random variable
X defined on this space such that its CDF FX coincides with F .
Proof of Theorem 3.8. First, properties (1)–(3) hold for any CDF FX : they follow from the
monotonicity and the continuity from above and from below of the law PX (exercise!).
Consider any F satisfying the properties (1)–(3). Let Ω = (0, 1), and consider the prob-
ability space ((0, 1), B(0, 1), P), where P is the uniform measure (i.e., P is the Lebesgue
measure on the unit interval). Define the function X(ω) := inf{y : F (y) ≥ ω} on the prob-
ability space. Note that X is well-defined due to (3). Furthermore, properties (1) and (2)
imply that {ω : X(ω) ≤ t} = {ω : F (t) ≥ ω} (sketch the graph of F to see this:
right-continuity of F gives F (X(ω)) ≥ ω, and monotonicity does the rest).
Therefore,
FX (t) = P{X ≤ t} = P{ω : X(ω) ≤ t} = P{ω : ω ≤ F (t)} = P((0, F (t)]) = F (t).
Here, the second equality is simply notation, the third is by the identity above, and the
last is by the definition of the uniform probability measure on (0, 1).
Finally, this shows that X is measurable (since F is measurable as a non-decreasing
function). Hence, X is a random variable with the same CDF as F . □
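The proof is constructive: X(ω) = inf{y : F (y) ≥ ω} is the generalized inverse of F, which is also the standard recipe for sampling from F (inverse transform sampling). A numerical sketch via bisection (the bracketing bounds and iteration count are illustrative choices, assuming F (lo) < u ≤ F (hi)):

```python
import math

def generalized_inverse(F, u, lo=-1e6, hi=1e6, iters=100):
    """Approximate X(u) = inf{y : F(y) >= u} by bisection.
    Assumes F(lo) < u <= F(hi); F need only be non-decreasing
    and right-continuous, as in Theorem 3.8."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) >= u:
            hi = mid   # mid lies in the set {y : F(y) >= u}, so the inf is <= mid
        else:
            lo = mid   # mid lies strictly below the set
    return hi

# Example 1: F of an Exp(1) variable; the inverse is -log(1 - u).
F_exp = lambda t: 1 - math.exp(-t) if t >= 0 else 0.0
print(abs(generalized_inverse(F_exp, 0.5) - math.log(2)) < 1e-6)   # True

# Example 2: a step CDF (an indicator variable with P(A) = 0.3);
# the infimum handles flat parts and jumps of F correctly.
F_step = lambda t: 0.0 if t < 0 else (0.7 if t < 1 else 1.0)
print(abs(generalized_inverse(F_step, 0.7) - 0.0) < 1e-6)   # True
print(abs(generalized_inverse(F_step, 0.8) - 1.0) < 1e-6)   # True
```

The step-CDF example shows why the infimum (rather than a plain inverse) is needed: F is not injective, yet inf{y : F (y) ≥ ω} is still well-defined at every ω ∈ (0, 1).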

Remark 3.9. One can check that a CDF can have at most countably many points of
discontinuity.

Basic example: indicators and simple random variables, approximation by simple random variables
Some fundamental random variables are the following:
• Indicator (Dirac) random variables: fix A ∈ F, and consider the indicator random
variable of the event A:
1A (ω) = 1 if ω ∈ A, and 0 otherwise.

Let’s discuss this. (1) X := 1A is a random variable: for B ∈ B(R), the preimage
X^{−1}(B) equals Ω if both 0, 1 ∈ B, equals A if 1 ∈ B and 0 ∉ B, equals Ac if 0 ∈ B
and 1 ∉ B, and equals ∅ otherwise; all of these are measurable by the definition of the
σ-algebra F. (2) σ(X) = {∅, A, Ac , Ω}. (3) The law of X is given by
PX (B) = P(X ∈ B) = P(A)1{1∈B} + P(Ac )1{0∈B} for any B ∈ B(R).
The CDF of an indicator (Dirac) random variable is a step function with a single jump at each atom: FX (t) = 0 for t < 0, FX (t) = P(Ac ) for 0 ≤ t < 1, and FX (t) = 1 for t ≥ 1.

• Simple random variables: for any finite sequence of events A1 , . . . , AN ∈ F and non-
random numbers c1 , . . . , cN , we can define a simple random variable by the following
finite sum:
X(ω) := Σ_{n=1}^{N} cn 1An (ω).

Theorem 3.10 (Monotone approximation by simple functions). For any random variable
X(ω), there exists a monotone increasing sequence of simple random variables Xn (ω) such
that X1 (ω) ≤ X2 (ω) ≤ . . . and Xn (ω) → X(ω) as n → ∞ for any ω ∈ Ω.

Proof. Define functions fn : R → R by
fn (x) := n 1{x>n} (x) + Σ_{k=0}^{n2^n − 1} k 2^{−n} 1(k2^{−n}, (k+1)2^{−n}] (x).

If X ≥ 0, then Xn := fn (X) is such that X ≥ Xn+1 ≥ Xn and X(ω) − Xn (ω) < 2−n
if X(ω) ≤ n. So, Xn (ω) → X(ω) as n → ∞. For arbitrary X(ω), we represent it as
X(ω) = X+ (ω) − X− (ω) with X+ (ω) := max(X(ω), 0) and X− (ω) := − min(X(ω), 0), and
apply the construction above to the positive and negative parts separately. □
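The dyadic construction in the proof can be made concrete: for x ∈ (k2^{−n}, (k + 1)2^{−n}], fn truncates x down to the grid point k2^{−n}, and caps values above n at n. A sketch for x ≥ 0 (the closed-form index below is an equivalent rewriting of the sum):

```python
import math

def f_n(x, n):
    """The dyadic approximation from the proof of Theorem 3.10 (for real x):
    f_n(x) = n if x > n, and k * 2^{-n} if x in (k 2^{-n}, (k+1) 2^{-n}]."""
    if x > n:
        return float(n)
    if x <= 0:
        return 0.0
    k = math.ceil(x * 2 ** n) - 1   # the unique k with x in (k 2^{-n}, (k+1) 2^{-n}]
    return k / 2 ** n

x = 0.3
approx = [f_n(x, n) for n in range(1, 6)]
print(approx)   # [0.0, 0.25, 0.25, 0.25, 0.28125]

# Monotone convergence from below, with error < 2^{-n} once n >= x:
assert all(a <= b for a, b in zip(approx, approx[1:]))
assert all(x - f_n(x, n) < 2 ** (-n) for n in range(1, 20))
```

Each fn takes only finitely many values, so Xn = fn(X) is a simple random variable, and the doubling of the grid from n to n + 1 is what makes the sequence monotone.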

Probability density functions


Some (but not all) random variables also have density functions.
Definition 3.11. A random variable X(ω) has a probability density function fX : R →
R with respect to Lebesgue measure if
P(X ∈ A) = ∫_A fX (x) dx
for all Borel sets A ∈ B(R).
What do we mean by “with respect to Lebesgue measure?” How do we integrate over
an arbitrary Borel set? Soon, we will also want to integrate arbitrary measurable functions
(and random variables). In the next lecture, we will formally define integration with respect
to some measure (in particular, we will define the Lebesgue integral with respect to the
Lebesgue measure).
For now, note that the definition of a random variable with a probability density function
fX implies the following:
(1) ∫_R fX (x) dx = 1.
(2) fX (x) ≥ 0 almost everywhere, i.e., for all x except on a set of measure zero.
(3) The CDF FX of X is given by
FX (t) = ∫_{−∞}^{t} fX (x) dx.
(Compare the properties of integrals with the properties of CDFs. For now, you can
think of the integral in a Riemann sense.)
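In the Riemann spirit suggested above, properties (1) and (3) can be sanity-checked numerically for a concrete density, say fX (x) = 2x on [0, 1] (and 0 elsewhere), whose CDF is FX (t) = t² on [0, 1]. A sketch with midpoint Riemann sums (grid size is an arbitrary illustrative choice):

```python
def f_X(x):
    """An example density: f_X(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def riemann(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

total = riemann(f_X, 0.0, 1.0)     # property (1): the density integrates to 1
F_half = riemann(f_X, -1.0, 0.5)   # property (3): F_X(0.5) = 0.5^2 = 0.25

print(abs(total - 1.0) < 1e-6)     # True
print(abs(F_half - 0.25) < 1e-6)   # True
```

The second integral starts from −1 rather than 0 to emphasize that the CDF integrates the density over all of (−∞, t], where fX simply vanishes outside [0, 1].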
