Orf526 f24 Lec3
The goal of these lecture notes is to give an exposition of the material in the same order, and at approximately the same level of completeness, as it is presented in class (ORF526, Fall 2024). Always
refer to the textbooks if in doubt and for a more complete treatment.
Please do not distribute these notes as they may be incomplete. You are encouraged to send
any questions on the notes and observed typos to [email protected], thanks in advance!
E. Rebrova
Recall that we call a pair (S, F) of an arbitrary set S with a σ-algebra F on it a measurable
space, meaning that one can define a measure on it. If µ is a measure defined on this pair,
then (S, F, µ) is called a measure space (or a probability space if µ is a probability measure).
Definition 3.1 (Random variable). A function f : (Ω, F) → (S, S) is called a measurable
function if for any A ∈ S, the preimage f −1 (A) = {x ∈ Ω : f (x) ∈ A} is also measurable.
A random variable is a measurable function X : (Ω, F, P) → (S, S).
The idea behind this definition is as follows: If we want to know the probability that a
random variable takes a certain value, or is in a certain (measurable) range, we need the
probability function to be well-defined on the preimage of the range or value of interest.
Consider an example: we select a number from Ω = {1, 2, . . . , 100} uniformly at random.
We are interested in the probability that this number is 3 (mod 5). We naturally
have a function X : Ω → {0, 1, 2, 3, 4} (ω ↦ ω mod 5). The question above asks for
P(X −1 (3)) = P({ω : X(ω) = 3}) =: P(X = 3).
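The mod-5 example can be checked by brute force; a minimal sketch in Python (the names `Omega` and `X` are illustrative):

```python
# Omega = {1, ..., 100} with the uniform measure; X(omega) = omega mod 5.
Omega = range(1, 101)

def X(omega):
    return omega % 5

# P(X = 3) = P(X^{-1}(3)) = |{omega : X(omega) = 3}| / |Omega|
preimage = [w for w in Omega if X(w) == 3]
print(len(preimage) / len(Omega))  # 0.2
```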
Remark 3.2. (1) Notation: We write P(X ∈ A) to denote P({ω : X(ω) ∈ A}). The
dependence of a random variable on the sample space will frequently be omitted
when it is clear. However, it is important to remember that a random variable X is
a function defined on some underlying space Ω (i.e., its value depends on the “state
of the world” ω ∈ Ω.)
(2) If the target space (S, S) is unspecified, it is typically assumed that it is (R, B(R)),
i.e., X is a real-valued random variable.
The following lemma lists some useful properties of measurable functions that can be used
to claim measurability of newly constructed functions.
Lemma 3.3. Let X : (Ω, F) → (S, S) and Y : (Ω, F) → (S, S) be measurable functions.
(1) (Generated σ-algebras). If S = σ(A), then X is measurable if and only if X −1 (A) is
measurable for any A ∈ A.
(2) (Composition of RVs). If Z : (S, S) → (E, E) is measurable, then Z ◦ X is also
measurable.
(3) (Pointwise maxima/minima). If (S, S) = (R, B(R)), then the pointwise maximum
max{X, Y } and minimum min{X, Y } functions are measurable. In particular, the
positive and negative parts X+ = max{X, 0} and X− = − min{X, 0} of X = X+ −X−
are also measurable.
(4) (Sums, differences, products and ratios). If (S, S) = (R, B(R)), then the functions
(defined pointwise) X + Y , X − Y , XY and X/Y (provided Y ≠ 0 everywhere) are
measurable.
(5) (Pointwise limits). If Xn : (Ω, F) → (R, B(R)), n ≥ 1, is a sequence of real-
valued measurable function, then the functions (defined pointwise) supn Xn , inf n Xn ,
lim supn Xn , lim inf n Xn , and lim Xn (when the limit exists, i.e., lim sup = lim inf)
are also measurable.
(6) (Continuous/monotone functions). Let f : (R, B(R)) → (R, B(R)) be a real-valued function.
If f is left-continuous, right-continuous, or monotone, then f is measurable.
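On a finite sample space, measurability can be checked directly from Definition 3.1 by enumerating preimages. A minimal sketch with a hypothetical toy example (the spaces, `X_good`, and `X_bad` are assumptions, not from the notes):

```python
# Omega = {1, 2, 3, 4}, with F the sigma-algebra generated by the
# partition {1, 2}, {3, 4}.
Omega = {1, 2, 3, 4}
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(Omega)}

# Target space S = {0, 1} with its full power set as the sigma-algebra.
S = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}

def is_measurable(X, F, S):
    """X is measurable iff X^{-1}(A) is in F for every A in S."""
    return all(frozenset(w for w in Omega if X(w) in A) in F for A in S)

X_good = lambda w: 0 if w <= 2 else 1  # constant on the partition blocks
X_bad = lambda w: w % 2                # separates 1 from 2, so not F-measurable

print(is_measurable(X_good, F, S))  # True
print(is_measurable(X_bad, F, S))   # False
```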
Most of these properties are not hard to check; some of these are homework exercises. For
example, for property (1), consider B := {B ⊂ S : X −1 (B) is measurable}. If the preimages
of the generator, X −1 (A), are measurable, then it can be checked from the definition that B
is a σ-algebra that contains A, and hence it contains σ(A).
Definition 3.6. Let X be a random variable on a probability space (Ω, F, P). The
cumulative distribution function (CDF) of X is a function FX : R → [0, 1]
defined as FX (t) := P(X ≤ t).
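As a concrete illustration of the definition (using a hypothetical fair die, not an example from the notes), the CDF of a discrete random variable is a right-continuous step function:

```python
import math

# CDF of X uniform on {1, ..., 6}: F_X(t) = P(X <= t) = clamp(floor(t), 0, 6) / 6.
def cdf_die(t):
    return min(max(math.floor(t), 0), 6) / 6

print(cdf_die(0.5))  # 0.0
print(cdf_die(3))    # 0.5  (right-continuous: the jump at t = 3 is included)
print(cdf_die(3.9))  # 0.5
print(cdf_die(100))  # 1.0
```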
Theorem 3.7. Two random variables X, Y have the same CDF if and only if they have the
same law.
Proof. Consider the collection of intervals Π := {(−∞, b] : b ∈ R}. This is a π-system
(check this!) of subsets of R that generates B(R). By definition, the law of X satisfies
PX ((−∞, b]) = P(X ≤ b) = FX (b). Hence, if Y has the same CDF, then
PX = PY on B(R) by the corollary of the Dynkin π-λ theorem from Lecture 2. □
Theorem 3.8. Any CDF F : R → [0, 1] satisfies the following properties:
(1) F is non-decreasing
(2) F is right-continuous
(3) limt→−∞ F (t) = 0 and limt→∞ F (t) = 1.
If a function F satisfies (1)–(3), then there exists a probability space and a random variable
X defined on this space such that its CDF FX coincides with F .
Proof of Theorem 3.8. We can verify properties (1)–(3) for the CDF of any random variable
X using monotonicity and continuity from above and from below of its law PX (exercise!).
Consider any F satisfying the properties (1)–(3). Let Ω = (0, 1), and consider the prob-
ability space ((0, 1), B(0, 1), P), where P is the uniform measure (i.e., P is the Lebesgue
measure on the unit interval). Define the function X(ω) := inf{y : F (y) ≥ ω} on the prob-
ability space. Note that X is well-defined due to (3). Furthermore, properties (1) and (2)
imply that {ω : X(ω) ≤ t} = {ω : F (t) ≥ ω}. [Figure omitted: graph of F illustrating this
identity via the generalized inverse X(ω) = inf{y : F (y) ≥ ω}.]
Therefore,
FX (t) = P{X ≤ t} = P{ω : X(ω) ≤ t} =(i) P{ω : ω ≤ F (t)} =(ii) P((0, F (t)]) =(iii) F (t).
Here, (i) is by the identity {ω : X(ω) ≤ t} = {ω : F (t) ≥ ω} above, (ii) is simply notation
(within Ω = (0, 1), the set {ω : ω ≤ F (t)} is the interval (0, F (t)]), and (iii) is by
the definition of the uniform probability measure on (0, 1). □
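The construction in the proof (inverse-CDF sampling) can be simulated numerically. A sketch under an assumed example distribution, F (y) = 1 − e^{−y} (the Exp(1) CDF), whose generalized inverse has the closed form −log(1 − ω); the bisection routine and names are illustrative:

```python
import math
import random

def generalized_inverse(F, omega, lo=0.0, hi=50.0, iters=100):
    """Compute inf{y : F(y) >= omega} by bisection (F non-decreasing)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) >= omega:
            hi = mid
        else:
            lo = mid
    return hi

F = lambda y: 1 - math.exp(-y) if y > 0 else 0.0  # Exp(1) CDF (assumed example)

random.seed(0)
omegas = [random.random() for _ in range(10_000)]  # omega uniform on (0, 1)
samples = [generalized_inverse(F, w) for w in omegas]

# The samples follow Exp(1): the empirical mean should be close to 1, and each
# sample should match the closed-form inverse -log(1 - omega).
print(sum(samples) / len(samples))
print(max(abs(x - (-math.log(1 - w))) for x, w in zip(samples, omegas)))
```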
LECTURE 3: RANDOM VARIABLES AND FUNCTIONS THAT DESCRIBE THEM
Remark 3.9. One can check that a CDF can have at most countably many points of
discontinuity.
• Simple random variables: for any finite sequence of events A1 , . . . , AN ∈ F and non-
random numbers c1 , . . . , cN , we can define a simple random variable by the following
finite sum:
X(ω) := Σ_{n=1}^{N} cn 1An (ω).
Theorem 3.10 (Monotone approximation by simple functions). For any random variable
X(ω), there exists a monotone increasing sequence of simple random variables Xn (ω) such
that X1 (ω) ≤ X2 (ω) ≤ . . . and Xn (ω) → X(ω) as n → ∞ for any ω ∈ Ω.
Proof. If X ≥ 0, define fn (x) := min(n, 2−n ⌊2n x⌋) and Xn := fn (X). Then X ≥ Xn+1 ≥ Xn
and X(ω) − Xn (ω) < 2−n whenever X(ω) ≤ n, so Xn (ω) → X(ω) as n → ∞. For arbitrary
X(ω), we represent it as X(ω) = X+ (ω) − X− (ω) with X+ (ω) := max(X(ω), 0) and
X− (ω) := − min(X(ω), 0), and apply the construction above to the positive and negative
parts separately. □
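The dyadic approximation fn (x) = min(n, 2−n ⌊2n x⌋) used above can be checked numerically; a minimal sketch (the choice x = π is illustrative):

```python
import math

# f_n(x) = min(n, floor(2^n * x) / 2^n): a simple function taking
# finitely many values, nondecreasing in n.
def f_n(n, x):
    return min(n, math.floor((2 ** n) * x) / (2 ** n))

x = math.pi
approximations = [f_n(n, x) for n in range(1, 10)]
print(approximations[:4])  # [1, 2, 3, 3.125]

# Monotone increasing, and within 2^{-n} of x once n >= x:
assert all(a <= b for a, b in zip(approximations, approximations[1:]))
assert all(x - f_n(n, x) < 2 ** -n for n in range(4, 10))
```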