to denote the numerical value of a random variable X, when is no larger than - X (ω) ≤ c) - Of course, in
RANDOM VARIABLES
X(ω1 , . . . , ωn ) = ω1 + · · · + ωn .
With this definition, the set {ω | X(ω) < 4} is just the event that there were fewer than
4 heads overall, belongs to the σ-field F, and therefore has a well-defined probability.
Consider the real line, and let B be the associated Borel σ-field. Sometimes,
we will also allow random variables that take values in the extended real line,
R̄ = R ∪ {−∞, ∞}. We define the Borel σ-field on R̄, also denoted by B, as
the smallest σ-field that contains all Borel subsets of R and the sets {−∞} and
{∞}.
Because the collection of intervals of the form (−∞, c] generates the Borel
σ-field in R, it can be shown that if X is a random variable, then for any
Borel set B, the set X^{-1}(B) is F-measurable. It follows that the probability
P(X^{-1}(B)) = P({ω | X(ω) ∈ B}) is well-defined. It is often denoted by
P(X ∈ B).
(a) For every Borel subset B of the real line (i.e., B ∈ B), we define PX (B) =
P(X ∈ B).
(b) The resulting function PX : B → [0, 1] is called the probability law of X.
Proof: Clearly, PX (B) ≥ 0, for every Borel set B. Also, PX (R) = P(X ∈
R) = P(Ω) = 1. We now verify countable additivity. Let {Bi } be a countable
sequence of disjoint Borel subsets of R. Note that the sets X^{-1}(Bi) are also
disjoint, and that
$$X^{-1}\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \bigcup_{i=1}^{\infty} X^{-1}(B_i),$$
or, in different notation,
$$\Big\{X \in \bigcup_{i=1}^{\infty} B_i\Big\} = \bigcup_{i=1}^{\infty} \{X \in B_i\}.$$
1.3 Technical digression: measurable functions
The following generalizes the definition of a random variable.
According to the above definition, given a probability space (Ω, F, P), and
taking into account the discussion in Section 1.1, a random variable on a
probability space is a function X : Ω → R that is (F, B)-measurable.
As a general rule, functions constructed from other measurable functions
using certain simple operations are measurable. We collect, without proof, a
number of relevant facts below.
Another way that we can form a random variable is by taking the limit of
a sequence of random variables. Let us first introduce some terminology. Let
each fn be a function from some set Ω into R. Consider a new function f =
inf n fn defined by f (ω) = inf n fn (ω), for every ω ∈ Ω. The functions supn fn ,
lim inf n→∞ fn , and lim supn→∞ fn are defined similarly. (Note that even if
the fn are everywhere finite, the above defined functions may turn out to be
extended-valued.) If the limit limn→∞ fn (ω) exists for every ω, we say that the
sequence of functions {fn } converges pointwise, and define its pointwise limit
to be the function f defined by f (ω) = limn→∞ fn (ω). For example, suppose
that Ω = [0, 1] and that fn (ω) = ω^n. Then, the pointwise limit f = limn→∞ fn
exists, and satisfies f (1) = 1, and f (ω) = 0 for ω ∈ [0, 1).
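The pointwise limit in the example above can be checked numerically. The following is a minimal sketch; the function names `f` and `pointwise_limit` are illustrative choices, not notation from the text.

```python
def f(n, omega):
    """f_n(omega) = omega**n, for omega in [0, 1]."""
    return omega ** n


def pointwise_limit(omega):
    """Pointwise limit of omega**n as n grows, for omega in [0, 1]."""
    return 1.0 if omega == 1 else 0.0


# For each fixed omega, the values f_n(omega) approach the limit as n increases.
for omega in (0.0, 0.5, 0.9, 0.99, 1.0):
    values = [f(n, omega) for n in (1, 10, 100, 1000)]
    print(omega, values, "limit:", pointwise_limit(omega))
```

The closer ω is to 1, the larger n must be before ω^n gets small, even though each f_n is continuous and the limit is not.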
Example 4. Let X be the number of heads in two independent tosses of a fair coin. In
particular, P(X = 0) = P(X = 2) = 1/4, and P(X = 1) = 1/2. Then,
$$F_X(x) = \begin{cases} 0, & \text{if } x < 0, \\ 1/4, & \text{if } 0 \le x < 1, \\ 3/4, & \text{if } 1 \le x < 2, \\ 1, & \text{if } x \ge 2. \end{cases}$$
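As a sketch, the CDF of Example 4 can be recomputed by enumerating the four equally likely outcomes of two tosses. The encoding of heads as 1 and tails as 0, and the names `outcomes` and `cdf`, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: 1 = heads, 0 = tails (illustrative encoding).
outcomes = list(product((0, 1), repeat=2))
prob = Fraction(1, len(outcomes))  # fair coin: each outcome has probability 1/4


def cdf(x):
    """F_X(x) = P(X <= x), where X is the number of heads."""
    return sum(prob for w in outcomes if sum(w) <= x)


for x in (-1, 0, 0.5, 1, 1.9, 2):
    print(x, cdf(x))
```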
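Example 5. (A uniform random variable and its square) Consider a probability space
(Ω, B, P), where Ω = [0, 1], B is the Borel σ-field, and P is the Lebesgue measure.
The random variable U defined by U(ω) = ω is said to be uniformly distributed. Its
CDF is given by
$$F_U(x) = \begin{cases} 0, & \text{if } x < 0, \\ x, & \text{if } 0 \le x < 1, \\ 1, & \text{if } x \ge 1. \end{cases}$$
Consider now the random variable X = U^2. For x ∈ [0, 1), we have
P(U^2 ≤ x) = P(U ≤ √x) = √x. Thus,
$$F_X(x) = \begin{cases} 0, & \text{if } x < 0, \\ \sqrt{x}, & \text{if } 0 \le x < 1, \\ 1, & \text{if } x \ge 1. \end{cases}$$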
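The CDF of the square of a uniform random variable can be sanity-checked by simulation. This is a rough Monte Carlo sketch, taking X to be the square of U as the title of Example 5 indicates; the function name, sample size, and seed are arbitrary illustrative choices.

```python
import math
import random


def empirical_cdf_of_square(x, n_samples=200_000, seed=0):
    """Estimate P(U**2 <= x) for U uniform on [0, 1) by simulation."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.random() ** 2 <= x)
    return hits / n_samples


# Compare the estimate with the claimed value sqrt(x) at a few points.
for x in (0.04, 0.25, 0.81):
    print(x, empirical_cdf_of_square(x), math.sqrt(x))
```

The estimates should agree with √x up to sampling error, which shrinks like one over the square root of the number of samples.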
Proof:
(a) Suppose that x ≤ y. Then, {X ≤ x} ⊂ {X ≤ y}, which implies that
$$F_X(x) = P(X \le x) \le P(X \le y) = F_X(y).$$
(b) Since FX (x) is monotonic in x and bounded below by zero, it converges as
x → −∞, and the limit is the same for every sequence {xn } converging to
−∞. So, let xn = −n, and note that the events {X ≤ −n} decrease to their
intersection ∩_{n=1}^∞ {X ≤ −n}, which is the empty set. Using the continuity
of probabilities, we obtain
$$\lim_{x \to -\infty} F_X(x) = \lim_{n \to \infty} P(X \le -n) = P(\emptyset) = 0.$$
Since this is true for every such sequence {xn }, we conclude that
lim_{y↓x} FX (y) = FX (x).
range of F is the entire interval (0, 1). Furthermore, F is invertible: for every
y ∈ (0, 1), there exists a unique x, denoted F^{-1}(y), such that F(x) = y.
We define U(ω) = ω and X(ω) = F^{-1}(ω), for every ω ∈ (0, 1), so that
X = F^{-1}(U). Note that F(F^{-1}(ω)) = ω for every ω ∈ (0, 1), so that
F(X) = U. Since F is strictly increasing, we have X ≤ x if and only if
F(X) ≤ F(x), or U ≤ F(x). (Note that this also establishes that the event
{X ≤ x} is measurable, so that X is indeed a random variable.) Thus, for every
x ∈ R, we have
$$F_X(x) = P(X \le x) = P\big(F(X) \le F(x)\big) = P(U \le F(x)) = F(x),$$
as desired.
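The construction X = F^{-1}(U) is what is commonly called inverse transform sampling. Below is a minimal sketch for one concrete choice of F; the logistic CDF F(x) = 1/(1 + e^{-x}) is an assumption picked for illustration because it is continuous, strictly increasing, and has range (0, 1). It does not appear in the text.

```python
import math
import random


def F(x):
    """Logistic CDF (illustrative choice of a strictly increasing CDF)."""
    return 1.0 / (1.0 + math.exp(-x))


def F_inv(y):
    """Inverse of F on (0, 1)."""
    return math.log(y / (1.0 - y))


def sample_X(rng):
    """Draw X = F_inv(U), with U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); redraw 0 to stay in (0, 1)
        u = rng.random()
    return F_inv(u)


rng = random.Random(1)
xs = [sample_X(rng) for _ in range(100_000)]
for x in (-1.0, 0.0, 2.0):
    empirical = sum(1 for v in xs if v <= x) / len(xs)
    print(x, empirical, F(x))  # the two numbers should be close
```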
Note that the probability law of X assigns probabilities to all Borel sets,
whereas the CDF only specifies the probabilities of certain intervals.
Nevertheless, the CDF contains enough information to recover the law of X.
Proof: (Outline) Let F0 be the collection of all subsets of the real line that are
unions of finitely many intervals of the form (a, b]. Then, F0 is a field. Note
that the CDF FX can be used to completely determine the probability PX (A)
of a set A ∈ F0 . Indeed, this is done using relations such as
Discrete random variables take values in a countable set. We need some
notation. Given a function f : Ω → R, its range is the set
f(Ω) = {x ∈ R | there exists ω ∈ Ω such that f(ω) = x}.
Definition 5. (Discrete random variables and PMFs)
(a) A random variable X, defined on a probability space (Ω, F, P), is said to be
discrete if its range X(Ω) is countable.
(b) If X is a discrete random variable, the function pX : R → [0, 1] defined by
pX (x) = P(X = x), for every x, is called the (probability) mass function
of X, or PMF for short.
A random variable that takes only integer values is discrete. For instance, the
random variable in Example 4 (number of heads in two coin tosses) is discrete.
Also, every simple random variable is discrete, since it takes a finite number of
values. However, more complicated discrete random variables are also possible.
Example 6. Let the sample space be the set N of natural numbers, and consider a
measure that satisfies P(n) = 1/2^n, for every n ∈ N. The random variable X defined
by X(n) = n is discrete.
Suppose now that the rational numbers have been arranged in a sequence, and that
xn is the nth rational number, according to this sequence. Consider the random variable
Y defined by Y (n) = xn . The range of this random variable is countable, so Y is a
discrete random variable. Its range is the set of rational numbers, every rational number
has positive probability, and the set of irrational numbers has zero probability.
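As a quick check on Example 6, the following sketch reads the probabilities as P(n) = 1/2^n and assumes that N starts at 1 (otherwise the total mass would exceed 1). The names `pmf` and `partial_sum` are illustrative.

```python
from fractions import Fraction


def pmf(n):
    """p_X(n) = P(X = n) = 1/2**n, for n = 1, 2, ... (assumes N starts at 1)."""
    return Fraction(1, 2 ** n)


def partial_sum(k):
    """Total probability assigned to {1, ..., k}."""
    return sum(pmf(n) for n in range(1, k + 1))


# The partial sums equal 1 - 1/2**k, so the total mass converges to 1.
for k in (1, 2, 5, 10):
    print(k, partial_sum(k))
```

The same PMF values are carried over to Y, since P(Y = xn) = P(n) when the rational numbers xn in the enumeration are distinct.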
We close by noting that discrete random variables can be represented in
terms of indicator functions. Indeed, given a discrete random variable X, with
range {x1 , x2 , . . .}, we define An = {X = xn }, for every n ∈ N. Observe that
each set An is measurable (why?). Furthermore, the sets An , n ∈ N, form a
partition of the sample space. Using indicator functions, we can write
$$X(\omega) = \sum_{n=1}^{\infty} x_n I_{A_n}(\omega).$$
Conversely, suppose we are given a sequence {An } of disjoint events whose
union is Ω, and a sequence {xn } of distinct real numbers. Define X : Ω → R by
letting X(ω) = xn if and only if ω ∈ An . Then X is a discrete random variable,
and P(X = xn ) = P(An ), for every n.
The function f is called a (probability) density function (or PDF, for short)
for X,
Any nonnegative measurable function that satisfies Eq. (2) is called a density
function. Conversely, given a density function f , we can define
F(x) = ∫_{−∞}^{x} f(t) dt, and verify that F is a distribution function. It
follows that given a density function, there always exists a random variable
whose PDF is the given density.
If a CDF FX is differentiable at some x, the corresponding value fX (x) can
be found by taking the derivative of FX at that point. However, CDFs need not
be differentiable, so this will not always work. Let us also note that a PDF of
a continuous random variable is not uniquely defined. We can always change
the PDF at a finite set of points, without affecting its integral, hence multiple
PDFs can be associated to the same CDF. However, this nonuniqueness rarely
becomes an issue. In the sequel, we will often refer to “the PDF” of X, ignoring
the fact that it is nonunique.
Example 7. For a uniform random variable, we have FX (x) = P(X ≤ x) = x, for
every x ∈ (0, 1). By differentiating, we find fX (x) = 1, for x ∈ (0, 1). For x < 0
we have FX (x) = 0, and for x > 1 we have FX (x) = 1; in both cases, we obtain
fX (x) = 0. At x = 0 and at x = 1, the CDF is not differentiable. We are free to
define fX (0) and fX (1) to be 0, or 1, or in fact any real number; the value of the
integral of fX will remain unaffected.
Using the PDF of a continuous random variable, we can calculate the
probability of various subsets of the real line. For example, we have P(X = x) = 0,
for all x, and if a < b,
$$P(a < X < b) = P(a \le X \le b) = \int_a^b f_X(t)\, dt.$$
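To illustrate computing an interval probability from a PDF, here is a small numerical sketch. The density used, f(t) = e^{-t} for t ≥ 0, is a hypothetical example not taken from the text, and the midpoint rule is just one convenient way to approximate the integral.

```python
import math


def pdf(t):
    """Hypothetical density: exponential with rate 1 (zero for t < 0)."""
    return math.exp(-t) if t >= 0 else 0.0


def prob_between(a, b, n=100_000):
    """Midpoint-rule approximation of the integral of pdf over [a, b]."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))


approx = prob_between(1.0, 2.0)
exact = math.exp(-1.0) - math.exp(-2.0)  # closed form for this density
print(approx, exact)
```

Because the integral over a single point is zero, including or excluding the endpoints a and b does not change the answer.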
(a) Consider a function f : R^m → R, and fix some x ∈ R^m . We say that f (y)
converges to a value c, as y tends to x, if we have limn→∞ f (xn ) = c, for
every sequence {xn } of elements of R^m such that xn ≠ x for all n, and
limn→∞ xn = x. In this case, we write limy→x f (y) = c.
This expansion is not unique. For example, 1/3 admits two expansions, namely
.10000 · · · and .022222 · · · . Nonuniqueness occurs only for those x that admit
an expansion ending with an infinite sequence of 2s. The set of such unusual x
is countable, and therefore has Lebesgue measure zero.
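The two expansions of 1/3 can be compared with exact arithmetic. The sketch below assumes that a ternary expansion .x1 x2 x3 ··· stands for the sum of xi/3^i over i ≥ 1; the helper `ternary_value` and the truncation length `k` are illustrative.

```python
from fractions import Fraction


def ternary_value(digits):
    """Value of the finite ternary expansion .d1 d2 ... dk, i.e. sum of d_i / 3**i."""
    return sum(Fraction(d, 3 ** i) for i, d in enumerate(digits, start=1))


k = 20
first = ternary_value([1] + [0] * (k - 1))   # .1000...0
second = ternary_value([0] + [2] * (k - 1))  # .0222...2, truncated after k digits

# The truncated second expansion falls short of 1/3 by exactly 1/3**k,
# so the gap vanishes as more 2s are appended.
print(first, first - second)
```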
The Cantor set C is defined as the set of all x ∈ [0, 1] that have a ternary
expansion that uses only 0s and 2s (no 1s allowed). The set C can be constructed as
follows. Start with the interval [0, 1] and remove the “middle third” (1/3, 2/3).
Then, from each of the remaining closed intervals, [0, 1/3] and [2/3, 1], remove
their middle thirds, (1/9, 2/9) and (7/9, 8/9), resulting in four closed intervals,
and continue this process indefinitely. Note that C is measurable, since it is
constructed by removing a countable sequence of intervals. Also, the length
(Lebesgue measure) of C is 0, since at each stage its length is multiplied by a
factor of 2/3. On the other hand, the set C has the same cardinality as the set
{0, 2}^∞ , and is uncountable.
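The middle-thirds construction can be sketched directly with exact fractions. The function names below are illustrative; each stage is represented as a list of closed intervals (a, b).

```python
from fractions import Fraction


def remove_middle_thirds(intervals):
    """Replace each closed interval [a, b] by its left and right thirds."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out


def cantor_stage(n):
    """Closed intervals remaining after n rounds of removing middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = remove_middle_thirds(intervals)
    return intervals


def total_length(intervals):
    return sum(b - a for a, b in intervals)


# The number of intervals doubles and the total length shrinks by 2/3 per stage.
for n in range(6):
    stage = cantor_stage(n)
    print(n, len(stage), total_length(stage))
```

Stage n consists of 2^n intervals of total length (2/3)^n, which tends to zero, consistent with C having Lebesgue measure zero.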
Consider now an infinite sequence of independent rolls of a 3-sided die,
whose faces are labeled 0, 1, and 2. Assume that at each roll, each of the three
possible results has the same probability, 1/3. If we use the sequence of these
rolls to form a number x, then the probability law of the resulting random
variable is the Lebesgue measure (i.e., picking a ternary expansion “at random”
leads to a uniform random variable).
The Cantor set can be identified with the event consisting of all roll
sequences in which a 1 never occurs. (This event has zero probability, which is
consistent with the fact that C has zero Lebesgue measure.)
Consider now an infinite sequence of independent tosses of a fair coin. If
the ith toss results in tails, record xi = 0; if it results in heads, record xi = 2.
Use the xi s to form a number x, using Eq. (4). This defines a random variable
X on ([0, 1], B), whose range is the set C. The probability law of this random
variable is therefore concentrated on the “zero-length” set C. At the same time,
P(X = x) = 0 for every x, because any particular sequence of heads and tails
has zero probability. A measure with this property is called singular.
The random variable X that we have constructed here is neither discrete nor
continuous. Moreover, the CDF of X cannot be written as a mixture of the kind
considered in Eq. (3).
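The coin-toss construction of X can be simulated approximately by truncating the expansion after finitely many tosses. This sketch assumes that the number x is formed as the sum of xi/3^i; the truncation length, sample size, and seed are arbitrary illustrative choices.

```python
import random
from fractions import Fraction


def sample_cantor(n_tosses, rng):
    """Truncated version of X: tails gives digit 0, heads gives digit 2."""
    x = Fraction(0)
    for i in range(1, n_tosses + 1):
        digit = 2 if rng.random() < 0.5 else 0
        x += Fraction(digit, 3 ** i)
    return x


rng = random.Random(0)
samples = [sample_cantor(30, rng) for _ in range(2000)]

# No sample lands in the removed middle third (1/3, 2/3):
# a first digit of 0 keeps x below 1/3, a first digit of 2 puts x at or above 2/3.
in_middle_third = sum(1 for x in samples if Fraction(1, 3) < x < Fraction(2, 3))
below_one_third = sum(1 for x in samples if x < Fraction(1, 3)) / len(samples)
print(in_middle_third, below_one_third)
```

Roughly half of the samples fall below 1/3 and half at or above 2/3, and the same pattern repeats inside each remaining interval at every finer scale.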
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.