Random Variables
Definition. According to the Merriam-Webster online dictionary¹, the word random means "lacking a definite plan, purpose or pattern" or "relating to, having, or being elements or events with definite probability of occurrence". In probability and statistics, we use the second definition, such that a random process is any action that has a probability distribution. The concept of uncertainty is inherent to the definition of randomness. For any random process, we cannot be certain about the action's outcome because there is some element of chance involved. Note that the opposite of a random process is a deterministic process, which is some action that always results in the same outcome.
Example 2. Drawing the first card from a shuffled deck of 52 cards is a random process because we do not know which card we will select. Drawing the first card from a sorted deck of 52 cards is a deterministic process because we will select the same card each time.
¹ https://www.merriam-webster.com/dictionary/random
A random variable is an abstract way to talk about experimental outcomes, which makes it possible to flexibly apply probability theory. Note that you cannot observe a random variable X itself, i.e., you cannot observe the function that maps experimental outcomes to numbers. The experimenter defines the random variable (i.e., function) of interest, and then observes the result of applying the defined function to an experimental outcome.
Definition. The realization of a random variable is the result of applying the random variable (i.e., function) to an observed outcome of a random experiment. This is what the experimenter actually observes. The realization of a random variable is typically denoted using lowercase italicized Roman letters, e.g., x is a realization of X.
Definition. The domain of a random variable is the sample space S, i.e., the set of possible realizations that the random variable can take.
Example 3. Suppose we flip a fair (two-sided) coin n ≥ 2 times, and assume that the n flips are independent of one another. Define X as the number of coin flips that are heads. Note that X is a random variable given that it is a function (i.e., counting the number of heads) that is applied to a random process (i.e., independently flipping a fair coin n times). Possible realizations of the random variable X include any x ∈ {0, 1, …, n}, i.e., we could observe any number of heads between 0 and n.
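To make the distinction between the random variable (a function) and its realization concrete, here is a minimal Python sketch of Example 3. The function name count_heads and the use of Python's random module are illustrative choices, not part of the original notes.

```python
import random

def count_heads(flips):
    """The random variable X: a function mapping an outcome (a sequence
    of coin flips) to a number (the count of heads)."""
    return sum(1 for flip in flips if flip == "H")

n = 10  # number of independent flips

# One outcome of the random experiment: n independent fair coin flips.
outcome = [random.choice(["H", "T"]) for _ in range(n)]

# The realization x is the result of applying X to the observed outcome.
x = count_heads(outcome)
print(outcome, "->", x)  # some x between 0 and n
```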
Example 4. Suppose that we draw the first card from a randomly shuffled deck of 52 cards, and define X as the suit of the drawn card. Note that X is a random variable given that it is a function (i.e., suit of the card) that is applied to a random process (i.e., drawing the first card from a shuffled deck). Possible realizations of the random variable X include any x ∈ {1, 2, 3, 4}, where 1 = Clubs, 2 = Diamonds, 3 = Hearts, and 4 = Spades. Note that it is not necessary to code the suits using numeric values, i.e., we could write that x ∈ {Clubs, Diamonds, Hearts, Spades}. However, mapping the suits onto numbers is (i) notationally more convenient given that we can more compactly denote the possibilities, and (ii) technically more correct given our definition of a random variable.
Definition. A random variable is discrete if its domain consists of a finite (or countably infinite) set of values. A random variable is continuous if its domain is uncountably infinite.
Example 5. Suppose we flip a fair (two-sided) coin n ≥ 2 times, and assume that the n flips are independent of one another. Define X as the number of coin flips that are heads. The random variable X is a discrete random variable given that the domain S = {0, 1, …, n} is a finite set (assuming a fixed number of flips n). Thus, we could associate a specific probability to each x ∈ S. See Example 13 from the Introduction to Probability Theory notes for an example of the probability distribution with n = 3.
Example 6. Consider the face of a clock, and suppose that we randomly spin the second hand around the clock face. Define X as the position where the second hand stops spinning (see Figure 1). The random variable X is a continuous random variable given that the domain S = {x : x is a point on a circle} is an uncountably infinite set. Thus, we cannot associate a specific probability with any given x ∈ S, i.e., P(X = x) = 0 for any x ∈ S, but we can calculate the probability that X is in a particular range, e.g., P(3 < X < 6) = 1/4.
Figure 1: Clock face with three random positions of the second hand.
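Probabilities like P(3 < X < 6) = 1/4 can be checked by simulation. Below is a minimal sketch, not from the original notes, that models a single spin as a uniform draw on [0, 12) and estimates the interval probability by Monte Carlo; the number of simulated spins is an arbitrary choice.

```python
import random

num_spins = 100_000  # number of simulated spins (illustrative choice)

# Model one spin as a uniform position on the clock face, i.e., a value in [0, 12).
spins = [random.uniform(0, 12) for _ in range(num_spins)]

# Estimate P(3 < X < 6) by the proportion of spins landing in (3, 6).
estimate = sum(3 < x < 6 for x in spins) / num_spins
print(estimate)  # should be close to (6 - 3) / 12 = 0.25
```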
Definition. The probability mass function (PMF) of a discrete random variable X is the function f(·) that associates a probability with each x ∈ S. In other words, the PMF of X is the function that returns P(X = x) for each x in the domain of X.
Any PMF must define a valid probability distribution, with the properties:
• f(x) ≥ 0 for any x ∈ S, i.e., each probability is non-negative
• $\sum_{x \in S} f(x) = 1$, i.e., the probabilities sum to one across the domain
Example 7. See Figure 2 for the PMF corresponding to the coin flipping example with n = 5 and n = 10 independent flips. As we might expect, the most likely realizations of X are those in which we observe approximately n/2 heads. Note that it is still possible to observe x = 0 or x = n heads, but these extreme results are less likely to occur.
Figure 2: Probability mass function for coin flipping example with n = 5 and n = 10.
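The PMF in Figure 2 is the binomial distribution with success probability 1/2. The sketch below, an illustration not from the original notes, computes these probabilities for n = 5 using the counting formula C(n, x) / 2ⁿ and confirms that they form a valid PMF.

```python
from math import comb

n = 5  # number of independent fair coin flips

# PMF of X = number of heads: each of the 2^n equally likely flip sequences
# is counted, and C(n, x) of them contain exactly x heads.
pmf = {x: comb(n, x) / 2**n for x in range(n + 1)}

for x, prob in pmf.items():
    print(x, prob)

# The probabilities are non-negative and sum to one, as any PMF must.
print(sum(pmf.values()))  # 1.0
```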
Definition. The probability density function (PDF) of a continuous random variable X is the function f(·) that describes the relative likelihood of the values x ∈ S, such that probabilities are areas under the curve, i.e., $P(a < X < b) = \int_a^b f(x)\,dx$ for any a < b in S.
Any PDF must define a valid probability distribution, with the properties:
• f(x) ≥ 0 for any x ∈ S, i.e., the density is non-negative
• $\int_{x \in S} f(x)\,dx = 1$, i.e., the total area under the density is one
Example 8. Suppose that we randomly spin the second hand around a clock face n independent times. Define $Z_i$ as the position where the second hand stops spinning on the i-th replication, and define $X = \frac{1}{n}\sum_{i=1}^{n} Z_i$ as the average of the n spin results. Note that the realizations of X are any values x ∈ [0, 12], which is the same domain as $Z_i$ for i = 1, …, n. See Figure 3 for the PDF with n = 1 and n = 5 independent spins. Note that with n = 1 spin, the PDF is simply a flat line between 0 and 12, which implies that P(x < X < x + 1) is equal for any x ∈ {0, 1, …, 11}. This makes sense given that, for any given spin, the second hand could land anywhere on the clock face with equal probability. With n = 5 spins, the PDF has a bell shape, where values around the midpoint of x = 6 have the largest density. With n = 5 spins, we have that $\int_4^8 f(x)\,dx = P(4 < X < 8) \approx 0.80$ (i.e., about 80% of the realizations of X are between 4 and 8).
Figure 3: Probability density function for clock spinning example with n = 1 and n = 5.
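The ≈ 0.80 figure for n = 5 can be reproduced by simulation. Below is a minimal sketch, not from the original notes, that approximates P(4 < X < 8) by repeatedly averaging five uniform spins; the number of replications is an arbitrary choice.

```python
import random

num_reps = 100_000  # number of simulated values of X (illustrative choice)
n = 5               # spins averaged per replication

# Each replication: average n independent uniform spins on [0, 12).
x_values = [sum(random.uniform(0, 12) for _ in range(n)) / n for _ in range(num_reps)]

# Estimate P(4 < X < 8) as the proportion of averages falling in (4, 8).
estimate = sum(4 < x < 8 for x in x_values) / num_reps
print(estimate)  # roughly 0.80
```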
Any CDF must define a valid probability distribution, i.e., 0 ≤ F(x) ≤ 1 for any x ∈ S, which comes from the fact that F(x) = P(X ≤ x) is a probability calculation. Note that a CDF can be defined for both discrete and continuous random variables:
• $F(x) = \sum_{z \in S,\, z \le x} f(z)$ for discrete random variables
• $F(x) = \int_{-\infty}^{x} f(z)\,dz$ for continuous random variables
Furthermore, note that probabilities can be written in terms of the CDF, such as P(a < X ≤ b) = F(b) − F(a), given that the CDF is related to the PMF (or PDF), such as $f(x) = \frac{d}{dx} F(x)$ for continuous random variables.
Figure 4: Cumulative distribution function for the coin and clock examples with n = 5.
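For a discrete random variable, the CDF is simply a running sum of the PMF. The sketch below, an illustration not from the notes, builds F(x) for the n = 5 coin flipping example by accumulating binomial PMF values.

```python
from math import comb

n = 5      # number of fair coin flips
p = 0.5    # probability of heads on each flip

# PMF of X = number of heads: f(x) = C(n, x) * p^x * (1-p)^(n-x)
pmf = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# CDF: F(x) is the sum of f(z) over all z <= x
cdf = {}
running_total = 0.0
for x in range(n + 1):
    running_total += pmf[x]
    cdf[x] = running_total

print(cdf[2])            # P(X <= 2) = 0.5 for n = 5
print(cdf[4] - cdf[1])   # P(1 < X <= 4) = F(4) - F(1)
```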
5 Quantile Function
Definition. The quantile function of a random variable X is the function Q(·) that returns the realization x such that P(X ≤ x) = p for any p ∈ [0, 1]. Note that the quantile function is the inverse of the CDF, such that Q(·) is a function from [0, 1] to S, i.e., Q : [0, 1] → S. More formally, the quantile function can be defined as Q(p) = min{x ∈ S : F(x) ≥ p}, where min denotes the minimum. Thus, for any input probability p ∈ [0, 1], the quantile function Q(p) returns the smallest x ∈ S that satisfies the inequality F(x) ≥ p. For continuous random variables, we have the relationship Q = F⁻¹. There are a handful of quantiles that are quite popular. The quartiles are defined as
• First Quartile: p = 1/4 returns x that cuts off the lowest quarter of the distribution
• Second Quartile (Median): p = 1/2 returns x that cuts the distribution in half
• Third Quartile: p = 3/4 returns x that cuts off the highest quarter of the distribution
Definition. The 100pth percentile of a distribution is the quantile x such that 100p% of the distribution is below x for any p ∈ (0, 1). For example, the 20th percentile is the quantile corresponding to p = 1/5, such that 20% of the distribution is below Q(1/5).
Figure 5: Density function and quartiles for clock spinning example with n = 1 and n = 5.
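The definition Q(p) = min{x ∈ S : F(x) ≥ p} translates directly into code for a discrete distribution. The sketch below, an illustration not from the notes, applies it to the n = 5 coin flipping example to find the quartiles of the number of heads.

```python
from math import comb

n = 5
pmf = {x: comb(n, x) / 2**n for x in range(n + 1)}

def quantile(p, pmf):
    """Q(p): smallest x in the domain whose CDF value F(x) is at least p."""
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= p:
            return x
    return max(pmf)  # p = 1 falls through to the largest realization

# Quartiles of the number of heads in 5 fair coin flips.
print(quantile(0.25, pmf), quantile(0.50, pmf), quantile(0.75, pmf))  # 2 2 3
```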
Definition. The expected value of a random variable X is a weighted average where the weights are defined according to the PMF or PDF. The expected value of X is defined as µ = E(X) where E(·) is the expectation operator, which is defined as $E(X) = \sum_{x \in S} x f(x)$ for discrete random variables or $E(X) = \int_{x \in S} x f(x)\,dx$ for continuous random variables.
To gain some insight into the expectation operator E(·), suppose that we have sampled n independent realizations of some random variable X. Let $x_1, \ldots, x_n$ denote the n independent realizations of X, and define the arithmetic mean as $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$. As the sample size n gets infinitely large, the arithmetic mean converges to the expected value µ, i.e.,
$$\lim_{n \to \infty} \bar{x}_n = \mu$$
which is due to the weak law of large numbers (also known as Bernoulli's theorem).
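The convergence of the arithmetic mean to µ can be seen in a short simulation. The following sketch, an illustration not part of the notes, tracks the running mean of fair coin flips coded as 0/1, which should approach µ = 1/2 as the sample grows.

```python
import random

num_flips = 100_000  # total flips to simulate (arbitrary choice)

running_sum = 0
for i in range(1, num_flips + 1):
    running_sum += random.randint(0, 1)  # one fair coin flip coded as 0 or 1
    if i in (10, 100, 1_000, 10_000, 100_000):
        # Running mean after i flips; it drifts toward the expected value 0.5.
        print(i, running_sum / i)
```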
Expectation Rules
1. E(a) = a for any constant a, i.e., the expected value of a constant is the constant itself.
2. E(a + bX) = a + bE(X) for any constants a and b.
3. $E(X_1 + \cdots + X_p) = E(X_1) + \cdots + E(X_p)$, which reveals that expectations of summations are summations of expectations (ESSE).
4. $E(b_1 X_1 + \cdots + b_p X_p) = b_1 E(X_1) + \cdots + b_p E(X_p)$, which reveals that expectations of weighted summations are summations of weighted expectations (EWSSWE).
Note that rules #3 and #4 are true regardless of whether $X_1, \ldots, X_p$ are independent.
Example 9. For the coin flipping example, note that $X = \sum_{i=1}^{n} Z_i$ where $Z_i$ is the i-th coin flip. Applying the ESSE rule, we have that the expected value of X can be written as $E(X) = \sum_{i=1}^{n} E(Z_i)$. Since the coin is assumed to be fair, the expected value of $Z_i$ has the form $E(Z_i) = \sum_{x=0}^{1} x f(x) = 0(1/2) + 1(1/2) = 1/2$ for any given i ∈ {1, …, n}. Thus, the expected value of X can be written as $E(X) = \sum_{i=1}^{n} (1/2) = n/2$. In other words, we would expect to observe n/2 heads given n independent flips of a fair coin, which makes intuitive sense (about half of the flips should be expected to be heads if the coin is fair).
Example 10. For the clock spinning example, note that $X = \sum_{i=1}^{n} a_i Z_i$ where $Z_i$ is the i-th clock spin and $a_i = 1/n$ for all i = 1, …, n. Applying the EWSSWE rule, we have that the expected value of X can be written as $E(X) = \frac{1}{n} \sum_{i=1}^{n} E(Z_i)$. And note that $E(Z_i) = E(Z_1)$ for all i = 1, …, n because the n spins are independent and identically distributed (iid). Thus, we just need to determine the expected value of a single spin, which has the form $E(Z) = \int_0^{12} z f(z)\,dz$. The density function for a single clock spin is simply a rectangle that gives equal density to each observable value between 0 and 12 (see Figure 3), which implies that f(z) = 1/12 for z ∈ [0, 12] and f(z) = 0 otherwise. This implies that the expected value of a single spin has the form
$$E(Z) = \frac{1}{12}\int_0^{12} z\,dz = \frac{1}{12}\left[\tfrac{1}{2}z^2\right]_{z=0}^{z=12} = \frac{1}{24}(144 - 0) = 6,$$
so the expected value of X has the form $E(X) = \frac{1}{n}\sum_{i=1}^{n} 6 = \frac{1}{n}(6n) = 6$.
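The integrals in this example (and the E(Z²) = 48 used later in Example 12) can be checked with a crude numerical integration. The sketch below is a simple midpoint Riemann-sum illustration under the uniform density f(z) = 1/12 on [0, 12]; it is not from the original notes, and the number of rectangles is an arbitrary choice.

```python
# Approximate E(Z) and E(Z^2) for one clock spin with density f(z) = 1/12 on [0, 12].
num_bins = 100_000        # number of Riemann-sum rectangles (arbitrary)
width = 12 / num_bins     # width of each rectangle

expected_z = 0.0
expected_z2 = 0.0
for i in range(num_bins):
    z = (i + 0.5) * width  # midpoint of the i-th rectangle
    expected_z += z * (1 / 12) * width
    expected_z2 += z**2 * (1 / 12) * width

print(expected_z)   # close to 6
print(expected_z2)  # close to 48
```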
Definition. The variance of a random variable X is defined as $\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2]$, i.e., the expected squared deviation of X from its mean µ = E(X).
To gain some insight into the variance, suppose that we have sampled n independent realizations of some random variable X. Let $x_1, \ldots, x_n$ denote the n independent realizations of X, and define the arithmetic mean of the squared deviations from the average value, i.e., $\tilde{s}_n^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}_n)^2$ where $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$. As the sample size n gets infinitely large, the arithmetic mean of the squared deviations converges to the variance σ², i.e.,
$$\lim_{n \to \infty} \tilde{s}_n^2 = \sigma^2$$
which is due to the weak law of large numbers (also known as Bernoulli's theorem).
Variance Rules
1. Var(a) = 0 for any constant a, i.e., a constant has no variance.
2. Var(a + bX) = b² Var(X) for any constants a and b, i.e., additive constants do not affect the variance.
3. For independent variables, $\mathrm{Var}(X_1 + \cdots + X_p) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_p)$, which reveals that variances of summations are summations of variances (VSSV) when the summed variables are mutually independent.
4. For independent variables, $\mathrm{Var}(b_1 X_1 + \cdots + b_p X_p) = b_1^2 \mathrm{Var}(X_1) + \cdots + b_p^2 \mathrm{Var}(X_p)$, which reveals that variances of weighted summations are summations of weighted variances (VWSSWV) when the summed variables are mutually independent.
5. More generally, $\mathrm{Var}(b_1 X_1 + b_2 X_2) = b_1^2 \mathrm{Var}(X_1) + b_2^2 \mathrm{Var}(X_2) + 2 b_1 b_2 \mathrm{Cov}(X_1, X_2)$, where $\mathrm{Cov}(X_1, X_2) = E[(X_1 - \mu_1)(X_2 - \mu_2)]$ is the covariance between $X_1$ and $X_2$.
Note that rules #3 and #4 are only true if $X_1, \ldots, X_p$ are mutually independent (see the simulation check below).
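Rule #3 (VSSV) is easy to check by simulation with two independent variables. The sketch below, an illustration not from the notes, compares the sample variance of a sum of two independent uniform clock spins with the sum of their individual sample variances.

```python
import random

def sample_variance(values):
    """Average squared deviation from the sample mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

num_reps = 100_000
z1 = [random.uniform(0, 12) for _ in range(num_reps)]  # independent spins
z2 = [random.uniform(0, 12) for _ in range(num_reps)]
total = [a + b for a, b in zip(z1, z2)]

# For independent variables, Var(Z1 + Z2) = Var(Z1) + Var(Z2) = 12 + 12 = 24.
print(sample_variance(total))
print(sample_variance(z1) + sample_variance(z2))
```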
Example 11. For the coin flipping example, remember that $X = \sum_{i=1}^{n} Z_i$ where $Z_i$ is the i-th coin flip. Applying the VSSV rule (which is valid because the $Z_i$ are independent), we have that the variance of X can be written as $\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(Z_i)$. Since the coin is assumed to be fair, $\mathrm{Var}(Z_i) = \sum_{x=0}^{1} (x - 1/2)^2 f(x) = (1/4)(1/2) + (1/4)(1/2) = 1/4$ for any given i ∈ {1, …, n}, which uses the fact that $E(Z_i) = 1/2$. Thus, the variance of X can be written as $\mathrm{Var}(X) = \sum_{i=1}^{n} (1/4) = n/4$.
Example 12. For the clock spinning example, remember that $X = \sum_{i=1}^{n} a_i Z_i$ where $Z_i$ is the i-th clock spin and $a_i = 1/n$ for all i = 1, …, n. Applying the VWSSWV rule, we have that the variance of X can be written as $\mathrm{Var}(X) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(Z_i)$. And note that $\mathrm{Var}(Z_i) = \mathrm{Var}(Z_1)$ for all i = 1, …, n because the n spins are independent and identically distributed (iid). Thus, we just need to determine the variance of a single spin, which has the form $\mathrm{Var}(Z) = E(Z^2) - E(Z)^2$. Remembering that f(z) = 1/12 and E(Z) = 6, we just need to calculate
$$E(Z^2) = \frac{1}{12}\int_0^{12} z^2\,dz = \frac{1}{12}\left[\tfrac{1}{3}z^3\right]_{z=0}^{z=12} = \frac{1}{36}(1728 - 0) = 48.$$
Thus, the variance of a single spin Z has the form Var(Z) = 48 − 36 = 12, which implies that the variance of X is $\mathrm{Var}(X) = \frac{1}{n^2}\sum_{i=1}^{n} 12 = 12/n$.
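The 12/n result can also be checked by simulation: the variance of the average of n spins should shrink by a factor of n. The sketch below, an illustration not from the notes, estimates Var(X) for n = 5, where 12/n = 2.4.

```python
import random

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

n = 5               # spins averaged per replication
num_reps = 100_000  # number of simulated averages (arbitrary)

# Each replication gives one realization of X, the average of n uniform spins.
x_values = [sum(random.uniform(0, 12) for _ in range(n)) / n for _ in range(num_reps)]

print(sample_variance(x_values))  # close to 12 / n = 2.4
```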
Definition. The standard deviation of a random variable X is the square root of the variance of X, i.e., the standard deviation of X is defined as $\sigma = \sqrt{E[(X - \mu)^2]}$.
If X is a random variable with mean µ and variance σ², then the transformed variable
$$Z = \frac{X - \mu}{\sigma}$$
has mean E(Z) = 0 and variance Var(Z) = E(Z²) = 1. To prove this result, we can use the previous expectation and variance rules. First, note that Z = a + bX where $a = -\frac{\mu}{\sigma}$ and $b = \frac{1}{\sigma}$. Applying Expectation Rule #2 gives $E(Z) = a + bE(X) = -\frac{\mu}{\sigma} + \frac{\mu}{\sigma} = 0$, and applying Variance Rule #2 gives $\mathrm{Var}(Z) = b^2 \mathrm{Var}(X) = \frac{\sigma^2}{\sigma^2} = 1$.
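A quick numerical illustration of standardization, not from the notes: draw realizations of a variable with a nonzero mean and non-unit variance, apply the z-score transformation with the sample mean and standard deviation, and check that the standardized values have mean 0 and variance 1.

```python
import random

num_obs = 100_000

# Raw data: uniform spins on [0, 12], so the mean is about 6 and the variance about 12.
x = [random.uniform(0, 12) for _ in range(num_obs)]

mean = sum(x) / num_obs
variance = sum((v - mean) ** 2 for v in x) / num_obs
sd = variance ** 0.5

# z-scores: subtract the mean and divide by the standard deviation.
z = [(v - mean) / sd for v in x]

z_mean = sum(z) / num_obs
z_var = sum((v - z_mean) ** 2 for v in z) / num_obs
print(z_mean, z_var)  # 0 and 1 (up to rounding, since the sample mean and SD were used)
```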
Definition. A standardized variable has mean E(Z) = 0 and variance E(Z²) = 1. Such a variable is typically denoted by Z (instead of X) and may be referred to as a z-score.
8 Moments of a Distribution
Definition. The k-th moment of a random variable X is the expected value of $X^k$, i.e., $\mu_k' = E(X^k)$. The k-th central moment of a random variable X is the expected value of $(X - \mu)^k$, i.e., $\mu_k = E[(X - \mu)^k]$, where µ = E(X) is the expected value of X. The k-th standardized moment of a random variable X is the expected value of $(X - \mu)^k / \sigma^k$, i.e., $\tilde{\mu}_k = E[(X - \mu)^k] / \sigma^k$, where $\sigma = \sqrt{E[(X - \mu)^2]}$ is the standard deviation of X.
Note: The mean µ is the first moment and the variance σ² is the second central moment.
To gain some insight into the moments of a distribution, suppose that we have sampled n independent realizations of some random variable X. Letting $x_1, \ldots, x_n$ denote the n independent realizations of X, we have that
$$\mu_k' = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} x_i^k$$
$$\mu_k = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}_n)^k$$
$$\tilde{\mu}_k = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}_n}{\tilde{s}_n}\right)^k$$
where $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\tilde{s}_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}$. Note that these are all results of the law of large numbers, which states that averages of iid data converge to expectations.
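The limiting formulas above suggest plug-in sample estimates. The sketch below, an illustration not from the notes, computes the third and fourth standardized sample moments, i.e., the sample skewness and kurtosis referenced in Figure 6, for simulated uniform clock spins.

```python
import random

def standardized_moment(values, k):
    """Plug-in estimate of the k-th standardized moment:
    the average of ((x_i - mean) / sd)^k over the sample."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / sd) ** k for v in values) / n

x = [random.uniform(0, 12) for _ in range(100_000)]

print(standardized_moment(x, 3))  # skewness: about 0 for a symmetric distribution
print(standardized_moment(x, 4))  # kurtosis: about 1.8 for a uniform distribution
```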
Figure 6: Density functions for distributions with different values of skewness and kurtosis. The dotted line in each subplot denotes the density of the standard normal distribution.