2.6 Random Variables
For example, if E is a horse race and U is the set of horses, then X may be the amount you win.
It is useful to express random variables in terms of set operations. A random variable is simply a mapping from the sample space U into the real numbers, R. If we say that x = X(e) we mean that x ∈ R is the number that is associated with the outcome e ∈ U. This is illustrated in Figure 2.3. In a sense, the values x ∈ R can be considered as outcomes in a new experiment.
2.6.1 Events
Every interval on R corresponds to a set of outcomes in U. Let I ⊆ R be an interval and let A = {e ∈ U : X(e) ∈ I}. Then A is the event that is associated with I, and P(I) = P(A). Every interval of R is an event whose probability can be calculated. This implies that there is an inverse operation from R to U, which we may write as A = X^{-1}(I). Note that it is perfectly possible that there may be no points in U that map into some selected interval I. In that case X^{-1}(I) = ∅, the empty set, and P(I) = P(∅) = 0. It is
also possible that an interval may be selected such that A contains all of the
points in U. In that case X^{-1}(I) = U, and P(I) = P(U) = 1.
Not only is every interval of R an event, but so is every collection of
intervals. This gives us the tool to construct a probability measure on the
real number line. Later it will be extended to any geometric space.
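To make the inverse mapping concrete, here is a minimal Python sketch in the spirit of the horse race example above; the outcomes, winnings, and probabilities are hypothetical, invented only for illustration.

```python
# Minimal sketch of the inverse mapping X^{-1}(I); the outcomes,
# winnings, and probabilities below are hypothetical.
U = ["horse_A", "horse_B", "horse_C", "horse_D"]
prob = {"horse_A": 0.4, "horse_B": 0.3, "horse_C": 0.2, "horse_D": 0.1}
X = {"horse_A": 10.0, "horse_B": 0.0, "horse_C": -5.0, "horse_D": 0.0}

def inverse_image(lo, hi):
    """Return the event A = {e in U : lo < X(e) <= hi} for I = (lo, hi]."""
    return {e for e in U if lo < X[e] <= hi}

A = inverse_image(-1.0, 11.0)       # event associated with I = (-1, 11]
print(A)                            # {'horse_A', 'horse_B', 'horse_D'} (set order may vary)
print(sum(prob[e] for e in A))      # P(I) = P(A) = 0.8
```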
Consider the set of points R_X = {x : X(e) = x for some e ∈ U}. This set of points is the range of the random variable. It is the entire set of points that can be assumed by the random variable X.
The probability distribution function of the random variable X is defined as

F_X(x) = P(X \le x)    (2.18)

For any a ≤ b we then have

F_X(b) = F_X(a) + P(a < X \le b)    (2.19)
F_X(x) is a nice function in the sense that it is either continuous or has step discontinuities. To see this, let a = b − ε and rewrite (2.19) as F_X(b) = F_X(b − ε) + P(b − ε < X ≤ b). The last term is just the probability that X falls in the interval (b − ε, b], which tends to P(X = b) as ε → 0. Hence, F_X(x) is either continuous at x = b or has a step of size P(X = b). This is shown in Figure 2.4. Distribution functions come in three types: continuous, discrete, and mixed. We will look at examples of each.
Discrete Distribution
A discrete random variable can assume only discrete values. The set of values must be either finite or countably infinite. The range of a discrete random variable consists of isolated points.
Let the values that can be assumed by X be x_k, k = 0, 1, 2, . . . Then the distribution function will have the staircase appearance shown in Figure 2.5. The steps occur at each x_k and have size P(X = x_k).
The discrete values are the values that can be observed on a trial of the
experiment.
As an example, let E be the experiment of tossing three fair coins and let X be the number of heads that appear. The possible values for the random variable are x_0 = 0, x_1 = 1, x_2 = 2, x_3 = 3 and the associated probabilities are P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8. These probabilities
can be computed by enumeration of the possibilities or use of the binomial
formula that is developed in Exercise 5. The distribution function is shown
in Figure 2.6.
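The enumeration mentioned above is easy to carry out by machine. A short Python check, not part of the text's own examples:

```python
# Enumerate the eight equally likely coin patterns and count heads,
# verifying the probabilities quoted above.
from itertools import product
from collections import Counter

counts = Counter(sum(toss) for toss in product((0, 1), repeat=3))  # 1 = heads
for k in sorted(counts):
    print(f"P(X = {k}) = {counts[k]}/8")   # 1/8, 3/8, 3/8, 1/8
```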
Continuous Distribution
Suppose that F_X(x) is continuous for all x. Then \lim_{\epsilon \to 0} [F_X(x) − F_X(x − \epsilon)] = 0,
so that P (X = x) = 0 for all x. That is, the probability that the random
variable equals any chosen value of x is zero. This is like throwing a dart at
a dartboard and asking the probability of exactly hitting any mathematical point. The total probability of hitting somewhere on the board is one, assuming it is early in the evening, but the probability of hitting any particular point is zero.
No isolated points exist in the range of a continuous random variable. If X can take on a value x then it can also take on other points in a neighborhood of size ε around x, no matter how small ε is. The probability associated with any individual point is zero.
Figure 2.6: The probability distribution function for the number of heads in
an experiment of tossing three fair coins.
The probability density function for a continuous random variable is defined as the derivative of the distribution function,

f_X(x) = \frac{dF_X(x)}{dx}    (2.23)
This function provides the probability that X falls in a small interval near x: P(x − ε < X ≤ x) ≈ f_X(x)ε for small ε. This is much like finding a mass by multiplying a density by a volume. The continuous distribution and its associated probability density function are shown in Figure 2.7. The relationship in (2.23) can
be inverted, so that for any x
F_X(x) = \int_{-\infty}^{x} f_X(u)\,du    (2.24)
Figure 2.7: (a) The distribution function for a continuous random variable
and (b) its probability density function. Note that the probability density
function is highest where the slope of the distribution function is greatest.
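The pair of relationships (2.23) and (2.24) is easy to verify numerically. A minimal Python sketch, assuming the pdf f_X(x) = e^{-x} for x ≥ 0 (chosen only for illustration):

```python
# Numerical sketch of (2.23) and (2.24) for the assumed density e^{-x}.
import numpy as np

x = np.linspace(0.0, 10.0, 10001)
f = np.exp(-x)                              # assumed density f_X(x)
F = np.cumsum(f) * (x[1] - x[0])            # F_X(x) via (2.24), rectangle rule
f_back = np.gradient(F, x)                  # recover f_X(x) via (2.23)

print(np.max(np.abs(F - (1.0 - np.exp(-x)))))   # small: matches the exact CDF
print(np.max(np.abs(f_back - f)[1:-1]))         # small discretization error
```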
Mixed Distribution
The range of a mixed distribution contains isolated points and points in a
continuum. The distribution function is a smooth curve except at one or
more points where there are finite steps. The derivative of the distribution
function does not exist at the step points. However, it is common to write the density as the sum of the derivative of F_X(x), where the derivative is defined, and delta functions where the steps occur. The weight of each delta function is equal to the height of the step at that point. Let x_k, k = 0, 1, 2, . . . be the locations of finite jumps in F_X(x) and let c(x) = dF_X/dx at all other points. Then
f_X(x) = c(x) + \sum_{k} P(X = x_k)\,\delta(x - x_k)    (2.26)
The distribution and pdf for a mixed random variable are shown in Figure 2.9. In this case there are three finite steps in F_X(x) and three impulses in f_X(x). The size of each step and the weight of the corresponding impulse are equal.
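A mixed distribution arises naturally when a continuous variable is hard-limited. A hypothetical Python sketch, not drawn from the text's figures:

```python
# Clipping a standard normal at +/-1 puts probability mass
# P(X = -1) = P(X = +1) = Phi(-1) ~ 0.1587 at the clip points,
# producing steps in F_X while it stays continuous in between.
import numpy as np

rng = np.random.default_rng(0)
x = np.clip(rng.standard_normal(1_000_000), -1.0, 1.0)

print((x == -1.0).mean())   # ~0.1587, the step height at x = -1
print((x == 1.0).mean())    # ~0.1587, the step height at x = +1
```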
Example 2.6.3 Let E be the experiment of tossing a fair die. Let A be the
event that an even face appears and B be the event that the number on the
face is four or more. Let I(S) be the indicator function which takes on the
value 1 if S is true and the value 0 if S is false. Then X = [I(A), I(B)] maps each outcome onto a point in the plane, as illustrated in Figure 2.10.
Figure 2.10: Mapping of the outcomes of the die tossing experiment onto
points in a plane by a particular pair of random variables.
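The mapping of Example 2.6.3 can be enumerated directly. A short Python check:

```python
# Each face of the die maps to a point (I(A), I(B)) in the plane,
# where A = "even face" and B = "face is four or more".
from collections import Counter

points = {e: (int(e % 2 == 0), int(e >= 4)) for e in range(1, 7)}
print(points)   # 1,3 -> (0,0); 2 -> (1,0); 5 -> (0,1); 4,6 -> (1,1)

for pt, n in sorted(Counter(points.values()).items()):
    print(f"P(X = {pt}) = {n}/6")
```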
The last two relationships form a bridge between the joint distribution
functions and the single-variate distribution functions. They are readily established by using (2.27).
The probability that an experiment produces a pair (X_1, X_2) that falls in a rectangular region with lower left corner (a, c) and upper right corner (b, d) is

P(a < X_1 \le b, c < X_2 \le d) = F_{X_1X_2}(b, d) - F_{X_1X_2}(a, d) - F_{X_1X_2}(b, c) + F_{X_1X_2}(a, c)    (2.28)

The joint probability density function is defined as the mixed partial derivative of the joint distribution function,

f_{X_1X_2}(x_1, x_2) = \frac{\partial^2 F_{X_1X_2}(x_1, x_2)}{\partial x_1\,\partial x_2}    (2.29)
It follows from the definition of the distribution function that the density function is non-negative. You are asked to carry out the details of showing this in Exercise 10.
The probability density function has a number of properties that are useful. Some of them are listed below. The notation U and V is used for the random variables and u and v for the values to reduce the number of subscripts. You are asked to compute these functions for a specific distribution in the exercises.
Let U and V be random variables and let A and B be sets of real numbers.
Commonly these will be intervals, but that is not required. Then consider
the event that an experiment with a probability density function fU,V (u, v)
has an outcome such that U ∈ A and V ∈ B. This can be expressed as²
P[U \in A, V \in B] = \int_{\xi \in A} \int_{\eta \in B} f_{U,V}(\xi, \eta)\,d\xi\,d\eta    (2.37)
² We will use the notation P[U ∈ A, V ∈ B] rather than P[(U ∈ A) ∩ (V ∈ B)] because it is more common in the literature and less cumbersome to write. The events in joint probability expressions are customarily separated by a comma when the operation is “and”.
We can now make use of the definition (2.12) to write the conditional probability computed from the joint density function.
P[U \in A \mid V \in B] = \frac{P[U \in A, V \in B]}{P[V \in B]}    (2.39)

= \frac{\int_{\xi \in A} \int_{\eta \in B} f_{U,V}(\xi, \eta)\,d\xi\,d\eta}{\int_{\xi=-\infty}^{\infty} \int_{\eta \in B} f_{U,V}(\xi, \eta)\,d\xi\,d\eta}
whenever P [V ∈ B] > 0.
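Both (2.37) and (2.39) lend themselves to numerical evaluation on a grid. A sketch, assuming the standard bivariate normal density that appears below in Example 2.6.4; the intervals A and B are arbitrary choices for illustration:

```python
# Grid evaluation of P[U in A, V in B] (2.37) and P[U in A | V in B] (2.39).
import numpy as np

u = np.linspace(-6.0, 6.0, 1201)
v = np.linspace(-6.0, 6.0, 1201)
du, dv = u[1] - u[0], v[1] - v[0]
UU, VV = np.meshgrid(u, v, indexing="ij")
f = np.exp(-(UU**2 + VV**2) / 2.0) / (2.0 * np.pi)   # f_{U,V}(u, v)

in_A = (u > 0.0) & (u <= 1.0)      # A = (0, 1]
in_B = (v > -0.5) & (v <= 0.5)     # B = (-0.5, 0.5]

P_joint = f[np.ix_(in_A, in_B)].sum() * du * dv      # eq. (2.37)
P_B = f[:, in_B].sum() * du * dv                     # P[V in B]
print(P_joint / P_B)   # eq. (2.39); ~0.3413 here, since U and V are independent
```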
The conditional probability distribution function is defined as

F_U(u \mid V \in B) = P[U \le u \mid V \in B]    (2.40)

The term on the right is computed from (2.39) with the region A = (−∞, u].
The conditional probability distribution function has all of the properties of an ordinary one-dimensional probability distribution function. That is, it is a nondecreasing function with F_U(−∞ | V ∈ B) = 0 and F_U(∞ | V ∈ B) = 1. You should convince yourself of these properties by examining the definition. A conditional probability density function can be defined simply as the derivative, where it exists, of the conditional probability distribution function.
f_U(u \mid V \in B) = \frac{dF_U(u \mid V \in B)}{du}    (2.41)
wherever the derivative exists.
We have noted that two events, say A and B, are statistically independent when P(A ∩ B) = P(A)P(B). If we choose the events to be U ∈ A and V ∈ B, where A and B are sets of real numbers, then we have the basis for a definition of independent random variables. The random variables U and V are statistically independent if

P[U \in A, V \in B] = P[U \in A]\,P[V \in B]    (2.42)

for every choice of the sets A and B. In particular, choosing A = (−∞, u] and B = (−∞, v] gives

P[U \le u, V \le v] = P[U \le u]\,P[V \le v]    (2.43)

which is the same as

F_{U,V}(u, v) = F_U(u)\,F_V(v)    (2.44)

If the distribution functions are differentiable for almost all u and v then we also have the result

f_{U,V}(u, v) = f_U(u)\,f_V(v)    (2.45)
Example 2.6.4 A normal probability density function for two random vari-
ables U and V is given by
f_{U,V}(u, v) = \frac{1}{2\pi}\, e^{-(u^2 + v^2)/2}    (2.46)

Now, f_U(u) = \int_{-\infty}^{\infty} f_{U,V}(u, v)\,dv = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, which is easily established by using the fact that \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-v^2/2}\,dv = 1. By a similar integration, we also find that f_V(v) = \frac{1}{\sqrt{2\pi}}\, e^{-v^2/2}. Therefore f_{U,V}(u, v) = f_U(u) f_V(v) and the random variables are statistically independent.
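The factorization (2.42) is also easy to check by simulation. A short Monte Carlo sketch for the density (2.46); the sets A and B are arbitrary choices:

```python
# For independent U, V the joint probability factors as in (2.42).
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(1_000_000)    # samples of U
v = rng.standard_normal(1_000_000)    # samples of V, drawn independently

in_A = (u > 0.0) & (u <= 1.0)         # A = (0, 1]
in_B = v > 1.0                        # B = (1, infinity)

print((in_A & in_B).mean())           # estimate of P[U in A, V in B]
print(in_A.mean() * in_B.mean())      # P[U in A] P[V in B]; nearly equal
```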
This function is plotted in Figure 2.11. Notice that its shape is substantially changed from that of the rectangular distribution for f_U(u) given by (2.6.5).
Sums of Random Variables

Let U and V be random variables, and let W = U + V. This is a situation that arises when we observe a signal in noise, for example. We need a means to compute the probability functions associated with W given that we know the distribution function F_{U,V}(u, v) and the pdf f_{U,V}(u, v). The definition of the distribution function can be used to set this up. Since F_W(w) = P[W ≤ w] and W = U + V, we have

F_W(w) = P[U + V \le w]    (2.53)
This can be calculated by integrating f_{U,V}(u, v) over the region of the plane for which u + v ≤ w. The expression for that calculation is

F_W(w) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{w-u} f_{U,V}(u, v)\,dv \right] du    (2.54)
Differentiating with respect to w gives the probability density function³

f_W(w) = \int_{-\infty}^{\infty} f_{U,V}(u, w - u)\,du    (2.55)

The computation of the integral in the last line is clear and straightforward, even if the details can be technically complicated. If necessary, it lends itself to numerical evaluation by computer.
A case that is particularly common and important arises when U and
V are independent random variables. In that case we can apply (2.45) and
write (2.55) as

f_W(w) = \int_{-\infty}^{\infty} f_U(u) f_V(w - u)\,du    (2.56)
If you look closely at the expression you will quickly recognize an old friend,
namely the convolution integral! What a surprise to discover this old friend
in a new setting!
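The convolution in (2.56) is straightforward to carry out numerically. A sketch, assuming independent U, V ~ Uniform(0, 1), an example chosen because the sum then has the familiar triangular density:

```python
# Discrete convolution approximating (2.56) for two uniform densities.
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
fU = np.ones_like(x)                   # f_U on [0, 1)
fV = np.ones_like(x)                   # f_V on [0, 1)

fW = np.convolve(fU, fV) * dx          # eq. (2.56) on a grid; support [0, 2)
w = np.arange(fW.size) * dx
triangle = np.where(w <= 1.0, w, 2.0 - w)   # exact triangular density

print(np.max(np.abs(fW - triangle)))   # small discretization error (~dx)
```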
³ The differentiation makes use of Leibnitz' rule for the differentiation of an integral. Assuming that a(t), b(t) and r(s, t) are all differentiable with respect to t,

\frac{d}{dt} \int_{a(t)}^{b(t)} r(s, t)\,ds = r[b(t), t]\,b'(t) - r[a(t), t]\,a'(t) + \int_{a(t)}^{b(t)} \frac{\partial r(s, t)}{\partial t}\,ds