7 Random Variables and Distribution Functions
7.1 Introduction
From the universe of possible information, we ask a question. To address this question, we might collect quantitative data and organize it, for example, using the empirical cumulative distribution function. With this information, we are able to compute sample means, standard deviations, medians and so on.
Similarly, even a fairly simple probability model can have an enormous number of outcomes. For example, flip a coin 332 times. Then the number of outcomes is more than a googol (10^100) – a number at least 100 quintillion times the number of elementary particles in the known universe. We may not be interested in an analysis that considers separately every possible outcome but rather some simpler concept like the number of heads or the longest run of tails. To focus our attention on the issues of interest, we take a given outcome and compute a number. This function is called a random variable.

    statistics                                probability
    universe of information                   sample space Ω and probability P
    ask a question and collect data           define a random variable X
    organize into the empirical               organize into the
      cumulative distribution function          cumulative distribution function
    compute sample means and variances        compute distributional means and variances

Table I: Corresponding notions between statistics and probability. Examining probability models and random variables will lead to strategies for the collection of data and inference from these data.
Definition 7.1. A random variable is a real-valued function on the probability space,

    X: Ω → R.
Generally speaking, we shall use capital letters near the end of the alphabet, e.g., X, Y, Z for random variables.
The range S of a random variable is sometimes called the state space.
Exercise 7.2. Roll a die twice and consider the sample space Ω = {(i, j); i, j = 1, 2, 3, 4, 5, 6} and give some random variables on Ω.
Exercise 7.3. Flip a coin 10 times and consider the sample space Ω, the set of 10-tuples of heads and tails, and give some random variables on Ω.
Introduction to the Science of Statistics Random Variables and Distribution Functions
    ω ↦ X(ω) ↦ f(X(ω))
and so on. The last of these, rounding down X to the nearest integer, is called the floor function.
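In R, for example, the floor function is available directly as floor (a small illustration of our own, not from the text):

```r
floor(2.718)    # rounds down to 2
floor(-2.718)   # rounds down to -3, the greatest integer below -2.718
```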
Exercise 7.4. How would we use the floor function to round down a number x to n decimal places?
We write

    {ω ∈ Ω; X(ω) ∈ B}    (7.1)

to indicate those outcomes ω which have X(ω), the value of the random variable, in the subset B. We shall often abbreviate (7.1) to the shorter statement {X ∈ B}. Thus, for the example above, we may write the events
    {X is an odd number}, {X is greater than 1} = {X > 1}, {X is between 2 and 7} = {2 < X < 7}.
Definition 7.6. The (cumulative) distribution function of a random variable X is

    F_X(x) = P{ω ∈ Ω; X(ω) ≤ x}.

Recall that with quantitative observations, we called the analogous notion the empirical cumulative distribution function. Using the abbreviated notation above, we shall typically write the less explicit expression

    F_X(x) = P{X ≤ x}.
Exercise 7.7. For a random variable X and subset B of the sample space S, define

    P_X(B) = P{X ∈ B}.

Show that P_X satisfies the three axioms of a probability.
Choose a < b, then the event {X ≤ a} ⊂ {X ≤ b}. Their set theoretic difference

    {X ≤ b} \ {X ≤ a} = {a < X ≤ b}.

In words, the event that X is less than or equal to b but not less than or equal to a is the event that X is greater than a and less than or equal to b. Consequently, by the difference rule for probabilities,

    P{a < X ≤ b} = P{X ≤ b} − P{X ≤ a} = F_X(b) − F_X(a).
Thus, we can compute the probability that a random variable takes values in an interval by subtracting the distribution function evaluated at the endpoints of the interval. Care is needed on the issue of the inclusion or exclusion of the endpoints of the interval.
Example 7.8. To give the cumulative distribution function for X, the sum of the values for two rolls of a die, we start
with the table
x 2 3 4 5 6 7 8 9 10 11 12
P {X = x} 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
Figure 7.1: Graph of FX , the cumulative distribution function for the sum of the values for two rolls of a die.
If we look at the graph of this cumulative distribution function, we see that it is constant in between the possible
values for X and that the jump size at x is equal to P {X = x}. In this example, P {X = 5} = 4/36, the size of the
jump at x = 5. In addition,
    F_X(5) − F_X(2) = P{2 < X ≤ 5} = P{X = 3} + P{X = 4} + P{X = 5} = Σ_{2 < x ≤ 5} P{X = x}
                    = 2/36 + 3/36 + 4/36 = 9/36.
We shall call a random variable discrete if it has a finite or countably infinite state space. Thus, we have in general
that:
    P{a < X ≤ b} = Σ_{a < x ≤ b} P{X = x}.
Exercise 7.9. Let X be the number of heads on three independent flips of a biased coin that turns up heads with probability p. Give the cumulative distribution function F_X for X.
Exercise 7.10. Let X be the number of spades in a collection of three cards. Give the cumulative distribution function
for X. Use R to plot this function.
Exercise 7.11. Find the cumulative distribution function of Y = X³ in terms of F_X, the distribution function for X.
A cumulative distribution function F_X has the following properties:

1. F_X is nondecreasing.
2. lim_{x→−∞} F_X(x) = 0 and lim_{x→∞} F_X(x) = 1.
3. F_X is right continuous: for every x_0,

       lim_{x→x_0+} F_X(x) = F_X(x_0).

Exercise 7.13. Prove the statement concerning the right continuity of the distribution function from the continuity property of a probability.

Definition 7.14. A continuous random variable has a cumulative distribution function F_X that is differentiable.

Example 7.15. Consider a dartboard having unit radius. Assume that the dart lands randomly uniformly on the dartboard, and let X be the distance of the dart from the center. For 0 ≤ x ≤ 1, the event {X ≤ x} is the disk of radius x, so

    F_X(x) = P{X ≤ x} = πx²/(π·1²) = x²,

and thus

    F_X(x) = 0    if x < 0,
             x²   if 0 ≤ x < 1,
             1    if 1 ≤ x.
The first line states that X cannot be negative. The third states that X is at most 1, and the middle line describes how X distributes its values between 0 and 1. For example,

    F_X(1/2) = 1/4
indicates that with probability 1/4, the dart will land within 1/2 unit of the center of the dartboard.
Exercise 7.16. Find the probability that the dart lands between 1/3 unit and 2/3 unit from the center.
Exercise 7.17. Let the reward Y for throwing the dart be the inverse 1/X of the distance from the center. Find the cumulative distribution function for Y.
Exercise 7.18. An exponential random variable X has cumulative distribution function

    F_X(x) = P{X ≤ x} = 0               if x ≤ 0,
                        1 − exp(−λx)    if x > 0,    (7.3)

for some λ > 0. Show that F_X has the properties of a distribution function.
Its value at x can be computed in R using the command pexp(x,0.1) for λ = 1/10 and drawn using
> curve(pexp(x,0.1),0,80)
Figure 7.3: Cumulative distribution function for an exponential random variable with λ = 1/10.
Exercise 7.19. The time until the next bus arrives is an exponential random variable with λ = 1/10 minutes. A person waits for a bus at the bus stop until the bus arrives, giving up when the wait reaches 20 minutes. Give the cumulative distribution function for T, the time that the person remains at the bus station, and sketch a graph.
Even though the cumulative distribution function is defined for every random variable, we will often use other characterizations, namely, the mass function for discrete random variables and the density function for continuous random variables. Indeed, we typically will introduce a random variable via one of these two functions. In the next two sections we introduce these two concepts and develop some of their properties.
Definition 7.20. The (probability) mass function of a discrete random variable X is

    f_X(x) = P{X = x}.

The mass function has two basic properties:

1. f_X(x) ≥ 0 for all x in the state space S.
2. Σ_{x ∈ S} f_X(x) = 1.

The first property is based on the fact that probabilities are non-negative. The second follows from the observation that the collection C_x = {ω; X(ω) = x} for all x ∈ S, the state space for X, forms a partition of the probability space Ω. In Example 7.8, we saw the mass function for the random variable X that is the sum of the values on two independent rolls of a fair die.
Example 7.21. Let’s make tosses of a biased coin whose outcomes are independent. We shall continue tossing until
we obtain a toss of heads. Let X denote the random variable that gives the number of tails before the first head and p
denote the probability of heads in any given toss. Then
    f_X(0) = P{X = 0} = P{H} = p
    f_X(1) = P{X = 1} = P{TH} = (1 − p)p
    f_X(2) = P{X = 2} = P{TTH} = (1 − p)²p
        ⋮
    f_X(x) = P{X = x} = P{T···TH} = (1 − p)^x p

So, the probability mass function is f_X(x) = (1 − p)^x p. Because the terms in this mass function form a geometric sequence, X is called a geometric random variable. Recall that a geometric sequence c, cr, cr², . . . , cr^n has sum

    s_n = c + cr + cr² + ··· + cr^n = c(1 − r^{n+1})/(1 − r)

for r ≠ 1. If |r| < 1, then lim_{n→∞} r^n = 0 and thus s_n has a limit as n → ∞. In this case, the infinite sum is the limit

    c + cr + cr² + ··· + cr^n + ··· = lim_{n→∞} s_n = c/(1 − r).
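As a quick numerical check that the geometric mass function sums to 1, we can truncate the infinite series at a large n (an illustration of our own):

```r
p <- 1/3
x <- 0:200
sum((1 - p)^x * p)   # partial sum of the geometric series; essentially 1
```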
Exercise 7.22. Establish the formula above for sn .
The mass function above forms a geometric sequence with the ratio r = 1 − p. Consequently, for positive integers a and b,

    P{a < X ≤ b} = Σ_{x=a+1}^{b} (1 − p)^x p = (1 − p)^{a+1}p + ··· + (1 − p)^b p

                 = [(1 − p)^{a+1}p − (1 − p)^{b+1}p] / [1 − (1 − p)] = (1 − p)^{a+1} − (1 − p)^{b+1}.

We can take a = 0 and add F_X(0) = f_X(0) = p to find the distribution function for a geometric random variable:

    F_X(b) = P{X ≤ b} = P{X = 0} + P{0 < X ≤ b} = p + (1 − p) − (1 − p)^{b+1} = 1 − (1 − p)^{b+1}.
Exercise 7.23. Give a second way to find the distribution function above by explaining why P{X > b} = (1 − p)^{b+1}.
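R's pgeom also counts tails before the first head, exactly our X, so it can be used to check the formula F_X(b) = 1 − (1 − p)^{b+1} numerically (a quick sketch of our own):

```r
p <- 1/3
b <- 4
pgeom(b, p)          # P{X <= 4} computed by R
1 - (1 - p)^(b + 1)  # the formula derived above; the two values agree
```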
The mass function and the cumulative distribution function for the geometric random variable with parameter
p = 1/3 can be found in R by writing
> x<-c(0:10)
> f<-dgeom(x,1/3)
> F<-pgeom(x,1/3)
The initial d indicates density and p indicates the probability from the distribution function.
> data.frame(x,f,F)
x f F
1 0 0.333333333 0.3333333
2 1 0.222222222 0.5555556
3 2 0.148148148 0.7037037
4 3 0.098765432 0.8024691
5 4 0.065843621 0.8683128
6 5 0.043895748 0.9122085
7 6 0.029263832 0.9414723
8 7 0.019509221 0.9609816
9 8 0.013006147 0.9739877
10 9 0.008670765 0.9826585
11 10 0.005780510 0.9884390
Note that the difference in values in the distribution function, F_X(x) − F_X(x − 1), giving the height of the jump in F_X at x, is equal to the value of the mass function. For example,

    F_X(3) − F_X(2) = 0.8024691 − 0.7037037 = 0.0987654 = f_X(3).

Exercise 7.24. Check that the jumps in the cumulative distribution function for the geometric random variable above are equal to the values of the mass function.
Exercise 7.25. For the geometric random variable above, find P{X ≤ 3}, P{2 < X ≤ 5}, and P{X > 4}.
We can simulate 100 geometric random variables with parameter p = 1/3 using the R command rgeom(100,1/3).
(See Figure 7.4.)
Figure 7.4: Histogram of 100 and 10,000 simulated geometric random variables with p = 1/3. Note that the histogram looks much more like a
geometric series for 10,000 simulations. We shall see later how this relates to the law of large numbers.
Definition 7.26. For X a random variable whose distribution function F_X has a derivative, the function f_X satisfying

    F_X(x) = ∫_{−∞}^{x} f_X(t) dt

is called the probability density function and X is called a continuous random variable.
By the fundamental theorem of calculus, the density function is the derivative of the distribution function.
    f_X(x) = lim_{Δx→0} [F_X(x + Δx) − F_X(x)] / Δx = F′_X(x).
In other words,
    F_X(x + Δx) − F_X(x) ≈ f_X(x)Δx.
We can compute probabilities by evaluating definite integrals
    P{a < X ≤ b} = F_X(b) − F_X(a) = ∫_a^b f_X(t) dt.
For the dartboard example, where F_X(x) = x² on [0, 1], and for small Δx,

    P{x < X ≤ x + Δx} = F_X(x + Δx) − F_X(x) ≈ f_X(x)Δx = 2xΔx

Figure 7.5: The probability P{a < X ≤ b} is the area under the density function, above the x axis, between x = a and x = b.
and X has density

    f_X(x) = 0     if x < 0,
             2x    if 0 ≤ x ≤ 1,
             0     if x > 1.
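We can check numerically that this density integrates to 1, as a probability density must (a quick sketch using R's integrate):

```r
total <- integrate(function(x) 2*x, lower = 0, upper = 1)  # area under the density
total$value   # equals 1
```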
Exercise 7.27. Let fX be the density for a random variable X and pick a number x0 . Explain why P {X = x0 } = 0.
Example 7.28. For the exponential distribution function (7.3), we have the density function
    f_X(x) = 0           if x ≤ 0,
             λe^{−λx}    if x > 0.
Example 7.29. Density functions do not need to be bounded, for example, if we take
    f_X(x) = 0       if x ≤ 0,
             c/√x    if 0 < x < 1,
             0       if 1 ≤ x.
The total area under the density must be 1. Because

    ∫_0^1 c/√t dt = 2c√t |_0^1 = 2c,

we must have 2c = 1. So c = 1/2.
For 0 ≤ a < b ≤ 1,

    P{a < X ≤ b} = ∫_a^b 1/(2√t) dt = √t |_a^b = √b − √a.
Exercise 7.30. Give the cumulative distribution function for the random variable in the previous example.
Exercise 7.31. Let X be a continuous random variable with density fX , then the random variable Y = aX + b has
density

    f_Y(y) = (1/|a|) f_X((y − b)/a).
(Hint: Begin with the definition of the cumulative distribution function FY for Y . Consider the cases a > 0 and a < 0
separately.)
For continuous random variables, we consider B_1 = (x_1, x_1 + Δx_1] and B_2 = (x_2, x_2 + Δx_2] and ask that, for some function f_{X_1,X_2}, the joint probability density function,

    P{X_1 ∈ B_1, X_2 ∈ B_2} ≈ f_{X_1,X_2}(x_1, x_2)Δx_1Δx_2.

Example 7.32. Generalize the notion of mass and density functions to more than two random variables.

Random variables X_1 and X_2 are independent if, for every choice of subsets B_1 and B_2,

    P{X_1 ∈ B_1, X_2 ∈ B_2} = P{X_1 ∈ B_1}P{X_2 ∈ B_2}.

In words, the probability that the two events {X_1 ∈ B_1} and {X_2 ∈ B_2} happen simultaneously is equal to the product of the probabilities that each of them happen individually.

For independent discrete random variables, we have that

    f_{X_1,X_2}(x_1, x_2) = P{X_1 = x_1, X_2 = x_2} = P{X_1 = x_1}P{X_2 = x_2} = f_{X_1}(x_1)f_{X_2}(x_2).

In this case, we say that the joint probability mass function is the product of the marginal mass functions.
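For example, for two independent rolls of a fair die, every pair of values has joint mass 1/36; in R (an illustration of our own):

```r
f1 <- rep(1/6, 6)        # marginal mass function for a single roll
joint <- outer(f1, f1)   # joint mass function for two independent rolls
joint[2, 5]              # P{X1 = 2, X2 = 5} = 1/36
```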
Thus, for independent continuous random variables, the joint probability density function is the product of the marginal density functions,

    f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1)f_{X_2}(x_2).
To simulate discrete random variables in R, we can use the sample command with the prob option. For example, take the values x = 1, 2, 3, 4 with mass function f. First check that f sums to 1.
> x<-c(1,2,3,4)
> f<-c(0.1,0.2,0.3,0.4)
> sum(f)
[1] 1
> data<-sample(x,50,replace=TRUE,prob=f)
> data
[1] 1 4 4 4 4 4 3 3 4 3 3 2 3 3 3 4 4 3 3 2 4 1 3 3 4 2 3 3 3 1 2 4 3 2 3 4 4 4 4 2 4 1
[43] 2 3 4 4 1 4 3 4
Notice that 1 is the least represented value and 4 is the most represented. If the command prob=f is omitted, then
sample will choose uniformly from the values in the vector x. Let’s check our simulation against the mass function
that generated the data. (Notice the double equal sign, ==.) First, recount the observations that take on each possible
value for x. We can make a table.
> table(data)
data
1 2 3 4
5 7 18 20
> counts<-rep(0,max(x)-min(x)+1)
> for (i in min(x):max(x)){counts[i]<-length(data[data==i])}
> simprob<-counts/(sum(counts))
> data.frame(x,f,simprob)
x f simprob
1 1 0.1 0.10
2 2 0.2 0.14
3 3 0.3 0.36
4 4 0.4 0.40
Exercise 7.35. Simulate the sums on each of 20 rolls of a pair of dice. Repeat this for 1000 rolls and compare the
simulation with the appropriate mass function.
Figure 7.6: Illustrating the probability transform. First simulate uniform random variables u_1, u_2, . . . , u_n on the interval [0, 1]. About 10% of the random numbers should be in the interval [0.3, 0.4]. This corresponds to the 10% of the simulations on the interval [0.28, 0.38] for a random variable with distribution function F_X shown. Similarly, about 10% of the random numbers should be in the interval [0.7, 0.8], which corresponds to the 10% of the simulations on the interval [0.96, 1.51] for a random variable with distribution function F_X. These values on the x-axis can be obtained by taking the inverse function of F_X, i.e., x_i = F_X^{−1}(u_i).
Given a random variable X with distribution function F_X, consider the random variable

    U = F_X(X).

Note that F_X has range from 0 to 1. It cannot take values below 0 or above 1. Thus, U takes on values between 0 and 1, and so

    F_U(u) = 0 for u < 0  and  F_U(u) = 1 for u ≥ 1.

For values of u between 0 and 1, note that for F_X increasing and continuous, {F_X(X) ≤ u} = {X ≤ F_X^{−1}(u)}, and therefore

    F_U(u) = P{F_X(X) ≤ u} = P{X ≤ F_X^{−1}(u)} = F_X(F_X^{−1}(u)) = u.

In other words, U is a uniform random variable on the interval [0, 1].
If we can simulate U, we can simulate a random variable with distribution F_X via the quantile function

    X = F_X^{−1}(U).    (7.4)
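For example, for the exponential distribution function (7.3), solving u = 1 − exp(−λx) for x gives the quantile function F_X^{−1}(u) = −ln(1 − u)/λ. A sketch of the simulation (rate λ = 1/10 as above):

```r
lambda <- 1/10
u <- runif(10000)          # simulate uniform random variables on [0, 1]
x <- -log(1 - u)/lambda    # apply the quantile function to each
mean(x <= 5)               # close to pexp(5, lambda), approximately 0.393
```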
Example 7.37. For the dart board, for x between 0 and 1, the distribution function is

    u = F_X(x) = x²  and thus the quantile function is  x = F_X^{−1}(u) = √u.

We can simulate independent observations of the distance from the center by simulating independent uniform random variables U_1, U_2, . . . , U_n and applying the quantile function,

    X_i = √(U_i).

Figure 7.7: The distribution function (red) and the empirical cumulative distribution function (black) based on 100 simulations of the dart board distribution. R commands given below.
> u<-runif(100)
> x<-sqrt(u)
> xd<-seq(0,1,0.01)
> plot(sort(x),1:length(x)/length(x),
type="s",xlim=c(0,1),ylim=c(0,1), xlab="x",ylab="probability")
> par(new=TRUE)
> plot(xd,xd^2,type="l",xlim=c(0,1),ylim=c(0,1),xlab="",ylab="",col="red")
Answers to selected exercises

7.7. Let’s check the three axioms. Each verification is based on the corresponding axiom for the probability P.
1. For any subset B, P_X(B) = P{X ∈ B} ≥ 0.
2. For the sample space S, P_X(S) = P{X ∈ S} = P(Ω) = 1.
3. For mutually exclusive subsets B_i, i = 1, 2, . . ., we have by the exercise above the mutually exclusive events {X ∈ B_i}, i = 1, 2, . . .. Thus,

    P_X(∪_{i=1}^∞ B_i) = P{X ∈ ∪_{i=1}^∞ B_i} = P(∪_{i=1}^∞ {X ∈ B_i}) = Σ_{i=1}^∞ P{X ∈ B_i} = Σ_{i=1}^∞ P_X(B_i).
7.10. From the example in the section Basics of Probability, we know that
x 0 1 2 3
P {X = x} 0.41353 0.43588 0.13765 0.01294
To plot the distribution function, we use,
> hearts<-c(0:3)
> f<-choose(13,hearts)*choose(39,3-hearts)/choose(52,3)
> (F<-cumsum(f))
[1] 0.4135294 0.8494118 0.9870588 1.0000000
> plot(hearts,F,ylim=c(0,1),type="s")
Thus, the cumulative distribution function,

    F_X(x) = 0         for x < 0,
             0.41353   for 0 ≤ x < 1,
             0.84941   for 1 ≤ x < 2,
             0.98706   for 2 ≤ x < 3,
             1         for 3 ≤ x.

7.11. F_Y(y) = P{Y ≤ y} = P{X³ ≤ y} = P{X ≤ ∛y} = F_X(∛y).
7.13. Let x_1 > x_2 > · · · be a strictly decreasing sequence with limit x_0. Then the events {X ≤ x_1} ⊃ {X ≤ x_2} ⊃ · · · with intersection ∩_n {X ≤ x_n} = {X ≤ x_0}. (Check this last equality.) Then P{X ≤ x_1} ≥ P{X ≤ x_2} ≥ · · ·. Now, use the second continuity property of probabilities to obtain lim_{n→∞} F_X(x_n) = lim_{n→∞} P{X ≤ x_n} = P{X ≤ x_0} = F_X(x_0). Because this holds for every strictly decreasing sequence with limit x_0, we have that

    lim_{x→x_0+} F_X(x) = F_X(x_0).
7.17. Using the relation Y = 1/X, we find the distribution function for Y, for y ≥ 1,

    F_Y(y) = P{Y ≤ y} = P{1/X ≤ y} = P{X ≥ 1/y} = 1 − P{X < 1/y} = 1 − 1/y².

This uses the fact that P{X = 1/y} = 0.
7.18. We use the fact that the exponential function is increasing, and that lim_{u→∞} exp(−u) = 0. Using the numbering of the properties above
7.19. The distribution function has the graph shown in Figure 7.8.
Figure 7.8: Cumulative distribution function for an exponential random variable with λ = 1/10 and a jump at x = 20.
The formula:

    F_T(x) = P{T ≤ x} = 0                   if x < 0,
                        1 − exp(−x/10)      if 0 ≤ x < 20,
                        1                   if 20 ≤ x.
7.22. For r ≠ 1, write the expressions for s_n and rs_n and subtract. Notice that most of the terms cancel.

    s_n  = c + cr + cr² + ··· + cr^n
    rs_n =     cr + cr² + ··· + cr^n + cr^{n+1}

    (1 − r)s_n = c − cr^{n+1} = c(1 − r^{n+1})

Thus, s_n = c(1 − r^{n+1})/(1 − r).
7.30. Because the density is non-negative on the interval [0, 1], F_X(x) = 0 if x < 0 and F_X(x) = 1 if x ≥ 1. For x between 0 and 1,

    ∫_0^x 1/(2√t) dt = √t |_0^x = √x.

Thus,

    F_X(x) = 0     if x ≤ 0,
             √x    if 0 < x < 1,
             1     if 1 ≤ x.
7.31. For a > 0,

    F_Y(y) = P{X ≤ (y − b)/a} = F_X((y − b)/a).

Now take a derivative and use the chain rule to find the density

    f_Y(y) = F′_Y(y) = f_X((y − b)/a) · (1/a) = (1/|a|) f_X((y − b)/a).

For a < 0,

    F_Y(y) = P{X ≥ (y − b)/a} = 1 − F_X((y − b)/a).

Now the derivative

    f_Y(y) = F′_Y(y) = −f_X((y − b)/a) · (1/a) = (1/|a|) f_X((y − b)/a).
7.32. For independent random variables X_1, X_2, . . . , X_n,

    f_{X_1,X_2,...,X_n}(x_1, x_2, . . . , x_n) = f_{X_1}(x_1) f_{X_2}(x_2) · · · f_{X_n}(x_n).
Figure 7.9: Sum on two fair dice. The empirical cumulative distribution function from the simulation (in black) and the cumulative distribution function (in red) are shown for Exercise 7.35.
3 4 0.08333333 0.065
4 5 0.11111111 0.096
5 6 0.13888889 0.120
6 7 0.16666667 0.167
7 8 0.13888889 0.157
8 9 0.11111111 0.121
9 10 0.08333333 0.098
10 11 0.05555556 0.058
11 12 0.02777778 0.033
We also have a plot to compare the empirical cumulative distribution function from the simulation with the cumu-
lative distribution function.
> plot(sort(twodice),1:length(twodice)/length(twodice),type="s",xlim=c(2,12),
ylim=c(0,1),xlab="",ylab="")
> par(new=TRUE)
> plot(x,F,type="s",xlim=c(2,12),ylim=c(0,1),col="red")
7.39. F_X is increasing and continuous, so the set {x; F_X(x) ≤ u} is the interval (−∞, F_X^{−1}(u)]. In addition, x is in this interval precisely when x ≤ F_X^{−1}(u).
7.40. Let’s find F_V. If v < 0, then

    0 ≤ P{V ≤ v} ≤ P{V ≤ 0} = P{1 − U ≤ 0} = P{1 ≤ U} = 0

because U is never greater than 1. Thus, F_V(v) = 0. Similarly, if v ≥ 1,

    1 ≥ P{V ≤ v} ≥ P{V ≤ 1} = P{1 − U ≤ 1} = P{0 ≤ U} = 1

because U is always greater than 0. Thus, F_V(v) = 1. For 0 ≤ v < 1,

    F_V(v) = P{V ≤ v} = P{1 − U ≤ v} = P{1 − v ≤ U} = 1 − P{U < 1 − v} = 1 − (1 − v) = v.

This matches the distribution function of a uniform random variable on [0, 1].