Introduction to Probability
Mark Daniel Ward
Associate Professor of Statistics
Purdue University
Ellen Gundlach
Education Specialist and Continuing Lecturer in Statistics
Purdue University
ISBN-13: 978-0-7167-7109-8
ISBN-10: 0-7167-7109-8
©2016 by W. H. Freeman & Company
All rights reserved
First printing
Contents

Preface

I Randomness

1 Outcomes, Events, and Sample Spaces
1.1 Introduction
1.2 Complements and DeMorgan’s Laws
1.3 Exercises

2 Probability
2.1 Introduction
2.2 Equally Likely Events
2.3 Complements; Probabilities of Subsets
2.4 Inclusion-Exclusion
2.5 More Examples of Probabilities of Events
2.6 Exercises

3 Independent Events
3.1 Introduction
3.2 Some Nice Facts about Independence
3.3 Probability of Good Occurring before Bad
3.4 Exercises

4 Conditional Probability
4.1 Introduction
4.2 Distributive Laws
4.3 Conditional Probabilities Satisfy the Probability Axioms
4.4 Exercises

5 Bayes’ Theorem
5.1 Introduction to Versions of Bayes’ Theorem
5.2 Multiplication with Conditional Probabilities
5.3 Exercises

6 Review of Randomness
6.1 Summary of Randomness
6.2 Exercises

IV Counting

Bibliography

Index
Preface
We want to briefly justify why there should be another probability book, when
so many others are available.
Motivation: Students from majors in the mathematical sciences and in other
areas will be more engaged with the material if they are studying problems
that are relevant to them. While testing drafts of the book in the classroom,
the students who used this book were asked to contribute questions. As a
result, many of the exercises in this text began with questions motivated by the
students’ own interests.
Example- and exercise-oriented approach: Our book serves as a student’s
first introduction to probability theory, so we devote significant attention to a
wealth of exercises and examples. We encourage students to practice their skills
by solving lots of questions. Our exercises are split into practice, extensions,
and advanced types of questions. We recommend assigning a small number of
questions to students on a daily basis. This promotes more interactive discus-
sion in class between the students and the instructor. It consistently empowers
the students to try their hand at some problems of their own. It reduces stress
and “cramming” at exam time, as the students consistently develop their under-
standing during the course. It also provides a firm foundation for the students’
long term understanding of probability. The exercises, theorems, definitions,
and remarks are all numbered in one list (instead of numbered separately) be-
cause we believe that they all should be used in tandem to understand the
chapter material.
Relationship between events and random variables: The jump from events
to random variables is often a “leap” in other texts. In the present book, we
devote significant attention to outcomes, events, sample spaces, and probabili-
ties, before moving on to random variables. Random variables are introduced
explicitly as real-valued functions on the sample space, i.e., as functions that
depend on the outcome. The notation X(ω) is used to introduce a random
variable at first (where ω is an outcome), so that students can more easily make
a transition from studying outcomes to studying random variables that depend
on such outcomes.
Jointly distributed random variables: Many probability texts first emphasize
properties of one discrete random variable, followed by properties of one continuous random variable, and finally return to jointly distributed random variables.
We believe, in contrast, that a firm understanding of jointly distributed random
variables, from the very beginning, is most helpful for the students’ comprehen-
sive understanding of the material. Using jointly distributed random variables
at an early stage of the book also allows for more intuitive definitions of some of
the concepts. For instance, many probability texts introduce Binomial random
variables by explaining the mass, which requires a good grasp of Binomial coefficients, i.e., of $\binom{n}{k}$. We believe, however, that Binomial random variables are more intuitively introduced as sums of independent Bernoulli random variables, an approach that is natural once jointly distributed random variables are available.
Counting: Many probability books start with counting, which means this
material is taught during the first two or three weeks of a course. Unfortu-
nately, this means that a student who is weighing her/his interest in a course
will not even begin to grasp the concepts of randomness until the registration
period is over. This leads to attrition. Moreover, some questions in counting
are best understood from a probabilistic point of view, using (for instance) the
linearity of expectation, which is not available to the students at the start of the
course. As an example, consider how many couples are expected to sit together,
when people sit uniformly at random in a circle. This question can be answered
very succinctly, using indicator functions and the linearity of expectation. (The
probability mass function, in contrast, is cumbersome to compute.) In general,
we believe that our approach to counting is significantly enhanced by the use
of sums of indicator random variables. Therefore, we focus on counting after
we finish a thorough treatment of discrete random variables, but before moving
onwards to continuous random variables. This allows the students to feel con-
fident in their understanding of the discrete world before they tackle difficult
counting questions. We emphasize to students that combinatorics is a deep and
beautiful subject (much of Ward’s research is motivated by problems in applied discrete mathematics). We also try to emphasize the connections between
discrete random variables and counting. We firmly believe that this is best
accomplished when the students already understand discrete random variables.
We also guide the students through ways to tell which kind of distribution they are
working with. We give suggestions about how to grasp nuances in a problem that
make it a Binomial or Geometric or Negative Binomial situation. These guides
help students home in on what separates these distributions. We summarize
each distribution for quick reference, but we also go into details to explain each
discrete distribution’s mass, expected value, variance, etc.
Our students have had an excellent experience using this probability book.
Several of our own students have already passed the SOA/CAS P/1 exam after
having learned probability using only the early drafts of this book. Our students
seem to enjoy the many examples and friendly tone. We hope that you and your
students also find our book approachable and thorough. We are delighted by the
kind reception that our students and colleagues have given to the book during
its pilot tests.
We have divided our book into seven main parts:
Part I: Randomness. We introduce outcomes, events, sample spaces, basic
probability rules, independence, conditional probabilities, and Bayes’ Theorem.
Part II: Discrete Random Variables. We discuss the difference between dis-
crete and continuous random variables and introduce probability mass function,
cumulative distribution function, expected value, variance, and joint distribu-
tions for discrete random variables.
Part III: Named Discrete Random Variables. We consider ways to dis-
tinguish between—and perform calculations with—the most common discrete
random variables: Bernoulli, Binomial, Geometric, Negative Binomial, Poisson,
Hypergeometric, and Discrete Uniform. We include a review chapter to help
students see the similarities and differences between all of these distributions.
Part IV: Counting. We use indicator variables and the linearity of expecta-
tion as tools to help tackle several different types of counting problems: sam-
pling with and without replacement; when order matters and doesn’t matter;
and rearrangement problems. We have case studies on poker and Yahtzee, two
popular games many students will recognize.
Part V: Continuous Random Variables. We reinforce the difference between
discrete and continuous random variables. Then we introduce the probability
density function, cumulative distribution function, expected value, variance,
and joint and conditional distributions for continuous random variables.
Part VI: Named Continuous Random Variables. We show ways to utilize
(and quickly make distinctions between) the most common continuous random
variables: Continuous Uniform, Exponential, Gamma, Beta, and Normal. We
show how the Central Limit Theorems and Laws of Large Numbers work. We
include a review chapter to help students see the similarities and differences
between all of these continuous distributions and between some of the continuous
and discrete distributions.
Part VII: Additional Topics. Here we cover more advanced topics that could
be optional, depending on how much time an instructor has in a semester or
quarter. We treat the distribution of a function of one continuous random vari-
able, the variance of sums of random variables, correlation, conditional expecta-
tion, Markov and Chebyshev Inequalities, order statistics, moment generating
functions, and the joint density of two random variables that are functions of
another pair of random variables.
Acknowledgements
We thank our students for their feedback, as well as their contributions to the
exercises in this text. These students are:
Probability: The Science Of Uncertainty (HONR 399, Fall 2010) J. Blair,
J. Covalt, S. Fancher, C. Fleming, B. Goosman, W. Hess, E. Hoffman, C. Hol-
comb, A. Hurlock, E. Jenkins, J. Ling, D. Mo, B. Morgan, C. Mullen, S. Muss-
mann, D. Rouleau, C. Sanor, V. Savikhin, S. Sheafer, S. Spence, R. Stevick,
J. Wu, S. Yap, M. Zachman, B. Zhou.
Probability (MA/STAT 416, Fall 2011) K. Amstutz, C. Ben, M. Boing,
K. Breneman, P. Coghlan, B. Copeland, O. El-Ghirani, M. Fronek, Z. Gao,
A. Gerardi, T. He, X. He, K. Hopkins, J. Jaagosild, N. Johnson, K. Kemmer-
ling, J. Kwong, N. Lawrence, K. Leow, B. Lewis, K. Li, S. Li, Y. Liang, E. Luo
Cao, M. Mazlan, J. Milligan, E. Myers, Y. Nazarbekov, C. Ng, R. Schwartz,
S. Scott, S. Seffrin, J. Sheng, D. Snyder, T. Sun, B. Wilson, E. Winkowska,
R. Xue, Y. Yang, G. Zhao.
We heartily thank our colleague Dr. Frederi Viens (Purdue University) for
piloting the book in his STAT/MA 416 course in Fall 2011 and again in Spring
2013. We extend our thanks also to Dr. David Galvin (University of Notre
Dame), Dr. Patricia Humphrey (Georgia Southern University), Dr. Hosam Mah-
moud (The George Washington University), and Dr. Meike Niederhausen (Uni-
versity of Portland), for their willingness to test our book in Fall 2012 in several
different types of courses at their own universities. These colleagues and their
students provided very valuable feedback about early drafts of the book. We
benefited a great deal from their guidance, insight, and suggestions.
This book was genuinely a team effort. We are very grateful to Deborah
Sutton for preparing the answers to the odd-numbered exercises. Dr. Patri-
cia Humphrey provided numerous insights about all aspects of the content and
checked the accuracy of the entire book. Pat also produced the solutions in
the student and instructor manuals. We appreciate Jackie Miller’s insights and
ideas at the beginning of this project. We are thankful for the support of Terri
Ward, our publisher at W. H. Freeman, and for the contributions of the entire
W. H. Freeman team. In particular, Liz Geller and Elizabeth Marraffino pro-
vided hundreds of helpful suggestions for improvement. We are also especially
thankful for the marketing expertise of Karen Carson, Cara LeClair, and Katrina Mangold. We are grateful to Diana Blume and Matt McAdams for their
help with the design, layout, and art, and to Susan Wein for making sure our
files were technically sound and printer-ready.
We are also thankful for the feedback of the following reviewers:
David Anderson, University of Wisconsin
Alireza Arasteh, Western New Mexico University
G. Jogesh Babu, Pennsylvania State University
Christian Beneš, Brooklyn College of the City University of New York
Kiran R. Bhutani, The Catholic University of America
Ken Bosworth, Idaho State University
Daniel Conus, Lehigh University
Michelle Cook, Stephen F. Austin State University
Dana Draghicescu, City University of New York, Hunter College
Randy Eubank, Arizona State University
Marian Frazier, Gustavus Adolphus College
Rohitha Goonatilake, Texas A&M University
Patrick Gorman, Kutztown University
Ross Gosky, Appalachian State University
Marc Goulet, University of Wisconsin–Eau Claire
Susan Herring, Sonoma State University
Christopher Hoffman, University of Washington
Patricia Humphrey, Georgia Southern University
Ahmad Kamalvand, Huston-Tillotson University
Judy Kasabian, El Camino College
Syed Kirmani, University of Northern Iowa
Charles Lindsey, Florida Gulf Coast University
Susan Martonosi, Harvey Mudd College
Ronald Mellado Miller, Brigham Young University–Hawaii
Joseph Mitchell, Stony Brook University
Sumona Mondal, Clarkson University
Cindy Moss, Skyline College
Meike Niederhausen, University of Portland
Will Perkins, Georgia Institute of Technology
Rebecca L. Pierce, Ball State University
David J. Rader, Jr., Rose-Hulman Institute of Technology
Aaron Robertson, Colgate University
Charles A. Rohde, Johns Hopkins University
Seyed Roosta, Albany State University
Juana Sanchez, University of California, Los Angeles
Yiyuan She, Florida State University
Therese Shelton, Southwestern University
A. Robert Sinn, University of North Georgia
Clifton D. Sutton, George Mason University
Mahbobeh Vezvaei, Kent State University
Notation Review

Ac   the complement of A, i.e., the set of all outcomes in the sample space S that are not in A

\   setminus (i.e., B \ A = B ∩ Ac; the event containing outcomes in B that are not in A)

∪   union of events, which corresponds with the word “or,” i.e., A ∪ B is the event containing outcomes in A or B or both. The set A ∪ B corresponds to the region containing shading, lines, or both, in the Venn diagram described below.

∩   intersection of events, which corresponds with the word “and,” i.e., A ∩ B is the event containing outcomes in A and B. The set A ∩ B is the region that contains lines overlapping the shading.

[Venn diagram: two overlapping events A and B, one shaded and one lined, inside the sample space S.]
Geometric Sums

For −1 < a < 1, recall these two finite summations of geometric terms,

$$1 + a + a^2 + a^3 + \cdots + a^r = \sum_{j=0}^{r} a^j = \frac{1 - a^{r+1}}{1 - a}$$

and

$$a + a^2 + a^3 + a^4 + \cdots + a^r = \sum_{j=1}^{r} a^j = \frac{a - a^{r+1}}{1 - a}.$$

For −1 < a < 1, these yield two infinite summations of geometric terms,

$$1 + a + a^2 + a^3 + \cdots = \sum_{j=0}^{\infty} a^j = \frac{1}{1 - a}$$

and

$$a + a^2 + a^3 + a^4 + \cdots = \sum_{j=1}^{\infty} a^j = \frac{a}{1 - a}.$$
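These identities can be checked numerically. Here is a brief Python sketch (not part of the text; the values a = 0.3 and r = 10 are illustrative):

```python
# Numerical check of the geometric-sum identities (illustrative values).
a, r = 0.3, 10

finite_from_0 = sum(a**j for j in range(r + 1))     # 1 + a + ... + a^r
finite_from_1 = sum(a**j for j in range(1, r + 1))  # a + a^2 + ... + a^r

assert abs(finite_from_0 - (1 - a**(r + 1)) / (1 - a)) < 1e-12
assert abs(finite_from_1 - (a - a**(r + 1)) / (1 - a)) < 1e-12

# Long partial sums approach the infinite-series limits 1/(1-a) and a/(1-a).
print(sum(a**j for j in range(200)), 1 / (1 - a))
print(sum(a**j for j in range(1, 200)), a / (1 - a))
```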
Exponential Function

For any real-valued x, the power series definition of the exponential function evaluated at x is

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

Sums of Integers

Recall also the sums

$$1 + 2 + \cdots + n = \frac{n(n+1)}{2}$$

and

$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}.$$
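A quick computational sanity check of the series and the two closed forms (a sketch; the values x = 2 and n = 100 are illustrative):

```python
import math

# Truncations of the power series for e^x converge to math.exp(x).
x = 2.0
partial = sum(x**n / math.factorial(n) for n in range(30))
assert abs(partial - math.exp(x)) < 1e-12

# Closed forms for 1 + 2 + ... + n and 1^2 + 2^2 + ... + n^2.
n = 100
assert sum(range(1, n + 1)) == n * (n + 1) // 2
assert sum(k * k for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
print("all checks passed")
```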
Part I

Randomness

Math skills you will need: basic understanding of set notation, unions, intersections, and summation notation.

Additional resources: Calculators may be used to assist in the calculations. Colored pencils may be helpful for drawing Venn diagrams clearly.
Chapter 1

Outcomes, Events, and Sample Spaces
On Monday in math class, Mrs. Fibonacci says, “You know, you can think of
almost everything as a math problem.” On Tuesday I start having problems.
—Math Curse by Jon Scieszka and Lane Smith (Viking, 1995)
In a National Public Radio story from November 30, 2012, “That’s So Random:
The Evolution of an Odd Word,” Neda Ulaby writes about the many misuses of
the word “random” in our modern culture, including snippets from the comedian
Spencer Thompson’s routine, “I Hate When People Misuse the Word Random.”
For example, Thompson explains that if your friends talk about a “random
party” they went to, it probably wasn’t as random as they think, since it was likely
to be held within a reasonably small community and planned with some people
that your friends already knew. What do mathematicians and statisticians
mean by the word “random”?
1.1 Introduction
Probability theory is the study of randomness and all things associated with
randomness. Examples abound everywhere. From the time that we are children,
we play guessing games, roll dice, and flip coins. We frequently encounter the
unknown and the uncertain. We turn on an mp3 player in a “shuffle” mode, or
listen to the radio, eagerly waiting to see what song will come on next. The
time until something happens is often random, e.g., until a traffic light turns
green, an email arrives, the telephone rings, or a text message buzzes. The
sex of a baby remains unknown until birth (or an ultrasound). An athlete
runs a race, but the exact finishing time is unknown beforehand. Millions of
people play lotteries and other games of chance, often wagering large amounts
of money. Throughout this book, we study probability using examples that will
be familiar to the reader.
Definition 1.1. When something happens at random there are several po-
tential outcomes. Exactly one of these outcomes occurs. An event is defined
to be a collection of some outcomes.
Two extreme events have names: The empty set ∅ consists of no outcomes (so the empty set never happens). The sample space S consists of all outcomes (so the sample space always happens). Note: Even though the empty set never happens, we will need it to understand disjoint events, i.e., events that have no outcome in common.

Example 1.2. You roll a 6-sided die.
The sample space is S = {1, 2, 3, 4, 5, 6}. Only one of these six outcomes ac-
tually occurs; for instance, 2 is a possible outcome, or 5 is a possible outcome,
etc. We cannot “solve” for which outcome occurs because, as we know from
practical experience, we do not know (in advance) which outcome will occur.
The outcome is random.
One event is {1, 3, 5}, i.e., the event that the outcome is odd. The event that the roll is 2 or higher is {2, 3, 4, 5, 6}. The event that 4 does not appear is {1, 2, 3, 5, 6}. The event {3} consists of only one outcome. Event {1, 6} has the smallest and largest possible outcomes. (Q: How many events are there altogether? Hint: It’s a power of 2.)
One event is a subset of another if every outcome from the first event is
contained in the second event too. Subsets are denoted with the “⊂” symbol.
For instance, an event with one outcome (such as 5) is a subset of a larger event
(such as {1, 2, 5}), which is a subset of the sample space:
[Venn diagram: an event A contained inside a larger event B, which is contained inside the sample space S.]
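The margin question in Example 1.2 can be settled by enumeration: each of the 6 outcomes is either in or out of an event, so there are 2^6 = 64 events. A short Python sketch (illustrative, not from the text) lists them and confirms the subset chain:

```python
from itertools import combinations

S = {1, 2, 3, 4, 5, 6}

# An event is any subset of S: choose k of the 6 outcomes, for k = 0, ..., 6.
events = [set(c) for k in range(len(S) + 1)
          for c in combinations(sorted(S), k)]
print(len(events))                 # 64 = 2**6 events, including the empty set and S

# The subset chain from the text: {5} is a subset of {1, 2, 5}, which is a subset of S.
print({5} <= {1, 2, 5} <= S)       # True
```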
Example 1.4. A student buys a book and opens it to a random page. He notes
the number of typographical errors on the page.
Example 1.5. A mother gives birth to one baby, and the baby’s sex is noted.

The sample space is S = {boy, girl}. Although there is just one baby, we can describe four events: ∅, {boy}, {girl}, and {boy, girl} = S.
Example 1.6. A mother gives birth, and the sexes of the babies are noted in birth order.

One possible outcome is that the mother has triplets, which are all girls; we denote this outcome as (g, g, g). If she delivers a boy and then a girl, the outcome is (b, g). So the sample space is
S = {(b), (g),
(b, b), (b, g), (g, b), (g, g),
(b, b, b), (b, b, g), (b, g, b), (b, g, g), (g, b, b), (g, b, g), (g, g, b), (g, g, g), . . .}.
Note: A new mother may have a single baby, twins, triplets, octuplets, or any other (relatively small) number of babies at one time. We only listed the possibilities up to triplets explicitly, but the other possibilities are included in S too; hence, the “. . .” at the end of S. (A set of octuplets, 8 babies, was born in 1998 and also in 2009 in the United States.)
Let A be the event that the mother has at least one boy and at least one
girl. So A does not contain the outcomes (b) or (b, b) or (b, b, b) etc., and does
not contain the outcomes (g) or (g, g) or (g, g, g) etc. Thus
A = {(b, g), (g, b), (b, b, g), (b, g, b), (b, g, g), (g, b, b), (g, b, g), (g, g, b), . . .}.
Example 1.7. You wait at a red traffic light and record the time (in seconds)
until the light turns green.
The sample space is the set of all positive real numbers, R>0 . One event is
[5, 10], the event consisting of all outcomes between 5 and 10 seconds (inclusive).
Another event is (12.7, ∞), i.e., the waiting time is strictly more than 12.7
seconds. Another event is {32.7} seconds, the event consisting of only the
outcome 32.7 seconds. Events can be built using unions and intersections, e.g.,
(0, 60) ∪ (120, 180) is the event that the waiting time is either less than 1 minute or strictly between 2 and 3 minutes.
Example 1.8. You notice the color of the next car to pass on the street.
The sample space is the set of all possible colors in the scheme used to classify
this car’s color, for instance, perhaps it is classified according to the sample
space
S = {red, yellow, green, blue, orange, silver, brown, black, white, other}.
As we see in the examples with the baby’s sex or car’s color, outcomes do not
have to be numbers.
At the most fundamental level, it is essential to consider how we classify the
outcomes. There are often several valid viewpoints. As an example:
Example 1.9. We hit or miss the bullseye with a dart (two possible outcomes).
Figure 1.1: Different sample spaces for a dart throw. Left: Two outcomes in
the sample space. Middle: Twenty-one outcomes in the sample space (the 21st
outcome denotes missing the board altogether). Right: Sample space consists
of the outcomes, according to location, given as coordinates.
The sample space is S = {hit, miss}. This is depicted on the left side of Fig-
ure 1.1. There are four events: ∅, {hit}, {miss}, and {hit, miss} = S. (The empty set never happens because it has no outcomes. Sometimes event {hit} happens; sometimes event {miss} happens. Event {hit, miss} = S always happens.)
Example 1.9 (continued) When throwing a dart, we hit one of twenty re-
gions, or we miss the entire board (twenty-one possible outcomes). Notice: this
classification of the outcomes is very different from the “hit” or “miss” setup.
The sample space is S = {miss, 1, 2, 3, . . . , 20}, consisting of the twenty-one
possible outcomes: either we “miss” the board altogether or we hit one of the
20 specified regions. This is depicted in the middle of Figure 1.1. (The board
has metal ridges between the regions, so that a dart cannot land exactly on the
boundary of two regions.)
Example 1.9 (continued) When throwing a dart, we note the exact location
where the dart lands.
The sample space is

S = {(x, y) | x, y ∈ R, x² + y² ≤ r²},
where r is the radius of the dartboard. (In this case, we have not handled darts
that miss the board entirely.) For instance, if r = 9 inches, then the sample
space includes outcomes such as (x, y) = (3.6, −1.35), etc.
Notation 1.10. The notation for a set uses braces, with the contents of the set, often followed by a line and then any conditions on the contents of the set:

{ things | conditions on the things }

It is often helpful to make a list, perhaps an incomplete list, of the different outcomes that are possible from a random
phenomenon. (As a rule of thumb, we often encourage students to write five
different possible outcomes, if the problem is complicated, just to develop some
intuition.) With the darts in Example 1.9, we can certainly write down both
outcomes in the first scenario, i.e., “bullseye” or “not bullseye.” In the second
scenario, the list of all possible outcomes would be “miss,” 1, 2, 3, . . . , 20. In
the third scenario, as soon as we begin to try to write down all of the possible
locations on the board by their (x, y) coordinates, we quickly realize that this
is a hopeless task. It will not be possible for us to write down every potential
outcome, so the concise set notation is crucial to use.
Definition 1.11. We use the union notation “∪” when a new set is formed
that contains each outcome found in any of the component events. E.g., A ∪ B
contains each outcome that is found in A, or in B, or in both.
Example 1.13. A student shuffles a deck of cards thoroughly (one time) and
then selects cards from the deck without replacement until the ace of spades
appears.
“Without replacement” means that the cards are not put back into the deck
after they are drawn. So on the first draw there are 52 cards available, but on
the second draw there are only 51 cards available, and 50 cards available on
the third draw, etc. So the ace of spades is certain to appear sometime during
the 52 draws. Also, because they are selected without replacement, the chosen
cards will be distinct.
The event that exactly three draws are needed to see the ace of spades is
{(x1 , x2 , x3 ) | x3 = A♠, and the xj ’s are distinct}.
The sample space S consists of all possible draws of distinct cards that end with
the ace of spades:
S = {(A♠)} ∪ {(x1 , x2 ) | x2 = A♠, and the xj ’s are distinct }
∪ {(x1 , x2 , x3 ) | x3 = A♠, and the xj ’s are distinct }
∪ {(x1 , x2 , x3 , x4 ) | x4 = A♠, and the xj ’s are distinct }
..
.
∪ {(x1 , x2 , . . . , x52 ) | x52 = A♠, and the xj ’s are distinct } .
Equivalently, if Bk = {(x1, x2, . . . , xk) | xk = A♠, for distinct xj’s}, then the sample space is S = ⋃_{k=1}^{52} Bk.
Example 1.14. A student draws cards from a standard deck of playing cards
until the ace of spades appears. After every unsuccessful draw, the student
replaces the card and shuffles the deck thoroughly before selecting a new card.
The set of outcomes in which the ace of spades first appears on the kth draw is
Bk = {(x1 , . . . , xk ) | only xk is A♠}
Notice that we dropped the condition about the cards being distinct.
The set of all possibilities in which the student actually finds the ace of spades is ⋃_{k≥1} Bk. The astute reader will notice that we did not yet mention the possibility that the ace of spades never appears. We write this event as

C = {(x1, x2, x3, . . .) | none of the xk’s is A♠}.

So the entire sample space is

S = (⋃_{k≥1} Bk) ∪ C.

Note: Since the cards are replaced after each draw, this scenario is quite different from Example 1.13.
Example 1.15. A traffic engineer records times (in seconds) between the next
six cars that pass.
For example, consider when the next six cars arrive. [Figure: a timeline marking the arrival times of cars 1 through 6.]
Example 1.17. A student hears ten songs (in a random shuffle mode) on her
music player, noting how many of these songs belong to her favorite type of
music.
If she uses F to denote when a song belongs to her favorite type of music, and
N for not-favorite, then the sample space consists of all ten-tuples of F ’s and
N ’s. In other words, the sample space is
S = {(x1 , . . . , x10 ) | xj ∈ {F, N }}.
The event that none of the first three songs is her favorite type of music is
A = {(N, N, N, x4 , . . . , x10 ) | xj ∈ {F, N }}.
Songs 1, 2, 3 must be of type “N,” but each of songs 4, 5, 6, 7, 8, 9, 10 has two possible assignments of types, either N or “F.” So A contains 1 × 1 × 1 × 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2⁷ = 128 outcomes. (In Part IV of the book, and in the chapters on discrete random variables, we will investigate more thoroughly the ways that counting is used in probability theory.)
The event that the even-numbered songs are from her favorite type of music
is
B = {(x1 , F, x3 , F, x5 , F, x7 , F, x9 , F ) | xj ∈ {F, N }}.
Event B contains 2⁵ = 32 outcomes.
No selection of songs would be an outcome in both A and B, so A ∩ B = ∅;
in particular, the second song is of type N if the outcome is in event A, but the
second song must be of type F if the outcome is in event B.
The event that the last five songs are from type F is
C = {(x1 , x2 , x3 , x4 , x5 , F, F, F, F, F ) | xj ∈ {F, N }};
there are 2⁵ = 32 outcomes in event C.
An outcome is in B ∩ C if and only if the 2nd, 4th, 6th, 7th, 8th, 9th, and
10th songs are of type “F .” So the event B ∩ C can be written as
B ∩ C = {(x1 , F, x3 , F, x5 , F, F, F, F, F ) | xj ∈ {F, N }}.
So event B ∩ C has 2³ = 8 outcomes.
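All of these counts can be confirmed by brute force, since there are only 2^10 = 1024 outcomes. A Python sketch (illustrative, not from the text):

```python
from itertools import product

outcomes = list(product("FN", repeat=10))   # all 2**10 = 1024 ten-tuples
A = {w for w in outcomes if w[:3] == ("N", "N", "N")}
B = {w for w in outcomes if all(w[i] == "F" for i in (1, 3, 5, 7, 9))}
C = {w for w in outcomes if all(w[i] == "F" for i in range(5, 10))}

print(len(A))          # 128 = 2**7 (songs 4 through 10 are free)
print(len(B), len(C))  # 32 and 32, each 2**5
print(len(A & B))      # 0: A and B are disjoint
print(len(B & C))      # 8 = 2**3 (songs 1, 3, 5 are free)
```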
1.2 Complements and DeMorgan’s Laws
Example 1.18. Consider two events: A is the event that the amount of rainfall
on a given day is strictly less than 2.8 inches, and B is the event that the amount
of rainfall is 2.8 inches or more. Thus

A = [0, 2.8) and B = [2.8, ∞).

Whenever the entire sample space is split into two events without overlap, the two events are complements of each other. Thus B is the complement of A; this is written as B = Ac. Similarly, A = Bc.

Definition 1.19. For an event A, the complement is the set of all outcomes in the sample space S that are not in A. The complement of A is written as Ac or as S \ A, using the “setminus” notation given below.

[Venn diagram: the sample space split into an event A and its complement Ac.]

Notation 1.20. The setminus “\” notation can be used for any pair of events. The event B \ A contains all outcomes found in B but not found in A. Note: Some students call “\” the “throwaway” operator, i.e., S \ A consists of all of S, “throwing away” any outcomes in A.

[Venn diagram: overlapping events A and B, with the regions B ∩ A and B \ A marked.]
Example 1.22. A student is randomly selected, and she is asked about her
movie preferences. Let A1 , A2 , A3 be the event that she enjoys adventure, com-
edy, or romance movies, respectively. She might enjoy more than one genre.
Event ⋃_{j=1}^{3} Aj occurs if she likes at least one of these genres. Thus, (⋃_{j=1}^{3} Aj)^c occurs if she dislikes all three genres; this is the same event as ⋂_{j=1}^{3} Aj^c, i.e., the event that she dislikes each of the three genres. So

(⋃_{j=1}^{3} Aj)^c = ⋂_{j=1}^{3} Aj^c.

A picture of this scenario is given on the left side of Figure 1.2.

[Figure 1.2: two Venn diagrams of three overlapping events A, B, C; the left side illustrates the complement of the union, the right side the complement of the intersection.]

The event ⋂_{j=1}^{3} Aj happens if she likes all three genres. Thus, (⋂_{j=1}^{3} Aj)^c occurs if she dislikes at least one genre; this is the same event as ⋃_{j=1}^{3} Aj^c, i.e., the event that she dislikes at least one genre. So

(⋂_{j=1}^{3} Aj)^c = ⋃_{j=1}^{3} Aj^c.

A picture of this scenario is given on the right side of Figure 1.2.
These ideas about complements of unions and intersections of events hold much more generally, for both finite and infinite collections of events. Consider a finite collection of events A1, A2, . . . , An or an infinite collection of events A1, A2, . . .. The event that contains the outcomes found in at least one of the Aj’s is ⋃_j Aj. The complement of this event is (⋃_j Aj)^c; it contains the outcomes missing from all of the Aj’s, i.e., the outcomes that are in every Aj^c and thus in ⋂_j Aj^c. Similarly, the event that contains the outcomes found in all of the Aj’s is ⋂_j Aj. The complement of this event is (⋂_j Aj)^c; it contains the outcomes missing from at least one of the Aj’s, i.e., the outcomes that are in at least one Aj^c and thus in ⋃_j Aj^c.
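These laws are easy to verify mechanically for concrete finite events. A Python sketch with made-up events in a ten-outcome sample space:

```python
# Mechanical check of DeMorgan's laws on made-up finite events.
S = set(range(1, 11))
A1, A2, A3 = {1, 2, 3}, {3, 4, 5}, {5, 6, 7}

def complement(E):
    return S - E

# (A1 ∪ A2 ∪ A3)^c equals A1^c ∩ A2^c ∩ A3^c
assert complement(A1 | A2 | A3) == complement(A1) & complement(A2) & complement(A3)
# (A1 ∩ A2 ∩ A3)^c equals A1^c ∪ A2^c ∪ A3^c
assert complement(A1 & A2 & A3) == complement(A1) | complement(A2) | complement(A3)
print("DeMorgan's laws confirmed for these events")
```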
1.3 Exercises
In some of these scenarios, several different interpretations are possible. These
early exercises are intended to inspire discussion. A key goal is to effectively
communicate your understanding of the sample space and of the various at-
tributes of the scenario that the outcomes exhibit.
1.3.1 Practice
Exercise 1.1. Skydiver. A skydiver jumps out of a plane and lands somewhere
at random inside a circle with radius one mile. What is his landing location?
Exercise 1.2. Q library books. A library worker named Jim is going through
the returned books. Books are constantly arriving, and Jim’s quirky boss,
Quinten Quirrell, forces his workers to sort books until they sort a book with a
title beginning with the letter Q. How many books will Jim have to sort until
he gets a break?
Exercise 1.4. A random hand. You are dealt a hand of five cards (without
replacement) from a standard deck of fifty-two playing cards. You note the suits
and values of the cards (the order does not matter). Which cards are you dealt?
Exercise 1.5. Cell phone minutes. Your parents restricted your cell phone
minutes to 400 minutes this month. You call your boyfriend 75 times during
the month. What are the lengths of your calls, if you don’t exceed your allotted
400 minutes?
Exercise 1.6. Crayons. A little girl picks out crayons (without replacement)
from her 24-pack of Crayolas until she gets to the pink crayon. How many
crayons are needed?
1.3.2 Extensions
Exercise 1.7. Moving chairs. Four chairs are placed in a row; two of them
are red and look identical; the other two are blue and look identical.
a. How many outcomes are in the sample space? What are they?
b. How many different events are there?
Exercise 1.8. Abstract art. A painter has four different jars of paint colors
available, exactly one of which is purple. She wants to paint something abstract,
so she blindfolds herself, randomly dips her brush, and paints on the canvas.
She continues trying paint jars until she finally gets some purple onto the canvas
(her assistant will tell her when this happens). Assume that she does not repeat
any of the jars because her assistant removes a jar once it has been used.
a. How many outcomes are in the sample space? What are they?
b. How many different events are there?
c. Another painter borrows the four jars of paint and performs the same
experiment; i.e., selects paint at random; but she allows the jars to be reused,
perhaps over and over many times (assume each contains an unlimited amount
of paint). List a few of the outcomes in the sample space, when repetitions are
allowed.
d. In the scenario from part c, write an expression for the sample space.
Exercise 1.9. Text messages. A student receives text messages at random times and types a response to each one.

a. What is the sample space that describes the set of waiting times between
the messages that the student receives?
b. Assume the time required to type a text response also takes a random
amount of time. What is the sample space that describes the set of all waiting
times and also the lengths of typing the responses as well?
c. Write an expression for the event that the waiting times get longer and
longer, but the times used to type responses get shorter and shorter.
Exercise 1.10. Double die rolling. Two friends are playing a board game
that requires each of them to roll a die. Each player uses her/his own die.
a. What is the sample space for a single roll if their dice are painted two
different colors?
b. What if both dice are white—does this change anything?
c. What if one person rolls both dice—does this change anything?
1.3.3 Advanced
Exercise 1.11. Choose a point in a triangle. A point is chosen at random
inside the triangle in Figure 1.3.
What is the sample space? (Use set notation for the constraints on x and y.)
[Figure 1.3: two regions in the xy-plane; left, the triangle for Exercise 1.11; right, the region for Exercise 1.12.]

Exercise 1.12. Choose a point in a region. A point is chosen at random inside the region on the right side of Figure 1.3.
What is the sample space? (Hint: Give bounds on x and then on y.)
Exercise 1.13. Building a loft. You are assembling a loft. One piece of wood
has 8 screw holes in a straight row, but you can only find 6 screws (which look
identical). In a hurry, you put the 6 screws in the 8 holes. How many outcomes
are in the sample space, if exactly 6 holes are selected (and the order of selection
does not matter)?
Exercise 1.15. Sum of three dice. Roll three distinguishable dice (e.g.,
assume that there is a way to tell them apart, for instance, that the dice are
three different colors). There are 6 × 6 × 6 = 216 possible outcomes.
For 3 ≤ j ≤ 18, define Aj as the event that the sum of the dice equals j.
Find |Aj|, i.e., the number of outcomes in Aj. (For instance, |A3| = 1 since A3
contains only the one outcome (1, 1, 1). Similarly, |A18 | = 1 since A18 contains
only one outcome, (6, 6, 6).)
Chapter 2
Probability
You have 22 songs on your mp3 player’s playlist, and you set the player on
shuffle mode, where songs are allowed to repeat, while you study. Country
music is your favorite, but only 12 of the songs on this playlist are country with
the rest being rock. What is the probability the first song will be country music?
What is the probability the first song will not be country music? If 3 songs play,
what is the probability that all 3 are country music? Or the probability that
exactly 2 of them are country music? Or the probability that none of them are
country music? If 3 songs play, what is the probability that only the first song
is country, and is this different from the probability that exactly one song will
be country? Why or why not?
2.1 Introduction
Now we can state the fundamental ideas (mentioned earlier) in a precise way.

Axioms 2.4. A probability measure P assigns a probability P(A) to each event A of a sample space S so that:
1. For each event A, we have 0 ≤ P(A) ≤ 1.
2. P(S) = 1.
3. If A1, A2, . . . are disjoint events, then P(⋃_{j=1}^{∞} Aj) = Σ_{j=1}^{∞} P(Aj).

Theorem. The probability of the empty set is 0, i.e., P(∅) = 0.

This theorem makes sense intuitively, but does it fit with our basic assumptions? Yes! Here is the reasoning: Write S = S ∪ ∅ ∪ ∅ ∪ · · ·, a union of disjoint events, so that Axiom 2.4.3 gives

1 = P(S) = P(S) + P(∅) + P(∅) + · · · .

By Axiom 2.4.2, P(S) = 1. The rest of the right-hand side consists of nonnegative terms P(∅), which must therefore each be 0. So P(∅) = 0.
A similar consequence of the axioms: if A1, A2, . . . , An are disjoint events, then P(⋃_{j=1}^{n} Aj) = Σ_{j=1}^{n} P(Aj). Again, this makes intuitive sense. To prove it, using our basic assumptions, define Aj = ∅ for all j > n. Then we use the probability theory axioms, as follows:

$$P\Big(\bigcup_{j=1}^{n} A_j\Big) = P\Big(\bigcup_{j=1}^{\infty} A_j\Big) \quad\text{since } A_j = \emptyset \text{ for } j > n$$
$$= \sum_{j=1}^{\infty} P(A_j) \quad\text{by Axiom 2.4.3}$$
$$= \sum_{j=1}^{n} P(A_j) \quad\text{since } P(A_j) = 0 \text{ for } j > n.$$

2.2 Equally Likely Events
Example 2.7. When rolling a die, each of the six outcomes should be equally
likely. This means that each single-outcome event should have the same prob-
ability.
The probability of each seems (intuitively) to be 1/6. Using our simple assumptions,

1 = P(S) = P({1}) + P({2}) + P({3}) + P({4}) + P({5}) + P({6}).

If all P({j})’s are the same, then 1 = 6P({j}), so P({j}) = 1/6 for each j. So
our intuition is correct. Now we can compute any kind of probability associated
with one roll of a die. For instance, the probability a die roll is odd is:
P({1, 3, 5}) = P({1}) + P({3}) + P({5}) = 1/6 + 1/6 + 1/6 = 1/2.
Example 2.8. A pregnancy that yields exactly one baby would yield an out-
come of either a boy or a girl, which are equally likely (as in the die example
above).
The four relevant probabilities are

1. P(∅) = 0,
2. P({boy}) = 1/2,
3. P({girl}) = 1/2,
4. P({boy, girl}) = P(S) = 1.

Note: www.cdc.gov/nchs/data/nvsr/nvsr61/nvsr61_01.pdf suggests that the odds are really closer to 51.17% for boys vs. 48.83% for girls, but we assume a 50/50 ratio.
P (S) = 1.
These observations about equally likely outcomes are handy and very general:
Theorem 2.9. If a sample space S has n equally likely outcomes, then each
outcome has probability 1/n of occurring.
This is true for just the same reasons as in the die example. Let x1, x2, . . . , xn be the n outcomes. Then

1 = P(S) = P({x1}) + P({x2}) + · · · + P({xn}).

If all P({xj})’s are the same, then 1 = nP({xj}), so P({xj}) = 1/n for each j.

Corollary 2.10. If a sample space S has n equally likely outcomes, and an event A consists of exactly j of these outcomes, say A = {y1, y2, . . . , yj}, then P(A) = j/n. Indeed,

P(A) = P({y1, y2, . . . , yj})
     = P({y1} ∪ {y2} ∪ · · · ∪ {yj})
     = P({y1}) + P({y2}) + · · · + P({yj})   by Axiom 2.4.3
     = 1/n + 1/n + · · · + 1/n               by Theorem 2.9
     = j/n
Definition 2.11. The number of outcomes in an event A, also called the size
of A, is denoted as |A|.
Using the notation |S| and |A| as the number of items in S and A, respec-
tively, Corollary 2.10 can be rewritten as follows:
Corollary 2.12. If sample space S has a finite number of equally likely out-
comes, then event A has probability
P (A) = |A|/|S|,
where |S| and |A| denote the number of items in S and A, respectively.
We will study equally likely outcomes to a much greater extent in Chapters 20
and 22. For now, however, we give a few examples.
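As a computational illustration of Corollary 2.12 before those examples (a Python sketch assuming a fair six-sided die), the relative frequency of an event over many simulated rolls approaches |A|/|S|:

```python
import random

S = [1, 2, 3, 4, 5, 6]
A = {1, 3, 5}              # the event "the roll is odd"

trials = 100_000
hits = sum(random.choice(S) in A for _ in range(trials))

print(hits / trials)       # relative frequency, near 0.5
print(len(A) / len(S))     # the exact value |A|/|S| = 3/6
```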
Example 2.13 (continued) We can now split the dartboard into four regions.
If a new event R1 is constructed as the union of several of the Aj ’s, then
the probability of R1 is just the sum of the probabilities. E.g., if
R1 = {1, 18, 4, 13, 6} = A1 ∪ A18 ∪ A4 ∪ A13 ∪ A6 ,
(i.e., R1 is the event that the dart lands in the northeast portion), then
P (R1 ) = P (A1 ) + P (A18 ) + P (A4 ) + P (A13 ) + P (A6 ) = 5/20 = 1/4.
This could also be seen by using Corollary 2.10, since R1 contains 5 of the 20
equally likely outcomes, so P (R1 ) = 5/20.
As in Figure 2.1, define northeast, southeast, southwest, and northwest re-
gions as
R1 = {1, 18, 4, 13, 6}, R2 = {10, 15, 2, 17, 3},
R3 = {19, 7, 16, 8, 11}, R4 = {14, 9, 12, 5, 20}.
Figure 2.1: Twenty-one possible outcomes. Four colors for the northeast,
southeast, southwest, and northwest regions.
In a partition, not all of the regions need to have the same size.
Example 2.15. For instance, a different partition of the dartboard could consist of three disjoint events, T1, T2, and T3, shown in three colors in Figure 2.2. In this partition, P(T1) = 5/20, P(T2) = 8/20, and P(T3) =
7/20. Notice P (T1 ) + P (T2 ) + P (T3 ) = 1.
Figure 2.2: Twenty-one possible outcomes. Three colors for Example 2.15
regions.
To see this, in a partition consisting of Bj’s, every outcome is in one of the events, so S = ⋃_j Bj. Also, each outcome is in exactly one of these events in the partition, so the Bj’s are disjoint. Thus P(⋃_j Bj) = Σ_j P(Bj). Putting these together, we get

1 = P(S) = P(⋃_j Bj) = Σ_j P(Bj).
Let Bk denote the event that the ace of spades is drawn on exactly the kth draw; as in Example 1.13,

Bk = {(x1, x2, . . . , xk) | xk = A♠, and the xj’s are distinct}.
We emphasize that P (Bk ) = 1/52 for each k, since the initial placement of the
ace of spades (i.e., during the initial shuffle) completely determines when the
ace of spades will appear. Since the ace of spades is equally likely to be in any
of the 52 places in the deck, then the ace of spades is equally likely to appear
on any of the 52 draws.
The Bk ’s are disjoint events, since it is impossible for an outcome to simul-
taneously be in more than one of the Bk ’s. Also, every outcome is in exactly
one of the events. So B1 , B2 , . . . , B52 form a partition of the sample space.
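A simulation supports the claim that P(Bk) = 1/52 for each k. In the sketch below (not from the text), we use the fact noted above: the draw on which the ace of spades appears equals its position in the shuffled deck.

```python
import random
from collections import Counter

deck = list(range(52))     # let card 0 stand for the ace of spades
trials = 104_000
counts = Counter()

for _ in range(trials):
    random.shuffle(deck)
    counts[deck.index(0) + 1] += 1   # the draw on which the ace appears

# Each draw number k = 1, ..., 52 occurs with frequency near 1/52, about 0.0192.
for k in (1, 26, 52):
    print(k, counts[k] / trials)
```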
2.3 Complements; Probabilities of Subsets
Not every set of outcomes is equally likely, so we must be careful when applying
Corollary 2.10. For instance, when bowling, it is possible to knock down between
0 and 10 pins, so there are 11 outcomes (if we only keep track of the score, not
the specific pins that fall down). We have no reason to believe that all of these
11 outcomes are equally likely.
Example 2.18. As in Example 1.18, consider two events: A is the event that
the amount of rainfall on a given day is strictly less than 2.8 inches, and B is
the event that the amount of rainfall is 2.8 inches or more. Thus
A = [0, 2.8) and B = [2.8, ∞).
The sample space consisting of all possible amounts of rain is S = [0, ∞) so
S = A ∪ B. Also, A and B are disjoint. So
1 = P (S) = P (A ∪ B) = P (A) + P (B),
so P (B) = 1 − P (A). E.g., if the probability of “rainfall less than 2.8 inches” is
83%, then the probability of “rainfall 2.8 inches or more” must be 1−0.83 = 0.17,
i.e., 17%.
2.4 Inclusion-Exclusion
The method of inclusion-exclusion allows us to relate overlaps among subsets
to unions and intersections. This decomposition enables us to calculate proba-
bilities for events that are overlapping.
[Venn diagram: various colors placed in the regions A \ B, A ∩ B, B \ A, and S \ (A ∪ B) of a sample space S of colors.]
Notice
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
This might seem intuitively clear because A ∪ B accounts for each of the colors red, blue, yellow, orange, silver one time, while adding P(A) and P(B) counts the colors in A ∩ B twice; subtracting P(A ∩ B) corrects the double count. Formally,
P (A ∪ B) = P (A \ B) + P (A ∩ B) + P (B \ A)
= P (A \ B) + P (A ∩ B) + P (B \ A) + P (A ∩ B) − P (A ∩ B)
= P (A) + P (B) − P (A ∩ B)
[Figure 2.4: left, three overlapping events A, B, C; right, four overlapping events A, B, C, D.]
The overlaps among A, B, C can be visualized on the left side of Figure 2.4.
The overlaps among four events, A, B, C, D can be visualized on the right side
of the same figure. It is true, furthermore, by similar reasoning, that

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
This same kind of reasoning can continue, and we get a general inclusion-
exclusion formula.
Theorem 2.25. Inclusion-Exclusion Rule
For any finite sequence of events A1, A2, . . . , An,

$$P\Big(\bigcup_{j=1}^{n} A_j\Big) = \sum_{j=1}^{n} P(A_j) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \sum_{i<j<k<l} P(A_i \cap A_j \cap A_k \cap A_l) \pm \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$
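The n = 3 case of the theorem can be checked mechanically for equally likely outcomes. A Python sketch with made-up events:

```python
S = set(range(1, 13))      # 12 equally likely outcomes (made up)
A, B, C = {1, 2, 3, 4, 5}, {4, 5, 6, 7}, {2, 5, 7, 8, 9}

def P(E):
    return len(E) / len(S)

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
print(lhs, rhs)            # both equal 0.75 for these events
```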
Example 2.26. As in Example 1.14, consider a student who draws cards from
a deck but this time he always replaces the card after each selection and then
reshuffles the deck. He only stops if he reaches the ace of spades.
Let Bk be the set of outcomes in which the ace of spades is discovered for the
first time on the kth draw:
Bk = {(x1 , . . . , xk ) | only xk is A♠}
Then the set of all possibilities in which the student actually finds the ace of spades is ⋃_{k=1}^{∞} Bk. Also let C = {(x1, x2, x3, . . .) | none of the xk’s is A♠}, as in Example 1.14.
The probability that a single draw does not contain the ace of spades is always
51/52. Since the draws do not affect each other (a phenomenon that we will
explore much further in Chapter 3 on independence), it follows that
$$P(B_k) = \underbrace{\frac{51}{52} \cdot \frac{51}{52} \cdots \frac{51}{52}}_{k-1 \text{ factors}} \cdot \frac{1}{52} = \Big(\frac{51}{52}\Big)^{k-1} \frac{1}{52}.$$

We know that the Bk’s and C are all disjoint, and also

S = (⋃_{k=1}^{∞} Bk) ∪ C.

So

1 = P(S) = P(C ∪ B1 ∪ B2 ∪ B3 ∪ · · ·) = P(C) + Σ_{k=1}^{∞} P(Bk).

Since Σ_{k=1}^{∞} P(Bk) = Σ_{j=0}^{∞} (51/52)^j (1/52) = (1/52)/(1 − 51/52) = 1 is a geometric sum (see the Math Review), it follows that P(C) = 1 − 1 = 0.
Example 2.27. (continued from Example 1.17) A student hears ten songs
(in a random shuffle mode) on her music player, paying special attention to how
many of these songs belong to her favorite type of music. We assume that the
songs are picked independently of each other and that each song has probability
p of being a song of the student’s favorite type.
As in Example 1.17, the event that none of the first three songs is her favorite type is A = {(N, N, N, x4, . . . , x10) | xj ∈ {F, N}}, and the probability of the event is P(A) = (1 − p)³, because all that the event
requires is that none of the first three songs is from her favorite type of music.
We do not impose any restrictions on songs 4, 5, . . . , 10, so these do not affect
the probability of event A occurring.
Now consider the event that the even-numbered songs are from her favorite
type of music. We write this event as

B = {(x1, F, x3, F, x5, F, x7, F, x9, F) | xj ∈ {F, N}},

and the probability of B is P(B) = p⁵, because we only require that five specific songs are from her favorite type of music.

Similarly, if Aj denotes the event that exactly j of the ten songs are from her favorite type of music, then

$$P(A_j) = \binom{10}{j} p^j (1-p)^{10-j},$$

where $\binom{10}{j} = \frac{10!}{j!\,(10-j)!}$ is the number of ways to pick exactly j out of 10 songs.
We will explore this idea in greater depth in Chapter 15, on Binomial random
variables.
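A short computational check (a sketch using Python's math.comb; the value p = 0.3 is illustrative) confirms that the probabilities P(A0), . . . , P(A10) sum to 1, i.e., the Aj's form a partition of the sample space:

```python
import math

p = 0.3                    # illustrative chance that a song is a favorite

def prob_exactly(j, n=10):
    """P(exactly j of the n songs are of the favorite type)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)

print((1 - p) ** 3)        # P(A): the first three songs are not favorites
print(p ** 5)              # P(B): the five even-numbered songs are favorites
print(sum(prob_exactly(j) for j in range(11)))   # sums to 1.0
```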
2.6 Exercises
2.6.1 Practice
Exercise 2.1. Songs by genre. A song is chosen at random from a person’s
mp3 player. The student makes a partition of the sample space, according to
genre of music. The table below gives the number of outcomes in each part of
the partition. There are 27,333 songs altogether.
1032 Alternative 83 Electronic 56 Metal
330 Blues 508 Folk 2718 Other
275 Books & Spoken 183 Gospel 1786 Pop
1468 Children’s Music 82 Hip-Hop 403 R&B
921 Classical 564 Holiday 8286 Rock
6169 Country 537 Jazz 1432 Soundtrack
178 Easy Listening 106 Latin 216 World
Let A be the event that a song is either blues, jazz, or rock. Find P(A) when one song is chosen at random. Assume that all songs are equally likely to appear.
Exercise 2.2. Rock climbing. I am out rock climbing, and the rock face has
4 easy, 7 challenging, and 3 extreme routes to get to the top. The routes are
poorly marked, so I just choose one at random, with all routes equally likely.
What is the probability that I do not choose an extreme route?
and
P (A ∩ B) = P (A ∩ C) = P (B ∩ C) = 0.12,
and
P (A ∩ B ∩ C) = 0.05.
Find the probability that the student participates in at least one of these
three programs, i.e., find P (A ∪ B ∪ C).
Exercise 2.4. Dining with Dad. At a random meal during a parent weekend
in the dining hall, a student notices the food chosen by her father. Let A, B, C
be the events that his meal include Artichokes, Broccoli, or Cauliflower. These
events have the property that
P (B) = 0.39
P (C) = 0.44
P (A ∩ B) = 0.13
P (A ∩ C) = 0.12
P (B ∩ C) = 0.13
P (A ∩ B ∩ C) = 0.09
P (A ∪ B ∪ C) = 0.89
What is the probability that the father includes Artichokes in his meal, i.e.,
what is P (A)?
a. What is the probability that she selects a pair of shoes that makes her
taller if she pulls a pair from her closet without looking?
b. What is the probability that she selects a pair of shoes that does not
make her look taller?
c. Create another type of partition for this woman’s collection of shoes.
Exercise 2.8. Coin flips. You flip a coin 5 times. What is the probability the
first 4 are heads and the last one is a tail?
Exercise 2.9. Pizza meat. The guys on one floor of a college dorm all decide
to get pizzas to share. They get 3 pepperoni pizzas, 2 bacon pizzas, 1 cheese
pizza, 3 sausage pepperoni pizzas, and 3 meat lovers pizzas with sausage, pep-
peroni, and bacon. What is the probability of a randomly selected slice of pizza
containing:
a. Bacon?
b. Pepperoni?
c. Sausage?
2.6.2 Extensions
Exercise 2.11. Monkey keystrokes. A monkey is let loose in a computer lab
and starts playing with a keyboard. What is the probability that the monkey,
without any comprehension or intention, types out the word “bananas” if he
types exactly 7 keys? The typical keyboard has 101 keys, and the monkey only
presses one key at a time.
Exercise 2.12. Apples. There are 6 apples in a basket. Two of them are red,
and four are green.
Exercise 2.13. Shuffling and star ratings. I have 20 five-star songs and
200 four-star songs on my iPod, which has 2000 songs total.
Exercise 2.15. Roll a die. If you roll a die, event A contains outcomes 1, 3,
and 6; event B contains outcomes 1 and 6, and event C contains outcomes 4
and 6.
Exercise 2.17. Abstract art. A painter has three different jars of paint colors
available, in colors green, yellow, and purple. She wants to paint something
abstract, so she blindfolds herself, randomly dips her brush, and paints on the
canvas. She continues trying paint jars until she finally gets some purple onto
the canvas (her assistant will tell her when this happens) and then she stops.
Assume that she does not repeat any of the jars because her assistant removes
a jar once it has been used. So the sample space is

S = {(p), (g, p), (y, p), (g, y, p), (y, g, p)},

writing g, y, and p for the green, yellow, and purple jars.
Exercise 2.18. Yahtzee. In the game Yahtzee, there are 5 dice with 6 possible
numbers on each. What is the probability for a Yahtzee on a player’s first roll?
(In other words, what is the probability that all 5 dice show the same number
the first time that they are rolled)?
Exercise 2.19. Locker combinations. You just forgot your locker combi-
nation and are too embarrassed to ask for it. You know for sure that the first
number is 22, or was it 32? It’s one of those. You’re certain that the middle
number is a one-digit number (0–9), and the last number could be anything
between 0 and 45. If the lock is a 3-number lock with numbers 0 through 45,
what is the maximum number of tries needed to open it, assuming you don’t
repeat any combinations?
P (A) = 0.17
P (B) = 0.37
P (C) = 0.19
P (A ∩ B) = 0.07
P (B ∩ C) = 0.11
P (A ∩ B ∩ C) = 0.03
P (A ∪ B ∪ C) = 0.48
Exercise 2.23. Mystery probability. Suppose there are 3 events such that
P (A) = 0.20
P (B) = 0.10
P (C) = 0.40
P (A ∩ B) = 0.05
P (A ∩ C) = 0.10
P (B ∩ C) = 0.03
P (A ∩ B ∩ C) = 0.01
2.6.3 Advanced
Exercise 2.26. Prove Theorem 2.25.
Exercise 2.27. Is the whole smaller than the sum of the parts?
a. It is always true, for any events A, B, that P (A ∪ B) ≤ P (A) + P (B).
Why? Explain briefly with words or a very clear picture.
b. Is it always true that P (A ∪ B ∪ C) ≤ P (A) + P (B) + P (C)? If so,
explain why, either using words or a very clear picture. If not, please give a
counterexample.
Exercise 2.28. Grabbing a pen. You find a container of 27 old pens in your
school supplies and test them one at a time (without replacement) until you find
one that works. If each individual pen works 25% of the time (regardless of the
other pens), what is the probability that you find one that works within the
first four tries?
Exercise 2.29. Die rolls. You roll a die three times. What is the probability
the sum of the first two rolls is equal to the third roll?
Exercise 2.30. Cookies. Consider a jar of 9 chocolate chip and 11 peanut
butter cookies. You randomly select 2 cookies to eat. All possible choices are
equally likely.
a. What is the probability that the 2 you select will both be chocolate chip?
b. What is the probability that at least one of your cookies will be peanut
butter?
c. What is the probability that the last 2 cookies left in the jar (after 18 have been eaten) will be chocolate chip? (Is this answer the same as or different from part a? Why or why not?)
Exercise 2.31. Seating arrangements. Alice, Bob, Catherine, Doug, and
Edna are randomly assigned seats at a circular table in a perfectly circular
room. Assume that rotations of the table do not matter, so there are exactly
24 possible outcomes in the sample space.
Bob and Catherine are married. Doug and Edna are married.
Let Aj denote the event that exactly j of the married couples are happy
because they are sitting together. Find P (A0 ) and P (A1 ) and P (A2 ).
Exercise 2.32. Socks. In your drawer you have 10 white socks, 6 black socks,
4 red socks, and 2 purple socks. Your roommate is still asleep, and you can’t
turn the light on while you’re getting dressed. You reach in blindly and grab
two socks. What is the probability of pulling out a matching pair of purple
socks?
Exercise 2.33. Maximum of three dice. Roll three distinguishable dice
(e.g., assume that there is a way to tell them apart, for instance, that the dice
are three different colors). There are 6 × 6 × 6 = 216 possible outcomes.
Let Bk be the event that the maximum value that appears on all three dice
when they are rolled is less than or equal to k. Find P (B1 ), P (B2 ), P (B3 ),
P (B4 ), P (B5 ), and P (B6 ). If you prefer, you are welcome to just give a general
formula that covers all six of these cases, i.e., you are welcome to just give a
formula for P (Bk ) itself.
Exercise 2.34. If A1 , A2 , . . . , An is a collection of events, is it always true that
$$P\Big(\bigcup_{k=1}^{n} A_k\Big) \le \sum_{k=1}^{n} P(A_k)?$$
Chapter 3

Independent Events
The word probability, in its mathematical acceptation, has reference to the state
of our knowledge of the circumstances under which an event may happen or fail.
—Collected Logical Works, Volume 2: The Laws Of Thought by George
Boole (Walton and Maberly, 1854)
3.1 Introduction
We have an intuitive understanding of the word “independence”: Two events A
and B are independent if the occurrence of one of the events does not affect
the probability of occurrence of the other event. This is exactly right, but
we need the concept of conditional probabilities (to be covered in Chapter 4),
to use this viewpoint. In the present chapter, we define events A and B as
independent if the probability that A and B both occur equals the probability
that A occurs times the probability that B occurs. We will also discuss the
notion of independence among more than two events. Afterwards, we will give
examples of dependent events, as well as a very general fact about sequences
of independent attempts, in which we are waiting for the first “good” result to
occur.
Events A and B are independent if

P(A ∩ B) = P(A)P(B).

Events A and B are dependent if

P(A ∩ B) ≠ P(A)P(B).
Example 3.3. Consider the birth of two children from two separate pregnancies
(in particular, we are not considering the birth of twins, in which one baby’s
sex might affect the other).
If A is the event that the first baby is a girl, and B is the event that the
second baby is a girl, then P (A) = 1/2, and P (B) = 1/2, and P (A ∩ B) = 1/4,
so P (A ∩ B) = P (A)P (B). Thus, events A and B are independent. This
matches our traditional understanding of the word “independent,” because the
sex of the first baby does not affect the sex of the second baby.
Let C denote the event that both children are girls. Then P (A ∩ C) = 1/4
but P (A)P (C) = 1/8, so A and C are dependent (intuitively, if C happens,
then A must happen).
E.g., consider the outcome “girl, girl” in Example 3.3, which is found in both
events A and B, and thus in A ∩ B too. So A and B are independent but are
not disjoint.
More generally, consider the picture in Figure 3.1, for two different situations.

[Figure 3.1: two Venn diagrams inside a sample space S; left, disjoint events A and B; right, overlapping events A and B.]

If A and B are disjoint and both have positive probability, then P(A ∩ B) = 0 while P(A)P(B) > 0, so such disjoint events are dependent. Disjoint events can only be independent in the degenerate case

P(A)P(B) = 0,

in which case

P(A ∩ B) = 0 = P(A)P(B).
Example 3.8. When rolling a die, let A denote the event consisting of outcomes
{1, 2, 3}, and let B denote the event consisting of outcomes {3, 4}, so P (A) =
1/2 and P (B) = 1/3. Also A ∩ B = {3}, so P (A ∩ B) = 1/6. So P (A ∩ B) =
P (A)P (B), and this means that A and B are independent.
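Computations like this are easy to double-check by brute-force enumeration. Here is a minimal sketch in Python (the helper name prob is ours, chosen only for this illustration), using the events from Example 3.8:

    from fractions import Fraction

    # Sample space for one roll of a fair die; outcomes are equally likely.
    S = [1, 2, 3, 4, 5, 6]
    A = {1, 2, 3}   # event A from Example 3.8
    B = {3, 4}      # event B from Example 3.8

    def prob(event):
        # Probability of an event under equally likely outcomes.
        return Fraction(sum(1 for s in S if s in event), len(S))

    print(prob(A), prob(B), prob(A & B))     # 1/2 1/3 1/6
    print(prob(A & B) == prob(A) * prob(B))  # True, so A and B are independent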
Example 3.11. Consider the songs from Exercise 2.1. Suppose that songs are
chosen in such a way that each song is chosen at random, and repetitions are
allowed, and every outcome is equally likely (an “outcome” is a particular song,
not a genre).
(The collection contains 27,333 songs altogether, with counts by genre as follows.)
1032 Alternative         83 Electronic        56 Metal
 330 Blues              508 Folk            2718 Other
 275 Books & Spoken     183 Gospel          1786 Pop
1468 Children’s Music    82 Hip-Hop          403 R&B
 921 Classical          564 Holiday         8286 Rock
6169 Country            537 Jazz            1432 Soundtrack
 178 Easy Listening     106 Latin            216 World
Let A be the event that the first song is either blues or jazz. Let B be the
event that the second song is jazz. Let C be the event that the third song is
blues or rock.
Notice A and B are independent. Also, A and C are independent. Also, B
and C are independent. In the scenario when song repetitions are allowed, the
type of one song does not affect the types of other songs.
P (A ∩ B) = P (A)P (B)
P (A ∩ C) = P (A)P (C)
P (B ∩ C) = P (B)P (C)
P (A ∩ B ∩ C) = P (A)P (B)P (C)
Specifically,
P (A ∩ B) = ((330 + 537)/27,333)(537/27,333) = P (A)P (B),
P (A ∩ C) = ((330 + 537)/27,333)((330 + 8286)/27,333) = P (A)P (C),
P (B ∩ C) = (537/27,333)((330 + 8286)/27,333) = P (B)P (C),
P (A ∩ B ∩ C) = ((330 + 537)/27,333)(537/27,333)((330 + 8286)/27,333) = P (A)P (B)P (C).
Example 3.14. Consider a student who flips twenty coins in a row. Let Aj
denote the event that the jth coin shows a head. Then the events A1 , . . . , A20
are independent.
Example 3.15. Consider a student who flips coins for an arbitrarily long
amount of time. As before, let Aj denote the event that the jth coin shows
a head. Again, the individual coin flips do not impact each other, so the collec-
tion of all of the Aj ’s is independent.
Example 3.16. If Aj represents the event that there are two or more errors
on the jth page of a book, then the collection of Aj ’s is perhaps independent,
because the errors on the individual pages of a book should not affect the errors
that occur on other pages of the book.
Example 3.17. The lifetimes of 100 randomly selected light bulbs are mea-
sured. Let Aj denote the event that the jth bulb lasts for at least 60 days.
Then the collection of 100 events, A1 , . . . , A100 , is independent.
Example 3.19. A student flips a coin until the tenth head appears. See Fig-
ure 3.2. Let A denote the event that at least 3 flips are needed between the 7th
and 8th heads; let B denote the event that at least 3 flips are needed between
the 8th and 9th heads. Then A and B are independent. The coin flips are trials.
H T T T ··· H T T T ··· H
7th head 8th head 9th head
Figure 3.2: The number of flips between the 7th and 8th heads does not affect
the number of flips between the 8th and 9th heads.
If A and B are independent, then B and Ac are also independent; that is,
P (B)P (Ac ) = P (B ∩ Ac ).
Example 3.22. Two randomly chosen people are selected from a large college
campus, and their heights are measured.
Let A denote the event that the height of the first person is 70 inches or
greater; let B denote the event that the height of the second person is less than
68.5 inches. Then A and B are independent.
Let C denote the event that the first student’s height is less than 68.5 inches.
Then A and C are disjoint, so by Remark 3.5, A and C are dependent too.
3.3 Probability of Good Occurring before Bad

Consider a sequence of independent trials in which each trial results in apple
(“good”) with probability p, orange (“bad”) with probability q, and neither
with probability 1 − p − q. Let An denote the event that the nth trial is apple
and none of the earlier trials are apple or orange. Then we are looking for
P (⋃_{n=1}^∞ An ).
Notice that the An ’s are disjoint. If A3 occurs, then apple first appears on
the 3rd trial, so apple cannot appear for the first time on the 1st or the 2nd
trial, and neither A1 nor A2 can occur. In general, apple cannot happen for
the first time on two different trials! Since the An ’s are disjoint, this gives
P (⋃_{n=1}^∞ An ) = ∑_{n=1}^∞ P (An ).
Also, An occurs if n − 1 neutral trials are followed by a good one, so
P (An ) = (1 − p − q)^{n−1} p.
So the desired probability is
P (⋃_{n=1}^∞ An ) = ∑_{n=1}^∞ (1 − p − q)^{n−1} p = p · 1/(1 − (1 − p − q)) = p/(p + q).
So the probability something good happens before something bad happens is
p/(p + q).
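The answer p/(p + q) is also easy to check by simulation; here is a minimal sketch in Python, with the illustrative values p = 0.2 and q = 0.3 (ours, not from the text), for which the exact answer is 0.2/0.5 = 0.4:

    import random

    def good_before_bad(p, q, trials=100_000, seed=0):
        # Estimate the probability that a "good" result (probability p)
        # occurs before a "bad" result (probability q); all other
        # attempts are neutral and the trials simply continue.
        rng = random.Random(seed)
        good_first = 0
        for _ in range(trials):
            while True:
                u = rng.random()
                if u < p:            # good happens on this attempt
                    good_first += 1
                    break
                elif u < p + q:      # bad happens on this attempt
                    break
                # otherwise the attempt was neutral; try again
        return good_first / trials

    print(good_before_bad(0.2, 0.3))  # close to p/(p + q) = 0.4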
3.4 Exercises
3.4.1 Practice
Exercise 3.1. Graduation. Jack and Jill are each struggling to pass the one
remaining class required for graduation. Jack needs to pass Calculus III, but
he only has probability 0.30 of passing. Jill needs to pass Advanced
Pharmaceuticals, but she only has probability 0.46 of passing. They work
independently. What is the probability that at least one of them gets a diploma?
Exercise 3.2. Japanese pan noodles. Ten students order noodles at a cer-
tain local restaurant. Their orders are placed independently. Each student is
known to prefer Japanese pan noodles 40% of the time (it is a very popular and
tasty dish!).
a. What is the probability that all ten of the students order Japanese pan
noodles?
b. What is the probability that none of the students order Japanese pan
noodles?
c. What is the probability that at least one of the students orders Japanese
pan noodles?
Exercise 3.3. Off to the races. Suppose Mike places three separate bets on
three separate horse races at three separate tracks. Each bet is for a specific
horse to win. His horse in race 1 wins with probability 1/5. His horse in race 2
wins with probability 2/5. His horse in race 3 wins with probability 3/5. What
is the probability that he made the correct bet in exactly one of these three
races?
Exercise 3.4. Early class. Consider these 3 independent trials: On Monday
you wake up 45 minutes before class, and the probability that you get to class
on time is 0.98. On Tuesday you wake up 32 minutes before class, and your
chance of being on time is 0.71. On Wednesday you wake up very, very late,
and your probability of being on time is only 0.16.
a. What is the probability that you were on time to class all 3 days?
b. What is the probability that you were never on time?
c. What is the probability that you were on time at least 1 day?
Exercise 3.5. Home for the holidays. A holiday flight from New York to
Indianapolis has a probability of 0.75 each time it flies (independently) of taking
less than 4 hours.
a. What is the probability that at least one of 3 flights arrives in less than
4 hours?
b. What is the probability that exactly 2 of the 3 flights arrive in less than
4 hours?
3.4.2 Extensions
Exercise 3.6. Hoops. Your sister is playing basketball. She makes 4 tosses to
a lowered basketball hoop, and whether the ball goes in each time is independent
of the other trials. Her chance of making the basket on a trial is 60%.
For each j with 0 ≤ j ≤ 4, what is the probability that she makes exactly j
baskets?
Exercise 3.7. Abstract art. A painter has three different jars of paint colors
available, in colors green, yellow, and purple. She wants to paint something
abstract, so she blindfolds herself, randomly dips her brush, and paints on the
canvas. She continues trying paint jars, without replacement, until all three
have been used. (Her assistant helps with this blindfolded process!) So sample
space S is
S = {(G, P, Y ), (G, Y, P ), (P, G, Y ), (P, Y, G), (Y, G, P ), (Y, P, G)}.
Let A be the event that purple is found in the second jar tested by the painter.
Let B be the event that green is found before yellow. Are events A and B
independent?
Exercise 3.8. Even versus four or less. Roll a die. Let A be the event that
the outcome on the die is an even number. Let B be the event that the outcome
on the die is 4 or smaller. Let C be the event that the outcome on the die is 3
or larger.
a. Are A and B independent?
b. Are B and C independent?
Exercise 3.9. Vegetarian dilemma. In a very large collection of sandwiches,
40% are cheese, 45% have steak, and 15% have tofu. A person is vegetarian
and therefore samples the sandwiches randomly until finding a cheese or tofu
sandwich. What is the probability that they find a cheese sandwich before
finding a tofu sandwich?
Exercise 3.10. Guessing on an exam. While taking a probability exam,
you come to three questions that you have no clue how to answer. You would
have known the answers if you had taken the time to study the night before
instead of going to a party, but you did not make a good life choice, and you
vow to never party on a school night again if you fail this exam. Each question
on the exam is multiple choice with the correct answer being either a, b, c, d,
or e. (Your guesses are independent.)
What is the probability that:
a. you randomly guess the right answer to all three questions?
b. you randomly guess the right answer to none of the three questions?
c. you randomly guess the right answer to exactly one of the three questions?
d. you randomly guess the right answer to exactly two of the three questions?
e. Do the probabilities in parts a–d sum to 1?
3.4.3 Advanced
Exercise 3.11. Can the sum be greater than 1? Is it possible to have two
independent events A and B with the property that P (A) + P (B) > 1?
Exercise 3.13. Political survey. On a large campus, 53% of the students are
Democrats, and 47% are Republicans. A political survey is conducted. Assume
that the students respond independently. How many students are needed, so
that the probability of at least 1 Democratic participant exceeds 99%?
Chapter 4
Conditional Probability
Your dad is visiting you at college, and you have taken him to lunch in your
dorm’s dining hall to give him the full college student experience. Your dad
is a bit of a health nut and loves vegetables, but he thinks some vegetables
pair better than others. For example, he’s more likely to put broccoli and
cauliflower together than broccoli and green beans. Your cafeteria has a fairly
extensive selection of vegetables available in the lunch buffet. If you know
your dad already picked up cauliflower, what’s the chance he will also pick up
broccoli? How does knowing that your dad already picked up cauliflower change
the probability that he will pick up broccoli compared to when you walked in
the door before he had selected any vegetables?
4.1 Introduction
When we have some additional information about a random phenomenon, we
can take advantage of the concept of conditional probability. When we know
(or assume) something about a random phenomenon in advance, it allows us to
essentially shrink the sample space to a smaller set of possible outcomes. This
fundamentally alters the probabilities. Consider the following example:
If A is the event that the randomly selected student gets a job, and B is the
event that the randomly selected student is a French major, the conditional
probability of A given B is written as P (A | B). In P (A | B), the bar
is read as “given.”
The reason for conditional probability is that we sometimes have additional
information that we want (or need) to incorporate into the problem. We denote
conditional probability as follows:
If event B has nonzero probability (i.e., P (B) > 0), then the
conditional probability P (A | B) of A given B is defined as
P (A | B) = P (A ∩ B)/P (B).
Equivalently,
P (A ∩ B) = P (B)P (A | B).
Remark 4.3. Consider event B with P (B) > 0. Recall A and B are inde-
pendent exactly when P (A)P (B) = P (A ∩ B), i.e., exactly when
P (A) = P (A ∩ B)/P (B),
but the right-hand side is always equal to the conditional probability P (A | B).
So A and B are independent exactly when
P (A) = P (A | B),
i.e., when B’s occurrence does not affect the probability of A occurring.
Theorem 4.4. If P (B) > 0, then A and B are independent if and only if
P (A) = P (A | B), i.e., when B’s occurrence does not affect the probability of
A occurring.
Example 4.5. When a die is rolled, let B be the event that the outcome is
“odd.” Then, for example, each of the odd outcomes has conditional probability
(1/6)/(1/2) = 1/3, given B.
Since we know that B occurs, the sample space has essentially been reduced
from the original sample space {1, 2, 3, 4, 5, 6}, with 6 outcomes, to the
three odd outcomes {1, 3, 5}.
All of the probabilities from the original model are scaled by a factor of
1/P (B) to get the conditional probabilities.
Example 4.6. Suppose that a friend will call one time during the next 60
minutes. We measure (in minutes) the waiting time until she calls. Let A =
{x | x ≤ 30}, and let B = {x | x ≤ 10}.
Given that A occurs, the probability of B occurring is 1/3 (the 10 minutes of
B out of the 30 minutes of A); in other words, P (B | A) = 1/3.
Figure: The events B = {x | x ≤ 10} and A = {x | x ≤ 30}, shown on the
interval of waiting times from 0 to 60 minutes.
Once we know that A occurs, we can ignore any of the outcomes that are
bigger than 30. Also, conditional probabilities satisfy all of the requirements of
probabilities, as we will see at the end of this chapter. So, given that A occurs,
if we know B occurs 1/3 of the time, it must be the case that B c occurs the
other 2/3 of the time. Thus, the conditional probability of B not occurring
(given A occurred) must be 2/3, i.e.,
P (B c | A) = 2/3.
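A conditional probability like P (B | A) = 1/3 can also be estimated directly by simulation. Here is a minimal sketch in Python, assuming the calling time is equally likely to fall anywhere in the interval from 0 to 60 minutes:

    import random

    rng = random.Random(1)
    count_A = 0    # occurrences of A = {x | x <= 30}
    count_AB = 0   # occurrences of both A and B = {x | x <= 10}

    for _ in range(100_000):
        x = rng.uniform(0, 60)   # the waiting time until the call
        if x <= 30:
            count_A += 1
            if x <= 10:
                count_AB += 1

    # P(B | A) is estimated by the fraction of A-outcomes that also lie in B.
    print(count_AB / count_A)    # close to 1/3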
Example 4.7. On a dartboard let C = {9, 12, 5, 20, 1, 18, 4} be the event that
the dart lands in the upper portion of the dartboard. If A = {9, 12, 5}, we have
P (A | C) = 3/7, as in Figure 4.1.
Figure 4.1: The dartboard, with regions numbered 1 through 20 (a dart could
also miss the dartboard altogether). Event C = {9, 12, 5, 20, 1, 18, 4}, the
upper portion of the dartboard, is shaded.
We can also consider some events that are not completely within C. For
instance, say B = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Some of B overlaps with C and
some does not; this is depicted in Figure 4.2.
Figure 4.2: Event C is shaded in blue; event B is shaded with lines; so event
B ∩ C is the blue part that is shaded with lines.
Given that C occurs, we have B ∩ C = {1, 4, 5, 9}, so P (B | C) = 4/7.
So the outcomes 8, 7, 3, 2, 10, 6 from B were essentially not used in this solution.
These are not possible outcomes, because they are outside C. Event C can be
viewed as the (new) sample space, when calculating probabilities conditioned
on C.
Example 4.8 demonstrates that we cannot swap the events A and B in the
conditional probability P (A | B).
Now we calculate
P (A | B) = P (A ∩ B)/P (B) = 0.05/0.42 = 5/42 ≈ 0.1190.
Figure: Events A and B in the sample space S, with P (A \ B) = 0.05,
P (A ∩ B) = 0.05, and P (B \ A) = 0.37, so that P (A) = 0.10 and P (B) = 0.42.
This is a very small probability. Intuitively: The event B is very large, compared
to A ∩ B. So if we know that B has occurred, the probability of A ∩ B is very
small. On the other hand,
P (B | A) = P (A ∩ B)/P (A) = 0.05/0.10 = 1/2.
4.2 Distributive Laws

Unions and intersections of events distribute over each other: for events Aj and B,
(⋃_j Aj ) ∩ B = ⋃_j (Aj ∩ B),
and
(⋂_j Aj ) ∪ B = ⋂_j (Aj ∪ B).
The unions and intersections over j can be finite, written as ⋃_{j=1}^n Aj , or can
be infinite, ⋃_{j=1}^∞ Aj .
To see the second law, notice that an outcome is in (⋂_j Aj ) ∪ B exactly when:
1. the outcome is in all of the Aj ’s, or
2. the outcome is in B, or
3. both are true, i.e., the outcome is in all of the Aj ’s and also in B.
This is the same as the requirement for an outcome to be in ⋂_j (Aj ∪ B); the
outcome must be in B, or if not in B, it must be in all of the Aj ’s.
For example, for j = 1, 2, 3, let Aj be the event that a student pursues the
jth of three majors, and let B be the event that the student is on the
volleyball team. Event (⋃_{j=1}^3 Aj ) ∩ B occurs if and only if the student
pursues at least one of these three majors and is also on the volleyball team;
this is the same event as ⋃_{j=1}^3 (Aj ∩ B). Event (⋂_{j=1}^3 Aj ) ∪ B occurs
if and only if the student pursues all three of these majors or is on the
volleyball team (or both); this is the same event as ⋂_{j=1}^3 (Aj ∪ B).
Remark 4.12. The probabilities of the form P (A) that we have studied in
Chapters 2 and 3 were unconditional. They did not assume that any event
occurred. Such a probability could be treated as “conditional” if we just make
S the condition, i.e., P (A) is the same as P (A | S).
The intuitive way to think about this is that, when B is a given event
throughout a problem or scenario, it is just the same as if B is written into the
fabric of what is known. So everything is conditioned on B. The event B can
just be viewed as replacing the original sample space S.
The events Aj ∩ B are disjoint since the Aj ’s are disjoint. So
P (⋃_{j=1}^∞ (Aj ∩ B)) = ∑_{j=1}^∞ P (Aj ∩ B). So we conclude that
P (⋃_{j=1}^∞ Aj | B) = ∑_{j=1}^∞ P (Aj ∩ B) / P (B) = ∑_{j=1}^∞ (P (Aj ∩ B)/P (B)) = ∑_{j=1}^∞ P (Aj | B).
Example 4.14. When rolling a die, let A denote the event consisting of out-
comes {3, 5}, and let B denote the event consisting of outcomes {1, 2, 3}. Find
P (A | B) and P (Ac | B).
Example 4.15. When rolling a die, let A denote the event consisting of out-
comes {1, 2, 3}, and let B denote the event consisting of outcomes {3, 4}.
Example 4.16. Let A denote the event that the amount of rain on July 1,
2011, in Lafayette, Indiana, is 0.10 inches or more. Let B denote the event
that the amount of rain on July 1, 2012 (i.e., one year later), in Lafayette, is
0.10 inches or more. The occurrence of A should not affect the likelihood of
occurrence of B, so P (B | A) = P (B), and thus A and B are independent.
4.4 Exercises
4.4.1 Practice
Exercise 4.1. Pick ten songs. As in Example 2.27, a student hears ten songs
(in a random shuffle mode) and uses F to denote when a song belongs to her
favorite type of music, and N for not-favorite, so the sample space consists of
all ten-tuples of F ’s and N ’s. For each song, the probability that the song is
one of her favorite type can be called p, and the probability that the song is not
one of her favorite type is 1 − p. Define
so that, for instance, P (A) = (1 − p)^3 , and P (B) = P (C) = p^5 . Find the
following conditional probabilities:
Exercise 4.2. Dining with Dad. Consider the events from Exercise 2.4, i.e.,
at a random meal during a parent weekend in the dining hall, a student notices
the food chosen by her father. Let A, B, C be the events that his meal includes
Artichokes, Broccoli, or Cauliflower, respectively. These events have the property that:
P (A) = 0.35; P (B) = 0.39; P (C) = 0.44; P (A ∩ B) = 0.13; P (A ∩ C) =
0.12; P (B ∩ C) = 0.13.
Find the following conditional probabilities:
P (B | C), P (C | B), P (A | B), P (B | A), P (A | C), P (C | A).
Exercise 4.3. Songs. Consider again the songs from Example 3.11, when one
song is chosen at random. Let B, J, R denote the events that the song
is a blues, jazz, or rock song, respectively.
Exercise 4.4. Golf. In the PGA, on par 3 holes, golfers hit the green in one
shot 80% of the time. In fact, 20% of the time, they hit the green in one shot
and then need only one putt to complete the hole; so 60% of the time, they hit
the green in one shot but are unsuccessful on their putt. What is the probability
that a PGA golfer only needs one putt, given that he hits the green in one shot?
Exercise 4.5. Parity of spinning. A spinner has the left side (numbers 1, 2,
3, 4, and 5) colored red and the right side colored white (numbers 6, 7, 8, and
9), with all numbers equally likely.
Exercise 4.6. Conditioning on cards. Draw one card from a shuffled deck
of 52 cards.
a. What is the probability that the card is a spade if you know the card is
a 7?
b. What is the probability that the card is a spade if you know the card is
black?
c. What is the probability that the card is a 7 if you know the card is a
spade?
d. What is the probability that the card is a 7 if you know the card is black?
e. What is the probability that the card is black if you know the card is a
spade?
f. What is the probability the card is black if you know the card is a 7?
Exercise 4.7. Puppets. You are previewing movies for your young nephew.
You have 1,284 movies available to view, 272 of which are G-rated. Your nephew
enjoys movies with puppets, which make up 94 of your G-rated movies. There
are only 30 movies with puppets that are not G-rated. If you happen to pick a
movie that has puppets, what is the probability that it is G-rated?
4.4.2 Extensions
Exercise 4.8. Dice. You roll two dice. Let A be the event that the sum of the
dice is an even number. Let B be the event that the two results are different.
If B has occurred, what is the probability A has also occurred?
Exercise 4.9. Pair of dice. Roll a pair of dice. Given that the two dice have
different values, find the probability that the sum of the dice is an even number.
Exercise 4.10. Pair of dice. Roll a pair of dice. Given that the sum of the
pair of dice is 9 or larger, find the probability that the sum of the pair of dice
is exactly 10.
Exercise 4.11. Random sexes. A couple has two children. At least one is a
boy. What is the probability that the couple has one child of each sex?
Exercise 4.12. More random sexes. A couple has three children. They are
not all girls.
4.4.3 Advanced
Exercise 4.13. Seating arrangements. Alice, Bob, Catherine, Doug, and
Edna are randomly assigned seats at a circular table in a perfectly circular
room. Assume that rotations of the table do not matter, so there are exactly 24
possible outcomes in the sample space. Bob and Catherine are married. Doug
and Edna are married. Given that Bob and Catherine are sitting next to each
other, find the conditional probability that Doug and Edna are sitting next to
each other.
Exercise 4.14. Choose a page at random.
a. Given that at least one of the digits on the chosen page is a 5, find the
probability that the page is 255.
b. Given that at least two of the digits on the chosen page are 5’s, find the
probability that the page is 255.
Exercise 4.15. Even more random sexes. A couple has n children, where
n ≥ 1 is fixed. They are not all girls. What is the probability of exactly j boys
(where j ≥ 1)?
Chapter 5
Bayes’ Theorem
We all know that using cell phones while driving is dangerous, but many people
do it anyway. If an accident occurs, the first question an insurance company asks
the driver is, “Was anybody using a cell phone when the accident occurred?”
What is the probability of a randomly selected driver having an accident in the
month of September? How does that probability change if we know that driver
is a “regular” cell phone user? If a driver has an accident, what is the chance
the driver is a “regular” cell phone user? Why are these last two probabilities
not the same?
Bayes’ Theorem, in its original form: if P (A) > 0 and P (B) > 0, then
P (A | B) = P (A)P (B | A)/P (B).
Example 5.2. In a certain household, 20% of the milk has two-percent milkfat,
and the other 80% of the milk is whole milk. The whole milk is spoiled 5% of
the time; overall, the milk is spoiled 4.7% of the time. Find the conditional
probability that, if you just poured a spoiled cup of milk from a random jug, it
is whole milk.
Let A denote the event that the milk is whole milk, and let B denote the event
that the milk was spoiled. Then we want to find P (A | B). We are given
P (B | A) = 0.05, i.e., the probability that whole milk is spoiled is 5%.
We are also told that P (A) = 0.80 and that P (B) = 0.047.
So we get
P (A | B) = P (A)P (B | A)/P (B) = (0.80)(0.05)/0.047 = 0.85.
Often we are not given P (B) directly, and we have to decompose P (B) by
writing:
P (B) = P (A ∩ B) + P (Ac ∩ B)
= P (A)P (B | A) + P (Ac )P (B | Ac )
Example 5.3. In a recent study about driving safety, drivers were classified
as either “regularly” or “rarely” using a cell phone while driving. Each person
who regularly talks on a cell phone while driving has a probability of 1/250 of
causing an accident in September. Each person who rarely talks on a cell phone
while driving has a probability of 1/2000 of causing an accident in September.
Also, 35% of people were classified as regularly using a cell phone while driving.
Suppose that a randomly selected person causes an accident in September.
What is the probability that the person is a “regular” cell phone user while
driving?
We write A for the event that the person who caused the accident was a
“regular” user, and B for the event that the person selected causes an accident
in September. We want to know P (A | B). We are given P (A) = 0.35,
P (B | A) = 1/250, and P (B | Ac ) = 1/2000. By Bayes’ Theorem,
P (A | B) = P (A)P (B | A)/P (B),
and we now have all of the pieces except P (B). There are 2 ways in which an
accident can happen: with a driver who regularly uses a cell phone or with a
driver who rarely uses a cell phone. We need to combine these possibilities for
the overall probability that there will be an accident.
We can compute P (B) by writing
P (B) = P (A ∩ B) + P (Ac ∩ B)
= P (A)P (B | A) + P (Ac )P (B | Ac )
= (0.35)(1/250) + (0.65)(1/2000)
= 0.001725
Thus,
P (A | B) = P (A)P (B | A)/P (B) = (0.35)(1/250)/0.001725 = 0.81.
So, given that a randomly selected person causes an accident in September, that
person is a “regular” user with probability 0.81.
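The arithmetic in Example 5.3 follows a pattern worth capturing once; here is a minimal sketch in Python (the function name bayes_two_parts is ours, not the book’s):

    def bayes_two_parts(p_A, p_B_given_A, p_B_given_Ac):
        # P(A | B) when the sample space splits into A and its complement.
        p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_Ac  # total probability
        return p_A * p_B_given_A / p_B

    # Example 5.3: P(A) = 0.35, P(B | A) = 1/250, P(B | A^c) = 1/2000.
    print(bayes_two_parts(0.35, 1/250, 1/2000))  # 0.8115..., i.e., about 0.81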
There are several alternative formulations which can be derived directly from
Bayes’ Theorem. The first one is a generalization of the example about driver
safety.
Remark 5.4. We can decompose B into two parts: A ∩ B and B \ A = Ac ∩ B.
Since B = (A ∩ B) ∪ (Ac ∩ B) is a disjoint union, then if P (A) is not 0 or 1,
P (B) = P (A)P (B | A) + P (Ac )P (B | Ac ).
Substituting this into the denominator of the original statement of Bayes’ The-
orem, we get the following alternative formulation:
P (A | B) = P (A)P (B | A) / (P (A)P (B | A) + P (Ac )P (B | Ac )).
This is a really nice version because we can now compute P (A | B) using only
three numbers: P (A), P (B | A), and P (B | Ac ). The last piece of information,
P (Ac ), we can get for free, because we always have P (Ac ) = 1 − P (A).
Example 5.6. A student plays songs from one of two playlists: 40% of the time
he picks his roommate’s playlist, and the other 60% of the time he picks his
girlfriend’s playlist. Of the songs on the roommate’s playlist, 62% are from
the student’s favorite type of music; of the songs on the girlfriend’s
playlist, 88% are. Given that a song of his favorite type is playing, what is
the probability that he chose his roommate’s playlist?
Let A and Ac be the events that the song was chosen from his roommate’s or
girlfriend’s playlist, respectively. Let B be the event a song is from the student’s
favorite type of music. We want to find P (A | B). We are given P (A) = 0.40,
P (B | A) = 0.62, and P (B | Ac ) = 0.88. So
P (A | B) = P (A ∩ B)/P (B)
          = P (A ∩ B)/(P (A ∩ B) + P (Ac ∩ B))
          = P (A)P (B | A)/(P (A)P (B | A) + P (Ac )P (B | Ac ))
          = (0.40)(0.62)/((0.40)(0.62) + (0.60)(0.88))
          = 0.32.
Thus, given that his favorite type of music is playing, he chose his roommate’s
playlist 32% of the time.
In Example 5.6, we split the sample space into two pieces, e.g., whether a song
was chosen from the playlist of a person’s roommate or girlfriend, or whether
the randomly chosen person regularly or rarely uses the cell phone while driving.
Sometimes it is helpful to be able to split the sample space into more than two
groups. Here is an example:
Example 5.7. Suppose that 10% of monitors are of type 1, 30% are of type 2,
and 60% are of type 3. A type 1 monitor lasts over 5 years with probability
0.8; a type 2 monitor, with probability 0.7; and a type 3 monitor, with
probability 0.3.
Let B be the event that a randomly chosen monitor lasts over 5 years. Let Aj
be the event that a monitor is of type j. We want P (Aj | B). We are given
P (A1 ) = 0.1, P (A2 ) = 0.3, P (A3 ) = 0.6, and P (B | A1 ) = 0.8,
P (B | A2 ) = 0.7, P (B | A3 ) = 0.3.
The probability that a randomly chosen person’s monitor lasts over 5 years is
P (B) = (0.1)(0.8) + (0.3)(0.7) + (0.6)(0.3) = 0.47.
Given that a randomly chosen person’s monitor lasts over 5 years, the proba-
bilities that it is of type 1, or type 2, or type 3 (respectively) are:
P (A1 | B) = P (A1 )P (B | A1 )/P (B) = (0.1)(0.8)/0.47 = 0.17
P (A2 | B) = P (A2 )P (B | A2 )/P (B) = (0.3)(0.7)/0.47 = 0.45
P (A3 | B) = P (A3 )P (B | A3 )/P (B) = (0.6)(0.3)/0.47 = 0.38
The probabilities 0.17, 0.45, and 0.38 must add to 1 because the three possible
events A1 ∩ B, A2 ∩ B, A3 ∩ B are disjoint and their union is all of B.
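The same computation, written once for any finite partition, is a short loop; here is a minimal sketch in Python, using the monitor numbers above:

    # Prior probabilities P(A_j) of the three monitor types, and the
    # conditional probabilities P(B | A_j) of lasting over 5 years.
    priors = [0.1, 0.3, 0.6]
    likelihoods = [0.8, 0.7, 0.3]

    p_B = sum(p * l for p, l in zip(priors, likelihoods))  # = 0.47
    posteriors = [p * l / p_B for p, l in zip(priors, likelihoods)]

    print(p_B)         # 0.47
    print(posteriors)  # approximately 0.17, 0.45, 0.38; the list sums to 1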
The following version of Bayes’ Theorem allows us to split the sample space
into a finite number of different pieces:
If A1 , A2 , . . . , An form a partition of S, then
P (B) = P (A1 ∩ B) + · · · + P (An ∩ B) = P (A1 )P (B | A1 ) + · · · + P (An )P (B | An ).
Substituting into the denominator of the first version of Bayes’ Theorem, we get:
Theorem 5.9. Bayes’ Theorem (Decomposition of Sample Space into
Finitely Many Parts)
If P (B) > 0, and if A1 , A2 , . . . , An form a partition of S, with all P (Aj ) > 0,
then
P (Ak | B) = P (Ak )P (B | Ak ) / (P (A1 )P (B | A1 ) + P (A2 )P (B | A2 ) + · · · + P (An )P (B | An )).
Thus, to compute P (Ak | B) we only need the P (Ak )’s and P (B | Ak )’s.
Finally, sometimes we need to split the sample space into infinitely many
pieces. A version of Bayes’ Theorem is possible for this situation too:
Remark 5.10. Let A1 , A2 , . . . be a partition of the sample space (i.e.,
⋃_{j=1}^∞ Aj = S and the Aj ’s are disjoint). Then we have the decomposition of B
into infinitely many parts, A1 ∩ B, A2 ∩ B, etc. So B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ · · ·
is a union of disjoint events. Thus
P (B) = P (A1 ∩ B) + P (A2 ∩ B) + · · · = ∑_{j=1}^∞ P (Aj ∩ B) = ∑_{j=1}^∞ P (Aj )P (B | Aj ).
Substituting this into the denominator of the original statement of Bayes’ The-
orem, we get the following alternative formulation:
Theorem 5.11. Bayes’ Theorem (Decomposition of Sample Space
into Infinitely Many Parts)
If P (B) > 0, and if A1 , A2 , . . . form a partition of S, with all P (Aj ) > 0, then
P (Ak | B) = P (Ak )P (B | Ak ) / ∑_{j=1}^∞ P (Aj )P (B | Aj ).
Example 5.12. Consider a game with two stages. In the first stage, a player
flips a fair coin until a head appears (usually this only requires a small number
of flips). Say that it takes j flips to get this first head. Then, in the second
stage, he randomly draws an integer between 1 and 2^j to get his winnings. E.g.,
if the player takes 3 flips to get a head, he wins an amount between 1 and 8.
Given that the player wins 1 dollar, what is the probability that the coin was
flipped exactly 6 times, i.e., that j = 6?
Let Aj be the event that exactly j coin flips are needed to get a head on the
coin. Let Bk be the event that the player wins k dollars. We want P (A6 | B1 ).
We have
P (A6 | B1 ) = P (B1 | A6 )P (A6 ) / ∑_{j=1}^∞ P (B1 | Aj )P (Aj ).
Also, P (Aj ) = 1/2^j . Given that Aj occurred, the player will win one of the
amounts between 1 and 2^j inclusive, and these 2^j outcomes are equally likely,
so the player wins 1 with probability 1/2^j . So we have
P (A6 | B1 ) = (1/2^6 )(1/2^6 ) / ∑_{j=1}^∞ (1/2^j )(1/2^j ) = (1/4^6 ) / ∑_{j=1}^∞ (1/4)^j .
Since ∑_{j=1}^∞ (1/4)^j = (1/4)/(1 − 1/4) = 1/3, we conclude that
P (A6 | B1 ) = (1/4^6 )/(1/3) = 3/4^6 = 0.000732.
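This small answer can also be estimated by simulating the game, though many plays are needed because winning exactly 1 dollar after exactly 6 flips is rare; a minimal sketch in Python:

    import random

    rng = random.Random(2)
    won_one = 0    # plays in which the player wins exactly 1 dollar
    six_flips = 0  # ...and in which the first head took exactly 6 flips

    for _ in range(1_000_000):
        flips = 1
        while rng.random() < 0.5:  # tail with probability 1/2; flip again
            flips += 1
        winnings = rng.randint(1, 2 ** flips)  # uniform on 1, ..., 2^flips
        if winnings == 1:
            won_one += 1
            if flips == 6:
                six_flips += 1

    print(six_flips / won_one)  # close to 3/4^6 = 0.000732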
5.2 Multiplication with Conditional Probabilities

The definition of conditional probability gives
P (A2 | A1 ) = P (A1 ∩ A2 )/P (A1 ),
or equivalently
P (A1 ∩ A2 ) = P (A1 )P (A2 | A1 ).
Example 5.17. Consider a student’s playlist that has 10 rock songs and 12
country songs. The student chooses at random—with all selections equally
likely, and no repeats—three songs from the playlist. What is the probability
that all three are country songs?
Let A1 , A2 , A3 be the event that the first, second and third songs, respectively,
are country songs. Then the desired probability is
The probability of A1 is 12/22 since there are initially 22 songs, 12 of which are
country, and all of the selections are equally likely.
Next, given that A1 occurs, there are 21 songs remaining—10 rock songs
and 11 country songs—so P (A2 | A1 ) = 11/21. Similarly, P (A3 | A1 ∩ A2 ) = 10/20,
so the desired probability is (12/22)(11/21)(10/20) = 1/7 ≈ 0.143.
Example 5.18. Consider any scenario in which there are N items of one type
and M items of another type, and we need to choose n items altogether. Suppose
all choices are equally likely. If we want to calculate the probability that all n
of the chosen items are of the second type, we can do this systematically.
We write A1 , A2 , . . . , An for the events that the first, second, third, . . . , nth items
are of the second type. Then we use the fact that
P (A1 ∩ A2 ∩ · · · ∩ An ) = P (A1 )P (A2 | A1 ) · · · P (An | A1 ∩ A2 ∩ · · · ∩ An−1 ).
As above, we have
P (A1 ) = M/(N + M ),
P (A2 | A1 ) = (M − 1)/(N + M − 1),
P (A3 | A1 ∩ A2 ) = (M − 2)/(N + M − 2),
and so on, down to
P (An | A1 ∩ A2 ∩ · · · ∩ An−1 ) = (M − (n − 1))/(N + M − (n − 1)).
Multiplying these together gives
P (A1 ∩ A2 ∩ · · · ∩ An ) = (M )(M − 1) · · · (M − n + 1) / ((N + M )(N + M − 1) · · · (N + M − n + 1)).
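The telescoping product is a natural loop; here is a minimal sketch in Python (the function name is ours), checked against Example 5.17, where N = 10 rock songs, M = 12 country songs, and n = 3:

    from fractions import Fraction

    def all_second_type(N, M, n):
        # Probability that all n items chosen (without replacement, all
        # choices equally likely) are of the second type, when there are
        # N items of the first type and M items of the second type.
        prob = Fraction(1)
        for i in range(n):
            prob *= Fraction(M - i, N + M - i)
        return prob

    print(all_second_type(10, 12, 3))  # 1/7, matching Example 5.17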
5.3 Exercises
5.3.1 Practice
Exercise 5.1. Car safety. A car safety institute observes that 12% of cars on
the road are manufactured by Honda. They also observe that 98% of Honda
vehicles are classified as “safe” at the time of inspection. The percentage of all
cars (regardless of brand) classified as “safe” is 72%. A car is randomly chosen
on the road and inspected. It is classified as “safe.” What is the probability
that it is a Honda?
Exercise 5.4. mp3 players. Eighty percent of all mp3 players are iPods. Five
percent of iPods are defective. Seven percent of all mp3 players are defective.
A randomly chosen mp3 player is taken to a repair shop because it is defective.
What is the probability it is an iPod?
Exercise 5.5. Chalkboards and dry-erase boards. In the Department
of Mathematical and Computational Science, 83% of lecture halls have chalk-
boards, and the other 17% have dry-erase boards. Of the classes taught in
rooms with chalkboards, 75% are mathematics courses, 15% are computer sci-
ence courses, and 10% are statistics courses. Of the classes taught in rooms
with dry-erase boards, 65% are computer science courses, 8% are mathemat-
ics courses, and 27% are statistics courses. What percentage of courses at the
college are mathematics or statistics?
Exercise 5.6. Math and art. In the theory of general intelligence, it is stated
that being good in one intelligence, like math, increases the chance of one being
good in another intelligence. Suppose 10% of the people are good at art, and
that 40% of the people who are good at art are also good at math. If a person
is not good at art, they have only a 30% chance of being good at math. What
is the probability that a person who is good at math will also be good at art?
Exercise 5.7. Students with music players. A student is chosen at random.
You want to know the probability that the student has an iPod. This could be
hard to determine since there are over 40,000 students on the given campus.
Fortunately, a current survey recorded that 47% of first-year students have
iPods, and you know that 32% of the students are first-year students. The
survey indicates that 56.2% of upperclass students (i.e., non first-years) have
iPods.
a. What is the probability that the student you randomly select has an
iPod?
b. Given that the selected student has an iPod, what is the probability that
he/she is an upperclass student?
c. Given that the selected student does not have an iPod, what is the
probability that he/she is an upperclass student?
Exercise 5.8. Smoke detectors. It is estimated that 82% of homes have
working smoke detectors. On average, 22% of fires result in fatalities, but the
presence of a working smoke detector cuts the risk to just 7%.
a. If a random fire resulted in a fatality, what is the probability that the
house had a working smoke detector?
b. In homes without a working smoke detector, what is the risk of fatalities?
5.3.2 Extensions
Exercise 5.9. Coins and dice. Consider the following game: The player flips
a fair coin. If it shows a head, he gets to roll a 4-sided die. If it shows a tail,
he gets to roll a 6-sided die. In either case, let A denote the event that he gets
1 on the die roll. Let H denote the event that his coin flip shows a head, i.e.,
that he uses the 4-sided die. Let T denote the event that his coin flip shows a
tail, i.e., that he uses the 6-sided die.
a. Find P (H | A).
b. Find P (T | A).
Hint: To verify your answers, note P (H | A) + P (T | A) = 1.
Exercise 5.10. French classes. In a certain school, 4 levels of French are
taught with 40% of the students being enrolled in level 1, 30% enrolled in
level 2, 20% enrolled in level 3, and 10% enrolled in level 4. The percentage of
people who enjoy their French class is 70% in level 1, 80% in level 2, 85% in
level 3, and 90% in level 4.
Given that a person enjoys his/her French class, let pj be the probability
that the student is enrolled in level j. Find pj for j = 1, 2, 3, 4. (Check: Your
four answers should sum to 1 altogether.)
Exercise 5.11. Sex and switching majors. At a certain university, 60% of
undergraduate students are male, and 40% are female. Ten percent of females
change their majors at least once. Overall, 30% of students change their majors
at least once.
a. Find the probability that, given a randomly selected student who changes
majors, the student is a female.
b. What percentage of males change their majors at least once?
Exercise 5.12. Engineering majors. At First Street Towers, an equal num-
ber of students live on each floor. On floors 1, 2, 3, 4, 5, the percentages of
students who study engineering are, respectively, 80%, 52%, 74%, 67%, and
29%. Upon meeting an engineering student who lives in First Street Towers,
what is the probability that the student lives on the 4th floor?
Exercise 5.13. Babies. Allison delivers one baby, and a year later she delivers
a second baby. Let C be the event that at least one of the babies is a girl (either
the first, or the second, or both). Let D be the event that both of the babies
are girls. Find P (D | C).
5.3.3 Advanced
Exercise 5.14. Pair of dice. Roll a blue die and a red die. Given that the
blue die has an odd value, what is the probability that the sum of the two dice
is exactly 4?
Exercise 5.15. Pair of dice. Roll a blue die and a red die. Given that the
blue die has a value of 4 or smaller, what is the probability that the sum of the
two dice is 7 or larger?
Exercise 5.16. Fuses. Two fuses in series are built to shut down if an overload
occurs. If the first fuse shuts down properly 90% of the time, there is no need
for the second fuse to do anything. If the first fuse fails to shut down properly,
the second fuse shuts down properly 95% of the time. What is the probability
the whole system operates correctly during an overload (with one fuse or the
other shutting down properly)?
Exercise 5.17. Weather. The weather on any given day can either be sunny,
cloudy, or partially cloudy. Each day is also classified as dry or rainy. The
probability of a sunny day is 0.48, and the probability of a cloudy day is 0.39.
The probability of having a sunny and dry day is 0.48. The probability of a
cloudy and dry day is 0.14. The probability of a partially cloudy and dry day
is 0.09.
Exercise 5.18. Coin flips and then dice. Claire flips a coin until she gets
a head for the first time. Say it takes her n times. Then (afterwards) she rolls
exactly n dice. What is the probability that none of the dice show the value 1?
(For instance, if it takes her 7 flips to get a head for the first time, then she
rolls 7 dice. Hint: It is enough to use Remark 5.8. You do not need Bayes’
Theorem itself for this problem.)
Chapter 6
Review of Randomness

6.1 Summary of Randomness
If P (B) > 0, the conditional probability of A given B is
P (A | B) = P (A ∩ B)/P (B).
Equivalently,
P (A ∩ B) = P (B)P (A | B).
If A1 , . . . , An form a partition of S, then
P (B) = P (A1 ∩ B) + · · · + P (An ∩ B) = P (A1 )P (B | A1 ) + · · · + P (An )P (B | An ).
By Bayes’ Theorem,
P (A | B) = P (A)P (B | A) / (P (A)P (B | A) + P (Ac )P (B | Ac )).
6.2 Exercises
Exercise 6.1. Probabilities in a chain of subsets. Four events A, B, C, D
could be compared in the following way:
A ⊂ B ⊂ C ⊂ D.
The probability of A is 0.03, and the probability of C is 0.27. With the given
information, what are potential probabilities of B and D?
Exercise 6.2. Waiting for the bus. A customer waits for a bus to appear.
a. List 5 possible outcomes.
b. What is the sample space?
c. Write a partition for the customer’s waiting time, using five-minute in-
tervals for the partition. (Assume that the customer can wait as long as needed
for the bus.)
d. Explain how the answer to part c meets the definition of a partition; see
Definition 2.14.
e. Modify your answer to part c, with the assumption that the customer
has a maximum time that they can afford to wait before giving up.
Exercise 6.3. Randomly choose a page. Randomly open a 300-page book,
and mark a page. What is the probability that the page number contains the
lucky digit 5?
Exercise 6.4. Cell phone. A student loses her first cell phone. Her new
phone number is randomly chosen by the store, but the area code is fixed, so
there are exactly 7 randomly selected digits. (Assume all 10^7 possibilities are
equally likely.) Give the probabilities of the following events:
a. Her phone number ends in a 2.
b. Her phone number ends in an odd number.
c. Her phone number ends in a 5 or a 7.
Exercise 6.5. Egg-citing! There are 45 egg boxes in a store. Twenty are
Brand A, fifteen are Brand B, and ten are Brand C. Brands A and C each have
half green boxes and half yellow boxes. Brand B is all yellow.
a. If you have Brand B or C, what is the chance that you have a green box?
b. If you have a yellow box, what is the chance the brand is B?
Exercise 6.6. Skittles. Chris has 32 Skittles candies. Nine are red, three
are blue, seven are yellow, five are orange, and eight are purple. Exactly four
are sour, and all of these sour Skittles are purple. Chris picks one Skittle at
random.
II Discrete Random Variables

In this part of the book, we study discrete random variables and their distribu-
tions. We will discuss the differences between discrete and continuous random
variables and introduce some useful formulas for calculating other information
about discrete random variables. (We will return to continuous random vari-
ables starting in Chapter 24.) The rules and formulas in the next few chapters
will work for all discrete random variables. This will build a base of under-
standing about common themes of discrete random variables that will be useful
in the subsequent part of the book, on named discrete random variables.
By the end of this part of the book, you should be able to:
Chapter 7
Discrete Versus Continuous Random Variables
While writing my book I had an argument with Feller. He asserted that everyone
said “random variable” and I asserted that everyone said “chance variable.” We
obviously had to use the same name in our books, so we decided the issue by a
stochastic procedure. That is, we tossed for it and he won.
—Joe Doob, from “A Conversation with Joe Doob,” by J. Laurie Snell, from
Statistical Science, volume 12, number 4, November 1997
We do not measure everything with the same types of variables. Can what we
are measuring be counted? Or does it fall within a certain range without specific
levels? For instance, consider the difference in how we measure the outcome of
a die roll versus the distance a dart lands from the bullseye. If our types of
variables are different, we will need different ways of calculating probabilities.
7.1 Introduction
We have already seen that, in a random phenomenon, there are many possible
outcomes; exactly one of these outcomes occurs. The set of all possible outcomes
is the sample space. Now we introduce random variables, which are the main
topic of study throughout the rest of this book:
A random variable is a function that assigns a real number to each outcome
in the sample space.
Example 7.2.a. If we roll three dice, there are 6^3 = 216 possible outcomes.
(One die has 6 possible outcomes; 2 dice have 6 × 6 = 36 possible outcomes;
3 dice have 6 × 6 × 6 = 216 possible outcomes.) One possible random variable,
which we call X, is the sum of the three dice.
If the outcome is ω = (6, 4, 3), then the sum of the dice is X(ω) = 13. If the
outcome is ω = (5, 2, 2), the sum is X(ω) = 9. In general, if the outcome is
ω = (a, b, c), then the sum is X(ω) = a + b + c. In this example, X is always
an integer from 3 to 18, depending on the outcome. The outcome completely
and uniquely determines the value of the random variable.
Example 7.2.b As before, roll three dice. Let the random variable Y be the
maximum of the three dice.
If the outcome is ω = (a, b, c), then Y (ω) = max(a, b, c). For instance, if the
outcome is ω = (4, 5, 2), then Y (ω) = 5.
Example 7.2.c Let the random variable Z denote the value of the first die
rolled.
If the outcome is ω = (a, b, c), then Z(ω) = a. E.g., if the outcome is
ω = (6, 1, 2), then Z(ω) = 6.
When working with random variables, we must resist the urge to “solve” for the
value of the random variable. Unlike in algebra, we will not spend time studying
equations such as x^2 − x − 6 = 0 and trying to solve for x. As a constant reminder
that we are not solving such equations, we always use a capital letter (such
as X) to denote a random variable.
The outcome of a random phenomenon is, well, random! Different outcomes
happen, so random variables can have different values. Every random variable
depends on the underlying outcome.
Also, random variables must assign a real number to each outcome. E.g., if
a randomly chosen person ω (the outcome) in a classroom is selected, we could
define a random variable X(ω) as the person’s age. So X(ω), the person’s age,
is a random variable that completely depends on ω, the person selected.
Since a random variable must assign a real number to each outcome, then
a random variable cannot be a person’s name, the color of a car, a country, a
suit in a deck of cards, heads or tails, etc. These types of things might be the
underlying outcomes, but they cannot be the random variables themselves.
Random variables can be either discrete or continuous. Parts II and III of
the book are all about discrete random variables. Parts V and VI are dedicated
to continuous random variables.
The list of all values taken on by a discrete random variable is either finite,
e.g., {1, 2, 3, 4, 5, 6}, or countably infinite,
e.g., {1, 1/2, 1/4, 1/8, . . .} or {1, 2, 3, 4, . . .} or {. . . , −2, −1, 0, 1, 2, . . .}, etc.
Discrete random variables can assume any kind of real numbers; e.g., negative
numbers are allowed; decimals, fractions, transcendental numbers, etc., are all
allowed too. All that matters for a random variable to be discrete is that the
set of possible values is either finite or can be put into a countably infinite list.
Continuous random variables, on the other hand, to be covered in Part V,
take values on continuous intervals (or on the union of continuous intervals),
such as (0, 1) or (0, ∞).
7.2 Examples
Examples of random variables that are discrete are:
• let X be “1” if the next car to pass is blue, or “2” if red, or “3” if silver, or
“−1” otherwise (remember, random variables can be negative too!);
• let X be the number of the region, 1 through 20, on which a dart lands.
Examples of random variables that are continuous are:
• let X be the length of your left foot (in inches);
Example 7.4. Flip a coin three times, and let X denote the total number of
heads that appear.
The set of values X can assume is {0, 1, 2, 3}, so X is a discrete random variable.
We can make a chart of all of the possible outcomes and the associated values
of X:
outcome     prob. of outcome   value of X   prob. of each X value
(H, H, H)   1/8                3            1/8 = P (X = 3)
(H, H, T)   1/8                2
(H, T, H)   1/8                2            3/8 = P (X = 2)
(T, H, H)   1/8                2
(T, T, H)   1/8                1
(T, H, T)   1/8                1            3/8 = P (X = 1)
(H, T, T)   1/8                1
(T, T, T)   1/8                0            1/8 = P (X = 0)
Each of the outcomes (H, H, T ) and (H, T, H) and (T, H, H) will cause X to
be 2. Each of these outcomes has probability 1/8, so the event containing all
three of these outcomes has probability 3/8, i.e.,
P (X = 2) = 3/8.
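The chart above can be generated by enumerating all 8 outcomes; a minimal sketch in Python:

    from itertools import product
    from fractions import Fraction

    # All 8 equally likely outcomes of three coin flips.
    outcomes = list(product("HT", repeat=3))

    # X counts the heads in an outcome; tally P(X = k) for each k.
    mass = {}
    for w in outcomes:
        x = w.count("H")
        mass[x] = mass.get(x, Fraction(0)) + Fraction(1, len(outcomes))

    for k in sorted(mass):
        print(k, mass[k])  # 0 1/8, 1 3/8, 2 3/8, 3 1/8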
Example 7.5.a. Let ω = (x, y) be the Cartesian coordinates where a dart lands
on a dart board. We could let X(ω) = x be a random variable that denotes the
first coordinate and let Y (ω) = y be a random variable that denotes the second
coordinate.
Both X and Y are continuous random variables that each assume values (for
instance) on the interval (−9, 9) if the dartboard has radius 9 inches. If ω =
(3.6, −1.35) is the location where the dart lands, then X(ω) = 3.6 and Y (ω) =
−1.35. Again, we often write (more simply) just X = 3.6 and Y = −1.35, in
such a case.
Example 7.5.b. We could also consider other random variables; for instance,
if ω = (x, y) is the location of the dart’s landing, then Z(ω) = √(x^2 + y^2) is the
distance of the dart from the center of the board. More simply, we can drop the
notation for ω and just write Z as the distance to the center of the dartboard.
Notice Z is a continuous random variable.
Example 7.6. A traffic engineer observes the next three cars that pass. He
uses X as the time until the arrival of the third car, Y as the time between the
arrivals of the second and third cars, and Z as the speed of the third car. Since
X, Y, Z each take values on the interval (0, ∞), then X, Y, Z are each continuous
random variables.
Example 7.7. A student flips a coin until the 10th head appears. Each outcome
is a string of heads and tails. For instance, an outcome ω might be
ω = (H, H, H, T, H, T, T, T, H, T, T, T, H, H, T, H, T, H, T, H),
He writes Xj for the number of flips needed to obtain the jth head, counting
from the flip just after the (j − 1)st head.
In this case, outcome ω causes these random variables to have the following
values:
X1 = 1, X2 = 1, X3 = 1, X4 = 2, X5 = 4,
X6 = 4, X7 = 1, X8 = 2, X9 = 2, X10 = 2.
Each of the Xj ’s takes on a positive integer value and is therefore a discrete
random variable. Finally, X1 + · · · + X10 is the total number of flips needed
until the 10th head appears.
Example 7.8. A student is selected at random and her mp3 player is examined.
Let X denote the number of songs on her music player and let Y denote the
number of songs on the first playlist on the music player, or Y = 0 if there are
no playlists on the mp3 player. (Let X = 0 and Y = 0 if she doesn’t have an
mp3 player at all.) Notice that X and Y are random variables. Also, we know
Y ≤ X, since the number of songs on the playlist is limited by the number of
songs on the mp3 player altogether.
We cannot assign an artist’s name, or a genre of song, as a random variable,
because these are not real numbers. Nonetheless, we could (say) assign a nu-
meric scheme, such as Z = 1 if the majority of the songs are blues songs, Z = 2
if the majority of the songs are rock songs, Z = 3 if the majority of the songs
are jazz songs, or Z = 4 otherwise (including if there is a “tie” for the majority).
Both X and Y assume values that are nonnegative integers and thus are
discrete random variables. Since Z takes values in the range {1, 2, 3, 4}, then Z
is a discrete random variable too.
At this point, it should be clear that there are ample, real-world possibilities
for assigning random variables according to all kinds of random phenomena.
Example 7.10. Roll a pair of dice, and let X denote the sum of the two values
that appear. So if the outcome is (i, j), then X = i + j.
The sum X is a discrete random variable that has values in the set {2, 3, . . . , 12}.
There are several outcomes that cause X to be equal to 5. These outcomes are
those in the event
{(1, 4), (2, 3), (3, 2), (4, 1)}.
Each event has a probability associated with it. In this case,
P (X = 5) = 4/36
The 36 equally likely outcomes (first die down the side, second die across the
top) have the following sums:

        1   2   3   4   5   6
    1   2   3   4   5   6   7
    2   3   4   5   6   7   8
    3   4   5   6   7   8   9
    4   5   6   7   8   9  10
    5   6   7   8   9  10  11
    6   7   8   9  10  11  12

So the mass of X is:

    x          2     3     4     5     6     7     8     9     10    11    12
    P (X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Random variables that are 1 when an event occurs or 0 when the event does
not occur are called indicator random variables, sometimes abbreviated
as indicators; they are also called Bernoulli random variables. They will be
studied in Chapter 14. Indicator random variables are used extensively when
working with sums of random variables, to be covered in Chapter 11.
7.3 Exercises
7.3.1 Practice
In Exercises 7.1 to 7.15, identify whether the random variables are discrete or
continuous (an exercise might contain one or more of both types). Also identify
the possible values that each random variable could assume. In some cases,
answers will differ according to interpretation of the problems by the students.
Exercise 7.1. Pair of dice. Roll two dice. Let X be the absolute value of
the difference of the two values that appear. (If D1 , D2 are the two die values,
then X = |D1 − D2 |.)
Exercise 7.2. Onions. Pick a basket of onions. Let X be the weight of the
onions (in pounds). Let Y be the number of onions in the basket.
Exercise 7.5. Music genres. Select a random student’s music player. Let X
denote the number of blues songs on the music player; let Y denote the number
of jazz songs; let Z denote the number of rock songs.
Exercise 7.6. Average distance. Let X be the average distance (in miles)
that a student drives in a week.
Exercise 7.7. Sexes of babies. Consider the births of ten consecutive babies.
Let X be the number of babies that are girls. Let Y be the number of the first
baby that is a boy (if none of the babies are boys, then just let Y = 0, or suggest
your own, alternative value for such a case).
Exercise 7.12. Puzzle pieces. A pile of puzzle pieces falls onto the floor.
Let X denote the number of pieces that are edge pieces, and let Y denote the
number of pieces that are interior (i.e., not edge) pieces.
Exercise 7.14. Snake eyes. Roll a pair of dice until “snake eyes” (i.e., a pair
of 1’s) appear. Let X denote the total number of rolls required. Let Y denote
the sum of all of the dice rolled during this process.
Exercise 7.15. Student age and name. Select a random student from your
course; let X denote the age, in days, of the selected student; let Y denote the
length of the student’s name.
7.3.2 Extensions
Exercise 7.16. Toy ballet dancers. When opening a container of 32 toy bal-
let dancers performing pirouettes, let X denote the number of broken dancers,
and let Y denote the number of whole (unbroken) dancers.
Write Z = X/Y , i.e., Z is the ratio of broken dancers to whole dancers.
What complication arises in this definition of Z as a random variable?
Exercise 7.17. Pick two cards. Pick two cards at random from a well-shuffled
deck of 52 cards (pick them simultaneously, i.e., grab two cards at once—so they
are not the same card!). There are 12 cards which are considered face cards (4
Jacks, 4 Queens, 4 Kings). Let X be the number of face cards that you get.
a. Find P (X = 0).
b. Find P (X = 1).
c. Find P (X = 2).
7.3.3 Advanced
Exercise 7.18. Mixed random variables. Do you think that there are
random variables which are neither discrete nor continuous? If yes, try to
construct a simple example. If no, then discuss why not.
Chapter 8
Probability Mass Functions and CDFs
The 50-50-90 rule: Anytime you have a 50-50 chance of getting something right,
there’s a 90% probability you’ll get it wrong.
—Andy Rooney
If we toss a single coin, we have a 1/2 chance of getting a head. How do the
probabilities change if we toss a coin 3 times and total up the number of heads?
Is the chance of getting 0, 1, 2, or 3 heads exactly the same? Why or why
not? Is the probability of getting exactly 1 head the same as the probability of
getting head, tail, tail?
8.1 Introduction
In many situations, it is very helpful to consider the probability that a random
variable assumes a specific value or is found in an entire range of numbers. For
instance, it might be helpful to know whether a pregnant mother will have 2 or
more babies during her delivery; if X denotes the number of babies to be born,
the probability of this event is written as P (X ≥ 2). If X is the number of
guitar strings in a package to be shipped, and it is supposed to have 6 strings,
the manufacturing company will be very interested in P (X = 6), i.e., if the
package contains the right number of strings. When baking cookies, X could
be the number of cookies that your roommates would like to eat. If there are 8
cookies in a container, then P (X ≤ 8) is the probability that you have enough
cookies to feed them.
We use the concepts P (X = x) and P (X ≤ x) so often that we give them
each a special name:
pX (x) = P (X = x).
This is called the probability mass function (PMF), or just the mass, of X.
Similarly, we write
FX (x) = P (X ≤ x);
this is called the cumulative distribution function (CDF) of X.
8.2 Examples
Example 8.3. Roll a die, and let X denote the value that appears.
Figure 8.1: Left: Mass pX (x) = P (X = x) of the value on a die roll. Right:
Same plot but with the values of 0 not shown in the plot.
In the mass of a die roll X, the left hand side of Figure 8.1 illustrates the
fact that
pX (x) = P (X = x) = 1/6
for all x in the set {1, 2, 3, 4, 5, 6}, and pX (x) = P (X = x) = 0 otherwise. Since
the “otherwise” encompasses most values of x, we usually suppress the values
for which the mass is 0. Thus, the right hand side of Figure 8.1 is the way that
we will usually show such a mass (with the 0 values omitted from the plot).
To compute the cumulative distribution function of X, we compute
FX (1) = P (X ≤ 1) = P (X = 1) = 1/6;
FX (2) = P (X ≤ 2) = P (X = 1) + P (X = 2) = 2/6;
FX (3) = P (X ≤ 3) = P (X = 1) + P (X = 2) + P (X = 3) = 3/6;
FX (4) = P (X ≤ 4) = P (X = 1) + · · · + P (X = 4) = 4/6;
FX (5) = P (X ≤ 5) = P (X = 1) + · · · + P (X = 5) = 5/6;
FX (6) = P (X ≤ 6) = P (X = 1) + · · · + P (X = 6) = 1.
So if we begin to draw the cumulative distribution function in this case, we
know 6 of the values of FX (x) = P (X ≤ x); see the left side of Figure 8.2.
Figure 8.2: Left: Starting to construct the CDF FX (x) = P (X ≤ x) for the
value of a die roll, for x in {1, 2, 3, 4, 5, 6}. Middle: The CDF FX (x) including
x < 0 and x > 6. Right: The CDF FX (x) for all values of x.
Example 8.4. Flip a coin three times; let X denote the total number of heads.
The mass of X is
pX (0) = P (X = 0) = 1/8,
pX (1) = P (X = 1) = 3/8,
pX (2) = P (X = 2) = 3/8,
pX (3) = P (X = 3) = 1/8.
The CDF of X is
FX (0) = P (X ≤ 0) = 1/8,
FX (1) = P (X ≤ 1) = 4/8,
FX (2) = P (X ≤ 2) = 7/8,
FX (3) = P (X ≤ 3) = 1.
With similar reasoning to the last example, the value of the CDF FX (x) does
not increase in between the integers x. The plots of the mass and CDF of the
total number of heads X are given in Figure 8.3. Notice that the sizes of the
“jumps” in the CDF are equal to the probabilities that X is found at the
corresponding values:
Figure 8.3: Left: The mass pX (x) of X, the number of heads in three tosses
of a fair coin. Right: The CDF FX (x) of X.
The reasoning from Example 8.4 works in general. As we move from left to right
across the mass, we start with probability 0 on the extreme left-hand side of the
CDF. After sweeping all the way across the mass, we eventually accumulate all
of the probability from the mass, so that we get probability 1 on the extreme
right-hand side of the CDF.
Remark 8.6. Since discrete random variables assume only a finite or count-
able number of values, we can sum over all the nonzero masses, and we must
get sum 1:
∑_{x : pX (x) ≠ 0} pX (x) = 1.
We usually drop the notation about restricting to x’s for which pX (x) ≠ 0:
∑_x pX (x) = 1.
For example, in Example 8.4, the random variable X can only take on values
0, 1, 2, 3, so ∑_{j=0}^{3} pX (j) = 1.
If a ≤ b, then
{X ≤ a} ⊂ {X ≤ b}.
Whenever one event is contained in another, the probability of the first event
is at most the probability of the second event, so FX (a) ≤ FX (b); in other
words, the CDF is nondecreasing. Moreover, sweeping x from left to right
eventually accumulates all of the probability from the mass (and, going far to
the left, none of it), so
lim_{x→∞} FX (x) = 1,
lim_{x→−∞} FX (x) = 0.
8.4 More Examples
Example 8.10. Flip a coin until the first head appears; let X denote the total
number of flips until the first head appears.
The mass of X is depicted on the left of Figure 8.4; the values are:
pX (1) = P (X = 1) = P ({H}) = 1/2,
pX (2) = P (X = 2) = P ({T, H}) = 1/4,
pX (3) = P (X = 3) = P ({T, T, H}) = 1/8,
pX (4) = P (X = 4) = P ({T, T, T, H}) = 1/16,
and in general, for each positive integer j,
pX (j) = P (X = j) = P ({T, T, . . . , T, H}) = 1/2^j ,
where the outcome consists of j − 1 tails followed by one head.
Thus, the CDF of X (shown on the right of Figure 8.4) has the following values
at the integers:
FX (1) = P (X ≤ 1) = 1/2,
FX (2) = P (X ≤ 2) = 3/4,
FX (3) = P (X ≤ 3) = 7/8,
FX (4) = P (X ≤ 4) = 15/16,
and in general, for each positive integer x,
FX (x) = P (X ≤ x) = P ({first head within x tosses}) = 1 − 1/2^x .
Another viewpoint is that, for positive integers x, we have FX (x) = P (X ≤
x) = 1 − P (X > x), but X > x if and only if the first x tosses are T , which
has probability 1/2^x . Thus FX (x) = 1 − 1/2^x .
Figure 8.4: Left: The mass pX (x) of X, the number of tosses until the first
head. Right: The CDF FX (x) of X.
Remark 8.11. In Example 8.10, the observant reader will notice that, for
the outcome ω = (T, T, T, T, T, T, . . .), we did not define the value of X. This
outcome requires an infinite number of tosses until the first head is reached.
This outcome has probability 0, so it does not affect any of our calculations.
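For readers who like to check such formulas numerically, here is a small
sketch (ours, not from the text) verifying FX (x) = 1 − 1/2^x by summing the
mass:

    from fractions import Fraction

    def p(j):
        # Mass of the number of flips until the first head: P(X = j) = 1/2**j.
        return Fraction(1, 2**j)

    for x in range(1, 11):
        partial = sum(p(j) for j in range(1, x + 1))  # F_X(x) as a sum of mass
        assert partial == 1 - Fraction(1, 2**x)       # matches the closed form
    print("CDF formula verified for x = 1, ..., 10")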
Example 8.12. As in Example 7.10, roll a pair of dice, and let X denote the
sum of the two values that appear. In other words, if the outcome is (i, j), then
we let X = i + j.
x          2     3     4     5     6     7     8     9     10    11    12
P (X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
We plot the mass and CDF of the sum of two dice in Figure 8.5.
Figure 8.5: Left: The mass pX (x) of X, the sum of values on two dice.
Right: The CDF FX (x) of X.
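The mass of the sum is easy to recover by brute force. Here is a short sketch
(ours, not from the text) that enumerates the 36 equally likely outcomes:

    from fractions import Fraction
    from collections import Counter

    # Tally X = i + j over all 36 equally likely outcomes (i, j).
    counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
    mass = {x: Fraction(c, 36) for x, c in sorted(counts.items())}
    print(mass)   # {2: 1/36, 3: 2/36, ..., 7: 6/36, ..., 12: 1/36}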
Similarly, for a random variable X that is equally likely to take on each of
the values 1, 2, . . . , 20, the CDF is
FX (x) = 0        for x < 1,
         ⌊x⌋/20   for 1 ≤ x < 20,
         1        for x ≥ 20.
Example 8.14. Suppose that 20.2% of cars are blue, 32.7% of cars are red,
3.06% of cars are silver, and the rest of the cars on the road are other colors.
Let X be “1” if the next car to pass is blue, or “2” if the next car is red, or “3”
if the next car is silver, or “0” otherwise.
Since the rest of the mass is at x = 0, and the mass adds to 1, the mass at 0
must be
pX (0) = 1 − 0.202 − 0.327 − 0.0306 = 0.4404.
For any values of x not in {0, 1, 2, 3}, we must have pX (x) = 0. The CDF is
FX (x) = 0        for x < 0,
         0.4404   for 0 ≤ x < 1,
         0.6424   for 1 ≤ x < 2,
         0.9694   for 2 ≤ x < 3,
         1        for x ≥ 3.
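Since the CDF of a discrete random variable is just the running total of the
mass, it is straightforward to compute by machine. A minimal sketch (ours,
not from the text), using the mass from this example as exact fractions:

    from fractions import Fraction

    # Mass of the car-color variable from Example 8.14.
    mass = {0: Fraction(4404, 10000), 1: Fraction(2020, 10000),
            2: Fraction(3270, 10000), 3: Fraction(306, 10000)}

    def cdf(x):
        # Step CDF: accumulate the mass at all values <= x.
        return sum(p for v, p in mass.items() if v <= x)

    assert cdf(-1) == 0 and cdf(0) == Fraction(4404, 10000)
    assert cdf(1.5) == Fraction(6424, 10000) and cdf(3) == 1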
Example 8.15. Draw a card from a well-shuffled deck until the ace of spades
appears. If a draw is unsuccessful, then replace and reshuffle the deck before
making the next selection. Let X be the number of draws needed until the ace
of spades appears for the first time.
(The event that the ace of spades never appears has probability 0. In such a
case, we could, for instance, make X = −1, or any other suitable solution. It
will not matter, because such an event has probability 0 anyway.)
The mass of X is pX (j) = (51/52)^(j−1) (1/52) for positive integers j, and
pX (x) = 0 otherwise.
Example 8.16. Suppose that a basketball player makes 80% of her free throws
successfully. She shoots as many times as necessary, until scoring the first
basket. Let X denote the number of necessary attempts.
(The probability that she never scores is 0, so we can safely ignore this outcome;
the other probabilities will not be affected.)
The mass of X is pX (j) = (0.20)^(j−1) (0.80) for positive integers j, and
pX (x) = 0 otherwise.
Example 8.17. Draw a card from a well-shuffled deck until the ace of spades
appears. If a draw is unsuccessful, do not replace the card—just continue to
draw. Let X be the number of draws needed until the ace of spades appears for
the first time.
Here, X will be one of the integers between 1 and 52, inclusive. As discussed
in Example 2.17, the placement of the ace of spades is equally likely to be
anywhere in the deck, so any of the first 52 draws are equally likely to be the
ace of spades. So the mass of X is pX (j) = 1/52 for j = 1, 2, 3, . . . , 52, and
pX (x) = 0 otherwise. The CDF of X is FX (x) = x/52 for each integer x with
1 ≤ x ≤ 52. Thus
FX (x) = 0        for x < 1,
         ⌊x⌋/52   for 1 ≤ x < 52,
         1        for x ≥ 52.
Example 8.18. A con artist has a trick die which is weighted so that a 1 will
come up half of the time. The other numbers are equally likely to appear. Let
X denote the number that appears when you roll the die once.
The possible values X assumes are still {1, 2, 3, 4, 5, 6}. We are given
pX (1) = P (X = 1) = 1/2.
Since the other values are equally likely to appear, we must have
pX (j) = (1 − 1/2)/5 = 1/10 for j = 2, 3, 4, 5, 6.
8.5 Exercises
8.5.1 Practice
Exercise 8.1. Snacks. A student makes a trip once per day to the store, and
he always buys a snack. The student eats the snack 70% of the time, but the
other 30% of the time, his roommate eats it first.
a. During a period of four days, the student keeps track of whether he gets
to eat his snack. What is the sample space of possible outcomes?
b. Let X be the number of times, within the four day period, that he gets
to eat his snack. What is the mass of X?
c. What is the CDF of X?
Exercise 8.3. Songs by genre. As in Exercises 2.1 and 4.3, and in Ex-
ample 3.11, a randomly chosen song is from the blues genre with probabil-
ity 330/27333; from the jazz genre with probability 537/27333; from the rock
genre with probability 8286/27333; or from some other genre with probability
18180/27333. Let X be “1” if a randomly selected song is from the “blues” genre,
or “2” if “jazz,” or “3” if “rock,” or “−1” otherwise. Find the mass and CDF of
X.
Exercise 8.4. Milkfat. Suppose that 13% of all milk cartons have 1% milkfat,
and 28% of all milk cartons have 2% milkfat, and 18% of all milk cartons are
fat-free, and 41% of all milk cartons are of some other type. Randomly choose
a carton of milk, and let X be “1” if the carton of milk is 1% milkfat, or “2” if
the carton of milk is 2% milkfat, or “3” if the carton of milk is fat-free, or “4”
otherwise. Find the mass and CDF of X.
Exercise 8.5. Super Breakfast Challenge. The Super Breakfast Challenge
(SBC) consists of bacon, eggs, oatmeal, orange juice, milk, and several other
foods, and it costs $12.99 per person to order at a local restaurant. It is known
to be very difficult to consume the entire SBC: only 10% of people are able to
eat all of it.
A probability student hears about the SBC and goes to the local restaurant.
He observes the number of customers, X, that attempt to eat the SBC, until
the first success. So if there are 4 failures and then 1 success (i.e., the outcome
is (F, F, F, F, T )), then X = 5. Find the mass of X.
Exercise 8.10. Mystery mass. Consider a random variable X that has CDF
FX (x) = 0     if x < 2,
         0.3   if 2 ≤ x < 4,
         0.8   if 4 ≤ x < 6,
         0.95  if 6 ≤ x < 8,
         1     if 8 ≤ x.
Find the mass of X.
Exercise 8.11. Mystery mass. Consider a random variable X that has CDF
FX (x) = 0     if x < −10,
         0.1   if −10 ≤ x < −5,
         0.8   if −5 ≤ x < 0,
         1     if 0 ≤ x.
Find the mass of X.
Exercise 8.12. Butterflies. Alice, Bob, and Charlotte are looking for butter-
flies. They look in three separate parts of a field, so that their probabilities of
success are independent.
• Alice finds 1 butterfly with probability 17%, and otherwise does not find
one.
• Bob finds 1 butterfly with probability 25%, and otherwise does not find
one.
• Charlotte finds 1 butterfly with probability 45%, and otherwise does not
find one.
Let X be the total number of butterflies that they find. Find the probability
mass function of X.
8.5.2 Extensions
Exercise 8.15. Poisson distribution. Suppose X has mass
pX (x) = λ^x e^(−λ) / x!  for x ∈ Z≥0 ,
and pX (x) = 0 otherwise.
a. For λ = 2, make a plot of the probability mass function.
b. What is the CDF of X, when λ = 2?
c. Make a plot of the CDF, when λ = 2.
For future reference, this is called a Poisson random variable. You may want
to refer to the Math Review for help with the summation of the terms of the
form λ^x /x!. We will study these random variables more in Chapter 18.
Exercise 8.16. Coin flips. Flip a coin three times. Let X denote the number
of heads minus the number of tails. So, for instance, if (H, T, T ) is the outcome,
then X = 1 − 2 = −1.
a. What are the possible values of X?
b. What is the mass of X?
c. Make a plot of the probability mass function.
d. What is the CDF of X?
e. Make a plot of the CDF.
8.5.3 Advanced
Exercise 8.17. Magic. Using a shuffled standard deck of 52 playing cards, a
magician wants to do a trick where he tries to guess which card an audience
member has selected if the audience member chooses a card at random and
doesn’t show the magician. Let X be the number of cards the magician will
guess correctly if he tries the trick with 6 audience members. Each audience
member starts with a complete and well-shuffled deck. The magician decides to
call a trial a success if he guesses the number on the card correctly even if he
doesn’t get the right suit.
Exercise 8.19. Pick two cards. (See also Exercise 7.17) Pick two cards at
random from a well-shuffled deck of 52 cards (pick them simultaneously, i.e.,
grab two cards at once—so they are not the same card!). There are 12 cards
which are considered face cards (4 Jacks, 4 Queens, 4 Kings). Let X be the
number of face cards that you get. Draw the CDF FX (x) of X.
Chapter 9
Independence and Conditioning
The most misleading assumptions are the ones you don’t even know you’re
making.
—“Meeting a Gorilla,” by Douglas Adams and Mark Carwardine, from Chap-
ter 2 of The Great Ape Project, edited by Paola Cavalieri and Peter Singer
(St. Martin’s Griffin, 1993)
Seven people stand in a post office lobby. We don’t know who came in first.
What is the probability the first person from that group who walked in the door
was female? How does that probability change if we know the total number of
females in the group of 7? If all 7 people are female, what do we know about
the probability that the first person was a female? If all 7 people are male, what
do we know about the probability that the first person was a female? What
happens to the probabilities for the gender of the first person for total numbers
of women in between 0 and 7?
9.1 Joint Probability Mass Functions
The joint probability mass function (joint mass) of a pair of discrete random
variables X and Y is
pX,Y (x, y) = P (X = x and Y = y),
and the joint cumulative distribution function (joint CDF) is
FX,Y (x, y) = P ({X ≤ x} ∩ {Y ≤ y}),
or equivalently,
FX,Y (x, y) = P (X ≤ x and Y ≤ y).
Example 9.3. Roll two dice. Let X denote the minimum of the two values
that appear, and let Y denote the maximum of the two values that appear.
For instance, if the outcome is (3, 5), then X = 3 and Y = 5. Also,
pX,Y (x, y) = 2/36 for 1 ≤ x < y ≤ 6,
and for 1 ≤ x = y ≤ 6,
pX,Y (x, y) = 1/36,
and
pX,Y (x, y) = 0 otherwise.
So the case pX,Y (x, y) = 2/36 with 1 ≤ x < y ≤ 6 corresponds to the two
possible outcomes (x, y) or (y, x), and the case pX,Y (x, y) = 1/36 with 1 ≤ x =
y ≤ 6 corresponds to the one possible outcome (x, x) (here, we emphasize that
y = x, i.e., the maximum and minimum are exactly the same because the die
rolls are the same).
The cumulative distribution function is calculated similarly, e.g.,
FX,Y (2, 4) = P ({min ≤ 2 and max ≤ 4})
= P ({(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2),
(2, 3), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)})
= 12/36.
As with CDFs for one variable, nothing changes if we calculate, for instance,
FX,Y (2.9, 4.1) = P (X ≤ 2.9 and Y ≤ 4.1) = P (X ≤ 2 and Y ≤ 4) = 12/36.
Using the chart below, we can easily obtain all of the values of FX,Y (x, y). As
another example,
FX,Y (5, 2) = P ({min ≤ 5 and max ≤ 2})
= P ({(1, 1), (1, 2), (2, 1), (2, 2)})
= 4/36.
In general, we have
FX,Y (x, y) y=1 y=2 y=3 y=4 y=5 y=6
x=1 1/36 3/36 5/36 7/36 9/36 11/36
x=2 1/36 4/36 8/36 12/36 16/36 20/36
x=3 1/36 4/36 9/36 15/36 21/36 27/36
x=4 1/36 4/36 9/36 16/36 24/36 32/36
x=5 1/36 4/36 9/36 16/36 25/36 35/36
x=6 1/36 4/36 9/36 16/36 25/36 36/36
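Such joint CDF tables can be generated mechanically, since FX,Y (x, y) just
accumulates the joint mass. A brief sketch (ours, not from the text); the
function name F is our own:

    from fractions import Fraction
    from collections import Counter

    # Joint mass of (min, max) over the 36 equally likely dice outcomes.
    joint = Counter((min(i, j), max(i, j))
                    for i in range(1, 7) for j in range(1, 7))
    p = {xy: Fraction(c, 36) for xy, c in joint.items()}

    def F(x, y):
        # Joint CDF F_{X,Y}(x, y) = P(X <= x and Y <= y).
        return sum(pr for (a, b), pr in p.items() if a <= x and b <= y)

    assert F(2, 4) == Fraction(12, 36)   # matches the worked computation
    assert F(5, 2) == Fraction(4, 36)
    assert F(6, 6) == 1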
Example 9.4. Flip a fair coin three times. Let X be the total number of heads
that appear, and let Y be the total number of tails that appear.
First of all, we notice that X + Y = 3 always. So we are certain to have
pX,Y (x, y) = 0 if x + y ≠ 3. We can easily make a chart of all of the possible
outcomes—organized strategically into events according to the values of X and
Y that they induce—and the associated values of the joint mass of X and Y :
Event Probability Joint Mass of X and Y
{(H, H, H)} 1/8 pX,Y (3, 0) = 1/8
{(H, H, T ), (H, T, H), (T, H, H)} 3/8 pX,Y (2, 1) = 3/8
{(T, T, H), (T, H, T ), (H, T, T )} 3/8 pX,Y (1, 2) = 3/8
{(T, T, T )} 1/8 pX,Y (0, 3) = 1/8
As another example, FX,Y (2, 2) = FX,Y (1, 2) + FX,Y (2, 1) = 3/8 + 3/8 = 6/8.
In general, we have
FX,Y (x, y) y=0 y=1 y=2 y=3
x=0 0 0 0 1/8
x=1 0 0 3/8 4/8
x=2 0 3/8 6/8 7/8
x=3 1/8 4/8 7/8 8/8
Example 9.5. Roll a 6-sided fair die and let X denote the outcome. Also flip
a spinner that shows “1” with probability 0.30, or shows “2” with probability
0.32, or shows “3” with probability 0.38, and let Y denote the outcome.
Then for each integer j from 1 to 6 inclusive, since the die roll and the spinner
do not affect each other,
pX,Y (j, 1) = (1/6)(0.30), pX,Y (j, 2) = (1/6)(0.32), pX,Y (j, 3) = (1/6)(0.38).
Remark 9.6. Since the joint mass and joint CDF are probabilities, then
0 ≤ pX,Y (x, y) ≤ 1 and 0 ≤ FX,Y (x, y) ≤ 1.
The joint mass, summed over all x’s and y’s, takes the probabilities from the
whole sample space into account, so
∑_x ∑_y pX,Y (x, y) = 1.
Example 9.7. As in Example 9.3, roll two dice. Let X denote the minimum
of the two values that appear, and let Y denote the maximum of the two values
that appear.
For example, the mass of X (the minimum) at 3 is
pX (3) = P ({(3, 3), (3, 4), (3, 5), (3, 6), (4, 3), (5, 3), (6, 3)}) = 7/36.
The mass of Y (the maximum) is:
y       1     2     3     4     5     6
pY (y)  1/36  3/36  5/36  7/36  9/36  11/36
Alternatively, we can compute the mass of X directly from the joint mass of X
and Y , by letting Y take on any value. For instance,
pX (3) = ∑_{y=1}^{6} pX,Y (3, y) = 0 + 0 + 1/36 + 2/36 + 2/36 + 2/36 = 7/36.
Remark 9.8. Calculating the mass of one variable from the joint
mass. The mass of X can be calculated by summing the joint mass over all
possible values of Y :
pX (x) = ∑_y pX,Y (x, y).
Similarly, the mass of Y is the sum of the joint mass over all possible values
of X:
pY (y) = ∑_x pX,Y (x, y).
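Summing out one variable is a one-line operation in code. A small sketch
(ours, not from the text), using the (min, max) example above:

    from fractions import Fraction
    from collections import Counter, defaultdict

    # Joint mass of (min, max) of two dice, as in Example 9.7.
    joint = {xy: Fraction(c, 36)
             for xy, c in Counter((min(i, j), max(i, j))
                                  for i in range(1, 7)
                                  for j in range(1, 7)).items()}

    p_X, p_Y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), pr in joint.items():
        p_X[x] += pr   # sum over y: marginal mass of X (the minimum)
        p_Y[y] += pr   # sum over x: marginal mass of Y (the maximum)

    assert p_X[3] == Fraction(7, 36)
    assert p_Y[4] == Fraction(7, 36)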
Example 9.9. Again, as in Example 9.3, roll two dice. Let X denote the
minimum of the two values that appear, and let Y denote the maximum of the
two values that appear.
For example,
FX (2) = P ({(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (4, 1), (4, 2), (5, 1), (5, 2), (6, 1), (6, 2)})
= 20/36.
More generally, the CDFs of X and Y at the integers are:
x 1 2 3 4 5 6
FX (x) 11/36 20/36 27/36 32/36 35/36 36/36
y       1     2     3     4     5     6
FY (y)  1/36  4/36  9/36  16/36  25/36  36/36
As an alternative, we can compute the CDF of X directly from the joint CDF
of X and Y , by letting Y → ∞, e.g., FX (2) = lim_{y→∞} FX,Y (2, y) =
FX,Y (2, 6) = 20/36, since Y never exceeds 6.
Remark 9.10. Calculating the CDF of one variable from the joint
CDF. The CDF of X can be calculated by taking the limit as y → ∞ in the
joint CDF:
FX (x) = lim_{y→∞} FX,Y (x, y).
9.2 Independent Random Variables
Example 9.12. Roll a die and flip a fair coin. Let X be the result of the die
roll. Let Y be 0 if the coin shows a “tail” or 1 if the coin shows a “head.” Notice
pX,Y (x, y) = 1/12
for all 1 ≤ x ≤ 6 and 0 ≤ y ≤ 1, since all twelve of these outcomes are equally
likely. Also
pX (x) = 1/6 for 1 ≤ x ≤ 6,
pY (y) = 1/2 for 0 ≤ y ≤ 1.
So pX,Y (x, y) = pX (x)pY (y) for all x and y, and hence X and Y are
independent.
3. We will define the conditional mass later in this chapter, but we go ahead
and state a way to use conditional mass for independence: either
(a) the conditional mass pX|Y (x | y) of X given Y = y equals the (uncondi-
tional) mass of X, i.e.,
pX|Y (x | y) = pX (x),
or (b) the conditional mass pY |X (y | x) of Y given X = x equals the (uncon-
ditional) mass of Y , i.e.,
pY |X (y | x) = pY (y).
Once we get more familiar with these concepts, we will not go through such
tedious calculations. Sometimes we will just go ahead and observe that “X and
Y are independent,” but for now it is good to practice a little bit.
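Checking independence by factoring the joint mass is also easy to automate.
A minimal sketch (ours, not from the text) for the die-and-coin experiment of
Example 9.12:

    from fractions import Fraction
    from itertools import product

    # All 12 outcomes of (die, coin) are equally likely.
    joint = {(x, y): Fraction(1, 12) for x, y in product(range(1, 7), (0, 1))}
    p_X = {x: Fraction(1, 6) for x in range(1, 7)}
    p_Y = {y: Fraction(1, 2) for y in (0, 1)}

    # X and Y are independent iff the joint mass factors at every point.
    print(all(joint[x, y] == p_X[x] * p_Y[y] for (x, y) in joint))  # True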
Example 9.14. Let X indicate whether the first baby born to a certain mother
is a girl, i.e., X = 1 if the first baby born is a girl; otherwise, X = 0. Let Y
indicate whether the second baby born to a certain mother is a girl, i.e., Y = 1
if the second baby born is a girl; otherwise, Y = 0. (We are not considering the
birth of twins, in which one baby’s sex might affect the other.) Since
pX,Y (x, y) = 1/4 for each x and y in the set {0, 1},
and since X and Y each take values in the set {0, 1}, we can factor the joint
mass pX,Y (x, y) = 1/4 into 1/4 = (1/2) · (1/2), and we must have pX (x) = 1/2
for x = 0, 1 and pY (y) = 1/2 for y = 0, 1. So X and Y are independent.
Example 9.15. Roll a die. Let X be 1 if the outcome is 1, 3, or 5, and let X
be 0 otherwise. Let Y be 1 if the outcome is 5 or 6, and let Y be 0 otherwise.
Then X and Y are independent.
Thus pX,Y (x, y) can be factored as pX,Y (x, y) = pX (x)pY (y), by writing
pX (1) = 1/2 and pX (0) = 1/2,
and
pY (1) = 1/3 and pY (0) = 2/3.
A moment’s thought shows that the commas (in the four lines above) can be
replaced with equalities if and only if A and B are independent events or equiv-
alently if X and Y are independent random variables.
Example 9.17. Flip a coin until a head appears. Let A denote the event that
an even number of flips are required. (The case that a head never appears can
be included in A; it does not matter, since the probability that a head never
appears is 0.) Let B denote the event that 11 or more flips are necessary. Let X
indicate whether A occurs (i.e., X = 1 if A occurs, and X = 0 otherwise). Let
Y indicate whether B occurs (i.e., Y = 1 if B occurs, and Y = 0 otherwise).
P (A) = P (T H) + P (T T T H) + P (T T T T T H) + P (T T T T T T T H) + · · ·
= (1/2)^2 + (1/2)^4 + (1/2)^6 + (1/2)^8 + · · ·
= 1/4 + (1/4)^2 + (1/4)^3 + (1/4)^4 + · · ·
= (1/4)(1 + (1/4) + (1/4)^2 + (1/4)^3 + · · · )
= (1/4) · 1/(1 − 1/4)
= (1/4)/(3/4)
= 1/3.
(We are using geometric sums.) To get an intuitive idea why P (A) = 1/3,
look at the first two flips:
1. If the first two flips are HH, then Ac occurs, i.e., an odd number of flips
was needed to see the first head.
2. If the first two flips are HT , then Ac occurs, i.e., an odd number of flips
was needed to see the first head.
3. If the first two flips are T H, then A occurs, i.e., an even number of flips
was needed to see the first head.
4. If the first two flips are T T , then we essentially start over; just look at
the next pair of flips.
Thus A occurs in exactly 1 out of the 3 deciding cases (and the 3 deciding cases
are all equally likely).
Finally, we calculate the probability of A and B occurring. Writing a string of
k consecutive T ’s as T^k , we have
P (A ∩ B) = P (T^11 H) + P (T^13 H) + P (T^15 H) + · · ·
= (1/2)^12 + (1/2)^14 + (1/2)^16 + (1/2)^18 + · · ·
= (1/4)^6 + (1/4)^7 + (1/4)^8 + (1/4)^9 + · · ·
= (1/4)^6 (1 + (1/4) + (1/4)^2 + (1/4)^3 + · · · )
= (1/4)^6 · 1/(1 − 1/4)
= (1/4)^6 (4/3)
= (1/4)^5 (1/3).
Also, P (B) = P ({the first 10 flips are all tails}) = (1/2)^10 = (1/4)^5 . Therefore
P (A ∩ B) = (1/4)^5 (1/3) = P (B)P (A), so A and B are independent events;
equivalently, X and Y are independent random variables.
So we conclude that, when X1 , X2 , . . . , Xn are independent random variables,
pX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = pX1 (x1 )pX2 (x2 ) · · · pXn (xn )
for all x1 , x2 , . . . , xn , and similarly
FX1 ,X2 ,...,Xn (x1 , x2 , . . . , xn ) = FX1 (x1 )FX2 (x2 ) · · · FXn (xn )
for all x1 , x2 , . . . , xn .
9.4 Conditional Probability Mass Functions
The conditional mass of X given Y = y is
pX|Y (x | y) = P (X = x | Y = y) = pX,Y (x, y)/pY (y).
For the conditional mass of X given Y to make sense, we must use Y values
such that P (Y = y) > 0.
Example 9.24. Roll two dice. Let X denote the value of the first die, and let
Y denote the value of the sum of the two dice. If we are given Y = 4, then we
know that the set of possible outcomes is {(1, 3), (2, 2), (3, 1)}, and these three
outcomes are equally likely. So pX|Y (x | 4) = 1/3 for x = 1, 2, 3, and
pX|Y (x | 4) = 0 otherwise.
Example 9.25. Flip the cards from a deck over, one at a time, until the whole
deck has been flipped over. Let X denote the number of cards until the first
ace appears. Let Y denote the number of cards until the first queen appears.
Then X and Y are dependent.
To see that X and Y are dependent, first notice that pY (1) > 0 because it is
possible (i.e., the probability is positive) that the first card is a queen. On the
other hand, if X = 1, i.e., if the first card is an ace, then the first card is not
a queen, so pY |X (1 | 1) = 0. Thus pY (y) ≠ pY |X (y | x) when x and y are both
equal to 1. So X and Y are not independent. In fact, X and Y are dependent.
(It is worthwhile to compare and contrast with Example 9.15, to see what is
similar and different in these two examples.)
Example 9.26. Roll a die. Let X be 1 if the outcome is 1, 2, or 3, and let X
be 0 otherwise. Let Y be 1 if the outcome is even, and let Y be 0 otherwise.
To see that X and Y are dependent, first notice that pY (1) = 1/2 because the
outcome is even with probability 1/2. On the other hand, if X = 1, i.e., if
the outcome is 1, 2, or 3, then the outcome is even with probability 1/3, so
pY |X (1 | 1) = 1/3. Thus pY (y) = 1/2 ≠ 1/3 = pY |X (y | x) when x and y
are both equal to 1. So X and Y are not independent. In fact, X and Y are
dependent.
Example 9.27. As in Example 5.2, in a certain household, 20% of the milk has
two-percent milkfat, and the other 80% of the milk is whole milk. The whole
milk is spoiled 5% of the time; overall, the milk is spoiled 4.7% of the time.
In that example, we let A denote the event that the milk came from a whole
milk carton, and we let B denote the event that the milk was spoiled. We were
given P (A) = 0.80, and we calculated P (A | B) = 0.85. Now let X indicate
whether A occurs (X = 1 if the milk is whole milk, and X = 0 otherwise), and
let Y indicate whether B occurs (Y = 1 if the milk is spoiled, and Y = 0
otherwise).
Then X and Y are dependent. To see this, notice P (A) = 0.80 is the same
as pX (1) = 0.80, and P (A | B) = 0.85 is the same as pX|Y (1 | 1) = 0.85. So
pX (x) ≠ pX|Y (x | y) when x = 1 and y = 1.
Example 9.28. In Example 9.15, X and Y are independent. (In that example,
we roll a die and let X be 1 if the outcome is 1, 3, or 5; let X be 0 otherwise.
Let Y be 1 if the outcome is 5 or 6; let Y be 0 otherwise.) An alternative way
to see that X and Y are independent is to show that pY |X (y | x) = pY (y).
We have pY (1) = 1/3 and pY (0) = 2/3.
Also, given that X = 1, then the outcome is 1, 3, or 5, so Y = 1 exactly
1/3 of the time, thus pY |X (1 | 1) = 1/3 and pY |X (0 | 1) = 2/3; similarly, given
that X = 0, then the outcome is 2, 4, or 6, so Y = 1 exactly 1/3 of the time,
so pY |X (1 | 0) = 1/3 and pY |X (0 | 0) = 2/3. So, regardless of the value of x, we
have pY |X (1 | x) = 1/3 and pY |X (0 | x) = 2/3.
Thus pY |X (y | x) = pY (y) in all cases. So X and Y are independent.
Example 9.29. As in Exercises 2.21 and 8.2, a sequence of seven people walk
into a post office (one at a time) and only their sexes are noted. Assume that
each of the seven customers is equally likely to be a man or a woman. Let Y
denote the number of customers that are female. Let X = 0 if the first person
is a male, or X = 1 if the first person is a female.
and, using pX|Y (x | y) = pX,Y (x, y)/pY (y):
pX|Y (1 | 0) = pX,Y (1, 0)/pY (0) = 0/(1/128) = 0,
pX|Y (1 | 1) = pX,Y (1, 1)/pY (1) = (1/128)/(7/128) = 1/7,
pX|Y (1 | 2) = pX,Y (1, 2)/pY (2) = (6/128)/(21/128) = 2/7,
pX|Y (1 | 3) = pX,Y (1, 3)/pY (3) = (15/128)/(35/128) = 3/7,
pX|Y (1 | 4) = pX,Y (1, 4)/pY (4) = (20/128)/(35/128) = 4/7,
pX|Y (1 | 5) = pX,Y (1, 5)/pY (5) = (15/128)/(21/128) = 5/7,
pX|Y (1 | 6) = pX,Y (1, 6)/pY (6) = (6/128)/(7/128) = 6/7,
pX|Y (1 | 7) = pX,Y (1, 7)/pY (7) = (1/128)/(1/128) = 1.
Notice: The two terms in each row have a sum of 1, i.e., pX|Y (0 | y) +
pX|Y (1 | y) = 1 for each y, because the sum over x’s, i.e., ∑_x pX|Y (x | y),
must be 1 for each fixed value of y, because the first person to enter is either
male (X = 0) or female (X = 1).
Method #2 of computing the conditional mass. In hindsight, we
could have computed pX|Y (x | y) directly, but it was perhaps helpful to go
through such a rigorous exercise. Now we directly compute pX|Y (x | y); this is
much, much shorter but requires some subtle insight. Suppose that it is known
that Y = y. Then, in other words, it is known that exactly y out of the 7
people are female. The first person is equally likely to be any of these 7 people,
of which y are female and 7−y are male. So the probability that the first person
is male is exactly (7 − y)/7; the probability that the first person is female is
exactly y/7. So we always have the following: Given that Y = y for some
integer y between 0 and 7, the probability that the first person is a male is
exactly
pX|Y (0 | y) = (7 − y)/7;
the probability that the first person is a female is exactly
pX|Y (1 | y) = y/7.
In closing, we note that the joint mass of two random variables X, Y is equal
to the conditional mass of X given Y multiplied by the mass of Y .
Remark 9.30. For all random variables X and Y (regardless of indepen-
dence), we have
pX,Y (x, y) = pX|Y (x | y)pY (y).
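The pattern pX|Y (x | y) = pX,Y (x, y)/pY (y) is also easy to check by machine.
A short sketch (ours, not from the text) verifying pX|Y (1 | y) = y/7 in the
post office example:

    from fractions import Fraction
    from math import comb

    # Y = number of females among 7 people; X indicates the first person is
    # female. Then p_{X,Y}(1, y) = C(6, y-1)/128 and p_Y(y) = C(7, y)/128.
    for y in range(1, 8):
        p_joint = Fraction(comb(6, y - 1), 128)
        p_Y = Fraction(comb(7, y), 128)
        assert p_joint / p_Y == Fraction(y, 7)   # p_{X|Y}(1 | y) = y/7
    print("conditional mass verified for y = 1, ..., 7")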
9.5 Exercises
9.5.1 Practice
Exercise 9.1. Random employee hiring. Ten students apply for a job
opening, but only 1 of the students will be selected. The employer chooses
randomly; all ten outcomes are equally likely. If person 3, 5, 7, or 9 gets the
job, let X = 1; otherwise, X = 0. If person 1, 2, 3, 4, or 5 gets the job, let
Y = 1; otherwise, Y = 0. Are X and Y independent random variables? Justify
your answer.
Exercise 9.5. Butterflies. Alice, Bob, and Charlotte are looking for butter-
flies. They look in three separate parts of a field, so that their probabilities of
success do not affect each other.
• Alice finds 1 butterfly with probability 17%, and otherwise does not find
one.
• Bob finds 1 butterfly with probability 25%, and otherwise does not find
one.
• Charlotte finds 1 butterfly with probability 45%, and otherwise does not
find one.
Let X be the number of butterflies that they find altogether. Let Y be the
number of people who do not find a butterfly.
Find the joint mass pX,Y (x, y) of X and Y .
Exercise 9.7. Pick two cards. Pick two cards at random from a well-shuffled
deck of 52 cards (pick them simultaneously, i.e., grab two cards at once—so they
are not the same card!). There are 12 cards which are considered face cards (4
Jacks, 4 Queens, 4 Kings). There are 4 cards with the value 10. Let X be the
number of face cards in your hand; let Y be the number of 10’s in your hand.
Are X and Y dependent or independent?
9.5.2 Extensions
Exercise 9.8. Dice. Roll two dice, one colored red and one colored blue. Let
Y denote the maximum value that appears on the two dice. Let X denote the
value of the blue die. Find the conditional mass of X given Y .
Exercise 9.10. Two 4-sided dice. Consider some 4-sided dice. Roll two of
these dice. Let X denote the minimum of the two values that appear, and let
Y denote the maximum of the two values that appear.
9.5.3 Advanced
Exercise 9.11. Prove that the statements of independence in Definition 9.13
are equivalent.
Chapter 10
Expected Values of Discrete Random Variables
Your teacher can open the door, but you must enter by yourself.
—Proverb
You are having your weekly poker game with some of the people from your
dorm. The dealer shuffles the cards and then deals each person five cards. How
many of the cards in your hand do you expect to be hearts?
10.1 Introduction
An expected value (also called a (weighted) average or mean) says something
succinct about a random variable. The expected value is one way to measure the
center of the distribution. It does not tell us everything about the distribution;
many different kinds of random variables can have the same expected value.
Nonetheless, an expected value of a random variable is helpful. It is a sum of
the values that a random variable takes, each weighted in proportion to the
mass the random variable has at that value.
Definition 10.1. Expected value of a discrete random variable
A discrete random variable X that takes on values x1 , x2 , . . . , xn has expected
value
E(X) = ∑_{j=1}^{n} xj pX (xj ).
10.2 Examples
Example 10.2. Flip a coin three times. Let X denote the number of heads.
The mass of X is pX (0) = 1/8, pX (1) = 3/8, pX (2) = 3/8, and pX (3) = 1/8
(as in Example 8.4), so the definition gives
E(X) = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 3/2.
For a second way to view the expected value, notice that we have eight
possible outcomes:
ω1 = T T T,
ω2 = HT T, ω3 = T HT, ω4 = T T H,
ω5 = HHT, ω6 = HT H, ω7 = T HH,
ω8 = HHH,
and we get, with this new interpretation of expected value, the very same result
at the end:
E(X) = (0)(1/8)
+ (1)(1/8) + (1)(1/8) + (1)(1/8)
+ (2)(1/8) + (2)(1/8) + (2)(1/8)
+ (3)(1/8)
= 3/2.
The reason that the two different definitions are equivalent is that we could just
group the 2nd, 3rd, and 4th terms above, which all have value “1” as the value
of X, to get
(1)(1/8) + (1)(1/8) + (1)(1/8) = (1)(3/8),
which was also found in the first definition. Similarly, we could just group the
5th, 6th, and 7th terms above, which all have value “2” as the value of X, and
we get
(2)(1/8) + (2)(1/8) + (2)(1/8) = (2)(3/8),
which was also found in the first definition. So these are just two different ways
of grouping things.
To summarize: When computing the expected value of a random variable X,
we can take one of the two following approaches:
1. a sum over all of the possible values of X, each weighted by the proba-
bility of X taking on that value, or
2. a sum over all of the possible outcomes, taking the value of X from such
an outcome, weighted by the probability of that outcome.
In the second method, where we enumerate the possible outcomes, we have
the following:
Remark 10.4. Expected value of a discrete random variable (always
gives same result as the original definition)
Consider a random phenomenon with possible outcomes ω1 , ω2 , . . . , ωn . Sup-
pose that the outcome ωj causes random variable X to take on value xj . Then
the discrete random variable X has expected value
E(X) = ∑_{j=1}^{n} xj P ({ωj }).
If the random phenomenon can take on one of infinitely many possible out-
comes ω1 , ω2 , . . ., and we again suppose that the outcome ωj causes random
variable X to take on value xj , then the expected value of X is
E(X) = ∑_{j=1}^{∞} xj P ({ωj }).
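Both ways of computing an expected value are easy to carry out by machine.
A minimal sketch (ours, not from the text), for the number of heads in three
flips:

    from fractions import Fraction
    from itertools import product

    # Method 1: sum over the values of X, weighted by the mass.
    mass = {0: Fraction(1, 8), 1: Fraction(3, 8),
            2: Fraction(3, 8), 3: Fraction(1, 8)}
    e1 = sum(x * p for x, p in mass.items())

    # Method 2: sum over the 8 equally likely outcomes, weighted by 1/8 each.
    e2 = sum(Fraction(w.count("H"), 8) for w in product("HT", repeat=3))

    assert e1 == e2 == Fraction(3, 2)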
Example 10.5. Consider a class consisting of eight students, who earn the
following scores on an exam: 75, 92, 88, 94, 89, 60, 83, 84. Let X be the
exam score of a randomly chosen student. Each student is equally likely to be
selected. What is the expected value of X?
There are eight possible outcomes in this random phenomenon, each of which
has probability 1/8. So, following the second definition of expected value, we
obtain
E(X) = (75 + 92 + 88 + 94 + 89 + 60 + 83 + 84)(1/8) = 665/8 = 83.125.
Example 10.6. Roll a die three times. Let X be the number of times that a
6 appears. Find the expected value of X.
We compute:
P (X = 0) = (5/6)^3 ;
P (X = 1) = (3)(1/6)(5/6)^2
(since there are 3 ways to have 1 occurrence of 6, and each way has probability
(1/6)(5/6)^2 );
P (X = 2) = (3)(1/6)^2 (5/6)
(since there are 3 ways to have 2 occurrences of 6, and each way has probability
(1/6)^2 (5/6));
P (X = 3) = (1/6)^3 .
So the expected value of X is
E(X) = (0)(125/216) + (1)(75/216) + (2)(15/216) + (3)(1/216) = 108/216 = 1/2.
If we were to have 100 people each roll a die three times, and then we took
the average of these results (i.e., added the 100 results and divided by 100),
the average from these 100 experiments would likely be very close to 0.5. It
is worthwhile to try it and see. One thousand people would likely give us an
average even closer to 0.5. We will consider what actually happens in the long
run with experiments later in the book, when discussing Normal approximations
and the Central Limit Theorem.
Example 10.8. As in Examples 1.13 and 2.17, a student shuffles a deck of cards
thoroughly (one time) and then selects cards from the deck without replacement
until the ace of spades appears. How many cards does the student expect to
draw until the ace of spades appears?
Let X be the number of cards needed until the ace of spades appears. Then,
as in Example 8.17, P (X = j) = 1/52 for j = 1, 2, . . . , 52. Thus,
E(X) = ∑_{j=1}^{52} (j)(1/52) = (1/52)(1 + 2 + · · · + 52) = (1/52)(52)(53)/2 = 53/2.
So the student expects to draw 53/2 cards (i.e., 26.5 cards) to see the ace of
spades appear. Here we used the helpful fact that 1 + 2 + · · · + n = (n)(n + 1)/2.
(We will further study such random variables, which are equally likely to be
any one of the values 1, 2, . . . , N , in Chapter 18.)
Example 10.9. A student draws cards from a standard deck of playing cards
until the ace of spades appears for the first time. After every unsuccessful draw,
the student replaces the card and shuffles the deck thoroughly before selecting
a new card. How many cards does the student expect to draw until the ace of
spades appears?
Let X denote the number of cards needed until the ace of spades appears for
the first time. Then P (X = j) = (51/52)j−1 (1/52) for each positive integer j.
So, using the fact that j x^(j−1) = (d/dx) x^j ,
E(X) = ∑_{j=1}^{∞} (j)(51/52)^(j−1) (1/52)
= (1/52) ∑_{j=1}^{∞} (d/dx) x^j |_{x=51/52}
= (1/52) (d/dx) ∑_{j=1}^{∞} x^j |_{x=51/52} .
Here we also use the geometric series fact that, for |x| < 1,
∑_{j=1}^{∞} x^j = x ∑_{j=0}^{∞} x^j = x/(1 − x).
So we get
E(X) = (1/52) (d/dx) [x/(1 − x)] |_{x=51/52}
= (1/52) [((1 − x)(1) − (x)(−1))/(1 − x)^2 ] |_{x=51/52}
= (1/52) [1/(1 − x)^2 ] |_{x=51/52}
= (1/52) · 1/(1 − 51/52)^2
= (1/52) · 1/(1/52)^2
= 52.
So the student expects to draw 52 cards to get the ace of spades. We will study
more examples like this in Chapter 16, on Geometric random variables.
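As a numerical sanity check (ours, not from the text), truncating the infinite
sum far out in the tail reproduces the answer:

    # E(X) = sum of j * (51/52)**(j-1) * (1/52); the tail beyond j = 20000
    # is negligible, so the truncated sum is approximately 52.
    total = sum(j * (51/52)**(j - 1) * (1/52) for j in range(1, 20001))
    print(total)   # approximately 52.0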
Example 10.10. A standard deck of 52 cards has 13 hearts. The cards in such
a deck are shuffled, and the top five cards are dealt to a player. What is the
expected number of hearts that the player receives?
Let X denote the number of hearts that the player receives. Then
P (X = 0) = (39)(38)(37)(36)(35)/((52)(51)(50)(49)(48)) = 2109/9520,
P (X = 1) = (5)(13)(39)(38)(37)(36)/((52)(51)(50)(49)(48)) = 27417/66640,
P (X = 2) = (10)(13)(12)(39)(38)(37)/((52)(51)(50)(49)(48)) = 9139/33320,
P (X = 3) = (10)(13)(12)(11)(39)(38)/((52)(51)(50)(49)(48)) = 2717/33320,
P (X = 4) = (5)(13)(12)(11)(10)(39)/((52)(51)(50)(49)(48)) = 143/13328,
P (X = 5) = (13)(12)(11)(10)(9)/((52)(51)(50)(49)(48)) = 33/66640.
(The factors 5 and 10 count the positions among the five cards at which the
hearts can appear.) So the expected value of X is
E(X) = (0)(2109/9520) + (1)(27417/66640) + (2)(9139/33320)
+ (3)(2717/33320) + (4)(143/13328) + (5)(33/66640) = 5/4.
Example 10.11. Jim and his brother both like chocolate chip cookies best.
They have a jar of cookies with 5 chocolate chip cookies, 3 oatmeal cookies,
and 4 peanut butter cookies. They are each allowed to have 3 cookies. To be
fair, they agree to randomly select their cookies without peeking, and they each
must keep the cookies that they select. How many chocolate chip cookies does
Jim expect to get? (Notice that it does not matter whether Jim or his brother
selects the cookies first—the answer will be the same, either way.)
Let X be the number of chocolate chip cookies that Jim selects. Since there are
12 cookies, there are (12)(11)(10) equally likely ordered outcomes for Jim’s
three selections. Exactly (7)(6)(5) of them have no chocolate chip cookies. So
P (X = 0) = (7)(6)(5)/((12)(11)(10)) = 7/44.
There are exactly 3 ways that one cookie could be chocolate chip; 5 such cookies
could be the chocolate one; the other cookies could be selected in (7)(6) ways.
So
P (X = 1) = (3)(5)(7)(6)/((12)(11)(10)) = 21/44.
There are exactly 3 ways that two cookies could be chocolate chips; 7 cookies
could be the non-chocolate chip one; the other cookies could be chocolate chips,
selected in (5)(4) ways. So
P (X = 2) = (3)(7)(5)(4)/((12)(11)(10)) = 7/22.
Finally, all three cookies could be chocolate chip:
P (X = 3) = (5)(4)(3)/((12)(11)(10)) = 1/22.
So the expected value of X is
E(X) = (0)(7/44) + (1)(21/44) + (2)(7/22) + (3)(1/22) = 5/4.
Note: The total number of outcomes is (12)(11)(10) = 1320. As a check,
210 + 630 + 420 + 60 = 1320, so all 1320 possible outcomes have been taken into
account.
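Because the number of outcomes here is tiny, the whole example can be
verified by brute-force enumeration. A sketch (ours, not from the text):

    from fractions import Fraction
    from itertools import permutations

    # 5 chocolate chip ('C') and 7 other ('O') cookies; Jim draws 3 in order.
    jar = "C" * 5 + "O" * 7
    draws = list(permutations(range(12), 3))   # (12)(11)(10) = 1320 outcomes
    e = Fraction(sum(sum(jar[i] == "C" for i in d) for d in draws), len(draws))
    assert e == Fraction(5, 4)                 # E(X) = 5/4, as computed above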
10.3 Exercises
10.3.1 Practice
Exercise 10.1. Graduation. As in Exercise 3.1, Jack and Jill are indepen-
dently struggling to pass their last (one) class required for graduation. Jack
needs to pass Calculus III, but he only has probability 0.30 of passing. Jill
needs to pass Advanced Pharmaceuticals, but she only has probability 0.46 of
passing. They work independently. Let X = 0 if neither of them graduates,
or X = 1 if exactly one of them graduates, or X = 2 if both of them graduate.
Find the expected value of X.
Exercise 10.2. Japanese pan noodles. As in Exercise 3.2, four students
order noodles at a certain local restaurant. Their orders are placed indepen-
dently. Each student is known to prefer Japanese pan noodles 40% of the time.
How many of them do we expect to order Japanese pan noodles?
Exercise 10.3. Waiting for folk song. Angelina’s music player, in “shuf-
fle” mode, will play songs without any repetitions, until every song has been
played exactly once. The number of songs of each genre is the following: 330
blues songs; 537 jazz songs; and 1 folk song. So there are 868 songs altogether.
(All possible “shuffles”—i.e., all possible orderings of the 868 songs—are equally
likely.) How many songs does Angelina expect to listen to, until the folk song
finally appears? Please justify your answer. Hint: Use the method in Exam-
ple 10.8.
Exercise 10.4. Bowling strikes. In a bowling alley, 20% of the time when
someone bowls, he or she gets a strike. If there are 3 people in the bowling alley,
and X is the total number of people who get a strike on their current attempt,
what is the expected value of X?
Exercise 10.5. Career fair. You go to a career fair and have some job inter-
views. Based on career fair data, you think you have a 30% chance of getting
an offer for $40,000, a 40% chance of getting an offer of $44,000, a 25% chance
of getting an offer for $51,000, and a 5% chance of an offer for $57,000. What
is your expected offer salary?
Exercise 10.6. Golden ticket. A student was at work at the county am-
phitheater, and was given the task of cleaning 1500 seats. To make the job
more interesting, his boss hid a golden ticket somewhere in the seats. The
ticket is equally likely to be in any of the seats. Let X be the number of seats
cleaned until the ticket is found. Calculate the expected value of X.
Exercise 10.7. Football. A man is looking for a football game to watch on a
Sunday night. He has ten channels, and only one of them shows football, but
he can’t remember which one. Assume they are all equally likely to have it, and
let X be the number of channels he tries until he finds it. What is the expected
number of channels he tries until he finds it?
Exercise 10.8. Cashews. There is a bowl containing 30 cashews, 20 pecans,
25 almonds, and 25 walnuts. I am going to randomly pick and eat 3 nuts. What
is the expected number of cashews I will eat?
Exercise 10.9. Chess. Let X be the number of games of chess I win against
Stephen. Assume that I have a 30% chance of winning in any particular game
Exercise 10.10. Sports fans. The All-Star Pigeons lose 10 fans every time
they lose a game and gain 100 fans every time they win. Assume they have
a 30% chance of losing each game and that their performance in each game is
independent. What is the expected value of X, the number of fans they will
gain or lose over the next 3 games?
Exercise 10.11. Slot machine. People who play a slot machine win a prize
7% of the time. Let X denote the number of times the slot machine is used
until the next winner is found. What is the expected value of X?
10.3.2 Extensions
Exercise 10.12. Butterflies. Alice, Bob, and Charlotte are looking for but-
terflies. They look in three separate parts of a field, so that their probabilities
of success do not affect each other.
• Alice finds 1 butterfly with probability 17%, and otherwise does not find
one.
• Bob finds 1 butterfly with probability 25%, and otherwise does not find
one.
• Charlotte finds 1 butterfly with probability 45%, and otherwise does not
find one.
Let X be the number of butterflies that they find altogether. Find E(X).
Exercise 10.14. Two 4-sided dice. Consider some 4-sided dice. Roll two of
these dice. Let X denote the minimum of the two values that appear, and let
Y denote the maximum of the two values that appear.
Exercise 10.15. Pick two cards. Pick two cards at random from a well-
shuffled deck of 52 cards (pick them simultaneously, i.e., grab two cards at
once—so they are not the same card!). There are 12 cards which are considered
face cards (4 Jacks, 4 Queens, 4 Kings). There are 4 cards with the value 10.
Let X be the number of face cards in your hand; let Y be the number of 10’s
in your hand.
Exercise 10.17. Claw. Jorge has three kids who spotted a Claw machine with
toys they want. In stores the toys cost $10 each, but each play on the claw
only costs $1. The probability of Jorge winning a game on the Claw is 0.12.
Should he use the Claw to get the toys (he needs one toy per kid), or does he
expect it to be cheaper to buy them in the stores?
Exercise 10.19. Sum of dice. Two fair dice are rolled. Let X be the sum of
the dice. What is the expected value of X?
Chapter 11
Expected Values of Sums of Random Variables
11.1 Introduction
One useful property of expected values of random variables is linearity, i.e.:
1. the expected value of a sum of random variables is equal to the sum of
the expected values, and
2. constants can be factored out of expected values.
Theorem 11.1. Expected value of the sum of discrete random
variables
If X1 , X2 , . . . , Xn are discrete random variables with finite expected values,
and a1 , a2 , . . . , an are constant numbers, then
E(a1 X1 + a2 X2 + · · · + an Xn ) = a1 E(X1 ) + a2 E(X2 ) + · · · + an E(Xn ).
The same property holds for a countably infinite collection of random variables.
This holds even when the Xj ’s are dependent.
The same type of justification works if there are a countably infinite number of
random variables and constants.
We often use the property above in just the case where n = 2 and a2 = 1
and X2 is a constant, say “b.” So Theorem 11.1 simplifies to
Corollary 11.2. For any random variable X and any constants a and b, we
have
E(aX + b) = aE(X) + b.
Using Theorem 11.1 and Theorem 11.3 together, we have a very powerful
probability tool that allows us to quickly resolve most of the problems from
Chapter 10. We write a random variable as a sum of indicator random vari-
ables, and we compute the expected value of each indicator random variable by
computing the probability of each corresponding event.
11.2 Examples
Example 11.4. Flip a coin three times. Let X denote the number of heads.
Let A1 , A2 , A3 be the events that the first, second, third flips (respectively) are
heads. Let X1 , X2 , X3 be indicators for A1 , A2 , A3 , i.e., X1 , X2 , X3 indicate
whether the first, second, third flips (respectively) are heads. So E(Xj ) =
P (Aj ) = 1/2 for each j. Also, X = X1 + X2 + X3 . Thus,
E(X) = E(X1 ) + E(X2 ) + E(X3 ) = 1/2 + 1/2 + 1/2 = 3/2.
Example 11.5. Consider a class consisting of eight students, who earn the
following scores on an exam: 75, 92, 88, 94, 89, 60, 83, 84. Let X be the
exam score of a randomly chosen student. Each student is equally likely to be
selected. What is the expected value of X?
Let Xj be the jth student’s score if the jth student is selected, and Xj = 0
otherwise. So, e.g., E(X1 ) = (75)(1/8) + (0)(7/8) = (75)(1/8). Then always
X = X1 + X2 + · · · + X8 , because one of the Xj ’s will be the selected student’s
score, and the other Xj ’s will be 0. Therefore,
E(X) = E(X1 + · · · + X8 )
= E(X1 ) + · · · + E(X8 )
= (75)(1/8) + (92)(1/8) + (88)(1/8) + (94)(1/8)
+ (89)(1/8) + (60)(1/8) + (83)(1/8) + (84)(1/8)
= 83.125.
Example 11.6. Roll a die three times. Let X be the number of times that a
6 appears. Find the expected value of X.
Let Aj be the event that the jth roll is a 6, so P (Aj ) = 1/6 for each j. Let Xj
be the indicator for Aj , i.e., Xj indicates whether the jth roll is a 6, so Xj = 1
if the jth roll is a 6, and Xj = 0 otherwise. Thus, E(Xj ) = P (Aj ) = 1/6 for
each j. Also, X = X1 + X2 + X3 . So
E(X) = E(X1 ) + E(X2 ) + E(X3 ) = 1/6 + 1/6 + 1/6 = 1/2.
More generally: if X1 , X2 , . . . , Xn all have the same expected value (for
instance, if they all have the same distribution), then
E(X1 + X2 + · · · + Xn ) = nE(X1 ).
Example 11.8. A standard deck of 52 cards has 13 hearts. The cards in such
a deck are shuffled, and the top five cards are dealt to a player. What is the
expected number of hearts that the player receives?
Let X denote the number of hearts that the player receives. Let Aj be the event
that the jth card is a heart, so P (Aj ) = 1/4 for each j. Let Xj be the indicator
for Aj , i.e., Xj indicates whether the jth card is a heart, so Xj = 1 if the jth
card is a heart, and Xj = 0 otherwise. This yields E(Xj ) = P (Aj ) = 1/4 for
each j. (This is true because we are just focusing momentarily on the jth value;
we are not considering the values of the other cards or conditioning on them.)
Also, X = X1 + X2 + X3 + X4 + X5 . So
E(X) = E(X1 ) + · · · + E(X5 ) = (5)(1/4) = 5/4.
Example 11.9. Jim and his brother both like chocolate chip cookies best.
They have a jar of cookies with 5 chocolate chip cookies, 3 oatmeal cookies,
and 4 peanut butter cookies. They are each allowed to have 3 cookies. To be
fair, they agree to randomly select their cookies without peeking, and they each
must keep the cookies that they select. How many chocolate chip cookies does
Jim expect to get? (Notice that it does not matter whether Jim or his brother
selects the cookies first—the answer will be the same, either way.)
Let X be the number of chocolate chip cookies that Jim selects. Let Aj be the
event that the jth cookie that Jim selects is chocolate chip, so P (Aj ) = 5/12
for each j. Let Xj be the indicator for Aj , i.e., Xj indicates whether the jth
cookie that Jim selects is chocolate chip, so Xj = 1 if the jth cookie that Jim
selects is chocolate chip, and Xj = 0 otherwise. So E(Xj ) = P (Aj ) = 5/12 for
each j. Also, X = X1 + X2 + X3 , so
E(X) = E(X1 ) + E(X2 ) + E(X3 ) = (3)(5/12) = 5/4.
Indicator random variables are useful even when they do not necessarily have
the same distribution. Some creativity is required for seeing how to apply
them. This creativity can be developed with experience and with persistence.
It is often worthwhile to see if indicator random variables are relevant to ap-
ply in a problem. We present some different ways that they can be used in
the two examples below, to demonstrate that there are often several different
approaches—using different kinds of indicator random variables—to solve the
same problem.
Example 11.10. A student shuffles a deck of cards thoroughly (one time) and
then selects cards from the deck without replacement until the ace of spades
appears. How many cards does the student expect to draw?
Method #1. Let Aj be the event that j or more draws are required, so
P (Aj ) = 1 − (j − 1)/52, because Aj occurs if and only if the first j − 1 draws
are failures. Let Xj be the indicator for Aj , i.e., Xj indicates whether j or
more draws are needed, so Xj = 1 if j or more draws are needed, and Xj = 0
otherwise. Therefore, we have E(Xj ) = P (Aj ) = 1 − (j − 1)/52 for each j.
Also, X = X1 + X2 + · · · + X52 because (for instance) if exactly 3 draws are
needed, then X1 , X2 , X3 are each equal to 1, and the other Xj ’s are 0, so
X = X1 + X2 + X3 , as desired. So
E(X) = ∑_{j=1}^{52} (1 − (j − 1)/52) = 52 − (1 + 2 + · · · + 51)/52.
Since 1 + 2 + · · · + n = (n)(n + 1)/2, we get
E(X) = 52 − ((51)(52)/2)/52 = 52 − 51/2 = (104 − 51)/2 = 53/2.
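The indicator sum in Method #1 is a one-liner to verify (a sketch of ours,
not from the text):

    from fractions import Fraction

    # E(X) = sum over j of P(j or more draws needed) = 1 - (j-1)/52.
    e = sum(1 - Fraction(j - 1, 52) for j in range(1, 53))
    assert e == Fraction(53, 2)   # 26.5 draws expected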
Method #2. Assign one indicator random variable to each card in the deck
(except the ace of spades). The indicator is 1 if the corresponding card is drawn
before the ace of spades, or 0 otherwise. The number of cards drawn is X1 +
X2 + · · · + X51 + 1, where the “+1” accounts for the draw of the ace of spades
itself. Each of the other 51 cards is equally likely to appear before or after the
ace of spades in the shuffled deck, so E(Xj ) = 1/2 for each j. Therefore
E(X) = (51)(1/2) + 1 = 53/2.
Method #3. (This method is only included here for readers who know and enjoy
induction; if the reader does not know induction, it is OK to skip this method.)
Let Xj denote the number of flips needed to find a particular card (e.g., the ace
of spades) in a deck of j cards. Then we claim that E(Xj ) = (j + 1)/2 for all j,
and we prove it by induction. If j = 1, then the first card must be the desired
card, so E(X1 ) = (1 + 1)/2 = 1; this shows the base case. Now we handle
the inductive step: If E(Xj−1 ) = j/2, we show that E(Xj ) = (j + 1)/2. Flip
the first card (from the deck of j cards); the first card is the desired card with
probability 1/j. The first card is not the desired card with probability (j −1)/j,
and in such a case, this flip (that just occurred) was used, plus an additional
E(Xj−1 ) flips will be needed, because the problem is essentially starting again
with j − 1 cards. So
1 j−1 1 j−1 j j+1
E(Xj ) = + (1 + E(Xj−1 )) = + 1+ = .
j j j j 2 2
In particular, when starting with 52 cards, E(X52 ) = 53/2 flips are expected.
Example 11.11. A student draws cards from a standard deck of playing cards
until the ace of spades appears for the first time. After every unsuccessful draw,
the student replaces the card and shuffles the deck thoroughly before selecting
a new card. How many cards does the student expect to draw until the ace of
spades appears?
Method #1. Let Aj be the event that j or more draws are required, so P (Aj ) =
(51/52)^(j−1) , because Aj occurs if and only if the first j − 1 draws are failures.
Let Xj be the indicator for Aj , i.e., Xj indicates whether j or more draws are
needed, so Xj = 1 if j or more draws are needed, and Xj = 0 otherwise. So
E(Xj ) = P (Aj ) = (51/52)^(j−1) for each j. Also, X = X1 + X2 + X3 + · · · because
(for instance) if exactly 3 draws are needed, then X1 , X2 , X3 are each equal to
1, and the other Xj ’s are 0, so X = X1 + X2 + X3 , as desired. Thus, we have
E(X) = ∑_{j=1}^{∞} (51/52)^(j−1) . It follows that
E(X) = ∑_{j=0}^{∞} (51/52)^j = 1/(1 − 51/52) = 1/(1/52) = 52.
So the student expects to draw 52 cards to see the ace of spades for the first
time.
Method #2. (Does not use indicators!) Let X be the number of draws that are
necessary. With probability 1/52, the ace of spades appears on the first draw.
With probability 51/52, the ace of spades does not appear on the first draw,
and the problem essentially starts over again: in this case, the original draw
was used, plus E(X) more draws will be needed. So we have
E(X) = (1/52)(1) + (51/52)(1 + E(X)) = 1/52 + 51/52 + (51/52)E(X) = 1 + (51/52)E(X).
Subtracting (51/52)E(X) from both sides, we get (1/52)E(X) = 1, and we
conclude that E(X) = 52.
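A quick Monte Carlo experiment (ours, not from the text) agrees with both
methods:

    import random

    def draws_until_ace():
        # Each draw independently finds the ace of spades with probability 1/52.
        n = 0
        while True:
            n += 1
            if random.randrange(52) == 0:
                return n

    trials = 100_000
    print(sum(draws_until_ace() for _ in range(trials)) / trials)  # near 52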
On the other hand, Y = 3X1 can only take on the values 0 or 3, and the mass
of Y is pY (0) = 1/2 and pY (3) = 1/2.
Therefore, although X1 + X2 + X3 and 3X1 have the same expected value, they
have very different distributions. (More generally, X1 + X2 + · · · + Xn and nX1
have different distributions in most similar examples.)
11.3 Exercises
11.3.1 Practice
Exercise 11.1. Graduation. As in Exercise 10.1, Jack and Jill are indepen-
dently struggling to pass their last (one) class required for graduation. Jack
needs to pass Calculus III, but he only has probability 0.30 of passing. Jill
needs to pass Advanced Pharmaceuticals, but she only has probability 0.46 of
passing. They work independently.
Following the notation of Exercise 10.1, let X be 0, 1, or 2, if (respectively)
neither, one, or both of them graduate. Let Y indicate if Jack graduates, so
Y = 1 if Jack graduates, and Y = 0 otherwise. Let Z indicate if Jill graduates,
so Z = 1 if Jill graduates, and Z = 0 otherwise. Notice that X = Y + Z always.
Now find E(X) using E(Y ) and E(Z).
Exercise 11.2. Japanese pan noodles. As in Exercise 10.2, four students
order noodles at a certain local restaurant. Their orders are placed indepen-
dently. Each student is known to prefer Japanese pan noodles 40% of the time.
How many of them do we expect to order Japanese pan noodles?
Let A1 , A2 , A3 , A4 be the events that (respectively) the first, second, third,
or fourth person orders Japanese pan noodles. Let X1 , X2 , X3 , X4 be indicator
random variables for (respectively) A1 , A2 , A3 , A4 . Justify your answer using
the values of the E(Xj )’s.
Exercise 11.3. Yellow ducks. Three hundred little plastic yellow ducks are
dumped in a pond; one of them contains a prize stamped on the bottom.
Leonardo examines each duck until he discovers the prize. He discards each
duck without a prize after he checks it, so that he never needs to check a duck
more than one time. How many ducks does he expect to check until he discovers
the prize?
Exercise 11.4. Super Breakfast Challenge. As in Exercises 8.5 and 10.18,
the Super Breakfast Challenge (SBC) consists of bacon, eggs, oatmeal, orange
juice, milk, and several other foods, and it costs $12.99 per person to order at
a local restaurant. It is known to be very difficult to consume the entire SBC.
Only 10% of people are able to eat all of the SBC. The other 90% of people will
be unable to eat the whole SBC (it is too much food!).
A probability student hears about the SBC and goes to the local restaurant.
He observes the number of customers, X, that attempt to eat the SBC, until
the first success. So if there are 4 failures and then 1 success (i.e., the outcome
is (F, F, F, F, T )), then X = 5.
Find the expected value of X, i.e., the number of customers expected to try
the SBC until the first success.
Exercise 11.5. Weather. A weather forecasting program gets the daily pre-
dictions right about 87% of the time. Assuming each day is independent, what
is the expected number of days that will pass until the program gets the forecast
wrong?
Exercise 11.6. Random movie picks. A man sits down to watch a movie.
40% of his options are action films, 35% are comedies, and 25% are dramas. He
wants to watch a comedy and starts picking movies at random from his box.
If previous picks are put back in the box after marking the choice, how many
movies is he expected to have to pick up until he finds a comedy?
Exercise 11.7. Dice. Robbie rolls a die until he gets a 6. What is the expected
number of rolls?
Exercise 11.8. Free throws. While shooting free throws, you have a 37%
chance of making a basket. How many shots do you expect to have to take until
making a basket?
Exercise 11.10. Butterflies. Alice, Bob, and Charlotte are looking for but-
terflies. They look in three separate parts of a field, so that their probabilities
of success do not affect each other.
• Alice finds 1 butterfly with probability 17%, and otherwise does not find
one.
• Bob finds 1 butterfly with probability 25%, and otherwise does not find
one.
• Charlotte finds 1 butterfly with probability 45%, and otherwise does not
find one.
Let X be the number of butterflies that they find altogether. Use indicator
random variables to find E(X).
11.3.2 Extensions
Exercise 11.15. Oreos. A box of Double Stuf Oreos has a defect. One of
the Oreos is only single-stuffed. There are 6 Oreos in the pack. What is the
expected number of Oreos you need to check in the pack until the defective one
is found?
Exercise 11.16. Flipping coins. Flip a coin until the second head comes up.
Let X be the number of flips needed to get the second head. What is E(X)?
Exercise 11.17. Waiting for favorite song. Michael puts his iTunes on
shuffle mode where songs are not allowed to be replayed. He has 2,781 songs
saved in iTunes, and exactly one of these is his favorite.
a. How many songs is he expected to have to listen to until his very favorite
song comes up?
b. Now suppose that he allows songs to be repeated. How many songs does
he expect to listen to until his very favorite song comes up?
Exercise 11.18. Crayons. A little girl has a 96-pack of crayons. She picks up
crayons, at random, to check the color, and she leaves them in a separate pile
after inspecting the color. She pulls crayons out of the pack until she gets the
sea foam green crayon.
a. What is the expected number of crayons she will check until she finds sea
foam green?
b. Assume instead that she puts each crayon back in the box before ran-
domly drawing the next crayon. Now how many crayons does she expect to
check until she finds sea foam green?
Exercise 11.19. Lectures and labs. Consider a group of 120 students. They
are split into two lectures (60 students each). They are also split into six labs
(20 students each). All assignments of students to lectures and labs are equally
likely. How many classmates does Barry expect to have in both his lecture and
his lab?
Exercise 11.21. Exiting the elevator. Eight people enter an elevator in the
parking garage below a building. Each person chooses her exit independently of
the other people. The building has floors 1 through 10. What is the expected
number of stops that the elevator makes?
Exercise 11.23. Rope tying. Consider n pieces of rope. Each piece is colored
blue at one end and red at the other. The blue ends of the ropes are randomly
paired with the red ends of the ropes and tied together, one-to-one, i.e., all n!
such possible methods of joining the ropes this way are equally likely. Let X
be the number of loops that result from this method. Find E(X).
Chapter 12
Variance of Discrete Random Variables
Two different teachers are teaching probability classes. Before registering, stu-
dents want to know how past students’ grades were distributed for each instruc-
tor. The students have a report of the mean and standard deviation for Exam 1
from last semester. The means are similar, but one instructor has a standard
deviation that is much larger than the other instructor’s. Is it good or bad for
exam scores to be widely spread out from the mean? Why?
12.1 Introduction
The expected value of a function of a random variable is a sum over all values
that the function of the random variable takes; as in Chapter 10, these values
are taken in proportion to the fraction of the time that the function of the
random variable takes each value.
Definition 12.1. Expected value of a function of a discrete random
variable
If g is any function, and X is a discrete random variable that takes on values
x1 , x2 , . . . , xn , then the expected value of g(X) is
E(g(X)) = ∑_{j=1}^{n} g(xj )P (X = xj ) = ∑_{j=1}^{n} g(xj )pX (xj ).
Example 12.2. Flip a coin three times. Let X denote the number of heads.
Find E(X^2 ), E(X^3 ), E(e^X ), E(2X − 5), and E(17X^3 + 5X^2 − 7X + 22).
Using the mass of X (namely pX (0) = 1/8, pX (1) = 3/8, pX (2) = 3/8, and
pX (3) = 1/8), we compute
E(X^2 ) = (0^2 )(1/8) + (1^2 )(3/8) + (2^2 )(3/8) + (3^2 )(1/8) = 24/8 = 3,
E(X^3 ) = (0^3 )(1/8) + (1^3 )(3/8) + (2^3 )(3/8) + (3^3 )(1/8) = 54/8 = 27/4,
E(e^X ) = (e^0 )(1/8) + (e^1 )(3/8) + (e^2 )(3/8) + (e^3 )(1/8) = 1/8 + (3/8)e + (3/8)e^2 + (1/8)e^3 .
The expected value of 2X − 5 is
E(2X − 5) = ((2)(0) − 5)(1/8) + ((2)(1) − 5)(3/8)
+ ((2)(2) − 5)(3/8) + ((2)(3) − 5)(1/8)
= −2.
As we learned in Chapter 11, the expected value is linear. Also, by treating the
“5” as a random variable that is always equal to 5, we can use the fact from
Chapter 10 that E(X) = 3/2, and we can simply calculate (as a shorter method
than above)
E(2X − 5) = 2E(X) − 5 = (2)(3/2) − 5 = −2.
We can use the facts that E(X) = 3/2, E(X^2) = 3, and E(X^3) = 27/4 to
compute, for instance, E(17X^3 + 5X^2 − 7X + 22), as follows:
E(17X^3 + 5X^2 − 7X + 22) = 17E(X^3) + 5E(X^2) − 7E(X) + 22 = (17)(27/4) + (5)(3) − (7)(3/2) + 22 = 565/4.
As in Chapter 10, we can also decompose things further, based upon the specific
outcome of the underlying random phenomenon: if the sample space consists of
outcomes ω1, . . . , ωm, then
E(g(X)) = Σ_{j=1}^{m} g(X(ωj)) P(ωj).
If there are a countably infinite number of such ω’s, we just adjust to sums
taken over all j ∈ N.
In the setting of Example 12.2, this means we can just decompose the ran-
dom phenomenon into all eight possible outcomes:
ω1 = T T T,
ω2 = HT T, ω3 = T HT, ω4 = T T H,
ω5 = HHT, ω6 = HT H, ω7 = T HH,
ω8 = HHH,
and we get the same result:
E(X^2) = (0^2)(1/8) + (1^2)(1/8) + (1^2)(1/8) + (1^2)(1/8) + (2^2)(1/8) + (2^2)(1/8) + (2^2)(1/8) + (3^2)(1/8) = 3.
(In an analogous way, we had two ways of computing expected values in Chapter 10.
The difference basically amounted to whether we grouped the 2nd, 3rd, and 4th
terms above, which all yield the same value of g(X), and likewise grouped
the 5th, 6th, and 7th terms above, which again all yield a common value of
g(X). So the two methods above are, as in Chapter 10, just two different
ways of grouping things.)
To summarize: When computing the expected value of a function g of a
random variable X, we can take one of the two following approaches:
1. a sum over all of the possible values of g(X), each weighted by the prob-
ability of X taking on value x, and thus g(X) taking on value g(x), or
2. a sum over all of the possible outcomes, taking the value of g(X) from
such an outcome, weighted by the probability of that outcome.
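The following short Python sketch (ours, not from the text) carries out both approaches for E(X^2) in the three-coin-flip setting and confirms that they agree.

from itertools import product

outcomes = list(product("HT", repeat=3))     # all 8 equally likely outcomes
g = lambda x: x**2

# Approach 2: sum over outcomes, each weighted by its probability 1/8.
by_outcomes = sum(g(w.count("H")) * (1/8) for w in outcomes)

# Approach 1: group outcomes by the value of X first, then sum over values.
mass = {x: sum(1/8 for w in outcomes if w.count("H") == x) for x in range(4)}
by_values = sum(g(x) * p for x, p in mass.items())

print(by_outcomes, by_values)                # 3.0 3.0, the same answer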
Example 12.4. Consider a class consisting of eight students, who earn the
following scores on an exam: 75, 92, 88, 94, 89, 60, 83, 84. Let X be the
exam score of a randomly chosen student. Each student is equally likely to be
selected. How much do we expect each student’s score to exceed 60, i.e., what
is the expected value of X − 60?
There are eight possible outcomes in this random phenomenon, each of which
has probability 1/8. So, following the second definition of expected value, we
obtain
E(X − 60) = (15 + 32 + 28 + 34 + 29 + 0 + 23 + 24)(1/8) = 185/8 = 23.125.
12.2 Variance
One of the reasons that expected values of functions of random variables are
so useful is the concept of variance. Since the variance of a random variable
is positive or zero (as we will see below), we can always compute the square
root of the variance; this quantity is referred to as the standard deviation of
the random variable. The standard deviation measures the spread of a random
variable around the mean, i.e., it describes how widely the mass of a random
variable is spread out. A larger variance (or standard deviation) signifies that
the values of the random variable tend to be spread relatively far from the
expected value. A smaller variance (or standard deviation) signifies that the
values of the random variable tend to be relatively closer to the expected value,
i.e., the values of the random variable do not tend to be very spread out.
We use µX = E(X) and σX^2 = Var(X) to simplify the notation. The variance of a
random variable X is defined as
Var(X) = E((X − µX)^2).
Since (X − µX)^2 is never negative, neither is its expected value, so
Var(X) = E((X − µX)^2) ≥ 0.
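Expanding the square and using the linearity of the expected value gives a handy equivalent form of the variance:
Var(X) = E((X − µX)^2) = E(X^2 − 2µX X + µX^2) = E(X^2) − 2µX E(X) + µX^2 = E(X^2) − µX^2.
This shortcut, Var(X) = E(X^2) − (E(X))^2, is the form used in most of the computations below.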
In Example 12.9, we see random variables X and Y that have the same
expected values but different variances.
Example 12.9. Recall Examples 10.5, 11.5, and 12.4, in which the expected
value of a randomly chosen student’s grade in a course is
E(X) = (75 + 92 + 88 + 94 + 89 + 60 + 83 + 84)(1/8) = 665/8 = 83.125.
Consider eight other students in a different course, who earn the following scores
on an exam: 82, 82, 81, 84, 89, 80, 83, 84. In this case, let Y be the exam score
of a randomly chosen student. Find Var(X) and Var(Y ).
The expected value of Y is
E(Y) = (82 + 82 + 81 + 84 + 89 + 80 + 83 + 84)(1/8) = 665/8 = 83.125,
so that the average value of the students in the two scenarios is the same, i.e.,
E(X) = E(Y).
On the other hand, the variances of the grades from the two groups of students
are different. We have
E(X^2) = (75^2 + 92^2 + 88^2 + 94^2 + 89^2 + 60^2 + 83^2 + 84^2)(1/8) = 7016.875
and
E(Y^2) = (82^2 + 82^2 + 81^2 + 84^2 + 89^2 + 80^2 + 83^2 + 84^2)(1/8) = 6916.375.
So the variance of X is
Var(X) = 7016.875 − (83.125)^2 ≈ 107.11,
and the variance of Y is
Var(Y) = 6916.375 − (83.125)^2 ≈ 6.61.
The variance of Y is much smaller than the variance of X. This signifies the
fact that the values of Y tend to be much closer to the expected value of Y ,
as compared to the values of X, which tend to be very spread out far from the
expected value of X. All of the values of Y are found in the 80’s, while the
values of X are spread from 60 all the way to 94.
Example 12.10. Flip a coin three times. Let X denote the number of heads.
Find the variance of X.
From Example 12.2 we know E(X) = 3/2 and E(X^2) = 3, so
Var(X) = 3 − (3/2)^2 = 3/4.
Example 12.11. Roll three dice. Let X denote the number of 6’s that appear
on the dice altogether. Find the variance of X.
P(X = 0) = (5/6)^3
P(X = 1) = (3)(1/6)(5/6)^2
P(X = 2) = (3)(1/6)^2(5/6)
P(X = 3) = (1/6)^3
So
E(X^2) = (0^2)(5/6)^3 + (1^2)(3)(1/6)(5/6)^2 + (2^2)(3)(1/6)^2(5/6) + (3^2)(1/6)^3 = 2/3.
Since E(X) = (3)(1/6) = 1/2, we get Var(X) = 2/3 − (1/2)^2 = 5/12.
Example 12.12. Suppose that you are playing a game where you roll the die
three times and count up the number of 6’s you get in those rolls. You have to
pay $3 to play, but you win $4 for every 6 that comes up. So your winnings are
4X − 3 dollars, where X is the total number of 6’s that appear. Since
E(X) = (3)(1/6) = 1/2, your expected winnings are E(4X − 3) = (4)(1/2) − 3 = −1 dollar.
It is worthwhile to note that the dealer wins −4X + 3 during this game. So the
dealer’s expected winnings are
E(−4X + 3) = (−4)(1/2) + 3 = 1 dollar.
Importantly, we conclude that your expected winnings are the same as the
dealer’s expected winnings, but with the sign reversed (i.e., −1 and 1, respec-
tively). We also conclude that your winnings and the dealer’s winnings have
the same variance and standard deviation. This is true more generally: the
constant terms do not affect the variance, and the signs of the coefficients are
lost during the squaring process in the variance.
Remark 12.13. For any random variable X and any constants a and b, we
have
Var(aX + b) = a2 Var(X).
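A minimal Python check of Remark 12.13 (our sketch, not from the text), using the die game of Example 12.12: both your winnings 4X − 3 and the dealer's winnings −4X + 3 have variance 4^2 Var(X) = 16(5/12) = 20/3.

from itertools import product

rolls = list(product(range(1, 7), repeat=3))
p = 1 / len(rolls)                     # each of the 216 outcomes has probability 1/216

def var(values):
    m = sum(v * p for v in values)
    return sum((v - m)**2 * p for v in values)

xs = [r.count(6) for r in rolls]       # X = number of 6's in three rolls
print(var([4*x - 3 for x in xs]))      # 6.666..., i.e., 20/3
print(var([-4*x + 3 for x in xs]))     # the same: the sign and the constant drop out
print(16 * var(xs))                    # 16 * Var(X) = 16 * 5/12 = 20/3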
Example 12.14. A student shuffles a deck of cards thoroughly (one time) and
then selects cards from the deck without replacement until the ace of spades
appears. What is the variance of the number of cards that the student draws
until the ace of spades appears?
Let X be the number of cards needed until the ace of spades appears. Then,
as we observed in Example 10.8, we have P(X = j) = 1/52 for each j = 1, . . . , 52.
So the expected value of X is
E(X) = Σ_{j=1}^{52} j(1/52) = 53/2 = 26.5,
and
E(X^2) = Σ_{j=1}^{52} j^2(1/52) = ((52)(53)(105)/6)(1/52) = 927.5.
Thus Var(X) = 927.5 − (26.5)^2 = 225.25.
Example 12.15. A standard deck of 52 cards has 13 hearts. The cards in such
a deck are shuffled, and the top five cards are dealt to a player. What is the
variance of the number of hearts that the player receives?
Let X denote the number of hearts that the player receives. In Example 10.10,
we computed the mass of X:
P(X = 0) = ((39)(38)(37)(36)(35))/((52)(51)(50)(49)(48)) = 2109/9520
P(X = 1) = 5((13)(39)(38)(37)(36))/((52)(51)(50)(49)(48)) = 27417/66640
P(X = 2) = 10((13)(12)(39)(38)(37))/((52)(51)(50)(49)(48)) = 9139/33320
P(X = 3) = 10((13)(12)(11)(39)(38))/((52)(51)(50)(49)(48)) = 2717/33320
P(X = 4) = 5((13)(12)(11)(10)(39))/((52)(51)(50)(49)(48)) = 143/13328
P(X = 5) = ((13)(12)(11)(10)(9))/((52)(51)(50)(49)(48)) = 33/66640
So we get
E(X) = (0)(2109/9520) + (1)(27417/66640) + (2)(9139/33320) + (3)(2717/33320) + (4)(143/13328) + (5)(33/66640),
or more simply, E(X) = 5/4; and
E(X^2) = (0^2)(2109/9520) + (1^2)(27417/66640) + (2^2)(9139/33320) + (3^2)(2717/33320) + (4^2)(143/13328) + (5^2)(33/66640),
so E(X^2) = 165/68. Thus, the variance of X is
Var(X) = 165/68 − (5/4)^2 = 235/272.
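As a numeric cross-check (our sketch, not part of the text), the same mass can be written with binomial coefficients, since dealing five cards amounts to an unordered selection; exact rational arithmetic reproduces E(X) = 5/4 and Var(X) = 235/272.

from math import comb
from fractions import Fraction

mass = {x: Fraction(comb(13, x) * comb(39, 5 - x), comb(52, 5)) for x in range(6)}
EX  = sum(x * p for x, p in mass.items())
EX2 = sum(x**2 * p for x, p in mass.items())
print(EX, EX2 - EX**2)   # 5/4 and 235/272, matching the text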
Example 12.16. Jim and his brother both like chocolate chip cookies best.
They have a jar of cookies with 5 chocolate chip cookies, 3 oatmeal cookies,
and 4 peanut butter cookies. They are each allowed to have 3 cookies. To be
fair, they agree to randomly select their cookies without peeking, and they each
must keep the cookies that they select. What is the variance of the number of
chocolate chip cookies that Jim gets? (Notice that it does not matter whether
Jim or his brother selects the cookies first—the answer will be the same, either
way.)
Let X denote the number of chocolate chip cookies that Jim selects. Notice
that there are (12)(11)(10) = 1320 equally likely outcomes for Jim. We already
computed the mass of X in Example 10.11:
P(X = 0) = ((7)(6)(5))/((12)(11)(10)) = 7/44
P(X = 1) = (3)((5)(7)(6))/((12)(11)(10)) = 21/44
P(X = 2) = (3)((7)(5)(4))/((12)(11)(10)) = 7/22
P(X = 3) = ((5)(4)(3))/((12)(11)(10)) = 1/22.
So, as we already computed in Example 10.11, the expected value of X is
E(X) = (0)(7/44) + (1)(21/44) + (2)(7/22) + (3)(1/22) = 5/4.
The expected value of X^2 is
E(X^2) = (0^2)(7/44) + (1^2)(21/44) + (2^2)(7/22) + (3^2)(1/22) = 95/44.
So the variance of X is
Var(X) = 95/44 − (5/4)^2 = 105/176.
If X and Y are independent random variables, then E(g(X)h(Y)) = E(g(X))E(h(Y))
for any functions g and h. To see this, we compute as follows (the second
equality holds because X and Y are independent):
E(g(X)h(Y)) = Σ_x Σ_y g(x)h(y) P(X = x and Y = y)
= Σ_x Σ_y g(x)h(y) P(X = x) P(Y = y)
= (Σ_x g(x) P(X = x)) (Σ_y h(y) P(Y = y))
= E(g(X)) E(h(Y)).
Using the identity functions g(x) = x and h(y) = y, the following nice fact
follows immediately: if X and Y are independent, then
E(XY) = E(X)E(Y).
When we are working with independent random variables, the variance of the sum
equals the sum of the variances, and when we pull constants outside of the
variance, they get squared. This is the content of Theorem 12.19: if X1, . . . , Xn
are independent random variables and a1, . . . , an are any constants, then
Var(a1X1 + · · · + anXn) = a1^2 Var(X1) + · · · + an^2 Var(Xn).
To see why, write
Var(a1X1 + · · · + anXn) = Var(Σ_{i=1}^{n} ai Xi)
= E((Σ_{i=1}^{n} ai Xi)^2) − (E(Σ_{i=1}^{n} ai Xi))^2
= E(Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj Xi Xj) − (Σ_{i=1}^{n} ai E(Xi))^2
= Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj E(Xi Xj) − Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj E(Xi)E(Xj).
Separating the i = j terms from the i ≠ j terms, and using E(Xi Xj) = E(Xi)E(Xj)
for i ≠ j (which holds by independence), we get
Var(a1X1 + · · · + anXn) = Σ_{i=1}^{n} ai^2 E(Xi^2) + Σ_{i=1}^{n} Σ_{j≠i} ai aj E(Xi)E(Xj) − Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj E(Xi)E(Xj).
The i ≠ j terms cancel, leaving
Var(a1X1 + · · · + anXn) = Σ_{i=1}^{n} ai^2 E(Xi^2) − Σ_{i=1}^{n} ai^2 E(Xi)E(Xi) = Σ_{i=1}^{n} ai^2 Var(Xi).
Corollary 12.20. If X is any random variable, and if a and b are any con-
stants, then
Var(aX + b) = a2 Var(X).
Thus σ_{aX+b} = √(Var(aX + b)) = √(a^2 Var(X)) = |a| √(Var(X)).
In the case where ai = 1 for all i, Theorem 12.19 has a simpler form, given
in Corollary 12.21. This corollary is very handy. (Just be sure that the random
variables are independent when using these results; otherwise, these ideas cannot
be applied.)
Corollary 12.21. If X1, . . . , Xn are independent random variables, then
Var(X1 + · · · + Xn) = Var(X1) + · · · + Var(Xn).
If, in addition, the Xj’s all have the same distribution, then
Var(X1 + · · · + Xn) = n Var(X1).
(Remember that the independence was not needed for the analogous method
of computing the expected value of a sum of random variables.)
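A small sketch (ours, not from the text) illustrating Corollary 12.21 on Example 12.23 below: the number of 6's in three rolls is a sum of three independent indicators, so its variance is 3 Var(X1).

from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=3))
p = Fraction(1, len(rolls))
xs = [r.count(6) for r in rolls]              # X = number of 6's
m = sum(x * p for x in xs)
print(sum((x - m)**2 * p for x in xs))        # 5/12, by direct enumeration
print(3 * Fraction(1, 6) * Fraction(5, 6))    # n * Var(X_1) = 5/12 as well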
Example 12.22. As in Example 12.10, flip a coin three times. Let X denote
the number of heads. Find the variance of X.
Write X = X1 + X2 + X3, where Xj indicates whether the jth flip is a head.
The Xj’s are independent, and each has Var(Xj) = pq = (1/2)(1/2) = 1/4, so
Var(X) = (3)(1/4) = 3/4, matching Example 12.10.
Example 12.23. As in Example 12.11, roll three dice. Let X denote the
number of 6’s that appear on the dice altogether. Find the variance of X.
Write X = X1 + X2 + X3, where Xj indicates whether the jth die shows a 6.
The Xj’s are independent, and each has Var(Xj) = pq = (1/6)(5/6) = 5/36, so
Var(X) = (3)(5/36) = 5/12, matching Example 12.11.
12.4 Exercises
12.4.1 Practice
Exercise 12.1. Variance of an indicator. Suppose event A occurs with
probability p, and X is an indicator for A, i.e., X = 1 if A occurs, or X = 0
otherwise. We already know E(X) = p. Find Var(X).
Exercise 12.3. Japanese pan noodles. As in Exercises 10.2 and 11.2, four
students order noodles at a certain local restaurant. Their orders are placed
independently. Each student is known to prefer Japanese pan noodles 40% of
the time. Let X be the number of students who order Japanese pan noodles.
What is the variance of X?
Exercise 12.4. Dice. Roll two dice; let X denote the maximum of the two
values that appear.
a. Find E(X).
b. Find E(X^2).
c. Find Var(X).
Exercise 12.5. Yellow ducks. Three hundred little plastic yellow ducks are
dumped in a pond; one of them contains a prize stamped on the bottom.
Leonardo examines each duck until he discovers the prize. He discards each
duck without a prize after he checks it, so that he never needs to check a duck
more than one time. Find the variance of the number of ducks he checks until
he discovers the prize.
Exercise 12.6. Dice. Susan rolls a die 4 times. Let X be the number of 1s
that appear.
a. Find E(X).
b. Find E(X^2).
c. Find Var(X).
Exercise 12.7. Consider a random variable X with the following mass:
pX (1) = 0.12
pX (3) = 0.37
pX (27) = 0.42
pX (31) = 0.09
a. Find E(X).
b. Find E(X^2).
c. Find Var(X).
d. Find E(X^2 − 4X).
Exercise 12.8. Smoke alarms. Let X be the number of smoke alarms in a
randomly selected apartment, with the following mass:
pX (0) = 0.05
pX (1) = 0.15
pX (2) = 0.35
pX (3) = 0.30
pX (4) = 0.10
pX (5) = 0.05
Assume each alarm requires one battery per year, and batteries cost $2.35.
a. What is the expected cost in batteries for smoke alarms for a randomly
selected apartment?
b. What is the variance of the cost?
Exercise 12.9. Waiting for favorite song. Michael plays a random song on
his iPod. He has 2,781 songs, but only one favorite song. Let X be the number
of songs he has to play on shuffle (songs can be played more than once) in order
to hear his favorite song.
a. Find E(X).
b. Find E(X^2).
c. Find Var(X).
a. Find E(X).
b. Find E(X^2).
c. Find Var(X).
Exercise 12.11. Raffle tickets. You purchase 8 raffle tickets at the county
fair. Each ticket costs $5. A ticket is worth $100 with probability 1/400, but is
worthless otherwise. What is the expected value of your purchase? Be sure to
take into account the original purchase price of the tickets.
Exercise 12.12. SAT scores. Suppose that, among a certain group of stu-
dents, SAT scores have a mean value of 1026 and a standard deviation of 209.
Let X denote a randomly chosen student’s score. What is E(X^2)?
Exercise 12.13. Butterflies. Alice, Bob, and Charlotte are looking for but-
terflies. They look in three separate parts of a field, so that their probabilities
of success do not affect each other.
• Alice finds 1 butterfly with probability 17%, and otherwise does not find
one.
• Bob finds 1 butterfly with probability 25%, and otherwise does not find
one.
• Charlotte finds 1 butterfly with probability 45%, and otherwise does not
find one.
Let X be the number of butterflies that they find altogether. Find the variance
of X.
Exercise 12.15. Two 4-sided dice. Consider some 4-sided dice. Roll two of
these dice. Let X denote the minimum of the two values that appear, and let
Y denote the maximum of the two values that appear.
Find the variance of X.
(Caution: If Xj is an indicator of whether the minimum of the two dice is “j
or greater”—as in the previous homework—then X = X1 + X2 + X3 + X4 , but
the Xj ’s are dependent.)
Exercise 12.16. Pick two cards. Pick two cards at random from a well-
shuffled deck of 52 cards (pick them simultaneously, i.e., grab two cards at
once—so they are not the same card!). There are 12 cards which are considered
face cards (4 Jacks, 4 Queens, 4 Kings). There are 4 cards with the value 10.
Let X be the number of face cards in your hand; let Y be the number of 10’s
in your hand.
Exercise 12.17. Die game. In each round of a game, you earn a dollar if a
die shows 1 or 2, you lose a dollar if a die shows 5 or 6, and you neither earn nor
lose anything if a die shows 3 or 4. Let Xj be +1, −1, or 0 according to your
outcome on the jth round. Let X = X1 + X2 + · · · + X10 . Find the variance
of X.
12.4.2 Extensions
Exercise 12.19. Magic. A magician wants to do a trick where he tries to
guess the value of the card that an audience member has selected (the suit of
the card is not taken into account). He has six decks of cards so he gives one
deck to each of six audience members. Let X denote the number of cards that
he guesses correctly, when each audience member selects one card at random.
a. Find E(X).
b. Find E(X^2).
c. Find Var(X).
d. The magician gets paid $500 for performing the show, plus $20 for each
card he guesses correctly. What is the average amount of money that he gets
paid?
e. As in (d), what is the variance of the amount that he gets paid?
Exercise 12.20. Cookies. As in Example 12.16, Jim and his brother both
like chocolate chip cookies best. They have a jar of cookies with 5 chocolate
chip cookies, 3 oatmeal cookies, and 4 peanut butter cookies. They are each
allowed to have 3 cookies. To be fair, they agree to randomly select their cookies
without peeking, and they each must keep the cookies that they select. Suppose
that the cookies are worth $1.20 for each chocolate chip cookie, $0.75 for each
oatmeal cookie, and $1.30 for each peanut butter cookie. Find the expected
value and variance of the total value of the cookies Jim gets.
x 1 2 3 4 5
pX (x) 0.1 0.25 0.5 0.1 0.05
12.4.3 Advanced
Exercise 12.26. Mystery mass. Consider the mass
pX(x) = 0.15 for x = −1,
        0.35 for x = 0,
        0.25 for x = 1,
        0.25 for x = 2.
a. Find E(X).
b. Find E(2/X).
c. Find 2/E(X).
d. Find E(|X|).
e. Find Var(X).
f. Find Var(|X|).
Exercise 12.28. If X and Y have joint density fX,Y (x, y) = 8xy on the triangle
0 ≤ y ≤ x ≤ 1, find E(XY ).
Exercise 12.29. Coins. Consider a pile that has 9 coins altogether. Exactly 3
of the coins are dimes (worth 10 cents each), and the other 6 coins are pennies
(worth 1 cent each). Emily picks up 4 of the coins blindly (all possibilities are
equally likely) without replacement. Find the expected value (in cents) of the
amount she picks up.
Exercise 12.30. Specify a discrete random variable X that has the property
E(X^j) = 0 for all odd integers j, and E(X^j) = 1 for all even integers j.
Chapter 13
Review of Discrete Random Variables
• The probability mass function table must include every individual value
that the random variable can take.
• Each probability must satisfy 0 ≤ P(X = x) ≤ 1.
E(aX + b) = aE(X) + b
Var(aX + b) = a^2 Var(X)
σ_{aX+b} = √(a^2 Var(X)) = |a| √(Var(X))
What if all of these random variables are independent and also have
the same distribution?
Var(X1 + · · · + Xn) = Var(Σ_{i=1}^{n} Xi) = n Var(X)
σ_{X1+···+Xn} = √(n Var(X))
13.2 Exercises
Exercise 13.1. Horse race. There are 5 horses running in a race, and you
own the one named Rosie. Let X be the order Rosie finishes the race (1st, 2nd,
etc.). Below is a table for the probability for each x-value.
x 1 2 3 4 5
P(X = x) 0.4 0.2 0.1 ? 0.02
a. Find the probability mass function for the number of letters which are
chosen.
b. Graph the probability mass function.
c. What is the mean number of letters which are chosen?
d. What is the standard deviation of the number of letters which are chosen?
e. Find the cumulative distribution function.
f. Graph the cumulative distribution function.
a. Find the probability mass function for the number of math team members
who are calculus specialists. (Hint: Can 5 calculus specialists make the team?
Why or why not?)
b. Find the cumulative distribution function for the number of math team
members who are calculus specialists.
c. What is the expected number of calculus specialists who make the team?
d. What is the variance of the number of calculus specialists who make the
team?
Exercise 13.6. Grading. Joan will grade four statistics projects, selected at
random from a large stack, this evening. From past experience, Joan thinks
that 30% of the projects will be “A” quality projects. Let X be the number of
“A” projects she grades tonight. (Assume that all the projects are independent
and that Joan is completely fair and objective with her grading of each one.)
Exercise 13.11. Trick coin game. Suppose an acquaintance has a trick coin
that comes up a head 75% of the time. He wants to play a game where he tosses
the coin 3 times, and for each time the coin comes up a head, he will pay you
$2. He wants to charge you a fee to play the game. If you are willing to play
the game only if the game is fair, how much money would you be willing to pay
(overall) to play the game?
Exercise 13.12. Hidden bones. Chet the dog likes to dig holes in the yard.
He has buried bones in some of the holes, but others he dug just for fun. He
covers them back up with a little mound of dirt when he is done digging. The
yard has 20 holes that have been filled up, and 8 of them contain bones. Chet
decides to randomly search 4 of the holes (without replacement) in hopes of
finding some of his bones. Let X be the number of bones he finds.
a. Give the probability mass function for the number of bones Chet finds.
b. Give the cumulative distribution function for the number of bones he
finds.
Exercise 13.14. Realtor sales, part 1. Let X be the number of homes a
realtor sells in a week, with the following mass:
x 0 1 2
P(X = x) 0.75 0.20 0.05
a. Is this a valid probability mass function? Why or why not?
b. What is the expected number of homes the realtor will sell in a week?
c. What is the standard deviation in the number of homes the realtor will
sell in a week?
d. What is the expected number of homes the realtor will sell in a 12-week
quarter?
e. What is the standard deviation in the number of homes the realtor will
sell in a 12-week quarter? (Assume the weeks are independent.)
f. Do you think that the assumption about the weeks being independent is
a good assumption? Why or why not? Does independence affect your answer
to part d? To part e?
Exercise 13.15. Realtor sales, part 2. Using the realtor sales information
from Exercise 13.14, the realtor makes a typical commission of $4000 on each
home sold, but he needs to pay an assistant $900 each week.
a. What is the realtor’s expected net profit each week?
b. What is the standard deviation in the realtor’s weekly net profit?
c. During a 12-week quarter, what is the realtor’s expected net profit?
d. During a 12-week quarter, what is the standard deviation in the realtor’s
net profit?
Exercise 13.16. Basketball score. Erika, a basketball player, scores an av-
erage of 10 points per game with a standard deviation of 2.6 points. Let X
denote the number of points that she scores in a game.
a. What is the expected value of X^2?
b. What is the variance of 24X − 3?
e. What is the expected value and standard deviation of her total points
over 30 games, in which her performances are independent?
Exercise 13.17. Phone calls. A person makes an average of 3.12 phone calls
a day with a standard deviation of 1.77 phone calls. Let X denote the number
of calls made per day.
a. What is the expected value of X^2?
b. What is the expected value of −12X + 10?
c. Find the variance of −12X + 10.
Exercise 13.18. Rental car, part 1. The number of days a driver needs
a rental car after having an accident, X, is a discrete random variable with
probability mass function:
P(X = x) = (8 − x)/28 for x = 1, 2, . . . , 7, and P(X = x) = 0 otherwise.
a. Make a table which shows the values for X and P (X = x).
b. Graph the probability mass function for X.
c. What is the cumulative distribution function for X?
d. Graph the cumulative distribution function for X.
e. What is the expected number of days that a driver will need a rental car?
f. What is the standard deviation of days that a driver will need a rental
car?
Exercise 13.21. Loss on claims, part 2. The probability mass function you
found in Exercise 13.20 was only for homeowners who had at least one small
claim. The insurance company believes the probability that a homeowner will
have at least one small claim is 0.08. What is the probability mass function for
the amount of loss for all of their policyholders?
Exercise 13.24. Tour group. A tour group is planning to pick a town to stay
in when they get tired of driving. Since they don’t know exactly where they will
be staying, they don’t know exactly how much the hotel rooms will cost. They
will need 12 rooms. They know that individual hotel rooms along their route
have an average nightly price of $75 with a standard deviation of $20. What is
their expected cost and standard deviation for the 12 rooms?
a. What is the probability that there will not be enough seats for the
passengers who show up to fly?
b. Do you think it is reasonable to assume the passengers are independent?
Why or why not?
c. The airline wants to predict their profit from this flight. Each passenger
paid $200 for his or her ticket, and if someone does not show up, he or she is
refunded only $150 of the ticket purchase price. What are the expected value
and the standard deviation of the airline’s income from this flight?
d. Using the story from part c, what is a customer’s expected cost and the
standard deviation of the cost?
Exercise 13.26. Class sizes. A probability mass function for the class sizes
at a small college is given in the table below (assume that these are the only
allowed class sizes):
Part III
Named Discrete Random Variables
In the chapters we have covered so far on discrete random variables, there are
some common themes, and perhaps you have even begun to recognize different
“types” of random variables. In the chapters that follow, we will be even more
precise about the kinds of discrete random variables and what they have in
common. We will describe scenarios that usually lend themselves to a particular
type of random variable, and we will systematically describe formulas to help
make your life easier. In Chapters 14 through 20, we’ll introduce you to these
different “named” distributions. This will culminate with Chapter 21, where we
review some strategies for comparing and contrasting all of these distributions.
This summary gives precise techniques to help you pick the correct distribution
for your situation.
There are many discrete distributions other than the seven we cover here,
but these are the most common ones.
By the end of this part of the book, you should be able to:
2. Identify the variable and the parameters in a story, and state in plain
English what the variable and parameters mean.
3. Use the formulas for the mass, expected value, and variance to answer
questions and find probabilities.
Chapter 14
Bernoulli Random Variables
Sometimes the simplest things are some of the most useful. Is a randomly
chosen song a rock song or not? What is the probability of winning one game?
Does a cereal box have the preferred prize inside?
14.1 Introduction
In our discussion of discrete random variables, we have already seen many types
of Bernoulli random variables. Some outcomes are considered a “success” and
some outcomes are considered a “failure.” A Bernoulli random variable is
always 1 or 0 to indicate these respective possibilities of success or failure. A
Bernoulli random variable is often called an indicator random variable (or
simply an indicator), because the 1 indicates success, while the 0 indicates fail-
ure. Using the terminology of Chapter 3, a Bernoulli random variable indicates
the success or failure of a trial.
The Bernoulli distribution is the simplest named distribution. It is a building
block for the Binomial, Geometric, and Negative Binomial distributions. We’ll
use the word “success” to mean “the thing that we’re looking for,” regardless
of whether it is good or bad. E.g., when biologists study DNA to find which
markers are associated with a rare disease, a “success” could occur when they
identify such a marker, even though it is bad news for the relevant patient.
The parameter: p = the probability of success (we also write q = 1 − p for the
probability of failure).
Mass:
P(X = 1) = p
P(X = 0) = q
Expected value: E(X) = p
Variance: Var(X) = pq
14.2 Examples
Example 14.1. Suppose 95% of people put peanut butter on first when making
a peanut butter and jelly sandwich. You select a person at random to ask
whether he or she puts the peanut butter on first.
Here, a “success” is a person who puts the peanut butter on first, so p = 0.95
and q = 1 − p = 0.05.
d. What is the expected number of people who put the peanut butter on
first if you ask one person?
E(X) = p = 0.95
e. What is the standard deviation for the number of people who put the
peanut butter on first if you ask one person?
σX = √(pq) = √((0.95)(0.05)) ≈ 0.218
[Graphs: the mass of X, with points (0, 0.05) and (1, 0.95); and the CDF of X,
with points (0, 0.05) and (1, 1).]
E(X) = p = 0.38
Var(X) = pq = (0.38)(0.62) = 0.2356
[Graphs: the mass, with points (0, 0.62) and (1, 0.38); and the CDF, with
points (0, 0.62) and (1, 1).]
Let X be a Bernoulli random variable that indicates whether the player wins
or loses. The probability of winning is 2/6, so p = 1/3 and q = 2/3.
If X = 1, then the winnings are 15; if X = 0, then the winnings are −9. So the
expected winnings are
(15)(1/3) + (−9)(2/3) = 5 − 6 = −1.
The formal way of writing this is to let f(X) be the gain or loss, depending on
X, so
E(f(X)) = f(1)P(X = 1) + f(0)P(X = 0) = (15)(1/3) + (−9)(2/3) = −1.
Example 14.5. A cereal company puts a Star Wars toy watch in each of its
boxes as a sales promotion. Twenty percent of the cereal boxes contain a watch
with Obi Wan Kenobi on it. You are a huge Obi Wan fan, so you decide to buy
1 box of the cereal in hopes that you will find an Obi Wan watch. You convince
your brother and sister to each buy a box too.
a. How many Obi Wan watches does your family expect to obtain?
Let X = 1 if your box contains an Obi Wan watch, or X = 0 otherwise.
Similarly, let Y be an indicator random variable that indicates whether your
brother gets one; let Z be an analogous indicator for your sister.
Note: we do not even need to know whether our selections are independent
to solve this problem about expected values! We will not rely on independence
in our solution, because independence is not needed when taking the expected
value of the sum of random variables.
The X, Y, Z are indicator variables describing (respectively) whether you,
your brother, and your sister get Obi Wan watches. So the total number of
watches obtained is X + Y + Z. So the expected number of watches obtained is
E(X + Y + Z) = E(X) + E(Y) + E(Z) = 0.2 + 0.2 + 0.2 = 0.6.
b. What is the variance of the number of Obi Wan watches that your family
obtains?
In this situation, we do need to know about how X, Y, Z depend on each
other. If the number of cereal boxes is sufficiently large that the values of
X, Y, Z do not affect each other, we might be able to assume that X, Y, Z
are independent. (We will consider such situations—with independence—much
more in the next chapter. To handle situations where the X, Y, Z are dependent,
we will have to wait until Chapter 19.) If X, Y, Z are independent, then we can
add their variances:
Var(X + Y + Z) = Var(X) + Var(Y ) + Var(Z)
= (0.2)(0.8) + (0.2)(0.8) + (0.2)(0.8)
= (3)(0.16)
= 0.48.
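A short enumeration (our sketch, not from the text) confirms both answers, under the same independence assumption used above.

from itertools import product

total_E = total_E2 = 0.0
for pattern in product([1, 0], repeat=3):     # (X, Y, Z) success/failure patterns
    prob = 1.0
    for b in pattern:
        prob *= 0.2 if b == 1 else 0.8
    w = sum(pattern)                          # number of watches obtained
    total_E  += w * prob
    total_E2 += w**2 * prob

print(total_E)                   # 0.6, the expected number of watches
print(total_E2 - total_E**2)     # 0.48, the variance computed above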
14.3 Exercises
14.3.1 Practice
Exercise 14.1. Call from home. One out of every eight calls to your house
is from a family member. You will record whether the next call is from a family
member.
a. What do you consider a “success” in this story? What is its probability?
b. What do you consider a “failure” in this story? What is its probability?
c. Why is this a Bernoulli situation? What is the parameter?
d. Define in words what X is in terms of this story. What values can X
take?
e. What is the probability that the next time the phone rings, it will be
from a family member?
f. If the phone calls are independent, what is the probability the 3rd call
today will be from a family member?
g. What is the mean of X?
h. What is the standard deviation of X?
Exercise 14.2. Dice. You roll a fair, six-sided die as part of a game. If you
roll a 5, you will win the game.
a. What do you consider a “success” in this story? What is its probability?
b. What do you consider a “failure” in this story? What is its probability?
c. Why is this a Bernoulli situation? What is the parameter?
d. Define in words what X is in terms of this story. What values can X
take?
e. Your friend will pay you $4 if you win the game. You owe your friend $1
if you lose the game. What are your expected winnings?
f. What is the variance of your expected winnings?
Exercise 14.3. Homework. Hui has a class of 300 students, and only 6 have
done their homework assignment due today. He calls on a student at random to
put a problem on the board to check whether he or she has done the assignment.
a. What does Hui consider a “success” in this story? What is the probability?
b. What does Hui consider a “failure” in this story? What is the probability?
c. Why is this a Bernoulli situation? What is the parameter?
d. Define X in terms of this story. What values can X take?
e. What is the probability that the student Hui selects is one who has done
the assignment?
f. If the students are independent, what is the probability that the 3rd
student Hui checks will have done the assignment?
Exercise 14.4. Blu-rays. Suppose that 1% of Blu-ray discs produced by a
company are defective. You buy one of these discs and check to see if it is
defective.
a. What do you consider a “success” in this story? What is the probability?
b. What do you consider a “failure” in this story? What is the probability?
c. Why is this a Bernoulli situation? What is the parameter?
d. Define X in terms of this story. What values can X take?
e. Draw the labeled graph of the mass for this story.
f. Draw the labeled graph of the CDF for this story.
Exercise 14.5. Shoes. Anne and Jane have shoes spread throughout the dorm
room. Anne has 15 pairs of shoes; twenty percent of her shoe collection consists
of sandals. Jane has 40 pairs of shoes; ten percent of her shoe collection consists
of sandals.
a. A shoe is picked at random from the dorm room belonging to Anne and
Jane; what is the probability that it is a sandal?
b. If a shoe is chosen at random from the room (with all shoes equally
likely to be chosen), what is the probability that it belongs to Anne?
c. If a shoe is chosen at random from the room (with all shoes equally
likely to be chosen), and upon examination this shoe is seen to be a sandal, what
is the probability that it belongs to Anne?
Exercise 14.6. Movie date. Chris and Juanita always go to the movies
on Friday night. Before meeting for their date, they each make a decision
(independently, without consulting) of what genre of movie they prefer to see.
Chris prefers an adventure movie with probability 70% and a romance with
probability 30%; Juanita chooses adventure with probability 34% and a romance
with probability 66%.
Exercise 14.7. Studying. Let X be the number of nights that you spend
studying in a 30-day month. Assume that you study, on a given night, with
probability 0.65, independent of the other nights. Write X as the sum of thirty
indicators (i.e., as the sum of 30 Bernoulli random variables).
a. Find E(X).
b. Find Var(X).
14.3.2 Extensions
Exercise 14.8. Trucks and cars. On a certain highway, 7% of the vehicles
have 18 wheels, and the other 93% of the vehicles have 4 wheels. (We ignore
motorcycles, etc., for simplicity.) A child looks out the window and counts the
wheels on the next vehicle to pass.
Exercise 14.10. Winning and losing. Suppose that a person wins a game of
chance with probability 0.40, and loses otherwise. If he wins, he earns 5 dollars,
and if he loses, then he loses 4 dollars.
14.3.3 Advanced
Exercise 14.11. Reciprocal of a random variable. If X is a Bernoulli
random variable, is 1/X a well-defined random variable? If not, why? If so,
what is the mass?
Chapter 15
Binomial Random Variables
Not everything that can be counted counts, and not everything that counts can
be counted.
—Informal Sociology: A Casual Introduction to Sociological Thinking by
William Bruce Cameron (Random House, 1963)
You are playing a series of one-on-one basketball games with your friend. You
plan on playing 5 games total, and your friend (who is a better player and wins
about 68% of the time you play her), says that she will buy you one lunch for
each game that you win. You will have to buy her lunch for each game she
wins. What is the probability you will have to buy your friend lunch 5 days
next week? More than half the school days next week? Only once?
15.1 Introduction
Binomial random variables are more general than Bernoullis. A Binomial is the
number of successes in n independent trials. Equivalently, a Binomial is the
sum of n independent Bernoulli random variables with the same probability of
success p:
Binomial(n, p) random variable = sum of n independent Bernoulli(p) random variables,
X = X1 + · · · + Xn.
The number of trials is fixed in advance and is not affected by how many of the
trials are successes. We specify, in advance, how many trials will take place,
regardless of how many succeed or fail.
For instance, if n = 8, we can conduct 8 independent trials and let X1 , . . . ,
X8 be the 8 Bernoulli random variables that show which of the 8 trials succeed
or fail. If X = X1 + · · · + X8 , then X is a Binomial random variable that gives
the total number of successes on these 8 trials. As an example:
X = X1 + · · · + X8 = 1 + 1 + 0 + 0 + 1 + 0 + 1 + 0 = 4
X ∼ Binomial(n, p),
where n is the total number of trials and p is the probability of success on each
trial. We also need the binomial coefficient notation: (n choose x) = n!/(x!(n − x)!).
E.g., when n = 8, the numbers of ways to arrange j successes within the 8 trials
are:
(8 choose 0) = 8!/(0!8!) = 1     (8 choose 1) = 8!/(1!7!) = 8     (8 choose 2) = 8!/(2!6!) = 28
(8 choose 3) = 8!/(3!5!) = 56    (8 choose 4) = 8!/(4!4!) = 70    (8 choose 5) = 8!/(5!3!) = 56
(8 choose 6) = 8!/(6!2!) = 28    (8 choose 7) = 8!/(7!1!) = 8     (8 choose 8) = 8!/(8!0!) = 1
The parameters: n = the number of trials; p = the probability of success on each
trial (and q = 1 − p).
Mass:
pX(x) = (n choose x) p^x q^(n−x) for x = 0, 1, . . . , n, where
(n choose x) = number of ways to arrange x successes in n trials,
p^x = (probability of a success on one trial)^(# of successes),
q^(n−x) = (probability of a failure on one trial)^(# of failures).
We have seen examples of Binomial random variables several times already. See,
for instance, Example 14.5b, in which X, Y, Z are independent Bernoullis, each
with probability of success 0.2, so X + Y + Z is Binomial with parameters n = 3
and p = 0.2. Binomial random variables are especially useful in problems that
have sampling with replacement (the replacement assures us that the relevant
probability of success does not change from trial to trial).
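The mass formula translates directly into a few lines of Python. The following sketch (ours, not from the text) uses math.comb for the binomial coefficient and checks that the mean and variance computed from the mass agree with np and npq, which are derived later in this chapter.

from math import comb

def binom_pmf(j, n, p):
    # P(X = j) = (n choose j) p^j (1-p)^(n-j) for X ~ Binomial(n, p)
    return comb(n, j) * p**j * (1 - p)**(n - j)

n, p = 3, 0.2                    # the X + Y + Z example from Chapter 14
mean = sum(j * binom_pmf(j, n, p) for j in range(n + 1))
var  = sum((j - mean)**2 * binom_pmf(j, n, p) for j in range(n + 1))
print(mean, var)                 # 0.6 and 0.48, i.e., np and npq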
15.2 Examples
Example 15.2. Suppose that each day of a certain 31-day month is snowy with
probability 0.3, independently of the other days, and let X be the number of
snowy days in the month, so X is Binomial with n = 31 and p = 0.3. For
example, the probability of exactly 5 snowy days is
P(X = 5) = (31 choose 5)(0.3)^5(0.7)^26 = 0.03876,
and the probability of at least one snowy day is
P(X ≥ 1) = 1 − P(X = 0) = 1 − (31 choose 0)(0.3)^0(0.7)^31 = 1 − 0.000016 = 0.999984.
Remark 15.3. In general, when computing the probability of “at least one”
of something, it is easier to compute the probability of the complement, i.e.,
the probability that there are zero occurrences of the relevant phenomenon.
For example, in Example 15.2e, it is much easier to use the complement of the
event than to add the probabilities corresponding to 1, 2, . . . , 31 snowy days.
[Graphs: the mass pX(x) and the CDF FX(x) of X, for x = 0, 1, . . . , 31.]
(In this example, a couple has four children; each child is a girl with
probability 0.5, independently, and X is the number of girls.)
[Graphs. Left: mass of the number of girls, with points (0, 0.0625), (1, 0.25),
(2, 0.375), (3, 0.25), (4, 0.0625). Right: CDF of the number of girls.]
b. What is the probability the couple has at least one girl?
P(X ≥ 1) = 1 − P(X = 0) = 1 − (4 choose 0)(0.5)^0(0.5)^4 = 1 − 0.0625 = 0.9375
c. How many children would the couple need so that, with probability at least
0.99, they have at least one girl? We need
1 − P(X = 0) ≥ 0.99,
or equivalently,
0.01 ≥ P(X = 0). (15.1)
We have
P(X = 0) = (n choose 0)(0.5)^0(0.5)^n = (0.5)^n,
since for any positive integer k, (k choose 0) = 1 and k^0 = 1. Substituting back into
(15.1) gives
0.01 ≥ (0.5)^n.
Taking a natural log of each side gives ln 0.01 ≥ ln((0.5)^n), and then we can pull
the exponent n down to get
ln 0.01 ≥ n ln(0.5).
Dividing both sides by ln(0.5), which is negative (so the inequality reverses),
gives n ≥ (ln 0.01)/(ln 0.5) ≈ 6.64. Since n must be a whole number, n = 7
children suffice.
X = X1 + · · · + Xn
E(X) = E(X1 + · · · + Xn )
= E(X1 ) + · · · + E(Xn )
= p + ··· + p
= np.
Since the Xj’s are independent, we can also add their variances to obtain
the variance of X:
Var(X) = Var(X1 + · · · + Xn )
= Var(X1 ) + · · · + Var(Xn )
= pq + · · · + pq
= npq.
Remark 15.6. We stress that for the Binomial distribution (and also for
the Geometric and Negative Binomial distributions, which we will see later)
it is important to have independence, or at least relative independence. So
what happens if you have a small population and you are sampling without
replacement, i.e., the probabilities change between trials? We’ll cover that in
Chapter 19, with the Hypergeometric distribution.
Example 15.7. Roll a die n times. Let X denote the total number of 4’s and
5’s that appear.
Here X is a Binomial random variable, with each outcome of “4” or “5” treated
as a “success.” So p = 2/6 = 1/3, because 2 out of the 6 equally likely outcomes
are successes. The mass of X is
pX(j) = (n choose j)(1/3)^j(1 − 1/3)^(n−j) = (n choose j)(1/3)^j(2/3)^(n−j) = (n choose j) 2^(n−j)/3^n
for j = 0, 1, . . . , n. Also,
E(X) = np = n/3, and Var(X) = npq = (n)(1/3)(2/3) = 2n/9.
Example 15.8. (Refer to Example 14.1) Suppose 95% of people put peanut
butter on first when making a peanut butter and jelly sandwich. Five peo-
ple are independently asked about their sandwich-making habits. What is the
probability that the majority of them put the peanut butter on first?
Let X be the number of people who put the peanut butter on first, so X is a
Binomial random variable with mass
P(X = j) = (5 choose j)(0.95)^j(1 − 0.95)^(5−j) = (5 choose j)(19/20)^j(1/20)^(5−j) = (5 choose j) 19^j/20^5,
for j = 0, 1, . . . , 5, so
P(X = 3 or X = 4 or X = 5) = P(X = 3) + P(X = 4) + P(X = 5)
= (5 choose 3) 19^3/20^5 + (5 choose 4) 19^4/20^5 + (5 choose 5) 19^5/20^5
= 0.0214 + 0.2036 + 0.7738
= 0.9988
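A quick numeric check of this example (our sketch, not from the text):

from math import comb

p = 0.95
prob_majority = sum(comb(5, j) * p**j * (1 - p)**(5 - j) for j in (3, 4, 5))
print(round(prob_majority, 4))   # 0.9988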
15.3 Exercises
15.3.1 Practice
Exercise 15.1. Skittles. Skittles candies come in the colors red, orange, yel-
low, green, and purple, with each color having equal probability. You are a
quality control inspector, and your job is to count up the number of purple
candies in a random sample of 25 candies from a large population of candies
coming down the factory line.
a. What is a “success” in this situation? What is the probability of a success
on a single trial?
b. What is a “failure” in this situation? What is the probability of a failure
on a single trial?
c. Explain in words what X is in terms of the story. What values can it
take?
a. Explain in words what X is in this situation and what values it can take.
d. What is the probability that either Steve or Jeff will succeed in making
at least 1 field goal?
e. Given that only one field goal total was scored, what is the probability
that Jeff was the one who kicked it?
Exercise 15.12. Tennis. Suppose a tennis player hits an ace once out of every
five serves. Suppose in a match the player performs 80 serves.
a. What is the expected number of aces?
b. What are the variance and standard deviation of the number of aces?
Exercise 15.13. Exam. On a multiple choice exam, a student decides to test
his luck. His exam has 20 questions, each of which has 5 answer choices. The
student decides to roll a die on each question and use the result on the die as
his answer; any time that he rolls a 6, he just discards that roll and tries again.
Let X be the number of questions he gets right on the exam altogether. What
is the probability he gets a grade of A (overall score at least 90%) using this
method?
Exercise 15.14. Dice. Two students decide to make bets about their plans
for lunch during the next work week (Monday through Friday). They roll a
six-sided die five times (once for each day). The agreement is that, for each
day, when a 6 shows up, the first student has to pay for both lunches, and if a
1 shows up, the second student has to pay for both lunches. If neither of these
events occurs, they will bring their lunch from home on that day.
a. How many days does the first student expect to buy lunch for the second
student?
b. How many days do they expect to bring their lunches?
c. What is the mass of the number of days on which the first student buys
lunch for the second student?
d. What is the mass of the number of days on which they bring their lunches?
Exercise 15.15. Winning and losing. Suppose that a person wins a game
of chance with probability 0.40, and loses otherwise. If he wins, he earns 5
dollars, and if he loses, then he loses 4 dollars. Assume that he plays ten games
independently. Let X denote the number of games that he wins. (Hint: His
gain or loss is 5X + (−4)(10 − X) = 9X − 40, since he loses 10 − X games.)
a. What is his expected gain or loss (altogether) during the ten games?
b. What is the variance of his gain or loss (altogether) during the ten games?
c. What is the probability that he wins $32 or more during the ten games?
Exercise 15.16. Telemarketers. Assume that when your phone rings, the
caller is a telemarketer with probability 1/8, and that the probability of a tele-
marketer is independent from call to call. Let X denote the number of telemar-
keters during the next three calls.
15.3.2 Extensions
Exercise 15.22. Saving energy. Despite lecturing your roommates on energy
conservation, there is a 60% chance that the lights in a dorm room will be left
on when nobody is home. Each day is independent. Suppose that, every day
the light is left on in a dorm room, there are 1000 Watts of power used. Every
day when the light is turned off, there are 200 Watts of power used. You keep
track of X, the number of days the lights are left on over the next 30 days.
15.3.3 Advanced
Exercise 15.24. Ants. In the ant world, 98% of the ants are female, and
2% are male. In the queen’s first batch of 100,000 offspring, let X be the
total number of male births. Write an expression for the probability that 2100
or more of the ants are males. You do not need to simplify or evaluate your
expression. Would it be difficult to calculate?
Exercise 15.25. Cereal for breakfast. Sixty percent of students usually eat
cereal for breakfast. If n students eat in the dining halls for breakfast each day,
let Xn be the number who have cereal. Find the limiting probability that at
least one student has cereal, as n grows large, i.e., find limn→∞ P (Xn > 0).
Does your answer make sense intuitively? Why?
Chapter 16
Geometric Random Variables
You are playing a series of one-on-one basketball games with your friend. Instead
of the 5-game series you played with your friend last week (described in the
introduction to Chapter 15), this week you have decided that you will only play
as many games as it takes for you to win your first game. You still have to buy
your friend lunch for each game she wins, and then she will buy you lunch when
you win your first game. What is the probability you will have to buy your
friend a lunch exactly 4 days this week, before she buys your lunch? What is
the probability you don’t have to buy lunch for your friend at all? How many
lunches do you expect to have to buy for your friend? How is this game different
from last week’s game?
A Geometric random variable is the number of independent trials until (and
including) the first success, when the probability of success on each trial is
constant.
16.1 Introduction
A Geometric random variable is the number of independent trials needed until
the first success occurs. An equivalent interpretation is that a Geometric random
variable is the number of independent Bernoulli random variables we need
to check until the first one that indicates success. If X1, X2, . . . are independent
Bernoulli(p) random variables, then X is the index of the first Xj that equals 1.
The parameters: p = the probability of success on each trial (and q = 1 − p).
Expected value: E(X) = 1/p
Variance formula: Var(X) = q/p^2
For instance, if a Geometric random variable X has the value 6, we can view
this as five independent Bernoulli random variables that indicate failure, i.e.,
X1 = X2 = X3 = X4 = X5 = 0, followed by a sixth independent Bernoulli that
indicates success, i.e., X6 = 1.
As an example:
X ∼ Geometric(p),
with mass pX(x) = q^(x−1) p for x = 1, 2, 3, . . . , where
x − 1 = (number of trials) − 1 = number of failures,
q^(x−1) = (probability of a failure on one trial)^(# of failures).
Example 16.1. Suppose 3% of pet owners give gifts to their pets on Valentine’s
Day. You plan to talk to randomly selected pet owners from a large population
until you find one who gives a gift to his or her pet. Since you talk to them
individually, assume their responses are independent.
E(X) = 1/0.03 = 33.33 pet owners
d. What is the standard deviation of the number of people you have to talk
to until you find one who gives a Valentine’s gift to his or her pet?
Var(X) = (1 − 0.03)/(0.03)^2 = 1077.78
σX = √1077.78 = 32.8295 pet owners
[Graphs: the mass pX(x) and the CDF FX(x) of X, for x = 1, 2, . . . , 100.]
g. What is the probability you will have to talk to exactly 40 people total
to find your first pet-gift-giver?
P(X = 40) = (0.97)^39(0.03) ≈ 0.0091
One way to see this is to calculate the probability that a Geometric random
variable is finite. If X is a Geometric(p) random variable with p > 0, then
P(X is finite) = Σ_{j≥1} P(X = j) = Σ_{j≥1} q^(j−1) p = p Σ_{j≥0} q^j = p/(1 − q) = p/p = 1.
Another way to see this is: if X is infinite, then for each n we know n < X, so
0 ≤ P(X is infinite) ≤ P(n < X) = q^n,
and since q^n → 0 as n → ∞, we conclude that P(X is infinite) = 0.
It might seem unusual to differentiate with respect to “q,” but the derivation
lends itself to this change, and now we can switch the order of the summation
and the differentiation, so
E(X) = p (d/dq) Σ_{j≥1} q^j = p (d/dq)(q/(1 − q)) = p ((1 − q)(1) − (q)(−1))/(1 − q)^2 = p (1/(1 − q)^2) = p/p^2 = 1/p.
This almost gives us the variance. Now we use the fact mentioned above, that
X^2 = (X)(X − 1) + X; a similar differentiation (done twice) gives
E((X)(X − 1)) = 2q/p^2, and this yields
E(X^2) = 2q/p^2 + 1/p, so Var(X) = E(X^2) − (E(X))^2 = 2q/p^2 + 1/p − 1/p^2 = q/p^2.
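A numeric sanity check of E(X) = 1/p and Var(X) = q/p^2 (our sketch, not from the text), summing the mass over a long truncated range; the truncation error is negligible because q^j shrinks geometrically.

p = 0.3
q = 1 - p
EX  = sum(j * q**(j - 1) * p for j in range(1, 2000))
EX2 = sum(j**2 * q**(j - 1) * p for j in range(1, 2000))
print(EX, 1 / p)                 # both about 3.3333
print(EX2 - EX**2, q / p**2)     # both about 7.7778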
Example 16.5. Draw a card from a well-shuffled deck until the ace of spades
appears. If a draw is unsuccessful, then replace and reshuffle the deck before
making the next selection. Let X be the number of draws needed until the ace
of spades appears for the first time.
(Note that, in the case where the ace of spades never appears, we have
not specified the value of X, but this omission does not affect the probabilities
associated with the problem, because the probability of the ace of spades never
appearing is zero.)
The mass of X is pX(j) = (51/52)^(j−1)(1/52) for positive integers j, and
pX(x) = 0 otherwise. So X is a Geometric random variable with p = 1/52. The
expected number of cards until the first ace of spades appears is
E(X) = 1/(1/52) = 52. The variance of X is Var(X) = (51/52)/(1/52)^2 = 2652.
Example 16.6. Suppose that a basketball player makes 80% of her free throws
successfully. She attempts to make a basket as many times as necessary, until
scoring the first basket. Let X denote the number of necessary attempts.
(Again, the probability that she never scores is 0; we have not specified the
value of X when she never scores.)
The mass of X is pX(j) = (0.20)^(j−1)(0.80) for positive integers j, and
pX(x) = 0 otherwise. Thus, X is Geometric with p = 0.80. So E(X) =
1/0.80 = 1.25 and Var(X) = 0.20/0.80^2 = 0.3125.
Example 16.7. People are selected randomly and independently for a drug
test. Each person passes the test 98% of the time. What is the expected
number of people who take the test until the first person fails?
Let X be the number of people tested, until the first person fails the test. Then
P(X = j) = (0.98)^(j−1)(0.02) for each positive integer j. Thus, X is Geometric
with p = 0.02. So we expect E(X) = 1/0.02 = 50 people to be tested until the
first person fails the test. The variance of X is Var(X) = 0.98/0.02^2 = 2450.
What is the probability we need at least 5 trials to get the 1st success?
P(X ≥ 5) = P(X > 4) = q^4
What is the probability we need at most 5 trials to get the 1st success?
P(X ≤ 5) = 1 − P(X > 5) = 1 − q^5
What is the probability we need fewer than 5 trials to get the 1st success?
P(X < 5) = 1 − P(X ≥ 5) = 1 − P(X > 4) = 1 − q^4
Trial           1     2     3       4       5
Outcome         No    No    ?       ?       ?
Past/future?    Past  Past  Future  Future  Future
Why does this work? Each trial is independent. Once some trials have
passed, the next trials do not depend on what has already happened in the
past. You already know the first 2 trials were failures. What you don’t know is
what will happen on the next 3 trials.
Here is the same explanation using conditional probability:
Example 16.8. Given that more than j trials are needed to get the first success,
what is the probability that more than k trials are needed?
Of course we have P(X > k | X > j) = 1 if k < j (since X > j already forces
X > k in that case). So we now consider k ≥ j. In such a case, the first j trials
are already given to be failures, so only trials number j + 1, j + 2, . . . , k need
to be failures in order for the first k trials to all be failures. So
P(X > k | X > j) = P(X > k)/P(X > j) = q^k/q^j = q^(k−j) = P(X > k − j).
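A quick numeric illustration of this memoryless property (ours, not from the text), using the tail formula P(X > n) = q^n with an arbitrary choice of p = 0.25, j = 2, and k = 5:

p, q = 0.25, 0.75
j, k = 2, 5
tail = lambda n: q**n            # P(X > n) for a Geometric(p) random variable
print(tail(k) / tail(j))         # P(X > k | X > j) = 0.421875
print(tail(k - j))               # q^(k-j) = 0.75^3 = 0.421875, the same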
d. Given that the first 4 are unsuccessful, what is the probability at least 8
are needed?
16.6 Exercises
16.6.1 Practice
Exercise 16.1. Skittles. Skittles candies come in the colors red, orange, yel-
low, green, and purple, with each color having equal probability. Due to a dye
mix-up there are a few rainbow-striped candies coming down the line. There is a
5% chance that a candy is rainbow striped. You are a quality control inspector,
and your job is to find the first rainbow-striped candy coming down the line.
(The population is so big that we can assume relative independence.)
a. Explain in words what X is in this situation and what values it can take.
b. Why is this a Geometric distribution situation? What is the parameter?
c. What is the probability you will have to survey at most 16 men until you
find the first one who is colorblind?
d. What is the probability you will have to ask exactly 12 men until you
find the first who is colorblind?
e. What is the expected value of X?
f. What is the variance of X?
g. Show the labeled graph of the mass for this story.
h. Show the labeled graph of the CDF for this story.
a. Explain in words what X is in this situation and what values it can take.
b. Why is this a Geometric distribution situation? What is the parameter?
c. What is the probability that you have to survey parents of at least 5
babies until you find the first one not born by C-section?
d. Given that you have to survey parents of more than 1 baby to find the
first one not born by C-section, what is the probability that you have to survey
at least 5 parents of babies?
e. What is the probability that you must survey between 4 and 6 (inclusive)
parents of babies to find one that was not born by C-section?
f. What is the expected value of X?
g. What is the variance of X?
h. Show the labeled graph of the mass for this story.
i. Show the labeled graph of the CDF for this story.
Exercise 16.5. Radio airplay. A certain radio station plays songs from the
1970’s, 80’s, and 90’s. We know that 20% of the songs on the station are from
the 70’s; 37% are from the 80’s; and 43% of the songs are from the 90’s. Let
X be the number of songs you listen to until you hear the first one from the
1990’s.
a. What is the expected number of men she will need to invite until she has
a date for Homecoming?
b. Show the labeled graph of the mass for this story.
c. Show the labeled graph of the CDF for this story.
Exercise 16.7. Chores. Sarah and Thomas play a card game to determine
who will have to take out the trash (the loser gets this unpopular chore) on
Monday. They use a standard 52-card deck by taking turns randomly drawing
cards from a shuffled deck, with replacement, until somebody draws an ace.
Whoever gets the ace does not have to do the chore.
a. How long do they expect to have to play the game?
b. What is the probability that it takes at least four cards for them to find
an ace?
c. If they play this game for six weeks, what is the probability it takes at
least four cards for them to find an ace on each of those 6 Mondays? What
distribution is being used here? What are the parameters?
Exercise 16.8. Random guessing. On a certain online math assignment, a
student is allowed to submit an unlimited number of answers before moving on
to the next problem. The problem is a free response question, and the student
believes that his guessed answer is correct about 7% of the time. (There are
so many possible guesses that he thinks this ratio stays the same every time.)
Under these assumptions, let X be the number of times he needs to submit a
different random guessed answer until he gets the question correct.
a. How many tries should he expect if he wants to get this question right
without actually learning the material?
b. What is the probability it takes him at least 20 tries to get the question
right?
c. If there are 10 homework questions of a similar nature on this assignment,
and he uses the same random technique with all of the questions, what is the
probability it takes him at least 20 tries to get each of these questions right?
What distribution is being used here? What are the parameters?
d. If each homework question takes him 2 minutes to read (once) and 30
seconds for each random entry attempt, what is the expected time it will take
him to get one question correct? What about the 10 question assignment? (Do
you think he would have been better off just learning the material and doing
the homework properly?)
Exercise 16.9. Shopping before Thanksgiving. Studies have shown that
28% of people do all of their Christmas shopping before Thanksgiving each year.
Let X be the number of people you have to ask until you find somebody who
has finished all of their Christmas shopping before Thanksgiving, assuming each
person is independent of the others.
a. What is the expected number of people you will have to ask until you
find somebody who has finished her Christmas shopping before Thanksgiving?
b. If each interview takes approximately 5 minutes, how much time should
you expect to have to spend total?
c. What is the probability that you will have to ask more than 10 people?
Exercise 16.10. Lucky Charms. Michael reaches into a very large box and
pulls out Lucky Charms pieces one at a time. The percentages of the pieces are:
50% regular cereal, and 6.25% each for hearts, stars, horseshoes, clovers, blue
moons, pots of gold, rainbows, and red balloons. He only wants blue moons.
Let X be the number of individual pieces he has to pull out until he gets a blue
moon.
a. What is the probability that X is more than 8?
b. What is the probability that X is at least 8?
c. What is the probability that X is less than 8?
d. What is the probability that X is at most 8?
Exercise 16.11. Vampire difficulties. Edward and his friend cannot go to
school when it’s sunny outside because they are vampires. Forks, Washington
is experiencing a sunny period these days, and there is an 80% chance each day
that it will be sunny. Assume that the weather is independent from day to day.
Let X be the number of days until the first day Edward can go outside.
a. What is the probability that Edward will have to wait between 10 to 12
days (inclusive) to go back to school?
b. Given that he has already waited 5 sunny days, what is the probability
that his waiting time altogether is between 10 to 12 days (inclusive) to go back
to school?
c. What is the expected number of days that he will have to wait to go back
to school?
Exercise 16.12. Blindfolded basketball. A basketball player shoots free
throws until he makes one. However, because his coach wants him to practice
his technique and the feel of the motions, his coach has the player blindfold
himself, which makes each throw independent from the others. When he is
blindfolded, he has only a 2% chance of making a free throw on any particular
try.
a. What is the probability that he will have to throw over 100 balls until he
makes his first basket?
b. What are the expected value and standard deviation for the number of
throws he will need to make until he gets his first basket?
c. What would you have to change about this story to turn it into a Binomial
question?
Exercise 16.13. Winning and losing. Suppose that a person wins a game
of chance with probability 0.40, and loses otherwise. He plays the game until
he wins for the first time, and then he stops. Assume that the games are
independent of each other. Let X denote the number of games that he must
play until (and including) his first win.
a. How many games does he expect to play until (and including) his first
win?
b. What is the variance of the number of games he plays until (and includ-
ing) his first win?
c. What is the probability that he plays 4 or more games altogether?
Exercise 16.14. Winning and losing (continued). Continue to use the scenario from the previous problem. As before, let X denote the number of games
that he must play until (and including) his first win. Assume that, if he wins,
he earns 5 dollars, and if he loses, then he loses 4 dollars. (Also assume that he
is allowed to borrow money, i.e., having a negative amount of money is not a
problem here.)
a. Find a formula for his gain or loss, in terms of X, i.e., if Y denotes his
gain or loss in dollars, write Y in terms of X.
b. What is his expected gain or loss (altogether) during the X games, i.e.,
what is E(Y )?
c. What is the variance of his gain or loss (altogether) during the X games,
i.e., what is Var(Y )?
Exercise 16.15. Dating. You randomly call friends who could be potential
partners for a dance. You think that they all respond to your requests indepen-
dently of each other, and you estimate that each one is 7% likely to accept your
request. Let X denote the number of phone calls that you make to successfully
get a date.
a. Find the expected number of people you need to call, i.e., E(X).
b. Find the variance of the number of people you need to call, i.e., Var(X).
c. Given that the first 3 people do not accept your invitation, let Y denote
the additional number of people you need to call (Y does not include those
first 3 people); i.e., suppose X > 3 is given, then let Y = X − 3. Under these
conditions, what is the mass of Y ?
Exercise 16.16. Hearts. You draw cards, one at a time, with replacement
(i.e., placing them randomly back into the deck after they are drawn), from a
shuffled, standard deck of 52 playing cards. Let X be the number of cards that
are drawn to get the first heart that appears.
a. How many cards do you expect to draw, to see the first heart?
b. Now suppose that you draw five cards (again, with replacement), and
none of them are hearts. How many additional cards (not including the first
five) do you expect to draw to see the first heart?
16.6.2 Extensions
Exercise 16.17. Telemarketers. Assume that when your phone rings, the
caller is a telemarketer with probability 1/8, and that the probability of a tele-
marketer is independent from call to call. Let X denote the number of tele-
marketers during the next three calls. If n is a nonnegative integer, what is
P (X > n)?
Exercise 16.18. Looking for a wife. Prince Charming has to go around town
asking if the glass slipper fits until he finds a woman whose foot fits properly in
the slipper so that he will know who to marry. (We do not endorse this technique
for finding a wife.) The probability of a glass slipper fitting a randomly selected
woman is 0.12, and the probabilities are independent for the various women.
Let X be the number of women he has to visit until he finds a woman who fits
the slipper. Assume that there is an unlimited supply of eligible women in the town.
a. Given that he has already checked with 4 women without success, what
is the probability he will still need to check with at least 5 more? (He’s getting
impatient.)
b. Given that he has already checked with 4 women without success, what
is the probability he will succeed with the very next woman?
c. What is the probability he will succeed on his first try?
d. What is the expected number of women he will have to try?
e. If he takes an entourage with him everywhere he goes, and he has to pay
the entourage $100 for the day plus $10 for every visit he makes, how much
does he expect to pay if he does all his visits on one day?
16.6.3 Advanced
Exercise 16.20. Consider two independent Geometric random variables, X
and Y , with parameters p1 and p2 , respectively. Give a general formula (in terms
of p1 and p2 ) for the probability that X and Y are equal, i.e., for P (X = Y ).
Exercise 16.21. Use the probability mass function to justify the fact that, if X is a Geometric random variable, then P(X > x) = q^x. (In Section 16.2, we justified this intuitively, but we did not use the PMF to prove it.)
Chapter 17
Negative Binomial Random Variables
The toughest thing about success is that you’ve got to keep on being a success.
—Irving Berlin
17.1 Introduction
A Negative Binomial random variable is the number of independent trials required until a certain number of successes, say r, have occurred. For instance, a Negative Binomial random variable could be the number of independent trials until the 3rd success occurs. The successes do not have to be consecutive (they usually are not). A Negative Binomial random variable can be interpreted as the sum of several independent Geometric random variables. For example, if X, Y, Z are independent Geometric random variables with the same parameter, then X + Y + Z is a Negative Binomial random variable for the number of trials until the third success.
X can’t be less than r because you need at least r trials to get r successes.
The parameters:
r = the number of successes needed; p = the probability of success on each trial
Mass:
P(X = x) = \binom{x−1}{r−1} p^r q^{x−r},  x = r, r + 1, r + 2, . . .
Expected value formula:
E(X) = r/p
Variance formula:
Var(X) = qr/p^2
After each individual success, the search for the next success starts again,
independent of what came before. For this reason, a Negative Binomial with
parameters r and p is exactly the sum of r independent Geometric random
variables, each with parameter p. Therefore, the expected value and variance of
Negative Binomial random variables are easy to determine from the expected
value and variance of Geometric random variables.
For instance, if a Negative Binomial random variable X with r = 3 has the value 13, this means that the 13th trial is a success, and exactly 2 of the earlier 12 trials are successes too.

The notation looks like:

X ∼ NegBinomial(r, p)

We use \binom{x−1}{r−1} instead of \binom{x}{r} in the mass because the last success always comes on the last trial, so we only need to count the ways to arrange the other r − 1 successes within the first x − 1 trials.
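To make these formulas concrete, here is a minimal Python sketch (our illustration, not part of the text); the function name is ours, and the printed checks use the basketball example of Section 17.2 below (r = 5, p = 0.12).

    from math import comb

    def negbinomial_pmf(x, r, p):
        # P(X = x): the probability that the r-th success occurs on trial x
        if x < r:
            return 0.0              # X can't be less than r
        q = 1 - p
        return comb(x - 1, r - 1) * p**r * q**(x - r)

    print(negbinomial_pmf(20, 5, 0.12))   # about 0.0142
    print(5 / 0.12)                       # E(X) = r/p, about 41.67
    print(5 * 0.88 / 0.12**2)             # Var(X) = qr/p^2, about 305.56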
17.2 Examples
Example 17.1. A team captain needs 5 more players with high school basketball experience. She asks people, one at a time, whether they played basketball in high school, and she keeps asking until she finds the 5th such person. Suppose that 12% of the people she asks played basketball in high school, independently of each other, and let X be the number of people she asks.

a. What does X represent in terms of this story? What values can it take?
The variable X is the number of people she has to ask until finding the 5th
member of the team, so X can be 5, 6, . . . (She needs to ask at least 5 people
since she is looking for 5 successes.)
b. Why is this a Negative Binomial situation? What are the parameters?
She is counting up number of trials until getting the 5th success, which is
a person who played basketball in high school (r = 5). Each trial (person) is
independent and each person has the same probability of success (p = 0.12).
c. What is the probability that she finds the 5th member of the team on
the 20th person she asks?
P(X = 20) = \binom{19}{4} (0.12)^5 (0.88)^{15} = 0.01418
d. What is the expected number of people she needs to ask in order to form
her team?
E(X) = 5/0.12 = 41.6667 people
e. What is the standard deviation in the number of people she will need to
ask in order to form the team?
Var(X) = 5(1 − 0.12)/0.12^2 = 305.5556

σX = √305.5556 = 17.4801 people
For comparison:

Binomial (counting successes in n = 20 trials):
probability P(X = 5) = \binom{20}{5} (0.12)^5 (0.88)^{15} = 0.0574

Negative Binomial (counting trials until the r = 5th success):
probability P(X = 20) = \binom{19}{4} (0.12)^5 (0.88)^{15} = 0.0143
X = X1 + X2 + · · · + Xr
This makes sense when you think that a Negative Binomial random variable
is just the sum of r independent Geometric random variables. For the basketball
example, you have the sum of 5 Geometric random variables since you need 5
basketball players.
For the 20 people asked in the basketball example (Y = played in high school, N = did not):

person  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
answer  N  N  Y  Y  N  N  N  N  N  Y  N  N  N  N  N  N  N  Y  N  Y

Here X1 = 3 (persons 1 through 3), X2 = 1 (person 4), X3 = 6 (persons 5 through 10), X4 = 8 (persons 11 through 18), and X5 = 2 (persons 19 and 20), so X = 3 + 1 + 6 + 8 + 2 = 20.
As another example, suppose we roll a die until the third “5” appears. Then X1 is the number of rolls until the first “5” appears, and X2 is the number of rolls after that point until the second “5” appears, and X3 is the number of rolls after that point until the third “5” appears. Then each Xj is Geometric with parameter 1/6, and the Xj’s are independent.
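A short simulation sketch (ours, under the assumptions of the die story above) of the fact that summing r independent Geometric draws produces a Negative Binomial draw:

    import random

    def trials_until_rth_success(r, p):
        # simulate a Negative Binomial draw as a sum of r Geometric draws
        total = 0
        for _ in range(r):
            while True:            # one Geometric draw: trials until a success
                total += 1
                if random.random() < p:
                    break
        return total

    # Die story: roll until the third "5" appears (r = 3, p = 1/6).
    samples = [trials_until_rth_success(3, 1/6) for _ in range(100_000)]
    print(sum(samples) / len(samples))     # should be near r/p = 18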
17.3 Exercises
17.3.1 Practice
Exercise 17.1. Skittles. Skittles candies come in the colors red, orange, yel-
low, green, and purple, with each of these colors having equal probability. Due
to a dye mix-up there are a few rainbow-striped candies coming down the line.
There is a 5% chance that a candy is rainbow striped. You are a quality control
inspector, and your job is to find the 3rd rainbow-striped candy. (The popula-
tion is so big, and we randomly sample the Skittles, so we can assume relative
independence.)
a. Explain in words what X is in this situation and what values it can take.
b. Why is this a Negative Binomial distribution situation? What are the
parameters?
c. What is the probability you will have to ask exactly 12 men to find the
2nd who is colorblind?
d. What is the probability that you will have to ask between 12 and 14 men
to find the 2nd who is colorblind?
e. What is the expected value of X?
f. What is the variance of X?
g. Show the labeled graph of the mass for this story. Draw an arrow to
indicate where the mean is.
h. Show the labeled graph of the CDF for this story.
a. Explain in words what X is in this situation and what values it can take.
to find 10 who failed the first exam. What are the expected value and standard
deviation of X?
Exercise 17.8. Still looking for a wife. Twelve percent of single women
in the kingdom have feet which will fit into a glass slipper. Prince Charming
thinks he must continue finding women who fit such a slipper, so that he has
a collection to choose from. He would like 10 women who fit the slipper to
compete on a “Bachelor”-type show for his hand in marriage.
Exercise 17.9. Replaying a video game. You vow to replay a tough level in
Super Mario World until you win 3 times. Assume that you have a 25% chance
of winning each time you play, and each round is independent (your skill does
not improve from game to game, because there is a lot of luck involved). Let X
denote the number of times you have to play. Each attempt takes 5 minutes.
a. What is the probability it will take you more than an hour to win 3
times?
b. How many minutes do you expect to need to win 3 times?
17.3.2 Extensions
Exercise 17.10. Missing an early class. My alarm clock wakes me up only
64% of the time. My probability class meets at 8:30 am, and if I don’t hear my
alarm, I’ll miss class. My teacher takes off one percent of our grade for every
class that we miss. Let X be the number of class days that pass until I have
lost 10% of my grade.
Exercise 17.11. Zombies. During a zombie apocalypse, one human finds that
about 1 out of every 3 shots he makes actually kills a zombie. Let X be the
number of shots he has to take until he kills his fifth zombie.
a. Given that it takes between 14 and 16 shots (inclusive) to kill his fifth
zombie, what is the probability that it takes at least 15 shots?
b. Given that it takes him 12 tries to get his 4th success, what is the
probability that he will need exactly 3 more tries to get his 5th success? What
distribution is this? How do you know?
Exercise 17.12. Monopoly. Philip and Callum are playing Monopoly. Cal-
lum has an 80% chance of winning whenever he plays Philip. If they play
Monopoly until somebody wins 3 games (assuming no ties and that the games
are independent):
a. What is the probability that Callum wins the series in exactly 3 games?
In 4 games? In 5 games?
b. What is the probability Philip wins the series in exactly 3 games? In 4
games? In 5 games?
c. Given that the series took 5 games, what is the probability that Callum
won?
d. Given that Callum wins the series, what is the expected number of games
that the boys play?
Chapter 18
Poisson Random Variables
Math is like war, people! If you fall behind in your unit, you will die!
—attributed to Mr. Williams, math teacher from Poland Seminary High
School in Poland, Ohio, as remembered by Catharine Patrone, Director of Stu-
dent Services for the Honors College at Purdue University
You are an epidemiologist trying to find people who have a rare disease you
would like to study. The disease is so rare that only 1 out of every 20,000
people have it. In a city of 100,000 people, what is the probability that exactly
4 people have the disease?
18.1 Introduction
How do you pronounce Poisson? Pwah-sow(n), with the accent on the second syllable (the n is implicit). Poisson is French for “fish,” but the distribution is actually named after Siméon Poisson.

Instead of counting up the number of trials or the number of successes, Poisson random variables have a different motivation. When you know the average rate of occurrences of some event, the Poisson distribution is often correct for describing the number of events that actually occur. For instance, there might be a Poisson number of cars passing a building during a 1-hour period, or a Poisson number of raindrops landing on a sidewalk square in five minutes, or a Poisson number of shoppers in a store during a given 3.5-hour afternoon, etc.

We use λ as the average number of occurrences of an event during a fixed
time period. We can think of λ as an average rate, because it depends on the
time period. For instance, suppose that there are an average of 10 cars passing
by a building in an hour. We use λ = 10 if we want the average rate of cars
passing by the building in an hour, but we use λ = 5 if we want to know the
average number of cars passing in a 30-minute period, or λ = 40 for the average
number of cars passing in a 4-hour period. To use Poisson random variables in
these simple settings, we must assume that the average rate is proportional to
the length of time we observe. This is a very different use of random variables
than we have encountered so far.
Poisson random variables can occur in other related ways too. For instance,
while reading a novel, the number of errors per page can be treated as a Poisson
random variable. In this case, the flow of words corresponds to the flow of time,
and an error on a page corresponds to an event.
The notation for a Poisson distribution looks like:

X ∼ Poisson(λ),

where λ denotes the average rate of events.
Poisson random variables
The common thread: How many events in a given period? You know
the average rate of events, i.e., number of events that occur in a period (this
average rate must be proportional to the length of the period). You count the
actual number of events that occur in such a time period. For instance, you
measure the number of cars that pass in a 40-minute period, the number of
raindrops that fall in one minute on a particular sidewalk square, the number
of times that cell phones in a room are ringing, etc.
Things to look for: A number of events, for which the average rate is known.
The variable:
X = the number of events that occur during the specified period
The parameter:
λ = the average rate of events that occur during the specified period
Mass:

P(X = x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, 3, . . .
Expected value formula:
E(X) = λ
Variance formula:
Var(X) = λ
Poisson random variables are not poison.
Example 18.1. Let X denote the number of errors on a randomly selected
page of a book. Suppose that X has mass
pX(j) = P(X = j) = e^{−0.2} (0.2)^j / j!.
Then X is a Poisson random variable.
In this case, for instance, the expected number of errors on a randomly selected
page is E(X) = 0.2, and the variance of the number of errors on a randomly
selected page is also Var(X) = 0.2.
For example, the probability of exactly 3 errors on a randomly selected page
of the book is
pX(3) = e^{−0.2} (0.2)^3 / 3! = 0.00109.
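As a quick numeric check of these computations, here is a minimal Python sketch (ours, not the book's):

    from math import exp, factorial

    def poisson_pmf(j, lam):
        # P(X = j) = e^(-lam) * lam^j / j!
        return exp(-lam) * lam**j / factorial(j)

    print(poisson_pmf(3, 0.2))                               # about 0.00109
    print(sum(j * poisson_pmf(j, 0.2) for j in range(50)))   # E(X), about 0.2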
Example 18.2. Customers arrive at a store at an average rate of 8 per hour. Let X be the number of customers who arrive during the next hour, so X is Poisson with λ = 8.

c. What is the probability that at most 3 customers arrive during the next hour?

P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
         = 8^0 e^{−8}/0! + 8^1 e^{−8}/1! + 8^2 e^{−8}/2! + 8^3 e^{−8}/3!
         = (8^0/0! + 8^1/1! + 8^2/2! + 8^3/3!) e^{−8}
         = 126.333 e^{−8}
         = 0.04238
d. Given that at least 1 customer arrives in the next hour, what is the
probability that more than 3 arrive? (Notice also that the Poisson is not mem-
oryless.)
P(X > 3 | X ≥ 1) = P(X > 3 ∩ X ≥ 1) / P(X ≥ 1)
                 = P(X > 3) / P(X ≥ 1)          (X > 3 and X ≥ 1 hold together only if X > 3)
                 = (1 − P(X ≤ 3)) / (1 − P(X = 0))          (computing the complement)
                 = (1 − 0.04238) / (1 − 0.0003355)
                 = 0.9579
In general, for Poisson random variables, if we need to compute P (X > a), and
we only have a calculator, then we need to compute the complement:
P (X > a) = 1 − P (X ≤ a) = 1 − P (X = 0) − P (X = 1) − · · · − P (X = a).
A direct calculation is infeasible because P(X > a) = Σ_{j=a+1}^{∞} P(X = j) is an infinite sum that we cannot simplify (except by using the complement, as suggested above).
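This complement trick is easy to carry out by machine as well; here is a minimal Python sketch (ours), using the λ = 8 example above:

    from math import exp, factorial

    def poisson_tail(a, lam):
        # P(X > a) = 1 - P(X = 0) - ... - P(X = a)
        cdf = sum(exp(-lam) * lam**j / factorial(j) for j in range(a + 1))
        return 1 - cdf

    print(poisson_tail(3, 8))      # P(X > 3) = 1 - 0.04238, about 0.9576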
e. What does the mass look like for this story?
[Graph of the mass pX(x) for X ∼ Poisson(8), plotted for x from 0 to 30; the probabilities peak near x = 8, at about 0.14.]
[Graph of the CDF FX(x), increasing from 0 to 1 as x runs from 0 to 30.]
g. What is the probability that exactly 14 customers arrive during the next 2 hours? For a 2-hour period, X is Poisson with λ = 16, so

P(X = 14) = 16^{14} e^{−16} / 14! = 0.09302
h. How many customers do you expect in the next 2 hours?
E(X) = λ = 16 customers
For a 20-minute period, the average is λ = 8/3 = 2.6667 customers, so the probability of exactly 6 arrivals during the next 20 minutes is

P(X = 6) = (2.6667)^6 e^{−2.6667} / 6! = 0.03472
Example 18.3. What is the probability that there will be exactly 6 customer
arrivals in exactly one out of the next three 20-minute intervals? (Hint: You
have already done some of the work for this problem in the previous question.)
The number of intervals (out of 3) with exactly 6 arrivals is Binomial with n = 3 and p = 0.03472, so

P(X = 1) = \binom{3}{1} (0.03472)^1 (0.96528)^2 = 0.09705
Now we think about how the simplification j/j! = 1/(j − 1)! was performed for the mean. We will use (j)(j − 1)/j! = 1/(j − 2)! below. We write

E((X)(X − 1)) = Σ_{j≥0} (j)(j − 1) P(X = j) = Σ_{j≥0} (j)(j − 1) e^{−λ} λ^j / j!.

The j = 0 and j = 1 terms vanish, and (j)(j − 1)/j! = 1/(j − 2)! for j ≥ 2, so this equals

λ^2 Σ_{j≥2} e^{−λ} λ^{j−2} / (j − 2)! = λ^2,

since the remaining sum is the total mass of a Poisson random variable, which is 1. Thus

Var(X) = E((X)(X − 1)) + E(X) − (E(X))^2 = λ^2 + λ − λ^2 = λ.
The sum of two independent Poisson random variables, X with mean λ1 and Y with mean λ2, is itself a Poisson random variable Z = X + Y, with mean λ = λ1 + λ2. To prove this, observe Z can only take on nonnegative integer values, since X and Y are each known to take on only nonnegative integer values. Also, in order to have Z = j, we need to have X = i for some i with 0 ≤ i ≤ j, and then we also need Y = j − i. So we just compute

pZ(j) = Σ_{i=0}^{j} P(X = i and Y = j − i) = Σ_{i=0}^{j} P(X = i) P(Y = j − i),
where the last equality comes from the independence of X and Y . Moreover,
we know the masses of X and Y and can use them, as follows
pZ(j) = Σ_{i=0}^{j} [e^{−λ1} λ1^i / i!] [e^{−λ2} λ2^{j−i} / (j − i)!].
Next we factor out e−λ1 −λ2 , and multiply and divide by j!. Using λ = λ1 + λ2 ,
this gives
pZ(j) = (e^{−λ1−λ2} / j!) Σ_{i=0}^{j} [j! / (i! (j − i)!)] λ1^i λ2^{j−i} = (e^{−λ} / j!) Σ_{i=0}^{j} \binom{j}{i} λ1^i λ2^{j−i}.
By the Binomial Theorem, the sum equals (λ1 + λ2)^j, so

pZ(j) = (e^{−λ} / j!) (λ1 + λ2)^j = e^{−λ} λ^j / j!.
Thus, we have established the following: if X and Y are independent Poisson random variables with means λ1 and λ2, then X + Y is also a Poisson random variable, with mean λ1 + λ2.
Example 18.9. With the knowledge that the sum of two independent Poisson
random variables is also a Poisson, we can return to Example 18.2g, and let Y
be the number of customers during hour 1, and Z be the number of customers
during hour 2. Then Y, Z are independent Poisson random variables, each with
mean 8. So X = Y + Z is the total number of customers during the two hours,
and X is also Poisson, with mean 8 + 8 = 16.
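A simulation sketch (ours, not the book's) of Example 18.9; since we only use the standard library, we draw Poisson variates with Knuth's classical method:

    import random
    from math import exp

    def poisson_draw(lam):
        # Knuth's method: count uniform factors until the running product
        # drops below e^(-lam); the count minus one is a Poisson(lam) draw
        k, product, threshold = 0, 1.0, exp(-lam)
        while product > threshold:
            product *= random.random()
            k += 1
        return k - 1

    totals = [poisson_draw(8) + poisson_draw(8) for _ in range(100_000)]
    mean = sum(totals) / len(totals)
    var = sum((t - mean)**2 for t in totals) / len(totals)
    print(mean, var)       # both should be near 16, as for a Poisson(16)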
18.3 Poisson Approximation to the Binomial

When does this work? If n is really big (say, for instance, n ≥ 1000) and p is really close to 0 or really close to 1. How “big” and how “close” are needed? If n is large and npq is near 1, or perhaps (as a rule of thumb) within a factor of 10 away from 1, i.e.,

1/10 ≤ npq ≤ 10,
then the Poisson approximation to the Binomial is usually very appropriate.
For instance, if n = 10,000 and p = 1/10,000, and q = 9999/10,000, then npq = 0.9999, so the Poisson approximation to the Binomial should be good.
What parameters are involved? The expected value of a Poisson is λ, which
is an average rate. The expected value of a Binomial is n (number of trials)
times p (probability of success on a single trial). So we must set the expected
values equal.
How do you make the switch?
λ = np
We’re setting the two expected values equal. Returning to the disease question from the start of the chapter:

a. If X is Binomial with n = 100,000 and p = 1/20,000 = 0.00005, then

P(X = 4) = \binom{100,000}{4} (0.00005)^4 (0.99995)^{99,996} = 0.175470002.
Our calculators don’t want to do the calculation above. Does yours? On the other hand, we can do the approximation in the next part on any hand-held calculator, and it agrees with the actual answer to several decimal places of accuracy.
b. If Y is Poisson with λ = np = (100,000)(0.00005) = 5, then
P(Y = 4) = 5^4 e^{−5} / 4! = 0.175467370.
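The comparison is easy to reproduce in Python (a sketch of ours; Python's exact integer arithmetic handles the huge binomial coefficient without trouble):

    from math import comb, exp, factorial

    n, p = 100_000, 0.00005
    lam = n * p                       # 5

    exact = comb(n, 4) * p**4 * (1 - p)**(n - 4)
    approx = lam**4 * exp(-lam) / factorial(4)
    print(exact)                      # about 0.17547000
    print(approx)                     # about 0.17546737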
[Figure 18.1: the mass of the Binomial X (left) and of the approximating Poisson Y (right), plotted for values 0 to 20 with probabilities up to about 0.18.]
c. What is the expected number of people in this city who have this disease?
Using Binomial or Poisson, E(X) = E(Y ) = 5.
d. What do the masses look like for this story? Show for both the Binomial
and for the Poisson.
See Figure 18.1. As you can see from these graphs, the masses look practi-
cally identical for the Poisson and the Binomial with these parameters, and the
differences are extremely small.
Example 18.12. If there are n = 10000 words in a short story, and if each
word has a probability of p = 1/3000 of getting misspelled, then the number
of misspelled words in the entire short story is approximately Poisson, with
expected value np = 10/3 = 3.3333.
The key step in deriving the approximation is that, for large n,

(1 − p)^{n−j} = (1 − np/n)^{n−j} = (1 − λ/n)^{n−j} ≈ e^{−λ}.
18.4 Exercises
18.4.1 Practice
a. What values can X take? Why is this a Poisson situation? What is the
parameter?
b. What is the exact probability that exactly 300 marriage licenses will be
issued tomorrow (a decimal approximation is not needed)?
c. What is the expected number of marriage licenses issued during the next hour?
g. Show the labeled graph of the CDF for adults in this story.
h. Show the labeled graph of the CDF for children in this story.
(The graphs for e and g should look relatively clear after just a few points, but perhaps two dozen points will be needed to get a good understanding of the graphs for f and h. A graphing calculator or computer might make the graphs in f and h quicker to compute.)
Exercise 18.4. Pumpkin carving. According to the Guinness Book of World
Records (2005), the fastest pumpkin-carver on record, Steven Clarke, carved 42
pumpkins an hour. Assume this is his average rate. Let X be the number of
pumpkins Steven carves in an hour. (We suppose that the carver can steadily
maintain work at his record rate.)
a. Assume that this is a Poisson situation. What is the parameter?
b. What is the probability Steven will carve exactly 40 pumpkins in the
next hour?
c. Given that he has carved at least 3 pumpkins in a 5-minute interval, what
is the probability that he will carve at least 4 pumpkins during that 5-minute
interval?
d. Given that he carves fewer than 4 pumpkins in a 5-minute interval,
what is the probability he carves fewer than 2 pumpkins during that 5-minute
interval?
e. What is the expected number of pumpkins he can carve in 3 minutes at
this pace?
f. Show the labeled graph of the mass for pumpkins carved in a 3-minute
interval.
g. Show the labeled graph of the CDF for pumpkins carved in a 3-minute
interval.
Exercise 18.5. Quadruplets. The probability of a mother giving birth to
quadruplets is 1 in 729,000. An obstetrician checks the records of 1,000,000
mothers, to see how many mothers of quadruplets are in her database.
a. Which distribution is technically appropriate in this situation? What are
the parameters?
b. Write a formula (using the distribution in part a) for the probability that
there are exactly 3 mothers of quadruplets, but do not solve.
c. What is the expected number of mothers of quadruplets in this sample?
d. Which distribution would be a good approximation to the distribution
in part a? What is the parameter?
e. Approximate the probability in part b by using the approximate distri-
bution in part d, and solve.
a. A bird watcher sits and looks for the birds for 3 hours. How many of
these birds does she expect to see?
b. What is the probability that she sees exactly 5 of these birds during a
3-hour time period?
c. What is the probability that she sees exactly 1 bird in each of 3 separate
1-hour time periods? Which distribution is this, and what are the parameters?
Approximate the probability that he wins at least one time during his life-
time.
a. What is the probability that more than 2 beetles will cross the counter
in the next half hour?
b. What is the probability that more than 2 beetles will cross the counter
in each of the next two half hours?
c. What is the probability that more than 4 beetles will cross the counter
in the next hour?
Exercise 18.12. New car contest. People are competing to win a new car.
The task, which only 1 in 5000 people can achieve, is to hit a golf ball into a
cup 250 feet away. One hundred thousand people sign up to compete.
a. How many people will win, on average? What is the standard deviation
of the number of people who will win?
b. What is the probability that exactly 18 people win?
c. What is the probability that between 18 and 20 people win?
a. What is the probability that exactly 3 customers will arrive in the next
10 minutes?
b. What is the probability that at least 3 customers will arrive in the next
10 minutes?
c. Given that at least 3 customers will arrive in the next 10 minutes, what
is the probability that exactly 3 will arrive?
d. What is the expected amount of money the bakery will make in the next
10 minutes?
e. What is the standard deviation in the amount of money the bakery will
make in the next 10 minutes?
Exercise 18.14. Toy defects. Workers at a factory produce a toy with a defect
about once every 4 hours on average. Each toy costs the factory approximately
$7 in labor and supplies.
a. What is the expected number of toys with defects at the end of a 40-hour
work week?
b. What is the standard deviation in the number of toys with defects at the
end of a 40-hour work week?
c. What is the expected cost to the factory for toys with defects at the end
of a 40-hour work week?
d. What is the standard deviation in cost to the factory for toys with defects
at the end of the 40-hour work week?
Exercise 18.16. Albino fish. There are approximately 11,000 fish in a lake.
Each fish has a 1 in 5500 chance of being albino. Let X be the number of albino
fish.
a. What is the probability that there will be exactly 8 successful calls in the
next hour?
b. How many hours are expected to pass until the first hour with exactly 8
people listening to the messages? What distribution is this?
Exercise 18.18. Hungry customers. At a certain hot dog stand, during the
working day, the number of people who arrive to eat is Poisson, with an average
of 1 person every 2 minutes.
a. What is the probability that exactly 3 people arrive during the next 10
minutes?
b. What is the probability that nobody arrives during the next 10 minutes?
c. What is the probability that at least 3 people arrive during the next 10
minutes?
Exercise 18.19. Errors in a book. An author has carefully edited his book,
but as all careful readers know, all books have some errors. Suppose that the
number of errors per page is Poisson, with an average of 0.04 errors per page.
a. In a 3-hour period, how many Yankees and Red Sox fans do we expect
altogether?
b. Find the probability that exactly one person enters the store during the
next 20 minutes who likes the Yankees or Red Sox.
Exercise 18.22. Website visitors. Suppose that the number of men who
visit a website is Poisson, with mean 12 per minute, and the number of women
who visit the same site is also Poisson, with mean 15 per minute. Assume that
the number of men and women are independent.
a. During the next 10 seconds, what is the probability that 1 man and 2
women visit the site?
b. What is the expected number of people who visit the site in the next
5 minutes?
b. What is the variance of the total number of people who visit the site in
the next 5 minutes?
Exercise 18.23. Superfans. At the local stadium, there are 60,000 fans at-
tending a football game. It is well known that only a few people at the game
will be impartial (i.e., will not care about the outcome of the game). Suppose
that each person at the game has probability 1/10,000 of being impartial (and that the impartiality of a fan has no bearing on the other fans).
a. Give an exact formula for the probability that 8 of the people at the game
are impartial. You do not have to evaluate the formula on your calculator.
b. Use a Poisson estimation for the probability above.
c. Use your calculator to evaluate the Poisson expression that you gave in
part b.
Exercise 18.24. Shoppers. During the holiday rush, there are 100,000 shop-
pers in a certain region. Each of these shoppers is extremely likely to make a pur-
chase. Suppose that a person makes a purchase with probability 49,999/50,000
and declines to make a purchase with probability 1/50,000. Let X be the num-
ber of people who decline to make a purchase.
a. Give an exact formula for the probability that P (X ≤ 3). You do not
have to evaluate the formula on your calculator.
b. Use a Poisson estimation for the probability above.
c. Use your calculator to evaluate the Poisson expression that you gave in
part b.
18.4.2 Extensions
Exercise 18.25. Bakery (continued) If it costs the bakery owner $20 to keep
the bakery open each hour (staff, electricity, etc.), each customer spends $2.50,
and customers arrive at the bakery at an average rate of 6 per half-hour, will
the bakery be able to stay in business (i.e., will the bakery make more money
than it spends in an hour)?
Exercise 18.26. Verifying a mass. Verify that the mass of a Poisson random
variable is really a mass, i.e., verify that the terms of the mass sum to 1.
18.4.3 Advanced
Exercise 18.27. If X is a Poisson random variable with λ = 3, find E(5X ).
Chapter 19
Hypergeometric Random
Variables
And will you succeed? Yes indeed, yes indeed! Ninety-eight and three-quarters
percent guaranteed.
—Oh, the Places You’ll Go! by Dr. Seuss (Random House, 1990)
In a Lotto game, there are 40 balls labeled 1 to 40. You pick 5 different numbers
for your lottery ticket. That night on TV, the state lottery office will randomly
select 5 different balls (each with a unique number) to be the winning ticket.
You could win $1 million if you picked all 5 winning numbers, but you can win
smaller prizes for picking 3 or 4 correct numbers. What is the probability that
you pick all 5 winning numbers? What is the probability that you win a prize
of some sort? How many numbers do you expect to get right?
19.1 Introduction
A Hypergeometric random variable is the number of desirable items we pick
when selecting some items without replacement from a mixed collection of de-
sirable and undesirable items.
Hypergeometric random variables have three parameters, N, M, n:
The parameters:
N = the total number of items; M = the number of desirable items; n = the number of items selected
Mass:
P(X = x) = \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}
Expected value formula:
E(X) = nM/N
Variance formula:
Var(X) = n (M/N)(1 − M/N)((N − n)/(N − 1))
The notation looks like:
X ∼ Hypergeometric(M, N, n)
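A minimal Python sketch (ours, not the book's) of this mass, checked against the crayon example of Section 19.2 below:

    from math import comb

    def hypergeometric_pmf(x, M, N, n):
        # P(X = x): x desirable items among n drawn without replacement
        # (assumes 0 <= x <= n; math.comb returns 0 when a count is too big)
        return comb(M, x) * comb(N - M, n - x) / comb(N, n)

    # Crayon example: N = 64 crayons, M = 60 in good condition, n = 2 drawn.
    print(hypergeometric_pmf(2, 60, 64, 2))    # about 0.8780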
19.2 Examples
Example 19.1. A child randomly selects 2 crayons, without replacement, from a box of 64 crayons, 4 of which are broken. Let X be the number of crayons selected which are in good condition. Then X
is a Hypergeometric random variable with N = 64 crayons altogether, M = 60
crayons in good condition, and N − M = 4 crayons that are broken. The child
selects n = 2 crayons. So the probability that both crayons selected are in good
condition is
pX(2) = P(X = 2) = \binom{60}{2} \binom{4}{0} / \binom{64}{2} = [(60)(59)/2] / [(64)(63)/2] = (60)(59) / ((64)(63)) = 1770/2016 = 0.877976.
The probability that exactly one of the two selected crayons is broken is
pX(1) = P(X = 1) = \binom{60}{1} \binom{4}{1} / \binom{64}{2} = (60)(4) / 2016 = 240/2016 = 0.119048.
Similarly, the probability that both selected crayons are broken is pX(0) = \binom{60}{0} \binom{4}{2} / \binom{64}{2} = 6/2016 = 0.002976. Notice that these 3 probabilities add up to 1 because they cover every possible way of selecting 2 crayons.
Example 19.2. A college student is running late for his class and does not
have time to pack his backpack carefully. He has 12 folders on his desk, 4 of
which include homework assignments due today. Without taking time to look,
he accidentally grabs just 3 folders from his stack. When he gets to class, he
counts how many of them contain his homework assignments. Assume that all
of the outcomes are equally likely.
Let X be the number of folders he grabs which contain homework assignments. For instance, the probability that he grabs at least 2 of the homework folders is

P(X ≥ 2) = P(X = 2) + P(X = 3)
         = \binom{4}{2} \binom{8}{1} / \binom{12}{3} + \binom{4}{3} \binom{8}{0} / \binom{12}{3}
         = 0.21818 + 0.01818
         = 0.2364
f. On a different day, in the same situation (this student should really invest
in a reliable alarm clock), the student grabbed 8 folders at random. What is
the probability he got all 4 homework assignment folders?
Now n = 8, but there are only 4 possible “successes” available.
P(X = 4) = \binom{4}{4} \binom{8}{4} / \binom{12}{8} = (1)(70)/495 = 0.1414
Var(X) = 8 (4/12)(1 − 4/12)((12 − 8)/(12 − 1)) = 0.6465

σX = √0.6465 = 0.8040 homework assignments
Example 19.3. Suppose that there are 100 roseate spoonbills altogether in
Indiana. Also suppose that 40 of them have been observed in 2009 by an or-
nithologist (a computerized tagging system allows the observer to know when
the observances are unique). In 2010 he observes 32 birds. What is the proba-
bility that j out of these 32 birds were already seen previously, in 2009?
In this case, there are N = 100 roseate spoonbills altogether. The “desirable”
birds are the ones that were previously seen in 2009, i.e., M = 40; the undesir-
able birds are the ones that were not seen in 2009, i.e., N − M = 60. We use X
to denote the number of the birds out of the n = 32 seen in 2010 which had
been previously seen in 2009. So the probability that exactly j of the 32 birds
this year were seen in 2009 is
pX(j) = P(X = j) = \binom{40}{j} \binom{60}{32−j} / \binom{100}{32}.
To compute the expected value of a Hypergeometric random variable, write X as a sum of indicator random variables, where Xj = 1 if the jth item selected is desirable and Xj = 0 otherwise:

X = X1 + X2 + · · · + Xn.
Any of the N items is equally likely to be chosen on the jth draw, and exactly
M of them are desirable, so E(Xj ) = M/N for each j. (This is worrisome to
some students at first, but just keep in mind that we are momentarily only
focused on the jth draw.) Even though the Xj ’s are dependent, the expected
value is still linear, i.e., the expected value of the sum is equal to the sum of
the expected values:
E(X) = E(X1 + · · · + Xn) = E(X1) + · · · + E(Xn) = M/N + · · · + M/N = n M/N.
For the variance, we also need the cross terms E(Xi Xj). Since Xi Xj is either 0 or 1,

E(Xi Xj) = P(Xi Xj = 1) = P(Xi = 1 and Xj = 1) = P(Xi = 1) P(Xj = 1 | Xi = 1) = (M/N)((M − 1)/(N − 1)) for i ≠ j.
There are n terms of the form E(Xj Xj) = E(Xj) = M/N. The other n^2 − n terms are of the form E(Xi Xj) with i ≠ j. So
E(X^2) = n (M/N) + (n^2 − n)(M/N)((M − 1)/(N − 1)).
Thus

Var(X) = E(X^2) − (E(X))^2 = n (M/N) + (n^2 − n)(M/N)((M − 1)/(N − 1)) − (n M/N)^2,
which simplifies to

Var(X) = n (M/N)(1 − M/N)((N − n)/(N − 1)).
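These formulas are easy to sanity-check by simulation. A sketch of ours, using the folder example's parameters (N = 12 folders, M = 4 containing homework, n = 3 grabbed):

    import random

    N, M, n = 12, 4, 3
    population = [1] * M + [0] * (N - M)    # 1 marks a desirable item

    draws = [sum(random.sample(population, n)) for _ in range(100_000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean)**2 for d in draws) / len(draws)
    print(mean)    # formula: n*M/N = 1.0
    print(var)     # formula: n*(M/N)*(1 - M/N)*(N - n)/(N - 1) = 0.5455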
19.3 Binomial Approximation to the Hypergeometric

When the total population size N is very large compared to the number of draws n, sampling without replacement behaves almost like sampling with replacement, so a Binomial distribution with the same n and with success probability

p ≈ M/N = (# successes in population) / (total # in population)

is a good approximation to the Hypergeometric.
For instance, suppose that a population of N = 1,000,000 items contains M = 50,000 successes, and that n = 10 items are selected without replacement. Then

E(X) = 10 (50,000/1,000,000) = 0.5

and

Var(X) = 10 (50,000/1,000,000)(1 − 50,000/1,000,000)((1,000,000 − 10)/(1,000,000 − 1))
       = 10 (0.05)(0.95)(0.999991)
       = 0.4749957.
The answers for Binomial and Hypergeometric are almost exactly the same!
The masses for the Hypergeometric and for the Binomial approximation in
this story agree to five decimal places of accuracy.
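A sketch (ours) that reproduces this agreement numerically:

    from math import comb

    N, M, n, p = 1_000_000, 50_000, 10, 0.05

    for x in range(4):
        hyper = comb(M, x) * comb(N - M, n - x) / comb(N, n)
        binom = comb(n, x) * p**x * (1 - p)**(n - x)
        print(x, round(hyper, 5), round(binom, 5))   # agree to 5 places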
19.4 Exercises
19.4.1 Practice
Exercise 19.1. Vending machine. You are hungry and decide to patronize your office’s vending machine. There are 6 bags of potato chips, 7
bags of pretzels, and 5 packs of chocolate chip cookies. Unfortunately there is
something wrong with the buttons on the vending machine, and it will not let
you type in your selection. However, it will drop 3 snacks out at random. You
are hoping for 2 bags of cookies, and you don’t care what the other snack will
be.
a. What is a success in this story? What is a failure?
b. Explain in words what X is in terms of this story. What values can it
take?
c. Why is this a Hypergeometric situation? What are the parameters?
d. What is the probability you get the 2 bags of cookies?
e. What is the probability that you get at most 1 bag of cookies?
f. What is the expected number of bags of cookies you will get?
g. What is the standard deviation in the number of bags of cookies you will
get?
Exercise 19.9. Ramen noodles. There are 20 packs of ramen noodle pack-
ages in a variety pack box, with 10 chicken and 10 beef flavored. What is the
probability that you grab one of each when you reach in to pull out two packages
of noodles without looking?
Exercise 19.10. Playlist. Suppose you have a playlist called “I Love the 90s”
which contains 35 songs, including 10 tracks from the Spin Doctors’ “Pocket
Full of Kryptonite.” If you shuffle the songs, what is the probability that 3 songs from the album (no repeats) shuffle into the top 5?
a. What is the probability that I grab exactly 2 that are chocolate (either
chunk or chip)?
b. What is the probability that I grab fewer than 2 that are chocolate?
c. What is the expected number of chocolate granola bars out of the 3 that
I grab?
Exercise 19.13. Lucky Charms. Michael reaches into a bowl of 100 pieces of
Lucky Charms cereal. He desires blue moons, which make up 5% of the charms.
If his cereal bowl contains 40 charms, what is the probability he won’t get any
blue moons? What number of blue moons should he expect to get?
Exercise 19.14. Random pants. Henry has 10 pairs of pants: 4 pairs of dress slacks and 6 pairs of jeans. In a hurry while getting ready for a trip, he asks his kids to throw 3 pairs of pants in a suitcase for him without specifying which kind he needs.
a. What is the probability the kids correctly throw in 2 pairs of dress slacks
and 1 pair of jeans?
b. What is the probability the kids throw in 2 pairs of jeans and 1 pair of
dress slacks?
c. Are the probabilities in parts a and b the same? Do they add up to 1?
Should they add up to 1? Why or why not?
Exercise 19.15. Corn and beans. As a prank, your roommate removed all
the labels from all the cans in your pantry and shuffled them around. Now you
have no idea what is in a can when you get ready to cook dinner! You know
that you have 8 cans of corn and 5 cans of beans. You don’t want to waste any
of the cans, so the best thing to do is to randomly open 2 cans every night and
just eat whatever is in those cans.
a. What is the probability that on the first night you get one of each type
of can?
b. If you got one of each type of can on the first night, what is the probability
you got one of each type of can on the second night?
c. Are these probabilities in parts a and b the same? Should they be? Why
or why not?
Exercise 19.17. Harmonicas. Carlos “Coyote” Jones owns quite a few har-
monicas. In particular, he has 7 professional harmonicas and 12 cheaper har-
monicas. They have relatively similar shapes, so when he reaches into his har-
monica container without looking, he does not notice a difference between them.
Suppose that he grabs 8 harmonicas, without replacement, and all selections are
equally likely.
19.4.2 Extensions
Exercise 19.18. Capture-recapture sampling. A wildlife biologist is trying
to estimate population size of a pack of hyenas. She tags 20 hyenas and then
releases them back into the wild. She revisits the same area 1 year later, and
examines 50 hyenas. She notes that 10 of the hyenas have the tags from 1 year
earlier. Estimate the population size, and explain your reasoning. Why would
this be considered a form of Hypergeometric distribution?
Exercise 19.19. Spades. Given a standard deck of cards, you are dealt a
hand of 13.
a. What is the probability that more than half the spades in the deck wind
up in your hand?
b. How many spades are you expected to get in this hand?
Chapter 20
When did you first start thinking about probability when you were young? Was
it while playing a board game that involved rolling a die? A die roll is a fairly
simple use of probability because each side is equally likely to come up. The
values 1 to 6 are equally likely to appear.
20.1 Introduction
Think about rolling a die 200 times. We did this, and we show the results below.
Our results should show a fairly similar number of 1s, 2s, 3s, 4s, 5s, and 6s.
die value 1 2 3 4 5 6
number of occurrences 36 36 28 31 38 31
percent of occurrences 0.18 0.18 0.14 0.155 0.19 0.155
Using the computer to simulate rolling a million dice, the results might be
the following:
die value 1 2 3 4 5 6
occurrences 166863 166786 166513 166890 166753 166195
percent 0.1669 0.1668 0.1665 0.1669 0.1668 0.1662
(The more we roll, the more evenly distributed the results would be.) The
result of a single die toss follows the Discrete Uniform distribution because each
of the 6 possible outcomes is equally likely to occur.
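Tables like the two above can be generated with a few lines of Python (our sketch):

    import random
    from collections import Counter

    rolls = [random.randint(1, 6) for _ in range(1_000_000)]
    counts = Counter(rolls)
    for value in range(1, 7):
        print(value, counts[value], counts[value] / len(rolls))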
For our purposes, we’ll assume that any Discrete Uniform distribution, if
not numbered consecutively with integers already, could be relabeled this way.
The Discrete Uniform is one of the simplest distributions.
Notation for the Discrete Uniform looks like:
X ∼ D.Uniform(N )
Mass:

P(X = x) = 1/N,  x = 1, 2, . . . , N

Expected value formula:

E(X) = (N + 1)/2

Variance formula:

Var(X) = (N^2 − 1)/12
Where does the mass formula come from? All of the pX(x) values are the same, and we know that they also sum to 1:

pX(1) + pX(2) + · · · + pX(N) = 1.

Since the N equal values sum to 1, each one must be 1/N.
20.2 Examples
Example 20.1. The Gilbreth family (from Cheaper by the Dozen fame) has
12 children (in order from oldest to youngest): Anne, Ernestine, Mary, Martha,
Frank, Bill, Lillian, Fred, Dan, Jack, Bob, and Jane. Mr. Gilbreth needs some
help with a project, so he whistles for a child to come help him. Each of the
children is equally likely to appear. We are numbering the children so that 1 is
for Anne (the oldest) and 12 is for Jane (the youngest).
Example 20.2. For a die toss on a fair, 6-sided die, the value of the roll is a
Discrete Uniform, between 1 and 6. Let X be the value of a die roll.
d. The experimental results above are very similar to the theoretical distribution. As the number of experiments (here, the number of rolls) grows, the experimental results get closer and closer to the theoretical results (this will be investigated further in Chapter 37, when we study the Central Limit Theorem).
20.3 Exercises
20.3.1 Practice
Exercise 20.1. Skittles. Skittles candies come in 5 different colors: red,
orange, yellow, green, and purple. You have a bowl of the candies, so you reach
in and grab one. Each of the five candies is equally likely to appear.
20.3.2 Extensions
Exercise 20.3. Let X be a Bernoulli random variable with p = 1/2. Let Y
be Discrete Uniform on {1, 2}. Explain (intuitively) why they have different
expected values but the same variance. No calculation should be needed.
20.3.3 Advanced
Exercise 20.4. Prove that the expected value of a Discrete Uniform random
variable on the set {1, 2, . . . , N } is E(X) = (N + 1)/2. Also prove that the
variance is Var(X) = (N 2 − 1)/12.
Chapter 21
Review of Named Discrete Random Variables
These distributions are all starting to sound confusingly similar, aren’t they? To
help you sort them all out, we present Frequently Confused Distributions:
Bernoulli vs. Geometric vs. Binomial
• Bernoulli is a single yes/no trial. We use X = 1 if you get a success, or
X = 0 if you get a failure.
• Geometric is continuing to do more independent Bernoulli trials until
you get your 1st success. The X is the number of trials you have to do.
• Binomial is doing a pre-set number of independent Bernoulli trials and
then counting up the number of successes. The X is the number of suc-
cesses.
• Below is a comparison of results for 1 success in 5 trials for the Geometric
and Binomial distributions.
Binomial: the single success can fall on any of the 5 trials, giving \binom{5}{1} = 5 arrangements:

Y N N N N
N Y N N N
N N Y N N
N N N Y N
N N N N Y

Geometric: there is only 1 arrangement, N N N N Y, since the success must come on the last trial.
Summary of Named Discrete Random Variables

Bernoulli: mass pX(1) = p, pX(0) = q; expected value p; variance pq; parameter: p = prob. of success per trial; X is 0 or 1 (failure or success); used for a single success-or-failure trial.

Binomial: mass \binom{n}{x} p^x q^{n−x}; expected value np; variance npq; parameters: n = # of trials, p = prob. of success per trial; X is 0, 1, 2, . . . , n; used for the # of successes in n trials.

Geometric: mass q^{x−1} p; expected value 1/p; variance q/p^2; parameter: p = prob. of success per trial; X is 1, 2, 3, . . . ; used for the # of trials until the 1st success.

Negative Binomial: mass \binom{x−1}{r−1} q^{x−r} p^r; expected value r/p; variance qr/p^2; parameters: p = prob. of success per trial, r = # of successes needed; X is r, r + 1, . . . ; used for the # of trials until the rth success.

Poisson: mass e^{−λ} λ^x / x!; expected value λ; variance λ; parameter: λ = average rate of events; X is 0, 1, 2, 3, . . . ; used for the # of events in a period.

Hypergeometric: mass \binom{M}{x} \binom{N−M}{n−x} / \binom{N}{n}; expected value nM/N; variance n (M/N)(1 − M/N)((N − n)/(N − 1)); parameters: M good items, N − M bad items, n selected; X is the # of good items selected.

Discrete Uniform: mass 1/N; expected value (N + 1)/2; variance (N^2 − 1)/12; parameter: N = # of equally likely outcomes; X is 1, 2, . . . , N.
Binomial vs. Poisson
• Binomial is used when you know the number of trials (n) and the prob-
ability of success on each trial (p). The probability of success will be the
same for each trial, and the trials are independent from each other.
• Poisson is used when you know the average rate of arrival for the counts
(λ). You have a set interval instead of a set number of trials.
• If your sample size (n) is really big and your probability of success on a
single trial (p) is really small, you may want to use the Poisson approx-
imation to the Binomial. To make this switch, find the average for the
Binomial E(X) = np, and set that equal to λ. The X stays the same.
Bernoulli vs. Discrete Uniform
• Bernoulli is a single yes/no trial with a defined probability of success.
The probability of success and the probability of failure do not have to be
the same, but they do need to add up to 1.
• Discrete Uniform has one or more possible outcomes, and all of the
outcomes are equally likely.
• Discrete Uniform has only one success, and it must come from one of
the N possible outcomes.
21.2 Exercises
For each of the following situations, state which distribution (and approximate
distribution, if applicable) would be most appropriate, and why you think so.
Exercise 21.1. Let X be the number of ice cream cones in your sample which
are broken if you sample 50 of them from a large, independent population, and
12% of the cones in the entire population are broken.
Exercise 21.2. Let X be the number of ice cream cones you need to sample
(again, from a large population) in order to find your 4th broken one, if they
come from a large, independent population, and 12% of the cones in the entire
population are broken.
Exercise 21.3. Let X be the number of broken cones you would find in the
next hour if broken cones come down the assembly line at a rate of 2 broken
cones per minute.
Exercise 21.4. Let X indicate whether the next ice cream cone is broken if
12% of the cones in a large, independent population are broken.
Exercise 21.5. Let X be the number of broken ice cream cones in your sample
if you check 30 from a box (without replacement). Twelve are broken out of the
100 in the box total.
Exercise 21.6. Let X be the number of ice cream cones you need to sample in order to find your first broken one if they come from a large, independent population, and 12% of the cones in the entire population are broken.
Exercise 21.7. Let X be the number on the box you randomly select, if you
are choosing 1 box from 7 numbered boxes of ice cream cones.
Exercise 21.8. Let X be the number of broken ice cream cones in your sample
if you check 30 from a shipment (without replacement). The lot has 1200 broken
cones out of a total of 10,000.
Exercise 21.9. Let X be the number of undercooked ice cream cones in your
shipment of 10,000 if you sample from a large population. Undercooked ice
cream cones have a 0.005% chance of occurring in general.
21.3 Review Problems

Exercise 21.10. Chinese checkers. Philip and Callum play a game of Chi-
nese checkers. Each time they play a game, Philip has a 0.7 chance of winning.
Assume the games are independent.
a. What is the probability the next person who walks in the door will buy
a car?
b. If 10 customers come in to the dealership today, what is the probability
at least 2 of them will buy cars?
c. What is the probability that the 4th customer coming in today is the first
one who will buy a car?
d. What is the expected number of customers he needs to come into the
dealership, to sell his 3rd car?
e. If 10 customers come to his dealership each day, what is the probability
that he will sell at least 2 cars in each of 3 days out of the next week?
f. If customers are equally likely to want to buy cars painted red, brown,
blue, black, or white, what is the probability that the next customer who buys
a car picks a black car?
Exercise 21.12. Babies. On average, 9 babies are born per 24-hour day at
the local hospital.
a. What is the probability that at least one baby will be born today on the
8 am to 4 pm shift?
b. What is the probability that at least one baby will be born in the next
hour?
c. What is the probability that exactly 4 babies will be born on the 8 am
to 4 pm shift?
d. What is the probability that exactly 4 babies will be born on each of the
next three 8-hour shifts?
e. What is the probability that you would have to wait for four 8-hour shifts
until you got the first one with exactly 4 babies born?
f. What is the probability that exactly 12 babies will be born total in the
next 24 hours?
Exercise 21.13. Scholarship. There are 5 juniors and 10 seniors (one of which
is Amelia), trying to win a scholarship to a summer music program. Only 3
students can win, and the winners will be selected randomly by pulling names
out of a hat.
a. If no one can win more than once, what is the probability that all 3
winners will be seniors?
b. If students can win more than once and the students are independent
from each other, what is the probability that all 3 winners will be seniors?
c. If students can win more than once and the students are independent
from each other, what is the probability that the name-puller will call the first
senior on the third name?
d. What is the probability that the first person to win a scholarship is a
senior?
e. What is the probability that the third person to win a scholarship is a
senior?
Exercise 21.15. Donuts. The probability that a student has a donut for
breakfast is 0.08.
d. What is the probability the next student I ask did not have a donut for
breakfast?
Exercise 21.16. Cards. Stacey is playing a game of cards using a standard
52-card deck (13 each of hearts, clubs, diamonds, and spades).
a. What is the probability the first card she is dealt will be the ace of
spades?
b. She is dealt 7 cards at the beginning of the game to hold in her hand.
What is the probability that all 7 will be hearts?
c. What is the expected number of hearts she will have in her 7 cards?
What is the standard deviation?
Exercise 21.17. Rain. Suppose rain is falling at an average rate of 30 drops
per square inch per minute.
a. What is the probability that a particular square inch is hit by exactly 4
drops in the next 10-second period?
b. How many 10-second intervals do you expect to observe this square inch
until you find a 10-second interval with exactly 4 raindrops?
c. What is the probability you would have to observe ten 10-second intervals
to find three of them with exactly 4 raindrops?
Exercise 21.18. Guessing on an exam. An exam consists of 20 multiple-
choice questions with 5 possible answer choices per question. You haven’t stud-
ied, so you decide to make random guesses for each answer choice. Assume each
question’s answer is independent from the others.
a. What is the probability that a person who randomly guesses on each
question gets exactly 5 questions correct given that they got at least 1 question
correct?
b. What is the expected number of correct answers on the exam? What is
the standard deviation?
c. What is the probability I will have to grade more than 12 exams to find
one with exactly 5 correct answers, assuming all of my students were randomly
guessing on all the questions?
d. What is the expected number of correct answers if a person takes 6
exams? What is the standard deviation in the number of correct answers if a
person takes 6 exams?
e. If you have to pay $2 to take the exam, but your parents pay you 50
cents for every correct answer, what is your expected net profit? What is the
standard deviation in your net profit?
Exercise 21.19. Socks. You are doing laundry, and you are trying to find any
small new red socks which may have fallen into the pile of white clothes in the
laundry basket by mistake. It’s really dark in your basement laundry room, so
you have to randomly sample items of clothing.
a. In a single laundry basket, there are 35 white socks and 10 red socks. If
you sample 5 socks at random (without replacement), what is the probability
that at least one of them will be a red sock?
b. Now suppose instead of being in your basement laundry room, you are
dealing with this problem at a huge cotton clothing manufacturer’s pre-wash
room, where there are 35,000 white socks and 1000 red socks. You sample 5
items of clothing at one time. What is the approximate probability at least one
of them will be a red sock?
c. Using your answer to part b, how many 5-item samples of clothing would
you expect to have to take to find the first 5-item sample with at least one red
sock?
Exercise 21.20. Coffee beans. You are in charge of coffee bean quality
control. In a very large batch of coffee beans, you estimate that the chance
a bean is roasted incorrectly is 0.008. You have your employees sample 1000
beans from the seemingly endless supply, to see how many incorrectly roasted
beans they find. (The roast of the beans is assumed to be independent, since
the quantity of beans is large, and the beans are well-mixed before they get
inspected.)
Exercise 21.21. Dogs. Frédérique has 7 dogs: Molly, Ted, Fido, Rocket, Max,
Sandy, and Nellie. Assume each dog is independent from the other.
a. One of the dogs has stolen her sandwich off her plate at lunch, and she
wants to figure out which one. They are each equally likely to have done it, so
she is going to randomly check a dog to see if he or she has bread crumbs on
his or her fur. What is the probability Rocket was the one who did it?
b. What is the expected number of sandwiches that would be stolen until Rocket was the one who did it, assuming each dog is equally likely to steal her lunch, and that the sandwich thieves are working independently from day to day (only 1 dog can steal the sandwich on any given day)?
c. Each time she leaves the table, there is a 0.25 chance that a dog will
steal her lunch. Out of 20 lunchtimes in which she is interrupted, what is the
probability that her lunch will be stolen exactly 4 times?
d. If Frédérique stays at the table for the entire lunchtime, there is only a 0.0001 chance a dog will steal her lunch. Out of 10 years’ worth of lunches (3650), what is the approximate probability a dog will steal her lunch exactly 4 times?
Exercise 21.22. Slug. The number of vehicles crossing a line on an interstate
is Poisson with an average of 5 per hour. A slug arrives at the interstate and
will wait until no vehicles have crossed the line for 5 minutes until it attempts
to get to the other side of the interstate. (This particular slug can count and
tell time.) Assume the cars are independent.
a. For a given 5-minute period, what is the expected number of cars to cross
the line?
b. What is the probability that no vehicles will pass in a given 5-minute
period?
c. How many 5-minute intervals will the slug have to wait until the first one
with no vehicles?
d. It will take the slug 10 minutes to cross the road. What is the probability
the slug will be run over by at least one car? (Start the clock and the car-
counting when the slug starts crossing the road.)
Exercise 21.23. Mice experiment. There are 5 white mice, 6 brown mice,
and 7 gray mice in a cage. For your psychology experiment, you reach in and
randomly select 3 to run in a maze.
a. What are the expected value and the standard deviation of the number
of white mice which are selected?
b. What is the probability that at least 2 of the mice selected are white?
c. What if you had 5000 white mice, 6000 brown mice, and 7000 gray mice to
choose from? You still need 3 mice to run in the maze. What is the approximate
probability that at least 2 of the mice selected are white?
Exercise 21.24. Left-handed desk. Fred is a left-handed person who walks
into a large lecture room for a 4-hour exam. He’s a little absent-minded at the
moment because he’s worried about the exam. He needs to find a left-handed
desk to sit in for the exam or else his back will get cramped up before the exam
is over, but he won’t know if the desk is left-handed until he actually sits in
one, because the desk part folds underneath the chair. Ten percent of the desks
in the exam room are for left-handed people, and the choice of left-or-right
handedness is independent from desk to desk, throughout the room. Assume
Fred is the first student to walk in the door.
a. What is the probability that Fred finds his first left-handed desk on his
5th try?
b. If Fred keeps sampling desks to find the left-handed desk which “feels
lucky,” what is the probability he finds his 3rd left-handed desk on his 8th try?
c. What is the average number of desks Fred has to try until finding his 2nd
left-handed desk?
d. If the first 9 desks tested are not left-handed, what is the probability that
he has to keep looking for more than 12 desks total to find his first left-handed
desk?
Exercise 21.25. Ming and Shaheed each roll a die (repeatedly, in rounds)
until their results match, and then they stop. Let X be the number of rounds
in which the sum of the two dice was 3, and let Y be the number of rounds in
which the sum of the two dice was 9. Find the conditional PMF of X, given
X + Y = 10.
Exercise 21.26. Bethany has a fair six-sided die (with sides 1, . . . , 6). Angelica
has a fair coin, with “1” on one side and “4” on the other side. Each day, Bethany
rolls the die one time, and Angelica flips the coin one time (the results are
independent). Let N denote the first day on which Bethany’s die has a strictly
larger result than Angelica’s coin. Let X denote the value of Bethany’s die on
day N . Find the expected value of X.
Part IV

Counting
We have talked about basic probability ideas in the previous chapters, and
we have discussed the difference between sampling with or without replacement
and whether order does/doesn’t matter. However, in this part, we want to focus
purely on counting problems, including rearrangements. You will see familiar
ideas from the Binomial and Hypergeometric discrete distributions, but we will
expand on those ideas in more complex situations. These problems can be both
challenging and fun. We suggest you draw pictures or even create little models
for yourself if you are having trouble seeing the story clearly in your mind. Much
of counting is common sense once you can visualize the rules for a particular
story.
By the end of this part of the book, you should be able to:
Math skills you will need: Binomial coefficients (“choose” parentheses), fac-
torials.
Additional resources: Calculators may be used to assist in the calculations.
Chapter 22
Introduction to Counting
The hardest arithmetic to master is that which enables us to count our blessings.
—Reflections On The Human Condition by Eric Hoffer (Harper, 1973)
How many unique results can appear on 3 differently colored dice? How many
unique results are possible if the dice are indistinguishable?
22.1 Introduction
The concept of “counting,” as it relates to probability, is usually performed in
the context of a sample space S with finitely many outcomes. Moreover, the
outcomes are usually all equally likely. We have seen a few examples of such
situations, during our study of outcomes, events, and discrete random variables.
For instance, if we shuffle a deck of 52 cards, and we remove one card, then there
are 52 possible outcomes, and they are each equally likely to occur. Each event
that contains just one outcome has probability 1/52. If the event contains 13 of
the outcomes (for instance, the 13 outcomes in which a heart is selected), then
the event has probability 13/52. More generally, if the event has j outcomes,
then the event has probability j/52. Recall that the general situation was
established in Corollary 2.10: if a sample space consists of finitely many
equally likely outcomes, then the probability of an event is the number of
outcomes in the event, divided by the total number of outcomes in the sample
space.
Example 22.1. Roll two dice. The sample space of all 36 equally likely out-
comes is given in Figure 22.1. Some students have difficulty distinguishing how
often each outcome happens. One remedy for this is to imagine that one of the
dice is painted red and one is painted green, or to think of one die as getting
rolled first and the other to get rolled second. The probability that the sum of
the dice is 8 or larger is 15/36 = 5/12.
      1   2   3   4   5   6
  1   2   3   4   5   6   7
  2   3   4   5   6   7   8
  3   4   5   6   7   8   9
  4   5   6   7   8   9  10
  5   6   7   8   9  10  11
  6   7   8   9  10  11  12

Figure 22.1: The sample space for two dice; the entry in each cell is the sum
of the two dice.
Example 22.2. What is the probability that the results on the two dice differ
by 2 or less, i.e., if the two results are X and Y , what is P (|X − Y | ≤ 2)?
      1   2   3   4   5   6
  1   0   1   2   3   4   5
  2   1   0   1   2   3   4
  3   2   1   0   1   2   3
  4   3   2   1   0   1   2
  5   4   3   2   1   0   1
  6   5   4   3   2   1   0

The table shows |X − Y| for each of the 36 equally likely outcomes. Counting
the entries that are 2 or less, we find 24 such outcomes, so
P(|X − Y| ≤ 2) = 24/36 = 2/3.
Example 22.3. Let X and Y denote the results on two dice. Define Z =
max(X, Y ). Find the mass of Z.
The value of Z is between 1 and 6. We can just add the number of possible
outcomes that correspond to each value of Z. For instance, Z = 5 if the outcome
is contained in the event
{(1, 5), (2, 5), (3, 5), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5)}.
Thus P (Z = 5) = 9/36. Figure 22.3 shows the values of Z for each outcome.
In general, the mass of Z is

  z       1     2     3     4     5     6
  pZ(z)  1/36  3/36  5/36  7/36  9/36  11/36
      1   2   3   4   5   6
  1   1   2   3   4   5   6
  2   2   2   3   4   5   6
  3   3   3   3   4   5   6
  4   4   4   4   4   5   6
  5   5   5   5   5   5   6
  6   6   6   6   6   6   6
Figure 22.3: The maximum of two dice. A loop is drawn around each set of
outcomes for which the maximum is the same.
One of the most helpful rules of thumb in counting problems is the
multiplication rule: when two or more things are happening simultaneously (or
in sequence), multiply the numbers of possibilities.
Example 22.4. If there are 4 colors of pants that you could pick from, and 3
colors of shirts, then there are (4)(3) = 12 ways that you can put on both your
pants and shirt.
More generally, if there are two things happening simultaneously, and the first
thing has n possibilities, and (for each of the first things) the second thing
always has m possibilities, then there are nm possibilities altogether.
Continuing to generalize, suppose now that we not only must put on our
pants and shirt, but also our shoes.
Example 22.6. If there are 4 pants, and for each such choice, 3 shirts, and for
each pair of pants/shirt, there are 5 styles of shoe, then there are (4)(3)(5) = 60
possible outfits altogether.
Example 22.8. A playlist has 10 rock songs, 3 blues songs, and 7 hip-hop
songs. The mp3 player is working in a "shuffle" mode in which each of the 20
songs is equally likely to appear each time, and none of the song choices affect
any of the other song choices. In particular, repetitions are certainly allowed.
What is the probability that the listener hears two consecutive hip-hop songs
followed by a rock song? Since the song choices are independent, the probability
is (7/20)(7/20)(10/20) = 49/800 = 0.06125.
Example 22.9. While building a loft in your dorm room, you find that there
are 8 holes remaining but only 6 screws available. The 8 holes are arranged in a
line, from top to bottom. You randomly pick 6 holes to fill, with all possibilities
equally likely. What is the probability that you do not fill the bottom hole?
There are C(8, 6) = 28 equally likely choices of which holes to fill, and
C(7, 6) = 7 of them avoid the bottom hole, so the probability is 7/28 = 1/4.
Example 22.10. Someone buys 18 bottles of soda: 6 bottles of each of 3 flavors.
He pulls the bottles out of the crate one at a time, in a random order. What is
the probability that the bottles come out grouped by flavor (all bottles of one
flavor first, then all of a second flavor, then all of the third)?

There are 18! equally likely ways that he can choose the flavors. To be
successful (i.e., to pull them out in the way that is described), there are 3! = 6
ways that the types can be ordered. There are 6! ways that the bottles of the
first type chosen can be arranged, and then there are 6! ways that the second
type chosen can be arranged, and then there are 6! ways that the third type
chosen can be arranged. So the desired probability is
3!·6!·6!·6!/18! = 1/2858856 = 3.4979 × 10^{-7}.
22.2 Order and Replacement in Sampling

There are four basic counting scenarios for r selections from n possibilities,
depending on whether order matters and whether the sampling is done with
replacement.

Order matters, with replacement: n^r ways.
Example: If I roll one red die, one green die, and one blue die, how many
possible unique results can I get? (r = 3 dice, n = 6 possibilities.)
General: you have an unlimited supply, and you care about "when" objects are
selected (and how many times).

Order matters, without replacement: n!/(n − r)! ways (permutation).
Example: If I have 10 students, how many unique ways can 4 different students
go to the board for 4 different homework problems? (r = 4 chosen, n = 10
possibilities.)
General: you care about "when," not just "if," something is selected. Nothing
can be selected more than once.

Order does not matter, with replacement: C(n + r − 1, r) ways.
Example: If I roll 3 identical dice one time each, how many possible unique
results can I get? (r = 3 dice, n = 6 possibilities.)
General: you have an unlimited supply, and you only care "if," not "when,"
something is selected.

Order does not matter, without replacement: C(n, r) = n!/(r!(n − r)!) ways
(combination).
Example: If I have 10 students, how many ways can 4 different students go to
the board for a problem? (r = 4 chosen, n = 10 possibilities.)
General: you only care "if," not "when," objects are selected. Nothing can be
chosen more than once.
The count n^r arises because we have r choices, each of which has n
possibilities, so there are (n)(n) · · · (n) = n^r ways. The permutation count
arises because there are n first choices, n − 1 second choices, . . . , and
n − r + 1 choices for the rth selection, so (n)(n − 1) · · · (n − r + 1) =
n!/(n − r)! ways altogether. For the combination count, the tally is the same,
but the r! different orderings of each selection are viewed as the same, so the
count is n!/((n − r)!r!). Finally, for sampling with replacement when order
does not matter, we choose r items from n types, so we can just put n − 1
dividing lines between r unlabeled balls. The number of balls before the first
line are of type 1; the number of balls between the first and second lines are
of type 2; etc.; and the number of balls after the (n − 1)st line are of type n.
Choosing which r of the n + r − 1 positions hold balls gives C(n + r − 1, r)
ways. It is helpful to try this on your own, with some small n's and r's.
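For readers who like to check such formulas numerically, here is a minimal
Python sketch (Python 3.8+ is assumed, for math.comb and math.perm) that
evaluates all four counts for the dice example (n = 6, r = 3) and the student
example (n = 10, r = 4):

    from math import comb, perm

    def counts(n, r):
        """The four sampling counts for r selections from n possibilities."""
        return {
            "order matters, with replacement": n ** r,          # n^r
            "order matters, without replacement": perm(n, r),   # n!/(n-r)!
            "order doesn't matter, with replacement": comb(n + r - 1, r),
            "order doesn't matter, without replacement": comb(n, r),
        }

    print(counts(6, 3))   # dice: 216, 120, 56, 20
    print(counts(10, 4))  # students: 10000, 5040, 715, 210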
Sampling with replacement, order matters: n^r
Example 22.11. If I roll one red die, one green die, and one blue die, how
many possible unique results can I get?
Each die has 6 possible values (n = 6), and the 3 dice have distinguishable
colors (r = 3), which means a “Red 1, Blue 2, Green 3” would be considered
a different result than “Red 3, Blue 2, Green 1,” for example. Also, dice use
sampling with replacement because each time you roll a die, you have the exact
same options with the same probabilities that you had the first time. The red
die result has no effect on the blue or green dice results. Therefore the number
of possible unique results I can get will be 6^3 = 216.
Example 22.12. If I draw 5 cards, one at a time, from a 52-card deck,
replacing each card (and reshuffling) before the next draw, and keeping track
of the order of selection, how many possible selections are there?

The 52 cards are reusable, so for example, I could choose {A♠, A♠, A♠, A♠,
2♥}, or {2♥, A♠, A♠, A♠, A♠}; the order distinguishes these two. Therefore
I have n = 52 for my options for each card, and r = 5 because I need to select
five cards. There are 52^5 = 380,204,032 possible selections of the cards, with
replacement, and keeping track of the order of selection.
Sampling with replacement, order does not matter: C(n + r − 1, r)
Example 22.13. If I roll 3 indistinguishable dice, each one time, how many
possible unique results can I get?
This time we still have 6 options on each die (n = 6), and we still have 3 dice
(r = 3), but since the dice are identical, a result of “1, 2, 3” would be the same
as "2, 3, 1" or "3, 1, 2" because we cannot tell the dice apart. The number of
unique results will be C(6 + 3 − 1, 3) = C(8, 3) = 56. Compare this answer to what we
found in the example with the red, blue, and green distinguishable dice (216
unique outcomes). Losing the distinction of the ordering causes there to be
fewer unique sets of values.
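These two counts can also be confirmed by brute force. The short sketch below
(again illustrative Python, not part of the text's presentation) enumerates
all ordered rolls of three dice, then collapses each roll to a sorted triple
to forget the ordering:

    from itertools import product

    ordered = list(product(range(1, 7), repeat=3))         # distinguishable dice
    unordered = {tuple(sorted(roll)) for roll in ordered}  # identical dice

    print(len(ordered))    # 216 = 6^3
    print(len(unordered))  # 56 = C(6 + 3 - 1, 3)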
Example 22.14. What if I draw the 5 cards with replacement, as in
Example 22.12, but without keeping track of the order of selection?

The 52 cards are reusable, so for example, I could choose {A♠, A♠, A♠,
A♠, 2♥}, or {2♥, A♠, A♠, A♠, A♠}, but these are considered as the same
possibility; the order does not distinguish these two. Therefore I have n = 52
for my options for each card, and r = 5 because I need to select five cards.
There are C(52 + 5 − 1, 5) = C(56, 5) = 3,819,816 possible selections of the
cards, with replacement, but without keeping track of the order of selection.
Sampling without replacement, order matters: n!/(n − r)!
Example 22.15. If I have 10 students, how many unique ways can 4 different
students go to the board to do 4 different homework problems?
There are n = 10 students in the room, and each student can only be selected
once (sampling without replacement). I need r = 4 different students to do
4 different homework problems, so the order in which the students are chosen
matters. The number of unique ways I can select these 4 students is
10!/(10 − 4)! = 10!/6! = (10)(9)(8)(7) = 5040.
Example 22.16. If I draw 5 cards from the deck without replacement, keeping
track of the order of selection, how many possible selections are there?

The 52 cards are not reusable, e.g., I could choose {A♠, 7♠, 10♠, 3♥, 7♥},
or {10♠, 3♥, A♠, 7♥, 7♠}; the order distinguishes these two. Therefore I have
n = 52 for my options for each card, and r = 5 because I need to select five
cards. There are 52!/(52 − 5)! = 52!/47! = (52)(51)(50)(49)(48) = 311,875,200
possible selections of the cards, without replacement, and keeping track of the
order of selection. This is a smaller number of possibilities, as compared to
the situation "with replacement."
Sampling without replacement, order does not matter: C(n, r)
Example 22.17. If I have 10 students, how many unique ways can 4 different
students go to the board to work on the same problem?
There are n = 10 students in the room, and each student can only be selected
once (sampling without replacement). I need r = 4 different students to do the
same problem, so the order that the students are chosen makes no difference. All
that matters is which 4 of the 10 students are selected. The number of unique
ways I could choose 4 of these 10 students is C(10, 4) = 10!/(4!(10 − 4)!) =
(10)(9)(8)(7)/4! = 210. This is closely related to the previous situation, in
which the order of the
students mattered. Grouping together the selections from before according to
the students, we see that 4! = 24 of the previous selections corresponds to just
1 selection here. So there is a factor of 4! fewer distinct possibilities when we
ignore the order in which the students are chosen.
22.3 Counting: Seating Arrangements

Example 22.19. If Alice and Alan (a couple) and Barbara and Bob (another
couple) and Christine and Charlie (another couple) sit in a row of chairs, what
is the probability that each of the 3 couples sit together?
There are 6! = 720 ways that the 6 people can be seated altogether. (This
is true in all of the questions with 6 people, so we won’t repeat this fact.)
Grouping the couples together again, there are 3! ways that the couples can
be seated (i.e., either A’s/B’s/C’s, or A’s/C’s/B’s, or B’s/A’s/C’s, etc... 3! ways
total). For each such way, there are 2 ways that the A’s can be arranged among
themselves, and 2 ways that the B’s can be arranged among themselves, and 2
ways that the C’s can be arranged among themselves. So the total probability
that the couples are seated together is 3!(2)(2)(2)/720 = 1/15.
Example 22.20. If n couples (2n people) sit in a row of chairs, what is the
probability that each of the n couples sits together?

There are (2n)! ways that the 2n people can be seated altogether. (This is
true in all of the questions with 2n people, so we won’t repeat this fact.)
Grouping the couples together again, there are n! ways that the couples can
be seated. For each such way, there are 2 ways within each couple that the man
and the woman can be arranged in their two reserved seats. So the total probability
that the couples are seated together is n!·2^n/(2n)!.
Example 22.21. If Alice, Barbara, and Christine (three women) and Alan,
Bob, and Charlie (three men) sit in a row of chairs, what is the probability that
the women all sit together (the men may or may not be in a group)?
There are 3! ways that the men can be arranged. The women, as a group, can
be collectively put into any of the 4 gaps between the men, including the spaces
on the left- or right-hand ends. Once the women are placed as a group, there
are 3! arrangements among just the women themselves. So the total probability
that the women sit together as a group is (3!)(4)(3!)/6! = 1/5.
Figure 22.4: Ways to arrange 3 men and 3 women, if the women should sit
together in a group.
Example 22.22. If n women and n men sit in a row of chairs, what is the
probability that the women all sit together?
There are n! ways to arrange the men. The women, as a group, can be
put into any of the n + 1 gaps between the men, including the left- or right-
hand ends. There are n! arrangements among the women themselves. So the
probability is n!(n + 1)n!/(2n)!. (Compare with Example 22.21, when n = 3.)
Example 22.23. If Alice, Barbara, and Christine (three women) and Alan,
Bob, and Charlie (three men) sit in a row of chairs, what is the probability that
the women all sit together and all the men sit together?
Either the women sit collectively on the right or on the left (i.e., 2 ways).
Once the women are placed as a group, there are 3! arrangements among just
the women themselves, and there are 3! arrangements among just the men
themselves. So the probability is (2)(3!)(3!)/6! = 1/10.
Example 22.24. If n women and n men sit in a row of chairs, what is the
probability that the women all sit together and all the men sit together?
Either the women sit collectively on the right or on the left (i.e., 2 ways).
Once the women are placed as a group, there are n! arrangements among just
the women themselves, and there are n! arrangements among just the men
themselves. So the probability is (2)(n!)(n!)/(2n)!.
Example 22.25. If Alice, Barbara, and Christine (three women) and Alan,
Bob, and Charlie (three men) sit in a row of chairs, what is the probability that
none of the women are adjacent and none of the men are adjacent?
Either the leftmost chair is for a woman or a man (i.e., 2 ways). After-
wards, the sexes of the people are determined. This leaves 3! arrangements
among the women, and 3! arrangements among the men. So the probability is
(2)(3!)(3!)/6! = 1/10.
Example 22.26. If n women and n men sit in a row of chairs, what is the
probability none of the women are adjacent and none of the men are adjacent?
Either the leftmost chair is for a woman or a man (i.e., 2 ways). Afterwards,
the sexes of the people are determined. Once they are determined, there are n!
arrangements among just the women themselves, and there are n! arrangements
among just the men themselves. So the probability is (2)(n!)(n!)/(2n)!.
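All of the seating probabilities in Examples 22.19 through 22.26 can be
verified for the 6-person case by enumerating all 6! = 720 seatings. Here is
one possible Python sketch; the labeling of people (men 0, 1, 2 and women
3, 4, 5, with couples (0,3), (1,4), (2,5)) is our own convention:

    from itertools import permutations
    from fractions import Fraction

    def adjacent(seat, a, b):
        return abs(seat.index(a) - seat.index(b)) == 1

    def together(seat, group):
        pos = sorted(seat.index(p) for p in group)
        return pos[-1] - pos[0] == len(group) - 1  # consecutive chairs

    seatings = list(permutations(range(6)))  # all 720 ways to fill 6 chairs
    total = len(seatings)

    couples = sum(all(adjacent(s, m, m + 3) for m in range(3)) for s in seatings)
    women = sum(together(s, (3, 4, 5)) for s in seatings)
    both = sum(together(s, (3, 4, 5)) and together(s, (0, 1, 2)) for s in seatings)
    alternate = sum(all((s[i] < 3) != (s[i + 1] < 3) for i in range(5))
                    for s in seatings)

    for count in (couples, women, both, alternate):
        print(Fraction(count, total))  # 1/15, 1/5, 1/10, 1/10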
Example 22.27. If n couples sit in a row of chairs, what is the expected number
of couples that sit together?
Method #1. Let X denote the number of couples that sit together. Let Xj
indicate whether the jth man sits next to his wife, so that Xj = 1 if the jth
man sits next to his wife, and Xj = 0 otherwise; then X = X1 + X2 + · · · + Xn.
We know E(Xj ) = P (Xj = 1), i.e., E(Xj ) is just equal to the probability that
the jth man sits next to his wife. This equals the probability that he sits on
either end of the row and his wife sits in the unique seat next to him, plus
the probability that he sits in the interior of the row and his wife sits in either
of the seats next to him. The probability that the man sits at either end of the
row is 2/(2n), and, given that he sits at either end of the row, the probability
that his wife sits next to him is 1/(2n − 1). On the other hand, the probability
that the man sits in the interior of the row is (2n − 2)/(2n), and, given that he
sits in the interior of the row, the probability that his wife sits next to him is
2/(2n − 1). So the probability that he sits next to his wife is
E(Xj) = (2/(2n))(1/(2n − 1)) + ((2n − 2)/(2n))(2/(2n − 1)),
which simplifies to
E(Xj) = (2 + 4n − 4)/((2n)(2n − 1)) = (4n − 2)/((2n)(2n − 1)) = 1/n.
So the expected number of couples that sit together is
E(X) = 1/n + 1/n + · · · + 1/n = n(1/n) = 1.
Method #2. Let X denote the number of couples that sit together. Let Xj
indicate whether the person in the jth chair has her/his partner to the right,
so that
Xj = 1 if the person in the jth chair has her/his partner to the right, and
Xj = 0 otherwise; then X = X1 + X2 + · · · + X_{2n−1}, since the person in the
last chair has nobody to the right.
We know E(Xj ) = P (Xj = 1), i.e., E(Xj ) is just equal to the probability that
the jth person has her or his partner to the right. No matter who is in the jth
chair, the probability that the person’s partner is found to the right is 1/(2n−1).
So
E(Xj ) = 1/(2n − 1).
Thus, the expected number of couples that sit together is
E(X) = 1/(2n − 1) + 1/(2n − 1) + · · · + 1/(2n − 1) = (2n − 1)/(2n − 1) = 1.
Example 22.28. If n couples sit in a circle of chairs (as opposed to a row, in the
previous example), what is the expected number of couples that sit together?
Method #1. Let X denote the number of couples that sit together. Let Xj
indicate whether the jth man sits next to his wife, so that Xj = 1 if the jth
man sits next to his wife, and Xj = 0 otherwise; then X = X1 + X2 + · · · + Xn.

We know E(Xj) = P(Xj = 1), i.e., E(Xj) is just equal to the probability that
the jth man sits next to his wife. Regardless of where the jth man sits, his wife
will sit next to him with probability 2/(2n − 1). So E(Xj) = 2/(2n − 1), and the
expected number of couples that sit together is E(X) = n · 2/(2n − 1) = 2n/(2n − 1).

Method #2. Let Xj indicate whether the person in the jth chair has her/his
partner to the right (in a circle, every chair has a chair to its right), so
that X = X1 + X2 + · · · + X_{2n}. We know E(Xj) = P(Xj = 1), i.e., E(Xj) is
just equal to the probability that the jth person has her or his partner to the
right. No matter who is in the jth chair, the probability that the person's
partner is found to the right is 1/(2n − 1). So E(Xj) = 1/(2n − 1).

So the expected number of couples that sit together is

E(X) = 1/(2n − 1) + 1/(2n − 1) + · · · + 1/(2n − 1) = 2n/(2n − 1).
Note that it is slightly more likely for a man to sit next to his wife when the
row of chairs wraps around into a circle. If the man and the woman were on
opposite ends of a row, they would not be next to each other, but if the row is
wrapped into a circle, then they would be next to each other.
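A quick simulation can corroborate both expected values. The sketch below is
illustrative Python (the trial count is arbitrary); it shuffles 2n people into
a row or a circle and counts adjacent couples:

    import random

    def couples_adjacent(row, circular):
        """Count couples in adjacent chairs; couple j is persons 2j and 2j+1."""
        m = len(row)
        pairs = [(i, (i + 1) % m) for i in range(m if circular else m - 1)]
        return sum(row[i] // 2 == row[j] // 2 for i, j in pairs)

    def estimate(n, circular, trials=100_000):
        people = list(range(2 * n))
        total = 0
        for _ in range(trials):
            random.shuffle(people)
            total += couples_adjacent(people, circular)
        return total / trials

    print(estimate(4, circular=False))  # near 1
    print(estimate(4, circular=True))   # near 2n/(2n-1) = 8/7 = 1.1429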
22.4 Exercises
22.4.1 Practice
Exercise 22.1. Raffle tickets. There are 30 raffle tickets in a bowl. Three
winning tickets will be selected. Each ticket can win at most one prize. How
many ways can the prizes be distributed if the following additional information
is known?
a. All 3 winners receive goldfish (the goldfish are indistinguishable).
b. The 1st winner receives a car, the 2nd a bicycle, and the 3rd a goldfish.
Exercise 22.2. Defective robots. Among 7 robots produced by a factory
one day, 3 are defective. If 3 robots are purchased by the local toy store, find
the probability that at least one will be nondefective.
Exercise 22.3. Rearrangements. Consider the ways that the letters in the
word “Mississippi” can be rearranged (the 4 i’s are indistinguishable, the 4 s’s
are indistinguishable, and the 2 p’s are indistinguishable).
a. What is the probability that the S’s are grouped together?
b. What is the probability that the S’s are grouped together and the P’s
are grouped together?
c. What is the probability S is the 1st letter and I is the 5th letter?
d. What is the probability S or P is in the 1st spot?
Exercise 22.4. Cloudy days. In a certain city, the weather is cloudy on a
given day with probability 0.55, and is sunny with probability 0.45. The weather
is measured on several consecutive days (which are deemed to be sufficiently
independent for this problem).
Exercise 22.5. Mochas. Suppose that typically 3/10 of the customers order
“grande” mocha (made with skim milk and extra whipped cream). Assume that
customers are independent.
a. What is the probability that exactly 2 of the next 5 customers will order
a “grande” mocha?
b. What is the probability that the 1st 3 customers do not order “grande”
mochas and the 4th and 5th customers do order “grande” mochas?
Exercise 22.6. Jelly beans. A student buys a bag of jelly beans at the store.
She eats most of them, but 20 remain at the end of the day. Exactly 13 of these
jelly beans are fruity, and the other 7 are root beer flavored. Her boyfriend grabs
5 of the 20 jelly beans.
a. What is the probability that exactly 3 of the 5 jelly beans that he grabs
are fruity?
b. What is the expected number of fruity jelly beans that he grabs?
Exercise 22.7. Stamps. A professor buys postage stamps and puts them in
an envelope. Exactly 30 of the stamps are for letters, and exactly 10 of the
stamps are for postcards. The professor’s wife grabs two stamps. All outcomes
are equally likely.
a. What is the probability that both stamps she selects are for letters?
b. What is the probability that both stamps she selects are for postcards?
c. What is the probability that she gets one of each type?
(Hint: These three answers should sum to 1 altogether.)
Exercise 22.8. Rolling dice. Roll five dice. What is the probability that all
five of the values that appear are distinct, i.e., there are no repetitions among
the five dice?
Exercise 22.10. Hiring. An employer has a very large pool of applicants for
8 jobs. The employer needs to report only the sex of the 8 people who are hired
(not the names or any other distinguishing features), to a gender-equity review
board. How many ways of hiring men and women for these positions are there
if:
Exercise 22.11. Socks. Running late for class, you grab 2 socks out of your
drawer without looking at what color they are. In the drawer you have one pair
each of black, red, green, brown, blue and white socks, but they are not folded
as pairs—it’s a big jumbled mess in your drawer! What is the probability that
you grab 2 socks of the same color?
Exercise 22.12. Gumballs. You fill a mini gumball machine with 60 gumballs:
red cherry, pink bubblegum, green lime, and orange orange, 15 of each. Over
time you eat all of the gumballs, two at a time. What is the probability of being
left with one pink gumball and one green gumball at the end?
Exercise 22.13. Rock block. On a certain radio station, 70% of the songs
are rock songs, and 30% of the songs are pop songs. The songs are selected
independently. Each “block” of songs (a “block” is a set of songs between com-
mercials) contains 10 songs. The DJ says that the next “block” of songs has at
least 8 rock songs. Given this information, what is the probability that all 10
songs in that “block” will be rock songs?
Exercise 22.16. Family photos. Your computer has 437 family photos. You
decide to take 30 random photos (without replacement) from the collection
(each is equally likely), to use as a slideshow on the TV when guests come over.
Of your family photos, 42 are from your favorite vacation. What is the expected
number of photos from your favorite vacation in the slideshow?
Exercise 22.17. Action figures. In a box there are 31 Power Ranger action
figures. Seven of them are red and the other 24 are blue. Timmy closes his eyes
and randomly picks 8 action figures out of the box. What is the probability 6
of them are red?
Exercise 22.18. Cereal. On the cereal shelf of the house you share with
several roommates, you have 4 types of cereal with chocolate (Reese’s Puffs,
Cocoa Puffs, Count Chocula, and Cookie Crisp) and 3 types of cereal without
chocolate (Corn Chex, Cheerios, and Captain Crunch). You haven’t had your
coffee yet, so you will blindly choose 3 boxes of cereal to mix together for
breakfast. What is the probability you will get at least one chocolate cereal
mixed in?
Exercise 22.20. Pizza parlor. You make pizzas for a local pizza parlor. On a
busy Friday night, you’ve had 12 orders in the last hour. The orders are placed
independently. Customers are known to prefer cheese pizza 37% of the time,
meaty pizza 47% of the time, and veggie pizza the other 16% of the time. What
is the probability that exactly 3 of the pizzas are veggie, 4 are cheese, and 5 are
meaty?
22.4.2 Extensions
Exercise 22.21. Daughters and mothers. If you randomly assign daughters
a, b, c, d, e to mothers A, B, C, D, E, and you let X be the number of daughters
who are correctly assigned to their mother, find the mass of X.
a. What is the probability Pierre’s entire outfit next Monday will be blue?
b. What is the probability Pierre will wear at least 1 entirely blue outfit
during his next 5-day work week?
c. What is the probability that Pierre will wear entirely blue outfits on
Monday and Friday while wearing outfits which are not entirely blue on Tuesday
through Thursday?
d. What is the probability Pierre will wear an entirely blue outfit on exactly
2 of the 5 days next week?
a. If you select a sandwich made with one of each: bagel, meat, cheese,
dressing, and veggie (and you don’t care about the order), how many different
combinations of sandwich do you really have to choose from?
b. Now assume you only eat whole wheat bagels. The same 20 toppings
listed above can be used. How many combinations do you have?
c. If you can choose 3 items to go on your bagel, how many different types of
sandwiches can you have if the different items are reusable (tomato and double
ham would be ok)?
d. If you can choose 3 items but they are not reusable, how many are
possible?
Exercise 22.31. Pizza toppings. A pizza place offers the choice of the fol-
lowing toppings: extra cheese, mushrooms, pepperoni, ham, sausage, onions,
and green peppers. Assume that each pizza must have at least 1 topping and
that order of toppings is irrelevant. Also assume that toppings cannot be reused
(double sausage is impossible).
a. If all the books have different titles, in how many distinct ways can he
arrange them?
b. Throughout parts b–e, assume that all the books from a particular topic
have the same title (for example, 3 indistinguishable copies of “Calculus” by
Carey). In how many distinct ways can he arrange his books?
c. If he groups the identical math books together, and he groups the identical
history books together, and he groups the identical chemistry books together,
in how many distinct ways can he arrange his books?
d. If he groups the identical history books together (but isn’t picky about
the other books), in how many distinct ways can he arrange his books?
e. Given that the identical history books are grouped together, what is the
probability that he has grouped each of the identical books (the situation in
part c)?
Exercise 22.33. Movie theater. Ten people are going to a private screening
of a movie in a small theater. There are two rows in the theater. The first row
has 4 seats, and the second row has 6 seats.
a. How many ways can you arrange the seating of the 10 people altogether?
b. How many seating arrangements are possible if 3 of the 10 people are in
a family, and that family wants to sit together?
Exercise 22.34. Beach books. You have a row of 20 books on your desk.
You’re about to go on vacation and want to grab some good reading material,
preferably a good novel or some other work of fiction. Of the 20 books, 12 are
fiction and 8 are nonfiction. You grab 5 books at random.
a. What is the expected number of fiction books you will have?
b. What is the probability that 2 or more of your 5 books are fiction?
Exercise 22.35. Chair circle. A group of men and women sit in a circle.
There are 20 chairs in the circle, and 10 pairs of married individuals. What is
the probability that a man will sit directly across from his wife if everyone sits
randomly?
Exercise 22.36. License plates. On a “Save The Wetlands” license plate,
there are always two letters (e.g., “SW”) followed by four digits, and thus 10,000
combinations of plates are available for each pair of letters. If the four digits
are selected randomly, and all 10,000 possibilities are equally likely, what is the
probability that the four digits are distinct and in ascending order?
Exercise 22.37. Socks. In my sock drawer there are 21 white socks, 8 black
socks, and 4 brown socks.
a. If I randomly pull out 6 socks to take with me on a trip, what is the
probability that I pull out one pair of each color?
b. What is the probability all the socks are the same color?
c. What is the probability that I pull out 2 socks of one color and 4 socks
of a second color?
Exercise 22.38. Coins. My friend Alejandro has 35 coins, of which 26 are
quarters and the other 9 are pennies. If he gives me 4 coins at random, what is
the probability that I will have enough money to buy my favorite 89 cent candy
bar? How much money do I expect to get from Alejandro?
Chapter 23

Two Case Studies in Counting

23.1 Poker Hands

In this case study we consider 5-card poker hands, dealt from a standard
52-card deck, so there are C(52, 5) = 2,598,960 equally likely hands. A straight
flush consists of five cards of the same suit whose values are in a row; there
are 10 possible sets of values (ace through 5, 2 through 6, . . . , 10 through ace).
Since there are 4 suits, there are (4)(10) = 40 types of possible straight flushes.
So the probability of getting a straight flush in one 5-card hand is 40/2598960.
23.1.4 Flush
A flush consists of five cards from the same suit. There are 4 possible suits;
for each suit, there are C(13, 5) = 1287 possible five-card hands from that
suit. So there are (4)(1287) = 5148 types of possible flushes. Usually the
straight flushes are removed from this classification (because they have a
classification all on their own); therefore, there are 5148 − 40 = 5108 types
of flushes that are not straight flushes. So the probability of getting a flush
(but not a straight flush) in one 5-card hand is 5108/2598960.
23.1.5 Straight
A straight consists of five cards that can be placed in order. There are 10
possible sets of values for the cards in the straight, as seen in the straight
flush example above; for each set of values, there are C(4, 1) = 4 ways to pick
a card from each of the five ranks, so there are (4)(4)(4)(4)(4) = 4^5 ways to
pick the cards altogether. So there are (10)(4^5) = 10,240 types of possible straights.
Usually the straight flushes are removed from this classification (because they
have a classification all on their own), which leaves 10,240 − 40 = 10,200 types
of straights that are not straight flushes. Thus, the probability of getting a
straight (but not a straight flush) in one 5-card hand is 10200/2598960.
For three of a kind, there are 13 ways to pick the rank of the three matched
cards, C(4, 3) = 4 ways to pick the three cards of that rank, C(12, 2) = 66
ways to pick the ranks of the two non-matched cards, and C(4, 1) = 4 ways to
pick the greater of the two non-matched cards, and C(4, 1) = 4 ways to pick the
lesser of the two non-matched cards. Combining these, we have
(13)(66)(4)(4)(4) = 54,912 types of possible three of a kinds. So the probability
of getting three of a kind in one 5-card hand is 54912/2598960.
For two pair, there are C(13, 2) = 78 ways to pick the ranks of the two pairs,
and 11 ways to pick the rank for the non-matched card. For each such choice,
there are C(4, 2) = 6 ways to pick the pair of cards in the greater-valued pair,
and C(4, 2) = 6 ways to pick the pair of cards in the lesser-valued pair, and
C(4, 1) = 4 ways to pick the non-matched card. Combining these, we have
(78)(11)(6)(6)(4) = 123,552 types of possible two pairs, so the probability of
two pair in one 5-card hand is 123552/2598960. For one pair, there are 13 ways
to pick the rank of the pair, C(4, 2) = 6 ways to pick the pair of cards in the
pair, C(12, 3) = 220 ways to pick the ranks of the three non-matched cards,
and C(4, 1) = 4 ways to pick each of the non-matched cards. This gives
(13)(6)(220)(4)(4)(4) = 1,098,240 types of possible one pairs, so the probability
of one pair in one 5-card hand is 1098240/2598960.
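The hand counts above can be double-checked with a few lines of Python
(math.comb is assumed available, i.e., Python 3.8+):

    from math import comb

    hands = comb(52, 5)  # 2,598,960 possible 5-card hands

    straight_flush = 4 * 10
    flush = 4 * comb(13, 5) - straight_flush                 # 5108
    straight = 10 * 4**5 - straight_flush                    # 10200
    three_of_a_kind = 13 * comb(4, 3) * comb(12, 2) * 4 * 4  # 54912
    two_pair = comb(13, 2) * comb(4, 2)**2 * 11 * 4          # 123552
    one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3          # 1098240

    for count in (straight_flush, flush, straight,
                  three_of_a_kind, two_pair, one_pair):
        print(count, count / hands)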
23.2 Yahtzee
In Yahtzee, five dice (each is a standard die, with 6 sides) are rolled, so that
there are 6^5 = 7776 equally likely outcomes. It is usually helpful to think of the
five dice separately, to keep the outcomes clear in one’s mind. For instance, one
might think of five differently colored dice, or five dice that are rolled by five
numbered players 1, 2, 3, 4, 5, etc. Thus, the scenario is a bit different than in
the poker hands case study above.
In Yahtzee, we are allowed to roll the dice up to three times, but
for simplicity here, we will only roll each die one time.
First, fix a desired value (e.g., "3"). The number of dice X that show that
specific value is Binomial with parameters n = 5 and p = 1/6, so

P(X = j) = C(5, j)(1/6)^j(5/6)^{5−j} = 5! · 5^{5−j} / (j!(5 − j)! · 6^5).
For another method of computing this probability, note that all 6^5 outcomes
are equally likely. In order to have exactly j dice show a certain value (for
instance, the value "3"), there are C(5, j) ways to pick which dice will show
that value. The 5 − j dice that will not equal the desired value can each appear
in 5 ways (e.g., the other 5 − j dice can each be anything except "3"). This
means that there are C(5, j) · 5^{5−j} possible outcomes in which exactly j
dice show the desired value, which again gives the probability
C(5, j) · 5^{5−j}/6^5.
For another method of computing this probability, note that all 6^5 outcomes
are equally likely. There are six ways to choose the value for the three of a kind,
e.g., "1." To get exactly three values equal to this specific value, there are
C(5, 3) = 10 ways to pick the dice that must show this value; there are (5)(4)
ways that the other two dice can show two other values that are distinct.
Combining these, we see that there are (6)C(5, 3)(5)(4) possible outcomes in
which there are exactly 3 occurrences of one of the values, without the other
pair matching. Therefore, the desired probability is

(6)C(5, 3)(5)(4)/6^5 = 1200/7776 = 25/162 = 0.1543.
For another method of computing this probability, note that all 6^5 outcomes
are equally likely. There are six ways to fix a desired value. In order to
have exactly four dice equal to this specific value, there are C(5, 4) = 5 ways
to pick which 4 of the 5 dice show this value; the remaining 5 − 4 = 1 die can
show any of the other 5 values. So there are (6)C(5, 4)(5) possible outcomes
in which there is exactly a four of a kind, and the desired probability is

(6)C(5, 4)(5)/6^5 = 150/7776 = 25/1296 = 0.01929.
For a full house (three dice showing one value and the other two dice showing
a second value), there are 6 ways to choose the value for the three of a kind,
5 ways to choose the value for the pair, and C(5, 3) = 10 ways to pick which
dice show the three of a kind. So the desired probability is

(6)(5)C(5, 3)/6^5 = 300/7776 = 25/648 = 0.03858.
For a small straight (four of the five values in a row, but not five in a row),
there are three types: "1 through 4," "2 through 5," and "3 through 6."

For the type "1 through 4," the fifth die cannot be a 5, or then we have a
large straight. We either have 1–4, along with a 6; or we have 1–4, with one
value duplicated. There are 5! ways to have 1–4 along with a 6. To have 1–4
with one value duplicated, there are 4 ways to choose which value is duplicated,
then C(5, 2) = 10 ways to choose which dice get that repeated value, and then
there are 3! ways to choose how the other 3 values are arranged. So there are

5! + (4)C(5, 2)(3!) = 360 ways.
For the type "2 through 5," the other die cannot be 1 or 6, or then we have
a large straight. We have 2–5, with one value duplicated. Thus, there are 4
ways to choose which value is duplicated, and then there are C(5, 2) = 10 ways to
choose which dice get that repeated value, and then there are 3! ways to choose
how the other 3 values are arranged. So, altogether, there are

(4)C(5, 2)(3!) = 240 ways.
For the type "3 through 6," the other die cannot be a 2, or then we have a
large straight. We either have 3–6, along with 1; or we have 3–6, with one value
duplicated. There are 5! ways to have 3–6 along with 1. To have 3–6, with one
value duplicated, there are 4 ways to choose which value is duplicated, and then
there are C(5, 2) = 10 ways to choose which dice get that repeated value, and then
there are 3! ways to choose how the other 3 values are arranged. This yields

5! + (4)C(5, 2)(3!) = 360 ways.
Altogether, there are 360 + 240 + 360 = 960 ways that the results can be
arranged on the dice. So the desired probability is

960/6^5 = 960/7776 = 10/81 = 0.1235.
For a large straight (the five values 1 through 5, or 2 through 6), there are
2 possible sets of values and 5! ways to arrange each set on the five dice, so
the desired probability is

(2)(5!)/6^5 = 240/7776 = 5/162 = 0.03086.
23.2.7 Yahtzee
To obtain a “yahtzee,” which is just five of a kind, there must be exactly 5
occurrences of some value.
Fix a desired value (e.g., "3") that a player wants to get for the Yahtzee.
The number of dice X that show that specific value (in our case, "3") is
Binomial with parameters n = 5 and p = 1/6. So

P(X = 5) = C(5, 5)(1/6)^5(5/6)^{5−5} = 1/6^5 = 1/7776 = 0.0001286.
There are 6 possible values for the yahtzee, and these are all disjoint. So we
conclude that the total desired probability is
(6)(1/7776) = 1/1296 = 0.0007716.
For another method of computing this probability, note that all 6^5 outcomes
are equally likely. There are six ways to choose the desired value for the
yahtzee. Once the desired value of the yahtzee is chosen, then all five dice
must equal that value. So the desired probability is

6/6^5 = 1/1296 = 0.0007716.
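Because there are only 6^5 = 7776 outcomes, every probability in this case
study can be verified by exhaustive enumeration; here is one possible Python
sketch:

    from itertools import product
    from collections import Counter

    rolls = list(product(range(1, 7), repeat=5))  # all 6^5 = 7776 outcomes
    total = len(rolls)

    def pattern(roll):
        """Sorted multiplicities of the values, e.g., (3, 1, 1)."""
        return tuple(sorted(Counter(roll).values(), reverse=True))

    patterns = Counter(pattern(r) for r in rolls)
    print(patterns[(3, 1, 1)] / total)  # exactly three of a kind: 1200/7776
    print(patterns[(4, 1)] / total)     # exactly four of a kind: 150/7776
    print(patterns[(3, 2)] / total)     # full house: 300/7776
    print(patterns[(5,)] / total)       # yahtzee: 6/7776

    large = ({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6})
    runs = ({1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6})
    print(sum(set(r) in large for r in rolls) / total)  # large straight: 240/7776
    print(sum(any(run <= set(r) for run in runs) and set(r) not in large
              for r in rolls) / total)                  # small straight: 960/7776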
Part V

Continuous Random Variables
You have learned how to calculate probabilities for discrete random variables.
Earlier in the text we discussed the differences between discrete and continuous
random variables. In the chapters that follow, you will learn how to calculate
probabilities for continuous random variables. Afterwards, we will apply these
rules and formulas in the next part of the book with five different types of named
continuous random variables.
By the end of this part of the book, you should be able to:
5. Calculate the mean, variance, standard deviation, and median of the con-
tinuous random variable.
Math skills you will need: integrals of one and two variables (in particular,
integration by parts), derivatives, ex , and ln x.
Additional resources: Computer programs (such as Excel, Maple, Mathe-
matica, Matlab, Minitab, R, SPSS, etc.) and calculators may be used to assist
in the calculations.
Chapter 24

Continuous Random Variables and PDFs
The number of passengers on a randomly chosen bus is an integer, but the speed
of the bus is a real-valued number. The height of a tree is real-valued, but the
number of leaves is discrete.
24.1 Introduction
In Chapter 7, you already learned the difference between a discrete random
variable and a continuous random variable. Here’s a review:
Discrete means that you could list the specific possible outcomes that
the variable can take (there are either a finite number of them, or a countably
infinite number). An example is X ∈ {0, 1, 2, 3, 4, 5, . . .}, for the number of
4-leaf clovers you will find in a 1-acre pasture this afternoon.
Continuous means that the variable takes on a range of values. You could
usually state the beginning and end points, but you would have infinitely many
possibilities of answers within that range. An example is 62.8 ≤ X ≤ 67.0, for
how many ounces of soda will actually be in the next 64-ounce bottle you open.
We give a table with an overview of the differences between discrete and
continuous random variables. As you can see, many of the formulae are very
similar in structure. A key difference is that for discrete random variables you
use summations, and for continuous random variables you use integrals.
Discrete versus Continuous

Probability function. Discrete: mass (probability mass function; PMF), pX(x).
Continuous: density (probability density function; PDF), fX(x).

Bounds. Discrete: 0 ≤ pX(x) ≤ 1. Continuous: 0 ≤ fX(x) (not necessarily ≤ 1).

Total probability. Discrete: Σ_x pX(x) = 1. Continuous: ∫_{−∞}^{∞} fX(x) dx = 1.

Probability of an interval. Discrete: P(0 ≤ X ≤ 2) = P(X = 0) + P(X = 1) +
P(X = 2) if X is integer valued. Continuous: P(0 ≤ X ≤ 2) = ∫_0^2 fX(x) dx.

Endpoints. Discrete: P(X ≤ 3) ≠ P(X < 3) when P(X = 3) ≠ 0. Continuous:
P(X ≤ 3) = P(X < 3), since P(X = 3) = 0 always.

Cumulative distribution function (CDF), FX(x). Discrete: FX(a) = P(X ≤ a) =
Σ_{x ≤ a} P(X = x); the graph of the CDF is a step function with jumps of the
same size as the mass, rising from 0 to 1. Continuous: FX(a) = P(X ≤ a) =
∫_{−∞}^a fX(x) dx; the graph of the CDF is nonnegative and continuous, rising
from 0 to 1.

Examples. Discrete (counting): defects, hits, die values, coin heads/tails,
people, card arrangements, trials until success, etc. Continuous: lifetimes,
waiting times, height, weight, length, proportions, areas, volumes, physical
quantities, etc.

Named distributions. Discrete: Bernoulli, Binomial, Geometric, Negative
Binomial, Poisson, Hypergeometric, Discrete Uniform. Continuous: Continuous
Uniform, Exponential, Gamma, Beta, Normal.

Expected value. Discrete: E(X) = Σ_x x pX(x); E(g(X)) = Σ_x g(x) pX(x);
E(X^2) = Σ_x x^2 pX(x). Continuous: E(X) = ∫_{−∞}^{∞} x fX(x) dx;
E(g(X)) = ∫_{−∞}^{∞} g(x) fX(x) dx; E(X^2) = ∫_{−∞}^{∞} x^2 fX(x) dx.

Variance. Both: Var(X) = E(X^2) − (E(X))^2.

Standard deviation. Both: σX = √Var(X).
• Both are building blocks for probabilities, the CDF, the expected value,
and the variance.
• The sum of the mass over all possible x values is 1. The integral of the
density over all possible x values is 1.
• Maximum values: the mass pX(x) can never exceed 1, but the density fX(x)
can exceed 1.
• At a single point: P(X = x) = pX(x) for a discrete random variable, but
P(X = x) = 0 for a continuous random variable.
The density and CDF are related just like derivatives and integrals are related.

Remark 24.5. Densities and CDFs
The CDF is the integral of the density, i.e.,

FX(a) = ∫_{−∞}^a fX(x) dx,

and, wherever the derivative exists, the density is the derivative of the CDF,
i.e., fX(x) = (d/dx)FX(x).
24.2 Examples

For a first example, let X have density fX(x) = (1/26)(4x + 1) for 2 ≤ x ≤ 4,
and fX(x) = 0 otherwise. Find P(1 ≤ X ≤ 3).

Since the density is 0 for x < 2, we have

P(1 ≤ X ≤ 3) = P(2 ≤ X ≤ 3)
= ∫_2^3 (1/26)(4x + 1) dx
= (1/26)(2x^2 + x) |_{x=2}^{3}
= (1/26)(((2)(3^2) + 3) − ((2)(2^2) + 2))
= 11/26
= 0.4231
(The density fX(x) and the CDF FX(x) are graphed side by side for this example.)

Now we find the CDF, FX(x) = P(X ≤ x). We have FX(x) = P(X ≤ x) = 0 for x < 2,
and FX(x) = P(X ≤ x) = 1 for 4 < x.
The most interesting values of the CDF happen for 2 ≤ x ≤ 4. We integrate the
density from −∞ to a, where a is in the interval [2, 4]; since the density is
0 for x < 2, it suffices to integrate over the interval [2, a]:

P(X ≤ a) = ∫_2^a (1/26)(4x + 1) dx
= (1/26)(2x^2 + x) |_{x=2}^{a}
= (1/26)((2a^2 + a) − ((2)(2^2) + 2))
= (1/26)(2a^2 + a − 10)
Thus the CDF of X is:

FX(x) = 0 if x < 2,
FX(x) = (1/26)(2x^2 + x − 10) if 2 ≤ x ≤ 4,
FX(x) = 1 if 4 < x.
Notice that the CDF for a continuous random variable—just like the CDF for a
discrete random variable—is a nondecreasing function that is between 0 and 1.
On the other hand, the CDF for a discrete random variable has a jump at each
point where the random variable takes its mass, but the CDF for a continuous
random variable is continuous, with a limit of 0 as x → −∞ and a limit of 1 as
x → ∞.
Figure 24.2: Left: The density fX(x) = (1/4)e^{−x/4}. Right: The CDF
FX(x) = 1 − e^{−x/4}, of a random variable with nonnegative density for x > 0.
Example 24.9. Sometimes a density can have more than two pieces. As an
example:
fX(x) = 3/4 if 0 ≤ x ≤ 1,
fX(x) = 1/4 if 3 ≤ x ≤ 4,
fX(x) = 0 otherwise.
The density is shown on the left of Figure 24.3. Find the CDF.
We integrate to find the CDF in five disjoint regions:
For a < 0, we have FX(a) = ∫_{−∞}^a 0 dx = 0.

For 0 ≤ a ≤ 1, we have FX(a) = ∫_{−∞}^0 0 dx + ∫_0^a 3/4 dx = (3/4)a.

For 1 ≤ a ≤ 3, we have FX(a) = ∫_{−∞}^0 0 dx + ∫_0^1 3/4 dx + ∫_1^a 0 dx = 3/4.

For 3 ≤ a ≤ 4, we have FX(a) = ∫_{−∞}^0 0 dx + ∫_0^1 3/4 dx + ∫_1^3 0 dx +
∫_3^a 1/4 dx = 3/4 + (1/4)(a − 3).

For 4 ≤ a, we have FX(a) = ∫_{−∞}^0 0 dx + ∫_0^1 3/4 dx + ∫_1^3 0 dx +
∫_3^4 1/4 dx + ∫_4^a 0 dx = 3/4 + 1/4 = 1.
Thus, the CDF of X is

FX(x) = 0 if x < 0,
FX(x) = (3/4)x if 0 ≤ x ≤ 1,
FX(x) = 3/4 if 1 < x < 3,
FX(x) = 3/4 + (1/4)(x − 3) if 3 ≤ x ≤ 4,
FX(x) = 1 if 4 < x.
The CDF is shown on the right of Figure 24.3. By the way, the median can be
read directly from the figure for this CDF: looking at the region where
0 ≤ x ≤ 1 and FX(x) = (3/4)x, we can solve FX(x) = (3/4)x = 1/2 to get
x = 2/3, so the median is 2/3.
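If a computer algebra system is available (sympy is assumed in this sketch),
the total mass and the claimed median of this piecewise density can be
verified symbolically:

    import sympy as sp

    x = sp.symbols('x', real=True)
    f = sp.Piecewise((sp.Rational(3, 4), (x >= 0) & (x <= 1)),
                     (sp.Rational(1, 4), (x >= 3) & (x <= 4)),
                     (0, True))

    print(sp.integrate(f, (x, -sp.oo, sp.oo)))              # 1: a valid density
    print(sp.integrate(f, (x, -sp.oo, sp.Rational(2, 3))))  # 1/2: median is 2/3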
Figure 24.3: Left: The density fX (x). Right: The CDF FX (x), of the
random variable described in Example 24.9, in which the density is nonnegative
on two disjoint regions.
Example 24.10. Check to make sure that the following function is a valid
density:
fX(x) = (3/4)(x)(2 − x) for 0 ≤ x ≤ 2,
and fX (x) = 0 otherwise. See Figure 24.4. Also, find P (X ≤ 1).
The only time when fX (x) is not equal to zero is for 0 < x < 2. In
this region, x > 0 and 2 − x > 0 so fX (x) > 0 too. Thus, fX (x) is always
nonnegative. Since X is always between 0 and 2, we calculate
∫_{−∞}^{∞} fX(x) dx = ∫_0^2 (3/4)(x)(2 − x) dx = 1,

so fX(x) is indeed a valid density.
Now we find the probability that X ≤ 1. Since fX (x) is 0 when −∞ < x < 0,
we compute
P(X ≤ 1) = ∫_{−∞}^1 fX(x) dx = ∫_0^1 (3/4)(x)(2 − x) dx = 1/2.
Initially we might believe that the probability is 1/2 since we included half
of the interval from 0 to 2 in the previous computation, but the situation is not
always so simple. For instance, the following example does not have a symmetric
density:
fX(x) = (3/8)(x)(2 − x)(3 − x) for 0 ≤ x ≤ 2,
and fX (x) = 0 otherwise. Find P (X ≤ 1). See Figure 24.5.
We compute

P(X ≤ 1) = ∫_{−∞}^1 fX(x) dx = ∫_0^1 (3/8)(x)(2 − x)(3 − x) dx = 19/32.
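Both of these computations are easy to confirm with symbolic integration; a
short sketch (sympy assumed):

    import sympy as sp

    x = sp.symbols('x')
    f = sp.Rational(3, 4) * x * (2 - x)            # symmetric density on [0, 2]
    g = sp.Rational(3, 8) * x * (2 - x) * (3 - x)  # asymmetric density on [0, 2]

    print(sp.integrate(f, (x, 0, 2)), sp.integrate(f, (x, 0, 1)))  # 1 and 1/2
    print(sp.integrate(g, (x, 0, 2)), sp.integrate(g, (x, 0, 1)))  # 1 and 19/32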
Example 24.12. One of the most frequently used types of continuous random
variables has a density that is constant on an interval, and zero otherwise.
For instance, consider the density
fX (x) = 1/6 for 4 < x < 10,
and fX (x) = 0 otherwise.
Integrating, ∫_4^{10} 1/6 dx = (1/6)(10 − 4) = 1, so this function really is a
density. In fact, for any constants a and b inside the interval, i.e., with
4 ≤ a ≤ b ≤ 10, we have

P(a ≤ X ≤ b) = ∫_a^b 1/6 dx = (b − a)/6.
For example,

P(5 ≤ X ≤ 8) = ∫_5^8 1/6 dx = (8 − 5)/6 = 1/2.

Such nice facts work because, when we integrate a constant value over an inter-
val, the result is just the constant times the length of the interval. For instance,
when we integrate 1/6 on the interval [5, 8], the result is just 1/6 times the
length of the interval [5, 8] (i.e., 3). So the result is (1/6)(3) = 1/2. This is an
example of the Uniform density, to be considered more in Chapter 31.
Example 24.13. Another frequently used density decays exponentially; consider,
for instance, fX(x) = 3e^{−3x} for x > 0, and fX(x) = 0 otherwise.
24.3 Exercises
24.3.1 Practice
Exercise 24.1. Identify from the information below whether X is a discrete or
continuous random variable.
a. Let X be the height of a randomly selected 8-year-old child.
b. Let X be a random variable that takes values on the nonnegative integers,
i.e., X ∈ {0, 1, 2, 3, . . .}.
c. Let X be a random variable that takes values on all nonnegative real
numbers, i.e., X ≥ 0.
d. Let X be the number of 8-year-old children who will attend the 1:30
showing of the movie “Shrek 3.”
Exercise 24.3. If you know P (X > 2) = 0.3, fill in the chart for the other
values you know. Write “???” if there is not enough information to figure out a
value.
X is discrete X is continuous
P (X ≥ 2)
P (X < 2)
P (X ≤ 2)
P (X = 2)
Exercise 24.4. If you know P (X ≤ 5) = 0.9, fill in the chart for the other
values you know. Write “???” if there is not enough information to figure out a
value.
X is discrete X is continuous
P (X < 5)
P (X > 5)
P (X ≥ 5)
P (X = 5)
Exercise 24.5. Let X have density fX(x) = kx^2(1 − x)^2 for 0 ≤ x ≤ 1, and
fX(x) = 0 otherwise, where k is a constant.
e. Find P (5 ≤ X ≤ 6.5).
Exercise 24.8. Let X be the waiting time (in minutes) until a student’s friend
arrives. Suppose that X has density
fX(x) = (1/3)e^{−x/3}, for 0 < x,
and fX (x) = 0 otherwise.
a. Find P (3 ≤ X ≤ 6).
c. Find P (X ≥ 24).
d. Find P (X ≤ −3).
e. Find P (3 ≤ X ≤ 12).
Exercise 24.9. Consider the function from Example 24.12, i.e., fX (x) = 1/6
for 4 < x < 10, and fX (x) = 0 otherwise.
Exercise 24.10. Consider the function from Example 24.13, i.e., fX (x) =
3e−3x for x > 0, and fX (x) = 0 otherwise.
FX(x) = 0 if x < 3,
FX(x) = (1/171)(x^3 − 6x − 9) if 3 ≤ x ≤ 6,
FX(x) = 1 if 6 < x.
24.3.2 Extensions
Exercise 24.15. Suppose that X has density
fX(x) = (1/2) sin x, for 0 ≤ x ≤ π,
and fX (x) = 0 otherwise.
a. Find P (X ≤ π/6).
and fX (x) = 0 otherwise. Find a general expression for the CDF FX (x), for
any x > 0.
Exercise 24.18. Let FY(y) = √y for 0 ≤ y ≤ 1; and FY(y) = 0 for y < 0, and
FY(y) = 1 for y > 1. Find the density of Y.
Exercise 24.20. What is the constant k that makes the following function a
valid density? fX(x) = kx^9(1 − x)^2 if 0 ≤ x ≤ 1, and fX(x) = 0 otherwise.
Exercise 24.21. What is the constant k that makes the following function a
valid density? fX(x) = k(x − 3x^2) if 1 ≤ x ≤ 20, and fX(x) = 0 otherwise.
Exercise 24.23. Again consider the function from Example 24.13, i.e., let
fX(x) = 3e^{−3x} for x > 0, and fX(x) = 0 otherwise. Evaluate
∫_{−∞}^{∞} x fX(x) dx.
Exercise 24.24. Let X be the lifetime (in years) of a carbon-14 atom before
it decays. Then X has density
fX(x) = (ln 2 / 5730) exp(−(ln 2 / 5730) x), for 0 < x,

and fX(x) = 0 otherwise. Find the length of time a (in years) so that
P(X ≤ a) = 1/2.
24.3.3 Advanced
Exercise 24.25. Let X have density
fX(x) = (1/2) x^2 e^{−x}, for x > 0,
and fX (x) = 0 otherwise. Find P (X < 2).
and fX (x) = 0 otherwise. Find the value of k so that fX (x) is a valid density
function.
Chapter 25

Joint Densities
You use a social networking application that sends out notifications for personal
messages and notifications for your friends posting new pictures. You know the
distribution for wait time for a personal message and the distribution of wait
time for a new-picture notification. How long do you expect to have to wait for
either a personal message or a picture notification?
25.1 Introduction
Just as with discrete random variables, we can handle more than one random
variable at a time. We can have a joint probability density function, also called
a joint density, associated with two continuous random variables:
Definition 25.1. Joint probability density function
The joint probability density function—also called a joint density—of a pair of
continuous random variables X and Y is fX,Y (x, y), and it has the following
properties:
The observation in the box above can also be viewed as follows: Since
FX,Y (a, b) = P (X ≤ a and Y ≤ b),
then to get −∞ < x ≤ a and −∞ < y ≤ b, we can use A = (−∞, a] and
B = (−∞, b], and we can integrate the joint density in each of the variables:
FX,Y(a, b) = ∫_{−∞}^a ∫_{−∞}^b fX,Y(x, y) dy dx.
We can create pairs of random variables by using joint densities that satisfy
properties like those in the previous chapter. Any nonnegative function of x
and y, whose integral over all x and all y is 1, will describe a pair of continuous
random variables.
Remark 25.4. Creating a Pair of Continuous Random Variables
If f(x, y) is any nonnegative function, and ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dy dx = 1,
then f(x, y) is the joint density for a pair of continuous random variables. We can
write f(x, y) = fX,Y(x, y), and then the associated random variables X and Y
have the property that

P(a ≤ X ≤ b and c ≤ Y ≤ d) = ∫_a^b ∫_c^d fX,Y(x, y) dy dx.
Note: The reason f (x, y) has to be a nonnegative function in the box above is
that densities are always nonnegative. This doesn’t mean that the associated
random variables have to be nonnegative. For instance, if f (x, y) = 1/4 for
−1 ≤ x ≤ 1 and −1 ≤ y ≤ 1, and f (x, y) = 0 otherwise, then f (x, y) is
nonnegative, but the random variables X and Y associated with this density
could be positive or negative, i.e., −1 ≤ X ≤ 1 and −1 ≤ Y ≤ 1.
25.2 Examples
First we present a series of four examples that all focus on the same pair of
random variables. In Example 25.5, we verify that fX,Y (x, y) is a joint density.
In Example 25.6, we consider the joint CDF and we calculate a probability. In
Example 25.7, we study another random variable, Z, that is defined to be the
minimum of X and Y . Finally, in Example 25.8, we compute the probability
that Y is bigger than X, and we do this in two different ways.
Example 25.5. Let X be the time (in minutes) that Maxine waits for a traffic
light to turn green, and let Y be the time (in minutes, at a different intersection)
that Daniella waits for a traffic light to turn green. Suppose that X and Y have
joint density fX,Y(x, y) = 15e^{−3x−5y} for x > 0 and y > 0, and
fX,Y(x, y) = 0 otherwise. See Figure 25.1.
Figure 25.1: The joint density fX,Y (x, y) = 15e−3x−5y , for x > 0 and y > 0.
We first check that fX,Y (x, y) is a joint density. We see that fX,Y (x, y) is
nonnegative. Next, we check that the integral over all x’s and y’s is 1:
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dy dx = ∫_0^∞ ∫_0^∞ 15e^{−3x−5y} dy dx.

We can factor the 15 into 3 times 5, anticipating the division by 3 and 5 that
the integration produces; the integral factors as
(∫_0^∞ 3e^{−3x} dx)(∫_0^∞ 5e^{−5y} dy) = (1)(1) = 1, so fX,Y(x, y) is indeed
a joint density.
Example 25.6. Now we compute FX,Y (1/2, 1/4) = P (X ≤ 1/2 and Y ≤ 1/4),
i.e., the probability that Maxine waits 1/2 of a minute or less, and Daniella
waits 1/4 of a minute or less. In the computation, we take advantage of the
fact that the integration in x does not affect the integration in y, so we split
the integral into two parts.
FX,Y(1/2, 1/4) = ∫_{−∞}^{1/2} ∫_{−∞}^{1/4} fX,Y(x, y) dy dx
= ∫_0^{1/2} ∫_0^{1/4} 15e^{−3x−5y} dy dx
= (∫_0^{1/2} 3e^{−3x} dx)(∫_0^{1/4} 5e^{−5y} dy)
= (−e^{−3x} |_{x=0}^{1/2})(−e^{−5y} |_{y=0}^{1/4})
= (1 − e^{−3/2})(1 − e^{−5/4})
= 0.5543
We can compute much more than just joint CDFs using joint densities. For in-
stance, we could compute P (X > 1/10 and Y > 2/7), i.e., the probability that
Maxine waits more than 1/10 of a minute and Daniella waits more than 2/7 of
a minute, as follows:
P(X > 1/10 and Y > 2/7) = ∫_{1/10}^∞ ∫_{2/7}^∞ fX,Y(x, y) dy dx
= ∫_{1/10}^∞ ∫_{2/7}^∞ 15e^{−3x−5y} dy dx
= (∫_{1/10}^∞ 3e^{−3x} dx)(∫_{2/7}^∞ 5e^{−5y} dy)
= (e^{−3/10})(e^{−10/7})
= e^{−121/70}
= 0.1775
Example 25.7. Using the same X and Y as above—the waiting times for
Maxine and Daniella at their respective traffic lights—we let Z = min(X, Y ).
Notice that Z is a waiting time too, because Z is the waiting time until the first
one of the pair (Maxine or Daniella) has her own light turn green. So Z is the
waiting time until Maxine or Daniella gets to depart. We find the cumulative
distribution function of Z.
First of all, since X > 0 always and Y > 0 always, then Z > 0 always too.
So, for a ≤ 0, we have FZ (a) = P (Z ≤ a) = 0.
Now we turn our attention to a > 0. Since Z = min(X, Y ), then FZ (a) is
equal to P (min(X, Y ) ≤ a). This is a difficult probability to calculate, because
we could either have X ≤ a or Y ≤ a or both. It is much easier to calculate the
complementary probability,
FZ(a) = P(Z ≤ a)
= 1 − P(Z > a)
= 1 − P(min(X, Y) > a)
= 1 − P(X > a and Y > a)
= 1 − ∫_a^∞ ∫_a^∞ 15e^{−3x−5y} dy dx
= 1 − ∫_a^∞ (−3e^{−3x−5y} |_{y=a}^{∞}) dx
= 1 − ∫_a^∞ 3e^{−3x−5a} dx
= 1 + e^{−3x−5a} |_{x=a}^{∞}
= 1 − e^{−8a}
Example 25.8. With joint densities, we can calculate probabilities about all
kinds of relationships between X and Y . For instance, we could calculate the
probability that Daniella waits longer at her light than Maxine, e.g., P (Y > X).
We have two ways to set up the integral:
Figure 25.2: Left: Setting up the integral for P (Y > X), with y as the outer
integral and x as the inner integral. The y is fixed (e.g., y = 3.2), and x ranges
from 0 to y. Right: Setting up the integral for P (Y > X), with x as the outer
integral and y as the inner integral. The x is fixed (e.g., x = 2.6), and y ranges
from x to ∞.
Method #1: We can integrate for the outer integral over all y’s; for the inner
integral, y is fixed, and we integrate over all of the x’s that are smaller than y,
i.e., 0 ≤ x ≤ y, as shown on the left of Figure 25.2. We get
P(Y > X) = ∫_0^∞ ∫_0^y 15e^{-3x-5y} dx dy = ∫_0^∞ [-5e^{-3x-5y}]_{x=0}^{y} dy,
which yields
P(Y > X) = ∫_0^∞ (5e^{-5y} - 5e^{-8y}) dy = [-e^{-5y} + (5/8)e^{-8y}]_{y=0}^∞ = 3/8.
Method #2: We can integrate for the outer integral over all x’s; for the inner
integral, x is fixed, and we integrate over all of the y’s that are larger than x,
i.e., x ≤ y, as shown on the right of Figure 25.2. We get
P(Y > X) = ∫_0^∞ ∫_x^∞ 15e^{-3x-5y} dy dx = ∫_0^∞ [-3e^{-3x-5y}]_{y=x}^∞ dx,
which yields
P(Y > X) = ∫_0^∞ 3e^{-3x-5x} dx = ∫_0^∞ 3e^{-8x} dx = [-(3/8)e^{-8x}]_{x=0}^∞ = 3/8.
Example 25.9. Suppose that a person throws a dart at a square dart board.
Let X and Y denote, respectively, the x- and y-coordinates (in feet) of the
location where the dart lands; the middle of the dart board is at (0, 0). Suppose
that the dart board is two feet wide and two feet tall, so that the dart only
lands on the dart board if −1 ≤ X ≤ 1 and −1 ≤ Y ≤ 1. Also suppose that the
person always hits the dart board, and moreover, X and Y have joint density:
fX,Y(x, y) = (9/16)(1 - x^2)(1 - y^2) for -1 ≤ x ≤ 1, -1 ≤ y ≤ 1,
and fX,Y (x, y) = 0 otherwise. See Figure 25.3.
[Figure 25.3: The joint density fX,Y(x, y) = (9/16)(1 - x^2)(1 - y^2), plotted over -1 ≤ x ≤ 1, -1 ≤ y ≤ 1.]
To check that fX,Y (x, y) is a joint density, we see that fX,Y (x, y) ≥ 0 always;
the function is 0 except possibly when −1 ≤ x ≤ 1, −1 ≤ y ≤ 1, in which case
1 − x2 and 1 − y 2 are both nonnegative. Also
∫_{-∞}^{∞} ∫_{-∞}^{∞} fX,Y(x, y) dy dx = ∫_{-1}^{1} ∫_{-1}^{1} (9/16)(1 - x^2)(1 - y^2) dy dx
                                       = (9/16)(∫_{-1}^{1} (1 - x^2) dx)(∫_{-1}^{1} (1 - y^2) dy)
                                       = (9/16) [x - x^3/3]_{x=-1}^{1} [y - y^3/3]_{y=-1}^{1}
                                       = (9/16)(4/3)(4/3)
                                       = 1
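A computer algebra system can confirm this integral in one line. Here is a minimal sketch using SymPy (our own choice of tool, not one the text assumes):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.Rational(9, 16) * (1 - x**2) * (1 - y**2)
    # Integrate over the square [-1, 1] x [-1, 1]; the result should be 1.
    print(sp.integrate(f, (y, -1, 1), (x, -1, 1)))  # prints 1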
So fX,Y(x, y) is a joint density. Now we find the probability that the x- and
y-coordinates of the location where the dart lands are each within a 1/2 foot
of the center of the dart board. By the same factoring as above, this probability is
(9/16)(∫_{-1/2}^{1/2} (1 - x^2) dx)(∫_{-1/2}^{1/2} (1 - y^2) dy) = (9/16)(11/12)(11/12) = 121/256 = 0.4727.
[Figure 25.4: The square 0 ≤ x ≤ 3, 0 ≤ y ≤ 3 on which fX,Y(x, y) = 1/9.]
To see that fX,Y(x, y) is a joint density, we note that fX,Y(x, y) ≥ 0 for all
x, y, and we could compute that
∫_0^3 ∫_0^3 1/9 dy dx = 1.
The region where min(X, Y) ≤ 1 has area 5, as seen in Figure 25.5. So the
desired probability is just (1/9)(5) = 5/9. There
is no need to perform the integration using complicated methods. Anytime that
the integrand is constant, we just multiply the integrand (in this case, 1/9) by
the area of the integration region (in this case, 5).
If a reader insists on performing the integration using the tools learned in
calculus, we caution that the integral will need to be broken into at least two
parts. See Figure 25.5. We could integrate over the region in the middle of
Figure 25.5, and then integrate over the region on the right side of Figure 25.5,
and add the results together, since these two regions are disjoint, and their
union is the entire region over which we need to integrate:
P(min(X, Y) ≤ 1) = ∫_0^1 ∫_0^3 1/9 dy dx + ∫_1^3 ∫_0^1 1/9 dy dx
                 = (1)(3)(1/9) + (2)(1)(1/9)
                 = 5/9
Figure 25.5: Left: The region where min(X, Y ) ≤ 1. The region can be
split into two disjoint parts. Middle: The region where min(X, Y ) ≤ 1 and
0 ≤ X ≤ 1. Right: The region where min(X, Y ) ≤ 1 and 1 ≤ X ≤ 3.
or equivalently,
fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy.
Similarly, the density of Y is found by integrating the joint density over all
x's:
fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx.
Returning to Example 25.11, we see that the density of X is, for x > 0,
fX(x) = ∫_{-∞}^{∞} fX,Y(x, y) dy = ∫_0^x 9e^{-3x} dy = 9xe^{-3x}.
25.3 Exercises
25.3.1 Practice
Exercise 25.1. Consider a pair of random variables X, Y with constant joint
density on the triangle with vertices at (0, 0), (3, 0), and (0, 3). Find P (X +Y >
2).
Find P (X > Y ).
Exercise 25.4. Suppose X, Y have joint density
fX,Y(x, y) = 1/16 if -2 ≤ x ≤ 2 and -2 ≤ y ≤ 2,
and fX,Y(x, y) = 0 otherwise.
25.3.2 Extensions
Exercise 25.17. Consider the random variables X and Y defined in Exam-
ple 25.5, i.e., X is Maxine’s waiting time, and Y is Daniella’s waiting time.
Let W = max(X, Y ), i.e., W is either Maxine’s or Daniella’s waiting time,
whichever is larger! Find FW (w) = P (W ≤ w), the cumulative distribution
function of W . This is equal to P (max(X, Y ) ≤ w), i.e., the probability that
Maxine waits less than w minutes and Daniella waits less than w minutes.
Exercise 25.18. Freddy and Jane have entered a game in which they each win
between 0 and 2 dollars. If X is the amount Freddy wins, and Y is the amount
that Jane wins, they believe that the joint density of their winnings will be
fX,Y(x, y) = (1/4)xy for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2,
and fX,Y (x, y) = 0 otherwise. Find the probability that their combined win-
nings exceed 2, i.e., find P (X + Y > 2).
and fX,Y (x, y) = 0 otherwise, then find the probability that X and Y are both
smaller than π/6.
Exercise 25.21. Suppose that X, Y are jointly distributed with fX,Y (x, y) =
1/10, for (x, y) in the triangle with vertices at the origin, (2, 0), and (0, 10), and
fX,Y (x, y) = 0 otherwise. Find the probability that Y ≤ 2.
25.3.3 Advanced
Exercise 25.22. Consider X, Y with joint density
fX,Y(x, y) = sech^2(x) / (y + 1)^2, for x ≥ 0 and y ≥ 0,
Exercise 25.24. Suppose that the joint PDF of X and Y is f (x, y) = 2e−x−y
for 0 < y < x < ∞, and f (x, y) = 0 otherwise.
Chapter 26
Independent Continuous Random Variables
26.1 Introduction
Two continuous random variables are independent if they satisfy the continuous
versions of Definition 9.13, using densities instead of masses.
Definition 26.1. Independent continuous random variables: Several
equivalent formulations
1. Joint density can be factored into a function of x times a function of y.
These functions of x and y can also be normalized so that they are the densities
of X and Y , respectively: fX,Y (x, y) = fX (x)fY (y) for all x and y.
2. Joint CDF can be factored into a function of x times a function of y. Again,
we can normalize these, so that they are the CDFs of X and Y , respectively:
FX,Y (x, y) = FX (x)FY (y) for all x and y.
3. We will define conditional densities in the next chapter, but we will go
ahead and state the way they will be used for independence: (3a) Density
of X equals conditional density fX|Y (x | y) of X given Y = y, or (3b) Density
of Y equals conditional density fY |X (y | x) of Y given X = x.
26.2 Examples
One of the trickiest things about the above statement of independence is re-
membering to check that the factoring works for all x’s and y’s. For instance,
we briefly reconsider Exercise 25.6:
The region where the density is nonzero is a square, and thus the factorization
of fX,Y (x, y) into the “x stuff” and the “y stuff” works for all x’s and y’s. So,
in this case, X and Y are independent.
Since the X and Y in this example are independent, then we can find the
density of X by itself; all we need to do is determine the constant at the front
of the density of X. We compute
∫_{-1}^{1} (1 - x^2) dx = [x - x^3/3]_{x=-1}^{1} = 4/3.
So we need a constant of 3/4 at the front of the density of X, so that the integral
will turn out to be 1. Thus, the density of X is
fX(x) = (3/4)(1 - x^2) for -1 ≤ x ≤ 1,
and fX(x) = 0 otherwise.
So, as before, we need a constant at the front of the density of Y; in this case,
we need a 1/6 in the density of Y, so that the integral will turn out to be 1.
Thus, the density of Y is
fY(y) = (1/6)(3 - y) for -1 ≤ y ≤ 1,
and fY(y) = 0 otherwise.
So we need a correction factor of 1/2 at the front to make this into a density.
So the density of X must be
Example 26.4. Just because the joint density appears to factor, we must use
caution. Consider, for instance, the joint density
fX,Y(x, y) = (3/2)xy for 0 ≤ x and 0 ≤ y and x + y ≤ 2,
and fX,Y (x, y) = 0 otherwise.
Important: We might initially be inclined to say that fX,Y (x, y) can be factored
in a way that works for all x and y, but this is not the case. For instance, if
x = 1/2, then y can be between 0 and 3/2; but, if x = 1, then y can be between
0 and 1. So Y is not independent of X in this example. Nonetheless, we can
still use the technique of finding the density of one random variable, using the
joint density (given in the box at the end of the previous chapter), i.e.,
Z ∞
fX (x) = fX,Y (x, y) dy.
−∞
[Figure 26.1: Left: the region on which X and Y are defined. Right: the region T, made of four rectangles not arranged in a grid, on which U and V are defined.]
and fX,Y (x, y) = 0 otherwise. Then X and Y are independent. On the other
hand, consider another pair of random variables, say U and V , which are defined
on a region T with four rectangles, as on the right side of Figure 26.1. Let
fU,V (u, v) be constant on the region T , so
and fU,V (u, v) = 0 otherwise. Then U and V are dependent. For instance, if
v = 0.5 then either u is in the range [0, 1] or [3, 4] or [6, 7], but on the other
hand, if v = 3.5, then u can be anywhere in the range [0, 7]. So U and V are
dependent. So, the random variables must be defined in rectangles, and if more
than one rectangle is present, the rectangles must be in a grid shape.
One more note: We stated earlier that, “For two continuous random variables
to be independent, the domains where the random variables are defined must
be rectangles (this is a necessary—but not sufficient—condition).” We are just
emphasizing that having a rectangle-shaped domain of definition is not enough
for independence either. The density must (of course) also factor, for the random
variables to be independent. Consider, for instance, Exercise 25.11, in which X
and Y are defined in the unit square 0 ≤ x, y ≤ 1, with density fX,Y (x, y) =
x + y. Despite the fact that the domain of definition is a rectangle (moreover,
a square), the joint density does not factor, so X and Y are dependent.
Remark 26.5. How to check for independence, using a joint density:
1. The joint density fX,Y (x, y) must be defined on a rectangle, or on several
rectangles arranged in a grid shape.
2. The joint density fX,Y (x, y) must factor into a product, with all of the x’s
in one part and all of the y’s in the other part. Properly normalized, the x
part is the density fX (x) of X in such a case, and the y part is the density
fY (y) of Y .
Example 26.6. Each time a pitcher delivers a fastball, the speed is distributed
between 90 and 100 miles per hour, with density 1/10. Assume that the speeds
of pitches are independent.
Consider two fastballs thrown by the pitcher. Find the probability that at
least one of them is 93 miles per hour or faster.
Let X and Y denote the speeds of the two pitches. Then
fX (x) = 1/10 for 90 ≤ x ≤ 100,
and fX (x) = 0 otherwise. Similarly,
fY (y) = 1/10 for 90 ≤ y ≤ 100,
and fY (y) = 0 otherwise. Since X and Y are independent, it follows that the
joint density of X and Y is
fX,Y (x, y) = 1/100 for 90 ≤ x ≤ 100, 90 ≤ y ≤ 100,
and fX,Y (x, y) = 0 otherwise.
So the desired probability is
P(X ≥ 93 or Y ≥ 93) = 1 - P(X ≤ 93 and Y ≤ 93)
                    = 1 - ∫_{90}^{93} ∫_{90}^{93} 1/100 dy dx
                    = 1 - 9/100
                    = 91/100
Another method for computing the desired probability is the following: Let Z
denote the number of the two fastballs that exceed 93 miles per hour. Then Z is
either 0 or 1 or 2, and moreover, Z is Binomial with n = 2 and with probability
of success p = ∫_{93}^{100} 1/10 dx = 7/10 on each throw. So the probability of at
least one fastball over 93 miles per hour is
P(Z ≥ 1) = 1 - P(Z = 0)
         = 1 - C(2, 0) p^0 (1 - p)^2
         = 1 - (1 - 7/10)^2
         = 1 - (3/10)^2
         = 1 - 9/100
         = 91/100.
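Both methods are easy to reproduce with software. The following minimal sketch uses scipy.stats (our own choice of tool, not one the text assumes):

    from scipy.stats import binom, uniform

    speed = uniform(loc=90, scale=10)   # pitch speed, Uniform on [90, 100]
    p = 1 - speed.cdf(93)               # P(one pitch >= 93) = 7/10
    print(1 - speed.cdf(93)**2)         # complement method: 0.91
    print(1 - binom.pmf(0, 2, p))       # Binomial method:   0.91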
c. Now suppose that each of the 23 students in a class has a music player.
What is the probability that all 23 of the music players permanently fail within
the first ten years? The probability is
(∫_0^{10} (1/3)e^{-x/3} dx)^{23} = ([-e^{-x/3}]_{x=0}^{10})^{23} = (1 - e^{-10/3})^{23} = 0.4336599.
Another way to see this is the following: The number of music players that
permanently fail within the first ten years is a Binomial random variable with
n = 23 and p = 1 - e^{-10/3}, so the probability that all 23 fail is p^{23} = 0.4336599.
d. Now suppose that we check music players until we find one that dies
within only one year! The number of music players that we need to check is
Geometric with probability of success ∫_0^1 (1/3)e^{-x/3} dx = [-e^{-x/3}]_{x=0}^{1} = 1 - e^{-1/3} =
0.2834687. So the expected number of music players that we would need to
check, until we find one that dies within only one year, is
1/(1 - e^{-1/3}) = 3.527726.
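These music-player numbers can be reproduced in a few lines. The sketch below uses NumPy (our own choice of tool, not one the text assumes):

    import numpy as np

    p10 = 1 - np.exp(-10/3)   # P(one player fails within ten years)
    print(p10**23)            # all 23 fail: about 0.4336599

    p1 = 1 - np.exp(-1/3)     # P(one player fails within one year)
    print(1 / p1)             # expected Geometric count: about 3.527726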
26.3 Exercises
26.3.1 Practice
Exercise 26.1. Suppose X, Y have joint density
fX,Y(x, y) = (1/9)(3 - x)(2 - y) if 0 ≤ x ≤ 3 and 0 ≤ y ≤ 2,
and fX,Y(x, y) = 0 otherwise.
26.3.2 Extensions
Exercise 26.6. When an emergency occurs, the response time (in hours) of
the first police car is a random variable X with density
and fX (x) = 0 otherwise. The response time (in hours) of the first fire engine
is a random variable Y with density
and fY(y) = 0 otherwise. The distance a police car must travel east or west is
a random variable Y with density
(Hint: First find the joint density fX,Y (x, y) of X and Y , and draw a picture
for where the joint density is defined. If you write an integral for the probability,
the integrand is constant. So you can compute the desired area in your picture,
divided by the total area.)
and fX,Y (x, y) = 0 otherwise. Find P (X < 2, Y < 2) in this case. Are X and
Y independent in this new case? Why or why not? Can you see whether X and
Y are independent by only looking at the joint density of X and Y ?
Exercise 26.10. Suppose that X, Y have constant joint density on the triangle
with corners at (4, 0), (0, 4), and the origin.
fX,Y(x, y) = (8 - 2x^2) / (3x^2 y^3), for 1 ≤ x ≤ 2, 1/2 ≤ y ≤ 1,
and fX,Y (x, y) = 0 otherwise.
a. Are X and Y independent?
b. Find P (X < 3/2, Y < 3/4).
Exercise 26.15. Let X, Y, Z have joint density
fX,Y(x, y) = (9/32)√(xy), for 0 ≤ x ≤ 1, 0 ≤ y ≤ 4,
and fX,Y(x, y) = 0 otherwise.
26.3.3 Advanced
Exercise 26.23. Let
fX,Y(x, y) = (1/8) sin(x) sec^2(y/4) for 0 ≤ x ≤ π, 0 ≤ y ≤ π,
and fX,Y(x, y) = 0 otherwise. Find:
a. The density fX (x) of X.
b. The density fY (y) of Y .
c. P (X < π/2, Y < π/2).
Exercise 26.24. Suppose X and Y have joint density
fX,Y(x, y) = 2 sech^2(1/x) / y^3, for 0 < x < y,
and fX,Y (x, y) = 0 otherwise.
a. Show that fX,Y (x, y) is a joint density, i.e., the integral over all x’s and
y’s is 1.
b. Are X and Y independent?
Exercise 26.25. When X and Y have joint density
fX,Y(x, y) = 2x^2 y^2 e^{-x-2y}, for 0 < x, 0 < y,
and fX,Y (x, y) = 0 otherwise, find P (X < Y ).
Chapter 27
Conditional Distributions
The probability of meeting someone you know increases when you are with
someone you don’t want to be seen with.
—Anonymous
27.1 Introduction
fX|Y(x | y) = fX,Y(x, y) / fY(y).
For the conditional density of X given Y to make sense, we must use Y values
such that fY (y) > 0.
Equivalently, when fY (y) > 0, then fX|Y (x | y) is the unique function that
gives
fX,Y (x, y) = fY (y)fX|Y (x | y).
To explain how the conditional density can actually be used to obtain con-
ditional probabilities, consider the following example:
The probability
P(X ∈ A | Y = 3) = ∫_A fX|Y(x | 3) dx
only depends on A. For instance, if A is the interval from 0 to 10, we have
P(0 ≤ X ≤ 10 | Y = 3) = ∫_0^{10} fX|Y(x | 3) dx.
More generally,
P(X ∈ A | Y = y) = ∫_A fX|Y(x | y) dx.
The value “y” should be thought of as “fixed,” and then we are interested in
whether X is in some interval A. For instance, if we think of “y” as fixed, we
might ask whether X is found in the interval from 0 to 10:
P(0 ≤ X ≤ 10 | Y = y) = ∫_0^{10} fX|Y(x | y) dx.
27.2 Examples
[Figure: the triangular region 0 ≤ x, 0 ≤ y, x + y ≤ 10 in which the bird lands.]
If we know that the bird lands somewhere on the line where Y = 2, we can find
the conditional density of X given that Y = 2. First we need the density of
Y . For 0 ≤ y ≤ 10, we integrate over all relevant x’s—here, over x from 0 to
10 - y. After the integral with respect to x is performed, and we substitute in
10 - y and 0 for x, there will be no x's remaining. In other words,
fY(y) = ∫_{-∞}^{∞} fX,Y(x, y) dx
      = ∫_0^{10-y} 1/50 dx
      = (10 - y)/50.
(We can think of this as integrating x out of the picture.)
Given that the bird’s landing y-coordinate is 2, what is the probability that the
x-coordinate is between 0 and 5?
The probability is
P(0 ≤ X ≤ 5 | Y = 2) = ∫_0^5 fX|Y(x | 2) dx = ∫_0^5 1/8 dx = 5/8.
This makes sense, because in this problem, all of the integrands are constant. We
are asking where the x-coordinate will be, and we know that the x-coordinate
is somewhere from 0 to 8, and all of the parts of the line are treated equally, in
a sense.
Given that the dancer’s y-coordinate is exactly 2, what is the density of the
x-coordinate of her location?
We first compute that, for 0 ≤ y ≤ 3, we have -√(9 - y^2) ≤ x ≤ √(9 - y^2).
Given that the dancer's y-coordinate is 2, her x-coordinate must be between
-√(9 - 2^2) = -√5 and √(9 - 2^2) = √5. So, for -√5 ≤ x ≤ √5, we have
[Figure: the semicircle of radius 3 centered at the origin, in which the dancer is located.]
We give more examples in Chapter 31 about constant joint densities and condi-
tional distributions (constant densities correspond to Continuous Uniform ran-
dom variables).
fX,Y(x, y) = (3/2)xy for 0 ≤ x and 0 ≤ y and x + y ≤ 2,
fX|Y(x | y) = ((3/2)xy) / ((3/4)y(2 - y)^2) = 2x / (2 - y)^2 for 0 ≤ x ≤ 2 - y,
[Figure: the triangle 0 ≤ x, 0 ≤ y, x + y ≤ 2.]
∫_{-∞}^{∞} fX|Y(x | y) dx = ∫_0^{2-y} 2x / (2 - y)^2 dx
                          = [x^2 / (2 - y)^2]_{x=0}^{2-y}
                          = (2 - y)^2 / (2 - y)^2
                          = 1.
So fX|Y (x | y) is a density.
and in this case, the equation for the conditional density simplifies to
fX|Y(x | y) = fX(x).
27.3 Exercises
27.3.1 Practice
For the joint densities in Exercises 27.1 to 27.10, find (a) the density
fY (y) of Y , and (b) the conditional density fX|Y (x | y) of X given Y .
Exercise 27.2. Let fX,Y (x, y) be constant on the region shown in Figure 27.4.
27.3.2 Extensions
Exercise 27.11. Consider a pair of random variables X, Y with constant joint
density on the triangle with vertices at (0, 0), (3, 0), and (0, 3).
a. For 0 ≤ y ≤ 3, find the conditional density fX|Y (x | y) of X, given Y = y.
b. Find the conditional probability that X ≤ 1, given Y = 1.
c. Find the conditional probability that X ≤ 1, given Y ≤ 1.
Exercise 27.12. Consider a pair of random variables X, Y with constant joint
density on the quadrilateral with vertices (0, 0), (2, 0), (2, 6), (0, 12).
a. For 0 ≤ y ≤ 6, find the conditional density fX|Y (x | y) of X, given Y = y.
b. For 6 ≤ y ≤ 12, find the conditional density fX|Y (x | y) of X, given
Y = y.
c. Find the conditional probability that X ≤ 1, given 3 ≤ Y ≤ 9.
Exercise 27.13. Let X, Y have joint density fX,Y (x, y) = 14e−2x−7y for x > 0
and y > 0; and fX,Y (x, y) = 0 otherwise.
a. For y > 0, find the conditional density fX|Y (x | y) of X, given Y = y.
b. Find the conditional probability that X ≥ 1, given Y = 3.
c. Find the conditional probability that Y ≤ 1/5, given X = 2.7.
Exercise 27.14. Let X, Y have joint density fX,Y (x, y) = 18e−2x−7y for 0 <
y < x; and fX,Y (x, y) = 0 otherwise.
a. For y > 0, find the conditional density fX|Y (x | y) of X, given Y = y.
b. For x > 0, find the conditional density fY |X (y | x) of Y , given X = x.
Exercise 27.15. Suppose X, Y have joint density
fX,Y(x, y) = (1/9)(3 - x)(2 - y) if 0 ≤ x ≤ 3 and 0 ≤ y ≤ 2,
and fX,Y(x, y) = 0 otherwise.
Exercise 27.17. Assume that Wyoming is shaped exactly like a rectangle, and
that a person’s location in Wyoming is 0 ≤ X ≤ 350 and 0 ≤ Y ≤ 276. Assume
that the joint density of a person’s location is constant on this region.
Assume that I-80 runs perfectly straight along the east-to-west line Y =
30, as in Figure 27.5. Given that the person is located on I-80, what is the
conditional probability that he is within 10 miles of a State border? (He could
be either west or east; please take both into account.)
[Figure 27.5: Wyoming as a 350-by-276 rectangle, with I-80 along the line y = 30.]
Exercise 27.18. Consider the semicircle with radius 3 seen in Figure 27.6. A
dancer is randomly located in the semicircle, with the joint density of their
location given by
fX,Y(x, y) = 2/(9π) if (x, y) is in the semicircle,
and fX,Y(x, y) = 0 otherwise. Find the probability that the person is in the
region where -1 ≤ X ≤ 1 and 0 ≤ Y ≤ 1.
[Figure 27.6: the semicircle of radius 3 centered at the origin.]
Chapter 28
Expected Values of Continuous Random Variables
A thing long expected takes the form of the unexpected when at last it comes.
—Mark Twain
How long should we expect to have to wait at a traffic light? What is the
expected length of a randomly chosen song on our playlist? What is the expected
distance from the center, when a dart is thrown at a dartboard?
28.1 Introduction
To a great extent, the way that expected values are calculated for continuous
random variables should not be surprising: the method is analogous to how
expected values are calculated for discrete random variables. For continuous
random variables, we cannot use the probabilities of values of random variables
as a method of scaling, because the probability of any particular value is 0,
i.e., P (X = x) = 0 for each x. Instead, we use the density fX (x) for scaling.
In other words, we integrate x over all possible x’s (just as we summed over
all possible x’s with discrete random variables), and we give a weight of fX (x)
to each x (just as we gave a weight of pX (x) to each x with discrete random
variables).
If the density of X is 0 outside an interval [a, b], then a ≤ E(X) ≤ b. To see
this:
E(X) = ∫_a^b x fX(x) dx
     ≤ ∫_a^b b fX(x) dx    since x ≤ b on the interval [a, b]
     = b ∫_a^b fX(x) dx
     = (b)(1)              since the density must integrate to 1
     = b,
and similarly
E(X) = ∫_a^b x fX(x) dx ≥ ∫_a^b a fX(x) dx = a ∫_a^b fX(x) dx = (a)(1) = a.
fX(x) = (3/8)(x)(2 - x)(3 - x) for 0 ≤ x ≤ 2,
and fX(x) = 0 otherwise.
E(X) = 1/3.
fX(x) = 1/(b - a) for a ≤ x ≤ b,
and fX(x) = 0 otherwise.
E(X) = (a + b)/2.
E(X) = 1/λ.
To derive this expected value, we follow the earlier example, in which λ was
equal to 3. The same argument works here. The expected value of X is
E(X) = ∫_{-∞}^{∞} x fX(x) dx = ∫_0^∞ x λe^{-λx} dx.
Integration by parts gives [-xe^{-λx}]_{x=0}^∞ + ∫_0^∞ e^{-λx} dx; the first term
is 0, and the antiderivative of e^{-λx} is e^{-λx}/(-λ), where
e^{-λx}/(-λ) → 0 as x → ∞ and e^{-λx}/(-λ) → -1/λ as x → 0. Thus
E(X) = 1/λ.
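A computer algebra system will happily carry out this integration by parts. Here is a minimal sketch using SymPy (our own choice of tool, not one the text assumes):

    import sympy as sp

    x = sp.symbols('x', positive=True)
    lam = sp.symbols('lambda', positive=True)
    EX = sp.integrate(x * lam * sp.exp(-lam*x), (x, 0, sp.oo))
    print(sp.simplify(EX))  # 1/lambda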
Example 28.8. Let X be the time (in minutes) that Maxine waits for a traffic
light to turn green, and let Y be the time (in minutes, at a different intersection)
that Daniella waits for a traffic light to turn green. Suppose that X and Y have
joint density
fX,Y (x, y) = 15e−3x−5y , for x > 0 and y > 0,
and fX,Y (x, y) = 0 otherwise. Find the expected time that each of them wait
at their lights.
Thus
fX(x) = 3e^{-3x} for x > 0,
and fX(x) = 0 otherwise. Using the boxed result in the previous section,
E(X) = 1/3.
Similarly,
fY(y) = 5e^{-5y} for y > 0,
and fY(y) = 0 otherwise. Again using the boxed result in the previous section,
it follows that
E(Y) = 1/5.
Example 28.9. Suppose that a person throws a dart at a square dart board.
Let X and Y denote, respectively, the x- and y-coordinates (in feet) of the
location where the dart lands; the middle of the dart board is at (0, 0). Suppose
that the dart board is two feet wide and two feet tall, so that the dart only
lands on the dart board if −1 ≤ X ≤ 1, −1 ≤ Y ≤ 1. Also suppose that the
person always hits the dart board, and moreover, X and Y have joint density:
fX,Y(x, y) = (9/16)(1 - x^2)(1 - y^2) for -1 ≤ x ≤ 1, -1 ≤ y ≤ 1,
and fX,Y(x, y) = 0 otherwise.
Since the entire problem is symmetric in X and Y , it follows also that E(Y ) = 0.
28.4 Exercises
28.4.1 Practice
For the densities in Exercises 28.1 to 28.8, find the expected value of
the random variable.
Exercise 28.2. The time (in minutes) it takes until Julie’s boyfriend calls is a
random variable X with density
fX(x) = (1/10)e^{-x/10}, for 0 < x,
and fX (x) = 0 otherwise.
Exercise 28.11. Let X, Y have joint density fX,Y (x, y) = 14e−2x−7y for x > 0
and y > 0; and fX,Y (x, y) = 0 otherwise.
Exercise 28.12. Let X, Y have joint density fX,Y (x, y) = 18e−2x−7y for 0 <
y < x; and fX,Y (x, y) = 0 otherwise.
fX(x) = 3/4 for 0 ≤ x ≤ 1,
fX(x) = 1/4 for 3 ≤ x ≤ 4,
and fX(x) = 0 otherwise.
Find E(X).
fX(x) = 7/8 for 0 ≤ x ≤ 1,
fX(x) = 1/8 for 7 ≤ x ≤ 8,
and fX(x) = 0 otherwise.
Find E(X).
Exercise 28.17. Consider the joint density of X and Y from Example 26.2:
fX,Y(x, y) = (1/8)(1 - x^2)(3 - y) for -1 ≤ x ≤ 1 and -1 ≤ y ≤ 1,
and fX,Y (x, y) = 0 otherwise.
a. Find E(X).
b. Find E(Y ).
Exercise 28.18. Consider the joint density of X and Y from Example 26.4:
fX,Y(x, y) = (3/2)xy for 0 ≤ x and 0 ≤ y and x + y ≤ 2,
and fX,Y (x, y) = 0 otherwise. Find E(Y ).
Exercise 28.19. As in Example 27.2, a bird lands in a grassy region described
as follows: 0 ≤ x, and 0 ≤ y, and x + y ≤ 10. Let X and Y be the coordinates
of the bird’s landing. Assume that X and Y have the joint density
fX,Y (x, y) = 1/50 for 0 ≤ x and 0 ≤ y and x + y ≤ 10,
and fX,Y (x, y) = 0 otherwise. Find E(Y ). (Hint: The answer is not “ E(Y ) =
5.”)
28.4.2 Extensions
Exercise 28.20. Consider two points that are independently placed on a line
of length 10, at locations X and Y . Thus the joint density of X and Y is
fX,Y (x, y) = 1/100 for 0 ≤ x ≤ 10 and 0 ≤ y ≤ 10,
and fX,Y (x, y) = 0 otherwise.
a. First, try to show that, if Z denotes the distance between X and Y (so
0 ≤ Z ≤ 10), then the cumulative distribution function FZ (z) of Z is
FZ(z) = (1/5)z - (1/100)z^2 for 0 ≤ z ≤ 10,
and FZ (z) = 0 for z ≤ 0 and FZ (z) = 1 for z ≥ 10. (Hint: Use the complement;
try to find P (Z > z), i.e., the probability that the distance between X and Y
is more than z. A picture should help.)
b. Regardless of whether you can accomplish the part above, just find fZ (z),
by differentiating FZ (z).
c. Use the density fZ (z) to find E(Z), i.e., the expected distance between
X and Y .
Exercise 28.21. Let X and Y have joint density
fX,Y(x, y) = (3/80)(x^2 + y), for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 4,
and fX,Y (x, y) = 0 otherwise. Find E(X).
28.4.3 Advanced
Exercise 28.22. Three children are skating around the perimeter of a circle in
an ice rink of radius 50 feet. Assume that their locations around the perimeter
of the circle are independent, continuous random variables, each with constant
density. Their mother comes to the door of the ice rink (located at a fixed
position on perimeter of the circle). When she appears, what is the expected
distance around the perimeter from her to the closest of the three children?
Chapter 29
Variance of Continuous Random Variables
If you do not expect the unexpected, you will not find it; for it is hard to be
sought out, and difficult.
—Heraclitus
To what extent are the students’ grades in a course near the average? How can
we quantify the spread of randomly thrown darts from the center of the board?
The variance of a continuous random variable measures the spread of the random
variable’s distribution around the mean (just as the variance does with a discrete
random variable as well). We first point out two different random variables that
have the same expected value but different variances.
Example 29.1. For our first example, we define two random variables that
each have expected value 10/3. The random variable Y is from Exercise 28.19,
and the random variable X will be defined below. Since these two random
variables have the same expected value, we will use their variances to compare
them.
Now we compare to a different random variable that also has expected value
10/3: Suppose that X is the waiting time until the next bus arrives, given in
minutes. If X has density
fX(x) = (3/10)e^{-3x/10} for x > 0,
and fX (x) = 0 otherwise, then the density of X is exponentially decreasing,
with λ = 3/10, so we can conclude, by Theorem 28.7 that E(X) = 1/λ = 10/3.
Now a question immediately arises: Both random variables, X and Y , have
expected value 10/3; can we say something more about the distributions of
X and Y to better understand how X and Y behave? With discrete random
variables, we sometimes looked at the masses; for continuous random variables,
we can look at the densities. The densities of Y and X are given, respectively,
on the left and right sides of Figure 29.1.
[Figure 29.1: Left: The density fY(y) = (10 - y)/50 of Y. Right: The density fX(x) = (3/10)e^{-3x/10} of X.]
We notice that, although X and Y have the same expected value, their densities
spread out quite differently.
To find the variance, we need to know how to compute the expected value
of the square of X and also how to compute the expected value of the square
of Y. (The definition of variance is the same for continuous and discrete
random variables.)
Definition 29.5. Expected value of the square of a continuous random
variable
If X is a continuous random variable with density fX(x), then the expected
value of X^2 is
E(X^2) = ∫_{-∞}^{∞} x^2 fX(x) dx.
find the expected value of X^2, and also the variance and standard deviation of
X. The expected value of X^2 is
E(X^2) = ∫_{-∞}^{∞} x^2 fX(x) dx,
Example 29.9. When Joyce reaches in her refrigerator for a drink from a 2-
liter bottle of soda, her roommates have usually already helped themselves to
some of the soda. So the amount of soda that remains is a random variable X.
If X has density fX (x) = 1/2 for 0 ≤ x ≤ 2, and fX (x) = 0 otherwise, calculate
the cost of the portion that has already been drunk by the roommates. Suppose
that the cost of soda is $1.58 for 2 liters, i.e., $0.79 per liter.
The quantity that has been drunk is 2 − X liters, and its cost is (0.79)(2 − X).
So the desired expected value is
E((0.79)(2 - X)) = ∫_0^2 (0.79)(2 - x)(1/2) dx
                 = 0.395 ∫_0^2 (2 - x) dx
                 = 0.395 [2x - x^2/2]_{x=0}^{2}
                 = (0.395)(2)
                 = 0.79
E(aX + b) = aE(X) + b.
To see that the theorem is true, just write
E(aX + b) = ∫_{-∞}^{∞} (ax + b) fX(x) dx
          = a ∫_{-∞}^{∞} x fX(x) dx + b ∫_{-∞}^{∞} fX(x) dx
          = aE(X) + b.
One might guess (correctly, by the way!) that E(X + Y) is just equal to
E(X) + E(Y), and then use the answers from Exercise 28.17 to conclude that
E(X + Y) = 0 + (-1/9) = -1/9.
One nice feature of this method is that it prevents us from having to take
the intermediate step to compute the density of either of the variables. The
downside is that it makes the integration a little more complicated, but we
handle the whole computation in one fell swoop! We get the following:
E(X + Y) = ∫_{-1}^{1} ∫_{-1}^{1} (x + y)(1/8)(1 - x^2)(3 - y) dy dx
         = ∫_{-1}^{1} (1/8)(1 - x^2) ∫_{-1}^{1} (3x - xy + 3y - y^2) dy dx
         = ∫_{-1}^{1} (1/8)(1 - x^2) [3xy - xy^2/2 + 3y^2/2 - y^3/3]_{y=-1}^{1} dx
         = ∫_{-1}^{1} (1/8)(1 - x^2)(6x - 2/3) dx
         = ∫_{-1}^{1} (1/8)(6x - 6x^3 - 2/3 + 2x^2/3) dx
         = (1/8) [6x^2/2 - 6x^4/4 - 2x/3 + 2x^3/9]_{x=-1}^{1}
         = -1/9
The ability to compute E(X + Y ) in one calculation is very nice. . . , but even
nicer is the fact that this method works much more generally. For any function
of X and Y , we can calculate the expected value of the function of X and Y
using this method.
Now, by using the new method for expected values, and letting g(x, y) = x, we
have a very succinct way to calculate the expected value of X directly, using a
joint density:
E(X) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x fX,Y(x, y) dy dx.
Generalizing the example above, we can always get the expected value of
the sum of two random variables, by just letting g(x, y) = x + y.
We compute
E(Z) = E(|X - Y|)
     = ∫_{-∞}^{∞} ∫_{-∞}^{∞} |x - y| fX,Y(x, y) dy dx
     = ∫_0^{10} ∫_0^{10} |x - y| (1/100) dy dx
The only thing we need to do is split the integral into two parts, according to
whether x ≥ y, in which case |x − y| = x − y, or whether x ≤ y, in which case
|x − y| = y − x. So we continue
E(Z) = ∫_0^{10} ∫_0^x (x - y)(1/100) dy dx + ∫_0^{10} ∫_x^{10} (y - x)(1/100) dy dx
     = ∫_0^{10} (1/100) [xy - y^2/2]_{y=0}^{x} dx + ∫_0^{10} (1/100) [y^2/2 - xy]_{y=x}^{10} dx
     = ∫_0^{10} (1/100)(x^2 - x^2/2) dx + ∫_0^{10} (1/100)((10^2/2 - 10x) - (x^2/2 - x^2)) dx
     = ∫_0^{10} (1/100)(x^2 - 10x + 50) dx
     = (1/100) [x^3/3 - 10x^2/2 + 50x]_{x=0}^{10}
     = (1/100)(1000/3 - 1000/2 + 500)
     = 10/3
Notice that we did not need to compute FZ (z) or fZ (z) en route to comput-
ing E(Z) with this method. So there are some advantages to just computing
expected values using the powerful formula
E(g(X, Y)) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) fX,Y(x, y) dy dx.
(Several of these facts have analogous discrete versions; see Section 12.3.)
We already showed that E(aX + b) = aE(X) + b for any random variable
X and any constants a and b. More is true: if X and Y are independent, then
E(f(X)g(Y)) = E(f(X))E(g(Y)).
Using identity functions for f and g, i.e., f(X) = X and g(Y) = Y, the following
nice fact follows immediately: for independent X and Y,
E(XY) = E(X)E(Y).
Corollary 29.22. If X is any random variable, and if a and b are any con-
stants, then
Var(aX + b) = a2 Var(X).
29.5 Exercises
29.5.1 Practice
For the densities in Exercises 29.1 to 29.8, find the variance of the
given random variable.
Exercise 29.1. A child has lost her ring somewhere along a 20-foot line in a
narrow strip of grass. Let X denote the location (lengthwise) along the grass
where it was dropped, so
fX(x) = 1/20, for 0 ≤ x ≤ 20,
and fX(x) = 0 otherwise.
Exercise 29.3. When meeting a friend at a coffee house, Donald sees that his
friend has not yet arrived, so he assumes that his friend’s arrival time, X, in
minutes, will have density
fX(x) = 1/5, for 0 ≤ x ≤ 5,
and fX (x) = 0 otherwise.
Exercise 29.4. The waiting time X, in minutes, until the next green car passes
has density
fX(x) = (1/2)e^{-x/2}, for 0 < x,
and fX (x) = 0 otherwise.
Exercise 29.5. Darius always arrives at Wiley dining hall 50 minutes before
it closes. He estimates that the time X necessary to stand in line, in minutes,
has density
fX(x) = x/50, for 0 ≤ x ≤ 10,
and fX (x) = 0 otherwise.
Exercise 29.8. Let X be the time, in hours, that Christine spends talking on
the telephone. Then X has density
Exercise 29.16. The distance X, in yards, that a small person can throw a
50-pound weight, has density
For the densities in Exercises 29.17 to 29.22, and the given function
g(x, y) in each exercise, find E(g(X, Y )).
fX,Y(x, y) = xy^2/72, for 0 ≤ x ≤ 4, 0 ≤ y ≤ 3,
and fX,Y(x, y) = 0 otherwise. Let g(x, y) = (x - y)/2.
29.5.2 Extensions
Exercise 29.23. Geoffrey does not like to be low on gas, so he randomly stops
to fill up his tank. He has a 14-gallon tank, and the current price of gas is $3.75
per gallon. Whenever he stops to buy gas, he always buys a candy bar for $1.30.
If X is the amount of gas (in gallons) in his tank when he stops for a purchase,
then fX (x) = 1/14 for 0 ≤ x ≤ 14, and fX (x) = 0 otherwise. He always fills
the tank, so he will always buy 14 − X gallons. Find the expected amount of
money Geoffrey spends on a purchase of gas and a candy bar.
Exercise 29.24. Let X have density
fX(x) = (1/3)e^{-x/3}, for 0 < x,
and fX (x) = 0 otherwise.
a. Find the expected value of X 2 , i.e., E(X 2 ).
b. Find the variance Var(X).
c. Find the standard deviation σX .
Exercise 29.25. Let X and Y correspond to the horizontal and vertical co-
ordinates in the triangle with corners at (2, 0), (0, 2), and the origin. Let
fX,Y(x, y) = (15/28)(xy^2 + y) for (x, y) inside the triangle, and fX,Y(x, y) = 0
otherwise. Find E(XY).
Exercise 29.26. Let X and Y correspond to the horizontal and vertical coor-
dinates in the rectangle with corners at (15, 0), (15, 10), (0, 10), and the origin.
Let fX,Y(x, y) = 1/150 for (x, y) inside the rectangle, and fX,Y(x, y) = 0 other-
wise. Find E(XY).
29.5.3 Advanced
Exercise 29.30. Every day, a student calls his mother and then (afterwards)
calls his girlfriend. Let X be the time (in hours) until he calls his mother, and
let Y be the time (in hours) until he calls his girlfriend. Since he always calls
his mother first, then X < Y . So let the joint density of the time be
Exercise 29.33. Suppose that X and Y have joint probability density function
fX,Y (x, y) = 16e−4x−4y for x > 0 and y > 0, and fX,Y (x, y) = 0 otherwise. Find
E(X + Y ).
Chapter 30
Review of Continuous Random Variables
30.1 Summary of Continuous Random Variables
• Remember that it is important to check that you have a valid density first.
• Find probabilities, e.g., P (1.7 ≤ X ≤ 3.2).
• Graph the density fX (x).
• Calculate expected value (average), variance, and standard deviation of
X and of functions of X.
• Calculate the CDF by integrating the density.
The formulas in this chapter were for continuous random variables only.
Earlier in the text, we discussed discrete random variable formulas which are
similar, but they did not use calculus.
How do you know if your density is valid?
• The density must account for every possible outcome. It must describe
everything that could happen from −∞ to ∞, although the density is
sometimes zero on large portions of the real line.
E(aX + b) = aE(X) + b
Var(aX + b) = a^2 Var(X)
σ_{aX+b} = √(a^2 Var(X)) = |a| √Var(X)
What if all of these random variables are independent and also have
the same distribution?
Var(Σ_{i=1}^{n} Xi) = n Var(X)
σ_{Σ Xi} = √(n Var(X))
30.2 Exercises
Exercise 30.1. Using the following probability density function
fX(x) = 2 if 0 ≤ x ≤ 1/2,
and fX(x) = 0 otherwise,
FX(x) = 0 if x < 2,
FX(x) = (x - 2)/2 if 2 ≤ x ≤ 4,
FX(x) = 1 if 4 < x,
Exercise 30.5. What is the constant k that makes the following function a
valid density?
fX(x) = k(e^{-x} + e^{-4x}) if 0 ≤ x,
and fX(x) = 0 otherwise.
Exercise 30.6. What is the constant k that makes the following function a
valid density?
fX(x) = kx^2 (1 - x)^7 if 0 ≤ x ≤ 1,
and fX(x) = 0 otherwise.
Exercise 30.7. Which of the following can be true? If an answer is false, state
why it is false.
a. A CDF FX (x) can have the value 4.3.
b. A density fX (x) can have the value 4.3.
c. A mass pX (x) can have the value 4.3.
Exercise 30.8. Which of the following can be true? If an answer is false, state
why it is false.
a. A CDF FX (x) can be 1 for two or more values of x.
b. A density fX (x) can be 1 for two or more values of x.
c. A mass pX (x) can be 1 for two or more values of x.
Exercise 30.9. Which of the following can be true? If an answer is false, state
why it is false.
a. The (indefinite) integral of FX (x) is fX (x).
b. The (indefinite) integral of fX (x) is FX (x).
c. The area under the curve of fX (x) between −∞ and x is FX (x).
d. The area under the curve of FX (x) between −∞ and x is fX (x).
e. The derivative of FX (x) is fX (x).
f. The derivative of fX (x) is FX (x).
Exercise 30.10. Which of the following can be true? If an answer is false,
state why it is false.
a. The area under the FX (x) curve from −∞ to ∞ is 1.
b. The area under the fX (x) curve from −∞ to ∞ is 1.
Exercise 30.11. Which of the following can be true? If an answer is false,
state why it is false.
a. The CDF FX (x) can be negative.
b. The density fX (x) can be negative.
c. The mass pX (x) can be negative.
Exercise 30.12. Which of the following can be true? If an answer is false,
state why it is false.
a. The graph of fX (x) can have jumps (i.e., can be discontinuous).
b. The graph of FX (x) for discrete random variables can have jumps.
c. The graph of FX (x) for continuous random variables can have jumps.
d. The graph of pX (x) can have jumps.
Part VI
Named Continuous Random Variables
For discrete distributions, you learned the rules for general distributions and
then about some of the most widely used types of random variables, to make
your life easier in scenarios that match specific situations (Binomial, Geomet-
ric, etc.). For continuous distributions, you have already learned the rules for
general distributions during the previous part of the book. Now in the chapters
that follow, we will show you some formulas for scenarios matching very com-
mon types of situations, and the named continuous random variables that are
appropriate in these situations. For instance, we dedicate several chapters to
the Normal distribution, perhaps the most important continuous distribution
for probability and statistics. At the end of this part of the book, we will review
some strategies for comparing and contrasting all of these distributions, to help
you pick the correct distribution for your situation.
There are many other continuous distributions than the five that we cover
here, but these are the most common ones.
By the end of this part of the book, you should be able to:
2. Identify the variable and the parameters in a story, and state in English
what the variable and its parameters mean.
3. Use the formulas for the density, CDF, expected value, and variance to
answer questions and find probabilities.
4. Utilize joint densities to study problems about two or more random vari-
ables at the same time.
Math skills you will need: ex , derivatives, integrals (in particular, integration
by parts), and the Gamma function.
Additional resources: Computer programs (such as Excel, Maple, Mathe-
matica, Matlab, Minitab, R, SPSS, etc.) and calculators may be used to assist
in the calculations. You will also need the standard Normal table that appears
in Chapter 35 and also appears in the back of the book.
Chapter 31
Continuous Uniform Random Variables
If you want to board a bus at a random time, and the bus circles campus once
every 30 minutes, how long do you expect to wait until the bus arrives? What
is the probability that the bus will come within the next 5 minutes? What is
the variance of the time until the next bus comes? What is the probability that
you just missed the bus by 1 minute or less?
31.1 Introduction
For the next few chapters, we focus on some very specific continuous random
variables. The most simple type of continuous random variable is called a
Continuous Uniform random variable, often just called a Uniform. We say that
X is Uniform anytime that the density of X is constant on some interval. In
such a case, the density of a Uniform random variable must be equal to 1 divided
by the length of the interval. The reason is that, when the density (which is
constant) is integrated over the length (or area, or volume, in the case of two
or three dimensions) of the region, the integral needs to evaluate to 1.
One of the nicest properties of Uniform random variables is that, since the
density is constant, the integrals for finding various probabilities are especially
easy, because we can essentially avoid having to perform an integration. We
can just multiply the constant (i.e., the density) by the length over which we
are “integrating,” and then we obtain the desired probability, without actually
having to use any complicated integrating techniques.
CDF:
FX(x) = 0 for x < a,
FX(x) = (x - a)/(b - a) for a ≤ x ≤ b,
FX(x) = 1 for b < x.
Expected value formula:
E(X) = (a + b)/2
Variance formula:
Var(X) = (b - a)^2/12
The notation for a Uniform random variable on the interval (a, b) looks like:
X ∼ C.Uniform(a, b).
Where does the cumulative distribution function formula come from?
If we integrate the probability density function formula from −∞ to c, where
a ≤ c ≤ b, we get
FX(c) = ∫_{-∞}^{c} fX(x) dx = ∫_a^c 1/(b - a) dx = (c - a)/(b - a).
Thus FX(x) = (x - a)/(b - a) for a ≤ x ≤ b.
However, it is also useful to think about the cumulative distribution function
as the area under the curve for x in the range [a, c].
Where does the expected value formula come from?
The expected value E(X) for the Uniform random variable is just the mid-
point of the range where the density fX (x) is not 0. (This is not true for other
types of continuous random variables.) Since the Continuous Uniform distribu-
tion is symmetric, the median (center) and the mean (expected value) are the
same.
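These formulas agree with the Uniform distribution shipped with scipy.stats. The following minimal sketch (our own choice of tool, not one the text assumes) checks the CDF, mean, and variance for the interval [2, 8] used in the next example; note that scipy parameterizes by loc = a and scale = b - a.

    from scipy.stats import uniform

    a, b = 2, 8
    X = uniform(loc=a, scale=b - a)
    print(X.cdf(6), (6 - a) / (b - a))   # both 2/3
    print(X.mean(), (a + b) / 2)         # both 5.0
    print(X.var(), (b - a)**2 / 12)      # both 3.0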
31.2 Examples
[Figure 31.1: Left: The density fX(x) = 1/6. Right: The CDF FX(x) = (x - 2)/6.]
E(X) = (2 + 8)/2 = 5, which is 1:05 PM.
d. What is the standard deviation for the time the customer arrived?
The variance is
Var(X) = (8 - 2)^2/12 = 3,
so the standard deviation is σX = √3 = 1.73 minutes.
e. What is the probability the customer arrived within the last 2 minutes
(i.e., between 1:06 PM and 1:08 PM)?
We are looking for P (6 ≤ X ≤ 8). You could do this problem three different
ways.
Method #1. For the Continuous Uniform distribution X, the probability of
X being in some range is equal to the length of the range divided by the entire
length of the interval where X is defined. This is true in general, for Continuous
Uniform random variables, because the density we are integrating (to obtain the
probabilities) is a constant function. Thus
P(6 ≤ X ≤ 8) = (length of [6, 8]) / (length of [2, 8]) = 2/6 = 1/3.
(Note: If some of the desired interval falls outside the range where X is
defined, that part of the range must be ignored. For example, consider P (6 ≤
X ≤ 10). We know that X is between 2 and 8, so the portion from 8 to 10 can
be ignored. So, P (6 ≤ X ≤ 10) = P (6 ≤ X ≤ 8) = 2/6 = 1/3.)
Method #2. If you use the probability density function, you could integrate
P(6 ≤ X ≤ 8) = ∫_6^8 1/6 dx = 2/6 = 1/3.
Method #3. If you know the cumulative distribution function, you could
just use
P (6 ≤ X ≤ 8) = P (X ≤ 8) − P (X ≤ 6) = FX (8) − FX (6).
See the right side of Figure 31.1 for a plot of the cumulative distribution func-
tion.
f. What is the cumulative distribution function? Also show the graph.
The CDF is
FX(x) = 0 for x ≤ 2,
FX(x) = (x - 2)/(8 - 2) for 2 ≤ x ≤ 8,
FX(x) = 1 for 8 ≤ x,
interval is removed. What remains is the rest of the interval, i.e., [4, 8]. Thus,
the conditional density is constant on the interval [4, 8]. So
P(X > 6 | X > 4) = (length of [6, 8]) / (length of [4, 8]) = 2/4 = 1/2.
Figure 31.2: Density fY (y) = 1/4 of Uniform random variable Y on [4, 8].
Var(T) = (4000 - 0)^2/12 = 1,333,333,
σT = √1,333,333 = $1154.70.
d. An insurance company issues a renter’s policy with a deductible of $800.
This means that if there is a claim the insurance company will reimburse the
renter all damage costs between $800 and $4000, but the renter is responsible
for the first $800 in costs.
What is the expected out-of-pocket cost C to the renter if damage occurs?
We observe that T = C + M , where M is the amount that the insurance
company pays. The out-of-pocket cost C is just a function of T :
C = T if 0 ≤ T ≤ 800,
C = 800 if 800 < T ≤ 4000.
e. How much should the insurance company expect to pay the renter if damage
occurs?
Remember that the insurance company only pays if the damage is above
$800. Again, use M to denote the amount that the insurance company pays.
Then M is just a function of T :
M = 0 if 0 ≤ T ≤ 800,
M = T - 800 if 800 ≤ T ≤ 4000.
E(M ) = $1280.00,
E(C) = $720.00,
E(T ) = $2000.00
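These three expected values can be double-checked by numerically integrating the functions C and M against the Uniform density of T. Here is a minimal sketch using SciPy (our own choice of tool, not one the text assumes):

    from scipy.integrate import quad

    density = 1 / 4000                        # T is Uniform on [0, 4000]
    C = lambda t: t if t <= 800 else 800      # renter's out-of-pocket cost
    M = lambda t: 0 if t <= 800 else t - 800  # insurance company's payment

    EC, _ = quad(lambda t: C(t) * density, 0, 4000, points=[800])
    EM, _ = quad(lambda t: M(t) * density, 0, 4000, points=[800])
    print(EC, EM, EC + EM)  # 720.0, 1280.0, 2000.0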
fX(x) = 1/(b - a) for a ≤ x ≤ b,
and fX(x) = 0 otherwise. So we can use the lengths of intervals, instead of
having to integrate:
P(X ∈ C) = ∫_C 1/(b - a) dx = (length of C)/(b - a) = (length of C)/(length of [a, b]).
Example 31.6. Let X and Y denote the latitude and longitude of a person
who is thought to be lost in a certain part of a forest. If we assume that the
densities of X and Y are each constant, say, 2 ≤ X ≤ 8.2 and 3.01 ≤ Y ≤ 6.3,
then
fX(x) = 1/(8.2 - 2) for 2 ≤ x ≤ 8.2,
and fX(x) = 0 otherwise, and
fY(y) = 1/(6.3 - 3.01) for 3.01 ≤ y ≤ 6.3,
and fY (y) = 0 otherwise. If, moreover, we assume that X and Y are indepen-
dent, then the joint density of X and Y is constant too:
fX,Y(x, y) = fX(x)fY(y) = (1/(8.2 - 2))(1/(6.3 - 3.01)) for 2 ≤ x ≤ 8.2 and 3.01 ≤ y ≤ 6.3,
and fX,Y (x, y) = 0 otherwise.
Example 31.8. When a joint density is constant, then the conditional densities
(e.g., if the value of one random variable is given) are uniform too. For instance,
in Example 31.7, if X = 0 is given, then the conditional density of Y is uniform
on the interval [−2, 2], i.e.,
fY |X (y | 0) = 1/4 for −2 ≤ y ≤ 2.
As another example with the dartboard, if Y = 1 is given, then the conditional
density of X is uniform on the interval [-√3, √3], i.e.,
fX|Y(x | 1) = 1/(2√3) for -√3 ≤ x ≤ √3.
For some other examples of this phenomenon, refer to Example 27.2 (where
fX|Y(x | 2) = 1/8 for 0 ≤ x ≤ 8) and Example 27.3 (where fX|Y(x | 2) = 1/(2√5)
for -√5 ≤ x ≤ √5).
Let X be the first student's choice, and let Y be the second student's choice.
Since X and Y are independent, we can multiply to get the joint density of X
and Y :
fX,Y (x, y) = 1/100 for 0 ≤ x ≤ 10, 0 ≤ y ≤ 10,
and fX,Y (x, y) = 0 otherwise.
Suppose that we want to find the expected value of the minimum of the two
numbers.
Method #1: To find the expected value, we can integrate
E(min(X, Y)) = ∫_0^{10} ∫_0^{10} min(x, y) (1/100) dy dx.
It is difficult to know whether x or y is the minimum when we integrate over
all x’s and y’s at once, so we suggest breaking the integral into two pieces,
depending on whether Y or X is actually the minimum. In Figure 31.3, we
show these two regions.
[Figure 31.3: The square 0 ≤ x ≤ 10, 0 ≤ y ≤ 10, split by the line y = x. In the upper triangle, min(X, Y) = X, i.e., X is the minimum; in the lower triangle, min(X, Y) = Y, i.e., Y is the minimum.]
Now we integrate min(x, y) fX,Y(x, y) over the entire square, shown in Fig-
ure 31.3. In the lower triangle, where min(X, Y) = Y, the integrand uses
min(x, y) = y; in the upper triangle, where min(X, Y) = X, it uses min(x, y) = x.
We integrate over the range where Y is the min and then over the range where
X is the min:
E(min(X, Y)) = ∫_0^{10} ∫_0^x min(x, y) (1/100) dy dx + ∫_0^{10} ∫_x^{10} min(x, y) (1/100) dy dx,
and we get
E(min(X, Y)) = ∫_0^{10} ∫_0^x y (1/100) dy dx + ∫_0^{10} ∫_x^{10} x (1/100) dy dx.
If we wanted to, we could switch the bounds of integration on the second inte-
gral, so that we integrate over x on the inside and over y on the outside. The
reasons will be clear in a brief moment:
E(min(X, Y)) = ∫_0^{10} ∫_0^x y (1/100) dy dx + ∫_0^{10} ∫_0^y x (1/100) dx dy.
Finally, we see that we have essentially just written the same integral twice,
just swapping letters the second time, so we can save ourselves some time by
computing the first integral once and doubling the result:
E(min(X, Y)) = 2 ∫_0^{10} [y^2/2]_{y=0}^{x} (1/100) dx
             = 2 ∫_0^{10} (x^2/2)(1/100) dx
             = 2 [x^3/6]_{x=0}^{10} (1/100)
             = 10/3
Method #2: Another possible method is to always write U as the smaller of the
two values X and Y , and write V as the larger of the two values X and Y . In
other words,
U = min(X, Y ),
and
V = max(X, Y ).
The joint density of U and V is
fU,V(u, v) = 2/100 = 1/50 for 0 ≤ u ≤ v ≤ 10,
and fU,V(u, v) = 0 otherwise. This triangle is shown in Figure 31.4.
[Figure 31.4: The triangle 0 ≤ u ≤ v ≤ 10 on which fU,V(u, v) = 1/50.]
So the expected value is
E(min(X, Y)) = E(U) = ∫_0^{10} ∫_0^v u (1/50) du dv
                    = ∫_0^{10} [u^2/2]_{u=0}^{v} (1/50) dv
                    = ∫_0^{10} (v^2/2)(1/50) dv
                    = [v^3/6]_{v=0}^{10} (1/50)
                    = 10/3,
which agrees with Method #1.
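A short simulation confirms that both methods give the same answer. This is a minimal sketch in Python with NumPy (our own choice of tool, not one the text assumes):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=1_000_000)
    y = rng.uniform(0, 10, size=1_000_000)
    print(np.mean(np.minimum(x, y)))  # about 3.333, i.e., 10/3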
To see that this is true, we show the result for a > 0. Since s < X < t, then
as + b < aX + b < at + b. So if we write Y = aX + b, we see that fY (y) = 0 for
y outside the interval [as + b, at + b]. For y inside the interval [as + b, at + b],
we compute
FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y - b)/a) = ((y - b)/a - s)/(t - s),
and thus
fY(y) = (d/dy) FY(y) = (1/a)/(t - s) = 1/(a(t - s)).
So the density of Y is constant on the interval [as + b, at + b] and is 0 otherwise.
Thus Y is a Uniform random variable on the interval [as + b, at + b].
Thus, we confirm the fact that the expected value of the purchase price is
E(Y) = (39.80 + 1.30)/2 = 20.55.
We know much more now; for instance, we know that the variance of Y is
Var(Y) = (39.80 - 1.30)^2/12 = 123.5208333,
and we know that the density of Y is
fY(y) = 1/(39.80 - 1.30) = 1/38.50 for 1.30 ≤ y ≤ 39.80,
and fY(y) = 0 otherwise.
31.4 Exercises
31.4.1 Practice
Exercise 31.1. Rope checks. In a certain manufacturing process, an auto-
mated quality control computer checks 10 yards of rope at a time. If no defects
are detected in that 10-yard section, that portion of the rope is passed on. How-
ever, if there is a defect detected, a person will have to check the rope over more
carefully to determine where (measured from the left side in yards) the defect
is. If exactly 1 defect is detected in a rope section, we would like to find the
probabilities for its location.
Exercise 31.2. Harry Potter and the Missed Exit. You are driving down
an interstate when you suddenly realize you have missed your exit because
you were busy listening to an exciting chapter of a Harry Potter book on CD.
According to the CD case, the chapter lasts for 14 minutes. Assume you are
driving 60 miles per hour (in other words, 1 mile per minute). You could have
passed your exit anytime over the past 14 minutes. You want to know about
the probabilities for where it was exactly (how far back in miles) as you turn
around to head back to find it.
that 4-minute interval, what is the probability that you missed the exit within
the last 30 seconds?
k. How many total miles do you expect to have to drive out of your way to
get back to the missed exit (miles past the exit ×2)?
l. Also find the standard deviation in the total miles out of your way that
you had to drive.
m. If you get reimbursed by your company 43 cents a mile for your trip plus
the $3.20 you spent on a grande mocha at Starbucks (conveniently located at
the turn-around spot) to wake yourself up as you get turned around, how much
is the expected amount of money this detour will cost the company?
n. What is the standard deviation in this amount of money that you get
reimbursed in part m?
Exercise 31.3. Network latency. The network latency per request from your
computer to a server is Uniform between 30 ms and 150 ms.
Exercise 31.4. A fly is flying around. You believe that there is a fly some-
where less than 6 feet away from you. If you believe that he is located Uniformly
in a circle of radius 6 feet away from you, what is the probability that he is more
than 2 feet away from you?
Exercise 31.7. Target art. A wall in a room is 108 inches tall and 132 inches
wide. There is a painting on the wall that is 18 inches by 24 inches. If a
tennis ball is accidentally flung at the wall, and the location where it lands is
Uniformly distributed on the wall, what is the probability that the tennis ball
hits the painting?
Exercise 31.8. Solar cell. A solar cell measures 10 inches by 10 inches, and it
has 4 non-overlapping regions that generate electricity. Each of these regions is
31.4.2 Extensions
Exercise 31.9. Cab ride. Suppose that the length X of a randomly selected
passenger’s trip in a cab is Uniformly distributed between 5 and 30 miles. The
charge incurred for such a trip, in dollars, is Y = 2.50X + 3.00.
[Figure 31.5: The two-dimensional house.]
Exercise 31.11. There’s still a fly in the house! Consider the two-
dimensional house in Figure 31.5. Suppose that a fly is still found somewhere
in the house, with joint density
a. Find E(X).
b. Find Var(X).
c. Find E(Y ).
d. Find Var(Y ).
Exercise 31.12. Long jump. Suppose that a particular long jumper assumes
that each of his jumps are Uniformly distributed between 6.5 and 7.2 meters.
He is happy whenever he jumps 7 meters or more. If he makes 10 such jumps,
what is the probability that he is happy with exactly 4 of his jumps?
Exercise 31.13. Student arrivals. Five students will arrive to the classroom
during the next 2 minutes. Assume that their arrival times are independent,
and each arrival time is Uniformly distributed between 0 and 2 minutes. What
is the probability that exactly two of the students arrive during the next 30
seconds?
Exercise 31.14. Manufactured cubes. A machine manufactures cubes with
a side length which varies Uniformly over the interval [0.2, 0.3] in millimeters.
For the following problems, make sure you use the correct units. (Assume the
sides of the base and the height are all the same.)
a. What is the expected side length?
b. What is the standard deviation of the side length?
c. What is the expected area of one of the square bases?
d. What is the standard deviation of one of the square bases?
e. What is the expected volume of one of the cubes?
f. What is the standard deviation of the volume of one of the cubes?
In the rest of the problem, assume that the cost to make the cubes is 12
cents per cubic millimeter and 6 cents for the general cost (labor, electricity,
etc.) per cube.
g. What is the expected cost for making 1 cube?
h. What is the variance in the cost for making 1 cube?
i. What is the expected cost for making 10 cubes?
j. What is the variance in the cost for making 10 cubes?
Exercise 31.15. Dartboard. Kelly throws a dart at a circular dartboard of
radius 3 feet. Let X and Y denote the location where the dart lands. Assume
that −3 ≤ X ≤ 3 and −3 ≤ Y ≤ 3 and X 2 + Y 2 ≤ 9, i.e., the dart lands on
the dartboard. Moreover, assume that the dart’s location is Uniform on the
dartboard, i.e.,
Exercise 31.23. Mole rat. The location of a mole rat is Uniform inside a
circular enclosure that has diameter 40 feet. What is the probability that the
mole rat is within 2 feet from the edge of the enclosure?
Exercise 31.24. Broken chalk. Assume a mother’s child has a random lo-
cation that is Uniformly distributed across the 80 foot by 120 foot playground
shown in Figure 31.6 below.
[Figure 31.6: The 80-foot by 120-foot playground, with a playscope, slides, swings, picnic tables, and a bench.]
Exercise 31.27. Suppose X, Y have constant joint density on the triangle with
vertices at (0, 0), (3, 0), and (0, 3). Find E(X).
31.4.3 Advanced
Exercise 31.28. A fly near the wall. There is a fly randomly located Uni-
formly in a room that is 10 feet high, 14 feet long, and 13 feet wide. What is
the probability that the fly is within 1 foot of the walls, ceiling, or floor?
Chapter 32
Exponential Random Variables
What is the expected time until your best friend sends you a text message?
When a mother is waiting for her three children to call her, what is the proba-
bility that the first call will arrive within the next 5 minutes?
32.1 Introduction
Exponential random variables are often waiting times for the next event to
happen. For instance, the time until the telephone rings, or a guest arrives, or
a black car passes, etc., are all modeled by Exponential random variables. We
saw examples of Exponential random variables, for example, in Example 25.5
and Example 28.8. Exponential random variables are always positive. (It would
not make sense, for instance, for a waiting time to be negative.) Exponential
random variables, as we will see, also have the memoryless property, as did
Geometric random variables. In many ways, Exponential random variables are
a continuous version of Geometric random variables. Besides being memoryless,
the density of an Exponential random variable decreases exponentially in x, just
as the mass of a Geometric random variable decreases exponentially.
The notation for an Exponential random variable looks like:
X ∼ Exponential(λ).
The parameters: λ > 0, the rate at which the events occur.
Density:
f_X(x) = λe^{−λx} for x > 0, and f_X(x) = 0 otherwise.
CDF:
F_X(x) = 1 − e^{−λx} for x > 0, and F_X(x) = 0 otherwise.
Expected value formula:
E(X) = 1/λ
Variance formula:
Var(X) = 1/λ²
To verify the expected value formula, we can use the fact that E(X) = ∫_0^∞ P(X > x) dx for a nonnegative random variable:
E(X) = ∫_0^∞ e^{−λx} dx = [−e^{−λx}/λ]_{x=0}^{∞} = 1/λ.
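These formulas are easy to check by simulation. The following short Python sketch (our own illustration; the choice λ = 0.5 and the sample size are arbitrary) draws many Exponential waiting times and compares the sample mean and variance with 1/λ and 1/λ²:

    import random
    import statistics

    lam = 0.5
    random.seed(1)
    # Draw 100,000 Exponential(lam) waiting times.
    samples = [random.expovariate(lam) for _ in range(100_000)]
    print(statistics.mean(samples))      # close to 1/lam = 2.0
    print(statistics.variance(samples))  # close to 1/lam**2 = 4.0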
[Figure: densities f_X(x) (top row) and CDFs F_X(x) (bottom row) of Exponential random variables, for three values of λ.]
No matter what the λ parameter is, the density starts at λ when x = 0 and then
quickly moves closer to 0 as x → ∞. The CDF starts at 0 but quickly climbs
close to 1 as x → ∞. The density and CDF curves are steeper for larger values
of λ.
Example 32.1. The time between fatal car accidents on a stretch of desolate
highway between two cities was found to follow an Exponential distribution
with a mean of one accident every 44 days. If an accident has occurred today
(the count starts over), the sheriff’s office is interested in when the next accident
may occur.
E(X) = 1/λ = 1/(1 accident / 44 days) = 44 days/accident.
e. What is the standard deviation in the time the sheriff’s office will wait
for the next accident to occur?
Var(X) = 1/λ² = 1/(0.02273)² = 1936 days²
σ_X = √1936 = 44 days
f. Find the probability that the next accident occurs within the next 31
days.
g. Today is August 1st. What is the probability that no accident will have
occurred by August 9th?
This means that we will have to wait more than 8 days for the next accident
to occur. So the probability is
P(X > 8) = e^{−(0.02273)(8)} = e^{−0.1818} = 0.8337.
h. You need an estimate of how long you would wait so that there is a
95% chance that the next accident would happen before that day. What is that
cut-off length of time?
We want the time a such that
P (X ≤ a) = 0.95,
or equivalently
F_X(a) = 1 − e^{−0.02273a} = 0.95.
Thus e^{−0.02273a} = 0.05. Taking the natural logarithm of both sides, we have
−0.02273a = −2.996,
so a = 131.7964. In other words, we are 95% certain that the next accident will
happen within the next 131.7964 days.
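The same cutoff can be computed directly with a natural logarithm; here is a minimal Python sketch (ours, for illustration) of the calculation a = −ln(0.05)/λ:

    import math

    lam = 1 / 44                 # one accident per 44 days
    # Solve 1 - exp(-lam * a) = 0.95 for a.
    a = -math.log(0.05) / lam
    print(a)                     # about 131.8 days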
We saw an example of Exponential random variables in Example 26.7. The
“lifetime” of a music player might not seem like a “waiting time,” but it can be
interpreted that way, i.e., the time we wait until the music player dies.
We saw another example of waiting times near the start of section 29.1,
where X is the waiting time until the next bus arrives, given in minutes. That
waiting time was Exponential with λ = 3/10. So now we can immediately
conclude that E(X) = 1/λ = 10/3.
We also saw waiting times in Exercise 26.6, about the response time for a
police car to an emergency call.
Example 32.2. Let X be the time (in minutes) that Maxine waits for a traffic
light to turn green, and let Y be the time (in minutes, at a different intersection)
that Daniella waits for a traffic light to turn green. Suppose that X and Y have
joint density
In this case, we notice that the joint density can be factored, so there are some
constants c and d so that
and F_X(x) = 0 otherwise. Thus, for a > 0, we see that P(X > a) has an
especially nice form:
P(X > a) = 1 − F_X(a) = e^{−λa}.
Just like the Geometric distribution, the Exponential distribution has the mem-
oryless property. The memoryless property helps streamline our ability to cal-
culate conditional distributions for Exponential random variables. (Recall from
Chapter 16 that the Geometric distribution also has a memoryless property.)
For example, if X is an Exponential random variable, then P (X > 5 | X >
2) = P (X > 3). Intuitively, if we have already waited at least two minutes for
an event to occur, then the conditional probability that we wait at least five
minutes altogether is equal to the probability that, starting from right now, we
will wait at least three additional minutes until the event occurs. One impor-
tant difference is that, since the Exponential distribution is continuous, then
P (X ≥ 5) and P (X > 5) are exactly the same value, because P (X = 5) = 0.
As another example, if we know that a waiting time X is bigger than 7, and
we want the probability that X is bigger than 11, then we only need to know
that—after 7 units of time have passed—we have to wait at least 4 additional
units of time. So, for example, it would make sense if
P(X > 11 | X > 7) = P(X > 4).
We compute, for any a, b > 0,
P(X > a + b | X > a) = P(X > a + b)/P(X > a) = e^{−λ(a+b)}/e^{−λa} = e^{−λb} = P(X > b).
This is called the “memoryless property,” which we also saw was true for Geo-
metric random variables. If we already know that X is bigger than a, then the
conditional probability (given X is bigger than a) that X is bigger than a + b
is just equal to the unconditional probability that X is bigger than b. This
is true with waiting times in life too, in many cases. For instance, if we have
already waited 10 minutes for the phone to ring, then the probability that we
wait a total of 12 minutes or less is just equal to the unconditional probability
of waiting 2 minutes or less from the outset—the previous 10 minutes do not
affect the future waiting time. (We are assuming that we do not know when
the phone will ring, i.e., that there is not a planned calling time in advance. So
this example works best, for example, if we imagine that we are answering a
telephone in an office, for a telethon, etc., when the timing of the next phone
call is variable and is not planned ahead of time.)
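The memoryless property is also easy to see empirically. In this Python sketch (an illustration of ours, with λ = 1 and the seed and sample size chosen arbitrarily), the conditional relative frequency of {X > 5} given {X > 2} matches the unconditional relative frequency of {X > 3}:

    import random

    random.seed(2)
    lam, a, b = 1.0, 2.0, 3.0
    xs = [random.expovariate(lam) for _ in range(1_000_000)]
    beyond_a = [x for x in xs if x > a]
    # P(X > a + b | X > a), estimated among samples already past a:
    cond = sum(x > a + b for x in beyond_a) / len(beyond_a)
    # Unconditional P(X > b):
    uncond = sum(x > b for x in xs) / len(xs)
    print(cond, uncond)          # both close to exp(-3) = 0.0498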
Example 32.4. (continued from Example 32.1) The time between acci-
dents for all fatal car accidents on a stretch of highway between two cities was
found to follow an Exponential distribution with a mean of one accident every
44 days. If an accident has occurred today (the count starts over), the sheriff’s
office is interested in when the next accident may occur.
Today is August 1st. Given that no accident has occurred by August 25th,
what is the probability that no accident will have occurred by September 2nd?
No accident occurring by August 25th means that there is no accident in
the next 24 days, so our wait time will be more than 24 days, so X > 24.
No accident occurring by September 2nd means that there is no accident in
the next 32 days so our wait time will be more than 32 days.
So the conditional probability is, by the memoryless property,
P(X > 32 | X > 24) = P(X > 8) = e^{−8/44} = 0.8337.
For instance, suppose X and Y are independent Exponential random variables with parameters λ1 and λ2, and let Z = min(X, Y). Then for z > 0,
F_Z(z) = P(Z ≤ z)
= 1 − P(Z > z)
= 1 − P(X > z and Y > z)
= 1 − P(X > z)P(Y > z)
= 1 − e^{−λ1 z} e^{−λ2 z}
= 1 − e^{−(λ1+λ2)z},
so Z is itself Exponential, with parameter λ1 + λ2. The same argument shows that
Z = min(X1, X2, . . . , Xn),
the minimum of n independent Exponential random variables with parameters λ1, . . . , λn, is Exponential with parameter λ1 + · · · + λn.
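A quick numerical check of this fact (a Python sketch of ours, with λ1 = 2.1 and λ2 = 3.7 chosen arbitrarily): the minimum of the two waiting times should average 1/(λ1 + λ2):

    import random
    import statistics

    random.seed(3)
    lam1, lam2 = 2.1, 3.7
    # Minimum of two independent Exponential waiting times.
    mins = [min(random.expovariate(lam1), random.expovariate(lam2))
            for _ in range(200_000)]
    print(statistics.mean(mins))  # close to 1/(lam1 + lam2) = 0.1724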
Example 32.6. Now consider two random variables, X and Y , with joint
distribution
f_{X,Y}(x, y) = 36e^{−5x−4y} for 0 < x < y,
and f_{X,Y}(x, y) = 0 otherwise. Then X has an Exponential distribution but Y
does not.
We compute, for y > 0, the density of Y :
f_Y(y) = ∫_0^y 36e^{−5x−4y} dx = (36/5)(e^{−4y} − e^{−9y}).
The density of Y tells us that Y is not exponentially distributed. On the other
hand, for x > 0, when we compute the density of X, we find
f_X(x) = ∫_x^∞ 36e^{−5x−4y} dy = 9e^{−9x},
so X is indeed Exponential, with parameter 9.
Example 32.7. Suppose that the times between the arrival of consecutive
emails are independent Exponential random variables, each of which has an
average of 1/2 a minute, i.e., the parameter is λ = 2. So we expect 2 emails
to arrive per minute. The expected number of emails to arrive in a fixed 10
minute time interval (say, between 1:35 PM and 1:45 PM) is a Poisson random
variable with mean (2)(10) = 20.
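This connection between Exponential gaps and Poisson counts can be simulated directly. In the sketch below (ours, purely illustrative), we accumulate Exponential(2) interarrival times until a 10-minute window is exhausted and record how many emails arrived; the counts average about 20:

    import random
    import statistics

    random.seed(4)
    lam, window = 2.0, 10.0
    counts = []
    for _ in range(20_000):
        t, n = 0.0, 0
        while True:
            t += random.expovariate(lam)   # next interarrival gap
            if t > window:
                break
            n += 1
        counts.append(n)
    print(statistics.mean(counts))         # close to (2)(10) = 20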
Example 32.8. A traffic engineer has studied a lonesome desert highway, and
she believes that the average time between consecutive cars at an observation
point is 12 minutes, and these waiting times are independent Exponential ran-
dom variables. Thus, on average, 5 cars pass the observation point per hour.
Moreover, the number of cars that pass the observation point per hour is Pois-
son with mean 5. The number of cars that pass during a two-hour period is
Poisson with mean 10. The number of cars that pass during a 30-minute pe-
riod is Poisson with mean 2.5 (even though the number of cars that pass is an
integer-valued random variable, the parameter λ is allowed to be a non-integer).
The probability that exactly 4 cars pass between 1:10 PM and 2:10 PM is
e^{−5} 5^4/4! = 0.1755.
The probability that exactly 13 cars pass between 1:35 PM and 3:35 PM is
e^{−10} 10^13/13! = 0.0729.
The probability that exactly 2 cars pass between 4:30 PM and 5:00 PM is
e^{−2.5} 2.5²/2! = 0.2565.
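Each of these values comes from the Poisson mass function P(N = k) = e^{−µ} µ^k/k!; a short Python helper (our own sketch) reproduces them:

    from math import exp, factorial

    def poisson_pmf(k, mu):
        # P(N = k) for a Poisson random variable with mean mu.
        return exp(-mu) * mu**k / factorial(k)

    print(poisson_pmf(4, 5))     # 0.1755
    print(poisson_pmf(13, 10))   # 0.0729
    print(poisson_pmf(2, 2.5))   # 0.2565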
Using the density, one can compute
E(X) = 1/λ,  E(X²) = 2/λ²,
so the variance of X is
Var(X) = E(X²) − (E(X))² = 2/λ² − (1/λ)² = 1/λ².
32.4 Exercises
32.4.1 Practice
Exercise 32.1. Egg laying. Chickens at Rolling Meadows Farm lay an average
of 18 eggs per day. The farmer has rigged a fancy monitoring device to the
nesting boxes so that he can monitor exactly when the hens lay their eggs.
Assume that no 2 eggs will be laid at exactly the same time and that the eggs
(and chickens) are independent from each other. The farmer wants to know
how long he will have to wait (in minutes) for the next egg to be laid if he starts
monitoring at the first rooster crow in the morning.
Exercise 32.3. Waiting by the phone. Bob is waiting for his girlfriend Alice
to call. His waiting time X is Exponentially distributed, with expected waiting
time E(X) = 0.20 hours, i.e., 12 minutes.
a. What is the probability that he must wait more than 12 minutes for her
to call?
b. What is the probability that he must wait more than 12 minutes (alto-
gether) for her to call, given that he has already waited 3 minutes?
c. What is the probability that he must wait at most 10 minutes (altogether)
for her to call, given that he has already waited 3 minutes?
d. Given that he waited less than 10 minutes, what is the probability that
he waited less than 3 minutes?
e. Find the unique time “a” in hours such that P (X ≤ a) = 1/2 and
P (X > a) = 1/2. This is the median waiting time; also see Exercise 32.13.
Exercise 32.4. Waiting for a bus. A student waits for a bus. Let X be
the number of hours that the student waits. Assume that the waiting time is
Exponential with average 20 minutes.
a. What is the probability that the student waits more than 30 minutes?
b. What is the probability that the student waits more than 45 minutes
(total), given that she has already waited for 20 minutes?
c. Given that someone waits less than 45 minutes, what is the probability
that they waited less than 20 minutes?
d. What is the standard deviation of the student’s waiting time?
Exercise 32.5. Waiting for a ride. The waiting time for rides at an amuse-
ment park has an Exponential distribution with average waiting time of 1/2 an
hour. Find the time “t” such that 80% of the people have waiting time t or less.
Exercise 32.6. Happy birthday. It is your birthday and you are waiting for
someone to write a “Happy Birthday” message on your Facebook wall. Your
waiting time is approximately Exponential with average waiting time of 10
minutes between such postings; assume that the times of the postings are inde-
pendent.
a. What is the probability that the next posting takes 15 minutes or longer
to appear?
b. What is the standard deviation of the time in between consecutive Happy
Birthday messages?
c. Suppose that the most recent posting was done at 1:40 PM, and it is now
1:45 PM (i.e., no postings have been made during the last five minutes). What
is the expected time for the next message to appear?
Exercise 32.7. Falling asleep. Lily estimates that her time to fall asleep
each night is approximately Exponential, with an average time of 30 minutes
until she falls asleep.
a. What is the probability that it takes her less than 10 minutes to fall
asleep?
b. What is the probability that it takes her more than 1 hour to fall asleep?
c. If she has already lain awake for 1 hour, what is the probability that it
will take her more than 90 minutes (altogether) to fall asleep?
32.4.2 Extensions
Exercise 32.8. Pizza delivery. Suppose that the times X, Y, Z until Hector's,
Ivan's, and Jacob's pizzas arrive are independent Exponential random variables, each
with an average of 20 minutes. Find the probability that none of the waiting times
exceed 20 minutes, i.e., find P (max(X, Y, Z) ≤ 20).
Exercise 32.9. Flight delays. Suppose that, when an airplane waits on the
runway, the company must pay each customer a fee if the waiting time exceeds
3 hours. Suppose that an airplane with 72 passengers waits an exponential
amount of time on the runway, with average 1.5 hours. If the waiting time X,
in hours, is bigger than 3, then the company pays each customer (100)(X − 3)
dollars (otherwise, the company pays nothing). What is the amount that the
company expects to pay for the 72 customers on the airplane altogether? (Of
course their waiting times are all the same.)
a. Given that the machine has not been used in the previous 5 minutes,
what is the probability that the machine will not be used during the next 10
minutes?
b. How many purchases are expected within the next hour?
c. What is the distribution of the number of purchases to be made within
the next hour?
d. Suppose that a person pays 75 cents for each beverage in the vending
machine. The supplier pays 40 cents per beverage in the machine, and thus
makes a profit of 35 cents per beverage. What is the expected profit made
during a given 8 hour period?
Exercise 32.15. Still waiting by the phone. Dan, Dominic, and Doug
are waiting together in the living room for their girlfriends, Sally, Shellie, and
Susanne, to call. Their waiting times (in hours) are independent Exponential
random variables, with parameters 2.1, 3.7, and 5.5, respectively. What is the
probability that the phone will ring (i.e., the first call will arrive) within the
next 30 minutes (i.e., within the next 1/2 an hour)?
Exercise 32.16. Air traffic control. Air traffic control stations often have
insufficient numbers of air traffic controllers, sometimes just one person on duty.
In a recent study, a lone air traffic controller is managing an airstrip in which
32.4.3 Advanced
Exercise 32.17. Consider an Exponential random variable X with parameter
λ > 0. Let Y = bXc, which means we get Y by rounding X down to the
nearest integer (in particular, Y itself is a discrete random variable, because Y
is always an integer). For example, if X = 7.2, then Y = 7. If X = 12.9999,
then Y = 12. If X = 5.01, then Y = 5, etc.
So the mass of Y is exactly P(Y = y) = P(y ≤ X < y + 1).
a. Find an expression for the mass of Y . (Your expression will have λ in it;
i.e., just integrate to find the value of P (y ≤ X < y + 1), and then simplify.)
b. Do you recognize the mass of Y ? (Yes, you should!) What type of
random variable is Y ? What are the parameters of Y ? (Hint: Y is one of the
types of named discrete random variables.)
Exercise 32.19. A store manager is very impatient at the start of the day. Let
X be the time (in minutes) until his first customer arrives. The workers at the
store decided to find a way to measure his impatience level as a function of X:
g(X) = aX 2 + bX + c,
Exercise 32.20. Suppose that the times between customers at a store are independent Exponential random variables, with an average of 2 minutes between
consecutive customers. Let X be the time until the 3rd customer arrives.
a. Find E(X).
b. Find Var(X).
c. Find the density of X.
Chapter 33
Gamma Random Variables
I have noticed that people who are late are often so much jollier than the people
who have to wait for them.
—E. V. (Edward Verrall) Lucas
An employee wonders how long it will take until the fourth customer arrives at
the store. How is this related to the length of time until the first arrival? Is
the time to the fourth arrival always four times as long as the time to the first
arrival? Why or why not?
33.1 Introduction
A Negative Binomial random variable is the sum of r independent Geometric
random variables, i.e., the number of (discrete) trials until the rth success oc-
curs. The Gamma random variable has a similar motivation, but in a continuous
setting. A Gamma random variable is the sum of r independent Exponential
random variables. If an Exponential random variable is viewed as the waiting
time until the first event occurs, then a Gamma random variable is the wait-
ing time until the rth event occurs. Thus, if X1, X2, . . . , Xr are independent
Exponential random variables that each have parameter λ, and if we define
X = X1 + X2 + · · · + Xr,
then X is a Gamma random variable with parameters r and λ.
The parameters:
The λ for the Gamma distribution is the same as the one for the Exponential
distribution. The r parameter is often called the “shape” parameter, and λ is
the “scale” parameter.
Density:
f_X(x) = (λ^r/Γ(r)) x^{r−1} e^{−λx} for x > 0, and f_X(x) = 0 otherwise,
where Γ(r) = (r − 1)! since r is a positive integer.
When r = 1, a Gamma random variable is just an Exponential random vari-
able.
CDF:
F_X(x) = 1 − e^{−λx} Σ_{j=0}^{r−1} (λx)^j/j! for x > 0, and F_X(x) = 0 otherwise.
Expected value formula:
E(X) = r/λ
Variance formula:
Var(X) = r/λ2
X ∼ Gamma(r, λ).
E(X) = E(X1 + · · · + Xr) = E(X1) + · · · + E(Xr) = 1/λ + · · · + 1/λ = r/λ.
Var(X) = Var(X1 + · · · + Xr) = Var(X1) + · · · + Var(Xr) = 1/λ² + · · · + 1/λ² = r/λ².
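Since a Gamma(r, λ) random variable is just a sum of r independent Exponential(λ) variables, the boxed CDF formula can be checked by simulation. A Python sketch (ours; the choices r = 4, λ = 1/5, and x = 30 are arbitrary, though they reappear in an example below):

    import random
    from math import exp, factorial

    random.seed(5)
    r, lam, x = 4, 0.2, 30.0
    # Each Gamma sample is a sum of r Exponential(lam) waiting times.
    sums = [sum(random.expovariate(lam) for _ in range(r))
            for _ in range(100_000)]
    empirical = sum(s <= x for s in sums) / len(sums)
    formula = 1 - exp(-lam * x) * sum((lam * x)**j / factorial(j)
                                      for j in range(r))
    print(empirical, formula)    # both close to 0.8488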
Some examples of densities for Gamma random variables with various values
of r and λ are given in Figures 33.1, 33.2, and 33.3.
33.2 Examples
d. How long should Shirley expect to have for her coffee break before it is
her turn to wait on the customers?
E(X) = r/λ = (4 customers)/(1 customer / 5 minutes) = 20 minutes.
e. What is the standard deviation in the amount of time Shirley will have
for her coffee break?
Var(X) = r/λ² = 4/(1/5)² = 100 minutes²
σ_X = √100 = 10 minutes
f. What is the density of the time until the 4th customer of the day arrives?
f_X(x) = ((1/5)^4/Γ(4)) x^{4−1} e^{−(1/5)x} for x > 0, and f_X(x) = 0 otherwise,
which, since Γ(4) = 3! = 6, simplifies to
f_X(x) = (1/3750) x^3 e^{−(1/5)x} for x > 0, and f_X(x) = 0 otherwise.
g. What is the probability that she will have at least 30 minutes before she
has her first customer (the fourth customer of the day)?
The probability is
P(X ≥ 30) = 1 − F_X(30) = e^{−(1/5)(30)} Σ_{j=0}^{3} ((1/5)(30))^j/j!
= e^{−6}(1 + 6 + 18 + 36)
= 61e^{−6}
= 0.1512
phone again to start another call (i.e., there are no gaps in between the calls).
Thus, if he conducts r phone calls in a row, the total amount of time he spends
on the telephone is X1 + · · · + Xr , where the Xj ’s are independent, and each
Xj has the same density.
We see that the total time it takes David to make r calls is exactly a Gamma
random variable X = X1 + · · · + Xr with parameters λ = 3 and r.
The expected time that it takes him to make r calls is
E(X) = r/3.
The variance of the time that it takes him to make r calls is
Var(X) = r/9,
and thus the standard deviation is √(r/9).
We can also compute the probability that he completes two calls within the
first 1 hour, by using the density of X given above. When r = 2 and λ = 3,
then the density of X becomes
f_X(x) = 3² x^{2−1} e^{−3x}/(2 − 1)! = 9xe^{−3x} for x > 0,
and fX (x) = 0 otherwise. Completing two calls in the first hour means X < 1;
thus
P(X < 1) = ∫_0^1 9xe^{−3x} dx
= [9xe^{−3x}/(−3) − 9e^{−3x}/9]_{x=0}^{1}   (using integration by parts)
= −3e^{−3} − e^{−3} − (−1)
= 1 − 4e^{−3}
= 0.8009
Example 33.3. Consider 300 students who are waiting for service at the registrar's office. Assume that their waiting times are independent Exponential random variables, and each waiting time (in hours) has density f_X(x) = 2e^{−2x}
for x > 0, and fX (x) = 0 otherwise. Find the probability that the 300 students
collectively (i.e., altogether) spend between 145 and 152 hours waiting for their
appointments.
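The solution is not worked out above; as a sketch (our own, using the Normal approximation previewed in later chapters, and assuming, as the totals suggest, that the waiting times are measured in hours), the total wait is Gamma(300, 2), with mean 300/2 = 150 hours and variance 300/4 = 75:

    from math import erf, sqrt

    def std_normal_cdf(z):
        # P(Z <= z) for a standard Normal, via the error function.
        return 0.5 * (1 + erf(z / sqrt(2)))

    mu, sd = 150.0, sqrt(75.0)   # mean and SD of the Gamma(300, 2) total
    approx = (std_normal_cdf((152 - mu) / sd)
              - std_normal_cdf((145 - mu) / sd))
    print(approx)                # roughly 0.31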
33.3 Exercises
33.3.1 Practice
Exercise 33.1. Egg laying. (See Exercise 32.1.) Chickens at Rolling Mead-
ows Farm lay an average of 18 eggs per day. The farmer has rigged a fancy
monitoring device to the nesting boxes so that he can monitor exactly when
the hens lay their eggs. Assume that no 2 eggs will be laid at exactly the same
time and that the eggs (and chickens) are independent from each other. The
farmer wants to know how long he will have to wait (in minutes) for the next
half-dozen (6) eggs to be laid so that he can bake a chocolate cake if he starts
monitoring at the first rooster crow in the morning.
d. What is the expected length of time (in years) between now and when
the third hurricane that is category 4 or stronger will come?
e. What is the variance in this length of time?
f. What is the probability density function for the length of time before the
third hurricane that is category 4 or stronger will come? Write your answer in
function form and show a graph.
g. What is the CDF for the length of time before the third hurricane that is
category 4 or stronger? Write your answer in function form and show a graph.
h. What is the probability that the third hurricane that is category 4 or
stronger will arrive in the next 10 years?
Exercise 33.3. Flight delays. The time (in minutes) until a person’s flight
departs at an airport has density
f_X(x) = (1/45) e^{−x/45} for x > 0,
and fX (x) = 0 otherwise.
Exercise 33.4. Waiting for a ride. The waiting time for rides at an amuse-
ment park has an Exponential distribution with average waiting time of 1/2 an
hour (assume that the waiting times are independent).
a. If a person rides 5 rides, what is the expected amount of time that the
person spends waiting in line?
b. If a person rides 5 rides, what is the standard deviation of the time that
the person spends waiting in line?
c. Find the probability that the person spends more than 1 hour altogether
while waiting for two rides (i.e., that their actual waiting time is longer than
the expected waiting time).
Exercise 33.5. Waiting for a bus. A student waits for a bus 10 times during
a week. Let X be the total number of hours that the student waits. Assume that the
waiting times are independent Exponential random variables, each with average
30 minutes.
a. What is his expected time spent waiting for the 10 buses altogether?
b. What is the standard deviation of his waiting time for the 10 buses
altogether?
Exercise 33.6. Homework. A student estimates that the time needed to solve
each homework problem is Exponentially distributed and is independent of all
the other homework problems. Each problem takes, on average, 15 minutes to
solve.
Exercise 33.7. Waiting at the store. The time that each customer spends
in a grocery store line is Exponential with average waiting time 3 minutes. If I
am the 7th customer in line, how long do I expect to wait until all 7 of us have
had our groceries processed?
Exercise 33.9. Happy birthday. It is your birthday and you are waiting for
someone to write a “Happy Birthday” message on your Facebook wall. Your
waiting time is approximately Exponentially distributed with average waiting
time of 10 minutes between such postings; assume that the times of the postings
are independent.
What is the probability that you have to wait 24 minutes or less until you
get your second Happy Birthday message?
a. What is the density of the time that you spend, if you play two games in
a row?
b. What is the probability that you will be finished with two games in a
total of five minutes or less?
otherwise, f_X(x) = 0. He also estimates that there are 500 meteors in this
shower. How many minutes does he expect to be watching this meteor shower?
Exercise 33.12. Starting a band. Five friends decide to start a band. They
each start to practice their own instruments until they can perform their first
33.3.2 Extensions
Exercise 33.13. Lots of homework. Kelsey gets back to her apartment and
immediately starts all her homework assigned that day for her three classes. The
time to finish each is an Exponential random variable, with average 30 minutes;
these three times are independent. What is the probability that it takes her
more than 90 minutes (altogether) to complete the three assignments?
Exercise 33.15. Waiting for a therapist. The waiting time to see a thera-
pist is Exponential with an average of 10 minutes during each visit. What is the
probability that, during 3 separate visits to the therapist, a patient spends a
total of 40 minutes or more waiting?
33.3.3 Advanced
Exercise 33.16. If X is a Gamma random variable with r = 2, show that X
does not have the memoryless property.
Chapter 34
Beta Random Variables
What percentage of students pass the SOA/CAS P/1 actuarial exam?
34.1 Introduction
The Beta distribution deals with percents, proportions, or fractions. The nota-
tion for the Beta distribution looks like:
X ∼ Beta(α, β).
The density is
f_X(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1} for 0 ≤ x ≤ 1, and f_X(x) = 0 otherwise.
When α and β are both 1, a Beta random variable is simply a Uniform random
variable.
Unfortunately, in general, there is no nice shortcut for the Beta cumulative
distribution function. Since the power of x depends on the parameters α and β,
the density (and, therefore, its integral) depends essentially on those parameters.
Thus, a concise form for the CDF is not available.
The density has a familiar shape for many pairs of parameters. We give
some examples. The plots are only drawn for the values 0 ≤ x ≤ 1 because
the density equals 0 elsewhere. We give four density and CDF example plots in
Figures 34.1, 34.2, 34.3, and 34.4.
Figure 34.1: When α = 1 and β = 1, (left) the density f_X(x) = 1, and
(right) the CDF F_X(x) = x of a Beta(1, 1) random variable.
Figure 34.2: When α = 2 and β = 1, (left) the density fX (x) = 2x, and
(right) the CDF FX (x) = x2 of a Beta(2, 1) random variable.
Figure 34.3: When α = 1 and β = 2, (left) the density fX (x) = 2(1 − x),
and (right) the CDF FX (x) = x(2 − x) of a Beta(1, 2) random variable.
Figure 34.4: When α = 2 and β = 2, (left) the density fX (x) = 6x(1 − x),
and (right) the CDF FX (x) = x2 (3 − 2x) of a Beta(2, 2) random variable.
34.2 Examples
Example 34.1. A soda company distributor wants to figure out how long he
should wait to deliver more sodas to a particular convenience store. The store’s
stock of sodas in a month can be modeled by a Beta distribution with α = 2
and β = 5, where X is the proportion of sodas that will be purchased.
[Figure: the density f_X(x) = 30x(1 − x)^4 (left) and the CDF F_X(x) (right) of the Beta(2, 5) random variable X.]
The CDF FX (x) is 0 for x < 0 and is 1 for x > 1. The interesting portion
of FX (x) is for 0 ≤ x ≤ 1. In this region, it is profitable to use u-substitution,
because the (1 − x) portion of the density has a higher power (namely, 4) than
the x portion of the density (namely, 1). So we calculate, for 0 ≤ a ≤ 1,
F_X(a) = ∫_0^a 30x(1 − x)^4 dx.
E(X) = 2/(2 + 5) = 2/7 = 0.2857.
d. What is the variance and standard deviation of the proportion of sodas
that will be purchased?
Var(X) = (2)(5)/((2 + 5)²(2 + 5 + 1)) = 5/196 = 0.0255,
σ_X = √(5/196) = 0.1597.
e. What is the probability that more than 3/4 of the sodas the distributor
delivered will be purchased?
P(X > 3/4) = ∫_{3/4}^1 30x(1 − x)^4 dx
= 6(1/4)^5 − 5(1/4)^6
= 19/4096
= 0.0046
f. What is the probability that between 1/2 and 3/4 of the sodas the dis-
tributor delivered will be purchased?
The probability is P(1/2 ≤ X ≤ 3/4) = F_X(3/4) − F_X(1/2). Using the
results from part e, we know F_X(3/4) = 1 − P(X > 3/4) = 1 − 19/4096 =
4077/4096. Also, we have
F_X(1/2) = 1 − P(X > 1/2) = 1 − (6(1/2)^5 − 5(1/2)^6) = 1 − 7/64 = 3648/4096,
so it follows that
P(1/2 ≤ X ≤ 3/4) = 4077/4096 − 3648/4096 = 429/4096 = 0.1047.
g. The distributor was already told that more than half of the sodas delivered
earlier this month have been purchased, so what is the conditional probability
that less than 3/4 of the sodas will be purchased?
P(X < 3/4 | X > 1/2) = P(1/2 < X < 3/4)/P(X > 1/2) = (429/4096)/(448/4096) = 429/448 = 0.9576.
34.3 Exercises
34.3.1 Practice
Exercise 34.1. Qualifying exam. The proportion of people who pass a pro-
fessional qualifying exam on the first try has a Beta distribution with α = 3
and β = 4.
a. What is the expected proportion of people who will pass on the first try
at the next exam?
b. What is the standard deviation in the proportion of people who will pass
on the first try at the next exam?
Exercise 34.2. Shelf space. A grocery store chain is trying to decide how
much shelf space to devote to organic produce. If they don’t stock enough, their
more upscale customers will shop at the competing grocery store. If they stock
too much, then much of the expensive organic produce will have to be thrown
out when it expires. The percentage of organic produce which is purchased
during the week after delivery can be modeled with a Beta distribution with
α = 4 and β = 3.
34.3.2 Extensions
Chapter 35
Normal Random Variables
Sir,—It has been wittily remarked that there are three kinds of falsehood: the
first is a 'fib,' the second is a downright lie, and the third and most aggravated
is statistics.
—"Letter to the Editor" of The National Observer, written by T. Mackay,
dated June 8, 1891, published June 13, 1891
What is the probability that a randomly chosen student is more than 6 feet tall?
What is the probability that a randomly selected student will be between 5 and 6
feet tall? What is the cutoff height that separates the tallest 5% of people from
the rest of the population?
[Figure 35.1: the density of a standard Normal random variable.]
35.1 Introduction
We encounter Normal random variables in many situations. The shape of the
density of a standard Normal random variable, seen in Figure 35.1, is the famil-
iar bell curve that is so often associated with randomly distributed quantities
and measurements. Many things that are clustered near their expected value
are Normally distributed. In fact, the bell curve looks like a bell because most
of the density is concentrated in a bell shape near the expected value. Also,
as with a bell, only a relatively small amount of the density is concentrated far
from the center. (The density of a Normal random variable is often referred to
as a bell curve.)
Weights and heights of many animals, plants, as well as manufactured prod-
ucts of all shapes and sizes, are often Normally distributed. The growth of
plants, the volumes of liquids in bottles, and many other biological, physical,
and financial random variables are Normally distributed.
Normal random variables
The common thread: Concentration (often of some kind of mea-
surement) near the expected value, so the distribution is like a bell
curve. For instance, the weight, height, lifespan, and intelligence scores of
living things are often Normally distributed. The actual volume of soda in a
can filled by a machine in a factory, the high temperatures for June 17th in
a given community, and many other variables in the biological, physical, and
financial world, are all Normally distributed.
Things to look for: Random quantities that are “close” to their expected
values a relatively large portion of the time.
The variable: X = the value of the measurement.
The parameters: µ_X, the mean, and σ_X², the variance.
Perhaps Normal random variables are the most prevalent kind of random
variables in everyday applications. The Normal distribution is often referred
to as the Gaussian distribution. (It is named for the 19th century mathemati-
cian Gauss; http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss .) Since
we experience Normal random variables every day, why have we not practiced
working with the density of the Normal distribution yet? The reason is this:
The Normal distribution has a density that we can easily write down, but we
cannot easily integrate the density. In fact, we do not have any closed-form
way to express probabilities that arise from the Normal distribution. Similarly,
we cannot even write down the cumulative distribution function of the Normal
distribution in closed form.
For Normal random variables with parameters µ_X and σ_X², as we will see
in Section 35.2, the expected value is exactly E(X) = µ_X and the variance is
exactly Var(X) = σ_X². As we will see in later chapters, the Normal distribution
plays a crucial role in describing the asymptotic behavior of the sum and of the
average of a large collection of random variables that are either independent
or loosely dependent. Many limiting theorems have been discovered that are
associated with the Normal distribution.
Another nice property of Normal random variables is that the sum of inde-
pendent Normal random variables is also a Normal random variable (this will
be established in the next chapter), and also if X is Normal, then aX + b is
Normal too (to be seen in Section 35.2). These properties of Normal random
variables make them very desirable to work with, even though we must consult
a chart to look up the probabilities associated with Normal random variables
(since we cannot integrate the density by hand).
The notation for a random variable X with Normal distribution is:
X ∼ N(µ_X, σ_X²).
Since there is not a closed formula for the CDF of the Normal distribution,
we are forced to read all of the probabilities for the Normal distribution from
a table. This is not because of any inadequacy in our mathematical ability
or understanding; nobody has such a closed formula. So mathematicians and
statisticians and practitioners do exactly the same thing—or one can use a
calculator (if it has a button for the Normal distribution), or a program on a
computer.
A Normal random variable with parameters 0 and 1 (respectively) is used
so frequently that we have a special name for it: a standard Normal random
variable. When working with Normal random variables, whenever we write Z,
we are referring to a standard Normal random variable.
Definition 35.1. Density of a Standard Normal Random Variable
We say Z is a standard Normal random variable if it has parameters 0 and 1
respectively. A standard Normal random variable Z has density
f_Z(z) = (1/√(2π)) e^{−z²/2} for −∞ < z < ∞.
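Although the density has no elementary antiderivative, its integral is available numerically. For instance, Python's math.erf gives P(Z ≤ z) = (1 + erf(z/√2))/2, so the table's entries can be reproduced (a sketch of ours):

    from math import erf, sqrt

    def std_normal_cdf(z):
        # P(Z <= z) for a standard Normal random variable.
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(std_normal_cdf(0.75))  # 0.7734, matching the table
    print(std_normal_cdf(3.09))  # 0.9990, the table's last row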
One of the very nice things about Normal random variables—especially from
the perspective of using the Normal distribution—is that we do not need to
calculate integrals in order to get probabilities from the density. We are unable
to integrate the density in closed form. Instead, we provide a table that gives
(cumulative) probabilities of the form P(Z ≤ z), for z's in the range 0.00 to
3.09. The table is included in this chapter and also at the back of the book.
(The authors assume that a significant portion of our readers will be taking the
Society of Actuaries P exam / Casualty Actuarial Society exam 1; only P(Z ≤ z)
for z ≥ 0 are given.)
It might appear that half of the values we need, P(Z ≤ z) for z < 0, are
missing from the table, because we often need to calculate P(Z ≤ z) for negative
values of z too! Fortunately, the density of Z is symmetric around µ_Z = 0 (i.e.,
about the origin), so we can calculate the missing probabilities by using the
complementary probabilities. We will practice this technique many times.
We also emphasize that only the probabilities for standard Normal random
variables are given in the table. Of course, for every possible combination
of µ_X and σ_X², there is a unique Normal distribution. We cannot provide a
separate Normal table for every possible Normal distribution, so in Section 35.2
we will learn how to standardize or normalize whatever Normal curve we need,
by doing a transformation of any Normal random variable X to the standard
Normal random variable Z.
It is helpful to see how changing µX and σX will affect the shape of fX (x).
The shape is always a bell curve, with area 1 below the curve. The curves always
stretch from negative infinity to positive infinity, but there is never much area
once you get far beyond the middle of the curve. For smaller standard devia-
tions, the curve will be taller and skinnier. For larger standard deviations, the
curve will be shorter and wider. In Figure 35.2, we plot the density f_X(x) when
σ_X² = 1 is fixed but µ_X takes on values −2, 0, and 1, respectively. Changing
µ_X just shifts the density to the left or right. The curve with µ_X = −2 is on
the left; the curve with µ_X = 0 is in the middle; and the curve with µ_X = 1 is
on the right.
Figure 35.2: Three examples of the density of Normal random variables with
µ_X = −2 (loosely dashed), µ_X = 0 (solid line), and µ_X = 1 (densely dashed),
respectively. In each case, σ_X² = 1.
In Figure 35.3, we see that smaller values of σX give thin, tall densities,
because such densities are very tightly concentrated about µX . In contrast,
large values of σX give wide, short densities, because such densities are not very
concentrated around µX at all.
Figure 35.3: Three examples of the density of Normal random variables with
µX = 0 in each case, and σX = 0.5 (the tall, well concentrated one, drawn with
densely dashed lines), σX = 1 (the standard Normal, drawn with a solid line),
and σX = 3 (the short, very widely distributed one, drawn with loosely dashed
lines), respectively.
We now verify the expected value and the variance, first for the standard
Normal random variable Z, i.e., we now prove that Z has
expected value 0 and variance 1 (and thus standard deviation 1 too). To see
this, we note that
E(Z) = ∫_{−∞}^{∞} z (e^{−z²/2}/√(2π)) dz,   (35.1)
but
∫ z (e^{−z²/2}/√(2π)) dz = −e^{−z²/2}/√(2π).
Taking limits as z → −∞ and z → ∞, we verify that E(Z) = 0.
To see Var(Z) = 1, we write Var(Z) = E(Z²) − (E(Z))²; since E(Z) = 0, then
Var(Z) = E(Z²) = ∫_{−∞}^{∞} z² (e^{−z²/2}/√(2π)) dz.
Using integration by parts, with u = z and du = dz, and dv = z (e^{−z²/2}/√(2π)) dz
and v = −e^{−z²/2}/√(2π), we obtain
Var(Z) = [−z e^{−z²/2}/√(2π)]_{z=−∞}^{∞} − ∫_{−∞}^{∞} (−e^{−z²/2}/√(2π)) dz.
The first term is 0 because −z e^{−z²/2}/√(2π) → 0 as z → −∞ and as z → ∞. The
second term is 1 because it is equal to ∫_{−∞}^{∞} f_Z(z) dz, i.e., the integral of the
density of Z over all values of z. Thus Var(Z) = 1. (See Exercise 35.25 for one
additional note.)
More generally, consider a Normal random variable X with parameters µ_X
and σ_X². By Theorem 35.2, σ_X Z + µ_X is Normal too. A Normal random
variable's distribution is completely and uniquely specified by the values of its
two parameters. Since X and σ_X Z + µ_X have the same expected value and
same variance and are both Normal random variables, then X and σ_X Z + µ_X
have the same distribution. Also, σ_X Z + µ_X has mean µ_X and variance σ_X², so
it follows that the parameters of X are actually the mean and variance of X,
respectively. So we have proved:
Theorem 35.3. If X is a Normal random variable with parameters µ_X and σ_X²,
then E(X) = µ_X and Var(X) = σ_X².
Within our proof, the following useful fact was also established (since the
density of a Normal random variable is completely determined by its two pa-
rameters).
Theorem 35.4. If Z is a standard Normal random variable, and if X is Normal
with expected value µ_X and variance σ_X², then X has the same distribution
as σ_X Z + µ_X.
Theorem 35.6. If X is a Normal random variable and r, s are any two constants, then rX + s is a Normal random variable with standard deviation |r|σ_X
and expected value rµ_X + s.
Figure 35.4: Scaling a Normal random variable essentially just changes the
labeling in both the x- and y-dimensions. Left: The density of a Normal random
variable with µX = 2 and σX = 3. Right: The density of a standard Normal
random variable, i.e., with µX = 0 and σX = 1. Notice that the shape stays
the same, but the center, spread, and height have changed.
[Figure 35.5: For example, in this graph, the shaded area under the standard Normal density is P(Z ≤ 0.75) = 0.7734.]
Now we practice using the standard Normal table. We have four types of
ways to use the table. In every case, we encourage you to quickly sketch a
Normal curve—accuracy is not required at all—because we just want you to see
whether the probability that results is bigger or smaller than 1/2. This will be
easy to tell when looking at the picture, because either more or less than 1/2 of
the area under the curve is shaded. Remember that the probability is just the
area under the curve of the density function.
In the next several examples, we will demonstrate how to use the standard
Normal table to calculate probabilities in several different situations.
Example 35.8. If z ≥ 0 and we want P (Z > z), we first look up the comple-
ment, P (Z ≤ z), in the table, and then we use P (Z > z) = 1 − P (Z ≤ z). For
instance, see Figure 35.6.
Figure 35.6: Left: We want the probability P (Z > 0.88), but it is not in
the table. Right: We find P (Z ≤ 0.88) = 0.8106 in the table. So the desired
probability is P (Z > 0.88) = 1 − P (Z ≤ 0.88) = 1 − 0.8106 = 0.1894.
Figure 35.7: Both plots have the same shaded area. Left: The shaded area
is P (Z ≥ −1.10). Right: The shaded area is P (Z ≤ 1.10). Each one is 0.8643.
Figure 35.8: Upper left: The shaded area is P (Z ≤ −0.48). Upper right:
The shaded area is P (Z ≥ 0.48) (a mirror image). Bottom: The shaded area
is P (Z ≤ 0.48), which is 0.6844 (see the Normal table). So the upper left and
upper right ones are each P (Z ≥ 0.48) = 1−P (Z ≤ 0.48) = 1−0.6844 = 0.3156.
Finally, for the probability of Z occurring in some given range, we need two
separate probabilities. For P (a ≤ Z ≤ b) = P (Z ≤ b) − P (Z ≤ a), we compute
P (Z ≤ b) and P (Z ≤ a) separately.
The plots of the relevant Normal curves are given in Figures 35.10 and 35.11.
Figure 35.10: Left: The shaded area is P (−1.51 ≤ Z ≤ −0.57). Right: The
shaded area is P (0.57 ≤ Z ≤ 1.51) (mirror image).
[Figure 35.11: the corresponding areas under the standard Normal curve used to complete this computation.]
P(µ_X − σ_X ≤ X ≤ µ_X + σ_X)
= P((µ_X − σ_X − µ_X)/σ_X ≤ (X − µ_X)/σ_X ≤ (µ_X + σ_X − µ_X)/σ_X)
= P(−1 ≤ Z ≤ 1)
= P(Z ≤ 1) − P(Z ≤ −1)
= P(Z ≤ 1) − P(Z ≥ 1)
= P(Z ≤ 1) − (1 − P(Z ≤ 1))
= 0.8413 − (1 − 0.8413)
= 0.6826
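The same calculation for two and three standard deviations gives the familiar 68-95-99.7 rule; a brief Python check (our own sketch):

    from math import erf, sqrt

    def std_normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    for k in (1, 2, 3):
        # P(mu - k*sigma <= X <= mu + k*sigma) = P(-k <= Z <= k)
        print(k, std_normal_cdf(k) - std_normal_cdf(-k))
    # prints approximately 0.6827, 0.9545, 0.9973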
To compute probabilities for a general Normal random variable X, we first
standardize, making the appropriate changes so that the form matches the
Normal table. Finally, we will look up the probability from the table for the
standard Normal distribution. For instance, suppose that the balance of a
randomly selected account at a certain bank is Normally distributed with
expected value µ_X = 1325 dollars and standard deviation σ_X = 250 dollars,
and that Bill's account contains $775.
a. What is the probability that an account will have less money than Bill’s
account?
We let X ∼ N(µ_X = 1325, σ_X² = 62500) be the amount in the selected
account. We compute
P(X < 775) = P((X − µ_X)/σ_X < (775 − 1325)/250)
= P(Z < −2.20)
= P(Z > 2.20)
= 1 − P(Z ≤ 2.20)
= 1 − 0.9861
= 0.0139.
Figure 35.14: Left: The shaded area is P (Z ≤ −2.20) = 0.0139. Right: The
shaded area is P (Z > 2.20) = 0.0139.
b. What is the probability that an account will have more than $1875?
Again let X ∼ N (1325, 62500). Then
P(X > 1875) = P((X − µ_X)/σ_X > (1875 − 1325)/250)
= P(Z > 2.20)
= 1 − P(Z ≤ 2.20)
= 1 − 0.9861
= 0.0139.
See Figure 35.14 (right side) to visualize this as the shaded area under the
Normal curve.
P(X < 1325) = P((X − µ_X)/σ_X < (1325 − 1325)/250)
= P(Z < 0)
= 0.5000
Figure 35.15: The probability P (Z < 0) = 0.5000 is the area under the
curve.
e. What is the probability that an account will have less than $10?
Again let X ∼ N (1325, 62500). Then
P(X < 10) = P((X − µ_X)/σ_X < (10 − 1325)/250)
= P(Z < −5.26)
≈ 0
The reason that P(Z < −5.26) ≈ 0 is that Z is very well concentrated about
the mean; the probability P(Z < z) is practically 0 for very negative values
of z, e.g., for z < −3.
f. What is the probability that an account will have between $1075 and
$1825?
Again let X ∼ N(1325, 62500). Then
P(1075 < X < 1825) = P((1075 − 1325)/250 < (X − µ_X)/σ_X < (1825 − 1325)/250)
= P(−1 < Z < 2)
= P(Z < 2) − P(Z < −1).
Figure 35.16: The probability P (−1 < Z < 2) = 0.8185 is the area under
the curve.
So we get
P (1075 < X < 1825) = 0.9772 − 0.1587 = 0.8185.
So an account has between $1075 and $1825 with probability 0.8185.
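For readers following along on a computer, Python 3.8+ includes statistics.NormalDist, which handles the standardization internally; here is a sketch (ours) reproducing the probabilities of this example:

    from statistics import NormalDist

    X = NormalDist(mu=1325, sigma=250)
    print(X.cdf(775))                 # about 0.0139
    print(1 - X.cdf(1875))            # about 0.0139
    print(X.cdf(1825) - X.cdf(1075))  # about 0.8185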
35.3 "Backward" Normal Problems
In a "backward" problem, we are given a probability and must find the
corresponding cutoff. For instance, suppose we want the balance x0 that only
15% of the accounts exceed, i.e., P(Z > z) = 0.15. Equivalently,
P(Z ≤ z) = 1 − 0.15 = 0.85,
as shown on the right side of Figure 35.17. Looking in the chart, this means
z = 1.04. We still need to convert from Z, the number of standard deviations
Figure 35.17: Left: The shaded area is P (Z > z) = 0.15. Right: The
shaded area is P (Z ≤ z) = 0.85.
away from the mean our account balance is, to actual account balances.
x0 = µX + zσX
x0 = 1325 + (1.04)(250)
x0 = 1585
[Figure 35.18: three panels showing the shaded areas P(Z ≤ z) = 0.20 (left), its mirror image P(Z ≥ −z) = 0.20 (middle), and the complement P(Z ≤ −z) = 0.80 (bottom).]
We know that z will be negative because the desired probability P(Z ≤ z) =
0.20 is less than 0.5 (i.e., is not found on the chart for the standard Normal
table); see Figure 35.18 (left side). The probability is the same if we look at
the mirror image, but now our cut-off value is −z, i.e., P(Z ≥ −z) = 0.20;
see Figure 35.18 (middle). Finally, we need to take the complement of this
probability so that we can look up the value on the table, i.e.,
P(Z ≤ −z) = 1 − 0.20 = 0.80.
(When we use the Normal table to work backward, we go from the inside of
the table, the probabilities, to the outside, the z-values.)
So the Normal table gives us −z = 0.84, i.e., z = −0.84. Now we use this
information about the standard Normal random variable Z to convert the cutoff
from z's into x's. We can relate the cutoffs x and z such that P(Z ≤ z) = 0.20 =
P(X ≤ x) as follows: x = µ_X + zσ_X, so we get
x = 1325 + (−0.84)(250) = 1115.
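Backward problems amount to inverting the CDF, which statistics.NormalDist (Python 3.8+) also supports; a sketch (ours) of the two cutoffs found above:

    from statistics import NormalDist

    Z = NormalDist()                  # standard Normal
    print(Z.inv_cdf(0.85))            # about 1.04
    print(Z.inv_cdf(0.20))            # about -0.84

    X = NormalDist(mu=1325, sigma=250)
    print(X.inv_cdf(0.20))            # about 1115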
Example 35.18. Between what two central values do 40% of the balances fall?
We have
0.40 = P (−z ≤ Z ≤ z)
= 2P (Z < z) − 1 by (35.2)
So 2P (Z < z) = 1.40, and thus P (Z < z) = 0.70. So the table tells us z = 0.52,
or perhaps z = 0.53. We could compromise and use z = 0.525. Thus, the upper
balance is
X = µX + σX z = 1325 + (250)(0.525) = 1456.25,
and the lower balance is
X = µ_X − σ_X z = 1325 − (250)(0.525) = 1193.75.
35.5 Exercises
35.5.1 Practice
Exercise 35.1. Use the Normal table to find the following probabilities starting
from Z. Also sketch a standard Normal curve, and shade the region correspond-
ing to the given probability.
a. P (Z < 1.47)
b. P (Z > 1.47)
c. P (Z < −1.47)
d. P (Z > −1.47)
e. P (Z = −1.47)
Exercise 35.2. Use the Normal table to find the following probabilities starting
from Z. Also sketch a standard Normal curve, and shade the region correspond-
ing to the given probability.
a. P (Z ≤ 0.19)
b. P (Z ≤ 1.90)
c. P (Z ≥ 9.10)
d. P (0.19 < Z < 1.90)
e. P (Z ≥ −1.90)
f. P (Z = −1.90)
a. P (X < 1.62)
b. P (X > −8.49)
c. P (−4 < X < 1)
Exercise 35.4. Assume that X has a Normal distribution with a mean of −1.32
and a standard deviation of 0.34. Use the Normal table to find the following
probabilities starting from X. Also sketch a standard Normal curve, and shade
the region corresponding to the given probability.
a. P (X > −2)
b. P (X < 2.56)
c. P (1.47 < X < 4.12)
Exercise 35.5. Use the standard Normal table to find the following cut-off
values for Z. Also sketch a standard Normal curve, and shade the region corre-
sponding to the given probability.
a. P (Z < z) = 0.95
b. P (Z > z) = 0.15
c. P (−z < Z < z) = 0.65
Exercise 35.6. Use the standard Normal table to find the following cut-off
values for Z. Also sketch a standard Normal curve, and shade the region corre-
sponding to the given probability.
a. P (Z < z) = 0.5
b. P (Z > z) = 0.87
c. P (−z < Z < z) = 0.25
Exercise 35.7. Assume that X has a Normal distribution with a mean of 2 and
a standard deviation of 3. Use the standard Normal table to find the following
cut-off values for X. Also sketch a standard Normal curve, and shade the region
corresponding to the given probability.
a. P (X < x) = 0.78
b. P (X > x) = 0.21
c. What are the two central values such that 90% of the X values are in the
range between these two numbers?
Exercise 35.8. Assume that X has a Normal distribution with a mean of −1.32
and a standard deviation of 0.34. Use the standard Normal table to find the
following cut-off values for X. Also sketch a standard Normal curve, and shade
the region corresponding to the given probability.
Exercise 35.10. Exam scores. The students in my class have Exam 1 scores
which are Normally distributed with a mean of 75 and a standard deviation of
9. If a student is selected at random,
a. What is the probability the student will have a score of more than 90 (an
A)?
b. What is the probability the student will have a score of less than 60 (an
F)?
c. What is the probability a student will have a score between 80 and 89
(the B range)?
d. What is the range of scores for the middle 50% of the student scores?
e. What is the range for the central 99.7% of the scores?
f. What is the lower cut-off for the top 3/4 of the students?
Exercise 35.11. Sugary candy. The quantity of sugar X (measured in grams)
in a randomly selected piece of candy is Normally distributed, with expected
value E(X) = µ_X = 22 and variance Var(X) = σ_X² = 8. Find the probability
that a randomly selected piece of candy has less than 20 grams of sugar.
Exercise 35.12. Annual precipitation. Assume that the annual precipi-
tation in a student's hometown is Normally distributed, with expected value
µ_X = 36.3 inches and variance σ_X² = 8.41. A rare species of frog lives in the
town. This rare species of frog is known to reproduce during the year only if
the annual precipitation is between 35 and 39 inches. What is the probability
that the species of frog is able to reproduce this year?
Exercise 35.13. Getting to class. The distance a student lives (in miles)
from their probability classroom is approximately Normally distributed with a
mean of 3 miles and a standard deviation of 1.2 miles.
a. How far away do the closest 10% of students live?
b. What is the probability that a student will live too close to get a parking
permit (less than 1 mile away)?
c. What is the probability that a student will live further away than 5 miles
or less than 1 mile away?
Exercise 35.14. Movie length. Children’s movies run an average of 98 min-
utes with a standard deviation of 10 minutes. You check out a movie, selected
at random without reading the running time on the box, from the library to
entertain your kids so you can study for your probability test. Assume that
your kids will be occupied for the entire length of the movie.
a. What is the probability that your kids will be occupied for at least the 2
hours you would like to study?
b. What is the range for the bottom quartile (lowest 25%) of time they will be
occupied?
c. What are the limits for the central 95% of times your kids will be occu-
pied?
Exercise 35.15. Weighing beagles. Let X be the weight (in pounds) of a
beagle. Then X is Normally distributed with µ_X = 17.2 pounds and σ_X² = 3.2.
Exercise 35.22. Drag race. In a quarter-mile drag race, the average time of
completion is 13.2 seconds, with standard deviation of 0.11 seconds. Find the
probability that a car completes the race in 13 seconds or less.
a. What is the probability that the box you buy (chosen at random) will
weigh less than 15.5 ounces?
b. What is the probability that the box you buy will weigh between 16 and
16.25 ounces?
c. What is the probability that the box you buy will weigh more than 17
ounces?
d. What is the upper cut-off for the 90th percentile (bottom 90%) of weights?
e. What is the range for the middle 60% of weights?
f. What is the range for the upper 2.5%?
35.5.2 Extensions
Exercise 35.24. Female heights. Assume that the height of an American
female is Normal with expected value µ = 64 and standard deviation σ = 2.5.
35.5.3 Advanced
Exercise 35.25. We have not actually verified here that the density of a Normal
random variable integrates to 1, because the usual argument to do this requires
a perhaps surprising conversion to polar coordinates and a double integral; see,
for instance, Section 5.4 of Ross [5] or Section 5.3 of Pitman [4]. Construct such
an argument that ∫_{−∞}^{∞} f_Z(z) dz = 1.
Chapter 36
Sums of Independent Normal Random Variables
Probability is a mathematical discipline whose aims are akin to those, for example, of geometry or analytical mechanics. In each field we must carefully
distinguish three aspects of the theory:
(a) the formal logical content,
(b) the intuitive background,
(c) the applications.
The character, and the charm, of the whole structure cannot be appreciated
without considering all three aspects in their proper relation.
—An Introduction to Probability Theory and its Applications, Volume 1, 3rd
edition by William Feller (Wiley, 1971)
What is the total amount of food eaten by all of the students in a dining hall on
a given day? In a given month? In a given year? How can we scale our results
to compare the average, variance, and standard deviation of these food totals?
In the case in which the Normal random variables are not only independent,
but also all have the same expected value µ and the same variance σ², the boxed
formula above has the following nice form:
X1 + · · · + Xn is Normal with expected value nµ and variance nσ².
These two nice, boxed results have analogous versions, if we subtract the
expected value from the sum of the random variables and divide by the standard
deviation. In the case of the first version, we get:
(X1 + · · · + Xn − (µ1 + · · · + µn))/√(σ1² + · · · + σn²) is a standard Normal random variable.
Example 36.5. In Exercise 35.11, we encountered a type of candy such that the
quantity of sugar X (measured in grams) in a randomly selected piece of candy
is Normally distributed, with expected value E(X) = µ_X = 22 and variance
Var(X) = σ_X² = 8.
Suppose that Hector eats ten pieces of this candy on Halloween night. What
is the probability that he consumed more than 200 grams of sugar?
If X1 , . . . , X10 denote the amount of sugar in the first, second, . . . , tenth
pieces of candy, respectively, then the total amount of sugar that he ate is
X1 + · · · + X10 . We know that the amount of sugar in the jth piece of candy is a
Normal random variable Xj ∼ N(µ = 22, σ² = 8), so the total amount of sugar
is also a Normal random variable with expected value 10µ = (10)(22) = 220
and variance 10σ² = (10)(8) = 80. So
Z = (X1 + · · · + X10 − 220)/√80
is a standard Normal random variable. With this in mind, we compute the
probability that Hector consumed more than 200 grams of sugar. The desired
probability is:
P(X1 + · · · + X10 > 200) = P((X1 + · · · + X10 − 220)/√80 > (200 − 220)/√80)
= P(Z > −2.24)
= P(Z < 2.24)
= 0.9875
So there is a 98.75% chance that Hector ate more than 200 grams of sugar
on Halloween night.
Example 36.6. Consider a species of ant whose body weight (in milligrams)
is Normally distributed with mean µ = 5 and variance σ² = 1.3.
Suppose that an ant colony contains 100,000 of these ants. What is the
probability that the ant colony weighs more than 500,500 milligrams?
Let X1 , . . . , X100,000 denote the weights of the 100,000 ants. So the total
weight of the colony is X1 + · · · + X100,000 . The weight of each ant is Normally
distributed, with Xj ∼ N(µ = 5, σ² = 1.3) for each j. Thus the total weight
of the colony is also a Normal random variable with expected value 100,000µ =
(100,000)(5) = 500,000 and variance 100,000σ² = (100,000)(1.3) = 130,000.
We use this information to shift and scale the sum of the weights. The sum of
the weights, after this adjustment, is
Z = (X1 + · · · + X100,000 − 500,000)/√130,000
is a standard Normal random variable.
Now we calculate the probability that the colony weighs more than 500,500
milligrams:
P(X1 + · · · + X100,000 > 500,500) = P((X1 + · · · + X100,000 − 500,000)/√130,000 > (500,500 − 500,000)/√130,000)
= P(Z > 1.39)
= 1 − P(Z ≤ 1.39)
= 1 − 0.9177
= 0.0823
Thus, there is an 8.23% chance that the weight of the ant colony exceeds 500,500
milligrams.
0.95 ≤ P(X ≤ a)
= P((X − 1.125)/0.25 ≤ (a − 1.125)/0.25)
= P(Z ≤ (a − 1.125)/0.25).
Looking at the Normal table, we therefore need (a − 1.125)/0.25 ≥ 1.65,
which simplifies to
a ≥ (0.25)(1.65) + 1.125 = 1.5375.
So the feature display must be at least 1.5375 inches deep, in order to accom-
modate the book of the month at least 95% of the time.
Example 36.8. In the scenario from the previous example, within the store
itself, the bookseller wants to be able to place n books on a regular shelf that is
72 inches long, and still be 95 percent (or more) sure that all of these n books
will fit. What is the largest value of n that she can plan to place on each shelf?
Thus, looking at the chart of the Normal distribution, we see that we must have
$$1.645 \le \frac{72 - 1.125n}{\sqrt{0.0625n}},$$
or equivalently (rearranging the terms),
$$0 \ge 1.125n + 1.645\sqrt{0.0625n} - 72.$$
To find the value of n where equality is reached (i.e., where 0 is obtained) in
the previous equation, we can write $x = \sqrt{n}$, with $a = 1.125$, $b = 1.645\sqrt{0.0625}$,
and $c = -72$. Then we just need to remember that the quadratic equation
$0 = ax^2 + bx + c$ has solutions $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. In this case, we know x (i.e., $\sqrt{n}$) is
positive, so we only need the larger of the two solutions. So we get $\sqrt{n} = 7.8193$
and $n = 61.1416$. Thus, the maximum value of n that the bookseller can use,
and still maintain the desired conditions, is n = 61. In other words, if the
bookseller puts 61 books onto a shelf, she can still be at least 95% confident
that they will all fit onto the 72 inch shelf.
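The quadratic step above is easy to reproduce numerically; here is a short Python sketch (our own check, not part of the text):

    # Solve 0 = 1.125*n + 1.645*sqrt(0.0625*n) - 72 via the substitution x = sqrt(n)
    import numpy as np

    a, b, c = 1.125, 1.645 * np.sqrt(0.0625), -72
    x = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)   # keep the positive root only
    print(x, x**2)   # about 7.8193 and 61.14, so n = 61 books fit with 95% confidence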
Example 36.10. As in Exercise 35.24, assume that the height (in inches) of an
American female is Normal with expected value µ1 = 64 and standard deviation
σ1 = 2.5. Also assume that the height of an American male is Normal with
expected value µ2 = 69 and standard deviation σ2 = 3.0. Let X denote the
female’s height and Y denote the male’s height. What is the probability that a
randomly selected male is taller than a randomly selected female?
The expected value of Y − X is
E(Y − X) = E(Y ) − E(X) = 69 − 64 = 5.
The variance of Y − X is
Var(Y − X) = Var(Y ) + Var(X) = 3.02 + 2.52 = 15.25.
The standard deviation of Y − X is $\sqrt{\mathrm{Var}(Y - X)} = \sqrt{15.25} = 3.91$. Since Y − X is a Normal random variable, the desired probability is
$$P(Y > X) = P(Y - X > 0) = P\left(Z > \frac{0 - 5}{3.91}\right) = P(Z > -1.28) = P(Z < 1.28) = 0.8997.$$
By the same reasoning, sums of three or more independent Normal random variables are Normal as well; e.g., $(X_1 + X_2) + X_3$ is Normal.
The rest of the argument is dedicated to showing that $F_{Z_1 + cZ_2}(a)$ has this form.
We compute
$$F_{Z_1 + cZ_2}(a) = P(Z_1 + cZ_2 \le a) = \int_{-\infty}^{\infty} \int_{-\infty}^{(a - z_1)/c} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_1^2}{2}\right) \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_2^2}{2}\right) dz_2\, dz_1.$$
We want to reverse the order of integration, so we must get rid of the $z_1$'s in the
range of $z_2$. Thus, we use the transformation $t = z_1 + cz_2$ (i.e., $z_2 = (t - z_1)/c$),
so $dt = c\, dz_2$, and in particular, when $z_2 = (a - z_1)/c$, we have $t = a$. This
yields
$$F_{Z_1 + cZ_2}(a) = \int_{-\infty}^{\infty} \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_1^2}{2}\right) \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(t - z_1)^2}{2c^2}\right) \frac{1}{c}\, dt\, dz_1.$$
Now we expand the $(t - z_1)^2$ and extract the terms that have t's but no $z_1$'s.
We also switch the order of integration. We get
$$F_{Z_1 + cZ_2}(a) = \int_{-\infty}^{\infty} \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_1^2}{2}\right) \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2 - 2tz_1 + z_1^2}{2c^2}\right) \frac{1}{c}\, dt\, dz_1$$
$$= \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-t^2}{2c^2}\right) \frac{1}{c} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_1^2}{2} + \frac{2tz_1 - z_1^2}{2c^2}\right) dz_1\, dt.$$
Completing the square in the $z_1$ terms, the exponent simplifies to
$$-\frac{z_1^2}{2} + \frac{2tz_1 - z_1^2}{2c^2} = -\frac{(1 + c^2)}{2c^2}\left(z_1 - \frac{t}{1 + c^2}\right)^2 + \frac{t^2}{2c^2(1 + c^2)}.$$
Now we substitute back into the expression for the CDF to get
$$F_{Z_1 + cZ_2}(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-t^2}{2c^2}\right) \frac{1}{c} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(1 + c^2)}{2c^2}\left(z_1 - \frac{t}{1 + c^2}\right)^2 + \frac{t^2}{2c^2(1 + c^2)}\right) dz_1\, dt.$$
Combining the exponential expressions involving t, we use
$$\frac{-t^2}{2c^2} + \frac{t^2}{2c^2(1 + c^2)} = -\frac{t^2}{2(1 + c^2)}.$$
The expression for the $z_1$ terms almost looks like the integral of the density of
a Normal random variable. To see this more clearly, we write $\mu_t = \frac{t}{1 + c^2}$ and
$\sigma^2 = \frac{c^2}{1 + c^2}$, so that we have
$$F_{Z_1 + cZ_2}(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2(1 + c^2)}\right) \frac{1}{c} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(z_1 - \mu_t)^2\right) dz_1\, dt.$$
Then we multiply and divide by a factor of $\frac{1}{\sigma} = \frac{1}{\sqrt{\sigma^2}} = \frac{\sqrt{1 + c^2}}{c}$, to get
$$F_{Z_1 + cZ_2}(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2(1 + c^2)}\right) \frac{1}{c} \cdot \frac{c}{\sqrt{1 + c^2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(z_1 - \mu_t)^2\right) dz_1\, dt.$$
When evaluating the inner integral, the value of t (and thus of $\mu_t$) is fixed,
so that the inner integrand, $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(z_1 - \mu_t)^2}{2\sigma^2}\right)$, is exactly the density of a
Normal random variable with expected value $\mu_t$ and variance $\sigma^2$. So the inner
integral evaluates to 1, because we are integrating over all $z_1$. Thus, the entire
expression simplifies to
$$F_{Z_1 + cZ_2}(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi(1 + c^2)}} \exp\left(-\frac{t^2}{2(1 + c^2)}\right) dt.$$
So the CDF of $Z_1 + cZ_2$ shows that $Z_1 + cZ_2$ is Normal, with $E(Z_1 + cZ_2) = 0$ and
$\mathrm{Var}(Z_1 + cZ_2) = 1 + c^2$.
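A simulation makes this conclusion tangible. The following Python sketch (with our own choice c = 0.7 and a fixed seed) compares the empirical behavior of Z1 + cZ2 to a Normal with variance 1 + c²:

    # Monte Carlo check that Z1 + c*Z2 behaves like Normal(0, 1 + c^2)
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    c = 0.7
    s = rng.standard_normal(1_000_000) + c * rng.standard_normal(1_000_000)

    print(s.var())                               # close to 1 + c^2 = 1.49
    a = 1.0
    print((s <= a).mean())                       # empirical CDF at a
    print(norm.cdf(a, scale=np.sqrt(1 + c**2)))  # the CDF we just derived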
36.3 Exercises
36.3.1 Practice
Exercise 36.1. Haircuts. The time that it takes a random person to get a
haircut is Normally distributed, with an average of 23.8 minutes and a standard
deviation of 5 minutes. Assume that different people have independent times
of getting their hair cut. Find the probability that, if there are four customers
in a row (with no gaps in between), they will all be finished getting their hair
cut in 1.5 hours (altogether) or less.
Exercise 36.2. Annual precipitation. As in Exercise 35.12, assume that
the annual precipitation in a student’s hometown is Normally distributed, with
expected value µ = 36.3 inches and variance σ 2 = 8.41. Also assume that the
amount of precipitation is independent in distinct years.
Let X1 , . . . , X10 denote the precipitation in the 10 distinct years of a decade.
Find the probability that the total rainfall during the decade exceeds 380 inches.
Exercise 36.3. Printer pages. A printer can produce, on average, 30 pages
per minute, i.e., one page every 2 seconds. Each page's printing time has a standard
deviation of 0.3 seconds. If the pages' run times are independent, find the
probability that the total print time for a 30-page job is 62 seconds or less.
Exercise 36.4. Counting calories. Let X denote the number of calories
that a person eats in a single day. Suppose that X is Normally distributed with
$\mu_X = 2000$ and $\sigma_X^2 = 10{,}000$. If the person's eating habits are independent
from day to day, find the probability that the person eats 735,000 calories or
more during a 365-day year.
36.3.2 Extensions
Exercise 36.11. Tree heights. After planting 20 evergreen tree saplings on
his farm, Don wonders about their heights. If the heights are independent and
Normally distributed, with average height 3 feet, and standard deviation of 1.2
feet, what is the probability that none of the 20 trees’ heights exceeds 5 feet?
Exercise 36.14. Baking cakes. Roberto and Sally are busy making cakes for
a bake sale. They need to make 35 cakes for the sale. The cooking times of the
cakes are independent and Normally distributed. Sally makes 19 of the cakes
and Roberto makes 16 of them. Sally’s have average cook time of 45 minutes
each, with standard deviation of 3 minutes. Roberto’s cook, on average, 42
minutes each, with standard deviation of 4 minutes. What is the probability
that the average of their 35 baking times is between 43 and 44 minutes?
Exercise 36.15. Losing weight. A new low-carb diet author claims that
people following his plan will lose an average of 3 pounds with a standard
deviation of 1.4 pounds each week. An author of a low-fat diet claims that
people following her plan will lose an average of 2 pounds with a standard
deviation of 1.8 pounds each week. What is the probability that a low-carb
dieter will lose at least a pound more than a low-fat dieter in a week?
36.3.3 Advanced
Exercise 36.16. Consider a group of n independent, identically distributed
Normal-loss contracts, X1 , . . . , Xn , each with average µ and standard deviation
σ. A company plans to keep an amount of funds R on reserve so that X1 +
· · · + Xn ≤ R with probability 95%, i.e., so that the company is 95% sure that
all of its losses can be covered.
a. If n = 25, find the needed reserve R. What is the value of R/n, i.e., the
reserve needed per contract?
b. Same question, with n = 100.
c. Same question, with n = 10,000.
d. Same question, for general n. (Notice that, for a bigger aggregation of
contracts, the amount of reserve, per contract, gets smaller; in other words, R/n
is a decreasing function of n, which gets closer to the average loss per contract.)
Chapter 37
Central Limit Theorem
A basketball player plans to practice shooting hoops until she gets at least 10
successful baskets. She’s meeting up with some friends after practice, so she
wants to plan how long that will take, given that she usually makes about 60%
of the baskets she shoots. If she wants to be 95% sure of her planning, how
many baskets should she plan on attempting, to be sure that she has at least
10 successful shots?
37.1 Introduction
Gerolamo Cardano (1501–1576) studied games of chance using methods that
were among the earliest predecessors to the modern discipline of probability.
Five hundred years after Cardano, many others have contributed to the study of
probability as it relates to the limiting properties of randomness; these scholars
include: Bernoulli (actually several Bernoullis), Laplace, Poisson, Chebyshev,
Markov, Borel, Cantelli, Kolmogorov, and Khinchin. Probability is a major area
of study that transcends mathematics and has applications in every discipline
that involves randomness or uncertainty or chance. Within the study of prob-
ability theory, the subtopic of limiting properties receives a significant amount
of attention. Many articles and books have been devoted to the study of lim-
iting properties of random phenomena. We will only present some of the most
commonly referenced laws about limits in probability theory; these are just a
first glimpse of a large mountain of known results about the limiting properties
of random variables.
We do not prove the Weak or Strong Law of Large Numbers or the Central
Limit Theorem. More advanced techniques (for instance, characteristic func-
tions) are needed. See, for instance, Billingsley [1] or Durrett [2] for details.
The Central Limit Theorem, often affectionately called the CLT, is perhaps
surprising because we do not need the random variables X1 , X2 , X3 , . . . to be
Normal. This theorem basically says that sums of n independent random vari-
ables (of any type) are distributed similarly to a Normal random variable when
n is large. (There is no minimum n necessary before the CLT applies, but the
approximation becomes more accurate as n grows.) This is a truly powerful concept that
is used throughout the sciences and beyond. We give some applications in the
next several examples.
Example 37.4. Consider the volumes of soda remaining in 100 cans of soda
that are nearly empty. Let X1 , . . . , X100 , denote the volumes (in ounces) of
cans one through one hundred, respectively. Suppose that the volumes Xj are
independent, and that each Xj is Uniformly distributed between 0 and 2.
Find the probability that the 100 cans of soda contain less than 90 ounces
of soda total.
The expected value of $X_j$ is $E(X_j) = (0 + 2)/2 = 1$. The variance of $X_j$ is
$\mathrm{Var}(X_j) = (2 - 0)^2/12 = 1/3$. Thus, the expected value of $X_1 + \cdots + X_{100}$ is
$(100)(1) = 100$, and the variance is $(100)(1/3)$.
The content of the Central Limit Theorem, applied to this scenario, is that
$$\frac{X_1 + \cdots + X_{100} - (100)(1)}{\sqrt{(100)(1/3)}}$$
is approximately a standard Normal random variable.
Suppose we want to find the probability that the total amount is no more than
90 ounces. We compute
$$P(X_1 + \cdots + X_{100} \le 90) = P\left(\frac{X_1 + \cdots + X_{100} - (100)(1)}{\sqrt{(100)(1/3)}} \le \frac{90 - (100)(1)}{\sqrt{(100)(1/3)}}\right) \approx P(Z \le -1.73) = P(Z \ge 1.73) = 1 - P(Z \le 1.73) = 1 - 0.9582 = 0.0418.$$
In other words, the probability is approximately 4.18% that the 100 cans of
soda contain a total of less than 90 ounces of soda.
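For readers who want to see the Central Limit Theorem in action, a Monte Carlo simulation of this example takes only a few lines of Python (an illustration of ours, not part of the text):

    # Simulate Example 37.4: sums of 100 Uniform(0, 2) volumes
    import numpy as np

    rng = np.random.default_rng(0)
    totals = rng.uniform(0, 2, size=(100_000, 100)).sum(axis=1)
    print((totals < 90).mean())   # close to the CLT approximation 0.0418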
Example 37.4 (continued) Now we find the probability that the total volume
of soda remaining is between 97 and 103 ounces.
We compute
$$P(97 \le X_1 + \cdots + X_{100} \le 103) = P\left(\frac{97 - 100}{\sqrt{100/3}} \le Z \le \frac{103 - 100}{\sqrt{100/3}}\right) \approx P(-0.52 \le Z \le 0.52) = 0.6985 - 0.3015 = 0.3970.$$
So the probability is approximately 39.70% that the 100 cans of soda contain
between 97 and 103 ounces of soda.
What is the probability that David can handle 135 calls within a 40 hour
period? In other words, what is P (X1 + · · · + X135 ≤ 40)?
If we only use the fact that X1 + · · · + X135 is a Gamma random variable,
we would need to compute 135 nested integrals. Yuck! Computing 135 nested
integrals would definitely be unreasonable, but such a computation would be
necessary to get the exact probability.
Fortunately, we can use the Central Limit Theorem to get an excellent ap-
proximation to the answer, and we will not need to solve any integrals! Since
$E(X_j) = 1/3$ for each j, we have
$$E(X_1 + \cdots + X_{135}) = (135)(1/3).$$
Since the Xj ’s are independent, we can add the variances to obtain
Var(X1 + · · · + X135 ) = (135)(1/9).
Thus, we compute P (X1 + · · · + X135 ≤ 40) as follows:
$$P\left(\frac{X_1 + \cdots + X_{135} - (135)(1/3)}{\sqrt{(135)(1/9)}} \le \frac{40 - (135)(1/3)}{\sqrt{(135)(1/9)}}\right) \approx P(Z \le -1.29) = P(Z \ge 1.29) = 1 - P(Z \le 1.29) = 1 - 0.9015 = 0.0985.$$
Thus, we conclude that the probability is approximately 9.85% that David can
handle 135 calls within 40 hours.
Let X1 , . . . , X24 denote the lengths of the calls. The average length of the 24
calls is $(X_1 + \cdots + X_{24})/24$. So the probability that the average exceeds 1/4 of an hour is
$$P\left(\frac{X_1 + \cdots + X_{24}}{24} > \frac{1}{4}\right).$$
Multiplying by 24 on both sides of the inequality, we have, equivalently, $P(X_1 + \cdots + X_{24} > 6)$.

For a sum of discrete random variables, suppose we want to compute a probability such as
$$P(X_1 + \cdots + X_n \le 40).$$
Which number should we use in our computation, 40 or 41??? The answer will
not be greatly affected either way, but the rule of continuity correction tells us to
use the number halfway in between the two candidate values. So we must first
determine which two numbers are candidates (one value will correspond to a less-
than-or-equal or a greater-than-or-equal, and the other value will correspond to
a strictly-less-than or a strictly-greater-than). Afterwards, we take the average
of the two numbers, in this case, 40.5. So we would compute
$$P\left(\frac{X_1 + \cdots + X_n - n\mu}{\sqrt{n\sigma^2}} \le \frac{40.5 - n\mu}{\sqrt{n\sigma^2}}\right).$$
Example 37.8. One thousand students participate in a survey to see how many
breakfasts that they each eat, within a two week period. Let X1 , X2 , . . . , X1000
denote the numbers of breakfasts that student 1, 2, . . . , 1000 eats, respectively.
Since Xj is the number of breakfasts consumed by the jth student, then 0 ≤
Xj ≤ 14 for each j. Assume that Xj is Binomial, with n = 14 and p = 0.6, and
assume that the students behave independently.
Find the probability that the students eat strictly more than 8350 breakfasts
altogether during the two week period.
The expected value of $X_j$ is $E(X_j) = np = (14)(0.6) = 8.4$. The variance of
$X_j$ is $\mathrm{Var}(X_j) = np(1 - p) = (14)(0.6)(0.4) = 3.36$. Thus, the expected value
of $X_1 + \cdots + X_{1000}$ is $(1000)(8.4) = 8400$, and the variance is $(1000)(3.36) = 3360$.
The content of the Central Limit Theorem, applied to this scenario, is that
$$\frac{X_1 + \cdots + X_{1000} - (1000)(8.4)}{\sqrt{(1000)(3.36)}}$$
is approximately a standard Normal random variable.
The only way that the continuity correction affects our calculation is in deciding
whether to compute
P (X1 + · · · + X1000 > 8350)
or
P (X1 + · · · + X1000 ≥ 8351).
These are the same probabilities, but they will generate slightly different results
in our approximation below. So we pick the average value of 8350 and 8351, namely 8350.5, and compute
$$P(X_1 + \cdots + X_{1000} \ge 8350.5) = P\left(Z \ge \frac{8350.5 - 8400}{\sqrt{3360}}\right) \approx P(Z \ge -0.85) = P(Z \le 0.85) = 0.8023.$$
In other words, the probability is approximately 80.23% that the students eat
strictly more than 8350 breakfasts during the two week period.
Example 37.8 (continued) Suppose that we now want to find the probability
that between 8380 and 8435 breakfasts (inclusive) are served, i.e., find
$P(8380 \le X_1 + \cdots + X_{1000} \le 8435)$. We could equally well compute
$P(8379 < X_1 + \cdots + X_{1000} < 8436)$, because that would be the exact same probability. So we settle for the average
of each endpoint. In other words, we compute
$$P(8379.5 \le X_1 + \cdots + X_{1000} \le 8435.5) = P\left(\frac{8379.5 - 8400}{\sqrt{3360}} \le Z \le \frac{8435.5 - 8400}{\sqrt{3360}}\right) \approx P(-0.35 \le Z \le 0.61) = 0.7291 - 0.3632 = 0.3659.$$
So the probability is approximately 36.59% that the students eat between 8380
and 8435 breakfasts (inclusive) during the two week period.
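Because the sum of 1000 independent Binomial(14, 0.6) random variables is itself Binomial(14000, 0.6), this example can also be checked exactly. The comparison below, written in Python with scipy, is our own verification sketch:

    # Exact tail vs. continuity-corrected Normal approximation (Example 37.8)
    from math import sqrt
    from scipy.stats import binom, norm

    exact = binom.sf(8350, 14000, 0.6)                    # P(sum > 8350)
    approx = norm.sf(8350.5, loc=8400, scale=sqrt(3360))  # continuity correction
    print(exact, approx)                                  # both close to 0.80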
What is the probability that strictly between 260 and 280 attempts are
needed? In other words, what is
Equivalently, what is
We know from Example 17.3 that each player has expected number of shots
E(Xj ) = 1/0.80 = 1.25 and variance Var(Xj ) = 0.20/0.802 = 0.3125.
We compute:
So the probability is approximately 66.53% that (strictly) between 260 and 280
attempts are needed altogether.
Note that the random variables are only approximately Normal, not com-
pletely Normal. For instance, the mass of a Binomial random variable is still
discrete (although it looks like a bell curve) and the mass pX (j) is only defined
for integers 0 ≤ j ≤ n. See Figures 37.1 and 37.2. We will still use continuity
correction with such Binomial random variables, since they are discrete.
$$Z \approx \frac{X - \mu_X}{\sigma_X} = \frac{X - np}{\sqrt{np(1 - p)}}$$
Figure 37.1: The mass of a Binomial random variable with n = 300 and
p = 1/4, near the expected value, np = 75.
Figure 37.2: The mass of a Binomial random variable with n = 1000 and
p = 9/10, near the expected value, np = 900.
but again, this is very difficult for many calculators to handle. So we reformulate
this as
P (500 ≤ X ≤ 500),
or equivalently
P (499 < X < 501),
and with continuity correction, this becomes
$P(499.5 \le X \le 500.5)$.
The probability that at most 840 of the students eat Japanese pan noodles
is P (X ≤ 840), i.e., P (X < 841), so we use P (X ≤ 840.5) for the continuity
correction, and we calculate
$$P(X \le 840.5) = P\left(\frac{X - 800}{\sqrt{480}} \le \frac{840.5 - 800}{\sqrt{480}}\right) \approx P(Z \le 1.85) = 0.9678.$$
Figure 37.3: The mass of a Poisson random variable with λ = 500, near the
expected value, λ = 500.
Of course, Poisson random variables are discrete, and moreover the mass
pX (j) is defined only for the nonnegative integers 0, 1, 2, . . .. Thus, we must use
continuity correction with approximations of Poisson random variables too.
Having the parameter λ ≥ 10 is one possible rule of thumb for getting rela-
tively good approximations of a Normal random variable to a Poisson random
variable. We can view X as the sum of a large number of Poisson random
variables. As an example, if X is a Poisson random variable with parameter
λ = 500, we can let X1 , . . . , X500 be a collection of independent Poisson random
variables, each with parameter 1, so both X and $X_1 + \cdots + X_{500}$ have the same
mass:
$$p(j) = \frac{e^{-500}\,500^j}{j!}, \qquad j = 0, 1, 2, \ldots$$
This comes from the fact that the sum of independent Poisson random variables
is also a Poisson random variable, and that the parameter of the sum is equal
to the sum of the parameters. In this case, $\overbrace{1 + 1 + \cdots + 1}^{500} = 500$. So we
have just decomposed a Poisson random variable X with parameter 500 into a
sum of 500 independent Poisson random variables X1 , . . . , X500 that each have
parameter 1. Thus, by the Central Limit Theorem, X is approximately a Normal
random variable. We note that X √ has expected value λ = 500 and variance 500,
and therefore standard deviation 500.
We can even decompose the very same Poisson random variable more finely.
For instance, if X is Poisson with parameter λ = 500, we can let X1 , . . . , X3000
be a collection of independent Poisson random variables, each with parameter
1/6 (we use 1/6 here, so that the mean is 500 = (3000)(1/6)); then both X and
$X_1 + \cdots + X_{3000}$ have the same mass as before.
This works because $\overbrace{1/6 + 1/6 + \cdots + 1/6}^{3000} = 500$.
Methods like this allow us to even decompose Poisson random variables with
parameters that are not integers! For instance, if X is Poisson with parameter
372.9, we can let X1 , . . . , X1000 be a collection of independent Poisson random
variables, each with parameter 0.3729, and then both X and $X_1 + \cdots + X_{1000}$ have
the same mass:
$$p(j) = \frac{e^{-372.9}\,(372.9)^j}{j!}, \qquad j = 0, 1, 2, \ldots$$
This works because $\overbrace{0.3729 + 0.3729 + \cdots + 0.3729}^{1000} = 372.9$.
or equivalently,
P (1151 ≤ X ≤ 1249),
so with continuity correction, we use
P (1150.5 ≤ X ≤ 1249.5).
Now we compute
$$P(1150.5 \le X \le 1249.5) = P\left(\frac{1150.5 - 1200}{\sqrt{1200}} \le \frac{X - 1200}{\sqrt{1200}} \le \frac{1249.5 - 1200}{\sqrt{1200}}\right)$$
$$\approx P(-1.43 \le Z \le 1.43) = P(Z \le 1.43) - P(Z \le -1.43) = P(Z \le 1.43) - P(Z \ge 1.43) = P(Z \le 1.43) - 1 + P(Z \le 1.43) = 0.9236 - 1 + 0.9236 = 0.8472.$$
For comparison, the exact value, $P(1151 \le X \le 1249) = \sum_{j=1151}^{1249} \frac{e^{-1200}\,1200^j}{j!} = 0.8470129469\ldots$, requires software to evaluate.
We could not perform such a calculation very easily by hand, because it has
nearly 100 terms that would need to be calculated, and each term includes the
use of very large numbers, e.g., 1200j and j! for 1151 ≤ j ≤ 1249, and also very
small numbers, e.g., e−1200 for 1151 ≤ j ≤ 1249.
Example 37.15. A publisher estimates that a 500 page book has a Poisson
number of errors with average λ = 100 errors/book. Estimate the probability
that there are strictly more than 80 errors in such a book.
If we let X denote the number of errors in a newly published 500 page book,
the probability that there are strictly more than 80 errors is P (X > 80), which
is equivalent to P (X ≥ 81).
Why can we not calculate this manually, without Normal approximation?
If we try a manual calculation, we first realize that there are infinitely many
terms:
$$P(X \ge 81) = \frac{e^{-100}100^{81}}{81!} + \frac{e^{-100}100^{82}}{82!} + \frac{e^{-100}100^{83}}{83!} + \frac{e^{-100}100^{84}}{84!} + \cdots,$$
which is hopeless. So we rearrange and use the complement:
$$P(X \ge 81) = 1 - P(X \le 80) = 1 - \left(\frac{e^{-100}100^{0}}{0!} + \frac{e^{-100}100^{1}}{1!} + \frac{e^{-100}100^{2}}{2!} + \cdots + \frac{e^{-100}100^{80}}{80!}\right),$$
but this is still hopeless, because there are 81 terms in the summation. So the
value of a Normal approximation—which can be evaluated in one fell swoop—
should now seem apparent.
Also remember that we use P (X ≥ 80.5), for the purposes of continuity
correction. This gives us
$$P(80.5 \le X) = P\left(\frac{80.5 - 100}{\sqrt{100}} \le \frac{X - 100}{\sqrt{100}}\right) \approx P(-1.95 \le Z) = P(Z \le 1.95) = 0.9744.$$
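Software can evaluate the "hopeless" Poisson sum in one call, which makes for a nice comparison with the Normal approximation. Here is a short Python sketch (our own check, using scipy):

    # Exact Poisson tail vs. continuity-corrected Normal approximation (Example 37.15)
    from scipy.stats import poisson, norm

    exact = poisson.sf(80, 100)                # P(X > 80) = P(X >= 81), lambda = 100
    approx = norm.sf(80.5, loc=100, scale=10)  # continuity correction
    print(exact, approx)                       # both close to 0.97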
37.8 Exercises
37.8.1 Practice
Exercise 37.1. Concert tickets. At a certain university, each student who
tries to purchase concert tickets for an upcoming show is successful at connecting
to the concert ticket website with probability 0.85. If unsuccessful in logging
on, he or she tries again a few minutes later, over and over, until finally getting
tickets. The concert is large (tens of thousands of seats), so assume that such
attempts are independent. Within a group of 300 students, find the probability
that strictly more than 360 attempts are necessary for these 300 students to
successfully get tickets.
Exercise 37.2. Blind auction bids. At an auction, exactly 282 people place
requests for an item. The bids are placed “blindly,” which means that they are
placed independently, without knowledge of the actions of any other bidders.
Assume that each bid (measured in dollars) is a Continuous Uniform random
variable on the interval [10.50, 19.30]. Find the probability that the sum of all
the bids exceeds $4150.
Exercise 37.3. Baseball statistics. A certain baseball player has, on average,
0.7 RBI’s (runs batted in) per game, with standard deviation of 0.2. What is
the approximate probability that the player has at most 110 RBI’s during a
given season, which contains 162 games?
Exercise 37.4. Tootsie Pops. Back in 1970, a television commercial for Toot-
sie Pops (a candy) asked how many licks would be needed to get to the Tootsie
Roll Center of a Tootsie Pop. This question has become somewhat famous and
well-known among children. High school, undergraduate, and Ph.D. students
have all performed experiments to uncover the answer (some have even built
licking machines to test this question). Suppose that an average 364 licks are
required, with a standard deviation of 40 licks. If a group of 50 children test
their licking abilities, what is the approximate probability that, among the 50
children, the average number of licks per student is 380 or more?
Exercise 37.23. Jackpot. If a certain type of slot machine has only a proba-
bility of 0.0001 of yielding a jackpot on each game at a certain casino in Vegas,
and the patrons play those slots 250,000 times during a given month, estimate
the probability that the casino will have 30 or more jackpot rewards to pay out
during the month.
Exercise 37.24. Ice cream patrons. An ice cream shop estimates that its
number of patrons per day is Poisson with mean 19. What is the estimated
probability that they have 20 or more customers on a given day?
37.8.2 Extensions
Exercise 37.29. Burritos. Twenty-five students want burritos, but the dining
hall only has one burrito maker. Each burrito takes an average of 72.5 seconds
to cook, with standard deviation 3.2 seconds.
a. Estimate the probability that all twenty-five students can cook their
burritos in half an hour or less, if we ignore the time in between the consecutive
students.
b. Estimate the probability that all twenty-five students can cook their
burritos in 2300 seconds or less, if we assume that there is an exact 20-second
delay in between each pair of consecutive students. (Hint: There are 24 such
delays between 25 students.)
c. Estimate the probability that all twenty-five students can cook their
burritos in 2300 seconds or less, if we assume that there is a Normally distributed
delay in between each pair of consecutive students, with average delay of 20
seconds between consecutive pairs of students, and standard deviation of 4
seconds.
Exercise 37.31. Playing the lottery. A person plays a certain lottery game
10,000 times during his life. The chance of winning is 1/5000 during each
attempt, and the attempts are independent. Approximate the probability that
he wins 2 or more times during his life.
Chapter 38
Review of Named Continuous Random Variables
Your first step should be to identify whether X is discrete (how many?) or con-
tinuous (how long or how much?). Once you have decided that X is continuous,
think about what values of X would be allowed.
The actual waiting time until your 1st customer is an Exponential random
variable. The actual waiting time until your 3rd customer is a Gamma random
variable. The customer could arrive at a time that is Uniform between 8:00
and 8:15. The percentage of customers who buy ice cream could be a Beta
random variable. The number of customers during a long period of time is
approximately Normally distributed.
The following are some Pairs of Random Variables that are Related:
Waiting until the 1st success:
• Geometric: X is the # of people you have to ask until you get your first
“yes” answer (discrete)
• Exponential: X is the time you have to wait until you get your first
“yes” answer (continuous)
• Gamma: X is the time you have to wait until you get your 4th “yes”
answer (continuous)
Summary of Named Continuous Random Variables

Uniform: density $f_X(x) = \frac{1}{b-a}$ on $a \le x \le b$; CDF $F_X(x) = \frac{x-a}{b-a}$; $E(X) = \frac{a+b}{2}$; $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$; parameters: a, b are the endpoints.

Exponential: density $f_X(x) = \lambda e^{-\lambda x}$ on $x \ge 0$; CDF $F_X(x) = 1 - e^{-\lambda x}$; $E(X) = 1/\lambda$; $\mathrm{Var}(X) = 1/\lambda^2$; parameter: $\lambda$ is the average number of successes (per unit of time).

Gamma: density $f_X(x) = \frac{\lambda e^{-\lambda x}(\lambda x)^{r-1}}{\Gamma(r)}$ on $x \ge 0$; CDF $F_X(x) = 1 - e^{-\lambda x}\sum_{j=0}^{r-1}\frac{(\lambda x)^j}{j!}$ if $r \in \mathbb{N}$; $E(X) = r/\lambda$; $\mathrm{Var}(X) = r/\lambda^2$; parameter: $\lambda$ is the average number of successes (per unit of time).

Beta: density $f_X(x) = \frac{\Gamma(\alpha+\beta)\,x^{\alpha-1}(1-x)^{\beta-1}}{\Gamma(\alpha)\Gamma(\beta)}$ on $0 \le x \le 1$; $E(X) = \frac{\alpha}{\alpha+\beta}$; $\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$; parameters: $\alpha$, $\beta$ usually given.

Normal: density $f_X(x) = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$ on $-\infty < x < \infty$; for the CDF, convert to the standard Normal $Z = (X - \mu_X)/\sigma_X$; $E(X) = \mu$; $\mathrm{Var}(X) = \sigma^2$; parameters: $\mu$ is the expected value and $\sigma$ is the standard deviation.
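The entries in this table can be sanity-checked against a software library. The Python sketch below uses scipy.stats with illustrative parameter values of our own choosing:

    # Spot-check the summary table's moments with scipy.stats
    from scipy import stats

    a, b = 2.0, 5.0
    lam, r = 3.0, 4
    alpha, beta = 2.0, 3.0

    print(stats.uniform(loc=a, scale=b - a).mean(), (a + b) / 2)      # (a+b)/2
    print(stats.expon(scale=1 / lam).var(), 1 / lam**2)               # 1/lambda^2
    print(stats.gamma(r, scale=1 / lam).mean(), r / lam)              # r/lambda
    print(stats.beta(alpha, beta).var(),
          alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))  # Beta variance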
38.2 Exercises
For Exercises 38.1 through 38.12, state which continuous or discrete distribution
would be most appropriate and why you think so. The choices are:
Exercise 38.10. Let X be the position on the tray (numbered 1 to 12) of the
cookie you will be served, if all cookies are equally convenient to the waitress.
Exercise 38.11. Let X be the number of chocolate chip cookies which are
brought to you if you ask the waitress to randomly select 4 cookies from a
selection of 5 chocolate chip, 3 oatmeal, and 7 peanut butter.
Exercise 38.12. Let X be the number of cookies you will eat until you
find the 5th one that has more than 7 chocolate chips in it.
a. What is the probability a SaveBucks battery will last more than 48 hours?
b. What is the probability a LittleMoola battery will last more than 48
hours?
c. Given that a SaveBucks battery has lasted more than 48 hours, what is
the probability it will last more than 96 hours? What property applies to this
situation?
d. The batteries look identical, so if you have a battery you selected at
random which has lasted more than 48 hours, what is the probability it is a
SaveBucks battery? Which rule do you use to answer this question?
Exercise 38.16. Power outage. The power has gone out in your house, and
you are using a small flashlight to read your probability book. You have three
batteries available to you, all of the same brand, but you can’t tell which brand
in the dark. The flashlight uses only one battery at a time. Use the information
about the two brands of batteries from Exercise 38.15 to answer the questions
below.
a. If the 3 batteries are from SaveBucks, how long do you expect to have
light from your flashlight? What is the standard deviation? What distribution
are you using, and what are the parameters? What important assumption do
you have to make about battery lives?
b. If the 3 batteries are from LittleMoola, how long do you expect to have
light from your flashlight? What is the standard deviation? What distribution
are you using and what are the parameters?
c. What is the probability you will need more than 3 batteries if you use the
flashlight every evening for a week because you forgot to pay your electricity
bill? (Assume there are 12 hours of darkness each night and 7 nights in the
week.) What distribution are you using, and what are the parameters?
Exercise 38.17. Distribution of batteries sold. The SaveBucks battery
company wants to wait until 60% of the batteries are sold at the corner conve-
nience store before they ship more. After years of study, they have determined
that the proportion of batteries which are sold after 6 weeks follows a Beta
distribution with α = 2 and β = 3.
a. What is the probability that more than 60% of the batteries will be sold
at the end of the 6-week period?
b. What is the expected proportion of batteries which will be sold at the
end of the 6-week period?
c. What is the standard deviation in the proportion of batteries which will
be sold at the end of the 6-week period?
Exercise 38.18. Where is my engagement ring? While walking from the
car into your dormitory, you dropped your engagement ring somewhere in the
snow. The path is straight and 30 feet long. You are distraught because the
density of its location seems to be constant along this 30-foot route.
a. What is the probability that the ring is within 12 feet of your car?
b. What is the probability that the ring is between 9 to 11 feet away from
the car?
c. What is the expected distance and the standard deviation for the distance
from the car where the ring fell?
Exercise 38.19. Lifeguard. A lifeguard has to jump in to save a swimmer in
trouble at the lake at Possum Park an average of 2 times a week, according to
a Poisson distribution.
a. What is the probability that he will save more than 25 swimmers in the
10-week season he works?
b. If you know he saved exactly 1 swimmer on his 8-hour shift today, what
is the chance he did it in the last hour that he worked?
c. If there are 5 lifeguards working independently, and they each have the
same average rate for saving swimmers, what is the probability that exactly
three of them will save more than 25 swimmers (each) in the 10-week season?
d. The lifeguards receive a small raise if they save 10 swimmers. How long
will the lifeguard expect to wait for his raise?
e. What is the probability that our lifeguard will have to wait at least 1 week after
the start of the summer until he saves his first swimmer?
f. How many independent lifeguards will we expect to have to ask until we
find our second lifeguard who had to wait at least 1 week after the start of the
summer until he/she saved his/her first swimmer?
a. What is the probability that a patient who is diagnosed today will still
be alive 15 years from now?
b. Given that a patient is still alive after 5 years, what is the probability
the patient will still be alive 15 years after the initial diagnosis?
c. If 5 people are diagnosed with this disease today, what is the probability
that all 5 of them will be alive 15 years from now?
a. The nurse cares for one patient at a time since the care is given in the
patient’s home, and constant monitoring of the patient is needed. She begins
caring for that patient as soon as the diagnosis is given, and she cares for the
patient until death. Then she is reassigned to a new patient. What is the
expected time until her 3rd patient dies?
b. The nurse wakes up and realizes that her patient’s “call” button is lit.
Unfortunately she doesn’t know how long it has been lit. She knows that it
wasn’t lit 10 minutes ago, when she fell asleep. So it could have been pressed
at any time during the previous 10 minutes, and she is worried that her patient
has been waiting too long. What is the probability that the patient pressed the
call button in the first 30 seconds that the nurse was asleep?
c. Given that the call button was not pressed in the first 30 seconds, what
is the probability it was pressed during the last 2 minutes?
Exercise 38.22. Apple juice. The fraction of juice which can be squeezed
from an apple is a random variable with the following density:
$$f_X(x) = \begin{cases} k(1 - x)^6 & \text{if } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
a. Find the value of k that makes this a valid probability density function.
b. What is the expected fraction of juice which can be squeezed from an
apple?
c. What is the standard deviation in the fraction of juice which can be
squeezed from an apple?
d. What is the probability that an apple will have more than half of the
juice squeezed?
Exercise 38.23. Identify the distribution. If X has a mean 0.5 and a
variance 0.25, find the parameters of the distribution (if possible) if X is:
a. Binomial
b. Exponential
c. Normal
d. Uniform
e. Geometric
Exercise 38.24. Identify the distribution. If X has a mean 3 and a variance
2.5, find the parameters of the distribution (if possible) if X is:
a. Binomial
b. Exponential
c. Normal
d. Uniform
e. Geometric
Exercise 38.25. Cable repairs. The cable technician guarantees he’ll arrive
sometime between 8 AM and noon, but he can’t be any more specific than that.
a. What is the probability that he will come in between 9:30 and 10:45 AM?
b. You have a meeting at noon, and you’re hoping that the cable technician
will come before 11:30 AM so you’ll have time to get to the meeting. If the
cable technician has not come by 10 AM, what is the probability that he will
come before 11:30 AM?
c. What is the probability that the next 4 times you need the cable techni-
cian to come out, he will come between 9:30 AM and 10:45 AM on at least 3 of
those occasions?
Exercise 38.26. Cell phone usage. The amount of time you use on your cell
phone each month is approximately Normally distributed with a mean of 623
minutes and a standard deviation of 24 minutes.
a. What is the probability that you will talk more than 700 minutes (when
the extra charges apply) next month?
b. What range of talk-times represents the middle 50% of the times you will
talk?
c. How many monthly bills do you expect to receive until you get the 6th
one with extra charges (talk more than 700 minutes)?
of 2, 3, and 4. Determine the probability that the maximum return from these
3 investments is more than 5. (Hint: Think about the opposite case.)
a. What is the probability that the component fails in the first year?
b. What is the probability that the component fails in the second or third
year (between the 1-year mark and the 3-year mark)?
c. What is the probability that the component fails after the third year?
a. How much does the owner expect to pay in refunds if a single component
is purchased?
b. How much does the owner expect to pay in refunds if 50 components are
purchased?
Additional Topics
In this part, we discuss some more advanced topics that not all courses will
have time to cover but are still important and interesting. Depending on the
interests of the students and teachers, some of these topics (like moment gener-
ating functions) may be introduced earlier in the course. These topics regularly
appear on the SOA/CAS P/1 exam.
By the end of this part of the book, you should be able to:
2. Calculate the covariance and correlation, and understand how these terms
define the relationship between two random variables.
5. Calculate the PDF, CDF, joint PDF, and joint CDF for order statistics
for independent, identically distributed, continuous random variables.
6. Understand what moment generating functions are and how they can be
utilized.
Chapter 39
Variance of Sums; Covariance; Correlation
This morning, your mom made two different types of cookies, peanut butter
and oatmeal raisin. She put them all in a big cookie jar in the kitchen. You
and your brother both randomly (and hungrily!) grab cookies out of the jar, but
your brother grabs first. Does the number of peanut butter cookies your brother
grabs have an impact on the number of peanut butter cookies you later grab?
What are the average and variance in the number of peanut butter cookies
selected by each of you?
39.1 Introduction
Many times during this course, we have used the fact that the expected value
of a sum of random variables is equal to the sum of the expected values of the
individual random variables, even when the random variables are dependent.
For the left term in (39.2), we just multiply to expand $\left(\sum_{i=1}^{n} X_i\right)^2$ as follows:
$$\left(\sum_{i=1}^{n} X_i\right)^2 = \left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{n} X_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j,$$
but the expected value of the sum equals the sum of the expected values, so the
left term in (39.2) is
$$E\left(\left(\sum_{i=1}^{n} X_i\right)^2\right) = E\left(\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} E(X_i X_j).$$
A key point from all of these manipulations with the sums and the squaring is
that the terms E(Xi Xj ) − E(Xi )E(Xj ) play a fundamental role in the variance
of the sum of random variables.
We note that E(X) is constant, so it can be pulled out of the term $E(Y E(X)) = E(Y)E(X)$. Similarly, E(Y) is a constant, so it can be pulled out of the term
$E(XE(Y)) = E(X)E(Y)$. Finally, E(X) and E(Y) are both constants, and thus
$E(E(X)E(Y)) = E(X)E(Y)$. So we obtain
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y).$$
The covariance gives some information about how far two random variables
are spread from their expected values. Just like with the expected value or the
variance, the covariance is only one number, so it can only provide a relatively
small amount of information. Also, it is possible that many different pairs of
X’s and Y ’s could have exactly the same covariance. So the covariance does not
tell the whole story about the relationship between two variables, but it does
help. Very informally—just to build intuition—when a covariance between X
and Y is positive, then the larger X turns out to be, the larger Y will tend to
be, and vice versa. If the covariance is negative, then the larger that one of the
random variables tends to be, the smaller the other will tend to be.
As an informal example, suppose X and Y are indicator random variables
for the events that Alicia and Brent (respectively) get a peanut butter cookie
from the same cookie jar (which contains several types of cookies). When Alicia
gets a peanut butter cookie, then X = 1, and there is one less peanut butter
cookie remaining for Brent, so Y becomes less likely to be 1 than it would be
otherwise. Thus, X and Y have negative covariance. We elaborate on this
specific example in Example 39.9.
Using the notation in equation (39.3), we conclude that the variance of the
sum of random variables is equal to the sum of the covariances of all pairs of the
random variables:
$$\mathrm{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j).$$
Note that $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
Isolating all n of the terms of this flavor from Theorem 39.3, we get another
way to express the variance of the sum of random variables:
$$\mathrm{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + \sum_{i \ne j} \mathrm{Cov}(X_i, X_j).$$
We also note that the covariance is symmetric in terms of the two variables,
i.e., $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$.
With this symmetry in mind, we see that each term $\mathrm{Cov}(X_i, X_j)$ in Corollary 39.5 will also appear as a $\mathrm{Cov}(X_j, X_i)$ term. So the following formulation
makes calculations shorter to perform, since we are combining all of these duplicated pairs:
$$\mathrm{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i < j} \mathrm{Cov}(X_i, X_j).$$
We have already observed that for any random variables X and Y and
functions g and h, if X and Y are independent, then
$$E(g(X)h(Y)) = E(g(X))E(h(Y)).$$
(Recall: This was proved for discrete random variables in Section 12.2 and for
continuous random variables in Section 29.4. The motivation for this is that
independence of X and Y allows the joint mass pX,Y (x, y) to be factored into
pX (x)pY (y), or in the continuous case, the joint density fX,Y (x, y) to be factored
into fX (x)fY (y). So the double sum or double integral for E(g(X)h(Y )) can be
recomputed as the product of two sums or two integrals, i.e., E(g(X))E(h(Y )).)
As a result, using simply g(X) = X and h(Y) = Y, we see that if X and Y
are independent, then the covariance of X and Y is 0, i.e.,
$$\mathrm{Cov}(X, Y) = 0.$$
The covariance may or may not be 0 when X and Y are dependent; in fact, the
covariance of dependent random variables is usually not 0. (For the interested
reader, however, we give Examples 39.14 and 39.15, which show dependent
random variables that nonetheless have 0 covariance.)
This nice little fact, used in the context of Corollary 39.5 or Corollary 39.7,
reinforces our understanding that, for independent random variables X1 , . . . ,
Xn , the variance of the sum of the Xj ’s equals the sum of the variances of the
Xj ’s:
Var(X1 + · · · + Xn ) = Var(X1 ) + · · · + Var(Xn ).
We put two more facts about covariance into Section 39.5: (1) the covariance
of two sums equals a double sum of covariances, and (2) constants can be factored
out of covariances. (This is usually referred to as linearity.) First, however, we
study some examples about covariance.
The mass of Y is the same as the mass of X, although this might take a minute’s
thought to justify: Brent is equally likely to get any of the 13 cookies, so his
chances of getting a peanut butter cookie must be 5/13. To see this rigorously,
we can compute using Bayes' rule, i.e., by conditioning on the value of X,
i.e., by conditioning on whether Alicia received a peanut butter cookie:
P (Y = 1) = P (Y = 1 and X = 0) + P (Y = 1 and X = 1)
= P (Y = 1 | X = 0)P (X = 0) + P (Y = 1 | X = 1)P (X = 1)
= (5/12)(8/13) + (4/12)(5/13)
= 5/13
So Y has the same mass as X. In other words, Brent and Alicia are each equally
likely to get a peanut butter cookie. (Remember that indicator random variables
are just Bernoulli random variables, i.e., just 1 or 0, depending on whether some
event happens.) On the other hand, when we take Alicia's
situation into account first, there is an effect on the ability of Brent to get a
cookie. So X and Y are dependent. When Alicia does not get a peanut butter
cookie, then Brent is more likely to get one, because more peanut butter cookies
remain, i.e.,
$$P(Y = 1 \mid X = 0) = 5/12.$$
When she does get one, then he is less likely to get one, because there are fewer
peanut butter cookies remaining, i.e.,
P (Y = 1 | X = 1) = 4/12.
So
$$P(Y = 1 \mid X = 0) \ne P(Y = 1 \mid X = 1),$$
and these are both different from P(Y = 1) = 5/13. So Y is dependent on X.
Find the expected value of X + Y :
Even though X and Y are dependent, we can calculate the expected value
of X + Y, the total number of peanut butter cookies eaten by Alicia and Brent
altogether, by adding the expected values:
$$E(X + Y) = E(X) + E(Y) = 5/13 + 5/13 = 10/13.$$
(Note that E(X) = 5/13 since X is a Bernoulli random variable, and similarly,
E(Y ) = 5/13 since Y is also a Bernoulli.)
Find the variance of X + Y :
Since X and Y are dependent, we cannot just get the variance of X + Y by
adding the variances of X and Y separately. We do still know that $\mathrm{Var}(X) = (5/13)(8/13)$ and, similarly, $\mathrm{Var}(Y) = (5/13)(8/13)$, since X and Y are each
Bernoulli with parameter 5/13. For the covariance, we first compute E(XY):
$$E(XY) = (1)P(X = 1 \text{ and } Y = 1) + (0)P(X = 0 \text{ and } Y = 1) + (0)P(X = 1 \text{ and } Y = 0) + (0)P(X = 0 \text{ and } Y = 0).$$
So we just have
E(XY ) = P (X = 1 and Y = 1).
To get X = 1 and Y = 1, we need X = 1, and then conditioned on X = 1, we
need Y = 1 too. If both Alicia and Brent want peanut butter cookies, Alicia
needs to get one, and then (given that she got one), Brent will need one too.
(The product of Bernoullis is a Bernoulli too.)
So
E(XY ) = P (X = 1 and Y = 1)
= P (Y = 1 | X = 1)P (X = 1)
= (4/12)(5/13).
So
Cov(X, Y ) = (4/12)(5/13) − (5/13)(5/13).
In summary, $\mathrm{Cov}(X, Y) = 65/507 - 75/507 = -10/507 \approx -0.0197$; the covariance is negative, as anticipated.
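The covariance −10/507 can be confirmed by simulation. In the Python sketch below (our own check), each trial shuffles the 13 cookies and treats the first two positions as Alicia's and Brent's draws:

    # Simulate Example 39.9: 13 cookies, 5 peanut butter, drawn without replacement
    import numpy as np

    rng = np.random.default_rng(0)
    trials = 500_000
    perms = rng.random((trials, 13)).argsort(axis=1)  # random permutations of 0..12
    x = (perms[:, 0] < 5).astype(float)               # Alicia gets peanut butter?
    y = (perms[:, 1] < 5).astype(float)               # Brent gets peanut butter?

    print(np.cov(x, y)[0, 1], -10 / 507)              # empirical vs. exact covariance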
Example 39.10. Consider an eight-hour work day. Huiping and Ravi each
check their email just once per day. Let X and Y denote the time, respec-
tively, until Huiping and Ravi each check their emails during the day. Assume
that Huiping always checks email first, and that X and Y are Uniform on the
region where 0 ≤ X ≤ Y ≤ 8, so that the joint density is constant on the
triangle in Figure 39.1. Since the joint density is constant on a triangle of area
(8)(8)(1/2) = 32, then the joint density must be the reciprocal of the area, i.e.,
$f_{X,Y}(x, y) = 1/32$ on the triangle.

[Figure 39.1: The triangle $0 \le x \le y \le 8$ on which the joint density is constant.]
Find the expected value and variance of Y − X, the length of the interval
between Huiping and Ravi checking their email.
The expected time E(X) until Huiping checks email is obtained by integrating
$x f_{X,Y}(x, y)$ over the triangle. We can use integration with respect to y for
the outer integral; we can use integration over the x's in the range $0 \le x \le y$
for the inner integral. This setup is shown in Figure 39.2.
Figure 39.2: Fixed value of y (here, for example y = 3.2), and x ranging
from 0 to y.
Thus
E(Y − X) = E(Y ) − E(X) = 16/3 − 8/3 = 8/3.
We cannot simply add the variances of X and Y , since X and Y are dependent.
So we compute
$$\mathrm{Var}(Y - X) = E((Y - X)^2) - (E(Y - X))^2,$$
and now we need to integrate $(y - x)^2 f_{X,Y}(x, y)$ over the triangle, to get the
value of $E((Y - X)^2)$. Since $f_{X,Y}(x, y) = 1/32$ on the triangle, we have
$$E((Y - X)^2) = \int_0^8 \int_0^y \frac{(y - x)^2}{32}\, dx\, dy = \frac{32}{3}.$$
So, in summary, $\mathrm{Var}(Y - X) = 32/3 - (8/3)^2 = 96/9 - 64/9 = 32/9$.
As an alternative way to find the variance of Y − X, we could use the covariance:
$$\mathrm{Var}(Y - X) = \mathrm{Cov}(Y - X, Y - X) = \mathrm{Cov}(Y, Y) + (-1)\mathrm{Cov}(Y, X) + (-1)\mathrm{Cov}(X, Y) + (-1)(-1)\mathrm{Cov}(X, X) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y).$$
Here $\mathrm{Var}(X) = \int_0^8 \int_0^y \frac{x^2}{32}\, dx\, dy - (8/3)^2 = \frac{32}{3} - \frac{64}{9} = \frac{32}{9}$, and
$$\mathrm{Var}(Y) = \int_0^8 \int_0^y \frac{y^2}{32}\, dx\, dy - (16/3)^2 = 32 - (16/3)^2 = \frac{32}{9}.$$
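A Monte Carlo check of this example is straightforward: sorting two independent Uniform(0, 8) draws produces a point that is Uniform on the triangle 0 ≤ x ≤ y ≤ 8. The sampling device and the script are our own sketch:

    # Simulate Example 39.10 on the triangle 0 <= x <= y <= 8
    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 8, size=(1_000_000, 2))
    x, y = pts.min(axis=1), pts.max(axis=1)   # Uniform on the triangle

    d = y - x
    print(d.mean(), 8 / 3)    # E(Y - X)
    print(d.var(), 32 / 9)    # Var(Y - X)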
We let $X_1$ and $X_2$ be indicator random variables that denote whether the first
and second songs (respectively) are rock songs. Thus the expected number of
rock songs chosen is
$$E(X) = E(X_1 + X_2) = E(X_1) + E(X_2) = 7/10 + 7/10 = 14/10 = 7/5 = 1.4.$$
Now consider the variance of X. We have
$$\mathrm{Var}(X) = \mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 2\,\mathrm{Cov}(X_1, X_2).$$
(Remember that, for an indicator (i.e., a Bernoulli random variable) with probability of success p and probability of failure q = 1 − p, the expected value is p and the variance is pq.) Each of the $X_j$'s is a Bernoulli random variable, so
$$\mathrm{Var}(X_j) = P(X_j = 1)P(X_j = 0) = (\text{probability song is rock})(\text{probability song is not rock}) = (7/10)(3/10) = 21/100.$$
Also
Cov(X1 , X2 ) = E(X1 X2 ) − E(X1 )E(X2 ).
We know E(X1 ) = 7/10 and E(X2 ) = 7/10. Also X1 and X2 are Bernoullis, so
X1 X2 is Bernoulli too. Thus the expected value of X1 X2 is just equal to the
probability that X1 X2 is equal to 1. So
E(X1 X2 ) = P (X1 X2 = 1)
= P (X1 = 1 and X2 = 1)
$= P(X_1 = 1)P(X_2 = 1 \mid X_1 = 1) = (7/10)(6/9) = 7/15$. Therefore
$\mathrm{Cov}(X_1, X_2) = 7/15 - (7/10)(7/10) = 140/300 - 147/300 = -7/300$, and
$\mathrm{Var}(X) = 21/100 + 21/100 + 2(-7/300) = 28/75$.
Notice that the scenarios in Examples 39.9 and 39.11 are specific cases of the
Hypergeometric distribution. E.g., in Example 39.9, X + Y has a Hypergeo-
metric distribution, since we select n = 2 out of N = 13 items (all cookies);
exactly M = 5 (the peanut butter cookies) are desirable. Thus, as we learned
in the Hypergeometric chapter:
$$E(X + Y) = n\frac{M}{N} = (2)\frac{5}{13} = \frac{10}{13},$$
and (as we will see below in the more general case, in Example 39.12) we have:
$$\mathrm{Var}(X + Y) = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N - n}{N - 1} = (2)\frac{5}{13}\left(1 - \frac{5}{13}\right)\frac{13 - 2}{13 - 1} = \frac{220}{507}.$$
In Example 39.11, the random variable X1 + X2 is Hypergeometric, because
there are N = 10 songs available altogether; n = 2 of the songs are chosen; and
M = 7 of the songs are desirable.
and we know E(Xi ) = M/N and E(Xj ) = M/N . Also, Xi and Xj are each 0
or 1, so Xi Xj is 0 or 1. In other words, Xi and Xj are each Bernoulli, so Xi Xj
is Bernoulli too. Thus
E(Xi Xj ) = P (Xi Xj = 1)
= P (Xi = 1 and Xj = 1)
= P (Xi = 1)P (Xj = 1 | Xi = 1).
The n terms of the first type are all alike; none depend on j. The $\binom{n}{2} = n(n-1)/2$ terms of the second type are also alike; none of them depend on i
or j. Thus
$$\mathrm{Var}(X) = n\frac{M}{N}\left(1 - \frac{M}{N}\right) + 2\,\frac{(n)(n-1)}{2}\left(\frac{M}{N}\cdot\frac{M-1}{N-1} - \frac{M}{N}\cdot\frac{M}{N}\right).$$
After a bit of simplification using the common denominator $N^2(N - 1)$, this
yields the result that the variance of a Hypergeometric random variable is
$$\mathrm{Var}(X) = \frac{nM(N - M)(N - n)}{N^2(N - 1)} = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N - n}{N - 1}.$$
In this example,
On the other hand, X and Y are not independent, because the value of Y is
completely dependent on the value of X.
Example 39.15. Let X be any random variable with expected value 0, and
let Y be defined such that
$Y = 0$ whenever $X \ne 0$,
and let
$Y \ne 0$ otherwise; e.g., let Y = 13 when X = 0.
Then the distribution of Y again depends on the distribution of X, but also X
and Y again have covariance 0, since XY = 0 in all cases in this example, and
thus
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0 - (0)E(Y) = 0.$$
To see this, we compute (in a similar fashion to the argument for sums of
variances from Section 12.3),
$$\mathrm{Cov}\left(\sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j\right) = E\left(\sum_{i=1}^{n} a_i X_i \sum_{j=1}^{m} b_j Y_j\right) - E\left(\sum_{i=1}^{n} a_i X_i\right) E\left(\sum_{j=1}^{m} b_j Y_j\right)$$
$$= E\left(\sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j X_i Y_j\right) - \sum_{i=1}^{n} a_i E(X_i) \sum_{j=1}^{m} b_j E(Y_j)$$
$$= \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j E(X_i Y_j) - \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j E(X_i)E(Y_j)$$
$$= \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j \left(E(X_i Y_j) - E(X_i)E(Y_j)\right)$$
$$= \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j \,\mathrm{Cov}(X_i, Y_j).$$
This result is often used in a context without any constants (i.e., all ai = 1 and
bj = 1), because we often need to take the covariance of two sums of random
variables. This happens, for instance, when we want to take the covariance of
a sum of indicators with another sum of indicators. As another example, if n
and m are large, and the Xi ’s are independent from each other, and the Yj are
independent from each other, but the Xi ’s and Yj ’s have some dependencies,
then X1 + · · · + Xn is approximately Normal, and Y1 + · · · + Ym is approximately
Normal too, and the covariance tells us how the two sums are related.
Cov(aX, bY ) = ab Cov(X, Y ).
For instance, if X is the charge for an oil change, and a = 1.07, then aX is the
charge for an oil change, with the 7% additional tax included. Similarly, if Y is
the number of containers of oil that are needed, and b is the price per container,
then bY is the price of the oil itself. Thus we can switch from Cov(aX, bY ) to
instead focusing on ab Cov(X, Y ). This could be helpful, especially if the charge
for an oil change is directly related to the number of containers of oil itself. The
other things, like the tax and the charge per container of oil, can be factored out
of the covariance equation, to simplify the situation and allow us to calculate
Cov(X, Y ) without the complications of tax or charge per container of oil.
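The factoring rule Cov(aX, bY) = ab Cov(X, Y) is easy to illustrate numerically. In the Python sketch below, the particular constants and the dependence between X and Y are made up for illustration:

    # Numerical illustration of Cov(aX, bY) = ab * Cov(X, Y)
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1_000_000)
    y = x + rng.normal(size=1_000_000)  # dependent on x, so Cov(X, Y) != 0
    a, b = 1.07, 4.5

    print(np.cov(a * x, b * y)[0, 1])   # empirical Cov(aX, bY)
    print(a * b * np.cov(x, y)[0, 1])   # ab * Cov(X, Y): the same, up to noise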
39.6 Correlation
In the previous example, if X had been, for instance, time spent procrastinating,
then X and Y would have had a negative correlation.
The correlation of X and Y is defined as follows:
Definition 39.20. Correlation of two random variables
The correlation of two random variables X and Y , usually written as ρ, is
defined as
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
Note: The variance is only zero when a random variable is constant. So,
as long as X and Y are not constant, then the correlation between them is
well-defined.
To see that the correlation of two random variables X and Y is always
between −1 and +1, it is very helpful to write $\sigma_X$ and $\sigma_Y$ for the standard
deviations of X and Y, and then to study how $\mathrm{Var}\left(\frac{X}{\sigma_X} + \frac{Y}{\sigma_Y}\right)$ and $\mathrm{Var}\left(\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}\right)$
behave. In fact, the correlation is like the covariance, but scaled by the sizes of
the variances of the random variables.
We note that variances are always nonnegative, so
$$0 \le \mathrm{Var}\left(\frac{X}{\sigma_X} + \frac{Y}{\sigma_Y}\right) = \mathrm{Var}\left(\frac{X}{\sigma_X}\right) + \mathrm{Var}\left(\frac{Y}{\sigma_Y}\right) + \mathrm{Cov}\left(\frac{X}{\sigma_X}, \frac{Y}{\sigma_Y}\right) + \mathrm{Cov}\left(\frac{Y}{\sigma_Y}, \frac{X}{\sigma_X}\right),$$
but covariance is symmetric, so the last two terms above are the same. Also, we
can factor out the standard deviations to get $\mathrm{Var}\left(\frac{X}{\sigma_X}\right) = \frac{1}{\sigma_X^2}\mathrm{Var}(X) = 1$ and,
similarly, $\mathrm{Var}\left(\frac{Y}{\sigma_Y}\right) = 1$, while $\mathrm{Cov}\left(\frac{X}{\sigma_X}, \frac{Y}{\sigma_Y}\right) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \rho(X, Y)$. So we get
$0 \le 2 + 2\rho(X, Y)$, i.e., $\rho(X, Y) \ge -1$; the analogous computation with
$\mathrm{Var}\left(\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}\right)$ gives $\rho(X, Y) \le 1$. Altogether,
$$-1 \le \rho(X, Y) \le 1.$$
Example 39.22. Let X be a Uniform random variable on the interval [0, 1],
and let $Y = X^2$. Find the correlation between X and Y.
We see that
$$E(X) = \int_0^1 (x)(1)\, dx = 1/2 \quad \text{and} \quad E(X^2) = \int_0^1 (x^2)(1)\, dx = 1/3,$$
so
$$\mathrm{Var}(X) = 1/3 - (1/2)^2 = 1/12.$$
Also
$$E(Y) = E(X^2) = 1/3 \quad \text{and} \quad E(Y^2) = E(X^4) = \int_0^1 (x^4)(1)\, dx = 1/5,$$
so
$$\mathrm{Var}(Y) = 1/5 - (1/3)^2 = 4/45.$$
Also
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) - E(X)E(Y),$$
and we know E(X) = 1/2 and E(Y) = 1/3, and we see that $E(X^3) = \int_0^1 x^3\, dx = 1/4$, so
$$\mathrm{Cov}(X, Y) = 1/4 - (1/2)(1/3) = 1/12.$$
Thus X and Y are positively correlated, and the correlation between X and Y
is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{1/12}{\sqrt{1/12}\sqrt{4/45}} = \frac{\sqrt{15}}{4} = 0.968.$$
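This correlation is simple to verify by simulation; the Python check below is ours, not the book's:

    # Estimate the correlation of X ~ Uniform[0, 1] and Y = X^2
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(1_000_000)
    y = x**2
    print(np.corrcoef(x, y)[0, 1], np.sqrt(15) / 4)   # both about 0.968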
Example 39.23. Roll a die. Let X denote the value on the die. Let Y = 1 if
the die is 4, 5, or 6, and let Y = 0 otherwise. Find the correlation between X
and Y . Informally, when Y = 1, we know X must be larger (than when Y = 0).
So we anticipate X and Y being strongly positively correlated.
We see that
$$E(X) = \sum_{j=1}^{6} (j)(1/6) = 7/2 \quad \text{and} \quad E(X^2) = \sum_{j=1}^{6} (j^2)(1/6) = 91/6,$$
so
$$\mathrm{Var}(X) = 91/6 - (7/2)^2 = 35/12.$$
Also
$$E(Y) = (3/6)(1) + (3/6)(0) = 1/2 \quad \text{and} \quad E(Y^2) = (3/6)(1^2) + (3/6)(0^2) = 1/2,$$
so
$$\mathrm{Var}(Y) = 1/2 - (1/2)^2 = 1/4.$$
Also
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y),$$
and we know that $E(XY) = (4)(1/6) + (5)(1/6) + (6)(1/6) = 15/6 = 5/2$, since XY = X when Y = 1 and XY = 0 otherwise. So we have
$$\mathrm{Cov}(X, Y) = 5/2 - (7/2)(1/2) = 3/4.$$
Thus X and Y are positively correlated, and the correlation between X and Y
is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{3/4}{\sqrt{35/12}\sqrt{1/4}} = \frac{3\sqrt{105}}{35} = 0.878.$$
Example 39.24. Roll a die. Let X denote the value on the die. Let Y = 1 if
the die is even, i.e., is 2, 4, or 6, and let Y = 0 otherwise. Find the correlation
between X and Y . Informally, when Y = 1, we know X must be slightly larger
(than when Y = 0), because for instance X is 2 not 1, or X is 4 not 3, or X is
6 not 5. So we anticipate X and Y being a little bit positively correlated.
As before, E(X) = 7/2, Var(X) = 35/12, E(Y) = 1/2, and Var(Y) = 1/4. Also
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y),$$
and here $E(XY) = (2)(1/6) + (4)(1/6) + (6)(1/6) = 2$, so
$$\mathrm{Cov}(X, Y) = 2 - (7/2)(1/2) = 1/4.$$
Thus X and Y are positively correlated, and the correlation between X and Y
is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{1/4}{\sqrt{35/12}\sqrt{1/4}} = \frac{\sqrt{105}}{35} = 0.293.$$
Thus, X and Y are positively correlated, but are not as strongly correlated as
in the previous example.
Example 39.25. Roll a die. Let X denote the value on the die. Let Y = 1 if
the die is odd, i.e., is 1, 3, or 5, and let Y = 0 otherwise. Find the correlation
between X and Y . Informally, when Y = 1, we know X must be slightly smaller
(than when Y = 0), because for instance X is 1 not 2, or X is 3 not 4, or X is
5 not 6. So we anticipate X and Y being a little bit negatively correlated. This
is completely symmetric to Example 39.24.
Here $E(XY) = (1)(1/6) + (3)(1/6) + (5)(1/6) = 3/2$, so we have
$$\mathrm{Cov}(X, Y) = 3/2 - (7/2)(1/2) = -1/4.$$
Thus, X and Y are negatively correlated, and the correlation between X
and Y is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{-1/4}{\sqrt{35/12}\sqrt{1/4}} = -\frac{\sqrt{105}}{35} = -0.293.$$
Example 39.26. Roll a die. Let X denote the value on the die. Let Y = 1
if the die is 1, 2, or 3, and let Y = 0 otherwise. Find the correlation between
X and Y . Informally, when Y = 1, we know X must be smaller (than when
Y = 0). So we anticipate X and Y having a strong negative correlation.
This is completely symmetric to Example 39.23.
Here $E(XY) = (1)(1/6) + (2)(1/6) + (3)(1/6) = 1$, so
$$\mathrm{Cov}(X, Y) = 1 - (7/2)(1/2) = -3/4.$$
Thus X and Y are negatively correlated, and the correlation between X and Y
is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{-3/4}{\sqrt{35/12}\sqrt{1/4}} = -\frac{3\sqrt{105}}{35} = -0.878.$$
$$E(X) = P(X = 1) = P(A), \qquad \mathrm{Var}(X) = P(X = 1)P(X = 0) = P(A)P(A^c),$$
$$E(Y) = P(Y = 1) = P(B), \qquad \mathrm{Var}(Y) = P(Y = 1)P(Y = 0) = P(B)P(B^c),$$
$$E(XY) = P(X = 1 \text{ and } Y = 1) = P(A \cap B),$$
so
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{P(A \cap B) - P(A)P(B)}{\sqrt{P(A)P(A^c)}\sqrt{P(B)P(B^c)}}.$$
The denominator is always positive. Thus, the sign of the correlation of X and
Y only depends on the sign of the numerator, $P(A \cap B) - P(A)P(B)$.
39.7 Exercises
39.7.1 Practice
Exercise 39.1. Pizza for lunch. Each day, Amy eats lunch at the cafeteria.
She chooses pizza as her main dish with probability 40%, and her behavior each
day is independent of all the other days. Let X denote the number of days she
chooses pizza in a 10-day period. Let Y = 10 − X denote the number of days
in which Amy does not eat pizza.
Exercise 39.2. Let X be Uniformly distributed on the interval [0, 10], and let
Y = 10 − X.
Exercise 39.3. Client meeting. An accountant must meet with one more
client before he can go home. The amount of time X (in minutes) that he meets
with the client is Uniformly distributed on the interval [40, 60]. The total length
of time Y (also in minutes) that he must remain in the office is 1.3X + 10.
Exercise 39.4. Broken crayons. You are babysitting for two children, Abby
and Bill. They have a bucket of 30 crayons, 20 of which are unbroken, and
the other 10 are broken. They each choose a crayon, without replacement. Let
X = 1 if Abby gets an unbroken crayon, and X = 0 otherwise. Similarly, let
Y = 1 if Bill gets an unbroken crayon, and Y = 0 otherwise.
39.7.2 Extensions
Find (a) the covariance, and (b) the correlation of the random variables X and
Y defined in Exercises 39.5–39.15.
Exercise 39.6. Let X be Uniformly distributed on the interval [0, π], and let
Y = cos X.
Exercise 39.7. Sweet and sour. Henry and Sally each choose 1 candy from
a bag of treats, without replacement. There are 20 sweet candies and 3 sour
candies. Let X = 1 if Henry’s candy is sweet, or X = 0 otherwise. Let Y = 1
if Sally’s candy is sweet, or Y = 0 otherwise.
Exercise 39.9. Let X and Y have a joint Uniform distribution on the triangle
with corners at (0, 2), (2, 0), and the origin.
Exercise 39.16. Consider X and Y such that the joint density fX,Y(x, y) of X and Y is Uniform on the square where 0 ≤ x, y ≤ 1. In other words, fX,Y(x, y) = 1 for 0 ≤ x, y ≤ 1, and fX,Y(x, y) = 0 otherwise.
39.7.3 Advanced
Exercise 39.18. Let X and Y correspond to the horizontal and vertical coordinates in the triangle with corners at (2, 0), (0, 2), and the origin. Let fX,Y(x, y) = (15/28)(xy² + y) for (x, y) inside the triangle, and fX,Y(x, y) = 0 otherwise.
a. Find the covariance of X and Y .
b. Find the correlation of X and Y .
Exercise 39.19. If X and Y have a constant joint density on the triangle where
0 ≤ y ≤ x ≤ 1, compute Cov(X, Y ).
Exercise 39.20. Draw 5 cards, without replacement, from a standard deck of
cards. Let X be the number of hearts selected. Find the variance of X.
Exercise 39.21. Suppose that Sasha picks an integer X at random between 1
and 6 (inclusive). Then Ravi picks a different integer Y at random, from the
5 integers that remain. Assume that all (6)(5) = 30 choices are equally likely.
Find the correlation ρ(X, Y ) of X and Y .
540 Chapter 39. Variance of Sums; Covariance; Correlation
Exercise 39.22. A total of 3n bears are in a bucket: n are red, n are yellow,
and n are blue. A child begins grabbing the bears at random, with all selections
equally likely. The bears are selected “without replacement,” i.e., she never puts
the bears back after she grabs them. Find the variance of the number of bears
she grabs until the first red bear appears.
Exercise 39.23. Roll 10 differently colored, 6-sided dice. Make a list of all (10 choose 3) = 120 triples using the 10 values that appear. Let X be the total number of such triples for which the three values in the triple agree. Find Var(X).
Chapter 40
Conditional Expectation
Statistically the probability of any one of us being here is so small that you would
think the mere fact of existence would keep us all in a contented dazzlement
of surprise. We are alive against the stupendous odds of genetics, infinitely
outnumbered by all the alternates who might, except for luck, be in our places.
—The Lives of a Cell: Notes of a Biology Watcher by Lewis Thomas
(Viking, 1974)
If a woman is waiting on her spouse, how does her arrival time affect the ex-
pected time until he arrives?
40.1 Introduction
The idea of conditional expectation of one random variable (say, X), given the
value of another random variable (say, Y = y), is to use the conditional mass or
the conditional density to calculate the expected value of one random variable
(X) when the value of the other is known (Y ). In other words, we know some
information ahead of time about one of our random variables that may affect
the mass or density—and, thus, the probabilities and expected value—of the
other random variable.
40.2 Examples
As an example, returning to the situation in Example 39.23, we have the fol-
lowing:
Example 40.2. Roll a die. Let X denote the value on the die. Let Y = 1 if
the die is 4, 5, or 6, and let Y = 0 otherwise.
Given Y = 1, what is the expected value of X?
When Y = 1 is known, then X is equally likely to be any of the values 4, 5, or 6, and thus E(X | Y = 1) = (1/3)(4) + (1/3)(5) + (1/3)(6) = 5.
On the other hand, when Y = 0 is known, then X is equally likely to be any of the values 1, 2, or 3, and thus E(X | Y = 0) = (1/3)(1) + (1/3)(2) + (1/3)(3) = 2.
Example 40.3. Roll a die. Let X denote the value on the die. Let Y = 1 if
the die is even, i.e., is 2, 4, or 6, and let Y = 0 otherwise. Given Y = 1, what
is the expected value of X?
When Y = 1 is known, X is equally likely to be 2, 4, or 6, so E(X | Y = 1) = 4; and when Y = 0 is known, X is equally likely to be 1, 3, or 5, so E(X | Y = 0) = 3.
Example 40.4. Two phone calls arrive one after the other, and their lengths X1 and X2 are independent Exponential random variables, each with λ = 3. We let X (also called X1) denote the time of the first call, so X has density fX(x) = 3e^(−3x) for x > 0. We let Y (also called X1 + X2) denote the total time of call
one plus call two, combined, so as in Example 33.2, Y is a Gamma random
variable with n = 2 and λ = 3, so Y has density fY (y) = 9ye−3y for y > 0, and
fY (y) = 0 otherwise.
Given that the first two calls take 5/6 of an hour total, i.e., given that
Y = 5/6, find the expected length of the first call. We compute
E(X | Y = 5/6) = ∫_{−∞}^{∞} x fX|Y(x | 5/6) dx = ∫_{−∞}^{∞} x [fX,Y(x, 5/6)/fY(5/6)] dx.
We see that
fY (5/6) = 9(5/6)e−(3)(5/6) = (15/2)e−5/2 .
Also
fX,Y (x, y) = fY |X (y | x)fX (x).
Once X is known, then Y > X. For any y > x, we have P (Y < y | X = x) =
P (Y − x < y − x | X = x) = P (X2 < y − x). Differentiating with respect to y
gives fY |X (y | x) = fX2 (y − x). (Less formally, this says that, once the length
of the first call is known, the remaining length until the end of both calls is
just equal to the length of the second call.) So fY |X (y | x) = 3e−(3)(y−x) for
y − x > 0, i.e., for y > x. Therefore, for 0 < x < 5/6, we have
fX,Y(x, 5/6) = 3e^(−3((5/6)−x)) · 3e^(−3x) = 9e^(−5/2).
Now we compute
E(X | Y = 5/6) = ∫_{−∞}^{∞} x [fX,Y(x, 5/6)/fY(5/6)] dx
= ∫_0^{5/6} x [9e^(−5/2) / ((15/2)e^(−5/2))] dx
= ∫_0^{5/6} (x)(6/5) dx
= 5/12.
So given that the two calls last 5/6 of an hour altogether, then the first call is
expected to last 5/12 of an hour.
This should make sense intuitively, because when only the total length of
two calls is known, we have no reason to believe that one call will be longer
than the other, so the expected length of the first call plus the expected length
of the second call must equal the total length of the two calls, i.e., 5/6. So the
conditional expected length of each call must be 5/12. (Each call is expected
to last half of the total time.)
As before, in the last line, only “y” remains; no “x” is still in the picture.
So again we are computing the expected value of E(X | Y = y), which only
depends on Y . Thus, the last line is equal to E( E(X | Y = y) ), and again the
outer summation is taken with regard to Y , while the inner summation is taken
over all possible X. This is also sometimes written as E( E(X | Y ) ), where once
again the outer expected value is with regard to Y , and the inner expected value
is with regard to X. So we have
E(X) = Σ_y E(X | Y = y) pY(y)
= E( E(X | Y = y) )
= E( E(X | Y) ).
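As a quick sanity check of E(X) = E(E(X | Y)), here is a small simulation of Example 40.2 (a sketch assuming NumPy is available; the variable names are ours):

import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(1, 7, size=1_000_000)   # die rolls
y = (x >= 4).astype(int)                 # Y = 1 if the die is 4, 5, or 6

# inner expectations E(X | Y = y), estimated from the simulated rolls
e_given = {v: x[y == v].mean() for v in (0, 1)}      # about 2 and 5
p_y = {v: (y == v).mean() for v in (0, 1)}           # each about 1/2

outer = sum(e_given[v] * p_y[v] for v in (0, 1))     # E( E(X | Y) )
print(outer, x.mean())                               # both about 3.5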
Example 40.5. If X and Y are continuous random variables that are indepen-
dent, then
E(X | Y = y) = E(X).
To see this, we just write
E(X | Y = y) = ∫_{−∞}^{∞} x fX|Y(x | y) dx,
and independence gives fX|Y(x | y) = fX(x), so the integral is ∫_{−∞}^{∞} x fX(x) dx = E(X).
Of course the arguments in the previous example and the following example only work when X and Y are independent. The argument would fail if X and Y were dependent.
Example 40.6. If X and Y are discrete random variables that are independent,
then
E(X | Y = y) = E(X).
To see this, we just write
E(X | Y = y) = Σ_x x pX|Y(x | y),
and independence gives pX|Y(x | y) = pX(x), so the sum is Σ_x x pX(x) = E(X). Thus
E(X | Y = y) = E(X).
Example 40.8. Suppose that Alice and Bob each eat one cookie per day,
on five consecutive days, so that they eat a total of ten cookies altogether.
They both love peanut butter cookies, but the cookie selection in each of their
dormitories is random. Let X1 , . . . , X5 be indicators for Alice’s five cookies, and
let X6 , . . . , X10 be indicators for Bob’s five cookies, where Xj = 1 if the jth
cookie is peanut butter, and Xj = 0 otherwise. Assume that the Xj ’s are all
independent Bernoulli random variables, each with parameter p = 0.40, because
there is always a fresh supply of cookies (in particular, Alice and Bob’s choices
each day do not affect each other).
Write X = X1 + · · · + X5 for the total number of peanut butter cookies that
Alice gets to eat over a five day period. Write Y = X1 + · · · + X10 for the total
number of peanut butter cookies that the couple gets to eat. Find the expected
number of cookies that Alice got to eat, given that the couple got to eat Y = 7
peanut butter cookies altogether, i.e., find E(X | Y = 7).
Given Y = 7, we claim that X has conditional distribution as a Hypergeo-
metric random variable with parameters M = 5 and N = 10 and n = 7. Using
what we already know about the Hypergeometric distribution, this yields
E(X | Y = 7) = nM/N = (7)(5)/10 = 3.5.
This makes sense because given Y = 7, there are 7 peanut butter cookies that the couple gets, so the number of peanut butter cookies that Alice gets is Hypergeometric: her 5 cookies are drawn from these 7 peanut butter cookies and 3 non-peanut-butter cookies. Each of
the 7 peanut butter cookies is equally likely to belong to Alice or Bob, so Alice
expects to get 3.5 peanut butter cookies, and Bob expects to get 3.5 peanut
butter cookies.
In case the reader is unconvinced by the claim that the conditional distribution of X is Hypergeometric (given Y = 7), we can instead compute the conditional mass pX|Y(x | 7) directly, and then obtain the conditional expected value from it.
pX|Y(x | 7) = pX,Y(x, 7)/pY(7) = pX(x) pY|X(7 | x)/pY(7).
Since X is the number of peanut butter cookies Alice ate over five days, then
X is Binomial with parameters n = 5 and p = 0.40, so
pX(x) = (5 choose x)(0.4)^x(0.6)^(5−x).
Since Y is the total number of peanut butter cookies eaten by the couple over
five days, then Y is Binomial with parameters n = 10 and p = 0.40, so
pY(7) = (10 choose 7)(0.4)^7(0.6)^3.
Now we need the conditional probability pY|X(7 | x), i.e., the probability
that the couple has eaten 7 peanut butter cookies total over the five days, given
that Alice ate x of them. Basically, the piece we are missing is the number
of peanut butter cookies that Bob ate. The only way that they eat 7 peanut
butter cookies altogether—given that Alice ate x of them—is for Bob to eat
7 − x of them. The number of peanut butter cookies that Bob eats is Binomial
with n = 5 and p = 0.4. So
pY|X(7 | x) = P(Y = 7 | X = x) = (5 choose 7−x)(0.4)^(7−x)(0.6)^(5−(7−x)).
Thus, putting all the pieces together for the probability pX|Y (x | 7),
pX|Y(x | 7) = [(5 choose x)(0.4)^x(0.6)^(5−x) · (5 choose 7−x)(0.4)^(7−x)(0.6)^(5−(7−x))] / [(10 choose 7)(0.4)^7(0.6)^3]
= (5 choose x)(5 choose 7−x) / (10 choose 7).
This verifies the claim that X has a Hypergeometric distribution, given that
Y = 7.
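A simulation makes the same point without any algebra. In this sketch (assuming NumPy is available; the setup mirrors Example 40.8), we generate the ten Bernoulli indicators, keep only the runs where Y = 7, and compare the empirical conditional mass to the Hypergeometric mass:

import numpy as np
from math import comb

rng = np.random.default_rng(2)
cookies = rng.random((1_000_000, 10)) < 0.40   # ten Bernoulli(0.40) indicators
alice = cookies[:, :5].sum(axis=1)             # X = Alice's peanut butter count
couple = cookies.sum(axis=1)                   # Y = the couple's total

given7 = alice[couple == 7]                    # X restricted to runs with Y = 7
print(given7.mean())                           # about 3.5 = (7)(5)/10
for x in range(2, 6):                          # X must be between 2 and 5 when Y = 7
    print(x, (given7 == x).mean(), comb(5, x) * comb(5, 7 - x) / comb(10, 7))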
Example 40.9. Suppose that each person who logs onto an online retailer’s
website on Black Friday is expected to spend $27.50. Exactly 120 people are
surveyed to see how much is spent by the people within this group. Each behaves
independently of the others, and each visits the online retailer with probability
0.90. How much money do we expect the group of 120 people to spend at the
online retailer?
We let Y denote the number of the people within the group of 120 who
decide to visit the online retailer. Thus Y is Binomial with parameters n = 120
and p = 0.90. Once Y = y is given, we know that exactly y people visit the
store, so the money spent by the group is X1 + · · · + Xy, where Xj denotes the amount spent by the jth visitor. Thus the expected money
spent by the group given Y = y is
E(X1 + · · · + Xy | Y = y) = 27.50 + · · · + 27.50 (y terms) = (y)(27.50).
Thus, the expected money spent by the group, averaged over all possible values
of Y = y, is
E( Σ_{j=1}^{Y} Xj ) = E((Y)(27.50)) = (27.50)(120)(0.90) = 2970.
To see why this averaging works, we can write the computation out fully:
E( Σ_{j=1}^{Y} Xj ) = Σ_{y=0}^{120} E( Σ_{j=1}^{y} Xj ) pY(y)
= Σ_{y=0}^{120} (27.50)(y) pY(y)
= 27.50 Σ_{y=0}^{120} (y) pY(y)
= 27.50 E(Y)
= (27.50)(120)(0.9)
= 2970.
More generally, when the Xj's all have the same expected value and Y is independent of the Xj's, the same reasoning gives
E(X1 + · · · + XY) = E(X1)E(Y).
So the expected value of X1 + · · · + XY is just equal to the expected value of
one of the Xj ’s, multiplied by the expected value of Y , i.e., by the number of
random variables that we expect to add.
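A short simulation of Example 40.9 illustrates this identity. The Exponential choice for each visitor's spending below is our own arbitrary assumption—only its mean of $27.50 matters for the expected total (a sketch assuming NumPy is available):

import numpy as np

rng = np.random.default_rng(3)
y = rng.binomial(120, 0.90, size=100_000)            # Y = number of visitors

# total spent: a sum of Y independent amounts, each with mean $27.50
totals = np.array([rng.exponential(27.50, size=k).sum() for k in y])
print(totals.mean())                                 # about 2970
print(27.50 * 120 * 0.90)                            # E(X1) E(Y) = 2970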
Example 40.12. Suppose that the number of shoppers coming into a store
follows a Poisson distribution with an expected value of 10 per hour. Suppose
also that each shopper is—independent of all other shoppers—equally likely to
be a man or a woman.
Given that exactly 7 men are shopping, how many women do we expect are
shopping?
Let X and Y denote the number of men and women shoppers, respectively,
and let N = X + Y be the total number of shoppers, which is known to be a
Poisson random variable with mean 10.
We see that, for nonnegative integers x and y,
pX,Y(x, y) = P(N = x + y) P(X = x | N = x + y)
= [e^(−10) 10^(x+y)/(x + y)!] · (x + y choose x)(1/2)^x (1/2)^y
= [e^(−5) 5^x/x!] [e^(−5) 5^y/y!].
So we have factored the joint mass into “x stuff” and “y stuff,” each of which
is a mass, and thus X and Y must be independent. We see from each of their
masses that X and Y are independent Poissons, each with expected value 5.
So, X is Poisson with expected value 5 and Y is Poisson with expected
value 5. Also X and Y are independent. Thus, regardless of how many men
are shopping (e.g., regardless of the fact that exactly 7 men are shopping), the
number of female shoppers is unaffected by the value of X. Thus, even when
given that 7 men are shopping, the expected number of women shopping is
still 5.
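The same conclusion appears in a simulation (a sketch assuming NumPy is available): splitting a Poisson number of shoppers with independent fair coin flips yields two independent Poisson counts.

import numpy as np

rng = np.random.default_rng(4)
n = rng.poisson(10, size=1_000_000)      # total shoppers in an hour
men = rng.binomial(n, 0.5)               # each shopper is a man with prob. 1/2
women = n - men

print(men.mean(), women.mean())          # each about 5
print(np.corrcoef(men, women)[0, 1])     # about 0
print(women[men == 7].mean())            # about 5, even given exactly 7 men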
40.3 Exercises
40.3.1 Practice
Exercise 40.1. Let X and Y have a joint Uniform distribution on the triangle
with corners at (0, 2), (2, 0), and the origin. Find E(Y | X = 1/2).
Exercise 40.2. Errors on a page. Suppose that the number of errors per page
of a book has a Poisson distribution with parameter λ = 0.12. Also suppose
that the expected number of pages in a randomly selected new book is 400.
Find the mean number of errors in such a book.
Exercise 40.3. Arrival times. Consider a man and a woman who arrive at a
certain location; whoever arrives first will wait for the other to arrive. If X and
Y denote (respectively) the arrival times of the man and the woman after noon,
in minutes, assume that X and Y are independent and each Uniformly dis-
tributed on [0, 60]. (In other words, the man and woman arrive independently,
at Uniform times between noon and 1 PM.)
If the woman arrives at 12:35 PM, find the expected time spent waiting, i.e.,
find
E( |X − Y | | Y = 35 ).
Hint: Since we know that Y = 35, we may as well just substitute “35” for Y , so
that we only have X’s in our lives. So we just need to find
E( |X − 35| ).
Exercise 40.4. Female heights. As in Exercise 35.24 and Example 36.10, as-
sume that the height (in inches) of an American female is Normal with expected
value µ1 = 64 and standard deviation σ1 = 2.5. Also assume that the height
Exercise 40.6. Mac and cheese. While Juanita waits for her Mac and Cheese
to boil, she works on her homework for 1/2 of the time. She has been waiting
for 3 minutes already (so she has already done 1.5 minutes of homework). She
doesn’t know how much more time is needed. She estimates that the remaining
time (in minutes) until the water boils is an Exponential random variable Y
with density fY(y) = (1/4)e^(−y/4) for y > 0, and fY(y) = 0 otherwise. Find the total
length of time that she expects to work on her homework while waiting for the
water to boil.
Exercise 40.7. Flowers. Sally and David each pick 10 flowers from the case
without paying attention to what type of flowers they are picking. There are a
large quantity of flowers available, 20% of which are roses. Let X be the number
of roses that Sally picks, and let Y be the number of roses that the couple picks
altogether. Find the number of roses that we expect Sally to pick if the total
number picked is Y = 12.
Exercise 40.8. Selling cookies. In a particular girl scout troop, each girl
sells an average of 30 boxes of cookies. Let Y be the number of girls in the
troop, and let X be the number of boxes of cookies sold. Find E(X | Y = y).
assumed to be independent. Given that the total duration of the two appoint-
ments turns out to be 38 minutes altogether, how long do we expect the first
appointment to be?
Exercise 40.12. First aid. A nurse has 10 minutes to administer first aid
to various soldiers in the middle of a battle. The time (in minutes) devoted to
the first soldier is Y , and the time (in minutes) devoted to the second soldier is
X. Assume that (X, Y ) is Uniformly distributed on the triangle where X ≥ 0,
Y ≥ 0, and X + Y ≤ 10. If we know that Y = 3, i.e., that she spends exactly 3
minutes with the first soldier, how much time do we expect that she will spend
with the second soldier?
40.3.2 Extensions
Exercise 40.14. Roll two 6-sided dice. Let X denote the minimum value that
appears, and let Y denote the maximum value that appears.
Exercise 40.15. A child rolls a pair of dice, one of which is blue and one of
which is red.
a. Given that the sum of the dice is 8, find the probability that the red die
shows the value 4.
b. Now the child rolls a pair of dice that look the same (i.e., which are not
painted). Given that the sum of the dice is 8, find the probability that both of
the dice simultaneously show the value 4.
c. Are your answers the same or different in the two parts above? Why?
Exercise 40.16. Two 6-sided dice are rolled. Let X be the minimum of the
two values, and let Y be the absolute value of the difference of the two values.
If D1 , D2 are the two values on the two dice, then
X = min(D1 , D2 ),
and
Y = |D1 − D2 |.
If Y = 4, what is the expected value of X?
Exercise 40.17. Guitar songs. Helen and Joe play guitar together every day
at lunchtime. The number of songs that they play on a given day has a Poisson
distribution, with an average of 5 songs per day. Regardless of how many songs
they will play that day, Helen and Joe always flip a coin at the start of each
song, to decide who will play the solo on that song. If we know that Joe plays
exactly 4 solos on a given day, then how many solos do we expect that Helen
will play on that same day?
Exercise 40.18. Black Jack. Let Y denote the value of the dealer’s card
that can be seen by all the players, in a game of Black Jack. Let X be a
Bernoulli random variable that indicates whether the dealer must stay (i.e., not
take another card). Given Y = 10, find the expected value of X. Hint: In Black
Jack, if the dealer has a total of 17 or greater in her hand, then she must stay;
if her total is 16 or less, she will draw. For the purposes of this question, the
Ace has value 11, and each face card (Jack, Queen, or King) has the value 10.
Exercise 40.20. Sandra rolls two 10-sided dice. Let Y be the sum of the two
dice, and let X be the value on the first die that she rolls. Given Y = 15, what
is the expected value of X?
Exercise 40.21. Date wait. Suppose that Harrison comes to pick up Rosita
for a date. Rosita will be ready at a time that is Uniformly distributed between
7 PM and 7:10 PM. If Harrison arrives at 7:07 PM, how long does he expect
to have to wait until Rosita is ready? (If she is already ready when he arrives,
then his waiting time is 0, since a waiting time cannot be negative.)
40.3.3 Advanced
Exercise 40.23. Let X and Y correspond to the horizontal and vertical coordinates in the triangle with corners at (2, 0), (0, 2), and the origin. Let fX,Y(x, y) = (15/28)(xy² + y) for (x, y) inside the triangle, and fX,Y(x, y) = 0 otherwise. Find E(X | Y = 1.5).
Exercise 40.24. Show that the generalization of Example 40.12, given in a box
at the end of the chapter, is true.
Chapter 41
Markov and Chebyshev Inequalities
If we know the average speed of drivers on a highway, what kind of bounds can
we give on the probability that a randomly selected driver is speeding? How
about bounds on the probability that a randomly chosen driver is driving within
5 miles of the average speed?
41.1 Introduction
One subdiscipline within probability theory is the study of probability inequal-
ities. These inequalities are used to put bounds on how large or small a proba-
bility can be, under certain conditions. Some bounds are stronger than others.
Some inequalities have many conditions, and other inequalities are quite simple.
Many different types of inequalities are used in limiting situations, i.e., what
happens to a limit of a sequence of random variables and/or their associated
probabilities. As with many topics covered during this course, the study of
probability inequalities is very rich and deep. We could spend many chapters
on probability inequalities, but here we only scratch the surface.
The Markov inequality gives bounds on how large (the absolute value of) a
random variable can be.
41.2 Markov Inequality
If X is any random variable and a is any positive number, then
P(|X| ≥ a) ≤ E(|X|)/a.
To see that the Markov inequality is true, we just consider whether |X| ≥ a.
More precisely, we let Y = 1 if |X| ≥ a, and Y = 0 otherwise. Then Y is an
indicator, so
P (|X| ≥ a) = P (Y = 1)
= E(Y ).
• If Y = 1, then |X| ≥ a, so |X|/a ≥ 1, and thus
Y = 1 ≤ |X|/a.
• If Y = 0, then
Y = 0 ≤ |X|/a
because |X| and a are both greater than or equal to 0.
In either case, Y ≤ |X|/a, so E(Y) ≤ E(|X|/a) = E(|X|)/a. Putting this together with the series of equations above, we get the Markov inequality:
P(|X| ≥ a) ≤ E(|X|)/a.
As a straightforward extension of the original Markov inequality, we notice
that, if X is always nonnegative (i.e., if X ≥ 0 always), then |X| = X. So, in
such a case, the Markov inequality works for not just |X| but also for X too.
So the second version of the Markov inequality gives bounds on how large a
nonnegative random variable can be.
P(X ≥ a) ≤ E(X)/a.
Notice that the Markov inequality does not require us to know anything
about the distribution of the random variable X, except that we need to know
the expected value. We do not even need to know if X is discrete or continuous.
Example 41.3. Consider a class of students for which the class average is 60%.
Find a bound for the probability that a randomly chosen student’s score is
72% or higher. We let X denote a randomly chosen student’s score. Using the
Markov inequality, we have
P(X ≥ 72) ≤ E(X)/72 = 60/72 = 5/6.
Notice, in this example, that the distribution of X is not even given! All that
is required is that we know the expected value of X.
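To see the bound in action, here is a sketch (assuming NumPy is available) that draws scores from one hypothetical nonnegative distribution with mean 60; the Markov bound must hold no matter which such distribution is actually in play:

import numpy as np

rng = np.random.default_rng(5)
scores = rng.gamma(shape=6.0, scale=10.0, size=1_000_000)  # mean 60, nonnegative
print((scores >= 72).mean())    # the actual probability for this particular choice
print(60 / 72)                  # the Markov bound 5/6; the line above never exceeds it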
Example 41.4. The average salary of actuaries in a certain specialty is known to be $83,000 per year. If X denotes the salary of a randomly chosen actuary in this specialty, the Markov inequality gives
P(X ≥ 90,000) ≤ E(X)/90,000 = 83,000/90,000 = 83/90 ≈ 0.92.
Example 41.5. On a certain highway, the speed limit is 55 miles per hour, but
most drivers are not driving so fast. The average speed on the highway is 49
miles per hour.
If X denotes a randomly chosen driver’s speed, then the probability that such
a person is driving faster than the speed limit is
P(X ≥ 55) ≤ E(X)/55 = 49/55 ≈ 0.89.
As we will see below with the Chebyshev inequality, we can build a sharper inequality when we have more information about the random variable's actual distribution.
One unfortunate thing about the Markov inequality is that we do not necessarily get sharp bounds (i.e., precise bounds) for P(|X| ≥ a). Another unfortunate thing about the Markov inequality is that it does not help us to bound P(|X| ≥ a) when a is smaller than the expected value of |X| (and thus E(|X|)/a is bigger than 1). If we try to apply the Markov inequality when a < E(|X|), then the results that we get are not too interesting, because we will have
P(|X| ≥ a) ≤ E(|X|)/a,
but if E(|X|)/a > 1, this is not very informative, because we can automatically write the tighter bound:
P (|X| ≥ a) ≤ 1,
which is true trivially, just because all probabilities of all events are bounded
above by 1. So we do not gain any additional information in such a case. We
give one such useless example: consider again the class of Example 41.3, in which the class average is 60%. Find a bound for the probability that a randomly chosen student's score is 55% or higher. We let X denote a randomly chosen student's score. Using the
Markov inequality, we have
P(X ≥ 55) ≤ E(X)/55 = 60/55 ≈ 1.09.
This is completely uninteresting, because we could just as well have written
P (X ≥ 55) ≤ 1,
and we have still learned nothing new at all. So the Markov inequality is not
useful to us in such a situation.
41.3 Chebyshev Inequality
The Chebyshev inequality gives bounds on how far a random variable can stray from its expected value: if X is any random variable and k is any positive number, then
P(|X − E(X)| ≥ k) ≤ Var(X)/k².
To see that the Chebyshev inequality is true, we just apply the Markov
inequality, using (X − E(X))2 as the random variable, and using a = k 2 . Then
the Markov inequality yields
P((X − E(X))² ≥ k²) ≤ E((X − E(X))²)/k².
On the left hand side, (X − E(X))2 ≥ k 2 if and only if |X − E(X)| ≥ k, so we
can rewrite the left hand side as P (|X − E(X)| ≥ k). On the right hand side,
E((X − E(X))2 ) = Var(X). So we get
P(|X − E(X)| ≥ k) ≤ Var(X)/k²,
which is exactly the statement of the Chebyshev inequality.
We can get a second version of the Chebyshev inequality if we just use
“k = aσX ” in the version above, where σX is the standard deviation of X. This
yields
P(|X − E(X)| ≥ aσX) ≤ Var(X)/(aσX)²,
for any a > 0. Since Var(X)/σX² = 1, this simplifies to
P(|X − E(X)| ≥ aσX) ≤ 1/a².
The choice of letter that we use is arbitrary, so we write the second version of
the Chebyshev inequality using k’s instead of a’s:
Corollary 41.8. Chebyshev Inequality (version 2)
If X is any random variable and k is any positive number, then
P(|X − E(X)| ≥ kσX) ≤ 1/k².
We could also take complements on both sides of this equation, and rewrite
the equation
P(|X − E(X)| ≥ kσX) ≤ 1/k².
as the following:
1 − P(|X − E(X)| ≥ kσX) ≥ 1 − 1/k²,
or equivalently,
P(|X − E(X)| ≤ kσX) ≥ (k² − 1)/k².
This says that a random variable is within k standard deviations from its expected value at least (k² − 1)/k² of the time. We record this as a third version of the Chebyshev inequality:
P(|X − E(X)| ≤ kσX) ≥ (k² − 1)/k².
Example 41.10. Consider again a class of students for which the class average is 60%, and suppose that the standard deviation of the scores is σX = 10. Find a bound for the probability that a randomly chosen student's score is between 45% and 75%. We let X denote a randomly chosen student's score. Using the Chebyshev inequality with k = 1.5 (since 45 and 75 are each (1.5)(10) = 15 points from the mean), we have
P(|X − 60| ≤ (1.5)(10)) ≥ ((1.5)² − 1)/(1.5)² = 5/9 ≈ 0.56.
As with the Markov inequality, the Chebyshev inequality does not require that
we know that distribution of the scores at all. We just need to know the expected
value and the variance or the standard deviation. Also, as with the Markov
inequality, we do not get a precise answer, but rather, we only get some bounds
on the desired probability.
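A sketch (assuming NumPy is available) drives this home: two quite different score distributions, each with mean 60 and standard deviation 10, both respect the bound (k² − 1)/k² = 5/9 for k = 1.5.

import numpy as np

rng = np.random.default_rng(6)
half_width = np.sqrt(300)   # Uniform(60 - sqrt(300), 60 + sqrt(300)) has sd 10
samples = {
    "Normal": rng.normal(60, 10, size=1_000_000),
    "Uniform": rng.uniform(60 - half_width, 60 + half_width, size=1_000_000),
}
for name, x in samples.items():
    print(name, (np.abs(x - 60) <= 1.5 * 10).mean())   # each at least 5/9 ≈ 0.556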
Example 41.11. Return to the scenario of Example 41.4, in which the average
salary of actuaries in a certain specialty is known to be $83,000 per year. Also
assume that the standard deviation is $20,000 per year.
By the Chebyshev inequality,
P(|X − 83,000| ≥ (2)(20,000)) ≤ 1/2² = 1/4.
Another way to express this is to say that the probability a randomly chosen actuary's salary is between $43,000 and $123,000 is
P(43,000 ≤ X ≤ 123,000) ≥ (2² − 1)/2² = 3/4.
domly chosen actuary’s salary is between $43,000 and $123,000, but the bounds
given here are better than nothing.
We already noted, after Example 41.10, that the Chebyshev inequality does
not require us to know the distribution of the random variable, but it does re-
quire us to know the expected value and to know the variance, or equivalently,
to know the standard deviation. We only get bounds on the probabilities from
the Chebyshev inequality. Also, we do not get any information about the frac-
tion of the time that a random variable is less than (or is more than) k standard
deviations from the expected value, for k < σX in version 1, or equivalently for
k < 1 in versions 2 or 3. The Chebyshev inequality is just not helpful in such
situations. In version 1, the right hand side will be Var(X)/k² > 1 in such a case, which is not helpful. In version 2, the right hand side will be 1/k² > 1, which is not helpful. In version 3, the right hand side will be (k² − 1)/k² < 0, which again is not helpful. So these extreme cases do not yield any useful information
from the Chebyshev inequality.
41.4 Exercises
41.4.1 Practice
Exercise 41.1. Waiting for a bus. While waiting for the bus on a snowy
morning, the expected waiting time (including unusual delays for snow!), is 12
minutes.
Exercise 41.3. Basketball shots. A basketball player has improved his scor-
ing ability. During a game, he can be expected to make 12 shots.
Exercise 41.5. Chicken feathers. A certain type of chicken’s wing has 138
feathers on average, with standard deviation 5. Find a bound on the probability
that the chicken has between 120 and 156 feathers.
Exercise 41.6. Sneezing. Henry caught a cold recently and therefore he has
been sneezing a lot. His expected waiting time between sneezes is 45 seconds.
The standard deviation of the waiting time between his sneezes is 8 seconds.
Find a bound on the probability that the time between two consecutive sneezes
is between 30 and 60 seconds.
Exercise 41.7. Sleepy dog. The expected time for a student’s dog to fall
asleep is 12 minutes. Give an upper bound on the probability that it takes the dog more than 20 minutes to fall asleep.
Exercise 41.8. Music library. In a student’s music library on his mp3 player,
the expected length of a randomly chosen song is 3.2 minutes.
a. Find a bound on the probability that the snowfall in a given winter will
exceed 16 inches.
b. If the standard deviation of snowfall is 3.25 inches, find a bound on the
probability that there is between 6 and 14 inches of snow.
Exercise 41.11. Flight time. A plane flight from Denver to New York City has an average flight time of 3 hours and 16 minutes, with a standard deviation of 30 minutes.
a. Give a bound on the probability that such a flight takes 6 hours or longer.
b. Give a bound on the probability that the announced arrival time (3 hours,
16 minutes) and the actual arrival time differ by 1 hour or more.
Exercise 41.12. Cold weather. The average temperature in December is 27◦ .
a. Find a bound on the probability of the temperature falling outside the
range −50◦ to 50◦ .
b. If the standard deviation of the temperature is 7◦, then find a bound on the probability that the temperature falls outside the range of 14◦ to 40◦.
Exercise 41.13. Final exam. Let X be the number of problems on the final
exam. The professor puts, on average, 30 problems on each exam. The standard
deviation of the number of problems he puts on a final exam is 2. Give a bound for the likelihood that the final exam contains between 25 and 35 problems.
Exercise 41.14. Video download. A video can be downloaded from the
web and moved to a student’s mobile device in 12 minutes, on average. If the
standard deviation of the time required is 2 minutes, then give a bound on the
probability that 9–15 minutes are needed.
Exercise 41.15. Eating habits. In a study on eating habits, a particular
participant averages 750 cm3 of food per meal.
a. It is extraordinarily rare for this participant to eat more than 1000 cm3
of food at once. Find a bound on the probability of such an event.
b. If the standard deviation of a meal size is 100 cm3, then find a bound on the probability that the meal is either too much food, i.e., more than 1000 cm3, or an insufficient amount of food, namely, less than 500 cm3.
Exercise 41.16. Long jumps. An athletic director is recording long-jump
scores for a group of students. The expected value of each of their long jumps
is 7 meters, and the standard deviation is 0.2 meters. If he chooses a student
at random, find a bound on the probability that the student’s long-jump is
between 6.7 and 7.3 meters.
Exercise 41.17. Sled runs. An energetic student can manage to get 9 sled
runs down a hill, on average, within a 30 minute time period.
a. Find an upper bound on the probability that the student achieves 12 or
more runs during a one hour period.
b. If the standard deviation of the number of runs that they can manage in
a 30 minute period is 2, then give a bound on the probability that the student
gets between 6 and 12 sled runs during a 30 minute period.
Exercise 41.18. String theory. Suppose that the expected number of stu-
dents who take String Theory is 23, with variance 169. Suppose that the as-
signed classroom can hold only 40 students. Also, 6 students is the minimum
enrollment, i.e., if 5 or fewer people sign up, the class will be combined with
another. What is a bound on the probability that the class is held (i.e., has
a sufficient number of students) and can be taught in the intended classroom
(i.e., does not have too many students)?
Exercise 41.19. Fair coin. Trying to figure out if a coin is fair, a student
flips it one thousand times (!!!). Give an upper bound on the probability that
he gets 700 or more heads, assuming that the coin was actually a fair coin.
Exercise 41.21. Buying gifts. This holiday season, assume that a person
will, on average, spend $673 on gifts, with a standard deviation of $79. Use
a Chebyshev inequality to find a bound on the probability that a randomly
selected person spends between $568 and $778.
Exercise 41.22. Accountant age. Assume that the average age of tax ac-
countants at a certain CPA firm is 44.
a. Find a bound for the probability that a randomly chosen employee of the
firm is 48 or older.
b. If the standard deviation of the age of a randomly chosen employee is 5
years, then find a bound for the probability that a randomly chosen employee’s
age is between 36 and 52.
41.4.2 Extensions
Exercise 41.23. With the same assumptions as in Exercise 41.2, three classes
are independently selected at random.
a. Let A denote the event that all three of the classes have 40 or more
students (i.e., 40 or more in each class). Find a bound on the probability of A.
Hint: Separate A into three independent events, find a bound on the probability
of each, and then think about how to appropriately combine your bounds.
b. In the scenario above, let B denote the event that all three classes
selected at random will have between 20 and 42 people (i.e., 20 to 42 people
in each class). Find a bound on the probability of B. Hint: Again, separate
B into three independent events, find a bound on each, and then recombine
appropriately.
Chapter 42
Order Statistics
When five students gather and compare their grades, what is the probability
that the highest grade exceeds 97%? What is the probability that at least one
student failed the exam? What is the probability that three or more students
earned a “B” grade or higher?
42.1 Introduction
The concept of order statistics is usually used with continuous random variables,
because the idea of determining the rank of the random variables (i.e., which is
smallest, which is largest, which is second-smallest, etc.) is key to the concept
of order statistics. Any “ties” between two random variables—i.e., any two
random variables that happen to equal exactly the same value—complicate the
issue. With continuous random variables, the probability of any ties is 0, so the
use of continuous random variables allows us to safely ignore the possibility of
ties. It is usually impossible to remove this complication with discrete random
variables, so we do not discuss order statistics of discrete random variables in
this text at all.
Usually we speak of the “order statistics” among independent, identically
distributed random variables. So if X1 , X2 , . . . , Xn are the random variables
under consideration, then the Xj ’s are independent, and all of their densities
fj (x) are the same function (or, equivalently, all of their cumulative distribution
functions Fj (x) are the same function). This is not always the case when work-
ing with order statistics (i.e., we could talk about order statistics of random
variables that are either not independent or not identically distributed), but
the independent, identically distributed situation is the most common scenario
for studying order statistics.
In this text, when speaking about order statistics, we always as-
sume that the random variables under study are independent, iden-
tically distributed, continuous random variables.
42.2 Examples
Example 42.1. Consider the waiting times X1 and X2 (in minutes) until
Samuel hears from two of his friends, Mary and Josephine, respectively. He
doesn’t know which one will call first. Both of the Xj ’s are Exponential with
expected value 5. The Xj ’s are also independent. Then the first order statistic
is X(1) = min{X1 , X2 }, i.e., the minimum of the two waiting times. The second
order statistic is X(2) = max{X1 , X2 }, i.e., the maximum of the two waiting
times.
In this case, X(1) is an Exponential random variable. To see this, for any
a > 0, we have P(X(1) > a) = P(X1 > a)P(X2 > a) = e^(−a/5) e^(−a/5) = e^(−2a/5), which is the tail probability of an Exponential random variable with expected value 5/2.
Thus X(1) is Exponential with E(X(1)) = 5/2. We emphasize that the parameter of X(1) is different from the parameters of X1 and X2. With this in mind, we
can treat X(1) in just the same way that we would treat any other Exponential
random variable with expected value 5/2. For instance, we know immediately
that the variance of the time until he hears from his first friend is (5/2)² = 25/4 = 6.25 square minutes. Also, the probability he hears from the first friend within the first four minutes is P(X(1) ≤ 4) = 1 − e^(−2(4)/5) ≈ 0.798.
On the other hand, we emphasize that X(2) is not an Exponential random
variable. For instance, the CDF of X(2) does not have the form of the CDF of
an Exponential random variable. For a > 0:
FX(2)(a) = P(max(X1, X2) ≤ a) = P(X1 ≤ a)P(X2 ≤ a) = (1 − e^(−a/5))²,
which does not have the form 1 − e^(−λa) for any λ.
Incidentally, we can also find the densities of X(1) and X(2) . Since X(1) is
Exponential with expected value 5/2, then no calculation is necessary. We know
fX(1)(x1) = (2/5)e^(−2x1/5) for x1 > 0,
and fX(1)(x1) = 0 otherwise.
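A simulation (a sketch assuming NumPy is available) confirms these facts about X(1), and also shows that X(2) has a different expected value than either of the original waiting times:

import numpy as np

rng = np.random.default_rng(7)
x1 = rng.exponential(5, size=1_000_000)   # Mary's waiting time
x2 = rng.exponential(5, size=1_000_000)   # Josephine's waiting time
first = np.minimum(x1, x2)                # X(1)
last = np.maximum(x1, x2)                 # X(2)

print(first.mean(), first.var())          # about 2.5 and 6.25
print((first <= 4).mean())                # about 1 - e^(-8/5) = 0.798
print(last.mean())                        # about 7.5 = 5(1 + 1/2)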
To find the density of X(2), we just differentiate the CDF: fX(2)(x2) = (d/dx2)(1 − e^(−x2/5))² = (2/5)e^(−x2/5)(1 − e^(−x2/5)) for x2 > 0, and fX(2)(x2) = 0 otherwise.
For example, if three observed values are
X1 = 63.076
X2 = 62.849
X3 = 63.870
then the first, second, and third order statistics are, respectively:
X(1) = 62.849
X(2) = 63.076
X(3) = 63.870
Example 42.4. For example, if a driver experiences the following ten waiting
times at ten red lights on a trip,
then we could put the waiting times in order, to get the order statistics.
The first order statistic is the minimum, written as X(1) = 1.360. The second
order statistic is the second-smallest value, written as X(2) = 3.127. The tenth
order statistic is the maximum value, written as X(10) = 29.617. So the 1st,
2nd, 3rd, . . . , 10th order statistics are, respectively:
X(1) = 1.360, X(2) = 3.127, X(3) = 3.422, X(4) = 9.484, X(5) = 10.420,
X(6) = 12.995, X(7) = 18.837, X(8) = 26.169, X(9) = 29.186, X(10) = 29.617.
Example 42.5. The joint density of the order statistics can only be nonzero
in the region where, for instance, X(1) < X(2) < X(3) . If we consider the joint
density fX(1) ,X(2) ,X(3) (2.7, 5.22, 3.9) of the first, second, and third order statistics
of three random variables X1 , X2 , X3 that are each defined in the interval [0, 10],
then we know that the joint density will be zero at this point, because we cannot
have the second order statistic X(2) = 5.22 and a smaller third order statistic
X(3) = 3.9. On the other hand, fX(1) ,X(2) ,X(3) (2.7, 3.9, 5.22) will be positive
because the order statistics are in ascending order!
Consider any x1, x2 with 0 < x1 < x2. In order to calculate the joint density of (X(1), X(2)) evaluated at (x1, x2), we first calculate the joint CDF:
FX(1),X(2)(x1, x2) = (1 − e^(−x1/5))² + 2(e^(−x1/5) − e^(−x2/5))(1 − e^(−x1/5)).
Differentiating with respect to x1 and x2 gives
fX(1),X(2)(x1, x2) = 2fX(x1)fX(x2) for 0 < x1 < x2,
If X1, . . . , Xn are independent, then their joint density factors as
fX1,X2,...,Xn(x1, x2, . . . , xn) = fX1(x1)fX2(x2) · · · fXn(xn).
Of course, not only are the Xj ’s all independent, but they also all have the
same distribution, which we can just write as f (x). Thus, the joint density of
X1, . . . , Xn becomes f(x1)f(x2) · · · f(xn).
The joint density of the order statistics X(1) , . . . , X(n) can also be written as a
product, but we must insist now that the xj ’s are in ascending order. The main
point driving this idea is that there are n! equally likely ways that the underlying
X1 , . . . , Xn can be placed in order. So by multiplying by n!, the integral of the
joint density of the order statistics will be 1. Therefore, the joint density of
X(1), . . . , X(n) is
fX(1),...,X(n)(x1, . . . , xn) = n! f(x1)f(x2) · · · f(xn) for x1 < x2 < · · · < xn,
and 0 otherwise.
Now we can easily recompute some of the previous problems from earlier chap-
ters.
Example 42.9. Let X1, X2, X3 be independent random variables, each Uniformly distributed on [0, 10]. Then
fX(1),X(2),X(3)(x1, x2, x3) = 3! f(x1)f(x2)f(x3)
= 6 (1/10)(1/10)(1/10)
= 6/1000 for 0 ≤ x1 < x2 < x3 ≤ 10,
and fX(1),X(2),X(3)(x1, x2, x3) = 0 otherwise.
To find the density of X(2) alone, we integrate out x1 (from 0 to x2) and x3 (from x2 to 10):
fX(2)(x2) = ∫_{x2}^{10} ∫_{0}^{x2} (6/1000) dx1 dx3 = (6/1000)(x2)(10 − x2) for 0 ≤ x2 ≤ 10,
and fX(2)(x2) = 0 otherwise. We rewrite this result in the following way:
fX(2)(x2) = 6 (1/10)(x2/10)((10 − x2)/10) for 0 ≤ x2 ≤ 10,
and fX(2)(x2) = 0 otherwise. If we write fX(x) = 1/10 as the density of each of the Xj's, FX(x) = x/10 as the cumulative distribution function of each of the Xj's, and 1 − FX(x) = 1 − x/10, then we notice that fX(2)(x2) has the really nice form
fX(2)(x2) = 3! fX(x2) FX(x2)(1 − FX(x2)) for 0 ≤ x2 ≤ 10,
and fX(2)(x2) = 0 otherwise. This might not look very general, but it is actually a very special case of the following nice, general result:
Example 42.10. In general, the density of the jth order statistic X(j) of n independent, identically distributed continuous random variables, each with density fX(x) and CDF FX(x), is
fX(j)(x) = [n!/((j − 1)! 1! (n − j)!)] fX(x) (FX(x))^(j−1) (1 − FX(x))^(n−j),
where
n!/((j − 1)! 1! (n − j)!)
is exactly the number of ways to pick exactly 1 of the n variables X1, . . . , Xn to be the jth order statistic (i.e., to be the jth smallest of the collection X1, . . . , Xn), while also choosing j − 1 of the variables to be smaller than it, and finally letting the other n − j variables be larger than it.
In the special case discussed in Example 42.9, we have n = 3 and j = 2, as well as
fX(x) = 1/10 and FX(x) = x/10 and 1 − FX(x) = 1 − x/10.
Therefore, we are able to quickly verify the earlier result, from Example 42.9.
For 0 ≤ x2 ≤ 10,
fX(2)(x2) = [3!/(1! 1! 1!)] fX(x2) FX(x2)(1 − FX(x2))
= 6 (1/10)(x2/10)((10 − x2)/10).
Otherwise, fX(2) (x2 ) = 0.
Using what we learned in Example 42.10, we can verify the calculation of the
densities of X(1) and X(2) from Example 42.1. We have
fX(1)(x1) = [2!/(0! 1! 1!)] (1/5)e^(−x1/5) (1 − e^(−x1/5))^0 (e^(−x1/5))^1 = (2/5)e^(−2x1/5).
Also
fX(2)(x2) = [2!/(1! 1! 0!)] (1/5)e^(−x2/5) (1 − e^(−x2/5))^1 (e^(−x2/5))^0 = (2/5)e^(−x2/5)(1 − e^(−x2/5)).
This agrees with the calculations of the densities made in Example 42.1.
Example 42.13. Consider seven students whose grades are independent and
are each Uniform on the interval from 82 to 98. Let X1 , . . . , X7 denote their
grades. Rank the student’s scores in order, and denote the order statistics as,
respectively, X(1) , . . . , X(7) .
Since each Xj is Uniform on the interval from 82 to 98, the density of each Xj is 1/(98 − 82) = 1/16 on the interval 82 ≤ x ≤ 98. Also, the cumulative distribution function of each Xj is FX(x) = (x − 82)/(98 − 82) for 82 ≤ x ≤ 98. Thus, applying the general model from Example 42.10, we see that the fifth-smallest student's score, X(5), has the following density, for 82 ≤ x5 ≤ 98:
fX(5)(x5) = [7!/(4! 1! 2!)] (1/(98 − 82)) ((x5 − 82)/(98 − 82))^(5−1) (1 − (x5 − 82)/(98 − 82))^(7−5).
Example 42.14. Now we generalize the formula for the density of the jth order
statistic of n independent random variables X1 , . . . , Xn that are each Uniform
on the interval from a to b. As usual, we rank the random variables in order,
and we denote the order statistics as, respectively, X(1) , . . . , X(n) .
Since each Xj is Uniform on the interval from a to b, the density of each Xj is 1/(b − a) on the interval a ≤ x ≤ b. Also, the cumulative distribution function of each Xj is FX(x) = (x − a)/(b − a) for a ≤ x ≤ b. Thus, using the model from Example 42.10, for a ≤ x ≤ b,
fX(j)(x) = [n!/((j − 1)! 1! (n − j)!)] (1/(b − a)) ((x − a)/(b − a))^(j−1) ((b − x)/(b − a))^(n−j).
Example 42.17. Consider 10 people who are each waiting for an email to
arrive, and the waiting times are independent Exponential random variables,
each with average 30 minutes.
Now suppose that X1, . . . , Xn are independent and Uniform on [0, 1]. Taking a = 0 and b = 1 in the formula above,
fX(j)(x) = [n!/((j − 1)!(n − j)!)] x^(j−1)(1 − x)^(n−j) = j (n choose j) x^(j−1)(1 − x)^(n−j) for 0 ≤ x ≤ 1,
and fX(j)(x) = 0 otherwise. Thus, the expected value of the jth order statistic, X(j), is
E(X(j)) = ∫_0^1 x fX(j)(x) dx
= ∫_0^1 j (n choose j) x^j (1 − x)^(n−j) dx
= j (n choose j) ∫_0^1 x^j (1 − x)^(n−j) dx.
Also,
∫_0^1 x^j (1 − x)^(n−j) dx = (n − j)! j! / (n + 1)!
(the inquisitive reader may want to consider why—the reasoning is connected
to the density of a Beta random variable), and thus we conclude
E(X(j)) = j/(n + 1).
Example 42.24. In general (i.e., when a and b are not necessarily 0 and 1), if X1, . . . , Xn are independent and Uniform on [a, b], then we can normalize the Xj's so that they are distributed on the interval [0, 1]. To do this, we define Yj = (Xj − a)/(b − a), and then each Yj is Uniform on [0, 1]. So the jth order statistic Y(j) of Y1, . . . , Yn has expected value E(Y(j)) = j/(n + 1), by the results of the previous example. Equivalently, E((X(j) − a)/(b − a)) = j/(n + 1), so the expected value of the jth order statistic X(j) is
E(X(j)) = j(b − a)/(n + 1) + a.
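This formula is easy to spot-check by simulation. The sketch below (assuming NumPy is available) reuses the grades scenario of Example 42.13, with a = 82, b = 98, and n = 7:

import numpy as np

rng = np.random.default_rng(8)
a, b, n = 82, 98, 7
grades = rng.uniform(a, b, size=(1_000_000, n))
order_stats = np.sort(grades, axis=1)       # column j-1 holds X(j)

for j in range(1, n + 1):
    print(j, order_stats[:, j - 1].mean(), j * (b - a) / (n + 1) + a)
# the simulated and theoretical columns agree to within simulation error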
42.4 Exercises
42.4.1 Practice
Exercise 42.1. Dancers. In the middle of a complicated dance routine, five
dancers are located across the width of the stage. Let X1 , . . . , X5 denote their
positions across the stage. Assume that the Xj ’s are independent and that each
Xj is Uniformly distributed on [0, 100], where “0” denotes the far left side of the
stage, and “100” denotes the far right side of the stage, and the measurements
are given in feet.
a. Find the density of the person located closest to the left-hand side of the
stage, i.e., find fY (y) for Y = min(X1 , . . . , X5 ). (Hint: Y is just the 1st order
statistic, i.e., Y = X(1) .)
b. Find the probability that nobody is located within 20 feet of the left-hand
side of the stage.
c. Find the probability that at least two people are located within 30 feet
of the left-hand side of the stage. (Hint: You can use the 2nd order statistic for
this part.)
Exercise 42.2. Throwing darts. Alfredo, Barbara, and Cathy throw darts at
a dartboard of radius 9 inches. Let X1 , X2 , X3 denote the distance of Alfredo,
Barbara, and Cathy’s darts (respectively) from the center of the board. Thus,
X(1) (the first order statistic) is the minimum distance from the center to any
of the darts; X(3) is the distance to the farthest dart; and X(2) is the distance
to the middle of the three darts with respect to the origin.
a. Find the density of X(2) .
b. Check that fX(2)(x2) is a density, i.e., check that ∫_0^9 fX(2)(x2) dx2 = 1.
Exercise 42.3. Phone calls. Suppose that three male students are making
telephone calls to their girlfriends. Let X1 , X2 , X3 denote the times of their
phone calls. Suppose that the Xj ’s are independent Exponential random vari-
ables, each with expected value 20 minutes.
a. Find the density of X(1) , the first order statistic. (Easy check: X(1) is
the minimum of three independent Exponentials, so we remember that X(1) is
Exponential too.)
b. Find the probability that none of the calls are less than 12 minutes long,
i.e., they all exceed 12 minutes.
c. Find the probability that two or more of the calls are less than 23 minutes
each.
Exercise 42.4. Let X1 , X2 , X3 be three independent random variables that
are Uniformly distributed on the interval [0, 20]. Find the probability that the
minimum of the three random variables is between 12 and 15.
Exercise 42.5. Standing in a line. Five people stand along a street waiting for a bus to arrive. The street is 9 yards away from a building. The distance (in yards) of each person from the street has density fX(x) = √x/18, for 0 ≤ x ≤ 9.
These distances are independent. Find the density of the greatest of the five
distances from the street.
Exercise 42.6. Student presentations. In four sections of a course, running
(independently) in parallel, there are four students giving presentations that
are each Exponential in length, with expected value of 10 minutes each. How
time much do we expect to be needed until all four of the presentations are
completed?
Exercise 42.7. Soda cans. The weight of a can of soda is Uniformly dis-
tributed, between 238 and 242 grams. Find the density of:
a. the minimum of the three weights among three cans of soda;
b. the maximum of the three weights among three cans of soda;
c. the 2nd-largest of the three weights among three cans of soda.
Exercise 42.8. Pizza delivery. Suppose that the delivery time for a pizza
delivery is Uniformly distributed between 15 to 20 minutes. Now suppose that
4 people from a dormitory independently order pizza from 4 different drivers
(so the times for delivery are independent). What is the density of:
a. the minimum time until a pizza arrives?
b. the maximum time until a pizza arrives?
c. the time until the second pizza?
d. the time until the third pizza?
Exercise 42.9. Homework times. The time in which a student starts his
homework each night—during a seven day week—is Uniformly distributed be-
tween 6 PM and 8 PM. Let X and Y denote, respectively, the earliest and latest
times that he starts his homework that week.
a. Find the density of X.
b. Find the density of Y .
c. Find the expected value of X.
d. Find the expected value of Y .
Exercise 42.10. Marble drops. On a Rube Goldberg machine, three mar-
bles are dropped from a ledge, and they land at a location which is Uniformly
distributed between 0 to 12 inches away from a wall. What is the expected
location of the marble that lands farthest from the wall?
Exercise 42.11. Student arrivals. A class of 40 students is supposed to arrive
at 8:30 AM, but in practice they each arrive at a time Uniformly distributed
between 8:20 AM and 8:35 AM, and their arrival times are independent.
a. What is the expected arrival time of the last student to arrive?
b. What is the probability that 38 or more of the students have arrived by
8:30 AM?
Exercise 42.12. Shot put throws. Three athletes, competing in the shot
put event at a track and field competition, throw the ball (called the “shot”) as
far as they are able. The athletes have relatively similar strengths, so assume
that the distances they throw the ball are independently, Uniformly distributed
random variables on the interval [16.5, 19.5], measured in meters. What is the
probability that at least two of the three throws exceed 17 meters?
42.4.2 Extensions
Exercise 42.13. Violinist’s hands. When an extremely talented violin player
is soloing, the position of her hand along the violin’s neck seems to be Uniform
on a 9-inch interval (the total length is about 13 inches from nut to bridge; but
not all of the string is used). The violinist is photographed 6 times during a
performance. Assume that the location of her hand is independently placed in
each of the photos.
a. Find the density of the jth order statistic of the location of her hand (1 ≤ j ≤ 6).
b. Find the probability that her hand is within 3 inches of the nut during
at least 2 of the 6 photographs.
Chapter 43
Moment Generating Functions
43.1 A Brief Introduction to Generating Functions
For each j, we put g^(j)(0)/j! as the coefficient of t^j, and then we sum up all of the terms; anytime we need g^(j)(0)/j!, it will be sitting in front of t^j, ready to be retrieved. The clothesline looks like this:
g(t) = (g(0)/0!) t^0 + (g'(0)/1!) t^1 + (g''(0)/2!) t² + (g'''(0)/3!) t³ + (g^(4)(0)/4!) t⁴ + (g^(5)(0)/5!) t⁵ + (g^(6)(0)/6!) t⁶ + · · ·
At this point, one might wonder what to do if we do not have a function
in mind with interesting derivatives. Can we start from a sequence of numbers
and build such a function? Yes, we can start with a sequence of numbers
and construct—from the sequence—a new function g(t) that has interesting
coefficients in its Maclaurin series, so we want to display them on a clothesline.
One motivation for doing this is that a series representation is often a much
more compact way to store a sequence of integers, especially when no general,
succinct form of the numbers themselves are available. To build such functions
ourselves, we start with an interesting sequence of numbers a0 , a1 , a2 , . . ., and
we can just put them onto such a clothesline and add things up, and see what
function we get as a result. We can define g by writing
g(t) = Σ_{j=0}^∞ (aj/j!) t^j
= a0 + a1 t + (a2/2!) t² + (a3/3!) t³ + (a4/4!) t⁴ + (a5/5!) t⁵ + (a6/6!) t⁶ + (a7/7!) t⁷ + · · ·
Taking j derivatives and then substituting t = 0, we get aj = g^(j)(0) from this function g that we created! The reason behind aj = g^(j)(0) should be clear after taking a few derivatives
and trying it, but nonetheless, we put the reasoning in the Appendix to this
chapter; see Section 43.5.
Definition 43.1. Generating Function
If a0 , a1 , a2 , . . . is an interesting sequence of numbers from which we want to
build a generating function g(t), we write
g(t) = Σ_{j=0}^∞ (aj/j!) t^j
= a0 + a1 t + (a2/2!) t² + (a3/3!) t³ + (a4/4!) t⁴ + (a5/5!) t⁵ + (a6/6!) t⁶ + (a7/7!) t⁷ + · · ·
and then we get
aj = g^(j)(0)
from this function g that we created!
There are several kinds of generating functions that are interesting to study,
e.g., ordinary generating functions, Exponential generating functions, multi-
variate generating functions, etc. (See, for instance, Analytic Combinatorics by
P. Flajolet and R. Sedgewick [3] for a comprehensive introduction to generating
functions.) In this chapter, we build moment generating functions.
43.2 Moment Generating Functions
The moment generating function of a random variable X puts the moments aj = E(X^j) on the clothesline:
MX(t) = Σ_{j=0}^∞ [E(X^j)/j!] t^j.
(We use g(t) and MX(t) interchangeably in the rest of the discussion, since moment generating functions are the only kind of generating function we handle in this book.) The equation above lets us wrap all moments E(X^j) of a random variable into a compact form. They are divided by j! and then conveniently stored as the coefficients of MX(t). Since the t has nothing to do with X, i.e., the t is a constant with regard to X, we can pull the t from each term into the expected value, and we get
MX(t) = Σ_{j=0}^∞ E((tX)^j)/j!.
The sum of expected values is equal to the expected value of the sum, so this
implies
MX(t) = E( Σ_{j=0}^∞ (tX)^j/j! ).
Since Σ_{j=0}^∞ (tX)^j/j! = e^(tX), this yields the compact form
MX(t) = E(e^(tX)).
We know that, as discussed above, for all generating functions of this form,
aj = g^(j)(0). In this case, aj = E(X^j) and MX(t) = E(e^(tX)). Thus, if we compute E(e^(tX)), and we then take j derivatives and substitute in t = 0, we will get the jth moment, E(X^j).
If X is continuous, then
MX(t) = E(e^(tX)) = ∫_{−∞}^{∞} e^(tx) f(x) dx.
At this point, one might wonder what is so great about moment generating functions. For instance, if we want the 2nd moment of a random variable X, we can just compute
E(X²) = Σ_x x² pX(x)
in the discrete case (or the analogous integral in the continuous case). The idea is that, if we compute MX(t) = E(e^(tX)) at the start, then we can get any moment of X by simply taking derivatives of MX(t) and then evaluating at t = 0. To get the jth moment of X, we just take j derivatives and then evaluate at t = 0. Taking derivatives is generally much, much easier than taking sums or evaluating integrals, so the method of moment generating functions has some advantages over the direct calculation of moments. Based on our experience from calculus, we are already good at taking derivatives, so the method of moment generating functions should be appealing.
where the last line follows from the Binomial theorem, i.e., from
Σ_{j=0}^n (n choose j) a^j b^(n−j) = (a + b)^n.
E(X²) = g''(0)
= d²/dt² (e^t p + 1 − p)^n |_{t=0}
= d/dt [n(e^t p + 1 − p)^(n−1)(e^t p)] |_{t=0}
= [n(n − 1)(e^t p + 1 − p)^(n−2)(e^t p)² + n(e^t p + 1 − p)^(n−1)(e^t p)] |_{t=0}
= n(n − 1)(p + 1 − p)^(n−2) p² + n(p + 1 − p)^(n−1) p
= n(n − 1)p² + np.
So the variance of X is
Var(X) = E(X²) − (E(X))² = n(n − 1)p² + np − (np)² = np(1 − p).
All of this agrees with what we computed earlier in the course about Binomial
random variables. This technique also enables us to readily calculate E(X j ) for
higher values of j.
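Readers who prefer to let a computer do the differentiation can reproduce this with a symbolic sketch (assuming SymPy is available; we fix a concrete n purely for illustration):

import sympy as sp

t, p = sp.symbols('t p')
n = 10                                       # an arbitrary concrete choice of n
M = (sp.exp(t) * p + 1 - p) ** n             # the Binomial moment generating function

m1 = sp.diff(M, t).subs(t, 0)                # E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)             # E(X^2)
print(sp.simplify(m1))                       # 10*p, i.e., np
print(sp.simplify(m2 - m1**2))               # a form equivalent to 10*p*(1 - p), i.e., np(1 - p)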
MX(t) = E(e^(tX))
= Σ_{x=0}^∞ e^(tx) e^(−λ) λ^x / x!
= e^(−λ) Σ_{x=0}^∞ (e^t λ)^x / x!
= e^(−λ) e^(e^t λ)
= e^((e^t − 1)λ).
E(X) = g'(0)
= d/dt e^((e^t − 1)λ) |_{t=0}
= e^((e^t − 1)λ) e^t λ |_{t=0}
= e^(0λ) λ
= λ,
E(X²) = g''(0)
= d²/dt² e^((e^t − 1)λ) |_{t=0}
= d/dt [e^((e^t − 1)λ) e^t λ] |_{t=0}
= [e^((e^t − 1)λ)(e^t λ)² + e^((e^t − 1)λ) e^t λ] |_{t=0}
= e^(0λ) λ² + e^(0λ) λ
= λ² + λ.
So the variance of X is
Var(X) = E(X²) − (E(X))² = λ² + λ − λ² = λ.
All of this agrees with what we computed earlier in the course about Poisson
random variables. This technique also enables us to readily calculate E(X j ) for
higher values of j.
43.4 Moment Generating Function: Continuous Case
Example 43.6. If X is a Uniform random variable on the interval [a, b], the density of X is
fX(x) = 1/(b − a) for a ≤ x ≤ b,
and fX(x) = 0 otherwise. Then
MX(t) = E(e^(tX))
= ∫_{−∞}^{∞} e^(tx) fX(x) dx
= ∫_a^b e^(tx)/(b − a) dx
= (1/(b − a)) [e^(tx)/t]_{x=a}^{b}
= (e^(tb) − e^(ta)) / ((b − a)t).
We use the series expansion of etb and eta because the computation of the
derivative with quotient rule (especially for the second moment) becomes quite
intricate and much more tedious. The first moment of X is
E(X) = g'(0)
= d/dt [(e^(tb) − e^(ta)) / ((b − a)t)] |_{t=0}
= (1/(b − a)) d/dt [ (1/t + Σ_{j=1}^∞ (tb)^j/(j! t)) − (1/t + Σ_{j=1}^∞ (ta)^j/(j! t)) ] |_{t=0}
   (since e^(tb) = Σ_{j=0}^∞ (tb)^j/j! and e^(ta) = Σ_{j=0}^∞ (ta)^j/j!)
= (1/(b − a)) d/dt [ Σ_{j=1}^∞ t^(j−1) b^j/j! − Σ_{j=1}^∞ t^(j−1) a^j/j! ] |_{t=0}
= (1/(b − a)) [ Σ_{j=1}^∞ (j − 1)t^(j−2) b^j/j! − Σ_{j=1}^∞ (j − 1)t^(j−2) a^j/j! ] |_{t=0}
= (1/(b − a)) (b²/2! − a²/2!)   (only the j = 2 term remains when t = 0)
= (b + a)(b − a) / ((b − a)(2))
= (a + b)/2,
and the second moment of X is
E(X²) = g''(0)
= d²/dt² [(e^(tb) − e^(ta)) / ((b − a)t)] |_{t=0}
= (1/(b − a)) d/dt [ Σ_{j=1}^∞ (j − 1)t^(j−2) b^j/j! − Σ_{j=1}^∞ (j − 1)t^(j−2) a^j/j! ] |_{t=0}
= (1/(b − a)) [ Σ_{j=1}^∞ (j − 1)(j − 2)t^(j−3) b^j/j! − Σ_{j=1}^∞ (j − 1)(j − 2)t^(j−3) a^j/j! ] |_{t=0}
= (1/(b − a)) ((2)(1)b³/3! − (2)(1)a³/3!)   (only the j = 3 term remains when t = 0)
= (b³ − a³)/((b − a)(3))
= (b − a)(a² + ab + b²)/((b − a)(3))
= (a² + ab + b²)/3.
So the variance of X is
Var(X) = E(X²) − (E(X))² = (a² + ab + b²)/3 − ((a + b)/2)² = (b − a)²/12.
All of this agrees with what we computed earlier in the course about Uniform
random variables. This technique also enables us to readily calculate E(X j ) for
higher values of j.
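The series manipulations above can also be checked symbolically. Since t = 0 is a removable singularity of MX(t), we take limits rather than substituting t = 0 directly (a sketch assuming SymPy is available):

import sympy as sp

t, a, b = sp.symbols('t a b')
M = (sp.exp(t * b) - sp.exp(t * a)) / ((b - a) * t)   # Uniform[a, b] MGF

m1 = sp.limit(sp.diff(M, t), t, 0)       # E(X)
m2 = sp.limit(sp.diff(M, t, 2), t, 0)    # E(X^2)
print(sp.simplify(m1))                   # (a + b)/2
print(sp.factor(m2 - m1**2))             # (a - b)**2/12, i.e., (b - a)^2/12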
Next, let X be a standard Normal random variable, with density
fX(x) = (1/√(2π)) exp(−x²/2) for all x.
MX(t) = E(e^(tX))
= ∫_{−∞}^{∞} e^(tx) fX(x) dx
= ∫_{−∞}^{∞} e^(tx) (1/√(2π)) exp(−x²/2) dx
= ∫_{−∞}^{∞} (1/√(2π)) exp(tx − x²/2) dx.
Completing the square, tx − x²/2 = t²/2 − (x − t)²/2, so
MX(t) = e^(t²/2) ∫_{−∞}^{∞} (1/√(2π)) exp(−(x − t)²/2) dx = e^(t²/2),
since the integrand in the last expression is the density of a Normal random variable with mean t and variance 1, and therefore integrates to 1. Differentiating, E(X) = g'(0) = t e^(t²/2) |_{t=0} = 0, and E(X²) = g''(0) = (1 + t²)e^(t²/2) |_{t=0} = 1. So the variance of X is
Var(X) = E(X²) − (E(X))² = 1 − 0² = 1.
All of this agrees with what we already knew about X since X is a standard
Normal random variable. This technique also enables us to readily calculate
E(X j ) for higher values of j.
Note: The standardized 3rd and 4th central moments of a random variable are called the skewness and the kurtosis, respectively. The skewness measures how "skewed" the
distribution is, i.e., whether the density or mass is heavier on one side than the
other. The kurtosis measures whether the concentration of the density or mass
is tall and narrow, or short and wide.
Now let Y be a Normal random variable with expected value µ and standard deviation σ. Writing Y = σX + µ, where X is standard Normal, we get MY(t) = E(e^(tY)) = e^(tµ) E(e^((σt)X)) = e^(tµ) MX(σt) = exp((σt)²/2 + tµ). Then
E(Y) = g'(0)
= d/dt exp((σt)²/2 + tµ) |_{t=0}
= (σ²t + µ) exp((σt)²/2 + tµ) |_{t=0}
= µ,
and
E(Y²) = g''(0)
= d²/dt² exp((σt)²/2 + tµ) |_{t=0}
= d/dt [(σ²t + µ) exp((σt)²/2 + tµ)] |_{t=0}
= [(σ²t + µ)² exp((σt)²/2 + tµ) + σ² exp((σt)²/2 + tµ)] |_{t=0}
= µ² + σ².
So the variance of Y is
$$\operatorname{Var}(Y) = E(Y^2) - (E(Y))^2 = \mu^2 + \sigma^2 - \mu^2 = \sigma^2.$$
All of this agrees with what we already knew about Y. This technique also enables us to readily calculate $E(Y^j)$ for higher values of $j$.

43.5 Appendix: Building a Generating Function

If we take a sequence $a_0, a_1, a_2, a_3, \ldots$ and build the function
$$g(t) = a_0 + a_1 t + a_2 \frac{t^2}{2!} + a_3 \frac{t^3}{3!} + a_4 \frac{t^4}{4!} + \cdots,$$
then we get $a_j = g^{(j)}(0)$ from this function g that we created! Now we explain the reasoning behind this phenomenon.
If we evaluate at t = 0, we get g(0) on the left-hand side, and on the right-hand side we get $a_0$; all of the other terms on the right-hand side are just 0; thus
$$g(0) = a_0.$$
If we take one derivative with respect to t, we get
$$g'(t) = a_1 + 2t\,\frac{a_2}{2!} + 3t^2\,\frac{a_3}{3!} + 4t^3\,\frac{a_4}{4!} + 5t^4\,\frac{a_5}{5!} + 6t^5\,\frac{a_6}{6!} + \cdots$$
and then substitute in t = 0, we conclude
$$g'(0) = a_1.$$
Similarly, taking two derivatives and substituting t = 0, we conclude
$$g''(0) = a_2.$$
Taking three derivatives and substituting t = 0, we conclude
$$g'''(0) = a_3.$$
In general, taking j derivatives and substituting t = 0, we conclude
$$g^{(j)}(0) = a_j.$$
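This claim is easy to test with a computer algebra system. In the sketch below (our own illustration, with arbitrary coefficients), each derivative at 0 hands back the corresponding $a_j$:

    # A sketch (not from the text): g(t) = sum_j a_j t^j / j!  gives  g^{(j)}(0) = a_j.
    import sympy as sp

    t = sp.symbols('t')
    a = [7, -2, 5, 3, 11]  # arbitrary coefficients a_0, ..., a_4
    g = sum(aj * t**j / sp.factorial(j) for j, aj in enumerate(a))

    print([sp.diff(g, t, j).subs(t, 0) for j in range(5)])  # [7, -2, 5, 3, 11]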
43.6 Exercises
Exercise 43.1. If X is a Geometric random variable with parameter p (in other
words,
$P(X = x) = p_X(x) = (1-p)^{x-1}\,p$ for $x = 1, 2, 3, \ldots$,
and pX (x) = 0 otherwise), then find the moment generating function of X.
Exercise 43.2. Use the moment generating function from Exercise 43.1 to
verify that, if X is Geometric with parameter p, then E(X) = 1/p.
Exercise 43.3. Now use the moment generating function from Exercise 43.1 to verify that, if X is Geometric with parameter p, then $E(X^2) = (2-p)/p^2$.
Exercise 43.6. Use the moment generating function from Exercise 43.5 to
verify that, if X is Negative Binomial with parameters p, r, then E(X) = r/p.
43.6. Exercises 595
Exercise 43.7. Now use the moment generating function from Exercise 43.5 to verify that, if X is Negative Binomial with parameters p, r, then $E(X^2) = r(r + 1 - p)/p^2$.
Exercise 43.10. Use the moment generating function from Exercise 43.9 to
verify that, if X is Exponential with parameter λ, then E(X) = 1/λ.
Exercise 43.11. Now use the moment generating function from Exercise 43.9 to verify that, if X is Exponential with parameter $\lambda$, then $E(X^2) = 2/\lambda^2$.
Exercise 43.12. Verify that if X is Exponential with mean $1/\lambda$, then $\operatorname{Var}(X) = 1/\lambda^2$.
Exercise 43.14. Use the moment generating function from Exercise 43.13 to
verify that, if X is Gamma with parameters λ, r, then E(X) = r/λ.
Exercise 43.15. Now use the moment generating function from Exercise 43.13 to verify that, if X is Gamma with parameters $\lambda$, r, then $E(X^2) = r(r+1)/\lambda^2$.
Math is the only place where truth and beauty mean the same thing.
—Danica McKellar
“Obvious” is the most dangerous word in mathematics.
—E. T. Bell
Chapter 44
Transformations of One or Two Random Variables

In a toy store, there are balls of many sizes available in a big bin. You randomly select a ball from the bin. How does the size of the radius affect the volume of the ball?

44.1 The Distribution of a Function of One Random Variable
Example 44.1. Choose a ball at random from a large bin at the toy store.
Suppose that the radius (in inches) is Uniformly distributed on the interval
[0, 15]. What is the expected value of the volume of the ball?
Let X be the radius, so the volume of the ball is
$$Y = \frac{4}{3}\pi X^3.$$
The CDF of Y is
$$\begin{aligned}
F_Y(y) = P(Y \le y) &= P\!\left(\frac{4}{3}\pi X^3 \le y\right) \\
&= P\!\left(X \le \left(\frac{3y}{4\pi}\right)^{1/3}\right) \\
&= \frac{\left(\frac{3y}{4\pi}\right)^{1/3} - 0}{15 - 0} \quad \text{since } X \text{ is Uniformly distributed on } [0, 15] \\
&= \frac{\left(\frac{3y}{4\pi}\right)^{1/3}}{15}.
\end{aligned}$$
Now taking a derivative with respect to y, we get the density of Y:
$$\begin{aligned}
f_Y(y) = \frac{d}{dy} F_Y(y) &= \frac{d}{dy}\,\frac{\left(\frac{3y}{4\pi}\right)^{1/3}}{15} \\
&= \frac{(1/3)\left(\frac{3y}{4\pi}\right)^{-2/3}\frac{3}{4\pi}}{15} \\
&= \frac{\left(\frac{3y}{4\pi}\right)^{-2/3}}{60\pi}.
\end{aligned}$$
With the density of Y in hand, we can compute probabilities that involve Y, or quantities such as
$$E(Y) = \int_0^{4500\pi} y\,f_Y(y)\,dy = 1125\pi,$$
which agrees with the value of E(Y) we had computed above. The advantage of this method is that we now have the density of Y, so we can compute other things if we want to, like other probabilities involving Y, or the variance of Y, etc.
To summarize the method used in these examples:
1. Identify the range of possible values of Y.
2. For y in that range, compute the CDF $F_Y(y) = P(Y \le y)$, rewriting the event $Y \le y$ in terms of X.
3. Take the derivative with respect to y to get the density $f_Y(y) = \frac{d}{dy} F_Y(y)$.
4. If desired, check to see that $f_Y(y)$ is really a density, i.e., that integrating $f_Y(y)$ over all possible y's gives a result of 1 (a simulation version of this check is sketched below).
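Here is a small simulation sketch of that kind of double-check (our own illustration, not part of the text; the sample size and the test value $y = 1000$ are arbitrary choices), applied to Example 44.1's volume $Y = \frac{4}{3}\pi X^3$ with X Uniform on [0, 15]:

    # A sketch (not from the text): simulate Y = (4/3) pi X^3 with X ~ Uniform[0, 15].
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 15, size=1_000_000)
    y = (4 / 3) * np.pi * x**3

    y0 = 1000.0
    print(np.mean(y <= y0))                        # simulated F_Y(1000)
    print((3 * y0 / (4 * np.pi)) ** (1 / 3) / 15)  # derived CDF, about 0.414
    print(np.mean(y), 1125 * np.pi)                # E(Y) = 1125*pi, about 3534.3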
Example 44.3. Let X be a Uniform random variable on the interval [−15, 15].
Let $Y = X^2$. First, we find the density of Y.
44.1. The Distribution of a Function of One Random Variable 599
$$-15 \le X \le 15,$$
and thus
$$0 \le X^2 \le 15^2 = 225,$$
or in other words,
$$0 \le Y \le 225.$$
So the density $f_Y(y)$ of Y is 0 for y outside the interval [0, 225]. Notice, in particular, that even though X can be negative, Y can never be negative, because Y is the square of a quantity.
Now consider y in the interval [0, 225]. We want to compute
$$P(Y \le y) = P(X^2 \le y),$$
or equivalently,
$$P(Y \le y) = P(-\sqrt{y} \le X \le \sqrt{y}).$$
This line is perhaps unexpected; note that $f(x) = x^2$ does not have a unique inverse, but we have used something like the inverse of f to find the correct range for X.
Since X is Uniformly distributed, it follows that
$$F_Y(y) = P(Y \le y) = \frac{\sqrt{y} - (-\sqrt{y})}{30} = \frac{2\sqrt{y}}{30}.$$
Differentiating with respect to y yields
$$f_Y(y) = \frac{y^{-1/2}}{30} \quad \text{for } 0 \le y \le 225,$$
and fY (y) = 0 otherwise.
We can perform an easy check to see that this is plausible for the density of
Y (i.e., that we did not make any mistakes). Of course we could get an integral
of “1” and still have a subtle mistake somewhere, but if the integral is exactly
1, then we are relatively sure that we performed the computation correctly. We
compute
$$\int_0^{225} \frac{y^{-1/2}}{30}\,dy = \frac{y^{1/2}}{(1/2)(30)}\Bigg|_{y=0}^{225} = \frac{(225)^{1/2}}{(1/2)(30)} = \frac{15}{15} = 1.$$
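The same kind of double-check can be done by simulation; this sketch (our own; the sample size and the test values are arbitrary) confirms the derived CDF $F_Y(y) = 2\sqrt{y}/30$:

    # A sketch (not from the text): simulate Y = X^2 with X ~ Uniform[-15, 15].
    import numpy as np

    rng = np.random.default_rng(2)
    y = rng.uniform(-15, 15, size=1_000_000) ** 2

    for y0 in (25.0, 100.0, 225.0):
        print(np.mean(y <= y0), 2 * np.sqrt(y0) / 30)  # simulated vs derived CDF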
Example 44.4. Let X be a Uniform random variable on the interval (0, 1). Let $Y = X^3$. First we find the density of Y. Since 0 < X < 1, then 0 < Y < 1 too. So $f_Y(y) = 0$ for y outside the interval (0, 1). Now consider y inside the interval (0, 1), i.e., consider 0 < y < 1. We have
$$\begin{aligned}
F_Y(y) = P(Y \le y) &= P(X^3 \le y) \\
&= P(X \le y^{1/3}) \\
&= \frac{y^{1/3} - 0}{1 - 0} \\
&= y^{1/3},
\end{aligned}$$
and so
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\,y^{1/3} = (1/3)\,y^{-2/3} \quad \text{for } 0 < y < 1.$$
Integrating, $\int_0^1 (1/3)\,y^{-2/3}\,dy = y^{1/3}\big|_0^1 = 1$, so $f_Y(y)$ is a density.
44.2 The Distributions of Functions of Two Random Variables

Now suppose that X and Y are a pair of random variables with a known joint density, and that two new random variables are defined by
$$U = g(X, Y), \qquad V = h(X, Y).$$
(The functions g and h should already be given, so the above two equations do not require any work! Some examples of functions g and h are given in the examples later in this section.)

Compute the four partial derivatives of g and h with respect to x and y:
$$\frac{\partial}{\partial x} g(x, y), \quad \frac{\partial}{\partial y} g(x, y), \quad \frac{\partial}{\partial x} h(x, y), \quad \frac{\partial}{\partial y} h(x, y).$$
Write the joint density of U and V (we will need to make some adjustments afterwards, so this notation will look strange mathematically, but we remember that x and y depend on u and v):
$$f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{\left| \frac{\partial}{\partial x} g(x, y)\,\frac{\partial}{\partial y} h(x, y) - \frac{\partial}{\partial y} g(x, y)\,\frac{\partial}{\partial x} h(x, y) \right|}.$$
Notice that the left-hand side has u's and v's, but the right-hand side has x's and y's (because, as we noted, the x's and y's depend on u and v). So we still need to make a substitution. Before we can use such a result, we will have to solve for X and Y in terms of U and V, and substitute the appropriate combinations of u's and v's for every x and y.
Example 44.5. Suppose that
$$U = 1 - X - Y$$
and
$$V = X - Y.$$
Suppose that the joint density fX,Y (x, y) of X and Y is known. Find the joint
density fU,V (u, v) of U and V .
Notice that, in this example, the new random variables are just linear combinations of the old random variables. (In Example 44.8 we will see a non-linear example.)
We write
U = g(X, Y ) = 1 − X − Y
and
V = h(X, Y ) = X − Y.
Then we compute the four partial derivatives of g and h with respect to x and y:
$$\frac{\partial}{\partial x} g(x, y) = -1, \quad \frac{\partial}{\partial y} g(x, y) = -1, \quad \frac{\partial}{\partial x} h(x, y) = 1, \quad \frac{\partial}{\partial y} h(x, y) = -1.$$
So we get
$$\begin{aligned}
f_{U,V}(u, v) &= \frac{f_{X,Y}(x, y)}{\left| \frac{\partial}{\partial x} g(x, y)\,\frac{\partial}{\partial y} h(x, y) - \frac{\partial}{\partial y} g(x, y)\,\frac{\partial}{\partial x} h(x, y) \right|} \\
&= \frac{f_{X,Y}(x, y)}{|(-1)(-1) - (-1)(1)|} \\
&= \frac{1}{2}\,f_{X,Y}(x, y).
\end{aligned}$$
Finally, we need to substitute u's and v's instead of x's and y's on the right-hand side of the equation. This means that, when possible (as in this problem), we can just solve for the values of x and y in the equations analogous to the definitions of U and V:
$$U = 1 - X - Y \qquad \text{and} \qquad V = X - Y.$$
For instance, in this case, if we add U and V, we get $U + V = 1 - 2Y$, so $Y = \frac{1-U-V}{2}$. If we subtract V from U, we get $U - V = 1 - 2X$, and so $X = \frac{1-U+V}{2}$.
So we conclude that
$$f_{U,V}(u, v) = \frac{1}{2}\,f_{X,Y}\!\left(\frac{1-u+v}{2},\, \frac{1-u-v}{2}\right).$$
This is a very general result because we did not even need to specify the actual joint density $f_{X,Y}(x, y)$ of X and Y; rather, this solution works regardless of what the joint density is. A second question, however, arises: For which u's and v's is the joint density $f_{X,Y}\!\left(\frac{1-u+v}{2}, \frac{1-u-v}{2}\right)$ going to be strictly positive? This involves some understanding of the two-dimensional geometry of the problem. So we consider some specific cases below.
Example 44.6. Consider independent random variables X and Y that are each
Uniform on the interval [0, 1]. Suppose that
U =1−X −Y
and
V = X − Y.
Find the joint density fU,V (u, v) of U and V .
From the general result in Example 44.5, we know that $f_{U,V}(u, v) = \frac{1}{2}\,f_{X,Y}\!\left(\frac{1-u+v}{2}, \frac{1-u-v}{2}\right)$, and here $f_{X,Y}(x, y) = 1$ on the unit square, so $f_{U,V}(u, v) = 1/2$ wherever it is positive. But the second question (about the geometry of the problem, which we mentioned above) still remains: Where is the joint density of U and V defined? (Where is the joint density of U and V strictly positive?)
To answer this question, we can step around the four sides of the square
where X and Y are defined, and just find the analogous curves for U and V .
Side 1: When $Y = 0$, then $Y = \frac{1-U-V}{2} = 0$ becomes $V = 1 - U$.
Side 2: When $X = 1$, then $X = \frac{1-U+V}{2} = 1$ becomes $V = 1 + U$.
Side 3: When $Y = 1$, then $Y = \frac{1-U-V}{2} = 1$ becomes $V = -1 - U$.
Side 4: When $X = 0$, then $X = \frac{1-U+V}{2} = 0$ becomes $V = U - 1$.
In the plane for U and V, the four pieces fit together at the places indicated in Figure 44.1.
[Figure 44.1: The unit square in the (x, y)-plane, with sides X = 0, X = 1, Y = 0, and Y = 1, maps to a diamond in the (u, v)-plane with vertices (1, 0), (0, 1), (-1, 0), and (0, -1): the side Y = 0 maps to V = 1 - U, the side X = 1 maps to V = U + 1, the side Y = 1 maps to V = -U - 1, and the side X = 0 maps to V = U - 1.]
So we see that the pair of random variables (U, V) has a joint density that is Uniform. In fact, $f_{U,V}(u, v) = 1/2$ on a diamond-shaped region of the U, V plane. This diamond is a square tilted on its side, and the diamond has sides of length $\sqrt{2}$, so the diamond has area 2. Any time that a function is constant on a region, the integral over that region is just the constant times the area.
Thus, in this case, if we integrate fU,V (u, v) = 1/2 over the whole diamond, we
get (2)(1/2) = 1. This check reassures us that the joint density seems to have
been calculated correctly, without mistakes.
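A simulation view of this example (a sketch of our own; the chosen sub-square is arbitrary) shows all of the (U, V) mass landing in the diamond $|u| + |v| \le 1$, with the constant density 1/2:

    # A sketch (not from the text): (U, V) = (1 - X - Y, X - Y) for Uniform X, Y.
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 1, size=1_000_000)
    y = rng.uniform(0, 1, size=1_000_000)
    u, v = 1 - x - y, x - y

    print(np.max(np.abs(u) + np.abs(v)))  # <= 1: everything lies inside the diamond
    # A 0.5-by-0.5 sub-square inside the diamond should carry (area)(1/2) = 0.125.
    box = (u >= 0) & (u <= 0.5) & (v >= -0.25) & (v <= 0.25)
    print(np.mean(box))                   # about 0.125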
If instead X and Y are independent standard Normal random variables, the same method shows that
$$f_{U,V}(u, v) = \frac{1}{\sqrt{(2\pi)(2)}}\,e^{-(u-1)^2/4} \cdot \frac{1}{\sqrt{(2\pi)(2)}}\,e^{-v^2/4},$$
which factors into a function of u alone times a function of v alone; so U and V are independent, with U Normal with mean 1 and variance 2, and V Normal with mean 0 and variance 2.
To put the power of the methods above into their proper context, we point
out that without these methods, we still could also have determined
E(U ) = E(1 − X − Y ) = 1
and
Var(U ) = Var(1 − X − Y ) = Var(X) + Var(Y ) = 2
and
E(V ) = E(X − Y ) = 0
and
Var(V ) = Var(X − Y ) = Var(X) + Var(Y ) = 2
by simply using the equations U = 1 − X − Y and V = X − Y and by using
the knowledge that X, Y are independent. On the other hand, we emphasize
that we could not have determined, a priori, that U and V are independent. To
know that U and V are independent, this calculation of the joint density of U
and V was absolutely necessary.
One more quick comment about the domain where the joint density of the pairs
of random variables is defined: Since the joint density of X and Y is defined
throughout the X, Y plane, then also the joint density of U and V is defined
throughout the U, V plane too.
Example 44.8. Suppose that
$$U = XY$$
and
$$V = X/Y.$$
Suppose that the joint density fX,Y (x, y) of X and Y is known. Find the joint
density fU,V (u, v) of U and V .
Notice that the new random variables are not just linear combinations of the
old random variables in this example.
We write
U = g(X, Y ) = XY
and
V = h(X, Y ) = X/Y.
Then we compute the four partial derivatives of g and h with respect to x and y:
$$\frac{\partial}{\partial x} g(x, y) = y, \quad \frac{\partial}{\partial y} g(x, y) = x, \quad \frac{\partial}{\partial x} h(x, y) = 1/y, \quad \frac{\partial}{\partial y} h(x, y) = -x y^{-2}.$$
So we get
$$\begin{aligned}
f_{U,V}(u, v) &= \frac{f_{X,Y}(x, y)}{\left| \frac{\partial}{\partial x} g(x, y)\,\frac{\partial}{\partial y} h(x, y) - \frac{\partial}{\partial y} g(x, y)\,\frac{\partial}{\partial x} h(x, y) \right|} \\
&= \frac{f_{X,Y}(x, y)}{\left| (y)(-x y^{-2}) - (x)(1/y) \right|} \\
&= \frac{y}{2x}\,f_{X,Y}(x, y).
\end{aligned}$$
Finally, we need to substitute u's and v's instead of x's and y's on the right-hand side of the equation. The fraction $\frac{y}{2x}$ is just $\frac{1}{2v}$. Also $uv = x^2$, so $\sqrt{uv} = x$; and $u/v = y^2$, so $\sqrt{u/v} = y$.
So we conclude that
$$f_{U,V}(u, v) = \frac{1}{2v}\,f_{X,Y}\!\left(\sqrt{uv},\, \sqrt{u/v}\,\right).$$
Again, this is very general, so we will consider a specific case below.
Example 44.9. Consider independent random variables X and Y that are each
Uniform on the interval [0, 1]. Suppose that
U = XY
and
V = X/Y.
Find the joint density fU,V (u, v) of U and V .
As before, we first refer back to the general situation, discussed in Example 44.8
above. This establishes the fact that
$$f_{U,V}(u, v) = \frac{1}{2v}\,f_{X,Y}\!\left(\sqrt{uv},\, \sqrt{u/v}\,\right).$$
Since X and Y are independent and Uniform on the square where $0 \le X \le 1$ and $0 \le Y \le 1$, then $f_{X,Y}(x, y) = 1$ on this square, and thus the equation above reduces to just
$$f_{U,V}(u, v) = \frac{1}{2v},$$
but, again, one question remains: Where is the joint density of U and V defined?
To answer this question, we can step around the four sides of the square
where X and Y are defined, and just find the analogous curves for U and V .
[Figure: The unit square in the (x, y)-plane maps to the region in the (u, v)-plane bounded below by V = U (from the side Y = 1) and above by V = 1/U (from the side X = 1); the side X = 0 collapses to the point U = 0, V = 0. The density $f_{U,V}(u, v) = 1/(2v)$ lives on $0 \le u \le 1$, $u \le v \le 1/u$.]
Indeed,
$$\int_0^1 \int_u^{1/u} \frac{1}{2v}\,dv\,du = \int_0^1 \frac{\ln(1/u) - \ln u}{2}\,du = \int_0^1 (-\ln u)\,du = 1,$$
so this joint density integrates to 1 over the region where it is defined, which helps verify that everything was calculated correctly. We strongly encourage the use of such pictures and double-checks of the transformed joint density.
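In the same spirit, a quick simulation (our own sketch) checks one probability two ways: $P(V \le 1) = \int_0^1 \int_u^1 \frac{1}{2v}\,dv\,du = 1/2$, which also follows from $P(X \le Y) = 1/2$ by symmetry.

    # A sketch (not from the text): (U, V) = (XY, X/Y) for independent Uniform X, Y.
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 1, size=1_000_000)
    y = rng.uniform(0, 1, size=1_000_000)
    u, v = x * y, x / y

    print(np.mean(v <= 1.0))  # about 0.5, matching the integral of 1/(2v)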
44.3 Exercises
44.3.1 Practice
Exercise 44.1. Bottle volume. A certain type of cylindrical bottle always has height 14 cm. During the manufacturing process, however, the radius of the bottom is Uniformly distributed between 2.3 cm and 2.7 cm. (Whatever radius is chosen for the bottom is automatically given to the rest of the cylinder too. Once the bottom is fixed, the whole bottle is manufactured that way.) What is the probability that the bottle has a volume of less than 275 cm³?
Exercise 44.2. Gift box. A gift is chosen at random, with the price Uniformly
distributed between $5 and $12. Each gift needs (additionally) to be placed in
a box that costs $2. What is the expected total price of the gift and its box?
Exercise 44.3. Fabric size. A customer at the fabric store buys fabric that is
40 inches wide, i.e., 1.11 yards wide. The length is cut by the employee at the
store. When she is asked to cut 1 yard of fabric, the actual length is Uniformly
distributed between 0.87 and 1.05 yards. What is the probability that the entire
piece of fabric has area 1 square yard or larger?
Exercise 44.4. Chess board. You want to buy a square chess board from
a local artists’ collective. Since each chessboard is uniquely handcrafted and
you didn’t bring your ruler, you are not sure of the exact dimensions. Let X
be the length of the side of one of the boards, and assume that X is Uniformly
distributed between 12 and 18 inches.
$$Y = \frac{9}{5}(X - 273) + 32.$$
Find the probability that the temperature is higher than 90 degrees Fahrenheit.
Exercise 44.6. Phone bill. Lucas calls his girlfriend Margaret every day on
his cell phone. The time X, in hours, that they talk in a day is Uniformly
distributed between 0 and 2. Lucas pays $2 per hour that he is on the phone,
plus a flat rate of $5 per day. So Y = 2X + 5 is the size of his bill per day.
a. Find the probability that he spends more than $6 on a given day for his
cell phone service.
b. Find the expected amount that he spends on a given day for his cell
phone service.
c. Find the standard deviation of the amount that he spends on a given day
for his cell phone service.
44.3.2 Extensions
Exercise 44.9. Let X be a random variable that is Uniformly distributed on
the interval [0, π/2]. Find the expected value of Y = sin X.
Exercise 44.10. Generalize Example 44.4 as follows:
Let X be a Uniform random variable on the interval (0, 1), i.e., X is Uni-
formly distributed with 0 < X < 1. Let Y = X n where n is any positive integer.
Find E(Y ).
Exercise 44.11. Consider an Exponential random variable X with parameter
λ > 0.
Is it always true that, if a and b are positive constants, then Y = aX + b is
an Exponential random variable too?
If your answer is “yes,” then give a justification (e.g., give an argument in
favor).
If your answer is “no,” then give a very concrete counterexample (e.g., for
at least one specific a and b of your choice, show that Y = aX + b is not
Exponential).
44.3.3 Advanced
Exercise 44.12. Consider the scenario in which
U = g(X, Y ) = X 2
and
V = h(X, Y ) = X + Y.
Suppose the joint density $f_{X,Y}(x, y)$ of X and Y is Uniform on the square where $0 \le X, Y \le 1$; in other words, $f_{X,Y}(x, y) = 1$ on this square. Compute the four partial derivatives of g and h with respect to x and y, and then write
$$f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{\left| \frac{\partial}{\partial x} g(x, y)\,\frac{\partial}{\partial y} h(x, y) - \frac{\partial}{\partial y} g(x, y)\,\frac{\partial}{\partial x} h(x, y) \right|}$$
using the values derived above, and using the fact that fX,Y (x, y) = 1 in the
region where X, Y is defined. Make sure that, in your expression of fU,V (u, v),
you convert all x’s and y’s to u’s and v’s in an appropriate way.
Exercise 44.13. Returning to Exercise 44.12, we still need to know the region
where U and V can occur in the U, V plane, in other words, we need to identify
the region where the density fU,V (u, v) is relevant.
a. Find equations for the curves in the U, V plane that correspond to the
four lines in the X, Y plane around the box 0 ≤ X ≤ 1 and 0 ≤ Y ≤ 1.
b. Draw the region in the U, V plane where the joint density fU,V (u, v) is of
interest.
Exercise 44.14. Check your calculations in Exercises 44.12 and 44.13 by mak-
ing sure that you get the usual “1” when you integrate the joint density fU,V (u, v)
from Exercise 44.12, over all values u and v in the picture from Exercise 44.13.
Exercise 44.15. Are the variables U and V from the Exercises 44.12, 44.13,
and 44.14 dependent or independent? Justify your answer.
Chapter 45
Review Questions for All Chapters
Exercise 45.1. Rolling dice. A player rolls a huge bag of 200 dice. Approx-
imate the probability that 40 or more 1’s appear.
Exercise 45.2. Dice game. In a certain game, a player wants to roll five dice
and get as many 1’s as possible. Here is the scheme:
Round 1: The player rolls all five of the dice, and notices how many 1’s
appear.
Round 2: The player sets the 1’s aside and only rolls the dice which did not
show a 1 the first time.
Round 3: The player sets the 1’s aside from rounds 1 and 2 and only rolls
the dice which still did not yet show a 1.
After three rounds, the player is tired so she stops. How many 1’s does she
expect to have at this point?
(Hint: Consider one Bernoulli random variable for each of the dice, so $E(X_1 + \cdots + X_5) = E(X_1) + \cdots + E(X_5)$.)
Exercise 45.4. Cold drinks. There are many kinds of drinks in a large cooler
in the cafeteria. Fifteen percent of them are decaffeinated. After class (five days
a week), Alice always needs a drink. She is always running to her next class,
and the cafeteria is busy, so she never has time to pick a specific one. She likes
caffeine, so she will be “happy” if she gets caffeine four or more times during the
five day week. What is the probability that Alice is happy this week?
Exercise 45.5. Snowfall. The amount of snow during the winter in a certain
town follows a Normal distribution with mean 15 inches and standard deviation
4 inches. What is the probability that the snow is more than 7 inches during
the winter?
Exercise 45.6. Texas Hold ’Em. Suppose one is dealing a hand of Texas
Hold ’Em poker at a standard 9-man table (each of the 9 people receives 2
cards, and there are 5 additional community cards placed face-up in front of
the dealer). What is the probability that, after these 23 cards have been dealt,
all 4 of the aces have been dealt?
Exercise 45.8. Waiting for a bus. A passenger is sitting at the mall waiting
for the bus to arrive. The expected waiting time is 30 minutes, i.e., half an
hour. Let X be the waiting time (in minutes) until the bus arrives. What is the
probability that the waiting time is more than 20 minutes?
Exercise 45.9. Laundry room. While doing laundry, it seems that each
student spends between 3 to 15 minutes in the laundry room per week, and the
distribution for each student is assumed to be Uniform on this interval.
Exercise 45.10. Flight time. The flight time of a plane flight from Denver
to New York City is Normally distributed, with average flight time of 3 hours
and 16 minutes, and standard deviation of 30 minutes. What is the probability
that such a flight takes 4 hours or longer?
Exercise 45.14. Waiting for a bus. Joe finds that, when he waits for a
bus, his waiting time is Exponential, with an average waiting time of 6 minutes.
Assume that he waits for one bus in the morning and again for one bus in the
evening.
a. What is the probability that he spends 15 minutes or less at the bus stop
altogether during the day?
b. What is his expected waiting time at the bus stop (mornings and evenings
included) during a 20 day period?
c. What is the approximate probability that he spends 200 minutes or less
at the bus stop during a 20 day period (again, mornings and evenings included)?
Exercise 45.15. Artichokes. People shopping at the grocery store are inter-
viewed to see whether they enjoy artichokes. Only 11% of people like artichokes.
a. How many people does the interviewer expect to meet until finding the
25th person who likes artichokes?
b. What is the variance of the number of people he meets, to find this 25th
person who likes artichokes?
Exercise 45.16. Prison escapes. There are 10,000 prison inmates in a certain
state. Independently of each other, and independent of their behavior on previ-
ous days, assume that, on a given day, a prisoner has probability p = 0.000001
of escaping.
a. They want to see if the monkey can accurately type the entire first line
of a famous poem written by Alexander Pushkin:
The poem has 21 characters in this first line (the scientists do not take
spaces into account at all). What is the probability that the monkey types all
21 characters, in order, correctly? (He just types the keys independently, at
random.)
b. Next, the scientists put the monkey in front of a 26-character English
keyboard and see if he can type the phrase in English. It happens to also take
21 characters when written in English: “A MAGIC MOMENT I REMEMBER”
(again, they do not take spaces into account). What is the probability that the
monkey types all 21 characters, in order, correctly?
c. What is the ratio of these two probabilities, i.e., how much more likely
is the monkey to type the 21 character phrase correctly in English versus in
Russian? Give the ratio of the two probabilities that were calculated in the
previous two parts of this problem.
buyer whether he will receive a left or right boot (hence, the need to sell the
boots for an inexpensive price, to still attract buyers). How many boots should
a person buy, to be at least 95% sure of ending up with a complete pair?
Exercise 45.19. Dead pixels. A high-quality computer company sells moni-
tors which rarely have a dead pixel. The number of dead pixels per monitor has
a Poisson distribution with average 0.05 per monitor. If the company sells 30
monitors, and the distribution of pixels is independent from monitor to moni-
tor, what is the distribution of the total number of dead pixels found on all 30
monitors altogether?
Exercise 45.20. Candies. In a large bag of 40 Starburst candies, there are 8
orange, 9 yellow, 12 red, and 11 pink. You only like orange and red. If you take
8 from the bag, what is the probability that at least 5 out of the 8 are ones you
like (i.e., there are 5 or more reds and oranges altogether)?
Exercise 45.21. Study time. Let X be the time (in hours) that Stephen spends studying on one particular day during "dead week" (the nickname for the week before final exams). Then X has density $\frac{1}{4}e^{-x/4}$. Let Y be the number of hours that Stephen spends studying for all 7 days during "dead week." Assume that the time spent studying on distinct days is independent.
a. Find E(Y ).
b. Find Var(Y ).
Exercise 45.22. Walk time. If the walking time to the dining hall is Expo-
nential with mean 8 minutes, what is the probability that it takes 15 or more
minutes to get there?
Exercise 45.23. Solitaire. A student wins at solitaire about 17% of the time.
How many games does the student expect to play until winning her 3rd game?
Exercise 45.24. Exam scores. On a 50-question Geography exam, the av-
erage score is 25.5 out of 50. The standard deviation of the score is 8. Find
a bound on the probability that a randomly selected student’s score is greater
than 42 or less than 9.
Exercise 45.25. Couples. Consider n pairs of husbands and wives, sitting
randomly in a row of 2n chairs. What is the probability that each person is
sitting beside her/his spouse, and also no two women are adjacent, and no two
men are adjacent (i.e., the sexes are alternating)?
Exercise 45.26. Flower arranging. Each student in a flower arranging class
gets to take home a house plant. Each student has a variety of plants to choose
from, and the selections are made independently (in particular, there is an
ample number of each kind of plant). There is a 20% chance that, when a
student selects her/his plant, the choice will be a peperomia. If there are 60
students in the class, what is the probability that exactly 14 of the students will
take home a peperomia?
Exercise 45.31. Pillows. A girl buys pillows to decorate her apartment living
room. There are square and circular pillows in stock at the store where she is
shopping. There are a total of 400 pillows available:
What is the probability that, if she grabs a square pillow at random, it is stuffed
with cotton fluff?
a. Find P (X < Y ).
b. Find P (X < Z).
P ( X < ln 2 | X − Y = 0) .
Answers to Exercises
Chapter 1
1.1 Answers will vary. E.g., (1) (0.25, 0.5);
(2) {(x, y) | 0.2 < x < 0.5, 0.36 < y < 0.42}; (3) {(x, y) | x2 + y 2 ≤ 1};
1.3 Answers will vary. E.g., (1) Chris grabs lemon-lime, lemon-lime, orange;
(2) {(x1 , . . . , x4 , orange) | each of x1 , . . . , x4 is lemon-lime or fruit punch};
(3) {(x1 , . . . , xj , orange) | 0 ≤ j ≤ 12; the xj ’s are lemon-lime or fruit punch,
with ≤ 6 of each};
1.5 {(x1 , x2 , . . . , x75 ) | x1 + x2 + · · · + x75 ≤ 400};
1.7 a. 6; S = {(rrbb), (rbrb), (rbbr), (bbrr), (brbr), (brrb)}; b. $2^6 = 64$;
1.9 a. {(x1 , x2 , x3 ) | xj ∈ R>0 }; b. {(x1 , y1 , x2 , y2 , x3 , y3 ) | xj , yk ∈ R>0 };
c. {(x1 , y1 , x2 , y2 , x3 , y3 ) | xj , yk ∈ R>0 ; x1 < x2 < x3 ; y1 > y2 > y3 };
1.11 S = {(x, y) | x, y ∈ R≥0 ; x + y ≤ 2};
1.13 28;
1.15 The values of $|A_j|$ for $j = 3, 4, \ldots, 18$ are 1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1, respectively;
Chapter 2
2.1 a. 330/27333 = 0.0121; b. 9153/27333 = 0.3349; c. No. The table gives
a partition of the songs, i.e., each song is only included in one genre and is
therefore only counted once. The genres are disjoint subsets of the sample
space; d. 18180/27333 = 0.6651;
2.3 0.83; 2.5 a. 13/30; b. 17/30; c. Answers will vary. E.g., Partition the
shoes into those that are black and brown and those that are another color;
2.7 a. 10.222, 24.799, 5.000, 63.1, 30 inches of rain; b. {x | x ≥ 0}; c. Answers
will vary. E.g., partition the sample space into those values which are less than
the average annual rainfall and those which are greater than or equal to the
average; d. Each possible outcome is in exactly one part of the partition.
2.9 a. 5/12; b. 3/4; c. 1/2; 2.11 $(1/101)^7$;
2.13 a. 0.01; b. 0.1; c. 0.11; d. 0.89; 2.15 a. 2/3; b. 1/6;
2.17 P {(P )} = 1/3; P {(G, P ), (Y, P )} = 1/3; P {(G, P ), (G, Y, P )} = 1/3;
P {(Y, G, P ), (G, Y, P )} = 1/3; P {(P ), (Y, P )} = 1/2;
2.19 920; 2.21 a. $2^7$; b. 21; c. $\binom{7}{j}$; d. $P(A_j) = \binom{7}{j}(1/2)^7$; 2.23 0.47;
2.27 a. If A and B are disjoint, then equality is true. Otherwise, A and B will have an intersection, so $P(A \cup B) = P(A) + P(B) - P(A \cap B) < P(A) + P(B)$;
Chapter 8
8.3 Mass: $p_X(-1) = 0.6651$, $p_X(1) = 0.0121$, $p_X(2) = 0.0196$, $p_X(3) = 0.3032$;
CDF: $F_X(x) = 0$ for $x < -1$; $0.6651$ for $-1 \le x < 1$; $0.6772$ for $1 \le x < 2$; $0.6968$ for $2 \le x < 3$; $1$ for $x \ge 3$;
b. [graph of the mass function omitted];
c. $F_X(x) = 0.1 + (0.1)(0.9) + \cdots + (0.1)(0.9)^{x-1} = \sum_{j=1}^{x} (0.1)(0.9)^{j-1}$;
d. [graph of the CDF omitted];
8.7 a. X = {0, 1, 2, 3, 4, 5};
b. $p_X(0) = \frac{2109}{9520}$, $p_X(1) = \frac{27417}{66640}$, $p_X(2) = \frac{9139}{33320}$, $p_X(3) = \frac{2717}{33320}$, $p_X(4) = \frac{143}{13328}$, $p_X(5) = \frac{33}{66640}$, or equivalently $p_X(0) = 0.2215$, $p_X(1) = 0.4114$, $p_X(2) = 0.2743$, $p_X(3) = 0.0815$, $p_X(4) = 0.0107$, $p_X(5) = 0.0005$;
c. [graph of the mass function omitted];
d. $F_X(x) = 0$ if $x < 0$; $0.2215$ if $0 \le x < 1$; $0.6330$ if $1 \le x < 2$; $0.9072$ if $2 \le x < 3$; $0.9888$ if $3 \le x < 4$; $0.9995$ if $4 \le x < 5$; $1$ if $5 \le x$;
e. [graph of the CDF omitted];
8.9 a. [graph of the mass function omitted];
b. $F_X(x) = \lfloor x \rfloor / 4$ for $0 \le x \le 4$, and $F_X(x) = 1$ for $x > 4$;
c. [graph of the CDF omitted];
8.11 pX (−10) = 0.1; pX (−5) = 0.7; pX (0) = 0.2;
8.13 [graph of the CDF omitted];
8.15 a. [graph of the mass function omitted];
b. $F_X(x) = \sum_{j=0}^{x} e^{-2}\,2^j/j!$ for $x \in \mathbb{N}$;
c. [graph of the CDF omitted];
8.17 a. X = {0, 1, 2, 3, 4, 5, 6}; b. $p_X(x) = \binom{6}{x}(1/13)^x (12/13)^{6-x}$;
c. [graph of the mass function omitted];
d. $F_X(x) = \sum_{j=0}^{x} \binom{6}{j}(1/13)^j (12/13)^{6-j}$;
e. [graph of the CDF omitted];
8.19 [graph of the CDF omitted];
Chapter 9
9.1 Yes, since pX,Y (1, 1) = 2/10 = (4/10)(5/10) = pX (1)pY (1);
9.3 Yes. This makes intuitive sense since the number of flips to the first head
has no bearing on the number of flips after the first head to the second head.
Indeed, we have $p_{X,Y}(x, y) = (1/2)^{x+y} = (1/2)^x (1/2)^y = p_X(x)\,p_Y(y)$.
9.5 pX,Y (0, 3) = 0.342375, pX,Y (1, 2) = 0.464375, pX,Y (2, 1) = 0.174125,
pX,Y (3, 0) = 0.019125, pX,Y (x, y) = 0 otherwise;
9.7 Dependent;
9.9 (table, rows indexed by Y, columns by X = 1, ..., 6):
Y = 0: 0, 0, 0, 3/9, 2/9, 4/9;
Y = 1: 9/19, 6/19, 4/19, 0, 0, 0;
Chapter 10
10.1 0.76;
10.3 Let X be the number of songs until (and including) the folk song. Then P(X = j) = 1/868 for 1 ≤ j ≤ 868. So E(X) = (1 + 868)/2 = 434.5.
Chapter 13
13.1 a. [graph of the mass function omitted];
b. 2.32; c. 1.3029;
d. $F_X(x) = 0$ for $x < 1$; $0.4$ for $1 \le x < 2$; $0.6$ for $2 \le x < 3$; $0.7$ for $3 \le x < 4$; $0.98$ for $4 \le x < 5$; $1$ for $x \ge 5$;
e. [graph of the CDF omitted];
13.3 a. $p_X(x) = \binom{4}{x}(13/18)^x (5/18)^{4-x}$ for $x = 0, 1, 2, 3, 4$;
b. [graph of the mass function omitted]; c. 2.8889; d. 0.8958;
e. $F_X(x) = \sum_{j=0}^{x} \binom{4}{j}(13/18)^j (5/18)^{4-j}$ for integers $0 \le x \le 4$;
f. [graph of the CDF omitted];
13.5 a. $p_X(x) = \binom{3}{x}\binom{17}{5-x} \big/ \binom{20}{5}$ for $x = 0, 1, 2, 3$;
b. $F_X(x) = 0$ for $x < 0$; $0.3991$ for $0 \le x < 1$; $0.8596$ for $1 \le x < 2$; $0.9912$ for $2 \le x < 3$; $1$ for $x \ge 3$;
c. 0.75; d. 0.5033;
13.7 a. $p_X(-3) = 0.03$, $p_X(1) = 0.17$, $p_X(2.5) = 0.56$, $p_X(7) = 0.24$; b. 0.8; c. 17/20; d. 80/97; e. 90.43;
13.9 a. True. This happens when P (X ≤ x) = 1, or in other words, when all
the mass of the function lies at or below the given value.;
Chapter 15
15.1 a. A “success” is when a Skittle is purple; p = 0.2;
b. a “failure” is when a Skittle is not purple, p = 0.8;
c. Random variable X gives the total number of purple Skittles found. It can
take on any integer value from 0 to 25;
d. There are n = 25 independent trials (because it was a random sample), with
equal probability of success (p = 0.2) on each trial, and we are counting up the
number of successes (# of purple Skittles);
e. 0.196; f. 0.9726; g. 5; h. 2; i. 32.5 cents; j. 3;
15.3 a. Random variable X gives the number of babies who are not born by
C-section. It can take an integer value from 0 to 9;
b. There are n = 9 independent trials (because it was a random sample), with
equal probability of success (p = 0.8), and we are counting up the number of
successes (# of babies not born by C-section);
c. 0.1762; d. 0.2587; e. 2; f. 7.2; g. 1.44;
h. [graph omitted]; i. [graph of the CDF omitted];
15.5 a. 0.8100 ; b. 20; c. Yes. The expected profit is $650; d. 14 boxes;
15.7 9 times 15.9 a. 0.2503; b. 0.7759; c. 0.9274; d. 0.7625;
15.11 a. 0.992; b. 0.936; c. 0.9285; d. 0.9995; e. 0.7273;
15.13 $3.2726 \times 10^{-11}$; 15.15 a. −4; b. 194.4; c. 0.01229;
15.17 a. 0.161; b. 1.4; c. 1.302;
15.19 a. 1.75; b. No. Since there is no replacement, they are not independent
Bernoulli trials;
15.21 School B, because E(XA ) = 222.8 < 243.81 = E(XB );
15.23 a. 270; b. $\sum_{j=45}^{50} \binom{50}{j}(7/10)^j (3/10)^{50-j} = 0.0007$;
15.25 $\lim_{n\to\infty} P(X_n > 0) = 1$. Yes, because as n grows very large, the probability ($0.4^n$) that no student would eat cereal for breakfast approaches 0.
Chapter 16
16.1 a. The variable X is the number of Skittles that come down the line until
the inspector finds the first one that is striped; here, X can be 1,2,. . . .
b. The trials are independent, each with equal probability of success (p = 0.05).
The trials are performed until the first success of finding a striped Skittle.
16.3 a. The variable X is the number of parents you have to ask until you find one who did not have a baby by C-section; X can be 1, 2, . . . .
b. The trials are independent, each with equal probability of success (p = 0.8). The trials are performed until the first success of finding a parent of a baby not born by C-section.
h. [graph of the mass function omitted]; i. [graph of the CDF omitted];
c. [graph of the mass function omitted];
d. [graph of the CDF omitted];
16.7 a. 13 times; b. 0.7865; c. 0.2367; Binomial; n = 6, p = 0.7865;
16.9 a. 3.5714; b. 17.8571 minutes; c. 0.0374;
16.11 a. 0.065498; b. 0.19988; c. 5;
16.13 a. 2.5 games; b. 3.75 games; c. 0.216;
16.15 a. E(X) = 14.2857; b. Var(X) = 189.7959; c. $P(Y = y) = q^{y-3}\,p$;
16.17 $P(X > n) = (7/8)^n$;
16.19 There is only one possible way for X = x if X is Geometric, but there are $\binom{n}{x}$ ways for X to equal x if X is Binomial. Geometric variables can take an infinite number of values, whereas Binomials can only take values between 0 and n. So if X is Geometric then $P(X > x \mid X > y) = q^x/q^y = q^{x-y}$, but the corresponding memoryless property fails if X is Binomial.
Chapter 17
17.1 a. The variable X is the number of Skittles until the 3rd striped Skittle
is found; this X can take values 3, 4, 5, . . . .
b. We must count the number of trials until we get the 3rd success, which is
the 3rd striped Skittle found. Each Skittle is independent and each trial has
the same probability of success (p = 0.05).
c. 0.0089; d. 0.0012; e. 0.9978; f. 60; g. 33.7639;
17.3 a. The variable X is the number of parents you have to ask until you find
the 7th whose baby was not born by C-section. The X can be 7,8,. . . .
b. We must count the number of trials until we get the rth success (r = 7),
each set of parents is independent, and each trial has the same probability of
success (p = 0.8).
c. 0.1409; d. 0.2618; e. 8.75; f. 2.1875;
g. [graph of the mass function omitted];
h. [graph of the CDF omitted];
17.5 0.52765; 17.7 E(X) = 66.6667; σX = 19.4365;
17.9 a. 0.3907; b. 60 minutes;
17.11 a. 0.6405; b. 4/27; Geometric; The number of tries to the next success
does not depend on how many tries it took to get the previous successes.
Chapter 18
h. [graph of the CDF omitted];
18.5 a. Binomial($n$ = 1,000,000, $p$ = 1/729,000);
b. $P(X = 3) = \binom{1{,}000{,}000}{3}(1/729{,}000)^3 (728{,}999/729{,}000)^{999{,}997}$;
c. 1.3717; d. Poisson($\lambda$ = 1,000,000/729,000 = 1.3717); e. $P(X = 3) \approx 0.1091$;
18.7 P (X ≥ 1) ≈ 0.86466;
18.9 a. [graph of the mass function omitted];
b. [graph of the CDF omitted];
18.11 a. 0.4562; b. 0.2081; c. 0.5595;
18.13 a. 0.1804; b. 0.3233; c. 0.5581; d. $5; e. $3.54;
18.15 a. 0.1606; b. 0.000003;
c. 0.0362; the 35 defects can occur any time in the 168 hours (there does not
have to be a certain amount each day).
18.17 a. 0.1126; b. 8.8811; Geometric; 18.19 a. 4; b. 0.1563;
18.21 a. 42; b. 0.04388;
18.23 a. $P(X = 8) = \binom{60{,}000}{8}(1/10{,}000)^8 (9999/10{,}000)^{59{,}992}$;
b. $P(X = 8) \approx e^{-6}\,6^8/8!$; c. 0.1033; 18.25 Yes
Chapter 19
19.1 a. A “success” is a bag of cookies. A “failure” is a bag of potato chips or
pretzels.
b. The random variable X is the number of bags of cookies you get, so X can
be 0,1,2 or 3.
Chapter 20
20.1 a. All 5 outcomes are equally likely (p = 1/5), and you are only selecting
one at random.
b. 1/5;
c. The random variable X is the number of the color that you pick, so X can
be 1, 2, 3, 4, or 5.
d. 4/5;
e. [graph of the mass function omitted];
f. [graph of the CDF omitted];
20.3 Since Y has the same mass as X + 1, it follows that E(Y ) = E(X) + 1,
and Var(Y ) = Var(X).
Chapter 21
21.1 Binomial; n = 50, counting up total number of broken cones, each with
p = 0.12;
21.3 Poisson; there is a rate λ = 2/minute and a set interval (1 hour);
21.5 Hypergeometric; sampling without replacement, population and sample
sizes, number of successes in population;
21.7 Discrete Uniform; equal probability of success for each outcome;
21.9 Binomial (or Poisson approximation); n = 10000, counting up total num-
ber of undercooked cones, each with p = 0.00005; since n is large and p is small,
the approximation with λ = 0.5 is appropriate.
21.11 a. 0.2; b. Binomial(10, 0.2); 0.6242; c. Geometric(0.2); 0.1024;
d. Negative Binomial(0.2, 3); 15; e. Binomial (7, 0.6242); 0.1698;
f. Discrete Uniform(5); 1/5;
21.13 a. Hypergeometric (M = 10, N = 15, n = 3); 24/91 = 0.2637;
b. Binomial(3, 2/3); 8/27 = 0.2963; c. Geometric(2/3); 2/27 = 0.0741;
d. Bernoulli(2/3); 2/3; e. Bernoulli(2/3); 2/3;
21.15 a. Geometric(0.08); 0.0164; b. Negative Binomial(0.08, 4); 50;
c. Binomial(150, 0.08); 12; d. Bernoulli(0.08); 0.92;
21.17 a. Poisson(30); 0.1755; b. Geometric(0.1755); 5.699;
c. Negative Binomial(0.1755, 4); 0.0504;
21.19 a. Hypergeometric(M = 10, N = 45, n = 5); 0.7343;
b. Binomial approximation to the Hypergeometric(5, 1/36); 0.1314;
c. Geometric(0.1314); 7.6113;
21.21 a. Discrete Uniform(7); 1/7; b. Geometric(1/7); 7;
c. Binomial(20,0.25); 0.1897;
d. Poisson approximation to the Binomial (0.365); 0.0005;
21.23 Hypergeometric(M = 5, N = 18, n = 3); a. E(X) = 15/18; σX = 0.7287;
b. 0.1716; c. Binomial approximation to the Hypergeometric(3, 5/18); 0.1886
Chapter 22
22.1 a. 4060; b. 24,360;
22.3 a. 840/34650 = 4/165 = 0.0242; b. 210/34650 = 1/165 = 0.0061;
c. 8/55; d. 6/11;
22.5 a. 0.3087; b. 0.03087;
22.7 a. 29/52 = 0.5577; b. 3/52 = 0.0577 c. 20/52 = 0.3846;
22.9 0.00003; 22.11 1/11; 22.13 0.0738;
22.15 a. 57/616 = 0.0925; b. 39/1496 = 0.0261;
c. No. These are only the cases where all boys or all girls are chosen. There are
several cases where some boys and some girls are chosen.
22.17 0.0002; 22.19 a. 64/425 = 0.1506; b. 3/4;
22.21 pX (0) = 44/120 = 11/30; pX (1) = 45/120 = 3/8; pX (2) = 20/120 = 1/6;
pX (3) = 10/120 = 1/12; pX (4) = 0; pX (5) = 1/120
22.23 7.037; 22.25 a. 0.6644; b. 0.0055; 22.27 17/45;
22.29 a. 1/64; b. 0.0757; c. 0.0002; d. 0.0023;
22.31 a. 35; b. 15/35 = 3/7; c. 5/35 = 1/7; d. 210;
22.33 a. 3,628,800 ways to arrange seating;
b. 181,440 ways to arrange seating with the family sitting together;
22.35 1/19; 22.37 a. 0.0319; b. 0.049; c. 0.1976;
22.39 The probability of not sitting next to each other is $\frac{n-3}{n-1}$ for $n \ge 3$, or 0 for $n = 2$.
22.41 2/15; 22.43 2/3; 22.45 n!n!/(2n − 1)! 22.47 0.1
Chapter 24
24.1 a. continuous; b. discrete; c. continuous; d. discrete;
24.3 If X is discrete: $P(X \ge 2)$, $P(X < 2)$, and $P(X = 2)$ cannot be determined, while $P(X \le 2) = 0.7$. If X is continuous: $P(X \ge 2) = 0.3$, $P(X < 2) = 0.7$, $P(X \le 2) = 0.7$, and $P(X = 2) = 0$;
24.5 a. k = 30; b. 53/512 = 0.1035;
24.7 a. 4/5; b. $F_X(x) = 0$ for $x < 2$; $(x-2)/5$ for $2 \le x \le 7$; $1$ for $x > 7$;
c. 3/5; d. 0; e. 3/10; f. 0;
g. [graph of the density omitted]; h. [graph of the CDF omitted];
24.9 a. $F_X(x) = 0$ for $x \le 4$; $(x-4)/6$ for $4 < x < 10$; $1$ for $x \ge 10$;
b. [graph of the CDF omitted];
24.11 7/8; 24.13 $F_Y(y) = 0$ for $y < 0$; $-3y^4 + 4y^3$ for $0 \le y \le 1$; $1$ for $y > 1$;
24.15 a. 0.06699; b. 1/2; c. 3/4;
24.17 $F_X(x) = 1 - e^{-cx}$ for $x > 0$;
24.19 a. $F_X(x) = 0$ for $x < 16$; $x^2/64 - x/2 + 4$ for $16 \le x \le 24$; $1$ for $x > 24$;
b. a = 20
24.21 k = 1/(ln 20 − 7999) = −0.0001; 24.23 1/3; 24.25 0.3233
24.27 Yes. [An example of the CDFs of two such random variables is omitted here.]
Chapter 25
25.1 5/9; 25.3 7/9; 25.5 10/27; 25.7 0.0252;
25.9 a. 1/64; b. 7/64; d. 49/64; e. (1+7+7+49)/64=1;
25.11 $F_{X,Y}(x, y) = 0$ for $x < 0$ or $y < 0$; $x^2 y/2 + x y^2/2$ for $0 \le x, y \le 1$; $1$ for $x, y \ge 1$;
25.13 0.9997; 25.15 1/4;
25.17 $F_W(w) = 0$ for $w \le 0$, and $F_W(w) = 1 - e^{-3w} - e^{-5w} + e^{-8w}$ for $w > 0$;
25.19 0.06699; 25.21 9/25 = 0.36; 25.23 1/16
Chapter 26
26.1 a. Yes. fX,Y (x, y) is defined on a rectangle and can be factored into
fX (x)fY (y).
b. fX (x) = (2/9)(3 − x) for 0 ≤ x ≤ 3; c. fY (y) = (1/2)(2 − y) for 0 ≤ y ≤ 2;
26.3 a. fX (x) = 3x2 for 0 ≤ x ≤ 1; b. fY (y) = 1/(3y ln 2) for 1/2 ≤ y ≤ 4;
Chapter 30
30.1 a. $F_X(x) = 2x$ for $0 \le x \le 1/2$; b. 0.4; c. 0.1; d. 0; e. 1/4; f. 0.1443;
g. [graph of the density omitted];
30.3 f. [graph of the CDF omitted]; g. [graph of the CDF omitted];
25th percentile = 4/3; 50th percentile = 2; 75th percentile = 4;
30.5 k = 4/5; 30.7 a. False; 0 ≤ FX (x) ≤ 1 for all x; b. True;
c. False; 0 ≤ pX (x) ≤ 1 for all x;
30.9 a. False; the derivative of FX (x) is fX (x); b. True; c. True;
d. False; the reverse is true.; e. True;
f. False; the reverse is true since FX (x) is the Cumulative Distribution Function
which is the area under the density.
30.11 a. False; 0 ≤ FX (x) for all x since FX (x) is a probability, and probabilities
are always nonnegative;
b. False; 0 ≤ fX (x) for all x since densities are integrated to get probabilities,
and probabilities are always nonnegative;
c. False; 0 ≤ pX (x) for all x since pX (x) is a probability, and probabilities are
always nonnegative.
Chapter 31
31.1 a. The density of the defect should be equally weighted throughout the
10-yard section.
b. The X is the actual location of the defect. c. a = 0, b = 10;
d. E(X) = 5 yards; e. σX = 2.8868;
f. $f_X(x) = 1/10$ for $0 \le x \le 10$, 0 otherwise; [graph of the density omitted];
g. $F_X(x) = 0$ for $x < 0$; $x/10$ for $0 \le x \le 10$; $1$ for $x > 10$; [graph of the CDF omitted];
h. 0.2; i. 0.29; j. 0.4;
31.3 a. 1/4; b. 80; 31.5 4/15; 31.7 1/33; 31.9 a. $46.75; b. $18.0422;
31.11 a. 4.4444; b. 8.025; c. 7.7778; d. 22.84; 31.13 0.2637; 31.15 2;
31.17 E(min (X, Y, Z)) = 2.25; 31.19 $0.21; 31.21 0.075; 31.23 0.19;
31.25 0.6434; 31.27 E(X) = 1; 31.29 $F_Y(y) = 3y^2/100 - y^3/500$
Chapter 32
32.1 a. We know the average daily rate of eggs the chickens will lay, and we are waiting for the first egg to be laid (one event). We are measuring the farmer's wait in minutes (continuous). b. X is the amount of time (in minutes) until the first egg is laid. c. $\lambda = 0.0125$/minute; d. E(X) = 80 minutes;
e. $\operatorname{Var}(X) = 1/\lambda^2 = 6400$ minutes², i.e., $\sigma_X = 80$ minutes;
f. $f_X(x) = 0.0125\,e^{-0.0125x}$ for $x > 0$; [graph of the density omitted];
g. [graph of the CDF omitted];
h. $e^{-360 \times 0.0125} \sum_{j=0}^{5} (360 \times 0.0125)^j/j! = 0.7029$;
Chapter 35
35.1 c. 0.0708; d. 0.9292; e. 0; [standard Normal density sketches omitted]
35.3 a. 0.4483; b. 0.9998; c. 0.3679; [density sketches omitted]
35.5 c. 0.935; [density sketch omitted]
35.7 a. 4.31; b. 4.43; c. −2.935 and 6.935;
35.9 a = −0.48; 35.11 0.2389; 35.13 a. 1.464 or less; b. 0.0475; c. 0.095;
35.15 0.9418; 35.17 0.0548; 35.19 0.6826; 35.21 0.0548;
35.23 a. 0.0062; b. 0.3944; c. 0; d. 16.256 ounces;
e. (15.832, 16.168); f. (16.392, ∞)
Chapter 36
36.1 0.3015; 36.3 0.8888; 36.5 0.1685; 36.7 0.2810; 36.9 0.3409; 36.11 0.3778;
36.13 0.7372; 36.15 0.5
Chapter 37
37.1 0.1685; 37.3 0.1271; 37.5 0.0384; 37.7 0.1251; 37.9 0.881; 37.11 0.8098;
37.13 0.3936; 37.15 0.6228; 37.17 0.3156; 37.19 0.7372; 37.21 0.2090;
37.23 0.1841; 37.25 0.0307; 37.27 0.7794; 37.29 a. 0.2177; b. 0.6808; c. 0.6179
Chapter 38
38.1 Normal; 38.3 Bernoulli; 38.5 Geometric; 38.7 Poisson; 38.9 Gamma;
38.11 Hypergeometric; 38.13 a. Binomial(n = 280,000, p = 0.84); b. 0.1515;
38.15 a. 0.3012; b. 0.2019; c. 0.3012; d. 0.5987;
38.17 a. 0.1792; b. 2/5; c. 1/5; 38.19 a. 0.1122; b. 1/8; c. 0.01113;
d. 5 weeks; e. 0.1353; f. 14.78 people;
38.21 a. 30 years; b. 0.05; c. 0.2105;
38.23 a. $p = 1/2$, $n = 1$; b. $\lambda = 2$; c. $\mu = 1/2 = \sigma$;
d. $a = (1 - \sqrt{3})/2$, $b = (1 + \sqrt{3})/2$; e. not possible;
38.25 a. 0.3125; b. 0.75; c. 0.0935;
38.27 a. 3 students; b. 10 minutes; σ(X) = 10 minutes;
c. 30 minutes; σ(X) = 17.3205 minutes;
d. 0.0111; e. 0.06197; f. 0.3168; g. 1/6; h. 3/5;
38.29 a. 0.2708; b. 0.8546; c. 0.3486;
d. A high-risk driver may cause an accident with a low-risk driver, in which
case they both have an accident, so the assumption of independence is not very
realistic. This assumption allows us to calculate the probability that both types
of drivers have an accident by multiplying the individual probabilities.
38.31 0.4688; 38.33 a. $146.80; b. $7340
Chapter 39
39.1 a. Dependent; b. Cov(X, Y ) = −2.4; c. ρ(X, Y ) = −1;
39.3 a. Cov(X, 1.3X − 10) = 43.3333; b. Corr(X, 1.3X − 10) = 1;
Chapter 44
44.1 0.5013; 44.3 0.8333; 44.5 0.3015; 44.7 0.2551; 44.9 2/π;
44.13 a. $y = 0 \to v = \sqrt{u}$; $y = 1 \to v = 1 + \sqrt{u}$; $x = 0 \to u = 0$; $x = 1 \to u = 1$;
b. [sketch of the region in the (u, v)-plane omitted];
44.15 U and V are independent, since $f_{U,V}(u, v)$ can be factored into $f_U(u) = 1/(2\sqrt{u})$ and $f_V(v) = 1$, which are both bona fide densities in the defined regions $0 \le u \le 1$ and $\sqrt{u} \le v \le 1 + \sqrt{u}$.
Chapter 45
45.1 0.1210;
45.3 a. $F_Z(z) = 0$ for $z < 0$; $(z/30)^2$ for $0 \le z \le 30$; $1$ for $z > 30$;
b. $f_Z(z) = 2z/30^2$ for $0 \le z \le 30$, 0 otherwise; c. E(X) = 20;