
USING PROBABILITIES

HAVE YOU EVER WONDERED?
When you donate blood to the Red Cross, ELISA tests are used to detect the presence of
antibodies produced by the human immunodeficiency virus type 1 (HIV-1). These tests are very
good, but they are not perfect. How can probabilities be used to quantify the accuracy of ELISA
tests? More generally, how can probabilities be used in medical diagnoses to help doctors and
patients understand the implications of test results and to communicate with each other?
What, exactly, does it mean when a test for a virus is reported to be 95 percent accurate?
Does this mean that if you have the virus, then 95 times out of 100 this test will indicate that you
have the virus? Or does it mean that if the test says you have the virus, then 95 times out of 100 you
do indeed have the virus? Or does it mean both? In this chapter, we will see how probabilities
can be used to answer what are literally life-and-death questions, and how these probabilities
should be interpreted.

USING PROBABILITIES
Life is a school of probability.
—Walter Bagehot
The great French mathematician Pierre Simon Laplace (1749–1827) observed that probabilities
are only “common sense reduced to calculation.” Explicit calculations have two very big
advantages over common sense. Without the rigor of probability calculations, common sense can
easily be led astray. Without the unambiguous language of probability calculations, common
sense can easily be misunderstood. For example, medical diagnoses are sometimes clear cut (the
x-rays reveal a broken bone) and at other times ambiguous (the patient may or may not have
HIV). Probabilities provide a convenient method for determining and communicating an
uncertain diagnosis.
Suppose that a college student with no prior evidence of HIV gives a blood sample that is
tested for the presence of the HIV antibodies. The test comes back positive, but the test is not
perfect and the results are not always correct. When told of the test results, the student asks a
straightforward question: “Do I have HIV?” A definitive answer of “yes” or “no” is not
warranted because the test is imperfect; yet the student should be told, in some honest way, that
the results are worrisome.
Words alone are inadequate because doctor and patient may interpret words very differently.
When sixteen doctors were asked to assign a numerical probability corresponding to the
diagnosis that disease “cannot be excluded,” the answers ranged from a 5 percent probability to a
95 percent probability, with an average of 47 percent.1 When they were asked to interpret
“likely,” the probabilities ranged from 20 percent to 95 percent, with an average of 75 percent.
Even the phrase “low probability” elicited answers ranging from 0 percent to 80 percent, with an
average of 18 percent. If, by “low probability,” one doctor means 80 percent and another means
no chance at all, then it is better for the doctor to state the probability than to risk a disastrous
misinterpretation of ambiguous words.
Doctors can calculate these probabilities with computer software and national data bases that
combine published medical data with specific information about an individual patient. Later in
this chapter, we will see how such data can be used to determine the probability that HIV
antibodies are present in a blood sample with worrisome test results.
Probabilities can be used not only for life-threatening diseases, but for all the daily
uncertainties that make life interesting and challenging. Think how boring it would be to know in
advance everything that was going to happen to us. School, career, love, and family would all be
so much less fun if the mystery were missing. We wouldn’t need sporting events or elections if
the winners and losers were known in advance. We wouldn’t need a stock market if everyone
knew what stock prices would be tomorrow and every day after that. How interesting do you find
reruns of old football games or microfilms of yesterday’s newspaper? If the future were certain,
life would be like rereading an old newspaper. We would be bored actors and actresses walking
through a lifeless script.
Fortunately, life is instead full of uncertainties, with all of the attendant thrills and
frustrations. We can better understand this uncertainty and be more prepared for the possible
outcomes if we use probabilities to quantify these uncertainties. In this chapter, you will see how
to calculate and use probabilities. We begin with games of chance because, when viewed from a
financially safe distance, these are an ideal vehicle for introducing probability logic.
5.1 THE GAMBLING ROOTS
The first systematic analysis of probabilities seems to have been conducted by Gerolamo
Cardano “Cardan” (1501–1576). In his Liber de Ludo Aleae (Book of Games of Chance), Cardan
suggested that gambling was “invented” by Palamedes during the Trojan wars to entertain the troops
during the ten-year siege of Troy. He explained the rules of various games, philosophized on the
morality of gambling, provided tips for catching cheats, and worked out some careful probability
calculations.
The Flip of a Coin
When a symmetrical coin is flipped fairly, Cardan reasoned that there are two equally likely
outcomes: heads or tails. Because these outcomes are equally likely, each has a 1/2 probability of
occurring. When a symmetrical six-sided die is rolled fairly, there are six equally likely outcomes
and each therefore has a 1/6 probability of occurring. What about the probability that the die
outcome will be an even number? Cardan reasoned that, because three of the six equally likely
outcomes are even numbers, the probability of an even number is 3/6 = 1/2. Cardan’s logic can
be generalized as follows:
The classical equally likely interpretation of probabilities: when there are n equally likely
possible outcomes, the probability that any one of m outcomes will occur is m/n.
After Cardan, the next great milestone in probability theory was laid by the French
mathematician Blaise Pascal in the 1600s. There is a (probably exaggerated) legend that the
French nobleman Antoine Gombauld (the Chevalier de Mere) had won a considerable amount of
money betting that he could roll at least one 6 in four throws of a single die, but was losing
money on bets that he could roll at least one double-6 in twenty-four throws of a pair of dice. De
Mere asked Pascal to analyze these games and, later, other puzzling games of chance. In the
1600s, de Mere’s problem was considered very difficult; you will soon be able to answer it
easily.
Pascal discussed these gambling puzzles with a number of European mathematicians,
including Pierre de Fermat, and these discussions led to the first great books on the mathematical
theory of probability: Christiaan Huygens’ De Ratiociniis in Ludo Aleae (1657) and Pascal’s
Treatise on Figurate Numbers (1665). Huygens and Pascal, like Cardan before them, computed
probabilities by counting the number of equally likely outcomes. However, Huygens and Pascal
were much more systematic and rigorous than Cardan and were more successful in energizing
other intellectuals with their remarkable and revolutionary proofs that chance is subject to
mathematical laws. Pascal wrote that by “bringing together the rigor of scientific demonstration
and the uncertainty of chance, and reconciling those things which are in appearance contrary to
each other, this art can derive its name from both and justly assume the astounding title of the
Mathematics of Chance.”
The underpinning of their work is the identification of equally likely outcomes. But how do
we know if two outcomes are equally likely? Another French mathematician, Pierre Simon
Laplace, proposed what has become known as Laplace’s principle of insufficient reason: in the
absence of compelling evidence to the contrary, we should assume that the possible outcomes are
equally likely. If the coin or die appears to be fair, we should assume that each possible outcome
is equally likely. We refrain from this assumption if the coin is two-headed, the die is warped, or
the thrower cannot be trusted.
Several Coin Flips
So far, we have only considered the flip of a single coin and roll of a single die. More complex
problems may involve a more complicated enumeration of the possible outcomes. For example,
when a coin is flipped twice, there are three possible outcomes: two heads, one head and one tail,
or two tails. It is tempting to apply Laplace’s principle of insufficient reason and conclude that
each of these three outcomes has a 1/3 probability of occurring. This was, in fact, the answer
given by a prominent eighteenth-century mathematician, Jean d’Alembert. But his answer is
wrong!
There is only one way to obtain two heads and only one way to obtain two tails, but there are
two different ways to obtain one head and one tail—heads on the first flip and tails on the
second, or tails on the first flip and heads on the second. To enumerate the outcomes correctly, it
is sometimes useful to construct a probability tree, which shows the possible outcomes at each
stage in order to determine the possible combinations of outcomes. The probability tree below
(using H for heads and T for tails) shows that there are two equally likely outcomes (heads or
tails) on the first flip, and that, in each of these cases, there are two equally likely outcomes on
the second flip (again heads or tails).
first flip   second flip   outcome
    H             H          HH
                  T          HT
    T             H          TH
                  T          TT

There are four equally likely outcomes, with two involving one head and one tail. Thus the
probability of one head and one tail is 2/4 = 1/2. The probability of two heads is 1/4 and the
probability of two tails is 1/4.
What if, instead of flipping one coin twice, we flip two coins simultaneously? Logically, it
shouldn’t make any difference whether two coins are used instead of one, or whether two coins
are flipped at the same time or one after the other. The above probabilities apply to one coin
flipped twice, two coins flipped separately, or two coins flipped together. A probability tree, in
which the events are treated as happening in a sequence, is merely a device for clarifying our
enumeration of the number of equally likely outcomes.
What about three coin flips? Here is the probability tree:
first flip   second flip   third flip   outcome
    H             H            H          HHH
                               T          HHT
                  T            H          HTH
                               T          HTT
    T             H            H          THH
                               T          THT
                  T            H          TTH
                               T          TTT

With three coin flips, there are eight possible outcomes. By inspection, the probabilities are:
Event Number of Ways Probability
Three heads 1 1/8
Two heads, one tail 3 3/8
One head, two tails 3 3/8
Three tails 1 1/8
Total 8 1
These probabilities illustrate the symmetry principle: two events that differ only in their
arbitrary labels are equally probable. When a coin is flipped fairly, heads and tails differ only in
the names used to describe these outcomes. Therefore, the probability of three heads is equal to
the probability of three tails, and the probability of two heads and one tail is equal to the
probability of two tails and one head.
Increasingly complex probability trees can be drawn to analyze four, five, or more coin flips
—though we will soon see that there are shortcuts to avoid such tedious calculations. The
primary use of probability trees is to help us think clearly about multiple combinations of
outcomes, not to force us to do cumbersome arithmetic.
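As a rough companion to this tree logic (an added sketch, not part of the original text), a few lines of Python can enumerate the equally likely sequences for any number of fair flips and tally how many contain each number of heads:

from itertools import product
from fractions import Fraction

def head_count_probabilities(flips):
    """List every equally likely H/T sequence and tally the number of heads."""
    outcomes = list(product("HT", repeat=flips))          # 2**flips sequences
    counts = {}
    for seq in outcomes:
        heads = seq.count("H")
        counts[heads] = counts.get(heads, 0) + 1
    return {h: Fraction(c, len(outcomes)) for h, c in sorted(counts.items())}

print(head_count_probabilities(3))
# {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

The counts for three flips match the 1/8, 3/8, 3/8, 1/8 probabilities in the table above.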
Dice
We can also apply this probability logic to dice rolls. If we roll two six-sided dice fairly, what is
the probability of double-6s? What is the probability that the two numbers will add to 7? Which
is more likely, numbers that add to 7 or numbers that add to 8? Table 5.1 enumerates the possible
outcomes (using a probability tree with the branches omitted to reduce the clutter), and the
implied probabilities are shown in Table 5.2. The probability of double-6s is 1/36. The
probability of numbers that add to 7 is 1/6, which is slightly more likely than numbers that sum
to 8.
Table 5.1 Possible Outcomes When Two Dice are Rolled
First Die   Second Die   Outcome   Sum of Numbers
1           1            1–1       2
            2            1–2       3
            3            1–3       4
            4            1–4       5
            5            1–5       6
            6            1–6       7
2           1            2–1       3
            2            2–2       4
            3            2–3       5
            4            2–4       6
            5            2–5       7
            6            2–6       8
3           1            3–1       4
            2            3–2       5
            3            3–3       6
            4            3–4       7
            5            3–5       8
            6            3–6       9
4           1            4–1       5
            2            4–2       6
            3            4–3       7
            4            4–4       8
            5            4–5       9
            6            4–6       10
5           1            5–1       6
            2            5–2       7
            3            5–3       8
            4            5–4       9
            5            5–5       10
            6            5–6       11
6           1            6–1       7
            2            6–2       8
            3            6–3       9
            4            6–4       10
            5            6–5       11
            6            6–6       12

Table 5.2 Probabilities When Two Six-Sided Dice are Rolled Fairly
Sum of Dice Number of Ways Probability
2 1 1/36
3 2 2/36
4 3 3/36
5 4 4/36
6 5 5/36
7 6 6/36
8 5 5/36
9 4 4/36
10 3 3/36
11 2 2/36
12 1 1/36
total 36 1
In some professions or pastimes, it is helpful to remember the structure of the dice
probabilities shown in Table 5.2. The number 7 has the largest probability and the probabilities
then decline symmetrically as we move away from 7. A sum of 7 is more likely than 6 or 8,
which are more likely than a sum of 5 or 9, and so on. Figure 5.1 is a pictorial representation of
this symmetrical structure, showing all the possible pairs of outcomes when two dice are rolled.
[Figure 5.1 here: the 36 possible dice pairs stacked above their sums, 2 through 12, forming a symmetrical pattern that peaks at a sum of 7]
Figure 5.1 The Possible Outcomes When Two Dice are Rolled
Often, we are interested in a specific probability, rather than all of the probabilities shown in
Table 5.2, and there is a useful shortcut. Suppose that we want to know the probability of rolling
numbers that add to 4. First, we use the probability-tree logic to determine the total number of
possible outcomes. The first die has six possible outcomes and, for each of these outcomes, the
second die has six possible outcomes. So, there are 6(6) = 36 possibilities in all. Second, we
count the number of ways to roll numbers that add to 4. There are three ways (1-3, 2-2, 3-1);
therefore the probability of rolling numbers that add to 4 is 3/36.
Similarly, what is the probability of rolling doubles, with both dice showing the same
number? First, we use the probability-tree logic to determine the total number of possible
outcomes. As before, there are a total of 6(6) = 36 possible outcomes. Because there are six ways
(1-1, 2-2, 3-3, 4-4, 5-5, 6-6) to roll doubles, the probability of doubles is 6/36 = 1/6.
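The same shortcut can be checked by brute-force enumeration. A small Python sketch (added here for illustration, not from the original) lists all 36 equally likely rolls and counts the favorable ones:

from itertools import product
from fractions import Fraction

rolls = list(product(range(1, 7), repeat=2))        # all 36 equally likely (die1, die2) pairs

def prob(event):
    """Probability of an event given as a true/false test on a (die1, die2) pair."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

print(prob(lambda r: sum(r) == 4))       # 1/12, i.e., 3/36
print(prob(lambda r: r[0] == r[1]))      # 1/6,  i.e., 6/36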
Changing Possibilities
With a sequence of coin flips or dice rolls, the number of possible outcomes is the same at each
stage of a probability tree. Every time a coin is flipped, it has two sides. Every time a die is
rolled, it has six sides. Sometimes we have situations in which the number of possible outcomes
changes as we move through a probability tree.
Consider, for example, studies of a chimpanzee named Sarah that investigated whether
nonhuman primates could be taught language skills.2 In one experiment, Sarah was shown a
series of plastic symbols of varying color, size, and shape that formed a question. To answer this
question correctly, Sarah had to arrange the appropriate symbols in correct order. In a very
simple version, Sarah might be given three symbols—which we will label A, B, and C—to
arrange in correct order (perhaps communicating the answer “bread in bowl”). How many
possible ways are there to arrange these three symbols?
As shown in the probability tree below, the first symbol has three possibilities: A, B, or C.
Given the choice of the first symbol, there are only two possibilities for the second symbol. For
example, if A is selected as the first symbol, the second symbol must be either B or C. Given the
choice of the first two symbols, there is only one remaining possibility for the third symbol. If A
and B are the first two symbols, then C must be the third symbol.
first symbol   second symbol   third symbol   outcome
     A              B               C           ABC
                    C               B           ACB
     B              A               C           BAC
                    C               A           BCA
     C              A               B           CAB
                    B               A           CBA

Thus the probability tree has three initial branches, followed by two branches after the first node,
and one branch after the second node. In all, there are 3(2)(1) = 6 possible outcomes. The
probability that a random arrangement of three symbols will be in the correct order is 1/6.
In more complicated problems, the drafting of a probability tree may be very messy and
time-consuming. However, by remembering how a probability tree is constructed, we can figure
out the number of possible outcomes without actually drawing a tree. For instance, in one of
Sarah’s experiments, she had to choose four out of eight symbols and arrange these four symbols
in correct order. Visualizing a probability tree, there are eight possible choices for the first
symbol and, for each of these choices, there are seven possible choices for the second symbol.
For any choice of the first two symbols, there remain six possibilities for the third symbol and
then five possibilities for the fourth symbol. Thus, the total number of possible outcomes is 8(7)
(6)(5) = 1,680. The probability that four randomly selected and arranged symbols will be correct
is 1/1,680 = 0.0006. Thus, Sarah’s ability to perform this feat provided convincing evidence that
she was making informed decisions and not just choosing randomly.
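For readers who prefer to check such counts directly, Python's math module gives the same falling-factorial product (an added illustration, not in the original text):

import math

arrangements = math.perm(8, 4)     # ordered choices of 4 symbols from 8: 8*7*6*5
print(arrangements)                # 1680
print(1 / arrangements)            # about 0.000595, the 0.0006 chance quoted above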
Example 5.1: Mendel’s Genetic Theory
Gregor Mendel (1822–1884) was an Austrian monk who used probabilities to explain the
inheritance of various traits. Mendel’s theory laid the basis for modern genetics, including the
development of many remarkable hybrid plants. One of Mendel’s early experiments involved the
cross-breeding of sweet peas, some with yellow seeds and some with green seeds. He noticed a
number of statistical regularities that he explained by postulating that there are entities, now
called genes, that determine seed color and are inherited from the parent plants.
Mendel found that a crossing of purebred yellow-seeded and green-seeded plants produced
plants with yellow seeds. But when these hybrids were fertilized with their own pollen, a variety
of peas emerged. One-fourth had green seeds and continued to produce green-seeded plants
when they were self-fertilized. One-fourth had yellow seeds and continued to produce yellow-
seeded plants when they were self-fertilized. The remaining half had yellow seeds and yielded
the same offspring proportions as the original hybrids.


What theory could explain these data? Mendel hypothesized that a plant has two genes, each
of which could be either Y or G. Thus a pea plant could be either YY, GG, or YG. A plant with
YY genes produces yellow seeds and a plant with GG genes produces green seeds. A plant with
YG genes produces yellow seeds; therefore, Y is said to be the dominant gene.
One gene is inherited from each of the parent plants. When both parents are YY or both
parents are GG, the offspring will be too. Probabilities come in because when a parent is YG, the
inherited gene is equally likely to be Y or G. The probability tree below shows the possible
outcomes for the offspring of two YG parents. There is a 1/4 probability of two Y genes, a 1/4
probability of two G genes and a 1/2 probability of one Y gene and one G gene—just as Mendel
found in his experiments!
first gene   second gene   outcome
    Y             Y           YY
                  G           YG
    G             Y           GY
                  G           GG

Matters are not always this simple. Some traits, such as hair color, depend on more than one
pair of genes. Some genes, such as those for blood type, have more than two alternative kinds.
And some traits, such as height, depend on environment as well as genes. Nonetheless, Mendel’s
genetic theory is still an extraordinarily simple, yet powerful, application of probability theory.
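As an added sketch of Mendel's argument (not part of the original text), the YG × YG cross can be enumerated in a few lines, assuming each parent passes on Y or G with equal probability:

from itertools import product
from fractions import Fraction

offspring = list(product("YG", repeat=2))            # YY, YG, GY, GG, all equally likely

def seed_color(genes):
    return "yellow" if "Y" in genes else "green"     # Y is the dominant gene

counts = {}
for genes in offspring:
    color = seed_color(genes)
    counts[color] = counts.get(color, 0) + 1

for color, n in counts.items():
    print(color, Fraction(n, len(offspring)))        # yellow 3/4, green 1/4

Of the three yellow-seeded combinations, one (YY) breeds true and two (YG and GY) reproduce the hybrid proportions, matching the one-fourth, one-fourth, one-half split that Mendel observed.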
Exercises
5.1 Mr. and Mrs. ZPG have decided to have two children. Assuming that boy and girl babies
are equally likely and that a baby’s sex is unrelated to the sex of earlier babies, what is
the probability that the ZPGs will have one boy and one girl?
5.2 You are playing draw poker and have been dealt four spades and a heart. If you discard
the heart and draw a new card, what is the probability that this new card will be a spade,
giving you a flush? (Assume that there are no other players, since it can be shown that
your chances do not depend on whether there are other players, as long as you do not
know what cards they have been dealt.)
5.3 The Braille writing system uses six dots, arranged in two columns of three dots. Each dot
can be either raised or flat. How many different Braille characters are possible?
● ●
● ●
● ●
5.4 A researcher wants to investigate whether the order in which three prescribed medications
are taken matters. How many volunteers does she need if she wants each possible
sequence to be tried by 30 different patients? (Each patient will try only one
sequence.)
5.5 A very successful football coach once explained why he preferred running the ball to
passing it: “When you pass, three things can happen [completion, incompletion, or
interception], and two of them are bad.” Can we infer that there is a 2/3 probability that
something bad will happen when a football team passes the ball?
5.2 LONG-RUN FREQUENCIES
One difficulty with the classical approach to probabilities is that there may be compelling
evidence that the possible outcomes are not equally likely. A company selling a one-year life
insurance policy to a 20-year-old woman can hardly assume that life and death are equally likely
outcomes. To handle such cases, an alternative approach to probabilities has been developed. If a
coin has a 1/2 probability of landing heads, we can infer that, in a large number of coin flips,
heads will come up approximately half the time. Reversing this reasoning, if, in a large number
of trials, an event occurs half the time, we can conclude that its probability of occurring is
approximately 1/2.
The long-run frequency interpretation of probabilities is that if a certain event has
occurred m times in n identical situations (where n is a very large number), then its
probability is m/n.
I once assigned a homework exercise in which each of 30 students flipped a coin 100 times.
Using the students’ last names to arrange the outcomes in alphabetical order, I obtained the
sequence of 3,000 coin flips shown in Figures 5.2 and 5.3. The proportion that were heads varied
considerably during the first 100 flips, but by the end of 3,000 flips was very close to the
anticipated 0.50. The exact heads proportion turned out to be 1511/3000 = 0.5037. If we didn’t
have our equally likely logic to rely on, we could use these data to estimate the probability of a
heads to be 0.5.
[Figure 5.2 here: the proportion of heads (vertical axis, 0.0 to 1.0) plotted against the number of flips, 0 to 100; the proportion fluctuates widely during the early flips]
Figure 5.2 The First 100 Coin Flips
[Figure 5.3 here: the proportion of heads (vertical axis, 0.0 to 1.0) plotted against the number of flips, 0 to 3,000; the proportion settles very close to 0.50]
Figure 5.3 All 3,000 Coin Flips
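A simulation in the same spirit as the class exercise (an added sketch with computer-generated flips, not the students' data) shows the running proportion of heads settling toward 0.50:

import random

random.seed(1)                      # any seed; the exact path will vary
heads = 0
for n in range(1, 3001):
    heads += random.random() < 0.5  # one fair flip
    if n in (10, 100, 1000, 3000):
        print(n, round(heads / n, 4))
# The proportion typically wanders early on and hovers near 0.50 by 3,000 flips.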
An insurance company that cannot use equally likely logic can estimate the probability of a
healthy 20-year-old woman dying within a year by looking at the recent experiences of millions
of other similar women. If they have data on 10 million 20-year-old women, of whom 12,000
died within a year, they can estimate this probability as 12,000/10,000,000 = 0.0012, or slightly
larger than one in a thousand. In fact, just thirty years after Pascal’s Treatise was published, the
Royal Society of London published mortality tables that could be used to price life insurance.
The equally likely and long-run frequency approaches to probability are two entirely
consistent ways of thinking about probabilities. They differ only in how they go about
determining probabilities. In the equally likely method, we count the possible outcomes and think
about whether they are equally likely. In the long-run frequency method, we go out and collect
data from repeated trials. The equally likely method is most appropriate for games of chance or
similar situations in which the physical apparatus entitles us to make persuasive assumptions about
the outcomes' relative likelihoods. The long-run frequency method is most appropriate when we
suspect that the outcomes are not equally likely and have access to data that can confirm or deny
our suspicions. The long-run frequency approach is obviously needed for insurance rates and
similar situations where it is apparent that the equally likely approach cannot be used. It may also
be useful for fine-tuning the probabilities of events that are almost, but not quite, equally likely.
Will It Be a Boy or a Girl?
Consider the probability that a newborn baby will be a boy or a girl. Laplace’s principle of
insufficient reason suggests that we assume boy and girl babies to be equally likely and
consequently assign a 1/2 probability to each. However, an examination of the data shows that
boy babies slightly outnumber girl babies (perhaps nature’s way of compensating for the male’s
higher fatality rate). During the years 1975–1984, there were 34,696,000 recorded births in the
United States, of which 17,787,000 were gurgling boys and 16,909,000 were bouncing girls.
Using these data, the long-run frequency estimate of the probability of a boy is
P[boy] = 17,787,000 / 34,696,000 = 0.513
Because 51.3 percent of these babies were male (and 48.7 percent female), we estimate that a
baby has a 0.513 probability of being male and a 0.487 probability of being female.
A Scientific Study of Roulette
Outside the United States, roulette wheels generally have 37 slots, numbered 0 to 36. Wheels
used in the United States have an additional 00 slot, giving 38 slots in all. If the wheel is
perfectly balanced, clean, and fair, we can assume that each of the slots is equally likely to catch
the ball. In reality, imperfections in the wheels cause some numbers to win slightly more often
than others.
In the late 1800s, an English engineer, William Jaggers, took dramatic advantage of these
imperfections. He paid six assistants to spend every day for an entire month observing the
roulette wheels at Monte Carlo and recording the winning numbers. Jaggers found that certain
numbers came up slightly more often than others. He then bet heavily on these numbers and, in
four days, won 1.5 million francs, nearly $200,000—a fortune in the late 1800s. More recently,
in the 1960s, while their fellow students were studying or protesting, a group of Berkeley
students were reported to have pulled off a similar feat in Las Vegas. Nowadays, unfortunately,
casinos examine and rotate their roulette wheels frequently in order to frustrate the long-run
frequency bettors.
Example 5.2: Experimental Coin Flips and Dice Rolls
Many people have flipped coins over and over again, to see if heads and tails really do occur
with approximately equal frequency. One of the most ambitious was Karl Pearson, a famous
statistician, who flipped a coin 24,000 times and recorded 12,012 heads and 11,988 tails. Heads
turned up a fraction 12,012/24,000 = 0.5005 of the time, indicating that his coin was symmetrical
and tossed fairly. Pearson’s experiment is dwarfed by that of a Swiss astronomer named Wolf,
who rolled dice over a forty-year period, from 1850 to 1893. In one set of experiments, Wolf
tossed a pair of dice, one red and one white, 20,000 times. John Maynard Keynes, the economist,
commented on the results:
[T]he records of the relative frequency of each face show that the dice must have been
very irregular, the six face of the white die, for example, falling 38% more often than the
four face of the same die. This, then, is the sole conclusion of these immensely laborious
experiments—that Wolf’s dice were very ill made...
Wolf recorded altogether ... in the course of his life 280,000 results of tossing
individual dice. It is not clear that Wolf had any well-defined object in view in making
these records, which are published in curious conjunction with various astronomical
records, and they afford a wonderful example of the pure love of experiment and
observation.3
Exercises
5.6 The most important skating event in the Netherlands is the Elfstedentocht, a race over
124 miles of canals through eleven Dutch cities.4 This race is only held if the entire
course is covered by ice at least 8 inches thick. During the 96 years 1900–1995, the race
was held 14 times. Based on these data, what is your estimate of the probability that the
race will be held next year?
5.7 The U.S. postal service handled 56 billion pieces of mail in 1978, of which 4.5 billion
were initially misdelivered and 211,013 were reported lost completely. Based on these
data, what is the probability that a randomly selected letter will be misdelivered? Will be
lost completely? Why might these calculated probabilities understate or overstate the
probability that a letter you send will be misdelivered?
5.8 A book calculated the probabilities of “being injured by various items around your
house” and concluded that, “As the figures show, our homes are veritable booby traps.
Even your cocktail table is out to get you. These accounted for almost 64,000 injuries [in
1977], more than all ladders [62,000 injuries].”5 With 74,050,000 U.S. households, they
calculated the probability of being injured by a cocktail table as 64,000/74,050,000 =
0.00086 and the probability of being injured by a ladder as 62,000/74,050,000 = 0.00084.
Explain why these data do not really show that it is more dangerous to use a cocktail
table than a ladder.
5.3 SUBJECTIVE PROBABILITIES
The long-run frequency approach allows us to extend probabilities to events that are not equally
likely. However, its application is limited to situations where we have lots of repetitive data, and
much of the uncertainty we confront involves virtually unique situations. There are no frequency
data if the situation has never occurred before. Suppose that we are keenly interested in the
outcome of an upcoming presidential election. The opposing candidates may support competing
policies that will have important effects on our lives. Decisions about choosing a career, buying
or selling stocks, expanding or contracting a business, or whether or not to enlist in the military
may well be affected by who you think is going to be elected president.
In 1946, the self-proclaimed Wizard of Odds calculated presidential probabilities as follows:
Miss Deanne Skinner of Monrovia, California, asks: Can the Wizard tell me what the
odds are of the next President of the United States being a Democrat? ... Without
considering the candidates, the odds would be 2 to 1 in favor of a Republican because
since 1861 when that party was founded, there have been 12 Republican Presidents and
only 7 Democrats.6
We shouldn’t predict election outcomes simply by calculating the relative frequencies with
which Democrats and Republicans win. In 1936, Democrat Franklin Roosevelt ran for reelection
against Republican Alf Landon. At that time, there had been 20 previous presidential contests
between Republicans and Democrats, with the Republicans winning 14. Should the 1936
forecasters have concluded that Landon had a 70-percent chance of winning? Or should they
have taken into account the fact that Roosevelt was a popular president running for reelection
against the unexciting governor of Kansas?
By 1964, the tally was 16 election victories for the Republicans and 11 for the Democrats.
Did conservative Republican Barry Goldwater have a 16/27 = 0.59 chance of defeating the
Democratic incumbent, Lyndon Johnson? Were the Republican’s chances approximately the
same in 1964 and eight years later, in 1972, when the Republican incumbent ran against liberal
Democrat George McGovern? Clearly the odds change from election to election, depending on
the candidates and the mood of the electorate.
The outcome of a presidential election is uncertain and it would be useful to quantify that
uncertainty. We would like more than a shrug of the shoulders and a sheepish, “Who knows?”
Neither the equally likely nor the long-run frequency approach is satisfactory, however. The choice of
president is not determined by the flip of a coin every four years.
Similarly, a good weather forecaster uses more than historical records and a lucky penny to
predict tomorrow’s weather. It would be hard to make rational plans if the best the forecaster
could come up with was, “Maybe it will rain, and then again, maybe it won’t.” It is much more
informative to hear, “There’s a 90 percent chance of rain tomorrow.” A competent doctor will
give individual patients personalized probabilities, not meaningless platitudes: ”You will survive
this operation, unless you don’t.” Will you be accepted by a certain graduate school? Will you be
successful in a certain career? Will interest rates increase or decrease? Did this person rob that
bank? Will Iraq attack Saudi Arabia? Does this blood donor have HIV? Each example is ripe for
a probability assessment, but none fits into the equally likely or long-run frequency mold.
Bayes’ Approach
In the eighteenth century, Reverend Thomas Bayes wrestled with an even more challenging
problem—the probability that God exists. The equally likely and long-run frequency approaches
are useless, and yet this uncertainty is of great interest to many people, including Reverend
Bayes. Such a probability is necessarily subjective. The best that anyone can do is weigh the
available evidence and logical arguments and come up with a personal probability of God’s
existence. This idea of personal probabilities has been extended and refined by other Bayesians,
who argue that many uncertain situations can only be analyzed by means of subjective
probability assessments. Bayesians are willing to assign probabilities to presidential elections,
medical diagnoses, interest-rate movements, legal trials, military strategy, and God’s existence.
A Presidential Election Probability
These personal probabilities can be elicited by offering the person a choice between a gamble
that depends either on the occurrence of the specified event or on a game of chance in which the
probability of winning can be calculated easily. For example, in January of 1995, a former
student, Juan Guerra, telephoned me to ask whether I thought U.S. stocks were cheap or
expensive. In exchange for this valuable advice, he agreed to repeat a mental experiment that he
had participated in during one of my statistics classes.
I asked Juan to choose one of the following gambles:
(a) receiving $10 if Bill Clinton is reelected president in 1996, or
(b) receiving $10 if, on election day, a red card is drawn from a deck containing five
red cards and five black cards.
Juan emphatically chose the card draw, thereby showing that he believed the probability of
Clinton being reelected to be less than 0.5. I then offered him this choice:
(a) receiving $10 if Bill Clinton is reelected president in 1996, or
(b) receiving $10 if, on election day, a red card is drawn from a deck containing three
red cards and seven black cards.
This time, Juan chose Clinton, showing that he believed the probability of Clinton being
reelected to be larger than 3/10 = 0.3. With further questioning, I pinned him down to a 0.35
probability that Clinton would be reelected, by finding that he was indifferent between these
gambles:
(a) receiving $10 if Bill Clinton is reelected president in 1996, or
(b) receiving $10 if, on election day, a red card is drawn from a deck containing 35 red
cards and 65 black cards.


Juan knew a lot about politics. The two presidents preceding Clinton had been Republicans
(Ronald Reagan and George Bush) and the Republican party had recently won control of both
houses of Congress for the first time in forty years. Many Democrats blamed Clinton for the
party’s disastrous showing in the 1994 midterm elections and there was considerable speculation
about Democratic candidates who might run against Clinton in the party’s primary elections. The
economy was doing well, but the Federal Reserve seemed fanatical about inflation and might
cause an economic recession in 1996, which voters would blame on Clinton. None of this could
be translated directly into a probability, but Juan subjectively weighed all of this information, and
more, and came up with a personal probability of 0.35 that Clinton would be reelected. A year
later, in January of 1996, Juan telephoned me again and said that Clinton’s revived popularity
had persuaded him to revise his personal probability of Clinton’s reelection from 0.35 to 0.80.
The Bayesian interpretation of probabilities is that an event has a subjective probability P
of occurring if you are indifferent between a gamble hinging on this event and one based
on the selection of a red card from a deck in which a fraction P of the cards are red.
Many probability theorists are not comfortable with the Bayesian approach. They prefer
equally likely or long-run frequency probabilities on which we can all agree and draw common
conclusions. How can we have a scientific discourse if a probability is 0.35 one year and 0.80 the
next, or if one person believes a probability to be 0.7 and another thinks it is 0.4? Bayesians
respond that these subjective disagreements are all around us and should not be ignored. As Mark
Twain remarked, “It is difference of opinion that makes horse races.” If we refuse to use
probabilities when the equally likely or long-run frequency approach cannot be used, we will be
forced to ignore not only horse races, but many very interesting and far more important
questions. Who will win the next presidential election? Will interest rates go up or down? Will
this medication work? Is Iraq preparing to invade Kuwait? These are all important uncertainties
about which informed and well-intentioned people disagree. Instead of ignoring these
disagreements, Bayesians argue that it is better to quantify them by specifying subjective
probabilities.
Example 5.3: Success Probabilities at Sandoz
Since 1970, Sandoz, a large Swiss pharmaceutical company, has been using subjective
probabilities to gauge the potential success of its research and development projects.7 After a
potentially useful chemical has been identified by exploratory research, it becomes a “project”
and will be tested for a minimum of five years for efficacy and safety. If it passes these tests (on
average, one in ten do), then it is judged technically successful and will be registered and
marketed.
The firm’s decision to approve or reject a project for testing is of crucial importance, because
the new products that will be marketed five-to-ten years from now depend on which projects are
approved for testing today. Even for those projects that are initially approved, the firm must
decide as testing proceeds whether or not to continue spending resources that might be better
used elsewhere. A primary consideration for these important management decisions is each
project’s chance of eventually being judged technically successful. An equally likely assumption
is of little use since all projects are not equally likely to be successful. And because each product
is different, it is not reasonable to estimate success probabilities from long-run frequencies.
Instead, Sandoz uses subjective probabilities, reflecting the consensus of a panel of its
research and development experts. Once a project is launched, its success probability is updated
semiannually. For some projects, the probability of success is revised upward and, as it
approaches 1.0, the project is judged to be a technical success. For other projects, the success
probability declines as testing proceeds and management ends further testing.
Sandoz’s subjective probabilities are no simple matter. These are the chances that an untested
or partially tested compound will eventually be successful. But as the elderly man says about his
health, “I’ve got problems, but life sure beats the alternative.” Sandoz reasons that subjective
probabilities are better than none at all. And it has been pleased to report that its panels’ initial
subjective probability assessments have turned out to be a reliable guide to project success: of
those given an x percent chance of succeeding, about x percent do succeed.
Exercises
5.9 An analyst believes that the stock market is twice as likely to rise during the next 12
months as to fall. What is this analyst’s implicit probability that the market will rise
during the next 12 months? (Assume that there is no chance that the market will be
unchanged.)
5.10 In early 1987, it was very unclear whether the U.S. economy would boom, bust, or
muddle in between. Edward Yardeni, Director of Economics and Fixed Income Research
at Prudential-Bache, explained his position as follows:
We’ve been slumpers for the past few months. A slump is still possible, but now we
feel more comfortable with a muddling scenario. Until recently, we assigned the
slump scenario a probability of 50%, with muddling at 30% and with a boom given a
20% chance. Now, muddling gets 50%, bust gets 30%, and boom stays at 20%.8
Would you characterize his probabilities as being equally likely, long-run frequency, or
subjective? Why are the other two kinds of probabilities inappropriate here?
5.4 THE ADDITION RULE
So far, we have focused on probabilities that can be calculated directly by enumerating the
number of equally likely possibilities, examining long-run frequencies, or engaging in subjective
introspection. Some probabilities are better calculated indirectly, by applying standard rules. In
order to make these rules explicit, we will let A identify an event and let P[A] represent the
probability that A will occur. Probabilities cannot be negative or larger than one:
0 ≤ P[A] ≤ 1
If A is impossible, P[A] = 0; if A is certain to occur, P[A] = 1.
Our first rule concerns the probability that either of two events (or possibly both) will occur.
For example, if we flip two coins, what is the probability that one or the other (or both) will be
heads? It is tempting to add the 0.5 probability of heads on the first coin to the 0.5 probability of
heads on the second coin, but that would give a probability of 0.5 + 0.5 = 1.0, which can’t be
correct. We are not 100 percent certain of heads; we may well get tails on both coins.
The error is a double counting of the case of two heads. Our 0.5 probability for heads on the
first coin includes the case where we get heads on the second coin, and our 0.5 probability for
heads on the second coin also includes the case of two heads. If we add these two 0.5
probabilities, we count the two-heads case twice. Our earlier probability tree showed that there
are four possible, equally likely outcomes:


HH
HT
TH
TT
The first and second outcomes involve heads on the first coin. The first and third outcomes
involve heads on the second coin. If we add the first and second outcomes to the first and third
outcomes, the first outcome, HH, is counted twice. The correct probability, counting HH only
once, is 3/4.
One way to correct this double counting is to enumerate all the possible outcomes and then
count the successful ones carefully. Another way is to double count knowingly and then subtract
the outcomes that are counted twice.
The addition rule: The probability that either A or B or possibly both will occur is
determined by adding the separate probabilities and then correcting for double counting:
P[A or B] = P[A] + P[B] – P[A and B] (5.1)
With two coin flips, the probability of heads on the first coin is 2/4, the probability of heads on
the second coin is 2/4, and the probability of heads on both coins is 1/4. Therefore, the
probability of heads on either the first or second coin (or possibly both) is 2/4 + 2/4 – 1/4 = 3/4.
Sometimes P[A and B] is equal to zero, because A and B cannot both occur. If A and B are
mutually exclusive, so that P[A and B] = 0, then the addition rule simplifies to
P[A or B] = P[A] + P[B] (5.2)
For instance, if a standard six-sided die is rolled and A is the number 1 and B is the number 2,
then A and B are mutually exclusive outcomes and P[A and B] = 0, since the die cannot be both a
1 and a 2. If we want to calculate the probability that a die will be either a 1 or a 2, there is no
double-counting problem because the die cannot be both a 1 and a 2. The probability that a die
will be either a 1 or a 2 is 1/6 + 1/6 = 1/3.
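A short added sketch (not from the original) verifies the addition rule against direct enumeration for the two-coin example:

from itertools import product
from fractions import Fraction

outcomes = list(product("HT", repeat=2))                  # HH, HT, TH, TT

def p(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == "H"                                 # heads on the first coin
B = lambda o: o[1] == "H"                                 # heads on the second coin

print(p(lambda o: A(o) or B(o)))                          # direct count: 3/4
print(p(A) + p(B) - p(lambda o: A(o) and B(o)))           # addition rule: 3/4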
Many addition-rule questions are answered more easily by using the subtraction rule, which
is discussed later in this chapter. The real value of the addition rule is that it helps us recognize
and avoid the common mistake of simply adding together the probabilities of events that are not
mutually exclusive. For instance, the novelist Len Deighton wrote that a World War II pilot who
has a 2 percent chance of being shot down on each mission is “mathematically certain” to be shot
down in fifty missions. Deighton obtained this mathematical certainty by adding together fifty 2-
percent probabilities in order to get 100 percent. However, the addition rule alerts us to the
existence of double-counting problems. Probabilities cannot simply be added together in this
fashion and Deighton’s calculation is consequently incorrect. (If Deighton’s procedure were
correct, then the probability of being shot down in fifty-one missions would be a nonsensical 102
percent.) Later in this chapter, when we get to the subtraction rule, we will see how to make the
correct calculation.
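For readers who want a preview of that calculation, a quick numerical sketch (assuming the fifty missions are independent, each with a 2 percent risk) shows how far the true probability is from certainty:

p_shot_down = 1 - 0.98 ** 50     # probability of being shot down at least once in 50 missions
print(round(p_shot_down, 3))     # about 0.636, far from Deighton's "mathematical certainty"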
Exercises
5.11 You are playing backgammon and need to roll a 1 on either of your two dice in order to
hit your opponent. What is your probability of doing so?
5.12 A television weather forecaster once said that there was a 50 percent chance of rain on
Saturday and a 50 percent chance of rain on Sunday, and therefore a 100 percent chance
of rain that weekend. Explain the error in this reasoning. If there is a 40 percent chance
that it will rain on both Saturday and Sunday, what is the probability that it will rain on
at least one of these two days?
5.13 You run a small business and have two big fears: (a) you will be squeezed by a credit
crunch; and (b) a nationwide discount chain will open a store in your area. You believe
that there is a 0.20 probability of a credit crunch, a 0.05 probability of a discount store,
and a 0.01 probability that both of these disasters will occur. What is the probability that
at least one of your fears will come true?
5.5 CONDITIONAL PROBABILITIES
We began this chapter with the example of worrisome results from an ELISA test for the
presence of HIV antibodies in blood donated to the Red Cross. In order to analyze ELISA test
results with probabilities, we can use a contingency table in which data are classified in one way
by the rows and in another way by the columns. (Because of this two-way classification, a
contingency table is sometimes called a two-way table.) In our example, the blood sample has
two possible conditions (HIV or no HIV) which we show as two rows in this table, and the test
has two possible outcomes (positive and negative) which we show as two columns:
test positive test negative
HIV
No HIV
The entries in this table will show the four possible combinations of conditions; for example,
having HIV and testing positive. To determine these entries, we need to assume an arbitrary size
for the population being tested, say 1,000,000 blood samples, and need to specify the fraction of
this population that has HIV. We do not know whether college students who donate blood to the
Red Cross are more or less likely than other students to have HIV. For illustrative purposes, we
will use the estimate that roughly 2 in every 1000 U.S. college students have HIV, and therefore
assume that of these 1,000,000 blood samples, 2,000 (0.2 percent) have HIV and 998,000 (99.8
percent) do not. We show these figures in a total column:
test positive test negative Total
HIV 2,000
No HIV 998,000
Total 1,000,000
To fill in the rest of table we need to know the accuracy of the ELISA tests. When HIV
antibodies are present in a blood sample, ELISA tests have a 0.997 probability of detecting the
antibodies and a 0.003 probability of not detecting them; when the antibodies are not present,
ELISA tests have a 0.985 probability of correctly indicating their absence and a 0.015 probability
of incorrectly signaling the presence of the antibodies.9 Therefore, for those 2,000 cases where
the antibodies are present, we use the estimate that ELISA tests have a 0.997 probability of
detecting these antibodies: 0.997(2,000) = 1,994 will show positive test results and 0.003(2,000)
= 6 will show negative results. For those 998,000 cases where the antibodies are not present, we
use the estimates that ELISA tests have a 0.985 probability of correctly indicating their absence:
0.985(998,000) = 983,030 will show negative test results and 0.015(998,000) = 14,970 will show
positive results. Entering these numbers:
test positive test negative Total
HIV 1,994 6 2,000
No HIV 14,970 983,030 998,000
Total 1,000,000
Adding up the columns, we can fill in the rest of the contingency table, as shown in Table 5.3.
Table 5.3 ELISA Tests for HIV Antibodies
test positive test negative Total
HIV 1,994 6 2,000
No HIV 14,970 983,030 998,000
Total 16,964 983,036 1,000,000
To use this table, we need the following notation: the conditional probability P[B if A] is the
probability that B will occur, given that A has occurred. For example, P[test positive if HIV]
means the probability that a blood sample with the HIV antibodies will test positive. This is very
different from the reverse conditional probability, P[HIV if test positive], which means the
probability that a blood sample that tests positive actually has HIV antibodies. The data in Table
5.3 illustrate this distinction. Of the 2,000 samples with the antibodies, 1,994 test positive;
therefore,
P[test positive if HIV] = 1,994 / 2,000 = 0.997

Of the 16,964 samples that test positive, 1,994 have HIV antibodies:

P[HIV if test positive] = 1,994 / 16,964 = 0.118
These numbers demonstrate clearly that we must exercise care in interpreting conditional
probabilities. If we look at the first numerical row of Table 5.3, we see that 99.7 percent of the
blood samples containing HIV antibodies are correctly identified. Yet, if we look at the first
numerical column, only 11.8 percent of the samples that test positive actually contain HIV
antibodies.
Here is another example of how reversed conditional probabilities can be quite different. At a
large university, only one out of every thousand female students might play on the women’s
basketball team, but 100 percent of the players on the women’s basketball team are female:
P[on women’s basketball team if female] = 0.001
P[female if on women’s basketball team] = 1.0
We calculated the HIV-test conditional probabilities by constructing a contingency table and
then taking the ratio of the appropriate numbers. Conditional probabilities can also be calculated
directly by taking the ratio of two probabilities:
The conditional probability that B will occur, given that A has occurred, is
P[B if A] = P[B and A] / P[A]     (5.3)

In our HIV example,

P[HIV if test positive] = P[HIV and test positive] / P[test positive]
                        = (1,994/1,000,000) / (16,964/1,000,000) = 0.118

Similarly,

P[no HIV if test negative] = P[no HIV and test negative] / P[test negative]
                           = (983,030/1,000,000) / (983,036/1,000,000) = 0.999994
Of those samples that test positive, 11.8 percent have the HIV antibodies; of those that test
negative, 99.9994 percent do not contain HIV antibodies.
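The whole calculation can be packaged in a few lines. This added sketch uses the chapter's illustrative numbers (a prevalence of 0.002, a 0.997 chance of detecting antibodies that are present, and a 0.985 chance of correctly clearing a sample without them):

population = 1_000_000
prevalence = 0.002            # fraction of samples with HIV antibodies
sensitivity = 0.997           # P[test positive if HIV]
specificity = 0.985           # P[test negative if no HIV]

hiv = population * prevalence
no_hiv = population - hiv

true_pos  = hiv * sensitivity             # 1,994 correctly flagged
false_neg = hiv - true_pos                # 6 missed
true_neg  = no_hiv * specificity          # 983,030 correctly cleared
false_pos = no_hiv - true_neg             # 14,970 false alarms

print(true_pos / (true_pos + false_pos))  # P[HIV if test positive], about 0.118
print(true_neg / (true_neg + false_neg))  # P[no HIV if test negative], about 0.999994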
Misinterpreting Conditional Probabilities
The popular press often confuses conditional probabilities. For example, Los Angeles removed
half of its 4,000 mid-block crosswalks and Santa Barbara phased out 95 percent of its crosswalks
after a study by San Diego’s Public Works Department found that two-thirds of all accidents
involving pedestrians took place in painted crosswalks.10 However, the fact that most accidents
involving pedestrians take place in crosswalks does not prove that crosswalks are more
dangerous than unmarked streets. The San Diego data give these conditional probabilities
P[crosswalk if accident] = 2/3 and P[no crosswalk if accident] = 1/3
To compare painted crosswalks with unmarked streets, we need to know these conditional
probabilities,
P[accident if crosswalk] and P[accident if no crosswalk]
and we simply do not have enough information to compute these probabilities.
To illustrate the issues involved, we can construct a contingency table with a population of
3,000,000 street crossings and assume hypothetically that there are three accidents, of which two
occur in a crosswalk and one does not:
              accident   no accident   Total
Crosswalk         2
No crosswalk      1
Total             3       2,999,997    3,000,000

In order to compare P[accident if crosswalk] with P[accident if no crosswalk], we need to know
how many pedestrians use crosswalks and how many do not. If, for example, two-thirds of all
pedestrian street-crossings occur in crosswalks, then our complete table looks like this:
              accident   no accident   Total
Crosswalk         2       1,999,998    2,000,000
No crosswalk      1         999,999    1,000,000
Total             3       2,999,997    3,000,000

In this special case, a pedestrian is equally likely to have an accident in a crosswalk or out of a
crosswalk:
P[accident if crosswalk] = 2 / 2,000,000 = 0.000001

P[accident if no crosswalk] = 1 / 1,000,000 = 0.000001
These calculations illustrate the general principle that if two-thirds of all pedestrian accidents
occur in crosswalks and two-thirds of all pedestrian crossings are in crosswalks, then an accident
is equally likely to happen to someone using a crosswalk and to someone who crosses the street
without using a crosswalk. Only if fewer than two-thirds of all pedestrian crossings are in
crosswalks, do the San Diego data indicate that a pedestrian using a crosswalk is more likely to
be involved in an accident than is a pedestrian not using a crosswalk.
A more sophisticated analysis would take into account the location of crosswalks and the
amount of pedestrian traffic. Perhaps crosswalks are generally found at dangerous, heavily
traveled intersections. The question is not whether there are more pedestrian accidents at
dangerous crossings than at remote unused crossings, but whether a particular type of crossing
would be safer without a crosswalk. To exaggerate the point, imagine a town with two streets.
One street has a crosswalk that everyone uses, the other has no crosswalk because nobody ever
walks across that street. (There is nothing on the other side.) All of the pedestrian accidents take
place in the crosswalk, but this provides no evidence on how many accidents would occur if the
crosswalk were removed. To answer that question, we need a controlled study, in which traffic
engineers compare pedestrian crossings that are essentially identical, except that some have
crosswalks and others don’t. This is not an easy task. Our point here is simply that these cities
made an important public-safety decision based on a misinterpretation of conditional
probabilities.
Independent Events and Winning Streaks
The probability expression P[B] asks, “Considering all possible outcomes, what is the probability
that B will occur?” The expression P[B if A] asks, “Considering only those cases where A
occurs, what is the probability that B will also occur?” If the probability that B occurs does not
depend on whether or not A occurs, then it is natural to describe A and B as independent: two
events, A and B, are independent if P[B if A] = P[B]. (Logically, it must also be true that P[A if
B] = P[A].)
In our crosswalk example, in the special case where two-thirds of all pedestrian accidents
occur in crosswalks and two-thirds of all street crossings occur in crosswalks, accidents are
independent of whether or not the pedestrian uses a crosswalk:
P[accident] = 3 / 3,000,000 = 0.000001

P[accident if crosswalk] = 2 / 2,000,000 = 0.000001
If, in contrast, the probability of an accident in a crosswalk is not equal to the overall probability
of accident, then crosswalks and accidents are not independent.
In games of chance involving coins, dice, roulette wheels and other physical objects, each
outcome is generally independent of other outcomes, past, present, or future. In any fair game, a
player will win some and lose some (or, it often seems, win some and lose many). The wins will
at times be scattered and, at other times, be bunched together. Some gamblers mistakenly attach a
great deal of significance to these coincidental runs of luck. They apparently believe that luck is
some sort of infectious disease that a player catches and then takes awhile to get over. For
example, Clement McQuaid, “author, vintner, home gardener, and keen student of gambling
games, most of which he has played profitably,” offers this advice:
There is only one way to show a profit. Bet light on your losses and heavy on your wins.
Many good gamblers follow a specific procedure: a. Bet minimums when you’re losing...
b. Bet heavy when you’re winning... c. Quit on a losing streak, not a winning streak.
While the law of mathematic probability averages out, it doesn’t operate on a set pattern.
Wins and losses go in streaks more often than they alternate. If you’ve had a good
winning streak and a loss follows it, bet minimums long enough to see whether or not
another winning streak is coming up. If it isn’t, quit while you’re still ahead.11
You will indeed show a profit if you win your large bets and lose only your small ones. But how
are you to know in advance whether you are going to win or lose your next bet? Suppose you are
playing a dice game and have won three times in a row. You know that you have been winning
and you are probably excited about it, but dice have no memories or emotions. Games were
invented by people. Dice don’t know the difference between a winning number and a losing
number. Dice do not know what happened on the last roll and do not care what happens on the
next roll. The outcomes are independent in that the probabilities are constant, roll after roll.
Sometimes you will win four times in a row, but often you will win three in a row and then
lose the fourth. Sometimes you will win five in a row, but often you will win four in a row and
then lose the fifth. You may fondly recall those occasions when you won four, five, or more
times in a row, and regret afterward that you had not bet more heavily. Unfortunately, there is no
way to know in advance when a winning streak will begin or end. Games of chance are the
classic example of independent events.
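Independence can also be checked by simulation. The minimal Python sketch below assumes, purely for illustration, a 50 percent chance of winning each game and 100,000 simulated games; it estimates the chance of winning the next game overall and the chance of winning the next game immediately after three straight wins. With independent games the two estimates differ only by sampling error.

import random

random.seed(1)     # fixed seed so this illustration is reproducible
p_win = 0.5        # assumed probability of winning any single game
n_games = 100_000

wins = [random.random() < p_win for _ in range(n_games)]

overall_rate = sum(wins) / n_games

# Outcomes of games that immediately follow three consecutive wins
after_streak = [wins[i] for i in range(3, n_games)
                if wins[i - 3] and wins[i - 2] and wins[i - 1]]
streak_rate = sum(after_streak) / len(after_streak)

print(round(overall_rate, 3), round(streak_rate, 3))   # both close to 0.5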
Example 5.4: Interpreting Mammogram Results
One hundred doctors were asked this hypothetical question: In a routine examination, you find a
lump in a female patient’s breast. In your experience, only 1 out of 100 such lumps turns out to
be malignant, but, to be safe, you order a mammogram X-ray. If the lump is malignant, there is a
0.80 probability that the mammogram will identify it as malignant; if the lump is benign, there is
a 0.90 probability that the mammogram will identify it as benign. In this particular case, the
mammogram identifies the lump as malignant. In light of these mammogram results, what is
your estimate of the probability that this lump is malignant?
Of the 100 doctors surveyed, 95 gave probabilities of around 75 percent. However, the
correct probability is only 7.5 percent, as shown by the following two-way classification of 1000
patients:
             test positive   test negative   Total
Malignant            8               2          10
Benign              99             891         990
Total              107             893       1,000
In 10 of these cases (1 percent), the lump is malignant, and in 990 cases it is benign. Looking
across the numerical rows, we see that the test gives the correct diagnosis in 80 percent of the
malignant cases and 90 percent of the benign cases. Yet, looking down the first numerical
column, of the 107 patients with positive test results, only 7.5 percent actually have malignant
tumors: 8/107 = 0.075.
The data given here imply that, when there is a malignant tumor, there is a 0.80 probability
that it will be detected by a mammogram test; however, in those cases where the mammogram
test indicates the presence of a malignant tumor, there is only a 0.075 probability that it will
actually turn out to be malignant. It is very easy to misinterpret conditional probabilities, and
these doctors evidently misinterpreted them. According to the researcher who conducted this
survey,
The erring physicians usually report that they assumed that the probability of cancer
given that the patient has a positive X-ray...was approximately equal to the probability
of a positive X-ray in a patient with cancer.... The latter probability is the one measured in
clinical research programs and is very familiar, but it is the former probability that is
needed for clinical decision making. It seems that many if not most physicians confuse
the two.12
The solution, no doubt, is not for doctors to refrain from using probabilities, but to become better
informed about their meaning and interpretation.
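The 7.5 percent figure can also be reached without drawing the table, simply by bookkeeping from the three numbers stated in the problem. A minimal Python sketch of that arithmetic (the 1,000-patient population is just a convenient size):

patients = 1000
p_malignant = 0.01             # 1 out of 100 such lumps is malignant
p_positive_if_malignant = 0.80
p_negative_if_benign = 0.90

malignant = patients * p_malignant                        # 10 patients
benign = patients - malignant                             # 990 patients

true_positives = malignant * p_positive_if_malignant      # 8
false_positives = benign * (1 - p_negative_if_benign)     # 99

p_malignant_if_positive = true_positives / (true_positives + false_positives)
print(round(p_malignant_if_positive, 3))                  # 0.075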
Exercises
5.14 In Craig v. Boren (1976), the U.S. Supreme Court considered whether important
government objectives were served by the gender distinction in an Oklahoma statute that
prohibited the sale of 3.2-percent beer to males under the age of 21 and to females under
the age of 18. Among the evidence considered in this case were the following data on
persons arrested in Oklahoma for driving under the influence (DUI) during the last four
months of 1973: 92 percent of those arrested were male; 8 percent of the males arrested
were under the age of 21; and 5 percent of the females arrested were under the age of 21.
Assume a hypothetical population of 10,000 DUI arrests and construct a contingency
table with the columns showing gender (male or female) and the rows showing age
(under 21 or older). Of these DUI arrests, what is the probability that a randomly selected
person under the age of 21 is male? Do these data indicate that the gender and age of DUI
arrests are dependent or independent?
5.15 A 1968 story in a Denver newspaper argued that women are better drivers than men.13
Among the evidence cited: “Of 101 drivers involved in an accident while passing on a
curve, 15 were women.” In addition to automobile data, the newspaper cited these data:
“3,000 men were injured on bicycles in the state in 1967 and 34 were killed, compared
with 662 females injured and 11 killed.” Explain why these data are not sufficient to
show that women are safer than men in driving around curves and riding bicycles.
5.16 Use some specific hypothetical numbers to explain why these data do not necessarily
justify this conclusion by the magazine California Highways:
A large metropolitan police department made a check of the clothing worn by
pedestrians killed in traffic at night. About four-fifths of the victims were wearing
dark clothes and one-fifth light-colored garments. This study points up the rule that
pedestrians are less likely to encounter traffic mishaps at night if they wear or carry
something white after dark so that drivers can see them more easily.14
5.17 Evaluate this advice for winning at craps at Las Vegas:
[F]ind a hot table. Never remain at a cold one. Always make it a policy to keep
looking—move around! A tip-off might be the yelling crowd where a hot roll may be
taking place. Another indication is a lot of money spread every which way around the
table by numerous players.... [I]t is far more lucrative to tag along on the tail end of a
hot roll than to go in fresh on a cold one. And no one moving from table to table will
actually catch a hot roll from the beginning. If 65 percent of a streak is caught, it’s
enough!15
5.6 THE MULTIPLICATION RULE
Equation 5.3, defining conditional probability, tells us how to calculate a conditional probability
if we know P[A and B]. Often, we are interested in the reverse calculation: we know a
conditional probability and want to calculate P[A and B], the probability that A and B will both
occur. We can make this calculation by rearranging Equation 5.3 to obtain
The multiplication rule: The probability that A and B will both occur is
P[A and B] = P[A] P[B if A] (5.4)
For A and B both to occur, A must occur and, given that A has occurred, B must occur too. Thus
the probability of A and B both occurring is equal to the probability that A will occur multiplied
by the probability that B will occur, given that A has occurred. The multiplication rule can be
extended indefinitely to handle more than two events. The probability that A and B and C will
occur is equal to the probability that A will occur multiplied by the probability that B will occur,
given that A has occurred, multiplied by the probability that C will occur, given that A and B
have occurred.
For example, what is the probability that 4 cards drawn from an ordinary deck of playing
cards will all be aces? Because there are initially 4 aces among the 52 cards, the probability that
the first card is an ace is 4/52. Given that the first card is an ace, there are 3 aces among the 51
remaining cards and the probability that the second card will also be an ace is 3/51. If the first 2
cards are aces, there are 2 aces left among 50 cards and the probability of a third ace is 2/50. The
probability of a fourth ace, given that the first 3 cards are aces, is 1/49. Multiplying these
probabilities,
P[four aces] = (4/52)(3/51)(2/50)(1/49) = 24/6,497,400 = 0.0000037
We could have calculated this probability by visualizing a probability tree with 52(51)(50)(49) =
6,497,400 branches and then determining that 24 of these branches have four aces. The
multiplication rule is easier.
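The multiplication can be checked exactly with Python's fractions module (one convenient way to do the arithmetic; any calculator would serve as well):

from fractions import Fraction

# Four cards drawn without replacement: multiply the conditional probabilities
p_four_aces = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)

print(p_four_aces)          # 1/270725, the same as 24/6,497,400
print(float(p_four_aces))   # roughly 0.0000037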
We noted above that two events are independent if P[B if A] = P[B]. In this case, the
multiplication rule is modified as follows:
If A and B are independent, so that P[B if A] = P[B], then the multiplication rule
simplifies to
P[A and B] = P[A] P[B] (5.5)
If we roll two dice, for example, the results are independent and the probability of two 1s is
simply the product of each die’s probability of rolling a 1:
P[double 1s] = (1/6)(1/6) = 1/36 = 0.0278
Example 5.5: Legal Misinterpretations of the Multiplication Rule
Perhaps the earliest legal use of probability theory and statistical evidence in a U.S. court was in
1867 when Benjamin Peirce, a Harvard mathematics professor, testified that the many
similarities between the signature on a will and the signature on a contested addendum to the will
strongly suggested that the second signature had been traced from the first. Based on a detailed
comparison of the downstrokes in forty-two other uncontested signatures by the deceased, Peirce
estimated the probability of a matched downstroke to be 0.2. Yet, all thirty downstrokes on the
signature on the will’s addendum matched those on the will’s signature. Using the multiplication
rule, he calculated the probability of thirty matches to be 0.2^30 (roughly one in 10^21) and concluded that, “So vast an
improbability is practically an impossibility.”
Peirce’s use of the multiplication rule implicitly assumes independence in all 30
downstrokes. There is no mention in his testimony of the reasonableness of this crucial
assumption. Nor did he take into account the fact that the 0.2 estimate was based on 42
signatures made at different times in the deceased’s life, while the signatures on the will and the
addendum were reportedly made on the same day. However, with little understanding of
probabilities and great respect for Peirce’s academic credentials, the opposing attorney did not
challenge the calculations.
A more recent and notorious case is People v. Collins (1968), in which a white woman with
blond hair tied in a ponytail was seen fleeing a Los Angeles robbery in a yellow car driven by a
black man with a beard and a mustache. Four days later, the police found and subsequently
arrested Malcolm Collins, a black man with a beard, mustache, and yellow Lincoln, and his
common-law wife, a white woman with a blond ponytail. A mathematics professor calculated the
probability that two people picked at random would have this combination of characteristics by
estimating the probability of each characteristic: P[black man with beard] = 1/10, P[man with
mustache] = 1/4, P[owning a yellow car] = 1/10, P[interracial couple] = 1/1,000, P[blond
woman] = 1/3, P[wearing a ponytail] = 1/10. Using the multiplication rule,
P[all six characteristics] = (1/10)(1/4)(1/10)(1/1,000)(1/3)(1/10) = 1/12,000,000
The small value of this probability helped convict Collins and his wife, the jurors apparently
believing, in the words of the California Supreme Court, that “there could be but one chance in
12 million that the defendants were innocent and that another equally distinctive couple
committed the robbery.”
As with the disputed signature 100 years earlier, the professor’s calculation implicitly
assumes independence. But, this time, the California Supreme Court questioned the
appropriateness of this assumption. The probability of having a mustache is clearly affected by
being a black man with a beard: while only 25 percent of the entire population has mustaches,
perhaps 75 percent of black men with beards do. Similarly, being a black man, a blond woman,
and an interracial couple are not independent, nor perhaps are ponytails and blond hair. The court
also raised the possibility that the assumed characteristics might have been incorrect: “the guilty
couple might have included a light-skinned Negress with bleached hair rather than a Caucasian
blonde; or the driver of the car might have been wearing a false beard as a disguise.” Further,
taking into account the population of Los Angeles and using the binomial probabilities discussed
in Chapter 6, the court calculated that, even if there were really only a one-in-twelve-million
chance that a randomly picked couple would match these six characteristics, there was roughly a
40 percent chance of finding another such couple somewhere in Los Angeles. The court decided
that this was not proof beyond a reasonable doubt and reversed the Collins’ conviction.16
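The court's "roughly 40 percent" figure can be reproduced with the kind of binomial reasoning developed in Chapter 6. The Python sketch below assumes, purely for illustration, 12,000,000 independent couples, each matching the description with probability 1/12,000,000 (a number chosen so that the expected count of matching couples is one; the court's own population assumption may have differed). It computes the probability that at least two couples match, given that at least one couple, the Collinses, does.

# Illustrative assumptions: N independent couples, each matching with probability p
N = 12_000_000
p = 1 / 12_000_000

p_no_match = (1 - p) ** N                       # no couple matches
p_exactly_one = N * p * (1 - p) ** (N - 1)      # exactly one couple matches
p_at_least_one = 1 - p_no_match

# Given at least one matching couple, the chance there is at least one more
p_another_couple = (p_at_least_one - p_exactly_one) / p_at_least_one
print(round(p_another_couple, 2))               # about 0.42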
Exercises
5.18 A survey asked 68 women of ages 18 to 65 to estimate their weight.17 When the women
were then weighed, all 68 women were found to have overestimated their weight. If a
woman is equally likely to overestimate and underestimate her weight (and no one is
exactly right), what is the probability that 68 of 68 will overestimate their weight?
5.19 A resort has a lip-synch contest in which staff members perform seven songs, with
audience applause determining the rankings of these seven performances. Before the
show, each member of the audience predicts the songs that will finish first, second, and
third. Those whose predictions turn out to be correct win a free dinner. If you pick 3 of
the 7 songs randomly, what is the probability that you will win a free dinner?
5.20 A car was ticketed in Sweden for parking too long in a limited time zone after a
policeman recorded the position of the two tire air valves on one side of the car (in the
1:00 and 8:00 positions) and returned hours later to find the car in the same spot with the
tire valves in the same position.18 The driver claimed that he had driven away and
returned later to park in the same spot, and that it was a coincidence that both tire valves
stopped in the same positions as before. The court accepted the driver’s argument,
calculating the probability that both valves would stop at their earlier positions as (1/12)
(1/12) = 1/144 and feeling that this was not a small enough probability to preclude
reasonable doubt. The court advised, however, that had the policeman noted the position
of all four tire valves and found these to be unchanged, the very slight (1/12)^4 = 0.00005
probability of such a coincidence would be accepted as proof that the car had not moved.
As defense attorney for a four-valve client, how might you challenge this calculation?
5.21 Before the fatal 1986 explosion of the Challenger space shuttle, many government
officials believed that the space shuttle program would never have a fatal accident.19
Within NASA, estimates of the probability of a mission failure ranged from 1 in 100,000
(by management) to 1 in 100 (by engineers). On other U.S. rockets, failure rates ranged
from 1 percent (Thor) to 10 percent (Atlas). The tragic Challenger explosion came on its
twenty-sixth mission. What is the probability of 25 successes in 25 missions if the
probability of failure on each mission is 1 percent? 10 percent?
5.7 THE SUBTRACTION RULE
We can be 100 percent certain that either an event will occur or it will not occur; therefore, P[A]
+ P[not A] = 1. From this we derive
The subtraction rule: The probability that A will not occur is
P[not A] = 1 – P[A] (5.6)
If the probability of flipping a head is 1/2, then the probability of not flipping a head is 1 – 1/2 =
1/2. If the probability of rolling a 1 is 1/6, then the probability of not rolling a 1 is 1 – 1/6 = 5/6.
The subtraction rule is so obvious that it seems hardly worth mentioning. However,
sometimes the easiest way to determine the probability of something happening is to calculate
the probability that it will not occur. For instance, what is the probability that among four fairly
flipped coins, there will be at least one heads? We could try to calculate and then add together the
probability of one heads, two heads, three heads, and four heads. A much easier approach is to
calculate the probability of no heads (four tails) and then use the subtraction rule to determine the
probability of at least one heads. The multiplication rule tells us that the probability of four tails
is (1/2)^4; therefore, the probability of at least one heads is 1 – (1/2)^4 = 0.9375.
For a more complicated example, earlier in this chapter we noted that the Chevalier de Mere
had asked the great French mathematician, Blaise Pascal, to compare these two probabilities:
a. rolling at least one 6 in four throws of a single die
b. rolling at least one double-6 in twenty-four throws of a pair of dice
A naive calculation would note that the probability of a 6 on a single die is 1/6 and add together
four of these probabilities to get 4/6. Similarly, the probability of two 6s on two dice is 1/36 and
the addition of 24 of these probabilities gives 24/36 = 4/6. Yet de Mere was only winning slightly
more than half the time with the first wager and was losing more than half the time with the
second.
The naive calculations are wrong because they add up probabilities without regard for the
double-counting problem. Sometimes when a 6 is rolled on the first die, it will also be rolled on
the second, third, or fourth die too. A simple summation of the four separate probabilities double-
counts all of the multiple occurrences. The same is true of twenty-four rolls of a pair of dice. As
with the addition rule, we need somehow to adjust these summations for all of the double
counting. This is extremely complicated with four dice and unthinkable with twenty-four pairs of
dice. Instead, we will use the insight provided by the subtraction rule.
Whenever we need to calculate the probability that something will happen “at least once,” we
can use the subtraction rule by subtracting from one the probability that it doesn’t happen at all.
The probability of rolling at least one 6 in four throws of a single die is equal to one minus the
probability of rolling no 6s. The probability of rolling a number other than 6 on a single die is
5/6; because the dice rolls are independent, the probability of four straight rolls without a 6 is
(5/6)4. Therefore,
P[at least one 6] = 1 − P[no 6s in four throws] = 1 − (5/6)^4 = 0.518
Similarly, the probability of rolling at least one double-6 in twenty-four throws of a pair of dice
is equal to one minus the probability of rolling no double-6s, which is given by multiplying the
35/36 probability of no double-6 on a single pair of dice by itself twenty-four times:
P[at least one double-6] = 1 − P[no double-6s in 24 throws] = 1 − (35/36)^24 = 0.491
De Mere’s experience was consistent with these correct probability calculations. The odds were
with him on the first bet, but against him on the second.
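Both of de Mere's wagers are easy to check with a couple of lines of Python:

# Wager (a): at least one 6 in four throws of a single die
p_a = 1 - (5 / 6) ** 4

# Wager (b): at least one double-6 in twenty-four throws of a pair of dice
p_b = 1 - (35 / 36) ** 24

print(round(p_a, 3))   # 0.518, slightly better than even odds
print(round(p_b, 3))   # 0.491, slightly worse than even odds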
Let’s try another one. Earlier in this chapter, we looked at Len Deighton’s incorrect
calculation that a pilot with a 2 percent chance of being shot down on each mission is
“mathematically certain” to be shot down in fifty missions. To determine the correct probability,
we use the subtraction rule to turn the question around: What is the probability that a pilot will
complete fifty missions successfully without being shot down? Once we have the probability of
fifty successful missions, we can subtract this value from one in order to determine the
probability that a pilot will not complete fifty missions successfully.
In order to calculate the probability that a pilot with a 2 percent chance of being shot down
on each mission will complete fifty missions successfully, we need to assume that the mission
outcomes are independent. This assumption is implicit in Deighton’s constant 2-percent
probability, though it is not completely realistic, as pilots no doubt improve with experience.
(Probabilities also vary with the difficulty of individual missions; the 2 percent figure must be a
simplifying average.) With this independence assumption, on each mission a pilot has a 2 percent
probability of being shot down and a 98 percent chance of not being shot down, and the
probability of not being shot down in fifty missions is, using the multiplication rule with
independence, equal to 98 percent multiplied by itself fifty times:
P[not shot down in fifty missions] = 0.98^50
The subtraction rule then tells us that the probability that a pilot will be shot down is equal to one
minus the probability of not being shot down,
P[shot down in 50 missions] = 1 – P[not shot down in fifty missions]
= 1 – 0.98^50 = 0.6358
Instead of Deighton’s erroneous 100 percent, the correct probability is about 64 percent.
Example 5.6: Matched Birthdays
In a class with 23 students, what is the probability that at least two students were born on the
same day of the year? Surprisingly, the chances are greater than 50 percent!
How do we calculate this probability? If we try to add together the probabilities of individual
students having matching birthdays, we will need a mind-boggling adjustment for the double-
counting of multiple matches. Instead, we use the simplifying logic of the subtraction rule: the
probability of at least one matched birthday is equal to one minus the probability of no matches.
It is traditional in this classic problem to ignore February 29 and to assume that all of the
remaining days are equally likely to be a birthday. We begin by selecting two students. The first
has some birthday, say November 11, and the probability that the second person has a different
birthday is 364/365. Now add a third student. The probability of another different birthday, given
that the first two didn’t match, is 363/365. The probability of no match among the first three
students is equal to the product of these probabilities:
P[no match] = P[first two don't match] × P[third doesn't match if first two don't match] = (364/365)(363/365)
Extending this logic, the probability that none of the twenty-three students’ birthdays match is
P[no match] = (364/365)(363/365)(362/365) ⋯ (343/365)
and the probability of at least one match is
P[at least one match] = 1 − (364/365)(363/365)(362/365) ⋯ (343/365) = 0.51
Analogous calculations show that the probability of at least one matched birthday is 0.71 with
thirty students and 0.97 with fifty students. Unless you have an unusually small class, there are
probably at least two students in your statistics class who share a birthday.
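Here is a minimal Python sketch of the no-match calculation for any class size (23, 30, and 50 are simply the example sizes used above):

def p_shared_birthday(n):
    # Probability that at least two of n people share a birthday,
    # ignoring February 29 and treating all 365 days as equally likely
    p_no_match = 1.0
    for k in range(n):
        p_no_match *= (365 - k) / 365
    return 1 - p_no_match

for n in (23, 30, 50):
    print(n, round(p_shared_birthday(n), 2))   # 0.51, 0.71, 0.97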
Exercises
5.22 In January 1991, shortly after the beginning of allied air raids against Iraq, a newspaper
columnist wrote that “losses per mission were creeping toward the Vietnam and Korean
level of four in every thousand. Such a level sounds good until you do the arithmetic
from the pilot’s point of view and realize that after 100 missions you have a one in three
chance of being shot down.”20 Assuming independence, check this calculation that a pilot
with a 0.004 probability of being shot down on a single mission has a 1/3 probability of
being shot down at least once during 100 missions.
5.23 A dance studio once offered a free introductory dancing lesson to anyone with a “lucky”
dollar bill containing a 2, 5, or 7 in its eight-digit serial number. If all numbers are equally
likely, what is the probability that a randomly selected dollar bill will qualify for this
prize?
5.24 The chair of Los Angeles City/County AIDS Task Force wrote that,
Several studies of sexual partners of people infected with the [HIV] virus show that a
single act of unprotected vaginal intercourse has a surprisingly low risk of infecting
the uninfected partner—perhaps 1 in 100 to 1 in 1,000. For an average, consider the
risk to be 1 in 500....Statistically, 500 acts of intercourse with one infected partner ...
lead to a 100 percent probability of infection.21
Assuming the 1-in-500 figure to be correct, identify the error in the 100-percent
calculation and then calculate the correct probability.
5.8 BAYES’ THEOREM
Reverend Thomas Bayes apparently wanted to use information about the probability that the
world would be as we observe it if God does exist (P[observations if God]) in order to make
inferences about the probability that God exists, given what we observe (P[God if observations]).
Bayes was unable to prove the existence of God, but his work on how to go from one conditional
probability to its reverse has turned out to be extremely useful and is the foundation for the
modern Bayesian approach to probability and statistics.
Although Bayes worked out some sample calculations for various games of chance, his few
published writings do not contain a general formula for reversing conditional probabilities. It was
Laplace who wrote down the general expression and called it Bayes’ theorem.
Bayes’ theorem: P[A if B] = P[A]P[B if A] / (P[A]P[B if A] + P[not A]P[B if not A])    (5.7)
You need not memorize Bayes’ theorem. In most cases, the simplest and most intuitive
procedure is to construct a contingency table for a hypothetical population.
For instance, we began this chapter with the example of worrisome results from an ELISA
test for the presence of HIV antibodies in a blood sample. If these antibodies are present, this test
has a 0.997 probability of a positive reading: P[test positive if HIV] = 0.997. The Red Cross and
the donor are of course interested in the reverse conditional probability P[HIV if test positive]—
the probability that a blood sample that tests positive actually has the HIV antibodies.
In order to determine this probability, we assumed a population of 1,000,000 and then used
the given information to construct the contingency table in Table 5.3. This table shows that 0.2
percent of the blood samples have the HIV antibodies and, of the 16,964 samples with positive
test results, a fraction 1,994/16,964 = 0.118 actually contain HIV antibodies.
Bayes’ theorem is commonly used in two ways, both of which are illustrated in Table 5.3.
The first use is in going from one conditional probability P[A if B] to the reverse, P[B if A].
Here, we went from P[test positive if HIV] = 0.997 to P[HIV if test positive] = 0.118. The second
use of Bayes’ theorem is in revising a probability P[A] in light of additional information, giving
P[A if B]. Here, before using the ELISA test, the probability that a randomly selected blood
sample has the HIV antibodies is P[A] = 0.002. Given the positive ELISA reading, the revised
probability of HIV antibodies is P[A if B] = 0.118. The probability of HIV has increased by a
factor of nearly 60, from 0.2 percent to 11.8 percent, but is still far from certain. It is much more
likely than not that this blood sample does not contain HIV antibodies. The most effective and
unambiguous way of communicating this diagnosis is with an 11.8-percent probability, not with
words (such as “cannot be excluded,” “likely,” or “low probability”) that can easily be
misinterpreted.
Let’s try another example, this time using some of the data cited in the landmark 1964 report
by the U.S. Surgeon General on cigarette smoking.22 At the time, about 6 percent of all deaths in
the United States were due to lung cancer (about 100,000 deaths each year). Not knowing
whether a person smokes tobacco or not, this is our estimate of the probability of dying of lung
cancer. What is our revised probability of death from lung cancer if we know that someone is a
cigarette smoker? The other information we need is that, at that time, 85 percent of the people
who died of lung cancer were cigarette smokers and that about one-third of the adult population
smoked.
We begin the construction of a contingency table by assuming a hypothetical population of
3,000,000 of which 180,000 (6 percent) die of lung cancer and 1,000,000 (one third) are
smokers. (The size of the population doesn’t matter, since we are only interested in percentages;
3,000,000 happens to be easy to work with.)
                   Smoker    Nonsmoker        Total
Lung cancer                                 180,000
No lung cancer                            2,820,000
Total           1,000,000    2,000,000    3,000,000
Next, we use the fact that 85 percent of the 180,000 people who die of lung cancer are cigarette
smokers: 0.85(180,000) = 153,000. We can then fill in the rest of the table:
                   Smoker    Nonsmoker        Total
Lung cancer       153,000       27,000      180,000
No lung cancer    847,000    1,973,000    2,820,000
Total           1,000,000    2,000,000    3,000,000
From these numbers, we can calculate the following probabilities:
P[lung cancer if smoke] = 153,000/1,000,000 = 0.1530

P[lung cancer if don't smoke] = 27,000/2,000,000 = 0.0135
Not all smokers die of lung cancer, but they are more than ten times as likely to die of lung
cancer as are nonsmokers.
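The same answer follows directly from Bayes' theorem (Equation 5.7), taking A to be "dies of lung cancer" and B to be "smoker." The Python sketch below uses only the three figures quoted above (6 percent, 85 percent, one-third); P[B if not A] is first backed out from the overall smoking rate.

p_cancer = 0.06              # P[A]: share of deaths due to lung cancer
p_smoker_if_cancer = 0.85    # P[B if A]
p_smoker = 1 / 3             # P[B]: share of adults who smoke

# P[B] = P[A]P[B if A] + P[not A]P[B if not A], so solve for P[B if not A]
p_smoker_if_no_cancer = (p_smoker - p_cancer * p_smoker_if_cancer) / (1 - p_cancer)

# Bayes' theorem, Equation 5.7
numerator = p_cancer * p_smoker_if_cancer
p_cancer_if_smoker = numerator / (numerator + (1 - p_cancer) * p_smoker_if_no_cancer)

print(round(p_cancer_if_smoker, 3))   # about 0.153, matching the table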
Example 5.7: A Bayesian Analysis of Drug Testing
A 1986 editorial in the Journal of the American Medical Association on mandatory urine drug
tests argued that, “An era of chemical McCarthyism is at hand, and guilty until proven innocent
is the new slogan.”23 The editorial noted that it would cost $8 billion to $10 billion annually to
test every employee in the United States once a year and that the accuracy of these tests
(measured in the fraction of the people who are diagnosed correctly) ranged from 75 percent to
95 percent for some drugs and from 30 percent to 60 percent for others.
If an employee is given a drug test, two kinds of errors can happen. A false-positive result
occurs when the test incorrectly indicates the presence of a drug; a false-negative result occurs
when the test fails to detect the presence of drugs. These mistakes can occur for a variety of
reasons, including the mislabeling of samples, the use of contaminated laboratory equipment,
and the technician’s misreading of subjective criteria regarding chemical color, size, and
location.
To illustrate the potential seriousness of the false-positive problem, consider a test
administered to 10,000 persons, of which 500 (5 percent) use the drug that the test is designed to
detect and 9,500 (95 percent) do not. Suppose further that the test is 95 percent accurate: 95
percent of the drug users will be identified as drug users and 95 percent of those who are drug-
free will be identified as drug-free.
              Test positive   Test negative    Total
Drug-user            475             25          500
Drug-free            475          9,025        9,500
Total                950          9,050       10,000
The contingency table shows that of the 500 who use this drug, 475 (95 percent) will have a
positive test result and 25 (5 percent) won’t. Of the 9,500 who don’t use this drug, 475 (5
percent) will have a positive result and 9,025 (95 percent) won’t. Using these numbers, we can
calculate the fractions of the diagnoses that are incorrect:
P[drug user if negative reading] = 25/9,050 = 0.0028

P[drug-free if positive reading] = 475/950 = 0.50
Less than 1 percent of those who test negative are in fact drug users. However, an astounding 50
percent of those who test positive do not use the drug. This is another example of how important
it is to interpret conditional probabilities correctly. In this example, 95 percent of all drug-users
test positive, but only 50 percent of those who test positive are drug users.
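The two error fractions can be generated directly from the assumptions (10,000 people tested, 5 percent users, 95 percent accuracy). Here is a minimal Python sketch, written as a function so the assumptions are easy to vary:

def error_fractions(n_tested, prevalence, accuracy):
    # Returns (fraction of negative diagnoses that are wrong,
    #          fraction of positive diagnoses that are wrong)
    users = n_tested * prevalence
    nonusers = n_tested - users

    true_positives = users * accuracy          # users correctly flagged
    false_negatives = users - true_positives   # users the test misses
    true_negatives = nonusers * accuracy       # nonusers correctly cleared
    false_positives = nonusers - true_negatives

    wrong_negatives = false_negatives / (false_negatives + true_negatives)
    wrong_positives = false_positives / (false_positives + true_positives)
    return wrong_negatives, wrong_positives

print(error_fractions(10_000, 0.05, 0.95))     # roughly (0.0028, 0.50)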
These calculations can be redone using other assumptions. An exercise at the end of this
section shows the following:
1. As the number of drug users in the tested group decreases, the fraction of the negative
diagnoses that are incorrect decreases, but the fraction of the positive diagnoses that are
incorrect increases.
2. If the test is less accurate, the fractions of the positive and negative diagnoses that are
incorrect both increase.
Exercises
5.25 Consider a drug test that is 95 percent accurate: 95 percent of those using the drug will be
identified as drug-users and 95 percent of those who are drug-free will be identified as
drug-free.
a. If 10 percent of the persons tested use this drug, what fractions of the negative and
positive diagnoses are, in fact, incorrect?
b. If 1 percent of the persons tested use this drug, what fractions of the negative and
positive diagnoses are incorrect? Based on your calculations, what happens to the
fractions of the negative and positive diagnoses that are incorrect as the number of
drug users in the tested group decreases?
c. Now consider the case where the test is 80 percent accurate and 10 percent of the
persons tested use this drug. What fractions of the negative and positive diagnoses are
incorrect? Based on these calculations, what happens to the fractions of the negative
and positive diagnoses that are incorrect as the test’s accuracy decreases?
5.26 In three careful studies, lie-detector experts examined several persons, some known to be
truthful and the others known to be lying, to see if the experts could tell which were
which.24 Overall, 83 percent of the liars were pronounced “deceptive” and 57 percent of
the truthful people were judged “honest.” Using these data and assuming that 80 percent
of the people tested are truthful and 20 percent are lying, what is the probability that a
person pronounced “deceptive” is in fact truthful? What is the probability that a person
judged “honest” is in fact lying? How would these two probabilities be altered if half of
the people tested are truthful and half are lying?
5.27 In the United States, criminal defendants are presumed innocent until they are proven
guilty beyond a reasonable doubt, because it is thought “better to let nine guilty people go
free than to send one innocent person to prison.” Suppose that 90 percent of all
defendants brought to trial are guilty, 90 percent of all guilty defendants are convicted,
and 90 percent of all innocent defendants are set free.
a. Of those people convicted, what fraction are innocent?
b. Of those people set free, what fraction are guilty?
SUMMARY
Probabilities are used to quantify life’s uncertainties. There are three basic approaches: the
enumeration of equally likely outcomes, the computation of observed long-run frequencies, and a
subjective blending of facts and opinions. The classic equally likely approach was devised to
handle games of chance—the roll of dice, spin of a roulette wheel, and deal of cards. If there are
n equally likely possible outcomes, the probability that any one of m outcomes will occur is m/n.
The long-run frequency approach is designed for cases where there are lots of repetitive data and
the outcomes are apparently not equally likely. If an event has occurred m times in n (a very large
number) identical situations, then its probability is m/n. The subjective approach is closely
associated with Reverend Thomas Bayes. A subjective probability is based on an intuitive
blending of a variety of information, and therefore can vary from person to person. If you are
indifferent between a gamble hinging on the occurrence of an event and a gamble based on the
selection of a red card from a deck in which a fraction P of the cards are red, then you evidently
believe this event to have a probability P of occurring.
A number of rules can be used to simplify probability calculations. The addition rule gives
the probability that either A or B, or both, will occur: P[A or B] = P[A] + P[B] – P[A and B]. The
multiplication rule gives the probability that both A and B will occur: P[A and B] = P[A] P[B if
A], where P[B if A] is the conditional probability that B will occur, given that A has occurred.
The subtraction rule, P[not A] = 1 – P[A], is particularly useful for figuring out the probability
that something will happen at least once. Bayes’ theorem tells us how to reverse a conditional
probability, from P[B if A] to P[A if B], or to revise a probability P[A] in light of additional
information. A contingency table provides a simple apparatus for using Bayes’ theorem.
REVIEW EXERCISES
5.28 The color of flowers on Japanese four o’clocks is determined by a pair of genes, each of
which can be either R or W. The flower is red if the genes are RR, white if WW, and pink
if RW. Each color gene is inherited from one of the two parent plants and each of a
parent’s two color genes is equally likely to be inherited. Construct a table showing the
respective probabilities of red, white, and pink flowers (a) if the two parent plants are RR
and WW, and (b) if the two parent plants are RW and RW.
5.29 In the card game Between the Sheets, a player is dealt two cards, face up, from a standard
deck of 52 playing cards. Ace is low and king is high. The player can either fold or bet
that the value of a third dealt card will be between the values of the two original cards.
The bet is lost if the third card is above, below, or matches the first two cards. What is the
probability of winning if your two cards are
a. a three and a nine?
b. a four and a ten?
c. an ace and a king?
d. a pair of jacks?
5.30 A prominent French mathematician, Jean d’Alembert, argued that because there are four
possible outcomes when three coins are flipped (no heads, a head on the first flip, a head
on the second flip, or a head on the third flip), the probability of no heads is 1/4. Explain
the flaw in his reasoning.
5.31 A June 18, 1990 letter to The Nation complained about the outcome of a poetry contest
cosponsored by the magazine:
what really irks me is your result: Four female poets win a competition cosponsored
by a publication with a female poetry editor....I can imagine the screams from the
gallery if the results were as one-sided in the other direction—four male winners of a
prize offered by a publication with a male poetry editor.
The poetry editor responded that
The Discovery-Nation contest is nearly unique in that it is judged anonymously.
Neither the judges nor I know the gender or the names of the poets who enter the
competition. However, this information might enlighten you: According to the laws of
probability, if an equal number of male and female poets submit entries, one out of
sixteen times all the winners will be female, or male.
Explain where the one-out-of-sixteen figure comes from and identify the assumptions
implicit in this calculation.
5.32 In the game Scrabble, tiles with printed letters are arranged to spell words. Suppose that
you want to use seven tiles containing seven different letters to spell a seven-letter word.
How many possible ways can these seven tiles be arranged? How many arrangements are
possible for making four-letter words using any four of seven different letters?
5.33 Explain why you either agree or disagree with this reasoning: “Females make up half the
human race, Cathy. If you answer the phone, you’ve got a fifty-fifty chance of hearing a
woman’s voice on the line.”25
5.34 In the game odd-or-even, two players simultaneously reveal a hand showing either one or
two fingers. If each player is equally likely to show one or two fingers and their choices
are made independently, what is the probability that the sum will be even?
5.35 The traditional Indian game Tong is similar to the odd-or-even game described in the
preceding exercise, except that each player can show one, two, or three fingers. If each
possibility is equally likely and the choices are made independently, what is the
probability that the sum will be an even number?
5.36 In 1991 all 3-digit telephone area codes in the United States began with a digit other than
0 or 1 and had either a 0 or 1 for the second digit. In addition, eight 3-digit numbers
(including 800 and 900) that met this criterion were reserved for special purposes and
could not be used as area codes. What is the maximum possible number of area codes
consistent with these restrictions? (These area-code restrictions have now been relaxed
because of the proliferation of fax machines and modems.)
5.37 Life insurance costs more for a 50-year-old than for a 20-year-old. Is this because 50-
year-olds have higher incomes and can therefore afford to pay more?
5.38 A standard die is painted green on four sides and red on two sides and will be rolled six
times. You can choose one of these three sequences and will win $25 if the sequence you
choose occurs. Which do you choose and why?
a. red green red red red either
b. red green red red red green
c. green red red red red red
5.39 The Morse code consists of a sequence of telegraphic “dot” or “dash” symbols, with each
sequence representing a letter of the English alphabet. For example, dot = E; dot-dash =
A; and dot-dash-dot = R. How many English letters can be represented by different
sequences of one to three dot or dash signals? How long a sequence did Morse have
to allow in order to include all 26 letters of the English alphabet?
5.40 A research group estimated that college football players have a 23 percent chance of a
knee injury each season and a 64 percent chance of at least one knee-injury during a four-
year career.26 Does this 64 percent probability assume independence from one year to the
next?
5.41 A book on probabilities advises
[N]ext time you pat your little nephew on the head and ask him what he wants to be
when he grows up, don’t expect a quick and intelligent answer. The poor kid has the
odds stacked 17,452 against him. That’s how many specified occupations there are in
the Dictionary of Occupational Trades.27
Explain why you either would or would not use 1/17,452 as the probability of your
nephew correctly selecting his future occupation.
5.42 In 1978 there were 73,767 deaths in New York City and 217 deaths in Dover, Delaware.
Do these data show that New York is 73,767/217 = 340 times as dangerous as Dover?
5.43 A magazine article on driving safety said that a person “hurtling down the highway at 70
miles an hour, careening from side to side” would be four times more likely to have a
fatal accident if it was seven at night than if it were seven in the morning, because, “Four
times more fatalities occur on the highways at 7 P.M. than 7 A.M.”28 Why do these data
not justify the conclusion? Be specific.
5.44 Galileo wrote a short note on the probability of obtaining a sum of 9, 10, 11, or 12 when
three dice are rolled. Someone else had concluded that these numbers are equally likely,
because there are six ways to roll a 9 (1-4-4, 1-3-5, 1-2-6, 2-3-4, 2-2-5, or 3-3-3), six
ways to roll a 10 (1-4-5, 1-3-6, 2-4-4, 2-3-5, 2-2-6, or 3-3-4), six ways to roll an 11
(1-5-5, 1-4-6, 2-4-5, 2-3-6, 3-4-4, or 3-3-5), and six ways to roll a 12 (1-5-6, 2-4-6, 2-5-5,
3-4-5, 3-3-6, or 4-4-4). Yet Galileo observed “from long observation, gamblers consider
10 and 11 to be more likely than 9 or 12.” How do you think Galileo resolved this
conflict between theory and observation?
5.45 Explain the error in this statistical reasoning: “Statistically, the British are more likely to
change spouses than banks. One-third of marriages end in divorce, but only 3% of clients
switch banks in any year.”29
5.46 The Wall Street Journal and Washington Post both reported the results of a 1991 study
that estimated the probability that a 40-year-old, sober, seat-belted person driving a
heavier-than-average car would have a fatal accident while making a 600-mile
automobile trip.30 The authors of this study calculated this probability by multiplying the
overall driver fatality rate by four risk factors. For example, the probability that a heavier-
than-average car will have a fatal accident is only 0.77 times the probability that a car of
average weight will have a fatal accident. So the overall driver fatality rate was
multiplied by 0.77. This adjusted number was then multiplied by 0.68 because the
probability that a 40-year-old will have a fatal accident is only 0.68 times the probability
that a driver of average age will have a fatal accident. Similar adjustments were made for
being sober and wearing a seat belt. What is wrong with this calculation?
5.47 What are the sucker’s chances of winning the following game?
Take a small opaque bottle and seven olives, two of which are green, five black. The
green olives are considered the “unlucky” ones. Place all seven olives in the bottle,
the neck of which should be such a size that it will allow only one olive to pass
through at a time. Ask the sucker to shake them and then wager that he will not be
able to roll out three olives without getting an unlucky green one amongst them.31
5.48 In the carnival game Queens, there are six cards—two kings, two queens, and two jacks.
The cards are turned face down and shuffled. The player picks two cards and wins if
neither is a queen. If the game is fair, what is the probability of winning?
5.49 A Temple University mathematics professor used these data to show that most Americans
have an exaggerated fear of terrorists:
Without some feel for probability, car accidents appear to be a relatively minor
problem of local travel while being killed by terrorists looms as a major risk of
international travel. While 28 million Americans traveled abroad in 1985, 39
Americans were killed by terrorists that year, a bad year—1 chance in 700,000.
Compare that with the annual rates for other modes of travel within the United States
—1 chance in 96,000 of dying in a bicycle crash, 1 chance in 37,000 of drowning,
and 1 chance in only 5,300 of dying in an automobile accident.32
How do you suppose the author calculated the probabilities of dying in a bicycle
accident, of drowning, and of dying in a car accident? Do these calculations prove that it
is more dangerous to drive to school than to fly to Paris?
5.50 In the 1942 federal case Hill v. Texas,33 poll tax data indicated that 8,000 of the 66,000
voters in Dallas County were black. However, none of the 64 citizens selected one year
for possible service on the county’s grand juries were black. If the 66,000 names were
placed in a hat and 64 were randomly selected, what is the probability that none of the
selected persons would be black? In fact, no blacks had been selected for grand jury duty
in Dallas County for sixteen years. Is the probability of selecting no blacks for sixteen
consecutive years larger or smaller than the probability of selecting no blacks in one
particular year?
5.51 Answer this letter to newspaper columnist Marilyn vos Savant, who is listed in the
Guinness Book of World Records Hall of Fame for “Highest IQ”:34
Three of us couples are going to Lava Hot Springs next weekend. We’re staying two
nights, and we’ve rented two studios, because each holds a maximum of only four
people. One couple will get their own studio on Friday, a different couple on
Saturday, and one couple will be out of luck. We’ll draw straws to see which are the
two lucky couples.
I told my wife we should just draw once, and the loser would be the couple out of
luck both nights. I figure we’ll have a two-out-of-three (66 2/3%) chance of winning
one of the nights to ourselves. But she contends that we should draw straws twice—
first on Friday and then, for the remaining two couples only, on Saturday—reasoning
that a one-in-three (33 1/3%) chance for Friday and a one-in-two (50%) chance for
Saturday will give us better odds....
Which way should we go?
5.52 On November 30, 1992, a panel of earthquake scientists estimated that each year there is
a 0.05 to 0.12 probability of a Southern California earthquake of magnitude 7 or larger.35
(This was a sharp increase from earlier estimates of a 0.04 probability of a magnitude-7
earthquake in any given year.)
a. Assuming independence, what is the probability of at least one magnitude-7
earthquake during a five-year period if there is a 0.05 probability in any given year? If
there is a 0.12 probability in any given year?
b. The panel of earthquake scientists cautioned that, “Because few historical precedents
exist and those that do are for rather different circumstances, probability cannot easily
be addressed as frequency of occurrence.... Rather, probability must be interpreted as
betting odds.” Would you characterize these probabilities as equally likely, long-run
relative frequency, or subjective? Explain your reasoning.
5.53 The random walk model of stock prices states that stock-market returns are independent
of the returns in other periods; for example, whether the stock market does well or poorly
in the coming month does not depend on whether it has done well or poorly during the
past month, the past 12 months, or the past 120 months. On average, the monthly return
on U.S. stocks has been positive about 60 percent of the time and negative about 40
percent of the time. If monthly stock returns are independent with a 0.6 probability of a
positive return and a 0.4 probability of a negative return, what is the probability of
a. twelve consecutive positive returns?
b. twelve consecutive negative returns?
c. a positive return, if the return the preceding month was negative?
5.54 One in eight hundred U.S. women is infected with HIV. Thirteen percent of all U.S.
females are black, but 53 percent of the U.S. female HIV carriers are black. What is the
probability that a randomly selected black U.S. woman has HIV? What is the probability
that a randomly selected non-black U.S. woman has HIV?
5.55 If two different cards are drawn from a thoroughly shuffled standard deck of 52 playing
cards, what is the probability that one of the cards will be either a ten, jack, queen, or
king and the other card will be an ace, not necessarily in this order?
5.56 Fifteen percent of U.S. households live below the poverty line. In a third of all U.S.
households, a woman is the sole income provider. In 60 percent of poor households, a
woman is the sole income provider. Of those households in which a woman is the sole
income provider, what fraction are poor?
5.57 Approximately 1.5 percent of Americans are schizophrenic. Computerized axial
tomography (CAT) scans show brain atrophy in 30 percent of the people diagnosed as
schizophrenic and in only 2 percent of the persons diagnosed as not schizophrenic.36 In
the 1982 trial of John Hinckley for the attempted assassination of President Ronald
Reagan, the defense attorney tried to present evidence that a CAT scan of Hinckley had
shown brain atrophy, thereby indicating that Hinckley was schizophrenic. Consider the
hypothetical case of 10,000 randomly selected Americans who are given CAT scans, and
use a contingency table to calculate the fraction of those persons with brain atrophy who
are schizophrenic. Is a CAT scan showing brain atrophy persuasive evidence that the
person is schizophrenic?
5.58 The National Society of Professional Engineers used the following sample question to
promote their national junior-high-school math contest:37
Question: According to the Elvis Institute, 45% of Elvis sightings are made west of
the Mississippi, and 63% of sightings are made after 2 p.m. What are the odds of
spotting Elvis east of the Mississippi before 2 p.m.?
Explain how this group calculated their answer, 20.35 percent, and why it is wrong.
5.59 Here is a probability variant of three-card Monte. You are shown three cards: one black
on both sides, one white on both sides, and one white on one side and black on the other.
The three cards are dropped into an empty bag and you slide one out; it happens to be
black on the side that is showing. The operator of the game says, “We know that this is
not the double-white card. We also know that it could be either black or white on the
other side. I will bet $5 against your $4 that it is, in fact, black on the other side.” Can the
operator make money from such bets without cheating somehow?
5.60 Smith is learning to play squash. Andrabi has set up an April 1 match in which Smith will
be given a prize if he can win two consecutive games in a three-game match against
Andrabi and Ernst alternately, either Andrabi-Ernst-Andrabi or Ernst-Andrabi-Ernst.
Assume that Andrabi is a better player than Ernst and that Smith’s chances of winning a
game against either player are independent of the order in which the games are played
and the outcomes of other games. Which sequence should Smith choose and why?
Projects
For each of the following projects, type a report in ordinary English using clear, concise, and
persuasive prose. Any data that you collect for this project should be included as an appendix to
your report. Data used in your report should be presented clearly and effectively.
5.1 Use computer software to simulate 1,000 flips of a fair coin. Record the fraction of the
flips that were heads after 10, 100, and 1,000 flips. Repeat this experiment 100 times and
then use three histograms to summarize your results.
5.2 From ten people, elicit subjective probabilities of a Democrat winning the next
presidential election. How easy or difficult was it to determine each of these personal
probabilities? How compact or dispersed are their probabilities?
5.3 It used to be argued that the reason boy babies slightly outnumber girl babies is because
some families prefer boys and will continue to have children until a boy is born, but stop
having children once a boy is born. To explore the consequences of such preferences, use
coin flips to simulate the birth of a boy (heads) or girl (tails) for a family that will have
children until a boy is born, and then stop. For each imaginary family, flip the coin until a
heads appears and record the number of girls and boys in this family. Repeat this
experiment for 100 families. Use two histograms to display the number of boys and girls
in these 100 families. Now calculate the total number of boys and girls.
5.4 Example 5.6 describes the matched-birthdays paradox. On a sheet of paper, write the
names of the 12 months, leaving plenty of room to write up to 31 dates next to each
month. At your college dining hall, survey 50 people, recording the birthday of each on
your sheet of paper, and see if you get any matches.
5.5 The first paragraph of Example 5.4 poses a hypothetical question about the probability
that a lump is malignant in light of worrisome mammogram results. Type this
hypothetical scenario (all of the first paragraph, beginning with “In a routine
examination”) on a sheet of paper and ask 25 people to read over the description and then
give an estimate of the probability that the lump is malignant. Use a histogram to
summarize your results.
5.6 Exercise 5.59 describes a probability variant of three-card Monte. Play this game against
yourself 60 times, recording how often, if the front of the selected card is black, the back
is black too, and how often, if the front of the selected card is white, the back is white
too. Are these frequencies closer to 1/2 or 2/3? (DO NOT BET ON THIS GAME.)