Unit 1: Introduction to Probability
Contents
1.1. Sets
1.2. Probabilistic Models
1.3. Conditional Probability
1.4. Total Probability Theorem and Bayes' Rule
1.5. Independence
1.6. Counting
1.7. Summary and Discussion
Problems
In fact, the choices and actions of a rational person can reveal a lot about
the inner-held subjective probabilities, even if the person does not make conscious
use of probabilistic reasoning. Indeed, the last part of the earlier dialog was an
attempt to infer the nurse's beliefs in an indirect manner. Since the nurse was
willing to accept a one-for-one bet that the drug would work, we may infer
that the probability of success was judged to be at least 50%. Had the nurse
accepted the last proposed bet (two-for-one) , this would have indicated a success
probability of at least 2/3.
Rather than dwelling further on philosophical issues about the appropriateness of probabilistic reasoning, we will simply take it as a given that the theory
of probability is useful in a broad variety of contexts, including some where the
assumed probabilities only reflect subjective beliefs. There is a large body of
successful applications in science, engineering, medicine, management, etc., and
on the basis of this empirical evidence, probability theory is an extremely useful
tool.
Our main objective in this book is to develop the art of describing uncertainty in terms of probabilistic models, as well as the skill of probabilistic
reasoning. The first step, which is the subject of this chapter, is to describe
the generic structure of such models and their basic properties. The models we
consider assign probabilities to collections (sets) of possible outcomes. For this
reason, we must begin with a short review of set theory.
1.1 SETS
Probability makes extensive use of set operations, so let us introduce at the
outset the relevant notation and terminology.
A set is a collection of objects, which are the elements of the set. If S is a set and x is an element of S, we write x ∈ S. If x is not an element of S, we write x ∉ S. A set can have no elements, in which case it is called the empty set, denoted by ∅.
Sets can be specified in a variety of ways. If S contains a finite number of elements, say x1, x2, ..., xn, we write it as a list of the elements, in braces:

S = {x1, x2, ..., xn}.

For example, the set of possible outcomes of a die roll is {1, 2, 3, 4, 5, 6}, and the set of possible outcomes of a coin toss is {H, T}, where H stands for "heads" and T stands for "tails."
If S contains infinitely many elements x1, x2, ..., which can be enumerated in a list (so that there are as many elements as there are positive integers) we write

S = {x1, x2, ...},

and we say that S is countably infinite. For example, the set of even integers can be written as {0, 2, −2, 4, −4, ...}, and is countably infinite.
Alternatively, we can consider the set of all x that have a certain property P, and denote it by

{x | x satisfies P}.

(The symbol "|" is to be read as "such that.") For example, the set of even integers can be written as {k | k/2 is integer}. Similarly, the set of all scalars x in the interval [0, 1] can be written as {x | 0 ≤ x ≤ 1}. Note that the elements x of the latter set take a continuous range of values, and cannot be written down in a list (a proof is sketched in the end-of-chapter problems); such a set is said to be uncountable.
If every element of a set S is also an element of a set T, we say that S is a subset of T, and we write S ⊂ T or T ⊃ S. If S ⊂ T and T ⊂ S, the two sets are equal, and we write S = T. It is also expedient to introduce a universal set, denoted by Ω, which contains all objects that could conceivably be of interest in a particular context. Having specified the context in terms of a universal set Ω, we only consider sets S that are subsets of Ω.
Set Operations
The complement of a set S, with respect to the universe Ω, is the set {x ∈ Ω | x ∉ S} of all elements of Ω that do not belong to S, and is denoted by S^c. Note that Ω^c = ∅.
The union of two sets S and T is the set of all elements that belong to S
or T (or both), and is denoted by S U T. The intersection of two sets S and T
is the set of all elements that belong to both S and T, and is denoted by S n T.
Thus,
S ∪ T = {x | x ∈ S or x ∈ T},

and

S ∩ T = {x | x ∈ S and x ∈ T}.
In some cases, we will have to consider the union or the intersection of several, even infinitely many sets, defined in the obvious way. For example, if for every positive integer n, we are given a set Sn, then

∪_{n=1}^{∞} Sn = S1 ∪ S2 ∪ ··· = {x | x ∈ Sn for some n},

and

∩_{n=1}^{∞} Sn = S1 ∩ S2 ∩ ··· = {x | x ∈ Sn for all n}.
Sets and the associated operations are easy to visualize in terms of Venn diagrams, as in Fig. 1.1.

The Algebra of Sets

Set operations have several properties, which are elementary consequences of the definitions. Some examples are:
S ∪ T = T ∪ S,    S ∪ (T ∪ U) = (S ∪ T) ∪ U,
S ∩ (T ∪ U) = (S ∩ T) ∪ (S ∩ U),    S ∪ (T ∩ U) = (S ∪ T) ∩ (S ∪ U),
(S^c)^c = S,    S ∩ S^c = ∅,
S ∪ Ω = Ω,    S ∩ Ω = S.
Two particularly useful properties are given by De Morgan's laws, which state that

(∪n Sn)^c = ∩n S_n^c,    (∩n Sn)^c = ∪n S_n^c.

To establish the first law, suppose that x ∈ (∪n Sn)^c. Then x ∉ ∪n Sn, which implies that for every n we have x ∉ Sn, so that x belongs to every complement S_n^c, i.e., x ∈ ∩n S_n^c. The reverse inclusion is established by reversing this argument, and the argument for the second law is similar.
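These identities are easy to check mechanically on finite sets. The following sketch (ours, not part of the original text) uses Python's built-in set type to spot-check De Morgan's laws over a small universal set; the sample sets are arbitrary.

```python
from itertools import combinations

universe = set(range(1, 9))            # a small universal set, Omega

def complement(s):
    return universe - s

# A few arbitrary subsets of the universal set to test against.
samples = [set(), {1, 2, 3}, {2, 4, 6, 8}, {1, 5}, universe]

for S, T in combinations(samples, 2):
    # First law: (S u T)^c = S^c n T^c
    assert complement(S | T) == complement(S) & complement(T)
    # Second law: (S n T)^c = S^c u T^c
    assert complement(S & T) == complement(S) | complement(T)

print("De Morgan's laws hold on all sample pairs.")
```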
1.2 PROBABILISTIC MODELS

A probabilistic model is a mathematical description of an uncertain situation. It must be in accordance with a fundamental framework that we discuss in this section. Its two main ingredients are the sample space, the set of all possible outcomes of an experiment, and the probability law, which assigns to any event (a collection of possible outcomes) a number P(A) encoding our knowledge or belief about its "likelihood." The probability law must satisfy certain properties, the probability axioms, which we introduce shortly.
Example 1.1. Consider two alternative games, both involving ten successive coin tosses:

Game 1: We receive $1 each time a head comes up.

Game 2: We receive $1 for every coin toss, up to and including the first time a head comes up. Then, we receive $2 for every coin toss, up to the second time a head comes up. More generally, the dollar amount per toss is doubled each time a head comes up.
† Any collection of possible outcomes, including the entire sample space Ω and its complement, the empty set ∅, may qualify as an event. Strictly speaking, however, some sets have to be excluded. In particular, when dealing with probabilistic models involving an uncountably infinite sample space, there are certain unusual subsets for which one cannot associate meaningful probabilities. This is an intricate technical issue, involving the mathematics of measure theory. Fortunately, such pathological subsets do not arise in the problems considered in this text or in practice, and the issue can be safely ignored.
Figure 1.4: Two equivalent descriptions of the sample space of an experiment involving two rolls of a 4-sided die: a grid of outcome pairs (left) and a sequential tree (right). Each possible outcome corresponds to a leaf of the tree and is associated with the path from the root to that leaf. The shaded area on the left is the event {(1,4), (2,4), (3,4), (4,4)} that the result of the second roll is 4. That same event can be described by the set of leaves highlighted on the right. Note also that every node of the tree can be identified with an event, namely, the set of all leaves downstream from that node. For example, the node labeled by a 1 can be identified with the event {(1,1), (1,2), (1,3), (1,4)} that the result of the first roll is 1.
.
,In.,,,... ,..,,::.£>
C,,",'lIr-g n with an experiment.
we have on the
complete the ..., ............... �J"' .. "'" model: we now introd uce a law .
Intuitively, this specifies the "likelihood" of any outcome, or of any set of possible outcomes (an event, as we have called it earlier). More precisely, the probability law assigns to every event A a number P(A), called the probability of A, satisfying the following axioms.
Probability Axioms

1. (Nonnegativity) P(A) ≥ 0, for every event A.

2. (Additivity) If A and B are two disjoint events, then the probability of their union satisfies

P(A ∪ B) = P(A) + P(B).

More generally, if A1, A2, ... is a sequence of disjoint events, then the probability of their union satisfies P(A1 ∪ A2 ∪ ···) = P(A1) + P(A2) + ···.

3. (Normalization) The probability of the entire sample space Ω is equal to 1, that is, P(Ω) = 1.
As another example, consider three disjoint events A1, A2, and A3. We can use the additivity axiom for two disjoint events repeatedly, to obtain

P(A1 ∪ A2 ∪ A3) = P(A1 ∪ (A2 ∪ A3))
                = P(A1) + P(A2 ∪ A3)
                = P(A1) + P(A2) + P(A3).
Discrete Models
Here is an illustration of how to construct a probability law starting from some
common sense assumptions about a model.
Example 1.2. Consider an experiment involving a single coin toss. There are two possible outcomes, heads (H) and tails (T). The sample space is Ω = {H, T}, and the events are

{H, T}, {H}, {T}, ∅.

If the coin is fair, i.e., if we believe that heads and tails are "equally likely," we should assign equal probabilities to the two possible outcomes and specify that P({H}) = P({T}) = 0.5. The additivity axiom implies that

P({H, T}) = P({H}) + P({T}) = 1,

which is consistent with the normalization axiom. Thus, the probability law is given by

P({H, T}) = 1,  P({H}) = 0.5,  P({T}) = 0.5,  P(∅) = 0,

and satisfies all three axioms.
Consider another experiment involving three coin tosses. The outcome will now be a 3-long string of heads or tails. The sample space is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

We assume that each possible outcome has the same probability of 1/8. Let us construct a probability law that satisfies the three axioms. Consider, as an example, the event

A = {exactly 2 heads occur} = {HHT, HTH, THH}.

Using additivity, the probability of A is the sum of the probabilities of its elements:

P({HHT, HTH, THH}) = P({HHT}) + P({HTH}) + P({THH}) = 1/8 + 1/8 + 1/8 = 3/8.
Similarly, the probability of any event is equal to 1/8 times the number of possible
outcomes contained in the event. This defines a probability law that satisfies the
three axioms.
Note that we are using here the simpler notation P(si) to denote the probability of the event {si}, instead of the more precise P({si}). This convention will be used throughout the remainder of the book.
In the special case where the probabilities P(s1), ..., P(sn) are all the same (they must then all be equal to 1/n, by the normalization axiom), we obtain the following.

Discrete Uniform Probability Law
If the sample space consists of n possible outcomes which are equally likely (i.e., all single-element events have the same probability), then the probability of any event A is given by

P(A) = (number of elements of A) / n.
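As a quick illustration of the discrete uniform law, here is a minimal Python sketch (ours; the events are arbitrary examples, anticipating Example 1.3 below) that computes event probabilities by counting outcomes for two rolls of a 4-sided die.

```python
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))   # two rolls of a 4-sided die
n = len(outcomes)                                 # 16 equally likely outcomes

def prob(event):
    """P(A) = (number of elements of A) / n, with A given as a predicate."""
    return sum(1 for w in outcomes if event(w)) / n

print(prob(lambda w: w[1] == 4))                  # second roll is 4 -> 0.25
print(prob(lambda w: (w[0] + w[1]) % 2 == 0))     # sum is even     -> 0.5
```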
Let us provide a few more examples of sample spaces and probability laws.
Example 1.3. Consider the experiment of rolling a pair of 4-sided dice (cf. Fig. 1.4). We assume the dice are fair, and we interpret this assumption to mean that each of the sixteen possible outcomes [pairs (i, j), with i, j = 1, 2, 3, 4] has the same probability of 1/16. To calculate the probability of an event, we must count the number of elements of the event and divide by 16 (the total number of possible outcomes). For example, the probability that the sum of the rolls is even is 8/16 = 1/2.
In a continuous model such as that of Example 1.5 (in which Romeo and Juliet each arrive at a random time and probability is assigned as area), no event can have probability larger than 1. Therefore, the probability of any event that consists of a single element is zero. This probability law satisfies the three probability axioms. The event that Romeo and Juliet will meet is the shaded region in Fig. 1.5, and its probability is calculated to be 7/16.
Figure 1.5: The event M that Romeo and Juliet will arrive within 15 minutes of each other (cf. Example 1.5) is

M = {(x, y) | |x − y| ≤ 1/4, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1},

and is shaded in the figure. The area of M is 1 minus the area of the two unshaded triangles, or 1 − (3/4) · (3/4) = 7/16. Thus, the probability of meeting is 7/16.
These properties, and other similar ones, can be visualized and verified graphically using Venn diagrams, as in Fig. 1.6. Note that property (c) can be generalized as follows:

P(A1 ∪ A2 ∪ ··· ∪ An) ≤ Σ_{i=1}^{n} P(Ai).
To see this, we apply property (c) to the sets A1 and A2 ∪ ··· ∪ An, to obtain

P(A1 ∪ A2 ∪ ··· ∪ An) ≤ P(A1) + P(A2 ∪ ··· ∪ An),

and then repeat the argument on A2 and A3 ∪ ··· ∪ An, etc.
Constructing a probabilistic model involves a trade-off between accuracy, simplicity, and tractability. Sometimes, a model is chosen on the basis of historical data or past outcomes of similar experiments, using statistical inference methods, which will be discussed in Chapter 9.
If A ⊂ B, we can express B as the union of the disjoint events A and A^c ∩ B, so that

P(B) = P(A) + P(A^c ∩ B) ≥ P(A),

where the inequality follows from the nonnegativity axiom, and verifies property (a).

For property (b), we express the events A ∪ B and B as unions of disjoint events:

A ∪ B = A ∪ (A^c ∩ B),    B = (A ∩ B) ∪ (A^c ∩ B).

By the additivity axiom,

P(A ∪ B) = P(A) + P(A^c ∩ B),    P(B) = P(A ∩ B) + P(A^c ∩ B).

Subtracting the second equality from the first and rearranging terms, we obtain property (b). Using also the nonnegativity of P(A ∩ B), we obtain property (c):

P(A ∪ B) ≤ P(A) + P(B).

For property (d), we note that

A ∪ B ∪ C = A ∪ (A^c ∩ B) ∪ (A^c ∩ B^c ∩ C),

so that property (d) follows as a consequence of the additivity axiom.
• 18th century. Jacob Bernoulli studies repeated coin tossing and introduces the first law of large numbers, which lays a foundation for linking theoretical probability concepts and empirical fact. Several mathematicians, such as Daniel Bernoulli, Leibnitz, Bayes, and Lagrange, make important contributions to probability theory and its use in analyzing real-world phenomena. De Moivre introduces the normal distribution and proves the first form of the central limit theorem.
1.3 CONDITIONAL PROBABILITY

Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information. When all outcomes of an experiment are finitely many and equally likely (as with the roll of a fair die), this reasoning reduces to counting:

P(A | B) = (number of elements of A ∩ B) / (number of elements of B).

Generalizing the argument, we introduce the following definition of conditional probability:

P(A | B) = P(A ∩ B) / P(B),
where we assume that P (B) > 0; the conditional probability is undefined if the
conditioning event has zero probability. In words, out of the total probability of
the elements of B, P (A I B) is the fraction that is assigned to possible outcomes
that also belong to A.
For a fixed event B, it can be verified that the conditional probabilities P(A | B) form a legitimate probability law that satisfies the three axioms. Indeed, nonnegativity is clear. Furthermore,

P(Ω | B) = P(Ω ∩ B) / P(B) = P(B) / P(B) = 1,
and the normalization axiom is also satisfied. To verify the additivity axiom, we write for any two disjoint events A1 and A2,

P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B) / P(B)
              = P((A1 ∩ B) ∪ (A2 ∩ B)) / P(B)
              = (P(A1 ∩ B) + P(A2 ∩ B)) / P(B)
              = P(A1 ∩ B)/P(B) + P(A2 ∩ B)/P(B)
              = P(A1 | B) + P(A2 | B),

where for the third equality, we used the fact that A1 ∩ B and A2 ∩ B are
disjoint sets, and the additivity axiom for the (unconditional) probability law.
The argument for a countable collection of disjoint sets is similar.
Since conditional probabilities constitute a legitimate probability law, all
general properties of probability laws remain valid. For example, a fact such as
P(A ∪ C) ≤ P(A) + P(C) translates to the new fact

P(A ∪ C | B) ≤ P(A | B) + P(C | B).
Let us also note that since we have P(B | B) = P(B)/P(B) = 1, all of the conditional probability is concentrated on B. Thus, we might as well discard all possible outcomes outside B and treat the conditional probabilities as a probability law defined on the new universe B.
Let us summarize the conclusions reached so far.
Properties of Conditional Probability

• The conditional probability of an event A, given an event B with P(B) > 0, is defined by P(A | B) = P(A ∩ B)/P(B), and specifies a new (conditional) probability law on the same sample space. In particular, all properties of probability laws remain valid for conditional probability laws.

• Conditional probabilities can also be viewed as a probability law on a new universe B, because all of the conditional probability is concentrated on B.

• If the possible outcomes are finitely many and equally likely, then

P(A | B) = (number of elements of A ∩ B) / (number of elements of B).
Example 1.6. We toss a fair coin three successive times. We wish to find the conditional probability P(A | B) when A and B are the events

A = {more heads than tails come up},    B = {1st toss is a head}.

The sample space consists of eight sequences,

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

which we assume to be equally likely. The event B consists of the four elements HHH, HHT, HTH, HTT, so its probability is

P(B) = 4/8.

The event A ∩ B consists of the three elements HHH, HHT, HTH, so its probability is

P(A ∩ B) = 3/8.

Thus, the conditional probability P(A | B) is

P(A | B) = P(A ∩ B) / P(B) = (3/8) / (4/8) = 3/4.
Because all possible outcomes are equally likely here, we can also compute P(A | B) using a shortcut. We can bypass the calculation of P(B) and P(A ∩ B), and simply divide the number of elements shared by A and B (which is 3) by the number of elements of B (which is 4), to obtain the same result 3/4.
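The counting shortcut is easy to mirror in code. The following sketch (ours) enumerates the eight equally likely 3-toss sequences of Example 1.6 and computes P(A | B) as |A ∩ B| / |B|.

```python
from itertools import product

outcomes = [''.join(t) for t in product('HT', repeat=3)]   # 8 sequences

A = {w for w in outcomes if w.count('H') > w.count('T')}   # more heads than tails
B = {w for w in outcomes if w[0] == 'H'}                   # 1st toss is a head

print(len(A & B) / len(B))    # 3/4 = 0.75, matching P(A | B) above
```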
Example 1.7. A fair 4-sided die is rolled twice and we assume that all sixteen possible outcomes are equally likely. Let X and Y be the result of the 1st and the 2nd roll, respectively. We wish to determine the conditional probability P(A | B), where

A = {max(X, Y) = m},    B = {min(X, Y) = 2},

and m takes each of the values 1, 2, 3, 4. As in the preceding example, we can compute P(A ∩ B) and P(B) by counting the numbers of elements of A ∩ B and B, respectively, and dividing by 16. Alternatively, we can directly divide the number of elements of A ∩ B by the number of elements of B. We obtain

P({max(X, Y) = m} | B) = 2/5, if m = 3 or m = 4;  1/5, if m = 2;  0, if m = 1.
Example 1.8. A conservative design team, call it C, and an innovative design team, call it N, are asked to separately design a new product within a month. From past experience we know that:

(a) The probability that team C is successful is 2/3.

(b) The probability that team N is successful is 1/2.

(c) The probability that at least one team is successful is 3/4.
Assuming that exactly one successful design is produced, what is the probability
that it was designed by team N?
There are four possible outcomes here, corresponding to the four combinations
of success and failure of the two teams:
SS: both succeed, FF: both fail,
SF: C succeeds, N fails, FS: C fails, N succeeds.
We were given that the probabilities of these outcomes satisfy

P(SS) + P(SF) = 2/3,  P(SS) + P(FS) = 1/2,  P(SS) + P(SF) + P(FS) = 3/4.

From these relations we obtain P(FS) = 3/4 − 2/3 = 1/12 and P(SF) = 3/4 − 1/2 = 1/4. The desired conditional probability is then

P(FS | {SF, FS}) = (1/12) / (1/4 + 1/12) = 1/4.
Example 1.9. Radar Detection. If an aircraft is present in a certain area, a radar correctly registers its presence with probability 0.99. If it is not present, the radar falsely registers an aircraft presence with probability 0.10. We assume that an aircraft is present with probability 0.05. What is the probability of false alarm (a false indication of aircraft presence), and the probability of missed detection (nothing registers, even though an aircraft is present)?

A sequential representation of the experiment is appropriate here. Let

A = {an aircraft is present},
B = {the radar generates an alarm},

and consider also their complements,

A^c = {an aircraft is not present},
B^c = {the radar does not generate an alarm}.

The given probabilities are recorded along the corresponding branches of the tree describing the sample space, as shown in Fig. 1.9. Each possible outcome corresponds to a leaf of the tree, and its probability is equal to the product of the probabilities associated with the branches in a path from the root to the corresponding leaf. The desired probabilities are then

P(false alarm) = P(A^c ∩ B) = P(A^c)P(B | A^c) = 0.95 · 0.10 = 0.095,

P(missed detection) = P(A ∩ B^c) = P(A)P(B^c | A) = 0.05 · 0.01 = 0.0005.
Extending the preceding example, we have a general rule for calculating various probabilities in conjunction with a tree-based sequential description of an experiment. In particular:

(a) We set up the tree so that an event of interest is associated with a leaf. We view the occurrence of the event as a sequence of steps, namely, the traversals of the branches along the path from the root to the leaf.

(b) We record the conditional probabilities associated with the branches of the tree.

(c) We obtain the probability of a leaf by multiplying the probabilities recorded along the corresponding path of the tree.

In mathematical terms, we are dealing with an event A which occurs if and only if each one of several events A1, ..., An has occurred, i.e., A = A1 ∩ A2 ∩ ··· ∩ An. The occurrence of A is viewed as an occurrence of A1, followed by the occurrence of A2, then of A3, etc., and it is visualized as a path on the tree with n branches, corresponding to the events A1, ..., An. The probability of A is given by the following rule (see also Fig. 1.10).
Multiplication Rule
Assuming that all of the conditioning events have positive probability, we have

P(∩_{i=1}^{n} Ai) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) ··· P(An | A1 ∩ A2 ∩ ··· ∩ A_{n−1}).

Figure 1.10: Visualization of the multiplication rule. The intersection event A = A1 ∩ A2 ∩ ··· ∩ An corresponds to a path on the tree of a sequential description of the experiment, with the branches of the path associated with the events A1, ..., An and labeled by the corresponding conditional probabilities. The final node of the path corresponds to the intersection event A, and its probability is obtained by multiplying the conditional probabilities recorded along the branches of the path:

P(A1 ∩ A2 ∩ ··· ∩ An) = P(A1) P(A2 | A1) ··· P(An | A1 ∩ A2 ∩ ··· ∩ A_{n−1}).
For the case of just two events, A1 and A2, the multiplication rule is simply the definition of conditional probability.
Example 1.10. Three cards are drawn from an ordinary 52-card deck without replacement (drawn cards are not placed back in the deck). We wish to find the probability that none of the three cards is a heart. We assume that at each step, each one of the remaining cards is equally likely to be picked. By symmetry, this implies that every triplet of cards is equally likely to be drawn. A cumbersome approach, which we will not use, is to count the number of all card triplets that do not include a heart, and divide it by the number of all possible card triplets. Instead, we use a sequential description of the experiment in conjunction with the multiplication rule (cf. Fig. 1.11).
Define the events

Ai = {the ith card is not a heart},    i = 1, 2, 3.

We will calculate P(A1 ∩ A2 ∩ A3), the probability that none of the three cards is a heart, using the multiplication rule

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2).
We have

P(A1) = 39/52,

since there are 39 cards that are not hearts in the 52-card deck. Given that the first card is not a heart, we are left with 51 cards, 38 of which are not hearts, and

P(A2 | A1) = 38/51.

Finally, given that the first two cards drawn are not hearts, there are 37 cards which are not hearts in the remaining 50-card deck, and

P(A3 | A1 ∩ A2) = 37/50.
These probabilities are recorded along the corresponding branches of the tree describing the sample space, as shown in Fig. 1.11. The desired probability is now obtained by multiplying the probabilities recorded along the corresponding path of the tree:

P(A1 ∩ A2 ∩ A3) = (39/52) · (38/51) · (37/50).
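For a sanity check of the multiplication rule, the following sketch (ours) computes the sequential product with exact fractions and confirms it against the brute-force counting approach that the example deliberately avoids; the convention that cards 0 through 12 are the hearts is an arbitrary labeling.

```python
from fractions import Fraction
from itertools import combinations

# Sequential computation via the multiplication rule.
p_sequential = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)

# Brute-force check: label cards 0..51 and let 0..12 be the hearts.
deck = range(52)
favorable = sum(1 for triple in combinations(deck, 3)
                if all(card > 12 for card in triple))
total = sum(1 for _ in combinations(deck, 3))        # 22100 triplets

assert p_sequential == Fraction(favorable, total)
print(p_sequential, float(p_sequential))             # 703/1700, about 0.41
```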
Note that once the probabilities are recorded along the tree, the probability of several other events can be similarly calculated. For example,

P(1st is not a heart and 2nd is a heart) = (39/52) · (13/51),

P(1st two are not hearts and 3rd is a heart) = (39/52) · (38/51) · (13/50).
Example 1.11. A class consisting of 4 graduate and 12 undergraduate students is randomly divided into four groups of 4. What is the probability that each group includes a graduate student? Denote the four graduate students by 1, 2, 3, 4, and consider the events

A1 = {students 1 and 2 are in different groups},
A2 = {students 1, 2, and 3 are in different groups},
A3 = {students 1, 2, 3, and 4 are in different groups}.

We will calculate P(A3) using the multiplication rule:

P(A3) = P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2).

We have

P(A1) = 12/15,

since there are 12 student slots in groups other than the one of student 1, and there are 15 student slots overall, excluding student 1. Similarly,

P(A2 | A1) = 8/14,

since there are 8 student slots in groups other than those of students 1 and 2, and there are 14 student slots, excluding students 1 and 2. Also,

P(A3 | A1 ∩ A2) = 4/13,

so the desired probability is

P(A3) = (12/15) · (8/14) · (4/13),

as visualized in Fig. 1.12 (sequential description of the experiment in the student problem of Example 1.11).
Example 1.12. The Monty Hall Problem. This is a much discussed puzzle, based on an old American game show. You are told that a prize is equally likely to be found behind any one of three closed doors in front of you. You point to one of the doors. A friend opens for you one of the remaining two doors, after making sure that the prize is not behind it. At this point, you can stick to your initial choice, or switch to the other unopened door. You win the prize if it lies behind your final choice of a door. Consider the following strategies:

(a) Stick to your initial choice.

(b) Switch to the other unopened door.

(c) You first point to door 1. If door 2 is opened, you do not switch. If door 3 is opened, you switch.

Which is the best strategy? To answer the question, let us calculate the probability of winning for each of the three strategies.

Under the strategy of no switching, i.e., strategy (a), your initial choice determines whether you win or not, and the probability of winning is 1/3, because the prize is equally likely to be behind each door.

Under the strategy of switching, i.e., strategy (b), if the prize is behind the door you initially chose (probability 1/3), you lose once you switch. If the prize is behind one of the other doors (probability 2/3), then since
another door without a prize has been opened for you, you will get to the winning
door once you switch. Thus, the probability of winning is now 2/3, so (b) is a better strategy than (a).

Consider now strategy (c). Under this strategy, there is insufficient information for determining the probability of winning. The answer depends on the way that your friend chooses which door to open. Let us consider two possibilities.

Suppose that if the prize is behind door 1, your friend always chooses to open door 2. (If the prize is behind door 2 or 3, your friend has no choice.) If the prize is behind door 1, your friend opens door 2, you do not switch, and you win. If the prize is behind door 2, your friend opens door 3, you switch, and you win. If the prize is behind door 3, your friend opens door 2, you do not switch, and you lose. Thus, the probability of winning is 2/3, so strategy (c) in this case is as good as strategy (b).

Suppose now that if the prize is behind door 1, your friend is equally likely to open either door 2 or 3. If the prize is behind door 1 (probability 1/3), and if your friend opens door 2 (probability 1/2), you do not switch and you win (probability 1/6). But if your friend opens door 3, you switch and you lose. If the prize is behind door 2, your friend opens door 3, you switch, and you win (probability 1/3). If the prize is behind door 3, your friend opens door 2, you do not switch and you lose. Thus, the probability of winning is 1/6 + 1/3 = 1/2, so strategy (c) in this case is inferior to strategy (b).
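A simulation makes the 1/3 versus 2/3 comparison tangible. The sketch below (ours) assumes the friend opens a no-prize door uniformly at random when there is a choice, which matches the second variant discussed above; strategy (c) could be simulated similarly by fixing the friend's tie-breaking rule.

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize, choice = random.randrange(3), random.randrange(3)
        # The friend opens a door that is neither your choice nor the prize,
        # picking uniformly at random when two such doors are available.
        opened = random.choice([d for d in range(3)
                                if d != choice and d != prize])
        if switch:
            choice = next(d for d in range(3) if d not in (choice, opened))
        wins += (choice == prize)
    return wins / trials

print("stick: ", play(switch=False))    # close to 1/3
print("switch:", play(switch=True))     # close to 2/3
```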
1.4 TOTAL PROBABILITY THEOREM AND BAYES' RULE

In this section, we explore some applications of conditional probability. We start with the following theorem, which is often useful for computing the probabilities of various events, using a "divide-and-conquer" approach.

Total Probability Theorem
Let A1, ..., An be disjoint events that form a partition of the sample space (each possible outcome is included in exactly one of the events A1, ..., An) and assume that P(Ai) > 0, for all i = 1, ..., n. Then, for any event B, we have

P(B) = P(A1 ∩ B) + ··· + P(An ∩ B)
     = P(A1)P(B | A1) + ··· + P(An)P(B | An).
The theorem is visualized and proved in Fig. 1.13. Intuitively, we are partitioning the sample space into a number of scenarios (events) Ai. Then, the probability that B occurs is a weighted average of its conditional probability under each scenario, where each scenario is weighted according to its (unconditional) probability. One of the uses of the theorem is to compute the probability of various events B for which the conditional probabilities P(B | Ai) are known or easy to derive.
Figure 1.13: Visualization and verification of the total probability theorem. The events A1, ..., An form a partition of the sample space, so the event B can be decomposed into the disjoint union

B = (A1 ∩ B) ∪ ··· ∪ (An ∩ B),

and by the additivity axiom,

P(B) = P(A1 ∩ B) + ··· + P(An ∩ B).

Using the definition of conditional probability, we have

P(Ai ∩ B) = P(Ai)P(B | Ai),

and the total probability theorem follows:

P(B) = P(A1)P(B | A1) + ··· + P(An)P(B | An).

For an alternative view, consider an equivalent sequential model, as shown on the right. The probability of the leaf Ai ∩ B is the product P(Ai)P(B | Ai) of the probabilities along the path leading to that leaf. The event B consists of the three highlighted leaves, and P(B) is obtained by adding their probabilities.
Example 1.13. You enter a chess tournament where your probability of winning a game is 0.3 against half the players (call them type 1), 0.4 against a quarter of the players (call them type 2), and 0.5 against the remaining quarter of the players (call them type 3). You play a game against a randomly chosen opponent. What is the probability of winning?

Let Ai be the event of playing with an opponent of type i. We have

P(A1) = 0.5,  P(A2) = 0.25,  P(A3) = 0.25.

Also, let B be the event of winning. We have

P(B | A1) = 0.3,  P(B | A2) = 0.4,  P(B | A3) = 0.5.

Thus, by the total probability theorem, the probability of winning is

P(B) = P(A1)P(B | A1) + P(A2)P(B | A2) + P(A3)P(B | A3)
     = 0.5 · 0.3 + 0.25 · 0.4 + 0.25 · 0.5
     = 0.375.
Example 1.14. You roll a fair four-sided die. If the result is 1 or 2, you roll once more; otherwise, you stop. What is the probability that the sum total of your rolls is at least 4?
Let Ai be the event that the result of the first roll is i, and note that P(Ai) = 1/4 for each i. Let B be the event that the sum total is at least 4. Given the event A1, the sum total will be at least 4 if the second roll results in 3 or 4, which happens with probability 1/2. Similarly, given the event A2, the sum total will be at least 4 if the second roll results in 2, 3, or 4, which happens with probability 3/4. Also,
given the event A3, you stop and the sum total remains below 4, while given the event A4, you stop and the sum total is exactly 4. Therefore,

P(B | A1) = 1/2,  P(B | A2) = 3/4,  P(B | A3) = 0,  P(B | A4) = 1,

and, by the total probability theorem,

P(B) = (1/4) · (1/2) + (1/4) · (3/4) + (1/4) · 0 + (1/4) · 1 = 9/16.
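The divide-and-conquer computation of Example 1.14 can be checked by enumerating the experiment directly, as in the following sketch (ours), which uses exact fractions.

```python
from fractions import Fraction

p = Fraction(0)
for first in range(1, 5):              # fair 4-sided die, each face 1/4
    if first <= 2:                     # result 1 or 2: roll once more
        for second in range(1, 5):     # each pair has probability 1/16
            if first + second >= 4:
                p += Fraction(1, 16)
    elif first >= 4:                   # result 3 or 4: stop
        p += Fraction(1, 4)            # only a 4 already reaches the target

print(p)    # 9/16
```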
Example 1.15. Alice is taking a probability class and at the end of each week she can be either up-to-date or she may have fallen behind. If she is up-to-date in a given week, the probability that she will be up-to-date (or behind) in the next week is 0.8 (or 0.2, respectively). If she is behind in a given week, the probability that she will be up-to-date (or behind) in the next week is 0.4 (or 0.6, respectively). Alice is (by default) up-to-date when she starts the class. What is the probability that she is up-to-date after three weeks?
Let Ui and Bi be the events that Alice is up-to-date or behind, respectively,
after i weeks. According to the total probability theorem, the desired probability P(U3) is given by

P(U3) = P(U2)P(U3 | U2) + P(B2)P(U3 | B2) = P(U2) · 0.8 + P(B2) · 0.4.

The probabilities P(U2) and P(B2) can also be calculated using the total probability theorem:

P(U2) = P(U1)P(U2 | U1) + P(B1)P(U2 | B1) = P(U1) · 0.8 + P(B1) · 0.4,
P(B2) = P(U1)P(B2 | U1) + P(B1)P(B2 | B1) = P(U1) · 0.2 + P(B1) · 0.6.

Finally, since Alice starts the class up-to-date, P(U1) = 0.8 and P(B1) = 0.2, so that P(U2) = 0.72, P(B2) = 0.28, and P(U3) = 0.72 · 0.8 + 0.28 · 0.4 = 0.688.
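The week-by-week computation is naturally expressed as a recursion. A minimal sketch (ours), assuming the transition probabilities given above:

```python
def up_to_date(weeks):
    u = 1.0                            # Alice starts up-to-date
    for _ in range(weeks):
        u = u * 0.8 + (1 - u) * 0.4    # total probability theorem, per week
    return u

print(up_to_date(3))                   # ~ 0.688
```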
Bayes' Rule
Let A1, A2, ..., An be disjoint events that form a partition of the sample space, and assume that P(Ai) > 0, for all i. Then, for any event B such that P(B) > 0, we have

P(Ai | B) = P(Ai)P(B | Ai) / P(B)
          = P(Ai)P(B | Ai) / (P(A1)P(B | A1) + ··· + P(An)P(B | An)).
Figure 1.14: An example of the inference context that is implicit in Bayes' rule. We observe a shade in a person's X-ray (this is event B, the "effect") and we wish to assess three mutually exclusive and collectively exhaustive potential causes (events A1, A2, and A3). Given that we see a shade (event B occurs), Bayes' rule gives the conditional probabilities of the various causes as

P(Ai | B) = P(Ai)P(B | Ai) / (P(A1)P(B | A1) + P(A2)P(B | A2) + P(A3)P(B | A3)),    i = 1, 2, 3.

For an alternative view, consider an equivalent sequential model. The probability P(A1 | B) is the probability of the first highlighted leaf, which is P(A1 ∩ B), divided by the total probability of the highlighted leaves, which is P(B).
To verify Bayes' rule, note that by the definition of conditional probability, we have

P(Ai ∩ B) = P(Ai)P(B | Ai) = P(Ai | B)P(B),

so that P(Ai | B) = P(Ai)P(B | Ai)/P(B), and the rule follows by applying the total probability theorem to P(B).
Bayes' rule is often used for inference. There are a number of "causes" that may result in a certain "effect." We observe the effect, and we wish to infer the cause. The events A1, ..., An are associated with the causes, and the event B represents the effect. Given that the effect B has been observed, we wish to evaluate the probability P(Ai | B) that the cause Ai is present. We refer to P(Ai | B) as the posterior probability of event Ai given the information, to be distinguished from P(Ai), which we call the prior probability.
Example 1.16. Let us return to the radar detection problem of Example 1.9 and Fig. 1.9. Let

A = {an aircraft is present},
B = {the radar generates an alarm}.

We are given that

P(A) = 0.05,  P(B | A) = 0.99,  P(B | A^c) = 0.1.

Applying Bayes' rule, with A1 = A and A2 = A^c, we obtain

P(aircraft present | alarm) = P(A | B)
  = P(A)P(B | A) / (P(A)P(B | A) + P(A^c)P(B | A^c))
  = (0.05 · 0.99) / (0.05 · 0.99 + 0.95 · 0.1)
  ≈ 0.34.

Example 1.17. Let us return to the chess problem of Example 1.13. Here, Ai is the event of getting an opponent of type i, and B is the event of winning.
Suppose that you win. What is the probability P(A1 | B) that you had an opponent of type 1?

Using Bayes' rule, we have

P(A1 | B) = P(A1)P(B | A1) / (P(A1)P(B | A1) + P(A2)P(B | A2) + P(A3)P(B | A3))
          = (0.5 · 0.3) / (0.5 · 0.3 + 0.25 · 0.4 + 0.25 · 0.5)
          = 0.4.
Example 1.18. The False-Positive Puzzle. A test for a certain rare disease is assumed to be correct 95% of the time: if a person has the disease, the test results are positive with probability 0.95, and if the person does not have the disease, the test results are negative with probability 0.95. A randomly chosen person drawn from a certain population has probability 0.001 of having the disease. Given that the person just tested positive, what is the probability of having the disease?
If A is the event that the person has the disease, and B is the event that the test results are positive, the desired probability, P(A | B), is

P(A | B) = P(A)P(B | A) / (P(A)P(B | A) + P(A^c)P(B | A^c))
         = (0.001 · 0.95) / (0.001 · 0.95 + 0.999 · 0.05)
         = 0.0187.
Note that even though the test was assumed to be fairly accurate, a person who has tested positive is still very unlikely (less than 2%) to have the disease. According to The Economist (February 20th, 1999), 80% of those questioned at a leading American hospital substantially missed the correct answer to a question of this type; most of them thought that the probability that the person has the disease is 0.95!
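The arithmetic of this example generalizes to any two-hypothesis test. A minimal sketch (ours; the function name is hypothetical):

```python
def posterior(prior, p_pos_given_disease, p_pos_given_no_disease):
    # Bayes' rule with two causes: disease present or absent.
    evidence = (prior * p_pos_given_disease
                + (1 - prior) * p_pos_given_no_disease)
    return prior * p_pos_given_disease / evidence

print(posterior(0.001, 0.95, 0.05))    # ~ 0.0187
```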
1.5 INDEPENDENCE
We have introduced the conditional probability P(A | B) to capture the partial information that event B provides about event A. An interesting and important special case arises when the occurrence of B provides no such information and does not alter the probability that A has occurred, i.e.,

P(A | B) = P(A).
When the above equality holds, we say that A is independent of B. Note that by the definition P(A | B) = P(A ∩ B)/P(B), this is equivalent to

P(A ∩ B) = P(A)P(B).

For example, an event A and its complement A^c are not independent [unless P(A) = 0 or P(A) = 1], since knowledge that A has occurred provides precise information about whether A^c has occurred.
Example 1.19. Consider an experiment involving two successive rolls of a 4-sided die in which all 16 possible outcomes are equally likely and have probability 1/16.

(a) Are the events

Ai = {1st roll results in i},    Bj = {2nd roll results in j}

independent? We have

P(Ai ∩ Bj) = P(the outcome of the two rolls is (i, j)) = 1/16,

P(Ai) = (number of elements of Ai) / (total number of possible outcomes) = 4/16,

P(Bj) = (number of elements of Bj) / (total number of possible outcomes) = 4/16.

We observe that P(Ai ∩ Bj) = P(Ai)P(Bj), and the independence of Ai and Bj is verified.
(b) Are the events

A = {1st roll is a 1},    B = {sum of the two rolls is a 5}

independent? We have

P(A ∩ B) = P(the result of the two rolls is (1,4)) = 1/16,

and also

P(A) = (number of elements of A) / (total number of possible outcomes) = 4/16.

The event B consists of the outcomes (1,4), (2,3), (3,2), and (4,1), and

P(B) = (number of elements of B) / (total number of possible outcomes) = 4/16.

Thus, we see that P(A ∩ B) = P(A)P(B), and the events A and B are independent.
(c) Are the events

A = {maximum of the two rolls is 2},    B = {minimum of the two rolls is 2}

independent? We have

P(A ∩ B) = P(the result of the two rolls is (2,2)) = 1/16,

and also

P(A) = (number of elements of A) / (total number of possible outcomes) = 3/16,

P(B) = (number of elements of B) / (total number of possible outcomes) = 5/16.

We have P(A)P(B) = 15/(16)^2, so that P(A ∩ B) ≠ P(A)P(B), and A and B are not independent.
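All three parts of Example 1.19 can be verified by enumeration. The following sketch (ours) represents events as sets of outcome pairs and tests the defining condition P(E ∩ F) = P(E)P(F) with exact fractions.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 5), repeat=2))    # 16 equally likely pairs

def P(event):
    return Fraction(len(event), len(outcomes))

def independent(E, F):
    return P(E & F) == P(E) * P(F)

A1   = {w for w in outcomes if w[0] == 1}         # 1st roll is 1
B4   = {w for w in outcomes if w[1] == 4}         # 2nd roll is 4
SUM5 = {w for w in outcomes if sum(w) == 5}       # sum of rolls is 5
MAX2 = {w for w in outcomes if max(w) == 2}       # maximum is 2
MIN2 = {w for w in outcomes if min(w) == 2}       # minimum is 2

print(independent(A1, B4))        # True,  as in part (a)
print(independent(A1, SUM5))      # True,  as in part (b)
print(independent(MAX2, MIN2))    # False, as in part (c)
```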
Conditional Independence

We noted earlier that the conditional probabilities of events, conditioned on a particular event, form a legitimate probability law. We can thus talk about independence of various events with respect to this conditional law. In particular, given an event C, the events A and B are called conditionally independent if

P(A ∩ B | C) = P(A | C)P(B | C).
To derive an alternative characterization of conditional independence, we use the definition of the conditional probability and the multiplication rule, to write

P(A ∩ B | C) = P(A ∩ B ∩ C) / P(C)
             = P(C)P(B | C)P(A | B ∩ C) / P(C)
             = P(B | C)P(A | B ∩ C).

We now compare the preceding two expressions, and after eliminating the common factor P(B | C), assumed nonzero, we see that conditional independence is the same as the condition

P(A | B ∩ C) = P(A | C).
In words, this relation states that if C is known to have occurred, the additional
knowledge that B also occurred does not change the probability of A.
Interestingly, independence of two events A and B with respect to the unconditional probability law does not imply conditional independence, and vice versa, as illustrated by the next two examples.
Example 1.20. Consider two independent fair coin tosses, in which all four possible outcomes are equally likely. Let

H1 = {1st toss is a head},
H2 = {2nd toss is a head},
D = {the two tosses have different results}.
The events H1 and H2 are (unconditionally) independent. But

P(H1 | D) = 1/2,    P(H2 | D) = 1/2,    P(H1 ∩ H2 | D) = 0,

so that P(H1 ∩ H2 | D) ≠ P(H1 | D)P(H2 | D), and H1 and H2 are not conditionally independent given D.
Example 1.21. There are two coins, a blue and a red one. We choose one of the two at random, each being chosen with probability 1/2, and proceed with two independent tosses. The coins are biased: with the blue coin, the probability of heads in any given toss is 0.99, whereas for the red coin it is 0.01.

Let B be the event that the blue coin was selected. Let also Hi be the event that the ith toss resulted in heads. Given the choice of a coin, the events H1 and H2 are independent, because of our assumption of independent tosses. Thus,

P(H1 ∩ H2 | B) = P(H1 | B)P(H2 | B) = 0.99 · 0.99.

On the other hand, the events H1 and H2 are not independent. Intuitively, if we are told that the first toss resulted in heads, this leads us to suspect that the blue coin was selected, in which case, we expect the second toss to also result in heads.
Mathematically, we use the total probability theorem to obtain

P(H1) = P(B)P(H1 | B) + P(B^c)P(H1 | B^c) = (1/2) · 0.99 + (1/2) · 0.01 = 1/2,

and similarly P(H2) = 1/2, while

P(H1 ∩ H2) = P(B)P(H1 ∩ H2 | B) + P(B^c)P(H1 ∩ H2 | B^c)
           = (1/2) · 0.99² + (1/2) · 0.01² ≈ 0.49.

Thus, P(H1 ∩ H2) ≠ P(H1)P(H2), and the events H1 and H2 are dependent, even though they are conditionally independent given B.
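The numbers in Example 1.21 are easy to reproduce. A minimal sketch (ours), assuming the 0.99/0.01 biases and the fair coin choice stated above:

```python
p_blue = 0.5                            # probability of selecting the blue coin
heads = {'blue': 0.99, 'red': 0.01}     # per-toss heads probabilities

p_h1   = p_blue * heads['blue']    + (1 - p_blue) * heads['red']     # 0.5
p_h1h2 = p_blue * heads['blue']**2 + (1 - p_blue) * heads['red']**2  # ~0.49

print(p_h1 * p_h1, p_h1h2)   # 0.25 vs ~0.49: H1 and H2 are dependent
```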
We now summarize.
Independence

• Two events A and B are said to be independent if

P(A ∩ B) = P(A)P(B).

If in addition, P(B) > 0, independence is equivalent to the condition

P(A | B) = P(A).

• If A and B are independent, so are A and B^c.

• Two events A and B are said to be conditionally independent, given another event C with P(C) > 0, if

P(A ∩ B | C) = P(A | C)P(B | C).

If in addition, P(B ∩ C) > 0, conditional independence is equivalent to the condition

P(A | B ∩ C) = P(A | C).

• Independence does not imply conditional independence, and vice versa.
Example 1.22. Pairwise Independence does not Imply Independence. Consider two independent fair coin tosses, and the events H1, H2, and D defined in Example 1.20. The events H1 and D are independent, since

P(H1 ∩ D) = P({HT}) = 1/4 = (1/2) · (1/2) = P(H1)P(D).

Similarly, H2 and D are independent. On the other hand, we have

P(H1 ∩ H2 ∩ D) = 0 ≠ (1/2) · (1/2) · (1/2) = P(H1)P(H2)P(D),

so the three events H1, H2, and D are pairwise independent, but not independent.
Example 1.23. The Equality P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3) is not Enough for Independence. Consider two independent rolls of a fair six-sided die, and the following events:

A = {1st roll is 1, 2, or 3},
B = {1st roll is 3, 4, or 5},
C = {the sum of the two rolls is 9}.
We have

P(A ∩ B) = 1/6 ≠ (1/2) · (1/2) = P(A)P(B),

P(A ∩ C) = 1/36 ≠ (1/2) · (4/36) = P(A)P(C),

P(B ∩ C) = 1/12 ≠ (1/2) · (4/36) = P(B)P(C).

Thus no two of the three events are independent, so the three events A, B, and C cannot be independent. On the other hand,

P(A ∩ B ∩ C) = 1/36 = (1/2) · (1/2) · (4/36) = P(A)P(B)P(C),

so the equality P(A ∩ B ∩ C) = P(A)P(B)P(C) holds even though the events are not independent.
In a typical reliability problem, a system succeeds or fails depending on the success or failure of its components, and we are interested in the probability of events of the form A1 ∩ ··· ∩ An or A1 ∪ ··· ∪ An, where Ai is the event that the ith component succeeds. When component successes are assumed independent, such probabilities are easy to compute (see the network of Fig. 1.15). A series subsystem succeeds if all of its components succeed, so its probability of success is the product of the probabilities of success of the corresponding components, i.e.,

P(series subsystem succeeds) = p1 p2 ··· pm,

where pi is the probability that component i succeeds.
A parallel subsystem succeeds if any one of its components succeeds, so its probability of failure is the product of the probabilities of failure of the corresponding components, i.e.,

P(parallel subsystem succeeds) = 1 − (1 − p1)(1 − p2) ··· (1 − pm).
Returning now to the network of Fig. 1.15(a), we can calculate the probability of success (a path from A to B is available) sequentially, using the preceding formulas, and starting from the end. Let us use the notation X → Y to denote the event that there is a (possibly indirect) connection from node X to node Y. Then,

P(C → B) = 1 − (1 − P(C → E and E → B))(1 − P(C → F and F → B))
         = 1 − (1 − pCE pEB)(1 − pCF pFB)
         = 1 − (1 − 0.8 · 0.9)(1 − 0.95 · 0.85)
         = 0.946,

P(A → C and C → B) = P(A → C)P(C → B) = 0.9 · 0.946 = 0.851,

P(A → D and D → B) = P(A → D)P(D → B) = 0.75 · 0.95 = 0.712,

and finally we obtain the desired probability

P(A → B) = 1 − (1 − P(A → C and C → B))(1 − P(A → D and D → B))
         = 1 − (1 − 0.851)(1 − 0.712)
         = 0.957.
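The series/parallel reductions lend themselves to two small helper functions. The sketch below (ours) recomputes the success probability of the network using the component values from the text.

```python
def series(*p):        # a series connection succeeds if all components do
    result = 1.0
    for x in p:
        result *= x
    return result

def parallel(*p):      # a parallel connection fails only if all components do
    fail = 1.0
    for x in p:
        fail *= (1 - x)
    return 1 - fail

c_to_b = parallel(series(0.8, 0.9), series(0.95, 0.85))       # 0.946
a_to_b = parallel(series(0.9, c_to_b), series(0.75, 0.95))    # 0.957
print(round(c_to_b, 3), round(a_to_b, 3))
```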
Independent Trials and the Binomial Probabilities
If an experiment involves a sequence of independent but identical stages, we say
that we have a sequence of independent trials. In the special case where there are only two possible results at each stage, we say that we have a sequence of independent Bernoulli trials. The two possible results can be anything, e.g., "it rains" or "it doesn't rain," but we will often think in terms of coin tosses and refer to the two results as "heads" (H) and "tails" (T).
Consider an experiment that consists of n independent tosses of a coin, in which the probability of heads is p, where p is some number between 0 and 1. In this context, independence means that the events A1, ..., An are independent, where

Ai = {ith toss is a head}.

We can visualize independent Bernoulli trials by means of a sequential description, as shown in Fig. 1.16 for the case where n = 3. The conditional probability of any toss being a head, conditioned on the results of any preceding tosses, is p, because of independence. Thus, by multiplying the conditional probabilities along the corresponding path of the tree, we see that any particular outcome (3-long sequence of heads and tails) that involves k heads and 3 − k tails has probability p^k (1 − p)^{3−k}. This formula extends to the case of a general number n of tosses: the probability of any particular n-long sequence that contains k heads and n − k tails is p^k (1 − p)^{n−k}, for all k from 0 to n.
Figure 1.16: Sequential description of an experiment involving three independent tosses of a coin. Along the branches of the tree, we record the corresponding conditional probabilities, and by the multiplication rule, the probability of obtaining a particular 3-toss sequence is calculated by multiplying the probabilities recorded along the corresponding path of the tree.
Let us now consider the probability

p(k) = P(k heads come up in an n-toss sequence),

which will play an important role later. Any particular n-toss sequence that contains k heads has probability p^k (1 − p)^{n−k}, as shown above, so we have

p(k) = (n choose k) p^k (1 − p)^{n−k},

where we use the notation

(n choose k) = number of distinct n-toss sequences that contain k heads.
The numbers (n choose k), read as "n choose k," are known as the binomial coefficients. They are given by

(n choose k) = n! / (k! (n − k)!),    k = 0, 1, ..., n,

where for any positive integer i we have

i! = 1 · 2 ··· (i − 1) · i,

and, by convention, 0! = 1.
An alternative verification is sketched in the end-of-chapter problems. Note that the binomial probabilities p(k) must add to 1, thus showing the binomial formula

Σ_{k=0}^{n} (n choose k) p^k (1 − p)^{n−k} = 1.
Example 1.25. Grade of Service. An internet service provider has installed c modems to serve the needs of a population of n customers. It is estimated that at a given time, each customer will need a connection with probability p, independently of the others. What is the probability that there are more customers needing a connection than there are modems?

Here we are interested in the probability that more than c customers simultaneously need a connection. It is equal to

Σ_{k=c+1}^{n} p(k),

where

p(k) = (n choose k) p^k (1 − p)^{n−k}

are the binomial probabilities. For instance, if n = 100, p = 0.1, and c = 15, the probability of interest turns out to be 0.0399.
This example is typical of problems of sizing a facility to serve the needs
of a homogeneous population, consisting of independently acting customers. The
problem is to select the facility size to guarantee a certain probability (sometimes called grade of service) that no user is left unserved.
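The quoted 0.0399 figure can be reproduced by summing the binomial tail directly, as in this sketch (ours):

```python
from math import comb

def binomial_tail(n, p, c):
    """P(more than c successes in n independent trials)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(c + 1, n + 1))

print(round(binomial_tail(100, 0.1, 15), 4))    # 0.0399, as quoted above
```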
1.6 COUNTING
The calculation of probabilities often involves counting the number of outcomes in various events. In particular, when the sample space has a finite number of equally likely outcomes, the discrete uniform law applies and the probability of any event A is

P(A) = (number of elements of A) / (total number of possible outcomes),

so computing P(A) reduces to counting. The basis for counting is simple: if a selection consists of two stages, with m possible results at the first stage and, for each of them, n possible results at the second stage, then the number of such ordered pairs is equal to mn. This observation can be generalized as follows (see also Fig. 1.17).

The Counting Principle
Consider a process that consists of r stages. Suppose that:

(a) There are n1 possible results at the first stage.

(b) For every possible result at the first stage, there are n2 possible results at the second stage.

(c) More generally, for any sequence of possible results at the first i − 1 stages, there are ni possible results at the ith stage.

Then, the total number of possible results of the r-stage process is n1 n2 ··· nr.
Example 1.26. The Number of Telephone Numbers. A local telephone number is a 7-digit sequence, but the first digit has to be different from 0 or 1. How many distinct telephone numbers are there? We can visualize the choice of a sequence as a sequential process, where we select one digit at a time. We have a total of 7 stages, and a choice of one out of 10 elements at each stage, except for the first stage where we only have 8 choices. Therefore, the answer is

8 · 10 · 10 · 10 · 10 · 10 · 10 = 8 · 10^6.
It should be noted that the Counting Principle remains valid even if each
first-stage result leads to a different set of potential second-stage results, etc. The
only requirement is that the number of possible second-stage results is constant,
regardless of the first-stage result.
In what follows, we will focus primarily on two types of counting arguments
that involve the selection of k objects out of a collection of n objects. If the order
of selection matters, the selection is called a permutation, and otherwise, it is
called a combination. We will then discuss a more general type of counting,
involving a partition of a collection of n objects into multiple subsets.
k-permutations
We start with n distinct objects, and let k be some positive integer, with k ≤ n.
We wish to count the number of different ways that we can pick k out of these
n objects and arrange them in a sequence, i.e. , the number of distinct k-object
sequences. We can choose any of the n objects to be the first one . Having chosen
the first, there are only n - 1 possible choices for the second; given the choice of
the first two, there only remain n - 2 available objects for the third stage, etc.
When we are ready to select the last (the kth) object, we have already chosen
k - 1 objects, which leaves us with n - (k - 1 ) choices for the last one. By the
Counting Principle, the number of possible sequences, called k-permutations,
is

n(n − 1) ··· (n − k + 1) = [n(n − 1) ··· (n − k + 1)(n − k) ··· 2 · 1] / [(n − k) ··· 2 · 1] = n! / (n − k)!.
In the special case where k = n, the number of possible sequences, simply called
permutations, is
n(n − 1)(n − 2) ··· 2 · 1 = n!.
(Let k = n in the formula for the number of k-permutations, and recall the convention 0! = 1.)
Example 1.28. Let us count the number of words that consist of four distinct letters. This is the problem of counting the number of 4-permutations of the 26 letters in the alphabet. The desired number is

n! / (n − k)! = 26! / 22! = 26 · 25 · 24 · 23 = 358,800.
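Python's standard library can confirm such counts. A minimal sketch (ours):

```python
from math import factorial, perm

print(perm(26, 4))                       # 358800 four-letter sequences
print(factorial(26) // factorial(22))    # the same count via n!/(n-k)!
```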
The count for permutations can be combined with the Counting Principle
to solve more complicated counting problems.
Example 1.29. You have n1 classical music CDs, n2 rock music CDs, and n3 country music CDs. In how many different ways can you arrange them so that the CDs of the same type are contiguous?

We break down the problem in two stages, where we first select the order of the CD types, and then the order of the CDs of each type. There are 3! ordered sequences of the types of CDs (such as classical/rock/country, rock/country/classical, etc.), and there are n1! (or n2!, or n3!) permutations of the classical (or rock, or country, respectively) CDs. Thus for each of the 3! CD type sequences, there are n1! n2! n3! arrangements of CDs, and the desired total number is 3! n1! n2! n3!.

Suppose now that you offer to give ki out of the ni CDs of each type i to a friend, where ki < ni, i = 1, 2, 3. What is the number of all possible arrangements of the CDs that you are left with? The solution is similar, except that the number of (ni − ki)-permutations of CDs of type i, which is ni!/ki!, replaces ni! in the calculation, so the number of possible arrangements is

3! · (n1!/k1!) · (n2!/k2!) · (n3!/k3!).
Combinations
There are n people and we are interested in forming a committee of k. How
many different committees are possible? More abstractly, this is the same as the
problem of counting the number of k-element subsets of a given n-element set.
Notice that forming a combination is different from forming a k-permutation, because in a combination there is no ordering of the selected elements.
For example, whereas the 2-permutations of the letters A, B, C, and D are

AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC,

the combinations of two out of these four letters are

AB, AC, AD, BC, BD, CD.
In the preceding example, the combinations are obtained from the permutations by grouping together "duplicates"; for example, AB and BA are not
viewed as distinct, and are both associated with the combination AB. This reasoning can be generalized: each combination is associated with k! "duplicate" k-permutations, so the number n!/(n − k)! of k-permutations is equal to the number of combinations times k!. Hence, the number of possible combinations is equal to

n! / (k! (n − k)!).
Let us now relate the above expression to the binomial coefficient, which was denoted by (n choose k) and was defined in the preceding section as the number of n-toss sequences with k heads. We note that specifying an n-toss sequence with k heads is the same as selecting k elements (those that correspond to heads) out of the n-element set of tosses, i.e., a combination of k out of n objects. Hence, the binomial coefficient is also given by the same formula and we have

(n choose k) = n! / (k! (n − k)!).
Example 1.30. The number of combinations of two out of the four letters A, B, C, and D is found by letting n = 4 and k = 2. It is

(4 choose 2) = 4! / (2! 2!) = 6,

consistent with the listing given earlier.
Recall the binomial formula Σ_{k=0}^{n} (n choose k) p^k (1 − p)^{n−k} = 1 discussed in Section 1.5. In the special case where p = 1/2, this formula becomes

Σ_{k=0}^{n} (n choose k) = 2^n,

and admits the following simple interpretation. Since (n choose k) is the number of k-element subsets of a given n-element set, the sum over k of (n choose k) counts the number of subsets of all possible cardinalities. It is therefore equal to the number of all subsets of an n-element set, which is 2^n.
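The identity is easy to spot-check numerically, as in this sketch (ours) for n = 10:

```python
from math import comb

n = 10
print(sum(comb(n, k) for k in range(n + 1)), 2**n)    # 1024 1024
```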
Example 1.31. We have a group of n persons. Consider clubs that consist of a special person from the group (the club leader) and a number, possibly zero, of additional club members. Let us count the number of possible clubs of this type in two different ways, thereby obtaining an algebraic identity.
There are n choices for club leader. Once the leader is chosen, we are left with a set of n − 1 available persons, and we are free to choose any of the 2^{n−1} subsets. Thus the number of possible clubs is n 2^{n−1}.

Alternatively, for fixed k, we can form a k-person club by first selecting k out of the n available persons [there are (n choose k) choices]. We can then select one of the members to be the leader (there are k choices). By adding over all possible club sizes k, we obtain the number of possible clubs as Σ_{k=1}^{n} k (n choose k), thereby showing the identity

Σ_{k=1}^{n} k (n choose k) = n 2^{n−1}.
Partitions

Recall that a combination is a choice of k elements out of an n-element set without regard to order. Thus, a combination can be viewed as a partition of the set in two: one part contains k elements and the other contains the remaining n − k. We now generalize by considering partitions into more than two subsets.
We are given an n-element set and nonnegative integers n1, n2, ..., nr,
whose sum is equal to n. We consider partitions of the set into r disjoint subsets.
with the ith subset containing exactly ni elements. Let us count in how many
ways this can be done.
We form the subsets one at a time. We have (n choose n1) ways of forming the first subset. Having formed the first subset, we are left with n − n1 elements. We need to choose n2 of them in order to form the second subset, and we have (n − n1 choose n2) choices, etc. Using the Counting Principle for this r-stage process, the total number of choices is

(n choose n1) (n − n1 choose n2) ··· (n − n1 − ··· − n_{r−1} choose nr),

which is equal to

[n! / (n1! (n − n1)!)] · [(n − n1)! / (n2! (n − n1 − n2)!)] ··· [(n − n1 − ··· − n_{r−1})! / (nr! 0!)].

We note that several terms cancel and we are left with

n! / (n1! n2! ··· nr!).

This is called the multinomial coefficient.
Example 1.32. Anagrams. How many different words (letter sequences) can be obtained by rearranging the letters in the word TATTOO? There are six positions to be filled by the available letters. Each rearrangement corresponds to a partition of the set of the six positions into a group of size 3 (the positions that get the letter T), a group of size 1 (the position that gets the letter A), and a group of size 2 (the positions that get the letter O). Thus, the desired number is

6! / (1! 2! 3!) = (1 · 2 · 3 · 4 · 5 · 6) / (1 · (1 · 2) · (1 · 2 · 3)) = 60.
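A small helper makes multinomial counts reusable. The following sketch (ours) recomputes the TATTOO count:

```python
from math import factorial

def multinomial(*group_sizes):
    # n! / (n1! n2! ... nr!), with n the sum of the group sizes.
    count = factorial(sum(group_sizes))
    for size in group_sizes:
        count //= factorial(size)
    return count

print(multinomial(3, 1, 2))    # 6!/(3! 1! 2!) = 60
```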
Example 1.33. A class consisting of 4 graduate and 12 undergraduate students is randomly divided into four groups of 4. What is the probability that each group includes a graduate student? This is the problem of Example 1.11, which we now solve by counting. A typical outcome is a particular way of partitioning the 16 students into four groups of 4, and we take every such partition to be equally likely. There are

(16 choose 4, 4, 4, 4) = 16! / (4! 4! 4! 4!)

different partitions, and this is the size of the sample space. Generating an outcome in which each group has a graduate student can be done in two stages: distribute the four graduate students to the four groups (4! ways), and then distribute the remaining 12 undergraduate students, 3 to each group, which can be done in

12! / (3! 3! 3! 3!)

different ways.

By the Counting Principle, the event of interest can occur in

4! · 12! / (3! 3! 3! 3!)

different ways, so its probability is

[4! · 12! / (3! 3! 3! 3!)] / [16! / (4! 4! 4! 4!)].

After some cancellations, this is equal to

(12 · 8 · 4) / (15 · 14 · 13),

consistent with the answer obtained in Example 1.11.
1.7 SUMMARY AND DISCUSSION

A probability problem can usually be broken down into a few basic steps:
(a) The description of the sample space, that is, the set of possible outcomes
of a given experiment.
(b) The (possibly indirect) specification of the probability law (the probability
of each event).
(c) The calculation of probabilities and conditional probabilities of various
events of interest .
The probabilities of events must satisfy the nonnegativity, additivity, and normalization axioms. In the important special case where the set of possible outcomes is finite, one can just specify the probability of each outcome and obtain
the probability of any event by adding the probabilities of the elements of the
event.
Given a probability law, we are often interested in conditional probabilities,
which allow us to reason based on partial information about the outcome of
the experiment. We can view conditional probabilities as probability laws of a
special type, under which only outcomes contained in the conditioning event can
have positive conditional probability. Conditional probabilities can be derived
from the (unconditional) probability law using the definition P(A I B) = P(A n
B)/P (B) . However, the reverse process is often convenient, that is, first specify
some conditional probabilities that are natural for the real situation that we wish
to model, and then use them to derive the (unconditional) probability law.
We have illustrated through examples three methods for calculating prob
abilities:
(a) The counting method. This method applies to the case where the num
ber of possible outcomes is finite, and all outcomes are equally likely. To
calculate the probability of an event, we count the number of elements of
the event and divide by the number of elements of the sample space.
(b) The sequential method. This method applies when the experiment has a
sequential character, and suitable conditional probabilities are specified or
calculated along the branches of the corresponding tree (perhaps using the
counting method) . The probabilities of various events are then obtained
by multiplying conditional probabilities along the corresponding paths of
the tree, using the multiplication rule.
(c) The divide-and-conquer method . Here, the probabilities P (B) of vari
ous events B are obtained from conditional probabilities P(B I Ai ) , where
the Ai are suitable events that form a partition of the sample space and
have known probabilities P(Ai). The probabilities P(B) are then obtained using the total probability theorem.
PROBLEMS
Problem 1. (a) Show that

(A ∩ B)^c = A^c ∪ B^c.

(b) Show that

(A ∪ B)^c = A^c ∩ B^c.

(c) Consider rolling a fair six-sided die. Let A be the set of outcomes where the roll is an odd number. Let B be the set of outcomes where the roll is less than 4. Calculate the sets on both sides of the equality in part (b), and verify that the equality holds.
Problem 3.* Prove the identity

A ∪ (∩_{n=1}^{∞} Bn) = ∩_{n=1}^{∞} (A ∪ Bn).

Solution. If x belongs to the set on the left, there are two possibilities. Either x ∈ A, in which case x belongs to all of the sets A ∪ Bn, and therefore belongs to the set on the right. Alternatively, x belongs to all of the sets Bn, in which case it belongs to all of the sets A ∪ Bn, and therefore again belongs to the set on the right.
Conversely, if x belongs to the set on the right, then it belongs to A ∪ Bn for all n. If x belongs to A, then it belongs to the set on the left. Otherwise, x must belong to every set Bn and again belongs to the set on the left.
Problem 4.* Cantor's diagonalization argument. Show that the unit interval [0, 1] is uncountable, i.e., its elements cannot be arranged in a sequence.

Solution. Any number x in [0, 1] can be represented in terms of its decimal expansion, e.g., 1/3 = 0.3333···. Note that most numbers have a unique decimal expansion, but there are a few exceptions. For example, 1/2 can be represented as 0.5000··· or as 0.4999···. It can be shown that this is the only kind of exception, i.e., decimal expansions that end with an infinite string of zeroes or an infinite string of nines.
Suppose, to obtain a contradiction, that the elements of [0, 1] can be arranged in a sequence x1, x2, x3, ..., so that every element of [0, 1] appears in the sequence. Consider the decimal expansion of the nth element:

xn = 0.a_1^n a_2^n a_3^n ···,

where each digit a_i^n belongs to {0, 1, ..., 9}. Consider now a number y constructed as follows. The nth digit of y can be 1 or 2, and is chosen so that it is different from the nth digit of xn. Note that y has a unique decimal expansion since it does not end with an infinite sequence of zeroes or nines. The number y differs from each xn, since it has a different nth digit. Therefore, the sequence x1, x2, ... does not exhaust the elements of [0, 1], contrary to what was assumed. The contradiction establishes that the set [0, 1] is uncountable.
Problem 5. Out of the students in a class, 60% are geniuses, 70% love chocolate,
and 40% fall into both categories. Determine the probability that a randomly selected
student is neither a genius nor a chocolate lover.
Problem 6. A six-sided die is loaded in a way that each even face is twice as likely
as each odd face. All even faces are equally likely, as are all odd faces. Construct a
probabilistic model for a single roll of this die and find the probability that the outcome
is less than 4.
Problem 7. A four-sided die is rolled repeatedly, until the first time (if ever) that an
even number is obtained. What is the sample space for this experiment?
Problem 8. You enter a special kind of chess tournament, in which you play one game
with each of three opponents, but you get to choose the order in which you play your
opponents, knowing the probability of a win against each . You win the tournament if
you win two games in a row, and you want to maximize the probability of winning.
Show that it is optimal to play the weakest opponent second, and that the order of
playing the other two opponents does not matter.
(b) Use part (a) to show that for any events A, B , and C, we have
P(A) = P(A ∩ B) + P(A ∩ C) + P(A ∩ B^c ∩ C^c) − P(A ∩ B ∩ C).
Problem. Show the formula

P((A ∩ B^c) ∪ (A^c ∩ B)) = P(A) + P(B) − 2P(A ∩ B),

which gives the probability that exactly one of the events A and B will occur. [Compare with the formula P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which gives the probability that at least one of the events A and B will occur.]
Solution. (a) We use the formulas P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y) and (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C). We have
P(A ∪ B ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C)
= P(A ∪ B) + P(C) − P((A ∩ C) ∪ (B ∩ C))
= P(A ∪ B) + P(C) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
= P(A) + P(B) − P(A ∩ B) + P(C) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
= P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C).
(b) Use induction and verify the main induction step by emulating the derivation of
part (a) . For a different approach, see the problems at the end of Chapter 2.
Problem 13.* Continuity property of probabilities.

(a) Let A1, A2, ... be an infinite sequence of events, which is "monotonically increasing," meaning that An ⊂ An+1 for every n. Let A = ∪_{n=1}^{∞} An. Show that P(A) = lim_{n→∞} P(An). Hint: Express the event A as a union of countably many disjoint sets.

(b) Suppose now that the events are "monotonically decreasing," i.e., An+1 ⊂ An for every n. Let A = ∩_{n=1}^{∞} An. Show that P(A) = lim_{n→∞} P(An). Hint: Apply the result of part (a) to the complements of the events.
Solution. (a) Let B1 = A1 and, for n ≥ 2, Bn = An ∩ A_{n−1}^c. The events Bn are disjoint, and we have ∪_{k=1}^{n} Bk = An, and ∪_{k=1}^{∞} Bk = A. We apply the additivity axiom to obtain

P(A) = Σ_{k=1}^{∞} P(Bk) = lim_{n→∞} Σ_{k=1}^{n} P(Bk) = lim_{n→∞} P(∪_{k=1}^{n} Bk) = lim_{n→∞} P(An).
(b) Let Cn = A_n^c and C = A^c. Since An+1 ⊂ An, we obtain Cn ⊂ Cn+1, and the events Cn are increasing. Furthermore, C = A^c = (∩_{n=1}^{∞} An)^c = ∪_{n=1}^{∞} A_n^c = ∪_{n=1}^{∞} Cn. Using the result from part (a) for the sequence Cn, we obtain

P(A^c) = P(C) = lim_{n→∞} P(Cn) = lim_{n→∞} (1 − P(An)),

from which P(A) = lim_{n→∞} P(An).
Problem 15. A coin is tossed twice. Alice claims that the event of two heads is at
least as likely if we know that the first toss is a head than if we know that at least one
of the tosses is a head. Is she right? Does it make a difference if the coin is fair or
unfair? How can we generalize Alice's reasoning?
Problem 16. We are given three coins: one has heads in both faces, the second has
tails in both faces, and the third has a head in one face and a tail in the other. We
choose a coin at random, toss it, and the result is heads. What is the probability that
the opposite face is tails?
Problem 17. A batch of one hundred items is inspected by testing four randomly
selected items. If one of the four is defective, the batch is rejected. What is the
probability that the batch is accepted if it contains five defectives?
Problem 2 1 . Two players take turns removing a ball from a jar that initially contains
m white and n black balls. The first player to remove a white ball wins. Develop a
recursive formula that allows the convenient computation of the probability that the
starting player wins.
Problem 22. Each of k jars contains m white and n black balls. A ball is randomly
chosen from jar 1 and transferred to jar 2, then a ball is randomly chosen from jar 2
and transferred to jar 3, etc. Finally, a ball is randomly chosen from jar k. Show that
the probability that the last ball is white is the same as the probability that the first ball is white, i.e., it is m/(m + n).
Problem 23. We have two jars, each initially containing an equal number of balls.
We perform four successive ball exchanges. In each exchange, we pick simultaneously
and at random a ball from each jar and move it to the other jar. What is the probability
that at the end of the four exchanges all the balls will be in the jar where they started?
Problem 24. The prisoner's dilemma. The release of two out of three prisoners has been announced, but their identity is kept secret. One of the prisoners considers
asking a friendly guard to tell him who is the prisoner other than himself that will be
released, but hesitates based on the following rationale: at the prisoner's present state
of knowledge, the probability of being released is 2/3, but after he knows the answer,
the probability of being released will become 1/2, since there will be two prisoners
(including himself) whose fate is unknown and exactly one of the two will be released.
What is wrong with this line of reasoning?
Problem 25. A two-envelopes puzzle. You are handed two envelopes, and you
know that each contains a positive integer dollar amount and that the two amounts are
different. The values of these two amounts are modeled as constants that are unknown.
Without knowing what the amounts are, you select at random one of the two envelopes,
and after looking at the amount inside, you may switch envelopes if you wish. A friend
claims that the following strategy will increase above 1 /2 your probability of ending
up with the envelope with the larger amount: toss a coin repeatedly, let X be equal to 1/2 plus the number of tosses required to obtain heads for the first time, and switch
if the amount in the envelope you selected is less than the value of X . Is your friend
correct?
event A occurs or not . Assume t hat 0 < p < 1, 0 < q < 1, and that all crows are black.
(a) Given the event B = {a black crow was observed } , what is p eA I B)?
( b ) G iven the event C = {a white cow was observed } , what i s peA I C)?
Problem 27. Alice and Bob have 2n + 1 coins, each coin with probability of heads
equal to 1/2. Bob tosses n + 1 coins, while Alice tosses the remaining n coins. Assuming independent coin tosses, show that the probability that after all coins have been tossed, Bob will have gotten more heads than Alice is 1/2.
Problem 28.* Let C1, ..., Cn be disjoint events that form a partition of the sample space. Show that, for any events A and B (with all conditioning events having positive probability),

P(A | B) = Σ_{i=1}^{n} P(Ci | B) P(A | B ∩ Ci).

Solution. We have

P(A ∩ B) = Σ_{i=1}^{n} P((A ∩ B) ∩ Ci),

and by the multiplication rule,

P((A ∩ B) ∩ Ci) = P(B) P(Ci | B) P(A | B ∩ Ci).

Combining these two equations, dividing by P(B), and using the formula P(A | B) = P(A ∩ B)/P(B), we obtain the desired result.
Problem 29.* Let A and B be events with P(A) > 0 and P(B) > 0. We say that an event B suggests an event A if P(A | B) > P(A), and does not suggest event A if P(A | B) < P(A).

(a) Show that B suggests A if and only if A suggests B.

(b) Assume that P(B^c) > 0. Show that B suggests A if and only if B^c does not suggest A.

(c) We know that a treasure is located in one of two places, with probabilities β and 1 − β, respectively, where 0 < β < 1. We search the first place and if the treasure is there, we find it with probability p > 0. Show that the event of not finding the treasure in the first place suggests that the treasure is in the second place.
Solution. (c) Consider the events

A = {the treasure is in the second place},
B = {we don't find the treasure in the first place}.

Using the total probability theorem,

P(B) = P(A^c)P(B | A^c) + P(A)P(B | A) = β(1 − p) + (1 − β),

so that

P(A | B) = P(A ∩ B) / P(B) = (1 − β) / (β(1 − p) + (1 − β)) = (1 − β) / (1 − βp) > 1 − β = P(A),

since 1 − βp < 1. It follows that the event B suggests the event A.
Problem. A hunter has two hunting dogs. One day, on the trail of some animal, the hunter comes to a place where the road diverges into two paths. He knows that each dog, independently of the other, will choose the correct path with probability p. The hunter decides to let each dog choose a path, and if they agree, to take that path, and if they disagree, to randomly pick a path. Is his strategy better than just letting one of the two dogs decide on a path?
Problem. A source transmits a message (a string of symbols) through a noisy communication channel. Each symbol is 0 or 1 with probability p and 1 − p, respectively, and is received incorrectly with probability ε0 and ε1, respectively (see the accompanying channel diagram). Errors in different symbol transmissions are independent.

(a) What is the probability that the kth symbol is received correctly?

(b) What is the probability that the string of symbols 1011 is received correctly?

(c) In an effort to improve reliability, each symbol is transmitted three times and the received string is decoded by majority rule. In other words, a 0 (or 1) is transmitted as 000 (or 111, respectively), and it is decoded at the receiver as a 0 (or 1) if and only if the received three-symbol string contains at least two 0s (or 1s, respectively). What is the probability that a 0 is correctly decoded?

(d) Suppose that the scheme of part (c) is used. Given that the received string is 101, what is the probability that a 0 was transmitted?
Problem. A city gets its electricity from several power plants, each of which fails, independently of the others, with a given probability.
(a) Suppose that any one plant can produce enough electricity to supply the entire
city. What is the probability that the city will experience a black-out?
(b) Suppose that two power plants are necessary to keep the city from a black-out.
Find the probability that the city will experience a black-out.
Problem 37. A cellular phone system services a population of n1 "voice users" (those who occasionally need a voice connection) and n2 "data users" (those who occasionally need a data connection). We estimate that at a given time, each user will need to be connected to the system with probability p1 (for voice users) or p2 (for data users), independent of other users. The data rate for a voice user is r1 bits/sec and for a data user is r2 bits/sec. The cellular system has a total capacity of c bits/sec. What is the probability that more users want to use the system than the system can accommodate?
Problem 38. The problem of points. Telis and Wendy play a round of golf (18 holes) for a $10 stake, and their probabilities of winning on any one hole are p and
1 - p, respectively, independent of their results in other holes. At the end of 10 holes,
with the score 4 to 6 in favor of Wendy, Telis receives an urgent call and has to report
back to work. They decide to split the stake in proportion to their probabilities of
winning had they completed the round, as follows. If PT and pw are the conditional
probabilities that Telis and Wendy, respectively, are ahead in the score after 18 holes
given the 4-6 score after 10 holes, then Telis should get a fraction PT / (PT + pw ) of the
stake, and Wendy should get the remaining pw /(PT + pw ). How much money should
Telis get? Note: This is an example of the so-called problem of points, which played
an important historical role in the development of probability theory. The problem
was posed by Chevalier de Mere in the 17th century to Pascal, who introduced the
idea that the stake of an interrupted game should be divided in proportion to the
players' conditional probabilities of winning given the state of the game at the time of
interruption. Pascal worked out some special cases and through a correspondence with
Fermat , stimulated much thinking and several probability-related investigations.
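To see the split concretely: with 8 holes left and the score 4-6, Telis is ahead after 18 holes if he wins at least 6 of the remaining holes, Wendy is ahead if he wins at most 4, and 5 wins produces a tie (which contributes to neither p_T nor p_W). A minimal Python sketch of the resulting split; the value p = 0.5 is an arbitrary illustration.

    from math import comb

    def telis_share(p, stake=10.0):
        """Split the stake in proportion to the conditional win probabilities."""
        binom = lambda k: comb(8, k) * p**k * (1 - p)**(8 - k)  # Telis wins k of 8 holes
        p_T = sum(binom(k) for k in range(6, 9))   # ahead: 4 + k > 14 - k, i.e. k >= 6
        p_W = sum(binom(k) for k in range(0, 5))   # behind: k <= 4 (k = 5 is a tie)
        return stake * p_T / (p_T + p_W)

    print(telis_share(p=0.5))   # in an even match, Telis gets well under half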
Problem 39. A particular class has had a history of low attendance. The annoyed
professor decides that she will not lecture unless at least k of the n students enrolled
in the class are present. Each student will independently show up with probability
p_g if the weather is good, and with probability p_b if the weather is bad. Given the
probability of bad weather on a given day, obtain an expression for the probability that
the professor will teach her class on that day.
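By the total probability theorem, the answer has the form P(bad) · P(at least k of n show up with probability p_b) + (1 - P(bad)) · P(at least k with probability p_g), where the attendance counts are binomial. A minimal Python sketch under illustrative parameter values:

    from math import comb

    def at_least(n, k, p):
        """P(Binomial(n, p) >= k)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    def prob_teach(n, k, p_g, p_b, p_bad):
        """Total probability theorem over good/bad weather."""
        return p_bad * at_least(n, k, p_b) + (1 - p_bad) * at_least(n, k, p_g)

    print(prob_teach(n=30, k=10, p_g=0.6, p_b=0.3, p_bad=0.4))  # illustrative values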
Problem 40. Consider a coin that comes up heads with probability p and tails with
probability 1 - p. Let q_n be the probability that after n independent tosses, there have
been an even number of heads. Derive a recursion that relates q_n to q_{n-1}, and solve
this recursion to establish the formula

q_n = (1 + (1 - 2p)^n)/2.
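The recursion in question is q_n = (1 - p) q_{n-1} + p (1 - q_{n-1}), with q_0 = 1: the count of heads stays even either by adding a tail to an even history or a head to an odd one. The sketch below iterates this recursion and compares it with the closed form; the values of n and p are arbitrary.

    def even_heads_prob(n, p):
        """Iterate q_k = (1 - p) * q_{k-1} + p * (1 - q_{k-1}), with q_0 = 1."""
        q = 1.0
        for _ in range(n):
            q = (1 - p) * q + p * (1 - q)
        return q

    n, p = 10, 0.3
    print(even_heads_prob(n, p))         # recursion
    print((1 + (1 - 2 * p)**n) / 2)      # closed form; the two should agree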
Problem 41. Consider a game show with an infinite pool of contestants, where
at each round i, contestant i obtains a number by spinning a continuously calibrated
wheel. The contestant with the smallest number thus far survives. Successive wheel
spins are independent and we assume that there are no ties. Let N be the round at
which contestant 1 is eliminated. For any positive integer n, find P(N = n).
Problem 42. * The gambler's ruin problem. A gambler makes a sequence of
independent bets, in each of which he wins $1 with probability p and loses $1 with
probability q = 1 - p. Starting with k dollars, he plays until he either accumulates n
dollars or runs out of money. Let w_k be the probability that he ends up with n dollars,
and let r = q/p. The standard argument gives w_k = w_1 (1 + r + ... + r^{k-1}) for
k = 1, ..., n. The sum on the right-hand side can be calculated separately for the two
cases r = 1 (i.e., p = q) and r ≠ 1 (i.e., p ≠ q). We have

w_k = w_1 (1 - r^k)/(1 - r),   if p ≠ q,
w_k = k w_1,                   if p = q.

Since w_n = 1, we can solve for w_1 and therefore for w_k:

w_1 = (1 - r)/(1 - r^n),   if p ≠ q,
w_1 = 1/n,                 if p = q,

so that

w_k = (1 - r^k)/(1 - r^n),   if p ≠ q,
w_k = k/n,                   if p = q.
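As a sanity check on these formulas, the sketch below evaluates w_k (interpreted, as above, as the probability of reaching n dollars before 0 when starting with k dollars) and compares it with a direct simulation of the random walk; the function names and parameter values are illustrative.

    import random

    def win_prob(k, n, p):
        """w_k = P(reach n before 0 | start at k); win +1 w.p. p, lose -1 w.p. q = 1 - p."""
        q = 1 - p
        if p == q:
            return k / n
        r = q / p
        return (1 - r**k) / (1 - r**n)

    def simulate(k, n, p, trials=100_000):
        wins = 0
        for _ in range(trials):
            x = k
            while 0 < x < n:
                x += 1 if random.random() < p else -1
            wins += (x == n)
        return wins / trials

    k, n, p = 3, 10, 0.6    # arbitrary illustrative values
    print(win_prob(k, n, p), simulate(k, n, p))   # the two should be close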
Problem 43. * Let A and B be independent events. Use the definition of indepen-
dence to prove the following:
(a) The events A and B^c are independent.
(b) The events A^c and B^c are independent.
Solution. (a) The event A is the union of the disjoint events A ∩ B^c and A ∩ B. Using
the additivity axiom and the independence of A and B, we obtain
P(A) = P(A ∩ B) + P(A ∩ B^c) = P(A)P(B) + P(A ∩ B^c).
It follows that
P(A ∩ B^c) = P(A)(1 - P(B)) = P(A)P(B^c),
so A and B^c are independent.
(b) Apply the result of part (a) twice: first on A and B, then on B^c and A.
Problem 44. * Let A, B, and C be independent events, with P(C) > 0. Prove that
A and B are conditionally independent given C.
Solution. We have

P(A ∩ B | C) = P(A ∩ B ∩ C)/P(C)
             = P(A)P(B)P(C)/P(C)
             = P(A)P(B)
             = P(A | C)P(B | C),

so A and B are conditionally independent given C. In the preceding calculation, the
first equality uses the definition of conditional probability; the second uses the as-
sumed independence; the fourth uses the independence of A from C, and of B from C.
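The identity is easy to confirm numerically. The sketch below builds three independent events from independent biased bits and checks P(A ∩ B | C) = P(A | C) P(B | C) by exact enumeration of the eight-point sample space; the marginal probabilities are arbitrary.

    from itertools import product

    pa, pb, pc = 0.3, 0.6, 0.5   # arbitrary marginal probabilities

    # Sample space: triples of bits with product-form probabilities, so that
    # A = {first bit is 1}, B = {second}, C = {third} are independent by construction.
    def prob(pred):
        total = 0.0
        for a, b, c in product((0, 1), repeat=3):
            w = (pa if a else 1 - pa) * (pb if b else 1 - pb) * (pc if c else 1 - pc)
            if pred(a, b, c):
                total += w
        return total

    p_c = prob(lambda a, b, c: c)
    lhs = prob(lambda a, b, c: a and b and c) / p_c                    # P(A ∩ B | C)
    rhs = (prob(lambda a, b, c: a and c) / p_c) * (prob(lambda a, b, c: b and c) / p_c)
    print(lhs, rhs)   # equal, as the problem asserts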
Problem 45. * Assume that the events A1, A2, A3, A4 are independent and that
P(A3 ∩ A4) > 0. Show that
P(A1 ∪ A2 | A3 ∩ A4) = P(A1 ∪ A2).
Solution. We have

P(A1 ∪ A2 | A3 ∩ A4) = P((A1 ∩ A3 ∩ A4) ∪ (A2 ∩ A3 ∩ A4)) / (P(A3)P(A4))
                     = (P(A1) + P(A2) - P(A1)P(A2)) P(A3)P(A4) / (P(A3)P(A4))
                     = P(A1) + P(A2) - P(A1)P(A2)
                     = P(A1 ∪ A2),

where the second equality uses the inclusion-exclusion formula together with the inde-
pendence of the events.
Problem 46. * Laplace's rule of succession. Consider m + 1 boxes, with the kth
box containing k red balls and m - k white balls, where k ranges from 0 to m. We
choose a box at random (all boxes are equally likely) and then choose a ball at random
from that box, n successive times (the ball drawn is replaced each time, and a new ball
is selected independently). Suppose a red ball was drawn each of the n times. What
is the probability that if we draw a ball one more time it will be red? Estimate this
probability for large m.
Solution. We want to find the conditional probability P(E | R_n), where E is the event
of a red ball drawn at time n + 1, and R_n is the event of a red ball drawn each of the n
preceding times. Intuitively, the consistent draw of a red ball indicates that a box with
a high proportion of red balls was likely chosen, so we expect P(E | R_n) to be close
to 1. By the total probability theorem, conditioning on the box chosen,

P(R_n) = (1/(m + 1)) Σ_{k=0}^{m} (k/m)^n,

and

P(E ∩ R_n) = P(R_{n+1}) = (1/(m + 1)) Σ_{k=0}^{m} (k/m)^{n+1}.

For large m, the sum (1/m) Σ_{k=0}^{m} (k/m)^n is a Riemann sum approximating the
integral of x^n over [0, 1], so that

P(R_n) = (1/(m + 1)) Σ_{k=0}^{m} (k/m)^n ≈ ∫_0^1 x^n dx = 1/(n + 1).

Similarly,

P(E ∩ R_n) = P(R_{n+1}) ≈ 1/(n + 2),

so that

P(E | R_n) ≈ (n + 1)/(n + 2).

Thus, for large m, drawing a red ball one more time is almost certain when n is large.
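The quality of this approximation is easy to inspect numerically: in the ratio P(E | R_n) = P(R_{n+1})/P(R_n), the common factor 1/(m + 1) cancels, leaving a ratio of two finite sums. A minimal Python sketch comparing the exact value with (n + 1)/(n + 2); the values of m and n are arbitrary.

    def red_again_prob(m, n):
        """Exact P(E | R_n) = sum_k (k/m)^(n+1) / sum_k (k/m)^n, k = 0, ..., m."""
        num = sum((k / m) ** (n + 1) for k in range(m + 1))
        den = sum((k / m) ** n for k in range(m + 1))
        return num / den

    m, n = 1000, 5
    print(red_again_prob(m, n), (n + 1) / (n + 2))   # close for large m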
Problem 47. * Binomial coefficient formula and the Pascal triangle.
(a) Use the definition of \binom{n}{k} as the number of distinct n-toss sequences with k
heads, to derive the recursion suggested by the so-called Pascal triangle, given in
Fig. 1.20.
(b) Use the recursion derived in part (a) and induction, to establish the formula

\binom{n}{k} = n!/(k!(n - k)!).
Solution. (a) Note that the n-toss sequences that contain k heads (for 0 < k < n) can
be obtained in two ways:
(1) By starting with an (n - 1)-toss sequence that contains k heads and adding a tail
at the end. There are \binom{n-1}{k} different sequences of this type.
(2) By starting with an (n - 1)-toss sequence that contains k - 1 heads and adding
a head at the end. There are \binom{n-1}{k-1} different sequences of this type.
(Figure 1.20: Sequential calculation of the binomial coefficients using the Pascal
triangle. Each term \binom{n}{k} in the triangular array is obtained by adding its two
neighbors in the row above it, except for the boundary terms with k = 0 or k = n,
which are equal to 1.)
Thus,

\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1},   if k = 1, ..., n - 1,
\binom{n}{k} = 1,                                   if k = 0 or k = n.
(b) We will prove by induction on n the formula

\binom{n}{k} = n!/(k!(n - k)!).

For n = 1, we have by definition \binom{1}{0} = \binom{1}{1} = 1, so the
above formula is seen to hold as long as we use the convention 0! = 1. If the formula
holds for each index up to n - 1, we have for k = 1, 2, ..., n - 1,

\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}
             = (n-1)!/((k-1)!(n-k)!) + (n-1)!/(k!(n-1-k)!)
             = (k/n) · n!/(k!(n-k)!) + ((n-k)/n) · n!/(k!(n-k)!)
             = n!/(k!(n-k)!),

and since the formula also holds for the boundary cases k = 0 and k = n, the induction
is complete.
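The recursion of part (a) translates directly into code. The sketch below builds each row of the Pascal triangle from the previous one and checks every entry against the factorial formula of part (b).

    from math import factorial

    def pascal_row(n):
        """Compute row n of the Pascal triangle using the recursion of part (a)."""
        row = [1]
        for _ in range(n):
            row = [1] + [row[k - 1] + row[k] for k in range(1, len(row))] + [1]
        return row

    for n in range(1, 8):
        row = pascal_row(n)
        assert all(row[k] == factorial(n) // (factorial(k) * factorial(n - k))
                   for k in range(n + 1))
    print(pascal_row(5))   # [1, 5, 10, 10, 5, 1]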
Problem 48. * Consider an infinite sequence of trials. The probability of success at
the ith trial is some positive number p_i. Let N be the event that there is no success,
and let I be the event that there is an infinite number of successes.
(a) Assume that the trials are independent and that Σ_{i=1}^∞ p_i = ∞. Show that
P(N) = 0 and P(I) = 1.
(b) Assume that Σ_{i=1}^∞ p_i < ∞. Show that P(I) = 0.
Solution.
(b) Let S_i be the event that the ith trial is a success. Fix some number n and, for every
i > n, let F_i be the event that the first success after time n occurs at time i. Note
that F_i ⊂ S_i. Finally, let A_n be the event that there is at least one success after time
n. Note that I ⊂ A_n, because an infinite number of successes implies that there are
successes subsequent to time n. Furthermore, the event A_n is the union of the disjoint
events F_i, i > n. Therefore,

P(I) ≤ P(A_n) = Σ_{i=n+1}^∞ P(F_i) ≤ Σ_{i=n+1}^∞ P(S_i) = Σ_{i=n+1}^∞ p_i.

We take the limit of both sides as n → ∞. Because of the assumption Σ_{i=1}^∞ p_i < ∞,
the right-hand side converges to zero, and we obtain P(I) = 0.
Problem 50. The birthday problem. Consider n people who are attending a
party. We assume that every person has an equal probability of being born on any day
during the year, independent of everyone else, and ignore the additional complication
presented by leap years (i.e., assume that nobody is born on February 29). What is
the probability that each person has a distinct birthday?
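The standard counting argument gives the answer 365 · 364 ⋯ (365 - n + 1)/365^n: the first person may have any birthday, the second any of the 364 remaining days, and so on. A minimal Python sketch evaluating this product:

    def distinct_birthdays(n):
        """P(all n birthdays distinct) = 365 * 364 * ... * (365 - n + 1) / 365^n."""
        p = 1.0
        for i in range(n):
            p *= (365 - i) / 365
        return p

    print(distinct_birthdays(23))   # famously just under 1/2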
Problem 52. We deal from a well-shuffled 52-card deck. Calculate the probability
that the 13th card is the first king to be dealt.
Problem 53. Ninety students, including Joe and Jane, are to be split into three
classes of equal size, and this is to be done at random. What is the probability that
Joe and Jane end up in the same class?
Problem 54. Twenty distinct cars park in the same parking lot every day. Ten of
these cars are US-made, while the other ten are foreign-made. The parking lot has
exactly twenty spaces, all in a row, so the cars park side by side. However, the drivers
have varying schedules, so the position any car might take on a certain day is random.
(a) In how many different ways can the cars line up?
(b) What is the probability that on a given day, the cars will park in such a way
that they alternate (no two US-made are adjacent and no two foreign-made are
adjacent)?
Problem 55. Eight rooks are placed in distinct squares of an 8 x 8 chessboard, with
all possible placements being equally likely. Find the probability that all the rooks are
safe from one another, i.e., that there is no row or column with more than one rook.
Problem 57. How many 6-word sentences can be made using each of the 26 letters
of the alphabet exactly once? A word is defined as a nonempty (possibly gibberish)
sequence of letters.
Problem 58. We draw the top 7 cards from a well-shuffled standard 52-card deck.
Find the probability that:
(a) The 7 cards include exactly 3 aces.
(b) The 7 cards include exactly 2 kings.
(c) The 7 cards include exactly 3 aces, or exactly 2 kings, or both.
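These counts are hypergeometric, and part (c) follows from the inclusion-exclusion formula P(A ∪ B) = P(A) + P(B) - P(A ∩ B). A minimal Python sketch of the computation (the function name is illustrative):

    from math import comb

    def draw_prob(aces, kings, total_drawn=7):
        """P(exactly `aces` aces and exactly `kings` kings among 7 cards from 52)."""
        others = total_drawn - aces - kings          # cards that are neither
        return (comb(4, aces) * comb(4, kings) * comb(44, others)) / comb(52, total_drawn)

    p_3aces  = sum(draw_prob(3, k) for k in range(5))   # (a): exactly 3 aces
    p_2kings = sum(draw_prob(a, 2) for a in range(5))   # (b): exactly 2 kings
    p_both   = draw_prob(3, 2)                          # exactly 3 aces and 2 kings
    print(p_3aces, p_2kings, p_3aces + p_2kings - p_both)   # (c) by inclusion-exclusion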
Problem 60. A well-shuffled 52-card deck is dealt to 4 players. Find the probability
that each of the players gets an ace.
Problem 61. * We are given n objects, to be arranged in a sequence.
(a) Suppose that k out of the n objects are indistinguishable. Show that the number
of distinguishable object sequences is n!/k!.
(b) Suppose that we have r types of indistinguishable objects and, for each i, k_i
objects of type i. Show that the number of distinguishable object sequences is

n!/(k_1! k_2! ⋯ k_r!).
Solution. (a) Each one of the n! permutations corresponds to k! duplicates which are
obtained by permuting the k indistinguishable objects. Thus, the n! permutations can
be grouped into n! /k! groups of k! indistinguishable permutations that result in the
same object sequence. Therefore, the number of distinguishable object sequences is
n!/k!. For example, the three letters A, D, and D give the 3! = 6 permutations