Introduction to Probability: Solutions
David F. Anderson
Timo Seppäläinen
Benedek Valkó
Contents
Preface
Solutions to Chapter 1
Solutions to Chapter 2
Solutions to Chapter 3
Solutions to Chapter 4
Solutions to Chapter 5
Preface
This collection of solutions is a reference for the instructors who use our book.
The authors firmly believe that the best way to master new material is via problem
solving. Having all the detailed solutions readily available would undermine this
process. Hence, we ask that instructors not distribute this document to the students
in their courses.
The authors welcome comments and corrections to the solutions. A list of
corrections and clarifications to the textbook is updated regularly at the website
https://www.math.wisc.edu/asv/
Solutions to Chapter 1
1.2. (a) Since Bob has to choose exactly two options, Ω consists of the 2-element
subsets of the set {cereal, eggs, fruit}:
Ω = {{cereal, eggs}, {cereal, fruit}, {eggs, fruit}}.
The items in Bob’s breakfast do not come in any particular order, hence the
outcomes are sets instead of ordered pairs.
(b) The two outcomes in the event A are {cereal, eggs} and {cereal, fruit}. In
symbols,
A = {Bob’s breakfast includes cereal} = {{cereal, eggs}, {cereal, fruit}}.
1.3. (a) This is a Cartesian product where the first factor covers the outcome of
the coin flip ({H, T } or {0, 1}, depending on how you want to encode heads
and tails) and the second factor represents the outcome of the die. Hence
Ω = {0, 1} × {1, 2, . . . , 6} = {(i, j) : i = 0 or 1 and j ∈ {1, 2, . . . , 6}}.
(b) Now we need a larger Cartesian product space because the outcome has to
contain the coin flip and die roll of each person. Let ci be the outcome of the
coin flip of person i, and let di be the outcome of the die roll of person i. Index
i runs from 1 to 10 (one index value for each person). Each c_i ∈ {0, 1} and each
d_i ∈ {1, 2, . . . , 6}. Here are various ways of writing down the sample space:
Ω = ({0, 1} × {1, 2, . . . , 6})^10
  = {(c_1, d_1, c_2, d_2, . . . , c_10, d_10) : each c_i ∈ {0, 1} and each d_i ∈ {1, 2, . . . , 6}}
  = {(c_i, d_i)_{1≤i≤10} : each c_i ∈ {0, 1} and each d_i ∈ {1, 2, . . . , 6}}.
The last formula illustrates the use of indexing to shorten the writing of the
20-tuple of all outcomes. The number of elements is #Ω = 2^10 · 6^10 = 12^10 =
61,917,364,224.
(c) If nobody rolled a five, then each die outcome d_i comes from the set {1, 2, 3, 4, 6}
that has 5 elements. Hence the number of these outcomes is 2^10 · 5^10 = 10^10.
To get the number of outcomes where at least one person rolls a five, subtract
the number of outcomes where no one rolls a five from the total: 12^10 − 10^10 =
51,917,364,224.
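The counts above are easy to confirm numerically. The following short Python snippet (not part of the original solution; the variable names are ours) reproduces both numbers.

```python
# Counting outcomes for 10 people, each flipping a coin and rolling a die.
total = 2**10 * 6**10            # all outcomes, equals 12**10
no_five = 2**10 * 5**10          # outcomes in which nobody rolls a five
at_least_one_five = total - no_five

print(total)               # 61917364224
print(at_least_one_five)   # 51917364224
```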
1.4. (a) This is an example of sampling with replacement, where order matters.
Thus, the sample space is
Ω = {ω = (x_1, x_2, x_3) : x_i ∈ {states in the U.S.}}.
In other words, each sample point is a 3-tuple or ordered triple of U.S. states.
The problem statement contains the assumption that every day each state
is equally likely to be chosen. Since #Ω = 50^3 = 125,000, each sample point
ω has equal probability P{ω} = 1/50^3 = 1/125,000. This specifies the probability
measure completely because then the probability of any event A comes from
the formula P(A) = #A/125,000.
(b) The 3-tuple (Wisconsin, Minnesota, Florida) is a particular outcome, and hence
as explained above,
P((Wisconsin, Minnesota, Florida)) = 1/50^3.
(c) The number of ways to have Wisconsin come on Monday and Tuesday, but not
Wednesday is 1 · 1 · 49, with similar expressions for the other combinations.
Since there is only 1 way for Wisconsin to come each of the three days, we see
that the total number of favorable outcomes is
1 · 1 · 49 + 1 · 49 · 1 + 49 · 1 · 1 + 1 = 3 · 49 + 1 = 148.
Thus
P(Wisconsin's flag hung at least two of the three days)
  = (3 · 49 + 1)/50^3 = 148/125,000 = 37/31,250 = 0.001184.
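As a quick numerical cross-check (not part of the original solution), the count and the probability can be reproduced exactly with Python's Fraction type.

```python
from fractions import Fraction

# Wisconsin's flag on at least two of the three days.
favorable = 1 * 1 * 49 + 1 * 49 * 1 + 49 * 1 * 1 + 1   # 148
prob = Fraction(favorable, 50**3)
print(prob, float(prob))   # 37/31250 0.0011840
```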
1.5. (a) There are two natural sample spaces we can choose, depending upon
whether or not we want to let order matter.
If we let the order of the numbers matter, then we may choose
Ω_1 = {(x_1, . . . , x_5) : x_i ∈ {1, . . . , 40}, x_i ≠ x_j if i ≠ j},
the set of ordered 5-tuples of distinct elements from the set {1, 2, 3, . . . , 40}. In
this case #Ω_1 = 40 · 39 · 38 · 37 · 36 and P_1(ω) = 1/#Ω_1 for each ω ∈ Ω_1.
If we do not let order matter, then we take
Ω_2 = {{x_1, . . . , x_5} : x_i ∈ {1, 2, 3, . . . , 40}, x_i ≠ x_j if i ≠ j},
the set of 5-element subsets of the set {1, 2, 3, . . . , 40}. In this case #Ω_2 = \binom{40}{5}
and P_2(ω) = 1/#Ω_2 for each ω ∈ Ω_2.
(b) The correct calculation for this question depends on which sample space was
chosen in part (a).
When order matters, we imagine filling the positions of the 5-tuple with
three even and two odd numbers. There are \binom{5}{3} ways to choose the positions
of the three even numbers. The remaining two positions are for the two odd
numbers. We fill these positions in order, separately for the even and odd
numbers. There are 20 · 19 · 18 ways to choose the even numbers and 20 · 19
ways to choose the odd numbers. This gives
P(exactly three numbers are even) = [\binom{5}{3} · 20 · 19 · 18 · 20 · 19] / [40 · 39 · 38 · 37 · 36] = 475/1443.
When order does not matter, we choose sets. There are \binom{20}{3} ways to choose
a set of three even numbers between 1 and 40, and \binom{20}{2} ways to choose a set of
two odd numbers. Therefore, the probability can be computed as
P(exactly three numbers are even) = \binom{20}{3}\binom{20}{2} / \binom{40}{5} = 475/1443.
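The unordered count is straightforward to verify numerically; the short Python check below (not part of the original solution) confirms the fraction 475/1443.

```python
from fractions import Fraction
from math import comb

# Choose 3 of the 20 even numbers and 2 of the 20 odd numbers out of 5 picks from 40.
p = Fraction(comb(20, 3) * comb(20, 2), comb(40, 5))
print(p)   # 475/1443
```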
1.6. We give two solutions, first with an ordered sample, and then without order.
(a) Label the three green balls 1, 2, and 3, and label the yellow balls 4, 5, 6, and
7. We imagine picking the balls in order, and hence take
Ω = {(i, j) : i, j ∈ {1, 2, . . . , 7}, i ≠ j},
the set of ordered pairs of distinct elements from the set {1, 2, . . . , 7}. The
event of two different colored balls is
A = {(i, j) : (i ∈ {1, 2, 3} and j ∈ {4, . . . , 7}) or (i ∈ {4, . . . , 7} and j ∈ {1, 2, 3})}.
(b) We have #Ω = 7 · 6 = 42 and #A = 3 · 4 + 4 · 3 = 24. Thus,
P(A) = 24/42 = 4/7.
Alternatively, we could have chosen a sample space in which order does not
matter. In this case the size of the sample space is \binom{7}{2}. There are \binom{3}{1} ways to
choose one of the green balls and \binom{4}{1} ways to choose one yellow ball. Hence,
the probability is computed as
P(A) = \binom{3}{1}\binom{4}{1} / \binom{7}{2} = 4/7.
1.7. (a) Label the balls 1 through 7, with the green balls labeled 1, 2 and 3, and
the yellow balls labeled 4, 5, 6 and 7. Let
Ω = {(i, j, k) : i, j, k ∈ {1, 2, . . . , 7}, i ≠ j, j ≠ k, i ≠ k},
which captures the idea that order matters for this problem. Note that #Ω =
7 · 6 · 5. There are exactly
3 · 4 · 2 = 24
ways to choose first a green ball, then a yellow ball, and then a green ball. Thus
the desired probability is
P(green, yellow, green) = 24/(7 · 6 · 5) = 4/35.
(b) We can use the same reasoning as in the previous part, by accounting for all
the different orders in which the colors can come:
P(2 greens and one yellow) = P(green, green, yellow) + P(green, yellow, green) + P(yellow, green, green)
  = (3 · 2 · 4 + 3 · 4 · 2 + 4 · 3 · 2)/(7 · 6 · 5) = 72/210 = 12/35.
Alternatively, since this question does not require ordering the sample of
balls, we can take
Ω = {{i, j, k} : i, j, k ∈ {1, 2, . . . , 7}, i ≠ j, j ≠ k, i ≠ k},
the set of 3-element subsets of the set {1, 2, . . . , 7}. Now #Ω = \binom{7}{3}. There are
\binom{3}{2} ways to choose 2 green balls from the 3 green balls, and \binom{4}{1} ways to choose
one yellow ball from the 4 yellow balls. So the desired probability is
P(2 greens and one yellow) = \binom{3}{2}\binom{4}{1} / \binom{7}{3} = 12/35.
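A simulation gives a quick sanity check of the value 12/35 ≈ 0.343. The sketch below is not part of the original solution; it uses the same labeling of the balls (1 to 3 green, 4 to 7 yellow).

```python
import random

# Monte Carlo estimate of P(exactly 2 greens among 3 balls drawn without replacement).
trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if sum(b <= 3 for b in random.sample(range(1, 8), 3)) == 2
)
print(hits / trials)   # roughly 0.343, close to 12/35
```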
1.8. (a) Label the letters from 1 to 14 so that the first 5 are Es, the next 4 are As,
the next 3 are Ns and the last 2 are Bs.
Our ⌦ consists of (ordered) sequences of four distinct elements:
Ω = {(a_1, a_2, a_3, a_4) : a_i ≠ a_j for i ≠ j, a_i ∈ {1, 2, . . . , 14}}.
The size of Ω is 14 · 13 · 12 · 11 = 24024. (Because we can choose a_1 in 14 different
ways, then a_2 in 13 different ways, and so on.)
The event C consists of sequences (a1 , a2 , a3 , a4 ) consisting of two numbers
between 1 and 5, one between 6 and 9 and one between 10 and 12. We can
count these by constructing such a sequence step-by-step: we first choose the
positions of the two Es: we can do that in \binom{4}{2} = 6 ways. Then we choose a first E
out of the 5 choices and place it in the first chosen position. Then we choose
the second E out of the remaining 4 and place it in the second (remaining)
chosen position. Then we choose the A out of the 4 choices, and its position
(there are 2 possibilities left). Finally we choose the letter N out of the 3 choices
and place it in the remaining position (we only have one possibility here). In
each step the number of choices did not depend on the previous choices so we
can just multiply the numbers together to get 6 · 5 · 4 · 4 · 2 · 3 · 1 = 2880.
The probability of C is
P(C) = #C/#Ω = 2880/24024 = 120/1001.
(b) As before, we label the letters from 1 to 14 so that the first 5 are Es, the next
4 are As, the next 3 are Ns and the last 2 are Bs. Our Ω is the set of unordered
samples of size 4, or in other words: all subsets of {1, 2, . . . , 14} of size 4:
P(A) = P{ω ∈ [0, L] : ω ≤ L/5} + P{ω ∈ [0, L] : ω ≥ 4L/5} = (L/5)/L + (L/5)/L = 2/5.
1.10. (a) Since the outcome of the experiment is the number of times we roll the
die (as in Example 1.16), we take
Ω = {1, 2, 3, . . . } ∪ {∞}.
Element k in Ω means that it took k rolls to see the first four. Element ∞
means that four never appeared.
Next we deduce the probability measure P on Ω. Since Ω is a discrete
sample space (countably infinite), P is determined by giving the probabilities
of all the individual sample points.
For an integer k ≥ 1, we have
P(k) = P{needed k rolls} = P{no fours in the first k − 1 rolls, then a four}.
Each roll has 6 outcomes so the total number of outcomes from k rolls is 6^k.
Each roll can fail to be a four in 5 ways. Hence by taking the ratio of the
number of favorable outcomes over the total number of outcomes,
P(k) = P{no fours in the first k − 1 rolls, then a four} = 5^{k−1} · 1 / 6^k = (5/6)^{k−1} · 1/6.
To complete the specification of the measure P, we find the value P(∞). Since
the outcomes are mutually exclusive,
1 = P(Ω) = P(∞) + Σ_{k=1}^∞ P(k)
          = P(∞) + Σ_{k=1}^∞ (5/6)^{k−1} · 1/6
(reindex) = P(∞) + (1/6) Σ_{j=0}^∞ (5/6)^j
(geometric series) = P(∞) + (1/6) · 1/(1 − 5/6)
          = P(∞) + 1.
Thus, P(∞) = 0.
(b) We already deduced above that
P(the number four never appears) = P(∞) = 0.
Here is an alternative solution.
P(the number four never appears) ≤ P(no fours in the first n rolls) = (5/6)^n.
Since (5/6)^n → 0 as n → ∞ and the inequality holds for any n, the probability
on the left must be zero.
1.11. The sample space Ω that represents the dartboard itself is a square of side
length 20 inches. We can assume that the center of the board is at the origin. The
event A, that the dart hits within 2 inches of the center, is then the subset of Ω
described by A = {x : |x| ≤ 2}. Probability is now proportional to area, and so
P(A) = (area of A)/(area of the board) = π · 2^2 / 20^2 = π/100.
1.12. The sample space and probability measure for this experiment were described
in the solution to Exercise 1.10: P(k) = (5/6)^{k−1} · 1/6 for positive integers k.
(a) P(need at most 3 rolls) = P(1) + P(2) + P(3) = (1/6)(1 + 5/6 + (5/6)^2) = 91/216.
(b)
P(even number of rolls) = Σ_{m=1}^∞ P(2m) = Σ_{m=1}^∞ (5/6)^{2m−1} · 1/6 = (1/5) Σ_{m=1}^∞ (25/36)^m
  = (1/5) · (25/36)/(1 − 25/36) = 5/11.
1.13. (a) Imagine selecting one student uniformly at random from the school.
Thus, ⌦ is the set of students and each outcome is equally likely. Let W
be the subset of ⌦ consisting of those students who wear a watch. Let B be
the subset of students who wear a bracelet. We are told that
P (W c B c ) = 0.6, P (W ) = 0.25, P (B) = 0.30.
1.16. If we see only heads, I win $5. If we see 4 heads, I win $3. If we see 3
heads, I win $1. If we see 2 heads, I “win” -$1. If we see 1 heads, I “win” -$3.
Finally, if we see 0 heads, then I “win” -$5. Thus, the possible values of X are
{−5, −3, −1, 1, 3, 5}. The sample space for the 5 coin flips is Ω = {(x_1, . . . , x_5) :
x_i ∈ {H, T}} with #Ω = 2^5. Each individual outcome (x_1, . . . , x_5) of five flips has
probability 2^{−5}.
Let k ∈ {0, 1, . . . , 5}. To calculate the probability of exactly k heads we need
to count how many five-flip outcomes yield exactly k heads. The answer is \binom{5}{k}, the
number of ways of specifying which of the five flips are heads. Hence
P(precisely k heads) = (# ways to select k slots from the 5 for the k heads)/2^5 = \binom{5}{k} 2^{−5}.
Thus,
P(X = −5) = P(0 heads) = 2^{−5}
P(X = −3) = P(1 head) = 5 · 2^{−5}
P(X = −1) = P(2 heads) = \binom{5}{2} · 2^{−5}
P(X = 1) = P(3 heads) = \binom{5}{3} · 2^{−5}
P(X = 3) = P(4 heads) = \binom{5}{4} · 2^{−5}
P(X = 5) = P(5 heads) = 2^{−5}.
p_W(0) = P(W = 0) = (4 · 4)/(7 · 7) = 16/49,
p_W(1) = P(W = 1) = (4 · 3 + 3 · 4)/(7 · 7) = 24/49,
p_W(2) = P(W = 2) = (3 · 3)/(7 · 7) = 9/49.
1.18. The possible values of X are {3, 4, 5} as these are the possible lengths of the
words. The probability mass function is
P(X = 3) = P(we chose one of the letters of ARE) = 3/16
P(X = 4) = P(we chose one of the letters of SOME or DOGS) = 8/16 = 1/2
P(X = 5) = P(we chose one of the letters of BROWN) = 5/16.
1.19. The possible values of X are 5 and 1. For the probability mass function we
need P (X = 1) and P (X = 5). From the wording of the problem
P (X = 5) = P (dart lands within 2 inches of the center).
We may assume that the position of the dart is chosen uniformly from the disk of
radius 6 inches, and hence we may compute the probability above as the ratio of
the area of the disk of radius 2 to the area of the entire disk of radius 6:
P(dart lands within 2 inches of the center) = π · 2^2 / (π · 6^2) = 1/9.
Since P(X = 5) + P(X = 1) = 1, we get P(X = 1) = 1 − P(X = 5) = 8/9.
1.20. (a) One appropriate sample space is
Ω = {1, . . . , 6}^4 = {(x_1, x_2, x_3, x_4) : x_i ∈ {1, . . . , 6}}.
Note that #Ω = 6^4 = 1296. Since it is reasonable to assume that all outcomes
are equally likely, we set
P(ω) = 1/#Ω = 1/1296.
(b) To find P (A) and P (B) we count to find #A and #B, that is, the number of
outcomes in these events.
Begin with the easy observation: there is only one way for there to be four
fives, namely (5, 5, 5, 5). There are 5 ways to get three fives in the pattern
(5, 5, 5, X), one for each X ∈ {1, 2, 3, 4, 6}. Similarly, there are 5 ways to have
three fives in each of the patterns (5, 5, X, 5), (5, X, 5, 5) and (X, 5, 5, 5). Thus,
there are a total of 5 + 5 + 5 + 5 = 20 ways to have three fives. A slicker way to
calculate this would be to note that there are \binom{4}{1} = 4 ways to choose which roll
is not a five, and for each not-five we have 5 choices, thus altogether 4 · 5 = 20.
Continuing this logic, we see that the number of ways to have precisely two
fives is:
(#ways to choose the not-five rolls) · 5 · 5 = \binom{4}{2} · 5 · 5 = 150.
Thus,
P(A) = #A/#Ω = (1 + 20 + 150)/1296 = 171/1296 = 19/144.
Similarly,
P(B) = #B/#Ω = (5^4 + \binom{4}{1} · 5^3)/1296 = 1125/1296 = 125/144.
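For a quick numerical check (not part of the original solution), the two counts can be recomputed with Python; here we read B as the complementary event of at most one five, which is what the count 1125 = 5^4 + 4 · 5^3 corresponds to.

```python
from fractions import Fraction
from math import comb

p_A = Fraction(1 + comb(4, 1) * 5 + comb(4, 2) * 25, 6**4)   # at least two fives
p_B = Fraction(5**4 + comb(4, 1) * 5**3, 6**4)               # at most one five
print(p_A, p_B, p_A + p_B)   # 19/144 125/144 1
```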
(b) Use the same labels for the chips as in part (a). Our sample space is
Ω = {{x_1, x_2, x_3} : x_i ∈ {1, . . . , 7}, x_i ≠ x_j for i ≠ j}.
Note that the sample points are now subsets of size 3 instead of ordered triples,
and to indicate this the notation changed from (x_1, x_2, x_3) to {x_1, x_2, x_3}. We
have #Ω = \binom{7}{3} = (7 · 6 · 5)/3! = 35. #A = 3 · 2 · 2 = 12, the number of ways to choose
one of three black chips, one of two red chips and one of two green chips. Thus
P(A) = #A/#Ω = 12/35. The answer is the same as in part (a), as it should be.
1.22. (a) The sample space is the set of 52 cards. We can represent the cards with
numbers from 1 to 52, or with their names. Since each outcome is equally
likely, P{ω} = 1/52 for any fixed card ω. For any subset A of cards we have
P(A) = #A/52.
(b) An event is a subset of the sample space Ω. In part (a) we saw that for an event
A we have P(A) = #A/52. So the desired event must have three elements. Any
such set will work, for example {♥2, ♥3, ♥K}. In words, this is the event that
the chosen card is the two of hearts, the three of hearts or the king of hearts.
(c) By part (a), if P(A) = 1/5 then #A/52 = 1/5, which forces #A = 52/5. Since 52/5 is not
an integer, there cannot be a subset with this many elements. Consequently
this probability space has no event with probability 1/5.
1.23. (a) You win if the prize is behind door 1. Probability 1/3.
(b) You win if the prize is behind door 2 or 3. Probability 2/3.
1.24. Choose door 3 and commit to switch. Then probability of winning is p1 + p2 .
1.25. (a) Since there are 5 restaurants with at least one friend out of 6 total restaurants, this probability is 5/6.
(b) She has 7 friends in total. 3 of them are at a restaurant alone and 4 of them
are at a restaurant with somebody else. Thus the probability that she calls a
friend at a restaurant with 2 friends present is 4/7.
1.26. This is sampling without replacement for it would make no sense to put the
same person twice on the committee. We are choosing 4 out of 15. We can do this
with order (there is a first pick, a second pick, etc) or without order (we choose
the subset of 4). It does not matter which approach we choose. But once we have
chosen a method, our calculations have to be consistent. If we work with order then
chips (a1 , a2 or a3 ) and place it in position i. (We have three choices for that.)
Then we distribute the remaining 6 numbers among the remaining 6 places.
(There are 6! ways we can do that.) Thus for any 1 ≤ i ≤ 7 we get #B_i = 3 · 6!
and then
P(B_i) = #B_i/#Ω = 3 · 6!/7! = 3/7.
1.28. Assume that both m and n are at least 1 so the problem is not trivial.
(a) Sampling without replacement. We can compute the answer using either an
ordered or an unordered sample. It helps to assume that the balls are labeled
(e.g. by numbering them from 1 to m + n), although the actual labeling will
not play a role in the computation.
With an ordered sample we have (m + n)(m + n − 1) outcomes (we have m + n
choices for the first pick and m + n − 1 choices for the second). The favorable
outcomes can be counted by considering green-green and yellow-yellow pairs
separately: their number is m(m − 1) + n(n − 1). The answer is the ratio of
the number of favorable outcomes to the total number of outcomes,
P{(g,g) or (y,y)} = [m(m − 1) + n(n − 1)] / [(m + n)(m + n − 1)].
The unordered sample calculation gives the same answer:
P{a set of two greens or a set of two yellows} = [\binom{m}{2} + \binom{n}{2}] / \binom{m+n}{2} = [m(m − 1) + n(n − 1)] / [(m + n)(m + n − 1)].
Note: for integers 0 ≤ k < ℓ, the convention is \binom{k}{ℓ} = 0. This makes the
answers above correct even if m or n or both are 1.
(b) Sampling with replacement. Now the sample has to be ordered (there is a
first pick and a second pick). The total number of outcomes is (m + n)^2, and
the number of favorable outcomes (again counting the green-green and yellow-yellow pairs separately) is m^2 + n^2. This gives
P{(g,g) or (y,y)} = (m^2 + n^2)/(m + n)^2.
(c) We simplify the inequality through a sequence of equivalences, by cancelling
factors, multiplying away the denominators, and then cancelling some more.
answer to (a) < answer to (b)
⟺ [m(m − 1) + n(n − 1)] / [(m + n)(m + n − 1)] < (m^2 + n^2)/(m + n)^2
⟺ [m(m − 1) + n(n − 1)] / (m + n − 1) < (m^2 + n^2)/(m + n)
⟺ (m(m − 1) + n(n − 1))(m + n) < (m^2 + n^2)(m + n − 1)
⟺ (m^2 − m + n^2 − n)(m + n) < (m^2 + n^2)(m + n) − m^2 − n^2
⟺ (−m − n)(m + n) < −m^2 − n^2
⟺ (m + n)^2 > m^2 + n^2
⟺ 2mn > 0.
The last inequality is always true for positive m and n. Since the last inequality
is equivalent to the first one, the first one is also always true.
The conclusion we take from this is that if you want to maximize your
chances of getting two of the same color, you want to sample with replacement
rather than without replacement. Intuitively this should be obvious: once you
remove a ball, you have diminished the chances of drawing another one of the
same color.
1.29. (a) Label the liberals 1 through 7 and the conservatives 8 through 13. We
do not care about order, so
Ω = {{x_1, x_2, x_3, x_4, x_5} : x_i ∈ {1, . . . , 13}, x_i ≠ x_j if i ≠ j},
in other words the set of 5-element subsets of the set {1, 2, . . . , 13}. Note that
#Ω = \binom{13}{5}. The event A is
A solution without order comes by erasing the labels of the rooks and only
considering the set of squares they occupy. For the number of sets of 8 squares
that share no row or column we can take the count (8!)^2 from the previous answer
and divide it by the number of orderings of the rooks, namely 8!. This leaves
(8!)^2/8! = 8! as the number of sets of 8 squares that share no row or column.
Alternately, pick the squares one column at a time. There are 8 choices for the
square from the first column, 7 available squares in the second column, 6 in the
third, and so on, to give 8! sets of 8 squares that share no row or column.
The total number of sets of 8 squares is \binom{64}{8}. So again
P(no two rooks can capture each other) = 8!/\binom{64}{8} = (8!)^2/(64 · 63 · 62 · · · 57) ≈ 0.000009109.
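The small probability above is easy to confirm numerically with a one-line Python check (not part of the original solution).

```python
from math import comb, factorial

# P(no two of the 8 randomly placed rooks share a row or column).
print(factorial(8) / comb(64, 8))   # approximately 9.109e-06
```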
1.31. (a) Number the cards in the deck 1, 2, . . . , 52, with the numbers 1, 2, 3, 4 for
the four aces, and the number 1 for the ace of spades. We sample two cards
without replacement. We solve the problem without considering order. Thus
we set our sample space to be
Ω = {{x_1, x_2} : x_1 ≠ x_2, 1 ≤ x_i ≤ 52 for i = 1, 2},
the set of 2-element subsets of the set {1, 2, . . . , 52}. We have #Ω = \binom{52}{2} = (52 · 51)/2! = 1326.
We need to compute the probability of the event A that both of the
chosen cards are aces and one of them is the ace of spades. Thus A =
{{1, 2}, {1, 3}, {1, 4}} and #A = 3. From this we get P(A) = #A/#Ω = 3/1326 = 1/442.
(b) We use the same sample space as in part (a). We need to compute the proba-
bility of the event B that at least one of the chosen cards is an ace. It is a bit
easier to compute the probability of the complement B^c: this is the event that
none of the two chosen cards are aces. B^c is the collection of 2-element sets
{x_1, x_2} ∈ Ω such that both x_1 ≥ 5 and x_2 ≥ 5. There are 48 cards that are
not aces. The number of 2-element sets of such cards is \binom{48}{2} = (48 · 47)/2! = 1128.
Thus #B^c = 1128 and P(B^c) = #B^c/#Ω = 1128/1326 = 188/221. Now we can compute
P(B) as P(B) = 1 − P(B^c) = 1 − 188/221 = 33/221.
1.32. Here is one way to determine the number of ways to be dealt a full house.
We take as our sample space the set of 5-element subsets of the deck of cards:
Ω = {{x_1, . . . , x_5} : x_i ∈ {deck of 52}, x_i ≠ x_j if i ≠ j}.
Note that #Ω = \binom{52}{5}.
Now count the number of ways to get a full house. First, choose the face value
for the 3 cards that share a face value. There are 13 options. Then select 3 of the 4
suits for this face value. There are \binom{4}{3} ways to do that. We now have the three of
a kind selected. Next, choose another face value for the remaining two cards from
the remaining 12 face values. Then select 2 of the 4 suits for this face value. There
are \binom{4}{2} ways to do that. By the multiplication rule we conclude that there are
13 · \binom{4}{3} · 12 · \binom{4}{2}
ways to be dealt a full house. Since there are a total of \binom{52}{5} poker hands, the
probability is
P(full house) = 13 · 12 · \binom{4}{3}\binom{4}{2} / \binom{52}{5} ≈ 0.00144.
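As a numerical cross-check (not part of the original solution), the full-house probability evaluates as follows.

```python
from math import comb

# Probability of a full house in a 5-card poker hand.
print(13 * comb(4, 3) * 12 * comb(4, 2) / comb(52, 5))   # approximately 0.00144
```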
1.33. We let our sample space be the set of ordered 5-tuples from the set {1, 2, 3, 4, 5, 6}:
Ω = {(x_1, . . . , x_5) : x_i ∈ {1, . . . , 6}}.
This comes from sampling five times with replacement from {1, 2, 3, 4, 5, 6}, to produce an ordered sample. Note that #Ω = 6^5.
We count the number of 5-tuples that give a full house. First pick one of the
six numbers (6 choices) for the face value that appears three times. Then pick
another number (5 choices) for the face value that appears twice. Next, select 3
of the 5 rolls for the first number. There are \binom{5}{3} ways to choose three slots from
five. The remaining two positions are for the second number. (Here is an example:
suppose we picked the numbers “4” and “6” and then positions {1, 3, 4}. Then our
full house would be (4, 6, 4, 4, 6).)
Thus there are 6 · 5 · \binom{5}{3} ways to roll a full house, and the probability is
P(full house) = 6 · 5 · \binom{5}{3} / 6^5 ≈ 0.03858.
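The dice full-house probability can also be confirmed with a short Python check (not part of the original solution).

```python
from math import comb

# Three rolls showing one value and two rolls showing another, out of five rolls.
print(6 * 5 * comb(5, 3) / 6**5)   # approximately 0.03858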
1.34. Let the corners of the unit square be the points (0, 0), (0, 1), (1, 1), (1, 0).
The circle of radius of 1/3 around the random point is completely within the
square if and only if the random point lies within the smaller square with cor-
ners (1/3, 1/3), (2/3, 1/3), (2/3, 2/3), (1/3, 2/3). The unit square has area one and
the smaller square has area 1/9. Consequently
P(the circle lies inside the unit square) = (area of the smaller square)/(area of original unit square) = (1/9)/1 = 1/9.
1.35. (a) Our sample space Ω is the set of points in the triangle with vertices (0, 0),
(3, 0) and (0, 3). The area of Ω is (3 · 3)/2 = 9/2.
The event A describes the points in Ω with distance less than 1 from the
y-axis. These are exactly the points in the trapezoid with vertices (0, 0), (1, 0),
(1, 2), (0, 3). The area of A is ((3 + 2) · 1)/2 = 5/2. Since we are choosing our point
uniformly from Ω, we can compute P(A) using the ratio of areas:
P(A) = (area of A)/(area of Ω) = (5/2)/(9/2) = 5/9.
(b) We use the same sample space as in part (a). The event B describes the set
of points in Ω with distance more than 1 from the origin. The event B^c is the
set of points that are in Ω and at most distance one from the origin. B^c is
a quarter circle with center at (0, 0), radius 1, and corner points at (1, 0) and
(0, 1). The area of B^c is π/4. Thus
P(B^c) = (area of B^c)/(area of Ω) = (π/4)/(9/2) = π/18
and then
P(B) = 1 − P(B^c) = 1 − π/18.
1.36. (a) Since (X, Y ) is a uniformly random point, probability is proportional to
area:
P(a < X < b)
  = P(point (X, Y) lies in rectangle with vertices (a, 0), (b, 0), (b, 1), (a, 1))
  = (area of rectangle with vertices (a, 0), (b, 0), (b, 1), (a, 1)) / (area of square with vertices (0, 0), (1, 0), (1, 1), (0, 1))
  = b − a.
Thus, X has a uniform distribution on [0, 1].
(b) The region of the xy plane defined by the inequality |x − y| ≤ 1/4 consists of
the region between the lines y = x − 1/4 and y = x + 1/4. Intersecting this
region with the unit square gives a region with an area of 7/16. (Easiest to see
by subtracting the complementary triangles from the unit square.) Thus, the
desired probability is also 7/16 since the unit square has an area of one.
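A simulation of the uniform point gives a quick check of the value 7/16 = 0.4375; the sketch below is not part of the original solution.

```python
import random

# Monte Carlo estimate of P(|X - Y| <= 1/4) for (X, Y) uniform on the unit square.
trials = 200_000
hits = sum(
    1 for _ in range(trials)
    if abs(random.random() - random.random()) <= 0.25
)
print(hits / trials)   # roughly 0.4375
```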
1.37. (a) Let Bk = {Mary wins on her kth roll and her kth roll is a six}.
P(B_k) = (4 · 2)^{k−1} · 4 · 1 / (6 · 6)^k = (8/36)^{k−1} · (4/36) = (2/9)^{k−1} · 1/9.
Then
P(Mary wins and her last roll is a six) = Σ_{k=1}^∞ P(B_k) = Σ_{k=1}^∞ (2/9)^{k−1} · 1/9 = 1/7.
Then
P(Mary wins) = Σ_{k=1}^∞ P(A_k) = Σ_{k=1}^∞ (2/9)^{k−1} · 2/3 = 6/7.
(c) Suppose Peter starts. Then the game lasts an even number of rolls precisely
when Mary wins. Thus the calculation is the same as in the example. Let
D_m = {the game lasts exactly m rolls}. Then for k ≥ 1,
P(D_{2k}) = (4 · 2)^{k−1} · 4 · 4 / (6 · 6)^k = (2/9)^{k−1} · 4/9
and
P(the game lasts an even number of rolls) = Σ_{k=1}^∞ P(D_{2k}) = Σ_{k=1}^∞ (2/9)^{k−1} · 4/9 = 4/7.
If Mary starts, then an even-roll game ends with Peter's roll. In this case
P(D_{2k}) = (2 · 4)^{k−1} · 2 · 2 / (6 · 6)^k = (2/9)^{k−1} · 1/9
and
P(the game lasts an even number of rolls) = Σ_{k=1}^∞ P(D_{2k}) = Σ_{k=1}^∞ (2/9)^{k−1} · 1/9 = 1/7.
(d) Let again D_m = {the game lasts exactly m rolls}. Suppose Peter starts. Then
for k ≥ 1
P(D_{2k}) = (4 · 2)^{k−1} · 4 · 4 / (6 · 6)^k = (2/9)^{k−1} · 4/9
and
P(D_{2k−1}) = (4 · 2)^{k−1} · 2 / ((6 · 6)^{k−1} · 6) = (2/9)^{k−1} · 1/3.
Next, for j ≥ 1:
P(game lasts at most 2j rolls) = Σ_{m=1}^{2j} P(D_m) = Σ_{k=1}^{j} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} (2/9)^{k−1} · 4/9 + Σ_{k=1}^{j} (2/9)^{k−1} · 1/3 = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 = (7/9) Σ_{i=0}^{j−1} (2/9)^i
  = (7/9) · (1 − (2/9)^j)/(1 − 2/9) = 1 − (2/9)^j
and
P(game lasts at most 2j − 1 rolls) = Σ_{m=1}^{2j−1} P(D_m) = Σ_{k=1}^{j−1} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} (2/9)^{k−1} · 4/9 − (2/9)^{j−1} · 4/9 + Σ_{k=1}^{j} (2/9)^{k−1} · 1/3
  = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 − (2/9)^{j−1} · 4/9 = (7/9) Σ_{i=0}^{j−1} (2/9)^i − (2/9)^{j−1} · 4/9
  = 1 − (2/9)^j − (2/9)^{j−1} · 4/9 = 1 − 3(2/9)^j.
Finally, suppose Mary starts. Then for k ≥ 1
P(D_{2k}) = (2 · 4)^{k−1} · 2 · 2 / (6 · 6)^k = (2/9)^{k−1} · 1/9
and
P(D_{2k−1}) = (2 · 4)^{k−1} · 4 / ((6 · 6)^{k−1} · 6) = (2/9)^{k−1} · 2/3.
Next, for j ≥ 1:
P(game lasts at most 2j rolls) = Σ_{m=1}^{2j} P(D_m) = Σ_{k=1}^{j} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} (2/9)^{k−1} · 1/9 + Σ_{k=1}^{j} (2/9)^{k−1} · 2/3 = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 = (7/9) Σ_{i=0}^{j−1} (2/9)^i
  = (7/9) · (1 − (2/9)^j)/(1 − 2/9) = 1 − (2/9)^j
and
P(game lasts at most 2j − 1 rolls) = Σ_{m=1}^{2j−1} P(D_m) = Σ_{k=1}^{j−1} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} [P(D_{2k}) + P(D_{2k−1})] − P(D_{2j}) = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 − (2/9)^{j−1} · 1/9
  = 1 − (2/9)^j − (2/9)^{j−1} · 1/9 = 1 − (3/2)(2/9)^j.
We see that when Mary starts, the game tends to be over faster.
1.38. If the choice is to be uniformly random, then each integer has to have the same
probability, say P {k} = c for each integer k. If c > 0, choose an integer n > 1/c.
Then by the additivity of probability over mutually exclusive alternatives,
P (the outcome is between 1 and n) = P {1, 2, . . . , n} = nc > 1.
Since total probability cannot exceed 1, it must be that c = 0 and so P {k} = 0 for
each positive integer k. The total sample space Ω is the union of the sequence of
singletons {k} as k ranges over all positive integers. Hence again by the additivity
axiom
1 = P(Ω) = Σ_{k=1}^∞ P{k} = Σ_{k=1}^∞ 0 = 0.
We have a contradiction. Thus there cannot be a sample space and probability P
that represents a uniformly chosen random positive integer.
1.39. (a) Define
A = the event that a portion of the bill was paid using cash,
B = the event that a portion of the bill was paid using check,
C = the event that a portion of the bill was paid using card.
Note that we know the following:
P (A) = 0.78, P (B) = 0.16, P (C) = 0.26
P (AC) = 0.13, P (AB) = 0.06, P (BC) = 0.04
P (ABC) = 0.03.
The probability that someone paid with cash only is now seen to be
P(A ∩ (B ∪ C)^c) = P(A) − P(AB) − P(AC) + P(ABC)
  = 0.78 − 0.06 − 0.13 + 0.03 = 0.62.
The probability that someone paid with check only is
P(B ∩ (A ∪ C)^c) = P(B) − P(BC) − P(AB) + P(ABC)
  = 0.16 − 0.04 − 0.06 + 0.03 = 0.09.
The probability that someone paid with card only is
P(C ∩ (A ∪ B)^c) = P(C) − P(AC) − P(BC) + P(ABC)
  = 0.26 − 0.13 − 0.04 + 0.03 = 0.12.
So the probability of the union of these three mutually disjoint sets is,
P (only one method of payment)
= P (cash only) + P (check only) + P (card only)
= 0.62 + 0.09 + 0.12 = 0.83.
(b) Define the event
D = {at least one bill was paid using two or more methods}.
Then Dc is the event that both bills were paid using only one method. By part (a),
we know that there are 83 bills that were paid with only one method. Hence, since
there are precisely 100
2 ways to choose the two checks from the 100, and precisely
83
2 ways to choose the two bills from the pool of 83, we have
83
83 · 82
P (D) = 1 P (Dc ) = 1 2
100 =1 ⇡ 0.3125.
2
100 · 99
Next we derive the probabilities that appear in the equation above. The out-
comes of this experiment are 4-tuples from the set {green, red, yellow, white}.
The total number of 4-tuples is 4^4 = 256.
P(G) = P(exactly two greens) = \binom{4}{2} · 3 · 3 / 256 = 27/128.
The numerator above is derived as follows: there are \binom{4}{2} ways to pick the positions
of the two greens in the 4-tuple. For both of the remaining two positions we have
3 colors to choose from. By the same reasoning, P(G) = P(R) = P(Y) = P(W) = 27/128.
An event of type AB above means that the four draws yielded two balls of color
a and two balls of color b, where a and b are two distinct particular colors. The
number of 4-tuples in the event AB is \binom{4}{2} = 6. We can even list them easily. Here
they are in lexicographic order:
aabb, abab, abba, baab, baba, bbaa.
Thus P(AB) = 6/256 = 3/128.
Events of the type ABC are empty because four draws cannot yield three
different colors that each appear exactly twice. For the same reason GRYW = ∅.
Putting everything together gives
P(at least one color is repeated exactly twice) = 4 · 27/128 − 6 · 3/128 = 45/64 ≈ 0.7031.
1.41. Let A1 , A2 , A3 be the events that person 1, 2, and 3 win no games, respec-
tively. Then we want
P(A_1 ∪ A_2 ∪ A_3) = P(A_1) + P(A_2) + P(A_3) − P(A_1A_2) − P(A_1A_3) − P(A_2A_3) + P(A_1A_2A_3),
where we used inclusion-exclusion. Since each person has a probability of 2/3 of
not winning each particular game, we have
P(A_i) = (2/3)^4
for each i ∈ {1, 2, 3}. Event A_1A_2 is equivalent to saying that person 3 won all
four games, and analogously for A_1A_3 and A_2A_3. Hence
P(A_1A_2) = P(A_1A_3) = P(A_2A_3) = (1/3)^4.
Finally, we have P (A1 A2 A3 ) = 0 because somebody had to win at least one game.
Thus,
P(A_1 ∪ A_2 ∪ A_3) = 3 · (2/3)^4 − 3 · (1/3)^4 = 5/9.
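The value 5/9 ≈ 0.556 can be checked by simulation. The sketch below (not part of the original solution) assumes each of the four games is won by one of the three players, chosen uniformly at random and independently, which matches the computation P(A_i) = (2/3)^4 above.

```python
import random

# Monte Carlo estimate of P(at least one player wins none of the 4 games).
trials = 100_000
hits = 0
for _ in range(trials):
    winners = {random.randint(1, 3) for _ in range(4)}   # set of players who won something
    if len(winners) < 3:                                  # some player won no game
        hits += 1
print(hits / trials)   # roughly 0.556
```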
1.42. By inclusion-exclusion and the bound P(A ∪ B) ≤ 1,
From this we can get the statement step by step for larger and larger values of n.
For n = 3 we can use the n = 2 statement twice, first for A_1 ∪ A_2 and A_3:
For general n one can do the same by repeating the procedure n − 1 times.
The last step of the proof can also be finished with mathematical induction.
Here is the induction step. If the statement is assumed to be true for n − 1 then,
first by the case of two events and then by the induction assumption,
1.44. Let Ω = {(i, j) : i, j ∈ {1, . . . , 6}} be the sample space of the two rolls of the
two dice (order matters). Note that #Ω = 36. For (i, j) ∈ Ω we let X = max{i, j}
and Y = min{i, j}.
We now have
P(X = 6) = P(X ≤ 6) − P(X ≤ 5) = 1 − 25/36 = 11/36
P(X = 5) = P(X ≤ 5) − P(X ≤ 4) = 25/36 − 16/36 = 9/36
P(X = 4) = P(X ≤ 4) − P(X ≤ 3) = 16/36 − 9/36 = 7/36
P(X = 3) = P(X ≤ 3) − P(X ≤ 2) = 9/36 − 4/36 = 5/36
P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = 4/36 − 1/36 = 3/36
P(X = 1) = P(X ≤ 1) = 1/36.
(c) We can use similar reasoning for the probabilities associated with Y:
P(Y ≥ 1) = 1
P(Y ≥ 2) = (# ways to roll only 2s or higher)/36 = 5^2/36 = 25/36
P(Y ≥ 3) = (# ways to roll only 3s or higher)/36 = 4^2/36 = 16/36
P(Y ≥ 4) = (# ways to roll only 4s or higher)/36 = 3^2/36 = 9/36
P(Y ≥ 5) = (# ways to roll only 5s or higher)/36 = 2^2/36 = 4/36
P(Y ≥ 6) = (# ways to roll only 6s or higher)/36 = 1^2/36 = 1/36,
P(Y = 1) = P(Y ≥ 1) − P(Y ≥ 2) = 1 − 25/36 = 11/36
P(Y = 2) = P(Y ≥ 2) − P(Y ≥ 3) = 25/36 − 16/36 = 9/36
P(Y = 3) = P(Y ≥ 3) − P(Y ≥ 4) = 16/36 − 9/36 = 7/36
P(Y = 4) = P(Y ≥ 4) − P(Y ≥ 5) = 9/36 − 4/36 = 5/36
P(Y = 5) = P(Y ≥ 5) − P(Y ≥ 6) = 4/36 − 1/36 = 3/36
P(Y = 6) = P(Y ≥ 6) = 1/36.
1.45. The possible values of X are 4, 3, 2, 1, 0, because you can win at most 4 dollars.
The probability mass function is
P(X = 4) = P(the first six was rolled on the first roll) = 1/6
P(X = 3) = P(the first six was rolled on the 2nd roll) = 5/6^2
P(X = 2) = P(the first six was rolled on the 3rd roll) = 5^2/6^3
P(X = 1) = P(the first six was rolled on the 4th roll) = 5^3/6^4
P(X = 0) = P(no six was rolled in the first 4 rolls) = 5^4/6^4.
You can check that these probabilities add up to 1, as they should.
1.46. To simplify the counting task we imagine that all four balls are drawn from
the urn one by one, and then let X denote the number of red balls that come before
the yellow. (This is subtly different from the setup of the problem, which says that
we stop drawing balls once we see the yellow. This distinction makes no difference for
the value that X takes.) Number the red balls 1, 2 and 3, and number the yellow
ball 4. Then the sample space is
Ω = {(a_1, a_2, a_3, a_4) : (a_1, a_2, a_3, a_4) is a permutation of (1, 2, 3, 4)}.
In other words, Ω is the set of all permutations of the numbers 1, 2, 3, 4 and consequently #Ω = 4! = 24.
The possible values of X are {0, 1, 2, 3}. To compute the probabilities P (X = k)
we count the number of ways in which each event can take place.
P(X = 0) = P(yellow came first) = (1 · 3 · 2 · 1)/24 = 1/4.
The numerator equals the number of ways to choose one yellow (1) times the number
of ways to choose the first red (3) times the number of ways to choose the second
red (2) times the number of ways to choose the last red (1). By similar reasoning,
P(X = 1) = P(yellow came second) = (3 · 1 · 2 · 1)/24 = 1/4
P(X = 2) = P(yellow came third) = (3 · 2 · 1 · 1)/24 = 1/4
P(X = 3) = P(yellow came fourth) = (3 · 2 · 1 · 1)/24 = 1/4.
1.47. Since ω ∈ [0, 1], the random variable Z satisfies Z(ω) = e^ω ∈ [1, e]. Thus for
t < 1 the event {Z ≤ t} is empty and has probability P(Z ≤ t) = 0. If t ≥ e then
{Z ≤ t} = Ω (in other words, Z ≤ t is always true) and so P(Z ≤ t) = 1 for t ≥ e.
For 1 ≤ t < e we have this equality of events:
{Z ≤ t} = {ω : e^ω ≤ t} = {ω : ω ≤ ln t}.
P(Y = k) = P([k/10, (k + 1)/10)) = 1/10 for each k ∈ {0, 1, . . . , 9}.
1.49. (a) To answer the question with inclusion-exclusion, let A_i = {ith draw is red}.
Then B = ∪_{i=1}^ℓ A_i. To apply (1.20) we need the probabilities P(A_{i_1} ∩ · · · ∩ A_{i_k})
for each choice of indices 1 ≤ i_1 < · · · < i_k ≤ ℓ. To see how this goes, let us
first derive the example
P(A_2 ∩ A_5) = P(the 2nd draw and 5th draw are red)
by counting favorable outcomes and total outcomes. Each of the ℓ draws comes
from a set of n balls, so #Ω = n^ℓ. The number of favorable outcomes is
n · 3 · n · n · 3 · n · · · n = n^{ℓ−2} · 3^2 because the second and fifth draws are restricted
to the 3 red balls, and the other ℓ − 2 draws are unrestricted. This gives
P(A_2 ∩ A_5) = n^{ℓ−2} · 3^2 / n^ℓ = (3/n)^2.
The same reasoning gives for any choice of k indices 1 ≤ i_1 < · · · < i_k ≤ ℓ
P(A_{i_1} ∩ · · · ∩ A_{i_k}) = n^{ℓ−k} · 3^k / n^ℓ = (3/n)^k.
Then
P(B) = Σ_{k=1}^ℓ (−1)^{k+1} Σ_{1≤i_1<···<i_k≤ℓ} P(A_{i_1} ∩ · · · ∩ A_{i_k})
  = Σ_{k=1}^ℓ (−1)^{k+1} \binom{ℓ}{k} (3/n)^k = −Σ_{k=1}^ℓ \binom{ℓ}{k} (−3/n)^k
  = 1 − Σ_{k=0}^ℓ \binom{ℓ}{k} (−3/n)^k = 1 − (1 − 3/n)^ℓ.
In the second to last equality above we added and subtracted the term for
k = 0 which is 1. This enabled us to apply the binomial theorem (Fact D.2 in
Appendix D).
(b) Let B_k = {a red ball is seen exactly k times} for 1 ≤ k ≤ ℓ. There are \binom{ℓ}{k}
ways to decide which k of the ℓ draws produce the red ball. Thus there are
altogether \binom{ℓ}{k} 3^k (n − 3)^{ℓ−k} ways to draw exactly k red balls. Then
P(B_k) = \binom{ℓ}{k} 3^k (n − 3)^{ℓ−k} / n^ℓ = \binom{ℓ}{k} (3/n)^k (1 − 3/n)^{ℓ−k}
and then by the binomial theorem (add and subtract the k = 0 term)
P(B) = Σ_{k=1}^ℓ P(B_k) = Σ_{k=1}^ℓ \binom{ℓ}{k} (3/n)^k (1 − 3/n)^{ℓ−k}
  = Σ_{k=0}^ℓ \binom{ℓ}{k} (3/n)^k (1 − 3/n)^{ℓ−k} − (1 − 3/n)^ℓ = 1 − (1 − 3/n)^ℓ.
(c) The quickest solution comes by using the complement B^c = {each draw is green}.
P(B) = 1 − P(B^c) = 1 − (n − 3)^ℓ/n^ℓ = 1 − (1 − 3/n)^ℓ.
Solutions to Chapter 2
Since the outcomes are equally likely, we can equivalently find the answer from
P(A|B) = #AB/#B = 3/5.
2.2. A = {second flip is tails} = {(H, T, H), (H, T, T ), (T, T, H), (T, T, T )},
B = {at most one tails} = {(H, H, H), (H, H, T ), (H, T, H), (T, H, H)}.
Hence AB = {(H, T, H)}, and since we have equally likely outcomes,
P(A | B) = P(AB)/P(B) = #AB/#B = 1/4.
2.3. We set the sample space as Ω = {1, 2, . . . , 100}. We have #Ω = 100 and each
outcome is equally likely.
Let A denote the event that the chosen number is divisible by 3 and B denote
the event that at least one digit is equal to 5. Then
B = {5, 15, 25, . . . , 95} ∪ {50, 51, . . . , 59}
and #B = 19. (As there are 10 numbers with 5 as the last digit, 10 numbers with
5 at the tens place, and 55 was counted both times.) We also have
AB = {15, 45, 51, 54, 57, 75}, #AB = 6.
This gives P(A|B) = P(AB)/P(B) = (6/100)/(19/100) = 6/19.
2.4. Let A be the event that we picked the ball labeled 5 and B the event that we
picked the first urn. Then we have P(B) = 1/2, P(B^c) = P(we picked the second urn) =
1/2. Moreover, from the setup of the problem
P(A|B) = P(we chose the number 5 | we chose from the first urn) = 0,
P(A|B^c) = P(we chose the number 5 | we chose from the second urn) = 1/3.
We compute P(A) by conditioning on B and B^c:
P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = 0 · 1/2 + 1/3 · 1/2 = 1/6.
2.5. Let A be the event that we picked the number 2 and B the event that we picked
the first urn. Then we have P(B) = 1/5, P(B^c) = P(we picked the second urn) =
4/5. Moreover, from the setup of the problem
P(A|B) = P(we chose the number 2 | we chose from the first urn) = 1/3,
P(A|B^c) = P(we chose the number 2 | we chose from the second urn) = 1/4.
Then we can compute P(A) by conditioning on B and B^c:
P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = 1/3 · 1/5 + 1/4 · 4/5 = 4/15.
2.6. Define events
A = {Alice watches TV tomorrow} and B = {Betty watches TV tomorrow}.
(a) P (AB) = P (A)P (B|A) = 0.6 · 0.8 = 0.48.
(b) Intuitively, the answer must be the same 0.48 as in part (a) because Betty
cannot watch TV unless Alice is also watching. Mathematically, this says that
P (B|Ac ) = 0. Then by the law of total probability,
P (B) = P (B|A)P (A) + P (B|Ac )P (Ac ) = 0.8 · 0.6 + 0 · 0.4 = 0.48.
(c) P(AB^c) = P(A) − P(AB) = 0.6 − 0.48 = 0.12. Or, by conditioning and using
the outcome of Exercise 2.7(a),
P(AB^c) = P(A)P(B^c|A) = P(A)(1 − P(B|A)) = 0.6 · 0.2 = 0.12.
2.7. (a) By definition P(A^c|B) = P(A^cB)/P(B). We have A^cB ∪ AB = B, and the two
sets on the left are disjoint, so P(A^cB) + P(AB) = P(B), and P(A^cB) =
P(B) − P(AB). This gives
P(A^c|B) = P(A^cB)/P(B) = (P(B) − P(AB))/P(B) = 1 − P(AB)/P(B) = 1 − P(A|B).
(b) From part (a) we have P(A^c|B) = 1 − P(A|B) = 0.4. Then P(A^cB) =
P(A^c|B)P(B) = 0.4 · 0.5 = 0.2.
2.8. Let A1 , A2 , A3 denote the events that the first, second and third cards are
queen, king and ace, respectively. We need to compute P (A1 A2 A3 ). One could
do this by counting favorable outcomes. But conditional probabilities provide an
easier way because then we can focus on picking one card at a time. We just have
to keep track of how earlier picks influence the probabilities of the later picks.
We have P(A_1) = 4/52 = 1/13 since there are 52 equally likely choices for the first
pick and four of them are queens. The conditional probability P(A_2 | A_1) must
reflect the fact that one queen has been removed from the deck and is no longer
a possible outcome. Since the outcomes are still equally likely, the conditional
probability of getting a king for the second pick is 4/51. Similarly, when we compute
P(A_3 | A_1A_2) we can assume that we pick a card out of 50 (with one queen and
one king removed) and thus the conditional probability of picking an ace will be
4/50 = 2/25. Thus the probability of A_1A_2A_3 is given by
P(A_1A_2A_3) = P(A_1)P(A_2 | A_1)P(A_3 | A_2A_1) = 1/13 · 4/51 · 2/25 = 8/16,575.
2.9. Let C be the event that we chose the ball 3 and D the event that we chose
from the second urn. Then we have
P(D) = 4/5,  P(D^c) = 1/5,  P(C|D) = 1/4,  P(C|D^c) = 1/3.
We need to compute P(D|C), which we can do using Bayes' formula:
P(D|C) = P(C|D)P(D) / [P(C|D)P(D) + P(C|D^c)P(D^c)] = (1/4 · 4/5) / (1/4 · 4/5 + 1/3 · 1/5) = 3/4.
2.10. Define events:
A = {outcome of the roll is 4} and Bk = {the k-sided die is picked}.
Then
P(B_6|A) = P(A ∩ B_6)/P(A) = P(A|B_6)P(B_6) / [P(A|B_4)P(B_4) + P(A|B_6)P(B_6) + P(A|B_{12})P(B_{12})]
  = (1/6 · 1/3) / (1/4 · 1/3 + 1/6 · 1/3 + 1/12 · 1/3) = 1/3.
2.11. Let A be the event that the chosen customer is reckless. Let B be the event
that the chosen customer has an accident. We know the following:
P (A) = 0.2, P (Ac ) = 0.8, P (B|A) = 0.04, and P (B|Ac ) = 0.01.
The probability asked for is P(A^c|B). Using Bayes' formula we get
P(A^c|B) = P(B|A^c)P(A^c) / [P(B|A)P(A) + P(B|A^c)P(A^c)] = (0.01 · 0.80)/(0.04 · 0.2 + 0.01 · 0.80) = 1/2.
2.12. (a) A = {X is even}, B = {X is divisible by 5}. #A = 50, #B = 20 and
AB = {10, 20, . . . , 100} so #AB = 10. Thus
P(A)P(B) = 50/100 · 20/100 = 1/10 and P(AB) = 10/100 = 1/10.
This shows P (A)P (B) = P (AB) and verifies the independence of A and B.
(b) C = {X has two digits} = {10, 11, 12, . . . , 99} and #C = 90.
D = {X is divisible by 3} = {3, 6, 9, 12, . . . , 99} and #D = 33.
CD = {12, 15, . . . , 99} and #CD = 30. Thus
P(C)P(D) = 90/100 · 33/100 = 0.297 and P(CD) = 30/100 = 3/10.
This shows P(C)P(D) ≠ P(CD) and verifies that C and D are not indepen-
dent.
(c) E = {X is a prime} = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
53, 59, 61, 67, 71, 73, 79, 83, 89, 97},
and #E = 25.
This shows P(E)P(F) ≠ P(EF) and verifies that E and F are not independent.
2.13. We need to check whether or not we have
P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 1, X_5 = 0)
  = P(X_1 = 1)P(X_2 = 1)P(X_3 = 0)P(X_4 = 1)P(X_5 = 0)
  = 9/10 · 9/10 · 1/10 · 9/10 · 1/10 = 729/100,000.
2.16. Let us label heads as 0 and tails as 1. The sample space is Ω = {0, 1}^3,
the set of ordered triples of zeros and ones. #Ω = 8 and so for equally likely
outcomes we have P(ω) = 1/8 for each ω ∈ Ω. The events and their probabilities
a function on Ω, Z(10) = 1 + 0 = 1.) And so P(Z = 1) = P{10} = 1/90. If we
take X = 2, we cannot get Z = 1. Here is the precise derivation:
P(X = 2, Z = 1) = P({20, 21, . . . , 29} ∩ {10}) = P(∅) = 0.
Since P(X = 2)P(Z = 1) = 1/9 · 1/90 = 1/810 ≠ 0, we have shown that X and Z are
not independent.
2.19. (a) If we draw with replacement then we have 7^2 equally likely outcomes for
the two picks. Counting the favorable outcomes gives
P(X_1 = 4) = (1 · 7)/(7 · 7) = 1/7
P(X_2 = 5) = (7 · 1)/(7 · 7) = 1/7
P(X_1 = 4, X_2 = 5) = 1/(7 · 7) = 1/49.
(b) If we draw without replacement then we have 7 · 6 equally likely outcomes for
the two picks. Counting the favorable outcomes gives
P(X_1 = 4) = (1 · 6)/(7 · 6) = 1/7
P(X_2 = 5) = (6 · 1)/(7 · 6) = 1/7
P(X_1 = 4, X_2 = 5) = 1/(7 · 6) = 1/42.
(c) The answer to part (b) showed that P(X_1 = 4)P(X_2 = 5) ≠ P(X_1 = 4, X_2 = 5).
This proves that X1 and X2 are not independent when drawing without replace-
ment.
Part (a) showed that the events {X1 = 4} and {X2 = 5} are independent when
drawing with replacement, but this is not enough for proving that the random
variables X1 and X2 are independent. Independence of random variables requires
checking P (X1 = a)P (X2 = b) = P (X1 = a, X2 = b) for all possible choices of a
and b. (This can be done and so independence of X1 and X2 does actually hold
here.)
2.20. (a) Let S_5 denote the number of threes in the first five rolls. Then
P(S_5 ≤ 2) = Σ_{k=0}^{2} \binom{5}{k} (1/6)^k (5/6)^{5−k}.
(b) Let N be the number of rolls needed to see the first three. Then from the p.m.f.
of a geometric random variable,
P(N > 4) = Σ_{k=5}^∞ (5/6)^{k−1} · 1/6 = (5/6)^4.
Equivalently,
P(N > 4) = P(no three in the first four rolls) = (5/6)^4.
(c) We can approach this in a couple different ways. By using the independence of
the rolls,
P(5 ≤ N ≤ 20)
  = P(no three in the first four rolls, at least one three in rolls 5–20)
  = (5/6)^4 (1 − (5/6)^{16}) = (5/6)^4 − (5/6)^{20}.
Equivalently, thinking of the roll at which the first three comes,
P(5 ≤ N ≤ 20) = P(N ≥ 5) − P(N ≥ 21)
  = Σ_{k=5}^∞ (5/6)^{k−1} · 1/6 − Σ_{k=21}^∞ (5/6)^{k−1} · 1/6
  = (5/6)^4 − (5/6)^{20}.
2.21. (a) Let S be the number of problems she gets correct. Then S ~ Bin(4, 0.8)
and
P(Jane gets an A) = P(S ≥ 3) = P(S = 3) + P(S = 4)
  = \binom{4}{3} (0.8)^3 (0.2) + (0.8)^4
  = 0.8192.
(b) Let S_2 be the number of problems Jane gets correct out of the last three. Then
S_2 ~ Bin(3, 0.8). Let X_1 ~ Bern(0.8) model whether or not she gets the first
problem correct. By assumption, S_2 and X_1 are independent. We have
P(S ≥ 3 | X_1 = 1) = P(S ≥ 3, X_1 = 1) / P(X_1 = 1)
  = P(S_2 ≥ 2, X_1 = 1) / P(X_1 = 1) = P(S_2 ≥ 2)P(X_1 = 1) / P(X_1 = 1).
The last equality followed from the independence of S_2 and X_1. Hence,
P(S ≥ 3 | X_1 = 1) = P(S_2 ≥ 2) = \binom{3}{2} (0.8)^2 (0.2) + (0.8)^3 = 0.896.
2.22. (a) Let us encode the possible events in a single round as
AR = {Annie chooses rock}, AP = {Annie chooses paper}
and AS = {Annie chooses scissors}
and similarly BR , BP and BS for Bill. Then, using the independence of the
players’ choices,
P(Ann wins the round) = P(A_R B_S) + P(A_P B_R) + P(A_S B_P)
  = P(A_R)P(B_S) + P(A_P)P(B_R) + P(A_S)P(B_P)
  = 1/3 · 1/3 + 1/3 · 1/3 + 1/3 · 1/3 = 1/3.
Conceptually quicker than enumerating cases would be to notice that no
matter what Ann chooses, the probability that Bill makes a losing choice is 1/3.
2.23. Whether there is an accident on a given day can be treated as the outcome
of a trial (where success means that there is at least one accident). The success
probability is p = 1 − 0.95 = 0.05 and the failure probability is 0.95.
(a) The probability of no accidents at this intersection during the next 7 days is the
probability that the first seven trials failed, which is (1 − p)^7 = 0.95^7 ≈ 0.6983.
(b) There are 30 days in September. Let X be the number of days that have at
least one accident. X counts the number of ‘successes’ among 30 trials, so X ~
Bin(30, 0.05). Using the probability mass function of the binomial we get
P(X = 2) = \binom{30}{2} 0.05^2 0.95^{28} ≈ 0.2586.
(c) Let N denote the number of days we have to wait for the next accident, or
equivalently, the number of trials needed for the first success. N has geometric
distribution with parameter p = 0.05. We need to compute P(4 < N ≤ 10).
The event {4 < N ≤ 10} is the same as {N ∈ {5, 6, 7, 8, 9, 10}}. Using the
probability mass function of the geometric distribution,
P(4 < N ≤ 10) = Σ_{k=5}^{10} P(N = k) = Σ_{k=5}^{10} (1 − p)^{k−1} p = Σ_{k=5}^{10} 0.95^{k−1} · 0.05 ≈ 0.2158.
Here is an alternative solution. Note that
P(4 < N ≤ 10) = P(N ≤ 10) − P(N ≤ 4)
  = (1 − P(N > 10)) − (1 − P(N > 4))
  = P(N > 4) − P(N > 10).
For any positive integer k the event {N > k} is the same as having k failures
in the first k trials. By part (a) the probability of this is (1 − p)^k, which gives
P(N > k) = (1 − p)^k = 0.95^k and then
P(4 < N ≤ 10) = P(N > 4) − P(N > 10) = (1 − p)^4 − (1 − p)^{10}
  = 0.95^4 − 0.95^{10} ≈ 0.2158.
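The three numerical answers of this exercise are easy to reproduce with a few lines of Python (not part of the original solution).

```python
from math import comb

p, q = 0.05, 0.95
print(q**7)                                        # (a) no accident in 7 days, about 0.6983
print(comb(30, 2) * p**2 * q**28)                  # (b) exactly 2 accident days, about 0.2586
print(sum(q**(k - 1) * p for k in range(5, 11)))   # (c) P(4 < N <= 10), about 0.2158
print(q**4 - q**10)                                # (c) same value via the complement identity
```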
2.24. (a) X is hypergeometric with parameters (6, 4, 3).
(b) The probability mass function of X is
P(X = k) = \binom{4}{k} \binom{2}{3−k} / \binom{6}{3} for k ∈ {0, 1, 2, 3},
with the convention that \binom{a}{k} = 0 for integers k > a ≥ 0. In particular, P(X =
0) = 0 because with only 2 men available, a team of 3 cannot consist of men
alone.
2.25. Define events: A = {first roll is a three}, B = {second roll is a four}, Di =
{the die has i sides}. Assume that A and B are independent, given Di , for each
i = 4, 6, 12.
P(AB) = Σ_{i=4,6,12} P(AB|D_i)P(D_i) = Σ_{i=4,6,12} P(A|D_i)P(B|D_i)P(D_i)
  = ((1/4)^2 + (1/6)^2 + (1/12)^2) · 1/3.
P(D_6|AB) = P(AB|D_6)P(D_6) / P(AB) = (1/6)^2 · 1/3 / [((1/4)^2 + (1/6)^2 + (1/12)^2) · 1/3] = 2/7.
2.26.
P((AB) ∩ (CD)) = P(ABCD) = P(A)P(B)P(C)P(D) = P(AB)P(CD).
The very first equality is set algebra, namely, the associativity of intersection. This
can be taken as intuitively obvious, or verified from the definition of intersection
and common sense logic:
ω ∈ (AB) ∩ (CD) ⟺ ω ∈ AB and ω ∈ CD
                ⟺ ω ∈ A and ω ∈ B and ω ∈ C and ω ∈ D
                ⟺ ω ∈ ABCD.
Then we used the product rule first for all four events A, B, C, D, and then
separately for the pairs A, B and C, D.
2.27. (a) First introduce the necessary events. Let A be the event that we picked
Urn I. Then A^c is the event that we picked Urn II. Let B_1 be the event that we
picked a green ball. Then
P(A) = P(A^c) = 1/2,  P(B_1|A) = 1/3,  P(B_1|A^c) = 2/3.
P(B_1) is computed from the law of total probability:
P(B_1) = P(B_1|A)P(A) + P(B_1|A^c)P(A^c) = 1/3 · 1/2 + 2/3 · 1/2 = 1/2.
(b) The two experiments are identical and independent. Thus the probability of
picking green both times is the square of the probability from part (a): 1/2 · 1/2 = 1/4.
(c) Let B2 be the event that we picked a green ball in the second draw. The
events B1 , B2 are conditionally independent given A (and given Ac ), since we
are sampling with replacement from the same urn. Thus
P(B_2|A) = 1/3,  P(B_2|A^c) = 2/3,
P(B_1B_2|A) = P(B_1|A)P(B_2|A),  P(B_1B_2|A^c) = P(B_1|A^c)P(B_2|A^c).
From this we get
P(B_1B_2) = P(B_1B_2|A)P(A) + P(B_1B_2|A^c)P(A^c)
  = P(B_1|A)P(B_2|A)P(A) + P(B_1|A^c)P(B_2|A^c)P(A^c)
  = (1/3)^2 · 1/2 + (2/3)^2 · 1/2 = 5/18.
(d) The probability of getting a green from the first urn is 1/3 and the probability
of getting a green from the second urn is 2/3. Since the picks are independent,
the probability of both picks being green is 1/3 · 2/3 = 2/9.
2.28. (a) The number of aces I get in the first game is hypergeometric with pa-
rameters (52, 4, 13).
(b) The number of games in which I receive at least one ace during the evening is
binomial with parameters (50, 1 − \binom{48}{13}/\binom{52}{13}).
(c) The number of games in which all my cards are from the same suit is binomial
with parameters (50, 4/\binom{52}{13}).
(d) The number of spades I receive in the 5th game is hypergeometric with param-
eters (52, 13, 13).
2.29. Let E1 , E2 , E3 , N be the events that Uncle Bob hits a single, double, triple,
or not making it on base, respectively. These events form a partition of our sample
space. We also define S as the event Uncle Bob scores in this turn at bat. By the
law of total probability we have
P (S) = P (SE1 ) + P (SE2 ) + P (SE3 ) + P (SN )
= P (S|E1 )P (E1 ) + P (S|E2 )P (E2 ) + P (S|E3 )P (E3 ) + P (S|N )P (N )
= 0.2 · 0.35 + 0.3 · 0.25 + 0.4 · 0.1 + 0 · 0.3
= 0.185.
2.30. Identical twins have the same gender. We assume that identical twins are
equally likely to be boys or girls. Fraternal twins are also equally likely to be boys
or girls, but independently of each other. Thus fraternal twins are two girls with
probability 1/2 · 1/2 = 1/4. Let I be the event that the twins are identical, F the event
that the twins are fraternal.
(a) P(two girls) = P(two girls | I)P(I) + P(two girls | F)P(F) = 1/2 · 1/3 + 1/4 · 2/3 = 1/3.
(b) P(I | two girls) = P(two girls | I)P(I) / P(two girls) = (1/2 · 1/3)/(1/3) = 1/2.
(b)
P(B_k | A) = P(A|B_k)P(B_k) / Σ_{k=1}^{5} P(A|B_k)P(B_k) = (k/10 · 1/5)/(3/10) = k/15.
2.34. Since the urns are interchangeable, we can put the marked ball in urn 1.
There are three ways to arrange the two unmarked balls. Let case i for i ∈ {0, 1, 2}
denote the situation where we put i unmarked balls together with the marked ball,
and the remaining 2 − i unmarked balls in the other urn. Let M denote the event
that your friend draws the marked ball, and A_j the event that she chooses urn j,
j = 1, 2. Since P(M|A_2) = 0, we get the following probabilities.
Case 0: P(M) = P(M|A_1)P(A_1) = 1 · 1/2 = 1/2.
Case 1: P(M) = P(M|A_1)P(A_1) = 1/2 · 1/2 = 1/4.
Case 2: P(M) = P(M|A_1)P(A_1) = 1/3 · 1/2 = 1/6.
So (a) you would put all the balls in one urn (Case 2) while (b) she would put
the marked ball in one urn and the other balls in the other urn.
(c) The situation is analogous. If we put k unmarked balls together with the
marked ball in urn 1, then
P(M) = P(M|A_1)P(A_1) = 1/(k + 1) · 1/2 = 1/(2(k + 1)).
Hence to minimize the chances of drawing the marked ball, put all the balls in one
urn, and to maximize the chances of drawing the marked ball, put the marked ball
in one urn and all the unmarked balls in the other.
2.35. Let A be the event that the first card is a queen and B the event that the
second card is a spade. Note that A and B are not independent, and there is no
immediate way to compute P (B|A). We can compute P (AB) by counting favorable
outcomes. Let Ω be the collection of all ordered pairs drawn without replacement
from 52 cards. #Ω = 52 · 51 and all outcomes are equally likely. We can break up
AB into the union of the following two disjoint events:
C = {the first card is the queen of spades and the second card is a spade},
D = {the first card is a non-spade queen and the second card is a spade}.
We have #C = 12, as we can choose the second card 12 different ways. We have
#D = 3 · 13 = 39 as the first card can be any of the three non-spade queens, and the
second card can be any of the 13 spades. Thus #AB = #C + #D = 12 + 39 = 51
and we get P(AB) = #AB/#Ω = 51/(52 · 51) = 1/52.
2.36. Let Aj be the event that a j-sided die was chosen and B the event that a six
was rolled.
(b)
P(A_6|B) = P(B|A_6)P(A_6) / P(B) = 3/4.
2.37. (a) Let S, E, T, and W be the events that the six, eight, ten, and twenty sided
die is chosen. Let X be the outcome of the roll. Then
P(X = 5 | X > 3) = P({X = 5} ∩ {X > 3}) / P(X > 3) = 1/3.
(e)
P(A_4 | R) = P(R | A_4)P(A_4) / P(R) = (1/5 · 1/2)/(4/15) = 3/8.
2.39. (a) Let B_i be the event that we chose the ith word (i = 1, . . . , 8). Events
B_1, . . . , B_8 form a partition of the sample space and P(B_i) = 1/8 for each i. Let
A be the event that we chose the letter O. Then P(A|B_3) = 1/5, P(A|B_4) = 1/3,
P(A|B_6) = 1/4 with all other P(A|B_i) = 0. This gives
P(A) = Σ_{i=1}^{8} P(A|B_i)P(B_i) = (1/8)(1/5 + 1/3 + 1/4) = 47/480.
(b) The length of the chosen word can be 3, 4, 5 or 6, so the range of X is the
set {3, 4, 5, 6}. For each possible value x we have to find the probability
P(X = x).
p_X(3) = P(X = 3) = P(we chose the 1st, the 4th or the 7th word) = P(B_1 ∪ B_4 ∪ B_7) = 3/8,
p_X(4) = P(X = 4) = P(we chose the 6th or the 8th word) = P(B_6 ∪ B_8) = 2/8,
p_X(5) = P(X = 5) = P(we chose the 2nd or the 3rd word) = P(B_2 ∪ B_3) = 2/8,
p_X(6) = P(X = 6) = P(we chose the 5th word) = P(B_5) = 1/8.
Note that the probabilities add up to 1, as they should.
2.40. (a) For i 2 {1, 2, 3, 4} let Ai be the event that the student scores i on the
test. Let M be the event that the student becomes a math major.
P(M) = Σ_{i=1}^4 P(M|Ai)P(Ai) = 0 · 0.1 + (1/5) · 0.2 + (1/3) · 0.6 + (3/7) · 0.1 ≈ 0.2829.
(b)
P(A4 | M) = P(M|A4)P(A4) / P(M) = ((3/7) · 0.1) / ((1/5) · 0.2 + (1/3) · 0.6 + (3/7) · 0.1) ≈ 0.1515.
2.44. Let Ai be the event that bin i was chosen (i = 1, 2) and Yj the event that
draw j (j = 1, 2) is yellow.
(a)
P(A1 | Y1) = P(Y1|A1)P(A1) / (P(Y1|A1)P(A1) + P(Y1|A2)P(A2))
= ((4/10) · (1/2)) / ((4/10) · (1/2) + (4/7) · (1/2)) = 14/34 ≈ 0.4118.
(b) This question asks for the conditional probability of A1 , given that two draws
with replacement from the chosen urn yield yellow. We assume that draws
with replacement from the same urn are independent. This translates into
conditional independence of Y1 and Y2 , given Ai .
P(A1 | Y1Y2) = P(Y1Y2|A1)P(A1) / (P(Y1Y2|A1)P(A1) + P(Y1Y2|A2)P(A2))
= P(Y1|A1)P(Y1|A1)P(A1) / (P(Y1|A1)P(Y1|A1)P(A1) + P(Y1|A2)P(Y1|A2)P(A2))
= ((4/10) · (4/10) · (1/2)) / ((4/10) · (4/10) · (1/2) + (4/7) · (4/7) · (1/2)) = 196/596 ≈ 0.3289.
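For reference, the two posterior probabilities above can be verified with a few lines of Python (an addition of ours, not part of the solution); the conditional draw probabilities 4/10 and 4/7 are read off from the fractions above.

p1, p2 = 4/10, 4/7        # P(yellow | bin 1), P(yellow | bin 2)
prior = 1/2

post_one = p1 * prior / (p1 * prior + p2 * prior)
post_two = p1**2 * prior / (p1**2 * prior + p2**2 * prior)
print(round(post_one, 4), round(post_two, 4))   # 0.4118 and 0.3289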
2.45. (a) Let B, G, and O be the events that a 7-year-old likes the Bears, Packers,
and some other team, respectively. We are given the following:
P (B) = 0.10, P (G) = 0.75, P (O) = 0.15.
Let A be the event that the 7-year-old goes to a game. Then we have
P (A|B) = 0.01, P (A|G) = 0.05, P (A|O) = 0.005.
P (A) is computed from the law of total probability:
P (A) = P (A|B)P (B) + P (A|G)P (G) + P (A|O)P (O)
= 0.01 · 0.1 + 0.05 · 0.75 + 0.005 · 0.15 = 0.03925.
(b) Using the result of (a) (or Bayes’ formula directly):
P(G|A) = P(AG)/P(A) = P(A|G)P(G)/P(A) = (0.05 · 0.75)/0.03925 = 0.0375/0.03925 ≈ 0.9554.
2.46. A sample point is an ordered triple (x, y, z) where x is the number drawn
from box A, y is the number drawn from box B, and z the number drawn from box
C. All 6 · 12 · 4 = 288 outcomes are equally likely, so we can solve these problems
by counting.
(a) The number of outcomes with exactly two 1s is
1 · 1 · 3 + 1 · 11 · 1 + 5 · 1 · 1 = 19.
The number of outcomes with a 1 from box A and exactly two 1s is
1 · 1 · 3 + 1 · 11 · 1 = 14.
Thus
P(ball 1 from A | exactly two 1s) = P(ball 1 from A and exactly two 1s) / P(exactly two 1s) = (14/288)/(19/288) = 14/19.
(b) There are three sample points whose sum is 21: (6, 12, 3), (6, 11, 4), (5, 12, 4).
Two of these have 12 drawn from B. Hence the answer is 2/3. Here is the
formal calculation.
P(ball 12 from B | sum of balls 21) = P(ball 12 from B and sum of balls 21) / P(sum of balls 21)
= P{(6, 12, 3), (5, 12, 4)} / P{(6, 12, 3), (6, 11, 4), (5, 12, 4)} = (2/288)/(3/288) = 2/3.
2.47. Define random variables X and Y and event S:
X = total number of patients for whom the drug is e↵ective
Y = number of patients for whom the drug is e↵ective, excluding your friends
S = trial is a success for your two friends.
We need to find
P(S | X = 55) = P(S ∩ {X = 55}) / P(X = 55).
Note that X ~ Bin(80, p), and thus P(X = 55) = C(80, 55) p^55 (1-p)^25. Moreover,
S ∩ {X = 55} = S ∩ {Y = 53}. The events S and {Y = 53} are independent, as S
depends on the trial outcomes for your friends, and Y on the trial outcomes of the
other patients. Thus
P(S ∩ {X = 55}) = P(S ∩ {Y = 53}) = P(S)P(Y = 53).
We have P(S) = p^2 and P(Y = 53) = C(78, 53) p^53 (1-p)^25, as Y ~ Bin(78, p). Collecting
everything:
P(S | X = 55) = P(S ∩ {X = 55}) / P(X = 55) = (p^2 · C(78, 53) p^53 (1-p)^25) / (C(80, 55) p^55 (1-p)^25)
= C(78, 53)/C(80, 55) = 297/632 ≈ 0.4699.
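Since the powers of p and 1-p cancel, the answer reduces to a ratio of binomial coefficients; this short Python check (our own addition) confirms the value 297/632.

from fractions import Fraction
from math import comb

ratio = Fraction(comb(78, 53), comb(80, 55))   # equals (55 * 54)/(80 * 79)
print(ratio, float(ratio))                     # 297/632 ≈ 0.4699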
2.48. Define events G = {Kevin is guilty}, A = {DNA match}. Before the DNA
evidence P (G) = 1/100, 000. After the DNA match
P(G|A) = P(A|G)P(G) / (P(A|G)P(G) + P(A|G^c)P(G^c))
= (1 · 1/100,000) / (1 · 1/100,000 + (1/10,000) · (99,999/100,000)) = 1/(1 + 9.9999) ≈ 1/11.
2.49. (a) The given numbers are nonnegative, so we just need to check that Σ_{k=0}^∞ P(X = k) = 1:
Σ_{k=0}^∞ P(X = k) = 4/5 + (1/10) Σ_{k=1}^∞ (2/3)^k = 4/5 + (1/10) · (2/3)/(1 - 2/3) = 1.
(b)
P(C|D) = P(D|C)P(C) / P(D) = (1 · (1/3)) / ((p + 1) · (1/3)) = 1/(1 + p).
If the guard is equally likely to name either B or C when both of them are
slated to die, then A has not gained anything (his probability of pardon is still
1/3) but C's chances of pardon have increased to 2/3. In the extreme case where the
guard would never name B unless he had to (p = 0), C is now sure to be pardoned.
2.51. Since C ⊂ B we have B ∪ C = B and thus A ∪ B ∪ C = A ∪ B. Then
P(A ∪ B ∪ C) = P(A ∪ B) = P(A) + P(B) - P(AB).
Since A and B are independent we have P(AB) = P(A)P(B). This gives
P(A ∪ B ∪ C) = P(A) + P(B) - P(A)P(B) = 1/2 + 1/4 - 1/8 = 5/8.
2.52. Yes, A, B, and C are mutually independent. There are four equations to
check:
(i) P (AB) = P (A)P (B)
(ii) P (AC) = P (A)P (C)
(iii) P (BC) = P (B)P (C)
(iv) P (ABC) = P (A)P (B)P (C).
(i) comes from inclusion-exclusion:
P(AB) = P(A) + P(B) - P(A ∪ B) = 0.06 = P(A)P(B).
(ii) comes from P(AC) = P(C) - P(A^c C) = 0.03 = P(A)P(C). (iii) is given.
Finally, (iv) comes from using inclusion-exclusion once more and the previous computations:
P(ABC) = P(A ∪ B ∪ C) - P(A) - P(B) - P(C) + P(AB) + P(AC) + P(BC) = 0.006 = P(A)P(B)P(C).
2.53. (a) If the events are disjoint then
P(A ∪ B) = P(A) + P(B) = 0.3 + 0.6 = 0.9.
(b) If the events are independent then
P(A ∪ B) = P(A) + P(B) - P(AB) = P(A) + P(B) - P(A)P(B) = 0.3 + 0.6 - 0.3 · 0.6 = 0.72.
2.54. (a) It is possible. We use the fact that A = AB ∪ AB^c and that these are
mutually exclusive:
P(A) = P(AB) + P(AB^c) = P(A|B)P(B) + P(A|B^c)P(B^c)
= (1/3)P(B) + (1/3)P(B^c) = (1/3)(P(B) + P(B^c)) = 1/3.
(b) A and B are independent. By part (a) and the given information,
P(A) = P(A|B) = P(AB)/P(B),
from which P(AB) = P(A)P(B) and independence has been verified. (Note
that the value 1/3 was not needed for this conclusion.)
2.55. (a) Since Peter throws the first dart, in order for Mary to win Peter must
fail once more than she does.
P(Mary wins) = Σ_{k=1}^∞ P(Mary wins on her kth throw) = Σ_{k=1}^∞ ((1-p)(1-r))^{k-1} (1-p)r
= (1-p)r / (1 - (1-p)(1-r)) = (1-p)r / (p + r - pr).
(b) The possible values of X are the nonnegative integers.
P (X = 0) = P (Peter wins on his first throw) = p.
For k ≥ 1,
P(X = k) = P(Mary wins on her kth throw) + P(Peter wins on his (k+1)st throw)
= ((1-p)(1-r))^{k-1} (1-p)r + ((1-p)(1-r))^k p = ((1-p)(1-r))^{k-1} (1-p)(p + r - pr).
Together with P(X = 0) = p these probabilities add up to 1, since
Σ_{k=1}^∞ ((1-p)(1-r))^{k-1} (1-p)(p + r - pr) = (1-p)(p + r - pr) / (1 - (1-p)(1-r)) = 1 - p.
2.57. (a) Let E1 be the event that the first component functions. Let E2 be the
event that the second component functions. Let S be the event that the entire
system functions. S = E1 \E2 since both components must function in order for
the whole system to be operational. By the assumption that each component
acts independently, we have
Let Xi be a Bernoulli random variable taking the value 1 if the ith element of the
first component is working. The information given is that P (Xi = 1) = 0.95,
P (Xi = 0) = 0.05 and X1 , . . . , X8 are mutually independent. Similarly, let
Yi be a Bernoulli random variable taking the value 1 if the ith element of the
second component is working. Then P(Yi = 1) = 0.90, P(Yi = 0) = 0.1 and
Y1, . . . , Y4 are mutually independent. Let X = Σ_{i=1}^8 Xi give the total number
of working elements in component number one and Y = Σ_{i=1}^4 Yi the total
number of working elements in component number 2. Then X ~ Bin(8, 0.95)
and Y ~ Bin(4, 0.90), and X and Y are independent (by the assumption that
2.59. Define events: B = {the bus functions}, T = {the train functions}, and S =
{no storm}. The event that travel is possible is (B [ T ) \ S = BS [ T S. We
calculate the probability with inclusion-exclusion and independence:
2.60. (a) P(AB^c) = P(A) - P(AB) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A)P(B^c).
(b) Apply first de Morgan and then inclusion-exclusion:
P(A^c B^c) = 1 - P(A ∪ B) = 1 - P(A) - P(B) + P(A)P(B) = (1 - P(A))(1 - P(B)) = P(A^c)P(B^c).
(c) P(AB^c C) = P(AC) - P(ABC) = P(A)P(C) - P(A)P(B)P(C) = P(A)(1 - P(B))P(C) = P(A)P(B^c)P(C).
(d) Again first de Morgan and then inclusion-exclusion:
P(A^c B^c C^c) = 1 - P(A ∪ B ∪ C)
= 1 - P(A) - P(B) - P(C) + P(AB) + P(AC) + P(BC) - P(ABC)
= 1 - P(A) - P(B) - P(C) + P(A)P(B) + P(A)P(C) + P(B)P(C) - P(A)P(B)P(C)
= (1 - P(A))(1 - P(B))(1 - P(C))
= P(A^c)P(B^c)P(C^c).
2.61. (a) Treat each draw as a trial: green is success, red is failure. By counting
favorable outcomes, the probability of success is p = 3/7 for each draw. Because we
draw with replacement the outcomes are independent. Thus the number of greens
in the 9 picks is the number of successes in 9 trials, hence a Bin(9, 3/7) distribution.
Using the probability mass function of the binomial distribution gives
P(X ≥ 1) = 1 - P(X = 0) = 1 - (1-p)^9 ≈ 0.9935,
P(X ≤ 5) = Σ_{k=0}^5 P(X = k) = Σ_{k=0}^5 C(9, k) p^k (1-p)^{9-k} ≈ 0.8653.
(b) N is the number of trials needed for the first success, and so has geometric
distribution with parameter p = 3/7. The probability mass function of the geometric
distribution gives
P(N ≤ 9) = Σ_{k=1}^9 P(N = k) = Σ_{k=1}^9 p(1-p)^{k-1} ≈ 0.9935.
(c) We have P(X ≥ 1) = P(N ≤ 9). We can check this by using the geometric sum
formula to get
Σ_{k=1}^9 p(1-p)^{k-1} = p · (1 - (1-p)^9)/(1 - (1-p)) = 1 - (1-p)^9.
Here is another way to see this, without any algebra. Imagine that we draw balls
with replacement infinitely many times. Think of X as the number of green balls
in the first 9 draws. N is still the number of draws needed for the first green. Now
if X ≥ 1, then we have at least one green within the first 9 draws, which means
that the first green draw happened within the first 9 draws. Thus X ≥ 1 implies
N ≤ 9. But this works in the opposite direction as well: if N ≤ 9 then the first
green draw happened within the first 9 draws, which means that we must have at
least one green within the first 9 picks. Thus N ≤ 9 implies X ≥ 1. This gives the
equality of events {X ≥ 1} = {N ≤ 9}, and hence the probabilities must agree as
well.
2.62. Regard the drawing of three marbles as one trial, with success probability p
given by
p = P(all three marbles blue) = C(9, 3)/C(13, 3) = (7 · 8 · 9 · 10)/(10 · 11 · 12 · 13) = 42/143.
X ~ Bin(20, 42/143). The probability mass function is
P(X = k) = C(20, k) (42/143)^k (101/143)^{20-k} for k = 0, 1, 2, . . . , 20.
2.63. The number of heads in n coin flips has distribution Bin(n, 1/2). Thus the
probability of winning if we choose to flip n times is
fn = P(n flips yield exactly 2 heads) = C(n, 2) (1/2)^n = n(n-1)/2^{n+1}.
We want to find the n which maximizes fn. Let us compare fn and fn+1. We have
fn < fn+1  ⟺  n(n-1)/2^{n+1} < (n+1)n/2^{n+2}  ⟺  2(n-1) < n+1  ⟺  n < 3.
Similarly, fn > fn+1 if and only if n > 3, and f3 = f4. Thus
f2 < f3 = f4 > f5 > f6 > . . . .
This means that the maximum happens at n = 3 and n = 4, and the probability
of winning at those values is f3 = f4 = (3 · 2)/2^4 = 3/8.
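A brute-force scan confirms the maximizers; scanning n up to 20 (an arbitrary cutoff of ours, since fn tends to 0) is more than enough.

from math import comb

f = {n: comb(n, 2) / 2**n for n in range(2, 21)}
best = max(f.values())
print([n for n, v in f.items() if abs(v - best) < 1e-12], best)   # [3, 4] 0.375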
2.64. Let X be the number of correct answers. X is the number of successes in 20
independent trials with success probability p + r/2.
P(X ≥ 19) = P(X = 19) + P(X = 20) = 20 (p + r/2)^19 (q + r/2) + (p + r/2)^20.
2.65. Let A be the event that at least one die lands on a 4 and B be the event
that all three dice land on di↵erent numbers. Our sample space is the set of all
triples (a1, a2, a3) with 1 ≤ ai ≤ 6. All outcomes are equally likely and there are
216 outcomes. We need P(A|B) = P(AB)/P(B). There are 6 · 5 · 4 = 120 elements in B.
To count the elements of AB, we first consider A^c B. This is the set of triples where
the three numbers are distinct and none of them is a 4. So #A^c B = 5 · 4 · 3 = 60.
Then #AB = #B - #A^c B = 120 - 60 = 60 and
P(A|B) = P(AB)/P(B) = (60/216)/(120/216) = 1/2.
2.66. Let
fn = P(n die rolls give exactly two sixes) = C(n, 2) (1/6)^2 (5/6)^{n-2} = n(n-1) 5^{n-2} / (2 · 6^n).
Next,
fn < fn+1  ⟺  n(n-1) 5^{n-2}/(2 · 6^n) < (n+1)n 5^{n-1}/(2 · 6^{n+1})  ⟺  6(n-1) < 5(n+1)  ⟺  n < 11.
By reversing the inequalities we get the equivalence
fn > fn+1  ⟺  n > 11.
By complementing the two equivalences, we get
fn = fn+1  ⟺  fn ≥ fn+1 and fn ≤ fn+1  ⟺  n ≥ 11 and n ≤ 11  ⟺  n = 11.
Putting all these facts together we conclude that the probability of two sixes is
maximized by n = 11 and n = 12 and for these two values of n, that probability is
11 · 10 · 5^9 / (2 · 6^11) ≈ 0.2961.
2.67. Since {X = n+k} ⊂ {X > n} for k ≥ 1, we have
P(X = n+k | X > n) = P(X = n+k, X > n)/P(X > n) = P(X = n+k)/P(X > n) = (1-p)^{n+k-1} p / P(X > n).
Evaluate the denominator:
P(X > n) = Σ_{k=n+1}^∞ P(X = k) = Σ_{k=n+1}^∞ (1-p)^{k-1} p = p(1-p)^n Σ_{k=0}^∞ (1-p)^k
= p(1-p)^n · 1/(1 - (1-p)) = (1-p)^n.
Thus,
P(X = n+k | X > n) = (1-p)^{n+k-1} p / (1-p)^n = (1-p)^{k-1} p = P(X = k).
2.68. For k ≥ 1, the assumed memoryless property gives
P(X = k) = P(X = k+1 | X > 1) = P(X = k+1)/P(X > 1)
Now 2655/10,000 ≠ (513/1000)^2, which says that P(A1A2) ≠ P(A1)P(A2). In other words,
A1 and A2 are not independent without the conditioning on the type of coin. The
intuitive reason is that the first flip gives us information about the coin we hold,
and thereby alters our expectations about the second flip.
2.70. The relevant probabilities: P(A) = P(B) = 2p(1-p) and
P(AB) = P{(T, H, T), (H, T, H)} = p^2(1-p) + p(1-p)^2 = p(1-p).
Thus A and B are independent if and only if
(2p(1-p))^2 = p(1-p)  ⟺  4p^2(1-p)^2 - p(1-p) = 0  ⟺  p(1-p)(4p(1-p) - 1) = 0
 ⟺  p = 0 or 1-p = 0 or 4p(1-p) - 1 = 0  ⟺  p ∈ {0, 1/2, 1}.
Note that cancelling p(1-p) from the very first equation misses the solutions p = 0
and p = 1.
2.71. Let F = {coin is fair}, B = {coin is biased} and Ak = {kth flip is tails}.
We assume that conditionally on F , the events Ak are independent, and similarly
conditionally on B. Let Dn = A1 ∩ A2 ∩ · · · ∩ An = {the first n flips are all tails}.
(a)
P(B|Dn) = P(Dn|B)P(B) / (P(Dn|B)P(B) + P(Dn|F)P(F)) = (3/5)^n (1/10) / ((3/5)^n (1/10) + (1/2)^n (9/10))
= (3/5)^n / ((3/5)^n + 9 (1/2)^n).
In particular, P(B|D1) = 2/17 and P(B|D2) = 4/29.
(b)
(3/5)^24 / ((3/5)^24 + 9 (1/2)^24) ≈ 0.898
while
(3/5)^25 / ((3/5)^25 + 9 (1/2)^25) ≈ 0.914,
so 25 flips are needed.
(c)
P(A_{n+1} | Dn) = P(D_{n+1})/P(Dn) = (P(D_{n+1}|B)P(B) + P(D_{n+1}|F)P(F)) / (P(Dn|B)P(B) + P(Dn|F)P(F))
= ((3/5)^{n+1} (1/10) + (1/2)^{n+1} (9/10)) / ((3/5)^n (1/10) + (1/2)^n (9/10)).
(d) Intuitively speaking, an unending sequence of tails would push the probability
of a biased coin to 1, and hence the probability of the next tails is 3/5. For a
rigorous calculation we take the limit of the previous answer:
lim_{n→∞} P(A_{n+1} | Dn) = lim_{n→∞} ((3/5)^{n+1} (1/10) + (1/2)^{n+1} (9/10)) / ((3/5)^n (1/10) + (1/2)^n (9/10))
= lim_{n→∞} (3/5 + (9/2)(5/6)^n) / (1 + 9 (5/6)^n) = 3/5.
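The posteriors in parts (a) and (b) are easy to tabulate numerically; the following Python sketch (ours, not part of the solution) recovers the threshold of 25 flips.

def posterior_biased(n):
    # prior P(B) = 1/10, P(tails | B) = 3/5, P(tails | F) = 1/2
    biased = (3/5)**n * (1/10)
    fair = (1/2)**n * (9/10)
    return biased / (biased + fair)

n = 1
while posterior_biased(n) < 0.9:
    n += 1
print(n, round(posterior_biased(n - 1), 3), round(posterior_biased(n), 3))   # 25 0.898 0.914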
2.72. The sample space for n trials is the same, regardless of the probabilities,
namely the space of ordered n-tuples of zeros and ones:
⌦ = {! = (s1 , . . . , sn ) : each si equals 0 or 1}.
By independence, the probability of a sample point ! = (s1 , . . . , sn ) is obtained by
multiplying together a factor pi for each si = 1 and 1 pi for each si = 0. We can
express this in a single formula as follows:
P{(s1, . . . , sn)} = ∏_{i=1}^n p_i^{s_i} (1 - p_i)^{1 - s_i}.
2.73. Let X be the number of blond customers at the pancake place. The popula-
tion of the town is 500, and 100 of them are blond. We may assume that the visitors
are chosen randomly from the population, which means that we take a sample of
size 14 without replacement from the population. X denotes the number of blonds
among this sample. This is exactly the setup for the hypergeometric distribution
and X ~ Hypergeom(500, 100, 14). (Because the total population size is N = 500,
the number of blonds is NA = 100 and we take a sample of n = 14.) We can now
use the probability mass function of the hypergeometric distribution to answer the
two questions.
(a)
P(exactly 10 blonds) = P(X = 10) = C(100, 10) C(400, 4) / C(500, 14) ≈ 0.00003122.
(b)
P(at most 2 blonds) = P(X ≤ 2) = Σ_{k=0}^2 P(X = k) = Σ_{k=0}^2 C(100, k) C(400, 14-k) / C(500, 14)
≈ 0.4458.
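Both hypergeometric probabilities can be checked directly from binomial coefficients; the helper below is our own shorthand for the Hypergeom(500, 100, 14) probability mass function.

from math import comb

def hyp_pmf(k, N=500, NA=100, n=14):
    return comb(NA, k) * comb(N - NA, n - k) / comb(N, n)

print(hyp_pmf(10))                          # ≈ 3.122e-05
print(sum(hyp_pmf(k) for k in range(3)))    # ≈ 0.4458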
2.74. Define events: D = {Steve is a drug user}, A1 = {Steve fails the first drug test}
and A2 = {Steve fails the second drug test}. Assume that Steve is no more or less
likely to be a drug user than a random person from the company, so P (D) =
0.01. The data about the reliability of the tests tells us that P (Ai |D) = 0.99
and P (Ai |Dc ) = 0.02 for i = 1, 2, and conditional independence P (A1 A2 |D) =
P (A1 |D)P (A2 |D) and also the same under conditioning on Dc .
(a)
P(D|A1) = P(A1|D)P(D) / (P(A1|D)P(D) + P(A1|D^c)P(D^c))
= ((99/100) · (1/100)) / ((99/100) · (1/100) + (2/100) · (99/100)) = 1/3.
(b)
P(A2|A1) = P(A1A2)/P(A1) = (P(A1A2|D)P(D) + P(A1A2|D^c)P(D^c)) / (P(A1|D)P(D) + P(A1|D^c)P(D^c))
= ((99/100)^2 · (1/100) + (2/100)^2 · (99/100)) / ((99/100) · (1/100) + (2/100) · (99/100)) = 103/300 ≈ 0.3433.
(c)
P(D|A1A2) = P(A1A2|D)P(D) / (P(A1A2|D)P(D) + P(A1A2|D^c)P(D^c))
= ((99/100)^2 · (1/100)) / ((99/100)^2 · (1/100) + (2/100)^2 · (99/100)) = 99/103 ≈ 0.9612.
Then Ac is the event that the phone is from factory I. We know that
P(A) = 0.4 = 2/5,  P(A^c) = 0.6 = 3/5,  P(Bi|A) = 0.2 = 1/5,  P(Bi|A^c) = 0.1 = 1/10.
We need to compute P(A|B1B2). By Bayes' theorem,
P(A|B1B2) = P(B1B2|A) P(A) / (P(B1B2|A)P(A) + P(B1B2|A^c)P(A^c)).
We may assume that conditionally on A the events B1 and B2 are independent. This
means that given that the store gets its phones from factory II, the defectiveness of
the phones stocked there are independent. We may also assume that conditionally
on Ac the events B1 and B2 are independent. Then
P(B1B2|A) = P(B1|A)P(B2|A) = (1/5)^2,   P(B1B2|A^c) = P(B1|A^c)P(B2|A^c) = (1/10)^2,
and
P(A|B1B2) = ((1/5)^2 · (2/5)) / ((1/5)^2 · (2/5) + (1/10)^2 · (3/5)) = 8/11 ≈ 0.7273.
2.76. Let A2 be the event that the second test comes back positive. Take now
P(D) = 96/494 ≈ 0.194 as the prior. Then
P(D|A2) = P(A2|D)P(D) / (P(A2|D)P(D) + P(A2|D^c)P(D^c))
= ((96/100) · (96/494)) / ((96/100) · (96/494) + (2/100) · (398/494)) = 2304/2503 ≈ 0.9205.
2.77. By definition P(A|B) = P(AB)/P(B). Since AB ⊂ B, we have P(AB) ≤ P(B) and
thus P(A|B) = P(AB)/P(B) ≤ 1. Furthermore, P(A|B) = P(AB)/P(B) ≥ 0 because P(B) > 0
and P(AB) ≥ 0. This proves the property 0 ≤ P(A|B) ≤ 1.
To check P(Ω | B) = 1 note that Ω ∩ B = B, and so
P(Ω | B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.
Similarly, ∅ ∩ B = ∅, thus
P(∅ | B) = P(∅ ∩ B)/P(B) = P(∅)/P(B) = 0/P(B) = 0.
Finally, if we have a pairwise disjoint sequence {Ai} then {BAi} are also pairwise
disjoint, and their union is (∪_{i=1}^∞ Ai) ∩ B. This gives
P(∪_{i=1}^∞ Ai | B) = P((∪_{i=1}^∞ Ai) ∩ B)/P(B) = P(∪_{i=1}^∞ Ai B)/P(B) = Σ_{i=1}^∞ P(Ai B)/P(B) = Σ_{i=1}^∞ P(Ai|B).
ways to choose the first 22 birthdays to be all different and the twenty-third to be
one of the first 22. Thus, the desired probability is
22 · ∏_{k=0}^{21} (365 - k) / 365^23 ≈ 0.0316.
2.80. Assume that birth months of distinct people are independent and that for
any particular person each month is equally likely. Then we are asking that a
sample of seven items with replacement from a set of 12 produces no repetitions.
The probability is
(12 · 11 · 10 · · · 6) / 12^7 = 385/3456 ≈ 0.1114.
2.81. Let An be the event that there is a match among the birthdays of the chosen
n Martians. Then
P(An) = 1 - P(all n birthdays are distinct) = 1 - (669 · 668 · · · (669 - (n-1))) / 669^n.
To estimate the product we use 1 - x ≈ e^{-x} to get
(669 · 668 · · · (669 - (n-1))) / 669^n = ∏_{k=0}^{n-1} (1 - k/669) ≈ ∏_{k=0}^{n-1} e^{-k/669}
= e^{-(1/669) Σ_{k=0}^{n-1} k} = e^{-n(n-1)/(2 · 669)} ≈ e^{-n^2/(2 · 669)}.
Thus P(An) ≈ 1 - e^{-n^2/(2 · 669)}. Now solving the inequality P(An) ≥ 0.9:
1 - e^{-n^2/(2 · 669)} ≥ 0.9  ⟺  -n^2/(2 · 669) ≤ ln(1 - 0.9)  ⟺  n ≥ √(2 · 669 · ln 10) ≈ 55.5.
This would suggest n = 56.
In fact this is correct: the actual numerical values are P(A56) ≈ 0.9064 and
P(A55) ≈ 0.8980.
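The exact product and the exponential approximation are compared numerically below; this short Python check (our own addition) confirms the threshold n = 56 quoted above.

from math import exp

def p_match(n, days=669):
    prod = 1.0
    for k in range(n):
        prod *= (days - k) / days
    return 1 - prod

for n in (55, 56):
    print(n, round(p_match(n), 4), round(1 - exp(-n * n / (2 * 669)), 4))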
Solutions to Chapter 3
3.1. (a) The random variable X takes the values 1, 2, 3, 4 and 5. Collecting the
probabilities corresponding to the values that are at most 3 we get
P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = pX(1) + pX(2) + pX(3) = 1/7 + 1/14 + 3/14 = 3/7.
(b) Now we have to collect the probabilities corresponding to the values which are
less than 3:
P(X < 3) = P(X = 1) + P(X = 2) = pX(1) + pX(2) = 1/7 + 1/14 = 3/14.
(c) First we use the definition of conditional probability to get
P(X < 4.12 | X > 1.638) = P(X < 4.12 and X > 1.638) / P(X > 1.638).
We have P (X < 4.12 and X > 1.638) = P (1.638 < X < 4.12). The possible values
of X between 1.638 and 4.12 are 2, 3 and 4. Thus
P(X < 4.12 and X > 1.638) = pX(2) + pX(3) + pX(4) = 1/14 + 3/14 + 2/7 = 4/7.
Similarly,
P(X > 1.638) = pX(2) + pX(3) + pX(4) + pX(5) = 1/14 + 3/14 + 2/7 + 2/7 = 6/7.
From this we get
P(X < 4.12 | X > 1.638) = (4/7)/(6/7) = 2/3.
3.2. (a) We must have that the probability mass function sums to one. Hence, we
require
1 = Σ_{k=1}^6 p(k) = c(1 + 2 + 3 + 4 + 5 + 6) = 21c.
Thus, c = 1/21.
In the first step we used the formula for f(x), and the fact that it is equal to 0 for
x ≤ 0.
(b) Using the definition of the probability density function we get
P(-1 < X < 1) = ∫_{-1}^{1} f(x) dx = ∫_{0}^{1} 3e^{-3x} dx = [-e^{-3x}]_{x=0}^{x=1} = 1 - e^{-3}.
(c) Using the definition of the probability density function again we get
P(X < 5) = ∫_{-∞}^{5} f(x) dx = ∫_{0}^{5} 3e^{-3x} dx = [-e^{-3x}]_{x=0}^{x=5} = 1 - e^{-15}.
P(2 < X < 4 | X < 5) = P(2 < X < 4)/P(X < 5) = (e^{-6} - e^{-12})/(1 - e^{-15}).
3.4. (a) The density of X is 1/6 on [4, 10] and zero otherwise. Hence,
P(X < 6) = P(4 < X < 6) = (6 - 4)/6 = 1/3.
(b)
3.5. The possible values of a discrete random variable are exactly the values where
the c.d.f. jumps. In this case these are the values 1, 4/3, 3/2 and 9/5. The corre-
sponding probabilities are equal to the size of corresponding jumps:
pX(1) = 1/3 - 0 = 1/3,
pX(4/3) = 1/2 - 1/3 = 1/6,
pX(3/2) = 3/4 - 1/2 = 1/4,
pX(9/5) = 1 - 3/4 = 1/4.
3.6. For the random variable in Exercise 3.1, we may use (3.13). For s ∈ (-∞, ∞),
F(s) = P(X ≤ s) =
  0 for s < 1,
  1/7 for 1 ≤ s < 2,
  3/14 for 2 ≤ s < 3,
  6/14 for 3 ≤ s < 4,
  10/14 for 4 ≤ s < 5,
  1 for 5 ≤ s.
For the random variable in Exercise 3.3, we may use (3.15). For s ≤ 0 we have
that
P(X ≤ s) = 0,
whereas for s > 0 we have
P(X ≤ s) = ∫_0^s 3e^{-3x} dx = 1 - e^{-3s}.
3.7. (a) If P (a X b) = 1 then F (y) =p 0 for y < a
p and F (y) = 1 for y b.
From the definition of F we see that a = 2 and b = 3 gives the smallest such
interval.
(b) Since X is continuous, P (X = 1.6) = 0. We can also see this directly from F :
P (X = 1.6) = F (1.6) lim F (x) = F (1.6) F (1.6 ).
x!1.6
(b) We have
E[|X - 2|] = Σ_{k=1}^5 |k - 2| pX(k) = 1 · (1/7) + 0 · (1/14) + 1 · (3/14) + 2 · (2/7) + 3 · (2/7) = 25/14.
Using integration by parts we can evaluate the last integral to get E[X] = 1/3.
(b) e^{2X} is a function of X, and X is continuous, so we can compute E[e^{2X}] as
follows:
E[e^{2X}] = ∫_{-∞}^{∞} e^{2x} f(x) dx = ∫_0^∞ e^{2x} · 3e^{-3x} dx = ∫_0^∞ 3e^{-x} dx = 3.
3.10. (a) The random variable |X| takes values 0 and 1 with probabilities
P(|X| = 0) = P(X = 0) = 1/3 and P(|X| = 1) = P(X = -1) + P(X = 1) = 2/3.
Then the definition of expectation gives
E[|X|] = 0 · P(|X| = 0) + 1 · P(|X| = 1) = 2/3.
(b) Applying formula (3.24):
E[|X|] = Σ_k |k| P(X = k) = 1 · P(X = -1) + 0 · P(X = 0) + 1 · P(X = 1) = 1/2 + 1/6 = 2/3.
3.11. By (3.25) we have
E[(Y - 1)^2] = ∫_{-∞}^{∞} (x - 1)^2 f(x) dx = ∫_1^2 (x - 1)^2 · (2/3)x dx = 7/18.
From this we get that m = 4 works as the median, but any number that is larger
or smaller than 4 is not a median.
For X from Exercise 3.3 we have
P(X ≤ m) = 1 - e^{-3m} and P(X ≥ m) = e^{-3m} if m ≥ 0,
and P(X ≤ m) = 0, P(X ≥ m) = 1 for m < 0. From this we get that the median
m satisfies e^{-3m} = 1/2, which leads to m = ln(2)/3.
(b) We need P(X ≤ q) ≥ 0.9 and P(X ≥ q) ≥ 0.1. Since X is continuous, we
must have P(X ≤ q) + P(X ≥ q) = 1 and hence P(X ≤ q) = 0.9 and P(X ≥ q) = 0.1.
Using the calculations from part (a) we see that e^{-3q} = 0.1 from which
q = ln(10)/3.
3.14. The mean of the random variable X from Exercise 3.1 is
E[X] = Σ_{k=1}^5 k pX(k) = 1 · (1/7) + 2 · (1/14) + 3 · (3/14) + 4 · (2/7) + 5 · (2/7) = 7/2.
= P(Z < 4/√7) = Φ(4/√7) ≈ Φ(1.51) ≈ 0.9345.
(d)
P(X < -10) = P((X - µ)/σ < (-10 - µ)/σ) = P(Z < -8/√7) = Φ(-8/√7) = 1 - Φ(8/√7)
≈ 1 - Φ(3.02) ≈ 1 - 0.9987 = 0.0013.
(e)
P(X > 4) = P((X - µ)/σ > (4 - µ)/σ) = P(Z > 6/√7) = 1 - Φ(6/√7)
≈ 1 - Φ(2.27) ≈ 1 - 0.9884 = 0.0116.
3.20. We must show that Y ~ Unif[0, c]. We find the cumulative distribution function.
For any t ∈ (-∞, ∞) we have
FY(t) = P(Y ≤ t) = P(c - X ≤ t) = P(c - t ≤ X) =
  0 for t < 0,
  (c - (c - t))/c = t/c for 0 ≤ t < c,
  1 for c ≤ t,
which is the cumulative distribution function for a Unif[0, c] random variable.
3.21. (a) The number of heads out of 2 coin flips can be 0, 1 or 2. These are the pos-
sible values of X. The possible outcomes of the experiment are {HH, HT, T H, T T },
and each one of these has a probability 14 . We can compute the probability mass
function of X by identifying the events {X = 0}, {X = 1}, {X = 2} and computing
the corresponding probabilities:
pX(0) = P(X = 0) = P({TT}) = 1/4,
pX(1) = P(X = 1) = P({HT, TH}) = 2/4 = 1/2,
pX(2) = P(X = 2) = P({HH}) = 1/4.
(b) Using the probability mass function from (a):
P(X ≥ 1) = P(X = 1) + P(X = 2) = pX(1) + pX(2) = 3/4
and
P(X > 1) = P(X = 2) = pX(2) = 1/4.
(c) Since X is a discrete random variable, we can compute the expectation as
E[X] = Σ_k k pX(k) = 0 · pX(0) + 1 · pX(1) + 2 · pX(2) = 1/2 + 2 · (1/4) = 1.
Similarly, E[X^2] = 0^2 · pX(0) + 1^2 · pX(1) + 2^2 · pX(2) = 1/2 + 4 · (1/4) = 3/2.
This gives
Var(X) = E[X^2] - (E[X])^2 = 3/2 - 1 = 1/2.
3.22. (a) The random variable X is binomially distributed with parameters n = 3
and p = 12 . Thus, the possible values of X are {0, 1, 2, 3} and the probability
mass function is
1 1 1 1
P (X = 0) = 3 , P (X = 1) = 3 · 3 , P (X = 2) = 3 · 3 , P (X = 3) = 3 .
2 2 2 2
(b) We have
3+3+1 7
P (X 1) = P (X = 1) + P (X = 2) + P (X = 3) = = ,
8 8
and
3+1 1
P (X > 1) = P (X = 2) + P (X = 3) = = .
8 2
This gives b = √(23/6). However, x^2 - 23/6 is negative for 1 ≤ x < √(23/6) ≈ 1.96, which
shows that the function f cannot be a pdf.
(b) We need b ≥ 0, otherwise the function is zero everywhere. The cos x function
is non-negative on [-π/2, π/2], but then it goes below 0. Thus if g is a pdf then
b ≤ π/2. Computing the integral of g on (-∞, ∞) we get
∫_{-∞}^{∞} g(x) dx = ∫_{-b}^{b} cos(x) dx = 2 sin(b).
There is exactly one solution for 2 sin(b) = 1 in the interval (0, π/2], this is b =
arcsin(1/2) = π/6. For this choice of b the function g is a pdf.
3.26. (a) We require that the probability mass function sum to one. Hence,
1 = Σ_{k=1}^∞ pX(k) = Σ_{k=1}^∞ c/(k(k+1)).
and
E[X^2] = Σ_k k^2 P(X = k) = 1 · (2/5) + 4 · (1/5) + 9 · (1/5) + 16 · (1/5) = 31/5.
This leads to
Var(X) = E[X^2] - (E[X])^2 = 34/25.
3.28. (a) The possible values of X are 1, 2, and 3. Since there are three boxes with
nice prizes, we have
P(X = 1) = 3/5.
Next, for X = 2, we must first choose a box that does not have a good prize
(two choices) followed by one that does (three choices). Hence,
P(X = 2) = (2 · 3)/(5 · 4) = 3/10.
Similarly,
P(X = 3) = (2 · 1 · 3)/(5 · 4 · 3) = 1/10.
(b) The expectation is
E[X] = 1 · (3/5) + 2 · (3/10) + 3 · (1/10) = 3/2.
(c) The second moment is
E[X^2] = 1^2 · (3/5) + 2^2 · (3/10) + 3^2 · (1/10) = 27/10.
Hence, the variance is
Var(X) = E[X^2] - (E[X])^2 = 27/10 - (3/2)^2 = 9/20.
(d) Let W be the gain or loss in this game. Then
W = 100(2 - X) = 200 - 100X, which equals 100 if X = 1, 0 if X = 2, and -100 if X = 3.
Thus, by Fact 3.52,
E[W] = E[200 - 100X] = 200 - 100 E[X] = 200 - 100 · (3/2) = 50.
3.29. The possible values of X are the possible class sizes: 17, 21, 24, 28. We can
compute the corresponding probabilities by computing the probability of choosing
a student from that class:
pX(17) = 17/90,  pX(21) = 21/90 = 7/30,  pX(24) = 24/90 = 4/15,  pX(28) = 28/90 = 14/45.
From this we can compute E[X]:
E[X] = Σ_k k P(X = k) = 17 · (17/90) + 21 · (7/30) + 24 · (4/15) + 28 · (14/45) = 209/9.
and
E[X^2] = ∫_{-∞}^{∞} x^2 f(x) dx = ∫_1^∞ x^2 · 3x^{-4} dx = [-3x^{-1}]_{x=1}^{x=∞} = 3.
From this we get
Var(X) = E[X^2] - (E[X])^2 = 3 - 9/4 = 3/4.
(g) We have
E[5X^2 + 3X] = ∫_{-∞}^{∞} (5x^2 + 3x) f(x) dx = ∫_1^∞ (5x^2 + 3x) · 3x^{-4} dx
= [-15x^{-1} - (9/2)x^{-2}]_{x=1}^{x=∞} = 15 + 9/2 = 39/2.
(h) We have
E[X^n] = ∫_{-∞}^{∞} x^n f(x) dx = ∫_1^∞ x^n · 3x^{-4} dx.
Evaluating this integral for integer values of n we get
E[X^n] = ∞ for n ≥ 3, and E[X^n] = 3/(3 - n) for n ≤ 2.
(d) We have
E[X^{1/4}] = ∫_1^∞ x^{1/4} · (1/2) x^{-3/2} dx = ∫_1^∞ (1/2) x^{-5/4} dx = [-2x^{-1/4}]_{x=1}^{x=∞} = 2.
3.33. (a) A probability density function must be nonnegative, and it has to inte-
grate to 1. Thus c ≥ 0 and we must have
1 = ∫_{-∞}^{∞} f(x) dx = ∫_1^2 (1/4) dx + ∫_3^5 c dx = 1/4 + 2c.
This gives c = 3/8.
(b) Since X has a probability density function we can compute the probability in
question by integrating f(x) on the interval [3/2, 4]:
P(3/2 < X < 4) = ∫_{3/2}^4 f(x) dx = ∫_{3/2}^2 (1/4) dx + ∫_3^4 c dx = (1/2) · (1/4) + 1 · c = 1/2.
(c) We can compute the expectation using the formula E[X] = ∫_{-∞}^{∞} x f(x) dx and
evaluating the integral using the definition of f:
E[X] = ∫_{-∞}^{∞} x f(x) dx = ∫_1^2 x · (1/4) dx + ∫_3^5 x · c dx
= [x^2/8]_{x=1}^{x=2} + [cx^2/2]_{x=3}^{x=5} = 3/8 + (3/8) · (25 - 9)/2 = 3/8 + 3 = 27/8.
3.34. (a) Since X is discrete, we can compute E[g(X)] using the following formula:
X 1 1 1
E[g(X)] = P (X = k)g(k) = g(1) + g(2) + g(5).
2 3 6
k
R3
We could also compute this probability by evaluating the integral 2 f (x)dx.
(c) Using the probability density function we can write
Z 1
2 2X
E[(1 + X) e ]= f (x)(1 + x)2 e 2x dx
0
Z 1 Z 1
2 2x 2
= (1 + x) e (1 + x) dx = e 2x dx
0 0
1
1 2x 1
= e = .
2 x=0 2
3.38. (a) Since Z is continuous and the pd.f. is given, we can compute its expec-
tation as
Z 1 Z 1 z=1
E[Z] = zf (z)dz = z · 52 z 4 dz = 12
5 6
z = 0.
1 1 z= 1
(b) We have
Z 1/2 Z 1/2 z=1/2
5 4 1 5
P (0 < Z < 1/2) = f (z)dz = 2 z dz = 12 z 5 = 1
2 2 = 1
64 .
0 0 z=0
(c) We have
Then F (1) = P (X 1) = P (X = 1) = 13 ,
3
F (2) = P (X 2) = P (X = 1) + P (X = 2) =
4
and
F (3) = P (X 3) = P (X = 1) + P (X + 2) + P (X = 3) = 1.
(b) There are a number of possible solutions. Here is one that can be checked easily
using part (a):
81
>
> 3 0x1
>
<5 1<x2
f (x) = 12 1
>
> 2<x3
>
:4
0 otherwise.
1
3.40. Here is a continuous example:
R 1 let f (x) = x2 for x 1 and 0 otherwise. This
is a nonnegative function with 1 f (x)dx = 1, thus there is a random variable X
with p.d.f. f . Then the cumulative distribution function of X is given by
Z x (
0, if x < 1
F (x) = f (y)dy = R x 1
1 1 y2
dy = 1 1/x, if x 1.
1
In particular, F (n) = 1 n for each positive integer n.
3.41. We begin by deriving the probability F (s) = P (X s) using the law of total
probability. For s 2 (3, 4),
6
X 3
X 6
X
1 s 1
F (s) = P (X s) = P (X s | Y = k)P (Y = k) = + ·
6 k 6
k=1 k=1 k=4
1 37s
= + .
2 360
We can find the density function f on the interval (3, 4) by di↵erentiating this.
Thus
f (s) = F 0 (s) = 37
360 for s 2 (3, 4).
3.42. (a) Note that 0 X 1 so FX (x) = 1 for x 1 and FX (x) = 0 for x < 0.
For 0 x < 1 the event {X x} is the same as the event that the chosen
point is in the trapezoid Dx with vertices (0, 0), (x, 0), (x, 2 x), (0, 2). The
area of this trapezoid is 12 (2 + 2 x)x, while the area of D is (2+1)1
2 = 32 . Thus
1
area(Dx ) 2 (2 +2 x)x 4x x2
P (X x) = = 3 = .
area(D) 2
3 3
Thus 8
>
<1, if x 1
4x x2
FX (x) = 3 , if 0 x < 1
>3
:
0, if x < 0.
To find FY we first note that 0 Y 2 so FY (y) = 1 for y 2 and FY (y) = 0
for y < 0.
For 0 y < 1 the event {Y y} is the same as the event that the chosen
point is in the rectangle with vertices (0, 0), (0, y), (1, y), (1, 0). The area of
this rectangle is y, so in that case P (Y y) = y3 = 2y3 .
2
If 1 y < 2 then the event {Y y} is the same as the event that the
chosen point in the region Dy with vertices (0, 0), (0, y), (2 y, y), (1, 1), (1, 0).
The area of this region can be computed for example by subtracting the area of
the triangle with vertices (2, 0), (0, y), (2 y, y) from the area of D, this gives
y2 1
3 (2 y)2 y2 1 2y 1
2 2 = 2y 2 2. Thus P (Y y) = 3
2 2
= 3 4y y2 1
2
Thus we have
8
>
> 1, if y 2
>
< 1 4y 2
y 1 , if 1y<2
FY (y) = 32y
>
> , if 0y<1
>
:3
0, if x < 0.
(b) Both cumulative distribution functions found in part (a) are continuous ev-
erywhere, and di↵erentiable everywhere apart from maybe a couple of points.
Thus we can find fX and fY by di↵erentiating FX and FY :
(
4 2x
3 , if 0 x < 1
fX (x) = 3
0, otherwise.
8
> 1
< 3 (4 2y) , if 1 y < 2
fY (y) = 23 , if 0 y < 1
>
:
0, otherwise.
3.43. If (a, b) is a point in the square [0, 1]2 then the distances from the four sides
are a, b, 1 a, 1 b and the minimal distance is the minimum of these four numbers.
Since min(a, 1 a) 1/2, this minimal distance is at most 1/2 (which can be
achieved at (a, b) = (1/2, 1/2)), and at least 0. Thus the possible values of X are
from the interval [0, 1/2].
(a) We would like to compute F (x) = P (X x) for all x. Because 0 X 1/2,
we have F (x) = 0 for x < 0 and F (x) = 1 for x > 1/2.
Denote the coordinates of the randomly chosen point by A and B. If 0 x
1/2 then the set {X x}c = {X > x} is the same as the set
{x < A, x < 1 A, x < B, 1 x < B} = {x < A < 1 x, x < B < 1 x}.
2
This is the same as the point (A, B) being in the square (x, 1 x) which has
probability (1 2x)2 . Hence, for 0 x 1/2 we have
F (x) = P (X x) = 1 P (X > x) = 1 (1 2x)2 = 4x 4x2 .
(b) Since the cumulative distribution function F (x) that we found in part (a) is
continuous, and it is di↵erentiable apart from x = 0, we can find f (x) just by
di↵erentiating F (x). This means that f (x) = 4 8x for 0 x 1/2 and 0
otherwise.
3.44. (a) Let s be a real number. Let α = arctan(s) ∈ (-π/2, π/2) be the angle
corresponding to the slope s, that is, the number α ∈ (-π/2, π/2) with tan(α) = s.
The event {S ≤ s} is the same as the event that the uniformly chosen
point is in the circular sector corresponding to the angles between -π/2 and α.
The area of this circular sector is proportional to the angle α + π/2, while the whole
half disk corresponds to the angle π. Thus
FS(s) = P(S ≤ s) = (α + π/2)/π = 1/2 + arctan(s)/π.
(b) The c.d.f. found in part (a) is differentiable everywhere, hence the p.d.f. is equal
to its derivative:
fS(s) = (1/2 + arctan(s)/π)′ = 1/(π(1 + s^2)).
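A short simulation supports the answer: sampling the point uniformly from the upper half disk (radius 1 is used in the sketch below, a choice of ours; the slope distribution does not depend on the radius), the empirical distribution function of S should track 1/2 + arctan(s)/π. This check is not part of the solution itself.

import math, random

def sample_slope():
    # rejection sampling of a uniform point in the upper half disk
    while True:
        x, y = random.uniform(-1, 1), random.uniform(0, 1)
        if x * x + y * y <= 1 and x != 0:
            return y / x

samples = [sample_slope() for _ in range(100_000)]
for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    empirical = sum(v <= s for v in samples) / len(samples)
    print(s, round(empirical, 3), round(0.5 + math.atan(s) / math.pi, 3))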
Y
3.45. Let (X, Y ) be the uniformly chosen point, then S = X. We can disregard
the case X = 0, as the probability of this is 0.
(a) We need to compute F (s) = P (S s) for all s.
The slope S can be any nonnegative number, but it cannot be negative. Thus
FS (s) = P (S s) = 0 if s < 0.
If 0 s 1 then the points (x, y) 2 [0, 1]2 with y/x s are exactly the points
in the triangle with vertices (0, 0), (1, 0), (1, s). The area of this triangle is s/2,
hence for 0 s 1 we have FS (s) = s/2.
If 1 < s then the points (x, y) 2 [0, 1]2 with y/x s are either in the triangle
with vertices (0, 0), (1, 0), (1, 1) or in the triangle with vertices (0, 0), (1, 1), (1/s, 1).
The area of the union of these triangles is 1/2 + 21 (1 1/s) = 1 2s 1
, hence for
1
1 < s we have FS (s) = 1 2s .
To summarize: 8
>
<0 s<0
1
F (s) = s 0<s1.
>2
: 1
1 2s 1<s
(b) Since F (s) is continuous everywhere and it is di↵erentiable apart from s = 0,
we can get the probability density function f (s) just by di↵erentiating F . This
gives
8
>
<0 s<0
1
f (s) = 2 0<s1.
>
: 1
2s2 1<s
3.46. (a) The smaller piece cannot be larger than `/2, hence 0 X `/2. Thus
FX (x) = 0 for x < 0 and FX (x) = 1 for x `/2.
For 0 x < `/2 the event {X x} is the same as the event that the chosen
point where we break the stick in two is within x of one of the end points. The
set of possible locations is thus the union of two intervals of length x, hence
the probability of the uniformly chosen point to be in this set is 2·x
` . Hence for
0 x < `/2 we have FX (x) = 2x ` .
To summarize
8
>
<1 for x `/2
2x
FX (x) = ` for 0 x < `/2
>
:
0 for x < 0.
(b) The c.d.f. found in part (a) is continuous everywhere, and di↵erentiable apart
from x = `/2. Hence we can find the p.d.f. by di↵erentiating it, which gives
(
2
for 0 x < `/2
fX (x) = `
0 otherwise.
3.47. (a) We need to find F (x) = P (X x) for all x. The X coordinate of a point
in the triangle must be between 0 and 30, so F (x) = 0 for x < 0 and F (x) = 1 for
x 30.
For 0 x < 30 then the set of points in the triangle with X x is the triangle
with vertices (0, 0), (x, 0) and (x, 23 x). The area of this triangle is 13 x2 , while the
area of the original triangle is 20·30
2 = 300. This means that if 0 x < 30 then
1 2
3x x2
F (x) = 300 = 900 . Thus
8
>
<0 2 x<0
x
F (x) = 0 x < 30 .
> 900
:
1 x 30
3.50. (a) For " < t < 9 the event {t " < R < t} is the event that the dart lands
in the annulus (or ring) with radii t ✏ and t. The area of this annulus is
In the last sum we are summing for k, j with 1 j k. If we reverse the order of
summation, then k will go from j to 1, while j goes from 1 to 1:
1 X
X k 1 X
X 1
(1 p)k 1
p= (1 p)k 1
p.
k=1 j=1 j=1 k=j
Note that in the double sum we have 1 k i. If we switch the order of the two
summations (which is allowed, since each term is nonnegative) then k goes from 1
to i, and i goes from 1 to 1:
1 X
X 1 1 X
X i
P (X = i) = P (X = i).
k=1 i=k i=1 k=1
Pi
Since P (X = i) does not depend on k, we have k=1 P (X = i) = iP (X = i) and
hence
X1 X1 X i 1
X
P (X k) = P (X = i) = iP (X = i).
k=1 i=1 k=1 i=1
P1
Because X takes only nonnegative integers we have E[X]
P1 = i=0 iP (X = i), and
since thePi = 0 term is equal to zero we have E[X] = i=1 iP (X = i). This proves
1
E[X] = k=1 P (X k).
3.53. (a) Since X is discrete, taking values from 0, 1, 2, . . . , we can compute its
expectation as follows:
E[X] = Σ_{k=0}^∞ k P(X = k) = 0 · (3/4) + Σ_{k=1}^∞ k · (1/2) · (1/3)^k = (1/2) Σ_{k=1}^∞ k (1/3)^k.
The infinite sum may be computed using the identity Σ_{k=1}^∞ k x^{k-1} = 1/(1-x)^2 (which
holds for |x| < 1, and follows from Σ_{k=0}^∞ x^k = 1/(1-x) by differentiation):
Σ_{k=1}^∞ k (1/3)^k = (1/3) Σ_{k=1}^∞ k (1/3)^{k-1} = (1/3) · 1/(1 - 1/3)^2 = 3/4,
which gives E[X] = (1/2) · (3/4) = 3/8.
Another way to arrive at this solution would be to apply the approach outlined
in Exercise 3.51.
(b) To compute Var(X) we need E[X^2]. It turns out that E[X^2 - X] = E[X(X-1)]
is easier to compute:
E[X(X-1)] = Σ_{k=0}^∞ k(k-1) P(X = k) = Σ_{k=2}^∞ k(k-1) · (1/2) · (1/3)^k.
Next we can use that for |x| < 1 we have Σ_{k=2}^∞ k(k-1) x^{k-2} = 2/(1-x)^3. (This
follows from Σ_{k=0}^∞ x^k = 1/(1-x) by differentiating twice.)
Σ_{k=2}^∞ k(k-1) · (1/2) · (1/3)^k = (1/2) · (1/3)^2 Σ_{k=2}^∞ k(k-1) (1/3)^{k-2} = (1/18) · 2/(1 - 1/3)^3 = 3/8.
Thus E[X(X-1)] = 3/8 and hence
E[X^2] = E[X(X-1) + X] = E[X(X-1)] + E[X] = 3/8 + 3/8 = 3/4
and
Var(X) = E[X^2] - (E[X])^2 = 3/4 - (3/8)^2 = 39/64.
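Truncating the series numerically (at k = 200, far beyond where the terms matter) reproduces E[X] = 3/8, E[X^2] = 3/4 and Var(X) = 39/64; this short check is our own addition.

# pmf: P(X = 0) = 3/4, P(X = k) = (1/2) * (1/3)**k for k >= 1
mean = sum(k * 0.5 * (1/3)**k for k in range(1, 200))
second = sum(k * k * 0.5 * (1/3)**k for k in range(1, 200))
print(mean, second, second - mean**2)   # 0.375, 0.75, 0.609375 (= 39/64)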
3.54. (a) We have P(X ≥ k) = (1-p)^{k-1}. We can compute this by evaluating the
geometric series
P(X ≥ k) = Σ_{ℓ=k}^∞ P(X = ℓ) = Σ_{ℓ=k}^∞ p q^{ℓ-1}.
An easier way is to note that if X is the number of trials needed for the first
success then {X ≥ k} is the event that the first k-1 trials are all failures,
which has probability (1-p)^{k-1}.
(b) By Exercise 3.52 we have
E[X] = Σ_{k=1}^∞ P(X ≥ k) = Σ_{k=1}^∞ (1-p)^{k-1} = 1/(1-q) = 1/p.
3.55. We first find the probability mass function of Y . The possible values are
1, 2, 3, . . . . Peter wins the game if Y is an odd number, and Mary wins the game if
it is even. If n 0 then
P (Y = 2n + 1)
= P (Peter misses n times, Mary misses n times, Peter hits bullseye next)
= (1 p)n (1 r)n p.
Similarly, for n 1:
P (Y = 2n)
= P (Peter misses n times, Mary misses n 1 times, Mary hits bullseye next)
n n 1
= (1 p) (1 r) r.
Then
1
X 1
X 1
X
E[Y ] = kP (Y = k) = (2n + 1)(1 p)n (1 r)n p + 2n(1 p)n (1 r)n 1
r.
k=1 n=0 n=1
The evaluationPof these sums is a bitP1lengthy, but in the end one just has to use
1
the identities k=0 xk = 1 1 x and k=1 kxk 1 = (1 1x)2 , which holds for |x| < 1.
To simplify notations a little bit, we introduce s = (1 p)(1 r).
1
X 1
X 1
X 1
X
(2n + 1)(1 p)n (1 r)n p = (2n + 1)sn p = 2nsn p + sn p
n=0 n=0 n=0 n=0
1
X 1
X
= 2sp nsn 1
+p sn
n=1 n=0
2sp p p(1 + s)
= + = .
(1 s)2 1 s (1 s)2
1
X 1
X
2n(1 p)n (1 r)n 1
r = 2(1 p)r n(1 p)n 1
(1 r)n 1
n=1 n=1
X1
2(1 p)r
= 2(1 p)r nsn 1
= .
n=1
(1 s)2
This gives
p(1 + s) + 2(1 p)r
E[Y ] = .
(1 s)2
Substituting back s = (1 r)(1 p) = 1 p r + pr:
p(1 + (1 p)(1 r)) + 2(1 p)r (2 p)(p + r pr) 2 p
E[Y ] = = = .
(p + r pr)2 (p + r pr)2 p + r pr
For r = p the random variable Y has geometric distribution with parameter p, and
our formula gives 2p2 pp2 = p1 , as it should.
3.56. Using the hint we compute E[X(X 1)] first. Using the formula for the
expectation of a function of a discrete random variable we get
1
X 1
X 1
X
E[X(X 1)] = k(k 1)pq k 1
= pq k(k 1)q k 2
= pq k(k 1)q k 2
.
k=1 k=1 k=0
(We used that k(k 1) = 0 for k = 0.) Note that k(k 1)q k 2 = (q k )00 for k 2,
and the formula also works for k = 0 and 1.
P1
The identity 1 1 x = k=0 xk holds for |x| < 1, and di↵erentiating both sides
we get
✓ ◆0 1
!00 1
1 2 X X
k
= = x = k(k 1)xk 2 .
1 x (1 x)3
k=0 k=0
(We are allowedPto di↵erentiate the series term by term for |x| < 1.) Thus for
1
|x| < 1 we have k=0 k(k 1)xk 2 = (1 2x)3 and thus
1
X 2 2q
E[X(X 1)] = pq k(k 1)q k 2
= pq · = ,
(1 q)3 p2
k=0
where we used p + q = 1.
Then
1 2q p + 2q 1+q
E[X 2 ] = E[X] + E[X(X 1)] = + = =
p p2 p2 p2
where we used p + q = 1 again.
3.57. We have P (X = k) = p(1 p)k 1
for k 1. Hence we can compute E[ X1 ]
using the following formula:
1
X 1
E[ X1 ] = p(1 p)k 1
.
k
k=1
P1
In order to evaluate the infinite sum, we start with the identity 1 1 x = k=0 xk
which holds for |x| < 1, and then integrate both sides from 0 to y with |y| < 1:
Z y Z yX1
1
dx = xk dy.
0 1 x 0 k=0
Ry 1 1
On the left side we have 0 1 x
dx = ln( 1 y ). On the right side we integrate term
by term to get
Z 1
yX 1
X 1
X
y k+1 yn
xk dy = = .
0 k=0 k + 1 n=1 n
k=0
1
X 1
E[ X1 ] = p(1 p)k 1
=
k
k=1
1
X
p (1 p)k p
= ln( p1 )
1 p k 1 p
k=1
3.58. Using the formula for the expected value of a function of a discrete random
variable we get
Xn ✓ ◆
1 n k
E[X] = p (1 p)n k .
k+1 k
k=0
We have
✓ ◆
1 n 1 n! n!
= =
k+1 k k + 1 k!(n k)! (k + 1)!(n k)!
1 (n + 1)!
=
n + 1 (k + 1)!((n + 1) (k + 1))!
✓ ◆
1 n+1
= .
n+1 k+1
1 X ✓n + 1 ◆
n+1
= p` (1 p)n+1 ` .
p(n + 1) `
`=1
Adding and removing the ` = 0 term to the sum and using the binomial theorem
yields
1 X ✓ n + 1◆
n+1
E[X] = p` (1 p)n+1 `
p(n + 1) `
`=1
!
1 X ✓n + 1 ◆
n+1
= p` (1 p)n+1 `
(1 p)n+1
p(n + 1) `
`=0
1
= (1 (1 p)n+1 ).
p(n + 1)
3.59. (a) Using the solution for Example 1.38 we see that the following function
works: 8
>
> 10 if 0 r 1,
>
>
>
> if 1 < r 3,
<5
g(r) = 2 if 3 < r 6,
>
>
>
> 1 if 6 < r 9,
>
>
:0 otherwise.
Since 0 R 9 we could have defined g any way we like it outside [0, 9]. (b) The
probability mass function for X is given by
1 8 27 45
pX (10) = , pX (5) = , pX (2) = , pX (1) = .
81 81 81 81
Thus the expectation is
1 8 27 45 149
E[X] = 10 · +5· +2· +1· =
81 81 81 81 81
(c) Using the result of Example 3.19 we see that the probability density fR (r) of R
2r
is 81 for 0 < r 9 and zero otherwise. We can now compute the expectation of
X = g(R) as follows:
Z 1
E[X] = E[g(R)] = g(r)fR (r)dr
1
Z 1 Z 3 Z 6 Z 9
2r 2r 2r 2r
= 10 · dr + 5 · dr + 2 · dr + 1 · dr
0 81 1 81 3 81 6 81
149
= .
81
3.60. (a) Let pX be the probability mass function of X. Then
X X X
E[u(X) + v(X)] = pX (k)(u(k) + v(k)) = pX (k)u(k) + pX (k)v(k)
k k k
= E[u(X)] + E[v(X)].
The first step is the expectation of a function of a discrete random variable.
In the second step we broke the sum into two parts. (This actually requires
care in case of infinitely many terms. It is a valid step in this case because u
and v are bounded and hence all the sums involved are finite.) In the last step
we again used the formula for the expected value of a function of a discrete
random variable.
(b) Suppose that the probability density function of X is f . Then
Z 1 Z 1 Z 1
E[u(X) + v(X)] = f (x)(u(x) + v(x))dx = f (x)u(x)dx + f (x)v(x)dx
1 1 1
= E[u(X)] + E[v(X)].
The first step is the formula for the expectation of a function of a continuous
random variable. In the second step we rewrote the integral of a sum as the
sum of the integrals. (This is a valid step because u and v are bounded and
thus all the integrals involved are finite.) In the last step we again used the
formula for the expected value of a function of a continuous random variable.
3.61. (a) Note that the range of X is [0, M ]. Thus, we know that
FX (s) = 0 if s < 0, and F (s) = 1 if s > M.
Next, for s 2 [0, M ] we have
Z s
2s s2
FX (s) = P (X s) = 2(M x)/M 2 dx = .
0 M M2
(b) We have
(
X if X 2 [0, M/2]
Y = .
M/2 if X 2 (M/2, M ]
(d) We have
3
P (Y < M/2) = lim FY (y) = .
y! M
2
4
Another way to see this is by noticing that
1 3
P (Y < M/2) = 1 P (Y M/2) = 1 P (Y = M/2) = 1 = .
4 4
(e) Y cannot be continuous, as P (Y = M/2) = 14 > 0. But it cannot be discrete
either, as there are no other values which Y takes with positive probability.
Thus there is no density, nor is there a probability mass function.
3.62. From the set-up we know F (s) = 0 for s < 0 because negative values have no
probability and F (s) = 1 for s 3/4 because the boy is sure to be inside by time
3/4. For values 0 s < 3/4 the probability P (X s) comes from the uniform
distribution and hence equals s, the length of the interval [0, s]. To summarize,
8
>
<0, s < 0
F (s) = s, 0 s < 3/4
>
:
1, s 3/4.
In particular, we have a jump in F that gives the probability for the value 3/4:
P (X = 34 ) = F ( 43 ) F ( 34 ) =1 3
4 = 14 .
This reflects the fact that, left to his own devices, the boy would come in after time
3/4 with probability 1/4. This option is removed by the mother’s call and so all
this probability concentrates on the value 3/4.
P
3.63. (a) We have E[X] = k kpX (k). Because X is symmetric, we must have
P (X = k) = P (X = k) for all k. Thus we can write the sum as
X X X
E[X] = kpX (k) = 0·pX (0)+ kpX (k)+( k)pX ( k) = k(pX (k) pX ( k)) = 0
k k>0 k>0
P1
with C = P11 1 . Since 0 < 1
< 1, this is indeed a probability mass
k=1 k3
k=1 k3
function. Moreover, we have
1
X 1
X X 1 1
C
E[X] = kP (X = k) = k· 3
=C < 1.
k k2
k=1 k=1 k=1
and
1
X 1
X X1 1
2 2 C
2
E[X ] = k P (X = k) = k · 3 =C = 1.
k k
k=1 k=1 k=1
2
3.65. (a) We have Var(2X + 1) = 2 Var(X) = 4 · 3 = 12.
(b) We have
E[(3X 4)2 ] = E[9X 2 24X + 16] = 9E[X 2 ] 24E[X] + 16.
2
We know that Var(X) = E[X ] E[X] , so E[X ] = Var(X) + E[X]2 = 3 + 22 = 7.
2 2
Thus
E[(3X 4)2 ] = 9E[X 2 ] 24E[X] + 16 = 9 · 7 24 · 2 + 16 = 31.
p
3.66. We can express X as X = 3Y + 8 where Y ⇠ N (0, 1). Then
p
0.15 = P (X > ↵) = P ( 3Y + 8 > ↵) = P (Y > ↵p38 ) = 1 ( ↵p38 ).
x2
We can evaluate the integral using integration by parts noting that e 2 x =
x2
( e 2 )0 :
Z 1 Z 1
1 x2
4 1 x2
p e 2 x dx = p e 2 x · x3 dx
1 2⇡ 1 2⇡
Z 1
1 x2
3 x=1 1 x2
=p ( e 2 ) · x x= 1 p ( e 2 ) · 3x2
2⇡ 1 2⇡
Z 1
1 x2
=3 p e 2 x2 dx = 3.
1 2⇡
x2 R1 x2
We used that lim e 2 x3 = 0 (and the same for x ! 1), and that 1 p12⇡ e 2 x2 dx =
x!1
E[Z 2 ] = 1.
Hence E[Z 4 ] = 3.
(b) We can express X as X = Y + µ where Y ⇠ N (0, 1). Then
E[X 4 ] = E[( Y + µ)4 ]
4
= E[ Y4+4 3
µY 3 + 6 2 2
µ Y 2 + 4 Y µ 3 + µ4 ]
4
= E[Y 4 ] + 4 3
µE[Y 3 ] + 6 2 2
µ E[Y 2 ] + 4 µ3 E[Y ] + µ4 .
We know that E[Y ] = 0, E[Y 2 ] = 1. By part (a) we have E[Y 4 ] = 3 and by
the previous problem we have E[Y 3 ] = 0. Substituting these in the previous
expression we get
E[X 4 ] = 3 4 + 6 2 µ2 + µ4 .
3.69. Denote the nth moment E[Z n ] by mn . It can be computed as
Z 1 Z 1
1 x2
mn = xn '(x)dx = xn p e 2 dx
1 1 2⇡
We have seen that m1 = E[Z] = 0 and m2 = E[Z 2 ] = 1.
Suppose first that n = 2k + 1 is an odd number. Then the function x2k+1 is
odd and hence the function x2k+1 '(x) is odd as well. If the
R 1integral is finite then
the contribution of the positive and negative half lines in 1 x2k+1 '(x)dx cancel
each other out and thus m2k+1 = 0. The fact that the integral is finite follows from
x2
the fact that for any fixed n xn grows a lot slower than e 2 .
For n = 2k ≥ 2 we see that x^n φ(x) is even, and thus (if the integrals are finite)
we have
m_{2k} = ∫_{-∞}^{∞} x^{2k} φ(x) dx = 2 ∫_0^∞ x^{2k} φ(x) dx.
Using integration by parts with the functions x^{2k-1} and xφ(x) = (-(1/√(2π)) e^{-x^2/2})′ = (-φ(x))′ we get
∫_0^∞ x^{2k} φ(x) dx = [-x^{2k-1} (1/√(2π)) e^{-x^2/2}]_{x=0}^{x=∞} + ∫_0^∞ (2k-1) x^{2k-2} φ(x) dx
= (2k-1) ∫_0^∞ x^{2k-2} φ(x) dx.
Here the boundary term at ∞ disappears because x^n e^{-x^2/2} → 0 for any n ≥ 0 as
x → ∞. The integration by parts reduced the exponent of x by 2, and multiplying
both sides by 2 gives
m_{2k} = (2k-1) m_{2k-2}.
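The recursion, together with m_0 = 1, gives m_{2k} = (2k-1)(2k-3)···3·1. A crude numerical integration (midpoint rule on [-8, 8]; the cutoff and step count are our own choices) agrees with the recursion for the first few even moments.

from math import exp, pi, sqrt

def phi(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def moment_numeric(n, a=-8.0, b=8.0, steps=100_000):
    h = (b - a) / steps
    return h * sum((a + (i + 0.5) * h) ** n * phi(a + (i + 0.5) * h) for i in range(steps))

m = 1.0
for k in range(1, 5):
    m *= 2 * k - 1                       # recursion m_{2k} = (2k-1) m_{2k-2}
    print(2 * k, m, round(moment_numeric(2 * k), 3))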
We have
y b
a µ y (aµ + b)
=
a
⇣ ⌘
y (aµ+b)
thus FY (y) = a . By (3.42) this is exactly the c.d.f. of a N (aµ+b, a2 2
)
2 2
distributed random variable, so Y ⇠ N (aµ + b, a ).
If a < 0 then
!
y b
y b µ
FY (y) = P (aX + b y) = P (X a ) =1 FX ( y a b ) =1 a
.
3.71. We define noon to be time zero. Let X ~ N(0, 36) model the arrival time of
the bus in minutes (since the standard deviation is 6). Thus, X = 6Z where Z ~
N(0, 1). The question is then:
P(X > 5) = P(6Z > 5) = P(Z > 5/6) = 1 - Φ(0.83) ≈ 1 - 0.7967 = 0.2033.
3.72. Define the random variable X as the number of points made on one swing of
an axe. Note that X is a discrete random variable taking values {0, 5, 10, 15} and
its expected value can be computed as
X
E[X] = kP (X = k) = 0P (X = 0) + 5P (X = 5) + 10P (X = 10) + 15P (X = 15).
k
From the point system given in the problem we have
P(X = 5) = P(-20 ≤ Y ≤ -10) + P(10 ≤ Y ≤ 20) = 2P(10 ≤ Y ≤ 20),
P(X = 10) = P(-10 ≤ Y ≤ -3) + P(3 ≤ Y ≤ 10) = 2P(3 ≤ Y ≤ 10),
P(X = 15) = P(-3 ≤ Y ≤ 3) = 2P(0 ≤ Y ≤ 3).
(b) Here p = 31/365, and we get
P(S > 130) ≈ 1 - Φ((130 - 1200p)/√(1200 p(1-p))) ≈ 1 - Φ(2.91) ≈ 0.0018.
4.2. Let S be the number of hands with a single pair that are observed in 1000
poker hands. Then S ⇠ Bin(n, p) where n = 1000 and p is the probability of
getting a single pair in a poker hand of 5 cards. We take p = 0.42, which is the
approximate success probability given in the exercise.
To approximate P (S 450) we use the normal approximation. With p = 0.42,
np(1 p) = 243.6 so we can feel confident about using this method.
We have E[S] = np = 420 and Var(S) = 243.6. Then
P(S ≥ 450) = P((S - 420)/√243.6 ≥ (450 - 420)/√243.6) ≈ P((S - 420)/√243.6 ≥ 1.92) ≈ P(Z ≥ 1.92),
Here we have ε = 0.02 and need 2Φ(2ε√n) - 1 ≥ 0.95. This leads to Φ(2ε√n) ≥ 0.975
which, by the table of Φ-values, is satisfied if 2ε√n ≥ 1.96. Solving this inequality gives
n ≥ 1.96^2/(4ε^2) = 2401.
Thus the size of the sample should be at least 2401.
4.7. Now n = 1, 000 and take Sn ⇠ Bin(n, p), where p is unknown. We estimate p
with p̂ = Sn /1000 = 457/1000 = .457. For the 95% confidence interval we need to
find " > 0 such that
P (|p̂ p| < ") 0.95.
Then the confidence interval is (0.457 ", 0.457 + ").
Repeating again the normal approximation procedure gives
P(|p̂ - p| < ε) = P(-ε < p̂ - p < ε) = P(-ε < (Sn - np)/n < ε)
= P(-ε√n/√(p(1-p)) < (Sn - np)/√(n p(1-p)) < ε√n/√(p(1-p)))
≈ 2Φ(ε√n/√(p(1-p))) - 1.
Note that √(p(1-p)) ≤ 1/2 on the interval [0, 1], from which we conclude that
2Φ(ε√n/√(p(1-p))) - 1 ≥ 2Φ(2ε√n) - 1,
and so
P(|p̂ - p| < ε) ≥ 2Φ(2ε√n) - 1.
Hence, we just need to find ε > 0 satisfying
2Φ(2ε√n) - 1 = 0.95  ⟹  Φ(2ε√n) = 0.975  ⟹  2ε√n ≈ 1.96.
Thus, take
ε = 1.96/(2√1000) ≈ 0.031
and the confidence interval is
(0.457 - 0.031, 0.457 + 0.031).
4.8. We have n =1,000,000 trials with an unknown success probability p. To find a
99.9% confidence interval we need an " > 0 so that P (|p̂ p| < ") 0.999, where p̂
is the fraction of positive outcomes. We have seen in Section 4.3 that P (|p̂ p| < ")
can be estimated using the normal approximation as
P(|p̂ - p| < ε) ≈ 2Φ(ε√n/√(p(1-p))) - 1 ≥ 2Φ(2ε√n) - 1.
We need 2Φ(2ε√n) - 1 ≥ 0.999 which means Φ(2ε√n) ≥ 0.9995 and so approximately
2ε√n ≥ 3.32. (Since 0.9995 appears several times in our table, other values
instead of 3.32 are also acceptable.) This gives
ε ≥ 3.32/(2√n) ≈ 0.00166
and
P(X ≤ 13 | X ≥ 7) = P(X ≤ 13 and X ≥ 7)/P(X ≥ 7) = (Σ_{k=7}^{13} e^{-λ} λ^k/k!) / (1 - Σ_{k=0}^{6} e^{-λ} λ^k/k!)
≈ 0.7343/0.8699 ≈ 0.844.
4.10. It is reasonable to assume that the hockey player has a number of scoring
chances per game, but only a few of them result in goals. Hence the number
of goals in a given game corresponds to counting rare events, which means that
it is reasonable to approximate this random number with a Poisson(λ) distributed
random variable. Then the probability of scoring at least one goal would be 1 - e^{-λ}
(since e^{-λ} is the probability of no goals). Using the setup of the problem we have
1 - e^{-λ} ≈ 0.5 which gives λ ≈ ln(2) ≈ 0.6931. We estimate the probability that
the player scores exactly 3 goals. Using the Poisson probability mass function and
our estimate on λ gives
P(exactly 3 goals) = e^{-λ} λ^3/3! ≈ 0.028.
Thus we would expect the player to get a hat-trick in about 2.8% of his games.
Equally valid is the answer where we estimate the probability of scoring at least
3 goals:
P(at least 3 goals) = 1 - P(at most 2 goals) = 1 - e^{-λ} - λe^{-λ} - (λ^2/2) e^{-λ}
= 1 - (1/2)(1 + ln 2 + (1/2)(ln 2)^2) ≈ 0.033.
4.11. We assume that typos are rare events that do not strongly depend on each
other. Hence the number of typos on a given page should be well-approximated by
a Poisson random variable with parameter = 6, since that is the average number
of typos per page.
Let X be the number of errors on page 301. We now have
P(X ≥ 4) = 1 - P(X ≤ 3) ≈ 1 - Σ_{k=0}^{3} e^{-6} 6^k/k! = 0.8488.
4.12. The probability density function fT(x) of T is λe^{-λx} for x ≥ 0 and 0 otherwise.
Thus E[T^3] can be evaluated as
E[T^3] = ∫_{-∞}^{∞} fT(x) x^3 dx = ∫_0^∞ x^3 λe^{-λx} dx.
To compute the integral we use integration by parts with λe^{-λx} = (-e^{-λx})′:
∫_0^∞ x^3 λe^{-λx} dx = [-x^3 e^{-λx}]_{x=0}^{x=∞} + ∫_0^∞ 3x^2 e^{-λx} dx = ∫_0^∞ 3x^2 e^{-λx} dx.
Note that [-x^3 e^{-λx}]_{x=0}^{x=∞} = 0 because lim_{x→∞} x^3 e^{-λx} = 0. To evaluate ∫_0^∞ 3x^2 e^{-λx} dx
we can integrate by parts twice more, or we can quote equation (4.18) from the
text to get
∫_0^∞ 3x^2 e^{-λx} dx = (3/λ) ∫_0^∞ x^2 λe^{-λx} dx = (3/λ) · (2/λ^2) = 6/λ^3.
Thus E[T^3] = 6/λ^3.
4.13. The probability density function of T is fT(x) = (1/3)e^{-x/3} for x ≥ 0, and zero
otherwise. The cumulative distribution function is FT(x) = 1 - e^{-x/3} for x ≥ 0,
and zero otherwise. From this we can compute
P(T > 3) = 1 - FT(3) = e^{-1},
P(1 ≤ T < 8) = FT(8) - FT(1) = e^{-1/3} - e^{-8/3},
P(T > 4 | T > 1) = P(T > 4 and T > 1)/P(T > 1) = P(T > 4)/P(T > 1) = (1 - FT(4))/(1 - FT(1)) = e^{-4/3}/e^{-1/3} = e^{-1}.
P(T > 4 | T > 1) can also be computed using the memoryless property of the
exponential:
P(T > 4 | T > 1) = P(T > 3) = 1 - FT(3) = e^{-1}.
4.14. (a) Denote the lifetime of the lightbulb by T. Since T is exponentially distributed
with expected value 1000 we have T ~ Exp(λ) with λ = 1/1000. The
cumulative distribution function of T is then FT(t) = 1 - e^{-λt} for t > 0 and 0
otherwise. Hence
P(T > 2000) = 1 - P(T ≤ 2000) = 1 - FT(2000) = e^{-2000λ} = e^{-2}.
(b) We need to compute P(T > 2000 | T > 500) where we used the notation of part
(a). By the memoryless property P(T > 2000 | T > 500) = P(T > 1500). Using
the steps in part (a) we get
P(T > 1500) = 1 - FT(1500) = e^{-1500λ} = e^{-3/2}.
4.15. Let N be the Poisson process of arrival times of meteors. Let 11 PM corre-
spond to the origin on the time line.
(a) Using the fact that N([0,1]), the number of meteors within the first hour, has
Poisson(4) distribution, we get
P(N([0,1]) > 2) = 1 - Σ_{k=0}^{2} P(N([0,1]) = k) = 1 - Σ_{k=0}^{2} e^{-4} 4^k/k! ≈ 0.7619.
(b) Using the independent increment property we get that N([0,1]) and N([1,4])
are independent. Moreover, N([0,1]) ~ Poisson(4) and N([1,4]) ~ Poisson(3 · 4),
which gives
P(N([0,1]) = 0, N([1,4]) ≥ 10) = P(N([0,1]) = 0) · P(N([1,4]) ≥ 10)
= P(N([0,1]) = 0) · (1 - P(N([1,4]) < 10))
= e^{-4} · (1 - Σ_{k=0}^{9} e^{-12} 12^k/k!)
≈ 0.01388.
(c) Using the independent increment property again:
P(N([0,1]) = 0 | N([0,4]) = 13) = P(N([0,1]) = 0, N([0,4]) = 13)/P(N([0,4]) = 13)
= P(N([0,1]) = 0, N([1,4]) = 13)/P(N([0,4]) = 13)
= P(N([0,1]) = 0) · P(N([1,4]) = 13)/P(N([0,4]) = 13)
= (e^{-4} · e^{-12} 12^13/13!)/(e^{-16} 16^13/13!)
= (3/4)^13
≈ 0.02376.
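The three answers can be reproduced directly from the Poisson probability mass function; the helper function below is our own shorthand, not part of the solution.

from math import exp, factorial

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

p_a = 1 - sum(pois_pmf(k, 4) for k in range(3))
p_b = pois_pmf(0, 4) * (1 - sum(pois_pmf(k, 12) for k in range(10)))
p_c = pois_pmf(0, 4) * pois_pmf(13, 12) / pois_pmf(13, 16)
print(round(p_a, 4), round(p_b, 5), round(p_c, 5))   # 0.7619, 0.01388, 0.02376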
4.16. (a) Denote by S the number of random numbers starting with the digit 1.
Note that a number in the interval [1.5, 4.8] starts with 1 if and only if it is in
the interval [1.5, 2). The probability that a uniformly chosen number from the
interval [1.5, 4.8] is in [1.5, 2) is equal to p = 4.80.51.5 = 33
5
. Assuming that the
500 numbers are chosen independently, the distribution of S is binomial with
parameters n = 500 and p.
To estimate P (S < 65) we use normal approximation. Note that E[S] =
5
np = 500 · 33 ⇡ 75.7576 and Var(S) = np(1 p) ⇡ 64.2792. Hence
✓ ◆ ✓ ◆
S 75.7576 65 75.7576 S 75.7576
P (S < 65) = P p < p ⇡P p < 1.34
64.2792 64.2792 64.2792
⇡ ( 1.34) = 1 (1.34) ⇡ 1 0.9099 = 0.0901.
Note that P (S < 65) = P (S 64). Using 64 instead of 65 in the calculation
above gives 1 (1.47) ⇡ 0.0708. If we use the continuity correction then we
Solutions to Chapter 4 97
need to use 64.5 instead of 65 which gives 1 (1.4) ⇡ 0.0808. The actual
probability (evaluated numerically) is 0.0778.
(b) We proceed similarly as in part (a). The probability that a given uniformly
chosen number from [1.5, 4.8] starts with 3 is q = 1/3.3 = 10/33. If we denote
the number of such numbers among the 500 random numbers by T then T ~
Bin(n, q) with n = 500. Then

  P(T > 160) = P((T - nq)/sqrt(nq(1 - q)) > (160 - nq)/sqrt(nq(1 - q))) ~ P((T - nq)/sqrt(nq(1 - q)) > 0.83)
    ~ 1 - Phi(0.83) ~ 1 - 0.7967 = 0.2033.

Again, since P(T > 160) = P(T >= 161), we could have done the compu-
tation with 161 instead of 160, which would give 1 - Phi(0.92) ~ 0.1788. If we
use the continuity correction then we replace 160 with 160.5 in the calculation
above, which leads to 1 - Phi(0.87) ~ 0.1922. The actual probability (evaluated
numerically) is 0.1906.
4.17. The probability of rolling two ones is 1/36. Denote the number of snake eyes
out of 10,000 rolls by X. Then X ~ Bin(n, p) with n = 10,000 and p = 1/36. The
expectation and variance are

  np = 2500/9 ~ 277.78,   np(1 - p) = 21,875/81 ~ 270.06.

Using the normal approximation:

  P(280 <= X <= 300) = P((280 - 2500/9)/sqrt(21,875/81) <= (X - 2500/9)/sqrt(21,875/81) <= (300 - 2500/9)/sqrt(21,875/81))
    = P(4/(5 sqrt(35)) <= (X - 2500/9)/sqrt(21,875/81) <= 8/sqrt(35))
With continuity correction we need to replace 100 with 99.5 in the calculation
above. This way we get 1 - Phi(2.225) ~ 0.01305 (using linear approximation for
Phi(2.225)). The actual probability (evaluated numerically) is 0.0153.
4.19. Let X be the number of people in the sample who prefer cereal A. We may
approximate the distribution of X with a Bin(n, p) distribution with n = 100, p =
0.2. (This is an approximation, because the true distribution is hypergeometric.)
The expectation and variance are np = 20 and np(1 - p) = 16. Since the variance is
large enough, it is reasonable to use the normal approximation to estimate P(X >= 25):

  P(X >= 25) = P((X - 20)/sqrt(16) >= (25 - 20)/sqrt(16))
    ~ P(Z >= 1.25) = 1 - Phi(1.25) ~ 1 - 0.8944 = 0.1056.

If we use the continuity correction then we get

  P(X >= 25) = P(X > 24.5) = P((X - 20)/sqrt(16) > (24.5 - 20)/sqrt(16))
    ~ P(Z > 1.125) = 1 - Phi(1.125) ~ 1 - 0.8697 = 0.1303.
4.20. Let X be the number of heads. Then 10,000 - X is the number of tails and
|X - (10,000 - X)| = |2X - 10,000| is the difference between the number of heads
and the number of tails. We need to estimate

  P(|2X - 10,000| <= 100) = P(4950 <= X <= 5050).

Using the normal approximation with E[X] = 10,000 · 1/2 = 5000 and Var(X) = 10,000 · 1/2 · 1/2 = 2500:

  P(4950 <= X <= 5050) = P((4950 - 5000)/sqrt(2500) <= (X - 5000)/sqrt(2500) <= (5050 - 5000)/sqrt(2500))
    = P(-1 <= (X - 5000)/sqrt(2500) <= 1) ~ 2 Phi(1) - 1 ~ 0.6826.
4.21. Let X_n be the number of games won out of the first n games. Then X_n ~
Bin(n, p) with p = 1/20. The amount of money won in the first n games is then
W_n = 10 X_n - (n - X_n) = 11 X_n - n. We have

  P(W_n > -100) = P(11 X_n - n > -100) = P(X_n > (n - 100)/11).

Note that the variance in the n = 200 case is 9.5, which is slightly below 10, so
the normal approximation is not fully justified. In this case np^2 = 1/2, so the Pois-
son approximation is not guaranteed to work either. The Poisson approximation
is

  P(W_200 > -100) = P(X_200 > 100/11) = P(X_200 >= 10) ~ 1 - sum_{k=0}^{9} e^{-10} 10^k/k! ~ 0.5421.

Using the normal approximation (with E[S] = 400 · 1/2 = 200 and Var(S) = 400 · 1/2 · 1/2 = 100):
But

  P(|S_{n,4}/n - 1/6| >= eps) >= P(S_{n,4}/n - 1/6 >= eps) = P(S_{n,4}/n >= 17/100),

thus if P(|S_{n,4}/n - 1/6| >= eps) converges to zero then so does P(S_{n,4}/n >= 17/100).
(b) Let B_{n,i}, i = 1, . . . , 6 be the event that after n rolls the frequency of the number
i is between 16% and 17%. Then A_n = intersection_{i=1}^{6} B_{n,i}. Note that A_n^c = union_{i=1}^{6} B_{n,i}^c,
and

  (*)   P(A_n^c) = P(union_{i=1}^{6} B_{n,i}^c) <= sum_{i=1}^{6} P(B_{n,i}^c).

(Exercise 1.43 proved this subadditivity relation.) We would like to show that
for large enough n we have P(A_n) >= 0.999. This is equivalent to P(A_n^c) < 0.001.
If we could show that there is a K so that for n >= K we have P(B_{n,i}^c) < 0.001/6
for each 1 <= i <= 6, then the bound (*) implies P(A_n^c) < 0.001 and thereby
P(A_n) >= 0.999.
Begin again with the statement given by the law of large numbers: for any
eps > 0 and 1 <= i <= 6 we have

  lim_{n -> infinity} P(|S_{n,i}/n - 1/6| < eps) = 1.

Take eps = 17/100 - 1/6 = 1/300. Then we have

  P(|S_{n,i}/n - 1/6| < eps) = P(1/6 - eps < S_{n,i}/n < 1/6 + eps) = P(49/300 < S_{n,i}/n < 17/100)
    <= P(16/100 < S_{n,i}/n < 17/100) = P(B_{n,i}).

Since P(|S_{n,i}/n - 1/6| < eps) converges to 1, so does P(B_{n,i}) for each 1 <= i <= 6.
By this convergence there exists K > 0 so that P(B_{n,i}) > 1 - 0.001/6 for each
1 <= i <= 6 and all n >= K. This gives P(B_{n,i}^c) = 1 - P(B_{n,i}) < 0.001/6 for each
1 <= i <= 6. As argued above, this implies that P(A_n) >= 0.999 for all n >= K.
4.25. Let S_n be the number of interviewed people that prefer cereal to bagels
for breakfast. If the population is large, we can assume that sampling from the
population with replacement or without replacement does not make a big difference,
therefore we assume S_n ~ Bin(n, p). In this case, n = 81. As usual, the estimate of
p will be

  p-hat = S_n/n.

We want to find q in [0, 1] such that

  P(|p-hat - p| < 0.05) = P(|S_n/n - p| < 0.05) >= q.

If Z ~ N(0, 1), we have that

  P(|S_n/n - p| < 0.05) = P(-0.05 sqrt(n)/sqrt(p(1 - p)) < (S_n - np)/sqrt(np(1 - p)) < 0.05 sqrt(n)/sqrt(p(1 - p)))
    ~ P(-0.05 sqrt(n)/sqrt(p(1 - p)) < Z < 0.05 sqrt(n)/sqrt(p(1 - p)))
    >= P(-2 · 0.05 sqrt(n) < Z < 2 · 0.05 sqrt(n))
    = Phi(2 · 0.05 sqrt(n)) - Phi(-2 · 0.05 sqrt(n)) = 2 Phi(2 · 0.05 sqrt(n)) - 1
    = 2 Phi(0.9) - 1 ~ 2 · 0.8159 - 1 = 0.6318.

Therefore, the true p lies in the interval (p-hat - 0.05, p-hat + 0.05) with probability greater
than or equal to 0.6318. Note that this is not a very high confidence level.
4.26. Let S be the number of interviewed people that prefer whole milk to skim
milk. Then S ~ Bin(n, p) with n = 100. Our estimate for p is p-hat = S/n. The event
p in (p-hat - 0.1, p-hat + 0.1) is the same as |S/n - p| < 0.1. To estimate the probability of
this event we use the normal approximation:

  P(|S/n - p| < 0.1) = P(-0.1 sqrt(n)/sqrt(p(1 - p)) < (S - np)/sqrt(np(1 - p)) < 0.1 sqrt(n)/sqrt(p(1 - p)))
    ~ 2 Phi(0.1 sqrt(n)/sqrt(p(1 - p))) - 1 >= 2 Phi(0.2 sqrt(n)) - 1.

We need

  2 Phi(sqrt(n)/(10 sqrt(p(1 - p)))) - 1 >= 0.9   if and only if   Phi(sqrt(n)/(10 sqrt(p(1 - p)))) >= 0.95,

which holds if

  sqrt(n)/(10 sqrt(p(1 - p))) >= 1.645.
This means that f(k + 1) > f(k) exactly if lambda > k + 1, or lambda - 1 > k, and f(k + 1) < f(k)
exactly if lambda - 1 < k.
If lambda is not an integer then let k* = floor(lambda) be the integer part of lambda (the largest
integer smaller than lambda). By the arguments above we have

  f(0) < f(1) < · · · < f(k*) > f(k* + 1) > f(k* + 2) > . . .

If lambda is a positive integer then

  f(0) < f(1) < · · · < f(lambda - 1) = f(lambda) > f(lambda + 1) > f(lambda + 2) > . . .

In both cases f is increasing and then decreasing.
4.31. We have

  E[1/(1 + X)] = sum_{k=0}^{infinity} 1/(k + 1) · e^{-mu} mu^k/k! = (1/mu) sum_{k=0}^{infinity} e^{-mu} mu^{k+1}/(k + 1)!
    = (1/mu) sum_{l=1}^{infinity} e^{-mu} mu^l/l! = (1 - e^{-mu})/mu.

We introduced l = k + 1 and used sum_{l=1}^{infinity} e^{-mu} mu^l/l! = 1 - e^{-mu}.
4.32. (a) We can compute E[g(Y)] with the formula sum_{k=0}^{infinity} g(k) P(Y = k). Thus

  E[Y(Y - 1) · · · (Y - n + 1)] = sum_{k=0}^{infinity} k(k - 1) · · · (k - n + 1) · e^{-mu} mu^k/k!.
Note that we can approximate the Bin(n, 1/365) distribution with a Poisson(n/365)
distributed random variable Y. Then P(X >= 1) ~ P(Y >= 1) = 1 - P(Y = 0) =
1 - e^{-n/365}. To get 1 - e^{-n/365} >= 2/3 we need n >= 365 ln 3 ~ 400.993, which gives
n >= 401.
4.37. Since there are lots of scoring chances, but only a few of them result in goals,
it is reasonable to model the number of goals in a given game by a Poisson(lambda)
random variable. Then the percentage of games with no goals should be close to
the probability that this Poisson(lambda) random variable is zero, which is e^{-lambda}. Thus

  0.0816 = e^{-lambda}   which gives   lambda = -log(0.0816) ~ 2.506.

The percentage of games where exactly one goal was scored should be close to
lambda e^{-lambda} ~ 0.2045, or 20.45%.
(Note: in reality 77 of the 380 games ended with one goal, which gives 20.26%.
The Poisson approximation gives an extremely precise estimate!)
4.38. Note that X is a Bernoulli random variable with success probability p, and
Y ~ Poisson(p). We need to show that for any subset A of {0, 1, . . . } we have

  |P(X in A) - P(Y in A)| <= p^2.

This looks hard, as there are lots of subsets of {0, 1, . . . }. Let us start with the
subsets {0} and {1}. In these cases

  P(X in A) - P(Y in A) = P(X = k) - P(Y = k) = 1 - p - e^{-p} if k = 0, and p - p e^{-p} if k = 1.
(b) Since np^2 is small, the Poisson approximation is appropriate with parameter
mu = np = 8/7. Then

  P(X >= 2) = 1 - P(X = 0) - P(X = 1) ~ 1 - e^{-8/7} - (8/7) e^{-8/7} ~ 0.3166.
4.40. Let X denote the number of times the number one appears in the sample.
Then X ~ Bin(111, 1/10). We need to approximate P(X <= 3). Using the normal ap-
proximation gives

  P(X <= 3) = P((X - 111 · 1/10)/sqrt(111 · 1/10 · 9/10) <= (3 - 111 · 1/10)/sqrt(111 · 1/10 · 9/10))
    ~ P((X - 11.1)/sqrt(9.99) <= -2.56)
    ~ Phi(-2.56) = 1 - Phi(2.56) ~ 1 - 0.9948 = 0.0052.

If we use the continuity correction then we have to repeat the calculation above
starting from P(X < 2.5), which gives the approximation Phi(-2.72) ~ 0.0033.
With the Poisson approximation we compare X with Y ~ Poisson(11.1):

  P(X <= 3) ~ P(Y <= 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)
    = e^{-11.1}(1 + 11.1 + 11.1^2/2 + 11.1^3/6) ~ 0.004559.

The variance of X is 999/100, which is almost 10, hence it is not that surprising that the
normal approximation is pretty accurate (especially with continuity correction).
Since np^2 = 111 · (1/10)^2 = 1.11 is not very small, we cannot expect the Poisson
approximation to be very precise, although it is still quite accurate.
4.41. Let X be the number of sixes. Then X ~ Bin(n, p) with n = 72 and p = 1/6.

  P(X = 3) = (72 choose 3) (1/6)^3 (5/6)^{69} ~ 0.00095.

The Poisson approximation would compare X with a Poisson(mu) random variable
with mu = np = 12:

  P(X = 3) ~ e^{-12} 12^3/3! ~ 0.0018.

For the normal approximation we need the continuity correction:

  P(X = 3) = P(2.5 <= X <= 3.5) = P((2.5 - 12)/sqrt(10) <= (X - 12)/sqrt(10) <= (3.5 - 12)/sqrt(10))
    ~ Phi(-2.69) - Phi(-3.0) = Phi(3.0) - Phi(2.69) ~ 0.9987 - 0.9964 = 0.0023.
4.42. (a) Let X be the number of mildly defective gadgets in the box. Then X ~
Bin(n, p) with n = 100 and p = 0.2 = 1/5. We have

  P(A) = P(X < 15) = sum_{k=0}^{14} (100 choose k) (1/5)^k (4/5)^{100-k}.

(b) We have np(1 - p) = 16 > 10 and np^2 = 4. This suggests that the normal
approximation is more appropriate than the Poisson approximation in this case.
Using the normal approximation we get

  P(X < 15) = P((X - 100 · 1/5)/sqrt(100 · 1/5 · 4/5) < (15 - 100 · 1/5)/sqrt(100 · 1/5 · 4/5))
    = P((X - 20)/4 < -5/4)
is large enough for a normal approximation to work. So, letting Z ~ N(0, 1) and
using the correction for continuity, we have

  P(X >= 48) = P(X >= 47.5) = P((X - 40)/6 >= (47.5 - 40)/6)
    ~ P(Z >= 1.25) = 1 - Phi(1.25) = 1 - 0.8944 = 0.1056.
(b) We have np(1 - p) = 100 > 10 and np^2 = 100. Thus it is more reasonable to
use the normal approximation:

  P(X >= 215) = P((X - 400 · 1/2)/sqrt(400 · 1/2 · 1/2) >= (215 - 400 · 1/2)/sqrt(400 · 1/2 · 1/2))
    = P((X - 200)/10 >= 3/2)
4.47. (a) Let X be the number of times in a year that he needed more than 10
coin flips. Then X ~ Bin(365, p) with

  p = P(more than 10 coin flips needed) = P(first 10 coin flips are tails) = 1/2^{10}.

Since np(1 - p) is small (and np^2 is even smaller), we can use the Poisson
approximation here with lambda = np = 365/2^{10} ~ 0.356. Then

  P(X >= 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2) ~ 1 - e^{-lambda}(1 + lambda + lambda^2/2) ~ 0.00579.

(b) Denote the number of times that he needed exactly 3 coin flips by Y. This
has a Bin(365, r) distribution with success probability r = 1/2^3 = 1/8. (The value
of r is the probability that a Geom(1/2) random variable is equal to 3.) Since
nr(1 - r) = 39.92 > 10, we can use the normal approximation. The expectation of
Y is E[Y] = nr = 45.625. Thus

  P(Y > 50) = P((Y - 45.625)/sqrt(39.92) > (50 - 45.625)/sqrt(39.92)) = P((Y - 45.625)/sqrt(39.92) > 0.69)
    ~ 1 - Phi(0.69) = 1 - 0.7549 = 0.2451.
4.48. Let A = {X in [0, 1]} and B = {X in [a, 2]}. We need to find a < 1 so that
P(AB) = P(A)P(B).
If a <= 0 then AB = A, and then P(A)P(B) != P(AB). Thus we must have
0 < a < 1 and hence AB = {X in [a, 1]}. The c.d.f. of X is 1 - e^{-2x} for x >= 0 and
0 otherwise. From this we can compute

  P(A) = P(0 <= X <= 1) = 1 - e^{-2},
  P(B) = P(a <= X <= 2) = e^{-2a} - e^{-4},
  P(AB) = P(a <= X <= 1) = e^{-2a} - e^{-2}.

Thus P(AB) = P(A)P(B) is equivalent to

  (1 - e^{-2})(e^{-2a} - e^{-4}) = e^{-2a} - e^{-2}.

Solving this we get e^{-2a} = e^{-4} + 1 - e^{-2} and a = -(1/2) ln(1 - e^{-2} + e^{-4}) ~ 0.0622.
4.49. Let T ~ Exp(1/10) be the lifetime of a particular stove. Let r > 0 and let X
be the amount of money you earn on a particular extended warranty of length r.
We see that X = C if T > r, and X = C - 800 if T <= r.
We have P(T > r) = e^{-(1/10)r}, and so

  E[X] = C · P(X = C) + (C - 800) · P(X = C - 800)
    = C · P(T > r) + (C - 800) · P(T <= r)
    = C e^{-r/10} + (C - 800)(1 - e^{-r/10}).

Thus, the pairs of numbers (C, r) that give an expected profit of zero are those
satisfying

  0 = C e^{-r/10} + (C - 800)(1 - e^{-r/10}).
4.50. By the memoryless property of the exponential distribution, for any x > 0 we
have

  P(T > x + 7 | T > 7) = P(T > x).

Thus the conditional probability of waiting at least 3 more hours is P(T > 3) =
e^{-(1/3)·3} = e^{-1}, and the conditional probability of waiting at least x > 0 more hours
is P(T > x) = e^{-x/3}.
4.51. We know from the condition that 0 <= T_1 <= t, so P(T_1 <= s | N_t = 1) = 0 if
s < 0 and P(T_1 <= s | N_t = 1) = 1 if s > t.
If 0 <= s <= t we have

  P(T_1 <= s | N_t = 1) = P(T_1 <= s, N_t = 1)/P(N_t = 1).

Since the arrivals come from a Poisson process with intensity lambda, we have
P(N_t = 1) = lambda t e^{-lambda t}. Also,

  P(T_1 <= s, N_t = 1) = P(N([0, s]) = 1, N([0, t]) = 1) = P(N([0, s]) = 1, N([s, t]) = 0)
    = P(N([0, s]) = 1) P(N([s, t]) = 0) = lambda s e^{-lambda s} · e^{-lambda(t-s)}
    = lambda s e^{-lambda t}.

Then

  P(T_1 <= s | N_t = 1) = P(T_1 <= s, N_t = 1)/P(N_t = 1) = lambda s e^{-lambda t}/(lambda t e^{-lambda t}) = s/t.

Collecting all cases: P(T_1 <= s | N_t = 1) = 0 for s < 0, s/t for 0 <= s <= t, and 1 for s > t.
This means that the conditional distribution is uniform on [0, t].
4.52. (a) By definition Gamma(r) = int_0^infinity x^{r-1} e^{-x} dx for r > 0. Then
Gamma(r + 1) = int_0^infinity x^r e^{-x} dx. Using integration by parts with (-e^{-x})' = e^{-x} we get

  Gamma(r + 1) = int_0^infinity x^r e^{-x} dx = [x^r(-e^{-x})]_{x=0}^{x=infinity} - int_0^infinity r x^{r-1}(-e^{-x}) dx
    = r int_0^infinity x^{r-1} e^{-x} dx = r Gamma(r).

The two terms in [x^r(-e^{-x})]_{x=0}^{x=infinity} disappear because r > 0 and lim_{x -> infinity} x^r e^{-x} = 0.

(b) We use induction to prove the identity. For n = 1 the statement is true as

  Gamma(1) = int_0^infinity e^{-x} dx = 1 = 0!.

Assume that the statement is true for some positive integer n: Gamma(n) = (n - 1)!. We
need to show that it also holds for n + 1. But this is true because by part (a) we have

  Gamma(n + 1) = n Gamma(n) = n · (n - 1)! = n!,

which completes the induction.
5.1. We have M(t) = E[e^{tX}], and since X is discrete we have E[e^{tX}] = sum_k P(X = k) e^{tk}.
Using the given probability mass function we get

  M(t) = P(X = -6)e^{-6t} + P(X = -2)e^{-2t} + P(X = 0) + P(X = 3)e^{3t}
    = (4/9)e^{-6t} + (1/9)e^{-2t} + 2/9 + (2/9)e^{3t}.
5.2. (a) We have

  M'(t) = (4/3)e^{4t} + (5/6)e^{5t},   M''(t) = (16/3)e^{4t} + (25/6)e^{5t}.
5.3. The probability density function of X is f(x) = 1 for x in [0, 1] and 0 otherwise.
The moment generating function can be computed as

  M(t) = E[e^{tX}] = int_{-infinity}^{infinity} f(x) e^{tx} dx = int_0^1 e^{tx} dx.

If t = 0 then M(t) = int_0^1 dx = 1. If t != 0 then

  M(t) = int_0^1 e^{tx} dx = (e^t - 1)/t.
5.4. (a) In Example 5.5 we have seen that the moment generating function of a
N(mu, sigma^2) random variable is e^{sigma^2 t^2/2 + mu t}. Thus if X~ ~ N(0, 12) then M_{X~}(t) = e^{6t^2}
and M_{X~}(t) = M_X(t) for |t| < 2. But then by Fact 5.14 the distribution of X is the
same as the distribution of X~.
(b) In Example 5.6 we computed the moment generating function of an Exp(lambda)
distribution, and it is lambda/(lambda - t) for t < lambda and infinite otherwise. Thus M_Y(t) agrees with the
moment generating function of an Exp(2) distribution on the interval (-1/2, 1/2),
hence by Fact 5.14 we have Y ~ Exp(2).
(c) We cannot identify the distribution of Z, as there are many random variables
with moment generating functions that are infinite for t >= 5. For example, all
Exp(lambda) distributions with lambda < 5 have this property.
(d) We cannot identify the distribution of W, as there are many random variables
whose moment generating function is equal to 2 at t = 2. Here are two examples:
if W_1 ~ N(0, sigma^2) with sigma^2 = (ln 2)/2 then

  M_{W_1}(2) = e^{sigma^2 · 2^2/2} = e^{((ln 2)/2)·2} = e^{ln 2} = 2.

If W_2 ~ Poisson(lambda) with lambda = (ln 2)/(e^2 - 1) then

  M_{W_2}(2) = e^{lambda(e^2 - 1)} = e^{ln 2} = 2.
5.5. We can recognize M_X(t) = e^{3(e^t - 1)} as the moment generating function of a
Poisson(3) random variable. Hence P(X = 4) = e^{-3} 3^4/4!.
5.6. The possible values of Y = (X - 1)^2 are 1, 4 and 9. The corresponding
probabilities are
P ((X 1)2 = 1) = P (X = 0 or X = 2) = P (X = 0) + P (X = 2)
1 3 2
= + =
14 14 7
2 1
P ((X 1) = 4) = P (X = 1) = ,
7
2 4
P ((X 1) = 9) = P (X = 4) = .
7
5.7. The cumulative distribution function of X is FX (x) = 1 e x for x 0 and
0 otherwise. Note that X > 0 with probability one, and ln(X) can take values from
the whole R.
We have
ey
FY (y) = P (Y y) = P (ln(X) y) = P (X ey ) = 1 e ,
where we used ey > 0. From this we get
d ⇣ ⌘0
ey ey
fY (y) = FY (y) = 1 e = ey
dy
for all y 2 R.
5.8. We first compute the cumulative distribution function of Y . Since 1 X 2,
we have 0 X 2 4, thus FY (y) = 1 for y 4 and FY (y) = 0 for y < 0.
From these we get Var(X) = E[X 2 ] (E[X])2 = (n 1)np2 +np n2 p2 = np(1 p).
5.10. Using the Binomial Theorem we get
✓ ◆30 X 30 ✓ ◆ ✓ ◆k ✓ ◆30 k
1 4 t 30 4 1
M (t) = + e = ekt .
5 5 k 5 5
k=0
Since this is the sum of terms of the form pk etk , we see that X is discrete. The
possible values can be identified with the exponents: these are 0,1,2,. . . , 30. The
coefficients are the corresponding probabilities:
✓ ◆ ✓ ◆k ✓ ◆30 k
30 4 1
P (X = k) = , k = 0, 1, . . . , 30.
k 5 5
We can recognize this as the probability mass function of a binomial distribution
with n = 30 and p = 45 .
(b) The possible values of X are { 2, 1, 0, 1}, so the possible values of Y = |X +1|
are {0, 1, 2}. We get
3
P (Y = 0) = P (X = 1) =
10
1 3 2
P (Y = 1) = P (X = 2) + P (X = 0) = + =
10 10 5
2
P (Y = 2) = P (X = 1) = .
5
5.16. (a) We have E[X^n] = int_0^1 x^n dx = 1/(n + 1).
(b) In Exercise 5.3 we have seen that the moment generating function of X is given
by the case defined function M_X(t) = 1 for t = 0, and M_X(t) = (e^t - 1)/t for t != 0.
We have e^t = sum_{k=0}^{infinity} t^k/k!, hence e^t - 1 = sum_{k=1}^{infinity} t^k/k! and

  M_X(t) = (e^t - 1)/t = (1/t) sum_{k=1}^{infinity} t^k/k! = sum_{k=1}^{infinity} t^{k-1}/k! = sum_{n=0}^{infinity} t^n/(n + 1)!

for t != 0. In fact, this formula works for t = 0 as well, as the constant term of the
series is equal to 1. Now we can read off the nth derivative at zero by taking the
coefficient of t^n and multiplying by n!:

  E[X^n] = M^{(n)}(0) = n! · 1/(n + 1)! = 1/(n + 1).
This agrees with the result we got for part (a).
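A tiny Monte Carlo check of E[X^n] = 1/(n + 1) (our addition, assuming NumPy):

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.uniform(size=1_000_000)          # X ~ Uniform(0, 1)
  for n in (1, 2, 5):
      print(n, (x**n).mean(), 1/(n + 1))   # sample moment vs. exact value
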
5.17. (a) MX (0) = 1. For t 6= 0 integrate by parts.
Z 1 Z
1 2 tx
MX (t) = E[etX ] = etx f (x) dx = xe dx
1 2 0
Z 2
1 ⇣ x tx ⌘ ⇣1 ⌘
x=2 x=2
1 tx 1 2e2t
= e e dx = 2
etx
2 t x=0 0 t 2 t t x=0
2t 2t
2te e +1
= .
2t2
To summarize,
8
>
<1 for t = 0,
MX (t) = 2t
e2t + 1
: 2te
>
for t 6= 0.
2t2
(b) For t 6= 0 we insert the exponential series into MX (t) found in part (a) and
then cancel terms:
✓X1 1 ◆
2te2t e2t + 1 1 (2t)k+1 X (2t)k
MX (t) = = + 1
2t2 2t2 k! k!
k=0 k=0
1 ✓ ◆ 1
1 X 1 1 X 2k+1 tk
= 2 (2t)k = ·
2t (k 1)! k! k + 2 k!
k=2 k=0
2k+1
from which we read o↵ E(X k ) = M (k) (0) = k+2 .
(c)
Z 2
1 2k+1
E(X k ) = xk+1 dx = .
2 0 k+2
5.18. (a) Using the definition of a moment generating function we have
1
X 1
X
MX (t) = E[etX ] = etk P (X = k) = (et )k (1 p)k 1
p
k=1 k=1
1
X 1
X
= pet (et (1 p))k 1
= pe t t
(e (1 p))k
k=1 k=0
t
Note that the sum converges⇣ to⌘a finite number if and only if e (1 p) < 1, which
holds if and only if t < ln 1 1 p . In this case we have
1
MX (t) = pet · .
1 et (1 p)
Overall, we find:
8 ⇣ ⌘
< pet 1
1 et (1 p) t < ln
MX (t) = ⇣1 p⌘
.
:1 t ln 1
1 p
The geometric series is finite exactly if 34 et < 1, which holds for t ln(4/3). In
that case
1 ✓ ◆k
2 1X 3 t 2 1 3 t
e 8 3et
MX (t) = + e = + · 4 3 t = .
5 5 4 5 5 1 4e 20 15et
k=1
Hence
(
3et 8
MX (t) = 15et 20 , t < ln(4/3)
1 else.
5.21. We have
MY (t) = E[etY ] = E[et(aX+b) ] = E[ebt+atX ] = ebt E[eatX ] = ebt MX (at).
5.22. By the definition of the moment generating function and the properties of
expectation we get
MY (t) = E[etY ] = E[e(3X 2)t
] = E[e3tX e 2t
]=e 2t
E[e3tX ].
Note that E[e3tX ] is exactly the moment generating function MX (t) of X evaluated
at 3t. The moment generating function of X ⇠ Exp( ) is t for t < and 1
otherwise, thus E[e3tX ] = 3t for t < /3 and 1 otherwise. This gives
(
e 2t 3t , if t < /3
MY (t) =
1, otherwise.
5.23. We can notice that MY (t) looks very similar to the moment generating func-
t
tion of a Poisson random variable. If X ⇠ Poisson(2), then MX (t) = e2(e 1) , and
MY (t) = MX (2t). From Exercise 5.21 we see that Y has the same moment gener-
ating function as 2X, which means that they have the same distribution. Hence
2
22 2
P (Y = 4) = P (2X = 4) = P (X = 2) = e = 2e .
2!
5.24. (a) Since Y = e^X > 0 and e^x > 0 for all x in R, we have F_Y(t) = P(Y <= t) =
P(e^X <= t) = 0 for t <= 0. Next, for any t > 0,

  F_Y(t) = P(Y <= t) = P(e^X <= t) = P(X <= ln t) = Phi(ln t).

Differentiating this gives the probability density function for t > 0:

  f_Y(t) = Phi'(ln t) · (1/t) = (1/t) phi(ln t) = (1/sqrt(2 pi t^2)) exp(-(ln t)^2/2).

For t <= 0 the probability density function is 0.
(b) From the definition of Y we get that E[Y^n] = E[(e^X)^n] = E[e^{nX}]. Note that
E[e^{nX}] = M_X(n) is the moment generating function of X evaluated at n.
We computed the moment generating function for X ~ N(0, 1) and it is given
by M_X(t) = e^{t^2/2}. Thus we have

  E[Y^n] = e^{n^2/2}.
5.25. We start by expressing the cumulative distribution function FY (y) of Y in
terms of FX . Since Y = |X 1| 0, we can concentrate on y 0.
FY (y) = P (Y y) = P (|X 1| y) = P ( y X 1 y)
= P (1 y X 1 + y) = FX (1 + y) FX (1 y).
(In the last step we used P (X = 1 y) = 0.) Di↵erentiating the final expression:
d
fY (y) = FY0 (y) = (FX (1 + y) FX (1 y)) = fX (1 + y) + fX (1 y).
dy
1
We have fX (x) = 5 if 2 x 3 and zero otherwise. Considering the various
cases we get 8
> 2
<5, 0<y<2
fY (y) = 15 , 2y<3
>
:
0 otherwise.
5.26. The function g(x) = x(x 3) is non-positive in [0, 3] (as 0 x and x 3 0).
It is a simple calculus exercise to show that the function g(x)) takes its minimum
at x = 3/2 inside [0, 3], and the minimum value is 94 . Thus Y = g(X) will take
values from the interval [ 94 , 0] and the probability density function fY (y) is 0 for
y2/ [ 94 , 0].
9
We will determine the cumulative distribution function FY (y) for y 2 [ 4 , 0].
We have
FY (y) = P (Y y) = P (X(X 3) y).
Next we solve the inequality x(x 3) y for x. Since x(x 3) is a parabola facing
up, the solution will be an interval and the endpoints are exactly the solutions of
x(x 3) = y. The solutions of this equation are
p p
3 9 + 4y 3 + 9 + 4y
x1 = , and x2 = ,
2 2
9
thus for 4 y 0 we get
✓ p p ◆
3 9 + 4y 3 + 9 + 4y
FY (y) = P (X(X 3) y) = P X
2 2
p p
= FX ( 3+ 9+4y
2 ) FX ( 3 9+4y
2 ).
Di↵erentiating with respect to y gives
1 p 1 p
fY (y) = FY0 (y) = p fX ( 3+ 29+4y ) + p FX ( 3 9+4y
2 ).
9 + 4y 9 + 4y
Using the fact that fX (x) = 29 x for 0 x 3 we obtain
1 p 1 p
fY (y) = p · 29 ( 3+ 29+4y ) + p · 2
9 · (3 9+4y
2 )
9 + 4y 9 + 4y
2
= p .
9 9 + 4y
Thus
2 9
fY (y) = p if 4 y0
9 9 + 4y
and 0 otherwise.
Finding the probability density via the Fact 5.27.
By Fact 5.27 we have
X 1
fY (y) = fX (x)
|g 0 (x)|
x:g(x)=y,g 0 (x)6=0
5.28. We have fX (x) = 13 for 1 < x < 2 and 0 otherwise. Y = X 4 takes values
from [0, 16], thus fY (y) = 0 outside this interval. For 0 < y 16 we have
p p p p
FY (y) = P (Y y) = P (X 4 y) = P ( 4 y X 4 y) = FX ( 4 y) FX ( 4 y).
Di↵erentiating this gives
1 3/4 p 1 p
fY (y) = FY0 (y) = y fX ( 4 y) + y 3/4 fX ( 4 y).
4 4
p p p
Note that for 0 < y < 1 both 4 y and 4 y are in ( 1, 2), hence fX ( 4 y) and
p 1
fX ( 4 y) are both equal to 3 . This gives
1 1 1
fY (y) = 2 · y 3/4 · = y 3/4 , if 0 < y < 1.
4 3 6
p p
If 1 y < 16 then 4 y 2 ( 1, 2), but 4 y =6 ( 1, 2) which gives
1 3/4 1 1 3/4
fY (y) = y · = y , if 1 y < 16.
4 3 12
Collecting everything
8
> 1 3/4
<6y , if 0 < y < 1
1
fY (y) = 12 y 3/4 , if 1 y < 16
>
:
0, otherwise.
5.29. Y = |Z| 0. For y 0 we get
FY (y) = P (Y y) = P (|Z| y) = P ( y Z y) = (y) ( y) = 2 (y) 1.
Hence for y 0 we have
2 y2
fY (y) = F 0 (y) = (2 (y) 1)0 = 2 (y) = p e 2 ,
2⇡
and fY (y) = 0 otherwise.
5.30. We present two approaches for the solution.
Finding the probability density via the cumulative distribution function.
1
The probability density function of X is fX (x) = 3⇡ on [ ⇡, 2⇡] and 0 otherwise.
The sin(x) function takes values between 1 and 1, and it will take all these
values on [ ⇡, 2⇡]. Thus the set of possible values of Y are the interval [ 1, 1].
where g(x) = sin(x). Again, we only need to worry about the case 1 y 1,
since Y can only take values from here. With a little bit of trigonometry you can
check that the solutions of sin(x) = y for |y| < 1 are exactly the numbers
P (Y t) = P ( X1 t) = P (X 1
t) =1 1
t.
1
Di↵erentiating now shows that fY (t) = t2 when t 1.
5.33. The following function will work:
8
>
<1 if 0 < u < 1/7
g(u) = 4 if 1/7 u < 3/7
>
:
9 if 3/7 u 1.
5.34. We can see from the conditions that
1 1 1
P (1 < X < 3) = P (1 < X < 2) + P (X = 2) = P (2 < X < 3) = + + = 1,
3 3 3
hence we will need to find a function g that maps (0, 1) to (1, 3). The conditions
show that inside the intervals (1, 2) and (2, 3) the random variable X ‘behaves’
like a random variable with probability density function 13 there, but it also takes
the value 2 with probability 13 (so it actually cannot have a probability density
function). We get P (g(U ) = 2) = 13 if the function g is constant 2 on an interval
of length 13 inside (0, 1). To get the behavior in (1, 2) and (2, 3) we can have linear
functions there with slope 3. This leads to the following construction:
8
>
<1 + 3x, if 0 < x 13
g(x) = 2, if 13 < x 23
>
: 2
2 + 3(x 3 ), if 23 < x < 1.
We can define g any way we want it to outside (1, 3).
To check that this function works note that
1
P (g(U ) = 2) = P ( 13 U 23 ) = ,
3
for 1 < a < 2 we have
P (1 < g(U ) < a) = P (1 + 3U < a) = P (U < 13 (a 1)) = 13 (a 1),
and for 2 < b < 3 we have
2
P (b < g(U ) < 3) = P (b < 2+3(U 3 )) = P ( 13 (b 2)+ 23 < U ) = 1
3
1
3 (b 2) = 13 (3 b).
5.35. Note that Y = bXc is an integer, and hence Y is discrete. Moreover, for an
integer k we have bXc = k if and only if k X < 1. Thus
P (bXc = k) = P (k X < k + 1).
Since X ⇠ Exp( ), we have P (k X < k + 1) = 0 if k 1, and for k 0:
Z k+1
P (k X < k + 1) = e y dy = e k e (k+1) = e k (1 e ).
k
5.36. Note that X >= 0 and thus the possible values of floor(X) are 0, 1, 2, . . . . To find
the probability mass function, we have to compute P(floor(X) = k) for all nonnegative
integers k. Note that floor(X) = k if and only if k <= X < k + 1. Thus for k in {0, 1, . . . }
we have

  P(floor(X) = k) = P(k <= X < k + 1) = int_k^{k+1} lambda e^{-lambda t} dt = [-e^{-lambda t}]_{t=k}^{t=k+1}
    = e^{-lambda k} - e^{-lambda(k+1)} = e^{-lambda k}(1 - e^{-lambda}) = (e^{-lambda})^k (1 - e^{-lambda}).

Note that this implies that the random variable floor(X) + 1 is geometric with success
parameter 1 - e^{-lambda}.
5.37. Since Y = {X}, we have 0 Y < 1. For 0 y < 1 we have
FY (y) = P (Y y) = P ({X} y).
If {x} y then k x k + y for some integer k. Thus
X X
P ({X} y) = P (k X k + y) = (FX (k + y) FX (k)).
k k
6.1. (a) We just need to compute the row sums to get P (X = 1) = 0.3, P (X =
2) = 0.5, and P (X = 3) = 0.2.
(b) The possible values for Z = XY are {0, 1, 2, 3, 4, 6, 9} and the probability mass
function is
P (Z = 0) = P (Y = 0) = 0.35
P (Z = 1) = P (X = 1, Y = 1) = 0.15
P (Z = 2) = P (X = 1, Y = 2) + P (X = 2, Y = 1) = 0.05
P (Z = 3) = P (X = 1, Y = 3) + P (X = 3, Y = 1) = 0.05
P (Z = 4) = P (X = 2, Y = 2) = 0.05
P (Z = 6) = P (X = 2, Y = 3) + P (X = 3, Y = 2) = 0.2 + 0.1 = 0.3
P (Z = 9) = P (X = 3, Y = 3) = 0.05.
3 X
X 3
E[XeY ] = xey
x=1 y=0
6.2. (a) The marginal probability mass function of X is found by computing the
row sums,
1 1 1
P (X = 1) = , P (X = 2) = , P (X = 3) = .
3 2 6
For 0 x 1,
Z1 Z 1
fX (x) = f (x, y) dy = 12
7 (xy + y 2 ) dy = 12 1
7 (2x + 13 ) dy = 67 x + 47 .
0
1
For 0 y 1,
Z1 Z 1
fY (y) = f (x, y) dx = 12
7 (xy + y 2 ) dx = 12 1
7 (2y + y 2 ) dy = 12 2
7 y + 67 y.
0
1
(c)
ZZ Z1 ✓Zy ◆ Z1
12 2 12 3 3
P (X < Y ) = f (x, y) dx dy = 7 (xy + y ) dx dy = 7 2y dy
x<y 0 0 0
12 3 9
= 7 · 8 = 14 .
(d)
Z 1 Z 1 Z 1 Z 1
E[X 2 Y ] = x2 yf (x, y) dx dy = x2 y 12
7 (xy + y 2 ) dx dy
1 1 0 0
Z 1 Z 1
= 12
7 (x3 y 2 + x2 y 3 ) dx dy = 12 1
7 4 · 1
3 + 1
3 · 1
4 = 27 .
0 0
Thus (
2(1 x), if 0 x 1
fX (x) =
0, otherwise.
Similar computation shows that
(
2(1 y), if 0 y 1
fY (y) =
0, otherwise.
(b) The expectation of X can be computed using the marginal density:
Z 1 Z 1 x=1
2x3 1
E[X] = xfX (x)dx = x2(1 x)dx = x2 = .
1 0 3 x=0 3
Similar computation gives E[Y ] = 13 .
(c) To compute E[XY ] we need to integrate the function xyf (x, y) on the whole
plane, which in our case is the same as integrating 2xy on our triangle. We can
write this double integral as two single variable integrals: for a given 0 x 1 the
possible y values are 0 y 1 x hence
Z 1Z 1 x Z 1⇣ ⌘ Z 1
y=1 x
E[XY ] = 2xy dy dx = xy 2 y=0 dx = x(1 x)2 dx
0 0 0 0
x4 2x3 x2 x=1 1
= + x=0
= .
4 3 2 12
6.8. (a) X and Y from Exercise 6.2 are not independent. For example, note that
P (X = 3) > 0 and P (Y = 2) > 0, but P (X = 3, Y = 2) = 0.
(b) The marginals for X and Y from Exercise 6.5 are:
For 0 x 1,
Z1 Z 1
fX (x) = f (x, y) dy = 12
7 (xy + y 2 ) dy = 12 1
7 (2x + 13 ) dy = 67 x + 47 .
0
1
For 0 y 1,
Z1 Z 1
fY (y) = f (x, y) dx = 12
7 (xy + y 2 ) dx = 12 1
7 (2y + y 2 ) dy = 12 2
7 y + 67 y.
0
1
Thus, fX (x)fY (y) 6= f (x, y) and they are not independent. For example,
fX ( 14 ) = 11 1 9 1 1 99 1 1
14 and fY ( 4 ) = 28 , so that fX ( 4 )fY ( 4 ) = 392 . However, f ( 4 , 4 ) =
3
14 .
(c) The marginal of X is
Z 1 Z 1
x(1+y) x xy x
fX (x) = xe dy = xe e dy = e ,
0 0
for x > 0 and zero otherwise. The marginal of Y is
Z 1
1
fY (y) = xe x(1+y) dx = ,
0 (1 + y)2
for y > 0 and zero otherwise. Hence, f (x, y) is not the product of the marginals
and X and Y are not independent.
(d) X and Y are not independent. For example, choose any point (x, y) contained
in the square {(u, v) : 0 u 1, 0 v 1}, but not contained in the
triangle with vertices (0, 0), (1, 0), (0, 1). Then fX (x) > 0, fY (y) > 0, and
so fX (x)fY (y) > 0. However, f (x, y) = 0 (because the point is outside the
triangle).
6.9. X is binomial with parameters 3 and 1/2, thus its probability mass function is
pX (a) = a3 18 for a = 0, 1, 2, 3 and zero otherwise. The probability mass function
of Y is pY (b) = 16 for b = 1, 2, 3, 4, 5, 6. Since X and Y are independent, the joint
probability mass function is just the product of the individual probability mass
functions which means that
✓ ◆
3 1
pX,Y (a, b) = pX (a)pY (b) = , for a 2 {0, 1, 2, 3} and b 2 {1, 2, 3, 4, 5, 6}.
a 48
6.10. The marginals of X and Y are
( (
1, x 2 (0, 1) 1, y 2 (0, 1)
fX (x) = , fY (y) =
0, x 2/ (0, 1) 0, y2/ (0, 1),
and because they are independent the joint density is their product
(
1, 0 < x < 1, and 0 < y < 1
fX,Y (x, y) = fX (x)fY (y) =
0, else.
Therefore,
ZZ Z 1 Z y Z 1
1
P (X < Y ) = fX,Y (x, y)dxdy = 1dx dy = y dy = .
x<y 0 0 0 2
6.11. Because Y is uniform on (1, 2), the marginal density for Y is
(
1 y 2 (1, 2)
fY (y) =
0 else
By independence, the joint distribution of (X, Y ) is therefore
(
2x 0 < x < 1, 1 < y < 2
fX,Y (X, Y ) =
0 else
where you should draw a picture of the region to see why this is the case. Calculating
the double integral yields:
Z 12 Z 2 Z 1/2
3
P (Y X 2) = 2x dy dx = 2x( 12 x) dx = 24 1
.
0 x+ 32 0
2⇡ 1 ⇢2
We can simplify the exponent of the exponential as follows:
✓ ◆2
2 y ⇢x
x + p 2
1 ⇢ x2 (1 ⇢2 + ⇢2 ) + y 2 2⇢xy x2 + y 2 2⇢xy
= 2
= .
2 2(1 ⇢ ) 2(1 ⇢2 )
This shows that the joint probability density of X, Y is indeed the same as given
in (6.28), and thus the pair (X, Y ) has standard bivariate normal distribution with
parameter ⇢.
6.16. In terms of the polar coordinates (r, ✓) the Cartesian coordinates (x, y) are
expressed as
x = r cos(✓) and y = r sin(✓).
These equations give the coordinate functions of the inverse function G 1 (r, ✓).
The Jacobian is
" @x @x #
@r @✓ cos(✓) r sin(✓)
J(r, ✓) = det @y @y = det = r cos2 ✓ + r sin2 ✓ = r.
sin(✓) r cos(✓)
@r @✓
1
The joint density function of X, Y is fX,Y (x, y) = ⇡r02
in D and 0 outside. Formula
(6.32) gives
1
fR,⇥ (r, ✓) = fX,Y (r cos(✓), r sin(✓)) |J(r, ✓)| = ⇡r02
r for (r, ✓) 2 L.
This is exactly the joint density function obtained earlier in (6.26) of Example 6.37.
6.17. We can express (X, Y ) as (g(U, V ), h(U, V )) where g(u, v) = uv and h(u, v) =
(1 u)v. We can find the inverse of the function (g(u, v), h(u, v)) by solving the
system of equations
x = uv, y = (1 u)v
x
for u and v. The solution is u = x+y , v = x + y, so the inverse of (g(u, v), h(u, v))
is the function (q(x, y), r(x, y)) with
x
q(x, y) = , r(x, y) = x + y.
x+y
The Jacobian of (q(x, y), r(x, y)) with respect to x, y is
y x
2 (x+y)2 y+x 1
J(x, y) = det (x+y) = 2
= .
1 1 (x + y) x+y
The terms are nonnegative and add to 1, which shows that pX,Y is a probability
mass function.
(b) Adding the rows and columns gives the marginals. The marginal of X is
P (X = 1) = 14 , P (X = 2) = 14 , P (X = 3) = 14 , P (X = 4) = 14 ,
whereas the marginal of Y is
25 13 7 1
P (Y = 1) = 48 , P (Y = 2) = 48 , P (Y = 3) = 48 , P (Y = 4) = 16 .
(c)
P (X = Y + 1) = P (X = 2, Y = 1) + P (X = 3, Y = 2) + P (X = 4, Y = 3)
1 1 1 13
= 8 + 12 + 16 = 48 .
6.19. (a) By adding the probabilities in the respective rows we get pX (0) = 13 ,
pX (1) = 23 . By adding them in the appropriate columns we get the marginal
probability mass function of Y : pY (0) = 16 , pY (1) = 13 , pY (2) = 12 .
(b) We have pZ,W (z, w) = pZ (z)pW (w) by the independence of Z and W . Using
the probability mass functions from part (a) we get
W
0 1 2
1 1 1
Z 0 18 9 6
1 2 1
1 9 9 3
6.20. Note that the random variable X1 + X2 counts the number of times that out-
comes 1 or 2 occurred. This event has a probability of 12 . Hence, and similar to the
argument made at the end of Example 6.10, (X1 +X2 , X3 , X4 ) ⇠ Mult(n, 3, 12 , 18 , 38 ).
Therefore, for any pair of integers (k, `) with k + ` n
P (X3 = k, X4 = `) = P (X1 + X2 = n k `, X3 = k, X4 = `)
n! 1 n k ` 1 k 3 `
= 2 8 8 .
(n k `)! k! `!
6.21. They are not independent. Both X1 and X2 can take the value n with positive
probability. However, they cannot both take it at the same time, as X1 + X2 <= n. Thus
6.22. The random variable X1 + X2 counts the number of times that outcomes
1 or 2 occurred. This event has a probability of p1 + p2 . Therefore, X1 + X2 ⇠
Bin(n, p1 + p2 ).
6.23. Let Xg , Xr , Xy be the number of times we see a green ball, red ball, and
yellow ball, respectively. Then, (Xg , Xr , Xy ) ⇠ Mult(4, 3, 1/3, 1/3, 1/3). We want
the following probability,
6.24. The number of green balls chosen is binomially distributed with parameters
n = 3 and p = 14 . Hence, the probability that exactly two balls are green and one
is not green is
✓ ◆ ✓ ◆2
3 1 3 9
= .
2 4 4 64
The same argument goes for seeing exactly two red balls, two yellow balls, or two
white balls. Hence, the probability that exactly two balls are of the same color is
9 9
4· = .
64 16
6.25. (a) The possible values for X and Y are 0, 1, 2. For each possible pair we
compute the probability of the corresponding event, For example,
3
P (X = 0, Y = 0) = P {(T, T, T )} = 2 .
Similarly
3
P (X = 0, Y = 1) = P ({(T, T, H)}) = 2
P (X = 0, Y = 2) = 0
3
P (X = 1, Y = 0) = P ({(H, T, T )}) = 2
3 2
P (X = 1, Y = 1) = P ({(H, T, H), (T, H, T )}) = 2 ⇥ 2 =2
3
P (X = 1, Y = 2) = P ({(T, H, H)}) = 2
3
P (X = 2, Y = 1) = P ({(H, H, T )}) = 2
3
P (X = 2, Y = 2) = P ({(H, H, H)}) = 2
6.26. (a) By the setup of the experiment, XA is uniformly distributed over {0, 1, 2}
whereas XB is uniformly distributed over {1, 2, . . . , 6}. Moreover, XA and XB
are independent. Hence, (XA , XB ) is uniformly distributed over ⌦ = {(k, `) :
0 k 2, 1 ` 6}. That is, for (k, `) 2 ⌦,
1
P ((XA , XB ) = (k, `)) = 18 .
(b) The set of possible values of Y1 is {0, 1, 2, 3, 4, 5, 6, 8, 10, 12} and the set of
possible values of Y2 is {1, 2, 3, 4, 5, 6}. The joint distribution can be given in
tabular form
Y1 \ Y2 1 2 3 4 5 6
1 1 1 1 1 1
0 18 18 18 18 18 18
1
1 18 0 0 0 0 0
2
2 0 18 0 0 0 0
1
3 0 0 18 0 0 0
1 1
4 0 18 0 18 0 0
1
5 0 0 0 0 18 0
1 1
6 0 0 18 0 0 18
1
8 0 0 0 18 0 0
1
10 0 0 0 0 18 0
1
12 0 0 0 0 0 18
For example,
1 1
P (Y1 = 2, Y2 = 2) = P (XA = 1, XB = 2) + P (XA = 2, XB = 1) = 18 + 18 .
(c) The marginals are found by summing along the rows and columns:
6 1 2
P (Y1 = 0) = 18 , P (Y1 = 1) = 18 , P (Y1 = 2) = 18
1 2 1
P (Y1 = 3) = 18 , P (Y1 = 4) = 18 , P (Y1 = 5) = 18
2 1 1
P (Y1 = 6) = 18 , P (Y1 = 8) = 18 , P (Y1 = 10) = 18
1
P (Y1 = 12) = 18 ,
and
2 4 3
P (Y2 = 1) = 18 , P (Y2 = 2) = 18 , P (Y2 = 3) = 18
3 3 3
P (Y2 = 4) = 18 , P (Y2 = 5) = 18 , P (Y2 = 6) = 18 .
= P (X = k)P (k < Y ) = pq k 1
· q k = pq 2k 1
,
where we used the independence of X and Y in the third equality. We get P (V =
k, W = 2) = pq 2k 1 in exactly the same way. Finally,
P (V = k, W = 1) = P (min(X, Y ) = k, X = Y ) = P (X = k, Y = k) = p2 q 2k 2
.
This gives us the joint probability mass function of V and W ; for the independence
we need to check if this is the product of the marginals.
By Example 6.31 we have V ⇠ Geom(1 q 2 ) so for any k 2 {1, 2, . . . } we get
P (V = k) = (1 (1 q 2 ))k 1
(1 q 2 ) = q 2k 2
(1 q 2 ).
1
X 1
X
P (W = 1) = P (X = Y ) = P (X = k, Y = k) = P (X = k)P (Y = k)
k=1 k=1
1
X 1
X p2
= pq k 1
· pq k 1
= p2 (q 2 )k =
1 q2
k=1 k=0
p
= .
2 p
P (V = k)P (W = 0) = q 2k 2
(1 q 2 ) 12 p
p, P (V = k, W = 0) = pq 2k 1
,
1 q2 1
and since 2 p = (1 q)(1 + q) 1+q = p, we have
P (V = k)P (W = 0) = P (V = k, W = 0).
P (V = k)P (W = 1) = q 2k 2
(1 q2 ) 2 p p , P (V = k, W = 1) = p2 q 2k 2
1 q2
and using 2 p = p again we get
P (V = k)P (W = 1) = P (V = k, W = 1).
6.29. Because of the independence, the joint probability mass function of X and
Y is the product of the individual probability mass functions:
a=1 a=1
p(1 r) p pr
= = .
1 (1 p)(1 r) p + r pr
6.30. Note the typo in the problem, it should say P (X = Y +1), not P (X +1 = Y ).
For k 1 and ` 0 the joint probability mass function of X and Y is
`
P (X = k, Y = `) = (1 p)k 1
p·e `! .
Breaking up {X = Y + 1} into the disjoint union of smaller events {X = Y + 1} =
[1
k=0 {X = k + 1, Y = k}. Thus
1
X 1
X k
P (X = Y + 1) = P (X = k + 1, Y = k) = (1 p)k p · e k!
k=0 k=0
1
X ( (1 p))k
= pe
k!
k=1
(1 p) p
= pe e = pe .
red, b green and c yellow balls. Thus the joint probability mass function is
10 15 20
a b c
P (X1 = a, X2 = b, X3 = c) = 45
8
We turn to finding the joint probability mass function of N and Y . First, note
that
P (Y = 1, N = n) = P ((n 1) white balls followed by a green ball)
= ( 29 )n 1 49 .
Similarly,
P (Y = 2, N = n) = ( 29 )n 13
9.
Similarly,
P (Y = 2) = 37 .
We see that Y and N are independent:
P (Y = 1)P (N = n) = 4
7 · ( 29 )n 17
9 = ( 29 )n 14
9 = P (Y = 1, N = n)
P (Y = 2)P (N = n) = 3
7 · ( 29 )n 1 79 = ( 29 )n 1 39 = P (Y = 2, N = n).
Thus (
6y 6y 2 if 0 < y < 1
fY (y) =
0 otherwise.
The joint density function is positive on the triangle
{(x, y) : 0 < y < 1, y < x < 2 y}.
The line segment from (1, 1) to (2, 0) that forms part of the boundary of D
obeys the equation y = 2 x. The marginal density functions are derived as
follows. First for X.
For x 0 and x 2,
fX (x) = 0.
Z 1 Z 1
2
For 0 < x 1, fX (x) = fX,Y (x, y) dy = 3 dy = 23 .
1 0
Z 1 Z 2 x
2 4 2
For 1 < x < 2, fX (x) = fX,Y (x, y) dy = 3 dy = 3 3 x.
1 0
so indeed it is.
Next the marginal density function of Y :
For y 0 and y 1,
fY (y) = 0.
Z 1 Z 2 y
2 4 2
For 0 < y < 1, fY (y) = fX,Y (x, y) dx = 3 dx = 3 3 y.
1 0
(b)
Z 1 Z 1 Z 2
2 2 2
E[X] = x fX (x) dx = 3 x dx + ( 43 x 3 x ) dx = 79 .
1 0 1
Z 1 Z 1
2 2
E[Y ] = y fY (y) dy = ( 43 y 3 y ) dy = 49 .
1 0
(c) X and Y are not independent. Their joint density is not a product of the
marginal densities. Also, a picture of D shows that P (X > 32 , Y > 12 ) = 0
because all points in D satisfy x + y 2. However, the marginal densities show
that P (X > 32 ) · P (Y > 12 ) > 0 so the probability of the intersection does not
equal the product of the probabilities.
6.35. (a) Since fXY is non-negative, we just need to prove that the integral of fXY
is 1:
Z Z Z Z y
1 1 2
fXY (x, y)dxdy = (x + y)dx dy = (x + y)dx dy
0xy2 4 4 0 0
Z
1 2 3 2
= y dy = 1.
4 0 2
(b) We calculate the probability using the joint density function:
Z Z 2 Z y
1 1
P {Y < 2X} = (x + y)dxdy = (x + y)dx dy
0xy2,y<2x 4 0 y 4
2
Z Z 2
1 2 3 2 5 2 7 7 8 7
= ( y y )dy = y 2 dy = · =
4 0 2 8 32 0 32 3 12
(c) According to the definition, when 0 y 2:
Z Z y
1 1 3 3 2
fY (y) = fXY (x, y)dx = (x + y)dx = ( y 2 0) = y
0 4 4 2 8
Otherwise, the density function fXY (x, y) = 0. Thus:
(
3 2
y y 2 [0, 2]
fY (y) = 8
0 else
R1 R1
6.36. (a) We need to find c so that 1 1 f (x, y)dxdy = 1. For this we need to
compute Z Z1 1
x2 (x y)2
e 2 2 dx dy
1 1
We can decide whether we should integrate with respect to x or y first, and
choosing y gives a slightly easier path.
Z 1 Z 1
x2 (x y)2 x2 (x y)2
e 2 2 dy = e 2 e 2 dy
1 1
p Z 1 p
x2 1 (x y)2 x2
= 2⇡e 2 p e 2 dy = 2⇡e 2 .
1 2⇡
In the last step we could recognize the integral of the pdf of a N (x, 1) distributed
random variable. From this we get
Z 1Z 1 Z 1p
x2 (x y)2 x2
e 2 2 dydx = 2⇡e 2 dx
1 1 0
Z 1
1 x2
= 2⇡ p e 2 dx = 2⇡.
0 2⇡
1
In the last step we integrated the pdf of the standard normal. Hence, c = 2⇡ .
(b) We have basically computed fX (without the constant c) in part (a) already.
Z 1
1 x2 (x y)2
fX (x) = e 2 2 dy
1 2⇡
Z 1
1 x2 1 (x y)2 1 x2
=p e 2 p e 2 dy = p e 2 .
2⇡ 1 2⇡ 2⇡
Now we compute fY :
Z 1 Z 1
1 x2 (x 1
y)2 1 x2 (x y)2
fY (y) = e 2 dx = p2 p e 2 2 dx.
1 2⇡ 2⇡ 1 2⇡
We can complete the square in the exponent of the exponential:
x2 (x y)2
= x2 xy 12 y 2 = (x y/2)2 y 2 /4,
2 2
and we can now compute the integral:
Z 1
1 1 x2 (x y)2
fY (y) = p p e 2 2 dx
2⇡ 1 2⇡
Z 1
1 1 2 2
=p p e (x y/2) y /4 dx
2⇡ 1 2⇡
Z 1
1 y 2 /4 1 2 1 y 2 /4
=p e p e (x y/2) dx = p e .
4⇡ 1 ⇡ 4⇡
2
In the last step we used the fact that p1⇡ e (x y/2) is the pdf of a N (y/2, 1)
distributed random variable. It follows that Y ⇠ N (0, 2).
Thus X ⇠ N (0, 1) and Y ⇠ N (0, 2).
(c) X and Y are not independent, since their joint density function is not the same
as the product of the marginal densities.
Rd
6.37. We want to find fX (x) for which P (c < X < d) = c fX (x)dx for all c < d.
Because the x-coordinate of any point in D is in (a, b), we can assume that a < c <
d < b. In this case
A = {c < X < d} = {(x, y) : c < x < d, 0 < y < h(x)}.
area(A)
Because we chose (X, Y ) uniformly from D, we get P (A) = area(D) . We can
compute the areas by integration:
R d R h(x) Rd
dydx h(x)dx
P (c < X < d) = P (A) = Rcb R 0h(x) = Rcb .
a 0
dydx a
h(x)dx
We can rewrite the last expression as
Z d
h(x)
P (c < X < d) = Rb dx
c
a
h(s)ds
which shows that ( h(x)
Rb
h(s)ds
, if a < x < b
fX (x) = a
0, otherwise.
6.38. The marginal of Y is
Z 1
x(1+y) 1
fY (y) = xe dx = ,
0 (1 + y)2
for y > 0 and zero otherwise (use integration by parts). Hence,
Z 1
y
E[Y ] = dy = 1.
0 (1 + y)2
6.39. F (p, q) is the probability corresponding to the quarter plane {(x, y) : x <
p, y < q}. (Because X, Y are jointly continuous it does not matter whether we
write < or .) Our goal is to get the probability of (X, Y ) being in the rectangle
{(x, y) : a < x < b, c < y < d} using quarter planes probabilities. We start with
the probability F (b, d), this is the probability corresponding to the quarter plane
with corner (b, d). If we subtract F (a, d) + F (b, c) from this then we remove the
probabilities of the quarter planes corresponding to (a, d) and (b, c), and we have
exactly the rectangle (a, b) ⇥ (c, d) left. However, the probability corresponding to
the quarter plane with corner (a, c) was subtracted twice (instead of once), so we
have to add it back. This gives
P (a < X < b, c < Y < d) = F (b, d) F (b, c) F (a, d) + F (a, b).
6.40. First note that the relevant set of values is s 2 [0, 2] since 0 X + Y 2.
The joint density function is positive on the triangle
{(x, y) : 0 < y < 1, y < x < 2 y}.
To calculate the probability that X + Y s, for 0 s 2, we combine the
restriction x + y s with the description of the triangle to find the region of
integration. (A picture could help.)
ZZ Z s/2 ✓ Z s y ◆
P (X + Y s) = f (x, y) dx dy = 3y(2 x) dx dy
0 y
x+ys
Z s/2
= 3
2 s2 y + 3 sy 2 + 6 sy 12 y 2 dy
0
3 2 2
(3 s 12) s3 2 s + 6s s
= + .
24 8
Di↵erentiating to give the density yields
3 1 3
f (s) = s2 s for 0 < s < 2, and zero elsewhere.
4 4
6.41. Let A be the intersection of the ball with radius r centered at the origin and
D. Because r < h, this is just the ‘top’ half of the ball. We need to compute
P ((X, Y, Z) 2 A), and because (X, Y, Z) is chosen uniformly from D this is just the
ratio of volumes of D and A. The volume 2 3
of D is r2 h⇡ while the volume of A is
2 3 3r ⇡ 2r
3 r ⇡, so the probability in question is r 2 h⇡ = 3h .
6.42. Drawing a picture is key to understanding the solution as there are multiple
cases requiring the computation of the areas of relevant regions.
Note that 0 X 2 and 0 Z = X + Y 5. This means that for x < 0 or
z < 0 we have
FX,Z (x, z) = P (X x, Z z) = 0.
If x and z are both nonnegative then we can compute P (X x, Z z) = P (X
x, X + Y z) by integrating the joint density of X, Y on the region Ax,z = {(s, t) :
s x, s + t z}. This is just the area of the intersection of Ax,z and D divided by
the area of D (which is 6). The rest of the solution boils down to identifying the
region Ax,z \ D in various cases and finding the corresponding area.
If 0 x 2 and z is nonnegative then we need to consider four cases:
• If 0 z x then Ax,z \ D is the triangle with vertices (0, 0), (z, 0), (0, z),
2
with area z2 .
• If x < z 3 then Ax,z \ D is a trapezoid with vertices (0, 0), (x, 0), (0, z) and
(x, z x). Its area is x(2z2 x) .
• If 3 < z 3 + x then Ax,z \ D is a pentagon with vertices (0, 0), (x, 0),
2
(x, z x), (z 3, 3) and (0, 3). Its area is 3x (3+x2 z)
• If 3 + x < z then Ax,z \ D is the rectangle with vertices (0, 0), (x, 0), (x, 3)
and (0, 3), with area 3x.
We get the corresponding probabilities by dividing the area of Ax,z \ D with 6.
Thus for 0 x 2 we have
8
>
> 0, if z < 0
>
>
>
>z ,
2
>
> if 0 z x
>
< 12
x(2z x)
FX,Z (x, z) = 12 , if x < z 3
>
>
>
> (3+x z) 2
>
>
x
, if 3 < z 3 + x
>
> 2 12
>
:x
2, if 3 + x < z.
For 2 < x we get P (X x, Z z) = P (X 2, Z z) = FX,Z (2, z). Using the
previous results, in this case we get
8
>
> 0, if z < 0
>
>
>
> z 2
>
> , if 0 z 2
>
< 12
F (x, z) = (z 3 x) , if 2 < z 3
>
>
>
> 2
>
>
> 1 (5 12z) , if 3 < z 5
>
>
:
1, if 5 < z.
6.43. Following the reasoning of Example 6.40,
fT,V (u, v) = fX,Y (u, v) + fX,Y (v, u).
Substituting in the definition of fX,Y gives the answer
( p p
2u2 v + v + 2v 2 u + u if 0 < u < v < 1
fT,V (u, v) =
0 else.
6.44. Drawing a picture of the cone would help with this problem. The joint density
of the uniform distribution in the teepee is
(
1
if (x, y, z) 2 Cone
fX,Y,Z (x, y, z) = vol(Cone)
0 else .
The volume of the cone is ⇡r2 h/3. Thus the joint density is,
(
3
2 if (x, y, z) 2 Cone
fX,Y,Z (x, y, z) = ⇡r h
0 else .
To find the joint density of (X, Y ) we must integrate out the Z variable. To do so,
we switch to cylindrical variables. Let (R̃, ⇥, Z) be the distance from the center of
the teepee, angle, and height where the fly dies. The height that we must integrate
depends where we are on the floor. That is, if we are in the middle of the teepee
R̃ = 0, we must integrate Z from z = 0 to z = h. If we are near the edge of the
teepee, we only integrate a small amount, for example z = 0 to z = ✏. For an
arbitrary radius R̃0 , the height we must integrate to is h0 = (1 R̃r )h.
Then the integral we must compute is
Z (1 r̃r )h
3 3(1 r̃r )
fR̃,⇥ (r, ✓) = 2
dz = .
0 ⇡r h ⇡r2
We can check that this integrates to one. Recall that we are integrating with respect
to cylindrical coordinates and thus
Z Z Z 2⇡ Z r
3(1 r̃r )
fX,Y (x, y) dx dy = r̃ dr̃ d✓
circle 0 0 ⇡r2
Z 2⇡ r2 r3
3( 2 3 ) 3r2 16
= d✓ = (2⇡) = 1.
0 ⇡r2 ⇡r2
Thus, switching back to rectangular coordinates,
p
x2 +y 2
p 3(1 )
r
fX,Y (x, y) = fR,⇥ ( x2 + y 2 , ✓) = 2
⇡r
for x2 + y 2 r2 .
For the marginal in Z, consider the height to be z. Then we must integrate
over the circle with radius r0 = r(1 hz ). Thus, in cylindrical coordinates,
Z 2⇡ Z r(1 z/h)
3
fZ (z) = 2h
r̃ dr̃ d✓
0 0 ⇡r
which yields,
Z 2⇡
3r2 (1 z/h)2 3⇣ z ⌘2
fZ (z) = d✓ = 1 .
0 2⇡r2 h h h
6.45. We first note that
FV (v) = P (V v) = P (max(X, Y ) v) = P (X v, Y v)
= P (X v)P (Y v) = FX (v)FY (v).
Di↵erentiating this we get the p.d.f. of V :
d 0
fV (v) =FV (v) = FX (v)FY (v) = fX (v)FY (v) + FX (v)fY (v).
dv
For the minimum we use
P (T > z) = P (min(X, Y ) > z) = P (X > z, Y > z) = P (X > z)P (Y > z),
then
FT (z) = P (T z) = 1 P (T > z) = 1 P (X > z)P (Y > z)
=1 (1 FX (z))(1 FY (z)),
and
⇥ ⇤0
fT (z) = 1 (1 FX (z))(1 FY (z))
= fX (z)(1 FY (z)) + fY (z)(1 FX (z)).
We computed the probabilities of the events {max(X, Y ) v} and {min(X, Y ) > z}
because these events can be written as intersections to take advantage of indepen-
dence.
6.46. We know from (6.31) and the independence of X and Y that
fT,V (t, v) = fX (t)fY (v) + fX (v)fY (t),
if t < v and zero otherwise. The marginal of T = min(X, Y ) is found by integrating
the v variable:
Z 1 Z 1
fT (t) = fT,V (t, v)dv = fX (t)fY (v) + fX (v)fY (t) dv
1 t
= fX (t)(1 FY (t)) + fY (t)(1 FX (t)).
(b) We can find the density functions by di↵erentiation (using the chain rule):
d d
fZ (z) = FZ (z) = (1 (1 FX (z))n ) = nfX (x)(1 FX (x))n 1
,
dz dz
d d
fW (w) = FW (w) = FX (w)n = nfX (x)FX (x)n 1 .
dw dw
6.48. Let t > 0. We will show that P(Y > t) = e^{-(lambda_1 + ··· + lambda_n) t}. Using the indepen-
dence of the random variables we have

  P(Y > t) = P(min(X_1, X_2, . . . , X_n) > t) = P(X_1 > t, X_2 > t, . . . , X_n > t)
    = prod_{i=1}^{n} P(X_i > t) = prod_{i=1}^{n} e^{-lambda_i t}
    = e^{-(lambda_1 + ··· + lambda_n) t}.

Hence, Y is exponentially distributed with parameter lambda_1 + ··· + lambda_n.
6.49. In the setting of Fact 6.41, let G(x, y) = (min(x, y), max(x, y)) and L =
{(t, v) : t < v}. When x 6= y this function G is two-to-one. Hence we define
two separate regions K1 = {(x, y) : x < y} and K2 = {(x, y) : x > y}, so that
G is one-to-one and onto L from both K1 and K2 . The inverse functions are as
follows: from L onto K1 it is (q1 (t, v), r1 (t, v)) = (t, v) and from L onto K2 it is
(q2 (t, v), r2 (t, v)) = (v, t). Their Jacobians are
1 0 0 1
J1 (t, v) = det = 1 and J2 (t, v) = det = 1.
0 1 1 0
Since the diagonal {(x, y) : x = y} has zero area it was legitimate to drop it from
the first double integral. From the last line we can read o↵ the joint density function
fT,V (t, v) = fX,Y (t, v) + fX,Y (v, t) for t < v.
6.50. (a) Since X ⇠ Gamma(r, ) and Y ⇠ Gamma(s, ) are independent, we have
xr 1 r
xy
s 1 s
y
fX,Y (x, y) = fX (x)fY (y) = e e
(r) (s)
for x > 0, y > 0, and fX,Y (x, y) = 0 otherwise.
In the setting of Fact 6.41, for x, y 2 (0, 1) we are using the change of
variables
x
u = g(x, y) = 2 (0, 1), v = h(x, y) = x + y 2 (0, 1).
x+y
The inverse functions are
q(u, v) = uv 2 (0, 1), r(u, v) = v(1 u) 2 (0, 1).
The relevant Jacobian is
@q @q
@u (u, v) @v (u, v)
v u
J(u, v) = @r @r = = v.
@u (u, v) @v (u, v)
v 1 u
From this we get
fB,G (u, v) = fX (uv)fY (v(1 u))v
r r 1 s
(uv) (v(1 u))s 1
= e uv e (v(1 u))
v
(r) (s)
(r + s) r 1 1
= u (1 u)s 1
· r+s (r+s) 1
v e v
.
(r) (s) (r + s)
for u 2 (0, 1), v 2 (0, 1), and 0 otherwise. We can recognize that this is exactly
the product of a Beta(r, s) probability density (in u) and a Gamma(r + s, )
probability density (in v), hence B ⇠ Beta(r, s), G ⇠ Gamma(r + s, ), and
they are independent.
(b) The transformation described is the inverse of that found in part (a). Therefore,
X and Y are independent with X ⇠ Gamma(r, ) and Y ⇠ Gamma(s, ).
For the detailed solution note that
(r + s) r 1 1
fB,G (b, g) = b (1 b)s 1 · r+s (r+s) 1
g e g
(r) (s) (r + s)
for b 2 (0, 1), g 2 (0, 1) and it is zero otherwise.
We use the change of variables
x = b · g, y = (1 b) · g.
The inverse function is
x
b= , g = x + y.
x+y
The Jacobian is
y x
(x+y)2 (x+y)2 1
J(x, y) = = .
1 1 x+y
From this we get
(r + s) x r 1 x s 1 1 r+s 1
fX,Y (x, y) = ( ) (1 x+y ) · (x + y)(r+s) 1
e (x+y)
(r) (s) x+y (r + s) x+y
xr 1 r ys 1 s
= e x e y
(r) (s)
for x > 0, y > 0 (and zero otherwise). This shows that indeed X and Y are
independent with X ⇠ Gamma(r, ) and Y ⇠ Gamma(s, ).
6.51. (a) Apply the two-variable expectation formula to the function h(x, y) =
g(x). Then
X X
E[g(X)] = E[h(X, Y )] = h(k, `)P (X = k, Y = `) = g(k)P (X = k, Y = `)
k,` k,`
X X X
= g(k) P (X = k, Y = `) = g(k)P (X = k).
k ` k
Proof. One way to prove this is with the infinitesimal method. For " > 0 we have
P (X1 2 (x1 , x1 + "), . . . , Xm 2 (xm , xm + "))
Z x1 +" Z xm +" Z 1 Z 1
= ··· ··· f (y1 , . . . , yn ) dy1 . . . dyn
x1 xm 1 1
✓Z 1 Z 1 ◆
⇡ ··· f (x1 , . . . , xm , ym+1 , . . . , yn ) dym+1 . . . dyn "m .
1 1
We set P (XB = XD = 0) = 0 to make sure that a call comes. a and b are unknowns
that have to satisfy a 0, b 0 and a + b 1, in order for the table to represent
a legitimate joint probability mass function.
(a) The given marginal p.m.f.s force the following solution:
XD
0 1
0 0 0.7
XB
1 0.2 0.1
(b) There is still a solution when P (XD = 1) = 0.7 but no longer when P (XD =
1) = 0.6.
6.56. Pick an x for which P (X = x) > 0. Then,
X X X
0 < P (X = x) = P (X = x, Y = y) = a(x)b(y) = a(x) b(y).
y y y
P
Hence, y b(y) 6= 0 and
P (X = x)
a(x) = P .
y b(y)
7.1. We have
X
P (Z = 3) = P (X + Y = 3) = P (X = k)P (Y = 3 k).
k
Going through the possible values of k for which P(X_1 = k) > 0, and keeping only
the terms for which P(X_2 = 2 - k) is also positive:

  P(X_1 + X_2 = 2) = P(X_1 = -1)P(X_2 = 3) + P(X_1 = 0)P(X_2 = 2) + P(X_1 = 1)P(X_2 = 1)
    + P(X_1 = 2)P(X_2 = 0) + P(X_1 = 3)P(X_2 = -1)
    = 1/64 + 1/64 + 1/16 + 1/64 + 1/64 = 1/8.
7.4. We have f_X(x) = lambda e^{-lambda x} for x > 0 and 0 otherwise, and f_Y(y) = mu e^{-mu y} for
y > 0 and 0 otherwise.
Since X and Y are both positive, X + Y > 0 with probability one, and f_{X+Y}(z) = 0
for z <= 0. For z > 0, using the convolution formula

  f_{X+Y}(z) = int_{-infinity}^{infinity} f_X(x) f_Y(z - x) dx = int_0^z lambda e^{-lambda x} mu e^{-mu(z-x)} dx.

In the second step we used that f_X(x) f_Y(z - x) != 0 if and only if x > 0 and
z - x > 0, which means that 0 < x < z.
Returning to the integral,

  f_{X+Y}(z) = int_0^z lambda e^{-lambda x} mu e^{-mu(z-x)} dx = lambda mu e^{-mu z} int_0^z e^{(mu - lambda)x} dx
    = lambda mu e^{-mu z} [e^{(mu - lambda)x}/(mu - lambda)]_{x=0}^{x=z} = lambda mu e^{-mu z} (e^{(mu - lambda)z} - 1)/(mu - lambda)
    = (lambda mu/(mu - lambda)) (e^{-lambda z} - e^{-mu z}).

Note that we used lambda != mu when we integrated e^{(mu - lambda)x}.
Hence the probability density function of X + Y is

  f_{X+Y}(z) = (lambda mu/(mu - lambda)) (e^{-lambda z} - e^{-mu z}) for z > 0, and 0 otherwise.
7.5. (a) By Fact 7.9 the distribution of W is normal, with

  mu_W = 2 mu_X - 4 mu_Y + mu_Z = -7,   sigma_W^2 = 4 sigma_X^2 + 16 sigma_Y^2 + sigma_Z^2 = 25.

Thus W ~ N(-7, 25).
(b) Using part (a) we know that (W + 7)/sqrt(25) is a standard normal. Thus

  P(W > -2) = P((W + 7)/5 > (-2 + 7)/5) = 1 - Phi(1) ~ 1 - 0.8413 = 0.1587.
7.6. By exchangeability
P (3rd card is a king, 5th card is the ace of spades)
= P (1st card is the ace of spades, 2nd card is king).
The second probability can now be computed by counting favorable outcomes within
the first two picks:
1·4 2
P (1st card is the ace of spades, 2nd card is king) = 52 = .
2
663
7.7. By exchangeability
P (X3 is the second largest) = P (Xi is the second largest)
for any i = 1, 2, 4. Because the Xi are jointly continuous the probability that any
two are equal is zero. Thus
4
X
1= P (Xi is the second largest) = 4P (X3 is the second largest)
i=1
we need to compute P (N ([0, 3]) = 3). But N ([0, 3]) has Poisson distribution with
parameter 3 · 16 = 12 , hence
1
X 1
X
P (X = Y ) = P (X = Y = k) = P (X = k)P (Y = k)
k=1 k=1
X1 1
X k
= p(1 p)k 1
r(1 r)k 1
= pr [(1 p)(1 r)]
k=1 k=0
1 pr
= pr = .
1 (1 p)(1 r) r + p rp
n
X1 n
X1
P (Z = n) = P (X = i)P (Y = n i) = p(1 p)i 1
r(1 r)n i 1
i=1 i=1
n
X1 n
X2
= pr (1 p)i 1
(1 r)n i 1
= pr (1 p)i (1 r)n (i+1) 1
i=1 i=0
X2
n
1 p
i
[(1 p)/(1 r)]n 1
21
= pr(1 r)n 2
= pr(1 r)n
i=0
1 r 1 (1 p)/(1 r)
We also get
7
X 7 ✓
X ◆
k 1
P (Brewers win) = P (X = k) = p4 (1 p)k 4
.
3
k=4 k=4
Evaluating this sum for the various values of p gives the following numerical values:
p 0.40 0.35 0.30
P (Brewers win) 0.2898 0.1998 0.1260
7.17. Let X be the the number of trials needed until we reach k successes, then
X ⇠ Negbin(k, p). The event that the number of successes reaches k before the
number of failures reaches ` is the same as {X < k + `}. Moreover this event is the
same as having at least k successes within the first k + ` 1 trials. Thus
` 1✓
X ◆ X 1 ✓k + ` 1◆
k+`
k+j k
P (X < k + `) = p (1 p)j = pa (1 p)k+` 1 a .
j=0
k 1 a
a=k
7.18. Both X and Y have probability densities that are zero for negative values,
this will hold for X + Y as well. Using the convolution formula, for z 0 we get
Z 1 Z z
fX+Y (z) = fX (x)fY (z x)dx = fX (x)fY (z x)dx
1 0
Z z Z z
= 2e 2x 4(z x)e 2(z x) dx = 8(z x)e 2z dx
0 0
Z z
2z
= 8e (z x)dx = 4z 2 e 2z .
0
Thus (
4z 2 e 2z
, if z 0,
fX+Y (z) =
0, otherwise.
7.19. (a) We need to compute
ZZ Z 1 Z 1
x y
P (Y X 2) = fX (x)fY (y) dx dy = e dydx
y x 2 2 x
Z 1
2x 4
= e dx = 12 e .
2
(b) The density of f Y is given by f Y (y) = fY ( y). Then from the convolution
formula we get
Z 1 Z 1 Z 1
fX Y (z) = fX (t)f Y (z t)dt = fX (t)f Y (z t)dt = fX (t)fY (t z)dt.
1 1 1
Note that fX (t)fY (t z) > 0 if t > 0 and t z > 0, which is the same as
t > max(z, 0). Thus
Z 1 Z 1
1
fX Y (z) = fX (t)fY (t z)dt = e 2t+z dt = e 2 max(z,0)+z .
max(z,0) max(z,0) 2
where we used the fact that $f_X(x) = 0$ outside $(0,1)$. For a given $1 < z < 3$ the function $f_Y(z-x)$ is nonzero if and only if $1 < z - x < 2$, which is equivalent to $z - 2 < x < z - 1$. Since we must have $0 < x < 1$ for $f_X(x)$ to be nonzero, this means that $f_X(x)f_Y(z-x)$ is nonzero only if $\max(0, z-2) < x < \min(1, z-1)$. Thus
$$f_{X+Y}(z) = \int_0^1 f_X(x)f_Y(z-x)\,dx = \int_{\max(0,z-2)}^{\min(1,z-1)} 2x\,dx = \min(1,z-1)^2 - \max(0,z-2)^2.$$
Considering the cases $1 < z \leq 2$ and $2 < z < 3$ separately:
$$f_{X+Y}(z) = \begin{cases} (z-1)^2, & \text{if } 1 < z \leq 2,\\ 1 - (z-2)^2, & \text{if } 2 < z < 3,\\ 0, & \text{otherwise.}\end{cases}$$
7.21. (a) By Fact 7.9 the distribution of W is normal, with
2 2 2
µW = 3µx + 4µY = 10, W =9 X + 16 Y = 59.
Thus W ⇠ N (9, 57).
(b) Using part (a) we know that Wp5710 is a standard normal. Thus
✓ ◆
W 10 15 10
P (W > 15) = P p > p =1 ( p557 ) ⇡ 1 (0.66) ⇡ 0.2578.
57 57
Solutions to Chapter 7 163
7.22. Using Fact 3.61 we have $2X \sim N(2\mu, 4\sigma^2)$. From Fact 7.9, by the independence of X and Y we get $X + Y \sim N(2\mu, 2\sigma^2)$. Since $\sigma^2 > 0$, the two distributions can never be the same.
7.23. By Fact 7.9, $X - Y \sim N(0, 2)$ and thus $\frac{X-Y}{\sqrt2} \sim N(0, 1)$. From this we get
$$P(X > Y + 2) = P\left(\frac{X-Y}{\sqrt2} > \sqrt2\right) = 1 - \Phi(\sqrt2) \approx 1 - \Phi(1.41) \approx 0.0793.$$
7.24. Suppose that the variances of X, Y and Z are $\sigma_X^2$, $\sigma_Y^2$ and $\sigma_Z^2$. Using Fact 7.9 we have that $X + 2Y - 3Z \sim N(0,\ \sigma_X^2 + 4\sigma_Y^2 + 9\sigma_Z^2)$, and $\frac{X+2Y-3Z}{\sqrt{\sigma_X^2+4\sigma_Y^2+9\sigma_Z^2}} \sim N(0,1)$. This gives
$$P(X + 2Y - 3Z > 0) = P\left(\frac{X+2Y-3Z}{\sqrt{\sigma_X^2+4\sigma_Y^2+9\sigma_Z^2}} > 0\right) = 1 - \Phi(0) = \frac12.$$
7.25. We have $f_X(x) = 1$ for $0 < x < 1$ and zero otherwise. For Y we have $f_Y(y) = \frac12$ for $8 < y < 10$ and zero otherwise. Note that $8 < X + Y < 11$. The density of X + Y is given by
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(t)f_Y(z-t)\,dt.$$
The product $f_X(t)f_Y(z-t)$ is $\frac12$ if $0 < t < 1$ and $8 < z - t < 10$, and zero otherwise. The second inequality is equivalent to $z - 10 < t < z - 8$. The solution of the inequality system is $\max(0, z-10) < t < \min(1, z-8)$. Hence, for $8 < z < 11$ we have
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(t)f_Y(z-t)\,dt = \tfrac12\big(\min(1, z-8) - \max(0, z-10)\big).$$
Evaluating the formula on $(8,9)$, $[9,10)$ and $[10,11)$ we get the following case-defined function:
$$f_{X+Y}(z) = \begin{cases} \frac{z-8}{2}, & 8 < z < 9,\\[2pt] \frac12, & 9 \leq z < 10,\\[2pt] \frac{11-z}{2}, & 10 \leq z < 11,\\[2pt] 0, & \text{otherwise.} \end{cases}$$
7.26. The probability density functions of X and Y are
$$f_X(x) = \begin{cases} \frac12, & \text{if } 1 < x < 3,\\ 0, & \text{otherwise,}\end{cases} \qquad f_Y(y) = \begin{cases} 1, & \text{if } 9 < y < 10,\\ 0, & \text{otherwise.}\end{cases}$$
Since $1 \leq X \leq 3$ and $9 \leq Y \leq 10$ we must have $10 \leq X + Y \leq 13$. For a $z \in [10, 13]$ the convolution formula gives
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(x)f_Y(z-x)\,dx = \int_1^3 f_X(x)f_Y(z-x)\,dx.$$
Thus
$$f_{X+Y}(z) = \int_1^3 f_X(x)f_Y(z-x)\,dx = \int_{\max(1,z-10)}^{\min(3,z-9)} \tfrac12\,dx = \tfrac12\big(\min(3, z-9) - \max(1, z-10)\big).$$
Evaluating these expressions for $10 \leq z < 11$, $11 \leq z < 12$ and $12 \leq z < 13$ we get the following case-defined function:
$$f_{X+Y}(z) = \begin{cases} \frac12(z-10), & \text{if } 10 \leq z < 11,\\[2pt] \frac12, & \text{if } 11 \leq z < 12,\\[2pt] \frac12(13-z), & \text{if } 12 \leq z < 13,\\[2pt] 0, & \text{otherwise.}\end{cases}$$
7.27. Using the convolution formula:
Z 1
fX+Y (t) = f (s)fY (t s)ds.
1
7.28. Because X1 , X2 , X3 are jointly continuous, the probability that any two of
them are equal is 0. This means that P (X1 , X2 , X3 are all di↵erent) = 1. By the
exchangeability of X1 , X2 , X3 we have
where we listed all six possible orderings of X1 , X2 , X3 . Since the sum of the six
probabilities is P (X1 , X2 , X3 are all di↵erent), we get that P (X1 < X2 < X3 ) = 61 .
7.29. By exchangeability, each Xi , 1 i 100 has the same probability to be the
50th largest. Since the Xi are jointly continuous, the probability of any two being
equal is 0. Hence
100
X
1= P (Xi is the 50th largest number) = 100P (X20 is the 50th largest number)
i=1
1
and the probability in question must be 100 .
(b) Again, by exchangeability and counting the favorable outcomes within the first
two picks:
13
2 1
P (1st card is , 5th card is ) = P (1st card is , 2nd card is ) = 52 = .
2
17
(c) Using the same arguments:
P (2nd card is K, last two cards are aces)
P (2nd card is K|last two cards are aces) =
P (last two cards are aces)
P (3rd card is K, first two cards are aces)
=
P (first two cards are aces)
= P (3rd card is K|first two cards are aces)
4 2
= = .
50 25
The final probability comes either from counting favorable outcomes for the first
three picks, or by noting that if we choose two aces for the first two picks then we
always have 50 cards left with 4 of them being kings.
7.31. By exchangeability the probability that the 3rd, 10th and 23rd picks are
of di↵erent colors is the same as the probability of the first three picks being of
di↵erent color. For this event the order of the first three picks does not matter, so
we can assume that we choose the three balls without order, and we just need the
probability that these are of di↵erent colors. Thus the probability is
20 · 10 · 15 100
P (we choose one of each color) = 45 = .
3
473
7.32. Denote by Xk the numerical value of the kth pick. By exchangeability of
X1 , . . . , X23 we get
P (X9 5, X14 5, X21 5) = P (X1 5, X2 5, X3 5).
(53) 10
The probability that the first three picks are from {1, 2, 3, 4, 5} is = 1771 .
(23
3)
among the 5 cards. (48 is the total number of non-ace cards, 5 k is the number
of non-ace cards among the 5.)
Thus
48
5 (a1 +···+a4 )
P (X1 = a1 , X2 = a2 , X3 = a3 , X4 = a4 ) = 52
5
if a1 , a2 , a3 , a4 2 {0, 1}. But this is a symmetric function of a1 , a2 , a3 , a4 (as the sum
does not change when we permute these numbers), which shows that the random
variables X1 , X2 , X3 , X4 are indeed exchangeable.
7.35. By exchangeability, it is enough to compute the probability that the values of
first three picks are increasing. By using exchangeability again, any of the possible
3! = 6 order for the first three picks are equally likely. Hence the probability in
question is 16 .
7.36. (a) The waiting times between replacements are independent exponentials
with parameter 1/2 (with years as the time units). This means that the replace-
ments form a Poisson process with parameter 1/2. Then the number of replacements
within the next year is Poisson distributed with parameter 1/2, and hence
P (have to replace a light bulb during the year)
1/2
=1 P (no replacements within the year) = 1 e .
(b) The number of points in two non-overlapping intervals are independent for a
Poisson process. Thus the conditional probability is the same as the unconditional
one, and using the same approach as in part (b) we get
(1/2)2 1/2 e 1/2
P (two replacements in the year) = e = .
2! 8
7.37. The joint probability mass function of g(X1 ), g(X2 ), g(X3 ) can be expressed
in terms of the joint probability mass function p(x1 , x2 , x3 ) of X1 , X2 , X3 :
X
P (g(X1 ) = a1 , g(X2 ) = a2 , g(X3 ) = a3 ) = p(x1 , x2 , x3 ).
b1 :g(b1 )=a1
b2 :g(b2 )=a2
b3 :g(b3 )=a3
8.1. From the information given and properties of the random variables we deduce
$$EX = \frac1p, \qquad E(X^2) = \frac{2-p}{p^2}, \qquad EY = nr, \qquad E(Y^2) = n(n-1)r^2 + nr.$$
(a) By linearity of expectation, $E[X+Y] = EX + EY = \frac1p + nr$.
(b) We cannot calculate $E[XY]$ without knowing something about the joint distribution of (X, Y). But no such information is given.
(c) By linearity of expectation, $E[X^2 + Y^2] = E[X^2] + E[Y^2] = \frac{2-p}{p^2} + n(n-1)r^2 + nr$.
(d) $E[(X+Y)^2] = E[X^2 + 2XY + Y^2] = E[X^2] + 2E[XY] + E[Y^2]$. Again we would need $E[XY]$, which we cannot calculate.
8.2. Let Xk be the number showing on the k-sided die. We need E[X4 + X6 + X12 ].
By linearity of expectation
This gives
4 + 1 6 + 1 12 + 1 25
E[X4 + X6 + X12 ] = + + = .
2 2 2 2
8.3. Introduce indicator variables XB , XC , XD so that X = XB + XC + XD , by
defining XB = 1 if Ben calls and zero otherwise, and similarly for XC and XD . Then
E[X] = E[XB + XC + XD ] = E[XB ] + E[XC ] + E[XD ] = 0.3 + 0.4 + 0.7 = 1.4.
167
168 Solutions to Chapter 8
8.4. Let $I_k$ be the indicator of the event that the number 4 is showing on the k-sided die. Then $Z = I_4 + I_6 + I_{12}$. For each $k \geq 4$ we have
$$E[I_k] = P(\text{the number 4 is showing on the } k\text{-sided die}) = \frac1k.$$
Hence, by linearity of expectation
$$E[Z] = E[I_4] + E[I_6] + E[I_{12}] = \frac14 + \frac16 + \frac1{12} = \frac12.$$
8.5. We have E[X] = p1 = 3 and E[Y ] = = 4 from the given distributions. The
perimeter of the rectangle is given by 2(X + Y + 1) and the area is X(Y + 1). The
expectation of the perimeter is
E[2(X + Y + 1)] = E[2X + 2Y + 2] = 2E[X] + 2E[Y ] + 2 = 2 · 3 + 2 · 4 + 2 = 16,
where we used the linearity of expectation.
The expectation of the area is
E[X(Y + 1)] = E[XY + X] = E[XY ] + E[X] = E[X]E[Y ] + E[X] = 3 · 4 + 3 = 15.
We used the linearity of expectation, and also that because of the independence of
X and Y we have E[XY ] = E[X]E[Y ].
8.6. The answer to parts (a) and (c) do not change. However, we can now com-
pute E[XY ] and E[(X + Y )2 ] using the additional information that X and Y are
independent. Using the facts from the solution of Exercise 8.1 about the first and
second moments of X and Y , and the independence of these random variables we
get
1 nr
E[XY ] = E[X]E[Y ] = · nr = ,
p p
and
E[(X + Y )2 ] = E[X 2 + 2XY + Y 2 ] = E[X 2 ] + 2E[XY ] + E[Y 2 ]
2 p 2nr
= + + n(n 1)r2 + nr.
p2 p
8.7. The mean of X is given by the solution of Exercise 8.3. As in the solution of
Exercise 8.3, introduce indicators so that X = XB + XC + XD . Using the assumed
independence,
Var(X) = Var(XB + XC + XD ) = Var(XB ) + Var(XC ) + Var(XD )
= 0.3 · 0.7 + 0.4 · 0.6 + 0.7 · 0.3 = 0.66.
8.8. Let X be the arrival time of the plumber and T the time needed to complete the project. Then $X \sim \operatorname{Unif}[1,7]$ and $T \sim \operatorname{Exp}(2)$ (with hours as units), and these are independent. The parameter of the exponential comes from the fact that an $\operatorname{Exp}(\lambda)$ distributed random variable has expectation $1/\lambda$.
We need to compute $E[X+T]$ and $\operatorname{Var}(X+T)$. Using the distributions of X and T we get
$$E[X] = \frac{1+7}{2} = 4, \quad \operatorname{Var}(X) = \frac{6^2}{12} = 3, \quad E[T] = \frac12, \quad \operatorname{Var}(T) = \frac1{2^2} = \frac14.$$
By linearity we get
$$E[X+T] = E[X] + E[T] = 4 + \frac12 = \frac92.$$
From the independence
$$\operatorname{Var}(X+T) = \operatorname{Var}(X) + \operatorname{Var}(T) = 3 + \frac14 = \frac{13}4.$$
8.9. (a) We have
E[3X 2Y + 7] = 3E[X] 2E[Y ] + 7 = 3 · 3 2 · 5 + 7 = 6,
where we used the linearity of expectation.
(b) Using the independence of X and Y :
Var(3X 2Y + 7) = 9 · Var(X) + 4 · Var(Y ) = 92 + 43 = 30.
(c) From the definition of the variance
Var(XY ) = E[(XY )2 ] E[XY ]2 .
By independence we have E[XY ] = E[X]E[Y ] and E[(XY )2 ] = E[X 2 ]E[Y 2 ], thus
Var(XY ) = E[X 2 ]E[Y 2 ] E[X]2 E[Y ]2
= E[X 2 ]E[Y 2 ] 925 = E[X 2 ]E[Y 2 ] 225,
To compute the second moments we use the variance:
2 = Var(X) = E[X 2 ] E[X]2 = E[X 2 ] 9
hence E[X 2 ] = 9 + 2 = 11. Similarly, E[Y 2 ] = E[Y ]2 + Var(Y ) = 25 + 3 = 28. Thus
Var(XY ) = 11 · 28 225 = 83.
8.10. The moment generating function of $X_1$ is given by
$$M_{X_1}(t) = E[e^{tX_1}] = \sum_k e^{tk}\, P(X_1 = k) = \frac12 + \frac13 e^{t} + \frac16 e^{2t}.$$
The moment generating function of $X_2$ is the same. Since $X_1$ and $X_2$ are independent, we can compute the moment generating function of $S = X_1 + X_2$ as follows:
$$M_S(t) = M_{X_1}(t)M_{X_2}(t) = \left(\frac12 + \frac13 e^t + \frac16 e^{2t}\right)^2.$$
Expanding the square we get
$$M_S(t) = \frac14 + \frac13 e^t + \frac{5}{18} e^{2t} + \frac19 e^{3t} + \frac1{36} e^{4t}.$$
We can read off the probability mass function of S from this by identifying the coefficients of the exponential terms:
$$P(S=0) = \tfrac14,\quad P(S=1) = \tfrac13,\quad P(S=2) = \tfrac5{18},\quad P(S=3) = \tfrac19,\quad P(S=4) = \tfrac1{36}.$$
then $M_X(t) = \frac12 e^{-t} + \frac25 + \frac1{10} e^{t/2}$. Now take independent random variables $X_1, \dots, X_{36}$ with the same distribution as X. By independence, the sum $X_1 + \cdots + X_{36}$ has a moment generating function which is the product of the individual moment generating functions, which is exactly $\left(\frac12 e^{-t} + \frac25 + \frac1{10} e^{t/2}\right)^{36} = M_Z(t)$. Hence Z has the same distribution as $X_1 + \cdots + X_{36}$.
8.14. We need to compute $E[X], E[Y], E[X^2], E[Y^2], E[XY]$. All of these can be computed using the joint probability mass function given in the table. For example,
$$E[X] = 1\cdot\big(\tfrac1{15}+\tfrac1{15}+\tfrac2{15}+\tfrac1{15}\big) + 2\cdot\big(\tfrac1{10}+\tfrac1{10}+\tfrac15+\tfrac1{10}\big) + 3\cdot\big(\tfrac1{30}+\tfrac1{30}+0+\tfrac1{10}\big) = \frac{11}6$$
and
$$E[XY] = 1\cdot0\cdot\tfrac1{15} + 1\cdot1\cdot\tfrac1{15} + 1\cdot2\cdot\tfrac2{15} + 1\cdot3\cdot\tfrac1{15} + 2\cdot0\cdot\tfrac1{10} + 2\cdot1\cdot\tfrac1{10} + 2\cdot2\cdot\tfrac15 + 2\cdot3\cdot\tfrac1{10} + 3\cdot0\cdot\tfrac1{30} + 3\cdot1\cdot\tfrac1{30} + 3\cdot3\cdot\tfrac1{10} = \frac{47}{15}.$$
Similarly,
$$E[Y] = \frac53, \qquad E[X^2] = \frac{23}6, \qquad E[Y^2] = \frac{59}{15}.$$
Then
$$\operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = \frac{47}{15} - \frac{11}6\cdot\frac53 = \frac7{90}.$$
For the correlation we first compute the variances:
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{23}6 - \left(\frac{11}6\right)^2 = \frac{17}{36}, \qquad \operatorname{Var}(Y) = E[Y^2] - (E[Y])^2 = \frac{59}{15} - \left(\frac53\right)^2 = \frac{52}{45}.$$
From this we have
$$\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac7{2\sqrt{1105}} \approx 0.1053.$$
8.15. We first compute the joint probability density of (X, Y). The quadrilateral D is composed of a unit square and a triangle which is half of the unit square, thus the area of D is $\frac32$. Thus the joint density function is
$$f_{X,Y}(x,y) = \tfrac23\,\mathbf{1}_{\{(x,y)\in D\}}.$$
To calculate the covariance we need to calculate $E[XY]$, $E[X]$, $E[Y]$. We have
$$E[XY] = \int_0^1\int_0^{2-y} \tfrac23\, xy\,dx\,dy = \int_0^1 \tfrac{2}{6}\, y(2-y)^2\,dy = \tfrac26\left(\tfrac42 y^2 - \tfrac43 y^3 + \tfrac14 y^4\right)\bigg|_0^1 = \tfrac26\cdot\tfrac{11}{12} = \tfrac{11}{36},$$
$$E[X] = \int_0^1\int_0^{2-y} \tfrac23\, x\,dx\,dy = \int_0^1 \tfrac{2}{6}(2-y)^2\,dy = \tfrac26\left(4y - 2y^2 + \tfrac13 y^3\right)\bigg|_0^1 = \tfrac26\cdot\tfrac73 = \tfrac79,$$
$$E[Y] = \int_0^1\int_0^{2-y} \tfrac23\, y\,dx\,dy = \int_0^1 \tfrac23 (2-y)y\,dy = \tfrac23\left(y^2 - \tfrac13 y^3\right)\bigg|_0^1 = \tfrac23\cdot\tfrac23 = \tfrac49.$$
Then $\operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = \frac{11}{36} - \frac79\cdot\frac49 = -\frac{13}{324}$.
2 2 2⇡
1 u2 1 v2
=p e 2 p e 2.
2⇡ 2⇡
The final result shows that U and V are independent standard normals.
8.19. This is the same problem as Exercise 6.15.
8.20. By linearity, E[X3 + X10 + X22 ] = E[X3 ] + E[X10 ] + E[X22 ]. The random
variables X1 , . . . , X30 are exchangeable, thus E[Xk ] = E[X1 ] for all 1 k 30.
This gives
E[X3 + X10 + X22 ] = 3E[X1 ].
The value of the first pick is equally likely to be any of the first 30 positive integers,
hence
X30
1 30 · 31 31
E[X1 ] = k = = ,
30 2 · 30 2
k=1
and
93
E[X3 + X10 + X22 ] = 3E[X1 ] = .
2
8.21. Label the coins from 1 to 10, for example so that coins 1-5 are the dimes, coins
6-8 are the quarters, and coins 9-10 are the pennies. Let ak be the value of coin k
and let Ik be the indicator variable that is 1 if coin k is chosen, for k = 1, . . . , 10.
Then
$$X = \sum_{k=1}^{10} a_k I_k = 10(I_1+\cdots+I_5) + 25(I_6+I_7+I_8) + I_9 + I_{10}.$$
The probability that any particular coin is chosen is
$$E(I_k) = P(\text{coin } k \text{ chosen}) = \frac{\binom92}{\binom{10}3} = \frac3{10}.$$
Hence
$$EX = \sum_{k=1}^{10} a_k E(I_k) = 10\cdot5\cdot\tfrac3{10} + 25\cdot3\cdot\tfrac3{10} + 2\cdot\tfrac3{10} = 38.1 \text{ (cents)}.$$
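A simulation of the same experiment gives an easy sanity check on the value 38.1. This sketch is an illustrative addition (the helper name and the number of repetitions are arbitrary):

```python
import random

def sampled_value():
    # 5 dimes (10c), 3 quarters (25c), 2 pennies (1c); choose 3 without replacement
    coins = [10] * 5 + [25] * 3 + [1] * 2
    return sum(random.sample(coins, 3))

n = 200_000
print(sum(sampled_value() for _ in range(n)) / n)   # close to 38.1
```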
8.22. There are several ways to approach this problem. One possibility that gives
the answer without doing complicated computations is as follows. For each 1 j
89 let Ij be the indicator of the event that both j and j + 1 are chosen among the
P89
five numbers. Then X = j=1 Ij , since if j and j + 1 are both chosen then they
will be next to each other in the ordered sample. By linearity
89
X 89
X
E[X] = E[ Ij ] = E[Ij ].
j=1 j=1
Because we draw with replacement, the colors of different picks are independent:
$$E[I_j] = P(j\text{th ball is green and the }(j+1)\text{st ball is yellow}) = P(j\text{th ball is green})\,P((j+1)\text{st ball is yellow}) = \frac49\cdot\frac39 = \frac4{27}.$$
This gives
$$E[X_n] = \sum_{j=1}^{n-1}\frac4{27} = \frac{4(n-1)}{27}.$$
(b) We will see a different (maybe more straightforward) technique in Chapter 10, but here we will give a solution using the indicator method. Let $J_k$ denote the indicator that the kth ball is green and there are no white balls among the first $k-1$. Then $Y = \sum_{k=1}^{\infty} J_k$. (In the sum a term is equal to 1 if the corresponding ball is green and came before the first white.) Using linearity
$$E[Y] = E\Big[\sum_{k=1}^{\infty} J_k\Big] = \sum_{k=1}^{\infty} E[J_k] = \sum_{k=1}^{\infty} P(k\text{th ball is green, no white balls among the first } k-1).$$
(We can exchange the expectation and the infinite sum here as each term is nonnegative.) Using independence we can compute the probability in question for each k:
$$P(k\text{th ball is green, no white balls among the first } k-1) = P(k\text{th ball is green})\,P(\text{first } k-1 \text{ balls are all green or yellow}) = \frac49\cdot\left(\frac79\right)^{k-1}.$$
This gives
$$E[Y] = \sum_{k=1}^{\infty}\frac49\left(\frac79\right)^{k-1} = \frac49\cdot\frac{1}{1-\frac79} = 2.$$
Here is an intuitive explanation for the result that we got. The yellow draws are irrelevant in this problem: the only thing that matters is the position of the first white, and the number of green choices before that. Imagine that we remove the yellow balls from the urn, and we repeat the same experiment (sampling with replacement), stopping at the first white ball. Then the number of picks is a geometric random variable with parameter $\frac26 = \frac13$. The expectation of this geometric random variable is 3. Moreover, the total number of picks is equal to the number of green balls chosen before the first white plus 1 (the first white). This explains why the expectation of Y is $3 - 1 = 2$.
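A brief simulation of the urn (an illustrative addition, assuming the composition 4 green, 3 yellow, 2 white used above; the helper name is made up) confirms the value 2:

```python
import random

def greens_before_first_white():
    count = 0
    while True:
        ball = random.choices(["green", "yellow", "white"], weights=[4, 3, 2])[0]
        if ball == "white":
            return count
        if ball == "green":
            count += 1

n = 100_000
print(sum(greens_before_first_white() for _ in range(n)) / n)   # close to 2
```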
8.27. For 1 i < j n let Ii,j be the indicator of the event P that ai = aj . We need
to compute the expected value of the random variable X = i<j Ii,j . By linearity
P
E[X] = i<j E[Ii,j ]. Using the exchangeability of the sample (a1 , . . . , an ) we get
for all i < j that E[Ii,j ] = E[I1,2 ] = P (a1 = a2 ). Counting favorable outcomes (or
by conditioning on the first pick) we get P (a1 = a2 ) = n1 . This gives
X ✓ ◆ ✓ ◆
n n 1 n 1
E[X] = E[Ii,j ] = P (a1 = a2 ) = · = .
i<j
2 2 n 2
8.28. Imagine that we take the sample with order and for each 1 k 10 let
Ik be the indicator that we got a yellow marble for the kth pick, and Jk be the
Solutions to Chapter 8 177
P10 P10
indicator that we got a green pick. Then X = k=1 Ik , Y = k=1 Jk and X Y =
P10
k=1 (Ik Jk ). Using the linearity of expectation we get
10
X 10
X
E[X Y ] = E[ (Ik Jk )] = (E[Ik ] E[Jk ]).
k=1 k=1
Counting favorable outcomes (noting that there are 4 · 9 = 36 number cards in the
deck) gives
36
3 21
P (the first three cards flipped are number cards) = 52 =
3
65
and
21 210
E[X] = 50 ·
= .
65 13
8.30. Let Xk be the number of the kth chosen ball and let Ik be the indicator of
the event that Xk > Xk 1 . Then
N = I2 + I3 + · · · + I20 ,
and using linearity and exchangeability
20
X 20
X
E[N ] = E[ Ik ] = E[Ik ] = 19E[I2 ].
k=2 k=2
We also have
E[I2 ] = P (X1 < X2 ) = P (first number is smaller than the second).
One could compute the probability P (X1 < X2 ) by counting favorable outcomes
for the first two picks. Another way is to notice that
1 = P (X1 < X2 ) + P (X1 > X2 ) + P (X1 = X2 ) = 2P (X1 < X2 ) + P (X1 = X2 ),
178 Solutions to Chapter 8
8.33. (a) For each 1 a 10 let Ia be the indicator of the event that the ath
player won exactly 2 matches. Then we need
10
X 10
X
E[ Ik ] = P (the ath player won exactly 2 matches).
k=1 k=1
By exchangeability the probability is the same for each a. Since the outcomes of
the matches are independent and a player plays 9 matches, we have
✓ ◆
9
P (the first player won exactly 2 matches) = 2 9.
2
Thus the expectation is 10 · 92 2 9 = 45 64 .
(b) For each 1 a < b < c 10 let Ja,b,c P be the indicator
Pthat the players numbered
a, b and c form a 3-cycle. We need E[ a<b<c Ja,b,c ] = a<b<c E[Ja,b,c ]. There are
10
3 such triples, and the expectation is the same for each one, so it is enough to
find
E[J1,2,3 ] = P (Players 1, 2 and 3 form a 3-cycle).
Players 1, 2 and 3 form a 3-cycle if 1 beats 2, 2 beats 3, 3 beats 1 (this has probability
1/8) or if 1 beats 3, 3 beats 2 and 2 beats 1 (this also has probability 1/8). Thus
E[J1,2,3 ] = 1/8 + 1/8 = 14 , and the expectation in question is 10 1
3 4 = 30.
(c) We use the indicator method again. For each possible sequence of di↵erent
players a1 , a2 , . . . , ak we set up an indicator that this sequence is a k-path. The
number of such indicators is 10 10!
k · k! = (10 k)! (we choose the k players, then
their order). The probability that a given indicator is 1 is the probability that a1
beats a2 , a2 beats a3 , . . . , ak 1 beats ak which is 2 (k 1) . Thus the expectation is
10! 1 k 1
(10 k)! ( 2 ) .
8.34. We show the proof for n = 2, the general case can be done similarly. Assume
that the joint probability density function of X1 , X2 is f (x1 , x2 ). Then
Z 1Z 1
E[g1 (X1 ) + g2 (X2 )] = (g1 (x1 ) + g2 (x2 ))f (x1 , x2 )dx1 dx2 .
1 1
Using the linearity of the integral we can write this as
Z 1Z 1 Z 1Z 1
g1 (x1 )f (x1 , x2 )dx1 dx2 + g2 (x2 )f (x1 , x2 )dx1 dx2 .
1 1 1 1
Integrating out x2 in the first integral gives
Z 1Z 1 Z 1 ✓Z 1 ◆
g1 (x1 )f (x1 , x2 )dx1 dx2 = g1 (x1 ) f (x1 , x2 )dx2 dx1 .
1 1 1 1
R1
Note that 1 f (x1 , x2 )dx2 is equal to fX1 (x1 ), the marginal probability density
of X1 . Hence
Z 1 ✓Z 1 ◆ Z 1
g1 (x1 ) f (x1 , x2 )dx2 dx1 = g1 (x1 )fX1 (x1 )dx1 = E[g1 (X1 )].
1 1 1
Similar computation shows that
Z 1Z 1
g2 (x2 )f (x1 , x2 )dx1 dx2 = E[g2 (X2 )].
1 1
Thus E[g1 (X1 ) + g2 (X2 )] = E[g1 (X1 )] + E[g2 (X2 )].
8.35. (a) We may assume that the choices we made each day are independent. Let $J_k$ be the indicator for the event that sweater k is worn at least once in the 5 days. Then $X = J_1+J_2+J_3+J_4$. By linearity and exchangeability
$$E[X] = E[J_1+J_2+J_3+J_4] = \sum_{k=1}^4 E[J_k] = 4E[J_1] = 4\,P(\text{the first sweater was worn at least once}).$$
Considering the complement of the event in the last line:
$$P(\text{the first sweater was worn at least once}) = 1 - P(\text{the first sweater was not worn at all}) = 1 - \left(\frac34\right)^5,$$
where we used the independence assumption. This gives
$$E[X] = 4\left(1 - \left(\tfrac34\right)^5\right) = \frac{781}{256}.$$
(b) We use the notation introduced in part (a). For the variance of X we need $E[X^2]$. Using linearity and exchangeability:
$$E[X^2] = E[(J_1+J_2+J_3+J_4)^2] = E\Big[\sum_{k=1}^4 J_k^2 + 2\sum_{k<\ell} J_kJ_\ell\Big] = 4E[J_1^2] + 2\binom42 E[J_1J_2] = 4E[J_1^2] + 12E[J_1J_2].$$
Since $J_1$ is one or zero, we have $J_1^2 = J_1$ and by part (a)
$$4E[J_1^2] = 4E[J_1] = E[X] = \frac{781}{256}.$$
We also have
$$E[J_1J_2] = P(\text{both the first and second sweater were worn at least once}).$$
Let $A_k$ denote the event that the kth sweater was not worn at all during the week. Then
$$P(\text{both the first and second sweater were worn at least once}) = P(A_1^cA_2^c) = 1 - P\big((A_1^cA_2^c)^c\big) = 1 - P(A_1\cup A_2) = 1 - \big(P(A_1)+P(A_2)-P(A_1A_2)\big).$$
From part (a) we get $P(A_1) = P(A_2) = \left(\frac34\right)^5$, and similarly
$$P(A_1A_2) = P(\text{neither the first nor the second sweater was worn}) = \left(\tfrac24\right)^5.$$
Thus
$$E[J_1J_2] = 1 - P(A_1) - P(A_2) + P(A_1A_2) = 1 - 2\left(\tfrac34\right)^5 + \left(\tfrac24\right)^5$$
and
$$E[X^2] = \frac{781}{256} + 12\left(1 - 2\left(\tfrac34\right)^5 + \left(\tfrac24\right)^5\right) = \frac{2491}{256}.$$
Finally,
$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac{2491}{256} - \left(\frac{781}{256}\right)^2 \approx 0.4232.$$
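Both moments are easy to confirm by simulating the five independent daily choices (an illustrative addition; the helper name and sample size are arbitrary):

```python
import random
import statistics

def distinct_sweaters():
    # one of 4 sweaters chosen uniformly at random on each of 5 days
    return len({random.randrange(4) for _ in range(5)})

samples = [distinct_sweaters() for _ in range(200_000)]
print(statistics.mean(samples))       # about 781/256 = 3.0508
print(statistics.pvariance(samples))  # about 0.4232
```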
8.36. (a) Let $I_k$ be the indicator of the event that the number k appears at least once among the four die rolls. Then $X = I_1 + \cdots + I_6$ and we get
$$E[X] = E[I_1 + \cdots + I_6] = E[I_1] + \cdots + E[I_6] = 6E[I_1],$$
where the last step comes from exchangeability. We have
$$E[I_1] = P(\text{the number 1 shows up}) = 1 - P(\text{none of the rolls are equal to 1}) = 1 - \left(\tfrac56\right)^4,$$
which gives
$$E[X] = 6\left(1 - \left(\tfrac56\right)^4\right).$$
(b) We need to compute the second moment of X. Using the notation of part (a):
$$E[X^2] = E[(I_1+\cdots+I_6)^2] = E\Big[\sum_{k=1}^6 I_k^2 + 2\sum_{j<k}I_jI_k\Big] = \sum_{k=1}^6E[I_k^2] + 2\sum_{j<k}E[I_jI_k] = 6E[I_1] + 30E[I_1I_2].$$
By inclusion-exclusion, as in part (a),
$$E[I_1I_2] = P(\text{both 1 and 2 show up}) = 1 - 2\left(\tfrac56\right)^4 + \left(\tfrac46\right)^4.$$
Collecting everything:
$$E[X^2] = 6\left(1 - \left(\tfrac56\right)^4\right) + 30\left(1 + \left(\tfrac23\right)^4 - 2\left(\tfrac56\right)^4\right)$$
and
$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = 6\left(1-\left(\tfrac56\right)^4\right) + 30\left(1+\left(\tfrac23\right)^4 - 2\left(\tfrac56\right)^4\right) - 36\left(1-\left(\tfrac56\right)^4\right)^2 \approx 0.447.$$
8.37. (a) Let Jk be the indicator for the event that the toy k is in at least one of
the 4 boxes. Then X = J1 + J2 + · · · + J10 . By linearity and exchangeability
10
X 10
X
E[X] = E[ Jk ] = E[Jk ] = 10E[J1 ]
k=1 k=1
= 10P (the first toy was in one of the boxes).
Let Ak be the event that the kth toy was not in any of the four boxes. Then
E[X] = 10P (Ac1 ) = 10(1 P (A1 )).
We may assume that the toys in the boxes are chosen independently of each other,
and hence
✓ 9 ◆4
4 ( 2)
P (A1 ) = P (first box does not contain the first toy) = = ( 45 )4
(10
2)
and ⇣ ⌘
738 4 4
E[X] = 10 1 . 5 =
125
(b) We need E[X 2 ] which can be expressed using the introduced indicators as
10
X 10
X X
E[X 2 ] = E[( Jk )2 ] = E[ Jk2 + 2 Jj Jk ]
k=1 k=1 j<k
10
X X
= E[Jk2 ] + 2 E[Jj Jk ]
k=1 j<k
◆ ✓
10
= 10E[J12 ]
+2 E[J1 J2 ]
2
= 10E[J1 ] + 90E[J1 J2 ].
738
We used linearity, exchangeability and J1 = J12 . Note that 10E[J1 ] = E[X] = 125
by part (a). Recalling the definition of Ak from part (a) we get
E[J1 J2 ] = P (Ac1 Ac2 ).
By taking complements,
P (Ac1 Ac2 ) = 1 P ((Ac1 Ac2 )c ) = 1 P (A1 [ A2 ) = 1 (P (A1 ) + P (A2 ) P (A1 A2 )).
As we have seen in part (a):
✓ ◆4
(92)
P (A1 ) = P (A2 ) = = ( 45 )4 ,
(10
2)
This gives
E[J1 J2 ] = 1 2( 45 )4 + ( 28
45 )
4
and
738
E[X 2 ] = + 90 1 2( 45 )4 + ( 28
45 )
4
,
125
Solutions to Chapter 8 183
which leads to
62 (1 + 1
4 + 1
9 + 1
16 + 1
25 ) 6(1 + 1
2 + 1
3 + 1
4 + 15 ) = 38.99.
8.39. Let $J_i = 1$ if a boy is chosen with the ith selection, and zero otherwise. Note that $E[J_i] = P(J_i = 1) = 17/40$. Then $X = \sum_{i=1}^{15} J_i$, and using linearity and exchangeability
$$E[X] = \sum_{i=1}^{15} P(J_i = 1) = 15\cdot\frac{17}{40} = \frac{51}{8}.$$
Using the formula for the variance of a sum (together with exchangeability) gives
$$\operatorname{Var}(X) = \operatorname{Var}\Big(\sum_{i=1}^{15}J_i\Big) = \sum_{i=1}^{15}\operatorname{Var}(J_i) + 2\sum_{i<k}\operatorname{Cov}(J_i,J_k) = 15\operatorname{Var}(J_1) + 15\cdot14\,\operatorname{Cov}(J_1,J_2).$$
To find $E[J_1J_2]$ note that $J_1J_2 = 1$ only if a boy is chosen on both of the first two selections, and zero otherwise. Thus, by counting favorable outcomes we get
$$E[J_1J_2] = \frac{\binom{17}2}{\binom{40}2} = \frac{34}{195}.$$
Collecting everything:
$$\operatorname{Var}(X) = 15\cdot\frac{17}{40}\cdot\frac{23}{40} + 15\cdot14\left(\frac{34}{195} - \left(\frac{17}{40}\right)^2\right) = \frac{1955}{832}.$$
8.40. (a) We use the method of indicators. Let Jk be the indicator for the event
that the number k is drawn in at least one of the 4 weeks. Then X = J1 + J2 +
184 Solutions to Chapter 8
We have
From this
✓ ◆4 !
85
E[X] = 90E[J1 ] = 90 1 ⇡ 18.394.
90
(b) We first compute the second moment of X. Using the notation from part (b)
we have
2 !2 3 2 3
X90 X90 X
E[X 2 ] = E 4 Jk 5 = E 4 Jk2 + 2 Jk J` 5
k=1 k=1 1k<`90
90
X X
= E[Jk2 ] + 2 E[Jk J` ]
k=1 1k<`90
✓ ◆
90
= 90E[J12 ] + 2 · E[J1 J2 ],
2
where we used exchangeability again in the last step. Since J1 is either zero or
one, we have J12 = J1 . Thus the term 90E[J12 ] is the same as 90E[J1 ] which is
equal to E[X]. The second term can be computed as follows:
E[J1 J2 ] = P (both 1 and 2 are drawn at least once within the 4 weeks)
=1 P (at least one of 1 and 2 is not drawn within of the 4 weeks))
=1 P (1 is not drawn in any of the 4 weeks)
+ P (2 is not drawn in any of the 4 weeks)
+ P (neither 1 nor 2 is drawn in any of the 4 weeks) ,
and
✓ ◆4
88 · 87 · 86 · 85 · 84
P (neither 1 nor 2 is drawn in any of the 4 weeks) =
90 · 89 · 88 · 87 · 86
✓ ◆4
85 · 84
= .
90 · 89
8.41. We have
"✓ ◆3 #
X1 + · · · + Xn 1 h 3
i
E[X̄n3 ] = E = E (X 1 + · · · + X n ) .
n n3
By expanding the cube of the sum and using linearity and exchangeability
2 3
X n X X
1
E[X̄n3 ] = 3 E 4 Xk3 + 6 Xi Xj Xk + 3 Xj2 Xk 5
n
k=1 i<j<k j6=k
0 1
n
1 @X X X
= 3 E[Xk3 ] + 6 E[Xi Xj Xk ] + 3 E[Xj2 Xk ]A
n
k=1 i<j<k j6=k
✓ ◆
1 n
= 3 · n E[X13 ] + 6 E[X1 X2 X3 ] + 3n(n 1)E[X12 X2 ].
n 3
By independence
hence
1 b
E[X̄n3 ] = · n E[X13 ] = 2 .
n3 n
8.42. We have
"✓ ◆4 #
X1 + · · · + Xn 1 h 4
i
E[X̄n4 ] =E = 3
E (X1 + · · · + Xn ) .
n n
186 Solutions to Chapter 8
By expanding the fourth power of the sum and using linearity and exchangeability
X
n X
1
E[X̄n4 ] = 4 E Xk4 + 24 Xi Xj Xk X`
n
k=1 i<j<k<`
X X X
+ 12 Xj2 Xk X` + 6 Xj2 Xk2 + 4 Xj3 Xk
k<` j<k j6=k
j6=k,j6=`
n
1 X X
= 4 E[Xk4 ] + 24 E[Xi Xj Xk X` ]
n
k=1 i<j<k<`
X X X
+ 12 E[Xj2 Xk X` ] + 6 E[Xj2 Xk2 ] + 4 E[Xj3 Xk ]
k<` j<k j6=k
j6=k,j6=`
✓ ◆
1 n
= 3 E[X14 ] + 24 E[X1 X2 X3 X4 ]
n 4
✓ ◆ ✓ ◆
n n
+ 12 · · E[X12 X2 X3 ] + 6 E[X12 X22 ] + 4n(n 1)E[X13 X2 ].
3 2
By independence
E[X1 X2 X3 X4 ] = E[X1 ]E[X2 ]E[X3 ]E[X4 ] = 0, E[X12 X2 X3 ] = E[X12 ]E[X2 ]E[X3 ] = 0,
E[X13 X2 ] = E[X13 ]E[X2 ] = 0, E[X12 X22 ] = E[X12 ]E[X22 ] = E[X12 ]2 .
Hence
1 3n(n 1) c 3(n 1)a2
E[X̄n4 ] = E[X 4
1 ] + E[X 2 2
1 ] = + .
n3 n4 n3 n3
8.43. (a) Note that E[Zi2 ] = E[Zi2 ] E[Zi ]2 = Var(Zi ) = 1, because E[Zi ] = 0.
Therefore by linearity we have
n
X
E[Y ] = E[Zi2 ] = nE[Z12 ] = n.
i=1
We have
Var(Z12 ) = E[Z14 ] E[Z12 ]2 .
The fourth moment of a standard normal random variable in Exercise 3.69: E[Z14 ] =
3. Thus,
Var(Y ) = nVar(Z12 ) = n(3 1) = 2n.
(b) The moment generating function of Y is
2 2 2
MY (t) = E[etY ] = E[et(Z1 +Z2 +···+Zn ) ].
By the independence of Zi we can write the right hand side as a product of the
individual moment generating functions, and using the fact that the Zi are i.i.d. we
get
MY (t) = MZ12 (t)n .
Solutions to Chapter 8 187
This integral convergences only for t < 1/2 (otherwise we integrate a function
that is always at least 1). Moreover, we can write this using the integral of the
probability density function of an N (0, 2t1 1 ) random variable:
Z 1 z2 Z 1
1 2 1 1 1 (2t 1)z 2 1
p e 2t 1 dz = p q e 2 dz = p .
2⇡ 1 2t 1 1 2⇡ 2t1 1 2t 1
Therefore,
⇢ n/2
(1 2t) for t < 1/2
MY (t) =
1 for t 1/2.
Using the moment generating function we calculate the mean to be
and similarly,
1 2t 2 3t 4 4t
MY (t) =
e + e + e .
7 7 7
(b) Since X and Y are independent, we have MX+Y (t) = MX (t)MY (t). Using the
result of part (a) we get
1 t
MX+Y (t) = MX (t)MY (t) = 4e + 14 e2t + 12 e3t 1 2t
7e + 27 e3t + 47 e4t .
16 8
Then Cov(X, Y ) = E[XY ] E[X]E[Y ] = 9 2· 9 = 0, which means that
Corr(X, Y ) = 0 as well.
8.46. The first five and last five draws together will give all the draws, thus X +Y =
6 and Y = 6 X. Then
The number of red balls in the first five draws has a hypergeometric distribution
with NA = 6, NB = 4, N = 10, n = 5. In Example we computed the variance of
such a random variable to get
N n NA NB 10 5 6 4 2
Var(X) = ·n· · = ·5· · = .
N 1 N N 10 1 10 10 3
2
This leads to Cov(X, Y ) = Var(X) = 3.
8.47. The mean of X is given by the solution of Exercise 8.3. As in the solution
of Exercise 8.3, introduce indicators so that X = XB + XC + XD . Assumption (i)
of the problem implies that Cov(XB , XD ) = Cov(XC , XD ) = 0. Assumption (ii) of
the problem implies that
Then
8.48. The joint probability mass function of the random variables (X, Y ) can be
represented by the following table.
Y
0 1 2
9
1 100 0 0
X 81 9
2 100 100 0
1
3 0 0 100
8.49. We need E[X], E[Y ], E[XY ]. The joint density of X, Y is f (x, y) = 1((x, y) 2
D)) (the area is 1) and the bounding lines of D are y = 1, y = x, y = x. We get
ZZ Z 1Z y Z 1
E[X] = xf (x, y)dxdy = xdxdy = (y 2 /2 ( y)2 /2)dy = 0,
0 y 0
(x,y)2D
ZZ Z 1 Z y Z 1
2
E[Y ] = yf (x, y)dxdy = ydxdy = 2y 2 dy = ,
0 y 0 3
(x,y)2D
ZZ Z 1 Z y Z 1
E[XY ] = xyf (x, y)dydx = xydxdy = (y 3 /2 y( y)2 /2)dy = 0.
0 y 0
(x,y)2D
This gives
Cov(X, Y ) = E[XY ] E[X]E[Y ] = 0.
Solution without computation:
By symmetry we see that (X, Y ) has the same distribution as ( X, Y ). This implies
E[X] = E[ X] = E[X] yielding E[X] = 0. It also implies E[XY ] = E[ XY ] =
E[XY ] which gives E[XY ] = 0. This immediately shows that
Cov(X, Y ) = E[XY ] E[X]E[Y ] = 0.
8.50. Note that if (x, y) is on the union of the line segments AB and AC then
either x or y is equal to zero. This means that XY = 0, and Cov(X, Y ) = E[XY ]
E[X]E[Y ] = E[X]E[Y ].
To compute E[X] and E[Y ] is a little bit tricky, since X and Y are neither
continuous, nor discrete. However, we can write both of them as a function of a
continuous random variable. Imagine that we rotate AC 90 degrees about (0, 0) so
190 Solutions to Chapter 8
that it C is rotated into ( 1, 0). Let Z be a uniformly chosen point on the line
segment connecting ( 1, 0) and (1, 0). We can get (X, Y ) as the following function
of Z: (
(z, 0), if z 0
g(z) =
(0, z), if z < 0.
In other words: we ‘fold out’ the union of AB and AC so that it becomes the line
segment connecting ( 1, 0) and (1, 0), choose a point Z on it uniformly, and then
‘fold’ it back into the original AB [ AC.
The density function of Z is 12 on ( 1, 1), and zero otherwise and X = h(Z) =
max(z, 0). Thus
Z 1 Z 1
1 z 1
E[X] = max(z, 0)dz = dz = .
1 2 0 2 4
Similarly,
Z 1 Z 0
1 z 1
E[Y ] = max( z, 0)dz = dz = .
1 2 1 2 4
1
This gives Cov(X, Y ) = E[X]E[Y ] = 16 .
8.51. We start by computing the second moment:
$$E[(X+2Y+Z)^2] = E[X^2 + 4Y^2 + Z^2 + 4XY + 2XZ + 4YZ] = E[X^2] + 4E[Y^2] + E[Z^2] + 4E[XY] + 2E[XZ] + 4E[YZ] = 2 + 4\cdot12 + 12 + 4\cdot2 + 2\cdot4 + 4\cdot9 = 114.$$
Then the variance is given by
$$\operatorname{Var}(X+2Y+Z) = E[(X+2Y+Z)^2] - (E[X+2Y+Z])^2 = 114 - (1+2\cdot3+3)^2 = 114 - 100 = 14.$$
One could also compute all the variances and pairwise covariances first and use
$$\operatorname{Var}(X+2Y+Z) = \operatorname{Var}(X) + 4\operatorname{Var}(Y) + \operatorname{Var}(Z) + 4\operatorname{Cov}(X,Y) + 2\operatorname{Cov}(X,Z) + 4\operatorname{Cov}(Y,Z).$$
8.52. For the correlation we need $\operatorname{Cov}(X,Y)$, $\operatorname{Var}(X)$ and $\operatorname{Var}(Y)$. Both X and Y have $\operatorname{Bin}(20, \frac12)$ distribution, thus
$$\operatorname{Var}(X) = \operatorname{Var}(Y) = 20\cdot\tfrac12\cdot\tfrac12 = 5.$$
Denote by $Z_i$ the number of heads among the coin flips $10(i-1)+1, 10(i-1)+2, \dots, 10i$. Then $Z_1, Z_2, Z_3$ are independent, they all have $\operatorname{Bin}(10, \frac12)$ distribution, and we have $X = Z_1 + Z_2$ and $Y = Z_2 + Z_3$. Using the properties of the covariance and the independence of $Z_1, Z_2, Z_3$:
$$\operatorname{Cov}(X,Y) = \operatorname{Cov}(Z_1+Z_2, Z_2+Z_3) = \operatorname{Cov}(Z_1,Z_2) + \operatorname{Cov}(Z_2,Z_2) + \operatorname{Cov}(Z_1,Z_3) + \operatorname{Cov}(Z_2,Z_3) = \operatorname{Var}(Z_2) = 10\cdot\tfrac12\cdot\tfrac12 = \tfrac52.$$
Now we can compute the correlation:
$$\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{5/2}{\sqrt{5\cdot5}} = \frac12.$$
Here is another way to compute the covariance. Let $I_j$ be the indicator of the event that the jth flip is heads. These are independent Ber(1/2) distributed random variables. We have $X = \sum_{k=1}^{20} I_k$ and $Y = \sum_{k=11}^{30} I_k$, and using the properties of covariance and the independence we get
$$\operatorname{Cov}(X,Y) = \operatorname{Cov}\Big(\sum_{k=1}^{20}I_k,\ \sum_{j=11}^{30}I_j\Big) = \sum_{k=1}^{20}\sum_{j=11}^{30}\operatorname{Cov}(I_k,I_j) = \sum_{k=11}^{20}\operatorname{Var}(I_k) = 10\cdot\tfrac12\cdot\tfrac12 = \tfrac52.$$
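The value $\operatorname{Corr}(X,Y) = \frac12$ is easy to confirm empirically. A minimal sketch (an illustrative addition, assuming numpy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=(100_000, 30))   # 30 fair coin flips per trial
X = flips[:, :20].sum(axis=1)                    # heads among flips 1-20
Y = flips[:, 10:].sum(axis=1)                    # heads among flips 11-30
print(np.corrcoef(X, Y)[0, 1])                   # close to 0.5
```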
Then
$$\operatorname{Corr}(aX+c,\ bY+d) = \frac{\operatorname{Cov}(aX+c,\ bY+d)}{\sqrt{\operatorname{Var}(aX+c)\operatorname{Var}(bY+d)}} = \frac{ab\operatorname{Cov}(X,Y)}{\sqrt{a^2b^2\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{ab\operatorname{Cov}(X,Y)}{|a|\,|b|\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{ab}{|a|\,|b|}\operatorname{Corr}(X,Y).$$
The coefficient $\frac{ab}{|a|\,|b|}$ is 1 if $ab > 0$ and $-1$ if $ab < 0$.
8.57. Assume that there are random variables satisfying the listed conditions. Then
and
Cov(X, Y ) = E[XY ] E[X]E[Y ] = 1 1·2= 3.
From this the correlation is
Cov(X, Y ) 3 3
Corr(X, Y ) = p =p = p .
Var(X) Var(Y ) 2·1 2
But p32 < 1, and we know that the correlation must be in [ 1, 1]. The found
contradiction shows that we cannot find such random variables.
8.58. By the discussion in Section 8.5 if Z and W are independent standard normals
then with
p
X = X Z + µX , Y = Y ⇢Z + Y 1 ⇢2 W + µY
the random variables (X, Y ) have bivariate normal distribution with marginals
2
X ⇠ N (µX , X ) and Y ⇠ N (µY , Y2 ) and correlation Corr(X, Y ) = ⇢. Then we
have
p
U = 2X + Y = (2 X + Y ⇢)Z + Y 1 ⇢2 W + 2µX + µY
p
V =X Y =( X Y ⇢)Z Y 1 ⇢2 W + µ X µ Y .
We can turn this system of equations into a single vector valued equation:
" p #
U 2 X + Y⇢ Y 1 ⇢2 Z 2µX + µY
= p +
V 2 W µX µY
X Y⇢ Y 1 ⇢
thus the inverse of (g(z, w), h(z, w)) is the function (q(x, y), r(x, y)) with
x µX (y µY ) X (x µX )⇢ Y
q(x, y) = , r(x, y) = p .
X 1 ⇢2 X Y
Rearranging the terms in the exponent shows that the found joint density is the
same as the one given in (8.32). This shows that the distribution of (X, Y ) is
bivariate normal with parameters µX , X , µY , Y , ⇢.
8.60. The number of ways in which toys can be chosen so that new toys appear at
times 1, 1 + a1 , 1 + a1 + a2 , . . . , 1 + a1 + · · · + an 1 is
n
Y1
n·1a1 1
·(n 1)·2a2 1
·(n 2)·3a3 1
·(n 3) · · · 2·(n 1)an 1 1
·1 = n· (n k)·k ak 1
.
k=1
where in the last step we used the fact that W1 , W2 , . . . , Wk 1 are independent with
Wj ⇠ Geom( nn j ).
1
8.61. (a) Since f (x) = x is a decreasing function, by the bounds shown in Figure
D.1 we get
n
X Z n n
X1 1
1 1
dx .
k 1 x k
k=2 k=1
Rn1
Since 1 x
dx = ln n this gives
n
X n
1 X1
ln n = 1
k k
k=2 k=1
and
n
X1 n
1 X1
ln n
k k
k=1 k=1
Pn 1
which together give 0 ln n 1.
k=1 k Pn
(c) In Example 8.17 we have shown that E[Tn ] = n k=1 n1 . Using the bounds in
part (a) we have
n ln n nE[Tn ] n(ln n + 1)
E(Tn )
from which limn!1 n ln n = 1 follows.
We have also shown
n
X1 n
X1
1 1
Var(Tn ) = n2 n ,
j=1
j2 j=1
j
and hence
n 1 n 1
Var(Tn ) X 1 1X1
= .
n2 j=1
j2 n j=1 j
Solutions to Chapter 8 195
P1 1 ⇡2
Pn
1 1 ⇡2
Pn 1
Since j=1 j 2 = 6 we have limn!1
j=1 j 2 = 6 . We also have 0 j=1 1j
Pn 1
ln n by part (a), and we know that limn!1 lnnn = 0, thus limn!1 n1 j=1 1j = 0.
2
But this means that limn!1 Var(T
n2
n)
= ⇡6 .
Solutions to Chapter 9
9.1. (a) The expected value of Y is $E[Y] = \frac1p = 6$. Since Y is nonnegative, we can use Markov's inequality to get the bound $P(Y \geq 16) \leq \frac{E[Y]}{16} = \frac6{16} = \frac38$.
(b) The variance of Y is $\operatorname{Var}(Y) = \frac{1-p}{p^2} = \frac{5/6}{1/36} = 30$. Using Chebyshev's inequality we get
$$P(Y \geq 16) = P(Y - E[Y] \geq 10) \leq P(|Y - E[Y]| \geq 10) \leq \frac{\operatorname{Var}(Y)}{10^2} = \frac{30}{100} = \frac3{10}.$$
(c) The exact value of $P(Y \geq 16)$ can be computed for example by treating Y as the number of trials needed for the first success in a sequence of independent trials with success probability p. Then
$$P(Y \geq 16) = P(\text{the first 15 trials fail}) = \left(\tfrac56\right)^{15} \approx 0.0649.$$
We can see that the estimates in (a) and (b) are valid, although they are not very close to the truth.
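The three numbers can be reproduced with a few lines of arithmetic (an illustrative addition, not part of the text):

```python
p = 1 / 6
EY, VarY = 1 / p, (1 - p) / p**2

markov = EY / 16                  # Markov bound on P(Y >= 16)
chebyshev = VarY / 10**2          # Chebyshev bound on P(Y >= 16)
exact = (1 - p)**15               # P(first 15 trials all fail)
print(markov, chebyshev, exact)   # 0.375, 0.3, about 0.0649
```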
9.2. (a) We have $E[X] = \frac1\lambda = 2$ and $X \geq 0$. By Markov's inequality
$$P(X > 6) \leq \frac{E[X]}{6} = \frac13.$$
(b) We have $E[X] = \frac1\lambda = 2$ and $\operatorname{Var}(X) = \frac1{\lambda^2} = 4$. By Chebyshev's inequality
$$P(X > 6) = P(X - E[X] > 4) \leq P(|X - E[X]| > 4) \leq \frac{\operatorname{Var}(X)}{4^2} = \frac4{16} = \frac14.$$
9.3. Let Xi be the price change between day i 1 and day i (with day 0 being
today). Then Cn C0 = X1 + X2 + · · · + Xn . The expectation of Xi (for each i)
is given by E[Xi ] = E[X1 ] = 0.45 · 1 + 0.5 · ( 2) + 0.05 · (10) = 0.05. We can also
197
198 Solutions to Chapter 9
where we used linear interpolation to approximate (1.875) using the table in the
Appendix.
9.7. Let $X_i$ be the size of the claim made by the ith policyholder. Let m be the premium they charge. We desire a premium m for which
$$P\Big(\sum_{i=1}^{2500} X_i \leq 2500\,m\Big) \geq 0.999.$$
We first use Chebyshev's inequality to estimate the probability of the complement. Recall that $\mu = E[X_i] = 1000$ and $\sigma = \sqrt{\operatorname{Var}(X_i)} = 900$. Using the notation $S = \sum_{i=1}^{2500} X_i$ we have
$$E[S] = 2500\mu, \qquad \operatorname{Var}(S) = 2500\sigma^2.$$
By Chebyshev's inequality (assuming $m > \mu$)
$$P(S \geq 2500\,m) = P\big(S - 2500\mu \geq 2500(m-\mu)\big) \leq \frac{\operatorname{Var}(S)}{2500^2(m-\mu)^2} = \frac{2500\cdot900^2}{2500^2(m-\mu)^2} = \frac{324}{(m-1000)^2}.$$
We need this probability to be at most $1 - 0.999 = 0.001$, which leads to $\frac{324}{(m-1000)^2} \leq 0.001$ and
$$m \geq 1000 + \frac{18}{\sqrt{0.001}} \approx 1569.21.$$
Note that we assumed $m > \mu$, which was natural: for $m \leq \mu$ the probability in question cannot be shown to be at least 0.999 by Chebyshev's inequality.
Now let us see how we can estimate $P\big(\sum_{i=1}^{2500} X_i \leq 2500\,m\big)$ using the central limit theorem. We have
$$P(S \leq 2500\,m) = P\left(\frac{S - 2500\cdot1000}{\sqrt{2500}\cdot900} \leq \frac{2500m - 2500\cdot1000}{\sqrt{2500}\cdot900}\right) \approx \Phi\left(\frac{2500(m-1000)}{\sqrt{2500}\cdot900}\right) = \Phi\left(\frac{m-1000}{18}\right).$$
We would like this probability to be at least 0.999. Using the table in Appendix E we get that $\Phi\left(\frac{m-1000}{18}\right) \geq 0.999$ if $\frac{m-1000}{18} \geq 3.1$, which leads to $m \geq 1055.8$.
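Both premiums can be reproduced in a couple of lines (an illustrative addition; it assumes scipy and uses the exact normal quantile, so the CLT answer comes out near 1055.6 rather than the 1055.8 obtained by rounding the quantile to 3.1):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n, target = 1000, 900, 2500, 0.999

# Chebyshev: 324 / (m - 1000)^2 <= 1 - target
m_cheb = mu + sqrt((n * sigma**2) / (n**2 * (1 - target)))

# CLT: (m - 1000)/18 must be at least the 0.999 quantile of the standard normal
m_clt = mu + norm.ppf(target) * sigma / sqrt(n)

print(round(m_cheb, 2), round(m_clt, 2))   # about 1569.21 and 1055.6
```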
9.8. (a) This is just the area of the quarter of the unit disk, multiplied by 4, which equals $\pi$.
(b) We have
$$\int_0^1\int_0^1 4\cdot I(x^2+y^2\leq1)\,dx\,dy = E[g(U_1, U_2)],$$
where $U_1, U_2$ are independent Unif[0,1] random variables and $g(x,y) = 4\cdot I(x^2+y^2\leq1)$.
(c) We need to generate $n = 10^6$ independent samples of the random variable $g(U_1, U_2)$. If $\bar\mu$ is the sample mean and $s_n^2$ is the sample variance, then the appropriate confidence interval is $\left(\bar\mu - \frac{1.96\,s_n}{\sqrt n},\ \bar\mu + \frac{1.96\,s_n}{\sqrt n}\right)$.
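The recipe in part (c) translates directly into a few lines of code. This is only an illustrative sketch (numpy assumed, seed arbitrary), not part of the original solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
u1, u2 = rng.random(n), rng.random(n)
g = 4.0 * (u1**2 + u2**2 <= 1)           # samples of g(U1, U2)

mean, s = g.mean(), g.std(ddof=1)
half = 1.96 * s / np.sqrt(n)
print(mean - half, mean + half)          # interval containing pi with about 95% confidence
```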
Since we have $\operatorname{Var}(X_i) = 4500$, this gives $\operatorname{Corr}(X_i, X_j) = \frac{\operatorname{Cov}(X_i, X_j)}{4500}$. Hence
$$\operatorname{Cov}(X_i, X_j) = \begin{cases}0.5\cdot4500, & \text{if } j = i+1,\\ 0, & \text{if } j \geq i+2.\end{cases}$$
There are $n-1$ pairs of the form $(i, i+1)$ in the sum above, which gives
$$\operatorname{Var}(X_1+\cdots+X_n) = 4500n + 4500(n-1) = 9000n - 4500.$$
Using the outline given in Exercise 9.9(c) we get
$$P\left(\Big|\frac{S_n}{n} - 5000\Big| \geq 50\right) \leq \frac{\operatorname{Var}(S_n/n)}{50^2} = \frac{9000n-4500}{2500\,n^2}.$$
We need $\frac{9000n-4500}{2500\,n^2} < 0.05$, which leads to $n \geq 72$.
9.11. (a) We have
$$M_X'(t) = \tfrac32\cdot2\,(1-2t)^{-5/2} = 3(1-2t)^{-5/2}.$$
Thus $M_X'(0) = E[X] = 3$. We may now use Markov's inequality to conclude that
$$P(X > 8) \leq \frac{E[X]}8 = \frac38 = 0.375.$$
(b) In order to use Chebyshev's inequality, we must find the variance of X. Differentiating again yields
$$M_X''(t) = 15(1-2t)^{-7/2},$$
and so $M_X''(0) = E[X^2] = 15$, which implies $\operatorname{Var}(X) = 15 - 9 = 6$.
will converge to 0 by Theorem 9.9. But this meansPn that thePprobability of the
n
complement will converge to 1, in other words P ( k=1 Xi Y
k=1 i ) converges
to 1 as n gets larger and larger.
9.16. Let Ui be the waiting time for number 5 on morning i, and Vi the waiting time
1 1
for number 8 on morning i. From the problem, Ui ⇠ Exp( 10 ) and Vi ⇠ Exp( 20 ).
The actual waiting time on morning i is Xi = min(Ui , Vi ). Let Yi be the Bernoulli
variable that records 1 if I take the number 5 on morning i. Then from properties
of exponential variables (from Examples 6.33 and 6.34)
1
3 20
Xi ⇠ Exp( 20 ), E(Xi ) = 3 , E(Yi ) = P (Yi = 1) = P (Ui < Vi ) = 1
10
1 = 23 .
10 + 20
Pn Pn
Since Sn = i=1 Xi and Tn = i=1 Yi , we can answer the questions by the LLN.
(a)
lim P (Sn 7n) = lim P (Sn nE(X1 ) 13 n)
n!1 n!1
Sn
lim P ( n E(X1 ) 13 ) = 1.
n!1
(b)
1
lim P (Tn 0.6n) = lim P (Tn nE(Y1 ) 15 n)
n!1 n!1
Tn 1
lim P ( n E(Y1 ) 15 ) = 1.
n!1
1
9.18. (a) From Example 8.13 we have $E[X] = 100\cdot\frac1{1/3} = 300$. Hence by Markov's inequality we get
$$P(X > 500) \leq \frac{E[X]}{500} = \frac{300}{500} = \frac35.$$
(b) Again, from Example 8.13 we have $\operatorname{Var}(X) = 100\cdot\frac{1-\frac13}{(1/3)^2} = 600$. Then from Chebyshev's inequality:
$$P(X > 500) = P(X - E[X] > 500 - 300) \leq \frac{\operatorname{Var}(X)}{200^2} = \frac{600}{200^2} = 0.015.$$
(c) By the CLT the distribution of the standardized version of X is close to that of a standard normal. The standardized version is $\frac{X-300}{\sqrt{600}}$, hence
$$P(X > 500) = P\left(\frac{X-300}{\sqrt{600}} > \frac{500-300}{\sqrt{600}}\right) \approx 1 - \Phi\left(\tfrac{20}{\sqrt6}\right) \approx 1 - \Phi(8.16) < 0.0002.$$
(In fact $1 - \Phi(8.16)$ is far smaller than 0.0002; it is approximately $2.2\cdot10^{-16}$.)
(d) We need more than 500 trials for the 100th success exactly if there are at most 99 successes within the first 500 trials. Thus, denoting by S the number of successes within the first 500 trials, we have $P(X > 500) = P(S \leq 99)$. Since $S \sim \operatorname{Bin}(500, \frac13)$, we may use the normal approximation to get
$$P(S \leq 99) = P\left(\frac{S - \frac{500}3}{\sqrt{500\cdot\frac29}} \leq \frac{99 - \frac{500}3}{\sqrt{500\cdot\frac29}}\right) \approx \Phi\left(\frac{99 - \frac{500}3}{\sqrt{500\cdot\frac29}}\right) \approx \Phi(-6.42) < 0.0002.$$
(Again, the real value of $\Phi(-6.42)$ is a lot smaller than 0.0002; it is approximately $6.8\cdot10^{-11}$.)
9.19. Let $X_i$ be the amount of time it takes the child to spin around on his ith revolution. Then the total time it will take to spin around 100 times is $S_{100} = X_1 + \cdots + X_{100}$. We assume that the $X_i$ are independent with mean 1/2 and standard deviation 1/3. Then $E[S_{100}] = 50$ and $\operatorname{Var}(S_{100}) = \frac{100}{3^2}$. Using Chebyshev's inequality:
$$P(X_1+\cdots+X_{100} > 55) = P(X_1+\cdots+X_{100} - 50 > 5) \leq \frac{\operatorname{Var}(S_{100})}{5^2} = \frac{100}{9\cdot25} = \frac49.$$
If we use the CLT then
$$P(X_1+\cdots+X_{100} > 55) = P\left(\frac{X_1+\cdots+X_{100} - 50}{\sqrt{100}\cdot(1/3)} > \frac{55-50}{\sqrt{100}\cdot(1/3)}\right) \approx P\big(Z > \tfrac5{10\cdot(1/3)}\big) = P(Z > 1.5) = 1 - P(Z \leq 1.5) = 1 - 0.9332 = 0.0668.$$
10.1. (a) By summing the probabilities in the appropriate columns we get the
marginal probability mass function of Y :
We can now compute the conditional probability mass function pX|Y (x|y) for y =
p (x,y)
0, 1, 2 using the formula pX|Y (x|y) = X,Y
pY (y) . We get
pX|Y (2|0) = 1,
pX|Y (1|1) = 14 , pX|Y (2|1) = 12 , pX|Y (3|1) = 14 ,
pX|Y (2|2) = 12 , pX|Y (3|2) = 1
2
(b) The conditional expectations can be computed using the conditional probability
mass functions:
10.3. Given $Y = y$, the random variable X is binomial with parameters y and 1/2. Hence, for x between 0 and 6, we have
$$p_X(x) = \sum_{y=1}^{6} p_{X|Y}(x|y)\,p_Y(y) = \sum_{y=1}^{6}\binom yx\frac1{2^y}\cdot\frac16,$$
where $\binom yx = 0$ if $y < x$ (as usual).
For the expectation, we have
$$E[X] = \sum_{y=1}^6 E[X|Y=y]\,p_Y(y) = \sum_{y=1}^6\frac y2\cdot\frac16 = \frac74.$$
10.4. (a) Directly from the description of the problem we get that
✓ ◆
n 1 n
pX|N (k|n) = ( ) for 0 k n 100.
k 2
(b) From knowing the mean of the binomial, E[X|N = n] = n/2 for 0 n 100.
(c)
100
X 100
X
1
E[X] = E[X|N = n] pN (n) = 2 n pN (n) = 12 E[N ] = 1
2 · 100 · 1
4 = 25
2 .
n=0 n=0
if $f_Y(y) > 0$. Since the joint density is only nonzero for $0 < y < 1$, the Y variable will have a density which is only nonzero on $0 < y < 1$. In that case we have
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(w,y)\,dw = \int_0^1\frac{12}5\,w(2-w-y)\,dw = \frac{12}5\left(w^2 - \frac{w^3}3 - \frac{y\,w^2}2\right)\bigg|_0^1 = \frac{12}5\left(1 - \frac13 - \frac y2\right) = \frac85 - \frac65\,y.$$
Thus, for $0 < y < 1$ we have
$$f_{X|Y}(x|y) = \frac{\frac{12}5\,x(2-x-y)}{\frac85 - \frac65\,y} = \frac{6x(2-x-y)}{4-3y}.$$
(b) We have
$$P\big(X > \tfrac12 \mid Y = \tfrac34\big) = \int_{1/2}^1 f_{X|Y}\big(x \mid \tfrac34\big)\,dx = \int_{1/2}^1\frac{6x\big(\tfrac54-x\big)}{\tfrac74}\,dx = \frac{24}7\int_{1/2}^1 x\big(\tfrac54-x\big)\,dx = \frac{24}7\left(\frac58x^2 - \frac13x^3\right)\bigg|_{1/2}^1 = \frac{24}7\left(\frac7{24} - \frac{11}{96}\right) = \frac{24}7\cdot\frac{17}{96} = \frac{17}{28},$$
and
$$E\big[X \mid Y = \tfrac34\big] = \int_0^1 x\,\frac{6x\big(\tfrac54-x\big)}{\tfrac74}\,dx = \frac{24}7\int_0^1 x^2\big(\tfrac54-x\big)\,dx = \frac{24}7\left(\frac5{12}x^3 - \frac14x^4\right)\bigg|_0^1 = \frac{24}7\cdot\frac16 = \frac47.$$
10.6. (a) Begin by finding the marginal density function of Y . For 0 < y < 2,
Z 1 Z y
fY (y) = 1
f (x, y) dx = 4 (x + y) dx = 38 y 2 .
1 0
and
Z 3/2 Z 1
3 2
P (X < 2 | Y = 1) = fX|Y (x|1)dx = 3 (x + 1) dx = 1.
1 0
Note that integrating all the way to 3/2 would be wrong in the last integral
above because conditioning on Y = 1 restricts X to 0 < X < 1.
208 Solutions to Chapter 10
or equivalently from
Z 1 Z 2
1 1 3 2
fX (x) = fX|Y (x|y)fY (y) dy = 4 (x + y) dy = 2 + 12 x 8x .
1 x
10.7. (a) Directly by multiplying, fX,Y (x, y) = fX|Y (x|y)fY (y) = 6x for 0 < x <
y < 1.
(b)
Z 1
2x
fX (x) = · 3y 2 dy = 6x(1 x), 0 < x < 1.
x y2
fX,Y (x, y) 1
fY |X (y|x) = = , 0 < x < y < 1.
fX (x) 1 x
Thus given X = x, Y is uniform on the interval (x, 1). Valid for 0 < x < 1.
10.8. (a) From the description of the problem,
✓ ◆
` 4 m 5 ` m
pY |X (m|`) = ( ) (9) for 0 m `.
m 9
From knowing the mean of a binomial, E[Y |X = `] = 49 `. Thus E[Y |X] = 49 X.
(b) X ⇠ Geom( 16 ), and so E(X) = 6. For the mean of Y ,
E[Y ] = E[E(Y |X)] = 49 E[X] = 4
9 · 6 = 83 .
10.9. (a) We have
$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_0^{\infty}\frac1y\, e^{-x/y}\,e^{-y}\,dx = e^{-y}$$
if $0 < y$, and zero otherwise. We can evaluate the last integral without computation if we recognize that $\frac1y e^{-x/y}$ is the probability density function of an Exp(1/y) distribution and hence its integral on $[0,\infty)$ is equal to 1.
From the found probability density $f_Y(y)$ we see that $Y \sim \operatorname{Exp}(1)$ and hence $E[Y] = 1$. We also get
$$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} = \frac1y\, e^{-x/y}\quad\text{if } 0<x,\ 0<y,$$
and zero otherwise.
(b) The conditional probability density function $f_{X|Y}(x|y)$ found in part (a) shows that given $Y = y > 0$ the conditional distribution of X is Exp(1/y). Hence $E[X|Y=y] = \frac1{1/y} = y$ and $E[X|Y] = Y$.
(c) We can compute E[X] by conditioning on Y and then averaging the conditional expectation:
$$E[X] = E\big[E[X|Y]\big] = E[Y] = 1,$$
where in the last step we used part (a).
10.10. (a)
$$p_{X|N}(k \mid n) = \binom nk p^k(1-p)^{n-k}\quad\text{for } 0\leq k\leq n.$$
From knowing the expectation of a binomial, $E(X \mid N=n) = np$ and then $E(X \mid N) = pN$.
(b) $E[X] = E[E(X|N)] = pE[N] = p\lambda$.
(c) We use formula (10.36) to compute the expectation of the product:
$$E[NX] = E[E(NX|N)] = E[N\,E(X|N)] = E[N\cdot pN] = pE[N^2] = p(\lambda^2+\lambda).$$
In the last step we used $E[N] = \operatorname{Var}(N) = \lambda$ and $E[N^2] = (E[N])^2 + \operatorname{Var}(N)$.
The calculation above can be done without formula (10.36) also, by manipulating the sums involved:
$$E[XN] = \sum_{k,n} kn\,p_{X,N}(k,n) = \sum_{k,n} kn\,p_{X|N}(k \mid n)\,p_N(n) = \sum_n n\,p_N(n)\sum_k k\,p_{X|N}(k \mid n) = \sum_n n\,p_N(n)\,E(X \mid N=n) = p\sum_n n^2 p_N(n) = pE[N^2] = p(\lambda^2+\lambda).$$
Now for the covariance:
$$\operatorname{Cov}(N,X) = E[NX] - EN\cdot EX = p(\lambda^2+\lambda) - \lambda\cdot p\lambda = p\lambda.$$
10.11. The expected value of a Poisson(y) random variable is y, and the second moment is $y + y^2$. Thus
$$E[X|Y=y] = y, \qquad E[X^2|Y=y] = y^2+y,$$
and $E[X|Y] = Y$, $E[X^2|Y] = Y^2+Y$. Now taking expectations and using the moments of the exponential distribution gives
$$E[X] = E\big[E[X|Y]\big] = E[Y] = \frac1\lambda$$
and
$$E[X^2] = E\big[E[X^2|Y]\big] = E[Y^2+Y] = \frac2{\lambda^2} + \frac1\lambda.$$
This gives
$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac2{\lambda^2} + \frac1\lambda - \frac1{\lambda^2} = \frac1{\lambda^2} + \frac1\lambda.$$
On the other hand, according to the thinning property of Example 10.14, the
process of arrival times of buying customers is a Poisson process of rate p . Hence
again by Fact 7.26 the time of arrival of the first buying customer has Exp(p )
distribution. Thus we conclude that SN ⇠ Exp(p ). From this, E[SN ] = 1/(p ).
10.13. The price should be the expected value of X. The expectation of a Poisson( )
distributed random variable is , hence we have E[X|U = u] = u and E[X|U ] = U .
Taking expectations again:
E[X] = E[E[X|U ]] = E[U ] = 5
since U ⇠ Unif[0, 10].
10.14. Given the vector (t1 , . . . , tn ) of zeroes and ones, let m be the number of ones
among t1 , . . . , tn . Permutation does not alter the number of ones in the vector and
so m is also the number of ones among tk1 , . . . , tkn . Consequently
P (X1 = t1 , X2 = t2 , . . . , Xn = tn )
Z 1
= P (X1 = t1 , X2 = t2 , . . . , Xn = tn | ⇠ = p) dp
0
Z 1
= pm (1 p)n m
dp
0
and similarly
P (X1 = tk1 , X2 = tk2 , . . . , Xn = tkn )
Z 1
= P (X1 = tk1 , X2 = tk2 , . . . , Xn = tkn | ⇠ = p) dp
0
Z 1
= pm (1 p)n m
dp.
0
The two probabilities agree.
10.15. (a) This is very similar to Example 10.13 and can be solved similarly. Let N be the number of claims in one day. We know that $N \sim \operatorname{Poisson}(12)$. Let $N_A$ be the number of claims from A policies in one day, and $N_B$ the number of claims from B policies in one day. We assume that each claim comes independently from policy A or policy B. Hence, given $N = n$, $N_A$ is distributed as a binomial random variable with parameters n and 1/4. Therefore, for any nonnegative k,
$$P(N_A = k) = \sum_{n=0}^{\infty} P(N_A=k \mid N=n)P(N=n) = \sum_{n=k}^{\infty}\binom nk\left(\frac14\right)^k\left(\frac34\right)^{n-k}e^{-12}\frac{12^n}{n!}$$
$$= \frac1{k!}\left(\frac{12}4\right)^k e^{-12}\sum_{n=k}^{\infty}\frac1{(n-k)!}\left(\frac{12\cdot3}4\right)^{n-k} = \frac{3^k}{k!}\,e^{-12}\sum_{j=0}^{\infty}\frac{9^j}{j!} = \frac{3^k}{k!}\,e^{-12}e^{9} = e^{-3}\,\frac{3^k}{k!}.$$
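The thinning conclusion $N_A \sim \operatorname{Poisson}(3)$ is also easy to see in a simulation (an illustrative addition, assuming numpy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 200_000
N = rng.poisson(12, n_days)        # total claims per day
NA = rng.binomial(N, 0.25)         # each claim is of type A with probability 1/4

print(NA.mean(), NA.var())         # both close to 3, as for Poisson(3)
```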
10.16. There are several ways to approach this problem. We begin with an approach of direct calculation. The total number of claims is $N \sim \operatorname{Poisson}(12)$. Consider any particular claim. Let A be the event that this claim is from policy A, B the event that this claim is from policy B, and C the event that this claim is greater than \$100,000. By the law of total probability
$$P(C) = P(C|A)P(A) + P(C|B)P(B) = \tfrac45\cdot\tfrac14 + \tfrac15\cdot\tfrac34 = \tfrac7{20}.$$
Let X denote the number of claims that are greater than \$100,000. We must assume that each claim is greater than \$100,000 independently of the other claims. It follows then that given $N = n$, X is conditionally $\operatorname{Bin}(n, \frac7{20})$. We can deduce the p.m.f. of X. For $k \geq 0$,
$$P(X=k) = \sum_{n=k}^{\infty} P(X=k \mid N=n)P(N=n) = \sum_{n=k}^{\infty}\binom nk\left(\tfrac7{20}\right)^k\left(\tfrac{13}{20}\right)^{n-k}e^{-12}\frac{12^n}{n!}$$
$$= \left(\tfrac{21}5\right)^k\frac{e^{-12}}{k!}\sum_{j=0}^{\infty}\frac{\left(\frac{39}5\right)^j}{j!} = \left(\tfrac{21}5\right)^k\frac{e^{-12}}{k!}\,e^{39/5} = e^{-21/5}\,\frac{\left(\frac{21}5\right)^k}{k!}.$$
We found that $X \sim \operatorname{Poisson}(\frac{21}5)$. From this we answer the questions.
(a) $E[X] = \frac{21}5$.
(b) $P(X \leq 2) = e^{-21/5}\left(1 + \frac{21}5 + \frac12\left(\frac{21}5\right)^2\right) = e^{-21/5}\cdot\frac{701}{50} \approx 0.21$.
We can arrive at the distribution of X also without calculation, and then solve the problem as above. From the solution to Exercise 10.15, $N_A \sim \operatorname{Poisson}(3)$ and $N_B \sim \operatorname{Poisson}(9)$. These two variables are independent by the same kind of calculation that was done in Example 10.13. Let $X_A$ be the number of claims from policy A that are greater than \$100,000 and let $X_B$ be the number of claims from policy B that are greater than \$100,000. The situation is exactly as in Problem 10.15 and in Example 10.13, and we conclude that $X_A$ and $X_B$ are independent with distributions $X_A \sim \operatorname{Poisson}(\frac{12}5)$ and $X_B \sim \operatorname{Poisson}(\frac95)$. Consequently $X = X_A + X_B \sim \operatorname{Poisson}(\frac{21}5)$.
10.17. (a) Let B be the event that the coin lands on heads. Then the conditional
distribution of X given B is binomial with parameters 3 and 16 , while the
conditional distribution of X given B c is Bin(5, 16 ). From this we can write down
the conditional probability mass functions, and using (10.5) the unconditional
one:
The set of possible values of X are {0, 1, . . . , 5}, and the formula makes sense
for all k if we define ab as 0 if b > a.
(b) We could use the probability mass function to compute the expectation of
X, but it is much easier to use the conditional expectations. Because the
conditional distributions are binomial, the conditional expectation of X given
B is E[X|B] = 3 · 16 = 12 and the conditional expectation of X given B c is
E[X|B c ] = 5 · 16 = 56 . Thus,
10.18. Let N be the number of trials needed for seeing the first outcome s, and Y
the number of outcomes t in the first N 1 trials.
N 1
Hence E(Y | N ) = r 1 and then
E[Y ] = E[E[Y | N ]] = E[ Nr 1
1] = 1
r 1 (E[N ] 1) = 1
r 1 (r 1) = 1.
214 Solutions to Chapter 10
(b) The conditional probability mass function found in (a) is binomial with param-
eters k + ` = n m and p2p+p 2
3
. Thus conditioned upon X1 = m, the distribution
p2
of X2 is Bin(n m, p2 +p3 ).
10.20. (a) Let n 1 and 0 k n so that P (Sn = k) > 0 and conditioning on
the event {Sn = k} is sensible. By the definition of conditional probability,
P (X1 = a1 , X2 = a2 , . . . , Xn = an | Sn = k)
P (X1 = a1 , X2 = a2 , . . . , Xn = an , Sn = k)
= .
P (Sn = k)
Unless the vector (a1 , . . . , an ) has exactly k ones, the numerator above equals
zero. Hence assume that (a1 , . . . , an ) has exactly k ones. Then the condition
Solutions to Chapter 10 215
10.22. (a) Start by observing that either X = 1 and Y 2 (when the first trial is
a success) or X 2 and Y = 1 (when the first trial is a failure). Thus when
Y = 1 we have, for m 2,
pX,Y (m, 1) P (first m 1 trials fail, mth trial succeeds)
pX|Y (m|1) = =
pY (1) P (first trial fails)
(1 p)m 1 p
= = (1 p)m 2
p.
1 p
In the other case when Y = ` 2 we must have X = 1, and the calculation
also verifies this:
pX,Y (1, `) P (first ` 1 trials succeed, `th trial fails)
pX|Y (1|`) = =
pY (`) P (first trial succeeds)
` 1
p (1 p)
= ` 1 = 1.
p (1 p)
We can summarize the answer in the following pair of formulas that capture
all the possible values of both X and Y :
(
0, m=1
pX|Y (m|1) = m 2
(1 p) p, m 2,
and for ` 2,
(
1, m=1
pX|Y (m|`) =
0, m 2.
Solutions to Chapter 10 217
(b) We reason as in Example 10.6. Let B be the event that the first trial is a
success. Then
10.23. (a) The distribution of Y is negative binomial with parameters 3 and 1/6
and the probability mass function is
✓ ◆ ✓ ◆y 2
y 1 1 5
P (Y = y) = , y = 3, 4, . . .
2 63 6
This leads to
5 y 3
P (X = x, Y = y) (y x 2) 6 · 613
P (X = x|Y = y) = = y 2
P (Y = y) y 1 1 5
2 63 6
y x 2 2(y x 1)
= (y 1)(y 2)
= ,
(y 1)(y 2)
2
y
X2 2(y x 1)
E[X|Y = y] = x .
x=1
(y 1)(y 2)
218 Solutions to Chapter 10
Py 2
To evaluate the sum x=1 2x(y x 1) we separate it in parts and then use the
identities (D.6) and (D.7):
y
X2 y
X2 y
X2
2x(y x 1) = 2(y 1) x 2 x2
x=1 x=1 x=1
(y 2)(y 1) (y 2)(y 1)(2(y 2) + 1)
= 2(y 1) 2
2 6
(y 2)(y 1)y
= .
3
This gives
y
X2 2(y x 1) (y 2)(y 1)y y
E[X|Y = y] = x = = ,
x=1
(y 1)(y 2) 3(y 2)(y 1) 3
Y
and E[X|Y ] = 3 .
$$p_X(x) = \sum_y p_{X|Y}(x \mid y)\,p_Y(y) = \sum_y p_{X,Y}(x,y) = \sum_{y=x}^{10}\binom yx\left(\tfrac16\right)^x\left(\tfrac56\right)^{y-x}\binom{10}y\frac1{2^{10}}$$
$$= \sum_{y=x}^{10}\frac{10!}{x!\,(y-x)!\,(10-y)!}\left(\tfrac16\right)^x\left(\tfrac56\right)^{y-x}\frac1{2^{10}} = \frac{10!}{x!\,(10-x)!}\left(\tfrac16\right)^x\frac1{2^{10}}\sum_{k=0}^{10-x}\frac{(10-x)!}{k!\,(10-x-k)!}\left(\tfrac56\right)^k$$
$$= \binom{10}x\left(\tfrac16\right)^x\frac1{2^{10}}\left(\tfrac{11}6\right)^{10-x} = \binom{10}x\left(\tfrac1{12}\right)^x\left(\tfrac{11}{12}\right)^{10-x}.$$
The conditional expectation $E[X|Y=y]$ for a fixed y is just the expected value of $\operatorname{Bin}(y,\frac16)$, which is $\frac y6$. This means that $E(X|Y) = \frac Y6$ and
$$E[X] = E\big[E(X|Y)\big] = E\big[\tfrac Y6\big] = \tfrac56,$$
since $Y \sim \operatorname{Bin}(10, \frac12)$.
(b) A closer inspection of the joint probability mass function shows that $(X,\, Y-X,\, 10-Y)$ has a multinomial distribution with parameters $(10, \frac1{12}, \frac5{12}, \frac12)$:
$$P(X=x,\ Y-X=y-x,\ 10-Y=10-y) = P(X=x, Y=y) = \binom yx\left(\tfrac16\right)^x\left(\tfrac56\right)^{y-x}\binom{10}y\frac1{2^{10}} = \frac{10!}{x!\,(y-x)!\,(10-y)!}\left(\tfrac1{12}\right)^x\left(\tfrac5{12}\right)^{y-x}\left(\tfrac12\right)^{10-y}.$$
This implies again that X is just a $\operatorname{Bin}(10, \frac1{12})$ random variable.
To see the joint distribution without computation, imagine that after we flip the 10 coins, we roll 10 dice, but only count the sixes if the corresponding coin showed heads. This is the same experiment because the number of 'counted' sixes has the same distribution as X. This is the number of successes for 10 identical experiments where success for the kth experiment means that the kth coin shows heads and the kth die shows six. The probability of success is $\frac12\cdot\frac16 = \frac1{12}$. Moreover, $(X,\, Y-X,\, 10-Y)$ gives the number of outcomes where we have heads and a six, heads and not a six, and tails. This explains why the joint distribution is multinomial with probabilities $(\frac1{12}, \frac5{12}, \frac12)$.
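The coin-and-die reinterpretation is also easy to check by simulation (an illustrative addition, assuming numpy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 200_000
coins = rng.integers(0, 2, size=(trials, 10))      # 1 = heads
dice = rng.integers(1, 7, size=(trials, 10))
X = ((coins == 1) & (dice == 6)).sum(axis=1)        # sixes counted only on heads

print(X.mean(), 10 / 12)                            # both about 0.833, as for Bin(10, 1/12)
```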
We can recognize this as the probability mass function of the geometric distri-
1
bution with parameter 12 .
1
10.26. Let B be the event that the first trial is a success. Recall that E[N ] = p .
2 p
E[N 2 ] = .
p2
From this,
2 p 1 1 p
Var(N ) = E[N 2 ] (E[N ])2 = = .
p2 p 2 p2
10.27. Utilize again the temporary notation $E[X|Y] = v(Y)$ from Definition 10.23 and identity (10.11):
$$E\big[E[X|Y]\big] = E[v(Y)] = \sum_y v(y)\,p_Y(y) = \sum_y E[X|Y=y]\,p_Y(y) = E(X).$$
10.28. We reason as in Example 10.13. First deduction of the joint p.m.f. Let
k1 , k2 , . . . , kr 2 {0, 1, 2, . . . } and set k = k1 + k2 + · · · + kr . In the first equality
below we can add the condition X = k into the probability because the event
{X1 = k1 , X2 = k2 , . . . , Xr = kr } is a subset of the event {X = k}.
P (X1 = k1 , X2 = k2 , . . . , Xr = kr )
= P (X1 = k1 , X2 = k2 , . . . , Xr = kr , X = k)
= P (X = k) P (X1 = k1 , X2 = k2 , . . . , Xr = kr | X = k)
(A) e k
k!
= · pk1 pk2 · · · pkr r
k! k1 ! k2 ! · · · kr ! 1 2
p1
e (p1 )k1 e p2
(p2 )k2 e pr
(pr )kr
= · ··· .
k1 ! k2 ! kr !
In the passage from line 3 to line 4 we used the conditional joint probability
mass function of (X1 , X2 , . . . , Xr ), given that X = k, namely
k!
P (X1 = k1 , X2 = k2 , . . . , Xr = kr | X = k) = pk1 pk2 · · · pkr r ,
k1 ! k2 ! · · · kr ! 1 2
which came from the description of the problem. In the last equality of (A) we
cancelled k! and then used both k = k1 + k2 + · · · + kr and p1 + p2 + · · · + pr = 1.
From the joint p.m.f. we deduce the marginal p.m.f.s by summing away the
other variables. Let 1 j r and ` 0. In the second equality below substitute
in the last line from (A). Then observe that each sum over the entire Poisson p.m.f.
evaluates to 1.
$$P(X_j = \ell) = \sum_{\substack{k_1,\ldots,k_{j-1},\\ k_{j+1},\ldots,k_r \ge 0}} P\big(X_1 = k_1, \ldots, X_{j-1} = k_{j-1},\, X_j = \ell,\, X_{j+1} = k_{j+1}, \ldots, X_r = k_r\big)$$
$$= \left(\sum_{k_1=0}^{\infty}\frac{e^{-\lambda p_1}(\lambda p_1)^{k_1}}{k_1!}\right)\cdots\left(\sum_{k_{j-1}=0}^{\infty}\frac{e^{-\lambda p_{j-1}}(\lambda p_{j-1})^{k_{j-1}}}{k_{j-1}!}\right)\cdot\frac{e^{-\lambda p_j}(\lambda p_j)^{\ell}}{\ell!}\cdot\left(\sum_{k_{j+1}=0}^{\infty}\frac{e^{-\lambda p_{j+1}}(\lambda p_{j+1})^{k_{j+1}}}{k_{j+1}!}\right)\cdots\left(\sum_{k_r=0}^{\infty}\frac{e^{-\lambda p_r}(\lambda p_r)^{k_r}}{k_r!}\right)$$
$$= \frac{e^{-\lambda p_j}(\lambda p_j)^{\ell}}{\ell!}.$$
This gives us $X_j \sim$ Poisson$(\lambda p_j)$ for each $j$. Together with the earlier calculation (A) we now know that $X_1, X_2, \ldots, X_r$ are independent with Poisson marginals $X_j \sim$ Poisson$(\lambda p_j)$.
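As an illustration (not part of the textbook solution), here is a small Monte Carlo sketch of the thinning statement with $r = 3$ categories. The values $\lambda = 4$ and the category probabilities are arbitrary test choices, not taken from the exercise; each category count should have mean and variance close to $\lambda p_j$, as a Poisson should.

import math
import random

random.seed(1)
lam, probs = 4.0, [0.5, 0.3, 0.2]   # illustrative values only
n = 100_000

def poisson(mu):
    # Knuth's multiplication method; adequate for small mu
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

sums = [0.0, 0.0, 0.0]
sqsums = [0.0, 0.0, 0.0]
for _ in range(n):
    x = poisson(lam)
    c = [0, 0, 0]
    for _ in range(x):
        u = random.random()
        c[0 if u < probs[0] else (1 if u < probs[0] + probs[1] else 2)] += 1
    for j in range(3):
        sums[j] += c[j]
        sqsums[j] += c[j] ** 2

for j in range(3):
    m = sums[j] / n
    v = sqsums[j] / n - m * m
    print(f"X_{j+1}: mean {m:.3f}, variance {v:.3f}, Poisson target {lam * probs[j]:.3f}")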
10.29. For $0 \le \ell \le n$,
$$p_L(\ell) = \sum_{m=\ell}^{n} p_{L|M}(\ell\,|\,m)\,p_M(m) = \sum_{m=\ell}^{n}\frac{m!}{\ell!\,(m-\ell)!}\,r^{\ell}(1-r)^{m-\ell}\cdot\frac{n!}{m!\,(n-m)!}\,p^m(1-p)^{n-m}$$
$$= \frac{n!}{\ell!\,(n-\ell)!}\,(pr)^{\ell}\sum_{m=\ell}^{n}\frac{(n-\ell)!}{(m-\ell)!\,(n-m)!}\,\big((1-r)p\big)^{m-\ell}(1-p)^{n-m}$$
$$= \frac{n!}{\ell!\,(n-\ell)!}\,(pr)^{\ell}\sum_{j=0}^{n-\ell}\frac{(n-\ell)!}{j!\,(n-\ell-j)!}\,\big((1-r)p\big)^{j}(1-p)^{n-\ell-j}$$
$$= \frac{n!}{\ell!\,(n-\ell)!}\,(pr)^{\ell}\big((1-r)p + 1 - p\big)^{n-\ell} = \binom{n}{\ell}(pr)^{\ell}(1-pr)^{n-\ell}.$$
In other words, $L \sim$ Bin$(n, pr)$.
Here is a way to get the distribution of L without calculation. Imagine that
we allow everybody to write the second test (even those applicants who fail the
first one). For a given applicant the probability of passing both tests is pr by
independence. Since L is the number of applicants passing both tests out of the n
applicants, we immediately get L ⇠ Bin(n, pr).
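As an extra check (not in the original solution), the Python sketch below sums the compound p.m.f. with exact rational arithmetic and compares it with the Bin$(n, pr)$ p.m.f. The values $n = 8$, $p = 3/5$, $r = 1/4$ are arbitrary test values, not taken from the exercise.

from fractions import Fraction
from math import comb

n, p, r = 8, Fraction(3, 5), Fraction(1, 4)   # illustrative parameters
for l in range(n + 1):
    total = sum(Fraction(comb(m, l)) * r**l * (1 - r)**(m - l)
                * Fraction(comb(n, m)) * p**m * (1 - p)**(n - m)
                for m in range(l, n + 1))
    assert total == Fraction(comb(n, l)) * (p * r)**l * (1 - p * r)**(n - l)
print("L ~ Bin(n, pr) verified for n = 8, p = 3/5, r = 1/4")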
10.30. First deduction of the joint p.m.f. Let $k, \ell \in \{0, 1, 2, \ldots\}$.
$$P(X_1 = k, X_2 = \ell) = P(X_1 = k, X_2 = \ell, X = k+\ell) = P(X = k+\ell)\,P(X_1 = k, X_2 = \ell \,|\, X = k+\ell)$$
$$= (1-p)^{k+\ell}\,p\cdot\frac{(k+\ell)!}{k!\,\ell!}\,\alpha^k(1-\alpha)^{\ell}.$$
To find the marginal p.m.f. we manipulate the series into a form where we can apply identity (10.52). Let $k \ge 0$.
$$P(X_1 = k) = \sum_{\ell=0}^{\infty} P(X_1 = k, X_2 = \ell) = \sum_{\ell=0}^{\infty} (1-p)^{k+\ell}\,p\cdot\frac{(k+\ell)!}{k!\,\ell!}\,\alpha^k(1-\alpha)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\,p\sum_{\ell=0}^{\infty}\frac{(k+1)(k+2)\cdots(k+\ell)}{\ell!}\,\big((1-p)(1-\alpha)\big)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\,p\sum_{\ell=0}^{\infty}\frac{(-k-1)(-k-2)\cdots(-k-\ell)}{\ell!}\,\big({-(1-p)(1-\alpha)}\big)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\,p\sum_{\ell=0}^{\infty}\binom{-k-1}{\ell}\big({-(1-p)(1-\alpha)}\big)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\cdot p\cdot\big(1-(1-p)(1-\alpha)\big)^{-k-1} = \left(\frac{\alpha(1-p)}{p+\alpha(1-p)}\right)^{k}\cdot\frac{p}{p+\alpha(1-p)}.$$
The same reasoning (or simply replacing $\alpha$ with $1-\alpha$) gives for $\ell \ge 0$
$$P(X_2 = \ell) = \left(\frac{(1-\alpha)(1-p)}{p+(1-\alpha)(1-p)}\right)^{\ell}\cdot\frac{p}{p+(1-\alpha)(1-p)}.$$
Thus marginally $X_1$ and $X_2$ are shifted geometric random variables. However, the conditional p.m.f. of $X_2$, given that $X_1 = k$, is of a different form and furthermore depends on $k$:
$$p_{X_2|X_1}(\ell\,|\,k) = \frac{p_{X_1,X_2}(k,\ell)}{p_{X_1}(k)} = \frac{(1-p)^{k+\ell}\,p\cdot\frac{(k+\ell)!}{k!\,\ell!}\,\alpha^k(1-\alpha)^{\ell}}{\left(\frac{\alpha(1-p)}{p+\alpha(1-p)}\right)^{k}\cdot\frac{p}{p+\alpha(1-p)}} = \big(p+\alpha(1-p)\big)^{k+1}\,\frac{(k+1)(k+2)\cdots(k+\ell)}{\ell!}\,\big((1-p)(1-\alpha)\big)^{\ell}.$$
We conclude in particular that $X_1$ and $X_2$ are not independent.
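As a quick numerical check (not from the book), the sketch below sums the joint p.m.f. over $\ell$ and compares it with the shifted geometric formula for $P(X_1 = k)$. The series is truncated at $\ell = 400$, and $p = 0.3$, $\alpha = 0.6$ are arbitrary test values.

from math import comb

p, alpha = 0.3, 0.6   # illustrative parameters
for k in range(8):
    total = sum((1 - p)**(k + l) * p * comb(k + l, k)
                * alpha**k * (1 - alpha)**l for l in range(400))
    q = alpha * (1 - p) / (p + alpha * (1 - p))
    target = q**k * p / (p + alpha * (1 - p))
    assert abs(total - target) < 1e-12
print("P(X_1 = k) matches the shifted geometric formula for k = 0,...,7")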
10.31. We have
$$p_{X|I_B}(x\,|\,1) = P(X = x \,|\, I_B = 1) = P(X = x \,|\, B) = p_{X|B}(x),$$
and
$$p_{X|I_B}(x\,|\,0) = P(X = x \,|\, I_B = 0) = P(X = x \,|\, B^c) = p_{X|B^c}(x).$$
10.32. From Exercise 6.34 we record the joint and marginal density functions:
$$f_{X,Y}(x,y) = \begin{cases} \frac{2}{3} & (x,y) \in D,\\ 0 & (x,y) \notin D, \end{cases}$$
$$f_X(x) = \begin{cases} 0 & x \le 0 \text{ or } x \ge 2,\\ \frac{2}{3} & 0 < x \le 1,\\ \frac{4}{3} - \frac{2}{3}x & 1 < x < 2, \end{cases} \qquad f_Y(y) = \begin{cases} 0 & y \le 0 \text{ or } y \ge 1,\\ \frac{4}{3} - \frac{2}{3}y & 0 < y < 1. \end{cases}$$
From these we deduce the conditional densities. Note that the line segment from $(1,1)$ to $(2,0)$ that forms part of the boundary of $D$ obeys the equation $x + y = 2$, so for $0 < y < 1$ the horizontal cross-section of $D$ at height $y$ is the interval $0 < x < 2-y$.
$$f_{X|Y}(x\,|\,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{\frac{2}{3}}{\frac{4}{3}-\frac{2}{3}y} = \frac{1}{2-y} \qquad \text{for } 0 < x < 2-y \text{ and } 0 < y < 1.$$
This shows that given $Y = y \in (0,1)$, $X$ is uniform on the interval $(0, 2-y)$. Since the mean of a uniform random variable is the midpoint of the interval,
$$E[X\,|\,Y=y] = 1 - \tfrac{y}{2} \qquad \text{for } 0 < y < 1.$$
$$f_{Y|X}(y\,|\,x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} \dfrac{2/3}{2/3} = 1 & \text{for } 0 < y < 1 \text{ and } 0 < x \le 1,\\[1ex] \dfrac{2/3}{\frac{4}{3}-\frac{2}{3}x} = \dfrac{1}{2-x} & \text{for } 0 < y < 2-x \text{ and } 1 < x < 2. \end{cases}$$
Thus given $X = x \in (0,1]$, $Y$ is uniform on the interval $(0,1)$, while given $X = x \in (1,2)$, $Y$ is uniform on the interval $(0, 2-x)$. Hence
$$E[Y\,|\,X=x] = \begin{cases} \frac{1}{2} & 0 < x \le 1,\\ 1 - \frac{x}{2} & 1 < x < 2. \end{cases}$$
(i) $y < \tfrac12$: $P(X \le \tfrac12 \,|\, Y = y) = 0$.
(ii) $\tfrac12 \le y < \tfrac34$: $P(X \le \tfrac12 \,|\, Y = y) = \displaystyle\int_{1-y}^{1/2}\frac{1}{1-y}\,dx = \frac{\frac12-(1-y)}{1-y}$.
(iii) $y \ge \tfrac34$: $P(X \le \tfrac12 \,|\, Y = y) = \displaystyle\int_{1-y}^{2-2y}\frac{1}{1-y}\,dx = 1$.
(b) From Figure 6.4 or from the formula for $f_X$ in Example 6.20 we deduce $P(X \le \tfrac12) = \tfrac18$. Then integrate the conditional probability from part (a) to find
$$\int_{-\infty}^{\infty} P(X \le \tfrac12 \,|\, Y = y)\,f_Y(y)\,dy = \int_{1/2}^{3/4}\frac{\frac12-(1-y)}{1-y}\,(2-2y)\,dy + \int_{3/4}^{1}(2-2y)\,dy = \tfrac18.$$
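As a numerical sanity check (not part of the solution), the sketch below evaluates the two integrals from part (b) with a simple midpoint rule; together they should give approximately $\tfrac18$.

def midpoint(f, a, b, n=100_000):
    # midpoint rule approximation of the integral of f over (a, b)
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

I1 = midpoint(lambda y: (0.5 - (1 - y)) / (1 - y) * (2 - 2 * y), 0.5, 0.75)
I2 = midpoint(lambda y: 2 - 2 * y, 0.75, 1.0)
print(I1 + I2)   # approximately 0.125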
10.34. The discrete case, utilizing $p_{X|Y}(x|y)\,p_Y(y) = p_{X,Y}(x,y)$:
$$E[Y\cdot E(X\,|\,Y)] = \sum_y y\,E(X\,|\,Y=y)\,p_Y(y) = \sum_y y\sum_x x\,p_{X|Y}(x|y)\,p_Y(y) = \sum_{x,y} xy\,p_{X|Y}(x|y)\,p_Y(y) = \sum_{x,y} xy\,p_{X,Y}(x,y) = E[XY].$$
The jointly continuous case, utilizing $f_{X|Y}(x|y)\,f_Y(y) = f_{X,Y}(x,y)$:
$$E[Y\cdot E(X\,|\,Y)] = \int_{-\infty}^{\infty} y\,E(X\,|\,Y=y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty} y\left(\int_{-\infty}^{\infty} x\,f_{X|Y}(x|y)\,dx\right) f_Y(y)\,dy$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{X|Y}(x|y)\,f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{X,Y}(x,y)\,dx\,dy = E[XY].$$
10.35. (a) We first find the joint density of $(X,S)$. Using the same idea as in Example 10.22, we write an expression for the joint cumulative distribution function $F_{X,S}(x,s)$.
$$F_{X,S}(x,s) = P(X \le x, S \le s) = P(X \le x, X+Y \le s) = \iint\limits_{u \le x,\ u+v \le s} f_{X,Y}(u,v)\,du\,dv = \iint\limits_{u \le x,\ v \le s-u} \varphi(u)\varphi(v)\,du\,dv$$
$$= \int_{-\infty}^{x}\int_{-\infty}^{s-u}\varphi(u)\varphi(v)\,dv\,du = \int_{-\infty}^{x}\varphi(u)\,\Phi(s-u)\,du.$$
We can get the joint density of $(X,S)$ by taking the mixed partial derivative, and we will do that by taking the $x$-derivative first:
$$f_{X,S}(x,s) = \frac{\partial}{\partial s}\frac{\partial}{\partial x}F_{X,S}(x,s) = \frac{\partial}{\partial s}\frac{\partial}{\partial x}\left(\int_{-\infty}^{x}\varphi(u)\,\Phi(s-u)\,du\right) = \frac{\partial}{\partial s}\big(\varphi(x)\,\Phi(s-x)\big) = \varphi(x)\varphi(s-x) = \frac{1}{2\pi}e^{-\frac{x^2+(s-x)^2}{2}}.$$
Since $S$ is the sum of two independent standard normals, we have $S \sim N(0,2)$ and $f_S(s) = \frac{1}{2\sqrt{\pi}}e^{-\frac{s^2}{4}}$. Then
$$f_{X|S}(x\,|\,s) = \frac{f_{X,S}(x,s)}{f_S(s)} = \frac{\frac{1}{2\pi}e^{-\frac{x^2+(s-x)^2}{2}}}{\frac{1}{2\sqrt{\pi}}e^{-\frac{s^2}{4}}} = \frac{1}{\sqrt{\pi}}\,e^{-(\frac{s^2}{4}-sx+x^2)} = \frac{1}{\sqrt{\pi}}\,e^{-(x-\frac{s}{2})^2}.$$
We can recognize the final result as the probability density function of the $N(\tfrac{s}{2}, \tfrac12)$ distribution.
(b) Since the conditional distribution of $X$ given $S = s$ is $N(\tfrac{s}{2}, \tfrac12)$, we get
$$E[X\,|\,S=s] = \frac{s}{2}, \qquad E[X^2\,|\,S=s] = \frac12 + \Big(\frac{s}{2}\Big)^2,$$
from which $E[X\,|\,S] = \frac{S}{2}$ and $E[X^2\,|\,S] = \frac12 + \frac{S^2}{4}$.
Taking expectations again:
$$E\big[E[X\,|\,S]\big] = E[S/2] = 0, \qquad E\big[E[X^2\,|\,S]\big] = E\Big[\frac12 + \frac{S^2}{4}\Big] = \frac12 + \frac{2}{4} = 1,$$
where we used $S \sim N(0,2)$. The final answers agree with the fact that $X$ is standard normal.
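As an illustration (not part of the textbook solution), the Monte Carlo sketch below samples independent standard normals, bins the pairs by the value of $S = X + Y$ rounded to the nearest integer, and looks at the mean and variance of $X$ within each bin. Binning is only a crude stand-in for conditioning, so the agreement with $E[X\,|\,S] = S/2$ and conditional variance $\tfrac12$ is approximate.

import random

random.seed(2)
bins = {}
for _ in range(400_000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    key = round(x + y, 0)          # bin S to the nearest integer
    bins.setdefault(key, []).append(x)

for s in (-2.0, -1.0, 0.0, 1.0, 2.0):
    xs = bins[s]
    m = sum(xs) / len(xs)
    v = sum((t - m) ** 2 for t in xs) / len(xs)
    print(f"S ~ {s:+.0f}: mean(X) = {m:+.3f} (target {s/2:+.2f}), "
          f"var(X) = {v:.3f} (target 0.5)")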
10.36. To find the joint density function of $(X,S)$, we change variables in an integral that calculates the expectation of a function $g(X,S)$.
$$E[g(X,S)] = E[g(X, X+Y)] = \frac{1}{2\pi\sigma^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, x+y)\,e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(y-\mu)^2}{2\sigma^2}}\,dy\,dx$$
$$= \frac{1}{2\pi\sigma^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,s)\,e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(s-x-\mu)^2}{2\sigma^2}}\,ds\,dx.$$
This identifies the joint density $f_{X,S}(x,s) = \frac{1}{2\pi\sigma^2}e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(s-x-\mu)^2}{2\sigma^2}}$, and since $S \sim N(2\mu, 2\sigma^2)$ we have $f_S(s) = \frac{1}{\sqrt{4\pi\sigma^2}}e^{-\frac{(s-2\mu)^2}{4\sigma^2}}$. From these ingredients we write down the conditional density function of $X$, given that $S = s$:
$$f_{X|S}(x\,|\,s) = \frac{f_{X,S}(x,s)}{f_S(s)} = \frac{\sqrt{4\pi\sigma^2}}{2\pi\sigma^2}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(s-x-\mu)^2}{2\sigma^2}+\frac{(s-2\mu)^2}{4\sigma^2}}.$$
After some algebra and cancellation in the exponent, this turns into
$$f_{X|S}(x\,|\,s) = \frac{1}{\sqrt{2\pi\sigma^2/2}}\exp\left\{-\frac{(x-\frac{s}{2})^2}{2\sigma^2/2}\right\}.$$
The conclusion is that given $S = s$, $X \sim N(s/2, \sigma^2/2)$. Knowledge of the normal expectation gives $E(X\,|\,S=s) = s/2$, from which $E[X\,|\,S] = \frac12 S$.
10.37. Let A be the event {Z > 0}. Random variable Y has the same distribution
as Z conditioned on the event A. Hence the density function fY (y) is the same as
the conditional probability density function $f_{Z|A}(y)$. This conditional density will be 0 for $y \le 0$, so we can focus on $y > 0$. The conditional density will satisfy
$$P(a \le Z \le b \,|\, Z > 0) = \int_a^b f_{Z|A}(y)\,dy$$
for any $0 < a < b$. But if $0 < a < b$ then
$$P(a \le Z \le b \,|\, Z > 0) = \frac{P(a \le Z \le b,\ Z > 0)}{P(Z > 0)} = \frac{P(a \le Z \le b)}{P(Z > 0)} = \frac{\int_a^b \varphi(y)\,dy}{1/2} = \int_a^b 2\varphi(y)\,dy.$$
Thus $f_Y(y) = f_{Z|A}(y) = 2\varphi(y)$ for $y > 0$ and $0$ otherwise.
10.38. (a) The problem statement gives us these density functions for $x, y > 0$:
$$f_Y(y) = e^{-y} \qquad \text{and} \qquad f_{X|Y}(x\,|\,y) = y e^{-yx}.$$
Then the joint density function is given by
$$f_{X,Y}(x,y) = f_{X|Y}(x\,|\,y)\,f_Y(y) = y e^{-y(x+1)} \qquad \text{for } x > 0,\ y > 0.$$
(b) Once we observe $X = x$, the distribution of $Y$ should be conditioned on $X = x$. First find the marginal density function of $X$ for $x > 0$.
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_0^{\infty} y e^{-y(x+1)}\,dy = \frac{1}{(1+x)^2}.$$
Then, again for $x > 0$ and $y > 0$,
$$f_{Y|X}(y\,|\,x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = y(1+x)^2 e^{-y(x+1)}.$$
The conclusion is that, given $X = x$, $Y \sim$ Gamma$(2, x+1)$. The gamma distribution was defined in Definition 4.37.
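As a small numerical check (not from the book), the sketch below integrates $y e^{-y(x+1)}$ over $y$ with a midpoint rule and compares the result with $1/(1+x)^2$ at a few points.

import math

def fx_numeric(x, y_max=60.0, n=200_000):
    # midpoint rule for the integral of y * exp(-y*(x+1)) over (0, y_max)
    h = y_max / n
    return h * sum((i + 0.5) * h * math.exp(-(i + 0.5) * h * (x + 1))
                   for i in range(n))

for x in (0.0, 0.5, 1.0, 3.0):
    print(f"x = {x}: numeric {fx_numeric(x):.6f}, exact {1/(1+x)**2:.6f}")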
10.39. From the problem we get that the conditional distribution of $Y$ given $X = x$ is uniform on $[x, 1]$. From this we get that $f_{Y|X}(y\,|\,x)$ is defined for every $0 \le x < 1$ and is equal to
$$f_{Y|X}(y\,|\,x) = \begin{cases} \frac{1}{1-x} & \text{if } x \le y \le 1,\\ 0 & \text{otherwise.} \end{cases}$$
By averaging out $x$ we can get the unconditional probability density function of $Y$: for any $0 \le y \le 1$ we have
$$f_Y(y) = \int_0^1 f_{Y|X}(y\,|\,x)\,f_X(x)\,dx = \int_0^y \frac{1}{1-x}\cdot 20x^3(1-x)\,dx = 20\int_0^y x^3\,dx = 20\,\frac{x^4}{4}\Big|_0^y = 5y^4.$$
If $y < 0$ or $y > 1$ then we have $f_Y(y) = 0$, thus
$$f_Y(y) = \begin{cases} 5y^4 & \text{if } 0 \le y \le 1,\\ 0 & \text{otherwise.} \end{cases}$$
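As a quick check (not part of the solution), the sketch below evaluates the mixture integral numerically and compares it with $5y^4$.

def fy_numeric(y, n=100_000):
    # midpoint rule for the mixture integral over x in (0, y)
    h = y / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f_y_given_x = 1.0 / (1.0 - x)      # Y | X = x is uniform on [x, 1]
        f_x = 20.0 * x**3 * (1.0 - x)
        total += f_y_given_x * f_x * h
    return total

for y in (0.2, 0.5, 0.8, 1.0):
    print(f"y = {y}: numeric {fy_numeric(y):.6f}, exact {5*y**4:.6f}")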
From these ingredients we find the density function $f_X(x)$. Concerning the range, the inequalities $0 < x < y/2$ and $0 < y < 1/2$ combine to give $0 < x < 1/4$. For such $x$,
$$(A)\qquad f_X(x) = \int_{-\infty}^{\infty} f_{X|Y}(x\,|\,y)\,f_Y(y)\,dy = \int_{2x}^{1/2}\frac{2}{y}\cdot 2\,dy = 4\ln\frac{1}{4x},$$
and $f_X(x) = 0$ otherwise.
10.44. The calculation below begins with the averaging principle. Conditioning
on Y = y permits us to replace Y with y inside the probability, and then the
10.45. (a) We have the joint density $f_{X,Y}(x,y)$ given in (8.32). The distribution of $Y$ is $N(\mu_Y, \sigma_Y^2)$ and thus the marginal density is $f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}e^{-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}}$. Then $f_{X|Y}(x\,|\,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$. To help with the notation let us introduce $\tilde{x} = \frac{x-\mu_X}{\sigma_X}$ and $\tilde{y} = \frac{y-\mu_Y}{\sigma_Y}$. Then
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\,e^{-\frac{1}{2(1-\rho^2)}(\tilde{x}^2+\tilde{y}^2-2\rho\tilde{x}\tilde{y})}, \qquad f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\,e^{-\frac{\tilde{y}^2}{2}},$$
and
$$f_{X|Y}(x\,|\,y) = \frac{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\,e^{-\frac{1}{2(1-\rho^2)}(\tilde{x}^2+\tilde{y}^2-2\rho\tilde{x}\tilde{y})}}{\frac{1}{\sqrt{2\pi}\,\sigma_Y}\,e^{-\frac{\tilde{y}^2}{2}}} = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}\,\sigma_X}\,e^{-\frac{\tilde{x}^2-2\rho\tilde{x}\tilde{y}+\rho^2\tilde{y}^2}{2(1-\rho^2)}} = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}\,\sigma_X}\,e^{-\frac{(\tilde{x}-\rho\tilde{y})^2}{2(1-\rho^2)}}.$$
We can check that the formula given above for $f_X(x\,|\,Y \in B)$ satisfies this identity. By the definition of conditional probability,
$$P(X \in A \,|\, Y \in B) = \frac{P(X \in A, Y \in B)}{P(Y \in B)} = \frac{1}{P(Y \in B)}\iint_{A\times B} f(x,y)\,dx\,dy$$
$$= \int_A \frac{1}{P(Y \in B)}\left(\int_B f(x,y)\,dy\right)dx = \int_A f_X(x\,|\,Y \in B)\,dx.$$
10.47.
$$E[g(X)\,|\,Y=y] = \sum_m m\,P(g(X) = m \,|\, Y = y) = \sum_m m\sum_{k:\,g(k)=m} P(X = k \,|\, Y = y)$$
$$= \sum_m\sum_{k:\,g(k)=m} g(k)\,P(X = k \,|\, Y = y) = \sum_k g(k)\,P(X = k \,|\, Y = y).$$
10.48.
$$E[X + Z \,|\, Y = y] = \sum_m m\,P(X + Z = m \,|\, Y = y) = \sum_m m\sum_{k,\ell:\,k+\ell=m} P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_{k,\ell,m:\,k+\ell=m} m\,P(X = k, Z = \ell \,|\, Y = y) = \sum_{k,\ell}(k+\ell)\,P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_{k,\ell} k\,P(X = k, Z = \ell \,|\, Y = y) + \sum_{k,\ell}\ell\,P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_k k\sum_{\ell} P(X = k, Z = \ell \,|\, Y = y) + \sum_{\ell}\ell\sum_k P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_k k\,P(X = k \,|\, Y = y) + \sum_{\ell}\ell\,P(Z = \ell \,|\, Y = y) = E[X \,|\, Y = y] + E[Z \,|\, Y = y].$$
10.49. (a) If it takes me more than one time unit to complete the job I'm simply paid 1 dollar, so for $t \ge 1$, $p_{X|T}(1\,|\,t) = 1$. For $0 < t < 1$ we get either 1 or 2 dollars with probability $\tfrac12$–$\tfrac12$, so the conditional probability mass function is
$$p_{X|T}(1\,|\,t) = \tfrac12 \qquad \text{and} \qquad p_{X|T}(2\,|\,t) = \tfrac12.$$
We can compute $E[X]$ by averaging $E[X\,|\,T=t]$ using the probability density $f_T(t)$ of $T$. From the conditional p.m.f. above, $E[X\,|\,T=t] = \tfrac32$ for $0 < t < 1$ and $E[X\,|\,T=t] = 1$ for $t \ge 1$. Since $T \sim$ Exp$(\lambda)$, we have $f_T(t) = \lambda e^{-\lambda t}$ for $t > 0$ and 0 otherwise. Thus
$$E[X] = \int_0^{\infty} E[X\,|\,T=t]\,f_T(t)\,dt = \int_0^1 \tfrac32\,\lambda e^{-\lambda t}\,dt + \int_1^{\infty}\lambda e^{-\lambda t}\,dt = \tfrac32\big(1 - e^{-\lambda}\big) + e^{-\lambda} = \tfrac32 - \tfrac12 e^{-\lambda}.$$
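As a quick Monte Carlo check (not from the book), the sketch below draws $T \sim$ Exp$(\lambda)$, pays 1 dollar if $T \ge 1$ and otherwise pays 1 or 2 dollars with equal probability, and compares the average payment with $\tfrac32 - \tfrac12 e^{-\lambda}$. The value $\lambda = 1.3$ is an arbitrary test value.

import math
import random

random.seed(3)
lam = 1.3              # illustrative rate parameter
n = 500_000
total = 0
for _ in range(n):
    t = random.expovariate(lam)
    total += 1 if t >= 1 else random.choice((1, 2))
print(f"simulated E[X] = {total/n:.4f}, formula = {1.5 - 0.5*math.exp(-lam):.4f}")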
10.50. For $0 \le k \le n$ we have
$$P(S_n = k) = \int_0^1 P(S_n = k \,|\, \xi = p)\,f_{\xi}(p)\,dp = \binom{n}{k}\int_0^1 p^k(1-p)^{n-k}\,dp.$$
Thus
$$P(Y < x) = \sum_{k=0}^{n}(x-k)\binom{n}{k}p^k(1-p)^{n-k}.$$
10.52. (a)
(b)
$$\lim_{k\to\infty} p_{Y|X}(1\,|\,k) = \lim_{k\to\infty}\frac{3^k e^{-3}}{2^k e^{-2} + 3^k e^{-3}} = \lim_{k\to\infty}\frac{1}{\left(\frac23\right)^k e + 1} = 1.$$
$$P(X_2 = (0,1), X_3 = (0,1)) = 0 \ne P(X_2 = (0,1))\,P(X_3 = (0,1)) > 0.$$
Let $x_{n+1} = (a_{n+1}, b_{n+1}) \in \{(0,0), (0,1), (1,0), (1,1)\}$. Now consider the conditional distribution of $X_{n+1}$ with respect to the full past:
$$P(X_{n+1} = x_{n+1} \,|\, X_2 = x_2, \ldots, X_n = x_n) = \frac{P(X_2 = x_2, \ldots, X_n = x_n, X_{n+1} = x_{n+1})}{P(X_2 = x_2, \ldots, X_n = x_n)}$$
$$= \frac{P(Y_1 = a_1, Y_2 = a_2, \ldots, Y_{n-1} = a_{n-1}, Y_n = b_n, Y_n = a_{n+1}, Y_{n+1} = b_{n+1})}{P(Y_1 = a_1, Y_2 = a_2, \ldots, Y_{n-1} = a_{n-1}, Y_n = b_n)}.$$
This ratio is zero if $b_n \ne a_{n+1}$, and if $b_n = a_{n+1}$ then it becomes $P(Y_{n+1} = b_{n+1})$ by the independence of the $Y_k$. Thus
$$P(X_{n+1} = x_{n+1} \,|\, X_n = x_n) = P(X_{n+1} = x_{n+1} \,|\, X_2 = x_2, \ldots, X_n = x_n),$$
which shows that the process is a Markov chain.
Solutions to the Appendix
Appendix B.
B.1.
(a) We want to collect the elements which are either (in $A$ and in $B$, but not in $C$), or (in $A$ and in $C$, but not in $B$), or (in $B$ and in $C$, but not in $A$).
The elements described by the first parentheses are given by the set $ABC^c$ (or equivalently $A \cap B \cap C^c$). The set in the second parentheses is $ACB^c$ while the third is $BCA^c$. By taking the union of these sets we have exactly the elements of $D$:
$$D = ABC^c \cup ACB^c \cup BCA^c.$$
(b) This is similar to part (a), but now we should also include the elements that are in all three sets. These are exactly the elements of $ABC = A \cap B \cap C$, so by taking the union of this set with the answer of (a) we get the required result.
$$D = ABC^c \cup BCA^c \cup ACB^c \cup ABC.$$
Alternately, we can write simply
$$D = AB \cup AC \cup BC = (A \cap B) \cup (A \cap C) \cup (B \cap C).$$
In this last expression there can be overlap between the members of the union but it is still a legitimate way to express the set $D$.
B.2. (a) $A \cap B \cap C$
(b) $A \cap (B \cup C)^c$, which can also be written as $A \cap B^c \cap C^c$.
(c) $(A \cup B) \cap (A \cap B)^c$
(d) $A \cap B \cap C^c$
(e) $A \cap (B \cup C)^c$
B.3.
(a) B \ A = {15, 25, 35, 45, 51, 53, 55, 57, 59, 65, 75, 85, 95}.
(b) $A \cap B \cap C^c = \{50, 52, 54, 56, 58\} \cap C^c = \{50, 52, 56, 58\}$.
(c) Observe that a two-digit number $10a + b$ is a multiple of 3 if and only if $a + b$ is a multiple of 3: $10a + b = 3k \iff a + b = 3(k - 3a)$. Thus $C \cap D = \varnothing$ because the sum of the digits cannot be both 10 and a multiple of 3. Consequently $((A \cap D) \cup B) \cap (C \cap D) = \varnothing$.
B.4. We have $\omega \in \big(\bigcap_i A_i\big)^c$ if and only if $\omega \notin \bigcap_i A_i$. An element $\omega$ is not in the intersection of the sets $A_i$ if and only if there is at least one $i$ with $\omega \notin A_i$, which is the same as $\omega \in A_i^c$. But $\omega \in A_i^c$ for one of the $i$ if and only if $\omega \in \bigcup_i A_i^c$. This proves the identity.
B.5. (a) The elements in $A\triangle B$ are either elements of $A$, but not $B$, or elements of $B$, but not $A$. Thus we have $A\triangle B = AB^c \cup A^cB$.
(b) First note that for any two sets $E, F \subset \Omega$ we have
$$\Omega = EF \cup E^cF \cup EF^c \cup E^cF^c$$
where the four sets on the right are disjoint. From this and part (a) it follows that
$$(E\triangle F)^c = (EF^c \cup E^cF)^c = EF \cup E^cF^c.$$
This gives
$$A\triangle(B\triangle C) = A(B\triangle C)^c \cup A^c(B\triangle C) = A(BC \cup B^cC^c) \cup A^c(BC^c \cup B^cC) = ABC \cup AB^cC^c \cup A^cBC^c \cup A^cB^cC$$
and
$$(A\triangle B)\triangle C = (A\triangle B)C^c \cup (A\triangle B)^cC = (AB^c \cup A^cB)C^c \cup (AB \cup A^cB^c)C = AB^cC^c \cup A^cBC^c \cup ABC \cup A^cB^cC,$$
which shows that the two sets are the same.
B.6. (a) We have $\omega \in E = A \cap B$ if and only if $\omega \in A$ and $\omega \in B$. Similarly, $\omega \in F = A \cap B^c$ if and only if $\omega \in A$ and $\omega \in B^c$. This shows that we cannot have $\omega \in E$ and $\omega \in F$ at the same time: this would imply $\omega \in B$ and $\omega \in B^c$ at the same time, which cannot happen. Thus the intersection of $E$ and $F$ must be the empty set.
(b) We first show that if $\omega \in A$ then either $\omega \in E$ or $\omega \in F$, which shows that $\omega \in E \cup F$. We either have $\omega \in B$ or $\omega \in B^c$. If $\omega \in B$ then $\omega$ is an element of both $A$ and $B$, and hence an element of $E = A \cap B$. If $\omega \in B^c$ then $\omega$ is an element of $A$ and $B^c$, and hence of $F = A \cap B^c$. This proves that if $\omega \in A$ then $\omega \in E \cup F$.
On the other hand, if $\omega \in E \cup F$ then we must have either $\omega \in E = A \cap B$ or $\omega \in F = A \cap B^c$. In both cases $\omega \in A$. Thus $\omega \in E \cup F$ implies $\omega \in A$.
This proves that the elements of $A$ are exactly the elements of $E \cup F$, and thus $A = E \cup F$.
B.7. (a) Yes. One possibility is $D = CB^c$.
(b) Note that whenever 2 appears in one of the sets (A or B) then 6 is there as
well, and vice versa. This means that we cannot separate these two elements with the set operations: whatever set expression we come up with, the result will either contain both 2 and 6 or neither. Thus we cannot get $\{2, 4\}$ as the result.
Appendix C.
C.1. We can construct all allowed license plates using the following procedure: we
choose one of the 26 letters to be the first letter, then one of the remaining 25
letters to be the 2nd, and then one of the remaining 24 letters to be the third
letter. Similarly, we choose one of the 10 digits to be the first digit, then choose
the second and third digits (with 9 and 8 possible choices). By the multiplication
principle this gives us $26 \cdot 25 \cdot 24 \cdot 10 \cdot 9 \cdot 8 = 11{,}232{,}000$ different license plates.
C.2. There are 26 choices for each of the three letters. Further, there are 10 choices for each of the digits. Thus, there are a total of $26^3 \cdot 10^3$ ways to construct license plates when any combination is allowed. However, there are $26^3 \cdot 1^3$ ways to construct license plates with three zeros (we have 26 choices for each of the three letters, and exactly one choice for each digit). Subtracting those off gives a solution of $26^3(10^3 - 1) = 17{,}558{,}424$.
Another way to get the same answer is as follows: we have $26^3$ choices for the three letters and 999 choices for the three digits ($10^3$ minus the all-zeros case), which gives again $26^3 \cdot 999 = 17{,}558{,}424$.
C.3. There are 25 license plates that differ from UWU 144 only at the first position (as there are 25 other letters we can choose there), and the same is true for the second and third positions. There are 9 license plates that differ from UWU 144 only at the fourth position (there are 9 other possible digits), and the same is true for the 5th and 6th positions. This gives $3 \cdot 25 + 3 \cdot 9 = 102$ possibilities.
C.4. We can arrange the 6 letters in 6! = 120 different orders, so the answer is 120.
C.5. Imagine that we differentiate between the two Ps: there is a P$_1$ and a P$_2$. Then we could order the five letters $5! = 5\cdot4\cdot3\cdot2\cdot1 = 120$ different ways. Each ordering of the letters gives a word, but we counted each word twice (as the two Ps can be in two different orders). Thus we can construct $\frac{120}{2} = 60$ different words.
C.6. (a) This is the choice of a subset of size 5 from a set of size 90, hence we have $\binom{90}{5} = 43{,}949{,}268$ outcomes.
If you want to first choose the numbers in order, then first you produce an ordered list of 5 numbers: $90 \cdot 89 \cdot 88 \cdot 87 \cdot 86$ outcomes. But now each set of 5 numbers is counted $5!$ times (in each of its orderings). Thus the answer is again
$$\frac{90 \cdot 89 \cdot 88 \cdot 87 \cdot 86}{5!} = \binom{90}{5} = 43{,}949{,}268.$$
(b) If 1 is forced into the set, then we choose the remaining 4 winning numbers from the 89 numbers $\{2, 3, \ldots, 90\}$. We can do that $\binom{89}{4} = 2{,}441{,}626$ different ways; this is the number of outcomes with 1 appearing among the five numbers.
(c) These outcomes can be produced by first picking 2 numbers from the set $\{1, 2, \ldots, 49\}$ and 3 numbers from $\{61, 62, \ldots, 90\}$. By the multiplication principle of counting there are $\binom{49}{2}\binom{30}{3} = 4{,}774{,}560$ ways we can do that, so that
is the number of outcomes. Note: It does not matter in what order the steps
are performed, or you can imagine them performed simultaneously.
(d) Here are two possible ways of solving this problem:
(i) First choose a set of 5 distinct second digits from the set $\{0, 1, 2, \ldots, 9\}$: $\binom{10}{5}$ choices. Then for each last digit in turn, choose a first digit. There are always 9 choices: if the last digit is 0, then the choices for the first digit are $\{1, 2, \ldots, 9\}$, while if the last digit is in the range 1–9 then the choices for the first digit are $\{0, 1, \ldots, 8\}$. By the multiplication principle of counting there are $\binom{10}{5}9^5 = 14{,}880{,}348$ outcomes.
(ii) Here is another presentation of the same idea: divide the 90 numbers into subsets according to the last digit, $A_k = \{\text{numbers in } \{1, \ldots, 90\} \text{ with last digit } k\}$ for $k = 0, 1, \ldots, 9$; each $A_k$ has 9 elements. The rule is that at most 1 number comes from each $A_k$. Hence first choose 5 subsets $A_{k_1}, A_{k_2}, \ldots, A_{k_5}$ out of the ten possible: $\binom{10}{5}$ choices. Then choose one number from the 9 in each set $A_{k_j}$: $9^5$ total possibilities. By the multiplication principle $\binom{10}{5}9^5$ outcomes.
C.7. Denote the four players by A, B, C and D. Note that if we choose the partner
of A (which we can do three possible ways) then this will determine the other team
as well. Thus there are 3 ways to set up the doubles match.
C.8. (a) Once we choose the opponent of team A, the whole tournament is set up.
Thus there are 3 ways to set up the tournament.
(b) In the tournament there are three games, each with two possible outcomes. Thus for a given setup we have $2^3 = 8$ outcomes, and since there are 3 ways to set up the tournament this gives $8 \cdot 3 = 24$ possible outcomes for the tournament.
C.9. (a) In order to produce all pairs we can first choose the rank of the pair (2, 3, ..., J, Q, K or A), which gives 13 choices. Then we choose the two cards from the 4 possibilities for that rank (for example, if the rank is K then we choose 2 cards from ♥K, ♣K, ♦K, ♠K), which gives $\binom{4}{2}$ choices. By the multiplication principle we have altogether $13 \cdot \binom{4}{2} = 78$ choices.
(b) To produce two cards with the same suit we first choose the suit (4 choices) and then choose the two cards from the 13 possibilities with the given suit ($\binom{13}{2} = 78$ choices). By the multiplication principle the result is $4 \cdot \binom{13}{2} = 312$.
(c) To produce a suited connector, first choose the suit (4 choices) then one of the 13 neighboring pairs. This gives $4 \cdot 13 = 52$ choices.
C.10. (a) We can construct a hand with two pairs the following way. First we choose the two ranks that appear in the pairs, which can be done $\binom{13}{2}$ different ways. For the lower ranked pair we can choose the two suits $\binom{4}{2}$ ways, and for the higher ranked pair we again have $\binom{4}{2}$ choices for the suits. The fifth card must have a different rank than the two pairs we have already chosen; there are $52 - 2\cdot4 = 44$ choices for that. This gives $\binom{13}{2}\cdot\binom{4}{2}\cdot\binom{4}{2}\cdot 44 = 123552$ choices.
(b) We can choose the rank of the three cards of the same rank 13 ways, and the three suits $\binom{4}{3} = 4$ ways. The other two cards have different ranks; we can choose those ranks $\binom{12}{2}$ different ways. For each of these two ranks we can choose the suit four ways, which gives $4^2$ choices. This gives $13 \cdot 4 \cdot \binom{12}{2}\cdot 4^2 = 54912$ possible three of a kinds.
(c) We can choose the rank of the starting card 10 ways (A, 2, ..., 10) if we want five cards in sequential order; this identifies the ranks of the other cards. For each of the 5 ranks we can choose the suit 4 ways. But for each sequence we have four cases where all five cards are of the same suit, and we have to remove these from the $4^5$ possibilities. This gives $10 \cdot (4^5 - 4) = 10200$ choices for a straight.
(d) The suit of the five cards can be chosen 4 ways. There are $\binom{13}{5}$ ways to choose five cards, but we have to remove the cases when these are in sequential order. We can choose the rank of the starting card 10 ways (A, 2, ..., 10) if we want five cards in sequential order. This gives $4 \cdot \big(\binom{13}{5} - 10\big) = 5108$ choices for a flush.
(e) We can construct a full house the following way. First choose the rank that appears three times (13 choices), and then the rank appearing twice (there are 12 remaining choices). Then choose the three suits for the rank appearing three times ($\binom{4}{3} = 4$ choices) and the suits for the other two cards ($\binom{4}{2} = 6$ choices). In each step the number of choices does not depend on the previous decisions, so we can multiply these together to get the number of ways we can get a full house: $13 \cdot 12 \cdot 4 \cdot 6 = 3744$.
(f) We can choose the rank of the 4 times repeated card 13 ways, and the fifth card 48 ways (since we have 48 other cards); this gives $13 \cdot 48 = 624$ poker hands with four of a kind.
(g) We can choose the value of the starting card 10 ways (A, 2, ..., 10), and the suit 4 ways, which gives $10 \cdot 4 = 40$ poker hands with a straight flush. (Often the case when the starting card is a 10 is called a royal flush. There are 4 such hands.)
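As a brute-force check (not part of the textbook solutions), the Python sketch below enumerates all $\binom{52}{5} = 2{,}598{,}960$ five-card hands and counts two pairs, three of a kinds, full houses and four of a kinds directly; it takes a little while to run but needs no cleverness.

from itertools import combinations
from collections import Counter

deck = [(rank, suit) for rank in range(13) for suit in range(4)]
two_pair = three_kind = full_house = four_kind = 0
for hand in combinations(deck, 5):
    counts = sorted(Counter(rank for rank, _ in hand).values())
    if counts == [1, 2, 2]:
        two_pair += 1
    elif counts == [1, 1, 3]:
        three_kind += 1
    elif counts == [2, 3]:
        full_house += 1
    elif counts == [1, 4]:
        four_kind += 1

print(two_pair, three_kind, full_house, four_kind)
# expected: 123552 54912 3744 624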
C.11. From the definition:
$$\binom{n-1}{k} + \binom{n-1}{k-1} = \frac{(n-1)!}{k!\,(n-k-1)!} + \frac{(n-1)!}{(k-1)!\,(n-k)!}$$
$$= \frac{n-k}{n}\cdot\frac{n\cdot(n-1)!}{k!\,(n-k-1)!\,(n-k)} + \frac{k}{n}\cdot\frac{n\cdot(n-1)!}{k\cdot(k-1)!\,(n-k)!} = \left(\frac{n-k}{n} + \frac{k}{n}\right)\frac{n!}{k!\,(n-k)!} = \binom{n}{k}.$$
Here is another way to prove the identity. Assume that in a class there are $n$ students, and one of them is called Dana. There are $\binom{n}{k}$ ways to choose a team of $k$ students from the class. When we choose the team there are two possibilities: Dana is either on the team or not. There are $\binom{n-1}{k}$ ways to choose the team if we cannot include Dana. There are $\binom{n-1}{k-1}$ ways to choose the team if we have to include Dana. These two numbers must add up to the total number of ways we can select the team, which gives the identity.
C.12. (a) We have to divide up the remaining 48 (non-ace) cards into four groups so that the first group has 9 cards, and the second, third and fourth groups have 13 cards each. This can be done in $\binom{48}{9,13,13,13} = \frac{48!}{9!\,(13!)^3}$ different ways.
(b) To describe such a configuration we just have to assign a different suit to each player. This can be done $4! = 24$ different ways.
(c) We can construct such a configuration by first choosing the 13 cards of Player 4 (there are 39 non-♥ cards, so we can do that $\binom{39}{13}$ different ways), then choosing the 13 cards of Player 3 (there are 26 non-♥ cards remaining, so we can do that $\binom{26}{13}$ different ways), and then choosing the 13 cards of Player 2 out of the remaining 26 cards (of which 13 are ♥); we can do that $\binom{26}{13}$ different ways. (Player 1 gets the remaining 13 cards.) Since the number of choices in each step does not depend on the outcomes of the previous choices, the total number of configurations is the product $\binom{39}{13}\cdot\binom{26}{13}\cdot\binom{26}{13} = \frac{39!\,26!}{(13!)^5}$.
C.13. Label the sides of the square with north, west, south and east. For any
coloring we can always rotate the square in a unique way so that the red side is the
north side. We can choose the colors of the other three sides (W, S, E) in $3 \cdot 2 \cdot 1 = 6$ different ways, which means that there are 6 different colorings.
C.14. We will use one color twice and the other colors once. Let us first count the
number of ways we can color the sides so there are two red sides. Label the sides
of the square with north, west, south, east. We can rotate any coloring uniquely
so the (only) blue side is the north side. The yellow side can be chosen now three
different ways (from the other three positions), and once we have that, the positions
of the red sides are determined. Thus there are three ways we can color the sides of
the square so that there are 2 red, 1 blue and 1 yellow side and colorings that can
be rotated to each other are treated the same. Similarly, we have three colorings
with 2 blue, 1 red and 1 yellow side, and three colorings with 2 yellow, 1 red and 1
blue side. This gives 9 possible colorings.
C.15. Imagine that we place the colored cube on the table so that one of the faces
is facing us. There are 6 different colorings of the cube where the red and blue faces
are on the opposite sides. Indeed: for such a coloring we can always rotate the cube
uniquely so that it rests on the red face and the yellow face is facing us (with blue
on the top). Now we can choose the colors of the other three faces 3 · 2 · 1 different
ways, which gives us 6 such colorings.
If the red and the blue faces are next to each other then we can always rotate
the cube uniquely so it rests on the red face and the blue face is facing us. The
remaining four faces can be colored 4 · 3 · 2 · 1 different ways, thus we have 24 such
colorings.
This gives 24 + 6 = 30 colorings all together.
C.16. Number the bead positions clockwise with 0, 1, ..., 17. We can choose the positions of the 7 green beads out of the 18 possibilities in $\binom{18}{7}$ different ways. However, this way we overcounted the number of necklaces, as we counted the rotated versions of each necklace separately. We will show that each necklace was counted exactly 18 times. A given necklace can be rotated 18 different ways (with the first position going into one of the eighteen possible positions); we just have to check that
two different rotations cannot give the same set of positions for the green beads. We prove this by contradiction. Assume that we have seven different positions $g_1, \ldots, g_7 \in \{0, 1, \ldots, 17\}$ so that if we rotate them by $0 < d < 18$ then we get the same set of positions. It can be shown that this can only happen if each two neighboring positions are separated by the same number of steps. But 7 does not divide 18, so this is impossible. Thus all 18 rotations of a necklace were counted separately, which means that the number of necklaces is $\frac{1}{18}\binom{18}{7} = 1768$.
C.17. Suppose that in a class there are $n$ girls and $n$ boys. There are $\binom{2n}{n}$ different ways we can choose a team of $n$ students out of this class of $2n$. For any $0 \le k \le n$ there are $\binom{n}{k}\cdot\binom{n}{n-k}$ ways to choose the team so that there are exactly $k$ girls and $n-k$ boys chosen. For $0 \le k \le n$ we have $\binom{n}{n-k} = \binom{n}{k}$ and thus $\binom{n}{k}\cdot\binom{n}{n-k} = \binom{n}{k}^2$.
By considering the possible values of the number of girls in the team we now get the identity
$$\binom{2n}{n} = \binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2.$$
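A short computational check of this identity (not part of the textbook solution):

from math import comb

for n in range(0, 15):
    assert comb(2 * n, n) == sum(comb(n, k) ** 2 for k in range(n + 1))
print("C(2n,n) = sum of C(n,k)^2 verified for n = 0,...,14")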
C.18. If $x = -1$ then the inequality is $0 \ge 1-n$, which certainly holds.
Now assume $x > -1$. For $n = 1$ both sides are equal to $1+x$, so the inequality is true. Assume now that the inequality holds for some positive integer $n$; we need to show that it holds for $n+1$ as well. By our induction assumption $(1+x)^n \ge 1+nx$, and because $x > -1$, we have $1+x > 0$. Hence we can multiply both sides of the previous inequality with $1+x$ to get
$$(1+x)^{n+1} \ge (1+nx)(1+x) = 1 + (n+1)x + nx^2.$$
Since $nx^2 \ge 0$ we get $(1+x)^{n+1} \ge 1+(n+1)x$, which proves the induction step, and finishes the proof.
C.19. Let $a_n = 11^n - 6$. We have $a_1 = 5$, which is divisible by 5. Now assume that for some positive integer $n$ the number $a_n$ is divisible by 5. We have
$$a_{n+1} = 11^{n+1} - 6 = 11\,(11^n - 6) + 60 = 11\,a_n + 60.$$
Both terms on the right are divisible by 5, hence so is $a_{n+1}$, which completes the induction step.
We will show that for all $n \ge 4$ we have $2^n \ge 4n$. This certainly holds for $n = 4$. Now assume that it holds for some integer $n \ge 4$; we will show that it also holds for $n+1$. Multiplying both sides of the inequality $2^n \ge 4n$ (which we assumed to be true) by 2 we get
$$2^{n+1} \ge 8n.$$
But $8n = 4(n+1) + 4(n-1) > 4(n+1)$ if $n \ge 4$. Thus $2^{n+1} \ge 4(n+1)$, which finishes the proof.
Appendix D.
D.1. We can separate the terms into two sums:
$$\sum_{k=1}^{n}(n + 2k) = \sum_{k=1}^{n} n + \sum_{k=1}^{n} 2k.$$
Note that in the first sum we add the constant term $n$ a total of $n$ times, so the sum is equal to $n^2$. The second sum is just twice the sum (D.6), so its value is $n(n+1)$. Thus
$$\sum_{k=1}^{n}(n + 2k) = n^2 + n(n+1) = 2n^2 + n.$$
D.2. For any fixed $i \ge 1$ we have $\sum_{j=1}^{\infty} a_{i,j} = a_{i,i} + a_{i,i+1} = 1 - 1 = 0$. Thus $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{i,j} = 0$.
If we fix $j \ge 1$ then
$$\sum_{i=1}^{\infty} a_{i,j} = \begin{cases} a_{1,1} = 1 & \text{if } j = 1,\\ a_{j-1,j} + a_{j,j} = -1 + 1 = 0 & \text{if } j > 1. \end{cases}$$
Thus $\sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a_{i,j} = 1$. This shows that for this particular choice of numbers $a_{i,j}$ we have
$$\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{i,j} \ne \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a_{i,j} = 1.$$
D.3. (a) Evaluating the sum on the inside first using (D.6):
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}\ell = \sum_{k=1}^{n}\frac{k(k+1)}{2} = \sum_{k=1}^{n}\Big(\frac12 k^2 + \frac12 k\Big).$$
Separating the sum in two parts and then using (D.6) and (D.7):
$$\sum_{k=1}^{n}\Big(\frac12 k^2 + \frac12 k\Big) = \frac12\sum_{k=1}^{n}k^2 + \frac12\sum_{k=1}^{n}k = \frac12\cdot\frac{n(n+1)(2n+1)}{6} + \frac12\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{12}\,(2n+1+3) = \frac{n^3}{6} + \frac{n^2}{2} + \frac{n}{3}.$$
(b) Since the sum on the inside has $k$ terms that are all equal to $k$ we get
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}k = \sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}{6} = \frac13 n^3 + \frac12 n^2 + \frac16 n.$$
(c) Write the double sum as three separate double sums. The second and third sums can be evaluated using parts (a) and (b). The first sum is
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}7 = \sum_{k=1}^{n}7k = \frac{7n(n+1)}{2} = \frac72 n^2 + \frac72 n.$$
Thus we get
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}(7 + 2k + \ell) = \frac72 n^2 + \frac72 n + 2\cdot\Big(\frac13 n^3 + \frac12 n^2 + \frac16 n\Big) + \Big(\frac{n^3}{6} + \frac{n^2}{2} + \frac{n}{3}\Big) = \frac56 n^3 + 5n^2 + \frac{25}{6}n.$$
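As a quick check (not from the book), the sketch below evaluates the three double sums directly and compares them with the closed forms in parts (a)–(c), using exact rational arithmetic.

from fractions import Fraction as F

for n in range(1, 30):
    a = sum(l for k in range(1, n + 1) for l in range(1, k + 1))
    b = sum(k for k in range(1, n + 1) for l in range(1, k + 1))
    c = sum(7 + 2 * k + l for k in range(1, n + 1) for l in range(1, k + 1))
    assert a == F(n**3, 6) + F(n**2, 2) + F(n, 3)
    assert b == F(n**3, 3) + F(n**2, 2) + F(n, 6)
    assert c == F(5 * n**3, 6) + 5 * n**2 + F(25 * n, 6)
print("double-sum identities verified for n = 1,...,29")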
D.4. $\sum_{j=i}^{n} j$ is the sum of the arithmetic progression $i, i+1, \ldots, n$, which has $n-i+1$ elements, so its value is $(n-i+1)\frac{n+i}{2}$. Thus
$$\sum_{i=1}^{n}\sum_{j=i}^{n} j = \sum_{i=1}^{n}(n-i+1)\frac{n+i}{2} = \sum_{i=1}^{n}\frac12\big({-i^2} + i + n^2 + n\big) = -\frac12\sum_{i=1}^{n}i^2 + \frac12\sum_{i=1}^{n}i + \frac12\sum_{i=1}^{n}(n^2+n).$$
The first and second sums can be computed using the identities (D.6) and (D.7):
$$\frac12\sum_{i=1}^{n}i^2 = \frac{n(n+1)(2n+1)}{12}, \qquad \frac12\sum_{i=1}^{n}i = \frac{n(n+1)}{4},$$
while the third sum equals $\frac12 n(n^2+n)$. Combining these,
$$\sum_{i=1}^{n}\sum_{j=i}^{n} j = -\frac{n(n+1)(2n+1)}{12} + \frac{n(n+1)}{4} + \frac{n^2(n+1)}{2} = \frac{n(n+1)(2n+1)}{6}.$$
Alternatively, we can switch the order of summation: $\sum_{i=1}^{n}\sum_{j=i}^{n} j = \sum_{j=1}^{n}\sum_{i=1}^{j} j$. (The switching of the order of the summation is justified because we have a finite sum.) The inside sum is easy to evaluate because the summand does not depend
on $i$: $\sum_{i=1}^{j} j = j\cdot j = j^2$. Then
$$\sum_{j=1}^{n}\sum_{i=1}^{j} j = \sum_{j=1}^{n} j^2 = \frac{n(n+1)(2n+1)}{6},$$
by (D.7).
D.5. (a) From (D.1) we have
$$\sum_{j=i}^{\infty} x^j = x^i + x^{i+1} + x^{i+2} + \cdots = x^i\sum_{n=0}^{\infty}x^n = \frac{x^i}{1-x}.$$
Thus
$$\sum_{i=1}^{\infty}\sum_{j=i}^{\infty} x^j = \sum_{i=1}^{\infty}\frac{x^i}{1-x} = \frac{x}{1-x}\,(1 + x + x^2 + \cdots) = \frac{x}{1-x}\sum_{n=0}^{\infty}x^n = \frac{x}{1-x}\cdot\frac{1}{1-x} = \frac{x}{(1-x)^2}.$$
This is exactly the sum that we computed in part (a), which shows that the answer is again $\frac{x}{(1-x)^2}$. The fact that we can switch the order of the summation follows from the fact that the double sum in (a) is finite even if we put absolute values around each term.
D.6. We use induction. For $n = 1$ the two sides are equal: $1^2 = \frac{1\cdot2\cdot(2\cdot1+1)}{6}$. Assume that the identity holds for $n \ge 1$; we will show that it also holds for $n+1$. By the induction hypothesis
$$1^2 + 2^2 + \cdots + n^2 + (n+1)^2 = \frac{n(n+1)(2n+1)}{6} + (n+1)^2 = \frac{n+1}{6}\big(n(2n+1) + 6(n+1)\big) = \frac{n+1}{6}\big(2n^2 + 7n + 6\big) = \frac{(n+1)(n+2)(2n+3)}{6}.$$
The last formula is exactly the right side of (D.7) for $n+1$ in place of $n$, which proves the induction step and the statement.
D.7. We prove the identity by induction. The identity holds for $n = 1$. Assume that it holds for $n \ge 1$; we will show that it also holds for $n+1$. By the induction
hypothesis
$$1^3 + 2^3 + \cdots + n^3 + (n+1)^3 = \frac{n^2(n+1)^2}{4} + (n+1)^3 = (n+1)^2\Big(\frac{n^2}{4} + n + 1\Big) = (n+1)^2\,\frac{n^2+4n+4}{4} = \frac{(n+1)^2(n+2)^2}{4}.$$
This is exactly (D.8) stated for $n+1$, which completes the proof.
D.8. First note that both sums have finitely many terms, because $\binom{n}{k} = 0$ if $k > n$. If we move every term to the left side then we get
$$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \binom{n}{4} - \cdots.$$
We would like to show that this expression is zero. Note that the alternating signs can be expressed using powers of $-1$, hence the expression above is equal to $\sum_{k=0}^{n}(-1)^k\binom{n}{k} = \sum_{k=0}^{n}(-1)^k\cdot 1^{n-k}\binom{n}{k}$. But this is exactly equal to $(-1+1)^n = 0^n = 0$ by the binomial theorem. Hence $\sum_{k=0}^{n}(-1)^k\binom{n}{k} = 0$ and
$$\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots.$$
Using the binomial theorem for $(1+1)^n$ we get $\sum_{k=0}^{n}\binom{n}{k} = 2^n$. Introducing
$$a_n = \binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots, \qquad b_n = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots,$$
we have just shown that $a_n = b_n$ and $a_n + b_n = 2^n$. This yields $a_n = b_n = 2^{n-1}$. But $a_n$ is exactly the number of even subsets of a set of size $n$ (as it counts the number of subsets with $0, 2, 4, \ldots$ elements), thus the number of even subsets is $2^{n-1}$. Similarly, the number of odd subsets is also $2^{n-1}$.
D.9. We would like to show (D.10) for all $x, y$ and $n \ge 1$. For $n = 1$ the two sides are equal. Assume that the statement holds for $n$; we will prove that it also holds for $n+1$. By the induction hypothesis
$$(x+y)^{n+1} = (x+y)\cdot(x+y)^n = (x+y)\sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k} = \sum_{k=0}^{n}\binom{n}{k}x^{k+1}y^{n-k} + \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k+1}.$$
Shifting the index in the first sum gives
$$\sum_{k=0}^{n}\binom{n}{k}x^{k+1}y^{n-k} + \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k+1} = \sum_{k=1}^{n+1}\binom{n}{k-1}x^k y^{n+1-k} + \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k+1}$$
$$= x^{n+1} + y^{n+1} + \sum_{k=1}^{n}\left(\binom{n}{k-1} + \binom{n}{k}\right)x^k y^{n+1-k}.$$
Since $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ by Exercise C.11, this gives
$$(x+y)^{n+1} = x^{n+1} + y^{n+1} + \sum_{k=1}^{n}\binom{n+1}{k}x^k y^{n+1-k} = \sum_{k=0}^{n+1}\binom{n+1}{k}x^k y^{n+1-k},$$
which is exactly the statement we have to prove for $n+1$. This proves the induction step and the theorem.
D.11. This can be done similarly to Exercise D.9. We outline the proof for $r = 3$; the general case is similar (with more indices). We need to show that
$$(x_1+x_2+x_3)^n = \sum_{\substack{k_1\ge0,\ k_2\ge0,\ k_3\ge0\\ k_1+k_2+k_3=n}}\binom{n}{k_1,k_2,k_3}\,x_1^{k_1}x_2^{k_2}x_3^{k_3}.$$
For $n = 1$ the two sides are equal: the only possible triples $(k_1, k_2, k_3)$ are $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$ and these give the terms $x_1$, $x_2$ and $x_3$. Now assume that the equation holds for some $n$; we would like to show it for $n+1$. Take the equation for $n$ and multiply both sides with $x_1+x_2+x_3$. Then on one side we get $(x_1+x_2+x_3)^{n+1}$, while the other side is
$$\sum_{\substack{k_1\ge0,\ k_2\ge0,\ k_3\ge0\\ k_1+k_2+k_3=n}}\binom{n}{k_1,k_2,k_3}\Big(x_1^{k_1+1}x_2^{k_2}x_3^{k_3} + x_1^{k_1}x_2^{k_2+1}x_3^{k_3} + x_1^{k_1}x_2^{k_2}x_3^{k_3+1}\Big).$$
The coefficient of $x_1^{a_1}x_2^{a_2}x_3^{a_3}$ for a given $0 \le a_1$, $0 \le a_2$, $0 \le a_3$ with $a_1+a_2+a_3 = n+1$ is equal to
$$\binom{n}{a_1-1, a_2, a_3} + \binom{n}{a_1, a_2-1, a_3} + \binom{n}{a_1, a_2, a_3-1},$$
which can be shown to be equal to $\binom{n+1}{a_1,a_2,a_3}$. (This is a generalization of Exercise C.11 and can be shown the same way.) But this means that
$$(x_1+x_2+x_3)^{n+1} = \sum_{\substack{k_1\ge0,\ k_2\ge0,\ k_3\ge0\\ k_1+k_2+k_3=n}}\binom{n}{k_1,k_2,k_3}\Big(x_1^{k_1+1}x_2^{k_2}x_3^{k_3} + x_1^{k_1}x_2^{k_2+1}x_3^{k_3} + x_1^{k_1}x_2^{k_2}x_3^{k_3+1}\Big) = \sum_{\substack{a_1\ge0,\ a_2\ge0,\ a_3\ge0\\ a_1+a_2+a_3=n+1}}\binom{n+1}{a_1,a_2,a_3}\,x_1^{a_1}x_2^{a_2}x_3^{a_3},$$
which completes the induction step. Alternatively, expand the product of $n$ factors $(x_1 + \cdots + x_r)$: a term $x_1^{k_1}x_2^{k_2}\cdots x_r^{k_r}$ arises by choosing $x_1$ from $k_1$ of the factors, $x_2$ from $k_2$ of them, and so on, and the number of ways we can do that is exactly the multinomial coefficient $\binom{n}{k_1,k_2,\ldots,k_r}$. This proves the identity (D.11).