Introduction to Probability: Solutions
David F. Anderson
Timo Seppäläinen
Benedek Valkó
Contents
Preface
Solutions to Chapter 1
Solutions to Chapter 2
Solutions to Chapter 3
Solutions to Chapter 4
Solutions to Chapter 5
Preface
This collection of solutions is a reference for the instructors who use our book.
The authors firmly believe that the best way to master new material is via problem
solving. Having all the detailed solutions readily available would undermine this
process. Hence, we ask that instructors not distribute this document to the students
in their courses.
The authors welcome comments and corrections to the solutions. A list of
corrections and clarifications to the textbook is updated regularly at the website
https://www.math.wisc.edu/asv/
Solutions to Chapter 1
1.2. (a) Since Bob has to choose exactly two options, Ω consists of the 2-element
subsets of the set {cereal, eggs, fruit}:
Ω = {{cereal, eggs}, {cereal, fruit}, {eggs, fruit}}.
The items in Bob’s breakfast do not come in any particular order, hence the
outcomes are sets instead of ordered pairs.
(b) The two outcomes in the event A are {cereal, eggs} and {cereal, fruit}. In
symbols,
A = {Bob’s breakfast includes cereal} = {{cereal, eggs}, {cereal, fruit}}.
1.3. (a) This is a Cartesian product where the first factor covers the outcome of
the coin flip ({H, T } or {0, 1}, depending on how you want to encode heads
and tails) and the second factor represents the outcome of the die. Hence
Ω = {0, 1} × {1, 2, . . . , 6} = {(i, j) : i = 0 or 1 and j ∈ {1, 2, . . . , 6}}.
(b) Now we need a larger Cartesian product space because the outcome has to
contain the coin flip and die roll of each person. Let ci be the outcome of the
coin flip of person i, and let di be the outcome of the die roll of person i. Index
i runs from 1 to 10 (one index value for each person). Each c_i ∈ {0, 1} and each
d_i ∈ {1, 2, . . . , 6}. Here are various ways of writing down the sample space:
Ω = ({0, 1} × {1, 2, . . . , 6})^10
  = {(c_1, d_1, c_2, d_2, . . . , c_10, d_10) : each c_i ∈ {0, 1} and each d_i ∈ {1, 2, . . . , 6}}
  = {(c_i, d_i)_{1≤i≤10} : each c_i ∈ {0, 1} and each d_i ∈ {1, 2, . . . , 6}}.
The last formula illustrates the use of indexing to shorten the writing of the
20-tuple of all outcomes. The number of elements is #Ω = 2^10 · 6^10 = 12^10 =
61,917,364,224.
(c) If nobody rolled a five, then each die outcome d_i comes from the set {1, 2, 3, 4, 6}
that has 5 elements. Hence the number of these outcomes is 2^10 · 5^10 = 10^10.
To get the number of outcomes where at least one person rolls a five, subtract
the number of outcomes where no one rolls a five from the total: 12^10 − 10^10 =
51,917,364,224.
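The counts above are easy to confirm numerically. The following short Python snippet (not part of the original solution; the variable names are ours) reproduces both numbers.

```python
# Counting outcomes for 10 people, each flipping a coin and rolling a die.
total = 2**10 * 6**10            # all outcomes, equals 12**10
no_five = 2**10 * 5**10          # outcomes in which nobody rolls a five
at_least_one_five = total - no_five

print(total)               # 61917364224
print(at_least_one_five)   # 51917364224
```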
1.4. (a) This is an example of sampling with replacement, where order matters.
Thus, the sample space is
Ω = {ω = (x_1, x_2, x_3) : x_i ∈ {states in the U.S.}}.
In other words, each sample point is a 3-tuple or ordered triple of U.S. states.
The problem statement contains the assumption that every day each state
is equally likely to be chosen. Since #Ω = 50^3 = 125,000, each sample point
ω has equal probability P{ω} = 1/50^3 = 1/125,000. This specifies the probability
measure completely because then the probability of any event A comes from
the formula P(A) = #A/125,000.
(b) The 3-tuple (Wisconsin, Minnesota, Florida) is a particular outcome, and hence
as explained above,
P((Wisconsin, Minnesota, Florida)) = 1/50^3.
(c) The number of ways to have Wisconsin come on Monday and Tuesday, but not
Wednesday is 1 · 1 · 49, with similar expressions for the other combinations.
Since there is only 1 way for Wisconsin to come each of the three days, we see
that the total number of favorable outcomes is
1 · 1 · 49 + 1 · 49 · 1 + 49 · 1 · 1 + 1 = 3 · 49 + 1 = 148.
Thus
P(Wisconsin's flag hung at least two of the three days)
  = (3 · 49 + 1)/50^3 = 148/125,000 = 37/31,250 = 0.001184.
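As a quick numerical cross-check (not part of the original solution), the count and the probability can be reproduced exactly with Python's Fraction type.

```python
from fractions import Fraction

# Wisconsin's flag on at least two of the three days.
favorable = 1 * 1 * 49 + 1 * 49 * 1 + 49 * 1 * 1 + 1   # 148
prob = Fraction(favorable, 50**3)
print(prob, float(prob))   # 37/31250 0.0011840
```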
1.5. (a) There are two natural sample spaces we can choose, depending upon
whether or not we want to let order matter.
If we let the order of the numbers matter, then we may choose
Ω_1 = {(x_1, . . . , x_5) : x_i ∈ {1, . . . , 40}, x_i ≠ x_j if i ≠ j},
the set of ordered 5-tuples of distinct elements from the set {1, 2, 3, . . . , 40}. In
this case #Ω_1 = 40 · 39 · 38 · 37 · 36 and P_1(ω) = 1/#Ω_1 for each ω ∈ Ω_1.
If we do not let order matter, then we take
Ω_2 = {{x_1, . . . , x_5} : x_i ∈ {1, 2, 3, . . . , 40}, x_i ≠ x_j if i ≠ j},
the set of 5-element subsets of the set {1, 2, 3, . . . , 40}. In this case #Ω_2 = \binom{40}{5}
and P_2(ω) = 1/#Ω_2 for each ω ∈ Ω_2.
(b) The correct calculation for this question depends on which sample space was
chosen in part (a).
When order matters, we imagine filling the positions of the 5-tuple with
three even and two odd numbers. There are \binom{5}{3} ways to choose the positions
of the three even numbers. The remaining two positions are for the two odd
numbers. We fill these positions in order, separately for the even and odd
numbers. There are 20 · 19 · 18 ways to choose the even numbers and 20 · 19
ways to choose the odd numbers. This gives
P(exactly three numbers are even) = [\binom{5}{3} · 20 · 19 · 18 · 20 · 19] / [40 · 39 · 38 · 37 · 36] = 475/1443.
When order does not matter, we choose sets. There are \binom{20}{3} ways to choose
a set of three even numbers between 1 and 40, and \binom{20}{2} ways to choose a set of
two odd numbers. Therefore, the probability can be computed as
P(exactly three numbers are even) = \binom{20}{3}\binom{20}{2} / \binom{40}{5} = 475/1443.
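The unordered count is straightforward to verify numerically; the short Python check below (not part of the original solution) confirms the fraction 475/1443.

```python
from fractions import Fraction
from math import comb

# Choose 3 of the 20 even numbers and 2 of the 20 odd numbers out of 5 picks from 40.
p = Fraction(comb(20, 3) * comb(20, 2), comb(40, 5))
print(p)   # 475/1443
```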
1.6. We give two solutions, first with an ordered sample, and then without order.
(a) Label the three green balls 1, 2, and 3, and label the yellow balls 4, 5, 6, and
7. We imagine picking the balls in order, and hence take
Ω = {(i, j) : i, j ∈ {1, 2, . . . , 7}, i ≠ j},
the set of ordered pairs of distinct elements from the set {1, 2, . . . , 7}. The
event of two different colored balls is
A = {(i, j) : (i ∈ {1, 2, 3} and j ∈ {4, . . . , 7}) or (i ∈ {4, . . . , 7} and j ∈ {1, 2, 3})}.
(b) We have #Ω = 7 · 6 = 42 and #A = 3 · 4 + 4 · 3 = 24. Thus,
P(A) = 24/42 = 4/7.
Alternatively, we could have chosen a sample space in which order does not
matter. In this case the size of the sample space is \binom{7}{2}. There are \binom{3}{1} ways to
choose one of the green balls and \binom{4}{1} ways to choose one yellow ball. Hence,
the probability is computed as
P(A) = \binom{3}{1}\binom{4}{1} / \binom{7}{2} = 4/7.
1.7. (a) Label the balls 1 through 7, with the green balls labeled 1, 2 and 3, and
the yellow balls labeled 4, 5, 6 and 7. Let
Ω = {(i, j, k) : i, j, k ∈ {1, 2, . . . , 7}, i ≠ j, j ≠ k, i ≠ k},
which captures the idea that order matters for this problem. Note that #Ω =
7 · 6 · 5. There are exactly
3 · 4 · 2 = 24
ways to choose first a green ball, then a yellow ball, and then a green ball. Thus
the desired probability is
P(green, yellow, green) = 24/(7 · 6 · 5) = 4/35.
(b) We can use the same reasoning as in the previous part, by accounting for all
the different orders in which the colors can come:
P(2 greens and one yellow) = P(green, green, yellow) + P(green, yellow, green) + P(yellow, green, green)
  = (3 · 2 · 4 + 3 · 4 · 2 + 4 · 3 · 2)/(7 · 6 · 5) = 72/210 = 12/35.
Alternatively, since this question does not require ordering the sample of
balls, we can take
Ω = {{i, j, k} : i, j, k ∈ {1, 2, . . . , 7}, i ≠ j, j ≠ k, i ≠ k},
the set of 3-element subsets of the set {1, 2, . . . , 7}. Now #Ω = \binom{7}{3}. There are
\binom{3}{2} ways to choose 2 green balls from the 3 green balls, and \binom{4}{1} ways to choose
one yellow ball from the 4 yellow balls. So the desired probability is
P(2 greens and one yellow) = \binom{3}{2}\binom{4}{1} / \binom{7}{3} = 12/35.
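A simulation gives a quick sanity check of the value 12/35 ≈ 0.343. The sketch below is not part of the original solution; it uses the same labeling of the balls (1 to 3 green, 4 to 7 yellow).

```python
import random

# Monte Carlo estimate of P(exactly 2 greens among 3 balls drawn without replacement).
trials = 100_000
hits = sum(
    1 for _ in range(trials)
    if sum(b <= 3 for b in random.sample(range(1, 8), 3)) == 2
)
print(hits / trials)   # roughly 0.343, close to 12/35
```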
1.8. (a) Label the letters from 1 to 14 so that the first 5 are Es, the next 4 are As,
the next 3 are Ns and the last 2 are Bs.
Our ⌦ consists of (ordered) sequences of four distinct elements:
Ω = {(a_1, a_2, a_3, a_4) : a_i ≠ a_j for i ≠ j, a_i ∈ {1, 2, . . . , 14}}.
The size of Ω is 14 · 13 · 12 · 11 = 24024. (Because we can choose a_1 in 14 different
ways, then a_2 in 13 different ways, and so on.)
The event C consists of sequences (a1 , a2 , a3 , a4 ) consisting of two numbers
between 1 and 5, one between 6 and 9 and one between 10 and 12. We can
count these by constructing such a sequence step-by-step: we first choose the
positions of the two Es: we can do that in \binom{4}{2} = 6 ways. Then we choose a first E
out of the 5 choices and place it in the first chosen position. Then we choose
the second E out of the remaining 4 and place it in the second (remaining)
chosen position. Then we choose the A out of the 4 choices, and its position
(there are 2 possibilities left). Finally we choose the letter N out of the 3 choices
and place it in the remaining position (we only have one possibility here). In
each step the number of choices did not depend on the previous choices so we
can just multiply the numbers together to get 6 · 5 · 4 · 4 · 2 · 3 · 1 = 2880.
The probability of C is
P(C) = #C/#Ω = 2880/24024 = 120/1001.
(b) As before, we label the letters from 1 to 14 so that the first 5 are Es, the next
4 are As, the next 3 are Ns and the last 2 are Bs. Our Ω is the set of unordered
samples of size 4, or in other words: all subsets of {1, 2, . . . , 14} of size 4:
P(A) = P{ω ∈ [0, L] : ω ≤ L/5} + P{ω ∈ [0, L] : ω ≥ 4L/5} = (L/5)/L + (L/5)/L = 2/5.
1.10. (a) Since the outcome of the experiment is the number of times we roll the
die (as in Example 1.16), we take
Ω = {1, 2, 3, . . . } ∪ {∞}.
Element k in Ω means that it took k rolls to see the first four. Element ∞
means that four never appeared.
Next we deduce the probability measure P on Ω. Since Ω is a discrete
sample space (countably infinite), P is determined by giving the probabilities
of all the individual sample points.
For an integer k ≥ 1, we have
P(k) = P{needed k rolls} = P{no fours in the first k − 1 rolls, then a four}.
Each roll has 6 outcomes so the total number of outcomes from k rolls is 6^k.
Each roll can fail to be a four in 5 ways. Hence by taking the ratio of the
number of favorable outcomes over the total number of outcomes,
P(k) = P{no fours in the first k − 1 rolls, then a four} = 5^{k−1} · 1 / 6^k = (5/6)^{k−1} · 1/6.
To complete the specification of the measure P, we find the value P(∞). Since
the outcomes are mutually exclusive,
1 = P(Ω) = P(∞) + Σ_{k=1}^∞ P(k)
          = P(∞) + Σ_{k=1}^∞ (5/6)^{k−1} · 1/6
(reindex) = P(∞) + (1/6) Σ_{j=0}^∞ (5/6)^j
(geometric series) = P(∞) + (1/6) · 1/(1 − 5/6)
          = P(∞) + 1.
Thus, P(∞) = 0.
(b) We already deduced above that
P(the number four never appears) = P(∞) = 0.
Here is an alternative solution.
P(the number four never appears) ≤ P(no fours in the first n rolls) = (5/6)^n.
Since (5/6)^n → 0 as n → ∞ and the inequality holds for any n, the probability
on the left must be zero.
1.11. The sample space Ω that represents the dartboard itself is a square of side
length 20 inches. We can assume that the center of the board is at the origin. The
event A, that the dart hits within 2 inches of the center, is then the subset of Ω
described by A = {x : |x| ≤ 2}. Probability is now proportional to area, and so
P(A) = (area of A)/(area of the board) = π · 2^2 / 20^2 = π/100.
1.12. The sample space and probability measure for this experiment were described
in the solution to Exercise 1.10: P(k) = (5/6)^{k−1} · 1/6 for positive integers k.
(a) P(need at most 3 rolls) = P(1) + P(2) + P(3) = (1/6)(1 + 5/6 + (5/6)^2) = 91/216.
(b)
P(even number of rolls) = Σ_{m=1}^∞ P(2m) = Σ_{m=1}^∞ (5/6)^{2m−1} · 1/6 = (1/5) Σ_{m=1}^∞ (25/36)^m
  = (1/5) · (25/36)/(1 − 25/36) = 5/11.
1.13. (a) Imagine selecting one student uniformly at random from the school.
Thus, ⌦ is the set of students and each outcome is equally likely. Let W
be the subset of ⌦ consisting of those students who wear a watch. Let B be
the subset of students who wear a bracelet. We are told that
P (W c B c ) = 0.6, P (W ) = 0.25, P (B) = 0.30.
1.16. If we see only heads, I win $5. If we see 4 heads, I win $3. If we see 3
heads, I win $1. If we see 2 heads, I “win” -$1. If we see 1 heads, I “win” -$3.
Finally, if we see 0 heads, then I “win” -$5. Thus, the possible values of X are
{−5, −3, −1, 1, 3, 5}. The sample space for the 5 coin flips is Ω = {(x_1, . . . , x_5) :
x_i ∈ {H, T}} with #Ω = 2^5. Each individual outcome (x_1, . . . , x_5) of five flips has
probability 2^{−5}.
Let k ∈ {0, 1, . . . , 5}. To calculate the probability of exactly k heads we need
to count how many five-flip outcomes yield exactly k heads. The answer is \binom{5}{k}, the
number of ways of specifying which of the five flips are heads. Hence
P(precisely k heads) = (# ways to select k slots from the 5 for the k heads)/2^5 = \binom{5}{k} 2^{−5}.
Thus,
P(X = −5) = P(0 heads) = 2^{−5}
P(X = −3) = P(1 head) = 5 · 2^{−5}
P(X = −1) = P(2 heads) = \binom{5}{2} · 2^{−5}
P(X = 1) = P(3 heads) = \binom{5}{3} · 2^{−5}
P(X = 3) = P(4 heads) = \binom{5}{4} · 2^{−5}
P(X = 5) = P(5 heads) = 2^{−5}.
p_W(0) = P(W = 0) = (4 · 4)/(7 · 7) = 16/49,
p_W(1) = P(W = 1) = (4 · 3 + 3 · 4)/(7 · 7) = 24/49,
p_W(2) = P(W = 2) = (3 · 3)/(7 · 7) = 9/49.
1.18. The possible values of X are {3, 4, 5} as these are the possible lengths of the
words. The probability mass function is
P(X = 3) = P(we chose one of the letters of ARE) = 3/16
P(X = 4) = P(we chose one of the letters of SOME or DOGS) = 8/16 = 1/2
P(X = 5) = P(we chose one of the letters of BROWN) = 5/16.
1.19. The possible values of X are 5 and 1. For the probability mass function we
need P (X = 1) and P (X = 5). From the wording of the problem
P (X = 5) = P (dart lands within 2 inches of the center).
We may assume that the position of the dart is chosen uniformly from the disk of
radius 6 inches, and hence we may compute the probability above as the ratio of
the area of the disk of radius 2 to the area of the entire disk of radius 6:
P(dart lands within 2 inches of the center) = π · 2^2 / (π · 6^2) = 1/9.
Since P(X = 5) + P(X = 1) = 1, we get P(X = 1) = 1 − P(X = 5) = 8/9.
1.20. (a) One appropriate sample space is
Ω = {1, . . . , 6}^4 = {(x_1, x_2, x_3, x_4) : x_i ∈ {1, . . . , 6}}.
Note that #Ω = 6^4 = 1296. Since it is reasonable to assume that all outcomes
are equally likely, we set
P(ω) = 1/#Ω = 1/1296.
(b) To find P (A) and P (B) we count to find #A and #B, that is, the number of
outcomes in these events.
Begin with the easy observation: there is only one way for there to be four
fives, namely (5, 5, 5, 5). There are 5 ways to get three fives in the pattern
(5, 5, 5, X), one for each X ∈ {1, 2, 3, 4, 6}. Similarly, there are 5 ways to have
three fives in each of the patterns (5, 5, X, 5), (5, X, 5, 5) and (X, 5, 5, 5). Thus,
there are a total of 5 + 5 + 5 + 5 = 20 ways to have three fives. A slicker way to
calculate this would be to note that there are \binom{4}{1} = 4 ways to choose which roll
is not a five, and for each not-five we have 5 choices, thus altogether 4 · 5 = 20.
Continuing this logic, we see that the number of ways to have precisely two
fives is:
(#ways to choose the not-five rolls) · 5 · 5 = \binom{4}{2} · 5 · 5 = 150.
Thus,
P(A) = #A/#Ω = (1 + 20 + 150)/1296 = 171/1296 = 19/144.
Similarly,
P(B) = #B/#Ω = (5^4 + \binom{4}{1} · 5^3)/1296 = 1125/1296 = 125/144.
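For a quick numerical check (not part of the original solution), the two counts can be recomputed with Python; here we read B as the complementary event of at most one five, which is what the count 1125 = 5^4 + 4 · 5^3 corresponds to.

```python
from fractions import Fraction
from math import comb

p_A = Fraction(1 + comb(4, 1) * 5 + comb(4, 2) * 25, 6**4)   # at least two fives
p_B = Fraction(5**4 + comb(4, 1) * 5**3, 6**4)               # at most one five
print(p_A, p_B, p_A + p_B)   # 19/144 125/144 1
```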
(b) Use the same labels for the chips as in part (a). Our sample space is
Ω = {{x_1, x_2, x_3} : x_i ∈ {1, . . . , 7}, x_i ≠ x_j for i ≠ j}.
Note that the sample points are now subsets of size 3 instead of ordered triples,
and to indicate this the notation changed from (x_1, x_2, x_3) to {x_1, x_2, x_3}. We
have #Ω = \binom{7}{3} = (7 · 6 · 5)/3! = 35. #A = 3 · 2 · 2 = 12, the number of ways to choose
one of three black chips, one of two red chips and one of two green chips. Thus
P(A) = #A/#Ω = 12/35. The answer is the same as in part (a), as it should be.
1.22. (a) The sample space is the set of 52 cards. We can represent the cards with
numbers from 1 to 52, or with their names. Since each outcome is equally
likely, P{ω} = 1/52 for any fixed card ω. For any subset A of cards we have
P(A) = #A/52.
(b) An event is a subset of the sample space Ω. In part (a) we saw that for an event
A we have P(A) = #A/52. So the desired event must have three elements. Any
such set will work, for example {♥2, ♥3, ♥K}. In words, this is the event that
the chosen card is the two of hearts, the three of hearts or the king of hearts.
(c) By part (a), if P(A) = 1/5 then #A/52 = 1/5, which forces #A = 52/5. Since 52/5 is not
an integer, there cannot be a subset with this many elements. Consequently
this probability space has no event with probability 1/5.
1.23. (a) You win if the prize is behind door 1. Probability 1/3.
(b) You win if the prize is behind door 2 or 3. Probability 2/3.
1.24. Choose door 3 and commit to switch. Then probability of winning is p1 + p2 .
1.25. (a) Since there are 5 restaurants with at least one friend out of 6 total restaurants, this probability is 5/6.
(b) She has 7 friends in total. 3 of them are at a restaurant alone and 4 of them
are at a restaurant with somebody else. Thus the probability that she calls a
friend at a restaurant with 2 friends present is 4/7.
1.26. This is sampling without replacement for it would make no sense to put the
same person twice on the committee. We are choosing 4 out of 15. We can do this
with order (there is a first pick, a second pick, etc) or without order (we choose
the subset of 4). It does not matter which approach we choose. But once we have
chosen a method, our calculations have to be consistent. If we work with order then
chips (a1 , a2 or a3 ) and place it in position i. (We have three choices for that.)
Then we distribute the remaining 6 numbers among the remaining 6 places.
(There are 6! ways we can do that.) Thus for any 1 ≤ i ≤ 7 we get #B_i = 3 · 6!
and then
P(B_i) = #B_i/#Ω = 3 · 6!/7! = 3/7.
1.28. Assume that both m and n are at least 1 so the problem is not trivial.
(a) Sampling without replacement. We can compute the answer using either an
ordered or an unordered sample. It helps to assume that the balls are labeled
(e.g. by numbering them from 1 to m + n), although the actual labeling will
not play a role in the computation.
With an ordered sample we have (m + n)(m + n − 1) outcomes (we have m + n
choices for the first pick and m + n − 1 choices for the second). The favorable
outcomes can be counted by considering green-green and yellow-yellow pairs
separately: their number is m(m − 1) + n(n − 1). The answer is the ratio of
the number of favorable outcomes to the total number of outcomes,
P{(g,g) or (y,y)} = [m(m − 1) + n(n − 1)] / [(m + n)(m + n − 1)].
The unordered sample calculation gives the same answer:
P{a set of two greens or a set of two yellows} = [\binom{m}{2} + \binom{n}{2}] / \binom{m+n}{2} = [m(m − 1) + n(n − 1)] / [(m + n)(m + n − 1)].
Note: for integers 0 ≤ k < ℓ, the convention is \binom{k}{ℓ} = 0. This makes the
answers above correct even if m or n or both are 1.
(b) Sampling with replacement. Now the sample has to be ordered (there is a
first pick and a second pick). The total number of outcomes is (m + n)^2, and
the number of favorable outcomes (again counting the green-green and yellow-yellow pairs separately) is m^2 + n^2. This gives
P{(g,g) or (y,y)} = (m^2 + n^2)/(m + n)^2.
(c) We simplify the inequality through a sequence of equivalences, by cancelling
factors, multiplying away the denominators, and then cancelling some more.
answer to (a) < answer to (b)
⟺ [m(m − 1) + n(n − 1)] / [(m + n)(m + n − 1)] < (m^2 + n^2)/(m + n)^2
⟺ [m(m − 1) + n(n − 1)] / (m + n − 1) < (m^2 + n^2)/(m + n)
⟺ (m(m − 1) + n(n − 1))(m + n) < (m^2 + n^2)(m + n − 1)
⟺ (m^2 − m + n^2 − n)(m + n) < (m^2 + n^2)(m + n) − m^2 − n^2
⟺ (−m − n)(m + n) < −m^2 − n^2
⟺ (m + n)^2 > m^2 + n^2
⟺ 2mn > 0.
The last inequality is always true for positive m and n. Since the last inequality
is equivalent to the first one, the first one is also always true.
The conclusion we take from this is that if you want to maximize your
chances of getting two of the same color, you want to sample with replacement
rather than without replacement. Intuitively this should be obvious: once you
remove a ball, you have diminished the chances of drawing another one of the
same color.
1.29. (a) Label the liberals 1 through 7 and the conservatives 8 through 13. We
do not care about order, so
Ω = {{x_1, x_2, x_3, x_4, x_5} : x_i ∈ {1, . . . , 13}, x_i ≠ x_j if i ≠ j},
in other words the set of 5-element subsets of the set {1, 2, . . . , 13}. Note that
#Ω = \binom{13}{5}. The event A is
A solution without order comes by erasing the labels of the rooks and only
considering the set of squares they occupy. For the number of sets of 8 squares
that share no row or column we can take the count (8!)^2 from the previous answer
and divide it by the number of orderings of the rooks, namely 8!. This leaves
(8!)^2/8! = 8! as the number of sets of 8 squares that share no row or column.
Alternately, pick the squares one column at a time. There are 8 choices for the
square from the first column, 7 available squares in the second column, 6 in the
third, and so on, to give 8! sets of 8 squares that share no row or column.
The total number of sets of 8 squares is \binom{64}{8}. So again
P(no two rooks can capture each other) = 8!/\binom{64}{8} = (8!)^2/(64 · 63 · 62 · · · 57) ≈ 0.000009109.
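The small probability above is easy to confirm numerically with a one-line Python check (not part of the original solution).

```python
from math import comb, factorial

# P(no two of the 8 randomly placed rooks share a row or column).
print(factorial(8) / comb(64, 8))   # approximately 9.109e-06
```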
1.31. (a) Number the cards in the deck 1, 2, . . . , 52, with the numbers 1, 2, 3, 4 for
the four aces, and the number 1 for the ace of spades. We sample two cards
without replacement. We solve the problem without considering order. Thus
we set our sample space to be
Ω = {{x_1, x_2} : x_1 ≠ x_2, 1 ≤ x_i ≤ 52 for i = 1, 2},
the set of 2-element subsets of the set {1, 2, . . . , 52}. We have #Ω = \binom{52}{2} = (52 · 51)/2! = 1326.
We need to compute the probability of the event A that both of the
chosen cards are aces and one of them is the ace of spades. Thus A =
{{1, 2}, {1, 3}, {1, 4}} and #A = 3. From this we get P(A) = #A/#Ω = 3/1326 = 1/442.
(b) We use the same sample space as in part (a). We need to compute the proba-
bility of the event B that at least one of the chosen cards is an ace. It is a bit
easier to compute the probability of the complement B^c: this is the event that
none of the two chosen cards are aces. B^c is the collection of 2-element sets
{x_1, x_2} ∈ Ω such that both x_1 ≥ 5 and x_2 ≥ 5. There are 48 cards that are
not aces. The number of 2-element sets of such cards is \binom{48}{2} = (48 · 47)/2! = 1128.
Thus #B^c = 1128 and P(B^c) = #B^c/#Ω = 1128/1326 = 188/221. Now we can compute
P(B) as P(B) = 1 − P(B^c) = 1 − 188/221 = 33/221.
1.32. Here is one way to determine the number of ways to be dealt a full house.
We take as our sample space the set of 5-element subsets of the deck of cards:
Ω = {{x_1, . . . , x_5} : x_i ∈ {deck of 52}, x_i ≠ x_j if i ≠ j}.
Note that #Ω = \binom{52}{5}.
Now count the number of ways to get a full house. First, choose the face value
for the 3 cards that share a face value. There are 13 options. Then select 3 of the 4
suits for this face value. There are \binom{4}{3} ways to do that. We now have the three of
a kind selected. Next, choose another face value for the remaining two cards from
the remaining 12 face values. Then select 2 of the 4 suits for this face value. There
are \binom{4}{2} ways to do that. By the multiplication rule we conclude that there are
13 · \binom{4}{3} · 12 · \binom{4}{2}
ways to be dealt a full house. Since there are a total of \binom{52}{5} poker hands, the
probability is
P(full house) = 13 · 12 · \binom{4}{3}\binom{4}{2} / \binom{52}{5} ≈ 0.00144.
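As a numerical cross-check (not part of the original solution), the full-house probability evaluates as follows.

```python
from math import comb

# Probability of a full house in a 5-card poker hand.
print(13 * comb(4, 3) * 12 * comb(4, 2) / comb(52, 5))   # approximately 0.00144
```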
1.33. We let our sample space be the set of ordered 5-tuples from the set {1, 2, 3, 4, 5, 6}:
Ω = {(x_1, . . . , x_5) : x_i ∈ {1, . . . , 6}}.
This comes from sampling five times with replacement from {1, 2, 3, 4, 5, 6}, to produce an ordered sample. Note that #Ω = 6^5.
We count the number of 5-tuples that give a full house. First pick one of the
six numbers (6 choices) for the face value that appears three times. Then pick
another number (5 choices) for the face value that appears twice. Next, select 3
of the 5 rolls for the first number. There are \binom{5}{3} ways to choose three slots from
five. The remaining two positions are for the second number. (Here is an example:
suppose we picked the numbers “4” and “6” and then positions {1, 3, 4}. Then our
full house would be (4, 6, 4, 4, 6).)
Thus there are 6 · 5 · \binom{5}{3} ways to roll a full house, and the probability is
P(full house) = 6 · 5 · \binom{5}{3} / 6^5 ≈ 0.03858.
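The dice full-house probability can also be confirmed with a short Python check (not part of the original solution).

```python
from math import comb

# Three rolls showing one value and two rolls showing another, out of five rolls.
print(6 * 5 * comb(5, 3) / 6**5)   # approximately 0.03858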
1.34. Let the corners of the unit square be the points (0, 0), (0, 1), (1, 1), (1, 0).
The circle of radius of 1/3 around the random point is completely within the
square if and only if the random point lies within the smaller square with cor-
ners (1/3, 1/3), (2/3, 1/3), (2/3, 2/3), (1/3, 2/3). The unit square has area one and
the smaller square has area 1/9. Consequently
P(the circle lies inside the unit square) = (area of the smaller square)/(area of original unit square) = (1/9)/1 = 1/9.
1.35. (a) Our sample space Ω is the set of points in the triangle with vertices (0, 0),
(3, 0) and (0, 3). The area of Ω is (3 · 3)/2 = 9/2.
The event A describes the points in Ω with distance less than 1 from the
y-axis. These are exactly the points in the trapezoid with vertices (0, 0), (1, 0),
(1, 2), (0, 3). The area of A is ((3 + 2) · 1)/2 = 5/2. Since we are choosing our point
uniformly from Ω, we can compute P(A) using the ratio of areas:
P(A) = (area of A)/(area of Ω) = (5/2)/(9/2) = 5/9.
(b) We use the same sample space as in part (a). The event B describes the set
of points in Ω with distance more than 1 from the origin. The event B^c is the
set of points that are in Ω and at most distance one from the origin. B^c is
a quarter circle with center at (0, 0), radius 1, and corner points at (1, 0) and
(0, 1). The area of B^c is π/4. Thus
P(B^c) = (area of B^c)/(area of Ω) = (π/4)/(9/2) = π/18
and then
P(B) = 1 − P(B^c) = 1 − π/18.
1.36. (a) Since (X, Y ) is a uniformly random point, probability is proportional to
area:
P(a < X < b)
  = P(point (X, Y) lies in rectangle with vertices (a, 0), (b, 0), (b, 1), (a, 1))
  = (area of rectangle with vertices (a, 0), (b, 0), (b, 1), (a, 1)) / (area of square with vertices (0, 0), (1, 0), (1, 1), (0, 1))
  = b − a.
Thus, X has a uniform distribution on [0, 1].
(b) The region of the xy plane defined by the inequality |x − y| ≤ 1/4 consists of
the region between the lines y = x − 1/4 and y = x + 1/4. Intersecting this
region with the unit square gives a region with an area of 7/16. (Easiest to see
by subtracting the complementary triangles from the unit square.) Thus, the
desired probability is also 7/16 since the unit square has an area of one.
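A simulation of the uniform point gives a quick check of the value 7/16 = 0.4375; the sketch below is not part of the original solution.

```python
import random

# Monte Carlo estimate of P(|X - Y| <= 1/4) for (X, Y) uniform on the unit square.
trials = 200_000
hits = sum(
    1 for _ in range(trials)
    if abs(random.random() - random.random()) <= 0.25
)
print(hits / trials)   # roughly 0.4375
```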
1.37. (a) Let Bk = {Mary wins on her kth roll and her kth roll is a six}.
P(B_k) = (4 · 2)^{k−1} · 4 · 1 / (6 · 6)^k = (8/36)^{k−1} · (4/36) = (2/9)^{k−1} · 1/9.
Then
P(Mary wins and her last roll is a six) = Σ_{k=1}^∞ P(B_k) = Σ_{k=1}^∞ (2/9)^{k−1} · 1/9 = 1/7.
Then
P(Mary wins) = Σ_{k=1}^∞ P(A_k) = Σ_{k=1}^∞ (2/9)^{k−1} · 2/3 = 6/7.
(c) Suppose Peter starts. Then the game lasts an even number of rolls precisely
when Mary wins. Thus the calculation is the same as in the example. Let
D_m = {the game lasts exactly m rolls}. Then for k ≥ 1,
P(D_{2k}) = (4 · 2)^{k−1} · 4 · 4 / (6 · 6)^k = (2/9)^{k−1} · 4/9
and
P(the game lasts an even number of rolls) = Σ_{k=1}^∞ P(D_{2k}) = Σ_{k=1}^∞ (2/9)^{k−1} · 4/9 = 4/7.
If Mary starts, then an even-roll game ends with Peter's roll. In this case
P(D_{2k}) = (2 · 4)^{k−1} · 2 · 2 / (6 · 6)^k = (2/9)^{k−1} · 1/9
and
P(the game lasts an even number of rolls) = Σ_{k=1}^∞ P(D_{2k}) = Σ_{k=1}^∞ (2/9)^{k−1} · 1/9 = 1/7.
(d) Let again D_m = {the game lasts exactly m rolls}. Suppose Peter starts. Then
for k ≥ 1
P(D_{2k}) = (4 · 2)^{k−1} · 4 · 4 / (6 · 6)^k = (2/9)^{k−1} · 4/9
and
P(D_{2k−1}) = (4 · 2)^{k−1} · 2 / ((6 · 6)^{k−1} · 6) = (2/9)^{k−1} · 1/3.
Next, for j ≥ 1:
P(game lasts at most 2j rolls) = Σ_{m=1}^{2j} P(D_m) = Σ_{k=1}^{j} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} (2/9)^{k−1} · 4/9 + Σ_{k=1}^{j} (2/9)^{k−1} · 1/3 = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 = (7/9) Σ_{i=0}^{j−1} (2/9)^i
  = (7/9) · (1 − (2/9)^j)/(1 − 2/9) = 1 − (2/9)^j
and
P(game lasts at most 2j − 1 rolls) = Σ_{m=1}^{2j−1} P(D_m) = Σ_{k=1}^{j−1} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} (2/9)^{k−1} · 4/9 − (2/9)^{j−1} · 4/9 + Σ_{k=1}^{j} (2/9)^{k−1} · 1/3
  = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 − (2/9)^{j−1} · 4/9 = (7/9) Σ_{i=0}^{j−1} (2/9)^i − (2/9)^{j−1} · 4/9
  = 1 − (2/9)^j − (2/9)^{j−1} · 4/9 = 1 − 3(2/9)^j.
Finally, suppose Mary starts. Then for k ≥ 1
P(D_{2k}) = (2 · 4)^{k−1} · 2 · 2 / (6 · 6)^k = (2/9)^{k−1} · 1/9
and
P(D_{2k−1}) = (2 · 4)^{k−1} · 4 / ((6 · 6)^{k−1} · 6) = (2/9)^{k−1} · 2/3.
Next, for j ≥ 1:
P(game lasts at most 2j rolls) = Σ_{m=1}^{2j} P(D_m) = Σ_{k=1}^{j} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} (2/9)^{k−1} · 1/9 + Σ_{k=1}^{j} (2/9)^{k−1} · 2/3 = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 = (7/9) Σ_{i=0}^{j−1} (2/9)^i
  = (7/9) · (1 − (2/9)^j)/(1 − 2/9) = 1 − (2/9)^j
and
P(game lasts at most 2j − 1 rolls) = Σ_{m=1}^{2j−1} P(D_m) = Σ_{k=1}^{j−1} P(D_{2k}) + Σ_{k=1}^{j} P(D_{2k−1})
  = Σ_{k=1}^{j} [P(D_{2k}) + P(D_{2k−1})] − P(D_{2j}) = Σ_{k=1}^{j} (2/9)^{k−1} · 7/9 − (2/9)^{j−1} · 1/9
  = 1 − (2/9)^j − (2/9)^{j−1} · 1/9 = 1 − (3/2)(2/9)^j.
We see that when Mary starts, the game tends to be over faster.
1.38. If the choice is to be uniformly random, then each integer has to have the same
probability, say P {k} = c for each integer k. If c > 0, choose an integer n > 1/c.
Then by the additivity of probability over mutually exclusive alternatives,
P (the outcome is between 1 and n) = P {1, 2, . . . , n} = nc > 1.
Since total probability cannot exceed 1, it must be that c = 0 and so P {k} = 0 for
each positive integer k. The total sample space Ω is the union of the sequence of
singletons {k} as k ranges over all positive integers. Hence again by the additivity
axiom
1 = P(Ω) = Σ_{k=1}^∞ P{k} = Σ_{k=1}^∞ 0 = 0.
We have a contradiction. Thus there cannot be a sample space and probability P
that represents a uniformly chosen random positive integer.
1.39. (a) Define
A = the event that a portion of the bill was paid using cash,
B = the event that a portion of the bill was paid using check,
C = the event that a portion of the bill was paid using card.
Note that we know the following:
P (A) = 0.78, P (B) = 0.16, P (C) = 0.26
P (AC) = 0.13, P (AB) = 0.06, P (BC) = 0.04
P (ABC) = 0.03.
The probability that someone paid with cash only is now seen to be
P(A ∩ (B ∪ C)^c) = P(A) − P(AB) − P(AC) + P(ABC)
  = 0.78 − 0.06 − 0.13 + 0.03 = 0.62.
The probability that someone paid with check only is
P(B ∩ (A ∪ C)^c) = P(B) − P(BC) − P(AB) + P(ABC)
  = 0.16 − 0.04 − 0.06 + 0.03 = 0.09.
The probability that someone paid with card only is
P(C ∩ (A ∪ B)^c) = P(C) − P(AC) − P(BC) + P(ABC)
  = 0.26 − 0.13 − 0.04 + 0.03 = 0.12.
So the probability of the union of these three mutually disjoint sets is,
P (only one method of payment)
= P (cash only) + P (check only) + P (card only)
= 0.62 + 0.09 + 0.12 = 0.83.
(b) Define the event
D = {at least one bill was paid using two or more methods}.
Then Dc is the event that both bills were paid using only one method. By part (a),
we know that there are 83 bills that were paid with only one method. Hence, since
there are precisely 100
2 ways to choose the two checks from the 100, and precisely
83
2 ways to choose the two bills from the pool of 83, we have
83
83 · 82
P (D) = 1 P (Dc ) = 1 2
100 =1 ⇡ 0.3125.
2
100 · 99
Next we derive the probabilities that appear in the equation above. The out-
comes of this experiment are 4-tuples from the set {green, red, yellow, white}.
The total number of 4-tuples is 4^4 = 256.
P(G) = P(exactly two greens) = \binom{4}{2} · 3 · 3 / 256 = 27/128.
The numerator above is derived as follows: there are \binom{4}{2} ways to pick the positions
of the two greens in the 4-tuple. For both of the remaining two positions we have
3 colors to choose from. By the same reasoning, P(G) = P(R) = P(Y) = P(W) = 27/128.
An event of type AB above means that the four draws yielded two balls of color
a and two balls of color b, where a and b are two distinct particular colors. The
number of 4-tuples in the event AB is \binom{4}{2} = 6. We can even list them easily. Here
they are in lexicographic order:
aabb, abab, abba, baab, baba, bbaa.
Thus P(AB) = 6/256 = 3/128.
Events of the type ABC are empty because four draws cannot yield three
different colors that each appear exactly twice. For the same reason GRYW = ∅.
Putting everything together gives
P(at least one color is repeated exactly twice) = 4 · 27/128 − 6 · 3/128 = 45/64 ≈ 0.7031.
1.41. Let A1 , A2 , A3 be the events that person 1, 2, and 3 win no games, respec-
tively. Then we want
P(A_1 ∪ A_2 ∪ A_3) = P(A_1) + P(A_2) + P(A_3) − P(A_1A_2) − P(A_1A_3) − P(A_2A_3) + P(A_1A_2A_3),
where we used inclusion-exclusion. Since each person has a probability of 2/3 of
not winning each particular game, we have
P(A_i) = (2/3)^4
for each i ∈ {1, 2, 3}. Event A_1A_2 is equivalent to saying that person 3 won all
four games, and analogously for A_1A_3 and A_2A_3. Hence
P(A_1A_2) = P(A_1A_3) = P(A_2A_3) = (1/3)^4.
Finally, we have P (A1 A2 A3 ) = 0 because somebody had to win at least one game.
Thus,
P(A_1 ∪ A_2 ∪ A_3) = 3 · (2/3)^4 − 3 · (1/3)^4 = 5/9.
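The value 5/9 ≈ 0.556 can be checked by simulation. The sketch below (not part of the original solution) assumes each of the four games is won by one of the three players, chosen uniformly at random and independently, which matches the computation P(A_i) = (2/3)^4 above.

```python
import random

# Monte Carlo estimate of P(at least one player wins none of the 4 games).
trials = 100_000
hits = 0
for _ in range(trials):
    winners = {random.randint(1, 3) for _ in range(4)}   # set of players who won something
    if len(winners) < 3:                                  # some player won no game
        hits += 1
print(hits / trials)   # roughly 0.556
```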
1.42. By inclusion-exclusion and the bound P(A ∪ B) ≤ 1,
From this we can get the statement step by step for larger and larger values of n.
For n = 3 we can use the n = 2 statement twice, first for A_1 ∪ A_2 and A_3:
For general n one can do the same by repeating the procedure n − 1 times.
The last step of the proof can also be finished with mathematical induction.
Here is the induction step. If the statement is assumed to be true for n − 1 then,
first by the case of two events and then by the induction assumption,
1.44. Let Ω = {(i, j) : i, j ∈ {1, . . . , 6}} be the sample space of the two rolls of the
two dice (order matters). Note that #Ω = 36. For (i, j) ∈ Ω we let X = max{i, j}
and Y = min{i, j}.
We now have
P(X = 6) = P(X ≤ 6) − P(X ≤ 5) = 1 − 25/36 = 11/36
P(X = 5) = P(X ≤ 5) − P(X ≤ 4) = 25/36 − 16/36 = 9/36
P(X = 4) = P(X ≤ 4) − P(X ≤ 3) = 16/36 − 9/36 = 7/36
P(X = 3) = P(X ≤ 3) − P(X ≤ 2) = 9/36 − 4/36 = 5/36
P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = 4/36 − 1/36 = 3/36
P(X = 1) = P(X ≤ 1) = 1/36.
(c) We can use similar reasoning for the probabilities associated with Y:
P(Y ≥ 1) = 1
P(Y ≥ 2) = (# ways to roll only 2s or higher)/36 = 5^2/36 = 25/36
P(Y ≥ 3) = (# ways to roll only 3s or higher)/36 = 4^2/36 = 16/36
P(Y ≥ 4) = (# ways to roll only 4s or higher)/36 = 3^2/36 = 9/36
P(Y ≥ 5) = (# ways to roll only 5s or higher)/36 = 2^2/36 = 4/36
P(Y ≥ 6) = (# ways to roll only 6s or higher)/36 = 1^2/36 = 1/36,
P(Y = 1) = P(Y ≥ 1) − P(Y ≥ 2) = 1 − 25/36 = 11/36
P(Y = 2) = P(Y ≥ 2) − P(Y ≥ 3) = 25/36 − 16/36 = 9/36
P(Y = 3) = P(Y ≥ 3) − P(Y ≥ 4) = 16/36 − 9/36 = 7/36
P(Y = 4) = P(Y ≥ 4) − P(Y ≥ 5) = 9/36 − 4/36 = 5/36
P(Y = 5) = P(Y ≥ 5) − P(Y ≥ 6) = 4/36 − 1/36 = 3/36
P(Y = 6) = P(Y ≥ 6) = 1/36.
1.45. The possible values of X are 4, 3, 2, 1, 0, because you can win at most 4 dollars.
The probability mass function is
P(X = 4) = P(the first six was rolled on the first roll) = 1/6
P(X = 3) = P(the first six was rolled on the 2nd roll) = 5/6^2
P(X = 2) = P(the first six was rolled on the 3rd roll) = 5^2/6^3
P(X = 1) = P(the first six was rolled on the 4th roll) = 5^3/6^4
P(X = 0) = P(no six was rolled in the first 4 rolls) = 5^4/6^4.
You can check that these probabilities add up to 1, as they should.
1.46. To simplify the counting task we imagine that all four balls are drawn from
the urn one by one, and then let X denote the number of red balls that come before
the yellow. (This is subtly different from the setup of the problem, which says that
we stop drawing balls once we see the yellow. This distinction makes no difference for
the value that X takes.) Number the red balls 1, 2 and 3, and number the yellow
ball 4. Then the sample space is
Ω = {(a_1, a_2, a_3, a_4) : (a_1, a_2, a_3, a_4) is a permutation of (1, 2, 3, 4)}.
In other words, Ω is the set of all permutations of the numbers 1, 2, 3, 4 and consequently #Ω = 4! = 24.
The possible values of X are {0, 1, 2, 3}. To compute the probabilities P (X = k)
we count the number of ways in which each event can take place.
P(X = 0) = P(yellow came first) = (1 · 3 · 2 · 1)/24 = 1/4.
The numerator equals the number of ways to choose one yellow (1) times the number
of ways to choose the first red (3) times the number of ways to choose the second
red (2) times the number of ways to choose the last red (1). By similar reasoning,
P(X = 1) = P(yellow came second) = (3 · 1 · 2 · 1)/24 = 1/4
P(X = 2) = P(yellow came third) = (3 · 2 · 1 · 1)/24 = 1/4
P(X = 3) = P(yellow came fourth) = (3 · 2 · 1 · 1)/24 = 1/4.
1.47. Since ω ∈ [0, 1], the random variable Z satisfies Z(ω) = e^ω ∈ [1, e]. Thus for
t < 1 the event {Z ≤ t} is empty and has probability P(Z ≤ t) = 0. If t ≥ e then
{Z ≤ t} = Ω (in other words, Z ≤ t is always true) and so P(Z ≤ t) = 1 for t ≥ e.
For 1 ≤ t < e we have this equality of events:
{Z ≤ t} = {ω : e^ω ≤ t} = {ω : ω ≤ ln t}.
P(Y = k) = P([k/10, (k + 1)/10)) = 1/10 for each k ∈ {0, 1, . . . , 9}.
1.49. (a) To answer the question with inclusion-exclusion, let A_i = {ith draw is red}.
Then B = ∪_{i=1}^ℓ A_i. To apply (1.20) we need the probabilities P(A_{i_1} ∩ · · · ∩ A_{i_k})
for each choice of indices 1 ≤ i_1 < · · · < i_k ≤ ℓ. To see how this goes, let us
first derive the example
P(A_2 ∩ A_5) = P(the 2nd draw and 5th draw are red)
by counting favorable outcomes and total outcomes. Each of the ℓ draws comes
from a set of n balls, so #Ω = n^ℓ. The number of favorable outcomes is
n · 3 · n · n · 3 · n · · · n = n^{ℓ−2} · 3^2 because the second and fifth draws are restricted
to the 3 red balls, and the other ℓ − 2 draws are unrestricted. This gives
P(A_2 ∩ A_5) = n^{ℓ−2} · 3^2 / n^ℓ = (3/n)^2.
The same reasoning gives for any choice of k indices 1 ≤ i_1 < · · · < i_k ≤ ℓ
P(A_{i_1} ∩ · · · ∩ A_{i_k}) = n^{ℓ−k} · 3^k / n^ℓ = (3/n)^k.
Then
P(B) = Σ_{k=1}^ℓ (−1)^{k+1} Σ_{1≤i_1<···<i_k≤ℓ} P(A_{i_1} ∩ · · · ∩ A_{i_k})
  = Σ_{k=1}^ℓ (−1)^{k+1} \binom{ℓ}{k} (3/n)^k = −Σ_{k=1}^ℓ \binom{ℓ}{k} (−3/n)^k
  = 1 − Σ_{k=0}^ℓ \binom{ℓ}{k} (−3/n)^k = 1 − (1 − 3/n)^ℓ.
In the second to last equality above we added and subtracted the term for
k = 0 which is 1. This enabled us to apply the binomial theorem (Fact D.2 in
Appendix D).
(b) Let B_k = {a red ball is seen exactly k times} for 1 ≤ k ≤ ℓ. There are \binom{ℓ}{k}
ways to decide which k of the ℓ draws produce the red ball. Thus there are
altogether \binom{ℓ}{k} 3^k (n − 3)^{ℓ−k} ways to draw exactly k red balls. Then
P(B_k) = \binom{ℓ}{k} 3^k (n − 3)^{ℓ−k} / n^ℓ = \binom{ℓ}{k} (3/n)^k (1 − 3/n)^{ℓ−k}
and then by the binomial theorem (add and subtract the k = 0 term)
P(B) = Σ_{k=1}^ℓ P(B_k) = Σ_{k=1}^ℓ \binom{ℓ}{k} (3/n)^k (1 − 3/n)^{ℓ−k}
  = Σ_{k=0}^ℓ \binom{ℓ}{k} (3/n)^k (1 − 3/n)^{ℓ−k} − (1 − 3/n)^ℓ = 1 − (1 − 3/n)^ℓ.
(c) The quickest solution comes by using the complement B^c = {each draw is green}.
P(B) = 1 − P(B^c) = 1 − (n − 3)^ℓ/n^ℓ = 1 − (1 − 3/n)^ℓ.
Solutions to Chapter 2
Since the outcomes are equally likely, we can equivalently find the answer from
P(A|B) = #AB/#B = 3/5.
2.2. A = {second flip is tails} = {(H, T, H), (H, T, T ), (T, T, H), (T, T, T )},
B = {at most one tails} = {(H, H, H), (H, H, T ), (H, T, H), (T, H, H)}.
Hence AB = {(H, T, H)}, and since we have equally likely outcomes,
P(A | B) = P(AB)/P(B) = #AB/#B = 1/4.
2.3. We set the sample space as Ω = {1, 2, . . . , 100}. We have #Ω = 100 and each
outcome is equally likely.
Let A denote the event that the chosen number is divisible by 3 and B denote
the event that at least one digit is equal to 5. Then
B = {5, 15, 25, . . . , 95} ∪ {50, 51, . . . , 59}
and #B = 19. (As there are 10 numbers with 5 as the last digit, 10 numbers with
5 at the tens place, and 55 was counted both times.) We also have
AB = {15, 45, 51, 54, 57, 75}, #AB = 6.
This gives P(A|B) = P(AB)/P(B) = (6/100)/(19/100) = 6/19.
2.4. Let A be the event that we picked the ball labeled 5 and B the event that we
picked the first urn. Then we have P(B) = 1/2, P(B^c) = P(we picked the second urn) =
1/2. Moreover, from the setup of the problem
P(A|B) = P(we chose the number 5 | we chose from the first urn) = 0,
P(A|B^c) = P(we chose the number 5 | we chose from the second urn) = 1/3.
We compute P(A) by conditioning on B and B^c:
P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = 0 · 1/2 + 1/3 · 1/2 = 1/6.
2.5. Let A be the event that we picked the number 2 and B the event that we picked
the first urn. Then we have P(B) = 1/5, P(B^c) = P(we picked the second urn) =
4/5. Moreover, from the setup of the problem
P(A|B) = P(we chose the number 2 | we chose from the first urn) = 1/3,
P(A|B^c) = P(we chose the number 2 | we chose from the second urn) = 1/4.
Then we can compute P(A) by conditioning on B and B^c:
P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = 1/3 · 1/5 + 1/4 · 4/5 = 4/15.
2.6. Define events
A = {Alice watches TV tomorrow} and B = {Betty watches TV tomorrow}.
(a) P (AB) = P (A)P (B|A) = 0.6 · 0.8 = 0.48.
(b) Intuitively, the answer must be the same 0.48 as in part (a) because Betty
cannot watch TV unless Alice is also watching. Mathematically, this says that
P (B|Ac ) = 0. Then by the law of total probability,
P (B) = P (B|A)P (A) + P (B|Ac )P (Ac ) = 0.8 · 0.6 + 0 · 0.4 = 0.48.
(c) P(AB^c) = P(A) − P(AB) = 0.6 − 0.48 = 0.12. Or, by conditioning and using
the outcome of Exercise 2.7(a),
P(AB^c) = P(A)P(B^c|A) = P(A)(1 − P(B|A)) = 0.6 · 0.2 = 0.12.
2.7. (a) By definition P(A^c|B) = P(A^cB)/P(B). We have A^cB ∪ AB = B, and the two
sets on the left are disjoint, so P(A^cB) + P(AB) = P(B), and P(A^cB) =
P(B) − P(AB). This gives
P(A^c|B) = P(A^cB)/P(B) = (P(B) − P(AB))/P(B) = 1 − P(AB)/P(B) = 1 − P(A|B).
(b) From part (a) we have P(A^c|B) = 1 − P(A|B) = 0.4. Then P(A^cB) =
P(A^c|B)P(B) = 0.4 · 0.5 = 0.2.
2.8. Let A1 , A2 , A3 denote the events that the first, second and third cards are
queen, king and ace, respectively. We need to compute P (A1 A2 A3 ). One could
do this by counting favorable outcomes. But conditional probabilities provide an
easier way because then we can focus on picking one card at a time. We just have
to keep track of how earlier picks influence the probabilities of the later picks.
We have P(A_1) = 4/52 = 1/13 since there are 52 equally likely choices for the first
pick and four of them are queens. The conditional probability P(A_2 | A_1) must
reflect the fact that one queen has been removed from the deck and is no longer
a possible outcome. Since the outcomes are still equally likely, the conditional
probability of getting a king for the second pick is 4/51. Similarly, when we compute
P(A_3 | A_1A_2) we can assume that we pick a card out of 50 (with one queen and
one king removed) and thus the conditional probability of picking an ace will be
4/50 = 2/25. Thus the probability of A_1A_2A_3 is given by
P(A_1A_2A_3) = P(A_1)P(A_2 | A_1)P(A_3 | A_2A_1) = 1/13 · 4/51 · 2/25 = 8/16,575.
2.9. Let C be the event that we chose the ball 3 and D the event that we chose
from the second urn. Then we have
P(D) = 4/5,  P(D^c) = 1/5,  P(C|D) = 1/4,  P(C|D^c) = 1/3.
We need to compute P(D|C), which we can do using Bayes' formula:
P(D|C) = P(C|D)P(D) / [P(C|D)P(D) + P(C|D^c)P(D^c)] = (1/4 · 4/5) / (1/4 · 4/5 + 1/3 · 1/5) = 3/4.
2.10. Define events:
A = {outcome of the roll is 4} and Bk = {the k-sided die is picked}.
Then
P(B_6|A) = P(A ∩ B_6)/P(A) = P(A|B_6)P(B_6) / [P(A|B_4)P(B_4) + P(A|B_6)P(B_6) + P(A|B_{12})P(B_{12})]
  = (1/6 · 1/3) / (1/4 · 1/3 + 1/6 · 1/3 + 1/12 · 1/3) = 1/3.
2.11. Let A be the event that the chosen customer is reckless. Let B be the event
that the chosen customer has an accident. We know the following:
P (A) = 0.2, P (Ac ) = 0.8, P (B|A) = 0.04, and P (B|Ac ) = 0.01.
The probability asked for is P(A^c|B). Using Bayes' formula we get
P(A^c|B) = P(B|A^c)P(A^c) / [P(B|A)P(A) + P(B|A^c)P(A^c)] = (0.01 · 0.80)/(0.04 · 0.2 + 0.01 · 0.80) = 1/2.
2.12. (a) A = {X is even}, B = {X is divisible by 5}. #A = 50, #B = 20 and
AB = {10, 20, . . . , 100} so #AB = 10. Thus
P(A)P(B) = 50/100 · 20/100 = 1/10 and P(AB) = 10/100 = 1/10.
This shows P (A)P (B) = P (AB) and verifies the independence of A and B.
(b) C = {X has two digits} = {10, 11, 12, . . . , 99} and #C = 90.
D = {X is divisible by 3} = {3, 6, 9, 12, . . . , 99} and #D = 33.
CD = {12, 15, . . . , 99} and #CD = 30. Thus
P(C)P(D) = 90/100 · 33/100 = 0.297 and P(CD) = 30/100 = 3/10.
This shows P(C)P(D) ≠ P(CD) and verifies that C and D are not indepen-
dent.
(c) E = {X is a prime} = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
53, 59, 61, 67, 71, 73, 79, 83, 89, 97},
and #E = 25.
This shows P(E)P(F) ≠ P(EF) and verifies that E and F are not independent.
2.13. We need to check whether or not we have
P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 1, X_5 = 0)
  = P(X_1 = 1)P(X_2 = 1)P(X_3 = 0)P(X_4 = 1)P(X_5 = 0)
  = 9/10 · 9/10 · 1/10 · 9/10 · 1/10 = 729/100,000.
2.16. Let us label heads as 0 and tails as 1. The sample space is Ω = {0, 1}^3,
the set of ordered triples of zeros and ones. #Ω = 8 and so for equally likely
outcomes we have P(ω) = 1/8 for each ω ∈ Ω. The events and their probabilities
a function on Ω, Z(10) = 1 + 0 = 1.) And so P(Z = 1) = P{10} = 1/90. If we
take X = 2, we cannot get Z = 1. Here is the precise derivation:
P(X = 2, Z = 1) = P({20, 21, . . . , 29} ∩ {10}) = P(∅) = 0.
Since P(X = 2)P(Z = 1) = 1/9 · 1/90 = 1/810 ≠ 0, we have shown that X and Z are
not independent.
2.19. (a) If we draw with replacement then we have 7^2 equally likely outcomes for
the two picks. Counting the favorable outcomes gives
P(X_1 = 4) = (1 · 7)/(7 · 7) = 1/7
P(X_2 = 5) = (7 · 1)/(7 · 7) = 1/7
P(X_1 = 4, X_2 = 5) = 1/(7 · 7) = 1/49.
(b) If we draw without replacement then we have 7 · 6 equally likely outcomes for
the two picks. Counting the favorable outcomes gives
P(X_1 = 4) = (1 · 6)/(7 · 6) = 1/7
P(X_2 = 5) = (6 · 1)/(7 · 6) = 1/7
P(X_1 = 4, X_2 = 5) = 1/(7 · 6) = 1/42.
(c) The answer to part (b) showed that P(X_1 = 4)P(X_2 = 5) ≠ P(X_1 = 4, X_2 = 5).
This proves that X1 and X2 are not independent when drawing without replace-
ment.
Part (a) showed that the events {X1 = 4} and {X2 = 5} are independent when
drawing with replacement, but this is not enough for proving that the random
variables X1 and X2 are independent. Independence of random variables requires
checking P (X1 = a)P (X2 = b) = P (X1 = a, X2 = b) for all possible choices of a
and b. (This can be done and so independence of X1 and X2 does actually hold
here.)
2.20. (a) Let S_5 denote the number of threes in the first five rolls. Then
P(S_5 ≤ 2) = Σ_{k=0}^{2} \binom{5}{k} (1/6)^k (5/6)^{5−k}.
(b) Let N be the number of rolls needed to see the first three. Then from the p.m.f.
of a geometric random variable,
P(N > 4) = Σ_{k=5}^∞ (5/6)^{k−1} · 1/6 = (5/6)^4.
Equivalently,
P(N > 4) = P(no three in the first four rolls) = (5/6)^4.
(c) We can approach this in a couple different ways. By using the independence of
the rolls,
P(5 ≤ N ≤ 20)
  = P(no three in the first four rolls, at least one three in rolls 5–20)
  = (5/6)^4 (1 − (5/6)^{16}) = (5/6)^4 − (5/6)^{20}.
Equivalently, thinking of the roll at which the first three comes,
P(5 ≤ N ≤ 20) = P(N ≥ 5) − P(N ≥ 21)
  = Σ_{k=5}^∞ (5/6)^{k−1} · 1/6 − Σ_{k=21}^∞ (5/6)^{k−1} · 1/6
  = (5/6)^4 − (5/6)^{20}.
2.21. (a) Let S be the number of problems she gets correct. Then S ~ Bin(4, 0.8)
and
P(Jane gets an A) = P(S ≥ 3) = P(S = 3) + P(S = 4)
  = \binom{4}{3} (0.8)^3 (0.2) + (0.8)^4
  = 0.8192.
(b) Let S_2 be the number of problems Jane gets correct out of the last three. Then
S_2 ~ Bin(3, 0.8). Let X_1 ~ Bern(0.8) model whether or not she gets the first
problem correct. By assumption, S_2 and X_1 are independent. We have
P(S ≥ 3 | X_1 = 1) = P(S ≥ 3, X_1 = 1) / P(X_1 = 1)
  = P(S_2 ≥ 2, X_1 = 1) / P(X_1 = 1) = P(S_2 ≥ 2)P(X_1 = 1) / P(X_1 = 1).
The last equality followed from the independence of S_2 and X_1. Hence,
P(S ≥ 3 | X_1 = 1) = P(S_2 ≥ 2) = \binom{3}{2} (0.8)^2 (0.2) + (0.8)^3 = 0.896.
2.22. (a) Let us encode the possible events in a single round as
AR = {Annie chooses rock}, AP = {Annie chooses paper}
and AS = {Annie chooses scissors}
and similarly BR , BP and BS for Bill. Then, using the independence of the
players’ choices,
P(Ann wins the round) = P(A_R B_S) + P(A_P B_R) + P(A_S B_P)
  = P(A_R)P(B_S) + P(A_P)P(B_R) + P(A_S)P(B_P)
  = 1/3 · 1/3 + 1/3 · 1/3 + 1/3 · 1/3 = 1/3.
Conceptually quicker than enumerating cases would be to notice that no
matter what Ann chooses, the probability that Bill makes a losing choice is 1/3.
2.23. Whether there is an accident on a given day can be treated as the outcome
of a trial (where success means that there is at least one accident). The success
probability is p = 1 − 0.95 = 0.05 and the failure probability is 0.95.
(a) The probability of no accidents at this intersection during the next 7 days is the
probability that the first seven trials failed, which is (1 − p)^7 = 0.95^7 ≈ 0.6983.
(b) There are 30 days in September. Let X be the number of days that have at
least one accident. X counts the number of ‘successes’ among 30 trials, so X ~
Bin(30, 0.05). Using the probability mass function of the binomial we get
P(X = 2) = \binom{30}{2} 0.05^2 0.95^{28} ≈ 0.2586.
(c) Let N denote the number of days we have to wait for the next accident, or
equivalently, the number of trials needed for the first success. N has geometric
distribution with parameter p = 0.05. We need to compute P(4 < N ≤ 10).
The event {4 < N ≤ 10} is the same as {N ∈ {5, 6, 7, 8, 9, 10}}. Using the
probability mass function of the geometric distribution,
P(4 < N ≤ 10) = Σ_{k=5}^{10} P(N = k) = Σ_{k=5}^{10} (1 − p)^{k−1} p = Σ_{k=5}^{10} 0.95^{k−1} · 0.05 ≈ 0.2158.
Here is an alternative solution. Note that
P(4 < N ≤ 10) = P(N ≤ 10) − P(N ≤ 4)
  = (1 − P(N > 10)) − (1 − P(N > 4))
  = P(N > 4) − P(N > 10).
For any positive integer k the event {N > k} is the same as having k failures
in the first k trials. By part (a) the probability of this is (1 − p)^k, which gives
P(N > k) = (1 − p)^k = 0.95^k and then
P(4 < N ≤ 10) = P(N > 4) − P(N > 10) = (1 − p)^4 − (1 − p)^{10}
  = 0.95^4 − 0.95^{10} ≈ 0.2158.
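The three numerical answers of this exercise are easy to reproduce with a few lines of Python (not part of the original solution).

```python
from math import comb

p, q = 0.05, 0.95
print(q**7)                                        # (a) no accident in 7 days, about 0.6983
print(comb(30, 2) * p**2 * q**28)                  # (b) exactly 2 accident days, about 0.2586
print(sum(q**(k - 1) * p for k in range(5, 11)))   # (c) P(4 < N <= 10), about 0.2158
print(q**4 - q**10)                                # (c) same value via the complement identity
```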
2.24. (a) X is hypergeometric with parameters (6, 4, 3).
(b) The probability mass function of X is
P(X = k) = \binom{4}{k} \binom{2}{3−k} / \binom{6}{3} for k ∈ {0, 1, 2, 3},
with the convention that \binom{a}{k} = 0 for integers k > a ≥ 0. In particular, P(X =
0) = 0 because with only 2 men available, a team of 3 cannot consist of men
alone.
2.25. Define events: A = {first roll is a three}, B = {second roll is a four}, Di =
{the die has i sides}. Assume that A and B are independent, given Di , for each
i = 4, 6, 12.
P(AB) = Σ_{i=4,6,12} P(AB|D_i)P(D_i) = Σ_{i=4,6,12} P(A|D_i)P(B|D_i)P(D_i)
  = ((1/4)^2 + (1/6)^2 + (1/12)^2) · 1/3.
P(D_6|AB) = P(AB|D_6)P(D_6) / P(AB) = (1/6)^2 · 1/3 / [((1/4)^2 + (1/6)^2 + (1/12)^2) · 1/3] = 2/7.
2.26.
P((AB) ∩ (CD)) = P(ABCD) = P(A)P(B)P(C)P(D) = P(AB)P(CD).
The very first equality is set algebra, namely, the associativity of intersection. This
can be taken as intuitively obvious, or verified from the definition of intersection
and common sense logic:
ω ∈ (AB) ∩ (CD) ⟺ ω ∈ AB and ω ∈ CD
                ⟺ ω ∈ A and ω ∈ B and ω ∈ C and ω ∈ D
                ⟺ ω ∈ ABCD.
Then we used the product rule first for all four events A, B, C, D, and then
separately for the pairs A, B and C, D.
2.27. (a) First introduce the necessary events. Let A be the event that we picked
Urn I. Then A^c is the event that we picked Urn II. Let B_1 be the event that we
picked a green ball. Then
P(A) = P(A^c) = 1/2,  P(B_1|A) = 1/3,  P(B_1|A^c) = 2/3.
P(B_1) is computed from the law of total probability:
P(B_1) = P(B_1|A)P(A) + P(B_1|A^c)P(A^c) = 1/3 · 1/2 + 2/3 · 1/2 = 1/2.
(b) The two experiments are identical and independent. Thus the probability of
picking green both times is the square of the probability from part (a): 1/2 · 1/2 = 1/4.
(c) Let B2 be the event that we picked a green ball in the second draw. The
events B1 , B2 are conditionally independent given A (and given Ac ), since we
are sampling with replacement from the same urn. Thus
P(B_2|A) = 1/3,  P(B_2|A^c) = 2/3,
P(B_1B_2|A) = P(B_1|A)P(B_2|A),  P(B_1B_2|A^c) = P(B_1|A^c)P(B_2|A^c).
From this we get
P(B_1B_2) = P(B_1B_2|A)P(A) + P(B_1B_2|A^c)P(A^c)
  = P(B_1|A)P(B_2|A)P(A) + P(B_1|A^c)P(B_2|A^c)P(A^c)
  = (1/3)^2 · 1/2 + (2/3)^2 · 1/2 = 5/18.
(d) The probability of getting a green from the first urn is 1/3 and the probability
of getting a green from the second urn is 2/3. Since the picks are independent,
the probability of both picks being green is 1/3 · 2/3 = 2/9.
2.28. (a) The number of aces I get in the first game is hypergeometric with pa-
rameters (52, 4, 13).
(b) The number of games in which I receive at least one ace during the evening is
binomial with parameters (50, 1 − \binom{48}{13}/\binom{52}{13}).
(c) The number of games in which all my cards are from the same suit is binomial
with parameters (50, 4/\binom{52}{13}).
(d) The number of spades I receive in the 5th game is hypergeometric with param-
eters (52, 13, 13).
2.29. Let E1 , E2 , E3 , N be the events that Uncle Bob hits a single, double, triple,
or not making it on base, respectively. These events form a partition of our sample
space. We also define S as the event Uncle Bob scores in this turn at bat. By the
law of total probability we have
P (S) = P (SE1 ) + P (SE2 ) + P (SE3 ) + P (SN )
= P (S|E1 )P (E1 ) + P (S|E2 )P (E2 ) + P (S|E3 )P (E3 ) + P (S|N )P (N )
= 0.2 · 0.35 + 0.3 · 0.25 + 0.4 · 0.1 + 0 · 0.3
= 0.185.
2.30. Identical twins have the same gender. We assume that identical twins are
equally likely to be boys or girls. Fraternal twins are also equally likely to be boys
or girls, but independently of each other. Thus fraternal twins are two girls with
probability 1/2 · 1/2 = 1/4. Let I be the event that the twins are identical, F the event
that the twins are fraternal.
(a) P(two girls) = P(two girls | I)P(I) + P(two girls | F)P(F) = 1/2 · 1/3 + 1/4 · 2/3 = 1/3.
(b) P(I | two girls) = P(two girls | I)P(I) / P(two girls) = (1/2 · 1/3)/(1/3) = 1/2.
(b)
P(B_k | A) = P(A|B_k)P(B_k) / Σ_{k=1}^{5} P(A|B_k)P(B_k) = (k/10 · 1/5)/(3/10) = k/15.
2.34. Since the urns are interchangeable, we can put the marked ball in urn 1.
There are three ways to arrange the two unmarked balls. Let case i for i ∈ {0, 1, 2}
denote the situation where we put i unmarked balls together with the marked ball,
and the remaining 2 − i unmarked balls in the other urn. Let M denote the event
that your friend draws the marked ball, and A_j the event that she chooses urn j,
j = 1, 2. Since P(M|A_2) = 0, we get the following probabilities.
Case 0: P(M) = P(M|A_1)P(A_1) = 1 · 1/2 = 1/2.
Case 1: P(M) = P(M|A_1)P(A_1) = 1/2 · 1/2 = 1/4.
Case 2: P(M) = P(M|A_1)P(A_1) = 1/3 · 1/2 = 1/6.
So (a) you would put all the balls in one urn (Case 2) while (b) she would put
the marked ball in one urn and the other balls in the other urn.
(c) The situation is analogous. If we put k unmarked balls together with the
marked ball in urn 1, then
P(M) = P(M|A_1)P(A_1) = 1/(k + 1) · 1/2 = 1/(2(k + 1)).
Hence to minimize the chances of drawing the marked ball, put all the balls in one
urn, and to maximize the chances of drawing the marked ball, put the marked ball
in one urn and all the unmarked balls in the other.
2.35. Let A be the event that the first card is a queen and B the event that the
second card is a spade. Note that A and B are not independent, and there is no
immediate way to compute P (B|A). We can compute P (AB) by counting favorable
outcomes. Let Ω be the collection of all ordered pairs drawn without replacement
from 52 cards. #Ω = 52 · 51 and all outcomes are equally likely. We can break up
AB into the union of the following two disjoint events:
C = {the first card is the queen of spades and the second card is a spade},
D = {the first card is a non-spade queen and the second card is a spade}.
We have #C = 12, as we can choose the second card 12 different ways. We have
#D = 3 · 13 = 39 as the first card can be any of the three non-spade queens, and the
second card can be any of the 13 spades. Thus #AB = #C + #D = 12 + 39 = 51
and we get P(AB) = #AB/#Ω = 51/(52 · 51) = 1/52.
2.36. Let Aj be the event that a j-sided die was chosen and B the event that a six
was rolled.
(b)
P(A_6|B) = P(B|A_6)P(A_6) / P(B) = 3/4.
2.37. (a) Let S, E, T, and W be the events that the six, eight, ten, and twenty sided
die is chosen. Let X be the outcome of the roll. Then
P(X = 5 | X > 3) = P({X = 5} ∩ {X > 3}) / P(X > 3) = 1/3.
(e)
P(A_4 | R) = P(R | A_4)P(A_4) / P(R) = (1/5 · 1/2)/(4/15) = 3/8.
2.39. (a) Let B_i be the event that we chose the ith word (i = 1, . . . , 8). Events
B_1, . . . , B_8 form a partition of the sample space and P(B_i) = 1/8 for each i. Let
A be the event that we chose the letter O. Then P(A|B_3) = 1/5, P(A|B_4) = 1/3,
P(A|B_6) = 1/4 with all other P(A|B_i) = 0. This gives
P(A) = Σ_{i=1}^{8} P(A|B_i)P(B_i) = (1/8)(1/5 + 1/3 + 1/4) = 47/480.
(b) The length of the chosen word can be 3, 4, 5 or 6, so the range of X is the
set {3, 4, 5, 6}. For each possible value x we have to find the probability
P(X = x).
p_X(3) = P(X = 3) = P(we chose the 1st, the 4th or the 7th word) = P(B_1 ∪ B_4 ∪ B_7) = 3/8,
p_X(4) = P(X = 4) = P(we chose the 6th or the 8th word) = P(B_6 ∪ B_8) = 2/8,
p_X(5) = P(X = 5) = P(we chose the 2nd or the 3rd word) = P(B_2 ∪ B_3) = 2/8,
p_X(6) = P(X = 6) = P(we chose the 5th word) = P(B_5) = 1/8.
Note that the probabilities add up to 1, as they should.
2.40. (a) For i 2 {1, 2, 3, 4} let Ai be the event that the student scores i on the
test. Let M be the event that the student becomes a math major.
P(M) = Σ_{i=1}^4 P(M|Ai)P(Ai) = 0 · 0.1 + (1/5) · 0.2 + (1/3) · 0.6 + (3/7) · 0.1 ≈ 0.2829.
(b)
P(A4 | M) = P(M|A4)P(A4) / P(M) = ((3/7) · 0.1) / ((1/5) · 0.2 + (1/3) · 0.6 + (3/7) · 0.1) ≈ 0.1515.
2.44. Let Ai be the event that bin i was chosen (i = 1, 2) and Yj the event that
draw j (j = 1, 2) is yellow.
(a)
P(A1 | Y1) = P(Y1|A1)P(A1) / (P(Y1|A1)P(A1) + P(Y1|A2)P(A2))
= ((4/10) · (1/2)) / ((4/10) · (1/2) + (4/7) · (1/2)) = 14/34 ≈ 0.4118.
(b) This question asks for the conditional probability of A1 , given that two draws
with replacement from the chosen urn yield yellow. We assume that draws
with replacement from the same urn are independent. This translates into
conditional independence of Y1 and Y2 , given Ai .
P(A1 | Y1Y2) = P(Y1Y2|A1)P(A1) / (P(Y1Y2|A1)P(A1) + P(Y1Y2|A2)P(A2))
= P(Y1|A1)P(Y1|A1)P(A1) / (P(Y1|A1)P(Y1|A1)P(A1) + P(Y1|A2)P(Y1|A2)P(A2))
= ((4/10) · (4/10) · (1/2)) / ((4/10) · (4/10) · (1/2) + (4/7) · (4/7) · (1/2)) = 196/596 ≈ 0.3289.
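For reference, the two posterior probabilities above can be verified with a few lines of Python (an addition of ours, not part of the solution); the conditional draw probabilities 4/10 and 4/7 are read off from the fractions above.

p1, p2 = 4/10, 4/7        # P(yellow | bin 1), P(yellow | bin 2)
prior = 1/2

post_one = p1 * prior / (p1 * prior + p2 * prior)
post_two = p1**2 * prior / (p1**2 * prior + p2**2 * prior)
print(round(post_one, 4), round(post_two, 4))   # 0.4118 and 0.3289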
2.45. (a) Let B, G, and O be the events that a 7-year-old likes the Bears, Packers,
and some other team, respectively. We are given the following:
P (B) = 0.10, P (G) = 0.75, P (O) = 0.15.
Let A be the event that the 7-year-old goes to a game. Then we have
P (A|B) = 0.01, P (A|G) = 0.05, P (A|O) = 0.005.
P (A) is computed from the law of total probability:
P (A) = P (A|B)P (B) + P (A|G)P (G) + P (A|O)P (O)
= 0.01 · 0.1 + 0.05 · 0.75 + 0.005 · 0.15 = 0.03925.
(b) Using the result of (a) (or Bayes’ formula directly):
P(G|A) = P(AG)/P(A) = P(A|G)P(G)/P(A) = (0.05 · 0.75)/0.03925 = 0.0375/0.03925 ≈ 0.9554.
2.46. A sample point is an ordered triple (x, y, z) where x is the number drawn
from box A, y is the number drawn from box B, and z the number drawn from box
C. All 6 · 12 · 4 = 288 outcomes are equally likely, so we can solve these problems
by counting.
(a) The number of outcomes with exactly two 1s is
1 · 1 · 3 + 1 · 11 · 1 + 5 · 1 · 1 = 19.
The number of outcomes with a 1 from box A and exactly two 1s is
1 · 1 · 3 + 1 · 11 · 1 = 14.
Thus
P(ball 1 from A | exactly two 1s) = P(ball 1 from A and exactly two 1s) / P(exactly two 1s) = (14/288)/(19/288) = 14/19.
(b) There are three sample points whose sum is 21: (6, 12, 3), (6, 11, 4), (5, 12, 4).
Two of these have 12 drawn from B. Hence the answer is 2/3. Here is the
formal calculation.
P(ball 12 from B | sum of balls 21) = P(ball 12 from B and sum of balls 21) / P(sum of balls 21)
= P{(6, 12, 3), (5, 12, 4)} / P{(6, 12, 3), (6, 11, 4), (5, 12, 4)} = (2/288)/(3/288) = 2/3.
2.47. Define random variables X and Y and event S:
X = total number of patients for whom the drug is e↵ective
Y = number of patients for whom the drug is e↵ective, excluding your friends
S = trial is a success for your two friends.
We need to find
P(S | X = 55) = P(S ∩ {X = 55}) / P(X = 55).
Note that X ~ Bin(80, p), and thus P(X = 55) = C(80, 55) p^55 (1-p)^25. Moreover,
S ∩ {X = 55} = S ∩ {Y = 53}. The events S and {Y = 53} are independent, as S
depends on the trial outcomes for your friends, and Y on the trial outcomes of the
other patients. Thus
P(S ∩ {X = 55}) = P(S ∩ {Y = 53}) = P(S)P(Y = 53).
We have P(S) = p^2 and P(Y = 53) = C(78, 53) p^53 (1-p)^25, as Y ~ Bin(78, p). Collecting
everything:
P(S | X = 55) = P(S ∩ {X = 55}) / P(X = 55) = (p^2 · C(78, 53) p^53 (1-p)^25) / (C(80, 55) p^55 (1-p)^25)
= C(78, 53)/C(80, 55) = 297/632 ≈ 0.4699.
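Since the powers of p and 1-p cancel, the answer reduces to a ratio of binomial coefficients; this short Python check (our own addition) confirms the value 297/632.

from fractions import Fraction
from math import comb

ratio = Fraction(comb(78, 53), comb(80, 55))   # equals (55 * 54)/(80 * 79)
print(ratio, float(ratio))                     # 297/632 ≈ 0.4699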
2.48. Define events G = {Kevin is guilty}, A = {DNA match}. Before the DNA
evidence P (G) = 1/100, 000. After the DNA match
P(G|A) = P(A|G)P(G) / (P(A|G)P(G) + P(A|G^c)P(G^c))
= (1 · 1/100,000) / (1 · 1/100,000 + (1/10,000) · (99,999/100,000)) = 1/(1 + 9.9999) ≈ 1/11.
2.49. (a) The given numbers are nonnegative, so we just need to check that Σ_{k=0}^∞ P(X = k) = 1:
Σ_{k=0}^∞ P(X = k) = 4/5 + (1/10) Σ_{k=1}^∞ (2/3)^k = 4/5 + (1/10) · (2/3)/(1 - 2/3) = 1.
(b)
P(C|D) = P(D|C)P(C) / P(D) = (1 · (1/3)) / ((p + 1) · (1/3)) = 1/(1 + p).
If the guard is equally likely to name either B or C when both of them are
slated to die, then A has not gained anything (his probability of pardon is still
1/3) but C's chances of pardon have increased to 2/3. In the extreme case where the
guard would never name B unless he had to (p = 0), C is now sure to be pardoned.
2.51. Since C ⊂ B we have B ∪ C = B and thus A ∪ B ∪ C = A ∪ B. Then
P(A ∪ B ∪ C) = P(A ∪ B) = P(A) + P(B) - P(AB).
Since A and B are independent we have P(AB) = P(A)P(B). This gives
P(A ∪ B ∪ C) = P(A) + P(B) - P(A)P(B) = 1/2 + 1/4 - 1/8 = 5/8.
2.52. Yes, A, B, and C are mutually independent. There are four equations to
check:
(i) P (AB) = P (A)P (B)
(ii) P (AC) = P (A)P (C)
(iii) P (BC) = P (B)P (C)
(iv) P (ABC) = P (A)P (B)P (C).
(i) comes from inclusion-exclusion:
P(AB) = P(A) + P(B) - P(A ∪ B) = 0.06 = P(A)P(B).
(ii) comes from P(AC) = P(C) - P(A^c C) = 0.03 = P(A)P(C). (iii) is given.
Finally, (iv) comes from using inclusion-exclusion once more and the previous computations:
P(ABC) = P(A ∪ B ∪ C) - P(A) - P(B) - P(C) + P(AB) + P(AC) + P(BC) = 0.006 = P(A)P(B)P(C).
2.53. (a) If the events are disjoint then
P(A ∪ B) = P(A) + P(B) = 0.3 + 0.6 = 0.9.
(b) If the events are independent then
P(A ∪ B) = P(A) + P(B) - P(AB) = P(A) + P(B) - P(A)P(B) = 0.3 + 0.6 - 0.3 · 0.6 = 0.72.
2.54. (a) It is possible. We use the fact that A = AB ∪ AB^c and that these are
mutually exclusive:
P(A) = P(AB) + P(AB^c) = P(A|B)P(B) + P(A|B^c)P(B^c)
= (1/3)P(B) + (1/3)P(B^c) = (1/3)(P(B) + P(B^c)) = 1/3.
(b) A and B are independent. By part (a) and the given information,
P(A) = P(A|B) = P(AB)/P(B),
from which P(AB) = P(A)P(B) and independence has been verified. (Note
that the value 1/3 was not needed for this conclusion.)
2.55. (a) Since Peter throws the first dart, in order for Mary to win Peter must
fail once more than she does.
P(Mary wins) = Σ_{k=1}^∞ P(Mary wins on her kth throw) = Σ_{k=1}^∞ ((1-p)(1-r))^{k-1} (1-p)r
= (1-p)r / (1 - (1-p)(1-r)) = (1-p)r / (p + r - pr).
(b) The possible values of X are the nonnegative integers.
P (X = 0) = P (Peter wins on his first throw) = p.
For k ≥ 1,
P(X = k) = P(Mary wins on her kth throw) + P(Peter wins on his (k+1)st throw)
= ((1-p)(1-r))^{k-1} (1-p)r + ((1-p)(1-r))^k p = ((1-p)(1-r))^{k-1} (1-p)(p + r - pr).
Together with P(X = 0) = p these probabilities add up to 1, since
Σ_{k=1}^∞ ((1-p)(1-r))^{k-1} (1-p)(p + r - pr) = (1-p)(p + r - pr) / (1 - (1-p)(1-r)) = 1 - p.
2.57. (a) Let E1 be the event that the first component functions. Let E2 be the
event that the second component functions. Let S be the event that the entire
system functions. S = E1 \E2 since both components must function in order for
the whole system to be operational. By the assumption that each component
acts independently, we have
Let Xi be a Bernoulli random variable taking the value 1 if the ith element of the
first component is working. The information given is that P (Xi = 1) = 0.95,
P (Xi = 0) = 0.05 and X1 , . . . , X8 are mutually independent. Similarly, let
Yi be a Bernoulli random variable taking the value 1 if the ith element of the
second component is working. Then P(Yi = 1) = 0.90, P(Yi = 0) = 0.1 and
Y1, . . . , Y4 are mutually independent. Let X = Σ_{i=1}^8 Xi give the total number
of working elements in component number one and Y = Σ_{i=1}^4 Yi the total
number of working elements in component number 2. Then X ~ Bin(8, 0.95)
and Y ~ Bin(4, 0.90), and X and Y are independent (by the assumption that
2.59. Define events: B = {the bus functions}, T = {the train functions}, and S =
{no storm}. The event that travel is possible is (B [ T ) \ S = BS [ T S. We
calculate the probability with inclusion-exclusion and independence:
2.60. (a) P(AB^c) = P(A) - P(AB) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A)P(B^c).
(b) Apply first de Morgan and then inclusion-exclusion:
P(A^c B^c) = 1 - P(A ∪ B) = 1 - P(A) - P(B) + P(A)P(B) = (1 - P(A))(1 - P(B)) = P(A^c)P(B^c).
(c) P(AB^c C) = P(AC) - P(ABC) = P(A)P(C) - P(A)P(B)P(C) = P(A)(1 - P(B))P(C) = P(A)P(B^c)P(C).
(d) Again first de Morgan and then inclusion-exclusion:
P(A^c B^c C^c) = 1 - P(A ∪ B ∪ C)
= 1 - P(A) - P(B) - P(C) + P(AB) + P(AC) + P(BC) - P(ABC)
= 1 - P(A) - P(B) - P(C) + P(A)P(B) + P(A)P(C) + P(B)P(C) - P(A)P(B)P(C)
= (1 - P(A))(1 - P(B))(1 - P(C))
= P(A^c)P(B^c)P(C^c).
2.61. (a) Treat each draw as a trial: green is success, red is failure. By counting
favorable outcomes, the probability of success is p = 3/7 for each draw. Because we
draw with replacement the outcomes are independent. Thus the number of greens
in the 9 picks is the number of successes in 9 trials, hence a Bin(9, 3/7) distribution.
Using the probability mass function of the binomial distribution gives
P(X ≥ 1) = 1 - P(X = 0) = 1 - (1-p)^9 ≈ 0.9935,
P(X ≤ 5) = Σ_{k=0}^5 P(X = k) = Σ_{k=0}^5 C(9, k) p^k (1-p)^{9-k} ≈ 0.8653.
(b) N is the number of trials needed for the first success, and so has geometric
distribution with parameter p = 3/7. The probability mass function of the geometric
distribution gives
P(N ≤ 9) = Σ_{k=1}^9 P(N = k) = Σ_{k=1}^9 p(1-p)^{k-1} ≈ 0.9935.
(c) We have P(X ≥ 1) = P(N ≤ 9). We can check this by using the geometric sum
formula to get
Σ_{k=1}^9 p(1-p)^{k-1} = p · (1 - (1-p)^9)/(1 - (1-p)) = 1 - (1-p)^9.
Here is another way to see this, without any algebra. Imagine that we draw balls
with replacement infinitely many times. Think of X as the number of green balls
in the first 9 draws. N is still the number of draws needed for the first green. Now
if X ≥ 1, then we have at least one green within the first 9 draws, which means
that the first green draw happened within the first 9 draws. Thus X ≥ 1 implies
N ≤ 9. But this works in the opposite direction as well: if N ≤ 9 then the first
green draw happened within the first 9 draws, which means that we must have at
least one green within the first 9 picks. Thus N ≤ 9 implies X ≥ 1. This gives the
equality of events {X ≥ 1} = {N ≤ 9}, and hence the probabilities must agree as
well.
2.62. Regard the drawing of three marbles as one trial, with success probability p
given by
p = P(all three marbles blue) = C(9, 3)/C(13, 3) = (7 · 8 · 9 · 10)/(10 · 11 · 12 · 13) = 42/143.
X ~ Bin(20, 42/143). The probability mass function is
P(X = k) = C(20, k) (42/143)^k (101/143)^{20-k} for k = 0, 1, 2, . . . , 20.
2.63. The number of heads in n coin flips has distribution Bin(n, 1/2). Thus the
probability of winning if we choose to flip n times is
fn = P(n flips yield exactly 2 heads) = C(n, 2) (1/2)^n = n(n-1)/2^{n+1}.
We want to find the n which maximizes fn. Let us compare fn and fn+1. We have
fn < fn+1  ⟺  n(n-1)/2^{n+1} < (n+1)n/2^{n+2}  ⟺  2(n-1) < n+1  ⟺  n < 3.
Similarly, fn > fn+1 if and only if n > 3, and f3 = f4. Thus
f2 < f3 = f4 > f5 > f6 > . . . .
This means that the maximum happens at n = 3 and n = 4, and the probability
of winning at those values is f3 = f4 = (3 · 2)/2^4 = 3/8.
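A brute-force scan confirms the maximizers; scanning n up to 20 (an arbitrary cutoff of ours, since fn tends to 0) is more than enough.

from math import comb

f = {n: comb(n, 2) / 2**n for n in range(2, 21)}
best = max(f.values())
print([n for n, v in f.items() if abs(v - best) < 1e-12], best)   # [3, 4] 0.375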
2.64. Let X be the number of correct answers. X is the number of successes in 20
independent trials with success probability p + r/2.
P(X ≥ 19) = P(X = 19) + P(X = 20) = 20 (p + r/2)^19 (q + r/2) + (p + r/2)^20.
2.65. Let A be the event that at least one die lands on a 4 and B be the event
that all three dice land on di↵erent numbers. Our sample space is the set of all
triples (a1, a2, a3) with 1 ≤ ai ≤ 6. All outcomes are equally likely and there are
216 outcomes. We need P(A|B) = P(AB)/P(B). There are 6 · 5 · 4 = 120 elements in B.
To count the elements of AB, we first consider A^c B. This is the set of triples where
the three numbers are distinct and none of them is a 4. So #A^c B = 5 · 4 · 3 = 60.
Then #AB = #B - #A^c B = 120 - 60 = 60 and
P(A|B) = P(AB)/P(B) = (60/216)/(120/216) = 1/2.
2.66. Let
fn = P(n die rolls give exactly two sixes) = C(n, 2) (1/6)^2 (5/6)^{n-2} = n(n-1) 5^{n-2} / (2 · 6^n).
Next,
fn < fn+1  ⟺  n(n-1) 5^{n-2}/(2 · 6^n) < (n+1)n 5^{n-1}/(2 · 6^{n+1})  ⟺  6(n-1) < 5(n+1)  ⟺  n < 11.
By reversing the inequalities we get the equivalence
fn > fn+1  ⟺  n > 11.
By complementing the two equivalences, we get
fn = fn+1  ⟺  fn ≥ fn+1 and fn ≤ fn+1  ⟺  n ≥ 11 and n ≤ 11  ⟺  n = 11.
Putting all these facts together we conclude that the probability of two sixes is
maximized by n = 11 and n = 12 and for these two values of n, that probability is
11 · 10 · 5^9 / (2 · 6^11) ≈ 0.2961.
2.67. Since {X = n+k} ⊂ {X > n} for k ≥ 1, we have
P(X = n+k | X > n) = P(X = n+k, X > n)/P(X > n) = P(X = n+k)/P(X > n) = (1-p)^{n+k-1} p / P(X > n).
Evaluate the denominator:
P(X > n) = Σ_{k=n+1}^∞ P(X = k) = Σ_{k=n+1}^∞ (1-p)^{k-1} p = p(1-p)^n Σ_{k=0}^∞ (1-p)^k
= p(1-p)^n · 1/(1 - (1-p)) = (1-p)^n.
Thus,
P(X = n+k | X > n) = (1-p)^{n+k-1} p / (1-p)^n = (1-p)^{k-1} p = P(X = k).
2.68. For k ≥ 1, the assumed memoryless property gives
P(X = k) = P(X = k+1 | X > 1) = P(X = k+1)/P(X > 1)
Now 2655/10,000 ≠ (513/1000)^2, which says that P(A1A2) ≠ P(A1)P(A2). In other words,
A1 and A2 are not independent without the conditioning on the type of coin. The
intuitive reason is that the first flip gives us information about the coin we hold,
and thereby alters our expectations about the second flip.
2.70. The relevant probabilities: P(A) = P(B) = 2p(1-p) and
P(AB) = P{(T, H, T), (H, T, H)} = p^2(1-p) + p(1-p)^2 = p(1-p).
Thus A and B are independent if and only if
(2p(1-p))^2 = p(1-p)  ⟺  4p^2(1-p)^2 - p(1-p) = 0  ⟺  p(1-p)(4p(1-p) - 1) = 0
 ⟺  p = 0 or 1-p = 0 or 4p(1-p) - 1 = 0  ⟺  p ∈ {0, 1/2, 1}.
Note that cancelling p(1-p) from the very first equation misses the solutions p = 0
and p = 1.
2.71. Let F = {coin is fair}, B = {coin is biased} and Ak = {kth flip is tails}.
We assume that conditionally on F , the events Ak are independent, and similarly
conditionally on B. Let Dn = A1 ∩ A2 ∩ · · · ∩ An = {the first n flips are all tails}.
(a)
P(B|Dn) = P(Dn|B)P(B) / (P(Dn|B)P(B) + P(Dn|F)P(F)) = (3/5)^n (1/10) / ((3/5)^n (1/10) + (1/2)^n (9/10))
= (3/5)^n / ((3/5)^n + 9 (1/2)^n).
In particular, P(B|D1) = 2/17 and P(B|D2) = 4/29.
(b)
(3/5)^24 / ((3/5)^24 + 9 (1/2)^24) ≈ 0.898
while
(3/5)^25 / ((3/5)^25 + 9 (1/2)^25) ≈ 0.914,
so 25 flips are needed.
(c)
P(A_{n+1} | Dn) = P(D_{n+1})/P(Dn) = (P(D_{n+1}|B)P(B) + P(D_{n+1}|F)P(F)) / (P(Dn|B)P(B) + P(Dn|F)P(F))
= ((3/5)^{n+1} (1/10) + (1/2)^{n+1} (9/10)) / ((3/5)^n (1/10) + (1/2)^n (9/10)).
(d) Intuitively speaking, an unending sequence of tails would push the probability
of a biased coin to 1, and hence the probability of the next tails is 3/5. For a
rigorous calculation we take the limit of the previous answer:
lim_{n→∞} P(A_{n+1} | Dn) = lim_{n→∞} ((3/5)^{n+1} (1/10) + (1/2)^{n+1} (9/10)) / ((3/5)^n (1/10) + (1/2)^n (9/10))
= lim_{n→∞} (3/5 + (9/2)(5/6)^n) / (1 + 9 (5/6)^n) = 3/5.
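The posteriors in parts (a) and (b) are easy to tabulate numerically; the following Python sketch (ours, not part of the solution) recovers the threshold of 25 flips.

def posterior_biased(n):
    # prior P(B) = 1/10, P(tails | B) = 3/5, P(tails | F) = 1/2
    biased = (3/5)**n * (1/10)
    fair = (1/2)**n * (9/10)
    return biased / (biased + fair)

n = 1
while posterior_biased(n) < 0.9:
    n += 1
print(n, round(posterior_biased(n - 1), 3), round(posterior_biased(n), 3))   # 25 0.898 0.914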
2.72. The sample space for n trials is the same, regardless of the probabilities,
namely the space of ordered n-tuples of zeros and ones:
⌦ = {! = (s1 , . . . , sn ) : each si equals 0 or 1}.
By independence, the probability of a sample point ! = (s1 , . . . , sn ) is obtained by
multiplying together a factor pi for each si = 1 and 1 pi for each si = 0. We can
express this in a single formula as follows:
P{(s1, . . . , sn)} = ∏_{i=1}^n p_i^{s_i} (1 - p_i)^{1 - s_i}.
2.73. Let X be the number of blond customers at the pancake place. The popula-
tion of the town is 500, and 100 of them are blond. We may assume that the visitors
are chosen randomly from the population, which means that we take a sample of
size 14 without replacement from the population. X denotes the number of blonds
among this sample. This is exactly the setup for the hypergeometric distribution
and X ~ Hypergeom(500, 100, 14). (Because the total population size is N = 500,
the number of blonds is NA = 100 and we take a sample of n = 14.) We can now
use the probability mass function of the hypergeometric distribution to answer the
two questions.
(a)
P(exactly 10 blonds) = P(X = 10) = C(100, 10) C(400, 4) / C(500, 14) ≈ 0.00003122.
(b)
P(at most 2 blonds) = P(X ≤ 2) = Σ_{k=0}^2 P(X = k) = Σ_{k=0}^2 C(100, k) C(400, 14-k) / C(500, 14)
≈ 0.4458.
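Both hypergeometric probabilities can be checked directly from binomial coefficients; the helper below is our own shorthand for the Hypergeom(500, 100, 14) probability mass function.

from math import comb

def hyp_pmf(k, N=500, NA=100, n=14):
    return comb(NA, k) * comb(N - NA, n - k) / comb(N, n)

print(hyp_pmf(10))                          # ≈ 3.122e-05
print(sum(hyp_pmf(k) for k in range(3)))    # ≈ 0.4458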
2.74. Define events: D = {Steve is a drug user}, A1 = {Steve fails the first drug test}
and A2 = {Steve fails the second drug test}. Assume that Steve is no more or less
likely to be a drug user than a random person from the company, so P (D) =
0.01. The data about the reliability of the tests tells us that P (Ai |D) = 0.99
and P (Ai |Dc ) = 0.02 for i = 1, 2, and conditional independence P (A1 A2 |D) =
P (A1 |D)P (A2 |D) and also the same under conditioning on Dc .
(a)
P(D|A1) = P(A1|D)P(D) / (P(A1|D)P(D) + P(A1|D^c)P(D^c))
= ((99/100) · (1/100)) / ((99/100) · (1/100) + (2/100) · (99/100)) = 1/3.
(b)
P(A2|A1) = P(A1A2)/P(A1) = (P(A1A2|D)P(D) + P(A1A2|D^c)P(D^c)) / (P(A1|D)P(D) + P(A1|D^c)P(D^c))
= ((99/100)^2 · (1/100) + (2/100)^2 · (99/100)) / ((99/100) · (1/100) + (2/100) · (99/100)) = 103/300 ≈ 0.3433.
(c)
P(D|A1A2) = P(A1A2|D)P(D) / (P(A1A2|D)P(D) + P(A1A2|D^c)P(D^c))
= ((99/100)^2 · (1/100)) / ((99/100)^2 · (1/100) + (2/100)^2 · (99/100)) = 99/103 ≈ 0.9612.
Then Ac is the event that the phone is from factory I. We know that
P(A) = 0.4 = 2/5,  P(A^c) = 0.6 = 3/5,  P(Bi|A) = 0.2 = 1/5,  P(Bi|A^c) = 0.1 = 1/10.
We need to compute P(A|B1B2). By Bayes' theorem,
P(A|B1B2) = P(B1B2|A) P(A) / (P(B1B2|A)P(A) + P(B1B2|A^c)P(A^c)).
We may assume that conditionally on A the events B1 and B2 are independent. This
means that given that the store gets its phones from factory II, the defectiveness of
the phones stocked there are independent. We may also assume that conditionally
on Ac the events B1 and B2 are independent. Then
P(B1B2|A) = P(B1|A)P(B2|A) = (1/5)^2,   P(B1B2|A^c) = P(B1|A^c)P(B2|A^c) = (1/10)^2,
and
P(A|B1B2) = ((1/5)^2 · (2/5)) / ((1/5)^2 · (2/5) + (1/10)^2 · (3/5)) = 8/11 ≈ 0.7273.
2.76. Let A2 be the event that the second test comes back positive. Take now
P(D) = 96/494 ≈ 0.194 as the prior. Then
P(D|A2) = P(A2|D)P(D) / (P(A2|D)P(D) + P(A2|D^c)P(D^c))
= ((96/100) · (96/494)) / ((96/100) · (96/494) + (2/100) · (398/494)) = 2304/2503 ≈ 0.9205.
2.77. By definition P(A|B) = P(AB)/P(B). Since AB ⊂ B, we have P(AB) ≤ P(B) and
thus P(A|B) = P(AB)/P(B) ≤ 1. Furthermore, P(A|B) = P(AB)/P(B) ≥ 0 because P(B) > 0
and P(AB) ≥ 0. This proves the property 0 ≤ P(A|B) ≤ 1.
To check P(Ω | B) = 1 note that Ω ∩ B = B, and so
P(Ω | B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.
Similarly, ∅ ∩ B = ∅, thus
P(∅ | B) = P(∅ ∩ B)/P(B) = P(∅)/P(B) = 0/P(B) = 0.
Finally, if we have a pairwise disjoint sequence {Ai} then {BAi} are also pairwise
disjoint, and their union is (∪_{i=1}^∞ Ai) ∩ B. This gives
P(∪_{i=1}^∞ Ai | B) = P((∪_{i=1}^∞ Ai) ∩ B)/P(B) = P(∪_{i=1}^∞ Ai B)/P(B) = Σ_{i=1}^∞ P(Ai B)/P(B) = Σ_{i=1}^∞ P(Ai|B).
ways to choose the first 22 birthdays to be all different and the twenty-third to be
one of the first 22. Thus, the desired probability is
22 · ∏_{k=0}^{21} (365 - k) / 365^23 ≈ 0.0316.
2.80. Assume that birth months of distinct people are independent and that for
any particular person each month is equally likely. Then we are asking that a
sample of seven items with replacement from a set of 12 produces no repetitions.
The probability is
(12 · 11 · 10 · · · 6) / 12^7 = 385/3456 ≈ 0.1114.
2.81. Let An be the event that there is a match among the birthdays of the chosen
n Martians. Then
P(An) = 1 - P(all n birthdays are distinct) = 1 - (669 · 668 · · · (669 - (n-1))) / 669^n.
To estimate the product we use 1 - x ≈ e^{-x} to get
(669 · 668 · · · (669 - (n-1))) / 669^n = ∏_{k=0}^{n-1} (1 - k/669) ≈ ∏_{k=0}^{n-1} e^{-k/669}
= e^{-(1/669) Σ_{k=0}^{n-1} k} = e^{-n(n-1)/(2 · 669)} ≈ e^{-n^2/(2 · 669)}.
Thus P(An) ≈ 1 - e^{-n^2/(2 · 669)}. Now solving the inequality P(An) ≥ 0.9:
1 - e^{-n^2/(2 · 669)} ≥ 0.9  ⟺  -n^2/(2 · 669) ≤ ln(1 - 0.9)  ⟺  n ≥ √(2 · 669 · ln 10) ≈ 55.5.
This would suggest n = 56.
In fact this is correct: the actual numerical values are P(A56) ≈ 0.9064 and
P(A55) ≈ 0.8980.
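The exact product and the exponential approximation are compared numerically below; this short Python check (our own addition) confirms the threshold n = 56 quoted above.

from math import exp

def p_match(n, days=669):
    prod = 1.0
    for k in range(n):
        prod *= (days - k) / days
    return 1 - prod

for n in (55, 56):
    print(n, round(p_match(n), 4), round(1 - exp(-n * n / (2 * 669)), 4))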
Solutions to Chapter 3
3.1. (a) The random variable X takes the values 1, 2, 3, 4 and 5. Collecting the
probabilities corresponding to the values that are at most 3 we get
P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = pX(1) + pX(2) + pX(3) = 1/7 + 1/14 + 3/14 = 3/7.
(b) Now we have to collect the probabilities corresponding to the values which are
less than 3:
P(X < 3) = P(X = 1) + P(X = 2) = pX(1) + pX(2) = 1/7 + 1/14 = 3/14.
(c) First we use the definition of conditional probability to get
P(X < 4.12 | X > 1.638) = P(X < 4.12 and X > 1.638) / P(X > 1.638).
We have P (X < 4.12 and X > 1.638) = P (1.638 < X < 4.12). The possible values
of X between 1.638 and 4.12 are 2, 3 and 4. Thus
P(X < 4.12 and X > 1.638) = pX(2) + pX(3) + pX(4) = 1/14 + 3/14 + 2/7 = 4/7.
Similarly,
P(X > 1.638) = pX(2) + pX(3) + pX(4) + pX(5) = 1/14 + 3/14 + 2/7 + 2/7 = 6/7.
From this we get
P(X < 4.12 | X > 1.638) = (4/7)/(6/7) = 2/3.
3.2. (a) We must have that the probability mass function sums to one. Hence, we
require
1 = Σ_{k=1}^6 p(k) = c(1 + 2 + 3 + 4 + 5 + 6) = 21c.
Thus, c = 1/21.
In the first step we used the formula for f(x), and the fact that it is equal to 0 for
x ≤ 0.
(b) Using the definition of the probability density function we get
P(-1 < X < 1) = ∫_{-1}^{1} f(x) dx = ∫_{0}^{1} 3e^{-3x} dx = [-e^{-3x}]_{x=0}^{x=1} = 1 - e^{-3}.
(c) Using the definition of the probability density function again we get
P(X < 5) = ∫_{-∞}^{5} f(x) dx = ∫_{0}^{5} 3e^{-3x} dx = [-e^{-3x}]_{x=0}^{x=5} = 1 - e^{-15}.
P(2 < X < 4 | X < 5) = P(2 < X < 4)/P(X < 5) = (e^{-6} - e^{-12})/(1 - e^{-15}).
3.4. (a) The density of X is 1/6 on [4, 10] and zero otherwise. Hence,
P(X < 6) = P(4 < X < 6) = (6 - 4)/6 = 1/3.
(b)
3.5. The possible values of a discrete random variable are exactly the values where
the c.d.f. jumps. In this case these are the values 1, 4/3, 3/2 and 9/5. The corre-
sponding probabilities are equal to the size of corresponding jumps:
pX(1) = 1/3 - 0 = 1/3,
pX(4/3) = 1/2 - 1/3 = 1/6,
pX(3/2) = 3/4 - 1/2 = 1/4,
pX(9/5) = 1 - 3/4 = 1/4.
3.6. For the random variable in Exercise 3.1, we may use (3.13). For s ∈ (-∞, ∞),
F(s) = P(X ≤ s) =
  0 for s < 1,
  1/7 for 1 ≤ s < 2,
  3/14 for 2 ≤ s < 3,
  6/14 for 3 ≤ s < 4,
  10/14 for 4 ≤ s < 5,
  1 for 5 ≤ s.
For the random variable in Exercise 3.3, we may use (3.15). For s ≤ 0 we have
that
P(X ≤ s) = 0,
whereas for s > 0 we have
P(X ≤ s) = ∫_0^s 3e^{-3x} dx = 1 - e^{-3s}.
3.7. (a) If P (a X b) = 1 then F (y) =p 0 for y < a
p and F (y) = 1 for y b.
From the definition of F we see that a = 2 and b = 3 gives the smallest such
interval.
(b) Since X is continuous, P (X = 1.6) = 0. We can also see this directly from F :
P (X = 1.6) = F (1.6) lim F (x) = F (1.6) F (1.6 ).
x!1.6
(b) We have
E[|X - 2|] = Σ_{k=1}^5 |k - 2| pX(k) = 1 · (1/7) + 0 · (1/14) + 1 · (3/14) + 2 · (2/7) + 3 · (2/7) = 25/14.
Using integration by parts we can evaluate the last integral to get E[X] = 1/3.
(b) e^{2X} is a function of X, and X is continuous, so we can compute E[e^{2X}] as
follows:
E[e^{2X}] = ∫_{-∞}^{∞} e^{2x} f(x) dx = ∫_0^∞ e^{2x} · 3e^{-3x} dx = ∫_0^∞ 3e^{-x} dx = 3.
3.10. (a) The random variable |X| takes values 0 and 1 with probabilities
P(|X| = 0) = P(X = 0) = 1/3 and P(|X| = 1) = P(X = -1) + P(X = 1) = 2/3.
Then the definition of expectation gives
E[|X|] = 0 · P(|X| = 0) + 1 · P(|X| = 1) = 2/3.
(b) Applying formula (3.24):
E[|X|] = Σ_k |k| P(X = k) = 1 · P(X = -1) + 0 · P(X = 0) + 1 · P(X = 1) = 1/2 + 1/6 = 2/3.
3.11. By (3.25) we have
E[(Y - 1)^2] = ∫_{-∞}^{∞} (x - 1)^2 f(x) dx = ∫_1^2 (x - 1)^2 · (2/3)x dx = 7/18.
From this we get that m = 4 works as the median, but any number that is larger
or smaller than 4 is not a median.
For X from Exercise 3.3 we have
P(X ≤ m) = 1 - e^{-3m} and P(X ≥ m) = e^{-3m} if m ≥ 0,
and P(X ≤ m) = 0, P(X ≥ m) = 1 for m < 0. From this we get that the median
m satisfies e^{-3m} = 1/2, which leads to m = ln(2)/3.
(b) We need P(X ≤ q) ≥ 0.9 and P(X ≥ q) ≥ 0.1. Since X is continuous, we
must have P(X ≤ q) + P(X ≥ q) = 1 and hence P(X ≤ q) = 0.9 and P(X ≥ q) = 0.1.
Using the calculations from part (a) we see that e^{-3q} = 0.1 from which
q = ln(10)/3.
3.14. The mean of the random variable X from Exercise 3.1 is
E[X] = Σ_{k=1}^5 k pX(k) = 1 · (1/7) + 2 · (1/14) + 3 · (3/14) + 4 · (2/7) + 5 · (2/7) = 7/2.
= P(Z < 4/√7) = Φ(4/√7) ≈ Φ(1.51) ≈ 0.9345.
(d)
P(X < -10) = P((X - µ)/σ < (-10 - µ)/σ) = P(Z < -8/√7) = Φ(-8/√7) = 1 - Φ(8/√7)
≈ 1 - Φ(3.02) ≈ 1 - 0.9987 = 0.0013.
(e)
P(X > 4) = P((X - µ)/σ > (4 - µ)/σ) = P(Z > 6/√7) = 1 - Φ(6/√7)
≈ 1 - Φ(2.27) ≈ 1 - 0.9884 = 0.0116.
3.20. We must show that Y ~ Unif[0, c]. We find the cumulative distribution function.
For any t ∈ (-∞, ∞) we have
FY(t) = P(Y ≤ t) = P(c - X ≤ t) = P(c - t ≤ X) =
  0 for t < 0,
  (c - (c - t))/c = t/c for 0 ≤ t < c,
  1 for c ≤ t,
which is the cumulative distribution function for a Unif[0, c] random variable.
3.21. (a) The number of heads out of 2 coin flips can be 0, 1 or 2. These are the pos-
sible values of X. The possible outcomes of the experiment are {HH, HT, T H, T T },
and each one of these has a probability 14 . We can compute the probability mass
function of X by identifying the events {X = 0}, {X = 1}, {X = 2} and computing
the corresponding probabilities:
pX(0) = P(X = 0) = P({TT}) = 1/4,
pX(1) = P(X = 1) = P({HT, TH}) = 2/4 = 1/2,
pX(2) = P(X = 2) = P({HH}) = 1/4.
(b) Using the probability mass function from (a):
P(X ≥ 1) = P(X = 1) + P(X = 2) = pX(1) + pX(2) = 3/4
and
P(X > 1) = P(X = 2) = pX(2) = 1/4.
(c) Since X is a discrete random variable, we can compute the expectation as
E[X] = Σ_k k pX(k) = 0 · pX(0) + 1 · pX(1) + 2 · pX(2) = 1/2 + 2 · (1/4) = 1.
Similarly, E[X^2] = 0^2 · pX(0) + 1^2 · pX(1) + 2^2 · pX(2) = 1/2 + 4 · (1/4) = 3/2.
This gives
Var(X) = E[X^2] - (E[X])^2 = 3/2 - 1 = 1/2.
3.22. (a) The random variable X is binomially distributed with parameters n = 3
and p = 12 . Thus, the possible values of X are {0, 1, 2, 3} and the probability
mass function is
1 1 1 1
P (X = 0) = 3 , P (X = 1) = 3 · 3 , P (X = 2) = 3 · 3 , P (X = 3) = 3 .
2 2 2 2
(b) We have
3+3+1 7
P (X 1) = P (X = 1) + P (X = 2) + P (X = 3) = = ,
8 8
and
3+1 1
P (X > 1) = P (X = 2) + P (X = 3) = = .
8 2
This gives b = √(23/6). However, x^2 - 23/6 is negative for 1 ≤ x < √(23/6) ≈ 1.96, which
shows that the function f cannot be a pdf.
(b) We need b ≥ 0, otherwise the function is zero everywhere. The cos x function
is non-negative on [-π/2, π/2], but then it goes below 0. Thus if g is a pdf then
b ≤ π/2. Computing the integral of g on (-∞, ∞) we get
∫_{-∞}^{∞} g(x) dx = ∫_{-b}^{b} cos(x) dx = 2 sin(b).
There is exactly one solution for 2 sin(b) = 1 in the interval (0, π/2], this is b =
arcsin(1/2) = π/6. For this choice of b the function g is a pdf.
3.26. (a) We require that the probability mass function sum to one. Hence,
1 = Σ_{k=1}^∞ pX(k) = Σ_{k=1}^∞ c/(k(k+1)).
and
E[X^2] = Σ_k k^2 P(X = k) = 1 · (2/5) + 4 · (1/5) + 9 · (1/5) + 16 · (1/5) = 31/5.
This leads to
Var(X) = E[X^2] - (E[X])^2 = 34/25.
3.28. (a) The possible values of X are 1, 2, and 3. Since there are three boxes with
nice prizes, we have
P(X = 1) = 3/5.
Next, for X = 2, we must first choose a box that does not have a good prize
(two choices) followed by one that does (three choices). Hence,
P(X = 2) = (2 · 3)/(5 · 4) = 3/10.
Similarly,
P(X = 3) = (2 · 1 · 3)/(5 · 4 · 3) = 1/10.
(b) The expectation is
E[X] = 1 · (3/5) + 2 · (3/10) + 3 · (1/10) = 3/2.
(c) The second moment is
E[X^2] = 1^2 · (3/5) + 2^2 · (3/10) + 3^2 · (1/10) = 27/10.
Hence, the variance is
Var(X) = E[X^2] - (E[X])^2 = 27/10 - (3/2)^2 = 9/20.
(d) Let W be the gain or loss in this game. Then
W = 100(2 - X) = 200 - 100X, which equals 100 if X = 1, 0 if X = 2, and -100 if X = 3.
Thus, by Fact 3.52,
E[W] = E[200 - 100X] = 200 - 100 E[X] = 200 - 100 · (3/2) = 50.
3.29. The possible values of X are the possible class sizes: 17, 21, 24, 28. We can
compute the corresponding probabilities by computing the probability of choosing
a student from that class:
pX(17) = 17/90,  pX(21) = 21/90 = 7/30,  pX(24) = 24/90 = 4/15,  pX(28) = 28/90 = 14/45.
From this we can compute E[X]:
E[X] = Σ_k k P(X = k) = 17 · (17/90) + 21 · (7/30) + 24 · (4/15) + 28 · (14/45) = 209/9.
and
E[X^2] = ∫_{-∞}^{∞} x^2 f(x) dx = ∫_1^∞ x^2 · 3x^{-4} dx = [-3x^{-1}]_{x=1}^{x=∞} = 3.
From this we get
Var(X) = E[X^2] - (E[X])^2 = 3 - 9/4 = 3/4.
(g) We have
E[5X^2 + 3X] = ∫_{-∞}^{∞} (5x^2 + 3x) f(x) dx = ∫_1^∞ (5x^2 + 3x) · 3x^{-4} dx
= [-15x^{-1} - (9/2)x^{-2}]_{x=1}^{x=∞} = 15 + 9/2 = 39/2.
(h) We have
E[X^n] = ∫_{-∞}^{∞} x^n f(x) dx = ∫_1^∞ x^n · 3x^{-4} dx.
Evaluating this integral for integer values of n we get
E[X^n] = ∞ for n ≥ 3, and E[X^n] = 3/(3 - n) for n ≤ 2.
(d) We have
E[X^{1/4}] = ∫_1^∞ x^{1/4} · (1/2) x^{-3/2} dx = ∫_1^∞ (1/2) x^{-5/4} dx = [-2x^{-1/4}]_{x=1}^{x=∞} = 2.
3.33. (a) A probability density function must be nonnegative, and it has to inte-
grate to 1. Thus c ≥ 0 and we must have
1 = ∫_{-∞}^{∞} f(x) dx = ∫_1^2 (1/4) dx + ∫_3^5 c dx = 1/4 + 2c.
This gives c = 3/8.
(b) Since X has a probability density function we can compute the probability in
question by integrating f(x) on the interval [3/2, 4]:
P(3/2 < X < 4) = ∫_{3/2}^4 f(x) dx = ∫_{3/2}^2 (1/4) dx + ∫_3^4 c dx = (1/2) · (1/4) + 1 · c = 1/2.
(c) We can compute the expectation using the formula E[X] = ∫_{-∞}^{∞} x f(x) dx and
evaluating the integral using the definition of f:
E[X] = ∫_{-∞}^{∞} x f(x) dx = ∫_1^2 x · (1/4) dx + ∫_3^5 x · c dx
= [x^2/8]_{x=1}^{x=2} + [cx^2/2]_{x=3}^{x=5} = 3/8 + (3/8) · (25 - 9)/2 = 3/8 + 3 = 27/8.
3.34. (a) Since X is discrete, we can compute E[g(X)] using the following formula:
X 1 1 1
E[g(X)] = P (X = k)g(k) = g(1) + g(2) + g(5).
2 3 6
k
R3
We could also compute this probability by evaluating the integral 2 f (x)dx.
(c) Using the probability density function we can write
Z 1
2 2X
E[(1 + X) e ]= f (x)(1 + x)2 e 2x dx
0
Z 1 Z 1
2 2x 2
= (1 + x) e (1 + x) dx = e 2x dx
0 0
1
1 2x 1
= e = .
2 x=0 2
3.38. (a) Since Z is continuous and the pd.f. is given, we can compute its expec-
tation as
Z 1 Z 1 z=1
E[Z] = zf (z)dz = z · 52 z 4 dz = 12
5 6
z = 0.
1 1 z= 1
(b) We have
Z 1/2 Z 1/2 z=1/2
5 4 1 5
P (0 < Z < 1/2) = f (z)dz = 2 z dz = 12 z 5 = 1
2 2 = 1
64 .
0 0 z=0
(c) We have
Then F (1) = P (X 1) = P (X = 1) = 13 ,
3
F (2) = P (X 2) = P (X = 1) + P (X = 2) =
4
and
F (3) = P (X 3) = P (X = 1) + P (X + 2) + P (X = 3) = 1.
(b) There are a number of possible solutions. Here is one that can be checked easily
using part (a):
81
>
> 3 0x1
>
<5 1<x2
f (x) = 12 1
>
> 2<x3
>
:4
0 otherwise.
1
3.40. Here is a continuous example:
R 1 let f (x) = x2 for x 1 and 0 otherwise. This
is a nonnegative function with 1 f (x)dx = 1, thus there is a random variable X
with p.d.f. f . Then the cumulative distribution function of X is given by
Z x (
0, if x < 1
F (x) = f (y)dy = R x 1
1 1 y2
dy = 1 1/x, if x 1.
1
In particular, F (n) = 1 n for each positive integer n.
3.41. We begin by deriving the probability F (s) = P (X s) using the law of total
probability. For s 2 (3, 4),
6
X 3
X 6
X
1 s 1
F (s) = P (X s) = P (X s | Y = k)P (Y = k) = + ·
6 k 6
k=1 k=1 k=4
1 37s
= + .
2 360
We can find the density function f on the interval (3, 4) by di↵erentiating this.
Thus
f (s) = F 0 (s) = 37
360 for s 2 (3, 4).
3.42. (a) Note that 0 X 1 so FX (x) = 1 for x 1 and FX (x) = 0 for x < 0.
For 0 x < 1 the event {X x} is the same as the event that the chosen
point is in the trapezoid Dx with vertices (0, 0), (x, 0), (x, 2 x), (0, 2). The
area of this trapezoid is 12 (2 + 2 x)x, while the area of D is (2+1)1
2 = 32 . Thus
1
area(Dx ) 2 (2 +2 x)x 4x x2
P (X x) = = 3 = .
area(D) 2
3 3
Thus 8
>
<1, if x 1
4x x2
FX (x) = 3 , if 0 x < 1
>3
:
0, if x < 0.
To find FY we first note that 0 Y 2 so FY (y) = 1 for y 2 and FY (y) = 0
for y < 0.
For 0 y < 1 the event {Y y} is the same as the event that the chosen
point is in the rectangle with vertices (0, 0), (0, y), (1, y), (1, 0). The area of
this rectangle is y, so in that case P (Y y) = y3 = 2y3 .
2
If 1 y < 2 then the event {Y y} is the same as the event that the
chosen point in the region Dy with vertices (0, 0), (0, y), (2 y, y), (1, 1), (1, 0).
The area of this region can be computed for example by subtracting the area of
the triangle with vertices (2, 0), (0, y), (2 y, y) from the area of D, this gives
y2 1
3 (2 y)2 y2 1 2y 1
2 2 = 2y 2 2. Thus P (Y y) = 3
2 2
= 3 4y y2 1
2
Thus we have
8
>
> 1, if y 2
>
< 1 4y 2
y 1 , if 1y<2
FY (y) = 32y
>
> , if 0y<1
>
:3
0, if x < 0.
(b) Both cumulative distribution functions found in part (a) are continuous ev-
erywhere, and di↵erentiable everywhere apart from maybe a couple of points.
Thus we can find fX and fY by di↵erentiating FX and FY :
(
4 2x
3 , if 0 x < 1
fX (x) = 3
0, otherwise.
8
> 1
< 3 (4 2y) , if 1 y < 2
fY (y) = 23 , if 0 y < 1
>
:
0, otherwise.
3.43. If (a, b) is a point in the square [0, 1]2 then the distances from the four sides
are a, b, 1 a, 1 b and the minimal distance is the minimum of these four numbers.
Since min(a, 1 a) 1/2, this minimal distance is at most 1/2 (which can be
achieved at (a, b) = (1/2, 1/2)), and at least 0. Thus the possible values of X are
from the interval [0, 1/2].
(a) We would like to compute F (x) = P (X x) for all x. Because 0 X 1/2,
we have F (x) = 0 for x < 0 and F (x) = 1 for x > 1/2.
Denote the coordinates of the randomly chosen point by A and B. If 0 x
1/2 then the set {X x}c = {X > x} is the same as the set
{x < A, x < 1 A, x < B, 1 x < B} = {x < A < 1 x, x < B < 1 x}.
2
This is the same as the point (A, B) being in the square (x, 1 x) which has
probability (1 2x)2 . Hence, for 0 x 1/2 we have
F (x) = P (X x) = 1 P (X > x) = 1 (1 2x)2 = 4x 4x2 .
(b) Since the cumulative distribution function F (x) that we found in part (a) is
continuous, and it is di↵erentiable apart from x = 0, we can find f (x) just by
di↵erentiating F (x). This means that f (x) = 4 8x for 0 x 1/2 and 0
otherwise.
3.44. (a) Let s be a real number. Let α = arctan(s) ∈ (-π/2, π/2) be the angle
corresponding to the slope s, that is, the number α ∈ (-π/2, π/2) with tan(α) = s.
The event {S ≤ s} is the same as the event that the uniformly chosen
point is in the circular sector corresponding to the angles between -π/2 and α.
The area of this circular sector is proportional to the angle α + π/2, while the whole
half disk corresponds to the angle π. Thus
FS(s) = P(S ≤ s) = (α + π/2)/π = 1/2 + arctan(s)/π.
(b) The c.d.f. found in part (a) is differentiable everywhere, hence the p.d.f. is equal
to its derivative:
fS(s) = (1/2 + arctan(s)/π)′ = 1/(π(1 + s^2)).
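A short simulation supports the answer: sampling the point uniformly from the upper half disk (radius 1 is used in the sketch below, a choice of ours; the slope distribution does not depend on the radius), the empirical distribution function of S should track 1/2 + arctan(s)/π. This check is not part of the solution itself.

import math, random

def sample_slope():
    # rejection sampling of a uniform point in the upper half disk
    while True:
        x, y = random.uniform(-1, 1), random.uniform(0, 1)
        if x * x + y * y <= 1 and x != 0:
            return y / x

samples = [sample_slope() for _ in range(100_000)]
for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    empirical = sum(v <= s for v in samples) / len(samples)
    print(s, round(empirical, 3), round(0.5 + math.atan(s) / math.pi, 3))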
Y
3.45. Let (X, Y ) be the uniformly chosen point, then S = X. We can disregard
the case X = 0, as the probability of this is 0.
(a) We need to compute F (s) = P (S s) for all s.
The slope S can be any nonnegative number, but it cannot be negative. Thus
FS (s) = P (S s) = 0 if s < 0.
If 0 s 1 then the points (x, y) 2 [0, 1]2 with y/x s are exactly the points
in the triangle with vertices (0, 0), (1, 0), (1, s). The area of this triangle is s/2,
hence for 0 s 1 we have FS (s) = s/2.
If 1 < s then the points (x, y) 2 [0, 1]2 with y/x s are either in the triangle
with vertices (0, 0), (1, 0), (1, 1) or in the triangle with vertices (0, 0), (1, 1), (1/s, 1).
The area of the union of these triangles is 1/2 + 21 (1 1/s) = 1 2s 1
, hence for
1
1 < s we have FS (s) = 1 2s .
To summarize: 8
>
<0 s<0
1
F (s) = s 0<s1.
>2
: 1
1 2s 1<s
(b) Since F (s) is continuous everywhere and it is di↵erentiable apart from s = 0,
we can get the probability density function f (s) just by di↵erentiating F . This
gives
8
>
<0 s<0
1
f (s) = 2 0<s1.
>
: 1
2s2 1<s
3.46. (a) The smaller piece cannot be larger than `/2, hence 0 X `/2. Thus
FX (x) = 0 for x < 0 and FX (x) = 1 for x `/2.
For 0 x < `/2 the event {X x} is the same as the event that the chosen
point where we break the stick in two is within x of one of the end points. The
set of possible locations is thus the union of two intervals of length x, hence
the probability of the uniformly chosen point to be in this set is 2·x
` . Hence for
0 x < `/2 we have FX (x) = 2x ` .
To summarize
8
>
<1 for x `/2
2x
FX (x) = ` for 0 x < `/2
>
:
0 for x < 0.
(b) The c.d.f. found in part (a) is continuous everywhere, and di↵erentiable apart
from x = `/2. Hence we can find the p.d.f. by di↵erentiating it, which gives
(
2
for 0 x < `/2
fX (x) = `
0 otherwise.
3.47. (a) We need to find F (x) = P (X x) for all x. The X coordinate of a point
in the triangle must be between 0 and 30, so F (x) = 0 for x < 0 and F (x) = 1 for
x 30.
For 0 x < 30 then the set of points in the triangle with X x is the triangle
with vertices (0, 0), (x, 0) and (x, 23 x). The area of this triangle is 13 x2 , while the
area of the original triangle is 20·30
2 = 300. This means that if 0 x < 30 then
1 2
3x x2
F (x) = 300 = 900 . Thus
8
>
<0 2 x<0
x
F (x) = 0 x < 30 .
> 900
:
1 x 30
3.50. (a) For " < t < 9 the event {t " < R < t} is the event that the dart lands
in the annulus (or ring) with radii t ✏ and t. The area of this annulus is
In the last sum we are summing for k, j with 1 j k. If we reverse the order of
summation, then k will go from j to 1, while j goes from 1 to 1:
1 X
X k 1 X
X 1
(1 p)k 1
p= (1 p)k 1
p.
k=1 j=1 j=1 k=j
Note that in the double sum we have 1 k i. If we switch the order of the two
summations (which is allowed, since each term is nonnegative) then k goes from 1
to i, and i goes from 1 to 1:
1 X
X 1 1 X
X i
P (X = i) = P (X = i).
k=1 i=k i=1 k=1
Pi
Since P (X = i) does not depend on k, we have k=1 P (X = i) = iP (X = i) and
hence
X1 X1 X i 1
X
P (X k) = P (X = i) = iP (X = i).
k=1 i=1 k=1 i=1
P1
Because X takes only nonnegative integers we have E[X]
P1 = i=0 iP (X = i), and
since thePi = 0 term is equal to zero we have E[X] = i=1 iP (X = i). This proves
1
E[X] = k=1 P (X k).
3.53. (a) Since X is discrete, taking values from 0, 1, 2, . . . , we can compute its
expectation as follows:
E[X] = Σ_{k=0}^∞ k P(X = k) = 0 · (3/4) + Σ_{k=1}^∞ k · (1/2) · (1/3)^k = (1/2) Σ_{k=1}^∞ k (1/3)^k.
The infinite sum may be computed using the identity Σ_{k=1}^∞ k x^{k-1} = 1/(1-x)^2 (which
holds for |x| < 1, and follows from Σ_{k=0}^∞ x^k = 1/(1-x) by differentiation):
Σ_{k=1}^∞ k (1/3)^k = (1/3) Σ_{k=1}^∞ k (1/3)^{k-1} = (1/3) · 1/(1 - 1/3)^2 = 3/4,
which gives E[X] = (1/2) · (3/4) = 3/8.
Another way to arrive at this solution would be to apply the approach outlined
in Exercise 3.51.
(b) To compute Var(X) we need E[X^2]. It turns out that E[X^2 - X] = E[X(X-1)]
is easier to compute:
E[X(X-1)] = Σ_{k=0}^∞ k(k-1) P(X = k) = Σ_{k=2}^∞ k(k-1) · (1/2) · (1/3)^k.
Next we can use that for |x| < 1 we have Σ_{k=2}^∞ k(k-1) x^{k-2} = 2/(1-x)^3. (This
follows from Σ_{k=0}^∞ x^k = 1/(1-x) by differentiating twice.)
Σ_{k=2}^∞ k(k-1) · (1/2) · (1/3)^k = (1/2) · (1/3)^2 Σ_{k=2}^∞ k(k-1) (1/3)^{k-2} = (1/18) · 2/(1 - 1/3)^3 = 3/8.
Thus E[X(X-1)] = 3/8 and hence
E[X^2] = E[X(X-1) + X] = E[X(X-1)] + E[X] = 3/8 + 3/8 = 3/4
and
Var(X) = E[X^2] - (E[X])^2 = 3/4 - (3/8)^2 = 39/64.
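Truncating the series numerically (at k = 200, far beyond where the terms matter) reproduces E[X] = 3/8, E[X^2] = 3/4 and Var(X) = 39/64; this short check is our own addition.

# pmf: P(X = 0) = 3/4, P(X = k) = (1/2) * (1/3)**k for k >= 1
mean = sum(k * 0.5 * (1/3)**k for k in range(1, 200))
second = sum(k * k * 0.5 * (1/3)**k for k in range(1, 200))
print(mean, second, second - mean**2)   # 0.375, 0.75, 0.609375 (= 39/64)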
3.54. (a) We have P(X ≥ k) = (1-p)^{k-1}. We can compute this by evaluating the
geometric series
P(X ≥ k) = Σ_{ℓ=k}^∞ P(X = ℓ) = Σ_{ℓ=k}^∞ p q^{ℓ-1}.
An easier way is to note that if X is the number of trials needed for the first
success then {X ≥ k} is the event that the first k-1 trials are all failures,
which has probability (1-p)^{k-1}.
(b) By Exercise 3.52 we have
E[X] = Σ_{k=1}^∞ P(X ≥ k) = Σ_{k=1}^∞ (1-p)^{k-1} = 1/(1-q) = 1/p.
3.55. We first find the probability mass function of Y . The possible values are
1, 2, 3, . . . . Peter wins the game if Y is an odd number, and Mary wins the game if
it is even. If n 0 then
P (Y = 2n + 1)
= P (Peter misses n times, Mary misses n times, Peter hits bullseye next)
= (1 p)n (1 r)n p.
Similarly, for n 1:
P (Y = 2n)
= P (Peter misses n times, Mary misses n 1 times, Mary hits bullseye next)
n n 1
= (1 p) (1 r) r.
Then
1
X 1
X 1
X
E[Y ] = kP (Y = k) = (2n + 1)(1 p)n (1 r)n p + 2n(1 p)n (1 r)n 1
r.
k=1 n=0 n=1
The evaluationPof these sums is a bitP1lengthy, but in the end one just has to use
1
the identities k=0 xk = 1 1 x and k=1 kxk 1 = (1 1x)2 , which holds for |x| < 1.
To simplify notations a little bit, we introduce s = (1 p)(1 r).
1
X 1
X 1
X 1
X
(2n + 1)(1 p)n (1 r)n p = (2n + 1)sn p = 2nsn p + sn p
n=0 n=0 n=0 n=0
1
X 1
X
= 2sp nsn 1
+p sn
n=1 n=0
2sp p p(1 + s)
= + = .
(1 s)2 1 s (1 s)2
1
X 1
X
2n(1 p)n (1 r)n 1
r = 2(1 p)r n(1 p)n 1
(1 r)n 1
n=1 n=1
X1
2(1 p)r
= 2(1 p)r nsn 1
= .
n=1
(1 s)2
This gives
p(1 + s) + 2(1 p)r
E[Y ] = .
(1 s)2
Substituting back s = (1 r)(1 p) = 1 p r + pr:
p(1 + (1 p)(1 r)) + 2(1 p)r (2 p)(p + r pr) 2 p
E[Y ] = = = .
(p + r pr)2 (p + r pr)2 p + r pr
For r = p the random variable Y has geometric distribution with parameter p, and
our formula gives 2p2 pp2 = p1 , as it should.
3.56. Using the hint we compute E[X(X 1)] first. Using the formula for the
expectation of a function of a discrete random variable we get
1
X 1
X 1
X
E[X(X 1)] = k(k 1)pq k 1
= pq k(k 1)q k 2
= pq k(k 1)q k 2
.
k=1 k=1 k=0
(We used that k(k 1) = 0 for k = 0.) Note that k(k 1)q k 2 = (q k )00 for k 2,
and the formula also works for k = 0 and 1.
P1
The identity 1 1 x = k=0 xk holds for |x| < 1, and di↵erentiating both sides
we get
✓ ◆0 1
!00 1
1 2 X X
k
= = x = k(k 1)xk 2 .
1 x (1 x)3
k=0 k=0
(We are allowedPto di↵erentiate the series term by term for |x| < 1.) Thus for
1
|x| < 1 we have k=0 k(k 1)xk 2 = (1 2x)3 and thus
1
X 2 2q
E[X(X 1)] = pq k(k 1)q k 2
= pq · = ,
(1 q)3 p2
k=0
where we used p + q = 1.
Then
1 2q p + 2q 1+q
E[X 2 ] = E[X] + E[X(X 1)] = + = =
p p2 p2 p2
where we used p + q = 1 again.
3.57. We have P (X = k) = p(1 p)k 1
for k 1. Hence we can compute E[ X1 ]
using the following formula:
1
X 1
E[ X1 ] = p(1 p)k 1
.
k
k=1
P1
In order to evaluate the infinite sum, we start with the identity 1 1 x = k=0 xk
which holds for |x| < 1, and then integrate both sides from 0 to y with |y| < 1:
Z y Z yX1
1
dx = xk dy.
0 1 x 0 k=0
Ry 1 1
On the left side we have 0 1 x
dx = ln( 1 y ). On the right side we integrate term
by term to get
Z 1
yX 1
X 1
X
y k+1 yn
xk dy = = .
0 k=0 k + 1 n=1 n
k=0
1
X 1
E[ X1 ] = p(1 p)k 1
=
k
k=1
1
X
p (1 p)k p
= ln( p1 )
1 p k 1 p
k=1
3.58. Using the formula for the expected value of a function of a discrete random
variable we get
Xn ✓ ◆
1 n k
E[X] = p (1 p)n k .
k+1 k
k=0
We have
✓ ◆
1 n 1 n! n!
= =
k+1 k k + 1 k!(n k)! (k + 1)!(n k)!
1 (n + 1)!
=
n + 1 (k + 1)!((n + 1) (k + 1))!
✓ ◆
1 n+1
= .
n+1 k+1
1 X ✓n + 1 ◆
n+1
= p` (1 p)n+1 ` .
p(n + 1) `
`=1
Adding and removing the ` = 0 term to the sum and using the binomial theorem
yields
1 X ✓ n + 1◆
n+1
E[X] = p` (1 p)n+1 `
p(n + 1) `
`=1
!
1 X ✓n + 1 ◆
n+1
= p` (1 p)n+1 `
(1 p)n+1
p(n + 1) `
`=0
1
= (1 (1 p)n+1 ).
p(n + 1)
3.59. (a) Using the solution for Example 1.38 we see that the following function
works: 8
>
> 10 if 0 r 1,
>
>
>
> if 1 < r 3,
<5
g(r) = 2 if 3 < r 6,
>
>
>
> 1 if 6 < r 9,
>
>
:0 otherwise.
Since 0 R 9 we could have defined g any way we like it outside [0, 9]. (b) The
probability mass function for X is given by
1 8 27 45
pX (10) = , pX (5) = , pX (2) = , pX (1) = .
81 81 81 81
Thus the expectation is
1 8 27 45 149
E[X] = 10 · +5· +2· +1· =
81 81 81 81 81
(c) Using the result of Example 3.19 we see that the probability density fR (r) of R
2r
is 81 for 0 < r 9 and zero otherwise. We can now compute the expectation of
X = g(R) as follows:
Z 1
E[X] = E[g(R)] = g(r)fR (r)dr
1
Z 1 Z 3 Z 6 Z 9
2r 2r 2r 2r
= 10 · dr + 5 · dr + 2 · dr + 1 · dr
0 81 1 81 3 81 6 81
149
= .
81
3.60. (a) Let pX be the probability mass function of X. Then
X X X
E[u(X) + v(X)] = pX (k)(u(k) + v(k)) = pX (k)u(k) + pX (k)v(k)
k k k
= E[u(X)] + E[v(X)].
The first step is the expectation of a function of a discrete random variable.
In the second step we broke the sum into two parts. (This actually requires
care in case of infinitely many terms. It is a valid step in this case because u
and v are bounded and hence all the sums involved are finite.) In the last step
we again used the formula for the expected value of a function of a discrete
random variable.
(b) Suppose that the probability density function of X is f . Then
Z 1 Z 1 Z 1
E[u(X) + v(X)] = f (x)(u(x) + v(x))dx = f (x)u(x)dx + f (x)v(x)dx
1 1 1
= E[u(X)] + E[v(X)].
The first step is the formula for the expectation of a function of a continuous
random variable. In the second step we rewrote the integral of a sum as the
sum of the integrals. (This is a valid step because u and v are bounded and
thus all the integrals involved are finite.) In the last step we again used the
formula for the expected value of a function of a continuous random variable.
3.61. (a) Note that the range of X is [0, M ]. Thus, we know that
FX (s) = 0 if s < 0, and F (s) = 1 if s > M.
Next, for s 2 [0, M ] we have
Z s
2s s2
FX (s) = P (X s) = 2(M x)/M 2 dx = .
0 M M2
(b) We have
(
X if X 2 [0, M/2]
Y = .
M/2 if X 2 (M/2, M ]
(d) We have
3
P (Y < M/2) = lim FY (y) = .
y! M
2
4
Another way to see this is by noticing that
1 3
P (Y < M/2) = 1 P (Y M/2) = 1 P (Y = M/2) = 1 = .
4 4
(e) Y cannot be continuous, as P (Y = M/2) = 14 > 0. But it cannot be discrete
either, as there are no other values which Y takes with positive probability.
Thus there is no density, nor is there a probability mass function.
3.62. From the set-up we know F (s) = 0 for s < 0 because negative values have no
probability and F (s) = 1 for s 3/4 because the boy is sure to be inside by time
3/4. For values 0 s < 3/4 the probability P (X s) comes from the uniform
distribution and hence equals s, the length of the interval [0, s]. To summarize,
8
>
<0, s < 0
F (s) = s, 0 s < 3/4
>
:
1, s 3/4.
In particular, we have a jump in F that gives the probability for the value 3/4:
P (X = 34 ) = F ( 43 ) F ( 34 ) =1 3
4 = 14 .
This reflects the fact that, left to his own devices, the boy would come in after time
3/4 with probability 1/4. This option is removed by the mother’s call and so all
this probability concentrates on the value 3/4.
P
3.63. (a) We have E[X] = k kpX (k). Because X is symmetric, we must have
P (X = k) = P (X = k) for all k. Thus we can write the sum as
X X X
E[X] = kpX (k) = 0·pX (0)+ kpX (k)+( k)pX ( k) = k(pX (k) pX ( k)) = 0
k k>0 k>0
P1
with C = P11 1 . Since 0 < 1
< 1, this is indeed a probability mass
k=1 k3
k=1 k3
function. Moreover, we have
1
X 1
X X 1 1
C
E[X] = kP (X = k) = k· 3
=C < 1.
k k2
k=1 k=1 k=1
and
1
X 1
X X1 1
2 2 C
2
E[X ] = k P (X = k) = k · 3 =C = 1.
k k
k=1 k=1 k=1
2
3.65. (a) We have Var(2X + 1) = 2 Var(X) = 4 · 3 = 12.
(b) We have
E[(3X 4)2 ] = E[9X 2 24X + 16] = 9E[X 2 ] 24E[X] + 16.
2
We know that Var(X) = E[X ] E[X] , so E[X ] = Var(X) + E[X]2 = 3 + 22 = 7.
2 2
Thus
E[(3X 4)2 ] = 9E[X 2 ] 24E[X] + 16 = 9 · 7 24 · 2 + 16 = 31.
p
3.66. We can express X as X = 3Y + 8 where Y ⇠ N (0, 1). Then
p
0.15 = P (X > ↵) = P ( 3Y + 8 > ↵) = P (Y > ↵p38 ) = 1 ( ↵p38 ).
x2
We can evaluate the integral using integration by parts noting that e 2 x =
x2
( e 2 )0 :
Z 1 Z 1
1 x2
4 1 x2
p e 2 x dx = p e 2 x · x3 dx
1 2⇡ 1 2⇡
Z 1
1 x2
3 x=1 1 x2
=p ( e 2 ) · x x= 1 p ( e 2 ) · 3x2
2⇡ 1 2⇡
Z 1
1 x2
=3 p e 2 x2 dx = 3.
1 2⇡
x2 R1 x2
We used that lim e 2 x3 = 0 (and the same for x ! 1), and that 1 p12⇡ e 2 x2 dx =
x!1
E[Z 2 ] = 1.
Hence E[Z 4 ] = 3.
(b) We can express X as X = Y + µ where Y ⇠ N (0, 1). Then
E[X 4 ] = E[( Y + µ)4 ]
4
= E[ Y4+4 3
µY 3 + 6 2 2
µ Y 2 + 4 Y µ 3 + µ4 ]
4
= E[Y 4 ] + 4 3
µE[Y 3 ] + 6 2 2
µ E[Y 2 ] + 4 µ3 E[Y ] + µ4 .
We know that E[Y ] = 0, E[Y 2 ] = 1. By part (a) we have E[Y 4 ] = 3 and by
the previous problem we have E[Y 3 ] = 0. Substituting these in the previous
expression we get
E[X 4 ] = 3 4 + 6 2 µ2 + µ4 .
3.69. Denote the nth moment E[Z n ] by mn . It can be computed as
Z 1 Z 1
1 x2
mn = xn '(x)dx = xn p e 2 dx
1 1 2⇡
We have seen that m1 = E[Z] = 0 and m2 = E[Z 2 ] = 1.
Suppose first that n = 2k + 1 is an odd number. Then the function x2k+1 is
odd and hence the function x2k+1 '(x) is odd as well. If the
R 1integral is finite then
the contribution of the positive and negative half lines in 1 x2k+1 '(x)dx cancel
each other out and thus m2k+1 = 0. The fact that the integral is finite follows from
x2
the fact that for any fixed n xn grows a lot slower than e 2 .
For n = 2k ≥ 2 we see that x^n φ(x) is even, and thus (if the integrals are finite)
we have
m_{2k} = ∫_{-∞}^{∞} x^{2k} φ(x) dx = 2 ∫_0^∞ x^{2k} φ(x) dx.
Using integration by parts with the functions x^{2k-1} and xφ(x) = (-(1/√(2π)) e^{-x^2/2})′ = (-φ(x))′ we get
∫_0^∞ x^{2k} φ(x) dx = [-x^{2k-1} (1/√(2π)) e^{-x^2/2}]_{x=0}^{x=∞} + ∫_0^∞ (2k-1) x^{2k-2} φ(x) dx
= (2k-1) ∫_0^∞ x^{2k-2} φ(x) dx.
Here the boundary term at ∞ disappears because x^n e^{-x^2/2} → 0 for any n ≥ 0 as
x → ∞. The integration by parts reduced the exponent of x by 2, and multiplying
both sides by 2 gives
m_{2k} = (2k-1) m_{2k-2}.
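The recursion, together with m_0 = 1, gives m_{2k} = (2k-1)(2k-3)···3·1. A crude numerical integration (midpoint rule on [-8, 8]; the cutoff and step count are our own choices) agrees with the recursion for the first few even moments.

from math import exp, pi, sqrt

def phi(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def moment_numeric(n, a=-8.0, b=8.0, steps=100_000):
    h = (b - a) / steps
    return h * sum((a + (i + 0.5) * h) ** n * phi(a + (i + 0.5) * h) for i in range(steps))

m = 1.0
for k in range(1, 5):
    m *= 2 * k - 1                       # recursion m_{2k} = (2k-1) m_{2k-2}
    print(2 * k, m, round(moment_numeric(2 * k), 3))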
We have
y b
a µ y (aµ + b)
=
a
⇣ ⌘
y (aµ+b)
thus FY (y) = a . By (3.42) this is exactly the c.d.f. of a N (aµ+b, a2 2
)
2 2
distributed random variable, so Y ⇠ N (aµ + b, a ).
If a < 0 then
!
y b
y b µ
FY (y) = P (aX + b y) = P (X a ) =1 FX ( y a b ) =1 a
.
3.71. We define noon to be time zero. Let X ~ N(0, 36) model the arrival time of
the bus in minutes (since the standard deviation is 6). Thus, X = 6Z where Z ~
N(0, 1). The question is then:
P(X > 5) = P(6Z > 5) = P(Z > 5/6) = 1 - Φ(0.83) ≈ 1 - 0.7967 = 0.2033.
3.72. Define the random variable X as the number of points made on one swing of
an axe. Note that X is a discrete random variable taking values {0, 5, 10, 15} and
its expected value can be computed as
X
E[X] = kP (X = k) = 0P (X = 0) + 5P (X = 5) + 10P (X = 10) + 15P (X = 15).
k
From the point system given in the problem we have
P(X = 5) = P(-20 ≤ Y ≤ -10) + P(10 ≤ Y ≤ 20) = 2P(10 ≤ Y ≤ 20),
P(X = 10) = P(-10 ≤ Y ≤ -3) + P(3 ≤ Y ≤ 10) = 2P(3 ≤ Y ≤ 10),
P(X = 15) = P(-3 ≤ Y ≤ 3) = 2P(0 ≤ Y ≤ 3).
(b) Here p = 31/365, and we get
P(S > 130) ≈ 1 - Φ((130 - 1200p)/√(1200 p(1-p))) ≈ 1 - Φ(2.91) ≈ 0.0018.
4.2. Let S be the number of hands with a single pair that are observed in 1000
poker hands. Then S ⇠ Bin(n, p) where n = 1000 and p is the probability of
getting a single pair in a poker hand of 5 cards. We take p = 0.42, which is the
approximate success probability given in the exercise.
To approximate P (S 450) we use the normal approximation. With p = 0.42,
np(1 p) = 243.6 so we can feel confident about using this method.
We have E[S] = np = 420 and Var(S) = 243.6. Then
P(S ≥ 450) = P((S - 420)/√243.6 ≥ (450 - 420)/√243.6) ≈ P((S - 420)/√243.6 ≥ 1.92) ≈ P(Z ≥ 1.92),
Here we have ε = 0.02 and need 2Φ(2ε√n) - 1 ≥ 0.95. This leads to Φ(2ε√n) ≥ 0.975
which, by the table of Φ-values, is satisfied if 2ε√n ≥ 1.96. Solving this inequality gives
n ≥ 1.96^2/(4ε^2) = 2401.
Thus the size of the sample should be at least 2401.
4.7. Now n = 1, 000 and take Sn ⇠ Bin(n, p), where p is unknown. We estimate p
with p̂ = Sn /1000 = 457/1000 = .457. For the 95% confidence interval we need to
find " > 0 such that
P (|p̂ p| < ") 0.95.
Then the confidence interval is (0.457 ", 0.457 + ").
Repeating again the normal approximation procedure gives
P(|p̂ - p| < ε) = P(-ε < p̂ - p < ε) = P(-ε < (Sn - np)/n < ε)
= P(-ε√n/√(p(1-p)) < (Sn - np)/√(n p(1-p)) < ε√n/√(p(1-p)))
≈ 2Φ(ε√n/√(p(1-p))) - 1.
Note that √(p(1-p)) ≤ 1/2 on the interval [0, 1], from which we conclude that
2Φ(ε√n/√(p(1-p))) - 1 ≥ 2Φ(2ε√n) - 1,
and so
P(|p̂ - p| < ε) ≥ 2Φ(2ε√n) - 1.
Hence, we just need to find ε > 0 satisfying
2Φ(2ε√n) - 1 = 0.95  ⟹  Φ(2ε√n) = 0.975  ⟹  2ε√n ≈ 1.96.
Thus, take
ε = 1.96/(2√1000) ≈ 0.031
and the confidence interval is
(0.457 - 0.031, 0.457 + 0.031).
4.8. We have n =1,000,000 trials with an unknown success probability p. To find a
99.9% confidence interval we need an " > 0 so that P (|p̂ p| < ") 0.999, where p̂
is the fraction of positive outcomes. We have seen in Section 4.3 that P (|p̂ p| < ")
can be estimated using the normal approximation as
P(|p̂ - p| < ε) ≈ 2Φ(ε√n/√(p(1-p))) - 1 ≥ 2Φ(2ε√n) - 1.
We need 2Φ(2ε√n) - 1 ≥ 0.999 which means Φ(2ε√n) ≥ 0.9995 and so approximately
2ε√n ≥ 3.32. (Since 0.9995 appears several times in our table, other values
instead of 3.32 are also acceptable.) This gives
ε ≥ 3.32/(2√n) ≈ 0.00166
and
P(X ≤ 13 | X ≥ 7) = P(X ≤ 13 and X ≥ 7)/P(X ≥ 7) = (Σ_{k=7}^{13} e^{-λ} λ^k/k!) / (1 - Σ_{k=0}^{6} e^{-λ} λ^k/k!)
≈ 0.7343/0.8699 ≈ 0.844.
4.10. It is reasonable to assume that the hockey player has a number of scoring
chances per game, but only a few of them result in goals. Hence the number
of goals in a given game corresponds to counting rare events, which means that
it is reasonable to approximate this random number with a Poisson(λ) distributed
random variable. Then the probability of scoring at least one goal would be 1 - e^{-λ}
(since e^{-λ} is the probability of no goals). Using the setup of the problem we have
1 - e^{-λ} ≈ 0.5 which gives λ ≈ ln(2) ≈ 0.6931. We estimate the probability that
the player scores exactly 3 goals. Using the Poisson probability mass function and
our estimate on λ gives
P(exactly 3 goals) = e^{-λ} λ^3/3! ≈ 0.028.
Thus we would expect the player to get a hat-trick in about 2.8% of his games.
Equally valid is the answer where we estimate the probability of scoring at least
3 goals:
P(at least 3 goals) = 1 - P(at most 2 goals) = 1 - e^{-λ} - λe^{-λ} - (λ^2/2) e^{-λ}
= 1 - (1/2)(1 + ln 2 + (1/2)(ln 2)^2) ≈ 0.033.
4.11. We assume that typos are rare events that do not strongly depend on each
other. Hence the number of typos on a given page should be well-approximated by
a Poisson random variable with parameter = 6, since that is the average number
of typos per page.
Let X be the number of errors on page 301. We now have
P(X ≥ 4) = 1 - P(X ≤ 3) ≈ 1 - Σ_{k=0}^{3} e^{-6} 6^k/k! = 0.8488.
4.12. The probability density function fT(x) of T is λe^{-λx} for x ≥ 0 and 0 otherwise.
Thus E[T^3] can be evaluated as
E[T^3] = ∫_{-∞}^{∞} fT(x) x^3 dx = ∫_0^∞ x^3 λe^{-λx} dx.
To compute the integral we use integration by parts with λe^{-λx} = (-e^{-λx})′:
∫_0^∞ x^3 λe^{-λx} dx = [-x^3 e^{-λx}]_{x=0}^{x=∞} + ∫_0^∞ 3x^2 e^{-λx} dx = ∫_0^∞ 3x^2 e^{-λx} dx.
Note that [-x^3 e^{-λx}]_{x=0}^{x=∞} = 0 because lim_{x→∞} x^3 e^{-λx} = 0. To evaluate ∫_0^∞ 3x^2 e^{-λx} dx
we can integrate by parts twice more, or we can quote equation (4.18) from the
text to get
∫_0^∞ 3x^2 e^{-λx} dx = (3/λ) ∫_0^∞ x^2 λe^{-λx} dx = (3/λ) · (2/λ^2) = 6/λ^3.
Thus E[T^3] = 6/λ^3.
4.13. The probability density function of T is fT(x) = (1/3)e^{-x/3} for x ≥ 0, and zero
otherwise. The cumulative distribution function is FT(x) = 1 - e^{-x/3} for x ≥ 0,
and zero otherwise. From this we can compute
P(T > 3) = 1 - FT(3) = e^{-1},
P(1 ≤ T < 8) = FT(8) - FT(1) = e^{-1/3} - e^{-8/3},
P(T > 4 | T > 1) = P(T > 4 and T > 1)/P(T > 1) = P(T > 4)/P(T > 1) = (1 - FT(4))/(1 - FT(1)) = e^{-4/3}/e^{-1/3} = e^{-1}.
P(T > 4 | T > 1) can also be computed using the memoryless property of the
exponential:
P(T > 4 | T > 1) = P(T > 3) = 1 - FT(3) = e^{-1}.
4.14. (a) Denote the lifetime of the lightbulb by T. Since T is exponentially distributed
with expected value 1000 we have T ~ Exp(λ) with λ = 1/1000. The
cumulative distribution function of T is then FT(t) = 1 - e^{-λt} for t > 0 and 0
otherwise. Hence
P(T > 2000) = 1 - P(T ≤ 2000) = 1 - FT(2000) = e^{-2000λ} = e^{-2}.
(b) We need to compute P(T > 2000 | T > 500) where we used the notation of part
(a). By the memoryless property P(T > 2000 | T > 500) = P(T > 1500). Using
the steps in part (a) we get
P(T > 1500) = 1 - FT(1500) = e^{-1500λ} = e^{-3/2}.
4.15. Let N be the Poisson process of arrival times of meteors. Let 11 PM corre-
spond to the origin on the time line.
(a) Using the fact that N([0,1]), the number of meteors within the first hour, has
Poisson(4) distribution, we get
P(N([0,1]) > 2) = 1 - Σ_{k=0}^{2} P(N([0,1]) = k) = 1 - Σ_{k=0}^{2} e^{-4} 4^k/k! ≈ 0.7619.
(b) Using the independent increment property we get that N([0,1]) and N([1,4])
are independent. Moreover, N([0,1]) ~ Poisson(4) and N([1,4]) ~ Poisson(3 · 4),
which gives
P(N([0,1]) = 0, N([1,4]) ≥ 10) = P(N([0,1]) = 0) · P(N([1,4]) ≥ 10)
= P(N([0,1]) = 0) · (1 - P(N([1,4]) < 10))
= e^{-4} · (1 - Σ_{k=0}^{9} e^{-12} 12^k/k!)
≈ 0.01388.
(c) Using the independent increment property again:
P(N([0,1]) = 0 | N([0,4]) = 13) = P(N([0,1]) = 0, N([0,4]) = 13)/P(N([0,4]) = 13)
= P(N([0,1]) = 0, N([1,4]) = 13)/P(N([0,4]) = 13)
= P(N([0,1]) = 0) · P(N([1,4]) = 13)/P(N([0,4]) = 13)
= (e^{-4} · e^{-12} 12^13/13!)/(e^{-16} 16^13/13!)
= (3/4)^13
≈ 0.02376.
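The three answers can be reproduced directly from the Poisson probability mass function; the helper function below is our own shorthand, not part of the solution.

from math import exp, factorial

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

p_a = 1 - sum(pois_pmf(k, 4) for k in range(3))
p_b = pois_pmf(0, 4) * (1 - sum(pois_pmf(k, 12) for k in range(10)))
p_c = pois_pmf(0, 4) * pois_pmf(13, 12) / pois_pmf(13, 16)
print(round(p_a, 4), round(p_b, 5), round(p_c, 5))   # 0.7619, 0.01388, 0.02376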
4.16. (a) Denote by S the number of random numbers starting with the digit 1.
Note that a number in the interval [1.5, 4.8] starts with 1 if and only if it is in
the interval [1.5, 2). The probability that a uniformly chosen number from the
interval [1.5, 4.8] is in [1.5, 2) is equal to p = 4.80.51.5 = 33
5
. Assuming that the
500 numbers are chosen independently, the distribution of S is binomial with
parameters n = 500 and p.
To estimate P (S < 65) we use normal approximation. Note that E[S] =
5
np = 500 · 33 ⇡ 75.7576 and Var(S) = np(1 p) ⇡ 64.2792. Hence
✓ ◆ ✓ ◆
S 75.7576 65 75.7576 S 75.7576
P (S < 65) = P p < p ⇡P p < 1.34
64.2792 64.2792 64.2792
⇡ ( 1.34) = 1 (1.34) ⇡ 1 0.9099 = 0.0901.
Note that P (S < 65) = P (S 64). Using 64 instead of 65 in the calculation
above gives 1 (1.47) ⇡ 0.0708. If we use the continuity correction then we
Solutions to Chapter 4 97
need to use 64.5 instead of 65 which gives 1 (1.4) ⇡ 0.0808. The actual
probability (evaluated numerically) is 0.0778.
(b) We proceed similarly as in part (a). The probability that a given uniformly
chosen number from [1.5, 4.8] starts with 3 is q = 1/3.3 = 10/33. If we denote
the number of such numbers among the 500 random numbers by T then T ~
Bin(n, q) with n = 500. Then

  P(T > 160) = P((T - nq)/sqrt(nq(1 - q)) > (160 - nq)/sqrt(nq(1 - q))) ~ P((T - nq)/sqrt(nq(1 - q)) > 0.83)
    ~ 1 - Phi(0.83) ~ 1 - 0.7967 = 0.2033.

Again, since P(T > 160) = P(T >= 161), we could have done the compu-
tation with 161 instead of 160, which would give 1 - Phi(0.92) ~ 0.1788. If we
use the continuity correction then we replace 160 with 160.5 in the calculation
above, which leads to 1 - Phi(0.87) ~ 0.1922. The actual probability (evaluated
numerically) is 0.1906.
4.17. The probability of rolling two ones is 1/36. Denote the number of snake eyes
out of 10,000 rolls by X. Then X ~ Bin(n, p) with n = 10,000 and p = 1/36. The
expectation and variance are

  np = 2500/9 ~ 277.78,   np(1 - p) = 21,875/81 ~ 270.06.

Using the normal approximation:

  P(280 <= X <= 300) = P((280 - 2500/9)/sqrt(21,875/81) <= (X - 2500/9)/sqrt(21,875/81) <= (300 - 2500/9)/sqrt(21,875/81))
    = P(4/(5 sqrt(35)) <= (X - 2500/9)/sqrt(21,875/81) <= 8/sqrt(35))
With continuity correction we need to replace 100 with 99.5 in the calculation
above. This way we get 1 - Phi(2.225) ~ 0.01305 (using linear approximation for
Phi(2.225)). The actual probability (evaluated numerically) is 0.0153.
4.19. Let X be the number of people in the sample who prefer cereal A. We may
approximate the distribution of X with a Bin(n, p) distribution with n = 100, p =
0.2. (This is an approximation, because the true distribution is hypergeometric.)
The expectation and variance are np = 20 and np(1 - p) = 16. Since the variance is
large enough, it is reasonable to use the normal approximation to estimate P(X >= 25):

  P(X >= 25) = P((X - 20)/sqrt(16) >= (25 - 20)/sqrt(16))
    ~ P(Z >= 1.25) = 1 - Phi(1.25) ~ 1 - 0.8944 = 0.1056.

If we use the continuity correction then we get

  P(X >= 25) = P(X > 24.5) = P((X - 20)/sqrt(16) > (24.5 - 20)/sqrt(16))
    ~ P(Z > 1.125) = 1 - Phi(1.125) ~ 1 - 0.8697 = 0.1303.
4.20. Let X be the number of heads. Then 10,000 - X is the number of tails and
|X - (10,000 - X)| = |2X - 10,000| is the difference between the number of heads
and the number of tails. We need to estimate

  P(|2X - 10,000| <= 100) = P(4950 <= X <= 5050).

Using the normal approximation with E[X] = 10,000 · 1/2 = 5000 and Var(X) = 10,000 · 1/2 · 1/2 = 2500:

  P(4950 <= X <= 5050) = P((4950 - 5000)/sqrt(2500) <= (X - 5000)/sqrt(2500) <= (5050 - 5000)/sqrt(2500))
    = P(-1 <= (X - 5000)/sqrt(2500) <= 1) ~ 2 Phi(1) - 1 ~ 0.6826.
4.21. Let X_n be the number of games won out of the first n games. Then X_n ~
Bin(n, p) with p = 1/20. The amount of money won in the first n games is then
W_n = 10 X_n - (n - X_n) = 11 X_n - n. We have

  P(W_n > -100) = P(11 X_n - n > -100) = P(X_n > (n - 100)/11).

Note that the variance in the n = 200 case is 9.5, which is slightly below 10, so
the normal approximation is not fully justified. In this case np^2 = 1/2, so the Pois-
son approximation is not guaranteed to work either. The Poisson approximation
is

  P(W_200 > -100) = P(X_200 > 100/11) = P(X_200 >= 10) ~ 1 - sum_{k=0}^{9} e^{-10} 10^k/k! ~ 0.5421.

Using the normal approximation (with E[S] = 400 · 1/2 = 200 and Var(S) = 400 · 1/2 · 1/2 = 100):
But

  P(|S_{n,4}/n - 1/6| >= eps) >= P(S_{n,4}/n - 1/6 >= eps) = P(S_{n,4}/n >= 17/100),

thus if P(|S_{n,4}/n - 1/6| >= eps) converges to zero then so does P(S_{n,4}/n >= 17/100).
(b) Let B_{n,i}, i = 1, . . . , 6 be the event that after n rolls the frequency of the number
i is between 16% and 17%. Then A_n = intersection_{i=1}^{6} B_{n,i}. Note that A_n^c = union_{i=1}^{6} B_{n,i}^c,
and

  (*)   P(A_n^c) = P(union_{i=1}^{6} B_{n,i}^c) <= sum_{i=1}^{6} P(B_{n,i}^c).

(Exercise 1.43 proved this subadditivity relation.) We would like to show that
for large enough n we have P(A_n) >= 0.999. This is equivalent to P(A_n^c) < 0.001.
If we could show that there is a K so that for n >= K we have P(B_{n,i}^c) < 0.001/6
for each 1 <= i <= 6, then the bound (*) implies P(A_n^c) < 0.001 and thereby
P(A_n) >= 0.999.
Begin again with the statement given by the law of large numbers: for any
eps > 0 and 1 <= i <= 6 we have

  lim_{n -> infinity} P(|S_{n,i}/n - 1/6| < eps) = 1.

Take eps = 17/100 - 1/6 = 1/300. Then we have

  P(|S_{n,i}/n - 1/6| < eps) = P(1/6 - eps < S_{n,i}/n < 1/6 + eps) = P(49/300 < S_{n,i}/n < 17/100)
    <= P(16/100 < S_{n,i}/n < 17/100) = P(B_{n,i}).

Since P(|S_{n,i}/n - 1/6| < eps) converges to 1, so does P(B_{n,i}) for each 1 <= i <= 6.
By this convergence there exists K > 0 so that P(B_{n,i}) > 1 - 0.001/6 for each
1 <= i <= 6 and all n >= K. This gives P(B_{n,i}^c) = 1 - P(B_{n,i}) < 0.001/6 for each
1 <= i <= 6. As argued above, this implies that P(A_n) >= 0.999 for all n >= K.
4.25. Let S_n be the number of interviewed people that prefer cereal to bagels
for breakfast. If the population is large, we can assume that sampling from the
population with replacement or without replacement does not make a big difference,
therefore we assume S_n ~ Bin(n, p). In this case, n = 81. As usual, the estimate of
p will be

  p-hat = S_n/n.

We want to find q in [0, 1] such that

  P(|p-hat - p| < 0.05) = P(|S_n/n - p| < 0.05) >= q.

If Z ~ N(0, 1), we have that

  P(|S_n/n - p| < 0.05) = P(-0.05 sqrt(n)/sqrt(p(1 - p)) < (S_n - np)/sqrt(np(1 - p)) < 0.05 sqrt(n)/sqrt(p(1 - p)))
    ~ P(-0.05 sqrt(n)/sqrt(p(1 - p)) < Z < 0.05 sqrt(n)/sqrt(p(1 - p)))
    >= P(-2 · 0.05 sqrt(n) < Z < 2 · 0.05 sqrt(n))
    = Phi(2 · 0.05 sqrt(n)) - Phi(-2 · 0.05 sqrt(n)) = 2 Phi(2 · 0.05 sqrt(n)) - 1
    = 2 Phi(0.9) - 1 ~ 2 · 0.8159 - 1 = 0.6318.

Therefore, the true p lies in the interval (p-hat - 0.05, p-hat + 0.05) with probability greater
than or equal to 0.6318. Note that this is not a very high confidence level.
4.26. Let S be the number of interviewed people that prefer whole milk to skim
milk. Then S ~ Bin(n, p) with n = 100. Our estimate for p is p-hat = S/n. The event
p in (p-hat - 0.1, p-hat + 0.1) is the same as |S/n - p| < 0.1. To estimate the probability of
this event we use the normal approximation:

  P(|S/n - p| < 0.1) = P(-0.1 sqrt(n)/sqrt(p(1 - p)) < (S - np)/sqrt(np(1 - p)) < 0.1 sqrt(n)/sqrt(p(1 - p)))
    ~ 2 Phi(0.1 sqrt(n)/sqrt(p(1 - p))) - 1 >= 2 Phi(0.2 sqrt(n)) - 1.

We need

  2 Phi(sqrt(n)/(10 sqrt(p(1 - p)))) - 1 >= 0.9   if and only if   Phi(sqrt(n)/(10 sqrt(p(1 - p)))) >= 0.95,

which holds if

  sqrt(n)/(10 sqrt(p(1 - p))) >= 1.645.
This means that f(k + 1) > f(k) exactly if lambda > k + 1, or lambda - 1 > k, and f(k + 1) < f(k)
exactly if lambda - 1 < k.
If lambda is not an integer then let k* = floor(lambda) be the integer part of lambda (the largest
integer smaller than lambda). By the arguments above we have

  f(0) < f(1) < · · · < f(k*) > f(k* + 1) > f(k* + 2) > . . .

If lambda is a positive integer then

  f(0) < f(1) < · · · < f(lambda - 1) = f(lambda) > f(lambda + 1) > f(lambda + 2) > . . .

In both cases f is increasing and then decreasing.
4.31. We have

  E[1/(1 + X)] = sum_{k=0}^{infinity} 1/(k + 1) · e^{-mu} mu^k/k! = (1/mu) sum_{k=0}^{infinity} e^{-mu} mu^{k+1}/(k + 1)!
    = (1/mu) sum_{l=1}^{infinity} e^{-mu} mu^l/l! = (1 - e^{-mu})/mu.

We introduced l = k + 1 and used sum_{l=1}^{infinity} e^{-mu} mu^l/l! = 1 - e^{-mu}.
4.32. (a) We can compute E[g(Y)] with the formula sum_{k=0}^{infinity} g(k) P(Y = k). Thus

  E[Y(Y - 1) · · · (Y - n + 1)] = sum_{k=0}^{infinity} k(k - 1) · · · (k - n + 1) · e^{-mu} mu^k/k!.
Note that we can approximate the Bin(n, 1/365) distribution with a Poisson(n/365)
distributed random variable Y. Then P(X >= 1) ~ P(Y >= 1) = 1 - P(Y = 0) =
1 - e^{-n/365}. To get 1 - e^{-n/365} >= 2/3 we need n >= 365 ln 3 ~ 400.993, which gives
n >= 401.
4.37. Since there are lots of scoring chances, but only a few of them result in goals,
it is reasonable to model the number of goals in a given game by a Poisson(lambda)
random variable. Then the percentage of games with no goals should be close to
the probability that this Poisson(lambda) random variable is zero, which is e^{-lambda}. Thus

  0.0816 = e^{-lambda}   which gives   lambda = -log(0.0816) ~ 2.506.

The percentage of games where exactly one goal was scored should be close to
lambda e^{-lambda} ~ 0.2045, or 20.45%.
(Note: in reality 77 of the 380 games ended with one goal, which gives 20.26%.
The Poisson approximation gives an extremely precise estimate!)
4.38. Note that X is a Bernoulli random variable with success probability p, and
Y ~ Poisson(p). We need to show that for any subset A of {0, 1, . . . } we have

  |P(X in A) - P(Y in A)| <= p^2.

This looks hard, as there are lots of subsets of {0, 1, . . . }. Let us start with the
subsets {0} and {1}. In these cases

  P(X in A) - P(Y in A) = P(X = k) - P(Y = k) = 1 - p - e^{-p} if k = 0, and p - p e^{-p} if k = 1.
(b) Since np^2 is small, the Poisson approximation is appropriate with parameter
mu = np = 8/7. Then

  P(X >= 2) = 1 - P(X = 0) - P(X = 1) ~ 1 - e^{-8/7} - (8/7) e^{-8/7} ~ 0.3166.
4.40. Let X denote the number of times the number one appears in the sample.
Then X ~ Bin(111, 1/10). We need to approximate P(X <= 3). Using the normal ap-
proximation gives

  P(X <= 3) = P((X - 111 · 1/10)/sqrt(111 · 1/10 · 9/10) <= (3 - 111 · 1/10)/sqrt(111 · 1/10 · 9/10))
    ~ P((X - 11.1)/sqrt(9.99) <= -2.56)
    ~ Phi(-2.56) = 1 - Phi(2.56) ~ 1 - 0.9948 = 0.0052.

If we use the continuity correction then we have to repeat the calculation above
starting from P(X < 2.5), which gives the approximation Phi(-2.72) ~ 0.0033.
With the Poisson approximation we compare X with Y ~ Poisson(11.1):

  P(X <= 3) ~ P(Y <= 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)
    = e^{-11.1}(1 + 11.1 + 11.1^2/2 + 11.1^3/6) ~ 0.004559.

The variance of X is 999/100, which is almost 10, hence it is not that surprising that the
normal approximation is pretty accurate (especially with continuity correction).
Since np^2 = 111 · (1/10)^2 = 1.11 is not very small, we cannot expect the Poisson
approximation to be very precise, although it is still quite accurate.
4.41. Let X be the number of sixes. Then X ~ Bin(n, p) with n = 72 and p = 1/6.

  P(X = 3) = (72 choose 3) (1/6)^3 (5/6)^{69} ~ 0.00095.

The Poisson approximation would compare X with a Poisson(mu) random variable
with mu = np = 12:

  P(X = 3) ~ e^{-12} 12^3/3! ~ 0.0018.

For the normal approximation we need the continuity correction:

  P(X = 3) = P(2.5 <= X <= 3.5) = P((2.5 - 12)/sqrt(10) <= (X - 12)/sqrt(10) <= (3.5 - 12)/sqrt(10))
    ~ Phi(-2.69) - Phi(-3.0) = Phi(3.0) - Phi(2.69) ~ 0.9987 - 0.9964 = 0.0023.
4.42. (a) Let X be the number of mildly defective gadgets in the box. Then X ~
Bin(n, p) with n = 100 and p = 0.2 = 1/5. We have

  P(A) = P(X < 15) = sum_{k=0}^{14} (100 choose k) (1/5)^k (4/5)^{100-k}.

(b) We have np(1 - p) = 16 > 10 and np^2 = 4. This suggests that the normal
approximation is more appropriate than the Poisson approximation in this case.
Using the normal approximation we get

  P(X < 15) = P((X - 100 · 1/5)/sqrt(100 · 1/5 · 4/5) < (15 - 100 · 1/5)/sqrt(100 · 1/5 · 4/5))
    = P((X - 20)/4 < -5/4)
is large enough for a normal approximation to work. So, letting Z ~ N(0, 1) and
using the correction for continuity, we have

  P(X >= 48) = P(X >= 47.5) = P((X - 40)/6 >= (47.5 - 40)/6)
    ~ P(Z >= 1.25) = 1 - Phi(1.25) = 1 - 0.8944 = 0.1056.
(b) We have np(1 - p) = 100 > 10 and np^2 = 100. Thus it is more reasonable to
use the normal approximation:

  P(X >= 215) = P((X - 400 · 1/2)/sqrt(400 · 1/2 · 1/2) >= (215 - 400 · 1/2)/sqrt(400 · 1/2 · 1/2))
    = P((X - 200)/10 >= 3/2)
4.47. (a) Let X be the number of times in a year that he needed more than 10
coin flips. Then X ~ Bin(365, p) with

  p = P(more than 10 coin flips needed) = P(first 10 coin flips are tails) = 1/2^{10}.

Since np(1 - p) is small (and np^2 is even smaller), we can use the Poisson
approximation here with lambda = np = 365/2^{10} ~ 0.356. Then

  P(X >= 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2) ~ 1 - e^{-lambda}(1 + lambda + lambda^2/2) ~ 0.00579.

(b) Denote the number of times that he needed exactly 3 coin flips by Y. This
has a Bin(365, r) distribution with success probability r = 1/2^3 = 1/8. (The value
of r is the probability that a Geom(1/2) random variable is equal to 3.) Since
nr(1 - r) = 39.92 > 10, we can use the normal approximation. The expectation of
Y is E[Y] = nr = 45.625. Thus

  P(Y > 50) = P((Y - 45.625)/sqrt(39.92) > (50 - 45.625)/sqrt(39.92)) = P((Y - 45.625)/sqrt(39.92) > 0.69)
    ~ 1 - Phi(0.69) = 1 - 0.7549 = 0.2451.
4.48. Let A = {X in [0, 1]} and B = {X in [a, 2]}. We need to find a < 1 so that
P(AB) = P(A)P(B).
If a <= 0 then AB = A, and then P(A)P(B) != P(AB). Thus we must have
0 < a < 1 and hence AB = {X in [a, 1]}. The c.d.f. of X is 1 - e^{-2x} for x >= 0 and
0 otherwise. From this we can compute

  P(A) = P(0 <= X <= 1) = 1 - e^{-2},
  P(B) = P(a <= X <= 2) = e^{-2a} - e^{-4},
  P(AB) = P(a <= X <= 1) = e^{-2a} - e^{-2}.

Thus P(AB) = P(A)P(B) is equivalent to

  (1 - e^{-2})(e^{-2a} - e^{-4}) = e^{-2a} - e^{-2}.

Solving this we get e^{-2a} = e^{-4} + 1 - e^{-2} and a = -(1/2) ln(1 - e^{-2} + e^{-4}) ~ 0.0622.
4.49. Let T ~ Exp(1/10) be the lifetime of a particular stove. Let r > 0 and let X
be the amount of money you earn on a particular extended warranty of length r.
We see that X = C if T > r, and X = C - 800 if T <= r.
We have P(T > r) = e^{-(1/10)r}, and so

  E[X] = C · P(X = C) + (C - 800) · P(X = C - 800)
    = C · P(T > r) + (C - 800) · P(T <= r)
    = C e^{-r/10} + (C - 800)(1 - e^{-r/10}).

Thus, the pairs of numbers (C, r) that give an expected profit of zero are those
satisfying

  0 = C e^{-r/10} + (C - 800)(1 - e^{-r/10}).
4.50. By the memoryless property of the exponential distribution, for any x > 0 we
have

  P(T > x + 7 | T > 7) = P(T > x).

Thus the conditional probability of waiting at least 3 more hours is P(T > 3) =
e^{-(1/3)·3} = e^{-1}, and the conditional probability of waiting at least x > 0 more hours
is P(T > x) = e^{-x/3}.
4.51. We know from the condition that 0 <= T_1 <= t, so P(T_1 <= s | N_t = 1) = 0 if
s < 0 and P(T_1 <= s | N_t = 1) = 1 if s > t.
If 0 <= s <= t we have

  P(T_1 <= s | N_t = 1) = P(T_1 <= s, N_t = 1)/P(N_t = 1).

Since the arrivals come from a Poisson process with intensity lambda, we have
P(N_t = 1) = lambda t e^{-lambda t}. Also,

  P(T_1 <= s, N_t = 1) = P(N([0, s]) = 1, N([0, t]) = 1) = P(N([0, s]) = 1, N([s, t]) = 0)
    = P(N([0, s]) = 1) P(N([s, t]) = 0) = lambda s e^{-lambda s} · e^{-lambda(t-s)}
    = lambda s e^{-lambda t}.

Then

  P(T_1 <= s | N_t = 1) = P(T_1 <= s, N_t = 1)/P(N_t = 1) = lambda s e^{-lambda t}/(lambda t e^{-lambda t}) = s/t.

Collecting all cases: P(T_1 <= s | N_t = 1) = 0 for s < 0, s/t for 0 <= s <= t, and 1 for s > t.
This means that the conditional distribution is uniform on [0, t].
4.52. (a) By definition Gamma(r) = int_0^infinity x^{r-1} e^{-x} dx for r > 0. Then
Gamma(r + 1) = int_0^infinity x^r e^{-x} dx. Using integration by parts with (-e^{-x})' = e^{-x} we get

  Gamma(r + 1) = int_0^infinity x^r e^{-x} dx = [x^r(-e^{-x})]_{x=0}^{x=infinity} - int_0^infinity r x^{r-1}(-e^{-x}) dx
    = r int_0^infinity x^{r-1} e^{-x} dx = r Gamma(r).

The two terms in [x^r(-e^{-x})]_{x=0}^{x=infinity} disappear because r > 0 and lim_{x -> infinity} x^r e^{-x} = 0.

(b) We use induction to prove the identity. For n = 1 the statement is true as

  Gamma(1) = int_0^infinity e^{-x} dx = 1 = 0!.

Assume that the statement is true for some positive integer n: Gamma(n) = (n - 1)!. We
need to show that it also holds for n + 1. But this is true because by part (a) we have

  Gamma(n + 1) = n Gamma(n) = n · (n - 1)! = n!,

which completes the induction.
5.1. We have M(t) = E[e^{tX}], and since X is discrete we have E[e^{tX}] = sum_k P(X = k) e^{tk}.
Using the given probability mass function we get

  M(t) = P(X = -6)e^{-6t} + P(X = -2)e^{-2t} + P(X = 0) + P(X = 3)e^{3t}
    = (4/9)e^{-6t} + (1/9)e^{-2t} + 2/9 + (2/9)e^{3t}.
5.2. (a) We have

  M'(t) = (4/3)e^{4t} + (5/6)e^{5t},   M''(t) = (16/3)e^{4t} + (25/6)e^{5t}.
5.3. The probability density function of X is f(x) = 1 for x in [0, 1] and 0 otherwise.
The moment generating function can be computed as

  M(t) = E[e^{tX}] = int_{-infinity}^{infinity} f(x) e^{tx} dx = int_0^1 e^{tx} dx.

If t = 0 then M(t) = int_0^1 dx = 1. If t != 0 then

  M(t) = int_0^1 e^{tx} dx = (e^t - 1)/t.
5.4. (a) In Example 5.5 we have seen that the moment generating function of a
N(mu, sigma^2) random variable is e^{sigma^2 t^2/2 + mu t}. Thus if X~ ~ N(0, 12) then M_{X~}(t) = e^{6t^2}
and M_{X~}(t) = M_X(t) for |t| < 2. But then by Fact 5.14 the distribution of X is the
same as the distribution of X~.
(b) In Example 5.6 we computed the moment generating function of an Exp(lambda)
distribution, and it is lambda/(lambda - t) for t < lambda and infinite otherwise. Thus M_Y(t) agrees with the
moment generating function of an Exp(2) distribution on the interval (-1/2, 1/2),
hence by Fact 5.14 we have Y ~ Exp(2).
(c) We cannot identify the distribution of Z, as there are many random variables
with moment generating functions that are infinite for t >= 5. For example, all
Exp(lambda) distributions with lambda < 5 have this property.
(d) We cannot identify the distribution of W, as there are many random variables
whose moment generating function is equal to 2 at t = 2. Here are two examples:
if W_1 ~ N(0, sigma^2) with sigma^2 = (ln 2)/2 then

  M_{W_1}(2) = e^{sigma^2 · 2^2/2} = e^{((ln 2)/2)·2} = e^{ln 2} = 2.

If W_2 ~ Poisson(lambda) with lambda = (ln 2)/(e^2 - 1) then

  M_{W_2}(2) = e^{lambda(e^2 - 1)} = e^{ln 2} = 2.
5.5. We can recognize M_X(t) = e^{3(e^t - 1)} as the moment generating function of a
Poisson(3) random variable. Hence P(X = 4) = e^{-3} 3^4/4!.
5.6. The possible values of Y = (X - 1)^2 are 1, 4 and 9. The corresponding
probabilities are
P ((X 1)2 = 1) = P (X = 0 or X = 2) = P (X = 0) + P (X = 2)
1 3 2
= + =
14 14 7
2 1
P ((X 1) = 4) = P (X = 1) = ,
7
2 4
P ((X 1) = 9) = P (X = 4) = .
7
5.7. The cumulative distribution function of X is FX (x) = 1 e x for x 0 and
0 otherwise. Note that X > 0 with probability one, and ln(X) can take values from
the whole R.
We have
ey
FY (y) = P (Y y) = P (ln(X) y) = P (X ey ) = 1 e ,
where we used ey > 0. From this we get
d ⇣ ⌘0
ey ey
fY (y) = FY (y) = 1 e = ey
dy
for all y 2 R.
5.8. We first compute the cumulative distribution function of Y . Since 1 X 2,
we have 0 X 2 4, thus FY (y) = 1 for y 4 and FY (y) = 0 for y < 0.
From these we get Var(X) = E[X 2 ] (E[X])2 = (n 1)np2 +np n2 p2 = np(1 p).
5.10. Using the Binomial Theorem we get
✓ ◆30 X 30 ✓ ◆ ✓ ◆k ✓ ◆30 k
1 4 t 30 4 1
M (t) = + e = ekt .
5 5 k 5 5
k=0
Since this is the sum of terms of the form pk etk , we see that X is discrete. The
possible values can be identified with the exponents: these are 0,1,2,. . . , 30. The
coefficients are the corresponding probabilities:
✓ ◆ ✓ ◆k ✓ ◆30 k
30 4 1
P (X = k) = , k = 0, 1, . . . , 30.
k 5 5
We can recognize this as the probability mass function of a binomial distribution
with n = 30 and p = 45 .
(b) The possible values of X are { 2, 1, 0, 1}, so the possible values of Y = |X +1|
are {0, 1, 2}. We get
3
P (Y = 0) = P (X = 1) =
10
1 3 2
P (Y = 1) = P (X = 2) + P (X = 0) = + =
10 10 5
2
P (Y = 2) = P (X = 1) = .
5
5.16. (a) We have E[X^n] = int_0^1 x^n dx = 1/(n + 1).
(b) In Exercise 5.3 we have seen that the moment generating function of X is given
by the case defined function M_X(t) = 1 for t = 0, and M_X(t) = (e^t - 1)/t for t != 0.
We have e^t = sum_{k=0}^{infinity} t^k/k!, hence e^t - 1 = sum_{k=1}^{infinity} t^k/k! and

  M_X(t) = (e^t - 1)/t = (1/t) sum_{k=1}^{infinity} t^k/k! = sum_{k=1}^{infinity} t^{k-1}/k! = sum_{n=0}^{infinity} t^n/(n + 1)!

for t != 0. In fact, this formula works for t = 0 as well, as the constant term of the
series is equal to 1. Now we can read off the nth derivative at zero by taking the
coefficient of t^n and multiplying by n!:

  E[X^n] = M^{(n)}(0) = n! · 1/(n + 1)! = 1/(n + 1).
This agrees with the result we got for part (a).
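A tiny Monte Carlo check of E[X^n] = 1/(n + 1) (our addition, assuming NumPy):

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.uniform(size=1_000_000)          # X ~ Uniform(0, 1)
  for n in (1, 2, 5):
      print(n, (x**n).mean(), 1/(n + 1))   # sample moment vs. exact value
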
5.17. (a) MX (0) = 1. For t 6= 0 integrate by parts.
Z 1 Z
1 2 tx
MX (t) = E[etX ] = etx f (x) dx = xe dx
1 2 0
Z 2
1 ⇣ x tx ⌘ ⇣1 ⌘
x=2 x=2
1 tx 1 2e2t
= e e dx = 2
etx
2 t x=0 0 t 2 t t x=0
2t 2t
2te e +1
= .
2t2
To summarize,
8
>
<1 for t = 0,
MX (t) = 2t
e2t + 1
: 2te
>
for t 6= 0.
2t2
(b) For t 6= 0 we insert the exponential series into MX (t) found in part (a) and
then cancel terms:
✓X1 1 ◆
2te2t e2t + 1 1 (2t)k+1 X (2t)k
MX (t) = = + 1
2t2 2t2 k! k!
k=0 k=0
1 ✓ ◆ 1
1 X 1 1 X 2k+1 tk
= 2 (2t)k = ·
2t (k 1)! k! k + 2 k!
k=2 k=0
2k+1
from which we read o↵ E(X k ) = M (k) (0) = k+2 .
(c)
Z 2
1 2k+1
E(X k ) = xk+1 dx = .
2 0 k+2
5.18. (a) Using the definition of a moment generating function we have
1
X 1
X
MX (t) = E[etX ] = etk P (X = k) = (et )k (1 p)k 1
p
k=1 k=1
1
X 1
X
= pet (et (1 p))k 1
= pe t t
(e (1 p))k
k=1 k=0
t
Note that the sum converges⇣ to⌘a finite number if and only if e (1 p) < 1, which
holds if and only if t < ln 1 1 p . In this case we have
1
MX (t) = pet · .
1 et (1 p)
Overall, we find:
8 ⇣ ⌘
< pet 1
1 et (1 p) t < ln
MX (t) = ⇣1 p⌘
.
:1 t ln 1
1 p
The geometric series is finite exactly if 34 et < 1, which holds for t ln(4/3). In
that case
1 ✓ ◆k
2 1X 3 t 2 1 3 t
e 8 3et
MX (t) = + e = + · 4 3 t = .
5 5 4 5 5 1 4e 20 15et
k=1
Hence
(
3et 8
MX (t) = 15et 20 , t < ln(4/3)
1 else.
5.21. We have
MY (t) = E[etY ] = E[et(aX+b) ] = E[ebt+atX ] = ebt E[eatX ] = ebt MX (at).
5.22. By the definition of the moment generating function and the properties of
expectation we get
MY (t) = E[etY ] = E[e(3X 2)t
] = E[e3tX e 2t
]=e 2t
E[e3tX ].
Note that E[e3tX ] is exactly the moment generating function MX (t) of X evaluated
at 3t. The moment generating function of X ⇠ Exp( ) is t for t < and 1
otherwise, thus E[e3tX ] = 3t for t < /3 and 1 otherwise. This gives
(
e 2t 3t , if t < /3
MY (t) =
1, otherwise.
5.23. We can notice that MY (t) looks very similar to the moment generating func-
t
tion of a Poisson random variable. If X ⇠ Poisson(2), then MX (t) = e2(e 1) , and
MY (t) = MX (2t). From Exercise 5.21 we see that Y has the same moment gener-
ating function as 2X, which means that they have the same distribution. Hence
2
22 2
P (Y = 4) = P (2X = 4) = P (X = 2) = e = 2e .
2!
5.24. (a) Since Y = e^X > 0 and e^x > 0 for all x in R, we have F_Y(t) = P(Y <= t) =
P(e^X <= t) = 0 for t <= 0. Next, for any t > 0,

  F_Y(t) = P(Y <= t) = P(e^X <= t) = P(X <= ln t) = Phi(ln t).

Differentiating this gives the probability density function for t > 0:

  f_Y(t) = Phi'(ln t) · (1/t) = (1/t) phi(ln t) = (1/sqrt(2 pi t^2)) exp(-(ln t)^2/2).

For t <= 0 the probability density function is 0.
(b) From the definition of Y we get that E[Y^n] = E[(e^X)^n] = E[e^{nX}]. Note that
E[e^{nX}] = M_X(n) is the moment generating function of X evaluated at n.
We computed the moment generating function for X ~ N(0, 1) and it is given
by M_X(t) = e^{t^2/2}. Thus we have

  E[Y^n] = e^{n^2/2}.
5.25. We start by expressing the cumulative distribution function FY (y) of Y in
terms of FX . Since Y = |X 1| 0, we can concentrate on y 0.
FY (y) = P (Y y) = P (|X 1| y) = P ( y X 1 y)
= P (1 y X 1 + y) = FX (1 + y) FX (1 y).
(In the last step we used P (X = 1 y) = 0.) Di↵erentiating the final expression:
d
fY (y) = FY0 (y) = (FX (1 + y) FX (1 y)) = fX (1 + y) + fX (1 y).
dy
1
We have fX (x) = 5 if 2 x 3 and zero otherwise. Considering the various
cases we get 8
> 2
<5, 0<y<2
fY (y) = 15 , 2y<3
>
:
0 otherwise.
5.26. The function g(x) = x(x 3) is non-positive in [0, 3] (as 0 x and x 3 0).
It is a simple calculus exercise to show that the function g(x)) takes its minimum
at x = 3/2 inside [0, 3], and the minimum value is 94 . Thus Y = g(X) will take
values from the interval [ 94 , 0] and the probability density function fY (y) is 0 for
y2/ [ 94 , 0].
9
We will determine the cumulative distribution function FY (y) for y 2 [ 4 , 0].
We have
FY (y) = P (Y y) = P (X(X 3) y).
Next we solve the inequality x(x 3) y for x. Since x(x 3) is a parabola facing
up, the solution will be an interval and the endpoints are exactly the solutions of
x(x 3) = y. The solutions of this equation are
p p
3 9 + 4y 3 + 9 + 4y
x1 = , and x2 = ,
2 2
9
thus for 4 y 0 we get
✓ p p ◆
3 9 + 4y 3 + 9 + 4y
FY (y) = P (X(X 3) y) = P X
2 2
p p
= FX ( 3+ 9+4y
2 ) FX ( 3 9+4y
2 ).
Di↵erentiating with respect to y gives
1 p 1 p
fY (y) = FY0 (y) = p fX ( 3+ 29+4y ) + p FX ( 3 9+4y
2 ).
9 + 4y 9 + 4y
Using the fact that fX (x) = 29 x for 0 x 3 we obtain
1 p 1 p
fY (y) = p · 29 ( 3+ 29+4y ) + p · 2
9 · (3 9+4y
2 )
9 + 4y 9 + 4y
2
= p .
9 9 + 4y
Thus
2 9
fY (y) = p if 4 y0
9 9 + 4y
and 0 otherwise.
Finding the probability density via the Fact 5.27.
By Fact 5.27 we have
X 1
fY (y) = fX (x)
|g 0 (x)|
x:g(x)=y,g 0 (x)6=0
5.28. We have fX (x) = 13 for 1 < x < 2 and 0 otherwise. Y = X 4 takes values
from [0, 16], thus fY (y) = 0 outside this interval. For 0 < y 16 we have
p p p p
FY (y) = P (Y y) = P (X 4 y) = P ( 4 y X 4 y) = FX ( 4 y) FX ( 4 y).
Di↵erentiating this gives
1 3/4 p 1 p
fY (y) = FY0 (y) = y fX ( 4 y) + y 3/4 fX ( 4 y).
4 4
p p p
Note that for 0 < y < 1 both 4 y and 4 y are in ( 1, 2), hence fX ( 4 y) and
p 1
fX ( 4 y) are both equal to 3 . This gives
1 1 1
fY (y) = 2 · y 3/4 · = y 3/4 , if 0 < y < 1.
4 3 6
p p
If 1 y < 16 then 4 y 2 ( 1, 2), but 4 y =6 ( 1, 2) which gives
1 3/4 1 1 3/4
fY (y) = y · = y , if 1 y < 16.
4 3 12
Collecting everything
8
> 1 3/4
<6y , if 0 < y < 1
1
fY (y) = 12 y 3/4 , if 1 y < 16
>
:
0, otherwise.
5.29. Y = |Z| 0. For y 0 we get
FY (y) = P (Y y) = P (|Z| y) = P ( y Z y) = (y) ( y) = 2 (y) 1.
Hence for y 0 we have
2 y2
fY (y) = F 0 (y) = (2 (y) 1)0 = 2 (y) = p e 2 ,
2⇡
and fY (y) = 0 otherwise.
5.30. We present two approaches for the solution.
Finding the probability density via the cumulative distribution function.
1
The probability density function of X is fX (x) = 3⇡ on [ ⇡, 2⇡] and 0 otherwise.
The sin(x) function takes values between 1 and 1, and it will take all these
values on [ ⇡, 2⇡]. Thus the set of possible values of Y are the interval [ 1, 1].
where g(x) = sin(x). Again, we only need to worry about the case 1 y 1,
since Y can only take values from here. With a little bit of trigonometry you can
check that the solutions of sin(x) = y for |y| < 1 are exactly the numbers
P (Y t) = P ( X1 t) = P (X 1
t) =1 1
t.
1
Di↵erentiating now shows that fY (t) = t2 when t 1.
5.33. The following function will work:
8
>
<1 if 0 < u < 1/7
g(u) = 4 if 1/7 u < 3/7
>
:
9 if 3/7 u 1.
5.34. We can see from the conditions that
1 1 1
P (1 < X < 3) = P (1 < X < 2) + P (X = 2) = P (2 < X < 3) = + + = 1,
3 3 3
hence we will need to find a function g that maps (0, 1) to (1, 3). The conditions
show that inside the intervals (1, 2) and (2, 3) the random variable X ‘behaves’
like a random variable with probability density function 13 there, but it also takes
the value 2 with probability 13 (so it actually cannot have a probability density
function). We get P (g(U ) = 2) = 13 if the function g is constant 2 on an interval
of length 13 inside (0, 1). To get the behavior in (1, 2) and (2, 3) we can have linear
functions there with slope 3. This leads to the following construction:
8
>
<1 + 3x, if 0 < x 13
g(x) = 2, if 13 < x 23
>
: 2
2 + 3(x 3 ), if 23 < x < 1.
We can define g any way we want it to outside (1, 3).
To check that this function works note that
1
P (g(U ) = 2) = P ( 13 U 23 ) = ,
3
for 1 < a < 2 we have
P (1 < g(U ) < a) = P (1 + 3U < a) = P (U < 13 (a 1)) = 13 (a 1),
and for 2 < b < 3 we have
2
P (b < g(U ) < 3) = P (b < 2+3(U 3 )) = P ( 13 (b 2)+ 23 < U ) = 1
3
1
3 (b 2) = 13 (3 b).
5.35. Note that Y = bXc is an integer, and hence Y is discrete. Moreover, for an
integer k we have bXc = k if and only if k X < 1. Thus
P (bXc = k) = P (k X < k + 1).
Since X ⇠ Exp( ), we have P (k X < k + 1) = 0 if k 1, and for k 0:
Z k+1
P (k X < k + 1) = e y dy = e k e (k+1) = e k (1 e ).
k
5.36. Note that X >= 0 and thus the possible values of floor(X) are 0, 1, 2, . . . . To find
the probability mass function, we have to compute P(floor(X) = k) for all nonnegative
integers k. Note that floor(X) = k if and only if k <= X < k + 1. Thus for k in {0, 1, . . . }
we have

  P(floor(X) = k) = P(k <= X < k + 1) = int_k^{k+1} lambda e^{-lambda t} dt = [-e^{-lambda t}]_{t=k}^{t=k+1}
    = e^{-lambda k} - e^{-lambda(k+1)} = e^{-lambda k}(1 - e^{-lambda}) = (e^{-lambda})^k (1 - e^{-lambda}).

Note that this implies that the random variable floor(X) + 1 is geometric with success
parameter 1 - e^{-lambda}.
5.37. Since Y = {X}, we have 0 Y < 1. For 0 y < 1 we have
FY (y) = P (Y y) = P ({X} y).
If {x} y then k x k + y for some integer k. Thus
X X
P ({X} y) = P (k X k + y) = (FX (k + y) FX (k)).
k k
6.1. (a) We just need to compute the row sums to get P (X = 1) = 0.3, P (X =
2) = 0.5, and P (X = 3) = 0.2.
(b) The possible values for Z = XY are {0, 1, 2, 3, 4, 6, 9} and the probability mass
function is
P (Z = 0) = P (Y = 0) = 0.35
P (Z = 1) = P (X = 1, Y = 1) = 0.15
P (Z = 2) = P (X = 1, Y = 2) + P (X = 2, Y = 1) = 0.05
P (Z = 3) = P (X = 1, Y = 3) + P (X = 3, Y = 1) = 0.05
P (Z = 4) = P (X = 2, Y = 2) = 0.05
P (Z = 6) = P (X = 2, Y = 3) + P (X = 3, Y = 2) = 0.2 + 0.1 = 0.3
P (Z = 9) = P (X = 3, Y = 3) = 0.05.
3 X
X 3
E[XeY ] = xey
x=1 y=0
6.2. (a) The marginal probability mass function of X is found by computing the
row sums,
1 1 1
P (X = 1) = , P (X = 2) = , P (X = 3) = .
3 2 6
For 0 x 1,
Z1 Z 1
fX (x) = f (x, y) dy = 12
7 (xy + y 2 ) dy = 12 1
7 (2x + 13 ) dy = 67 x + 47 .
0
1
For 0 y 1,
Z1 Z 1
fY (y) = f (x, y) dx = 12
7 (xy + y 2 ) dx = 12 1
7 (2y + y 2 ) dy = 12 2
7 y + 67 y.
0
1
(c)
ZZ Z1 ✓Zy ◆ Z1
12 2 12 3 3
P (X < Y ) = f (x, y) dx dy = 7 (xy + y ) dx dy = 7 2y dy
x<y 0 0 0
12 3 9
= 7 · 8 = 14 .
(d)
Z 1 Z 1 Z 1 Z 1
E[X 2 Y ] = x2 yf (x, y) dx dy = x2 y 12
7 (xy + y 2 ) dx dy
1 1 0 0
Z 1 Z 1
= 12
7 (x3 y 2 + x2 y 3 ) dx dy = 12 1
7 4 · 1
3 + 1
3 · 1
4 = 27 .
0 0
Thus (
2(1 x), if 0 x 1
fX (x) =
0, otherwise.
Similar computation shows that
(
2(1 y), if 0 y 1
fY (y) =
0, otherwise.
(b) The expectation of X can be computed using the marginal density:
Z 1 Z 1 x=1
2x3 1
E[X] = xfX (x)dx = x2(1 x)dx = x2 = .
1 0 3 x=0 3
Similar computation gives E[Y ] = 13 .
(c) To compute E[XY ] we need to integrate the function xyf (x, y) on the whole
plane, which in our case is the same as integrating 2xy on our triangle. We can
write this double integral as two single variable integrals: for a given 0 x 1 the
possible y values are 0 y 1 x hence
Z 1Z 1 x Z 1⇣ ⌘ Z 1
y=1 x
E[XY ] = 2xy dy dx = xy 2 y=0 dx = x(1 x)2 dx
0 0 0 0
x4 2x3 x2 x=1 1
= + x=0
= .
4 3 2 12
6.8. (a) X and Y from Exercise 6.2 are not independent. For example, note that
P (X = 3) > 0 and P (Y = 2) > 0, but P (X = 3, Y = 2) = 0.
(b) The marginals for X and Y from Exercise 6.5 are:
For 0 x 1,
Z1 Z 1
fX (x) = f (x, y) dy = 12
7 (xy + y 2 ) dy = 12 1
7 (2x + 13 ) dy = 67 x + 47 .
0
1
For 0 y 1,
Z1 Z 1
fY (y) = f (x, y) dx = 12
7 (xy + y 2 ) dx = 12 1
7 (2y + y 2 ) dy = 12 2
7 y + 67 y.
0
1
Thus, fX (x)fY (y) 6= f (x, y) and they are not independent. For example,
fX ( 14 ) = 11 1 9 1 1 99 1 1
14 and fY ( 4 ) = 28 , so that fX ( 4 )fY ( 4 ) = 392 . However, f ( 4 , 4 ) =
3
14 .
(c) The marginal of X is
Z 1 Z 1
x(1+y) x xy x
fX (x) = xe dy = xe e dy = e ,
0 0
for x > 0 and zero otherwise. The marginal of Y is
Z 1
1
fY (y) = xe x(1+y) dx = ,
0 (1 + y)2
for y > 0 and zero otherwise. Hence, f (x, y) is not the product of the marginals
and X and Y are not independent.
(d) X and Y are not independent. For example, choose any point (x, y) contained
in the square {(u, v) : 0 u 1, 0 v 1}, but not contained in the
triangle with vertices (0, 0), (1, 0), (0, 1). Then fX (x) > 0, fY (y) > 0, and
so fX (x)fY (y) > 0. However, f (x, y) = 0 (because the point is outside the
triangle).
6.9. X is binomial with parameters 3 and 1/2, thus its probability mass function is
pX (a) = a3 18 for a = 0, 1, 2, 3 and zero otherwise. The probability mass function
of Y is pY (b) = 16 for b = 1, 2, 3, 4, 5, 6. Since X and Y are independent, the joint
probability mass function is just the product of the individual probability mass
functions which means that
✓ ◆
3 1
pX,Y (a, b) = pX (a)pY (b) = , for a 2 {0, 1, 2, 3} and b 2 {1, 2, 3, 4, 5, 6}.
a 48
6.10. The marginals of X and Y are
( (
1, x 2 (0, 1) 1, y 2 (0, 1)
fX (x) = , fY (y) =
0, x 2/ (0, 1) 0, y2/ (0, 1),
and because they are independent the joint density is their product
(
1, 0 < x < 1, and 0 < y < 1
fX,Y (x, y) = fX (x)fY (y) =
0, else.
Therefore,
ZZ Z 1 Z y Z 1
1
P (X < Y ) = fX,Y (x, y)dxdy = 1dx dy = y dy = .
x<y 0 0 0 2
6.11. Because Y is uniform on (1, 2), the marginal density for Y is
(
1 y 2 (1, 2)
fY (y) =
0 else
By independence, the joint distribution of (X, Y ) is therefore
(
2x 0 < x < 1, 1 < y < 2
fX,Y (X, Y ) =
0 else
where you should draw a picture of the region to see why this is the case. Calculating
the double integral yields:
Z 12 Z 2 Z 1/2
3
P (Y X 2) = 2x dy dx = 2x( 12 x) dx = 24 1
.
0 x+ 32 0
2⇡ 1 ⇢2
We can simplify the exponent of the exponential as follows:
✓ ◆2
2 y ⇢x
x + p 2
1 ⇢ x2 (1 ⇢2 + ⇢2 ) + y 2 2⇢xy x2 + y 2 2⇢xy
= 2
= .
2 2(1 ⇢ ) 2(1 ⇢2 )
This shows that the joint probability density of X, Y is indeed the same as given
in (6.28), and thus the pair (X, Y ) has standard bivariate normal distribution with
parameter ⇢.
6.16. In terms of the polar coordinates (r, ✓) the Cartesian coordinates (x, y) are
expressed as
x = r cos(✓) and y = r sin(✓).
These equations give the coordinate functions of the inverse function G 1 (r, ✓).
The Jacobian is
" @x @x #
@r @✓ cos(✓) r sin(✓)
J(r, ✓) = det @y @y = det = r cos2 ✓ + r sin2 ✓ = r.
sin(✓) r cos(✓)
@r @✓
1
The joint density function of X, Y is fX,Y (x, y) = ⇡r02
in D and 0 outside. Formula
(6.32) gives
1
fR,⇥ (r, ✓) = fX,Y (r cos(✓), r sin(✓)) |J(r, ✓)| = ⇡r02
r for (r, ✓) 2 L.
This is exactly the joint density function obtained earlier in (6.26) of Example 6.37.
6.17. We can express (X, Y ) as (g(U, V ), h(U, V )) where g(u, v) = uv and h(u, v) =
(1 u)v. We can find the inverse of the function (g(u, v), h(u, v)) by solving the
system of equations
x = uv, y = (1 u)v
x
for u and v. The solution is u = x+y , v = x + y, so the inverse of (g(u, v), h(u, v))
is the function (q(x, y), r(x, y)) with
x
q(x, y) = , r(x, y) = x + y.
x+y
The Jacobian of (q(x, y), r(x, y)) with respect to x, y is
y x
2 (x+y)2 y+x 1
J(x, y) = det (x+y) = 2
= .
1 1 (x + y) x+y
The terms are nonnegative and add to 1, which shows that pX,Y is a probability
mass function.
(b) Adding the rows and columns gives the marginals. The marginal of X is
P (X = 1) = 14 , P (X = 2) = 14 , P (X = 3) = 14 , P (X = 4) = 14 ,
whereas the marginal of Y is
25 13 7 1
P (Y = 1) = 48 , P (Y = 2) = 48 , P (Y = 3) = 48 , P (Y = 4) = 16 .
(c)
P (X = Y + 1) = P (X = 2, Y = 1) + P (X = 3, Y = 2) + P (X = 4, Y = 3)
1 1 1 13
= 8 + 12 + 16 = 48 .
6.19. (a) By adding the probabilities in the respective rows we get pX (0) = 13 ,
pX (1) = 23 . By adding them in the appropriate columns we get the marginal
probability mass function of Y : pY (0) = 16 , pY (1) = 13 , pY (2) = 12 .
(b) We have pZ,W (z, w) = pZ (z)pW (w) by the independence of Z and W . Using
the probability mass functions from part (a) we get
W
0 1 2
1 1 1
Z 0 18 9 6
1 2 1
1 9 9 3
6.20. Note that the random variable X1 + X2 counts the number of times that out-
comes 1 or 2 occurred. This event has a probability of 12 . Hence, and similar to the
argument made at the end of Example 6.10, (X1 +X2 , X3 , X4 ) ⇠ Mult(n, 3, 12 , 18 , 38 ).
Therefore, for any pair of integers (k, `) with k + ` n
P (X3 = k, X4 = `) = P (X1 + X2 = n k `, X3 = k, X4 = `)
n! 1 n k ` 1 k 3 `
= 2 8 8 .
(n k `)! k! `!
6.21. They are not independent. Both X1 and X2 can take the value n with positive
probability. However, they cannot both take it at the same time, as X1 + X2 <= n. Thus
6.22. The random variable X1 + X2 counts the number of times that outcomes
1 or 2 occurred. This event has a probability of p1 + p2 . Therefore, X1 + X2 ⇠
Bin(n, p1 + p2 ).
6.23. Let Xg , Xr , Xy be the number of times we see a green ball, red ball, and
yellow ball, respectively. Then, (Xg , Xr , Xy ) ⇠ Mult(4, 3, 1/3, 1/3, 1/3). We want
the following probability,
6.24. The number of green balls chosen is binomially distributed with parameters
n = 3 and p = 14 . Hence, the probability that exactly two balls are green and one
is not green is
✓ ◆ ✓ ◆2
3 1 3 9
= .
2 4 4 64
The same argument goes for seeing exactly two red balls, two yellow balls, or two
white balls. Hence, the probability that exactly two balls are of the same color is
9 9
4· = .
64 16
6.25. (a) The possible values for X and Y are 0, 1, 2. For each possible pair we
compute the probability of the corresponding event, For example,
3
P (X = 0, Y = 0) = P {(T, T, T )} = 2 .
Similarly
3
P (X = 0, Y = 1) = P ({(T, T, H)}) = 2
P (X = 0, Y = 2) = 0
3
P (X = 1, Y = 0) = P ({(H, T, T )}) = 2
3 2
P (X = 1, Y = 1) = P ({(H, T, H), (T, H, T )}) = 2 ⇥ 2 =2
3
P (X = 1, Y = 2) = P ({(T, H, H)}) = 2
3
P (X = 2, Y = 1) = P ({(H, H, T )}) = 2
3
P (X = 2, Y = 2) = P ({(H, H, H)}) = 2
6.26. (a) By the setup of the experiment, XA is uniformly distributed over {0, 1, 2}
whereas XB is uniformly distributed over {1, 2, . . . , 6}. Moreover, XA and XB
are independent. Hence, (XA , XB ) is uniformly distributed over ⌦ = {(k, `) :
0 k 2, 1 ` 6}. That is, for (k, `) 2 ⌦,
1
P ((XA , XB ) = (k, `)) = 18 .
(b) The set of possible values of Y1 is {0, 1, 2, 3, 4, 5, 6, 8, 10, 12} and the set of
possible values of Y2 is {1, 2, 3, 4, 5, 6}. The joint distribution can be given in
tabular form
Y1 \ Y2 1 2 3 4 5 6
1 1 1 1 1 1
0 18 18 18 18 18 18
1
1 18 0 0 0 0 0
2
2 0 18 0 0 0 0
1
3 0 0 18 0 0 0
1 1
4 0 18 0 18 0 0
1
5 0 0 0 0 18 0
1 1
6 0 0 18 0 0 18
1
8 0 0 0 18 0 0
1
10 0 0 0 0 18 0
1
12 0 0 0 0 0 18
For example,
1 1
P (Y1 = 2, Y2 = 2) = P (XA = 1, XB = 2) + P (XA = 2, XB = 1) = 18 + 18 .
(c) The marginals are found by summing along the rows and columns:
6 1 2
P (Y1 = 0) = 18 , P (Y1 = 1) = 18 , P (Y1 = 2) = 18
1 2 1
P (Y1 = 3) = 18 , P (Y1 = 4) = 18 , P (Y1 = 5) = 18
2 1 1
P (Y1 = 6) = 18 , P (Y1 = 8) = 18 , P (Y1 = 10) = 18
1
P (Y1 = 12) = 18 ,
and
2 4 3
P (Y2 = 1) = 18 , P (Y2 = 2) = 18 , P (Y2 = 3) = 18
3 3 3
P (Y2 = 4) = 18 , P (Y2 = 5) = 18 , P (Y2 = 6) = 18 .
= P (X = k)P (k < Y ) = pq k 1
· q k = pq 2k 1
,
where we used the independence of X and Y in the third equality. We get P (V =
k, W = 2) = pq 2k 1 in exactly the same way. Finally,
P (V = k, W = 1) = P (min(X, Y ) = k, X = Y ) = P (X = k, Y = k) = p2 q 2k 2
.
This gives us the joint probability mass function of V and W ; for the independence
we need to check if this is the product of the marginals.
By Example 6.31 we have V ⇠ Geom(1 q 2 ) so for any k 2 {1, 2, . . . } we get
P (V = k) = (1 (1 q 2 ))k 1
(1 q 2 ) = q 2k 2
(1 q 2 ).
1
X 1
X
P (W = 1) = P (X = Y ) = P (X = k, Y = k) = P (X = k)P (Y = k)
k=1 k=1
1
X 1
X p2
= pq k 1
· pq k 1
= p2 (q 2 )k =
1 q2
k=1 k=0
p
= .
2 p
P (V = k)P (W = 0) = q 2k 2
(1 q 2 ) 12 p
p, P (V = k, W = 0) = pq 2k 1
,
1 q2 1
and since 2 p = (1 q)(1 + q) 1+q = p, we have
P (V = k)P (W = 0) = P (V = k, W = 0).
P (V = k)P (W = 1) = q 2k 2
(1 q2 ) 2 p p , P (V = k, W = 1) = p2 q 2k 2
1 q2
and using 2 p = p again we get
P (V = k)P (W = 1) = P (V = k, W = 1).
6.29. Because of the independence, the joint probability mass function of X and
Y is the product of the individual probability mass functions:
a=1 a=1
p(1 r) p pr
= = .
1 (1 p)(1 r) p + r pr
6.30. Note the typo in the problem, it should say P (X = Y +1), not P (X +1 = Y ).
For k 1 and ` 0 the joint probability mass function of X and Y is
`
P (X = k, Y = `) = (1 p)k 1
p·e `! .
Breaking up {X = Y + 1} into the disjoint union of smaller events {X = Y + 1} =
[1
k=0 {X = k + 1, Y = k}. Thus
1
X 1
X k
P (X = Y + 1) = P (X = k + 1, Y = k) = (1 p)k p · e k!
k=0 k=0
1
X ( (1 p))k
= pe
k!
k=1
(1 p) p
= pe e = pe .
red, b green and c yellow balls. Thus the joint probability mass function is
10 15 20
a b c
P (X1 = a, X2 = b, X3 = c) = 45
8
We turn to finding the joint probability mass function of N and Y . First, note
that
P (Y = 1, N = n) = P ((n 1) white balls followed by a green ball)
= ( 29 )n 1 49 .
Similarly,
P (Y = 2, N = n) = ( 29 )n 13
9.
Similarly,
P (Y = 2) = 37 .
We see that Y and N are independent:
P (Y = 1)P (N = n) = 4
7 · ( 29 )n 17
9 = ( 29 )n 14
9 = P (Y = 1, N = n)
P (Y = 2)P (N = n) = 3
7 · ( 29 )n 1 79 = ( 29 )n 1 39 = P (Y = 2, N = n).
Thus (
6y 6y 2 if 0 < y < 1
fY (y) =
0 otherwise.
The joint density function is positive on the triangle
{(x, y) : 0 < y < 1, y < x < 2 y}.
The line segment from (1, 1) to (2, 0) that forms part of the boundary of D
obeys the equation y = 2 x. The marginal density functions are derived as
follows. First for X.
For x 0 and x 2,
fX (x) = 0.
Z 1 Z 1
2
For 0 < x 1, fX (x) = fX,Y (x, y) dy = 3 dy = 23 .
1 0
Z 1 Z 2 x
2 4 2
For 1 < x < 2, fX (x) = fX,Y (x, y) dy = 3 dy = 3 3 x.
1 0
so indeed it is.
Next the marginal density function of Y :
For y 0 and y 1,
fY (y) = 0.
Z 1 Z 2 y
2 4 2
For 0 < y < 1, fY (y) = fX,Y (x, y) dx = 3 dx = 3 3 y.
1 0
(b)
Z 1 Z 1 Z 2
2 2 2
E[X] = x fX (x) dx = 3 x dx + ( 43 x 3 x ) dx = 79 .
1 0 1
Z 1 Z 1
2 2
E[Y ] = y fY (y) dy = ( 43 y 3 y ) dy = 49 .
1 0
(c) X and Y are not independent. Their joint density is not a product of the
marginal densities. Also, a picture of D shows that P (X > 32 , Y > 12 ) = 0
because all points in D satisfy x + y 2. However, the marginal densities show
that P (X > 32 ) · P (Y > 12 ) > 0 so the probability of the intersection does not
equal the product of the probabilities.
6.35. (a) Since fXY is non-negative, we just need to prove that the integral of fXY
is 1:
Z Z Z Z y
1 1 2
fXY (x, y)dxdy = (x + y)dx dy = (x + y)dx dy
0xy2 4 4 0 0
Z
1 2 3 2
= y dy = 1.
4 0 2
(b) We calculate the probability using the joint density function:
Z Z 2 Z y
1 1
P {Y < 2X} = (x + y)dxdy = (x + y)dx dy
0xy2,y<2x 4 0 y 4
2
Z Z 2
1 2 3 2 5 2 7 7 8 7
= ( y y )dy = y 2 dy = · =
4 0 2 8 32 0 32 3 12
(c) According to the definition, when 0 y 2:
Z Z y
1 1 3 3 2
fY (y) = fXY (x, y)dx = (x + y)dx = ( y 2 0) = y
0 4 4 2 8
Otherwise, the density function fXY (x, y) = 0. Thus:
(
3 2
y y 2 [0, 2]
fY (y) = 8
0 else
R1 R1
6.36. (a) We need to find c so that 1 1 f (x, y)dxdy = 1. For this we need to
compute Z Z1 1
x2 (x y)2
e 2 2 dx dy
1 1
We can decide whether we should integrate with respect to x or y first, and
choosing y gives a slightly easier path.
Z 1 Z 1
x2 (x y)2 x2 (x y)2
e 2 2 dy = e 2 e 2 dy
1 1
p Z 1 p
x2 1 (x y)2 x2
= 2⇡e 2 p e 2 dy = 2⇡e 2 .
1 2⇡
In the last step we could recognize the integral of the pdf of a N (x, 1) distributed
random variable. From this we get
Z 1Z 1 Z 1p
x2 (x y)2 x2
e 2 2 dydx = 2⇡e 2 dx
1 1 0
Z 1
1 x2
= 2⇡ p e 2 dx = 2⇡.
0 2⇡
1
In the last step we integrated the pdf of the standard normal. Hence, c = 2⇡ .
(b) We have basically computed fX (without the constant c) in part (a) already.
Z 1
1 x2 (x y)2
fX (x) = e 2 2 dy
1 2⇡
Z 1
1 x2 1 (x y)2 1 x2
=p e 2 p e 2 dy = p e 2 .
2⇡ 1 2⇡ 2⇡
Now we compute fY :
Z 1 Z 1
1 x2 (x 1
y)2 1 x2 (x y)2
fY (y) = e 2 dx = p2 p e 2 2 dx.
1 2⇡ 2⇡ 1 2⇡
We can complete the square in the exponent of the exponential:
x2 (x y)2
= x2 xy 12 y 2 = (x y/2)2 y 2 /4,
2 2
and we can now compute the integral:
Z 1
1 1 x2 (x y)2
fY (y) = p p e 2 2 dx
2⇡ 1 2⇡
Z 1
1 1 2 2
=p p e (x y/2) y /4 dx
2⇡ 1 2⇡
Z 1
1 y 2 /4 1 2 1 y 2 /4
=p e p e (x y/2) dx = p e .
4⇡ 1 ⇡ 4⇡
2
In the last step we used the fact that p1⇡ e (x y/2) is the pdf of a N (y/2, 1)
distributed random variable. It follows that Y ⇠ N (0, 2).
Thus X ⇠ N (0, 1) and Y ⇠ N (0, 2).
(c) X and Y are not independent, since their joint density function is not the same
as the product of the marginal densities.
Rd
6.37. We want to find fX (x) for which P (c < X < d) = c fX (x)dx for all c < d.
Because the x-coordinate of any point in D is in (a, b), we can assume that a < c <
d < b. In this case
A = {c < X < d} = {(x, y) : c < x < d, 0 < y < h(x)}.
area(A)
Because we chose (X, Y ) uniformly from D, we get P (A) = area(D) . We can
compute the areas by integration:
R d R h(x) Rd
dydx h(x)dx
P (c < X < d) = P (A) = Rcb R 0h(x) = Rcb .
a 0
dydx a
h(x)dx
We can rewrite the last expression as
Z d
h(x)
P (c < X < d) = Rb dx
c
a
h(s)ds
which shows that ( h(x)
Rb
h(s)ds
, if a < x < b
fX (x) = a
0, otherwise.
6.38. The marginal of Y is
Z 1
x(1+y) 1
fY (y) = xe dx = ,
0 (1 + y)2
for y > 0 and zero otherwise (use integration by parts). Hence,
Z 1
y
E[Y ] = dy = 1.
0 (1 + y)2
6.39. F (p, q) is the probability corresponding to the quarter plane {(x, y) : x <
p, y < q}. (Because X, Y are jointly continuous it does not matter whether we
write < or .) Our goal is to get the probability of (X, Y ) being in the rectangle
{(x, y) : a < x < b, c < y < d} using quarter planes probabilities. We start with
the probability F (b, d), this is the probability corresponding to the quarter plane
with corner (b, d). If we subtract F (a, d) + F (b, c) from this then we remove the
probabilities of the quarter planes corresponding to (a, d) and (b, c), and we have
exactly the rectangle (a, b) ⇥ (c, d) left. However, the probability corresponding to
the quarter plane with corner (a, c) was subtracted twice (instead of once), so we
have to add it back. This gives
P (a < X < b, c < Y < d) = F (b, d) F (b, c) F (a, d) + F (a, b).
6.40. First note that the relevant set of values is s 2 [0, 2] since 0 X + Y 2.
The joint density function is positive on the triangle
{(x, y) : 0 < y < 1, y < x < 2 y}.
To calculate the probability that X + Y s, for 0 s 2, we combine the
restriction x + y s with the description of the triangle to find the region of
integration. (A picture could help.)
ZZ Z s/2 ✓ Z s y ◆
P (X + Y s) = f (x, y) dx dy = 3y(2 x) dx dy
0 y
x+ys
Z s/2
= 3
2 s2 y + 3 sy 2 + 6 sy 12 y 2 dy
0
3 2 2
(3 s 12) s3 2 s + 6s s
= + .
24 8
Di↵erentiating to give the density yields
3 1 3
f (s) = s2 s for 0 < s < 2, and zero elsewhere.
4 4
6.41. Let A be the intersection of the ball with radius r centered at the origin and
D. Because r < h, this is just the ‘top’ half of the ball. We need to compute
P ((X, Y, Z) 2 A), and because (X, Y, Z) is chosen uniformly from D this is just the
ratio of volumes of D and A. The volume 2 3
of D is r2 h⇡ while the volume of A is
2 3 3r ⇡ 2r
3 r ⇡, so the probability in question is r 2 h⇡ = 3h .
6.42. Drawing a picture is key to understanding the solution as there are multiple
cases requiring the computation of the areas of relevant regions.
Note that 0 X 2 and 0 Z = X + Y 5. This means that for x < 0 or
z < 0 we have
FX,Z (x, z) = P (X x, Z z) = 0.
If x and z are both nonnegative then we can compute P (X x, Z z) = P (X
x, X + Y z) by integrating the joint density of X, Y on the region Ax,z = {(s, t) :
s x, s + t z}. This is just the area of the intersection of Ax,z and D divided by
the area of D (which is 6). The rest of the solution boils down to identifying the
region Ax,z \ D in various cases and finding the corresponding area.
If 0 x 2 and z is nonnegative then we need to consider four cases:
• If 0 z x then Ax,z \ D is the triangle with vertices (0, 0), (z, 0), (0, z),
2
with area z2 .
• If x < z 3 then Ax,z \ D is a trapezoid with vertices (0, 0), (x, 0), (0, z) and
(x, z x). Its area is x(2z2 x) .
• If 3 < z 3 + x then Ax,z \ D is a pentagon with vertices (0, 0), (x, 0),
2
(x, z x), (z 3, 3) and (0, 3). Its area is 3x (3+x2 z)
• If 3 + x < z then Ax,z \ D is the rectangle with vertices (0, 0), (x, 0), (x, 3)
and (0, 3), with area 3x.
We get the corresponding probabilities by dividing the area of Ax,z \ D with 6.
Thus for 0 x 2 we have
8
>
> 0, if z < 0
>
>
>
>z ,
2
>
> if 0 z x
>
< 12
x(2z x)
FX,Z (x, z) = 12 , if x < z 3
>
>
>
> (3+x z) 2
>
>
x
, if 3 < z 3 + x
>
> 2 12
>
:x
2, if 3 + x < z.
For 2 < x we get P (X x, Z z) = P (X 2, Z z) = FX,Z (2, z). Using the
previous results, in this case we get
8
>
> 0, if z < 0
>
>
>
> z 2
>
> , if 0 z 2
>
< 12
F (x, z) = (z 3 x) , if 2 < z 3
>
>
>
> 2
>
>
> 1 (5 12z) , if 3 < z 5
>
>
:
1, if 5 < z.
6.43. Following the reasoning of Example 6.40,
fT,V (u, v) = fX,Y (u, v) + fX,Y (v, u).
Substituting in the definition of fX,Y gives the answer
( p p
2u2 v + v + 2v 2 u + u if 0 < u < v < 1
fT,V (u, v) =
0 else.
6.44. Drawing a picture of the cone would help with this problem. The joint density
of the uniform distribution in the teepee is
(
1
if (x, y, z) 2 Cone
fX,Y,Z (x, y, z) = vol(Cone)
0 else .
The volume of the cone is ⇡r2 h/3. Thus the joint density is,
(
3
2 if (x, y, z) 2 Cone
fX,Y,Z (x, y, z) = ⇡r h
0 else .
To find the joint density of (X, Y ) we must integrate out the Z variable. To do so,
we switch to cylindrical variables. Let (R̃, ⇥, Z) be the distance from the center of
the teepee, angle, and height where the fly dies. The height that we must integrate
depends where we are on the floor. That is, if we are in the middle of the teepee
R̃ = 0, we must integrate Z from z = 0 to z = h. If we are near the edge of the
teepee, we only integrate a small amount, for example z = 0 to z = ✏. For an
arbitrary radius R̃0 , the height we must integrate to is h0 = (1 R̃r )h.
Then the integral we must compute is
Z (1 r̃r )h
3 3(1 r̃r )
fR̃,⇥ (r, ✓) = 2
dz = .
0 ⇡r h ⇡r2
We can check that this integrates to one. Recall that we are integrating with respect
to cylindrical coordinates and thus
Z Z Z 2⇡ Z r
3(1 r̃r )
fX,Y (x, y) dx dy = r̃ dr̃ d✓
circle 0 0 ⇡r2
Z 2⇡ r2 r3
3( 2 3 ) 3r2 16
= d✓ = (2⇡) = 1.
0 ⇡r2 ⇡r2
Thus, switching back to rectangular coordinates,
p
x2 +y 2
p 3(1 )
r
fX,Y (x, y) = fR,⇥ ( x2 + y 2 , ✓) = 2
⇡r
for x2 + y 2 r2 .
For the marginal in Z, consider the height to be z. Then we must integrate
over the circle with radius r0 = r(1 hz ). Thus, in cylindrical coordinates,
Z 2⇡ Z r(1 z/h)
3
fZ (z) = 2h
r̃ dr̃ d✓
0 0 ⇡r
which yields,
Z 2⇡
3r2 (1 z/h)2 3⇣ z ⌘2
fZ (z) = d✓ = 1 .
0 2⇡r2 h h h
6.45. We first note that
FV (v) = P (V v) = P (max(X, Y ) v) = P (X v, Y v)
= P (X v)P (Y v) = FX (v)FY (v).
Di↵erentiating this we get the p.d.f. of V :
d 0
fV (v) =FV (v) = FX (v)FY (v) = fX (v)FY (v) + FX (v)fY (v).
dv
For the minimum we use
P (T > z) = P (min(X, Y ) > z) = P (X > z, Y > z) = P (X > z)P (Y > z),
then
FT (z) = P (T z) = 1 P (T > z) = 1 P (X > z)P (Y > z)
=1 (1 FX (z))(1 FY (z)),
and
⇥ ⇤0
fT (z) = 1 (1 FX (z))(1 FY (z))
= fX (z)(1 FY (z)) + fY (z)(1 FX (z)).
We computed the probabilities of the events {max(X, Y ) v} and {min(X, Y ) > z}
because these events can be written as intersections to take advantage of indepen-
dence.
6.46. We know from (6.31) and the independence of X and Y that
fT,V (t, v) = fX (t)fY (v) + fX (v)fY (t),
if t < v and zero otherwise. The marginal of T = min(X, Y ) is found by integrating
the v variable:
Z 1 Z 1
fT (t) = fT,V (t, v)dv = fX (t)fY (v) + fX (v)fY (t) dv
1 t
= fX (t)(1 FY (t)) + fY (t)(1 FX (t)).
(b) We can find the density functions by di↵erentiation (using the chain rule):
d d
fZ (z) = FZ (z) = (1 (1 FX (z))n ) = nfX (x)(1 FX (x))n 1
,
dz dz
d d
fW (w) = FW (w) = FX (w)n = nfX (x)FX (x)n 1 .
dw dw
6.48. Let t > 0. We will show that P(Y > t) = e^{-(lambda_1 + ··· + lambda_n) t}. Using the indepen-
dence of the random variables we have

  P(Y > t) = P(min(X_1, X_2, . . . , X_n) > t) = P(X_1 > t, X_2 > t, . . . , X_n > t)
    = prod_{i=1}^{n} P(X_i > t) = prod_{i=1}^{n} e^{-lambda_i t}
    = e^{-(lambda_1 + ··· + lambda_n) t}.

Hence, Y is exponentially distributed with parameter lambda_1 + ··· + lambda_n.
6.49. In the setting of Fact 6.41, let G(x, y) = (min(x, y), max(x, y)) and L =
{(t, v) : t < v}. When x 6= y this function G is two-to-one. Hence we define
two separate regions K1 = {(x, y) : x < y} and K2 = {(x, y) : x > y}, so that
G is one-to-one and onto L from both K1 and K2 . The inverse functions are as
follows: from L onto K1 it is (q1 (t, v), r1 (t, v)) = (t, v) and from L onto K2 it is
(q2 (t, v), r2 (t, v)) = (v, t). Their Jacobians are
1 0 0 1
J1 (t, v) = det = 1 and J2 (t, v) = det = 1.
0 1 1 0
Since the diagonal {(x, y) : x = y} has zero area it was legitimate to drop it from
the first double integral. From the last line we can read o↵ the joint density function
fT,V (t, v) = fX,Y (t, v) + fX,Y (v, t) for t < v.
6.50. (a) Since X ⇠ Gamma(r, ) and Y ⇠ Gamma(s, ) are independent, we have
xr 1 r
xy
s 1 s
y
fX,Y (x, y) = fX (x)fY (y) = e e
(r) (s)
for x > 0, y > 0, and fX,Y (x, y) = 0 otherwise.
In the setting of Fact 6.41, for x, y 2 (0, 1) we are using the change of
variables
x
u = g(x, y) = 2 (0, 1), v = h(x, y) = x + y 2 (0, 1).
x+y
The inverse functions are
q(u, v) = uv 2 (0, 1), r(u, v) = v(1 u) 2 (0, 1).
The relevant Jacobian is
@q @q
@u (u, v) @v (u, v)
v u
J(u, v) = @r @r = = v.
@u (u, v) @v (u, v)
v 1 u
From this we get
fB,G (u, v) = fX (uv)fY (v(1 u))v
r r 1 s
(uv) (v(1 u))s 1
= e uv e (v(1 u))
v
(r) (s)
(r + s) r 1 1
= u (1 u)s 1
· r+s (r+s) 1
v e v
.
(r) (s) (r + s)
for u 2 (0, 1), v 2 (0, 1), and 0 otherwise. We can recognize that this is exactly
the product of a Beta(r, s) probability density (in u) and a Gamma(r + s, )
probability density (in v), hence B ⇠ Beta(r, s), G ⇠ Gamma(r + s, ), and
they are independent.
(b) The transformation described is the inverse of that found in part (a). Therefore,
X and Y are independent with X ⇠ Gamma(r, ) and Y ⇠ Gamma(s, ).
For the detailed solution note that
(r + s) r 1 1
fB,G (b, g) = b (1 b)s 1 · r+s (r+s) 1
g e g
(r) (s) (r + s)
for b 2 (0, 1), g 2 (0, 1) and it is zero otherwise.
We use the change of variables
x = b · g, y = (1 b) · g.
The inverse function is
x
b= , g = x + y.
x+y
The Jacobian is
y x
(x+y)2 (x+y)2 1
J(x, y) = = .
1 1 x+y
From this we get
(r + s) x r 1 x s 1 1 r+s 1
fX,Y (x, y) = ( ) (1 x+y ) · (x + y)(r+s) 1
e (x+y)
(r) (s) x+y (r + s) x+y
xr 1 r ys 1 s
= e x e y
(r) (s)
for x > 0, y > 0 (and zero otherwise). This shows that indeed X and Y are
independent with X ⇠ Gamma(r, ) and Y ⇠ Gamma(s, ).
6.51. (a) Apply the two-variable expectation formula to the function h(x, y) =
g(x). Then
X X
E[g(X)] = E[h(X, Y )] = h(k, `)P (X = k, Y = `) = g(k)P (X = k, Y = `)
k,` k,`
X X X
= g(k) P (X = k, Y = `) = g(k)P (X = k).
k ` k
Proof. One way to prove this is with the infinitesimal method. For " > 0 we have
P (X1 2 (x1 , x1 + "), . . . , Xm 2 (xm , xm + "))
Z x1 +" Z xm +" Z 1 Z 1
= ··· ··· f (y1 , . . . , yn ) dy1 . . . dyn
x1 xm 1 1
✓Z 1 Z 1 ◆
⇡ ··· f (x1 , . . . , xm , ym+1 , . . . , yn ) dym+1 . . . dyn "m .
1 1
We set P (XB = XD = 0) = 0 to make sure that a call comes. a and b are unknowns
that have to satisfy a 0, b 0 and a + b 1, in order for the table to represent
a legitimate joint probability mass function.
(a) The given marginal p.m.f.s force the following solution:
XD
0 1
0 0 0.7
XB
1 0.2 0.1
(b) There is still a solution when P (XD = 1) = 0.7 but no longer when P (XD =
1) = 0.6.
6.56. Pick an x for which P (X = x) > 0. Then,
X X X
0 < P (X = x) = P (X = x, Y = y) = a(x)b(y) = a(x) b(y).
y y y
P
Hence, y b(y) 6= 0 and
P (X = x)
a(x) = P .
y b(y)
7.1. We have
X
P (Z = 3) = P (X + Y = 3) = P (X = k)P (Y = 3 k).
k
Going through the possible values of k for which P(X_1 = k) > 0, and keeping only
the terms for which P(X_2 = 2 - k) is also positive:

  P(X_1 + X_2 = 2) = P(X_1 = -1)P(X_2 = 3) + P(X_1 = 0)P(X_2 = 2) + P(X_1 = 1)P(X_2 = 1)
    + P(X_1 = 2)P(X_2 = 0) + P(X_1 = 3)P(X_2 = -1)
    = 1/64 + 1/64 + 1/16 + 1/64 + 1/64 = 1/8.
7.4. We have f_X(x) = lambda e^{-lambda x} for x > 0 and 0 otherwise, and f_Y(y) = mu e^{-mu y} for
y > 0 and 0 otherwise.
Since X and Y are both positive, X + Y > 0 with probability one, and f_{X+Y}(z) = 0
for z <= 0. For z > 0, using the convolution formula

  f_{X+Y}(z) = int_{-infinity}^{infinity} f_X(x) f_Y(z - x) dx = int_0^z lambda e^{-lambda x} mu e^{-mu(z-x)} dx.

In the second step we used that f_X(x) f_Y(z - x) != 0 if and only if x > 0 and
z - x > 0, which means that 0 < x < z.
Returning to the integral,

  f_{X+Y}(z) = int_0^z lambda e^{-lambda x} mu e^{-mu(z-x)} dx = lambda mu e^{-mu z} int_0^z e^{(mu - lambda)x} dx
    = lambda mu e^{-mu z} [e^{(mu - lambda)x}/(mu - lambda)]_{x=0}^{x=z} = lambda mu e^{-mu z} (e^{(mu - lambda)z} - 1)/(mu - lambda)
    = (lambda mu/(mu - lambda)) (e^{-lambda z} - e^{-mu z}).

Note that we used lambda != mu when we integrated e^{(mu - lambda)x}.
Hence the probability density function of X + Y is

  f_{X+Y}(z) = (lambda mu/(mu - lambda)) (e^{-lambda z} - e^{-mu z}) for z > 0, and 0 otherwise.
7.5. (a) By Fact 7.9 the distribution of W is normal, with

  mu_W = 2 mu_X - 4 mu_Y + mu_Z = -7,   sigma_W^2 = 4 sigma_X^2 + 16 sigma_Y^2 + sigma_Z^2 = 25.

Thus W ~ N(-7, 25).
(b) Using part (a) we know that (W + 7)/sqrt(25) is a standard normal. Thus

  P(W > -2) = P((W + 7)/5 > (-2 + 7)/5) = 1 - Phi(1) ~ 1 - 0.8413 = 0.1587.
7.6. By exchangeability
P (3rd card is a king, 5th card is the ace of spades)
= P (1st card is the ace of spades, 2nd card is king).
The second probability can now be computed by counting favorable outcomes within
the first two picks:
1·4 2
P (1st card is the ace of spades, 2nd card is king) = 52 = .
2
663
7.7. By exchangeability
P (X3 is the second largest) = P (Xi is the second largest)
for any i = 1, 2, 4. Because the Xi are jointly continuous the probability that any
two are equal is zero. Thus
4
X
1= P (Xi is the second largest) = 4P (X3 is the second largest)
i=1
we need to compute P (N ([0, 3]) = 3). But N ([0, 3]) has Poisson distribution with
parameter 3 · 16 = 12 , hence
1
X 1
X
P (X = Y ) = P (X = Y = k) = P (X = k)P (Y = k)
k=1 k=1
X1 1
X k
= p(1 p)k 1
r(1 r)k 1
= pr [(1 p)(1 r)]
k=1 k=0
1 pr
= pr = .
1 (1 p)(1 r) r + p rp
n
X1 n
X1
P (Z = n) = P (X = i)P (Y = n i) = p(1 p)i 1
r(1 r)n i 1
i=1 i=1
n
X1 n
X2
= pr (1 p)i 1
(1 r)n i 1
= pr (1 p)i (1 r)n (i+1) 1
i=1 i=0
X2
n
1 p
i
[(1 p)/(1 r)]n 1
21
= pr(1 r)n 2
= pr(1 r)n
i=0
1 r 1 (1 p)/(1 r)
We also get
7
X 7 ✓
X ◆
k 1
P (Brewers win) = P (X = k) = p4 (1 p)k 4
.
3
k=4 k=4
Evaluating this sum for the various values of p gives the following numerical values:
p 0.40 0.35 0.30
P (Brewers win) 0.2898 0.1998 0.1260
7.17. Let X be the the number of trials needed until we reach k successes, then
X ⇠ Negbin(k, p). The event that the number of successes reaches k before the
number of failures reaches ` is the same as {X < k + `}. Moreover this event is the
same as having at least k successes within the first k + ` 1 trials. Thus
` 1✓
X ◆ X 1 ✓k + ` 1◆
k+`
k+j k
P (X < k + `) = p (1 p)j = pa (1 p)k+` 1 a .
j=0
k 1 a
a=k
7.18. Both X and Y have probability densities that are zero for negative values,
this will hold for X + Y as well. Using the convolution formula, for z 0 we get
Z 1 Z z
fX+Y (z) = fX (x)fY (z x)dx = fX (x)fY (z x)dx
1 0
Z z Z z
= 2e 2x 4(z x)e 2(z x) dx = 8(z x)e 2z dx
0 0
Z z
2z
= 8e (z x)dx = 4z 2 e 2z .
0
Thus (
4z 2 e 2z
, if z 0,
fX+Y (z) =
0, otherwise.
7.19. (a) We need to compute
ZZ Z 1 Z 1
x y
P (Y X 2) = fX (x)fY (y) dx dy = e dydx
y x 2 2 x
Z 1
2x 4
= e dx = 12 e .
2
(b) The density of f Y is given by f Y (y) = fY ( y). Then from the convolution
formula we get
Z 1 Z 1 Z 1
fX Y (z) = fX (t)f Y (z t)dt = fX (t)f Y (z t)dt = fX (t)fY (t z)dt.
1 1 1
Note that fX (t)fY (t z) > 0 if t > 0 and t z > 0, which is the same as
t > max(z, 0). Thus
Z 1 Z 1
1
fX Y (z) = fX (t)fY (t z)dt = e 2t+z dt = e 2 max(z,0)+z .
max(z,0) max(z,0) 2
where we used the fact that $f_X(x) = 0$ outside $(0,1)$. For a given $1 < z < 3$ the function $f_Y(z-x)$ is nonzero if and only if $1 < z - x < 2$, which is equivalent to $z - 2 < x < z - 1$. Since we must have $0 < x < 1$ for $f_X(x)$ to be nonzero, this means that $f_X(x)f_Y(z-x)$ is nonzero only if $\max(0, z-2) < x < \min(1, z-1)$. Thus
$$f_{X+Y}(z) = \int_0^1 f_X(x)f_Y(z-x)\,dx = \int_{\max(0,z-2)}^{\min(1,z-1)} 2x\,dx = \min(1,z-1)^2 - \max(0,z-2)^2.$$
Considering the cases $1 < z \leq 2$ and $2 < z < 3$ separately:
$$f_{X+Y}(z) = \begin{cases} (z-1)^2, & \text{if } 1 < z \leq 2,\\ 1 - (z-2)^2, & \text{if } 2 < z < 3,\\ 0, & \text{otherwise.}\end{cases}$$
7.21. (a) By Fact 7.9 the distribution of W is normal, with
2 2 2
µW = 3µx + 4µY = 10, W =9 X + 16 Y = 59.
Thus W ⇠ N (9, 57).
(b) Using part (a) we know that Wp5710 is a standard normal. Thus
✓ ◆
W 10 15 10
P (W > 15) = P p > p =1 ( p557 ) ⇡ 1 (0.66) ⇡ 0.2578.
57 57
Solutions to Chapter 7 163
7.22. Using Fact 3.61 we have $2X \sim N(2\mu, 4\sigma^2)$. From Fact 7.9, by the independence of X and Y we get $X + Y \sim N(2\mu, 2\sigma^2)$. Since $\sigma^2 > 0$, the two distributions can never be the same.
7.23. By Fact 7.9, $X - Y \sim N(0, 2)$ and thus $\frac{X-Y}{\sqrt2} \sim N(0, 1)$. From this we get
$$P(X > Y + 2) = P\left(\frac{X-Y}{\sqrt2} > \sqrt2\right) = 1 - \Phi(\sqrt2) \approx 1 - \Phi(1.41) \approx 0.0793.$$
7.24. Suppose that the variances of X, Y and Z are $\sigma_X^2$, $\sigma_Y^2$ and $\sigma_Z^2$. Using Fact 7.9 we have that $X + 2Y - 3Z \sim N(0,\ \sigma_X^2 + 4\sigma_Y^2 + 9\sigma_Z^2)$, and $\frac{X+2Y-3Z}{\sqrt{\sigma_X^2+4\sigma_Y^2+9\sigma_Z^2}} \sim N(0,1)$. This gives
$$P(X + 2Y - 3Z > 0) = P\left(\frac{X+2Y-3Z}{\sqrt{\sigma_X^2+4\sigma_Y^2+9\sigma_Z^2}} > 0\right) = 1 - \Phi(0) = \frac12.$$
7.25. We have $f_X(x) = 1$ for $0 < x < 1$ and zero otherwise. For Y we have $f_Y(y) = \frac12$ for $8 < y < 10$ and zero otherwise. Note that $8 < X + Y < 11$. The density of X + Y is given by
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(t)f_Y(z-t)\,dt.$$
The product $f_X(t)f_Y(z-t)$ is $\frac12$ if $0 < t < 1$ and $8 < z - t < 10$, and zero otherwise. The second inequality is equivalent to $z - 10 < t < z - 8$. The solution of the inequality system is $\max(0, z-10) < t < \min(1, z-8)$. Hence, for $8 < z < 11$ we have
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(t)f_Y(z-t)\,dt = \tfrac12\big(\min(1, z-8) - \max(0, z-10)\big).$$
Evaluating the formula on $(8,9)$, $[9,10)$ and $[10,11)$ we get the following case-defined function:
$$f_{X+Y}(z) = \begin{cases} \frac{z-8}{2}, & 8 < z < 9,\\[2pt] \frac12, & 9 \leq z < 10,\\[2pt] \frac{11-z}{2}, & 10 \leq z < 11,\\[2pt] 0, & \text{otherwise.} \end{cases}$$
7.26. The probability density functions of X and Y are
$$f_X(x) = \begin{cases} \frac12, & \text{if } 1 < x < 3,\\ 0, & \text{otherwise,}\end{cases} \qquad f_Y(y) = \begin{cases} 1, & \text{if } 9 < y < 10,\\ 0, & \text{otherwise.}\end{cases}$$
Since $1 \leq X \leq 3$ and $9 \leq Y \leq 10$ we must have $10 \leq X + Y \leq 13$. For a $z \in [10, 13]$ the convolution formula gives
$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(x)f_Y(z-x)\,dx = \int_1^3 f_X(x)f_Y(z-x)\,dx.$$
Thus
$$f_{X+Y}(z) = \int_1^3 f_X(x)f_Y(z-x)\,dx = \int_{\max(1,z-10)}^{\min(3,z-9)} \tfrac12\,dx = \tfrac12\big(\min(3, z-9) - \max(1, z-10)\big).$$
Evaluating these expressions for $10 \leq z < 11$, $11 \leq z < 12$ and $12 \leq z < 13$ we get the following case-defined function:
$$f_{X+Y}(z) = \begin{cases} \frac12(z-10), & \text{if } 10 \leq z < 11,\\[2pt] \frac12, & \text{if } 11 \leq z < 12,\\[2pt] \frac12(13-z), & \text{if } 12 \leq z < 13,\\[2pt] 0, & \text{otherwise.}\end{cases}$$
7.27. Using the convolution formula:
Z 1
fX+Y (t) = f (s)fY (t s)ds.
1
7.28. Because X1 , X2 , X3 are jointly continuous, the probability that any two of
them are equal is 0. This means that P (X1 , X2 , X3 are all di↵erent) = 1. By the
exchangeability of X1 , X2 , X3 we have
where we listed all six possible orderings of X1 , X2 , X3 . Since the sum of the six
probabilities is P (X1 , X2 , X3 are all di↵erent), we get that P (X1 < X2 < X3 ) = 61 .
7.29. By exchangeability, each Xi , 1 i 100 has the same probability to be the
50th largest. Since the Xi are jointly continuous, the probability of any two being
equal is 0. Hence
100
X
1= P (Xi is the 50th largest number) = 100P (X20 is the 50th largest number)
i=1
1
and the probability in question must be 100 .
(b) Again, by exchangeability and counting the favorable outcomes within the first
two picks:
13
2 1
P (1st card is , 5th card is ) = P (1st card is , 2nd card is ) = 52 = .
2
17
(c) Using the same arguments:
P (2nd card is K, last two cards are aces)
P (2nd card is K|last two cards are aces) =
P (last two cards are aces)
P (3rd card is K, first two cards are aces)
=
P (first two cards are aces)
= P (3rd card is K|first two cards are aces)
4 2
= = .
50 25
The final probability comes either from counting favorable outcomes for the first
three picks, or by noting that if we choose two aces for the first two picks then we
always have 50 cards left with 4 of them being kings.
7.31. By exchangeability the probability that the 3rd, 10th and 23rd picks are
of di↵erent colors is the same as the probability of the first three picks being of
di↵erent color. For this event the order of the first three picks does not matter, so
we can assume that we choose the three balls without order, and we just need the
probability that these are of di↵erent colors. Thus the probability is
20 · 10 · 15 100
P (we choose one of each color) = 45 = .
3
473
7.32. Denote by Xk the numerical value of the kth pick. By exchangeability of
X1 , . . . , X23 we get
P (X9 5, X14 5, X21 5) = P (X1 5, X2 5, X3 5).
(53) 10
The probability that the first three picks are from {1, 2, 3, 4, 5} is = 1771 .
(23
3)
among the 5 cards. (48 is the total number of non-ace cards, 5 k is the number
of non-ace cards among the 5.)
Thus
48
5 (a1 +···+a4 )
P (X1 = a1 , X2 = a2 , X3 = a3 , X4 = a4 ) = 52
5
if a1 , a2 , a3 , a4 2 {0, 1}. But this is a symmetric function of a1 , a2 , a3 , a4 (as the sum
does not change when we permute these numbers), which shows that the random
variables X1 , X2 , X3 , X4 are indeed exchangeable.
7.35. By exchangeability, it is enough to compute the probability that the values of
first three picks are increasing. By using exchangeability again, any of the possible
3! = 6 order for the first three picks are equally likely. Hence the probability in
question is 16 .
7.36. (a) The waiting times between replacements are independent exponentials
with parameter 1/2 (with years as the time units). This means that the replace-
ments form a Poisson process with parameter 1/2. Then the number of replacements
within the next year is Poisson distributed with parameter 1/2, and hence
P (have to replace a light bulb during the year)
1/2
=1 P (no replacements within the year) = 1 e .
(b) The number of points in two non-overlapping intervals are independent for a
Poisson process. Thus the conditional probability is the same as the unconditional
one, and using the same approach as in part (b) we get
(1/2)2 1/2 e 1/2
P (two replacements in the year) = e = .
2! 8
7.37. The joint probability mass function of g(X1 ), g(X2 ), g(X3 ) can be expressed
in terms of the joint probability mass function p(x1 , x2 , x3 ) of X1 , X2 , X3 :
X
P (g(X1 ) = a1 , g(X2 ) = a2 , g(X3 ) = a3 ) = p(x1 , x2 , x3 ).
b1 :g(b1 )=a1
b2 :g(b2 )=a2
b3 :g(b3 )=a3
8.1. From the information given and properties of the random variables we deduce
$$EX = \frac1p, \qquad E(X^2) = \frac{2-p}{p^2}, \qquad EY = nr, \qquad E(Y^2) = n(n-1)r^2 + nr.$$
(a) By linearity of expectation, $E[X+Y] = EX + EY = \frac1p + nr$.
(b) We cannot calculate $E[XY]$ without knowing something about the joint distribution of (X, Y). But no such information is given.
(c) By linearity of expectation, $E[X^2 + Y^2] = E[X^2] + E[Y^2] = \frac{2-p}{p^2} + n(n-1)r^2 + nr$.
(d) $E[(X+Y)^2] = E[X^2 + 2XY + Y^2] = E[X^2] + 2E[XY] + E[Y^2]$. Again we would need $E[XY]$, which we cannot calculate.
8.2. Let Xk be the number showing on the k-sided die. We need E[X4 + X6 + X12 ].
By linearity of expectation
This gives
4 + 1 6 + 1 12 + 1 25
E[X4 + X6 + X12 ] = + + = .
2 2 2 2
8.3. Introduce indicator variables XB , XC , XD so that X = XB + XC + XD , by
defining XB = 1 if Ben calls and zero otherwise, and similarly for XC and XD . Then
E[X] = E[XB + XC + XD ] = E[XB ] + E[XC ] + E[XD ] = 0.3 + 0.4 + 0.7 = 1.4.
167
168 Solutions to Chapter 8
8.4. Let $I_k$ be the indicator of the event that the number 4 is showing on the k-sided die. Then $Z = I_4 + I_6 + I_{12}$. For each $k \geq 4$ we have
$$E[I_k] = P(\text{the number 4 is showing on the } k\text{-sided die}) = \frac1k.$$
Hence, by linearity of expectation
$$E[Z] = E[I_4] + E[I_6] + E[I_{12}] = \frac14 + \frac16 + \frac1{12} = \frac12.$$
8.5. We have E[X] = p1 = 3 and E[Y ] = = 4 from the given distributions. The
perimeter of the rectangle is given by 2(X + Y + 1) and the area is X(Y + 1). The
expectation of the perimeter is
E[2(X + Y + 1)] = E[2X + 2Y + 2] = 2E[X] + 2E[Y ] + 2 = 2 · 3 + 2 · 4 + 2 = 16,
where we used the linearity of expectation.
The expectation of the area is
E[X(Y + 1)] = E[XY + X] = E[XY ] + E[X] = E[X]E[Y ] + E[X] = 3 · 4 + 3 = 15.
We used the linearity of expectation, and also that because of the independence of
X and Y we have E[XY ] = E[X]E[Y ].
8.6. The answer to parts (a) and (c) do not change. However, we can now com-
pute E[XY ] and E[(X + Y )2 ] using the additional information that X and Y are
independent. Using the facts from the solution of Exercise 8.1 about the first and
second moments of X and Y , and the independence of these random variables we
get
1 nr
E[XY ] = E[X]E[Y ] = · nr = ,
p p
and
E[(X + Y )2 ] = E[X 2 + 2XY + Y 2 ] = E[X 2 ] + 2E[XY ] + E[Y 2 ]
2 p 2nr
= + + n(n 1)r2 + nr.
p2 p
8.7. The mean of X is given by the solution of Exercise 8.3. As in the solution of
Exercise 8.3, introduce indicators so that X = XB + XC + XD . Using the assumed
independence,
Var(X) = Var(XB + XC + XD ) = Var(XB ) + Var(XC ) + Var(XD )
= 0.3 · 0.7 + 0.4 · 0.6 + 0.7 · 0.3 = 0.66.
8.8. Let X be the arrival time of the plumber and T the time needed to complete the project. Then $X \sim \operatorname{Unif}[1,7]$ and $T \sim \operatorname{Exp}(2)$ (with hours as units), and these are independent. The parameter of the exponential comes from the fact that an $\operatorname{Exp}(\lambda)$ distributed random variable has expectation $1/\lambda$.
We need to compute $E[X+T]$ and $\operatorname{Var}(X+T)$. Using the distributions of X and T we get
$$E[X] = \frac{1+7}{2} = 4, \quad \operatorname{Var}(X) = \frac{6^2}{12} = 3, \quad E[T] = \frac12, \quad \operatorname{Var}(T) = \frac1{2^2} = \frac14.$$
By linearity we get
$$E[X+T] = E[X] + E[T] = 4 + \frac12 = \frac92.$$
From the independence
$$\operatorname{Var}(X+T) = \operatorname{Var}(X) + \operatorname{Var}(T) = 3 + \frac14 = \frac{13}4.$$
8.9. (a) We have
E[3X 2Y + 7] = 3E[X] 2E[Y ] + 7 = 3 · 3 2 · 5 + 7 = 6,
where we used the linearity of expectation.
(b) Using the independence of X and Y :
Var(3X 2Y + 7) = 9 · Var(X) + 4 · Var(Y ) = 92 + 43 = 30.
(c) From the definition of the variance
Var(XY ) = E[(XY )2 ] E[XY ]2 .
By independence we have E[XY ] = E[X]E[Y ] and E[(XY )2 ] = E[X 2 ]E[Y 2 ], thus
Var(XY ) = E[X 2 ]E[Y 2 ] E[X]2 E[Y ]2
= E[X 2 ]E[Y 2 ] 925 = E[X 2 ]E[Y 2 ] 225,
To compute the second moments we use the variance:
2 = Var(X) = E[X 2 ] E[X]2 = E[X 2 ] 9
hence E[X 2 ] = 9 + 2 = 11. Similarly, E[Y 2 ] = E[Y ]2 + Var(Y ) = 25 + 3 = 28. Thus
Var(XY ) = 11 · 28 225 = 83.
8.10. The moment generating function of $X_1$ is given by
$$M_{X_1}(t) = E[e^{tX_1}] = \sum_k e^{tk}\, P(X_1 = k) = \frac12 + \frac13 e^{t} + \frac16 e^{2t}.$$
The moment generating function of $X_2$ is the same. Since $X_1$ and $X_2$ are independent, we can compute the moment generating function of $S = X_1 + X_2$ as follows:
$$M_S(t) = M_{X_1}(t)M_{X_2}(t) = \left(\frac12 + \frac13 e^t + \frac16 e^{2t}\right)^2.$$
Expanding the square we get
$$M_S(t) = \frac14 + \frac13 e^t + \frac{5}{18} e^{2t} + \frac19 e^{3t} + \frac1{36} e^{4t}.$$
We can read off the probability mass function of S from this by identifying the coefficients of the exponential terms:
$$P(S=0) = \tfrac14,\quad P(S=1) = \tfrac13,\quad P(S=2) = \tfrac5{18},\quad P(S=3) = \tfrac19,\quad P(S=4) = \tfrac1{36}.$$
then $M_X(t) = \frac12 e^{-t} + \frac25 + \frac1{10} e^{t/2}$. Now take independent random variables $X_1, \dots, X_{36}$ with the same distribution as X. By independence, the sum $X_1 + \cdots + X_{36}$ has a moment generating function which is the product of the individual moment generating functions, which is exactly $\left(\frac12 e^{-t} + \frac25 + \frac1{10} e^{t/2}\right)^{36} = M_Z(t)$. Hence Z has the same distribution as $X_1 + \cdots + X_{36}$.
8.14. We need to compute $E[X], E[Y], E[X^2], E[Y^2], E[XY]$. All of these can be computed using the joint probability mass function given in the table. For example,
$$E[X] = 1\cdot\big(\tfrac1{15}+\tfrac1{15}+\tfrac2{15}+\tfrac1{15}\big) + 2\cdot\big(\tfrac1{10}+\tfrac1{10}+\tfrac15+\tfrac1{10}\big) + 3\cdot\big(\tfrac1{30}+\tfrac1{30}+0+\tfrac1{10}\big) = \frac{11}6$$
and
$$E[XY] = 1\cdot0\cdot\tfrac1{15} + 1\cdot1\cdot\tfrac1{15} + 1\cdot2\cdot\tfrac2{15} + 1\cdot3\cdot\tfrac1{15} + 2\cdot0\cdot\tfrac1{10} + 2\cdot1\cdot\tfrac1{10} + 2\cdot2\cdot\tfrac15 + 2\cdot3\cdot\tfrac1{10} + 3\cdot0\cdot\tfrac1{30} + 3\cdot1\cdot\tfrac1{30} + 3\cdot3\cdot\tfrac1{10} = \frac{47}{15}.$$
Similarly,
$$E[Y] = \frac53, \qquad E[X^2] = \frac{23}6, \qquad E[Y^2] = \frac{59}{15}.$$
Then
$$\operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = \frac{47}{15} - \frac{11}6\cdot\frac53 = \frac7{90}.$$
For the correlation we first compute the variances:
$$\operatorname{Var}(X) = E[X^2] - (E[X])^2 = \frac{23}6 - \left(\frac{11}6\right)^2 = \frac{17}{36}, \qquad \operatorname{Var}(Y) = E[Y^2] - (E[Y])^2 = \frac{59}{15} - \left(\frac53\right)^2 = \frac{52}{45}.$$
From this we have
$$\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac7{2\sqrt{1105}} \approx 0.1053.$$
8.15. We first compute the joint probability density of (X, Y). The quadrilateral D is composed of a unit square and a triangle which is half of the unit square, thus the area of D is $\frac32$. Thus the joint density function is
$$f_{X,Y}(x,y) = \tfrac23\,\mathbf{1}_{\{(x,y)\in D\}}.$$
To calculate the covariance we need to calculate $E[XY]$, $E[X]$, $E[Y]$. We have
$$E[XY] = \int_0^1\int_0^{2-y} \tfrac23\, xy\,dx\,dy = \int_0^1 \tfrac{2}{6}\, y(2-y)^2\,dy = \tfrac26\left(\tfrac42 y^2 - \tfrac43 y^3 + \tfrac14 y^4\right)\bigg|_0^1 = \tfrac26\cdot\tfrac{11}{12} = \tfrac{11}{36},$$
$$E[X] = \int_0^1\int_0^{2-y} \tfrac23\, x\,dx\,dy = \int_0^1 \tfrac{2}{6}(2-y)^2\,dy = \tfrac26\left(4y - 2y^2 + \tfrac13 y^3\right)\bigg|_0^1 = \tfrac26\cdot\tfrac73 = \tfrac79,$$
$$E[Y] = \int_0^1\int_0^{2-y} \tfrac23\, y\,dx\,dy = \int_0^1 \tfrac23 (2-y)y\,dy = \tfrac23\left(y^2 - \tfrac13 y^3\right)\bigg|_0^1 = \tfrac23\cdot\tfrac23 = \tfrac49.$$
Then $\operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = \frac{11}{36} - \frac79\cdot\frac49 = -\frac{13}{324}$.
2 2 2⇡
1 u2 1 v2
=p e 2 p e 2.
2⇡ 2⇡
The final result shows that U and V are independent standard normals.
8.19. This is the same problem as Exercise 6.15.
8.20. By linearity, E[X3 + X10 + X22 ] = E[X3 ] + E[X10 ] + E[X22 ]. The random
variables X1 , . . . , X30 are exchangeable, thus E[Xk ] = E[X1 ] for all 1 k 30.
This gives
E[X3 + X10 + X22 ] = 3E[X1 ].
The value of the first pick is equally likely to be any of the first 30 positive integers,
hence
X30
1 30 · 31 31
E[X1 ] = k = = ,
30 2 · 30 2
k=1
and
93
E[X3 + X10 + X22 ] = 3E[X1 ] = .
2
8.21. Label the coins from 1 to 10, for example so that coins 1-5 are the dimes, coins
6-8 are the quarters, and coins 9-10 are the pennies. Let ak be the value of coin k
and let Ik be the indicator variable that is 1 if coin k is chosen, for k = 1, . . . , 10.
Then
$$X = \sum_{k=1}^{10} a_k I_k = 10(I_1+\cdots+I_5) + 25(I_6+I_7+I_8) + I_9 + I_{10}.$$
The probability that any particular coin is chosen is
$$E(I_k) = P(\text{coin } k \text{ chosen}) = \frac{\binom92}{\binom{10}3} = \frac3{10}.$$
Hence
$$EX = \sum_{k=1}^{10} a_k E(I_k) = 10\cdot5\cdot\tfrac3{10} + 25\cdot3\cdot\tfrac3{10} + 2\cdot\tfrac3{10} = 38.1 \text{ (cents)}.$$
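A simulation of the same experiment gives an easy sanity check on the value 38.1. This sketch is an illustrative addition (the helper name and the number of repetitions are arbitrary):

```python
import random

def sampled_value():
    # 5 dimes (10c), 3 quarters (25c), 2 pennies (1c); choose 3 without replacement
    coins = [10] * 5 + [25] * 3 + [1] * 2
    return sum(random.sample(coins, 3))

n = 200_000
print(sum(sampled_value() for _ in range(n)) / n)   # close to 38.1
```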
8.22. There are several ways to approach this problem. One possibility that gives
the answer without doing complicated computations is as follows. For each 1 j
89 let Ij be the indicator of the event that both j and j + 1 are chosen among the
P89
five numbers. Then X = j=1 Ij , since if j and j + 1 are both chosen then they
will be next to each other in the ordered sample. By linearity
89
X 89
X
E[X] = E[ Ij ] = E[Ij ].
j=1 j=1
Because we draw with replacement, the colors of different picks are independent:
$$E[I_j] = P(j\text{th ball is green and the }(j+1)\text{st ball is yellow}) = P(j\text{th ball is green})\,P((j+1)\text{st ball is yellow}) = \frac49\cdot\frac39 = \frac4{27}.$$
This gives
$$E[X_n] = \sum_{j=1}^{n-1}\frac4{27} = \frac{4(n-1)}{27}.$$
(b) We will see a different (maybe more straightforward) technique in Chapter 10, but here we will give a solution using the indicator method. Let $J_k$ denote the indicator that the kth ball is green and there are no white balls among the first $k-1$. Then $Y = \sum_{k=1}^{\infty} J_k$. (In the sum a term is equal to 1 if the corresponding ball is green and came before the first white.) Using linearity
$$E[Y] = E\Big[\sum_{k=1}^{\infty} J_k\Big] = \sum_{k=1}^{\infty} E[J_k] = \sum_{k=1}^{\infty} P(k\text{th ball is green, no white balls among the first } k-1).$$
(We can exchange the expectation and the infinite sum here as each term is nonnegative.) Using independence we can compute the probability in question for each k:
$$P(k\text{th ball is green, no white balls among the first } k-1) = P(k\text{th ball is green})\,P(\text{first } k-1 \text{ balls are all green or yellow}) = \frac49\cdot\left(\frac79\right)^{k-1}.$$
This gives
$$E[Y] = \sum_{k=1}^{\infty}\frac49\left(\frac79\right)^{k-1} = \frac49\cdot\frac{1}{1-\frac79} = 2.$$
Here is an intuitive explanation for the result that we got. The yellow draws are irrelevant in this problem: the only thing that matters is the position of the first white, and the number of green choices before that. Imagine that we remove the yellow balls from the urn, and we repeat the same experiment (sampling with replacement), stopping at the first white ball. Then the number of picks is a geometric random variable with parameter $\frac26 = \frac13$. The expectation of this geometric random variable is 3. Moreover, the total number of picks is equal to the number of green balls chosen before the first white plus 1 (the first white). This explains why the expectation of Y is $3 - 1 = 2$.
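A brief simulation of the urn (an illustrative addition, assuming the composition 4 green, 3 yellow, 2 white used above; the helper name is made up) confirms the value 2:

```python
import random

def greens_before_first_white():
    count = 0
    while True:
        ball = random.choices(["green", "yellow", "white"], weights=[4, 3, 2])[0]
        if ball == "white":
            return count
        if ball == "green":
            count += 1

n = 100_000
print(sum(greens_before_first_white() for _ in range(n)) / n)   # close to 2
```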
8.27. For 1 i < j n let Ii,j be the indicator of the event P that ai = aj . We need
to compute the expected value of the random variable X = i<j Ii,j . By linearity
P
E[X] = i<j E[Ii,j ]. Using the exchangeability of the sample (a1 , . . . , an ) we get
for all i < j that E[Ii,j ] = E[I1,2 ] = P (a1 = a2 ). Counting favorable outcomes (or
by conditioning on the first pick) we get P (a1 = a2 ) = n1 . This gives
X ✓ ◆ ✓ ◆
n n 1 n 1
E[X] = E[Ii,j ] = P (a1 = a2 ) = · = .
i<j
2 2 n 2
8.28. Imagine that we take the sample with order and for each 1 k 10 let
Ik be the indicator that we got a yellow marble for the kth pick, and Jk be the
Solutions to Chapter 8 177
P10 P10
indicator that we got a green pick. Then X = k=1 Ik , Y = k=1 Jk and X Y =
P10
k=1 (Ik Jk ). Using the linearity of expectation we get
10
X 10
X
E[X Y ] = E[ (Ik Jk )] = (E[Ik ] E[Jk ]).
k=1 k=1
Counting favorable outcomes (noting that there are 4 · 9 = 36 number cards in the
deck) gives
36
3 21
P (the first three cards flipped are number cards) = 52 =
3
65
and
21 210
E[X] = 50 ·
= .
65 13
8.30. Let Xk be the number of the kth chosen ball and let Ik be the indicator of
the event that Xk > Xk 1 . Then
N = I2 + I3 + · · · + I20 ,
and using linearity and exchangeability
20
X 20
X
E[N ] = E[ Ik ] = E[Ik ] = 19E[I2 ].
k=2 k=2
We also have
E[I2 ] = P (X1 < X2 ) = P (first number is smaller than the second).
One could compute the probability P (X1 < X2 ) by counting favorable outcomes
for the first two picks. Another way is to notice that
1 = P (X1 < X2 ) + P (X1 > X2 ) + P (X1 = X2 ) = 2P (X1 < X2 ) + P (X1 = X2 ),
178 Solutions to Chapter 8
8.33. (a) For each 1 a 10 let Ia be the indicator of the event that the ath
player won exactly 2 matches. Then we need
10
X 10
X
E[ Ik ] = P (the ath player won exactly 2 matches).
k=1 k=1
By exchangeability the probability is the same for each a. Since the outcomes of
the matches are independent and a player plays 9 matches, we have
✓ ◆
9
P (the first player won exactly 2 matches) = 2 9.
2
Thus the expectation is 10 · 92 2 9 = 45 64 .
(b) For each 1 a < b < c 10 let Ja,b,c P be the indicator
Pthat the players numbered
a, b and c form a 3-cycle. We need E[ a<b<c Ja,b,c ] = a<b<c E[Ja,b,c ]. There are
10
3 such triples, and the expectation is the same for each one, so it is enough to
find
E[J1,2,3 ] = P (Players 1, 2 and 3 form a 3-cycle).
Players 1, 2 and 3 form a 3-cycle if 1 beats 2, 2 beats 3, 3 beats 1 (this has probability
1/8) or if 1 beats 3, 3 beats 2 and 2 beats 1 (this also has probability 1/8). Thus
E[J1,2,3 ] = 1/8 + 1/8 = 14 , and the expectation in question is 10 1
3 4 = 30.
(c) We use the indicator method again. For each possible sequence of di↵erent
players a1 , a2 , . . . , ak we set up an indicator that this sequence is a k-path. The
number of such indicators is 10 10!
k · k! = (10 k)! (we choose the k players, then
their order). The probability that a given indicator is 1 is the probability that a1
beats a2 , a2 beats a3 , . . . , ak 1 beats ak which is 2 (k 1) . Thus the expectation is
10! 1 k 1
(10 k)! ( 2 ) .
8.34. We show the proof for n = 2, the general case can be done similarly. Assume
that the joint probability density function of X1 , X2 is f (x1 , x2 ). Then
Z 1Z 1
E[g1 (X1 ) + g2 (X2 )] = (g1 (x1 ) + g2 (x2 ))f (x1 , x2 )dx1 dx2 .
1 1
Using the linearity of the integral we can write this as
Z 1Z 1 Z 1Z 1
g1 (x1 )f (x1 , x2 )dx1 dx2 + g2 (x2 )f (x1 , x2 )dx1 dx2 .
1 1 1 1
Integrating out x2 in the first integral gives
Z 1Z 1 Z 1 ✓Z 1 ◆
g1 (x1 )f (x1 , x2 )dx1 dx2 = g1 (x1 ) f (x1 , x2 )dx2 dx1 .
1 1 1 1
R1
Note that 1 f (x1 , x2 )dx2 is equal to fX1 (x1 ), the marginal probability density
of X1 . Hence
Z 1 ✓Z 1 ◆ Z 1
g1 (x1 ) f (x1 , x2 )dx2 dx1 = g1 (x1 )fX1 (x1 )dx1 = E[g1 (X1 )].
1 1 1
Similar computation shows that
Z 1Z 1
g2 (x2 )f (x1 , x2 )dx1 dx2 = E[g2 (X2 )].
1 1
Thus E[g1 (X1 ) + g2 (X2 )] = E[g1 (X1 )] + E[g2 (X2 )].
8.35. (a) We may assume that the choices we made each day are independent. Let $J_k$ be the indicator for the event that sweater k is worn at least once in the 5 days. Then $X = J_1+J_2+J_3+J_4$. By linearity and exchangeability
$$E[X] = E[J_1+J_2+J_3+J_4] = \sum_{k=1}^4 E[J_k] = 4E[J_1] = 4\,P(\text{the first sweater was worn at least once}).$$
Considering the complement of the event in the last line:
$$P(\text{the first sweater was worn at least once}) = 1 - P(\text{the first sweater was not worn at all}) = 1 - \left(\frac34\right)^5,$$
where we used the independence assumption. This gives
$$E[X] = 4\left(1 - \left(\tfrac34\right)^5\right) = \frac{781}{256}.$$
(b) We use the notation introduced in part (a). For the variance of X we need $E[X^2]$. Using linearity and exchangeability:
$$E[X^2] = E[(J_1+J_2+J_3+J_4)^2] = E\Big[\sum_{k=1}^4 J_k^2 + 2\sum_{k<\ell} J_kJ_\ell\Big] = 4E[J_1^2] + 2\binom42 E[J_1J_2] = 4E[J_1^2] + 12E[J_1J_2].$$
Since $J_1$ is one or zero, we have $J_1^2 = J_1$ and by part (a)
$$4E[J_1^2] = 4E[J_1] = E[X] = \frac{781}{256}.$$
We also have
$$E[J_1J_2] = P(\text{both the first and second sweater were worn at least once}).$$
Let $A_k$ denote the event that the kth sweater was not worn at all during the week. Then
$$P(\text{both the first and second sweater were worn at least once}) = P(A_1^cA_2^c) = 1 - P\big((A_1^cA_2^c)^c\big) = 1 - P(A_1\cup A_2) = 1 - \big(P(A_1)+P(A_2)-P(A_1A_2)\big).$$
From part (a) we get $P(A_1) = P(A_2) = \left(\frac34\right)^5$, and similarly
$$P(A_1A_2) = P(\text{neither the first nor the second sweater was worn}) = \left(\tfrac24\right)^5.$$
Thus
$$E[J_1J_2] = 1 - P(A_1) - P(A_2) + P(A_1A_2) = 1 - 2\left(\tfrac34\right)^5 + \left(\tfrac24\right)^5$$
and
$$E[X^2] = \frac{781}{256} + 12\left(1 - 2\left(\tfrac34\right)^5 + \left(\tfrac24\right)^5\right) = \frac{2491}{256}.$$
Finally,
$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac{2491}{256} - \left(\frac{781}{256}\right)^2 \approx 0.4232.$$
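Both moments are easy to confirm by simulating the five independent daily choices (an illustrative addition; the helper name and sample size are arbitrary):

```python
import random
import statistics

def distinct_sweaters():
    # one of 4 sweaters chosen uniformly at random on each of 5 days
    return len({random.randrange(4) for _ in range(5)})

samples = [distinct_sweaters() for _ in range(200_000)]
print(statistics.mean(samples))       # about 781/256 = 3.0508
print(statistics.pvariance(samples))  # about 0.4232
```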
8.36. (a) Let $I_k$ be the indicator of the event that the number k appears at least once among the four die rolls. Then $X = I_1 + \cdots + I_6$ and we get
$$E[X] = E[I_1 + \cdots + I_6] = E[I_1] + \cdots + E[I_6] = 6E[I_1],$$
where the last step comes from exchangeability. We have
$$E[I_1] = P(\text{the number 1 shows up}) = 1 - P(\text{none of the rolls are equal to 1}) = 1 - \left(\tfrac56\right)^4,$$
which gives
$$E[X] = 6\left(1 - \left(\tfrac56\right)^4\right).$$
(b) We need to compute the second moment of X. Using the notation of part (a):
$$E[X^2] = E[(I_1+\cdots+I_6)^2] = E\Big[\sum_{k=1}^6 I_k^2 + 2\sum_{j<k}I_jI_k\Big] = \sum_{k=1}^6E[I_k^2] + 2\sum_{j<k}E[I_jI_k] = 6E[I_1] + 30E[I_1I_2].$$
By inclusion-exclusion, as in part (a),
$$E[I_1I_2] = P(\text{both 1 and 2 show up}) = 1 - 2\left(\tfrac56\right)^4 + \left(\tfrac46\right)^4.$$
Collecting everything:
$$E[X^2] = 6\left(1 - \left(\tfrac56\right)^4\right) + 30\left(1 + \left(\tfrac23\right)^4 - 2\left(\tfrac56\right)^4\right)$$
and
$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = 6\left(1-\left(\tfrac56\right)^4\right) + 30\left(1+\left(\tfrac23\right)^4 - 2\left(\tfrac56\right)^4\right) - 36\left(1-\left(\tfrac56\right)^4\right)^2 \approx 0.447.$$
8.37. (a) Let Jk be the indicator for the event that the toy k is in at least one of
the 4 boxes. Then X = J1 + J2 + · · · + J10 . By linearity and exchangeability
10
X 10
X
E[X] = E[ Jk ] = E[Jk ] = 10E[J1 ]
k=1 k=1
= 10P (the first toy was in one of the boxes).
Let Ak be the event that the kth toy was not in any of the four boxes. Then
E[X] = 10P (Ac1 ) = 10(1 P (A1 )).
We may assume that the toys in the boxes are chosen independently of each other,
and hence
✓ 9 ◆4
4 ( 2)
P (A1 ) = P (first box does not contain the first toy) = = ( 45 )4
(10
2)
and ⇣ ⌘
738 4 4
E[X] = 10 1 . 5 =
125
(b) We need E[X 2 ] which can be expressed using the introduced indicators as
10
X 10
X X
E[X 2 ] = E[( Jk )2 ] = E[ Jk2 + 2 Jj Jk ]
k=1 k=1 j<k
10
X X
= E[Jk2 ] + 2 E[Jj Jk ]
k=1 j<k
◆ ✓
10
= 10E[J12 ]
+2 E[J1 J2 ]
2
= 10E[J1 ] + 90E[J1 J2 ].
738
We used linearity, exchangeability and J1 = J12 . Note that 10E[J1 ] = E[X] = 125
by part (a). Recalling the definition of Ak from part (a) we get
E[J1 J2 ] = P (Ac1 Ac2 ).
By taking complements,
P (Ac1 Ac2 ) = 1 P ((Ac1 Ac2 )c ) = 1 P (A1 [ A2 ) = 1 (P (A1 ) + P (A2 ) P (A1 A2 )).
As we have seen in part (a):
✓ ◆4
(92)
P (A1 ) = P (A2 ) = = ( 45 )4 ,
(10
2)
This gives
E[J1 J2 ] = 1 2( 45 )4 + ( 28
45 )
4
and
738
E[X 2 ] = + 90 1 2( 45 )4 + ( 28
45 )
4
,
125
Solutions to Chapter 8 183
which leads to
62 (1 + 1
4 + 1
9 + 1
16 + 1
25 ) 6(1 + 1
2 + 1
3 + 1
4 + 15 ) = 38.99.
8.39. Let $J_i = 1$ if a boy is chosen with the ith selection, and zero otherwise. Note that $E[J_i] = P(J_i = 1) = 17/40$. Then $X = \sum_{i=1}^{15} J_i$, and using linearity and exchangeability
$$E[X] = \sum_{i=1}^{15} P(J_i = 1) = 15\cdot\frac{17}{40} = \frac{51}{8}.$$
Using the formula for the variance of a sum (together with exchangeability) gives
$$\operatorname{Var}(X) = \operatorname{Var}\Big(\sum_{i=1}^{15}J_i\Big) = \sum_{i=1}^{15}\operatorname{Var}(J_i) + 2\sum_{i<k}\operatorname{Cov}(J_i,J_k) = 15\operatorname{Var}(J_1) + 15\cdot14\,\operatorname{Cov}(J_1,J_2).$$
To find $E[J_1J_2]$ note that $J_1J_2 = 1$ only if a boy is chosen on both of the first two selections, and zero otherwise. Thus, by counting favorable outcomes we get
$$E[J_1J_2] = \frac{\binom{17}2}{\binom{40}2} = \frac{34}{195}.$$
Collecting everything:
$$\operatorname{Var}(X) = 15\cdot\frac{17}{40}\cdot\frac{23}{40} + 15\cdot14\left(\frac{34}{195} - \left(\frac{17}{40}\right)^2\right) = \frac{1955}{832}.$$
8.40. (a) We use the method of indicators. Let Jk be the indicator for the event
that the number k is drawn in at least one of the 4 weeks. Then X = J1 + J2 +
184 Solutions to Chapter 8
We have
From this
✓ ◆4 !
85
E[X] = 90E[J1 ] = 90 1 ⇡ 18.394.
90
(b) We first compute the second moment of X. Using the notation from part (b)
we have
2 !2 3 2 3
X90 X90 X
E[X 2 ] = E 4 Jk 5 = E 4 Jk2 + 2 Jk J` 5
k=1 k=1 1k<`90
90
X X
= E[Jk2 ] + 2 E[Jk J` ]
k=1 1k<`90
✓ ◆
90
= 90E[J12 ] + 2 · E[J1 J2 ],
2
where we used exchangeability again in the last step. Since J1 is either zero or
one, we have J12 = J1 . Thus the term 90E[J12 ] is the same as 90E[J1 ] which is
equal to E[X]. The second term can be computed as follows:
E[J1 J2 ] = P (both 1 and 2 are drawn at least once within the 4 weeks)
=1 P (at least one of 1 and 2 is not drawn within of the 4 weeks))
=1 P (1 is not drawn in any of the 4 weeks)
+ P (2 is not drawn in any of the 4 weeks)
+ P (neither 1 nor 2 is drawn in any of the 4 weeks) ,
and
✓ ◆4
88 · 87 · 86 · 85 · 84
P (neither 1 nor 2 is drawn in any of the 4 weeks) =
90 · 89 · 88 · 87 · 86
✓ ◆4
85 · 84
= .
90 · 89
8.41. We have
"✓ ◆3 #
X1 + · · · + Xn 1 h 3
i
E[X̄n3 ] = E = E (X 1 + · · · + X n ) .
n n3
By expanding the cube of the sum and using linearity and exchangeability
2 3
X n X X
1
E[X̄n3 ] = 3 E 4 Xk3 + 6 Xi Xj Xk + 3 Xj2 Xk 5
n
k=1 i<j<k j6=k
0 1
n
1 @X X X
= 3 E[Xk3 ] + 6 E[Xi Xj Xk ] + 3 E[Xj2 Xk ]A
n
k=1 i<j<k j6=k
✓ ◆
1 n
= 3 · n E[X13 ] + 6 E[X1 X2 X3 ] + 3n(n 1)E[X12 X2 ].
n 3
By independence
hence
1 b
E[X̄n3 ] = · n E[X13 ] = 2 .
n3 n
8.42. We have
"✓ ◆4 #
X1 + · · · + Xn 1 h 4
i
E[X̄n4 ] =E = 3
E (X1 + · · · + Xn ) .
n n
186 Solutions to Chapter 8
By expanding the fourth power of the sum and using linearity and exchangeability
X
n X
1
E[X̄n4 ] = 4 E Xk4 + 24 Xi Xj Xk X`
n
k=1 i<j<k<`
X X X
+ 12 Xj2 Xk X` + 6 Xj2 Xk2 + 4 Xj3 Xk
k<` j<k j6=k
j6=k,j6=`
n
1 X X
= 4 E[Xk4 ] + 24 E[Xi Xj Xk X` ]
n
k=1 i<j<k<`
X X X
+ 12 E[Xj2 Xk X` ] + 6 E[Xj2 Xk2 ] + 4 E[Xj3 Xk ]
k<` j<k j6=k
j6=k,j6=`
✓ ◆
1 n
= 3 E[X14 ] + 24 E[X1 X2 X3 X4 ]
n 4
✓ ◆ ✓ ◆
n n
+ 12 · · E[X12 X2 X3 ] + 6 E[X12 X22 ] + 4n(n 1)E[X13 X2 ].
3 2
By independence
E[X1 X2 X3 X4 ] = E[X1 ]E[X2 ]E[X3 ]E[X4 ] = 0, E[X12 X2 X3 ] = E[X12 ]E[X2 ]E[X3 ] = 0,
E[X13 X2 ] = E[X13 ]E[X2 ] = 0, E[X12 X22 ] = E[X12 ]E[X22 ] = E[X12 ]2 .
Hence
1 3n(n 1) c 3(n 1)a2
E[X̄n4 ] = E[X 4
1 ] + E[X 2 2
1 ] = + .
n3 n4 n3 n3
8.43. (a) Note that E[Zi2 ] = E[Zi2 ] E[Zi ]2 = Var(Zi ) = 1, because E[Zi ] = 0.
Therefore by linearity we have
n
X
E[Y ] = E[Zi2 ] = nE[Z12 ] = n.
i=1
We have
Var(Z12 ) = E[Z14 ] E[Z12 ]2 .
The fourth moment of a standard normal random variable in Exercise 3.69: E[Z14 ] =
3. Thus,
Var(Y ) = nVar(Z12 ) = n(3 1) = 2n.
(b) The moment generating function of Y is
2 2 2
MY (t) = E[etY ] = E[et(Z1 +Z2 +···+Zn ) ].
By the independence of Zi we can write the right hand side as a product of the
individual moment generating functions, and using the fact that the Zi are i.i.d. we
get
MY (t) = MZ12 (t)n .
Solutions to Chapter 8 187
This integral convergences only for t < 1/2 (otherwise we integrate a function
that is always at least 1). Moreover, we can write this using the integral of the
probability density function of an N (0, 2t1 1 ) random variable:
Z 1 z2 Z 1
1 2 1 1 1 (2t 1)z 2 1
p e 2t 1 dz = p q e 2 dz = p .
2⇡ 1 2t 1 1 2⇡ 2t1 1 2t 1
Therefore,
⇢ n/2
(1 2t) for t < 1/2
MY (t) =
1 for t 1/2.
Using the moment generating function we calculate the mean to be
and similarly,
1 2t 2 3t 4 4t
MY (t) =
e + e + e .
7 7 7
(b) Since X and Y are independent, we have MX+Y (t) = MX (t)MY (t). Using the
result of part (a) we get
1 t
MX+Y (t) = MX (t)MY (t) = 4e + 14 e2t + 12 e3t 1 2t
7e + 27 e3t + 47 e4t .
16 8
Then Cov(X, Y ) = E[XY ] E[X]E[Y ] = 9 2· 9 = 0, which means that
Corr(X, Y ) = 0 as well.
8.46. The first five and last five draws together will give all the draws, thus X +Y =
6 and Y = 6 X. Then
The number of red balls in the first five draws has a hypergeometric distribution
with NA = 6, NB = 4, N = 10, n = 5. In Example we computed the variance of
such a random variable to get
N n NA NB 10 5 6 4 2
Var(X) = ·n· · = ·5· · = .
N 1 N N 10 1 10 10 3
2
This leads to Cov(X, Y ) = Var(X) = 3.
8.47. The mean of X is given by the solution of Exercise 8.3. As in the solution
of Exercise 8.3, introduce indicators so that X = XB + XC + XD . Assumption (i)
of the problem implies that Cov(XB , XD ) = Cov(XC , XD ) = 0. Assumption (ii) of
the problem implies that
Then
8.48. The joint probability mass function of the random variables (X, Y ) can be
represented by the following table.
Y
0 1 2
9
1 100 0 0
X 81 9
2 100 100 0
1
3 0 0 100
8.49. We need E[X], E[Y ], E[XY ]. The joint density of X, Y is f (x, y) = 1((x, y) 2
D)) (the area is 1) and the bounding lines of D are y = 1, y = x, y = x. We get
ZZ Z 1Z y Z 1
E[X] = xf (x, y)dxdy = xdxdy = (y 2 /2 ( y)2 /2)dy = 0,
0 y 0
(x,y)2D
ZZ Z 1 Z y Z 1
2
E[Y ] = yf (x, y)dxdy = ydxdy = 2y 2 dy = ,
0 y 0 3
(x,y)2D
ZZ Z 1 Z y Z 1
E[XY ] = xyf (x, y)dydx = xydxdy = (y 3 /2 y( y)2 /2)dy = 0.
0 y 0
(x,y)2D
This gives
Cov(X, Y ) = E[XY ] E[X]E[Y ] = 0.
Solution without computation:
By symmetry we see that (X, Y ) has the same distribution as ( X, Y ). This implies
E[X] = E[ X] = E[X] yielding E[X] = 0. It also implies E[XY ] = E[ XY ] =
E[XY ] which gives E[XY ] = 0. This immediately shows that
Cov(X, Y ) = E[XY ] E[X]E[Y ] = 0.
8.50. Note that if (x, y) is on the union of the line segments AB and AC then
either x or y is equal to zero. This means that XY = 0, and Cov(X, Y ) = E[XY ]
E[X]E[Y ] = E[X]E[Y ].
To compute E[X] and E[Y ] is a little bit tricky, since X and Y are neither
continuous, nor discrete. However, we can write both of them as a function of a
continuous random variable. Imagine that we rotate AC 90 degrees about (0, 0) so
190 Solutions to Chapter 8
that it C is rotated into ( 1, 0). Let Z be a uniformly chosen point on the line
segment connecting ( 1, 0) and (1, 0). We can get (X, Y ) as the following function
of Z: (
(z, 0), if z 0
g(z) =
(0, z), if z < 0.
In other words: we ‘fold out’ the union of AB and AC so that it becomes the line
segment connecting ( 1, 0) and (1, 0), choose a point Z on it uniformly, and then
‘fold’ it back into the original AB [ AC.
The density function of Z is 12 on ( 1, 1), and zero otherwise and X = h(Z) =
max(z, 0). Thus
Z 1 Z 1
1 z 1
E[X] = max(z, 0)dz = dz = .
1 2 0 2 4
Similarly,
Z 1 Z 0
1 z 1
E[Y ] = max( z, 0)dz = dz = .
1 2 1 2 4
1
This gives Cov(X, Y ) = E[X]E[Y ] = 16 .
8.51. We start by computing the second moment:
$$E[(X+2Y+Z)^2] = E[X^2 + 4Y^2 + Z^2 + 4XY + 2XZ + 4YZ] = E[X^2] + 4E[Y^2] + E[Z^2] + 4E[XY] + 2E[XZ] + 4E[YZ] = 2 + 4\cdot12 + 12 + 4\cdot2 + 2\cdot4 + 4\cdot9 = 114.$$
Then the variance is given by
$$\operatorname{Var}(X+2Y+Z) = E[(X+2Y+Z)^2] - (E[X+2Y+Z])^2 = 114 - (1+2\cdot3+3)^2 = 114 - 100 = 14.$$
One could also compute all the variances and pairwise covariances first and use
$$\operatorname{Var}(X+2Y+Z) = \operatorname{Var}(X) + 4\operatorname{Var}(Y) + \operatorname{Var}(Z) + 4\operatorname{Cov}(X,Y) + 2\operatorname{Cov}(X,Z) + 4\operatorname{Cov}(Y,Z).$$
8.52. For the correlation we need $\operatorname{Cov}(X,Y)$, $\operatorname{Var}(X)$ and $\operatorname{Var}(Y)$. Both X and Y have $\operatorname{Bin}(20, \frac12)$ distribution, thus
$$\operatorname{Var}(X) = \operatorname{Var}(Y) = 20\cdot\tfrac12\cdot\tfrac12 = 5.$$
Denote by $Z_i$ the number of heads among the coin flips $10(i-1)+1, 10(i-1)+2, \dots, 10i$. Then $Z_1, Z_2, Z_3$ are independent, they all have $\operatorname{Bin}(10, \frac12)$ distribution, and we have $X = Z_1 + Z_2$ and $Y = Z_2 + Z_3$. Using the properties of the covariance and the independence of $Z_1, Z_2, Z_3$:
$$\operatorname{Cov}(X,Y) = \operatorname{Cov}(Z_1+Z_2, Z_2+Z_3) = \operatorname{Cov}(Z_1,Z_2) + \operatorname{Cov}(Z_2,Z_2) + \operatorname{Cov}(Z_1,Z_3) + \operatorname{Cov}(Z_2,Z_3) = \operatorname{Var}(Z_2) = 10\cdot\tfrac12\cdot\tfrac12 = \tfrac52.$$
Now we can compute the correlation:
$$\operatorname{Corr}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{5/2}{\sqrt{5\cdot5}} = \frac12.$$
Here is another way to compute the covariance. Let $I_j$ be the indicator of the event that the jth flip is heads. These are independent Ber(1/2) distributed random variables. We have $X = \sum_{k=1}^{20} I_k$ and $Y = \sum_{k=11}^{30} I_k$, and using the properties of covariance and the independence we get
$$\operatorname{Cov}(X,Y) = \operatorname{Cov}\Big(\sum_{k=1}^{20}I_k,\ \sum_{j=11}^{30}I_j\Big) = \sum_{k=1}^{20}\sum_{j=11}^{30}\operatorname{Cov}(I_k,I_j) = \sum_{k=11}^{20}\operatorname{Var}(I_k) = 10\cdot\tfrac12\cdot\tfrac12 = \tfrac52.$$
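The value $\operatorname{Corr}(X,Y) = \frac12$ is easy to confirm empirically. A minimal sketch (an illustrative addition, assuming numpy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=(100_000, 30))   # 30 fair coin flips per trial
X = flips[:, :20].sum(axis=1)                    # heads among flips 1-20
Y = flips[:, 10:].sum(axis=1)                    # heads among flips 11-30
print(np.corrcoef(X, Y)[0, 1])                   # close to 0.5
```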
Then
$$\operatorname{Corr}(aX+c,\ bY+d) = \frac{\operatorname{Cov}(aX+c,\ bY+d)}{\sqrt{\operatorname{Var}(aX+c)\operatorname{Var}(bY+d)}} = \frac{ab\operatorname{Cov}(X,Y)}{\sqrt{a^2b^2\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{ab\operatorname{Cov}(X,Y)}{|a|\,|b|\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{ab}{|a|\,|b|}\operatorname{Corr}(X,Y).$$
The coefficient $\frac{ab}{|a|\,|b|}$ is 1 if $ab > 0$ and $-1$ if $ab < 0$.
8.57. Assume that there are random variables satisfying the listed conditions. Then
and
Cov(X, Y ) = E[XY ] E[X]E[Y ] = 1 1·2= 3.
From this the correlation is
Cov(X, Y ) 3 3
Corr(X, Y ) = p =p = p .
Var(X) Var(Y ) 2·1 2
But p32 < 1, and we know that the correlation must be in [ 1, 1]. The found
contradiction shows that we cannot find such random variables.
8.58. By the discussion in Section 8.5 if Z and W are independent standard normals
then with
p
X = X Z + µX , Y = Y ⇢Z + Y 1 ⇢2 W + µY
the random variables (X, Y ) have bivariate normal distribution with marginals
2
X ⇠ N (µX , X ) and Y ⇠ N (µY , Y2 ) and correlation Corr(X, Y ) = ⇢. Then we
have
p
U = 2X + Y = (2 X + Y ⇢)Z + Y 1 ⇢2 W + 2µX + µY
p
V =X Y =( X Y ⇢)Z Y 1 ⇢2 W + µ X µ Y .
We can turn this system of equations into a single vector valued equation:
" p #
U 2 X + Y⇢ Y 1 ⇢2 Z 2µX + µY
= p +
V 2 W µX µY
X Y⇢ Y 1 ⇢
thus the inverse of (g(z, w), h(z, w)) is the function (q(x, y), r(x, y)) with
x µX (y µY ) X (x µX )⇢ Y
q(x, y) = , r(x, y) = p .
X 1 ⇢2 X Y
Rearranging the terms in the exponent shows that the found joint density is the
same as the one given in (8.32). This shows that the distribution of (X, Y ) is
bivariate normal with parameters µX , X , µY , Y , ⇢.
8.60. The number of ways in which toys can be chosen so that new toys appear at
times 1, 1 + a1 , 1 + a1 + a2 , . . . , 1 + a1 + · · · + an 1 is
n
Y1
n·1a1 1
·(n 1)·2a2 1
·(n 2)·3a3 1
·(n 3) · · · 2·(n 1)an 1 1
·1 = n· (n k)·k ak 1
.
k=1
where in the last step we used the fact that W1 , W2 , . . . , Wk 1 are independent with
Wj ⇠ Geom( nn j ).
1
8.61. (a) Since f (x) = x is a decreasing function, by the bounds shown in Figure
D.1 we get
n
X Z n n
X1 1
1 1
dx .
k 1 x k
k=2 k=1
Rn1
Since 1 x
dx = ln n this gives
n
X n
1 X1
ln n = 1
k k
k=2 k=1
and
n
X1 n
1 X1
ln n
k k
k=1 k=1
Pn 1
which together give 0 ln n 1.
k=1 k Pn
(c) In Example 8.17 we have shown that E[Tn ] = n k=1 n1 . Using the bounds in
part (a) we have
n ln n nE[Tn ] n(ln n + 1)
E(Tn )
from which limn!1 n ln n = 1 follows.
We have also shown
n
X1 n
X1
1 1
Var(Tn ) = n2 n ,
j=1
j2 j=1
j
and hence
n 1 n 1
Var(Tn ) X 1 1X1
= .
n2 j=1
j2 n j=1 j
Solutions to Chapter 8 195
P1 1 ⇡2
Pn
1 1 ⇡2
Pn 1
Since j=1 j 2 = 6 we have limn!1
j=1 j 2 = 6 . We also have 0 j=1 1j
Pn 1
ln n by part (a), and we know that limn!1 lnnn = 0, thus limn!1 n1 j=1 1j = 0.
2
But this means that limn!1 Var(T
n2
n)
= ⇡6 .
Solutions to Chapter 9
9.1. (a) The expected value of Y is $E[Y] = \frac1p = 6$. Since Y is nonnegative, we can use Markov's inequality to get the bound $P(Y \geq 16) \leq \frac{E[Y]}{16} = \frac6{16} = \frac38$.
(b) The variance of Y is $\operatorname{Var}(Y) = \frac{1-p}{p^2} = \frac{5/6}{1/36} = 30$. Using Chebyshev's inequality we get
$$P(Y \geq 16) = P(Y - E[Y] \geq 10) \leq P(|Y - E[Y]| \geq 10) \leq \frac{\operatorname{Var}(Y)}{10^2} = \frac{30}{100} = \frac3{10}.$$
(c) The exact value of $P(Y \geq 16)$ can be computed for example by treating Y as the number of trials needed for the first success in a sequence of independent trials with success probability p. Then
$$P(Y \geq 16) = P(\text{the first 15 trials fail}) = \left(\tfrac56\right)^{15} \approx 0.0649.$$
We can see that the estimates in (a) and (b) are valid, although they are not very close to the truth.
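The three numbers can be reproduced with a few lines of arithmetic (an illustrative addition, not part of the text):

```python
p = 1 / 6
EY, VarY = 1 / p, (1 - p) / p**2

markov = EY / 16                  # Markov bound on P(Y >= 16)
chebyshev = VarY / 10**2          # Chebyshev bound on P(Y >= 16)
exact = (1 - p)**15               # P(first 15 trials all fail)
print(markov, chebyshev, exact)   # 0.375, 0.3, about 0.0649
```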
9.2. (a) We have $E[X] = \frac1\lambda = 2$ and $X \geq 0$. By Markov's inequality
$$P(X > 6) \leq \frac{E[X]}{6} = \frac13.$$
(b) We have $E[X] = \frac1\lambda = 2$ and $\operatorname{Var}(X) = \frac1{\lambda^2} = 4$. By Chebyshev's inequality
$$P(X > 6) = P(X - E[X] > 4) \leq P(|X - E[X]| > 4) \leq \frac{\operatorname{Var}(X)}{4^2} = \frac4{16} = \frac14.$$
9.3. Let Xi be the price change between day i 1 and day i (with day 0 being
today). Then Cn C0 = X1 + X2 + · · · + Xn . The expectation of Xi (for each i)
is given by E[Xi ] = E[X1 ] = 0.45 · 1 + 0.5 · ( 2) + 0.05 · (10) = 0.05. We can also
197
198 Solutions to Chapter 9
where we used linear interpolation to approximate (1.875) using the table in the
Appendix.
9.7. Let $X_i$ be the size of the claim made by the ith policyholder. Let m be the premium they charge. We desire a premium m for which
$$P\Big(\sum_{i=1}^{2500} X_i \leq 2500\,m\Big) \geq 0.999.$$
We first use Chebyshev's inequality to estimate the probability of the complement. Recall that $\mu = E[X_i] = 1000$ and $\sigma = \sqrt{\operatorname{Var}(X_i)} = 900$. Using the notation $S = \sum_{i=1}^{2500} X_i$ we have
$$E[S] = 2500\mu, \qquad \operatorname{Var}(S) = 2500\sigma^2.$$
By Chebyshev's inequality (assuming $m > \mu$)
$$P(S \geq 2500\,m) = P\big(S - 2500\mu \geq 2500(m-\mu)\big) \leq \frac{\operatorname{Var}(S)}{2500^2(m-\mu)^2} = \frac{2500\cdot900^2}{2500^2(m-\mu)^2} = \frac{324}{(m-1000)^2}.$$
We need this probability to be at most $1 - 0.999 = 0.001$, which leads to $\frac{324}{(m-1000)^2} \leq 0.001$ and
$$m \geq 1000 + \frac{18}{\sqrt{0.001}} \approx 1569.21.$$
Note that we assumed $m > \mu$, which was natural: for $m \leq \mu$ the probability in question cannot be shown to be at least 0.999 by Chebyshev's inequality.
Now let us see how we can estimate $P\big(\sum_{i=1}^{2500} X_i \leq 2500\,m\big)$ using the central limit theorem. We have
$$P(S \leq 2500\,m) = P\left(\frac{S - 2500\cdot1000}{\sqrt{2500}\cdot900} \leq \frac{2500m - 2500\cdot1000}{\sqrt{2500}\cdot900}\right) \approx \Phi\left(\frac{2500(m-1000)}{\sqrt{2500}\cdot900}\right) = \Phi\left(\frac{m-1000}{18}\right).$$
We would like this probability to be at least 0.999. Using the table in Appendix E we get that $\Phi\left(\frac{m-1000}{18}\right) \geq 0.999$ if $\frac{m-1000}{18} \geq 3.1$, which leads to $m \geq 1055.8$.
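Both premiums can be reproduced in a couple of lines (an illustrative addition; it assumes scipy and uses the exact normal quantile, so the CLT answer comes out near 1055.6 rather than the 1055.8 obtained by rounding the quantile to 3.1):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n, target = 1000, 900, 2500, 0.999

# Chebyshev: 324 / (m - 1000)^2 <= 1 - target
m_cheb = mu + sqrt((n * sigma**2) / (n**2 * (1 - target)))

# CLT: (m - 1000)/18 must be at least the 0.999 quantile of the standard normal
m_clt = mu + norm.ppf(target) * sigma / sqrt(n)

print(round(m_cheb, 2), round(m_clt, 2))   # about 1569.21 and 1055.6
```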
9.8. (a) This is just the area of the quarter of the unit disk, multiplied by 4, which equals $\pi$.
(b) We have
$$\int_0^1\int_0^1 4\cdot I(x^2+y^2\leq1)\,dx\,dy = E[g(U_1, U_2)],$$
where $U_1, U_2$ are independent Unif[0,1] random variables and $g(x,y) = 4\cdot I(x^2+y^2\leq1)$.
(c) We need to generate $n = 10^6$ independent samples of the random variable $g(U_1, U_2)$. If $\bar\mu$ is the sample mean and $s_n^2$ is the sample variance, then the appropriate confidence interval is $\left(\bar\mu - \frac{1.96\,s_n}{\sqrt n},\ \bar\mu + \frac{1.96\,s_n}{\sqrt n}\right)$.
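The recipe in part (c) translates directly into a few lines of code. This is only an illustrative sketch (numpy assumed, seed arbitrary), not part of the original solution:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
u1, u2 = rng.random(n), rng.random(n)
g = 4.0 * (u1**2 + u2**2 <= 1)           # samples of g(U1, U2)

mean, s = g.mean(), g.std(ddof=1)
half = 1.96 * s / np.sqrt(n)
print(mean - half, mean + half)          # interval containing pi with about 95% confidence
```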
Since we have $\operatorname{Var}(X_i) = 4500$, this gives $\operatorname{Corr}(X_i, X_j) = \frac{\operatorname{Cov}(X_i, X_j)}{4500}$. Hence
$$\operatorname{Cov}(X_i, X_j) = \begin{cases}0.5\cdot4500, & \text{if } j = i+1,\\ 0, & \text{if } j \geq i+2.\end{cases}$$
There are $n-1$ pairs of the form $(i, i+1)$ in the sum above, which gives
$$\operatorname{Var}(X_1+\cdots+X_n) = 4500n + 4500(n-1) = 9000n - 4500.$$
Using the outline given in Exercise 9.9(c) we get
$$P\left(\Big|\frac{S_n}{n} - 5000\Big| \geq 50\right) \leq \frac{\operatorname{Var}(S_n/n)}{50^2} = \frac{9000n-4500}{2500\,n^2}.$$
We need $\frac{9000n-4500}{2500\,n^2} < 0.05$, which leads to $n \geq 72$.
9.11. (a) We have
$$M_X'(t) = \tfrac32\cdot2\,(1-2t)^{-5/2} = 3(1-2t)^{-5/2}.$$
Thus $M_X'(0) = E[X] = 3$. We may now use Markov's inequality to conclude that
$$P(X > 8) \leq \frac{E[X]}8 = \frac38 = 0.375.$$
(b) In order to use Chebyshev's inequality, we must find the variance of X. Differentiating again yields
$$M_X''(t) = 15(1-2t)^{-7/2},$$
and so $M_X''(0) = E[X^2] = 15$, which implies $\operatorname{Var}(X) = 15 - 9 = 6$.
will converge to 0 by Theorem 9.9. But this meansPn that thePprobability of the
n
complement will converge to 1, in other words P ( k=1 Xi Y
k=1 i ) converges
to 1 as n gets larger and larger.
9.16. Let Ui be the waiting time for number 5 on morning i, and Vi the waiting time
1 1
for number 8 on morning i. From the problem, Ui ⇠ Exp( 10 ) and Vi ⇠ Exp( 20 ).
The actual waiting time on morning i is Xi = min(Ui , Vi ). Let Yi be the Bernoulli
variable that records 1 if I take the number 5 on morning i. Then from properties
of exponential variables (from Examples 6.33 and 6.34)
1
3 20
Xi ⇠ Exp( 20 ), E(Xi ) = 3 , E(Yi ) = P (Yi = 1) = P (Ui < Vi ) = 1
10
1 = 23 .
10 + 20
Pn Pn
Since Sn = i=1 Xi and Tn = i=1 Yi , we can answer the questions by the LLN.
(a)
lim P (Sn 7n) = lim P (Sn nE(X1 ) 13 n)
n!1 n!1
Sn
lim P ( n E(X1 ) 13 ) = 1.
n!1
(b)
1
lim P (Tn 0.6n) = lim P (Tn nE(Y1 ) 15 n)
n!1 n!1
Tn 1
lim P ( n E(Y1 ) 15 ) = 1.
n!1
1
9.18. (a) From Example 8.13 we have $E[X] = 100\cdot\frac1{1/3} = 300$. Hence by Markov's inequality we get
$$P(X > 500) \leq \frac{E[X]}{500} = \frac{300}{500} = \frac35.$$
(b) Again, from Example 8.13 we have $\operatorname{Var}(X) = 100\cdot\frac{1-\frac13}{(1/3)^2} = 600$. Then from Chebyshev's inequality:
$$P(X > 500) = P(X - E[X] > 500 - 300) \leq \frac{\operatorname{Var}(X)}{200^2} = \frac{600}{200^2} = 0.015.$$
(c) By the CLT the distribution of the standardized version of X is close to that of a standard normal. The standardized version is $\frac{X-300}{\sqrt{600}}$, hence
$$P(X > 500) = P\left(\frac{X-300}{\sqrt{600}} > \frac{500-300}{\sqrt{600}}\right) \approx 1 - \Phi\left(\tfrac{20}{\sqrt6}\right) \approx 1 - \Phi(8.16) < 0.0002.$$
(In fact $1 - \Phi(8.16)$ is far smaller than 0.0002; it is approximately $2.2\cdot10^{-16}$.)
(d) We need more than 500 trials for the 100th success exactly if there are at most 99 successes within the first 500 trials. Thus, denoting by S the number of successes within the first 500 trials, we have $P(X > 500) = P(S \leq 99)$. Since $S \sim \operatorname{Bin}(500, \frac13)$, we may use the normal approximation to get
$$P(S \leq 99) = P\left(\frac{S - \frac{500}3}{\sqrt{500\cdot\frac29}} \leq \frac{99 - \frac{500}3}{\sqrt{500\cdot\frac29}}\right) \approx \Phi\left(\frac{99 - \frac{500}3}{\sqrt{500\cdot\frac29}}\right) \approx \Phi(-6.42) < 0.0002.$$
(Again, the real value of $\Phi(-6.42)$ is a lot smaller than 0.0002; it is approximately $6.8\cdot10^{-11}$.)
9.19. Let $X_i$ be the amount of time it takes the child to spin around on his ith revolution. Then the total time it will take to spin around 100 times is $S_{100} = X_1 + \cdots + X_{100}$. We assume that the $X_i$ are independent with mean 1/2 and standard deviation 1/3. Then $E[S_{100}] = 50$ and $\operatorname{Var}(S_{100}) = \frac{100}{3^2}$. Using Chebyshev's inequality:
$$P(X_1+\cdots+X_{100} > 55) = P(X_1+\cdots+X_{100} - 50 > 5) \leq \frac{\operatorname{Var}(S_{100})}{5^2} = \frac{100}{9\cdot25} = \frac49.$$
If we use the CLT then
$$P(X_1+\cdots+X_{100} > 55) = P\left(\frac{X_1+\cdots+X_{100} - 50}{\sqrt{100}\cdot(1/3)} > \frac{55-50}{\sqrt{100}\cdot(1/3)}\right) \approx P\big(Z > \tfrac5{10\cdot(1/3)}\big) = P(Z > 1.5) = 1 - P(Z \leq 1.5) = 1 - 0.9332 = 0.0668.$$
10.1. (a) By summing the probabilities in the appropriate columns we get the
marginal probability mass function of Y :
We can now compute the conditional probability mass function pX|Y (x|y) for y =
p (x,y)
0, 1, 2 using the formula pX|Y (x|y) = X,Y
pY (y) . We get
pX|Y (2|0) = 1,
pX|Y (1|1) = 14 , pX|Y (2|1) = 12 , pX|Y (3|1) = 14 ,
pX|Y (2|2) = 12 , pX|Y (3|2) = 1
2
(b) The conditional expectations can be computed using the conditional probability
mass functions:
10.3. Given $Y = y$, the random variable X is binomial with parameters y and 1/2. Hence, for x between 0 and 6, we have
$$p_X(x) = \sum_{y=1}^{6} p_{X|Y}(x|y)\,p_Y(y) = \sum_{y=1}^{6}\binom yx\frac1{2^y}\cdot\frac16,$$
where $\binom yx = 0$ if $y < x$ (as usual).
For the expectation, we have
$$E[X] = \sum_{y=1}^6 E[X|Y=y]\,p_Y(y) = \sum_{y=1}^6\frac y2\cdot\frac16 = \frac74.$$
10.4. (a) Directly from the description of the problem we get that
✓ ◆
n 1 n
pX|N (k|n) = ( ) for 0 k n 100.
k 2
(b) From knowing the mean of the binomial, E[X|N = n] = n/2 for 0 n 100.
(c)
100
X 100
X
1
E[X] = E[X|N = n] pN (n) = 2 n pN (n) = 12 E[N ] = 1
2 · 100 · 1
4 = 25
2 .
n=0 n=0
if $f_Y(y) > 0$. Since the joint density is only nonzero for $0 < y < 1$, the Y variable will have a density which is only nonzero on $0 < y < 1$. In that case we have
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(w,y)\,dw = \int_0^1\frac{12}5\,w(2-w-y)\,dw = \frac{12}5\left(w^2 - \frac{w^3}3 - \frac{y\,w^2}2\right)\bigg|_0^1 = \frac{12}5\left(1 - \frac13 - \frac y2\right) = \frac85 - \frac65\,y.$$
Thus, for $0 < y < 1$ we have
$$f_{X|Y}(x|y) = \frac{\frac{12}5\,x(2-x-y)}{\frac85 - \frac65\,y} = \frac{6x(2-x-y)}{4-3y}.$$
(b) We have
$$P\big(X > \tfrac12 \mid Y = \tfrac34\big) = \int_{1/2}^1 f_{X|Y}\big(x \mid \tfrac34\big)\,dx = \int_{1/2}^1\frac{6x\big(\tfrac54-x\big)}{\tfrac74}\,dx = \frac{24}7\int_{1/2}^1 x\big(\tfrac54-x\big)\,dx = \frac{24}7\left(\frac58x^2 - \frac13x^3\right)\bigg|_{1/2}^1 = \frac{24}7\left(\frac7{24} - \frac{11}{96}\right) = \frac{24}7\cdot\frac{17}{96} = \frac{17}{28},$$
and
$$E\big[X \mid Y = \tfrac34\big] = \int_0^1 x\,\frac{6x\big(\tfrac54-x\big)}{\tfrac74}\,dx = \frac{24}7\int_0^1 x^2\big(\tfrac54-x\big)\,dx = \frac{24}7\left(\frac5{12}x^3 - \frac14x^4\right)\bigg|_0^1 = \frac{24}7\cdot\frac16 = \frac47.$$
10.6. (a) Begin by finding the marginal density function of Y . For 0 < y < 2,
Z 1 Z y
fY (y) = 1
f (x, y) dx = 4 (x + y) dx = 38 y 2 .
1 0
and
Z 3/2 Z 1
3 2
P (X < 2 | Y = 1) = fX|Y (x|1)dx = 3 (x + 1) dx = 1.
1 0
Note that integrating all the way to 3/2 would be wrong in the last integral
above because conditioning on Y = 1 restricts X to 0 < X < 1.
208 Solutions to Chapter 10
or equivalently from
Z 1 Z 2
1 1 3 2
fX (x) = fX|Y (x|y)fY (y) dy = 4 (x + y) dy = 2 + 12 x 8x .
1 x
10.7. (a) Directly by multiplying, fX,Y (x, y) = fX|Y (x|y)fY (y) = 6x for 0 < x <
y < 1.
(b)
Z 1
2x
fX (x) = · 3y 2 dy = 6x(1 x), 0 < x < 1.
x y2
fX,Y (x, y) 1
fY |X (y|x) = = , 0 < x < y < 1.
fX (x) 1 x
Thus given X = x, Y is uniform on the interval (x, 1). Valid for 0 < x < 1.
10.8. (a) From the description of the problem,
✓ ◆
` 4 m 5 ` m
pY |X (m|`) = ( ) (9) for 0 m `.
m 9
From knowing the mean of a binomial, E[Y |X = `] = 49 `. Thus E[Y |X] = 49 X.
(b) X ⇠ Geom( 16 ), and so E(X) = 6. For the mean of Y ,
E[Y ] = E[E(Y |X)] = 49 E[X] = 4
9 · 6 = 83 .
10.9. (a) We have
$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_0^{\infty}\frac1y\, e^{-x/y}\,e^{-y}\,dx = e^{-y}$$
if $0 < y$, and zero otherwise. We can evaluate the last integral without computation if we recognize that $\frac1y e^{-x/y}$ is the probability density function of an Exp(1/y) distribution and hence its integral on $[0,\infty)$ is equal to 1.
From the found probability density $f_Y(y)$ we see that $Y \sim \operatorname{Exp}(1)$ and hence $E[Y] = 1$. We also get
$$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)} = \frac1y\, e^{-x/y}\quad\text{if } 0<x,\ 0<y,$$
and zero otherwise.
(b) The conditional probability density function $f_{X|Y}(x|y)$ found in part (a) shows that given $Y = y > 0$ the conditional distribution of X is Exp(1/y). Hence $E[X|Y=y] = \frac1{1/y} = y$ and $E[X|Y] = Y$.
(c) We can compute E[X] by conditioning on Y and then averaging the conditional expectation:
$$E[X] = E\big[E[X|Y]\big] = E[Y] = 1,$$
where in the last step we used part (a).
10.10. (a)
$$p_{X|N}(k \mid n) = \binom nk p^k(1-p)^{n-k}\quad\text{for } 0\leq k\leq n.$$
From knowing the expectation of a binomial, $E(X \mid N=n) = np$ and then $E(X \mid N) = pN$.
(b) $E[X] = E[E(X|N)] = pE[N] = p\lambda$.
(c) We use formula (10.36) to compute the expectation of the product:
$$E[NX] = E[E(NX|N)] = E[N\,E(X|N)] = E[N\cdot pN] = pE[N^2] = p(\lambda^2+\lambda).$$
In the last step we used $E[N] = \operatorname{Var}(N) = \lambda$ and $E[N^2] = (E[N])^2 + \operatorname{Var}(N)$.
The calculation above can be done without formula (10.36) also, by manipulating the sums involved:
$$E[XN] = \sum_{k,n} kn\,p_{X,N}(k,n) = \sum_{k,n} kn\,p_{X|N}(k \mid n)\,p_N(n) = \sum_n n\,p_N(n)\sum_k k\,p_{X|N}(k \mid n) = \sum_n n\,p_N(n)\,E(X \mid N=n) = p\sum_n n^2 p_N(n) = pE[N^2] = p(\lambda^2+\lambda).$$
Now for the covariance:
$$\operatorname{Cov}(N,X) = E[NX] - EN\cdot EX = p(\lambda^2+\lambda) - \lambda\cdot p\lambda = p\lambda.$$
10.11. The expected value of a Poisson(y) random variable is y, and the second moment is $y + y^2$. Thus
$$E[X|Y=y] = y, \qquad E[X^2|Y=y] = y^2+y,$$
and $E[X|Y] = Y$, $E[X^2|Y] = Y^2+Y$. Now taking expectations and using the moments of the exponential distribution gives
$$E[X] = E\big[E[X|Y]\big] = E[Y] = \frac1\lambda$$
and
$$E[X^2] = E\big[E[X^2|Y]\big] = E[Y^2+Y] = \frac2{\lambda^2} + \frac1\lambda.$$
This gives
$$\operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac2{\lambda^2} + \frac1\lambda - \frac1{\lambda^2} = \frac1{\lambda^2} + \frac1\lambda.$$
On the other hand, according to the thinning property of Example 10.14, the
process of arrival times of buying customers is a Poisson process of rate p . Hence
again by Fact 7.26 the time of arrival of the first buying customer has Exp(p )
distribution. Thus we conclude that SN ⇠ Exp(p ). From this, E[SN ] = 1/(p ).
10.13. The price should be the expected value of X. The expectation of a Poisson( )
distributed random variable is , hence we have E[X|U = u] = u and E[X|U ] = U .
Taking expectations again:
E[X] = E[E[X|U ]] = E[U ] = 5
since U ⇠ Unif[0, 10].
10.14. Given the vector (t1 , . . . , tn ) of zeroes and ones, let m be the number of ones
among t1 , . . . , tn . Permutation does not alter the number of ones in the vector and
so m is also the number of ones among tk1 , . . . , tkn . Consequently
P (X1 = t1 , X2 = t2 , . . . , Xn = tn )
Z 1
= P (X1 = t1 , X2 = t2 , . . . , Xn = tn | ⇠ = p) dp
0
Z 1
= pm (1 p)n m
dp
0
and similarly
P (X1 = tk1 , X2 = tk2 , . . . , Xn = tkn )
Z 1
= P (X1 = tk1 , X2 = tk2 , . . . , Xn = tkn | ⇠ = p) dp
0
Z 1
= pm (1 p)n m
dp.
0
The two probabilities agree.
10.15. (a) This is very similar to Example 10.13 and can be solved similarly. Let N be the number of claims in one day. We know that $N \sim \operatorname{Poisson}(12)$. Let $N_A$ be the number of claims from A policies in one day, and $N_B$ the number of claims from B policies in one day. We assume that each claim comes independently from policy A or policy B. Hence, given $N = n$, $N_A$ is distributed as a binomial random variable with parameters n and 1/4. Therefore, for any nonnegative k,
$$P(N_A = k) = \sum_{n=0}^{\infty} P(N_A=k \mid N=n)P(N=n) = \sum_{n=k}^{\infty}\binom nk\left(\frac14\right)^k\left(\frac34\right)^{n-k}e^{-12}\frac{12^n}{n!}$$
$$= \frac1{k!}\left(\frac{12}4\right)^k e^{-12}\sum_{n=k}^{\infty}\frac1{(n-k)!}\left(\frac{12\cdot3}4\right)^{n-k} = \frac{3^k}{k!}\,e^{-12}\sum_{j=0}^{\infty}\frac{9^j}{j!} = \frac{3^k}{k!}\,e^{-12}e^{9} = e^{-3}\,\frac{3^k}{k!}.$$
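The thinning conclusion $N_A \sim \operatorname{Poisson}(3)$ is also easy to see in a simulation (an illustrative addition, assuming numpy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 200_000
N = rng.poisson(12, n_days)        # total claims per day
NA = rng.binomial(N, 0.25)         # each claim is of type A with probability 1/4

print(NA.mean(), NA.var())         # both close to 3, as for Poisson(3)
```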
10.16. There are several ways to approach this problem. We begin with an approach of direct calculation. The total number of claims is $N \sim \operatorname{Poisson}(12)$. Consider any particular claim. Let A be the event that this claim is from policy A, B the event that this claim is from policy B, and C the event that this claim is greater than \$100,000. By the law of total probability
$$P(C) = P(C|A)P(A) + P(C|B)P(B) = \tfrac45\cdot\tfrac14 + \tfrac15\cdot\tfrac34 = \tfrac7{20}.$$
Let X denote the number of claims that are greater than \$100,000. We must assume that each claim is greater than \$100,000 independently of the other claims. It follows then that given $N = n$, X is conditionally $\operatorname{Bin}(n, \frac7{20})$. We can deduce the p.m.f. of X. For $k \geq 0$,
$$P(X=k) = \sum_{n=k}^{\infty} P(X=k \mid N=n)P(N=n) = \sum_{n=k}^{\infty}\binom nk\left(\tfrac7{20}\right)^k\left(\tfrac{13}{20}\right)^{n-k}e^{-12}\frac{12^n}{n!}$$
$$= \left(\tfrac{21}5\right)^k\frac{e^{-12}}{k!}\sum_{j=0}^{\infty}\frac{\left(\frac{39}5\right)^j}{j!} = \left(\tfrac{21}5\right)^k\frac{e^{-12}}{k!}\,e^{39/5} = e^{-21/5}\,\frac{\left(\frac{21}5\right)^k}{k!}.$$
We found that $X \sim \operatorname{Poisson}(\frac{21}5)$. From this we answer the questions.
(a) $E[X] = \frac{21}5$.
(b) $P(X \leq 2) = e^{-21/5}\left(1 + \frac{21}5 + \frac12\left(\frac{21}5\right)^2\right) = e^{-21/5}\cdot\frac{701}{50} \approx 0.21$.
We can arrive at the distribution of X also without calculation, and then solve the problem as above. From the solution to Exercise 10.15, $N_A \sim \operatorname{Poisson}(3)$ and $N_B \sim \operatorname{Poisson}(9)$. These two variables are independent by the same kind of calculation that was done in Example 10.13. Let $X_A$ be the number of claims from policy A that are greater than \$100,000 and let $X_B$ be the number of claims from policy B that are greater than \$100,000. The situation is exactly as in Problem 10.15 and in Example 10.13, and we conclude that $X_A$ and $X_B$ are independent with distributions $X_A \sim \operatorname{Poisson}(\frac{12}5)$ and $X_B \sim \operatorname{Poisson}(\frac95)$. Consequently $X = X_A + X_B \sim \operatorname{Poisson}(\frac{21}5)$.
10.17. (a) Let B be the event that the coin lands on heads. Then the conditional
distribution of X given B is binomial with parameters 3 and 16 , while the
conditional distribution of X given B c is Bin(5, 16 ). From this we can write down
the conditional probability mass functions, and using (10.5) the unconditional
one:
The set of possible values of X are {0, 1, . . . , 5}, and the formula makes sense
for all k if we define ab as 0 if b > a.
(b) We could use the probability mass function to compute the expectation of
X, but it is much easier to use the conditional expectations. Because the
conditional distributions are binomial, the conditional expectation of X given
B is E[X|B] = 3 · 16 = 12 and the conditional expectation of X given B c is
E[X|B c ] = 5 · 16 = 56 . Thus,
10.18. Let N be the number of trials needed for seeing the first outcome s, and Y
the number of outcomes t in the first N 1 trials.
N 1
Hence E(Y | N ) = r 1 and then
E[Y ] = E[E[Y | N ]] = E[ Nr 1
1] = 1
r 1 (E[N ] 1) = 1
r 1 (r 1) = 1.
214 Solutions to Chapter 10
(b) The conditional probability mass function found in (a) is binomial with param-
eters k + ` = n m and p2p+p 2
3
. Thus conditioned upon X1 = m, the distribution
p2
of X2 is Bin(n m, p2 +p3 ).
10.20. (a) Let n 1 and 0 k n so that P (Sn = k) > 0 and conditioning on
the event {Sn = k} is sensible. By the definition of conditional probability,
P (X1 = a1 , X2 = a2 , . . . , Xn = an | Sn = k)
P (X1 = a1 , X2 = a2 , . . . , Xn = an , Sn = k)
= .
P (Sn = k)
Unless the vector (a1 , . . . , an ) has exactly k ones, the numerator above equals
zero. Hence assume that (a1 , . . . , an ) has exactly k ones. Then the condition
Solutions to Chapter 10 215
10.22. (a) Start by observing that either X = 1 and Y 2 (when the first trial is
a success) or X 2 and Y = 1 (when the first trial is a failure). Thus when
Y = 1 we have, for m 2,
pX,Y (m, 1) P (first m 1 trials fail, mth trial succeeds)
pX|Y (m|1) = =
pY (1) P (first trial fails)
(1 p)m 1 p
= = (1 p)m 2
p.
1 p
In the other case when Y = ` 2 we must have X = 1, and the calculation
also verifies this:
pX,Y (1, `) P (first ` 1 trials succeed, `th trial fails)
pX|Y (1|`) = =
pY (`) P (first trial succeeds)
` 1
p (1 p)
= ` 1 = 1.
p (1 p)
We can summarize the answer in the following pair of formulas that capture
all the possible values of both X and Y :
(
0, m=1
pX|Y (m|1) = m 2
(1 p) p, m 2,
and for ` 2,
(
1, m=1
pX|Y (m|`) =
0, m 2.
Solutions to Chapter 10 217
(b) We reason as in Example 10.6. Let B be the event that the first trial is a
success. Then
10.23. (a) The distribution of Y is negative binomial with parameters 3 and 1/6
and the probability mass function is
✓ ◆ ✓ ◆y 2
y 1 1 5
P (Y = y) = , y = 3, 4, . . .
2 63 6
This leads to
5 y 3
P (X = x, Y = y) (y x 2) 6 · 613
P (X = x|Y = y) = = y 2
P (Y = y) y 1 1 5
2 63 6
y x 2 2(y x 1)
= (y 1)(y 2)
= ,
(y 1)(y 2)
2
y
X2 2(y x 1)
E[X|Y = y] = x .
x=1
(y 1)(y 2)
218 Solutions to Chapter 10
Py 2
To evaluate the sum x=1 2x(y x 1) we separate it in parts and then use the
identities (D.6) and (D.7):
y
X2 y
X2 y
X2
2x(y x 1) = 2(y 1) x 2 x2
x=1 x=1 x=1
(y 2)(y 1) (y 2)(y 1)(2(y 2) + 1)
= 2(y 1) 2
2 6
(y 2)(y 1)y
= .
3
This gives
y
X2 2(y x 1) (y 2)(y 1)y y
E[X|Y = y] = x = = ,
x=1
(y 1)(y 2) 3(y 2)(y 1) 3
Y
and E[X|Y ] = 3 .
$$p_X(x) = \sum_y p_{X|Y}(x \mid y)\,p_Y(y) = \sum_y p_{X,Y}(x,y) = \sum_{y=x}^{10}\binom yx\left(\tfrac16\right)^x\left(\tfrac56\right)^{y-x}\binom{10}y\frac1{2^{10}}$$
$$= \sum_{y=x}^{10}\frac{10!}{x!\,(y-x)!\,(10-y)!}\left(\tfrac16\right)^x\left(\tfrac56\right)^{y-x}\frac1{2^{10}} = \frac{10!}{x!\,(10-x)!}\left(\tfrac16\right)^x\frac1{2^{10}}\sum_{k=0}^{10-x}\frac{(10-x)!}{k!\,(10-x-k)!}\left(\tfrac56\right)^k$$
$$= \binom{10}x\left(\tfrac16\right)^x\frac1{2^{10}}\left(\tfrac{11}6\right)^{10-x} = \binom{10}x\left(\tfrac1{12}\right)^x\left(\tfrac{11}{12}\right)^{10-x}.$$
The conditional expectation $E[X|Y=y]$ for a fixed y is just the expected value of $\operatorname{Bin}(y,\frac16)$, which is $\frac y6$. This means that $E(X|Y) = \frac Y6$ and
$$E[X] = E\big[E(X|Y)\big] = E\big[\tfrac Y6\big] = \tfrac56,$$
since $Y \sim \operatorname{Bin}(10, \frac12)$.
(b) A closer inspection of the joint probability mass function shows that $(X,\, Y-X,\, 10-Y)$ has a multinomial distribution with parameters $(10, \frac1{12}, \frac5{12}, \frac12)$:
$$P(X=x,\ Y-X=y-x,\ 10-Y=10-y) = P(X=x, Y=y) = \binom yx\left(\tfrac16\right)^x\left(\tfrac56\right)^{y-x}\binom{10}y\frac1{2^{10}} = \frac{10!}{x!\,(y-x)!\,(10-y)!}\left(\tfrac1{12}\right)^x\left(\tfrac5{12}\right)^{y-x}\left(\tfrac12\right)^{10-y}.$$
This implies again that X is just a $\operatorname{Bin}(10, \frac1{12})$ random variable.
To see the joint distribution without computation, imagine that after we flip the 10 coins, we roll 10 dice, but only count the sixes if the corresponding coin showed heads. This is the same experiment because the number of 'counted' sixes has the same distribution as X. This is the number of successes for 10 identical experiments where success for the kth experiment means that the kth coin shows heads and the kth die shows six. The probability of success is $\frac12\cdot\frac16 = \frac1{12}$. Moreover, $(X,\, Y-X,\, 10-Y)$ gives the number of outcomes where we have heads and a six, heads and not a six, and tails. This explains why the joint distribution is multinomial with probabilities $(\frac1{12}, \frac5{12}, \frac12)$.
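The coin-and-die reinterpretation is also easy to check by simulation (an illustrative addition, assuming numpy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
trials = 200_000
coins = rng.integers(0, 2, size=(trials, 10))      # 1 = heads
dice = rng.integers(1, 7, size=(trials, 10))
X = ((coins == 1) & (dice == 6)).sum(axis=1)        # sixes counted only on heads

print(X.mean(), 10 / 12)                            # both about 0.833, as for Bin(10, 1/12)
```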
We can recognize this as the probability mass function of the geometric distri-
1
bution with parameter 12 .
1
10.26. Let B be the event that the first trial is a success. Recall that E[N ] = p .
2 p
E[N 2 ] = .
p2
From this,
2 p 1 1 p
Var(N ) = E[N 2 ] (E[N ])2 = = .
p2 p 2 p2
10.27. Utilize again the temporary notation $E[X|Y] = v(Y)$ from Definition 10.23 and identity (10.11):
$$E\big[E[X|Y]\big] = E[v(Y)] = \sum_y v(y)\,p_Y(y) = \sum_y E[X|Y=y]\,p_Y(y) = E(X).$$
10.28. We reason as in Example 10.13. First deduction of the joint p.m.f. Let
k1 , k2 , . . . , kr 2 {0, 1, 2, . . . } and set k = k1 + k2 + · · · + kr . In the first equality
below we can add the condition X = k into the probability because the event
{X1 = k1 , X2 = k2 , . . . , Xr = kr } is a subset of the event {X = k}.
P (X1 = k1 , X2 = k2 , . . . , Xr = kr )
= P (X1 = k1 , X2 = k2 , . . . , Xr = kr , X = k)
= P (X = k) P (X1 = k1 , X2 = k2 , . . . , Xr = kr | X = k)
(A) e k
k!
= · pk1 pk2 · · · pkr r
k! k1 ! k2 ! · · · kr ! 1 2
p1
e (p1 )k1 e p2
(p2 )k2 e pr
(pr )kr
= · ··· .
k1 ! k2 ! kr !
In the passage from line 3 to line 4 we used the conditional joint probability
mass function of (X1 , X2 , . . . , Xr ), given that X = k, namely
k!
P (X1 = k1 , X2 = k2 , . . . , Xr = kr | X = k) = pk1 pk2 · · · pkr r ,
k1 ! k2 ! · · · kr ! 1 2
which came from the description of the problem. In the last equality of (A) we
cancelled k! and then used both k = k1 + k2 + · · · + kr and p1 + p2 + · · · + pr = 1.
From the joint p.m.f. we deduce the marginal p.m.f.s by summing away the
other variables. Let 1 j r and ` 0. In the second equality below substitute
in the last line from (A). Then observe that each sum over the entire Poisson p.m.f.
evaluates to 1.
$$P(X_j = \ell) = \sum_{\substack{k_1,\ldots,k_{j-1},\\ k_{j+1},\ldots,k_r \ge 0}} P\big(X_1 = k_1, \ldots, X_{j-1} = k_{j-1},\, X_j = \ell,\, X_{j+1} = k_{j+1}, \ldots, X_r = k_r\big)$$
$$= \left(\sum_{k_1=0}^{\infty}\frac{e^{-\lambda p_1}(\lambda p_1)^{k_1}}{k_1!}\right)\cdots\left(\sum_{k_{j-1}=0}^{\infty}\frac{e^{-\lambda p_{j-1}}(\lambda p_{j-1})^{k_{j-1}}}{k_{j-1}!}\right)\cdot\frac{e^{-\lambda p_j}(\lambda p_j)^{\ell}}{\ell!}\cdot\left(\sum_{k_{j+1}=0}^{\infty}\frac{e^{-\lambda p_{j+1}}(\lambda p_{j+1})^{k_{j+1}}}{k_{j+1}!}\right)\cdots\left(\sum_{k_r=0}^{\infty}\frac{e^{-\lambda p_r}(\lambda p_r)^{k_r}}{k_r!}\right)$$
$$= \frac{e^{-\lambda p_j}(\lambda p_j)^{\ell}}{\ell!}.$$
This gives us $X_j \sim$ Poisson$(\lambda p_j)$ for each $j$. Together with the earlier calculation (A) we now know that $X_1, X_2, \ldots, X_r$ are independent with Poisson marginals $X_j \sim$ Poisson$(\lambda p_j)$.
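As an illustration (not part of the textbook solution), here is a small Monte Carlo sketch of the thinning statement with $r = 3$ categories. The values $\lambda = 4$ and the category probabilities are arbitrary test choices, not taken from the exercise; each category count should have mean and variance close to $\lambda p_j$, as a Poisson should.

import math
import random

random.seed(1)
lam, probs = 4.0, [0.5, 0.3, 0.2]   # illustrative values only
n = 100_000

def poisson(mu):
    # Knuth's multiplication method; adequate for small mu
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

sums = [0.0, 0.0, 0.0]
sqsums = [0.0, 0.0, 0.0]
for _ in range(n):
    x = poisson(lam)
    c = [0, 0, 0]
    for _ in range(x):
        u = random.random()
        c[0 if u < probs[0] else (1 if u < probs[0] + probs[1] else 2)] += 1
    for j in range(3):
        sums[j] += c[j]
        sqsums[j] += c[j] ** 2

for j in range(3):
    m = sums[j] / n
    v = sqsums[j] / n - m * m
    print(f"X_{j+1}: mean {m:.3f}, variance {v:.3f}, Poisson target {lam * probs[j]:.3f}")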
10.29. For $0 \le \ell \le n$,
$$p_L(\ell) = \sum_{m=\ell}^{n} p_{L|M}(\ell\,|\,m)\,p_M(m) = \sum_{m=\ell}^{n}\frac{m!}{\ell!\,(m-\ell)!}\,r^{\ell}(1-r)^{m-\ell}\cdot\frac{n!}{m!\,(n-m)!}\,p^m(1-p)^{n-m}$$
$$= \frac{n!}{\ell!\,(n-\ell)!}\,(pr)^{\ell}\sum_{m=\ell}^{n}\frac{(n-\ell)!}{(m-\ell)!\,(n-m)!}\,\big((1-r)p\big)^{m-\ell}(1-p)^{n-m}$$
$$= \frac{n!}{\ell!\,(n-\ell)!}\,(pr)^{\ell}\sum_{j=0}^{n-\ell}\frac{(n-\ell)!}{j!\,(n-\ell-j)!}\,\big((1-r)p\big)^{j}(1-p)^{n-\ell-j}$$
$$= \frac{n!}{\ell!\,(n-\ell)!}\,(pr)^{\ell}\big((1-r)p + 1 - p\big)^{n-\ell} = \binom{n}{\ell}(pr)^{\ell}(1-pr)^{n-\ell}.$$
In other words, $L \sim$ Bin$(n, pr)$.
Here is a way to get the distribution of L without calculation. Imagine that
we allow everybody to write the second test (even those applicants who fail the
first one). For a given applicant the probability of passing both tests is pr by
independence. Since L is the number of applicants passing both tests out of the n
applicants, we immediately get L ⇠ Bin(n, pr).
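As an extra check (not in the original solution), the Python sketch below sums the compound p.m.f. with exact rational arithmetic and compares it with the Bin$(n, pr)$ p.m.f. The values $n = 8$, $p = 3/5$, $r = 1/4$ are arbitrary test values, not taken from the exercise.

from fractions import Fraction
from math import comb

n, p, r = 8, Fraction(3, 5), Fraction(1, 4)   # illustrative parameters
for l in range(n + 1):
    total = sum(Fraction(comb(m, l)) * r**l * (1 - r)**(m - l)
                * Fraction(comb(n, m)) * p**m * (1 - p)**(n - m)
                for m in range(l, n + 1))
    assert total == Fraction(comb(n, l)) * (p * r)**l * (1 - p * r)**(n - l)
print("L ~ Bin(n, pr) verified for n = 8, p = 3/5, r = 1/4")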
10.30. First deduction of the joint p.m.f. Let $k, \ell \in \{0, 1, 2, \ldots\}$.
$$P(X_1 = k, X_2 = \ell) = P(X_1 = k, X_2 = \ell, X = k+\ell) = P(X = k+\ell)\,P(X_1 = k, X_2 = \ell \,|\, X = k+\ell)$$
$$= (1-p)^{k+\ell}\,p\cdot\frac{(k+\ell)!}{k!\,\ell!}\,\alpha^k(1-\alpha)^{\ell}.$$
To find the marginal p.m.f. we manipulate the series into a form where we can apply identity (10.52). Let $k \ge 0$.
$$P(X_1 = k) = \sum_{\ell=0}^{\infty} P(X_1 = k, X_2 = \ell) = \sum_{\ell=0}^{\infty} (1-p)^{k+\ell}\,p\cdot\frac{(k+\ell)!}{k!\,\ell!}\,\alpha^k(1-\alpha)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\,p\sum_{\ell=0}^{\infty}\frac{(k+1)(k+2)\cdots(k+\ell)}{\ell!}\,\big((1-p)(1-\alpha)\big)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\,p\sum_{\ell=0}^{\infty}\frac{(-k-1)(-k-2)\cdots(-k-\ell)}{\ell!}\,\big({-(1-p)(1-\alpha)}\big)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\,p\sum_{\ell=0}^{\infty}\binom{-k-1}{\ell}\big({-(1-p)(1-\alpha)}\big)^{\ell}$$
$$= \big(\alpha(1-p)\big)^k\cdot p\cdot\big(1-(1-p)(1-\alpha)\big)^{-k-1} = \left(\frac{\alpha(1-p)}{p+\alpha(1-p)}\right)^{k}\cdot\frac{p}{p+\alpha(1-p)}.$$
The same reasoning (or simply replacing $\alpha$ with $1-\alpha$) gives for $\ell \ge 0$
$$P(X_2 = \ell) = \left(\frac{(1-\alpha)(1-p)}{p+(1-\alpha)(1-p)}\right)^{\ell}\cdot\frac{p}{p+(1-\alpha)(1-p)}.$$
Thus marginally $X_1$ and $X_2$ are shifted geometric random variables. However, the conditional p.m.f. of $X_2$, given that $X_1 = k$, is of a different form and furthermore depends on $k$:
$$p_{X_2|X_1}(\ell\,|\,k) = \frac{p_{X_1,X_2}(k,\ell)}{p_{X_1}(k)} = \frac{(1-p)^{k+\ell}\,p\cdot\frac{(k+\ell)!}{k!\,\ell!}\,\alpha^k(1-\alpha)^{\ell}}{\left(\frac{\alpha(1-p)}{p+\alpha(1-p)}\right)^{k}\cdot\frac{p}{p+\alpha(1-p)}} = \big(p+\alpha(1-p)\big)^{k+1}\,\frac{(k+1)(k+2)\cdots(k+\ell)}{\ell!}\,\big((1-p)(1-\alpha)\big)^{\ell}.$$
We conclude in particular that $X_1$ and $X_2$ are not independent.
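As a quick numerical check (not from the book), the sketch below sums the joint p.m.f. over $\ell$ and compares it with the shifted geometric formula for $P(X_1 = k)$. The series is truncated at $\ell = 400$, and $p = 0.3$, $\alpha = 0.6$ are arbitrary test values.

from math import comb

p, alpha = 0.3, 0.6   # illustrative parameters
for k in range(8):
    total = sum((1 - p)**(k + l) * p * comb(k + l, k)
                * alpha**k * (1 - alpha)**l for l in range(400))
    q = alpha * (1 - p) / (p + alpha * (1 - p))
    target = q**k * p / (p + alpha * (1 - p))
    assert abs(total - target) < 1e-12
print("P(X_1 = k) matches the shifted geometric formula for k = 0,...,7")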
10.31. We have
$$p_{X|I_B}(x\,|\,1) = P(X = x \,|\, I_B = 1) = P(X = x \,|\, B) = p_{X|B}(x),$$
and
$$p_{X|I_B}(x\,|\,0) = P(X = x \,|\, I_B = 0) = P(X = x \,|\, B^c) = p_{X|B^c}(x).$$
10.32. From Exercise 6.34 we record the joint and marginal density functions:
$$f_{X,Y}(x,y) = \begin{cases} \frac{2}{3} & (x,y) \in D,\\ 0 & (x,y) \notin D, \end{cases}$$
$$f_X(x) = \begin{cases} 0 & x \le 0 \text{ or } x \ge 2,\\ \frac{2}{3} & 0 < x \le 1,\\ \frac{4}{3} - \frac{2}{3}x & 1 < x < 2, \end{cases} \qquad f_Y(y) = \begin{cases} 0 & y \le 0 \text{ or } y \ge 1,\\ \frac{4}{3} - \frac{2}{3}y & 0 < y < 1. \end{cases}$$
From these we deduce the conditional densities. Note that the line segment from $(1,1)$ to $(2,0)$ that forms part of the boundary of $D$ obeys the equation $x + y = 2$, so for $0 < y < 1$ the horizontal cross-section of $D$ at height $y$ is the interval $0 < x < 2-y$.
$$f_{X|Y}(x\,|\,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{\frac{2}{3}}{\frac{4}{3}-\frac{2}{3}y} = \frac{1}{2-y} \qquad \text{for } 0 < x < 2-y \text{ and } 0 < y < 1.$$
This shows that given $Y = y \in (0,1)$, $X$ is uniform on the interval $(0, 2-y)$. Since the mean of a uniform random variable is the midpoint of the interval,
$$E[X\,|\,Y=y] = 1 - \tfrac{y}{2} \qquad \text{for } 0 < y < 1.$$
$$f_{Y|X}(y\,|\,x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = \begin{cases} \dfrac{2/3}{2/3} = 1 & \text{for } 0 < y < 1 \text{ and } 0 < x \le 1,\\[1ex] \dfrac{2/3}{\frac{4}{3}-\frac{2}{3}x} = \dfrac{1}{2-x} & \text{for } 0 < y < 2-x \text{ and } 1 < x < 2. \end{cases}$$
Thus given $X = x \in (0,1]$, $Y$ is uniform on the interval $(0,1)$, while given $X = x \in (1,2)$, $Y$ is uniform on the interval $(0, 2-x)$. Hence
$$E[Y\,|\,X=x] = \begin{cases} \frac{1}{2} & 0 < x \le 1,\\ 1 - \frac{x}{2} & 1 < x < 2. \end{cases}$$
(i) $y < \tfrac12$: $P(X \le \tfrac12 \,|\, Y = y) = 0$.
(ii) $\tfrac12 \le y < \tfrac34$: $P(X \le \tfrac12 \,|\, Y = y) = \displaystyle\int_{1-y}^{1/2}\frac{1}{1-y}\,dx = \frac{\frac12-(1-y)}{1-y}$.
(iii) $y \ge \tfrac34$: $P(X \le \tfrac12 \,|\, Y = y) = \displaystyle\int_{1-y}^{2-2y}\frac{1}{1-y}\,dx = 1$.
(b) From Figure 6.4 or from the formula for $f_X$ in Example 6.20 we deduce $P(X \le \tfrac12) = \tfrac18$. Then integrate the conditional probability from part (a) to find
$$\int_{-\infty}^{\infty} P(X \le \tfrac12 \,|\, Y = y)\,f_Y(y)\,dy = \int_{1/2}^{3/4}\frac{\frac12-(1-y)}{1-y}\,(2-2y)\,dy + \int_{3/4}^{1}(2-2y)\,dy = \tfrac18.$$
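As a numerical sanity check (not part of the solution), the sketch below evaluates the two integrals from part (b) with a simple midpoint rule; together they should give approximately $\tfrac18$.

def midpoint(f, a, b, n=100_000):
    # midpoint rule approximation of the integral of f over (a, b)
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

I1 = midpoint(lambda y: (0.5 - (1 - y)) / (1 - y) * (2 - 2 * y), 0.5, 0.75)
I2 = midpoint(lambda y: 2 - 2 * y, 0.75, 1.0)
print(I1 + I2)   # approximately 0.125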
10.34. The discrete case, utilizing $p_{X|Y}(x|y)\,p_Y(y) = p_{X,Y}(x,y)$:
$$E[Y\cdot E(X\,|\,Y)] = \sum_y y\,E(X\,|\,Y=y)\,p_Y(y) = \sum_y y\sum_x x\,p_{X|Y}(x|y)\,p_Y(y) = \sum_{x,y} xy\,p_{X|Y}(x|y)\,p_Y(y) = \sum_{x,y} xy\,p_{X,Y}(x,y) = E[XY].$$
The jointly continuous case, utilizing $f_{X|Y}(x|y)\,f_Y(y) = f_{X,Y}(x,y)$:
$$E[Y\cdot E(X\,|\,Y)] = \int_{-\infty}^{\infty} y\,E(X\,|\,Y=y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty} y\left(\int_{-\infty}^{\infty} x\,f_{X|Y}(x|y)\,dx\right) f_Y(y)\,dy$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{X|Y}(x|y)\,f_Y(y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{X,Y}(x,y)\,dx\,dy = E[XY].$$
10.35. (a) We first find the joint density of $(X,S)$. Using the same idea as in Example 10.22, we write an expression for the joint cumulative distribution function $F_{X,S}(x,s)$.
$$F_{X,S}(x,s) = P(X \le x, S \le s) = P(X \le x, X+Y \le s) = \iint\limits_{u \le x,\ u+v \le s} f_{X,Y}(u,v)\,du\,dv = \iint\limits_{u \le x,\ v \le s-u} \varphi(u)\varphi(v)\,du\,dv$$
$$= \int_{-\infty}^{x}\int_{-\infty}^{s-u}\varphi(u)\varphi(v)\,dv\,du = \int_{-\infty}^{x}\varphi(u)\,\Phi(s-u)\,du.$$
We can get the joint density of $(X,S)$ by taking the mixed partial derivative, and we will do that by taking the $x$-derivative first:
$$f_{X,S}(x,s) = \frac{\partial}{\partial s}\frac{\partial}{\partial x}F_{X,S}(x,s) = \frac{\partial}{\partial s}\frac{\partial}{\partial x}\left(\int_{-\infty}^{x}\varphi(u)\,\Phi(s-u)\,du\right) = \frac{\partial}{\partial s}\big(\varphi(x)\,\Phi(s-x)\big) = \varphi(x)\varphi(s-x) = \frac{1}{2\pi}e^{-\frac{x^2+(s-x)^2}{2}}.$$
Since $S$ is the sum of two independent standard normals, we have $S \sim N(0,2)$ and $f_S(s) = \frac{1}{2\sqrt{\pi}}e^{-\frac{s^2}{4}}$. Then
$$f_{X|S}(x\,|\,s) = \frac{f_{X,S}(x,s)}{f_S(s)} = \frac{\frac{1}{2\pi}e^{-\frac{x^2+(s-x)^2}{2}}}{\frac{1}{2\sqrt{\pi}}e^{-\frac{s^2}{4}}} = \frac{1}{\sqrt{\pi}}\,e^{-(\frac{s^2}{4}-sx+x^2)} = \frac{1}{\sqrt{\pi}}\,e^{-(x-\frac{s}{2})^2}.$$
We can recognize the final result as the probability density function of the $N(\tfrac{s}{2}, \tfrac12)$ distribution.
(b) Since the conditional distribution of $X$ given $S = s$ is $N(\tfrac{s}{2}, \tfrac12)$, we get
$$E[X\,|\,S=s] = \frac{s}{2}, \qquad E[X^2\,|\,S=s] = \frac12 + \Big(\frac{s}{2}\Big)^2,$$
from which $E[X\,|\,S] = \frac{S}{2}$ and $E[X^2\,|\,S] = \frac12 + \frac{S^2}{4}$.
Taking expectations again:
$$E\big[E[X\,|\,S]\big] = E[S/2] = 0, \qquad E\big[E[X^2\,|\,S]\big] = E\Big[\frac12 + \frac{S^2}{4}\Big] = \frac12 + \frac{2}{4} = 1,$$
where we used $S \sim N(0,2)$. The final answers agree with the fact that $X$ is standard normal.
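As an illustration (not part of the textbook solution), the Monte Carlo sketch below samples independent standard normals, bins the pairs by the value of $S = X + Y$ rounded to the nearest integer, and looks at the mean and variance of $X$ within each bin. Binning is only a crude stand-in for conditioning, so the agreement with $E[X\,|\,S] = S/2$ and conditional variance $\tfrac12$ is approximate.

import random

random.seed(2)
bins = {}
for _ in range(400_000):
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    key = round(x + y, 0)          # bin S to the nearest integer
    bins.setdefault(key, []).append(x)

for s in (-2.0, -1.0, 0.0, 1.0, 2.0):
    xs = bins[s]
    m = sum(xs) / len(xs)
    v = sum((t - m) ** 2 for t in xs) / len(xs)
    print(f"S ~ {s:+.0f}: mean(X) = {m:+.3f} (target {s/2:+.2f}), "
          f"var(X) = {v:.3f} (target 0.5)")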
10.36. To find the joint density function of $(X,S)$, we change variables in an integral that calculates the expectation of a function $g(X,S)$.
$$E[g(X,S)] = E[g(X, X+Y)] = \frac{1}{2\pi\sigma^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, x+y)\,e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(y-\mu)^2}{2\sigma^2}}\,dy\,dx$$
$$= \frac{1}{2\pi\sigma^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,s)\,e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(s-x-\mu)^2}{2\sigma^2}}\,ds\,dx.$$
This identifies the joint density $f_{X,S}(x,s) = \frac{1}{2\pi\sigma^2}e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(s-x-\mu)^2}{2\sigma^2}}$, and since $S \sim N(2\mu, 2\sigma^2)$ we have $f_S(s) = \frac{1}{\sqrt{4\pi\sigma^2}}e^{-\frac{(s-2\mu)^2}{4\sigma^2}}$. From these ingredients we write down the conditional density function of $X$, given that $S = s$:
$$f_{X|S}(x\,|\,s) = \frac{f_{X,S}(x,s)}{f_S(s)} = \frac{\sqrt{4\pi\sigma^2}}{2\pi\sigma^2}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(s-x-\mu)^2}{2\sigma^2}+\frac{(s-2\mu)^2}{4\sigma^2}}.$$
After some algebra and cancellation in the exponent, this turns into
$$f_{X|S}(x\,|\,s) = \frac{1}{\sqrt{2\pi\sigma^2/2}}\exp\left\{-\frac{(x-\frac{s}{2})^2}{2\sigma^2/2}\right\}.$$
The conclusion is that given $S = s$, $X \sim N(s/2, \sigma^2/2)$. Knowledge of the normal expectation gives $E(X\,|\,S=s) = s/2$, from which $E[X\,|\,S] = \frac12 S$.
10.37. Let A be the event {Z > 0}. Random variable Y has the same distribution
as Z conditioned on the event A. Hence the density function fY (y) is the same as
the conditional probability density function $f_{Z|A}(y)$. This conditional density will be 0 for $y \le 0$, so we can focus on $y > 0$. The conditional density will satisfy
$$P(a \le Z \le b \,|\, Z > 0) = \int_a^b f_{Z|A}(y)\,dy$$
for any $0 < a < b$. But if $0 < a < b$ then
$$P(a \le Z \le b \,|\, Z > 0) = \frac{P(a \le Z \le b,\ Z > 0)}{P(Z > 0)} = \frac{P(a \le Z \le b)}{P(Z > 0)} = \frac{\int_a^b \varphi(y)\,dy}{1/2} = \int_a^b 2\varphi(y)\,dy.$$
Thus $f_Y(y) = f_{Z|A}(y) = 2\varphi(y)$ for $y > 0$ and $0$ otherwise.
10.38. (a) The problem statement gives us these density functions for $x, y > 0$:
$$f_Y(y) = e^{-y} \qquad \text{and} \qquad f_{X|Y}(x\,|\,y) = y e^{-yx}.$$
Then the joint density function is given by
$$f_{X,Y}(x,y) = f_{X|Y}(x\,|\,y)\,f_Y(y) = y e^{-y(x+1)} \qquad \text{for } x > 0,\ y > 0.$$
(b) Once we observe $X = x$, the distribution of $Y$ should be conditioned on $X = x$. First find the marginal density function of $X$ for $x > 0$.
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy = \int_0^{\infty} y e^{-y(x+1)}\,dy = \frac{1}{(1+x)^2}.$$
Then, again for $x > 0$ and $y > 0$,
$$f_{Y|X}(y\,|\,x) = \frac{f_{X,Y}(x,y)}{f_X(x)} = y(1+x)^2 e^{-y(x+1)}.$$
The conclusion is that, given $X = x$, $Y \sim$ Gamma$(2, x+1)$. The gamma distribution was defined in Definition 4.37.
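As a small numerical check (not from the book), the sketch below integrates $y e^{-y(x+1)}$ over $y$ with a midpoint rule and compares the result with $1/(1+x)^2$ at a few points.

import math

def fx_numeric(x, y_max=60.0, n=200_000):
    # midpoint rule for the integral of y * exp(-y*(x+1)) over (0, y_max)
    h = y_max / n
    return h * sum((i + 0.5) * h * math.exp(-(i + 0.5) * h * (x + 1))
                   for i in range(n))

for x in (0.0, 0.5, 1.0, 3.0):
    print(f"x = {x}: numeric {fx_numeric(x):.6f}, exact {1/(1+x)**2:.6f}")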
10.39. From the problem we get that the conditional distribution of $Y$ given $X = x$ is uniform on $[x, 1]$. From this we get that $f_{Y|X}(y\,|\,x)$ is defined for every $0 \le x < 1$ and is equal to
$$f_{Y|X}(y\,|\,x) = \begin{cases} \frac{1}{1-x} & \text{if } x \le y \le 1,\\ 0 & \text{otherwise.} \end{cases}$$
By averaging out $x$ we can get the unconditional probability density function of $Y$: for any $0 \le y \le 1$ we have
$$f_Y(y) = \int_0^1 f_{Y|X}(y\,|\,x)\,f_X(x)\,dx = \int_0^y \frac{1}{1-x}\cdot 20x^3(1-x)\,dx = 20\int_0^y x^3\,dx = 20\,\frac{x^4}{4}\Big|_0^y = 5y^4.$$
If $y < 0$ or $y > 1$ then we have $f_Y(y) = 0$, thus
$$f_Y(y) = \begin{cases} 5y^4 & \text{if } 0 \le y \le 1,\\ 0 & \text{otherwise.} \end{cases}$$
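As a quick check (not part of the solution), the sketch below evaluates the mixture integral numerically and compares it with $5y^4$.

def fy_numeric(y, n=100_000):
    # midpoint rule for the mixture integral over x in (0, y)
    h = y / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        f_y_given_x = 1.0 / (1.0 - x)      # Y | X = x is uniform on [x, 1]
        f_x = 20.0 * x**3 * (1.0 - x)
        total += f_y_given_x * f_x * h
    return total

for y in (0.2, 0.5, 0.8, 1.0):
    print(f"y = {y}: numeric {fy_numeric(y):.6f}, exact {5*y**4:.6f}")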
From these ingredients we find the density function $f_X(x)$. Concerning the range, the inequalities $0 < x < y/2$ and $0 < y < 1/2$ combine to give $0 < x < 1/4$. For such $x$,
$$(A)\qquad f_X(x) = \int_{-\infty}^{\infty} f_{X|Y}(x\,|\,y)\,f_Y(y)\,dy = \int_{2x}^{1/2}\frac{2}{y}\cdot 2\,dy = 4\ln\frac{1}{4x},$$
and $f_X(x) = 0$ otherwise.
10.44. The calculation below begins with the averaging principle. Conditioning
on Y = y permits us to replace Y with y inside the probability, and then the
10.45. (a) We have the joint density $f_{X,Y}(x,y)$ given in (8.32). The distribution of $Y$ is $N(\mu_Y, \sigma_Y^2)$ and thus the marginal density is $f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}e^{-\frac{(y-\mu_Y)^2}{2\sigma_Y^2}}$. Then $f_{X|Y}(x\,|\,y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$. To help with the notation let us introduce $\tilde{x} = \frac{x-\mu_X}{\sigma_X}$ and $\tilde{y} = \frac{y-\mu_Y}{\sigma_Y}$. Then
$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\,e^{-\frac{1}{2(1-\rho^2)}(\tilde{x}^2+\tilde{y}^2-2\rho\tilde{x}\tilde{y})}, \qquad f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y}\,e^{-\frac{\tilde{y}^2}{2}},$$
and
$$f_{X|Y}(x\,|\,y) = \frac{\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\,e^{-\frac{1}{2(1-\rho^2)}(\tilde{x}^2+\tilde{y}^2-2\rho\tilde{x}\tilde{y})}}{\frac{1}{\sqrt{2\pi}\,\sigma_Y}\,e^{-\frac{\tilde{y}^2}{2}}} = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}\,\sigma_X}\,e^{-\frac{\tilde{x}^2-2\rho\tilde{x}\tilde{y}+\rho^2\tilde{y}^2}{2(1-\rho^2)}} = \frac{1}{\sqrt{2\pi}\sqrt{1-\rho^2}\,\sigma_X}\,e^{-\frac{(\tilde{x}-\rho\tilde{y})^2}{2(1-\rho^2)}}.$$
We can check that the formula given above for $f_X(x\,|\,Y \in B)$ satisfies this identity. By the definition of conditional probability,
$$P(X \in A \,|\, Y \in B) = \frac{P(X \in A, Y \in B)}{P(Y \in B)} = \frac{1}{P(Y \in B)}\iint_{A\times B} f(x,y)\,dx\,dy$$
$$= \int_A \frac{1}{P(Y \in B)}\left(\int_B f(x,y)\,dy\right)dx = \int_A f_X(x\,|\,Y \in B)\,dx.$$
10.47.
$$E[g(X)\,|\,Y=y] = \sum_m m\,P(g(X) = m \,|\, Y = y) = \sum_m m\sum_{k:\,g(k)=m} P(X = k \,|\, Y = y)$$
$$= \sum_m\sum_{k:\,g(k)=m} g(k)\,P(X = k \,|\, Y = y) = \sum_k g(k)\,P(X = k \,|\, Y = y).$$
10.48.
$$E[X + Z \,|\, Y = y] = \sum_m m\,P(X + Z = m \,|\, Y = y) = \sum_m m\sum_{k,\ell:\,k+\ell=m} P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_{k,\ell,m:\,k+\ell=m} m\,P(X = k, Z = \ell \,|\, Y = y) = \sum_{k,\ell}(k+\ell)\,P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_{k,\ell} k\,P(X = k, Z = \ell \,|\, Y = y) + \sum_{k,\ell}\ell\,P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_k k\sum_{\ell} P(X = k, Z = \ell \,|\, Y = y) + \sum_{\ell}\ell\sum_k P(X = k, Z = \ell \,|\, Y = y)$$
$$= \sum_k k\,P(X = k \,|\, Y = y) + \sum_{\ell}\ell\,P(Z = \ell \,|\, Y = y) = E[X \,|\, Y = y] + E[Z \,|\, Y = y].$$
10.49. (a) If it takes me more than one time unit to complete the job I'm simply paid 1 dollar, so for $t \ge 1$, $p_{X|T}(1\,|\,t) = 1$. For $0 < t < 1$ we get either 1 or 2 dollars with probability $\tfrac12$–$\tfrac12$, so the conditional probability mass function is
$$p_{X|T}(1\,|\,t) = \tfrac12 \qquad \text{and} \qquad p_{X|T}(2\,|\,t) = \tfrac12.$$
We can compute $E[X]$ by averaging $E[X\,|\,T=t]$ using the probability density $f_T(t)$ of $T$. From the conditional p.m.f. above, $E[X\,|\,T=t] = \tfrac32$ for $0 < t < 1$ and $E[X\,|\,T=t] = 1$ for $t \ge 1$. Since $T \sim$ Exp$(\lambda)$, we have $f_T(t) = \lambda e^{-\lambda t}$ for $t > 0$ and 0 otherwise. Thus
$$E[X] = \int_0^{\infty} E[X\,|\,T=t]\,f_T(t)\,dt = \int_0^1 \tfrac32\,\lambda e^{-\lambda t}\,dt + \int_1^{\infty}\lambda e^{-\lambda t}\,dt = \tfrac32\big(1 - e^{-\lambda}\big) + e^{-\lambda} = \tfrac32 - \tfrac12 e^{-\lambda}.$$
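As a quick Monte Carlo check (not from the book), the sketch below draws $T \sim$ Exp$(\lambda)$, pays 1 dollar if $T \ge 1$ and otherwise pays 1 or 2 dollars with equal probability, and compares the average payment with $\tfrac32 - \tfrac12 e^{-\lambda}$. The value $\lambda = 1.3$ is an arbitrary test value.

import math
import random

random.seed(3)
lam = 1.3              # illustrative rate parameter
n = 500_000
total = 0
for _ in range(n):
    t = random.expovariate(lam)
    total += 1 if t >= 1 else random.choice((1, 2))
print(f"simulated E[X] = {total/n:.4f}, formula = {1.5 - 0.5*math.exp(-lam):.4f}")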
10.50. For $0 \le k \le n$ we have
$$P(S_n = k) = \int_0^1 P(S_n = k \,|\, \xi = p)\,f_{\xi}(p)\,dp = \binom{n}{k}\int_0^1 p^k(1-p)^{n-k}\,dp.$$
Thus
$$P(Y < x) = \sum_{k=0}^{n}(x-k)\binom{n}{k}p^k(1-p)^{n-k}.$$
10.52. (a)
(b)
$$\lim_{k\to\infty} p_{Y|X}(1\,|\,k) = \lim_{k\to\infty}\frac{3^k e^{-3}}{2^k e^{-2} + 3^k e^{-3}} = \lim_{k\to\infty}\frac{1}{\left(\frac23\right)^k e + 1} = 1.$$
$$P(X_2 = (0,1), X_3 = (0,1)) = 0 \ne P(X_2 = (0,1))\,P(X_3 = (0,1)) > 0.$$
Let $x_{n+1} = (a_{n+1}, b_{n+1}) \in \{(0,0), (0,1), (1,0), (1,1)\}$. Now consider the conditional distribution of $X_{n+1}$ with respect to the full past:
$$P(X_{n+1} = x_{n+1} \,|\, X_2 = x_2, \ldots, X_n = x_n) = \frac{P(X_2 = x_2, \ldots, X_n = x_n, X_{n+1} = x_{n+1})}{P(X_2 = x_2, \ldots, X_n = x_n)}$$
$$= \frac{P(Y_1 = a_1, Y_2 = a_2, \ldots, Y_{n-1} = a_{n-1}, Y_n = b_n, Y_n = a_{n+1}, Y_{n+1} = b_{n+1})}{P(Y_1 = a_1, Y_2 = a_2, \ldots, Y_{n-1} = a_{n-1}, Y_n = b_n)}.$$
This ratio is zero if $b_n \ne a_{n+1}$, and if $b_n = a_{n+1}$ then it becomes $P(Y_{n+1} = b_{n+1})$ by the independence of the $Y_k$. Thus
$$P(X_{n+1} = x_{n+1} \,|\, X_n = x_n) = P(X_{n+1} = x_{n+1} \,|\, X_2 = x_2, \ldots, X_n = x_n),$$
which shows that the process is a Markov chain.
Solutions to the Appendix
Appendix B.
B.1.
(a) We want to collect the elements which are either (in $A$ and in $B$, but not in $C$), or (in $A$ and in $C$, but not in $B$), or (in $B$ and in $C$, but not in $A$).
The elements described by the first parentheses are given by the set $ABC^c$ (or equivalently $A \cap B \cap C^c$). The set in the second parentheses is $ACB^c$ while the third is $BCA^c$. By taking the union of these sets we have exactly the elements of $D$:
$$D = ABC^c \cup ACB^c \cup BCA^c.$$
(b) This is similar to part (a), but now we should also include the elements that are in all three sets. These are exactly the elements of $ABC = A \cap B \cap C$, so by taking the union of this set with the answer of (a) we get the required result.
$$D = ABC^c \cup BCA^c \cup ACB^c \cup ABC.$$
Alternately, we can write simply
$$D = AB \cup AC \cup BC = (A \cap B) \cup (A \cap C) \cup (B \cap C).$$
In this last expression there can be overlap between the members of the union but it is still a legitimate way to express the set $D$.
B.2. (a) $A \cap B \cap C$
(b) $A \cap (B \cup C)^c$, which can also be written as $A \cap B^c \cap C^c$.
(c) $(A \cup B) \cap (A \cap B)^c$
(d) $A \cap B \cap C^c$
(e) $A \cap (B \cup C)^c$
B.3.
(a) B \ A = {15, 25, 35, 45, 51, 53, 55, 57, 59, 65, 75, 85, 95}.
(b) $A \cap B \cap C^c = \{50, 52, 54, 56, 58\} \cap C^c = \{50, 52, 56, 58\}$.
(c) Observe that a two-digit number $10a + b$ is a multiple of 3 if and only if $a + b$ is a multiple of 3: $10a + b = 3k \iff a + b = 3(k - 3a)$. Thus $C \cap D = \varnothing$ because the sum of the digits cannot be both 10 and a multiple of 3. Consequently $((A \cap D) \cup B) \cap (C \cap D) = \varnothing$.
B.4. We have $\omega \in \big(\bigcap_i A_i\big)^c$ if and only if $\omega \notin \bigcap_i A_i$. An element $\omega$ is not in the intersection of the sets $A_i$ if and only if there is at least one $i$ with $\omega \notin A_i$, which is the same as $\omega \in A_i^c$. But $\omega \in A_i^c$ for one of the $i$ if and only if $\omega \in \bigcup_i A_i^c$. This proves the identity.
B.5. (a) The elements in $A\triangle B$ are either elements of $A$, but not $B$, or elements of $B$, but not $A$. Thus we have $A\triangle B = AB^c \cup A^cB$.
(b) First note that for any two sets $E, F \subset \Omega$ we have
$$\Omega = EF \cup E^cF \cup EF^c \cup E^cF^c$$
where the four sets on the right are disjoint. From this and part (a) it follows that
$$(E\triangle F)^c = (EF^c \cup E^cF)^c = EF \cup E^cF^c.$$
This gives
$$A\triangle(B\triangle C) = A(B\triangle C)^c \cup A^c(B\triangle C) = A(BC \cup B^cC^c) \cup A^c(BC^c \cup B^cC) = ABC \cup AB^cC^c \cup A^cBC^c \cup A^cB^cC$$
and
$$(A\triangle B)\triangle C = (A\triangle B)C^c \cup (A\triangle B)^cC = (AB^c \cup A^cB)C^c \cup (AB \cup A^cB^c)C = AB^cC^c \cup A^cBC^c \cup ABC \cup A^cB^cC,$$
which shows that the two sets are the same.
B.6. (a) We have $\omega \in E = A \cap B$ if and only if $\omega \in A$ and $\omega \in B$. Similarly, $\omega \in F = A \cap B^c$ if and only if $\omega \in A$ and $\omega \in B^c$. This shows that we cannot have $\omega \in E$ and $\omega \in F$ at the same time: this would imply $\omega \in B$ and $\omega \in B^c$ at the same time, which cannot happen. Thus the intersection of $E$ and $F$ must be the empty set.
(b) We first show that if $\omega \in A$ then either $\omega \in E$ or $\omega \in F$, which shows that $\omega \in E \cup F$. We either have $\omega \in B$ or $\omega \in B^c$. If $\omega \in B$ then $\omega$ is an element of both $A$ and $B$, and hence an element of $E = A \cap B$. If $\omega \in B^c$ then $\omega$ is an element of $A$ and $B^c$, and hence of $F = A \cap B^c$. This proves that if $\omega \in A$ then $\omega \in E \cup F$.
On the other hand, if $\omega \in E \cup F$ then we must have either $\omega \in E = A \cap B$ or $\omega \in F = A \cap B^c$. In both cases $\omega \in A$. Thus $\omega \in E \cup F$ implies $\omega \in A$.
This proves that the elements of $A$ are exactly the elements of $E \cup F$, and thus $A = E \cup F$.
B.7. (a) Yes. One possibility is $D = CB^c$.
(b) Note that whenever 2 appears in one of the sets (A or B) then 6 is there as
well, and vice versa. This means that we cannot separate these two elements with the set operations: whatever set expression we come up with, the result will either contain both 2 and 6 or neither. Thus we cannot get $\{2, 4\}$ as the result.
Appendix C.
C.1. We can construct all allowed license plates using the following procedure: we
choose one of the 26 letters to be the first letter, then one of the remaining 25
letters to be the 2nd, and then one of the remaining 24 letters to be the third
letter. Similarly, we choose one of the 10 digits to be the first digit, then choose
the second and third digits (with 9 and 8 possible choices). By the multiplication
principle this gives us $26 \cdot 25 \cdot 24 \cdot 10 \cdot 9 \cdot 8 = 11{,}232{,}000$ different license plates.
C.2. There are 26 choices for each of the three letters. Further, there are 10 choices for each of the digits. Thus, there are a total of $26^3 \cdot 10^3$ ways to construct license plates when any combination is allowed. However, there are $26^3 \cdot 1^3$ ways to construct license plates with three zeros (we have 26 choices for each of the three letters, and exactly one choice for each digit). Subtracting those off gives a solution of $26^3(10^3 - 1) = 17{,}558{,}424$.
Another way to get the same answer is as follows: we have $26^3$ choices for the three letters and 999 choices for the three digits ($10^3$ minus the all-zeros case), which gives again $26^3 \cdot 999 = 17{,}558{,}424$.
C.3. There are 25 license plates that differ from UWU 144 only at the first position (as there are 25 other letters we can choose there), and the same is true for the second and third positions. There are 9 license plates that differ from UWU 144 only at the fourth position (there are 9 other possible digits), and the same is true for the 5th and 6th positions. This gives $3 \cdot 25 + 3 \cdot 9 = 102$ possibilities.
C.4. We can arrange the 6 letters in 6! = 120 different orders, so the answer is 120.
C.5. Imagine that we differentiate between the two Ps: there is a P$_1$ and a P$_2$. Then we could order the five letters $5! = 5\cdot4\cdot3\cdot2\cdot1 = 120$ different ways. Each ordering of the letters gives a word, but we counted each word twice (as the two Ps can be in two different orders). Thus we can construct $\frac{120}{2} = 60$ different words.
C.6. (a) This is the choice of a subset of size 5 from a set of size 90, hence we have $\binom{90}{5} = 43{,}949{,}268$ outcomes.
If you want to first choose the numbers in order, then first you produce an ordered list of 5 numbers: $90 \cdot 89 \cdot 88 \cdot 87 \cdot 86$ outcomes. But now each set of 5 numbers is counted $5!$ times (in each of its orderings). Thus the answer is again
$$\frac{90 \cdot 89 \cdot 88 \cdot 87 \cdot 86}{5!} = \binom{90}{5} = 43{,}949{,}268.$$
(b) If 1 is forced into the set, then we choose the remaining 4 winning numbers from the 89 numbers $\{2, 3, \ldots, 90\}$. We can do that $\binom{89}{4} = 2{,}441{,}626$ different ways; this is the number of outcomes with 1 appearing among the five numbers.
(c) These outcomes can be produced by first picking 2 numbers from the set $\{1, 2, \ldots, 49\}$ and 3 numbers from $\{61, 62, \ldots, 90\}$. By the multiplication principle of counting there are $\binom{49}{2}\binom{30}{3} = 4{,}774{,}560$ ways we can do that, so that
is the number of outcomes. Note: It does not matter in what order the steps
are performed, or you can imagine them performed simultaneously.
(d) Here are two possible ways of solving this problem:
(i) First choose a set of 5 distinct second digits from the set $\{0, 1, 2, \ldots, 9\}$: $\binom{10}{5}$ choices. Then for each last digit in turn, choose a first digit. There are always 9 choices: if the last digit is 0, then the choices for the first digit are $\{1, 2, \ldots, 9\}$, while if the last digit is in the range 1–9 then the choices for the first digit are $\{0, 1, \ldots, 8\}$. By the multiplication principle of counting there are $\binom{10}{5}9^5 = 14{,}880{,}348$ outcomes.
(ii) Here is another presentation of the same idea: divide the 90 numbers into subsets according to the last digit, $A_k = \{\text{numbers in } \{1, \ldots, 90\} \text{ with last digit } k\}$ for $k = 0, 1, \ldots, 9$; each $A_k$ has 9 elements. The rule is that at most 1 number comes from each $A_k$. Hence first choose 5 subsets $A_{k_1}, A_{k_2}, \ldots, A_{k_5}$ out of the ten possible: $\binom{10}{5}$ choices. Then choose one number from the 9 in each set $A_{k_j}$: $9^5$ total possibilities. By the multiplication principle $\binom{10}{5}9^5$ outcomes.
C.7. Denote the four players by A, B, C and D. Note that if we choose the partner
of A (which we can do three possible ways) then this will determine the other team
as well. Thus there are 3 ways to set up the doubles match.
C.8. (a) Once we choose the opponent of team A, the whole tournament is set up.
Thus there are 3 ways to set up the tournament.
(b) In the tournament there are three games, each with two possible outcomes. Thus for a given setup we have $2^3 = 8$ outcomes, and since there are 3 ways to set up the tournament this gives $8 \cdot 3 = 24$ possible outcomes for the tournament.
C.9. (a) In order to produce all pairs we can first choose the rank of the pair (2, 3, ..., J, Q, K or A), which gives 13 choices. Then we choose the two cards from the 4 possibilities for that rank (for example, if the rank is K then we choose 2 cards from ♥K, ♣K, ♦K, ♠K), which gives $\binom{4}{2}$ choices. By the multiplication principle we have altogether $13 \cdot \binom{4}{2} = 78$ choices.
(b) To produce two cards with the same suit we first choose the suit (4 choices) and then choose the two cards from the 13 possibilities with the given suit ($\binom{13}{2} = 78$ choices). By the multiplication principle the result is $4 \cdot \binom{13}{2} = 312$.
(c) To produce a suited connector, first choose the suit (4 choices) then one of the 13 neighboring pairs. This gives $4 \cdot 13 = 52$ choices.
C.10. (a) We can construct a hand with two pairs the following way. First we choose the two ranks that appear in the pairs, which can be done $\binom{13}{2}$ different ways. For the lower ranked pair we can choose the two suits $\binom{4}{2}$ ways, and for the higher ranked pair we again have $\binom{4}{2}$ choices for the suits. The fifth card must have a different rank than the two pairs we have already chosen; there are $52 - 2\cdot4 = 44$ choices for that. This gives $\binom{13}{2}\cdot\binom{4}{2}\cdot\binom{4}{2}\cdot 44 = 123552$ choices.
(b) We can choose the rank of the three cards of the same rank 13 ways, and the three suits $\binom{4}{3} = 4$ ways. The other two cards have different ranks; we can choose those ranks $\binom{12}{2}$ different ways. For each of these two ranks we can choose the suit four ways, which gives $4^2$ choices. This gives $13 \cdot 4 \cdot \binom{12}{2}\cdot 4^2 = 54912$ possible three of a kinds.
(c) We can choose the rank of the starting card 10 ways (A, 2, ..., 10) if we want five cards in sequential order; this identifies the ranks of the other cards. For each of the 5 ranks we can choose the suit 4 ways. But for each sequence we have four cases where all five cards are of the same suit, and we have to remove these from the $4^5$ possibilities. This gives $10 \cdot (4^5 - 4) = 10200$ choices for a straight.
(d) The suit of the five cards can be chosen 4 ways. There are $\binom{13}{5}$ ways to choose five cards, but we have to remove the cases when these are in sequential order. We can choose the rank of the starting card 10 ways (A, 2, ..., 10) if we want five cards in sequential order. This gives $4 \cdot \big(\binom{13}{5} - 10\big) = 5108$ choices for a flush.
(e) We can construct a full house the following way. First choose the rank that appears three times (13 choices), and then the rank appearing twice (there are 12 remaining choices). Then choose the three suits for the rank appearing three times ($\binom{4}{3} = 4$ choices) and the suits for the other two cards ($\binom{4}{2} = 6$ choices). In each step the number of choices does not depend on the previous decisions, so we can multiply these together to get the number of ways we can get a full house: $13 \cdot 12 \cdot 4 \cdot 6 = 3744$.
(f) We can choose the rank of the 4 times repeated card 13 ways, and the fifth card 48 ways (since we have 48 other cards); this gives $13 \cdot 48 = 624$ poker hands with four of a kind.
(g) We can choose the value of the starting card 10 ways (A, 2, ..., 10), and the suit 4 ways, which gives $10 \cdot 4 = 40$ poker hands with a straight flush. (Often the case when the starting card is a 10 is called a royal flush. There are 4 such hands.)
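As a brute-force check (not part of the textbook solutions), the Python sketch below enumerates all $\binom{52}{5} = 2{,}598{,}960$ five-card hands and counts two pairs, three of a kinds, full houses and four of a kinds directly; it takes a little while to run but needs no cleverness.

from itertools import combinations
from collections import Counter

deck = [(rank, suit) for rank in range(13) for suit in range(4)]
two_pair = three_kind = full_house = four_kind = 0
for hand in combinations(deck, 5):
    counts = sorted(Counter(rank for rank, _ in hand).values())
    if counts == [1, 2, 2]:
        two_pair += 1
    elif counts == [1, 1, 3]:
        three_kind += 1
    elif counts == [2, 3]:
        full_house += 1
    elif counts == [1, 4]:
        four_kind += 1

print(two_pair, three_kind, full_house, four_kind)
# expected: 123552 54912 3744 624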
C.11. From the definition:
$$\binom{n-1}{k} + \binom{n-1}{k-1} = \frac{(n-1)!}{k!\,(n-k-1)!} + \frac{(n-1)!}{(k-1)!\,(n-k)!}$$
$$= \frac{n-k}{n}\cdot\frac{n\cdot(n-1)!}{k!\,(n-k-1)!\,(n-k)} + \frac{k}{n}\cdot\frac{n\cdot(n-1)!}{k\cdot(k-1)!\,(n-k)!} = \left(\frac{n-k}{n} + \frac{k}{n}\right)\frac{n!}{k!\,(n-k)!} = \binom{n}{k}.$$
Here is another way to prove the identity. Assume that in a class there are $n$ students, and one of them is called Dana. There are $\binom{n}{k}$ ways to choose a team of $k$ students from the class. When we choose the team there are two possibilities: Dana is either on the team or not. There are $\binom{n-1}{k}$ ways to choose the team if we cannot include Dana. There are $\binom{n-1}{k-1}$ ways to choose the team if we have to include Dana. These two numbers must add up to the total number of ways we can select the team, which gives the identity.
C.12. (a) We have to divide up the remaining 48 (non-ace) cards into four groups so that the first group has 9 cards, and the second, third and fourth groups have 13 cards each. This can be done in $\binom{48}{9,13,13,13} = \frac{48!}{9!\,(13!)^3}$ different ways.
(b) To describe such a configuration we just have to assign a different suit to each player. This can be done $4! = 24$ different ways.
(c) We can construct such a configuration by first choosing the 13 cards of Player 4 (there are 39 non-♥ cards, so we can do that $\binom{39}{13}$ different ways), then choosing the 13 cards of Player 3 (there are 26 non-♥ cards remaining, so we can do that $\binom{26}{13}$ different ways), and then choosing the 13 cards of Player 2 out of the remaining 26 cards (of which 13 are ♥); we can do that $\binom{26}{13}$ different ways. (Player 1 gets the remaining 13 cards.) Since the number of choices in each step does not depend on the outcomes of the previous choices, the total number of configurations is the product $\binom{39}{13}\cdot\binom{26}{13}\cdot\binom{26}{13} = \frac{39!\,26!}{(13!)^5}$.
C.13. Label the sides of the square with north, west, south and east. For any
coloring we can always rotate the square in a unique way so that the red side is the
north side. We can choose the colors of the other three sides (W, S, E) in $3 \cdot 2 \cdot 1 = 6$ different ways, which means that there are 6 different colorings.
C.14. We will use one color twice and the other colors once. Let us first count the
number of ways we can color the sides so there are two red sides. Label the sides
of the square with north, west, south, east. We can rotate any coloring uniquely
so the (only) blue side is the north side. The yellow side can be chosen now three
different ways (from the other three positions), and once we have that, the positions
of the red sides are determined. Thus there are three ways we can color the sides of
the square so that there are 2 red, 1 blue and 1 yellow side and colorings that can
be rotated to each other are treated the same. Similarly, we have three colorings
with 2 blue, 1 red and 1 yellow side, and three colorings with 2 yellow, 1 red and 1
blue side. This gives 9 possible colorings.
C.15. Imagine that we place the colored cube on the table so that one of the faces
is facing us. There are 6 different colorings of the cube where the red and blue faces
are on the opposite sides. Indeed: for such a coloring we can always rotate the cube
uniquely so that it rests on the red face and the yellow face is facing us (with blue
on the top). Now we can choose the colors of the other three faces 3 · 2 · 1 different
ways, which gives us 6 such colorings.
If the red and the blue faces are next to each other then we can always rotate
the cube uniquely so it rests on the red face and the blue face is facing us. The
remaining four faces can be colored 4 · 3 · 2 · 1 different ways, thus we have 24 such
colorings.
This gives 24 + 6 = 30 colorings all together.
C.16. Number the bead positions clockwise with 0, 1, ..., 17. We can choose the positions of the 7 green beads out of the 18 possibilities in $\binom{18}{7}$ different ways. However, this way we overcounted the number of necklaces, as we counted the rotated versions of each necklace separately. We will show that each necklace was counted exactly 18 times. A given necklace can be rotated 18 different ways (with the first position going into one of the eighteen possible positions); we just have to check that
two different rotations cannot give the same set of positions for the green beads. We prove this by contradiction. Assume that we have seven different positions $g_1, \ldots, g_7 \in \{0, 1, \ldots, 17\}$ so that if we rotate them by $0 < d < 18$ then we get the same set of positions. It can be shown that this can only happen if each two neighboring positions are separated by the same number of steps. But 7 does not divide 18, so this is impossible. Thus all 18 rotations of a necklace were counted separately, which means that the number of necklaces is $\frac{1}{18}\binom{18}{7} = 1768$.
C.17. Suppose that in a class there are $n$ girls and $n$ boys. There are $\binom{2n}{n}$ different ways we can choose a team of $n$ students out of this class of $2n$. For any $0 \le k \le n$ there are $\binom{n}{k}\cdot\binom{n}{n-k}$ ways to choose the team so that there are exactly $k$ girls and $n-k$ boys chosen. For $0 \le k \le n$ we have $\binom{n}{n-k} = \binom{n}{k}$ and thus $\binom{n}{k}\cdot\binom{n}{n-k} = \binom{n}{k}^2$.
By considering the possible values of the number of girls in the team we now get the identity
$$\binom{2n}{n} = \binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2.$$
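A short computational check of this identity (not part of the textbook solution):

from math import comb

for n in range(0, 15):
    assert comb(2 * n, n) == sum(comb(n, k) ** 2 for k in range(n + 1))
print("C(2n,n) = sum of C(n,k)^2 verified for n = 0,...,14")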
C.18. If $x = -1$ then the inequality is $0 \ge 1-n$, which certainly holds.
Now assume $x > -1$. For $n = 1$ both sides are equal to $1+x$, so the inequality is true. Assume now that the inequality holds for some positive integer $n$; we need to show that it holds for $n+1$ as well. By our induction assumption $(1+x)^n \ge 1+nx$, and because $x > -1$, we have $1+x > 0$. Hence we can multiply both sides of the previous inequality with $1+x$ to get
$$(1+x)^{n+1} \ge (1+nx)(1+x) = 1 + (n+1)x + nx^2.$$
Since $nx^2 \ge 0$ we get $(1+x)^{n+1} \ge 1+(n+1)x$, which proves the induction step, and finishes the proof.
C.19. Let $a_n = 11^n - 6$. We have $a_1 = 5$, which is divisible by 5. Now assume that for some positive integer $n$ the number $a_n$ is divisible by 5. We have
$$a_{n+1} = 11^{n+1} - 6 = 11\,(11^n - 6) + 60 = 11\,a_n + 60.$$
Both terms on the right are divisible by 5, hence so is $a_{n+1}$, which completes the induction step.
We will show that for all $n \ge 4$ we have $2^n \ge 4n$. This certainly holds for $n = 4$. Now assume that it holds for some integer $n \ge 4$; we will show that it also holds for $n+1$. Multiplying both sides of the inequality $2^n \ge 4n$ (which we assumed to be true) by 2 we get
$$2^{n+1} \ge 8n.$$
But $8n = 4(n+1) + 4(n-1) > 4(n+1)$ if $n \ge 4$. Thus $2^{n+1} \ge 4(n+1)$, which finishes the proof.
Appendix D.
D.1. We can separate the terms into two sums:
$$\sum_{k=1}^{n}(n + 2k) = \sum_{k=1}^{n} n + \sum_{k=1}^{n} 2k.$$
Note that in the first sum we add the constant term $n$ a total of $n$ times, so the sum is equal to $n^2$. The second sum is just twice the sum (D.6), so its value is $n(n+1)$. Thus
$$\sum_{k=1}^{n}(n + 2k) = n^2 + n(n+1) = 2n^2 + n.$$
D.2. For any fixed $i \ge 1$ we have $\sum_{j=1}^{\infty} a_{i,j} = a_{i,i} + a_{i,i+1} = 1 - 1 = 0$. Thus $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{i,j} = 0$.
If we fix $j \ge 1$ then
$$\sum_{i=1}^{\infty} a_{i,j} = \begin{cases} a_{1,1} = 1 & \text{if } j = 1,\\ a_{j-1,j} + a_{j,j} = -1 + 1 = 0 & \text{if } j > 1. \end{cases}$$
Thus $\sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a_{i,j} = 1$. This shows that for this particular choice of numbers $a_{i,j}$ we have
$$\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{i,j} \ne \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a_{i,j} = 1.$$
D.3. (a) Evaluating the sum on the inside first using (D.6):
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}\ell = \sum_{k=1}^{n}\frac{k(k+1)}{2} = \sum_{k=1}^{n}\Big(\frac12 k^2 + \frac12 k\Big).$$
Separating the sum in two parts and then using (D.6) and (D.7):
$$\sum_{k=1}^{n}\Big(\frac12 k^2 + \frac12 k\Big) = \frac12\sum_{k=1}^{n}k^2 + \frac12\sum_{k=1}^{n}k = \frac12\cdot\frac{n(n+1)(2n+1)}{6} + \frac12\cdot\frac{n(n+1)}{2} = \frac{n(n+1)}{12}\,(2n+1+3) = \frac{n^3}{6} + \frac{n^2}{2} + \frac{n}{3}.$$
(b) Since the sum on the inside has $k$ terms that are all equal to $k$ we get
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}k = \sum_{k=1}^{n}k^2 = \frac{n(n+1)(2n+1)}{6} = \frac13 n^3 + \frac12 n^2 + \frac16 n.$$
(c) Write the double sum as three separate double sums. The second and third sums can be evaluated using parts (a) and (b). The first sum is
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}7 = \sum_{k=1}^{n}7k = \frac{7n(n+1)}{2} = \frac72 n^2 + \frac72 n.$$
Thus we get
$$\sum_{k=1}^{n}\sum_{\ell=1}^{k}(7 + 2k + \ell) = \frac72 n^2 + \frac72 n + 2\cdot\Big(\frac13 n^3 + \frac12 n^2 + \frac16 n\Big) + \Big(\frac{n^3}{6} + \frac{n^2}{2} + \frac{n}{3}\Big) = \frac56 n^3 + 5n^2 + \frac{25}{6}n.$$
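As a quick check (not from the book), the sketch below evaluates the three double sums directly and compares them with the closed forms in parts (a)–(c), using exact rational arithmetic.

from fractions import Fraction as F

for n in range(1, 30):
    a = sum(l for k in range(1, n + 1) for l in range(1, k + 1))
    b = sum(k for k in range(1, n + 1) for l in range(1, k + 1))
    c = sum(7 + 2 * k + l for k in range(1, n + 1) for l in range(1, k + 1))
    assert a == F(n**3, 6) + F(n**2, 2) + F(n, 3)
    assert b == F(n**3, 3) + F(n**2, 2) + F(n, 6)
    assert c == F(5 * n**3, 6) + 5 * n**2 + F(25 * n, 6)
print("double-sum identities verified for n = 1,...,29")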
D.4. $\sum_{j=i}^{n} j$ is the sum of the arithmetic progression $i, i+1, \ldots, n$, which has $n-i+1$ elements, so its value is $(n-i+1)\frac{n+i}{2}$. Thus
$$\sum_{i=1}^{n}\sum_{j=i}^{n} j = \sum_{i=1}^{n}(n-i+1)\frac{n+i}{2} = \sum_{i=1}^{n}\frac12\big({-i^2} + i + n^2 + n\big) = -\frac12\sum_{i=1}^{n}i^2 + \frac12\sum_{i=1}^{n}i + \frac12\sum_{i=1}^{n}(n^2+n).$$
The first and second sums can be computed using the identities (D.6) and (D.7):
$$\frac12\sum_{i=1}^{n}i^2 = \frac{n(n+1)(2n+1)}{12}, \qquad \frac12\sum_{i=1}^{n}i = \frac{n(n+1)}{4},$$
while the third sum equals $\frac12 n(n^2+n)$. Combining these,
$$\sum_{i=1}^{n}\sum_{j=i}^{n} j = -\frac{n(n+1)(2n+1)}{12} + \frac{n(n+1)}{4} + \frac{n^2(n+1)}{2} = \frac{n(n+1)(2n+1)}{6}.$$
Alternatively, we can switch the order of summation: $\sum_{i=1}^{n}\sum_{j=i}^{n} j = \sum_{j=1}^{n}\sum_{i=1}^{j} j$. (The switching of the order of the summation is justified because we have a finite sum.) The inside sum is easy to evaluate because the summand does not depend
on $i$: $\sum_{i=1}^{j} j = j\cdot j = j^2$. Then
$$\sum_{j=1}^{n}\sum_{i=1}^{j} j = \sum_{j=1}^{n} j^2 = \frac{n(n+1)(2n+1)}{6},$$
by (D.7).
D.5. (a) From (D.1) we have
$$\sum_{j=i}^{\infty} x^j = x^i + x^{i+1} + x^{i+2} + \cdots = x^i\sum_{n=0}^{\infty}x^n = \frac{x^i}{1-x}.$$
Thus
$$\sum_{i=1}^{\infty}\sum_{j=i}^{\infty} x^j = \sum_{i=1}^{\infty}\frac{x^i}{1-x} = \frac{x}{1-x}\,(1 + x + x^2 + \cdots) = \frac{x}{1-x}\sum_{n=0}^{\infty}x^n = \frac{x}{1-x}\cdot\frac{1}{1-x} = \frac{x}{(1-x)^2}.$$
This is exactly the sum that we computed in part (a), which shows that the answer is again $\frac{x}{(1-x)^2}$. The fact that we can switch the order of the summation follows from the fact that the double sum in (a) is finite even if we put absolute values around each term.
D.6. We use induction. For $n = 1$ the two sides are equal: $1^2 = \frac{1\cdot2\cdot(2\cdot1+1)}{6}$. Assume that the identity holds for $n \ge 1$; we will show that it also holds for $n+1$. By the induction hypothesis
$$1^2 + 2^2 + \cdots + n^2 + (n+1)^2 = \frac{n(n+1)(2n+1)}{6} + (n+1)^2 = \frac{n+1}{6}\big(n(2n+1) + 6(n+1)\big) = \frac{n+1}{6}\big(2n^2 + 7n + 6\big) = \frac{(n+1)(n+2)(2n+3)}{6}.$$
The last formula is exactly the right side of (D.7) for $n+1$ in place of $n$, which proves the induction step and the statement.
D.7. We prove the identity by induction. The identity holds for $n = 1$. Assume that it holds for $n \ge 1$; we will show that it also holds for $n+1$. By the induction
hypothesis
$$1^3 + 2^3 + \cdots + n^3 + (n+1)^3 = \frac{n^2(n+1)^2}{4} + (n+1)^3 = (n+1)^2\Big(\frac{n^2}{4} + n + 1\Big) = (n+1)^2\,\frac{n^2+4n+4}{4} = \frac{(n+1)^2(n+2)^2}{4}.$$
This is exactly (D.8) stated for $n+1$, which completes the proof.
D.8. First note that both sums have finitely many terms, because $\binom{n}{k} = 0$ if $k > n$. If we move every term to the left side then we get
$$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \binom{n}{3} + \binom{n}{4} - \cdots.$$
We would like to show that this expression is zero. Note that the alternating signs can be expressed using powers of $-1$, hence the expression above is equal to $\sum_{k=0}^{n}(-1)^k\binom{n}{k} = \sum_{k=0}^{n}(-1)^k\cdot 1^{n-k}\binom{n}{k}$. But this is exactly equal to $(-1+1)^n = 0^n = 0$ by the binomial theorem. Hence $\sum_{k=0}^{n}(-1)^k\binom{n}{k} = 0$ and
$$\binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots.$$
Using the binomial theorem for $(1+1)^n$ we get $\sum_{k=0}^{n}\binom{n}{k} = 2^n$. Introducing
$$a_n = \binom{n}{0} + \binom{n}{2} + \binom{n}{4} + \cdots, \qquad b_n = \binom{n}{1} + \binom{n}{3} + \binom{n}{5} + \cdots,$$
we have just shown that $a_n = b_n$ and $a_n + b_n = 2^n$. This yields $a_n = b_n = 2^{n-1}$. But $a_n$ is exactly the number of even subsets of a set of size $n$ (as it counts the number of subsets with $0, 2, 4, \ldots$ elements), thus the number of even subsets is $2^{n-1}$. Similarly, the number of odd subsets is also $2^{n-1}$.
D.9. We would like to show (D.10) for all $x, y$ and $n \ge 1$. For $n = 1$ the two sides are equal. Assume that the statement holds for $n$; we will prove that it also holds for $n+1$. By the induction hypothesis
$$(x+y)^{n+1} = (x+y)\cdot(x+y)^n = (x+y)\sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k} = \sum_{k=0}^{n}\binom{n}{k}x^{k+1}y^{n-k} + \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k+1}.$$
Shifting the index in the first sum gives
$$\sum_{k=0}^{n}\binom{n}{k}x^{k+1}y^{n-k} + \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k+1} = \sum_{k=1}^{n+1}\binom{n}{k-1}x^k y^{n+1-k} + \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k+1}$$
$$= x^{n+1} + y^{n+1} + \sum_{k=1}^{n}\left(\binom{n}{k-1} + \binom{n}{k}\right)x^k y^{n+1-k}.$$
Since $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ by Exercise C.11, this gives
$$(x+y)^{n+1} = x^{n+1} + y^{n+1} + \sum_{k=1}^{n}\binom{n+1}{k}x^k y^{n+1-k} = \sum_{k=0}^{n+1}\binom{n+1}{k}x^k y^{n+1-k},$$
which is exactly the statement we have to prove for $n+1$. This proves the induction step and the theorem.
D.11. This can be done similarly to Exercise D.9. We outline the proof for $r = 3$; the general case is similar (with more indices). We need to show that
$$(x_1+x_2+x_3)^n = \sum_{\substack{k_1\ge0,\ k_2\ge0,\ k_3\ge0\\ k_1+k_2+k_3=n}}\binom{n}{k_1,k_2,k_3}\,x_1^{k_1}x_2^{k_2}x_3^{k_3}.$$
For $n = 1$ the two sides are equal: the only possible triples $(k_1, k_2, k_3)$ are $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$ and these give the terms $x_1$, $x_2$ and $x_3$. Now assume that the equation holds for some $n$; we would like to show it for $n+1$. Take the equation for $n$ and multiply both sides with $x_1+x_2+x_3$. Then on one side we get $(x_1+x_2+x_3)^{n+1}$, while the other side is
$$\sum_{\substack{k_1\ge0,\ k_2\ge0,\ k_3\ge0\\ k_1+k_2+k_3=n}}\binom{n}{k_1,k_2,k_3}\Big(x_1^{k_1+1}x_2^{k_2}x_3^{k_3} + x_1^{k_1}x_2^{k_2+1}x_3^{k_3} + x_1^{k_1}x_2^{k_2}x_3^{k_3+1}\Big).$$
The coefficient of $x_1^{a_1}x_2^{a_2}x_3^{a_3}$ for a given $0 \le a_1$, $0 \le a_2$, $0 \le a_3$ with $a_1+a_2+a_3 = n+1$ is equal to
$$\binom{n}{a_1-1, a_2, a_3} + \binom{n}{a_1, a_2-1, a_3} + \binom{n}{a_1, a_2, a_3-1},$$
which can be shown to be equal to $\binom{n+1}{a_1,a_2,a_3}$. (This is a generalization of Exercise C.11 and can be shown the same way.) But this means that
$$(x_1+x_2+x_3)^{n+1} = \sum_{\substack{k_1\ge0,\ k_2\ge0,\ k_3\ge0\\ k_1+k_2+k_3=n}}\binom{n}{k_1,k_2,k_3}\Big(x_1^{k_1+1}x_2^{k_2}x_3^{k_3} + x_1^{k_1}x_2^{k_2+1}x_3^{k_3} + x_1^{k_1}x_2^{k_2}x_3^{k_3+1}\Big) = \sum_{\substack{a_1\ge0,\ a_2\ge0,\ a_3\ge0\\ a_1+a_2+a_3=n+1}}\binom{n+1}{a_1,a_2,a_3}\,x_1^{a_1}x_2^{a_2}x_3^{a_3},$$
which completes the induction step. Alternatively, expand the product of $n$ factors $(x_1 + \cdots + x_r)$: a term $x_1^{k_1}x_2^{k_2}\cdots x_r^{k_r}$ arises by choosing $x_1$ from $k_1$ of the factors, $x_2$ from $k_2$ of them, and so on, and the number of ways we can do that is exactly the multinomial coefficient $\binom{n}{k_1,k_2,\ldots,k_r}$. This proves the identity (D.11).