
Probability and Computing : Solutions to Selected Exercises

Michael Mitzenmacher and Eli Upfal

Do Not Distribute!!!

(Current) List of Solved Exercises


Chapter 1 : 1.1 1.2 1.4 1.5 1.6 1.9 1.12 1.13 1.15 1.18 1.21 1.22 1.23 1.25 1.26
Chapter 2 : 2.1 2.5 2.6 2.7 2.8 2.10 2.13 2.14 2.15 2.17 2.18 2.19 2.20 2.21 2.24 2.32
Chapter 3 : 3.2 3.3 3.7 3.10 3.11 3.12 3.16 3.17 3.20 3.25
Chapter 4 : 4.3 4.6 4.7 4.9 4.10 4.11 4.12 4.13 4.19 4.21
Chapter 5 : 5.1 5.4 5.7 5.9 5.11 5.16 5.21 5.22 5.23 [Discussion of Exploratory Assignment]
Chapter 6 : 6.2 6.3 6.4 6.5 6.9 6.10 6.13 6.14 6.16
Chapter 7 : 7.2 7.12 7.17 7.20 7.21 7.23 7.26 7.27

We are still in the process of adding solutions. Expect further updates over time.

Chapter 1

Exercise 1.1: We flip a fair coin ten times. Find the probability of the following events.
(a) The number of heads and the number of tails are equal.
(b) There are more heads than tails.
(c) The ith flip and the (11 − i)th flip are the same for i = 1, . . . , 5.
(d) We flip at least four consecutive heads.
Solution to Exercise 1.1:
(a) The probability is \binom{10}{5}/2^{10} = 252/1024 = 63/256.
(b) By part (a), the probability that the number of heads and the number of tails is different is 193/256.
By symmetry, the probability that there are more heads than tails is 1/2 of this, or 193/512.
(c) The probability that each pair is the same is 1/2; by independence, the probability that all five pairs
are the same is 1/32.
(d) While there are other ways of solving this problem, with only 1024 possibilities, perhaps the easiest
way is just to exhaustively consider all 1024 possibilities (by computer!). This gives 251/1024.
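For part (d), a minimal brute-force check in Python, in the spirit of the exhaustive enumeration suggested above:

```python
# Count the 10-flip sequences that contain a run of at least four consecutive heads.
count = 0
for outcome in range(1 << 10):        # each bit of 'outcome' is one flip: 1 = heads
    if '1111' in format(outcome, '010b'):
        count += 1
print(count, "/", 1 << 10)            # 251 / 1024
```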
Exercise 1.2: We roll two standard six-sided dice. Find the probability of the following events, assuming
that the outcomes of the rolls are independent.
(a) The two dice show the same number.
(b) The number that appears on the first die is larger than the number on the second.
(c) The sum of the dice is even.
(d) The product of the dice is a perfect square.
Solution to Exercise 1.2:
(a) The probability the second die matches the first is just 1/6.
(b) By part (a), the probability that the rolls are different is 5/6. By symmetry, the probability that the
first die is larger is 5/12.
(c) The probability is 1/2. This can be done by considering all possibilities exhaustively. Alternatively,
if the first die comes up with an odd number, the probability is 1/2 that the second die is also odd
and the sum is even. Similarly, if the first die comes up with an even number, the probability is 1/2
that the second die is also even and the sum is even. Regardless of the outcome of the first roll, the
probability is 1/2.
(d) The possible squares are 1, 4, 9, 16, 25, and 36. There is 1 way for the product to be 1, 9, 16, 25, or
36, and 3 ways for the product to be 4. This gives a total probability of 8/36 = 2/9.
Exercise 1.4: We are playing a tournament in which we stop as soon as one of us wins n games. We are
evenly matched, so each one of us wins any game with probability 1/2, independently of other games. What
is the probability that the loser has won k games when the match is over?
Solution to Exercise 1.4: Suppose that you win, and I have won k < n games. For this to happen, you
must win the last game, and in the remaining n+ k − 1 games, I must win exactly k of them. This probability
is just

\binom{n+k-1}{k} / 2^{n+k}.

Of course, I could win, which has the same probability by symmetry. Hence the total desired probability is
\binom{n+k-1}{k} / 2^{n+k-1}.

Exercise 1.5: After lunch one day, Alice suggests to Bob the following method to determine who pays.
Alice pulls three six-sided dice from her pocket. These dice are not the standard dice, but have the following
numbers on their faces:
• Die A: 1,1,6,6,8,8
• Die B: 2,2,4,4,9,9
• Die C: 3,3,5,5,7,7
The dice are fair, so each side comes up with equal probability. Alice explains that Alice and Bob will each
pick up one of the dice. They will each roll their die, and the one who rolls the lowest number loses and will
buy lunch. So as to take no advantage, Alice offers Bob the first choice of the dice.

(a) Suppose that Bob chooses Die A and Alice chooses Die B. Write out all of the possible events and their
probabilities, and show that the probability that Alice wins is bigger than 1/2.
(b) Suppose that Bob chooses Die B and Alice chooses Die C. Write out all of the possible events and their
probabilities, and show that the probability that Alice wins is bigger than 1/2.
(c) Since Die A and Die B lead to situations in Alice’s favor, it would seem that Bob should choose Die
C. Suppose that Bob chooses Die C and Alice chooses Die A. Write out all of the possible events and
their probabilities, and show that the probability that Alice wins is still bigger than 1/2.

Solution to Exercise 1.5: By enumerating all cases, we find the second player wins with probability 5/9 in
all cases. For example, if Bob chooses Die A and Alice chooses Die B, there are nine equally likely outcomes:

(1, 2), (1, 4), (1, 9), (6, 2), (6, 4), (6, 9), (8, 2), (8, 4), (8, 9).

By counting, we see Alice wins in five of the nine situations.


It might seem odd that Alice always has the advantage. There is a literature on this subject, which is known as
non-transitive dice.
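A short Python enumeration of the nine equally likely face pairs for each pairing (the die faces come straight from the exercise), confirming the 5/9 figure:

```python
from itertools import product

dice = {'A': [1, 1, 6, 6, 8, 8], 'B': [2, 2, 4, 4, 9, 9], 'C': [3, 3, 5, 5, 7, 7]}

# Bob chooses first; Alice answers A -> B, B -> C, C -> A.
for bob, alice in [('A', 'B'), ('B', 'C'), ('C', 'A')]:
    wins = sum(a > b for b, a in product(dice[bob], dice[alice]))
    print(f"Bob {bob} vs Alice {alice}: Alice wins {wins}/36")   # 20/36 = 5/9 in each case
```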
Exercise 1.6: Consider the following balls and bin game. We start with 1 black ball and 1 white ball in a
bin. We repeatedly do the following: we choose one ball from the bin uniformly at random, and put the ball
back in the bin with another ball of the same color. We repeat until there are n balls in the bin. Show that
the number of white balls is equally likely to be any number between 1 and n − 1.
Solution to Exercise 1.6: Let E_{n,k} be the event that there are k white balls when the bin has n balls. We want to show that Pr[E_{n,k}] = 1/(n − 1) when 1 ≤ k < n and 0 otherwise. We prove this by induction on n. The
claim holds when n = 2. Suppose it holds for n = m. For n = m + 1, condition on the state after m balls.
For Em+1,k to occur, either Em,k or Em,k−1 occurs. In the first case, we must choose a black ball last (with
probability (m − k)/m). In the second case, we must choose a white ball last (with probability (k − 1)/m).
By the inductive hypothesis,

Pr[E_{m+1,k}] = Pr[E_{m+1,k} | E_{m,k}] \cdot Pr[E_{m,k}] + Pr[E_{m+1,k} | E_{m,k-1}] \cdot Pr[E_{m,k-1}]
    = \frac{m-k}{m} \cdot \frac{1}{m-1} + \frac{k-1}{m} \cdot \frac{1}{m-1} = \frac{1}{m},

as desired. Since \sum_{k=1}^{m} Pr[E_{m+1,k}] = 1, we have Pr[E_{m+1,k}] = 0 for all other values of k.
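A quick simulation of the urn process (a Python sketch; n = 10 and the trial count are arbitrary choices) illustrating the uniform distribution of the white-ball count:

```python
import random
from collections import Counter

def polya_urn(n):
    """Start with 1 white and 1 black ball; draw uniformly and return it with another of the same color."""
    white, total = 1, 2
    while total < n:
        if random.random() < white / total:
            white += 1
        total += 1
    return white

n, trials = 10, 100_000
counts = Counter(polya_urn(n) for _ in range(trials))
for k in range(1, n):
    print(k, counts[k] / trials)      # each frequency should be close to 1/(n-1) = 1/9
```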
Exercise 1.9: Suppose that a fair coin is flipped n times. For k > 0, find an upper bound on the probability
that there is a sequence of log2 n + k consecutive heads.

Solution to Exercise 1.9: Let us assume n is a power of 2. The probability of having \log_2 n + k consecutive heads starting from the jth flip is

\frac{1}{2^{\log_2 n + k}} = \frac{1}{2^k n}.

Here j can range from 1 to n − \log_2 n − k + 1; for j > n − \log_2 n − k + 1, there are not enough flips to have \log_2 n + k heads in a row before reaching the nth flip. Using a union bound, we have an upper bound of

\frac{n - \log_2 n - k + 1}{2^k n} \le \frac{1}{2^k}.
Exercise 1.12: The following problem is known as the Monty Hall problem, after the host of the game
show “Let’s Make a Deal.” There are three curtains. Behind one curtain is a new car, and behind the other
two are goats. The game is played as follows. The contestant chooses the curtain that she thinks the car is
behind. Monty then opens one of the other curtains to show a goat. (Monty may have more than one goat
to choose from; in this case, assume he chooses which goat to show uniformly at random.) The contestant
can then stay with the curtain she originally chose or switch to the other unopened curtain. After that, the
location of the car is revealed, and the contestant wins the car or the remaining goat. Should the contestant
switch curtains or not, or does it make no difference?
Solution to Exercise 1.12:
We assume that the car is behind a curtain chosen uniformly at random. Since the contestant can pick
from 1 car and 2 goats, the probability of choosing a curtain with a car behind it is 1/3 and that of choosing
a curtain with a goat behind it is 2/3; if the contestant doesn’t switch, the probability of winning is just 1/3.
Now notice that if the contestant switches, she will win whenever she started out by picking a goat! It
follows that switching wins 2/3 of the time, and switching is a much better strategy.
This can also be set up as a conditional probability question, but the above argument is perhaps the
simplest; or simply playing the game several times (using cards – an Ace for the car, deuces for the goats)
can be very convincing.
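In the spirit of "playing the game several times", a small Python simulation (the trial count is arbitrary):

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a curtain that hides a goat and is not the contestant's pick.
        opened = random.choice([c for c in range(3) if c != pick and c != car])
        if switch:
            pick = next(c for c in range(3) if c != pick and c != opened)
        wins += (pick == car)
    return wins / trials

print("stay:  ", play(switch=False))   # about 1/3
print("switch:", play(switch=True))    # about 2/3
```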
Exercise 1.13: A medical company touts its new test for a certain genetic disorder. The false negative rate
is small: if you have the disorder, the probability that the test returns a positive result is 0.999. The false
positive rate is also small: if you do not have the disorder, the probability that the test returns a positive
result is only 0.005. Assume that 2 percent of the population has the disorder. If a person chosen uniformly
from the population is tested, and the result comes back positive, what is the probability that the person
has the disorder?
Solution to Exercise 1.13: Let X be the event that the person has the disorder, and Y be the event that
the test result is positive. Then
Pr(X | Y) = \frac{Pr(X \cap Y)}{Pr(Y)} = \frac{Pr(X \cap Y)}{Pr(X \cap Y) + Pr(\bar{X} \cap Y)}.

We now plug in the appropriate numbers:

Pr(X | Y) = \frac{0.02 \cdot 0.999}{0.02 \cdot 0.999 + 0.98 \cdot 0.005} = \frac{999}{1244} \approx 0.803.
Exercise 1.15: Suppose that we roll ten standard six-sided dice. What is the probability that their sum
will be divisible by 6, assuming that the rolls are independent? (Hint: use the principle of deferred decisions,
and consider the situation after rolling all but one of the dice.)
Solution to Exercise 1.15: The answer is 1/6. Using the principle of deferred decisions, consider the
situation after the first 9 die rolls. The remainder when dividing the sum of the nine rolls by 6 takes on one
of the values 0, 1, 2, 3, 4, or 5. If the remainder is 0, the last roll needs to be a 6 to have the final sum be
divisible by 6; if the remainder is 1, the last roll needs to be a 5 to have the final sum be divisible by 6; and
so on. For any value of the remainder, there is exactly one roll which will make the final sum divisible by 6,
so the probability is 1/6. (If desired, this can be made more formal using conditional probability.)
Exercise 1.18: We have a function F : {0, . . . , n − 1} → {0, . . . , m − 1}. We know that for 0 ≤ x, y ≤ n − 1,
F ((x + y) mod n) = (F (x) + F (y)) mod m. The only way we have to evaluate F is to use a look-up table

that stores the values of F . Unfortunately, an Evil Adversary has changed the value of 1/5 of the table
entries when we were not looking.
Describe a simple randomized algorithm that, given an input z, outputs a value that equals F (z) with
probability at least 1/2. Your algorithm should work for every value of z, regardless of what values the
Adversary changed. Your algorithm should use as few lookups and as little computation as possible.
Suppose I allow you to repeat your initial algorithm three times. What should you do in this case, and
what is the probability that your enhanced algorithm returns the correct answer?
Solution to Exercise 1.18: Choose x uniformly at random and let y = (z − x) mod n. Look up F(x) and F(y) and output (F(x) + F(y)) mod m. By the union bound, the probability of an error is bounded by the sum of the probability that F(x) was changed and the probability that F(y) was changed. This sum is at most 2/5, so (F(x) + F(y)) mod m = F(z) with probability at least 3/5. The algorithm uses two lookups and one random
choice. Notice that the probability that F (x) and F (y) are both correct is not (4/5)2 ; this calculation
assumes independence. Only x is chosen randomly, while y is computed from x, so (x, y) is not a random
pair.
Suppose we run the algorithm three times independently and return the majority answer (if there is
one). This scheme returns the correct answer if at least two of the trials return the correct answer. By independence, exactly two trials are correct with probability at least 3 · (3/5)^2 · (2/5) = 54/125, and all three trials are correct with probability at least (3/5)^3 = 27/125, so the probability of a correct answer is at least 81/125.
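A sketch of the scheme in Python. The table size n, range m, the linear function used for F, and the corruption step are all hypothetical stand-ins; only the query logic follows the solution above.

```python
import random
from collections import Counter

n, m = 100, 10
table = [x % m for x in range(n)]           # F(x) = x mod 10 satisfies F((x+y) mod n) = (F(x)+F(y)) mod m
for i in random.sample(range(n), n // 5):   # the adversary changes 1/5 of the entries
    table[i] = random.randrange(m)

def query(z):
    """One randomized evaluation: equals F(z) with probability at least 3/5."""
    x = random.randrange(n)
    y = (z - x) % n
    return (table[x] + table[y]) % m

def query3(z):
    """Majority vote over three independent runs: correct with probability at least 81/125."""
    votes = Counter(query(z) for _ in range(3))
    value, count = votes.most_common(1)[0]
    return value if count >= 2 else query(z)   # no majority: fall back to one more run
```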
Exercise 1.21: Give an example of three random events X, Y , Z for which any pair is independent but
all three are not mutually independent.
Solution to Exercise 1.21: The standard example is to let X and Y be independent random bits (0 with
probability 1/2, 1 with probability 1/2) and let Z be their exclusive-or (equivalently, their sum modulo 2).
Independence follows from calculations such as

Pr(Z = 0 | X = 0) = Pr(Y = 0) = 1/2,

so X is independent from Z, and similarly Y is. The three values are clearly not independent since Z is
determined by X and Y . This construction generalizes; if Z is the exclusive-or of independent random bits
X1 , . . . , Xn , then any collection of n of the variables are independent, but the n + 1 variables are not.
Exercise 1.22:
(a) Consider the set {1, . . . , n}. We generate a subset X of this set as follows: a fair coin is flipped
independently for each element of the set; if the coin lands heads, the element is added to X, and
otherwise it is not. Argue that the resulting set X is equally likely to be any one of the 2^n possible
subsets.
(b) Suppose that two sets X and Y are chosen independently and uniformly at random from all the 2^n
subsets of {1, . . . , n}. Determine Pr(X ⊆ Y ) and Pr(X ∪ Y = {1, . . . , n}). (Hint: use the first part of
this problem.)
Solution to Exercise 1.22: For the first part, there are 2^n possible outcomes for the n flips, and each gives a different set; since there are 2^n sets, each set must come up with probability 1/2^n.
For the second part, suppose we choose sets X and Y using the coin flipping method given in the first part. In order for X ⊆ Y, there must be no element that is in X but not in Y. For each of the n elements, the probability that it is in X but not in Y is 1/4, and these events are independent across elements. Hence the probability that X ⊆ Y is (1 − 1/4)^n = (3/4)^n. Similarly, the probability that X ∪ Y contains all elements is (3/4)^n, since the probability that each element is in X or Y (or both) is 3/4 independently for each item.
Exercise 1.23: There may be several different min-cut sets in a graph. Using the analysis of the randomized
min-cut algorithm, argue that there can be at most n(n − 1)/2 distinct min-cut sets.
Solution to Exercise 1.23: We have found that the probability that any specific min-cut set is returned
is at least 2/(n(n − 1)). If there are k distinct min-cut sets, the probability that any of them is returned is
2k/(n(n − 1)), and we must have
\frac{2k}{n(n-1)} \le 1.

We conclude that k ≤ n(n − 1)/2, so there are at most n(n − 1)/2 distinct min-cut sets.
Exercise 1.25: To improve the probability of success of the randomized min-cut algorithm, it can be run
multiple times.
(a) Consider running the algorithm twice. Determine the number of edge contractions and bound the
probability of finding a min-cut.
(b) Consider the following variation. Starting with a graph with n vertices, first contract the graph down to
k vertices using the randomized min-cut algorithm. Make ℓ copies of the graph with k vertices, and now
run the randomized algorithm on this reduced graph ℓ times, independently. Determine the number
of edge contractions and bound the probability of finding a minimum cut.
(c) Find optimal (or at least near-optimal) values of k and ℓ for the variation above that maximize the
probability of finding a minimum cut while using the same number of edge contractions as running the
original algorithm twice.
Solution to Exercise 1.25:
(a) There are 2n − 4 edge contractions. The probability of success if each run is independent is at least:

1 - \left(1 - \frac{2}{n(n-1)}\right)^2 = \frac{4(n^2 - n - 1)}{n^2(n-1)^2}.

(b) First consider the probability that no edge from the min-cut is contracted in the first n − k contraction
steps. In the notation of the book, this is

Pr(F_{n-k}) = Pr(E_{n-k} | F_{n-k-1}) \cdot Pr(E_{n-k-1} | F_{n-k-2}) \cdots Pr(E_2 | F_1) Pr(F_1)
    = \prod_{i=1}^{n-k} \frac{n-i-1}{n-i+1}
    = \frac{k(k-1)}{n(n-1)}.

Now consider the last k − 2 contractions. The probability that no min-cut edge is contracted in these
steps is:

Pr(F_{n-2} | F_{n-k}) = Pr(E_{n-2} | F_{n-3}) \cdot Pr(E_{n-3} | F_{n-4}) \cdots Pr(E_{n-k+1} | F_{n-k})
    = \prod_{i=n-k+1}^{n-2} \frac{n-i-1}{n-i+1}
    = \frac{2}{k(k-1)}.
Alternatively, this follows by thinking of the subproblem as a case of running the original min-cut
algorithm on k vertices.
It follows that if we repeat the last k − 2 contractions independently ℓ times, the probability of success is at least

Pr(F_{n-k}) \cdot \left(1 - (1 - Pr(F_{n-2} | F_{n-k}))^{\ell}\right) \ge \frac{k(k-1)}{n(n-1)} \left(1 - \left(1 - \frac{2}{k(k-1)}\right)^{\ell}\right).

(c) Assuming that k and n are large enough so that we can ignore constants, the expression from the
previous part is approximately

\frac{k^2}{n^2} \left(1 - \left(1 - \frac{2}{k^2}\right)^{\ell}\right),

and we require 2n − 4 = (n − k) + ℓ(k − 2), or roughly speaking ℓk ≈ n. We note that when k = c_1 n^{1/3} and ℓ = c_2 n^{2/3} the resulting probability of success is at least Ω(n^{-4/3}), and this is essentially the best possible. This can be checked by various programs like Mathematica. Alternatively, note that when ℓ < k^2, as an approximation, we have

1 - \left(1 - \frac{2}{k^2}\right)^{\ell} \approx \frac{2\ell}{k^2},

and hence the success probability is at least (approximately) 2ℓ/n^2, suggesting ℓ should be as large as possible. However, once ℓ is large enough so that (1 − (1 − 2/k^2)^{\ell}) is constant, increasing ℓ can only improve this term by at most a constant factor, suggesting k should be as large as possible. This gives the tradeoff giving the result.
Exercise 1.26: Tic-tac-toe always ends up in a tie if players play optimally. Instead, we may consider
random variations of tic-tac-toe.
(a) First variation: Each of the nine squares is labeled either X or O according to an independent and
uniform coin flip. If only one of the players has one (or more) winning tic-tac-toe combinations, that
player wins. Otherwise, the game is a tie. Determine the probability that X wins. (You may want to
use a computer program to help run through the configurations.)
(b) Second variation: X and O take turns, with the X player going first. On the X player’s turn, an X
is placed on a square chosen independently and uniformly at random from the squares that are still
vacant; O plays similarly. The first player to have a winning tic-tac-toe combination wins the game,
and a tie occurs if neither player achieves a winning combination. Find the probability that each player
wins. (Again, you may want to write a program to help you.)
Solution to Exercise 1.26: For the first variation, out of 2^9 = 512 possible games, there are 116 ties. By
symmetry, X and O win with equal probability, so each wins 198 of the possible games, giving a winning
probability of roughly .386.
For the second variation, there are 255168 possible games. X wins 131184; O wins 77904, leaving 46080 ties. However, shorter games occur with higher probability. Specifically, a game with k moves occurs with probability 1/(9 · 8 · · · (9 − k + 1)). Taking this into account, X wins with probability roughly 0.585, and O wins with probability roughly 0.288.
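For the first variation, a brute-force Python enumeration over all 2^9 labelings (using the standard list of winning lines):

```python
from itertools import product

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def has_line(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)

x_wins = o_wins = ties = 0
for board in product('XO', repeat=9):
    x, o = has_line(board, 'X'), has_line(board, 'O')
    if x and not o:
        x_wins += 1
    elif o and not x:
        o_wins += 1
    else:
        ties += 1
print(x_wins, o_wins, ties)   # 198 198 116
```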

Chapter 2
Exercise 2.1: Suppose we roll a fair k-sided die with the numbers 1 through k on the die’s faces. If X is
the number that appears, what is E[X]?
Solution to Exercise 2.1:
E[X] = \sum_{i=1}^{k} \frac{1}{k} \cdot i = \frac{k(k+1)}{2k} = \frac{k+1}{2}.

Exercise 2.5: If X is a B(n, 1/2) random variable with n ≥ 1, show that the probability that X is even is
1/2.
Solution to Exercise 2.5: This can be proven by induction. It is trivially true for n = 1. Now note that a
B(n, 1/2) random variable corresponds to the number of heads in n fair coin flips. Consider n flips. By the
induction hypothesis, the number Y of heads after the first n − 1 of these coin flips, which has distribution
B(n − 1, 1/2), is even with probability 1/2. Flipping one more coin, we find the probability that the total number of heads is even is
(1/2) · Pr(Y is even) + (1/2) · Pr(Y is odd) = 1/2.
In fact, notice we don’t even need the induction! Even if just the last flip is fair, the same equation holds,
and since Pr(Y is even) + Pr(Y is odd) = 1 the result holds.
Exercise 2.6: Suppose that we independently roll two standard six-sided dice. Let X1 be the number that
shows on the first die, X2 the number on the second die, and X the sum of the numbers on the two dice.
(a) What is E[X | X1 is even]?
(b) What is E[X | X1 = X2 ]?
(c) What is E[X1 | X = 9]?
(d) What is E[X1 − X2 | X = k] for k in the range [2, 12]?
Solution to Exercise 2.6:
(a) What is E[X | X1 is even]?
E[X | X_1 is even] = \sum_{k=2}^{12} k \cdot Pr(X = k | X_1 is even)
    = 3 \cdot \frac{1}{18} + 4 \cdot \frac{1}{18} + 5 \cdot \frac{2}{18} + \cdots + 12 \cdot \frac{1}{18}
    = 7.5.
There are other ways of calculating this; for example
E[X | X1 is even] = E[X1 + X2 | X1 is even]
= E[X1 | X1 is even] + E[X2 | X1 is even]
= 4 + 3.5.

(b) What is E[X | X1 = X2 ]?


E[X | X_1 = X_2] = \sum_{i=1}^{6} 2i \cdot \frac{1}{6} = 7.

(c) What is E[X1 | X = 9]?


There are only 4 combinations giving X = 9, all equally likely; over these 4 combinations, X1 takes on
the values from 3 to 6. Hence
E[X_1 | X = 9] = \sum_{k} k \cdot Pr(X_1 = k | X = 9) = \frac{1}{4}(3 + 4 + 5 + 6) = 4.5.

(d) What is E[X1 − X2 | X = k] for k in the range [2, 12]?
Note that E[X1 − X2 | X = k] = E[X1 | X = k] − E[X2 | X = k]. But this difference is 0 by symmetry.
Exercise 2.7: Let X and Y be independent geometric random variables, where X has parameter p and Y
has parameter q.
(a) What is the probability that X = Y ?

(b) What is E[max(X, Y )]?


(c) What is Pr(min(X, Y ) = k)?
(d) What is E[X | X ≤ Y ]?
You may find it helpful to keep in mind the memoryless property of geometric random variables.
Solution to Exercise 2.7:
(a) Think of flipping biased coins to determine X and Y (according to the first head for each). The
probability that X and Y both equal 1 is pq. The probability that X = 1 and Y > 1 is p(1 − q), and
the probability that X > 1 and Y = 1 is q(1 − p). Hence if the outcome is decided by the first flips of
the coins, we have

Pr(X = Y ) = pq/(pq + (1 − p)q + (1 − q)p) = pq/(1 − (1 − p)(1 − q)).

If X > 1 and Y > 1, then by the memoryless property, the distributions of X and Y are as if we are
just beginning again. Hence, whenever we stop, we must have

Pr(X = Y ) = pq/(pq + (1 − p)q + (1 − q)p) = pq/(1 − (1 − p)(1 − q)).

(b) Think of flipping biased coins to determine X and Y . Let Z = max(X, Y ). Consider the four cases for
the outcomes from the first flip of X and Y . For example, if X > 1 and Y > 1, then by the memoryless
property the remaining distribution for the number of flips until the first head for each coin remains
geometric. If X = 1 and Y > 1 then we only consider the number of flips until the first head for the
second coin. Following this logic for all four cases, we have:

E[Z] = (pq) · 1 + p(1 − q)(1 + E[Y ]) + (1 − p)q(1 + E[X]) + (1 − p)(1 − q)(1 + E[Z])
= 1 + p(1 − q)/q + q(1 − p)/p + (1 − p)(1 − q)E[Z].

This gives
E[Z] = (1 + p(1 − q)/q + q(1 − p)/p)/(1 − (1 − p)(1 − q)).

(c) What is Pr(min(X, Y ) = k)? Think of flipping biased coins to determine X and Y . On the first flip,
at least one of the two coins is heads with probability s = 1 − (1 − p)(1 − q). If not, by the memoryless
property, the distribution of flips until the first head for each coin remains geometric. It follows that
min(X, Y ) is itself geometric with parameter s, giving the distribution.
(d) Using the now familiar logic of considering whether Y ≥ X is determined by the first flip of respective
coins, we find that

Pr(Y ≥ X) = (pq + p(1 − q))/(pq + (1 − p)q + (1 − q)p) = p/(1 − (1 − p)(1 − q)).

The expectation is then

E[X | X \le Y] = \sum_{x=1}^{\infty} x \, Pr(X = x | Y \ge X)
    = \sum_{x=1}^{\infty} x \, Pr(Y \ge X = x) / Pr(Y \ge X)
    = \sum_{x=1}^{\infty} x \, \frac{(1-p)^{x-1} p (1-q)^{x-1}}{p/(1 - (1-p)(1-q))}
    = (1 - (1-p)(1-q)) \sum_{x=1}^{\infty} x ((1-p)(1-q))^{x-1}
    = \frac{1}{1 - (1-p)(1-q)}.
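A quick sanity check of parts (a), (b), and (d) by simulation (Python sketch; the parameters p = 0.3, q = 0.5 and the trial count are arbitrary):

```python
import random

def geometric(p):
    """Flips of a p-coin up to and including the first head."""
    flips = 1
    while random.random() >= p:
        flips += 1
    return flips

p, q, trials = 0.3, 0.5, 200_000
samples = [(geometric(p), geometric(q)) for _ in range(trials)]
s = 1 - (1 - p) * (1 - q)

print(sum(x == y for x, y in samples) / trials, p * q / s)                      # part (a)
print(sum(max(x, y) for x, y in samples) / trials,
      (1 + p * (1 - q) / q + q * (1 - p) / p) / s)                              # part (b)
below = [x for x, y in samples if x <= y]
print(sum(below) / len(below), 1 / s)                                           # part (d)
```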

Exercise 2.8:
(a) Alice and Bob decide to have children until either they have their first girl or they have k ≥ 1
children. Assume that each child is a boy or girl independently with probability 1/2, and that there
are no multiple births. What is the expected number of female children that they have? What is the
expected number of male children that they have?
(b) Suppose Alice and Bob simply decide to keep having children until they have their first girl. Assuming
that this is possible, what is the expected number of boy children that they have?
Solution to Exercise 2.8:
(a) With probability 1/2^k, they have 0 girls; with probability 1 − 1/2^k, they have 1 girl. Hence the expected number of female children is 1 − 1/2^k. The probability that they have at least i ≥ 1 boys is 1/2^i, since this occurs whenever the first i children are boys. Hence the expected number of male children is

\sum_{i=1}^{k} \frac{1}{2^i} = 1 - \frac{1}{2^k}.

The two values match.


(b) Extending the argument from the previous part, the expected number of boys is


\sum_{i=0}^{\infty} \frac{1}{2^{i+1}} = 1.

Hence, even if they have kids until their first girl, on average they’ll have just 1 boy.
Exercise 2.10:
(a) Show by induction that if f : R → R is convex then, for any x_1, x_2, . . . , x_n and λ_1, λ_2, . . . , λ_n with \sum_{i=1}^{n} λ_i = 1,

f\left(\sum_{i=1}^{n} λ_i x_i\right) \le \sum_{i=1}^{n} λ_i f(x_i).

(b) Use Eqn. (2.2) to prove that if f : R → R is convex then

E[f (X)] ≥ f (E[X])

for any random variable X that takes on only finitely many values.

Solution to Exercise 2.10:
(a) This is done by induction. The case n = 1 is trivial; the case n = 2 is just the definition of convexity.
Assume that the statement holds for some value n. Consider now x_1, x_2, . . . , x_n, x_{n+1} and λ_1, λ_2, . . . , λ_n, λ_{n+1} with \sum_{i=1}^{n+1} λ_i = 1. If λ_{n+1} = 1 the formula holds trivially, so we may assume λ_{n+1} < 1. Let γ_i = λ_i/(1 − λ_{n+1}). Now
f\left(\sum_{i=1}^{n+1} λ_i x_i\right) = f\left(\sum_{i=1}^{n} λ_i x_i + λ_{n+1} x_{n+1}\right)
    = f\left((1 - λ_{n+1}) \sum_{i=1}^{n} γ_i x_i + λ_{n+1} x_{n+1}\right)
    \le (1 - λ_{n+1}) f\left(\sum_{i=1}^{n} γ_i x_i\right) + λ_{n+1} f(x_{n+1}).

The last line comes from convexity of f (the case of n = 2 items). But now by the induction hypothesis
(1 - λ_{n+1}) f\left(\sum_{i=1}^{n} γ_i x_i\right) + λ_{n+1} f(x_{n+1}) \le (1 - λ_{n+1}) \sum_{i=1}^{n} γ_i f(x_i) + λ_{n+1} f(x_{n+1})
    = \sum_{i=1}^{n} λ_i f(x_i) + λ_{n+1} f(x_{n+1})
    = \sum_{i=1}^{n+1} λ_i f(x_i),

completing the induction.


(b) Let x1 , . . . , xn be the set of values that X takes on and let λi = Pr(X = xi ). By part (a) we have
E[f(X)] = \sum_{i=1}^{n} f(x_i) Pr(X = x_i) \ge f\left(\sum_{i=1}^{n} x_i Pr(X = x_i)\right) = f(E[X]).

Exercise 2.13:
(a) Consider the following variation of the coupon collector’s problem. Each box of cereal contains one
of 2n different coupons. The coupons are organized into n pairs, so that coupons 1 and 2 are a pair,
coupons 3 and 4 are a pair, and so on. Once you obtain one coupon from every pair, you can obtain a
prize. Assuming that the coupon in each box is chosen independently and uniformly at random from
the 2n possibilities, what is the expected number of boxes you have to buy before you can claim the
prize?
(b) Generalize the result of the problem above for the case where there are kn different coupons, organized
into n disjoint sets of k coupons, so that you need one coupon from every set.
Solution to Exercise 2.13:
(a) Let X_i be the number of boxes bought, while having a coupon from i − 1 distinct pairs, to obtain a coupon from an ith distinct pair. Then X = \sum_{i=1}^{n} X_i is the desired number of boxes. Each X_i is a geometric random variable, with p_i = 1 − (2(i − 1))/2n = 1 − (i − 1)/n. Hence this has the same behavior as the coupon collector's problem, and the same expectation, nH(n).
(b) Nothing changes; X_i is now the number of boxes needed to go from i − 1 groups to i groups, and again p_i = 1 − (k(i − 1))/kn = 1 − (i − 1)/n.

Exercise 2.14: The geometric distribution arises as the distribution of the number of times we flip a coin
until it comes up heads. Consider now the distribution of the number of flips X until the kth head appears,
where each coin flip comes up heads independently with probability p. Prove that this distribution is given
by

Pr(X = n) = \binom{n-1}{k-1} p^k (1 - p)^{n-k}

for n ≥ k. (This is known as the negative binomial distribution.)
Solution to Exercise 2.14: In order for X = n, the nth flip must be heads, and from the other n − 1 flips,
we must choose exactly k − 1 of them to be heads. Hence
   
Pr(X = n) = p \cdot \binom{n-1}{k-1} p^{k-1} (1 - p)^{n-k} = \binom{n-1}{k-1} p^k (1 - p)^{n-k}.

Exercise 2.15: For a coin that comes up heads independently with probability p on each flip, what is the
expected number of flips until the k-th head?
Solution to Exercise 2.15: Let X_i be the number of flips between the (i − 1)st and ith heads. Then each X_i is geometric with expectation 1/p. Let X = \sum_{i=1}^{k} X_i be the number of flips until the kth head. By linearity of expectations, E[X] = \sum_{i=1}^{k} E[X_i] = k/p. Note that this gives the expectation of the negative
binomial distribution of Exercise 2.14. (The expectation of the negative binomial could also be computed
directly, but it is more work.)
Exercise 2.17: Recall the recursive spawning process described in Section 2.3. Suppose that each call
to process S recursively spawns new copies of the process S, where the number of new copies is 2 with
probability p and 0 with probability 1 − p. If Yi denotes the number of copies of S in the i-th generation,
determine E[Yi ]. For what values of p is the expected total number of copies bounded?
Solution to Exercise 2.17: Let Y_i be the number of processes in generation i, and similarly for Y_{i−1}. Let Z_k be the number of processes spawned by the kth process in generation i − 1. We have

E[Y_i | Y_{i-1} = y_{i-1}] = E\left[\sum_{k=1}^{y_{i-1}} Z_k\right] = \sum_{k=1}^{y_{i-1}} E[Z_k] = 2 y_{i-1} p.

Further,

E[Y_i] = E[E[Y_i | Y_{i-1}]] = E[2 Y_{i-1} p] = 2p E[Y_{i-1}].

Inductively, since Y_0 = 1, we have E[Y_i] = (2p)^i. The expected total number of processes spawned is

E\left[\sum_{i=0}^{\infty} Y_i\right] = \sum_{i=0}^{\infty} E[Y_i] = \sum_{i=0}^{\infty} (2p)^i,

which is finite when 2p < 1 (or p < 1/2).


Exercise 2.18: The following approach is often called reservoir sampling. Suppose we have a sequence of
items passing by one at a time. We want to maintain a sample of one item that has the property that it is
uniformly distributed over all the items that we have seen at each step. Moreover, we want to accomplish
this without knowing the total number of items in advance or storing all of the items that we see.
Consider the following algorithm, which stores just one item in memory at all times. When the first
item appears, it is stored in the memory. When the kth item appears, it replaces the item in memory with
probability 1/k. Explain why this algorithm solves the problem.
Solution to Exercise 2.18: The proof is by induction. Clearly after the first item appears, the item in
memory is uniform over all the items that have appeared. After the kth item appears, with probability 1 − 1/k = (k − 1)/k the item in memory is uniform over the first k − 1 items, by the induction hypothesis.
Hence each of the first k − 1 items is in the memory with probability 1/k, and by construction so is the kth
item, completing the induction.
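A minimal Python sketch of the algorithm just described (the stream here is a range, but any iterable of unknown length works):

```python
import random

def reservoir_sample(stream):
    """Keep a single item that is uniform over all items seen so far at every step."""
    sample = None
    for k, item in enumerate(stream, start=1):
        if random.randrange(k) == 0:    # replace with probability 1/k (k = 1 always replaces)
            sample = item
    return sample

print(reservoir_sample(range(1000)))
```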
Exercise 2.19: Suppose that we modify the reservoir sampling algorithm of Exercise 2.18 so that, when
the kth item appears, it replaces the item in memory with probability 1/2. Explain what the distribution
of the item in memory looks like.
Solution to Exercise 2.19: By induction, we can again show that the probability that the jth item is
in memory after the kth item has appeared is 2^{j−k−1}, except for the first item, which is in memory with probability 2^{−k+1}. (Note j ≥ 1.)
Exercise 2.20: A permutation on the numbers [1, n] can be represented as a function π : [1, n] → [1, n],
where π(i) is the position of i in the ordering given by the permutation. A fixed point of a permutation
π : [1, n] → [1, n] is a value for which π(x) = x. Find the expected number of fixed points of a permutation
chosen uniformly at random from all permutations.
Solution to Exercise 2.20: Let Xi = 1 if i is a fixed point and 0 otherwise. As each element is a fixed
point with probability 1/n in a random permutation we have E[X_i] = 1/n. Let X = \sum_{i=1}^{n} X_i be the number of fixed points. By linearity of expectations,

E[X] = \sum_{i=1}^{n} E[X_i] = n/n = 1.

Exercise 2.21: Let a1 , a2 , . . . , an be a random permutation of {1, 2, . . . , n}, equally likely to be any of the
n! possible permutations. When sorting the list a1 , a2 , . . . , an , the element ai has to move a distance of
|ai − i| places from its current position to reach its position in the sorted order. Find
E\left[\sum_{i=1}^{n} |a_i - i|\right],

the expected total distance elements will have to be moved.


Solution to Exercise 2.21: There are many ways to perform the calculation, but the following approach
is appealing. Consider a permutation chosen at random, and let Xij = |j − i| if in this permutation j = ai ,
and 0 otherwise. Then
E\left[\sum_{i=1}^{n} |a_i - i|\right] = E\left[\sum_{1 \le i,j \le n} X_{ij}\right] = \sum_{1 \le i,j \le n} E[X_{ij}] = \sum_{1 \le i,j \le n} \frac{|j - i|}{n}.

The first equality follows from the definition of Xij , the second from the linearity of expectations, and the
third from the fact that Xij = |j − i| precisely with probability 1/n if the permutation is random. Now to
simplify this summation, note that over the range of i and j values, |j − i| takes on the value 1 exactly
2(n − 1) times, the value 2 exactly 2(n − 2) times, and so on; hence the sum equals


\sum_{i=1}^{n} \frac{2i(n-i)}{n}.

With some algebra, this simplifies to


\frac{n^2 - 1}{3}.
Exercise 2.24: We roll a standard fair die over and over. What is the expected number of rolls until the
first pair of consecutive sixes appears? (Hint: the answer is not 36.)
Solution to Exercise 2.24: One way to think of this problem is the following. Let X be the number of
rolls until we see a six and Y be the number of rolls until we see two consecutive sixes. If the next roll
following this six is also a six, then we have seen a pair of consecutive sixes. If not, then it is as though we must start all over again. Hence, with probability 1/6, Y = X + 1, and with probability 5/6, Y = X + 1 + Y′, where Y′ has the same distribution as Y. Hence

E[Y] = \frac{1}{6} E[X + 1] + \frac{5}{6} E[X + 1 + Y′].

Simplifying using linearity of expectations and E[Y′] = E[Y] gives

\frac{1}{6} E[Y] = E[X] + 1.
Now E[X], the expected number of rolls until our first six, equals six. Hence E[Y ] = 42.
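A short simulation (Python sketch; the number of trials is arbitrary) confirming that the answer is 42 rather than 36:

```python
import random

def rolls_until_double_six():
    rolls, prev = 0, 0
    while True:
        rolls += 1
        roll = random.randint(1, 6)
        if roll == 6 and prev == 6:
            return rolls
        prev = roll

trials = 200_000
print(sum(rolls_until_double_six() for _ in range(trials)) / trials)   # close to 42
```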

Chapter 3

Exercise 3.2: Let X be a number chosen uniformly at random from [−k, k]. Find Var[X].
Solution to Exercise 3.2: We have E[X] = 0, so Var[X] = E[X 2 ]. We find


Var[X] = \sum_{i=-k}^{k} \frac{1}{2k+1} i^2
    = \sum_{i=1}^{k} \frac{2}{2k+1} i^2
    = \frac{2}{2k+1} \cdot \frac{k(k+1)(2k+1)}{6}
    = \frac{k(k+1)}{3}.
Exercise 3.3: Suppose that we roll a standard fair die 100 times. Let X be the sum of the numbers that
appear over the 100 rolls. Use Chebyshev’s inequality to bound Pr(|X − 350| ≥ 50).
Solution to Exercise 3.3: For a roll with outcome Y , we have E[Y ] = 7/2 and
Var[Y] = E[Y^2] - E[Y]^2 = \sum_{i=1}^{6} i^2/6 - 49/4 = 35/12.

By linearity of expectations and of variance (when the variables being summed are independent), we have E[X] = 350 and Var[X] = 3500/12 = 875/3. Hence, by Chebyshev's inequality,

Pr(|X - 350| \ge 50) \le \frac{875}{3 \cdot 2500} = \frac{7}{60}.
Exercise 3.7: A simple model of the stock market suggests that, each day, a stock with price q will increase
by a factor r > 1 to qr with probability p and will fall to q/r with probability 1 − p. Assuming we start with
a stock with price 1, find a formula for the expected value and the variance of the price of the stock after d
days.
Solution to Exercise 3.7: Let X_d be the price of the stock after d days. Let Y_i = r if the price increases on the ith day and let Y_i = r^{-1} otherwise. We assume that the Y_i are independent. Now X_d = \prod_{i=1}^{d} Y_i, so

E[X_d] = E\left[\prod_{i=1}^{d} Y_i\right] = \prod_{i=1}^{d} E[Y_i] = \left(pr + \frac{1-p}{r}\right)^d,

where we make use of the independence to simplify the expectation of the product. Similarly

E[X_d^2] = E\left[\prod_{i=1}^{d} Y_i^2\right] = \prod_{i=1}^{d} E[Y_i^2] = \left(pr^2 + \frac{1-p}{r^2}\right)^d.

Hence

Var[X_d] = E[X_d^2] - E[X_d]^2 = \left(pr^2 + \frac{1-p}{r^2}\right)^d - \left(pr + \frac{1-p}{r}\right)^{2d}.
Exercise 3.10: For a geometric random variable X, find E[X^3] and E[X^4]. (Hint: Use Lemma 2.5.)
Solution to Exercise 3.10: Let X be a geometric random variable with parameter p. Let Y = 1 if X = 1, and Y = 0 otherwise. Finally, let X = X′ + 1. By Lemma 2.5, we have

E[X^3] = Pr(Y = 1) E[X^3 | Y = 1] + Pr(Y = 0) E[X^3 | Y = 0]
    = p \cdot 1 + (1 - p) E[(X′ + 1)^3 | Y = 0].

Here we use E[X^3 | Y = 1] = 1 since X = 1 when Y = 1. To simplify the expression further, recall that conditioned on Y = 0, or equivalently that X > 1, X′ has a geometric distribution by the memoryless property. Hence

E[(X′ + 1)^3 | Y = 0] = E[(X + 1)^3],

and we continue

E[X^3] = p \cdot 1 + (1 - p) E[(X′ + 1)^3 | Y = 0]
    = p + (1 - p) E[(X + 1)^3]
    = p + (1 - p) E[X^3] + 3(1 - p) E[X^2] + 3(1 - p) E[X] + (1 - p)
    = (1 - p) E[X^3] + 1 + 3(1 - p)(2 - p)/p^2 + 3(1 - p)/p.

Solving for E[X^3] gives

E[X^3] = 1/p + 3(1 - p)(2 - p)/p^3 + 3(1 - p)/p^2
    = (p^2 - 6p + 6)/p^3.

Similarly, we find

E[X^4] = Pr(Y = 1) E[X^4 | Y = 1] + Pr(Y = 0) E[X^4 | Y = 0]
    = (1 - p) E[X^4] + 1 + 4(1 - p) E[X^3] + 6(1 - p) E[X^2] + 4(1 - p) E[X],

giving

E[X^4] = 1/p + 4(1 - p)(p^2 - 6p + 6)/p^4 + 6(1 - p)(2 - p)/p^3 + 4(1 - p)/p^2
    = (24 - 36p + 14p^2 - p^3)/p^4.

Exercise 3.11: Recall the Bubblesort algorithm of Exercise 2.22. Determine the variance of the number of
inversions that need to be corrected by Bubblesort.
Solution to Exercise 3.11: Denote the input set by {1, . . . , n} and the permutation by π. For i < j, let
X_{ij} be 1 if i and j are inverted and 0 otherwise. The total number of inversions is given by X = \sum_{i<j} X_{ij},
and E[Xij ] is the probability a pair is inverted, which is 1/2. By linearity of expectations,
 
E[X] = \sum_{i<j} E[X_{ij}] = \frac{1}{2}\binom{n}{2}.

To compute Var[X] = E[X^2] − E[X]^2, we find

E[X^2] = E\left[\left(\sum_{i<j} X_{ij}\right)^2\right] = \sum_{i<j} E[X_{ij}^2] + \sum_{\{i,j\} \ne \{k,\ell\}} E[X_{ij} X_{k\ell}],

where in the second summation we have i < j and k < \ell. Since X_{ij} is an indicator variable, X_{ij}^2 = X_{ij}, and the first sum is just E[X]. To compute the second sum, we consider several cases. If none of the four indices are equal, then X_{ij} and X_{k\ell} are independent, so E[X_{ij} X_{k\ell}] = E[X_{ij}] E[X_{k\ell}] = 1/4. There are \binom{n}{2}\binom{n-2}{2} such pairs (i, j), (k, \ell).
If the pairs overlap, there are three distinct indices i < j < k and three cases:
 
(i) (i, j), (i, k): This occurs 2\binom{n}{3} ways, and in this case

E[X_{ij} X_{ik}] = Pr(X_{ij} X_{ik} = 1) = Pr(π(i) > π(j) and π(i) > π(k)) = \frac{1}{3}.

(ii) (i, j), (j, k): This occurs 2\binom{n}{3} ways, and in this case

E[X_{ij} X_{jk}] = Pr(X_{ij} X_{jk} = 1) = Pr(π(i) > π(j) > π(k)) = \frac{1}{6}.

(iii) (i, k), (j, k): This occurs 2\binom{n}{3} ways, and in this case

E[X_{ik} X_{jk}] = Pr(X_{ik} X_{jk} = 1) = Pr(π(i) > π(k) and π(j) > π(k)) = \frac{1}{3}.

Therefore, the total contribution of the overlapping pairs is

\frac{2}{3}\binom{n}{3} + \frac{1}{3}\binom{n}{3} + \frac{2}{3}\binom{n}{3} = \frac{5}{3}\binom{n}{3}.

Finally,
Var[X] = \frac{1}{2}\binom{n}{2} + \frac{1}{4}\binom{n}{2}\binom{n-2}{2} + \frac{5}{3}\binom{n}{3} - \left(\frac{1}{2}\binom{n}{2}\right)^2 = \frac{n(n-1)(2n+5)}{72}.
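A brute-force check of the formula over all permutations of a small n (Python sketch; n = 6 is an arbitrary small choice):

```python
from itertools import permutations

def inversions(perm):
    return sum(perm[i] > perm[j] for i in range(len(perm)) for j in range(i + 1, len(perm)))

n = 6
counts = [inversions(p) for p in permutations(range(n))]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(var, n * (n - 1) * (2 * n + 5) / 72)   # both are 7.083...
```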
Exercise 3.12: Find an example of a random variable with finite expectation and unbounded variance.
Give a clear argument showing that your choice has these properties.
Solution to Exercise 3.12: Let X be an integer-valued random variable defined by Pr(X = i) = c/i^3 for i ≥ 1 and some positive constant c. This is possible since \sum_{i=1}^{\infty} 1/i^3 is finite. Then

E[X] = \sum_{i=1}^{\infty} c/i^2,

which is finite, but

E[X^2] = \sum_{i=1}^{\infty} c/i,

which is not. Hence X has finite expectation but infinite variance.


Exercise 3.16: This problem shows that Markov’s inequality is as tight as it could possibly be. Given a
positive integer k, describe a random variable X assuming only non-negative values such that
Pr(X \ge k E[X]) = \frac{1}{k}.
Solution to Exercise 3.16: Let X = 0 with probability 1 − 1/k, and let X = k with probability 1/k. This
satisfies the equality.
Looking at the proof of Markov’s inequality, one can see that for tightness to hold, the distribution must
have this form. That is, X must either be 0 or kE[X].
Exercise 3.17: Can you give an example (similar to that for Markov’s inequality in Exercise 3.16) that
shows that Chebyshev’s inequality is tight? If not, explain why not.
Solution to Exercise 3.17: In fact you can give such an example. Looking at the proof of Theorem 3.6,
tightness will occur whenever
Pr((X - E[X])^2 \ge a^2) = \frac{E[(X - E[X])^2]}{a^2},
which is where Markov’s inequality is applied. By the argument for the solution to Exercise 3.16, for Markov’s
inequality to be tight, we need either (X − E[X])2 = 0 or (X − E[X])2 = a2 . Hence for some m = E[X], we
should have that X = m, X = m + a, or X = m − a. In order to have E[X] = m, the distribution should
be symmetric, so that X = m + a with probability p and X = m − a with probability p. Any such distribution
would be satisfactory. Students may just come up with the special case where X = m + a with probability
1/2 and X = m − a with probability 1/2, which is satisfactory.
Exercise 3.20:

(a) Chebyshev’s inequality uses the variance of a random variable to bound its deviation from its expec-
tation. We can also use higher moments. Suppose that we have a random variable X and an even
integer k for which E[(X − E[X])^k] is finite. Show that

Pr\left(|X - E[X]| > t \left(E[(X - E[X])^k]\right)^{1/k}\right) \le \frac{1}{t^k}.

(b) Why is it difficult to derive a similar inequality when k is odd?

Solution to Exercise 3.20: For even k the kth moment is always non-negative, so we have

Pr(|X − E[X]| > α) = Pr(|X − E[X]|^k > α^k) = Pr((X − E[X])^k > α^k).

This also allows us to use Markov's inequality:

Pr\left(|X - E[X]| > t \left(E[(X - E[X])^k]\right)^{1/k}\right) = Pr\left((X - E[X])^k > t^k E[(X - E[X])^k]\right)
    \le \frac{E[(X - E[X])^k]}{t^k E[(X - E[X])^k]}
    = \frac{1}{t^k}.

When k is odd, terms like X − E[X] might be negative, and the argument will not go through. For
example, (X − E[X])^k might be negative and we cannot apply Markov's inequality to it.
Exercise 3.25: The weak law of large numbers says that if X1 , X2 , X3 , . . . are independent and identically
distributed random variables with mean µ and standard deviation σ, then for any constant ε > 0

\lim_{n \to \infty} Pr\left(\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - µ\right| > ε\right) = 0.

Use Chebyshev’s inequality to prove the weak law of large numbers.


Solution to Exercise 3.25:
Let Sn = X1 + X2 + · · · + Xn . We have E[Sn ] = nµ, and since the Xi are independent, we have

Var[S_n] = Var[X_1] + Var[X_2] + · · · + Var[X_n] = nσ^2.

Fix an ε > 0. We have, by Chebyshev's inequality,

Pr\left(\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - µ\right| > ε\right) = Pr(|S_n - µn| > εn)
    \le \frac{Var[S_n]}{(εn)^2}
    = \frac{σ^2}{ε^2 n}.

Now as n goes to infinity, σ^2/(ε^2 n) goes to 0. Since we are bounding a probability, which is always at least 0, we must have

\lim_{n \to \infty} Pr\left(\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - µ\right| > ε\right) = 0.

Chapter 4

Exercise 4.3:
(a) Determine the moment generating function for the binomial random variable B(n, p).
(b) Let X be a B(n, p) random variable and Y be a B(m, p) random variable, where X and Y are inde-
pendent. Use part (a) to determine the moment generating function of X + Y .
(c) What can we conclude from the form of the moment generating function of X + Y ?
Solution to Exercise 4.3:
(a) We find using the binomial theorem
E[e^{tX}] = \sum_{i=0}^{n} \binom{n}{i} p^i (1 - p)^{n-i} e^{ti}
    = \sum_{i=0}^{n} \binom{n}{i} (pe^t)^i (1 - p)^{n-i}
    = (1 - p + pe^t)^n.

(b) Since X and Y are independent,

E[e^{t(X+Y)}] = E[e^{tX} e^{tY}]
    = E[e^{tX}] E[e^{tY}]
    = (1 - p + pe^t)^n (1 - p + pe^t)^m
    = (1 - p + pe^t)^{n+m}.

(c) The moment generating function for X + Y has the same form as the moment generating function for
a B(n + m, p) random variable. We conclude that the sum of a B(n, p) random variable and B(m, p)
random variable gives a B(n + m, p) random variable.
Exercise 4.6:
(a) In an election with two candidates using paper ballots, each vote is independently misrecorded with
probability p = 0.02. Use a Chernoff bound to give an upper bound on the probability that more than
4 percent of the votes are misrecorded in an election of 1, 000, 000 ballots.
(b) Assume that a misrecorded ballot always counts as a vote for the other candidate. Suppose that
Candidate A received 510, 000 votes and Candidate B received 490, 000 votes. Use Chernoff bounds to
upper bound the probability that Candidate B wins the election due to misrecorded ballots. Specifically,
let X be the number of votes for Candidate A that are misrecorded and let Y be the number of votes
for Candidate B that are misrecorded. Bound Pr((X > k) ∪ (Y < ℓ)) for suitable choices of k and ℓ.

Solution to Exercise 4.6: For the first part, a variety of Chernoff bounds could be used. Applying
Equation (4.1), for instance, gives that if Z is the number of misrecorded votes out of n total votes, then

Pr(Z \ge 2(0.02n)) \le \left(\frac{e}{4}\right)^{0.02n}.
With n being 1 million, the right hand side is (e/4)^{20000}, which is very small.
For the second part, as an example, in order for candidate B to win due to misrecording of ballots, at
least one of the following must occur: either more than 15000 votes for A get misrecorded or less than 5000 votes for B get misrecorded. If both of these fail to occur, A wins, since A will obtain at least 510000 − 15000 + 5000 = 500000 votes. Hence bounding

Pr((X > 15000) ∪ (Y < 5000))
suffices. More generally, we could say that either more than 10000 + z votes for A get misrecorded or less
than z votes for B get misrecorded, for any value of z; choosing the best value for z is then an optimization
problem, and the best value could depend on the form of Chernoff bound used.
Exercise 4.7: Throughout the chapter we implicitly assumed the following extension of the Chernoff bound.
Prove that it is true.
Let X = \sum_{i=1}^{n} X_i, where the X_i's are independent 0-1 random variables. Let µ = E[X]. Choose any µ_L and µ_H such that µ_L ≤ µ ≤ µ_H. Then for any δ > 0,

Pr(X \ge (1 + δ)µ_H) \le \left(\frac{e^{δ}}{(1 + δ)^{(1+δ)}}\right)^{µ_H}.

Similarly, for any 0 < δ < 1,

Pr(X \le (1 - δ)µ_L) \le \left(\frac{e^{-δ}}{(1 - δ)^{(1-δ)}}\right)^{µ_L}.
Solution to Exercise 4.7:
(a) Let t = ln(1 + δ) > 0. By Section 4.3 of the book, E[e^{tX}] ≤ e^{(e^t - 1)µ} ≤ e^{(e^t - 1)µ_H}, where the second inequality follows from e^t − 1 > 0 for t > 0 and µ ≤ µ_H. Following the proof of the normal Chernoff bound, we get:

Pr(X \ge (1 + δ)µ_H) = Pr(e^{tX} \ge e^{t(1+δ)µ_H})
    \le \frac{E[e^{tX}]}{e^{t(1+δ)µ_H}}
    \le \frac{e^{(e^t - 1)µ_H}}{e^{t(1+δ)µ_H}}
    = \left(\frac{e^{δ}}{(1 + δ)^{(1+δ)}}\right)^{µ_H}.

(b) Let t = ln(1 − δ) < 0. By Section 4.3 of the book, E[e^{tX}] ≤ e^{(e^t - 1)µ} ≤ e^{(e^t - 1)µ_L}, where the second inequality follows from e^t − 1 < 0 for t < 0 and µ_L ≤ µ. Following the proof of the normal Chernoff bound, we get:

Pr(X \le (1 - δ)µ_L) = Pr(e^{tX} \ge e^{t(1-δ)µ_L})
    \le \frac{E[e^{tX}]}{e^{t(1-δ)µ_L}}
    \le \frac{e^{(e^t - 1)µ_L}}{e^{t(1-δ)µ_L}}
    = \left(\frac{e^{-δ}}{(1 - δ)^{(1-δ)}}\right)^{µ_L}.

Exercise 4.9: Suppose that we can obtain independent samples X_1, X_2, . . . of a random variable X, and we want to use these samples to estimate E[X]. Using t samples, we use \sum_{i=1}^{t} X_i/t for our estimate of E[X]. We want the estimate to be within εE[X] from the true value of E[X] with probability at least 1 − δ. We may not be able to use Chernoff's bound directly to bound how good our estimate is if X is not a 0−1 random variable, and we do not know its moment generating function. We develop an alternative approach that requires only having a bound on the variance of X. Let r = \sqrt{Var[X]}/E[X].

(a) Show using Chebyshev's inequality that O(r^2/(ε^2 δ)) samples are sufficient to solve the above problem.

(b) Suppose that we only need a weak estimate that is within εE[X] of E[X] with probability at least 3/4. Argue that only O(r^2/ε^2) samples are enough for this weak estimate.

(c) Show that by taking the median of O(log(1/δ)) weak estimates, we can obtain an estimate within εE[X] of E[X] with probability at least 1 − δ. Conclude that we only need O(r^2 log(1/δ)/ε^2) samples.
Solution to Exercise 4.9:
(a) We take n = r^2/(ε^2 δ) independent samples X_1, . . . , X_n, and return Z = \frac{1}{n}\sum_{i=1}^{n} X_i as our estimate. By linearity of expectation, E[Z] = E[X] = µ. Also, since the X_i are independent and all have the same distribution as X,

Var[Z] = \frac{1}{n^2} Var\left[\sum_{i=1}^{n} X_i\right] = \frac{1}{n} Var[X] = \frac{r^2 µ^2}{n}.

Thus, by Chebyshev's Inequality,

Pr(|Z - µ| \ge εµ) \le \frac{Var[Z]}{ε^2 µ^2} = \frac{r^2}{n ε^2} \le δ.

Hence n = O(r^2/(ε^2 δ)) samples are sufficient.

(b) Just set δ = 1 − 3/4 = 1/4 in part (a). Hence we need O(4r^2/ε^2) = O(r^2/ε^2) samples.
(c) We run the algorithm from part (b) m = 12 ln(1/δ) times, obtaining independent estimates S_1, . . . , S_m for E[X]. We then return the median of the S_i's, which we denote by M, as our new estimate for E[X]. Let Y_i be the 0-1 random variable that is 1 if and only if |S_i − µ| > εµ. By part (b), E[Y_i] ≤ 1/4. Now, let Y = \sum_{i=1}^{m} Y_i. We see that E[Y] ≤ m/4 by linearity of expectations, and that Y is the sum of independent 0-1 random variables, so we can (and will) apply the Chernoff bound to Y.
Now, if |M − µ| > εµ, then by the definition of the median, |S_i − µ| > εµ for at least m/2 of the S_i's, and so Y ≥ m/2 = (1 + 1)\frac{m}{4}. Applying the Chernoff bound gives

Pr(|M - µ| > εµ) \le Pr(Y \ge (1 + 1)m/4)
    \le e^{-\frac{1}{3} \cdot \frac{m}{4} \cdot 1^2}
    \le e^{-\frac{1}{12} \cdot 12 \ln(1/δ)}
    = e^{\ln δ}
    = δ.

Exercise 4.10: A casino is testing a new class of simple slot machines. Each game, the player puts in $1,
and the slot machine is supposed to return either $3 to the player with probability 4/25, $100 with probability
1/200, and nothing with all remaining probability. Each game is supposed to be independent of other games.
The casino has been surprised to find in testing that the machines have lost $10,000 over the first million games. Derive a Chernoff bound for the probability of this event. You may want to use a calculator or program
to help you choose appropriate values as you derive your bound.

Solution to Exercise 4.10:
Let X = \sum_{i=1}^{1000000} X_i, where X_i is the winnings to the player from the ith game. With probability 1/200, X_i = 99; with probability 4/25, X_i = 2; and with probability 167/200, X_i = −1. Note E[X_i] = −1/50, so
on average after 1 million games you would expect to lose 20000 dollars. It seems very odd for the casino to
instead lose 10000 dollars.
A direct application of Markov’s inequality gives, for t > 0,

Pr(X \ge 10000) = Pr(e^{tX} \ge e^{10000t})
    \le \frac{E[e^{tX}]}{e^{10000t}}
    = \frac{(E[e^{tX_i}])^{1000000}}{e^{10000t}}
    = \frac{((167/200)e^{-t} + (4/25)e^{2t} + (1/200)e^{99t})^{1000000}}{e^{10000t}}
    = ((167/200)e^{-1.01t} + (4/25)e^{1.99t} + (1/200)e^{98.99t})^{1000000}.

Now we simply have to optimize to choose the right value of t. For t = 0.0006 the expression

((167/200)e^{-1.01t} + (4/25)e^{1.99t} + (1/200)e^{98.99t})

is approximately 0.9999912637. You can do a little better; for t = 0.00058 the expression is approximately
0.9999912508. The resulting bound on Pr(X ≥ 10000) is approximately 1.6 × 10^{-4}.
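A small numerical search over t (Python sketch; the grid of t values is an arbitrary choice):

```python
import math

def base(t):
    """The per-game factor (167/200)e^{-1.01t} + (4/25)e^{1.99t} + (1/200)e^{98.99t}."""
    return (167 / 200) * math.exp(-1.01 * t) + (4 / 25) * math.exp(1.99 * t) + (1 / 200) * math.exp(98.99 * t)

best_t = min((k / 100000 for k in range(1, 200)), key=base)
print(best_t, base(best_t), base(best_t) ** 1_000_000)   # t near 0.00058, bound near 1.6e-4
```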
Exercise 4.11: Consider a collection X_1, . . . , X_n of n independent integers chosen uniformly from the set {0, 1, 2}. Let X = \sum_{i=1}^{n} X_i and 0 < δ < 1. Derive a Chernoff bound for Pr(X ≥ (1 + δ)n) and Pr(X ≤ (1 − δ)n).
Solution to Exercise 4.11:
We consider just the case of Pr(X ≥ (1 + δ)n). Using the standard Chernoff bound argument, for any
t > 0,

Pr(X \ge (1 + δ)n) = Pr(e^{tX} \ge e^{t(1+δ)n})
    \le \frac{E[e^{tX}]}{e^{t(1+δ)n}}
    = \frac{(E[e^{tX_i}])^n}{e^{t(1+δ)n}}
    = \frac{((1 + e^t + e^{2t})/3)^n}{e^{t(1+δ)n}}
    = \left(\frac{1 + e^t + e^{2t}}{3e^{t(1+δ)}}\right)^n.

We therefore need to choose t to minimize the expression

\frac{1 + e^t + e^{2t}}{3e^{t(1+δ)}}.
Sadly, there does not appear to be a simple form for t as a function of δ. For any specific δ, the best t can
be found numerically. Using the choice t = ln(1 + δ) used in other bounds, we have

Pr(X \ge (1 + δ)n) \le \left(\frac{1 + δ + δ^2/3}{(1 + δ)^{(1+δ)}}\right)^n.

Exercise 4.12: Consider a collection X_1, . . . , X_n of n independent geometrically distributed random variables with mean 2. Let X = \sum_{i=1}^{n} X_i and δ > 0.
(a) Derive a bound on Pr(X ≥ (1 + δ)(2n)) by applying the Chernoff bound to a sequence of (1 + δ)(2n) fair coin tosses.
(b) Directly derive a Chernoff bound on Pr(X ≥ (1 + δ)(2n)) using the moment-generating function for geometric random variables.
(c) Which bound is better?
Solution to Exercise 4.12:
(a) Let Y denote the number of heads in (1 + δ)2n coin flips. We note that the events {X ≥ (1 + δ)2n} and {Y ≤ n} are equivalent, since the geometric random variables can be thought of as the number of flips until the next heads when flipping the coins. Now E[Y] = \frac{1}{2}(1 + δ)2n = (1 + δ)n = µ, so n = \frac{1}{1+δ}µ = \left(1 - \frac{δ}{1+δ}\right)µ. Using the Chernoff bound (4.5) we get

Pr(Y \le n) = Pr\left(Y \le \left(1 - \frac{δ}{1+δ}\right)µ\right) \le e^{-nδ^2/(2(1+δ))}.

(b) From page 62 of the book, we have for a geometric random variable Z,

M_Z(t) = \frac{p}{1-p}\left((1 - (1 - p)e^t)^{-1} - 1\right).

Plugging in p = \frac{1}{2} we get

M_Z(t) = \left(1 - \frac{1}{2}e^t\right)^{-1} - 1 = \frac{\frac{1}{2}e^t}{1 - \frac{1}{2}e^t} = \frac{e^t}{2 - e^t}.

For 0 < t < ln 2, we get that

Pr(X \ge (1 + δ)2n) = Pr(e^{tX} \ge e^{t(1+δ)2n}) \le \frac{E[e^{tX}]}{e^{t(1+δ)2n}}
    = \frac{\prod_{i=1}^{n} M_{X_i}(t)}{e^{t(1+δ)2n}}
    = \frac{\left(\frac{e^t}{2 - e^t}\right)^n}{e^{t(1+δ)2n}}
    = e^{-nt(1+2δ)} (2 - e^t)^{-n}.

Using calculus, we find the function e^{-nt(1+2δ)}(2 - e^t)^{-n} is minimized for t = \ln\left(1 + \frac{δ}{δ+1}\right) < ln 2. Finally we get that

Pr(X \ge (1 + δ)2n) \le \min_{0 < t < \ln 2} e^{-nt(1+2δ)} (2 - e^t)^{-n}
    = \left(1 + \frac{δ}{δ+1}\right)^{-n(1+2δ)} \left(1 - \frac{δ}{δ+1}\right)^{-n}, \quad \text{setting } t = \ln\left(1 + \frac{δ}{δ+1}\right).

(c) Using calculus, we can show that the bound from the moment generating function is better under this
derivation.

Exercise 4.13: Let X_1, . . . , X_n be independent Poisson trials such that Pr(X_i = 1) = p. Let X = \sum_{i=1}^{n} X_i,
so that E[X] = pn. Let
F (x, p) = x ln(x/p) + (1 − x) ln((1 − x)/(1 − p)).
(a) Show that for 1 ≥ x > p,
Pr(X ≥ xn) ≤ e^{-nF(x,p)}.

(b) Show that when 0 < x, p < 1, F (x, p) − 2(x − p)2 ≥ 0. (Hint: take the second derivative of F (x, p) −
2(x − p)2 with respect to x.)
(c) Using the above, argue that

Pr(X \ge (p + ε)n) \le e^{-2nε^2}.

(d) Use symmetry to argue that

Pr(X \le (p - ε)n) \le e^{-2nε^2},

and conclude that

Pr(|X - pn| \ge εn) \le 2e^{-2nε^2}.

Solution to Exercise 4.13:


(a) In the proof of the basic Chernoff bound, we showed that

M_{X_i}(t) = E[e^{tX_i}] = p_i e^t + (1 - p_i) = pe^t + (1 - p),

since in this case p_i = p always. It follows that E[e^{tX}] = M_X(t) = (pe^t + (1 - p))^n. Now for any t > 0,
by Markov’s inequality,

Pr(X \ge xn) = Pr(e^{tX} \ge e^{txn})
    \le E[e^{tX}]/e^{txn}
    = (pe^{t(1-x)} + (1 - p)e^{-tx})^n.

To find the appropriate value of t to minimize this quantity, we take the derivative of the expression
inside the parentheses with respect to t. The derivative is:

p(1 - x)e^{t(1-x)} - (1 - p)xe^{-tx}.

It is easy to check that this is 0 (and gives a minimum) at

e^t = \frac{(1-p)x}{p(1-x)}, \quad \text{or} \quad t = \ln\frac{(1-p)x}{p(1-x)},

and plugging in this value for t now gives the desired result, as

Pr(X \ge xn) \le \frac{(pe^t + (1 - p))^n}{e^{txn}}
    = \frac{((1-p)x/(1-x) + (1-p))^n}{((1-p)x/(p(1-x)))^{xn}}
    = \frac{((1-p)/(1-x))^n}{((1-p)x/(p(1-x)))^{xn}}
    = \left(\frac{1-p}{1-x}\right)^{n(1-x)} \left(\frac{p}{x}\right)^{nx}
    = e^{-nF(x,p)}.

(b) The first derivative of F(x, p) − 2(x − p)^2 with respect to x is

ln(x/p) − ln((1 − x)/(1 − p)) − 4(x − p)

and the second derivative is (for 0 < x < 1)

1/x + 1/(1 − x) − 4 ≥ 0.

As the first derivative is 0 when x = p and the second derivative is non-negative, we must have that x = p gives a global minimum, so when 0 < x, p < 1, F(x, p) − 2(x − p)^2 ≥ 0.
(c) It now follows that

Pr(X \ge (p + ε)n) \le e^{-nF(p+ε, p)} \le e^{-2nε^2}.
(d) Suppose we let Y_i = 1 − X_i and Y = \sum_{i=1}^{n} Y_i. Since the Y_i are independent Poisson trials with Pr(Y_i = 1) = 1 − p, we have

Pr(X \le (p - ε)n) = Pr(Y \ge (1 - p + ε)n) \le e^{-2nε^2}
by applying part (c) to Y . The result follows.
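As a hedged numerical check (not part of the original solution), the following Python sketch verifies that F(x, p) ≥ 2(x − p)² and that the exact binomial tail is dominated by e^{−nF(x,p)}, which in turn is dominated by e^{−2n(x−p)²}, for a few arbitrary sample values.

    import math
    from math import comb

    def F(x, p):
        # F(x, p) = x ln(x/p) + (1-x) ln((1-x)/(1-p))
        return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

    def binom_tail(n, p, k):
        # Exact Pr(X >= k) for X ~ Binomial(n, p)
        return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

    n, p = 100, 0.3
    for x in (0.4, 0.5, 0.6):
        exact = binom_tail(n, p, math.ceil(x * n))
        chernoff = math.exp(-n * F(x, p))
        hoeffding = math.exp(-2 * n * (x - p) ** 2)
        print(x, F(x, p) >= 2 * (x - p) ** 2, exact <= chernoff <= hoeffding)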
Exercise 4.19:
Solution to Exercise 4.19: It is worth noting that a function f is convex if it has a second derivative
which is non-negative everywhere; this explains why e^{tx} is convex.
Let Z be a random variable that takes on a finite set of values S ⊂ [0, 1] with p = E[Z]. Let X be a
Bernoulli random variable with p = E[X]. We show E[f(Z)] ≤ E[f(X)] for any convex function f. By
convexity, f(s) ≤ (1 − s)f(0) + sf(1) for s ∈ [0, 1], so

    E[f(Z)] = Σ_{s∈S} Pr(Z = s)f(s)
            ≤ Σ_{s∈S} Pr(Z = s)(1 − s)f(0) + Σ_{s∈S} Pr(Z = s)·s·f(1)
            = f(0)(1 − E[Z]) + f(1)E[Z]
            = f(0)·(1 − p) + f(1)·p
            = E[f(X)].
For the second part, the first part implies that in the derivation of the Chernoff bound, we can upper
bound E[e^{tZ_i}] for any Z_i with distribution Z by E[e^{tX_i}], where X_i is a Poisson trial with the same mean.
Hence any Chernoff bound that holds for the sum of independent Poisson trials with mean p holds also for
the corresponding sum of independent random variables all with distribution Z.
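For concreteness (and purely as an illustration, with arbitrarily chosen values), here is a tiny numeric example of the first part with the convex function f(z) = e^{2z}:

    import math

    # Hypothetical example: Z uniform on {0, 0.25, 0.5, 1}; X is Bernoulli with the same mean.
    values = [0.0, 0.25, 0.5, 1.0]
    p = sum(values) / len(values)
    f = lambda z: math.exp(2 * z)   # a convex function (here e^{tz} with t = 2)

    E_f_Z = sum(f(v) for v in values) / len(values)
    E_f_X = (1 - p) * f(0) + p * f(1)
    print(E_f_Z, E_f_X, E_f_Z <= E_f_X)  # expects True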
Exercise 4.21:
Consider the bit-fixing routing algorithm for routing a permutation on the n-cube. Suppose that n is
even. Write each source node s as the concatenation of two binary strings as and bs each of length n/2.
Let the destination of s's packet be the concatenation of b_s and a_s. Show that this permutation causes the
bit-fixing routing algorithm to take Ω(√N) steps.
Solution to Exercise 4.21:
Assume that n is even and N is the total number of nodes (2^n). Take an arbitrary sequence of bits
b_1, ..., b_{n/2}, and let b̄_{n/2} denote the complement of b_{n/2}. Examine all the nodes on the hypercube of the
form (x_1, ..., x_{n/2−1}, b̄_{n/2}, b_1, ..., b_{n/2}). Clearly we have 2^{n/2−1} choices for the x_i's, so we have 2^{n/2−1} such
nodes. Now by our choice of permutation, we know that the packet starting at (x_1, ..., x_{n/2−1}, b̄_{n/2}, b_1, ..., b_{n/2})
is routed to (b_1, ..., b_{n/2}, x_1, ..., x_{n/2−1}, b̄_{n/2}). Since the bit-fixing algorithm fixes one bit at a time from left
to right, by the time the (n/2 − 1)st bit has been aligned the packet arrives at node (b_1, ..., b_{n/2−1}, b̄_{n/2}, b_1, ..., b_{n/2}),
and hence to align the (n/2)th bit the packet must take the edge to (b_1, ..., b_{n/2}, b_1, ..., b_{n/2}). Hence the packets
starting at the nodes (x_1, ..., x_{n/2−1}, b̄_{n/2}, b_1, ..., b_{n/2}) must all take the edge from (b_1, ..., b_{n/2−1}, b̄_{n/2}, b_1, ..., b_{n/2})
to (b_1, ..., b_{n/2}, b_1, ..., b_{n/2}). So 2^{n/2−1} = √N/2 packets must use the same edge. Since only one packet can
traverse an edge at a time, we need at least √N/2 = Ω(√N) steps to complete the permutation.

Chapter 5

Exercise 5.1: For what values of n is (1 + 1/n)^n within 1% of e? Within 0.0001% of e? Similarly, for what
values of n is (1 − 1/n)^n within 1% of 1/e? Within 0.0001%?
Solution to Exercise 5.1: We note that (1 + 1/n)^n is increasing, which can be verified by calculus. Hence we
need only find the first point where n reaches the desired thresholds. When n = 50, (1 + 1/n)^n = 2.691588...,
and e − (1 + 1/n)^n < 0.01e. When n = 499982, (1 + 1/n)^n = 2.718287911..., and e − (1 + 1/n)^n < 0.000001e.
Similarly, (1 − 1/n)^n is increasing. When n = 51, (1 − 1/n)^n = 0.364243..., and 1/e − (1 − 1/n)^n < 0.01/e.
When n = 499991, (1 − 1/n)^n = 0.36787907..., and 1/e − (1 − 1/n)^n < 0.000001/e.
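A brute-force scan, sketched below, reproduces the 1% thresholds reported above; the 0.0001% thresholds (near n = 500000) can be found the same way, just with a longer scan.

    import math

    def first_n_within(target, f, rel_err):
        # Smallest n with |target - f(n)| < rel_err * target, scanning upward from n = 1.
        n = 1
        while abs(target - f(n)) >= rel_err * target:
            n += 1
        return n

    print(first_n_within(math.e, lambda n: (1 + 1 / n) ** n, 0.01))      # 50, per the solution above
    print(first_n_within(1 / math.e, lambda n: (1 - 1 / n) ** n, 0.01))  # 51, per the solution above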
Exercise 5.4: In a lecture hall containing 100 people, you consider whether or not there are three people in
the room who share the same birthday. Explain how to calculate this probability exactly, using the same
assumptions as in our previous analysis.
Solution to Exercise 5.4:
Let us calculate the probability that there is no day with three birthdays. One approach is to set up
a recurrence. Let P (n, a, b) be the probability that when n people are each independently given a random
birthday, there are a days where two people share a birthday and b days where just one person has a birthday.
For generality let us denote the number of days in the year as m. Then we have P (1, 0, 1) = 1.0, and the
recursion
    P(n, a, b) = P(n − 1, a − 1, b + 1)·(b + 1)/m + P(n − 1, a, b − 1)·(m − a − b + 1)/m.
That is, if we consider the people one at a time, a new person either hits a day where someone else already
has a birthday, or a new person is the first to have that birthday. This recurrence allows the calculation of
the desired value.
Alternatively, we can express the probability as a summation. Let k ≥ 0 be the number of days where
two people share a birthday, and let ℓ ≥ 0 be the number of days where just one person has a birthday. Note
that we must have 2k + ℓ = 100. There are C(365, k) ways to pick the k days, and then C(365 − k, ℓ) ways to pick the
other ℓ days. Further, there are 100!/2^k ways to assign the 100 people to their birthdays; note we need to
divide by 2^k to account for the fact that there are two equivalent ways of assigning two people to the same
birthday for each of the k days. Each configuration has probability (1/365)^100 of occurring. Putting this all
together, we have:

    Σ_{k,ℓ: 2k+ℓ=100; k,ℓ≥0}  365! · 100! / (k! ℓ! (365 − k − ℓ)! 2^k 365^100).
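The recurrence is straightforward to evaluate by computer; the sketch below is one possible implementation (with m = 365 and n = 100 as in the problem). The answer to the original question is one minus the computed probability that no day has three or more birthdays.

    from functools import lru_cache

    M = 365  # days in the year

    @lru_cache(maxsize=None)
    def P(n, a, b):
        # Probability that n people yield exactly a days with two birthdays and
        # b days with one birthday (and no day with three or more).
        if a < 0 or b < 0 or 2 * a + b > n:
            return 0.0
        if n == 1:
            return 1.0 if (a, b) == (0, 1) else 0.0
        return (P(n - 1, a - 1, b + 1) * (b + 1) / M
                + P(n - 1, a, b - 1) * (M - a - (b - 1)) / M)

    no_triple = sum(P(100, a, b) for a in range(51) for b in range(101))
    print(1 - no_triple)  # probability that some day has three or more shared birthdays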

Exercise 5.7: Suppose that n balls are thrown independently and uniformly at random into n bins.
(a) Find the conditional probability that bin 1 has one ball given that exactly one ball fell into the first
three bins.
(b) Find the conditional expectation of the number of balls in bin 1 under the condition that bin 2 received
no balls.
(c) Write an expression for the probability that bin 1 receives more balls than bin 2.
Solution to Exercise 5.7: For the first part, the ball is equally likely to fall in any of the three bins, so the
answer must be 1/3. For the second part, under the condition that bin 2 received no balls, the n balls are
uniformly distributed over the remaining n − 1 bins; hence the conditional expectation must be n/(n − 1).
For the third part, it is easier to begin by writing an expression for the probability the two bins receive the
same number of balls. This is:
    P = Σ_{k=0}^{⌊n/2⌋} (n! / (k! k! (n − 2k)!)) (1/n)^{2k} (1 − 2/n)^{n−2k}.

By symmetry, the probability we seek is (1 − P )/2.


Exercise 5.9: Consider the probability that every bin receives exactly one ball when n balls are thrown
randomly into n bins.

(a) Give an upper bound on this probability using the Poisson approximation.
(b) Determine the exact probability of this event.
(c) Show that these two probabilities differ by a multiplicative factor that equals the probability that
a Poisson random variable with parameter n takes on the value n. Explain why this is implied by
Theorem 5.6.

Solution to Exercise 5.9:


(a) With the Poisson approximation, we have each bin independently obtains a number of balls that is
Poisson distributed with mean 1. Hence the probability each bin obtains one ball is (e^{−1})^n = e^{−n}.
Using Theorem 5.7, for example, would give an upper bound of e√n·e^{−n}; more tightly, delving into
Theorem 5.7, we would have an upper bound of n!/n^n.
(b) This probability is just n!/nn . There are n! ways of distributing the balls one per bin, and nn total
ways of distributing the balls.
(c) The first two parts show that the probability from the Poisson approximation, namely e−n , and the
exact probability, n!/nn , differ by a factor corresponding exactly to the probability that a Poisson
random variable with parameter n takes on the value n. Looking at Theorem 5.6, we see that this
has to be the case. Note that for the Poisson approximation, we can first require that n total balls be
thrown; otherwise there is no chance of having exactly one ball in each bin. Conditioned on this event,
Theorem 5.6 tells us that the distribution will be exactly that of throwing balls into bins. Hence the
difference between the Poisson approximation and throwing balls into bins in this case is exactly the
initial conditioning on the number of balls being n with the Poisson approximation, giving the precise
factor between the probabilities.
Exercise 5.11: The following problem models a simple distributed system wherein agents contend for
resources but “back off” in the face of contention. Balls represent agents, and bins represent resources.
The system evolves over rounds. Every round, balls are thrown independently and uniformly at random
into n bins. Any ball that lands in a bin by itself is served and removed from consideration. The remaining
balls are thrown again in the next round. We begin with n balls in the first round, and we finish when every
ball is served.

(a) If there are b balls at the start of a round, what is the expected number of balls at the start of the
next round?
(b) Suppose that every round the number of balls served was exactly the expected number of balls to be
served. Show that all the balls would be served in O(log log n) rounds. (Hint: if xj is the expected
number of balls left after j rounds, show and use that x_{j+1} ≤ x_j²/n.)

Solution to Exercise 5.11:


(a) If there are b balls at the start of a round, the probability a given ball lands alone in a bin is

    p = (1 − 1/n)^{b−1}.

Hence the expected number of balls at the start of the next round is b(1 − p). Note that this is approximately
b(1 − e^{−b/n}).
(b) Consider any ball in round j + 1. Considering all the other balls, at most xj of the bins have at least
one ball in it, since there are at most xj balls being placed in the round. Hence the probability any
ball is placed into a bin where there is some other ball is at most xj /n. It follows that the expected
number of balls that are left after j + 1 rounds is at most x_j · (x_j/n) = x_j²/n.

From the calculations in part (a), it is easy to check that after two rounds, the expected number of
balls remaining is at most n/2. It follows inductively that for j ≥ 2
    x_j ≤ 2^{−2^{j−2}} n.

After j = log log n + 3 rounds, xj < 1; since xj is supposed to be integer-valued, O(log log n) rounds
suffice. (Alternatively, after O(log log n) rounds of following the expectation, we are down to 1 ball, at
which point only 1 more round is necessary.)
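A small simulation (a sketch, not part of the original solution) illustrates how slowly the number of rounds grows with n:

    import random

    def rounds_to_serve(n, rng=random.Random(0)):
        # Each round, every remaining ball picks a bin uniformly at random;
        # balls that land alone are served and removed.
        balls, rounds = n, 0
        while balls > 0:
            counts = {}
            for _ in range(balls):
                bin_ = rng.randrange(n)
                counts[bin_] = counts.get(bin_, 0) + 1
            balls = sum(c for c in counts.values() if c > 1)
            rounds += 1
        return rounds

    for n in (10**3, 10**4, 10**5):
        print(n, rounds_to_serve(n))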
Exercise 5.16: Let G be a random graph generated using the Gn,p model.
 
(a) A clique of k vertices in a graph is a subset of k vertices such that all C(k, 2) edges between these vertices
lie in the graph. For what value of p, as a function of n, is the expected number of cliques of five
vertices in G equal to 1?
(b) A K3,3 graph is a complete bipartite graph with three vertices on each side. In other words, it is a
graph with six vertices and nine edges; the six distinct vertices are arranged in two groups of three,
and the nine edges connect each of the nine pairs of vertices with one vertex in each group. For what
value of p, as function of n, is the expected number of K3,3 subgraphs in G equal to one?
(c) For what value of p, as a function of n, is the expected number of Hamiltonian cycles in the graph
equal to 1?
Solution to Exercise 5.16:
  
(a) There are C(n, 5) possible cliques of size 5. Each possible clique has C(5, 2) = 10 edges. Hence the expected
number of cliques is C(n, 5) p^{10}, and this expectation equals 1 when p = C(n, 5)^{−1/10}.
   
(b) There are (1/2) C(n, 3)·C(n − 3, 3) possible ways of choosing the six vertices of the K_{3,3}; first we choose three
vertices for one side, then we choose three of the remaining vertices for the other side. (We have to
divide by two because each K_{3,3} is counted twice, as each set of vertices could show up on either side.)
As there are 9 edges in a K_{3,3}, the expected number that appear is (1/2) C(n, 3)·C(n − 3, 3) p^9. For this to equal
1, we need p = (2 C(n, 3)^{−1} C(n − 3, 3)^{−1})^{1/9}.

(c) There are (n − 1)!/2 different possible Hamiltonian cycles in an (undirected) graph on n vertices. Hence the expected number
of Hamiltonian cycles is p^n (n − 1)!/2, and this expectation is 1 when p = (2/(n − 1)!)^{1/n}.
Exercise 5.21: In hashing with open addressing, the hash table is implemented as an array, and there are
no linked lists or chaining. Each entry in the array either contains one hashed item or it is empty. The hash
function defines for each key k a probe sequence h(k, 0), h(k, 1), . . . of table locations. To insert the key k,
we examine the sequence of table locations in the order defined by the key’s probe sequence until we find
an empty location and then insert the item at that position. When searching for an item in the hash table,
we examine the sequence of table locations in the order defined by the key’s probe sequence until either the
item is found, or we have found an empty location in the sequence. If an empty location is found, it means
that the item is not present in the table.
An open address hash table with 2n entries is used to store n items. Assume that the table location
h(k, j) is uniform over the 2n possible table locations and that all h(k, j) are independent.
(a) Show that under these conditions the probability an insertion requires more than k probes is at most
2−k .
(b) Show that for i = 1, 2, . . . , n the probability that the i-th insertion requires more than 2 log n probes
is at most 1/n2 .
Let the random variable Xi denote the number of probes required by the i-th insertion. You have shown
in the previous question that Pr(Xi > 2 log n) ≤ 1/n2 . Let the random variable X = max1≤i≤n Xi
denote the maximum number of probes required by any of the n insertions.

(c) Show that Pr(X > 2 log n) ≤ 1/n.
(d) Show that the expected length of the longest probe sequence is E[X] = O(log n).
Solution to Exercise 5.21:
(a) Since the hash table contains 2n entries and at any point at most n elements, the probability that a
given probe fails to find an empty entry is at most 1/2. Therefore the probability that an insertion
needs k probes is at most 2−k .
(b) We apply part (a) with k = 2 log n.
(c) We apply the union bound to part (b).

(d) We find

    E[X] = Pr(X ≤ 2 log n)·E[X | X ≤ 2 log n] + Pr(X > 2 log n)·E[X | X > 2 log n]
         ≤ 2 log n + (1/n)(2 log n + Σ_{k=2 log n}^∞ Pr(X ≥ k | X > 2 log n))
         ≤ 2 log n + (1/n)(2 log n + Σ_{k=1}^∞ Pr(X ≥ k))
         ≤ 2 log n + (1/n)(2 log n + n Σ_{k=1}^∞ 2^{−k})
         = 2 log n + (1/n)(2 log n + n)
         ≤ 2 log n + 2,

for sufficiently large n.


Exercise 5.22: Bloom filters can be used to estimate set differences. Suppose you have a set X and I have
a set Y , both with n elements. For example, the sets might represent our 100 favorite songs. We both create
Bloom filters of our sets, using the same number of bits m and the same k hash functions. Determine the
expected number of bits where our Bloom filters differ as a function of m, n, k, and |X ∩ Y |. Explain how
this could be used as a tool to find people with the same taste in music more easily than comparing lists of
songs directly.
Solution to Exercise 5.22:
Consider the ith position in the filter. For it to be set to 1 by the set X but not by the set Y, it must
be set by the hash of some element in X − (X ∩ Y), but not set by the hash of any element in Y. This
probability is

    (1 − 1/m)^{k|Y|} (1 − (1 − 1/m)^{k(|X|−|X∩Y|)}).

Note that |X| = |Y| = n, so we can write this as

    (1 − 1/m)^{kn} (1 − (1 − 1/m)^{k(n−|X∩Y|)}).

The probability that the bit is set to 1 by the set Y but not by the set X is the same. Hence the expected
number of bits set differently is just

    2m (1 − 1/m)^{kn} (1 − (1 − 1/m)^{k(n−|X∩Y|)}).

This approach could be used to estimate how much two people share the same taste in music; the higher
the overlap between the Bloom filters of their 100 favorite songs, the larger the intersection in their favorite
songs list. Since the Bloom filters are like compressed versions of the lists, comparing them may be much
faster than comparing song lists directly.
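The expression is easy to evaluate; the following sketch (with arbitrary illustrative parameters m = 1024, n = 100, k = 5) computes the expected number of differing bits as the overlap grows.

    def expected_diff_bits(m, n, k, intersection):
        # Expected number of positions where the two Bloom filters differ.
        q = (1 - 1 / m) ** (k * n)
        return 2 * m * q * (1 - (1 - 1 / m) ** (k * (n - intersection)))

    m, n, k = 1024, 100, 5
    for inter in (0, 25, 50, 75, 100):
        print(inter, round(expected_diff_bits(m, n, k, inter), 1))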
Exercise 5.23:
Suppose that we wanted to extend Bloom filters to allow deletions as well as insertions of items into the
underlying set. We could modify the Bloom filter to be an array of counters instead of an array of bits. Each
time an item is inserted into a Bloom filter, the counters given by the hashes of the item are increased by
one. To delete an item, one can simply decrement the counters. To keep space small, the counters should
be a fixed length, such as 4 bits.
Explain how errors can arise when using fixed-length counters. Assuming a setting where one has at
most n elements in the set at any time, m counters, k hash functions, and counters with b bits, explain how
to bound the probability that an error occurs over the course of t insertions or deletions.
Solution to Exercise 5.23: This variation of the Bloom filter is generally referred to as a counting Bloom
filter in the literature.
Problems can occur with counting Bloom filters if there is counter overflow. An overflow occurs whenever
2^b of the nk hashes fall in the same counter. We can use the Poisson approximation to bound the probability of
counter overflow, treating each hash as a ball and each counter as a bin. (By Theorem 5.10, the probabilities
we compute via the Poisson approximation are within a factor of 2 of the exact probabilities, since counter
overflow increases with the number of hashes.) The Poisson approximation says that the probability a single
counter does not overflow at any point in time is at least

    Σ_{j=0}^{2^b−1} e^{−kn/m}(kn/m)^j / j!.

(The probability could be even greater if there were fewer than n elements at that time.) The probability that
no counter overflows is then at least

    (Σ_{j=0}^{2^b−1} e^{−kn/m}(kn/m)^j / j!)^m.

Including the factor of 2 from the approximation, the probability of any counter overflow is bounded by

    2(1 − (Σ_{j=0}^{2^b−1} e^{−kn/m}(kn/m)^j / j!)^m).

Finally, the above is for any fixed moment in time, so by a union bound, we can bound the probability over
all time steps by

    2t(1 − (Σ_{j=0}^{2^b−1} e^{−kn/m}(kn/m)^j / j!)^m).

As an example, when b = 4, n = 10000, m = 80000, and k = 5, we have a bound of approximately
0.23 · 10^{−11} · t, so one can go many steps without an overflow.
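The bound is easy to evaluate numerically; the sketch below computes the per-time-step quantity 2(1 − (1 − q)^m), where q is the Poisson tail probability of a single counter overflowing, using log1p/expm1 so the tiny probabilities are not lost to floating-point rounding. The parameters match the example above.

    import math

    def poisson_tail(lam, k, terms=100):
        # Pr[Poisson(lam) >= k], summing a finite number of tail terms.
        term = math.exp(-lam) * lam**k / math.factorial(k)
        total = 0.0
        for j in range(k, k + terms):
            total += term
            term *= lam / (j + 1)
        return total

    def overflow_bound_per_step(b, n, m, k):
        q = poisson_tail(k * n / m, 2**b)  # single-counter overflow probability
        return -2 * math.expm1(m * math.log1p(-q))

    print(overflow_bound_per_step(4, 10000, 80000, 5))  # roughly 0.23e-11 per step, as above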
Solution to the Exploratory Assignment
For better or worse, a great deal about the assignment can be found in the paper “An Experimental
Assignment on Random Processes”, available at Michael Mitzenmacher’s home page. This makes useful
reading for an instructor in preparation for the assignment; however, an instructor should be on the lookout
for students copying from this description.
A key insight into the assignment is that, for each pair of sibling leaves at the bottom of the tree, at least
one of the two siblings must arrive in order for all the nodes to be marked.
The first process behaves like a coupon collector’s problem: each pair of sibling leaves must have one of its
leaves marked. The probability any given pair is hit on any specific step is 2/N , and there are Q = (N + 1)/2

pairs of leaves. Let X_i be the number of nodes sent after the (i − 1)st pair is hit until the ith pair is hit, and
let X = Σ X_i. Then

    E[X] = N/(2Q) + N/(2(Q − 1)) + ... + N/2 = (N/2) Σ_{i=1}^Q 1/i = Ω(N log Q) = Ω(N log N),

and this is a lower bound on the expected total nodes sent.


For the second process, the behavior is similar to the birthday paradox. Let us first consider just the
Q = (N + 1)/2 leaf nodes, in the order they arrive. What is the probability some pair of sibling leaf nodes
both lie in the last L = c√Q leaf nodes (for some constant c)? Reversing the order, so that we look at these
Q nodes from back to front, this is equivalent to asking whether we have some pair in the first c√Q nodes.
The probability that no such pair occurs is just

    Π_{i=1}^L (1 − (i − 1)/(Q − i + 1)) ≤ e^{−Σ_{i=1}^L (i−1)/(Q−i+1)} ≤ e^{−L(L−1)/(2Q)},

which for L = c√Q is e^{−c²/2 + o(1)}, bounded away from 1 for large Q. Hence the probability that the last
L = c√Q nodes contain a pair of sibling leaves is at least some constant. Now let c = 1, and note that within
the last 2√N sent nodes, with probability at least 1/2, at least 1/2 of these nodes are leaf nodes (just by
symmetry). The result therefore follows.
The behavior of the third process is completely deterministic. It takes exactly (N + 1)/2 nodes to arrive
in all cases. This is easily proved by induction on n, taking cases on whether one of the (N + 1)/2 nodes
that arrives is the root or not. Alternatively, there is a nice interpretation. Let us think of each leaf node
as a variable, and each internal node as the sum of variables in its subtree. Interpret being sent a node as
being given a value for the corresponding variable or sum of variables. We have (N + 1)/2 variables; once
we have (N + 1)/2 values, we have (N + 1)/2 linearly independent equations in these (N + 1)/2 variables,
which can then be solved, giving all the values and allowing us to fill the tree. The marking actions on the
tree correspond exactly to solving for all possible variables (or sums of variables) given the values present.

Chapter 6

Exercise 6.2:
(a) Prove that, for every integer n, there exists a coloring of the edges of the complete graph K_n by two
colors so that the total number of monochromatic copies of K_4 is at most C(n, 4) 2^{−5}.

(b) Give a randomized algorithm for finding a coloring with at most C(n, 4) 2^{−5} monochromatic copies of K_4
that runs in expected time polynomial in n.
(c) Show how to construct such a coloring deterministically in polynomial time using the method of
conditional expectations.
Solution to Exercise 6.2:
 
(a) Suppose we just color the edges randomly, independently for each edge. Enumerate the C(n, 4) copies
of K_4 in the graph in any order. Let X_i = 1 if the ith copy is monochromatic, and 0 otherwise.
Then X = Σ X_i, and by linearity E[X] = Σ E[X_i]. Each copy of K_4 has 6 edges, and is therefore
monochromatic with probability 2·2^{−6} = 2^{−5}, so we have E[X] = C(n, 4) 2^{−5}. It follows that there is at least one
coloring with at most this many monochromatic copies of K_4.
(b) A randomized algorithm would simply color each edge randomly and count the number of monochro-
matic copies of K_4. Such counting takes time at most O(n^4), as there are C(n, 4) copies to check. For
convenience, let us assume n is large enough so that C(n, 4) 2^{−5} is an integer. Let p be the probability that a
randomized trial succeeds, that is, produces a coloring with at most C(n, 4) 2^{−5} monochromatic copies. Since
X is a non-negative integer-valued random variable with E[X] = C(n, 4) 2^{−5}, we have

    C(n, 4) 2^{−5} = E[X] ≥ (1 − p)(C(n, 4) 2^{−5} + 1),

so that

    p ≥ 1/(C(n, 4) 2^{−5} + 1).

Hence it takes on average at most O(n^4) trials to have one with the required number of monochromatic
copies. We conclude the randomized guess-and-check algorithm takes expected time at most O(n^8).
(c) Consider the edges in some arbitrary order. We color the edges one at a time. Each time we color
an edge, we consider both possible colors, and for each coloring we compute the expected number of
monochromatic K_4 that will be obtained if we color the remaining edges randomly. We give the edge
the color with the smaller expectation. This can be computed in at most O(n^4) time each time we color
an edge, simply by checking each copy of K_4 and determining the probability it will still be colored
monochromatically. In fact, this computation can be done much more quickly; for example, we need
really only consider and compare the expectation for copies of K_4 that include the edge to be colored,
which reduces the computation to O(n^2) for each edge colored. As there are m edges, this gives an
O(n^4 m) (or O(n^2 m)) algorithm. At each step, the conditional expectation of the number of monochromatic
K_4 remains at most C(n, 4) 2^{−5}, so the algorithm terminates with the desired coloring.
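The following Python sketch (not part of the original solution) illustrates the method of conditional expectations on a small graph; for simplicity it recomputes the full conditional expectation for each candidate color, which corresponds to the slower O(n^4)-per-edge version described above.

    from itertools import combinations
    from math import comb

    def expected_mono(n, coloring):
        # Conditional expectation of the number of monochromatic K4's given a partial
        # coloring (dict: edge -> 0/1); uncolored edges are colored uniformly at random.
        total = 0.0
        for quad in combinations(range(n), 4):
            edges = list(combinations(quad, 2))
            colors = {coloring[e] for e in edges if e in coloring}
            uncolored = sum(1 for e in edges if e not in coloring)
            if len(colors) == 2:
                continue                         # already bichromatic
            elif len(colors) == 1:
                total += 2.0 ** (-uncolored)     # remaining edges must match the fixed color
            else:
                total += 2.0 ** (1 - uncolored)  # all 6 edges free: 2 * 2^{-6}
        return total

    def derandomized_coloring(n):
        coloring = {}
        for e in combinations(range(n), 2):
            coloring[e] = min((0, 1), key=lambda c: expected_mono(n, {**coloring, e: c}))
        return coloring

    n = 8
    col = derandomized_coloring(n)
    mono = sum(1 for q in combinations(range(n), 4)
               if len({col[e] for e in combinations(q, 2)}) == 1)
    print(mono, comb(n, 4) / 32)  # achieved count vs. the expectation bound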
Exercise 6.3: Given an n-vertex undirected graph G = (V, E), consider the following method of generating
an independent set. Given a permutation σ of the vertices, define a subset S(σ) of the vertices as follows:
for each vertex i, i ∈ S(σ) if and only if no neighbor j of i precedes i in the permutation σ.
(a) Show that each S(σ) is an independent set in G.
(b) Suggest a natural randomized algorithm to produce σ for which you can show that the expected
cardinality of S(σ) is
    Σ_{i=1}^n 1/(d_i + 1),
where di denotes the degree of vertex i.

(c) Prove that G has an independent set of size at least Σ_{i=1}^n 1/(d_i + 1).

Solution to Exercise 6.3:


(a) Suppose i, j ∈ S(σ) where i and j are neighbors. But if i precedes j in σ then j cannot be in S(σ),
and similarly if j precedes i. In either case we have a contradiction, so no two elements of S(σ) can be
neighbors, and S(σ) forms an independent set.
(b) We choose σ uniformly at random. We now define Xi by Xi = 1 if i ∈ S(σ) and Xi = 0 otherwise.
We have Xi = 1 if and only if i precedes all of its neighbors in σ. Now consider the subset of
elements Si consisting of i and all of its neighbors. Since the permutation is chosen uniformly at
random, each element of S_i is equally likely to be the first element of S_i to appear in σ. Since
|S_i| = d_i + 1, we see that Pr(X_i = 1) = 1/(d_i + 1). As we can represent |S(σ)| = Σ_{i=1}^n X_i, we have
E[|S(σ)|] = Σ_{i=1}^n E[X_i] = Σ_{i=1}^n 1/(d_i + 1).
(c) We have shown that for a random permutation σ, E[|S(σ)|] = Σ_{i=1}^n 1/(d_i + 1). We can therefore conclude
that there exists a permutation π such that |S(π)| ≥ Σ_{i=1}^n 1/(d_i + 1), and therefore there is an independent
set S(π) of the desired size.
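The randomized algorithm of part (b) is simple to implement; the sketch below uses a small hypothetical example (a 5-cycle, so every degree is 2) and compares the average size of S(σ) with Σ_i 1/(d_i + 1).

    import random

    def random_order_independent_set(adj, rng):
        # adj: dict vertex -> set of neighbors. Returns S(sigma) for a uniformly random sigma.
        order = list(adj)
        rng.shuffle(order)
        position = {v: i for i, v in enumerate(order)}
        return {v for v in adj if all(position[v] < position[u] for u in adj[v])}

    adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}  # a 5-cycle
    sizes = [len(random_order_independent_set(adj, random.Random(s))) for s in range(2000)]
    print(sum(sizes) / len(sizes), sum(1 / (len(adj[v]) + 1) for v in adj))  # both near 5/3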
Exercise 6.4: Consider the following 2-player game. The game begins with k tokens, placed at the number
0 on the integer number line spanning [0, n]. Each round, one player, called the chooser, selects two disjoint
and nonempty sets of tokens A and B. (The sets A and B need not cover all the remaining tokens; they
only need to be disjoint.) The second player, called the remover, takes all the tokens from one of the sets
off the board. The tokens from the other set all move up one space on the number line from their current
position. The chooser wins if any token ever reaches n. The remover wins if the chooser finishes with one
token that has not reached n.
(a) Give a winning strategy for the chooser when k ≥ 2^n.
(b) Use the probabilistic method to show that there must exist a winning strategy for the remover when
k < 2^n.
(c) Explain how to use the method of conditional expectations to derandomize the winning strategy for
the remover when k < 2^n.
Solution to Exercise 6.4:
(a) At each round, the chooser splits the tokens up into two equal size sets (or sets as equal as possible)
A and B. By induction, after j rounds, the chooser has at least 2^{n−j} tokens at position j, and hence
has at least one token at position n after n rounds.
(b) Suppose that the remover just chooses a set to remove randomly (via an independent, fair coin flip)
each time. Let X be the number of tokens that ever reach position n when the remover uses this
strategy. Let X_m = 1 if the mth token ever reaches position n and 0 otherwise, so X = Σ_{m=1}^k X_m. For
the mth token to reach position n, it has to be moved forward n times; each time the chooser puts
it in a set, it is removed with probability 1/2. Hence the probability it reaches position n is at most
1/2^n, from which we have E[X_m] ≤ 1/2^n and thus E[X] ≤ k/2^n < 1. Since a random strategy yields
on average less than 1 token that reaches position n, there must be a strategy that yields 0 tokens that
reach position n.
(c) At any point in the game, let k_i be the number of tokens at position i. With the randomized strategy
of part (b), the expected number of tokens that reach position n is Σ_{i=0}^n k_i 2^{i−n}. At each round, the
remover should remove the set that minimizes this expectation after the set is removed. Inductively,
after each round, the expected number of tokens that will reach position n is less than 1. Hence,
regardless of the strategy of the chooser, the remover guarantees that 0 tokens reach position n.

Exercise 6.5: We have shown using the probabilistic method that, if a graph G has n nodes and m edges,
there exists a partition of the n nodes into sets A and B so that at least m/2 edges cross the partition.
Improve this result slightly: show that there exists a partition so that at least mn/(2n − 1) edges cross the partition.
Solution to Exercise 6.5: Instead of partitioning the vertices by placing each randomly, uniformly choose
a partition with n/2 vertices on each side if n is even, or (n − 1)/2 on one side and (n + 1)/2 on the other
if n is odd. In the first case, for any edge (x, y) there are 2 C(n − 2, (n − 2)/2) partitions of the vertices where x and
y are on opposite sides (2 ways to position x and y, C(n − 2, (n − 2)/2) ways to partition the other vertices). Hence
the probability the edge crosses the cut is 2 C(n − 2, (n − 2)/2)/C(n, n/2) = n/(2n − 2). By linearity of expectations the
expected number of edges crossing the cut is mn/(2n − 2). The second case is similar, but the probability the
edge crosses the cut is 2 C(n − 2, (n − 1)/2)/C(n, (n + 1)/2) = (n + 1)/(2n), and the expected number of edges crossing the
cut is m(n + 1)/(2n). In both cases the result is better than mn/(2n − 1).
Exercise 6.9: A tournament is a graph on n vertices with exactly one directed edge between each pair of
vertices. If vertices represent players, each edge can be thought of as the result of a match between the two
players; the edge points to the winner. A ranking is an ordering on the n players from best to worst (ties are
not allowed). Given the outcome of a tournament, one might wish to determine a ranking of the players. A
ranking is said to disagree with a directed edge from y to x if y is ahead of x in the ranking (since x beat y
in the tournament).
(a) Prove that, for every tournament, there exists a ranking that disagrees with at most 50% of the edges.
(b) Prove that, for sufficiently large n, there exists a tournament such that every ranking disagrees with
at least 49% of the edges in the tournament.
Solution to Exercise 6.9:
(a) For a ranking chosen uniformly at random, each edge individually agrees with the ranking with probability
1/2. Hence the expected number of edges that agree is 1/2 the number of edges, and thus there must
exist a ranking that agrees on at least 1/2 the edges.
(b) Choose a random tournament, where the direction of each edge is chosen independently, and
  consider
any fixed ranking. Let X be the number of edges that agree with the ranking. Clearly E[X] = C(n, 2)/2. Since
each edge is chosen independently, we must have by Chernoff bounds (such as, for example, Corollary
4.9 on page 70)

    Pr(X ≥ 0.51 C(n, 2)) ≤ e^{−C(n,2)/10000}.
Now let us take a union bound over all n! different rankings. The probability that any of these rankings
agrees with at least 0.51 C(n, 2) of the edges is at most

    n! e^{−C(n,2)/10000}.

Since n! ≤ n^n = e^{n ln n}, we see that for large enough n, this expression goes to 0. Hence, for large
enough n, a random tournament is very likely to have no ranking that agrees with 51 percent of the
matches, which gives the result.
Exercise 6.10: A family of subsets F of {1, 2, . . . , n} is called an antichain if there is no pair of sets A and
B in F satisfying A ⊂ B.
(a) Give an example of F where |F| = C(n, ⌊n/2⌋).

(b) Let f_k be the number of sets in F with size k. Show that

    Σ_{k=0}^n f_k / C(n, k) ≤ 1.

(Hint: Choose a random permutation of the numbers from 1 to n, and let X_k = 1 if the first k numbers
in your permutation yield a set in F. If X = Σ_{k=0}^n X_k, what can you say about X?)

 
(c) Argue that |F| ≤ C(n, ⌊n/2⌋) for any antichain F.

Solution to Exercise 6.10:


(a) The example is just to take all subsets of size ⌊n/2⌋.

(b) For X = Σ_{k=0}^n X_k, we must have X ≤ 1, since only one of the X_k can be 1 by definition. (If two of
the X_k equal 1, we have two sets, one a subset of the other, both in the antichain. This contradicts
the definition.) Hence E[X] ≤ 1. But by linearity


    1 ≥ E[X] = Σ_{k=0}^n E[X_k] = Σ_{k=0}^n Pr(X_k = 1) = Σ_{k=0}^n f_k / C(n, k),

giving the result.


(c) As

    1 ≥ Σ_{k=0}^n f_k / C(n, k) ≥ Σ_{k=0}^n f_k / C(n, ⌊n/2⌋) = |F| / C(n, ⌊n/2⌋),

the result follows.


Exercise 6.13: Consider a graph in G_{n,p}, with p = c ln n/n. Use the second moment method or the conditional
expectation inequality to prove that if c < 1 then, for any constant ε > 0 and for n sufficiently large, the
graph has isolated vertices with probability at least 1 − ε.
Solution to Exercise 6.13: First, let us determine the expected number of isolated vertices. The proba-
bility a vertex is isolated is the probability that none of its n − 1 adjacent edges appear, which is

    (1 − c ln n/n)^{n−1} ≈ e^{−c ln n} = n^{−c}.

Note that here and throughout we use the approximation 1 − x ≈ e^{−x} when x is small; for n sufficiently large,
this approximation is suitable for this argument, as the error affects only lower order terms, so we may think
of the approximation as an equality up to these lower order terms. The expected number of isolated vertices
is therefore approximately n^{1−c}, which is much greater than 1 for c < 1.
Now consider using the conditional expectation inequality. Let X_i be 1 if the ith vertex is isolated and
0 otherwise, and let X = Σ X_i. We have found that Pr(X_i = 1) ≈ n^{−c}. Also,

    E[X | X_i = 1] = 1 + Σ_{j≠i} Pr(X_j = 1 | X_i = 1).

Now, conditioning on vertex i being isolated does not affect X_j substantially; the probability that the jth vertex
is isolated (for j ≠ i) after this conditioning is

    Pr(X_j = 1 | X_i = 1) = (1 − c ln n/n)^{n−2} ≈ n^{−c}.

Again, here we use approximation to signify the difference is only in lower order terms. From the conditional
expectation inequality, and using our approximations, we have

    Pr(X > 0) ≥ n^{1−c}/(1 + (n − 1)n^{−c}) ≥ 1 − ε

for any constant ε when n is sufficiently large.


Exercise 6.14: Consider a graph in G_{n,p}, with p = 1/n. Let X be the number of triangles in the graph,
where a triangle is a clique with three edges. Show that

Pr(X ≥ 1) ≤ 1/6

and that
    lim_{n→∞} Pr(X ≥ 1) ≥ 1/7.

(Hint: use the conditional expectation inequality.)


Solution to Exercise 6.14:
The expectation E[X] is given by

    E[X] = C(n, 3)(1/n)³ ≤ 1/6.

Since X takes on values on the non-negative integers, it follows that Pr(X > 0) ≤ E[X] ≤ 1/6.
To use the conditional expectation inequality, we enumerate the C(n, 3) possible triangles in some arbitrary
way. We let X_i = 1 if the ith triangle appears in the graph, and X_i = 0 otherwise. Clearly X = Σ_i X_i, and
Pr(X_i = 1) = 1/n³. Now

    E[X | X_i = 1] = 1 + Σ_{j≠i} Pr(X_j = 1 | X_i = 1).
There are C(n − 3, 3) ways of choosing a triangle so that it is disjoint from X_i. In this case the edges of
the triangles are independent and Pr(X_j = 1 | X_i = 1) = 1/n³. Similarly, there are 3 C(n − 3, 2) triangles
that share a vertex but no edges with X_i, and for these triangles Pr(X_j = 1 | X_i = 1) = 1/n³. Finally,
there are 3 C(n − 3, 1) triangles that share two vertices and hence one edge with X_i, and for these triangles
Pr(X_j = 1 | X_i = 1) = 1/n². Putting this together in the conditional expectation inequality we have

    Pr(X > 0) ≥ (C(n, 3)/n³) / (1 + C(n − 3, 3)/n³ + 3 C(n − 3, 2)/n³ + 3 C(n − 3, 1)/n²).

Taking the limit now gives the desired result.


Exercise 6.16: Use the Lovasz local lemma to show that, if
  
    4 C(k, 2) C(n, k − 2) 2^{1−C(k,2)} < 1,

then it is possible to color the edges of Kn so that it has no monochromatic Kk subgraph.


Solution to Exercise 6.16: To use the Lovasz local lemma, we need to set up a group of events so that
4dp ≤ 1, where d is the degree of the dependency graph, and p is an upper bound on the probability of each
event. There will be C(n, k) events, one event for each K_k subgraph, with the event being that the subgraph is
monochromatic. Then p = 2^{1−C(k,2)}. Two events are dependent only if the corresponding subgraphs share an
edge. Each edge is a part of C(n − 2, k − 2) < C(n, k − 2) subgraphs of type K_k, so each K_k subgraph shares an
edge with fewer than C(k, 2) C(n, k − 2) other K_k subgraphs. This gives our bound on d, and by the Lovasz local
lemma, we have a coloring where no K_k subgraph is monochromatic.

Chapter 7

Exercise 7.2: Consider the two-state Markov chain with the following transition matrix.
 
    P = ( p      1 − p )
        ( 1 − p  p     ).

Find a simple expression for P^t_{0,0}.
Solution to Exercise 7.2:
    P^t_{0,0} = (1 + (2p − 1)^t)/2.
This is easily shown by induction. See also the related Exercise 1.11, which considers the equivalent problem
in a different way.
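A quick check of the closed form (assuming numpy is available, though any matrix library would do):

    import numpy as np

    p, t = 0.3, 7
    P = np.array([[p, 1 - p], [1 - p, p]])
    closed_form = (1 + (2 * p - 1) ** t) / 2   # the expression derived above
    print(np.linalg.matrix_power(P, t)[0, 0], closed_form)  # the two values should agree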
Exercise 7.12: Let Xn be the sum of n independent rolls of a fair die. Show that for any k ≥ 2,
    lim_{n→∞} Pr(X_n is divisible by k) = 1/k.

Solution to Exercise 7.12: Let Z_1, Z_2, . . . be the sequence defined as follows: Z_j = (X_j mod k). Notice
that the Z_j form a Markov chain! It is easy to check that the stationary distribution of this chain is uniform
over 0, 1, . . . , k − 1, using for example Definition 7.8. Therefore we have by Theorem 7.7 that

    lim_{n→∞} Pr(Z_n = 0) = 1/k,

which is equivalent to the desired result.


Exercise 7.17: Consider the following Markov chain, which is similar to the 1-dimensional random walk
with a completely reflecting boundary at 0. Whenever position 0 is reached, with probability 1 the walk
moves to position 1 at the next step. Otherwise, the walk moves from i to i + 1 with probability p and from
i to i − 1 with probability 1 − p. Prove that
(a) if p < 1/2, each state is positive recurrent.
(b) if p = 1/2, each state is null recurrent.
(c) if p > 1/2, each state is transient.
Solution to Exercise 7.17: While there are many ways to solve this problem, a particularly pleasant way
involves the Catalan numbers; this may require some hints for the students if they have not seen them before.
(More on the Catalan numbers can be found in standard texts, or see the excellent Wikipedia article.)
Suppose we start from position k. If we move to k − 1, it is clear we will eventually (in finite time!)
return to k since 0 is a boundary. Hence, it suffices to consider what happens when we move to k + 1. In
this sense, every state has the same behavior with regard to recurrence and transience as state 0, so we just
consider starting from state 0 henceforth.
Starting from 0, we must make a move to 1. The probability that our first return to 0 occurs after 2n + 2
moves is given by C_n p^n (1 − p)^{n+1}, where C_n is the nth Catalan number, given by

    C_n = (1/(n + 1)) C(2n, n).

This is because there are Cn paths from 1 back to 1 of length 2n that do not hit 0. (Again, this is a standard
combinatorial result.)
Recall that we let r^t_{0,0} be the probability that the first return to 0 from 0 is at time t. Then

    Σ_{t≥1} r^t_{0,0} = Σ_{n=0}^∞ C_n p^n (1 − p)^{n+1}.

We now use the Catalan identity

    Σ_{n=0}^∞ C_n x^n = (1 − √(1 − 4x))/(2x)

to find

    Σ_{t≥1} r^t_{0,0} = (1 − p) Σ_{n=0}^∞ C_n (p(1 − p))^n = (1 − p) · (1 − √(1 − 4p(1 − p)))/(2p(1 − p)).

When p ≤ 1/2, then 1 − √(1 − 4p(1 − p)) = 2p, and we have Σ_{t≥1} r^t_{0,0} = 1. Hence when p ≤ 1/2 the chain is recurrent.
When p > 1/2, then 1 − √(1 − 4p(1 − p)) = 2 − 2p, and Σ_{t≥1} r^t_{0,0} = (1 − p)/p < 1. In this case, the chain is transient.
To cope with the issue of whether state 0 is positive recurrent or null recurrent, we consider
= (1 − p)/p < 1. In this case, the chain is transient.
To cope with the issue of whether state 0 is positive recurrent or null recurrent, we consider


    h_{0,0} = Σ_{n=0}^∞ (2n + 2) C_n p^n (1 − p)^{n+1} = 2(1 − p) Σ_{n=0}^∞ C(2n, n) (p(1 − p))^n.

We make use of the binomial identity

    Σ_{n=0}^∞ C(2n, n) x^n = 1/√(1 − 4x)

to find

    h_{0,0} = 2(1 − p)/√(1 − 4p(1 − p)).

Hence h_{0,0}, the expected time to return to 0 starting from 0, is finite when p < 1/2 and is infinite when p = 1/2.
Exercise 7.20: We have considered the gambler’s ruin problem in the case where the game is fair. Consider
the case where the game is not fair; instead, the probability of losing a dollar each game is 2/3, and the
probability of winning a dollar each game is 1/3. Suppose that you start with i dollars and finish either
when you reach n or lose it all. Let Wt be the amount you have gained after t rounds of play.
(a) Show that E[2^{W_{t+1}}] = E[2^{W_t}].
(b) Use part (a) to determine the probability of finishing with 0 dollars and the probability of finishing
with n dollars when starting at position i.
(c) Generalize the preceding argument to the case where the probability of losing is p > 1/2. (Hint: Try
considering E[c^{W_t}] for some constant c.)
Solution to Exercise 7.20:
(a) We have, whenever we have not reached 0 or n dollars, that

    E[2^{W_{t+1}} | W_t] = (2/3)·2^{W_t−1} + (1/3)·2^{W_t+1} = 2^{W_t}.

It follows that

    E[2^{W_{t+1}}] = E[E[2^{W_{t+1}} | W_t]] = E[2^{W_t}].
(b) We have W_0 = 0, and inductively, E[2^{W_t}] = E[2^{W_0}] = 1. Let q_i be the probability that starting from i
we eventually end at 0. By the same argument as for the gambler's ruin problem, we now have

    lim_{t→∞} E[2^{W_t}] = 1,

and hence

    q_i 2^{−i} + (1 − q_i)2^{n−i} = 1.

We find

    q_i = (2^n − 2^i)/(2^n − 1).

(c) We can repeat the same argument, with c = p/(1 − p). We have

    E[c^{W_{t+1}} | W_t] = p(p/(1 − p))^{W_t−1} + (1 − p)(p/(1 − p))^{W_t+1} = c^{W_t}.

We therefore find in this setting

    q_i c^{−i} + (1 − q_i)c^{n−i} = 1,

so

    q_i = (c^n − c^i)/(c^n − 1).
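A short simulation (with arbitrarily chosen i = 3 and n = 10) can be compared against the formula for q_i from part (b):

    import random

    def ruin_probability_estimate(i, n, p_lose=2/3, trials=20000, seed=0):
        # Empirical probability of hitting 0 before n, starting with i dollars.
        rng = random.Random(seed)
        ruined = 0
        for _ in range(trials):
            x = i
            while 0 < x < n:
                x += -1 if rng.random() < p_lose else 1
            ruined += (x == 0)
        return ruined / trials

    i, n = 3, 10
    print(ruin_probability_estimate(i, n), (2**n - 2**i) / (2**n - 1))  # simulation vs. formula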
Exercise 7.21: Consider a Markov chain on the states {0, 1, . . . , n}, where for i < n we have Pi,i+1 = 1/2
and Pi,0 = 1/2. Also, Pn,n = 1/2 and Pn,0 = 1/2. This process can be viewed as a random walk on a
directed graph with vertices {0, 1, . . . , n}, where each vertex has two directed edges: one that returns to
0 and one that moves to the vertex with the next higher number (with a self-loop at vertex n). Find the
stationary distribution of this chain. (This example shows that random walks on directed graphs are very
different than random walks on undirected graphs.)
Solution to Exercise 7.21: Clearly π_0, the probability of being at state 0 in the stationary distribution, is
just 1/2, since at each step we move to 0 with probability 1/2. It follows inductively that π_i = 2^{−i−1} for
1 ≤ i < n, and π_n = 2^{−n}.
Exercise 7.23: One way of spreading information on a network uses a rumor-spreading paradigm. Suppose
that there are n hosts currently on the network. Initially, one host begins with a message. Each round, every
host that has the message contacts another host chosen independently and uniformly at random from the
other n − 1 hosts, and sends that host the message. We would like to know how many rounds are necessary
before all hosts have received the message with probability 0.99.
(a) Explain how this problem can be viewed in terms of Markov chains.
(b) Determine a method for computing the probability that j hosts have received the message after round
k given that i hosts have received the message after round k − 1. (Hint: There are various ways of
doing this. One approach is to let P (i, j, c) be the probability j hosts have the message after the first
c of the i hosts have made their choices in a round; then find a recurrence for P .)
(c) As a computational exercise, write a program to determine the number of rounds for a message starting
at one host to reach all other hosts with probability 0.9999 when n = 128.
Solution to Exercise 7.23:
(a) There are multiple ways to view this problem in terms of Markov chains. The most natural is to
have Xk be the number of hosts that have received the message after k rounds, with X0 = 1. The
distribution of Xk+1 depends only on Xk ; the history and which actual hosts have received the message
do not affect the distribution of Xk+1 .
(b) Following the hint, we have P(i, j, 0) = 1 if i = j and 0 otherwise. Further,

    P(i, j, c) = P(i, j, c − 1)·(j − 1)/(n − 1) + P(i, j − 1, c − 1)·(n − j + 1)/(n − 1).
That is, there are two ways to have j hosts with the message after c of the i hosts have made their
choices; either j hosts had the message after c − 1 choices and the cth host chose one of the other j − 1
hosts with the message already, or j − 1 hosts had the message after c − 1 choices and the cth host
chose a host without the message.
(c) We use the above recurrence to track the entire distribution of X_k round by round. A simple program
computes the desired values. For n = 128, all the hosts have the message with probability (greater
than) 0.99 after 17 rounds (if my code is right!). For probability (greater than) 0.9999, it requires 22
rounds.
Notice that even if each round each host contacted a host without the message, 7 rounds would be
needed.
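One possible implementation of the computation in part (c) is sketched below; it applies the recurrence from part (b) (with the (n − j + 1)/(n − 1) coefficient as written above) host by host within each round. If the recurrence is right, it should reproduce the 17 and 22 rounds reported above, though this brute-force version takes a little while for n = 128.

    def round_transition(dist, n):
        # dist[i] = Pr(i hosts are informed at the start of the round). Each of the i
        # informed hosts contacts a uniformly random other host; apply the recurrence
        # from part (b) one informed host at a time.
        new_dist = [0.0] * (n + 1)
        for i, pi in enumerate(dist):
            if pi == 0.0:
                continue
            P = [0.0] * (n + 1)
            P[i] = 1.0  # P[j] = Pr(j informed after c of the i hosts have chosen)
            for _ in range(i):
                Q = [0.0] * (n + 1)
                for j in range(1, n + 1):
                    Q[j] = P[j] * (j - 1) / (n - 1) + P[j - 1] * (n - j + 1) / (n - 1)
                P = Q
            for j, pj in enumerate(P):
                new_dist[j] += pi * pj
        return new_dist

    def rounds_for_targets(n, targets):
        dist = [0.0] * (n + 1)
        dist[1] = 1.0
        rounds, results, remaining = 0, {}, sorted(targets)
        while remaining:
            dist = round_transition(dist, n)
            rounds += 1
            while remaining and dist[n] >= remaining[0]:
                results[remaining.pop(0)] = rounds
        return results

    print(rounds_for_targets(128, [0.99, 0.9999]))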

Exercise 7.26: Let n equidistant points be marked on a circle. Without loss of generality, we think of the
points as being labeled clockwise from 0 to n − 1. Initially, a wolf begins at 0, and there is one sheep at
each of the remaining n − 1 points. The wolf takes a random walk on the circle. For each step, it moves
with probability 1/2 to one neighboring point and with probability 1/2 to the other neighboring point. At
the first visit to a point, the wolf eats a sheep if there is still one there. Which sheep is most likely to be the
last eaten?
Solution to Exercise 7.26: Interestingly, each of the remaining sheep is equally likely to be the last eaten.
There are many different ways this can be shown. One way is as follows. Consider the sheep at position
n − 1. This is the last sheep to be eaten if the wolf, starting from 0, walks in one direction from 0 to n − 2
before walking in the other direction directly from 0 to n − 1. By breaking the circle between n − 2 and
n − 1, we see that this is equivalent to a gambler's ruin problem as in Section 7.2.1. Because of this, the
probability the sheep at n − 1 is last is 1/((n − 2) + 1) = 1/(n − 1). By symmetry the same is true for the
sheep at position 1.
Now consider the sheep at position i, 1 < i < n − 1. There are two ways the sheep could be last: the
wolf first walks from 0 to i − 1 before reaching i + 1 (event A) and then walks from i − 1 to i + 1 without
passing through i (event B); or the wolf first walks from 0 to i + 1 before reaching i − 1 (event C), and then
walks from i + 1 to i − 1 without passing through i (event D). Again, each of these are simple gambler’s
ruin problems, and combining them gives
    Pr(A)·Pr(B) + Pr(C)·Pr(D) = ((n − i − 1)/(n − 2))·(1/(n − 1)) + ((i − 1)/(n − 2))·(1/(n − 1)) = 1/(n − 1).
Exercise 7.27: Suppose that we are given n records, R1 , R2 , . . . , Rn . The records are kept in some order.
The cost of accessing the jth record in the order is j. Thus, if we had 4 records ordered as R2 , R4 , R3 , R1 ,
the cost of accessing R4 would be 2, and the cost of accessing R1 would be 4.
Suppose further that at each step, record Rj is accessed with probability pj , with each step being
independent of other steps. If we knew the values of the pj in advance, we would keep the Rj in decreasing
order with respect to pj . But if we don’t know the pj in advance, we might use the “move-to-front” heuristic:
at each step, put the record that was accessed at the front of the list. We assume that moving the record
can be done with no cost and that all other records remain in the same order. For example, if the order was
R2, R4, R3, R1 before R3 was accessed, then the order at the next step would be R3, R2, R4, R1.
In this setting, the order of the records can be thought of as the state of a Markov chain. Give the
stationary distribution of this chain. Also, let Xk be the cost for accessing the kth requested record.
Determine an expression for limk→∞ E[Xk ]. Your expression should be easily computable in time that is
polynomial in n, given the pj .
Solution to Exercise 7.27: Let us consider an example with 4 records. Let π(i) be the record in the ith
position. The key is to note that for the records to be in the order Rπ(1) , Rπ(2) , Rπ(3) , Rπ(4) , we must have
that, thinking of the sequence of choices in reverse order, π(1) was chosen first, then π(2), and so on. This
gives the probability of this order as

    (p_{π(1)}/(p_{π(1)} + p_{π(2)} + p_{π(3)} + p_{π(4)})) · (p_{π(2)}/(p_{π(2)} + p_{π(3)} + p_{π(4)})) · (p_{π(3)}/(p_{π(3)} + p_{π(4)})).

This formula generalizes; if there are n records, the probability of the order R_{π(1)}, . . . , R_{π(n)} is:

    Π_{i=1}^{n−1} p_{π(i)} / (Σ_{j=i}^n p_{π(j)}).

This must correspond to the stationary distribution.


To calculate X_k, consider accessing some record far off in time. In the stationary distribution, record R_j
is ahead of record R_i if R_j was chosen more recently than R_i. The probability of this is just p_j/(p_i + p_j).
To find E[X_k], let X_{ij} be the indicator of the event that at time k record i is chosen and record j is ahead
of it. Since the cost of an access is one more than the number of records ahead of the requested record,

    lim_{k→∞} E[X_k] = 1 + E[Σ_{i, j≠i} X_{ij}] = 1 + Σ_{i, j≠i} E[X_{ij}] = 1 + Σ_{i} Σ_{j≠i} p_i · p_j/(p_i + p_j).

This can be easily calculated in polynomial time.
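As a check (not from the original text), the closed-form limit can be compared with a direct simulation of the move-to-front rule for an arbitrary example distribution:

    import random

    def expected_cost(p):
        # Limiting expected access cost: 1 + sum over i != j of p_i p_j / (p_i + p_j).
        return 1 + sum(p[i] * p[j] / (p[i] + p[j])
                       for i in range(len(p)) for j in range(len(p)) if j != i)

    def simulate_mtf(p, steps=200000, seed=0):
        rng = random.Random(seed)
        order = list(range(len(p)))
        total = 0
        for _ in range(steps):
            r = rng.choices(range(len(p)), weights=p)[0]
            pos = order.index(r)
            total += pos + 1                  # cost of accessing the record at this position
            order.insert(0, order.pop(pos))   # move the accessed record to the front
        return total / steps

    p = [0.4, 0.3, 0.2, 0.1]
    print(expected_cost(p), simulate_mtf(p))  # the two values should be close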

