Answer Key 2
Do Not Distribute!!!
We are still in the process of adding solutions. Expect further updates over time.
Chapter 1
Exercise 1.1: We flip a fair coin ten times. Find the probability of the following events.
(a) The number of heads and the number of tails are equal.
(b) There are more heads than tails.
(c) The ith flip and the (11 − i)th flip are the same for i = 1, . . . , 5.
(d) We flip at least four consecutive heads.
Solution to Exercise 1.1:
(a) The probability is \binom{10}{5}/2^{10} = 252/1024 = 63/256.
(b) By part (a), the probability that the number of heads and the number of tails differ is 193/256.
By symmetry, the probability that there are more heads than tails is half of this, or 193/512.
(c) The probability that each pair is the same is 1/2; by independence, the probability that all five pairs
are the same is 1/32.
(d) While there are other ways of solving this problem, with only 1024 possibilities, perhaps the easiest
way is just to exhaustively consider all 1024 possibilities (by computer!). This gives 251/1024.
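The exhaustive check mentioned in part (d) takes only a few lines; the following Python sketch enumerates all 2^10 flip sequences.

from itertools import product

count = 0
for flips in product("HT", repeat=10):
    # a run of at least four consecutive heads appears as the substring "HHHH"
    if "HHHH" in "".join(flips):
        count += 1

print(count, "/ 1024")   # prints 251 / 1024, matching the answer above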
Exercise 1.2: We roll two standard six-sided dice. Find the probability of the following events, assuming
that the outcomes of the rolls are independent.
(a) The two dice show the same number.
(b) The number that appears on the first die is larger than the number on the second.
(c) The sum of the dice is even.
(d) The product of the dice is a perfect square.
Solution to Exercise 1.2:
(a) The probability the second die matches the first is just 1/6.
(b) By part (a), the probability that the rolls are different is 5/6. By symmetry, the probability that the
first die is larger is 5/12.
(c) The probability is 1/2. This can be done by considering all possibilities exhaustively. Alternatively,
if the first die comes up with an odd number, the probability is 1/2 that the second die is also odd
and the sum is even. Similarly, if the first die comes up with an even number, the probability is 1/2
that the second die is also even and the sum is even. Regardless of the outcome of the first roll, the
probability is 1/2.
(d) The possible squares are 1, 4, 9, 16, 25, and 36. There is 1 way for the product to be 1, 9, 16, 25, or
36, and 3 ways for the product to be 4. This gives a total probability of 8/36 = 2/9.
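All four answers can be confirmed by enumerating the 36 equally likely outcomes; a small Python sketch:

from fractions import Fraction
from math import isqrt

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
total = Fraction(len(outcomes))

same = sum(1 for a, b in outcomes if a == b)
first_larger = sum(1 for a, b in outcomes if a > b)
even_sum = sum(1 for a, b in outcomes if (a + b) % 2 == 0)
square_product = sum(1 for a, b in outcomes if isqrt(a * b) ** 2 == a * b)

print(Fraction(same) / total)            # 1/6
print(Fraction(first_larger) / total)    # 5/12
print(Fraction(even_sum) / total)        # 1/2
print(Fraction(square_product) / total)  # 2/9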
Exercise 1.4: We are playing a tournament in which we stop as soon as one of us wins n games. We are
evenly matched, so each one of us wins any game with probability 1/2, independently of other games. What
is the probability that the loser has won k games when the match is over?
Solution to Exercise 1.4: Suppose that you win, and I have won k < n games. For this to happen, you
must win the last game, and in the remaining n + k − 1 games, I must win exactly k of them. This probability
is just
\binom{n+k-1}{k} / 2^{n+k}.
Of course, I could win, which has the same probability by symmetry. Hence the total desired probability is
\binom{n+k-1}{k} / 2^{n+k-1}.
Exercise 1.5: After lunch one day, Alice suggests to Bob the following method to determine who pays.
Alice pulls three six-sided dice from her pocket. These dice are not the standard dice, but have the following
numbers on their faces:
• Die A: 1,1,6,6,8,8
• Die B: 2,2,4,4,9,9
• Die C: 3,3,5,5,7,7
The dice are fair, so each side comes up with equal probability. Alice explains that Alice and Bob will each
pick up one of the dice. They will each roll their die, and the one who rolls the lowest number loses and will
buy lunch. So as to take no advantage, Alice offers Bob the first choice of the dice.
(a) Suppose that Bob chooses Die A and Alice chooses Die B. Write out all of the possible events and their
probabilities, and show that the probability that Alice wins is bigger than 1/2.
(b) Suppose that Bob chooses Die B and Alice chooses Die C. Write out all of the possible events and their
probabilities, and show that the probability that Alice wins is bigger than 1/2.
(c) Since Die A and Die B lead to situations in Alice’s favor, it would seem that Bob should choose Die
C. Suppose that Bob chooses Die C and Alice chooses Die A. Write out all of the possible events and
their probabilities, and show that the probability that Alice wins is still bigger than 1/2.
Solution to Exercise 1.5: By enumerating all cases, we find the second player wins with probability 5/9 in
all cases. For example, if Bob chooses Die A and Alice chooses Die B, there are nine equally likely outcomes:
(1, 2), (1, 4), (1, 9), (6, 2), (6, 4), (6, 9), (8, 2), (8, 4), (8, 9).
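A short Python sketch that enumerates all three matchups confirms that the second player (Alice) wins with probability 5/9 in each case:

from fractions import Fraction

dice = {"A": [1, 1, 6, 6, 8, 8], "B": [2, 2, 4, 4, 9, 9], "C": [3, 3, 5, 5, 7, 7]}

for bob, alice in [("A", "B"), ("B", "C"), ("C", "A")]:
    # count the outcomes among the 36 equally likely rolls in which Alice's die is higher
    wins = sum(1 for x in dice[bob] for y in dice[alice] if y > x)
    print(bob, "vs", alice, "-> Alice wins with probability", Fraction(wins, 36))  # 5/9 each time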
Solution to Exercise 1.9: Let us assume n is a power of 2. The probability of having log_2 n + k consecutive
heads starting from the jth flip is
1/2^{\log_2 n + k} = 1/(2^k n).
Here j can range from 1 to n − log_2 n − k + 1; for j > n − log_2 n − k + 1, there are not enough flips to have
log_2 n + k in a row before reaching the nth flip. Using a union bound, we have an upper bound of
(n − \log_2 n − k + 1)/(2^k n) ≤ 1/2^k.
Exercise 1.12: The following problem is known as the Monty Hall problem, after the host of the game
show “Let’s Make a Deal.” There are three curtains. Behind one curtain is a new car, and behind the other
two are goats. The game is played as follows. The contestant chooses the curtain that she thinks the car is
behind. Monty then opens one of the other curtains to show a goat. (Monty may have more than one goat
to choose from; in this case, assume he chooses which goat to show uniformly at random.) The contestant
can then stay with the curtain she originally chose or switch to the other unopened curtain. After that, the
location of the car is revealed, and the contestant wins the car or the remaining goat. Should the contestant
switch curtains or not, or does it make no difference?
Solution to Exercise 1.12:
We assume that the car is behind a curtain chosen uniformly at random. Since the contestant can pick
from 1 car and 2 goats, the probability of choosing a curtain with a car behind it is 1/3 and that of choosing
a curtain with a goat behind it is 2/3; if the contestant doesn’t switch, the probability of winning is just 1/3.
Now notice that if the contestant switches, she will win whenever she started out by picking a goat! It
follows that switching wins 2/3 of the time, and switching is a much better strategy.
This can be also set up as a conditional probability question, but the above argument is perhaps the
simplest; or simply playing the game several times (using cards – an Ace for the car, deuces for the goats)
can be very convincing.
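For readers who prefer the experimental route suggested above, here is a minimal Python simulation of the game (the number of trials is an arbitrary choice):

import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        choice = random.randrange(3)
        # Monty opens a curtain that is neither the contestant's choice nor the car,
        # chosen uniformly at random when he has two goats to pick from.
        opened = random.choice([c for c in range(3) if c != choice and c != car])
        if switch:
            choice = next(c for c in range(3) if c != choice and c != opened)
        wins += (choice == car)
    return wins / trials

print("stay:  ", play(switch=False))   # close to 1/3
print("switch:", play(switch=True))    # close to 2/3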
Exercise 1.13: A medical company touts its new test for a certain genetic disorder. The false negative rate
is small: if you have the disorder, the probability that the test returns a positive result is 0.999. The false
positive rate is also small: if you do not have the disorder, the probability that the test returns a positive
result is only 0.005. Assume that 2 percent of the population has the disorder. If a person chosen uniformly
from the population is tested, and the result comes back positive, what is the probability that the person
has the disorder?
Solution to Exercise 1.13: Let X be the event that the person has the disorder, and Y be the event that
the test result is positive. Then
Pr(X | Y) = Pr(X ∩ Y)/Pr(Y) = Pr(X ∩ Y)/(Pr(X ∩ Y) + Pr(X̄ ∩ Y)).
We now plug in the appropriate numbers:
Pr(X | Y) = (0.02 · 0.999)/(0.02 · 0.999 + 0.98 · 0.005) = 999/1244 ≈ 0.803.
Exercise 1.15: Suppose that we roll ten standard six-sided dice. What is the probability that their sum
will be divisible by 6, assuming that the rolls are independent? (Hint: use the principle of deferred decisions,
and consider the situation after rolling all but one of the dice.)
Solution to Exercise 1.15: The answer is 1/6. Using the principle of deferred decisions, consider the
situation after the first 9 die rolls. The remainder when dividing the sum of the nine rolls by 6 takes on one
of the values 0, 1, 2, 3, 4, or 5. If the remainder is 0, the last roll needs to be a 6 to have the final sum be
divisible by 6; if the remainder is 1, the last roll needs to be a 5 to have the final sum be divisible by 6; and
so on. For any value of the remainder, there is exactly one roll which will make the final sum divisible by 6,
so the probability is 1/6. (If desired, this can be made more formal using conditional probability.)
Exercise 1.18: We have a function F : {0, . . . , n − 1} → {0, . . . , m − 1}. We know that for 0 ≤ x, y ≤ n − 1,
F ((x + y) mod n) = (F (x) + F (y)) mod m. The only way we have to evaluate F is to use a look-up table
that stores the values of F . Unfortunately, an Evil Adversary has changed the value of 1/5 of the table
entries when we were not looking.
Describe a simple randomized algorithm that, given an input z, outputs a value that equals F (z) with
probability at least 1/2. Your algorithm should work for every value of z, regardless of what values the
Adversary changed. Your algorithm should use as few lookups and as little computation as possible.
Suppose I allow you to repeat your initial algorithm three times. What should you do in this case, and
what is the probability that your enhanced algorithm returns the correct answer?
Solution to Exercise 1.18: Choose x uniformly at random from {0, . . . , n − 1} and let y = (z − x) mod n. Look up F(x) and F(y) and
output (F(x) + F(y)) mod m. By the union bound, the probability of an error is bounded by the sum of
the probability that F(x) was changed and the probability that F(y) was changed. This sum is 2/5, so
(F(x) + F(y)) mod m = F(z) with probability at least 3/5. The algorithm uses two lookups and one random
choice. Notice that the probability that F(x) and F(y) are both correct is not (4/5)^2; this calculation
assumes independence. Only x is chosen randomly, while y is computed from x, so (x, y) is not a random
pair.
Suppose we run the algorithm three times independently and return the majority answer (if there is
one). This scheme returns the correct answer if at least two of the trials return the correct answer. By
independence, exactly two trials are correct with probability at least 3 · (3/5)^2 · (2/5) = 54/125, and all three trials are
correct with probability at least (3/5)^3 = 27/125, so the probability of a correct answer is at least 81/125.
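As an illustration, the following Python sketch simulates the two-lookup algorithm. The particular choices n = m = 1000 and F(x) = 7x mod m are assumptions made only for this example (any F with the stated additive property would do), as is the corruption pattern used for the adversary.

import random

n = m = 1000
table = [(7 * x) % m for x in range(n)]        # correct values of the (assumed) F
for x in random.sample(range(n), n // 5):      # adversary changes 1/5 of the entries
    table[x] = (table[x] + 1) % m

def query(z):
    # the two-lookup algorithm from the solution above
    x = random.randrange(n)
    y = (z - x) % n
    return (table[x] + table[y]) % m

z = random.randrange(n)
correct = sum(query(z) == (7 * z) % m for _ in range(100_000)) / 100_000
print(correct)   # guaranteed to be at least 3/5 in expectation, typically a bit higher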
Exercise 1.21: Give an example of three random events X, Y, Z for which any pair is independent but
all three are not mutually independent.
Solution to Exercise 1.21: The standard example is to let X and Y be independent random bits (0 with
probability 1/2, 1 with probability 1/2) and let Z be their exclusive-or (equivalently, their sum modulo 2).
Independence follows from calculations such as Pr(X = 1, Z = 1) = Pr(X = 1, Y = 0) = 1/4 = Pr(X = 1) Pr(Z = 1),
so X is independent of Z, and similarly Y is. The three values are clearly not independent since Z is
determined by X and Y . This construction generalizes; if Z is the exclusive-or of independent random bits
X1 , . . . , Xn , then any collection of n of the variables are independent, but the n + 1 variables are not.
Exercise 1.22:
(a) Consider the set {1, . . . , n}. We generate a subset X of this set as follows: a fair coin is flipped
independently for each element of the set; if the coin lands heads, the element is added to X, and
otherwise it is not. Argue that the resulting set X is equally likely to be any one of the 2^n possible
subsets.
(b) Suppose that two sets X and Y are chosen independently and uniformly at random from all the 2^n
subsets of {1, . . . , n}. Determine Pr(X ⊆ Y ) and Pr(X ∪ Y = {1, . . . , n}). (Hint: use the first part of
this problem.)
Solution to Exercise 1.22: For the first part, there are 2^n possible outcomes for the n flips, and each
gives a different set; since there are 2^n sets, each set must come up with probability 1/2^n.
For the second part, suppose we choose sets X and Y using the coin flipping method given in the first
part. In order for X ⊆ Y, there must be no element that is in X but not in Y. For each of the n elements, the
probability that it is in X but not in Y is 1/4, and these events are independent across elements. Hence the
probability that X ⊆ Y is (1 − 1/4)^n = (3/4)^n. Similarly, the probability that X ∪ Y contains all elements
is (3/4)^n, since the probability that each element is in X or Y (or both) is 3/4 independently for each item.
Exercise 1.23: There may be several different min-cut sets in a graph. Using the analysis of the randomized
min-cut algorithm, argue that there can be at most n(n − 1)/2 distinct min-cut sets.
Solution to Exercise 1.23: We have found that the probability that any specific min-cut set is returned
is at least 2/(n(n − 1)). If there are k distinct min-cut sets, the probability that one of them is returned is
at least 2k/(n(n − 1)), and we must have
2k/(n(n − 1)) ≤ 1.
We conclude that k ≤ n(n − 1)/2, so there are at most n(n − 1)/2 distinct min-cut sets.
Exercise 1.25: To improve the probability of success of the randomized min-cut algorithm, it can be run
multiple times.
(a) Consider running the algorithm twice. Determine the number of edge contractions and bound the
probability of finding a min-cut.
(b) Consider the following variation. Starting with a graph with n vertices, first contract the graph down to
k vertices using the randomized min-cut algorithm. Make ℓ copies of the graph with k vertices, and now
run the randomized algorithm on this reduced graph ℓ times, independently. Determine the number
of edge contractions and bound the probability of finding a minimum cut.
(c) Find optimal (or at least near-optimal) values of k and ℓ for the variation above that maximize the
probability of finding a minimum cut while using the same number of edge contractions as running the
original algorithm twice.
Solution to Exercise 1.25:
(a) There are 2n − 4 edge contractions (n − 2 for each run). Since the two runs are independent, the probability of success is at least
1 − (1 − 2/(n(n − 1)))^2 = 4(n^2 − n − 1)/(n^2 (n − 1)^2).
(b) First consider the probability that no edge of the min-cut is contracted in the first n − k contraction
steps. In the notation of the book, this is
Pr(F_{n−k}) ≥ \prod_{i=1}^{n−k} (1 − 2/(n − i + 1)) = k(k − 1)/(n(n − 1)).
Now consider the last k − 2 contractions in each of the ℓ independent runs on the reduced graph. The
overall probability of finding a minimum cut is then at least
Pr(F_{n−k}) · (1 − (1 − Pr(F_{n−2} | F_{n−k}))^ℓ) ≥ (k(k − 1)/(n(n − 1))) · (1 − (1 − 2/(k(k − 1)))^ℓ).
The total number of edge contractions is (n − k) + ℓ(k − 2).
(c) Assuming that k and n are large enough so that we can ignore constants, the expression from the
previous part is approximately
(k^2/n^2)(1 − (1 − 2/k^2)^ℓ),
and we require 2n − 4 = (n − k) + ℓ(k − 2), or roughly speaking ℓk ≈ n. We note that when k = c_1 n^{1/3}
and ℓ = c_2 n^{2/3}, the resulting probability of success is at least Ω(n^{−4/3}), and this is essentially the
best possible. This can be checked with a program such as Mathematica. Alternatively, note that
when ℓ < k^2, as an approximation, we have
1 − (1 − 2/k^2)^ℓ ≈ 2ℓ/k^2,
and hence the success probability is at least (approximately) 2ℓ/n^2, suggesting ℓ should be as large as
possible. However, once ℓ is large enough that 1 − (1 − 2/k^2)^ℓ is constant, increasing ℓ can only
improve this term by at most a constant factor, suggesting k should be as large as possible. This gives
the tradeoff yielding the result.
Exercise 1.26: Tic-tac-toe always ends up in a tie if players play optimally. Instead, we may consider
random variations of tic-tac-toe.
(a) First variation: Each of the nine squares is labeled either X or O according to an independent and
uniform coin flip. If only one of the players has one (or more) winning tic-tac-toe combinations, that
player wins. Otherwise, the game is a tie. Determine the probability that X wins. (You may want to
use a computer program to help run through the configurations.)
(b) Second variation: X and O take turns, with the X player going first. On the X player’s turn, an X
is placed on a square chosen independently and uniformly at random from the squares that are still
vacant; O plays similarly. The first player to have a winning tic-tac-toe combination wins the game,
and a tie occurs if neither player achieves a winning combination. Find the probability that each player
wins. (Again, you may want to write a program to help you.)
Solution to Exercise 1.26: For the first variation, out of 29 = 512 possible games, there are 116 ties. By
symmetry, X and O win with equal probability, so each wins 198 of the possible games, giving a winning
probability of roughly .386.
For the second variation, there are 255168 possible games. X wins 131184 of them; O wins 77904, leaving 46080
ties. However, shorter games occur with higher probability. Specifically, a game with k moves occurs with
probability 1/(9 · 8 · · · (9 − k + 1)). Taking this into account, X wins with probability roughly 0.585, and O wins with
probability roughly 0.288.
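For the first variation, the counts quoted above can be reproduced with a short program; the following Python sketch enumerates all 2^9 labelings.

from itertools import product

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def has_line(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)

x_wins = o_wins = ties = 0
for board in product("XO", repeat=9):
    x, o = has_line(board, "X"), has_line(board, "O")
    if x and not o:
        x_wins += 1
    elif o and not x:
        o_wins += 1
    else:
        ties += 1

print(x_wins, o_wins, ties)   # 198, 198, and 116, the counts quoted above
print(x_wins / 512)           # 198/512, roughly 0.387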
Chapter 2
Exercise 2.1: Suppose we roll a fair k-sided die with the numbers 1 through k on the die’s faces. If X is
the number that appears, what is E[X]?
Solution to Exercise 2.1:
E[X] = (1/k) \sum_{i=1}^{k} i = k(k + 1)/(2k) = (k + 1)/2.
Exercise 2.5: If X is a B(n, 1/2) random variable with n ≥ 1, show that the probability that X is even is
1/2.
Solution to Exercise 2.5: This can be proven by induction. It is trivially true for n = 1. Now note that a
B(n, 1/2) random variable corresponds to the number of heads in n fair coin flips. Consider n flips. By the
induction hypothesis, the number Y of heads after the first n − 1 of these coin flips, which has distribution
B(n − 1, 1/2), is even with probability 1/2. Flipping one more coin, we find that the probability that the total number of heads is even is
(1/2) · Pr(Y is even) + (1/2) · Pr(Y is odd) = 1/2.
In fact, notice we don’t even need the induction! Even if just the last flip is fair, the same equation holds,
and since Pr(Y is even) + Pr(Y is odd) = 1 the result holds.
Exercise 2.6: Suppose that we independently roll two standard six-sided dice. Let X1 be the number that
shows on the first die, X2 the number on the second die, and X the sum of the numbers on the two dice.
(a) What is E[X | X1 is even]?
(b) What is E[X | X1 = X2 ]?
(c) What is E[X1 | X = 9]?
(d) What is E[X1 − X2 | X = k] for k in the range [2, 12]?
Solution to Exercise 2.6:
(a) What is E[X | X1 is even]?
E[X | X_1 is even] = \sum_{k=2}^{12} k · Pr(X = k | X_1 is even)
= 3 · (1/18) + 4 · (1/18) + 5 · (2/18) + · · · + 12 · (1/18)
= 7.5.
There are other ways of calculating this; for example,
E[X | X_1 is even] = E[X_1 + X_2 | X_1 is even] = E[X_1 | X_1 is even] + E[X_2 | X_1 is even] = 4 + 3.5 = 7.5.
(d) What is E[X1 − X2 | X = k] for k in the range [2, 12]?
Note that E[X1 − X2 | X = k] = E[X1 | X = k] − E[X2 | X = k]. But this difference is 0 by symmetry.
Exercise 2.7: Let X and Y be independent geometric random variables, where X has parameter p and Y
has parameter q.
(a) What is the probability that X = Y? Think of flipping the two coins together, one flip of each per round.
If X > 1 and Y > 1, then by the memoryless property, the distributions of X and Y are as if we are
just beginning again. Hence, whenever we stop, we must have X = Y exactly when both coins come up heads on the same flip, given that at least one does; that is,
Pr(X = Y) = pq/(1 − (1 − p)(1 − q)).
(b) Think of flipping biased coins to determine X and Y . Let Z = max(X, Y ). Consider the four cases for
the outcomes from the first flip of X and Y . For example, if X > 1 and Y > 1, then by the memoryless
property the remaining distribution for the number of flips until the first head for each coin remains
geometric. If X = 1 and Y > 1, then we only need to consider the number of flips until the first head for the
second coin. Following this logic for all four cases, we have:
E[Z] = (pq) · 1 + p(1 − q)(1 + E[Y ]) + (1 − p)q(1 + E[X]) + (1 − p)(1 − q)(1 + E[Z])
= 1 + p(1 − q)/q + q(1 − p)/p + (1 − p)(1 − q)E[Z].
This gives
E[Z] = (1 + p(1 − q)/q + q(1 − p)/p)/(1 − (1 − p)(1 − q)).
(c) What is Pr(min(X, Y ) = k)? Think of flipping biased coins to determine X and Y . On the first flip,
at least one of the two coins is heads with probability s = 1 − (1 − p)(1 − q). If not, by the memoryless
property, the distribution of flips until the first head for each coin remains geometric. It follows that
min(X, Y ) is itself geometric with parameter s, giving the distribution.
(d) Using the now familiar logic of considering whether Y ≥ X is determined by the first flips of the respective
coins, we find that
Pr(Y ≥ X) = p/(1 − (1 − p)(1 − q)).
The expectation is then
E[X | X ≤ Y] = \sum_{x=1}^{∞} x Pr(X = x | Y ≥ X)
= \sum_{x=1}^{∞} x Pr(Y ≥ X = x)/Pr(Y ≥ X)
= \sum_{x=1}^{∞} x (1 − p)^{x−1} p (1 − q)^{x−1} / (p/(1 − (1 − p)(1 − q)))
= (1 − (1 − p)(1 − q)) \sum_{x=1}^{∞} x ((1 − p)(1 − q))^{x−1}
= 1/(1 − (1 − p)(1 − q)).
Exercise 2.8:
(a) Alice and Bob decide to have children until either they have their first girl or they have k ≥ 1
children. Assume that each child is a boy or girl independently with probability 1/2, and that there
are no multiple births. What is the expected number of female children that they have? What is the
expected number of male children that they have?
(b) Suppose Alice and Bob simply decide to keep having children until they have their first girl. Assuming
that this is possible, what is the expected number of boy children that they have?
Solution to Exercise 2.8:
(a) With probability 1/2^k, they have 0 girls; with probability 1 − 1/2^k, they have 1 girl. Hence the expected
number of female children is 1 − 1/2^k. The probability that they have at least i ≥ 1 boys is 1/2^i, since
this occurs whenever the first i children are boys. Hence the expected number of male children is
\sum_{i=1}^{k} 1/2^i = 1 − 1/2^k.
(b) Similarly, the expected number of male children is now
\sum_{i=0}^{∞} 1/2^{i+1} = 1.
Hence, even if they have kids until their first girl, on average they'll have just 1 boy.
Exercise 2.10:
(a) Show by induction that if f : R → R is convex, then for any x_1, x_2, . . . , x_n and λ_1, λ_2, . . . , λ_n with
\sum_{i=1}^{n} λ_i = 1,
f(\sum_{i=1}^{n} λ_i x_i) ≤ \sum_{i=1}^{n} λ_i f(x_i).
(b) Use the above to show that E[f(X)] ≥ f(E[X]) for any random variable X that takes on only finitely many values.
Solution to Exercise 2.10:
(a) This is done by induction. The case n = 1 is trivial; the case n = 2 is just the definition of convexity.
Assume that the statement holds for some value n. Consider now x_1, x_2, . . . , x_n, x_{n+1} and λ_1, λ_2, . . . , λ_n, λ_{n+1}
with \sum_{i=1}^{n+1} λ_i = 1. If λ_{n+1} = 1 the formula holds trivially, so we may assume λ_{n+1} < 1. Let
γ_i = λ_i/(1 − λ_{n+1}), so that \sum_{i=1}^{n} γ_i = 1. Now
f(\sum_{i=1}^{n+1} λ_i x_i) = f(\sum_{i=1}^{n} λ_i x_i + λ_{n+1} x_{n+1})
= f((1 − λ_{n+1}) \sum_{i=1}^{n} γ_i x_i + λ_{n+1} x_{n+1})
≤ (1 − λ_{n+1}) f(\sum_{i=1}^{n} γ_i x_i) + λ_{n+1} f(x_{n+1}).
The last line comes from convexity of f (the case of n = 2 items). But now by the induction hypothesis,
(1 − λ_{n+1}) f(\sum_{i=1}^{n} γ_i x_i) + λ_{n+1} f(x_{n+1}) ≤ (1 − λ_{n+1}) \sum_{i=1}^{n} γ_i f(x_i) + λ_{n+1} f(x_{n+1})
= \sum_{i=1}^{n} λ_i f(x_i) + λ_{n+1} f(x_{n+1})
= \sum_{i=1}^{n+1} λ_i f(x_i),
completing the induction.
(b) Let X take on the values x_1, . . . , x_n. Applying part (a) with λ_i = Pr(X = x_i) gives
E[f(X)] = \sum_{i=1}^{n} f(x_i) Pr(X = x_i) ≥ f(\sum_{i=1}^{n} x_i Pr(X = x_i)) = f(E[X]).
Exercise 2.13:
(a) Consider the following variation of the coupon collector’s problem. Each box of cereal contains one
of 2n different coupons. The coupons are organized into n pairs, so that coupons 1 and 2 are a pair,
coupons 3 and 4 are a pair, and so on. Once you obtain one coupon from every pair, you can obtain a
prize. Assuming that the coupon in each box is chosen independently and uniformly at random from
the 2n possibilities, what is the expected number of boxes you have to buy before you can claim the
prize?
(b) Generalize the result of the problem above for the case where there are kn different coupons, organized
into n disjoint sets of k coupons, so that you need one coupon from every set.
Solution to Exercise 2.13:
(a) Let X_i be the number of boxes bought, once we have coupons from i − 1 distinct pairs, until we obtain a
coupon from an ith distinct pair. Then X = \sum_{i=1}^{n} X_i is the desired number of boxes. Each X_i is a
geometric random variable with p_i = 1 − (2(i − 1))/(2n) = 1 − (i − 1)/n. Hence this has the
same behavior as the coupon collector's problem, and the same expectation, nH(n).
(b) Nothing changes; X_i is now the number of boxes needed to go from i − 1 groups to i groups, and again
p_i = 1 − (k(i − 1))/(kn) = 1 − (i − 1)/n.
Exercise 2.14: The geometric distribution arises as the distribution of the number of times we flip a coin
until it comes up heads. Consider now the distribution of the number of flips X until the kth head appears,
where each coin flip comes up heads independently with probability p. Prove that this distribution is given
by
Pr(X = n) = \binom{n−1}{k−1} p^k (1 − p)^{n−k}
for n ≥ k. (This is known as the negative binomial distribution.)
Solution to Exercise 2.14: In order for X = n, the nth flip must be heads, and from the other n − 1 flips,
we must choose exactly k − 1 of them to be heads. Hence
Pr(X = n) = p · \binom{n−1}{k−1} p^{k−1} (1 − p)^{n−k} = \binom{n−1}{k−1} p^k (1 − p)^{n−k}.
Exercise 2.15: For a coin that comes up heads independently with probability p on each flip, what is the
expected number of flips until the k-th head?
Solution to Exercise 2.15: Let X_i be the number of flips between the (i − 1)st and ith heads. Then
each X_i is geometric with expectation 1/p. Let X = \sum_{i=1}^{k} X_i be the number of flips until the kth head. By
linearity of expectations, E[X] = \sum_{i=1}^{k} E[X_i] = k/p. Note that this gives the expectation of the negative
binomial distribution of Exercise 2.14. (The expectation of the negative binomial could also be computed
directly, but it is more work.)
Exercise 2.17: Recall the recursive spawning process described in Section 2.3. Suppose that each call
to process S recursively spawns new copies of the process S, where the number of new copies is 2 with
probability p and 0 with probability 1 − p. If Yi denotes the number of copies of S in the i-th generation,
determine E[Yi ]. For what values of p is the expected total number of copies bounded?
Solution to Exercise 2.17: Let Y_i be the number of processes in generation i, and similarly for Y_{i−1}. Let Z_k
be the number of processes spawned by the kth process in generation i − 1. We have
E[Y_i | Y_{i−1} = y_{i−1}] = E[\sum_{k=1}^{y_{i−1}} Z_k] = \sum_{k=1}^{y_{i−1}} E[Z_k] = 2 y_{i−1} p.
Further,
E[Y_i] = E[E[Y_i | Y_{i−1}]] = E[2p Y_{i−1}] = 2p E[Y_{i−1}].
Inductively, since Y_0 = 1, we have E[Y_i] = (2p)^i. The expected total number of processes spawned is
E[\sum_{i=0}^{∞} Y_i] = \sum_{i=0}^{∞} E[Y_i] = \sum_{i=0}^{∞} (2p)^i,
which is bounded if and only if 2p < 1, that is, p < 1/2.
1 − 1/k = (k − 1)/k the item in memory is uniform over the first k − 1 items, by the induction hypothesis.
Hence each of the first k − 1 items is in the memory with probability 1/k, and by construction so is the kth
item, completing the induction.
Exercise 2.19: Suppose that we modify the reservoir sampling algorithm of Exercise 2.18 so that, when
the kth item appears, it replaces the item in memory with probability 1/2. Explain what the distribution
of the item in memory looks like.
Solution to Exercise 2.19: By induction, we can again show that the probability that the jth item is
in memory after the kth item has appeared is 2j−k−1 , except for the first item, which is in memory with
probability 2−k+1 . (Note j ≥ 1.)
Exercise 2.20: A permutation on the numbers [1, n] can be represented as a function π : [1, n] → [1, n],
where π(i) is the position of i in the ordering given by the permutation. A fixed point of a permutation
π : [1, n] → [1, n] is a value for which π(x) = x. Find the expected number of fixed points of a permutation
chosen uniformly at random from all permutations.
Solution to Exercise 2.20: Let Xi = 1 if i is a fixed point and 0 otherwise. As each element is a fixed
point with probability 1/n in a random permutation, we have E[X_i] = 1/n. Let X = \sum_{i=1}^{n} X_i be the number
of fixed points. By linearity of expectations,
E[X] = \sum_{i=1}^{n} E[X_i] = n/n = 1.
Exercise 2.21: Let a1 , a2 , . . . , an be a random permutation of {1, 2, . . . , n}, equally likely to be any of the
n! possible permutations. When sorting the list a1 , a2 , . . . , an , the element ai has to move a distance of
|a_i − i| places from its current position to reach its position in the sorted order. Find
E[\sum_{i=1}^{n} |a_i − i|].
The first equality follows from the definition of Xij , the second from the linearity of expectations, and the
third from the fact that Xij = |j − i| precisely with probability 1/n if the permutation is random. Now to
simplify this summation, note that over the range of i and j values, |j − i| takes on the value 1 exactly
2(n − 1) times, the value 2 exactly 2(n − 2) times, and so on; hence the sum equals
\sum_{i=1}^{n} 2i(n − i)/n.
must start all over again. Hence, with probability 1/6, Y = X + 1, and with probability 5/6, Y = X + 1 + Y′,
where Y′ has the same distribution as Y. Hence
E[Y] = (1/6) E[X + 1] + (5/6) E[X + 1 + Y′].
Simplifying using linearity of expectations and E[Y′] = E[Y] gives
(1/6) E[Y] = E[X] + 1.
Now E[X], the expected number of rolls until our first six, equals six. Hence E[Y] = 42.
Chapter 3
Exercise 3.2: Let X be a number chosen uniformly at random from [−k, k]. Find Var[X].
Solution to Exercise 3.2: We have E[X] = 0, so Var[X] = E[X^2]. We find
Var[X] = (1/(2k + 1)) \sum_{i=−k}^{k} i^2 = (2/(2k + 1)) \sum_{i=1}^{k} i^2 = (2/(2k + 1)) · k(k + 1)(2k + 1)/6 = k(k + 1)/3.
Exercise 3.3: Suppose that we roll a standard fair die 100 times. Let X be the sum of the numbers that
appear over the 100 rolls. Use Chebyshev’s inequality to bound Pr(|X − 350| ≥ 50).
Solution to Exercise 3.3: For a roll with outcome Y, we have E[Y] = 7/2 and
Var[Y] = E[Y^2] − E[Y]^2 = \sum_{i=1}^{6} i^2/6 − 49/4 = 35/12.
By linearity of expectations and of variance (when the variables being summed are independent), we have
E[X] = 350 and Var[X] = 100 · 35/12 = 875/3. Hence, by Chebyshev's inequality,
Pr(|X − 350| ≥ 50) ≤ (875/3)/50^2 = 7/60.
Exercise 3.7: A simple model of the stock market suggests that, each day, a stock with price q will increase
by a factor r > 1 to qr with probability p and will fall to q/r with probability 1 − p. Assuming we start with
a stock with price 1, find a formula for the expected value and the variance of the price of the stock after d
days.
Solution to Exercise 3.7: Let X_d be the price of the stock after d days. Let Y_i = r if the price increases
on the ith day and let Y_i = r^{−1} otherwise. We assume that the Y_i are independent. Now X_d = \prod_{i=1}^{d} Y_i, so
E[X_d] = E[\prod_{i=1}^{d} Y_i] = \prod_{i=1}^{d} E[Y_i] = (pr + (1 − p)/r)^d,
where we make use of the independence to simplify the expectation of the product. Similarly,
E[X_d^2] = E[\prod_{i=1}^{d} Y_i^2] = \prod_{i=1}^{d} E[Y_i^2] = (pr^2 + (1 − p)/r^2)^d.
Hence
Var[X_d] = E[X_d^2] − E[X_d]^2 = (pr^2 + (1 − p)/r^2)^d − (pr + (1 − p)/r)^{2d}.
Exercise 3.10: For a geometric random variable X, find E[X 3 ] and E[X 4 ]. (Hint: Use Lemma 2.5.)
Solution to Exercise 3.10: Let X be a geometric random variable with parameter p. Let Y = 1 if X = 1,
and Y = 0 otherwise. By Lemma 2.5, we have
E[X^3] = Pr(Y = 1) E[X^3 | Y = 1] + Pr(Y = 0) E[X^3 | Y = 0] = p + (1 − p) E[X^3 | Y = 0].
Here we use E[X^3 | Y = 1] = 1, since X = 1 when Y = 1. To simplify the expression further, recall that
conditioned on Y = 0, or equivalently on X > 1, X − 1 has the same (geometric) distribution as X by the memoryless
property. Hence
E[X^3 | Y = 0] = E[(X + 1)^3],
and we continue
E[X^3] = p + (1 − p) E[(X + 1)^3] = p + (1 − p)(E[X^3] + 3E[X^2] + 3E[X] + 1),
giving
E[X^3] = 1/p + 3(1 − p)(2 − p)/p^3 + 3(1 − p)/p^2 = (6 − 6p + p^2)/p^3.
Similarly, we find
E[X^4] = p + (1 − p) E[(X + 1)^4] = p + (1 − p)(E[X^4] + 4E[X^3] + 6E[X^2] + 4E[X] + 1),
giving
E[X^4] = 1/p + 4(1 − p)(p^2 − 6p + 6)/p^4 + 6(1 − p)(2 − p)/p^3 + 4(1 − p)/p^2
= (24 − 36p + 14p^2 − p^3)/p^4.
Exercise 3.11: Recall the Bubblesort algorithm of Exercise 2.22. Determine the variance of the number of
inversions that need to be corrected by Bubblesort.
Solution to Exercise 3.11: Denote the input set by {1, . . . , n} and the permutation by π. For i < j, let
X_{ij} be 1 if i and j are inverted and 0 otherwise. The total number of inversions is given by X = \sum_{i<j} X_{ij},
and E[X_{ij}] is the probability a pair is inverted, which is 1/2. By linearity of expectations,
E[X] = \sum_{i<j} E[X_{ij}] = (1/2) \binom{n}{2}.
To compute the variance, we write
E[X^2] = \sum_{i<j} E[X_{ij}^2] + \sum_{\{i,j\} \ne \{k,\ell\}} E[X_{ij} X_{k\ell}],
where in the second summation we have i < j and k < ℓ. Since X_{ij} is an indicator variable, X_{ij}^2 = X_{ij}, and
the first sum is just E[X]. To compute the second sum, we consider several cases. If none of the four indices
are equal, then X_{ij} and X_{k\ell} are independent, so E[X_{ij} X_{k\ell}] = E[X_{ij}] E[X_{k\ell}] = 1/4. There are \binom{n}{2}\binom{n−2}{2} such
pairs (i, j), (k, ℓ).
If the pairs overlap, there are three distinct indices i < j < k and three cases:
(i) (i, j), (i, k): This occurs in 2\binom{n}{3} ways, and in this case
E[X_{ij} X_{ik}] = Pr(X_{ij} X_{ik} = 1) = Pr(π(i) > π(j) and π(i) > π(k)) = 1/3.
(ii) (i, j), (j, k): This occurs in 2\binom{n}{3} ways, and in this case
E[X_{ij} X_{jk}] = Pr(X_{ij} X_{jk} = 1) = Pr(π(i) > π(j) > π(k)) = 1/6.
(iii) (i, k), (j, k): This occurs in 2\binom{n}{3} ways, and in this case
E[X_{ik} X_{jk}] = Pr(X_{ik} X_{jk} = 1) = Pr(π(i) > π(k) and π(j) > π(k)) = 1/3.
Therefore the overlapping pairs contribute
(2/3)\binom{n}{3} + (1/3)\binom{n}{3} + (2/3)\binom{n}{3} = (5/3)\binom{n}{3}.
Finally,
Var[X] = (1/2)\binom{n}{2} + (1/4)\binom{n}{2}\binom{n−2}{2} + (5/3)\binom{n}{3} − ((1/2)\binom{n}{2})^2 = n(n − 1)(2n + 5)/72.
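The closed form can be checked by brute force for small n; a short Python sketch enumerating all n! permutations:

from itertools import permutations
from fractions import Fraction

def inversions(p):
    return sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])

for n in range(2, 7):
    counts = [inversions(p) for p in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    # compare against the formula n(n-1)(2n+5)/72 derived above
    assert var == Fraction(n * (n - 1) * (2 * n + 5), 72)
print("variance formula verified for n = 2, ..., 6")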
Exercise 3.12: Find an example of a random variable with finite expectation and unbounded variance.
Give a clear argument showing that your choice has these properties.
Solution to Exercise 3.12: Let X be an integer-valued random variable defined by Pr(X = i) = c/i^3 for
i ≥ 1 and some positive constant c. This is possible since \sum_{i=1}^{∞} 1/i^3 is finite. Then
E[X] = \sum_{i=1}^{∞} c/i^2,
which is finite, but E[X^2] = \sum_{i=1}^{∞} c/i, which diverges, so the variance is unbounded.
Exercise 3.20:
(a) Chebyshev's inequality uses the variance of a random variable to bound its deviation from its expec-
tation. We can also use higher moments. Suppose that we have a random variable X and an even
integer k for which E[(X − E[X])k ] is finite. Show that
Pr(|X − E[X]| > t (E[(X − E[X])^k])^{1/k}) ≤ 1/t^k.
Solution to Exercise 3.20: For even k the kth moment (X − E[X])^k is always non-negative, so by Markov's inequality we have
Pr(|X − E[X]| > t (E[(X − E[X])^k])^{1/k}) = Pr((X − E[X])^k > t^k E[(X − E[X])^k]) ≤ 1/t^k.
When k is odd, terms like X − E[X] might be negative, and the argument does not go through. For
example, (X − E[X])^k might be negative, and we cannot apply Markov's inequality to it.
Exercise 3.25: The weak law of large numbers says that if X1 , X2 , X3 , . . . are independent and identically
distributed random variables with mean µ and standard deviation σ, then for any constant ε > 0,
lim_{n→∞} Pr(|(X_1 + X_2 + · · · + X_n)/n − µ| > ε) = 0.
Chapter 4
Exercise 4.3:
(a) Determine the moment generating function for the binomial random variable B(n, p).
(b) Let X be a B(n, p) random variable and Y be a B(m, p) random variable, where X and Y are inde-
pendent. Use part (a) to determine the moment generating function of X + Y .
(c) What can we conclude from the form of the moment generating function of X + Y ?
Solution to Exercise 4.3:
(a) Using the binomial theorem, we find
E[e^{tX}] = \sum_{i=0}^{n} \binom{n}{i} p^i (1 − p)^{n−i} e^{ti} = \sum_{i=0}^{n} \binom{n}{i} (pe^t)^i (1 − p)^{n−i} = (1 − p + pe^t)^n.
(b) Since X and Y are independent, M_{X+Y}(t) = M_X(t) M_Y(t) = (1 − p + pe^t)^{n+m}.
(c) The moment generating function for X + Y has the same form as the moment generating function for
a B(n + m, p) random variable. We conclude that the sum of a B(n, p) random variable and B(m, p)
random variable gives a B(n + m, p) random variable.
Exercise 4.6:
(a) In an election with two candidates using paper ballots, each vote is independently misrecorded with
probability p = 0.02. Use a Chernoff bound to give an upper bound on the probability that more than
4 percent of the votes are misrecorded in an election of 1, 000, 000 ballots.
(b) Assume that a misrecorded ballot always counts as a vote for the other candidate. Suppose that
Candidate A received 510, 000 votes and Candidate B received 490, 000 votes. Use Chernoff bounds to
upper bound the probability that Candidate B wins the election due to misrecorded ballots. Specifically,
let X be the number of votes for Candidate A that are misrecorded and let Y be the number of votes
for Candidate B that are misrecorded. Bound Pr((X > k) ∪ (Y < ℓ)) for suitable choices of k and ℓ.
Solution to Exercise 4.6: For the first part, a variety of Chernoff bounds could be used. Applying
Equation (4.1), for instance, gives that if Z is the number of misrecorded votes out of n total votes, then
Pr(Z ≥ 2(0.02n)) ≤ (e/4)^{0.02n}.
With n being 1 million, the right hand side is (e/4)^{20000}, which is very small.
For the second part, as an example, in order for candidate B to win due to misrecording of ballots, at
least one of the following must occur: either more than 15000 votes for A get misrecorded, or fewer than 5000
votes for B get misrecorded. If both of these fail to occur, A wins, since A will obtain at least 510000 − 15000
+ 5000 = 500000 votes. Hence bounding
Pr((X > 15000) ∪ (Y < 5000))
suffices. More generally, we could say that either more than 10000 + z votes for A get misrecorded or fewer
than z votes for B get misrecorded, for any value of z; choosing the best value for z is then an optimization
problem, and the best value could depend on the form of Chernoff bound used.
Exercise 4.7: Throughout the chapter we implicitly assumed the following extension of the Chernoff bound.
Prove that it is true.
Let X = \sum_{i=1}^{n} X_i, where the X_i's are independent 0-1 random variables. Let µ = E[X]. Choose any µ_L
and µ_H such that µ_L ≤ µ ≤ µ_H. Then for any δ > 0,
Pr(X ≥ (1 + δ)µ_H) ≤ (e^δ/(1 + δ)^{(1+δ)})^{µ_H}.
Similarly, for any 0 < δ < 1,
Pr(X ≤ (1 − δ)µ_L) ≤ (e^{−δ}/(1 − δ)^{(1−δ)})^{µ_L}.
Solution to Exercise 4.7:
(a) Let t = ln(1 + δ) > 0. By Section 4.3 of the book, E[e^{tX}] ≤ e^{(e^t−1)µ} ≤ e^{(e^t−1)µ_H}, where the second
inequality follows from e^t − 1 > 0 for t > 0 and µ ≤ µ_H. Following the proof of the standard Chernoff
bound, we get
Pr(X ≥ (1 + δ)µ_H) ≤ E[e^{tX}]/e^{t(1+δ)µ_H} ≤ e^{(e^t−1)µ_H}/e^{t(1+δ)µ_H} = (e^δ/(1 + δ)^{(1+δ)})^{µ_H}.
(b) Let t = ln(1 − δ) < 0. By Section 4.3 of the book, E[e^{tX}] ≤ e^{(e^t−1)µ} ≤ e^{(e^t−1)µ_L}, where the second
inequality follows from e^t − 1 < 0 for t < 0 and µ_L ≤ µ. Following the proof of the standard Chernoff
bound, we get
Pr(X ≤ (1 − δ)µ_L) ≤ E[e^{tX}]/e^{t(1−δ)µ_L} ≤ e^{(e^t−1)µ_L}/e^{t(1−δ)µ_L} = (e^{−δ}/(1 − δ)^{(1−δ)})^{µ_L}.
Exercise 4.9: Suppose that we can obtain independent samples X_1, X_2, . . . of a random variable X, and
we want to use these samples to estimate E[X]. Using t samples, we use \sum_{i=1}^{t} X_i/t for our estimate of E[X].
We want the estimate to be within εE[X] from the true value of E[X] with probability at least 1 − δ. We
may not be able to use Chernoff's bound directly to bound how good our estimate is if X is not a 0-1
random variable, and we do not know its moment generating function. We develop an alternative approach
that requires only having a bound on the variance of X. Let r = \sqrt{Var[X]}/E[X].
(a) Show using Chebyshev's inequality that O(r^2/(ε^2 δ)) samples are sufficient to solve the above problem.
(b) Suppose that we only need a weak estimate that is within εE[X] of E[X] with probability at least 3/4.
Argue that only O(r^2/ε^2) samples are enough for this weak estimate.
(c) Show that by taking the median of O(log(1/δ)) weak estimates, we can obtain an estimate within εE[X]
of E[X] with probability at least 1 − δ. Conclude that we only need O(r^2 log(1/δ)/ε^2) samples.
Solution to Exercise 4.9:
(a) We take n = ⌈r^2/(ε^2 δ)⌉ independent samples X_1, . . . , X_n, and return Z = (1/n)\sum_{i=1}^{n} X_i as our estimate.
By linearity of expectation, E[Z] = E[X] = µ. Also, since the X_i are independent and all have the
same distribution as X,
Var[Z] = (1/n^2) Var[\sum_{i=1}^{n} X_i] = (1/n) Var[X] = r^2 µ^2/n.
By Chebyshev's inequality,
Pr(|Z − µ| > εµ) ≤ Var[Z]/(εµ)^2 = r^2/(n ε^2) ≤ δ.
(b) Just set δ = 1 − 3/4 = 1/4 in part (a). Hence we need O(4r^2/ε^2) = O(r^2/ε^2) samples.
(c) We run the algorithm from part (b) m = ⌈12 ln(1/δ)⌉ times, obtaining independent estimates S_1, . . . , S_m for
E[X]. We then return the median of the S_i's, which we denote by M, as our new estimate for E[X].
Let Y_i be the 0-1 random variable that is 1 if and only if |S_i − µ| > εµ. By part (b), E[Y_i] ≤ 1/4. Now,
let Y = \sum_{i=1}^{m} Y_i. We see that E[Y] ≤ m/4 by linearity of expectations, and that Y is the sum of
independent 0-1 random variables, so we can (and will) apply the Chernoff bound to Y.
Now, if |M − µ| > εµ, then by the definition of the median, |S_i − µ| > εµ for at least m/2 of the S_i's,
and so Y ≥ m/2 = (1 + 1)(m/4). Applying the Chernoff bound (using the upper bound m/4 on E[Y], as in Exercise 4.7) gives
Pr(|M − µ| > εµ) ≤ Pr(Y ≥ (1 + 1)(m/4)) ≤ e^{−(m/4) · 1^2/3} = e^{−m/12} ≤ δ.
The total number of samples used is m · O(r^2/ε^2) = O(r^2 log(1/δ)/ε^2).
Exercise 4.10: A casino is testing a new class of simple slot machines. Each game, the player puts in $1,
and the slot machine is supposed to return either $3 to the player with probability 4/25, $100 with probability
1/200, and nothing with all remaining probability. Each game is supposed to be independent of other games.
The casino has been surprised to find in testing that the machines have lost $10,000 over the first million
games. Derive a Chernoff bound for the probability of this event. You may want to use a calculator or program
to help you choose appropriate values as you derive your bound.
Solution to Exercise 4.10:
Let X = \sum_{i=1}^{1000000} X_i, where X_i is the winnings to the player from the ith game. With probability 1/200,
X_i = 99; with probability 4/25, X_i = 2; and with probability 167/200, X_i = −1. Note E[X_i] = −1/50, so
on average after 1 million games the players would expect to lose 20000 dollars. It seems very odd for the casino to
instead lose 10000 dollars.
A direct application of Markov's inequality gives, for t > 0,
Pr(X ≥ 10000) = Pr(e^{tX} ≥ e^{10000t}) ≤ E[e^{tX}]/e^{10000t} = (((1/200)e^{99t} + (4/25)e^{2t} + (167/200)e^{−t}) e^{−t/100})^{1000000}.
Now we simply have to optimize to choose the right value of t. For t = 0.0006 the expression inside the outer parentheses
is approximately 0.9999912637. You can do a little better; for t = 0.00058 the expression is approximately
0.9999912508. The resulting bound on Pr(X ≥ 10000) is approximately 1.6 × 10^{−4}.
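The optimization over t described above is easy to carry out numerically; the following Python sketch scans a grid of t values (the grid itself is an arbitrary choice).

from math import exp

def per_game_factor(t):
    # E[e^{t X_i}] times e^{-t * 10000/1000000}, the quantity raised to the millionth power above
    mgf = exp(99 * t) / 200 + 4 * exp(2 * t) / 25 + 167 * exp(-t) / 200
    return mgf * exp(-0.01 * t)

best = min(per_game_factor(k / 100000) for k in range(1, 200))
print(best, best ** 1_000_000)   # factor about 0.99999125, bound about 1.6e-4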
Exercise 4.11: Consider a collection X_1, . . . , X_n of n independent integers chosen uniformly from the
set {0, 1, 2}. Let X = \sum_{i=1}^{n} X_i and 0 < δ < 1. Derive a Chernoff bound for Pr(X ≥ (1 + δ)n) and
Pr(X ≤ (1 − δ)n).
Solution to Exercise 4.11:
We consider just the case of Pr(X ≥ (1 + δ)n). Using the standard Chernoff bound argument, for any
t > 0,
Pr(X ≥ (1 + δ)n) ≤ E[e^{tX}]/e^{t(1+δ)n} = ((1 + e^t + e^{2t})/3)^n e^{−t(1+δ)n},
and the bound follows by choosing t to minimize the right hand side.
Exercise 4.12: Consider a collection X_1, . . . , X_n of n independent geometrically distributed random
variables with mean 2. Let X = \sum_{i=1}^{n} X_i and δ > 0.
(a) Derive a bound on Pr(X ≥ (1 + δ)(2n)) by applying the Chernoff bound to a sequence of (1 + δ)(2n)
fair coin tosses.
(b) Directly derive a Chernoff bound on Pr(X ≥ (1 + δ)(2n)) using the moment-generating function for
geometric random variables.
(c) Which bound is better?
Solution to Exercise 4.12:
(a) Let Y denote the number of heads in (1 + δ)2n coin flips. We note that the events {X ≥ (1 + δ)2n}
and {Y ≤ n} are equivalent, since the geometric random variables can be thought of as the number
of flips until the next heads when flipping the coins. Now E[Y] = (1/2)(1 + δ)2n = (1 + δ)n = µ, so
n = µ/(1 + δ) = (1 − δ/(1 + δ))µ. Using the Chernoff bound (4.5) we get
Pr(Y ≤ n) = Pr(Y ≤ (1 − δ/(1 + δ))µ) ≤ e^{−µ(δ/(1+δ))^2/2} = e^{−nδ^2/(2(1+δ))}.
(b) From page 62 of the book, we have for a geometric random variable Z with parameter p,
M_Z(t) = (p/(1 − p))((1 − (1 − p)e^t)^{−1} − 1).
Plugging in p = 1/2 we get
M_Z(t) = (1 − (1/2)e^t)^{−1} − 1 = ((1/2)e^t)/(1 − (1/2)e^t) = e^t/(2 − e^t).
Hence for any 0 < t < ln 2,
Pr(X ≥ (1 + δ)2n) = Pr(e^{tX} ≥ e^{t(1+δ)2n}) ≤ E[e^{tX}]/e^{t(1+δ)2n} = \prod_{i=1}^{n} M_{X_i}(t)/e^{t(1+δ)2n}
= (e^t/(2 − e^t))^n e^{−t(1+δ)2n} = e^{−nt(1+2δ)} (2 − e^t)^{−n}.
Using calculus, we find the function e^{−nt(1+2δ)} (2 − e^t)^{−n} is minimized for t = ln(1 + δ/(δ + 1)) < ln 2.
Finally we get that
Pr(X ≥ (1 + δ)2n) ≤ ((1 + δ)^{2+2δ}/(1 + 2δ)^{1+2δ})^n.
(c) Using calculus, we can show that the bound from the moment generating function is better under this
derivation.
Exercise 4.13: Let X_1, . . . , X_n be independent Poisson trials such that Pr(X_i = 1) = p. Let X = \sum_{i=1}^{n} X_i,
so that E[X] = pn. Let
F(x, p) = x ln(x/p) + (1 − x) ln((1 − x)/(1 − p)).
(a) Show that for 1 ≥ x > p,
Pr(X ≥ xn) ≤ e^{−nF(x,p)}.
(b) Show that when 0 < x, p < 1, F(x, p) − 2(x − p)^2 ≥ 0. (Hint: take the second derivative of F(x, p) −
2(x − p)^2 with respect to x.)
(c) Using the above, argue that
Pr(X ≥ (p + ε)n) ≤ e^{−2nε^2}.
(d) Argue similarly that Pr(X ≤ (p − ε)n) ≤ e^{−2nε^2}, so that Pr(|X − pn| ≥ εn) ≤ 2e^{−2nε^2}.
Solution to Exercise 4.13:
(a) The standard derivation of the moment generating function simplifies, since in this case p_i = p always. It follows that E[e^{tX}] = M_X(t) = (pe^t + (1 − p))^n. Now for any t > 0,
by Markov's inequality,
Pr(X ≥ xn) = Pr(e^{tX} ≥ e^{txn}) ≤ E[e^{tX}]/e^{txn} = ((pe^t + (1 − p)) e^{−tx})^n.
To find the appropriate value of t to minimize this quantity, we take the derivative of the expression
inside the parentheses with respect to t. The derivative is
e^{−tx}(pe^t(1 − x) − x(1 − p)),
which is zero when
e^t = (1 − p)x/(p(1 − x)), or t = ln((1 − p)x/(p(1 − x))).
Plugging in this value for t now gives the desired result, as
Pr(X ≥ xn) ≤ (pe^t + (1 − p))^n/e^{txn}
= ((1 − p)x/(1 − x) + (1 − p))^n / ((1 − p)x/(p(1 − x)))^{xn}
= ((1 − p)/(1 − x))^n / ((1 − p)x/(p(1 − x)))^{xn}
= ((1 − p)/(1 − x))^{n(1−x)} (p/x)^{nx}
= e^{−nF(x,p)}.
(b) The first derivative of F (x, p) − 2(x − p)2 with respect to x is
ln(x/p) − ln((1 − x)/(1 − p)) − 4(x − p)
and the second derivative is (for 0 < x < 1)
1/x + 1/(1 − x) − 4 ≥ 0.
As the first derivative is 0 when x = p and the second derivative is non-negative we must have that
x = p gives a global minimum, so when 0 < x, p < 1, F (x, p) − 2(x − p)2 ≥ 0.
(c) It now follows that
Pr(X ≥ (p + ε)n) ≤ e^{−nF(p+ε,p)} ≤ e^{−2nε^2}.
(d) Suppose we let Y_i = 1 − X_i and Y = \sum_{i=1}^{n} Y_i. Since the Y_i are independent Poisson trials with
Pr(Y_i = 1) = 1 − p, we have
Pr(X ≤ (p − ε)n) = Pr(Y ≥ (1 − p + ε)n) ≤ e^{−2nε^2}
by applying part (c) to Y. The result follows.
Exercise 4.19: Recall that a function f is said to be convex if, for any x_1, x_2 and 0 ≤ λ ≤ 1,
f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2).
(a) Let Z be a random variable that takes on a (finite) set of values in the interval [0, 1], and let p = E[Z].
Define the Bernoulli random variable X by Pr(X = 1) = p and Pr(X = 0) = 1 − p. Show that
E[f(Z)] ≤ E[f(X)] for any convex function f.
(b) Use the fact that f(x) = e^{tx} is convex for any t ≥ 0 to obtain a Chernoff-like bound for Z based on a
Chernoff bound for X.
Solution to Exercise 4.19: It is worth noting that a function f is convex if it has a second derivative
which is non-negative everywhere; this explains why e^{tx} is convex.
Let Z be a random variable that takes on a finite set of values S ⊂ [0, 1] with p = E[Z]. Let X be a
Bernoulli random variable with p = E[X]. We show E[f (Z)] ≤ E[f (X)] for any convex function f . By
definition of convexity, f(s) ≤ (1 − s)f(0) + sf(1) for s ∈ [0, 1], so
E[f(Z)] = \sum_{s∈S} Pr(Z = s) f(s)
≤ \sum_{s∈S} Pr(Z = s)(1 − s)f(0) + \sum_{s∈S} Pr(Z = s) s f(1)
= f(0)(1 − E[Z]) + f(1)E[Z]
= f(0)(1 − p) + f(1)p
= E[f(X)].
For the second part, the first part implies that in the derivation of the Chernoff bound, we can upper
bound E[etZi ] for any Zi with distribution Z by E[etXi ], where Xi is a Poisson trial with the same mean.
Hence any Chernoff bound that holds for the sum of independent Poisson trials with mean p holds also for
the corresponding sum of independent random variables all with distribution Z.
Exercise 4.21:
Consider the bit-fixing routing algorithm for routing a permutation on the n-cube. Suppose that n is
even. Write each source node s as the concatenation of two binary strings as and bs each of length n/2.
Let the destination of s's packet be the concatenation of b_s and a_s. Show that this permutation causes the
bit-fixing routing algorithm to take Ω(\sqrt{N}) steps.
Solution to Exercise 4.21:
Assume that n is even and N = 2^n is the total number of nodes. Take an arbitrary sequence of bits
b_1, . . . , b_{n/2}. Examine all the nodes of the hypercube of the form (x_1, . . . , x_{n/2−1}, \bar{b}_{n/2}, b_1, . . . , b_{n/2}). Clearly we
have 2^{n/2−1} choices for the x_i's, so we have 2^{n/2−1} such nodes. Now by our choice of permutation, we know that
the packet starting at (x_1, . . . , x_{n/2−1}, \bar{b}_{n/2}, b_1, . . . , b_{n/2}) is routed to (b_1, . . . , b_{n/2}, x_1, . . . , x_{n/2−1}, \bar{b}_{n/2}). Since the
bit-fixing algorithm fixes one bit at a time from left to right, by the time the (n/2 − 1)st bit has been aligned
the packet arrives at node (b_1, . . . , b_{n/2−1}, \bar{b}_{n/2}, b_1, . . . , b_{n/2}), and hence to align the (n/2)th bit the packet must take the
edge to (b_1, . . . , b_{n/2}, b_1, . . . , b_{n/2}). Hence the packets starting at the nodes (x_1, . . . , x_{n/2−1}, \bar{b}_{n/2}, b_1, . . . , b_{n/2}) must
all take the edge from (b_1, . . . , b_{n/2−1}, \bar{b}_{n/2}, b_1, . . . , b_{n/2}) to (b_1, . . . , b_{n/2}, b_1, . . . , b_{n/2}). So 2^{n/2−1} = \sqrt{N}/2 packets must
use the same edge. Since only one packet can traverse an edge at a time, we need at least \sqrt{N}/2 = Ω(\sqrt{N})
steps to complete the permutation.
Chapter 5
Exercise 5.1: For what values of n is (1 + 1/n)^n within 1% of e? Within 0.0001% of e? Similarly, for what
values of n is (1 − 1/n)^n within 1% of 1/e? Within 0.0001%?
Solution to Exercise 5.1: We note that (1 + 1/n)^n is increasing, which can be verified by calculus. Hence we
need only find the first point where n reaches the desired thresholds. When n = 50, (1 + 1/n)^n = 2.691588 . . .,
and e − (1 + 1/n)^n < 0.01e. When n = 499982, (1 + 1/n)^n = 2.718287911 . . ., and e − (1 + 1/n)^n < 0.000001e.
Similarly, (1 − 1/n)^n is increasing. When n = 51, (1 − 1/n)^n = 0.364243 . . ., and 1/e − (1 − 1/n)^n < 0.01/e.
When n = 499991, (1 − 1/n)^n = 0.36787907 . . ., and 1/e − (1 − 1/n)^n < 0.000001/e.
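A few lines of Python suffice to locate the 1% thresholds quoted above (the same search with tolerance 10^{-6} yields the 0.0001% values):

from math import e

def first_n(expr, target, tol):
    # smallest n at which expr(n) is within tol*target of the target
    n = 1
    while abs(expr(n) - target) >= tol * target:
        n += 1
    return n

print(first_n(lambda n: (1 + 1 / n) ** n, e, 0.01))        # 50
print(first_n(lambda n: (1 - 1 / n) ** n, 1 / e, 0.01))    # 51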
Exercise 5.4: In a lecture hall containing 100 people, you consider whether or not there are three people in
the room who share the same birthday. Explain how to calculate this probability exactly, using the same
assumptions as in our previous analysis.
Solution to Exercise 5.4:
Let us calculate the probability that there is no day with three birthdays. One approach is to set up
a recurrence. Let P (n, a, b) be the probability that when n people are each independently given a random
birthday, there are a days where two people share a birthday and b days where just one person has a birthday.
For generality let us denote the number of days in the year as m. Then we have P(1, 0, 1) = 1, and the
recursion
P(n, a, b) = P(n − 1, a − 1, b + 1) · (b + 1)/m + P(n − 1, a, b − 1) · (m − a − b + 1)/m.
That is, if we consider the people one at a time, a new person either hits a day where someone else already
has a birthday, or a new person is the first to have that birthday. This recurrence allows the calculation of
the desired value.
Alternatively, we can express the probability as a summation. Let k ≥ 0 be the number of days where
two people share a birthday, and let ℓ ≥ 0 be the number of days where just one person has a birthday. Note
that we must have 2k + ℓ = 100. There are \binom{365}{k} ways to pick the k days, and then \binom{365−k}{ℓ} ways to pick the
other ℓ days. Further, there are 100!/2^k ways to assign the 100 people to their birthdays; note we need to
divide by 2^k to account for the fact that there are two equivalent ways of assigning two people to the same
birthday for each of the k days. Each configuration has probability (1/365)^{100} of occurring. Putting this all
together, we have
\sum_{k,ℓ : 2k+ℓ=100; k,ℓ≥0} 365! · 100! / (k! ℓ! (365 − k − ℓ)! 2^k 365^{100}).
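The recurrence above is straightforward to evaluate with memoization; here is a Python sketch for m = 365 and 100 people (the quantity printed is one minus the no-triple probability).

from functools import lru_cache

M = 365

@lru_cache(maxsize=None)
def P(n, a, b):
    # probability that n people give exactly a days with two birthdays and b days with one
    if a < 0 or b < 0 or 2 * a + b != n:
        return 0.0
    if n == 1:
        return 1.0 if (a, b) == (0, 1) else 0.0
    return (P(n - 1, a - 1, b + 1) * (b + 1) / M
            + P(n - 1, a, b - 1) * (M - a - b + 1) / M)

no_triple = sum(P(100, a, 100 - 2 * a) for a in range(51))
print(1 - no_triple)   # probability that some day has three or more birthdays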
Exercise 5.7: Suppose that n balls are thrown independently and uniformly at random into n bins.
(a) Find the conditional probability that bin 1 has one ball given that exactly one ball fell into the first
three bins.
(b) Find the conditional expectation of the number of balls in bin 1 under the condition that bin 2 received
no balls.
(c) Write an expression for the probability that bin 1 receives more balls than bin 2.
Solution to Exercise 5.7: For the first part, the ball is equally likely to fall in any of the three bins, so the
answer must be 1/3. For the second part, under the condition that bin 2 received no balls, the n balls are
uniformly distributed over the remaining n − 1 bins; hence the conditional expectation must be n/(n − 1).
For the third part, it is easier to begin by writing an expression for the probability that the two bins receive the
same number of balls. This is
P = \sum_{k=0}^{⌊n/2⌋} \binom{n}{k,\,k,\,n−2k} (1/n)^{2k} (1 − 2/n)^{n−2k}.
By symmetry, the probability that bin 1 receives more balls than bin 2 is then (1 − P)/2.
26
(a) Give an upper bound on this probability using the Poisson approximation.
(b) Determine the exact probability of this event.
(c) Show that these two probabilities differ by a multiplicative factor that equals the probability that
a Poisson random variable with parameter n takes on the value n. Explain why this is implied by
Theorem 5.6.
(a) If there are b balls at the start of a round, what is the expected number of balls at the start of the
next round?
(b) Suppose that every round the number of balls served was exactly the expected number of balls to be
served. Show that all the balls would be served in O(log log n) rounds. (Hint: if x_j is the expected
number of balls left after j rounds, show and use that x_{j+1} ≤ x_j^2/n.)
Hence the expected number of balls in the next round is n(1 − p). Note that this is approximately
n(1 − e−b/n ).
(b) Consider any ball in round j + 1. Considering all the other balls, at most xj of the bins have at least
one ball in it, since there are at most xj balls being placed in the round. Hence the probability any
ball is placed into a bin where there is some other ball is at most xj /n. It follows that the expected
number of balls that are left after j + 1 rounds is at most x_j · (x_j/n) = x_j^2/n.
From the calculations in part (a), it is easy to check that after two rounds, the expected number of
balls remaining is at most n/2. It follows inductively that for j ≥ 2,
x_j ≤ 2^{−2^{j−2}} n.
After j = log log n + 3 rounds, x_j < 1; since x_j is supposed to be integer-valued, O(log log n) rounds
suffice. (Alternatively, after O(log log n) rounds of following the expectation, we are down to 1 ball, at
which point only 1 more round is necessary.)
Exercise 5.16: Let G be a random graph generated using the Gn,p model.
(a) A clique of k vertices in a graph is a subset of k vertices such that all \binom{k}{2} edges between these vertices
lie in the graph. For what value of p, as a function of n, is the expected number of cliques of five
vertices in G equal to 1?
(b) A K3,3 graph is a complete bipartite graph with three vertices on each side. In other words, it is a
graph with six vertices and nine edges; the six distinct vertices are arranged in two groups of three,
and the nine edges connect each of the nine pairs of vertices with one vertex in each group. For what
value of p, as function of n, is the expected number of K3,3 subgraphs in G equal to one?
(c) For what value of p, as a function of n, is the expected number of Hamiltonian cycles in the graph
equal to 1?
Solution to Exercise 5.16:
(a) There are \binom{n}{5} possible cliques of size 5. Each possible clique has \binom{5}{2} = 10 edges. Hence the expected
number of cliques is \binom{n}{5} p^{10}, and this expectation equals 1 when p = \binom{n}{5}^{−1/10}.
(b) There are (1/2)\binom{n}{3}\binom{n−3}{3} possible ways of choosing the six vertices of the K_{3,3}; first we choose three
vertices for one side, then we choose three of the remaining vertices for the other side. (We have to
divide by two because each K_{3,3} is counted twice, as each set of three vertices could show up on either side.)
As there are 9 edges in a K_{3,3}, the expected number that appear is (1/2)\binom{n}{3}\binom{n−3}{3} p^9. For this to equal
1, we need p = (2\binom{n}{3}^{−1}\binom{n−3}{3}^{−1})^{1/9}.
(c) There are (n − 1)!/2 different Hamiltonian cycles in an (undirected) complete graph on n vertices. Hence the expected number
of Hamiltonian cycles is p^n (n − 1)!/2, and this expectation is 1 when p = (2/(n − 1)!)^{1/n}.
Exercise 5.21: In hashing with open addressing, the hash table is implemented as an array, and there are
no linked lists or chaining. Each entry in the array either contains one hashed item or it is empty. The hash
function defines for each key k a probe sequence h(k, 0), h(k, 1), . . . of table locations. To insert the key k,
we examine the sequence of table locations in the order defined by the key’s probe sequence until we find
an empty location and then insert the item at that position. When searching for an item in the hash table,
we examine the sequence of table locations in the order defined by the key’s probe sequence until either the
item is found, or we have found an empty location in the sequence. If an empty location is found, it means
that the item is not present in the table.
An open address hash table with 2n entries is used to store n items. Assume that the table location
h(k, j) is uniform over the 2n possible table locations and that all h(k, j) are independent.
(a) Show that under these conditions the probability that an insertion requires more than k probes is at most
2^{−k}.
(b) Show that for i = 1, 2, . . . , n, the probability that the ith insertion requires more than 2 log n probes
is at most 1/n^2.
Let the random variable X_i denote the number of probes required by the ith insertion. You have shown
in the previous question that Pr(X_i > 2 log n) ≤ 1/n^2. Let the random variable X = max_{1≤i≤n} X_i
denote the maximum number of probes required by any of the n insertions.
(c) Show that Pr(X > 2 log n) ≤ 1/n.
(d) Show that the expected length of the longest probe sequence is E[X] = O(log n).
Solution to Exercise 5.21:
(a) Since the hash table contains 2n entries and at any point at most n elements, the probability that a
given probe fails to find an empty entry is at most 1/2. Therefore the probability that an insertion
needs more than k probes is at most 2^{−k}.
(b) We apply part (a) with k = 2 log n.
(c) We apply the union bound to part (b).
(d) We find
E[X] = Pr(X ≤ 2 log n) · E[X | X ≤ 2 log n] + Pr(X > 2 log n) · E[X | X > 2 log n]
≤ 2 log n + (1/n)(2 log n + \sum_{k=2 log n}^{∞} Pr(X ≥ k | X > 2 log n))
≤ 2 log n + (1/n)(2 log n + \sum_{k=1}^{∞} Pr(X ≥ k))
≤ 2 log n + (1/n)(2 log n + \sum_{k=1}^{∞} n 2^{−k})
= 2 log n + (1/n)(2 log n + n)
≤ 2 log n + 2,
so E[X] = O(log n).
The probability that a given bit is set to 1 by the set X but not by the set Y is
(1 − 1/m)^{kn} (1 − (1 − 1/m)^{k(n−|X∩Y|)}).
The probability that the bit is set to 1 by the set Y but not by the set X is the same. Hence the expected
number of bits set differently is just
2m (1 − 1/m)^{kn} (1 − (1 − 1/m)^{k(n−|X∩Y|)}).
This approach could be used to estimate how much two people share the same taste in music; the higher
the overlap between the Bloom filters of their 100 favorite songs, the larger the intersection in their favorite
songs list. Since the Bloom filters are like compressed versions of the lists, comparing them may be much
faster than comparing song lists directly.
Exercise 5.23:
Suppose that we wanted to extend Bloom filters to allow deletions as well as insertions of items into the
underlying set. We could modify the Bloom filter to be an array of counters instead of an array of bits. Each
time an item is inserted into a Bloom filter, the counters given by the hashes of the item are increased by
one. To delete an item, one can simply decrement the counters. To keep space small, the counters should
be a fixed length, such as 4 bits.
Explain how errors can arise when using fixed-length counters. Assuming a setting where one has at
most n elements in the set at any time, m counters, k hash functions, and counters with b bits, explain how
to bound the probability that an error occurs over the course of t insertions or deletions.
Solution to Exercise 5.23: This variation of the Bloom filter is generally referred to as a counting Bloom
filter in the literature.
Problems can occur with counting Bloom filters if there is counter overflow. An overflow occurs whenever $2^b$ of the $nk$ hashes fall in the same counter. We can use the Poisson approximation to bound the probability of counter overflow, treating each hash as a ball and each counter as a bin. (By Theorem 5.10, the probabilities we compute via the Poisson approximation are within a factor of 2 of the exact probabilities, since the probability of counter overflow increases with the number of hashes.) The Poisson approximation says that the probability that a single counter does not overflow at any point in time is at least
\[
\sum_{j=0}^{2^b-1} \frac{e^{-kn/m}(kn/m)^j}{j!}.
\]
(The probability could be even greater if there were fewer than $n$ elements in the set at that time.) The probability that no counter overflows is then at least
\[
\left( \sum_{j=0}^{2^b-1} \frac{e^{-kn/m}(kn/m)^j}{j!} \right)^{m}.
\]
Including the factor of 2 from the approximation, the probability of any counter overflow is bounded by
\[
2\left(1 - \left( \sum_{j=0}^{2^b-1} \frac{e^{-kn/m}(kn/m)^j}{j!} \right)^{m}\right).
\]
Finally, the above is for any fixed moment in time, so by a union bound, we can bound the probability of an error over the course of all $t$ steps by
\[
2t\left(1 - \left( \sum_{j=0}^{2^b-1} \frac{e^{-kn/m}(kn/m)^j}{j!} \right)^{m}\right).
\]
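The final bound is easy to evaluate numerically. Below is a small sketch (Python; the parameter values in the example are hypothetical) that computes $2t\bigl(1 - \bigl(\sum_{j=0}^{2^b-1} e^{-kn/m}(kn/m)^j/j!\bigr)^m\bigr)$.

```python
import math

def overflow_probability_bound(n, m, k, b, t):
    """Bound (Poisson approximation, including the factor of 2 and the union
    bound over t steps) on the probability that any b-bit counter of a
    counting Bloom filter overflows."""
    lam = k * n / m                              # Poisson parameter for a single counter
    p_single_ok = sum(math.exp(-lam) * lam**j / math.factorial(j)
                      for j in range(2**b))      # counter stays below 2^b
    return 2 * t * (1 - p_single_ok**m)

if __name__ == "__main__":
    # hypothetical setting: n = 10000 items, m = 80000 counters, k = 5 hashes,
    # 4-bit counters, t = 1000000 insertions and deletions
    print(overflow_probability_bound(n=10_000, m=80_000, k=5, b=4, t=10**6))
```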
pairs of leaves. Let $X_i$ be the number of nodes sent after the $(i-1)$st pair is hit until the $i$th pair is hit, and let $X = \sum_i X_i$. Then
\[
E[X] = \frac{N}{2Q} + \frac{N}{2(Q-1)} + \cdots + \frac{N}{2} = \frac{N}{2}\sum_{i=1}^{Q}\frac{1}{i} = \Omega(N \log Q) = \Omega(N \log N).
\]
Chapter 6
Exercise 6.2:
(a) Prove that, for every integer $n$, there exists a coloring of the edges of the complete graph $K_n$ by two colors so that the total number of monochromatic copies of $K_4$ is at most $\binom{n}{4}2^{-5}$.
(b) Give a randomized algorithm for finding a coloring with at most $\binom{n}{4}2^{-5}$ monochromatic copies of $K_4$ that runs in expected time polynomial in $n$.
(c) Show how to construct such a coloring deterministically in polynomial time using the method of
conditional expectations.
Solution to Exercise 6.2:
(a) Suppose we just color the edges randomly, independently for each edge. Enumerate the $\binom{n}{4}$ copies of $K_4$ in the graph in any order. Let $X_i = 1$ if the $i$th copy is monochromatic, and 0 otherwise. Then $X = \sum_i X_i$, and by linearity $E[X] = \sum_i E[X_i]$. Each copy of $K_4$ has 6 edges, and is therefore monochromatic with probability $2^{-5}$, so we have $E[X] = \binom{n}{4}2^{-5}$. It follows that there is at least one coloring with at most this many monochromatic copies of $K_4$.
(b) A randomized algorithm would simply color each edge randomly and count the number of monochromatic copies of $K_4$. Such counting takes time at most $O(n^4)$, as there are $\binom{n}{4}$ copies to check. For convenience, let us assume $n$ is large enough so that $\binom{n}{4}2^{-5}$ is an integer. Call a trial a success if it produces at most $\binom{n}{4}2^{-5}$ monochromatic copies, and let $p$ be the probability of success. Since the number of monochromatic copies is a nonnegative integer that is at least $\binom{n}{4}2^{-5} + 1$ whenever a trial fails, the probability $p$ satisfies
\[
\binom{n}{4}2^{-5} = E[X] \ge (1-p)\left(\binom{n}{4}2^{-5} + 1\right),
\]
or
\[
p \ge \frac{1}{\binom{n}{4}2^{-5} + 1}.
\]
Hence it takes on average at most $O(n^4)$ trials to obtain a coloring with the required number of monochromatic copies. We conclude that the randomized guess-and-check algorithm takes expected time at most $O(n^8)$.
(c) Consider the edges in some arbitrary order. We color the edges one at a time. Each time we color an edge, we consider both possible colors, and for each choice we compute the expected number of monochromatic copies of $K_4$ that would result if we colored the remaining edges randomly. We give the edge the color with the smaller expectation. This can be computed in at most $O(n^4)$ time each time we color an edge, simply by checking each copy of $K_4$ and determining the probability that it will still be colored monochromatically. In fact, this computation can be done much more quickly; for example, we really only need to consider and compare the expectations for copies of $K_4$ that include the edge to be colored, which reduces the computation to $O(n^2)$ for each edge colored. As there are $m = \binom{n}{2}$ edges, this gives an $O(n^4 m)$ (or $O(n^2 m)$) algorithm. At each step the conditional expectation of the number of monochromatic copies of $K_4$ never increases, so it remains at most $\binom{n}{4}2^{-5}$, and the algorithm terminates with the desired coloring.
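For concreteness, here is a short sketch (Python; the function names are ours) of the derandomization in part (c). For each edge it compares, as described above, only the copies of $K_4$ containing that edge and keeps the color with the smaller conditional expectation, so the final coloring has at most $\binom{n}{4}2^{-5}$ monochromatic copies.

```python
from itertools import combinations

def expected_mono(copy_edges, coloring):
    """Probability that this copy of K4 ends up monochromatic if every
    currently uncolored edge is colored uniformly at random."""
    used = {coloring[e] for e in copy_edges if coloring[e] is not None}
    uncolored = sum(1 for e in copy_edges if coloring[e] is None)
    if len(used) >= 2:
        return 0.0                   # two colors already present
    if len(used) == 1:
        return 0.5 ** uncolored      # remaining edges must all match the used color
    return 2 * 0.5 ** uncolored      # nothing colored yet: either color can work

def derandomized_coloring(n):
    """Color the edges of K_n one at a time by the method of conditional
    expectations, comparing only the K4 copies that contain the current edge."""
    edges = list(combinations(range(n), 2))
    copies = [tuple(combinations(s, 2)) for s in combinations(range(n), 4)]
    copies_with_edge = {e: [] for e in edges}
    for cp in copies:
        for e in cp:
            copies_with_edge[e].append(cp)
    coloring = {e: None for e in edges}
    for e in edges:
        scores = []
        for c in (0, 1):
            coloring[e] = c
            scores.append(sum(expected_mono(cp, coloring)
                              for cp in copies_with_edge[e]))
        coloring[e] = 0 if scores[0] <= scores[1] else 1
    return coloring, copies

if __name__ == "__main__":
    coloring, copies = derandomized_coloring(8)
    mono = sum(1 for cp in copies if len({coloring[e] for e in cp}) == 1)
    print(mono, "monochromatic copies; the bound is", len(copies) / 32)
```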
Exercise 6.3: Given an n-vertex undirected graph G = (V, E), consider the following method of generating
an independent set. Given a permutation σ of the vertices, define a subset S(σ) of the vertices as follows:
for each vertex i, i ∈ S(σ) if and only if no neighbor j of i precedes i in the permutation σ.
(a) Show that each S(σ) is an independent set in G.
(b) Suggest a natural randomized algorithm to produce $\sigma$ for which you can show that the expected cardinality of $S(\sigma)$ is
\[
\sum_{i=1}^{n} \frac{1}{d_i + 1},
\]
where $d_i$ denotes the degree of vertex $i$.
(c) Prove that $G$ has an independent set of size at least $\sum_{i=1}^{n} \frac{1}{d_i + 1}$.
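The construction of $S(\sigma)$ is fully specified in the exercise, so the natural randomized algorithm of part (b) can be sketched directly (Python; the function name and the small example graph are ours): draw a uniformly random permutation and keep every vertex none of whose neighbors precedes it.

```python
import random

def random_priority_independent_set(adj):
    """Given adjacency lists adj[v] for vertices 0..n-1, draw a uniformly
    random permutation sigma and return S(sigma): the vertices none of whose
    neighbors appear earlier in sigma."""
    n = len(adj)
    sigma = list(range(n))
    random.shuffle(sigma)                         # uniformly random permutation
    position = {v: i for i, v in enumerate(sigma)}
    return [v for v in range(n)
            if all(position[v] < position[u] for u in adj[v])]

if __name__ == "__main__":
    # example: the 4-cycle 0-1-2-3-0
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(random_priority_independent_set(adj))
```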
Exercise 6.5: We have shown using the probabilistic method that, if a graph G has n nodes and m edges,
there exists a partition of the n nodes into sets A and B so that at least m/2 edges cross the partition.
Improve this result slightly: show that there exists a partition so that at least $\frac{mn}{2n-1}$ edges cross the partition.
Solution to Exercise 6.5: Instead of partitioning the vertices by placing each one randomly, choose uniformly at random a partition with $n/2$ vertices on each side if $n$ is even, or $(n-1)/2$ vertices on one side and $(n+1)/2$ on the other if $n$ is odd. In the first case, for any edge $(x, y)$ there are $2\binom{n-2}{(n-2)/2}$ partitions of the vertices where $x$ and $y$ are on opposite sides ($2$ ways to position $x$ and $y$, and $\binom{n-2}{(n-2)/2}$ ways to partition the other vertices). Hence the probability the edge crosses the cut is $2\binom{n-2}{(n-2)/2}/\binom{n}{n/2} = n/(2n-2)$. By linearity of expectations the expected number of edges crossing the cut is $\frac{mn}{2n-2}$. The second case is similar, but the probability the edge crosses the cut is $2\binom{n-2}{(n-1)/2}/\binom{n}{(n+1)/2} = (n+1)/(2n)$, and the expected number of edges crossing the cut is $\frac{m(n+1)}{2n}$. In both cases the result is better than $\frac{mn}{2n-1}$.
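A quick simulation sketch (Python; the example graph is ours) of the balanced-partition argument: sampling uniformly random balanced partitions of the 5-cycle ($n = 5$, $m = 5$) should give an average cut size close to $m(n+1)/(2n) = 3$.

```python
import random

def balanced_cut_size(vertices, edges):
    """Sample a uniformly random balanced partition (side sizes differing by
    at most one) and return the number of edges crossing it."""
    order = list(vertices)
    random.shuffle(order)                     # uniform random balanced split
    side_a = set(order[: len(order) // 2])
    return sum((x in side_a) != (y in side_a) for x, y in edges)

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # the 5-cycle
    samples = [balanced_cut_size(range(5), edges) for _ in range(50_000)]
    print(sum(samples) / len(samples))        # should be close to 3.0
```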
Exercise 6.9: A tournament is a graph on n vertices with exactly one directed edge between each pair of
vertices. If vertices represent players, each edge can be thought of as the result of a match between the two
players; the edge points to the winner. A ranking is an ordering on the n players from best to worst (ties are
not allowed). Given the outcome of a tournament, one might wish to determine a ranking of the players. A
ranking is said to disagree with a directed edge from y to x if y is ahead of x in the ranking (since x beat y
in the tournament).
(a) Prove that, for every tournament, there exists a ranking that disagrees with at most 50% of the edges.
(b) Prove that, for sufficiently large n, there exists a tournament such that every ranking disagrees with
at least 49% of the edges in the tournament.
Solution to Exercise 6.9:
(a) For a ranking chosen uniformly at random, each edge individually agrees with the ranking with probability 1/2. Hence the expected number of edges that agree is half the number of edges, and thus there must exist a ranking that agrees with at least half the edges, that is, one that disagrees with at most 50% of them.
(b) Choose a random tournament, where the direction of each edge is chosen independently, and consider any fixed ranking. Let $X$ be the number of edges that agree with the ranking. Clearly $E[X] = \binom{n}{2}/2$. Since each edge is chosen independently, we must have by Chernoff bounds (such as, for example, Corollary 4.9 on page 70)
\[
\Pr\left(X \ge 0.51\binom{n}{2}\right) \le e^{-\binom{n}{2}/10000}.
\]
Now let us take a union bound over all $n!$ different rankings. The probability that any of these rankings agrees with at least $0.51\binom{n}{2}$ of the edges is at most
\[
n!\,e^{-\binom{n}{2}/10000}.
\]
Since $n! \le n^n = e^{n \ln n}$, we see that for large enough $n$, this expression goes to 0. Hence, for large enough $n$, a random tournament is very likely to have no ranking that agrees with at least 51 percent of the matches, which gives the result.
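To get a feel for how large $n$ must be, one can evaluate the logarithm of the union-bound term $n!\,e^{-\binom{n}{2}/10000}$; the bound tends to 0 exactly when this logarithm tends to $-\infty$. A minimal sketch (Python; the sample values of $n$ are arbitrary):

```python
import math

def log_union_bound_term(n):
    """Natural logarithm of n! * exp(-C(n,2)/10000) from the argument above."""
    return math.lgamma(n + 1) - math.comb(n, 2) / 10000

if __name__ == "__main__":
    for n in (10_000, 100_000, 300_000, 1_000_000):
        print(n, log_union_bound_term(n))   # becomes (very) negative for large n
```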
Exercise 6.10: A family of subsets F of {1, 2, . . . , n} is called an antichain if there is no pair of sets A and
B in F satisfying A ⊂ B.
(a) Give an example of $F$ where $|F| = \binom{n}{\lfloor n/2 \rfloor}$.
(c) Argue that $|F| \le \binom{n}{\lfloor n/2 \rfloor}$ for any antichain $F$.
\[
1 \ge E[X] = \sum_{k=0}^{n} E[X_k] = \sum_{k=0}^{n} \Pr(X_k = 1) = \sum_{k=0}^{n} \frac{f_k}{\binom{n}{k}},
\]
Note that here and throughout we use the approximation $1 - x \approx e^{-x}$ when $x$ is small; for $n$ sufficiently large, this approximation is suitable for this argument, as the error affects only lower order terms, so we may think of the approximation as an equality up to these lower order terms. The expected number of isolated vertices is therefore approximately $n^{1-c}$, which is much greater than 1 for $c < 1$.
Now consider using the conditional expectation inequality. Let $X_i$ be 1 if the $i$th vertex is isolated and 0 otherwise, and let $X = \sum_i X_i$. We have found that $\Pr(X_i = 1) \approx n^{-c}$. Also,
\[
E[X \mid X_i = 1] = 1 + \sum_{j \ne i} \Pr(X_j = 1 \mid X_i = 1).
\]
Now, conditioning on vertex $i$ being isolated does not affect $X_j$ substantially; the probability that the $j$th vertex is isolated (for $j \ne i$) after this conditioning is
\[
\Pr(X_j = 1 \mid X_i = 1) = \left(1 - \frac{c \ln n}{n}\right)^{n-2} \approx n^{-c}.
\]
Again, here we use the approximation to signify that the difference is only in lower order terms. From the conditional expectation inequality, and using our approximations, we have
\[
\Pr(X > 0) \ge \frac{n^{1-c}}{1 + (n-1)n^{-c}} \ge 1 - \frac{n^{c}}{n-1},
\]
which approaches 1 as $n \to \infty$ for $c < 1$.
\[
\Pr(X \ge 1) \le 1/6
\]
and that
\[
\lim_{n \to \infty} \Pr(X \ge 1) \ge 1/7.
\]
Chapter 7
Exercise 7.2: Consider the two-state Markov chain with the following transition matrix.
\[
\mathbf{P} = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}.
\]
Find a simple expression for $P^t_{0,0}$.
Solution to Exercise 7.2:
\[
P^t_{0,0} = \frac{1 + (2p-1)^t}{2}.
\]
This is easily shown by induction. See also the related Exercise 1.11, which considers the equivalent problem
in a different way.
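A quick numerical check of the closed form (Python with NumPy; the values of $p$ and $t$ are arbitrary):

```python
import numpy as np

def check_two_state_chain(p, t):
    """Compare the (0,0) entry of P^t with the closed form (1 + (2p-1)^t)/2."""
    P = np.array([[p, 1 - p], [1 - p, p]])
    numeric = np.linalg.matrix_power(P, t)[0, 0]
    closed_form = (1 + (2 * p - 1) ** t) / 2
    return numeric, closed_form

if __name__ == "__main__":
    print(check_two_state_chain(p=0.3, t=7))   # the two values should agree
```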
Exercise 7.12: Let Xn be the sum of n independent rolls of a fair die. Show that for any k ≥ 2,
\[
\lim_{n \to \infty} \Pr(X_n \text{ is divisible by } k) = \frac{1}{k}.
\]
Solution to Exercise 7.12: Let $Z_1, Z_2, \ldots$ be the sequence defined as follows: $Z_j = (X_j \bmod k)$. Notice that the $Z_j$ form a Markov chain! It is easy to check that the stationary distribution of this chain is uniform over $0, 1, \ldots, k-1$, using for example Definition 7.8. Therefore we have by Theorem 7.7 that $\lim_{n \to \infty} \Pr(Z_n = 0) = 1/k$; since $X_n$ is divisible by $k$ exactly when $Z_n = 0$, this gives the result.
This is because there are $C_n$ paths from 1 back to 1 of length $2n$ that do not hit 0. (Again, this is a standard combinatorial result.)
Recall that we let $r^t_{0,0}$ be the probability that the first return to 0 from 0 is at time $t$. Then
\[
\sum_{t} r^t_{0,0} = \sum_{n=0}^{\infty} C_n p^n (1-p)^{n+1}.
\]
We now use the Catalan identity
\[
\sum_{n=0}^{\infty} C_n x^n = \frac{1 - \sqrt{1-4x}}{2x}
\]
to find
\[
\sum_{t} r^t_{0,0} = (1-p)\sum_{n=0}^{\infty} C_n \bigl(p(1-p)\bigr)^n = (1-p)\cdot\frac{1 - \sqrt{1-4p(1-p)}}{2p(1-p)}.
\]
When $p \le 1/2$, then $1 - \sqrt{1-4p(1-p)} = 2p$, and we have $\sum_t r^t_{0,0} = 1$. Hence when $p \le 1/2$ the chain is recurrent. When $p > 1/2$, then $1 - \sqrt{1-4p(1-p)} = 2 - 2p$, and $\sum_t r^t_{0,0} = (1-p)/p < 1$. In this case, the chain is transient.
To cope with the issue of whether state 0 is positive recurrent or null recurrent, we consider
\[
h_{0,0} = \sum_{n=0}^{\infty} (2n+2)\,C_n\,p^n(1-p)^{n+1} = 2(1-p)\sum_{n=0}^{\infty}\binom{2n}{n}\bigl(p(1-p)\bigr)^n.
\]
and hence
\[
q_i 2^{-i} + (1 - q_i) 2^{n-i} = 1.
\]
We find
\[
q_i = \frac{2^n - 2^i}{2^n - 1}.
\]
(c) We can repeat the same argument, with $c = p/(1-p)$. We have
\[
E\!\left[c^{W_{t+1}} \mid W_t\right] = p\left(\frac{p}{1-p}\right)^{W_t - 1} + (1-p)\left(\frac{p}{1-p}\right)^{W_t + 1} = c^{W_t}.
\]
We therefore find in this setting
\[
q_i c^{-i} + (1 - q_i)c^{n-i} = 1,
\]
so
\[
q_i = \frac{c^n - c^i}{c^n - 1}.
\]
Exercise 7.21: Consider a Markov chain on the states {0, 1, . . . , n}, where for i < n we have Pi,i+1 = 1/2
and Pi,0 = 1/2. Also, Pn,n = 1/2 and Pn,0 = 1/2. This process can be viewed as a random walk on a
directed graph with vertices {0, 1, . . . , n}, where each vertex has two directed edges: one that returns to
0 and one that moves to the vertex with the next higher number (with a self-loop at vertex n). Find the
stationary distribution of this chain. (This example shows that random walks on directed graphs are very
different than random walks on undirected graphs.)
Solution to Exercise 7.21: Clearly $\pi_0$, the probability of being at state 0 in the stationary distribution, is just 1/2, since at each step we move to 0 with probability 1/2. It follows inductively that $\pi_i = 2^{-(i+1)}$ for $1 \le i < n$, and $\pi_n = 2^{-n}$.
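A short numerical check (Python with NumPy; the chain size in the example is arbitrary): raising the transition matrix to a high power and reading off a row should reproduce $\pi_i = 2^{-(i+1)}$ for $i < n$ and $\pi_n = 2^{-n}$.

```python
import numpy as np

def stationary_by_power(n, steps=200):
    """Approximate the stationary distribution of the chain on {0,...,n} with
    P[i][i+1] = P[i][0] = 1/2 for i < n and P[n][n] = P[n][0] = 1/2."""
    P = np.zeros((n + 1, n + 1))
    for i in range(n):
        P[i, i + 1] = 0.5
        P[i, 0] = 0.5
    P[n, n] = 0.5
    P[n, 0] = 0.5
    return np.linalg.matrix_power(P, steps)[0]   # any row converges to pi

if __name__ == "__main__":
    n = 6
    pi = stationary_by_power(n)
    predicted = [2.0 ** (-(i + 1)) for i in range(n)] + [2.0 ** (-n)]
    print(np.allclose(pi, predicted))            # expect True
```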
Exercise 7.23: One way of spreading information on a network uses a rumor-spreading paradigm. Suppose
that there are n hosts currently on the network. Initially, one host begins with a message. Each round, every
host that has the message contacts another host chosen independently and uniformly at random from the
other n − 1 hosts, and sends that host the message. We would like to know how many rounds are necessary
before all hosts have received the message with probability 0.99.
(a) Explain how this problem can be viewed in terms of Markov chains.
(b) Determine a method for computing the probability that j hosts have received the message after round
k given that i hosts have received the message after round k − 1. (Hint: There are various ways of
doing this. One approach is to let P (i, j, c) be the probability j hosts have the message after the first
c of the i hosts have made their choices in a round; then find a recurrence for P .)
(c) As a computational exercise, write a program to determine the number of rounds for a message starting
at one host to reach all other hosts with probability 0.9999 when n = 128.
Solution to Exercise 7.23:
(a) There are multiple ways to view this problem in terms of Markov chains. The most natural is to
have Xk be the number of hosts that have received the message after k rounds, with X0 = 1. The
distribution of Xk+1 depends only on Xk ; the history and which actual hosts have received the message
do not affect the distribution of Xk+1 .
(b) Following the hint, we have $P(i, j, 0) = 1$ if $i = j$ and 0 otherwise. Further,
\[
P(i, j, c) = P(i, j, c-1)\,\frac{j-1}{n-1} + P(i, j-1, c-1)\,\frac{n-j+1}{n-1}.
\]
That is, there are two ways to have j hosts with the message after c of the i hosts have made their
choices; either j hosts had the message after c − 1 choices and the cth host chose one of the other j − 1
hosts with the message already, or j − 1 hosts had the message after c − 1 choices and the cth host
chose a host without the message.
(c) We use the above recurrence to track the entire distribution of $X_k$ round by round. A simple program computes the desired values. For $n = 128$, all the hosts have the message with probability (greater than) 0.99 after 17 rounds (if my code is right!). For probability (greater than) 0.9999, it requires 22 rounds.
Notice that even if in every round each host with the message contacted a host without the message, 7 rounds would still be needed, since the number of informed hosts can at most double each round and $2^7 = 128$.
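A sketch of such a program (Python; the function names are ours) that builds the within-round transition probabilities from the recurrence in part (b) and then iterates the distribution of the number of informed hosts round by round; it should reproduce the round counts quoted above.

```python
def round_transition(n):
    """T[i][j] = probability that j hosts have the message at the end of a
    round, given that i hosts had it at the start and each of those i hosts
    contacts one of the other n-1 hosts uniformly at random."""
    T = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        P = [0.0] * (n + 1)      # P[j] after c of the i hosts have chosen
        P[i] = 1.0
        for _ in range(i):
            newP = [0.0] * (n + 1)
            for j in range(1, n + 1):
                newP[j] = (P[j] * (j - 1) / (n - 1)
                           + P[j - 1] * (n - j + 1) / (n - 1))
            P = newP
        T[i] = P
    return T

def rounds_needed(n, target):
    """Smallest number of rounds after which all n hosts have the message
    with probability at least target, starting from a single host."""
    T = round_transition(n)
    dist = [0.0] * (n + 1)
    dist[1] = 1.0
    rounds = 0
    while dist[n] < target:
        dist = [sum(dist[i] * T[i][j] for i in range(1, n + 1))
                for j in range(n + 1)]
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(rounds_needed(128, 0.99), rounds_needed(128, 0.9999))
```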
Exercise 7.26: Let n equidistant points be marked on a circle. Without loss of generality, we think of the
points as being labeled clockwise from 0 to n − 1. Initially, a wolf begins at 0, and there is one sheep at
each of the remaining n − 1 points. The wolf takes a random walk on the circle. For each step, it moves
with probability 1/2 to one neighboring point and with probability 1/2 to the other neighboring point. At
the first visit to a point, the wolf eats a sheep if there is still one there. Which sheep is most likely to be the
last eaten?
Solution to Exercise 7.26: Interestingly, each of the remaining sheep is equally likely to be the last eaten.
There are many different ways this can be shown. One way is as follows. Consider the sheep at position
n − 1. This is the last sheep to be eaten if the wolf, starting from 0, walks in one direction from 0 to n − 2
before walking in the other direction directly from 0 to n − 1. By breaking the circle between n − 2 and
n − 1, we see that this is equivalent to a gambler’s ruin problem as in Section 7.2.1. Because of this, the
probability the sheep at n − 1 is last is 1/((n − 2) + 1) = 1/(n − 1). By symmetry the same is true for the
sheep at position 1.
Now consider the sheep at position i, 1 < i < n − 1. There are two ways the sheep could be last: the
wolf first walks from 0 to i − 1 before reaching i + 1 (event A) and then walks from i − 1 to i + 1 without
passing through i (event B); or the wolf first walks from 0 to i + 1 before reaching i − 1 (event C), and then
walks from i + 1 to i − 1 without passing through i (event D). Again, each of these are simple gambler’s
ruin problems, and combining them gives
\[
\Pr(A)\Pr(B) + \Pr(C)\Pr(D) = \frac{i-1}{n-2}\cdot\frac{1}{n-1} + \frac{n-i-1}{n-2}\cdot\frac{1}{n-1} = \frac{1}{n-1}.
\]
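A small Monte Carlo sketch (Python; the parameters are ours) that estimates which sheep is eaten last; for any $n$, each of the $n-1$ sheep should come out close to $1/(n-1)$.

```python
import random

def last_sheep_distribution(n, trials=100_000):
    """Estimate, for each sheep position 1..n-1, the probability that it is
    the last sheep eaten by the wolf's random walk on the n-point circle."""
    counts = [0] * n
    for _ in range(trials):
        remaining = set(range(1, n))      # positions that still hold a sheep
        position = 0
        while len(remaining) > 1:
            position = (position + random.choice((-1, 1))) % n
            remaining.discard(position)
        counts[remaining.pop()] += 1
    return [c / trials for c in counts[1:]]

if __name__ == "__main__":
    print(last_sheep_distribution(6))     # each entry should be near 1/5
```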
Exercise 7.27: Suppose that we are given n records, R1 , R2 , . . . , Rn . The records are kept in some order.
The cost of accessing the jth record in the order is j. Thus, if we had 4 records ordered as R2 , R4 , R3 , R1 ,
the cost of accessing R4 would be 2, and the cost of accessing R1 would be 4.
Suppose further that at each step, record Rj is accessed with probability pj , with each step being
independent of other steps. If we knew the values of the pj in advance, we would keep the Rj in decreasing
order with respect to pj . But if we don’t know the pj in advance, we might use the “move-to-front” heuristic:
at each step, put the record that was accessed at the front of the list. We assume that moving the record
can be done with no cost and that all other records remain in the same order. For example, if the order was
R2, R4, R3, R1 before R3 was accessed, then the order at the next step would be R3, R2, R4, R1.
In this setting, the order of the records can be thought of as the state of a Markov chain. Give the
stationary distribution of this chain. Also, let Xk be the cost for accessing the kth requested record.
Determine an expression for limk→∞ E[Xk ]. Your expression should be easily computable in time that is
polynomial in n, given the pj .
Solution to Exercise 7.27: Let us consider an example with 4 records. Let π(i) be the record in the ith
position. The key is to note that for the records to be in the order Rπ(1) , Rπ(2) , Rπ(3) , Rπ(4) , we must have
that, thinking of the sequence of requests in reverse order, $\pi(1)$ was requested first (that is, most recently), then $\pi(2)$, and so on. This gives the probability of this order as
\[
\frac{p_{\pi(1)}}{p_{\pi(1)} + p_{\pi(2)} + p_{\pi(3)} + p_{\pi(4)}} \cdot \frac{p_{\pi(2)}}{p_{\pi(2)} + p_{\pi(3)} + p_{\pi(4)}} \cdot \frac{p_{\pi(3)}}{p_{\pi(3)} + p_{\pi(4)}}.
\]
This formula generalizes; if there are $n$ records, the probability is
\[
\prod_{i=1}^{n-1} \frac{p_{\pi(i)}}{\sum_{j=i}^{n} p_{\pi(j)}}.
\]
This can be easily calculated in polynomial time.
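The exercise also asks for $\lim_{k\to\infty} E[X_k]$. One way to compute it in polynomial time (a sketch, not taken from the text, using the observation that under the stationary distribution above $R_i$ is ahead of $R_j$ exactly when $R_i$ was requested more recently, which happens with probability $p_i/(p_i+p_j)$) is to sum $p_j$ times the expected position $1 + \sum_{i \ne j} p_i/(p_i + p_j)$ of record $R_j$.

```python
def limiting_expected_cost(p):
    """lim_{k->inf} E[X_k] for move-to-front with access probabilities p,
    using Pr(R_i is ahead of R_j) = p_i / (p_i + p_j).  O(n^2) time."""
    n = len(p)
    cost = 0.0
    for j in range(n):
        expected_position = 1 + sum(p[i] / (p[i] + p[j])
                                    for i in range(n) if i != j)
        cost += p[j] * expected_position
    return cost

if __name__ == "__main__":
    # hypothetical access probabilities
    print(limiting_expected_cost([0.5, 0.25, 0.125, 0.125]))
```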