f22 hw1 Sol
1. Suppose we are given a bit string B[1 .. n]. A triple of indices 1 ≤ i < j < k ≤ n is called a
well-spaced triple if B[i] = B[ j] = B[k] = 1 and k − j = j − i.
Solution:
BruteForceTriple(B[1 .. n]):
for i ← 1 to n
for j ← i + 1 to n
k ← 2j − i
if k ≤ n and B[i] = 1 and B[ j] = 1 and B[k] = 1
return True
return False
■
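As a concrete rendering of the brute-force pseudocode, here is a minimal Python sketch. The function name `brute_force_triple` and the 0-based list input are assumptions made for illustration; the early `break` is a small shortcut that does not change the O(n²) bound.

```python
def brute_force_triple(bits):
    """Return True if bits (a list of 0/1 values) has a well-spaced triple.

    Indices are 0-based here, unlike the 1-based pseudocode.
    """
    n = len(bits)
    for i in range(n):
        if bits[i] != 1:
            continue
        for j in range(i + 1, n):
            k = 2 * j - i  # forces k - j == j - i
            if k >= n:
                break  # k only grows as j grows
            if bits[j] == 1 and bits[k] == 1:
                return True
    return False
```

For example, `brute_force_triple([1, 0, 1, 0, 1])` finds the triple at positions 0, 2, 4.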
(b) Describe an algorithm to determine whether B has a well-spaced triple in O(n log n)
time. [Hint: FFT!]
Solution: Let BB denote the convolution of the input array B with itself, that is,
BB = B ∗ B. We can compute BB in O(n log n) time using any FFT algorithm.
Consider an index j such that B[j] = 1. If j is the middle index in an evenly-
spaced triple i, j, k in B, then each pair (a, b) ∈ {(i, k), (j, j), (k, i)} contributes 1
to the summation $BB[2j] = \sum_{a+b=2j} B[a] \cdot B[b]$, and thus BB[2j] ≥ 3. On
the other hand, if j is not the middle index of any evenly-spaced triple, then
BB[2j] = 1.
Finally, if B[j] = 0, then j cannot be part of an evenly-spaced triple.
FFTTriple(B[0 .. n]):
〈〈pad for convolution〉〉
for i ← n + 1 to 2n
B[i] ← 0
〈〈compute convolution B ∗ B〉〉
B ∗ = FFT(B)
for i ← 0 to 2n
BB ∗ [i] ← B ∗ [i] · B ∗ [i]
BB ← InverseFFT(BB ∗ )
〈〈look for evenly-spaced triples〉〉
for j ← 0 to n
if B[ j] = 1 and BB[2 j] > 1
return True
return False
This algorithm runs in O(n log n) time; the running time is dominated by the
calls to FFT and InverseFFT. ■
CS 473 Homework 1 Solutions Fall 2022
Solution: If the algorithm from part (c) returns a positive integer, return True;
otherwise, return False. ■
Rubric: 4 points. The second solution should get exactly the same score as part (c).
(c) Describe an algorithm to determine the number of well-spaced triples in B in O(n log n)
time.
Solution: As in part (b), let BB = B ∗ B. We can compute BB in O(n log n) time
using any FFT algorithm.
Consider an index j such that B[j] = 1. If j is the middle index in an evenly-
spaced triple i, j, k in B, then each pair (a, b) ∈ {(i, k), (j, j), (k, i)} contributes 1
to the summation $BB[2j] = \sum_{a+b=2j} B[a] \cdot B[b]$. It follows that j is the middle
index of c evenly-spaced triples if and only if BB[2j] = 2c + 1.
FFTTriple(B[0 .. n]):
〈〈pad for convolution〉〉
for i ← n + 1 to 2n
B[i] ← 0
〈〈compute convolution BB = B ∗ B〉〉
B ∗ = FFT(B)
for i ← 0 to 2n
BB ∗ [i] ← B ∗ [i] · B ∗ [i]
BB ← InverseFFT(BB ∗ )
〈〈count evenly-spaced triples〉〉
triples ← 0
for j ← 0 to n
if B[ j] = 1
triples ← triples + ⌊BB[2 j]/2⌋
return triples
This algorithm runs in O(n log n) time; the running time is dominated by the
calls to FFT and InverseFFT. ■
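The counting algorithm can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the course's reference implementation: the recursive radix-2 `fft` helper, the power-of-two padding, and all function names are choices made here, and the input is a 0-based list of bits.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    With invert=True the result is the unscaled inverse transform.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    result = [0j] * n
    for t in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * t / n) * odd[t]
        result[t] = even[t] + twiddle
        result[t + n // 2] = even[t] - twiddle
    return result


def self_convolution(bits):
    """Return BB, where BB[s] is the sum of bits[a] * bits[b] over a + b = s."""
    if not bits:
        return []
    size = 1
    while size < 2 * len(bits):  # pad so the cyclic convolution does not wrap
        size *= 2
    padded = [complex(b) for b in bits] + [0j] * (size - len(bits))
    spectrum = fft(padded)
    raw = fft([z * z for z in spectrum], invert=True)
    # Divide by size to finish the inverse transform, then round to integers.
    return [round(z.real / size) for z in raw[: 2 * len(bits) - 1]]


def count_well_spaced_triples(bits):
    """Count triples i < j < k with bits[i] = bits[j] = bits[k] = 1, k - j = j - i."""
    bb = self_convolution(bits)
    # If bits[j] = 1 then BB[2j] = 2c + 1, where c counts triples centered at j.
    return sum(bb[2 * j] // 2 for j in range(len(bits)) if bits[j] == 1)


def has_well_spaced_triple(bits):
    """Decision version: reduce to the counting version."""
    return count_well_spaced_triples(bits) > 0
```

Rounding the real parts is safe here because every entry of BB is an integer of size at most n, far below the precision limit of double-precision floats for inputs of moderate length.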
Rubric: 4 points.
2. This problem explores different algorithms for computing the factorial function n!.
(a) Recall that the standard lattice algorithm that you learned in elementary school
multiplies any n-bit integer and any m-bit integer in O(mn) time. Describe and
analyze a variant of Karatsuba’s algorithm that multiplies any n-bit integer and any
m-bit integer, for any n ≥ m, in $O(n \cdot m^{\lg 3 - 1}) = O(n \cdot m^{0.58496})$ time.
Solution: Let A[0 .. n − 1] and B[0 .. m − 1] denote the input bit arrays, re-
spectively representing the integers $a = \sum_{i=0}^{n-1} A[i] \cdot 2^i$ and $b = \sum_{i=0}^{m-1} B[i] \cdot 2^i$.
Intuitively, we split the longer array A into about n/m chunks of length about m,
multiply each chunk by b using Karatsuba’s algorithm, and then combine the
partial products by brute force.
For simplicity, I’ll assume that n is a multiple of m; otherwise, we can pad
the array A with at most m − 1 zeros.
UnbalancedKaratsuba(A[0 .. n − 1], B[0 .. m − 1]):
product ← 0
for j ← 0 to n/m − 1
shift ← m · j
inch ← Σ_{i=0}^{m−1} A[shift + i] · 2^i 〈〈inch = INput CHunk〉〉
ouch ← Karatsuba(inch, b) 〈〈ouch = OUtput CHunk〉〉
product ← product + ouch · 2^shift
return product
Each iteration of the main loop calls Karatsuba with two m-bit integers, and
performs O(m) other work, so the running time of each iteration is $O(m^{\lg 3})$. The
for-loop repeats n/m times, so the overall algorithm runs in $O(m^{\lg 3} \cdot n/m) = O(n \cdot m^{\lg 3 - 1})$
time as required. ■
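The chunking idea can be sketched in Python on built-in integers. This is only an illustration: the base-case threshold of 16, the use of shifts and masks instead of explicit bit arrays, and the function names are assumptions made here, and the `while` loop handles an input whose length is not a multiple of m without explicit padding.

```python
def karatsuba(x, y):
    """Multiply non-negative integers with Karatsuba's three-product recursion."""
    if x < 16 or y < 16:
        return x * y  # base case: at least one operand is tiny
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    # (x_lo + x_hi)(y_lo + y_hi) - low - high = x_lo*y_hi + x_hi*y_lo
    cross = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high
    return (high << (2 * half)) + (cross << half) + low


def unbalanced_karatsuba(a, b):
    """Multiply a by b by cutting a into chunks as long as b."""
    m = max(b.bit_length(), 1)  # chunk length in bits
    mask = (1 << m) - 1
    product = 0
    shift = 0
    while (a >> shift) > 0:
        inch = (a >> shift) & mask   # input chunk: m bits of a
        ouch = karatsuba(inch, b)    # output chunk: balanced multiplication
        product += ouch << shift     # add ouch * 2^shift
        shift += m
    return product
```

A quick check such as `unbalanced_karatsuba(3**200, 7**20) == 3**200 * 7**20` compares the result against Python's built-in multiplication.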
(b) Analyze the running time of Factorial(n) using different algorithms for the multipli-
cation in line (∗):
i. Lattice multiplication
Solution: In the kth iteration of the main loop, we are multiplying k and
(k − 1)!, which have Θ(log k) and Θ(k log k) bits, respectively. If we use the
lattice algorithm, this multiplication takes $O(k \log^2 k) = O(k (\log k)^2)$ time.
So the running time of Factorial(n) is
$$\sum_{k=1}^{n} O(k \log^2 k) \le \sum_{k=1}^{n} O(k) \cdot O(\log^2 n) = O(n^2 \log^2 n).$$
Solution: Now the kth multiplication takes $O((k \log k) \cdot (\log k)^{\lg 3 - 1}) = O(k \log^{\lg 3} k)$
time. So the running time of Factorial(n) is
$$\sum_{k=1}^{n} O(k \log^{\lg 3} k) \le \sum_{k=1}^{n} O(k) \cdot O(\log^{\lg 3} n) = O(n^2 \log^{\lg 3} n).$$
(c) Analyze the running time of FasterFactorial(n) using different algorithms for the
multiplication in the last line of Falling:
i. Lattice multiplication
Solution: Let T(n, m) denote the running time of Falling(n, m). Both
factors $n^{\underline{m/2}}$ and $(n - m/2)^{\underline{m/2}}$ are Θ(m log n) bits long.
If we use the lattice algorithm, the final multiplication in Falling(n, m)
takes $O(m^2 \log^2 n)$ time, which gives us the recurrence
$$T(n, m) \le T(n, m/2) + T(n - m/2, m/2) + O(m^2 \log^2 n).$$
We can simplify this recurrence using the inequality T(n − m/2, m/2) ≤ T(n, m/2):
$$T(n, m) \le 2\,T(n, m/2) + O(m^2 \log^2 n).$$
Now define a new function $t(m) = T(n, m) / \log^2 n$, which satisfies the even
simpler recurrence
$$t(m) \le 2\,t(m/2) + O(m^2),$$
whose solution via recursion trees is $t(m) = O(m^2)$. It follows that
$T(n, m) \le t(m) \log^2 n = O(m^2 \log^2 n)$.
We conclude that FasterFactorial(n) runs in $O(n^2 \log^2 n)$ time, just
like Factorial(n). ■
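For readers who want the recursion-tree step spelled out, here is a short derivation of the bound t(m) = O(m²). It assumes, as usual, that m is a power of 2 and that the hidden constant in the O(m²) term is the same at every level: level i of the tree has 2^i nodes, each handling a subproblem of size m/2^i.

```latex
t(m) \;\le\; \sum_{i=0}^{\lg m} 2^i \cdot O\!\left(\left(\frac{m}{2^i}\right)^{2}\right)
     \;=\; O(m^2) \sum_{i=0}^{\lg m} 2^{-i}
     \;\le\; O(m^2) \cdot 2
     \;=\; O(m^2).
```

The level sums form a decreasing geometric series, so the root dominates the total.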
Solution: Again, let T(n, m) denote the running time of Falling(n, m). If
we use Karatsuba’s algorithm, the final multiplication in Falling(n, m) takes
$O(m^{\lg 3} \log^{\lg 3} n)$ time, giving us the recurrence
$$T(n, m) \le T(n, m/2) + T(n - m/2, m/2) + O(m^{\lg 3} \log^{\lg 3} n).$$
Again, we can simplify using the inequality T(n − m/2, m/2) ≤ T(n, m/2):
$$T(n, m) \le 2\,T(n, m/2) + O(m^{\lg 3} \log^{\lg 3} n).$$
The helper function $t(m) = T(n, m) / \log^{\lg 3} n$ satisfies the even simpler
recurrence
$$t(m) \le 2\,t(m/2) + O(m^{\lg 3}),$$
whose solution via recursion trees is $t(m) = O(m^{\lg 3})$. It follows that
$T(n, m) \le t(m) \log^{\lg 3} n = O(m^{\lg 3} \log^{\lg 3} n)$.
We conclude that FasterFactorial(n) runs in $O(n^{\lg 3} \log^{\lg 3} n)$ time,
which is significantly faster than Factorial(n)! ■
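The divide-and-conquer structure analyzed here can be sketched as follows. The pseudocode for Falling is not reproduced in this section, so this sketch makes several assumptions: that Falling(n, m) computes the falling power n(n − 1)⋯(n − m + 1), that FasterFactorial(n) is Falling(n, n), and that odd m is handled by splitting into ⌊m/2⌋ and ⌈m/2⌉ factors. It also relies on Python's built-in integer multiplication rather than an explicit lattice or Karatsuba routine.

```python
def falling(n, m):
    """Return the falling power n * (n - 1) * ... * (n - m + 1), with m factors."""
    if m == 0:
        return 1  # empty product
    if m == 1:
        return n
    half = m // 2
    # The first `half` factors start at n; the remaining factors start at n - half.
    return falling(n, half) * falling(n - half, m - half)


def faster_factorial(n):
    """Compute n! as the falling power with n factors."""
    return falling(n, n)
```

The point of the split is that both operands of the final multiplication have comparable length, which is exactly the situation in which a subquadratic multiplication algorithm pays off.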
3. Your new boss at the Dixon Ticonderoga Pencil Factory asks you to design an algorithm to
solve the following problem. Suppose you are given N pencils, each with one of c different
colors, and a non-negative integer k. How many different ways are there to choose a set
of k pencils? Two pencil sets are considered identical if they contain the same number of
pencils of each color.
Describe an algorithm to solve this problem, and analyze its running time. Your input is
an array Pencils[1 .. c] and an integer k, where Pencils[i] stores the number of pencils with
color i. Your output is a single non-negative integer. For full credit, report the running time
of your algorithm as a function of the parameters N , c, and k. Assume that k ≪ c ≪ N .
This problem is based on a Jane Street interview question.
Solution: Following the hint, we associate the polynomial $P_j(x) = \sum_{i=0}^{\mathit{Pencils}[j]} x^i$ with
each color j. We need to compute the coefficient of $x^k$ in the product polynomial
$P(x) = \prod_{j=1}^{c} P_j(x)$. This coefficient is equal to the number of different ways of
choosing one term from each polynomial $P_j$ (that is, choosing a number of pencils
of color j) such that the degrees of the chosen terms (that is, the chosen number of
pencils of each color) sum to k. 〈〈So far this is worth 2 points.〉〉
First, the output polynomial P has degree N. Thus, if we multiply the polynomials
in a simple for-loop, each intermediate polynomial has degree at most N. So each
multiplication can be performed in $O(N^2)$ time using the lattice algorithm. There are c
factor polynomials, so computing their product requires c − 1 pairwise multiplications.
The overall running time of this algorithm is $O(cN^2)$. 〈〈So far this is worth 4 points.〉〉
If instead we multiply polynomials using fast Fourier transforms, each multiplication
takes O(N log N ) time, so the overall running time drops to O(c N log N). 〈〈So far this
is worth 6 points.〉〉
We can further improve this algorithm using a divide-and-conquer strategy instead
of a simple for-loop, similarly to the Falling function in problem 2. To compute the
product of c polynomials, we first recursively multiply the first c/2 polynomials, then
recursively multiply the last c/2 polynomials, and finally multiply the two products
using FFTs. The recursion tree has depth O(log c), and the total time spent on
multiplications in each level of the tree is O(N log N ), so the overall running time
becomes O(N log N log c).[a] 〈〈So far this is worth 8 points.〉〉
Alternatively, because k ≪ N, we can improve our algorithm by discarding
all terms with degree greater than k as soon as they arise. With this optimization,
each multiplication involves two polynomials of degree at most k, and thus can be
performed in O(k log k) time via FFT. The overall running time drops to O(ck log k).
〈〈So far this is worth 10 points (full credit) even without the binary tree optimization.〉〉
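The truncation idea can be illustrated with a short Python sketch. Note the swapped-in technique: to stay self-contained, this sketch multiplies with the quadratic lattice method instead of FFTs, so it runs in O(ck²) time rather than O(ck log k). The function names are assumptions made for illustration, and each color's polynomial is taken to be 1 + x + ⋯ + x^Pencils[j], so that choosing zero pencils of a color is allowed.

```python
def multiply_truncated(p, q, k):
    """Multiply coefficient lists p and q, keeping only degrees 0..k."""
    result = [0] * (min(len(p) + len(q) - 2, k) + 1)
    for i, p_i in enumerate(p):
        if p_i == 0:
            continue
        for j, q_j in enumerate(q):
            if i + j > k:
                break  # every later term has degree greater than k
            result[i + j] += p_i * q_j
    return result


def count_pencil_sets(pencils, k):
    """Return the coefficient of x^k in the product of the per-color polynomials."""
    product = [1]
    for count in pencils:
        # 1 + x + ... + x^min(count, k): higher powers can never contribute.
        factor = [1] * (min(count, k) + 1)
        product = multiply_truncated(product, factor, k)
    return product[k] if k < len(product) else 0
```

With two red, four green, and one blue pencil, `count_pencil_sets([2, 4, 1], 4)` counts the six four-pencil sets GGGG, GGGR, GGGB, GGRR, GGRB, and RRGB.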
But we can still do better! Because k ≪ c, we can improve the algorithm even
further by observing that, after discarding all terms larger than $x^k$, there are at most k
distinct truncated polynomials $P_j$. For any integer 1 ≤ j ≤ k, let $Q_j(x) = \sum_{i=0}^{j} x^i$. For
all j < k, let $c_j$ denote the number of colors for which we have exactly j pencils, and
let $c_k$ denote the number of colors for which we have at least k pencils. Then the
first k + 1 coefficients of P(x), those of $x^0$ through $x^k$, are equal to the corresponding
coefficients of the polynomial
$$\tilde{P}(x) = \prod_{j=1}^{k} \left(Q_j(x)\right)^{c_j}.$$
The overall running time of this algorithm is $O(c + k^2 \log k \log c)$, which is o(ck log k)
if k is sufficiently small compared to c.[c] 〈〈This is worth 13 points.〉〉
Finally, if k is sufficiently small (specifically, if $k = o(\sqrt{c} / \log c)$), the running time
of this algorithm simplifies to O(c).[d] This is clearly optimal, because any correct
algorithm must read the entire input! 〈〈This is worth 15 points.〉〉
■
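One way to evaluate a product of the form ∏ (Q_j(x))^{c_j}, truncated at degree k, is repeated squaring. The sketch below is an assumption-laden illustration rather than the solution's exact procedure: it takes Q_j(x) = 1 + x + ⋯ + x^j (so that choosing zero pencils of a color is allowed), groups colors with `collections.Counter`, and uses quadratic truncated multiplication where the analysis above uses FFTs. All function names are hypothetical.

```python
from collections import Counter


def multiply_truncated(p, q, k):
    """Multiply coefficient lists p and q, keeping only degrees 0..k."""
    result = [0] * (min(len(p) + len(q) - 2, k) + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j > k:
                break
            result[i + j] += p_i * q_j
    return result


def power_truncated(base, exponent, k):
    """Compute base**exponent truncated at degree k by repeated squaring."""
    result = [1]
    while exponent > 0:
        if exponent & 1:
            result = multiply_truncated(result, base, k)
        exponent >>= 1
        if exponent:
            base = multiply_truncated(base, base, k)
    return result


def count_pencil_sets_grouped(pencils, k):
    """Count k-pencil sets by grouping colors with equal capped pencil counts."""
    # groups[j] = c_j: colors with exactly j pencils (or at least k, when j = k).
    groups = Counter(min(count, k) for count in pencils)
    product = [1]
    for j, c_j in groups.items():
        q_j = [1] * (j + 1)  # Q_j(x) = 1 + x + ... + x^j
        product = multiply_truncated(product, power_truncated(q_j, c_j, k), k)
    return product[k] if k < len(product) else 0
```

After the O(c)-time grouping step, the remaining work depends only on k and log c, which is where the savings over the one-polynomial-per-color loop come from.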
[a] In practice, this can be optimized further by using a Huffman code to optimally organize the
multiplications; however, this optimization does not reduce the worst-case running time, when Pencils[i]
has roughly the same value for all i.
[b] More careful analysis implies an even smaller upper bound $O(k \log k \log(c/k))$ for this part of the
repeated-squaring algorithm; the running time is maximized when all exponents $c_j$ are equal.
[c] If we used lattice multiplication everywhere instead of FFTs, the running time would be $O(c + k^3 \log c)$.
[d] If $k = o(c^{1/3} / \log^{1/3} c)$, the running time is O(c) even if we use lattice multiplication everywhere
instead of FFTs.
Solution (dynamic programming): To save a bit of space, I’ll call the input array
P[1 .. c] instead of Pencils[1 .. c].
For any integers 0 ≤ i ≤ c and 0 ≤ s ≤ k, let NumSets(i, s) denote the number of
pencil sets of size s that use only the first i colors. This function satisfies the following
recurrence:
$$\mathit{NumSets}(i, s) = \begin{cases}
1 & \text{if } s = 0 \\
0 & \text{if } i = 0 \text{ and } s > 0 \\
\sum_{n_i=0}^{s} \mathit{NumSets}(i-1,\, s-n_i) & \text{if } i > 0 \text{ and } s \le P[i] \\
\sum_{n_i=0}^{P[i]} \mathit{NumSets}(i-1,\, s-n_i) & \text{otherwise}
\end{cases}$$
We need to compute NumSets(c, k). The variable $n_i$ represents the number of pencils
with color i. As a sanity check, notice that NumSets(i, 0) = 1 for all i; there is exactly
one way to make an empty pencil set!
We can memoize this function into a two-dimensional array NumSets[0 .. c, 0 .. k],
which we can fill in standard row-major order in $O(k^2 c)$ time. (For each i and s, the
innermost loop iterates at most k + 1 times.) 〈〈So far this is worth 8 points.〉〉
The main bottleneck in this algorithm is evaluating the sums inside the inner
loop; we can speed up this evaluation using a technique called prefix sums. Let
NSAtMost(i, s) denote the number of pencil sets of size at most s that can be made
using only the first i colors:
$$\mathit{NSAtMost}(i, s) := \sum_{j=0}^{s} \mathit{NumSets}(i, j).$$
This helper function and our earlier function NumSets satisfy the following mutual
recurrence:
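$$\mathit{NSAtMost}(i, s) = \begin{cases}
1 & \text{if } s = 0 \\
\mathit{NSAtMost}(i,\, s-1) + \mathit{NumSets}(i, s) & \text{otherwise}
\end{cases}$$
$$\mathit{NumSets}(i, s) = \begin{cases}
1 & \text{if } s = 0 \\
0 & \text{if } i = 0 \text{ and } s > 0 \\
\mathit{NSAtMost}(i-1,\, s) & \text{if } i > 0 \text{ and } s \le P[i] \\
\mathit{NSAtMost}(i-1,\, s) - \mathit{NSAtMost}(i-1,\, s - P[i] - 1) & \text{otherwise}
\end{cases}$$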
We can memoize these functions into two c × k arrays. We can fill both arrays
simultaneously in row-major order, increasing i in the outer loop, increasing s in
the inner loop, and computing NumSets[i, s] and then NSAtMost[i, s] inside the inner
loop. The resulting algorithm runs in O(ck) time. 〈〈This is worth 11 points.〉〉
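The prefix-sum version of the dynamic program can be sketched in Python as follows. The function name and the decision to keep only one row of each table at a time are assumptions made here for brevity. When s > P[i], the subtracted prefix sum is taken at index s − P[i] − 1, so that exactly the P[i] + 1 terms NumSets(i − 1, s − P[i]) through NumSets(i − 1, s) remain.

```python
def count_pencil_sets_dp(pencils, k):
    """Count k-pencil sets in O(c * k) time using prefix sums."""
    # num_sets[s] = NumSets(i, s) for the current i; this is the row i = 0.
    num_sets = [1] + [0] * k
    for limit in pencils:  # limit plays the role of P[i]
        # at_most[s] = NSAtMost(i - 1, s): prefix sums of the previous row.
        at_most = []
        running = 0
        for value in num_sets:
            running += value
            at_most.append(running)
        new_row = []
        for s in range(k + 1):
            if s <= limit:
                new_row.append(at_most[s])
            else:
                # Keep only the terms NumSets(i-1, s-limit) .. NumSets(i-1, s).
                new_row.append(at_most[s] - at_most[s - limit - 1])
        num_sets = new_row
    return num_sets[k]
```

For two red, four green, and one blue pencil, `count_pencil_sets_dp([2, 4, 1], 4)` counts the six sets GGGG, GGGR, GGGB, GGRR, GGRB, and RRGB.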
But we can do better! Each set of pencils that we can build consists of 1 pencil
each from $x_1$ different colors, 2 pencils each from $x_2$ different colors, 3 pencils each
from $x_3$ different colors, and so on, for some non-negative integers $x_1, x_2, \dots, x_k$ such
that
$$\sum_{i=1}^{k} i \cdot x_i = k.$$
The main idea behind our faster dynamic programming algorithm is to consider the
possible choices of $x_1, x_2, \dots, x_k$ separately from the actual color assignments. For
example, if we are given two red pencils, four green pencils, and one blue pencil, we
can build four different types of four-pencil sets:
• $(x_1, x_2, x_3, x_4) = (0, 0, 0, 1)$: GGGG
• $(x_1, x_2, x_3, x_4) = (1, 0, 1, 0)$: GGGR and GGGB
• $(x_1, x_2, x_3, x_4) = (0, 2, 0, 0)$: GGRR
• $(x_1, x_2, x_3, x_4) = (2, 1, 0, 0)$: GGRB and RRGB
Imagine choosing the counts $x_i$ (and counting the choices for the corresponding
colors) in decreasing index order. In some intermediate stage of this decision process,
we have already chosen $x_k, x_{k-1}, \dots, x_{\ell+1}$ and we need to count decompositions of
the form
$$\sum_{i=1}^{\ell} i \cdot x_i = s,$$
assuming we have already used $u = \sum_{j=\ell+1}^{k} x_j$ colors of pencils. The remaining
subproblem can be specified by three integers s, ℓ, and u, each between 0 and k.
Define NSets(s , ℓ, u) to be the number of pencil sets of size s that use at most ℓ pencils
of each color, if u colors (each with at least ℓ pencils) are unavailable. Our top-level
problem asks us to compute NSets(k, k, 0).
Before we describe a recurrence for this function, we need some simple prepro-
cessing. If any color has more than k pencils, we throw the extra pencils away; for
all i, replace P[i] with min{P[i], k}. Then we compute an array Cols[1 .. k], where
Cols[i] is the number of different colors for which we have at least i pencils. We can
compute this array in O(c) time by sorting the input array P[1 .. c] (using counting
sort) and then scanning.
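The function NSets satisfies the following rather intimidating recurrence:
$$\mathit{NSets}(s, \ell, u) = \begin{cases}
1 & \text{if } s = 0 \\
0 & \text{if } s > 0 \text{ and } \ell = 0 \\
\displaystyle\sum_{x_\ell=0}^{\lfloor s/\ell \rfloor} \binom{\mathit{Cols}[\ell] - u}{x_\ell} \cdot \mathit{NSets}(s - \ell \cdot x_\ell,\ \ell - 1,\ u + x_\ell) & \text{otherwise}
\end{cases}$$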
The recurrence looks at all possible values of $x_\ell$, and for each such value, counts
the number of subsets with $x_\ell$ colors contributing exactly ℓ pencils. For each value
of $x_\ell$, the binomial coefficient counts the number of ways to choose $x_\ell$ colors from the
Cols[ℓ] − u available colors that have at least ℓ pencils, and the recursive call counts
our choices for the remaining $s - \ell \cdot x_\ell$ pencils.
We can memoize this recurrence into a three-dimensional array, indexed by s, ℓ
and u. We can fill the array using three nested for-loops, increasing ℓ in the outer
loop, decreasing u in the middle loop, and increasing s in the inner loop. We can
evaluate each entry NSets[s, ℓ, u] in O(1 + s/ℓ) time; in particular, we can evaluate
each binomial coefficient in O(1) time using the recurrence
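$$\binom{n}{i} = \begin{cases}
1 & \text{if } i = 0 \\
\displaystyle\binom{n}{i-1} \cdot \frac{n - i + 1}{i} & \text{otherwise}
\end{cases}$$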
Thus, the overall time to fill the entire memoization array is at most
$$\sum_{s=1}^{k} \sum_{\ell=1}^{k} \sum_{u=0}^{k} O(1 + s/\ell) = O(k^3) + O(k) \sum_{s=1}^{k} s \sum_{\ell=1}^{k} \frac{1}{\ell} = O(k^3 \log k).$$
The overall running time of our algorithm is $O(c + k^3 \log k)$, which is o(ck) if k
is sufficiently small compared to c. 〈〈This is worth 13 points.〉〉
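The three-parameter recurrence can be sketched in Python with top-down memoization. Several choices here are assumptions made for illustration rather than part of the solution: the function names, the use of `functools.lru_cache` in place of three nested loops, the use of `math.comb` (Python 3.8+) in place of the incremental binomial recurrence, the explicit base case returning 0 when ℓ = 0 and s > 0, and the pruning of terms whose binomial coefficient is zero.

```python
from functools import lru_cache
from math import comb


def count_pencil_sets_by_type(pencils, k):
    """Count k-pencil sets by choosing how many colors contribute each amount."""
    # Preprocessing: extra pencils beyond k of one color are never useful.
    capped = [min(count, k) for count in pencils]
    # cols[i] = number of colors with at least i pencils, for 0 <= i <= k.
    cols = [0] * (k + 2)
    for count in capped:
        cols[count] += 1              # exact counts first
    for i in range(k - 1, -1, -1):
        cols[i] += cols[i + 1]        # suffix sums give "at least i"

    @lru_cache(maxsize=None)
    def nsets(s, level, used):
        """Sets of size s with at most `level` pencils per color, `used` colors taken."""
        if s == 0:
            return 1
        if level == 0:
            return 0                  # pencils remain but no color may contribute
        available = cols[level] - used
        total = 0
        # x = number of colors contributing exactly `level` pencils; terms with
        # x > available have a zero binomial coefficient and are skipped.
        for x in range(min(s // level, available) + 1):
            total += comb(available, x) * nsets(s - level * x, level - 1, used + x)
        return total

    return nsets(k, k, 0)
```

On the running example, `count_pencil_sets_by_type([2, 4, 1], 4)` adds up the four types listed earlier: 1 + 2 + 1 + 2 = 6 sets.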
Rubric: 10 points for a correct O(ck log k)-time algorithm, partial credit and extra credit as
indicated in the solution. Partial credit for dynamic programming solutions follows the standard
dynamic programming rubric given in Homework 2. These are almost certainly not the only
correct algorithms, or the fastest possible algorithms for all combinations of c , k, and N .