
CS 473 Homework 1 Solutions Fall 2022

1. Suppose we are given a bit string B[1 .. n]. A triple of indices 1 ≤ i < j < k ≤ n is called a well-spaced triple if B[i] = B[j] = B[k] = 1 and k − j = j − i.

(a) Describe a brute-force algorithm to determine whether B has a well-spaced triple in O(n^2) time.

Solution:
BruteForceTriple(B[1 .. n]):
    for i ← 1 to n
        for j ← i + 1 to n
            k ← 2j − i
            if k ≤ n and B[i] = 1 and B[j] = 1 and B[k] = 1
                return True
    return False
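The pseudocode above translates directly to Python; this sketch (0-indexed, with names of my own choosing) is an illustration, not part of the official solution:

```python
def brute_force_triple(B):
    """Return True if some i < j < k has B[i] = B[j] = B[k] = 1 and k - j = j - i."""
    n = len(B)
    for i in range(n):
        for j in range(i + 1, n):
            k = 2 * j - i  # the unique k with k - j = j - i
            if k < n and B[i] == 1 and B[j] == 1 and B[k] == 1:
                return True
    return False
```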

Rubric: 2 points. This is not the only brute-force solution.

(b) Describe an algorithm to determine whether B has a well-spaced triple in O(n log n)
time. [Hint: FFT!]

Solution: Let BB denote the convolution of the input array B with itself, that is, BB = B ∗ B. We can compute BB in O(n log n) time using any FFT algorithm.
Consider an index j such that B[j] = 1. If j is the middle index of a well-spaced triple i, j, k in B, then each pair (a, b) ∈ {(i, k), (j, j), (k, i)} contributes 1 to the summation BB[2j] = Σ_{a+b=2j} B[a] · B[b], and thus BB[2j] ≥ 3. On the other hand, if j is not the middle index of any well-spaced triple, then BB[2j] = 1, because (j, j) is the only pair that contributes.
Finally, if B[j] = 0, then j cannot be part of a well-spaced triple.

FFTTriple(B[0 .. n]):
    〈〈pad for convolution〉〉
    for i ← n + 1 to 2n
        B[i] ← 0
    〈〈compute convolution BB = B ∗ B〉〉
    B* ← FFT(B)
    for i ← 0 to 2n
        BB*[i] ← B*[i] · B*[i]
    BB ← InverseFFT(BB*)
    〈〈look for well-spaced triples〉〉
    for j ← 0 to n
        if B[j] = 1 and BB[2j] > 1
            return True
    return False

This algorithm runs in O(n log n) time; the running time is dominated by the
calls to FFT and InverseFFT. ■
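The same idea can be sketched in Python with NumPy's real FFT; the convolution is computed in floating point and rounded back to integers, and the function name is my own:

```python
import numpy as np

def has_well_spaced_triple(B):
    """FFT-based detection: B[j] = 1 is the middle of a triple iff BB[2j] > 1."""
    n = len(B)
    size = 2 * n  # room for all convolution indices 0 .. 2n - 2
    fb = np.fft.rfft(B, size)
    # BB[s] counts pairs (a, b) with a + b = s and B[a] = B[b] = 1
    BB = np.rint(np.fft.irfft(fb * fb, size)).astype(int)
    for j in range(n):
        if B[j] == 1 and BB[2 * j] > 1:
            return True
    return False
```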


Solution: If the algorithm from part (c) returns a positive integer, return True;
otherwise, return False. ■

Rubric: 4 points. The second solution should get exactly the same score as part (c).

(c) Describe an algorithm to determine the number of well-spaced triples in B in O(n log n)
time.
Solution: As in part (b), let BB = B ∗ B. We can compute BB in O(n log n) time using any FFT algorithm.
Consider an index j such that B[j] = 1. If j is the middle index of a well-spaced triple i, j, k in B, then each pair (a, b) ∈ {(i, k), (j, j), (k, i)} contributes 1 to the summation BB[2j] = Σ_{a+b=2j} B[a] · B[b]. It follows that j is the middle index of exactly c well-spaced triples if and only if BB[2j] = 2c + 1.

FFTCountTriples(B[0 .. n]):
    〈〈pad for convolution〉〉
    for i ← n + 1 to 2n
        B[i] ← 0
    〈〈compute convolution BB = B ∗ B〉〉
    B* ← FFT(B)
    for i ← 0 to 2n
        BB*[i] ← B*[i] · B*[i]
    BB ← InverseFFT(BB*)
    〈〈count well-spaced triples〉〉
    triples ← 0
    for j ← 0 to n
        if B[j] = 1
            triples ← triples + ⌊BB[2j]/2⌋
    return triples

This algorithm runs in O(n log n) time; the running time is dominated by the
calls to FFT and InverseFFT. ■
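The counting variant in the same NumPy sketch style (again floating-point FFTs rounded back to integers, with my own function name):

```python
import numpy as np

def count_well_spaced_triples(B):
    """If B[j] = 1 is the middle index of c triples, then BB[2j] = 2c + 1."""
    n = len(B)
    size = 2 * n
    fb = np.fft.rfft(B, size)
    BB = np.rint(np.fft.irfft(fb * fb, size)).astype(int)
    return sum(BB[2 * j] // 2 for j in range(n) if B[j] == 1)
```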

Rubric: 4 points.


2. This problem explores different algorithms for computing the factorial function n!.

(a) Recall that the standard lattice algorithm that you learned in elementary school multiplies any n-bit integer and any m-bit integer in O(mn) time. Describe and analyze a variant of Karatsuba’s algorithm that multiplies any n-bit integer and any m-bit integer, for any n ≥ m, in O(n · m^{lg 3 − 1}) = O(n · m^{0.58496}) time.

Solution: Let A[0 .. n − 1] and B[0 .. m − 1] denote the input bit arrays, respectively representing the integers a = Σ_{i=0}^{n−1} A[i] · 2^i and b = Σ_{i=0}^{m−1} B[i] · 2^i.
Intuitively, we split the longer array A into about n/m chunks of length about m, multiply each chunk by b using Karatsuba’s algorithm, and then combine the partial products by brute force.
For simplicity, I’ll assume that n is a multiple of m; otherwise, we can pad the array A with at most m − 1 zeros.
UnbalancedKaratsuba(A[0 .. n − 1], B[0 .. m − 1]):
    product ← 0
    for j ← 0 to n/m − 1
        shift ← m · j
        inch ← Σ_{i=0}^{m−1} A[shift + i] · 2^i    〈〈inch = INput CHunk〉〉
        ouch ← Karatsuba(inch, b)    〈〈ouch = OUtput CHunk〉〉
        product ← product + ouch · 2^{shift}
    return product

Each iteration of the main loop calls Karatsuba with two m-bit integers, and performs O(m) other work, so the running time of each iteration is O(m^{lg 3}). The for-loop repeats n/m times, so the overall algorithm runs in O(m^{lg 3} · n/m) = O(n · m^{lg 3 − 1}) time, as required. ■
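Both pieces can be sketched in Python; the recursion cutoff and names are my own, and the base case falls back to the built-in multiplier:

```python
def karatsuba(a, b, cutoff=64):
    """Multiply nonnegative integers with Karatsuba's three-product recursion."""
    if a.bit_length() <= cutoff or b.bit_length() <= cutoff:
        return a * b
    h = max(a.bit_length(), b.bit_length()) // 2
    a1, a0 = a >> h, a & ((1 << h) - 1)  # a = a1 * 2^h + a0
    b1, b0 = b >> h, b & ((1 << h) - 1)
    lo = karatsuba(a0, b0)
    hi = karatsuba(a1, b1)
    mid = karatsuba(a0 + a1, b0 + b1) - lo - hi  # = a0*b1 + a1*b0
    return (hi << (2 * h)) + (mid << h) + lo

def unbalanced_karatsuba(a, b):
    """Multiply n-bit a by m-bit b (n >= m) chunk by chunk, as in the solution."""
    m = max(b.bit_length(), 1)
    product, shift = 0, 0
    while a:
        chunk = a & ((1 << m) - 1)  # next m-bit chunk of a, low bits first
        product += karatsuba(chunk, b) << shift
        a >>= m
        shift += m
    return product
```

The while-loop handles a final short chunk naturally, matching the pseudocode's "pad with zeros" convention.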

Rubric: 2 points = 1 for algorithm + 1 for analysis


(b) Analyze the running time of Factorial(n) using different algorithms for the multipli-
cation in line (∗):
i. Lattice multiplication

Solution: In the kth iteration of the main loop, we are multiplying k and (k − 1)!, which have Θ(log k) and Θ(k log k) bits, respectively. If we use the lattice algorithm, this multiplication takes O(log k · k log k) = O(k log^2 k) time. So the running time of Factorial(n) is

    Σ_{k=1}^{n} O(k log^2 k) ≤ (Σ_{k=1}^{n} O(k)) · O(log^2 n) = O(n^2 log^2 n). ■

Rubric: 2 points = 1 for running time + 1 for justification

ii. Your variant of Karatsuba’s algorithm from part (a)

Solution: Now the kth multiplication takes O((k log k) · (log k)^{lg 3 − 1}) = O(k log^{lg 3} k) time. So the running time of Factorial(n) is

    Σ_{k=1}^{n} O(k log^{lg 3} k) ≤ (Σ_{k=1}^{n} O(k)) · O(log^{lg 3} n) = O(n^2 log^{lg 3} n).

Alas, faster multiplication only speeds up Factorial by a sub-logarithmic factor. ■

Rubric: 2 points = 1 for running time + 1 for justification
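The Factorial algorithm itself is not reproduced in this handout; the analysis above assumes the obvious iterative loop, sketched here with Python's built-in multiplication standing in for line (∗):

```python
def factorial(n):
    result = 1
    for k in range(2, n + 1):
        result = result * k  # line (*): multiply k and (k - 1)!
    return result
```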


(c) Analyze the running time of FasterFactorial(n) using different algorithms for the
multiplication in the last line of Falling:
i. Lattice multiplication

Solution: Let T(n, m) denote the running time of Falling(n, m). Both factors, namely the falling powers computed by Falling(n, m/2) and Falling(n − m/2, m/2), are products of m/2 integers of O(log n) bits each, and thus are Θ(m log n) bits long.
If we use the lattice algorithm, the final multiplication in Falling(n, m) takes O(m^2 log^2 n) time, which gives us the recurrence

    T(n, m) = T(n, m/2) + T(n − m/2, m/2) + O(m^2 log^2 n).

Multiplying larger numbers takes longer, so we can reasonably assume that T(n − m/2, m/2) ≤ T(n, m/2) and simplify our recurrence:

    T(n, m) ≤ 2 T(n, m/2) + O(m^2 log^2 n)

Now define a new function t(m) = T(n, m)/log^2 n, which satisfies the even simpler recurrence

    t(m) ≤ 2 t(m/2) + O(m^2),

whose solution via recursion trees is t(m) = O(m^2). It follows that T(n, m) ≤ t(m) · log^2 n = O(m^2 log^2 n).
We conclude that FasterFactorial(n) runs in O(n^2 log^2 n) time, just like Factorial(n). ■

Rubric: 2 points = 1 for running time + 1 for justification

ii. Your variant of Karatsuba’s algorithm from part (a)

Solution: Again, let T(n, m) denote the running time of Falling(n, m). If we use Karatsuba’s algorithm, the final multiplication in Falling(n, m) takes O(m^{lg 3} log^{lg 3} n) time, giving us the recurrence

    T(n, m) = T(n, m/2) + T(n − m/2, m/2) + O(m^{lg 3} log^{lg 3} n).

Again, we can simplify using the inequality T(n − m/2, m/2) ≤ T(n, m/2):

    T(n, m) ≤ 2 T(n, m/2) + O(m^{lg 3} log^{lg 3} n).

The helper function t(m) = T(n, m)/log^{lg 3} n satisfies the even simpler recurrence

    t(m) ≤ 2 t(m/2) + O(m^{lg 3}),

whose solution via recursion trees is t(m) = O(m^{lg 3}). It follows that T(n, m) ≤ t(m) · log^{lg 3} n = O(m^{lg 3} log^{lg 3} n).
We conclude that FasterFactorial(n) runs in O(n^{lg 3} log^{lg 3} n) time, which is significantly faster than Factorial(n)! ■

Rubric: 2 points = 1 for running time + 1 for justification
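Falling and FasterFactorial are likewise only referenced above; here is a minimal sketch consistent with the recurrence T(n, m) = T(n, m/2) + T(n − m/2, m/2) + (cost of one multiplication):

```python
def falling(n, m):
    """Falling power n * (n - 1) * ... * (n - m + 1), by divide and conquer."""
    if m == 0:
        return 1
    if m == 1:
        return n
    half = m // 2
    # The final multiplication of two ~(m/2)-factor products dominates the cost.
    return falling(n, half) * falling(n - half, m - half)

def faster_factorial(n):
    return falling(n, n)  # n! = n * (n - 1) * ... * 1
```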


3. Your new boss at the Dixon Ticonderoga Pencil Factory asks you to design an algorithm to
solve the following problem. Suppose you are given N pencils, each with one of c different
colors, and a non-negative integer k. How many different ways are there to choose a set
of k pencils? Two pencil sets are considered identical if they contain the same number of
pencils of each color.
Describe an algorithm to solve this problem, and analyze its running time. Your input is
an array Pencils[1 .. c] and an integer k, where Pencils[i] stores the number of pencils with
color i. Your output is a single non-negative integer. For full credit, report the running time
of your algorithm as a function of the parameters N , c, and k. Assume that k ≪ c ≪ N .
This problem is based on a Jane Street interview question.

Solution: Following the hint, we associate the polynomial P_j(x) = Σ_{i=0}^{Pencils[j]} x^i with each color j. We need to compute the coefficient of x^k in the product polynomial P(x) = Π_{j=1}^{c} P_j(x). This coefficient is equal to the number of different ways of choosing one term from each polynomial P_j (that is, choosing a number of pencils of color j) such that the degrees of the chosen terms (that is, the chosen numbers of pencils of each color) sum to k. 〈〈So far this is worth 2 points.〉〉
First, the output polynomial P has degree N. Thus, if we multiply the polynomials in a simple for-loop, each intermediate polynomial has degree at most N, so each multiplication can be performed in O(N^2) time using the lattice algorithm. There are c factor polynomials, so computing their product requires c − 1 pairwise multiplications. The overall running time of this algorithm is O(cN^2). 〈〈So far this is worth 4 points.〉〉
If instead we multiply polynomials using fast Fourier transforms, each multiplication takes O(N log N) time, so the overall running time drops to O(cN log N). 〈〈So far this is worth 6 points.〉〉
We can further improve this algorithm using a divide-and-conquer strategy instead of a simple for-loop, similar to the Falling function in problem 2. To compute the product of c polynomials, we first recursively multiply the first c/2 polynomials, then recursively multiply the last c/2 polynomials, and finally multiply the two products using FFTs. The recursion tree has depth O(log c), and the total time spent on multiplications in each level of the tree is O(N log N), so the overall running time becomes O(N log N log c).^a 〈〈So far this is worth 8 points.〉〉
Alternatively, because k ≪ N, we can improve our algorithm by discarding all terms with degree greater than k as soon as they arise. With this optimization, each multiplication involves two polynomials of degree at most k, and thus can be performed in O(k log k) time via FFT. The overall running time drops to O(ck log k). 〈〈So far this is worth 10 points (full credit) even without the binary tree optimization.〉〉
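A Python sketch of this O(ck log k) version, using NumPy FFTs (floating-point convolutions rounded back to integers, so it is only reliable while coefficients stay modest; the names are mine):

```python
import numpy as np

def count_pencil_sets(pencils, k):
    """Coefficient of x^k in prod_j (1 + x + ... + x^pencils[j]), truncated to degree k."""
    prod = np.array([1.0])
    for p in pencils:
        q = np.ones(min(p, k) + 1)              # 1 + x + ... + x^min(p, k)
        size = len(prod) + len(q) - 1           # length of the exact product
        n = 1 << max(size - 1, 1).bit_length()  # FFT length >= size
        conv = np.fft.irfft(np.fft.rfft(prod, n) * np.fft.rfft(q, n), n)[:size]
        prod = np.rint(conv[:k + 1])            # discard terms of degree > k
    return int(prod[k]) if k < len(prod) else 0
```

The first test below is the handout's own example (two red, four green, and one blue pencil admit six four-pencil sets).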

But we can still do better! Because k ≪ c, we can improve the algorithm even further by observing that, after discarding all terms of degree greater than k, there are at most k distinct truncated polynomials P_j. For any integer 1 ≤ j ≤ k, let Q_j(x) = Σ_{i=0}^{j} x^i. For all j < k, let c_j denote the number of colors for which we have exactly j pencils, and let c_k denote the number of colors for which we have at least k pencils. Then the


first k coefficients of P(x) are equal to the first k coefficients of the polynomial

    P̃(x) = Π_{j=1}^{k} (Q_j(x))^{c_j}.

So we can proceed as follows:

• In O(c + k) time, compute the exponent c_j for every index 1 ≤ j ≤ k.
• For each index j, compute the first k coefficients of (Q_j(x))^{c_j} by repeated squaring, using O(log c_j) multiplications. If each multiplication uses FFTs, we can compute the first k coefficients of (Q_j(x))^{c_j} in O(k log k log c_j) = O(k log k log c) time.^b
• Finally, compute the first k coefficients of P̃(x) from these k polynomials in O(k · k log k) = O(k^2 log k) time.

The overall running time of this algorithm is O(c + k^2 log k log c), which is o(ck log k) if k is sufficiently small compared to c.^c 〈〈This is worth 13 points.〉〉
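The repeated-squaring step can be sketched as follows; for brevity this uses plain truncated convolution rather than FFTs, and the names are mine:

```python
def truncated_power(q, e, k):
    """First k+1 coefficients of q(x)^e, by repeated squaring."""
    def mul(a, b):  # polynomial product, dropping terms of degree > k
        out = [0] * min(len(a) + len(b) - 1, k + 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                if i + j <= k:
                    out[i + j] += ai * bj
        return out
    result = [1]
    while e > 0:
        if e & 1:
            result = mul(result, q)
        q = mul(q, q)
        e >>= 1
    return result
```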
Finally, if k is sufficiently small (specifically, if k = o(√(c/log c))), the running time of this algorithm simplifies to O(c).^d This is clearly optimal, because any correct algorithm must read the entire input! 〈〈This is worth 15 points.〉〉

I’ve described a sequence of optimizations, each leading to a different time bound. It is not obvious from the stated time bounds, but in fact, each optimization makes the algorithm faster (or at least no slower).

• Multiply pairs of polynomials using FFTs instead of the lattice algorithm.
• Organize the multiplications in a binary tree instead of a simple for-loop.
• Discard terms with degree greater than k whenever they arise. (This makes the binary tree optimization useful only for factors with degree less than k, but that’s still better than nothing.)
• Compute powers of (truncated) polynomials using repeated squaring instead of multiplying one by one.


^a In practice, this can be optimized further by using a Huffman code to optimally organize the multiplications; however, this optimization does not reduce the worst-case running time, which occurs when Pencils[i] has roughly the same value for all i.
^b A more careful analysis implies an even smaller upper bound of O(k log k log(c/k)) for this part of the repeated-squaring algorithm; the running time is maximized when all exponents c_j are equal.
^c If we used lattice multiplication everywhere instead of FFTs, the running time would be O(c + k^3 log c).
^d If k = o(c^{1/3}/log^{1/3} c), the running time is O(c) even if we use lattice multiplication everywhere instead of FFTs.


Solution (dynamic programming): To save a bit of space, I’ll call the input array P[1 .. c] instead of Pencils[1 .. c].
For any integers 0 ≤ i ≤ c and 0 ≤ s ≤ k, let NumSets(i, s) denote the number of pencil sets of size s that use only the first i colors. This function satisfies the following recurrence:

    NumSets(i, s) =
        1                                           if s = 0
        0                                           if i = 0 and s > 0
        Σ_{n_i=0}^{s} NumSets(i − 1, s − n_i)       if i > 0 and s ≤ P[i]
        Σ_{n_i=0}^{P[i]} NumSets(i − 1, s − n_i)    otherwise

We need to compute NumSets(c, k). The variable n_i represents the number of pencils with color i. As a sanity check, notice that NumSets(i, 0) = 1 for all i; there is exactly one way to make an empty pencil set!
We can memoize this function into a two-dimensional array NumSets[0 .. c, 0 .. k], which we can fill in standard row-major order in O(k^2 c) time. (For each i and s, the innermost loop iterates at most k + 1 times.) 〈〈So far this is worth 8 points.〉〉

The main bottleneck in this algorithm is evaluating the sums inside the inner loop; we can speed up this evaluation using a technique called prefix sums. Let NSAtMost(i, s) denote the number of pencil sets of size at most s that can be made using only the first i colors:

    NSAtMost(i, s) := Σ_{j=0}^{s} NumSets(i, j).

This helper function and our earlier function NumSets satisfy the following mutual recurrence:

    NSAtMost(i, s) =
        1                                       if s = 0
        NSAtMost(i, s − 1) + NumSets(i, s)      otherwise

    NumSets(i, s) =
        1                                                      if s = 0
        0                                                      if i = 0 and s > 0
        NSAtMost(i − 1, s)                                     if i > 0 and s ≤ P[i]
        NSAtMost(i − 1, s) − NSAtMost(i − 1, s − P[i] − 1)     otherwise

(In the final case, the subtracted prefix removes exactly the choices with n_i > P[i]; note the −1, so that the term n_i = P[i] survives.) We can memoize these functions into two (c + 1) × (k + 1) arrays. We can fill both arrays simultaneously in row-major order, increasing i in the outer loop, increasing s in the inner loop, and computing NumSets[i, s] and then NSAtMost[i, s] inside the inner loop. The resulting algorithm runs in O(ck) time. 〈〈This is worth 11 points.〉〉
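The O(ck) prefix-sum version as a runnable Python sketch (0-indexed arrays; names are mine):

```python
def count_pencil_sets_dp(P, k):
    """NumSets via NSAtMost: each DP row is built from prefix sums of the last."""
    num = [1] + [0] * k  # NumSets(0, s) for s = 0 .. k
    for p in P:
        atmost, run = [0] * (k + 1), 0
        for s in range(k + 1):  # prefix sums: NSAtMost(i - 1, s)
            run += num[s]
            atmost[s] = run
        num = [atmost[s] if s <= p else atmost[s] - atmost[s - p - 1]
               for s in range(k + 1)]
    return num[k]
```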

But we can do better! Each set of pencils that we can build consists of 1 pencil each from x_1 different colors, 2 pencils each from x_2 different colors, 3 pencils each

from x_3 different colors, and so on, for some non-negative integers x_1, x_2, . . . , x_k such that

    Σ_{i=1}^{k} i · x_i = k.

The main idea behind our faster dynamic programming algorithm is to consider the possible choices of x_1, x_2, . . . , x_k separately from the actual color assignments. For example, if we are given two red pencils, four green pencils, and one blue pencil, we can build four different types of four-pencil sets:

• (x_1, x_2, x_3, x_4) = (0, 0, 0, 1) — GGGG
• (x_1, x_2, x_3, x_4) = (1, 0, 1, 0) — GGGR and GGGB
• (x_1, x_2, x_3, x_4) = (0, 2, 0, 0) — GGRR
• (x_1, x_2, x_3, x_4) = (2, 1, 0, 0) — GGRB and RRGB

Imagine choosing the counts x_i (and counting the choices for the corresponding colors) in decreasing index order. At some intermediate stage of this decision process, we have already chosen x_k, x_{k−1}, . . . , x_{ℓ+1}, and we need to count decompositions of the form

    Σ_{i=1}^{ℓ} i · x_i = s,

assuming we have already used u = Σ_{j=ℓ+1}^{k} x_j colors of pencils. The remaining subproblem can be specified by three integers, each between 0 and k:

• The size s of the subset we have left to build
• The largest number ℓ of pencils of one color we can include in the subset
• The number u of colors that we have already used.

Define NSets(s, ℓ, u) to be the number of pencil sets of size s that use at most ℓ pencils of each color, if u colors (each with at least ℓ pencils) are unavailable. Our top-level problem asks us to compute NSets(k, k, 0).
Before we describe a recurrence for this function, we need some simple prepro-
cessing. If any color has more than k pencils, we throw the extra pencils away; for
all i, replace P[i] with min{P[i], k}. Then we compute an array Cols[1 .. k], where
Cols[i] is the number of different colors for which we have at least i pencils. We can
compute this array in O(c) time by sorting the input array P[1 .. c] (using counting
sort) and then scanning.
The function NSets satisfies the following rather intimidating recurrence:

    NSets(s, ℓ, u) =
        1                                                                           if s = 0
        Σ_{x_ℓ=0}^{⌊s/ℓ⌋} C(Cols[ℓ] − u, x_ℓ) · NSets(s − ℓ · x_ℓ, ℓ − 1, u + x_ℓ)   otherwise

(Here C(n, i) denotes the binomial coefficient "n choose i".) The recurrence looks at all possible values of x_ℓ, and for each such value, counts the number of subsets with exactly x_ℓ colors contributing exactly ℓ pencils. For each value


of x_ℓ, the binomial coefficient counts the number of ways to choose x_ℓ colors from the Cols[ℓ] − u available colors that have at least ℓ pencils, and the recursive call counts our choices for the remaining s − ℓ · x_ℓ pencils.
We can memoize this recurrence into a three-dimensional array, indexed by s, ℓ, and u. We can fill the array using three nested for-loops, increasing ℓ in the outer loop, decreasing u in the middle loop, and increasing s in the inner loop. We can evaluate each entry NSets[s, ℓ, u] in O(1 + s/ℓ) time; in particular, we can evaluate each binomial coefficient in O(1) time using the recurrence

    C(n, i) =
        1                               if i = 0
        C(n, i − 1) · (n − i + 1)/i     otherwise

Thus, the overall time to fill the entire memoization array is at most

    Σ_{s=1}^{k} Σ_{ℓ=1}^{k} Σ_{u=0}^{k} O(1 + s/ℓ) = O(k^3) + O(k) · Σ_{s=1}^{k} Σ_{ℓ=1}^{k} O(s/ℓ) = O(k^3 log k).

The overall running time of our algorithm is O(c + k^3 log k), which is o(ck) if k is sufficiently small compared to c.^a 〈〈This is worth 13 points.〉〉
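A memoized Python sketch of this recurrence, using math.comb for the binomial coefficients and recursion instead of explicit loop order; I additionally cap the loop at the number of available colors, so the binomial never sees a negative argument (names are mine):

```python
from functools import lru_cache
from math import comb

def count_pencil_sets_fast(P, k):
    """NSets(s, l, u) DP; cols[l] = number of colors with at least l pencils."""
    cols = [0] * (k + 2)
    for p in P:
        cols[min(p, k)] += 1    # tally capped pencil counts
    for l in range(k, 0, -1):
        cols[l] += cols[l + 1]  # suffix sums: colors with >= l pencils

    @lru_cache(maxsize=None)
    def nsets(s, l, u):
        if s == 0:
            return 1
        if l == 0:
            return 0
        avail = cols[l] - u     # unused colors with >= l pencils
        return sum(comb(avail, x) * nsets(s - l * x, l - 1, u + x)
                   for x in range(min(s // l, avail) + 1))

    return nsets(k, k, 0)
```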

Finally, if k is sufficiently small (specifically, if k = o(c^{1/3}/log^{1/3} c)), the running time of this algorithm simplifies to O(c). This is clearly optimal, because any correct algorithm must read the entire input! 〈〈This is worth 15 points.〉〉 ■

^a This is slightly faster than the previous solution based on truncated polynomials if we use lattice multiplication everywhere instead of FFTs.

Rubric: 10 points for a correct O(ck log k)-time algorithm, with partial credit and extra credit as indicated in the solution. Partial credit for dynamic programming solutions follows the standard dynamic programming rubric given in Homework 2. These are almost certainly not the only correct algorithms, nor the fastest possible algorithms for all combinations of c, k, and N.

