Basic Quantum Subroutines: Finding Multiple Marked Elements and Summing Numbers
Joran van Apeldoorn¹, Sander Gribling², and Harold Nieuwboer³

¹ IViR and QuSoft, University of Amsterdam, The Netherlands
² Department of Econometrics and Operations Research, Tilburg University, Tilburg, The Netherlands
³ Korteweg–de Vries Institute for Mathematics and QuSoft, University of Amsterdam, The Netherlands; Faculty of Computer Science, Ruhr University Bochum, Germany; and Department of Mathematical Sciences, University of Copenhagen, Denmark

Joran van Apeldoorn: [email protected]
Sander Gribling: [email protected]
Harold Nieuwboer: [email protected]
arXiv:2302.10244v3 [quant-ph] 5 Mar 2024
We show how to find all k marked elements in a list of size N using the optimal number O(√(Nk)) of quantum queries and only a polylogarithmic overhead in the gate complexity, in the setting where one has a small quantum memory. Previous algorithms either incurred a factor k overhead in the gate complexity, or had an extra factor log(k) in the query complexity.
We then consider the problem of finding a multiplicative δ-approximation of s = ∑_{i=1}^N vi where v = (vi) ∈ [0, 1]^N, given quantum query access to a binary description of v. We give an algorithm that does so, with probability at least 1 − ρ, using O(√((N/δ) log(1/ρ))) quantum queries (under mild assumptions on ρ). This quadratically improves the dependence on 1/δ and log(1/ρ) compared to a straightforward application of amplitude estimation. To obtain the improved log(1/ρ) dependence we use the first result.
1 Introduction
1.1 Finding multiple marked elements in a list
Grover’s famous search algorithm [Gro96] can be used to find a marked element in a list quadrat-
ically faster than possible classically. Formally it can be used to solve the following problem:
given a bit string x ∈ {0, 1}N , x ̸= 0, find an index i ∈ [N ] such that xi = 1.
In this work we consider the problem of finding all indices i ∈ [N ] for which xi = 1. We give
a query-optimal quantum algorithm with polylogarithmic gate overhead in the setting where one
has a small quantum memory. We explain below why this last assumption makes the problem
non-trivial. This improves over the previous state-of-the-art: previous algorithms were either
query-optimal but with a polynomial gate overhead, or had a polylogarithmic gate overhead but
also a logarithmic overhead in the query count.
A well-known query-optimal algorithm for the problem is as follows [dGdW02, Lem. 2].
Let k be the Hamming weight |x| := ∑_{i=1}^N xi of x. For ease of exposition, suppose the algorithm
knows k. (For our results we will work with weaker assumptions such as knowing only an upper
bound on k, or an estimate of it, see Section 3. We also ignore failure probabilities in this part of
the introduction.)
A variant of Grover's algorithm [BBHT98] can find a single marked element using O(√(N/k)) quantum queries and O(√(N/k) log(N)) additional single- and two-qubit gates. One can then find all k marked elements using

O(√(N/k) + √(N/(k − 1)) + . . . + √N) = O(√(Nk))
quantum queries to x. The above complexity is obtained as follows. Suppose we have already
found a set J ⊆ [N ] of marked elements. Then to find a new marked element, we replace x by
the string z ∈ {0, 1}^N defined as zi = xi if i ∉ J, and zi = 0 otherwise.
A quantum query to z can be made using a single quantum query to x and a single quantum query to J (which on input |i⟩|b⟩ for i ∈ [N], b ∈ {0, 1} returns |i⟩|b ⊕ δi∈J⟩, where δi∈J ∈ {0, 1} is one iff i ∈ J). In particular, if J can be stored in a quantum memory (i.e. queried and updated in unit time), then the query complexity will be O(√(Nk)) and the time complexity is Õ(√(Nk)).
We refer the interested reader to [GLM08] and [CHI+ 18, Sec. 5] for a discussion of quantum
memory and its (dis)advantages.
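As a purely classical illustration of this masking idea (not the paper's algorithm; the quantum search is replaced by an idealized sampler, and the helper name grover_find_one is ours), the outer loop could be structured as follows.

import random

def grover_find_one(z):
    # Stand-in for Grover search: return a uniformly random index i with z[i] == 1,
    # or None if z is all-zero; a quantum implementation uses O(sqrt(N/|z|)) queries.
    marked = [i for i, bit in enumerate(z) if bit == 1]
    return random.choice(marked) if marked else None

def find_all_marked(x):
    # Repeatedly search the string z obtained from x by zeroing out the already-found set J.
    J = set()
    while True:
        z = [0 if i in J else xi for i, xi in enumerate(x)]
        i = grover_find_one(z)
        if i is None:
            return sorted(J)
        J.add(i)

x = [0, 1, 0, 0, 1, 1, 0, 1]
print(find_all_marked(x))  # [1, 4, 5, 7]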
However, when we cannot store J in a quantum memory, a naive implementation of the quantum queries to J is expensive in terms of gate complexity: if |J| = s, then one can use O(s log(N)) quantum gates to implement a single query to J.¹ Since the size of J grows to k, the total gate complexity of finding all marked elements will scale as Õ(√N k^{3/2}), which is a factor k larger than the query complexity. We show that this factor of k in the gate complexity can be avoided: we give an algorithm that finds, with large probability, all k indices using the optimal number of quantum queries to x, O(√(Nk)), while incurring only a polylogarithmic overhead in the gate complexity, in the case where we only have a small quantum memory. We state a simplified
version of our main result below; for the full version, see Theorem 3.9 and the corresponding
algorithm GroverMultipleFast.
Theorem 1.1. Let x ∈ {0, 1}^N with |x| = k ≥ 2, and let ρ ∈ (0, 1) be such that k ∈ Ω(log(k/ρ)³) (e.g. ρ = Ω(1/poly(k))). Then we can find, with probability ≥ 1 − ρ, all k indices i ∈ [N] for which xi = 1 using O(√(Nk)) quantum queries and O(√(Nk) log(k)³ log(N)) additional gates.
We mention that by a simple coupon-collector argument one can already achieve both query- and gate-complexity √(Nk) polylog(N, 1/ρ), see Proposition 3.7. Our algorithm completely removes the polylog(N) factors in the query complexity and moreover has a much improved dependence on log(1/ρ): one can achieve ρ = 1/poly(k) without increasing the number of quantum queries made by the algorithm. In the same spirit, we mention that previous work had already shown that simply boosting a constant success probability is not optimal for finding a single marked element: one can do so with probability ≥ 1 − ρ using √(N log(1/ρ)) quantum queries [BCdWZ99].
In a nutshell, our algorithm is a hybrid between the quantum coupon-collector and the query-
optimal algorithm described above. First, we use the coupon collection strategy to find t marked
indices 1 ≤ i1 < · · · < it ≤ N, for t roughly k/log(k)². A basic property of this strategy is that
the resulting indices {i1 , . . . , it } yield a uniformly random subset of size t of the marked indices
in x. Next, for every j ∈ [t + 1], we use the query-optimal algorithm to find all remaining
marked elements in the interval (ij−1, ij) ⊆ [N], where we write i0 = 0 and it+1 = N + 1. With
¹ We ignore here the cost of maintaining a classical data structure for J, but comment on this again later.
high probability over the found indices {i1 , . . . , it }, each of the intervals (ij−1 , ij ) contains few
remaining marked indices, which reduces the effect of the high gate-complexity overhead of the
query-optimal search algorithm.
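The following classical mock-up of this two-stage structure (both quantum subroutines are replaced by idealized samplers; the function names are ours and purely illustrative) may help to see how the intervals arise.

import random
from math import ceil, log2

def sample_marked(x):
    # Idealized stand-in for Grover2/3: a uniformly random marked index of x.
    return random.choice([i for i, b in enumerate(x) if b])

def find_all_in_range(x, lo, hi):
    # Idealized stand-in for the query-optimal search restricted to indices strictly between lo and hi.
    return [i for i in range(lo + 1, hi) if x[i]]

def hybrid_find_all(x, k):
    t = max(1, ceil(k / max(1, int(log2(k)) ** 2)))   # roughly k / log(k)^2 pivot elements
    pivots = set()
    while len(pivots) < t:                            # stage 1: coupon collection
        pivots.add(sample_marked(x))
    pivots = sorted(pivots)
    found = list(pivots)
    boundaries = [-1] + pivots + [len(x)]
    for lo, hi in zip(boundaries, boundaries[1:]):    # stage 2: finish each interval
        found.extend(find_all_in_range(x, lo, hi))
    return sorted(found)

x = [1 if i % 9 == 0 else 0 for i in range(200)]
assert hybrid_find_all(x, sum(x)) == [i for i, b in enumerate(x) if b]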
Theorem 1.2 (Informal version of Theorem 4.3). Let v ∈ [0, 1]^N. Let ρ, δ ∈ (0, 1) be such that log(1/ρ)/δ = O(N). Then we can find, with probability ≥ 1 − ρ, a multiplicative δ-approximation of ∑_{i=1}^N vi using

O(√((N/δ) log(1/ρ)))    (1.1)
quantum queries to binary descriptions of the entries of v, and a gate complexity which is larger
by a factor polylogarithmic in N , 1/δ and 1/ρ.
In a nutshell, our algorithm first finds all indices of "large enough" entries of v using GroverMultipleFast and sums the corresponding elements classically. It then rescales the remaining "small enough" elements and uses amplitude estimation [BHMT02] to approximate their sum. To determine what "large enough" means, we use a recent quantum quantile estimation procedure from [Ham21]. Choosing the quantile carefully controls both the number of elements that need to be found in the first stage, as well as the size of the elements that remain to be summed in the second stage. Note that it is the above version of Grover's algorithm that allows us to obtain a query complexity with only a √(log(1/ρ))-dependence, and without additional polylogarithmic factors in N and δ. Indeed, the fact that the number of quantum queries required to find multiple marked elements does not depend on log(1/ρ) (for ρ not too small) allows us to balance the complexities of the two stages.³
The problem we consider can be viewed as a special case of the mean estimation problem,
or as a generalization of the approximate counting problem for binary strings x ∈ {0, 1}N . We
briefly discuss how our results compare to prior work on those problems.
Mean estimation algorithms. After multiplying the vi by a factor 1/N, we can interpret the problem of finding a multiplicative δ-approximation of the sum s = ∑_{i=1}^N vi as the problem of obtaining a multiplicative δ-approximation of the mean µ = (1/N) ∑_{i=1}^N vi of the random variable that, for each i ∈ [N], takes value vi with probability 1/N. Quantum algorithms for the mean
estimation problem date back to the work of Grover [Gro97, Gro98]. A careful application of maximum finding and quantum amplitude estimation yields such an approximation of µ, with probability ≥ 1 − ρ, using O((√N/δ) log(1/ρ)) quantum queries and polylogarithmic gate overhead, see Theorem 2.9. We improve the dependence on δ from 1/δ to 1/√δ.

² Here we use the convention that a multiplicative δ-approximation of a real number s is a real number s̃ for which (1 − δ)s ≤ s̃ ≤ (1 + δ)s.

³ For completeness we mention that if you do allow a large quantum memory, or a polynomial overhead in the gate complexity, then the same √(log(1/ρ))-dependence can be obtained using an analogous approach as the one we use to prove Theorem 4.3, but instead relying on, e.g., one of the versions of Grover's algorithm discussed in Section 1.1.
As for applications, we note that Theorem 2.9 was used to give quantum speedups for
the matrix scaling problem in [vAGL+ 21, GN22], where it is used to approximate the row- and
column sums of a matrix with non-negative entries. This is one of their main sources of quantum
speedup, and the quality of this approximation directly affects the achievable precision for the
matrix scaling problem. Using the improved quantum summing subroutine of Theorem 4.3, the
dependence on the desired precision ε for the matrix scaling problem is further improved. More
precisely, if A ∈ ℝ_{≥0}^{N×N} is an N × N matrix with non-negative entries, let r(A) ∈ ℝ_{≥0}^N denote its vector of row sums, i.e., ri(A) = ∑_{j=1}^N Aij. Then given quantum query access to A, using the improved summing subroutine, one can with Õ(N^{1.5}/√δ) queries compute a vector r̂ ∈ ℝ^N such that
∥r̂ − r(A)∥1 ≤ δ∥r(A)∥1 .
Computing such an r̂ with δ = ε2 is the bottleneck in the second-order method for matrix
scaling presented in [GN22]. By reducing the complexity of this step, this method is improved to
become better than the fastest classical first-order method (Sinkhorn’s algorithm) for entrywise-
positive matrices: the classical method finds an ε-ℓ1 -scaling of entrywise positive matrices in
time Õ(N²/ε), whereas the box-constrained Newton method now runs in time Õ(N^{1.5}/ε). Note that this gives an algorithm for matrix scaling whose runtime is sublinear in the input size when 1/ε = o(√N), corresponding to 1/δ = o(N), which is precisely the regime of δ for which the
quantum subroutine improves over classical summing.
We remark that faster mean estimation algorithms have been developed, for example, for random variables with a small variance σ². Indeed, the current state of the art obtains a multiplicative δ-approximation, with probability ≥ 1 − ρ, using Õ((σ/(δµ) + 1/√(δµ)) log(1/ρ)) quantum queries in expectation [Ham21, KO23].⁴ For comparison, we mention that σ ≤ √(µ(1 − µ)) always holds, and when given binary access to the vi, one may additionally assume (after maximum-finding and rescaling) that µ ∈ [1/N, 1]. The second term in the complexity is then at most √(N/δ) log(1/ρ) (i.e. at most our bound when we ignore the ρ-dependence). The first term, however, is larger than our complexity if and only if δN ≤ (σ/µ)² (again ignoring ρ). Our algorithm thus improves over prior mean estimation algorithms when the support is relatively small: when δN is at most (σ/µ)².

⁴ In [Ham21, Proposition 6.4], a matching (up to log-factors) lower bound is shown for Bernoulli random variables. We remark that our algorithm does not break that lower bound since we parameterize the problem differently: the complexity of our algorithm depends also on the size of the support of the distribution.
Approximate counting algorithms. As mentioned above, our algorithm improves the error-
dependence for mean estimation (for random variables with small support). It therefore makes
sense to compare our upper bound with the well-known lower bound for the approximate count-
ing problem for binary strings x ∈ {0, 1}^N. We first recall a precise statement. Let x ∈ {0, 1}^N and k = |x|, and Ux a unitary implementing quantum oracle access to x. Then for an integer ∆ > 0, any quantum algorithm which, with probability ≥ 2/3, computes an additive ∆-approximation of k uses at least Ω(√(N/∆) + √(k(N − k))/∆) applications of controlled-Ux [NW99,
Thm. 1.10 and 1.11]. A matching upper bound is given in [BHMT02, Thm. 18], see Theorem 2.7
for a precise formulation. We can compare the complexity of our algorithm by converting mul-
tiplicative error into additive error, i.e., to achieve an additive error of ε we take δ = ε/k (or
ε divided by a suitable multiplicative approximation of k). Then the key point is that if one
considers Eq. (1.1) for ε ≤ ∆ and k ≥ 1, then

√(Nk/ε) ≥ √(Nk/∆) ≥ (1/2)√(N/∆) + (1/2)√(k(N − k)/∆) ≥ (1/2)√(N/∆) + (1/2)·√(k(N − k))/∆,

where the last two inequalities follow from concavity of the square-root function and ∆ ≥ 1. In other words, for all parameters N, k, ∆, the complexity of our algorithm (left-hand side) is at least as large as the lower bound on approximate counting (right-hand side), so we do not break the lower bound.
We highlight two ranges of parameters. On the one hand, when ∆ is large, our upper bound is suboptimal for quantum counting. For example, when ∆ = k/2 (i.e., δ = 1/2), our algorithm uses O(√N) queries whereas the approximate counting algorithm from [BHMT02, Thm. 18] uses only O(√(N/k)) queries. This is no surprise given that our algorithm finds all "large" elements, which in the counting setting amounts to finding all ones. On the other hand, when ∆ is a small constant, say ∆ = 1, the approximate counting lower bound shows that our upper bound is essentially tight. To see this, note that if one had a quantum algorithm for computing (1 ± δ)-multiplicative approximations of sums with quantum query complexity O(√N/δ^c) (that succeeds with probability ≥ 2/3), this would give an upper bound of O(√N k^c) for finding an additive ∆ = 1-approximation of k. The lower bound becomes Ω(√(k(N − k))) when ∆ = 1, and so we must have √N k^c ≥ √(k(N − k)), i.e., c ≥ 1/2. We leave it as an open problem whether one can obtain a quantum algorithm for approximate summing of vectors v ∈ [0, 1]^N that matches the approximate counting complexity when applied to v ∈ {0, 1}^N for the entire range of parameters N, k, ∆.
Finally we highlight that our quantum upper bound for summing outperforms the classical lower bound for approximate counting for a certain range of parameters. The classical randomized query complexity of achieving a multiplicative δ-approximation of the Hamming weight of x ∈ {0, 1}^N is Θ(min{N, N/(δ²k)}).⁵ This classical bound exceeds our quantum upper bound of Õ(√(N/δ)) if 1/δ ∈ O(N) and 1/δ ∈ Ω(k^{2/3}/N^{1/3}) (ignoring logarithmic factors).

⁵ We believe this is well known: the upper bound (which was also conjectured in [BHMT02]) follows from the algorithm presented in [DKLR00], applied to a Bernoulli random variable with success probability k/N; the lower bound is claimed for instance in [AR20], but we could not locate a proof in the literature for cases other than k = Θ(N) [CEG95]. We therefore provide a (not entirely trivial) proof in Appendix A.
2 Preliminaries
2.1 Notation and assumptions
Throughout the paper, we will assume that N ≥ 1 and N = 2^n for some n ≥ 1. We identify ℂ^N with ℂ^{2^n} by |j⟩ ↦ |j1 . . . jn⟩, where (j1, . . . , jn) ∈ {0, 1}^n is the standard binary encoding of j − 1 ∈ {0, . . . , 2^n − 1}. We write log for the logarithm with base 2 and ln for the natural logarithm. For a bit string x ∈ {0, 1}^N we write |x| = ∑_{i∈[N]} xi. Throughout we will use k to denote the Hamming weight of x, i.e., |x| = k, and we write kest, klb, kub for various bounds on k: kest will denote an integer such that k/2 ≤ kest ≤ 3k/2, and klb and kub are lower and upper bounds on k, respectively.
Procedure AmpEst(U, M )
Input: Access to controlled versions of unitary U ∈ U(2q ) and its inverse, an integer
M ≥ 1.
Output: Real number ã ∈ [0, 1].
Analysis: Lemma 2.3
In both cases we allow the unitary to act on additional workspace registers, which we omit for notational convenience. Moreover, throughout the paper, every algorithm will use at most a logarithmic number of additional ancillary qubits.
We additionally use a classical data structure to maintain sorted lists that supports both
insertion and look-up in a time that scales logarithmically with the size of the list, see for
example [Knu98, Sec. 6.2.3] or [CLRS22, Ch. 13]. We emphasize that we allow neither writing
nor reading of such a data structure in superposition.
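As a concrete classical sketch of such a structure (for illustration only; note that a plain Python list gives O(log n) lookup via binary search but O(n) insertion because of shifting, whereas the balanced trees in the cited references achieve O(log n) for both):

import bisect

class SortedList:
    def __init__(self):
        self._data = []

    def insert(self, value):
        # binary search for the position; keep entries distinct, as for the index set J
        pos = bisect.bisect_left(self._data, value)
        if pos == len(self._data) or self._data[pos] != value:
            self._data.insert(pos, value)

    def contains(self, value):
        pos = bisect.bisect_left(self._data, value)
        return pos < len(self._data) and self._data[pos] == value

J = SortedList()
for idx in [17, 3, 42, 3]:
    J.insert(idx)
print(J.contains(42), J.contains(5))  # True False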
Procedure GroverCertainty(Ux , k0 )
Input: Quantum oracle Ux to access x ∈ {0, 1}N , an integer k0 ≥ 1.
Output: An index i ∈ [N ].
Guarantee: If |x| = k0 , then xi = 1 with certainty.
Analysis: Theorem 2.4
Procedure GroverExpectation(Ux )
Input: Quantum oracle Ux to access x ∈ {0, 1}N .
Output: An index i ∈ [N ].
Guarantee: If |x| ≥ 1, then xi = 1 with certainty.
Analysis: Theorem 2.5
Proof. This follows from the formulation in [BHMT02] by setting k = 1 and implementing
the reflection through |0q ⟩ using O(q) gates, which needs to be performed M times. If M is a
power of 2 we can implement the quantum Fourier transform on m = log2 (M ) qubits using m
Hadamard gates, and the QFT and its inverse need only be performed once; therefore, this cost
is absorbed in the big-O.
We note that the above formulation of AmpEst outputs a real number ã whereas we require
a fixed-point encoded number for future uses. However, it suffices to use fixed-point arithmetic
using O(log(M )) bits; after all, the guarantee of AmpEst only gives a precision of 1/poly(M ).
We also need a version of amplitude amplification where the success probability is 1 if one
knows the amplitude of the “good” part of the state exactly. In a nutshell, the algorithm with
success probability 1 is the usual amplitude amplification algorithm applied not to U but to U
followed by a rotation of the last qubit to slightly reduce the amplitude a to ā. Carefully choosing
ā ensures that the success probability is exactly 1 after an integer number of rounds of amplitude
amplification. This requires having access to gates which implement rotation by arbitrary angles,
not just angles of the form π/2m for some integer m. We specialize the statement of this result to
the search setting but remark that this works more generally. For exactly N/4 marked elements
this observation was first made in [BBHT98].
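The following toy computation (our own sketch, working directly in the two-dimensional subspace spanned by the "good" and "bad" parts of the state, and not claiming to reproduce the exact formulation of [BHMT02, Thm. 4]) illustrates how lowering the amplitude to a suitable ā makes an integer number of rounds succeed with certainty.

import math

def exact_search_success(N, k):
    a = k / N                                    # squared amplitude of the good part
    theta = math.asin(math.sqrt(a))
    m = math.ceil(math.pi / (4 * theta) - 0.5)   # smallest integer m with (2m+1)*theta >= pi/2
    plain = math.sin((2 * m + 1) * theta) ** 2   # plain Grover overshoots: success < 1 in general
    theta_bar = math.pi / (2 * (2 * m + 1))      # reduce the angle so that (2m+1)*theta_bar = pi/2
    a_bar = math.sin(theta_bar) ** 2             # the slightly reduced amplitude (a_bar <= a)
    exact = math.sin((2 * m + 1) * theta_bar) ** 2
    return m, plain, exact, a_bar <= a + 1e-12

for N, k in [(1024, 1), (1024, 3), (1 << 20, 7)]:
    print(N, k, exact_search_success(N, k))      # 'exact' is 1.0 in every case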
Theorem 2.4 ([BHMT02, Thm. 4]). Let x ∈ {0, 1}N with |x| = k ≥ 1. Then there is a quantum
algorithm GroverCertainty that takes as input a quantum oracle Ux to access x and an integer
k0 ∈ [N], and that outputs an index i ∈ [N] such that xi = 1 with certainty if k0 = k. It uses O(√(N/k0)) quantum queries to x and O(√(N/k0) log(N)) additional gates.
The other version of Grover that we need is the following, which is originally due to [BBHT98,
Thm. 3], but we use a slightly different version from [BHMT02, Thm. 3]:
Theorem 2.5 ([BHMT02, Thm. 3]). Let x ∈ {0, 1}N with |x| = k, where k is not necessarily
known. Then there is a quantum algorithm GroverExpectation that takes as input a quantum
oracle Ux to access x, and if k ≥ 1, outputs an index i ∈ [N ] such that xi = 1. The number of
quantum queries to x that it uses is a random variable Q, such that, if k ≥ 1, then
E[Q] = O(√(N/k)),
and if |x| = 0, then Q = ∞ (i.e., the algorithm runs forever). The number of additional gates
used is O(Q log(N )). The index i which is output is uniformly random among all such indices,
and independent of the value of Q.
Procedure Grover2/3 (Ux , klb )
Input: Quantum oracle Ux to access x ∈ {0, 1}N and a lower bound klb on |x|.
Output: An index i ∈ [N ].
Guarantee: If |x| ≥ 1, then with probability ≥ 2/3, xi = 1.
Analysis: Lemma 2.6
Procedure ApproxCount(Ux , ε, ρ)
Input: Quantum oracle Ux to access x ∈ {0, 1}^N, rational number ε > 0 such that 1/(3N) < ε ≤ 1, failure probability ρ > 0.
Output: Integer k̃ ∈ {0, . . . , N }.
Guarantee: If |x| = k ≥ 1, with probability ≥ 1 − ρ, |k̃ − k| ≤ εk, and if k = 0 then
k̃ = 0 with certainty.
Analysis: Theorem 2.7
Lemma 2.6. Let x ∈ {0, 1}^N. Then there is a quantum algorithm Grover2/3 that takes as input a quantum oracle Ux to access x and a lower bound klb ≥ 1 on |x|. With probability ≥ 2/3, it outputs an index i ∈ [N] such that xi = 1. It uses O(√(N/klb)) quantum queries to x, and O(√(N/klb) log(N)) additional gates.
Proof. The algorithm GroverExpectation finds an index i such that xi = 1. Its number of applications of controlled-Ux is a random variable Q and the number of additional gates is O(Q · log(N)). By Theorem 2.5 we have E[Q] = O(√(N/|x|)). Markov's inequality shows that if we terminate GroverExpectation after at most C√(N/|x|) quantum queries for a suitable constant C > 0, then it finds an index i such that xi = 1 with probability at least 2/3. The procedure Grover2/3 uses the lower bound klb on |x| to decide when to terminate GroverExpectation. For the same constant C > 0 as before, it terminates after at most C√(N/klb) quantum queries. Since C√(N/klb) ≥ C√(N/|x|), the success probability of Grover2/3 is also at least 2/3.
Let us make some remarks about the complexity of finding a single marked element. First, to find such an element with certainty one can essentially remove the log(N) factor in the gate complexity: O(√N log(log*(N))) gates suffice [AdW17]. Second, by cleverly combining GroverCertainty and Grover2/3, one can find a marked element (among an unknown number of solutions) with probability ≥ 1 − ρ using √(N log(1/ρ)) quantum queries [BCdWZ99]. This shows that the standard way of boosting the success probability of Grover2/3 is not optimal.
Next, we recall a well-known result on approximate counting.
Theorem 2.7 ([BHMT02, Thm. 18]). Let x ∈ {0, 1}^N and write |x| = k. Let 1/(3N) < ε ≤ 1. Then there is a quantum algorithm that, with probability at least 2/3, outputs an estimate k̃ such that

|k̃ − k| ≤ εk

using an expected number of

Θ( √(N/(⌊εk⌋ + 1)) + √(k(N − k))/(⌊εk⌋ + 1) )

quantum queries to x. If k = 0, then the algorithm outputs k̃ = 0 with certainty, using Θ(√N)
quantum queries to x. In both cases, the algorithm uses a number of gates which is O(log(N ))
times the number of quantum queries. To boost the success probability to 1 − ρ, repeat the
procedure O(log(1/ρ)) many times and output the median of the returned values.
We often use the special case ε = 1/2 of the above theorem, hence we record it here for
future use. (Note that the proof of Theorem 2.7 given in [BHMT02] in fact starts by obtaining
a constant factor approximation of |x|.)
Corollary 2.8. Let x ∈ {0, 1}^N and write |x| = k. Then there is a quantum algorithm that outputs a kest such that, with probability ≥ 1 − ρ, we have k/2 ≤ kest ≤ 3k/2, and uses O(√(N/(k + 1)) log(1/ρ)) quantum queries and O(√(N/(k + 1)) log(1/ρ) log(N)) gates.
We now discuss known extensions of the above results on counting the Hamming weight of
a bit string to the problem of mean estimation: given a vector v ∈ [0, 1]N , one is interested
in approximating v̄ = (1/N) ∑_{i=1}^N vi. This was first studied in [Gro97] and later in [Gro98], where in the latter it was shown that one can find an additive ε-approximation of v̄ using Õ(1/ε) quantum queries to a unitary that prepares a state encoding the entries of v in its amplitudes, and a similar number of additional gates (also dependent on N). Using amplitude amplification techniques one can reduce the query dependence to O(1/ε) with O(log(N)/ε) additional gates. This result may be easily recovered from Lemma 2.3 with M = Θ(1/ε), applied to a unitary preparing

(1/√N) ∑_{i=1}^N |i⟩ (√(1 − vi) |0⟩ + √(vi) |1⟩).
It is well-known that when one has quantum oracle access to fixed point representations of the
entries of v (cf. Definition 2.2), rather than just a state encoding its entries in the amplitudes,
one can give an algorithm whose complexity depends only on N and δ, with guarantees as given
below.
Theorem 2.9. Let v ∈ [0, 1]^N be a vector with each entry vi encoded in (0, b)-fixed-point format, and let Uv be a unitary implementing binary oracle access to v (cf. Definition 2.2). Let ρ ∈ (0, 1). Then with O((√N/δ) log(1/ρ)) applications of controlled-Uv, controlled-Uv†, and a polylogarithmic gate overhead, one can find with probability ≥ 1 − ρ a multiplicative δ-approximation of (1/N) ∑_i vi.
We give an informal description of the algorithm here, and refer the interested reader
to [vAGL+21] for a careful implementation along with a bit complexity analysis. By using quantum maximum finding [DH96], with O(√N) quantum queries and O(b√N log(N)) other gates, one may find vmax = maxi vi. If vmax = 0 one may output v̄est = 0 as an estimate of v̄. Note that having binary access here makes it easy to compare elements. Next, set wi = vi/vmax, and let U be a unitary preparing a state

(1/√N) ∑_{i=1}^N |i⟩ (√(1 − wi) |0⟩ + √(wi) |1⟩).

Then Lemma 2.3 with M = 8√N/δ outputs an estimate w̄est of w̄, such that

|w̄est − w̄| ≤ 2π√(w̄(1 − w̄))/M + π²/M² ≤ (2π/8) δ w̄ √(1 − w̄) + (π²/64) δ² w̄ ≤ δ w̄
because 1/N ≤ w̄, so w̄est is a multiplicative δ-approximation of w̄. Therefore w̄est · vmax is a
multiplicative δ-approximation of v̄. We note that in this step the binary access to the entries
of v enables the “binary amplification” by ensuring the largest entry of w is 1.
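A small numerical sketch of the rescaling step (classically simulating the state rather than running amplitude estimation; the variable names are ours): the probability of measuring |1⟩ in the last qubit of the prepared state is exactly w̄, and multiplying an estimate of w̄ by vmax recovers an estimate of v̄.

import numpy as np

rng = np.random.default_rng(0)
N = 8
v = rng.uniform(0.0, 0.7, size=N)

vmax = v.max()                       # found by quantum maximum finding in the text
w = v / vmax                         # rescaled entries; the largest entry is now 1

# |psi> = (1/sqrt(N)) sum_i |i>( sqrt(1-w_i)|0> + sqrt(w_i)|1> ), stored as an (N, 2) array
psi = np.stack([np.sqrt(1 - w), np.sqrt(w)], axis=1) / np.sqrt(N)

a = np.sum(np.abs(psi[:, 1]) ** 2)   # squared norm of the |1> part equals the mean of w
assert np.isclose(a, w.mean())

a_est = a + 1e-3                     # placeholder for the amplitude-estimation output
print(a_est * vmax, v.mean())        # close to the true mean of v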
The algorithm finds all such indices with probability ≥ 1 − ρ, and uses O(√(Nk)) quantum queries and Õ(√(Nk)) single- and two-qubit gates. The contribution here is that the query complexity is optimal and the time complexity is only polylogarithmically worse than the query complexity, without using a QRAM.
Cj |i⟩|b⟩ = |i⟩|b ⊕ 1⟩ if i = j, and Cj |i⟩|b⟩ = |i⟩|b⟩ otherwise.    (3.1)
Then the Cj -gate can be implemented with O(n) standard gates and n − 1 ancillary qubits.
1. For each l ∈ [n] such that jl = 0, apply a NOT gate on the l-th qubit of the index register.
2. Apply a NOT-gate to the output register containing b, controlled on all n qubits of the
index register. This can be implemented using O(n) Toffoli gates, one CNOT gate, and
n − 1 ancilla qubits, see [NC02, Fig. 4.10].
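A direct statevector check of this decomposition (a sketch under our own conventions: X gates on the index qubits where j has a 0 bit, a NOT on the output qubit controlled on all index qubits, and the X gates undone afterwards; the borrowed ancillas are omitted because the simulation applies the multi-controlled NOT as a single matrix):

import numpy as np
from functools import reduce

n = 3                                     # index register of n qubits plus one output qubit
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def on_qubit(gate, q, total):
    # tensor `gate` acting on qubit q (qubit 0 = most significant) of `total` qubits
    return reduce(np.kron, [gate if t == q else I2 for t in range(total)])

def c_j(j):
    total = n + 1
    bits = [(j >> (n - 1 - l)) & 1 for l in range(n)]
    # step 1: X on every index qubit where the corresponding bit of j is 0, mapping |j> to |1...1>
    pre = reduce(np.matmul, [on_qubit(X, l, total) for l in range(n) if bits[l] == 0], np.eye(2 ** total))
    # step 2: NOT on the output qubit, controlled on all n index qubits
    mcx = np.eye(2 ** total)
    row = 2 ** (n + 1) - 2                # block of the basis states |1...1>|0>, |1...1>|1>
    mcx[row:row + 2, row:row + 2] = X
    # step 3: undo the X gates so the index register is restored
    return pre @ mcx @ pre

j = 5
U = c_j(j)
for i in range(2 ** n):
    for b in (0, 1):
        out = U @ np.eye(2 ** (n + 1))[:, 2 * i + b]
        assert np.argmax(np.abs(out)) == 2 * i + (b ^ (1 if i == j else 0))
print("C_j flips the output bit exactly on |j>")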
Lemma 3.2. Let x ∈ {0, 1}N , Ux a quantum oracle to access x, and kub ≥ 1. If |x| = k ≤ kub ,
then GroverCertaintyMultiple(Ux , kub ) finds, with probability 1, all k indices i such that
xi = 1. The algorithm uses

O(√(N kub))

applications of Ux, and

O(√(N kub) (k + 1) log(N))

additional non-query gates.
Procedure GroverCertaintyMultiple(Ux , kub )
Input: Quantum oracle Ux to access x ∈ {0, 1}N , an integer kub ≥ 1.
Output: Classical list of indices J ⊆ [N ].
Guarantee: If |x| ≤ kub , then for every j ∈ [N ], j ∈ J if and only if xj = 1.
Analysis: Lemma 3.2
1 Jkub ← ∅;
2 UJkub ← Ux ;
3 m ← kub ;
4 while m > 0 do
5 use GroverCertainty(UJm , m) to find a j ∈ [N ] \ Jm ;
6 if xj = 1 then
7 Jm−1 ← Jm ∪ {j};
8 UJm−1 ← Cj UJm , where Cj is defined in Lemma 3.1;
9 else
10 Jm−1 ← Jm ;
11 UJm−1 ← UJm ;
12 end if
13 m ← m − 1;
14 end while
15 return J0 ;
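A line-by-line classical rendering of this procedure (GroverCertainty is replaced by an idealized sampler, the oracle composition with the Cj gates is tracked as a plain Python set, and the names are ours) can make the bookkeeping easier to follow.

import random

def grover_certainty(x, J, m):
    # Stand-in for GroverCertainty on the masked string (x with indices in J zeroed out):
    # guaranteed to return a marked index only when the masked string has exactly m marked entries.
    marked = [i for i, b in enumerate(x) if b == 1 and i not in J]
    return random.choice(marked) if len(marked) == m else random.randrange(len(x))

def grover_certainty_multiple(x, k_ub):
    J = set()
    m = k_ub
    while m > 0:
        j = grover_certainty(x, J, m)
        if x[j] == 1 and j not in J:   # one extra query verifies the candidate
            J.add(j)                   # corresponds to composing the oracle with C_j
        m -= 1
    return sorted(J)

x = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]
print(grover_certainty_multiple(x, k_ub=6))  # [1, 2, 5, 8], since |x| = 4 <= 6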
In total, this procedure uses ∑_{m=1}^{kub} O(√(N/m)) = O(√(N kub)) applications of Ux. The number of auxiliary gates for a single query in the m-th iteration is O(|Jm| · log(N)), and GroverCertainty itself uses an additional O(√(N/m) log(N)) gates. Therefore the total number of gates in the m-th iteration is

O(√(N/m) · |Jm| · log(N) + √(N/m) log(N)) = O(√(N/m) (k + 1) log(N)).

Summing this over all iterations yields a total gate complexity of O(√(N kub) (k + 1) log(N)).
The k-th harmonic number Hk is defined by Hk = ∑_{j=1}^k 1/j; its basic properties are collected in Lemma 3.3, which we use below.
Applying the above tail bound with pi ≥ (2/3)(k − i + 1)/k yields the following lemma.

Lemma 3.6. Let 1 ≤ t ≤ k ≤ N, let I ⊆ [N] be a subset of size k, and let ρ ∈ (0, 1). Consider a procedure in which at each step, with probability ≥ 2/3, one obtains a uniformly random sample from I. The outputs of

r ≥ 3 ln(2) k(Hk − Hk−t) + 2 ln(1/ρ)/ln(3k/(k + 2(t − 1))) =: Rt,k,ρ

repetitions of this procedure suffice to, with probability ≥ 1 − ρ, obtain t distinct samples from I.
We briefly emphasize the value of this lemma. For general t ≤ k, we can use the simple
bound ln(k/(t − 1)) ≥ ln(k/(k − 1)) ≥ 1/k and the estimate Hk − Hk−t ≈ ln(k/(k − t)), to obtain
that r ∈ Ω(k log(k) + k ln(1/ρ)) = Ω(k log(k/ρ)) samples suffice. By contrast, an application of
Markov’s inequality only yields a sample complexity upper bound of k log(k) log(1/ρ). In later
applications (cf. Theorem 3.9), we apply this with t at most k/2, in which case we can give
tighter estimates. Indeed, the factor 1/ln(3k/(k + 2(t − 1))) is then at most a constant and Hk − Hk−t ≤ 2(t + 1)/k by Lemma 3.3, and thus r ∈ Ω(t + ln(1/ρ)) samples suffice. Therefore the
bound is an improvement over the sample complexity of Ω(t ln(1/ρ)) one would obtain from a
simple application of Markov’s inequality – in particular, one can now “for free” choose ρ to be
exponentially small in t (and similar above).
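A quick Monte Carlo sanity check of this bound (purely classical; the "2/3-success uniform sampler" stands in for Grover2/3, and the parameter values are arbitrary):

import math
import random

def repetitions_needed(k, t):
    # draws from a sampler that, with probability 2/3, returns a uniform element of [k],
    # until t distinct elements have been seen
    seen, r = set(), 0
    while len(seen) < t:
        r += 1
        if random.random() < 2 / 3:
            seen.add(random.randrange(k))
    return r

def R(t, k, rho):
    H = lambda m: sum(1 / j for j in range(1, m + 1))
    return 3 * math.log(2) * k * (H(k) - H(k - t)) + 2 * math.log(1 / rho) / math.log(3 * k / (k + 2 * (t - 1)))

k, t, rho, trials = 200, 100, 1e-6, 2000
bound = R(t, k, rho)
failures = sum(repetitions_needed(k, t) > bound for _ in range(trials))
print(f"R_(t,k,rho) = {bound:.1f}, empirical failure rate = {failures / trials} (target <= {rho})")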
By using Grover2/3 to obtain the samples required for Lemma 3.6, we obtain the following
algorithmic result.
Proposition 3.7. Let x ∈ {0, 1}^N with |x| = k unknown, let R ≥ 1, let klb ≥ 1 be such that klb ≤ k, let t ≥ 1, and ρ ∈ (0, 1). Assume 1 ≤ t ≤ k. Then GroverCoupon, called with a quantum oracle Ux to access x and additional inputs R, klb, t, uses O(√(N/klb) r) quantum queries to x and O(√(N/klb) r log(N)) additional quantum gates. Here, r is a random variable with

r ≤ Rt,k,ρ = 3 ln(2) k(Hk − Hk−t) + 2 ln(1/ρ)/ln(3k/(k + 2(t − 1))).

If R ≥ Rt,k,ρ, then with probability ≥ 1 − ρ, it finds a set of t distinct marked elements, distributed uniformly at random among all size-t subsets of the k marked elements.
Proof. We first analyze the complexity of GroverCoupon. Let r ∈ [R] be the number of times
the algorithm repeats line 3 through line 8. By Lemma 2.6, the application of Grover2/3 in line 3
uses O(√(N/klb)) quantum queries and O(√(N/klb) log(N)) additional gates. With one additional query we can verify if the index j ∈ [N] that is returned by Grover2/3 is such that xj = 1. If indeed xj = 1, then we add j to J. As mentioned in Section 2.2, we can insert an element in the sorted list J in (classical) time O(log(N)). We can verify line 7 in time O(log(N)) by maintaining a counter for |J|. The above shows that GroverCoupon indeed uses O(√(N/klb) r) quantum queries and O(√(N/klb) r log(N)) additional quantum gates.
Procedure GroverCoupon(Ux , R, klb , t)
Input: Quantum oracle Ux to access x ∈ {0, 1}N , an integer R ≥ 1, an integer klb such
that klb ≤ |x|, an integer t ≥ 1 such that 1 ≤ t ≤ |x|.
Output: Classical sorted list of indices J ⊆ [N ].
Guarantee: If R ≥ Rt,k,ρ = 3 ln(2) k(Hk − Hk−t) + 2 ln(1/ρ)/ln(3k/(k + 2(t − 1))), then, with probability ≥ 1 − ρ, we have |J| = t and xj = 1 for all j ∈ J.
Analysis: Proposition 3.7
1 J ← ∅;
2 for r = 1, . . . , R do
3 use Grover2/3 with arguments (Ux , klb ) to find a j ∈ [N ] such that xj = 1 with
probability ≥ 2/3 ;
4 if j ̸∈ J and xj = 1 then
5 add j to J;
6 end if
7 if |J| = t then
8 return J ;
9 end if
10 end for
11 return J;
We now establish correctness. By construction r ≤ R with certainty. Lemma 2.6 shows that,
with probability ≥ 2/3, the index returned by Grover2/3 in line 3 is a uniformly random marked
element. Hence Lemma 3.6 shows that after obtaining

Rt,k,ρ = 3 ln(2) k(Hk − Hk−t) + 2 ln(1/ρ)/ln(3k/(k + 2(t − 1)))
such indices, we have obtained t distinct indices with probability at least 1 − ρ. In other words,
if R ≥ Rt,k,ρ , then, with probability at least 1 − ρ, GroverCoupon terminates at line 8 with a
sorted list J ⊆ [N ] of t distinct marked indices.
Proof. The probability that [k] \ S contains a contiguous subset I of length at least ℓ is the same as the probability that it contains a contiguous subset of length exactly ℓ. This is in turn at most the sum, over the k − ℓ + 1 possible starting positions a, of Pr[{a, . . . , a + ℓ − 1} ∩ S = ∅]. By the uniform randomness of S, each of the latter probabilities is the same, and given by

Pr[{a, . . . , a + ℓ − 1} ∩ S = ∅] = (k−ℓ choose t)/(k choose t) = ((k − t)(k − t − 1) · · · (k − t − ℓ + 1))/(k(k − 1) · · · (k − ℓ + 1)) ≤ ((k − t)/k)^ℓ.

We conclude that the probability that [k] \ S contains a contiguous subset I of length at least ℓ is at most (k − ℓ + 1)(1 − t/k)^ℓ.
Theorem 3.9. Let x ∈ {0, 1}^N with |x| = k ≥ 2, and assume one knows kest ≥ 1 such that k/2 ≤ kest ≤ 3k/2. Let 0 < ρ < 1 and 6 ≤ λ ≤ kest be such that t := ⌈kest/λ⌉ ≥ log(6kest/ρ). Then

O(√(Nk) (1 + (1/√λ) log(k/ρλ)))

quantum queries to x suffice to, with probability ≥ 1 − ρ, find all k indices i s.t. xi = 1. The algorithm uses an additional

O(√(Nk) λ log(k/ρ) log(N))

non-query gates.
Corollary 3.10. Let x ∈ {0, 1}N with |x| = k ≥ 2. Assume one knows kest such that k/2 ≤
kest ≤ 3k/2. Let 1 > ρ > 0. Then we can find, with probability ≥ 1 − ρ, all k indices i for which
xi = 1 using either:
• O(√(Nk)) quantum queries and time complexity O(√(Nk) min{log³(k/ρ), k} log(N)), via Theorem 3.9 with λ = min{kest/log(6kest/ρ), log²(kest/ρ)},⁶ or,

• O(√(Nk) log(k/ρ)) quantum queries and time complexity O(√(Nk) log(k/ρ) log(N)), via Theorem 3.9 with λ = 6.

⁶ Strictly speaking, this choice of λ could be smaller than 6, but in that case GroverCertaintyMultiple already has the stated complexity.
Proof of Theorem 3.9. Let t = ⌈kest/λ⌉. Note that because λ ≥ 6 and kest ≤ 3k/2, we have t ≤ k/2. Therefore we can find t of the solutions using the procedure of Proposition 3.7 with probability ≥ 1 − ρ/3, using

O(√(N/k) (t + log(1/ρ))) = O(√(N/k) (k/λ + log(1/ρ)))    (3.2)

queries and

O(√(N/k) (k/λ + log(1/ρ)) log(N))    (3.3)
gates. We remark here that these upper bounds hold because t ≤ k/2 < k. Indeed, under that
assumption on t and k we have k(Hk − Hk−t ) ≤ 2(t + 1) by Lemma 3.3, and moreover the factor
1/ln(3k/(k + 2(t − 1))) is Θ(1) (it lies between 1/ln(3) and 1/ln(3/2)). This shows that calling GroverCoupon with R = 6 ln(2)(t + 1) + 2 ln(1/ρ)/ln(3/2) ∈ Θ(t + log(1/ρ)) has the desired behaviour.
Let a1 < a2 < · · · < at denote the found indices for which xaj = 1 and define the intervals I0 = {1, . . . , a1 − 1}, It = {at + 1, . . . , N}, and, for j ∈ [t − 1], Ij = {aj + 1, . . . , aj+1 − 1}. We use kj to denote the (unknown) number of marked elements in Ij, so in particular ∑_{j=0}^t kj ≤ k − t. Then by Lemma 3.8, the probability that there is a kj larger than ℓ := (k/t)(log(k) + log(3/ρ)) is at most

(k − ℓ + 1)(1 − t/k)^ℓ ≤ 2^{log(k)} (1 − t/k)^{(k/t)(log(k)+log(3/ρ))} ≤ 2^{log(k)} (1/2)^{log(k)+log(3/ρ)} = ρ/3.
Here we used that ℓ ≥ 1, (1 − t/k)^{k/t} ≤ 1/e ≤ 1/2, and log(k) + log(3/ρ) ≥ 0.⁷ For the rest of the argument we may thus assume that there is no interval with more than ℓ not-yet-found marked elements.

⁷ Note also that ℓ ≤ k because t ≥ log(6kest/ρ) ≥ log(3k/ρ) by assumption; if ℓ > k, then the probability of having an interval of length ≥ ℓ is of course 0, and in this regime one may just as well run GroverCertaintyMultiple on the whole string (and have zero failure probability).
In the next step of our algorithm we search for all marked elements in each interval. To
do so for the jth interval, we search over the elements from [2⌈log(|Ij |)⌉ ] marking an element
i ∈ [2⌈log(|Ij |)⌉ ] if xi+aj = 1 and i ≤ |Ij | (letting a0 = 0). One can implement this unitary using
O(1) quantum queries and O(log(N )) gates (to implement the addition and comparison). For
each interval, we first compute an estimate (kj )est of kj that satisfies kj /2 ≤ (kj )est ≤ 3kj /2
using Corollary 2.8, with success probability ≥ 1 − ρ/(3(t + 1)). The associated query cost is O(√(|Ij|/(kj + 1)) log(t/ρ)), and it uses O(√(|Ij|/(kj + 1)) log(t/ρ) log(N)) additional gates. Then Lemma 3.2 shows that we can find all marked elements in the j-th interval with probability 1 using O(√(|Ij| (kj)est)) quantum queries and O(√(|Ij|) (kj)est^{3/2} log(N)) additional gates. By a union
bound, with probability ≥ 1 − ρ/3, all (kj )est are correct, and this step has a total query
complexity of

O(∑_{j=0}^t √(|Ij| kj) + ∑_{j=0}^t √(|Ij|) log(t/ρ)) = O(√(Nk) + √(Nt) log(t/ρ)) = O(√(Nk) (1 + log(k/ρλ)/√λ)),    (3.4)

where the first step uses Cauchy–Schwarz for both terms (reading √(|Ij|) as √(|Ij| · 1) for the second term) and ∑_{j=0}^t |Ij| ≤ N, ∑_{j=0}^t kj ≤ k. To analyze the gate complexity of this step, we first bound ∑_{j=0}^t kj³. We have ∥k²∥∞ ≤ ℓ² = O(λ² log²(3k/ρ)), where k is the vector with entries kj and k² is the entrywise square of k. As we also have ∥k∥1 ≤ k, we get ∑_{j=0}^t kj³ = ⟨k, k²⟩ ≤ ∥k∥1 ∥k²∥∞ = O(kℓ²). Then the gate complexity of the final search steps becomes:
O((∑_{j=0}^t √(|Ij| kj³) + ∑_{j=0}^t √(|Ij|) log(t/ρ)) log(N))
= O(√N (√(∑_{j=0}^t kj³) + √t log(t/ρ)) log(N))
= O(√N (√k (ℓ + 1) + √t log(t/ρ)) log(N))
= O(√N (√k λ log(k/ρ) + √(k/λ) log(k/ρλ)) log(N))
= O(√(Nk) (λ log(k/ρ) + (1/√λ) log(k/ρλ)) log(N))
= O(√(Nk) λ log(k/ρ) log(N)),    (3.5)

where we again used Cauchy–Schwarz in the first step, and the total error probability is bounded by ρ/3 + ρ/3 + (t + 1) · ρ/(3(t + 1)) = ρ.
To conclude, the upper bound on the total query complexity follows by combining Eqs. (3.2)
and (3.4):

O( √(N/k) (k/λ + log(1/ρ)) + √(Nk) (1 + (1/√λ) log(k/ρλ)) )
= O( √(Nk) (1 + (1/√λ) log(k/ρλ)) + √(N/k) log(1/ρ) )
= O( √(Nk) (1 + (1/√λ) log(k/ρλ)) ),

where the first term in the first line accounts for sampling t elements and the second for finding the remaining elements. Here the first equality uses that √(N/k) · (k/λ) ≤ √(Nk) since λ ≥ 1. The second equality follows since log(1/ρ) ≤ log(6kest/ρ) and, by assumption, log(6kest/ρ) ≤ ⌈kest/λ⌉ = t ≤ k. A similar argument
using Eqs. (3.3) and (3.5) and λ ≥ 1 establishes the desired gate complexity:

O( √(N/k) (k/λ + log(1/ρ)) log(N) + √(Nk) λ log(k/ρ) log(N) )
= O( √(Nk) (1/λ + λ log(k/ρ)) log(N) + √(N/k) log(1/ρ) log(N) )
= O( √(Nk) λ log(k/ρ) log(N) ).
O(√((N/δ) log(1/ρ)))    (4.1)
quantum queries and a similar gate complexity (with only a polylogarithmic overhead). In the
above (4.1) we have made very mild assumptions on the value of ρ and δ; a precise statement is
given in Theorem 4.3. The algorithm is given in ApproxSum. By slightly perturbing the entries
of v, we may assume without loss of generality that all entries of v are distinct; we shall make
this assumption throughout this section, and have made this assumption in the description of
the algorithm as well.
We briefly explain the overall strategy. Recall from the proof of Theorem 2.7 that it is useful
to preprocess the vector v by using quantum maximum finding to find vmax = maxi∈[N ] vi , and
then to use amplitude estimation on the vector w = v/vmax . We take this approach slightly
further: we first find the largest k entries z1 , . . . , zk of v, where k = Θ(pN ) for p ∈ (0, 1), and
sum their values classically. Let z̃ be the smallest value among the z1 , . . . , zk .8 For the next part,
we treat the corresponding entries of v as zero: checking whether one exceeds the threshold z̃ is
a binary comparison, hence can be done in superposition without explicitly using their indices,
⁸ We actually first compute a good value of z̃ using a quantile estimation subroutine [Ham21, Thm. 3.4] and then find all the zj's. Alternatively, one could use [DHHM06, Thm. 3.4] to find all Θ(pN) largest elements directly, but our approach has the advantage of being able to use the better ρ-dependence of our version of Grover search.
Procedure ApproxSum(Uv , δ, p, λ, ρ)
Input: Quantum query access Uv to (0, b)-fixed point representations of v ∈ [0, 1]N ,
δ ∈ (0, 1), p ∈ (0, 1), λ ≥ 6, failure probability ρ > 0.
Output: A real number s̃.
Guarantee: With probability ≥ 1 − ρ, s̃ is a multiplicative δ-approximation of s.
Analysis: Theorem 4.3
1 use Theorem 4.2 to compute z̃ ∈ [0, 1] such that with probability ≥ 1 − ρ/4,
Q(p) ≤ z̃ ≤ Q(cp), where c < 1 is a universal constant and Q is defined in Eq. (4.2);
2 let x ∈ {0, 1}N be defined by xi = 1 if vi ≥ z̃ and xi = 0 otherwise;
3 let Ux implement quantum query access to x by applying Uv , comparing to z̃, and
uncomputing Uv ;
4 compute estimate kest of k = |x| satisfying k/2 ≤ kest ≤ 3k/2 with probability ≥ 1 − ρ/4 using Corollary 2.8;
5 use GroverMultipleFast(Ux , kest , ρ/4, λ) to find all indices i1 , . . . , ik such that xij = 1;
6 if z̃ = 0 then
7     return ∑_{j=1}^k vij ;
8 else
9     construct unitary Uw for query access to w ∈ [0, 1]^N where wi = 0 if vi ≥ z̃ and wi = vi/z̃ otherwise;
10    let U be a unitary such that U |0⟩ = |ψ⟩ given by

          |ψ⟩ = (1/√N) ∑_i |i⟩ (√(w̃i) |1⟩ + √(1 − w̃i) |0⟩),

      where αi is a ⌈log(4N/δ)⌉-bit approximation of arcsin(√(wi)), and w̃i = sin²(αi);
11    use AmpEst(U, M) with M = ⌈12π/√(δ²pc)⌉, increased to the next power of 2 if necessary, with c < 1 from Theorem 4.2, to compute ã ≈ ∑_i w̃i/N; repeat O(log(1/ρ)) times and take the median of the outputs to achieve success probability ≥ 1 − ρ/4;
12    return ∑_{j=1}^k vij + N z̃ ã;
13 end if
and so with one query to v we can implement quantum oracle access to the vector w ∈ [0, 1]^N defined by wi = vi/z̃ if vi < z̃, and wi = 0 otherwise. This has the effect of amplifying the small elements in v at no extra cost. We then use amplitude estimation to compute ∑_{i=1}^N wi with additive precision O(δs/z̃) (without knowing s). This yields an additive δs-approximation of ∑_{i=1}^N vi (i.e., a multiplicative δ-approximation), where we use that

∑_{i=1}^N vi = ∑_{i=1}^k zi + z̃ ∑_{i=1}^N wi.
To balance the costs of these two stages we need to carefully choose z̃. We do so by estimating the
p-th quantile of the vector. We first give an algorithm ApproxSum whose complexity depends on
the quantile p and then give a suitable choice for p that allows us to obtain (4.1), see Theorem 4.3
and Corollary 4.4.
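A classical mock-up of this split (the quantile search, the Grover stage, and amplitude estimation are all replaced by exact classical computations; it only illustrates the decomposition and the rescaling by z̃, not the quantum complexity):

import numpy as np

rng = np.random.default_rng(1)
N = 1000
v = rng.beta(0.3, 3.0, size=N)                 # entries in [0, 1], a few of them large

p = 0.02
z_tilde = np.quantile(v, 1 - p)                # classical stand-in for the quantile estimate

large = v[v >= z_tilde]                        # found exactly by GroverMultipleFast in the paper
w = np.where(v >= z_tilde, 0.0, v / z_tilde)   # small entries, amplified by a factor 1/z_tilde

w_bar = w.mean()                               # the quantity amplitude estimation approximates
s_tilde = large.sum() + N * z_tilde * w_bar

assert np.isclose(s_tilde, v.sum())            # exact here; quantumly only delta-approximate
print(s_tilde, v.sum())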
We use the following lemma to derive a bound on the required precision for certain arithmetic
operations.
Lemma 4.1 ([BHMT02, Lem. 7]). If a = sin²(θa) and ã = sin²(θ̃a) for θa, θ̃a ∈ [0, 2π], then |ã − a| ≤ 2|θ̃a − θa| √(a(1 − a)) + |θ̃a − θa|².
For the quantile estimation, we use a subroutine from [Ham21]. Let v ∈ [0, 1]^N. Then for p ∈ (0, 1), we define the p-quantile Q(p) ∈ [0, 1] by

Q(p) = max{ z ∈ [0, 1] : |{ i ∈ [N] : vi > z }| ≥ pN }.    (4.2)

In words, Q(p) is the largest value z ∈ [0, 1] such that there are at least pN entries of v which are larger than z. The subroutine we invoke allows one to produce an estimate for Q(p), in the following sense:
Theorem 4.2 ([Ham21, Thm. 3.4]). There exists a universal constant c ∈ (0, 1) such that
the following holds: Let v ∈ [0, 1]N and let Uv be a unitary implementing quantum oracle
access to v. Then O(log(1/ρ)/√p) applications of controlled-Uv and controlled-Uv† suffice to find, with probability ≥ 1 − ρ, a value z̃ such that Q(p) ≤ z̃ ≤ Q(cp). The algorithm uses an additional O((log(1/ρ)/√p) b log(b) log(N)) gates.
The actual access model for which the above theorem holds is more general, but we have
instantiated it for our setting. The gate complexity overhead follows from having to implement
their access model from ours, which involves arithmetic and comparisons on the fixed point
representations we use, and the fact that the underlying technique is amplitude amplification.
We now get to the main theorem of this section, which proves the correctness of ApproxSum and
analyzes its complexity.
Theorem 4.3. Let v ∈ [0, 1]^N, let Uv be a unitary implementing quantum query access to (0, b)-fixed point representations of v, and let δ ∈ (0, 1). Let p, ρ ∈ (0, 1) and choose 6 ≤ λ ≤ min{cpN/log(pN/ρ), log(cpN/ρ)²}. Then ApproxSum computes, with probability ≥ 1 − ρ, a multiplicative δ-approximation of s = ∑_{i=1}^N vi. It uses

O( (log(1/ρ))/√p + √(N/(Np + 1)) log(1/ρ) + √(Np) (1 + (1/√λ) log(Np/(λρ))) + (1/(δ√p)) log(1/ρ) )

quantum queries, and the number of additional gates is bounded by

O( (log(1/ρ)/√p) b log(b) log(N) + √(N/(Np + 1)) log(1/ρ) log(N) + √(Np) λ log(pN/ρ) log(N) + (1/(δ√p)) b log(b) log(N/δ) log² log(N/δ) log(1/ρ) ).
Before we give the proof, we discuss two useful regimes for p and λ:

Corollary 4.4. Let v ∈ [0, 1]^N, let Uv be a unitary implementing quantum oracle access to (0, b)-fixed point representations of v, and let δ ∈ (0, 1). Then we can find, with probability ≥ 1 − ρ, a multiplicative δ-approximation of s = ∑_{i=1}^N vi, using:

additional gates, or

• O(√(N/δ) log(1/ρ)) quantum queries when p = Θ(1/(δN)) < 1 and we choose λ = 6, and
Proof of Theorem 4.3. We assume without loss of generality that all the entries of v are distinct. If this is not the case, one can perturb the i-th entry of v by i·2^{−ℓ} for some sufficiently large ℓ = Ω(log(N) + b), where we recall that b is the number of bits describing vi, and discard these trailing bits from the output value s̃.

We use Theorem 4.2 to find a value z̃ such that the number of elements of v that are at least as large as z̃ is at most pN and at least cpN. The number of quantum queries is O(log(1/ρ)/√p), and the number of additional gates is O((log(1/ρ)/√p) b log(b) log(N)).

Let k = |{i ∈ [N] : vi ≥ z̃}|. By the assumption that the vi are all distinct, cpN ≤ k ≤ pN. We next compute a multiplicative ½-approximation of k using Corollary 2.8. This uses O(√(N/(k + 1)) log(1/ρ)) quantum queries and O(√(N/(k + 1)) log(1/ρ) log(N)) additional gates. The next step is to find all k such elements using GroverMultipleFast (Theorem 3.9). This uses

O(√(Nk) (1 + (1/√λ) log(k/(λρ))))

quantum queries and

O(√(Nk) λ log(k/ρ) log(N))

additional gates.
Let z1, . . . , zk be the entries of v that are ≥ z̃. Then

∑_{i=1}^N vi = ∑_{j=1}^k zj + z̃ ∑_{i=1}^N wi,

where wi = vi/z̃ if vi < z̃, and wi = 0 otherwise.
As we have found all the zj's, we can compute their sum exactly; therefore, to determine a multiplicative δ-approximation of s, we must produce an additive δs-approximation of z̃ ∑_{i=1}^N wi. Let ε := δs; note that we do not know ε as we do not know s. Then we have to approximate (1/N) ∑_{i=1}^N wi with precision ε/(N z̃). For this, we use amplitude estimation as follows. First, one can implement query access to Uw by using two quantum queries to v and O(b log(b)) non-query gates, by querying an entry, comparing the entry to z̃, and conditional on the comparison
uncomputing the query, and lastly performing the division by z̃. From this, we can construct a
unitary U with U |0⟩ = |ψ⟩ satisfying
|ψ⟩ = (1/√N) ∑_i |i⟩ (√(w̃i) |1⟩ + √(1 − w̃i) |0⟩),
where w̃i is close to wi . One can implement such a unitary as follows. First, set up a uniform
superposition over the index register using O(log(N )) gates. Use Uw to load binary descrip-
tions of the entries of w. Calculate a ⌈log₂(4N/δ)⌉-bit approximation αi of arcsin(√(wi)) using O(log(bN/δ) log² log(bN/δ)) gates [BZ10, Ch. 4]. Then conditionally rotate the last qubit from 0 to 1 over angles π/4, π/8, et cetera, depending on the bits of αi. Lastly, we uncompute αi and Uw to return work registers to the zero state, and we have obtained the desired state |ψ⟩, where √(w̃i) = sin(αi). We now show that w̃i = sin²(αi) is close to wi, and hence
a := (1/N) ∑_{i=1}^N w̃i = ∥ |ψ1⟩ ∥²

is close to (1/N) ∑_{i=1}^N wi. Lemma 4.1 shows that if |αi − arcsin(√(wi))| ≤ ξ, then

|w̃i − wi| = |sin²(αi) − wi| ≤ 2ξ √(wi(1 − wi)) + ξ² ≤ ξ + ξ².
Since αi is a ⌈log₂(4N/δ)⌉-bit approximation of arcsin(√(wi)), we may apply the above with ξ = δ/(4N) for every i ∈ [N]. Because s ≥ z̃, δ = ε/s ≤ ε/z̃, and δ ≤ 1, so the total error satisfies

|a − (1/N) ∑_{i=1}^N wi| ≤ (1/N) ∑_{i=1}^N |w̃i − wi| ≤ ξ + ξ² ≤ δ/(4N) + δ²/(16N²) ≤ ε/(2N z̃).
Therefore, using AmpEst with M rounds we obtain, by Lemma 2.3, an estimate ã with

|ã − a| ≤ 2π √(a(1 − a))/M + π²/M² ≤ 2π √(2s/(N z̃))/M + π²/M²,

where the last inequality uses ε = δs ≤ s and ∑_{i: vi < z̃} vi ≤ s. We now determine an appropriate number of rounds M to be used for amplitude estimation. We will choose M such that |ã − a| ≤ (1/2) ε/(N z̃); if we do so, then by the triangle inequality |ã − (1/N) ∑_{i=1}^N wi| ≤ ε/(N z̃). The claim is that any M ≥ 12π √(N z̃/(εδ)) suffices, as then

2π √(2s/(N z̃))/M ≤ 2π √(2s/(N z̃)) / (12π √(N z̃/(εδ))) = (√2/6) · ε/(N z̃) ≤ (1/4) · ε/(N z̃),
and, using δ ≤ 1,

π²/M² ≤ εδ/(144 N z̃) ≤ (1/4) ε/(N z̃).
Even though we do not know ε, by choosing p carefully, we can enforce upper bounds on z̃ and give a safe choice for M. We use that the number of entries k which are at least z̃ satisfies k ≥ cpN, so that

cpN z̃ ≤ ∑_{j: vj ≥ z̃} vj ≤ s,

i.e., z̃ ≤ s/(cpN). Therefore it suffices to take M = 12π/√(δ²pc), as this satisfies

M = 12π √(1/(δ²pc)) = 12π √(s/(δεpc)) ≥ 12π √(N z̃/(δε)).
This guarantees that |ã − (1/N) ∑_{i=1}^N wi| ≤ ε/(N z̃), and the output value s̃ = ∑_{j=1}^k zj + N z̃ ã satisfies |s̃ − s| ≤ ε = δs.

The number of quantum queries used for this step is therefore O(M) = O(1/(δ√p)), and the number of additional gates used is O(M b log(b) log(N/δ) log² log(N/δ)). To amplify the success
probability to 1 − ρ, we repeat the above procedure log(1/ρ) many times and output the median
of the individual estimates. The query- and gate complexity of the entire algorithm follow by
combining those of the four parts: the quantile estimation, the approximate counting, Grover
search for finding all large elements, and amplitude estimation for approximating the sum of the
small elements.
Acknowledgements
We would like to thank Ronald de Wolf for helpful discussions and comments on an early ver-
sion of this work, and Yassine Hamoudi for helpful discussion regarding [Ham21]. We also
thank anonymous referees for their feedback. HN acknowledges support by the Dutch Research
Council (NWO grant OCENW.KLEIN.267), by the European Research Council (ERC) through
ERC Starting Grant 101040907-SYMOPTIC and ERC Grant Agreement No. 81876432, and
by VILLUM FONDEN via the QMATH Centre of Excellence (Grant No. 10059). JvA was
supported by the Dutch Research Council (NWO/OCW), as part of QSC (024.003.037) and by
QuantumDelta NL.
References
[AdW17] Srinivasan Arunachalam and Ronald de Wolf. Optimizing the number of gates in
quantum search. Quantum Info. Comput., 17(3-4):251–261, 2017. doi:10.26421/
qic17.3-4.
[AJ06] José A. Adell and P. Jodrá. Exact Kolmogorov and total variation distances be-
tween some familiar discrete distributions. Journal of Inequalities and Applications,
2006(1):1–8, 2006. doi:10.1155/JIA/2006/64307.
[vAGL+ 21] Joran van Apeldoorn, Sander Gribling, Yinan Li, Harold Nieuwboer, Michael
Walter, and Ronald de Wolf. Quantum algorithms for matrix scaling and ma-
trix balancing. In Proceedings of 48th International Colloquium on Automata,
Languages, and Programming (ICALP’21), volume 198, pages 110:1–110:17, 2021.
arXiv:2011.12823, doi:10.4230/LIPIcs.ICALP.2021.110.
[AR20] Scott Aaronson and Patrick Rall. Quantum approximate counting, simplified.
In Symposium on Simplicity in Algorithms, pages 24–32, 2020. doi:10.1137/1.
9781611976014.5.
[BBHT98] Michel Boyer, Gilles Brassard, Peter Høyer, and Alain Tapp. Tight bounds on
quantum searching. Fortschritte der Physik, 46(4–5):493–505, 1998. Earlier version
in Physcomp’96. arXiv:quant-ph/9605034.
[BCdWZ99] Harry Buhrman, Richard Cleve, Ronald de Wolf, and Christof Zalka. Bounds
for small-error and zero-error quantum algorithms. In 40th Annual Symposium
on Foundations of Computer Science (FOCS’99), pages 358–368. IEEE Computer
Society, 1999.
[BHMT02] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude
amplification and estimation. In Quantum Computation and Quantum Information:
A Millennium Volume, volume 305 of Contemporary Mathematics, pages 53–74.
American Mathematical Society, 2002. doi:10.1002/(SICI)1521-3978(199806)
46:4/5<493::AID-PROP493>3.0.CO;2-P.
[BZ10] Richard Brent and Paul Zimmermann. Modern Computer Arithmetic, volume 18.
Cambridge University Press, 2010.
[CEG95] Ran Canetti, Guy Even, and Oded Goldreich. Lower bounds for sampling algorithms
for estimating the average. Information Processing Letters, 53(1):17–25, January
1995. doi:10.1016/0020-0190(94)00171-T.
[CHI+ 18] Carlo Ciliberto, Mark Herbster, Alessandro Davide Ialongo, Massimiliano Pontil,
Andrea Rocchetto, Simone Severini, and Leonard Wossnig. Quantum machine
learning: a classical perspective. Proceedings of the Royal Society A: Mathe-
matical, Physical and Engineering Sciences, 474(2209):20170551, jan 2018. doi:
10.1098/rspa.2017.0551.
[CLRS22] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
Introduction to Algorithms. MIT Press, 4th edition, 2022.
[DF80] P. Diaconis and D. Freedman. Finite Exchangeable Sequences. The Annals of
Probability, 8(4):745–764, 1980. URL: https://www.jstor.org/stable/2242823.
[DH96] Christoph Dürr and Peter Høyer. A quantum algorithm for finding the minimum,
1996. doi:10.48550/arXiv.quant-ph/9607014.
[DHHM06] Christoph Dürr, Mark Heiligman, Peter Høyer, and Mehdi Mhalla. Quantum Query
Complexity of Some Graph Problems. SIAM Journal on Computing, 35(6):1310–
1328, January 2006. doi:10.1137/050644719.
[DKLR00] Paul Dagum, Richard Karp, Michael Luby, and Sheldon Ross. An Optimal Algo-
rithm for Monte Carlo Estimation. SIAM Journal on Computing, 29(5):1484–1496,
January 2000. doi:10.1137/S0097539797315306.
[GLM08] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access
memory. Physical Review Letters, 100(16), apr 2008. doi:10.1103/physrevlett.
100.160501.
[GN22] Sander Gribling and Harold Nieuwboer. Improved quantum lower and upper bounds
for matrix scaling. In Proceedings of 39th International Symposium on Theoreti-
cal Aspects of Computer Science (STACS’22), volume 219, pages 35:1–35:23, 2022.
arXiv:2109.15282, doi:10.4230/LIPIcs.STACS.2022.35.
[dGdW02] Mart de Graaf and Ronald de Wolf. On Quantum Versions of the Yao Principle. In
19th Symposium on Theoretical Aspects of Computer Science (STACS’02), volume
2285 of Lecture Notes in Computer Science, pages 347–358, Berlin, Heidelberg, 2002.
Springer. doi:10.1007/3-540-45841-7_28.
[Gro96] Lov K. Grover. A fast quantum mechanical algorithm for database search. In
Proceedings of 38th Annual ACM Symposium on Theory of Computing (STOC’96),
pages 212–219, 1996. arXiv:quant-ph/9605043, doi:10.1145/237814.237866.
[Gro97] Lov K. Grover. Quantum telecomputation, 1997. Bell Labs Technical Memorandum
ITD97-31630F. doi:10.48550/arXiv.quant-ph/9704012.
[Gro98] Lov K. Grover. A framework for fast quantum mechanical algorithms. In Proceedings
of the Thirtieth Annual ACM Symposium on the Theory of Computing (STOC’98),
pages 53–62, 1998. arXiv:quant-ph/9711043, doi:10.1145/276698.276712.
[Ham21] Yassine Hamoudi. Quantum Sub-Gaussian Mean Estimator. In 29th Annual Eu-
ropean Symposium on Algorithms (ESA 2021), volume 204 of Leibniz International
Proceedings in Informatics (LIPIcs), pages 50:1–50:17, 2021. doi:10.4230/LIPIcs.
ESA.2021.50.
[Jan18] Svante Janson. Tail bounds for sums of geometric and exponential variables. Statis-
tics & Probability Letters, 135:1–6, 2018. doi:10.1016/j.spl.2017.11.017.
[Knu98] Donald Ervin Knuth. The Art of Computer Programming, volume III. Addison-
Wesley, 2nd edition, 1998. URL: https://www.worldcat.org/oclc/312994415.
[KO23] Robin Kothari and Ryan O’Donnell. Mean estimation when you have the source
code; or, quantum Monte Carlo methods. In Proceedings of the 2023 Annual ACM-
SIAM Symposium on Discrete Algorithms (SODA’23), pages 1186–1215, 2023. doi:
10.1137/1.9781611977554.ch44.
[NC02] Michael A. Nielsen and Isaac L. Chuang. Quantum computation and quantum in-
formation. Cambridge University Press, 2002.
[NW99] Ashwin Nayak and Felix Wu. The quantum query complexity of approximat-
ing the median and related statistics. In Proceedings of the 31st Annual ACM
SIGACT Symposium on Theory of Computing (STOC’99), pages 384–393, 1999.
arXiv:quant-ph/9804066, doi:10.1145/301250.301349.
[Roo01] B. Roos. Binomial Approximation to the Poisson Binomial Distribution: The
Krawtchouk Expansion. Theory of Probability & Its Applications, 45(2):258–272,
2001. doi:10.1137/S0040585X9797821X.
[You91] Robert M. Young. 75.9 Euler’s Constant. The Mathematical Gazette, 75(472):187–
190, 1991. doi:10.2307/3620251.
Theorem A.1. There exists a universal constant C > 0 such that the following holds. Let δ > 0
be such that kδ is an integer and k(1 + δ) ≤ N . Suppose A is a randomized query algorithm such
that for all x ∈ {0, 1}N with |x| ∈ {k, k(1 + δ)},
Pr_{r ∼ Unif({0,1}^R)}[A(x, r) = |x|] ≥ 5/6,
where R is the number of used random bits, and Unif({0, 1}R ) refers to the uniform distribution
on them. Assume that A makes t ≥ 0 queries, independent of the input x, or the randomness r
used by the algorithm. Then
t ≥ C min{N, N/(δ² |x|)}.
At a high level, our proof boils down to showing that if a t-query algorithm A succeeds with
high probability, then the total variation distance between Hyp(N, k, t) and Hyp(N, k(1 + δ), t)
must be Ω(1). Here Hyp(N, ℓ, t) is the distribution on the number of observed marked elements
if one draws t elements from a set of size N of which ℓ elements are marked, without replace-
ment. The lower bound on t then follows from a t-dependent upper bound on this total variation
distance, for which we state the necessary ingredients from the literature first. We let Bin(t, p)
denote the binomial distribution with parameters t ≥ 1 and p ∈ (0, 1), corresponding to t indepen-
dent Bernoulli trials, each of which succeeds with probability p. First, we state the following
bound [DF80, Thm. 3] which shows that when the number of samples t is small compared to N ,
sampling with and without replacement yield approximately the same distribution.
Theorem A.2. The total variation distance between Hyp(N, ℓ, t) and Bin(t, ℓ/N ) is at most 4t/N .
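A quick numerical illustration of this bound using scipy (our own check, not part of the paper): the total variation distance is computed by summing absolute differences of the two probability mass functions.

import numpy as np
from scipy.stats import hypergeom, binom

def tv_hyp_bin(N, ell, t):
    ks = np.arange(t + 1)
    hyp = hypergeom(N, ell, t).pmf(ks)   # t draws without replacement, ell marked out of N
    bn = binom(t, ell / N).pmf(ks)       # t draws with replacement
    return 0.5 * np.abs(hyp - bn).sum()

N, ell = 10_000, 300
for t in (10, 100, 500):
    print(t, tv_hyp_bin(N, ell, t), "4t/N =", 4 * t / N)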
Next, we use the following estimate on the total variation distance between two binomial
distributions with the same t, but distinct success probability [Roo01, AJ06]:
Theorem A.3. Let t ≥ 1, p ∈ (0, 1) and δ ∈ (0, 1) such that p(1 + δ) < 1. The total variation
distance between Bin(t, p) and Bin(t, p(1 + δ)) is at most
(√e/2) · τ/(1 − τ)²,
Proof. This follows from [Roo01, eq. (15)]. We apply their bound on the distance to Bin(t, p) with the following parameters: s = 0, n is our t, the random variable Sn is the sum of t Bernoulli random variables with success probability p + x with x = δp, hence its distribution P^{Sn} is Bin(t, p + x), and

γ1(p) = tx,  γ2(p) = tx²,  η(p) = 2tx² + t²x²,  θ(p) = (2tx² + t²x²)/(2tp(1 − p)).

The upper bound given is then dTV(P^{Sn}, Bin(t, p)) ≤ √e √(θ(p)) / (2(1 − √(θ(p)))²). The claimed bound in the theorem then follows from √(θ(p)) = τ, using x = pδ.
Via the triangle inequality, the above two theorems suffice to upper bound the total variation
distance between Hyp(N, k, t) and Hyp(N, k(1 + δ), t) (for a precise statement, see the proof
below). We are now ready to prove Theorem A.1.
Proof of Theorem A.1. For an integer ℓ, let Xℓ ⊆ {0, 1}N be the set of bit strings with Hamming
weight ℓ. We use Yao’s minimax principle to lower bound the number of queries required by a
randomized algorithm that outputs |x| with probability ≥ 5/6 on every input x ∈ Xk ∪ Xk(1+δ) .
That is, we exhibit a distribution D on Xk ∪ Xk(1+δ) for which every deterministic algorithm
that computes |x| on a 5/6 fraction of the inputs, weighted according to D, requires at least
C min{N, N/(δ²k)} queries for some universal constant C. Consider the distribution D on inputs
that with probability 1/2 samples a uniformly random element from Xk , and with probability
1/2 samples a uniformly random element from Xk(1+δ) . Suppose A is a deterministic t-query
algorithm that on input x ∼ D correctly returns |x| with probability at least 5/6 (where the
probability is over the sample from D). Note that we allow A to know k and δ. We show the
desired lower bound on t. Let A(x) = a denote the substring (xi1 , . . . , xit ) of x that corresponds
to the t queried indices i1 , . . . , it . Note that the output A(x) ∈ {k, k(1 + δ)} of the algorithm is
deterministic and only a function of a (for a fixed algorithm). If one thinks of A as a decision
tree, then the first index to be queried does not depend on a, and after every subsequent query,
the next index to be queried is deterministic as a function of the previous queried indices
and outcomes. It is also the case that the queried indices i1 , . . . , it are a function of the query
outcomes a! Therefore, we may view A(x) as a just a function of a = A(x). Let B ⊂ Xk ∪Xk(1+δ)
be the set of a ∈ {0, 1}t on which A(a) outputs k(1 + δ).
Let Pℓ be the distribution on a = A(x) ∈ {0, 1}t induced by x ∼ Unif(Xℓ ), where the latter
refers to the uniform distribution on Xℓ . By assumption, A can distinguish (with constant
success probability) the distributions Pk and Pk(1+δ) . Therefore, the total variation distance
between these distributions is large. Indeed, the probability, with respect to D, that A outputs
the wrong value of |x| is at most 1/6, therefore A fails with probability at most 1/3 when
x ∼ Unif(Xℓ) for both ℓ = k and ℓ = k(1 + δ), and hence

1/3 + 1/3 ≥ Pr_{x ∈R Xk(1+δ)}[A(x) = k] + Pr_{x ∈R Xk}[A(x) = k(1 + δ)].
We prove this by exploiting the permutation symmetry of the distribution on Xℓ , along with
an iterative conditioning argument. Let i1 , . . . , it denote the sequence of indices of x queried,
so that A(x) = (xi1 , . . . , xit ). Recall that the i1 , . . . , it may be chosen adaptively, but ij+1 is
determined completely from i1 , . . . , ij and xi1 , . . . , xij . Then
Pr_{x ∈ Xℓ}[xi1 = 1] = ℓ/N.

Moreover, one has

Pr_{x ∈ Xℓ}[xij+1 = 1 | xi1, . . . , xij] = (ℓ − ∑_{s=1}^j xis)/(N − j),
because conditioned on the values of x at the indices i1, . . . , ij, the distribution of x becomes uniform among bit strings of Hamming weight ℓ − ∑_{s=1}^j xis of length N − j in the remaining positions. Finally,

Pr_{x ∈ Xℓ}[A(x) = a] = ∏_{j=1}^t Pr_{x ∈ Xℓ}[xij = aj | xi1 = a1, . . . , xij−1 = aj−1].
Now because

Pr[Wℓ = |a|] = (ℓ choose |a|)(N−ℓ choose t−|a|) / (N choose t),

we see that we indeed have

Pr_{x ∈ Xℓ}[A(x) = a] = (1/(t choose |a|)) Pr[Wℓ = |a|].
Therefore we obtain

dTV(Pk, Pk(1+δ)) = (1/2) ∑_{a ∈ {0,1}^t} |Pk(a) − Pk(1+δ)(a)|
= (1/2) ∑_{a ∈ {0,1}^t} (1/(t choose |a|)) |Pr[Wk = |a|] − Pr[Wk(1+δ) = |a|]|
= (1/2) ∑_{s=0}^t |Pr[Wk = s] − Pr[Wk(1+δ) = s]|
= dTV(Hyp(N, k, t), Hyp(N, k(1 + δ), t)).
We now give a t-dependent upper bound on this total variation distance; combined with the assumption that dTV(Pk, Pk(1+δ)) ≥ 1/3, this will lead to the right lower bound on t. Let

τ = (δk/N) √( (t + 2) / (2 (k/N)(1 − k/N)) ) = δ √( k(t + 2) / (2(N − k)) ).

If τ ≥ 1/2, then

t + 2 ≥ (N − k)/(2δ²k),

and so in this case t = Ω(N/(δ²k)) (unless the latter is O(1), in which case the query lower
bound we aim for is constant and uninteresting). Otherwise, by the triangle inequality,
dTV(Hyp(N, k, t), Hyp(N, k(1 + δ), t)) ≤ dTV(Hyp(N, k, t), Bin(t, k/N))
+ dTV(Bin(t, k/N), Bin(t, k(1 + δ)/N))
+ dTV(Bin(t, k(1 + δ)/N), Hyp(N, k(1 + δ), t))
≤ 8t/N + (√e/2) · τ/(1 − τ)²
≤ 8t/N + 2τ√e,
where we applied Theorems A.2 and A.3 (note that we needed τ < 1). Since the left-hand side is at least 1/3 as shown before, we have

1/3 ≤ 8t/N + 2τ√e,

so at least one of the two terms must be 1/6 or greater. If 8t/N ≥ 1/6, then t ≥ N/48 and we are done; otherwise, τ ≥ 1/24 and so

t + 2 ≥ 2(N − k)/(24² δ²k).

If k ≤ N/2, then N − k ≥ N/2 and one deduces t + 2 ≥ N/(144δ²k).