
Quantum learning algorithms imply circuit lower bounds

Srinivasan Arunachalam (IBM T. J. Watson Research Center)
Alex B. Grilo (Sorbonne Université, CNRS, LIP6)
Tom Gur (University of Warwick)
Igor C. Oliveira (University of Warwick)
Aarthi Sundaram (Microsoft Quantum)

arXiv:2012.01920v2 [quant-ph] 1 Dec 2021

December 3, 2021

Abstract
We establish the first general connection between the design of quantum algorithms and
circuit lower bounds. Specifically, let C be a class of polynomial-size concepts, and suppose
that C can be PAC-learned with membership queries under the uniform distribution with error
1/2 − γ by a time T quantum algorithm. We prove that if γ^2 · T ≪ 2^n/n, then BQE ⊄ C, where
BQE = BQTIME[2^{O(n)}] is an exponential-time analogue of BQP. This result is optimal in both
γ and T, since it is not hard to learn any class C of functions in (classical) time T = 2^n (with no
error), or in quantum time T = poly(n) with error at most 1/2 − Ω(2^{−n/2}) via Fourier sampling.
In other words, even a marginal quantum speedup over these generic learning algorithms would
lead to major consequences in complexity lower bounds. As a consequence, our result shows that
the study of quantum learning speedups is intimately connected to fundamental open problems
about algorithms, quantum computing, and complexity theory.
Our proof builds on several works in learning theory, pseudorandomness, and computational
complexity, and on a connection between non-trivial classical learning algorithms and circuit
lower bounds established by Oliveira and Santhanam (CCC 2017). Extending their approach
to quantum learning algorithms turns out to create significant challenges, since extracting
computational hardness from a quantum computation is inherently more complicated. To achieve
that, we show among other results how pseudorandom generators imply learning-to-lower-bound
connections in a generic fashion, construct the first conditional pseudorandom generator secure
against uniform quantum computations, and extend the local list-decoding algorithm of
Impagliazzo, Jaiswal, Kabanets and Wigderson (SICOMP 2010) to quantum circuits via a
delicate analysis. We believe that these contributions are of independent interest and might find
other applications.

Contents
1 Introduction
1.1 Main result
1.2 Techniques
1.2.1 The classical proof and our new perspective
1.2.2 Challenges in the quantum setting
1.2.3 Overview of the proof of Theorem 1
1.3 Directions and open problems
1.4 Related work

2 Preliminaries
2.1 Basic definitions and notation
2.2 Quantum computation
2.3 Quantum learning algorithms and extensions
2.4 Quantum natural properties
2.5 Self-reducibility
2.6 Pseudorandomness
2.7 Inherently probabilistic computations

3 Lower bounds from learning algorithms: a modular approach via PRGs
3.1 Quantum natural properties from quantum learning algorithms
3.2 Circuit lower bounds from quantum natural properties
3.3 Non-trivial quantum learning yields non-uniform circuit lower bounds

4 Technical tools
4.1 Nisan-Wigderson generator against quantum adversaries
4.2 Goldreich-Levin lemma in the quantum setting
4.3 Local list decoding and uniform hardness amplification for quantum circuits
4.3.1 Inherently probabilistic circuits
4.3.2 Excellence implies correctness
4.3.3 Excellent edges are abundant
4.3.4 Extension to quantum circuits
4.4 Self-reducibility in the quantum setting

5 A conditional PRG against uniform quantum circuits

A On trivial quantum learning algorithms

1 Introduction
One of the salient goals of quantum computing is to understand which computational problems
admit quantum speedups over classical algorithms. The canonical example is Shor’s algorithm
[Sho94] for factoring, which achieves an exponential speedup over the best known classical
algorithms. Another notable example is Grover’s algorithm [Gro96], which sheds light on quantum
complexity theory by showing that expressive languages such as the quintessential NP-complete
problem Formula-SAT admit quantum algorithms of time complexity Õ(2^{n/2}), whereas in general,
there are no known classical algorithms that outperform the Õ(2^n)-time brute-force search. This
paper investigates the implications of quantum speedups within the setting of learning theory.
Quantum learning theory forms the theoretical foundations which allow us to understand the
potential power and limitations of quantum machine learning. At its core, this field studies quantum
algorithms that are given quantum access (typically, quantum queries or quantum samples) to an
unknown circuit f from a fixed concept class C, and the goal is to output a hypothesis h that
well-approximates f, in which case we say that C can be quantumly learned. Due to the massive
success of machine learning and the great potential of quantum computing, quantum learning
has received much attention over the last two decades [BJ98, SG04, AS05, ABG06, AS07, OW16,
OW17, AW18, GKZ19, ACL+19] (cf. survey [AW17] and references therein).
In this setting, quantum algorithms have two main advantages over their classical counterparts:
making queries in superposition, and using quantum computation to process the information
obtained from these queries. Note that a large number of negative results about the power of classical
learning algorithms do not extend to the quantum setting (e.g., [Kha93, NR99]), since the
underlying hardness assumptions, based on problems such as factoring and discrete logarithm, break for
quantum computations. In addition, in some learning models and for some learning tasks,
quantum algorithms are strictly faster than classical algorithms [SG04], under standard cryptographic
assumptions. While it is possible to rule out polynomial-time or quasi-polynomial-time quantum
learning algorithms for some concept classes using stronger cryptographic assumptions [AGS21],
our understanding of the possibilities and limitations of sub-exponential time quantum learnability
is still limited. This motivates the following fundamental question:

Are quantum speedups for learning expressive concept classes possible?

Our main result exposes an intrinsic connection between complexity theory and quantum learning
theory, showing that obtaining any quantum learner that does marginally better than certain
“trivial” learners (see Section 1.1) would imply circuit lower bounds for languages computable in
BQE, the quantum analogue of E.¹ To our knowledge, this is the first result connecting the design
of general quantum algorithms to proving lower bounds:
Main result (Informal). If a class C of polynomial-size concepts can be learned under the uniform
distribution with membership queries and with error at most 1/2 − γ in quantum time o(γ^2 · 2^n/n),
then BQE ⊄ C.
While it seems extremely unlikely that a large class such as BQE can be simulated by classical
Boolean circuits of polynomial size, showing this and similar results for exponential time classes
seems to be out of reach of current techniques. For comparison, the recent NE ⊄ ACC^0 lower bound
due to Williams [Wil14] is widely recognized to be a milestone in complexity theory.
Our result admits two possible interpretations. For a pessimist, it explains the difficulty of
designing provably correct quantum learning algorithms for expressive concept classes, given that
establishing non-uniform circuit lower bounds is a notoriously difficult problem. Conversely, for
an optimist, it indicates a potential path to new lower bounds by exploring the power of quantum
computation. For instance, if depth-2 threshold circuits of polynomial size can be learned in o(2^n/n)
quantum time under the uniform distribution, a new complexity lower bound would follow.

¹ Recall that E = DTIME[2^{O(n)}], and analogously BQE = BQTIME[2^{O(n)}].

Our starting point in the proof of this result is a connection of a similar nature between classical
learning algorithms and circuit lower bounds due to Oliveira and Santhanam [OS17]. The extension
to quantum learning algorithms turns out to require significant technical work. En route to that,
we obtain new results concerning local list-decoding for quantum circuits, construct the first PRG
secure against uniform quantum computations, and provide a new general method to establish
learning-to-lower-bound connections. We next describe our contributions in more detail.

1.1 Main result


Before we proceed to formally state our result, we first discuss the model of quantum learning
and provide a brief overview of circuit lower bounds in complexity theory.

Quantum learning. We consider the standard PAC learning model under the uniform distribution
with quantum membership queries. Our main result derives a consequence from the existence
of learning algorithms, and restricting our model to the uniform distribution and allowing queries
only makes it stronger (since learning algorithms are easier to design under these assumptions).
Although we consider quantum learning algorithms, we emphasize that our results are concerned
with the learnability of (classical) Boolean functions f : {0,1}^n → {0,1} from a fixed concept class
C. Here, we use C[s(n)] to refer to concepts defined over n-bit inputs and of size at most s(n).
In more detail, a quantum learning algorithm A for C running in time T is described by a
uniform sequence of quantum oracle circuits Q_n of size at most T(n). We say that A learns C[s]
with probability δ and error ε if for every n and every f ∈ C[s(n)], Q_n with oracle access to f outputs
with probability at least δ the description of a (classical) Boolean circuit C such that
Pr_x[C(x) ≠ f(x)] ≤ ε. Note that having oracle access to f means that Q_n can query f in superposition via
a unitary map O_f whose action on basis states is defined as O_f : |x, b⟩ ↦ |x, b ⊕ f(x)⟩, for every
x ∈ {0,1}^n and b ∈ {0,1}. We stress here that we do not require the hypothesis circuit C to be
from the same class, i.e., the learning algorithm can be improper (also known as representation
independent). (Our result also holds for learners that output a quantum circuit that approximates
f on average to error at most ε, and we refer to the body of the paper for details.)
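To make the oracle O_f concrete, the following sketch (ours, not from the paper) treats it as a permutation of the 2^{n+1} computational basis states |x, b⟩, indexed here as 2x + b (an encoding we assume for illustration), and applies it to a superposition. The function names are hypothetical. Because O_f only permutes basis states, it is unitary, and it is its own inverse.

```python
from typing import Callable, List

def oracle_permutation(f: Callable[[int], int], n: int) -> List[int]:
    """Image of each basis state |x, b>, encoded as index 2*x + b, under O_f."""
    perm = []
    for x in range(2 ** n):
        for b in (0, 1):
            perm.append(2 * x + (b ^ f(x)))  # |x, b> -> |x, b XOR f(x)>
    return perm

def apply_oracle(f: Callable[[int], int], n: int, state: List[complex]) -> List[complex]:
    """Apply O_f to a state vector by moving each amplitude to its image."""
    perm = oracle_permutation(f, n)
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        out[perm[i]] += amp
    return out

n = 2
f = lambda x: 1 if x == 3 else 0                          # example concept: AND of two bits
perm = oracle_permutation(f, n)
assert sorted(perm) == list(range(2 ** (n + 1)))          # a bijection, hence unitary
assert all(perm[perm[i]] == i for i in range(len(perm)))  # O_f is an involution
# One superposition query on sum_x |x, 0> / 2 writes f(x) into the last qubit.
uniform = [0.5 if i % 2 == 0 else 0j for i in range(2 ** (n + 1))]
print(apply_oracle(f, n, uniform))
```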
It is instructive to contrast our main result with two learners that are “trivial” in a certain
sense, i.e., they do not exploit the structure of the concept class. The first one is simply a
brute-force (classical) learner that queries all possible inputs of the unknown function, and outputs a
hypothesis consisting of its truth-table. This learner can be implemented in time T = Õ(2^n) and
achieves optimal error parameter ε = 0. On the other hand, the second algorithm exploits the
basic fact that any function f : {0,1}^n → {0,1} can be approximated by a parity function (or its
negation) with advantage γ = Ω(2^{−n/2}) (i.e., with error ε ≤ 1/2 − Ω(2^{−n/2})). For this reason, any
class C of Boolean concepts can be learned with probability δ = Ω(1) and error ε ≤ 1/2 − Ω(2^{−n/2})
using Fourier sampling in time T = poly(n) (see Appendix A for details). (We stress that this
highly efficient learner is only available in the quantum setting, thanks to Fourier sampling.)
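The Fourier-sampling baseline can be simulated classically (in exponential time) to see where the advantage Ω(2^{−n/2}) comes from. Writing F(x) = (−1)^{f(x)} and f̂(S) = 2^{−n} Σ_x F(x)(−1)^{S·x}, Parseval gives Σ_S f̂(S)^2 = 1, so some |f̂(S)| ≥ 2^{−n/2}, and the parity χ_S (or its negation) errs on a (1 − |f̂(S)|)/2 fraction of inputs. Quantum Fourier sampling returns S with probability f̂(S)^2; the sketch below only mimics that distribution. How the sign is chosen is our assumption (a fair coin, which is right with probability 1/2), and we do not claim this is the learner of Appendix A; all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

def parity(S: int, x: int) -> int:
    """chi_S(x) in {0,1}: XOR of the bits of x selected by the mask S."""
    return bin(S & x).count("1") % 2

def fourier_coefficients(f: Callable[[int], int], n: int) -> List[float]:
    """f_hat(S) = 2^-n * sum_x (-1)^(f(x) + S.x), via the fast Walsh-Hadamard transform."""
    a = [(-1) ** f(x) for x in range(2 ** n)]
    h = 1
    while h < 2 ** n:
        for i in range(0, 2 ** n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / 2 ** n for v in a]

def fourier_sampling_learner(f, n: int, rng: random.Random) -> Tuple[int, int, Callable[[int], int]]:
    """Classical stand-in for one round of quantum Fourier sampling (not efficient)."""
    coeffs = fourier_coefficients(f, n)
    # A quantum learner obtains S with probability f_hat(S)^2 using poly(n) gates.
    S = rng.choices(range(2 ** n), weights=[c * c for c in coeffs])[0]
    flip = rng.randrange(2)  # assumption: guess the sign of f_hat(S) with a fair coin
    return S, flip, lambda x: parity(S, x) ^ flip

def error(f, h, n: int) -> float:
    """Pr_x[h(x) != f(x)] under the uniform distribution."""
    return sum(f(x) != h(x) for x in range(2 ** n)) / 2 ** n

n, rng = 6, random.Random(0)
table = [rng.randrange(2) for _ in range(2 ** n)]  # an arbitrary concept f
f = table.__getitem__
coeffs = fourier_coefficients(f, n)
print("Parseval sum:", sum(c * c for c in coeffs))
print("max |f_hat|:", max(map(abs, coeffs)), ">=", 2 ** (-n / 2))
S, flip, h = fourier_sampling_learner(f, n, rng)
# The error equals (1 - (-1)^flip * f_hat(S)) / 2.
print("S =", S, "flip =", flip, "error =", error(f, h, n))
```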

Complexity Theory. Establishing (non-uniform) circuit lower bounds is one of the holy grails
of the theory of computation. Despite over 50 years of extensive research, we still have a very poor
understanding of the limitations of Boolean circuits. While significant progress has been made with
respect to very restricted circuit models, such as small-depth circuits with OR, AND, and NOT
gates, the power of more expressive circuit classes remains mysterious.
To give two concrete examples: first, we do not know how to rule out that every algorithm
running in time 2^{O(n)} can be simulated by (classical) Boolean circuits of linear size. Second, as
mentioned above, it was not long ago that Williams [Wil14] obtained the first separation between
non-deterministic exponential time NE = NTIME[2^{O(n)}] and the class ACC^0[m] of polynomial-size
constant-depth circuits consisting of AND, OR, NOT, and MOD_m gates. This illustrates
how difficult it is to understand the power of non-uniform computations, even in the setting of
exponential time classes such as E, NE and BQE.²
Indeed, while the possibility that a large class such as BQE can be simulated by classical
Boolean circuits of linear size seems unlikely, proving such statements remains notoriously hard.
Given the scarcity of techniques to establish such complexity separations, it is of interest to obtain
new approaches for circuit lower bounds and to connect this question to other research areas in
algorithms and complexity.

In the following, we restrict our attention to reasonable classes C of circuits that can be
efficiently simulated by general Boolean circuits and are closed under restrictions, i.e., if f is
computable by C-circuits of size s(n), then f is computable by general Boolean circuits of size poly(s(n)),
and the function obtained by restricting some of the variables of f to constants 0 and 1 is also
computable by C-circuits of size s(n). Note that virtually any circuit class of interest, including depth-2
threshold circuits, ACC^0, and polynomial-size formulas, satisfies these properties.
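Closure under restrictions can be pictured on truth tables. The helper below is hypothetical and only illustrates the restriction operation itself: it fixes variable i of an n-variable function to a constant b and returns the (n − 1)-variable truth table. We assume the convention that bit i of the integer index x holds the value of variable i.

```python
from typing import List

def restrict(table: List[int], n: int, i: int, b: int) -> List[int]:
    """Truth table (length 2**(n-1)) of f with variable i fixed to the constant b."""
    assert len(table) == 2 ** n and 0 <= i < n and b in (0, 1)
    out = []
    for y in range(2 ** (n - 1)):
        low = y & ((1 << i) - 1)   # variables 0..i-1 keep their bit positions
        high = y >> i              # variables i+1..n-1 move up by one position
        x = (high << (i + 1)) | (b << i) | low
        out.append(table[x])
    return out

AND2 = [0, 0, 0, 1]                # f(x0, x1) = x0 AND x1, index x = x0 + 2*x1
print(restrict(AND2, 2, 0, 1))     # [0, 1], i.e. the function x1
print(restrict(AND2, 2, 0, 0))     # [0, 0], the constant 0
```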

Theorem 1 (Non-trivial quantum learning algorithms yield non-uniform complexity lower bounds).
There is a universal constant λ ≥ 1 for which the following holds. Let C be a concept class. Let
γ : ℕ → [0, 1/2] and T : ℕ → ℕ satisfy γ(n) ≥ λ · 2^{−n/2} and T(n) ≤ γ(n)^2 · 2^n/(λ · n). Suppose that,
for every k ≥ 1, the class C[n^k] can be learned in quantum time T(n) with probability ≥ 1/100 and
error ε ≤ 1/2 − γ(n). Then, for every k ≥ 1, we have BQE ⊄ C[n^k].

The confidence probability 1/100 is not essential, and we adopted this constant in order to
simplify the statement and focus on the trade-off between running time and accuracy. Two interesting
settings of parameters are (1) γ(n) = 0.49 and T(n) = o(2^n/n), and (2) γ(n) = n^{ω(1)} · 2^{−n/2} and
T(n) = n^{ω(1)}. The first case shows that strong learners that beat the trivial brute-force learning
algorithm by a polynomial factor (with respect to the running time) imply lower bounds, while the
second setting shows that polynomial-time learners that perform marginally better than a
Fourier-sampling based learner (with respect to the error parameter) also imply lower bounds. For this
reason, we view Theorem 1 as a result essentially stating that non-trivial quantum learnability of a
class of polynomial-size circuits yields complexity lower bounds.
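As a sanity check on the two parameter settings, the hypothesis of Theorem 1 can be evaluated numerically. This sketch is illustrative only: the theorem is asymptotic and λ is an unspecified universal constant, so we plug in λ = 1 as a placeholder, a single value n = 40, and n^2 as a stand-in for n^{ω(1)} (all three are our assumptions). It cannot confirm or refute the asymptotic statement; it only shows how the time budget γ^2 · 2^n/(λn) shrinks as the advantage approaches 2^{−n/2}.

```python
def theorem1_hypothesis_holds(n: int, gamma: float, T: float, lam: float = 1.0) -> bool:
    """Check gamma >= lam * 2^(-n/2) and T <= gamma^2 * 2^n / (lam * n).

    lam stands in for the unspecified universal constant lambda (placeholder value).
    """
    advantage_ok = gamma >= lam * 2 ** (-n / 2)
    time_ok = T <= gamma ** 2 * 2 ** n / (lam * n)
    return advantage_ok and time_ok

n = 40
# Setting (1): brute force (T ~ 2^n) exceeds the budget; 2^n / n^2 fits it.
print(theorem1_hypothesis_holds(n, 0.49, 2 ** n))                    # False
print(theorem1_hypothesis_holds(n, 0.49, 2 ** n / n ** 2))           # True
# Setting (2): advantage n^2 * 2^(-n/2), polynomial time n^2 (budget 64000).
print(theorem1_hypothesis_holds(n, n ** 2 * 2 ** (-n / 2), n ** 2))  # True
# The generic advantage 2^(-n/2) leaves a time budget of only 1/n.
print(theorem1_hypothesis_holds(n, 2 ** (-n / 2), n ** 2))           # False
```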
As alluded to above, the connection established in Theorem 1 can be interpreted in two ways.
On the one hand, it provides an explanation for why it is difficult to design provably correct
non-trivial quantum learners, as they would imply dramatic consequences for complexity theory,
showing new circuit lower bounds that are notoriously hard to prove. On the other hand, this
connection significantly strengthens the paradigm of proving circuit lower bounds via (classical)
learning algorithms [OS17] by capitalizing on the power of quantum learning algorithms, which
might be vastly stronger than their classical counterparts.³

² Note that existing circuit lower bounds against circuits of size n^k, such as the result from [San09] showing that
MA/1 ⊄ SIZE[n^k], do not provide a hard language in BQE. The naive simulation of a language L ∈ MA in exponential
time results in an algorithm that runs in time 2^{n^c} for some constant c ≥ 1 that can be larger than k (and we can
easily diagonalize against circuits of size n^k in time 2^{n^c} when c > k). Additionally, we do not consider in this paper
the computation of a hard problem with advice, as in the MA/1 lower bound, which requires 1 bit of non-uniform
advice.
In the subsequent section, we discuss our techniques in detail and explain our additional contributions.

1.2 Techniques
The first results showing that learning algorithms imply lower bounds appeared in the pioneering
work of Fortnow and Klivans [FK09]. These initial results, however, required a strong
assumption on the resources of the learning algorithm. For instance, for randomized learners, lower
bound consequences could only be obtained from polynomial-time learners. Different learning
assumptions were explored in a sequence of subsequent works [HH13, KKO13, Vol14, Vol16, OS17,
OS18, CRTY20]. We review these works in detail in Section 1.4, where we present a self-contained
exposition of existing connections between learning algorithms and lower bounds. Here we focus
on one of the strongest (and most relevant in our context) connections obtained before this work.
Oliveira and Santhanam [OS17, Theorem 14] showed that if a class C of polynomial-size concepts
can be PAC-learned under the uniform distribution to error ε ≤ 1/2 − γ by a randomized algorithm
running in time γ^2 · 2^n/n^{ω(1)}, then for every k ≥ 1 we have BPE ⊄ C[n^k]. In contrast, Theorem 1
can be seen as an analogue of this connection in the quantum setting, where algorithms can be
more powerful. Intuitively, this means that the task of extracting a computational lower bound
from a learning algorithm becomes more delicate. (Indeed, as mentioned above, there is even
another “trivial” learner that is only available in the quantum setting and proceeds via Fourier
sampling.) To accomplish that, we make additional conceptual and technical contributions of
independent interest:

– Our paper introduces a new paradigm to establish learning-to-lower-bound connections that
employs a pseudorandom generator (PRG) in a black-box way. Thus, even without its quantum
counterpart, we simplify and extend existing results.
– We propose and analyze the first PRG with sub-exponential stretch that is secure against
uniform quantum computations. Moreover, we base its security on a weaker uniform hardness
assumption.
– We prove a near-optimal uniform hardness amplification result for quantum circuits by a
delicate extension of techniques and analysis from Impagliazzo, Jaiswal, Kabanets, and Wigderson
[IJKW10].
– We introduce a new computational model between classical and quantum computation:
inherently probabilistic circuits. It highlights an important difference between classical and
quantum algorithms, and provides a way to mitigate the complex task of analyzing quantum
computations.

In the next sections, we provide more details about the proof of Theorem 1, explain the role of the
contributions mentioned above, and contrast our techniques to prior work.
³ Even if in the short term we are not able to design new quantum learning algorithms, the mere existence of a
connection between learning and lower bounds has been used to establish unconditional complexity lower bounds
(see Section 1.4 and [Oli19, Section 3.1]). Thus, if the “pessimist” interpretation is the correct one, it is still possible
that the connection established in Theorem 1 can be indirectly used as a key ingredient of a lower bound proof.

1.2.1 The classical proof and our new perspective
Before discussing the quantum perspective, it is instructive to review the approach for showing
the classical connection between randomized learners and lower bounds.

Randomized learners. Techniques from computational complexity theory and from the theory
of pseudorandomness play a key role in the proof from [OS17] that non-trivial randomized learners
yield complexity lower bounds. In a bit more detail, their argument proceeds roughly as follows:

1. First, they show that sub-exponential time randomized learners for a class C imply that
BPE ⊄ C. This part refines ideas from [FK09, KKO13] that rely on results from structural
complexity theory (e.g., a special PSPACE-complete language [TV07] and diagonalization).

2. This is followed by a proof that the existence of “non-trivial” randomized learners for a class C
of polynomial-size concepts implies the existence of sub-exponential time randomized learners
for C. This implication relies in part on connections between learning theory and the theory
of pseudorandomness introduced by [CIKK16].

The proof of the learning-to-lower-bound connection immediately follows from Items 1 and 2 above.

The quantum case: A new perspective. In contrast to [OS17], here we take a more direct
path to show that “non-trivial” quantum learning algorithms for C imply lower bounds against C.
While several technical tools from the theory of pseudorandomness that we use still correspond to
quantum extensions of core results behind [OS17], our approach is conceptually very different. In
more detail, we are able to show that PRGs against uniform (classical or quantum) computations
can be used to establish a learning-to-lower-bound connection in a black-box way. In particular, we
do not follow the 2-step approach outlined above.
The benefits of our new perspective are twofold: (a) on the one hand, stronger PRG statements
immediately lead to stronger connections between learning algorithms and lower bounds, and (b) it
allows for a more modular proof of the learning-to-lower-bound connection. In particular, with our
perspective the argument becomes more manageable in the quantum setting, where new technical
difficulties are present compared with the randomized case.
At a very high level, we use a PRG to generate a “hard” function that is not correctly learned by
the quantum learning algorithm. Consequently, this function does not belong to the circuit class C,
and it can be used to define a language that cannot be computed by C-circuits of bounded size.
What makes the approach viable is that a PRG that fools uniform computations is sufficient. We
discuss our proof in more detail in Section 1.2.3. Before that, at a more conceptual level, we explain
some of the challenges associated with the transition from randomized to quantum computations.

1.2.2 Challenges in the quantum setting


Our goal in Theorem 1 is to show that the existence of a quantum learning algorithm for a class
C of polynomial-size circuits can be used to construct a function h ∈ BQE that cannot be computed
by C-circuits of size nk . As mentioned above, and explained in more detail in Section 1.2.3 below,
this will be achieved through the design of a PRG against uniform quantum computations, along
with other ideas. In more detail, we show that if a certain language L is hard for sub-exponential
time uniform quantum computations, i.e., L ∉ BQTIME[2^{n^γ}] for some γ > 0, then there is a
generator G : {0,1}^ℓ → {0,1}^m computable in deterministic time 2^{O(ℓ)}, where ℓ = poly(n) and
m = 2^{n^{Ω(1)}}, that is able to “fool” uniform quantum circuits of size poly(m).

Generators of this form are known in the realm of classical computations (see [IW01, TV07,
CRTY20]), i.e., under an appropriate hardness assumption against BPTIME[2^{n^γ}], it is possible
to fool uniform probabilistic computations running in time poly(m). The proof of these results
essentially proceeds as follows. If there is a sequence of (uniform) circuits {C_m} of bounded
complexity that distinguish the m-bit output of G from a random m-bit string, then there is a
faster uniform algorithm for the hard problem L, which leads to a contradiction. In order to
quantize this argument, we aim to construct a PRG secure against uniform quantum circuits. This
leads to the natural question: What can go wrong in these classical proofs when each C_m is a
quantum circuit?
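For concreteness, circuits C_m "distinguish" G when the gap |Pr_s[C_m(G(s)) = 1] − Pr_u[C_m(u) = 1]| between a seed-generated input and a uniformly random input is large. The toy generator below is a hypothetical example that is deliberately not pseudorandom (it appends the parity of the seed), so a simple test achieves advantage 1/2; it only illustrates the quantity a PRG must keep small, not the construction used in the paper.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]

def toy_generator(seed: Bits) -> Bits:
    """Hypothetical G: {0,1}^l -> {0,1}^(l+1) appending the seed's parity (NOT a PRG)."""
    return seed + (sum(seed) % 2,)

def parity_test(z: Bits) -> int:
    """Distinguisher: output 1 iff the last bit equals the parity of the other bits."""
    return int(z[-1] == sum(z[:-1]) % 2)

def advantage(G: Callable[[Bits], Bits], C: Callable[[Bits], int], l: int) -> float:
    """|Pr_s[C(G(s)) = 1] - Pr_u[C(u) = 1]|, computed exactly by enumeration."""
    seeds = list(product((0, 1), repeat=l))
    m = len(G(seeds[0]))  # output length of G
    p_pseudo = sum(C(G(s)) for s in seeds) / len(seeds)
    p_uniform = sum(C(u) for u in product((0, 1), repeat=m)) / 2 ** m
    return abs(p_pseudo - p_uniform)

print(advantage(toy_generator, parity_test, 4))  # 0.5
```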

Randomized circuits versus quantum circuits. It turns out that there is a general property
of probabilistic computations that is not available to quantum computations. To be precise, consider
the standard model of randomized Boolean circuits, i.e., a Boolean circuit C(x, y), where x is the
input string and y is the random string. We say that C computes a Boolean function g : {0,1}^m →
{0,1}^k on an input x ∈ {0,1}^m if Pr_y[C(x, y) = g(x)] ≥ 2/3. Similarly, we can also discuss the
correlation between C and g, captured by η = Pr_{x,y}[C(x, y) = g(x)]. An extremely useful property
of this model is that, by an averaging argument, there exists a fixed string y′ such that
Pr_x[C(x, y′) = g(x)] ≥ η, i.e., there is a deterministic circuit C_{y′}(x) = C(x, y′) that correctly computes g on an
η-fraction of inputs. This often allows one to reduce the analysis of randomized Boolean circuits to
the deterministic case. Additionally, C_{y′} “forces” C(x, y) to commit to a single consistent output
on every x, which can be relevant if C is used as a subroutine in other computations.
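The averaging argument is easy to check on a small example: Pr_{x,y}[C(x, y) = g(x)] is the average over y of Pr_x[C(x, y) = g(x)], so the best fixed string y′ does at least as well as η. In the sketch below the randomized circuit is a random lookup table, a hypothetical stand-in chosen only for illustration.

```python
import random
from typing import Callable, Sequence, Tuple

def fix_randomness(C: Callable, g: Callable, xs: Sequence, ys: Sequence) -> Tuple[float, object, float]:
    """Return (eta, y_best, acc_best): eta = Pr_{x,y}[C(x,y) = g(x)] and
    acc_best = Pr_x[C(x, y_best) = g(x)], the best over fixed random strings."""
    acc = {y: sum(C(x, y) == g(x) for x in xs) / len(xs) for y in ys}
    eta = sum(acc.values()) / len(ys)  # averaging over y gives Pr_{x,y}
    y_best = max(acc, key=acc.get)
    return eta, y_best, acc[y_best]

rng = random.Random(0)
xs, ys = range(16), range(8)           # 4-bit inputs, 3-bit random strings
g = lambda x: x % 2                    # example target function
# Randomized "circuit": correct with probability about 0.7 on each pair (x, y).
table = {(x, y): g(x) if rng.random() < 0.7 else 1 - g(x) for x in xs for y in ys}
C = lambda x, y: table[(x, y)]
eta, y_best, acc_best = fix_randomness(C, g, xs, ys)
print(eta, y_best, acc_best)
assert acc_best >= eta                 # C_{y'} with y' = y_best is at least as accurate
```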
As a concrete example, note that this idea is crucial in the proof of BPP ⊆ P/poly, which is a
simple combination of reducing the failure probability of a randomized circuit and fixing its random
input. (In contrast, it is open whether BQP ⊆ P/poly.) Importantly, while quantum computations share
superficial similarities with the model of randomized computations, there is no obvious way of
reducing a quantum computation to a distribution of “deterministic” quantum computations. This
has an important effect on the design of algorithms as well as on their analysis, creating several
difficulties in our proof. We explain one of the most significant of these below.

Uniform hardness amplification for quantum circuits. A core component in several PRG
constructions is to amplify the hardness of a function f : {0,1}^n → {0,1} that is only assumed to be
mildly hard. This is a standard element of PRGs that follow the hardness versus randomness
framework of Nisan and Wigderson [NW94], and we need it here as well. To be precise, suppose we start
with f that is weakly hard for quantum circuits of bounded size, i.e., for every circuit A we have

$$\mathbb{E}_{x \sim \{0,1\}^n}\Big[\Pr_A\big[A(x) = f(x)\big]\Big] \stackrel{\text{def}}{=} \mathbb{E}_{x \sim \{0,1\}^n}\Big[\big\|\Pi_{f(x)}\, A\, |x, 0^{\ell}\rangle\big\|^2\Big] \le 1 - \delta,$$

where δ = 1/n for instance. In the notation above, ‖Π_{f(x)} A|x, 0^ℓ⟩‖² is the probability of obtaining
f(x) when measuring the first qubit of the circuit A on input |x, 0^ℓ⟩, and we average its success
probability over a random input x. Our goal is to define a function g : {0,1}^{n′} → {0,1}^k from f
that is exponentially harder than f, i.e., for every quantum circuit B of bounded size, we have

$$\mathbb{E}_{y \sim \{0,1\}^{n'}}\Big[\Pr_B\big[B(y) = g(y)\big]\Big] \stackrel{\text{def}}{=} \mathbb{E}_{y \sim \{0,1\}^{n'}}\Big[\big\|\Pi_{g(y)}\, B\, |y, 0^{\ell'}\rangle\big\|^2\Big] \le \varepsilon,$$

where ε = 2^{−n^{Ω(1)}} for instance. We have two additional requirements beyond the generalization
from classical to quantum computation: (1) we need a hardness amplification scheme with a
near-optimal setting of parameters (so that our PRG achieves good parameters), and (2) since we are
constructing a PRG under a uniform hardness assumption, it is also crucial to control the amount
of “non-uniformity” in the argument that establishes the correctness of the construction (i.e., how a
circuit B violating the conclusion can be used to construct a circuit A that violates the hypothesis).
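The quantity ‖Π_b A|x, 0^ℓ⟩‖² in the weak-hardness condition is the total squared amplitude of the final basis states whose first qubit equals b. The sketch below computes it from a state vector and averages it over x; the toy "circuit" is hypothetical and is specified directly by its output state, and taking the first qubit to be the most significant bit of the basis index is our convention.

```python
import math
from typing import Callable, List

def prob_first_qubit(state: List[complex], b: int) -> float:
    """||Pi_b psi||^2 for the projector onto first qubit = b (most significant bit)."""
    q = len(state).bit_length() - 1  # number of qubits, assuming len(state) == 2**q
    return sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> (q - 1)) & 1 == b)

def average_success(final_state: Callable[[int], List[complex]], f: Callable[[int], int], n: int) -> float:
    """E_x ||Pi_{f(x)} A|x,0^l>||^2, with final_state(x) standing in for A|x,0^l>."""
    return sum(prob_first_qubit(final_state(x), f(x)) for x in range(2 ** n)) / 2 ** n

f = lambda x: x % 2  # example target function

def toy_state(x: int) -> List[complex]:
    """Two-qubit output sqrt(0.9)|f(x), 0> + sqrt(0.1)|1 - f(x), 1>."""
    state = [0j] * 4  # basis index = 2 * first_qubit + second_qubit
    state[2 * f(x) + 0] = math.sqrt(0.9)
    state[2 * (1 - f(x)) + 1] = math.sqrt(0.1)
    return state

# About 0.9: such an A would violate the bound 1 - delta whenever delta > 0.1.
print(average_success(toy_state, f, n=3))
```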
Our main technical contribution is to prove a quantum analogue of the near-optimal (uniform)
hardness amplification result from Impagliazzo, Jaiswal, Kabanets, and Wigderson [IJKW10] which,
in the language of coding theory, can be interpreted as a local list-decoding algorithm for direct
product codes. Concretely, their work considers g = f^k, i.e., g : {0,1}^{n′} → {0,1}^k is defined as
the computation of k independent copies of f (thus n′ = kn, and one can think of k = poly(n)).
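The direct product g = f^k evaluates f on k independent blocks of the input. The naive intuition for why it is harder: if a predictor were correct on each coordinate independently with probability 1 − δ, it would be correct on all k coordinates with probability (1 − δ)^k ≤ e^{−δk}, which for δ = 1/n and k = n^2 is at most e^{−n}. Real adversaries need not err independently across coordinates, which is why an actual proof is needed; this sketch only illustrates the heuristic, and its helper names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

def direct_product(f: Callable[[int], int], xs: Sequence[int]) -> Tuple[int, ...]:
    """g(x_1, ..., x_k) = (f(x_1), ..., f(x_k)) for an n'-bit input split into k blocks."""
    return tuple(f(x) for x in xs)

def independent_success(delta: float, k: int) -> float:
    """Success on all k coordinates if each is right independently w.p. 1 - delta."""
    return (1 - delta) ** k

f = lambda x: bin(x).count("1") % 2   # example f: parity of one n-bit block
print(direct_product(f, [1, 3, 7]))   # (1, 0, 1)
n = 20
print(independent_success(1 / n, n ** 2), "<=", math.exp(-n))
```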
The extension of [IJKW10] to quantum circuits is challenging for the following reason. In the
classical setting, given (without loss of generality) a deterministic circuit B such that

Pr [B(y) = g(y)] > ε, (1)


y∼{0,1}n′


there is only one way to distribute the correlation between B and g: there is a set S ⊆ {0, 1}n

such that B is correct on S and wrong on S, where |S| > ε · 2n . On the other hand, for a quantum
circuit B, the correlation between B and g can be distributed in an arbitrary way among the inputs.

For instance, perhaps on every input string y ∈ {0, 1}n , PrB [B(y) = g(y)] = 2ε. Going beyond
that, we might also have quantum circuits that “interpolate” between these two extreme examples
in an arbitrary way. Unfortunately, as opposed to randomized circuits, there is no way for us to
“fix the quantumness” of B and reduce the analysis to the deterministic case, i.e., to the simpler
setting of Equation (1). (For instance, one could query $B$ on each input only once, memorizing its output after that. This, however, would not provide a fixed hard language when $B$ is employed as a subroutine in extracting lower bounds from quantum learning algorithms, since different executions may memorize different outputs. Hence this trick does not work.) This more general scenario in the quantum setting affects the original analysis from [IJKW10] and introduces several fundamental “asymmetries” in the argument.
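The contrast between the two extreme cases can be seen in a small classical simulation. The following Python sketch (all names hypothetical; parity is an arbitrary stand-in for $g$, treated as Boolean for simplicity) compares a deterministic profile, correct exactly on a set $S$, with a spread-out profile that is correct with probability $2\varepsilon$ on every input. Both have the same overall agreement $2\varepsilon > \varepsilon$, but only the first determines a set $S$; memorizing one answer per input from the spread-out box yields a table that depends on the run.

```python
import random
from itertools import product
from typing import Callable, Dict, Tuple

Bits = Tuple[int, ...]
INPUTS = list(product((0, 1), repeat=4))  # toy input length n' = 4
EPS = 0.125


def average_agreement(success_prob: Callable[[Bits], float]) -> float:
    """Pr_{y,B}[B(y) = g(y)], given the per-input profile y -> Pr_B[B(y) = g(y)]."""
    return sum(success_prob(y) for y in INPUTS) / len(INPUTS)


# Deterministic B: always correct on a set S with |S| = 2*EPS*2^{n'}, always wrong elsewhere.
S = set(INPUTS[: int(2 * EPS * len(INPUTS))])


def deterministic_profile(y: Bits) -> float:
    return 1.0 if y in S else 0.0


# Spread-out B: correct with probability 2*EPS on every input.
def spread_profile(y: Bits) -> float:
    return 2 * EPS


def parity(y: Bits) -> int:
    """Stand-in Boolean target g (hypothetical, for illustration only)."""
    return sum(y) % 2


def memoized_run(seed: int, p: float = 2 * EPS) -> Dict[Bits, int]:
    """Query a spread-out black box once per input and memorize its answers.
    The randomness is internal to the run, so the table can differ between runs."""
    rng = random.Random(seed)
    return {y: (parity(y) if rng.random() < p else 1 - parity(y)) for y in INPUTS}


print(average_agreement(deterministic_profile), average_agreement(spread_profile))  # 0.25 0.25
```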

An inherently random “channel” and a quantization of [IJKW10]. From the perspective of coding theory, what we are trying to achieve is local list-decoding in a setting where querying
a coordinate (an input for B) does not produce a deterministic outcome. It turns out that this
problem can be investigated without reference to quantum circuits. To achieve that, we explore a
computational model that captures the scenario described above: inherently probabilistic circuits
(see Section 2.7 for the definition and discussion of related work). Roughly speaking, this is a
model for randomized computations where “the random input cannot be fixed”, i.e., the random
input remains inaccessible and the randomized circuit is only accessed as a black-box. It can be
seen as an intermediate model between classical randomized circuits and quantum circuits. Most importantly for us, this model captures nearly all the difficulties that arise when quantizing the building blocks of the PRG.
Employing an intricate analysis that extends [IJKW10], we are able to show that their local list
decoding algorithm (with a natural modification) works in the setting of inherently probabilistic
circuits, with a minor loss in the parameters. The result can then be translated to the setting
of quantum computations without much effort, which provides the near-optimal uniform hardness
amplification needed in our PRG construction. We refer the reader to Section 4.3 for the technical
details. Having established the above, we are ready for a high-level proof overview of Theorem 1.

1.2.3 Overview of the proof of Theorem 1


Let $C$ be a circuit class, and suppose that polynomial-size concepts from $C$ can be learned with error parameter $\varepsilon \le 1/2 - \gamma$ in quantum time $o(\gamma^2 \cdot 2^n/n)$. For a given $k \ge 1$, we argue that there is a language $L \in \mathsf{BQE} = \mathsf{BQTIME}[2^{O(n)}]$ such that $L \notin C[n^k]$.
In contrast with known proofs of existing learning-to-lower-bound connections, our black-box
“PRG-based” approach relies on the following basic principle from the theory of pseudorandomness:
If we have a property $P$ of objects that is “dense” in the set of all possible objects, and a PRG $G$ that is able to “fool” a class of “tests” that contains $P$ (e.g., when $P$ is easy to compute and $G$ fools predicates of bounded complexity), then some output of $G$ must be an object that satisfies $P$. For us, $P$ is simply the class of functions that are not in $C[n^k]$. We would like to find a (fixed) function in $P$ and compute according to it in $\mathsf{BQTIME}[2^{O(n)}]$. We start off with the “ideal” plan for the proof, then discuss difficulties when implementing this strategy and how to overcome them.
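The principle can be written as a one-line calculation. As an assumption for this sketch, let $\delta > 0$ lower-bound the density of $P$ among all $2^n$-bit strings, and suppose $G$ fools the test $P$ with error $\varepsilon$, i.e., $\bigl|\Pr_s[P(G(s)) = 1] - \Pr_{y \sim U_{2^n}}[P(y) = 1]\bigr| \le \varepsilon$. Then

```latex
\Pr_{s}\bigl[P(G(s)) = 1\bigr]
  \;\ge\; \Pr_{y \sim U_{2^n}}\bigl[P(y) = 1\bigr] - \varepsilon
  \;\ge\; \delta - \varepsilon
  \;>\; 0 \qquad \text{whenever } \varepsilon < \delta,
```

so at least one seed $s$ yields an output $G(s)$ that satisfies $P$.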

Ideal plan for the proof of Theorem 1:

1. From a quantum learning algorithm $A$ that finds “structure”, we get a (uniform) quantum algorithm $B$ that decides the “absence of structure” on an input string $f \in \{0,1\}^{2^n}$. (Since a typical random string is “unstructured”, a pseudorandom string is likely to be accepted by $B$.)
2. Assuming that there exists a PRG $G : \{0,1\}^{O(n)} \to \{0,1\}^{2^n}$ secure against uniform quantum computations, we use it with $B$ as a “test” to find a function $h_n : \{0,1\}^n \to \{0,1\}$ that is not in $C[n^k]$, i.e., a function that “lacks structure”. Given an $n$-bit input $x$, the hard language $L$ outputs $h_n(x)$.
3. We show that a pseudorandom generator G with the desired parameters and guarantees exists.

Intuitively, if we could implement these steps then L would indeed be a hard function, and the time
complexity of L can be bounded using upper bounds on the complexities of computing G and B.

Implementing Step 1: (promise) quantum natural properties. In order to formalize Step 1, we introduce the notion of (promise) quantum natural properties, a generalization of the central concept from [RR97] to quantum computations. Informally, a natural property (useful) against a class $C$ of functions is an algorithm $B$ that: (a) rejects every function $g \in C$, when $g$ is viewed as a $2^n$-bit string; (b) accepts a set of strings that is dense in $\{0,1\}^{2^n}$; and (c) runs in time $\mathrm{poly}(2^n) = 2^{O(n)}$ on an input string $f \in \{0,1\}^{2^n}$. Due to these properties, algorithm $B$ can
be seen as a way to efficiently tell apart “structured” strings (those that encode functions from C)
from a dense set of “unstructured” strings. A quantum natural property against C is simply a
natural property against C that is decided by a quantum algorithm. To our knowledge, this is the
first time that quantum natural properties are considered in the literature.
We show that quantum learners with parameters as in Theorem 1 imply the existence of quantum natural properties against $C[n^k]$. The argument follows an idea that has appeared in a few different works (e.g., [Oli13, Vol14, OS17, AGS21]): to test if an input string $f \in \{0,1\}^{2^n}$ is not in $C[n^k]$, one can simulate the learning algorithm $A$ when its oracle computes according to $f$ (now viewed as a function), and accept if the hypothesis output by $A$ and $f$ have agreement estimated to be less than $1/2 + \gamma$. For a function $f \in C[n^k]$, when $A$ learns $f$ its hypothesis will have a larger correlation with $f$, and so $f$ is rejected. On the other hand, it is possible to prove that if $A$ runs in quantum time $o(\gamma^2 \cdot 2^n/n)$, then a dense set of Boolean functions is not learned by $A$ even with a large error $\varepsilon = 1/2 - \gamma$ (simply because such functions cannot be approximated by hypotheses of size $o(\gamma^2 \cdot 2^n/n)$). Consequently, any such function will pass the low-correlation test with high probability and be accepted, no matter how many times the learner is simulated.
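The low-correlation test can be sketched classically as follows. This Python sketch is only an illustration under stated assumptions: the learner interface (`Learner`, a function that takes oracle access to $f$ and returns a hypothesis) and the toy `constant_learner` are hypothetical, the learner is classical rather than quantum, and agreement is estimated from uniformly random samples.

```python
import random
from itertools import product
from typing import Callable, Dict, Tuple

Bits = Tuple[int, ...]
Oracle = Callable[[Bits], int]
Hypothesis = Callable[[Bits], int]
Learner = Callable[[Oracle, int], Hypothesis]  # hypothetical learner interface


def natural_property_test(f: Dict[Bits, int], n: int, learner: Learner,
                          gamma: float, samples: int, rng: random.Random) -> bool:
    """Accept f ("unstructured") iff the learner's hypothesis has estimated
    agreement with f below 1/2 + gamma on uniformly random inputs."""
    h = learner(lambda x: f[x], n)  # run the learner with oracle access to f
    agree = 0
    for _ in range(samples):
        x = tuple(rng.randrange(2) for _ in range(n))
        agree += int(h(x) == f[x])
    return agree / samples < 0.5 + gamma


def constant_learner(oracle: Oracle, n: int) -> Hypothesis:
    """Toy learner for the class of constant functions: one query, constant hypothesis."""
    b = oracle((0,) * n)
    return lambda x: b


rng = random.Random(0)
cube = list(product((0, 1), repeat=4))
constant_one = {x: 1 for x in cube}     # in the "easy" class: rejected
parity = {x: sum(x) % 2 for x in cube}  # far from every constant: accepted
print(natural_property_test(constant_one, 4, constant_learner, 0.25, 2000, rng))  # False
print(natural_property_test(parity, 4, constant_learner, 0.25, 2000, rng))
```

A function in the learnable class is rejected because the hypothesis agrees with it everywhere, while parity agrees with any constant on exactly half of the inputs and is therefore accepted with overwhelming probability.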
We directly extend this idea to the quantum setting in Section 3.1 while also addressing a subtlety that arises in the case of randomized and quantum learners. The resulting quantum algorithm $B$ computing a natural property against $C[n^k]$ might accept some input strings $f \in \{0,1\}^{2^n}$ with a probability that is not bounded away from 1/2. This happens because we cannot control the behavior of the algorithm $A$ on functions near the “border” of $C[n^k]$ (i.e., those inputs that are not as hard as a “random” function but are not in $C[n^k]$). Hence, we only get a promise quantum natural property: we are guaranteed that strings from the dense set are accepted with high probability and strings corresponding to functions in $C[n^k]$ are rejected with high probability.

Challenges in implementing Step 2. Unfortunately, the strategy described in Step 2 is problematic for a number of reasons:
(i) The PRG $G$ constructed in Step 3 requires a hardness assumption, while Step 2 needs a language $L$ that is unconditionally not in $C[n^k]$. This can only be achieved if $G$ provably works (i.e., it does not depend on an unproven assumption).
(ii) Since $B$ only computes a promise quantum natural property, it is not immediately clear how to use the output of $G$ and the test $B$ to fix with high probability a unique “hard” function $h_n$. This is needed so that the language $L$ is well defined.
(iii) The (conditional) PRG $G$ that we are able to construct in Step 3 is much weaker: it will stretch $\mathrm{poly}(n)$ bits to $m(n) = 2^{n^\lambda}$ bits for some $\lambda > 0$, and it is only guaranteed to fool uniform quantum computations of time $\mathrm{poly}(m)$ on infinitely many choices of $n$.
These issues create technical difficulties that are addressed in Section 3.2 with some modifications
to our original plan, as we explain next.

Issue (i). To resolve this and relax the conditions on the PRG, we consider two possible scenarios:
(a) Classical computations performed in polynomial space can be simulated by quantum algorithms running in sub-exponential time $2^{n^{o(1)}}$.
(b) Item (a) does not hold, i.e., there is a language $L \in \mathsf{PSPACE}$ and $\alpha > 0$ such that $L \notin \mathsf{BQTIME}[2^{n^\alpha}]$.
While we cannot currently decide which of the two scenarios holds, we obtain lower bounds against $C[n^k]$ in each scenario, which is sufficient for the purpose of proving Theorem 1. (We note that such “win-win” arguments have appeared in many previous works, including [FK09, KKO13, OS17].) In more detail, if (a) holds, we employ standard diagonalization techniques from complexity theory to get a language $L \in \mathsf{PSPACE} \setminus C[n^k]$, then show that $L \in \mathsf{BQE}$ via a sub-exponential time quantum simulation that is guaranteed to exist by assumption (a). We can therefore assume for the rest of the
proof that (b) holds. In other words, we have a hardness hypothesis against quantum computations
that we can hope to use to construct a pseudorandom generator.

Issue (ii). We fix this issue as follows. Suppose, for simplicity, that we did manage to construct a PRG $G : \{0,1\}^{O(n)} \to \{0,1\}^{2^n}$ of exponential stretch that is computable in time exponential in $n$. A $2^n$-bit string output by $G$ can be seen as the truth table of a function $h_n$. Since each seed for $G$ produces a corresponding function, $G$ encodes a collection of functions. What we know from the pseudorandomness of $G$ and the existence of the promise quantum natural property is that at least one of these functions must lie outside $C[n^k]$. Then, the hard language $L$ could be defined by, say, the “first” of the $h_n$ that lies outside of $C[n^k]$, the set of “easy functions”. However, finding the $h_n$ that satisfies this condition in $\mathsf{BQTIME}[2^{O(n)}]$ remains problematic.⁴ Instead, we define a language $L$ over $O(n)+n$ input bits that simultaneously computes according to all output functions of $G$. More precisely, the hard language $L$ is defined over a pair of input strings $w$ and $x$, where $w$ is a seed for $G$ of length $O(n)$, and $x$ is an $n$-bit string. We then set $L(w, x) = f_w(x)$, where $f_w$ is the function defined by $G(w) \in \{0,1\}^{2^n}$. As explained above, given that the PRG $G$ is secure against the uniform quantum computation performed by $B$, it must produce at least one string $y = G(w)$ encoding a function $h_n$ which lies outside the set $C[n^k]$. Since $L$ computes $h_n$ when we fix its first input string to the corresponding seed $w$, and $C$ is closed under restrictions of input variables to constants, $L$ cannot be computed by $C$-circuits of size $n^k$.

⁴ This is because $B$ computes a promise natural property, and we cannot easily estimate the exact probability that $B$ accepts a string to consistently compute the “first” such string.
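A minimal Python sketch of the language $L(w, x) = f_w(x)$, assuming truth tables are indexed with the most significant bit of $x$ first (an assumption; any fixed ordering works). The generator `toy_G` is a hypothetical stand-in with no pseudorandomness guarantee; it only illustrates that fixing $w$ turns $L$ into the function $f_w$ whose truth table is $G(w)$.

```python
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def bits_to_index(x: Bits) -> int:
    """Read x (most significant bit first) as an index into a 2^n-entry truth table."""
    idx = 0
    for b in x:
        idx = 2 * idx + b
    return idx


def hard_language(G: Callable[[Bits], Bits], w: Bits, x: Bits) -> int:
    """L(w, x) = f_w(x), where the truth table of f_w is the output G(w)."""
    table = G(w)
    if len(table) != 2 ** len(x):
        raise ValueError("G(w) must be a truth table on len(x) input bits")
    return table[bits_to_index(x)]


def toy_G(w: Bits) -> Bits:
    """Hypothetical 2-bit-seed 'generator' producing a 4-entry truth table (not a PRG)."""
    return (w[0], w[1], w[0] ^ w[1], 1 - w[0])


print(hard_language(toy_G, (1, 0), (1, 0)))  # 1
```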

Issue (iii). Finally, this issue is handled using a more careful description of the hard language $L$. The proof makes use of the fact that we are able to get a quantum natural property against $C[n^d]$ for each fixed choice of $d$, since by assumption we have learning algorithms for arbitrarily large polynomial-size concepts from $C$. Thus, a potential loss in the stretch of the generator (from $2^n$ to just $2^{n^\lambda}$ for some $\lambda > 0$) can be compensated by considering a natural property against harder functions, together with standard translation arguments from complexity theory. The weaker infinitely-often guarantee of the generator can also be addressed with a careful implementation, and we refer to the formal proof for these technical details. We highlight that the construction of better (conditional) pseudorandom generators immediately leads to a tighter connection between the circuit size in the lower bound and the circuit size in the learning assumption.⁵

Revised Step 3: (conditional) PRG against uniform quantum computations. This leaves us with the last and most technical part of the proof: the construction of a PRG of sub-exponential stretch $2^{n^{\Omega(1)}}$ that fools uniform quantum computations infinitely often, assuming $\mathsf{PSPACE} \not\subseteq \mathsf{BQSUBEXP}$ (obtained from Item (b) above). In Section 1.2.1, we explained that establishing this result in the quantum setting is more delicate than in the classical scenario. Here we focus on the different components employed in the proof and how they fit together in the PRG construction.
First, we stress that our PRG $G_n : \{0,1\}^{\ell(n)} \to \{0,1\}^{m(n)}$ is defined classically, i.e., it is computed by a deterministic algorithm running in time exponential in $\ell$. The extension to quantum circuits lies in its security analysis, where we argue that if there is a sequence $\{C_m\}$ of uniform quantum circuits of size $\mathrm{poly}(m)$ that distinguish the output of $G$ from random, then $\mathsf{PSPACE} \subseteq \mathsf{BQSUBEXP}$.
As alluded to above, we employ the hardness versus randomness paradigm. More precisely,
our construction relies on the beautiful insight of Impagliazzo and Wigderson [IW01] on how to
extend this paradigm to the uniform case, where the non-uniform advice appearing in the security
proof can be eliminated via a clever recursive approach. This allows one to prove security based
on a uniform hardness assumption. Following [IW01], in order to achieve this we also base the
generator on a problem that is downward-self-reducible and self-correctable (such properties are
useful to eliminate advice). However, we deviate from their work in terms of some other technical
tools and parameters of our PRG construction, which is formally shown in Section 5.
In more detail, we construct a family of generators $G^\lambda_n$, each parameterized by a fixed $\lambda > 0$. Each $G^\lambda_n$ employs a special $\mathsf{PSPACE}$-complete language $L^\star$ on $n$ input bits from Trevisan and Vadhan [TV07] (which has the self-reducibility properties cited above), and applies the well-known Nisan-Wigderson generator [NW94] with sub-exponential stretch $m(n) = 2^{n^\lambda}$ to an “amplified” version $\mathrm{Amp}(L^\star)$ of $L^\star$ defined over $\mathrm{poly}(n)$ input bits. $\mathrm{Amp}(L^\star)$ is obtained from $L^\star$ as follows. First, we consider its $k$-product $(L^\star)^k : \{0,1\}^{kn} \to \{0,1\}^k$, which is discussed in Section 1.2.2 in the context of hardness amplification and a result from [IJKW10]. Then, we convert $(L^\star)^k$ into a Boolean function by a standard application of the Goldreich-Levin construction [GL89], which takes the inner product modulo 2 of the output of $(L^\star)^k$ with a new input string $r \in \{0,1\}^k$ (i.e., the XOR of the output bits selected by $r$). This completes the sketch of the definition of $G^\lambda_n$. By a careful choice of parameters $k = \mathrm{poly}(n)$ and seed length $\ell(n) = \mathrm{poly}(n)$, it is not hard to prove that $G^\lambda_n : \{0,1\}^{\ell(n)} \to \{0,1\}^{m(n)}$ can be computed in time $2^{O(\ell(n))}$.

⁵ Indeed, this has happened in practice, where techniques employed to design a better PRG also led to a new learning-to-lower-bound connection [CRTY20]. Our work shows that this is not a coincidence, i.e., better PRGs lead to better connections in a black-box way.
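The following Python sketch puts the construction of $G^\lambda_n$ together at toy scale. It is not the paper's construction: the stand-in function (AND on 2 bits instead of $L^\star$), the choice $k = 1$ for the generator, and the combinatorial design built from low-degree polynomials over $\mathbb{F}_q$ (a standard choice for Nisan-Wigderson designs) are all illustrative assumptions, and the real parameters ($k = \mathrm{poly}(n)$, $m(n) = 2^{n^\lambda}$) are far larger.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]


def amplify(L: Callable[[Bits], int], n: int, k: int) -> Callable[[Bits], int]:
    """Amp(L) on kn + k bits: the inner product mod 2 of L^k(x_1, ..., x_k) with r."""

    def amp(z: Bits) -> int:
        if len(z) != k * n + k:
            raise ValueError(f"expected {k * n + k} bits, got {len(z)}")
        xs, r = z[: k * n], z[k * n:]
        out = [L(xs[i * n:(i + 1) * n]) for i in range(k)]  # direct product L^k
        return sum(o & ri for o, ri in zip(out, r)) % 2      # Goldreich-Levin bit

    return amp


def polynomial_design(q: int, deg: int) -> List[List[int]]:
    """For prime q: one set {(a, p(a)) : a in F_q} per polynomial p of degree < deg,
    with the point (a, b) encoded as a*q + b in a universe of size q^2.
    Distinct polynomials agree on < deg points, so two sets share < deg elements."""
    design = []
    for coeffs in product(range(q), repeat=deg):
        design.append([a * q + sum(c * a ** j for j, c in enumerate(coeffs)) % q
                       for a in range(q)])
    return design


def nw_generator(h: Callable[[Bits], int], design: Sequence[Sequence[int]], seed: Bits) -> Bits:
    """Nisan-Wigderson: the i-th output bit is h applied to the seed restricted to S_i."""
    return tuple(h(tuple(seed[j] for j in S)) for S in design)


def AND(x: Bits) -> int:
    """Toy stand-in for L* on n = 2 bits."""
    return x[0] & x[1]


h = amplify(AND, n=2, k=1)               # 3-bit function, matching sets of size q = 3
design = polynomial_design(q=3, deg=2)   # 9 sets of size 3 in a universe of 9 seed bits
print(nw_generator(h, design, (1,) * 9))  # (1, 1, 1, 1, 1, 1, 1, 1, 1)
```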
The non-trivial aspect here is to prove that $G^\lambda_n$ is quantum secure for some choice of $\lambda$. It is enough to argue that, if for every $\lambda > 0$ the generator $G^\lambda_n$ can be broken, then $L^\star \in \mathsf{BQTIME}[2^{n^{o(1)}}]$. Since this language is $\mathsf{PSPACE}$-complete, this violates our uniform hardness assumption. So we focus next on a fixed $\lambda > 0$ and its corresponding generator $G^\lambda_n$, and assume that there is a uniform sequence of quantum circuits $\{C_m\}$ of size $\mathrm{poly}(m)$ that distinguishes its output from random. Our goal is to conclude that $L^\star \in \mathsf{BQTIME}[2^{n^{O(\lambda)}}]$. To achieve this, we need quantum analogues of the “reconstruction” procedures of [NW94, GL89, IJKW10] implemented in a uniform way as in [IW01].

Quantum Nisan-Wigderson reconstruction. We observe that the original analysis of [NW94] can
be adapted without difficulty in our setting. In Section 4.1, we first verify the result in the inter-
mediate model of inherently probabilistic circuits and then extend it to the quantum case.

Quantum Goldreich-Levin reconstruction. We observe that a quantum analogue of [GL89] was es-
tablished by Adcock and Cleve [AC02]. In Section 4.2, we adapt their argument to match our nota-
tion and setting of parameters. For this reason, we do not use inherently probabilistic computations
here, which are convenient in the investigation of the other building blocks of our PRG construction.

Quantum Impagliazzo-Jaiswal-Kabanets-Wigderson reconstruction. As explained above, this turns out to be the most complicated aspect of the security proof, since it is non-trivial to extend the
original analysis from [IJKW10] to the quantum setting. We refer to Section 1.2.2 above for an
informal explanation, and to Section 4.3 for more details. We remark that it is not hard to quantize
the well-known XOR Lemma for hardness amplification (e.g., Impagliazzo’s proof [Imp95]), but in
this work we need a uniform hardness amplification result with near-optimal parameters.

It remains to put together these tools to conclude that the existence of a quantum distinguisher $\{C_m\}$ implies that $L^\star \in \mathsf{BQTIME}[2^{n^{O(\lambda)}}]$. As explained above, this is done in a uniform way following the approach of [IW01]. However, controlling the uniformity of the final sequence of quantum circuits that compute $L^\star$ is delicate. Recall that to show that $L^\star \in \mathsf{BQTIME}[T]$ we need to produce, in uniform deterministic time $T$, the description of a quantum circuit $Q_n$ for $L^\star$ on inputs of length $n$. In the recursive construction of [IW01] that provides an algorithm to compute $L^\star$ on an input string $x \in \{0,1\}^n$, one “learns” how to compute $L^\star$ on every input length from 1 to $n$, by producing a sequence of randomized circuits $D_1, \ldots, D_n$ that compute $L^\star \cap \{0,1\}^i$ for $i \in \{1, \ldots, n\}$. In order to compute $D_i$ from $D_{i-1}$, it is necessary to simulate $D_{i-1}$ on some inputs, which requires randomness. Similarly, the natural way to proceed in the quantum case is by generating a uniform sequence of quantum circuits $Q_1, \ldots, Q_n$ as above. Note that simulating $Q_{i-1}$ to produce $Q_i$ now requires a quantum computation. However, we must be able to deterministically produce the code of quantum circuits in order to show that $L^\star \in \mathsf{BQTIME}[T]$. We address this issue in Section 4.4 by going to another “meta-level” in this simulation, where the circuit $Q_i$ “incorporates” this recursive process from $i$ to 1 by manipulating the codes of quantum circuits $Q_{i-1}$ to $Q_1$. (Formally, this is not too different from [IW01], which also manipulates descriptions of circuits, but in the quantum case one needs to be more explicit.) We observe that it is not hard to implement this idea if the uniform versions of the quantum reconstruction procedures described above are stated in a convenient way.
This completes the sketch of Step 3, and our overview of the proof of Theorem 1.
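The distinction between simulating $Q_{i-1}$ and manipulating its code can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: descriptions are nested tuples rather than quantum circuits, and PARITY stands in for the downward-self-reducible language $L^\star$. The point is only that `describe` produces the code of $Q_n$ deterministically, without running anything, while the recursion from $n$ down to 1 happens when the described circuit is evaluated.

```python
from typing import Tuple, Union

# A circuit "description" is plain data: ("base",) or ("step", <description of Q_{i-1}>).
Desc = Union[Tuple[str], Tuple[str, "Desc"]]


def describe(i: int) -> Desc:
    """Deterministically write down the description of Q_i by embedding that of Q_{i-1}.
    Nothing is executed here; only descriptions are manipulated."""
    desc: Desc = ("base",)
    for _ in range(2, i + 1):
        desc = ("step", desc)
    return desc


def run(desc: Desc, x: Tuple[int, ...]) -> int:
    """Interpreter for the toy downward-self-reducible language PARITY:
    PAR(x_1 ... x_i) = x_i XOR PAR(x_1 ... x_{i-1}), with PAR(x_1) = x_1."""
    if desc[0] == "base":
        return x[0]
    return x[-1] ^ run(desc[1], x[:-1])


print(describe(3))                     # ('step', ('step', ('base',)))
print(run(describe(4), (1, 0, 1, 1)))  # 1
```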

1.3 Directions and open problems
The most ambitious direction is to address the possibility that quantum computation might
lead to faster learning algorithms for expressive classes of concepts.

Question 1. Is there a quantum learning algorithm for Boolean circuits of size $O(n)$ that runs in time $o(2^n/n)$ and with constant probability achieves error $\varepsilon \le 0.49$ under the uniform distribution?

Addressing this and related problems might be out of reach given our current techniques. We focus
below on directions that we find particularly interesting and possibly fruitful.
It has been suggested to us that it might also be possible to extend existing algorithms for local-
list decoding of Reed-Muller codes to the inherently probabilistic setting. (This would provide an
alternative presentation of our PRG construction.) While we have not verified all the details,
we believe that this is quite plausible. It would be interesting to understand if there is a more
general phenomenon in place here, i.e., whether certain classes of ECCs and decoding algorithms
can be extended to the inherently random setting in a generic fashion, and to investigate further
applications of such codes.
Our results offer the exciting possibility that new circuit lower bounds might follow through
the design of quantum algorithms. Recall that Williams [Wil14] established via the satisfiability-to-lower-bound connection that $\mathsf{NE} \not\subseteq \mathsf{ACC}^0$. Similarly, can we use our new learning-to-lower-bound connection to show that $\mathsf{ACC}^0$ circuits cannot compute all functions in $\mathsf{BQE}$? Using our techniques
(cf. Corollary 3.6), it would be sufficient to provide a positive answer to the following question.

Question 2. Is there a (promise) quantum natural property useful against $\mathsf{ACC}^0$ circuits?

Note that in order to achieve this it suffices to construct a quantum natural property for the
larger class SYM+ of quasi-polynomial size depth-2 circuits consisting of an arbitrary symmetric
gate at the top layer fed with AND gates of poly-logarithmic fan-in at the bottom layer [BT94].
Similarly, it is enough to get a quantum natural property for the class of functions that can be
approximated by a torus polynomial of bounded degree [BHLR19], or for Boolean matrices of
bounded “symmetric rank” [Wil18]. Perhaps quantum computations can be helpful in the design
of natural properties for these or for other circuit classes (e.g., Boolean formulas of size $n^{3.01}$)?
We are also curious about the prospects of designing non-trivial quantum learners for restricted
circuit classes. Servedio and Tan [ST17] explored this possibility in the context of classical compu-
tation, showing several examples where non-trivial savings can be achieved compared to the trivial
“brute-force” learning algorithm that runs in time $O(2^n)$. As alluded to above, another regime of parameters in the quantum setting that might be interesting to explore is that of polynomial-time learners that achieve a non-trivial advantage $\gamma \gg 2^{-n/2}$.
Finally, the investigation of the classical learning-to-lower-bound connection established in
[OS17] led to many other results, such as the existence of a learning speedup phenomenon [OS17,
Lemma 1] and unconditional complexity lower bounds [Oli19]. It would be interesting to see if new
consequences in quantum complexity theory and quantum learning theory can be obtained using
the techniques introduced in this work.

1.4 Related work


There is a rich history of connections between circuit complexity theory and the investigation
of learning algorithms. In many cases, specific circuit lower bound techniques have been used to
design new algorithms. For instance, Linial, Mansour, and Nisan [LMN93] relied on the method
of random restrictions to show that constant-depth polynomial-size circuits can be PAC-learned
in quasi-polynomial time under the uniform distribution from random examples. In a more recent
development, Carmosino, Impagliazzo, Kabanets, and Kolokolova [CIKK16] showed that learning
algorithms can be obtained from any lower bound technique that is “constructive” in the sense
of the theory of natural proofs of Razborov and Rudich [RR97]. This allowed them to show that
constant-depth polynomial-size circuits augmented with parity gates can be PAC learned in quasi-
polynomial time under the uniform distribution from membership queries. These, and several other
results (cf. Servedio and Tan [ST17]), show that a circuit lower bound against a circuit class C, in
most cases, can be converted into a learning algorithm for C.
The first results in the opposite direction, i.e., showing that learning algorithms imply circuit
lower bounds, were established by Fortnow and Klivans [FK09]. In their work, among other results, they proved that any sub-exponential time (i.e., $2^{n^{o(1)}}$), deterministic exact learning algorithm for $C$ (using membership and equivalence queries) implies the existence of a function in $\mathsf{E}^{\mathsf{NP}}$ that
is not in C. They also showed that if C is PAC learnable with membership queries under the
uniform distribution or exact learnable in randomized polynomial-time, there is a function in BPE
(an exponential time analog of BPP) that is not in C. Note that these results have the appealing
feature that they make no assumption about the techniques employed in the design of the learning
algorithm. Nevertheless, there is an important drawback in the initial results of [FK09]: they make
strong assumptions about the resources of the learning algorithm. For example, many learning
algorithms are randomized and require at least quasi-polynomial time (e.g., [LMN93, CIKK16]),
and the results from [FK09] do not apply in this case.
Over the last decade, several works have addressed this and other aspects of the learning-to-lower-bound connection. Harkins and Hitchcock [HH13] eliminated the NP oracle and showed that $\mathsf{E} \not\subseteq C$ follows from the existence of exact deterministic sub-exponential time learning algorithms with membership and equivalence queries. Shortly after, Klivans, Kothari, and Oliveira [KKO13] simplified these proofs, strengthened a few existing learning-to-lower-bound connections, and obtained results for additional learning models (e.g., learning from statistical queries). In particular, [KKO13] proved that $\mathsf{E} \not\subseteq C$ if there are exact learners for $C$ running in deterministic time $o(2^n)$, showing that “non-trivial” deterministic learners yield lower bounds.
In a different direction, Volkovich [Vol14] showed how to extract lower bounds in polynomial-
time classes from randomized learners running in polynomial time. More precisely, [Vol14] shows
that BPP/1 (BPP with 1 bit of non-uniform advice per input length) is not contained in $C[n^k]$
for all k ≥ 1 if C can be PAC learned with membership queries under the uniform distribution
by randomized learners running in polynomial time. In another work, Volkovich [Vol16] explores
connections between algebraic complexity lower bounds and learning algorithms.
While extracting circuit lower bounds from non-trivial deterministic learners can be done using
elementary techniques [KKO13], extending the result to randomized learners required advanced
tools from complexity theory. This was first achieved by Oliveira and Santhanam [OS17], who
proved that if for every $k \ge 1$ the class $C[n^k]$ can be PAC learned under the uniform distribution with membership queries by a randomized algorithm running in time $2^n/n^{\omega(1)}$, then for every $k \ge 1$ we have $\mathsf{BPE} \not\subseteq C[n^k]$. Their result admits an extension to faster learners with a weaker advantage
γ, with parameters similar to our Theorem 1. A simpler proof that non-trivial randomized learning
yields lower bounds, with a stronger consequence, was obtained by Oliveira and Santhanam [OS18]
under the additional assumption that the learner has “pseudo-deterministic” behaviour. We refer
to their work for details.
In a more recent work, Oliveira [Oli19] explored the learning-to-lower-bound connection in an
indirect way to show that a natural problem in time-bounded Kolmogorov complexity cannot be
solved in probabilistic polynomial time. This can be achieved by proving a lower bound under the
assumption that learning algorithms do not exist (since otherwise a useful lower bound immediately
follows from an existing learning-to-lower-bound connection).
Chen, Rothblum, Tell, and Yogev [CRTY20] established a “fine-grained” learning-to-lower-
bound connection with respect to circuit size in the setting of randomized learners. More precisely,
they show that learning general Boolean circuits of size $n \cdot \mathrm{poly}(\log n)$ in randomized time $2^{n/\mathrm{poly}(\log n)}$ yields a function in $\mathsf{BPE}$ that cannot be computed by circuits of size $n \cdot \mathrm{poly}(\log n)$. While their running time assumption is much more stringent than $2^n/n^{\omega(1)}$, it is not necessary to learn circuits
of large polynomial size to get interesting lower bound consequences.
In a parallel line of work, Williams [Wil13] employed completely different methods to establish
that “non-trivial” (running in time $2^n/n^{\omega(1)}$) deterministic satisfiability algorithms for a class $C$ of polynomial-size circuits imply that $\mathsf{NE} \not\subseteq C$, where $\mathsf{NE}$ is an exponential time analogue of $\mathsf{NP}$. The
satisfiability-to-lower-bound connection and its extensions have been highly successful in establish-
ing new circuit lower bounds for a variety of circuit classes (see e.g., [Wil14, MW18, CR20, CLW20]
and references therein). Chen, Oliveira, and Santhanam [COS18] combined learning and satisfiabil-
ity algorithms to strengthen lower bound consequences from learning algorithms in the particular
case of $C = \mathsf{ACC}^0$ (constant-depth polynomial-size circuits extended with modular gates).
In contrast to all these works, here we establish the first general connection between the de-
sign of quantum (learning) algorithms for an arbitrary class C of polynomial-size circuits and the
existence of lower bounds against C.

Organization. In Section 2, we fix notation and review a few basic definitions and results, as well
as formalize our quantum learning model and the useful concept of inherently probabilistic circuits. In
Section 3, we establish that non-trivial quantum learners imply lower bounds (Theorem 1), under
the assumption that a conditional PRG against quantum computations exists. Section 4 develops
the tools that are needed to establish the correctness of our PRG construction. Finally, Section 5
defines and analyses a PRG with the desired properties, which completes the proof of Theorem 1.

Acknowledgements. We thank Lijie Chen (MIT) for several comments and feedback on a pre-
liminary version of this paper, and the anonymous reviewers of QIP’2021 and FOCS’2021 for many
useful remarks about the presentation. S.A. was partially supported by the IBM Research Frontiers
Institute and acknowledges support from the Army Research Laboratory, the Army Research Office
under grant number W911NF-20-1-001. A.S. was partially supported by the Joint Centre for Quan-
tum Information and Computer Science, University of Maryland, USA and acknowledges support
from the Department of Defense. Part of this work was done while A.G. was affiliated with CWI and
QuSoft. Part of this work was done while S.A., A.G., and A.S. were participating in the program
“The Quantum Wave in Computing” at Simons Institute for the Theory of Computing. This work
received support from the Royal Society University Research Fellowship URF\R1\191059 and the
UKRI Future Leaders Fellowship MR/S031545/1.

2 Preliminaries
2.1 Basic definitions and notation
We begin with standard notation that will be employed throughout this work.

– We use N to denote the set {1, 2, 3, . . .} of positive integers.


– We denote by $U_n$ the uniform distribution over $n$-bit strings.
– When sampling a uniformly random element a from a set A, we might use a ∈ A or a ∼ A.
– We say that two Boolean functions $f, g : \{0,1\}^n \to \{0,1\}$ are $\lambda$-close if $\Pr_{x \sim U_n}[f(x) \ne g(x)] \le \lambda$.
– We say that $f$ computes $g$ with advantage $\gamma$ if $\Pr_{x \sim U_n}[f(x) = g(x)] \ge 1/2 + \gamma$.
– We use $\mathrm{negl}(n)$ to denote a function $g : \mathbb{N} \to \mathbb{R}_{\ge 0}$ such that, for every univariate polynomial $p(\cdot)$ with positive coefficients, there exists $n_0 \in \mathbb{N}$ such that $g(n) \le 1/p(n)$ for every $n \ge n_0$.
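For small $n$ these definitions can be checked by brute force. The following Python sketch (function names hypothetical) computes the disagreement probability exactly with rational arithmetic; it is exponential in $n$ and meant only to illustrate the definitions of $\lambda$-closeness and advantage.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]
BoolFn = Callable[[Bits], int]


def disagreement(f: BoolFn, g: BoolFn, n: int) -> Fraction:
    """Pr_{x ~ U_n}[f(x) != g(x)], computed exactly by enumerating {0,1}^n."""
    cube = list(product((0, 1), repeat=n))
    return Fraction(sum(f(x) != g(x) for x in cube), len(cube))


def is_close(f: BoolFn, g: BoolFn, n: int, lam: Fraction) -> bool:
    """f and g are lam-close if they disagree on at most a lam fraction of inputs."""
    return disagreement(f, g, n) <= lam


def advantage(f: BoolFn, g: BoolFn, n: int) -> Fraction:
    """Largest gamma such that Pr[f(x) = g(x)] >= 1/2 + gamma."""
    return (1 - disagreement(f, g, n)) - Fraction(1, 2)


def AND(x: Bits) -> int:
    return x[0] & x[1]


def ZERO(x: Bits) -> int:
    return 0


print(disagreement(AND, ZERO, 2), advantage(AND, ZERO, 2))  # 1/4 1/4
```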

We assume that the reader is familiar with standard complexity classes (e.g., PSPACE) and
basic notions from classical computational complexity theory and refer to a textbook such as [AB09]
for more information. Some notions from quantum computing and quantum complexity theory are
reviewed in Section 2.2.

2.1.1 Useful results


The following standard concentration bound will be used in some proofs.

Lemma 2.1 (Chernoff bound; see e.g. [AS16, Theorem A.1.15]). Let $X_1, \ldots, X_k$ be i.i.d. $\{0,1\}$-valued random variables each taking value 1 with probability $p$. Let $X = \sum_i X_i$ and $\mu = pk$. Then for any $\delta \in (0,1)$ it follows that
$$\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu/2}.$$

An alternate version of this bound will also be useful.

Lemma 2.2 (Chernoff bound; see e.g. [JLR11, Theorem 2.1]). Let $X_1, \ldots, X_k$ be i.i.d. $\{0,1\}$-valued random variables each taking value 1 with probability $p$. Let $X = \sum_i X_i$ and $\mu = pk$. Then for any $t \ge 0$ it follows that
$$\Pr\bigl[|X - \mathbb{E}[X]| \ge t\bigr] \le \exp\left(-\frac{t^2}{2(\mu + t/3)}\right).$$
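As a numerical sanity check (not a proof), the following Python sketch computes exact binomial tails and compares them against the bounds of Lemmas 2.1 and 2.2 for a few parameter choices picked arbitrarily for illustration.

```python
import math


def binom_pmf(k: int, p: float, j: int) -> float:
    return math.comb(k, j) * p ** j * (1 - p) ** (k - j)


def lower_tail(k: int, p: float, delta: float) -> float:
    """Exact Pr[X <= (1 - delta) * mu] for X ~ Bin(k, p), mu = pk."""
    mu = p * k
    return sum(binom_pmf(k, p, j) for j in range(k + 1) if j <= (1 - delta) * mu)


def two_sided_tail(k: int, p: float, t: float) -> float:
    """Exact Pr[|X - E[X]| >= t] for X ~ Bin(k, p)."""
    mu = p * k
    return sum(binom_pmf(k, p, j) for j in range(k + 1) if abs(j - mu) >= t)


def lemma_2_1_bound(k: int, p: float, delta: float) -> float:
    return math.exp(-delta ** 2 * p * k / 2)


def lemma_2_2_bound(k: int, p: float, t: float) -> float:
    mu = p * k
    return math.exp(-t ** 2 / (2 * (mu + t / 3)))


for k, p in [(100, 0.3), (200, 0.5)]:
    for delta in (0.1, 0.3, 0.5):
        assert lower_tail(k, p, delta) <= lemma_2_1_bound(k, p, delta)
    for t in (5, 10, 20):
        assert two_sided_tail(k, p, t) <= lemma_2_2_bound(k, p, t)
print("bounds hold for the sampled parameters")
```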
We will also need the following form of the Hoeffding bound [Hoe63].

Lemma 2.3 (Hoeffding bound [Hoe63]). Let $F : V \to [0,1]$ be a function over a finite set $V$ with expectation $\mathbb{E}_{x \sim V}[F(x)] = \alpha \in [0,1]$. Let $R$ be a random subset of $V$ of size $t$, and consider the random variable $X = \sum_{x \in R} F(x)$. Then the expectation of $X$ is $\mu = \alpha t$, and for any $0 \le \gamma \le 1$,
$$\Pr\bigl[|X - \mu| \ge \gamma\mu\bigr] \le 2 \cdot e^{-\gamma^2 \mu/3}.$$
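A numerical sanity check of Lemma 2.3 in the special case where $F$ is $\{0,1\}$-valued, so that $X$ is hypergeometric and its tail can be computed exactly. The parameters ($|V| = 1000$, $\alpha = 1/2$, $t = 200$) are arbitrary illustrative choices, and this Python sketch is a check, not a proof.

```python
import math


def hypergeometric_tail(N: int, ones: int, t: int, gamma: float) -> float:
    """Exact Pr[|X - mu| >= gamma * mu], where X = sum of F over a uniformly random
    t-subset R of a set V of size N, and F is 0/1-valued with `ones` ones (alpha = ones / N)."""
    mu = t * ones / N
    total = math.comb(N, t)
    hits = sum(math.comb(ones, j) * math.comb(N - ones, t - j)
               for j in range(t + 1) if abs(j - mu) >= gamma * mu)
    return hits / total


def lemma_2_3_bound(N: int, ones: int, t: int, gamma: float) -> float:
    mu = t * ones / N
    return 2 * math.exp(-gamma ** 2 * mu / 3)


for gamma in (0.1, 0.2, 0.5):
    assert hypergeometric_tail(1000, 500, 200, gamma) <= lemma_2_3_bound(1000, 500, 200, gamma)
print("Lemma 2.3 holds for the sampled parameters")
```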

2.1.2 Circuit and concept classes


For a typical Boolean circuit class $C$ such as $\mathsf{AC}^0$, $\mathsf{TC}^0$, Formulas, etc., and for a size function $s : \mathbb{N} \to \mathbb{N}$ measuring number of gates, we use $C[s]$ to denote the set of languages $L \subseteq \{0,1\}^*$ that can be computed by a sequence of $C$-circuits of size $s(n)$. For circuit classes of bounded depth, we might use $C_d$ when referring to circuits of depth at most $d$. We stress that our notation refers to
non-uniform circuit classes, and consequently the complexity lower bound appearing in Theorem
1 is a non-uniform circuit complexity lower bound.
We will also employ circuit classes in the context of learning algorithms, where they are often referred to as concept classes. In this case, we will abuse notation and view $\mathcal{C}[s(n)]$ as the class of Boolean functions $f : \{0,1\}^n \to \{0,1\}$ that can be computed by $\mathcal{C}$-circuits over $n$ input variables of size at most $s(n)$. For convenience, we use $\mathrm{size}_{\mathcal{C}}(f)$ to denote the minimum number of gates in a $\mathcal{C}$-circuit computing $f$. We omit the subscript $\mathcal{C}$ in $\mathrm{size}_{\mathcal{C}}(f)$ when we refer to general Boolean circuits. For concreteness, in this case circuit size refers to the number of gates in the model consisting of fan-in two circuits over AND, OR, and NOT gates.
Theorem 1 applies to a broad family of circuit classes investigated in computational complexity
theory, including fixed-depth classes such as depth-2 circuits consisting of majority gates. The only
assumptions needed on the circuit class C are that:
(i) Any function $f_n \in \mathcal{C}[s(n)]$ can be computed by a general Boolean circuit of size $\mathrm{poly}(s(n))$. This is the case, for instance, if every gate allowed in $\mathcal{C}$ can be simulated by a Boolean circuit of size polynomial in the number of input wires of the gate.
(ii) The class $\mathcal{C}$ is closed under restrictions of input variables to the constants 0 and 1. In other words, if $f_n \in \mathcal{C}[s(n)]$ and we fix some input variables of $f_n$ to obtain a function $f'_n$ on the remaining variables, then $f'_n$ is also computed by a $\mathcal{C}$-circuit of size at most $s(n)$.
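To make the restriction operation in Item (ii) concrete, the sketch below computes the truth table of the sub-function obtained by fixing some input variables. The lexicographic truth-table ordering and the helper names are assumptions made for illustration. The code shows only the restriction itself. Item (ii) is the separate assumption that the restricted function still has a small $\mathcal{C}$-circuit.

```python
from itertools import product

def truth_table(f, n):
    # Outputs of f on all x in {0,1}^n, listed in lexicographic order of x
    return [f(x) for x in product((0, 1), repeat=n)]

def restrict(tt, n, fixed):
    """Truth table of the sub-function of an n-variable function (given by tt)
    obtained by fixing the variables in `fixed` (a dict index -> bit); the
    remaining variables keep their relative order."""
    free = [i for i in range(n) if i not in fixed]
    out = []
    for y in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in fixed.items():
            x[i] = b
        for i, b in zip(free, y):
            x[i] = b
        index = 0
        for b in x:  # position of x in lexicographic order
            index = 2 * index + b
        out.append(tt[index])
    return out

maj = truth_table(lambda x: int(sum(x) >= 2), 3)
print(restrict(maj, 3, {0: 1}))  # OR of the two remaining variables
print(restrict(maj, 3, {0: 0}))  # AND of the two remaining variables
```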
These assumptions are needed only in Section 3.2, and we rely on each of them as follows. We use Item (i) to show that if a language $L$ is not computable by (unrestricted) Boolean circuits of size $n^{\alpha}$ for some $\alpha \ge 1$, then it cannot be computed by $\mathcal{C}$-circuits of size $n^{\beta}$, where $\beta = \alpha / C'$ for some universal positive constant $C'$ that is independent of $\alpha$. On the other hand, we rely on Item (ii) to say that if a function $f : \{0,1\}^n \to \{0,1\}$ can be computed by $\mathcal{C}$-circuits of size polynomial in $n$, then any sub-function of $f$ defined over $n' = n^{\Omega(1)}$ input variables via a restriction also admits $\mathcal{C}$-circuits of size polynomial in $n'$.
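The first use of Item (i) is a one-line calculation. Assume, for concreteness, that Item (i) takes the form $\mathrm{size}(f) \le \mathrm{size}_{\mathcal{C}}(f)^{C'}$ whenever $\mathrm{size}_{\mathcal{C}}(f) \ge 2$. This is our simplifying reading: any polynomial bound is of this form for large enough sizes. Writing $L_n$ for the restriction of $L$ to inputs of length $n$ (notation used only in this sketch), the exponent $\beta = \alpha/C'$ arises as follows:

```latex
Suppose, towards a contradiction, that $L \in \mathcal{C}[n^{\beta}]$ with $\beta = \alpha / C'$.
Then, for every large enough $n$,
\[
  \mathrm{size}(L_n) \;\le\; \big(\mathrm{size}_{\mathcal{C}}(L_n)\big)^{C'}
                     \;\le\; \big(n^{\beta}\big)^{C'} \;=\; n^{\alpha},
\]
so $L$ has general Boolean circuits of size $n^{\alpha}$, contradicting the hypothesis.
```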
Note that Theorem 1 indeed applies to virtually any class of polynomial-size circuits.
We refer to a standard reference such as [Juk12] for more background on Boolean functions
and circuit complexity theory.

2.1.3 Classical learning algorithms


For the convenience of the reader, we review in this section (classical) learnability under the uniform distribution in the Probably Approximately Correct (PAC) model extended with membership queries. The quantum learning model that we consider in this work and its extensions are discussed in Section 2.3.
Definition 2.4. Let $\mathcal{C}_n$ be a family of Boolean functions on $n$ input variables, $\mathcal{C} = \bigcup_{n \ge 1} \mathcal{C}_n$, $T : \mathbb{N} \to \mathbb{N}$, and $\varepsilon, \delta : \mathbb{N} \to [0,1]$. We say that $\mathcal{C}$ can be $(\varepsilon, \delta)$-learned in time $T$ under the uniform distribution with membership queries if there exists a randomized algorithm $A$ that satisfies the following guarantees:

For all $n \in \mathbb{N}$ and every $f \in \mathcal{C}_n$, given $n$ and query access to $f$, $A$ runs in time $T(n)$ and, with probability at least $1 - \delta(n)$ over its internal randomness, outputs a hypothesis $h$ (encoded as a general Boolean circuit) that satisfies
$$\Pr_{x \sim U_n}[h(x) = f(x)] \ge 1 - \varepsilon(n).$$
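Here is a toy instance of Definition 2.4; it is our example, not one from the text. Parity functions $f_a(x) = \langle a, x\rangle \bmod 2$ can be learned exactly with $n$ membership queries, because querying the unit vector $e_i$ reveals $a_i$. The sketch below implements this learner with $\varepsilon = \delta = 0$ and estimates $\Pr_{x \sim U_n}[h(x) = f(x)]$ by sampling. All helper names are hypothetical.

```python
import random

def make_parity(a):
    # f_a(x) = <a, x> mod 2
    return lambda x: sum(ai & xi for ai, xi in zip(a, x)) % 2

def learn_parity(n, query):
    # One membership query per unit vector e_i returns a_i = f(e_i)
    a_hat = [query(tuple(int(j == i) for j in range(n))) for i in range(n)]
    return make_parity(a_hat)  # hypothesis h, represented here as a Python function

def estimate_agreement(h, f, n, samples, rng):
    # Monte Carlo estimate of Pr_{x ~ U_n}[h(x) = f(x)]
    hits = 0
    for _ in range(samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        hits += h(x) == f(x)
    return hits / samples

rng = random.Random(0)
n = 16
target = make_parity([rng.randint(0, 1) for _ in range(n)])
queries = []

def oracle(x):
    queries.append(x)  # record each membership query
    return target(x)

h = learn_parity(n, oracle)
print(len(queries), estimate_agreement(h, target, n, 1000, rng))  # 16 1.0
```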

It is also possible to focus on a fixed circuit class $\mathcal{C}$ and to consider learnability of functions computed by $\mathcal{C}$-circuits of an arbitrary size $s(n)$, for a fixed learning algorithm $A$ that is independent of $s$ but that is given the value $s_f = \mathrm{size}_{\mathcal{C}}(f)$ (or an upper bound on $s_f$) as part of its input. In this case, we allow the running time of $A$ to depend on $s_f$. Similarly, we can provide the learner with $\delta$ and $\varepsilon$, and allow its running time to depend on these parameters.
We stress that, even when $\mathcal{C}$ explicitly refers to a circuit class and a size measure, the output hypothesis $h$ produced by $A$ is not required to be a circuit from $\mathcal{C}$.
We refer to a standard reference such as [KV94] for more background in computational learning
theory.

2.2 Quantum computation
We assume the reader is familiar with the quantum computing framework and notation, such as Dirac's bra-ket notation. We refer to a standard text such as [NC10] for a detailed introduction to quantum computation. Here, we discuss concepts and notation of specific relevance to this paper.
A quantum circuit $U$ on $n$ qubits is a sequence of quantum gates (i.e., unitary matrices). Throughout this paper we will assume our quantum gates are restricted to the one-qubit Hadamard gate $H$ and the 3-qubit Toffoli gate $\mathrm{Toff}$, defined as follows:
$$H : |b_1\rangle \mapsto \frac{1}{\sqrt{2}}\big(|0\rangle + (-1)^{b_1}|1\rangle\big), \qquad \mathrm{Toff} : |b_1, b_2, b_3\rangle \mapsto |b_1, b_2, b_3 \oplus b_1 \cdot b_2\rangle$$
for $b_1, b_2, b_3 \in \{0,1\}$. The Toffoli gate is particularly useful for us, as any classical circuit can be implemented as a quantum circuit using only Toffoli gates (given auxiliary qubits initialized to constants). Additionally, we choose this set of gates because $\{H, \mathrm{Toff}\}$ is universal for approximating unitaries with only real entries, i.e., every unitary with real entries can be approximated arbitrarily well using only $H$ and $\mathrm{Toff}$ gates [Aha03]. In fact, our results still go through for any gate set of constant size. Accordingly, the size of a quantum circuit is the number of Hadamard and Toffoli gates in the circuit.
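Because $H$ and $\mathrm{Toff}$ have real entries, a small state-vector simulation only needs real amplitudes. The sketch below is pure Python. Its qubit-ordering convention, with qubit 0 as the most significant bit of the basis index, is an assumption. It implements both gates and checks two facts mentioned above. A Toffoli gate whose target starts at 0 computes AND, and one whose target starts at 1 computes NAND. This is why Toffoli gates plus constant auxiliary qubits suffice to implement classical circuits.

```python
import math

def basis(q, bits):
    # |bits> on q qubits as a list of 2^q real amplitudes (qubit 0 = most significant bit)
    state = [0.0] * (2**q)
    index = 0
    for b in bits:
        index = 2 * index + b
    state[index] = 1.0
    return state

def apply_h(state, q, k):
    # Hadamard on qubit k: |b> -> (|0> + (-1)^b |1>) / sqrt(2)
    mask = 1 << (q - 1 - k)
    s = 1 / math.sqrt(2)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        if amp:
            new[i & ~mask] += s * amp
            new[i | mask] += (-s if i & mask else s) * amp
    return new

def apply_toffoli(state, q, c1, c2, t):
    # |b1, b2, b3> -> |b1, b2, b3 XOR (b1 AND b2)> on qubits (c1, c2, t): a permutation
    m1, m2, mt = 1 << (q - 1 - c1), 1 << (q - 1 - c2), 1 << (q - 1 - t)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        new[i ^ mt if (i & m1 and i & m2) else i] += amp
    return new

# Toffoli with target 0 computes AND; with target 1 it computes NAND
for b1 in (0, 1):
    for b2 in (0, 1):
        assert apply_toffoli(basis(3, (b1, b2, 0)), 3, 0, 1, 2) == basis(3, (b1, b2, b1 & b2))
        assert apply_toffoli(basis(3, (b1, b2, 1)), 3, 0, 1, 2) == basis(3, (b1, b2, 1 - (b1 & b2)))

# H is self-inverse (up to floating-point rounding)
print(apply_h(apply_h(basis(1, (1,)), 1, 0), 1, 0))
```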
The classical description of a quantum circuit $U$ on $n$ qubits, denoted as $\mathrm{code}(U)$, is a canonical encoding of the sequence of gates (and the qubits they act on) in the circuit, represented as a binary string. We will repeatedly use the result that, given the description $\mathrm{code}(U)$ of a quantum circuit $U$ of a predetermined size, there exists an efficient universal quantum circuit $\mathcal{U}$ that can simulate the action of $U$ on any $n$-qubit input $|y\rangle$. Formally, we have the following definition and result.

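Definition 2.5 (Universal quantum circuit). Fix $n \in \mathbb{N}$ and let $\mathcal{C}$ be the collection of quantum circuits on $n$ qubits of size $s(n)$. An $(n+m)$-qubit quantum circuit $\mathcal{U}$ is universal for $\mathcal{C}$ if for every circuit $U \in \mathcal{C}$ and associated $\mathrm{code}(U) \in \{0,1\}^*$,
$$\mathcal{U}\big(|y\rangle \otimes |\mathrm{code}(U)\rangle\big) = (U|y\rangle) \otimes |\mathrm{code}(U)\rangle, \quad \text{for every } y \in \{0,1\}^n.$$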

There exist efficient constructions of $\mathcal{U}$ whose size has only a logarithmic-factor blow-up in the parameters $n$ and $s(n)$ (see [BFGH10]). (We note that a polynomial blow-up would still suffice for our constructions.)
We say that a quantum circuit $U$ computes a function $f : \{0,1\}^n \to \{0,1\}$ if the following holds: there exists $m \ge 0$ such that, for every $x \in \{0,1\}^n$, when measuring the first qubit of $U$ on input $|x, 0^m\rangle$, we get $f(x)$ with probability at least $2/3$, i.e.,
$$\big\|\Pi_{f(x)}\, U |x, 0^m\rangle\big\|^2 \ge 2/3,$$
where $\Pi_{f(x)} \stackrel{\text{def}}{=} |f(x)\rangle\langle f(x)| \otimes \mathrm{Id}$. The extra $m$ qubits are called auxiliary qubits. As in probabilistic computing, the constant $2/3$ is not important here, as one can use standard amplification techniques, such as taking the majority vote over $O(\log(1/\delta))$ repetitions, to amplify this probability to $1 - \delta$ for any $\delta > 0$.
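The amplification claim can be quantified exactly for the majority vote. The short sketch below uses our own function names. With per-run success probability $p = 2/3$, it computes the success probability of a majority over $k$ independent runs. It also finds the smallest odd $k$ that pushes the error below $\delta$.

```python
import math

def majority_success(p, k):
    # Pr[more than half of k independent runs are correct], each correct w.p. p
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(k // 2 + 1, k + 1))

def repetitions_needed(p, delta):
    # Smallest odd k such that the majority vote errs with probability at most delta
    k = 1
    while 1 - majority_success(p, k) > delta:
        k += 2
    return k

for delta in (1e-2, 1e-4, 1e-6):
    print(delta, repetitions_needed(2 / 3, delta))
```

The printed values of $k$ grow roughly linearly in $\log(1/\delta)$, in line with the $O(\log(1/\delta))$ repetitions mentioned in the text.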
Often in this paper, we will abuse notation and write $\Pr[U(x) = f(x)]$ when referring to $\|\Pi_{f(x)} U |x, 0^m\rangle\|^2$, the probability that the quantum circuit $U$ on input $x$ outputs $f(x)$ after measurement. Similarly, we might write
$$\Pr_{x \sim \{0,1\}^n,\, U}[U(x) = f(x)] \stackrel{\text{def}}{=} \mathbb{E}_{x \sim \{0,1\}^n}\Big[\big\|\Pi_{f(x)}\, U |x, 0^m\rangle\big\|^2\Big]$$
to refer to the expected agreement between the output of $U$ and $f$ over a random input string.
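The notation $\|\Pi_{f(x)} U|x, 0^m\rangle\|^2$ can be made concrete with a toy example where $n = m = 1$; this example is our choice. The circuit $U$ rotates the auxiliary qubit by $R_y(\theta)$ and then applies a CNOT from it onto the first qubit, so measuring the first qubit returns $x$ with probability $\cos^2(\theta/2)$. The rotation is not in the gate set $\{H, \mathrm{Toff}\}$; it is used here only to obtain a tunable error. With $f(x) = x$, the expected agreement is therefore $\cos^2(\theta/2)$.

```python
import math

def run_U(x, theta):
    # State of (qubit 0, qubit 1) as {(b0, b1): amplitude}, starting from |x, 0>
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    state = {(x, 0): c, (x, 1): s}  # after R_y(theta) on the auxiliary qubit 1
    # CNOT with control qubit 1 and target qubit 0 (a bijection on basis states)
    return {(b0 ^ b1, b1): amp for (b0, b1), amp in state.items()}

def prob_first_qubit(state, value):
    # ||Pi_value psi||^2 with Pi_value = |value><value| (x) Id (amplitudes are real)
    return sum(amp**2 for (b0, _), amp in state.items() if b0 == value)

def agreement(f, theta):
    # Pr_{x ~ {0,1}, U}[U(x) = f(x)] = E_x ||Pi_{f(x)} U |x, 0>||^2
    return sum(prob_first_qubit(run_U(x, theta), f(x)) for x in (0, 1)) / 2

theta = 0.7
print(agreement(lambda x: x, theta), math.cos(theta / 2) ** 2)
```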

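These definitions extend naturally to the case of functions with a non-Boolean output. For instance, if $g : \{0,1\}^n \to \{0,1\}^\ell$, then we use
$$\Pr_{x \sim \{0,1\}^n,\, U}[U(x) = g(x)] \stackrel{\text{def}}{=} \mathbb{E}_{x \sim \{0,1\}^n}\Big[\big\|\Pi_{g(x)}\, U |x, 0^m\rangle\big\|^2\Big],$$
where, for a string $w \in \{0,1\}^\ell$, $\Pi_w \stackrel{\text{def}}{=} |w\rangle\langle w| \otimes \mathrm{Id}$.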


Unlike in classical probabilistic computation, the output qubit of a quantum circuit could end up entangled with junk values left in the auxiliary qubits when they are used in intermediate computations. This could affect the probability of observing $U(x) = f(x)$ on measurement, especially when $U$ is used as a subroutine in larger computations. However, by relying on the reversible nature of unitary computations, we can “remove the garbage” via standard techniques. In particular, given $U$ on $n + m$ qubits that computes $f$,⁶ we construct the garbage-free version $\tilde{U}$ on $n + m + 1$ qubits as follows:

(i) Compute $U$ on input $|x, 0^m\rangle$;

(ii) Copy the first qubit of $U$ into the $(n+m+1)$-th qubit of $\tilde{U}$ with a quantum gate (e.g., a CNOT);

(iii) Un-compute $U$ by applying $U^\dagger$ on the first $n + m$ qubits, i.e., apply the inverse of each gate of $U$ in reverse order (both $H$ and $\mathrm{Toff}$ are self-inverse). This returns $|x, \mathrm{junk}\rangle$ back to $|x, 0^m\rangle$ (exactly when $U$ computes $f$ with certainty, and approximately otherwise);

(iv) Measure the $(n+m+1)$-th qubit to check whether $\tilde{U}(x) = f(x)$, with $\tilde{\Pi}_{f(x)} = |f(x)\rangle\langle f(x)| \otimes |x\rangle\langle x| \otimes |0^m\rangle\langle 0^m|$.
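The four steps can be traced on computational basis states. The sketch below is a classical illustration under our own conventions: gates are tuples, and $U$ is a hypothetical circuit on $n = 3$ input wires and $m = 2$ auxiliary wires that computes the AND of three bits onto wire 0 and leaves junk behind. Toffoli and CNOT are self-inverse, so $U^\dagger$ is $U$ with its gates in reverse order. For this deterministic $U$ the auxiliary wires return exactly to 0. In the bounded-error quantum case this holds only approximately, which is why $U$ is amplified first.

```python
from itertools import product

def apply(gates, bits):
    # Apply reversible gates to a classical basis state (a list of bits)
    bits = list(bits)
    for gate in gates:
        if gate[0] == "toff":
            _, c1, c2, t = gate
            bits[t] ^= bits[c1] & bits[c2]
        elif gate[0] == "cnot":
            _, c, t = gate
            bits[t] ^= bits[c]
        else:
            raise ValueError(f"unknown gate {gate!r}")
    return bits

n, m = 3, 2
# Hypothetical U: wire 3 <- x0 AND x1, wire 4 <- x2 AND wire 3, then swap wires 0 and 4
U = [("toff", 0, 1, 3), ("toff", 2, 3, 4), ("cnot", 0, 4), ("cnot", 4, 0), ("cnot", 0, 4)]

def garbage_free(U, n, m):
    # U~: (i) compute U, (ii) copy wire 0 into wire n+m, (iii) uncompute with U^dagger
    return U + [("cnot", 0, n + m)] + list(reversed(U))

for x in product((0, 1), repeat=n):
    with_junk = apply(U, list(x) + [0] * m)
    clean = apply(garbage_free(U, n, m), list(x) + [0] * m + [0])
    print(x, "U:", with_junk, "U~:", clean)
```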

The lower bounds in this paper involve quantum complexity classes, which are defined next.
Owing to the probabilistic nature of quantum circuits, all classes of interest are in the bounded
error paradigm. We first consider the uniform class BQTIME (bounded-error quantum time).

Definition 2.6 (BQTIME). Let $f : \{0,1\}^* \to \{0,1\}$ and $t : \mathbb{N} \to \mathbb{N}$. We say that $f$ is computable in bounded-error quantum $t(n)$-time if there exists a deterministic algorithm $A$ which, on input $1^n$, runs in time $\mathrm{poly}(n, t(n))$ and outputs the description of a quantum circuit $U$, represented as a string $\mathrm{code}(U)$, with at most $t(n)$ quantum gates such that $U$ computes $f$ on inputs of length $n$. In this case, we write $f \in \mathsf{BQTIME}[t(n)]$.

We also fix notation for the complexity classes that arise when t(n) scales polynomially or
exponentially in n.

Definition 2.7 (Bounded-error quantum complexity classes). Let $n \in \mathbb{N}$ and $\nu, c > 0$ be constants. Then, quantum polynomial time refers to the class $\mathsf{BQP} := \bigcup_{c > 0} \mathsf{BQTIME}[n^c]$. Similarly, $\mathsf{BQE} := \bigcup_{c > 0} \mathsf{BQTIME}[2^{c \cdot n}]$ refers to the class of languages computable in quantum $2^{O(n)}$-time. Quantum sub-exponential time is denoted as $\mathsf{BQSUBEXP} := \bigcap_{0 < \nu < 1} \mathsf{BQTIME}\big[2^{n^{\nu}}\big]$.

2.3 Quantum learning algorithms and extensions


In this section we define the quantum learning model. In contrast to the classical model, a quantum learner is given quantum oracle access to a concept $f$. This means the learner is allowed to perform a quantum membership query, which is defined by a unitary map $O_f$ acting on $n + 1$ qubits whose action on basis states is defined as
$$O_f : |x, b\rangle \mapsto |x, b \oplus f(x)\rangle,$$
for every $x \in \{0,1\}^n$ and $b \in \{0,1\}$. Naturally, a quantum learner can perform a quantum computation in between quantum queries, and its goal is the same as for a classical learner (defined in Section 2.1.3), i.e., to output a hypothesis $h$ that approximates the target concept $f \in \mathcal{C}$.

⁶ We remark that in Section 4.4, when we need to remove garbage in unitary computations, we will always amplify the success probability of $U$, so as to ensure that $U$ computes $f$ with overwhelming probability before constructing $\tilde{U}$.
We can formalise this model as follows. A quantum learning algorithm $L$ running in time $G$ is described by a uniform sequence of quantum circuits $Q_n$, where each $Q_n$ contains at most $G(n)$ gates. The quantum circuit $Q_n$ expects as input $|0^m\rangle$ for some $m = m(n)$, and consists of a sequence of gates from $\{H, \mathrm{Toff}, O\}$, where $O$ is defined over $n + 1$ qubits and is counted as a single gate. In other words, when learning an unknown Boolean function $f : \{0,1\}^n \to \{0,1\}$, $O$ acts as the unitary map $O_f$ described above. Finally, the output hypothesis of $Q_n$ when computing with quantum membership query access to $f$ (referred to as $Q_n^f$) is described by the string corresponding to the output measurement of $Q_n^f$. We note that intermediate measurements are also allowed in $Q_n$.

Definition 2.8 (Standard quantum learners). Let $\mathcal{C}_n$ be a family of Boolean functions on $n$ input variables and $\mathcal{C} = \bigcup_{n \ge 1} \mathcal{C}_n$. Let $G : \mathbb{N} \to \mathbb{N}$. We say $\mathcal{C}$ can be $(\varepsilon(n), \delta(n))$-learned in time $G(n)$ if there exists a quantum learning algorithm $L$ that satisfies the following:

For all $n \in \mathbb{N}$ and every $f \in \mathcal{C}_n$, given quantum oracle access to $f$, $L$ uses at most $G(n)$ gates and, with probability at least $1 - \delta(n)$ (over the output measurement of $L$), outputs a (classical) hypothesis $h$ that satisfies
$$\Pr_{x \sim U_n}[h(x) = f(x)] \ge 1 - \varepsilon(n).$$

We remark that one distinct advantage of having quantum oracle access to $f$ is that one can efficiently sample from the squared Fourier distribution $\{\hat{f}(S)^2\}_S$, which might be computationally hard for classical learners (we discuss this in more detail in Appendix A). For more on this quantum learning model, we refer the interested reader to [SG04, AW17].
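The Fourier-sampling remark can be checked on a tiny example. The sketch below uses a pure-Python state vector with real amplitudes; the qubit ordering and helper names are our assumptions. It prepares $|0^n\rangle|1\rangle$ and applies $H^{\otimes(n+1)}$. It then makes one query $O_f$, which produces the phase $(-1)^{f(x)}$ because the auxiliary qubit is in state $|-\rangle$, and finally applies $H^{\otimes n}$ to the input register. The resulting distribution on $S \in \{0,1\}^n$ coincides with $\hat{F}(S)^2$, where $F(x) = (-1)^{f(x)}$ and $\hat{F}(S) = \mathbb{E}_x[F(x)(-1)^{S \cdot x}]$. We assume this standard $\pm 1$ convention is what $\hat{f}$ refers to.

```python
import math

def apply_h(state, q, k):
    # Hadamard on qubit k of a q-qubit real state (qubit 0 = most significant bit)
    mask = 1 << (q - 1 - k)
    s = 1 / math.sqrt(2)
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        if amp:
            new[i & ~mask] += s * amp
            new[i | mask] += (-s if i & mask else s) * amp
    return new

def apply_oracle(state, f_table):
    # O_f |x, b> = |x, b XOR f(x)>, with b the last (least significant) qubit
    new = [0.0] * len(state)
    for i, amp in enumerate(state):
        new[i ^ f_table[i >> 1]] += amp
    return new

def fourier_sampling_distribution(n, f_table):
    q = n + 1
    state = [0.0] * (2**q)
    state[1] = 1.0                        # |0^n>|1>
    for k in range(q):
        state = apply_h(state, q, k)      # |+>^n |->
    state = apply_oracle(state, f_table)  # one quantum membership query
    for k in range(n):
        state = apply_h(state, q, k)      # H on the input register only
    # Marginal probability of each outcome S on the first n qubits
    return [state[2 * S] ** 2 + state[2 * S + 1] ** 2 for S in range(2**n)]

def squared_fourier_coefficients(n, f_table):
    # hat F(S)^2 with F(x) = (-1)^f(x) and hat F(S) = E_x[F(x) (-1)^(S.x)]
    return [
        (sum((-1) ** (f_table[x] + bin(S & x).count("1")) for x in range(2**n)) / 2**n) ** 2
        for S in range(2**n)
    ]

maj3 = [int(bin(x).count("1") >= 2) for x in range(8)]  # majority on 3 bits
print([round(p, 4) for p in fourier_sampling_distribution(3, maj3)])
print([round(c, 4) for c in squared_fourier_coefficients(3, maj3)])
```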
In this paper, we consider a natural generalization of this model: we allow the quantum learner to output a quantum hypothesis, meaning that $L$ is allowed to output a classical description of a quantum circuit $\widetilde{U}_f$ that approximates the function $f$. Recall that a circuit $U$ computes $f$ if, for every input $x$, measuring the first qubit of $U$ acting on $|x\rangle$ with auxiliary qubits set to $|0\rangle$, in the computational basis, produces $f(x)$ with probability $\ge 2/3$. In contrast, our notion of a quantum circuit $\widetilde{U}_f$ that approximates $f$ refers to the expected agreement between the output of $\widetilde{U}_f$ and $f$ on a random input $x$, as defined in Section 2.2.
We formally define this model below.⁷

Definition 2.9 (Quantum learning with quantum hypothesis). Let $\mathcal{C}_n$ be a family of Boolean functions on $n$ input variables and $\mathcal{C} = \bigcup_{n \ge 1} \mathcal{C}_n$. Let $G : \mathbb{N} \to \mathbb{N}$. We say $\mathcal{C}$ can be $(\varepsilon(n), \delta(n))$-learned in time $G(n)$ if there exists a quantum learning algorithm $L$ that satisfies the following:

For all $n \in \mathbb{N}$ and every $f \in \mathcal{C}_n$, given quantum oracle access to $f$, $L$ uses at most $G(n)$ gates and, with probability at least $1 - \delta(n)$ (over the output measurement of $L$), outputs the description of a quantum circuit $\widetilde{U}_f$ that satisfies
$$\mathbb{E}_{x \sim U_n}\Big[\big\|\Pi_{f(x)}\, \widetilde{U}_f |x\rangle|0\rangle\big\|^2\Big] \ge 1 - \varepsilon(n),$$
where $\Pi_{f(x)} = |f(x)\rangle\langle f(x)| \otimes \mathrm{Id}$.

⁷ We only define the uniform-distribution learning model here. Note that one can naturally define a quantum learner with quantum hypothesis under an arbitrary distribution by changing $U_n$ to an arbitrary distribution $D$.

2.4 Quantum natural properties


Let $\mathcal{F}_n$ be the family of all Boolean functions on $n$ input bits. We say that $\Gamma = \{\Gamma_n\}_{n \ge 1}$ is a combinatorial property of Boolean functions if $\Gamma_n \subseteq \mathcal{F}_n$ for every $n \ge 1$. For every function $f \in \mathcal{F}_n$ and $N = 2^n$, let $\mathrm{tt}(f) \in \{0,1\}^N$ be the truth table representing $f$. We associate with $\Gamma$ a language $L_\Gamma \subseteq \{0,1\}^*$ defined as follows: a string $x$ is in $L_\Gamma$ if and only if $x = \mathrm{tt}(f)$ for some $n$ and $f \in \Gamma_n$. Conversely, given a string $w \in \{0,1\}^N$, let $\mathrm{fnc}_w$ denote the Boolean function $f : \{0,1\}^n \to \{0,1\}$ such that $\mathrm{tt}(f) = w$.
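The correspondence between $f$, $\mathrm{tt}(f)$ and $\mathrm{fnc}_w$ can be made concrete as follows. The text does not fix an ordering of the $N = 2^n$ entries, so the sketch assumes the lexicographic order of $x \in \{0,1\}^n$. The function names mirror the notation but are otherwise ours.

```python
from itertools import product

def tt(f, n):
    # Truth table of f : {0,1}^n -> {0,1} as a string of length N = 2^n
    return "".join(str(f(x)) for x in product((0, 1), repeat=n))

def fnc(w):
    # The Boolean function whose truth table is w (|w| must be a power of two)
    N = len(w)
    n = N.bit_length() - 1
    if N == 0 or N != 2**n:
        raise ValueError("truth table length must be a power of two")

    def f(x):
        assert len(x) == n
        index = 0
        for b in x:  # position of x in lexicographic order
            index = 2 * index + b
        return int(w[index])

    return f

w = tt(lambda x: x[0] ^ x[1], 2)
print(w)                   # 0110
print(tt(fnc(w), 2) == w)  # True: fnc_w inverts tt
```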

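Definition 2.10 (Natural property [RR97]). Let $\Gamma$ be a combinatorial property, $\mathcal{C}$ be a circuit class, $\mathcal{D}$ be a (uniform or non-uniform) complexity class, and $s : \mathbb{N} \to \mathbb{N}$. We say that $\Gamma$ is a $\mathcal{D}$-natural property useful against $\mathcal{C}[s]$ if there exists $n_0 \in \mathbb{N}$ for which the following holds:

• Constructivity: $L_\Gamma \in \mathcal{D}$, i.e., for every $n$, the question “$f_n \stackrel{?}{\in} \Gamma_n$” is decidable in $\mathcal{D}$.

• Largeness: For every $n \ge n_0$, $\Pr_{f \sim \mathcal{F}_n}[f \in \Gamma_n] \ge 1/2$.

• Usefulness: For every $n \ge n_0$, $\mathcal{C}_n[s(n)] \cap \Gamma_n = \emptyset$.

If $\mathcal{D} = \mathsf{BQP}$, we say that $\Gamma$ is a quantum natural property.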

Note, for instance, that for BPP- and BQP-natural properties the corresponding algorithm (that decides $L_\Gamma$) is allowed to run in time polynomial in the input length $N = 2^n$. We will need to slightly relax the guarantees offered by a natural property in order to establish a connection to learning.

Definition 2.11 (Informal). We say that a circuit class $\mathcal{C}$ admits a promise natural property if the underlying algorithm (that decides $L_\Gamma$) in Definition 2.10 accepts every string in $L_\Gamma$ with probability $\ge 2/3$ and rejects every string encoding a function in $\mathcal{C}_n[s(n)]$ with probability $\ge 2/3$. (For instance, the algorithm might accept some strings with probability close to $1/2$.)

Definition 2.12 (Promise BQP-natural property). We say that there is a (promise) BQP-quantum natural property against $\mathcal{C}$-circuits of size $s$ if there is a quantum algorithm $D$ operating over inputs of length $N = 2^n$ for which the following holds for every large enough parameter $n$:

• Constructiveness: $D$ runs in time polynomial in its input length $N$.

• $s(n)$-hardness: For every $f \in \mathcal{C}_n[s(n)]$, $D$ accepts $\mathrm{tt}(f)$ with probability at most $1/3$.

• Density: There is a set $\Gamma_n \subseteq \{0,1\}^N$ of density $|\Gamma_n|/2^N \ge 1/2$ such that, for every $w \in \Gamma_n$, $\mathrm{fnc}_w \notin \mathcal{C}_n[s(n)]$ and $D$ accepts $w$ with probability at least $2/3$.

Remark 2.13. We stress that, by the definition of BQP, a promise BQP-natural property implies a classical algorithm $A$ such that $A(1^N)$ computes the quantum circuit $D_N$ that deals with inputs of size $N$.

2.5 Self-reducibility
Definition 2.14 (Downward self-reducibility). A function $f : \{0,1\}^* \to \{0,1\}$ is said to be downward self-reducible if there is a deterministic polynomial-time oracle procedure $A^f$ such that:

1. On any input $x$ of length $n$, $A^f(x)$ only makes queries of length $< n$.

2. For every input $x$, $A^f(x) = f(x)$.
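A toy example of Definition 2.14 is PARITY; it is our example and much simpler than the language $L^\star$ used later. Since $\mathrm{PARITY}(x_1 \cdots x_n) = \mathrm{PARITY}(x_1 \cdots x_{n-1}) \oplus x_n$, one oracle query on a string of length $n-1$ suffices. The sketch below checks both conditions of the definition on all inputs of length at most 6.

```python
from itertools import product

def A(x, oracle):
    # Oracle procedure A^f: a single query, on a strictly shorter string
    if len(x) == 0:
        return 0
    return oracle(x[:-1]) ^ x[-1]

def parity(x):
    return sum(x) % 2

def logging_oracle(f, log):
    # Wraps f and records the length of each query
    def oracle(y):
        log.append(len(y))
        return f(y)
    return oracle

for n in range(1, 7):
    for x in product((0, 1), repeat=n):
        log = []
        assert A(x, logging_oracle(parity, log)) == parity(x)  # condition 2
        assert all(length < n for length in log)               # condition 1
print("both conditions hold for all x with |x| <= 6")
```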

Definition 2.15 (Random self-reducibility). A function $f : \{0,1\}^* \to \{0,1\}$ is said to be random self-reducible if there are constants $a, b, c \ge 1$ and polynomial-time computable functions $g : \{0,1\}^* \to \{0,1\}^*$ and $h : \{0,1\}^* \to \{0,1\}$ satisfying the following conditions:

1. For large enough $n$, for every $x \in \{0,1\}^n$ and for each $i \in [n^a]$, $g(i, x, r) \sim U_n$ when $r \sim U_{n^c}$.

2. For large enough $n$ and for every function $\tilde{f}_n : \{0,1\}^n \to \{0,1\}$ that is $(1/n^b)$-close to $f$ on $n$-bit strings, for every $x \in \{0,1\}^n$:
$$f(x) = h\big(x, r, \tilde{f}_n(g(1, x, r)), \tilde{f}_n(g(2, x, r)), \ldots, \tilde{f}_n(g(n^a, x, r))\big)$$
with probability $\ge 1 - 2^{-2n}$ when $r \sim U_{n^c}$.
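To illustrate the shape of Definition 2.15, consider the standard self-correction of linear functions $f(x) = \langle a, x\rangle \bmod 2$. This is a toy example, not the polynomial-based reduction behind Theorem 2.16. For uniform $r$, both $r$ and $x \oplus r$ are uniformly distributed, and $f(x) = f(r) \oplus f(x \oplus r)$. Taking a majority over independent choices of $r$ plays the role of $h$. The parameters (n = 8, 12 corrupted points, 31 votes) are arbitrary choices for this sketch.

```python
import random
from itertools import product

def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

def self_correct(x, f_tilde, k, rng):
    # Queries r_j and x XOR r_j are each uniform on {0,1}^n;
    # h takes the majority of f~(r_j) XOR f~(x XOR r_j) over j = 1..k
    votes = 0
    for _ in range(k):
        r = tuple(rng.randint(0, 1) for _ in x)
        votes += f_tilde[r] ^ f_tilde[xor(x, r)]
    return int(votes > k / 2)

rng = random.Random(0)
n = 8
a = tuple(rng.randint(0, 1) for _ in range(n))
points = list(product((0, 1), repeat=n))
f = {x: sum(ai & xi for ai, xi in zip(a, x)) % 2 for x in points}

# f~ agrees with f except on 12 of the 256 points (so it is 12/256-close to f)
f_tilde = dict(f)
for x in rng.sample(points, 12):
    f_tilde[x] ^= 1

errors = sum(self_correct(x, f_tilde, 31, rng) != f[x] for x in points)
print("inputs decoded incorrectly:", errors)
```

Each vote is wrong only if exactly one of its two queries hits a corrupted point, which happens with probability at most $2 \cdot 12/256$. The majority of 31 votes is therefore wrong with very small probability.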

Trevisan and Vadhan [TV07] constructed a language in PSPACE that simultaneously satisfies the following conditions.⁸

Theorem 2.16 (Trevisan and Vadhan [TV07]). There is a language $L^\star \subseteq \{0,1\}^*$ satisfying the following properties:

• $L^\star \in \mathsf{PSPACE}$. In particular, there is a positive constant $d^\star$ such that $L^\star$ on inputs of length $n$ can be computed in time $O\big(2^{n^{d^\star}}\big)$.

• $L^\star$ is complete for PSPACE with respect to deterministic polynomial-time reductions.

• $L^\star$ is downward self-reducible and random self-reducible, with parameters $a^\star$, $b^\star$, and $c^\star$ as in Definition 2.15.

2.6 Pseudorandomness
For $s \in \mathbb{N}$ and $\varepsilon \in [0,1]$, we say that a distribution $D$ supported over $\{0,1\}^m$ is $(s, \varepsilon)$-pseudorandom against quantum circuits if for every quantum circuit $C$ of size $s$ defined over $m$ input bits, we have
$$\Big|\Pr_{x \sim \{0,1\}^m,\, C}[C(x) = 1] - \Pr_{y \sim D,\, C}[C(y) = 1]\Big| \le \varepsilon.$$
Note that each probability above refers to an appropriate random input and the randomness present in the output of $C$.
We will consider weaker forms of pseudorandomness that hold against uniformly constructed families of quantum circuits. For functions $s : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$, we say that an ensemble $\{D_m\}_{m \ge 1}$ of distributions $D_m$ supported over $\{0,1\}^m$ is $(s, \varepsilon)$-pseudorandom against uniform quantum circuits if for every deterministic algorithm $A(1^m)$ that runs in time $s(m)$ and outputs a quantum circuit $C_m$ over $m$ input variables and of size at most $s(m)$,
$$\Big|\Pr_{x \sim \{0,1\}^m,\, C_m}[C_m(x) = 1] - \Pr_{y \sim D_m,\, C_m}[C_m(y) = 1]\Big| \le \varepsilon(m).$$

⁸ The proof of Theorem 2.16 in [TV07] refers to “self-correction” instead of “random-self-reducibility”, but they ultimately rely on a decoding algorithm for polynomials making queries that are uniformly distributed on each coordinate.
For technical reasons, the pseudorandom distributions we construct are supported over strings of length $m(n) = \lfloor 2^{n^{\lambda}} \rfloor$, for $n \in \mathbb{N}$ and a fixed $\lambda > 0$. (The index parameter $n$ will play an important role in our constructions.) Moreover, we will only be able to show that the distributions $D_{m(n)}$ are pseudorandom for infinitely many values of $n$. For these reasons, we refine the previous definition in the following way.
Let $m : \mathbb{N} \to \mathbb{N}$, $s : \mathbb{N} \to \mathbb{N}$ and $\varepsilon : \mathbb{N} \to [0,1]$. We say that an ensemble $\{D_{m(n)}\}_{n \ge 1}$ of distributions $D_{m(n)}$ supported over $\{0,1\}^{m(n)}$ is infinitely often $(s, \varepsilon)$-pseudorandom against uniform quantum circuits if for every deterministic algorithm $A(1^m)$ that runs in time $s(m)$ and outputs a quantum circuit $C_m$ over $m$ input variables and of size at most $s(m)$, for infinitely many values of $n$ and $m \stackrel{\text{def}}{=} m(n)$,
$$\Big|\Pr_{x \sim \{0,1\}^m,\, C_m}[C_m(x) = 1] - \Pr_{y \sim D_m,\, C_m}[C_m(y) = 1]\Big| \le \varepsilon(m).$$

Finally, our distributions are obtained from pseudorandom generators. Let $\ell : \mathbb{N} \to \mathbb{N}$. We say that a family of functions $\{G_n\}_{n \ge 1}$ is a quick infinitely often $(s, \varepsilon)$-pseudorandom generator against uniform quantum circuits of seed length $\ell(n)$ and output length $m(n)$ if the following conditions hold:

(i) Stretch: Each function stretches an $\ell(n)$-bit input to $m(n)$ bits, i.e., $G_n : \{0,1\}^{\ell(n)} \to \{0,1\}^{m(n)}$.

(ii) Uniformity and Running Time: There is a deterministic algorithm $A$ that, when given $1^n$ and $x \in \{0,1\}^{\ell(n)}$, runs in time $O(2^{\ell(n)})$ and outputs $G_n(x)$.

(iii) Pseudorandomness: For each $n \ge 1$, let $D_{m(n)} = G_n(U_{\ell(n)})$ be the distribution over $m(n)$-bit strings induced by evaluating $G_n$ on a random $\ell(n)$-bit string. Then the corresponding ensemble $\{D_{m(n)}\}_{n \ge 1}$ is infinitely often $(s, \varepsilon)$-pseudorandom against uniform quantum circuits.

For convenience, we will also say in this case that
$$G = \{G_n\}_{n \ge 1} \text{ is an infinitely often } (\ell, m, s, \varepsilon)\text{-generator.}$$
Under our notation, observe that $\ell(n)$ and $m(n)$ are functions indexed by $n$ that control the “stretch” of $G_n$ (i.e., $G_n$ maps $\ell$ bits to $m$ bits), and $s(m)$ and $\varepsilon(m)$ are pseudorandomness parameters of the induced distribution $D_m = G_n(U_\ell)$.
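For tiny parameters, the objects in these definitions can be computed by brute force. In the sketch below, G is a hypothetical map $\{0,1\}^\ell \to \{0,1\}^{\ell+1}$ that appends the parity of the seed. It is certainly not pseudorandom. It only shows how $D_m = G_n(U_\ell)$ is obtained by enumerating all $2^\ell$ seeds (matching the $O(2^{\ell})$ budget in condition (ii)) and how the distinguishing advantage of a test is computed. The tests here are deterministic classical functions, a special case of the quantum circuits in the definition.

```python
from fractions import Fraction
from itertools import product

def G(seed):
    # Hypothetical toy map {0,1}^l -> {0,1}^(l+1): append the parity of the seed.
    # NOT a pseudorandom generator; used only to illustrate D_m = G(U_l).
    return seed + (sum(seed) % 2,)

def induced_distribution(G, ell):
    # D_m = G(U_l): enumerate all 2^l seeds, each with probability 2^-l
    dist = {}
    for seed in product((0, 1), repeat=ell):
        y = G(seed)
        dist[y] = dist.get(y, Fraction(0)) + Fraction(1, 2**ell)
    return dist

def advantage(test, dist, m):
    # |Pr_{x ~ U_m}[test(x) = 1] - Pr_{y ~ D}[test(y) = 1]| for a deterministic test
    p_uniform = Fraction(sum(test(x) for x in product((0, 1), repeat=m)), 2**m)
    p_pseudo = sum((p for y, p in dist.items() if test(y) == 1), Fraction(0))
    return abs(p_uniform - p_pseudo)

ell = 6
D = induced_distribution(G, ell)
even_parity_test = lambda y: int(sum(y) % 2 == 0)
first_bit_test = lambda y: y[0]
print(advantage(even_parity_test, D, ell + 1))  # 1/2: the parity test breaks G
print(advantage(first_bit_test, D, ell + 1))    # 0: this test cannot tell them apart
```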

2.7 Inherently probabilistic computations


As an intermediate model between classical and quantum computations, it will be useful to consider inherently probabilistic circuits. Roughly speaking, these are computational devices whose randomness remains “inaccessible”, in the sense that there is no simple way of decomposing an inherently probabilistic circuit as a distribution over deterministic circuits.
To make this point more precise, we review the discussion from Section 1.2.2. Consider the standard model of randomized Boolean circuits: a Boolean circuit $C(x, y)$, where $x$ is the input string and $y$ is the random string. We say that $C$ computes a function $g : \{0,1\}^m \to \{0,1\}^k$ on an input $x \in \{0,1\}^m$ if $\Pr_y[C(x, y) = g(x)] \ge 2/3$. Similarly, we can also discuss the correlation between $C$ and $g$, captured by $\gamma \stackrel{\text{def}}{=} \Pr_{x,y}[C(x, y) = g(x)]$.
An extremely useful trick in this model is that, by an averaging argument, there must exist a fixed string $y'$ such that $\Pr_x[C(x, y') = g(x)] \ge \gamma$. In other words, there is a deterministic circuit $C_{y'}(x) \stackrel{\text{def}}{=} C(x, y')$ that correctly computes $g$ on a $\gamma$ fraction of inputs. This often allows one to reduce the analysis of randomized Boolean circuits to the deterministic case. Additionally, $C_{y'}$ “forces” $C(x, y)$ to commit to a single consistent output on every $x$, which can be relevant if $C$ is needed as a subroutine in other computations.
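The averaging argument can be seen on a toy randomized circuit of our own. $C(x, y)$ outputs the majority of three bits, but errs exactly when $x_0 = y_0 = 1$. Computing $\gamma$ exactly and scanning all fixed strings $y'$ shows that some $y'$ does at least as well as the average.

```python
from fractions import Fraction
from itertools import product

def g(x):
    return int(sum(x) >= 2)  # majority of 3 bits

def C(x, y):
    # Hypothetical randomized circuit: errs exactly when x[0] = y[0] = 1
    return g(x) ^ (x[0] & y[0])

XS = list(product((0, 1), repeat=3))  # inputs x
YS = list(product((0, 1), repeat=2))  # random strings y

def agreement_for(y):
    # Pr_x[C(x, y) = g(x)] for a fixed random string y
    return Fraction(sum(C(x, y) == g(x) for x in XS), len(XS))

gamma = sum(agreement_for(y) for y in YS) / len(YS)  # Pr_{x,y}[C(x, y) = g(x)]
best = max(YS, key=agreement_for)                    # a fixed string y'
print(gamma, best, agreement_for(best))              # 3/4 (0, 0) 1
```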
As a concrete example, note that this idea is crucial in the proof of BPP ⊆ P/poly, which
is a simple combination of reducing the failure probability of a randomized circuit and fixing its
random input.
While quantum computations share superficial similarities with the model of randomized com-
putations and their corresponding randomized circuits, there is no obvious way of reducing a quan-
tum computation to a distribution of “deterministic” quantum computations. In some cases, this
is part of the difficulty when extending classical results to the quantum setting. With this in mind,
below, we introduce inherently probabilistic circuits, which serve as a useful intermediate model in
our analysis of quantum computations in the context of error correction and hardness amplification.

Inherently probabilistic circuits. We adopt a definition of inherently probabilistic computations that is sufficient for our purposes. In order to be as general as possible, and because of how we access these computations, we model an inherently probabilistic circuit $A$ over $m$ input bits that produces $\ell$ output bits as a function $A : \{0,1\}^m \to \mathcal{F}$, where $\mathcal{F}$ is the set of probability distributions supported over the fixed domain $\{0,1\}^\ell$. In other words, $A$ assigns to each input $z \in \{0,1\}^m$ a distribution $A(z)$ supported over $\{0,1\}^\ell$. We say that $A$ computes a function $g : \{0,1\}^m \to \{0,1\}^\ell$ with probability at least $\varepsilon$ if
$$\Pr_{z \sim \{0,1\}^m,\; v \sim A(z)}[v = g(z)] \ge \varepsilon.$$
Note that nothing is assumed about the relation between the distributions $A(z_1)$ and $A(z_2)$ for $z_1 \neq z_2$, and that there is no way of globally “fixing” the randomness of $A$ across different input strings.
We will also need to employ inherently probabilistic computations as subroutines inside standard deterministic and randomized computations. We model this situation using standard Boolean circuits and “inherently probabilistic oracle gates”, described next.
Deterministic Boolean circuits with oracle gates are defined in the standard way. More precisely, we use $C^O$ to represent a deterministic circuit that, in addition to its original gates, can label certain gates of fan-in $m$ and fan-out $\ell$ by $O$. The computation of $C^A(x)$ on an input $x$ is defined once we set $O$ to an inherently probabilistic circuit $A$. Formally, each gate of $C^A(x)$ is now a random variable whose value depends on the input $x$ and on the outcomes of the calls to $A$, each distributed according to $A(z)$ when the input to $A$ is the string $z$ (two calls of $A$ on the same string $z$ are distributed independently). For convenience, if $C^O$ is an oracle circuit with $k$ designated output gates, we use $C^A(x)$ to denote the random variable supported over $\{0,1\}^k$ and distributed according to the bits produced by its output gates (under a fixed order of the output gates of $C^O$).
If $g : \{0,1\}^n \to \{0,1\}^k$, $C^O$ is a Boolean oracle circuit over $n$ input bits with oracle gates of fan-in $m$ and fan-out $\ell$, and $A$ is inherently probabilistic, we say that $C^A$ computes $g$ with probability at least $\varepsilon$ if $\Pr_{x \sim \{0,1\}^n,\, A}[C^A(x) = g(x)] \ge \varepsilon$.
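Here is a minimal sketch of these definitions, using our own representation. An inherently probabilistic circuit $A$ is given by explicitly listing the distribution $A(z)$ for every $z$. The oracle circuit $C^A$ calls $A$ three times on its input and outputs the majority, with the three outcomes independent as the model prescribes. The toy $A$, which outputs the parity of $z$ with probability 3/4, is hypothetical.

```python
from fractions import Fraction
from itertools import product

def A(z):
    # Hypothetical inherently probabilistic circuit: the distribution A(z) over 1-bit
    # outputs, correct parity w.p. 3/4. Nothing relates A(z1) to A(z2).
    p = sum(z) % 2
    return {(p,): Fraction(3, 4), (1 - p,): Fraction(1, 4)}

def parity(z):
    return (sum(z) % 2,)

def success_probability(dist_of, g, m):
    # Pr_{z ~ {0,1}^m, v ~ dist_of(z)}[v = g(z)]
    total = Fraction(0)
    for z in product((0, 1), repeat=m):
        total += dist_of(z).get(g(z), Fraction(0))
    return total / 2**m

def majority_of_three(A):
    # Oracle circuit C^A: three calls to A on the same input; the outcomes are
    # independent, so their joint probability is the product of the marginals
    def dist_of(x):
        d = A(x)
        out = {}
        for v1, v2, v3 in product(d, repeat=3):
            bit = int(v1[0] + v2[0] + v3[0] >= 2)
            out[(bit,)] = out.get((bit,), Fraction(0)) + d[v1] * d[v2] * d[v3]
        return out
    return dist_of

print(success_probability(A, parity, 3))                     # 3/4
print(success_probability(majority_of_three(A), parity, 3))  # 27/32
```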
It will also be useful to consider randomized oracle circuits with access to an inherently probabilistic oracle $A$. In this setting, $C^O$ has, in addition to its input string $x$, an extra input $y$ for random bits. The computation of $C^A$ on an input pair $(x, y)$ is defined as above, i.e., the inputs $x$ and $y$ are considered as a single input string of $C^A$. We can similarly consider the probability $\Pr_{x,y,A}[C^A(x, y) = g(x)]$ of computing $g$ using $C^A$. Note that in this case $y$ can be fixed to a given string $y'$, but the randomness associated with each call to $A$ cannot be fixed.
When it is clear from the context, we might omit the superscripts $O$ and $A$ when referring to such oracle circuit computations.
A somewhat similar model is explored by Kearns and Schapire [KS94] (see also [Yam92]) from a different perspective.⁹ Their paper focuses on the learnability of functions of the form $c : X \to [0,1]$, where $c(x)$ can be interpreted as a probability distribution over $\{0,1\}$. In contrast, in this work we assign to each input $x$ a probability distribution over strings. Most importantly, we are concerned with the subtleties of computing with such devices, which will be used as internal building blocks in other computations, while the focus of [KS94] and [Yam92] is on the learnability of probabilistic/stochastic functions.
We note that any quantum circuit that is accessed through classical queries can be viewed as an inherently probabilistic circuit.

Lemma 2.17. Let $R$ be an inherently probabilistic circuit and $Q$ a quantum circuit such that, on every input $x$, the output distributions of $R(x)$ and $Q(x)$ are exactly the same.¹⁰ Then for every probabilistic algorithm $A$, which might have some classical input $w$ and is allowed to make (classical) queries to $R$ or $Q$, we have that for every $y$,
$$\Pr_{A,R}[A^R(w) = y] = \Pr_{A,Q}[A^Q(w) = y].$$

Proof. Notice that since $A$ is a classical algorithm and is only allowed to make classical queries to $Q$, $A$ only has access to samples of the output distribution of $Q$. Since the output distributions of $Q$ and $R$ are exactly the same for every input, the result follows.

3 Lower bounds from learning algorithms: a modular approach via PRGs
In this section, we show how to derive circuit lower bounds from quantum learning algorithms
via two steps: first, in Section 3.1 we show that quantum learners imply quantum natural proper-
ties; then, in Section 3.2 we show that given a pseudorandom generator against uniform quantum
computation, quantum natural properties imply circuit lower bounds.

3.1 Quantum natural properties from quantum learning algorithms


Theorem 3.1. There is a universal constant $\lambda \ge 1$ for which the following holds. Let $\gamma : \mathbb{N} \to [0, 1/2]$ be a constructive function such that $\gamma(n) \ge \lambda \cdot 2^{-n/2}$, and let $s(n) = \gamma(n)^2 \cdot 2^n / (\lambda \cdot n)$. If $\mathcal{C}[n^k]$ can be learned under the uniform distribution in quantum time at most $s(n)$ by an $(\varepsilon, \delta)$-learner with $\delta(n) \le 0.99$ and $\varepsilon(n) \le 1/2 - \gamma(n)$, then there exists a promise quantum natural property against $\mathcal{C}[n^k]$.

Proof. First, we prove the result under the assumption that δ ≤ 1/10. Then we show that this
can be easily relaxed to learners that only succeed with probability 1/100, as in the statement of
the result.
⁹ This reference has been brought to our attention by an anonymous reviewer.
¹⁰ Notice that here $Q(x)$ could also have a garbage register that is traced out.

By our assumption, there exists a quantum learning algorithm $A^{|f\rangle}$ that runs in quantum time $s(n)$ for a function $f : \{0,1\}^n \to \{0,1\}$ computed by a circuit $f \in \mathcal{C}[n^k]$ and, with probability at least $9/10$, the algorithm $A^{|f\rangle}$ outputs a quantum hypothesis $U$ such that
$$\mathbb{E}_x\Big[\big\|\Pi_{f(x)}\, U |x, 0\rangle\big\|^2\Big] \ge \frac{1}{2} + \gamma(n). \tag{2}$$
We now show how to construct a quantum natural property from $A^{|f\rangle}$. Let $D$ be the quantum algorithm that receives as input $\mathrm{tt}(f)$, the truth table of $f$, and performs the following steps.
1. Simulate $A^{|f\rangle}$ by answering all the queries $\sum_x \alpha_x |x, b\rangle \mapsto \sum_x \alpha_x |x, b \oplus f(x)\rangle$ for arbitrary amplitudes $\{\alpha_x\}_x$ (this can be performed efficiently since $D$ has access to the truth table of $f$).
2. If the simulation of $A^{|f\rangle}$ does not output a hypothesis $U$, output 1.
3. Otherwise, set $T = \gamma(n)^{-3} + 100n$, choose $x_1, \ldots, x_T \in \{0,1\}^n$ uniformly at random, invoke $U|x_i, 0\rangle$ and measure the first bit for every $i \in [T]$; denote by $b_i \in \{0,1\}$ the output of the $i$-th invocation.
4. Output 0 if and only if
$$\frac{1}{T} \sum_{i \in [T]} [b_i = f(x_i)] \ge \frac{1}{2} + \frac{1}{4} \cdot \gamma(n). \tag{3}$$
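Steps 3 and 4 of $D$ can be simulated classically for intuition. In the sketch below, one invocation-and-measurement of $U$ is replaced by a hypothetical classical sampler `hypothesis(x, rng)`, and $T$ is rounded up to an integer. Both are assumptions made for illustration. A hypothesis that agrees with $f$ with probability $1/2 + \gamma$, as in Eq. (2), is rejected (output 0). A hypothesis uncorrelated with $f$, as a small circuit typically is for a random $f$, is accepted (output 1). The parameters $n = 10$ and $\gamma = 0.2$ are arbitrary.

```python
import math
import random

def distinguisher(f_table, n, hypothesis, gamma, rng):
    """Steps 3-4 of D: output 0 (reject) iff the empirical agreement between the
    hypothesis and f on T random inputs is at least 1/2 + gamma/4, else 1 (accept).

    hypothesis(x, rng) stands in for preparing U|x, 0> and measuring the first
    qubit; here it is a classical sampler (an assumption made for illustration)."""
    T = math.ceil(gamma**-3) + 100 * n  # T = gamma^-3 + 100n, rounded up
    hits = 0
    for _ in range(T):
        x = rng.randrange(2**n)
        hits += hypothesis(x, rng) == f_table[x]
    return 0 if hits / T >= 0.5 + gamma / 4 else 1

rng = random.Random(1)
n, gamma = 10, 0.2
f_table = [rng.randint(0, 1) for _ in range(2**n)]

def good(x, r):
    # Agrees with f w.p. 1/2 + gamma, like a hypothesis satisfying Eq. (2)
    return f_table[x] if r.random() < 0.5 + gamma else 1 - f_table[x]

def coin(x, r):
    # Uncorrelated with f: agreement 1/2
    return r.randint(0, 1)

print(sum(distinguisher(f_table, n, good, gamma, rng) for _ in range(20)))  # accepts: expected near 0
print(sum(distinguisher(f_table, n, coin, gamma, rng) for _ in range(20)))  # accepts: expected near 20
```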

We prove that $D$ satisfies all three conditions of Definition 2.12. Constructiveness follows by noting that simulating $A^{|f\rangle}$ can be done in $\mathrm{poly}(2^n)$ time, and $D$ performs $T = \gamma(n)^{-3} + 100n$ invocations of the hypothesis $U$ and a check that requires summing over $T$ terms.
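We proceed to show the hardness condition. Fix $f \in \mathcal{C}[n^k]$. We show that the algorithm $D$ accepts $\mathrm{tt}(f)$ with probability at most $1/3$. By our assumption, with probability at least $9/10$, the learning algorithm $A$ outputs $U$ that satisfies Eq. (2). Suppose this is the case. Then, for every $i \in [T]$, let $q_i = \Pr[b_i = f(x_i)]$, where the probability is taken over the choice of $x_i$ and the measurement of $U$. Since the pairs $(x_i, b_i)$ are i.i.d., $q_i = q$ for a common value $q \ge 1/2 + \gamma(n)$. The event below requires $\sum_i [b_i = f(x_i)]$ to fall below its mean $qT$ by more than $3\gamma(n)T/4 \ge \gamma(n)T/4$. Hence, by the Chernoff bound (Lemma 2.2) with $t = \gamma(n)T/4$, we have
$$\Pr\Big[\frac{1}{T}\sum_{i} [b_i = f(x_i)] < \frac{1}{2} + \frac{1}{4}\cdot\gamma(n)\Big] \le \exp\left(-\frac{(\gamma(n) T/4)^2}{2\big(qT + \gamma(n)T/12\big)}\right) \le \exp\big(-\Omega(\gamma(n)^2 T)\big), \tag{4}$$
where the last inequality uses $q \le 1$ and $\gamma(n) \le 1/2$. Since $T = \gamma(n)^{-3} + 100n$, we have $\gamma(n)^2 T = 1/\gamma(n) + 100n\gamma(n)^2$, which tends to infinity with $n$. Thus, for large enough $n$, with probability at least $(9/10) \cdot \big(1 - \exp(-\Omega(\gamma(n)^2 T))\big) \ge 2/3$, the algorithm $D$ rejects.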
We proceed to show density. For that, let us fix an arbitrary quantum circuit $U$ of size $s(n)$. Also, for fixed $f$ and $x$, let $W_{f,x} = \Pr_U[U(x) = f(x)]$. Then we have that
$$\mathbb{E}_{f,x}[W_{f,x}] = \frac{1}{2},$$
since for any fixed $x$,
$$\mathbb{E}_f\Big[\Pr_U[U(x) = f(x)]\Big] = \frac{1}{2}\Big(\Pr_U[U(x) = 0 \mid f(x) = 0] + \Pr_U[U(x) = 1 \mid f(x) = 1]\Big) = \frac{1}{2}\Big(\Pr_U[U(x) = 0] + \Pr_U[U(x) = 1]\Big) = \frac{1}{2},$$
where the second equality uses the fact that $\Pr_U[U(x) = b]$ is independent of $f$. We now consider drawing an independent random function $f$ and use the Chernoff bound (see Lemma 2.2) to bound the random variable $\sum_x W_{f,x}$:
$$\Pr_f\Big[\Big|\sum_{x \in \{0,1\}^n} W_{f,x} - \mathbb{E}_f\Big[\sum_{x \in \{0,1\}^n} W_{f,x}\Big]\Big| \ge t\Big] \le \exp\left(-\frac{t^2}{2(t/3 + 2^n \cdot \frac{1}{2})}\right). \tag{5}$$

Setting t = γ(n) · 2n /8, we have,


h X  i  γ(n)2 22n /64 
Pr Wf,x − 2n /2 ≥ γ(n) · 2n /8 ≤ exp − ,
f
x
2(γ(n) · 2n /24 + 2n /2)

which implies
h  i  γ(n)2 2n 
Pr E[Wf,x ] − 1/2 ≥ γ(n)/8 ≤ exp − .
f x 64(γ(n)/12 + 1)
This means that for every quantum circuit U (regardless of its size), the fraction of functions that can be computed by it with advantage $\gamma(n)/8$ on a random x is at most $\exp\Big(-\frac{\gamma(n)^2\, 2^n}{64(\gamma(n)/12 + 1)}\Big)$.
We now count the number of circuits of size s(n). Observe that there exists a constant α < 50 such that there exist at most $2^{\alpha s(n)\log s(n)}$ circuits of size s(n) if we fix, for example, the universal gate set {H, Toff} as described in Section 2.2: this is true because one can specify, for the i-th gate, whether it is a Hadamard or a Toffoli gate, which requires just a single bit, and it takes $O(\log s(n))$ bits to denote the qubits on which the gate acts. Therefore, using a union bound, we have that the fraction of functions that can be $\gamma(n)/8$-approximated by circuits of size s(n) is at most
$$\exp\Big(-\frac{\gamma(n)^2\, 2^n}{64(\gamma(n)/12 + 1)} + \alpha s(n)\log s(n)\Big) \le \exp\Big(\gamma(n)^2\, 2^n\Big(-\frac{1}{64(\gamma(n)/2 + 1)} + \frac{O(n)}{\lambda\cdot n}\Big)\Big) < 1/2,$$
where the final inequality uses that λ ≥ 1 is a sufficiently large constant as in the theorem statement.
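To see the arithmetic of this union bound concretely, the sketch below evaluates the natural logarithm of the bound as a function of n, γ(n) and s(n). It assumes α = 50 as a conservative stand-in for the unspecified constant α < 50 and keeps logarithm bases explicit (the $2^{\alpha s\log s}$ circuits contribute $\alpha s\log_2(s)\cdot\ln 2$ to the exponent); the function name is hypothetical.

```python
import math


def log_union_bound(n, gamma, s, alpha=50):
    """Natural log of the union bound on the fraction of functions that some
    size-s circuit computes with advantage gamma/8 on a random input.

    Per-circuit tail: exp(-gamma^2 2^n / (64 (gamma/12 + 1))).
    Number of circuits: at most 2^(alpha * s * log2(s)).
    """
    per_circuit = -(gamma ** 2) * 2 ** n / (64 * (gamma / 12 + 1))
    log_num_circuits = alpha * s * math.log2(s) * math.log(2)
    return per_circuit + log_num_circuits
```

With n = 40 and γ = 1/4, circuits of size 1000 keep the bound far below 1/2, whereas size $10^8$ makes the counting term dominate, which illustrates why $s(n)\log s(n)$ has to be small compared with $\gamma(n)^2 2^n$.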
To conclude the proof, finally observe that for a function f that cannot be approximated by an s(n)-sized circuit U, i.e., for all s(n)-sized circuits U
$$\Pr_{x,U}[U(x) = f(x)] < \frac12 + \frac{\gamma(n)}{8},$$

by the Chernoff bound (in Theorem 2.1), we have that by picking $x_1, \ldots, x_T$ uniformly at random and $b_i \sim U(x_i)$, we obtain
$$\Pr\Big[\frac{1}{T}\sum_{i}[b_i = f(x_i)] \ge \frac12 + \frac14\cdot\gamma(n)\Big] \le \exp\big(-\Omega(1/\gamma(n))\big). \tag{6}$$
Hence the distinguisher D accepts $\mathrm{tt}(f)$ with overwhelming probability. This completes the proof under our initial assumption that δ ≤ 1/10.
In order to obtain a promise quantum natural property from learners that succeed with probability at least 1/100, we can modify the construction above as follows. The quantum algorithm D simply
runs steps (1)-(3) above ℓ = O(1) times, obtaining hypotheses U1 , . . . , Uℓ , and outputs 0 if and only
if at least one of them satisfies the condition in step (4). It is not hard to see that the analysis
presented above still holds under this modification, provided that ℓ is a large enough constant.

3.2 Circuit lower bounds from quantum natural properties
In this section, we show that promise quantum natural properties imply circuit lower bounds. This connection is established using pseudorandom generators against uniform quantum circuits.
The key technical component we need is the following theorem, which we prove in Section 5 and which gives a quantum-secure PRG under the assumption that PSPACE ⊈ BQSUBEXP.

Theorem 3.2 (Conditional PRG against uniform quantum computations). Suppose that there is a language L ∈ PSPACE and γ > 0 such that $L \notin \mathrm{BQTIME}[2^{n^\gamma}]$. Then, for some choice of constants α ≥ 1 and λ ∈ (0, 1/5), there is an infinitely often (ℓ, m, s, ε)-generator $G = \{G_n\}_{n\ge1}$, where $\ell(n) \le n^\alpha$, $m(n) = \lfloor 2^{n^\lambda}\rfloor$, $s(m) = 2^{n^{2\lambda}} \ge \mathrm{poly}(m)$ (for any polynomial), and $\varepsilon(m) = 1/m$.

We shall also need the following diagonalization lemma.

Lemma 3.3 (Diagonalization Lemma, see e.g. [OS17, Corollary 2]). For every $k \in \mathbb{N}$, there exists a language $L_k \in \mathrm{PSPACE}$ such that $L_k$ cannot be computed by Boolean circuits of size $n^k$ whenever n is sufficiently large.
Recall that $\mathrm{E} \stackrel{\mathrm{def}}{=} \mathrm{DTIME}[2^{O(n)}]$ and $\mathrm{BQE} \stackrel{\mathrm{def}}{=} \mathrm{BQTIME}[2^{O(n)}]$. Our goal for the rest of this section is to establish the following connection between quantum natural properties and circuit lower bounds.

Theorem 3.4 (Connection between natural properties and lower bounds). For every circuit class $\mathcal{C}$ that is closed under restrictions of input variables to constants 0 and 1, at least one of the following lower bounds holds:

1. For every $k \in \mathbb{N}$ there exists a language $L \in \mathrm{BQE}$ that cannot be computed by general Boolean circuits of size $n^k$, for sufficiently large input lengths n.

2. There exists a universal constant υ ≥ 1 for which the following holds. If there is a promise quantum natural property against $\mathcal{C}[n^a]$, for a > 1, then there exists a language $L \in \mathrm{E}$ that cannot be computed by $\mathcal{C}$-circuits of size $n^{a/\upsilon}$, for infinitely many input lengths n.

Proof. The proof is via a win-win argument. Starting with the easy case, suppose that for every γ > 0 it holds that $\mathrm{PSPACE} \subseteq \mathrm{BQTIME}[2^{n^\gamma}]$. Fix $k \in \mathbb{N}$. By Lemma 3.3, there exists a language $L_k \in \mathrm{PSPACE}$ that cannot be computed by Boolean circuits of size $n^k$ (for any sufficiently large input length n). By our assumption (taking, e.g., γ = 1), the language $L_k$ can be computed by a quantum algorithm of time complexity $2^{n^\gamma}$, and so $L_k \in \mathrm{BQE}$, and the lower bound in Item (1) of Theorem 3.4 holds.
Otherwise, suppose that there exists a language L ∈ PSPACE and γ > 0 such that $L \notin \mathrm{BQTIME}[2^{n^\gamma}]$. We show that this implies the lower bound in Item (2). Indeed, under this assumption, by Theorem 3.2, there exist constants α ≥ 1 and λ ∈ (0, 1/5) such that there is an infinitely often (ℓ, m, s, ε)-generator $G = \{G_n\}_{n\ge1}$, where $\ell(n) \le n^\alpha$, $m(n) = \lfloor 2^{n^\lambda}\rfloor$, $s(m) = 2^{n^{2\lambda}} \ge \mathrm{poly}(m)$, and $\varepsilon(m) = 1/m$; that is,

(i) Each $G_n$ is a function mapping $\{0,1\}^{\ell(n)}$ to $\{0,1\}^{m(n)}$.

(ii) There is a deterministic algorithm A that when given $1^n$ and $x \in \{0,1\}^{\ell(n)}$ runs in time $O(2^{\ell(n)})$ and outputs $G_n(x)$.

(iii) For each n ≥ 1, let $D_{m(n)} = G_n(U_{\ell(n)})$ be the distribution over m(n)-bit strings induced by evaluating $G_n$ on a random ℓ(n)-bit string. Then the corresponding ensemble $\{D_{m(n)}\}_{n\ge1}$ is infinitely often (s, ε)-pseudorandom against uniform quantum circuits.

We define a language $L^G$ using the generator $G = \{G_n\}_{n\ge1}$ and argue that, given the quantum natural property assumed in Item (2) of the statement, this language is hard for $\mathcal{C}$-circuits. Our language $L^G$ is defined by the following deterministic algorithm D:

1. Encoding check: On an input $u \in \{0,1\}^n$, reject if u is not of the form
$$u = (1^t, w, x)$$
for some $t \in \mathbb{N}$ such that $|w| = \ell(t)$ and $|x| = t'$, where we denote $t' = \lfloor t^\lambda\rfloor$. (Note that $2^{t'}$ is the largest power of 2 that is not greater than m(t).)¹¹

2. PRG invocation: Compute $G_t(w) \in \{0,1\}^{m(t)}$ and let $g_w$ denote its leftmost $2^{t'}$ bits.

3. Emulate computation: Let $h_w : \{0,1\}^{t'} \to \{0,1\}$ be the function determined by $g_w$ (i.e., $h_w = \mathrm{fnc}_{g_w}$), and output $h_w(x)$.
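The three steps of D can be sketched as follows. Everything concrete here is a hypothetical placeholder chosen only to make the sketch runnable: λ = 1/6, ℓ(t) = t, the encoding of $u = (1^t, w, x)$ as a plain concatenation (consistent with footnote 11, since $t + \ell(t) + t'$ is strictly increasing in t), a truth-table convention in which x is read as a binary index with the most significant bit first, and a SHAKE-256 expansion standing in for $G_t$, which is certainly not the generator of Theorem 3.2.

```python
import hashlib

Q = 6  # hypothetical choice lambda = 1/Q = 1/6, which lies in (0, 1/5)


def iroot(t: int, q: int) -> int:
    """Exact floor(t ** (1/q)) for t >= 1."""
    r = int(round(t ** (1.0 / q)))
    while r ** q > t:
        r -= 1
    while (r + 1) ** q <= t:
        r += 1
    return r


def ell(t: int) -> int:
    """Hypothetical seed length l(t) = t (that is, alpha = 1)."""
    return t


def toy_generator(t: int, w: str) -> str:
    """Stand-in for G_t that expands w with SHAKE-256 (NOT a PRG as in Theorem 3.2)."""
    m = 2 ** iroot(t, Q) + 3  # hypothetical output length, at least 2^{t'}
    digest = hashlib.shake_256(f"{t}:{w}".encode()).digest((m + 7) // 8)
    return "".join(f"{byte:08b}" for byte in digest)[:m]


def decide_LG(u: str) -> int:
    """Algorithm D defining L^G, for inputs u = 1^t w x given as a bit string."""
    n = len(u)
    # Step 1 (encoding check): at most one t satisfies t + l(t) + t' = n.
    for t in range(1, n + 1):
        t_prime = iroot(t, Q)
        if t + ell(t) + t_prime == n:
            break
    else:
        return 0
    if u[:t] != "1" * t:
        return 0
    w, x = u[t:t + ell(t)], u[t + ell(t):]
    # Step 2 (PRG invocation): keep the leftmost 2^{t'} bits of G_t(w).
    g_w = toy_generator(t, w)[: 2 ** t_prime]
    # Step 3 (emulate computation): h_w(x) is the truth-table entry of g_w at x.
    return int(g_w[int(x, 2)])
```

With t = 64 we get t′ = 2, so valid inputs have length 64 + 64 + 2 = 130, and the last two bits select one of the four leftmost bits of the expanded seed; any malformed input is rejected.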

To prove the lower bound in Item (2) of Theorem 3.4, we first observe that D runs in deterministic time $2^{O(n)}$, and hence $L^G \in \mathrm{E}$. To see this, observe that by construction the running time of D is dominated by Step (2), which computes $G_t(w)$ given a correctly encoded input. For valid inputs we have $n \ge t + \ell(t) + t'$, so $\ell(t) < n$ and, by property (ii), the time complexity of D is at most $O(2^n)$.
To complete the proof, we need to prove the following lemma, which shows that the language $L^G$ is hard for $\mathcal{C}$-circuits. Recall that we are under the assumption that there is a promise quantum natural property against $\mathcal{C}[n^a]$, where a > 1 is arbitrary but can be assumed to be sufficiently large without loss of generality.
Lemma 3.5. There exists a constant υ = υ(α, λ) ≥ 1, where (α, λ) are the parameters of the PRG G, such that $L^G \notin \mathcal{C}[n^{a/\upsilon}]$ for infinitely many input lengths n.

Proof. Set $\upsilon = 2\cdot\frac{\alpha}{\lambda}$, and let a > 1. We prove the lemma by contradiction: suppose that there exists a finite set $\mathcal{N}$ such that for every $n \in \mathbb{N}\setminus\mathcal{N}$ it holds that $L^G \in \mathcal{C}[n^{a/\upsilon}]$ over length-n inputs. We argue next that this circuit upper bound and the existence of a quantum natural property as in the statement of the theorem lead to a violation of property (iii) of G stated above.
Let F be a quantum algorithm that gets as input a string of length $N = 2^n$ and computes a (promise) quantum natural property in the sense of Definition 2.12 in poly(N) time. Recall that F accepts a dense set $\Gamma_n$ of inputs and rejects functions computable by $\mathcal{C}$-circuits of size $n^a$, both with probability at least 2/3. Performing error reduction on F using amplitude amplification, we can assume without loss of generality that on each string $z \in \Gamma_n \subseteq \{0,1\}^N$, we have $\|\Pi_1 F|z, 0\rangle\|^2 \ge 1 - 2^{-2N}$, and on each string $z \in \{0,1\}^N$ such that $\mathrm{fnc}_z \in \mathcal{C}_n[n^a]$, we have $\|\Pi_1 F|z, 0\rangle\|^2 \le 2^{-2N}$.
We construct a deterministic algorithm $A(1^m)$ that generates quantum circuits violating property (iii) of G (parameterized here by a parameter t). On an input $1^m$, where $m = m(t) = \lfloor 2^{t^\lambda}\rfloor$, A writes $m = 2^{t'} + r$, where $r \in [0, 2^{t'} - 1]$. It outputs a quantum circuit $C_F$ on m input bits that computes according to F applied to the $2^{t'}$-bit prefix of the input and ignores the remaining r input bits. To complete the proof of the lemma, we show that $A(1^m)$ outputs a quantum circuit $C_F$ of size at most $s(m) = 2^{t^{2\lambda}}$ that Ω(1)-distinguishes $G_t(U_{\ell(t)})$ from $U_{m(t)}$ for every large enough t, thereby violating property (iii). In other words,
$$\mathop{\mathbb{E}}_{y\sim U_{m(t)}}[C_F(y) = 1] - \mathop{\mathbb{E}}_{\beta\sim U_{\ell(t)}}[C_F(G_t(\beta)) = 1] \ge \Omega(1). \tag{7}$$

¹¹ Notice that there exists at most one value of t for which these constraints hold.


For each $m = 2^{t'} + r$ of the form defined before,

(a) $C_F$ rejects with overwhelming probability every m-bit string z whose $2^{t'}$-bit prefix encodes a function $f \in \mathcal{C}[(t')^a]$.

(b) $C_F$ accepts with overwhelming probability every m-bit string z whose $2^{t'}$-bit prefix is in $\Gamma_{t'}$.

(c) $|C_F| = \mathrm{poly}(2^{t'}) < s(m)$, as its size is dominated by that of F, which runs in $\mathrm{poly}(2^{t'})$ time on inputs of size $2^{t'}$.
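Note that, as a consequence of the density of $\Gamma_{t'}$ and (b), we have
$$\mathop{\mathbb{E}}_{z\sim U_{m(t)}}[C_F(z) = 1] = \mathop{\mathbb{E}}_{y\sim U_{2^{t'}}}[F(y) = 1] \ge \frac12\cdot\mathop{\mathbb{E}}_{y\sim U_{2^{t'}}}\big[F(y) = 1 \mid y \in \Gamma_{t'}\big] \ge \frac12\cdot\big(1 - 2^{-2N}\big).$$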

To satisfy Equation (7), it is enough to argue that for each seed $w \in \{0,1\}^{\ell(t)}$ and the string $\tau \stackrel{\mathrm{def}}{=} G_t(w)$, we have $\Pr[C_F(\tau) = 1] \le 2^{-2N}$. According to Item (a) above, it suffices to show that $g_\tau$, the $2^{t'}$-bit prefix of τ, is the truth table of a function $h_w \in \mathcal{C}[(t')^a]$. For this we rely on the assumption that $L^G \in \mathcal{C}[n^{a/\upsilon}]$ for every input length $n \in \mathbb{N}\setminus\mathcal{N}$. Details follow.
Let $n = t + \ell(t) + t' + O(1) \le 2t^\alpha < 2(t')^{\alpha/\lambda}$ be such that $n \in \mathbb{N}\setminus\mathcal{N}$ (which holds for every large enough t, as $\mathcal{N}$ is finite), where n is an input length of $L^G$ that contains inputs of the form $(1^t, w, x)$, where x is an arbitrary t′-bit string and w is the seed that generates τ. Let $g_\tau$ be the $2^{t'}$-bit prefix of τ and let $h_\tau \stackrel{\mathrm{def}}{=} \mathrm{fnc}_{g_\tau}$ be a Boolean function on $t' < n$ input bits. Then, by the assumption that $L^G \in \mathcal{C}[n^{a/\upsilon}]$, where $\mathcal{C}$ is a circuit class closed under restriction of input variables to constants 0 and 1 (see Item (ii) in Section 2.1.2), we obtain a $\mathcal{C}$-circuit that computes $h_\tau(x)$ (by fixing the input bits $1^t$ and w). Furthermore, this circuit is of size at most
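$$n^{a/\upsilon} \le (2t^\alpha)^{a/\upsilon} < \big(2(t')^{\alpha/\lambda}\big)^{a/\upsilon} \le 2^{a/\upsilon}\cdot(t')^{\frac{\alpha}{\lambda}\cdot\frac{a}{\upsilon}} < (t')^{2\cdot\frac{\alpha}{\lambda}\cdot\frac{a}{\upsilon}} = (t')^a,$$
where in the penultimate inequality we used that $2 < (t')^{\alpha/\lambda}$ for $t' \ge 2$ (recall that $\alpha/\lambda > 5$), and in the last equality that $2\cdot\frac{\alpha}{\lambda}\cdot\frac{a}{\upsilon} = a$, as $\upsilon = 2\cdot\frac{\alpha}{\lambda}$.

Consequently, $h_\tau \in \mathcal{C}[(t')^a]$ for every seed w, so by Item (a) the circuit $C_F$ accepts $G_t(U_{\ell(t)})$ with probability at most $2^{-2N}$. Together with the lower bound on $\mathbb{E}_{z\sim U_{m(t)}}[C_F(z) = 1]$ above, this gives Equation (7) for every large enough t, contradicting property (iii) of G. This proves Lemma 3.5 and concludes the proof of Theorem 3.4.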

As an immediate corollary, we obtain the following result.


Corollary 3.6 (Lower bounds from quantum natural properties). Let $\mathcal{C}$ be a circuit class. Suppose that for every a ≥ 1 there is a promise quantum natural property against $\mathcal{C}[n^a]$. Then for every constant b ≥ 1 we have $\mathrm{BQE} \not\subseteq \mathcal{C}[n^b]$.
Proof. By Theorem 3.4, at least one of two possibilities holds.
In the first case, for every $k \in \mathbb{N}$, there exists a language L ∈ BQE that cannot be computed by general Boolean circuits of size $n^k$. Since by assumption $\mathcal{C}$-circuits can be simulated by general Boolean circuits with only a polynomial blowup in circuit size (Section 2.1.2), it follows in this case that for every b ≥ 1 we have $\mathrm{BQE} \not\subseteq \mathcal{C}[n^b]$.
In the second case, given b ≥ 1, we let a = bυ. Since by assumption there is a promise quantum natural property against $\mathcal{C}[n^a]$, the second case of Theorem 3.4 gives a language in E that is not in $\mathcal{C}[n^{a/\upsilon}] = \mathcal{C}[n^b]$. Since E ⊆ BQE, the result follows.

3.3 Non-trivial quantum learning yields non-uniform circuit lower bounds
Theorem 3.7 (Theorem 1, formally restated). There is a universal constant λ ≥ 1 for which the following holds. Let $\mathcal{C}$ be a concept class. Let $\gamma : \mathbb{N} \to [0, 1/2]$ and $T : \mathbb{N} \to \mathbb{N}$ be arbitrary constructive functions with $\gamma(n) \ge \lambda\cdot 2^{-n/2}$ and $T(n) \le \gamma(n)^2\cdot 2^n/(\lambda n)$ for large enough n. Suppose that, for every k ≥ 1, the class $\mathcal{C}[n^k]$ can be learned in quantum time T(n) with probability ≥ 1/100 and advantage γ(n). Then, for every k ≥ 1, we have $\mathrm{BQE} \not\subseteq \mathcal{C}[n^k]$.
Proof. This proof combines results from the previous two sections to relate quantum learning algorithms to circuit lower bounds through the existence of quantum natural properties. The assumptions on the algorithm that learns $\mathcal{C}[n^k]$ imply that it runs in time at most $T(n) \le \gamma(n)^2\cdot 2^n/(\lambda n)$, has confidence probability $1 - \delta(n) \ge 1/100$ (i.e., $\delta(n) \le 0.99$) and error probability $\varepsilon(n) \le 1/2 - \gamma(n)$. Then, from Theorem 3.1, for every constant k ≥ 1, there is a promise quantum natural property against $\mathcal{C}[n^k]$. Following our notation from Section 2.1.2, the concept class $\mathcal{C}[n^k]$ also denotes a corresponding circuit class. Using Corollary 3.6, the existence of a quantum natural property against $\mathcal{C}[n^k]$ for every k ≥ 1 implies that for every constant b ≥ 1, $\mathrm{BQE} \not\subseteq \mathcal{C}[n^b]$. This completes the proof.

Similarly to previous work [OS17], our arguments can be adapted to show a relation between the non-trivial learnability of a concept class $\mathcal{C}[s]$ of size-s circuits, where $s(n) = n^{\omega(1)}$, and lower bounds against $\mathcal{C}[s']$, where $s, s' : \mathbb{N} \to \mathbb{N}$ and s′ is a function that depends on s. We have decided
not to pursue the most general form of the result in this paper, as our proofs are already significantly
involved and the relation between s and s′ deteriorates as s becomes super-quasi-polynomial, due
to the use of win-win arguments.

4 Technical tools
In this section, we discuss extensions to quantum computing of several fundamental results
from complexity theory. This is needed to establish the correctness of our PRG construction.
In Section 4.1, we provide a (fairly straightforward) quantization of [NW94], where we show
that a quantum distinguisher for the (candidate) PRG $\mathrm{NW}^f$ implies that f can be non-trivially
approximated by quantum algorithms. In Section 4.2, we give a self-contained exposition of a
quantum analogue of the Goldreich-Levin algorithm [GL89] discovered by [AC02]. In Section 4.3,
we quantise the near-optimal uniform hardness amplification result of [IJKW10]. We remark that
this is the most technically involved result in this section. Finally, in Section 4.4, we show how to
use downward and random self-reducibility of languages to devise quantum algorithms to compute
them, adapting ideas from [IW01].

4.1 Nisan-Wigderson generator against quantum adversaries


An ordered family $\mathcal{S} = (S_1, \ldots, S_t)$ of sets is a (t, m, n, α)-design if the following conditions are satisfied:
• $|\mathcal{S}| = t$, and for each $i \in [t]$, $S_i \subseteq [m]$ and $|S_i| = n$.
• For every distinct pair $i, j \in [t]$, $|S_i \cap S_j| \le \alpha$.
Lemma 4.1 ([NW94, Lemma 2.5]). There is an absolute constant c ≥ 1 for which the following holds. For any positive integers n and t such that $n \le t \le 2^n$, there is an ordered family $\mathcal{S}$ that is a $(t, cn^2, n, \log t)$-design. Moreover, given n, t, and an index $i \in [t]$, the set $S_i$ can be output in time poly(t).
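Lemma 4.1 is used as a black box. For intuition, the sketch below implements one standard polynomial-based construction with comparable parameters; we do not claim it is the exact construction of [NW94]. It takes p to be the least prime ≥ n (so $m = p^2 \le 4n^2$ by Bertrand's postulate) and lets the i-th set be the graph $\{(a, q_i(a)) : a < n\}$ of the i-th polynomial of degree at most $d = \lfloor\log_2 t\rfloor$ over GF(p); two distinct such polynomials agree on at most d points, which bounds the pairwise intersections. All function names are hypothetical.

```python
from itertools import combinations, product


def is_prime(p: int) -> bool:
    return p >= 2 and all(p % d for d in range(2, int(p ** 0.5) + 1))


def poly_eval(coeffs, a: int, p: int) -> int:
    """Evaluate sum_e coeffs[e] * a^e over GF(p) with Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = (value * a + c) % p
    return value


def polynomial_design(n: int, t: int):
    """Return (m, sets): t subsets of [m], m = p*p, each of size n,
    with pairwise intersections of size at most floor(log2 t)."""
    p = n
    while not is_prime(p):
        p += 1
    d = t.bit_length() - 1  # floor(log2 t); there are p^(d+1) > t polynomials
    sets = []
    for coeffs in product(range(p), repeat=d + 1):  # polynomials in lexicographic order
        if len(sets) == t:
            break
        # The point (a, q(a)) is encoded as the coordinate a * p + q(a) < p * p.
        sets.append(frozenset(a * p + poly_eval(coeffs, a, p) for a in range(n)))
    return p * p, sets


def max_pairwise_intersection(sets) -> int:
    return max((len(a & b) for a, b in combinations(sets, 2)), default=0)
```

For instance, `polynomial_design(8, 100)` uses p = 11 and d = 6, giving 100 sets of size 8 inside [121] whose pairwise intersections have size at most 6 ≤ log 100.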

Definition 4.2 (Nisan-Wigderson generator [NW94]). Let $f : \{0,1\}^n \to \{0,1\}$, $m = cn^2$, and $n \le t \le 2^n$. In addition, let $\mathcal{S}$ be a $(t, m, n, \log t)$-design. Then the Nisan-Wigderson generator $\mathrm{NW}^f_{\mathcal{S}} : \{0,1\}^m \to \{0,1\}^t$ is the function defined as
$$\mathrm{NW}^f_{\mathcal{S}}(z) \stackrel{\mathrm{def}}{=} f(z|_{S_1})\, f(z|_{S_2}) \cdots f(z|_{S_t}),$$
where $z|_{S_i}$ is the n-bit string formed by restricting $z \in \{0,1\}^m$ to the coordinates given by the i-th set $S_i \in \mathcal{S}$.

From now on we will only consider the designs given by Lemma 4.1, so in order to simplify notation, we omit the underlying collection $\mathcal{S}$ and simply write $\mathrm{NW}^f$.
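Definition 4.2 translates directly into code. The sketch below assumes (a convention chosen for the sketch) that $z|_{S_i}$ lists the bits of z in increasing order of coordinates; the hand-made design (t = 4 sets of size n = 3 inside [6], pairwise intersections of size 1) and the majority function are illustrative choices only, and at these toy sizes the map does not stretch its seed.

```python
def restrict(z, S):
    """z|_S: the bits of z at the coordinates in S, read in increasing order."""
    return tuple(z[i] for i in sorted(S))


def nw_generator(f, design, z):
    """NW^f_S(z) = f(z|_{S_1}) f(z|_{S_2}) ... f(z|_{S_t}), returned as a tuple of bits."""
    return tuple(f(restrict(z, S)) for S in design)


# Hypothetical toy design: 4 subsets of {0, ..., 5} of size 3, any two sharing one coordinate.
DESIGN = [frozenset({0, 1, 2}), frozenset({0, 3, 4}), frozenset({1, 3, 5}), frozenset({2, 4, 5})]


def majority(bits):
    return int(sum(bits) >= 2)
```

For example, on the seed z = (1, 1, 0, 1, 0, 0) the four restrictions are (1, 1, 0), (1, 1, 0), (1, 1, 0) and (0, 0, 0), so the output is (1, 1, 1, 0).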
The next lemma verifies that the usual analysis of the Nisan-Wigderson generator extends to
inherently probabilistic circuits.

Lemma 4.3 (Uniformity and correctness of the NW generator for inherently probabilistic circuits). Let n and t be positive integers such that $1 \le n \le t \le 2^n$, $f : \{0,1\}^n \to \{0,1\}$, and consider the corresponding function $\mathrm{NW}^f : \{0,1\}^m \to \{0,1\}^t$, where $m = cn^2$. Let D be an inherently probabilistic circuit defined over t input bits that satisfies
$$\Big|\Pr_{s\in\{0,1\}^m,\,D}\big[D(\mathrm{NW}^f(s)) = 1\big] - \Pr_{y\in\{0,1\}^t,\,D}\big[D(y) = 1\big]\Big| \ge \gamma. \tag{8}$$
There exists a probabilistic algorithm B with oracle access to f that, given an input $1^t$, runs in time poly(t) and outputs a deterministic oracle circuit $A^O$ of size $O(t^2)$ for which the following holds. For any choice of D as above, with probability $\Omega(\gamma/t^2)$ over the internal randomness of $B^f$, the generated circuit $A^O$ satisfies
$$\Pr_{x\in\{0,1\}^n,\,D}\big[A^D(x) = f(x)\big] \ge \frac12 + \frac{\gamma}{2t}. \tag{9}$$

Proof. We show that there exists a collection of deterministic oracle circuits $A^O$ such that, for any choice of D, a noticeable fraction of such oracle circuits provide the desired advantage. The argument also establishes the existence of a uniform algorithm $B^f$ with the required properties. Note that $B^f$ does not have access to D and that the computation of each $A^O$ does not specify its oracle. However, in order to make the argument more concrete, in our exposition below we refer to the oracle O as D, and consider a fixed D satisfying the assumption of the lemma. After that, we explain how the conclusion of the result follows from this argument.
We prove first that there exists a fixed $j \in \{0, 1, \ldots, t-1\}$ (depending on D) and a sub-collection of deterministic oracle "next-bit predictor" circuits $P^D_{r,d}$, parameterized by $r \in \{0,1\}^{t-j}$ and a fixed $d \in \{0,1\}$ (depending on D), each of size O(t), such that
$$\Pr_{s\in\{0,1\}^m,\,r,\,D}\Big[P^D_{r,d}\big(\mathrm{NW}^f(s)_1, \ldots, \mathrm{NW}^f(s)_j\big) = \mathrm{NW}^f(s)_{j+1}\Big] \ge 1/2 + \gamma/t. \tag{10}$$

For notational simplicity, from here onwards we abuse notation and write $z \sim \mathrm{NW}^f(s)$, meaning that $s \sim \{0,1\}^m$ is uniformly random and $z = \mathrm{NW}^f(s)$. Additionally, for $i \in \{0, 1, \ldots, t\}$ and $z \in \{0,1\}^t$, we let $z^{(i)}$ be a random variable whose first i bits are equal to those of z and whose last t − i bits form a uniformly random string $r \in \{0,1\}^{t-i}$. Also, let
$$p_i = \Pr_{z\sim\mathrm{NW}^f(s),\,D}\big[D(z^{(i)}) = 1\big],$$
and observe that
$$p_0 = \Pr_{y\in\{0,1\}^t,\,D}[D(y) = 1] \quad\text{and}\quad p_t = \Pr_{z\sim\mathrm{NW}^f(s),\,D}[D(z) = 1].$$
Therefore, by Eq. (8), we have $|p_t - p_0| \ge \gamma$. By the triangle inequality, there exists $j \in \{0, 1, \ldots, t-1\}$ such that $|p_{j+1} - p_j| \ge \gamma/t$. Setting $\hat z = z^{(j)}$ for convenience, notice that

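$$
\begin{aligned}
\gamma/t &\le |p_{j+1} - p_j| \\
&= \Big|\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,z^{(j+1)}}\big[D(z^{(j+1)}) = 1\big] - \mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1\big]\Big| \\
&= \Big|\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big] - \frac12\Big(\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big] + \mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} \neq z_{j+1}\big]\Big)\Big| \\
&= \frac12\Big|\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big] - \mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} \neq z_{j+1}\big]\Big| \\
&= \frac12\Big|\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big] + \mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) \neq 1 \mid \hat z_{j+1} \neq z_{j+1}\big] - 1\Big|,
\end{aligned}
$$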

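where in the second equality we use three simple facts:
$$\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,z^{(j+1)}}\big[D(z^{(j+1)}) = 1\big] = \mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big]$$
and
$$\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1\big] = \Pr_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}[\hat z_{j+1} = z_{j+1}]\,\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big] + \Pr_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}[\hat z_{j+1} \neq z_{j+1}]\,\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} \neq z_{j+1}\big]$$
and $\Pr_{z\sim\mathrm{NW}^f(s),D,\hat z}[\hat z_{j+1} = z_{j+1}] = \Pr_{z\sim\mathrm{NW}^f(s),D,\hat z}[\hat z_{j+1} \neq z_{j+1}] = \frac12$, since $\hat z_{j+1} = r_1$ is a uniformly random bit.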
The inequality above motivates the following definition. Let $d \in \{0,1\}$ be such that $(-1)^d(p_{j+1} - p_j) > 0$. We define the "next-bit predictor" $P^D_{r,d}$, for $r \in \{0,1\}^{t-j}$, as follows. On input $z_1, \ldots, z_j$ (where $z \sim \mathrm{NW}^f(s)$), $P^D_{r,d}$ makes an oracle call to D on input $z_1, \ldots, z_j, r_1, \ldots, r_{t-j}$; let b be its output. If b = 1, then P outputs $r_1 \oplus d$; otherwise P outputs $r_1 \oplus d \oplus 1$. It follows from the inequality above, the definition of $P^D_{r,d}$, and a simple manipulation (see e.g. [AB09, Proof of Theorem 9.11]) that
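$$\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,r,\,D}\big[P^D_{r,d}(z_1, \ldots, z_j) = z_{j+1}\big] - \frac12 = \frac{(-1)^d}{2}\Big(\mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) = 1 \mid \hat z_{j+1} = z_{j+1}\big] + \mathop{\mathbb{E}}_{z\sim\mathrm{NW}^f(s),\,D,\,\hat z}\big[D(\hat z) \neq 1 \mid \hat z_{j+1} \neq z_{j+1}\big] - 1\Big) \ge \gamma/t.$$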

This concludes the construction of a collection of deterministic oracle circuits $P^D_{r,d}$ satisfying Eq. (10).
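The hybrid argument and the next-bit predictor can be checked exactly on toy examples. The sketch below assumes a deterministic distinguisher D given as a Python function on bit tuples and an arbitrary source distribution `dist` (a dictionary from t-bit tuples to probabilities) in place of $\mathrm{NW}^f(s)$; it computes the hybrid probabilities $p_i$ and the success probability of $P^D_{r,d}$ with exact rational arithmetic. For every j and d the success probability should equal $\frac12 + (-1)^d(p_{j+1} - p_j)$, the identity behind Eq. (10). Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def hybrid_probs(dist, D, t):
    """p_i = Pr[D(z^(i)) = 1] for i = 0..t: the first i bits come from z ~ dist,
    the remaining t - i bits are uniform."""
    probs = []
    for i in range(t + 1):
        weight = Fraction(1, 2 ** (t - i))
        probs.append(sum(pz * weight * D(z[:i] + r)
                         for z, pz in dist.items()
                         for r in product((0, 1), repeat=t - i)))
    return probs


def predictor_success(dist, D, t, j, d):
    """Pr_{z, r}[P^D_{r,d}(z_1..z_j) = z_{j+1}] for the predictor defined in the text."""
    weight = Fraction(1, 2 ** (t - j))
    total = Fraction(0)
    for z, pz in dist.items():
        for r in product((0, 1), repeat=t - j):
            b = D(z[:j] + r)  # oracle call D(z_1, ..., z_j, r_1, ..., r_{t-j})
            guess = r[0] ^ d if b == 1 else r[0] ^ d ^ 1
            total += pz * weight * (guess == z[j])  # z[j] is z_{j+1} (0-indexed tuple)
    return total
```

Choosing j with the largest gap $|p_{j+1} - p_j|$ and d by its sign then gives success probability at least $\frac12 + |p_t - p_0|/t$, as in the text.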

We now construct an algorithm $A^D_{j,r,r',d}(x)$ that has hardwired in its code the (t − j)-bit (random) string r from above and another (m − n)-bit (random) string r′ introduced next. We will show that, on average over the choices of r and r′, the algorithm computes f(x) on a random x with noticeable advantage over 1/2. For each i ≠ j + 1, we now assume (and subsequently prove) the existence of a deterministic circuit $C_{i,r'}(x)$ of size O(t) that computes $\mathrm{NW}^f(s)_i$, where $s \in \{0,1\}^m$ is defined by $s|_{S_{j+1}} = x$ and $s|_{[m]\setminus S_{j+1}} = r'$.
The circuit $A^D_{j,r,r',d}$ on an input x invokes $P^D_{r,d}$ as follows. $A^D_{j,r,r',d}(x)$ fixes the seed s consisting of $s|_{S_{j+1}} = x$ and $s|_{[m]\setminus S_{j+1}} = r'$, computes $\mathrm{NW}^f(s)_1, \ldots, \mathrm{NW}^f(s)_j$ using the corresponding circuits $C_{i,r'}(x)$, and outputs $P^D_{r,d}\big(\mathrm{NW}^f(s)_1, \ldots, \mathrm{NW}^f(s)_j\big)$.
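Notice that, averaging over the random choices of r and r′, and writing s for the random string obtained from the random choices of x and r′, we have
$$\mathop{\mathbb{E}}_{r,r'}\Big[\Pr_{x,D}\big[A^D_{j,r,r',d}(x) = f(x)\big]\Big] = \mathop{\mathbb{E}}_{r,r'}\Big[\Pr_{x\in\{0,1\}^n,\,D}\big[P^D_{r,d}\big(\mathrm{NW}^f(s)_1, \ldots, \mathrm{NW}^f(s)_j\big) = f(x)\big]\Big] = \Pr_{z\sim\mathrm{NW}^f(s),\,r,\,D}\big[P^D_{r,d}(z_1, \ldots, z_j) = z_{j+1}\big] \ge \frac12 + \gamma/t, \tag{11}$$
where the second equality uses that $f(x) = \mathrm{NW}^f(s)_{j+1}$ and that s is uniformly distributed over $\{0,1\}^m$.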

Consequently, we get that
$$\Pr_{r,r'}\Big[\Pr_{x,D}\big[A^D_{j,r,r',d}(x) \neq f(x)\big] \ge \frac12 - \frac{\gamma}{2t}\Big] \le \frac{\mathbb{E}_{r,r'}\big[\Pr_{x,D}[A^D_{j,r,r',d}(x) \neq f(x)]\big]}{\frac12 - \frac{\gamma}{2t}} \le \frac{\frac12 - \frac{\gamma}{t}}{\frac12 - \frac{\gamma}{2t}} = \frac{1 - 2\gamma/t}{1 - \gamma/t} \le 1 - \gamma/t, \tag{12}$$
where the first inequality comes from Markov's inequality, the second inequality comes from Equation (11), and the last inequality comes from the fact that $1 - 2x \le (1 - x)^2$ for every $x \in \mathbb{R}$.
Using the assumption about the size of $C_{i,r'}$, the size of $A^O$ is trivially $O(t^2)$. We next prove the existence of circuits $C_{i,r'}(x)$ of the claimed size. First notice that, by Lemma 4.1, $\mathrm{NW}^f(s)_i$ depends on at most log t bits of $s|_{S_{j+1}} = x$, since $|S_i \cap S_{j+1}| \le \log t$. Therefore, for fixed i and r′, we can define $C_{i,r'}(x)$ as a lookup table of size O(t) that stores the corresponding values of f for all possible choices of the ≤ log t relevant bits of x.
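The lookup-table circuits $C_{i,r'}$ can be sketched as follows. The helper names and the convention that x lists the bits of $s|_{S_{j+1}}$ in increasing coordinate order are assumptions made for the sketch. The table has $2^{|S_i \cap S_{j+1}|} \le t$ entries, each obtained with one query to f, mirroring how $B^f$ fills it with membership queries.

```python
from itertools import product


def build_seed(m, S_next, x, r_prime):
    """Seed s with s|_{S_{j+1}} = x and the remaining m - n coordinates set to r'."""
    s = [None] * m
    for pos, bit in zip(sorted(S_next), x):
        s[pos] = bit
    rest = iter(r_prime)
    for pos in range(m):
        if s[pos] is None:
            s[pos] = next(rest)
    return s


def lookup_circuit(f, m, S_i, S_next, r_prime):
    """Return (C_{i,r'}, table size): a table indexed by the bits of x at S_i ∩ S_{j+1}.

    Bits of x outside the shared coordinates do not affect NW^f(s)_i, so they are
    set to 0 while the table is filled; each entry costs one query to f.
    """
    x_pos = {pos: k for k, pos in enumerate(sorted(S_next))}  # coordinate -> index in x
    shared = sorted(S_i & S_next)
    table = {}
    for bits in product((0, 1), repeat=len(shared)):
        x = [0] * len(S_next)
        for pos, bit in zip(shared, bits):
            x[x_pos[pos]] = bit
        s = build_seed(m, S_next, x, r_prime)
        table[bits] = f(tuple(s[p] for p in sorted(S_i)))
    return (lambda x: table[tuple(x[x_pos[pos]] for pos in shared)]), len(table)
```

On the toy design with sets {0,1,2}, {0,3,4}, {1,3,5}, {2,4,5} and $S_{j+1}$ = {0,3,4}, every other set shares a single coordinate with $S_{j+1}$, so each table has just 2 entries.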
Finally, we define the uniform algorithm $B^f(1^t)$ as follows. First, it picks $j^* \in \{0, 1, \ldots, t-1\}$, $d^* \in \{0,1\}$, $r \in \{0,1\}^{t-j^*}$ and $r' \in \{0,1\}^{m-n}$ uniformly at random. Then $B^f$ outputs the oracle circuit $A^O_{j^*,r,r',d^*}$, where the lookup table of each relevant circuit $C_{i,r'}$ can be computed by B using membership queries to its oracle f. Note that the computation of $B^f$ and the description of $A^O$ are indeed oblivious to the particular choice of D.
Since for any fixed D as in the hypothesis of the result good choices of $j^*$ and $d^*$ are made with probability at least $\frac{1}{2t}$, and in this case r and r′ yield a circuit $A^D_{j^*,r,r',d^*}$ with the desired properties with probability at least γ/t by Equation (12), we have that for any admissible D, with probability at least $\Omega(\gamma/t^2)$ over its internal randomness, $B^f$ outputs a deterministic oracle circuit $A^O_{j^*,r,r',d^*}$ that satisfies Equation (9). (Note that the analysis shows that the construction succeeds with the desired probability for any fixed choice of the inherently probabilistic circuit D, though different random choices of the parameters d, r, and r′ might be needed as a function of D.)

We now establish a quantum analogue of this result, stated in a way that is convenient for our
application.

Lemma 4.4 (Uniform Nisan-Wigderson reconstruction for quantum circuits). Let $s_F, s_D, t : \mathbb{N} \to \mathbb{N}$ and $\gamma : \mathbb{N} \to [0,1]$ be constructive functions, where $n \le t(n) \le 2^n$ for every n. There exists a sequence $\{C^{\mathrm{NW}}_n\}_{n\ge1}$ of quantum circuits $C^{\mathrm{NW}}_n$ such that:

(i) Input: Each circuit $C^{\mathrm{NW}}_n$ gets as input strings $1^n$, $1^{t(n)}$, code(D), and code(F), where code(D) encodes a quantum circuit D of size $\le s_D(n)$ over t(n) input bits, and code(F) encodes a quantum circuit F of size $\le s_F(n)$ over n input bits.

(ii) Uniformity and Size: Each circuit $C^{\mathrm{NW}}_n$ is of size $S(n) = \mathrm{poly}(t(n), s_F(n), s_D(n))$, and there is a uniform deterministic algorithm that given $1^n$ runs in time poly(S(n)) and prints the string code($C^{\mathrm{NW}}_n$).

(iii) Output and Correctness: Suppose that the input circuit F computes a Boolean function $f : \{0,1\}^n \to \{0,1\}$, and consider the associated function $\mathrm{NW}^f : \{0,1\}^{m(n)} \to \{0,1\}^{t(n)}$, where $m(n) = cn^2$. In addition, assume that the input circuit D satisfies
$$\Big|\Pr_{s\in\{0,1\}^{m(n)},\,D}\big[D(\mathrm{NW}^f(s)) = 1\big] - \Pr_{y\in\{0,1\}^{t(n)},\,D}\big[D(y) = 1\big]\Big| \ge \gamma(n). \tag{13}$$
Then with probability $\Omega(\gamma(n)/t(n)^2)$ over its output measurement, $C^{\mathrm{NW}}_n$ produces a string code(A) encoding a quantum circuit A of size $O(t(n)^2\cdot s_D(n))$ such that
$$\Pr_{x\in\{0,1\}^n,\,A}[A(x) = f(x)] \stackrel{\mathrm{def}}{=} \mathop{\mathbb{E}}_{x\sim\{0,1\}^n}\Big[\big\|\Pi_{f(x)} A|x\rangle|0^\ell\rangle\big\|^2\Big] \ge \frac12 + \frac{\gamma(n)}{2t(n)}. \tag{14}$$

Note. We stress that the result is non-trivial because the size of A is independent of $s_F(n)$ (otherwise it would be sufficient for $C^{\mathrm{NW}}_n$ to output the string code(F), which encodes a quantum circuit that computes f(x) on every input x with probability ≥ 2/3).

Proof of Lemma 4.4. We follow the argument in the proof of Lemma 4.3. Here the circuit $C^{\mathrm{NW}}_n$ takes the role of $B^f$, the oracle f is replaced by the input circuit F, and the input quantum circuit D plays the role of the inherently probabilistic circuit D of Lemma 4.3.
Under the assumption that the quantum circuit D distinguishes the output of $\mathrm{NW}^f(s)$ on a random seed s from a random string y (Equation (13)), and proceeding as in the proof of Lemma 4.3 with the quantum circuit D in place of the inherently probabilistic circuit, it follows from Lemma 2.17 that the same probability analysis holds. Consequently, there is a quantum circuit A that employs D as a subroutine and computes f with success probability at least $1/2 + \gamma/(2t)$. The only relevant difference here is that, in order to generate a correct description of A with the desired probability, it is necessary to evaluate the function f on different input strings (recall that we hardwire the corresponding answers in the circuits $C_{i,r'}$, which appear as sub-circuits of A). To achieve that, $C^{\mathrm{NW}}_n$ simulates the input quantum circuit F(x) (using a universal quantum circuit), amplifying its success probability of computing f(x) via standard techniques in a way that reduces the probability that F errs on any string employed during these simulations to less than 1/2. Whenever the correct values of f needed in the circuits $C_{i,r'}$ are produced, we get that with probability $\Omega(\gamma(n)/t(n)^2)$ over its output measurement, $C^{\mathrm{NW}}_n$ successfully generates a "good" quantum circuit A from circuits F and D and its internal random choices, i.e., Equation (14) holds for A.
While in the proof of Lemma 4.3 the size of each deterministic oracle circuit $A^O$ is $O(t^2)$, here the number of gates in the corresponding quantum circuit A is $O(t(n)^2\cdot s_D(n))$, since A explicitly incorporates the computation of circuit D, which by assumption has size at most $s_D(n)$.

We now discuss the uniformity and size of each quantum circuit $C^{\mathrm{NW}}_n$. Recall that $C^{\mathrm{NW}}_n$ operates in a way that is analogous to algorithm $B^f$ from Lemma 4.3. A bit more formally, $C^{\mathrm{NW}}_n$ must output a string code(A) from strings code(F) and code(D) (and the auxiliary parameters $1^n$ and $1^{t(n)}$). The computation of $C^{\mathrm{NW}}_n$ involves manipulating the codes of F and D to produce the code of A, and includes the simulation of the quantum circuit F on at most $t(n)\cdot t(n)$ distinct input strings (we have at most t(n) circuits $C_{i,r'}$, and each of them stores the value of f on at most t inputs), with an overhead for the amplification of the success probability. Since the descriptions of F and D have length $\mathrm{poly}(s_F(n))$ and $\mathrm{poly}(s_D(n))$, respectively, and each quantum simulation can be done using $\mathrm{poly}(s_F(n))$ gates, it follows that each quantum circuit $C^{\mathrm{NW}}_n$ can be implemented with $S(n) = \mathrm{poly}(t(n), s_F(n), s_D(n))$ gates. Finally, it is not hard to see that the code of $C^{\mathrm{NW}}_n$ is explicit and can be deterministically generated from $1^n$ in time poly(S(n)), since the description of $B^f$ in the proof of Lemma 4.3 is also explicit.

4.2 Goldreich-Levin lemma in the quantum setting


Lemma 4.5. Let $f : \{0,1\}^{kn} \to \{0,1\}^k$. Suppose there is a quantum circuit A (that uses m ≥ 1 workspace qubits) satisfying
$$\mathop{\mathbb{E}}_{x\in\{0,1\}^{kn}}\mathop{\mathbb{E}}_{r\in\{0,1\}^k}\big\|\Pi_{f(x)\cdot r}A|x, r, 0^m\rangle\big\|^2 \ge \frac12 + \gamma.$$
Then there is a quantum oracle circuit $B^A$ of size O(kn) that has oracle access to A (and to its inverse $A^\dagger$) such that
$$\mathop{\mathbb{E}}_{x,\,B^A}\Big[\big\|\Pi_{f(x)}B^A|x, 0^{k+m+1}\rangle\big\|^2\Big] \ge \frac{\gamma^3}{2}.$$
Moreover, a quantum circuit B of this form can be constructed from a quantum circuit A as above by a uniform sequence of circuits, which we formalise as follows. Let $k, s_A : \mathbb{N} \to \mathbb{N}$ and $\gamma : \mathbb{N} \to [0, 1/2]$ be constructive functions. In addition, let $f_n : \{0,1\}^{kn} \to \{0,1\}^k$ be a sequence of functions, where k = k(n). Then there is a sequence $\{C^{\mathrm{GL}}_n\}_{n\ge1}$ of deterministic circuits $C^{\mathrm{GL}}_n$ of size $\mathrm{poly}(n, k(n), s_A(n))$ for which the following holds. If $C^{\mathrm{GL}}_n$ is given as input strings $1^n$ and a description code(A) of a quantum circuit A of size $\le s_A(n)$ with the property above, then it outputs a string code(B) describing a quantum circuit B of size $O(n\cdot k(n)\cdot s_A(n))$ such that
$$\mathop{\mathbb{E}}_{x,\,B}\Big[\big\|\Pi_{f_n(x)}B|x, 0^{k+m+1}\rangle\big\|^2\Big] \ge \frac{\gamma(n)^3}{2}.$$

Proof. Without loss of generality, let us assume that A measures the first qubit of $A|x, r, 0^m\rangle$ and produces it as an output. We define the algorithm $B^A$ on input x as follows:

1. Start with $\frac{1}{\sqrt{2^k}}\sum_r |x, r, 0^m, 1\rangle$.

2. Apply A on the first kn + k + m qubits.

3. Apply a controlled-Z operation with the "output of A" (i.e., first qubit) as the control qubit and the last qubit as the target qubit.

4. Apply $A^\dagger$ on the first kn + k + m qubits.

5. Apply the Hadamard transform on the second register.

6. Measure all qubits in the computational basis.

7. If the outcome is of the form $|x, a, 0^m, 1\rangle$, output a.
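To analyze its correctness, we first define $G = \big\{x : \mathbb{E}_{r\in\{0,1\}^k}\|\Pi_{f(x)\cdot r}A|x, r, 0^m\rangle\|^2 \ge \frac12 + \frac{\gamma}{2}\big\}$. We will show that
$$|G| \ge \frac{\gamma}{2}\cdot 2^{kn}, \tag{15}$$
and that for every x ∈ G, we have
$$\mathop{\mathbb{E}}_{B^A}\Big[\big\|\Pi_{f(x)}B^A|x, 0^{k+m+1}\rangle\big\|^2\Big] \ge \gamma^2. \tag{16}$$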

Before proving Equations (15) and (16), notice that they directly imply our statement, since
$$\mathop{\mathbb{E}}_{x,\,B^A}\Big[\big\|\Pi_{f(x)}B^A|x, 0^{k+m+1}\rangle\big\|^2\Big] \ge \frac{|G|}{2^{kn}}\mathop{\mathbb{E}}_{x\in G,\,B^A}\Big[\big\|\Pi_{f(x)}B^A|x, 0^{k+m+1}\rangle\big\|^2\Big] \ge \frac{\gamma}{2}\mathop{\mathbb{E}}_{x\in G,\,B^A}\Big[\big\|\Pi_{f(x)}B^A|x, 0^{k+m+1}\rangle\big\|^2\Big] \ge \frac{\gamma^3}{2},$$
where in the first inequality we removed some non-negative values, in the second inequality we use Equation (15) and in the third inequality we use Equation (16).
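Let us now show Equation (15). Suppose toward a contradiction that $|G| < \frac{\gamma}{2}\cdot 2^{kn}$. Then we have
$$
\begin{aligned}
\frac12 + \gamma &\le \mathop{\mathbb{E}}_{x\in\{0,1\}^{kn}}\mathop{\mathbb{E}}_{r\in\{0,1\}^k}\big\|\Pi_{f(x)\cdot r}A|x, r, 0^m\rangle\big\|^2 \\
&= \frac{|G|}{2^{kn}}\mathop{\mathbb{E}}_{x\in G}\mathop{\mathbb{E}}_{r\in\{0,1\}^k}\big\|\Pi_{f(x)\cdot r}A|x, r, 0^m\rangle\big\|^2 + \Big(1 - \frac{|G|}{2^{kn}}\Big)\mathop{\mathbb{E}}_{x\notin G}\mathop{\mathbb{E}}_{r\in\{0,1\}^k}\big\|\Pi_{f(x)\cdot r}A|x, r, 0^m\rangle\big\|^2 \\
&< \frac{|G|}{2^{kn}} + \Big(1 - \frac{|G|}{2^{kn}}\Big)\Big(\frac12 + \frac{\gamma}{2}\Big) \\
&\le \frac12 + \frac{3\gamma}{4},
\end{aligned}
$$
where the last step uses $|G|/2^{kn} < \gamma/2$ (the previous expression is non-decreasing in |G|, as γ ≤ 1). This is a contradiction, since γ > 0. Therefore, Equation (15) must be true.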
Now we prove Equation (16), and for that let us fix an arbitrary x ∈ G. This part of the proof closely follows the proof by Cleve and Adcock [AC02]. Without loss of generality, we can assume that A acts on the input state as
$$A|x, r, 0^m\rangle = |x, r\rangle \otimes \big(\alpha_{x,r,0}|\psi_{x,r,0}\rangle|f(x)\cdot r\rangle + \alpha_{x,r,1}|\psi_{x,r,1}\rangle|1 \oplus f(x)\cdot r\rangle\big).$$
From the assumption that x ∈ G, we have that
$$\mathop{\mathbb{E}}_r\big[|\alpha_{x,r,0}|^2\big] \ge \frac12 + \frac{\gamma}{2} \quad\text{and}\quad \mathop{\mathbb{E}}_r\big[|\alpha_{x,r,1}|^2\big] \le \frac12 - \frac{\gamma}{2}.$$
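We define two quantum states. First, we start with $\frac{1}{\sqrt{2^k}}\sum_r|x, r, 0^m, 1\rangle$ and apply steps 2 and 3 of $B^A$ to obtain
$$|\phi_x\rangle = |x\rangle \otimes \frac{1}{\sqrt{2^k}}\sum_r (-1)^{f(x)\cdot r}\Big(\alpha_{x,r,0}|r\rangle|\psi_{x,r,0}\rangle|f(x)\cdot r, 1\rangle - \alpha_{x,r,1}|r\rangle|\psi_{x,r,1}\rangle|1 \oplus f(x)\cdot r, 1\rangle\Big).$$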

In order to analyze the probability that the measurement outcome a (the output of $B^A$ in step (7)) equals f(x), consider the state obtained by starting with $|x, f(x), 0^m, 1\rangle$, applying the Hadamard transform on the second register, and then applying A on the first kn + k + m qubits:
$$|\sigma_x\rangle = \frac{1}{\sqrt{2^k}}|x\rangle\sum_r (-1)^{f(x)\cdot r}\Big(\alpha_{x,r,0}|r\rangle|\psi_{x,r,0}\rangle|f(x)\cdot r, 1\rangle + \alpha_{x,r,1}|r\rangle|\psi_{x,r,1}\rangle|1 \oplus f(x)\cdot r, 1\rangle\Big). \tag{17}$$
The probability that $B^A(x)$ outputs f(x) is then given by
$$|\langle\sigma_x|\phi_x\rangle|^2 = \Big(\mathop{\mathbb{E}}_r\big[|\alpha_{x,r,0}|^2 - |\alpha_{x,r,1}|^2\big]\Big)^2 \ge \gamma^2, \tag{18}$$
where we use the fact that for x ∈ G, we have $\mathbb{E}_r\big[|\alpha_{x,r,0}|^2 - |\alpha_{x,r,1}|^2\big] \ge \gamma$. This concludes the proof of Equation (16). Notice that the size of $B^A$ can be checked by inspection from its description.
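Equation (18) can be checked numerically on a toy instance. The sketch below fixes one x and writes `secret` for the k-bit string f(x); it uses no workspace qubits (a simplification of the lemma, which allows m ≥ 1), tracks the last qubit only through the phase that the controlled-Z applies (that qubit stays |1⟩), and models A by a hypothetical real rotation with $|\alpha_{x,r,0}|^2 = \cos^2\theta_r$. Under these assumptions, the probability of observing a = f(x) with the output qubit returned to 0 should equal $(\mathbb{E}_r[\cos^2\theta_r - \sin^2\theta_r])^2$. All names are ours.

```python
import math


def parity(a: int, b: int) -> int:
    """a . b mod 2 for bit strings encoded as integers."""
    return bin(a & b).count("1") % 2


def hadamard_k(vec):
    """Apply H^{(x)k} to a real vector of length 2^k (fast Walsh-Hadamard transform)."""
    v, h = list(vec), 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return [amp / math.sqrt(len(v)) for amp in v]


def toy_unitary(s_r: int, theta: float):
    """Real 2x2 unitary on A's output qubit: |0> -> cos(theta)|s_r> + sin(theta)|1 xor s_r>."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]] if s_r == 0 else [[s, c], [c, -s]]


def apply(mat, vec, transpose=False):
    if transpose:  # for a real orthogonal matrix, the transpose is the inverse
        mat = [[mat[0][0], mat[1][0]], [mat[0][1], mat[1][1]]]
    return [mat[0][0] * vec[0] + mat[0][1] * vec[1], mat[1][0] * vec[0] + mat[1][1] * vec[1]]


def gl_success(secret: int, k: int, thetas) -> float:
    """Probability that steps 1-7 observe (a, output qubit) = (secret, 0) for one fixed x."""
    dim = 2 ** k
    mats = [toy_unitary(parity(secret, r), thetas[r]) for r in range(dim)]
    psi = [[1 / math.sqrt(dim), 0.0] for _ in range(dim)]               # step 1: uniform over r
    psi = [apply(mats[r], psi[r]) for r in range(dim)]                  # step 2: A
    psi = [[amp0, -amp1] for amp0, amp1 in psi]                         # step 3: controlled-Z
    psi = [apply(mats[r], psi[r], transpose=True) for r in range(dim)]  # step 4: A^dagger
    branch0 = hadamard_k([amp0 for amp0, _ in psi])                     # step 5, output-0 branch
    return branch0[secret] ** 2                                         # steps 6-7
```

When every $\theta_r = 0$ (A always answers $f(x)\cdot r$ correctly) the procedure recovers f(x) with probability 1, and in general the probability decays quadratically with the average bias, as Eq. (18) states.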
We now discuss the moreover part of our lemma. The circuit $C^{\mathrm{GL}}_n$ receives the input code(A) (and the auxiliary parameter $1^n$) and outputs the circuit B that executes steps (1)-(7) described above, with each oracle call to A replaced by the execution of code(A) (or of its inverse). It is straightforward from the previous calculations that B has the desired properties and that it has size at most $O(k(n)\cdot n\cdot s_A(n))$. Notice that the circuit $C^{\mathrm{GL}}_n$ that prints B is deterministic and can be implemented using $\mathrm{poly}(n, k(n), s_A(n))$ gates. We also notice that the code of $C^{\mathrm{GL}}_n$ is explicit and can be deterministically generated from $1^n$ in time $\mathrm{poly}(n, k(n), s_A(n))$.

4.3 Local list decoding and uniform hardness amplification for quantum circuits
In this section, we start with an arbitrary function g : {0,1}^n → {0,1}, which we assume to be mildly hard, and our goal is to amplify its hardness using techniques from local list decoding of (classical) error-correcting codes. Specifically, we prove a quantum analogue of the direct product theorem of Impagliazzo, Jaiswal, Kabanets, and Wigderson [IJKW10]. For simplicity of notation, we fix n ∈ N and denote U = {0,1}^n. Roughly speaking, we will prove that concatenating k independent copies of g amplifies its hardness exponentially as a function of k. To that end, we first define the domain of this concatenation.
Definition 4.6 (k-sets). Let S_{n,k} = {S ⊆ U : |S| = k} be the set of all subsets of {0,1}^n of size k. When n is fixed, we denote S_k = S_{n,k}.
Remark 4.7. We consider the n-bit strings in a set B ∈ S_{n,k} in lexicographic order when B is given as input to a Boolean circuit.
We then consider the k-direct product of g, denoted g^k : S_k → {0,1}^k, where g^k(B) is the concatenation of g(x) for every x ∈ B in a canonical order, i.e.,

$$g^k(x_1,\ldots,x_k) = \big(g(x_1),\ldots,g(x_k)\big).$$

Whenever it is clear from the context, we abuse notation and write g(B) = g^k(B) for B ∈ S_k.
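As a concrete illustration of g^k, the following Python sketch evaluates the direct product on a k-set given as a collection of n-bit strings, ordering the strings lexicographically as in Remark 4.7. The toy choice of g as the parity function and the helper names are assumptions made only for this example.

```python
from typing import Callable, Iterable, Tuple


def direct_product(g: Callable[[str], int], B: Iterable[str]) -> Tuple[int, ...]:
    """g^k(B): apply g to every n-bit string of the k-set B, listing the
    strings in lexicographic order (the canonical order of Remark 4.7)."""
    return tuple(g(x) for x in sorted(B))


def parity(x: str) -> int:
    """Toy stand-in for g: the parity of the bits of x."""
    return x.count("1") % 2


if __name__ == "__main__":
    print(direct_product(parity, {"110", "001", "011"}))  # -> (1, 0, 0)
```

Because all strings have the same length n, Python's string ordering coincides with the lexicographic order of the n-bit strings.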
There are well-known connections between hardness amplification and classical error-correcting codes (see, for example, [AB09, Chapter 19]), and part of our terminology reflects them. For instance, the code can be viewed as corresponding to g^k, and decoding corresponds to computing g(y) for some input y ∈ U when given appropriate access to g^k.
As a preliminary step, we consider hardness amplification for inherently probabilistic computations, which we introduced in Section 2.7 as an intermediate model between classical and quantum circuits.
4.3.1 Inherently probabilistic circuits
Recall that an inherently probabilistic circuit G for computing a function g : {0,1}^m → {0,1}^ℓ with probability at least ε is a circuit that assigns to each input z ∈ {0,1}^m a distribution G(z) supported over {0,1}^ℓ such that

$$\Pr_{z\sim\{0,1\}^m,\;v\sim G(z)}\big[v = g(z)\big] \ge \varepsilon.$$

Our goal in this section is to show that an inherently probabilistic circuit G that computes g^k with small probability ε > 0 can be used as a subroutine in a randomized circuit with oracle access to G to compute g with high probability 1 − δ. Moreover, we aim to design a uniform randomized algorithm that is able to generate, with non-trivial probability ζ as a function of ε, a description of this oracle circuit from a description of G.
We now state the main theorem of this section. In the following, it might be instructive to think of the following setting of parameters as a function of the input size: ε = 2^{−√n}, δ = 1/poly(n), and k = poly(n).
Theorem 4.8 (Local list decoding for inherently probabilistic circuits). There exists a universal constant C ≥ 1 for which the following holds. Let n ≥ 1 be a positive integer, let k be an even integer, and let ε, δ > 0 satisfy

$$k \ge C\cdot\frac{1}{\delta}\cdot\Big(\log\frac{1}{\delta}+\log\frac{1}{\varepsilon}\Big). \qquad (19)$$
If G is an inherently probabilistic circuit defined over S_{n,k} with k output bits such that

$$\mathbb{E}_{B\sim S_{n,k},\;G}\big[G(B) = g^k(B)\big] \ge \varepsilon, \qquad (20)$$

then there is a randomized oracle circuit B^O of size poly(n, k, log(1/δ), 1/ε) such that

$$\mathbb{E}_{x\sim U,\;y\sim\{0,1\}^*,\;G}\big[B^G(x,y) = g(x)\big] \ge 1-\delta. \qquad (21)$$

Moreover, there is a uniform randomized algorithm D that, given n, k, ε, and δ satisfying the conditions of the theorem, access to a description of G, and the ability to run G on a given input B ∈ S_{n,k}, runs in time poly(n, k, log(1/δ), 1/ε) and outputs, with probability ζ = Ω(ε^2) over its internal randomness and the randomness of G, a description of a circuit B^O with this property.
Remark 4.9 (Amplification). We observe that the generating probability ζ can be amplified using
standard techniques (repetition and hypothesis testing) if the uniform randomized algorithm D is
also given oracle access to the function g.
Remark 4.10 (On the correlation between G and g^k). Before we proceed with the proof of Theorem 4.8, it is worth pointing out different ways in which the inherently probabilistic circuit G can be correlated with g^k. First, G might behave as a deterministic circuit, being correct on an arbitrary set V ⊆ S_k of relative size ε (i.e., on each B ∈ V we have G(B) = g^k(B) with probability 1), and being incorrect elsewhere. At the other extreme, it is also possible for G to agree with g^k(B) on each B ∈ S_k with probability about ε over its internal randomness, spreading out its correlation with g^k across all inputs. Since G is inherently probabilistic, there is no simple way of reducing this case to the preceding one (e.g., by fixing G's internal randomness). Finally, G's behavior may combine the two aforementioned cases in an arbitrary way, while maintaining an overall advantage ε with respect to g^k. A proof of Theorem 4.8 needs to address all possible scenarios.
To present the local decoding algorithm with which we will prove Theorem 4.8, it will be
convenient to refer to the bipartite graph (the incidence graph of the Johnson scheme) that contains
all k/2-sets on the left and all k-sets on the right, where the edges correspond to incidence of the
sets.
Definition 4.11 (Edges and Neighbors). We define the set of edges I = {(A, B) ∈ S_{k/2} × S_k : A ⊆ B}. We define the neighbors of a set A by N_I(A) = {B ∈ S_k : (A, B) ∈ I}, and the neighbors of A ∈ S_{k/2} and x ∈ U \ A by N_I(A, x) = {B ∈ S_k : A ∪ {x} ⊆ B}.
We proceed to present the key sub-procedure for the list-decoding algorithm, which is a circuit
that, given a k/2-set A and an assignment w to A, attempts to compute g(x) for a given x by
choosing a random neighbour B ′ of A and x, running G on B ′ and outputting the corresponding
value for x if the output of G(B ′ ) is consistent with w on A.
Construction 4.12 (The decoding circuit C_{A,w}). For a fixed A ∈ S_{k/2}, w ∈ {0,1}^{k/2}, and a parameter T, we define the randomized circuit C_{A,w} with oracle access to G that, on input x ∈ U, works as follows:
1. If x ∈ A, output w|_x.^12
2. Repeat for T steps:
2.1. Pick a uniformly random B′ ∈ N_I(A, x).
2.2. Sample v′ ∼ G(B′).
2.3. If v′|_A = w, output v′|_x.
3. Output ⊥.
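The steps of C_{A,w} translate almost line by line into code. The sketch below is a classical toy simulation under a hypothetical interface (an assumption, not the paper's circuit model): elements of U are integers in range(2^n) (integer order equals lexicographic order of n-bit strings), a k-set is a sorted tuple, and the oracle `G(B, rng)` returns one sample v ∼ G(B) as k bits aligned with B. The names `restrict`, `decoding_circuit`, `g`, and `perfect_G` are illustrative only.

```python
import random
from typing import Callable, FrozenSet, Optional, Tuple

# Hypothetical oracle interface: G(B, rng) returns one sample v ~ G(B),
# as a tuple of k bits aligned with the sorted k-set B.
Oracle = Callable[[Tuple[int, ...], random.Random], Tuple[int, ...]]


def restrict(B: Tuple[int, ...], v: Tuple[int, ...], S) -> Tuple[int, ...]:
    """v|_S: the bits of v (indexed by the sorted k-set B) at the elements of S."""
    return tuple(bit for elem, bit in zip(B, v) if elem in S)


def decoding_circuit(A: FrozenSet[int], w: Tuple[int, ...], x: int, G: Oracle,
                     n: int, k: int, T: int, rng: random.Random) -> Optional[int]:
    """One run of C_{A,w} on input x; None plays the role of the symbol ⊥."""
    if x in A:                                   # Step 1: output w|_x
        return w[sorted(A).index(x)]
    others = [u for u in range(2**n) if u not in A and u != x]
    for _ in range(T):                           # Step 2: at most T attempts
        extra = rng.sample(others, k // 2 - 1)   # 2.1: uniform B' in N_I(A, x)
        B = tuple(sorted(A | {x} | set(extra)))
        v = G(B, rng)                            # 2.2: v' ~ G(B')
        if restrict(B, v, A) == w:               # 2.3: consistent with w on A?
            return restrict(B, v, {x})[0]        #      then output v'|_x
    return None                                  # Step 3: output ⊥


def g(u: int) -> int:
    """Toy target function: parity of the bits of u."""
    return bin(u).count("1") % 2


def perfect_G(B: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    """Toy oracle that always returns g^k(B) (it ignores its randomness)."""
    return tuple(g(u) for u in B)
```

With the perfect toy oracle and advice w = g^{k/2}(A), every call returns g(x); with advice that disagrees with g on A, the consistency test in step 2.3 never passes and the run ends with ⊥.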
Using Construction 4.12, the list-decoding algorithm will decode by evaluating the input circuit G on a random k-set B and outputting the circuit C_{A,w} with respect to a random k/2-set A ⊂ B.
Construction 4.13 (The list-decoding algorithm D). The uniform randomized algorithm D is
given n, k, ε, and δ satisfying the conditions of Theorem 4.8, access to a description of G, and a
parameter T . The algorithm D operates as follows:
1. Pick a random k-set B ∈ S_{n,k} and a random k/2-subset A ⊆ B.
2. Sample G(B) and use it to set w = G(B)|_A.
3. Output the description of the circuit C_{A,w} (defined in Construction 4.12) with respect to the given parameter T.
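A minimal Python sketch of one run of D follows. It assumes the same kind of hypothetical toy interface as a classical simulation would use (elements of U are integers in range(2^n), a k-set is a sorted tuple, and `G(B, rng)` returns k bits aligned with B), and it models the "description" of C_{A,w} simply as the triple (A, w, T); all function names are illustrative.

```python
import random
from typing import Callable, FrozenSet, Tuple

# Hypothetical oracle interface: G(B, rng) returns one sample v ~ G(B),
# as a tuple of k bits aligned with the sorted k-set B.
Oracle = Callable[[Tuple[int, ...], random.Random], Tuple[int, ...]]


def list_decoder(G: Oracle, n: int, k: int, T: int,
                 rng: random.Random) -> Tuple[FrozenSet[int], Tuple[int, ...], int]:
    """One run of D; the output (A, w, T) stands for a description of C_{A,w}."""
    B = tuple(sorted(rng.sample(range(2**n), k)))            # Step 1: random k-set B ...
    A = frozenset(rng.sample(B, k // 2))                      # ... and random k/2-subset A
    v = G(B, rng)                                             # Step 2: sample G(B)
    w = tuple(bit for elem, bit in zip(B, v) if elem in A)    #         w = G(B)|_A
    return A, w, T                                            # Step 3


def g(u: int) -> int:
    """Toy target function: parity of the bits of u."""
    return bin(u).count("1") % 2


def perfect_G(B: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    """Toy oracle that always returns g^k(B)."""
    return tuple(g(u) for u in B)
```

Sampling B uniformly and then A uniformly inside B yields a uniformly random edge (A, B) ∈ I, since every k-set has the same number of k/2-subsets.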
Similar to [IJKW10], we prove Theorem 4.8 in two steps. First, we show that if A and w, which were randomly chosen by the list-decoder, have certain desired properties and the repetition parameter T is sufficiently large, the circuit C_{A,w} computes g with high probability over a random input x ∈ U, its internal randomness, and the inherent randomness of G. (Therefore, the oracle circuit B^O in the statement of the theorem will be C_{A,w} for a good choice of A and w, with its oracle O implementing G, and the input string y used as a source of random bits.) Then, we argue that with probability at least ζ = Ω(ε^2) the uniform randomized algorithm D generates good parameters A and w.
Throughout this section, we define C̃_A = C_{A, g^{k/2}(A)}, for simplicity.

^12 By w|_x, we mean the following: suppose g(y_1, . . . , y_{k/2}) = (w_1, . . . , w_{k/2}) for y_i ∈ {0,1}^n and w_i ∈ {0,1}, and y_ℓ = x for some ℓ ∈ [k/2]; then w|_x = w_ℓ.
Remark 4.14 (Well-defined conditional probabilities). In order to simplify our exposition, we will assume without loss of generality throughout this section that for every B ∈ S_k, we have Pr_G[G(B) = g^k(B)] > 0. Indeed, this can easily be obtained by defining a modified circuit G′ from G that, say, with probability 2^{−n} outputs a uniformly random value in {0,1}^k, and otherwise runs G on B. We make this assumption so that definitions and proofs become simpler when considering certain conditional probabilities. We stress that our argument would still work without this trick, with a slightly more complicated exposition.
We will need a few definitions to prove the theorem. Observe that some of them implicitly
refer to G and g, which are fixed for the remainder of this section. We start with the definition of
correctness of the algorithm G on a k-set.
Definition 4.15. For B ∈ S_k, we define

$$\mathrm{Corr}_G(B) = \mathbb{E}_G\big[G(B) = g^k(B)\big].$$

We say that B is η-correct if Corr_G(B) ≥ η.

Notice that, using this notation, we can rewrite the assumption of Equation (20) in Theorem 4.8 as

$$\mathbb{E}_{B\in S_k}\big[\mathrm{Corr}_G(B)\big] \ge \varepsilon.$$

Since G is fixed throughout this section, we may write Corr(B) instead of Corr_G(B).
Next, we define good edges as (mostly) correct sets for which many of their neighbors are also (mostly) correct.
Definition 4.16 (Good edges). We say an edge (A, B) ∈ I is (γ, η)-good if
(i) B is η-correct; and
(ii) at least a γ fraction of the neighbors of A are η-correct, i.e.,

$$\mathbb{E}_{B'\in N_I(A)}\big[\mathrm{Corr}(B') \ge \eta\big] \ge \gamma.$$

We remark that with the above definition we already start deviating from [IJKW10]. Since
their version of G is deterministic, its answers are either correct or wrong and therefore they can
afford to have η = 1. In our case, we need to take into account the randomized aspect of G and
therefore we need to be more flexible with the definition of goodness.
As in [IJKW10], in order to prove the correctness of CA,w , we need edges that satisfy a stronger
property than the above, referred to as “excellence”. For that, we first define the function that
computes the probability that an edge (A, B ′ ) leads to a wrong answer on a random x ∈ B ′ \ A
conditioned on a correct answer on A.
Definition 4.17. For an edge (A, B′) ∈ I, we define

$$\mathrm{ErrCons}(A, B') = \mathbb{E}_{x\in B'\setminus A}\big[p_{\mathrm{err}}(x, B')\big], \qquad (22)$$

where p_err(x, B′) = Pr_{y∼G(B′)}[y|_x ≠ g(x) | y|_A = g^{k/2}(A)].


Using the foregoing definition, we define an excellent edge (A, B) as a good edge for which the
expected fraction of errors in the neighbors of A is small, where this expectation gives weight to
the edges based on their probability of being correct on A.

Definition 4.18 (Excellent edges). We call an edge (A, B) (γ, η, α)-excellent if it is (γ, η)-good and

$$\mathbb{E}_{B'\sim W_I(A)}\big[\mathrm{ErrCons}(A, B')\big] \le \alpha, \qquad (23)$$

where the distribution W_I(A), supported over N_I(A), is defined as follows: for each B′ ∈ N_I(A), let p_cons(B′) = Pr_G[G(B′)|_A = g^{k/2}(A)]. Moreover, let p_tot(A) = Σ_{B′∈N_I(A)} p_cons(B′). Then each B′ is assigned probability p_cons(B′)/p_tot(A).
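To make the role of W_I(A) concrete, the sketch below computes the left-hand side of Eq. (23) from per-neighbor values of p_cons(B′) and ErrCons(A, B′), and contrasts it with a uniform average over N_I(A). The dictionaries and function names are hypothetical, and the values are toy numbers rather than quantities derived from an actual circuit G.

```python
from typing import Dict, Hashable


def excellence_value(p_cons: Dict[Hashable, float],
                     err_cons: Dict[Hashable, float]) -> float:
    """E_{B' ~ W_I(A)}[ErrCons(A, B')], where W_I(A) gives B' probability
    p_cons(B') / p_tot(A) and p_tot(A) is the sum of p_cons over N_I(A)."""
    p_tot = sum(p_cons.values())
    return sum(p * err_cons[B] for B, p in p_cons.items()) / p_tot


def uniform_value(err_cons: Dict[Hashable, float]) -> float:
    """The same average with B' uniform over N_I(A), for comparison."""
    return sum(err_cons.values()) / len(err_cons)


# Toy neighbourhood of A with two neighbours (labels are arbitrary).
p_cons = {"B1": 0.9, "B2": 0.1}
err_cons = {"B1": 0.0, "B2": 1.0}
```

In this toy instance the rarely consistent neighbor B2 has ErrCons = 1 but receives only weight 0.1 under W_I(A), so the weighted value is 0.1 while the uniform average is 0.5; an edge can therefore be α-excellent for α = 0.1 even though half of its neighbors err.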

For the reader familiar with [IJKW10], we mention that the definition above is crucial. It refers to a more general class of distributions W_I(A) than the analogous definition from their paper. (This distribution naturally appears in our error analysis when we condition on C_{A,w} not outputting the error symbol ⊥.) Looking ahead, it ties together the two main lemmas stated below and proved in the subsequent sections, and it introduces significant difficulties when establishing each of them.
Given these definitions, we can now state the two main lemmas needed for the proof of Theorem 4.8. The first lemma shows that if the decoding algorithm hits an excellent edge, then it will decode correctly with high probability. The second lemma shows that there are many excellent edges, and so the decoding algorithm will hit one with non-trivial probability (an Ω(ε) probability, which can still be close to 0).

Lemma 4.19 (Excellence implies correctness). Fix some 0 ≤ β ≤ 1 and let λ = 2e^{−βk/24}. Moreover, assume that

$$\gamma,\eta < 1/10 \quad\text{and}\quad 4e^{-k\alpha/12} \le (\gamma\cdot\eta)^5.$$

If (A, B) is a (γ, η, α)-excellent edge, then

$$\mathbb{E}_{x\sim U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x) = g(x)\big] \ge 1-\beta-\big(1-\eta(\gamma-\lambda)/2\big)^T - 16\alpha,$$

where T is the number of iterations in C̃_A.

Lemma 4.20 (Excellent edges are abundant). For any α < 1/2, if E_{B∈S_k}[Corr(B)] ≥ ε, then at least an

$$\Big(\frac{\varepsilon}{3} - \frac{62208}{\alpha^3\cdot\varepsilon^5}\cdot e^{-\frac{\alpha}{96}\cdot k}\Big)$$

fraction of the edges (A, B) ∈ I are (ε/3, ε/3, α)-excellent.

We prove Lemmas 4.19 and 4.20 in Sections 4.3.2 and 4.3.3, respectively. Assuming these
results, we are ready to prove Theorem 4.8.

Proof of Theorem 4.8. We begin with the first part of the result, namely, showing that the algorithm C_{A,w} defined in Construction 4.12, parameterized as we specify below, is of size poly(n, k, log(1/δ), 1/ε) and satisfies

$$\mathbb{E}_{x\sim U,\;y\sim\{0,1\}^*,\;G}\big[C^G_{A,w}(x,y) = g(x)\big] \ge 1-\delta.$$

First, note that we can assume that ε < 1/100 without loss of generality. For this, it is sufficient to redefine G to output a random value with probability 1 − 10^{−3}, and to compute as before otherwise. Its overall success probability drops by at most a constant factor, and now ε < 1/100. We also assume that δ ≤ 1/2, by taking a smaller δ if necessary.
Let C be a large positive constant independent of the remaining parameters. Let

$$\gamma,\eta := \varepsilon/3 < 1/10,\quad \beta := \delta/3,\quad \alpha := \delta/48,\quad\text{and}\quad T := C\cdot\log(1/\delta)\cdot(1/\varepsilon^2),$$

where T is the repetition parameter in Construction 4.13. Note that

$$k \ge C\cdot\frac{1}{\delta}\cdot\Big(\log\frac{1}{\delta}+\log\frac{1}{\varepsilon}\Big),$$

and so 4e^{−kα/12} ≤ (γ · η)^5 for a sufficiently large choice of C.
By applying Lemma 4.20 with respect to these α, ε, and k, we have that at least an (ε/3 − (62208/(α^3 · ε^5)) · e^{−(α/96)·k})-fraction of the edges (A, B) ∈ I are (ε/3, ε/3, α)-excellent. Let λ = 2e^{−βk/24} < ε/6.
Note that our choice of parameters satisfies all the conditions of Lemma 4.19, and thus by invoking it we obtain that if (A, B) is a (γ, η, α)-excellent edge, then

$$\mathbb{E}_{x\sim U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x) = g(x)\big] \ge 1-\beta-\big(1-\eta(\gamma-\lambda)/2\big)^T - 16\alpha,$$

where C̃_A = C_{A, g^{k/2}(A)} as previously defined.
Now, observe that

$$\big(1-\eta(\gamma-\lambda)/2\big)^T \le \Big(1-\frac{\varepsilon}{3}\Big(\frac{\varepsilon}{3}-\frac{\varepsilon}{6}\Big)\cdot\frac{1}{2}\Big)^T = \Big(1-\frac{\varepsilon^2}{36}\Big)^T \le e^{-(\varepsilon^2/36)\cdot T} \le \frac{\delta}{3},$$

where we have used that ε < 1/100 and that the constant C appearing in T is large. As a consequence, if the randomized algorithm D succeeds in sampling a pair (A, B) that is (ε/3, ε/3, δ/48)-excellent and it also samples a value w = G(B)|_A which agrees with g^{k/2}(A), we get the randomized circuit C̃_A with oracle access to G such that

$$\mathbb{E}_{x\sim U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x) = g(x)\big] \ge 1-\delta/3-\delta/3-\delta/3 = 1-\delta.$$

It is not hard to see that the inequality above also holds when x is sampled from the (marginally) larger set U, since C̃_A is always correct for x ∈ A.
On the other hand, note that, due to our choice of parameters, the error term in Lemma 4.20 satisfies

$$\frac{62208}{\alpha^3\cdot\varepsilon^5}\cdot e^{-\frac{\alpha}{96}\cdot k} \le \varepsilon/6.$$

It follows from Lemma 4.20 that at least an (ε/6)-fraction of the pairs (A, B) ∈ I are (ε/3, ε/3, δ/48)-excellent. In particular, excellent edges of this form exist.
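The parameter choices in this proof can be sanity-checked numerically. The sketch below plugs the settings γ = η = ε/3, β = δ/3, α = δ/48, and T = C · log(1/δ)/ε^2 into the four inequalities used above. The universal constant C is not specified in the text, so the value C = 10^5 used in the example, the use of natural logarithms, and rounding k and T up to integers are all assumptions; for a small constant (for example C = 1) the checks fail, as expected. The function name is hypothetical.

```python
import math


def check_parameters(eps: float, delta: float, C: float) -> dict:
    """Evaluate the numerical conditions used in the proof of Theorem 4.8
    (natural logarithms are an assumption)."""
    gamma = eta = eps / 3
    beta, alpha = delta / 3, delta / 48
    k = math.ceil(C / delta * (math.log(1 / delta) + math.log(1 / eps)))  # Eq. (19)
    k += k % 2                                     # Theorem 4.8 requires k even
    T = math.ceil(C * math.log(1 / delta) / eps**2)
    lam = 2 * math.exp(-beta * k / 24)
    return {
        "Lemma 4.19 hypothesis": 4 * math.exp(-k * alpha / 12) <= (gamma * eta) ** 5,
        "lambda < eps/6": lam < eps / 6,
        "(1 - eta(gamma - lambda)/2)^T <= delta/3":
            (1 - eta * (gamma - lam) / 2) ** T <= delta / 3,
        "Lemma 4.20 error term <= eps/6":
            62208 / (alpha**3 * eps**5) * math.exp(-alpha * k / 96) <= eps / 6,
    }


if __name__ == "__main__":
    print(check_parameters(eps=0.005, delta=0.1, C=1e5))
```

For ε = 0.005 and δ = 0.1, all four conditions hold with C = 10^5, which illustrates why the proof only needs C to be "sufficiently large".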
To sum up, the first part of the theorem follows by picking B^O = C̃_A with O = G for any choice of (A, B) as above. Note that, by our choice of T, the number of gates in the circuit B^O satisfies the parameters of the theorem.
We now prove that the uniform randomized algorithm D outputs a circuit with the desired properties with probability at least Ω(ε^2) over its internal randomness and the randomness of G. It follows from our discussion in the paragraph above that, in order for the circuit output by D to have the desired accuracy, we need two properties from the values produced by D:
(i) the randomly selected edge (A, B) is (ε/3, ε/3, δ/48)-excellent; and
(ii) the sampled value w = G(B)|_A coincides with g^{k/2}(A).
As we have already established, an Ω(ε) fraction of the edges (A, B) are (ε/3, ε/3, δ/48)-excellent, so by picking a random k-set B and a random k/2-subset A of B, which produces a uniformly random edge (A, B), we have that (A, B) is an excellent edge with probability Ω(ε). Then, assuming that (A, B) is (ε/3, ε/3, δ/48)-excellent, it follows that G(B) = g^k(B) with probability at least ε/3, which implies that the probability that G(B)|_A = g^{k/2}(A) is also at least ε/3. Therefore, the probability that D outputs a circuit with the desired properties is at least Ω(ε^2).
4.3.2 Excellence implies correctness
In this section, we prove Lemma 4.19, which shows that if an excellent edge (A, B) is picked, then C̃_A computes g(x) with high probability, on average over the x's. Since this proof is fairly technical, we first give an overview of its structure before diving into the details. Recall that the goal is to upper bound the quantity

$$\mathbb{E}_{x\in U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x) \ne g(x)\big]. \qquad (24)$$

It is easy to see that the event C̃_A(x) ≠ g(x) occurs if either (1) the decoder outputs ⊥ and aborts, or (2) the decoder does not output g(x) conditioned on not outputting ⊥.
We first show that the probability of (1) happening is small. More precisely, we show that given a fixed excellent edge (A, B) and a sufficiently large number of iterations T, for most x ∈ U \ A the probability that C̃_A outputs ⊥ on x is negligibly small.
Then, we show that (2) happens with low probability, which turns out to be much more cumbersome than in [IJKW10]. Here, we need to upper bound

$$\mu := \mathbb{E}_{x\in U\setminus A}\;\underbrace{\mathbb{E}_{y\sim\tilde C_A(x),\,G}\big[y\ne g(x)\mid y\ne\bot\big]}_{=:\,h(x)}. \qquad (25)$$
If we unfold the definition of C̃_A(x), Equation (25) computes, for a uniformly random x ∈ U \ A, the probability that at some iteration C̃_A picks some B′ ∼ N_I(A, x) and then G(B′) outputs some value y that is incorrect on x given that it is correct on A. Alternatively (as a thought experiment), we could consider a different sampling procedure that picks B′ ∼ N_I(A), then picks a uniformly random x ∈ B′ \ A, and analyzes the probability of being incorrect on x conditioned on being correct on A. This latter procedure would give an expression similar to the one that determines the excellence of the edge (A, B), and could then be related to α as required. In fact, this is the approach taken by [IJKW10], where the sampling behavior of A, B, and x (in their case) allows them to switch between both scenarios in a fairly direct way. This switch is more complicated in our case.
For one thing, in our definition of excellence (see Definition 4.18), we sample B′ ∼ W_I(A) and not uniformly at random. However, we succeed in indirectly relating the α-excellence of (A, B) to Eq. (25) in two steps: (i) show that α upper bounds the expression

$$\mathbb{E}_{B'\sim W_I(A)}\;\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\big]; \qquad (26)$$

(ii) lower bound Eq. (26) in terms of µ. In order to relate these two expressions, we still need to contend with the fact that B′ ∈ N_I(A) is sampled with probability proportional to p_cons(B′) in Eq. (26). To this end, we first partition the set N_I(A) as follows: let Γ_i be the set of all B′ ∈ N_I(A) such that p_cons(B′) lies in the interval (2^{−i−1}, 2^{−i}]. Observe that in the scenario of [IJKW10], if we were to do a similar partitioning, there would only be two sets: the B′'s that are consistent with A and the B′'s that are not. For us, since each B′ is associated with a different weight, each B′ has a varying "degree of consistency". We now decompose Eq. (26) into all the distinct buckets and write it as

$$\sum_i\;\mathbb{E}_{B'\sim W_I(A)}\Big[\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\big]\;\Big|\;B'\in\Gamma_i\Big]\cdot\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big]. \qquad (27)$$
At this point, we show with some straightforward calculations that most of the contribution in Eq. (27) comes from the buckets Γ_i that are heavy (i.e., contain many B′ ∈ N_I(A)), and for such heavy buckets, we show that

$$\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\mid B'\in\Gamma_i\big]\cdot\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big] \ge \sigma\cdot\mathbb{E}_{x\in U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x)\ne g(x)\big], \qquad (28)$$

for some universal constant σ < 1. At this point, observe that the RHS of Eq. (28) is exactly the quantity we wanted to upper bound, i.e., Eq. (24), and we have shown that the LHS is at most α. This shows that µ ≤ O(α) and concludes the proof of the lemma. We now make this sketch rigorous.
We start with the first point, showing that the probability of outputting ⊥ is small. In order to show this, we will use that if (A, B) is a (γ, η)-good edge (which is implied by (γ, η, α)-excellence), we can bound the number of x ∈ U \ A for which less than about a γ/2-fraction of the sets B′ ∈ N_I(A, x) have E_G[G(B′)|_A = g^{k/2}(A)] ≥ η (these are the x's on which C̃_A is likely to output ⊥). This claim is formalized as follows.

Lemma 4.21 (Analogue of [IJKW10, Lemma 3.9]). Let β ∈ [0, 1] and λ = 2e^{−βk/24}. Fix a (γ, η)-good edge (A, B) ∈ I. Let F ⊆ U \ A be the set of elements x such that strictly less than a (γ − λ)/2 fraction of B′ ∈ N_I(A, x) satisfy Pr_G[G(B′)|_A = g^{k/2}(A)] ≥ η. Then the density ρ(F) := |F|/|U \ A| < β.

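Proof. Assume toward a contradiction that ρ(F) = β (the case where ρ(F) > β follows by fixing some F′ ⊆ F with ρ(F′) = β). Define the set

$$W = \Big\{B'\in N_I(A) : \mathbb{E}_G\big[G(B')|_A = g^{k/2}(A)\big] \ge \eta\Big\}.$$

Observe that

$$\Pr_{\substack{x\in U\setminus A\\ B'\supseteq\{x\}\cup A}}\big[x\in F\wedge B'\in W\big] = \Pr_{x\in U\setminus A}\big[x\in F\big]\cdot\Pr_{\substack{x\in U\setminus A\\ B'\supseteq\{x\}\cup A}}\big[B'\in W\mid x\in F\big] < \beta\cdot(\gamma-\lambda)/2. \qquad (29)$$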

On the other hand, let

$$W' = \Big\{B'\in W : \frac{|(B'\setminus A)\cap F|}{|B'\setminus A|} \ge \beta/2\Big\},$$

and observe that since W′ ⊆ W, we have

$$\Pr_{\substack{x\in U\setminus A\\ B'\supseteq\{x\}\cup A}}\big[x\in F\wedge B'\in W\big] \ge \Pr_{\substack{x\in U\setminus A\\ B'\supseteq\{x\}\cup A}}\big[x\in F\wedge B'\in W'\big] = \Pr_{B'\in N_I(A)}\big[B'\in W'\big]\cdot\Pr_{\substack{B'\in N_I(A)\\ x\in B'\setminus A}}\big[x\in F\mid B'\in W'\big], \qquad (30)$$

where the equality uses the following simple fact: selecting x uniformly from U \ A and then picking a uniform B′ ∈ N_I(A) that contains {x} ∪ A is equivalent to selecting B′ uniformly from N_I(A) and then picking a uniform x ∈ B′ \ A. To complete the proof, we argue next that the last expression is larger than the bound in Eq. (29).
Since (A, B) is (γ, η)-good, we get that |W|/|N_I(A)| ≥ γ. Moreover, using that ρ(F) = β and our choice of λ, by the Hoeffding bound (Lemma 2.3) the density of W \ W′ inside N_I(A) is at most λ. Consequently, we have that |W′|/|N_I(A)| ≥ γ − λ. Using this estimate along with Eq. (30),

$$\Pr_{\substack{x\in U\setminus A\\ B'\supseteq\{x\}\cup A}}\big[x\in F\wedge B'\in W\big] \ge \Pr_{B'\in N_I(A)}\big[B'\in W'\big]\cdot\Pr_{\substack{B'\in N_I(A)\\ x\in B'\setminus A}}\big[x\in F\mid B'\in W'\big] \ge (\gamma-\lambda)\cdot\beta/2, \qquad (31)$$

which stands in contradiction to Eq. (29).
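The "simple fact" behind the equality in Eq. (30) can be verified by exhaustive enumeration on a small universe. The sketch below computes both joint distributions of (x, B′) exactly with rational arithmetic; the universe size, the set A, and the function names are toy assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations


def neighbours(A: frozenset, U: range, k: int) -> list:
    """N_I(A): all k-subsets of U that contain A."""
    rest = sorted(set(U) - A)
    return [A | frozenset(c) for c in combinations(rest, k - len(A))]


def x_then_B(A: frozenset, U: range, k: int) -> dict:
    """x uniform in U minus A, then B' uniform among the k-sets containing A and x."""
    xs = sorted(set(U) - A)
    dist = {}
    for x in xs:
        nbrs = [B for B in neighbours(A, U, k) if x in B]
        for B in nbrs:
            dist[(x, B)] = Fraction(1, len(xs)) * Fraction(1, len(nbrs))
    return dist


def B_then_x(A: frozenset, U: range, k: int) -> dict:
    """B' uniform in N_I(A), then x uniform in B' minus A."""
    nbrs = neighbours(A, U, k)
    dist = {}
    for B in nbrs:
        for x in sorted(B - A):
            dist[(x, B)] = Fraction(1, len(nbrs)) * Fraction(1, len(B - A))
    return dist
```

Both procedures put equal mass on every pair (x, B′) with x ∈ B′ \ A, because every x ∈ U \ A lies in the same number of neighbors of A and every neighbor has exactly k/2 elements outside A.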

We now prove Lemma 4.19, which shows that selecting an excellent edge leads to correctness.

Proof of Lemma 4.19. Let β, λ, and T be as in the statement of the lemma, and suppose that (A, B) is a (γ, η, α)-excellent edge. We show that

$$\mathbb{E}_{x\sim U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x) = g(x)\big] \ge 1-\beta-\big(1-\eta(\gamma-\lambda)/2\big)^T - 16\alpha.$$

We first decouple the errors that stem from failing to find a consistent neighbor (in which case the algorithm outputs ⊥ and aborts) from those in which a consistent neighbor was found yet the algorithm still produced an incorrect output. To this end, by a union bound, we have

$$\Pr_{x\in U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x)\ne g(x)\big] \le \Pr_{x\in U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x)=\bot\big] + \Pr_{x\in U\setminus A,\;y\sim\tilde C_A(x),\;G}\big[y\ne g(x)\mid y\ne\bot\big]. \qquad (32)$$

We will first bound the first term of the RHS of Eq. (32).
Let F be the subset of U \ A such that x ∈ F iff strictly less than a (γ − λ)/2 fraction of B′ ∈ N_I(A, x) satisfy E_G[G(B′)|_A = g^{k/2}(A)] ≥ η. We have from Lemma 4.21 that ρ(F) < β, where ρ(F) denotes the measure of F inside U \ A.
We now compute the probability that C̃_A(x) = ⊥ for an arbitrary but fixed x ∉ F. Let W be the subset of N_I(A, x) such that B′ ∈ W iff E_G[G(B′)|_A = g^{k/2}(A)] ≥ η. From the assumption that x ∉ F, we have that Pr_{B′∈N_I(A,x)}[B′ ∈ W] ≥ (γ − λ)/2. Let E_i be the event that on the i-th iteration of C̃_A, v′|_A ≠ g^{k/2}(A), where v′ is sampled in Step 2.2 of the definition of the circuit C̃_A. For a fixed i ∈ {1, . . . , T} and a fixed x, we have

$$\begin{aligned}\Pr_{\tilde C_A,G}[E_i] &= \Pr\big[G(B')|_A \ne g^{k/2}(A)\mid B'\in N_I(A,x)\big]\\ &= 1-\Pr\big[G(B')|_A = g^{k/2}(A)\mid B'\in N_I(A,x)\big]\\ &\le 1-\Pr\big[G(B')|_A = g^{k/2}(A)\mid B'\in W\big]\cdot\Pr_{B'\in N_I(A,x)}\big[B'\in W\big]\\ &\le 1-\eta\cdot\frac{\gamma-\lambda}{2}.\end{aligned}$$
Using that in each iteration of C̃_A a fresh selection of B′ ∼ N_I(A, x) and v′ ∼ G(B′) is made, we get that

$$\Pr_{\tilde C_A,G}\big[\tilde C_A(x)=\bot\mid x\notin F\big] \le \Pr_{\tilde C_A,G}\big[E_1\wedge\cdots\wedge E_T\big] \le \big(1-\eta(\gamma-\lambda)/2\big)^T.$$

Putting together the previous estimates,

$$\Pr_{x\in U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x)=\bot\big] \le \Pr_{x\in U\setminus A}\big[x\in F\big] + \Pr_{x\in U\setminus A,\;\tilde C_A,\;G}\big[\tilde C_A(x)=\bot\mid x\notin F\big] \le \beta+\big(1-\eta(\gamma-\lambda)/2\big)^T.$$
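The bound on Pr[C̃_A(x) = ⊥ | x ∉ F] only uses that the T iterations are independent and that each one succeeds with probability at least η(γ − λ)/2. The Monte Carlo sketch below models each iteration as an independent coin with a given success probability (a simplification that abstracts away the choice of B′ and the sampling of G) and compares the empirical abort rate with (1 − p)^T. The function name is hypothetical.

```python
import random


def abort_rate(p_success: float, T: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[all T independent iterations fail], where each
    iteration finds a consistent B' with probability p_success."""
    rng = random.Random(seed)
    aborts = 0
    for _ in range(trials):
        # rng.random() >= p_success happens with probability 1 - p_success
        if all(rng.random() >= p_success for _ in range(T)):
            aborts += 1
    return aborts / trials


if __name__ == "__main__":
    print(abort_rate(0.1, 10, 100000), 0.9 ** 10)  # empirical rate vs (1 - p)^T
```

Since (1 − p)^T ≤ e^{−pT}, choosing T of order log(1/δ)/p, as done in the proof of Theorem 4.8, makes this term at most a small fraction of δ.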
We now bound the second term of the RHS of Eq. (32). Assuming that the output of the circuit C̃_A(x) is not ⊥, we have that at some iteration C̃_A(x) picks B′ ∼ N_I(A, x) and v′ ∼ G(B′) such that v′|_A = g^{k/2}(A).
For convenience, for x ∈ U \ A let us define h(x) to be the conditional probability that C̃_A produces an incorrect answer on x (over the randomness of C̃_A and G), given that it does not output ⊥. Our goal is to show that

$$\Pr_{x\in U\setminus A,\;y\sim\tilde C_A(x),\;G}\big[y\ne g(x)\mid y\ne\bot\big] = \mathbb{E}_{x\sim U\setminus A}\big[h(x)\big] \qquad (33)$$
is small. To show this, it will be convenient to use the following notation. For B ∈ N_I(A),
p_cons(B) = Pr_G[G(B)|_A = g^{k/2}(A)],
p_tot(A) = Σ_{B∈N_I(A)} p_cons(B),
p_tot(x) = Σ_{B∈N_I(A,x)} p_cons(B),
p_used(x, B) = p_cons(B)/p_tot(x),
p_err(x, B) = Pr_{v∼G(B)}[v|_x ≠ g(x) | v|_A = g^{k/2}(A)].

Note that all these values depend on A. It is not hard to see that

$$h(x) = \sum_{B\in N_I(A,x)} p_{\mathrm{err}}(x,B)\cdot p_{\mathrm{used}}(x,B). \qquad (34)$$

Moreover, since

$$p_{\mathrm{err}}(x,B) = \frac{\Pr_{v\sim G(B)}\big[v|_x\ne g(x)\wedge v|_A = g^{k/2}(A)\big]}{\Pr_G\big[G(B)|_A = g^{k/2}(A)\big]} = \frac{\Pr_{v\sim G(B)}\big[v|_x\ne g(x)\wedge v|_A = g^{k/2}(A)\big]}{p_{\mathrm{cons}}(B)},$$

we have

$$h(x) = \frac{\sum_{B''\in N_I(A,x)}\Pr_G\big[G(B'')|_x\ne g(x)\wedge G(B'')|_A = g^{k/2}(A)\big]}{\sum_{B''\in N_I(A,x)}\Pr_G\big[G(B'')|_A = g^{k/2}(A)\big]},$$

or equivalently,

$$h(x) = \frac{\sum_{B''\in N_I(A,x)}\Pr_{v''\sim G(B'')}\big[v''|_x\ne g(x)\wedge v''|_A = g^{k/2}(A)\big]}{p_{\mathrm{tot}}(x)}. \qquad (35)$$
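The passage from Eq. (34) to Eq. (35) is a cancellation of p_cons(B) between p_err and p_used. The sketch below checks it on random toy values; here `p_cons[j]` stands for p_cons(B_j) and `q[j]` for Pr[G(B_j)|_x ≠ g(x) ∧ G(B_j)|_A = g^{k/2}(A)] over the neighbors B_j ∈ N_I(A, x), and both the numbers and the function names are illustrative assumptions.

```python
import random


def h_via_eq34(p_cons: list, q: list) -> float:
    """Eq. (34): sum over B of p_err(x,B) * p_used(x,B), with
    p_err = q / p_cons and p_used = p_cons / p_tot(x)."""
    p_tot_x = sum(p_cons)
    return sum((qj / pj) * (pj / p_tot_x) for pj, qj in zip(p_cons, q))


def h_via_eq35(p_cons: list, q: list) -> float:
    """Eq. (35): (sum over B of q(B)) / p_tot(x)."""
    return sum(q) / sum(p_cons)


rng = random.Random(0)
p_cons = [rng.uniform(0.01, 1.0) for _ in range(8)]   # toy values, all positive
q = [pj * rng.random() for pj in p_cons]              # any 0 <= q <= p_cons
```

Since q ≤ p_cons term by term, the resulting h(x) always lies in [0, 1], as a conditional probability should.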

Next, we upper bound the quantity on the RHS of Equation (33). For convenience, let

$$\mu = \mathbb{E}_{x\sim U\setminus A}\big[h(x)\big].$$

As we are interested in bounding the probability that C̃_A outputs an erroneous value when it samples an excellent edge, we would like to relate µ to the excellence parameter α. We do this by first relating α, and then µ, to the following intermediate expression:

$$\mathbb{E}_{B'\sim W_I(A)}\Big[\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\big]\Big] = \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B'\in N_I(A)} p_{\mathrm{cons}}(B')\,\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\big], \qquad (36)$$

where W_I(A) is the distribution supported on N_I(A) with each B′ ∈ N_I(A) being sampled with probability p_cons(B′)/p_tot(A).
On the one hand, using Equation (35), we can rewrite the RHS of Equation (36) as

$$\frac{1}{p_{\mathrm{tot}}(A)}\sum_{B'\in N_I(A)} p_{\mathrm{cons}}(B')\cdot\frac{1}{k/2}\sum_{x\in B'\setminus A}\frac{1}{p_{\mathrm{tot}}(x)}\sum_{B''\in N_I(A,x)}\Pr_G\big[G(B'')|_x\ne g(x)\wedge G(B'')|_A = g^{k/2}(A)\big].$$

Note that in the expression above, every fixed (x, B″), where x ∈ U \ A and B″ ∈ N_I(A, x), contributes a value

$$\frac{1}{p_{\mathrm{tot}}(A)}\Big(\sum_{B'\in N_I(A,x)} p_{\mathrm{cons}}(B')\Big)\cdot\frac{1}{k/2}\cdot\frac{1}{p_{\mathrm{tot}}(x)}\cdot\Pr_G\big[G(B'')|_x\ne g(x)\wedge G(B'')|_A = g^{k/2}(A)\big],$$

since it appears in the sum with a corresponding factor p_cons(B′) for each B′ in N_I(A, x). Since the corresponding sum of p_cons(B′) is precisely p_tot(x), it follows that every (x, B″) appears in Equation (36) with a contribution of

$$\frac{1}{p_{\mathrm{tot}}(A)}\cdot\frac{1}{k/2}\cdot\Pr_G\big[G(B'')|_x\ne g(x)\wedge G(B'')|_A = g^{k/2}(A)\big], \qquad (37)$$

and Equation (36) is precisely the sum of these contributions over all (x, B″).
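On the other hand, using that the edge (A, B) from the statement of the lemma is α-excellent, we know that

$$\begin{aligned}\alpha &\ge \mathbb{E}_{B'\sim W_I(A)}\Big[\mathbb{E}_{x\sim B'\setminus A}\big[p_{\mathrm{err}}(x,B')\big]\Big]\\ &= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B'\in N_I(A)} p_{\mathrm{cons}}(B')\,\mathbb{E}_{x\sim B'\setminus A}\big[p_{\mathrm{err}}(x,B')\big]\\ &= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B'\in N_I(A)}\frac{1}{k/2}\sum_{x\in B'\setminus A}\Pr_G\big[G(B')|_x\ne g(x)\wedge G(B')|_A = g^{k/2}(A)\big].\end{aligned}$$

In the expression above, each pair (x, B′) also contributes a value equal to that in Equation (37). Consequently, we get from the discussion above that

$$\alpha \ge \mathbb{E}_{B'\sim W_I(A)}\Big[\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\big]\Big]. \qquad (38)$$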
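The double-counting argument shows that the right-hand side of Eq. (38) and the excellence expression are in fact equal, since both are sums of the contributions in Eq. (37). The sketch below checks this on a random toy instance: it fixes A = {0, …, k/2 − 1} inside a small universe, draws arbitrary values p_cons(B′) and q(x, B′) = Pr[G(B′)|_x ≠ g(x) ∧ G(B′)|_A = g^{k/2}(A)] ≤ p_cons(B′), and evaluates both sides. The instance sizes and the function name are assumptions made for illustration.

```python
import random
from itertools import combinations


def check_eq38_identity(n_universe: int = 8, k: int = 4, seed: int = 0):
    """Return (E_{B'~W_I(A)} E_x[h(x)], E_{B'~W_I(A)} E_x[p_err(x, B')]) on toy data."""
    rng = random.Random(seed)
    A = frozenset(range(k // 2))                                    # a fixed k/2-set
    rest = [u for u in range(n_universe) if u not in A]
    nbrs = [A | frozenset(c) for c in combinations(rest, k // 2)]   # N_I(A)
    p_cons = {B: rng.uniform(0.05, 1.0) for B in nbrs}
    # q[(x, B)] = Pr[G(B)|_x wrong and G(B)|_A correct], at most p_cons(B)
    q = {(x, B): p_cons[B] * rng.random() for B in nbrs for x in B - A}
    p_tot_A = sum(p_cons.values())
    p_tot_x = {x: sum(p_cons[B] for B in nbrs if x in B) for x in rest}
    h = {x: sum(q[(x, B)] for B in nbrs if x in B) / p_tot_x[x] for x in rest}  # Eq. (35)
    lhs = sum(p_cons[B] / p_tot_A * sum(h[x] for x in B - A) / (k // 2) for B in nbrs)
    rhs = sum(p_cons[B] / p_tot_A * sum(q[(x, B)] / p_cons[B] for x in B - A) / (k // 2)
              for B in nbrs)
    return lhs, rhs
```

Equality (up to rounding) is expected for every instance, which is why α-excellence directly bounds the intermediate quantity in Eq. (36).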

Next, we move on to relate µ = E_{x∼U\A}[h(x)] to the RHS of this inequality. Unlike in the classical case, this calculation is more involved, as the B′ in Equation (38) is not sampled uniformly from N_I(A) but from W_I(A) (where B′ is sampled with probability proportional to p_cons(B′)). To deal with this, we partition the elements B′ ∈ N_I(A) into buckets based on their value p_cons(B′) ∈ [0, 1]. Let ℓ = 5 · log(1/(γ · η)), and for i ∈ {0, 1, . . . , ℓ}, set

Γ_i = {B′ ∈ N_I(A) | p_cons(B′) ∈ (2^{−i−1}, 2^{−i}]}.

We also define the exceptional bucket Γ_{ℓ+1} as follows:

Γ_{ℓ+1} = {B′ ∈ N_I(A) | p_cons(B′) ≤ 2^{−ℓ−1}}.

Note that the buckets are disjoint, and that N_I(A) = ⋃_{i=0}^{ℓ+1} Γ_i. For B′ ∈ N_I(A), let µ(B′) = E_{x∼B′\A}[h(x)]. Then

$$\begin{aligned}\mathbb{E}_{B'\sim W_I(A)}\Big[\mathbb{E}_{x\sim B'\setminus A}\big[h(x)\big]\Big] &= \mathbb{E}_{B'\sim W_I(A)}\big[\mu(B')\big]\\ &= \sum_{i=0}^{\ell+1}\mathbb{E}_{B'\sim W_I(A)}\big[\mu(B')\mid B'\in\Gamma_i\big]\cdot\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big]. \qquad (39)\end{aligned}$$
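The bucketing of N_I(A) by the value of p_cons(B′) can be sketched as follows. The sketch assumes that the logarithm in ℓ = 5 · log(1/(γ · η)) is base 2 (matching the dyadic bucket boundaries) and rounds ℓ up to an integer; both choices, as well as the function names, are assumptions for illustration.

```python
import math


def bucket_param(gamma: float, eta: float) -> int:
    """ell = 5 log2(1/(gamma*eta)), rounded up to an integer (an assumption)."""
    return math.ceil(5 * math.log2(1 / (gamma * eta)))


def bucket_index(p: float, ell: int) -> int:
    """Index i with p in (2^{-i-1}, 2^{-i}] for 0 <= i <= ell, or ell + 1
    (the exceptional bucket) if p <= 2^{-ell-1}."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p_cons must be a probability")
    for i in range(ell + 1):
        if 2.0 ** (-i - 1) < p <= 2.0 ** (-i):
            return i
    return ell + 1
```

Every value in [0, 1] lands in exactly one bucket, which is the disjointness and covering property stated above.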
Our goal is to lower bound the expression in Equation (39). Towards that, we aim to bound
each term in the expression individually. The following simple claim will be useful. It shows that,
in our analysis, for 0 ≤ i ≤ ℓ, we can replace sampling a B ′ from Γi according to WI (A) with
sampling a uniformly random B ′ ∼ Γi .
Claim 4.22. For every 0 ≤ i ≤ ℓ,

$$\mathbb{E}_{B'\sim W_I(A)}\big[\mu(B')\mid B'\in\Gamma_i\big] \ge \frac{1}{2}\cdot\mathbb{E}_{B'\sim\Gamma_i}\big[\mu(B')\big].$$

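Proof of Claim 4.22. Indeed,

$$\begin{aligned}\mathbb{E}_{B'\sim W_I(A)}\big[\mu(B')\mid B'\in\Gamma_i\big] &= \frac{\sum_{B'\in\Gamma_i}\frac{p_{\mathrm{cons}}(B')}{p_{\mathrm{tot}}(A)}\cdot\mu(B')}{(1/p_{\mathrm{tot}}(A))\cdot\sum_{B'\in\Gamma_i}p_{\mathrm{cons}}(B')}\\ &= \frac{\sum_{B'\in\Gamma_i}p_{\mathrm{cons}}(B')\cdot\mu(B')}{\sum_{B'\in\Gamma_i}p_{\mathrm{cons}}(B')}\\ &\ge \frac{\sum_{B'\in\Gamma_i}2^{-i-1}\cdot\mu(B')}{|\Gamma_i|\cdot 2^{-i}}\\ &= \frac{1}{2}\cdot\frac{1}{|\Gamma_i|}\sum_{B'\in\Gamma_i}\mu(B')\\ &= \frac{1}{2}\cdot\mathbb{E}_{B'\sim\Gamma_i}\big[\mu(B')\big].\end{aligned}$$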
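The factor 1/2 in Claim 4.22 comes only from the ratio 2^{−i−1}/2^{−i} between the bucket endpoints, so it holds for any non-negative values µ(B′). The sketch below checks this on random toy buckets; the weights play the role of p_cons(B′) for B′ ∈ Γ_i (the common factor 1/p_tot(A) cancels), and the values µ(B′) are arbitrary numbers in [0, 1]. The function name is hypothetical.

```python
import random


def weighted_and_uniform_means(i: int, size: int, seed: int = 0):
    """For a toy bucket Gamma_i, return the W_I(A)-weighted mean of mu (weights in
    [2^{-i-1}, 2^{-i}]) and the uniform mean of mu over the bucket."""
    rng = random.Random(seed)
    weights = [rng.uniform(2.0 ** (-i - 1), 2.0 ** (-i)) for _ in range(size)]
    mus = [rng.random() for _ in range(size)]      # toy values of mu(B') in [0, 1)
    weighted = sum(wt * m for wt, m in zip(weights, mus)) / sum(weights)
    uniform = sum(mus) / size
    return weighted, uniform
```

The argument breaks for Γ_{ℓ+1}, whose weights can be arbitrarily close to 0, which is why that bucket is handled separately below.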

Notice that this bound does not work for Γ_{ℓ+1}. However, it suffices for us to show that Pr_{B′∼W_I(A)}[B′ ∈ Γ_{ℓ+1}] is "small" and can be omitted while determining the lower bound for Equation (39). In fact, we are able to claim something slightly stronger, as shown next. For 0 ≤ i ≤ ℓ, we say that bucket Γ_i is large if |Γ_i| ≥ (γ · η)^5 · |N_I(A)|. Otherwise, we say that it is small. For convenience, let w_i = Σ_{B′∈Γ_i} p_cons(B′). Recall that (A, B) is a (γ, η)-good edge; since at least a γ fraction of the sets B′ ∈ N_I(A) are η-correct, and p_cons(B′) ≥ Corr(B′) for every B′ (being correct on B′ implies being correct on A ⊆ B′), we have

$$\sum_{i=0}^{\ell+1} w_i = p_{\mathrm{tot}}(A) \ge \gamma\cdot\eta\cdot|N_I(A)|. \qquad (40)$$

Claim 4.23. The following upper bounds hold.
(i) For 0 ≤ i ≤ ℓ, if Γ_i is small, then

$$\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big] \le (\gamma\cdot\eta)^4.$$

(ii) Moreover, in the special case where i = ℓ + 1, we have

$$\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_{\ell+1}\big] \le (\gamma\cdot\eta)^4.$$

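Proof of Claim 4.23. For the proof of Item (i), we rely on Equation (40) and on the smallness of Γ_i:

$$\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big] = \frac{w_i}{p_{\mathrm{tot}}(A)} \le \frac{|\Gamma_i|\cdot 2^{-i}}{p_{\mathrm{tot}}(A)} \le \frac{|\Gamma_i|}{p_{\mathrm{tot}}(A)} \le \frac{(\gamma\cdot\eta)^5\cdot|N_I(A)|}{\gamma\cdot\eta\cdot|N_I(A)|} = (\gamma\cdot\eta)^4.$$

For the proof of Item (ii), we rely on Equation (40) and on the upper bound on p_cons(B′) for B′ ∈ Γ_{ℓ+1}:

$$\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_{\ell+1}\big] = \frac{w_{\ell+1}}{p_{\mathrm{tot}}(A)} \le \frac{|\Gamma_{\ell+1}|\cdot 2^{-\ell-1}}{p_{\mathrm{tot}}(A)} \le \frac{|N_I(A)|\cdot 2^{-\ell-1}}{p_{\mathrm{tot}}(A)} \le \frac{2^{-\ell-1}}{\gamma\cdot\eta} \le (\gamma\cdot\eta)^4,$$

where the last inequality uses ℓ = 5 · log(1/(γ · η)).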

Intuitively, Claim 4.23 shows that the only significant terms in Equation (39) are the ones coming from buckets Γ_i with 0 ≤ i ≤ ℓ that are large. Moreover, since there are O(log(1/(γη))) terms, the combined probability weight of the insignificant terms is also small.
Claim 4.24. A random B′ ∼ W_I(A) is likely to belong to a large bucket Γ_i with 0 ≤ i ≤ ℓ, i.e.,

$$\Pr_{B'\sim W_I(A)}\big[B'\text{ is in a large bucket }\Gamma_i\text{ for some }0\le i\le\ell\big] \ge 1/2.$$

Proof of Claim 4.24. Using Claim 4.23 and a union bound over buckets,

$$\Pr_{B'\sim W_I(A)}\big[B'\text{ is in a small bucket or }B'\in\Gamma_{\ell+1}\big] \le (\ell+2)\cdot(\gamma\eta)^4 \le 6\cdot\log(1/(\gamma\eta))\cdot(\gamma\eta)^4 \le 1/2,$$

where the last inequality uses the assumption of the lemma that γ, η ≤ 1/10.
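The arithmetic behind Claim 4.23(ii) and Claim 4.24 can be checked on a grid of values γ, η ≤ 1/10. The sketch assumes base-2 logarithms and ℓ rounded up to an integer (both assumptions, as the text leaves them implicit); under these conventions 2^{−ℓ−1}/(γη) ≤ (γη)^4/2, and ℓ + 2 ≤ 6 · log2(1/(γη)) whenever log2(1/(γη)) ≥ 3. The function name is hypothetical.

```python
import math


def bucket_claims_hold(gamma: float, eta: float) -> bool:
    """Check 2^{-ell-1}/(gamma*eta) <= (gamma*eta)^4 (Claim 4.23(ii)) and
    (ell+2)(gamma*eta)^4 <= 6 log2(1/(gamma*eta)) (gamma*eta)^4 <= 1/2 (Claim 4.24)."""
    x = gamma * eta
    L = math.log2(1 / x)
    ell = math.ceil(5 * L)                      # ell = 5 log2(1/(gamma*eta)), rounded up
    claim_423_ii = 2.0 ** (-ell - 1) / x <= x**4
    claim_424 = (ell + 2) * x**4 <= 6 * L * x**4 <= 0.5
    return claim_423_ii and claim_424
```

The hypothesis γ, η < 1/10 gives γη < 1/100, so log2(1/(γη)) > 6.6, comfortably above the threshold of 3 needed for the union bound.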
 
Finally, the next claim establishes that when 0 ≤ i ≤ ℓ and Γ_i is large, E_{B′∼Γ_i}[µ(B′)] = Ω(µ), provided that µ is not too small and k is large enough. (Recall that if µ is small, specifically less than the excellence parameter α, we are already done.)
Claim 4.25. Suppose that µ ≥ α. Let 0 ≤ i ≤ ℓ and let Γ_i be a large bucket. Then

$$\mathbb{E}_{B'\sim\Gamma_i}\big[\mu(B')\big] \ge \frac{\mu}{2}\cdot\Big(1-\frac{2e^{-k\alpha/12}}{(\gamma\eta)^5}\Big) \ge \frac{\mu}{4}.$$

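Proof of Claim 4.25. Let

$$\mathrm{Bad} = \{B'\in N_I(A) \mid \mu(B') < \mu/2\}.$$

Then, by the Hoeffding bound, $\Pr_{B'\sim N_I(A)}[B'\in\mathrm{Bad}] \le 2e^{-k\mu/12} \le 2e^{-k\alpha/12}$, using that µ ≥ α. Note that, since Γi is large, $|\Gamma_i| \ge (\gamma\eta)^5\cdot|N_I(A)|$. For convenience, let $\lambda = 2e^{-k\alpha/12}$. Then

$$\frac{|\mathrm{Bad}|}{|\Gamma_i|} \le \frac{\lambda\cdot|N_I(A)|}{(\gamma\eta)^5\,|N_I(A)|} = \frac{\lambda}{(\gamma\eta)^5}.$$

Consequently,

$$\begin{aligned}
\mathbb{E}_{B'\sim\Gamma_i}\,\mu(B') &\ge \mathbb{E}_{B'\sim\Gamma_i}\big[\mu(B')\mid B'\notin\mathrm{Bad}\big]\cdot\Pr_{B'\sim\Gamma_i}\big[B'\notin\mathrm{Bad}\big]\\
&= \mathbb{E}_{B'\sim\Gamma_i\setminus\mathrm{Bad}}\,\mu(B')\cdot\frac{|\Gamma_i|-|\mathrm{Bad}|}{|\Gamma_i|}\\
&\ge \frac{\mu}{2}\cdot\left(1-\frac{|\mathrm{Bad}|}{|\Gamma_i|}\right) \ge \frac{\mu}{2}\cdot\left(1-\frac{\lambda}{(\gamma\eta)^5}\right),
\end{aligned}$$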

where the second inequality used the definition of Bad. Claim 4.25 follows using the value of λ and
the hypothesis of the lemma.
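For the second inequality of Claim 4.25 one needs λ/(γη)⁵ ≤ 1/2 with λ = 2e^{−kα/12}, i.e. k ≥ (12/α) · ln(4/(γη)⁵). The sketch below computes this threshold; it does not restate the exact hypothesis of Lemma 4.19 on k, and both helper names are hypothetical.

```python
import math


def claim_4_25_factor(k: float, alpha: float, gamma: float, eta: float) -> float:
    """Factor c in E_{B'~Gamma_i} mu(B') >= c * mu, namely (1/2)(1 - lambda/(gamma*eta)^5)."""
    lam = 2.0 * math.exp(-k * alpha / 12.0)
    return 0.5 * (1.0 - lam / (gamma * eta) ** 5)


def min_k_for_quarter(alpha: float, gamma: float, eta: float) -> float:
    """Smallest real k with lambda <= (gamma*eta)^5 / 2, i.e. factor >= 1/4 (hypothetical helper)."""
    return (12.0 / alpha) * math.log(4.0 / (gamma * eta) ** 5)


k0 = math.ceil(min_k_for_quarter(0.1, 0.1, 0.1))
print(k0, claim_4_25_factor(k0, 0.1, 0.1, 0.1))
```

Because (γη)⁵ appears in the denominator, the threshold grows like (60/α) · ln(1/(γη)), which is why k has to be large relative to 1/α.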

We are ready to conclude the proof of Lemma 4.19. If µ ≤ α, we are done. So we assume from
now on that µ ≥ α. From Equations (38) and (39), we have
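$$\begin{aligned}
\alpha &\ge \mathbb{E}_{B'\sim W_I(A)}\Big[\mathbb{E}_{x\sim B'\setminus A}[h(x)]\Big]\\
&= \sum_{i=0}^{\ell+1}\mathbb{E}_{B'\sim W_I(A)}\big[\mu(B')\mid B'\in\Gamma_i\big]\cdot\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big]\\
&\ge \sum_{\substack{0\le i\le\ell\\ \Gamma_i\text{ is large}}}\frac{1}{2}\cdot\mathbb{E}_{B'\sim\Gamma_i}\big[\mu(B')\big]\cdot\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big] && \text{(omitting terms + Claim 4.22)}\\
&\ge \sum_{\substack{0\le i\le\ell\\ \Gamma_i\text{ is large}}}\frac{\mu}{8}\cdot\Pr_{B'\sim W_I(A)}\big[B'\in\Gamma_i\big] && \text{(by Claim 4.25)}\\
&= \frac{\mu}{8}\cdot\Pr_{B'\sim W_I(A)}\big[B'\text{ is in a large bucket }\Gamma_i\text{ for some }0\le i\le\ell\big]\\
&\ge \frac{\mu}{16}. && \text{(by Claim 4.24)}
\end{aligned}$$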
This shows that µ ≤ 16α, which completes the proof of Lemma 4.19.

4.3.3 Excellent edges are abundant


In this section, we prove Lemma 4.20, which, loosely speaking, shows that if the algorithm G non-trivially agrees with $g^k$, then there are many edges that satisfy the excellence condition. Recall that an edge is said to be (η, γ, α)-excellent if it is (1) (η, γ)-good, and (2) satisfies

$$\mathbb{E}_{B'\sim W_I(A)}\big[\mathrm{ErrCons}(A,B')\big] \le \alpha,$$

where WI(A) gives each edge (A, B′) weight according to its probability of being correct on A (see Definition 4.18). Our first lemma shows that the first condition holds, i.e., that under the assumption that our algorithm G computes $g^k$ with probability at least ε, there are many good edges.

Lemma 4.26. If $\mathbb{E}_{B\in S_k}[\mathrm{Corr}(B)] \ge \varepsilon$, then for any η, γ, ξ > 0 such that η + γ + ξ = ε, at least a ξ-fraction of the edges (A, B) ∈ I are (γ, η)-good.

Proof. We prove the contrapositive of the lemma. In this direction, suppose that

$$\big|\{(A,B)\in I : B\in N_I(A)\text{ and }(A,B)\text{ is }(\gamma,\eta)\text{-good}\}\big| < \xi\cdot|I|. \tag{41}$$

We define the set

$$G = \Big\{A\in S_{k/2} : \Pr_{B'\in N_I(A)}\big[\mathrm{Corr}(B')\ge\eta\big]\ge\gamma\Big\}$$

of sets A that have at least a γ-fraction of η-correct neighbors.

By our assumption in Equation (41), we have that $|G| < \xi|S_{k/2}|$ and thus

$$\Pr_{A\in S_{k/2}}[A\in G]\cdot\mathbb{E}_{\substack{A\in S_{k/2}\\ B\in N_I(A)}}\big[\mathrm{Corr}(B)\mid A\in G\big] < \xi\cdot 1 = \xi. \tag{42}$$

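We also have that

$$\Pr_{\substack{A\in S_{k/2}\\ B\in N_I(A)}}\big[A\notin G\wedge\mathrm{Corr}(B)<\eta\big]\cdot\mathbb{E}_{\substack{A\in S_{k/2}\\ B\in N_I(A)}}\big[\mathrm{Corr}(B)\mid A\notin G\wedge\mathrm{Corr}(B)<\eta\big] \le 1\cdot\eta = \eta, \tag{43}$$

and

$$\Pr_{\substack{A\in S_{k/2}\\ B\in N_I(A)}}\big[A\notin G\wedge\mathrm{Corr}(B)\ge\eta\big]\cdot\mathbb{E}_{\substack{A\in S_{k/2}\\ B\in N_I(A)}}\big[\mathrm{Corr}(B)\mid A\notin G\wedge\mathrm{Corr}(B)\ge\eta\big] < \gamma\cdot 1 = \gamma, \tag{44}$$

where the inequality in (44) uses the fact that, conditioned on A ∉ G, the probability that a uniformly random B′ ∈ NI(A) is η-correct is less than γ (by definition of G).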
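We have that

$$\mathbb{E}_{B\in S_k}[\mathrm{Corr}(B)] = \mathbb{E}_{\substack{A\in S_{k/2}\\ B\in N_I(A)}}[\mathrm{Corr}(B)] = \text{(LHS of (42))} + \text{(LHS of (43))} + \text{(LHS of (44))} < \xi+\eta+\gamma = \varepsilon,$$

where the first equality uses that we can obtain the uniform distribution on B ∈ Sk by picking A ∈ Sk/2 uniformly at random and then a uniformly random B ∈ NI(A); the second equality is the law of total expectation over the three disjoint events A ∈ G, A ∉ G ∧ Corr(B) < η, and A ∉ G ∧ Corr(B) ≥ η; and the last equality is by the assumption of the lemma. This concludes the proof of the statement.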
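The averaging argument can be sanity-checked on a toy instance. In this illustrative sketch (our own, with a hypothetical function name), every set A has the same number of neighbours, so that picking A and then B ∈ NI(A) uniformly is uniform over edges; the Corr values are arbitrary numbers in [0, 1] rather than success probabilities of an actual G. The three returned terms are the left-hand sides of (42), (43) and (44).

```python
import random


def averaging_terms(corr, eta, gamma):
    """Toy version of the decomposition in the proof of Lemma 4.26.

    corr[a][j] plays the role of Corr(B) for the j-th neighbour B of the a-th set A;
    every A is assumed to have the same number of neighbours. Returns
    (E[Corr], Pr[A in G], LHS of (42), LHS of (43), LHS of (44)).
    """
    n_sets, degree = len(corr), len(corr[0])
    total = n_sets * degree
    # A is in G when at least a gamma-fraction of its neighbours are eta-correct.
    in_g = [sum(c >= eta for c in row) / degree >= gamma for row in corr]
    t42 = sum(c for a, row in enumerate(corr) if in_g[a] for c in row) / total
    t43 = sum(c for a, row in enumerate(corr) if not in_g[a] for c in row if c < eta) / total
    t44 = sum(c for a, row in enumerate(corr) if not in_g[a] for c in row if c >= eta) / total
    mean = sum(sum(row) for row in corr) / total
    return mean, sum(in_g) / n_sets, t42, t43, t44


rng = random.Random(0)
corr = [[rng.random() ** 3 for _ in range(20)] for _ in range(200)]
print(averaging_terms(corr, eta=0.2, gamma=0.3))
```

The three terms always add up to the mean, and each is bounded exactly as in (42), (43) and (44), which is all the proof uses.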

Next we will use Lemma 4.26 to strengthen the foregoing conclusion and show that if G computes $g^k$ with probability at least ε, then not only is the number of good edges large, but the number of excellent edges is also large. Formally, we prove that if $\mathbb{E}_{B\in S_k}[\mathrm{Corr}(B)] \ge \varepsilon$, then at least an $\big(\varepsilon/3 - \frac{62208}{\alpha^3\varepsilon^5}\cdot e^{-\frac{\alpha}{96}k}\big)$-fraction of the edges (A, B) ∈ I are (ε/3, ε/3, α)-excellent. Note that the fraction of edges with this property is at least ε/6 if $k \ge \frac{100}{\alpha}\cdot\big(15 + 3\log(1/\alpha) + 6\log(1/\varepsilon)\big)$, so a noticeable fraction of edges are excellent as long as k is not too small. Compared with its counterpart in [IJKW10], the proof of Lemma 4.20 faces additional difficulties due to the asymmetries in the definition of excellent edges, which in this paper refer to the more involved distribution WI(A).
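The claim that this choice of k makes the excellent fraction at least ε/6 can be checked numerically. The sketch reads log as the natural logarithm (an assumption; base 2 only makes k larger and the bound easier); the function names are hypothetical. The underlying reason is that e^{−αk/96} ≤ e^{−15.6} · α³ · ε⁶ for this k, so the subtracted term is at most about 0.0102 · ε.

```python
import math


def excellent_fraction_lower_bound(alpha: float, eps: float, k: float) -> float:
    """eps/3 - 62208 * e^(-alpha*k/96) / (alpha^3 * eps^5), the bound stated for Lemma 4.20."""
    return eps / 3 - 62208 * math.exp(-alpha * k / 96) / (alpha ** 3 * eps ** 5)


def suggested_k(alpha: float, eps: float, log=math.log) -> float:
    """k = (100/alpha) * (15 + 3 log(1/alpha) + 6 log(1/eps)); the log base is an assumption."""
    return (100 / alpha) * (15 + 3 * log(1 / alpha) + 6 * log(1 / eps))


for alpha in (0.5, 0.1, 0.01):
    for eps in (0.9, 0.3, 0.05):
        for log in (math.log, math.log2):
            k = suggested_k(alpha, eps, log)
            assert excellent_fraction_lower_bound(alpha, eps, k) >= eps / 6
```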

Proof of Lemma 4.20. Consider random choices of A ∼ Sk/2 and B ∼ NI(A). We would like to lower bound

$$\Pr_{A,B}\big[(A,B)\text{ is }(\varepsilon/3,\varepsilon/3,\alpha)\text{-excellent}\big].$$

In order to show that an (ε/3, ε/3)-good edge (A, B) ∈ I is also (ε/3, ε/3, α)-excellent, we need to show that

$$\mathbb{E}_{B'\sim W_I(A)}\big[\mathrm{ErrCons}(A,B')\big] \le \alpha.$$

It will be useful to introduce the following probability space and event. In addition to sampling A ∼ Sk/2 and B ∼ NI(A), independently sample B′ ∼ NI(A), v′ = G(B′), and x ∼ B′ \ A. Let $\mathrm{Err}(B',v') = \{x\in B'\setminus A \mid v'|_x\ne g(x)\}$ be the set of x's in B′ \ A for which $v'|_x$ disagrees with g(x), and let $\mathrm{err}(B',v') = |\mathrm{Err}(B',v')|/(k/2)$. We introduce the following event.

E(A, B, B′, v′, x): The following conditions hold:

$v'|_A = g^{k/2}(A)$.
err(B′, v′) > α/4.
$v'|_x \ne g(x)$.

By symmetry, when analysing Pr[E] we can select B′ first (accordingly sampling v′ and x depending on it), followed by choices of a random A contained in B′ and a random B that contains A. Note that

$$\Pr[E] = \frac{1}{|S_k|}\sum_{B'\in S_k}\Pr[E\mid B'\text{ fixed}],$$

where by “B′ fixed” we mean the event that the random choice gives a fixed B′ ∈ Sk. We now decompose each term in the sum above according to the possible values of λ = err(B′, v′) for v′ ∼ G(B′). Since for E to hold this value must be larger than α/4, we get for each fixed B′:

$$\Pr[E\mid B'\text{ fixed}] = \sum_{\lambda>\alpha/4}\Pr\big[E\mid B'\text{ fixed}\wedge\mathrm{err}(B',v')=\lambda\big]\cdot\Pr_{v'}\big[\mathrm{err}(B',v')=\lambda\big].$$

For a fixed B′ and any choice of v′ with err(B′, v′) = λ, if Err(B′, v′) ∩ A ≠ ∅ then it cannot be the case that $v'|_A = g^{k/2}(A)$. Consequently, for E to hold, A must avoid the set Err(B′, v′), and in addition, x must be selected from Err(B′, v′). It follows from the Hoeffding bound using λ > α/4 that

$$\Pr\big[E\mid B'\text{ fixed}\wedge\mathrm{err}(B',v')=\lambda\big] \le 2\cdot e^{-\frac{\alpha}{96}k}\cdot\lambda.$$

Putting together these estimates, and using that λ ≤ 1, we have

$$\begin{aligned}
\Pr[E] &\le \frac{1}{|S_k|}\sum_{B'\in S_k}\sum_{\lambda>\alpha/4}2\cdot e^{-\frac{\alpha}{96}k}\cdot\Pr_{v'\sim G(B')}\big[\mathrm{err}(B',v')=\lambda\big]\\
&\le \frac{1}{|S_k|}\sum_{B'\in S_k}2\cdot e^{-\frac{\alpha}{96}k}\sum_{\lambda>\alpha/4}\Pr_{v'\sim G(B')}\big[\mathrm{err}(B',v')=\lambda\big]\\
&\le \frac{1}{|S_k|}\sum_{B'\in S_k}2\cdot e^{-\frac{\alpha}{96}k} \le 2\cdot e^{-\frac{\alpha}{96}k}.
\end{aligned}$$

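Next, notice that

$$\begin{aligned}
\mathbb{E}_{B''\sim W_I(A)}\big[\mathrm{ErrCons}(A,B'')\big]
&= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B''\in N_I(A)}p_{\mathrm{cons}}(B'')\,\mathbb{E}_{x\sim B''\setminus A}\Big[\Pr_{v''\sim G(B'')}\big[v''|_x\ne g(x)\mid v''|_A=g^{k/2}(A)\big]\Big]\\
&= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B''\in N_I(A)}\mathbb{E}_{x\sim B''\setminus A}\Big[\Pr_{v''\sim G(B'')}\big[v''|_x\ne g(x)\wedge v''|_A=g^{k/2}(A)\big]\Big]\\
&= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B''\in N_I(A)}\frac{1}{k/2}\sum_{x\in B''\setminus A}\Pr_{v''\sim G(B'')}\big[v''|_x\ne g(x)\wedge v''|_A=g^{k/2}(A)\big]\\
&= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B''\in N_I(A)}\Pr_{\substack{v''\sim G(B'')\\ x\sim B''\setminus A}}\big[v''|_A=g^{k/2}(A)\wedge x\in\mathrm{Err}(B'',v'')\big]\\
&= \frac{1}{p_{\mathrm{tot}}(A)}\sum_{B''\in N_I(A)}p_{\mathrm{cons}}(B'')\Pr_{\substack{v''\sim G(B'')\\ x\sim B''\setminus A}}\big[x\in\mathrm{Err}(B'',v'')\mid v''|_A=g^{k/2}(A)\big]\\
&= \mathbb{E}_{B''\sim W_I(A)}\Big[\Pr_{\substack{v''\sim G(B'')\\ x\sim B''\setminus A}}\big[x\in\mathrm{Err}(B'',v'')\mid v''|_A=g^{k/2}(A)\big]\Big],
\end{aligned}$$

where in the second equality we use the fact that

$$p_{\mathrm{cons}}(B'')\Pr_{v''\sim G(B'')}\big[v''|_x\ne g(x)\mid v''|_A=g^{k/2}(A)\big] = \Pr_{v''\sim G(B'')}\big[v''|_x\ne g(x)\wedge v''|_A=g^{k/2}(A)\big].$$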

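This motivates the following definition. We say that a set B″ is A-heavy if

$$\Pr_{\substack{v''\sim G(B'')\\ x\sim B''\setminus A}}\big[x\in\mathrm{Err}(B'',v'')\mid v''|_A=g^{k/2}(A)\big] > \alpha/2.$$

Then, in order to show that an (ε/3, ε/3)-good edge (A, B) is (ε/3, ε/3, α)-excellent, it suffices to prove that

$$\Pr_{B''\sim W_I(A)}\big[B''\text{ is }A\text{-heavy}\big] \le \alpha/2.$$

Indeed, by the identity above, $\mathbb{E}_{B''\sim W_I(A)}[\mathrm{ErrCons}(A,B'')]$ is the WI(A)-average of a probability that is at most 1 when B″ is A-heavy and at most α/2 otherwise, so it is then at most α/2 + α/2 = α.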

Consider the following event, which only depends on A and B, but can also be considered over the probability space of event E:

E1(A, B): The following conditions hold:

(A, B) is (ε/3, ε/3)-good.
$\Pr_{B''\sim W_I(A)}[B''\text{ is }A\text{-heavy}] > \alpha/2$. (Note that this is a property of A.)

We want to show that the event E1 happens with small probability. To this end, we use $\Pr[E_1] \le \Pr[E]/\Pr[E\mid E_1]$ and argue that Pr[E | E1] is not too small. We make use of the following claim, whose proof is deferred:
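Claim 4.27. For every (ε/3, ε/3)-good edge (A, B), if $\Pr_{B''\sim W_I(A)}[B''\text{ is }A\text{-heavy}] > \alpha/2$ then

$$\Pr_{B'\sim N_I(A)}\big[B'\text{ is }A\text{-heavy}\wedge p_{\mathrm{cons}}(B')\ge\varepsilon^*\big] > \frac{\alpha}{108}\cdot\varepsilon^3,$$

where $\varepsilon^* \stackrel{\mathrm{def}}{=} (\alpha/8)\cdot(\varepsilon^2/9)$ and $p_{\mathrm{cons}}(B') = \Pr_{v'\sim G(B')}[v'|_A = g^{k/2}(A)]$.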
Assuming this claim, we proceed as follows. Under event E1, it follows from Claim 4.27 that

$$\Pr_{B'\sim N_I(A)}\big[B'\text{ is }A\text{-heavy}\wedge p_{\mathrm{cons}}(B')\ge\varepsilon^*\big] \ge (\alpha/108)\cdot\varepsilon^3.$$

Therefore, conditioning on E1, with probability at least (α/108) · ε³ over the choices of A, B and B′ we get

$$p_{\mathrm{cons}}(B')\ge\varepsilon^* \quad\text{and}\quad \Pr_{\substack{v'\sim G(B')\\ x\sim B'\setminus A}}\big[x\in\mathrm{Err}(B',v')\mid v'|_A=g^{k/2}(A)\big] > \alpha/2. \tag{45}$$

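We can write the latter probability as follows:

$$\Pr\big[x\in\mathrm{Err}(B',v')\wedge\mathrm{err}(B',v')\le\alpha/4\mid v'|_A=g^{k/2}(A)\big] + \Pr\big[x\in\mathrm{Err}(B',v')\wedge\mathrm{err}(B',v')>\alpha/4\mid v'|_A=g^{k/2}(A)\big].$$

Since the leftmost probability is at most α/4 (for each fixed v′, a uniformly random x ∼ B′ \ A lands in Err(B′, v′) with probability exactly err(B′, v′)), it follows that for a B′ of this form

$$\Pr_{\substack{v'\sim G(B')\\ x\sim B'\setminus A}}\big[x\in\mathrm{Err}(B',v')\wedge\mathrm{err}(B',v')>\alpha/4\mid v'|_A=g^{k/2}(A)\big] \ge \alpha/4.$$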

Note that x ∈ Err(B′, v′) is equivalent to $v'|_x \ne g(x)$. Using this and rewriting the probability inequality above using the identity Pr[F1 | F2] = Pr[F1 ∧ F2]/Pr[F2],

$$\begin{aligned}
&\Pr_{\substack{v'\sim G(B')\\ x\sim B'\setminus A}}\big[v'|_x\ne g(x)\wedge\mathrm{err}(B',v')>\alpha/4\wedge v'|_A=g^{k/2}(A)\big]\\
&\qquad= \Pr_{\substack{v'\sim G(B')\\ x\sim B'\setminus A}}\big[v'|_x\ne g(x)\wedge\mathrm{err}(B',v')>\alpha/4\mid v'|_A=g^{k/2}(A)\big]\cdot\Pr_{v'\sim G(B')}\big[v'|_A=g^{k/2}(A)\big]\\
&\qquad\ge (\alpha/4)\cdot p_{\mathrm{cons}}(B')\\
&\qquad\ge (\alpha/4)\cdot(\alpha/8)\cdot(\varepsilon^2/9),
\end{aligned}$$

where we have used $p_{\mathrm{cons}}(B')\ge\varepsilon^*$ and $p_{\mathrm{cons}}(B') = \Pr_{v'\sim G(B')}[v'|_A = g^{k/2}(A)]$. Overall, combining this probability lower bound and the probability that the conditions in Equation (45) hold, it follows that

$$\Pr[E\mid E_1] \ge (\alpha/108)\cdot\varepsilon^3\cdot(\alpha/4)\cdot(\alpha/8)\cdot(\varepsilon^2/9) = \frac{\alpha^3\cdot\varepsilon^5}{31104}.$$
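Since $\Pr[E_1] \le \Pr[E]/\Pr[E\mid E_1]$ and $\Pr[E] \le 2\cdot e^{-\frac{\alpha}{96}k}$,

$$\Pr[E_1] \le \frac{62208\cdot e^{-\frac{\alpha}{96}k}}{\alpha^3\cdot\varepsilon^5}.$$

Finally, from the definition of event E1 and the discussion above we get

$$\Pr_{\substack{A\sim S_{k/2}\\ B\sim N_I(A)}}\big[(A,B)\text{ is }(\varepsilon/3,\varepsilon/3,\alpha)\text{-excellent}\big] \ge \Pr_{A,B}\big[(A,B)\text{ is }(\varepsilon/3,\varepsilon/3)\text{-good}\big] - \Pr_{A,B}[E_1].$$

This implies, using Lemma 4.26 (with η = γ = ξ = ε/3) and our probability estimate for E1, that the probability that a random edge (A, B) is (ε/3, ε/3, α)-excellent is at least

$$\frac{\varepsilon}{3} - \frac{62208\cdot e^{-\frac{\alpha}{96}k}}{\alpha^3\cdot\varepsilon^5}.$$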
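The constants 31104 and 62208 come from multiplying the factors collected above. The following bookkeeping sketch (our own, with hypothetical names) re-derives them with exact rational arithmetic.

```python
from fractions import Fraction


def pr_e_given_e1_lower(alpha: Fraction, eps: Fraction) -> Fraction:
    """(alpha/108) eps^3 * (alpha/4) * (alpha/8) * (eps^2/9), the lower bound on Pr[E | E1]."""
    return (alpha / 108) * eps ** 3 * (alpha / 4) * (alpha / 8) * (eps ** 2 / 9)


DENOM = 108 * 4 * 8 * 9      # denominator of the bound on Pr[E | E1]
E1_CONSTANT = 2 * DENOM      # Pr[E] <= 2 e^{-alpha k/96} contributes the extra factor 2

for alpha, eps in [(Fraction(1, 3), Fraction(2, 7)), (Fraction(1, 10), Fraction(1, 2))]:
    assert pr_e_given_e1_lower(alpha, eps) == alpha ** 3 * eps ** 5 / DENOM

print(DENOM, E1_CONSTANT)
```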
In order to complete the argument, it remains to establish Claim 4.27.

Proof of Claim 4.27. Let (A, B) be an (ε/3, ε/3)-good edge, and assume that

$$\Pr_{B''\sim W_I(A)}\big[B''\text{ is }A\text{-heavy}\big] > \alpha/2.$$

We need to prove that

$$\Pr_{B'\sim N_I(A)}\big[B'\text{ is }A\text{-heavy}\wedge p_{\mathrm{cons}}(B')\ge\varepsilon^*\big] > (\alpha/108)\cdot\varepsilon^3,$$

where ε∗ = (α/8) · (ε²/9).


Note that the probability of interest can be rewritten as

$$\Pr_{B'\sim N_I(A)}\big[B'\text{ is }A\text{-heavy}\mid p_{\mathrm{cons}}(B')\ge\varepsilon^*\big]\cdot\Pr_{B'\sim N_I(A)}\big[p_{\mathrm{cons}}(B')\ge\varepsilon^*\big]. \tag{46}$$

Since ε∗ ≤ ε/3 and (A, B) is (ε/3, ε/3)-good, the rightmost probability is at least ε/3. We lower
bound the other probability next.

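Let $N_I(A,\ge\varepsilon^*) = \{B'\in N_I(A)\mid p_{\mathrm{cons}}(B')\ge\varepsilon^*\}$, and define $N_I(A,<\varepsilon^*)$ in a similar way. On the one hand,

$$\Pr_{B'\sim N_I(A)}\big[B'\text{ is }A\text{-heavy}\mid p_{\mathrm{cons}}(B')\ge\varepsilon^*\big] = \Pr_{B'\sim N_I(A,\ge\varepsilon^*)}\big[B'\text{ is }A\text{-heavy}\big] = \frac{1}{|N_I(A,\ge\varepsilon^*)|}\sum_{B'\in N_I(A,\ge\varepsilon^*)}\mathbf{1}[B'\text{ is }A\text{-heavy}]. \tag{47}$$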

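On the other hand, using the assumption of the claim,

$$\begin{aligned}
\alpha/2 &< \frac{1}{p_{\mathrm{tot}}(A)}\Big(\sum_{B'\in N_I(A,\ge\varepsilon^*)}p_{\mathrm{cons}}(B')\cdot\mathbf{1}[B'\text{ is }A\text{-heavy}] + \sum_{B'\in N_I(A,<\varepsilon^*)}p_{\mathrm{cons}}(B')\cdot\mathbf{1}[B'\text{ is }A\text{-heavy}]\Big)\\
&\le \frac{1}{p_{\mathrm{tot}}(A)}\Big(\sum_{B'\in N_I(A,\ge\varepsilon^*)}\mathbf{1}[B'\text{ is }A\text{-heavy}] + \sum_{B'\in N_I(A,<\varepsilon^*)}p_{\mathrm{cons}}(B')\Big).
\end{aligned}$$

This yields

$$\sum_{B'\in N_I(A,\ge\varepsilon^*)}\mathbf{1}[B'\text{ is }A\text{-heavy}] > (\alpha/2)\cdot p_{\mathrm{tot}}(A) - \sum_{B'\in N_I(A,<\varepsilon^*)}p_{\mathrm{cons}}(B').$$

In turn, thanks to our choice of ε∗,

$$\sum_{B'\in N_I(A,<\varepsilon^*)}p_{\mathrm{cons}}(B') \le |N_I(A,<\varepsilon^*)|\cdot\varepsilon^* \le |N_I(A)|\cdot\frac{\alpha}{8}\cdot\frac{\varepsilon^2}{9} < \frac{\alpha}{4}\cdot p_{\mathrm{tot}}(A),$$

where the last inequality uses that (A, B) is (ε/3, ε/3)-good, which yields $p_{\mathrm{tot}}(A) \ge |N_I(A)|\,\varepsilon^2/9$. As a consequence,

$$\sum_{B'\in N_I(A,\ge\varepsilon^*)}\mathbf{1}[B'\text{ is }A\text{-heavy}] > (\alpha/4)\cdot p_{\mathrm{tot}}(A) \ge \frac{\alpha}{4}\cdot\frac{\varepsilon^2}{9}\cdot|N_I(A)|.$$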

Overall, the probability we want to lower bound in Eq. (47) is at least

$$\frac{1}{|N_I(A,\ge\varepsilon^*)|}\cdot\frac{\alpha}{4}\cdot\frac{\varepsilon^2}{9}\cdot|N_I(A)| \ge \frac{\alpha}{4}\cdot\frac{\varepsilon^2}{9}.$$

Combining our estimates, Eq. (46) can be lower bounded by (α/4) · (ε²/9) · (ε/3) = (α/108) · ε³, which proves the claim.
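The counting argument in the proof of Claim 4.27 can be exercised on explicit toy neighbourhoods. In this minimal sketch, the numbers p[j] stand in for pcons(B′) and the booleans heavy[j] for “B′ is A-heavy”; both are arbitrary hypothetical data, not quantities computed from a circuit G. Whenever the two premises hold (ptot(A) ≥ |NI(A)|ε²/9 and a WI(A)-weight of A-heavy sets above α/2), the number of A-heavy sets with pcons ≥ ε∗ should exceed (α/4) · ptot(A).

```python
import random


def heavy_count_bound(p, heavy, alpha, eps):
    """Return (#{heavy B' with p_cons >= eps*}, (alpha/4) p_tot, (alpha/4)(eps^2/9)|N|),
    or None when the premises of Claim 4.27 fail on this toy neighbourhood."""
    eps_star = (alpha / 8) * (eps ** 2 / 9)
    p_tot = sum(p)
    if p_tot == 0 or p_tot < len(p) * eps ** 2 / 9:
        return None  # goodness premise p_tot(A) >= |N_I(A)| eps^2 / 9 fails
    if sum(pj for pj, h in zip(p, heavy) if h) <= (alpha / 2) * p_tot:
        return None  # W_I(A)-weight of A-heavy sets is not above alpha/2
    count = sum(1 for pj, h in zip(p, heavy) if h and pj >= eps_star)
    return count, (alpha / 4) * p_tot, (alpha / 4) * (eps ** 2 / 9) * len(p)


rng = random.Random(1)
checked = 0
for _ in range(2000):
    n = rng.randint(5, 60)
    p = [rng.random() ** rng.choice((1, 4, 12)) for _ in range(n)]  # hypothetical p_cons values
    heavy = [rng.random() < 0.5 for _ in range(n)]                  # hypothetical heavy flags
    result = heavy_count_bound(p, heavy, alpha=0.4, eps=0.5)
    if result is not None:
        count, middle, lowest = result
        assert count > middle >= lowest - 1e-12
        checked += 1
print(checked, "instances satisfied the premises")
```

The slack in the argument is visible here: the proof actually yields a count above (3α/8) · ptot(A), and only (α/4) · ptot(A) is kept.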

This completes the proof of Lemma 4.20.

4.3.4 Extension to quantum circuits


The next statement shares part of the terminology from Theorem 4.8, and we refer to the beginning of Section 4.3 for more details. The part of the statement about uniformity refers to a fixed sequence of functions $g : \{0,1\}^n \to \{0,1\}$ indexed by n.

Theorem 4.28 (Local list decoding for quantum circuits). There exists a universal constant C ≥ 1 for which the following holds. Let n ≥ 1 be a positive integer, k be an even integer, and let ε, δ > 0 satisfy

$$k \ge C\cdot\frac{1}{\delta}\cdot\left(\log\frac{1}{\delta}+\log\frac{1}{\varepsilon}\right). \tag{48}$$

If G is a quantum circuit of size at most s defined over $S_{n,k}$ with k output bits such that

$$\mathbb{E}_{B\sim S_{n,k},\,G}\big[G(B)=g^k(B)\big] \stackrel{\mathrm{def}}{=} \mathbb{E}_{B\sim S_{n,k}}\Big[\big\|\Pi_{g^k(B)}\,G\,|B,0^m\rangle\big\|^2\Big] \ge \varepsilon, \tag{49}$$

then there is a quantum circuit B of size poly(n, k, s, log(1/δ), 1/ε) such that

$$\mathbb{E}_{x\sim\{0,1\}^n,\,B}\big[B(x)=g(x)\big] \stackrel{\mathrm{def}}{=} \mathbb{E}_{x\sim\{0,1\}^n}\Big[\big\|\Pi_{g(x)}\,B\,|x,0^{m'}\rangle\big\|^2\Big] \ge 1-\delta. \tag{50}$$

Moreover, a quantum circuit B of this form can be constructed with noticeable probability from a quantum circuit G as above by a uniform sequence of quantum circuits, a statement which is formalised as follows. For any choice of constructive functions ε = ε(n), δ = δ(n), and k = k(n) satisfying the conditions of the theorem, and for any constructive function s = s(n), there is a uniform family $\{C_n^{\mathrm{IJKW}}\}_{n\ge1}$ of quantum circuits $C_n^{\mathrm{IJKW}}$ of size poly(n, k, s, log(1/δ), 1/ε) for which the following holds. If $C_n^{\mathrm{IJKW}}$ is given as input a string code(G) describing a quantum circuit G with the properties above, then when its output is measured it produces with probability ζ = Ω(ε²) a string code(B) describing a quantum circuit B of the desired size and success probability.
Proof. The proof is divided into two parts: existence and uniformity. First, we argue that if there is a quantum circuit G of the above form, then a corresponding quantum circuit B exists. Then, for a fixed choice of constructive functions ε, δ, and k that depend on n and satisfy the conditions of the result, and for any constructive function s = s(n), we provide a deterministic algorithm D that, given $1^n$, runs in time t = poly(n, k, s, log(1/δ), 1/ε) and outputs the description of a quantum circuit $C_n^{\mathrm{IJKW}}$ as in the statement.
Let G be any quantum circuit of size s satisfying the hypothesis of the theorem. Note that in the proof of Theorem 4.8 the corresponding circuit G is accessed in a classical way. By Lemma 2.17, the analysis of the success probability of the circuit $B^O$ constructed in the proof of Theorem 4.8 remains valid when the oracle O is replaced by the circuit G. However, in the quantum case, instead of getting an inherently random oracle circuit $B^O$ of size poly(n, k, log(1/δ), 1/ε), we obtain a quantum circuit B of size poly(n, k, s, log(1/δ), 1/ε), where the size overhead comes from replacing each oracle gate O by a copy of the quantum circuit G of size s. Finally, recall that $U = \{0,1\}^n$, and note that the random input y of $B^O$ appearing in Equation 21 can be assumed to be part of the quantum computation of B by standard techniques. Consequently, we obtain a quantum circuit B for which Equation 50 holds.
Next, we argue that after fixing functions s(n), k(n), ε(n), and δ(n), there is a quantum circuit $C_n^{\mathrm{IJKW}}$ that, given the code of a good quantum circuit G, outputs with probability Ω(ε²) the code of a quantum circuit B with the desired properties. The circuit $C_n^{\mathrm{IJKW}}$ is simply the quantum analogue of the algorithm D from Construction 4.13. Let T(n) = poly(n, k, s, log(1/δ), 1/ε) be fixed as in the proof of Theorem 4.8. Then, given the code of G and using that n, k, and T(n) are fixed, $C_n^{\mathrm{IJKW}}$ proceeds as follows. It uses its internal randomness (simulated in a quantum way) to compute as D in Step 1, then it simulates G using a universal quantum circuit (see Section 2.2) of size poly(T) in order to sample $w\sim G(B)|_A$,¹³ and finally it outputs the description of a quantum circuit B that computes as $C_{A,w}$, where $C_{A,w}$ incorporates the code of G. Since by the proof of Theorem 4.8 algorithm D outputs a good circuit $B^O$ with probability Ω(ε²) over its internal randomness and the randomness of G (whenever G satisfies the conditions of the theorem), it follows that this is also true for $C_n^{\mathrm{IJKW}}$ and the description code(B) that it generates from code(G).
¹³ This is where we use that all parameters are fixed, meaning that we can instantiate a universal quantum circuit for quantum computations containing a fixed number of gates.
Observe that $C_n^{\mathrm{IJKW}}$ is defined over inputs of length poly(s), which represent the string code(G). In addition, $C_n^{\mathrm{IJKW}}$ has size poly(T). This includes the time it takes to simulate G, and the time it requires to print an explicit description of B, which contains poly(T) many gates.
Finally, note that the code of the quantum circuit $C_n^{\mathrm{IJKW}}$ is fully explicit, given a choice of parameters. In other words, there is a deterministic algorithm that, when given $1^n$, runs for at most t = poly(T) = poly(n, k, s, log(1/δ), 1/ε) steps and prints $C_n^{\mathrm{IJKW}}$.

4.4 Self-reducibility in the quantum setting


In this section, we explain how to use the downward and random self-reducibility of a language L to produce a sequence of circuits computing L. In more detail, we show how to design a small quantum circuit $B_n$ that computes L on n-bit inputs from a large quantum circuit $P_{n-1}$ computing L on (n − 1)-bit inputs and a collection of small quantum circuits $A_1, \ldots, A_t$ with the following guarantee: some $A_i$ offers a good approximation of L over n-bit inputs.
We will implement this idea with respect to the language L⋆ provided by Theorem 2.16. Note that the downward self-reducibility and random self-reducibility of this language hold with respect to classical computation, while here we will rely on these structural properties in the context of quantum circuits. This is not an issue, for the following reasons.
In the case of downward self-reducibility, given a quantum circuit $P_{n-1}$ for $L^\star_{n-1}$ (i.e., L⋆ restricted to (n − 1)-bit inputs), we observe that its success probability on every input can be amplified to 1 − negl(n). As a consequence, the classical reduction, which makes poly(n) classical queries, will obtain correct and consistent answers with overwhelming probability, even if $P_{n-1}$ is a quantum circuit.
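The amplification step is the usual majority vote over independent repetitions. As a classical sketch, we model each measured run as an independent bit that is correct with probability p (this modelling and the function name are our assumptions). The exact error of the majority of r runs is then a binomial tail, and Hoeffding's inequality bounds it by exp(−2r(p − 1/2)²), i.e. exp(−r/18) for p = 2/3, which is negl(n) once r = Θ(n).

```python
import math


def majority_error(p_correct: float, reps: int) -> float:
    """Exact probability that the majority of `reps` independent runs (reps odd) is wrong."""
    if reps % 2 == 0:
        raise ValueError("use an odd number of repetitions")
    q = 1.0 - p_correct
    return sum(
        math.comb(reps, j) * q ** j * p_correct ** (reps - j)
        for j in range(reps // 2 + 1, reps + 1)
    )


for reps in (1, 11, 51, 201, 401):
    err = majority_error(2 / 3, reps)
    # Hoeffding: Pr[#wrong >= reps/2] <= exp(-2 * reps * (2/3 - 1/2)^2) = exp(-reps/18)
    assert err <= math.exp(-reps / 18)
    print(reps, err)
```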
On the other hand, in the case of random self-reducibility, on an input x for L⋆, the reduction is implemented by a classical algorithm that makes $n^a$ queries to a classical oracle A, where each query is uniformly distributed over $\{0,1\}^n$ (but different queries can be correlated). If the oracle A answers all queries according to a fixed function $\tilde f_n : \{0,1\}^n \to \{0,1\}$ that is $1/n^b$-close to $L^\star_n$, the reduction gives a correct answer on x with high probability. Now note that Definition 2.15 does not offer a guarantee when $\tilde f_n$ is not a classical oracle, e.g., if it is defined from a quantum circuit that does not provide a deterministic output. Fortunately, in order to quantize this reduction, we can reduce the analysis to the classical case. This is only possible because here we are in the regime where the correlation of interest is of the form 1 − o(1), while for instance in hardness amplification (Section 4.3) we consider correlations that are o(1).
In more detail, here is one possible way of doing this. In our proof, A is some quantum circuit that computes $L^\star_n$ with probability p, in the sense that the expected agreement between $L^\star_n$ and the measured output of A on a uniformly random input x is p. It turns out that in our analysis we can take $p \ge 1 - 1/n^c$ for a convenient constant c > a + b. This allows us to show that there is a large set $S \subseteq \{0,1\}^n$ with $\Pr_x[x\in S] \ge 1 - 1/n^{a+b+1}$ such that A is correct on each x ∈ S with high probability. Moreover, by amplifying the success probability of A, we can get a quantum circuit A′ that is correct on each x ∈ S with probability at least, say, $1 - 2^{-n}$. Finally, since the reduction makes at most $n^a$ queries and each of them is uniform, by a union bound, with high probability all queries land in S. This can be used to show that with high probability all (classical) queries answered by the quantum circuit A′ agree with a classical function that is $1/n^b$-close to $L^\star_n$, which guarantees the correctness of the reduction also in our context. Using the ideas described above, it is not hard to implement the result explained at the beginning of this section. We provide the details next.
We start with the following: suppose we have a quantum circuit U that computes a random self-reducible language L with high probability on a uniformly random input; then one can construct a quantum circuit $U^*$ that, with high probability, computes L on every $x\in\{0,1\}^n$.
Lemma 4.29 (Random self-reducibility and quantum circuits). Let L : {0, 1}∗ → {0, 1} be a random self-reducible language with parameters a, b, c (as described in Definition 2.15). For every n, suppose we have the description of a quantum circuit U such that

$$\mathbb{E}_{x\in\{0,1\}^n}\Big[\big\|\Pi_{L(x)}U|x,0^q\rangle\big\|^2\Big] \ge 1-\frac{1}{n^k} \tag{51}$$

for some k ≥ 2b + a. There is an $O(|U|\cdot\mathrm{poly}(n))$-size quantum circuit $U^*$ that satisfies

$$\big\|\tilde\Pi_x U^*|0,x,0^{q^*}\rangle\big\|^2 \ge 1-2^{-2n+1}\quad\text{for every }x\in\{0,1\}^n,$$

where $\tilde\Pi_x = |L(x)\rangle\langle L(x)|\otimes|x\rangle\langle x|\otimes|0^{q^*}\rangle\langle 0^{q^*}|$ and $q^* = \mathrm{poly}(n)$.
Proof. Let us define $Y = \{x\in\{0,1\}^n : \|\Pi_{L(x)}U|x,0^q\rangle\|^2 \ge 2/3\}$. Applying Markov's inequality to the error $1 - \|\Pi_{L(x)}U|x,0^q\rangle\|^2$, whose average over x is at most $1/n^k$ by Equation (51), gives $|Y| \ge (1 - 3/n^k)2^n$. We can then consider the circuit $U^{\mathrm{amp}}$ that, on input x, runs U on the input O(n) times in parallel and answers with the majority of the outputs. It follows that for every x ∈ Y, we have $\|\Pi_{L(x)}U^{\mathrm{amp}}|x,0^{O(qn)}\rangle\|^2 \ge 1 - 2^{-n}$.
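The bound on |Y| is Markov's inequality applied to the per-input error e(x): at most a 3 · E_x[e(x)] fraction of inputs can have e(x) > 1/3. The toy check below uses random, hypothetical error profiles over 2^n inputs, not data from any circuit; the function name is our own.

```python
import random


def fraction_outside_y(errors, threshold=1 / 3):
    """Fraction of inputs whose error probability exceeds `threshold` (the inputs outside Y)."""
    return sum(e > threshold for e in errors) / len(errors)


rng = random.Random(5)
n, k = 10, 2
for _ in range(100):
    # Hypothetical error profile: mostly tiny errors, occasionally a large one.
    errors = [rng.random() if rng.random() < 0.002 else rng.random() * 1e-4 for _ in range(2 ** n)]
    mean_error = sum(errors) / len(errors)
    # Markov: Pr_x[e(x) > 1/3] <= 3 * E_x[e(x)]
    assert fraction_outside_y(errors) <= 3 * mean_error
    if mean_error <= 1 / n ** k:
        assert fraction_outside_y(errors) <= 3 / n ** k
```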
Let g : {0, 1}∗ → {0, 1}∗ be the polynomial-time computable classical function from Definition 2.15 in the definition of the random self-reducibility of L. Before defining U∗, we first define the unitary $\tilde U_1$:

$$\tilde U_1 : |x,0\rangle \mapsto \underbrace{\frac{1}{\sqrt{2^{n^c}}}\sum_{r\in\{0,1\}^{n^c}}|x,r,0\rangle\bigotimes_{i\in[n^a]}|g(i,x,r),0\rangle}_{:=|\chi\rangle}. \tag{52}$$

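We now use the circuit $U^{\mathrm{amp}}$ to compute L for every g(i, x, r), i.e., we apply $\tilde U_2 = \mathrm{Id}\otimes(U^{\mathrm{amp}})^{\otimes n^a}$ to the state |χ⟩ to obtain

$$|\psi\rangle = \frac{1}{\sqrt{2^{n^c}}}\sum_{r\in\{0,1\}^{n^c}}|x,r,0\rangle\bigotimes_{i\in[n^a]}\big(U^{\mathrm{amp}}|g(i,x,r),0\rangle\big). \tag{53}$$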

Let us fix some random r. Observe that by our assumption on g (i.e., $g(i,x,r)\sim U_n$), it follows from a union bound that the probability that $g(i,x,r)\notin Y$ for some $i\in[n^a]$ is at most $3/n^{k-a}$. Assuming that for every $i\in[n^a]$ we have $g(i,x,r)\in Y$, it follows that for every $i\in[n^a]$

$$\big\|\Pi_{L(g(i,x,r))}U^{\mathrm{amp}}|g(i,x,r),0\rangle\big\|^2 \ge 1-2^{-n}.$$

Let $\tilde U_3$ be a unitary that implements the classical circuit h : {0, 1}∗ → {0, 1} from Definition 2.15. The action of $\tilde U_3$ on |ψ⟩ can be written as

$$|\phi\rangle = \frac{1}{\sqrt{2^{n^c}}}\sum_{r}\tilde U_3\Big(|x,r,0\rangle\bigotimes_{i\in[n^a]}U^{\mathrm{amp}}|g(i,x,r),0\rangle\Big). \tag{54}$$

Notice that if we assume that for every $i\in[n^a]$ we have $g(i,x,r)\in Y$, and that $U^{\mathrm{amp}}$ is correct for every g(i, x, r), we have from Definition 2.15 that $\|\Pi_{L(x)}|\phi\rangle\|^2 \ge 1-2^{-2n}$. It follows from a union bound that for every $x\in\{0,1\}^n$,

$$\|\Pi_{L(x)}|\phi\rangle\|^2 \ge 1-\frac{3}{n^{k-a}}-\frac{n^a}{2^n}-\frac{1}{2^{2n}} \ge 1-\frac{1}{\mathrm{poly}(n)}.$$

We can pick U∗ as the algorithm that runs $\tilde U_3\tilde U_2\tilde U_1$ in parallel O(n) times and answers with the majority. Finally, to remove any garbage from the computation, we can copy the output register into a separate register, uncompute U∗, and still compute L(x) with overwhelming probability.
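Random self-reducibility is easiest to see on a toy function. This minimal sketch uses a linear function f(x) = ⟨a, x⟩ mod 2, which is random self-reducible with two correlated queries, each uniform on its own: f(x) = f(x ⊕ r) ⊕ f(r). It is a classical stand-in chosen for illustration, not the language L⋆ or the functions g and h of Definition 2.15; the names and the corrupted oracle are hypothetical. The majority over independent r mirrors the O(n) parallel repetitions used to build U∗.

```python
import random


def make_linear(a: int):
    """Toy random self-reducible function f_a(x) = <a, x> mod 2 (hypothetical stand-in)."""
    def f(x: int) -> int:
        return bin(a & x).count("1") % 2
    return f


def self_correct(oracle, x: int, n: int, trials: int, rng: random.Random) -> int:
    """Recover f(x) for a linear f from an oracle that errs on a small fraction of inputs.

    Each trial queries the oracle at x ^ r and at r; each query alone is uniform over
    {0,1}^n (the two are correlated), and f(x) = f(x ^ r) ^ f(r) by linearity.
    The answer is the majority vote over `trials` independent choices of r.
    """
    votes = 0
    for _ in range(trials):
        r = rng.getrandbits(n)
        votes += oracle(x ^ r) ^ oracle(r)
    return int(2 * votes > trials)


n = 10
rng = random.Random(7)
f = make_linear(0b1011001110)
bad = set(rng.sample(range(2 ** n), 10))  # hypothetical inputs on which the oracle lies


def oracle(y: int) -> int:
    return f(y) ^ int(y in bad)


wrong = sum(self_correct(oracle, x, n, 31, rng) != f(x) for x in range(2 ** n))
print("inputs decoded incorrectly:", wrong)
```

Even though the oracle is wrong on about 1% of inputs, and in particular possibly on x itself, each trial fails only when r or x ⊕ r hits a bad input, so the majority recovers f on every input.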

We now show that if L is downward self-reducible, then we can construct a quantum circuit
U ∗ that computes L on inputs of size n from a quantum circuit Un−1 that computes L on inputs
with size n − 1.
Theorem 4.30 (Downward self-reducibility of L⋆ and quantum circuits). Let $s_P : \mathbb{N}\to\mathbb{N}$ be a constructive function. Let L⋆ be the language from Theorem 2.16. There is a sequence $\{C_n^{\mathrm{DR}}\}_{n\ge1}$ of deterministic circuits $C_n^{\mathrm{DR}}$ for which the following holds:
(i) Input: Each circuit $C_n^{\mathrm{DR}}$ gets as input $1^n$ and a string $\mathrm{code}(P_{n-1})$ that describes a quantum circuit $P_{n-1}$ of size $\le s_P(n-1)$.
(ii) Uniformity and Size: Each circuit $C_n^{\mathrm{DR}}$ is of size $S(n) = \mathrm{poly}(n, s_P(n-1))$, and there is a deterministic algorithm that when given $1^n$ prints $\mathrm{code}(C_n^{\mathrm{DR}})$ in time poly(S(n)).
(iii) Output and Correctness: If $P_{n-1}$ computes L⋆ on inputs of length n − 1, then $C_n^{\mathrm{DR}}$ outputs the description of a quantum circuit $P_n$ of size $\mathrm{poly}(n, s_P(n-1))$ that computes L⋆ on inputs of length n.
Proof. By the hypothesis in item (iii), we have that for every $x\in\{0,1\}^{n-1}$,

$$\big\|\Pi_{L^\star(x)}P_{n-1}|x,0^q\rangle\big\|^2 \ge 2/3.$$

We first amplify the success probability of $P_{n-1}$ from 2/3 to 1 − negl(n). For this, we perform a standard majority vote on the outputs of O(n) parallel copies of $P_{n-1}$ and construct $P^*_{n-1}$ of size $O(n\cdot s_P(n-1))$ such that

$$\big\|\tilde\Pi_x P^*_{n-1}|0,x,0^q\rangle\big\|^2 \ge 1-\mathrm{negl}(n), \tag{55}$$

where q = poly(n) and $\tilde\Pi_x = |L^\star(x)\rangle\langle L^\star(x)|\otimes|x\rangle\langle x|\otimes|0^q\rangle\langle 0^q|$. Now, $P_n$ simulates the classical circuit $A^{L^\star}$ from Definition 2.14 that computes $L^\star(x)$, and answers A's queries to L⋆ on instances of size n − 1 by simulating $P^*_{n-1}$. To remove garbage, $P_n$ copies the output of $A^{L^\star}$ onto the output qubit, and then uncomputes $A^{L^\star}$ (by also uncomputing the calls to $P^*_{n-1}$). Since each one of the polynomially many queries is answered correctly with probability at least 1 − negl(n), the output of $A^{L^\star}$ is correct with probability at least 1 − negl(n) ≥ 2/3. Additionally, as A is a poly(n)-time circuit and $P^*_{n-1}$ is of size $O(n\cdot s_P(n-1))$, $P_n$ is of size $\mathrm{poly}(n, s_P(n-1))$, and therefore (iii) holds.
In order to show (ii), observe that the circuit $C_n^{\mathrm{DR}}$ first generates $\mathrm{code}(P^*_{n-1})$ of size $O(n\cdot s_P(n-1))$ by using $\mathrm{code}(P_{n-1})$. It then converts the classical circuit $A^{L^\star}$ into its reversible form and replaces these reversible gates with corresponding unitary descriptions. It also replaces the queries made by $A^{L^\star}$ with $\mathrm{code}(P^*_{n-1})$. All of this can be computed by $C_n^{\mathrm{DR}}$ using $\mathrm{poly}(n, |\mathrm{code}(P_{n-1})|) = \mathrm{poly}(n, s_P(n-1))$ gates.
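A classical toy version of this construction may help. The sketch uses PARITY as a hypothetical stand-in for L⋆ (it is downward self-reducible with a single query, PARITY(x₁…xₙ) = PARITY(x₁…xₙ₋₁) ⊕ xₙ, whereas the real $A^{L^\star}$ makes poly(n) queries). A noisy solver that is correct with probability 2/3 per call plays the role of $P_{n-1}$, majority voting gives the analogue of $P^*_{n-1}$, and one downward step gives the analogue of $P_n$. All names are our own.

```python
import random


def parity(bits):
    """Toy downward self-reducible language (hypothetical stand-in for L*): PARITY."""
    return sum(bits) % 2


def noisy_solver(rng, err=1 / 3):
    """Stand-in for P_{n-1}: each call is correct with probability 1 - err."""
    def solve(bits):
        return parity(bits) ^ int(rng.random() < err)
    return solve


def amplify(solver, reps):
    """Stand-in for P*_{n-1}: majority vote over `reps` (odd) independent calls."""
    def solve(bits):
        ones = sum(solver(bits) for _ in range(reps))
        return int(2 * ones > reps)
    return solve


def downward_step(solver_prev):
    """Stand-in for P_n: PARITY(x_1..x_n) = PARITY(x_1..x_{n-1}) XOR x_n (one oracle query)."""
    def solve(bits):
        return solver_prev(bits[:-1]) ^ bits[-1]
    return solve


def bits_of(x, n):
    return [(x >> i) & 1 for i in range(n)]


rng = random.Random(3)
p_n = downward_step(amplify(noisy_solver(rng), 401))
errors = sum(p_n(bits_of(x, 8)) != parity(bits_of(x, 8)) for x in range(2 ** 8))
print("wrong answers on length-8 inputs:", errors)
```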

Theorem 4.31 (Self-reducibility of L⋆ and quantum circuits). Let $s_A, s_P, t : \mathbb{N}\to\mathbb{N}$ be constructive functions. Moreover, let L⋆ be the language from Theorem 2.16, and a⋆, b⋆ be the associated constants. There is a sequence $\{C_n^{\mathrm{SR}}\}_{n\ge1}$ of quantum circuits $C_n^{\mathrm{SR}}$ for which the following holds:

(i) Input: Each circuit $C_n^{\mathrm{SR}}$ gets as input $1^n$ and strings $\mathrm{code}(P_{n-1}), \mathrm{code}(A_1), \ldots, \mathrm{code}(A_{t(n)})$, where $P_{n-1}$ is a quantum circuit of size $\le s_P(n-1)$ and each $A_i$ is a quantum circuit of size $\le s_A(n)$.

(ii) Uniformity and Size: Each circuit $C_n^{\mathrm{SR}}$ is of size $S(n) = \mathrm{poly}(n, t(n), s_A(n), s_P(n-1))$, and there is a deterministic algorithm that when given $1^n$ prints $\mathrm{code}(C_n^{\mathrm{SR}})$ in time poly(S(n)).

(iii) Output and Correctness: Assume that $P_{n-1}$ computes L⋆ on inputs of length n − 1, and that there exists i ∈ [t(n)] such that

$$\Pr_{x\sim\{0,1\}^n,\,A_i}\big[A_i(x) = L^\star(x)\big] \ge 1-n^{-2b^\star-a^\star}. \tag{56}$$

Then with probability at least $1 - 1/(500n^2)$ over its output measurement, $C_n^{\mathrm{SR}}$ generates the description $\mathrm{code}(B_n)$ of a quantum circuit $B_n$ of size $\mathrm{poly}(n, s_A(n))$ that correctly computes L⋆ on inputs of length n. In other words, for every $x\in\{0,1\}^n$,

$$\Pr_{B_n}\big[B_n(x) = L^\star(x)\big] \ge 2/3.$$

Note. Notice that from Theorem 4.30, we could achieve a circuit of size $\mathrm{poly}(n, s_P(n-1))$ that computes L⋆ on inputs of size n. The non-trivial aspect of Theorem 4.31 is to achieve a circuit $B_n$ whose size depends only on n and $s_A(n)$ and is independent of $s_P(n-1)$.

Proof. First, given $\mathrm{code}(P_{n-1})$, we use Theorem 4.30 to construct a circuit $P_n$ with $|\mathrm{code}(P_n)| = \mathrm{poly}(n, s_P(n-1))$ such that for every $x\in\{0,1\}^n$ and for $q' = \mathrm{poly}(n)$, we have

$$\big\|\tilde\Pi_x P_n|0,x,0^{q'}\rangle\big\|^2 \ge 1-\mathrm{negl}(n), \tag{57}$$

where $\tilde\Pi_x = |L^\star(x)\rangle\langle L^\star(x)|\otimes|x\rangle\langle x|\otimes|0^{q'}\rangle\langle 0^{q'}|$. In the remainder of this proof, the usage of $\tilde\Pi_x$ will denote the process of checking whether the output qubit is $|L^\star(x)\rangle$, the input qubits remain $|x\rangle$, and all auxiliary qubits are set to 0. We proceed to construct $B_n$ in three steps using the random and downward self-reducibility of L⋆.

Step 1: Notice that the theorem statement provides no guarantees on how well $A_\ell$ performs for ℓ ≠ i. To identify the circuits (among $\{A_1, \ldots, A_{t(n)}\}$) that compute L⋆ correctly with probability at least 1 − 1/poly(n), under the promise that there exists one, we carry out the following test. Let R = O(log(t(n)/η)) where η = 1/poly(n). For every ℓ ∈ [t(n)], pick uniformly random $x_\ell^1, \ldots, x_\ell^R\in\{0,1\}^n$, and for every r ∈ [R] run $A_\ell$ and $P_n$ (constructed at the start of the proof) separately on two copies of $|0, x_\ell^r, 0^{q'}\rangle$, measure the first qubit, and let the outputs be $b_\ell^r, c_\ell^r\in\{0,1\}$, respectively. Consider the set

$$J = \Big\{\ell\in[t(n)] : \sum_{r=1}^{R}[b_\ell^r = c_\ell^r] \ge \frac{3R}{4}\Big\}.$$

Note that it is possible that there is an ℓ ∈ J that passes the test but $A_\ell$ does not compute L⋆ correctly. However, it suffices for us to show that, with high probability (over the randomness of sampling the $x_\ell^r$'s), if ℓ ∈ J, then the quantum circuit $A_\ell$ computes L⋆. Towards this end, we define t⋆ = 2b⋆ + a⋆ + 1 and show that, supposing $\Pr[A_\ell(x) = P_n(x)] < 1 - n^{-t^\star}$ (where the probability is taken over uniformly random $x\in\{0,1\}^n$ and the randomness in $A_\ell$ and $P_n$), then with high probability ℓ ∉ J. To prove this, first notice that if $\Pr[A_\ell(x) = P_n(x)] \le 1 - n^{-t^\star}$, then applying the Chernoff bound (Theorem 2.1) with $\mu < 1 - n^{-t^\star}$ and δ = O(1), we have

$$\Pr\Big[\frac{1}{R}\sum_r[b_\ell^r = c_\ell^r] \ge \frac{3}{4}\Big] \le e^{-O(R)}.$$

This implies that with probability $\ge 1 - e^{-O(R)}$ (over the random samples $x_\ell^r$), if ℓ ∈ J, then we have $\Pr_x[A_\ell(x) = P_n(x)] \ge 1 - 1/n^{t^\star}$. Moreover, along with Equation (57), this implies that with probability $\ge 1 - e^{-O(R)}$, if ℓ ∈ J then

$$\begin{aligned}
\Pr_x[A_\ell(x) = L^\star(x)] &\ge \Pr_x[A_\ell(x) = L^\star(x)\mid P_n(x) = L^\star(x)]\cdot\Pr_x[P_n(x) = L^\star(x)]\\
&\ge (1-n^{-t^\star})\,(1-\mathrm{negl}(n)) \ge 1-n^{-2b^\star-a^\star},
\end{aligned}$$

where the second inequality uses the fact that the conditional probability refers to the event that $A_\ell(x) = P_n(x)$. Hence, with probability at least $1 - e^{-O(R)}$, if ℓ ∈ J then $\Pr[A_\ell(x) = L^\star(x)] \ge 1 - n^{-2b^\star-a^\star}$. By taking a union bound over all ℓ ∈ J, we now have

$$\Pr\Big[\exists\,\ell\in J : \Pr[A_\ell(x) = L^\star(x)] < 1-n^{-2b^\star-a^\star}\Big] \le t(n)\cdot e^{-O(R)} \le \eta.$$

Hence with probability at least 1 − η, every ℓ ∈ J satisfies $\Pr[A_\ell(x) = L^\star(x)] \ge 1 - n^{-2b^\star-a^\star}$.
Before proceeding to the next step, we argue that with probability $\ge 1 - e^{-O(R)}$, J is non-empty. By assumption, for the index i ∈ [t(n)] from Eq. (56) we have $\Pr_x[A_i(x) = L^\star(x)] \ge 1 - n^{-2b^\star-a^\star}$. Since $\Pr_x[P_n(x) = L^\star(x)] \ge 1 - \mathrm{negl}(n)$, a union bound gives $\Pr_{x^1,\ldots,x^R}[\exists r\in[R],\ c_i^r \ne L^\star(x_i^r)] \le \mathrm{negl}(n)$. Conditioned on $c_i^r = L^\star(x_i^r)$ for all r, the probability that $A_i$ fails the test is very small, i.e.,

$$\Pr\Big[\frac{1}{R}\sum_r[b_i^r = c_i^r] < \frac{3}{4}\Big] = \Pr\Big[\frac{1}{R}\sum_r[b_i^r = L^\star(x_i^r)] < \frac{3}{4}\Big] \le e^{-O(R)},$$

where the inequality follows by applying the Chernoff bound (Theorem 2.1) to Eq. (56). Hence, with probability at least $1 - e^{-O(R)} - \mathrm{negl}(n) = 1 - e^{-O(R)}$, J contains i.
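The mechanics of the Step 1 test can be simulated classically. In this minimal sketch each candidate and the reference $P_n$ are modelled as randomized functions returning one measured bit; the target function, the candidates and all names are hypothetical. The bad candidates here guess at random, so their agreement with the reference (about 1/2) lies far below the 3/4 threshold; the sketch therefore only illustrates how J is formed, not the fine-grained separation discussed above.

```python
import random


def select_candidates(candidates, reference, n, R, rng):
    """Return J: indices whose measured outputs agree with `reference` on >= 3R/4 of R samples."""
    J = []
    for idx, cand in enumerate(candidates):
        agree = 0
        for _ in range(R):
            x = rng.getrandbits(n)  # fresh uniform sample x_l^r for every candidate
            agree += cand(x, rng) == reference(x, rng)
        if 4 * agree >= 3 * R:
            J.append(idx)
    return J


def target(x):
    """Hypothetical stand-in for L*_n: parity of the bits of x."""
    return bin(x).count("1") % 2


def reference(x, rng):
    return target(x)  # models P_n, assumed exact in this toy


def good(x, rng):
    return target(x) ^ int(rng.random() < 1e-3)  # errs with probability 1e-3


def guesser(x, rng):
    return rng.getrandbits(1)  # agrees with the reference about half of the time


J = select_candidates([guesser, guesser, good, guesser], reference, 16, 200, random.Random(11))
print(J)
```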

Step 2: In Step 1, we showed that the circuits $A_\ell$ for ℓ ∈ J compute L⋆ on average over x. Now we use the random self-reducibility property of L⋆ to obtain a circuit that performs well for every $x\in\{0,1\}^n$ and not just for a uniformly random x. In this direction, for every ℓ ∈ J, given $\mathrm{code}(A_\ell)$, we use Lemma 4.29 to construct the circuit $A^*_\ell$ (with $|\mathrm{code}(A^*_\ell)| = \mathrm{poly}(|\mathrm{code}(A_\ell)|, n) = \mathrm{poly}(s_A(n), n)$). Then, for every ℓ ∈ J such that $\mathbb{E}_x[A_\ell(x) = L^\star(x)] \ge 1 - n^{-2b^\star-a^\star}$, we also have

$$\big\|\tilde\Pi_x A^*_\ell|0,x,0^{\bar q}\rangle\big\|^2 \ge 1-2^{-2n+1}\quad\text{for every }x\in\{0,1\}^n. \tag{58}$$

We now prove item (iii). Pick $\eta = 1/(500n^2)$ and note that with probability at least $1 - \eta = 1 - 1/(500n^2)$, every ℓ ∈ J satisfies Eq. (58). Hence, picking an arbitrary ℓ ∈ J and setting $B_n = A^*_\ell$ gives us the desired quantum circuit with $|\mathrm{code}(B_n)| = |\mathrm{code}(A^*_\ell)| \le \mathrm{poly}(s_A(n), n)$.

Finally, observe that Step 1 can be described with $\mathrm{poly}(t(n)\cdot|\mathrm{code}(A^*_\ell)| + |\mathrm{code}(P_n)|) = \mathrm{poly}(n, t(n), s_P(n-1), s_A(n))$ gates. Similarly, Step 2 can be described with $\mathrm{poly}(T, t(n), s_P(n-1), s_A(n)) = \mathrm{poly}(n, t(n), s_P(n-1), s_A(n))$ gates. Step 3 uses $\mathrm{poly}(n, s_A(n))$ gates. Putting these together, $|\mathrm{code}(C_n^{\mathrm{SR}})| = \mathrm{poly}(n, t(n), s_P(n-1), s_A(n))$. This proves item (ii) and completes the proof of the theorem.

5 A conditional PRG against uniform quantum circuits
In this section, we put together the results of Section 4, and show that if polynomial-space
classical algorithms cannot be simulated in sub-exponential time by quantum algorithms, then
there exists a pseudorandom generator secure against uniform quantum computations.

Theorem 5.1 (Conditional PRG against uniform quantum computations). Suppose that $\mathsf{PSPACE} \not\subseteq \mathsf{BQSUBEXP}$. In other words, there is a language $L \in \mathsf{PSPACE}$ and $\gamma > 0$ such that $L \notin \mathsf{BQTIME}[2^{n^\gamma}]$. Then, for some choice of constants $\alpha \ge 1$ and $\lambda \in (0, 1/5)$, there is an infinitely often $(\ell, m, s, \varepsilon)$-generator $G = \{G_n\}_{n \ge 1}$, where $\ell(n) \le n^\alpha$, $m(n) = \lfloor 2^{n^\lambda} \rfloor$, $s(m) = 2^{n^{2\lambda}} \ge \mathrm{poly}(m)$ (for any polynomial), and $\varepsilon(m) = 1/m$.
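A one-line check (our own) of why $s(m)$ exceeds every fixed polynomial in $m$: since $m = \lfloor 2^{n^\lambda} \rfloor \le 2^{n^\lambda}$, we have $\log_2 m \le n^\lambda$, and therefore

```latex
s(m) = 2^{n^{2\lambda}} = 2^{(n^{\lambda})^{2}} \ge 2^{(\log_2 m)^{2}} = m^{\log_2 m},
```

which is at least $m^c$ for any fixed constant $c$ as soon as $\log_2 m \ge c$.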

In the proof given below, one can even take a larger constant λ closer to 1. However, since we
are not optimizing the choice of the constant α in the seed length, this is inessential.

Proof. Let $L^\star \subseteq \{0,1\}^*$ be the special language from Theorem 2.16. For each $n \ge 1$, let $f_n : \{0,1\}^n \to \{0,1\}$ be the Boolean indicator function of the set $L^\star \cap \{0,1\}^n$. We use these functions to define a set of candidate generators $\{G^{\alpha,\lambda}\}_{\alpha,\lambda}$, where $G^{\alpha,\lambda} = \{G^{\alpha,\lambda}_n\}_{n \ge 1}$, with parameters $\ell$, $m$, $s$, and $\varepsilon$ as in the statement of the result. We then argue that if none of them is an infinitely often $(\ell, m, s, \varepsilon)$-generator, then for every $\gamma > 0$ we have $L^\star \in \mathsf{BQTIME}[2^{n^\gamma}]$. Since this language is complete for $\mathsf{PSPACE}$, we get $\mathsf{PSPACE} \subseteq \mathsf{BQSUBEXP}$, contradicting the hypothesis of the theorem.
Each generator $G^{\alpha,\lambda} = \{G^{\alpha,\lambda}_n\}_{n \ge 1}$, for a large enough $\alpha \ge 1$, is defined as follows. Let $d^\star \ge 1$ be the constant appearing in Theorem 2.16, and let $a^\star, b^\star, c^\star \ge 1$ be the constants from Definition 2.15 associated with the random self-reducibility of $L^\star$. Next, for every $n \ge 1$ we set
$$n_1(n) := k \cdot n \ \ (\text{for } k(n) := 2n^{2b^\star + a^\star + d^\star + 2}), \quad n_2(n) := kn + k, \quad \ell(n) := (kn + k)^2 \le n^\alpha, \quad m(n) := \lfloor 2^{n^\lambda} \rfloor,$$
and introduce functions
$$g_n : \{0,1\}^{n_1} \to \{0,1\}^k, \qquad h_n : \{0,1\}^{n_2} \to \{0,1\}, \qquad G^{\alpha,\lambda}_n : \{0,1\}^\ell \to \{0,1\}^m,$$
which are defined as follows:
$$g_n := f_n^{k}, \text{ i.e., } g_n(x_1, \ldots, x_k) = (f_n(x_1), \ldots, f_n(x_k)) \text{ for every } \vec{x} \in \{0,1\}^{kn},$$
$$h_n(\vec{x}, r) := g_n(\vec{x}) \cdot r = \sum_{i=1}^{k} f_n(x_i) \cdot r_i \pmod 2, \text{ where } r \in \{0,1\}^k, \text{ and}$$
$$G^{\alpha,\lambda}_n(z) := \mathrm{NW}^{h_n}(z) \text{ instantiated with } m(n) \text{ output bits}.$$
This completes the definition of the generator $G^{\alpha,\lambda} = \{G^{\alpha,\lambda}_n\}_{n \ge 1}$.
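The definitions of $g_n$, $h_n$ and $G^{\alpha,\lambda}_n$ can be mirrored at toy scale. The sketch below is ours and rests on two assumptions: it uses the standard polynomial construction of a combinatorial design (sets $\{(i, p(i))\}$ for low-degree polynomials $p$ over a prime field), which may differ from the design used in Section 4, and it uses tiny hypothetical parameters ($n = 2$, $k = 2$, so $h$ has $kn + k = 6$ inputs) with a hypothetical toy $f$, instead of $m(n) = \lfloor 2^{n^\lambda} \rfloor$.

```python
import random
from itertools import product

def polynomial_design(q: int, t: int, d: int):
    """Design from polynomials over F_q (q prime, t <= q): for each polynomial
    p of degree < d, the set S_p = {(i, p(i)) : 0 <= i < t}. Two distinct
    polynomials agree on at most d - 1 points, so |S_p ∩ S_p'| <= d - 1.
    The point (i, v) is encoded as seed position i * q + v."""
    design = []
    for coeffs in product(range(q), repeat=d):
        design.append([i * q + sum(c * i ** e for e, c in enumerate(coeffs)) % q
                       for i in range(t)])
    return design

def make_h(f, n: int, k: int):
    """Toy version of h(x_1, ..., x_k, r) = sum_i f(x_i) * r_i mod 2."""
    def h(bits):
        xs = [bits[i * n:(i + 1) * n] for i in range(k)]
        r = bits[k * n:k * n + k]
        return sum(f(x) * ri for x, ri in zip(xs, r)) % 2
    return h

def nw_generator(h, design, seed):
    """NW^h(z): one output bit h(z restricted to S) for each set S of the design."""
    return [h([seed[pos] for pos in S]) for S in design]

if __name__ == "__main__":
    n, k, q = 2, 2, 7                        # toy sizes; h has k*n + k = 6 inputs
    design = polynomial_design(q, t=k * n + k, d=2)
    h = make_h(lambda x: x[0] & x[1], n, k)  # hypothetical toy f on 2 bits
    rng = random.Random(0)
    seed = [rng.randrange(2) for _ in range((k * n + k) * q)]
    print(len(seed), "->", len(nw_generator(h, design, seed)), "output bits")
```

With $q = 7$ and degree $< 2$ there are 49 sets of size 6 inside a 42-bit seed, pairwise intersecting in at most one position, so this toy generator stretches 42 bits to 49.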

We argue next that if $G^{\alpha,\lambda}$ is not an infinitely often $(\ell, m, s, \varepsilon)$-generator, then $L^\star \in \mathsf{BQTIME}[2^{n^{3\lambda}}]$. First, note that it has the correct stretch. Moreover, $f_n$ can be computed in (deterministic) time $O(2^{n^{d^\star}})$, $g_n$ and $h_n$ can be computed in time $\mathrm{poly}(n) \cdot 2^{n^{d^\star}}$, and the sets in the combinatorial design can each be computed in time $\mathrm{poly}(n)$. Given these time bounds, it follows that $G^{\alpha,\lambda}_n$ can be computed in time $m(n) \cdot \mathrm{poly}(n) \cdot 2^{n^{d^\star}} = O(2^{\ell(n)})$, by our choice of $\ell(n)$. In other words, the generator also satisfies the uniformity and running-time requirements. Thus, if $G^{\alpha,\lambda}$ is not an infinitely often $(\ell, m, s, \varepsilon)$-generator, its output distributions must violate the pseudorandomness condition. We explore this in what follows.

Let $\mathcal{D}_m \equiv G^{\alpha,\lambda}_n(U_\ell)$ be the distribution induced by the output of the generator on a uniformly random input seed. Since the generator is not infinitely often pseudorandom for parameters $s(m) = 2^{n^{2\lambda}}$ and $\varepsilon(m) = 1/m$, there is a deterministic algorithm $A(1^m)$ that runs in time $s(m)$ and outputs a “distinguisher” quantum circuit $D_m$ over $m$ input variables and of size at most $s(m)$ such that, for every large enough $n$ (say, $n \ge \kappa \in \mathbb{N}$),
$$\Big| \Pr_{x \sim \{0,1\}^m}[D_m(x) = 1] - \Pr_{y \sim \mathcal{D}_m}[D_m(y) = 1] \Big| > \varepsilon(m).$$
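To make the pseudorandomness condition concrete, the sketch below (ours) computes the distinguishing advantage $|\Pr_x[D(x) = 1] - \Pr_{y \sim G(U)}[D(y) = 1]|$ exactly, by enumeration, for a hypothetical 1-bit-stretch generator that appends the parity of its seed and a distinguisher that checks this parity. Neither object comes from the source; they only illustrate the quantity that $D_m$ makes larger than $\varepsilon(m)$.

```python
from fractions import Fraction
from itertools import product

def advantage(D, G, seed_len: int, out_len: int) -> Fraction:
    """Exact |Pr_{x ~ U_m}[D(x)=1] - Pr_{z ~ U_l}[D(G(z))=1]| by enumeration."""
    p_uniform = Fraction(sum(D(x) for x in product((0, 1), repeat=out_len)), 2 ** out_len)
    p_gen = Fraction(sum(D(G(z)) for z in product((0, 1), repeat=seed_len)), 2 ** seed_len)
    return abs(p_uniform - p_gen)

def toy_G(z):
    """Hypothetical generator with stretch 1: append the parity of the seed."""
    return z + (sum(z) % 2,)

def parity_D(y):
    """Hypothetical distinguisher: accept iff the last bit is the parity of the rest."""
    return int(y[-1] == sum(y[:-1]) % 2)
```

Here `advantage(parity_D, toy_G, 4, 5)` is $1/2$, well above $\varepsilon(m) = 1/m$ for $m = 5$, whereas a distinguisher that only reads the first bit has advantage $0$.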

Our next step is to argue that algorithm $A$ can be used to define a uniform family of quantum circuits $Q_n$ of size at most $O(2^{n^{3\lambda}})$ that correctly decide $L^\star$ on $n$-bit inputs. In other words, according to our notation, for every $x \in \{0,1\}^n$ we have $\Pr_{Q_n}[Q_n(x) = f_n(x)] \ge 2/3$. By the discussion above, this completes the proof of the theorem.

The quantum circuit $Q_n(x)$. We now describe each quantum circuit $Q_n$ and argue about its correctness. This circuit computes in $1 + (n - \kappa + 1) + 1$ sequential stages, which are delegated to the sub-circuit $Q'_0$, followed by $Q'_\kappa, Q'_{\kappa+1}, \ldots, Q'_n$, and a final sub-circuit $E_n$:

Initialization: $Q'_0$ on input $|0^{q_0}\rangle$ prints the description $\mathrm{code}(P_{\kappa-1})$ of a quantum circuit $P_{\kappa-1}$ of size $O(1)$ that computes the Boolean function $f_{\kappa-1}$ corresponding to $L^\star$ over $(\kappa-1)$-bit strings.

Core Stages: For every $j \in \{\kappa, \ldots, n\}$, the circuit $Q'_j$ expects as input a description $\mathrm{code}(P_{j-1})$ of a quantum circuit $P_{j-1}$ that computes $f_{j-1}$. The goal of $Q'_j$ is to output, with high probability, the code of a circuit $P_j$ that computes $f_j$.

Final Computation: The quantum circuit $E_n$ expects input strings $\mathrm{code}(P_n)$ and $x \in \{0,1\}^n$, and outputs the result of simulating $P_n$ on $x$.

(We omit from this description the amplification of the success probability of $Q_n$ from $1/2 + \Omega(1)$ to at least $2/3$; see, e.g., Section 4.4, where more elaborate amplifications are discussed.)
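To make the staged structure of $Q_n$ concrete, here is a classical toy sketch (ours). It replaces $L^\star$ by parity, which is downward self-reducible via $f_j(x) = f_{j-1}(x_1 \ldots x_{j-1}) \oplus x_j$ (mirroring, in spirit, the role of the downward self-reduction from Lemma 4.30), and represents a "circuit" $P_j$ by a lookup table. Unlike the real $Q'_j$, the toy stage does not shrink its circuit, uses no distinguisher, and never fails; it only illustrates the bootstrapping order $Q'_0, Q'_\kappa, \ldots, Q'_n, E_n$. All function names are hypothetical.

```python
from itertools import product

def initialization(kappa: int) -> dict:
    """Toy Q'_0: brute-force a lookup table for f_{kappa-1} (kappa is a constant)."""
    return {x: sum(x) % 2 for x in product((0, 1), repeat=kappa - 1)}

def core_stage(prev_table: dict) -> dict:
    """Toy Q'_j: from a table for f_{j-1}, build a table for f_j via the
    downward self-reduction f_j(x) = f_{j-1}(x_1..x_{j-1}) XOR x_j."""
    return {prefix + (b,): value ^ b
            for prefix, value in prev_table.items() for b in (0, 1)}

def final_computation(table: dict, x) -> int:
    """Toy E_n: evaluate the final 'circuit' P_n on input x."""
    return table[tuple(x)]

def Q(n: int, kappa: int, x) -> int:
    table = initialization(kappa)              # P_{kappa-1}
    for _ in range(kappa, n + 1):              # stages j = kappa, ..., n
        table = core_stage(table)              # P_{j-1} -> P_j
    return final_computation(table, x)
```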

Next, we formalise this idea, analysing the size and uniformity of the quantum circuits $Q'_j$ and $E_n$, the size of the involved circuits $P_j$, and the success probability of each $Q'_j$. This will allow us to bound the size of $Q_n$ and to show that it correctly computes $f_n$.
In order to define these circuits, we will make use of the uniform families of quantum circuits from Section 4 and of the uniform family of quantum circuits $\{D_{m(n)}\}_{n \ge 1}$ that violates the pseudorandomness of the generator whenever $n \ge \kappa$. Our main goal is to prove the following lemma.
Lemma 5.2. There exist universal constants $C_U \ge C_Q \ge C_P \ge 1$ for which the following holds. Let $s_P(n) := 2^{C_P \cdot n^{2\lambda}}$ for every $n \ge 1$. For every $j \in \{\kappa, \ldots, n\}$, there is a quantum circuit $Q'_j$ such that:

(i) Input: $Q'_j$ expects as input the description of a quantum circuit $P_{j-1}$ of size $\le s_P(j-1)$.

(ii) Size and Uniformity: $Q'_j$ is a circuit of size $\le 2^{C_Q \cdot j^{2\lambda}}$. Moreover, there is a deterministic algorithm that, when given $1^j$, prints $\mathrm{code}(Q'_j)$ in time $\le 2^{C_U \cdot j^{2\lambda}}$.

(iii) Output and Correctness: If the input circuit $P_{j-1}$ correctly computes $f_{j-1}$, then with probability at least $1 - 1/(100j^2)$ over its output measurement, $Q'_j$ generates the description of a circuit $P_j$ of size $\le s_P(j)$ that computes $f_j$.

Assuming Lemma 5.2, we can complete the proof of Theorem 5.1 as follows. By the definition of $Q_n$ and its components, a union bound over all measurements in the core stages of $Q_n$ shows that the probability that the string $\mathrm{code}(P_n)$ output by $Q'_n$ does not describe a quantum circuit $P_n$ that computes $f_n$ is at most
$$\sum_{j=\kappa}^{n} \frac{1}{100j^2} \le \frac{1}{100} \cdot \sum_{j \ge 1} \frac{1}{j^2} = \frac{1}{100} \cdot \frac{\pi^2}{6} \le \frac{1}{50}.$$
Whenever $P_n$ does compute $f_n$, we have $\Pr_{P_n}[P_n(x) = f_n(x)] \ge 2/3$ for every fixed $x \in \{0,1\}^n$. Since $Q_n$ uses $E_n$ in its final computation stage to simulate the circuit $P_n$ on $x$, a union bound gives, on each input $x$,
$$\Pr_{Q_n}[Q_n(x) \neq f_n(x)] \le 1/50 + 1/3 < 2/5.$$
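The two numerical facts used above can be checked directly (our own check; the function names are hypothetical): partial sums of $\sum_j 1/(100j^2)$ stay below $\pi^2/600 \approx 0.0164 < 1/50$, and $1/50 + 1/3 = 53/150 < 2/5$.

```python
import math
from fractions import Fraction

def stage_failure_bound(kappa: int, n: int) -> float:
    """Union bound over the core stages: sum_{j=kappa}^{n} 1/(100 j^2)."""
    return math.fsum(1 / (100 * j * j) for j in range(kappa, n + 1))

def final_error_bound() -> Fraction:
    """Stage failures (at most 1/50) plus the error of P_n itself (at most 1/3)."""
    return Fraction(1, 50) + Fraction(1, 3)

if __name__ == "__main__":
    print(stage_failure_bound(1, 10 ** 5), math.pi ** 2 / 600, 1 / 50)
    print(final_error_bound(), "<", Fraction(2, 5))
```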

In other words, $Q_n$ computes $f_n$. The size of $Q_n$ is given by the sum of the sizes of its components. First, we can assume that $\mathrm{size}(Q'_0) \le 2^{C_P \cdot (\kappa-1)}$, provided that $C_P$ is a large enough constant, given that $\kappa$ is constant. In addition, we have
$$\sum_{j=\kappa}^{n} \mathrm{size}(Q'_j) \le n \cdot \mathrm{size}(Q'_n) \le n \cdot 2^{C_Q \cdot n^{2\lambda}}.$$
Finally, the size of $E_n$ is at most polynomial in the size of the input circuit $P_n$. Overall, we get that $\mathrm{size}(Q_n) = 2^{O(n^{2\lambda})} = O(2^{n^{3\lambda}})$, as desired. The uniformity of $Q_n$ follows from the uniformity of its components (the code of $Q'_0$ can be obtained using an exhaustive computation, since $\kappa$ is constant). It follows from this discussion that $L^\star \in \mathsf{BQTIME}[2^{n^{3\lambda}}]$.
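For reference, the size accounting above can be written out as one chain (our restatement; it uses $\mathrm{size}(P_n) \le s_P(n) = 2^{C_P \cdot n^{2\lambda}}$ from Lemma 5.2 and treats $\kappa$ as a constant):

```latex
\begin{aligned}
\mathrm{size}(Q_n) &\le \mathrm{size}(Q'_0) + \sum_{j=\kappa}^{n} \mathrm{size}(Q'_j) + \mathrm{size}(E_n) \\
&\le O(1) + n \cdot 2^{C_Q \cdot n^{2\lambda}} + \mathrm{poly}\big(2^{C_P \cdot n^{2\lambda}}\big) = 2^{O(n^{2\lambda})}, \\
2^{O(n^{2\lambda})} &\le 2^{n^{3\lambda}} \quad \text{for all large enough } n, \text{ since } n^{3\lambda} / n^{2\lambda} = n^{\lambda} \to \infty.
\end{aligned}
```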
We now proceed to prove Lemma 5.2, which finishes our proof.

Proof of Lemma 5.2. It will be evident from our proof that large enough constants $C_P$, $C_Q$, and $C_U$ can be chosen so that the argument works. Furthermore, from the proof given below it will be clear that every sub-circuit of $Q'_j$ can be uniformly constructed in deterministic time that is polynomial in the size of the sub-circuit. From this, and using that it is easy to describe how these sub-circuits are connected, we get that the sequence $\{Q'_j\}_{j \ge \kappa}$ of quantum circuits is uniform.
Let $j \in \{\kappa, \ldots, n\}$ be fixed, and assume that $Q'_j$ is given as input a string $\mathrm{code}(P_{j-1})$ representing a quantum circuit $P_{j-1}$ of size $\le s_P(j-1)$ that computes $f_{j-1}$. First, $Q'_j$ invokes the deterministic circuit $C^{DR}_j$ from Lemma 4.30 on $\mathrm{code}(P_{j-1})$, obtaining from it a string $\mathrm{code}(F_j)$ describing a quantum circuit $F_j$ of size $\mathrm{poly}(s_P(j-1))$ that computes $f_j$. Note that $F_j$ might contain more than $s_P(j)$ gates, so it cannot be used as the output circuit $P_j$. However, we show next that we can use $F_j$ to uniformly construct a circuit $P_j$ of size at most $s_P(j)$ that computes $f_j$. This takes four steps:

1. From a large circuit for $f_j$ to a smaller approximate circuit for $h_j$ (with non-trivial probability). Circuit $Q'_j$ takes the code of $F_j$ and produces from it a string $\mathrm{code}(H_j)$ describing a quantum circuit $H_j$ that computes the function $h_j$ defined above. Note that the definition of $H_j$ from $F_j$ is elementary, and that $\mathrm{size}(H_j) = \mathrm{poly}(\mathrm{size}(F_j)) = \mathrm{poly}(s_P(j-1))$. Recall that $h_j$ is defined over $n_2(j) = \mathrm{poly}(j)$ input bits, and that the corresponding function $\mathrm{NW}^{h_j}$ from above produces $m(j) = \lfloor 2^{j^\lambda} \rfloor$ output bits. We now invoke Lemma 4.4 with the parameters associated with the Nisan–Wigderson generator for index value $j$: the function $h_j$ and its corresponding quantum circuit $H_j$, stretch $m(j) = \lfloor 2^{j^\lambda} \rfloor$, and the distinguisher circuit $D_{m(j)}$ of size $s(j) = 2^{j^{2\lambda}}$ and advantage $\gamma(j) = 1/m(j)$. From this lemma and our choice of parameters, it follows that $Q'_j$ has access to a quantum circuit $C^{NW}$ of size $\mathrm{poly}(m(j), s_P(j-1), s(j)) = \mathrm{poly}(s_P(j-1)) = 2^{O(j^{2\lambda})}$ such that, when given $\mathrm{code}(H_j)$, $\mathrm{code}(D_{m(j)})$, the input length of $h_j$, and the stretch value $m(j)$, it outputs, with probability at least $\Omega(\gamma(j)/m(j)^2) = \Omega(1/m(j)^3) = \Omega(2^{-3 \cdot j^\lambda})$, the encoding $\mathrm{code}(A_j)$ of a quantum circuit $A_j$ of size $O(m(j)^2 \cdot s(j)) = O(2^{1.01 \cdot j^{2\lambda}})$ such that
$$\Pr_{x, r, A_j}[A_j(x, r) = h_j(x)] \ge \frac{1}{2} + \frac{\gamma(j)}{2m(j)} = \frac{1}{2} + \frac{1}{2m(j)^2} \ge \frac{1}{2} + 2^{-3 \cdot j^\lambda}. \tag{59}$$
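A quick check (ours) of the last inequality in Eq. (59): since $m(j) \le 2^{j^\lambda}$, we get $2m(j)^2 \le 2^{2j^\lambda + 1} \le 2^{3j^\lambda}$ whenever $j^\lambda \ge 1$. The snippet evaluates the gap $3j^\lambda - \log_2(2m(j)^2)$ for sample values of $j$ and hypothetical values of $\lambda \in (0, 1/5)$; a non-negative gap means the inequality holds.

```python
import math

def eq59_gap(j: int, lam: float) -> float:
    """3*j**lam - log2(2*m(j)**2) for m(j) = floor(2**(j**lam)).
    The last step of Eq. (59), 1/(2 m(j)^2) >= 2**(-3 j**lam), holds iff gap >= 0."""
    m = math.floor(2 ** (j ** lam))
    return 3 * j ** lam - math.log2(2 * m * m)

if __name__ == "__main__":
    for j in (1, 10, 1000, 10 ** 6):
        print(j, eq59_gap(j, 0.19))
```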
Crucially, note that the size of $A_j$ does not depend on the sizes of $F_j$ and $P_{j-1}$. In its next steps, $Q'_j$ tries to compute from $A_j$ a quantum circuit for $f_j$ of size $\le 2^{C_P \cdot j^{2\lambda}}$, while keeping its total number of gates $\le 2^{C_Q \cdot j^{2\lambda}}$ (we will boost the success probability of $Q'_j$ later in the proof). Note that the constant $C_Q$ might depend on $C_P$ once we fix a large enough $C_P$, and indeed this is needed for this plan to work.

2. From a circuit approximating $h_j$ to a non-trivial quantum circuit for $g_j$ (with probability 1). Given the string $\mathrm{code}(A_j)$ produced by $C^{NW}$ in the step above, $Q'_j$ proceeds as follows. It instantiates the corresponding deterministic circuit $C^{GL}$ from Lemma 4.5, which computes from $\mathrm{code}(A_j)$ a string $\mathrm{code}(B_j)$ describing a quantum circuit $B_j$ with $\mathrm{size}(B_j) = \mathrm{poly}(j, k(j)) \cdot \mathrm{size}(A_j) = O(2^{1.02 \cdot j^{2\lambda}})$. If $A_j$ satisfies Equation (59), it follows from Lemma 4.5 that
$$\Pr_{x, B_j}[B_j(x) = g_j(x)] \ge \frac{(2^{-3 \cdot j^\lambda})^3}{2} \ge 2^{-10 \cdot j^\lambda}. \tag{60}$$
Moreover, note that $\mathrm{size}(C^{GL}) = \mathrm{poly}(j, k(j), \mathrm{size}(A_j)) = 2^{O(j^{2\lambda})}$.
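Lemma 4.5 is a quantum version of the Goldreich–Levin theorem [GL89, AC02], whose details we do not reproduce. For intuition only, here is the classical warm-up case, a standard textbook argument swapped in for illustration: if a predictor agrees with $\langle s, r\rangle \bmod 2$ on more than a $3/4$ fraction of $r$, then each bit $s_i$ can be recovered as a majority vote of the predictions at $r$ and $r \oplus e_i$. In Step 2, $s$ plays the role of $g_j(x)$. The low-advantage regime actually needed there (advantage $2^{-3 j^\lambda}$, success probability roughly the cube of the advantage, as in Eq. (60)) requires the full list-decoding argument and is not covered by this sketch; all names are hypothetical.

```python
import random

def inner(s: int, r: int) -> int:
    """<s, r> mod 2 for bit vectors encoded as integers."""
    return bin(s & r).count("1") % 2

def gl_warmup_decode(predict, k: int, trials: int, rng: random.Random) -> int:
    """Classical warm-up of Goldreich-Levin: recover s in {0,1}^k from a
    predictor with Pr_r[predict(r) = <s, r>] > 3/4. The vote
    predict(r) ^ predict(r ^ e_i) equals <s, e_i> = s_i whenever both
    (individually uniform) queries are answered correctly."""
    s = 0
    for i in range(k):
        e_i = 1 << i
        votes = 0
        for _ in range(trials):
            r = rng.getrandbits(k)
            votes += predict(r) ^ predict(r ^ e_i)
        if 2 * votes > trials:
            s |= e_i
    return s
```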

3. From a non-trivial circuit for $g_j$ to an excellent circuit for $f_j$ (with non-trivial probability). Given the string $\mathrm{code}(B_j)$ produced by $C^{GL}$ in the step above, and assuming for now that $B_j$ satisfies Equation (60), $Q'_j$ proceeds as follows.¹⁴ Let $\varepsilon'(j) := 2^{-10 \cdot j^\lambda}$ and $\delta(j) := j^{-2b^\star - a^\star}$, and consider $k(j)$ and the size bound for $B_j$ established above. For this choice of parameters as a function of $j$, it is not hard to check that Equation 48 in Theorem 4.28 holds. Indeed,
$$k(j) = 2j^{2b^\star + a^\star + d^\star + 2}, \quad \text{while} \quad \frac{1}{\delta(j)} \cdot \log\frac{1}{\delta(j)} + \log\frac{1}{\varepsilon'(j)} \ll j^{2b^\star + a^\star + \lambda + 1} \ll j^{2b^\star + a^\star + 2}.$$
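A numeric sanity check (ours) of the comparison above, for the hypothetical values $a^\star = b^\star = d^\star = 1$ and $\lambda = 0.19$ (the actual constants come from Theorem 2.16 and Definition 2.15), with logarithms taken base 2 as an assumption. It compares $k(j)$ with $(1/\delta)\log(1/\delta) + \log(1/\varepsilon')$; the precise form of Equation 48 is stated in Theorem 4.28 and is not reproduced here.

```python
import math

def eq48_terms(j: int, a: int = 1, b: int = 1, d: int = 1, lam: float = 0.19):
    """Return (k(j), (1/delta) * log2(1/delta) + log2(1/eps')) for
    k(j) = 2 j^(2b+a+d+2), delta = j^-(2b+a), eps' = 2^(-10 j^lam).
    The constants a, b, d (standing in for a*, b*, d*) and lam are hypothetical."""
    k = 2 * j ** (2 * b + a + d + 2)
    inv_delta = j ** (2 * b + a)
    log_inv_eps = 10 * j ** lam
    return k, inv_delta * math.log2(inv_delta) + log_inv_eps

if __name__ == "__main__":
    for j in (10, 100, 1000):
        k, rest = eq48_terms(j)
        print(j, k, round(rest), round(k / rest))
```

The ratio grows roughly like $j^{d^\star + 2}/\log j$, in line with the asymptotic claim.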

Let $C^{IJKW}$ be the quantum circuit provided by Theorem 4.28 for our choice of parameters. We rewrite Equation (60) more explicitly as follows:
$$\Pr_{x_1, \ldots, x_k \sim (\{0,1\}^j)^k,\, B_j}\big[B_j(x_1, \ldots, x_k) = (f_j(x_1), \ldots, f_j(x_k))\big] \ge 2^{-10 \cdot j^\lambda}, \tag{61}$$
where by assumption $0 < \lambda < 1/5$. Note that there is a mismatch between the expression above and the assumption in Equation 49, because there we sample a $k$-tuple of $j$-bit strings according to $S_{j,k}$,¹⁵ i.e., there is no repetition of strings, and we assume a canonical order of the tuple when using it as an input string of length $k \cdot j$ bits. To remedy this situation, $Q'_j$ will not invoke $C^{IJKW}$ directly on $\mathrm{code}(B_j)$, as we explain next.
Define the quantum circuit $A'_j$ that attempts to compute $g_j$ on $S_{j,k}$ based on $B_j$ as follows:
¹⁴ To be more formal, $Q'_j$ proceeds as we describe independently of whether $B_j$ satisfies Equation (60), but we keep this assumption for simplicity of exposition.
¹⁵ Recall from Definition 4.6 that $S_{j,k} = \{S \subseteq \{0,1\}^j : |S| = k\}$.

1. On an input $\vec{x} \in S_{j,k}$:

2. Sample a random permutation $\pi : [k] \to [k]$.

3. Permute the $k$ strings in $\vec{x}$ according to $\pi$, and let $\vec{y} := \pi(\vec{x})$ be the corresponding $kj$-bit string.

4. Output $B_j(\vec{y})$.
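The wrapper $A'_j$ is easy to express in code. The sketch below is ours: `B` stands for any (possibly randomized) callable on a $k$-tuple of strings, the set-valued input is passed as a `frozenset` (whose iteration order is irrelevant), and, as our own convention, the answer is returned keyed by string so that it can be compared with $g_j$ regardless of the random order $\pi$ that was used.

```python
import random

def make_A_prime(B, rng: random.Random):
    """A'_j from the text: on a set of k distinct strings, feed B the strings
    in a uniformly random order (the permutation pi) and return B's answer,
    keyed by string so that it no longer depends on the chosen order."""
    def A_prime(strings: frozenset) -> dict:
        ys = sorted(strings)          # canonical order ...
        rng.shuffle(ys)               # ... then a uniformly random permutation
        return dict(zip(ys, B(tuple(ys))))
    return A_prime
```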
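Claim 5.3. The following holds:
$$\Pr_{\vec{x} \sim S_{j,k},\, A'_j}\big[A'_j(\vec{x}) = g_j(\vec{x})\big] \ge \frac{2^{-10 \cdot j^\lambda}}{2}.$$

Proof of Claim 5.3. Indeed, using the definition of $A'_j$ and recalling the definition of $g_j$,
$$\begin{aligned}
\Pr_{\vec{x} \sim S_{j,k},\, A'_j}\big[A'_j(\vec{x}) = g_j(\vec{x})\big]
&= \Pr_{\vec{x} \sim S_{j,k},\, \vec{y} = \pi(\vec{x}),\, B_j}\big[B_j(\vec{y}) = g_j(\vec{y})\big] \\
&= \Pr_{\vec{x} \sim \{0,1\}^{jk},\, B_j}\big[B_j(\vec{x}) = g_j(\vec{x}) \mid \text{strings in } \vec{x} \text{ are distinct}\big] \\
&\ge \Pr_{\vec{x} \sim \{0,1\}^{jk},\, B_j}\big[B_j(\vec{x}) = g_j(\vec{x}) \wedge \text{strings in } \vec{x} \text{ are distinct}\big] \\
&\ge \Pr_{\vec{x} \sim \{0,1\}^{jk},\, B_j}\big[B_j(\vec{x}) = g_j(\vec{x})\big] - \Pr_{\vec{x} \sim \{0,1\}^{jk}}\big[\exists\, i_1 \neq i_2 \text{ s.t. } x_{i_1} = x_{i_2}\big] \qquad (\Pr[E_1 \wedge E_2] \ge \Pr[E_1] - \Pr[\neg E_2]) \\
&\ge 2^{-10 \cdot j^\lambda} - k^2 \cdot 2^{-j} \ge \frac{2^{-10 \cdot j^\lambda}}{2},
\end{aligned}$$
where the last inequality uses that $\lambda < 1$ and $k = k(j) = \mathrm{poly}(j)$.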
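The term $k^2 \cdot 2^{-j}$ in the last step is a birthday-style union bound over pairs. The exact probability that $k$ independent uniform $j$-bit strings are not all distinct is $1 - \prod_{i<k}(1 - i/2^j)$; the sketch below (ours, with hypothetical function names) compares it with the union bound $\binom{k}{2} 2^{-j} \le k^2 \cdot 2^{-j}$ using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

def collision_prob(k: int, j: int) -> Fraction:
    """Exact Pr[two of k independent uniform j-bit strings coincide]."""
    N = 2 ** j
    p_distinct = Fraction(1)
    for i in range(k):
        p_distinct *= Fraction(max(N - i, 0), N)
    return 1 - p_distinct

def union_bound(k: int, j: int) -> Fraction:
    """Union bound over the C(k, 2) pairs: C(k, 2) * 2^(-j) <= k^2 * 2^(-j)."""
    return Fraction(comb(k, 2), 2 ** j)
```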

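Circuit $Q'_j$ constructs $\mathrm{code}(A'_j)$ from $\mathrm{code}(B_j)$, then invokes $C^{IJKW}$ on $\mathrm{code}(A'_j)$. Assuming Equation (61) holds, $C^{IJKW}$ outputs with probability $\Omega(\varepsilon'(j)^2) = \Omega(2^{-20 \cdot j^\lambda})$ a string $\mathrm{code}(B'_j)$ describing a quantum circuit $B'_j$ of size
$$\mathrm{size}(B'_j) = \mathrm{poly}(j, k(j), \mathrm{size}(B_j), \log(1/\delta(j)), 1/\varepsilon'(j)) = \mathrm{poly}(\mathrm{size}(B_j)) = 2^{O(j^{2\lambda})} \tag{62}$$
such that
$$\Pr_{x \sim \{0,1\}^j,\, B'_j}\big[B'_j(x) = f_j(x)\big] \ge 1 - \delta(j) = 1 - j^{-2b^\star - a^\star}.$$
Note that $\mathrm{size}(C^{IJKW}) = 2^{O(j^{2\lambda})}$.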

Summary of Steps 1–3. By composing Steps 1–3 described above, we get that for every sufficiently large constant $C_1$ (such that Equation (62) holds) there is a constant $C_2 > C_1$ and a quantum circuit $Q'_j$ of size $2^{C_2 \cdot j^{2\lambda}}$ that, when given a description $\mathrm{code}(P_{j-1})$ of a quantum circuit $P_{j-1}$ of size $2^{C_1 \cdot (j-1)^{2\lambda}}$ that computes $f_{j-1}$, outputs with probability $\zeta(j) = \Omega(2^{-23 \cdot j^\lambda})$ the description of a quantum circuit $\widetilde{P}_j$ of size $\le 2^{C_1 \cdot j^{2\lambda}}$ such that
$$\Pr_{x \sim \{0,1\}^j,\, \widetilde{P}_j}\big[\widetilde{P}_j(x) = f_j(x)\big] \ge 1 - j^{-2b^\star - a^\star}.$$
(Crucially, the size of $\widetilde{P}_j$ depends neither on the exponent $C_2$ nor on the size of $P_{j-1}$, thanks to the results from Section 4 and the existence of distinguisher circuits $D_{m(j)}$ for every $j \ge \kappa$.) In order to complete the proof of Lemma 5.2, it remains to (1) amplify the success probability that $Q'_j$ generates an almost-correct circuit $\widetilde{P}_j$; and (2) convert an almost-correct circuit $\widetilde{P}_j$ into a quantum circuit $P_j$ that computes $f_j$ on each input with probability at least $2/3$. These two goals are achieved next.

4. Amplifying the success probability of $Q'_j(\mathrm{code}(P_{j-1}))$ and generating a correct circuit $P_j$ for $f_j$. Our final version of $Q'_j$ works as follows. This circuit takes its (classical) input $\mathrm{code}(P_{j-1})$ and runs Steps 1–3 a total of $t(j) := \mathrm{poly}(j, 1/\zeta(j)) = 2^{O(j^\lambda)}$ times, obtaining from this a collection $\widehat{P}_1, \ldots, \widehat{P}_{t(j)}$ of candidate quantum circuits such that
$$\text{with probability} \ge 1 - \frac{1}{500j^2}, \text{ there is } i \in [t(j)] \text{ s.t. } \Pr_{x \sim \{0,1\}^j,\, \widehat{P}_i}\big[\widehat{P}_i(x) = f_j(x)\big] \ge 1 - j^{-2b^\star - a^\star}.$$
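Why $t(j) = \mathrm{poly}(j, 1/\zeta(j))$ repetitions suffice: if each run independently succeeds with probability at least $\zeta$, then all $t$ runs fail with probability at most $(1 - \zeta)^t \le e^{-\zeta t}$, so $t = \lceil \ln(500j^2)/\zeta \rceil$ runs push the failure probability below $1/(500j^2)$. The sketch (ours; the function name is hypothetical) computes this $t$; with $\zeta(j) = \Omega(2^{-23 \cdot j^\lambda})$ it is $2^{O(j^\lambda)}$, matching the text.

```python
import math

def repetitions_needed(zeta: float, j: int) -> int:
    """t = ceil(ln(500 j^2) / zeta). If each run independently succeeds with
    probability >= zeta, all t runs fail with probability
    (1 - zeta)^t <= exp(-zeta * t) <= 1 / (500 j^2)."""
    return math.ceil(math.log(500 * j * j) / zeta)

if __name__ == "__main__":
    for j in (10, 100):
        zeta = 2.0 ** (-23 * j ** 0.19)        # hypothetical lambda = 0.19
        print(j, repetitions_needed(zeta, j))
```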

Now $Q'_j$ instantiates the corresponding circuit $C^{SR}$ from Theorem 4.31 on inputs $1^j$, $(\mathrm{code}(\widehat{P}_i))_{i \in [t(j)]}$, and $\mathrm{code}(P_{j-1})$ in order to output, with high probability, a (single) circuit $P_j$ that computes $f_j$. Note that despite the blowup in the sizes of $Q'_j$ and $P_j$ due to the amplification, our size requirements for them are maintained. In particular, Theorem 4.31 implies that the size of the output circuit $P_j$ does not depend on $\mathrm{size}(P_{j-1})$.

This finishes the construction of circuits Q′j satisfying the conditions of Lemma 5.2, which
completes the proof of Theorem 5.1.

References
[AB09] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
[ABG06] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Machine learning in a quantum world. In Conference of the Canadian Society for Computational Studies of Intelligence, pages 431–442. Springer, 2006.
[AC02] Mark Adcock and Richard Cleve. A quantum Goldreich-Levin theorem with cryptographic applications. In Symposium on Theoretical Aspects of Computer Science (STACS), pages 323–334. Springer, 2002.
[ACL+19] Srinivasan Arunachalam, Sourav Chakraborty, Troy Lee, Manaswi Paraashar, and Ronald de Wolf. Two new results about quantum exact learning. In International Colloquium on Automata, Languages, and Programming (ICALP), volume 132, pages 16:1–16:15, 2019.
[AGS21] Srinivasan Arunachalam, Alex Bredariol Grilo, and Aarthi Sundaram. Quantum hardness of learning shallow classical circuits. SIAM J. Comput., 50(3):972–1013, 2021.
[Aha03] Dorit Aharonov. A simple proof that Toffoli and Hadamard are quantum universal. CoRR, abs/0301040, 2003.
[AS05] Alp Atıcı and Rocco A. Servedio. Improved bounds on quantum learning algorithms. Quantum Information Processing, 4(5):355–386, 2005.

[AS07] Alp Atıcı and Rocco A. Servedio. Quantum algorithms for learning and testing juntas. Quantum Information Processing, 6(5):323–348, 2007.
[AS16] Noga Alon and Joel H. Spencer. The Probabilistic Method. John Wiley & Sons, 2016.
[AW17] Srinivasan Arunachalam and Ronald de Wolf. Guest column: A survey of quantum learning theory. ACM SIGACT News, 48(2):41–67, 2017.
[AW18] Srinivasan Arunachalam and Ronald de Wolf. Optimal quantum sample complexity of learning algorithms. The Journal of Machine Learning Research, 19(1):2879–2878, 2018.
[BFGH10] Debajyoti Bera, Stephen Fenner, Frederic Green, and Steven Homer. Efficient universal quantum circuits. Quantum Information & Computation, 10(1):16–28, 2010.
[BHLR19] Abhishek Bhrushundi, Kaave Hosseini, Shachar Lovett, and Sankeerth Rao. Torus polynomials: An algebraic approach to ACC lower bounds. In Innovations in Theoretical Computer Science Conference (ITCS), pages 13:1–13:16, 2019.
[BJ98] Nader H. Bshouty and Jeffrey C. Jackson. Learning DNF over the uniform distribution using a quantum example oracle. SIAM Journal on Computing, 28(3):1136–1153, 1998.
[BT94] Richard Beigel and Jun Tarui. On ACC. Comput. Complex., 4:350–366, 1994.
[CIKK16] Marco L. Carmosino, Russell Impagliazzo, Valentine Kabanets, and Antonina Kolokolova. Learning algorithms from natural proofs. In Conference on Computational Complexity (CCC), pages 10:1–10:24, 2016.
[CLW20] Lijie Chen, Xin Lyu, and Ryan Williams. Almost-everywhere circuit lower bounds from non-trivial derandomization. In Symposium on Foundations of Computer Science (FOCS), 2020.
[COS18] Ruiwen Chen, Igor C. Oliveira, and Rahul Santhanam. An average-case lower bound against ACC⁰. In Latin American Symposium on Theoretical Informatics (LATIN), pages 317–330, 2018.
[CR20] Lijie Chen and Hanlin Ren. Strong average-case lower bounds from non-trivial derandomization. In Symposium on Theory of Computing (STOC), pages 1327–1334, 2020.
[CRTY20] Lijie Chen, Ron Rothblum, Roei Tell, and Eylon Yogev. On exponential-time hypotheses, derandomization, and circuit lower bounds. In Symposium on Foundations of Computer Science (FOCS), 2020.
[FK09] Lance Fortnow and Adam R. Klivans. Efficient learning algorithms yield circuit lower bounds. J. Comput. Syst. Sci., 75(1):27–36, 2009.
[GKZ19] Alex B. Grilo, Iordanis Kerenidis, and Timo Zijlstra. Learning with Errors is easy with quantum samples. Physical Review A, 99:032314, 2019. arXiv: 1702.08255.
[GL89] Oded Goldreich and Leonid A. Levin. A hard-core predicate for all one-way functions. In Symposium on Theory of Computing (STOC), pages 25–32, 1989.

[Gro96] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Symposium on Theory of Computing (STOC), pages 212–219, 1996.
[HH13] Ryan C. Harkins and John M. Hitchcock. Exact learning algorithms, betting games, and circuit lower bounds. Transactions on Computation Theory (TOCT), 5(4):18, 2013.
[Hoe63] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., pages 13–30, 1963.
[IJKW10] Russell Impagliazzo, Ragesh Jaiswal, Valentine Kabanets, and Avi Wigderson. Uniform direct product theorems: simplified, optimized, and derandomized. SIAM Journal on Computing, 39(4):1637–1665, 2010.
[Imp95] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In Symposium on Foundations of Computer Science (FOCS), pages 538–545, 1995.
[IW01] Russell Impagliazzo and Avi Wigderson. Randomness vs time: Derandomization under a uniform assumption. J. Comput. Syst. Sci., 63(4):672–688, 2001.
[JLR11] Svante Janson, Tomasz Luczak, and Andrzej Rucinski. Random Graphs. John Wiley & Sons, 2011.
[Juk12] Stasys Jukna. Boolean Function Complexity - Advances and Frontiers. Springer, 2012.
[Kha93] Michael Kharitonov. Cryptographic hardness of distribution-specific learning. In Symposium on Theory of Computing (STOC), pages 372–381, 1993.
[KKO13] Adam Klivans, Pravesh Kothari, and Igor C. Oliveira. Constructing hard functions using learning algorithms. In Conference on Computational Complexity (CCC), pages 86–97, 2013.
[KS94] Michael J. Kearns and Robert E. Schapire. Efficient distribution-free learning of probabilistic concepts. J. Comput. Syst. Sci., 48(3):464–497, 1994.
[KV94] M. Kearns and U. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
[LMN93] Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. J. ACM, 40(3):607–620, 1993.
[MW18] Cody Murray and R. Ryan Williams. Circuit lower bounds for nondeterministic quasi-polytime: an easy witness lemma for NP and NQP. In Symposium on Theory of Computing (STOC), pages 890–901, 2018.
[NC10] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, 2010.
[NR99] Moni Naor and Omer Reingold. Synthesizers and their application to the parallel construction of pseudo-random functions. Journal of Computer and System Sciences, 58(2):336–375, 1999.

[NW94] Noam Nisan and Avi Wigderson. Hardness vs randomness. Journal of Computer and System Sciences, 49(2):149–167, 1994.
[O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
[Oli13] Igor C. Oliveira. Algorithms versus circuit lower bounds. Electronic Colloquium on Computational Complexity, 20:117, 2013.
[Oli19] Igor C. Oliveira. Randomness and intractability in Kolmogorov complexity. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 32:1–32:14, 2019.
[OS17] Igor C. Oliveira and Rahul Santhanam. Conspiracies between learning algorithms, circuit lower bounds, and pseudorandomness. In Computational Complexity Conference (CCC), pages 18:1–18:49, 2017.
[OS18] Igor C. Oliveira and Rahul Santhanam. Pseudo-derandomizing learning and approximation. In International Workshop on Randomization and Computation (RANDOM), pages 55:1–55:19, 2018.
[OW16] Ryan O’Donnell and John Wright. Efficient quantum tomography. In Symposium on Theory of Computing (STOC), pages 899–912, 2016.
[OW17] Ryan O’Donnell and John Wright. Efficient quantum tomography II. In Symposium on Theory of Computing (STOC), pages 962–974, 2017.
[RR97] Alexander A. Razborov and Steven Rudich. Natural proofs. J. Comput. Syst. Sci., 55(1):24–35, 1997.
[San09] Rahul Santhanam. Circuit lower bounds for Merlin–Arthur classes. SIAM J. Comput., 39(3):1038–1061, 2009.
[SG04] Rocco A. Servedio and Steven J. Gortler. Equivalences and separations between quantum and classical learnability. SIAM Journal on Computing, 33(5):1067–1092, 2004.
[Sho94] Peter W. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In Symposium on Foundations of Computer Science (FOCS), pages 124–134, 1994.
[ST17] Rocco A. Servedio and Li-Yang Tan. What circuit classes can be learned with non-trivial savings? In Innovations in Theoretical Computer Science Conference (ITCS), pages 30:1–30:21, 2017.
[TV07] Luca Trevisan and Salil P. Vadhan. Pseudorandomness and average-case complexity via uniform reductions. Computational Complexity, 16(4):331–364, 2007.
[Vol14] Ilya Volkovich. On learning, lower bounds and (un)keeping promises. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 1027–1038, 2014.
[Vol16] Ilya Volkovich. A guide to learning arithmetic circuits. In Conference on Learning Theory (COLT), pages 1540–1561, 2016.

[Wil13] Ryan Williams. Improving exhaustive search implies superpolynomial lower bounds. SIAM J. Comput., 42(3):1218–1244, 2013.
[Wil14] Ryan Williams. Nonuniform ACC circuit lower bounds. J. ACM, 61(1):2:1–2:32, 2014.
[Wil18] R. Ryan Williams. New algorithms and lower bounds for circuits with linear threshold gates. Theory Comput., 14(1):1–25, 2018.
[Yam92] Kenji Yamanishi. A learning criterion for stochastic rules. Mach. Learn., 9:165–203, 1992.

A On trivial quantum learning algorithms


In this section, we explain in more detail why there are two quantum learners, with different parameters, that are “trivial”, in the sense that they do not really exploit the structure of a concept class $\mathcal{C}$. First observe that, even classically, there is always a “brute-force” learner that works for all possible functions: query the input function $f : \{0,1\}^n \to \{0,1\}$ on all inputs, then store the outcomes in a lookup table, which is used as the exponentially large output hypothesis.
The learner above also works in the quantum setting. However, notice that quantumly there
exists a second “trivial” learner coming from Fourier sampling. Before we describe this learner, we
briefly discuss the basics of Fourier analysis on the Boolean cube (and refer the reader to [O’D14]
for more details).
Given the space of functions $f : \{0,1\}^n \to \mathbb{R}$, define the inner product between two functions in this space as $\langle f, g \rangle = \mathbb{E}_x[f(x) \cdot g(x)]$, where the expectation is over a uniformly random $x \in \{0,1\}^n$. In this space, one can define a set of orthonormal basis functions as follows: for $S \in \{0,1\}^n$, define $\chi_S(x) = (-1)^{S \cdot x}$, where $S \cdot x = \sum_i S_i x_i$. Hence, every function $f : \{0,1\}^n \to \mathbb{R}$ can be written uniquely as
$$f(x) = \sum_{S} \widehat{f}(S) \chi_S(x),$$
where $\widehat{f}(S) = \mathbb{E}_x[f(x) \cdot \chi_S(x)]$ is called a Fourier coefficient of $f$. Moreover, by Parseval's identity, for every Boolean function $f : \{0,1\}^n \to \{-1,1\}$,
$$\sum_{S} \widehat{f}(S)^2 = \mathbb{E}_x[f(x)^2] = 1,$$
hence the squared Fourier coefficients $\{\widehat{f}(S)^2\}_S$ of a Boolean function $f$ form a probability distribution.
It is well known that in the quantum learning model, given one uniform quantum example $\frac{1}{\sqrt{2^n}} \sum_x |x, f(x)\rangle$ for $f : \{0,1\}^n \to \{-1,1\}$ (where $f(x) = (-1)^b$ is stored as the single bit $b$), with probability $1/2$ we can Fourier sample, i.e., sample from the distribution $\{\widehat{f}(S)^2\}_S$. Indeed, given $\frac{1}{\sqrt{2^n}} \sum_x |x, f(x)\rangle$, apply the one-qubit Hadamard gate to the last register and measure the last qubit: with probability $1/2$ we get outcome $1$, in which case the resulting state is $\frac{1}{\sqrt{2^n}} \sum_x f(x) |x\rangle |1\rangle$. Then apply the $n$-qubit Hadamard transform to the first $n$ qubits to obtain the state $\sum_S \widehat{f}(S) |S\rangle |1\rangle$. Measuring this state produces a sample $S$ from the distribution $\{\widehat{f}(S)^2\}_S$. Let $\mathbf{S}$ be the random variable that outputs $S \subseteq [n]$ (identified with its indicator vector in $\{0,1\}^n$) with probability $\widehat{f}(S)^2$.
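The Fourier-sampling procedure can be simulated classically for small $n$. The sketch below (ours, pure Python, exponential in $n$) builds the amplitudes of the $|x, 1\rangle$ branch after the Hadamard on the last qubit, checks that outcome $1$ has probability $1/2$, and then applies $H^{\otimes n}$ via a fast Walsh–Hadamard transform; the resulting amplitudes are the Fourier coefficients $\widehat{f}(S)$, so their squares sum to 1. Here $f$ is given as a list of $\pm 1$ values indexed by $x$; the function names are hypothetical.

```python
import math

def walsh_hadamard(vec):
    """Apply H^{⊗n} to a length-2^n amplitude vector; returns a new list."""
    v = list(vec)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = (a + b) / math.sqrt(2), (a - b) / math.sqrt(2)
        h *= 2
    return v

def fourier_sampling(f):
    """Simulate Fourier sampling from one uniform example for f given as a
    list of ±1 values indexed by x in {0,1}^n, with f(x) = (-1)^b stored as bit b.
    Returns (Pr[last qubit measures 1], distribution over S given outcome 1)."""
    N = len(f)
    bits = [0 if fx == 1 else 1 for fx in f]
    # Hadamard on the last qubit maps |x, b> to (|x,0> + (-1)^b |x,1>)/sqrt(2);
    # keep the amplitudes of the |x, 1> branch.
    branch1 = [(-1) ** bits[x] / math.sqrt(2 * N) for x in range(N)]
    p1 = sum(a * a for a in branch1)
    post = [a / math.sqrt(p1) for a in branch1]   # = f(x) / sqrt(N)
    amps = walsh_hadamard(post)                    # = f_hat(S)
    return p1, [a * a for a in amps]
```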

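Claim A.1. For every $0 \le \gamma \le 1$ and $f : \{0,1\}^n \to \{-1,1\}$, we have
$$\Pr_{\mathbf{S}}\big[|\widehat{f}(\mathbf{S})| \ge \gamma \cdot 2^{-n/2}\big] \ge 1 - \gamma^2.$$

Proof. Let $\varepsilon = \gamma^2$. It is enough to show that $p := \Pr_{\mathbf{S}}[\widehat{f}(\mathbf{S})^2 \ge \varepsilon \cdot 2^{-n}] \ge 1 - \varepsilon$. Note that with probability $1 - p$ over the choice of $\mathbf{S}$, we have $\widehat{f}(\mathbf{S})^2 < \varepsilon \cdot 2^{-n}$. But then
$$\sum_{S \,:\, \widehat{f}(S)^2 < \varepsilon \cdot 2^{-n}} \widehat{f}(S)^2 = 1 - p.$$
Since there are at most $2^n$ such $S$, this in turn implies that
$$2^n \cdot (\varepsilon \cdot 2^{-n}) \ge \sum_{S \,:\, \widehat{f}(S)^2 < \varepsilon \cdot 2^{-n}} \varepsilon \cdot 2^{-n} \ge 1 - p,$$
i.e., $1 - p \le \varepsilon$, which completes the proof.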
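A brute-force check (ours) of Claim A.1 on small random functions: for $\varepsilon = \gamma^2$, the Fourier-sampling distribution puts mass at least $1 - \varepsilon$ on coefficients with $\widehat{f}(S)^2 \ge \varepsilon \cdot 2^{-n}$. The helper computes $\widehat{f}$ by direct summation; $n = 6$, the seed, and the $\varepsilon$ values are arbitrary choices.

```python
import random

def fourier_coefficients(f):
    """f: list of ±1 values indexed by x; returns [f_hat(S) for S = 0..2^n-1]."""
    N = len(f)
    return [sum(f[x] * (-1) ** bin(S & x).count("1") for x in range(N)) / N
            for S in range(N)]

def heavy_mass(f, eps: float) -> float:
    """Pr_{S ~ f_hat^2}[f_hat(S)^2 >= eps * 2^-n], where 2^n = len(f)."""
    N = len(f)
    return sum(c * c for c in fourier_coefficients(f) if c * c >= eps / N)

if __name__ == "__main__":
    rng = random.Random(0)
    f = [rng.choice((-1, 1)) for _ in range(64)]      # random f on n = 6 bits
    for gamma in (0.1, 0.3, 0.7):
        print(gamma, heavy_mass(f, gamma ** 2), ">=", 1 - gamma ** 2)
```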

Let $\mathcal{F}_n$ be the class of all Boolean functions $f : \{0,1\}^n \to \{0,1\}$. Claim A.1 implies that $\mathcal{F}_n$ admits a quantum learning algorithm $A$ under the uniform distribution with the following property: for every $f \in \mathcal{F}_n$, $A$ runs in polynomial time and outputs, with probability at least $0.249$, a Boolean circuit $C$ that computes $f$ with probability $\frac{1}{2} + \Omega(2^{-n/2})$. To see this, we use Fourier sampling (applied to the $\pm 1$-valued version of $f$) to obtain a set $S$, and then with probability $\frac{1}{2}$ we let $C = \chi_S$ and with probability $\frac{1}{2}$ we let $C = \neg\chi_S$. By Claim A.1 (with a small enough constant $\gamma$), we are guaranteed that with probability at least $0.999$ we pick some $S$ such that $\chi_S$ or $\neg\chi_S$ computes $f$ with probability $\frac{1}{2} + \Omega(2^{-n/2})$, and we pick the correct one with probability $\frac{1}{2}$. Since the Fourier sampling routine described above samples from the correct distribution with probability $1/2$, the overall success probability is at least $0.999 \cdot \frac{1}{2} \cdot \frac{1}{2} > 0.249$, and we are done.
