
Electronic Colloquium on Computational Complexity, Report No. 3 (2019)

NEAR-OPTIMAL LOWER BOUNDS ON THE THRESHOLD


DEGREE AND SIGN-RANK OF AC0

ALEXANDER A. SHERSTOV AND PEI WU

Abstract. The threshold degree of a Boolean function f : {0,1}^n → {0,1} is the minimum degree of a real polynomial p that represents f in sign: sgn p(x) = (−1)^{f(x)}. A related notion is sign-rank, defined for a Boolean matrix F = [F_{ij}] as the minimum rank of a real matrix M with sgn M_{ij} = (−1)^{F_{ij}}. Determining the maximum threshold degree and sign-rank achievable by constant-depth circuits (AC0) is a well-known and extensively studied open problem, with complexity-theoretic and algorithmic applications.
We give an essentially optimal solution to this problem. For any ε > 0, we construct an AC0 circuit in n variables that has threshold degree Ω(n^{1−ε}) and sign-rank exp(Ω(n^{1−ε})), improving on the previous best lower bounds of Ω(√n) and exp(Ω̃(√n)), respectively. Our results subsume all previous lower bounds on the threshold degree and sign-rank of AC0 circuits of any given depth, with a strict improvement starting at depth 4. As a corollary, we also obtain near-optimal bounds on the discrepancy, threshold weight, and threshold density of AC0, strictly subsuming previous work on these quantities. Our work gives some of the strongest lower bounds to date on the communication complexity of AC0.

∗ Computer Science Department, UCLA, Los Angeles, CA 90095. Supported by NSF grant CCF-1814947 and an Alfred P. Sloan Foundation Research Fellowship. Email: {sherstov,pwu}@cs.ucla.edu.

ISSN 1433-8092
Contents
1. Introduction 3
1.1. Threshold degree of AC0 4
1.2. Sign-rank of AC0 5
1.3. Communication complexity 7
1.4. Threshold weight and threshold density 8
1.5. Previous approaches 10
1.6. Our approach 12
2. Preliminaries 15
2.1. General 15
2.2. Boolean functions and circuits 16
2.3. Norms and products 17
2.4. Orthogonal content 18
2.5. Sign-representation 19
2.6. Symmetrization 21
2.7. Communication complexity 23
2.8. Discrepancy and sign-rank 23
3. Auxiliary results 25
3.1. Basic dual objects 25
3.2. Dominant components 27
3.3. Input transformation 30
4. The threshold degree of AC0 32
4.1. Shifting probability mass in product distributions 32
4.2. A bounded dual polynomial for MP 39
4.3. Hardness amplification for threshold degree and beyond 42
4.4. Threshold degree and discrepancy of AC0 48
4.5. Threshold degree of surjectivity 50
5. The sign-rank of AC0 52
5.1. A simple lower bound for depth 3 53
5.2. Local smoothness 56
5.3. Metric properties of locally smooth distributions 58
5.4. Weight transfer in locally smooth distributions 59
5.5. A locally smooth dual polynomial for MP 68
5.6. An amplification theorem for smooth threshold degree 75
5.7. The smooth threshold degree of AC0 83
5.8. The sign-rank of AC0 86
Acknowledgments 87
References 87
Appendix A. Sign-rank and smooth threshold degree 90
A.1. Fourier transform 90
A.2. Forster's bound 91
A.3. Spectral norm of pattern matrices 91
A.4. Proof of Theorem 2.18 92
Appendix B. A dual object for OR 93
1. Introduction
A real polynomial p is said to sign-represent the Boolean function f : {0,1}^n → {0,1} if sgn p(x) = (−1)^{f(x)} for every input x ∈ {0,1}^n. The threshold degree of f, denoted deg±(f), is the minimum degree of a multivariate real polynomial that sign-represents f. Equivalent terms in the literature include strong degree [4], voting polynomial degree [28], PTF degree [34], and sign degree [10]. Since any function f : {0,1}^n → {0,1} can be represented exactly by a real polynomial of degree at most n, the threshold degree of f is an integer between 0 and n. Viewed as a computational model, sign-representation is remarkably powerful because it corresponds to the strongest form of pointwise approximation. The formal study of threshold degree began in 1969 with the pioneering work of Minsky and Papert [32] on limitations of perceptrons. The authors of [32] famously proved that the parity function on n variables has the maximum possible threshold degree, n. They obtained lower bounds on the threshold degree of several other functions, including DNF formulas and intersections of halfspaces. Since then, sign-representing polynomials have found applications far beyond artificial intelligence. In theoretical computer science, applications of threshold degree include circuit lower bounds [28, 29, 42, 19, 7], size-depth trade-offs [37, 55], communication complexity [42, 19, 44, 39, 7, 53, 51], structural complexity [4, 9], and computational learning [26, 25, 35, 3, 47, 49, 13, 50, 56].
The notion of threshold degree has been especially influential in the study of AC0 ,
the class of constant-depth polynomial-size circuits with ∧, ∨, ¬ gates of unbounded
fan-in. The first such result was obtained by Aspnes et al. [4], who used sign-
representing polynomials to give a beautiful new proof of classic lower bounds
for AC0 . In communication complexity, the notion of threshold degree played a
central role in the first construction [42, 44] of an AC0 circuit with exponentially
small discrepancy and hence large communication complexity in nearly every model.
That discrepancy result was used in [42] to show the optimality of Allender’s classic
simulation of AC0 by majority circuits, solving the open problem [28] on the relation
between the two circuit classes. Subsequent work [19, 7, 53, 51] resolved other
questions in communication complexity and circuit complexity related to constant-
depth circuits by generalizing the threshold degree method of [42, 44].
Sign-representing polynomials also paved the way for algorithmic breakthroughs in the study of constant-depth circuits. Specifically, any function of threshold degree d can be viewed as a halfspace in (n choose 0) + (n choose 1) + · · · + (n choose d) dimensions, corresponding to the monomials in a sign-representation of f. As a result, a class of functions of threshold degree at most d can be learned in the standard PAC model under arbitrary distributions in time polynomial in (n choose 0) + (n choose 1) + · · · + (n choose d). Klivans and Servedio [26] used this threshold degree approach to give what is currently the fastest algorithm for learning polynomial-size DNF formulas, with running time exp(Õ(n^{1/3})). Another learning-theoretic breakthrough based on threshold degree is the fastest algorithm for learning Boolean formulas, obtained by O'Donnell and Servedio [35] for formulas of constant depth and by Ambainis et al. [3] for arbitrary depth. Their algorithm runs in time exp(Õ(n^{(2^{k−1}−1)/(2^k−1)})) for formulas of size n and constant depth k, and in time exp(Õ(√n)) for formulas of unbounded depth. In both cases, the bound on the running time follows from the corresponding upper bound on the threshold degree.
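The reduction from threshold degree to halfspace learning is just a feature expansion: map each input to the vector of its monomials of degree at most d. A minimal sketch of ours (illustrative, not the learning algorithm itself):

```python
from itertools import combinations
from math import comb, prod

def monomial_features(x, d):
    # one coordinate per monomial prod_{i in S} x_i with |S| <= d
    n = len(x)
    return [prod(x[i] for i in S)
            for k in range(d + 1)
            for S in combinations(range(n), k)]

x = (1, 0, 1, 1, 0)
feats = monomial_features(x, d=2)
# dimension is C(n,0) + C(n,1) + ... + C(n,d)
assert len(feats) == sum(comb(len(x), i) for i in range(3))
```

Any halfspace learner run over these features then learns the class of functions of threshold degree at most d.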
A far-reaching generalization of threshold degree is the matrix-analytic notion of sign-rank, which allows sign-representation out of arbitrary low-dimensional subspaces rather than the subspace of low-degree polynomials. The contribution of this paper is to prove essentially optimal lower bounds on the threshold degree and sign-rank of AC0, which in turn imply lower bounds on other fundamental complexity measures of interest in communication complexity and learning theory. In the remainder of this section, we give a detailed overview of the previous work, present our main results, and discuss our proofs.

1.1. Threshold degree of AC0. Determining the maximum threshold degree of an AC0 circuit in n variables is a longstanding open problem in the area. It is
motivated by algorithmic and complexity-theoretic applications [26, 35, 27, 39, 13],
in addition to being a natural question in its own right. Table 1 gives a quantitative
summary of the results obtained to date. In their seminal monograph, Minsky and
Papert [32] proved a lower bound of Ω(n^{1/3}) on the threshold degree of the following DNF formula in n variables:

    f(x) = ⋀_{i=1}^{n^{1/3}} ⋁_{j=1}^{n^{2/3}} x_{i,j}.

Three decades later, Klivans and Servedio [26] obtained an O(n^{1/3} log n) upper bound on the threshold degree of any polynomial-size DNF formula in n variables, essentially matching Minsky and Papert's result and resolving the problem for depth 2. Determining the threshold degree of circuits of depth k ⩾ 3 proved to be challenging. The only upper bound known to date is the trivial O(n), which follows directly from the definition of threshold degree. In particular, it is consistent with our knowledge that there are AC0 circuits with linear threshold degree. On the lower bounds side, the only progress for a long time was due to O'Donnell and Servedio [35], who constructed for any k ⩾ 1 a circuit of depth k + 2 with threshold degree Ω(n^{1/3} log^{2k/3} n). The authors of [35] formally posed the problem of obtaining a polynomial improvement on Minsky and Papert's lower bound. Such an improvement was obtained in [50], with a threshold degree lower bound

Depth | Threshold degree              | Reference
2     | Ω(n^{1/3})                    | Minsky and Papert [32]
2     | O(n^{1/3} log n)              | Klivans and Servedio [26]
k+2   | Ω(n^{1/3} log^{2k/3} n)       | O'Donnell and Servedio [35]
k     | Ω(n^{(k−1)/(2k−1)})           | Sherstov [50]
4     | Ω(√n)                         | Sherstov [52]
3     | Ω̃(√n)                         | Bun and Thaler [17]
k     | Ω̃(n^{(k−1)/(k+1)})            | This paper

Table 1: Known bounds on the maximum threshold degree of ∧, ∨, ¬-circuits of polynomial size and constant depth. In all bounds, n denotes the number of variables, and k denotes an arbitrary positive integer.
of Ω(n^{(k−1)/(2k−1)}) for circuits of depth k. A polynomially stronger result was obtained in [52], with a lower bound of Ω(√n) on the threshold degree of an explicit circuit of depth 4. Bun and Thaler [17] recently used a different, depth-3 circuit to give a much simpler proof of the Ω̃(√n) lower bound for AC0. We obtain a quadratically stronger, and near-optimal, lower bound on the threshold degree of AC0.

Theorem 1.1. Let k ⩾ 1 be a fixed integer. Then there is an (explicitly given) Boolean circuit family {f_n}_{n=1}^∞, where f_n : {0,1}^n → {0,1} has polynomial size, depth k, and threshold degree

    deg±(f_n) = Ω( n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋} ).

Moreover, f_n has bottom fan-in O(log n) for all k ≠ 2.

For large k, Theorem 1.1 essentially matches the trivial upper bound of n on the
threshold degree of any function. For any fixed depth k, Theorem 1.1 subsumes
all previous lower bounds on the threshold degree of AC0 , with a polynomial im-
provement starting at depth k = 4. In particular, the lower bounds due to Minsky
and Papert [32] and Bun and Thaler [17] are subsumed as the special cases k = 2
and k = 3, respectively. From a computational learning perspective, Theorem 1.1
definitively rules out the threshold degree approach to learning constant-depth cir-
cuits.

1.2. Sign-rank of AC0. The sign-rank of a matrix A = [A_{ij}] without zero entries, denoted rk±(A), is the least rank of a real matrix M = [M_{ij}] with sgn M_{ij} = sgn A_{ij} for all i, j. In other words, the sign-rank of A is the minimum rank of a matrix that can be obtained by making arbitrary sign-preserving changes to the entries of A. The sign-rank of a Boolean function F : {0,1}^n × {0,1}^n → {0,1} is defined in the natural way as the sign-rank of the matrix [(−1)^{F(x,y)}]_{x,y}. In particular, the sign-rank of F is an integer between 1 and 2^n. This fundamental notion has
been studied in contexts as diverse as matrix analysis, communication complexity,

Depth | Sign-rank                  | Reference
3     | exp(Ω(n^{1/3}))            | Razborov and Sherstov [39]
3     | exp(Ω̃(n^{2/5}))            | Bun and Thaler [15]
7     | exp(Ω̃(√n))                 | Bun and Thaler [17]
3k    | exp(Ω̃(n^{1−1/(k+1)}))      | This paper
3k+1  | exp(Ω̃(n^{1−1/(k+1.5)}))    | This paper

Table 2: Known lower bounds on the maximum sign-rank of ∧, ∨, ¬-circuits F : {0,1}^n × {0,1}^n → {0,1} of polynomial size and constant depth. In all bounds, k denotes an arbitrary positive integer.
circuit complexity, and learning theory; see [39] for a bibliographic overview. To a
complexity theorist, sign-rank is a vastly more challenging quantity to analyze than
threshold degree. Indeed, a sign-rank lower bound rules out sign-representation out
of every linear subspace of given dimension, whereas a threshold degree lower bound
rules out sign-representation specifically by linear combinations of monomials up
to a given degree.
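For intuition, here is a toy example of ours (not from the paper): the "greater-than" matrix has sign-rank 2, witnessed by the explicit rank-2 matrix M_{ij} = j − i + 1/2. All names below are illustrative, and the rank routine is a plain Gaussian elimination.

```python
def rank(rows, eps=1e-9):
    # numerical rank via Gaussian elimination
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > eps), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        pv = m[r][c]
        m[r] = [v / pv for v in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

n = 8
sgn = lambda v: 1 if v > 0 else -1
# F(i,j) = 1 iff i > j; its sign matrix is [(-1)^F]
A = [[-1 if i > j else 1 for j in range(n)] for i in range(n)]
M = [[j - i + 0.5 for j in range(n)] for i in range(n)]  # explicit rank-2 witness

assert all(sgn(M[i][j]) == A[i][j] for i in range(n) for j in range(n))
assert rank(M) == 2  # hence the sign-rank of A is at most 2
```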
Unsurprisingly, progress in understanding sign-rank has been slow and difficult.
No nontrivial lower bounds were known for any explicit matrices until the break-
through work of Forster [21], who proved strong lower bounds on the sign-rank of
Hadamard matrices and more generally all sign matrices with small spectral norm.
The sign-rank of constant-depth circuits F : {0, 1}n ×{0, 1}n → {0, 1} has since seen
considerable work, as summarized in Table 2. The first exponential lower bound on
the sign-rank of an AC0 circuit was obtained by Razborov and Sherstov [39], solv-
ing a 22-year-old problem due to Babai, Frankl, and Simon [5]. The authors of [39]
constructed a polynomial-size circuit of depth 3 with sign-rank exp(Ω(n^{1/3})). In follow-up work, Bun and Thaler [15] constructed a polynomial-size circuit of depth 3 with sign-rank exp(Ω̃(n^{2/5})). A more recent and incomparable result, also due to Bun and Thaler [17], is a sign-rank lower bound of exp(Ω̃(√n)) for a circuit of polynomial size and depth 7. No nontrivial upper bounds are known on the sign-rank of AC0. Closing this gap between the best lower bound of exp(Ω̃(√n)) and the trivial upper bound of 2^n has been a challenging open problem. We solve this problem almost completely, by constructing for any ε > 0 a constant-depth circuit with sign-rank exp(Ω(n^{1−ε})). In quantitative detail, our results on the sign-rank of AC0 are the following two theorems.

Theorem 1.2. Let k ⩾ 1 be a given integer. Then there is an (explicitly given) Boolean circuit family {F_n}_{n=1}^∞, where F_n : {0,1}^n × {0,1}^n → {0,1} has polynomial size, depth 3k, and sign-rank

    rk±(F_n) = exp( Ω( n^{1−1/(k+1)} · (log n)^{−k(k−1)/(2(k+1))} ) ).

As a companion result, we prove the following qualitatively similar but quantitatively incomparable theorem.

Theorem 1.3. Let k ⩾ 1 be a given integer. Then there is an (explicitly given) Boolean circuit family {G_n}_{n=1}^∞, where G_n : {0,1}^n × {0,1}^n → {0,1} has polynomial size, depth 3k + 1, and sign-rank

    rk±(G_n) = exp( Ω( n^{1−1/(k+1.5)} · (log n)^{−k²/(2k+3)} ) ).

For large k, the lower bounds of Theorems 1.2 and 1.3 approach the trivial upper bound of 2^n on the sign-rank of any Boolean function {0,1}^n × {0,1}^n → {0,1}. For any given depth, Theorems 1.2 and 1.3 subsume all previous lower bounds on the sign-rank of AC0, with a strict improvement starting at depth 3. From a computational learning perspective, Theorems 1.2 and 1.3 state that AC0 has near-maximum dimension complexity [41, 43, 39, 17], namely, exp(Ω(n^{1−ε})) for any constant ε > 0. This rules out the possibility of learning AC0 circuits via dimension complexity [39], a far-reaching generalization of the threshold degree approach from the monomial basis to arbitrary bases.
1.3. Communication complexity. Theorems 1.1–1.3 imply strong new lower bounds on the communication complexity of AC0. We adopt the standard randomized model of Yao [30], with players Alice and Bob and a Boolean function F : X × Y → {0,1}. On input (x, y) ∈ X × Y, Alice and Bob receive the arguments x and y, respectively, and communicate back and forth according to an agreed-upon protocol. Each player privately holds an unlimited supply of uniformly random bits that he or she can use when deciding what message to send at any given point in the protocol. The cost of a protocol is the total number of bits communicated in a worst-case execution. The ε-error randomized communication complexity R_ε(F) of F is the least cost of a protocol that computes F with probability of error at most ε on every input.
Of particular interest to us are communication protocols with error probability close to that of random guessing, 1/2. There are two standard ways to formalize the complexity of a communication problem F in this setting, both inspired by probabilistic polynomial time PP for Turing machines:

    UPP(F) = inf_{0 ⩽ ε < 1/2} R_ε(F)

and

    PP(F) = inf_{0 ⩽ ε < 1/2} ( R_ε(F) + log₂( 1 / (1/2 − ε) ) ).

The former quantity, introduced by Paturi and Simon [38], is called the communication complexity of F with unbounded error, in reference to the fact that the error probability can be arbitrarily close to 1/2. The latter quantity is called the communication complexity of F with weakly unbounded error. Proposed by Babai et al. [5], it features an additional penalty term that depends on the error probability. It is clear that

    UPP(F) ⩽ PP(F) ⩽ n + 2

for every communication problem F : {0, 1}n ×{0, 1}n → {0, 1}, with an exponential
gap achievable between the two complexity measures [10, 41]. These two models
occupy a special place in the study of communication because they are more pow-
erful than any other standard model (deterministic, nondeterministic, randomized,
quantum with or without entanglement). Moreover, unbounded-error protocols
represent a frontier in communication complexity theory in that they are the most
powerful protocols for which explicit lower bounds are currently known. Our re-
sults imply that even for such protocols, AC0 has near-maximal communication
complexity.
To begin with, combining Theorem 1.1 with the pattern matrix method [42, 44]
gives:

Theorem 1.4. Let k ⩾ 3 be a fixed integer. Then there is an (explicitly given) Boolean circuit family {F_n}_{n=1}^∞, where F_n : {0,1}^n × {0,1}^n → {0,1} has polynomial size, depth k, communication complexity

    PP(F_n) = Ω( n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋} )

and discrepancy

    disc(F_n) = exp( −Ω( n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋} ) ).

Discrepancy is a combinatorial complexity measure of interest in communication complexity theory and other research areas; see Section 2.8 for a formal definition.
As k grows, the bounds of Theorem 1.4 approach the best possible bounds for any communication problem F_n : {0,1}^n × {0,1}^n → {0,1}. The same qualitative behavior was achieved in previous work by Bun and Thaler [17], who constructed, for any constant ε > 0, a constant-depth circuit F_n : {0,1}^n × {0,1}^n → {0,1} with communication complexity PP(F_n) = Ω(n^{1−ε}) and discrepancy disc(F_n) = exp(−Ω(n^{1−ε})). Theorem 1.4 strictly subsumes the result of Bun and Thaler [17] and all other prior work on the discrepancy and PP-complexity of constant-depth circuits [42, 10, 44, 50, 52]. For any fixed depth greater than 3, the bounds of Theorem 1.4 are a polynomial improvement in n over all previous work. We further show that Theorem 1.4 carries over to the number-on-the-forehead model, the strongest formalism of multiparty communication. This result, presented in detail in Section 4.4, uses the multiparty version [51] of the pattern matrix method.
Our work also gives near-optimal lower bounds for AC0 in the much more power-
ful unbounded-error model. Specifically, it is well-known [38] that the unbounded-
error communication complexity of any Boolean function F : X × Y → {0, 1} co-
incides up to an additive constant with the logarithm of the sign-rank of F. As a
result, Theorems 1.2 and 1.3 imply:

Theorem 1.5. Let k ⩾ 1 be a given integer. Let {F_n}_{n=1}^∞ and {G_n}_{n=1}^∞ be the polynomial-size circuit families of depth 3k and 3k + 1, respectively, constructed in Theorems 1.2 and 1.3. Then

    UPP(F_n) = Ω( n^{1−1/(k+1)} · (log n)^{−k(k−1)/(2(k+1))} ),
    UPP(G_n) = Ω( n^{1−1/(k+1.5)} · (log n)^{−k²/(2k+3)} ).

For large k, the lower bounds of Theorem 1.5 essentially match the trivial upper bound of n + 1 on the unbounded-error communication complexity of any function F : {0,1}^n × {0,1}^n → {0,1}. Theorem 1.5 strictly subsumes all previous lower bounds on the unbounded-error communication complexity of AC0, with a polynomial improvement for any depth greater than 2. The best lower bound on the unbounded-error communication complexity of AC0 prior to our work was Ω̃(√n) for a circuit of depth 7, due to Bun and Thaler [17]. Finally, we remark that Theorem 1.5 gives essentially the strongest possible separation of the communication complexity classes PH and UPP. We refer the reader to the work of Babai et al. [5] for definitions and detailed background on these classes.
Qualitatively, Theorem 1.5 is stronger than Theorem 1.4 because communica-
tion protocols with unbounded error are significantly more powerful than those
with weakly unbounded error. On the other hand, Theorem 1.4 is stronger quanti-
tatively for any fixed depth and has the additional advantage of generalizing to the
multiparty setting.

1.4. Threshold weight and threshold density. By well-known reductions, Theorem 1.1 implies a number of other lower bounds for the representation of AC0 circuits by polynomials. For the sake of completeness, we mention two such
consequences. The threshold density of a Boolean function f : {0,1}^n → {0,1}, denoted dns(f), is the minimum size of a set family S ⊆ P({1, 2, . . . , n}) such that

    sgn( Σ_{S∈S} λ_S (−1)^{Σ_{i∈S} x_i} ) ≡ (−1)^{f(x)}

for some reals λ_S. A related complexity measure is threshold weight, denoted W(f) and defined as the minimum sum Σ_{S⊆{1,2,...,n}} |λ_S| over all integers λ_S such that

    sgn( Σ_{S⊆{1,2,...,n}} λ_S (−1)^{Σ_{i∈S} x_i} ) ≡ (−1)^{f(x)}.

It is not hard to see that the threshold density and threshold weight of f correspond to the minimum size of a threshold-of-parity and majority-of-parity circuit for f, respectively. The definitions imply that dns(f) ⩽ W(f) for every f, and a little more thought reveals that 1 ⩽ dns(f) ⩽ 2^n and 1 ⩽ W(f) ⩽ 4^n. These complexity measures have seen extensive work, motivated by applications to computational learning and circuit complexity. For a bibliographic overview, we refer the reader to [50, Section 8.2].
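As a toy instance of ours: over the parity basis, (−1)^{AND(x₁,x₂)} is sign-represented by 1 + (−1)^{x₁} + (−1)^{x₂} − (−1)^{x₁+x₂}, so dns(AND₂) ⩽ 4 and W(AND₂) ⩽ 4. A brute-force check, with all names illustrative:

```python
from itertools import product

def chi(S, x):
    # the parity character (-1)^{sum_{i in S} x_i}
    return (-1) ** sum(x[i] for i in S)

# integer coefficients over the four subsets of {1, 2}
coeffs = {(): 1, (0,): 1, (1,): 1, (0, 1): -1}
AND = lambda x: x[0] & x[1]

for x in product((0, 1), repeat=2):
    total = sum(c * chi(S, x) for S, c in coeffs.items())
    assert (total > 0) == (AND(x) == 0)   # sgn(total) == (-1)^AND(x)

weight = sum(abs(c) for c in coeffs.values())
assert weight == 4
```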
Krause and Pudlák [28, Proposition 2.1] gave an ingenious method for transforming threshold degree lower bounds into lower bounds on threshold density and thus also threshold weight. Specifically, let f : {0,1}^n → {0,1} be a Boolean function of interest. The authors of [28] considered the related function F : ({0,1}^n)^3 → {0,1} given by F(x, y, z) = f(. . . , (z_i ∧ x_i) ∨ (¬z_i ∧ y_i), . . . ), and proved that dns(F) ⩾ 2^{deg±(f)}. In this light, Theorem 1.1 implies that the threshold density of AC0 is exp(Ω(n^{1−ε})) for any constant ε > 0:

Corollary 1.6. Let k ⩾ 3 be a fixed integer. Then there is an (explicitly given) Boolean circuit family {F_n}_{n=1}^∞, where F_n : {0,1}^n → {0,1} has polynomial size and depth k and satisfies

    W(F_n) ⩾ dns(F_n) = exp( Ω( n^{(k−1)/(k+1)} · (log n)^{−(1/(k+1))·⌈(k−2)/2⌉·⌊(k−2)/2⌋} ) ).

For large k, the lower bounds on the threshold weight and density in Corollary 1.6 essentially match the trivial upper bounds. Observe that the circuit family {F_n}_{n=1}^∞ of Corollary 1.6 has the same depth as the circuit family {f_n}_{n=1}^∞ of Theorem 1.1. This is because f_n has bottom fan-in O(log n), and thus the Krause–Pudlák transformation f_n ↦ F_n can be "absorbed" into the bottom two levels of f_n. Corollary 1.6 subsumes all previous lower bounds [28, 13, 50, 52, 17] on the threshold weight and density of AC0, with a polynomial improvement for every k ⩾ 4. The improvement is particularly noteworthy in the case of threshold density, where the best previous lower bound [52, 17] was exp(Ω(√n)).
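The Krause–Pudlák transformation itself is easy to state in code: z selects, bit by bit, between x and y. A sketch with illustrative names (the density lower bound dns(F) ⩾ 2^{deg±(f)} is of course the nontrivial part):

```python
def krause_pudlak(f):
    # F(x, y, z) = f(..., (z_i and x_i) or (not z_i and y_i), ...)
    def F(x, y, z):
        selected = tuple(xi if zi else yi for xi, yi, zi in zip(x, y, z))
        return f(selected)
    return F

f = lambda x: x[0] ^ x[1]                        # any base function
F = krause_pudlak(f)
assert F((1, 0), (0, 1), (1, 1)) == f((1, 0))    # z = 11 selects x
assert F((1, 0), (0, 1), (0, 0)) == f((0, 1))    # z = 00 selects y
```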
1.5. Previous approaches. In the remainder of this section, we discuss our proofs of Theorems 1.1–1.3. The notation that we use here is standard, and we defer its formal review to Section 2. We start with necessary approximation-theoretic background, then review relevant previous work, and finally contrast it with the approach of this paper. To sidestep minor technicalities, we will represent Boolean functions in this overview as mappings {−1,+1}^n → {−1,+1}. We alert the reader that we will revert to the standard {0,1}^n → {0,1} representation starting with Section 2.

Background. Recall that our results concern the sign-representation of Boolean functions and matrices. To properly set the stage for our proofs, however, we need to consider the more general notion of pointwise approximation [33]. Let f : {−1,+1}^n → {−1,+1} be a Boolean function of interest. The ε-approximate degree of f, denoted deg_ε(f), is the minimum degree of a real polynomial that approximates f within ε pointwise: deg_ε(f) = min{deg p : ‖f − p‖_∞ ⩽ ε}. The regimes of most interest are bounded-error approximation, corresponding to constants ε ∈ (0, 1); and large-error approximation, corresponding to ε = 1 − o(1). In the former case, the choice of error parameter ε ∈ (0, 1) is immaterial and affects the approximate degree of a Boolean function by at most a multiplicative constant. It is clear that pointwise approximation is a stronger requirement than sign-representation, and thus deg±(f) ⩽ deg_ε(f) for all 0 ⩽ ε < 1. A moment's thought reveals that threshold degree is in fact the limiting case of ε-approximate degree as the error parameter approaches 1:

    deg±(f) = lim_{ε↗1} deg_ε(f).    (1.1)

Both approximate degree and threshold degree have dual characterizations [44], obtained by appeal to linear programming duality. Specifically, deg_ε(f) > d if and only if there is a function φ : {−1,+1}^n → R with the following two properties: ⟨φ, f⟩ > ε‖φ‖₁; and ⟨φ, p⟩ = 0 for every polynomial p of degree less than d. Rephrasing, φ must have large correlation with f but zero correlation with every low-degree polynomial. By weak linear programming duality, φ constitutes a proof that deg_ε(f) > d and for that reason is said to witness the lower bound deg_ε(f) > d. In view of (1.1), this discussion generalizes to threshold degree. The dual characterization here states that deg±(f) > d if and only if there is a nonzero function φ : {−1,+1}^n → R with the following two properties: φ(x)f(x) ⩾ 0 for all x; and ⟨φ, p⟩ = 0 for every polynomial p of degree less than d. In this dual characterization, φ agrees in sign with f and is additionally orthogonal to polynomials of degree less than d. The sign-agreement property can be restated in terms of correlation, as ⟨φ, f⟩ = ‖φ‖₁. As before, φ is called a threshold degree witness for f.
What distinguishes the dual characterizations of approximate degree and thresh-
old degree is how the dual object φ relates to f . Specifically, a threshold degree
witness must agree in sign with f at every point. An approximate degree witness,
on the other hand, need only exhibit such sign-agreement with f at most points,
in that the points where the sign of φ is correct should account for most of the ℓ₁
norm of φ. As a result, constructing dual objects for threshold degree is significantly
more difficult than for approximate degree. This difficulty is to be expected because
the gap between threshold degree and approximate degree can be arbitrary, e.g.,
1 versus Θ(n) for the majority function on n bits [36].
Hardness amplification via block-composition. Much of the recent work on approximate degree and threshold degree is concerned with composing functions in ways that amplify their hardness. Of particular significance here is block-composition, defined for functions f : {−1,+1}^n → {−1,+1} and g : X → {−1,+1} as the Boolean function f ∘ g : X^n → {−1,+1} given by (f ∘ g)(x_1, . . . , x_n) = f(g(x_1), . . . , g(x_n)). Block-composition works particularly well for threshold degree. To use an already familiar example, the block-composition AND_{n^{1/3}} ∘ OR_{n^{2/3}} has threshold degree Ω(n^{1/3}) whereas the constituent functions AND_{n^{1/3}} and OR_{n^{2/3}} have threshold degree 1. As a more extreme example, Sherstov [49] obtained a lower bound of Ω(n) on the threshold degree of the conjunction h_1 ∧ h_2 of two halfspaces h_1, h_2 : {0,1}^n → {0,1}, each of which by definition has threshold degree 1. The fact that threshold degree can increase spectacularly under block-composition is the basis of much previous work, including the best previous lower bounds [50, 52] on the threshold degree of AC0. Apart from threshold degree, block-composition has yielded strong results for approximate degree in various error regimes, including direct sum theorems [47], direct product theorems [46], and error amplification results [46, 13, 56, 14].
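Block-composition is mechanical to implement; for instance, the Minsky–Papert function is exactly AND_m ∘ OR_r on an m × r grid of variables. A sketch, with names of our choosing:

```python
def block_compose(f, g, n, block):
    # (f o g)(x_1, ..., x_n) = f(g(x_1), ..., g(x_n)), each x_i a block of inputs
    def h(x):
        assert len(x) == n * block
        return f(tuple(g(x[i * block:(i + 1) * block]) for i in range(n)))
    return h

AND = lambda z: int(all(z))
OR = lambda z: int(any(z))
MP = block_compose(AND, OR, n=2, block=3)   # AND_2 of OR_3 blocks

assert MP((0, 0, 1, 1, 0, 0)) == 1          # every block contains a 1
assert MP((0, 0, 0, 1, 1, 1)) == 0          # first block is all zeros
```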
How, then, does one prove lower bounds on the threshold degree or approximate
degree of a composed function f ◦ g? It is here that the dual characterizations take
center stage: they make it possible to prove lower bounds algorithmically, by con-
structing the corresponding dual object for the composed function. Such algorith-
mic proofs run the gamut in terms of technical sophistication, from straightforward
to highly technical, but they have some structure in common. In most cases, one
starts by obtaining dual objects φ and ψ for the constituent functions f and g,
respectively, either by direct construction or by appeal to linear programming du-
ality. They are then combined to yield a dual object Φ for the composed function,
using dual block-composition [47, 31]:
Φ(x1 , x2 , . . . , xn ) = φ(sgn ψ(x1 ), . . . , sgn ψ(xn )) · ∏_{i=1}^{n} |ψ(xi )|.    (1.2)

This composed dual object often requires additional work to ensure sign-agreement
or correlation with the composed Boolean function. Among the generic tools
available to assist in this process is a “corrector” object ζ due to Razborov and
Sherstov [39], with the following four properties: (i) ζ is orthogonal to low-degree
polynomials; (ii) ζ takes on 1 at a prescribed point of the hypercube; (iii) ζ is
bounded on inputs of low Hamming weight; and (iv) ζ vanishes on all other points
of the hypercube. Using the Razborov–Sherstov object, suitably shifted and scaled,
one can surgically correct the behavior of a given dual object Φ on a substantial
fraction of inputs, thus modifying its metric properties without affecting its orthog-
onality to low-degree polynomials. This technique has played an important role in
recent work, e.g., [15, 16, 11, 17].
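The following sketch (ours, not the paper's code) implements dual block-composition (1.2) for a toy pair of dual objects and checks by brute force that orthogonality to low-degree polynomials multiplies under the composition, in line with Corollary 2.3 in the preliminaries.

```python
# Dual block-composition (1.2) on a toy example (our illustration).
from itertools import combinations, product

n = 2                                          # outer arity of phi
sgn = lambda t: 1 if t > 0 else (-1 if t < 0 else 0)

# Toy dual objects: psi on {0,1}^2 and phi on {-1,+1}^2, each orthogonal
# to all polynomials of degree < 2, with ||psi||_1 = 1.
psi = {x: ((-1) ** (x[0] + x[1])) / 4 for x in product((0, 1), repeat=2)}
phi = {z: (z[0] * z[1]) / 4 for z in product((-1, 1), repeat=2)}

# Phi(x_1,...,x_n) = phi(sgn psi(x_1), ..., sgn psi(x_n)) * prod_i |psi(x_i)|.
Phi = {}
for xs in product(psi, repeat=n):
    value = phi[tuple(sgn(psi[x]) for x in xs)]
    for x in xs:
        value *= abs(psi[x])
    Phi[sum(xs, ())] = value                   # flatten (x_1, ..., x_n) into one input

def correlates_up_to(obj, d):
    """True if some monomial of degree <= d has nonzero inner product with obj."""
    m = len(next(iter(obj)))
    return any(abs(sum(v * all(x[i] for i in S) for x, v in obj.items())) > 1e-12
               for k in range(d + 1) for S in combinations(range(m), k))

# orth(phi) = orth(psi) = 2, and the composed Phi satisfies orth(Phi) = 4.
print(correlates_up_to(Phi, 3), correlates_up_to(Phi, 4))  # False True
```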
Hardness amplification for approximate degree. While block-composition has pro-
duced a treasure trove of results on polynomial representations of Boolean functions,
it is of limited use when it comes to constructing functions with high bounded-
error approximate degree. To illustrate the issue, consider arbitrary functions
f : {−1, +1}n1 → {−1, +1} and g : {−1, +1}n2 → {−1, +1} with 1/3-approximate
degrees n1^{α1} and n2^{α2}, respectively, for some 0 < α1 < 1 and 0 < α2 < 1. It
is well-known [48] that the composed function f ◦ g on n1 n2 variables has 1/3-
approximate degree O(n1^{α1} n2^{α2}) = O((n1 n2)^{max{α1 ,α2 }}). This means that relative to

the new number of variables, the block-composed function f ◦ g is asymptotically


no harder to approximate to bounded error than the constituent functions f and
g. In particular, one cannot use block-composition to transform functions on n
bits with 1/3-approximate degree at most nα into functions on N > n bits with
1/3-approximate degree ω(N α ).
Until recently, the best lower bound on the bounded-error approximate degree
of AC0 was Ω(n2/3 ), due to Aaronson and Shi [1]. Breaking this n2/3 barrier
was a fundamental problem in its own right, in addition to being a hard pre-
requisite for threshold degree lower bounds for AC0 better than Ω(n2/3 ). This
barrier was overcome in a brilliant paper of Bun and Thaler [16], who proved,
for any constant ε > 0, an Ω(n^{1−ε}) lower bound on the 1/3-approximate degree
of AC0 . Their hardness amplification for approximate degree works as follows.
Let f : {−1, +1}n → {−1, +1} be given, with 1/3-approximate degree nα
for some 0 ≤ α < 1. Bun and Thaler consider the block-composition F =
f ◦ ANDΘ(log m) ◦ ORm , for an appropriate parameter m = poly(n). As shown in
earlier work [47, 13] on approximate degree, dual block-composition witnesses the
lower bound deg1/3 (F ) = Ω(deg1/3 (ORm ) · deg1/3 (f )) = Ω(√m · deg1/3 (f )). Next,
Bun and Thaler make the crucial observation that the dual object for ORm has
most of its ℓ1 mass on inputs of Hamming weight O(1), which in view of (1.2)
implies that the dual object for F places most of its ℓ1 mass on inputs of Hamming
weight Õ(n). The authors of [16] then use the Razborov–Sherstov corrector object
to transfer the small amount of ℓ1 mass that the dual object for F places on inputs
of high Hamming weight, to inputs of low Hamming weight. The resulting dual
object for F is supported entirely on inputs of low Hamming weight and therefore
witnesses a lower bound on the 1/3-approximate degree of the restriction F 0 of F to
inputs of low Hamming weight. By re-encoding the input to F 0 , one finally obtains
a function F 00 on Õ(n) variables with 1/3-approximate degree polynomially larger
than that of f. This passage from f to F 00 is the desired hardness amplification for
approximate degree. We find it helpful to think of Bun and Thaler’s technique as
block-composition followed by input compression, to reduce the number of input
variables in the block-composed function. To obtain an Ω(n^{1−ε}) lower bound on
the approximate degree of AC0 , the authors of [16] start with a trivial circuit and
iteratively apply the hardness amplification step a constant number of times, until
approximate degree Ω(n^{1−ε}) is reached.
In follow-up work, Bun, Kothari, and Thaler [11] refined the technique of [16] by
deriving optimal concentration bounds for the dual object for ORm . They thereby
obtained tight or nearly tight lower bounds on the 1/3-approximate degree of sur-
jectivity, element distinctness, and other important problems. The most recent
contribution to this line of work is due to Bun and Thaler [17], who prove an
Ω(n^{1−ε}) lower bound on the (1 − 2^{−n^{1−ε}})-approximate degree of AC0 by combining
the method of [16] with Sherstov’s work [46] on direct product theorems for approx-
imate degree. This near-linear lower bound substantially strengthens the authors’
previous result [16] on the bounded-error approximate degree of AC0 , but does not
address the threshold degree.

1.6. Our approach.

Threshold degree of AC0 . Bun and Thaler [17] refer to obtaining an Ω(n^{1−ε}) threshold
degree lower bound for AC0 as the “main glaring open question left by our work.”
It is important to note here that lower bounds on approximate degree, even with the

error parameter exponentially close to 1 as in [17], have no implications for threshold
degree. For example, there are functions [49] with (1 − 2^{−Θ(n)})-approximate
degree Θ(n) but threshold degree 1. Our proof of Theorem 1.1 is unrelated to the
most recent work of Bun and Thaler [17] on the large-error approximate degree
of AC0 and instead builds on their earlier and simpler “block-composition followed
by input compression” approach [16]. The centerpiece of our proof is a hardness
amplification result for threshold degree, whereby any function f with threshold
degree nα for a constant 0 ≤ α < 1 can be transformed efficiently and within AC0
into a function F with polynomially larger threshold degree.
In more detail, let f : {−1, +1}n → {−1, +1} be a function of interest, with
threshold degree nα . We consider the block-composition f ◦ MPm , where m = n^{O(1)}
is an appropriate parameter and MPm = ANDm ◦ OR_{m^2} is the Minsky–Papert
function with threshold degree Ω(m). We construct the dual object for MPm from
scratch to ensure concentration on inputs of Hamming weight Õ(m). By applying
dual block-composition to the threshold degree witnesses of f and MPm , we obtain
a dual object Φ witnessing the Ω(mnα ) threshold degree of f ◦ MPm . So far in
the proof, our differences from [16] are as follows: (i) since our goal is amplifica-
tion of threshold degree, we work with witnesses of threshold degree rather than
approximate degree; (ii) to ensure rapid growth of threshold degree, we use block-
composition with inner function MPm = ANDm ◦ OR_{m^2} of threshold degree Θ(m),
in place of Bun and Thaler’s inner function ANDΘ(log m) ◦ ORm of threshold degree
Θ(log m).
Since the dual object for MPm by construction has most of its ℓ1 norm on inputs
of Hamming weight Õ(m), the dual object Φ for the composed function has most
of its ℓ1 norm on inputs of Hamming weight Õ(nm). Analogous to [16, 11, 17], we
would like to use the Razborov–Sherstov corrector object to remove the ℓ1 mass
that Φ has on the inputs of high Hamming weight, transferring it to inputs of low
Hamming weight. This brings us to the novel and technically demanding part of
our proof. Previous works [16, 11, 17] transferred the ℓ1 mass from the inputs
of high Hamming weight to the neighborhood of the all-zeroes input (0, 0, . . . , 0).
An unavoidable feature of the Razborov–Sherstov transfer process is that it ampli-
fies the ℓ1 mass being transferred. When the transferred mass finally reaches its
destination, it overwhelms Φ’s original values at the local points, destroying Φ’s
sign-agreement with the composed function f ◦ MPm . It is this difficulty that most
prevented earlier works [16, 11, 17] from obtaining a strong threshold degree lower
bound for AC0 .
We proceed differently. Instead of transferring the ℓ1 mass of Φ from the inputs
of high Hamming weight to the neighborhood of (0, 0, . . . , 0), we transfer it simul-
taneously to exponentially many strategically chosen neighborhoods. Split this way
across many neighborhoods, the transferred mass does not overpower the original
values of Φ and in particular does not change any signs. Working out the details
of this transfer scheme requires subtle and lengthy calculations; it was not clear
to us until the end that such a scheme exists. Once the transfer process is com-
plete, we obtain a witness for the Ω(mnα ) threshold degree of f ◦ MPm restricted
to inputs of low Hamming weight. Compressing the input as in [16, 11], we ob-
tain an amplification theorem for threshold degree. With this work behind us, the
proof of Theorem 1.1 for any depth k amounts to starting with a trivial circuit and
amplifying its threshold degree O(k) times.
Sign-rank of AC0 . It is not known how to “lift” a threshold degree lower bound in
a black-box manner to a sign-rank lower bound. In particular, Theorem 1.1 has no

implications a priori for the sign-rank of AC0 . Our proofs of Theorems 1.2 and 1.3
are completely disjoint from Theorem 1.1 and are instead based on a stronger
approximation-theoretic quantity that we call γ-smooth threshold degree. Formally,
the γ-smooth threshold degree of a Boolean function f : X → {−1, +1} is the largest
d for which there is a nonzero function φ : X → R with the following two properties:
φ(x)f (x) ≥ γ · ‖φ‖1 /|X| for all x ∈ X; and ⟨φ, p⟩ = 0 for every polynomial p of
degree less than d. Taking γ = 0 in this formalism, one recovers the standard dual
characterization of the threshold degree of f. In particular, threshold degree is syn-
onymous with 0-smooth threshold degree. The general case of γ-smooth threshold
degree for γ > 0 requires threshold degree witnesses φ that are min-smooth, in that
the absolute value of φ at any given point is at least a γ fraction of the average
absolute value of φ over all points. A substantial advantage of smooth threshold de-
gree is that it has immediate sign-rank implications. Specifically, any lower bound
of d on the 2^{−O(d)}-smooth threshold degree can be converted efficiently and in a
black-box manner into a sign-rank lower bound of 2^{Ω(d)}, using a combination of
the pattern matrix method [42, 44] and Forster’s spectral lower bound on sign-
rank [21, 22]. Accordingly, we obtain Theorems 1.2 and 1.3 by proving an Ω(n^{1−ε})
lower bound on the 2^{−O(n^{1−ε})}-smooth threshold degree of AC0 , for any constant
ε > 0.
At the core of our result is an amplification theorem for smooth threshold de-
gree, whose repeated application makes it possible to prove arbitrarily strong lower
bounds for AC0 . Amplifying smooth threshold degree is a complex juggling act
due to the presence of two parameters—degree and smoothness—that must evolve
in coordinated fashion. The approach of Theorem 1.1 is not useful here because
the threshold degree witnesses that arise from the proof of Theorem 1.1 are highly
nonsmooth. In more detail, when amplifying the threshold degree of a function f
as in the proof of Theorem 1.1, two phenomena adversely affect the smoothness
parameter. The first is block-composition itself as a composition technique, which
in the regime of interest to us transforms every threshold degree witness for f into
a hopelessly nonsmooth witness for the composed function. The other culprit is the
input compression step, which re-encodes the input and thereby affects the smooth-
ness in ways that are hard to control. To overcome these difficulties, we develop a
novel approach based on what we call local smoothness.
Formally, let Φ : Nn → R be a function of interest. For a subset X ⊆ Nn and
a real number K ≥ 1, we say that Φ is K-smooth on X if |Φ(x)| ≤ K^{|x−x′|} |Φ(x′)|
for all x, x′ ∈ X. Put another way, for any two points of X at ℓ1 distance d, the
corresponding values of Φ differ in magnitude by a factor of at most K^d . In and
of itself, a locally smooth function Φ need not be min-smooth because for a pair
of points that are far from each other, the corresponding Φ-values can differ by
many orders of magnitude. However, locally smooth functions exhibit extraordi-
nary plasticity. Specifically, we show how to modify a locally smooth function’s
metric properties—such as its support or the distribution of its ℓ1 mass—without
the change being detectable by low-degree polynomials. This apparatus makes
it possible to restore min-smoothness to the dual object Φ that results from the
block-composition step and preserve that min-smoothness throughout the input
compression step, eliminating the two obstacles to min-smoothness in the earlier
proof of Theorem 1.1. The new block-composition step uses a locally smooth wit-
ness for the threshold degree of MPm , which needs to be built from scratch and is
quite different from the witness in the proof of Theorem 1.1.
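The definition of local smoothness translates directly into a brute-force checker; the sketch below (ours, not from the paper) verifies that the example function Φ(x) = 2^{−|x|} on {0, 1, 2}^2 is 2-smooth but not 1-smooth.

```python
# A literal translation (ours) of the local-smoothness condition:
# Phi is K-smooth on X if |Phi(x)| <= K^{|x - x'|_1} * |Phi(x')| for all x, x' in X.
from itertools import product

def is_K_smooth(Phi, K):
    pts = list(Phi)
    return all(
        abs(Phi[x]) <= K ** sum(abs(a - b) for a, b in zip(x, x_)) * abs(Phi[x_]) + 1e-12
        for x in pts for x_ in pts
    )

# Phi(x) = 2^{-|x|} changes by a factor of at most 2 per unit of l_1 distance,
# so it is 2-smooth; it is not 1-smooth since 1-smooth functions are constant
# in absolute value.
Phi = {x: 2.0 ** -sum(x) for x in product(range(3), repeat=2)}
print(is_K_smooth(Phi, 2), is_K_smooth(Phi, 1))  # True False
```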

Our described approach departs considerably from previous work on the sign-
rank of constant-depth circuits [39, 15, 17]. The analytic notion in those earlier
papers is weaker than γ-smooth threshold degree and in particular allows the dual
object to be arbitrary on a γ fraction of the inputs. This weaker property is accept-
able when the main result is proved in one shot, with a closed-form construction
of the dual object. By contrast, we must construct dual objects iteratively, with
each iteration increasing the degree parameter and proportionately decreasing the
smoothness parameter. This iterative process requires that the dual object in each
iteration be min-smooth on the entire domain. Perhaps unexpectedly, we find γ-
smooth threshold degree easier to work with than the weaker notion in previous
work [39, 15, 17]. In particular, we are able to give a new and short proof of the
exp(Ω(n1/3 )) lower bound on the sign-rank of AC0 , originally obtained by Razborov
and Sherstov [39] with a much more complicated approach. The new proof can be
found in Section 5.1, where it serves as a prelude to our main result on the sign-rank
of AC0 .

2. Preliminaries
2.1. General. For a string x ∈ {0, 1}n and a set S ⊆ {1, 2, . . . , n}, we let x|S
denote the restriction of x to the indices in S. In other words, x|S = xi1 xi2 . . . xi|S| ,
where i1 < i2 < · · · < i|S| are the elements of S. The characteristic function of a
set S ⊆ {1, 2, . . . , n} is given by 1S (x) = 1 if x ∈ S, and 1S (x) = 0 otherwise.

For a logical condition C, we use the Iverson bracket: I[C] = 1 if C holds, and
I[C] = 0 otherwise.

We let N = {0, 1, 2, 3, . . .} denote the set of natural numbers. The following well-
known bound [24, Proposition 1.4] is used in our proofs without further mention:

∑_{i=0}^{k} \binom{n}{i} ≤ (en/k)^k ,    k = 0, 1, 2, . . . , n,    (2.1)

where e = 2.7182 . . . denotes Euler’s number.
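The bound (2.1) is easy to confirm numerically; the quick check below (ours) does so for small n, starting at k = 1 to sidestep the k = 0 case, where the left-hand side is 1 and the bound holds by convention.

```python
# A numerical sanity check (ours) of the binomial-sum bound (2.1):
# sum_{i=0}^{k} C(n, i) <= (e*n/k)^k for k = 1, 2, ..., n.
from math import comb, e

def bound_holds(n):
    return all(sum(comb(n, i) for i in range(k + 1)) <= (e * n / k) ** k
               for k in range(1, n + 1))

print(all(bound_holds(n) for n in range(1, 41)))  # True
```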


We adopt the extended real number system R ∪ {−∞, ∞} in all calculations,
with the additional convention that 0/0 = 0. We use the comparison operators in
a unary capacity to denote one-sided intervals of the real line. Thus, <a, ≤a, >a,
≥a stand for (−∞, a), (−∞, a], (a, ∞), [a, ∞), respectively. We let ln x and log x
stand for the natural logarithm of x and the logarithm of x to base 2, respectively.
We use the following two versions of the sign function:

sgn x = −1 if x < 0,   0 if x = 0,   1 if x > 0;
s̃gn x = −1 if x < 0,   1 if x ≥ 0.

The term Euclidean space refers to Rn for some positive integer n. We let ei de-
note the vector whose ith component is 1 and the others are 0. Thus, the vectors
e1 , e2 , . . . , en form the standard basis for Rn . For vectors x and y, we write x ≤ y
to mean that xi ≤ yi for each i. The relations ≥, <, > on vectors are defined
analogously.
We frequently omit the argument in equations and inequalities involving func-
tions, as in sgn p = (−1)f . Such statements are to be interpreted pointwise. For
example, the statement “f > 2|g| on X” means that f (x) > 2|g(x)| for every
x ∈ X. The positive and negative parts of a function f : X → R are denoted
pos f = max{f, 0} and neg f = max{−f, 0}, respectively.

2.2. Boolean functions and circuits. We view Boolean functions as mappings


X → {0, 1} for some finite set X. More generally, we consider partial Boolean
functions f : X → {0, 1, ∗}, with the output value ∗ used for don’t-care inputs.
The negation of a Boolean function f is denoted as usual by ¬f = 1 − f. The
familiar functions ORn : {0, 1}n → {0, 1} and ANDn : {0, 1}n → {0, 1} are given
by ORn (x) = ⋁_{i=1}^{n} xi and ANDn (x) = ⋀_{i=1}^{n} xi . We abbreviate NORn = ¬ORn .
The generalized Minsky–Papert function MPm,r : ({0, 1}r )m → {0, 1} is given by
MPm,r (x) = ⋀_{i=1}^{m} ⋁_{j=1}^{r} xi,j . We abbreviate MPm = MP_{m,m^2} , which is the right
setting of parameters for most of our applications.
We adopt the standard notation for function composition, with f ◦ g defined by
(f ◦ g)(x) = f (g(x)). In addition, we use the ◦ operator to denote the component-
wise composition of Boolean functions. Formally, the componentwise composition
of f : {0, 1}n → {0, 1} and g : X → {0, 1} is the function f ◦ g : X n → {0, 1}
given by (f ◦ g)(x1 , x2 , . . . , xn ) = f (g(x1 ), g(x2 ), . . . , g(xn )). To illustrate, MPm,r =
ANDm ◦ ORr . Componentwise composition is consistent with standard composi-
tion, which in the context of Boolean functions is only defined for n = 1. Thus, the
meaning of f ◦ g is determined by the range of g and is never in doubt. Compo-
nentwise composition generalizes in the natural manner to partial Boolean functions
f : {0, 1}n → {0, 1, ∗} and g : X → {0, 1, ∗}, as follows:
(f ◦ g)(x1 , . . . , xn ) = f (g(x1 ), . . . , g(xn )) if x1 , . . . , xn ∈ g^{−1}({0, 1}), and
(f ◦ g)(x1 , . . . , xn ) = ∗ otherwise.
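A minimal sketch (ours, not from the paper) of componentwise composition, including the propagation of the don't-care value ∗ for partial functions; it recovers MP2,3 = AND2 ◦ OR3 as a special case.

```python
# Componentwise composition f ∘ g with don't-care propagation (our illustration).
from itertools import product

def compose(f, g, n):
    def h(*xs):                    # xs = (x_1, ..., x_n), each x_i an input to g
        assert len(xs) == n
        values = [g(x) for x in xs]
        if '*' in values:          # some g(x_i) is undefined, so (f ∘ g)(x) is too
            return '*'
        return f(*values)
    return h

AND = lambda *bits: int(all(bits))
OR = lambda x: int(any(x))         # OR_r applied to one block x in {0,1}^r

m, r = 2, 3
MP = compose(AND, OR, m)           # MP_{m,r} = AND_m ∘ OR_r

# MP_{2,3}(x) = (x_11 ∨ x_12 ∨ x_13) ∧ (x_21 ∨ x_22 ∨ x_23)
print(MP((0, 1, 0), (0, 0, 1)), MP((0, 0, 0), (1, 1, 1)))  # 1 0
```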

Compositions f1 ◦ f2 ◦ · · · ◦ fk of three or more functions, where each instance of


the ◦ operator can be standard or componentwise, are well-defined by associativity
and do not require parenthesization.
For Boolean strings x, y ∈ {0, 1}n , we let x ⊕ y denote their bitwise XOR. The
strings x ∧ y and x ∨ y are defined analogously, with the binary connective applied
bitwise. A Boolean circuit C in variables x1 , x2 , . . . , xn is a circuit with inputs
x1 , ¬x1 , x2 , ¬x2 , . . . , xn , ¬xn and gates ∧ and ∨. The circuit C is monotone if it
does not use any of the negated inputs ¬x1 , ¬x2 , . . . , ¬xn . The fan-in of C is the
maximum in-degree of any ∧ or ∨ gate. Unless stated otherwise, we place no
restrictions on the gate fan-in. The size of C is the number of ∧ and ∨ gates.
The depth of C is the maximum number of ∧ and ∨ gates on any path from
an input to the circuit output. With this convention, the circuit that computes
(x1 , x2 , . . . , xn ) ↦ x1 has depth 0. The circuit class AC0 consists of function
families {fn }_{n=1}^{∞} such that fn : {0, 1}n → {0, 1} is computed by a Boolean circuit
of size at most cn^c and depth at most c, for some constant c ≥ 1 and all n. We
specify small-depth layered circuits by indicating the type of gate used in each

layer. For example, an AND-OR-AND circuit is a depth-3 circuit with the top
and bottom layers composed of ∧ gates, and the middle layer composed of ∨ gates. A
Boolean formula is a Boolean circuit in which every gate has fan-out 1. Common
examples of Boolean formulas are DNF and CNF formulas.

2.3. Norms and products. For a set X, we let RX denote the linear space of
real-valued functions on X. The support of a function f ∈ RX is denoted supp f =
{x ∈ X : f (x) ≠ 0}. For real-valued functions with finite support, we adopt the
usual norms and inner product:

‖f ‖∞ = max_{x∈supp f} |f (x)|,
‖f ‖1 = ∑_{x∈supp f} |f (x)|,
⟨f, g⟩ = ∑_{x∈supp f ∩ supp g} f (x)g(x).

This covers as a special case functions on finite sets. The tensor product of f ∈ RX
and g ∈ RY is denoted f ⊗g ∈ RX×Y and given by (f ⊗g)(x, y) = f (x)g(y). The ten-
sor product f ⊗ f ⊗ · · · ⊗ f (n times) is abbreviated f ⊗n . For a subset S ⊆ {1, 2, . . . , n}
and a function f : X → R, we define f ⊗S : X n → R by f ⊗S (x1 , x2 , . . . , xn ) =
∏_{i∈S} f (xi ). As extremal cases, we have f ⊗∅ ≡ 1 and f ⊗{1,2,...,n} = f ⊗n . Tensor
product notation generalizes naturally to sets of functions: F ⊗ G = {f ⊗ g : f ∈
F, g ∈ G} and F ⊗n = {f1 ⊗f2 ⊗· · ·⊗fn : f1 , f2 , . . . , fn ∈ F }. A conical combination
of f1 , f2 , . . . , fk ∈ RX is any function of the form λ1 f1 + λ2 f2 + · · · + λk fk , where
λ1 , λ2 , . . . , λk are nonnegative reals. A convex combination of f1 , f2 , . . . , fk ∈ RX
is any function λ1 f1 + λ2 f2 + · · · + λk fk , where λ1 , λ2 , . . . , λk are nonnegative re-
als that sum to 1. The conical hull of F ⊆ RX , denoted cone F, is the set of all
conical combinations of functions in F. The convex hull, denoted conv F , is defined
analogously as the set of all convex combinations of functions in F. For any set of
functions F ⊆ RX , we have

(conv F )⊗n ⊆ conv(F ⊗n ). (2.2)

Throughout this manuscript, we view probability distributions as real functions.


This convention makes available the shorthands introduced above. In particular,
for probability distributions µ and λ, the symbol supp µ denotes the support of µ,
and µ ⊗ λ denotes the probability distribution given by (µ ⊗ λ)(x, y) = µ(x)λ(y).
If µ is a probability distribution on X, we consider µ to be defined also on any
superset of X with the understanding that µ = 0 outside X. We let D(X) denote
the family of all finitely supported probability distributions on X. Most of this
paper is concerned with the distribution family D(Nn ) and its subfamilies, each of
which we will denote with a Fraktur letter.
Analogous to functions, we adopt the familiar norms for vectors x ∈ Rn in
Euclidean space: ‖x‖∞ = max_{i=1,...,n} |xi | and ‖x‖1 = ∑_{i=1}^{n} |xi |. The latter norm
is particularly prominent in this paper, and to avoid notational clutter we use |x|
interchangeably with ‖x‖1 . We refer to |x| = ‖x‖1 as the weight of x. For any sets
X ⊆ Nn and W ⊆ R, we define

X|W = {x ∈ X : |x| ∈ W }.

In the case of a one-element set W = {w}, we further shorten X|{w} to X|w . To


illustrate, Nn |≤w denotes the set of vectors whose components are natural numbers
and sum to at most w, whereas {0, 1}n |w denotes the set of Boolean strings of
length n and Hamming weight exactly w. For a function f : X → R on a subset
X ⊆ Nn , we let f |W denote the restriction of f to X|W . A typical instance of this
notation would be f |≤w for some real number w.

2.4. Orthogonal content. For a multivariate real polynomial p : Rn → R, we


let deg p denote the total degree of p, i.e., the largest degree of any monomial of
p. We use the terms degree and total degree interchangeably in this paper. It will
be convenient to define the degree of the zero polynomial by deg 0 = −∞. For a
real-valued function φ supported on a finite subset of Rn , we define the orthogonal
content of φ, denoted orth φ, to be the minimum degree of a real polynomial p for
which ⟨φ, p⟩ ≠ 0. We adopt the convention that orth φ = ∞ if no such polynomial
exists. It is clear that orth φ ∈ N ∪ {∞}, with the extremal cases orth φ = 0 ⇔
⟨φ, 1⟩ ≠ 0 and orth φ = ∞ ⇔ φ = 0. Our next three results record additional facts
about orthogonal content.
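On the hypercube, multilinear monomials span all polynomials, so orth φ is simply the least total degree of a monomial with nonzero inner product against φ. A brute-force implementation (ours, not from the paper) of this criterion:

```python
# Brute-force orthogonal content (our illustration) for phi supported on {0,1}^n.
# orth(phi) equals the least total degree of a monomial chi_S(x) = prod_{i in S} x_i
# with <phi, chi_S> != 0, and is infinite for phi = 0 by convention.
from itertools import combinations, product

def orth(phi, n):
    for d in range(n + 1):
        for S in combinations(range(n), d):
            if abs(sum(v * all(x[i] for i in S) for x, v in phi.items())) > 1e-12:
                return d
    return float('inf')

cube = list(product((0, 1), repeat=3))
parity = {x: (-1) ** sum(x) for x in cube}       # orthogonal to all degree-<3 monomials
indicator = {x: 1.0 * (x == (1, 1, 1)) for x in cube}

print(orth(parity, 3), orth(indicator, 3), orth({x: 0.0 for x in cube}, 3))  # 3 0 inf
```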

Proposition 2.1. Let X and Y be nonempty finite subsets of Euclidean space.


Then:
(i) orth(φ + ψ) ≥ min{orth φ, orth ψ} for all φ, ψ : X → R;
(ii) orth(φ ⊗ ψ) = orth(φ) + orth(ψ) for all φ : X → R and ψ : Y → R;
(iii) orth(φ⊗n − ψ ⊗n ) ≥ orth(φ − ψ) for all φ, ψ : X → R and all n ≥ 1.

Proof. Item (i) is immediate, as is the upper bound in (ii). For the lower bound
in (ii), simply note that the linearity of inner product makes it possible to restrict
attention to factored polynomials p(x)q(y), where p and q are polynomials on X
and Y , respectively. For (iii), use a telescoping sum to write

φ⊗n − ψ ⊗n = ∑_{i=0}^{n−1} (φ⊗(n−i) ⊗ ψ ⊗i − φ⊗(n−i−1) ⊗ ψ ⊗(i+1) )
           = ∑_{i=0}^{n−1} φ⊗(n−i−1) ⊗ (φ − ψ) ⊗ ψ ⊗i .

By (ii), each term in the final expression has orthogonal content at least orth(φ−ψ).
By (i), then, the sum has orthogonal content at least orth(φ − ψ) as well.

Proposition 2.2. Let φ0 , φ1 : X → R be given functions on a finite subset X of
Euclidean space with orth(φ1 − φ0 ) > 0. Then for every polynomial p : X n → R,
the mapping z ↦ ⟨⊗_{i=1}^{n} φzi , p⟩ is a polynomial on {0, 1}n of degree at most
(deg p)/ orth(φ1 − φ0 ).

Proof. By linearity, it suffices to consider factored polynomials p(x1 , . . . , xn ) =
∏_{i=1}^{n} pi (xi ), where each pi is a nonzero polynomial on X. In this setting, we have

⟨⊗_{i=1}^{n} φzi , p⟩ = ∏_{i=1}^{n} ⟨φzi , pi ⟩.    (2.3)

By definition, ⟨φ0 , pi ⟩ = ⟨φ1 , pi ⟩ for any index i with deg pi < orth(φ1 − φ0 ). As a
result, such indices do not contribute to the degree of the right-hand side of (2.3)
as a function of z. The contribution of any other index to the degree is clearly at
most 1. Summarizing, the right-hand side of (2.3) is a polynomial in z ∈ {0, 1}n of
degree at most |{i : deg pi ≥ orth(φ1 − φ0 )}| ≤ (deg p)/ orth(φ1 − φ0 ).

Corollary 2.3. Let X be a finite subset of Euclidean space. Then for any func-
tions φ0 , φ1 : X → R and ψ : {0, 1}n → R,

orth( ∑_{z∈{0,1}^n} ψ(z) ⊗_{i=1}^{n} φzi ) ≥ orth(ψ) · orth(φ1 − φ0 ).

Proof. We may assume that orth(ψ)·orth(φ1 −φ0 ) > 0 since the claim holds trivially
otherwise. Fix any polynomial P of degree less than orth(ψ) · orth(φ1 − φ0 ). The
linearity of inner product leads to
⟨ ∑_{z∈{0,1}^n} ψ(z) ⊗_{i=1}^{n} φzi , P ⟩ = ∑_{z∈{0,1}^n} ψ(z) ⟨⊗_{i=1}^{n} φzi , P ⟩.

By Proposition 2.2, the right-hand side is the inner product of ψ with a polynomial
of degree less than orth ψ and is therefore zero.

Observe that Corollary 2.3 gives an alternate proof of Proposition 2.1(iii). Our
next proposition uses orthogonal content to give a useful criterion for a real-valued
function to be a probability distribution.

Proposition 2.4. Let Λ be a probability distribution on a finite subset X of Eu-


clidean space. Let Λ̃ : X → R be given with Λ̃ ≥ 0 and orth(Λ − Λ̃) > 0. Then Λ̃ is
a probability distribution on X.

Proof. By hypothesis, Λ̃ is a nonnegative function. Moreover, ‖Λ̃‖1 = ⟨Λ̃, 1⟩ =
⟨Λ, 1⟩ − ⟨Λ − Λ̃, 1⟩ = ⟨Λ, 1⟩ = 1, where the third step uses orth(Λ − Λ̃) > 0.

2.5. Sign-representation. Let f : X → {0, 1} be a given Boolean function, for


a finite subset X ⊂ Rn . The threshold degree of f, denoted deg± (f ), is the least
degree of a real polynomial p that represents f in sign: sgn p(x) = (−1)f (x) for each
x ∈ X. The term “threshold degree” appears to be due to Saks [40]. Equivalent
terms in the literature include “strong degree” [4], “voting polynomial degree” [28],
“polynomial threshold function degree” [35], and “sign degree” [10]. One of the first
results on polynomial representations of Boolean functions was the following tight
lower bound on the threshold degree of MPm , due to Minsky and Papert [32].

Theorem 2.5 (Minsky and Papert). deg± (MPm ) = Ω(m).

Three new proofs of this lower bound, unrelated to Minsky and Papert’s original
proof, were discovered in [50]. Threshold degree admits the following dual charac-
terization, obtained by appeal to linear programming duality.

Fact 2.6. Let f : X → {0, 1} be a given Boolean function on a finite subset X of
Euclidean space. Then deg± (f ) ≥ d if and only if there exists ψ : X → R such that

(−1)^{f (x)} ψ(x) ≥ 0 for all x ∈ X,
orth ψ ≥ d,
ψ ≢ 0.

The function ψ witnesses the threshold degree of f , and is called a dual polynomial
due to its origin in a dual linear program. We refer the reader to [4, 35, 47]
for a proof of Fact 2.6. The following equivalent statement is occasionally more
convenient to work with.

Fact 2.7. For every Boolean function f : X → {0, 1} on a finite subset X of Eu-
clidean space,

deg± (f ) = max_{µ∈D(X)} orth((−1)^f · µ).    (2.4)

We now define a generalization of threshold degree inspired by the dual view in


Fact 2.7. For a function f : X → {0, 1} and a real number 0 ≤ γ ≤ 1, let

deg± (f, γ) = max_{µ∈D(X): µ≥γ/|X| on X} orth((−1)^f · µ).    (2.5)

We call this quantity the γ-smooth threshold degree of f , in reference to the fact
that the maximization in (2.5) is over probability distributions µ that assign to
every point of the domain at least a γ fraction of the average point’s probability.
A glance at (2.4) and (2.5) reveals that deg± (f, γ) is monotonically nonincreasing
in γ, with the limiting case deg± (f, 0) = deg± (f ).

Fact 2.8. For every nonconstant function f : X → {0, 1},

deg± (f, 1/2) ≥ 1.

Proof. Define µ = (1/2)µ0 + (1/2)µ1 , where µi is the uniform probability distribution
on f −1 (i). Then clearly orth((−1)^f · µ) ≥ 1 and µ ≥ (1/2) max{µ0 , µ1 } ≥ 1/(2|X|)
on X.
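The witness from the proof of Fact 2.8 is easy to verify numerically; the sketch below (ours, not from the paper) does so for f = OR3.

```python
# A numerical check (ours) of the witness in the proof of Fact 2.8, for f = OR_3:
# mu = (1/2)mu_0 + (1/2)mu_1, with mu_i uniform on f^{-1}(i), satisfies
# orth((-1)^f mu) >= 1 and mu >= 1/(2|X|) pointwise.
from itertools import product

cube = list(product((0, 1), repeat=3))
f = {x: int(any(x)) for x in cube}                         # f = OR_3
level = {i: [x for x in cube if f[x] == i] for i in (0, 1)}
mu = {x: 0.5 / len(level[f[x]]) for x in cube}             # the Fact 2.8 witness

assert abs(sum((-1) ** f[x] * mu[x] for x in cube)) < 1e-12   # orth((-1)^f mu) >= 1
assert all(mu[x] >= 1 / (2 * len(cube)) for x in cube)        # 1/2-smoothness
print(min(mu.values()), 1 / (2 * len(cube)))
```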

As one might expect, padding a Boolean function with irrelevant variables does not
decrease its smooth threshold degree. We record this observation below.

Proposition 2.9. Fix integers N > n > 1 and a function f : {0, 1}n → {0, 1}.
Define F : {0, 1}N → {0, 1} by F (x1 , x2 , . . . , xN ) = f (x1 , x2 , . . . , xn ). Then

deg± (F, γ) ≥ deg± (f, γ),    0 ≤ γ ≤ 1.

In particular,

deg± (F ) ≥ deg± (f ).



Proof. Fix 0 ≤ γ ≤ 1 arbitrarily. Let λ be a probability distribution on {0, 1}n such
that λ(x) ≥ γ2^{−n} for all x ∈ {0, 1}n , and in addition orth((−1)^f · λ) = deg± (f, γ).
Consider the probability distribution Λ on {0, 1}N given by Λ(x1 , x2 , . . . , xN ) =
2^{−N+n} λ(x1 , x2 , . . . , xn ). Then Λ(x) ≥ 2^{−N+n} · γ2^{−n} = γ2^{−N} for all x ∈ {0, 1}N ,
and in addition orth((−1)^F · Λ) = orth((−1)^f · λ) = deg± (f, γ).

2.6. Symmetrization. Let Sn denote the symmetric group on n elements. For a


permutation σ ∈ Sn and an arbitrary sequence x = (x1 , x2 , . . . , xn ), we adopt the
shorthand σx = (xσ(1) , xσ(2) , . . . , xσ(n) ). A function f (x1 , x2 , . . . , xn ) is called sym-
metric if it is invariant under permutation of the input variables: f (x1 , x2 , . . . , xn ) =
f (xσ(1) , xσ(2) , . . . , xσ(n) ) for all x and σ. Symmetric functions on {0, 1}n are inti-
mately related to univariate polynomials, as was first observed by Minsky and
Papert in their symmetrization argument [32].

Proposition 2.10 (Minsky and Papert). Let p : Rn → R be a given polynomial.


Then the mapping

t ↦ E_{x∈{0,1}^n |t} p(x)

is a univariate polynomial on {0, 1, 2, . . . , n} of degree at most deg p.
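A quick numerical illustration (ours, not from the paper) of Proposition 2.10: symmetrizing p(x) = x1 x2 over the Hamming layers of {0, 1}^n yields the univariate degree-2 polynomial t(t − 1)/(n(n − 1)).

```python
# Minsky-Papert symmetrization of p(x) = x_1 x_2 over {0,1}^n restricted to
# Hamming weight t (our illustration); the result matches t(t-1)/(n(n-1)),
# a univariate polynomial of degree 2 = deg p.
from itertools import product

n = 5
def symmetrized(t):
    layer = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    return sum(x[0] * x[1] for x in layer) / len(layer)

for t in range(n + 1):
    assert abs(symmetrized(t) - t * (t - 1) / (n * (n - 1))) < 1e-12
print([symmetrized(t) for t in range(n + 1)])  # [0.0, 0.0, 0.1, 0.3, 0.6, 1.0]
```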

Minsky and Papert’s result generalizes to block-symmetric functions:

Proposition 2.11. Let n1 , . . . , nk be positive integers. Let p : Rn1 × · · · × Rnk → R


be a given polynomial. Then the mapping

(t1 , t2 , . . . , tk ) ↦ E_{x1 ∈{0,1}^{n1} |t1 } E_{x2 ∈{0,1}^{n2} |t2 } · · · E_{xk ∈{0,1}^{nk} |tk } p(x1 , x2 , . . . , xk )

is a polynomial on {0, 1, . . . , n1 } × {0, 1, . . . , n2 } × · · · × {0, 1, . . . , nk } of degree at


most deg p.

Proposition 2.11 follows in a straightforward manner from Proposition 2.10 by


induction on the number of blocks k, as pointed out in [39, Proposition 2.3]. The
next result is yet another generalization of Minsky and Papert’s symmetrization
technique, this time to the setting when x1 , x2 , . . . , xn are vectors rather than bits.

Proposition 2.12. Let p : (Rm )n → R be a polynomial of degree d. Then there is
a polynomial p∗ : Rm → R of degree at most d such that for all x1 , x2 , . . . , xn ∈
{e1 , e2 , . . . , em , 0m },

E_{σ∈Sn} p(xσ(1) , xσ(2) , . . . , xσ(n) ) = p∗ (x1 + x2 + · · · + xn ).

Proof. We closely follow an argument due to Ambainis [2, Lemma 3.4], who proved
a related result. Since the components of x1, x2, . . . , xn are Boolean-valued, we
have x_{i,j} = x_{i,j}^2 = x_{i,j}^3 = · · · and therefore we may assume that p is multilinear.
By linearity, it further suffices to consider the case when p is a single monomial:

    p(x1, x2, . . . , xn) = ∏_{j=1}^{m} ∏_{i∈Aj} x_{i,j}                    (2.6)
22 ALEXANDER A. SHERSTOV AND PEI WU

for some sets A1, A2, . . . , Am ⊆ {1, 2, . . . , n} with ∑_{j=1}^{m} |Aj| 6 d. If some pair of sets
Aj, Aj′ with j ≠ j′ has nonempty intersection, then the right-hand side of (2.6)
contains a product of the form x_{i,j} x_{i,j′} for some i, and thus p ≡ 0 on the domain
in question. As a result, the proposition holds with p∗ = 0. In the complementary
case when A1, A2, . . . , Am are pairwise disjoint, we calculate

    E_{σ∈Sn} p(xσ(1), xσ(2), . . . , xσ(n))
      = ∏_{j=1}^{m} E_{σ∈Sn} [ ∏_{i∈Aj} x_{σ(i),j}  |  ∏_{i∈Aj′} x_{σ(i),j′} = 1 for all j′ < j ]
      = ∏_{j=1}^{m} ( (x_{1,j} + x_{2,j} + · · · + x_{n,j}) choose |Aj| ) · ( (n − |A1| − |A2| − · · · − |A_{j−1}|) choose |Aj| )^{−1}.

Expanding out the binomial coefficients shows that the final expression is an m-variate
polynomial whose argument is the vector sum x1 + x2 + · · · + xn ∈ R^m.
Moreover, the degree of this polynomial is ∑_{j=1}^{m} |Aj| 6 d.
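The closed-form product above can be sanity-checked by brute force on a small instance. The snippet below is our own verification (the parameters n, m, the sets Aj, and the inputs xi are arbitrary choices): the left-hand side averages the monomial over all n! permutations, and the right-hand side evaluates the product of ratios of binomial coefficients.

```python
from itertools import permutations
from fractions import Fraction
from math import comb, factorial

n, m = 5, 2
A = [{0, 1}, {2}]                      # pairwise disjoint index sets A_1, A_2
e1, e2, zero = (1, 0), (0, 1), (0, 0)
x = [e1, e1, e2, e2, zero]             # each x_i lies in {e1, e2, 0^m}

def p(xs):
    # the monomial prod_j prod_{i in A_j} x_{i,j}
    val = 1
    for j, Aj in enumerate(A):
        for i in Aj:
            val *= xs[i][j]
    return val

# left-hand side: expectation over a uniformly random permutation
total = sum(p([x[s] for s in sigma]) for sigma in permutations(range(n)))
lhs = Fraction(total, factorial(n))

# right-hand side: the displayed product of binomial-coefficient ratios
col_sums = [sum(xi[j] for xi in x) for j in range(m)]  # columns of x_1 + ... + x_n
rhs = Fraction(1)
used = 0
for j, Aj in enumerate(A):
    rhs *= Fraction(comb(col_sums[j], len(Aj)), comb(n - used, len(Aj)))
    used += len(Aj)

assert lhs == rhs
```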

Corollary 2.13. Let p : (R^m)^n → R be a given polynomial. Then the mapping

    v ↦ E_{x∈{0^m,e1,e2,...,em}^n : x1+x2+···+xn=v} p(x)                    (2.7)

is a polynomial on N^m|6n of degree at most deg p.

Minsky and Papert’s symmetrization corresponds to m = 1 in Corollary 2.13.

Proof of Corollary 2.13. Let v ∈ N^m|6n be given. Then all representations v =
x1 + x2 + · · · + xn with x1, x2, . . . , xn ∈ {0^m, e1, e2, . . . , em} are the same up to the
order of the summands. As a result, (2.7) is the same mapping as

    v ↦ E_{σ∈Sn} p(σ(e1, . . . , e1, e2, . . . , e2, . . . , em, . . . , em, 0^m, 0^m, . . . , 0^m)),

where each ej occurs vj times and 0^m occurs n − v1 − · · · − vm times. By
Proposition 2.12, this is a polynomial in

    e1 + · · · + e1 + e2 + · · · + e2 + · · · + em + · · · + em + 0^m + · · · + 0^m = v

(with the same multiplicities) of degree at most deg p.

It will be helpful to define symmetric versions of basic Boolean functions. We


define AND∗n , OR∗n : {0, 1, 2, . . . , n} → {0, 1} by
    AND∗n(t) = 1 if t = n and 0 otherwise;        OR∗n(t) = 0 if t = 0 and 1 otherwise.

The symmetric variant of the Minsky–Papert function is MP∗m,r = ANDm ◦ OR∗r .
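A minimal executable rendering of these symmetric variants (ours, for illustration only):

```python
def and_star(n, t):
    # AND*_n: 1 iff all n inputs are set, i.e., the weight t equals n
    return 1 if t == n else 0

def or_star(n, t):
    # OR*_n: 1 iff at least one input is set, i.e., the weight t is nonzero
    return 0 if t == 0 else 1

def mp_star(m, r, ts):
    # MP*_{m,r} = AND_m composed with OR*_r on m blocks, each of weight at most r
    assert len(ts) == m and all(0 <= t <= r for t in ts)
    return and_star(m, sum(or_star(r, t) for t in ts))

assert mp_star(3, 4, (1, 4, 2)) == 1   # every block has nonzero weight
assert mp_star(3, 4, (1, 0, 2)) == 0   # one block has weight zero
```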


THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 23

2.7. Communication complexity. An excellent reference on communication com-


plexity is the monograph by Kushilevitz and Nisan [30]. In this overview, we
will limit ourselves to key definitions and notation. We adopt the standard ran-
domized model of multiparty communication, due to Chandra et al. [18]. The
model features ` communicating players, tasked with computing a Boolean func-
tion F : X1 ×X2 ×· · ·×X` → {0, 1} for some finite sets X1 , X2 , . . . , X` . A given input
(x1 , x2 , . . . , x` ) ∈ X1 × X2 × · · · × X` is distributed among the players by placing xi ,
figuratively speaking, on the forehead of the ith player (for i = 1, 2, . . . , `). In other
words, the ith player knows the arguments x1 , . . . , xi−1 , xi+1 , . . . , x` but not xi .
The players communicate by sending broadcast messages, taking turns according
to a protocol agreed upon in advance. Each of them privately holds an unlimited
supply of uniformly random bits, which he can use along with his available argu-
ments when deciding what message to send at any given point in the protocol. The
players’ objective is to compute F (x1, x2, . . . , x`). An ε-error protocol for F is one
which, on every input (x1, x2, . . . , x`), produces the correct answer F (x1, x2, . . . , x`)
with probability at least 1 − ε. The cost of a protocol is the total bit length of the
messages broadcast by all the players in the worst case.¹ The ε-error randomized
communication complexity of F, denoted Rε(F ), is the least cost of an ε-error ran-
domized protocol for F . As a special case of this model for ` = 2, one recovers the
original two-party model of Yao [58] reviewed in the introduction.
Our work focuses on randomized protocols with error probability close to that
of random guessing, 1/2. There are two natural ways to define the communication
complexity of a multiparty problem F in this setting. The communication complex-
ity of F with unbounded error, introduced by Paturi and Simon [38], is the quantity

    UPP(F ) = inf_{0 6 ε < 1/2} Rε(F ).                    (2.8)

Here, the error is unbounded in the sense that it can be arbitrarily close to 1/2.
Babai et al. [5] proposed an alternate quantity, which includes an additive penalty
term that depends on the error probability:
 
    PP(F ) = inf_{0 6 ε < 1/2} ( Rε(F ) + log( 1/(1/2 − ε) ) ).                    (2.9)

This quantity is known as the communication complexity of F with weakly un-


bounded error.
2.8. Discrepancy and sign-rank. An `-dimensional cylinder intersection is a
function χ : X1 × X2 × · · · × X` → {0, 1} of the form

    χ(x1, x2, . . . , x`) = ∏_{i=1}^{`} χi(x1, . . . , xi−1, xi+1, . . . , x`),

where χi : X1 ×· · ·×Xi−1 ×Xi+1 ×· · ·×X` → {0, 1}. In other words, an `-dimensional


cylinder intersection is the product of ` functions with range {0, 1}, where the ith
function does not depend on the ith coordinate but may depend arbitrarily on the
other `−1 coordinates. Introduced by Babai et al. [6], cylinder intersections are the

1 The contribution of a b-bit broadcast to the protocol cost is b rather than ` · b.



fundamental building blocks of communication protocols and for that reason play
a central role in the theory. For a Boolean function F : X1 × X2 × · · · × X` → {0, 1}
and a probability distribution P on X1 × X2 × · · · × X` , the discrepancy of F with
respect to P is given by

    discP(F ) = max_χ | ∑_{x∈X1×X2×···×X`} (−1)^{F(x)} P(x) χ(x) |,

where the maximum is over cylinder intersections χ. The minimum discrepancy


over all distributions is denoted

    disc(F ) = min_P discP(F ).

The discrepancy method [20, 6, 30] is a classic technique that bounds randomized
communication complexity from below in terms of discrepancy.

Theorem 2.14 (Discrepancy method). Let F : X1 × X2 × · · · × X` → {0, 1} be an


`-party communication problem. Then

    2^{Rε(F)} > (1 − 2ε)/disc(F ).

Combining this theorem with the definition of PP(F ) gives the following corollary.

Corollary 2.15. Let F : X1 ×X2 ×· · ·×X` → {0, 1} be an `-party communication


problem. Then

    PP(F ) > log( 2/disc(F ) ).

The sign-rank of a real matrix A ∈ Rn×m with nonzero entries is the least rank
of a matrix B ∈ Rn×m such that sgn Ai,j = sgn Bi,j for all i, j. In general, the
sign-rank of a matrix can be vastly smaller than its rank. For example, consider
the following nonsingular matrices of order n > 3:

    [  1    1   · · ·   1 ]         [  1   −1   · · ·  −1 ]
    [ −1    1   · · ·   1 ]         [ −1    1   · · ·  −1 ]
    [  ⋮         ⋱      ⋮ ]    ,    [  ⋮         ⋱      ⋮ ]
    [ −1   · · ·  −1    1 ]         [ −1   · · ·  −1    1 ]
These matrices have sign-rank at most 2 and 3, respectively. Indeed, the first matrix
has the same sign pattern as [2(j − i) + 1]_{i,j}. The second has the same sign pattern
as [⟨vi, vj⟩ − (1 − ε)]_{i,j}, where v1, v2, . . . , vn ∈ R^2 are arbitrary pairwise distinct
unit vectors and ε is a suitably small positive real, cf. [38, Section 5]. As a matter
of notational convenience, we extend the notion of sign-rank to Boolean functions
f : X × Y → {0, 1} by defining rk±(f ) = rk±(Mf ), where Mf = [(−1)^{f(x,y)}]_{x∈X, y∈Y}
is the matrix associated with f . A remarkable fact, due to Paturi and Simon [38],

is that the sign-rank of a two-party communication problem fully characterizes its


unbounded-error communication complexity.
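The first example can be verified mechanically. The sketch below is our own check (the exact-arithmetic rank routine is an illustration, not from the original text): the n × n matrix with entries +1 for j > i and −1 for j < i has the same sign pattern as [2(j − i) + 1]_{i,j}, and the latter has rank 2, so the sign matrix has sign-rank at most 2.

```python
from fractions import Fraction

def rank(M):
    # exact Gaussian elimination over the rationals
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

n = 6
B = [[2 * (j - i) + 1 for j in range(n)] for i in range(n)]        # rank-2 witness
signs = [[1 if j >= i else -1 for j in range(n)] for i in range(n)]  # target signs

assert rank(B) == 2
assert all((B[i][j] > 0) == (signs[i][j] > 0) for i in range(n) for j in range(n))
```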

Theorem 2.16 (Paturi and Simon). Let F : X × Y → {0, 1} be a given communi-


cation problem. Then

log rk± (F ) 6 UPP(F ) 6 log rk± (F ) + 2.

As Corollary 2.15 and Theorem 2.16 show, the study of communication with
unbounded and weakly unbounded error is in essence the study of discrepancy and
sign-rank. These quantities are difficult to analyze from first principles. The pattern
matrix method, developed in [42, 44], is a technique that transforms lower bounds
for polynomial approximation into bounds on discrepancy, sign-rank, and various
other quantities in communication complexity. For our discrepancy bounds, we
use the following special case of the pattern matrix method [51, Theorem 5.7 and
equation (119)].

Theorem 2.17 (Sherstov). Let f : {0, 1}n → {0, 1} be given. Consider the `-party
communication problem F : ({0, 1}nm )` → {0, 1} given by F = f ◦ NORm ◦ AND` .
Then
    disc(F ) 6 ( c · 2^` · ` / √m )^{deg±(f)/2},

where c > 0 is a constant independent of n, m, `, f.

We note that the case ` = 2 of Theorem 2.17 is vastly easier to prove than the
general statement; this two-party result can be found in [44, Theorem 7.3 and
equation (7.3)]. For our sign-rank lower bounds, we use the following theorem
implicit in [45].

Theorem 2.18 (Sherstov, implicit). Let f : {0, 1}n → {0, 1} be given. Suppose that
deg± (f, γ) > d, where γ and d are positive reals. Fix an integer m > 2 and define
F : {0, 1}mn × {0, 1}mn → {0, 1} by F = f ◦ ORm ◦ AND2 . Then
    rk±(F ) > γ · ⌊m/2⌋^{d/2}.
For the reader’s convenience, we include a detailed proof of Theorem 2.18 in Ap-
pendix A.

3. Auxiliary results
In this section, we collect a number of supporting results on polynomial approxi-
mation that have appeared in one form or another in previous work. For the reader’s
convenience, we provide self-contained proofs whenever the precise formulation that
we need departs from published work.

3.1. Basic dual objects. As described in the introduction, we prove our main
results constructively, by building explicit dual objects that witness the correspond-
ing lower bounds. An important tool in this process is the following lemma due to
Razborov and Sherstov [39]. Informally, it is used to adjust a dual object’s metric

properties while preserving its orthogonality to low-degree polynomials. The lemma


plays a basic role in several recent papers [39, 15, 16, 11, 17] as well as our work.

Lemma 3.1 (Razborov and Sherstov). Fix integers d and n, where 0 6 d < n. Then
there is an (explicitly given) function ζ : {0, 1}n → R such that

    supp ζ ⊆ {0, 1}^n|6d ∪ {1^n},
    ζ(1^n) = 1,
    kζk1 6 1 + 2^d (n choose d),
    orth ζ > d.

In more detail, this result corresponds to taking k = d and ζ = (−1)^n g in the proof
of Lemma 3.2 of [39]. We will need the following symmetrized version of Lemma 3.1.

Lemma 3.2. Fix a point u ∈ Nn and a natural number d < |u|. Then there is
ζu : Nn → R such that

    supp ζu ⊆ {u} ∪ {v ∈ N^n : v 6 u and |v| 6 d},                    (3.1)
    ζu(u) = 1,                                                        (3.2)
    kζuk1 6 1 + 2^d (|u| choose d),                                   (3.3)
    orth ζu > d.                                                      (3.4)

Proof. Lemma 3.1 gives a function ζ : {0, 1}^{|u|} → R such that

    supp ζ ⊆ {0, 1}^{|u|}|6d ∪ {1^{|u|}},                             (3.5)
    ζ(1^{|u|}) = 1,                                                   (3.6)
    kζk1 6 1 + 2^d (|u| choose d),                                    (3.7)
    orth ζ > d.                                                       (3.8)

Now define ζu : N^n → R by

    ζu(v) = ∑_{x1∈{0,1}^{u1}|_{v1}} · · · ∑_{xn∈{0,1}^{un}|_{vn}} ζ(x1 . . . xn),

where we adopt the convention that the set {0, 1}^0 = {0, 1}^0|_0 has as its only
element the empty string, with weight 0. Then properties (3.1)–(3.3) are imme-
diate from (3.5)–(3.7), respectively. To verify the remaining property (3.4), fix a

polynomial p : R^n → R of degree at most d. Then

    ⟨ζu, p⟩ = ∑_{v:v6u} ( ∑_{x1∈{0,1}^{u1}|_{v1}} · · · ∑_{xn∈{0,1}^{un}|_{vn}} ζ(x1 . . . xn) ) p(v1, . . . , vn)
            = ∑_{v:v6u} ∑_{x1∈{0,1}^{u1}|_{v1}} · · · ∑_{xn∈{0,1}^{un}|_{vn}} ζ(x1 . . . xn) p(|x1|, . . . , |xn|)
            = ∑_{x1∈{0,1}^{u1}} · · · ∑_{xn∈{0,1}^{un}} ζ(x1 . . . xn) p(|x1|, . . . , |xn|)
            = 0,

where the last step uses (3.8).

When constructing a dual polynomial for a complicated constant-depth circuit,


it is natural to start with a dual polynomial for the OR function or, equivalently, its
counterpart AND. The first such dual polynomial was constructed by Špalek [57],
with many refinements and generalizations [12, 50, 52, 16, 11] obtained in follow-up
work. We augment this line of work with yet another construction, which delivers
the exact combination of analytic and metric properties that we need.

Theorem 3.3. Let 0 < ε < 1 be given. Then for some constants c′, c″ ∈ (0, 1) and
all integers N > n > 1, there is an (explicitly given) function ψ : {0, 1, 2, . . . , N} →
R such that

    ψ(0) > (1 − ε)/2,
    kψk1 = 1,
    orth ψ > c′√n,
    sgn ψ(t) = (−1)^t,                                          t = 0, 1, 2, . . . , N,
    |ψ(t)| ∈ [ c′/((t + 1)^2 2^{c″t/√n}), 1/(c′(t + 1)^2 2^{c″t/√n}) ],   t = 0, 1, 2, . . . , N.

A self-contained proof of Theorem 3.3 is available in Appendix B.

3.2. Dominant components. We now recall a lemma due to Bun and Thaler [16]
that serves to identify the dominant components of a vector. Its primary use [16, 11]
is to prove concentration-of-measure results for product distributions on Nn .

Lemma 3.4 (Bun and Thaler). Let v ∈ R^n be given, v ≠ 0^n. Then there is S ⊆
{1, 2, . . . , n} such that

    |S| > kvk1/(2kvk∞),
    |S| · min_{i∈S} |vi| > kvk1/(2(1 + ln n)).

Proof (adapted from [16]). By renumbering the indices if necessary, we may as-
sume that |v1| > |v2| > · · · > |vn| > 0. For the sake of contradiction, suppose that
no such set S exists. Then

    |vi| < (1/i) · kvk1/(2(1 + ln n))

for every index i > kvk1/(2kvk∞). As a result,

    kvk1 = ∑_{i<⌈kvk1/(2kvk∞)⌉} |vi| + ∑_{i=⌈kvk1/(2kvk∞)⌉}^{n} |vi|
         6 ∑_{i<⌈kvk1/(2kvk∞)⌉} kvk∞ + ∑_{i=⌈kvk1/(2kvk∞)⌉}^{n} (1/i) · kvk1/(2(1 + ln n))
         < kvk1/2 + ( kvk1/(2(1 + ln n)) ) ∑_{i=1}^{n} 1/i
         6 kvk1,

where the final step uses

    ∑_{i=1}^{n} 1/i = 1 + ∑_{i=2}^{n} 1/i 6 1 + ∫_1^n di/i = 1 + ln n.

We have arrived at kvk1 < kvk1, a contradiction.
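As the proof shows, after sorting |v1| > |v2| > · · · > |vn|, the set S of Lemma 3.4 can be taken to be a prefix {1, 2, . . . , s}. The following sketch (our own check, on arbitrary sample vectors) searches for such a prefix satisfying both guarantees.

```python
from math import log

def dominant_prefix_exists(v):
    # after sorting |v| in nonincreasing order, look for a prefix S = {1,...,s}
    # with s >= ||v||_1/(2 ||v||_inf) and s * |v_s| >= ||v||_1/(2(1 + ln n))
    a = sorted((abs(t) for t in v), reverse=True)
    n = len(a)
    l1, linf = sum(a), a[0]
    for s in range(1, n + 1):
        if s >= l1 / (2 * linf) and s * a[s - 1] >= l1 / (2 * (1 + log(n))):
            return True
    return False

for v in [[5, 1, 1, 1], [1] * 10, [9, 3, 1, 0.5, 0.25]]:
    assert dominant_prefix_exists(v)
```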

We will need a slightly more general statement, which can be thought of as an


extremal analogue of Lemma 3.4.

Lemma 3.5. Fix θ > 0 and let v ∈ Rn be an arbitrary vector with kvk1 > θ. Then
there is S ⊆ {1, 2, . . . , n} such that

    |S| > kvk1/(2kvk∞),                                               (3.9)
    min_{i∈S} |vi| > (1/|S|) · θ/(2(1 + ln n)),                       (3.10)
    ∑_{i∉S} |vi| < θ.                                                 (3.11)

Proof. Fix n, v, and θ for the remainder of the proof. We will refer to a subset
S ⊆ {1, 2, . . . , n} as regular if S satisfies (3.9) and (3.10). Lemma 3.4 along with
kvk1 > θ ensures the existence of at least one regular set. Now, let S be a maximal
regular set. For the sake of contradiction, suppose that (3.11) fails. Applying
Lemma 3.4 to the restriction of v to the complement S̄ = {1, 2, . . . , n} \ S produces
a nonempty set T ⊆ S̄ with

    min_{i∈T} |vi| > (1/|T|) · θ/(2(1 + ln n)).

But then S ∪ T is regular, contradicting the maximality of S.

Lemma 3.5 implies the following concentration-of-measure result for product dis-
tributions on Nn ; cf. Bun and Thaler [16].

Lemma 3.6 (cf. Bun and Thaler). Let λ1, λ2, . . . , λn ∈ D(N) be given with

    λi(t) 6 Cα^t/(t + 1)^2,        t ∈ N,                             (3.12)

where C > 0 and 0 6 α 6 1. Then for all θ > 8Cen(1 + ln n),

    P_{v∼λ1×λ2×···×λn} [ kvk1 > θ ] 6 α^{θ/2}.

Proof (adapted from [16]). For any vector v ∈ N^n with kvk1 > θ/2, Lemma 3.5
guarantees the existence of a nonempty set S ⊆ {1, 2, . . . , n} such that

    min_{i∈S} |vi| > (1/|S|) · θ/(4(1 + ln n)),                       (3.13)
    ∑_{i∉S} |vi| < θ/2.                                               (3.14)

If in addition kvk1 > θ, the second property implies

    ∑_{i∈S} |vi| > θ/2.                                               (3.15)

In what follows, we refer to the combination of properties (3.13) and (3.15) by


saying that v is S-heavy. In this terminology, every vector v ∈ Nn with kvk1 > θ is
S-heavy for some nonempty set S.
Now, consider a random vector v ∈ Nn distributed according to λ1 ×λ2 ×· · ·×λn .
We have

    P_v[kvk1 > θ] 6 P_v[v is S-heavy for some nonempty S]
      6 ∑_{S⊆{1,2,...,n}, S≠∅} P_v[v is S-heavy]
      6 ∑_{S⊆{1,2,...,n}, S≠∅} α^{θ/2} ( ∑_{t > (1/|S|)·θ/(4(1+ln n))} C/(t + 1)^2 )^{|S|}
      6 α^{θ/2} ∑_{S⊆{1,2,...,n}, S≠∅} ( C ∫_{(1/|S|)·θ/(4(1+ln n))}^{∞} dt/t^2 )^{|S|}
      = α^{θ/2} ∑_{S⊆{1,2,...,n}, S≠∅} ( C|S| · 4(1 + ln n)/θ )^{|S|}
      = α^{θ/2} ∑_{s=1}^{n} (n choose s) ( Cs · 4(1 + ln n)/θ )^{s}
      6 α^{θ/2} ∑_{s=1}^{n} ( (en/s) · Cs · 4(1 + ln n)/θ )^{s}
      6 α^{θ/2},

where the first inequality holds by the opening paragraph of the proof; the sec-
ond step applies the union bound; the third step uses 0 6 α 6 1 and the upper
bound (3.12) for the λi; and the last two steps use (2.1) and the hypothesis that
θ > 8Cen(1 + ln n), respectively.

3.3. Input transformation. We work almost exclusively with Boolean functions


on Nn |6θ , for appropriate settings of the dimension parameter n and weight pa-
rameter θ. This choice of domain is admittedly unusual but greatly simplifies the
analysis. Fortunately, approximation-theoretic results obtained in this setting carry
over in a black-box manner to the hypercube. In more detail, we will now prove that
every function on Nn |6θ can be transformed into a function in O(θ log n) Boolean
variables with similar approximation-theoretic properties. Analogous input trans-
formations, with similar proofs, have been used in previous work to translate results
from {0, 1}n |θ or {0, 1}n |6θ to the hypercube setting [16, 11]. The presentation be-
low seems more economical than previous treatments.
Recall that e1 , e2 , . . . , en denote the standard basis for Rn . The following encod-
ing lemma was proved in [52, Lemma 3.1].

Lemma 3.7 (Sherstov). Let n > 1 be a given integer. Then there is a surjection
g : {0, 1}^{6⌈log(n+1)⌉} → {0^n, e1, e2, . . . , en} such that

    E_{g^{−1}(0^n)} p = E_{g^{−1}(e1)} p = E_{g^{−1}(e2)} p = · · · = E_{g^{−1}(en)} p

for every polynomial p of degree at most ⌈log(n + 1)⌉. Moreover, g can be constructed
deterministically in time polynomial in n.

Observe that the points 0n , e1 , e2 , . . . , en in this lemma act simply as labels and
can be replaced with any other tuple of n + 1 distinct points. Indeed, this result
was originally stated in [52] for a different choice of points. A tensor version of
Lemma 3.7 is as follows.

Lemma 3.8. Let g : {0, 1}^{6⌈log(n+1)⌉} → {0^n, e1, e2, . . ., en} be as constructed in
Lemma 3.7. Then for any integer θ > 1 and any polynomial p : (R^{6⌈log(n+1)⌉})^θ → R,
the mapping

    (y1, y2, . . . , yθ) ↦ E_{g^{−1}(y1)×g^{−1}(y2)×···×g^{−1}(yθ)} p

is a polynomial in y ∈ {0^n, e1, e2, . . . , en}^θ of degree at most (deg p)/(⌈log(n + 1)⌉ + 1).



Proof. By linearity, it suffices to prove the lemma for factored polynomials of the
form p(x1, x2, . . . , xθ) = p1(x1)p2(x2) · · · pθ(xθ), where p1, p2, . . . , pθ are real poly-
nomials on R^{6⌈log(n+1)⌉}. For such a polynomial p, the defining equation simplifies
to

    E_{g^{−1}(y1)×g^{−1}(y2)×···×g^{−1}(yθ)} p = ∏_{i=1}^{θ} E_{g^{−1}(yi)} pi.                    (3.16)

We now examine the individual contributions of p1, p2, . . . , pθ to the degree of the
right-hand side as a real polynomial in y. For any polynomial pi of degree at most
⌈log(n + 1)⌉, Lemma 3.7 ensures that the corresponding expectation E_{g^{−1}(yi)} pi is
a constant independent of the input yi. Thus, polynomials pi of degree at most
⌈log(n + 1)⌉ do not contribute to the degree of the right-hand side of (3.16). For
the other polynomials pi, the expectation E_{g^{−1}(yi)} pi is a linear polynomial in yi,
namely,

    E_{g^{−1}(yi)} pi = yi,1 E_{g^{−1}(e1)} pi + yi,2 E_{g^{−1}(e2)} pi + · · · + yi,n E_{g^{−1}(en)} pi
                        + ( 1 − ∑_{j=1}^{n} yi,j ) E_{g^{−1}(0^n)} pi,

where we are crucially exploiting the fact that yi ∈ {0^n, e1, e2, . . . , en}. Thus,
polynomials pi of degree greater than ⌈log(n + 1)⌉ contribute at most 1 each to
the degree. Summarizing, the right-hand side of (3.16) is a real polynomial in
y1, y2, . . . , yθ of degree at most

    |{i : deg pi > ⌈log(n + 1)⌉ + 1}| 6 (deg p)/(⌈log(n + 1)⌉ + 1),

as was to be shown.
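The linearity step above rests on the fact that any function of a point y ∈ {0^n, e1, . . . , en} coincides with a degree-1 polynomial in the coordinates of y. A minimal check (ours; the function values below are arbitrary):

```python
n = 5
points = [tuple(0 for _ in range(n))] + [
    tuple(1 if k == j else 0 for k in range(n)) for j in range(n)
]
f = {y: (7 * i * i + 3) % 11 for i, y in enumerate(points)}  # arbitrary values

def linear_extension(y):
    # the degree-1 polynomial  sum_j y_j f(e_j) + (1 - sum_j y_j) f(0^n)
    zero = points[0]
    return sum(y[j] * f[points[j + 1]] for j in range(n)) + (1 - sum(y)) * f[zero]

# the linear polynomial agrees with f on all of {0^n, e1, ..., en}
assert all(linear_extension(y) == f[y] for y in points)
```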

We have reached the claimed result on input transformation.

Theorem 3.9. Let n, θ > 1 be given integers. Set N = 6⌈log(n + 1)⌉θ. Then there
is a surjection G : {0, 1}^N → N^n|6θ such that:
    (i) for every polynomial p : R^N → R, the mapping v ↦ E_{G^{−1}(v)} p is a polyno-
        mial on N^n|6θ of degree at most (deg p)/(⌈log(n + 1)⌉ + 1);
    (ii) for every coordinate i = 1, 2, . . . , n, the mapping x ↦ OR∗θ(G(x)i) is com-
        putable by an explicitly given DNF formula with O(θn^6) terms, each with
        at most 6⌈log(n + 1)⌉ variables.

Applying Theorem 3.9 to a function f : N^n|6θ → {0, 1} produces a composed func-
tion f ◦ G : {0, 1}^{6⌈log(n+1)⌉θ} → {0, 1} in the hypercube setting. The theorem ensures
that lower bounds for the pointwise approximation, or sign-representation, of f
apply to f ◦ G as well. Moreover, the circuit complexity of f ◦ G is only slightly
ply to f ◦ G as well. Moreover, the circuit complexity of f ◦ G is only slightly
higher than that of f. This way, Theorem 3.9 efficiently transfers approximation-
theoretic results from Nn |6θ (or any subset thereof, such as {0, 1}n |6θ or Nn |θ ) to
the traditional setting of the hypercube.

Proof of Theorem 3.9. Define G : ({0, 1}^{6⌈log(n+1)⌉})^θ → N^n|6θ by

    G(x1, x2, . . . , xθ) = g(x1) + g(x2) + · · · + g(xθ),

where g : {0, 1}^{6⌈log(n+1)⌉} → {0^n, e1, e2, . . . , en} is as constructed in Lemma 3.7.


The surjectivity of G follows trivially from that of g. We proceed to verify the
additional properties required of G.
(i) For v ∈ N^n|6θ, we have the partition

    G^{−1}(v) = ⋃_{y∈{0^n,e1,e2,...,en}^θ : y1+y2+···+yθ=v} g^{−1}(y1) × g^{−1}(y2) × · · · × g^{−1}(yθ).    (3.17)

All representations v = y1 + y2 + · · · + yθ with y1 , y2 , . . . , yθ ∈ {0n , e1 , e2 , . . . , en }


are the same up to the order of the summands. As a result, each part g −1 (y1 ) ×
g −1 (y2 ) × · · · × g −1 (yθ ) in the partition on the right-hand side of (3.17) has the
same cardinality. We conclude that for any given polynomial p,

    E_{G^{−1}(v)} p = E_{y∈{0^n,e1,e2,...,en}^θ : y1+y2+···+yθ=v} [ E_{g^{−1}(y1)×g^{−1}(y2)×···×g^{−1}(yθ)} p ].    (3.18)

Recall from Lemma 3.8 that the rightmost expectation in this equation is a polyno-
mial in y1, y2, . . . , yθ ∈ {0^n, e1, e2, . . . , en} of degree at most (deg p)/(⌈log(n + 1)⌉ + 1).
As a result, Corollary 2.13 implies that the right-hand side of (3.18) is a polynomial
in v of degree at most (deg p)/(⌈log(n + 1)⌉ + 1).
(ii) Fix an index i. Then

    OR∗θ(G(x)i) = ⋁_{j=1}^{θ} I[g(xj) = ei].

Each of the disjuncts on the right-hand side is a function of 6dlog(n + 1)e Boolean
variables. Therefore, OR∗θ (G(x)i ) is representable by a DNF formula with O(θn6 )
terms, each with at most 6dlog(n + 1)e variables.

4. The threshold degree of AC0


This section is devoted to our results on threshold degree. While we are mainly
interested in the threshold degree of AC0 , the techniques developed here apply to a
much broader class of functions. Specifically, we prove an amplification theorem that
takes an arbitrary function f and builds from it a function F with higher threshold
degree. We give analogous amplification theorems for various other approximation-
theoretic quantities. The transformation f 7→ F is efficient with regard to circuit
depth and size and in particular preserves membership in AC0 . To deduce our main
results for AC0 , we start with a single-gate circuit and iteratively apply the ampli-
fication theorem to produce constant-depth circuits of higher and higher threshold
degree. We develop this general machinery in Sections 4.1–4.3, followed by the
applications to AC0 in Sections 4.4 and 4.5.

4.1. Shifting probability mass in product distributions. Consider a prod-


uct distribution Λ on Nn whereby every component is concentrated near 0. The

centerpiece of our threshold degree analysis, presented here, is the construction of


an associated probability distribution Λ̃ that is supported entirely on inputs of low
weight and cannot be distinguished from Λ by a low-degree polynomial. More for-
mally, define B(r, c, α) to be the family of probability distributions λ on N such
that

    supp λ = {0, 1, 2, . . . , r′}

for some nonnegative integer r′ 6 r, and in addition

    c^{t+1}/((t + 1)^2 2^{αt}) 6 λ(t) 6 1/(c(t + 1)^2 2^{αt}),        t ∈ supp λ.    (4.1)
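To see that B(r, c, α) is nonempty for suitable parameters, one can normalize the weights 2^{−αt}/(t + 1)^2 on {0, 1, . . . , r} and check the two-sided bound (4.1). The sketch below is our own illustration; the particular choice c = 1/2 is an assumption that happens to satisfy the bounds for these parameters.

```python
r, alpha, c = 6, 0.5, 0.5
w = [2 ** (-alpha * t) / (t + 1) ** 2 for t in range(r + 1)]
Z = sum(w)
lam = [wt / Z for wt in w]   # a probability distribution on {0, 1, ..., r}

assert abs(sum(lam) - 1) < 1e-12
for t in range(r + 1):
    lower = c ** (t + 1) / ((t + 1) ** 2 * 2 ** (alpha * t))
    upper = 1 / (c * (t + 1) ** 2 * 2 ** (alpha * t))
    assert lower <= lam[t] <= upper   # the pointwise constraints (4.1)
```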

Distributions in this family are subject to pointwise constraints, hence the symbol
B for “bounded.” Our bounding functions are motivated mainly by the metric
properties of the dual polynomial for ORn , constructed in Theorem 3.3.
In this notation, our analysis handles any distribution Λ ∈ B(r, c, α)⊗n . It would
be possible to generalize our work further, but the lower and upper bounds in (4.1)
are already exponentially far apart and capture a much larger class of probability
distributions than what we need for the applications to AC0 . The precise statement
of our result is as follows.

Theorem 4.1. Let Λ ∈ B(r, c, α)⊗n be given, for some integer r > 0 and reals
c > 0 and α > 0. Let d and θ be positive integers with

    θ > 2d,                                                           (4.2)
    θ > 4en(1 + ln n)/c^2.                                            (4.3)

Then there is a function Λ̃ : Nn → R such that

supp Λ̃ ⊆ (supp Λ)|<2θ , (4.4)


orth(Λ − Λ̃) > d, (4.5)
    |Λ − Λ̃| 6 ( (8nr/c)^d · 2^{−⌈θ/r⌉−α⌈θ/2⌉+2} ) Λ on N^n|<2θ.       (4.6)

In general, the function Λ̃ constructed in Theorem 4.1 may not be a probability


distribution. However, when θ is large enough relative to the other parameters, the
pointwise property (4.6) forces |Λ − Λ̃| 6 Λ on the support of Λ̃, and in particular
Λ̃ > 0. Since orth(Λ − Λ̃) > 0 by construction, Proposition 2.4 guarantees that Λ̃
is a probability distribution in that case.
Proof of Theorem 4.1. For c > 1, we have B(r, c, α) = ∅ and the theorem holds
vacuously. Another degenerate possibility is r = 0, in which case Λ is the single-
point distribution on 0n , and therefore it suffices to take Λ̃ = Λ. In what follows,
we treat the general case when

c ∈ (0, 1],
r > 1.

For every vector v ∈ Nn with kvk1 > θ, let S(v) ⊆ {1, 2, . . . , n} denote the
corresponding subset identified by Lemma 3.5. To restate the lemma’s guarantees,

    |S(v)| > θ/r,                                  v ∈ (supp Λ)|>2θ,  (4.7)
    min_{i∈S(v)} vi > θ/(2|S(v)|(1 + ln n)),       v ∈ (supp Λ)|>2θ,  (4.8)
    kv|_{S̄(v)}k1 < θ,                              v ∈ (supp Λ)|>2θ,  (4.9)

where S̄(v) = {1, 2, . . . , n} \ S(v). Property (4.9) implies that

    kv|_{S(v)}k1 > θ,                              v ∈ (supp Λ)|>2θ,  (4.10)

and in particular

    kv|_{S(v)}k1 > d,                              v ∈ (supp Λ)|>2θ.  (4.11)

For each i = 1, 2, . . . , n and each u ∈ N^i|>d, Lemma 3.2 gives a function ζu : N^i → R
such that

    supp ζu ⊆ {u} ∪ {v ∈ N^i : v 6 u and |v| 6 d},                    (4.12)
    ζu(u) = 1,                                                        (4.13)
    kζuk1 6 1 + 2^d (kuk1 choose d),                                  (4.14)
    orth ζu > d,                                                      (4.15)

and in particular

    kζuk∞ 6 max{|ζu(u)|, kζuk1 − |ζu(u)|} 6 2^d (kuk1 choose d) 6 2kuk1^d.    (4.16)

The central object of study in our proof is the following function ζ : N^n → R,
built from the auxiliary objects S(v) and ζu just introduced:

    ζ(x) = ∑_{v∈(supp Λ)|>2θ} Λ(v) ζ_{v|S(v)}(x|_{S(v)}) I[x|_{S̄(v)} = v|_{S̄(v)}].    (4.17)

The expression on the right-hand side is well-formed because, to restate (4.11), each
string v|_{S(v)} has weight greater than d and can therefore be used as a subscript in
ζ_{v|S(v)}. Specializing (4.15) and (4.16),

    orth ζ_{v|S(v)} > d,                           v ∈ (supp Λ)|>2θ,  (4.18)
    kζ_{v|S(v)}k∞ 6 2(nr)^d,                       v ∈ (supp Λ)|>2θ.  (4.19)

Property (4.12) ensures that ζ_{v|S(v)}(x|_{S(v)}) I[x|_{S̄(v)} = v|_{S̄(v)}] ≠ 0 only when x 6 v.
It follows that

    supp ζ ⊆ ⋃_{v∈supp Λ} {x ∈ N^n : x 6 v} = supp Λ,                 (4.20)

where the second step is valid because Λ ∈ B(r, c, α)^{⊗n}.


Before carrying on with the proof, we take a moment to simplify the defining
expression for ζ. For any v ∈ Nn |>2θ , we have

ζv|S(v) (x|S(v) ) I[x|S(v) = v|S(v) ]


= ζv|S(v) (x|S(v) ) I[x|S(v) = v|S(v) or kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
= ζv|S(v) (x|S(v) )(I[x|S(v) = v|S(v) ] + I[kx|S(v) k1 6 d])I[x|S(v) = v|S(v) ]
= ζv|S(v) (x|S(v) )I[x = v]
+ ζv|S(v) (x|S(v) )I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
= I[x = v] + ζv|S(v) (x|S(v) )I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ],

where the first, second, and fourth steps are valid by (4.12), (4.11), and (4.13),
respectively. Making this substitution in the defining equation for ζ,
X
ζ(x) = Λ(v)ζv|S(v) (x|S(v) )I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
v∈(supp Λ)|>2θ
X
+ Λ(v)I[x = v]. (4.21)
v∈(supp Λ)|>2θ

We proceed to establish key properties of ζ.

Step 1: Orthogonality. By Proposition 2.1(ii), each term in the summation


on the right-hand side of (4.17) is a function orthogonal to polynomials of degree
less than orth ζv|S(v) . Therefore,

    orth ζ > min_{v∈(supp Λ)|>2θ} orth ζ_{v|S(v)} > d,                (4.22)

where the first step uses Proposition 2.1(i), and the second step applies (4.18).

Step 2: Heavy inputs. We now examine the behavior of ζ on inputs of weight


at least 2θ, which we think of as “heavy.” For any string v ∈ (supp Λ)|>2θ , we have

    x ∈ N^n|>2θ ⟹ kxk1 > d + θ
                ⟹ kx|_{S(v)}k1 > d ∨ kx|_{S̄(v)}k1 > θ
                ⟹ kx|_{S(v)}k1 > d ∨ x|_{S̄(v)} ≠ v|_{S̄(v)},

where the final implication uses (4.9). We conclude that the first summation
in (4.21) vanishes on Nn |>2θ , so that

ζ(x) = Λ(x), x ∈ Nn |>2θ . (4.23)

This completes the analysis of heavy inputs.

Step 3: Light inputs. We now turn to inputs of weight less than 2θ, the most
technical part of the proof. Fix an arbitrary string x ∈ (supp Λ)|<2θ . Then


    |ζ(x)|/Λ(x) = | ∑_{v∈(supp Λ)|>2θ} (Λ(v)/Λ(x)) ζ_{v|S(v)}(x|_{S(v)}) I[kx|_{S(v)}k1 6 d] I[x|_{S̄(v)} = v|_{S̄(v)}] |
      6 ∑_{v∈(supp Λ)|>2θ} (Λ(v)/Λ(x)) |ζ_{v|S(v)}(x|_{S(v)})| I[kx|_{S(v)}k1 6 d] I[x|_{S̄(v)} = v|_{S̄(v)}]
      6 2(nr)^d ∑_{v∈(supp Λ)|>2θ} (Λ(v)/Λ(x)) I[kx|_{S(v)}k1 6 d] I[x|_{S̄(v)} = v|_{S̄(v)}]
      = 2(nr)^d ∑_{S⊆{1,...,n}: |S|>θ/r} I[kx|_Sk1 6 d] ∑_{v∈(supp Λ)|>2θ: S(v)=S} (Λ(v)/Λ(x)) I[x|_{S̄} = v|_{S̄}]
      6 2(nr)^d ∑_{S⊆{1,...,n}: |S|>θ/r} I[kx|_Sk1 6 d] ∑_{v∈N^n: ∑_{i∈S} vi>θ, min_{i∈S} vi>θ/(2|S|(1+ln n))} (Λ(v)/Λ(x)) I[x|_{S̄} = v|_{S̄}],
                                                                      (4.24)

where the first step uses (4.21); the second step applies the triangle inequality; the
third step is valid by (4.19); the fourth step amounts to collecting terms according
to S(v), which by (4.7) has cardinality at least θ/r; and the fifth step uses (4.8)
and (4.10).
Bounding (4.24) requires a bit of work. To start with, write Λ = λ1 × λ2 × · · · × λn
for some λ1, λ2, . . . , λn ∈ B(r, c, α). Then for every nonempty set S ⊆ {1, 2, . . . , n},

    I[kx|_Sk1 6 d] ∏_{i∈S} 1/λi(xi)
      6 I[kx|_Sk1 6 d] ∏_{i∈S} (xi + 1)^2 2^{αxi} / c^{xi+1}
      = I[kx|_Sk1 6 d] c^{−|S|} (2^α/c)^{∑_{i∈S} xi} ∏_{i∈S} (xi + 1)^2
      6 I[kx|_Sk1 6 d] c^{−|S|} (2^α e^2/c)^{∑_{i∈S} xi}
      6 c^{−|S|} (2^α e^2/c)^d,                                       (4.25)

where the first step applies the definition of B(r, c, α), and the third step uses the
bound 1 + t 6 e^t for real t. Continuing,
    ∑_{v∈N^n: ∑_{i∈S} vi>θ, min_{i∈S} vi>θ/(2|S|(1+ln n))} (Λ(v)/Λ(x)) I[x|_{S̄} = v|_{S̄}]
      = ∑_{v∈N^n: ∑_{i∈S} vi>θ, min_{i∈S} vi>θ/(2|S|(1+ln n)), vi=xi for i∉S} ∏_{i∈S} λi(vi)/λi(xi)
      6 ∑_{v∈N^n: ∑_{i∈S} vi>θ, min_{i∈S} vi>θ/(2|S|(1+ln n)), vi=xi for i∉S} 2^{−α∑_{i∈S} vi} ∏_{i∈S} 1/(c(vi + 1)^2 λi(xi))
      6 2^{−αθ} ∑_{v∈N^n: min_{i∈S} vi>θ/(2|S|(1+ln n)), vi=xi for i∉S} ∏_{i∈S} 1/(c(vi + 1)^2 λi(xi))
      = 2^{−αθ} ( ∑_{t=⌈θ/(2|S|(1+ln n))⌉}^{∞} 1/(c(t + 1)^2) )^{|S|} ∏_{i∈S} 1/λi(xi)
      6 2^{−αθ} ( ∫_{⌈θ/(2|S|(1+ln n))⌉}^{∞} dt/(ct^2) )^{|S|} ∏_{i∈S} 1/λi(xi)
      6 2^{−αθ} ( 2|S|(1 + ln n)/(cθ) )^{|S|} ∏_{i∈S} 1/λi(xi),       (4.26)

where the first step uses Λ = λ1 × · · · × λn, and the second step applies the definition of
B(r, c, α).
It remains to put together the bounds obtained so far. We have:

    |ζ(x)|/Λ(x) 6 2(nr)^d ∑_{S⊆{1,...,n}: |S|>θ/r} I[kx|_Sk1 6 d] · 2^{−αθ} ( 2|S|(1 + ln n)/(cθ) )^{|S|} ∏_{i∈S} 1/λi(xi)
      6 2(nr)^d ∑_{S⊆{1,...,n}: |S|>θ/r} 2^{−αθ} ( 2|S|(1 + ln n)/(c^2 θ) )^{|S|} (2^α e^2/c)^d
      6 2 · ( (e^2 nr/c)^d / 2^{α⌈θ/2⌉} ) ∑_{S⊆{1,...,n}: |S|>θ/r} ( 2|S|(1 + ln n)/(c^2 θ) )^{|S|}
      = 2 · ( (e^2 nr/c)^d / 2^{α⌈θ/2⌉} ) ∑_{s=⌈θ/r⌉}^{n} (n choose s) ( 2s(1 + ln n)/(c^2 θ) )^{s}
      6 2 · ( (e^2 nr/c)^d / 2^{α⌈θ/2⌉} ) ∑_{s=⌈θ/r⌉}^{n} ( (en/s) · 2s(1 + ln n)/(c^2 θ) )^{s}
      6 2 · ( (e^2 nr/c)^d / 2^{α⌈θ/2⌉} ) ∑_{s=⌈θ/r⌉}^{∞} 2^{−s}
      = 4 · (e^2 nr/c)^d / 2^{α⌈θ/2⌉+⌈θ/r⌉},

where the first step follows from (4.24) and (4.26); the second step substitutes the
bound from (4.25); the third step uses (4.2); and the next-to-last step uses (4.3).
In summary, we have shown that

    |ζ(x)| 6 4 · ( (e^2 nr/c)^d / 2^{α⌈θ/2⌉+⌈θ/r⌉} ) Λ(x),    x ∈ (supp Λ)|<2θ.    (4.27)

Step 4: Finishing the proof. Define Λ̃ = Λ − ζ. Then the support prop-


erty (4.4) follows from (4.20) and (4.23); the analytic indistinguishability prop-
erty (4.5) follows from (4.22); and the pointwise property (4.6) follows from (4.20)
and (4.27).

We record a generalization of Theorem 4.1 to translates of probability distri-
butions in B(r, c, α)^{⊗n}, and further to convex combinations of such distributions.
Formally, define B(r, c, α, ∆) for ∆ > 0 to be the family of probability distributions
λ on N such that λ(t) ≡ λ′(t − a) for some λ′ ∈ B(r, c, α) and a ∈ [0, ∆]. We have:

Corollary 4.2. Let $\Lambda\in\operatorname{conv}(B(r,c,\alpha,\Delta)^{\otimes n})$ be given, for some integers $r,\Delta\ge0$ and reals $c>0$ and $\alpha>0$. Let $d$ and $\theta$ be positive integers with
$$\theta\ge 2d, \tag{4.28}$$
$$\theta\ge\frac{4en(1+\ln n)}{c^2}, \tag{4.29}$$
$$2^{\lceil\theta/r\rceil+\alpha\lceil\theta/2\rceil}\ \ge\ 4\left(\frac{8nr}{c}\right)^{d}. \tag{4.30}$$
Then there is a probability distribution $\tilde\Lambda$ such that
$$\operatorname{supp}\tilde\Lambda\subseteq(\operatorname{supp}\Lambda)|_{<2\theta+n\Delta}, \tag{4.31}$$
$$\operatorname{orth}(\Lambda-\tilde\Lambda)\ \ge\ d. \tag{4.32}$$

Proof. We first consider the special case when $\Lambda\in B(r,c,\alpha,\Delta)^{\otimes n}$. Then by definition, $\Lambda(t_1,\dots,t_n)=\Lambda'(t_1-a_1,\dots,t_n-a_n)$ for some probability distribution $\Lambda'\in B(r,c,\alpha)^{\otimes n}$ and integers $a_1,\dots,a_n\in[0,\Delta]$. Applying Theorem 4.1 to $\Lambda'$ yields a function $\tilde\Lambda'\colon\mathbb{N}^n\to\mathbb{R}$ with
$$\operatorname{supp}\tilde\Lambda'\subseteq(\operatorname{supp}\Lambda')|_{<2\theta}, \tag{4.33}$$
$$\operatorname{orth}(\Lambda'-\tilde\Lambda')\ \ge\ d, \tag{4.34}$$
$$|\Lambda'-\tilde\Lambda'|\le\Lambda'\ \text{on}\ \operatorname{supp}\tilde\Lambda'. \tag{4.35}$$
The last property implies in particular that $\tilde\Lambda'$ is a nonnegative function. As a result, (4.34) and Proposition 2.4 guarantee that $\tilde\Lambda'$ is a probability distribution. Now the sought properties (4.31) and (4.32) follow from (4.33) and (4.34), respectively, for the probability distribution $\tilde\Lambda(t_1,\dots,t_n)=\tilde\Lambda'(t_1-a_1,\dots,t_n-a_n)$.
In the general case of a convex combination Λ = λ1 Λ1 + · · · + λk Λk of probability
distributions Λ1 , . . . , Λk ∈ B(r, c, α, ∆)⊗n , one uses the technique of the previous
paragraph to transform Λ1 , . . . , Λk individually into corresponding probability dis-
tributions Λ̃1 , . . . , Λ̃k , and takes Λ̃ = λ1 Λ̃1 + · · · + λk Λ̃k .

4.2. A bounded dual polynomial for MP. We now turn to the construction of a gadget for our amplification theorem. Let $B^*(r,c,\alpha)$ denote the family of probability distributions $\lambda$ on $\mathbb{N}$ such that
$$\operatorname{supp}\lambda=\{0,1,2,\dots,r'\}$$
for some nonnegative integer $r'\le r$, and moreover
$$\frac{c}{(t+1)^2\,2^{\alpha t}}\ \le\ \lambda(t)\ \le\ \frac{1}{c(t+1)^2\,2^{\alpha t}},\qquad t\in\operatorname{supp}\lambda.$$
In this family, a distribution's weight at any given point is prescribed up to the multiplicative constant $c$, in contrast to the exponentially large range allowed in the definition of $B(r,c,\alpha)$. For all parameter settings, we have
$$B^*(r,c,\alpha)\subseteq B(r,c,\alpha). \tag{4.36}$$
Indeed, the containment holds trivially for $c\le1$, and remains valid for $c>1$ because the left-hand side and right-hand side are both empty in that case. As before, it will be helpful to have shorthand notation for translates of distributions in $B^*(r,c,\alpha)$: we define $B^*(r,c,\alpha,\Delta)$ for $\Delta\ge0$ to be the family of probability distributions $\lambda$ on $\mathbb{N}$ such that $\lambda(t)=\lambda'(t-a)$ for some $\lambda'\in B^*(r,c,\alpha)$ and $a\in[0,\Delta]$.
As our next step toward analyzing the threshold degree of AC0 , we will construct
a dual object that witnesses the high threshold degree of MP∗m,r and possesses
additional metric properties in the sense of B∗ . To simplify the exposition, we
start with an auxiliary construction.

Lemma 4.3. Let $0<\varepsilon<1$ be given. Then for some constants $c_1,c_2\in(0,1)$ and all integers $R\ge r\ge1$, there are (explicitly given) probability distributions $\lambda_0,\lambda_1,\lambda_2$ such that:
$$\operatorname{supp}\lambda_0=\{0\}, \tag{4.37}$$
$$\operatorname{supp}\lambda_i=\{1,2,\dots,R\},\qquad i=1,2, \tag{4.38}$$
$$\lambda_i\in B^*\!\left(R,\,c_1,\,\frac{c_2}{\sqrt r},\,1\right),\qquad i=0,1,2, \tag{4.39}$$
$$\operatorname{orth}\bigl((1-\varepsilon)\lambda_0+\varepsilon\lambda_2-\lambda_1\bigr)\ \ge\ c_1\sqrt r. \tag{4.40}$$

Our analysis of the threshold degree of AC0 only uses the special case R = r of
Lemma 4.3. The more general formulation with R > r will be needed much later,
in the analysis of the sign-rank of AC0 .

Proof. Theorem 3.3 constructs a function $\psi\colon\{0,1,2,\dots,R\}\to\mathbb{R}$ such that
$$\psi(0)>\frac{1-\frac{\varepsilon}{2}}{2}, \tag{4.41}$$
$$\|\psi\|_1=1, \tag{4.42}$$
$$\operatorname{orth}\psi\ \ge\ c'\sqrt r, \tag{4.43}$$
$$|\psi(t)|\in\left[\frac{c'}{(t+1)^2\,2^{c''t/\sqrt r}},\ \frac{1}{c'(t+1)^2\,2^{c''t/\sqrt r}}\right],\qquad t=0,1,\dots,R, \tag{4.44}$$

for some constants $c',c''\in(0,1)$ independent of $R,r$. Property (4.42) makes it possible to view $|\psi|$ as a probability distribution on $\{0,1,2,\dots,R\}$. Let $\mu_0,\mu_1,\mu_2$ be the probability distributions induced by $|\psi|$ on $\{0\}$, $\{t\ne0:\psi(t)<0\}$, and $\{t\ne0:\psi(t)>0\}$, respectively. It is clear from (4.41) that the negative part of $\psi$ is a multiple of $\mu_1$, whereas the positive part of $\psi$ is a nonnegative linear combination of $\mu_0$ and $\mu_2$. Moreover, it follows from $\langle\psi,1\rangle=0$ and $\|\psi\|_1=1$ that the positive and negative parts of $\psi$ both have $\ell_1$-norm $1/2$. Summarizing,
$$\psi=\frac{1-\delta}{2}\,\mu_0-\frac12\,\mu_1+\frac{\delta}{2}\,\mu_2 \tag{4.45}$$
for some $0\le\delta\le1$. In view of (4.41), we infer the more precise bound
$$0\le\delta<\frac{\varepsilon}{2}. \tag{4.46}$$
We define
$$\lambda_0=\mu_0, \tag{4.47}$$
$$\lambda_1=\frac{1-\delta\varepsilon}{1-\delta^2}\,\mu_1+\frac{\delta(\varepsilon-\delta)}{1-\delta^2}\,\mu_2, \tag{4.48}$$
$$\lambda_2=\frac{\varepsilon-\delta}{\varepsilon(1-\delta^2)}\,\mu_1+\frac{\delta(1-\delta\varepsilon)}{\varepsilon(1-\delta^2)}\,\mu_2. \tag{4.49}$$
It follows from $0\le\delta\le\varepsilon$ that $\lambda_1$ and $\lambda_2$ are convex combinations of $\mu_1$ and $\mu_2$ and are therefore probability distributions with support
$$\operatorname{supp}\lambda_i\subseteq\{1,2,\dots,R\},\qquad i=1,2. \tag{4.50}$$
Recall from (4.45) that $|\psi|=\frac12\mu_1+\frac{\delta}{2}\mu_2$ on $\{1,2,\dots,R\}$. Comparing the coefficients in $|\psi|=\frac12\mu_1+\frac{\delta}{2}\mu_2$ with the corresponding coefficients in the defining equations for $\lambda_1$ and $\lambda_2$, where $0\le\delta<\varepsilon/2$ by (4.46), we conclude that $\lambda_1,\lambda_2\in[c'''|\psi|,\,|\psi|/c''']$ on $\{1,2,\dots,R\}$ for some constant $c'''=c'''(\varepsilon)\in(0,1)$. In view of (4.44), we arrive at
$$\lambda_i(t)\in\left[\frac{c'c'''}{(t+1)^2\,2^{c''t/\sqrt r}},\ \frac{1}{c'c'''(t+1)^2\,2^{c''t/\sqrt r}}\right],\qquad i=1,2;\ t=1,2,\dots,R. \tag{4.51}$$

Continuing,
$$\operatorname{orth}\bigl((1-\varepsilon)\lambda_0+\varepsilon\lambda_2-\lambda_1\bigr)
=\operatorname{orth}\left(2\cdot\frac{1-\varepsilon}{1-\delta}\left(\frac{1-\delta}{2}\mu_0-\frac12\mu_1+\frac{\delta}{2}\mu_2\right)\right)
=\operatorname{orth}\left(2\cdot\frac{1-\varepsilon}{1-\delta}\cdot\psi\right)
\ \ge\ c'\sqrt r, \tag{4.52}$$
where the first step follows from the defining equations (4.47)–(4.49), the second step uses (4.45), and the final step is a restatement of (4.43).
We are now in a position to verify the claimed properties of $\lambda_0,\lambda_1,\lambda_2$ in the theorem statement. Property (4.37) follows from (4.47), whereas property (4.38) is immediate from (4.50) and (4.51). The remaining properties (4.39) and (4.40) for $c_2=c''$ and a small enough constant $c_1>0$ now follow from (4.51) and (4.52), respectively.

We are now in a position to construct our desired dual polynomial for the Minsky–Papert function.

Theorem 4.4. For some absolute constants $c_1,c_2\in(0,1)$ and all positive integers $m$ and $r$, there are probability distributions $\Lambda_0,\Lambda_1$ such that
$$\Lambda_i\in\operatorname{conv}\!\left(B^*\!\left(r,\,c_1,\,\frac{c_2}{\sqrt r},\,1\right)^{\otimes m}\right),\qquad i=0,1, \tag{4.53}$$
$$\operatorname{supp}\Lambda_i\subseteq(\mathrm{MP}^*_{m,r})^{-1}(i),\qquad i=0,1, \tag{4.54}$$
$$\operatorname{orth}(\Lambda_1-\Lambda_0)\ \ge\ \min\{m,\,c_1\sqrt r\}. \tag{4.55}$$

The last two properties in the theorem statement are equivalent, in the sense of linear programming duality, to the lower bound $\deg_\pm(\mathrm{MP}^*_{m,r})\ge\min\{m,c_1\sqrt r\}$ and can be recovered in a black-box manner from many previous papers, e.g., [32, 42, 50]. The key additional property that we prove is (4.53), which is where the newly established Lemma 4.3 plays an essential role.
Proof of Theorem 4.4. Take $\varepsilon=1/2$ and $R=r$ in Lemma 4.3, and let $\lambda_0,\lambda_1,\lambda_2$ be the resulting probability distributions. Let
$$\Lambda_0=\mathop{\mathbf{E}}_{\substack{S\subseteq\{1,2,\dots,m\}:\\ |S|\ \text{odd}}}\ \lambda_0^{\otimes S}\cdot\lambda_2^{\otimes\overline S},$$
$$\Lambda_1=\lambda_1^{\otimes m}.$$

Then (4.53) is immediate from (4.39), whereas (4.54) follows from (4.37) and (4.38). To verify the remaining property (4.55), rewrite
$$\Lambda_0=2^{-m+1}\sum_{\substack{S\subseteq\{1,2,\dots,m\}:\\ |S|\ \text{odd}}}\lambda_0^{\otimes S}\cdot\lambda_2^{\otimes\overline S}
=\left(\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}-\left(-\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}.$$
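The rewriting of $\Lambda_0$ is a binomial-type identity: in the difference of tensor powers, the coefficient of $\lambda_0^{\otimes S}\cdot\lambda_2^{\otimes\overline S}$ is $2^{-m}(1-(-1)^{|S|})$, which vanishes for $|S|$ even. A brute-force numerical check on a small instance (arbitrary nonnegative vectors standing in for $\lambda_0,\lambda_2$; a sanity check only, not part of the proof):

```python
import itertools, random

def tensor_value(factors, point):
    # value of factors[0] ⊗ ... ⊗ factors[m-1] at the tuple `point`
    result = 1.0
    for f, t in zip(factors, point):
        result *= f[t]
    return result

random.seed(0)
m, dom = 3, range(4)
lam0 = [random.random() for _ in dom]
lam2 = [random.random() for _ in dom]
plus = [0.5 * a + 0.5 * b for a, b in zip(lam0, lam2)]
minus = [-0.5 * a + 0.5 * b for a, b in zip(lam0, lam2)]

for point in itertools.product(dom, repeat=m):
    odd_sets = itertools.chain.from_iterable(
        itertools.combinations(range(m), s) for s in range(1, m + 1, 2))
    # left-hand side: 2^{-m+1} * sum over odd S of (λ0 on S) ⊗ (λ2 off S)
    lhs = sum(2.0 ** (-m + 1) *
              tensor_value([lam0 if i in S else lam2 for i in range(m)], point)
              for S in odd_sets)
    # right-hand side: difference of the two tensor powers
    rhs = tensor_value([plus] * m, point) - tensor_value([minus] * m, point)
    assert abs(lhs - rhs) < 1e-12
```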

Observe that
$$\operatorname{orth}(\lambda_i-\lambda_j)\ \ge\ 1,\qquad i,j=0,1,2, \tag{4.56}$$
which can be seen from $\langle\lambda_i-\lambda_j,1\rangle=\langle\lambda_i,1\rangle-\langle\lambda_j,1\rangle=1-1=0$. Now

\begin{align*}
\operatorname{orth}(\Lambda_1-\Lambda_0)
&=\operatorname{orth}\left(\lambda_1^{\otimes m}-\left(\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}+\left(-\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}\right)\\
&\ge\min\left\{\operatorname{orth}\left(\lambda_1^{\otimes m}-\left(\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}\right),\ \operatorname{orth}\left(\left(-\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}\right)\right\}\\
&\ge\min\left\{\operatorname{orth}\left(\lambda_1-\frac12\lambda_0-\frac12\lambda_2\right),\ \operatorname{orth}\left(\left(-\frac12\lambda_0+\frac12\lambda_2\right)^{\otimes m}\right)\right\}\\
&=\min\left\{\operatorname{orth}\left(\lambda_1-\frac12\lambda_0-\frac12\lambda_2\right),\ m\operatorname{orth}\left(-\frac12\lambda_0+\frac12\lambda_2\right)\right\}\\
&\ge\min\left\{\operatorname{orth}\left(\lambda_1-\frac12\lambda_0-\frac12\lambda_2\right),\ m\right\}\\
&\ge\min\{c_1\sqrt r,\ m\},
\end{align*}
where the last five steps are valid by Proposition 2.1(i), Proposition 2.1(iii), Proposition 2.1(ii), equation (4.56), and equation (4.40), respectively.
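The step invoking Proposition 2.1(ii), $\operatorname{orth}(C^{\otimes m})=m\operatorname{orth}(C)$, has a transparent special case that can be tested directly: if $\langle C,1\rangle=0$, then every monomial of total degree less than $m$ has exponent $0$ in some coordinate, so its inner product with $C^{\otimes m}$ factors through $\langle C,1\rangle=0$. A numerical illustration with toy data (not part of the proof):

```python
import itertools, random

random.seed(1)
dom = range(3)
# a signed vector C on {0, 1, 2} with total mass zero, so orth(C) >= 1
C = [random.random() for _ in dom]
mean = sum(C) / len(C)
C = [c - mean for c in C]

m = 4
for exponents in itertools.product(range(m), repeat=m):
    if sum(exponents) < m:
        # <C^{⊗m}, x1^a1 ... xm^am> factors into one moment per coordinate,
        # and at least one exponent a_i is 0, making that moment <C, 1> = 0
        inner = 1.0
        for a in exponents:
            inner *= sum(C[t] * t ** a for t in dom)
        assert abs(inner) < 1e-9
```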

4.3. Hardness amplification for threshold degree and beyond. We now present a black-box transformation that takes any given circuit in $n$ variables with threshold degree $n^{1-\varepsilon}$ into a circuit with polynomially larger threshold degree, $\Omega(n^{1-\frac{\varepsilon}{1+\varepsilon}})$. This hardness amplification procedure increases the circuit size additively by $n^{O(1)}$ and the circuit depth by 2, preserving membership in AC0. We obtain analogous hardness amplification results for a host of other approximation-theoretic complexity measures. For this reason, we adopt the following abstract view of polynomial approximation. Let $I_0,I_1,I_*$ be nonempty convex subsets of the real line, i.e., any kind of nonempty intervals (closed, open, or half-open; bounded or unbounded). Let $f\colon X\to\{0,1,*\}$ be a (possibly partial) Boolean function on a finite subset $X$ of Euclidean space. We define an $(I_0,I_1,I_*)$-approximant for $f$ to be any real polynomial that maps $f^{-1}(0),f^{-1}(1),f^{-1}(*)$ into $I_0,I_1,I_*$, respectively. The $(I_0,I_1,I_*)$-approximate degree of $f$, denoted $\deg_{I_0,I_1,I_*}(f)$, is the least degree of an $(I_0,I_1,I_*)$-approximant for $f$. Threshold degree corresponds to the special case
$$\deg_\pm=\deg_{(0,\infty),(-\infty,0),(-\infty,\infty)}. \tag{4.57}$$
Other notable cases include $\epsilon$-approximate degree and one-sided $\epsilon$-approximate degree, given by
$$\deg_\epsilon=\deg_{[-\epsilon,\epsilon],[1-\epsilon,1+\epsilon],[-\epsilon,1+\epsilon]}, \tag{4.58}$$
$$\deg^{+}_\epsilon=\deg_{[-\epsilon,\epsilon],[1-\epsilon,\infty),(-\infty,\infty)}, \tag{4.59}$$
respectively. Our hardness amplification result applies to $(I_0,I_1,I_*)$-approximate degree for any nonempty convex $I_0,I_1,I_*\subseteq\mathbb{R}$, with threshold degree being a special case. The centerpiece of our argument is the following lemma.

Lemma 4.5. Let $c_1,c_2>0$ be the absolute constants from Theorem 4.4. Let $n,m,r,d,\theta$ be positive integers such that
$$\theta\ge 2d, \tag{4.60}$$
$$\theta\ge\frac{4enm(1+\ln(nm))}{c_1^2}, \tag{4.61}$$
$$\theta\ge\frac{2\sqrt r}{c_2}\left(d\log\left(\frac{8nmr}{c_1}\right)+2\right). \tag{4.62}$$
Then for each $z\in\{0,1\}^n$, there is a probability distribution $\tilde\Lambda_z$ on $\mathbb{N}^{nm}$ such that:
(i) the support of $\tilde\Lambda_z$ is contained in $\bigl(\prod_{i=1}^{n}(\mathrm{MP}^*_{m,r})^{-1}(z_i)\bigr)|_{<2\theta+nm}$;
(ii) for every polynomial $p\colon\mathbb{R}^{nm}\to\mathbb{R}$ of degree at most $d$, the mapping $z\mapsto\mathbf{E}_{\tilde\Lambda_z}p$ is a polynomial on $\{0,1\}^n$ of degree at most $\frac{1}{\min\{m,c_1\sqrt r\}}\cdot\deg p$.
· deg p.

Proof. Theorem 4.4 constructs probability distributions $\Lambda_0$ and $\Lambda_1$ such that
$$\Lambda_i\in\operatorname{conv}\!\left(B^*\!\left(r,\,c_1,\,\frac{c_2}{\sqrt r},\,1\right)^{\otimes m}\right),\qquad i=0,1, \tag{4.63}$$
$$\operatorname{supp}\Lambda_i\subseteq(\mathrm{MP}^*_{m,r})^{-1}(i),\qquad i=0,1, \tag{4.64}$$
$$\operatorname{orth}(\Lambda_1-\Lambda_0)\ \ge\ \min\{m,\,c_1\sqrt r\}. \tag{4.65}$$
As a result, the probability distributions $\Lambda_z=\bigotimes_{i=1}^{n}\Lambda_{z_i}$ for $z\in\{0,1\}^n$ obey
$$\Lambda_z\in\left(\operatorname{conv}\!\left(B^*\!\left(r,c_1,\frac{c_2}{\sqrt r},1\right)^{\otimes m}\right)\right)^{\otimes n}
\subseteq\operatorname{conv}\!\left(B^*\!\left(r,c_1,\frac{c_2}{\sqrt r},1\right)^{\otimes nm}\right)
\subseteq\operatorname{conv}\!\left(B\!\left(r,c_1,\frac{c_2}{\sqrt r},1\right)^{\otimes nm}\right), \tag{4.66}$$
where the last two steps are valid by (2.2) and (4.36), respectively. By (4.60)–(4.62), (4.66), and Corollary 4.2, there are probability distributions $\tilde\Lambda_z$ for $z\in\{0,1\}^n$ such that
$$\operatorname{supp}\tilde\Lambda_z\subseteq(\operatorname{supp}\Lambda_z)|_{<2\theta+nm}, \tag{4.67}$$
$$\operatorname{orth}(\Lambda_z-\tilde\Lambda_z)\ \ge\ d. \tag{4.68}$$

We proceed to verify the properties required of $\tilde\Lambda_z$. For (i), it follows from (4.64) and (4.67) that each $\tilde\Lambda_z$ has support contained in $\bigl(\prod_{i=1}^{n}(\mathrm{MP}^*_{m,r})^{-1}(z_i)\bigr)|_{<2\theta+nm}$. For (ii), let $p$ be any polynomial of degree at most $d$. Then (4.68) forces $\mathbf{E}_{\tilde\Lambda_z}p=\mathbf{E}_{\Lambda_z}p$, where the right-hand side is by (4.65) and Proposition 2.2 a polynomial in $z\in\{0,1\}^n$ of degree at most $\deg p/\operatorname{orth}(\Lambda_1-\Lambda_0)\le\deg p/\min\{m,c_1\sqrt r\}$.

At its core, a hardness amplification result is a lower bound on the complexity of a composed function in terms of the complexities of its constituent parts. We now prove such a composition theorem for $(I_0,I_1,I_*)$-approximate degree.

Theorem 4.6. There is an absolute constant $0<c<1$ such that
$$\deg_{I_0,I_1,I_*}\bigl((f\circ\mathrm{MP}^*_m)|_{\le\theta}\bigr)\ \ge\ \min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{c\theta}{m\log(n+m)}-n\right\},$$
$$\deg_{I_0,I_1,I_*}\bigl((f\circ\neg\mathrm{MP}^*_m)|_{\le\theta}\bigr)\ \ge\ \min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{c\theta}{m\log(n+m)}-n\right\}$$
for all positive integers $n,m,\theta$, all functions $f\colon\{0,1\}^n\to\{0,1,*\}$, and all nonempty convex sets $I_0,I_1,I_*\subseteq\mathbb{R}$.

Proof. Negating a function's inputs has no effect on the $(I_0,I_1,I_*)$-approximate degree, so that $f(x_1,x_2,\dots,x_n)$ and $f(\neg x_1,\neg x_2,\dots,\neg x_n)$ both have $(I_0,I_1,I_*)$-approximate degree $\deg_{I_0,I_1,I_*}(f)$. Therefore, it suffices to prove the lower bound on $\deg_{I_0,I_1,I_*}((f\circ\mathrm{MP}^*_m)|_{\le\theta})$ for all $f$.
Let $c\in(0,1)$ be an absolute constant that is sufficiently small relative to the constants in Lemma 4.5. For $\theta\le\frac1c\cdot nm\log(n+m)$, the lower bounds in the statement of the theorem are nonpositive and therefore trivially true. In the complementary case $\theta>\frac1c\cdot nm\log(n+m)$, Lemma 4.5 applies to the positive integers $n',m',r',d',\theta'$, where
$$n'=n,\qquad m'=m,\qquad r'=m^2,\qquad \theta'=\left\lfloor\frac{\theta-nm}{2}\right\rfloor,\qquad d'=\left\lfloor\frac{c\theta}{m\log(n+m)}\right\rfloor-n.$$

We thus obtain, for each $z\in\{0,1\}^n$, a probability distribution $\tilde\Lambda_z$ on $\mathbb{N}^{nm}$ such that:
(i) the support of $\tilde\Lambda_z$ is contained in $\bigl(\prod_{i=1}^{n}(\mathrm{MP}^*_m)^{-1}(z_i)\bigr)|_{\le\theta}$;
(ii) for every polynomial $p\colon\mathbb{R}^{nm}\to\mathbb{R}$ of degree at most $d'$, the mapping $z\mapsto\mathbf{E}_{\tilde\Lambda_z}p$ is a polynomial on $\{0,1\}^n$ of degree at most $\frac{1}{cm}\cdot\deg p$.
Now, let $p\colon\mathbb{R}^{nm}\to\mathbb{R}$ be an $(I_0,I_1,I_*)$-approximant for $(f\circ\mathrm{MP}^*_m)|_{\le\theta}$ of degree at most $d'$. Consider the mapping $p^*\colon z\mapsto\mathbf{E}_{\tilde\Lambda_z}p$, which we view as a polynomial in $z\in\{0,1\}^n$. Then (i) along with the convexity of $I_0,I_1,I_*$ ensures that $p^*$ is an $(I_0,I_1,I_*)$-approximant for $f$, whence $\deg p^*\ge\deg_{I_0,I_1,I_*}(f)$. At the same time, (ii) guarantees that $\deg p^*\le\frac{1}{cm}\cdot\deg p$. This pair of lower and upper bounds force
$$\deg p\ \ge\ cm\deg_{I_0,I_1,I_*}(f).$$
Since $p$ was chosen arbitrarily from among $(I_0,I_1,I_*)$-approximants of $(f\circ\mathrm{MP}^*_m)|_{\le\theta}$ that have degree at most $d'$, we conclude that
$$\deg_{I_0,I_1,I_*}\bigl((f\circ\mathrm{MP}^*_m)|_{\le\theta}\bigr)\ \ge\ \min\{cm\deg_{I_0,I_1,I_*}(f),\ d'+1\}\ \ge\ \min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{c\theta}{m\log(n+m)}-n\right\}.$$

The previous composition theorem has the following analogue for Boolean inputs.

Theorem 4.7. Let $0<c<1$ be the absolute constant from Theorem 4.6. Let $n,m,N$ be positive integers. Then there is an (explicitly given) transformation $H\colon\{0,1\}^N\to\{0,1\}^n$, computable by an AND-OR-AND circuit of size $(Nnm)^{O(1)}$ with bottom fan-in $O(\log(nm))$, such that for all functions $f\colon\{0,1\}^n\to\{0,1,*\}$ and all nonempty convex sets $I_0,I_1,I_*\subseteq\mathbb{R}$,
$$\deg_{I_0,I_1,I_*}(f\circ H)\ \ge\ \min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{cN}{50m\log^2(n+m)}-n\right\}\log(n+m),$$
$$\deg_{I_0,I_1,I_*}(f\circ\neg H)\ \ge\ \min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{cN}{50m\log^2(n+m)}-n\right\}\log(n+m).$$

For the function H with multibit output, the notation ¬H above refers to the
function obtained by negating each of H’s outputs.
Proof of Theorem 4.7. As in the previous proof, settling the first lower bound for all $f$ will automatically settle the second lower bound, due to the invariance of $(I_0,I_1,I_*)$-approximate degree under negation of the input bits. In what follows, we focus on $f\circ H$.
We may assume that $N\ge 50mn\log^2(n+m)$ since otherwise the lower bounds in the theorem statement are nonpositive and hence trivially true. Define
$$\theta=\left\lceil\frac{N}{50\log(n+m)}\right\rceil.$$

Theorem 3.9 gives a surjection $G\colon\{0,1\}^{6\theta\lceil\log(nm+1)\rceil}\to\mathbb{N}^{nm}|_{\le\theta}$ with the following two properties:
(i) for every coordinate $i=1,2,\dots,nm$, the mapping $x\mapsto\mathrm{OR}^*_\theta(G(x)_i)$ is computable by an explicit DNF formula of size $(nm\theta)^{O(1)}=N^{O(1)}$ with bottom fan-in $O(\log(nm))$;
(ii) for any polynomial $p$, the map $v\mapsto\mathbf{E}_{G^{-1}(v)}p$ is a polynomial on $\mathbb{N}^{nm}|_{\le\theta}$ of degree at most $(\deg p)/\lceil\log(nm+1)+1\rceil\le(\deg p)/\log(n+m)$.
Consider the composition $F=(f\circ\mathrm{MP}^*_{m,\theta})\circ G$. Then
$$F=(f\circ(\mathrm{AND}_m\circ\mathrm{OR}^*_\theta))\circ G
=f\circ\bigl((\underbrace{\mathrm{AND}_m\circ\mathrm{OR}^*_\theta,\ \dots,\ \mathrm{AND}_m\circ\mathrm{OR}^*_\theta}_{n})\circ G\bigr),$$
which by property (i) of $G$ means that $F$ is the composition of $f$ and an AND-OR-AND circuit $H$ on $6\theta\lceil\log(nm+1)\rceil\le N$ variables of size $(nmN)^{O(1)}=N^{O(1)}$ with bottom fan-in $O(\log(nm))$. Hence, the proof will be complete once we show that
$$\deg_{I_0,I_1,I_*}(F)\ \ge\ \min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{cN}{50m\log^2(n+m)}-n\right\}\log(n+m). \tag{4.69}$$

For this, fix an $(I_0,I_1,I_*)$-approximant $p$ for $F$ of degree $\deg_{I_0,I_1,I_*}(F)$. Consider the polynomial $p^*\colon\mathbb{N}^{nm}|_{\le\theta}\to\mathbb{R}$ given by $p^*(v)=\mathbf{E}_{G^{-1}(v)}p$. Since $I_0,I_1,I_*$ are convex and $p$ is an $(I_0,I_1,I_*)$-approximant for $F=(f\circ\mathrm{MP}^*_{m,\theta})\circ G$, it follows that $p^*$ is an $(I_0,I_1,I_*)$-approximant for $(f\circ\mathrm{MP}^*_{m,\theta})|_{\le\theta}$. Therefore,
\begin{align*}
\deg p^*&\ge\deg_{I_0,I_1,I_*}\bigl((f\circ\mathrm{MP}^*_{m,\theta})|_{\le\theta}\bigr)\\
&\ge\deg_{I_0,I_1,I_*}\bigl((f\circ\mathrm{MP}^*_m)|_{\le\theta}\bigr)\\
&\ge\min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{c\theta}{m\log(n+m)}-n\right\}\\
&\ge\min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{cN}{50m\log^2(n+m)}-n\right\},
\end{align*}
where the second step is valid because $(f\circ\mathrm{MP}^*_{m,\theta})|_{\le\theta}$ either contains the function $(f\circ\mathrm{MP}^*_m)|_{\le\theta}=(f\circ\mathrm{MP}^*_{m,m^2})|_{\le\theta}$ as a subfunction (case $\theta\ge m^2$), or is equal to it (case $\theta\le m^2$); and the third step applies Theorem 4.6. However, property (ii) of $G$ states that
$$\deg p^*\ \le\ \frac{\deg p}{\log(n+m)}\ =\ \frac{\deg_{I_0,I_1,I_*}(F)}{\log(n+m)}.$$
Comparing these lower and upper bounds on the degree of $p^*$ settles (4.69).

At last, we illustrate the use of the previous two composition results to amplify
hardness for polynomial approximation.

Theorem 4.8 (Hardness amplification). Let $I_0,I_1,I_*\subseteq\mathbb{R}$ be any nonempty convex subsets. Let $f\colon\{0,1\}^n\to\{0,1\}$ be a given function with
$$\deg_{I_0,I_1,I_*}(f)\ \ge\ n^{1-\frac1k},$$
for some real number $k\ge1$. Suppose further that $f$ is computable by a Boolean circuit of size $s$ and depth $d$, where $d\ge1$. Then there is a function $F\colon\{0,1\}^N\to\{0,1\}$ on $N=\Theta(n^{1+\frac1k}\log^2 n)$ variables with
$$\deg_{I_0,I_1,I_*}(F)\ \ge\ \Omega\!\left(\frac{N^{1-\frac{1}{k+1}}}{\log^{1-\frac{2}{k+1}}N}\right).$$
Moreover, $F$ is computable by a Boolean circuit of size $s+n^{O(1)}$, bottom fan-in $O(\log n)$, depth $d+2$ if the circuit for $f$ is monotone, and depth $d+3$ otherwise.

Proof. Take
$$m=\lceil n^{1/k}\rceil,\qquad N=\left\lceil\frac{100}{c}\,mn\log^2(n+m)\right\rceil,$$
where $0<c<1$ is the absolute constant from Theorem 4.6. Then Theorem 4.7 gives an explicit transformation $H\colon\{0,1\}^N\to\{0,1\}^n$, computable by an AND-OR-AND circuit of size $n^{O(1)}$ with bottom fan-in $O(\log n)$, such that
\begin{align*}
\min\{\deg_{I_0,I_1,I_*}(f\circ H),\ \deg_{I_0,I_1,I_*}(f\circ\neg H)\}
&\ge\min\left\{cm\deg_{I_0,I_1,I_*}(f),\ \frac{cN}{50m\log^2(n+m)}-n\right\}\log(n+m)\\
&\ge cn\log n\\
&=\Theta\!\left(\frac{N^{1-\frac{1}{k+1}}}{\log^{1-\frac{2}{k+1}}N}\right).
\end{align*}
Now, fix a circuit for $f$ of size $s$ and depth $d\ge1$. Composing the circuits for $f$ and $H$ results in circuits for $f\circ H$ and $f\circ\neg H$ of size $s+n^{O(1)}$, bottom fan-in $O(\log n)$, and depth at most $d+3$. Thus, $F$ can be taken to be either of $f\circ H$ and $f\circ\neg H$.
When the circuit for $f$ is monotone, the depth of $F$ can be reduced to $d+2$ as follows. After merging like gates if necessary, the circuit for $f$ can be viewed as composed of $d$ layers of alternating gates ($\wedge$ and $\vee$). The bottom layer of $f$ can therefore be merged with the top layer of either $H$ or $\neg H$, resulting in a circuit of depth at most $(d+3)-1=d+2$.

We emphasize that in view of (4.57), the symbol degI0 ,I1 ,I∗ in Theorems 4.6–4.8
can be replaced with the threshold degree symbol deg± . The same goes for any
other special case of (I0 , I1 , I∗ )-approximate degree.

4.4. Threshold degree and discrepancy of AC0. We have reached our main result on the sign-representation of constant-depth circuits. For any $\varepsilon>0$, the next theorem constructs a circuit family in AC0 with threshold degree $\Omega(n^{1-\varepsilon})$. The proof amounts to a recursive application of the hardness amplification procedure of Section 4.3.

Theorem 4.9. Let $k\ge1$ be a fixed integer. Then there is an (explicitly given) family of functions $\{f_{k,n}\}_{n=1}^{\infty}$, where $f_{k,n}\colon\{0,1\}^n\to\{0,1\}$ has threshold degree
$$\deg_\pm(f_{k,n})=\Omega\!\left(n^{\frac{k-1}{k+1}}\cdot(\log n)^{-\frac{1}{k+1}\lceil\frac{k-2}{2}\rceil\lfloor\frac{k-2}{2}\rfloor}\right) \tag{4.70}$$
and is computable by a monotone Boolean circuit of size $n^{O(1)}$ and depth $k$. In addition, the circuit for $f_{k,n}$ has bottom fan-in $O(\log n)$ for all $k\ne2$.

Proof. The proof is by induction on $k$. The base cases $k=1$ and $k=2$ correspond to the families
$$f_{1,n}(x)=x_1,\qquad n=1,2,3,\dots,$$
$$f_{2,n}(x)=\mathrm{MP}_{\lfloor n^{1/3}\rfloor}(x),\qquad n=1,2,3,\dots.$$
For the former, the threshold degree lower bound (4.70) is trivial. For the latter, it follows from Theorem 2.5.
For the inductive step, fix $k\ge3$. Due to the asymptotic nature of (4.70), it is enough to construct the functions in $\{f_{k,n}\}_{n=1}^{\infty}$ for $n$ larger than a certain constant of our choosing. As a starting point, the inductive hypothesis gives an explicit family $\{f_{k-2,n}\}_{n=1}^{\infty}$ in which $f_{k-2,n}\colon\{0,1\}^n\to\{0,1\}$ has threshold degree
$$\deg_\pm(f_{k-2,n})=\Omega\!\left(n^{\frac{k-3}{k-1}}\cdot(\log n)^{-\frac{1}{k-1}\lceil\frac{k-4}{2}\rceil\lfloor\frac{k-4}{2}\rfloor}\right) \tag{4.71}$$

and is computable by a monotone Boolean circuit of size nO(1) and depth k − 2. We


view the circuit for fk−2,n as composed of k − 2 layers of alternating gates, where
without loss of generality the bottom layer consists of AND gates. This last property
can be forced by using ¬fk−2,n (¬x1 , ¬x2 , . . . , ¬xn ) instead of fk−2,n (x1 , x2 , . . . , xn ),
which interchanges the circuit’s AND and OR gates without affecting the threshold
degree, circuit depth, or circuit size.
Now, let $c>0$ be the absolute constant from Theorem 4.6. For every $N$ larger than a certain constant, we apply Theorem 4.7 with
$$n=\left\lceil N^{\frac{k-1}{k+1}}(\log N)^{-\frac{1}{k+1}\lceil\frac{k-4}{2}\rceil\lfloor\frac{k-4}{2}\rfloor-\frac{2(k-1)}{k+1}}\cdot\frac{c}{100}\right\rceil, \tag{4.72}$$
$$m=\left\lceil N^{\frac{2}{k+1}}(\log N)^{\frac{1}{k+1}\lceil\frac{k-4}{2}\rceil\lfloor\frac{k-4}{2}\rfloor-\frac{4}{k+1}}\right\rceil, \tag{4.73}$$
$$f=f_{k-2,n}, \tag{4.74}$$
$$I_0=(0,\infty), \tag{4.75}$$
$$I_1=(-\infty,0), \tag{4.76}$$
$$I_*=(-\infty,\infty) \tag{4.77}$$
to obtain a function $H_N\colon\{0,1\}^N\to\{0,1\}^n$ such that the composition $F_N=f_{k-2,n}\circ H_N$ has threshold degree
\begin{align*}
\deg_\pm(F_N)&\ge\min\left\{cm\deg_\pm(f_{k-2,n}),\ \frac{cN}{50m\log^2(n+m)}-n\right\}\log(n+m)\\
&=\Theta\!\left(N^{\frac{k-1}{k+1}}(\log N)^{-\frac{1}{k+1}\lceil\frac{k-4}{2}\rceil\lfloor\frac{k-4}{2}\rfloor-\frac{k-3}{k+1}}\right)\\
&=\Theta\!\left(N^{\frac{k-1}{k+1}}(\log N)^{-\frac{1}{k+1}\lceil\frac{k-2}{2}\rceil\lfloor\frac{k-2}{2}\rfloor}\right), \tag{4.78}
\end{align*}
where the second step uses (4.71)–(4.73). Moreover, Theorem 4.7 ensures that $H_N$ is computable by an AND-OR-AND circuit of polynomial size and bottom fan-in $O(\log N)$. The bottom layer of $f_{k-2,n}$ consists of AND gates, which can be merged with the top layer of $H_N$ to produce a circuit for $F_N=f_{k-2,n}\circ H_N$ of depth $(k-2)+3-1=k$.
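The exponent bookkeeping in the last step of (4.78) rests on the integer identity $\lceil\frac{k-2}{2}\rceil\lfloor\frac{k-2}{2}\rfloor=\lceil\frac{k-4}{2}\rceil\lfloor\frac{k-4}{2}\rfloor+(k-3)$, which is easy to verify exhaustively:

```python
import math

# ceil((k-2)/2) * floor((k-2)/2) = ceil((k-4)/2) * floor((k-4)/2) + (k-3)
for k in range(3, 1000):
    lhs = math.ceil((k - 2) / 2) * math.floor((k - 2) / 2)
    rhs = math.ceil((k - 4) / 2) * math.floor((k - 4) / 2) + (k - 3)
    assert lhs == rhs
```

Algebraically, with $a=\lceil\frac{k-2}{2}\rceil$ and $b=\lfloor\frac{k-2}{2}\rfloor$, this is just $ab=(a-1)(b-1)+(a+b-1)$ together with $a+b=k-2$.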
We have thus constructed, for some constant $N_0$, a family of functions $\{F_N\}_{N=N_0}^{\infty}$ in which each $F_N\colon\{0,1\}^N\to\{0,1\}$ has threshold degree (4.78) and is computable by a Boolean circuit of polynomial size, depth $k$, and bottom fan-in $O(\log N)$. Now, take the circuit for $F_N$ and replace the negated inputs in it with $N$ new, unnegated inputs. The resulting monotone circuit on $2N$ variables clearly has threshold degree at least that of $F_N$. This completes the inductive step.

Theorem 4.9 settles Theorem 1.1 from the introduction. Using the pattern matrix
method, we now “lift” this result to communication complexity.

Theorem 4.10. Let $k\ge3$ be a fixed integer. Let $\ell\colon\mathbb{N}\to\{2,3,4,\dots\}$ be given. Then there is an (explicitly given) family $\{F_n\}_{n=1}^{\infty}$, where $F_n\colon(\{0,1\}^n)^{\ell(n)}\to\{0,1\}$ is an $\ell(n)$-party communication problem with discrepancy
$$\operatorname{disc}(F_n)\ \le\ 2\exp\left(-\Omega\left(\frac{\left\lceil\frac{n}{4^{\ell(n)}\ell(n)^2}\right\rceil^{\frac{k-1}{k+1}}}{\left(1+\log\left\lceil\frac{n}{4^{\ell(n)}\ell(n)^2}\right\rceil\right)^{\frac{1}{k+1}\lceil\frac{k-2}{2}\rceil\lfloor\frac{k-2}{2}\rfloor}}\right)\right) \tag{4.79}$$
and communication complexity
$$\mathrm{PP}(F_n)\ =\ \Omega\left(\frac{\left\lceil\frac{n}{4^{\ell(n)}\ell(n)^2}\right\rceil^{\frac{k-1}{k+1}}}{\left(1+\log\left\lceil\frac{n}{4^{\ell(n)}\ell(n)^2}\right\rceil\right)^{\frac{1}{k+1}\lceil\frac{k-2}{2}\rceil\lfloor\frac{k-2}{2}\rfloor}}\right). \tag{4.80}$$
Moreover, $F_n$ is computable by a Boolean circuit of size $n^{O(1)}$ and depth $k+1$ in which the bottom two layers have fan-in $O(4^{\ell(n)}\ell(n)^2\log n)$ and $\ell(n)$, in that order. In particular, if $\ell(n)=O(1)$, then $F_n$ is computable by a Boolean circuit of polynomial size, depth $k$, and bottom fan-in $O(\log n)$.

Proof. Theorem 4.9 constructs a family of functions $\{f_n\}_{n=1}^{\infty}$, where $f_n\colon\{0,1\}^n\to\{0,1\}$ has threshold degree
$$\deg_\pm(f_n)=\Omega\!\left(n^{\frac{k-1}{k+1}}\cdot(\log n)^{-\frac{1}{k+1}\lceil\frac{k-2}{2}\rceil\lfloor\frac{k-2}{2}\rfloor}\right) \tag{4.81}$$
and is computable by a monotone Boolean circuit of polynomial size, depth $k$, and bottom fan-in $O(\log n)$. We view the circuit for $f_n$ as composed of $k$ layers of alternating gates, where without loss of generality the bottom layer consists of AND gates. This last property can be forced by using $\neg f_n(\neg x_1,\neg x_2,\dots,\neg x_n)$ instead of $f_n(x_1,x_2,\dots,x_n)$, which interchanges the circuit's AND and OR gates without affecting the threshold degree, circuit depth, circuit size, or bottom fan-in.
Now, let $c>0$ be the absolute constant from Theorem 2.17. For any given $n$, define
$$F_n=\begin{cases}\mathrm{AND}_{\ell(n)}&\text{if }n\le2m,\\ f_{\lfloor n/m\rfloor}\circ\mathrm{NOR}_m\circ\mathrm{AND}_{\ell(n)}&\text{otherwise},\end{cases}$$
where $m=2\lceil c2^{\ell(n)}\ell(n)\rceil^2$. Then the discrepancy bound (4.79) is trivial for $n\le2m$, and follows from (4.81) and Theorem 2.17 for $n>2m$. The lower bound (4.80) on the communication complexity of $F_n$ with weakly unbounded error is now immediate by the discrepancy method (Corollary 2.15).
It remains to examine the circuit complexity of $F_n$. Since $f_n$ is computable by a monotone circuit of size $n^{O(1)}$ and depth $k$, with the bottom layer composed of AND gates of fan-in $O(\log n)$, it follows that $F_n$ is computable by a circuit of size $n^{O(1)}$ and depth at most $k+1$ in which the bottom two levels have fan-in $O(\log n)\cdot m=O(4^{\ell(n)}\ell(n)^2\log n)$ and $\ell(n)$, in that order. This means that for $\ell(n)=O(1)$, the bottom three levels of $F_n$ can be computed by a circuit of polynomial size, depth 2, and bottom fan-in $O(\log n)$, which in turn gives a circuit for $F_n$ of polynomial size, depth $(k+1)-3+2=k$, and bottom fan-in $O(\log n)$.

Taking `(n) = 2 in Theorem 4.10 settles Theorem 1.4 from the introduction.

4.5. Threshold degree of surjectivity. We close this section with another application of our amplification theorem, in which we take the outer function $f$ to be the identity map $f\colon\{0,1\}\to\{0,1\}$ on a single bit.

Theorem 4.11. For any integer $m\ge1$,
$$\deg_\pm\bigl(\mathrm{MP}^*_m|_{\le m^2\log m}\bigr)=\Omega(m).$$
Proof. Let $f\colon\{0,1\}\to\{0,1\}$ be the identity function, so that $\deg_\pm(f)=1$. Invoking Theorem 4.6 with $n=1$ and $\theta=\lfloor m^2\log m\rfloor$, one obtains the claimed lower bound.

Theorem 4.11 has a useful interpretation. For positive integers $n$ and $r$, the surjectivity problem is the problem of determining whether a given mapping $\{1,2,\dots,n\}\to\{1,2,\dots,r\}$ is surjective. This problem is trivial for $r>n$, and the standard regime studied in previous work is $r\le cn$ for some constant $0<c<1$. The input to the surjectivity problem is represented by a Boolean matrix $x\in\{0,1\}^{r\times n}$ with precisely one nonzero entry in every column. More formally, let $e_1,e_2,\dots,e_r$ be the standard basis for $\mathbb{R}^r$. The surjectivity function $\mathrm{SURJ}_{n,r}\colon\{e_1,e_2,\dots,e_r\}^n\to\{0,1\}$ is given by
$$\mathrm{SURJ}_{n,r}(x_1,x_2,\dots,x_n)=\bigwedge_{j=1}^{r}\bigvee_{i=1}^{n}x_{i,j}.$$
It is clear that $\mathrm{SURJ}_{n,r}(x_1,x_2,\dots,x_n)$ is uniquely determined by the vector sum $x_1+x_2+\dots+x_n\in\mathbb{N}^r|_n$. It is therefore natural to consider a symmetric counterpart of the surjectivity function, with domain $\mathbb{N}^r|_n$ instead of $\{e_1,e_2,\dots,e_r\}^n$. This symmetric version is $(\mathrm{AND}_r\circ\mathrm{OR}^*_n)|_n=\mathrm{MP}^*_{r,n}|_n$, and Proposition 2.12 ensures that
$$\deg_\pm(\mathrm{SURJ}_{n,r})=\deg_\pm(\mathrm{MP}^*_{r,n}|_n). \tag{4.82}$$

The surjectivity problem has seen much work recently [8, 54, 11, 17]. In particular, Bun and Thaler [17] have obtained an essentially tight lower bound of $\tilde\Omega(\min\{r,\sqrt n\})$ on the threshold degree of $\mathrm{SURJ}_{n,r}$ in the standard regime $r\le(1-\Omega(1))n$. As a corollary to Theorem 4.11, we give a new proof of Bun and Thaler's result, sharpening their bound by a polylogarithmic factor.

Corollary 4.12. For any integers $n\ge r\ge1$,
$$\deg_\pm(\mathrm{SURJ}_{n,r})\ \ge\ \Omega\!\left(\min\left\{r,\ \sqrt{\frac{n-r}{1+\log(n-r)}}\right\}\right). \tag{4.83}$$
Proof. Define
$$r'=\min\left\{r-1,\ \sqrt{\frac{n-r}{1+\log(n-r)}}\right\}. \tag{4.84}$$
We may assume that $r'\ge1$ since (4.83) holds trivially otherwise. The identity
$$\mathrm{MP}^*_{r',n}(x_1,x_2,\dots,x_{r'})
=\mathrm{MP}^*_{r,n}\Bigl(x_1,x_2,\dots,x_{r'},\ \underbrace{1,1,\dots,1}_{r-r'-1},\ 1+n-(r-r')-\sum_{i=1}^{r'}x_i\Bigr)$$
holds for all $(x_1,x_2,\dots,x_{r'})\in\mathbb{N}^{r'}|_{\le n-(r-r')}$, whence
$$\deg_\pm\bigl(\mathrm{MP}^*_{r',n}|_{\le n-(r-r')}\bigr)\ \le\ \deg_\pm\bigl(\mathrm{MP}^*_{r,n}|_n\bigr). \tag{4.85}$$

Now
$$\deg_\pm(\mathrm{SURJ}_{n,r})
=\deg_\pm\bigl(\mathrm{MP}^*_{r,n}|_n\bigr)
\ge\deg_\pm\bigl(\mathrm{MP}^*_{r',n}|_{\le n-(r-r')}\bigr)
\ge\deg_\pm\bigl(\mathrm{MP}^*_{r',r'^2}|_{\le r'^2\log r'}\bigr)
\ge\Omega(r'),$$
where the four steps use (4.82), (4.85), (4.84), and Theorem 4.11, respectively.

5. The sign-rank of AC0


We now turn to the second main result of this paper, a near-optimal lower bound on the sign-rank of constant-depth circuits. To start with, we show that our smoothing technique from Theorem 4.4 already gives an exponential lower bound on the sign-rank of AC0. Specifically, we prove in Section 5.1 that the Minsky–Papert function $\mathrm{MP}_{n^{1/3}}$ has $\exp(-O(n^{1/3}))$-smooth threshold degree $\Omega(n^{1/3})$, which by Theorem 2.18 immediately implies an $\exp(\Omega(n^{1/3}))$ lower bound on the sign-rank of an AC0 circuit of depth 3. This result was originally obtained, with a longer and more demanding proof, by Razborov and Sherstov [39].
To obtain the near-optimal lower bound of $\exp(\Omega(n^{1-\varepsilon}))$ for every $\varepsilon>0$, we use a completely different approach. It is based on the notion of local smoothness and is unrelated to the threshold degree analysis. In Section 5.2, we define local smoothness and record basic properties of locally smooth functions. In Sections 5.3 and 5.4, we develop techniques for manipulating locally smooth functions to achieve desired global behavior, without the manipulations being detectable by low-degree polynomials. To apply this machinery to constant-depth circuits, we design in Section 5.5 a locally smooth dual polynomial for the Minsky–Papert function. We use this dual object in Section 5.6 to prove an amplification theorem for smooth threshold degree. We apply the amplification theorem iteratively in Section 5.7 to construct, for any $\varepsilon>0$, a constant-depth circuit with $\exp(-O(n^{1-\varepsilon}))$-smooth threshold degree $\Omega(n^{1-\varepsilon})$. Finally, we present our main result on the sign-rank of AC0 in Section 5.8.
In the remainder of this section, we adopt the following additional notation. For an arbitrary subset $X$ of Euclidean space, we write $\operatorname{diam}X=\sup_{x,x'\in X}|x-x'|$, with the convention that $\operatorname{diam}\varnothing=0$. For a vector $x\in\mathbb{Z}^n$ and a natural number $d$, we let $B_d(x)=\{v\in\mathbb{Z}^n:|x-v|\le d\}$ denote the set of integer-valued vectors within distance $d$ of $x$. For all $x$,
$$|B_d(x)|=|B_d(0)|\ \le\ 2^d\binom{n+d}{d}, \tag{5.1}$$
where the binomial coefficient corresponds to the number of nonnegative integer vectors of weight at most $d$. Finally, for vectors $u,v\in\mathbb{N}^n$, we define $\operatorname{cube}(u,v)$ to be the smallest Cartesian product of integer intervals that contains both $u$ and $v$. Specifically,
$$\operatorname{cube}(u,v)=\{w\in\mathbb{N}^n:\min\{u_i,v_i\}\le w_i\le\max\{u_i,v_i\}\ \text{for all}\ i\}
=\prod_{i=1}^{n}\{\min\{u_i,v_i\},\,\min\{u_i,v_i\}+1,\,\dots,\,\max\{u_i,v_i\}\}.$$
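The counting bound (5.1) can be checked by enumeration for small $n$ and $d$ (assuming, as elsewhere in this section, that $|\cdot|$ denotes the $\ell_1$ norm: a vector in $B_d(0)$ has at most $d$ nonzero coordinates, each contributing a sign, whence the factor $2^d$):

```python
import itertools, math

def ball_size(n, d):
    # |B_d(0)|: integer vectors v in Z^n with ||v||_1 <= d
    return sum(1 for v in itertools.product(range(-d, d + 1), repeat=n)
               if sum(abs(t) for t in v) <= d)

for n in range(1, 5):
    for d in range(0, 5):
        assert ball_size(n, d) <= 2 ** d * math.comb(n + d, d)
```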

5.1. A simple lower bound for depth 3. We start by presenting a new proof of Razborov and Sherstov's exponential lower bound [39] on the sign-rank of AC0. More precisely, we prove the following stronger result that was not known before.

Theorem 5.1. There is a constant $0<c<1$ such that for all positive integers $m$ and $r$,
$$\deg_\pm\bigl(\mathrm{MP}_{m,r},\,12^{-m-1}\bigr)\ \ge\ \min\{m,\,c\sqrt r\}.$$
Theorem 5.1 is asymptotically optimal, and it is the first lower bound on the smooth threshold degree of the Minsky–Papert function. As we will discuss shortly, this theorem implies an $\exp(\Omega(n^{1/3}))$ lower bound on the sign-rank of AC0. In addition, we will use Theorem 5.1 as the base case in the inductive proof of Theorem 1.3.

Proof of Theorem 5.1. It is well-known [33, 36, 57] that for some constant $c>0$ and all $r$, any real polynomial $p\colon\{0,1\}^r\to\mathbb{R}$ with $\|p-\mathrm{NOR}_r\|_\infty\le0.49$ has degree at least $c\sqrt r$. By linear programming duality [50, Theorem 2.5], this approximation-theoretic fact is equivalent to the existence of a function $\psi\colon\{0,1\}^r\to\mathbb{R}$ with
$$\psi(0)>0.49, \tag{5.2}$$
$$\|\psi\|_1=1, \tag{5.3}$$
$$\operatorname{orth}\psi\ \ge\ c\sqrt r. \tag{5.4}$$
The rest of the proof is a reprise of Section 4.2. To begin with, property (5.3) makes it possible to view $|\psi|$ as a probability distribution on $\{0,1\}^r$. Let $\mu_0,\mu_1,\mu_2$ be the probability distributions induced by $|\psi|$ on the sets $\{0^r\}$, $\{x\ne0^r:\psi(x)<0\}$, and $\{x\ne0^r:\psi(x)>0\}$, respectively. It is clear from (5.2) that the negative part of $\psi$ is a multiple of $\mu_1$, whereas the positive part of $\psi$ is a nonnegative linear combination of $\mu_0$ and $\mu_2$. Moreover, it follows from $\langle\psi,1\rangle=0$ and $\|\psi\|_1=1$ that the positive and negative parts of $\psi$ both have $\ell_1$-norm $1/2$. Summarizing,
$$\psi=\frac{1-\delta}{2}\,\mu_0-\frac12\,\mu_1+\frac{\delta}{2}\,\mu_2 \tag{5.5}$$
54 ALEXANDER A. SHERSTOV AND PEI WU

for some 0 ≤ δ ≤ 1. In view of (5.2), we infer the more precise bound

    0 ≤ δ < 1/50.     (5.6)
Let υ be the uniform probability distribution on {0, 1}^r \ {0^r}. We define

    λ0 = µ0,                                                     (5.7)

    λ1 = (2/(3(1 − δ))) µ1 + (1 − 2/(3(1 − δ))) υ,               (5.8)

    λ2 = (2δ/(1 − δ)) µ2 + (1 − 2δ/(1 − δ)) υ.                   (5.9)

It is clear from (5.6) that λ1 and λ2 are convex combinations of υ, µ1, µ2 and
therefore are probability distributions, with support

    supp λi ⊆ {0, 1}^r \ {0^r},     i = 1, 2,     (5.10)

whereas

    supp λ0 = {0^r}     (5.11)

by definition. Moreover, (5.6) implies that

    λi ≥ (1/4) υ,     i = 1, 2.     (5.12)
The defining equations (5.7)–(5.9) further imply that

    (2/3) λ0 + (1/3) λ2 − λ1 = (4/(3(1 − δ))) ψ,

which along with (5.4) gives

    orth((2/3) λ0 + (1/3) λ2 − λ1) ≥ c√r.     (5.13)

With this work behind us, define

    Λ = (1/2) ((2/3) λ0 + (1/3) λ2)^⊗m − (1/2) (−(1/3) λ0 + (1/3) λ2)^⊗m + (1/2) λ1^⊗m.

Multiplying out the tensor products in the definition of Λ and collecting like terms
(with S̄ = {1, 2, . . . , m} \ S below), we obtain

    Λ = (1/2) Σ_{S⊆{1,...,m}, S≠∅} ((2^{|S|} − (−1)^{|S|})/3^m) λ0^⊗S · λ2^⊗S̄ + (1/2) λ1^⊗m     (5.14)

      ≥ (1/4) Σ_{S⊆{1,...,m}, S≠∅} (2^{|S|}/3^m) λ0^⊗S · λ2^⊗S̄ + (1/2) λ1^⊗m

      ≥ (1/4) Σ_{S⊆{1,...,m}, S≠∅} (2^{|S|}/3^m) λ0^⊗S · ((1/4) υ)^⊗S̄ + (1/2) ((1/4) υ)^⊗m

      ≥ (1/4) Σ_{S⊆{1,...,m}} (2^{|S|}/3^m) λ0^⊗S · ((1/4) υ)^⊗S̄

      = (1/4) ((2/3) λ0 + (1/3) · (1/4) υ)^⊗m

      ≥ (1/4) (1/(12 · 2^r))^m · 1_{({0,1}^r)^m},     (5.15)

where the third step uses (5.12). In particular, Λ is a nonnegative function. We
further calculate

    ⟨Λ, 1⟩ = (1/2) ⟨(2/3) λ0 + (1/3) λ2, 1⟩^m − (1/2) ⟨−(1/3) λ0 + (1/3) λ2, 1⟩^m + (1/2) ⟨λ1, 1⟩^m

           = (1/2) ⟨(2/3) λ0 + (1/3) λ2, 1⟩^m + (1/2) ⟨λ1, 1⟩^m

           = 1/2 + 1/2

           = 1,     (5.16)

which makes Λ a probability distribution on ({0, 1}^r)^m.
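Since λ0, λ1, λ2 enter (5.14) only through tensor powers, replacing each distribution by a scalar turns the claimed expansion into a polynomial identity. The following sanity check (our own illustration, in exact rational arithmetic; not part of the formal proof) verifies that identity for small m.

```python
from fractions import Fraction as F
from math import comb
import random

def lhs(a, b, c, m):
    # (1/2)(2/3 a + 1/3 b)^m - (1/2)(-1/3 a + 1/3 b)^m + (1/2) c^m
    return (F(1, 2) * (F(2, 3) * a + F(1, 3) * b) ** m
            - F(1, 2) * (F(-1, 3) * a + F(1, 3) * b) ** m
            + F(1, 2) * c ** m)

def rhs(a, b, c, m):
    # (1/2) sum over nonempty S of (2^|S| - (-1)^|S|)/3^m * a^|S| b^(m-|S|)
    # plus (1/2) c^m, grouping the comb(m, s) subsets of each size s.
    return (F(1, 2) * c ** m
            + sum(F(1, 2) * F(2 ** s - (-1) ** s, 3 ** m) * comb(m, s)
                  * a ** s * b ** (m - s)
                  for s in range(1, m + 1)))

random.seed(0)
trials = []
for m in range(1, 7):
    a, b, c = (F(random.randint(1, 9)) for _ in range(3))
    trials.append(lhs(a, b, c, m) == rhs(a, b, c, m))
```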


It remains to examine the orthogonal content of Λ · (−1)^{MP_{m,r}}. We have
(with S̄ = {1, 2, . . . , m} \ S)

    Λ · (−1)^{MP_{m,r}}

      = (1/2) Σ_{S⊆{1,...,m}, S≠∅} ((2^{|S|} − (−1)^{|S|})/3^m) λ0^⊗S · λ2^⊗S̄ · (−1)^{MP_{m,r}}
        + (1/2) λ1^⊗m · (−1)^{MP_{m,r}}

      = (1/2) Σ_{S⊆{1,...,m}, S≠∅} ((2^{|S|} − (−1)^{|S|})/3^m) λ0^⊗S · λ2^⊗S̄ − (1/2) λ1^⊗m

      = (1/2) ((2/3) λ0 + (1/3) λ2)^⊗m − (1/2) (−(1/3) λ0 + (1/3) λ2)^⊗m − (1/2) λ1^⊗m,

where the first step uses (5.14); the second step uses (5.10) and (5.11); and the
final equality can be verified by multiplying out the tensor powers and collecting
like terms. Now

    orth(Λ · (−1)^{MP_{m,r}})

      = min{ orth((1/2) ((2/3) λ0 + (1/3) λ2)^⊗m − (1/2) λ1^⊗m),
             orth(−(1/2) (−(1/3) λ0 + (1/3) λ2)^⊗m) }

      ≥ min{ orth((2/3) λ0 + (1/3) λ2 − λ1),  m orth(−(1/3) λ0 + (1/3) λ2) }

      ≥ min{ c√r,  m orth(−(1/3) λ0 + (1/3) λ2) }

      ≥ min{c√r, m},

where the first step applies Proposition 2.1(i); the second step applies Proposi-
tion 2.1(ii), (iii); the third step substitutes the lower bound from (5.13); and the
last step uses ⟨−λ0 + λ2, 1⟩ = −⟨λ0, 1⟩ + ⟨λ2, 1⟩ = −1 + 1 = 0. Combining this
conclusion with (5.15) and (5.16) completes the proof.

We now "lift" the approximation-theoretic result just obtained to a sign-rank
lower bound, reproving a result of Razborov and Sherstov [39].

Theorem 5.2 (Razborov and Sherstov). Define Fn : {0, 1}^n × {0, 1}^n → {0, 1} by

    Fn = AND_{n^{1/3}} ◦ OR_{n^{2/3}} ◦ AND_2.

Then

    rk±(Fn) ≥ 2^{Ω(n^{1/3})}.

Proof. Theorem 5.1 gives

    deg±(AND_{n^{1/3}} ◦ OR_{n^{2/3}}, exp(−c′n^{1/3})) ≥ c″n^{1/3}

for some absolute constants c′, c″ > 0 and all n. This lower bound along with
Theorem 2.18 implies that the composition

    Hn = AND_{n^{1/3}} ◦ OR_{n^{2/3}} ◦ OR_{2⌈exp(4c′/c″)⌉} ◦ AND_2

has sign-rank rk±(Hn) = exp(Ω(n^{1/3})). This completes the proof because for some
integer constant c ≥ 1, each Hn is a subfunction of Fcn.

5.2. Local smoothness. The remainder of this paper focuses on our exp(Ω(n^{1−ε}))
lower bound on the sign-rank of AC0, whose proof is unrelated to the work in Sec-
tion 4 and Section 5.1. Central to our approach is an analytic notion that we call
local smoothness. Formally, let Φ : N^n → R be a function of interest. For a subset
X ⊆ N^n and a real number K ≥ 1, we say that Φ is K-smooth on X if

    |Φ(x)| ≤ K^{|x−x′|} |Φ(x′)|     for all x, x′ ∈ X.

Put another way, for any two points of X at distance d, the corresponding values of
Φ differ in magnitude by a factor of at most K^d. For any set X, we let Smooth(K, X)
denote the family of functions that are K-smooth on X. The following proposition
collects basic properties of local smoothness, to which we refer as the restriction
property, scaling property, tensor property, and conical property.

Proposition 5.3. Let K ≥ 1 be given.

  (i) If Φ ∈ Smooth(K, X) and X′ ⊆ X, then Φ ∈ Smooth(K, X′).
  (ii) If Φ ∈ Smooth(K, X) and a ∈ R, then aΦ ∈ Smooth(K, X).
  (iii) Smooth(K, X) ⊗ Smooth(K, Y ) ⊆ Smooth(K, X × Y ).
  (iv) If Φ, Ψ ∈ Smooth(K, X) and Φ, Ψ are nonnegative on X, then cone{Φ, Ψ} ⊆
       Smooth(K, X).

Proof. Properties (i) and (ii) are immediate from the definition of K-smoothness.
For (iii), fix Φ ∈ Smooth(K, X) and Ψ ∈ Smooth(K, Y ). Then for all (x, y), (x′, y′) ∈
X × Y, we have

    |Φ(x)Ψ(y)| ≤ K^{|x−x′|} |Φ(x′)| · K^{|y−y′|} |Ψ(y′)|
               = K^{|(x,y)−(x′,y′)|} |Φ(x′)Ψ(y′)|,

where the first step uses the K-smoothness of Φ and Ψ. Finally, for (iv), let a and
b be nonnegative reals. Then

    |aΦ(x) + bΨ(x)| = a|Φ(x)| + b|Ψ(x)|
                    ≤ aK^{|x−x′|} |Φ(x′)| + bK^{|x−x′|} |Ψ(x′)|
                    = K^{|x−x′|} |aΦ(x′) + bΨ(x′)|

for all x, x′ ∈ X, where the second step uses the K-smoothness of Φ and Ψ.
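The tensor property (iii) is the one used most often below. As a quick randomized sanity check (our own illustration, not part of the paper), the following confirms on random positive vectors over integer intervals that the product of two K-smooth functions is K-smooth on the product domain, with |·| denoting ℓ1-distance as in the text.

```python
import random

def smooth_K(vals):
    # Smallest K for which i -> vals[i] is K-smooth on its interval,
    # i.e. vals[i] <= K**|i - j| * vals[j] for all i, j.
    return max((vals[i] / vals[j]) ** (1.0 / abs(i - j))
               for i in range(len(vals))
               for j in range(len(vals)) if i != j)

random.seed(0)
ok = []
for _ in range(50):
    phi = [random.uniform(0.5, 2.0) for _ in range(6)]
    psi = [random.uniform(0.5, 2.0) for _ in range(6)]
    K = max(smooth_K(phi), smooth_K(psi))
    # Check K-smoothness of the product function on X x Y in l1 distance
    # (tiny multiplicative slack guards against floating-point rounding).
    ok.append(all(phi[x] * psi[y]
                  <= (K * 1.000001) ** (abs(x - xx) + abs(y - yy)) * phi[xx] * psi[yy]
                  for x in range(6) for y in range(6)
                  for xx in range(6) for yy in range(6)))
```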
We will take a special interest in locally smooth functions that are probability
distributions. For our purposes, it will be sufficient to consider locally smooth
distributions whose support is the Cartesian product of integer intervals. For an
integer n ≥ 1 and a real number K ≥ 1, we let S(n, K) denote the set of probability
distributions Λ such that:

  (i) Λ is supported on ∏_{i=1}^{n} {0, 1, 2, . . . , ri}, for some r1, r2, . . . , rn ∈ N;
  (ii) Λ is K-smooth on its support.

Analogous to the development in Section 4, we will need a notation for translates of
distributions in S(n, K). For ∆ ≥ 0, we let S(n, K, ∆) denote the set of probability
distributions Λ ∈ D(N^n) such that Λ(t1, . . . , tn) ≡ Λ′(t1 − a1, . . . , tn − an) for some
fixed Λ′ ∈ S(n, K) and a ∈ N^n|≤∆. As a special case, S(n, K, 0) = S(n, K).
Specializing Proposition 5.3(iii) to this context, we obtain:

Proposition 5.4. For any n′, n″, ∆′, ∆″, K, one has

    S(n′, K, ∆′) ⊗ S(n″, K, ∆″) ⊆ S(n′ + n″, K, ∆′ + ∆″).



Proof. The only nontrivial property to verify is K-smoothness, which follows from
Proposition 5.3(iii).

5.3. Metric properties of locally smooth distributions. If Λ is a locally
smooth distribution on X = ∏_{i=1}^{n} {0, 1, 2, . . . , ri}, then a moment's thought reveals
that Λ(x) > 0 at every point x ∈ X. In general, local smoothness gives considerable
control over metric properties of Λ, making it possible to prove nontrivial upper
and lower bounds on Λ(S) for various sets S ⊆ X. We now record two such results,
needed for our work on the sign-rank of AC0.
Proposition 5.5. Let Λ be a probability distribution on X = ∏_{i=1}^{n} {0, 1, 2, . . . , ri}.
Let θ and d be nonnegative integers with θ ≥ d. If Λ is K-smooth on X|≤θ, then

    Λ(X|≤θ) ≤ K^d (n+d choose d) Λ(X|≤θ−d).

Proof. Consider an arbitrary vector x ∈ X|≤θ. By definition, the components of x
are nonnegative integers that sum to at most θ. By decreasing the components of
x as needed, one can obtain a vector x′ with

    x′ ∈ X|≤θ−d,
    x′ ≤ x,
    |x′ − x| ≤ d.

In particular, the K-smoothness of Λ implies that

    Λ(x) ≤ K^d Λ(x′).

Summing on both sides over x ∈ X|≤θ, we obtain

    Λ(X|≤θ) ≤ K^d Λ(X|≤θ−d) · max_{x′∈X|≤θ−d} |{x ∈ X|≤θ : x ≥ x′ and |x − x′| ≤ d}|

            ≤ K^d Λ(X|≤θ−d) · max_{x′∈N^n} |{x ∈ N^n : x ≥ x′ and |x − x′| ≤ d}|

            = K^d Λ(X|≤θ−d) (n+d choose d).
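The final step is the stars-and-bars count |{x ∈ N^n : x ≥ x′ and |x − x′| ≤ d}| = (n+d choose d), since such x are exactly x′ + y with y ∈ N^n and |y| ≤ d. A brute-force confirmation for small parameters (our own illustration, not part of the proof):

```python
from itertools import product
from math import comb

def count_dominating(n, d):
    # Count y in N^n with y_1 + ... + y_n <= d; these parametrize the
    # vectors x >= x' at l1-distance at most d from any fixed x'.
    return sum(1 for y in product(range(d + 1), repeat=n) if sum(y) <= d)

checks = [count_dominating(n, d) == comb(n + d, d)
          for n in range(1, 5) for d in range(0, 6)]
```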
Proposition 5.6. Let Λ be a probability distribution on X = ∏_{i=1}^{n} {0, 1, 2, . . . , ri}.
Let θ and d be nonnegative integers with

    d < (1/2) min{θ, Σ_{i=1}^{n} ri}.     (5.17)

If Λ is K-smooth on X|≤θ, then

    Λ(X|≤θ) ≤ 2^{d+1} K^{2d+1} (n+d choose d) Λ(X|≤θ \ Bd(u))

for every u ∈ X.

Proof. Fix u ∈ X for the rest of the proof. If |u| > θ + d, then X|≤θ \ Bd(u) = X|≤θ
and the statement holds trivially. In what follows, we treat the complementary case
|u| ≤ θ + d. Here, the key is to find a vector u′ with

    |u − u′| = d + 1,     (5.18)
    u′ ∈ X|≤θ.            (5.19)

The algorithm for finding u′ depends on |u|, as follows.

  (i) If |u| > d, decrease one or more of the components of u as needed to obtain
      a vector u′ whose components are nonnegative integers that sum to exactly
      |u| − d − 1. Then (5.18) is immediate, whereas (5.19) follows in view of
      |u| ≤ θ + d.
  (ii) If |u| ≤ d, the analysis is more subtle. Recall that u ∈ ∏_{i=1}^{n} {0, 1, 2, . . . , ri}
      and therefore |(r1, . . . , rn) − u| = Σ ri − |u| ≥ Σ ri − d > d, where the
      last step uses (5.17). As a result, by increasing the components of u as
      necessary, one can obtain a vector u′ ∈ ∏_{i=1}^{n} {0, 1, 2, . . . , ri} with |u′| =
      |u| + d + 1. Then property (5.18) is immediate. Property (5.19) follows
      from |u′| = |u| + d + 1 ≤ 2d + 1 < θ + 1, where the last step uses (5.17).

Now that u′ has been constructed, apply the K-smoothness of Λ to conclude that
for every x ∈ X|≤θ ∩ Bd(u),

    Λ(x) ≤ K^{|x−u′|} Λ(u′)
         ≤ K^{|x−u|+|u−u′|} Λ(u′)
         ≤ K^{2d+1} Λ(u′),     (5.20)

where the last step uses (5.18). As a result,

    Λ(X|≤θ ∩ Bd(u)) ≤ |X|≤θ ∩ Bd(u)| K^{2d+1} Λ(u′)

                    ≤ |Bd(u)| K^{2d+1} Λ(u′)

                    ≤ |Bd(u)| K^{2d+1} Λ(X|≤θ \ Bd(u))

                    ≤ 2^d (n+d choose d) K^{2d+1} Λ(X|≤θ \ Bd(u)),     (5.21)

where the first inequality is the result of summing (5.20) over x ∈ X|≤θ ∩ Bd(u);
the third step uses (5.18) and (5.19); and the last step applies (5.1). To complete
the proof, add Λ(X|≤θ \ Bd(u)) to both sides of (5.21).

5.4. Weight transfer in locally smooth distributions. Locally smooth func-
tions exhibit great plasticity. In what follows, we will show that a locally smooth
function on ∏_{i=1}^{n} {0, 1, 2, . . . , ri} can be modified to achieve a broad range of global
metric behaviors—without the modification being detectable by low-degree poly-
nomials. Among other things, we will be able to take any locally smooth distri-
bution and make it globally min-smooth. Our starting point is a generalization of
Lemma 3.2, which corresponds to taking v = 0^n in the new result.

Lemma 5.7. Fix points u, v ∈ N^n and a natural number d < |u − v|. Then there is
a function ζu,v : cube(u, v) → R such that

    supp ζu,v ⊆ {u} ∪ {x ∈ cube(u, v) : |x − v| ≤ d},     (5.22)

    ζu,v(u) = 1,                                           (5.23)

    ‖ζu,v‖1 ≤ 1 + 2^d (|u − v| choose d),                  (5.24)

    orth ζu,v > d.                                         (5.25)

Proof. Abbreviate u* = (|u1 − v1|, |u2 − v2|, . . . , |un − vn|). Lemma 3.2 constructs
a function ζu* : N^n → R such that

    supp ζu* ⊆ {u*} ∪ {x ∈ N^n : x ≤ u* and |x| ≤ d},     (5.26)

    ζu*(u*) = 1,                                           (5.27)

    ‖ζu*‖1 ≤ 1 + 2^d (|u*| choose d),                      (5.28)

    orth ζu* > d.                                          (5.29)

Define ζu,v : cube(u, v) → R by

    ζu,v(x) = ζu*(|x1 − v1|, |x2 − v2|, . . . , |xn − vn|).

Then (5.22) and (5.23) are immediate from (5.26) and (5.27), respectively. Prop-
erty (5.24) can be verified as follows:

    ‖ζu,v‖1 = Σ_{x∈cube(u,v)} |ζu*(|x1 − v1|, |x2 − v2|, . . . , |xn − vn|)|

            = Σ_{w∈N^n: w≤u*} |ζu*(w)|

            ≤ 1 + 2^d (|u*| choose d),

where the last step uses (5.28). For (5.25), fix an arbitrary polynomial p of degree
at most d. Then at every point x ∈ cube(u, v), we have

    p(x) = p((x1 − v1) + v1, . . . , (xn − vn) + vn)

         = p(sgn(u1 − v1)|x1 − v1| + v1, . . . , sgn(un − vn)|xn − vn| + vn)

         = q(|x1 − v1|, . . . , |xn − vn|),     (5.30)

where q is some polynomial of degree at most d. As a result,

    ⟨ζu,v, p⟩ = Σ_{x∈cube(u,v)} ζu*(|x1 − v1|, . . . , |xn − vn|) p(x)

              = Σ_{x∈cube(u,v)} ζu*(|x1 − v1|, . . . , |xn − vn|) q(|x1 − v1|, . . . , |xn − vn|)

              = Σ_{w∈N^n: w≤u*} ζu*(w) q(w)

              = ⟨ζu*, q⟩

              = 0,

where the second, fourth, and fifth steps are valid by (5.30), (5.26), and (5.29),
respectively.

Our next result is a smooth analogue of Lemma 5.7. The smoothness offers a great
deal of flexibility when using the lemma to transfer ℓ1 mass from one region of N^n
to another.
Lemma 5.8. Let X = ∏_{i=1}^{n} {0, 1, 2, . . . , ri}, where each ri ≥ 0 is an integer. Let θ
and d be nonnegative integers with

    d < (1/3) min{θ, Σ_{i=1}^{n} ri}.

Let Λ be a probability distribution on X|≤θ. Suppose further that Λ is K-smooth on
X|≤θ. Then for every u ∈ X, there is a function Zu : X → R with

    supp Zu ⊆ X|≤θ ∪ {u},     (5.31)

    Zu(u) = 1,                (5.32)

    orth Zu > d,              (5.33)

    ‖Zu‖1 ≤ 2^d (diam({u} ∪ supp Λ) choose d) + 1,     (5.34)

    |Zu(x)| ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam({u} ∪ supp Λ) choose d) Λ(x),   x ≠ u.
                                                      (5.35)

Proof. We have

    1 = Λ(X|≤θ)

      ≤ K^d (n+d choose d) Λ(X|≤θ−d)

      ≤ 2^{d+1} K^{3d+1} (n+d choose d)^2 Λ(X|≤θ−d \ Bd(u)),     (5.36)

where the last two steps apply Propositions 5.5 and 5.6, respectively.

We now move on to the construction of Zu. For any v ∈ X|≤θ−d \ Bd(u),
Lemma 5.7 gives a function ζu,v : X → R with

    supp ζu,v ⊆ {x ∈ cube(u, v) : |x − v| ≤ d} ∪ {u}     (5.37)
             ⊆ X|≤θ ∪ {u},                                (5.38)

    ζu,v(u) = 1,                                          (5.39)

    orth ζu,v > d,                                        (5.40)

    ‖ζu,v‖1 ≤ 2^d (|u − v| choose d) + 1.                 (5.41)

The last inequality can be simplified as follows:

    ‖ζu,v‖1 ≤ 2^d (diam(X|≤θ ∪ {u}) choose d) + 1

            ≤ 2^d (diam({u} ∪ supp Λ) choose d) + 1,      (5.42)

where the first step uses v ∈ X|≤θ, and the second step is legitimate because Λ is a
K-smooth probability distribution on X|≤θ and therefore Λ ≠ 0 at every point of
X|≤θ. Combining (5.39) and (5.42),

    ‖ζu,v‖∞ ≤ 2^d (diam({u} ∪ supp Λ) choose d).          (5.43)

We define Zu : X → R by

    Zu(x) = (1/Λ(X|≤θ−d \ Bd(u))) Σ_{v∈X|≤θ−d\Bd(u)} Λ(v) ζu,v(x),

which is legitimate since Λ(X|≤θ−d \ Bd(u)) > 0 by (5.36). Then properties (5.31),
(5.32), (5.33), and (5.34) for Zu are immediate from the corresponding proper-
ties (5.38), (5.39), (5.40), and (5.42) of ζu,v.
It remains to verify (5.35). Fix x ≠ u. If x ∉ X|≤θ, then (5.38) implies that
Zu(x) = 0 and therefore (5.35) holds in that case. In the complementary case when

x ∈ X|≤θ, we have

    |Zu(x)| ≤ Σ_{v∈X|≤θ−d\Bd(u)} (Λ(v)/Λ(X|≤θ−d \ Bd(u))) · |ζu,v(x)|

            = Σ_{v∈X|≤θ−d\Bd(u): |v−x|≤d} (Λ(v)/Λ(X|≤θ−d \ Bd(u))) · |ζu,v(x)|

            ≤ Σ_{v∈X|≤θ−d\Bd(u): |v−x|≤d} (K^d Λ(x)/Λ(X|≤θ−d \ Bd(u))) · 2^d (diam({u} ∪ supp Λ) choose d)

            ≤ 2^d (n+d choose d) · (K^d Λ(x)/Λ(X|≤θ−d \ Bd(u))) · 2^d (diam({u} ∪ supp Λ) choose d),

where the first step applies the triangle inequality to the definition of Zu; the
second step uses (5.37) and x ≠ u; the third step applies the K-smoothness of
Λ and substitutes the bound from (5.43); and the final step uses (5.1). In view
of (5.36), this completes the proof of (5.35).

We now show how to efficiently zero out a locally smooth function on points of
large Hamming weight. The modified function is pointwise close to the original and
cannot be distinguished from it by any low-degree polynomial.

Lemma 5.9. Define X = ∏_{i=1}^{n} {0, 1, 2, . . . , ri}, where each ri ≥ 0 is an integer. Let
θ and d be nonnegative integers with

    d < θ/3.     (5.44)

Let Φ : X → R be a function that is K-smooth on X|≤θ, with Φ|≤θ ≢ 0. Then there
is Φ̃ : X → R such that

    orth(Φ − Φ̃) > d,     (5.45)

    supp Φ̃ ⊆ X|≤θ,       (5.46)

    |Φ − Φ̃| ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d) · (‖Φ|>θ‖1/‖Φ|≤θ‖1) · |Φ|
                on X|≤θ.     (5.47)
Proof. If θ > Σ_{i=1}^{n} ri, the lemma holds trivially for Φ̃ = Φ. In what follows, we
treat the complementary case θ ≤ Σ_{i=1}^{n} ri. By (5.44),

    d < (1/3) min{θ, Σ_{i=1}^{n} ri}.

Since Φ is K-smooth on X|≤θ, the probability distribution Λ on X|≤θ given by
Λ(x) = |Φ(x)|/‖Φ|≤θ‖1 is also K-smooth. As a result, Lemma 5.8 gives for every

u ∈ X a function Zu : X → R with

Zu (u) = 1, (5.48)
 3  
n+d diam({u} ∪ supp Λ) |Φ(x)|
|Zu (x)| 6 23d+1 K 4d+1
d d kΦ|6θ k1
for x 6= u, (5.49)
orth Zu > d, (5.50)
supp Zu ⊆ X|6θ ∪ {u}. (5.51)

Now define

    Φ̃ = Φ − Σ_{u∈X|>θ} Φ(u) Zu.

Then (5.45) is immediate from (5.50). To verify (5.46), fix any point x ∈ X|>θ.
Then

    Φ̃(x) = Φ(x) − Σ_{u∈X|>θ} Φ(u) Zu(x)

          = Φ(x) − Φ(x) Zx(x)

          = 0,

where the last two steps use (5.51) and (5.48), respectively.
It remains to verify (5.47) on X|≤θ:

    |Φ − Φ̃| ≤ Σ_{u∈X|>θ: Φ(u)≠0} |Φ(u)| |Zu|

            ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d) Σ_{u∈X|>θ: Φ(u)≠0} |Φ(u)| · (|Φ|/‖Φ|≤θ‖1)

            = 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d) · (‖Φ|>θ‖1/‖Φ|≤θ‖1) · |Φ|,

where the second step uses (5.49).

For technical reasons, we need a generalization of the previous lemma to functions
on ∏_{i=1}^{n} {∆i, ∆i + 1, . . . , ∆i + ri} for nonnegative integers ∆i and ri, and further
to convex combinations of such functions. We obtain these generalizations in the
two corollaries that follow.
Corollary 5.10. Define X = ∏_{i=1}^{n} {∆i, ∆i + 1, . . . , ∆i + ri}, where all ∆i and ri
are nonnegative integers. Let θ and d be nonnegative integers with

    d < (1/3) (θ − Σ_{i=1}^{n} ∆i).

Let Φ : X → R be a function that is K-smooth on X|≤θ, with Φ|≤θ ≢ 0. Then there
is a function Φ̃ : X → R such that

    orth(Φ − Φ̃) > d,     (5.52)

    supp Φ̃ ⊆ X|≤θ,       (5.53)

    |Φ − Φ̃| ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d) · (‖Φ|>θ‖1/‖Φ|≤θ‖1) · |Φ|
                on X|≤θ.     (5.54)
Proof. Abbreviate X′ = ∏_{i=1}^{n} {0, 1, 2, . . . , ri} and θ′ = θ − Σ_{i=1}^{n} ∆i. In this notation,

    d < θ′/3.     (5.55)

Consider the function Φ′ : X′ → R given by Φ′(x) = Φ(x + (∆1, ∆2, . . . , ∆n)). Then
any two points u, v ∈ X′|≤θ′ obey

    |Φ′(u)| = |Φ(u + (∆1, ∆2, . . . , ∆n))|

            ≤ K^{|u−v|} |Φ(v + (∆1, ∆2, . . . , ∆n))|

            = K^{|u−v|} |Φ′(v)|,

where the second step uses the K-smoothness of Φ on X|≤θ. As a result, Φ′
is K-smooth on X′|≤θ′. Moreover, ‖Φ′|≤θ′‖1 = ‖Φ|≤θ‖1 > 0. In view of (5.55),
Lemma 5.9 gives a function Φ̃′ : X′ → R such that

    orth(Φ′ − Φ̃′) > d,

    supp Φ̃′ ⊆ X′|≤θ′,

and

    |Φ′ − Φ̃′| ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ′) choose d) · (‖Φ′|>θ′‖1/‖Φ′|≤θ′‖1) · |Φ′|

              = 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d) · (‖Φ|>θ‖1/‖Φ|≤θ‖1) · |Φ′|

on X′|≤θ′. As a result, (5.52)–(5.54) hold for the real-valued function Φ̃ : X → R
given by Φ̃(x) = Φ̃′(x − (∆1, ∆2, . . . , ∆n)).

Corollary 5.11. Fix integers ∆, d, θ ≥ 0 and n ≥ 1, and a real number δ, where

    δ ∈ [0, 1),
    d < (1/3)(θ − ∆).

Then for every

    Λ ∈ conv(S(n, K, ∆) ∩ {Λ′ ∈ D(N^n) : Λ′(N^n|>θ) ≤ δ}),

there is a function Λ̃ : N^n → R such that

    orth(Λ − Λ̃) > d,

    supp Λ̃ ⊆ N^n|≤θ ∩ supp Λ,

    |Λ − Λ̃| ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Λ) choose d) · (δ/(1 − δ)) · Λ     on N^n|≤θ.

Proof. Write Λ out explicitly as

    Λ = Σ_{i=1}^{N} λi Λi

for some positive reals λ1, . . . , λN with Σ λi = 1, where Λi ∈ S(n, K, ∆) and
Λi(N^n|>θ) ≤ δ. Then clearly

    supp Λ = ∪_{i=1}^{N} supp Λi.     (5.56)

For i = 1, 2, . . . , N, Corollary 5.10 constructs Λ̃i : N^n → R with

    orth(Λi − Λ̃i) > d,     (5.57)

    supp Λ̃i ⊆ N^n|≤θ,      (5.58)

    |Λi − Λ̃i| ≤ 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Λi) choose d) · (δ/(1 − δ)) · Λi
                 on N^n|≤θ,     (5.59)

    supp Λ̃i ⊆ supp Λi,     (5.60)

where the last property follows from the two before it. In view of (5.56)–(5.60), the
proof is complete by taking Λ̃ = Σ_{i=1}^{N} λi Λ̃i.

Our next result uses local smoothness to achieve something completely different.
Here, we show how to start with a locally smooth function and make it globally
min-smooth. The new function has the same sign pointwise as the original, and
cannot be distinguished from it by any low-degree polynomial. Crucially for us, the
global min-smoothness can be achieved relative to any distribution on the domain.
Lemma 5.12. Define X = ∏_{i=1}^{n} {0, 1, 2, . . . , ri}, where each ri ≥ 0 is an integer.
Let θ and d be nonnegative integers with

    d < (1/3) min{θ, Σ_{i=1}^{n} ri}.

Let Φ : X|≤θ → R be a function that is K-smooth on X|≤θ. Then for every proba-
bility distribution Λ∗ on X|≤θ, there is Φ∗ : X|≤θ → R such that

    orth(Φ − Φ∗) > d,     (5.61)

    ‖Φ∗‖1 ≤ 2‖Φ‖1,        (5.62)

    Φ · Φ∗ ≥ 0,           (5.63)

    |Φ∗| ≥ (2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d))^{−1} ‖Φ‖1 Λ∗.     (5.64)

Proof. If Φ ≡ 0, the lemma holds trivially with Φ∗ = Φ. In the complementary
case, abbreviate

    N = 2^{3d+1} K^{4d+1} (n+d choose d)^3 (diam(supp Φ) choose d).

We will view |Φ|/‖Φ‖1 as a probability distribution on X|≤θ. By hypothesis, this
probability distribution is K-smooth on X|≤θ. In particular, supp |Φ| = supp Φ =
X|≤θ. Therefore, Lemma 5.8 gives for every u ∈ X|≤θ a function Zu : X|≤θ → R
with

    Zu(u) = 1,     (5.65)

    ‖Zu‖1 ≤ N/2 + 1,     (5.66)

    |Zu(x)| ≤ N · (|Φ(x)|/‖Φ‖1),   x ≠ u,     (5.67)

    orth Zu > d.     (5.68)

Now, define Φ∗ : X|≤θ → R by

    Φ∗ = Φ + (‖Φ‖1/N) Σ_{u∈X|≤θ} sgn(Φ(u)) Λ∗(u) Zu.

Then (5.61) follows directly from (5.68). For (5.62), we have:

    ‖Φ∗‖1 ≤ ‖Φ‖1 + (‖Φ‖1/N) Σ_{u∈X|≤θ} Λ∗(u) ‖Zu‖1

           ≤ ‖Φ‖1 + (‖Φ‖1/N) · (N/2 + 1) Σ_{u∈X|≤θ} Λ∗(u)

           = ((3N + 2)/(2N)) ‖Φ‖1

           ≤ 2‖Φ‖1,     (5.69)

where the second step uses (5.66). The remaining properties (5.63) and (5.64) can
be established simultaneously as follows: for every x ∈ X|≤θ,

    sgn(Φ(x)) · Φ∗(x)

      = |Φ(x)| + (‖Φ‖1/N) Σ_{u∈X|≤θ} sgn(Φ(x)) sgn(Φ(u)) Λ∗(u) Zu(x)

      ≥ |Φ(x)| + (‖Φ‖1/N) Λ∗(x) Zx(x) − (‖Φ‖1/N) Σ_{u∈X|≤θ: u≠x} Λ∗(u) |Zu(x)|

      = |Φ(x)| + (‖Φ‖1/N) Λ∗(x) − (‖Φ‖1/N) Σ_{u∈X|≤θ: u≠x} Λ∗(u) |Zu(x)|

      ≥ |Φ(x)| + (‖Φ‖1/N) Λ∗(x) − (‖Φ‖1/N) · N · (|Φ(x)|/‖Φ‖1) Σ_{u∈X|≤θ: u≠x} Λ∗(u)

      = |Φ(x)| + (‖Φ‖1/N) Λ∗(x) − |Φ(x)| (1 − Λ∗(x))

      ≥ (‖Φ‖1/N) Λ∗(x),     (5.70)

where the third and fourth steps use (5.65) and (5.67), respectively.

5.5. A locally smooth dual polynomial for MP. As Sections 5.2–5.4 show,
local smoothness implies several useful metric and analytic properties. To tap into
this resource, we now construct a locally smooth dual polynomial for the Minsky–Papert
function. It is helpful to view this new result as a counterpart of Theorem 4.4
from our analysis of the threshold degree of AC0. The new proof is considerably
more technical because local smoothness is a delicate property to achieve.

Theorem 5.13. For some absolute constant 0 < c < 1 and all positive integers
m, r, R with r ≤ R, there are probability distributions Λ0 and Λ1 such that

    supp Λ0 = (MP∗_{m,R})^{−1}(0),     (5.71)

    supp Λ1 = (MP∗_{m,R})^{−1}(1),     (5.72)

    orth(Λ0 − Λ1) ≥ min{m, c√r},      (5.73)

    (Λ0 + Λ1)/2 ∈ Smooth(m/c, {0, 1, 2, . . . , R}^m),     (5.74)

    Λ0, Λ1 ∈ conv( { λ ∈ S(1, 1/c, 1) : λ(t) ≤ 1/(c(t + 1)^2 2^{ct/√r}) for t ∈ N }^⊗m ).     (5.75)

Our proof of Theorem 5.13 repeatedly employs the following simple but useful
criterion for K-smoothness: a probability distribution λ is K-smooth on an integer
interval I = {i, i + 1, i + 2, . . . , j} if and only if the probabilities of any two
consecutive integers in I are within a factor of K.
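This criterion is easy to confirm empirically. The snippet below (our own illustration, not part of the paper) checks on random positive vectors that the consecutive-ratio condition and full K-smoothness on an interval agree, both for a K that satisfies the consecutive condition and for one that does not.

```python
import random

def is_K_smooth(vals, K):
    # lambda(x) <= K**|x - y| * lambda(y) for all points x, y of the interval.
    return all(vals[i] <= K ** abs(i - j) * vals[j]
               for i in range(len(vals)) for j in range(len(vals)))

def consecutive_within_K(vals, K):
    # Probabilities of any two consecutive integers within a factor of K.
    return all(max(a, b) <= K * min(a, b) for a, b in zip(vals, vals[1:]))

random.seed(0)
agree = []
for _ in range(200):
    vals = [random.uniform(0.1, 1.0) for _ in range(8)]
    # Smallest K meeting the consecutive-ratio condition, plus float slack.
    K = max(max(a, b) / min(a, b) for a, b in zip(vals, vals[1:])) * (1 + 1e-9)
    agree.append(consecutive_within_K(vals, K) == is_K_smooth(vals, K))
    # A strictly smaller K should fail both conditions.
    agree.append(consecutive_within_K(vals, K / 1.5) == is_K_smooth(vals, K / 1.5))
```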
Proof of Theorem 5.13. Abbreviate ε = 1/6. For some absolute constants c′, c″ ∈
(0, 1), Lemma 4.3 constructs probability distributions λ0, λ1, λ2 such that

    supp λ0 = {0},     (5.76)

    supp λi = {1, 2, . . . , R},     i = 1, 2,     (5.77)

    λi(t) ∈ [c′/(t^2 2^{c″t/√r}), 1/(c′t^2 2^{c″t/√r})],     i = 1, 2;  t = 1, 2, . . . , R,     (5.78)

    orth((1 − ε)λ0 + ελ2 − λ1) ≥ c′√r.     (5.79)

We infer that

    λ0 ∈ S(1, K),     (5.80)

    λ1 ∈ S(1, K, 1),     (5.81)

    λ2 ∈ S(1, K, 1),     (5.82)

    (1 − ε)λ0 + ελ2 ∈ S(1, K),     (5.83)

    (1/(m+1))λ0 + (m/(m+1))λ1 ∈ S(1, Km)     (5.84)

for some large constant K = K(c′, c″) ≥ 1. Indeed, (5.80) is trivial since λ0 is the
single-point distribution on the origin; (5.81) holds because by (5.77) and (5.78), the
probabilities of any pair of consecutive integers in supp λ1 = {1, 2, . . . , R} are the
same up to a constant factor; and (5.82)–(5.84) can be seen analogously, by compar-
ing the probabilities of any pair of consecutive integers. Combining (5.80)–(5.84)
with Proposition 5.4, we obtain

    {λ0, λ1, λ2}^⊗m ⊆ S(m, K, m),     (5.85)

    ((1 − ε)λ0 + ελ2)^⊗m ∈ S(m, K),     (5.86)

    ((1/(m+1))λ0 + (m/(m+1))λ1)^⊗m ∈ S(m, Km).     (5.87)

The proof centers around the dual objects Ψ1, Ψ2 : {0, 1, 2, . . . , R}^m → R given
by

    Ψ1 = ((1/(m+1))λ0 + (m/(m+1))λ1)^⊗m − 2λ1^⊗m

and

    Ψ2 = 2((1 − ε)λ0 + ελ2)^⊗m − 2(−ελ0 + ελ2)^⊗m
         − ((1/(m+1))λ0 + (m/(m+1))((1 − ε)λ0 + ελ2))^⊗m.

The next four claims establish key properties of Ψ1 and Ψ2 .



Claim 5.14. Ψ1 satisfies

    pos Ψ1 ∈ cone({λ0, λ1}^⊗m \ {λ1^⊗m}),     (5.88)

    neg Ψ1 ∈ cone{λ1^⊗m},                      (5.89)

    (1/5)|Ψ1| ≤ ((1/(m+1))λ0 + (m/(m+1))λ1)^⊗m ≤ |Ψ1|.     (5.90)

Claim 5.15. Ψ2 satisfies

    pos Ψ2 ∈ cone({λ0, λ2}^⊗m \ {λ2^⊗m}),     (5.91)

    neg Ψ2 ∈ cone{λ2^⊗m},                      (5.92)

    (1/3)|Ψ2| ≤ ((1 − ε)λ0 + ελ2)^⊗m ≤ 3|Ψ2|.     (5.93)
Claim 5.16. Ψ1 and Ψ2 satisfy

    supp(pos Ψi) = (MP∗_{m,R})^{−1}(0),     i = 1, 2,     (5.94)

    supp(neg Ψi) = (MP∗_{m,R})^{−1}(1),     i = 1, 2.     (5.95)

Claim 5.17. orth(Ψ1 + Ψ2) ≥ min{m, c′√r}.

We will settle Claims 5.14–5.17 shortly, once we complete the main proof. Define

    Λ0 = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) pos(Ψ1 + Ψ2),

    Λ1 = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) neg(Ψ1 + Ψ2),

where the denominators are nonzero by (5.90). We proceed to verify the properties
required of Λ0 and Λ1 in the theorem statement.

Support. Recall from Claim 5.16 that the positive parts of Ψ1 and Ψ2 are
supported on (MP∗_{m,R})^{−1}(0). Therefore, the positive part of Ψ1 + Ψ2 is supported
on (MP∗_{m,R})^{−1}(0) as well, which in turn implies that

    supp Λ0 = (MP∗_{m,R})^{−1}(0).     (5.96)

Analogously, Claim 5.16 states that the negative parts of Ψ1 and Ψ2 are supported
on (MP∗_{m,R})^{−1}(1). As a result, the negative part of Ψ1 + Ψ2 is also supported on
(MP∗_{m,R})^{−1}(1), whence

    supp Λ1 = (MP∗_{m,R})^{−1}(1).     (5.97)

Orthogonality. The defining equations for Λ0 and Λ1 imply that

    Λ0 − Λ1 = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) (Ψ1 + Ψ2),

which along with Claim 5.17 forces

    orth(Λ0 − Λ1) ≥ min{m, c′√r}.     (5.98)

Nonnegativity and norm. By definition, Λ0 and Λ1 are nonnegative func-
tions. We calculate

    ‖Λ0‖1 − ‖Λ1‖1 = ⟨Λ0, 1⟩ − ⟨Λ1, 1⟩

                  = ⟨Λ0 − Λ1, 1⟩

                  = 0,     (5.99)

where the first step uses the nonnegativity of Λ0 and Λ1, and the last step ap-
plies (5.98). In addition,

    ‖Λ0‖1 + ‖Λ1‖1 = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) (‖pos(Ψ1 + Ψ2)‖1 + ‖neg(Ψ1 + Ψ2)‖1)

                  = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) ‖Ψ1 + Ψ2‖1

                  = 2,     (5.100)

where the last step uses Claim 5.16. A consequence of (5.99) and (5.100) is that
‖Λ0‖1 = ‖Λ1‖1 = 1, which makes Λ0 and Λ1 probability distributions. In view
of (5.96) and (5.97), we conclude that

    Λi ∈ D((MP∗_{m,R})^{−1}(i)),     i = 0, 1.     (5.101)

In particular,

    (Λ0 + Λ1)/2 ∈ D({0, 1, 2, . . . , R}^m).     (5.102)

Smoothness. We have

    (Λ0 + Λ1)/2 = |Ψ1 + Ψ2|/(‖Ψ1‖1 + ‖Ψ2‖1)

                = (1/(‖Ψ1‖1 + ‖Ψ2‖1)) |Ψ1| + (1/(‖Ψ1‖1 + ‖Ψ2‖1)) |Ψ2|,     (5.103)

where the first step follows from the defining equations for Λ0 and Λ1, and the sec-
ond step uses Claim 5.16. Inequality (5.90) shows that at every point, |Ψ1| is within
a factor of 5 of the tensor product ((1/(m+1))λ0 + (m/(m+1))λ1)^⊗m, which by (5.87) is Km-
smooth on its support. It follows that |Ψ1| is 25Km-smooth on {0, 1, 2, . . . , R}^m. By
an analogous argument, (5.93) and (5.86) imply that |Ψ2| is 9K-smooth (and hence
also 25Km-smooth) on {0, 1, 2, . . . , R}^m. Now (5.103) shows that (Λ0 + Λ1)/2 is a
conical combination of two nonnegative 25Km-smooth functions on {0, 1, 2, . . . , R}^m.
By Proposition 5.3(iv),

    (Λ0 + Λ1)/2 ∈ Smooth(25Km, {0, 1, 2, . . . , R}^m).     (5.104)
Having examined the convex combination (Λ0 + Λ1)/2, we now turn to the individual
distributions Λ0 and Λ1. We have

    Λ0 = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) pos(Ψ1 + Ψ2)

       = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) (pos(Ψ1) + pos(Ψ2))

       ∈ cone({λ0, λ1, λ2}^⊗m),

where the first equation restates the definition of Λ0, the second step applies (5.94),
and the last step uses (5.88) and (5.91). Analogously,

    Λ1 = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) neg(Ψ1 + Ψ2)

       = (2/(‖Ψ1‖1 + ‖Ψ2‖1)) (neg(Ψ1) + neg(Ψ2))

       ∈ cone({λ1^⊗m, λ2^⊗m}),

where the first equation restates the definition of Λ1, the second step applies (5.95),
and the last step uses (5.89) and (5.92). Thus, Λ0 and Λ1 are conical combinations
of probability distributions in {λ0, λ1, λ2}^⊗m. Since Λ0 and Λ1 are themselves prob-
ability distributions, we conclude that

    Λ0, Λ1 ∈ conv({λ0, λ1, λ2}^⊗m).

By (5.76)–(5.78),

    λi(t) ≤ 1/(c′′′(t + 1)^2 2^{c′′′t/√r})     (t ∈ N; i = 0, 1, 2)

for some constant c′′′ > 0. The last two equations along with (5.80)–(5.82) yield

    Λ0, Λ1 ∈ conv( { λ ∈ S(1, K, 1) : λ(t) ≤ 1/(c′′′(t + 1)^2 2^{c′′′t/√r}) for t ∈ N }^⊗m ).     (5.105)

Now (5.96)–(5.98), (5.104), and (5.105) imply (5.71)–(5.75) for a small enough
constant c > 0.

We now settle the four claims made in the proof of Theorem 5.13.

Proof of Claim 5.14. Multiplying out the tensor product in the definition of Ψ1 and
collecting like terms (with S̄ = {1, 2, . . . , m} \ S below), we obtain

    Ψ1 = −(2 − (m/(m+1))^m) λ1^⊗m
         + Σ_{S⊆{1,...,m}, S≠∅} (1/(m+1))^{|S|} (m/(m+1))^{m−|S|} λ0^⊗S · λ1^⊗S̄.     (5.106)

Recall from (5.76) and (5.77) that λ0 and λ1 are supported on {0} and {1, 2, . . . , R},
respectively. Therefore, the right-hand side of (5.106) is the sum of 2^m nonzero
functions whose supports are pairwise disjoint. Now (5.88) and (5.89) follow directly
from (5.106). One further obtains that

    |Ψ1| = (2 − (m/(m+1))^m) λ1^⊗m
           + Σ_{S⊆{1,...,m}, S≠∅} (1/(m+1))^{|S|} (m/(m+1))^{m−|S|} λ0^⊗S · λ1^⊗S̄.

From first principles,

    ((1/(m+1))λ0 + (m/(m+1))λ1)^⊗m = (m/(m+1))^m λ1^⊗m
           + Σ_{S⊆{1,...,m}, S≠∅} (1/(m+1))^{|S|} (m/(m+1))^{m−|S|} λ0^⊗S · λ1^⊗S̄.

Comparing the right-hand sides of the last two equations settles (5.90).

Proof of Claim 5.15. Multiplying out the tensor powers in the definition of Ψ2 and
collecting like terms (with S̄ = {1, 2, . . . , m} \ S below), we obtain

    Ψ2 = −(εm/(m+1))^m λ2^⊗m + Σ_{S⊆{1,...,m}, S≠∅} a_{|S|} λ0^⊗S · λ2^⊗S̄,     (5.107)

where the coefficients a1, a2, . . . , am are given by

    ai = 2(1 − ε)^i ε^{m−i} − 2(−1)^i ε^m − (1 − εm/(m+1))^i (εm/(m+1))^{m−i}

       = (1 − ε)^i ε^{m−i} ( 2 − 2(−ε/(1 − ε))^i − (1 + ε/((1 − ε)(m + 1)))^i (m/(m+1))^{m−i} )

       ∈ [ (1/3)(1 − ε)^i ε^{m−i}, 3(1 − ε)^i ε^{m−i} ].     (5.108)

As in the proof of the previous claim, recall from (5.76) and (5.77) that λ0 and λ2
have disjoint support. Therefore, the right-hand side of (5.107) is the sum of 2^m
nonzero functions whose supports are pairwise disjoint. Now (5.91) and (5.92) are
immediate from (5.108). The disjointness of the supports of the summands on the
right-hand side of (5.107) also implies that

    |Ψ2| = (εm/(m+1))^m λ2^⊗m + Σ_{S⊆{1,...,m}, S≠∅} |a_{|S|}| λ0^⊗S · λ2^⊗S̄.

In view of (5.108), we conclude that |Ψ2| coincides up to a factor of 3 with the
function

    Σ_{S⊆{1,...,m}} (1 − ε)^{|S|} ε^{m−|S|} λ0^⊗S · λ2^⊗S̄ = ((1 − ε)λ0 + ελ2)^⊗m.

This settles (5.93) and completes the proof.
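As a sanity check on (5.107) and (5.108) (our own illustration, with ε = 1/6 and exact rational arithmetic, not part of the proof): replacing the tensor factors λ0, λ2 by scalars x, y turns (5.107) into a polynomial identity, which can be confirmed together with the claimed factor-3 bounds on ai for small m.

```python
from fractions import Fraction as F
from math import comb

EPS = F(1, 6)

def a(i, m):
    # Coefficient a_i of lambda_0^{tensor S} . lambda_2^{tensor Sbar}, |S| = i >= 1.
    e, q = EPS, F(m, m + 1)
    return (2 * (1 - e) ** i * e ** (m - i)
            - 2 * (-1) ** i * e ** m
            - (1 - q * e) ** i * (q * e) ** (m - i))

def psi2(x, y, m):
    # Scalar surrogate of Psi_2 with lambda_0 -> x, lambda_2 -> y.
    e = EPS
    return (2 * ((1 - e) * x + e * y) ** m
            - 2 * (-e * x + e * y) ** m
            - (F(1, m + 1) * x + F(m, m + 1) * ((1 - e) * x + e * y)) ** m)

def psi2_expanded(x, y, m):
    # Scalar surrogate of the expansion (5.107).
    q = F(m, m + 1)
    return (-(q * EPS) ** m * y ** m
            + sum(comb(m, i) * a(i, m) * x ** i * y ** (m - i)
                  for i in range(1, m + 1)))

expansion_ok = all(psi2(x, y, m) == psi2_expanded(x, y, m)
                   for m in range(1, 7)
                   for (x, y) in [(F(2), F(3)), (F(5), F(1))])
bounds_ok = all(F(1, 3) * (1 - EPS) ** i * EPS ** (m - i)
                <= a(i, m)
                <= 3 * (1 - EPS) ** i * EPS ** (m - i)
                for m in range(1, 9) for i in range(1, m + 1))
```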

Proof of Claim 5.16. Recall from (5.76) and (5.77) that supp λ0 = {0} and supp λ1 =
supp λ2 = {1, 2, . . . , R}. In this light, (5.88)–(5.90) imply

    supp(pos Ψ1) ⊆ (MP∗_{m,R})^{−1}(0),
    supp(neg Ψ1) ⊆ (MP∗_{m,R})^{−1}(1),
    supp(Ψ1) = (MP∗_{m,R})^{−1}(0) ∪ (MP∗_{m,R})^{−1}(1),

respectively. Analogously, (5.91)–(5.93) imply

    supp(pos Ψ2) ⊆ (MP∗_{m,R})^{−1}(0),
    supp(neg Ψ2) ⊆ (MP∗_{m,R})^{−1}(1),
    supp(Ψ2) = (MP∗_{m,R})^{−1}(0) ∪ (MP∗_{m,R})^{−1}(1).

Since the support of each Ψi is the disjoint union of the supports of its positive and
negative parts, (5.94) and (5.95) follow.

Proof of Claim 5.17. Write Ψ1 + Ψ2 = A + B + C, where

    A = ((1/(m+1))λ0 + (m/(m+1))λ1)^⊗m − ((1/(m+1))λ0 + (m/(m+1))((1 − ε)λ0 + ελ2))^⊗m,

    B = 2((1 − ε)λ0 + ελ2)^⊗m − 2λ1^⊗m,

    C = −2(−ελ0 + ελ2)^⊗m.

As a result, Proposition 2.1(i) guarantees that

    orth(Ψ1 + Ψ2) ≥ min{orth A, orth B, orth C}.     (5.109)



We have

    orth A ≥ orth( ((1/(m+1))λ0 + (m/(m+1))λ1)
                   − ((1/(m+1))λ0 + (m/(m+1))((1 − ε)λ0 + ελ2)) )

           = orth( −(m/(m+1)) ((1 − ε)λ0 + ελ2 − λ1) )

           ≥ c′√r,     (5.110)

where the first step uses Proposition 2.1(iii), and the last step is a restatement
of (5.79). Analogously,

    orth B ≥ orth(((1 − ε)λ0 + ελ2) − λ1)

           ≥ c′√r,     (5.111)

where the first and second steps use Proposition 2.1(iii) and (5.79), respectively.
Finally,

    orth C = orth((−ελ0 + ελ2)^⊗m)

           = m orth(−ελ0 + ελ2)

           ≥ m,     (5.112)

where the second step applies Proposition 2.1(ii), and the third step is valid because
⟨−ελ0 + ελ2, 1⟩ = −ε + ε = 0. By (5.109)–(5.112), the proof is complete.

5.6. An amplification theorem for smooth threshold degree. We have


reached the technical centerpiece of our sign-rank analysis, an amplification theorem
for smooth threshold degree. This result is qualitatively stronger than the amplifi-
cation theorems for threshold degree in Section 4.3, which do not preserve smooth-
ness. We prove the new amplification theorem by manipulating locally smooth dual
objects to achieve the desired global behavior, an approach unrelated to our work
in Section 4.3. A detailed statement of our result follows.

Theorem 5.18. There is an absolute constant C > 1 such that for all:
positive integers n, m, r, R, θ with R ≥ r and θ ≥ Cnm log(2nm);
real numbers γ ∈ [0, 1];
functions f : {0, 1}^n → {0, 1};
probability distributions Λ* on {0, 1, 2, . . . , R}^{mn}|_{≤θ}; and
positive integers d with
\[
d \leq \frac{1}{C}\min\bigg\{ m\deg_{\pm}(f,\gamma),\; \sqrt{r}\,\deg_{\pm}(f,\gamma),\; \frac{\theta}{\sqrt{r}\,\log(2nmR)} \bigg\},   (5.113)
\]
76 ALEXANDER A. SHERSTOV AND PEI WU

one has:
\[
\operatorname{orth}\big((-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \Lambda\big) \geq d,   (5.114)
\]
\[
\Lambda \geq \gamma \cdot (CnmR)^{-9d}\, \Lambda^*   (5.115)
\]
for some Λ ∈ D({0, 1, 2, . . . , R}^{mn}|_{≤θ}).

Proof. Let 0 < c < 1 be the constant from Theorem 5.13. Take C > 1/c to be a
sufficiently large absolute constant. By hypothesis,

θ > Cnm log(2nm). (5.116)

Abbreviate
\[
X = \{0, 1, 2, \ldots, R\}^{nm}, \qquad \delta = 2^{-c\theta/(2\sqrt{r})}.   (5.117)
\]

The following inequalities are straightforward to verify:
\[
d < \frac{1}{3}\min\{\theta - nm,\; nmR\},   (5.118)
\]
\[
\theta > \frac{8enm(1 + \ln(nm))}{c},   (5.119)
\]
\[
\frac{2^{3d+1}}{c^{4d+1}} \binom{nm+d}{d}^{3} \binom{nmR}{d} \cdot \frac{\delta}{1-\delta} < \frac{1}{2},   (5.120)
\]
\[
2^{3d+1} \Big(\frac{3m}{c}\Big)^{4d+1} \binom{nm+d}{d}^{3} \binom{nmR}{d} \leq \frac{(CnmR)^{9d}}{4}.   (5.121)
\]

For example, (5.118) holds because d 6 nm/C by (5.113) and θ > Cnm log(2nm)
by (5.116). Inequalities (5.119)–(5.121) follow analogously from (5.113) and (5.116)
for a large enough constant C. The rest of the proof splits neatly into four major
steps.

Step 1: Key distributions. Theorem 5.13 provides probability distributions


Λ0 and Λ1 such that
\[
\operatorname{supp}\Lambda_i = (\mathrm{MP}^*_{m,R})^{-1}(i), \qquad i = 0, 1,   (5.122)
\]
\[
\operatorname{orth}(\Lambda_0 - \Lambda_1) \geq \min\{m,\ c\sqrt{r}\},   (5.123)
\]
\[
\frac{\Lambda_0 + \Lambda_1}{2} \in \operatorname{Smooth}\Big(\frac{m}{c},\ \{0, 1, 2, \ldots, R\}^m\Big),   (5.124)
\]
\[
\Lambda_0, \Lambda_1 \in \operatorname{conv}\Bigg(\bigg\{\lambda\in S\Big(1,\frac{1}{c},1\Big) : \lambda(t)\leq\frac{1}{c(t+1)^2\,2^{ct/\sqrt{r}}}\ \text{for } t\in\mathbb{N}\bigg\}^{\otimes m}\Bigg).   (5.125)
\]

Consider the probability distributions
\[
\Lambda_z = \bigotimes_{i=1}^{n}\Lambda_{z_i}, \qquad z\in\{0,1\}^n.
\]

Then
\[
\begin{aligned}
\Lambda_z &\in \operatorname{conv}\Bigg(\bigg\{\lambda\in S\Big(1,\frac{1}{c},1\Big) : \lambda(t)\leq\frac{1}{c(t+1)^2\,2^{ct/\sqrt{r}}}\ \text{for } t\in\mathbb{N}\bigg\}^{\otimes mn}\Bigg) \\
&\subseteq \operatorname{conv}\Bigg(S\Big(1,\frac{1}{c},1\Big)^{\otimes mn} \cap \bigg\{\lambda\in D(\mathbb{N}) : \lambda(t)\leq\frac{1}{c(t+1)^2\,2^{ct/\sqrt{r}}}\ \text{for } t\in\mathbb{N}\bigg\}^{\otimes mn}\Bigg) \\
&\subseteq \operatorname{conv}\Big(S\Big(1,\frac{1}{c},1\Big)^{\otimes mn} \cap \big\{\Lambda\in D(\mathbb{N}^{nm}) : \Lambda(\mathbb{N}^{nm}|_{>\theta})\leq 2^{-c\theta/(2\sqrt{r})}\big\}\Big) \\
&\subseteq \operatorname{conv}\Big(S\Big(1,\frac{1}{c},1\Big)^{\otimes mn} \cap \big\{\Lambda\in D(\mathbb{N}^{nm}) : \Lambda(\mathbb{N}^{nm}|_{>\theta})\leq\delta\big\}\Big) \\
&\subseteq \operatorname{conv}\Big(S\Big(nm,\frac{1}{c},nm\Big) \cap \big\{\Lambda\in D(\mathbb{N}^{nm}) : \Lambda(\mathbb{N}^{nm}|_{>\theta})\leq\delta\big\}\Big),
\end{aligned}   (5.126)
\]
c

where the first step uses (2.2) and (5.125); the third step is valid by (5.119) and
Lemma 3.6; the fourth step is a substitution from (5.117); and the last step is an
application of Proposition 5.4.

Step 2: Restricting the support. By (5.118), (5.126), and Corollary 5.11,


there is a real function Λ̃z : N^{nm} → R such that
\[
\operatorname{orth}(\Lambda_z - \tilde\Lambda_z) \geq d,   (5.127)
\]
\[
\operatorname{supp}\tilde\Lambda_z \subseteq \mathbb{N}^{nm}|_{\leq\theta},   (5.128)
\]
\[
\operatorname{supp}\tilde\Lambda_z \subseteq \operatorname{supp}\Lambda_z,   (5.129)
\]
and
\[
|\Lambda_z - \tilde\Lambda_z| \leq \frac{2^{3d+1}}{c^{4d+1}} \binom{nm+d}{d}^{3} \binom{\operatorname{diam}(\operatorname{supp}\Lambda_z)}{d} \cdot \frac{\delta}{1-\delta} \cdot \Lambda_z \quad \text{on } \mathbb{N}^{nm}|_{\leq\theta}.
\]
In view of (5.120) and diam(supp Λz) ≤ nmR, the last equation simplifies to
\[
|\Lambda_z - \tilde\Lambda_z| \leq \frac{1}{2}\,\Lambda_z \quad \text{on } \mathbb{N}^{nm}|_{\leq\theta}.   (5.130)
\]

Properties (5.128) and (5.130) imply that Λ̃z is a nonnegative function, which
along with (5.127) and Proposition 2.4 implies that Λ̃z is a probability distribution.

Combining this fact with (5.122), (5.128), and (5.129) gives
\[
\tilde\Lambda_z \in D\Bigg(\mathbb{N}^{nm}|_{\leq\theta} \cap \prod_{i=1}^{n}(\mathrm{MP}^*_{m,R})^{-1}(z_i)\Bigg), \qquad z\in\{0,1\}^n.   (5.131)
\]
In particular, the Λ̃z are supported on disjoint sets of inputs.

Step 3: Ensuring min-smoothness. Recall from (5.131) that each of the


probability distributions Λ̃z is supported on a subset of X|_{≤θ}. Consider the function Φ : X|_{≤θ} → R given by
\[
\Phi = 2^{-n}\sum_{z\in\{0,1\}^n}(-1)^{f(z)}\,\tilde\Lambda_z.
\]

Again by (5.131), the support of Λ̃z is contained in ∏_{i=1}^n (MP*_{m,R})^{-1}(z_i). This means in particular that f ◦ MP*_{m,R} = f(z) on the support of Λ̃z, whence
\[
(-1)^{f(z)}\,\tilde\Lambda_z = (-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \tilde\Lambda_z   (5.132)
\]

everywhere on X|6θ . Making this substitution in the defining equation for Φ, we


find that

\[
(-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \Phi \geq 0.   (5.133)
\]

The fact that the Λ̃z are supported on pairwise disjoint sets of inputs forces
\[
|\Phi| = 2^{-n}\sum_{z\in\{0,1\}^n}\tilde\Lambda_z   (5.134)
\]
and in particular
\[
\|\Phi\|_1 = 1.   (5.135)
\]

We now examine the smoothness of Φ. For this, consider the probability distribution
\[
\Lambda = 2^{-n}\sum_{z\in\{0,1\}^n}\Lambda_z.   (5.136)
\]
Comparing equations (5.134) and (5.136) term by term and using the upper bound (5.130), we find that |Λ − |Φ|| ≤ ½Λ on X|_{≤θ}. Equivalently,
\[
\frac{1}{2}\,\Lambda \leq |\Phi| \leq \frac{3}{2}\,\Lambda \quad \text{on } X|_{\leq\theta}.   (5.137)
\]

But
\[
\Lambda = \Big(\frac{1}{2}\Lambda_0 + \frac{1}{2}\Lambda_1\Big)^{\otimes n}
\in \operatorname{Smooth}\Big(\frac{m}{c},\ \{0, 1, 2, \ldots, R\}^m\Big)^{\otimes n}
\subseteq \operatorname{Smooth}\Big(\frac{m}{c},\ \{0, 1, 2, \ldots, R\}^{mn}\Big),   (5.138)
\]
where the last two steps are valid by (5.124) and Proposition 5.3(iii), respectively. Combining (5.137) and (5.138), we conclude that Φ is (3m/c)-smooth on X|_{≤θ}. As a result, (5.118) and Lemma 5.12 provide a function Φ* : X|_{≤θ} → R with

\[
\operatorname{orth}(\Phi - \Phi^*) \geq d,   (5.139)
\]
\[
\|\Phi^*\|_1 \leq 2\|\Phi\|_1,   (5.140)
\]
\[
\Phi\cdot\Phi^* \geq 0,   (5.141)
\]
\[
|\Phi^*| \geq \Bigg(2^{3d+1}\Big(\frac{3m}{c}\Big)^{4d+1}\binom{nm+d}{d}^{3}\binom{\operatorname{diam}(\operatorname{supp}\Phi)}{d}\Bigg)^{-1}\|\Phi\|_1\,\Lambda^*.   (5.142)
\]

In view of (5.135), the second property simplifies to

kΦ∗ k1 6 2. (5.143)

Recall that on X|6θ , the function Φ is (3m/c)-smooth and, by (5.135), is not


identically zero. Therefore, Φ must be nonzero at every point of X|6θ , which
includes the support of Φ∗ . As a result, (5.133) and (5.141) imply that

\[
(-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \Phi^* \geq 0.   (5.144)
\]

Finally, using diam(supp Φ) 6 nmR along with the bounds (5.121) and (5.135), we
can restate (5.142) as

|Φ∗ | > 4(CnmR)−9d Λ∗ . (5.145)

Step 4: The final construction. By the definition of smooth threshold


degree, there is a probability distribution µ on {0, 1}n such that

\[
\operatorname{orth}((-1)^{f} \cdot \mu) \geq \deg_{\pm}(f, \gamma),   (5.146)
\]
\[
\mu(z) \geq \gamma \cdot 2^{-n}, \qquad z\in\{0,1\}^n.   (5.147)
\]

Define
\[
\Phi_{\mathrm{final}} = \sum_{z\in\{0,1\}^n}\mu(z)(-1)^{f(z)}\,\tilde\Lambda_z \;-\; \gamma\Phi \;+\; \gamma\Phi^*.
\]
z∈{0,1}n

The right-hand side is a linear combination of functions on X|6θ , whence

supp(Φfinal ) ⊆ X|6θ . (5.148)

Moreover,
\[
\|\Phi_{\mathrm{final}}\|_1 \leq \sum_{z\in\{0,1\}^n}\mu(z)\|\tilde\Lambda_z\|_1 + \gamma\|\Phi\|_1 + \gamma\|\Phi^*\|_1 \leq 1 + 3\gamma \leq 4,   (5.149)
\]
where the first step applies the triangle inequality, and the second step uses (5.131), (5.135) and (5.143). Continuing,

\[
\begin{aligned}
(-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \Phi_{\mathrm{final}}
&= (-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \Bigg(\sum_{z\in\{0,1\}^n}\big(\mu(z) - \gamma 2^{-n}\big)(-1)^{f(z)}\,\tilde\Lambda_z + \gamma\Phi^*\Bigg) \\
&= \sum_{z\in\{0,1\}^n}\big(\mu(z) - \gamma 2^{-n}\big)(-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot (-1)^{f(z)}\,\tilde\Lambda_z + \gamma(-1)^{f\circ \mathrm{MP}^*_{m,R}} \cdot \Phi^* \\
&= \sum_{z\in\{0,1\}^n}\big(\mu(z) - \gamma 2^{-n}\big)\tilde\Lambda_z + \gamma|\Phi^*|   \qquad (5.150) \\
&\geq \gamma|\Phi^*| \\
&\geq 4\gamma(CnmR)^{-9d}\,\Lambda^*,   \qquad (5.151)
\end{aligned}
\]
where the first step applies the definition of Φ; the third step uses (5.132) and (5.144); the fourth step follows from (5.147); and the fifth step substitutes the lower bound from (5.145). Now

\[
\Phi_{\mathrm{final}} \not\equiv 0   (5.152)
\]
follows from (5.150) if γ = 0, and from (5.151) if γ > 0.


It remains to examine the orthogonal content of Φfinal. For this, write
\[
\Phi_{\mathrm{final}} = \sum_{z\in\{0,1\}^n}\mu(z)(-1)^{f(z)}\Lambda_z + \sum_{z\in\{0,1\}^n}\mu(z)(-1)^{f(z)}(\tilde\Lambda_z - \Lambda_z) + \gamma(\Phi^* - \Phi).
\]

Then
\[
\begin{aligned}
\operatorname{orth}(\Phi_{\mathrm{final}})
&\geq \min\Bigg\{ \operatorname{orth}\Bigg(\sum_{z\in\{0,1\}^n}\mu(z)(-1)^{f(z)}\Lambda_z\Bigg),\;
\min_{z}\big\{\operatorname{orth}(\tilde\Lambda_z - \Lambda_z)\big\},\; \operatorname{orth}(\Phi^* - \Phi) \Bigg\} \\
&\geq \min\Bigg\{ \operatorname{orth}\Bigg(\sum_{z\in\{0,1\}^n}\mu(z)(-1)^{f(z)}\Lambda_z\Bigg),\; d \Bigg\} \\
&\geq \min\Bigg\{ \operatorname{orth}\Bigg(\sum_{z\in\{0,1\}^n}\mu(z)(-1)^{f(z)}\bigotimes_{i=1}^{n}\Lambda_{z_i}\Bigg),\; d \Bigg\} \\
&\geq \min\big\{ \operatorname{orth}(\mu\cdot(-1)^f)\,\operatorname{orth}(\Lambda_1 - \Lambda_0),\; d \big\} \\
&\geq \min\big\{ \deg_{\pm}(f,\gamma)\min\{m,\ c\sqrt{r}\},\; d \big\} \\
&= d,
\end{aligned}   (5.153)
\]

where the first step applies Proposition 2.1(i); the second step follows from (5.127)
and (5.139); the third step is valid by the definition of Λz ; the fourth step applies
Corollary 2.3; the fifth step substitutes the lower bounds from (5.123) and (5.146);
and the final step uses (5.113).
To complete the proof, let
\[
\Lambda = \frac{\Phi_{\mathrm{final}}}{\|\Phi_{\mathrm{final}}\|_1}\cdot(-1)^{f\circ \mathrm{MP}^*_{m,R}},
\]

where the right-hand side is well-defined by (5.152). Then kΛk1 = 1 by definition. Moreover, (5.148) and (5.151) guarantee that Λ is a nonnegative function with support contained in X|_{≤θ}, so that Λ ∈ D(X|_{≤θ}). The orthogonality property (5.114) follows from (5.153), whereas the min-smoothness property (5.115) follows from (5.149) and (5.151).

We now translate the new amplification theorem from Nnm |6θ to the hypercube,
using the input transformation scheme of Theorem 3.9.

Theorem 5.19. Let C > 1 be the absolute constant from Theorem 5.18. Fix positive integers n, m, θ with θ ≥ Cnm log(2nm). Then there is an (explicitly given) transformation H : {0, 1}^{6θ⌈log(nm+1)⌉} → {0, 1}^n, computable by an AND-OR-AND circuit of polynomial size with bottom fan-in at most 6⌈log(nm + 1)⌉, such that
\[
\deg_{\pm}\big(f\circ H,\ \gamma\theta^{-30d}\big) \geq d\,\lceil\log(nm+1)+1\rceil,   (5.154)
\]
\[
\deg_{\pm}\big(f\circ \neg H,\ \gamma\theta^{-30d}\big) \geq d\,\lceil\log(nm+1)+1\rceil   (5.155)
\]

for all Boolean functions f : {0, 1}^n → {0, 1}, all real numbers γ ∈ [0, 1], and all positive integers
\[
d \leq \frac{1}{C}\min\bigg\{ m\deg_{\pm}(f,\gamma),\; \frac{\theta}{4m\log(2\theta)} \bigg\}.
\]

Proof. Negating a function’s input bits has no effect on its γ-smooth threshold
degree for any 0 6 γ 6 1, so that f (x1 , x2 , . . . , xn ) and f (¬x1 , ¬x2 , . . . , ¬xn ) both
have γ-smooth threshold degree deg± (f, γ). Therefore, proving (5.154) for all f will
also settle (5.155) for all f. In what follows, we focus on the former.
Theorem 3.9 constructs an explicit surjection G : {0, 1}^N → N^{nm}|_{≤θ} on N = 6θ⌈log(nm + 1)⌉ variables with the following two properties:
(i) for every coordinate i = 1, 2, . . . , nm, the mapping x ↦ OR*_θ(G(x)_i) is computable by a DNF formula of size (nmθ)^{O(1)} = θ^{O(1)} with bottom fan-in at most 6⌈log(nm + 1)⌉;
(ii) for any polynomial p, the map v ↦ E_{G^{-1}(v)} p is a polynomial on N^{nm}|_{≤θ} of degree at most (deg p)/⌈log(nm + 1) + 1⌉.
Consider the composition F = (f ◦ MP*_{m,θ}) ◦ G. Then
\[
F = \big(f\circ(\mathrm{AND}_m\circ \mathrm{OR}^*_\theta)\big)\circ G
= f\circ\big(\underbrace{(\mathrm{AND}_m\circ \mathrm{OR}^*_\theta,\ \ldots,\ \mathrm{AND}_m\circ \mathrm{OR}^*_\theta)}_{n}\circ\, G\big),
\]
which by property (i) of G means that F is the composition of f and an AND-OR-AND circuit H of size (nmθ)^{O(1)} = θ^{O(1)} and bottom fan-in at most 6⌈log(nm + 1)⌉.
Hence, the proof will be complete once we show that
\[
\deg_{\pm}\big(F,\ \gamma\theta^{-30d}\big) \geq d\,\lceil\log(nm+1)+1\rceil.   (5.156)
\]

Define r = m² and R = max{θ, r}, and consider the probability distribution on {0, 1, 2, . . . , R}^{nm}|_{≤θ} = N^{nm}|_{≤θ} given by Λ*(v) = |G^{-1}(v)|/2^N. Then Theorem 5.18 constructs a probability distribution Λ on N^{nm}|_{≤θ} such that
\[
\operatorname{orth}\big((-1)^{f\circ \mathrm{MP}^*_{m,R}}\cdot\Lambda\big) \geq d,   (5.157)
\]
\[
\Lambda \geq \gamma\theta^{-30d}\,\Lambda^*.   (5.158)
\]
In view of R ≥ θ, inequality (5.157) can be restated as
\[
\operatorname{orth}\big((-1)^{f\circ \mathrm{MP}^*_{m,\theta}}\cdot\Lambda\big) \geq d.   (5.159)
\]

Define
\[
\lambda = \sum_{v\in\mathbb{N}^{nm}|_{\leq\theta}} \Lambda(v)\cdot\frac{\mathbf{1}_{G^{-1}(v)}}{|G^{-1}(v)|},
\]

where 1_{G^{-1}(v)} denotes as usual the characteristic function of the set G^{-1}(v). Clearly, λ is a probability distribution on {0, 1}^N. Moreover,
\[
\lambda \geq \gamma\theta^{-30d}\sum_{v\in\mathbb{N}^{nm}|_{\leq\theta}}\Lambda^*(v)\cdot\frac{\mathbf{1}_{G^{-1}(v)}}{|G^{-1}(v)|}
= \gamma\theta^{-30d}\sum_{v\in\mathbb{N}^{nm}|_{\leq\theta}}\frac{|G^{-1}(v)|}{2^N}\cdot\frac{\mathbf{1}_{G^{-1}(v)}}{|G^{-1}(v)|}
= \gamma\theta^{-30d}\cdot\frac{\mathbf{1}_{\{0,1\}^N}}{2^N},   (5.160)
\]
where the first two steps use (5.158) and the definition of Λ*, respectively.
Finally, we examine the orthogonal content of (−1)^F · λ. Let p : R^N → R be any polynomial of degree less than d⌈log(nm + 1) + 1⌉. Then by property (ii) of G, the mapping p* : v ↦ E_{G^{-1}(v)} p is a polynomial on N^{nm}|_{≤θ} of degree less than d. As a result,
\[
\begin{aligned}
\langle (-1)^F\cdot\lambda,\ p\rangle
&= \langle (-1)^{(f\circ \mathrm{MP}^*_{m,\theta})\circ G}\cdot\lambda,\ p\rangle \\
&= \sum_{v\in\mathbb{N}^{nm}|_{\leq\theta}}\ \sum_{G^{-1}(v)} (-1)^{(f\circ \mathrm{MP}^*_{m,\theta})\circ G}\cdot\lambda\cdot p \\
&= \sum_{v\in\mathbb{N}^{nm}|_{\leq\theta}} (-1)^{(f\circ \mathrm{MP}^*_{m,\theta})(v)}\sum_{G^{-1}(v)}\lambda\cdot p \\
&= \sum_{v\in\mathbb{N}^{nm}|_{\leq\theta}} (-1)^{(f\circ \mathrm{MP}^*_{m,\theta})(v)}\,\Lambda(v)\mathop{\mathbf{E}}_{G^{-1}(v)} p \\
&= \langle (-1)^{f\circ \mathrm{MP}^*_{m,\theta}}\cdot\Lambda,\ p^*\rangle \\
&= 0,
\end{aligned}
\]
where the last step uses (5.159) and deg p* < d. We conclude that orth((−1)^F · λ) ≥ d⌈log(nm + 1) + 1⌉, which along with (5.160) settles (5.156).
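The heart of this argument — pulling Λ back to λ by spreading each value's mass uniformly over its fiber, and pairing a polynomial with its fiber average p* — rests on an identity that holds for any surjection G; the degree reduction is what is special to the G of Theorem 3.9. The sketch below checks the identity numerically on a toy instance (the surjection G(x) = x₁ + x₂ + x₃ and the sign pattern are hypothetical stand-ins, not the objects of Theorem 3.9):

```python
from fractions import Fraction
from itertools import product

# Toy surjection G : {0,1}^3 -> {0,1,2,3}; any surjection works for this identity.
N = 3
cube = list(product([0, 1], repeat=N))
G = lambda x: sum(x)
values = [0, 1, 2, 3]
fibers = {v: [x for x in cube if G(x) == v] for v in values}

# A distribution Lambda on values, pulled back by spreading Lambda(v) uniformly
# over the fiber G^{-1}(v), exactly as in the definition of lambda above.
Lam = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
lam = {x: Lam[G(x)] / len(fibers[G(x)]) for x in cube}
assert sum(lam.values()) == 1

sign = {0: 1, 1: -1, 2: 1, 3: -1}           # arbitrary sign pattern on values
p = lambda x: 2 + x[0] - 3 * x[1] * x[2]    # arbitrary test polynomial on {0,1}^N

# Pushforward p*(v) = E_{G^{-1}(v)} p, as in property (ii) of Theorem 3.9.
p_star = {v: sum(Fraction(p(x)) for x in fibers[v]) / len(fibers[v]) for v in values}

# <(sign o G) * lambda, p> = <sign * Lambda, p*>: within a fiber, lambda is
# constant, so summing fiber by fiber factors out Lambda(v) and averages p.
lhs = sum(sign[G(x)] * lam[x] * p(x) for x in cube)
rhs = sum(sign[v] * Lam[v] * p_star[v] for v in values)
assert lhs == rhs
print("pullback identity holds:", lhs == rhs)
```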

5.7. The smooth threshold degree of AC0. By Theorem 5.1, the composition AND_{n^{1/3}} ◦ OR_{n^{2/3}} has exp(−O(n^{1/3}))-smooth threshold degree Ω(n^{1/3}). The objective of this section is to strengthen this result to a near-optimal bound. For any ε > 0, we will construct a constant-depth circuit f : {0, 1}^n → {0, 1} with exp(−O(n^{1−ε}))-smooth threshold degree Ω(n^{1−ε}). This construction is likely to find applications in future work, in addition to its use in this paper to obtain a lower bound on the sign-rank of AC0. The proof proceeds by induction, with the amplification theorem for smooth threshold degree (Theorem 5.19) applied repeatedly to construct increasingly harder circuits. To simplify the exposition, we isolate the inductive step in the following lemma.

Lemma 5.20. Let f : {0, 1}n → {0, 1} be a Boolean circuit of size s, depth d > 0,
and smooth threshold degree

\[
\deg_{\pm}\bigg(f,\ \exp\Big(-c'\cdot\frac{n^{1-\alpha}}{\log^{\beta} n}\Big)\bigg) \geq c''\cdot\frac{n^{1-\alpha}}{\log^{\beta} n},   (5.161)
\]

for some constants α ∈ [0, 1], β ≥ 0, c′ > 0, and c″ > 0. Then f can be transformed in polynomial time into a Boolean circuit F : {0, 1}^N → {0, 1} on N = Θ(n^{1+α} log^{2+β} n) variables that has size s + N^{O(1)}, depth at most d + 3, bottom fan-in O(log N), and smooth threshold degree
\[
\deg_{\pm}\Bigg(F,\ \exp\Bigg(-C'\cdot\frac{N^{\frac{1}{1+\alpha}}}{\log^{\frac{1-\alpha+\beta}{1+\alpha}} N}\Bigg)\Bigg) \geq C''\cdot\frac{N^{\frac{1}{1+\alpha}}}{\log^{\frac{1-\alpha+\beta}{1+\alpha}} N},   (5.162)
\]
where C′, C″ > 0 are constants that depend only on α, β, c′, c″. Moreover, if d ≥ 1 and f is monotone with AND gates at the bottom, then the depth of F is at most d + 2.

Proof. Let C > 1 be the absolute constant from Theorem 5.18. Apply Theorem 5.19
with
\[
m = \lceil n^{\alpha}\log^{\beta} n\rceil, \qquad \theta = \lceil Cnm\log(2nm)\rceil, \qquad \gamma = \exp\Big(-c'\cdot\frac{n^{1-\alpha}}{\log^{\beta} n}\Big)
\]

to obtain a function H : {0, 1}^N → {0, 1}^n on N = Θ(n^{1+α} log^{2+β} n) variables such that the composition F = f ◦ H satisfies (5.162) for some C′, C″ > 0 that depend only on α, β, c′, c″, and furthermore H is computable by an AND-OR-AND circuit of polynomial size and bottom fan-in O(log N). Clearly, the composition F = f ◦ H is a circuit of size s + N^{O(1)}, depth at most d + 3, and bottom fan-in O(log N). Moreover, if d ≥ 1 and the circuit for f is monotone with AND gates at the bottom level, then the bottom level of f can be merged with the top level of H to reduce the depth of F = f ◦ H to at most (d + 3) − 1 = d + 2.

Corollary 5.21. Fix constants α ∈ [0, 1], β ≥ 0, c′ > 0, c″ > 0, and d ≥ 0. Let {f_n}_{n=1}^∞ be a Boolean circuit family in which f_n : {0, 1}^n → {0, 1} has polynomial size, depth at most d, and smooth threshold degree

\[
\deg_{\pm}\bigg(f_n,\ \exp\Big(-c'\cdot\frac{n^{1-\alpha}}{\log^{\beta} n}\Big)\bigg) \geq c''\cdot\frac{n^{1-\alpha}}{\log^{\beta} n}   (5.163)
\]

for all n ≥ 2. Then there is a Boolean circuit family {F_N}_{N=1}^∞ in which F_N : {0, 1}^N → {0, 1} has polynomial size, depth at most d + 3, bottom fan-in O(log N), and smooth threshold degree
\[
\deg_{\pm}\Bigg(F_N,\ \exp\Bigg(-C'\cdot\frac{N^{\frac{1}{1+\alpha}}}{\log^{\frac{1-\alpha+\beta}{1+\alpha}} N}\Bigg)\Bigg) \geq C''\cdot\frac{N^{\frac{1}{1+\alpha}}}{\log^{\frac{1-\alpha+\beta}{1+\alpha}} N}   (5.164)
\]

for all N > 2, where C 0 , C 00 > 0 are constants that depend only on α, β, c0 , c00 .
Moreover, if d > 1 and each fn is monotone with AND gates at the bottom, then
the depth of each FN is at most d + 2.

Proof. It suffices to construct F_N for N larger than a constant of our choosing. The first few circuits of the family {F_N}_{N=1}^∞ can then be taken to be the "dictator" functions x ↦ x₁, each of which has 1/2-smooth threshold degree 1 by Fact 2.8. Let c = c(α, β, c′, c″) > 0 be a sufficiently small constant. For any given integer N larger than a certain constant, apply Lemma 5.20 to f_{⌊(cN/log^{2+β} N)^{1/(1+α)}⌋} to obtain a circuit F on at most N variables with smooth threshold degree (5.162), for some constants C′, C″ > 0 that depend only on α, β, c′, c″. Then Proposition 2.9 forces (5.164) for the circuit F_N : {0, 1}^N → {0, 1} given by F_N(x₁, x₂, . . . , x_N) = F(x₁, x₂, . . .). The claims regarding the size, depth, and fan-ins of each F_N follow directly from Lemma 5.20.

We now obtain our lower bounds on the smooth threshold degree of AC0 . We
present two incomparable theorems here, both of which apply Corollary 5.21 in a
recursive manner but with different base cases.

Theorem 5.22. Let k ≥ 0 be a given integer. Then there is an (explicitly given) Boolean circuit family {f_n}_{n=1}^∞, where f_n : {0, 1}^n → {0, 1} has polynomial size, depth 3k, bottom fan-in O(log n), and smooth threshold degree
\[
\deg_{\pm}\Bigg(f_n,\ \exp\Bigg(-c'\cdot\frac{n^{1-\frac{1}{k+1}}}{\log^{\frac{k(k-1)}{2(k+1)}} n}\Bigg)\Bigg) \geq c''\cdot\frac{n^{1-\frac{1}{k+1}}}{\log^{\frac{k(k-1)}{2(k+1)}} n}   (5.165)
\]
for some constants c′, c″ > 0 and all n ≥ 2.

Proof. The proof is by induction on k. The base case k = 0 corresponds to the family of "dictator" functions x ↦ x₁, each of which has 1/2-smooth threshold degree 1 by Fact 2.8. For the inductive step, fix an explicit circuit family {f_n}_{n=1}^∞ in which f_n : {0, 1}^n → {0, 1} has polynomial size, depth 3k, and smooth threshold degree (5.165) for some constants c′, c″ > 0. Then taking α = 1/(k + 1) and β = k(k − 1)/(2(k + 1)) in Corollary 5.21 produces an explicit circuit family {F_n}_{n=1}^∞ in which F_n : {0, 1}^n → {0, 1} has polynomial size, depth 3k + 3 = 3(k + 1), bottom fan-in O(log n), and smooth threshold degree
\[
\deg_{\pm}\Bigg(F_n,\ \exp\Bigg(-C'\cdot\frac{n^{\frac{k+1}{k+2}}}{\log^{\frac{k(k+1)}{2(k+2)}} n}\Bigg)\Bigg) \geq C''\cdot\frac{n^{\frac{k+1}{k+2}}}{\log^{\frac{k(k+1)}{2(k+2)}} n}
\]
for some constants C′, C″ > 0 and all n ≥ 2.
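The parameter bookkeeping in this induction can be verified mechanically. Under our reading of Corollary 5.21, a hypothesis with exponent pair (1 − α, β) yields a conclusion with pair (1/(1 + α), (1 − α + β)/(1 + α)); the sketch below checks that α = 1/(k + 1) and β = k(k − 1)/(2(k + 1)) reproduce the exponents of (5.165) with k replaced by k + 1:

```python
from fractions import Fraction

def step(alpha, beta):
    # Exponent update of Corollary 5.21 (our reading of (5.163) -> (5.164)):
    # n-exponent 1 - alpha maps to 1/(1+alpha), log-exponent beta maps to
    # (1 - alpha + beta)/(1 + alpha).
    return 1 / (1 + alpha), (1 - alpha + beta) / (1 + alpha)

for k in range(10):
    alpha = Fraction(1, k + 1)
    beta = Fraction(k * (k - 1), 2 * (k + 1))
    new_exp, new_beta = step(alpha, beta)
    # The exponents of (5.165) with k replaced by k + 1:
    assert new_exp == 1 - Fraction(1, k + 2)
    assert new_beta == Fraction((k + 1) * k, 2 * (k + 2))
print("exponent recurrence consistent for k = 0..9")
```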

Theorem 5.23. Let k ≥ 1 be a given integer. Then there is an (explicitly given) Boolean circuit family {f_n}_{n=1}^∞, where f_n : {0, 1}^n → {0, 1} has polynomial size, depth 3k + 1, bottom fan-in O(log n), and smooth threshold degree
\[
\deg_{\pm}\Bigg(f_n,\ \exp\Bigg(-c'\cdot\frac{n^{1-\frac{2}{2k+3}}}{\log^{\frac{k^2}{2k+3}} n}\Bigg)\Bigg) \geq c''\cdot\frac{n^{1-\frac{2}{2k+3}}}{\log^{\frac{k^2}{2k+3}} n}   (5.166)
\]
for some constants c′, c″ > 0 and all n ≥ 2.



Proof. As with Theorem 5.22, the proof is by induction on k. For the base case k = 1, consider the family {g_n}_{n=1}^∞ in which g_n : {0, 1}^n → {0, 1} is given by
\[
g_n(x) = \bigvee_{i=1}^{\lfloor n^{1/3}\rfloor}\ \bigwedge_{j=1}^{\lfloor n^{2/3}\rfloor} x_{i,j}.
\]
Then
\[
\begin{aligned}
\deg_{\pm}\big(g_n,\ \tfrac{1}{2}\,2^{-\lfloor n^{1/3}\rfloor-1}\big)
&= \deg_{\pm}\big(\mathrm{OR}_{\lfloor n^{1/3}\rfloor}\circ \mathrm{AND}_{\lfloor n^{2/3}\rfloor},\ \tfrac{1}{2}\,2^{-\lfloor n^{1/3}\rfloor-1}\big) \\
&= \deg_{\pm}\big(\mathrm{MP}_{\lfloor n^{1/3}\rfloor,\lfloor n^{2/3}\rfloor},\ \tfrac{1}{2}\,2^{-\lfloor n^{1/3}\rfloor-1}\big) \\
&= \Omega(n^{1/3}),
\end{aligned}
\]

where the first step uses Proposition 2.9; the second step is valid because a function's smooth threshold degree remains unchanged when one negates the function or its input variables; and the last step follows from Theorem 5.1. Applying Corollary 5.21 to the circuit family {g_n}_{n=1}^∞ with α = 2/3 and β = 0 yields an explicit circuit family {G_n}_{n=1}^∞ in which G_n : {0, 1}^n → {0, 1} has polynomial size, depth 2 + 2 = 4, bottom fan-in O(log n), and smooth threshold degree
\[
\deg_{\pm}\Big(G_n,\ \exp\Big(-C'\cdot\frac{n^{3/5}}{\log^{1/5} n}\Big)\Big) \geq C''\cdot\frac{n^{3/5}}{\log^{1/5} n}
\]
for some constants C′, C″ > 0 and all n ≥ 2. This new circuit family {G_n}_{n=1}^∞ establishes the base case.
For the inductive step, fix an integer k ≥ 1 and an explicit circuit family {f_n}_{n=1}^∞ in which f_n : {0, 1}^n → {0, 1} has polynomial size, depth 3k + 1, and smooth threshold degree (5.166) for some constants c′, c″ > 0. Applying Corollary 5.21 with α = 2/(2k + 3) and β = k²/(2k + 3) yields an explicit circuit family {F_n}_{n=1}^∞, where F_n : {0, 1}^n → {0, 1} has polynomial size, depth (3k + 1) + 3 = 3(k + 1) + 1, bottom fan-in O(log n), and smooth threshold degree
\[
\deg_{\pm}\Bigg(F_n,\ \exp\Bigg(-C'''\cdot\frac{n^{\frac{2k+3}{2k+5}}}{\log^{\frac{(k+1)^2}{2k+5}} n}\Bigg)\Bigg) \geq C''''\cdot\frac{n^{\frac{2k+3}{2k+5}}}{\log^{\frac{(k+1)^2}{2k+5}} n}
\]
for some constants C′′′, C′′′′ > 0 and all n ≥ 2. This completes the inductive step.
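As with Theorem 5.22, the exponent arithmetic of this induction is mechanical and can be spot-checked. The sketch below (same reading of Corollary 5.21 as before: the pair (1 − α, β) maps to (1/(1 + α), (1 − α + β)/(1 + α))) verifies both the base case and the inductive step:

```python
from fractions import Fraction

def step(alpha, beta):
    # Exponent update of Corollary 5.21 (our reading):
    # (1 - alpha, beta) |-> (1/(1+alpha), (1 - alpha + beta)/(1 + alpha)).
    return 1 / (1 + alpha), (1 - alpha + beta) / (1 + alpha)

# Base case: g_n = OR o AND with alpha = 2/3, beta = 0 yields the
# (n^{3/5}, log^{1/5} n) bound for G_n.
assert step(Fraction(2, 3), Fraction(0)) == (Fraction(3, 5), Fraction(1, 5))

# Inductive step: alpha = 2/(2k+3), beta = k^2/(2k+3) yields the exponent
# pair of (5.166) with k replaced by k + 1.
for k in range(1, 10):
    new_exp, new_beta = step(Fraction(2, 2 * k + 3), Fraction(k * k, 2 * k + 3))
    assert new_exp == 1 - Fraction(2, 2 * (k + 1) + 3)
    assert new_beta == Fraction((k + 1) ** 2, 2 * (k + 1) + 3)
print("base case and inductive step consistent")
```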

5.8. The sign-rank of AC0 . We have reached our main result on the sign-rank
and unbounded-error communication complexity of constant-depth circuits. The
proof amounts to lifting, by means of Theorem 2.18, the lower bounds on smooth
threshold degree in Theorems 5.22 and 5.23 to sign-rank lower bounds.

Theorem 5.24. Let k ≥ 1 be a given integer. Then there is an (explicitly given) Boolean circuit family {F_n}_{n=1}^∞, where F_n : {0, 1}^n × {0, 1}^n → {0, 1} has polynomial size, depth 3k, bottom fan-in O(log n), sign-rank
\[
\mathrm{rk}_{\pm}(F_n) = \exp\Big(\Omega\Big(n^{1-\frac{1}{k+1}}\cdot(\log n)^{-\frac{k(k-1)}{2(k+1)}}\Big)\Big),   (5.167)
\]

and unbounded-error communication complexity
\[
\mathrm{UPP}(F_n) = \Omega\Big(n^{1-\frac{1}{k+1}}\cdot(\log n)^{-\frac{k(k-1)}{2(k+1)}}\Big).   (5.168)
\]

Proof. Theorem 5.22 constructs a circuit family {f_n}_{n=1}^∞ in which f_n : {0, 1}^n → {0, 1} has polynomial size, depth 3k, bottom fan-in O(log n), and smooth threshold degree (5.165) for some constants c′, c″ > 0 and all n ≥ 2. Abbreviate m = 2⌈exp(4c′/c″)⌉. For any n ≥ m, define F_n = f_{⌊n/m⌋} ◦ OR_m ◦ AND_2. Then (5.167) is immediate from (5.165) and Theorem 2.18. Combining (5.167) with Theorem 2.16 settles (5.168).
It remains to analyze the circuit complexity of Fn . We defined Fn formally as a
circuit of depth 3k + 2 in which the bottom four levels have fan-ins nO(1) , O(log n),
m, and 2, in that order. Since m is a constant independent of n, these four levels can
be computed by a circuit of polynomial size, depth 2, and bottom fan-in O(log n).
This optimization reduces the depth of Fn to (3k + 2) − 4 + 2 = 3k while keeping
the bottom fan-in at O(log n).
We now similarly lift Theorem 5.23 to a lower bound on sign-rank and unbounded-
error communication complexity.

Theorem 5.25. Let k ≥ 1 be a given integer. Then there is an (explicitly given) Boolean circuit family {F_n}_{n=1}^∞, where F_n : {0, 1}^n × {0, 1}^n → {0, 1} has polynomial size, depth 3k + 1, bottom fan-in O(log n), sign-rank
\[
\mathrm{rk}_{\pm}(F_n) = \exp\Big(\Omega\Big(n^{1-\frac{2}{2k+3}}\cdot(\log n)^{-\frac{k^2}{2k+3}}\Big)\Big),
\]
and unbounded-error communication complexity
\[
\mathrm{UPP}(F_n) = \Omega\Big(n^{1-\frac{2}{2k+3}}\cdot(\log n)^{-\frac{k^2}{2k+3}}\Big).
\]

Proof. The proof is analogous to that of Theorem 5.24, with the only difference that the appeal to Theorem 5.22 should be replaced with an appeal to Theorem 5.23.

Theorems 5.24 and 5.25 settle Theorems 1.2, 1.3, and 1.5 in the introduction.

Acknowledgments
The authors are thankful to Mark Bun and Justin Thaler for valuable comments
on an earlier version of this paper.

References
[1] S. Aaronson and Y. Shi, Quantum lower bounds for the collision and the element distinct-
ness problems, J. ACM, 51 (2004), pp. 595–605, doi:10.1145/1008731.1008735.
[2] A. Ambainis, Polynomial degree and lower bounds in quantum complexity: Collision
and element distinctness with small range, Theory of Computing, 1 (2005), pp. 37–46,
doi:10.4086/toc.2005.v001a003.
[3] A. Ambainis, A. M. Childs, B. Reichardt, R. Špalek, and S. Zhang, Any AND-OR
formula of size N can be evaluated in time N 1/2+o(1) on a quantum computer, SIAM J.
Comput., 39 (2010), pp. 2513–2530, doi:10.1137/080712167.
[4] J. Aspnes, R. Beigel, M. L. Furst, and S. Rudich, The expressive power of voting
polynomials, Combinatorica, 14 (1994), pp. 135–148, doi:10.1007/BF01215346.

[5] L. Babai, P. Frankl, and J. Simon, Complexity classes in communication complexity


theory, in Proceedings of the Twenty-Seventh Annual IEEE Symposium on Foundations of
Computer Science (FOCS), 1986, pp. 337–347, doi:10.1109/SFCS.1986.15.
[6] L. Babai, N. Nisan, and M. Szegedy, Multiparty protocols, pseudorandom generators
for logspace, and time-space trade-offs, J. Comput. Syst. Sci., 45 (1992), pp. 204–232,
doi:10.1016/0022-0000(92)90047-M.
[7] P. Beame and T. Huynh, Multiparty communication complexity and threshold circuit size
of AC0 , SIAM J. Comput., 41 (2012), pp. 484–518, doi:10.1137/100792779.
[8] P. Beame and W. Machmouchi, The quantum query complexity of AC0 , Quantum Infor-
mation & Computation, 12 (2012), pp. 670–676.
[9] R. Beigel, N. Reingold, and D. A. Spielman, PP is closed under intersection, J. Comput.
Syst. Sci., 50 (1995), pp. 191–202, doi:10.1006/jcss.1995.1017.
[10] H. Buhrman, N. K. Vereshchagin, and R. de Wolf, On computation and communi-
cation with small bias, in Proceedings of the Twenty-Second Annual IEEE Conference on
Computational Complexity (CCC), 2007, pp. 24–32, doi:10.1109/CCC.2007.18.
[11] M. Bun, R. Kothari, and J. Thaler, The polynomial method strikes back: Tight quantum
query bounds via dual polynomials. ECCC Report TR17-169, 2017.
[12] M. Bun and J. Thaler, Dual lower bounds for approximate degree and Markov–Bernstein
inequalities, Inf. Comput., 243 (2015), pp. 2–25, doi:10.1016/j.ic.2014.12.003.
[13] M. Bun and J. Thaler, Hardness amplification and the approximate degree of constant-
depth circuits, in Proceedings of the Forty-Second International Colloquium on Automata,
Languages and Programming (ICALP), 2015, pp. 268–280, doi:10.1007/978-3-662-47672-
7_22.
[14] M. Bun and J. Thaler, Approximate degree and the complexity of depth three circuits, in
Electronic Colloquium on Computational Complexity (ECCC), 2016. Report TR16-121.
[15] M. Bun and J. Thaler, Improved bounds on the sign-rank of AC0 , in Proceedings of the
Forty-Third International Colloquium on Automata, Languages and Programming (ICALP),
2016, pp. 37:1–37:14, doi:10.4230/LIPIcs.ICALP.2016.37.
[16] M. Bun and J. Thaler, A nearly optimal lower bound on the approximate degree of AC0 ,
in Proceedings of the Fifty-Eighth Annual IEEE Symposium on Foundations of Computer
Science (FOCS), 2017, pp. 1–12, doi:10.1109/FOCS.2017.10.
[17] M. Bun and J. Thaler, The large-error approximate degree of AC0 , in Electronic Collo-
quium on Computational Complexity (ECCC), August 2018. Report TR18-143.
[18] A. K. Chandra, M. L. Furst, and R. J. Lipton, Multi-party protocols, in Proceedings of
the Fifteenth Annual ACM Symposium on Theory of Computing (STOC), 1983, pp. 94–99,
doi:10.1145/800061.808737.
[19] A. Chattopadhyay, Discrepancy and the power of bottom fan-in in depth-three circuits,
in Proceedings of the Forty-Eighth Annual IEEE Symposium on Foundations of Computer
Science (FOCS), 2007, pp. 449–458, doi:10.1109/FOCS.2007.24.
[20] B. Chor and O. Goldreich, Unbiased bits from sources of weak randomness and
probabilistic communication complexity, SIAM J. Comput., 17 (1988), pp. 230–261,
doi:10.1137/0217015.
[21] J. Forster, A linear lower bound on the unbounded error probabilistic communication com-
plexity, J. Comput. Syst. Sci., 65 (2002), pp. 612–625, doi:10.1016/S0022-0000(02)00019-3.
[22] J. Forster, M. Krause, S. V. Lokam, R. Mubarakzjanov, N. Schmitt, and H.-U. Si-
mon, Relations between communication complexity, linear arrangements, and computational
complexity, in Proc. of the 21st Conf. on Foundations of Software Technology and Theoretical
Computer Science (FST TCS), 2001, pp. 171–182, doi:10.1007/3-540-45294-X_15.
[23] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation
for Computer Science, Addison-Wesley, 2nd ed., 1994.
[24] S. Jukna, Extremal Combinatorics with Applications in Computer Science, Springer-Verlag
Berlin Heidelberg, 2nd ed., 2011, doi:10.1007/978-3-642-17364-6.
[25] A. R. Klivans, R. O’Donnell, and R. A. Servedio, Learning intersections and thresholds
of halfspaces, J. Comput. Syst. Sci., 68 (2004), pp. 808–840, doi:10.1016/j.jcss.2003.11.002.
[26] A. R. Klivans and R. A. Servedio, Learning DNF in time 2^{Õ(n^{1/3})}, J. Comput. Syst. Sci., 68 (2004), pp. 303–318, doi:10.1016/j.jcss.2003.07.007.
[27] A. R. Klivans and R. A. Servedio, Toward attribute efficient learning of decision lists
and parities, J. Machine Learning Research, 7 (2006), pp. 587–602.

[28] M. Krause and P. Pudlák, On the computational power of depth-2 circuits with thresh-
old and modulo gates, Theor. Comput. Sci., 174 (1997), pp. 137–156, doi:10.1016/S0304-
3975(96)00019-9.
[29] M. Krause and P. Pudlák, Computing Boolean functions by polynomials and threshold
circuits, Comput. Complex., 7 (1998), pp. 346–370, doi:10.1007/s000370050015.
[30] E. Kushilevitz and N. Nisan, Communication complexity, Cambridge University Press,
1997.
[31] T. Lee, A note on the sign degree of formulas, 2009. Available at https://arxiv.org/abs/0909.4607.
[32] M. L. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geom-
etry, MIT Press, Cambridge, Mass., 1969.
[33] N. Nisan and M. Szegedy, On the degree of Boolean functions as real polynomials, Com-
putational Complexity, 4 (1994), pp. 301–313, doi:10.1007/BF01263419.
[34] R. O’Donnell and R. A. Servedio, Extremal properties of polynomial threshold functions,
J. Comput. Syst. Sci., 74 (2008), pp. 298–312, doi:10.1016/j.jcss.2007.06.021.
[35] R. O’Donnell and R. A. Servedio, New degree bounds for polynomial threshold functions,
Combinatorica, 30 (2010), pp. 327–358, doi:10.1007/s00493-010-2173-3.
[36] R. Paturi, On the degree of polynomials that approximate symmetric Boolean functions,
in Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing
(STOC), 1992, pp. 468–474, doi:10.1145/129712.129758.
[37] R. Paturi and M. E. Saks, Approximating threshold circuits by rational functions, Inf.
Comput., 112 (1994), pp. 257–272, doi:10.1006/inco.1994.1059.
[38] R. Paturi and J. Simon, Probabilistic communication complexity, J. Comput. Syst. Sci.,
33 (1986), pp. 106–123, doi:10.1016/0022-0000(86)90046-2.
[39] A. A. Razborov and A. A. Sherstov, The sign-rank of AC0 , SIAM J. Comput., 39 (2010),
pp. 1833–1855, doi:10.1137/080744037. Preliminary version in Proceedings of the Forty-Ninth
Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2008.
[40] M. E. Saks, Slicing the hypercube, Surveys in Combinatorics, (1993), pp. 211–255,
doi:10.1017/CBO9780511662089.009.
[41] A. A. Sherstov, Halfspace matrices, Computational Complexity, 17 (2008), pp. 149–178,
doi:10.1007/s00037-008-0242-4. Preliminary version in Proceedings of the Twenty-Second An-
nual IEEE Conference on Computational Complexity (CCC), 2007.
[42] A. A. Sherstov, Separating AC0 from depth-2 majority circuits, SIAM J. Comput., 38
(2009), pp. 2113–2129, doi:10.1137/08071421X. Preliminary version in Proceedings of the
Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC), 2007.
[43] A. A. Sherstov, Communication complexity under product and nonproduct distributions,
Computational Complexity, 19 (2010), pp. 135–150, doi:10.1007/s00037-009-0285-1. Prelimi-
nary version in Proceedings of the Twenty-Third Annual IEEE Conference on Computational
Complexity (CCC), 2008.
[44] A. A. Sherstov, The pattern matrix method, SIAM J. Comput., 40 (2011), pp. 1969–
2000, doi:10.1137/080733644. Preliminary version in Proceedings of the Fortieth Annual ACM
Symposium on Theory of Computing (STOC), 2008.
[45] A. A. Sherstov, The unbounded-error communication complexity of symmetric functions,
Combinatorica, 31 (2011), pp. 583–614, doi:10.1007/s00493-011-2580-0. Preliminary version
in Proceedings of the Forty-Ninth Annual IEEE Symposium on Foundations of Computer
Science (FOCS), 2008.
[46] A. A. Sherstov, Strong direct product theorems for quantum communication and query
complexity, SIAM J. Comput., 41 (2012), pp. 1122–1165, doi:10.1137/110842661. Preliminary
version in Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing
(STOC), 2011.
[47] A. A. Sherstov, The intersection of two halfspaces has high threshold degree, SIAM J. Com-
put., 42 (2013), pp. 2329–2374, doi:10.1137/100785260. Preliminary version in Proceedings of
the Fiftieth Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2009.
[48] A. A. Sherstov, Making polynomials robust to noise, Theory of Computing, 9 (2013),
pp. 593–615, doi:10.4086/toc.2013.v009a018. Preliminary version in Proceedings of the Forty-
Fourth Annual ACM Symposium on Theory of Computing (STOC), 2012.
[49] A. A. Sherstov, Optimal bounds for sign-representing the intersection of two halfspaces
by polynomials, Combinatorica, 33 (2013), pp. 73–96, doi:10.1007/s00493-013-2759-7. Pre-
liminary version in Proceedings of the Forty-Second Annual ACM Symposium on Theory of
Computing (STOC), 2010.

[50] A. A. Sherstov, Breaking the Minsky–Papert barrier for constant-depth circuits, in Pro-
ceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (STOC), 2014,
pp. 223–232, doi:10.1145/2591796.2591871. Full version available as ECCC Report TR14-009,
January 2014.
[51] A. A. Sherstov, Communication lower bounds using directional derivatives, J. ACM, 61
(2014), pp. 1–71, doi:10.1145/2629334. Preliminary version in Proceedings of the Forty-Fifth
Annual ACM Symposium on Theory of Computing (STOC), 2013.
[52] A. A. Sherstov, The power of asymmetry in constant-depth circuits, in Proceedings of the
Fifty-Sixth Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2015,
pp. 431–450, doi:10.1109/FOCS.2015.34.
[53] A. A. Sherstov, The multiparty communication complexity of set disjointness, SIAM J.
Comput., 45 (2016), pp. 1450–1489, doi:10.1137/120891587. Preliminary version in Proceed-
ings of the Forty-Fourth Annual ACM Symposium on Theory of Computing (STOC), 2009.
[54] A. A. Sherstov, Algorithmic polynomials, in Proceedings of the Fiftieth Annual ACM Sym-
posium on Theory of Computing (STOC), 2018, pp. 311–324, doi:10.1145/3188745.3188958.
[55] K.-Y. Siu, V. P. Roychowdhury, and T. Kailath, Rational approximation techniques for
analysis of neural networks, IEEE Transactions on Information Theory, 40 (1994), pp. 455–
466, doi:10.1109/18.312168.
[56] J. Thaler, Lower bounds for the approximate degree of block-composed functions, in Proceed-
ings of the Forty-Third International Colloquium on Automata, Languages and Programming
(ICALP), 2016, pp. 17:1–17:15, doi:10.4230/LIPIcs.ICALP.2016.17.
[57] R. Špalek, A dual polynomial for OR. Available at http://arxiv.org/abs/0803.4516, 2008.
[58] A. C.-C. Yao, Some complexity questions related to distributive computing, in Proceedings of
the Eleventh Annual ACM Symposium on Theory of Computing (STOC), 1979, pp. 209–213,
doi:10.1145/800135.804414.
Appendix A. Sign-rank and smooth threshold degree


The purpose of this appendix is to prove Theorem 2.18, implicit in [45, 39].
We closely follow the treatment in those earlier papers. Sections A.1–A.3 cover
necessary technical background, followed by the proof proper in Section A.4.

A.1. Fourier transform. Consider the real vector space of functions {0,1}^n → R. For S ⊆ {1, 2, . . . , n}, define χ_S : {0,1}^n → {−1,+1} by χ_S(x) = (−1)^{∑_{i∈S} x_i}. Then
\[
\langle \chi_S, \chi_T \rangle =
\begin{cases}
2^n & \text{if } S = T,\\
0 & \text{otherwise.}
\end{cases}
\]

Thus, {χ_S}_{S⊆{1,2,...,n}} is an orthogonal basis for the vector space in question. In particular, every function φ : {0,1}^n → R has a unique representation of the form
\[
\varphi = \sum_{S \subseteq \{1,2,\dots,n\}} \hat{\varphi}(S)\, \chi_S
\]
for some reals φ̂(S), where by orthogonality φ̂(S) = 2^{−n}⟨φ, χ_S⟩. The reals φ̂(S) are called the Fourier coefficients of φ, and the mapping φ ↦ φ̂ is the Fourier transform of φ. The following fact is immediate from the definition of φ̂(S).

Proposition A.1. Let φ : {0,1}^n → R be given. Then
\[
\max_{S \subseteq \{1,2,\dots,n\}} |\hat{\varphi}(S)| \le 2^{-n} \|\varphi\|_1.
\]
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 91

The linear subspace of real polynomials on {0,1}^n of degree at most d is easily seen to be span{χ_S : |S| ≤ d}. Its orthogonal complement, span{χ_S : |S| > d}, is then the linear subspace of functions that have zero inner product with every polynomial of degree at most d. As a result, the orthogonal content of a nonzero function φ : {0,1}^n → R is given by
\[
\operatorname{orth} \varphi = \min\{|S| : \hat{\varphi}(S) \ne 0\}, \qquad \varphi \not\equiv 0. \tag{A.1}
\]
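To make these definitions concrete, here is a brief Python sketch (the helper names `chi`, `fourier`, and `orth` are ours, with 0-based indices) that computes Fourier coefficients by direct inner products and evaluates the orthogonal content; for φ = χ_{{1,2}}, the only nonzero coefficient sits at S = {1,2}, so orth φ = 2.

```python
from itertools import product

def chi(S, x):
    # Character chi_S(x) = (-1)^{sum_{i in S} x_i}, indices 0-based.
    return (-1) ** sum(x[i] for i in S)

def fourier(phi, n):
    # Fourier coefficient: phi_hat(S) = 2^{-n} <phi, chi_S>.
    subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2 ** n)]
    points = list(product((0, 1), repeat=n))
    return {S: sum(phi(x) * chi(S, x) for x in points) / 2 ** n for S in subsets}

def orth(phi, n):
    # Orthogonal content: least |S| with a nonzero Fourier coefficient.
    coeffs = fourier(phi, n)
    return min(len(S) for S, c in coeffs.items() if abs(c) > 1e-12)

n = 3
# phi(x) = chi_{S}(x) for S = {1, 2} (0-based: S = (0, 1)).
phi = lambda x: (-1) ** (x[0] + x[1])
coeffs = fourier(phi, n)
assert abs(coeffs[(0, 1)] - 1) < 1e-12
assert orth(phi, n) == 2
```

The brute-force inner products are exponential in n, of course; the sketch is only meant to mirror the definitions above on small instances.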

A.2. Forster's bound. The spectral norm of a real matrix A = [A_{xy}]_{x∈X, y∈Y} is given by
\[
\|A\| = \max_{v \in \mathbb{R}^{|Y|},\ \|v\|_2 = 1} \|Av\|_2,
\]
where ‖·‖₂ is the Euclidean norm on vectors. The first strong lower bound on the sign-rank of an explicit matrix was obtained by Forster [21], who proved that
\[
\operatorname{rk}_{\pm}(A) \ge \frac{\sqrt{|X|\,|Y|}}{\|A\|}
\]
for any matrix A = [A_{xy}]_{x∈X,y∈Y} with ±1 entries. Forster's result has seen a number of generalizations, including the following theorem [22, Theorem 3].

Theorem A.2 (Forster et al.). Let A = [A_{xy}]_{x∈X,y∈Y} be a real matrix without zero entries. Then
\[
\operatorname{rk}_{\pm}(A) \ge \frac{\sqrt{|X|\,|Y|}}{\|A\|} \min_{x,y} |A_{xy}|.
\]

A.3. Spectral norm of pattern matrices. Pattern matrices were introduced in [42, 44] and proved useful in obtaining strong lower bounds on communication complexity. Relevant definitions and results from [44] follow. Let n and N be positive integers with n | N. Partition {1, 2, . . . , N} into n contiguous blocks, each with N/n elements:
\[
\{1,2,\dots,N\} = \left\{1,2,\dots,\frac{N}{n}\right\} \cup \left\{\frac{N}{n}+1,\dots,\frac{2N}{n}\right\} \cup \cdots \cup \left\{\frac{(n-1)N}{n}+1,\dots,N\right\}.
\]
Now, let V(N, n) denote the family of subsets V ⊆ {1, 2, . . . , N} that have exactly one element in each of these blocks (in particular, |V| = n). Clearly, |V(N, n)| = (N/n)^n. For a function φ : {0,1}^n → R, the (N, n, φ)-pattern matrix is the real matrix A given by
\[
A = \Bigl[\varphi(x|_V \oplus w)\Bigr]_{x \in \{0,1\}^N,\ (V,w) \in V(N,n) \times \{0,1\}^n}.
\]

In words, A is the matrix of size 2^N by (N/n)^n 2^n whose rows are indexed by strings x ∈ {0,1}^N, whose columns are indexed by pairs (V, w) ∈ V(N, n) × {0,1}^n, and whose entries are given by A_{x,(V,w)} = φ(x|_V ⊕ w). We will need the following expression for the spectral norm of a pattern matrix [44, Theorem 4.3].

Theorem A.3 (Sherstov). Let φ : {0,1}^n → R be given. Let A be the (N, n, φ)-pattern matrix. Then
\[
\|A\| = \sqrt{2^{N+n} \left(\frac{N}{n}\right)^{n}}\; \max_{S \subseteq \{1,2,\dots,n\}} \left\{ |\hat{\varphi}(S)| \left(\frac{n}{N}\right)^{|S|/2} \right\}.
\]
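Since Theorem A.3 gives the spectral norm in closed form, it is easy to sanity-check on small parameters. The sketch below (Python with numpy assumed; all helper names are ours) builds the (N, n, φ)-pattern matrix of a random φ and compares numpy's spectral norm against the formula.

```python
from itertools import product
import numpy as np

def pattern_matrix(phi, N, n):
    # Rows: x in {0,1}^N. Columns: (V, w), where V picks one index per block
    # of size b = N/n and w ranges over {0,1}^n.
    b = N // n
    Vs = list(product(range(b), repeat=n))  # offset chosen inside each block
    ws = list(product((0, 1), repeat=n))
    rows = []
    for x in product((0, 1), repeat=N):
        row = []
        for V in Vs:
            xV = tuple(x[j * b + V[j]] for j in range(n))  # restriction x|_V
            for w in ws:
                row.append(phi(tuple(a ^ c for a, c in zip(xV, w))))
        rows.append(row)
    return np.array(rows, dtype=float)

def fourier(phi, n):
    # phi_hat(S) = 2^{-n} <phi, chi_S>, computed by brute force.
    pts = list(product((0, 1), repeat=n))
    subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2 ** n)]
    return {S: sum(phi(x) * (-1) ** sum(x[i] for i in S) for x in pts) / 2 ** n
            for S in subsets}

N, n = 4, 2
rng = np.random.default_rng(0)
table = {x: rng.standard_normal() for x in product((0, 1), repeat=n)}
phi = lambda x: table[x]

A = pattern_matrix(phi, N, n)
lhs = np.linalg.norm(A, 2)  # spectral norm (largest singular value)
rhs = np.sqrt(2 ** (N + n) * (N / n) ** n) * max(
    abs(c) * (n / N) ** (len(S) / 2) for S, c in fourier(phi, n).items())
assert abs(lhs - rhs) <= 1e-8 * max(1.0, rhs)
```

Here A is a 16 × 16 matrix, and the two sides agree up to floating-point error, reflecting that the theorem is an exact identity rather than an estimate.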

A.4. Proof of Theorem 2.18. We are now in a position to prove Theorem 2.18.
We will derive it from the following more general result, stated in terms of pattern
matrices.

Theorem A.4. Let f : {0,1}^n → {0,1} be given. Suppose that deg_±(f, γ) ≥ d, where γ and d are positive reals. Then for any integer T ≥ 1, the (Tn, n, (−1)^f)-pattern matrix has sign-rank at least γT^{d/2}.

Proof. By the definition of smooth threshold degree, there is a probability distribution μ on {0,1}^n such that
\[
\mu(x) \ge \gamma 2^{-n}, \qquad x \in \{0,1\}^n, \tag{A.2}
\]
\[
\operatorname{orth}((-1)^f \cdot \mu) \ge d. \tag{A.3}
\]
Abbreviate φ = (−1)^f · μ. Let F and Φ denote the (Tn, n, (−1)^f)- and (Tn, n, φ)-pattern matrices, respectively. By (A.1) and (A.3),
\[
\hat{\varphi}(S) = 0, \qquad |S| < d. \tag{A.4}
\]
The remaining Fourier coefficients of φ can be bounded using Proposition A.1:
\[
|\hat{\varphi}(S)| \le 2^{-n}, \qquad S \subseteq \{1,2,\dots,n\}. \tag{A.5}
\]

Now
\begin{align*}
\operatorname{rk}_{\pm}(F) &= \operatorname{rk}_{\pm}(\Phi)\\
&\ge \frac{\sqrt{2^{Tn+n}\, T^{n}}}{\|\Phi\|} \cdot \gamma 2^{-n}\\
&= \frac{\gamma 2^{-n}}{\max_{S}\left\{|\hat{\varphi}(S)|\, T^{-|S|/2}\right\}}\\
&\ge \gamma T^{d/2},
\end{align*}
where the first step is valid because F and Φ have the same sign pattern; the second step uses (A.2) and Theorem A.2; the third step applies Theorem A.3; and the final step substitutes the upper bounds from (A.4) and (A.5).

We have reached the main result of this appendix.



Theorem (restatement of Theorem 2.18). Let f : {0,1}^n → {0,1} be given. Suppose that deg_±(f, γ) ≥ d, where γ and d are positive reals. Fix an integer m ≥ 2 and define F : {0,1}^{mn} × {0,1}^{mn} → {0,1} by F(x, y) = f ◦ OR_m ◦ AND_2. Then
\[
\operatorname{rk}_{\pm}(F) \ge \gamma \left\lfloor \frac{m}{2} \right\rfloor^{d/2}.
\]

Proof. The result is immediate from Theorem A.4 since the (⌊m/2⌋n, n, (−1)^f)-pattern matrix is a submatrix of [(−1)^{F(x,y)}]_{x,y}.

Appendix B. A dual object for OR


The purpose of this appendix is to prove Theorem 3.3, which gives a dual poly-
nomial for the OR function with a number of additional properties. The treatment
here closely follows earlier work by Špalek [57], Bun and Thaler [12, 16, 11], and
Sherstov [50, 52]. We start with a well-known binomial identity [23].

Fact B.1. For every univariate polynomial p of degree less than n,
\[
\sum_{t=0}^{n} (-1)^t \binom{n}{t}\, p(t) = 0.
\]
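Fact B.1 is the statement that the n-th finite difference of any polynomial of degree below n vanishes. A quick exact-arithmetic check (the helper name is ours):

```python
from math import comb

def alternating_sum(n, p):
    # Computes sum_{t=0}^{n} (-1)^t C(n, t) p(t).
    return sum((-1) ** t * comb(n, t) * p(t) for t in range(n + 1))

for n in range(1, 9):
    # Vanishes for every polynomial of degree < n (checked on monomials t^k).
    for k in range(n):
        assert alternating_sum(n, lambda t: t ** k) == 0
    # Does not vanish at degree n itself: the value there is (-1)^n n!.
    assert alternating_sum(n, lambda t: t ** n) != 0
```

Checking the identity on the monomials t^0, ..., t^{n−1} suffices, since both sides are linear in p.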

The next lemma constructs a dual polynomial for OR that has the sign behavior
claimed in Theorem 3.3 but may lack some of the metric properties. The lemma is
an adaptation of [50, Lemma A.2].

Lemma B.2. Let ε be given, 0 < ε < 1. Then for some constant c = c(ε) ∈ (0, 1) and every integer n ≥ 1, there is an (explicitly given) function ω : {0, 1, 2, . . . , n} → R such that
\begin{align}
&\omega(0) \ge \frac{1-\varepsilon}{2} \cdot \|\omega\|_1, \tag{B.1}\\
&|\omega(t)| \le \frac{1}{c\,t^2\, 2^{ct/\sqrt{n}}} \cdot \|\omega\|_1 \qquad (t = 1,2,\dots,n), \tag{B.2}\\
&(-1)^t\, \omega(t) \ge 0 \qquad (t = 0,1,2,\dots,n), \tag{B.3}\\
&\operatorname{orth} \omega \ge c\sqrt{n}. \tag{B.4}
\end{align}

Remark B.3. It is helpful to keep in mind that properties (B.1)–(B.4) are logically
monotonic in c. In other words, establishing these properties for a given constant
c > 0 also establishes them for all smaller positive constants.

Proof of Lemma B.2. Let ∆ = 8⌈1/ε⌉ + 3. If n ≤ ∆, the requirements of the lemma hold for the function ω : (0, 1, 2, 3, . . . , n) ↦ (1, −1, 0, 0, . . . , 0) and all c ∈ (0, 1/∆]. In what follows, we treat the complementary case n > ∆.
Define d = ⌊√(n/∆)⌋ and let S = {1, (∆+1)/2} ∪ {i²∆ : i = 0, 1, 2, . . . , d}, so that S ⊆ {0, 1, 2, . . . , n}. Consider the function ω : {0, 1, 2, . . . , n} → R given by
\[
\omega(t) = \frac{(-1)^{n+t+|S|+1}}{n!} \binom{n}{t} \prod_{\substack{i=0,1,2,\dots,n:\\ i \notin S}} (t-i).
\]

Fact B.1 implies that
\[
\operatorname{orth} \omega \ge d + 1 \ge \sqrt{\frac{n}{\Delta}}. \tag{B.5}
\]

A routine calculation reveals that
\[
\omega(t) =
\begin{cases}
(-1)^{|\{i \in S : i < t\}|} \displaystyle\prod_{i \in S \setminus \{t\}} \frac{1}{|t-i|} & \text{if } t \in S,\\
0 & \text{otherwise.}
\end{cases} \tag{B.6}
\]

It follows that
\begin{align*}
\frac{\omega(0)}{|\omega(1)|} &= \frac{\Delta-1}{\Delta+1} \prod_{i=1}^{d} \frac{i^2\Delta - 1}{i^2\Delta}\\
&\ge 1 - \frac{2}{\Delta+1} - \sum_{i=1}^{d} \frac{1}{i^2\Delta}\\
&\ge 1 - \frac{2}{\Delta+1} - \frac{1}{\Delta}\sum_{i=1}^{\infty} \frac{1}{i^2}\\
&\ge 1 - \frac{4}{\Delta}. \tag{B.7}
\end{align*}

An analogous application of (B.6) shows that
\begin{align*}
\frac{|\omega(\frac{\Delta+1}{2})|}{|\omega(0)|}
&\le \frac{\frac{\Delta+1}{2}\, \Delta^{d}\, d!\, d!}{\frac{\Delta+1}{2}\left(\frac{\Delta+1}{2}-1\right)\left(\Delta - \frac{\Delta+1}{2}\right) \cdot \frac{1}{2}\Delta^{d-1}\,(d-1)!\,(d+1)!}\\
&= \frac{8\Delta d}{(\Delta-1)^2 (d+1)}\\
&\le \frac{8\Delta}{(\Delta-1)^2}. \tag{B.8}
\end{align*}

Finally, for i = 1, 2, . . . , d,
\begin{align*}
\frac{|\omega(i^2\Delta)|}{|\omega(0)|}
&= \frac{\frac{\Delta+1}{2}\, d!\, d!\, \Delta^{d}}{(i^2\Delta-1)\left(i^2\Delta - \frac{\Delta+1}{2}\right) \cdot \frac{1}{2}\,(d-i)!\,(d+i)!\, \Delta^{d}}\\
&\le \frac{2(\Delta+1)}{i^4(\Delta-1)^2} \cdot \frac{d!\, d!}{(d-i)!\,(d+i)!}\\
&= \frac{2(\Delta+1)}{i^4(\Delta-1)^2} \cdot \frac{d}{d+i} \cdot \frac{d-1}{d+i-1} \cdots \frac{d-i+1}{d+1}\\
&\le \frac{2(\Delta+1)}{i^4(\Delta-1)^2} \left(1 - \frac{i}{d+i}\right)^{i}\\
&\le \frac{2(\Delta+1)}{i^4(\Delta-1)^2} \exp\left(-\frac{i^2}{d+i}\right)\\
&\le \frac{2(\Delta+1)}{i^4(\Delta-1)^2} \exp\left(-\frac{i^2}{2d}\right)\\
&\le \frac{2(\Delta+1)}{i^4(\Delta-1)^2} \exp\left(-\frac{i^2}{2\sqrt{n/\Delta}}\right). \tag{B.9}
\end{align*}

Now,
\begin{align*}
\frac{\|\omega\|_1}{\omega(0)}
&= 1 + \frac{|\omega(1)|}{\omega(0)} + \frac{|\omega(\frac{\Delta+1}{2})|}{\omega(0)} + \sum_{i=1}^{d} \frac{|\omega(i^2\Delta)|}{\omega(0)}\\
&\le 1 + \left(1 - \frac{4}{\Delta}\right)^{-1} + \frac{8\Delta}{(\Delta-1)^2} + \sum_{i=1}^{\infty} \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\\
&= 1 + \left(1 - \frac{4}{\Delta}\right)^{-1} + \frac{8\Delta}{(\Delta-1)^2} + \frac{\pi^4(\Delta+1)}{45(\Delta-1)^2}\\
&\le \frac{2}{1 - \frac{8}{\Delta}}\\
&< \frac{2}{1-\varepsilon}, \tag{B.10}
\end{align*}
where the second step uses (B.7)–(B.9), and the last step substitutes the definition of ∆. Now (B.1) follows from (B.10) and ω(0) > 0. Moreover, for c = c(∆) > 0 small enough, (B.4) follows from (B.5), whereas (B.2) follows from (B.9) and the fact that ω vanishes outside the union {1, (∆+1)/2} ∪ {i²∆ : i = 0, 1, 2, . . . , d}.
It remains to verify that ω has the desired sign behavior. Since ω vanishes outside S, the requirement (B.3) holds trivially at those points. For t ∈ S, it follows from (B.6) that
\begin{align*}
\operatorname{sgn} \omega(1) &= -1,\\
\operatorname{sgn} \omega\!\left(\tfrac{\Delta+1}{2}\right) &= 1,\\
\operatorname{sgn} \omega(i^2\Delta) &= (-1)^i, \qquad i = 0, 1, 2, \dots, d.
\end{align*}
Since ∆ ∈ 4Z + 3 by definition, we conclude that sgn ω(t) = (−1)^t for all t ∈ S. This settles (B.3) and completes the proof.
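To illustrate the construction, the sketch below (exact rational arithmetic; the parameter choices n = 48 and ∆ = 11 are ours, corresponding to ⌈1/ε⌉ = 1 in ∆ = 8⌈1/ε⌉ + 3) builds ω from the displayed product formula and verifies the sign pattern (B.3) together with the orthogonality claim orth ω ≥ d + 1 from (B.5).

```python
from fractions import Fraction
from math import comb, factorial, isqrt

def build_omega(n, Delta):
    # S = {1, (Delta+1)/2} U {i^2 * Delta : i = 0..d}, where d = floor(sqrt(n/Delta)).
    d = isqrt(n // Delta)
    S = {1, (Delta + 1) // 2} | {i * i * Delta for i in range(d + 1)}
    fact_n = factorial(n)
    omega = []
    for t in range(n + 1):
        prod = Fraction(1)
        for i in range(n + 1):
            if i not in S:
                prod *= t - i  # zero whenever t itself lies outside S
        sign = (-1) ** (n + t + len(S) + 1)
        omega.append(Fraction(sign * comb(n, t), fact_n) * prod)
    return omega, S, d

n, Delta = 48, 11
omega, S, d = build_omega(n, Delta)

# omega vanishes off S, and on S it alternates: sgn omega(t) = (-1)^t.
for t, v in enumerate(omega):
    if t in S:
        assert (-1) ** t * v > 0
    else:
        assert v == 0

# orth omega >= d + 1: zero inner product with t^k for every k = 0..d.
for k in range(d + 1):
    assert sum(v * t ** k for t, v in enumerate(omega)) == 0
```

The moment check is exactly the mechanism of the proof: for any polynomial p of degree at most d, the product of p with the degree-(n+1−|S|) polynomial inside the definition of ω has degree below n, so Fact B.1 forces the inner product to vanish.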

We have reached the main result of this section.

Theorem (restatement of Theorem 3.3). Let 0 < ε < 1 be given. Then for some constants c′, c″ ∈ (0, 1) and all integers N ≥ n ≥ 1, there is an (explicitly given) function ψ : {0, 1, 2, . . . , N} → R such that
\begin{align}
&\psi(0) \ge \frac{1-\varepsilon}{2}, \tag{B.11}\\
&\|\psi\|_1 = 1, \tag{B.12}\\
&\operatorname{orth} \psi \ge c'\sqrt{n}, \tag{B.13}\\
&\operatorname{sgn} \psi(t) = (-1)^t, \qquad t = 0,1,2,\dots,N, \tag{B.14}\\
&|\psi(t)| \in \left[\frac{c'}{(t+1)^2\, 2^{c'' t/\sqrt{n}}},\ \frac{1}{c'(t+1)^2\, 2^{c'' t/\sqrt{n}}}\right], \qquad t = 0,1,2,\dots,N. \tag{B.15}
\end{align}

Proof. For a sufficiently small constant c > 0 and all n ≥ 1, Lemma B.2 and Remark B.3 ensure the existence of a function ω : {0, 1, 2, . . . , ⌈n/2⌉} → R such that
\begin{align}
&\|\omega\|_1 = 1, \tag{B.16}\\
&\omega(0) \ge \frac{1}{2}\left(1 - \frac{\varepsilon}{6}\right), \tag{B.17}\\
&|\omega(t)| \le \frac{1}{c\,t^2\,2^{ct/\sqrt{n}}} \qquad (t = 1,2,\dots,\lceil n/2\rceil), \tag{B.18}\\
&(-1)^t \omega(t) \ge 0 \qquad (t = 0,1,2,\dots,\lceil n/2\rceil), \tag{B.19}\\
&\operatorname{orth} \omega \ge c\sqrt{n}. \tag{B.20}
\end{align}

For convenience, extend ω to all of Z by defining it to be zero outside its original domain. Define Ψ : {0, 1, 2, . . . , N} → R by
\[
\Psi(t) = \omega(t) + \delta \left[ \sum_{i=1}^{N-\lceil n/2\rceil} \frac{(-1)^i}{i^2\,2^{ci/\sqrt{n}}}\, \omega(t-i) + \sum_{i=N-\lceil n/2\rceil+1}^{N} \frac{(-1)^i}{i^2\,2^{ci/\sqrt{n}}}\, \omega(-t+i) \right],
\]
where
\[
\delta = \frac{5\varepsilon}{\pi^2(1-\varepsilon)}.
\]
By (B.20) and Proposition 2.1(i),
\[
\operatorname{orth} \Psi \ge c\sqrt{n}. \tag{B.21}
\]

We now move on to metric properties of Ψ. Multiplying the defining equation for Ψ on both sides by (−1)^t and applying (B.19), we arrive at
\[
(-1)^t \Psi(t) = |\omega(t)| + \delta \left[ \sum_{i=1}^{N-\lceil n/2\rceil} \frac{|\omega(t-i)|}{i^2\,2^{ci/\sqrt{n}}} + \sum_{i=N-\lceil n/2\rceil+1}^{N} \frac{|\omega(-t+i)|}{i^2\,2^{ci/\sqrt{n}}} \right], \qquad t = 0,1,2,\dots,N. \tag{B.22}
\]

Summing over t gives
\begin{align*}
\|\Psi\|_1 &= \|\omega\|_1 + \delta \sum_{i=1}^{N} \frac{1}{i^2\,2^{ci/\sqrt{n}}}\,\|\omega\|_1\\
&= 1 + \delta \sum_{i=1}^{N} \frac{1}{i^2\,2^{ci/\sqrt{n}}}\\
&\in \left[1,\; 1 + \delta\sum_{i=1}^{\infty} \frac{1}{i^2}\right]\\
&= \left[1,\; \frac{6-\varepsilon}{6(1-\varepsilon)}\right], \tag{B.23}
\end{align*}
where the second step uses (B.16). We also have

\[
\Psi(0) \ge \omega(0) \ge \frac{6-\varepsilon}{12}, \tag{B.24}
\]
where the first and second steps use (B.22) and (B.17), respectively.
We now estimate |Ψ(t)| for each t = 1, 2, . . . , N. For a lower bound, we have
\begin{align*}
|\Psi(t)| &= |\omega(t)| + \delta\left[\sum_{i=1}^{N-\lceil n/2\rceil} \frac{|\omega(t-i)|}{i^2\,2^{ci/\sqrt{n}}} + \sum_{i=N-\lceil n/2\rceil+1}^{N} \frac{|\omega(-t+i)|}{i^2\,2^{ci/\sqrt{n}}}\right]\\
&\ge \delta \cdot \frac{|\omega(0)|}{t^2\,2^{ct/\sqrt{n}}}\\
&\ge \frac{5\varepsilon}{\pi^2(1-\varepsilon)} \cdot \frac{6-\varepsilon}{12} \cdot \frac{1}{t^2\,2^{ct/\sqrt{n}}}, \tag{B.25}
\end{align*}

where the first and last steps use (B.22) and (B.17), respectively. The upper bound
on |Ψ(t)| is somewhat more technical. To begin with, we have the following bound
for every positive integer t:

\begin{align*}
\sum_{i=1}^{t-1} \frac{1}{(t-i)^2\, i^2}
&= \sum_{i=1}^{t-1} \frac{1}{\max\{(t-i)^2, i^2\}\, \min\{(t-i)^2, i^2\}}\\
&\le \frac{1}{(t/2)^2} \sum_{i=1}^{t-1} \frac{1}{\min\{(t-i)^2, i^2\}}\\
&\le \frac{1}{(t/2)^2} \cdot 2\sum_{i=1}^{\infty} \frac{1}{i^2}\\
&= \frac{4\pi^2}{3t^2}. \tag{B.26}
\end{align*}
Continuing,
\begin{align*}
\sum_{i=1}^{\infty} \frac{|\omega(t-i)|}{i^2\,2^{ci/\sqrt{n}}}
&= \frac{|\omega(0)|}{t^2\,2^{ct/\sqrt{n}}} + \sum_{i=1}^{t-1} \frac{|\omega(t-i)|}{i^2\,2^{ci/\sqrt{n}}}\\
&\le \frac{1}{t^2\,2^{ct/\sqrt{n}}} + \sum_{i=1}^{t-1} \frac{1}{c(t-i)^2\, i^2\, 2^{ct/\sqrt{n}}}\\
&\le \frac{1}{t^2\,2^{ct/\sqrt{n}}}\left(1 + \frac{4\pi^2}{3c}\right), \tag{B.27}
\end{align*}
where the second step uses (B.16) and (B.18), and the third step substitutes the bound from (B.26).
Analogously,
\begin{align*}
\sum_{i=1}^{\infty} \frac{|\omega(-t+i)|}{i^2\,2^{ci/\sqrt{n}}}
&= \frac{|\omega(0)|}{t^2\,2^{ct/\sqrt{n}}} + \sum_{i=t+1}^{\infty} \frac{|\omega(-t+i)|}{i^2\,2^{ci/\sqrt{n}}}\\
&\le \frac{1}{t^2\,2^{ct/\sqrt{n}}} + \sum_{i=t+1}^{\infty} \frac{1}{c(t-i)^2\, i^2\, 2^{ci/\sqrt{n}}}\\
&\le \frac{1}{t^2\,2^{ct/\sqrt{n}}}\left(1 + \sum_{i=t+1}^{\infty} \frac{1}{c(t-i)^2}\right)\\
&= \frac{1}{t^2\,2^{ct/\sqrt{n}}}\left(1 + \frac{\pi^2}{6c}\right), \tag{B.28}
\end{align*}

where the second step uses (B.16) and (B.18). Now for every integer t ≥ 1,
\begin{align*}
|\Psi(t)| &\le |\omega(t)| + \delta\left(\sum_{i=1}^{\infty} \frac{|\omega(t-i)|}{i^2\,2^{ci/\sqrt{n}}} + \sum_{i=1}^{\infty} \frac{|\omega(-t+i)|}{i^2\,2^{ci/\sqrt{n}}}\right)\\
&\le \frac{1}{c\,t^2\,2^{ct/\sqrt{n}}}\left(1 + 2c\delta + \frac{4\pi^2\delta}{3} + \frac{\pi^2\delta}{6}\right), \tag{B.29}
\end{align*}
where the first step is immediate from the defining equation for Ψ, and the second step uses (B.18), (B.27), and (B.28).
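The elementary convolution bound (B.26), which drives the estimates (B.27)–(B.29) above, is easy to confirm numerically; a minimal sketch (helper name ours):

```python
from math import pi

def conv_sum(t):
    # Computes sum_{i=1}^{t-1} 1 / ((t - i)^2 * i^2).
    return sum(1.0 / ((t - i) ** 2 * i ** 2) for i in range(1, t))

# The bound 4*pi^2 / (3 t^2) of (B.26) holds for every t >= 2.
for t in range(2, 500):
    assert conv_sum(t) <= 4 * pi ** 2 / (3 * t ** 2)
```

The true asymptotic constant is π²/3 rather than 4π²/3 (splitting the sum at t/2 costs a factor of 4), but the weaker bound is all the proof needs.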

To complete the proof, let ψ : {0, 1, 2, . . . , N} → R be given by ψ = Ψ/‖Ψ‖₁. Then for c″ = c and small enough c′ = c′(c, ε, δ) > 0, properties (B.11)–(B.15) follow directly from (B.21)–(B.25) and (B.29).

ECCC ISSN 1433-8092

https://eccc.weizmann.ac.il
