TR19 003
TR19 003
3 (2019)
∗ Computer Science Department, UCLA, Los Angeles, CA 90095. Supported by NSF grant CCF-
ISSN 1433-8092
Contents
1. Introduction 3
1.1. Threshold degree of AC0 . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Sign-rank of AC0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Communication complexity . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Threshold weight and threshold density . . . . . . . . . . . . . . . 8
1.5. Previous approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.6. Our approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2. Preliminaries 15
2.1. General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2. Boolean functions and circuits . . . . . . . . . . . . . . . . . . . . . 16
2.3. Norms and products . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4. Orthogonal content . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5. Sign-representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6. Symmetrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.7. Communication complexity . . . . . . . . . . . . . . . . . . . . . . . 23
2.8. Discrepancy and sign-rank . . . . . . . . . . . . . . . . . . . . . . . . 23
3. Auxiliary results 25
3.1. Basic dual objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2. Dominant components . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3. Input transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4. The threshold degree of AC0 32
4.1. Shifting probability mass in product distributions . . . . . . . . . 32
4.2. A bounded dual polynomial for MP . . . . . . . . . . . . . . . . . . 39
4.3. Hardness amplification for threshold degree and beyond . . . . . 42
4.4. Threshold degree and discrepancy of AC0 . . . . . . . . . . . . . . 48
4.5. Threshold degree of surjectivity . . . . . . . . . . . . . . . . . . . . 50
5. The sign-rank of AC0 52
5.1. A simple lower bound for depth 3 . . . . . . . . . . . . . . . . . . . 53
5.2. Local smoothness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3. Metric properties of locally smooth distributions . . . . . . . . . . 58
5.4. Weight transfer in locally smooth distributions . . . . . . . . . . . 59
5.5. A locally smooth dual polynomial for MP . . . . . . . . . . . . . . 68
5.6. An amplification theorem for smooth threshold degree . . . . . . 75
5.7. The smooth threshold degree of AC0 . . . . . . . . . . . . . . . . . 83
5.8. The sign-rank of AC0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Acknowledgments 87
References 87
Appendix A. Sign-rank and smooth threshold degree 90
A.1. Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.2. Forster’s bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.3. Spectral norm of pattern matrices . . . . . . . . . . . . . . . . . . . 91
A.4. Proof of Theorem 2.18 . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Appendix B. A dual object for OR 93
1. Introduction
A real polynomial p is said to sign-represent the Boolean function f : {0, 1}n →
{0, 1} if sgn p(x) = (−1)f (x) for every input x ∈ {0, 1}n . The threshold degree of
f , denoted deg± (f ), is the minimum degree of a multivariate real polynomial that
sign-represents f . Equivalent terms in the literature include strong degree [4], voting
polynomial degree [28], PTF degree [34], and sign degree [10]. Since any function
f : {0, 1}n → {0, 1} can be represented exactly by a real polynomial of degree at
most n, the threshold degree of f is an integer between 0 and n. Viewed as a compu-
tational model, sign-representation is remarkably powerful because it corresponds
to the strongest form of pointwise approximation. The formal study of threshold
degree began in 1969 with the pioneering work of Minsky and Papert [32] on limita-
tions of perceptrons. The authors of [32] famously proved that the parity function
on n variables has the maximum possible threshold degree, n. They obtained lower
bounds on the threshold degree of several other functions, including DNF formu-
las and intersections of halfspaces. Since then, sign-representing polynomials have
found applications far beyond artificial intelligence. In theoretical computer science,
applications of threshold degree include circuit lower bounds [28, 29, 42, 19, 7], size-
depth trade-offs [37, 55], communication complexity [42, 19, 44, 39, 7, 53, 51], struc-
tural complexity [4, 9], and computational learning [26, 25, 35, 3, 47, 49, 13, 50, 56].
The notion of threshold degree has been especially influential in the study of AC0 ,
the class of constant-depth polynomial-size circuits with ∧, ∨, ¬ gates of unbounded
fan-in. The first such result was obtained by Aspnes et al. [4], who used sign-
representing polynomials to give a beautiful new proof of classic lower bounds
for AC0 . In communication complexity, the notion of threshold degree played a
central role in the first construction [42, 44] of an AC0 circuit with exponentially
small discrepancy and hence large communication complexity in nearly every model.
That discrepancy result was used in [42] to show the optimality of Allender’s classic
simulation of AC0 by majority circuits, solving the open problem [28] on the relation
between the two circuit classes. Subsequent work [19, 7, 53, 51] resolved other
questions in communication complexity and circuit complexity related to constant-
depth circuits by generalizing the threshold degree method of [42, 44].
Sign-representing polynomials also paved the way for algorithmic breakthroughs
in the study of constant-depth circuits.
Specifically, anyfunction of threshold degree
d can be viewed as a halfspace in n0 + n1 + · · · + nd dimensions, corresponding
to the monomials in a sign-representation of f . As a result, a class of functions
of threshold degree at most d can be learned inthe standard PAC model under
arbitrary distributions in time polynomial in n0 + n1 + · · · + nd . Klivans and
Servedio [26] used this threshold degree approach to give what is currently the
fastest algorithm for learning polynomial-size DNF formulas, with running time
exp(Õ(n1/3 )). Another learning-theoretic breakthrough based on threshold degree
is the fastest algorithm for learning Boolean formulas, obtained by O’Donnell and
Servedio [35] for formulas of constant depth and by Ambainis et al. [3] for arbitrary
k−1 k
depth. Their algorithm runs in time exp(Õ(n(2 −1)/(2 −1) )) for formulas of size
√
n and constant depth k, and in time exp(Õ( n)) for formulas of unbounded depth.
In both cases, the bound on the running time follows from the corresponding upper
bound on the threshold degree.
A far-reaching generalization of threshold degree is the matrix-analytic notion of
sign-rank, which allows sign-representation out of arbitrary low-dimensional sub-
spaces rather than the subspace of low-degree polynomials. The contribution of
this paper is to prove essentially optimal lower bounds on the threshold degree and
4 ALEXANDER A. SHERSTOV AND PEI WU
sign-rank of AC0 , which in turn imply lower bounds on other fundamental com-
plexity measures of interest in communication complexity and learning theory. In
the remainder of this section, we give a detailed overview of the previous work,
present our main results, and discuss our proofs.
Three decades later, Klivans and Servedio [26] obtained an O(n1/3 log n) upper
bound on the threshold degree of any polynomial-size DNF formula in n vari-
ables, essentially matching Minsky and Papert’s result and resolving the problem
for depth 2. Determining the threshold degree of circuits of depth k > 3 proved
to be challenging. The only upper bound known to date is the trivial O(n), which
follows directly from the definition of threshold degree. In particular, it is consis-
tent with our knowledge that there are AC0 circuits with linear threshold degree.
On the lower bounds side, the only progress for a long time was due to O’Donnell
and Servedio [35], who constructed for any k > 1 a circuit of depth k + 2 with
threshold degree Ω(n1/3 log2k/3 n). The authors of [35] formally posed the prob-
lem of obtaining a polynomial improvement on Minsky and Papert’s lower bound.
Such an improvement was obtained in [50], with a threshold degree lower bound
Theorem 1.1. Let k > 1 be a fixed integer. Then there is an (explicitly given)
Boolean circuit family {fn }∞ n
n=1 , where fn : {0, 1} → {0, 1} has polynomial size,
depth k, and threshold degree
k−1 1 k−2 k−2
deg± (fn ) = Ω n k+1 · (log n)− k+1 d 2 eb 2 c .
For large k, Theorem 1.1 essentially matches the trivial upper bound of n on the
threshold degree of any function. For any fixed depth k, Theorem 1.1 subsumes
all previous lower bounds on the threshold degree of AC0 , with a polynomial im-
provement starting at depth k = 4. In particular, the lower bounds due to Minsky
and Papert [32] and Bun and Thaler [17] are subsumed as the special cases k = 2
and k = 3, respectively. From a computational learning perspective, Theorem 1.1
definitively rules out the threshold degree approach to learning constant-depth cir-
cuits.
1.2. Sign-rank of AC0 . The sign-rank of a matrix A = [Aij ] without zero entries,
denoted rk± (A), is the least rank of a real matrix M = [Mij ] with sgn Mij = sgn Aij
for all i, j. In other words, the sign-rank of A is the minimum rank of a matrix
that can be obtained by making arbitrary sign-preserving changes to the entries of
A. The sign-rank of a Boolean function F : {0, 1}n × {0, 1}n → {0, 1} is defined
in the natural way as the sign-rank of the matrix [(−1)F (x,y) ]x,y . In particular,
the sign-rank of F is an integer between 1 and 2n . This fundamental notion has
been studied in contexts as diverse as matrix analysis, communication complexity,
circuit complexity, and learning theory; see [39] for a bibliographic overview. To a
complexity theorist, sign-rank is a vastly more challenging quantity to analyze than
threshold degree. Indeed, a sign-rank lower bound rules out sign-representation out
of every linear subspace of given dimension, whereas a threshold degree lower bound
rules out sign-representation specifically by linear combinations of monomials up
to a given degree.
Unsurprisingly, progress in understanding sign-rank has been slow and difficult.
No nontrivial lower bounds were known for any explicit matrices until the break-
through work of Forster [21], who proved strong lower bounds on the sign-rank of
Hadamard matrices and more generally all sign matrices with small spectral norm.
The sign-rank of constant-depth circuits F : {0, 1}n ×{0, 1}n → {0, 1} has since seen
considerable work, as summarized in Table 2. The first exponential lower bound on
the sign-rank of an AC0 circuit was obtained by Razborov and Sherstov [39], solv-
ing a 22-year-old problem due to Babai, Frankl, and Simon [5]. The authors of [39]
constructed a polynomial-size circuit of depth 3 with sign-rank exp(Ω(n1/3 )). In
follow-up work, Bun and Thaler [15] constructed a polynomial-size circuit of depth 3
with sign-rank exp(Ω̃(n2/5 )). A more recent and incomparable result, also due to
√
Bun and Thaler [17], is a sign-rank lower bound of exp(Ω̃( n)) for a circuit of
polynomial size and depth 7. No nontrivial upper bounds are known on√the sign-
rank of AC0 . Closing this gap between the best lower bound of exp(Ω̃( n)) and
the trivial upper bound of 2n has been a challenging open problem. We solve this
problem almost completely, by constructing for any > 0 a constant-depth circuit
with sign-rank exp(Ω(n1− )). In quantitative detail, our results on the sign-rank of
AC0 are the following two theorems.
Theorem 1.2. Let k > 1 be a given integer. Then there is an (explicitly given)
Boolean circuit family {Fn }∞ n n
n=1 , where Fn : {0, 1} ×{0, 1} → {0, 1} has polynomial
size, depth 3k, and sign-rank
k(k−1)
1
rk± (Fn ) = exp Ω n1− k+1 · (log n)− 2(k+1) .
Theorem 1.3. Let k > 1 be a given integer. Then there is an (explicitly given)
Boolean circuit family {Gn }∞ n n
n=1 , where Gn : {0, 1} × {0, 1} → {0, 1} has polyno-
mial size, depth 3k + 1, and sign-rank
k2
1
rk± (Gn ) = exp Ω n1− k+1.5 · (log n)− 2k+3 .
For large k, the lower bounds of Theorems 1.2 and 1.3 approach the trivial upper
bound of 2n on the sign-rank of any Boolean function {0, 1}n × {0, 1}n → {0, 1}.
For any given depth, Theorems 1.2 and 1.3 subsume all previous lower bounds
on the sign-rank of AC0 , with a strict improvement starting at depth 3. From a
computational learning perspective, Theorems 1.2 and 1.3 state that AC0 has near-
maximum dimension complexity [41, 43, 39, 17], namely, exp(Ω(n1− )) for any
constant > 0. This rules out the possibility of learning AC0 circuits via dimension
complexity [39], a far-reaching generalization of the threshold degree approach from
the monomial basis to arbitrary bases.
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 7
UPP(F ) = inf R (F )
06<1/2
and
1
PP(F ) = inf R (F ) + log2 1 .
06<1/2
2 −
The former quantity, introduced by Paturi and Simon [38], is called the communi-
cation complexity of F with unbounded error, in reference to the fact that the error
probability can be arbitrarily close to 1/2. The latter quantity is called the com-
munication complexity of F with weakly unbounded error. Proposed by Babai et
al. [5], it features an additional penalty term that depends on the error probability.
It is clear that
UPP(F ) 6 PP(F ) 6 n + 2
for every communication problem F : {0, 1}n ×{0, 1}n → {0, 1}, with an exponential
gap achievable between the two complexity measures [10, 41]. These two models
occupy a special place in the study of communication because they are more pow-
erful than any other standard model (deterministic, nondeterministic, randomized,
quantum with or without entanglement). Moreover, unbounded-error protocols
represent a frontier in communication complexity theory in that they are the most
powerful protocols for which explicit lower bounds are currently known. Our re-
sults imply that even for such protocols, AC0 has near-maximal communication
complexity.
To begin with, combining Theorem 1.1 with the pattern matrix method [42, 44]
gives:
Theorem 1.4. Let k > 3 be a fixed integer. Then there is an (explicitly given)
Boolean circuit family {Fn }∞ n n
n=1 , where Fn : {0, 1} ×{0, 1} → {0, 1} has polynomial
size, depth k, communication complexity
k−1 1 k−2 k−2
PP(Fn ) = Ω n k+1 · (log n)− k+1 d 2 eb 2 c
8 ALEXANDER A. SHERSTOV AND PEI WU
and discrepancy
k−1 1 k−2 k−2
disc(Fn ) = exp −Ω n k+1 · (log n)− k+1 d 2 eb 2 c .
For large k, the lower bounds of Theorem 1.5 essentially match the trivial upper
bound of n + 1 on the unbounded-error communication complexity of any function
F : {0, 1}n × {0, 1}n → {0, 1}. Theorem 1.5 strictly subsumes all previous lower
bounds on the unbounded-error communication complexity of AC0 , with a poly-
nomial improvement for any depth greater than 2. The best lower bound on√the
unbounded-error communication complexity of AC0 prior to our work was Ω̃( n)
for a circuit of depth 7, due to Bun and Thaler [17]. Finally, we remark that The-
orem 1.5 gives essentially the strongest possible separation of the communication
complexity classes PH and UPP. We refer the reader to the work of Babai et al. [5]
for definitions and detailed background on these classes.
Qualitatively, Theorem 1.5 is stronger than Theorem 1.4 because communica-
tion protocols with unbounded error are significantly more powerful than those
with weakly unbounded error. On the other hand, Theorem 1.4 is stronger quanti-
tatively for any fixed depth and has the additional advantage of generalizing to the
multiparty setting.
AC0 circuits by polynomials. For the sake of completeness, we mention two such
consequences. The threshold density of a Boolean function f : {0, 1}n → {0, 1},
denoted dns(f ), is the minimum size of a set family S ⊆ P({1, 2, . . . , n}) such
that
!
X P
x
sgn λS (−1) i∈S i
≡ (−1)f (x)
S∈S
It is not hard to see that the threshold density and threshold weight of f correspond
to the minimum size of a threshold-of-parity and majority-of-parity circuit for f,
respectively. The definitions imply that dns(f ) 6 W (f ) for every f, and a little
more thought reveals that 1 6 dns(f ) 6 2n and 1 6 W (f ) 6 4n . These complexity
measures have seen extensive work, motivated by applications to computational
learning and circuit complexity. For a bibliographic overview, we refer the reader
to [50, Section 8.2].
Krause and Pudlák [28, Proposition 2.1] gave an ingenious method for transform-
ing threshold degree lower bounds into lower bounds on threshold density and thus
also threshold weight. Specifically, let f : {0, 1}n → {0, 1} be a Boolean function of
interest. The authors of [28] considered the related function F : ({0, 1}n )3 → {0, 1}
given by F (x, y, z) = f (. . . , (zi ∧ xi ) ∨ (zi ∧ yi ), . . . ), and proved that dns(F ) >
2deg± (f ) . In this light, Theorem 1.1 implies that the threshold density of AC0 is
exp(Ω(n1− )) for any constant > 0:
Corollary 1.6. Let k > 3 be a fixed integer. Then there is an (explicitly given)
Boolean circuit family {Fn }∞ n
n=1 , where Fn : {0, 1} → {0, 1} has polynomial size
and depth k and satisfies
For large k, the lower bounds on the threshold weight and density in Corollary 1.6
essentially match the trivial upper bounds. Observe that the circuit family {Fn }∞
n=1
of Corollary 1.6 has the same depth as the circuit family {fn }∞n=1 of Theorem 1.1.
This is because fn has bottom fan-in O(log n), and thus the Krause-Pudlák transfor-
mation fn 7→ Fn can be “absorbed” into the bottom two levels of fn . Corollary 1.6
subsumes all previous lower bounds [28, 13, 50, 52, 17] on the threshold weight and
density of AC0 , with a polynomial improvement for every k > 4. The improvement
is particularly noteworthy in the√case of threshold density, where the best previous
lower bound [52, 17] was exp(Ω( n)).
10 ALEXANDER A. SHERSTOV AND PEI WU
Both approximate degree and threshold degree have dual characterizations [44],
obtained by appeal to linear programming duality. Specifically, deg (f ) > d if
and only if there is a function φ : {−1, +1}n → R with the following two proper-
ties: hφ, f i > kφk1 ; and hφ, pi = 0 for every polynomial p of degree less than d.
Rephrasing, φ must have large correlation with f but zero correlation with every
low-degree polynomial. By weak linear programming duality, φ constitutes a proof
that deg (f ) > d and for that reason is said to witness the lower bound deg (f ) > d.
In view of (1.1), this discussion generalizes to threshold degree. The dual charac-
terization here states that deg± (f ) > d if and only if there is a nonzero function
φ : {−1, +1}n → R with the following two properties: φ(x)f (x) > 0 for all x; and
hφ, pi = 0 for every polynomial p of degree less than d. In this dual characteriza-
tion, φ agrees in sign with f and is additionally orthogonal to polynomials of degree
less than d. The sign-agreement property can be restated in terms of correlation,
as hφ, f i = kφk1 . As before, φ is called a threshold degree witness for f.
What distinguishes the dual characterizations of approximate degree and thresh-
old degree is how the dual object φ relates to f . Specifically, a threshold degree
witness must agree in sign with f at every point. An approximate degree witness,
on the other hand, need only exhibit such sign-agreement with f at most points,
in that the points where the sign of φ is correct should account for most of the `1
norm of φ. As a result, constructing dual objects for threshold degree is significantly
more difficult than for approximate degree. This difficulty is to be expected because
the gap between threshold degree and approximate degree can be arbitrary, e.g.,
1 versus Θ(n) for the majority function on n bits [36].
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 11
This composed dual object often requires additional work to ensure sign-agreement
or correlation with the composed Boolean function. Among the generic tools
available to assist in this process is a “corrector” object ζ due to Razborov and
Sherstov [39], with the following four properties: (i) ζ is orthogonal to low-degree
polynomials; (ii) ζ takes on 1 at a prescribed point of the hypercube; (iii) ζ is
bounded on inputs of low Hamming weight; and (iv) ζ vanishes on all other points
of the hypercube. Using the Razborov–Sherstov object, suitably shifted and scaled,
one can surgically correct the behavior of a given dual object Φ on a substantial
fraction of inputs, thus modifying its metric properties without affecting its orthog-
onality to low-degree polynomials. This technique has played an important role in
recent work, e.g., [15, 16, 11, 17].
Hardness amplification for approximate degree. While block-composition has pro-
duced a treasure trove of results on polynomial representations of Boolean functions,
it is of limited use when it comes to constructing functions with high bounded-
error approximate degree. To illustrate the issue, consider arbitrary functions
f : {−1, +1}n1 → {−1, +1} and g : {−1, +1}n2 → {−1, +1} with 1/3-approximate
degrees nα α2
1 and n2 , respectively, for some 0 < α1 < 1 and 0 < α2 < 1. It
1
Threshold degree of AC 0 . Bun and Thaler [17] refer to obtaining an Ω(n1− ) thresh-
old degree lower bound for AC0 as the “main glaring open question left by our work.”
It is important to note here that lower bounds on approximate degree, even with the
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 13
implications a priori for the sign-rank of AC0 . Our proofs of Theorems 1.2 and 1.3
are completely disjoint from Theorem 1.1 and are instead based on a stronger
approximation-theoretic quantity that we call γ-smooth threshold degree. Formally,
the γ-smooth threshold degree of a Boolean function f : X → {−1, +1} is the largest
d for which there is a nonzero function φ : X → R with the following two properties:
φ(x)f (x) > γ · kφk1 /|X| for all x ∈ X; and hφ, pi = 0 for every polynomial p of
degree less than d. Taking γ = 0 in this formalism, one recovers the standard dual
characterization of the threshold degree of f. In particular, threshold degree is syn-
onymous with 0-smooth threshold degree. The general case of γ-smooth threshold
degree for γ > 0 requires threshold degree witnesses φ that are min-smooth, in that
the absolute value of φ at any given point is at least a γ fraction of the average
absolute value of φ over all points. A substantial advantage of smooth threshold de-
gree is that it has immediate sign-rank implications. Specifically, any lower bound
of d on the 2−O(d) -smooth threshold degree can be converted efficiently and in a
black-box manner into a sign-rank lower bound of 2Ω(d) , using a combination of
the pattern matrix method [42, 44] and Forster’s spectral lower bound on sign-
rank [21, 22]. Accordingly, we obtain Theorems 1.2 and 1.3 by proving an Ω(n1− )
1−
lower bound on the 2−O(n ) -smooth threshold degree of AC0 , for any constant
> 0.
At the core of our result is an amplification theorem for smooth threshold de-
gree, whose repeated application makes it possible to prove arbitrarily strong lower
bounds for AC0 . Amplifying smooth threshold degree is a complex juggling act
due to the presence of two parameters—degree and smoothness—that must evolve
in coordinated fashion. The approach of Theorem 1.1 is not useful here because
the threshold degree witnesses that arise from the proof of Theorem 1.1 are highly
nonsmooth. In more detail, when amplifying the threshold degree of a function f
as in the proof of Theorem 1.1, two phenomena adversely affect the smoothness
parameter. The first is block-composition itself as a composition technique, which
in the regime of interest to us transforms every threshold degree witness for f into
a hopelessly nonsmooth witness for the composed function. The other culprit is the
input compression step, which re-encodes the input and thereby affects the smooth-
ness in ways that are hard to control. To overcome these difficulties, we develop a
novel approach based on what we call local smoothness.
Formally, let Φ : Nn → R be a function of interest. For a subset X ⊆ Nn and
0
a real number K > 1, we say that Φ is K-smooth on X if |Φ(x)| 6 K |x−x | |Φ(x0 )|
for all x, x0 ∈ X. Put another way, for any two points of X at `1 distance d, the
corresponding values of Φ differ in magnitude by a factor of at most K d . In and
of itself, a locally smooth function Φ need not be min-smooth because for a pair
of points that are far from each other, the corresponding Φ-values can differ by
many orders of magnitude. However, locally smooth functions exhibit extraordi-
nary plasticity. Specifically, we show how to modify a locally smooth function’s
metric properties—such as its support or the distribution of its `1 mass—without
the change being detectable by low-degree polynomials. This apparatus makes
it possible to restore min-smoothness to the dual object Φ that results from the
block-composition step and preserve that min-smoothness throughout the input
compression step, eliminating the two obstacles to min-smoothness in the earlier
proof of Theorem 1.1. The new block-composition step uses a locally smooth wit-
ness for the threshold degree of MPm , which needs to be built from scratch and is
quite different from the witness in the proof of Theorem 1.1.
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 15
Our described approach departs considerably from previous work on the sign-
rank of constant-depth circuits [39, 15, 17]. The analytic notion in those earlier
papers is weaker than γ-smooth threshold degree and in particular allows the dual
object to be arbitrary on a γ fraction of the inputs. This weaker property is accept-
able when the main result is proved in one shot, with a closed-form construction
of the dual object. By contrast, we must construct dual objects iteratively, with
each iteration increasing the degree parameter and proportionately decreasing the
smoothness parameter. This iterative process requires that the dual object in each
iteration be min-smooth on the entire domain. Perhaps unexpectedly, we find γ-
smooth threshold degree easier to work with than the weaker notion in previous
work [39, 15, 17]. In particular, we are able to give a new and short proof of the
exp(Ω(n1/3 )) lower bound on the sign-rank of AC0 , originally obtained by Razborov
and Sherstov [39] with a much more complicated approach. The new proof can be
found in Section 5.1, where it serves as a prelude to our main result on the sign-rank
of AC0 .
2. Preliminaries
2.1. General. For a string x ∈ {0, 1}n and a set S ⊆ {1, 2, . . . , n}, we let x|S
denote the restriction of x to the indices in S. In other words, x|S = xi1 xi2 . . . xi|S| ,
where i1 < i2 < · · · < i|S| are the elements of S. The characteristic function of a
set S ⊆ {1, 2, . . . , n} is given by
(
1 if x ∈ S,
1S (x) =
0 otherwise.
We let N = {0, 1, 2, 3, . . .} denote the set of natural numbers. The following well-
known bound [24, Proposition 1.4] is used in our proofs without further mention:
k en k
X n
6 , k = 0, 1, 2, . . . , n, (2.1)
i=0
i k
The term Euclidean space refers to Rn for some positive integer n. We let ei de-
note the vector whose ith component is 1 and the others are 0. Thus, the vectors
e1 , e2 , . . . , en form the standard basis for Rn . For vectors x and y, we write x 6 y
to mean that xi 6 yi for each i. The relations >, <, > on vectors are defined
analogously.
We frequently omit the argument in equations and inequalities involving func-
tions, as in sgn p = (−1)f . Such statements are to be interpreted pointwise. For
example, the statement “f > 2|g| on X” means that f (x) > 2|g(x)| for every
x ∈ X. The positive and negative parts of a function f : X → R are denoted
pos f = max{f, 0} and neg f = max{−f, 0}, respectively.
layer. For example, an AND-OR-AND circuit is a depth-3 circuit with the top
and bottom layers composed of ∧ gates, and middle layer composed of ∨ gates. A
Boolean formula is a Boolean circuit in which every gate has fan-out 1. Common
examples of Boolean formulas are DNF and CNF formulas.
2.3. Norms and products. For a set X, we let RX denote the linear space of
real-valued functions on X. The support of a function f ∈ RX is denoted supp f =
{x ∈ X : f (x) 6= 0}. For real-valued functions with finite support, we adopt the
usual norms and inner product:
kf k∞ = max |f (x)|,
x∈supp f
X
kf k1 = |f (x)|,
x∈supp f
X
hf, gi = f (x)g(x).
x∈supp f ∩ supp g
This covers as a special case functions on finite sets. The tensor product of f ∈ RX
and g ∈ RY is denoted f ⊗g ∈ RX×Y and given by (f ⊗g)(x, y) = f (x)g(y). The ten-
sor product f ⊗f ⊗· · ·⊗f (n times) is abbreviated f ⊗n . For a subset S ⊆ {1, 2, . . . , n}
⊗S
Q a function f : X → R, we define f ⊗∅: X → R by
and n
f ⊗S (x1 , x2 , . . . , xn ) =
⊗{1,2,...,n}
i∈S f (xi ). As extremal cases, we have f ≡ 1 and f = f ⊗n . Tensor
product notation generalizes naturally to sets of functions: F ⊗ G = {f ⊗ g : f ∈
F, g ∈ G} and F ⊗n = {f1 ⊗f2 ⊗· · ·⊗fn : f1 , f2 , . . . , fn ∈ F }. A conical combination
of f1 , f2 , . . . , fk ∈ RX is any function of the form λ1 f1 + λ2 f2 + · · · + λk fk , where
λ1 , λ2 , . . . , λk are nonnegative reals. A convex combination of f1 , f2 , . . . , fk ∈ RX
is any function λ1 f1 + λ2 f2 + · · · + λk fk , where λ1 , λ2 , . . . , λk are nonnegative re-
als that sum to 1. The conical hull of F ⊆ RX , denoted cone F, is the set of all
conical combinations of functions in F. The convex hull, denoted conv F , is defined
analogously as the set of all convex combinations of functions in F. For any set of
functions F ⊆ RX , we have
X|W = {x ∈ X : |x| ∈ W }.
18 ALEXANDER A. SHERSTOV AND PEI WU
Proof. Item (i) is immediate, as is the upper bound in (ii). For the lower bound
in (ii), simply note that the linearity of inner product makes it possible to restrict
attention to factored polynomials p(x)q(y), where p and q are polynomials on X
and Y , respectively. For (iii), use a telescoping sum to write
n−1
X
φ⊗n − ψ ⊗n = (φ⊗(n−i) ⊗ ψ ⊗i − φ⊗(n−i−1) ⊗ ψ ⊗(i+1) )
i=0
n−1
X
= φ⊗(n−i−1) ⊗ (φ − ψ) ⊗ ψ ⊗i .
i=0
By (ii), each term in the final expression has orthogonal content at least orth(φ−ψ).
By (i), then, the sum has orthogonal content at least orth(φ − ψ) as well.
Proof.
Qn By linearity, it suffices to consider factored polynomials p(x1 , . . . , xn ) =
i=1 pi (xi ), where each pi is a nonzero polynomial on X. In this setting, we have
* n
+ n
O Y
φzi , p = hφzi , pi i . (2.3)
i=1 i=1
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 19
By definition, hφ0 , pi i = hφ1 , pi i for any index i with deg pi < orth(φ1 − φ0 ). As a
result, such indices do not contribute to the degree of the right-hand side of (2.3)
as a function of z. The contribution of any other index to the degree is clearly at
most 1. Summarizing, the right-hand side of (2.3) is a polynomial in z ∈ {0, 1}n of
degree at most |{i : deg pi > orth(φ1 − φ0 )}| 6 (deg p)/ orth(φ1 − φ0 ).
Corollary 2.3. Let X be a finite subset of Euclidean space. Then for any func-
tions φ0 , φ1 : X → R and ψ : {0, 1}n → R,
X n
O
orth ψ(z) φzi > orth(ψ) · orth(φ1 − φ0 ).
z∈{0,1}n i=1
Proof. We may assume that orth(ψ)·orth(φ1 −φ0 ) > 0 since the claim holds trivially
otherwise. Fix any polynomial P of degree less than orth(ψ) · orth(φ1 − φ0 ). The
linearity of inner product leads to
* n
+ * n +
X O X O
ψ(z) φzi , P = ψ(z) φz i , P .
z∈{0,1}n i=1 z∈{0,1}n i=1
By Proposition 2.2, the right-hand side is the inner product of ψ with a polynomial
of degree less than orth ψ and is therefore zero.
Observe that Corollary 2.3 gives an alternate proof of Proposition 2.1(iii). Our
next proposition uses orthogonal content to give a useful criterion for a real-valued
function to be a probability distribution.
Three new proofs of this lower bound, unrelated to Minsky and Papert’s original
proof, were discovered in [50]. Threshold degree admits the following dual charac-
terization, obtained by appeal to linear programming duality.
20 ALEXANDER A. SHERSTOV AND PEI WU
The function ψ witnesses the threshold degree of f , and is called a dual polynomial
due to its origin in a dual linear program. We refer the reader to [4, 35, 47]
for a proof of Fact 2.6. The following equivalent statement is occasionally more
convenient to work with.
Fact 2.7. For every Boolean function f : X → {0, 1} on a finite subset X of Eu-
clidean space,
We call this quantity the γ-smooth threshold degree of f , in reference to the fact
that the maximization in (2.5) is over probability distributions µ that assign to
every point of the domain at least a γ fraction of the average point’s probability.
A glance at (2.4) and (2.5) reveals that deg± (f, γ) is monotonically nonincreasing
in γ, with the limiting case deg± (f, 0) = deg± (f ).
As one might expect, padding a Boolean function with irrelevant variables does not
decrease its smooth threshold degree. We record this observation below.
Proposition 2.9. Fix integers N > n > 1 and a function f : {0, 1}n → {0, 1}.
Define F : {0, 1}N → {0, 1} by F (x1 , x2 , . . . , xN ) = f (x1 , x2 , . . . , xn ). Then
In particular,
t 7→ E p(x)
x∈{0,1}n |t
Proof. We closely follow an argument due to Ambainis [2, Lemma 3.4], who proved
a related result. Since the components of x1 , x2 , . . . , xn are Boolean-valued, we
have xi,j = x2i,j = x3i,j = · · · and therefore we may assume that p is multilinear.
By linearity, it further suffices to consider the case when p is a single monomial:
m Y
Y
p(x1 , x2 , . . . , xn ) = xi,j (2.6)
j=1 i∈Aj
22 ALEXANDER A. SHERSTOV AND PEI WU
Pm
for some sets A1 , A2 , . . . , Am ⊆ {1, 2, . . . , n} with j=1 |Aj | 6 d. If some pair of sets
Aj , Aj 0 with j 6= j 0 have nonempty intersection, then the right-hand side of (2.6)
contains a product of the form xi,j xi,j 0 for some i and thus p ≡ 0 on the domain
in question. As a result, the proposition holds with p∗ = 0. In the complementary
case when A1 , A2 , . . . , Am are pairwise disjoint, we calculate
Expanding out the binomial coefficients shows that the final expression is an m-
variate polynomial whose argument is the P vector sum x1 + x2 + · · · + xn ∈ Rm .
Moreover, the degree of this polynomial is |Aj | 6 d.
v 7→ E p(x) (2.7)
x∈{0m ,e1 ,e2 ,...,em }n :
x1 +x2 +···+xn =v
v 7→ E p(σ(e1 , . . . , e1 , e2 , . . . , e2 , . . . , em , . . . , em , 0m , 0m . . . , 0m )),
σ∈Sn | {z } | {z } | {z } | {z }
v1 v2 vm n−v1 −···−vm
e + · · · + e1 + e2 + · · · + e2 + · · · + em + · · · + em + 0m + · · · + 0m = v
|1 {z } | {z } | {z } | {z }
v1 v2 vm n−v1 −···−vm
Here, the error is unbounded in the sense that it can be arbitrarily close to 1/2.
Babai et al. [5] proposed an alternate quantity, which includes an additive penalty
term that depends on the error probability:
1
PP(F ) = inf R (F ) + log 1 . (2.9)
2 −
06<1/2
`
Y
χ(x1 , x2 , . . . , x` ) = χi (x1 , . . . , xi−1 , xi+1 , . . . , x` ),
i=1
fundamental building blocks of communication protocols and for that reason play
a central role in the theory. For a Boolean function F : X1 × X2 × · · · × X` → {0, 1}
and a probability distribution P on X1 × X2 × · · · × X` , the discrepancy of F with
respect to P is given by
X
F (x)
discP (F ) = max (−1) P (x)χ(x) ,
χ
x∈X1 ×X2 ×···×X`
The discrepancy method [20, 6, 30] is a classic technique that bounds randomized
communication complexity from below in terms of discrepancy.
1 − 2
2R (F ) > .
disc(F )
Combining this theorem with the definition of PP(F ) gives the following corollary.
2
PP(F ) > log .
disc(F )
The sign-rank of a real matrix A ∈ Rn×m with nonzero entries is the least rank
of a matrix B ∈ Rn×m such that sgn Ai,j = sgn Bi,j for all i, j. In general, the
sign-rank of a matrix can be vastly smaller than its rank. For example, consider
the following nonsingular matrices of order n > 3:
1 1
1 1 1 −1
1 1
.. , .. .
.
.
−1 1
1
−1 1
1
These matrices have sign-rank at most 2 and 3, respectively. Indeed, the first matrix
has the same sign pattern as [2(j − i) + 1]i,j . The second has the same sign pattern
as [hvi , vj i − (1 − )]i,j , where v1 , v2 , . . . , vn ∈ R2 are arbitrary pairwise distinct
unit vectors and is a suitably small positive real, cf. [38, Section 5]. As a matter
of notational convenience, we extend the notion of sign-rank to Boolean functions
f : X ×Y → {0, 1} by defining rk± (f ) = rk± (Mf ), where Mf = [(−1)f (x,y) ]x∈X,y∈Y
is the matrix associated with f . A remarkable fact, due to Paturi and Simon [38],
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 25
As Corollary 2.15 and Theorem 2.16 show, the study of communication with
unbounded and weakly unbounded error is in essence the study of discrepancy and
sign-rank. These quantities are difficult to analyze from first principles. The pattern
matrix method, developed in [42, 44], is a technique that transforms lower bounds
for polynomial approximation into bounds on discrepancy, sign-rank, and various
other quantities in communication complexity. For our discrepancy bounds, we
use the following special case of the pattern matrix method [51, Theorem 5.7 and
equation (119)].
Theorem 2.17 (Sherstov). Let f : {0, 1}n → {0, 1} be given. Consider the `-party
communication problem F : ({0, 1}nm )` → {0, 1} given by F = f ◦ NORm ◦ AND` .
Then
deg± (f )/2
c2` `
disc(F ) 6 √ ,
m
We note that the case ` = 2 of Theorem 2.17 is vastly easier to prove than the
general statement; this two-party result can be found in [44, Theorem 7.3 and
equation (7.3)]. For our sign-rank lower bounds, we use the following theorem
implicit in [45].
Theorem 2.18 (Sherstov, implicit). Let f : {0, 1}n → {0, 1} be given. Suppose that
deg± (f, γ) > d, where γ and d are positive reals. Fix an integer m > 2 and define
F : {0, 1}mn × {0, 1}mn → {0, 1} by F = f ◦ ORm ◦ AND2 . Then
j m kd/2
rk± (F ) > γ .
2
For the reader’s convenience, we include a detailed proof of Theorem 2.18 in Ap-
pendix A.
3. Auxiliary results
In this section, we collect a number of supporting results on polynomial approxi-
mation that have appeared in one form or another in previous work. For the reader’s
convenience, we provide self-contained proofs whenever the precise formulation that
we need departs from published work.
3.1. Basic dual objects. As described in the introduction, we prove our main
results constructively, by building explicit dual objects that witness the correspond-
ing lower bounds. An important tool in this process is the following lemma due to
Razborov and Sherstov [39]. Informally, it is used to adjust a dual object’s metric
26 ALEXANDER A. SHERSTOV AND PEI WU
Lemma 3.1 (Razborov and Sherstov). Fix integers d and n, where 0 6 d < n. Then
there is an (explicitly given) function ζ : {0, 1}n → R such that
In more detail, this result corresponds to taking k = d and ζ = (−1)n g in the proof
of Lemma 3.2 of [39]. We will need the following symmetrized version of Lemma 3.1.
Lemma 3.2. Fix a point u ∈ Nn and a natural number d < |u|. Then there is
ζu : Nn → R such that
Now define ζu : Nn → R by
X X
ζu (v) = ··· ζ(x1 . . . xn ),
x1 ∈{0,1}|u1 | ||v xn ∈{0,1}|un | ||v
1| n|
where we adopt the convention that the set {0, 1}0 = {0, 1}0 |0 has as its only
element the empty string, with weight 0. Then properties (3.1)–(3.3) are imme-
diate from (3.5)–(3.7), respectively. To verify the remaining property (3.4), fix a
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 27
Theorem 3.3. Let 0 < < 1 be given. Then for some constants c0 , c00 ∈ (0, 1) and
all integers N > n > 1, there is an (explicitly given) function ψ : {0, 1, 2, . . . , N } →
R such that
1−
ψ(0) > ,
2
kψk1 = 1,
√
orth ψ > c0 n,
sgn ψ(t) = (−1)t , t = 0, 1, 2, . . . , N,
c0
1
|ψ(t)| ∈ √ , √ , t = 0, 1, 2, . . . , N.
(t + 1)2 2c00 t/ n c0 (t + 1)2 2c00 t/ n
3.2. Dominant components. We now recall a lemma due to Bun and Thaler [16]
that serves to identify the dominant components of a vector. Its primary use [16, 11]
is to prove concentration-of-measure results for product distributions on Nn .
kvk1
|S| > ,
2kvk∞
kvk1
|S| min |vi | > .
i∈S 2(1 + ln n)
28 ALEXANDER A. SHERSTOV AND PEI WU
Proof (adapted from [16]). By renumbering the indices if necessary, we may as-
sume that |v1 | > |v2 | > · · · > |vn | > 0. For the sake of contradiction, suppose that
no such set S exists. Then
1 kvk1
|vi | < ·
i 2(1 + ln n)
kvk1
for every index i > 2kvk∞ . As a result,
X n
X
kvk1 = |vi | + |vi |
kvk
l m
kvk
i< 2kvk1 i= 2kvk1
∞ ∞
n
X X 1 kvk1
6 kvk∞ + ·
kvk
l m i 2(1 + ln n)
kvk
i< 2kvk1 i= 2kvk1
∞ ∞
n
kvk1 kvk1 X 1
< +
2 2(1 + ln n) i=1
i
6 kvk1 ,
Lemma 3.5. Fix θ > 0 and let v ∈ Rn be an arbitrary vector with kvk1 > θ. Then
there is S ⊆ {1, 2, . . . , n} such that
kvk1
|S| > , (3.9)
2kvk∞
1 θ
min |vi | > · , (3.10)
i∈S |S| 2(1 + ln n)
X
|vi | < θ. (3.11)
i∈S
/
Proof. Fix n, v, and θ for the remainder of the proof. We will refer to a subset
S ⊆ {1, 2, . . . , n} as regular if S satisfies (3.9) and (3.10). Lemma 3.4 along with
kvk1 > θ ensures the existence of at least one regular set. Now, let S be a maximal
regular set. For the sake of contradiction, suppose that (3.11) fails. Applying
Lemma 3.4 to v|S produces a nonempty set T ⊆ S with
1 θ
min |vi | > · .
i∈T |T | 2(1 + ln n)
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 29
Lemma 3.5 implies the following concentration-of-measure result for product dis-
tributions on Nn ; cf. Bun and Thaler [16].
Lemma 3.6 (cf. Bun and Thaler). Let λ1 , λ2 , . . . , λn ∈ D(N) be given with
Cαt
λi (t) 6 , t ∈ N, (3.12)
(t + 1)2
Proof (adapted from [16]). For any vector v ∈ Nn with kvk1 > θ/2, Lemma 3.5
guarantees the existence of a nonempty set S ⊆ {1, 2, . . . , n} such that
1 θ
min |vi | > · , (3.13)
i∈S |S| 4(1 + ln n)
X θ
|vi | < . (3.14)
2
i∈S
/
|S|
X
θ/2 C|S| · 4(1 + ln n)
= α
θ
S⊆{1,2,...,n}
S6=∅
n s
X n Cs · 4(1 + ln n)
= · αθ/2
s=1
s θ
n s
X
θ/2 en Cs · 4(1 + ln n)
6 α ·
s=1
s θ
6 αθ/2 ,
where the first inequality holds by the opening paragraph of the proof; the sec-
ond step applies the union bound; the third step uses 0 6 α 6 1 and the upper
bound (3.12) for the λi ; and the last two steps use (2.1) and the hypothesis that
θ > 8Cen(1 + ln n), respectively.
Lemma 3.7 (Sherstov). Let n > 1 be a given integer. Then there is a surjection
g : {0, 1}6dlog(n+1)e → {0n , e1 , e2 , . . . , en } such that
E p= E p= E p = ··· = E p
g −1 (0n ) g −1 (e1 ) g −1 (e2 ) g −1 (en )
Observe that the points 0n , e1 , e2 , . . . , en in this lemma act simply as labels and
can be replaced with any other tuple of n + 1 distinct points. Indeed, this result
was originally stated in [52] for a different choice of points. A tensor version of
Lemma 3.7 is as follows.
(y1 , y2 , . . . , yθ ) 7→ E p
g −1 (y1 )×g −1 (y2 )×···×g −1 (yθ )
Proof. By linearity, it suffices to prove the lemma for factored polynomials of the
form p(x1 , x2 , . . . , xθ ) = p1 (x1 )p2 (x2 ) · · · pθ (xθ ), where p1 , p2 , . . . , pθ are real poly-
nomials on R6dlog(n+1)e . For such a polynomial p, the defining equation simplifies
to
n
Y
E p = E pi . (3.16)
g −1 (y1 )×g −1 (y2 )×···×g −1 (yθ ) g −1 (yi )
i=1
deg p
|{i : deg pi > dlog(n + 1)e + 1}| 6 ,
dlog(n + 1)e + 1
as was to be shown.
Theorem 3.9. Let n, θ > 1 be given integers. Set N = 6dlog(n + 1)eθ. Then there
is a surjection G : {0, 1}N → Nn |6θ such that:
(i) for every polynomial p : RN → R, the mapping v 7→ EG−1 (v) p is a polyno-
mial on Nn |6θ of degree at most (deg p)/dlog(n + 1) + 1e;
(ii) for every coordinate i = 1, 2, . . . , n, the mapping x 7→ OR∗θ (G(x)i ) is com-
putable by an explicitly given DNF formula with O(θn6 ) terms, each with
at most 6dlog(n + 1)e variables.
E p= E E p. (3.18)
G−1 (v) y∈{0n ,e1 ,e2 ,...,en }θ : g −1 (y1 )×g −1 (y2 )×···×g −1 (yθ )
y1 +y2 +···+yθ =v
Recall from Lemma 3.8 that the rightmost expectation in this equation is a polyno-
mial in y1 , y2 , . . . , yθ ∈ {0n , e1 , e2 , . . . , en } of degree at most (deg p)/dlog(n+1)+1e.
As a result, Corollary 2.13 implies that the right-hand side of (3.18) is a polynomial
in v of degree at most (deg p)/dlog(n + 1) + 1e.
(ii) Fix an index i. Then
θ
_
OR∗θ (G(x)i ) = I[g(xj ) = ei ].
j=1
Each of the disjuncts on the right-hand side is a function of 6dlog(n + 1)e Boolean
variables. Therefore, OR∗θ (G(x)i ) is representable by a DNF formula with O(θn6 )
terms, each with at most 6dlog(n + 1)e variables.
supp λ = {0, 1, 2, . . . , r0 }
ct+1 1
6 λ(t) 6 , t ∈ supp λ. (4.1)
(t + 1)2 2αt c(t + 1)2 2αt
Distributions in this family are subject to pointwise constraints, hence the symbol
B for “bounded.” Our bounding functions are motivated mainly by the metric
properties of the dual polynomial for ORn , constructed in Theorem 3.3.
In this notation, our analysis handles any distribution Λ ∈ B(r, c, α)⊗n . It would
be possible to generalize our work further, but the lower and upper bounds in (4.1)
are already exponentially far apart and capture a much larger class of probability
distributions than what we need for the applications to AC0 . The precise statement
of our result is as follows.
Theorem 4.1. Let Λ ∈ B(r, c, α)⊗n be given, for some integer r > 0 and reals
c > 0 and α > 0. Let d and θ be positive integers with
c ∈ (0, 1],
r > 1.
34 ALEXANDER A. SHERSTOV AND PEI WU
For every vector v ∈ Nn with kvk1 > θ, let S(v) ⊆ {1, 2, . . . , n} denote the
corresponding subset identified by Lemma 3.5. To restate the lemma’s guarantees,
θ
|S(v)| > , v ∈ (supp Λ)|>2θ , (4.7)
r
θ
min vi > , v ∈ (supp Λ)|>2θ , (4.8)
i∈S(v) 2|S(v)|(1 + ln n)
and in particular
and in particular
The expression on the right-hand side is well-formed because, to restate (4.11), each
string v|S(v) has weight greater than d and can therefore be used as a subscript in
ζv|S(v) . Specializing (4.15) and (4.16),
Property (4.12) ensures that ζv|S(v) (x|S(v) ) I[x|S(v) = v|S(v) ] 6= 0 only when x 6 v.
It follows that
[
supp ζ ⊆ {x ∈ Nn : x 6 v}
v∈supp Λ
= supp Λ, (4.20)
where the first, second, and fourth steps are valid by (4.12), (4.11), and (4.13),
respectively. Making this substitution in the defining equation for ζ,
X
ζ(x) = Λ(v)ζv|S(v) (x|S(v) )I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
v∈(supp Λ)|>2θ
X
+ Λ(v)I[x = v]. (4.21)
v∈(supp Λ)|>2θ
> d, (4.22)
where the first step uses Proposition 2.1(i), and the second step applies (4.18).
where the final implication uses (4.9). We conclude that the first summation
in (4.21) vanishes on Nn |>2θ , so that
Step 3: Light inputs. We now turn to inputs of weight less than 2θ, the most
technical part of the proof. Fix an arbitrary string x ∈ (supp Λ)|<2θ . Then
|ζ(x)| X Λ(v)
= ζv|S(v) (x|S(v) ) I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
Λ(x) v∈(supp Λ)|>2θ Λ(x)
X Λ(v)
6 |ζv| (x|S(v) )| I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
Λ(x) S(v)
v∈(supp Λ)|>2θ
X Λ(v)
6 2(nr)d I[kx|S(v) k1 6 d] I[x|S(v) = v|S(v) ]
Λ(x)
v∈(supp Λ)|>2θ
X X Λ(v)
= 2(nr)d I[kx|S k1 6 d] I[x|S = v|S ]
Λ(x)
S⊆{1,...,n}: v∈(supp Λ)|>2θ :
|S|>θ/r S(v)=S
X X Λ(v)
6 2(nr)d I[kx|S k1 6 d] I[x|S = v|S ],
n Λ(x)
S⊆{1,...,n}: P v∈N :
|S|>θ/r i∈S vi >θ,
θ
mini∈S vi > 2|S|(1+ln n)
(4.24)
where the first step uses (4.21); the second step applies the triangle inequality; the
third step is valid by (4.19); the fourth step amounts to collecting terms according
to S(v), which by (4.7) has cardinality at least θ/r; and the fifth step uses (4.8)
and (4.10). Nn
Bounding (4.24) requires a bit of work. To start with, write Λ = i=1 λi for
some λ1 , λ2 , . . . , λn ∈ B(r, c, α). Then for every nonempty set S ⊆ {1, 2, . . . , n},
where the first step applies the definition of B(r, c, α), and the third step uses the
bound 1 + t 6 et for real t. Continuing,
X Λ(v)
I[x|S = v|S ]
n Λ(x)
P v∈N :
i∈S vi >θ,
θ
mini∈S vi > 2|S|(1+ln n)
X Y λi (vi )
=
n λi (xi )
P v∈N : i∈S
i∈S vi >θ,
θ
mini∈S vi > 2|S|(1+ln n) ,
vi =xi for i∈S
/
X P Y 1
6 2−α i∈S vi
n c(vi + 1)2 λi (xi )
P v∈N : i∈S
i∈S vi >θ,
θ
mini∈S vi > 2|S|(1+ln n)
,
vi =xi for i∈S
/
X Y 1
6 2−αθ
c(vi + 1)2 λi (xi )
v∈Nn : i∈S
θ
mini∈S vi > 2|S|(1+ln n)
,
vi =xi for i∈S
/
|S|
∞
X 1 Y 1
= 2−αθ
c(t + 1)2 λi (xi )
t=d 2|S|(1+ln i∈S
n) e
θ
!|S|
Z ∞
−αθ dt Y 1
62 2
d 2|S|(1+ln
θ
n) e
ct λi (xi )
i∈S
|S| Y
2|S|(1 + ln n) 1
6 2−αθ , (4.26)
cθ λi (xi )
i∈S
Nn
where the first step uses Λ = i=1 λi , and the second step applies the definition of
B(r, c, α).
It remains to put together the bounds obtained so far. We have:
|S| Y
|ζ(x)| X
−αθ 2|S|(1 + ln n) 1
6 2(nr)d I[kx|S k1 6 d] · 2
Λ(x) cθ λi (xi )
S⊆{1,...,n}: i∈S
|S|>θ/r
|S| α 2 d
d
X
−αθ 2|S|(1 + ln n) 2 e
6 2(nr) 2 ·
c2 θ c
S⊆{1,...,n}:
|S|>θ/r
|S|
(e2 nr/c)d
X 2|S|(1 + ln n)
6 2 · αdθ/2e
2 c2 θ
S⊆{1,...,n}:
|S|>θ/r
38 ALEXANDER A. SHERSTOV AND PEI WU
∞ s
(e2 nr/c)d
X n 2s(1 + ln n)
= 2 · αdθ/2e
2 s c2 θ
s=dθ/re
∞ s
(e2 nr/c)d X
en 2s(1 + ln n)
6 2 · αdθ/2e ·
2 s c2 θ
s=dθ/re
2 d ∞
(e nr/c) X
62· 2−s
2αdθ/2e
s=dθ/re
2 d
(e nr/c)
=4· ,
2αdθ/2e+dθ/re
where the first step follows from (4.24) and (4.26); the second step substitutes the
bound from (4.25); the third step uses (4.2); and the next-to-last step uses (4.3).
In summary, we have shown that
(e2 nr/c)d
|ζ(x)| 6 4 · Λ(x), x ∈ (supp Λ)|<2θ . (4.27)
2αdθ/2e+dθ/re
Corollary 4.2. Let Λ ∈ conv(B(r, c, α, ∆)⊗n ) be given, for some integers r, ∆ > 0
and reals c > 0 and α > 0. Let d and θ be positive integers with
Proof. We first consider the special case when Λ ∈ B(r, c, α, ∆)⊗n . Then by def-
inition, Λ(t1 , . . . , tn ) = Λ0 (t1 − a1 , . . . , tn − an ) for some probability distribution
Λ0 ∈ B(r, c, α)⊗n and integers a1 , . . . , an ∈ [0, ∆]. Applying Theorem 4.1 to Λ0
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 39
The last property implies in particular that Λ̃0 is a nonnegative function. As a re-
sult, (4.34) and Proposition 2.4 guarantee that Λ̃0 is a probability distribution. Now
the sought properties (4.31) and (4.32) follow from (4.33) and (4.34), respectively,
for the probability distribution Λ̃(t1 , . . . , tn ) = Λ̃0 (t1 − a1 , . . . , tn − an ).
In the general case of a convex combination Λ = λ1 Λ1 + · · · + λk Λk of probability
distributions Λ1 , . . . , Λk ∈ B(r, c, α, ∆)⊗n , one uses the technique of the previous
paragraph to transform Λ1 , . . . , Λk individually into corresponding probability dis-
tributions Λ̃1 , . . . , Λ̃k , and takes Λ̃ = λ1 Λ̃1 + · · · + λk Λ̃k .
4.2. A bounded dual polynomial for MP. We now turn to the construction
of a gadget for our amplification theorem. Let B∗ (r, c, α) denote the family of
probability distributions λ on N such that
supp λ = {0, 1, 2, . . . , r0 }
c 1
6 λ(t) 6 , t ∈ supp λ.
(t + 1)2 2αt c(t + 1)2 2αt
Indeed, the containment holds trivially for c 6 1, and remains valid for c > 1
because the left-hand side and right-hand side are both empty in that case. As
before, it will be helpful to have shorthand notation for translates of distributions
in B(r, c, α): we define B∗ (r, c, α, ∆) for ∆ > 0 to be the family of probability
distributions λ on N such that λ(t) = λ0 (t − a) for some λ0 ∈ B∗ (r, c, α) and
a ∈ [0, ∆].
As our next step toward analyzing the threshold degree of AC0 , we will construct
a dual object that witnesses the high threshold degree of MP∗m,r and possesses
additional metric properties in the sense of B∗ . To simplify the exposition, we
start with an auxiliary construction.
Lemma 4.3. Let 0 < < 1 be given. Then for some constants c1 , c2 ∈ (0, 1) and all
integers R > r > 1, there are (explicitly given) probability distributions λ0 , λ1 , λ2
40 ALEXANDER A. SHERSTOV AND PEI WU
such that:
Our analysis of the threshold degree of AC0 only uses the special case R = r of
Lemma 4.3. The more general formulation with R > r will be needed much later,
in the analysis of the sign-rank of AC0 .
1 − 2
ψ(0) > , (4.41)
2
kψk1 = 1, (4.42)
√
orth ψ > c0 r, (4.43)
c0
1
|ψ(t)| ∈ √ , √ , t = 0, 1, . . . , R, (4.44)
(t + 1)2 2c00 t/ r c0 (t + 1)2 2c00 t/ r
1−δ 1 δ
ψ= µ0 − µ1 + µ2 (4.45)
2 2 2
for some 0 6 δ 6 1. In view of (4.41), we infer the more precise bound
06δ< . (4.46)
2
We define
λ0 = µ0 , (4.47)
1 − δ −δ
λ1 = µ1 + δ · µ2 , (4.48)
1 − δ2 1 − δ2
−δ 1 − δ
λ2 = µ1 + δ · µ2 . (4.49)
(1 − δ 2 ) (1 − δ 2 )
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 41
Recall from (4.45) that |ψ| = 12 µ1 + 2δ µ2 on {1, 2, . . . , R}. Comparing the coefficients
in |ψ| = 12 µ1 + 2δ µ2 with the corresponding coefficients in the defining equations for
λ1 and λ2 , where 0 6 δ 6 /2 by (4.46), we conclude that λ1 , λ2 ∈ [c000 |ψ|, |ψ|/c000 ]
on {1, 2, . . . , R} for some constant c000 = c000 () ∈ (0, 1). In view of (4.44), we arrive
at
c0 c000
1
λi (t) ∈ √ , √ ,
(t + 1)2 2c00 t/ r c0 c000 (t + 1)2 2c00 t/ r
i = 1, 2; t = 1, 2, . . . , R. (4.51)
Continuing,
1− 1−δ 1 δ
orth((1 − )λ0 + λ2 − λ1 ) = orth 2 · µ0 − µ1 + µ2
1−δ 2 2 2
1−
= orth 2 · ψ
1−δ
√
> c0 r, (4.52)
where the first step follows from the defining equations (4.47)–(4.49), the second
step uses (4.45), and the final step is a restatement of (4.43).
We are now in a position to verify the claimed properties of λ0 , λ1 , λ2 in the
theorem statement. Property (4.37) follows from (4.47), whereas property (4.38)
is immediate from (4.50) and (4.51). The remaining properties (4.39) and (4.40)
for c2 = c00 and a small enough constant c1 > 0 now follow from (4.51) and (4.52),
respectively.
We are now in a position to construct our desired dual polynomial for the Min-
sky–Papert function.
Theorem 4.4. For some absolute constants c1 , c2 ∈ (0, 1) and all positive integers
m and r, there are probability distributions Λ0 , Λ1 such that
⊗m !
∗ c2
Λi ∈ conv B r, c1 , √ , 1 , i = 0, 1, (4.53)
r
supp Λi ⊆ (MP∗m,r )−1 (i), i = 0, 1, (4.54)
√
orth(Λ1 − Λ0 ) > min{m, c1 r}. (4.55)
The last two properties in the theorem statement are equivalent, in the √ sense of
linear programming duality, to the lower bound deg± (MP∗m,r ) > min{m, c1 r} and
can be recovered in a black-box manner from many previous papers, e.g., [32, 42, 50].
The key additional property that we prove is (4.53), which is where the newly
established Lemma 4.3 plays an essential role.
42 ALEXANDER A. SHERSTOV AND PEI WU
Proof of Theorem 4.4. Take = 1/2 and R = r in Lemma 4.3, and let λ0 , λ1 , λ2 be
the resulting probability distributions. Let
Λ0 = E λ⊗S ⊗S
0 · λ2 ,
S⊆{1,2,...,m}
|S| odd
Λ1 = λ⊗m
1 .
Then (4.53) is immediate from (4.39), whereas (4.54) follows from (4.37) and (4.38).
To verify the remaining property (4.55), rewrite
X
Λ0 = 2−m+1 λ⊗S ⊗S
0 · λ2
S⊆{1,2,...,m}
|S| odd
⊗m ⊗m
1 1 1 1
= λ0 + λ2 − − λ0 + λ2 .
2 2 2 2
Observe that
orth(Λ1 − Λ0 )
⊗m ⊗m !
1 1 1 1
= orth λ⊗m
−
1 λ0 + λ2 + − λ0 + λ2
2 2 2 2
( ⊗m ! ⊗m !)
⊗m 1 1 1 1
> min orth λ1 − λ0 + λ2 , orth − λ0 + λ2
2 2 2 2
( ⊗m !)
1 1 1 1
> min orth λ1 − λ0 − λ2 , orth − λ0 + λ2
2 2 2 2
1 1 1 1
= min orth λ1 − λ0 − λ2 , m orth − λ0 + λ2
2 2 2 2
1 1
> min orth λ1 − λ0 − λ2 , m
2 2
√
> min{c1 r, m},
where the last five steps are valid by Proposition 2.1(i), Proposition 2.1(iii), Propo-
sition 2.1(ii), equation (4.56), and equation (4.40), respectively.
Other notable cases include -approximate degree and one-sided -approximate de-
gree, given by
Lemma 4.5. Let c1 , c2 > 0 be the absolute constants from Theorem 4.4. Let
n, m, r, d, θ be positive integers such that
Then for each z ∈ {0, 1}n , there is a probability distribution Λ̃z on Nnm such that:
Qn
(i) the support of Λ̃z is contained in ( i=1 (MP∗m,r )−1 (zi ))|<2θ+nm ;
(ii) for every polynomial p : Rnm → R of degree at most d, the mapping z 7→
1 √
EΛ̃z p is a polynomial on {0, 1}n of degree at most min{m,c 1 r}
· deg p.
Nn
As a result, the probability distributions Λz = i=1 Λzi for z ∈ {0, 1}n obey
⊗m !!⊗n
c 2
Λz ∈ conv B∗ r, c1 , √ , 1
r
⊗nm !
∗ c2
⊆ conv B r, c1 , √ , 1
r
⊗nm !
c2
⊆ conv B r, c1 , √ , 1 , (4.66)
r
where the last two steps are valid by (2.2) and (4.36), respectively. By (4.60)–(4.62),
(4.66), and Corollary 4.2, there are probability distributions Λ̃z for z ∈ {0, 1}n such
that
We proceed to verify the properties required of Λ̃z . For (i), it follows from (4.64)
Qn
and (4.67) that each Λ̃z has support contained in ( i=1 (MP∗m,r )−1 (zi ))|<2θ+nm .
For (ii), let p be any polynomial of degree at most d. Then (4.68) forces EΛ̃z p =
EΛz p, where the right-hand side is by (4.65) and Proposition 2.2 a polynomial in
√
z ∈ {0, 1}n of degree at most deg p/ orth(Λ1 − Λ0 ) 6 deg p/ min{m, c1 r}.
for all positive integers n, m, θ, all functions f : {0, 1}n → {0, 1, ∗}, and all nonempty
convex sets I0 , I1 , I∗ ⊆ R.
n0 , m0 , r0 , d0 , θ0 , where
n0 = n,
m0 = m,
r0 = m2 ,
θ − nm
θ0 = ,
2
cθ
d0 = .
m log(n + m)
We thus obtain, for each z ∈ {0, 1}n , a probability distribution Λ̃z on Nnm such
that:
Qn
(i) the support of Λ̃z is contained in ( i=1 (MP∗m )−1 (zi ))|6θ ;
(ii) for every polynomial p : R nm
→ R of degree at most d0 , the mapping z 7→
1
EΛ̃z p is a polynomial on {0, 1}n of degree at most cm · deg p.
Since p was chosen arbitrarily from among (I0 , I1 , I∗ )-approximants of (f ◦MP∗m )|6θ
that have degree at most d0 , we conclude that
degI0 ,I1 ,I∗ ((f ◦ MP∗m )|6θ ) > min{cm degI0 ,I1 ,I∗ (f ), d0 + 1}
cθ
> min cm degI0 ,I1 ,I∗ (f ), .
m log(n + m)
The previous composition theorem has the following analogue for Boolean inputs.
Theorem 4.7. Let 0 < c < 1 be the absolute constant from Theorem 4.6. Let
n, m, N be positive integers. Then there is an (explicitly given) transformation
H : {0, 1}N → {0, 1}n , computable by an AND-OR-AND circuit of size (N nm)O(1)
with bottom fan-in O(log(nm)), such that for all functions f : {0, 1}n → {0, 1, ∗}
and all nonempty convex sets I0 , I1 , I∗ ⊆ R,
cN
degI0 ,I1 ,I∗ (f ◦ H) > min cm degI0 ,I1 ,I∗ (f ), − n log(n + m),
50m log2 (n + m)
cN
degI0 ,I1 ,I∗ (f ◦ ¬H) > min cm degI0 ,I1 ,I∗ (f ), − n log(n + m).
50m log2 (n + m)
For the function H with multibit output, the notation ¬H above refers to the
function obtained by negating each of H’s outputs.
46 ALEXANDER A. SHERSTOV AND PEI WU
Proof of Theorem 4.7. As in the previous proof, settling the first lower bound for
all f will automatically settle the second lower bound, due to the invariance of
(I0 , I1 , I∗ )-approximate degree under negation of the input bits. In what follows,
we focus on f ◦ H.
We may assume that N > 50mn log2 (n + m) since otherwise the lower bounds
in the theorem statement are nonpositive and hence trivially true. Define
N
θ= .
50 log(n + m)
Theorem 3.9 gives a surjection G : {0, 1}6θdlog(nm+1)e → Nnm |6θ with the following
two properties:
(i) for every coordinate i = 1, 2, . . . , nm, the mapping x 7→ OR∗θ (G(x)i ) is
computable by an explicit DNF formula of size (nmθ)O(1) = N O(1) with
bottom fan-in O(log(nm));
(ii) for any polynomial p, the map v 7→ EG−1 (v) p is a polynomial on Nnm |6θ
of degree at most (deg p)/dlog(nm + 1) + 1e 6 (deg p)/ log(n + m).
Consider the composition F = (f ◦ MP∗m,θ ) ◦ G. Then
F = (f ◦ (ANDm ◦ OR∗θ )) ◦ G
= f ◦ ((ANDm ◦ OR∗θ , . . . , ANDm ◦ OR∗θ ) ◦ G),
| {z }
n
For this, fix an (I0 , I1 , I∗ )-approximant p for F of degree degI0 ,I1 ,I∗ (F ). Consider
the polynomial p∗ : Nnm |6θ → R given by p∗ (v) = EG−1 (v) p. Since I0 , I1 , I∗ are
convex and p is an (I0 , I1 , I∗ )-approximant for F = (f ◦ MP∗m,θ ) ◦ G, it follows that
p∗ is an (I0 , I1 , I∗ )-approximant for (f ◦ MP∗m,θ )|6θ . Therefore,
where the second step is valid because (f ◦ MP∗m,θ )|6θ either contains the function
(f ◦ MP∗m )|6θ = (f ◦ MP∗m,m2 )|6θ as a subfunction (case θ > m2 ), or is equal to it
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 47
(case θ 6 m2 ); and the third step applies Theorem 4.6. However, property (ii) of
G states that
deg p
deg p∗ 6
log(n + m)
degI0 ,I1 ,I∗ (F )
= .
log(n + m)
Comparing these lower and upper bounds on the degree of p∗ settles (4.69).
At last, we illustrate the use of the previous two composition results to amplify
hardness for polynomial approximation.
for some real number k > 1. Suppose further that f is computable by a Boolean
circuit of size s and depth d, where d > 1. Then there is a function F : {0, 1}N →
1
{0, 1} on N = Θ(n1+ k log2 n) variables with
1
!
N 1− k+1
degI0 ,I1 ,I∗ (F ) > Ω 2 .
log1− k+1 N
Proof. Take
m = dn1/k e,
100 2
N= mn log (n + m) ,
c
where 0 < c < 1 is the absolute constant from Theorem 4.6. Then Theorem 4.7
gives an explicit transformation H : {0, 1}N → {0, 1}n , computable by an AND-
OR-AND circuit of size nO(1) with bottom fan-in O(log n), such that
Now, fix a circuit for f of size s and depth d > 1. Composing the circuits for f and
H results in circuits for f ◦ H and f ◦ ¬H of size s + nO(1) , bottom fan-in O(log n),
and depth at most d + 3. Thus, F can be taken to be either of f ◦ H and f ◦ ¬H.
When the circuit for f is monotone, the depth of F can be reduced to d + 2 as
follows. After merging like gates if necessary, the circuit for f can be viewed as
composed of d layers of alternating gates (∧ and ∨). The bottom layer of f can
therefore be merged with the top layer of either H or ¬H, resulting in a circuit of
depth at most (d + 3) − 1 = 2.
We emphasize that in view of (4.57), the symbol degI0 ,I1 ,I∗ in Theorems 4.6–4.8
can be replaced with the threshold degree symbol deg± . The same goes for any
other special case of (I0 , I1 , I∗ )-approximate degree.
4.4. Threshold degree and discrepancy of AC0 . We have reached our main
result on the sign-representation of constant-depth circuits. For any > 0, the next
theorem constructs a circuit family in AC0 with threshold degree Ω(n1− ). The
proof amounts to a recursive application of the hardness amplification procedure of
Section 4.3.
Theorem 4.9. Let k > 1 be a fixed integer. Then there is an (explicitly given)
family of functions {fk,n }∞ n
n=1 , where fk,n : {0, 1} → {0, 1} has threshold degree
k−1 1 k−2 k−2
deg± (fk,n ) = Ω n k+1 · (log n)− k+1 d 2 eb 2 c (4.70)
f1,n (x) = x1 , n = 1, 2, 3, . . . ,
f2,n (x) = MPbn1/3 c , n = 1, 2, 3, . . . .
For the former, the threshold degree lower bound (4.70) is trivial. For the latter, it
follows from Theorem 2.5.
For the inductive step, fix k > 3. Due to the asymptotic nature of (4.70), it is
enough to construct the functions in {fk,n }∞
n=1 for n larger than a certain constant
of our choosing. As a starting point, the inductive hypothesis gives an explicit
family {fk−2,n }∞ n
n=1 in which fk−2,n : {0, 1} → {0, 1} has threshold degree
k−3 1 k−4 k−4
deg± (fk−2,n ) = Ω n k−1 · (log n)− k−1 d 2 eb 2 c (4.71)
Now, let c > 0 be the absolute constant from Theorem 4.6. For every N larger
than a certain constant, we apply Theorem 4.7 with
l k−1 1 k−4 k−4 2(k−1) c m
n = N k+1 (log N )− k+1 d 2 eb 2 c− k+1 · , (4.72)
l 2 m 100
1 k−4 k−4 4
m = N k+1 (log N ) k+1 d 2 eb 2 c− k+1 , (4.73)
f = fk−2,n , (4.74)
I0 = (0, ∞), (4.75)
I1 = (−∞, 0), (4.76)
I∗ = (−∞, ∞) (4.77)
to obtain a function HN : {0, 1}N → {0, 1}n such that the composition FN =
fk−2,n ◦ HN has threshold degree
cN
deg± (FN ) > min cm deg± (fk−2,n ), − n log(n + m)
50m log2 (n + m)
k−1 1 k−4 k−4 k−3
= Θ N k+1 (log N )− k+1 d 2 eb 2 c− k+1
k−1 1 k−2 k−2
= Θ N k+1 (log N )− k+1 d 2 eb 2 c , (4.78)
where the second step uses (4.71)–(4.73). Moreover, Theorem 4.7 ensures that HN
is computable by an AND-OR-AND circuit of polynomial size and bottom fan-
in O(log N ). The bottom layer of fk−2,n consists of AND gates, which can be
merged with the top layer of HN to produce a circuit for FN = fk−2,n ◦ HN of
depth (k − 2) + 3 − 1 = k.
We have thus constructed, for some constant N0 , a family of functions {FN }∞
N =N0
in which each FN : {0, 1}N → {0, 1} has threshold degree (4.78) and is computable
by a Boolean circuit of polynomial size, depth k, and bottom fan-in O(log N ). Now,
take the circuit for FN and replace the negated inputs in it with N new, unnegated
inputs. The resulting monotone circuit on 2N variables clearly has threshold degree
at least that of FN . This completes the inductive step.
Theorem 4.9 settles Theorem 1.1 from the introduction. Using the pattern matrix
method, we now “lift” this result to communication complexity.
where m = 2dc2`(n) `(n)e2 . Then the discrepancy bound (4.79) is trivial for n 6 2m,
and follows from (4.81) and Theorem 2.17 for n > 2m. The lower bound (4.80) on
the communication complexity of Fn with weakly unbounded error is now immedi-
ate by the discrepancy method (Corollary 2.15).
It remains to examine the circuit complexity of Fn . Since fn is computable by
a monotone circuit of size nO(1) and depth k, with the bottom layer composed
of AND gates of fan-in O(log n), it follows that Fn is computable by a circuit
of size nO(1) and depth at most k + 1 in which the bottom two levels have fan-
in O(log n) · m = O(4`(n) `(n)2 log n) and `(n), in that order. This means that
for `(n) = O(1), the bottom three levels of Fn can be computed by a circuit of
polynomial size, depth 2, and bottom fan-in O(log n), which in turn gives a circuit
for Fn of polynomial size, depth (k +1)−3+2 = k, and bottom fan-in O(log n).
Taking `(n) = 2 in Theorem 4.10 settles Theorem 1.4 from the introduction.
4.5. Threshold degree of surjectivity. We close this section with another ap-
plication of our amplification theorem, in which we take the outer function f to be
the identity map f : {0, 1} → {0, 1} on a single bit.
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 51
Proof. Let f : {0, 1} → {0, 1} be the identity function, so that deg± (f ) = 1. Invok-
ing Theorem 4.6 with n = 1 and θ = bm2 log mc, one obtains the claimed lower
bound.
Theorem 4.11 has a useful interpretation. For positive integers n and r, the surjec-
tivity problem is the problem of determining whether a given mapping {1, 2, . . . , n} →
{1, 2, . . . , r} is surjective. This problem is trivial for r > n, and the standard regime
studied in previous work is r 6 cn for some constant 0 < c < 1. The input to the
surjectivity problem is represented by a Boolean matrix x ∈ {0, 1}r×n with pre-
cisely one nonzero entry in every column. More formally, let e1 , e2 , . . . , er be the
standard basis for Rr . The surjectivity function SURJn,r : {e1 , e2 , . . . , er }n → {0, 1}
is given by
r _
^ n
SURJn,r (x1 , x2 , . . . , xn ) = xi,j .
j=1 i=1
The surjectivity problem has seen much work recently [8, 54, 11, 17]. In par-
ticular, Bun and Thaler [17] have obtained an essentially tight lower bound of
√
Ω̃(min{r, n}) on the threshold degree of SURJn,r in the standard regime r 6
(1 − Ω(1))n. As a corollary to Theorem 4.11, we give a new proof of Bun and
Thaler’s result, sharpening their bound by a polylogarithmic factor.
Proof. Define
r
0 n−r
r = min r − 1, . (4.84)
1 + log(n − r)
52 ALEXANDER A. SHERSTOV AND PEI WU
We may assume that r0 > 1 since (4.83) holds trivially otherwise. The identity
0
holds for all (x1 , x2 , . . . , xr0 ) ∈ Nr |6n−(r−r0 ) , whence
Now
where the four steps use (4.82), (4.85), (4.84), and Theorem 4.11, respectively.
5.1. A simple lower bound for depth 3. We start by presenting a new proof
of Razborov and Sherstov’s exponential lower bound [39] on the sign-rank of AC0 .
More precisely, we prove the following stronger result that was not known before.
Theorem 5.1. There is a constant 0 < c < 1 such that for all positive integers m
and r,
√
deg± (MPm,r , 12−m−1 ) > min{m, c r}.
Theorem 5.1 is asymptotically optimal, and it is the first lower bound on the smooth
threshold degree of the Minsky–Papert function. As we will discuss shortly, this
theorem implies an exp(Ω(n1/3 )) lower bound on the sign-rank of AC0 . In addition,
we will use Theorem 5.1 as the base case in the inductive proof of Theorem 1.3.
Proof of Theorem 5.1. It is well-known [33, 36, 57] that for some constant c > 0
and all r,√any real polynomial p : {0, 1}r → R with kp − NORr k∞ 6 0.49 has degree
at least c r. By linear programming duality [50, Theorem 2.5], this approximation-
theoretic fact is equivalent to the existence of a function ψ : {0, 1}r → R with
The rest of the proof is a reprise of Section 4.2. To begin with, property (5.3)
makes it possible to view |ψ| as a probability distribution on {0, 1}r . Let µ0 , µ1 , µ2
be the probability distributions induced by |ψ| on the sets {0r }, {x 6= 0r : ψ(x) < 0},
and {x 6= 0r : ψ(x) > 0}, respectively. It is clear from (5.2) that the negative part
of ψ is a multiple of µ1 , whereas the positive part of ψ is a nonnegative linear
combination of µ0 and µ2 . Moreover, it follows from hψ, 1i = 0 and kψk1 = 1 that
the positive and negative parts of ψ both have `1 -norm 1/2. Summarizing,
1−δ 1 δ
ψ= µ0 − µ1 + µ2 (5.5)
2 2 2
54 ALEXANDER A. SHERSTOV AND PEI WU
1
06δ< . (5.6)
50
Let υ be the uniform probability distribution on {0, 1}r \ {0r }. We define
λ0 = µ0 , (5.7)
2 2
λ1 = µ1 + 1 − υ, (5.8)
3(1 − δ) 3(1 − δ)
2δ 2δ
λ2 = µ2 + 1 − υ. (5.9)
1−δ 1−δ
whereas
1
λi > υ, i = 1, 2. (5.12)
4
The defining equations (5.7)–(5.9) further imply that
2 1 4
λ0 + λ2 − λ1 = ψ,
3 3 3(1 − δ)
√
2 1
orth λ0 + λ2 − λ1 > c r. (5.13)
3 3
Multiplying out the tensor products in the definition of Λ and collecting like terms,
we obtain
1 X 2|S| − (−1)|S| ⊗S ⊗S 1 ⊗m
Λ= λ0 · λ2 + λ1 (5.14)
2 3m 2
S⊆{1,2,...,m}
S6=∅
1 X 2|S| ⊗S ⊗S 1 ⊗m
> λ · λ2 + λ1
4 3m 0 2
S⊆{1,2,...,m}
S6=∅
⊗S ⊗m
2|S| ⊗S
1 X 1 1 1
> λ · υ + υ
4 3m 0 4 2 4
S⊆{1,2,...,m}
S6=∅
⊗S
1 X 2|S| ⊗S 1
> λ · υ
4 3m 0 4
S⊆{1,2,...,m}
⊗m
1 2 1 1
= λ0 + · υ
4 3 3 4
m
1 1
> 1({0,1}r )m , (5.15)
4 12 · 2r
1 X 2|S| − (−1)|S| ⊗S ⊗S
Λ · (−1)MPm,r = λ0 · λ2 · (−1)MPm,r
2 3m
S⊆{1,2,...,m}
S6=∅
1
+ λ⊗m · (−1)MPm,r
2 1
1 X 2|S| − (−1)|S| ⊗S 1 ⊗m
= λ0 · λ⊗S
2 − λ1
2 3m 2
S⊆{1,2,...,m}
S6=∅
⊗m ⊗m
1 2 1 1 1 1 1
= λ0 + λ2 − − λ0 + λ2 − λ⊗m ,
2 3 3 2 3 3 2 1
56 ALEXANDER A. SHERSTOV AND PEI WU
where the first step uses (5.14); the second step uses (5.10) and (5.11); and the
final equality can be verified by multiplying out the tensor powers and collecting
like terms. Now
orth(Λ · (−1)MPm,r )
( ⊗m !
2 1 1 1 ⊗m
= min orth λ0 + λ2 − λ1 ,
3 2 3 2
⊗m !)
1 1 1
orth − − λ0 + λ2
2 3 3
2 1 1 1
> min orth λ0 + λ2 − λ1 , m orth − λ0 + λ2
3 3 3 3
√
1 1
> min c r, m orth − λ0 + λ2
3 3
√
> min{c r, m},
where the first step applies Proposition 2.1(i); the second step applies Proposi-
tion 2.1(ii), (iii); the third step substitutes the lower bound from (5.13); and the
last step uses h−λ0 + λ2 , 1i = −hλ0 , 1i + hλ2 , 1i = −1 + 1 = 0. Combining this
conclusion with (5.15) and (5.16) completes the proof.
Theorem 5.2 (Razborov and Sherstov). Define Fn : {0, 1}n × {0, 1}n → {0, 1} by
Then
1/3
rk± (Fn ) > 2Ω(n )
.
for some absolute constants c0 , c00 > 0 and all n. This lower bound along with
Theorem 2.18 implies that the composition
has sign-rank rk± (Hn ) = exp(Ω(n1/3 )). This completes the proof because for some
integer constant c > 1, each Hn is a subfunction of Fcn .
5.2. Local smoothness. The remainder of this paper focuses on our exp(Ω(n1− ))
lower bound on the sign-rank of AC0 , whose proof is unrelated to the work in Sec-
tion 4 and Section 5.1. Central to our approach is an analytic notion that we call
local smoothness. Formally, let Φ : Nn → R be a function of interest. For a subset
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 57
Put another way, for any two points of X at distance d, the corresponding values of
Φ differ in magnitude by a factor of at most K d . For any set X, we let Smooth(K, X)
denote the family of functions that are K-smooth on X. The following proposition
collects basic properties of local smoothness, to which we refer as the restriction
property, scaling property, tensor property, and conical property.
Proof. Properties (i) and (ii) are immediate from the definition of K-smoothness.
For (iii), fix Φ ∈ Smooth(K, X) and Ψ ∈ Smooth(K, Y ). Then for all (x, y), (x0 , y 0 ) ∈
X × Y, we have
0 0
|Φ(x)Ψ(y)| 6 K |x−x | |Φ(x0 )| K |y−y | |Ψ(y 0 )|
0 0
= K |(x,y)−(x ,y )| |Φ(x0 )Ψ(y 0 )|,
where the first step uses the K-smoothness of Φ and Ψ. Finally, for (iv), let a and
b be nonnegative reals. Then
for all x, x0 ∈ X, where the second step uses the K-smoothness of Φ and Ψ.
We will take a special interest in locally smooth functions that are probability
distributions. For our purposes, it will be sufficient to consider locally smooth
distributions whose support is the Cartesian product of integer intervals. For an
integer n > 1 and a real number K > 1, we let S(n, K) denote the set of probability
distributions Λ such that:
Qn
(i) Λ is supported on i=1 {0, 1, 2, . . . , ri }, for some r1 , r2 , . . . , rn ∈ N;
(ii) Λ is K-smooth on its support.
Analogous to the development in Section 4, we will need a notation for translates of
distributions in S(n, K). For ∆ > 0, we let S(n, K, ∆) denote the set of probability
distributions Λ ∈ D(Nn ) such that Λ(t1 , . . . , tn ) ≡ Λ0 (t1 − a1 , . . . , tn − an ) for some
fixed Λ0 ∈ S(n, K) and a ∈ Nn |6∆ . As a special case, S(n, K, 0) = S(n, K).
Specializing Proposition 5.3(iii) to this context, we obtain:
Proof. The only nontrivial property to verify is K-smoothness, which follows from
Proposition 5.3(iii).
x0 ∈ X|6θ−d ,
x0 6 x,
|x0 − x| 6 d.
Λ(x) 6 K d Λ(x0 ).
for every u ∈ X.
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 59
Proof. Fix u ∈ X for the rest of the proof. If |u| > θ + d, then X|6θ \ Bd (u) = X|6θ
and the statement holds trivially. In what follows, we treat the complementary case
|u| 6 θ + d. Here, the key is to find a vector u0 with
|u − u0 | = d + 1, (5.18)
0
u ∈ X|6θ . (5.19)
(i) If |u| > d, decrease one or more of the components of u as needed to obtain
a vector u0 whose components are nonnegative integers that sum to exactly
|u| − d − 1. Then (5.18) is immediate, whereas (5.19) follows in view of
|u| 6 θ + d. Qn
(ii) If |u| 6 d, the analysis is more subtle. P Recall thatPu ∈ i=1 {0, 1, 2, . . . , ri }
and therefore |(r1 , . . . , rn ) − u| = ri − |u| > ri − d > d, where the
last step uses (5.17). As a result, by increasing the components of u as
n
necessary, one can obtain a vector u0 ∈ i=1 {0, 1, 2, . . . , ri } with |u0 | =
Q
|u| + d + 1. Then property (5.18) is immediate. Property (5.19) follows
from |u0 | = |u| + d + 1 6 2d + 1 < θ + 1, where the last step uses (5.17).
Now that u0 has been constructed, apply the K-smoothness of Λ to conclude that
for every x ∈ X|6θ ∩ Bd (u),
0
Λ(x) 6 K |x−u | Λ(u0 )
0
6 K |x−u|+|u−u | Λ(u0 )
6 K 2d+1 Λ(u0 ), (5.20)
where the first inequality is the result of summing (5.20) over x ∈ X|6θ ∩ Bd (u);
the third step uses (5.18) and (5.19); and the last step applies (5.1). To complete
the proof, add Λ(X|6θ \ Bd (u)) to both sides of (5.21).
Lemma 5.7. Fix points u, v ∈ Nn and a natural number d < |u − v|. Then there is
a function ζu,v : cube(u, v) → R such that
Then (5.22) and (5.23) are immediate from (5.26) and (5.27), respectively. Prop-
erty (5.24) can be verified as follows:
X
kζu,v k1 = |ζu∗ (|x1 − v1 |, |x2 − v2 |, . . . , |xn − vn |)|
x∈cube(u,v)
X
= |ζu∗ (w)|
w∈Nn :
w6u∗
∗
|u |d
61+2 ,
d
where the last step uses (5.28). For (5.25), fix an arbitrary polynomial p of degree
at most d. Then at every point x ∈ cube(u, v), we have
where the second, fourth, and fifth steps are valid by (5.30), (5.26), and (5.29),
respectively.
Our next result is a smooth analogue of Lemma 5.7. The smoothness offers a great
deal of flexibility when using the lemma to transfer `1 mass from one region of Nn
to another.
Qn
Lemma 5.8. Let X = i=1 {0, 1, 2, . . . , ri }, where each ri > 0 is an integer. Let θ
and d be nonnegative integers with
( n )
1 X
d < min θ, ri .
3 i=1
Proof. We have
1 = Λ(X|6θ )
n+d
6 Kd Λ(X|6θ−d )
d
2
d+1 3d+1 n + d
62 K Λ(X|6θ−d \ Bd (u)), (5.36)
d
where the last two steps apply Propositions 5.5 and 5.6, respectively.
62 ALEXANDER A. SHERSTOV AND PEI WU
where the first step uses v ∈ X|6θ , and the second step is legitimate because Λ is a
K-smooth probability distribution on X|6θ and therefore Λ 6= 0 at every point of
X|6θ . Combining (5.39) and (5.42),
diam({u} ∪ supp Λ)
kζu,v k∞ 6 2d . (5.43)
d
We define Zu : X → R by
1 X
Zu (x) = Λ(v) ζu,v (x),
Λ(X|6θ−d \ Bd (u))
v∈X|6θ−d \Bd (u)
which is legitimate since Λ(X|6θ−d \ Bd (u)) > 0 by (5.36). Then properties (5.31),
(5.32), (5.33), and (5.34) for Zu are immediate from the corresponding proper-
ties (5.38), (5.39), (5.40), and (5.42) of ζu,v .
It remains to verify (5.35). Fix x 6= u. If x ∈ / X|6θ , then (5.38) implies that
Zu (x) = 0 and therefore (5.35) holds in that case. In the complementary case when
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 63
x ∈ X|6θ , we have
X Λ(v)
|Zu (x)| 6 · |ζu,v (x)|
Λ(X|6θ−d \ Bd (u))
v∈X|6θ−d \Bd (u)
X Λ(v)
= · |ζu,v (x)|
Λ(X|6θ−d \ Bd (u))
v∈X|6θ−d \Bd (u):
|v−x|6d
K d Λ(x)
X diam({u} ∪ supp Λ)
6 · 2d
Λ(X|6θ−d \ Bd (u)) d
v∈X|6θ−d \Bd (u):
|v−x|6d
K d Λ(x)
n+d diam({u} ∪ supp Λ)
6 2d · · 2d ,
d Λ(X|6θ−d \ Bd (u)) d
where the first step applies the triangle inequality to the definition of Zu ; the
second step uses (5.37) and x 6= u; the third step applies the K-smoothness of
Λ and substitutes the bound from (5.43); and the final step uses (5.1). In view
of (5.36), this completes the proof of (5.35).
We now show how to efficiently zero out a locally smooth function on points of
large Hamming weight. The modified function is pointwise close to the original and
cannot be distinguished from it by any low-degree polynomial.
Qn
Lemma 5.9. Define X = i=1 {0, 1, 2, . . . , ri }, where each ri > 0 is an integer. Let
θ and d be nonnegative integers with
θ
d< . (5.44)
3
Let Φ : X → R be a function that is K-smooth on X|6θ , with Φ|6θ 6≡ 0. Then there
is Φ̃ : X → R such that
u ∈ X a function Zu : X → R with
Zu (u) = 1, (5.48)
3
n+d diam({u} ∪ supp Λ) |Φ(x)|
|Zu (x)| 6 23d+1 K 4d+1
d d kΦ|6θ k1
for x 6= u, (5.49)
orth Zu > d, (5.50)
supp Zu ⊆ X|6θ ∪ {u}. (5.51)
Now define
X
Φ̃ = Φ − Φ(u)Zu .
u∈X|>θ
Then (5.45) is immediate from (5.50). To verify (5.46), fix any point x ∈ X|>θ .
Then
X
Φ̃(x) = Φ(x) − Φ(u)Zu (x)
u∈X|>θ
where the last two steps use (5.51) and (5.48), respectively.
It remains to verify (5.47) on X|6θ :
X
|Φ − Φ̃| 6 |Φ(u)| |Zu |
u∈X|>θ :
Φ(u)6=0
3
3d+1 4d+1 n+d diam(supp Φ) X |Φ|
62 K |Φ(u)| ·
d d kΦ|6θ k1
u∈X|>θ :
Φ(u)6=0
3
n+d diam(supp Φ) kΦ|>θ k1
= 23d+1 K 4d+1 · |Φ|,
d d kΦ|6θ k1
For
Qntechnical reasons, we need a generalization of the previous lemma to functions
on i=1 {∆i , ∆i + 1, . . . , ∆i + ri } for nonnegative integers ∆i and ri , and further
to convex combinations of such functions. We obtain these generalizations in the
two corollaries that follow.
Qn
Corollary 5.10. Define X = i=1 {∆i , ∆i + 1, . . . , ∆i + ri }, where all ∆i and ri
are nonnegative integers. Let θ and d be nonnegative integers with
n
!
1 X
d< θ− ∆i .
3 i=1
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 65
θ0
d< . (5.55)
3
Consider the function Φ0 : X 0 → R given by Φ0 (x) = Φ(x + (∆1 , ∆2 . . . , ∆n )). Then
any two points u, v ∈ X 0 |6θ0 obey
and
3
diam(supp Φ0 ) kΦ0 |>θ0 k1
n+d
|Φ0 − Φ̃0 | 6 23d+1 K 4d+1 · |Φ0 |
d d kΦ0 |6θ0 k1
3
3d+1 4d+1 n + d diam(supp Φ) kΦ|>θ k1
=2 K · |Φ0 |
d d kΦ|6θ k1
Corollary 5.11. Fix integers ∆, d, θ > 0 and n > 1, and a real number δ, where
δ ∈ [0, 1),
1
d < (θ − ∆).
3
66 ALEXANDER A. SHERSTOV AND PEI WU
N
X
Λ= λi Λi
i=1
P
for some positive reals λ1 , . . . , λN with λi = 1, where Λi ∈ S(n, K, ∆) and
Λi (Nn |>θ ) 6 δ. Then clearly
n
[
supp Λ = supp Λi . (5.56)
i=1
where the last property follows from the two before it. In view of (5.56)–(5.60), the
PN
proof is complete by taking Λ̃ = i=1 λi Λ̃i .
Our next result uses local smoothness to achieve something completely different.
Here, we show how to start with a locally smooth function and make it globally
min-smooth. The new function has the same sign pointwise as the original, and
cannot be distinguished from it by any low-degree polynomial. Crucially for us, the
global min-smoothness can be achieved relative to any distribution on the domain.
Qn
Lemma 5.12. Define X = i=1 {0, 1, 2, . . . , ri }, where each ri > 0 is an integer.
Let θ and d be nonnegative integers with
( n )
1 X
d < min θ, ri .
3 i=1
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 67
Let Φ : X|6θ → R be a function that is K-smooth on X|6θ . Then for every proba-
bility distribution Λ∗ on X|6θ , there is Φ∗ : X|6θ → R such that
Zu (u) = 1, (5.65)
N
kZu k1 6 + 1, (5.66)
2
|Φ(x)|
|Zu (x)| 6 N · , x 6= u, (5.67)
kΦk1
orth Zu > d. (5.68)
kΦk1 X
Φ∗ = Φ + gn(Φ(u))Λ∗ (u)Zu .
sg
N
u∈X|6θ
kΦk1 X
kΦ∗ k1 6 kΦk1 + Λ∗ (u) kZu k1
N
u∈X|6θ
X
kΦk1 N
6 kΦk1 + · +1 Λ∗ (u)
N 2
u∈X|6θ
3N + 2
= kΦk1
2N
6 2 kΦk1 , (5.69)
68 ALEXANDER A. SHERSTOV AND PEI WU
where the second step uses (5.66). The remaining properties (5.63) and (5.64) can
be established simultaneously as follows: for every x ∈ X|6θ ,
gn(Φ(x)) · Φ∗ (x)
sg
kΦk1 X
= |Φ(x)| + Λ∗ (u)Zu (x)
N
u∈X|6θ
kΦk1 ∗ kΦk1 X
> |Φ(x)| + Λ (x)Zx (x) − Λ∗ (u) |Zu (x)|
N N
u∈X|6θ :
u6=x
kΦk1 ∗ kΦk1 X
= |Φ(x)| + Λ (x) − Λ∗ (u) |Zu (x)|
N N
u∈X|6θ :
u6=x
kΦk1 ∗ kΦk1 |Φ(x)| X
> |Φ(x)| + Λ (x) − ·N · Λ∗ (u)
N N kΦk1
u∈X|6θ :
u6=x
kΦk1 ∗
= |Φ(x)| + Λ (x) − |Φ(x)| (1 − Λ∗ (x))
N
kΦk1 ∗
> Λ (x), (5.70)
N
where the third and fourth steps use (5.65) and (5.67), respectively.
5.5. A locally smooth dual polynomial for MP. As Sections 5.2–5.4 show,
local smoothness implies several useful metric and analytic properties. To tap into
this resource, we now construct a locally smooth dual polynomial for the Min-
sky–Papert function. It is helpful to view this new result as a counterpart of
Theorem 4.4 from our analysis of the threshold degree of AC0 . The new proof
is considerably more technical because local smoothness is a delicate property to
achieve.
Theorem 5.13. For some absolute constant 0 < c < 1 and all positive integers
m, r, R with r 6 R, there are probability distributions Λ0 and Λ1 such that
Our proof of Theorem 5.13 repeatedly employs the following simple but useful
criterion for K-smoothness: a probability distribution λ is K-smooth on an integer
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 69
interval I = {i, i+1, i+2, . . . , j} if and only if the probabilities of any two consecutive
integers in I are within a factor of K.
Proof of Theorem 5.13. Abbreviate = 1/6. For some absolute constants c0 , c00 ∈
(0, 1), Lemma 4.3 constructs probability distributions λ0 , λ1 , λ2 such that
We infer that
for some large constant K = K(c0 , c00 ) > 1. Indeed, (5.80) is trivial since λ0 is the
single-point distribution on the origin; (5.81) holds because by (5.77) and (5.78), the
probabilities of any pair of consecutive integers in supp λ1 = {1, 2, . . . , R} are the
same up to a constant factor; and (5.82)–(5.84) can be seen analogously, by compar-
ing the probabilities of any pair of consecutive integers. Combining (5.80)–(5.84)
with Proposition 5.4, we obtain
The proof centers around the dual objects Ψ1 , Ψ2 : {0, 1, 2, . . . , R}m → R given
by
⊗m
1 m
Ψ1 = λ0 + λ1 − 2λ⊗m
1
m+1 m+1
and
We will settle Claims 5.14–5.17 shortly, once we complete the main proof. Define
2
Λ0 = pos(Ψ1 + Ψ2 ),
kΨ1 k1 + kΨ2 k1
2
Λ1 = neg(Ψ1 + Ψ2 ),
kΨ1 k1 + kΨ2 k1
where the denominators are nonzero by (5.90). We proceed to verify the properties
required of Λ0 and Λ1 in the theorem statement.
Support. Recall from Claim 5.16 that the positive parts of Ψ1 and Ψ2 are
supported on (MP∗m,R )−1 (0). Therefore, the positive part of Ψ1 + Ψ2 is supported
on (MP∗m,R )−1 (0) as well, which in turn implies that
Analogously, Claim 5.16 states that the negative parts of Ψ1 and Ψ2 are supported
on (MP∗m,R )−1 (1). As a result, the negative part of Ψ1 + Ψ2 is also supported on
(MP∗m,R )−1 (1), whence
2
Λ0 − Λ1 = (Ψ1 + Ψ2 ),
kΨ1 k1 + kΨ2 k1
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 71
where the first step uses the nonnegativity of Λ0 and Λ1 , and the last step ap-
plies (5.98). In addition,
2
kΛ0 k1 + kΛ1 k1 = (k pos(Ψ1 + Ψ2 )k1 + k neg(Ψ1 + Ψ2 )k1 )
kΨ1 k1 + kΨ2 k1
2
= kΨ1 + Ψ2 k1
kΨ1 k1 + kΨ2 k1
= 2, (5.100)
where the last step uses Claim 5.16. A consequence of (5.99) and (5.100) is that
kΛ0 k1 = kΛ1 k1 = 1, which makes Λ0 and Λ1 probability distributions. In view
of (5.96) and (5.97), we conclude that
In particular,
Λ0 + Λ1
∈ D({0, 1, 2, . . . , R}m ). (5.102)
2
Smoothness. We have
Λ0 + Λ1 |Ψ1 + Ψ2 |
=
2 kΨ1 k1 + kΨ2 k1
1 1
= |Ψ1 | + |Ψ2 |, (5.103)
kΨ1 k1 + kΨ2 k1 kΨ1 k1 + kΨ2 k1
where the first step follows from the defining equations for Λ0 and Λ1 , and the sec-
ond step uses Claim 5.16. Inequality (5.90) shows that at every point, |Ψ1 | is within
a factor of 5 of the tensor product ( m+11 m
λ0 + m+1 λ1 )⊗m , which by (5.87) is Km-
smooth on its support. It follows that |Ψ1 | is 25Km-smooth on {0, 1, 2, . . . , R}m . By
an analogous argument, (5.93) and (5.86) imply that |Ψ2 | is 9K-smooth (and hence
also 25Km-smooth) on {0, 1, 2, . . . , R}m . Now (5.103) shows that 12 (Λ0 + Λ1 ) is a
conical combination of two nonnegative 25Km-smooth functions on {0, 1, 2, . . . , R}m .
72 ALEXANDER A. SHERSTOV AND PEI WU
By Proposition 5.3(iv),
Λ0 + Λ 1
∈ Smooth(25Km, {0, 1, 2, . . . , R}m ). (5.104)
2
Λ0 +Λ1
Having examined the convex combination 2 , we now turn to the individual
distributions Λ0 and Λ1 . We have
2
Λ0 = pos(Ψ1 + Ψ2 )
kΨ1 k1 + kΨ2 k1
2
= (pos(Ψ1 ) + pos(Ψ2 ))
kΨ1 k1 + kΨ2 k1
∈ cone({λ0 , λ1 , λ2 }⊗m ),
where the first equation restates the definition of Λ0 , the second step applies (5.94),
and the last step uses (5.88) and (5.91). Analogously,
2
Λ1 = neg(Ψ1 + Ψ2 )
kΨ1 k1 + kΨ2 k1
2
= (neg(Ψ1 ) + neg(Ψ2 ))
kΨ1 k1 + kΨ2 k1
∈ cone({λ⊗m ⊗m
1 , λ2 }),
where the first equation restates the definition of Λ1 , the second step applies (5.95),
and the last step uses (5.89) and (5.92). Thus, Λ0 and Λ1 are conical combinations
of probability distributions in {λ0 , λ1 , λ2 }⊗m . Since Λ0 and Λ1 are themselves prob-
ability distributions, we conclude that
Λ0 , Λ1 ∈ conv({λ0 , λ1 , λ2 }⊗m ).
By (5.76)–(5.78),
1
λi (t) 6 √ (t ∈ N; i = 0, 1, 2)
c000 (t + 1)2 2c000 t/ r
for some constant c000 > 0. The last two equations along with (5.80)–(5.82) yield
(
Λ0 , Λ1 ∈ conv λ ∈ S(1, K, 1) :
⊗m !
1
λ(t) 6 000 √ for t ∈ N . (5.105)
c (t + 1)2 2c000 t/ r
Now (5.96)–(5.98), (5.104), and (5.105) imply (5.71)–(5.75) for a small enough
constant c > 0.
We now settle the four claims made in the proof of Theorem 5.13.
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 73
Proof of Claim 5.14. Multiplying out the tensor product in the definition of Ψ1 and
collecting like terms, we obtain
m
m
Ψ1 = − 2 − λ⊗m
1
m+1
|S| m−|S|
X 1 m
+ λ⊗S ⊗S
0 · λ1 . (5.106)
m+1 m+1
S⊆{1,2,...,m}
S6=∅
Recall from (5.76) and (5.77) that λ0 and λ1 are supported on {0} and {1, 2, . . . , R},
respectively. Therefore, the right-hand side of (5.106) is the sum of 2m nonzero
functions whose supports are pairwise disjoint. Now (5.88) and (5.89) follow directly
from (5.106). One further obtains that
m
m
|Ψ1 | = 2− λ⊗m
1
m+1
|S| m−|S|
X 1 m
+ λ⊗S ⊗S
0 · λ1 .
m+1 m+1
S⊆{1,2,...,m}
S6=∅
Comparing the right-hand sides of the last two equations settles (5.90).
Proof of Claim 5.15. Multiplying out the tensor powers in the definition of Ψ2 and
collecting like terms, we obtain
m
m X
Ψ2 = − m λ⊗m
2 + a|S| λ⊗S ⊗S
0 · λ2 , (5.107)
m+1
S⊆{1,2,...,m}
S6=∅
As in the proof of the previous claim, recall from (5.76) and (5.77) that λ0 and λ2
have disjoint support. Therefore, the right-hand side of (5.107) is the sum of 2m
nonzero functions whose supports are pairwise disjoint. Now (5.91) and (5.92) are
immediate from (5.108). The disjointness of the supports of the summands on the
right-hand side of (5.107) also implies that
m
m X
|Ψ2 | = m λ⊗m
2 + |a|S| | λ⊗S ⊗S
0 · λ2 .
m+1
S⊆{1,2,...,m}
S6=∅
Proof of Claim 5.16. Recall from (5.76) and (5.77) that supp λ0 = {0} and supp λ1 =
supp λ2 = {1, 2, . . . , R}. In this light, (5.88)–(5.90) imply
Since the support of each Ψi is the disjoint union of the supports of its positive and
negative parts, (5.94) and (5.95) follow.
We have
1 m
orth A > orth λ0 + λ1
m+1 m+1
1 m
− λ0 + ((1 − )λ0 + λ2 )
m+1 m+1
m
= orth − ((1 − )λ0 + λ2 − λ1 )
m+1
√
> c0 r, (5.110)
where the first step uses Proposition 2.1(iii), and the last step is a restatement
of (5.79). Analogously,
where the first and second steps use Proposition 2.1(iii) and (5.79), respectively.
Finally,
where the second step applies Proposition 2.1(ii), and the third step is valid because
h−λ0 + λ2 , 1i = −hλ0 , 1i + hλ2 , 1i = − + = 0. By (5.109)–(5.112), the proof
is complete.
√
1 θ
d 6 min m deg± (f, γ), r deg± (f, γ), √ , (5.113)
C r log(2nmR)
76 ALEXANDER A. SHERSTOV AND PEI WU
one has:
∗
orth((−1)f ◦MPm,R · Λ) > d, (5.114)
−9d ∗
Λ > γ · (CnmR) Λ (5.115)
Proof. Let 0 < c < 1 be the constant from Theorem 5.13. Take C > 1/c to be a
sufficiently large absolute constant. By hypothesis,
Abbreviate
X = {0, 1, 2, . . . , R}nm ,
√
δ = 2−cθ/(2 r)
. (5.117)
1
d< min{θ − nm, nmR}, (5.118)
3
8enm(1 + ln(nm))
θ> , (5.119)
c
3
23d+1 nm + d
nmR δ 1
< , (5.120)
c4d+1 d d 1−δ 2
4d+1 3
(CnmR)9d
3m nm + d nmR
23d+1 6 . (5.121)
c d d 4
For example, (5.118) holds because d 6 nm/C by (5.113) and θ > Cnm log(2nm)
by (5.116). Inequalities (5.119)–(5.121) follow analogously from (5.113) and (5.116)
for a large enough constant C. The rest of the proof splits neatly into four major
steps.
Then
⊗mn !
1 1
Λz ∈ conv λ ∈ S 1, , 1 : λ(t) 6 √ for t ∈ N
c c(t + 1)2 2ct/ r
⊗mn
1
⊆ conv S 1, , 1 ∩
c
⊗mn !
1
λ ∈ D(N) : λ(t) 6 √ for t ∈ N
c(t + 1)2 2ct/ r
⊗mn n !
1 √ o
mn nm −cθ/(2 r)
⊆ conv S 1, , 1 ∩ Λ ∈ D(N ) : Λ(N |>θ ) 6 2
c
⊗mn !
1 mn nm
⊆ conv S 1, , 1 ∩ {Λ ∈ D(N ) : Λ(N |>θ ) 6 δ}
c
1 mn nm
⊆ conv S nm, , nm ∩ {Λ ∈ D(N ) : Λ(N |>θ ) 6 δ} , (5.126)
c
where the first step uses (2.2) and (5.125); the third step is valid by (5.119) and
Lemma 3.6; the fourth step is a substitution from (5.117); and the last step is an
application of Proposition 5.4.
and
3
23d+1 nm + d
diam(supp Λz ) δ
|Λz − Λ̃z | 6 4d+1 · Λz on Nnm |6θ .
c d d 1−δ
1
|Λz − Λ̃z | 6 Λz on Nnm |6θ . (5.130)
2
Properties (5.128) and (5.130) imply that Λ̃z is a nonnegative function, which
along with (5.127) and Proposition 2.4 implies that Λ̃z is a probability distribution.
78 ALEXANDER A. SHERSTOV AND PEI WU
Qn
Again by (5.131), the support of Λ̃z is contained in i=1 (MP∗m,R )−1 (zi ). This means
in particular that f ◦ MP∗m,R = f (z) on the support of Λ̃z , whence
∗
(−1)f (z) Λ̃z = (−1)f ◦MPm,R · Λ̃z (5.132)
The fact that the Λ̃z are supported on pairwise disjoint sets of inputs forces
X
|Φ| = 2−n Λ̃z (5.134)
z∈{0,1}n
and in particular
kΦk1 = 1. (5.135)
We now examine the smoothness of Φ. For this, consider the probability distri-
bution
X
Λ = 2−n Λz . (5.136)
z∈{0,1}n
Comparing equations (5.134) and (5.136) term by term and using the upper bound
(5.130), we find that |Λ − |Φ|| 6 12 Λ on X|6θ . Equivalently,
1 3
Λ 6 |Φ| 6 Λ on X|6θ. (5.137)
2 2
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 79
But
⊗n
1 1
Λ= Λ0 + Λ1
2 2
m ⊗n
∈ Smooth , {0, 1, 2, . . . , R}m
mc
⊆ Smooth , {0, 1, 2, . . . , R}mn , (5.138)
c
where the last two steps are valid by (5.124) and Proposition 5.3(iii), respectively.
Combining (5.137) and (5.138), we conclude that Φ is (3m/c)-smooth on X|6θ . As
a result, (5.118) and Lemma 5.12 provide a function Φ∗ : X|6θ → R with
kΦ∗ k1 6 2. (5.143)
Finally, using diam(supp Φ) 6 nmR along with the bounds (5.121) and (5.135), we
can restate (5.142) as
Define
X
Φfinal = µ(z)(−1)f (z) Λ̃z − γΦ + γΦ∗ .
z∈{0,1}n
80 ALEXANDER A. SHERSTOV AND PEI WU
Moreover,
X
kΦfinal k1 6 µ(z)kΛ̃z k1 + γkΦk1 + γkΦ∗ k1
z∈{0,1}n
6 1 + 3γ
6 4, (5.149)
where the first step applies the triangle inequality, and the second step uses (5.131),
(5.135) and (5.143). Continuing,
∗
(−1)f ◦MPm,R · Φfinal
f ◦MP∗
X
= (−1) m,R · (µ(z) − γ2−n )(−1)f (z) Λ̃z + γΦ∗
z∈{0,1}n
X ∗ ∗
= (µ(z) − γ2−n )(−1)f ◦MPm,R · (−1)f (z) Λ̃z + γ(−1)f ◦MPm,R · Φ∗
z∈{0,1}n
X
= (µ(z) − γ2−n )Λ̃z + γ|Φ∗ | (5.150)
z∈{0,1}n
∗
> γ|Φ |
> 4γ(CnmR)−9d Λ∗ , (5.151)
where the first step applies the definition of Φ; the third step uses (5.132) and (5.144);
the fourth step follows from (5.147); and the fifth step substitutes the lower bound
from (5.145). Now
Φfinal 6≡ 0 (5.152)
+ γ(Φ∗ − Φ).
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 81
Then
X
orth(Φfinal ) > min orth µ(z)(−1)f (z) Λz ,
z∈{0,1}n
min{orth(Λ̃z − Λz )}, orth(Φ∗ − Φ)
z
X
> min orth µ(z)(−1)f (z) Λz , d
z∈{0,1}n
X On
> min orth µ(z)(−1)f (z) Λzi , d
z∈{0,1}n i=1
where the first step applies Proposition 2.1(i); the second step follows from (5.127)
and (5.139); the third step is valid by the definition of Λz ; the fourth step applies
Corollary 2.3; the fifth step substitutes the lower bounds from (5.123) and (5.146);
and the final step uses (5.113).
To complete the proof, let
Φfinal ∗
Λ= · (−1)f ◦MPm,R ,
kΦfinal k1
We now translate the new amplification theorem from Nnm |6θ to the hypercube,
using the input transformation scheme of Theorem 3.9.
Theorem 5.19. Let C > 1 be the absolute constant from Theorem 5.18. Fix pos-
itive integers n, m, θ with θ > Cnm log(2nm). Then there is an (explicitly given)
transformation H : {0, 1}6θdlog(nm+1)e → {0, 1}n , computable by an AND-OR-AND
circuit of polynomial size with bottom fan-in at most 6dlog(nm + 1)e, such that
for all Boolean functions f : {0, 1}n → {0, 1}, all real numbers γ ∈ [0, 1], and all
positive integers
1 θ
d 6 min m deg± (f, γ), .
C 4m log(2θ)
Proof. Negating a function’s input bits has no effect on its γ-smooth threshold
degree for any 0 6 γ 6 1, so that f (x1 , x2 , . . . , xn ) and f (¬x1 , ¬x2 , . . . , ¬xn ) both
have γ-smooth threshold degree deg± (f, γ). Therefore, proving (5.154) for all f will
also settle (5.155) for all f. In what follows, we focus on the former.
Theorem 3.9 constructs an explicit surjection G : {0, 1}N → Nnm |6θ on N =
6θdlog(nm + 1)e variables with the following two properties:
(i) for every coordinate i = 1, 2, . . . , nm, the mapping x 7→ OR∗θ (G(x)i ) is
computable by a DNF formula of size (nmθ)O(1) = θO(1) with bottom fan-
in at most 6dlog(nm + 1)e;
(ii) for any polynomial p, the map v 7→ EG−1 (v) p is a polynomial on Nnm |6θ
of degree at most (deg p)/dlog(nm + 1) + 1e.
Consider the composition F = (f ◦ MP∗m,θ ) ◦ G. Then
F = (f ◦ (ANDm ◦ OR∗θ )) ◦ G
= f ◦ ((ANDm ◦ OR∗θ , . . . , ANDm ◦ OR∗θ ) ◦ G),
| {z }
n
Define
X 1G−1 (v)
λ= Λ(v) · ,
|G−1 (v)|
v∈Nnm |6θ
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 83
where 1G−1 (v) denotes as usual the characteristic function of the set G−1 (v). Clearly,
λ is a probability distribution on {0, 1}N . Moreover,
X 1G−1 (v)
λ > γθ−30d Λ∗ (v) ·
|G−1 (v)|
v∈Nnm |6θ
X |G−1 (v)| 1G−1 (v)
= γθ−30d · −1
2N |G (v)|
v∈Nnm |6θ
1{0,1}N
= γθ−30d · , (5.160)
2N
where the first two steps use (5.158) and the definition of Λ∗ , respectively.
Finally, we examine the orthogonal content of (−1)F · λ. Let p : RN → R be any
polynomial of degree less than ddlog(nm + 1) + 1e. Then by property (ii) of G, the
mapping p∗ : v 7→ EG−1 (v) p is a polynomial on Nnm |6θ of degree less than d. As a
result,
∗
h(−1)F · λ, pi = h(−1)(f ◦MPm,θ )◦G · λ, pi
X X ∗
= (−1)(f ◦MPm,θ )◦G · λ · p
v∈Nnm |6θ G−1 (v)
X ∗ X
= (−1)(f ◦MPm,θ )(v) λ·p
v∈Nnm |6θ G−1 (v)
X ∗
= (−1)(f ◦MPm,θ )(v) Λ(v) E p
G−1 (v)
v∈Nnm |6θ
∗
= h(−1)f ◦MPm,θ · Λ, p∗ i
= 0,
where the last step uses (5.159) and deg p∗ < d. We conclude that orth((−1)F · λ) >
ddlog(nm + 1) + 1e, which along with (5.160) settles (5.156).
5.7. The smooth threshold degree of AC0 . By Theorem 5.1, the composi-
tion ANDn1/3 ◦ ORn2/3 has exp(−O(n1/3 ))-smooth threshold degree Ω(n1/3 ). The
objective of this section is to strengthen this result to a near-optimal bound. For
any > 0, we will construct a constant-depth circuit f : {0, 1}n → {0, 1} with
exp(−O(n1− ))-smooth threshold degree Ω(n1− ). This construction is likely to
find applications in future work, in addition to its use in this paper to obtain a
lower bound on the sign-rank of AC0 . The proof proceeds by induction, with the
amplification theorem for smooth threshold degree (Theorem 5.19) applied repeat-
edly to construct increasingly harder circuits. To simplify the exposition, we isolate
the inductive step in the following lemma.
Lemma 5.20. Let f : {0, 1}n → {0, 1} be a Boolean circuit of size s, depth d > 0,
and smooth threshold degree
n1−α n1−α
deg± f, exp −c0 · β
> c00 · , (5.161)
log n logβ n
84 ALEXANDER A. SHERSTOV AND PEI WU
for some constants α ∈ [0, 1], β > 0, c0 > 0, and c00 > 0. Then f can be trans-
formed in polynomial time into a Boolean circuit F : {0, 1}N → {0, 1} on N =
Θ(n1+α log2+β n) variables that has size s + N O(1) , depth at most d + 3, bottom
fan-in O(log N ), and smooth threshold degree
1
!! 1
0 N 1+α N 1+α
deg± F, exp −C · 1−α+β > C 00 · 1−α+β , (5.162)
log 1+α N log 1+α N
where C 0 , C 00 > 0 are constants that depend only on α, β, c0 , c00 . Moreover, if d > 1
and f is monotone with AND gates at the bottom, then the depth of F is at most
d + 2.
Proof. Let C > 1 be the absolute constant from Theorem 5.18. Apply Theorem 5.19
with
to obtain a function H : {0, 1}N → {0, 1}n on N = Θ(n1+α log2+β n) variables such
that the composition F = f ◦ H satisfies (5.162) for some C 0 , C 00 > 0 that depend
only on α, β, c0 , c00 , and furthermore H is computable by an AND-OR-AND circuit
of polynomial size and bottom fan-in O(log N ). Clearly, the composition F = f ◦ H
is a circuit of size s + N O(1) , depth at most d + 3, and bottom fan-in O(log N ).
Moreover, if d > 1 and the circuit for f is monotone with AND gates at the bottom
level, then the bottom level of f can be merged with the top level of H to reduce
the depth of F = f ◦ H to at most (d + 3) − 1 = d + 2.
Corollary 5.21. Fix constants α ∈ [0, 1], β > 0, c0 > 0, c00 > 0, and d > 0. Let
{fn }∞ n
n=1 be a Boolean circuit family in which fn : {0, 1} → {0, 1} has polynomial
size, depth at most d, and smooth threshold degree
n1−α n1−α
0
deg± fn , exp −c · β
> c00 · (5.163)
log n logβ n
for all N > 2, where C 0 , C 00 > 0 are constants that depend only on α, β, c0 , c00 .
Moreover, if d > 1 and each fn is monotone with AND gates at the bottom, then
the depth of each FN is at most d + 2.
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 85
We now obtain our lower bounds on the smooth threshold degree of AC0 . We
present two incomparable theorems here, both of which apply Corollary 5.21 in a
recursive manner but with different base cases.
Theorem 5.22. Let k > 0 be a given integer. Then there is an (explicitly given)
Boolean circuit family {fn }∞ n
n=1 , where fn : {0, 1} → {0, 1} has polynomial size,
depth 3k, bottom fan-in O(log n), and smooth threshold degree
1
!! 1
0 n1− k+1 n1− k+1
deg± fn , exp −c · k(k−1)
> c00 · k(k−1)
(5.165)
log 2(k+1) n log 2(k+1) n
Theorem 5.23. Let k > 1 be a given integer. Then there is an (explicitly given)
Boolean circuit family {fn }∞ n
n=1 , where fn : {0, 1} → {0, 1} has polynomial size,
depth 3k + 1, bottom fan-in O(log n), and smooth threshold degree
2
!! 2
0 n1− 2k+3 00 n1− 2k+3
deg± fn , exp −c · k2
>c · k2
(5.166)
log 2k+3 n log 2k+3 n
Proof. As with Theorem 5.22, the proof is by induction on k. For the base case
k = 1, consider the family {gn }∞ n
n=1 in which gn : {0, 1} → {0, 1} is given by
bn1/3 c bn2/3 c
_ ^
gn (x) = xi,j .
i=1 j=1
Then
1/3 1/3
deg± (gn , 12−bn c−1
) = deg± (ORbn1/3 c ◦ ANDbn2/3 c , 12−bn c−1
)
1/3
= deg± (MPbn1/3 c,bn2/3 c , 12−bn c−1
)
1/3
= Ω(n ),
where the first step uses Proposition 2.9; the second step is valid because a function’s
smooth threshold degree remains unchanged when one negates the function or its
input variables; and the last step follows from Theorem 5.1. Applying Corollary 5.21
to the circuit family {gn }∞n=1 with α = 2/3 and β = 0 yields an explicit circuit
family {Gn }∞n=1 in which G n
n : {0, 1} → {0, 1} has polynomial size, depth 2+2 = 4,
bottom fan-in O(log n), and smooth threshold degree
n3/5 n3/5
deg± Gn , exp −C 0 · > C 00 ·
log1/5 n log1/5 n
for some constants C 0 , C 00 > 0 and all n > 2. This new circuit family {Gn }∞ n=1
establishes the base case.
For the inductive step, fix an integer k > 1 and an explicit circuit family {fn }∞
n=1
in which fn : {0, 1}n → {0, 1} has polynomial size, depth 3k +1, and smooth thresh-
old degree (5.166) for some constants c0 , c00 > 0. Applying Corollary 5.21 with
α = 2/(2k + 3) and β = k 2 /(2k + 3) yields an explicit circuit family {Fn }∞ n=1 ,
where Fn : {0, 1}n → {0, 1} has polynomial size, depth (3k + 1) + 3 = 3(k + 1) + 1,
bottom fan-in O(log n), and smooth threshold degree
2k+3 2k+3
!!
000 n 2k+5 n 2k+5
deg± Fn , exp −C · (k+1)2
> C 0000 · (k+1)2
log 2k+5 n log 2k+5 n
for some constants C 000 , C 0000 > 0 and all n > 2. This completes the inductive
step.
5.8. The sign-rank of AC0 . We have reached our main result on the sign-rank
and unbounded-error communication complexity of constant-depth circuits. The
proof amounts to lifting, by means of Theorem 2.18, the lower bounds on smooth
threshold degree in Theorems 5.22 and 5.23 to sign-rank lower bounds.
Theorem 5.24. Let k > 1 be a given integer. Then there is an (explicitly given)
Boolean circuit family {Fn }∞ n n
n=1 , where Fn : {0, 1} ×{0, 1} → {0, 1} has polynomial
size, depth 3k, bottom fan-in O(log n), sign-rank
k(k−1)
1
rk± (Fn ) = exp Ω n1− k+1 · (log n)− 2(k+1) , (5.167)
THE THRESHOLD DEGREE AND SIGN-RANK OF AC0 87
Proof. Theorem 5.22 constructs a circuit family {fn }∞ n=1 in which fn : {0, 1} →
n
{0, 1} has polynomial size, depth 3k, bottom fan-in O(log n), and smooth thresh-
old degree (5.165) for some constants c0 , c00 > 0 and all n > 2. Abbreviate m =
2dexp(4c0 /c00 )e. For any n > m, define Fn = fbn/mc ◦ ORm ◦ AND2 . Then (5.167) is
immediate from (5.165) and Theorem 2.18. Combining (5.167) with Theorem 2.16
settles (5.168).
It remains to analyze the circuit complexity of Fn . We defined Fn formally as a
circuit of depth 3k + 2 in which the bottom four levels have fan-ins nO(1) , O(log n),
m, and 2, in that order. Since m is a constant independent of n, these four levels can
be computed by a circuit of polynomial size, depth 2, and bottom fan-in O(log n).
This optimization reduces the depth of Fn to (3k + 2) − 4 + 2 = 3k while keeping
the bottom fan-in at O(log n).
We now similarly lift Theorem 5.23 to a lower bound on sign-rank and unbounded-
error communication complexity.
Theorem 5.25. Let k > 1 be a given integer. Then there is an (explicitly given)
Boolean circuit family {Fn }∞ n n
n=1 , where Fn : {0, 1} ×{0, 1} → {0, 1} has polynomial
size, depth 3k + 1, bottom fan-in O(log n), sign-rank
k2
2
rk± (Fn ) = exp Ω n1− 2k+3 · (log n)− 2k+3 ,
Proof. The proof is analogous to that of Theorem 5.24, with the only difference that
the appeal to Theorem 5.22 should be replaced with an appeal to Theorem 5.23.
Theorems 5.24 and 5.25 settle Theorems 1.2, 1.3, and 1.5 in the introduction.
Acknowledgments
The authors are thankful to Mark Bun and Justin Thaler for valuable comments
on an earlier version of this paper.
References
[1] S. Aaronson and Y. Shi, Quantum lower bounds for the collision and the element distinct-
ness problems, J. ACM, 51 (2004), pp. 595–605, doi:10.1145/1008731.1008735.
[2] A. Ambainis, Polynomial degree and lower bounds in quantum complexity: Collision
and element distinctness with small range, Theory of Computing, 1 (2005), pp. 37–46,
doi:10.4086/toc.2005.v001a003.
[3] A. Ambainis, A. M. Childs, B. Reichardt, R. Špalek, and S. Zhang, Any AND-OR
formula of size N can be evaluated in time N 1/2+o(1) on a quantum computer, SIAM J.
Comput., 39 (2010), pp. 2513–2530, doi:10.1137/080712167.
[4] J. Aspnes, R. Beigel, M. L. Furst, and S. Rudich, The expressive power of voting
polynomials, Combinatorica, 14 (1994), pp. 135–148, doi:10.1007/BF01215346.
88 ALEXANDER A. SHERSTOV AND PEI WU
[28] M. Krause and P. Pudlák, On the computational power of depth-2 circuits with thresh-
old and modulo gates, Theor. Comput. Sci., 174 (1997), pp. 137–156, doi:10.1016/S0304-
3975(96)00019-9.
[29] M. Krause and P. Pudlák, Computing Boolean functions by polynomials and threshold
circuits, Comput. Complex., 7 (1998), pp. 346–370, doi:10.1007/s000370050015.
[30] E. Kushilevitz and N. Nisan, Communication complexity, Cambridge University Press,
1997.
[31] T. Lee, A note on the sign degree of formulas, 2009. Available at http://arxiv.org/abs/
0909.4607.
[32] M. L. Minsky and S. A. Papert, Perceptrons: An Introduction to Computational Geom-
etry, MIT Press, Cambridge, Mass., 1969.
[33] N. Nisan and M. Szegedy, On the degree of Boolean functions as real polynomials, Com-
putational Complexity, 4 (1994), pp. 301–313, doi:10.1007/BF01263419.
[34] R. O’Donnell and R. A. Servedio, Extremal properties of polynomial threshold functions,
J. Comput. Syst. Sci., 74 (2008), pp. 298–312, doi:10.1016/j.jcss.2007.06.021.
[35] R. O’Donnell and R. A. Servedio, New degree bounds for polynomial threshold functions,
Combinatorica, 30 (2010), pp. 327–358, doi:10.1007/s00493-010-2173-3.
[36] R. Paturi, On the degree of polynomials that approximate symmetric Boolean functions,
in Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing
(STOC), 1992, pp. 468–474, doi:10.1145/129712.129758.
[37] R. Paturi and M. E. Saks, Approximating threshold circuits by rational functions, Inf.
Comput., 112 (1994), pp. 257–272, doi:10.1006/inco.1994.1059.
[38] R. Paturi and J. Simon, Probabilistic communication complexity, J. Comput. Syst. Sci.,
33 (1986), pp. 106–123, doi:10.1016/0022-0000(86)90046-2.
[39] A. A. Razborov and A. A. Sherstov, The sign-rank of AC0 , SIAM J. Comput., 39 (2010),
pp. 1833–1855, doi:10.1137/080744037. Preliminary version in Proceedings of the Forty-Ninth
Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2008.
[40] M. E. Saks, Slicing the hypercube, Surveys in Combinatorics, (1993), pp. 211–255,
doi:10.1017/CBO9780511662089.009.
[41] A. A. Sherstov, Halfspace matrices, Computational Complexity, 17 (2008), pp. 149–178,
doi:10.1007/s00037-008-0242-4. Preliminary version in Proceedings of the Twenty-Second An-
nual IEEE Conference on Computational Complexity (CCC), 2007.
[42] A. A. Sherstov, Separating AC0 from depth-2 majority circuits, SIAM J. Comput., 38
(2009), pp. 2113–2129, doi:10.1137/08071421X. Preliminary version in Proceedings of the
Thirty-Ninth Annual ACM Symposium on Theory of Computing (STOC), 2007.
[43] A. A. Sherstov, Communication complexity under product and nonproduct distributions,
Computational Complexity, 19 (2010), pp. 135–150, doi:10.1007/s00037-009-0285-1. Prelimi-
nary version in Proceedings of the Twenty-Third Annual IEEE Conference on Computational
Complexity (CCC), 2008.
[44] A. A. Sherstov, The pattern matrix method, SIAM J. Comput., 40 (2011), pp. 1969–
2000, doi:10.1137/080733644. Preliminary version in Proceedings of the Fortieth Annual ACM
Symposium on Theory of Computing (STOC), 2008.
[45] A. A. Sherstov, The unbounded-error communication complexity of symmetric functions,
Combinatorica, 31 (2011), pp. 583–614, doi:10.1007/s00493-011-2580-0. Preliminary version
in Proceedings of the Forty-Ninth Annual IEEE Symposium on Foundations of Computer
Science (FOCS), 2008.
[46] A. A. Sherstov, Strong direct product theorems for quantum communication and query
complexity, SIAM J. Comput., 41 (2012), pp. 1122–1165, doi:10.1137/110842661. Preliminary
version in Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing
(STOC), 2011.
[47] A. A. Sherstov, The intersection of two halfspaces has high threshold degree, SIAM J. Com-
put., 42 (2013), pp. 2329–2374, doi:10.1137/100785260. Preliminary version in Proceedings of
the Fiftieth Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2009.
[48] A. A. Sherstov, Making polynomials robust to noise, Theory of Computing, 9 (2013),
pp. 593–615, doi:10.4086/toc.2013.v009a018. Preliminary version in Proceedings of the Forty-
Fourth Annual ACM Symposium on Theory of Computing (STOC), 2012.
[49] A. A. Sherstov, Optimal bounds for sign-representing the intersection of two halfspaces
by polynomials, Combinatorica, 33 (2013), pp. 73–96, doi:10.1007/s00493-013-2759-7. Pre-
liminary version in Proceedings of the Forty-Second Annual ACM Symposium on Theory of
Computing (STOC), 2010.
90 ALEXANDER A. SHERSTOV AND PEI WU
[50] A. A. Sherstov, Breaking the Minsky–Papert barrier for constant-depth circuits, in Pro-
ceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (STOC), 2014,
pp. 223–232, doi:10.1145/2591796.2591871. Full version available as ECCC Report TR14-009,
January 2014.
[51] A. A. Sherstov, Communication lower bounds using directional derivatives, J. ACM, 61
(2014), pp. 1–71, doi:10.1145/2629334. Preliminary version in Proceedings of the Forty-Fifth
Annual ACM Symposium on Theory of Computing (STOC), 2013.
[52] A. A. Sherstov, The power of asymmetry in constant-depth circuits, in Proceedings of the
Fifty-Sixth Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2015,
pp. 431–450, doi:10.1109/FOCS.2015.34.
[53] A. A. Sherstov, The multiparty communication complexity of set disjointness, SIAM J.
Comput., 45 (2016), pp. 1450–1489, doi:10.1137/120891587. Preliminary version in Proceed-
ings of the Forty-Fourth Annual ACM Symposium on Theory of Computing (STOC), 2009.
[54] A. A. Sherstov, Algorithmic polynomials, in Proceedings of the Fiftieth Annual ACM Sym-
posium on Theory of Computing (STOC), 2018, pp. 311–324, doi:10.1145/3188745.3188958.
[55] K.-Y. Siu, V. P. Roychowdhury, and T. Kailath, Rational approximation techniques for
analysis of neural networks, IEEE Transactions on Information Theory, 40 (1994), pp. 455–
466, doi:10.1109/18.312168.
[56] J. Thaler, Lower bounds for the approximate degree of block-composed functions, in Proceed-
ings of the Forty-Third International Colloquium on Automata, Languages and Programming
(ICALP), 2016, pp. 17:1–17:15, doi:10.4230/LIPIcs.ICALP.2016.17.
[57] R. Špalek, A dual polynomial for OR. Available at http://arxiv.org/abs/0803.4516, 2008.
[58] A. C.-C. Yao, Some complexity questions related to distributive computing, in Proceedings of
the Eleventh Annual ACM Symposium on Theory of Computing (STOC), 1979, pp. 209–213,
doi:10.1145/800135.804414.
A.1. Fourier transform. Consider the real vector space of functions {0,P1}n →
R. For S ⊆ {1, 2, . . . , n}, define χS : {0, 1}n → {−1, +1} by χS (x) = (−1) i∈S xi .
Then
(
2n if S = T,
hχS , χT i =
0 otherwise.
Thus, {χS }S⊆{1,2,...,n} is an orthogonal basis for the vector space in question. In
particular, every function φ : {0, 1}n → R has a unique representation of the form
X
φ= φ̂(S)χS
S⊆{1,2,...,n}
for some reals φ̂(S), where by orthogonality φ̂(S) = 2−n hφ, χS i. The reals φ̂(S)
are called the Fourier coefficients of φ, and the mapping φ 7→ φ̂ is the Fourier
transform of f. The following fact is immediate from the definition of φ̂(S).
The linear subspace of real polynomials on {0, 1}n of degree at most d is easily
seen to be span{χS : |S| 6 d}. Its orthogonal complement, span{χS : |S| > d},
is then the linear subspace of functions that have zero inner product with every
polynomial of degree at most d. As a result, the orthogonal content of a nonzero
function φ : {0, 1}n → R is given by
A.2. Forster’s bound. The spectral norm of a real matrix A = [Axy ]x∈X,y∈Y is
given by
where k · k2 is the Euclidean norm on vectors. The first strong lower bound on the
sign-rank of an explicit matrix was obtained by Forster [21], who proved that
p
|X| |Y |
rk± (A) >
kAk
for any matrix A = [Axy ]x∈X,y∈Y with ±1 entries. Forster’s result has seen a
number of generalizations, including the following theorem [22, Theorem 3].
Theorem A.2 (Forster et al.). Let A = [Axy ]x∈X,y∈Y be a real matrix without zero
entries. Then
p
|X| |Y |
rk± (A) > min |Axy |.
kAk x,y
Now, let V (N, n) denote the family of subsets V ⊆ {1, 2, . . . , N } that have exactly
one element in each of these blocks (in particular, |V | = n). Clearly, |V (N, n)| =
(N/n)n . For a function φ : {0, 1}n → R, the (N, n, φ)-pattern matrix is the real
matrix A given by
h i
A = φ(x|V ⊕ w) N n
.
x∈{0,1} , (V,w)∈V (N,n)×{0,1}
In words, A is the matrix of size 2N by (N/n)n 2n whose rows are indexed by strings
x ∈ {0, 1}N , whose columns are indexed by pairs (V, w) ∈ V (N, n) × {0, 1}n , and
92 ALEXANDER A. SHERSTOV AND PEI WU
whose entries are given by Ax,(V,w) = φ(x|V ⊕ w). We will need the following
expression for the spectral norm of a pattern matrix [44, Theorem 4.3].
Theorem A.3 (Sherstov). Let φ : {0, 1}n → R be given. Let A be the (N, n, φ)-
pattern matrix. Then
s n
N n |S|/2
kAk = 2N +n max |φ̂(S)| .
n S⊆{1,2,...,n} N
A.4. Proof of Theorem 2.18. We are now in a position to prove Theorem 2.18.
We will derive it from the following more general result, stated in terms of pattern
matrices.
Theorem A.4. Let f : {0, 1}n → {0, 1} be given. Suppose that deg± (f, γ) > d,
where γ and d are positive reals. Then for any integer T > 1, the (T n, n, (−1)f )-
pattern matrix has sign-rank at least γT d/2 .
Now
where the first step is valid because F and Φ have the same sign pattern; the second
step uses (A.2) and Theorem A.2; the third step applies Theorem A.3; and the final
step substitutes the upper bounds from (A.4) and (A.5).
Theorem (restatement of Theorem 2.18). Let f : {0, 1}n → {0, 1} be given. Sup-
pose that deg± (f, γ) > d, where γ and d are positive reals. Fix an integer m > 2
and define F : {0, 1}mn × {0, 1}mn → {0, 1} by F (x, y) = f ◦ ORm ◦ AND2 . Then
j m kd/2
rk± (F ) > γ .
2
Proof. The result is immediate from Theorem A.4 since the (bm/2cn, n, (−1)f )-
pattern matrix is a submatrix of [(−1)F (x,y) ]x,y .
The next lemma constructs a dual polynomial for OR that has the sign behavior
claimed in Theorem 3.3 but may lack some of the metric properties. The lemma is
an adaptation of [50, Lemma A.2].
Lemma B.2. Let be given, 0 < < 1. Then for some constant c = c() ∈ (0, 1) and
every integer n > 1, there is an (explicitly given) function ω : {0, 1, 2, . . . , n} → R
such that
1−
ω(0) > · kωk1 , (B.1)
2
1
|ω(t)| 6 2 ct/√n · kωk1 (t = 1, 2, . . . , n), (B.2)
ct 2
(−1)t ω(t) > 0 (t = 0, 1, 2, . . . , n), (B.3)
√
orth ω > c n. (B.4)
Remark B.3. It is helpful to keep in mind that properties (B.1)–(B.4) are logically
monotonic in c. In other words, establishing these properties for a given constant
c > 0 also establishes them for all smaller positive constants.
(−1)n+t+|S|+1 n
Y
ω(t) = (t − i).
n! t i=0,1,2,...,n:
i∈S
/
94 ALEXANDER A. SHERSTOV AND PEI WU
orth ω > d + 1
r
n
> . (B.5)
∆
It follows that
\begin{align*}
  \frac{\omega(0)}{|\omega(1)|}
  &= \frac{\Delta-1}{\Delta+1} \prod_{i=1}^{d} \frac{i^2\Delta - 1}{i^2\Delta} \\
  &\geqslant 1 - \frac{2}{\Delta+1} - \sum_{i=1}^{d} \frac{1}{i^2\Delta} \\
  &\geqslant 1 - \frac{2}{\Delta+1} - \frac{1}{\Delta}\sum_{i=1}^{\infty} \frac{1}{i^2} \\
  &\geqslant 1 - \frac{4}{\Delta}. \tag{B.7}
\end{align*}
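The estimate in the second step of (B.7), a product of factors 1 − a_i bounded below by 1 − Σ a_i, is easy to spot-check numerically; the ranges of ∆ and d below are arbitrary.

# Numerical spot check of (B.7) (illustration only; the ranges are arbitrary).
for Delta in range(5, 200):
    for d in range(1, 60):
        ratio = (Delta - 1) / (Delta + 1)
        for i in range(1, d + 1):
            ratio *= (i * i * Delta - 1) / (i * i * Delta)
        assert ratio >= 1 - 4 / Delta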
An analogous application of (B.6) shows that
\begin{align*}
  \frac{\bigl|\omega\bigl(\frac{\Delta+1}{2}\bigr)\bigr|}{|\omega(0)|}
  &\leqslant \frac{\frac{\Delta+1}{2}\,\Delta^{d}\, d!\, d!}
     {\frac{\Delta+1}{2}\cdot\bigl(\frac{\Delta+1}{2}-1\bigr)\bigl(\Delta-\frac{\Delta+1}{2}\bigr)
      \cdot\frac{1}{2}\,\Delta^{d-1}\,(d-1)!\,(d+1)!} \\
  &= \frac{8\Delta d}{(\Delta-1)^2 (d+1)} \\
  &\leqslant \frac{8\Delta}{(\Delta-1)^2}. \tag{B.8}
\end{align*}
Finally, for i = 1, 2, . . . , d,
\begin{align*}
  \frac{|\omega(i^2\Delta)|}{|\omega(0)|}
  &= \frac{\frac{\Delta+1}{2}\, d!\, d!\, \Delta^{d}}
     {(i^2\Delta-1)\bigl(i^2\Delta-\frac{\Delta+1}{2}\bigr)\cdot\frac{1}{2}\,(d-i)!\,(d+i)!\,\Delta^{d}} \\
  &\leqslant \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\cdot\frac{d!\, d!}{(d-i)!\,(d+i)!} \\
  &= \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\cdot\frac{d}{d+i}\cdot\frac{d-1}{d+i-1}\cdots\frac{d-i+1}{d+1} \\
  &\leqslant \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\cdot\left(1-\frac{i}{d+i}\right)^{\!i} \\
  &\leqslant \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\cdot\exp\left(-\frac{i^2}{d+i}\right) \\
  &\leqslant \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\cdot\exp\left(-\frac{i^2}{2d}\right) \\
  &\leqslant \frac{2(\Delta+1)}{i^4(\Delta-1)^2}\cdot\exp\left(-\frac{i^2}{2\sqrt{n/\Delta}}\right). \tag{B.9}
\end{align*}
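The factorial manipulations in (B.9) reduce to two elementary facts: each of the i factors (d − k)/(d + i − k) is at most 1 − i/(d + i), and (1 − x)^i ≤ exp(−ix). Both are easy to spot-check; the ranges below are arbitrary.

# Spot check of the factorial-ratio steps in (B.9) (illustration only; ranges arbitrary).
import math
from fractions import Fraction

for d in range(1, 50):
    for i in range(1, d + 1):
        ratio = Fraction(math.factorial(d) ** 2,
                         math.factorial(d - i) * math.factorial(d + i))
        assert ratio <= (1 - Fraction(i, d + i)) ** i
        assert (1 - i / (d + i)) ** i <= math.exp(-i * i / (d + i)) + 1e-12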
Now,
\begin{align*}
  \frac{\|\omega\|_1}{\omega(0)}
  &= 1 + \frac{|\omega(1)|}{\omega(0)} + \frac{\bigl|\omega\bigl(\frac{\Delta+1}{2}\bigr)\bigr|}{\omega(0)}
       + \sum_{i=1}^{d}\frac{|\omega(i^2\Delta)|}{\omega(0)} \\
  &\leqslant 1 + \left(1-\frac{4}{\Delta}\right)^{\!-1} + \frac{8\Delta}{(\Delta-1)^2}
       + \sum_{i=1}^{\infty}\frac{2(\Delta+1)}{i^4(\Delta-1)^2} \\
  &= 1 + \left(1-\frac{4}{\Delta}\right)^{\!-1} + \frac{8\Delta}{(\Delta-1)^2}
       + \frac{\pi^4(\Delta+1)}{45(\Delta-1)^2} \\
  &\leqslant \frac{2}{1-\frac{8}{\Delta}} \\
  &< \frac{2}{1-\varepsilon}, \tag{B.10}
\end{align*}
where the second step uses (B.7)–(B.9), and the last step substitutes the definition
of ∆. Now (B.1) follows from (B.10) and ω(0) > 0. Moreover, for c = c(∆) > 0
small enough, (B.4) follows from (B.5), whereas (B.2) follows from (B.9) and the
fact that ω vanishes outside the union {1, (∆ + 1)/2} ∪ {i²∆ : i = 0, 1, 2, . . . , d}.
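The arithmetic in the last two steps of (B.10) can also be spot-checked. The sketch below assumes only that ∆ ≥ 9 (the exact value of ∆ as a function of ε is not reproduced in this excerpt), and the upper limit of the range is arbitrary.

# Spot check of the next-to-last step of (B.10) (illustration only; Delta's true
# value, a function of epsilon, is not reproduced in this excerpt).
import math

for Delta in range(9, 5000):
    lhs = (1 + 1 / (1 - 4 / Delta) + 8 * Delta / (Delta - 1) ** 2
           + math.pi ** 4 * (Delta + 1) / (45 * (Delta - 1) ** 2))
    assert lhs <= 2 / (1 - 8 / Delta)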
It remains to verify that ω has the desired sign behavior. Since ω vanishes outside
S, requirement (B.3) holds trivially at all points t ∉ S. For t ∈ S, it follows from
(B.6) that
Theorem (restatement of Theorem 3.3). Let 0 < ε < 1 be given. Then for some
constants c′, c′′ ∈ (0, 1) and all integers N ≥ n ≥ 1, there is an (explicitly given)
function ψ : {0, 1, 2, . . . , N} → R such that
\[
  \psi(0) \geqslant \frac{1-\varepsilon}{2}, \tag{B.11}
\]
\[
  \|\psi\|_1 = 1, \tag{B.12}
\]
\[
  \operatorname{orth}\psi \geqslant c'\sqrt{n}, \tag{B.13}
\]
\[
  \operatorname{sgn}\psi(t) = (-1)^t, \qquad t = 0, 1, 2, \ldots, N, \tag{B.14}
\]
\[
  |\psi(t)| \in \left[\frac{c'}{(t+1)^2\, 2^{c''t/\sqrt{n}}},\;
                      \frac{1}{c'(t+1)^2\, 2^{c''t/\sqrt{n}}}\right],
  \qquad t = 0, 1, 2, \ldots, N. \tag{B.15}
\]
Proof. For a sufficiently small constant c > 0 and all n ≥ 1, Lemma B.2 and
Remark B.3 ensure the existence of a function ω : {0, 1, 2, . . . , ⌈n/2⌉} → R such
that
\[
  \|\omega\|_1 = 1, \tag{B.16}
\]
\[
  \omega(0) \geqslant \frac{1}{2}\left(1 - \frac{\varepsilon}{6}\right), \tag{B.17}
\]
\[
  |\omega(t)| \leqslant \frac{1}{c\,t^2\, 2^{ct/\sqrt{n}}}
  \qquad (t = 1, 2, \ldots, \lceil n/2 \rceil), \tag{B.18}
\]
\[
  (-1)^t\,\omega(t) \geqslant 0 \qquad (t = 0, 1, 2, \ldots, \lceil n/2 \rceil), \tag{B.19}
\]
\[
  \operatorname{orth}\omega \geqslant c\sqrt{n}. \tag{B.20}
\]
where
\[
  \delta = \frac{5\varepsilon}{\pi^2(1-\varepsilon)}.
\]
\begin{align*}
  \|\Psi\|_1 &= \|\omega\|_1 + \delta\sum_{i=1}^{N}\frac{1}{i^2\, 2^{ci/\sqrt{n}}}\,\|\omega\|_1 \\
  &= 1 + \delta\sum_{i=1}^{N}\frac{1}{i^2\, 2^{ci/\sqrt{n}}} \\
  &\in \left[1,\; 1 + \delta\sum_{i=1}^{\infty}\frac{1}{i^2}\right] \\
  &= \left[1,\; \frac{6-\varepsilon}{6(1-\varepsilon)}\right], \tag{B.23}
\end{align*}
where the first and last steps use (B.22) and (B.17), respectively. The upper bound
on |Ψ(t)| is somewhat more technical. To begin with, we have the following bound
for every positive integer t:
\begin{align*}
  \sum_{i=1}^{t-1}\frac{1}{(t-i)^2\, i^2}
  &= \sum_{i=1}^{t-1}\frac{1}{\max\{(t-i)^2, i^2\}\cdot\min\{(t-i)^2, i^2\}} \\
  &\leqslant \frac{1}{(t/2)^2}\sum_{i=1}^{t-1}\frac{1}{\min\{(t-i)^2, i^2\}} \\
  &\leqslant \frac{1}{(t/2)^2}\cdot 2\sum_{i=1}^{\infty}\frac{1}{i^2} \\
  &= \frac{4\pi^2}{3t^2}. \tag{B.26}
\end{align*}
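Before continuing, (B.26) itself is easy to confirm numerically; the range of t below is arbitrary.

# Spot check of (B.26) (illustration only; the range of t is arbitrary).
import math

for t in range(2, 500):
    total = sum(1 / ((t - i) ** 2 * i ** 2) for i in range(1, t))
    assert total <= 4 * math.pi ** 2 / (3 * t ** 2)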
Continuing,
\begin{align*}
  \sum_{i=1}^{\infty}\frac{|\omega(t-i)|}{i^2\, 2^{ci/\sqrt{n}}}
  &= \frac{|\omega(0)|}{t^2\, 2^{ct/\sqrt{n}}} + \sum_{i=1}^{t-1}\frac{|\omega(t-i)|}{i^2\, 2^{ci/\sqrt{n}}} \\
  &\leqslant \frac{1}{t^2\, 2^{ct/\sqrt{n}}} + \sum_{i=1}^{t-1}\frac{1}{c\,(t-i)^2\, i^2\, 2^{ct/\sqrt{n}}} \\
  &\leqslant \frac{1}{t^2\, 2^{ct/\sqrt{n}}}\left(1 + \frac{4\pi^2}{3c}\right), \tag{B.27}
\end{align*}
where the second step uses (B.16) and (B.18), and the third step substitutes the
bound from (B.26). Analogously,
\begin{align*}
  \sum_{i=1}^{\infty}\frac{|\omega(-t+i)|}{i^2\, 2^{ci/\sqrt{n}}}
  &= \frac{|\omega(0)|}{t^2\, 2^{ct/\sqrt{n}}} + \sum_{i=t+1}^{\infty}\frac{|\omega(-t+i)|}{i^2\, 2^{ci/\sqrt{n}}} \\
  &\leqslant \frac{1}{t^2\, 2^{ct/\sqrt{n}}} + \sum_{i=t+1}^{\infty}\frac{1}{c\,(t-i)^2\, i^2\, 2^{ci/\sqrt{n}}} \\
  &\leqslant \frac{1}{t^2\, 2^{ct/\sqrt{n}}}\left(1 + \sum_{i=t+1}^{\infty}\frac{1}{c\,(t-i)^2}\right) \\
  &= \frac{1}{t^2\, 2^{ct/\sqrt{n}}}\left(1 + \frac{\pi^2}{6c}\right), \tag{B.28}
\end{align*}
where the second step uses (B.16) and (B.18). Now for every integer t ≥ 1,
\begin{align*}
  |\Psi(t)| &\leqslant |\omega(t)|
    + \delta\left(\sum_{i=1}^{\infty}\frac{|\omega(t-i)|}{i^2\, 2^{ci/\sqrt{n}}}
                  + \sum_{i=1}^{\infty}\frac{|\omega(-t+i)|}{i^2\, 2^{ci/\sqrt{n}}}\right) \\
  &\leqslant \frac{1}{c\,t^2\, 2^{ct/\sqrt{n}}}
     \left(1 + 2c\delta + \frac{4\pi^2\delta}{3} + \frac{\pi^2\delta}{6}\right), \tag{B.29}
\end{align*}
where the first step is immediate from the defining equation for Ψ, and the second
step uses (B.18), (B.27), and (B.28).
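The coefficient bookkeeping in the last step of (B.29) amounts to the identity 1/c + δ(1 + 4π²/(3c)) + δ(1 + π²/(6c)) = (1 + 2cδ + 4π²δ/3 + π²δ/6)/c, which the sketch below confirms on arbitrary values of c and δ.

# Check of the coefficient arithmetic in the last step of (B.29) (illustration only).
import math
import random

random.seed(0)
for _ in range(1000):
    c, delta = random.uniform(0.01, 1.0), random.uniform(0.01, 1.0)
    total = (1 / c + delta * (1 + 4 * math.pi ** 2 / (3 * c))
                   + delta * (1 + math.pi ** 2 / (6 * c)))
    target = (1 + 2 * c * delta + 4 * math.pi ** 2 * delta / 3
                + math.pi ** 2 * delta / 6) / c
    assert math.isclose(total, target, rel_tol=1e-9)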