Lower Bounds in Computer Science
Amit Chakrabarti
Dartmouth College
1.1 Sorting
Definition 1.1.1 (The Sorting Problem). Given n items $(x_1, x_2, \ldots, x_n)$ from an ordered universe, output a permutation $\sigma : [n] \to [n]$ such that $x_{\sigma(1)} < x_{\sigma(2)} < \cdots < x_{\sigma(n)}$.
At present we do not know how to prove superlinear lower bounds for the Turing machine model for any problem
in NP, so we instead restrict ourselves to a more limited model of computation, the comparison tree. We begin by
considering deterministic sorting algorithms.
$$\ell \le 2^h, \qquad h \ge \lceil \lg \ell \rceil,$$
where $\ell$ is the number of leaves and $h$ is the height of $T$. Since there are $n!$ possible permutations, any correct tree must have at least $n!$ leaves, so by Stirling's formula we have
$$h \ge \lceil \lg n! \rceil = n \lg n - O(n) = \Omega(n \lg n).$$
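As a quick numerical illustration of the leaf-counting bound, the following sketch computes $\lceil \lg n! \rceil$ exactly and prints it next to $n \lg n$. The function name `sorting_lower_bound` is an assumption made for this example, not something from the lecture.

```python
import math

def sorting_lower_bound(n: int) -> int:
    """Smallest height h with 2**h >= n!, i.e. ceil(lg n!)."""
    leaves = math.factorial(n)
    # For any integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)).
    return (leaves - 1).bit_length()

for n in (4, 8, 16, 64):
    # Compare the exact bound with the n lg n growth rate.
    print(n, sorting_lower_bound(n), round(n * math.log2(n)))
```

The gap between the two printed columns grows only linearly in n, in line with the $n \lg n - O(n)$ estimate.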
Open Problem. Prove that some language L ∈ NP cannot be decided in O(n) time on a Turing machine.
CS 85, Spring 2008, Dartmouth College
LECTURE 1. COMPARISON TREES: SORTING AND SELECTION Lower Bounds in Computer Science
chosen uniformly at random. Note that the “randomness” in any randomized algorithm can be captured as an infinite
binary “random string”, chosen at the beginning, such that at every call of the random number generator, we simply
read off the next bit (or the next few bits) on the random string. Then, we can say that a randomized algorithm is just
a probability distribution over deterministic algorithms (each obtained by fixing the random string). In other words,
when calling the randomized algorithm, one can move all the randomness to the beginning, and simply pick at random
a deterministic algorithm to call.
With this view of randomized algorithms in mind, we describe an extremely useful lemma due to Andrew Chi-Chih
Yao (1979). But first, some preliminaries.
Suppose we have some notion of “cost” of an algorithm (for some function/problem f ) on an input, i.e., a (non-
negative) function cost : A × X → R+ , where A is a space of (deterministic) algorithms and X is a space of inputs.
An example of a useful cost measure is “running time.” In what follows, we will consider random algorithms A ∼ λ,
where λ is a probability distribution on A . Note that, per our above remarks, a randomized algorithm is specified by
such a distribution λ. We will also consider a random input X ∼ µ, for some distribution µ on X .
Definition 1.1.2 (Distributional, worst-case and randomized complexity). In the above setting, the term $E_\mu[\mathrm{cost}(a, X)]$ is the expected cost of a particular algorithm $a \in \mathcal{A}$ on a random input $X \sim \mu$. The quantity
$$D_\mu(f) = \min_{a} E_\mu[\mathrm{cost}(a, X)],$$
where f is the function we want to compute, is called the distributional complexity of f according to input distribution µ. We contrast this with the worst-case complexity of f, which we can write as follows
∀ µ Dµ ( f ) ≤ C( f ) and R( f ) ≤ C( f ).
To obtain the latter inequality, consider the trivial distributions λ that are each supported on a single algorithm a ∈ A .
The proof of part (2) is somewhat non-trivial and requires the use of the linear programming duality theorem.
Moreover, we do not need part (2) at this point, so we shall only prove part (1). We note in passing that part (2) can be
written as
$$\max_{\mu} \min_{a} E_\mu[\mathrm{cost}(a, X)] = \min_{\lambda} \max_{x} E_\lambda[\mathrm{cost}(A, x)].$$
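Proof of Part (1). To visualize this proof, it helps to consider the following table, where $\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$, $\mathcal{A} = \{a_1, a_2, a_3, \ldots\}$, and $c_{ij} = \mathrm{cost}(a_j, x_i)$ is the entry in row $x_i$ and column $a_j$:
$$\begin{array}{c|cccc}
 & a_1 & a_2 & a_3 & \cdots \\ \hline
x_1 & c_{11} & c_{12} & c_{13} & \cdots \\
x_2 & c_{21} & c_{22} & c_{23} & \cdots \\
x_3 & c_{31} & c_{32} & c_{33} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$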
While reading the argument below, think of row averages and column averages in the above table.
Define $R_\lambda(f) = \max_x E_\lambda[\mathrm{cost}(A, x)]$. Then, for all x, we have
$$E_\lambda[\mathrm{cost}(A, x)] \le R_\lambda(f),$$
whence
$$E_\mu[E_\lambda[\mathrm{cost}(A, X)]] \le R_\lambda(f).$$
Since expectation is essentially just summation, we can switch the order of summation to get
$$E_\lambda[E_\mu[\mathrm{cost}(A, X)]] \le R_\lambda(f),$$
which implies that there must be an algorithm a such that $E_\mu[\mathrm{cost}(a, X)] \le R_\lambda(f)$. Since the distributional complexity $D_\mu(f)$ is the minimum of this quantity over all a, it follows that
$$D_\mu(f) \le R_\lambda(f).$$
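The inequality $D_\mu(f) \le R_\lambda(f)$ can be checked numerically on a small cost table. The sketch below is only an illustration under assumed data: the cost matrix is random, and the helper names (`distributional_complexity`, `randomized_worst_case`, `random_distribution`) are made up for this example.

```python
import random

def distributional_complexity(cost, mu):
    """min over algorithms a of E_mu[cost(a, X)]; cost[a][x] is a table of costs."""
    return min(sum(m * c for m, c in zip(mu, row)) for row in cost)

def randomized_worst_case(cost, lam):
    """max over inputs x of E_lam[cost(A, x)]."""
    num_inputs = len(cost[0])
    return max(sum(l * row[x] for l, row in zip(lam, cost))
               for x in range(num_inputs))

def random_distribution(k, rng):
    """A random probability vector of length k."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
cost = [[rng.randint(0, 9) for _ in range(5)] for _ in range(4)]  # 4 algorithms, 5 inputs
for _ in range(1000):
    mu = random_distribution(5, rng)
    lam = random_distribution(4, rng)
    # Every pair (mu, lam) must satisfy D_mu <= R_lam (up to rounding error).
    assert distributional_complexity(cost, mu) <= randomized_worst_case(cost, lam) + 1e-9
```

Any choice of µ therefore certifies a lower bound on the worst-case expected cost of every randomized algorithm over the same table.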
note here that m is a function of both the algorithm and the input.
Theorem 1.1.6 (Randomized sorting lower bound). Let T(x) denote the expected number of comparisons made by a randomized n-element sorting algorithm on input x. Let $C = \max_x T(x)$. Then $C = \Omega(n \lg n)$.
Proof. Let $\mathcal{A}$ denote the space of all (deterministic) comparison trees that sort an n-element array and let $\mathcal{X}$ denote the space of all permutations of [n]. Consider a randomized sorting algorithm given by a probability distribution λ on $\mathcal{A}$. As it stands, a “randomized sorting algorithm” is always correct on every input, but has a random runtime (between $\Omega(n)$ and $O(n^2)$) that depends on its input. (Such algorithms are called Las Vegas algorithms.)
We want to convert each $a \in \mathcal{A}$ into a corresponding algorithm $a'$ that has a nontrivially bounded runtime, but may sometimes output garbage. We do this by allowing at most 10C comparisons, and outputting some garbage if the sorted permutation is still unknown. Let $A'$ denote the random algorithm corresponding to a random $A \sim \lambda$. Corollary 1.1.5 implies
$$\Pr[A' \text{ is wrong on } x] = \Pr[\mathrm{NumComps}(A, x) > 10C] \le \frac{1}{10}.$$
Now define
$$\mathrm{cost}(a, x) = \begin{cases} 1, & \text{if } a \text{ is wrong on input } x \\ 0, & \text{otherwise.} \end{cases}$$
We can then rewrite the above inequality as
$$\max_x E_\lambda[\mathrm{cost}(A', x)] \le \frac{1}{10}.$$
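The truncation step can be illustrated with randomized quicksort as a stand-in Las Vegas algorithm. This is a sketch under stated assumptions: the lecture's cutoff is 10C for the worst-case expected cost C, whereas the code uses ten times the empirical mean on one fixed input. The function names are hypothetical.

```python
import random

def quicksort_comparisons(items, rng):
    """Number of comparisons made by randomized quicksort on distinct items."""
    if len(items) <= 1:
        return 0
    pivot = rng.choice(items)
    less = [x for x in items if x < pivot]
    more = [x for x in items if x > pivot]
    # Partitioning compares the pivot with each of the other len(items) - 1 items.
    return (len(items) - 1
            + quicksort_comparisons(less, rng)
            + quicksort_comparisons(more, rng))

def truncation_failure_rate(samples, factor=10):
    """Fraction of runs whose cost exceeds factor times the empirical mean."""
    cutoff = factor * sum(samples) / len(samples)
    return sum(s > cutoff for s in samples) / len(samples)

rng = random.Random(1)
x = list(range(200))
samples = [quicksort_comparisons(x, rng) for _ in range(500)]
print(truncation_failure_rate(samples))
```

Markov's inequality applies to the empirical distribution of the samples as well, so the printed rate can never exceed 1/10.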
1.2 Selection
Definition 1.2.1 (The Selection Problem). Given n items (x1 , x2 , · · · , xn ) from an ordered universe and an integer
k ∈ [n], the selection problem is to return the kth smallest element, i.e., xi such that |{ j : x j < xi }| = k − 1. Let
V (n, k) denote the height of the shortest comparison tree that solves this problem.
Special Cases. The minimum (resp. maximum) selection problems, where k = 1 (resp. k = n), and the median problem, where $k = \lceil n/2 \rceil$. Minimum and maximum can clearly be done in O(n) time. Other selection problems are less trivial.
Proof. Consider any leaf λ of T. If $x_i$ is the output at λ, then every $x_j$ ($j \ne i$) must have won at least one comparison on the path leading to λ (if not, we cannot know for sure that $x_j$ is not the minimum). Since each comparison can be won by at most one element, we have $\mathrm{depth}(\lambda) \ge n - 1$.
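The bound is matched by the obvious linear scan, which uses exactly n − 1 comparisons. A minimal sketch (the function name `find_min` is an assumption for this example):

```python
def find_min(items):
    """Return (minimum, number of comparisons) for a non-empty list."""
    comparisons = 0
    best = items[0]
    for x in items[1:]:
        comparisons += 1      # one comparison per remaining element
        if x < best:
            best = x          # the old candidate "wins" (is larger) and is ruled out
    return best, comparisons

print(find_min([5, 3, 8, 1, 9]))
```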
Proof. Let T be a comparison tree for selecting the kth smallest element. At a leaf λ of T, if we output $x_i$, then every $x_j$ ($j \ne i$) must be “settled” with respect to $x_i$, i.e., we must know whether or not $x_j < x_i$. In particular, we must know the set
$$S_\lambda = \{x_j : x_j < x_i\}.$$
Fix a particular set $U \subseteq \{x_1, \ldots, x_n\}$, with $|U| = k - 1$. Now consider the following set of leaves:
$$L_U = \{\lambda : S_\lambda = U\}.$$
Notice that the output at any leaf in $L_U$ is the minimum of the elements of the complement $\overline{U}$. Therefore, if we treat U as “given,” and prune T accordingly, by eliminating any nodes that perform comparisons involving one or more elements of U, then the residual tree is one that has leaf set $L_U$ and finds the minimum of the $n - k + 1$ elements of $\overline{U}$.
By Theorem 1.2.2, the pruned tree must have at least $2^{n-k}$ leaves. Therefore, $|L_U| \ge 2^{n-k}$. The number of leaves of T is the sum of $|L_U|$ over all $\binom{n}{k-1}$ choices for the set U. Therefore, the height of T must be
$$\mathrm{height}(T) \ge \lg\left(\binom{n}{k-1} \cdot 2^{n-k}\right) = n - k + \lg\binom{n}{k-1},$$
which already gives us a good initial lower bound for median selection:
$$V\left(n, \frac{n}{2}\right) \ge \frac{n}{2} + \lg\binom{n}{\frac{n}{2} - 1} \ge \frac{n}{2} + n - \frac{1}{2}\lg n - O(1) = \frac{3}{2}n - o(n).$$
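To get a feel for the size of this bound, the sketch below evaluates $n - k + \lg\binom{n}{k-1}$ at the median and compares it with $\frac{3}{2}n$. The function name `selection_lower_bound` is an assumption for this example.

```python
import math

def selection_lower_bound(n: int, k: int) -> float:
    """The leaf-counting bound n - k + lg C(n, k-1) on V(n, k)."""
    return n - k + math.log2(math.comb(n, k - 1))

for n in (10, 100, 1000, 10000):
    bound = selection_lower_bound(n, n // 2)
    # The ratio bound / n approaches 3/2 as n grows.
    print(n, round(bound, 2), round(bound / n, 4))
```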
Using more sophisticated techniques, one can prove the following stronger lower bounds:
$V(n, n/2) \ge 2n - o(n)$. [BJ85]
$V(n, n/2) \ge (2 + 2^{-50})n - o(n)$. [DZ01]
On the upper bound side, we again have nontrivial results, starting with the famous “groups of five” algorithm:
Lecture 2
Boolean Decision Trees
We define 1-certificates and C1 ( f ) similarly. We use the convention that C0 ( f ) = ∞ if f ≡ 1, and similarly for
C1 ( f ). Finally, we define the certificate complexity C( f ) of a function f to be the larger of the two, that is
C( f ) = max{C0 ( f ), C1 ( f )}.
Definition 2.1.5 (Sensitivity). For a function $f : \{0,1\}^n \to \{0,1\}$, define the sensitivity $s_x(f)$ of f on a particular input x as follows:
$$s_x(f) = |\{i \in [n] : f(x^{(i)}) \ne f(x)\}|,$$
where we write $x^{(i)}$ to denote the string x with the ith bit flipped. We then define the sensitivity of f as
$$s(f) = \max_x s_x(f).$$
Definition 2.1.6 (Block Sensitivity). For a block $B \subseteq [n]$ of bit positions, where $n = |x|$, let $x^{(B)}$ be the string x with all bits indexed by B flipped. Define the block sensitivity $bs_x(f)$ of f on a particular input x as
$$bs_x(f) = \max\{b : \{B_1, B_2, \ldots, B_b\} \text{ is a set of disjoint blocks such that } \forall i \in [b] : f(x) \ne f(x^{(B_i)})\}.$$
That is to say, $bs_x(f)$ is the maximum number of disjoint blocks that are each sensitive for f on input x. We then define the block sensitivity of f as
$$bs(f) = \max_x bs_x(f).$$
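For very small n, both measures can be computed by brute force directly from the definitions. The following is a sketch only: it enumerates all inputs and all blocks (as bitmasks), so it is exponential in n, and the function names are assumptions for this example.

```python
from itertools import product

def flip(x, mask):
    """Return x with every bit position set in mask flipped."""
    return tuple(b ^ ((mask >> i) & 1) for i, b in enumerate(x))

def max_disjoint(blocks, used=0):
    """Largest number of pairwise disjoint bitmask blocks (exhaustive search)."""
    best = 0
    for idx, b in enumerate(blocks):
        if b & used == 0:
            best = max(best, 1 + max_disjoint(blocks[idx + 1:], used | b))
    return best

def sensitivity(f, n):
    return max(
        sum(f(flip(x, 1 << i)) != f(x) for i in range(n))
        for x in product((0, 1), repeat=n)
    )

def block_sensitivity(f, n):
    best = 0
    for x in product((0, 1), repeat=n):
        # All non-empty blocks whose flip changes the value of f at x.
        sensitive = [m for m in range(1, 1 << n) if f(flip(x, m)) != f(x)]
        best = max(best, max_disjoint(sensitive))
    return best

majority = lambda x: int(sum(x) >= 2)
print(sensitivity(majority, 3), block_sensitivity(majority, 3))
```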
Note that since block sensitivity is just a generalization of sensitivity (which requires that the “blocks” be of size 1
each), we have bsx ( f ) ≥ sx ( f ), and thus bs( f ) ≥ s( f ). Before we go on to develop some of the theory behind this,
consider the following example.
Example 2.1.7. Consider a function f(x) defined as follows, where the input x is of length n and is arranged in a $\sqrt{n} \times \sqrt{n}$ matrix. Assume $\sqrt{n}$ is an even integer. Define
every block sensitive. This gives a quadratic gap between s( f ) and bs( f ).
Proof. The leftmost and rightmost inequalities have already been established above. To see that the middle inequality
holds, observe that every certificate must expose at least one bit per sensitive block.
Note that the gap between s(f) and bs(f) can be at least quadratic, as evidenced by Example 2.1.7. It turns out that the gap between C(f) and D(f) is at most quadratic, which leads us to the next theorem, whose proof we will see as a corollary later in this section.
Clearly this polynomial is 1 if and only if every $x_i = \vec{b}_i$. Then, it follows that the polynomial that represents f is just the polynomial
$$p = \sum_{i=1}^{k} p_{\vec{a}_i}.$$
Since the $\vec{a}_i$ are distinct, at most one $p_{\vec{a}_i}$ evaluates to 1 on any input, and that happens exactly when the input is one of the accepting inputs for f; hence p behaves as desired. To prove uniqueness, suppose two multilinear polynomials p, q both represent f. Then the polynomial r = p − q satisfies $r(\vec{a}) = 0$ for all $\vec{a} \in \{0,1\}^n$.
Consider the smallest monomial $x_{i_1} x_{i_2} \cdots x_{i_k}$ in r. FINISH UNIQUENESS PROOF.
Theorem 2.1.14. For a Boolean function f, let p be its representative polynomial. Then we have $\deg(p) \le D(f)$.
Proof. For any leaf λ of a decision tree, without loss of generality let $x_1, x_2, \ldots, x_r$ be the queries made along the path to that leaf, and let $b_1, b_2, \ldots, b_r$ be the results of the queries. Then for every leaf λ of a decision tree for f, write a polynomial $p_\lambda$ given as follows:
$$p_\lambda = \prod_{i : b_i = 1} x_i \cdot \prod_{i : b_i = 0} (1 - x_i)$$
where we assume that $n = 3^k$ for some integer k, and NAE is defined as follows:
$$\mathrm{NAE}(x, y, z) = \begin{cases} 1 & \text{if } x, y, z \text{ are not all equal} \\ 0 & \text{if } x = y = z \end{cases}$$
In other words, this is the k-layer tree of NAE computations applied recursively to $3^k$ initial inputs. For $x = \vec{0}$, we have $s_x(\text{R-NAE}) = 3^k$, since changing any single bit will flip the answer. Now observe that we can alternatively write the NAE function as a polynomial:
$$\mathrm{NAE}(x, y, z) \equiv x + y + z - xy - yz - zx.$$
Then, we can see that at the second lowest level we have $3^{k-1}$ polynomials, each of degree 2 (the lowest level contains $3^k$ polynomials of degree 1, namely the coordinates of the input vector), and at the root we have a single polynomial of degree $2^k$, which is polynomially separated from the sensitivity $s_x(\text{R-NAE}) = 3^k$. Alternatively, writing this in terms of n, we have
$$\deg(f) = n^{\log_3 2}, \qquad s(f) = n.$$
To see why deg( f ) = n, simply recall that no matter the input, we must always examine every bit of the input to get a
definitive answer. In other words, the decision tree has constant height D(g) = n (refer to example 2.1.10).
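The polynomial identity for NAE and the degree of the recursive construction can be verified mechanically for small k. The sketch below computes the coefficients of the unique multilinear representation by Möbius inversion over subsets; the helper names `r_nae` and `degree` are assumptions for this example.

```python
from itertools import product

def nae(x, y, z):
    return 0 if x == y == z else 1

# The polynomial form agrees with NAE on all 8 Boolean inputs.
assert all(nae(x, y, z) == x + y + z - x*y - y*z - z*x
           for x, y, z in product((0, 1), repeat=3))

def r_nae(bits):
    """Recursive NAE on 3**k input bits."""
    bits = list(bits)
    while len(bits) > 1:
        bits = [nae(*bits[i:i + 3]) for i in range(0, len(bits), 3)]
    return bits[0]

def degree(f, n):
    """Degree of the multilinear polynomial representing f on {0,1}^n."""
    # coeff[mask] starts as f evaluated on the indicator vector of mask ...
    coeff = [f([(mask >> i) & 1 for i in range(n)]) for mask in range(1 << n)]
    # ... and Mobius inversion turns these values into monomial coefficients.
    for i in range(n):
        for mask in range(1 << n):
            if mask & (1 << i):
                coeff[mask] -= coeff[mask ^ (1 << i)]
    return max((bin(m).count("1") for m in range(1 << n) if coeff[m] != 0),
               default=0)

print(degree(r_nae, 3), degree(r_nae, 9))
```

For k = 1 and k = 2 this reports degrees 2 and 4, that is $2^k$, while the number of inputs is $3^k$.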
$$D(f) \le C_1(f)\,bs(f) \quad\text{and}\quad D(f) \le C_0(f)\,bs(f).$$
Proof. Without loss of generality we only consider the first case, that D( f ) ≤ C1 ( f )bs( f ), as the case for C0 ( f )bs( f )
is symmetric. We proceed to induct on the number of variables, or size of the input n.
The base case n = 1 is trivial. Before we continue with the induction step, we define a subfunction of f as follows
where $\alpha \in \{0, 1, *\}^n$ exposes k bits, i.e., has exactly k characters in {0, 1}. If f has no 1-certificate, then $f \equiv 0$ and D(f) = 0, so we’re done. FINISH BEALS BUHRMAN PROOF
Corollary 2.1.19. $D(f) \le C(f)^2$
Proof.
$$\begin{aligned}
D(f) &\le \min\{C_0(f), C_1(f)\} \cdot bs(f) \\
&\le \min\{C_0(f), C_1(f)\} \cdot C(f) \\
&= \min\{C_0(f), C_1(f)\} \cdot \max\{C_0(f), C_1(f)\} \\
&= C_0(f) \cdot C_1(f) \\
&\le C(f)^2
\end{aligned}$$
Lecture 3
Degree of a Boolean Function
Theorems:
1. $S(f) \le bs(f) \le C(f) \le D(f)$
2. $\deg(f) \le D(f)$
6. Corollary: $D(f) \le bs(f)^3$
7. $bs(f) \le 2\deg(f)^2$
Examples:
1. $\exists f : S(f) = \Theta(\sqrt{n}),\ bs(f) = \Theta(n)$
2. $\exists f : S(f) = bs(f) = C(f) = \sqrt{n},\ \deg(f) = D(f) = n$
3. $\exists f : \deg(f) = n^{\log_3 2},\ S(f) = n$
Theorem 7:
Proof.
• Pick an input $\vec{a} \in \{0,1\}^n$ and blocks $B_1, B_2, \ldots, B_b$ that achieve the max in bs(f) = b.
– If $i \notin B_1 \cup B_2 \cup \cdots \cup B_b$, set $x_i = a_i$
– If $i \in B_j$,
∗ If $a_i = 1$, set $x_i = y_j$
∗ If $a_i = 0$, set $x_i = 1 - y_j$
• Multi-linearize the polynomial
$\deg(Q) \le \deg(P)$ (Note: $\deg(Q) \le b$)
$Q(1, 1, 1, \ldots, 1) = P(\vec{a}) = f(\vec{a})$
$Q(0, 1, 1, \ldots, 1) = Q(1, 0, 1, \ldots, 1) = \cdots = Q(1, 1, \ldots, 0) = 1 - f(\vec{a})$ (Because each $B_j$ is sensitive)
$Q(\vec{c}) \in \{0, 1\}\ \ \forall\, \vec{c} \in \{0,1\}^b$
[Trick by Minsky-Papert]
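Define $Q^{sym}(k)$ as follows:
$$Q^{sym}(k) = \frac{\sum_{|\vec{a}| = k} Q(\vec{a})}{\binom{b}{k}}, \qquad k \in \{0, 1, 2, \ldots, b\},$$
where $|\vec{a}|$ (the weight of $\vec{a}$) is the number of ones in $\vec{a}$.
Note: $\deg(Q^{sym}) \le b$
$$Q^{sym}(k) \begin{cases} = f(\vec{a}), & \text{if } k = b \\ = 1 - f(\vec{a}), & \text{if } k = b - 1 \\ \in [0, 1], & \text{else} \end{cases}$$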
Markov’s Inequality:
Definition 3.0.20 (Interval Norm of a function). Define $\|f\|_S = \sup_{x \in S} |f(x)|$, for $f : \mathbb{R} \to \mathbb{R}$, $S \subseteq \mathbb{R}$.
Suppose f is a polynomial of degree ≤ n; then $\|f'\|_{[-1,1]} \le n^2 \|f\|_{[-1,1]}$.
Corollary: If $c \le f(x) \le d$ whenever $a \le x \le b$, then $\|f'\|_{[a,b]} \le n^2 \cdot \frac{d-c}{b-a}$.
However, $Q^{sym}(b-1) = 1 - f(\vec{a})$ and $Q^{sym}(b) = f(\vec{a})$. By the Mean Value Theorem, $\exists\, \alpha \in [b-1, b]$ such that $|(Q^{sym})'(\alpha)| = 1$.
– The proof is similar for the case where $\kappa = -Q^{sym}(x)$ for some x.
Theorem 8:
$$D(f) \le bs(f)\deg(f)^2$$
• There exists a set (of size ≤ bs(f) deg(f)) of variables that intersects every maxonomial.
• Query all these variables to get a subfunction $f|_\alpha$ with $\deg(f|_\alpha) \le \deg(f) - 1$. Since all the maxonomials of f vanish after exposing the variables, the degree of the resulting polynomial representation (of $f|_\alpha$) decreases by at least 1.
If the above two statements hold, then we can prove the theorem by induction on the degree of the function f. If f is a constant function, then the theorem holds trivially. Assume that it is true for all functions whose degree is less than n = deg(f). Then
$$\begin{aligned}
D(f) &\le bs(f)\deg(f) + D(f|_\alpha) \\
&\le bs(f)\deg(f) + bs(f|_\alpha)\deg(f|_\alpha)^2 && \text{(by induction hypothesis)} \\
&\le bs(f)\left(\deg(f) + (\deg(f) - 1)^2\right) \\
&\le bs(f)\deg(f)^2,
\end{aligned}$$
where the last step uses $d + (d-1)^2 = d^2 - d + 1 \le d^2$ for $d \ge 1$.
• Pick all variables in any maxonomial that is not yet intersected and add these variables to S. (Number of
variables added is less than or equal to deg( f ).)
• If the current set intersects every maxonomial stop else repeat the above step.
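The two steps above amount to a greedy hitting-set construction. A minimal sketch, assuming each maxonomial is given as a set of variable indices (the representation and the name `greedy_hitting_set` are choices made for this example):

```python
def greedy_hitting_set(maxonomials):
    """Return a set of variables that intersects every maxonomial."""
    chosen = set()
    for mono in maxonomials:
        if chosen.isdisjoint(mono):  # this maxonomial is not yet intersected
            chosen |= mono           # add all of its (at most deg(f)) variables
    return chosen

monos = [{0, 1}, {1, 2}, {3, 4}]
print(sorted(greedy_hitting_set(monos)))
```

A single pass suffices because a maxonomial that is intersected once stays intersected as the set grows. The maxonomials that contribute variables are pairwise disjoint, which is what the size bound relies on.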
Lecture 4
Symmetric Functions and Monotone Functions
Definition 4.0.22. $f : \{0,1\}^n \to \{0,1\}$ is said to be symmetric if $\forall x \in \{0,1\}^n\ \forall \pi \in S_n$ (the group of all permutations on [n])
$$f(x) = f(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(n)}).$$
This is the same as saying f(x) depends only on |x| (the number of ones in x).
Recall the proof of $bs(f) \le 2\deg(f)^2$. We used the symmetrizing trick.
Can prove: For any non-constant symmetric f, $\deg(f) = n - O(n^\alpha)$ where α (= 0.548) < 1 is a constant. [Von Zur Gathen & Roche]
Then, $D(f) \ge \deg(f) = n - o(n)$.
For $x, y \in \{0,1\}^n$ write $x \le y$ if $\forall i\,(x_i \le y_i)$.
Definition 4.0.23. f is monotone if $x \le y \Rightarrow f(x) \le f(y)$.
Examples of monotone functions: $\mathrm{OR}_n$, $\mathrm{AND}_n$, $\mathrm{AND}_{\sqrt{n}} \circ \mathrm{OR}_{\sqrt{n}}$.
Non-examples: $\mathrm{PAR}_n$, $\neg\mathrm{OR}_n$.
Theorem 4.0.24. If f is monotone, then S(f) = bs(f) = C(f).
Proof. We already have $S(f) \le bs(f) \le C(f)$.
It remains to prove: $C(f) \le S(f)$.
Let $x \in \{0,1\}^n$. Let α be a minimal 0-certificate matching x (suppose f(x) = 0).
Claim: All exposed bits of α are zeros.
Proof: If not, we can change an exposed bit of α from 1 to 0 and still have f = 0 on every input matching α, as f is monotone. So we don’t need to expose that bit, and α can’t be a minimal 0-certificate.
Claim: Every exposed bit in α is sensitive for y = [α : 1∗] (the input obtained by setting ∗ = 1 everywhere in α).
Proof: Suppose the ith bit is exposed by α but $f(y^{(i)}) = f(y) = 0$. By monotonicity, any other setting of the ∗’s in α causes f = 0. So the ith bit is not necessary, contradicting the minimality of α.
By the second claim, S(f) ≥ number of exposed bits in α. Thus, $S(f) \ge C_0(f)$. Similarly, $S(f) \ge C_1(f)$.
Therefore, $S(f) \ge \max\{C_0(f), C_1(f)\} = C(f)$.
Theorem 4.0.25. If f is monotone, $D(f) \le bs(f)^2 = S(f)^2$.
[Using an earlier theorem: $D(f) \le C(f)\,bs(f) = S(f)^2$]
The above bounds are tight, considering $f = \mathrm{AND}_{\sqrt{n}} \circ \mathrm{OR}_{\sqrt{n}}$.
group $G \le S_N$ of permutations ($G \cong S_n$).
G is transitive on {1, ..., N}, i.e., for any $i, j \in [N]$, $\exists \pi \in G$ such that $\pi(i) = j$.
Lemma 1: Suppose $f : \{0,1\}^d \to \{0,1\}$ is monotone, non-constant, and invariant under a transitive group, and $d = 2^\alpha$ for some $\alpha \in \mathbb{N}$. Then f is evasive.
Corollary to Lemma 1: If $n = 2^k$, then $D(f) \ge n^2/4$.
Hierarchically cluster [n] into subclusters of size n/2 and recurse. Consider the subfunction of f obtained in the following way: $g_i : A \to \{0,1\}$ and g(x) = f(x), where
as 0 < k < d.
By Claim:
$|S_0| = 0$ as $f(\vec{0}) = 0$, and $|S_d| = 1$.
= 0 + even number + 1
= odd
Consider all $x \in \{0,1\}^n$ that reach a leaf λ of the decision tree at depth < d. An even number of x’s reach here. Therefore, if all leaves whose output is f(x) = 1 have depth < d, then the number of x with f(x) = 1 is even. This is a contradiction, and hence there exists a leaf whose depth is d. Thus, D(f) = d and f is evasive.
Lecture 5
Randomized Decision Tree Complexity:
Boolean functions
Lemma 2 [O’Donnell-Servedio]: For any monotone $f : \{-1,1\}^n \to \{-1,1\}$, $I(f) \le \sqrt{D_{unif}(f)}$.
From Lemma 1 and Lemma 2, we get that if f is monotone, transitive, and balanced, then:
$$1 \le D_{unif}(f) \cdot \frac{I(f)}{n} \le \frac{D_{unif}(f)^{3/2}}{n}$$
$$\Rightarrow R(f) \ge D_{unif}(f) \ge n^{2/3}$$
Proof of Lemma 2:
Recall that for monotone f, $I_i(f) = \frac{1}{2} E_x[D_i f(x)]$, which is also the coefficient of $x_i$ in the representing polynomial.
Let T be a deterministic decision tree for f and suppose T outputs ±1 values. Let L = {leaves of T} and $L^+$ = {leaves of T that output −1}, i.e., accepting leaves.
For each leaf λ ∈ L, we have a polynomial $p_\lambda(x_1, x_2, \ldots, x_n)$ such that
$$p_\lambda(x_1, x_2, \ldots, x_n) = \begin{cases} 1, & \text{if } (x_1, x_2, \ldots, x_n) \text{ reaches } \lambda \\ 0, & \text{otherwise} \end{cases}$$
d(λ) = depth(λ)
s(λ) = skew of λ
= (#left branches − #right branches) on path to λ
$$\sqrt{\sum_{\lambda} \frac{d(\lambda)}{2^{d(\lambda)}}} = \sqrt{D_{unif}(f)} \;\Rightarrow\; I(f) \le \sqrt{D_{unif}(f)}.$$
The first inequality is due to the drunkard’s walk in probability theory.
Theorem 5.0.39. If f is a non-constant, monotone graph property on n-vertex graphs, then $R(f) = \Omega\left(\min\left\{\frac{n}{p^*}, \frac{n^2}{\log n}\right\}\right)$, where $p^*$ is the critical probability of f.
Note that we can always assume $p^* \le \frac{1}{2}$. Why? Because if the critical probability of f is > 1/2, we can always consider $g(\vec{x}) = 1 - f(1 - \vec{x})$; then $p^*(g) = 1 - p^*(f)$.
The critical probability $p^*$ is the probability that makes E[f(x)] = 1/2.
Let us now define the key graph theoretic property “graph packing” that we need for the proof. Graph packing is
the basis for a lot of lower bounds for graph properties.
Definition 5.0.40 (Graph Packing). Given graphs G, H with |V(G)| = |V(H)|, we say that G and H pack if there exists a bijection $\varphi : V(G) \to V(H)$ such that $\forall \{u, v\} \in E(G)$, $\{\varphi(u), \varphi(v)\} \notin E(H)$. φ is called a packing.
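For tiny graphs the definition can be tested by trying every bijection. The sketch below assumes vertices are labelled 0, ..., n − 1 and edges are given as pairs; the function name `pack` is an assumption, and the search is factorial in n, so it is meant only as an illustration.

```python
from itertools import permutations

def pack(n, edges_g, edges_h):
    """Return a packing phi (as a tuple, phi[u] = image of u) or None."""
    eh = {frozenset(e) for e in edges_h}
    for phi in permutations(range(n)):
        # phi is a packing if no edge of G is mapped onto an edge of H.
        if all(frozenset((phi[u], phi[v])) not in eh for u, v in edges_g):
            return phi
    return None

path4 = [(0, 1), (1, 2), (2, 3)]
print(pack(4, path4, path4))          # a path on 4 vertices packs with itself
print(pack(2, [(0, 1)], [(0, 1)]))    # two single edges on 2 vertices do not pack
```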
We state the following theorem, but we will not prove it:
Theorem 5.0.41 (Sauer-Spencer theorem). If |V(G)| = |V(H)| = n and $\Delta(G) \cdot \Delta(H) \le \frac{n}{2}$, then G and H pack.
Here Δ(G) = max degree in G.
Note: you can never pack a 0-certificate and a 1-certificate. What is the graph of a 0-certificate? It is a collection of vertex pairs that are not edges in the input graph.
Intuition: The Sauer-Spencer theorem says that if two graphs are small, you can pack them. Since we cannot pack a 0-certificate and a 1-certificate, at least one of the two has to be big, so the certificate complexity is big, which gives us a handle on the lower bound.
Consider a decision tree; pick a random input $x \in \{0,1\}^n$ according to $\mu_p$ and run the decision tree on x. Let Y = number of 1’s read in the input, and Z = number of 0’s read in the input.
$$E[Z] = \frac{1-p}{p} \cdot \frac{n}{64} \ge \frac{n}{128p} = \Omega(n/p) \ge \Omega\left(\min\left\{\frac{n}{p}, \frac{n^2}{\log n}\right\}\right)$$
⇒ we may assume $E[Y] \le \frac{n}{64}$.
$$\Rightarrow E\left[Y \mid f(x) = 1 \wedge \Delta(G_x) \le np + \sqrt{4np\log n}\right] \le \frac{E[Y]}{\Pr\left[f(x) = 1 \wedge \Delta(G_x) \le np + \sqrt{4np\log n}\right]},$$
where $G_x$ = graph represented by the 1’s in the input x (so other edges are “off”).
Theorem 5.0.42. $E[X \mid c] \le \frac{E[X]}{\Pr[c]}$
Proof.
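$$\begin{aligned}
\Pr[A \wedge B] &= 1 - \Pr[\overline{A} \vee \overline{B}] \\
&\ge 1 - \Pr[\overline{A}] - \Pr[\overline{B}] \\
&= \Pr[A] - \Pr[\overline{B}]
\end{aligned}$$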
n/32 + o(n). Therefore, there exists a 1-certificate with $\Delta \le np + \sqrt{4np\log n}$ and #edges ≤ n/32 + o(n). This means that we are touching at most n/16 vertices, and so at least n/2 vertices are isolated.
Claim 2: All 0-certificates must have at least $\frac{n^2}{16(np + \sqrt{4np\log n})}$ edges.
Now we have a lower bound on the number of edges of any 0-certificate. As we did E[Y], we can say
Claim 2 Proof: This proof amounts to showing that if the claim were false, we could pack a 1-certificate and 0-certificate.
We will show that if Δ(G) ≤ k and G has more than n/2 isolated vertices and $|E(H)| \le \frac{n^2}{16\Delta(G)}$, then G and H pack.
The number of edges in a graph equals half the sum of the degrees of the vertices, i.e.,
$$|E(H)| = \frac{1}{2} \sum_{v \in H} \deg(v) = \frac{d_{avg}(H) \cdot n}{2}$$
$$\Rightarrow d_{avg}(H) = \frac{2}{n} \cdot |E(H)| \le \frac{n}{8\Delta(G)}.$$
If H′ is the subgraph of H spanned by the n/2 lowest degree vertices, then
$$\Delta(H') \le 2 \cdot d_{avg}(H).$$
This holds because, if you have an upper bound on the average, you can bound the max of the n/2 smallest items.
By packing H − H′ with the isolated vertices of G, we can extend a packing of H′ and G′ to one of H and G.
We can get a packing of H′ and G′ because,
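$$Y_i = \begin{cases} 1, & \text{if } x_i \text{ is read as a `1'} \\ 0, & \text{otherwise} \end{cases}
\qquad
Z_i = \begin{cases} 1, & \text{if } x_i \text{ is read as a `0'} \\ 0, & \text{otherwise} \end{cases}$$
Then,
$$Y = \sum_{i=1}^{\binom{n}{2}} Y_i; \qquad Z = \sum_{i=1}^{\binom{n}{2}} Z_i$$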
It is enough to show the same under an arbitrary conditioning on $x_j$ ($j \ne i$), because the $x_i$’s are independent.
Apply an arbitrary conditioning of $x_j$ ($j \ne i$). If this conditioning implies that the i-th coordinate is not read, it implies $Y_i = Z_i = 0$. Otherwise this is the only coordinate that is left to be read, and we know Pr[1] = p and Pr[0] = 1 − p, i.e.,
$$\frac{E[Z_i]}{E[Y_i]} = \frac{\Pr[X_i = 0]}{\Pr[X_i = 1]} = \frac{1-p}{p}.$$
What do you mean by “enough to show under arbitrary conditioning”?
If you have conditioning on the rest of the N − 1 variables then we have $C_1, C_2, \ldots, C_{2^{N-1}}$ conditions. This implies,
We used a property of decision trees: whether or not you read the ith variable is independent of the value of the ith variable; it may depend on what the values of the other variables are.
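The identity E[Z]/E[Y] = (1 − p)/p can be checked exactly on a concrete decision tree. The sketch below uses an assumed example tree (the one for OR that queries bits left to right and stops at the first 1) and exact rational arithmetic; the names `reads` and `expected_reads` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

def reads(x):
    """Run the left-to-right OR tree on x; return (#ones read, #zeros read)."""
    ones = zeros = 0
    for b in x:
        if b:
            ones += 1
            break           # the tree stops as soon as it sees a 1
        zeros += 1
    return ones, zeros

def expected_reads(n, p):
    """Exact (E[Y], E[Z]) when each bit is 1 independently with probability p."""
    ey = ez = Fraction(0)
    for x in product((0, 1), repeat=n):
        weight = Fraction(1)
        for b in x:
            weight *= p if b else 1 - p
        y, z = reads(x)
        ey += weight * y
        ez += weight * z
    return ey, ez

p = Fraction(1, 3)
ey, ez = expected_reads(5, p)
print(ez / ey, (1 - p) / p)
```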
Lecture 6
Linear and Algebraic Decision Trees
Model 0. Equality testing only. This is a very weak model with no non-trivial solutions. The element distinctness problem would require $\Theta(n^2)$ comparisons.
Model 1. Assume universe U is totally ordered. We can compare any two elements $x_i$ and $x_j$ and take one of 3 branches depending on whether $x_i < x_j$, $x_i = x_j$, or $x_i > x_j$. Note that in a computer science context this makes sense, as it is always possible to compare two objects by comparing their binary representations.
In this model, the problem reduces to sorting, hence the element distinctness problem can be solved in O(n log n) comparisons. In fact we have a matching lower bound $\Omega(n \log n)$ for this problem, through a complicated adversarial argument.
Instead of describing the adversarial argument we describe a stronger model. We then use this model to give us
a lower bound, since a lower bound for the problem in the stronger model is also a lower bound in this model.
Model 2. We assume w.l.o.g. that the universe U = R with the usual ordering. Then the comparison of two elements $x_i$ and $x_j$ is equivalent to deciding whether $x_i - x_j$ is < 0, > 0, or = 0. For this model we allow more general tests, such as
$$\text{is } x_1 - 2x_3 + 4x_5 - x_7 \ \begin{cases} > 0, \\ = 0, \\ < 0 \end{cases} \ ?$$
In general at an internal node v in the decision tree, we can decide whether $p_v(x_1, x_2, \ldots, x_n)$ is > 0, = 0, or < 0, and branch accordingly. Here $p_v(x_1, x_2, \ldots, x_n) = a_0^{(v)} + \sum_{i=1}^{n} a_i^{(v)} x_i$.
decide whether $p_v(x_1, x_2, \ldots, x_n)$ is > 0, = 0, or < 0, and branch accordingly. The depth of such a tree is defined as usual.
If we assume that our tree tests membership in a set, we can consider the output to be {0, 1}. Then we can think of two subsets $W_0, W_1 \subseteq \mathbb{R}^n$ where:
1. $W_i = \{\vec{x} \in \mathbb{R}^n : \text{tree outputs } i\}$
2. $W_0 \cap W_1 = \emptyset$
3. $W_0 \cup W_1 = \mathbb{R}^n$
Consider a leaf λ of the tree. Then the set Sλ = {x ∈ Rn : x reaches λ} is described by a set of linear equality /
inequality constraints, corresponding to the nodes on the path from the root to the leaf. Then Sλ is a convex polytope,
since each linear equality / inequality defines a convex set, and the intersection of convex sets is itself a convex set.
Convex sets are connected, in the following sense:
Definition 6.2.2 (Connected Sets). A set S ⊆ Rn is said to be (path-) connected if ∀a, b ∈ S, ∃ a function pab :
[0, 1] → S such that:
• pab is continuous
• pab (0) = a
• pab (1) = b
Suppose a linear decision tree T has $L_1$ leaves that output 1 and $L_0$ leaves that output 0. Then
$$W_1 = \bigcup_{\lambda \text{ outputs } 1} S_\lambda$$
For a set S, define #S to be the number of connected components of S. Then, since each leaf contributes at most a single connected component,
$$\#W_1 \le L_1$$
and similarly,
$$\#W_0 \le L_0$$
For a given function $f : \mathbb{R}^n \to \{0,1\}$ (for example, element distinctness) we define $f^{-1}(1) = W_1 = \{\vec{x} : f(\vec{x}) = 1\}$, and $f^{-1}(0)$ is similarly defined. Then,
$$L_1 \ge \#f^{-1}(1)$$
and,
$$L_0 \ge \#f^{-1}(0)$$
This implies for any decision tree that computes f, the number of leaves L satisfies
$$L = L_1 + L_0 \ge \#f^{-1}(1) + \#f^{-1}(0)$$
and hence,
Lecture 7
Algebraic Computation Trees and Ben-Or’s
Theorem
7.1 Introduction
Last time we saw the Milnor-Thom theorem, which says that the following
If instead of constant-degree polynomials we allowed arbitrary degree polynomials then it turns out that the element distinctness problem could be done in O(1) time! Consider the expression
$$x = \prod_{i<j} (x_i - x_j)$$
Then $x = 0 \Leftrightarrow \exists\, i \ne j : x_i = x_j$. Hence, this is obviously too powerful a model.
In general such problems can be posed as “set recognition problems”, e.g. element distinctness can be posed as recognising the set W where $W = \{\vec{x} \in \mathbb{R}^m : \forall i, j\ (i \ne j \Rightarrow x_i \ne x_j)\}$. Define $D_{lin}(W)$ as the height of the shortest linear decision tree that recognizes W. We saw in the last lecture that for this particular problem, $D_{lin}(W) \ge \log_3(\#W)$.
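The single-test idea based on the product $\prod_{i<j}(x_i - x_j)$ is easy to spell out. A minimal sketch using exact integer arithmetic (with floats the product could underflow to 0); the function names are assumptions for this example.

```python
from itertools import combinations
from math import prod

def pairwise_difference_product(xs):
    """The product of (x_i - x_j) over all pairs i < j."""
    return prod(a - b for a, b in combinations(xs, 2))

def all_distinct(xs):
    # One sign test on a polynomial of degree n(n-1)/2 decides distinctness.
    return pairwise_difference_product(xs) != 0

print(all_distinct([3, 1, 4]), all_distinct([3, 1, 3]))
```

Evaluating the product takes about $n^2/2$ arithmetic operations, which a model with unrestricted polynomial tests does not charge for.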
1. Branching Nodes: labelled “v:0” where v is an ancestor of the current node
2. Computation Nodes: labelled “v op v′” where v, v′ are
and op is one of +, −, ×, ÷. The leaves are labelled “ACCEPT” or “REJECT”. The computation nodes in an algebraic computation tree have 1 child, while branching nodes have 3 children.
For a set $W \subseteq \mathbb{R}^n$, we define $D_{alg}(W)$ = minimum height of an algebraic computation tree recognizing W.
Note that in this model, every computation is being individually charged, hence an operation such as computing $\prod_{i<j}(x_i - x_j)$ would be charged $O(n^2)$.
The Milnor-Thom theorem is applicable to polynomials of low degree, and it turns out that we still do not exactly have that. For example, consider a path of length h in a tree. Then the degree of the polynomial computed can be $2^h$:
$w_1 = x_1 \cdot x_1$
$w_2 = w_1 \cdot w_1$
$w_3 = w_2 \cdot w_2$
...
$w_h = w_{h-1} \cdot w_{h-1}$
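The doubling of the degree along such a path can be seen by carrying out the squarings on coefficient lists. A small sketch (the helper names are assumptions for this example):

```python
def poly_mul(p, q):
    """Multiply two univariate polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def degree_after_squarings(h):
    w = [0, 1]                # the polynomial x_1
    for _ in range(h):
        w = poly_mul(w, w)    # one computation node: w_i = w_{i-1} * w_{i-1}
    return len(w) - 1

print([degree_after_squarings(h) for h in range(6)])
```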
We need to somehow bound the degree of the polynomial obtained from such a path. To do this, with each node v of an ACT we associate a polynomial equality / inequality as follows, where w denotes the parent of v:
1. If w = parent(v) is a branching node, then $p_v = p_w \lesseqgtr 0$, where the inequality contains <, =, or > depending on the branch taken from w to reach v.
At this point, we can even introduce the square root operator for our algebraic computation tree, so that if the node is labelled $\sqrt{v'}$, then the corresponding polynomial equality is $p_v^2 = p_{v'}$.
Let λ be a leaf of the ACT, and define as usual $S_\lambda = \{\vec{x} \in \mathbb{R}^n : \vec{x} \text{ reaches } \lambda\}$. Let $w_i$ denote the variable introduced at node i in the tree for obtaining the polynomial equalities and inequalities. Then we can write
$$S_\lambda = \left\{\vec{x} \in \mathbb{R}^n : \exists\, w_1, w_2, \ldots, w_h \in \mathbb{R} \text{ s.t. } \bigwedge_{v \text{ ancestor of } \lambda} (\text{condition at } v)\right\}$$
$$\exists\, y : p(\vec{x}, \vec{w}) = y^2$$
This does not increase the degree of the polynomial, although now we are moving the equation to a higher dimension space (through the introduction of more variables). We can move back to the lower dimension space by simply projecting the solution space to the lower dimension. Note that this projection function is a continuous map.
But we still don’t know how to handle strict inequalities, which is all we have. In order to handle strict inequalities, we proceed as follows. Suppose that the subset $V \subseteq \mathbb{R}^m$ is defined by a number of polynomial equations and strict inequalities, i.e.
\[
\#S_\lambda \le \#V
\]
Suppose $V = V_1 \cup V_2 \cup \cdots \cup V_l$, where the $V_i$ are the connected components of $V$. Pick a point $\vec{z}_i \in V_i$ for each $V_i$; then we have a point in each connected component of $V$. Each $\vec{z}_i$ is also in $V$, so it satisfies the equality and inequality constraints, and $q_1(\vec{z}_i), q_2(\vec{z}_i), \ldots, q_u(\vec{z}_i)$ are all $> 0$.
Let $\varepsilon = \min_{i,j} q_j(\vec{z}_i)$. Let
\[
V_\varepsilon = \{\vec{z} \in \mathbb{R}^m : p_1(\vec{z}) = p_2(\vec{z}) = \cdots = p_t(\vec{z}) = 0,\ q_1(\vec{z}) \ge \varepsilon, q_2(\vec{z}) \ge \varepsilon, \ldots, q_u(\vec{z}) \ge \varepsilon\}.
\]
Clearly $V_\varepsilon \subseteq V$. Also, $V_\varepsilon$ contains every $\vec{z}_i$, i.e., $V_\varepsilon$ contains a point in each component $V_i$ of $V$.
Therefore,
\[
\#V_\varepsilon \ge \#V.
\]
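Now we can replace the inequalities, and we obtain the set $V'_\varepsilon$ as
\[
V'_\varepsilon = \Bigl\{\vec{z} \in \mathbb{R}^m : \exists\, \vec{y} \in \mathbb{R}^u\ \bigwedge (\text{polynomial equations of degree at most } 2) \Bigr\}
\]
and note that $V_\varepsilon$ is the projection of the solutions $(\vec{z}, \vec{y}) \in \mathbb{R}^{m+u}$ onto $\mathbb{R}^m$. Then,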
• Set equality, set inclusion, “convex hull” are all $\Omega(n \log n)$
• Knapsack is $\Omega(n^2)$
Lecture 8
Circuits
Proof. We can assume that the circuit does not use NOT gates. We can get the effect of a NOT gate somewhere in the
circuit by using DeMorgan’s Laws and having all inputs as well as their negations.
We can lay down any circuit in a topological sort, so that any edge is from a vertex u to a vertex v that is after u in
the sorted order. With this in mind, we can count the number of circuits (DAGs) of size s. For each gate we have two
choices for the type of the gate and $\le s^2$ choices for inputs that feed the gate. So the total number of such circuits is $\le (2s^2)^s \le s^{3s}$.
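Each circuit computes a unique Boolean function, so that the number of functions that have a circuit of size $\le \frac{2^n}{10n}$ is
\[
\le \left(\frac{2^n}{10n}\right)^{3 \cdot 2^n/(10n)} = \frac{2^{3 \cdot 2^n/10}}{(10n)^{3 \cdot 2^n/(10n)}} = 2^{\frac{3 \cdot 2^n}{10} - \frac{3 \cdot 2^n}{10n} \log 10n} = 2^{2^n \left(\frac{3}{10} - \frac{3 \log 10n}{10n}\right)} < 2^{2^n \cdot \frac{3}{10}}.
\]
But the total number of functions $f : \{0,1\}^n \to \{0,1\}$ is $2^{2^n}$, so that $\lim_{n \to \infty} \frac{2^{2^n \cdot 3/10}}{2^{2^n}} = 0$.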
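The arithmetic in this counting argument can be sanity-checked numerically. The sketch below compares $\log_2$ of the bound $s^{3s}$, with $s = 2^n/(10n)$, against $\log_2$ of the number of Boolean functions, which is $2^n$. The function name is illustrative, and logarithms are taken base 2 as an assumption.

```python
import math


def log2_circuit_count_bound(n):
    """log2 of s^(3s) for s = 2^n / (10 n), the bound on the number of circuits of size s."""
    s = 2 ** n / (10 * n)
    return 3 * s * math.log2(s)


for n in (10, 20, 40):
    # Ratio of exponents: log2(#circuits bound) / log2(#functions) = (3 s log2 s) / 2^n.
    ratio = log2_circuit_count_bound(n) / 2 ** n
    print(n, ratio)
```

The ratio of exponents equals $\frac{3}{10}\left(1 - \frac{\log 10n}{n}\right)$, so it stays below $3/10$, matching the final inequality above.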
Observe that every function $f : \{0,1\}^n \to \{0,1\}$ has a circuit of size $O(n \cdot 2^n)$. We can get this by writing $f$ as an OR of minterms (DNF). We will need at most $2^n$ minterms.
It is possible to improve the above bound to $O\left(\frac{2^n}{n}\right)$.
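The $O(n \cdot 2^n)$ upper bound from the DNF can be illustrated with a small sketch. It assumes fan-in-2 AND/OR gates with negated inputs available for free; the helpers `minterms`, `eval_dnf` and `dnf_gate_count` are hypothetical names introduced only for this example.

```python
from itertools import product


def minterms(f, n):
    """Inputs on which f is 1; each one becomes an AND of n literals."""
    return [x for x in product((0, 1), repeat=n) if f(x)]


def eval_dnf(terms, x):
    """OR over minterms; a minterm fires exactly when x equals its defining input."""
    return int(any(x == t for t in terms))


def dnf_gate_count(terms, n):
    """Fan-in-2 gate count: (n - 1) ANDs per minterm plus the ORs joining the minterms."""
    if not terms:
        return 0  # constant-0 function; treated as needing no gates in this sketch
    return len(terms) * (n - 1) + (len(terms) - 1)


n = 4
majority = lambda x: int(sum(x) > n // 2)
terms = minterms(majority, n)
print(len(terms), dnf_gate_count(terms, n), n * 2 ** n)
```

With at most $2^n$ minterms, the count is at most $2^n (n-1) + 2^n - 1 < n \cdot 2^n$ under these assumptions.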
We will now prove our first concrete lower bound. This bound is for the function $\mathrm{THR}^n_k : \{0,1\}^n \to \{0,1\}$ defined as
\[
\mathrm{THR}^n_k(x) = \begin{cases} 1, & \text{if the number of 1s in } x \text{ is } \ge k \\ 0, & \text{otherwise.} \end{cases}
\]
Proof. Note that we have the trivial lower bound of ≥ n − 1. The lower bound in the theorem is proved by the
technique of “gate elimination”.
We will prove the theorem by induction on $n$. For the base case ($n = 2$), we have $\mathrm{THR}^2_2 = \mathrm{AND}_2$, for which we need a circuit of size $1 = 2n - 3$.
Now, let $C$ be a minimal circuit for $\mathrm{THR}^n_2$. Then $C$ does not have any gate that reads inputs $(x_i, x_i)$ or $(x_i, \neg x_i)$ or $(\neg x_i, \neg x_i)$ for $i \in [n]$.
Pick a gate in $C$ such that its inputs are $z_i = x_i$ (or $\neg x_i$) and $z_j = x_j$ (or $\neg x_j$), for some $i \ne j$. Then we can claim that either $z_i$ or $z_j$ must have fan-out at least 2. To see this, note that by suitably setting $z_i$ and $z_j$, we can get three different subfunctions on the remaining $n - 2$ variables: $\mathrm{THR}^{n-2}_2$, $\mathrm{THR}^{n-2}_1$ and $\mathrm{THR}^{n-2}_0$. But if both $z_i$ and $z_j$ have fan-out only 1, the settings of $z_i$ and $z_j$ can create only two different subcircuits, which gives us a contradiction.
Suppose $x_i$ (or $\neg x_i$) has fan-out $\ge 2$. Then, setting $x_i = 0$ eliminates at least two gates. So we have a circuit for $\mathrm{THR}^{n-1}_2$ (the resulting subfunction) with size $\le$ (original size) $- 2$.
Now by our induction hypothesis, (new size) $\ge 2(n - 1) - 3 = 2n - 5$. This implies that the original size was $\ge 2 +$ (new size) $\ge 2n - 3$.
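The claim in the proof above that fixing two inputs of $\mathrm{THR}^n_2$ yields $\mathrm{THR}^{n-2}_2$, $\mathrm{THR}^{n-2}_1$ or $\mathrm{THR}^{n-2}_0$ can be verified by brute force for small $n$. This is a minimal sketch; the function names are illustrative.

```python
from itertools import product


def thr(k, x):
    """THR_k: 1 if the number of 1s in x is at least k, else 0."""
    return int(sum(x) >= k)


def restricted_matches(n, zi, zj):
    """Check that THR_2^n with two inputs fixed to (zi, zj) equals THR_{2-zi-zj}^{n-2}."""
    k = 2 - zi - zj
    return all(
        thr(2, (zi, zj) + rest) == thr(k, rest)
        for rest in product((0, 1), repeat=n - 2)
    )


for zi, zj in product((0, 1), repeat=2):
    print(zi, zj, restricted_matches(5, zi, zj))
```

Settings with $z_i + z_j = 0, 1, 2$ give thresholds $2, 1, 0$ respectively, which are three different functions of the remaining variables.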
Definition 8.1.1. $\mathrm{AC}^0$ is the class of $O(1)$-depth, polynomial size circuits with unbounded fan-in for AND, OR and NOT gates.
1. We do not have any NOT gates. As before, if a NOT gate is indeed required somewhere in a circuit, we can
get an equivalent circuit using DeMorgan’s Laws that does not have any NOT gates, but uses the negations of
variables as input gates.
2. The vertices (gates) of the circuit are partitioned into $d + 1$ layers $0, 1, \ldots, d$ such that: any edge (wire) $(u, v)$ is from a vertex $u$ in layer $i$ to a vertex $v$ in layer $i + 1$ for some $i$; the inputs $x_i, \neg x_i$, $i \in [n]$, are at layer 0; the output is at layer $d$; each layer consists of either all AND gates or all OR gates; and layers alternate between AND gates and OR gates. Here $d$ is the depth of the circuit.
We can convert the DAG for the circuit into this form after a topological sort. First group the AND gates and OR
gates together into layers. If there is an edge from a vertex u in layer i to a vertex v in layer j > i + 1, then we can
replace the edge with a path from u to v having a vertex in each intermediate layer. Each new vertex represents an
appropriate gate for the layer it is in. For each edge in the original circuit the number of gates (wires) we add is at
most the (constant) depth of the original circuit.
Note that constant-depth polynomial-sized circuits remain so even after the above transformations, so we can assume that $\mathrm{AC}^0$ circuits have these properties.
Now we are ready to prove our first lower bound on $\mathrm{AC}^0$.
Theorem 8.2.1 (Håstad [Hås86]). $\mathrm{PAR}_n \notin \mathrm{AC}^0$.
Proof. We will prove this by induction on the depth of the circuit. Suppose there exists a depth $d$, size $n^c$ circuit for $\mathrm{PAR}_n$. We will show that there exists a restriction (partial assignment) such that after applying it
• the resulting subfunction, which is either PAR or ¬PAR, depends on $n^{1/4}$ variables.
• there exists a depth $d - 1$, size $O(n^c)$ circuit for the resulting subfunction.
But by our induction hypothesis this will not be possible.
For the base case, we will show that a circuit of depth 2 computing PAR or ¬PAR must have $> 2^{n-1}$ gates. This will complete the proof.
For the base case, let's assume without loss of generality that level 1 is composed of OR gates and level 2 of an AND gate producing the output. We claim that each OR gate must depend on every input $x_i$. For suppose this is not the case, and some OR gate is independent of (say) $x_1$. Then set $x_2, x_3, \ldots, x_n$ so as to make that OR gate output 0. Then the circuit outputs 0, but the remaining subfunction is not constant. This gives a contradiction. Now, for each OR gate there is a unique input that makes it output 0. For each 0-input to PAR there must be an OR gate that outputs 0. So the number of OR gates is at least the number of 0-inputs to PAR, which is $2^{n-1}$.
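The count in the base case can be made concrete by building the AND-of-ORs circuit that has exactly one full-width OR gate per 0-input of PAR. This sketch only illustrates the correspondence between OR gates and 0-inputs; it is not a proof of the lower bound. It assumes PAR outputs 1 on inputs with an odd number of 1s, and the helper names are illustrative.

```python
from itertools import product


def parity(x):
    """PAR(x) = 1 when x has an odd number of 1s (assumed convention)."""
    return sum(x) % 2


def parity_cnf(n):
    """One full-width clause per 0-input a of PAR_n; clause a is falsified only by x = a."""
    return [a for a in product((0, 1), repeat=n) if parity(a) == 0]


def eval_cnf(clauses, x):
    """AND of clauses, where clause a is OR_i (x_i XOR a_i), i.e. 0 exactly when x == a."""
    return int(all(any(xi != ai for xi, ai in zip(x, a)) for a in clauses))


n = 5
clauses = parity_cnf(n)
print(len(clauses), 2 ** (n - 1))
print(all(eval_cnf(clauses, x) == parity(x) for x in product((0, 1), repeat=n)))
```

Each clause rules out exactly one 0-input, so covering all $2^{n-1}$ of them takes $2^{n-1}$ OR gates in this construction.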
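To prove the induction step, consider the following random restriction on the PAR function. For each $i \in [n]$, independently set
\[
x_i \longleftarrow \begin{cases} 0, & \text{with probability } \frac{1}{2} - \frac{1}{2\sqrt{n}} \\ 1, & \text{with probability } \frac{1}{2} - \frac{1}{2\sqrt{n}} \\ \ast, & \text{with probability } \frac{1}{\sqrt{n}}. \end{cases}
\]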
Here $\ast$ means that the variable is not assigned a value and is free. Repeat this random experiment with the resulting subfunction. After one restriction we get that
\[
E[\#\text{ of free variables}] = \frac{1}{\sqrt{n}} \cdot n = \sqrt{n}.
\]
Then by applying a Chernoff bound we see that with probability $\ge 1 - 2^{-O(n)}$, we have $\ge \frac{\sqrt{n}}{2}$ free variables. After two restrictions, with probability $\ge 1 - \exp(-n)$, we have $\ge \frac{n^{1/4}}{4}$ free variables. Let $\mathrm{BAD}_0$ denote the event that this is not the case.
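A single random restriction as described above can be simulated to see the number of free variables concentrate around $\sqrt{n}$. This is a minimal sketch; the function name `random_restriction` and the use of the string `'*'` for a free variable are choices made for the example.

```python
import math
import random


def random_restriction(n, rng):
    """Return a list of n entries, each 0, 1, or '*' (free), drawn independently."""
    p_free = 1 / math.sqrt(n)
    out = []
    for _ in range(n):
        u = rng.random()
        if u < p_free:
            out.append('*')   # probability 1/sqrt(n)
        elif u < p_free + (1 - p_free) / 2:
            out.append(0)     # probability 1/2 - 1/(2 sqrt(n))
        else:
            out.append(1)     # probability 1/2 - 1/(2 sqrt(n))
    return out


rng = random.Random(0)
n = 10_000
rho = random_restriction(n, rng)
print(rho.count('*'), math.sqrt(n))
```

Applying the function again to the free variables (with $n$ replaced by their number) models the second restriction.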
We make two claims:
Claim 1: After the first random restriction, with probability $\ge 1 - O\left(\frac{1}{n}\right)$, every layer-1 OR gate depends on $\le 4c$ variables.
Claim 2: If every layer-1 OR gate depends on $\le b$ variables, then after another random restriction, with probability $\ge 1 - O\left(\frac{1}{n}\right)$, every layer-2 AND gate depends on $\le \ell_b$ variables (where $\ell_b$ is a constant depending on $b$ alone).
Now, we make the observation that for a function $g : \{0,1\}^n \to \{0,1\}$ that depends only on $\ell$ of its inputs, we have a depth-2 AND-of-ORs circuit and a depth-2 OR-of-ANDs circuit computing $g$, both of size $\le \ell \cdot 2^\ell$.
Let $\mathrm{BAD}_1$ and $\mathrm{BAD}_2$ respectively denote the events that Claim 1 and Claim 2 do not hold. Then with probability $\ge 1 - O\left(\frac{1}{n}\right)$ none of the bad events $\mathrm{BAD}_0$, $\mathrm{BAD}_1$ and $\mathrm{BAD}_2$ occur. So there exists a restriction of the variables such that none of the bad events occur. That restriction leaves us with a circuit on $\ge \frac{n^{1/4}}{4}$ variables and by the above switching argument, we can make layer-2 use only OR gates and layer-1 use only AND gates. Then we can combine layers 2 and 3 to reduce depth by 1, completing the induction step.
Proof of Claim 1. Consider a layer-1 OR gate $G$. We have two cases:
Fat case: Fan-in of $G$ is $\ge 4c \log n$.
Then with high probability $G$ is set to be constant 1. To be precise,
\[
\Pr[G \text{ is not set to } 1] \le \left(\frac{1}{2} + \frac{1}{2\sqrt{n}}\right)^{4c \log n} \le \left(\frac{2}{3}\right)^{4c \log n} = n^{4c \log \frac{2}{3}} = n^{-c \log \frac{81}{16}} < n^{-2c}.
\]
Thin case: Fan-in of $G$ is $\le 4c \log n$.
\[
\Pr[G \text{ depends on} \ge 4c \text{ variables}] \le \binom{4c \log n}{4c} \left(\frac{1}{\sqrt{n}}\right)^{4c} \le (4c \log n)^{4c}\, n^{-2c} \le n^{-1.5c}.
\]
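The fat-case estimate can be checked numerically for a few values of $n$ and $c$. The sketch below assumes logarithms are base 2 (which is what makes $\log \frac{81}{16} > 2$) and takes the fan-in to be exactly $4c \log n$; it covers only the fat case, since the thin-case inequality is asymptotic and needs very large $n$.

```python
import math


def fat_case_bound(n, c):
    """(1/2 + 1/(2 sqrt(n)))^(4c log2 n): chance that no input of a fat OR gate is set to 1."""
    fan_in = 4 * c * math.log2(n)
    return (0.5 + 0.5 / math.sqrt(n)) ** fan_in


for n in (16, 1024, 2 ** 20):
    for c in (1, 2):
        print(n, c, fat_case_bound(n, c), n ** (-2 * c))
```

For $n \ge 9$ the base is at most $2/3$, so the computed value stays below $n^{-2c}$.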
Proof of Claim 2. The proof is by induction on $b$. For the base case $b = 1$, every layer-2 gate is “really” a layer-1 gate. So this case reduces to Claim 1 and we can set $\ell_1 = 4c$.
For the induction step, consider a layer-2 AND gate G. Let M be a maximal set of non-interacting OR-gates that
are in G, where by interacting we mean that two gates share an input variable. Let V be the set of inputs read by gates
in M. We again have two cases:
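Fat case: $|V| > a \log n$.
In this case, $|M| \ge \frac{|V|}{b} > \frac{a \log n}{b}$, so that
\[
\Pr[\text{a particular OR-gate in } M \text{ is set to } 0] \ge \left(\frac{1}{2} - \frac{1}{2\sqrt{n}}\right)^{b} \ge \left(\frac{1}{3}\right)^{b}.
\]
Then
\[
\Pr[\text{no OR-gate in } M \text{ is set to } 0] \le \left(1 - \frac{1}{3^b}\right)^{|M|} \le \left(1 - \frac{1}{3^b}\right)^{\frac{a \log n}{b}} \le e^{-\frac{a \log n}{b \cdot 3^b}} \le n^{-a/(b \cdot 3^b)} \le O\left(\frac{1}{n}\right), \quad \text{for } a = b \cdot 3^b.
\]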
Thin case: $|V| \le a \log n$.
\[
\Pr[\text{after restriction } V \text{ has} \ge i \text{ free variables}] \le \binom{|V|}{i} \left(\frac{1}{\sqrt{n}}\right)^{i} \le \binom{a \log n}{i} \cdot n^{-i/2} \le (a \log n)^i \cdot n^{-i/2} \le n^{-i/3}.
\]
Choose $i = 4c$. Then with probability $\ge 1 - n^{-4c/3}$, there are $\le 4c$ free variables remaining in $V$.
This implies that for every one of the $2^{4c}$ settings of these free variables, the resulting circuit computes an AND-of-ORs with bottom fan-in $\le b - 1$. This is because all OR gates that are not in $M$ interact with $V$.
This further implies that every one of these $2^{4c}$ subfunctions will (after restriction) depend on $\le \ell_{b-1}$ variables. So the whole function under $G$ depends on $\le 4c + 2^{4c} \cdot \ell_{b-1}$ variables. Set $\ell_b = 4c + 2^{4c} \cdot \ell_{b-1}$ to complete the induction step.
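The recurrence for $\ell_b$ obtained in this proof can be unrolled directly, which shows that $\ell_b$ depends only on $b$ and $c$ and not on $n$. A minimal sketch, with `ell` as an illustrative name:

```python
def ell(b, c):
    """ell_1 = 4c and ell_b = 4c + 2^(4c) * ell_{b-1}, as set in the proof of Claim 2."""
    value = 4 * c
    for _ in range(b - 1):
        value = 4 * c + 2 ** (4 * c) * value
    return value


for b in range(1, 5):
    print(b, ell(b, 1))
```

The values grow quickly with $b$, but for fixed $b$ and $c$ they are constants, as Claim 2 requires.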
If we do our calculations carefully through the steps of the induction above, we could prove a lower bound of $2^{n^{\Omega(1/d)}}$. On the other hand, we can also construct a depth-$d$ circuit of size $O\left(n \cdot 2^{n^{1/(d-1)}}\right)$.
function calculated by $G$ in the following sense: for all $\vec{a} \in \{0,1\}^n$, $p_G(\vec{a}) \in \{0,1\}$ and $\Pr_{\vec{a} \in \{0,1\}^n}[p_G(\vec{a}) \ne G(\vec{a})] \le \mathrm{small}(\ell) \approx \frac{1}{2^\ell}$, where $\ell$ is a parameter to be fixed later.
Eventually, we will get a polynomial that approximates parity and has “low” degree, which we will see is a
contradiction.
We define a polynomial that approximates an OR gate in the following manner. We want to know if the input $\vec{x}$ to the OR gate is $\vec{0}$. Take a random vector $\vec{r} \in_R \{0,1\}^n$ and compute $\vec{r} \cdot \vec{x} \bmod 3 = \sum_{i \in [n]} r_i x_i \bmod 3 = \sum_{i \in [n],\, r_i = 1} x_i \bmod 3$. If $\vec{x} = \vec{0}$, then $\vec{r} \cdot \vec{x} \bmod 3 = 0$ always. If $\vec{x} \ne \vec{0}$, then $\Pr_{\vec{r}}[\vec{r} \cdot \vec{x} \bmod 3 = 0] \le \frac{1}{2}$.
Pick $S \subseteq [n]$ uniformly at random (this is equivalent to picking $\vec{r} \in_R \{0,1\}^n$ as above). Consider the polynomial $\sum_{i \in S} x_i \in \mathbb{F}_3[x_1, x_2, \ldots, x_n]$. If $\vec{x} = \vec{0}$, this polynomial is always 0. Otherwise, this polynomial is not 0 with probability $\ge \frac{1}{2}$, that is, the square of the polynomial is 1 with probability $\ge \frac{1}{2}$.
then this is a polynomial of degree $2\ell$ and for a particular input, this choice is bad with probability $\le \frac{1}{2^\ell}$. Now, by Yao's Minimax Lemma, we have that there exists a polynomial of degree $2\ell$ that agrees with $\mathrm{OR}_n$ except at $\le 1/2^\ell$ fraction of the inputs.
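A degree-$2\ell$ approximator of this kind can be sketched in code. The exact formula is not recoverable from the text above, so the sketch assumes the standard construction $1 - \prod_{j=1}^{\ell} \bigl(1 - (\sum_{i \in S_j} x_i)^2\bigr)$ over $\mathbb{F}_3$ with independent random subsets $S_1, \ldots, S_\ell$; the function names are illustrative.

```python
import random
from itertools import product


def sample_subsets(n, ell, rng):
    """Draw ell independent uniformly random subsets of {0, ..., n-1}."""
    return [[i for i in range(n) if rng.random() < 0.5] for _ in range(ell)]


def approx_or(x, subsets):
    """Evaluate 1 - prod_j (1 - (sum_{i in S_j} x_i)^2) over F_3 (assumed construction)."""
    prod = 1
    for S in subsets:
        s = sum(x[i] for i in S) % 3
        prod = (prod * (1 - s * s)) % 3  # factor is 1 if s == 0, else 0 (mod 3)
    return (1 - prod) % 3


rng = random.Random(1)
n, ell = 8, 5
subsets = sample_subsets(n, ell, rng)
errors = sum(approx_or(x, subsets) != int(any(x)) for x in product((0, 1), repeat=n))
print(errors, 2 ** n)
```

The approximator is one-sided: it always outputs 0 on the all-zero input, and it can only err by outputting 0 on a nonzero input, which happens when every sampled subset sums to 0 mod 3.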
Now we topologically sort the circuit $C$ to get the order: $x_1 = g_1, x_2 = g_2, \ldots, x_n = g_n, g_{n+1}, g_{n+2}, \ldots, g_s$, where $s = \mathrm{size}(C)$. For $i = 1$ to $s$, we write a polynomial $p_{g_i}$ corresponding to gate $g_i$ that approximates the Boolean function computed at $g_i$:
• For $i \le n$, $p_{g_i} = x_i$.
\[
\Pr_{\vec{a} \in \{0,1\}^n}[f(\vec{a}) \ne \mathrm{PAR}(\vec{a})] \le (\text{number of OR gates}) \cdot \frac{1}{2^\ell} \le \frac{s}{2^\ell}.
\]
0 −→ FALSE −→ +1
1 −→ TRUE −→ −1 .
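Then we have that there exists $A \subseteq \{-1, 1\}^n$ and a polynomial $\hat{f} \in \mathbb{F}_3[x_1, \ldots, x_n]$ such that:
1. $\deg(\hat{f}) = \deg(f) \le \sqrt{n}$, and
2. $\forall \vec{a} \in A : \hat{f}(\vec{a}) = \mathrm{PAR}(\vec{a}) = \prod_{i=1}^{n} a_i$, where PAR is taken in the $\pm 1$ representation.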
Consider $\mathbb{F}_3^A = \{\text{all functions} : A \to \mathbb{F}_3\}$, so that $|\mathbb{F}_3^A| = |\mathbb{F}_3|^{|A|} = 3^{|A|}$. Pick a function $\varphi : A \to \mathbb{F}_3$. Then $\varphi$ has a multilinear polynomial representation $g(x_1, \ldots, x_n)$ that agrees with $\varphi$ on $\{-1, 1\}^n$ (hence on $A$). The polynomial $g(x_1, \ldots, x_n)$ is of the form $\sum (\text{monomials}) \cdot (\text{coefficients})$, where each monomial is of the form $\prod_{i \in I} x_i$ for some $I \subseteq [n]$.
Note that, in $\mathbb{F}_3$,
\[
\prod_{i \in I} x_i = \prod_{i \in [n]} x_i \cdot \prod_{i \in [n] \setminus I} x_i = \hat{f}(x_1, \ldots, x_n) \cdot \prod_{i \in [n] \setminus I} x_i
\]
for $\vec{x} \in A$ (required for the last equality). The LHS above has degree $|I|$, while the RHS has degree $\sqrt{n} + n - |I|$. Applying this to every monomial of degree $> \frac{n}{2} + \sqrt{n}$, we get a polynomial $\hat{g}$ such that $g(x_1, \ldots, x_n) = \hat{g}(x_1, \ldots, x_n)$ for $\vec{x} \in A$ with $\deg(\hat{g}) \le \frac{n}{2} + \sqrt{n}$.
Now, the number of multilinear polynomials in $\mathbb{F}_3[x_1, \ldots, x_n]$ of degree $\le \frac{n}{2} + \sqrt{n}$ is
\[
3^{(\#\text{monomials of degree} \le n/2 + \sqrt{n})} = 3^{\sum_{i=0}^{n/2 + \sqrt{n}} \binom{n}{i}} \le 3^{0.98 \times 2^n}
\]
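The constant in the exponent comes from the fraction of subsets of $[n]$ of size at most $\frac{n}{2} + \sqrt{n}$, which is two standard deviations above the mean of a Binomial$(n, \frac{1}{2})$ variable. The sketch below computes this fraction exactly for one large $n$; the function name and the choice $n = 10000$ are illustrative.

```python
import math


def low_degree_fraction(n):
    """Fraction of the 2^n multilinear monomials with degree <= n/2 + sqrt(n)."""
    cutoff = math.floor(n / 2 + math.sqrt(n))
    count = sum(math.comb(n, i) for i in range(cutoff + 1))
    return count / 2 ** n  # exact big-integer ratio, rounded to a float


print(low_degree_fraction(10_000))
```

The fraction approaches the normal tail value $\Phi(2) \approx 0.977$ as $n$ grows, which is where a constant just below 0.98 comes from.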
In a circuit we can also have $\mathrm{MOD}_3$ gates. A $\mathrm{MOD}_3$ gate taking inputs $x_1, \ldots, x_n$ produces output $y = 1$ if $\sum_i x_i \bmod 3 \ne 0$ and $y = 0$ otherwise. If we denote the class of such circuits as $\mathrm{AC}^0[3]$, then the above proof also shows that $\mathrm{PAR}_n \notin \mathrm{AC}^0[3]$.
In general, if we have $\mathrm{MOD}_m$ gates taking inputs $x_1, \ldots, x_n$ and producing output $y = 1$ if $\sum_i x_i \bmod m \ne 0$ and $y = 0$ otherwise, then we have the following definition.
Definition 8.3.2. $\mathrm{AC}^0[m]$ is the class of $O(1)$-depth, polynomial size circuits with unbounded fan-in using AND, OR, NOT and $\mathrm{MOD}_m$ gates.
Definition 8.3.3. $\mathrm{ACC}^0$ is the class of $O(1)$-depth, polynomial size circuits with unbounded fan-in using AND, OR, NOT and $\mathrm{MOD}_{m_1}, \mathrm{MOD}_{m_2}, \ldots, \mathrm{MOD}_{m_k}$ gates, for some constants $m_1, m_2, \ldots, m_k$.
We can think of $\mathrm{ACC}^0$ as the class of $\mathrm{AC}^0$ circuits but with “counters”.
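The $\mathrm{MOD}_m$ gate defined above is simple to state as code. This is a minimal sketch with an illustrative function name; it follows the convention in the text that the gate outputs 1 when the sum is not divisible by $m$.

```python
def mod_gate(m, inputs):
    """MOD_m gate: output 1 if the sum of the 0/1 inputs is nonzero mod m, else 0."""
    return int(sum(inputs) % m != 0)


print(mod_gate(3, [1, 1, 1]), mod_gate(3, [1, 1, 0]), mod_gate(2, [1, 0, 1, 1]))
```

With $m = 2$ the gate computes parity, which is why the result $\mathrm{PAR}_n \notin \mathrm{AC}^0[3]$ concerns a modulus different from 2.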
Bibliography
[BFP+ 73] Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert Endre Tarjan. Time
bounds for selection. JCSS, 7(4):448–461, 1973.
[BJ85] Samuel W. Bent and John W. John. Finding the median requires 2n comparisons. In STOC, pages 213–216,
1985.
[DZ99] Dorit Dor and Uri Zwick. Selecting the median. SICOMP, 28(5):1722–1758, 1999.
[DZ01] Dorit Dor and Uri Zwick. Median selection requires (2 + ε)n comparisons. SIDMA, 14(3):312–325, 2001.
[FG79] Frank Fussenegger and Harold N. Gabow. A counting approach to lower bounds for selection problems.
JACM, 26(2):227–238, 1979.
[Hås86] Johan Håstad. Almost optimal lower bounds for small depth circuits. In STOC, pages 6–20, 1986.
[Raz87] Alexander A. Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with
logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.
[Sip06] Michael Sipser. Introduction to the Theory of Computation. Thomson Course Technology, second edition, 2006.
[Smo87] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In
STOC, pages 77–82, 1987.
[SPP76] Arnold Schönhage, Mike Paterson, and Nicholas Pippenger. Finding the median. JCSS, 13(2):184–199,
1976.