
CS85: You Can’t Do That

(Lower Bounds in Computer Science)

Lecture Notes, Spring 2008

Amit Chakrabarti
Dartmouth College

Latest Update: May 9, 2008


Lecture 1
Comparison Trees: Sorting and Selection

Scribe: William Chen

1.1 Sorting
Definition 1.1.1 (The Sorting Problem). Given n items (x_1, x_2, …, x_n) from an ordered universe, output a permutation σ : [n] → [n] such that x_{σ(1)} < x_{σ(2)} < · · · < x_{σ(n)}.

At present we do not know how to prove superlinear lower bounds for the Turing machine model for any problem
in NP, so we instead restrict ourselves to a more limited model of computation, the comparison tree. We begin by
considering deterministic sorting algorithms.

Deterministic Sorting Algorithms


Whereas the Turing machine is allowed to move heads and read and write symbols at every time step, in a comparison tree T we are only allowed to make comparisons and read the results. Without loss of generality we'll assume that the x_i's are distinct, so every comparison results in either a "<" or a ">". At each internal node of T we make a comparison, the result of which tells us the next comparison to make by directing us to either the left or the right child. At each leaf node we must have enough information to be able to output the sorted list of items. The cost of a sorting algorithm in this model is then just the height of such a comparison tree T, i.e., the maximum depth of a leaf of T.
Now note that T is in fact a binary tree, so we have

    ℓ ≤ 2^h,   i.e.,   h ≥ ⌈lg ℓ⌉,

where ℓ is the number of leaves and h is the height of T. But now we note that since there are n! possible permutations, any correct tree must have at least n! leaves; so then, by Stirling's formula, we have

    h ≥ ⌈lg n!⌉ = Ω(n lg n).
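As a quick numerical companion (our own illustration, not from the original notes), the following Python sketch compares ⌈lg n!⌉ against the number of comparisons an actual sort performs on random inputs; the Counted wrapper is a hypothetical helper that counts calls to <.

    import math, random

    # Illustration only: any comparison sort must make at least
    # ceil(lg n!) comparisons in the worst case. We count comparisons
    # made by Python's built-in sort via a wrapper class.
    class Counted:
        total = 0
        def __init__(self, v):
            self.v = v
        def __lt__(self, other):
            Counted.total += 1
            return self.v < other.v

    for n in (8, 16, 32, 64):
        bound = math.ceil(math.log2(math.factorial(n)))
        worst = 0
        for _ in range(50):
            xs = [Counted(v) for v in range(n)]
            random.shuffle(xs)
            Counted.total = 0
            sorted(xs)
            worst = max(worst, Counted.total)
        print(f"n={n:3d}  ceil(lg n!)={bound:4d}  max comparisons seen={worst}")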

Open Problem. Prove that some language L ∈ NP cannot be decided in O(n) time on a Turing machine.

Randomized Sorting Algorithms


Now we turn our attention to randomized comparison-based sorting algorithms and prove that in this case access to a
constant time random number generator does not give us any additional power. That is to say, it does not change our
lower bound. An example of such an algorithm is the variant of quicksort where, at each recursive call, the pivot is


chosen uniformly at random. Note that the “randomness” in any randomized algorithm can be captured as an infinite
binary “random string”, chosen at the beginning, such that at every call of the random number generator, we simply
read off the next bit (or the next few bits) on the random string. Then, we can say that a randomized algorithm is just
a probability distribution over deterministic algorithms (each obtained by fixing the random string). In other words,
when calling the randomized algorithm, one can move all the randomness to the beginning, and simply pick at random
a deterministic algorithm to call.
With this view of randomized algorithms in mind, we describe an extremely useful lemma due to Andrew Chi-Chih
Yao (1979). But first, some preliminaries.
Suppose we have some notion of “cost” of an algorithm (for some function/problem f ) on an input, i.e., a (non-
negative) function cost : A × X → R+ , where A is a space of (deterministic) algorithms and X is a space of inputs.
An example of a useful cost measure is “running time.” In what follows, we will consider random algorithms A ∼ λ,
where λ is a probability distribution on A . Note that, per our above remarks, a randomized algorithm is specified by
such a distribution λ. We will also consider a random input X ∼ µ, for some distribution µ on X .

Definition 1.1.2 (Distributional, worst-case and randomized complexity). In the above setting, the term E_μ[cost(a, X)] is then just the expected cost of a particular algorithm a ∈ A on a random input X ∼ μ. The quantity

    D_μ(f) = min_{a∈A} E_μ[cost(a, X)],

where f is the function we want to compute, is called the distributional complexity of f according to input distribution μ. We contrast this with the worst-case complexity of f, which we can write as

    C(f) = min_{a∈A} max_{x∈X} cost(a, x).

Finally, we define R(f), the randomized complexity of f, as

    R(f) = min_λ max_{x∈X} E_λ[cost(A, x)],

where the minimum is over probability distributions λ on A and A ∼ λ.

From the definitions above it follows easily that

    ∀μ : D_μ(f) ≤ C(f)   and   R(f) ≤ C(f).

To obtain the latter inequality, consider the trivial distributions λ that are each supported on a single algorithm a ∈ A .

Theorem 1.1.3 (Yao's Minimax Lemma). We have

1. (The Easy Half) For all input distributions μ, D_μ(f) ≤ R(f).

2. (The Difficult Half) max_μ D_μ(f) = R(f).

The proof of part (2) is somewhat non-trivial and requires the use of the linear programming duality theorem. Moreover, we do not need part (2) at this point, so we shall only prove part (1). We note in passing that part (2) can be written as

    max_μ min_a E_μ[cost(a, X)] = min_λ max_x E_λ[cost(A, x)].

Proof of Part (1). To visualize this proof, it helps to consider the following table, where X = {x_1, x_2, x_3, …}, A = {a_1, a_2, a_3, …} and c_ij = cost(a_j, x_i):

            a_1    a_2    a_3    ···
    x_1    c_11   c_12   c_13   ···
    x_2    c_21   c_22   c_23   ···
    x_3    c_31   c_32   c_33   ···
     ⋮       ⋮      ⋮      ⋮     ⋱


While reading the argument below, think of row averages and column averages in the above table.
Define R_λ(f) = max_x E_λ[cost(A, x)]. Then, for all x, we have

    E_λ[cost(A, x)] ≤ R_λ(f),

whence

    E_μ[E_λ[cost(A, X)]] ≤ R_λ(f).

Since expectation is essentially just summation, we can switch the order of summation to get

    E_λ[E_μ[cost(A, X)]] ≤ R_λ(f),

which implies that there must be an algorithm a such that E_μ[cost(a, X)] ≤ R_λ(f). Since the distributional complexity D_μ(f) takes the minimum over such a, it is clear that

    D_μ(f) ≤ R_λ(f).

Since this holds for any λ, we have D_μ(f) ≤ min_λ R_λ(f) = R(f).
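For concreteness, here is a small numerical sanity check of the easy half (our own sketch; the cost matrix and the distributions are arbitrary choices, not from the lecture). Any fixed input distribution μ gives a D_μ value no larger than an upper estimate of R obtained by sampling algorithm distributions λ.

    import numpy as np

    # Rows are inputs, columns are algorithms: c[i][j] = cost(a_j, x_i).
    rng = np.random.default_rng(0)
    c = rng.integers(1, 10, size=(5, 4))

    mu = rng.dirichlet(np.ones(5))          # a random input distribution
    D_mu = (mu @ c).min()                   # best deterministic algorithm vs mu

    # R(f) = min over lambda of max_x E_lambda[cost]; sampling lambdas
    # only upper-bounds this min, which is all we need here.
    R_upper = min((c @ lam).max() for lam in rng.dirichlet(np.ones(4), size=2000))

    print(f"D_mu = {D_mu:.3f} <= R <= {R_upper:.3f}")
    assert D_mu <= R_upper + 1e-9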


Before returning to the issue of a lower bound for randomized sorting, we introduce another lemma.
Lemma 1.1.4 (Markov’s Inequality). If X is a random variable taking only nonnegative values, then for all t > 0, we
have Pr[X ≥ t] ≤ E[X ]/t.
Proof. An easy one-liner: E[X ] = E[X | X < t] · Pr[X < t] + E[X | X ≥ t] · Pr[X ≥ t] ≥ 0 + t · Pr[X ≥ t].
Corollary 1.1.5. Under the above conditions, for any constant c > 0, Pr[X > c E[X ]] ≤ 1/c.
Let m denote the number of comparisons made on any particular run, or equivalently its runtime; note that m is a function of both the algorithm and the input. Consider a randomized sorting algorithm (i.e., a distribution λ over comparison trees), and set

    C = max_{input} E_λ[m].

Theorem 1.1.6 (Randomized sorting lower bound). Let T(x) denote the expected number of comparisons made by a randomized n-element sorting algorithm on input x. Let C = max_x T(x). Then C = Ω(n lg n).
Proof. Let A denote the space of all (deterministic) comparison trees that sort an n-element array and let X denote the space of all permutations of [n]. Consider a randomized sorting algorithm given by a probability distribution λ on A. As it stands, a "randomized sorting algorithm" is always correct on every input, but has a random runtime (between Ω(n) and O(n²)) that depends on its input. (Such algorithms are called Las Vegas algorithms.)

We want to convert each a ∈ A into a corresponding algorithm a′ that has a nontrivially bounded runtime, but may sometimes spit out garbage. We do this by allowing at most 10C comparisons, and outputting some garbage if the sorted permutation is still unknown. Let A′ denote a random algorithm corresponding to a random A ∼ λ. Corollary 1.1.5 implies

    Pr[A′ is wrong on x] = Pr[NumComps(A, x) > 10C] ≤ 1/10.

Now define

    cost(a′, x) = 1 if a′ is wrong on input x, and 0 otherwise.

We can then rewrite the above inequality as

    max_x E_λ[cost(A′, x)] ≤ 1/10.


By Yao's minimax lemma, for all input distributions μ, we have

    D_μ(sorting with ≤ 10C comparisons) ≤ 1/10.

Now pick μ to be the uniform distribution on X. Pick any deterministic algorithm a with ≤ 10C comparisons that achieves the minimum in the definition of D_μ. Then, for X ∼ μ, we have Pr[a is wrong on X] ≤ 1/10. Now the height of a's tree is ≤ 10C, so the number of leaves of a's tree is ≤ 2^{10C}, and thus the number of distinct permutations that a can output is ≤ 2^{10C}. But now we see that

    Pr[X is one of the permutations output by a] ≥ 9/10,

so 2^{10C} ≥ (9/10) · n! and hence, by Stirling's formula,

    C ≥ (1/10) lg((9/10) n!) = Ω(n lg n).
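As an aside, here is a tiny empirical illustration (ours, not part of the proof) of the Las Vegas behaviour discussed above: randomized quicksort's expected comparison count on a fixed input, compared against ⌈lg n!⌉.

    import math, random

    def quicksort_comparisons(arr):
        """Model comparison count of randomized quicksort on distinct items."""
        if len(arr) <= 1:
            return 0
        pivot = random.choice(arr)
        left = [x for x in arr if x < pivot]
        right = [x for x in arr if x > pivot]
        # len(arr) - 1 comparisons against the pivot, then recurse
        return len(arr) - 1 + quicksort_comparisons(left) + quicksort_comparisons(right)

    n = 64
    x = list(range(n))          # any fixed input permutation
    trials = 500
    avg = sum(quicksort_comparisons(x) for _ in range(trials)) / trials
    print(f"avg comparisons ~ {avg:.1f}, lower bound ceil(lg n!) = "
          f"{math.ceil(math.log2(math.factorial(n)))}")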

1.2 Selection
Definition 1.2.1 (The Selection Problem). Given n items (x_1, x_2, …, x_n) from an ordered universe and an integer k ∈ [n], the selection problem is to return the kth smallest element, i.e., the x_i such that |{j : x_j < x_i}| = k − 1. Let V(n, k) denote the height of the shortest comparison tree that solves this problem.

Special Cases. The minimum (resp. maximum) selection problems, where k = 1 (resp. k = n), and the median problem, where k = ⌈n/2⌉. Minimum and maximum can clearly be done in O(n) time. Other selection problems are less trivial.

1.2.1 Minimum Selection


Theorem 1.2.2. Let T be a comparison tree that finds the minimum of n elements. Then every leaf of T has depth at least n − 1. Since T is binary, it must therefore have at least 2^{n−1} leaves.

Proof. Consider any leaf λ of T. If x_i is the output at λ, then every x_j (j ≠ i) must have won at least one comparison on the path leading to λ (if not, we cannot know for sure that x_j is not the minimum). Since each comparison can be won by at most one element, we have depth(λ) ≥ n − 1.

Corollary 1.2.3. V (n, 1) = n − 1.
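A minimal sketch of the matching upper bound (our own code): a single scan finds the minimum with exactly n − 1 comparisons, meeting V(n, 1) = n − 1.

    def find_min(xs):
        comparisons = 0
        best = xs[0]
        for x in xs[1:]:
            comparisons += 1          # one comparison per remaining element
            if x < best:
                best = x
        return best, comparisons

    print(find_min([5, 3, 8, 1, 7]))  # (1, 4): n = 5 items, 4 comparisons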

1.2.2 General Selection


By an adversarial argument, Hyafil [Hya76] proved that V(n, k) ≥ n − k + (k − 1)⌈lg(n/(k − 1))⌉. This was strengthened by Fussenegger and Gabow [FG79] to the bound stated in the next theorem. Both these results give poor bounds for k ≈ n. However, observe that V(n, k) = V(n, n − k + 1), since the kth smallest element is the (n − k + 1)th largest.

Note. For the special case where k = 2, we have V(n, 2) = n − 2 + ⌈lg n⌉. It is a good exercise to prove this: both the upper and the lower bound are interesting. Also, k ∈ {1, 2, n − 1, n} are the only values for which we know V(n, k) exactly.

Theorem 1.2.4 (Fussenegger & Gabow). V(n, k) ≥ n − k + ⌈lg (n choose k−1)⌉.


Proof. Let T be a comparison tree for selecting the kth smallest element. At a leaf λ of T, if we output x_i, then every x_j (j ≠ i) must be "settled" with respect to x_i, i.e., we must know whether or not x_j < x_i. In particular, we must know the set

    S_λ = {x_j : x_j < x_i}.

Fix a particular set U ⊆ {x_1, …, x_n} with |U| = k − 1, and write Ū for its complement. Now consider the following set of leaves:

    L_U = {λ : S_λ = U}.

Notice that the output at any leaf in L_U is the minimum of the elements of Ū. Therefore, if we treat U as "given," and prune T accordingly, by eliminating any nodes that perform comparisons involving one or more elements of U, then the residual tree is one that has leaf set L_U and finds the minimum of the n − k + 1 elements of Ū.

By Theorem 1.2.2, the pruned tree must have at least 2^{n−k} leaves. Therefore, |L_U| ≥ 2^{n−k}. The number of leaves of T is at least the sum of |L_U| over all (n choose k−1) choices for the set U. Therefore, the height of T must satisfy

    height(T) ≥ lg( (n choose k−1) · 2^{n−k} ) = n − k + lg (n choose k−1).

This proves the theorem.

1.2.3 Median Selection


Turning now to the median selection special case, recall that by Stirling's formula we have

    (n choose n/2) = Θ(2ⁿ/√n),

which already gives us a good initial lower bound for median selection:

    V(n, n/2) ≥ n/2 + lg (n choose n/2 − 1) ≥ n/2 + n − (1/2) lg n − O(1) = (3/2)n − o(n).

Using more sophisticated techniques, one can prove the following stronger lower bounds:

    V(n, n/2) ≥ 2n − o(n).             [BJ85]
    V(n, n/2) ≥ (2 + 2⁻⁵⁰)n − o(n).    [DZ01]

On the upper bound side, we again have nontrivial results, starting with the famous "groups of five" algorithm:

    V(n, n/2) ≤ 6n + o(n).     [BFP+73]
    V(n, n/2) ≤ 3n + o(n).     [SPP76]
    V(n, n/2) ≤ 2.995n.        [DZ99]

Lecture 2
Boolean Decision Trees

Scribe: William Chen

2.1 Definitions and Basic Theorems


Definition 2.1.1 (Decision Tree Complexity). Let f : {0,1}ⁿ → {0,1} be some Boolean function. We sometimes write f_n when we want to emphasize the input size. A decision tree T models a computation that adaptively reads the bits of an input x, branching based on each bit read. Thus, at each internal node of T, we read a specific bit (indicated by the label of the node) and branch left or right depending upon whether the bit read was a '0' or a '1'. At each leaf node we must have enough information to determine f(x). We define the deterministic decision tree complexity D(f) of f to be

    D(f) = min{height(T) : T is a decision tree that evaluates f}.

For example, easy adversarial arguments show that for the "or", "and" and "parity" functions, we have D(OR_n) = D(AND_n) = D(PAR_n) = n. It is convenient to introduce a term for functions that are maximally hard to evaluate in the decision tree model.

Definition 2.1.2 (Evasiveness). A function f : {0,1}ⁿ → {0,1} is said to be evasive (a.k.a. elusive) if D(f) = n.
We shall try to relate this algorithmically defined quantity D(f) to several more combinatorial measures of the hardness of f, which we now define.
Definition 2.1.3 (Certificate Complexity). For a function f : {0,1}ⁿ → {0,1}, define a 0-certificate to be a string α ∈ {0,1,∗}ⁿ such that ∀x ∈ {0,1}ⁿ : x matches α ⇒ f(x) = 0. That is to say, for any input x that agrees with α on all non-∗ bits, we have f(x) = 0, no matter the settings of the free variables given by the ∗'s. Define the size of a certificate α to be the number of exposed (i.e., non-∗) bits. Then, define the 0-certificate complexity of f as

    C_0(f) = max_{x ∈ f⁻¹(0)} min{size(α) : α is a 0-certificate that matches x}.

We define 1-certificates and C_1(f) similarly. We use the convention that C_0(f) = ∞ if f ≡ 1, and similarly for C_1(f). Finally, we define the certificate complexity C(f) of a function f to be the larger of the two, that is,

    C(f) = max{C_0(f), C_1(f)}.

Example 2.1.4. C_1(OR_n) = 1, whereas C_0(OR_n) = n, so C(OR_n) = n.


The quantity C(f) is sometimes called the nondeterministic decision tree complexity. Note that for any decision tree for f and any input x, the bits read along the path that x follows form a certificate for x, so the tree's height is at least the minimum certificate size for x; taking the worst x gives C(f) ≤ D(f).


Definition 2.1.5 (Sensitivity). For a function f : {0,1}ⁿ → {0,1}, define the sensitivity s_x(f) of f on a particular input x as follows:

    s_x(f) = |{i ∈ [n] : f(x^(i)) ≠ f(x)}|,

where we write x^(i) to denote the string x with the ith bit flipped. We then define the sensitivity of f as

    s(f) = max_x s_x(f).

Definition 2.1.6 (Block Sensitivity). For a block B ⊆ [n] of bit positions, where n = |x|, let x^(B) be the string x with all bits indexed by B flipped. Define the block sensitivity bs_x(f) of f on a particular input x as

    bs_x(f) = max{b : {B_1, B_2, …, B_b} is a set of disjoint blocks such that ∀i ∈ [b] : f(x) ≠ f(x^(B_i))}.

That is to say, bs_x(f) is the maximum number of disjoint blocks that are each sensitive for f on input x. We then define the block sensitivity of f as

    bs(f) = max_x bs_x(f).

Note that since block sensitivity is just a generalization of sensitivity (which requires the "blocks" to each have size 1), we have bs_x(f) ≥ s_x(f), and thus bs(f) ≥ s(f). Before we go on to develop some of the theory behind this, consider the following example.

Example 2.1.7. Consider a function f(x) defined as follows, where the input x is of length n and is arranged in a √n × √n matrix. Assume √n is an even integer. Define

    f(x) = 1, if some row of the matrix matches 0∗110∗;
           0, otherwise.

Since each row can have at most 2 sensitive bits, we have s(f) ≤ 2√n = O(√n). However, for block sensitivity we have bs(f) ≥ bs_z(f) ≥ n/2, where z = 0ⁿ: partition each row into √n/2 blocks of 2 consecutive bits, giving n/2 disjoint blocks in all; flipping any one block creates a row matching 0∗110∗, which makes every block sensitive. This gives a quadratic gap between s(f) and bs(f).
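For tiny n this can be checked by brute force. The following sketch (our own helper names; note that at n = 16 the two quantities coincide, since the s-vs-bs gap only opens up asymptotically) verifies s(f) = 2√n and that the n/2 aligned pairs at z = 0ⁿ are all sensitive.

    import re
    from itertools import product

    r, n = 4, 16                     # a 4x4 matrix, so sqrt(n) = 4
    pat = re.compile("0*110*")

    def f(x):                        # x is a tuple of n bits
        return int(any(pat.fullmatch("".join(map(str, x[i*r:(i+1)*r])))
                       for i in range(r)))

    table = {x: f(x) for x in product((0, 1), repeat=n)}

    def flip(x, idxs):
        return tuple(1 - b if i in idxs else b for i, b in enumerate(x))

    s = max(sum(table[flip(x, {i})] != fx for i in range(n))
            for x, fx in table.items())
    print("s(f) =", s)               # 8 = 2*sqrt(n)

    z = (0,) * n
    blocks = [{i, i + 1} for i in range(0, n, 2)]   # aligned pairs within rows
    print("sensitive blocks at z:",
          sum(table[flip(z, B)] != table[z] for B in blocks), "= n/2 =", n // 2)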

Theorem 2.1.8. s( f ) ≤ bs( f ) ≤ C( f ) ≤ D( f ).

Proof. The leftmost and rightmost inequalities have already been established above. To see that the middle inequality
holds, observe that every certificate must expose at least one bit per sensitive block.

Here we note that there can be a quadratic gap between s(f) and bs(f), as evidenced by Example 2.1.7. It turns out that the gap between C(f) and D(f) is at most quadratic, which leads us to the next theorem, whose proof we will see as a corollary later in this section.

Theorem 2.1.9 (Blum-Impagliazzo). D(f) ≤ C(f)².



Example 2.1.10. Consider the function g = AND_√n ◦ OR_√n. That is to say, g divides the input x with |x| = n into √n groups of √n bits each, feeds each group into the subfunction OR_√n, and then feeds the results of the OR_√n's into AND_√n. To certify 0, we need only expose all √n of the inputs in one OR-subtree to show that they are all 0. To certify 1, we need only expose one input in each of the √n OR-subtrees to show that there is at least a single 1 going into every OR. Then, we have

    C(g) = √n;

however, it is easy to see that given any proper subset of the input, the final answer can still depend on the rest of the input, and thus D(g) = n. Therefore, g is evasive.
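Both claims can be verified by brute force at a toy size. The sketch below (our own exponential-time helpers, for n = 4 only) computes D(g) exactly via the minimax recursion D = 1 + min over queried bits of the max over answers.

    import math
    from functools import lru_cache
    from itertools import product

    n = 4
    def g(x):                                    # AND_2 of OR_2 blocks
        return int((x[0] | x[1]) & (x[2] | x[3]))

    def completions(alpha):                      # all inputs matching alpha
        free = [i for i, a in enumerate(alpha) if a == '*']
        for bits in product((0, 1), repeat=len(free)):
            x = [int(a) for a in alpha.replace('*', '0')]
            for i, b in zip(free, bits):
                x[i] = b
            yield tuple(x)

    @lru_cache(maxsize=None)
    def D(alpha):
        vals = {g(x) for x in completions(alpha)}
        if len(vals) == 1:                       # restriction already constant
            return 0
        return 1 + min(
            max(D(alpha[:i] + '0' + alpha[i+1:]), D(alpha[:i] + '1' + alpha[i+1:]))
            for i, a in enumerate(alpha) if a == '*')

    print("D(g) =", D('*' * n))                  # 4 = n: g is evasive
    print("sqrt(n) =", math.isqrt(n))            # C(g) = sqrt(n) = 2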


2.1.1 Degree of a Boolean Function


Definition 2.1.11 (Representative Polynomial). For a function f : {0,1}ⁿ → {0,1}, we say that a polynomial p(x_1, x_2, …, x_n) represents f if for all a ∈ {0,1}ⁿ we have p(a) = f(a).

Definition 2.1.12 (Degree of a Boolean Function). For a function f : {0,1}ⁿ → {0,1} with representative polynomial p, the degree deg(f) of f is then just the degree of p.
Theorem 2.1.13. For any such f, there exists a unique multilinear polynomial that represents f.

Proof. Let {a_1, a_2, …, a_k} be the set of all inputs that make f = 1. We claim that for each b ∈ {0,1}ⁿ there exists a polynomial p_b(x_1, …, x_n) such that, for all c ∈ {0,1}ⁿ,

    p_b(c) = 1 if c = b, and 0 otherwise.

To see this, for any such vector b, consider the polynomial

    p_b(x_1, …, x_n) = ∏_{i=1}^{n} ( x_i b_i + (1 − x_i)(1 − b_i) ).

On {0,1}-valued inputs, the ith factor equals x_i when b_i = 1 and 1 − x_i when b_i = 0; so it is 1 if and only if x_i = b_i, and p_b(c) = 1 iff c = b. It follows that the multilinear polynomial

    p = ∑_{i=1}^{k} p_{a_i}

represents f: since the a_i are distinct, at most one p_{a_i} evaluates to 1, and that happens exactly when the input is one of the accepting inputs of f.

To prove uniqueness, suppose two multilinear polynomials p, q both represent f. Then r = p − q is multilinear and satisfies r(a) = 0 for all a ∈ {0,1}ⁿ. Suppose r is not identically zero, and consider a monomial x_{i_1} x_{i_2} · · · x_{i_k} of r that is minimal under inclusion among those with nonzero coefficient. Evaluate r at the input that is 1 exactly in positions i_1, …, i_k: every other monomial with nonzero coefficient contains some variable outside {i_1, …, i_k} and therefore vanishes, so r evaluates to the coefficient of x_{i_1} · · · x_{i_k}, which is nonzero. This contradicts r ≡ 0 on {0,1}ⁿ, so p = q.
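Concretely, the unique multilinear representation can be computed by Möbius inversion over subsets: the coefficient of ∏_{i∈S} x_i is ∑_{T⊆S} (−1)^{|S|−|T|} f(1_T). A small sketch (our own helper names):

    from itertools import product

    def multilinear_coeffs(f, n):
        """Coefficients of the unique multilinear polynomial representing f."""
        coeffs = {}
        for S in product((0, 1), repeat=n):
            idx = [i for i in range(n) if S[i]]
            c = 0
            for T in product((0, 1), repeat=len(idx)):
                x = [0] * n
                for i, t in zip(idx, T):
                    x[i] = t
                c += (-1) ** (len(idx) - sum(T)) * f(tuple(x))
            if c:
                coeffs[tuple(idx)] = c
        return coeffs

    # Example: OR on 2 bits is represented by x1 + x2 - x1*x2.
    print(multilinear_coeffs(lambda x: int(any(x)), 2))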
Theorem 2.1.14. For a Boolean function f, let p be its representative polynomial. Then we have deg(p) ≤ D(f).

Proof. Fix a decision tree for f of height D(f). For any leaf λ, without loss of generality let x_1, x_2, …, x_r be the queries made along the path to that leaf, and let b_1, b_2, …, b_r be the results of the queries. Then for every leaf λ, write a polynomial p_λ given as follows:

    p_λ = ∏_{i : b_i = 1} x_i · ∏_{i : b_i = 0} (1 − x_i),

which is 1 exactly on the inputs that reach λ. Clearly each p_λ has degree r ≤ D(f); then the polynomial

    p = ∑_{λ : λ outputs 1} p_λ

represents f and also has degree ≤ D(f).


Example 2.1.15. We present an example to show that deg(f) can be much smaller than s(f). Assume n = 3^k for some integer k, and consider the following function on an input vector x of size n, defined recursively:

    R-NAE(x) = NAE(x_1, x_2, x_3)   if |x| = 3;
    R-NAE(x) = NAE( R-NAE(x_1, …, x_{n/3}), R-NAE(x_{n/3+1}, …, x_{2n/3}), R-NAE(x_{2n/3+1}, …, x_n) )   if |x| > 3,

where NAE is defined as follows:

    NAE(x, y, z) = 1 if x, y, z are not all equal; 0 if x = y = z.

In other words, this is a k-layer tree of NAE computations applied recursively to the 3^k initial inputs. For x = 0ⁿ we have s_x(R-NAE) = 3^k, since changing any single bit flips the answer. Now observe that we can alternatively write the NAE(x, y, z) function as a degree-2 polynomial:

    NAE(x, y, z) ≡ x + y + z − xy − yz − zx.

Then at the lowest level we have 3^k polynomials of degree 1 (the coordinates of the input vector), at the second-lowest level 3^{k−1} polynomials of degree 2, and at the root a single polynomial of degree 2^k. This is polynomially smaller than the sensitivity s_x(R-NAE) = 3^k. Writing this in terms of n, we have

    deg(f) = n^{log₃ 2},   s(f) = n.

Now returning to the example g = AND_√n ◦ OR_√n, we have

    deg(g) = n,   s(g) ≤ √n = C(g).

To see why deg(g) = n, note that OR_√n and AND_√n each have representing polynomials of degree √n, and composing them yields the monomial x_1 x_2 · · · x_n with nonzero coefficient, so deg(g) = √n · √n = n. (This is consistent with deg(g) ≤ D(g): recall from Example 2.1.10 that D(g) = n, i.e., g is evasive.)

Theorem 2.1.16 (Beals-Buhrman-Cleve-Mosca-de Wolf).

    D(f) ≤ C_1(f) bs(f)   and   D(f) ≤ C_0(f) bs(f).

Proof. Without loss of generality we consider only the first bound, D(f) ≤ C_1(f) bs(f), as the case of C_0(f) bs(f) is symmetric. We proceed by induction on the number of variables, i.e., the size of the input n.

The base case n = 1 is trivial. Before we continue with the induction step, we define a subfunction of f as follows:

    f|_α : {0,1}^{n−k} → {0,1}, obtained from f by fixing the exposed bits to their values in α,

where α ∈ {0,1,∗}ⁿ exposes k bits, i.e., has exactly k characters in {0,1}. If f has no 1-certificate, then f ≡ 0 and D(f) = 0, so we're done. Otherwise, the induction step (in outline) is to query all of the at most C_1(f) exposed bits of some 1-certificate of f and recurse on the resulting subfunction; each such round can be charged to a distinct disjoint sensitive block, so there are at most bs(f) rounds, for at most C_1(f) bs(f) queries in total.

Theorem 2.1.17 (Nisan). C(f) ≤ s(f) bs(f).

Proof. Try it yourself.

Corollary 2.1.18. D(f) ≤ C(f) bs(f) ≤ s(f) bs(f)² ≤ bs(f)³.

Proof. Direct result of Theorems 2.1.16 and 2.1.17.

Corollary 2.1.19. D(f) ≤ C(f)².

Proof.

    D(f) ≤ min{C_0(f), C_1(f)} · bs(f)
         ≤ min{C_0(f), C_1(f)} · C(f)
         = min{C_0(f), C_1(f)} · max{C_0(f), C_1(f)}
         = C_0(f) · C_1(f)
         ≤ C(f)².

Lecture 3
Degree of a Boolean Function

Scribe: Ranganath Kondapally

Theorems:

1. s(f) ≤ bs(f) ≤ C(f) ≤ D(f)

2. deg(f) ≤ D(f)

3. C(f) ≤ s(f) bs(f) ≤ bs(f)²   [Nisan]

4. D(f) ≤ C_1(f) bs(f)   [Beals et al.]

5. Corollary: D(f) ≤ C_0(f) C_1(f) ≤ C(f)²   [Blum-Impagliazzo]

6. Corollary: D(f) ≤ bs(f)³

7. bs(f) ≤ 2 deg(f)²

8. D(f) ≤ bs(f) deg(f)² ≤ 2 deg(f)⁴

Examples:

1. ∃f : s(f) = Θ(√n), bs(f) = Θ(n)

2. ∃f : s(f) = bs(f) = C(f) = √n, deg(f) = D(f) = n

3. ∃f : deg(f) = n^{log₃ 2}, s(f) = n

Theorem 7:

Proof.

• Pick an input a ∈ {0,1}ⁿ and disjoint blocks B_1, B_2, …, B_b that achieve the maximum in bs(f) = b.

• Let P(x_1, …, x_n) be the multilinear polynomial representation of f. Create a new polynomial Q(y_1, y_2, …, y_b) from P by setting the x_i's as follows:

  – If i ∉ B_1 ∪ B_2 ∪ · · · ∪ B_b, set x_i = a_i.
  – If i ∈ B_j:
    ∗ if a_i = 1, set x_i = y_j;
    ∗ if a_i = 0, set x_i = 1 − y_j.

• Multilinearize the resulting polynomial. Then deg(Q) ≤ deg(P) (note also deg(Q) ≤ b), and

    Q(1, 1, 1, …, 1) = P(a) = f(a);
    Q(0, 1, 1, …, 1) = Q(1, 0, 1, …, 1) = · · · = Q(1, 1, …, 0) = 1 − f(a)   (because each B_j is sensitive);
    Q(c) ∈ {0, 1} for all c ∈ {0,1}^b.

[Trick by Minsky-Papert]

Define Q^sym(k), for k ∈ {0, 1, 2, …, b}, by

    Q^sym(k) = ( ∑_{c ∈ {0,1}^b : |c| = k} Q(c) ) / (b choose k),

where |c| (the weight of c) is the number of ones in c. These values determine a unique univariate polynomial Q^sym, and by the Minsky-Papert symmetrization argument deg(Q^sym) ≤ deg(Q) ≤ b. We have

    Q^sym(k) = f(a)        if k = b;
    Q^sym(k) = 1 − f(a)    if k = b − 1;
    Q^sym(k) ∈ [0, 1]      otherwise.

Markov's Inequality (for polynomials):

Definition 3.0.20 (Interval Norm of a function). For f : R → R and S ⊆ R, define ‖f‖_S = sup_{x∈S} |f(x)|.

Suppose f is a polynomial of degree ≤ n; then ‖f′‖_[−1,1] ≤ n² ‖f‖_[−1,1].

Corollary: If a ≤ x ≤ b implies c ≤ f(x) ≤ d, then ‖f′‖_[a,b] ≤ n² · (d − c)/(b − a).

Let λ = ‖(Q^sym)′‖_[0,b], and let

    κ = max{ (sup_{[0,b]} Q^sym) − 1, −(inf_{[0,b]} Q^sym), 0 }.

Then Q^sym maps [0, b] into [−κ, 1 + κ].

• If κ = 0, then Q^sym maps [0, b] into [0, 1], and Markov's inequality implies

    ‖(Q^sym)′‖_[0,b] ≤ deg(Q^sym)² · (1 − 0)/(b − 0) = deg(Q^sym)²/bs(f).

However, Q^sym(b − 1) = 1 − f(a) and Q^sym(b) = f(a), so by the mean value theorem there exists α ∈ [b − 1, b] such that |(Q^sym)′(α)| = 1. Hence ‖(Q^sym)′‖_[0,b] ≥ 1, which gives bs(f) ≤ deg(Q^sym)².

• If κ > 0:

  – Suppose κ = Q^sym(x_0) − 1 for some x_0 ∈ [0, b], and suppose x_0 ∈ [i, i + 1] for an integer i. Since Q^sym(i) and Q^sym(i + 1) lie in [0, 1], considering the smaller of the two intervals [i, x_0], [x_0, i + 1] (which has length at most 1/2) and applying the mean value theorem,

    ‖(Q^sym)′‖_[0,b] ≥ κ/(1/2) = 2κ.


    Q^sym maps [0, b] into [−κ, 1 + κ], so Markov's inequality implies

        2κ ≤ ‖(Q^sym)′‖_[0,b] ≤ deg(Q^sym)² (1 + κ − (−κ))/(b − 0) = deg(Q^sym)² (1 + 2κ)/bs(f),

    so

        bs(f) ≤ deg(Q^sym)² (1/(2κ) + 1).    (1)

    Using the α from before with |(Q^sym)′(α)| = 1, we also have

        1 ≤ ‖(Q^sym)′‖_[0,b] ≤ deg(Q^sym)² (1 + 2κ)/bs(f),

    so

        bs(f) ≤ deg(Q^sym)² (1 + 2κ).    (2)

    Combining (1) and (2),

        bs(f) ≤ deg(Q^sym)² (1 + min{1/(2κ), 2κ})
              ≤ 2 · deg(Q^sym)²
              ≤ 2 · deg(Q)²
              ≤ 2 · deg(P)²
              = 2 · deg(f)².

  – The proof is similar for the case where κ = −Q^sym(x_0) for some x_0.

Theorem 8: D(f) ≤ bs(f) deg(f)².

Proof. Consider the multilinear polynomial representation P of f.

Definition 3.0.21. A "maxonomial" of f is a monomial of maximum degree in P.

• There exists a set of at most bs(f) deg(f) variables that intersects every maxonomial.

• Query all these variables to get a subfunction f|_α with deg(f|_α) ≤ deg(f) − 1: since all the maxonomials of f vanish after the queried variables are fixed, the degree of the resulting polynomial representation of f|_α decreases by at least 1.

If the above two statements hold, then we can prove the theorem by induction on the degree of the function f. If f is a constant function, the theorem holds trivially. Assume it is true for all functions of degree less than deg(f). Then, using bs(f|_α) ≤ bs(f),

    D(f) ≤ bs(f) deg(f) + D(f|_α)
         ≤ bs(f) deg(f) + bs(f|_α) deg(f|_α)²    (by the induction hypothesis)
         ≤ bs(f) (deg(f) + (deg(f) − 1)²)
         ≤ bs(f) deg(f)²    (if deg(f) ≥ 1).

The algorithm to find such a set of variables is:


• Start with an empty set S.

• Pick all the variables in some maxonomial that is not yet intersected by S, and add these variables to S. (The number of variables added is at most deg(f).)

• If the current set intersects every maxonomial, stop; else repeat the previous step.

Claim: The number of iterations of this algorithm is at most bs(f).

Proof: We claim that inside every maxonomial there is a sensitive block for the input 0ⁿ. Pick any maxonomial M and set x_i = 0 for all i ∉ M. The resulting function's polynomial representation still contains M (every monomial containing a zeroed variable vanishes, and nothing can cancel M itself), so the resulting Boolean function is not constant. This implies that there exists a sub-block of M's variables such that flipping it (starting from 0ⁿ) changes the value of the function. Finally, the maxonomials chosen by the algorithm are pairwise disjoint (each is chosen because it avoids S, which contains all previously chosen ones), so the sensitive blocks inside them are disjoint sensitive blocks for 0ⁿ. Hence the number of iterations is at most bs_{0ⁿ}(f) ≤ bs(f).
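The covering step is easy to visualize in code. Below is a sketch of the greedy algorithm (our own representation: a polynomial as a dict mapping frozensets of variables to coefficients).

    def hitting_set_for_maxonomials(poly):
        d = max(len(m) for m in poly)                 # degree of f
        maxonomials = [m for m in poly if len(m) == d]
        S = set()
        rounds = 0
        while any(not (m & S) for m in maxonomials):  # some maxonomial missed
            m = next(m for m in maxonomials if not (m & S))
            S |= m                                    # add its <= deg(f) variables
            rounds += 1                               # claim: rounds <= bs(f)
        return S, rounds

    # Example: f with maxonomials x1*x2, x2*x3, x4*x5 (two rounds suffice).
    poly = {frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({4, 5}): 1}
    print(hitting_set_for_maxonomials(poly))          # ({1, 2, 4, 5}, 2)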

Lecture 4
Symmetric Functions and Monotone Functions

Scribe: Ranganath Kondapally

Definition 4.0.22. f : {0,1}ⁿ → {0,1} is said to be symmetric if for all x ∈ {0,1}ⁿ and all π ∈ S_n (the group of all permutations of [n]),

    f(x) = f(x_{π(1)}, x_{π(2)}, …, x_{π(n)}).

This is the same as saying that f(x) depends only on |x| (the number of ones in x).

Recall the proof of bs(f) ≤ 2 deg(f)², where we used the symmetrization trick.

One can prove: for any non-constant symmetric f, deg(f) = n − O(n^α), where α ≈ 0.548 < 1 is a constant [von zur Gathen & Roche]. Then D(f) ≥ deg(f) = n − o(n).

For x, y ∈ {0,1}ⁿ, write x ≤ y if x_i ≤ y_i for all i.

Definition 4.0.23. f is monotone if x ≤ y ⇒ f(x) ≤ f(y).

Examples of monotone functions: OR_n, AND_n, AND_√n ◦ OR_√n. Non-examples: PAR_n, ¬OR_n.
Theorem 4.0.24. If f is monotone, then s(f) = bs(f) = C(f).

Proof. We already have s(f) ≤ bs(f) ≤ C(f). It remains to prove C(f) ≤ s(f).

Let x ∈ {0,1}ⁿ and suppose f(x) = 0. Let α be a minimal 0-certificate matching x.

Claim: All exposed bits of α are zeros. Proof: If not, we can un-expose an exposed 1-bit of α and still have a 0-certificate, by monotonicity (lowering that bit can only keep f at 0). So we did not need to expose that bit, and α was not a minimal 0-certificate.

Claim: Every exposed bit of α is sensitive for y = [α : 1∗] (the input obtained by setting every ∗ in α to 1). Proof: Suppose the ith bit is exposed by α but f(y^(i)) = f(y) = 0. By monotonicity, any other setting of the ∗'s in α, with the ith bit set arbitrarily, also gives f = 0. So the ith bit is not necessary, contradicting the minimality of α.

By the second claim, s(f) ≥ s_y(f) ≥ the number of exposed bits of α. Thus s(f) ≥ C_0(f). Similarly, s(f) ≥ C_1(f). Therefore s(f) ≥ max{C_0(f), C_1(f)} = C(f).

Theorem 4.0.25. If f is monotone, D(f) ≤ bs(f)² = s(f)². [Using an earlier theorem: D(f) ≤ C(f) bs(f) = s(f)².]

The above bounds are tight, as witnessed by f = AND_√n ◦ OR_√n.
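For a small sanity check of Theorem 4.0.24 (our own brute-force sketch), one can compute s(f) and C(f) exactly for majority on 5 bits; both equal 3.

    from itertools import product, combinations

    n = 5
    maj = lambda x: int(sum(x) > n // 2)
    inputs = list(product((0, 1), repeat=n))

    def cert_size(x):
        # smallest set of positions that, fixed to their values in x, forces f
        for k in range(n + 1):
            for S in combinations(range(n), k):
                if len({maj(y) for y in inputs
                        if all(y[i] == x[i] for i in S)}) == 1:
                    return k
        return n

    C = max(cert_size(x) for x in inputs)
    s = max(sum(maj(x[:i] + (1 - x[i],) + x[i+1:]) != maj(x) for i in range(n))
            for x in inputs)
    print(f"s(f) = {s}, C(f) = {C}")   # both 3 for MAJ_5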


The only possible symmetric monotone functions are the threshold functions THR_{n,k}, where

    THR_{n,k}(x) = 1 if |x| ≥ k, and 0 otherwise.

For k ≠ 0, D(THR_{n,k}) = n (by a simple adversarial argument).


Definition 4.0.26. f : {0,1}ⁿ → {0,1} is evasive if D(f) = n.

Definition 4.0.27. A Boolean function f(x_{1,2}, x_{1,3}, …, x_{n−1,n}) on (n choose 2) variables is a graph property if f(x) depends only on the graph described by x.

Every π ∈ S_n induces a permutation on the (n choose 2) edge slots; f is a graph property iff it is invariant with respect to all these induced permutations.

Theorem 4.0.28 (Rivest & Vuillemin). Every monotone non-constant graph property f on n-vertex graphs has D(f) = Ω(n²).

Conjecture (Richard Karp): D(f) = (n choose 2), i.e., f is evasive!

Theorem 4.0.29 (Kahn-Saks-Sturtevant). If n = p^α for a prime p and α ∈ N, then f is evasive.

Theorem 4.0.30 (Chakrabarti-Khot-Shi). If f is invariant under edge deletion and contraction (i.e., is a minor-closed property) and n ≥ n_0, then f is evasive.
Theorem 4.0.31 (Yao). Any monotone non-constant bipartite graph property is evasive.

Proof. Let f : {0,1}^N → {0,1}, where N = (n choose 2), be a non-constant monotone graph property. f is invariant under a group G ≤ S_N of permutations (G ≅ S_n).

G is transitive on {1, …, N}: for any i, j ∈ [N], there exists π ∈ G such that π(i) = j.

Lemma 1: Suppose f : {0,1}^d → {0,1} is monotone, non-constant, and invariant under a transitive group of permutations, and d = 2^α for some α ∈ N. Then f is evasive.

Corollary to Lemma 1: If n = 2^k, then D(f) ≥ n²/4.
Hierarchically cluster [n] into subclusters of size n/2 and recurse: identify the vertex set with {0,1}^k (for n = 2^k), so that the first bit of a label splits [n] into two clusters, the second bit splits each of those, and so on. For each level i, consider the subfunction g_i : A → {0,1} of f, with g_i(x) = f(x), where A is the set of graphs x on [n] in which there is an edge between l and j whenever l ∧ 1^i 0^{k−i} = j ∧ 1^i 0^{k−i} (labels agreeing on the first i bits), and no edge whenever l ∧ 01^{i−1}0^{k−i} ≠ j ∧ 01^{i−1}0^{k−i} (labels disagreeing somewhere in bits 2 through i); the remaining edge slots are the free inputs of g_i.

The number of free inputs to the subfunction g_1 is (2^{k−1})² = n²/4, a power of 2. By Lemma 1, D(f) ≥ D(subfunction) = n²/4, provided the subfunction is non-constant. There are k subfunctions, depending on how deep we take the clustering: g_1 (stop after one level of clustering), g_2, …, g_k (go down to singleton clusters).

We have

    g_k(0⃗) = f(0⃗) = 0,
    g_1(1⃗) = f(1⃗) = 1,
    g_i(1⃗) = g_{i−1}(0⃗),

so there exists an i such that g_i(0⃗) = 0 and g_i(1⃗) = 1, i.e., g_i is non-constant.

Corollary 2: For all n ∈ N, D(f) ≥ n²/16.
Proof of Lemma 1: Consider S_k = {x ∈ {0,1}^d : |x| = k and f(x) = 1}.

Definition 4.0.32. orb(x) = {y : y is obtained from x by applying some π ∈ G}.

Each S_k is partitioned into orbits.

Claim: If k ≠ 0 and k ≠ d (i.e., x ≠ 0⃗ and x ≠ 1⃗), then every orbit has even size; therefore |S_k| is even.

To see this, form the |orb(x)| × d matrix whose rows are the elements of orb(x). Counting its ones row by row gives k · |orb(x)|; by transitivity, every column contains the same number of ones, so counting column by column gives d · (#1's in any column). Hence

    |orb(x)| = d · (#1's in any column)/k = 2^α · (some integer)/k = even,

since 0 < k < d = 2^α means the largest power of 2 dividing k is strictly smaller than 2^α.
By the Claim,

    |{x : f(x) = 1}| = |S_0| + |S_1| + · · · + |S_{d−1}| + |S_d|
                     = 0 + (even number) + 1
                     = odd,

since |S_0| = 0 (as f(0⃗) = 0) and |S_d| = 1 (as f(1⃗) = 1), both by monotonicity and non-constancy.

Now consider all x ∈ {0,1}^d that reach a given leaf λ of a decision tree for f at depth < d: an even number of x's (namely 2^{d−depth(λ)}) reach it. Therefore, if all leaves whose output is 1 had depth < d, then |{x : f(x) = 1}| would be even. This is a contradiction; hence there exists a leaf of depth d. Thus D(f) = d and f is evasive.

Lecture 5
Randomized Decision Tree Complexity:
Boolean functions

Scribe: Priya Natarajan


We saw this type of complexity earlier for the sorting problem.
Definition 5.0.33 (Cost of a randomized decision tree on input x). cost(t, x) = the number of bits of x queried by t.

Let the randomized decision tree be T ∼ λ (a probability distribution over deterministic decision trees). From Yao's minimax lemma we have R(f) ≥ D_μ(f) for any input distribution μ, so it is enough to prove a lower bound on D_μ(f). Unlike what we did for sorting, here we won't convert to a Monte Carlo algorithm; we will work with Las Vegas algorithms throughout.

Theorem 5.0.34. Let f be a non-constant monotone graph property on N = (n choose 2) bits, i.e., on n-vertex graphs. Then R(f) = Ω(n^{4/3}) = Ω(N^{2/3}).



This is saying something more involved than the easy thing you can prove: for any monotone f, R(f) = Ω(√D(f)). Last time we showed that D(f) ≥ n²/16 = Ω(n²) [Rivest-Vuillemin], which gives only R(f) = Ω(n).

Yao's conjecture: R(f) = Ω(n²) = Ω(N).

Theorem 5.0.34 is not the best result known; we state below (without proof) the best result known:

    R(f) = Ω(n^{4/3} log^{1/3} n)   [Chakrabarti-Khot],

a strengthening of Hajnal's proof. Today we will prove something that implies Theorem 5.0.34. Recall that graph properties are invariant under certain permutations (not all permutations) and these permutations form a transitive group.

Note: From now on, the number of input bits is n (not N).

Theorem 5.0.35. If f : {0,1}ⁿ → {0,1} is monotone, non-constant, and invariant under a transitive group of permutations, then R(f) = Ω(n^{2/3}).

Match the R(f) bound in Theorem 5.0.34 to that in Theorem 5.0.35! We prove Theorem 5.0.35 using a probabilistic analysis.
Proof.
Definition 5.0.36 (Influence of a coordinate on a function). Let f : {0,1}ⁿ → {0,1}. Define the influence of the ith coordinate on f as

    I_i(f) = Pr_x[f(x) ≠ f(x^(i))],

where x is uniformly random. In words: choose a random x; what is the chance that flipping the ith bit changes the output? Note that I_i(f) is an averaged relative of s_x(f).

Because of invariance under the transitive group, we have I_1(f) = I_2(f) = · · · = I_n(f) = I(f)/n, where I(f) = ∑_{i=1}^n I_i(f).
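Influences are easy to estimate by sampling. A quick Monte Carlo sketch (our own helper; each coordinate of MAJ_3 has influence exactly 1/2, the probability that the other two bits disagree):

    import random

    def influence(f, n, i, trials=100_000):
        hits = 0
        for _ in range(trials):
            x = [random.randint(0, 1) for _ in range(n)]
            y = x.copy(); y[i] ^= 1            # flip the i-th bit
            hits += f(x) != f(y)
        return hits / trials

    maj3 = lambda x: int(sum(x) >= 2)
    print([round(influence(maj3, 3, i), 2) for i in range(3)])  # ~[0.5, 0.5, 0.5]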
We will also make use of another technique:


Definition 5.0.37 (Great Notational Shift). TRUE: 1 → −1; FALSE: 0 → +1.

So an old value x corresponds to the new value 1 − 2x, and a new value y corresponds to the old value (1 − y)/2.

Why are we doing this? A function f : {−1,1}ⁿ → {−1,1} takes a sign vector instead of a bit vector. In this notation, too, every function has a unique multilinear polynomial representation. It is important to note that this change of notation does not change the degree of the multilinear polynomial, though it may change the coefficients and the number of monomials.

Example 5.0.38. In the old notation, AND(x, y, z) = xyz. In the new notation we have

    AND(x, y, z) = 1 − 2 · ((1 − x)/2) · ((1 − y)/2) · ((1 − z)/2)
                 = 3/4 + x/4 + y/4 + z/4 − xy/4 − yz/4 − xz/4 + xyz/4.

Notice that the coefficients of the linear terms equal 2/2ⁿ.
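As a sanity check of the shift (our own snippet), one can verify the polynomial of Example 5.0.38 against AND on all eight sign vectors.

    from itertools import product

    def and_pm(x, y, z):
        # TRUE = -1: AND is TRUE only when all three inputs are TRUE
        return -1 if (x, y, z) == (-1, -1, -1) else 1

    def poly(x, y, z):
        return (3 + x + y + z - x*y - y*z - x*z + x*y*z) / 4

    for x, y, z in product((-1, 1), repeat=3):
        assert and_pm(x, y, z) == poly(x, y, z)
    print("polynomial matches AND on all 8 sign vectors")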
Given that x takes ±1 values, we can say something specific about I_i(f). If f : {−1,1}ⁿ → {−1,1}, then

    I_i(f) = (1/2) E_x[ |f(x | x_i = 1) − f(x | x_i = −1)| ] = (1/2) E_x[ |D_i f(x)| ],

where D_i f(x) = f(x | x_i = 1) − f(x | x_i = −1) is the discrete derivative in direction i.

For example, let φ(x, y, z) = 3/4 + x/4 + y/4 + z/4 − xy/4 − yz/4 − xz/4 + xyz/4, the AND polynomial above. Then

    D_1 φ(y, z) = 2 · (1/4) + higher-degree terms,
    E_{y,z}[D_1 φ(y, z)] = 2 · (1/4) + 0,

since the expectations of the non-constant monomials vanish; so (1/2) E_{y,z}[D_1 φ(y, z)] = 1/4 = the coefficient of x. In general, for monotone f (in this notation D_i f(x) ≥ 0, so the absolute value can be dropped):

    I_i(f) = (1/2) E_x[D_i f(x)] = the coefficient of x_i in the polynomial representation of f.
Lemma 1: For any f : {−1,1}ⁿ → {−1,1} and any deterministic decision tree T evaluating f, we have

    Var_x[f] ≤ ∑_{i=1}^n δ_i I_i(f),   where δ_i = Pr_x[T queries x_i on input x].

Note that

    Var_x[f] = E_x[f(x)²] − (E_x[f(x)])² = 1 − E_x[f(x)]².

Note: If f is balanced (i.e., if Pr[f(x) = 1] = Pr[f(x) = −1]), then Var[f] = 1. For example, the PARITY function is balanced; AND is not.

We will prove Theorem 5.0.35 assuming f is balanced; we will fix this later.

Recall that Lemma 1 makes no assumptions about the function f. If f is invariant under a transitive group, then I_i(f) = I(f)/n, so from Lemma 1 we get

    Var[f] ≤ (I(f)/n) ∑_{i=1}^n δ_i = (I(f)/n) · E[# variables queried] = (I(f)/n) · D_uniform(f),

by picking T to be an optimal deterministic decision tree for the uniform distribution.


Lemma 2 [O'Donnell-Servedio]: For any monotone f : {−1,1}ⁿ → {−1,1}, I(f) ≤ √(D_unif(f)).

From Lemma 1 and Lemma 2, we get that if f is monotone, transitive, and balanced, then

    1 = Var[f] ≤ (I(f)/n) · D_unif(f) ≤ D_unif(f)^{3/2}/n
    ⇒ R(f) ≥ D_unif(f) ≥ n^{2/3}.

So now we will prove Lemmas 1 and 2.


Proof of Lemma 1: We must show Var[f] ≤ ∑_{i=1}^n δ_i I_i(f). First,

    Var[f] = 1 − E[f(x)]²
           = 1 − E_x[f(x)] · E_y[f(y)]
           = 1 − E_{x,y}[f(x) f(y)]
           = E_{x,y}[1 − f(x) f(y)]
           = E_{x,y}[|f(x) − f(y)|],

where x and y are independent uniform inputs. Consequently, for any sequence of random variables x = u^[0], u^[1], …, u^[d] = y,

    Var[f] ≤ ∑_{i=0}^{d−1} E[|f(u^[i]) − f(u^[i+1])|]    (by the triangle inequality).

We choose the sequence by running T on x. For example:

    Start: f(x_1, x_2, x_3, x_4, x_5)    [= u^[0] = x]
    f(x_1, y_2, x_3, x_4, x_5)           [tree T read x_2 in the above step]
    f(x_1, y_2, x_3, y_4, x_5)           [tree T read x_4 above]

And now suppose the tree has reached a leaf, so it outputs a value of f and stops. When the tree stops, replace the remaining x's with y's, ending at f(y_1, y_2, y_3, y_4, y_5).

Note that going from u^[i] to u^[i+1] is just "re-randomizing" one coordinate; there is no conditioning involved, because the variable that was queried (whose value became known) gets replaced by a fresh random variable.

The term in the summation above is just E_u[|f(u) − f(u^{(∼j)})|], where x_j is the variable queried by T at this step, and u^{(∼j)} denotes u with the jth coordinate re-randomized. Note that this looks a lot like the definition of influence, without the 1/2: read it as "take a random u and re-randomize the jth coordinate by tossing a fresh coin". Since with probability 1/2 the jth bit flips and with probability 1/2 it stays the same,

    E_u[|f(u) − f(u^{(∼j)})|] = (1/2) E_u[|f(u) − f(u^{(j)})|] = I_j(f).

So, if x_{j_1}, x_{j_2}, …, x_{j_d} is the random sequence of variables read (note that d is also random), then

    Var[f] ≤ ∑_{sequences} Pr[j_1, j_2, …, j_d occurs] · (I_{j_1}(f) + I_{j_2}(f) + · · · + I_{j_d}(f))
           = ∑_{i=1}^n I_i(f) · Pr[T queries x_i]
           = ∑_{i=1}^n δ_i I_i(f).

Proof of Lemma 2: Recall that for monotone f, I_i(f) = (1/2) E_x[D_i f(x)], which is also the coefficient of x_i in the representing polynomial.

Let T be a deterministic decision tree for f, and suppose T outputs ±1 values. Let L = {leaves of T} and L⁺ = {leaves of T that output −1}, i.e., the accepting leaves.

For each leaf λ ∈ L, we have a polynomial p_λ(x_1, x_2, …, x_n) such that

    p_λ(x_1, x_2, …, x_n) = 1 if (x_1, x_2, …, x_n) reaches λ, and 0 otherwise.

In the ±1 notation, p_λ is a product of depth(λ) factors of the form (1 ± x_i)/2, so deg(p_λ) = depth(λ), and each queried variable x_i appears in p_λ with coefficient of magnitude 1/2^{depth(λ)}, with

    sign of coefficient = +1, if upon reading x_i on the path to λ we branch right; −1, otherwise.

Notation:

    d(λ) = depth(λ);
    s(λ) = skew of λ = (# left branches − # right branches) on the path to λ.

Using the above definitions, we have

    I(p_λ) := ∑_{i=1}^n (coefficient of x_i in p_λ) = −s(λ)/2^{d(λ)}.

Observe that the representing polynomial of f is 1 − 2 ∑_{λ∈L⁺} p_λ. Therefore,

    I(f) = ∑_{i=1}^n (coefficient of x_i in the representing polynomial of f)
         = ∑_{λ∈L⁺} s(λ)/2^{d(λ)−1}
         ≤ ∑_{λ∈L} |s(λ)|/2^{d(λ)}     (using ∑_{λ∈L} s(λ)/2^{d(λ)} = 0)
         = E_{random branching}[|s(λ)|],

where "random branching" means descending from the root with a fair coin flip at each node, so that leaf λ is reached with probability 2^{−d(λ)}. Under random branching the skew performs a drunkard's walk for d(λ) steps, so

    E[|s(λ)|] ≤ √(E[s(λ)²]) = √( ∑_λ d(λ)/2^{d(λ)} ) = √(E[d(λ)]).

Taking T to be an optimal tree for the uniform distribution (under which leaf λ is reached with probability exactly 2^{−d(λ)}), we conclude I(f) ≤ √(D_unif(f)). The first inequality above is the drunkard's-walk bound from probability theory.
Lecture dated 04/25/2008 follows:

Today we will prove something for graph properties which is sometimes stronger than what we proved last time. Recall the chain

    1 − E[f(x)]² = Var[f(x)] ≤ ∑_{i=1}^n δ_i^T I_i(f) ≤ (I(f)/n) · D_unif(f) ≤ D_unif(f)^{3/2}/n.

The first inequality is Lemma 1 of last time, the second inequality is due to symmetry (transitivity), and the last is due to monotonicity and Lemma 2. If f is balanced, we get D_unif(f) ≥ n^{2/3}.

In the proof we did last time, we did not use the full power of the uniform distribution; we just used the fact that individual coordinates are independent. If f is not balanced, we get only D_unif(f) ≥ (some value close to 0), which is not a good lower bound. For general f, instead of taking X ∼ unif, we use X ∼ μ_p, where μ_p means "choose each x_i = 1 with probability p and x_i = −1 with probability 1 − p, independently":

    μ_p(x_1, x_2, …, x_n) = p^{N_1(x)} (1 − p)^{N_{−1}(x)},

where N_1(x) and N_{−1}(x) count the coordinates of x equal to 1 and −1, respectively. We then need the following modifications:

    1 − E[f(x)]² = Var[f(x)] ≤ ∑_{i=1}^n δ_{i,p}^T I_{i,p}(f) ≤ (I_p(f)/n) · D_{μ_p}(f) ≤ √(p(1 − p)) · D_{μ_p}(f)^{3/2}/n,

where we get the last inequality from a generalized drunkard's walk.

If p = 0 then E[f(x)] = −1, and if p = 1 then E[f(x)] = 1. So the intermediate value theorem tells us that there exists a p such that E[f(x)] = 0. We pick this p and we get the bound D_{μ_p}(f) ≥ n^{2/3}. Although this bound is for D_{μ_p}, Yao's minimax lemma tells us that D_μ for any distribution μ is a lower bound on R(f). This probability p where E[f(x)] = 0 is called the "critical probability" of f. Every monotone function has a well-defined critical probability. Today's theorem will be in terms of this critical probability.

Theorem 5.0.39. If f is a non-constant, monotone graph property on n-vertex graphs, then

    R(f) = Ω( min{ n/p∗, n²/log n } ),

where p∗ is the critical probability of f.

Note that we can always assume p∗ ≤ 1/2. Why? Because if the critical probability of f is > 1/2, we can instead consider g(x) = 1 − f(1 − x) (in the 0/1 notation), for which p∗(g) = 1 − p∗(f). In the 0/1 notation, the critical probability is the p that makes E[f(x)] = 1/2, i.e., Pr[f(x) = 1] = 1/2.

Let us now define the key graph-theoretic notion of "graph packing" that we need for the proof. Graph packing is the basis for a lot of lower bounds for graph properties.

Definition 5.0.40 (Graph Packing). Given graphs G, H with |V(G)| = |V(H)|, we say that G and H pack if there exists a bijection φ : V(G) → V(H) such that ∀{u, v} ∈ E(G) : {φ(u), φ(v)} ∉ E(H). φ is called a packing.

We state the following theorem, but we will not prove it:

Theorem 5.0.41 (Sauer-Spencer theorem). If |V(G)| = |V(H)| = n and Δ(G) · Δ(H) ≤ n/2, then G and H pack. Here Δ(G) denotes the maximum degree in G.

Note: you can never pack a 0-certificate and a 1-certificate. What is the graph of a 0-certificate? It is a set of vertex pairs that are required to be non-edges in the input graph.

Intuition: The Sauer-Spencer theorem says that if two graphs are small, you can pack them. Since we cannot pack a 0-certificate and a 1-certificate, one of the two has to be big, so the certificate complexity is big, which gives a handle on the lower bound.

Consider a decision tree; pick a random input x according to μ_p and run the decision tree on x. Let Y = # of 1's read in the input, and Z = # of 0's read in the input. Then the cost of the tree on the random input is Y + Z, and for an optimal tree

    D_{μ_p}(f) = E[cost] = E[Y] + E[Z].

By Yao's minimax lemma, it is enough to prove that either E[Y] or E[Z] is Ω(min{n/p∗, n²/log n}). Note that Y and Z are non-negative random variables (since they count the 1's and 0's read), so it is enough to prove the lower bound on one of them.
Proof outline (remember we can assume p∗ ≤ 1/2):

Claim 1: E[Z]/E[Y] = (1 − p)/p.

⇒ If E[Y] ≥ n/64, then

    E[Z] = ((1 − p)/p) · E[Y] ≥ ((1 − p)/p) · (n/64) ≥ n/(128p) = Ω(n/p) ≥ Ω(min{n/p, n²/log n}),

and we are done. ⇒ So we may assume E[Y] ≤ n/64. Then

    E[Y | f(x) = 1 ∧ Δ(G_x) ≤ np + √(4np log n)] ≤ E[Y] / Pr[f(x) = 1 ∧ Δ(G_x) ≤ np + √(4np log n)],

where G_x is the graph represented by the 1's in the input x (so the other edges are "off"). Here we used:

Theorem 5.0.42. For a nonnegative random variable X and an event c, E[X | c] ≤ E[X]/Pr[c].

Proof.

    E[X] = E[X | c] · Pr[c] + E[X | ¬c] · Pr[¬c]
         ≥ E[X | c] · Pr[c].


Also,

    Pr[A ∧ B] = 1 − Pr[¬A ∨ ¬B] ≥ 1 − Pr[¬A] − Pr[¬B] = Pr[A] − Pr[¬B].

Using our assumption and the inequalities above, we get

    E[Y | f(x) = 1 ∧ Δ(G_x) ≤ np + √(4np log n)] ≤ E[Y]/Pr[f(x) = 1 ∧ · · ·] ≤ (n/64)/(1/2 − 1/n) ≤ n/32 + o(n).

Later we will see that the constant 4 above is chosen so that

    Pr[Δ(G_x) ≤ np + √(4np log n)] ≥ 1 − 1/n    (Chernoff bound).

This implies that there exists an input x satisfying f(x) = 1 and Δ(G_x) ≤ np + √(4np log n) on which the number of 1's read is at most n/32 + o(n). Therefore, there exists a 1-certificate with Δ ≤ np + √(4np log n) and # edges ≤ n/32 + o(n). This means that the certificate touches at most n/16 + o(n) vertices, and so at least n/2 vertices are isolated.
Claim 2: Every 0-certificate must have at least n²/(16(np + √(4np log n))) edges.

Now we have a lower bound on the number of edges of any 0-certificate. As we did for E[Y], we can say

    E[Z] ≥ E[Z | f(x) = 0] · Pr[f(x) = 0]
         ≥ ( n²/(16(np + √(4np log n))) ) · (1/2)
         = n²/(32(np + √(4np log n)))
         ≥ min{ n²/(64np), n²/(256 log n) }
         ≥ min{ n/(64p), n²/(256 log n) }.

Claim 2 Proof: This proof amounts to showing that if the claim were false, we could pack a 1-certificate and a 0-certificate.

We will show that if Δ(G) ≤ k, G has more than n/2 isolated vertices, and |E(H)| ≤ n²/(16Δ(G)), then G and H pack.

The number of edges in a graph equals half the sum of the degrees of the vertices, i.e.,

    |E(H)| = (1/2) ∑_{v∈H} deg(v) = d_avg(H) · n/2
    ⇒ d_avg(H) = (2/n) · |E(H)| ≤ n/(8Δ(G)).

If H′ is the subgraph of H spanned by the n/2 lowest-degree vertices, then

    Δ(H′) ≤ 2 · d_avg(H):

if you have an upper bound on the average degree, you can bound the maximum degree among the n/2 smallest-degree vertices.


By packing H − H′ onto the isolated vertices of G, we can extend a packing of H′ and G′ to one of H and G, where G′ is G restricted to n/2 vertices including all the non-isolated ones. We can get a packing of H′ and G′ because

    Δ(G′) Δ(H′) ≤ Δ(G) Δ(H′) ≤ Δ(G) · n/(4Δ(G)) = n/4 = (n/2)/2.

Now apply the Sauer-Spencer theorem (on n/2 vertices).
Claim 1 Proof: Define a bunch of indicator variables. Let

    Y_i = 1 if x_i is read as a '1', and 0 otherwise;
    Z_i = 1 if x_i is read as a '0', and 0 otherwise.

Then

    Y = ∑_{i=1}^{(n choose 2)} Y_i;   Z = ∑_{i=1}^{(n choose 2)} Z_i.

It is enough to show that

    ∀i : either E[Y_i] = E[Z_i] = 0, or E[Z_i]/E[Y_i] = (1 − p)/p.

It is enough to show the same under an arbitrary conditioning on the x_j (j ≠ i), because the x_j's are independent. So apply an arbitrary conditioning of the x_j (j ≠ i). If this conditioning implies that the ith coordinate is not read, then Y_i = Z_i = 0. Otherwise the ith coordinate is read; it is the only coordinate still random, and we know Pr[x_i = 1] = p and Pr[x_i = 0] = 1 − p, i.e.,

    E[Z_i]/E[Y_i] = Pr[x_i = 0]/Pr[x_i = 1] = (1 − p)/p.

What do we mean by "enough to show under arbitrary conditioning"? Conditioning on the remaining N − 1 variables gives events C_1, C_2, …, C_{2^{N−1}}. This implies

    E[Z_i] = E[Z_i | C_1] · Pr[C_1] + E[Z_i | C_2] · Pr[C_2] + · · ·
    E[Y_i] = E[Y_i | C_1] · Pr[C_1] + E[Y_i | C_2] · Pr[C_2] + · · ·,

so if each conditional pair satisfies the claimed ratio (or vanishes), the unconditional pair does too. We used a property of decision trees: whether or not the ith variable is read is independent of the value of the ith variable (it may depend on the values of the other variables).

Lecture 6
Linear and Algebraic Decision Trees

Scribe: Umang Bhaskar

6.1 Models for the Element Distinctness Problem


Definition 6.1.1 (The Element Distinctness Problem). Given n items (x_1, x_2, …, x_n) from a universe U, not necessarily ordered, decide whether they are pairwise distinct, i.e., whether ∀i ≠ j ∈ [n] : x_i ≠ x_j.

The complexity of the problem depends on the model of computation used.

Model 0. Equality testing only. This is a very weak model with no non-trivial solutions. The element distinctness problem requires Θ(n²) comparisons.

Model 1. Assume the universe U is totally ordered. We can compare any two elements x_i and x_j and take one of 3 branches depending on whether x_i < x_j, x_i = x_j, or x_i > x_j. Note that in a computer science context this makes sense, as it is always possible to compare two objects by comparing their binary representations.

In this model, the problem reduces to sorting, hence the element distinctness problem can be solved in O(n log n) comparisons (see the sketch after this list). In fact we have a matching lower bound of Ω(n log n) for this problem, through a complicated adversarial argument.

Instead of describing the adversarial argument, we describe a stronger model. We then use this model to give us a lower bound, since a lower bound for the problem in the stronger model is also a lower bound in this model.

Model 2. We assume w.l.o.g. that the universe U = R with the usual ordering. Then the comparison of two elements x_i and x_j is equivalent to deciding whether x_i − x_j is <, =, or > 0. For this model we allow more general tests, such as:

    is x_1 − 2x_3 + 4x_5 − x_7 > 0, = 0, or < 0?

In general, at an internal node v in the decision tree, we can decide whether p_v(x_1, x_2, …, x_n) is > 0, = 0, or < 0, and branch accordingly. Here p_v(x_1, x_2, …, x_n) = a_0^(v) + ∑_{i=1}^n a_i^(v) x_i.
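Here is the Model 1 upper bound as a sketch (our own code): sort, then compare adjacent elements, for O(n log n) comparisons overall.

    def all_distinct(xs):
        ys = sorted(xs)                                  # O(n log n) comparisons
        return all(a != b for a, b in zip(ys, ys[1:]))   # n - 1 adjacent checks

    print(all_distinct([3, 1, 4, 1, 5]))   # False (two 1's)
    print(all_distinct([3, 1, 4, 2, 5]))   # True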


6.2 Linear Decision Trees


For Model 2 as described above, we have the following definition.

Definition 6.2.1 (Linear Decision Trees). A linear decision tree is a 3-ary tree in which each internal node v tests whether p_v(x_1, x_2, …, x_n) is > 0, = 0, or < 0, and branches accordingly. The depth of such a tree is defined as usual.

If we assume that our tree tests membership in a set, we can consider the output to be {0, 1}. Then we can think of two subsets W_0, W_1 ⊆ Rⁿ, where:

1. W_i = {x ∈ Rⁿ : the tree outputs i};
2. W_0 ∩ W_1 = ∅;
3. W_0 ∪ W_1 = Rⁿ.

Consider a leaf λ of the tree. Then the set S_λ = {x ∈ Rⁿ : x reaches λ} is described by a set of linear equality/inequality constraints, corresponding to the nodes on the path from the root to the leaf. Then S_λ is a convex polytope, since each linear equality/inequality defines a convex set, and the intersection of convex sets is itself a convex set. Convex sets are connected, in the following sense:

Definition 6.2.2 (Connected Sets). A set S ⊆ Rⁿ is said to be (path-)connected if for all a, b ∈ S there exists a function p_{ab} : [0, 1] → S such that:

• p_{ab} is continuous;
• p_{ab}(0) = a;
• p_{ab}(1) = b.
Suppose a linear decision tree T has L_1 leaves that output 1 and L_0 leaves that output 0. Then

    W_1 = ∪_{λ outputs 1} S_λ.

For a set S, define #S to be the number of connected components of S. Since each S_λ is convex, hence connected, each leaf contributes at most one connected component, so

    #W_1 ≤ L_1,   and similarly,   #W_0 ≤ L_0.

For a given function f : Rⁿ → {0,1} (for example, element distinctness) we define f⁻¹(1) = W_1 = {x : f(x) = 1}, and f⁻¹(0) similarly. Then

    L_1 ≥ #f⁻¹(1)   and   L_0 ≥ #f⁻¹(0).

This implies that for any decision tree that computes f, the number of leaves L satisfies

    L = L_1 + L_0 ≥ #f⁻¹(1) + #f⁻¹(0),

and hence,

    height(T) ≥ log₃(#f⁻¹(1) + #f⁻¹(0)).


We now return to the element distinctness problem. Remember that

    f(x_1, x_2, …, x_n) = 1 if ∀i ≠ j : x_i ≠ x_j, and 0 otherwise.

It turns out that f⁻¹(1) has at least n! connected components. (Actually the correct number is exactly n!, but we are only interested in the lower bound, which we now prove.)

Lemma 6.2.3. For all σ, π ∈ S_n (the group of all permutations) with σ ≠ π, the points x^σ = (σ(1), σ(2), …, σ(n)) and x^π = (π(1), π(2), …, π(n)) lie in distinct connected components of f⁻¹(1).

Proof. The proof is by contradiction. Suppose not; then there exists a path from x^σ to x^π lying entirely inside f⁻¹(1). Since σ ≠ π, there exist i ≠ j such that

    σ(i) < σ(j)   but   π(i) > π(j).

Then x^σ and x^π lie on opposite sides of the hyperplane x_i = x_j. Hence, by the intermediate value theorem, any path from x^σ to x^π must contain a point x∗ with x∗_i = x∗_j; but then f(x∗) = 0, which gives us the required contradiction.

Suppose a linear decision tree T solves element distinctness. Then

    height(T) ≥ log₃(#f⁻¹(0) + #f⁻¹(1)) ≥ log₃(n!) = Ω(n log n),

so D(element distinctness) = Ω(n log n) in this model.

Note that very little of this proof actually uses the linearity of the nodes: the same counting argument applies to trees whose nodes evaluate other functions, e.g. quadratic ones, so long as we can still bound the number of connected components per leaf.

6.3 Algebraic Decision Trees


An algebraic decision tree of order d is similar to a linear decision tree, except that the internal nodes evaluate polynomials of degree ≤ d and branch accordingly.

The problem now is that the sets S_λ may no longer be connected; consider, e.g., the region (x + y)(x − y) > 0. However, it turns out that low-degree polynomials aren't too "bad," as evidenced by the following theorem:

Theorem 6.3.1 (Milnor-Thom). Let p_1(x_1, x_2, …, x_m), p_2(x_1, x_2, …, x_m), …, p_t(x_1, x_2, …, x_m) be polynomials with real coefficients, of degree ≤ k each. Let V = {x ∈ R^m : p_1(x) = p_2(x) = · · · = p_t(x) = 0} (called an algebraic variety). Then the number of connected components of V satisfies #V ≤ k(2k − 1)^{m−1}.

Milnor conjectured that the right bound should be k^m; in any case, #V can be as large as k^m, as the following example shows. Consider the equalities

    p_1(x) = (x_1 − 1)(x_1 − 2) · · · (x_1 − k) = 0
    p_2(x) = (x_2 − 1)(x_2 − 2) · · · (x_2 − k) = 0
    ⋮
    p_m(x) = (x_m − 1)(x_m − 2) · · · (x_m − k) = 0.

Then p_1(x) = p_2(x) = · · · = p_m(x) = 0 is satisfied at exactly the k^m points of {1, …, k}^m, each of which forms a connected component.

It follows from the theorem that each leaf contributes at most k(2k − 1)^{m−1} components. However, internal nodes are not restricted to equalities and can also evaluate inequalities. In the next lecture we introduce Ben-Or's lemma, which handles this case. We use this to get a lower bound on tree depth in such a model, i.e., where the internal nodes are allowed to perform algebraic computations and test inequalities.
Lecture 7
Algebraic Computation Trees and Ben-Or’s
Theorem

Scribe: Umang Bhaskar

7.1 Introduction
Last time we saw the Milnor-Thom theorem, which says the following:

Theorem 7.1.1 (Milnor-Thom). Let p_1(x_1, ..., x_m), p_2(x_1, ..., x_m), ..., p_t(x_1, ..., x_m) be polynomials
with real coefficients of degree ≤ k each. Let W = {x ∈ R^m : p_1(x) = p_2(x) = ... = p_t(x) = 0}. Then the
number of connected components of W satisfies #W ≤ k(2k − 1)^{m−1}.
If instead of constant-degree polynomials we allowed arbitrary-degree polynomials, then it turns out that the
element distinctness problem could be solved in O(1) time! Consider the expression

    x = ∏_{i<j} (x_i − x_j).

Then x = 0 ⇔ ∃ i ≠ j : x_i = x_j, so a single sign test on x decides element distinctness. Hence, this is
obviously too powerful a model.

In general such problems can be posed as "set recognition problems"; e.g. element distinctness can be posed as
recognising the set W = {x ∈ R^n : ∀ i, j (i ≠ j ⇒ x_i ≠ x_j)}. Define D_lin(W) as the height of the shortest
linear decision tree that recognizes W. We saw in the last lecture that for this particular problem,
D_lin(W) ≥ log_3(#W).

7.2 Algebraic Computation Trees

Definition 7.2.1 (Algebraic Computation Trees). An algebraic computation tree (abbreviated "ACT") is one with two
types of internal nodes:

1. Branching nodes: labelled "v : 0", where v is an ancestor of the current node; such a node compares the value
   computed at v with 0 and has 3 children, corresponding to the outcomes <, =, >.

2. Computation nodes: labelled "v op v′", where v, v′ are

   (a) ancestors of the current node,
   (b) input variables, or
   (c) constants,

   and op is one of +, −, ×, ÷. Computation nodes have 1 child.

The leaves are labelled "ACCEPT" or "REJECT".

For a set W ⊆ R^n, we define D_alg(W) = minimum height of an algebraic computation tree recognizing W.

Note that in this model, every operation is being individually charged; hence computing a product such as
∏_{i<j} (x_i − x_j) would be charged Θ(n^2).
The Milnor-Thom theorem is applicable to polynomials of low degree, and it turns out that we still do not quite
have that: along a path of length h in a tree, the polynomial computed can have degree 2^h. For example:

    w_1 = x_1 × x_1
    w_2 = w_1 × w_1
    w_3 = w_2 × w_2
    ...
    w_h = w_{h−1} × w_{h−1}
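A tiny sketch of this degree blow-up, and of the fix used next (our own illustration, not from the notes):

# Repeated squaring: each node w_t = w_{t-1} * w_{t-1} doubles the degree,
# so a path of h computation nodes can reach degree 2^h, far too large for
# Milnor-Thom to apply directly. The fix below introduces one fresh
# variable per node, so every associated constraint has degree <= 2.
h = 10
deg = 1                # degree of x_1
for _ in range(h):
    deg = deg + deg    # squaring doubles the degree
assert deg == 2 ** h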
We need to somehow bound the degree of the polynomials obtained from such a path. To do this, with each node v of
an ACT we associate a polynomial equality or inequality as follows, where w denotes the parent of v:

1. If w = parent(v) is a branching node, then p_v = p_w ⋛ 0, where the relation ⋛ stands for <, =, or >
   depending on the branch taken from w to reach v.

2. If w = parent(v) is a computation node of the form v_1 op v_2, then

   (a) if op = +, −, or ×, the condition is p_v = p_{v_1} op p_{v_2};
   (b) if op = ÷, the condition is p_v · p_{v_2} = p_{v_1}.

At this point, we can even introduce a square-root operator for our algebraic computation trees: if the node is
labelled √v′, then the corresponding polynomial condition is p_v^2 = p_{v′}.
Let λ be a leaf of the ACT, and define as usual S_λ = {x ∈ R^n : x reaches λ}. Let w_i denote the variable
introduced at node i of the tree when forming the polynomial equalities and inequalities. Then we can write

    S_λ = { x ∈ R^n : ∃ w_1, w_2, ..., w_h ∈ R s.t. ⋀_{v ancestor of λ} (condition at v) },

where each "condition at v" is of the form p(x, w) = 0 or p(x, w) > 0 for some polynomial p of degree ≤ 2. (We do
not require conditions of the form p(x, w) < 0, since we can always negate p.)
Now that we have bounded-degree polynomials, in order to apply the Milnor-Thom theorem we need to remove the
inequalities. Conditions of the form p(x, w) ≥ 0 can be rewritten as

    ∃ y : p(x, w) = y^2.

This does not increase the degree of the polynomial beyond 2, although we are now moving to a higher-dimensional
space (through the introduction of more variables). We can move back to the lower-dimensional space by simply
projecting the solution set down, and this projection is a continuous map.
But we still don't know how to handle strict inequalities, which is all we have. In order to handle strict
inequalities, we proceed as follows. Suppose the set V ⊆ R^m is defined by a number of polynomial equations and
strict inequalities, i.e.

    V = {z ∈ R^m : p_1(z) = p_2(z) = ... = p_t(z) = 0,
                   q_1(z) > 0, q_2(z) > 0, ..., q_u(z) > 0}.

Then S_λ = {x ∈ R^n : ∃ w ∈ R^h, (x, w) ∈ V}, where V is of the above form; that is, S_λ is the projection of V
onto the first n coordinates. Since projection is a continuous function,

    #S_λ ≤ #V,

because a continuous map sends each connected component of V into a single connected component of S_λ, and these
images cover all of S_λ.
Suppose V = V_1 ∪ V_2 ∪ ... ∪ V_l, where the V_i are the connected components of V. Pick a point z_i ∈ V_i for
each i, so that we have one point in each connected component of V. Each z_i is also in V, so it satisfies the
equality and inequality constraints; in particular q_1(z_i), q_2(z_i), ..., q_u(z_i) are all > 0.

Let ε = min_{i,j} q_j(z_i), and let

    V_ε = {z ∈ R^m : p_1(z) = p_2(z) = ... = p_t(z) = 0,
                     q_1(z) ≥ ε, q_2(z) ≥ ε, ..., q_u(z) ≥ ε}.

Clearly V_ε ⊆ V. Also, V_ε contains every z_i, i.e. V_ε contains a point in each component V_i of V. Since each
component of V_ε lies inside a single component of V, it follows that

    #V_ε ≥ #V.
Now we can replace the inequalities as above, obtaining the set

    V′ = {(z, y) ∈ R^{m+u} : p_1(z) = ... = p_t(z) = 0, q_1(z) − ε = y_1^2, ..., q_u(z) − ε = y_u^2},

which is defined purely by polynomial equations of degree ≤ 2; note that V_ε is the projection of V′ ⊆ R^{m+u}
onto R^m. Then

    #S_λ ≤ #V ≤ #V_ε ≤ #V′ ≤ 2(2·2 − 1)^{(n+h+u)−1} = 2·3^{n+h+u−1},

where the ambient dimensions are n (for S_λ), n + h (for V and V_ε, taking m = n + h), and n + h + u (for V′).
The last inequality is obtained by applying the Milnor-Thom theorem, since the constraints in V′ are equations of
degree ≤ 2, as required by Milnor-Thom. Also, h = number of computation nodes, and u ≤ number of branching nodes,
hence h + u ≤ ht(T), the height of the tree. So,

    #S_λ ≤ 2·3^{n+ht(T)−1} ≤ 3^{n+ht(T)}.


Using the fact that #(A ∪ B) ≤ #A + #B and W = ⋃_{λ accepts} S_λ, we get that

    #W ≤ Σ_{λ accepts} #S_λ,

and since the number of leaves is ≤ 3^{ht(T)},

    #W ≤ 3^{ht(T)} · 3^{n+ht(T)} = 3^{n+2·ht(T)},

or,

    ht(T) ≥ (log_3(#W) − n) / 2 = Ω(log(#W) − n),

which is Ben-Or's theorem.
From this we can derive a number of results, such as the following (a worked derivation of the first bullet
appears after this list):

• For element distinctness, since #W = n!, we get D_alg(W) = Ω(log n! − n) = Ω(n log n).

• Set equality, set inclusion, "convex hull" are all Ω(n log n).

• Knapsack is Ω(n^2).
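To spell out the first bullet: using n! ≥ (n/e)^n, Ben-Or's theorem gives

    ht(T) ≥ (log_3(#W) − n) / 2 = (log_3(n!) − n) / 2 ≥ (n log_3 n − n log_3 e − n) / 2 = Ω(n log n),

so D_alg(element distinctness) = Ω(n log n), matching the linear-decision-tree bound of the previous lecture up to
constant factors.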

Lecture 8
Circuits

Scribe: Chrisil Arackaparambil


We have previously looked at decision trees, which allow us to prove lower bounds on "data access" for a problem.
We saw that the best possible lower bound provable in that setting is Ω(n) (or n if you care about constants).

We now look at a different model of computation called a circuit, that allows us to prove lower bounds on "data
movement". Loosely speaking, this is the amount of data we will need to move around in order to solve a problem.
We will see that in this model, an Ω(n) lower bound is in fact trivial.

Circuits attempt to address the issue of the lack of super-linear lower bounds in the Turing Machine (TM) model,
where the only techniques available are simulation and diagonalization and we do not know how to "analyze" the
computation. Using circuits we will be able to look into the innards of computation.
A circuit is built out of AND (∧), OR (∨) and NOT (¬) gates, along with wires, and has specially designated input
and output gates. More formally,
Definition 8.0.2 (Boolean Circuit). A Boolean circuit on n Boolean variables x1 , x2 , . . . , xn is a directed acyclic graph
(DAG) with a designated “type” for each vertex:
1. Input vertices with indegree (fan-in) = 0. These are labelled with: x1 , ¬x1 , x2 , ¬x2 , . . . , xn , ¬xn .
2. An output vertex with outdegree (fan-out) = 0.
3. AND (∧), OR (∨) gates with indegree = 2, and NOT (¬) gates with indegree = 1.
Since a circuit is a DAG, we can have a topological sort x1 , . . . , xn , g1 = xn+1 , g2 = xn+2 , . . . , gs = xn+s = y of
the graph with the input vertices as the first few vertices in the sort and the output gate as the last vertex in the sort.
Then we can inductively define the value computed by the circuit as follows:
1. Value of input vertex xi is the assignment to variable xi .
2. Value of x j for ( j > n) is:
• value(x j1 ) ∧ value(x j2 ), if x j is an AND gate with inputs x j1 and x j2 .
• value(x j1 ) ∨ value(x j2 ), if x j is an OR gate with inputs x j1 and x j2 .
• ¬value(x j1 ) if x j is a NOT gate with input x j1 .
3. Value computed by the circuit on a given input assignment is value(y).
Every Boolean circuit computes a Boolean function and provides a computationally efficient encoding of the
function.
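A minimal evaluator in this spirit (an illustrative sketch of our own; the encoding is not from the notes):

# A circuit is a list of gates in topological order; gate i is one of
# ('INPUT', j), ('NOT', a), ('AND', a, b) or ('OR', a, b), where a, b
# index earlier gates. The value of the last gate is the output.
def eval_circuit(gates, x):
    val = []
    for g in gates:
        if g[0] == 'INPUT':
            val.append(x[g[1]])
        elif g[0] == 'NOT':
            val.append(1 - val[g[1]])
        elif g[0] == 'AND':
            val.append(val[g[1]] & val[g[2]])
        else:  # 'OR'
            val.append(val[g[1]] | val[g[2]])
    return val[-1]

# Example: XOR of two bits as (x0 AND NOT x1) OR (NOT x0 AND x1).
xor = [('INPUT', 0), ('INPUT', 1), ('NOT', 0), ('NOT', 1),
       ('AND', 0, 3), ('AND', 2, 1), ('OR', 4, 5)]
assert [eval_circuit(xor, [a, b])
        for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]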
To compute f : {0, 1}* → {0, 1} (i.e. to decide the language L_f ⊆ {0, 1}*), we need a circuit family
⟨C_n⟩_{n=1}^∞, where C_n computes f on inputs of length n.


Definition 8.0.3 (Size of a circuit). For a circuit C, we define its size as

    size(C) = # edges (wires) in C = Θ(# gates in C).

Since we have indegree ≤ 2, the number of wires is within a factor of 2 of the number of gates.
Definition 8.0.4 (Depth of a circuit). For a circuit C, we define its depth as
depth(C) = maximum length of an input-output path.
We will see later that size and depth can be traded off. We will look at lower bounds on size, depth and size-depth
tradeoffs.
Theorem 8.0.5. If L ∈ P, then L has a polynomial-sized circuit family, i.e.

    ∃ ⟨C_n⟩_{n=1}^∞ such that ∀n ∀x ∈ {0, 1}^n (C_n(x) = 1 ⇔ x ∈ L),

and size(C_n) = poly(n).

Proof idea. Build a circuit to simulate the Turing Machine for L. In fact, we get a circuit of size
O((runtime of TM)^2). (See Sipser [Sip06], Chapter 9, pages 355-356.)
With this we have a plan to show P ≠ NP:

1. Pick your favorite NP-complete problem L.

2. By the above theorem, if P = NP then L has polynomial-size circuits.

3. Prove a super-polynomial circuit size lower bound for L.
However, around 25 years of effort in this line of work has not produced even a super-linear lower bound for an
NP-complete problem. The best known lower bound for a problem in NP is ≥ 5n − o(n).
What we can prove, however, is that polynomial-size circuits of some restricted types cannot solve "certain"
problems "efficiently".

Theorem 8.0.6 (Shannon's Theorem). Almost all functions f : {0, 1}^n → {0, 1} require circuit size Ω(2^n / n).
(The bound can be improved to (2^n / n)(1 + Θ(1/√n)).)

Proof. We can assume that the circuit does not use NOT gates: we can get the effect of a NOT gate anywhere in the
circuit by using DeMorgan's laws and having all inputs as well as their negations available.

We can lay down any circuit in a topological sort, so that any edge goes from a vertex u to a vertex v that is
after u in the sorted order. With this in mind, we can count the number of circuits (DAGs) of size s. For each
gate we have two choices for the type of the gate and ≤ s^2 choices for the pair of inputs that feed the gate. So
the total number of such circuits is ≤ (2s^2)^s ≤ s^{3s}.
Each circuit computes a unique Boolean function, so the number of functions that have a circuit of size
s ≤ 2^n/(10n) is at most

    s^{3s} ≤ (2^n/(10n))^{3·2^n/(10n)} = 2^{(3·2^n/(10n))·(n − log(10n))} = 2^{(3·2^n/10)·(1 − log(10n)/n)} < 2^{3·2^n/10}.

But the total number of functions f : {0, 1}^n → {0, 1} is 2^{2^n}, and

    lim_{n→∞} 2^{3·2^n/10} / 2^{2^n} = 0.
Observe that every function f : {0, 1}^n → {0, 1} has a circuit of size O(n · 2^n): write f as an OR of minterms
(DNF), of which we need at most 2^n. It is possible to improve this upper bound to O(2^n / n). A counting sketch
of the DNF construction follows.
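A small Python sketch of the O(n · 2^n) upper bound (our own illustration; it only counts gates):

from itertools import product

# Write f as an OR of minterms, one per satisfying input. Each minterm
# ANDs together n literals (n - 1 AND gates), and there are at most 2^n
# minterms, OR-ed together with at most 2^n - 1 OR gates.
def dnf_size(f, n):
    minterms = [a for a in product([0, 1], repeat=n) if f(a)]
    and_gates = len(minterms) * (n - 1)
    or_gates = max(len(minterms) - 1, 0)
    return and_gates + or_gates

parity = lambda a: sum(a) % 2
n = 10
assert dnf_size(parity, n) <= n * 2 ** n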
We will now prove our first concrete lower bound. This bound is for the function THR_k^n : {0, 1}^n → {0, 1}
defined as

    THR_k^n(x) = 1, if the number of 1s in x is ≥ k,
                 0, otherwise.


Theorem 8.0.7. Any circuit for THR_2^n must have ≥ 2n − 3 gates.

Proof. Note that we have a trivial lower bound of ≥ n − 1. The lower bound in the theorem is proved by the
technique of "gate elimination".

We will prove the theorem by induction on n. For the base case (n = 2), we have THR_2^2 = AND_2, for which we
need a circuit of size 1 = 2n − 3.

Now, let C be a minimal circuit for THR_2^n. Then C does not have any gate that reads inputs (x_i, x_i),
(x_i, ¬x_i) or (¬x_i, ¬x_i) for i ∈ [n].

Pick a gate in C such that its inputs are z_i = x_i (or ¬x_i) and z_j = x_j (or ¬x_j), for some i ≠ j. We claim
that either z_i or z_j must have fan-out at least 2. To see this, note that by suitably setting z_i and z_j, we
can get three different subfunctions on the remaining n − 2 variables: THR_2^{n−2}, THR_1^{n−2} and THR_0^{n−2}.
But if both z_i and z_j had fan-out only 1, they would influence the rest of the circuit only through the output
of this single gate, so the settings of z_i and z_j could create only two different subcircuits, which gives us a
contradiction.

Suppose x_i (or ¬x_i) has fan-out ≥ 2. Then setting x_i = 0 eliminates at least two gates, so we have a circuit
for THR_2^{n−1} (the resulting subfunction) of size ≤ (original size) − 2. Now by our induction hypothesis,
(new size) ≥ 2(n − 1) − 3 = 2n − 5. This implies that the original size was ≥ 2 + (new size) ≥ 2n − 3.

8.1 Unbounded Fan-in Circuits


We study the class of functions that are computable by unbounded fan-in circuits. Here the DAG representing a
circuit is allowed to have unbounded indegree at the vertices representing AND and OR gates.

Definition 8.1.1. AC⁰ is the class of O(1)-depth, polynomial-size circuits with unbounded fan-in AND and OR gates
(together with NOT gates).

8.2 Lower Bound via Random Restrictions


There are some assumptions we can make about AC⁰ circuits:

1. We do not have any NOT gates. As before, if a NOT gate is indeed required somewhere in a circuit, we can get
   an equivalent circuit using DeMorgan's laws that does not have any NOT gates, but uses the negations of
   variables as input gates.

2. The vertices (gates) of the circuit are partitioned into d + 1 layers 0, 1, ..., d such that: any edge (wire)
   (u, v) goes from a vertex u in layer i to a vertex v in layer i + 1 for some i; the inputs x_i, ¬x_i, i ∈ [n],
   are at layer 0; the output is at layer d; each layer consists of either all AND gates or all OR gates; and the
   layers alternate between AND gates and OR gates. Here d is the depth of the circuit.

We can convert the DAG for the circuit into this form after a topological sort. First group the AND gates and OR
gates together into layers. If there is an edge from a vertex u in layer i to a vertex v in layer j > i + 1, then
we can replace the edge with a path from u to v having one vertex in each intermediate layer, where each new
vertex represents an appropriate gate for the layer it is in. For each edge in the original circuit, the number
of gates (wires) we add is at most the (constant) depth of the original circuit.

Note that constant-depth polynomial-sized circuits remain so even after the above transformations, so we can
assume that AC⁰ circuits have these properties.
Now we are ready to prove our first lower bound on AC⁰.

Theorem 8.2.1 (Håstad [Hås86]). PAR_n ∉ AC⁰.

Proof. We will prove this by induction on the depth of the circuit. Suppose there exists a depth-d, size-n^c
circuit for PAR_n. We will show that there exists a restriction (partial assignment) such that after applying it:

• the resulting subfunction, which is either PAR or ¬PAR on the free variables, depends on ≈ n^{1/4} variables;

• there exists a depth-(d − 1), size-O(n^c) circuit for the resulting subfunction.

But by our induction hypothesis this will not be possible.
For the base case, we will show that a circuit of depth 2 computing PAR or ¬PAR must have > 2^{n−1} gates. This
will complete the proof.

For the base case, let's assume without loss of generality that layer 1 is composed of OR gates and layer 2 of an
AND gate producing the output. We claim that each OR gate must depend on every input x_i. For suppose this is not
the case: some OR gate is independent of (say) x_1. Then set x_2, x_3, ..., x_n so as to make that OR gate output
0. Then the circuit outputs 0, but the remaining subfunction (of x_1) is not constant. This gives a contradiction.
Now, for each OR gate there is a unique input that makes it output 0, and for each 0-input to PAR there must be an
OR gate that outputs 0. So the number of OR gates is at least the number of 0-inputs to PAR, which is 2^{n−1}.
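A one-line numeric check of the count used here (our own illustration):

from itertools import product

# PAR on n bits takes each value on exactly half of the 2^n inputs, so the
# canonical depth-2 (CNF) circuit for parity has 2^{n-1} OR gates, matching
# the lower bound just proved.
n = 6
zeros = sum(1 for a in product([0, 1], repeat=n) if sum(a) % 2 == 0)
assert zeros == 2 ** (n - 1)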
To prove the induction step, consider the following random restriction on the PAR function. For each i ∈ [n],
independently set

    x_i ←  0, with probability 1/2 − 1/(2√n),
           1, with probability 1/2 − 1/(2√n),
           ∗, with probability 1/√n.

Here ∗ means that the variable is not assigned a value and is free. Repeat this random experiment with the
resulting subfunction. After one restriction we get that

    E[# of free variables] = n · (1/√n) = √n.

Then by applying a Chernoff bound we see that with probability ≥ 1 − 2^{−Ω(√n)}, we have ≥ √n/2 free variables.
After two restrictions, with probability ≥ 1 − exp(−n^{Ω(1)}), we have ≥ n^{1/4}/4 free variables. Let BAD_0
denote the event that this is not the case.
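A quick simulation of this restriction (our own sketch, not from the notes):

import random

# Each variable stays free with probability 1/sqrt(n), so the number of
# free variables has mean sqrt(n) and is tightly concentrated (Chernoff).
def num_free_after_restriction(n):
    p_free = 1 / n ** 0.5
    return sum(1 for _ in range(n) if random.random() < p_free)

n = 10_000                                     # sqrt(n) = 100
trials = [num_free_after_restriction(n) for _ in range(1000)]
print(sum(trials) / len(trials), min(trials))  # mean near 100, min well above 50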
We make two claims:

Claim 1: After the first random restriction, with probability ≥ 1 − O(1/n), every layer-1 OR gate depends on
≤ 4c variables.

Claim 2: If every layer-1 OR gate depends on ≤ b variables, then after another random restriction, with
probability ≥ 1 − O(1/n), every layer-2 AND gate depends on ≤ ℓ_b variables (where ℓ_b is a constant depending on
b alone).

Now, we make the observation that a function g : {0, 1}^n → {0, 1} that depends on only ℓ of its inputs has both
a depth-2 AND-of-ORs circuit and a depth-2 OR-of-ANDs circuit computing g, each of size ≤ ℓ · 2^ℓ.

Let BAD_1 and BAD_2 respectively denote the events that Claim 1 and Claim 2 do not hold. Then with probability
≥ 1 − O(1/n), none of the bad events BAD_0, BAD_1 and BAD_2 occurs. So there exists a restriction of the
variables such that none of the bad events occurs. That restriction leaves us with a circuit on ≥ n^{1/4}/4
variables, and by the above switching argument we can make layer 2 use only OR gates and layer 1 use only AND
gates. Then we can combine layers 2 and 3 to reduce the depth by 1, completing the induction step.
Proof of Claim 1. Consider a layer-1 OR gate G. We have two cases:

Fat case: the fan-in of G is ≥ 4c log n. Then with high probability G is set to the constant 1. To be precise,

    Pr[G is not set to 1] ≤ (1/2 + 1/(2√n))^{4c log n} ≤ (2/3)^{4c log n} = (16/81)^{c log n}
                         = n^{−c log(81/16)} < n^{−2c}.

Thin case: the fan-in of G is ≤ 4c log n. Then

    Pr[G depends on ≥ 4c variables] ≤ (4c log n choose 4c) · (1/√n)^{4c} ≤ (4c log n)^{4c} · n^{−2c} ≤ n^{−1.5c}.

Applying the union bound over all ≤ n^c gates,

    Pr[∃ G that depends on ≥ 4c variables] ≤ n^c · n^{−1.5c} ≤ O(1/n).

Proof of Claim 2. The proof is by induction on b. For the base case b = 1, every layer-2 gate is "really" a
layer-1 gate, so this case reduces to Claim 1 and we can set ℓ_1 = 4c.

For the induction step, consider a layer-2 AND gate G. Let M be a maximal set of non-interacting OR gates feeding
into G, where by interacting we mean that two gates share an input variable. Let V be the set of inputs read by
the gates in M. We again have two cases:

Fat case: |V| > a log n. In this case, |M| ≥ |V|/b > (a log n)/b, so that

    Pr[a particular OR gate in M is set to 0] ≥ (1/2 − 1/(2√n))^b ≥ (1/3)^b.

Since the gates in M read disjoint sets of variables, these events are independent. Then

    Pr[no OR gate in M is set to 0] ≤ (1 − 1/3^b)^{|M|} ≤ (1 − 1/3^b)^{(a log n)/b} ≤ e^{−(a log n)/(b·3^b)}
                                   ≤ n^{−a/(b·3^b)} ≤ O(1/n), for a = b·3^b.

Thin case: |V| ≤ a log n. In this case we have

    Pr[after restriction V has ≥ i free variables] ≤ (|V| choose i) · (1/√n)^i
                                                  ≤ (a log n choose i) · n^{−i/2}
                                                  ≤ (a log n)^i · n^{−i/2} ≤ n^{−i/3}.

Choose i = 4c. Then with probability ≥ 1 − n^{−4c/3}, there are ≤ 4c free variables remaining in V.

This implies that for every one of the 2^{4c} settings of these free variables, the resulting circuit computes an
AND-of-ORs with bottom fan-in ≤ b − 1. This is because all OR gates that are not in M interact with V (by the
maximality of M), so after the setting each of them reads at most b − 1 variables. This further implies that
every one of these 2^{4c} subfunctions will (after restriction) depend on ≤ ℓ_{b−1} variables. So the whole
function under G depends on ≤ 4c + 2^{4c} · ℓ_{b−1} variables. Set ℓ_b = 4c + 2^{4c} · ℓ_{b−1} to complete the
induction step.
If we do our calculations carefully through the steps of the induction above, we can prove a lower bound of
2^{n^{Ω(1/d)}}. On the other hand, we can also construct a depth-d circuit for PAR_n of size O(n · 2^{n^{1/(d−1)}}).

8.3 Lower Bound via Polynomial Approximations

We will now look at the Razborov-Smolensky theorem, which gives an alternate proof of PAR_n ∉ AC⁰ but also tells
us more. This proof takes a more holistic view of the circuit, rather than looking at just the gates at the lower
layers.

Theorem 8.3.1 (Razborov-Smolensky [Raz87, Smo87]). PAR_n ∉ AC⁰ (but we will show more).

Proof. Since p ∧ q = ¬(¬p ∨ ¬q), we can assume that our circuit consists of OR and NOT gates alone. We will not
count NOT gates towards the depth.

We will associate with each gate G of the depth-d circuit a polynomial p_G(x_1, x_2, ..., x_n) ∈
F_3[x_1, x_2, ..., x_n] (F_3 is the field of elements {0, 1, 2} under addition and multiplication modulo 3). The
polynomial p_G will approximate the Boolean


function calculated by G in the following sense: for all a ∈ {0, 1}^n, p_G(a) ∈ {0, 1}, and
Pr_{a ∈ {0,1}^n}[p_G(a) ≠ G(a)] ≤ small(ℓ) ≈ 1/2^ℓ, where ℓ is a parameter to be fixed later.

Eventually, we will get a polynomial that approximates parity and has "low" degree, which we will see is a
contradiction.
We define a polynomial that approximates an OR gate in the following manner. We want to know whether the input x
to the OR gate is 0. Take a random vector r ∈_R {0, 1}^n and compute

    r · x mod 3 = Σ_{i∈[n]} r_i x_i mod 3 = Σ_{i∈[n], r_i=1} x_i mod 3.

If x = 0, then r · x mod 3 = 0 always. If x ≠ 0, then Pr_r[r · x mod 3 = 0] ≤ 1/2.

Equivalently, pick S ⊆ [n] uniformly at random (this is the same as picking r ∈_R {0, 1}^n above) and consider
the polynomial Σ_{i∈S} x_i ∈ F_3[x_1, x_2, ..., x_n]. If x = 0, this polynomial is always 0. Otherwise, this
polynomial is nonzero with probability ≥ 1/2, that is, its square is 1 with probability ≥ 1/2.

If we pick S_1, S_2, ..., S_ℓ ⊆ [n] uniformly and independently and construct the polynomial

    1 − (1 − (Σ_{i∈S_1} x_i)^2) · (1 − (Σ_{i∈S_2} x_i)^2) ··· (1 − (Σ_{i∈S_ℓ} x_i)^2),

then this is a polynomial of degree 2ℓ and, for a particular input, this choice is bad with probability ≤ 1/2^ℓ.
Now, by Yao's minimax lemma, we have that there exists a fixed polynomial of degree 2ℓ that agrees with OR_n
except on at most a 1/2^ℓ fraction of the inputs.
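An empirical check of this approximation (our own sketch; arithmetic is over F_3):

import random

# For x != 0, a uniform subset S has sum_{i in S} x_i != 0 (mod 3) with
# probability >= 1/2, so all ell subsets "miss" with probability <= 2^{-ell};
# only then does 1 - prod_j (1 - (sum_{i in S_j} x_i)^2) disagree with OR(x).
def approx_or(x, ell):
    prod = 1
    for _ in range(ell):
        S = [i for i in range(len(x)) if random.random() < 0.5]
        s = sum(x[i] for i in S) % 3
        prod = (prod * (1 - s * s)) % 3
    return (1 - prod) % 3

n, ell, trials = 8, 4, 10_000
x = [1] + [0] * (n - 1)                       # OR(x) = 1
errs = sum(approx_or(x, ell) != 1 for _ in range(trials))
print(errs / trials)                          # close to 1/2^ell = 0.0625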
Now we topologically sort the circuit C to get the order: x_1 = g_1, x_2 = g_2, ..., x_n = g_n, g_{n+1}, g_{n+2},
..., g_s, where s = size(C). For i = 1 to s, we write a polynomial p_{g_i} corresponding to gate g_i that
approximates the Boolean function computed at g_i:

• For i ≤ n, p_{g_i} = x_i.

• If g_i is a NOT gate with input g_j, then p_{g_i} = 1 − p_{g_j}.

• If g_i is an OR gate with inputs h_1, h_2, ..., h_k, then

      p_{g_i} = 1 − ∏_{j=1}^{ℓ} (1 − (Σ_{m∈S_j} p_{h_m})^2),

  where S_1, ..., S_ℓ are as obtained above (using Yao's lemma).

Let f = p_{g_s} be the polynomial corresponding to the output gate. Then,

    Pr_{a ∈ {0,1}^n}[f(a) ≠ PAR(a)] ≤ (number of OR gates) · (1/2^ℓ) ≤ s/2^ℓ.

Also, deg(p_{g_i}) ≤ (2ℓ)^{depth(g_i)}. So, deg(f) ≤ (2ℓ)^d.


Set ℓ such that (2ℓ)^d = √n, that is, ℓ = (1/2) · n^{1/(2d)}. Then we know that f(a) = PAR(a) for at least
2^n (1 − s/2^ℓ) vectors a ∈ {0, 1}^n, and deg(f) ≤ √n.
Now we make the great notational change:

    0 → FALSE → +1
    1 → TRUE → −1.

Then we have that there exist a set A ⊆ {−1, 1}^n, with |A| ≥ 2^n (1 − s/2^ℓ), and a polynomial
f̂ ∈ F_3[x_1, ..., x_n] such that:

1. deg(f̂) = deg(f) ≤ √n, and

2. ∀ a ∈ A : f̂(a) = PAR(a) = ∏_{i=1}^n a_i.


Consider F_3^A = {all functions A → F_3}, so that |F_3^A| = |F_3|^{|A|} = 3^{|A|}. Pick a function φ : A → F_3.
Then φ (extended arbitrarily to all of {−1, 1}^n) has a multilinear polynomial representation g(x_1, ..., x_n)
that agrees with it on {−1, 1}^n (hence with φ on A). The polynomial g(x_1, ..., x_n) is a sum of terms
(coefficient) · (monomial), where each monomial is of the form ∏_{i∈I} x_i for some I ⊆ [n].

Note that, in F_3,

    ∏_{i∈I} x_i = ∏_{i∈[n]} x_i · ∏_{i∈[n]∖I} x_i = f̂(x_1, ..., x_n) · ∏_{i∈[n]∖I} x_i

for x ∈ A (membership in A is required for the last equality; the first equality uses x_i^2 = 1 on {−1, 1}^n).
The LHS above has degree |I|, while the RHS has degree at most √n + (n − |I|). Applying this to every monomial of
degree > n/2 + √n, we get a polynomial ĝ such that g(x_1, ..., x_n) = ĝ(x_1, ..., x_n) for x ∈ A, with
deg(ĝ) ≤ n/2 + √n.
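A brute-force check of this identity (our own sketch; the equality holds over the integers, hence in F_3):

from itertools import product

# For a in {-1,1}^n we have a_i^2 = 1, so the monomial over I equals the
# full product (the parity of a) times the monomial over the complement of I.
n = 5
I = {0, 2, 3}                      # an arbitrary monomial index set
for a in product([-1, 1], repeat=n):
    lhs = 1
    for i in I:
        lhs *= a[i]
    rhs = 1
    for i in range(n):             # full product: the parity of a
        rhs *= a[i]
    for i in range(n):
        if i not in I:
            rhs *= a[i]
    assert lhs == rhs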

Now, the number of multilinear polynomials in F_3[x_1, ..., x_n] of degree ≤ n/2 + √n is

    3^{# monomials of degree ≤ n/2+√n} = 3^{Σ_{i=0}^{n/2+√n} (n choose i)} ≤ 3^{0.98 × 2^n},

where the final inequality follows from concentration results. We have that for a random variable X having
Binomial distribution with parameters n and p = 1/2, the interval [n/2 − √n, n/2 + √n] is a 95% confidence
interval, that is, Pr[n/2 − √n ≤ X ≤ n/2 + √n] ≥ 0.95. Then, because of the symmetric distribution around the
mean n/2, the interval [0, n/2 + √n] is an interval of confidence 97.5%. This gives us that
Σ_{i=0}^{n/2+√n} (n choose i) ≤ 0.98 × 2^n.
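A quick numeric check of this tail bound (our own sketch):

from math import comb, isqrt

# Fraction of subsets of [n] of size at most n/2 + sqrt(n). The cutoff sits
# two standard deviations (sqrt(n)/2 each) above the mean, so the fraction
# tends to Phi(2) ~ 0.977 and drops below 0.98 once n is large enough.
for n in [100, 400, 1600]:
    cutoff = n // 2 + isqrt(n)
    frac = sum(comb(n, i) for i in range(cutoff + 1)) / 2 ** n
    print(n, frac)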
Now, comparing the two expressions for the number of functions: distinct functions φ ∈ F_3^A yield distinct
polynomials ĝ (since ĝ agrees with φ on A), so 3^{|A|} ≤ 3^{0.98 × 2^n}, i.e. |A| ≤ 0.98 × 2^n. But
|A| ≥ 2^n (1 − s/2^ℓ), so that 1 − s/2^ℓ ≤ 0.98. This implies that

    s ≥ 0.02 × 2^ℓ = 0.02 × 2^{(1/2)·n^{1/(2d)}} = 2^{Ω(n^{1/(2d)})} = 2^{n^{Ω(1/d)}}.

In a circuit we can also have MOD_3 gates. A MOD_3 gate taking inputs x_1, ..., x_n produces output y = 1 if
Σ_i x_i mod 3 ≠ 0, and y = 0 otherwise. If we denote the class of such circuits as AC⁰[3], then the above proof
also shows that PAR_n ∉ AC⁰[3].

In general, if we have MOD_m gates taking inputs x_1, ..., x_n and producing output y = 1 if Σ_i x_i mod m ≠ 0
and y = 0 otherwise, then we have the following definition.

Definition 8.3.2. AC⁰[m] is the class of O(1)-depth, polynomial-size circuits with unbounded fan-in using AND, OR,
NOT and MOD_m gates.

Note that PAR_n ∈ AC⁰[2].

The "same" proof as above also shows that MOD_q ∉ AC⁰[p], for primes p ≠ q. This proof is due to
Smolensky [Smo87].

We do not have any idea of the computational power of AC⁰[6]. For instance, we do not know whether or not
AC⁰[6] = NEXP.

We also define the class ACC⁰ as follows.

Definition 8.3.3. ACC⁰ is the class of O(1)-depth, polynomial-size circuits with unbounded fan-in using AND, OR,
NOT and MOD_{m_1}, MOD_{m_2}, ..., MOD_{m_k} gates, for some constants m_1, m_2, ..., m_k.

We can think of ACC⁰ as the class of AC⁰ circuits but with "counters".

Bibliography

[BFP+ 73] Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert Endre Tarjan. Time
bounds for selection. JCSS, 7(4):448–461, 1973.

[BJ85] Samuel W. Bent and John W. John. Finding the median requires 2n comparisons. In STOC, pages 213–216,
1985.

[DZ99] Dorit Dor and Uri Zwick. Selecting the median. SICOMP, 28(5):1722–1758, 1999.

[DZ01] Dorit Dor and Uri Zwick. Median selection requires (2 + ε)n comparisons. SIDMA, 14(3):312–325, 2001.

[FG79] Frank Fussenegger and Harold N. Gabow. A counting approach to lower bounds for selection problems.
JACM, 26(2):227–238, 1979.

[Hås86] Johan Håstad. Almost optimal lower bounds for small depth circuits. In STOC, pages 6–20, 1986.

[Hya76] Laurent Hyafil. Bounds for selection. SICOMP, 5(1):109–114, 1976.

[Raz87] Alexander A. Razborov. Lower bounds on the size of bounded depth circuits over a complete basis with
logical addition. Mathematical Notes of the Academy of Sciences of the USSR, 41(4):333–338, 1987.

[Sip06] Michael Sipser. Introduction to the Theory of Computation. Thomson Course Technology, second edition,
2006.

[Smo87] Roman Smolensky. Algebraic methods in the theory of lower bounds for boolean circuit complexity. In
STOC, pages 77–82, 1987.

[SPP76] Arnold Schönhage, Mike Paterson, and Nicholas Pippenger. Finding the median. JCSS, 13(2):184–199,
1976.
