
Foundations and Trends® in Theoretical Computer Science
Vol. 7, Nos. 1–3 (2011) 1–336
© 2012 S. P. Vadhan
DOI: 10.1561/0400000010

Pseudorandomness
By Salil P. Vadhan

Contents

1 Introduction
1.1 Overview of this Survey
1.2 Background Required and Teaching Tips
1.3 Notational Conventions
1.4 Chapter Notes and References

2 The Power of Randomness
2.1 Polynomial Identity Testing
2.2 The Computational Model and Complexity Classes
2.3 Sampling and Approximation Problems
2.4 Random Walks and S-T Connectivity
2.5 Exercises
2.6 Chapter Notes and References

3 Basic Derandomization Techniques
3.1 Enumeration
3.2 Nonconstructive/Nonuniform Derandomization
3.3 Nondeterminism
3.4 The Method of Conditional Expectations
3.5 Pairwise Independence
3.6 Exercises
3.7 Chapter Notes and References

4 Expander Graphs
4.1 Measures of Expansion
4.2 Random Walks on Expanders
4.3 Explicit Constructions
4.4 Undirected S-T Connectivity in Deterministic Logspace
4.5 Exercises
4.6 Chapter Notes and References

5 List-Decodable Codes
5.1 Definitions and Existence
5.2 List-Decoding Algorithms
5.3 List-Decoding Views of Samplers and Expanders
5.4 Expanders from Parvaresh–Vardy Codes
5.5 Exercises
5.6 Chapter Notes and References

6 Randomness Extractors
6.1 Motivation and Definition
6.2 Connections to Other Pseudorandom Objects
6.3 Constructing Extractors
6.4 Exercises
6.5 Chapter Notes and References

7 Pseudorandom Generators
7.1 Motivation and Definition
7.2 Cryptographic PRGs
7.3 Hybrid Arguments
7.4 Pseudorandom Generators from Average-Case Hardness
7.5 Worst-Case/Average-Case Reductions and Locally Decodable Codes
7.6 Local List Decoding and PRGs from Worst-Case Hardness
7.7 Connections to Other Pseudorandom Objects
7.8 Exercises
7.9 Chapter Notes and References

8 Conclusions
8.1 A Unified Theory of Pseudorandomness
8.2 Other Topics in Pseudorandomness

Acknowledgments

References

Foundations and Trends® in Theoretical Computer Science
Vol. 7, Nos. 1–3 (2011) 1–336
© 2012 S. P. Vadhan
DOI: 10.1561/0400000010

Pseudorandomness
Salil P. Vadhan
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA, [email protected]

Abstract
This is a survey of pseudorandomness, the theory of efficiently generating objects that "look random" despite being constructed using little or no randomness. This theory has significance for a number of areas in computer science and mathematics, including computational complexity, algorithms, cryptography, combinatorics, communications, and additive number theory. Our treatment places particular emphasis on the intimate connections that have been discovered between a variety of fundamental "pseudorandom objects" that at first seem very different in nature: expander graphs, randomness extractors, list-decodable error-correcting codes, samplers, and pseudorandom generators. The structure of the presentation is meant to be suitable for teaching in a graduate-level course, with exercises accompanying each section.

1 Introduction

1.1 Overview of this Survey

Over the past few decades, randomization has become one of the most pervasive paradigms in computer science. Its widespread uses include:

Algorithm Design: For a number of important algorithmic problems, the most efficient algorithms known are randomized. For example:

Primality Testing: This was shown to have a randomized polynomial-time algorithm in 1977. It wasn't until 2002 that a deterministic polynomial-time algorithm was discovered. (We will see this algorithm, but not its proof.)

Approximate Counting: Many approximate counting problems (e.g., counting perfect matchings in a bipartite graph) have randomized polynomial-time algorithms, but the fastest known deterministic algorithms take exponential time.

Undirected S-T Connectivity: This was shown to have a randomized logspace algorithm in 1979. It wasn't until 2005 that a deterministic logspace algorithm was discovered using tools from the theory of pseudorandomness, as we will see.

Perfect Matching: This was shown to have a randomized polylogarithmic-time parallel algorithm in the late 1970s. Deterministically, we only know polynomial-time algorithms.

Even in the cases where deterministic algorithms of comparable complexity were eventually found, the randomized algorithms remain considerably simpler and more efficient than the deterministic ones.
Cryptography: Randomization is central to cryptography. Indeed, cryptography is concerned with protecting secrets, and how can something be secret if it is deterministically fixed? For example, we assume that cryptographic keys are chosen at random (e.g., uniformly from the set of n-bit strings). In addition to the keys, it is known that often the cryptographic algorithms themselves (e.g., for encryption) must be randomized to achieve satisfactory notions of security (e.g., that no partial information about the message is leaked).
Combinatorial Constructions: Randomness is often used to prove the existence of combinatorial objects with a desired property. Specifically, if one can show that a randomly chosen object has the property with nonzero probability, then it follows that such an object must, in fact, exist. A famous example due to Erdős is the existence of Ramsey graphs: a randomly chosen n-vertex graph has no clique or independent set of size 2 log n with high probability. We will see several other applications of this "Probabilistic Method" in this survey, such as with two important objects mentioned below: expander graphs and error-correcting codes.
Though these applications of randomness are interesting and rich topics of study in their own right, they are not the focus of this survey. Rather, we ask the following:

Main Question: Can we reduce or even eliminate the use of randomness in these settings?

We have several motivations for doing this.
Complexity Theory: We are interested in understanding and comparing the power of various kinds of computational resources. Since randomness is such a widely used resource, we want to know how it relates to other resources such as time, space, and parallelism. In particular, we ask: Can every randomized algorithm be derandomized with only a small loss in efficiency?
Using Physical Random Sources: It is unclear whether the real world has physical sources of perfect randomness. We may use sources that seem to have some unpredictability, like the low-order bits of a system clock or thermal noise, but these sources will generally have biases and, more problematically, correlations. Thus we ask: What can we do with a source of biased and correlated bits?
Explicit Constructions: Probabilistic constructions of combinatorial objects often do not provide us with efficient algorithms for using those objects. Indeed, the randomly chosen object often has a description that is exponential in the relevant parameters. Thus, we look for explicit constructions: ones that are deterministic and efficient. In addition to their applications, improvements in explicit constructions serve as a measure of our progress in understanding the objects at hand. Indeed, Erdős posed the explicit construction of near-optimal Ramsey graphs as an open problem, and substantial progress on this problem was recently made using the theory of pseudorandomness (namely randomness extractors).
Unexpected Applications: In addition, the theory of pseudorandomness has turned out to have many applications to problems that seem to have no connection to derandomization. These include data structures, distributed computation, circuit lower bounds and completeness results in complexity theory, reducing interaction in interactive protocols, saving memory in streaming algorithms, and more. We will see some of these applications in this survey (especially the exercises).
The paradigm we will use to study the Main Question is that of pseudorandomness: efficiently generating objects that "look random" using little or no randomness.

Specifically, we will study four pseudorandom objects:


Pseudorandom Generators: A pseudorandom generator is an algorithm that takes as input a short, perfectly random seed and then returns a (much longer) sequence of bits that "looks random." That the bits output cannot be perfectly random is clear: the output is determined by the seed, and there are far fewer seeds than possible bit sequences. Nevertheless, it is possible for the output to "look random" in a very meaningful and general-purpose sense. Specifically, we will require that no efficient algorithm can distinguish the output from a truly random sequence. The study of pseudorandom generators meeting this strong requirement originated in cryptography, where they have numerous applications. In this survey, we will emphasize their role in derandomizing algorithms.

Note that asserting that a function is a pseudorandom generator is a statement about something that efficient algorithms can't do (in this case, distinguish two sequences). But proving that efficient algorithms cannot compute things is typically out of reach for current techniques in theoretical computer science; indeed this is why the P vs. NP question is so hard. Thus, we will settle for conditional statements. An ideal theorem would be something like: "If P ≠ NP, then pseudorandom generators exist." (The assumptions we make won't exactly be P ≠ NP, but hypotheses of a similar flavor.)
Randomness Extractors: A randomness extractor takes as input a source of biased and correlated bits, and then produces a sequence of almost-uniform bits as output. Their original motivation was the simulation of randomized algorithms with sources of biased and correlated bits, but they have found numerous other applications in theoretical computer science. Ideally, extractors would be deterministic, but as we will see, this proves to be impossible for general sources of biased and correlated bits. Nevertheless, we will get close, constructing extractors that are only "mildly" probabilistic, in that they use a small (logarithmic) number of truly random bits as a seed for the extraction.
Expander Graphs: Expanders are graphs with two seemingly contradictory properties: they are sparse (e.g., having degree that is a constant, independent of the number of vertices), but also "well-connected" in some precise sense. For example, the graph cannot be bisected without cutting a large (say, constant) fraction of the edges.

Expander graphs have numerous applications in theoretical computer science. They were originally studied for their use in designing fault-tolerant networks (e.g., for telephone lines), ones that maintain good connectivity even when links or nodes fail. But they also have less obvious applications, such as an O(log n)-time algorithm for sorting in parallel.
It is not obvious that expander graphs exist, but in fact it can be shown, via the Probabilistic Method, that a random graph of degree 3 is a "good" expander with high probability. However, many applications of expander graphs need explicit constructions, and these have taken longer to find. We will see some explicit constructions in this survey, but even the state of the art does not always match the bounds given by the probabilistic method (in terms of the degree/expansion tradeoff).
Error-Correcting Codes: Error-correcting codes are tools for communicating over noisy channels. Specifically, they specify a way to encode messages into longer, redundant codewords so that even if the codeword gets somewhat corrupted along the way, it is still possible for the receiver to decode the original message. In his landmark paper that introduced the field of coding theory, Shannon also proved the existence of good error-correcting codes via the Probabilistic Method. That is, a random mapping of n-bit messages to O(n)-bit codewords is a good error-correcting code with high probability. Unfortunately, these probabilistic codes are not feasible to actually use: a random mapping requires an exponentially long description, and we know of no way to decode such a mapping efficiently. Again, explicit constructions are needed.

In this survey, we will focus on the problem of list decoding. Specifically, we will consider scenarios where the number of corruptions is so large that unique decoding is impossible; at best one can produce a short list that is guaranteed to contain the correct message.
A Unified Theory: Each of the above objects has been the center of a large and beautiful body of research, but until recently these corpora were largely distinct. An exciting development over the past decade has been the realization that all four of these objects are almost the same when interpreted appropriately. Their intimate connections will be a major focus of this survey, tying together the variety of constructions and applications that we will see.
The surprise and beauty of these connections has to do with the seemingly different nature of each of these objects. Pseudorandom generators, by asserting what efficient algorithms cannot do, are objects of complexity theory. Extractors, with their focus on extracting the entropy in a correlated and biased sequence, are information-theoretic objects. Expander graphs are of course combinatorial objects (as defined above), though they can also be interpreted algebraically, as we will see. Error-correcting codes involve a mix of combinatorics, information theory, and algebra. Because of the connections, we obtain new perspectives on each of the objects, and make substantial advances in our understanding of each by translating intuitions and techniques from the study of the others.

1.2 Background Required and Teaching Tips

The presentation assumes a good undergraduate background in the theory of computation, and general mathematical maturity. Specifically, it is assumed that the reader is familiar with basic algorithms and discrete mathematics as covered in [109], including some exposure to randomized algorithms; and with basic computational complexity including P, NP, and reductions, as covered in [366]. Experience with elementary abstract algebra, particularly finite fields, is helpful; recommended texts are [36, 263].
Most of the material in the survey is covered in a one-semester graduate course that the author teaches at Harvard University, which consists of 24 lectures of 1.5 hours each. Most of the students in that course take at least one graduate-level course in theoretical computer science before this one.

The exercises are an important part of the survey, as they include proofs of some key facts, introduce some concepts that will be used in later sections, and illustrate applications of the material to other topics. Problems that are particularly challenging or require more creativity than most are marked with a star.

1.3 Notational Conventions

We denote the set of numbers {1, . . . , n} by [n]. We write ℕ for the set of nonnegative integers (so we consider 0 to be a natural number). We write S ⊆ T to mean that S is a subset of T (not necessarily strict), and S ⊊ T for S being a strict subset of T. An inequality we use often is $\binom{n}{k} \leq (ne/k)^k$. All logarithms are base 2 unless otherwise specified. We often use the convention that lowercase letters are the logarithm (base 2) of the corresponding capital letter (e.g., N = 2^n).

Throughout, we consider random variables that can take values in arbitrary discrete sets (as well as real-valued random variables). We generally use capital letters, e.g., X, to denote random variables and lowercase letters, e.g., x, to denote specific values. We write x ←R X to indicate that x is sampled according to X. For a set S, we write x ←R S to mean that x is selected uniformly at random from S. We use the convention that multiple occurrences of a random variable in an expression refer to the same instantiation, e.g., Pr[X = X] = 1. The support of a random variable X is denoted by Supp(X), defined as {x : Pr[X = x] > 0}. For an event E, we write X|E to denote the random variable X conditioned on the event E. For a set S, we write U_S to denote a random variable distributed uniformly over S. For n ∈ ℕ, U_n is a random variable distributed uniformly over {0, 1}^n.

1.4 Chapter Notes and References

In this section, we only provide pointers to general surveys and textbooks on the topics covered, plus citations for specific results mentioned.

General introductions to the theory of pseudorandomness (other than this survey) include [162, 288, 393].
Recommended textbooks focused on randomized algorithms are [290, 291]. The specific randomized and deterministic algorithms mentioned are due to [6, 13, 220, 236, 237, 267, 287, 314, 327, 369]; for more details see Section 2.6.
Recommended textbooks on cryptography are [157, 158, 238]. The idea that encryption should be randomized is due to Goldwasser and Micali [176].

The Probabilistic Method for combinatorial constructions is the subject of the book [25]. Erdős used this method to prove the existence of Ramsey graphs in [132]. Major recent progress on explicit constructions of Ramsey graphs was obtained by Barak, Rao, Shaltiel, and Wigderson [48] via the theory of randomness extractors.
The modern notion of a pseudorandom generator was formulated in the works of Blum and Micali [72] and Yao [421], motivated by cryptographic applications. We will spend most of our time on a variant of the Blum–Micali–Yao notion, proposed by Nisan and Wigderson [302], where the generator is allowed more running time than the algorithms it fools. A detailed treatment of the Blum–Micali–Yao notion can be found in [157].
Surveys on randomness extractors are [301, 352, 354]. The notion of extractor that we will focus on is the one due to Nisan and Zuckerman [303].

A detailed survey of expander graphs is [207]. The probabilistic construction of expander graphs is due to Pinsker [309]. The application of expanders to sorting in parallel is due to Ajtai, Komlós, and Szemerédi [10].

A classic text on coding theory is [282]. For a modern, computer science-oriented treatment, we recommend Sudan's lecture notes [380]. Shannon started this field with the paper [361]. The notion of list decoding was proposed by Elias [129] and Wozencraft [420], and was reinvigorated in the work of Sudan [378]. Recent progress on list decoding is covered in [184].

2 The Power of Randomness

Before we study the derandomization of randomized algorithms, we will need some algorithms to derandomize. Thus in this section, we present a number of examples of randomized algorithms, as well as develop the complexity-theoretic framework and basic tools for studying randomized algorithms.

2.1 Polynomial Identity Testing

In this section, we give a randomized algorithm to solve the following computational problem.

Computational Problem 2.1. Polynomial Identity Testing: Given two multivariate polynomials, f(x_1, . . . , x_n) and g(x_1, . . . , x_n), decide whether f = g.


This problem statement requires some clarification. Specifically, we need to say what we mean by:

"polynomials": A (multivariate) polynomial is a finite expression of the form
$$f(x_1, \ldots, x_n) = \sum_{i_1,\ldots,i_n \in \mathbb{N}} c_{i_1,\ldots,i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}.$$

We need to specify from what space the coefficients of the polynomials come; they could be the integers, reals, rationals, etc. In general, we will assume that the coefficients are chosen from a field (a set with addition and multiplication, where every nonzero element has a multiplicative inverse) or more generally an (integral) domain (where the product of two nonzero elements is always nonzero). Examples of fields include ℚ (the rationals), ℝ (the reals), and ℤ_p (integers modulo p) where p is prime. An integral domain that is not a field is ℤ (the integers), but every integral domain is contained in its field of fractions, which is ℚ in the case of ℤ. ℤ_n for composite n is not even an integral domain. We remark that there does exist a finite field 𝔽_q of size q = p^k for every prime p and positive integer k, and in fact this field is unique (up to isomorphism). Note that 𝔽_q is only equal to ℤ_q in case q is prime (i.e., k = 1). For more background on algebra, see the references in the chapter notes.

For a polynomial $f(x_1, \ldots, x_n) = \sum_{i_1,\ldots,i_n} c_{i_1,\ldots,i_n} x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$, we define its degree (a.k.a. total degree) to be the maximum sum of the exponents $i_1 + \cdots + i_n$ over its monomials with nonzero coefficients $c_{i_1,\ldots,i_n}$. Its degree in $x_j$ is the maximum of $i_j$ over its monomials with nonzero coefficients.
"f = g": What does it mean for two polynomials to be equal? There are two natural choices: the polynomials are the same as functions (they have the same output for every point in the domain), or the polynomials are the same as formal polynomials (the coefficients for each monomial are the same). These two definitions are equivalent over the integers, but they are not equivalent over finite fields. For example, consider
$$f(x) = \prod_{\alpha \in \mathbb{F}} (x - \alpha)$$
for a finite field 𝔽.¹ It is easy to see that f(α) = 0 for all α ∈ 𝔽, but f ≠ 0 as a formal polynomial. For us, equality refers to equality as formal polynomials unless otherwise specified (but often we'll be working with domains that are larger than the degrees of the polynomials, in which case the two definitions are equivalent, as follows from the Schwartz–Zippel Lemma below).

¹When expanded and terms are collected, this polynomial f can be shown to simply equal $x^{|\mathbb{F}|} - x$.
"given": What does it mean to be given a polynomial? There are several possibilities here:

(1) As a list of coefficients: this trivializes the problem of Polynomial Identity Testing, as we can just compare. Since we measure complexity as a function of the input length, this algorithm runs in linear time.

(2) As an "oracle": a black box that, given any point in the domain, gives the value of the polynomial. Here the only explicit inputs to the algorithm are the description of the field 𝔽, the number n of variables, and a bound d on the degree of the polynomial, and we consider an algorithm to be efficient if it runs in time polynomial in the lengths of these inputs and n log |𝔽| (since it takes time at least n log |𝔽| to write down an input to the oracle).

(3) As an arithmetic formula: a sequence of symbols like (x_1 + x_2)(x_3 + x_7 + 6x_5)x_3(x_5 − x_6) + x_2 x_4 (2x_3 + 3x_5) that describes the polynomial. Observe that while we can solve Polynomial Identity Testing by expanding the polynomials and grouping terms, the expanded polynomials may have length exponential in the length of the formula, and thus the algorithm is not efficient as a function of the input length.
More general than formulas are circuits. An arithmetic circuit consists of a directed acyclic graph, consisting of input nodes, which have indegree 0 and are labeled by input variables or constants, and computation nodes, which have indegree 2 and are labeled by operations (+ or ×) specifying how to compute a value given the values at its children; one of the computation nodes is designated as the output node. Observe that every arithmetic circuit defines a polynomial in its input variables x_1, . . . , x_n. Arithmetic formulas are equivalent to arithmetic circuits where the underlying graph is a tree.

The randomized algorithm we describe will work for all of the formulations above, but is only interesting for the second and third ones (oracles and arithmetic circuits/formulas). It will be convenient to work with the following equivalent version of the problem.
Computational Problem 2.2. Polynomial Identity Testing (reformulation): Given a polynomial f(x_1, . . . , x_n), is f = 0?

That is, we consider the special case of the original problem where g = 0. Any solution for the general version of course implies one for the special case; conversely, we can solve the general version by applying the special case to the polynomial f′ = f − g.
Algorithm 2.3 (Polynomial Identity Testing).
Input: A multivariate polynomial f(x_1, . . . , x_n) of degree at most d over a field/domain 𝔽.

(1) Let S ⊆ 𝔽 be any set of size 2d.
(2) Choose α_1, . . . , α_n ←R S.
(3) Evaluate f(α_1, . . . , α_n). If the result is 0, accept. Otherwise, reject.
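To make this concrete, here is a minimal Python sketch of Algorithm 2.3 for a polynomial given as a black box over the field ℤ_p; the prime p, the repetition count, and the example polynomial are illustrative choices of ours, not part of the survey:

    import random

    def identity_test(f, n, d, p, reps=20):
        """Randomized identity test for a black-box polynomial f over Z_p.
        f maps an n-tuple of field elements to an element of Z_p and is
        assumed to have total degree at most d, with 2d <= p.
        Accepts iff every random evaluation is 0; by the Schwartz-Zippel
        Lemma, a nonzero f survives one round w.p. at most d/|S| = 1/2."""
        S = range(2 * d)  # any subset of Z_p of size 2d works
        for _ in range(reps):
            point = tuple(random.choice(S) for _ in range(n))
            if f(point) % p != 0:
                return False  # witness that f is not identically zero
        return True

    # Example: (x0 + x1)^2 - x0^2 - 2*x0*x1 - x1^2 is identically zero.
    p = 10007
    f = lambda x: (pow(x[0] + x[1], 2, p) - x[0]*x[0] - 2*x[0]*x[1] - x[1]*x[1]) % p
    print(identity_test(f, n=2, d=2, p=p))  # True

Running the test reps times drives the one-sided error down to 2^{−reps}, mirroring the error reduction discussed below.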


It is clear that if f = 0, the algorithm will always accept. The correctness in case f ≠ 0 is based on the following simple but very useful lemma.

Lemma 2.4 (Schwartz–Zippel Lemma). If f is a nonzero polynomial of degree d over a field (or integral domain) 𝔽 and S ⊆ 𝔽, then
$$\Pr_{\alpha_1,\ldots,\alpha_n \leftarrow^{\mathrm{R}} S}\left[f(\alpha_1, \ldots, \alpha_n) = 0\right] \leq \frac{d}{|S|}.$$

In the univariate case (n = 1), this amounts to the familiar fact that a polynomial with coefficients in a field and degree d has at most d roots. The proof for multivariate polynomials proceeds by induction on n, and we leave it as an exercise (Problem 2.1).
By the Schwartz–Zippel Lemma, the algorithm will err with probability at most 1/2 when f ≠ 0. This error probability can be reduced by repeating the algorithm many times (or by increasing |S|). Note that the error probability is only over the coin tosses of the algorithm, not over the input polynomial f. This is what we mean when we say "randomized algorithm"; it should work on a worst-case input with high probability over the coin tosses of the algorithm. In contrast, algorithms whose correctness (or efficiency) only holds for inputs that are randomly chosen (according to a particular distribution) are called heuristics, and their study is called average-case analysis.
Note that we need a few things to ensure that our algorithm will work.

First, we need a bound on the degree of the polynomial. We can get this in different ways depending on how the polynomial is represented. For example, for arithmetic formulas, the degree is bounded by the length of the formula. For arithmetic circuits, the degree is at most exponential in the size (or even depth) of the circuit.

We also must be able to evaluate f when the variables take arbitrary values in some set S of size 2d. For starters, this requires that the domain 𝔽 is of size at least 2d. We should also have an explicit description of the domain 𝔽 enabling us to write down and manipulate field elements, and randomly sample from a subset of size at least 2d (e.g., the description of 𝔽 = ℤ_p is simply the prime p). Then, if we are given f as an oracle, we have the ability to evaluate f by definition. If we are given f as an arithmetic formula or circuit, then we can do a bottom-up, gate-by-gate evaluation. However, over infinite domains (like ℤ), there is a subtlety: the bit-length of the numbers can grow exponentially large. Problem 2.4 gives a method for coping with this.
Since these two conditions are satisfied, we have a polynomial-time randomized algorithm for Polynomial Identity Testing for polynomials given as arithmetic formulas over ℤ (or even circuits, by Problem 2.4). There are no known subexponential-time deterministic algorithms for this problem, even for formulas in ΣΠΣ form (i.e., a sum of terms, each of which is the product of linear functions in the input variables). A deterministic polynomial-time algorithm for ΣΠΣ formulas where the outermost sum has only a constant number of terms was obtained quite recently (2005).
2.1.1 Application to Perfect Matching

Now we will see an application of Polynomial Identity Testing to an important graph-theoretic problem.
an important graph-theoretic problem.
Definition 2.5. Let G = (V, E) be a graph. A matching on G is a set E′ ⊆ E such that no two edges in E′ have a common endpoint. A perfect matching is a matching such that every vertex is incident to an edge in the matching.

Computational Problem 2.6. Perfect Matching: Given a graph G, decide whether there is a perfect matching in G.

Unlike Polynomial Identity Testing, Perfect Matching is known to have deterministic polynomial-time algorithms, e.g., using alternating paths, or by reduction to Max Flow in the bipartite case. However, both of these algorithms seem to be inherently "sequential" in nature. With randomization, we can obtain an efficient parallel algorithm.
Algorithm 2.7 (Perfect Matching in bipartite graphs).
Input: A bipartite graph G with vertices numbered 1, . . . , n on each side and edges E ⊆ [n] × [n].²

We construct an n × n matrix A where
$$A_{i,j}(x) = \begin{cases} x_{i,j} & \text{if } (i,j) \in E \\ 0 & \text{otherwise,} \end{cases}$$
where x_{i,j} is a formal variable.

Consider the multivariate polynomial
$$\det(A(x)) = \sum_{\sigma \in S_n} \mathrm{sign}(\sigma) \prod_{i} A_{i,\sigma(i)},$$
where S_n denotes the set of permutations on [n]. Note that the σth term is nonzero if and only if the permutation σ defines a perfect matching, that is, (i, σ(i)) ∈ E for all 1 ≤ i ≤ n. So det(A(x)) = 0 iff G has no perfect matching. Moreover, its degree is bounded by n, and given values α_{i,j} for the x_{i,j}'s we can evaluate det(A(α)) efficiently in parallel in polylogarithmic time using an appropriate algorithm for the determinant.

So to test for a perfect matching efficiently in parallel, just run the Polynomial Identity Testing algorithm with, say, S = {1, . . . , 2n} ⊆ ℤ, to test whether det(A(x)) = 0.

²Recall that [n] denotes the set {1, . . . , n}. See Section 1.3.
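As a concrete (sequential rather than parallel) illustration of Algorithm 2.7, the following Python sketch substitutes random values for the x_{i,j}'s and evaluates the determinant by Gaussian elimination over the field ℤ_p, instead of over S = {1, . . . , 2n} ⊆ ℤ as in the text; since det(A(x)) has ±1 coefficients and degree at most n, it remains a nonzero polynomial mod p, so the Schwartz–Zippel Lemma still applies with error at most n/p per trial. The prime and the elimination routine are our own choices:

    import random

    def det_mod_p(M, p):
        """Determinant of a square matrix M modulo a prime p,
        computed by Gaussian elimination over Z_p."""
        n = len(M)
        M = [row[:] for row in M]
        det = 1
        for col in range(n):
            pivot = next((r for r in range(col, n) if M[r][col] % p != 0), None)
            if pivot is None:
                return 0
            if pivot != col:
                M[col], M[pivot] = M[pivot], M[col]
                det = -det  # a row swap flips the sign
            det = det * M[col][col] % p
            inv = pow(M[col][col], p - 2, p)  # inverse via Fermat's little theorem
            for r in range(col + 1, n):
                factor = M[r][col] * inv % p
                for c in range(col, n):
                    M[r][c] = (M[r][c] - factor * M[col][c]) % p
        return det % p

    def has_perfect_matching(n, edges, p=2**31 - 1, reps=10):
        """Never errs when there is no perfect matching; misses an
        existing matching with probability at most (n/p)**reps."""
        edges = set(edges)
        for _ in range(reps):
            A = [[random.randrange(1, p) if (i, j) in edges else 0
                  for j in range(n)] for i in range(n)]
            if det_mod_p(A, p) != 0:
                return True
        return False

    # Example: the complete bipartite graph on [2] x [2].
    print(has_perfect_matching(2, [(0, 0), (0, 1), (1, 0), (1, 1)]))  # True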
Some remarks:

The above also provides the most efficient sequential algorithm for Perfect Matching, using the fact that Determinant has the same time complexity as Matrix Multiplication, which is known to be at most O(n^{2.38}).

More sophisticated versions of the algorithm apply to non-bipartite graphs, and enable finding perfect matchings in the same parallel or sequential time complexity (where the result for sequential time is quite recent).

Polynomial Identity Testing has also been used to obtain a randomized algorithm for Primality Testing, which was derandomized fairly recently (2002) to obtain the celebrated deterministic polynomial-time algorithm for Primality Testing. See Problem 2.5.

2.2 The Computational Model and Complexity Classes

2.2.1 Models of Randomized Computation

To develop a rigorous theory of randomized algorithms, we need to use a precise model of computation. There are several possible ways to augment a standard deterministic computational model (e.g., Turing machine or RAM model), such as:
(1) The algorithm has access to a coin-tossing black box that provides it with (unbiased and independent) random bits on request, with each request taking one time step. This is the model we will use.

(2) The algorithm has access to a black box that, given a number n in binary, returns a number chosen uniformly at random from {1, . . . , n}. This model is often more convenient for describing algorithms. Problem 2.2 shows that it is equivalent to Model 1, in the sense that any problem that can be solved in polynomial time on one model can also be solved in polynomial time on the other.

(3) The algorithm is provided with an infinite tape (i.e., sequence of memory locations) that is initially filled with random bits. For polynomial-time algorithms, this is equivalent to Model 1. However, for space-bounded algorithms, this model seems stronger, as it provides the algorithm with free storage of its random bits (i.e., not counted toward its working memory). Model 1 is considered to be the "right" model for space-bounded algorithms. It is equivalent to allowing the algorithm one-way access to an infinite tape of random bits.

We interchangeably use "random bits" and "random coins" to refer to an algorithm's random choices.
2.2.2 Complexity Classes

We will now define complexity classes that capture the power of efficient randomized algorithms. As is common in complexity theory, these classes are defined in terms of decision problems, where the set of inputs where the answer should be "yes" is specified by a language L ⊆ {0, 1}*. However, the definitions generalize in natural ways to other types of computational problems, such as computing functions or solving search problems.

Recall that we say a deterministic algorithm A runs in time t : ℕ → ℕ if A takes at most t(|x|) steps on every input x, and it runs in polynomial time if it runs in time t(n) = O(n^c) for a constant c. Polynomial time is a theoretical approximation to feasible computation, with the advantage that it is robust to reasonable changes in the model of computation and representation of the inputs.
Definition 2.8. P is the class of languages L for which there exists a deterministic polynomial-time algorithm A such that:
x ∈ L ⇒ A(x) accepts.
x ∉ L ⇒ A(x) rejects.
For a randomized algorithm A, we say that A runs in time t : ℕ → ℕ if A takes at most t(|x|) steps on every input x and every sequence of random bits.

Definition 2.9. RP is the class of languages L for which there exists a probabilistic polynomial-time algorithm A such that:
x ∈ L ⇒ Pr[A(x) accepts] ≥ 1/2.
x ∉ L ⇒ Pr[A(x) accepts] = 0.

Here (and in the definitions below) the probabilities are taken over the coin tosses of the algorithm A.


That is, RP algorithms may have false negatives; the algorithm may sometimes say "no" even if the answer is "yes," albeit with bounded probability. But the definition does not allow for false positives. Thus RP captures efficient randomized computation with one-sided error. RP stands for "randomized polynomial time." Note that the error probability of an RP algorithm can be reduced to 2^{−p(n)} for any polynomial p by running the algorithm p(n) times independently and accepting the input iff at least one of the trials accepts. By the same reasoning, the 1/2 in the definition is arbitrary, and any constant ε ∈ (0, 1) or even ε = 1/poly(n) would yield the same class of languages.
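A toy Python sketch of this one-sided amplification, with a simulated RP algorithm standing in for a real one (the simulation and all parameters here are illustrative choices of ours):

    import random

    def rp_run(x_in_L, accept_prob=0.5):
        """One run of a simulated RP algorithm: accepts w.p. >= 1/2
        when x is in L, and never accepts when x is not in L."""
        return x_in_L and random.random() < accept_prob

    def amplified_rp(x_in_L, k):
        """Accept iff at least one of k independent runs accepts;
        false negatives now occur w.p. <= 2**-k, false positives never."""
        return any(rp_run(x_in_L) for _ in range(k))

    trials = 100000
    misses = sum(not amplified_rp(True, k=10) for _ in range(trials))
    print(misses / trials)  # roughly 2**-10, about 0.001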
A central question in this survey is whether randomization enables
us to solve more problems (e.g., decide more languages) in polynomial
time:
Open Problem 2.10. Does P = RP?
Similarly, we can consider algorithms that may have false positives
but no false negatives.
Definition 2.11. co-RP is the class of languages L whose complement L̄ is in RP. Equivalently, L ∈ co-RP if there exists a probabilistic polynomial-time algorithm A such that:
x ∈ L ⇒ Pr[A(x) accepts] = 1.
x ∉ L ⇒ Pr[A(x) accepts] ≤ 1/2.
So, in co-RP we may err on "no" instances, whereas in RP we may err on "yes" instances.

Using the Polynomial Identity Testing algorithm we saw earlier, we can deduce that Polynomial Identity Testing for arithmetic formulas is in co-RP. In Problem 2.4, this is generalized to arithmetic circuits, and thus we have:
Theorem 2.12. Arithmetic Circuit Identity Testing over ℤ, defined as the language
ACIT_ℤ = {C : C(x_1, . . . , x_n) an arithmetic circuit over ℤ s.t. C = 0},
is in co-RP.


It is common to also allow two-sided error in randomized algorithms:

Definition 2.13. BPP is the class of languages L for which there exists a probabilistic polynomial-time algorithm A such that:
x ∈ L ⇒ Pr[A(x) accepts] ≥ 2/3.
x ∉ L ⇒ Pr[A(x) accepts] ≤ 1/3.

Just as with RP, the error probability of BPP algorithms can be reduced from 1/3 (or even 1/2 − 1/poly(n)) to exponentially small by repetitions, this time taking a majority vote of the outcomes. Proving this uses some facts from probability theory, which we will review in the next section.

The cumbersome notation BPP stands for "bounded-error probabilistic polynomial time," due to the unfortunate fact that PP ("probabilistic polynomial time") refers to the definition where the inputs in L are accepted with probability greater than 1/2 and inputs not in L are accepted with probability at most 1/2. Despite its name, PP is not a reasonable model for randomized algorithms, as it takes exponentially many repetitions to reduce the error probability. BPP is considered the standard complexity class associated with probabilistic polynomial-time algorithms, and thus the main question of this survey is:
Open Problem 2.14. Does BPP = P?
So far, we have considered randomized algorithms that can output an incorrect answer if they are unlucky in their coin tosses; these are called "Monte Carlo" algorithms. It is sometimes preferable to have "Las Vegas" algorithms, which always output the correct answer, but may run for a longer time if they are unlucky in their coin tosses. For this, we say that A has expected running time t : ℕ → ℕ if for every input x, the expectation of the number of steps taken by A(x) is at most t(|x|), where the expectation is taken over the coin tosses of A.


Definition 2.15. ZPP is the class of languages L for which there exists a probabilistic algorithm A that always decides L correctly and runs in expected polynomial time.

ZPP stands for "zero-error probabilistic polynomial time." The following relation between ZPP and RP is left as an exercise.

Fact 2.16 (Problem 2.3). ZPP = RP ∩ co-RP.

We do not know any other relations between the classes associated with probabilistic polynomial time.

Open Problem 2.17. Are any of the inclusions P ⊆ ZPP ⊆ RP ⊆ BPP proper?
One can similarly define randomized complexity classes associated with complexity measures other than time, such as space or parallel computation. For example:

Definition 2.18. RNC is the class of languages L for which there exists a probabilistic parallel algorithm A that runs in polylogarithmic time on polynomially many processors and such that:
x ∈ L ⇒ Pr[A(x) accepts] ≥ 1/2.
x ∉ L ⇒ Pr[A(x) accepts] = 0.
A formal model of a parallel algorithm is beyond the scope of this survey, but can be found in standard texts on algorithms or parallel computation. We have seen:

Theorem 2.19. Perfect Matching in bipartite graphs, i.e., the language PM = {G : G a bipartite graph with a perfect matching}, is in RNC.

Complexity classes associated with space-bounded computation will be discussed in Section 2.4.1.


2.2.3 Tail Inequalities and Error Reduction

In the previous section, we claimed that we can reduce the error of a BPP algorithm by taking independent repetitions and ruling by majority vote. The intuition that this should work is based on the Law of Large Numbers: if we repeat the algorithm many times, the fraction of correct answers should approach its expectation, which is greater than 1/2 (and thus majority rule will be correct). For complexity purposes, we need quantitative forms of this fact, which bound how many repetitions are needed to achieve a desired probability of correctness.

First, we recall a basic inequality which says that it is unlikely for (a single instantiation of) a random variable to exceed its expectation by a large factor.
Lemma 2.20 (Markov's Inequality). If X is a nonnegative real-valued random variable, then for any α > 0,
$$\Pr[X \geq \alpha] \leq \frac{\mathrm{E}[X]}{\alpha}.$$
Markov's Inequality alone does not give very tight concentration around the expectation; to get even a 50% probability, we need to look at deviations by a factor of 2. To get tight concentration, we can take many independent copies of a random variable. There are a variety of different tail inequalities that apply for this setting; they are collectively referred to as Chernoff Bounds.
Theorem 2.21 (A Chernoff Bound). Let X_1, . . . , X_t be independent random variables taking values in the interval [0, 1], let $X = (\sum_i X_i)/t$, and let μ = E[X]. Then
$$\Pr[|X - \mu| \geq \varepsilon] \leq 2\exp(-t\varepsilon^2/4).$$
Thus, the probability that the average deviates significantly from the expectation vanishes exponentially with the number of repetitions t. We leave the proof of this Chernoff Bound as an exercise (Problem 2.7).

Now let's apply the Chernoff Bound to analyze error reduction for BPP algorithms.


Proposition 2.22. The following are equivalent:

(1) L ∈ BPP.
(2) For every polynomial p, L has a probabilistic polynomial-time algorithm with two-sided error at most 2^{−p(n)}.
(3) There exists a polynomial q such that L has a probabilistic polynomial-time algorithm with two-sided error at most 1/2 − 1/q(n).

Proof. Clearly, (2) ⇒ (1) ⇒ (3). Thus, we prove (3) ⇒ (2).

Given an algorithm A with error probability at most 1/2 − 1/q(n), consider an algorithm A′ that on an input x of length n, runs A for t(n) independent repetitions and rules according to the majority, where t(n) is a polynomial to be determined later.

We now compute the error probability of A′ on an input x of length n. Let X_i be an indicator random variable that is 1 iff the ith execution of A(x) outputs the correct answer, and let $X = (\sum_i X_i)/t$ be the average of these indicators, where t = t(n). Note that A′(x) is correct when X > 1/2. By the error probability of A and linearity of expectations, we have E[X] ≥ 1/2 + 1/q, where q = q(n). Thus, applying the Chernoff Bound with ε = 1/q, we have
$$\Pr[X \leq 1/2] \leq 2e^{-t/4q^2} < 2^{-p(n)},$$
for t(n) = 4p(n)q(n)² and sufficiently large n.
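The following Python sketch checks this majority-vote amplification empirically, using a simulated base algorithm of error 1/2 − 1/q (the simulation and the parameters are ours, for illustration only):

    import random

    def amplified(correct_prob, t):
        """Majority vote over t independent runs of a base algorithm
        that answers correctly w.p. correct_prob > 1/2 (t odd)."""
        wins = sum(random.random() < correct_prob for _ in range(t))
        return wins > t / 2

    q = 10
    correct_prob = 0.5 + 1 / q  # base error 1/2 - 1/q = 0.4
    for t in [1, 51, 201, 801]:
        trials = 2000
        errors = sum(not amplified(correct_prob, t) for _ in range(trials))
        print(f"t={t:4d}  empirical error ~ {errors / trials:.3f}")
    # The error decays like exp(-t/4q^2), as in the proof above.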

2.3 Sampling and Approximation Problems

2.3.1 Sampling

The power of randomization is well known to statisticians. If we want to estimate the mean of some quantity over a large population, we can do so very efficiently by taking the average over a small random sample.

Formally, here is the computational problem we are interested in solving.


Computational Problem 2.23. Sampling (a.k.a. [+ε]-Approx Oracle Average): Given oracle access to a function f : {0, 1}^m → [0, 1], estimate μ(f) := E[f(U_m)] to within an additive error of ε, where U_m is uniformly distributed in {0, 1}^m (as in Section 1.3). That is, output an answer in the interval [μ − ε, μ + ε].
And here is the algorithm:

Algorithm 2.24 ([+ε]-Approx Oracle Average).
Input: oracle access to a function f : {0, 1}^m → [0, 1], and a desired error probability δ ∈ (0, 1)

(1) Choose x_1, . . . , x_t ←R {0, 1}^m, for an appropriate choice of t = O(log(1/δ)/ε²).
(2) Query the oracle to obtain f(x_1), . . . , f(x_t).
(3) Output $(\sum_i f(x_i))/t$.
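A minimal Python rendering of Algorithm 2.24, with the oracle given as an ordinary function; the explicit constant 4 in t and the example oracle are our own choices:

    import math
    import random

    def approx_oracle_average(f, m, eps, delta):
        """Estimate mu(f) = E[f(U_m)] to within +/- eps, except with
        probability <= delta; t = O(log(1/delta)/eps^2) samples suffice
        by the Chernoff Bound (Theorem 2.21)."""
        t = math.ceil(4 * math.log(2 / delta) / eps ** 2)
        return sum(f(random.getrandbits(m)) for _ in range(t)) / t

    # Example oracle: the fraction of 1s among the m bits of x.
    m = 64
    f = lambda x: bin(x).count("1") / m
    print(approx_oracle_average(f, m, eps=0.01, delta=0.001))  # ~0.5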
The correctness of this algorithm follows from the Chernoff Bound (Theorem 2.21). Note that for constant ε and δ, the sample size t is independent of the size of the population (2^m), and we have running time poly(m) even for ε = 1/poly(m) and δ = 2^{−poly(m)}.

For this problem, we can prove that no deterministic algorithm can be nearly as efficient.
Proposition 2.25. Any deterministic algorithm solving [+(1/4)]-Approx Oracle Average must make at least 2^m/2 queries to its oracle.

Proof. Suppose we have a deterministic algorithm A that makes fewer than 2^m/2 queries. Let Q be the set of queries made by A when all of its queries are answered by 0. (We need to specify how the queries are answered to define Q, because A may make its queries adaptively, with future queries depending on how earlier queries were answered.)


Now define two functions:
$$f_0(x) = 0 \text{ for all } x, \qquad f_1(x) = \begin{cases} 0 & x \in Q \\ 1 & x \notin Q. \end{cases}$$
Then A gives the same answer on both f_0 and f_1 (since all the oracle queries return 0 in both cases), but μ(f_0) = 0 and μ(f_1) > 1/2, so the answer must have error greater than 1/4 for at least one of the functions.
Thus, randomization provides an exponential savings for approximating the average of a function on a large domain. However, this does not show that BPP ≠ P. There are two reasons for this:

(1) [+ε]-Approx Oracle Average is not a decision problem, and indeed it is not clear how to define languages that capture the complexity of approximation problems. However, below we will see how a slightly more general notion of decision problem does allow us to capture approximation problems such as this one.

(2) More fundamentally, it does not involve the standard model of input as used in the definitions of P and BPP. Rather than the input being a string that is explicitly given to the algorithm (where we measure complexity in terms of the length of the string), the input is an exponential-sized oracle to which the algorithm is given random access. Even though this is not the classical notion of input, it is an interesting one that has received a lot of attention in recent years, because it allows for algorithms whose running time is sublinear (or even polylogarithmic) in the actual size of the input (e.g., 2^m in the example here). As in the example here, typically such algorithms require randomization and provide approximate answers.
2.3.2 Promise Problems

Now we will try to find a variant of the [+ε]-Approx Oracle Average problem that is closer to the P versus BPP question. First, to obtain the standard notion of input, we consider functions that are presented in a concise form, as Boolean circuits C : {0, 1}^m → {0, 1} (analogous to the algebraic circuits defined in Section 2.1, but now the inputs take on Boolean values and the computation gates are ∧, ∨, and ¬).

Next, we need a more general notion of decision problem than languages:
Definition 2.26. A promise problem Π consists of a pair (Π_Y, Π_N) of disjoint sets of strings, where Π_Y is the set of yes instances and Π_N is the set of no instances. The corresponding computational problem is: given a string that is promised to be in Π_Y or Π_N, decide which is the case.
All of the complexity classes we have seen have natural promise-problem analogues, which we denote by prP, prRP, prBPP, etc. For example:

Definition 2.27. prBPP is the class of promise problems Π = (Π_Y, Π_N) for which there exists a probabilistic polynomial-time algorithm A such that:
x ∈ Π_Y ⇒ Pr[A(x) accepts] ≥ 2/3.
x ∈ Π_N ⇒ Pr[A(x) accepts] ≤ 1/3.
Since every language L corresponds to the promise problem (L, L̄), any fact that holds for every promise problem in some promise-class also holds for every language in the corresponding language class. In particular, if every prBPP algorithm can be derandomized, so can every BPP algorithm:

Proposition 2.28. prBPP = prP ⇒ BPP = P.

Except where otherwise noted, all of the results in this survey about language classes (like BPP) easily generalize to the corresponding promise classes (like prBPP), but we state most of the results in terms of the language classes for notational convenience.


Now we consider the following promise problem.

Computational Problem 2.29. [+ε]-Approx Circuit Average is the promise problem CA_ε, defined as:
CA_Y = {(C, p) : μ(C) > p + ε}
CA_N = {(C, p) : μ(C) ≤ p}
Here ε can be a constant or a function of the input length n = |(C, p)|.
It turns out that this problem completely captures the power of probabilistic polynomial-time algorithms.

Theorem 2.30. For every function ε such that 1/poly(n) ≤ ε(n) ≤ 1 − 1/2^{n^{o(1)}}, [+ε]-Approx Circuit Average is prBPP-complete. That is, it is in prBPP and every promise problem in prBPP reduces to it.
Proof Sketch: Membership in prBPP follows from Algorithm 2.24 and the fact that Boolean circuits can be evaluated in polynomial time.

For prBPP-hardness, consider any promise problem Π ∈ prBPP. We have a probabilistic t(n)-time algorithm A that decides Π with two-sided error at most 2^{−n} on inputs of length n, where t(n) = poly(n). As an upper bound, A uses at most t(n) random bits. Thus we can view A as a deterministic algorithm on two inputs: its regular input x ∈ {0, 1}^n and its coin tosses r ∈ {0, 1}^{t(n)}. (This view of a randomized algorithm is useful throughout the study of pseudorandomness.) We'll write A(x; r) for A's output on input x and coin tosses r. For every n, there is a circuit C(x; r) of size poly(t(n)) = poly(n) that simulates the computation of A for any input x ∈ {0, 1}^n, r ∈ {0, 1}^{t(n)}, and moreover C can be constructed in time poly(t(n)) = poly(n). (See any text on complexity theory for a proof that circuits can efficiently simulate Turing machine computations.) Let C_x(r) be the circuit C with x hardwired in. Then the map x ↦ (C_x, 1/2^n) is a polynomial-time reduction from Π to [+ε]-Approx Circuit Average. Indeed, if x ∈ Π_N, then A accepts with probability at most 1/2^n, so μ(C_x) ≤ 1/2^n. And if x ∈ Π_Y, then μ(C_x) ≥ 1 − 1/2^n > 1/2^n + ε(n′), where n′ = |(C_x, 1/2^n)| = poly(n) and we take n sufficiently large.


Consequently, derandomizing this one algorithm is equivalent to derandomizing all of prBPP:

Corollary 2.31. [+ε]-Approx Circuit Average is in prP if and only if prBPP = prP.

Note that our proof of Proposition 2.25, giving an exponential lower bound for [+ε]-Approx Oracle Average, does not extend to [+ε]-Approx Circuit Average. Indeed, it's not even clear how to define the notion of a "query" for an algorithm that is given a circuit C explicitly; it can do arbitrary computations that involve the internal structure of the circuit. Moreover, even if we restrict attention to algorithms that only use the input circuit C as if it were an oracle (other than computing the input length |(C, p)| to know how long it can run), there is no guarantee that the function f_1 constructed in the proof of Proposition 2.25 has a small circuit.
2.3.3 Approximate Counting to within Relative Error

Note that [+ε]-Approx Circuit Average can be viewed as the problem of approximately counting the number of satisfying assignments of a circuit C : {0, 1}^m → {0, 1} to within additive error ε · 2^m, and a solution to this problem may give useless answers for circuits that don't have very many satisfying assignments (e.g., circuits with fewer than 2^{m/2} satisfying assignments). Thus people typically study approximate counting to within relative error. For example, given a circuit C, output a number that is within a (1 + ε) factor of its number of satisfying assignments, #C. Or the following essentially equivalent decision problem:

Computational Problem 2.32. [×(1 + ε)]-Approx #CSAT is the following promise problem:
CSAT_Y = {(C, N) : #C > (1 + ε) · N}
CSAT_N = {(C, N) : #C ≤ N}
Here ε can be a constant or a function of the input length n = |(C, N)|.


Unfortunately, this problem is NP-hard for general circuits (consider the special case that N = 0), so we do not expect a prBPP algorithm. However, there is a very pretty randomized algorithm if we restrict to DNF formulas.

Computational Problem 2.33. [×(1 + ε)]-Approx #DNF is the restriction of [×(1 + ε)]-Approx #CSAT to C that are formulas in disjunctive normal form (DNF) (i.e., an OR of clauses, where each clause is an AND of variables or their negations).
Theorem 2.34. For every function ε(n) ≥ 1/poly(n), [×(1 + ε)]-Approx #DNF is in prBPP.

Proof. It suffices to give a probabilistic polynomial-time algorithm that estimates the number of satisfying assignments to within a 1 ± ε factor. Let φ(x_1, . . . , x_m) be the input DNF formula.
A first approach would be to apply random sampling as we have used above: Pick t random assignments uniformly from {0, 1}^m and evaluate φ on each. If k of the assignments satisfy φ, output (k/t) · 2^m. However, if #φ is small (e.g., 2^{m/2}), random sampling will be unlikely to hit any satisfying assignments, and our estimate will be 0.
The idea to get around this difficulty is to embed the set of satisfying assignments, A, in a smaller set B so that sampling can be useful. Specifically, we will define sets A′ ⊆ B satisfying the following properties:

(1) |A′| = |A|.
(2) |A′| ≥ |B|/poly(n), where n = |(φ, N)|.
(3) We can decide membership in A′ in polynomial time.
(4) |B| is computable in polynomial time.
(5) We can sample uniformly at random from B in polynomial time.

Letting ℓ be the number of clauses, we define A′ and B as follows:

B = {(i, α) ∈ [ℓ] × {0, 1}^m : α satisfies the ith clause}
A′ = {(i, α) ∈ B : α does not satisfy any clauses before the ith clause}


Now we verify the desired properties:

(1) |A| = |A′| because for each satisfying assignment α, A′ contains only one pair (i, α), namely the one where the first clause satisfied by α is the ith one.
(2) The sizes of A′ and B can differ by at most a factor of ℓ, since each satisfying assignment α occurs at most ℓ times in B.
(3) It is easy to decide membership in A′ in polynomial time.
(4) $|B| = \sum_{i=1}^{\ell} 2^{m - m_i}$, where m_i is the number of literals in clause i (after removing contradictory clauses that contain both a variable and its negation).
(5) We can randomly sample from B as follows. First pick a clause with probability proportional to the number of satisfying assignments it has (2^{m−m_i}). Then, fixing those variables in the clause (e.g., if x_j is in the clause, set x_j = 1, and if ¬x_j is in the clause, set x_j = 0), assign the rest of the variables uniformly at random.
Putting this together, we deduce the following algorithm:

Algorithm 2.35 ([×(1 + ε)]-Approx #DNF).
Input: A DNF formula φ(x_1, . . . , x_m) with ℓ clauses

(1) Generate a random sample of t points in B = {(i, α) ∈ [ℓ] × {0, 1}^m : α satisfies the ith clause}, for an appropriate choice of t = O(1/(ε/ℓ)²).
(2) Let μ̂ be the fraction of sample points that land in A′ = {(i, α) ∈ B : α does not satisfy any clauses before the ith clause}.
(3) Output μ̂ · |B|.

By the Chernoff Bound, for an appropriate choice of t = O(1/(ε/ℓ)²), we have μ̂ = |A′|/|B| ± ε/ℓ with high probability (where we write γ = β ± ε to mean that γ ∈ [β − ε, β + ε]). Thus, with high probability the output of the algorithm satisfies:
$$\hat{\mu} \cdot |B| = |A'| \pm \varepsilon|B|/\ell = |A| \pm \varepsilon|A|.$$
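Here is a compact Python sketch of Algorithm 2.35 (this importance-sampling scheme for DNF counting is commonly attributed to Karp and Luby; the clause encoding as a dict of forced values and the explicit sample size are our own illustrative choices):

    import random

    def approx_count_dnf(clauses, m, eps):
        """Estimate #phi for a DNF over m variables to within a
        1 +/- eps factor, with high probability. Each clause is a dict
        {variable_index: required_bool}; contradictory clauses are
        assumed to have been removed already."""
        ell = len(clauses)
        # |B| = sum_i 2^(m - m_i), where m_i = number of literals in clause i.
        weights = [2 ** (m - len(c)) for c in clauses]
        B_size = sum(weights)
        if B_size == 0:
            return 0

        def sample_from_B():
            # Pick clause i w.p. proportional to 2^(m - m_i), then
            # extend it to a uniformly random satisfying assignment.
            i = random.choices(range(ell), weights=weights)[0]
            alpha = {v: random.random() < 0.5 for v in range(m)}
            alpha.update(clauses[i])  # force the clause's literals
            return i, alpha

        def in_A_prime(i, alpha):
            # (i, alpha) is in A' iff no clause before the ith one
            # is satisfied by alpha.
            return not any(all(alpha[v] == b for v, b in clauses[j].items())
                           for j in range(i))

        t = int(8 * (ell / eps) ** 2)  # t = O(1/(eps/ell)^2)
        hits = sum(in_A_prime(*sample_from_B()) for _ in range(t))
        return (hits / t) * B_size

    # Example: phi = (x0 AND x1) OR (NOT x0) over m = 3 variables; #phi = 6.
    print(approx_count_dnf([{0: True, 1: True}, {0: False}], m=3, eps=0.1))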


There is no deterministic polynomial-time algorithm known for this problem:

Open Problem 2.36. Give a deterministic polynomial-time algorithm for approximately counting the number of satisfying assignments to a DNF formula.

However, when we study pseudorandom generators in Section 7, we will see a quasipolynomial-time derandomization of the above algorithm (i.e., one in time 2^{polylog(n)}).
2.3.4 MaxCut

We give an example of another algorithmic problem for which random sampling is a useful tool.

Definition 2.37. For a graph G = (V, E) and S, T ⊆ V, define cut(S, T) = {{u, v} ∈ E : u ∈ S, v ∈ T}, and cut(S) = cut(S, V \ S).

Computational Problem 2.38. MaxCut (search version): Given a graph G, find a largest cut in G, i.e., a set S maximizing |cut(S)|.

Solving this problem optimally is NP-hard (in contrast to MinCut, which is known to be in P). However, there is a simple randomized algorithm that finds a cut of expected size at least |E|/2 (which is of course at least 1/2 the optimal, and hence this is referred to as a "1/2-approximation algorithm"):

Algorithm 2.39 (MaxCut approximation).
Input: A graph G = (V, E) (with no self-loops)

Output a random subset S ⊆ V. That is, place each vertex v in S independently with probability 1/2.


To analyze this algorithm, consider any edge e = (u, v). Then the probability that e crosses the cut is 1/2. By linearity of expectations, we have:
$$\mathrm{E}[|\mathrm{cut}(S)|] = \sum_{e \in E} \Pr[e \text{ is cut}] = |E|/2.$$
This also serves as a proof, via the probabilistic method, that every graph (without self-loops) has a cut of size at least |E|/2.
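For concreteness, a few lines of Python implementing Algorithm 2.39 and checking the expected cut size empirically; the example graph is our own:

    import random

    def random_cut(vertices, edges):
        """Algorithm 2.39: place each vertex in S independently w.p. 1/2."""
        S = {v for v in vertices if random.random() < 0.5}
        cut_size = sum(1 for (u, v) in edges if (u in S) != (v in S))
        return S, cut_size

    # Example: a 4-cycle; |E|/2 = 2 (and the best cut has size 4).
    vertices = [0, 1, 2, 3]
    edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
    avg = sum(random_cut(vertices, edges)[1] for _ in range(10000)) / 10000
    print(avg)  # ~2.0, matching E[|cut(S)|] = |E|/2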
In Section 3, we will see how to derandomize this algorithm. We note that there is a much more sophisticated randomized algorithm that finds a cut whose expected size is within a factor of 0.878 of the largest cut in the graph (and this algorithm can also be derandomized).

2.4
2.4.1

Random Walks and S-T Connectivity


Graph Connectivity

One of the most basic problems in computer science is that of deciding


connectivity in graphs:
Computational Problem 2.40. S-T Connectivity: Given a
directed graph G and two vertices s and t, is there a path from s
to t in G?
This problem can of course be solved in linear time using breadthrst or depth-rst search. However, these algorithms also require linear
space. It turns out that S-T Connectivity can in fact be solved
using much less workspace. (When measuring the space complexity of
algorithms, we do not count the space for the (read-only) input and
(write-only) output.)
Theorem 2.41. There is a deterministic algorithm deciding S-T Connectivity using space O(log2 n) (and time O(nlog n )).
Proof. The following recursive algorithm IsPath(G, u, v, k) decides
whether there is a path of length at most k from u to v.

2.4 Random Walks and S-T Connectivity

33

Algorithm 2.42 (Recursive S-T Connectivity).


IsPath(G, u, v, k):
(1) If k = 0, accept if u = v.
(2) If k = 1, accept if u = v or (u, v) is an edge in G.
(3) Otherwise, loop through all vertices w in G and accept if
both IsPath(G, u, w, k/2) and IsPath(G, w, v, k/2) accept
for some w.

We can solve S-T Connectivity by running IsPath(G, s, t, n),


where n is the number of vertices in the graph. The algorithm has
log n levels of recursion and uses log n space per level of recursion (to
store the vertex w), for a total space bound of log2 n. Similarly, the
algorithm uses linear time per level of recursion, for a total time bound
of O(n)log n .
It is not known how to improve the space bound in Theorem 2.41
or to get the running time down to polynomial while maintaining space
no(1) . For undirected graphs, however, we can do much better using a
randomized algorithm. Specically, we can place the problem in the
following class:
Denition 2.43. A language L is in RL if there exists a randomized
algorithm A that always halts, uses space at most O(log n) on inputs
of length n, and satises:
x L Pr[A(x) accepts] 1/2.
x
 L Pr[A(x) accepts] = 0.
Recall that our model of a randomized space-bounded machine is
one that has access to a coin-tossing box (rather than an innite tape
of random bits), and thus must explicitly store in its workspace any
random bits it needs to remember. The requirement that A always halts
ensures that its running time is at most 2O(log n) = poly(n), because
otherwise there would be a loop in its conguration space. Similarly to

34

The Power of Randomness

RL, we can dene L (deterministic logspace), co-RL (one-sided error


with errors only on no instances), and BPL (two-sided error).
Now we can state the theorem for undirected graphs.
Computational Problem 2.44. Undirected S-T Connectivity:
Given an undirected graph G and two vertices s and t, is there a path
from s to t in G?

Theorem 2.45. Undirected S-T Connectivity is in RL.


Proof Sketch: The algorithm simply does a polynomial-length random walk starting at s.
Algorithm 2.46 (Undirected S-T Connectivity via Random
Walks).
Input: (G, s, t), where G = (V, E) has n vertices.
(1) Let v = s.
(2) Repeat poly(n) times:
(a) If v = t, halt and accept.
R
(b) Else let v {w : (v, w) E}.
(3) Reject (if we havent visited t yet).

Notice that this algorithm only requires space O(log n), to maintain
the current vertex v as well as a counter for the number of steps taken.
Clearly, it never accepts when there isnt a path from s to t. In the next
section, we will prove that in any connected undirected graph, a random
walk of length poly(n) from one vertex will hit any other vertex with
high probability. Applying this to the connected component containing
s, it follows that the algorithm accepts with high probability when s
and t are connected.
This algorithm, dating from the 1970s, was derandomized only
in 2005. We will cover the derandomized algorithm in Section 4.4.

2.4 Random Walks and S-T Connectivity

35

However, the general question of derandomizing space-bounded algorithms remains open.


Open Problem 2.47. Does RL = L? Does BPL = L?
2.4.2

Random Walks on Graphs

For generality that will be useful later, many of the denitions in this
section will be given for directed multigraphs (which we will refer to as
digraphs for short). By multigraph, we mean that we allow G to have
parallel edges and self-loops. Henceforth, we will refer to graphs without parallel edges and self-loops as simple graphs. We call a digraph
d-regular if every vertex has indegree d and outdegree d. A self-loop is
considered a single directed edge (i, i) from a vertex i to itself, so contributes 1 to both the indegree and outdegree of vertex i. An undirected
graph is a graph where the number of edges from i to j equals the number of edges from j to i for every i and j. (When i = j, we think of a pair
of edges (i, j) and (j, i) as comprising a single undirected edge {i, j},
and a self-loop (i, i) also corresponds to the single undirected edge {i}.)
To analyze the random-walk algorithm of the previous section, it
suces to prove a bound on the hitting time of random walks.
Denition 2.48. For a digraph G = (V, E), we dene its hitting
time as
hit(G) = max min{t : Pr[a random walk of
i,jV

length t started at i visits j] 1/2}.


We note that hit(G) is often dened as the maximum over vertices
i and j of the expected time for a random walk from i to visit j. The
two denitions are the same up to a factor of 2, and the above is more
convenient for our purposes.
We will prove:
Theorem 2.49. For every connected undirected graph G with n vertices and maximum degree d, we have hit(G) = O(d2 n3 log n).

36

The Power of Randomness

We observe that it suces to prove this theorem for d-regular


graphs, because any undirected graph be made regular by adding selfloops (and this can only increase the hitting time). This is the part of
our proof that fails for directed graphs (adding self-loops cannot correct
an imbalance between indegree and outdegree at a vertex), and indeed
Problem 2.10 shows that general directed graphs can have exponential
hitting time.
There are combinatorial methods for proving the above theorem,
but we will prove it using a linear-algebraic approach, as the same
methods will be very useful in our study of expander graphs. For an
n-vertex digraph G, we dene its random-walk transition matrix, or
random-walk matrix for short, to be the n n matrix M where Mi,j
is the probability of going from vertex i to vertex j in one step. That
is, Mi,j is the number of edges from i to j divided by the outdegree
of i. In case G is d-regular, M is simply the adjacency matrix of G
divided by d. Notice that for every probability distribution Rn on
the vertices of G (written as a row vector), the vector M is the probability distribution obtained by selecting a vertex i according to and
then taking one step of the random walk to end at a vertex j. This is

because (M )j = i i Mi,j .
In our application to bounding hit(G) for a regular digraph G, we
start at a probability distribution = (0, . . . , 0, 1, 0, . . . , 0) Rn concentrated at vertex i, and are interested in the distribution M t we get
after taking t steps on the graph. Specically, wed like to show that
it places nonnegligible mass on vertex j for t = poly(n). We will do
this by showing that it in fact converges to the uniform distribution
u = (1/n, 1/n, . . . , 1/n) Rn within a polynomial number of steps. Note
that uM = u by the regularity of G, so convergence to u is possible (and
will be guaranteed given some additional conditions on G).
We will measure the rate of convergence in the 2 norm. For vec
tors x, y Rn , we will
use
the
standard
inner
product
x,
y
=
i xi yi ,

and 2 norm x = x, x. We write x y to mean that x and y are
orthogonal, i.e., x, y = 0. We want to determine how large k needs
to be so that M k u is small. This is referred to as the mixing
time of the random walk. Mixing time can be dened with respect to
various distance measures and the 2 norm is not the most natural one,

2.4 Random Walks and S-T Connectivity

37

but it has the advantage that we will be able to show that the distance
decreases noticeably in every step. This is captured by the following
quantity.
Denition 2.50. For a regular digraph G with random-walk matrix
M , we dene
def

(G) = max

M u
xM 
,
= max
xu x
 u

where the rst maximization is over all probability distributions


[0, 1]n and the second is over all vectors x Rn such that x u. We
def

write (G) = 1 (G).


To see that the rst denition of (G) is smaller than or equal to
the second, note that for any probability distribution , the vector
x = ( u) is orthogonal to uniform (i.e., the sum of its entries is
zero). For the converse, observe that given any vector x u, the vector
= u + x is a probability distribution for a suciently small . It can
be shown that (G) [0, 1]. (This follows from Problems 2.10 and 2.9.)
The following lemma is immediate from the denition of (G).
Lemma 2.51. Let G be a regular digraph with random-walk matrix
M . For every initial probability distribution on the vertices of G and
every t N, we have
M t u (G)t  u (G)t .

Proof. The rst inequality follows from the denition of (G) and
induction. For the second, we have:
 u2 =  2  + u2 2, u


i2 + 1/N 2/N
i
=
i

i + 1/N 2/N

= 1 1/N 1.


i

38

The Power of Randomness

Thus a smaller value of (G) (equivalently, a larger value of (G))


means that the random walk mixes more quickly. Specically, for t
ln(n/)/(G), it follows that every entry of M t has probability mass
at least 1/n (1 (G))t (1 )/n, and thus we should think of
the walk as being mixed within O((log n)/(G)). Note that after
O(1/(G)) steps, the 2 distance is already small (say at most 1/10),
but this is not a satisfactory notion of mixing a distribution that
assigns 2 mass to 1/2 vertices has 2 distance smaller than from the
uniform distribution.
Corollary 2.52.
O(n log n/(G)).

For

every

regular

digraph

G,

hit(G) =

Proof. Let i, j be any two vertices of G. As argued above, a walk of


length ln(2n)/(G) from any start vertex has a probability of at least
1/2n of ending at j. Thus, if we do O(n) such walks consecutively, we
will hit j with probability at least 1/2.
Thus, our task reduces to bounding (G):
Theorem 2.53. If G is a connected, nonbipartite, and d-regular undirected graph on n vertices, then (G) = (1/(dn)2 ).
Theorem 2.53 and Corollary 2.52 imply Theorem 2.49, as any undirected and connected graph of maximum degree d can be made (d + 1)regular and nonbipartite by adding self-loops to each vertex. (We
remark that the bounds of Theorems 2.53 and 2.49 are not tight.) Theorem 2.53 is proven in Problem 2.9, using a connection with eigenvalues
described in the next section.
2.4.3

Eigenvalues

Recall that a nonzero vector v Rn is an eigenvector of n n matrix


M if vM = v for some R, which is called the corresponding eigenvalue. A useful feature of symmetric matrices is that they can be
described entirely in terms of their eigenvectors and eigenvalues.

2.4 Random Walks and S-T Connectivity

39

Theorem 2.54 (Spectral Theorem for Symmetric Matrices). If


M is a symmetric n n real matrix with distinct eigenvalues 1 , . . . , k ,
then the eigenspaces Wi = {v Rn : vM = i M } are orthogonal (i.e.,
v Wi , w Wj v w if i = j) and span Rn (i.e., Rn = W1 + +
Wk ). We refer to the dimension of Wi as the multiplicity of eigenvalue
i . In particular, Rn has a basis consisting of orthogonal eigenvectors
v1 , . . . , vn having respective eigenvalues 1 , . . . , n , where the number of
times i occurs among the j s exactly equals the multiplicity of i .
Notice that if G is a undirected regular graph, then its random-walk
matrix M is symmetric. We know that uM = u, so the uniform distribution is an eigenvector of eigenvalue 1. Let v2 , . . . , vn and 2 , . . . , n
be the remaining eigenvectors and eigenvalues, respectively. Given any
probability distribution , we can write it as = u + c2 v2 + + cn vn .
Then the probability distribution after k steps on the random walk is
M t = u + t2 c2 v2 + + tn cn vn .
In Problem 2.9, it is shown that all of the i s have absolute value at
most 1. Notice that if they all have magnitude strictly smaller than 1,
then M t indeed converges to u. Thus it is not surprising that our
measure of mixing rate, (G), equals the absolute value of the second
largest eigenvalue (in magnitude).
Lemma 2.55. Let G be an undirected graph with random-walk matrix
M . Let 1 , . . . , n be the eigenvalues of M , sorted so that 1 = 1
|2 | |3 | |n |. Then (G) = |2 |.
Proof. Let u = v1 , v2 , . . . , vn be the basis of orthogonal eigenvectors
corresponding to the i s. Given any vector x u, we can write x =
c2 v2 + + cn vn . Then:
xM 2 = 2 c2 v2 + + n cn vn 2
= 22 c22 v2 2 + + 2n c2n vn 2

40

The Power of Randomness

|2 |2 (c22 v2 2 + + c2n vn 2 )


= |2 |2 x2
Equality is achieved with x = v2 .
Thus, bounding (G) amounts to bounding the eigenvalues of G.
Due to this connection, (G) = 1 (G) is often referred to as the
spectral gap, as it is the gap between the largest eigenvalue and the
second largest eigenvalue in absolute value.
2.4.4

Markov Chain Monte Carlo

Random walks are a powerful tool in the design of randomized algorithms. In particular, they are the heart of the Markov Chain Monte
Carlo method, which is widely used in statistical physics and for solving approximate counting problems. In these applications, the goal is
to generate a random sample from an exponentially large space, such
as an (almost) uniformly random perfect matching for a given bipartite graph G. (It turns out that this is equivalent to approximately
counting the number of perfect matchings in G.) The approach is to
dened on
do a random walk on an appropriate (regular) graph G
the space (e.g., by doing random local changes on the current per is typically of size exponential in the
fect matching). Even though G
input size n = |G|, in many cases it can be proven to have mixing time
a property referred to as rapid mixing. These
poly(n) = polylog(|G|),
Markov Chain Monte Carlo methods provide some of the best examples
of problems where randomization yields algorithms that are exponentially faster than all known deterministic algorithms.

2.5

Exercises

Problem 2.1 (SchwartzZippel lemma). Prove Lemma 2.4: If


f (x1 , . . . , xn ) is a nonzero polynomial of degree d over a eld (or integral
domain) F and S F, then
Pr

1 ,...,n S

[f (1 , . . . , n ) = 0]

d
.
|S|

2.5 Exercises

41

You may use the fact that every nonzero univariate polynomial of
degree d over F has at most d roots.

Problem 2.2 (Robustness of the model). Suppose we modify our


model of randomized computation to allow the algorithm to obtain
a random element of {1, . . . , m} for any number m whose binary
representation it has already computed (as opposed to just allowing
it access to random bits). Show that this would not change the classes
BPP and RP.

Problem 2.3 (Zero error versus 1-sided error). Prove that


ZPP = RP co-RP.

Problem 2.4(Polynomial Identity Testing for integer circuits).


In this problem, you will show how to do Polynomial Identity
Testing for arithmetic circuits over the integers. The Prime Number
Theorem says that the number of primes less than T is (1 O(1))
T / ln T , where the O(1) tends to 0 as T . You may use this fact in
the problem below.
R

(1) Show that if N is a nonzero integer and M {1, . . . , log2 N },


then
Pr[N  0 (mod M )] = (1/loglogN ).
(2) Use the above to prove Theorem 2.12: Arithmetic Circuit
Identity Testing over Z is in co-RP.

Problem 2.5(Polynomial Identity Testing via Modular Reduction). In this problem, you will analyze an alternative to the algorithm
seen in class, which directly handles polynomials of degree larger than
the eld size. It is based on the same idea as Problem 2.4, using the
fact that polynomials over a eld have many of the same algebraic
properties as the integers.

42

The Power of Randomness

The following denitions and facts may be useful: A polynomial


f (x) over a eld F is called irreducible if it has no nontrivial factors
(i.e., factors other than constants from F or constant multiples of f ).
Analogously to prime factorization of integers, every polynomial over
F can be factored into irreducible polynomials and this factorization is
unique (up to reordering and constant multiples). It is known that the
number of irreducible polynomials of degree at most d over a eld F is
at least |F|d+1 /2d. (This is similar to the Prime Number Theorem for
integers mentioned in Problem 2.4, but is easier to prove.) For polynomials f (x) and g(x), f (x) mod g(x) is the remainder when f is divided
by g. (More background on polynomials over nite elds can be found
in the references listed in Section 2.6.)
In this problem, we consider a version of the Polynomial Identity Testing problem where a polynomial f (x1 , . . . , xn ) over nite
eld F is presented as a formula built up from elements of F and the
variables x1 , . . . , xn using addition, multiplication, and exponentiation
with exponents given in binary. We also assume that we are given a
representation of F enabling addition, multiplication, and division in F
to be done quickly.
(1) Let f (x) be a univariate polynomial of degree D over
a eld F. Prove that there are constants c, c such that if
f (x) is nonzero (as a formal polynomial) and g(x) is a randomly selected polynomial of degree at most d = c log D, then
the probability that f (x) mod g(x) is nonzero is at least
1/c log D. Deduce a randomized, polynomial-time identity
test for univariate polynomials presented in the above form.
(2) Obtain a randomized polynomial-time identity test for multivariate polynomials by giving a (deterministic) reduction
to the univariate case.

Problem 2.6 (Primality Testing).


(1) Show that for every positive integer n, the polynomial identity (x + 1)n xn + 1 (mod n) holds i n is prime.

2.5 Exercises

43

(2) Obtain a co-RP algorithm for the language Primality


Testing = {n : n prime} using Part 1 together with Problem 2.5. (In your analysis, remember that the integers
modulo n are a eld only when n is prime.)

Problem 2.7 (Cherno Bound). Let X1 , . . . , Xt be independent



[0, 1]-valued random variables, and X = ti=1 Xi . (Note that, in contrast to the statement of Theorem 2.21, here we are writing X for the
sum of the Xi s rather than their average.)
2

(1) Show that for every r [0, 1/2], E[erX ] er E[X]+r t . (Hint:
1 + x ex 1 + x + x2 for all x [0, 1/2].)
(2) Deduce the Cherno Bound of Theorem 2.21: Pr [X]
2
2
E[X]+ t e t/4 and Pr [X E[X] t] e t/4 .
(3) Where did you use the independence of the Xi s?

Problem 2.8 (Necessity of Randomness for Identity Testing*).


In this problem, we consider the oracle version of the identity testing problem, where an arbitrary polynomial f : Fm F of degree d is
given as an oracle and the problem is to test whether f = 0. Show that
any deterministic algorithm that solves this problem when m = d = n
must make at least 2n queries to the oracle (in contrast to the randomized identity testing algorithm from class, which makes only one query
provided that |F| 2n).
Is this a proof that P = RP? Explain.

Problem 2.9 (Spectral Graph Theory). Let M be the randomwalk matrix for a d-regular undirected graph G = (V, E) on n vertices.
We allow G to have self-loops and multiple edges. Recall that the uniform distribution is an eigenvector of M of eigenvalue 1 = 1. Prove the
following statements. (Hint: for intuition, it may help to think about
what the statements mean for the behavior of the random walk on G.)

44

The Power of Randomness

(1) All eigenvalues of M have absolute value at most 1.


(2) G is disconnected 1 is an eigenvalue of multiplicity at
least 2.
(3) Suppose G is connected. Then G is bipartite 1 is an
eigenvalue of M .
(4) G connected all eigenvalues of M other than 1 are at
most 1 1/poly(n, d). To do this, it may help to rst show
that the second largest eigenvalue of M (not necessarily in
absolute value) equals

1
(xi xj )2 ,
maxxM, x = 1 min
x
x
d
(i, j) E
where the maximum/minimum is taken over all vectors x


of length 1 such that i xi = 0, and x, y = i xi yi is the
standard inner product. For intuition, consider restricting the
above maximum/minimum to x {+, }n for , > 0.
(5) G connected and nonbipartite all eigenvalues of M (other
than 1) have absolute value at most 1 1/poly(n, d) and
thus (G) 1/poly(n, d).
(6*) Establish the (tight) bound 1 (1/d D n) in Part 4,
where D is the diameter of the graph. Conclude that (G) =
(1/d2 n2 ) if G is connected and nonbipartite.

Problem 2.10 (Hitting Time and Eigenvalues for Directed


Graphs).
(1) Show that for every n, there exists a digraph G with n vertices, outdegree 2, and hit(G) = 2(n) .
(2) Let G be a regulardigraph with random-walk matrix M .
Show that (G) = (G ), where G is the undirected graph
whose random-walk matrix is M M T .
(3) A digraph G is called Eulerian if it is connected and every
vertex has the same indegree as outdegree.3 (For example,
3 This

terminology comes from the fact that these are precisely the digraphs that have
Eulerian circuits, closed paths that visit all vertices use every directed edge exactly once.

2.5 Exercises

45

if we take an connected undirected graph and replace each


undirected edge {u, v} with the two directed edges (u, v) and
(v, u), we obtain an Eulerian digraph.) Show that if G is
an n-vertex Eulerian digraph of maximum degree d, then
hit(G) = poly(n, d).

Problem 2.11(Consequences of Derandomizing prBPP). Even


though prBPP is a class of decision problems, it also captures many
other types of problems that can be solved by randomized algorithms:
(1) (Approximation Problems) Consider the task of approximating a function f : {0, 1} N to within a (1 + (n)) multiplicative factor, for some function : N R0 . Suppose
there is a probabilistic polynomial-time algorithm A such
that for all x, Pr[A(x) f (x) (1 + (|x|)) A(x)] 2/3,
where the probability is taken over the coin tosses of A.
Show that if prBPP = prP, then there is a deterministic
polynomial-time algorithm B such that for all x, B(x)
f (x) (1 + (|x|)) B(x). (Hint: use a denition similar to
Computational Problem 2.32.)
(2) (NP Search Problems) An NP search problem is specied by
a polynomial-time verier V and a polynomial p; the problem
is, given an input x {0, 1}n , nd a string y {0, 1}p(n) such
that V (x, y) = 1. Suppose that such a search problem can be
solved in probabilistic polynomial time, i.e., there is a probabilistic polynomial-time algorithm A such that for every
input x {0, 1}n , outputs y {0, 1}p(n) such that V (x, y) = 1
with probability at least 2/3 over the coin tosses of A.
Show that if prBPP = prP, then there is a deterministic
polynomial-time algorithm B such that for every x {0, 1}n ,
B(x) always outputs y {0, 1}p(n) such that V (x, y) = 1.
(Hint: consider a promise problem whose instances are pairs
(x, r) where r is a prex of the coin tosses of A.)
(3) (MA Search Problems) In a relaxation of NP, known
as MA, we allow the verier V to be probabilistic. Let

46

The Power of Randomness

= (Y , N ) be a promise problem decided by V , i.e.,


Pr[V (x, y) = 1] 2/3 for (x, y) Y , and Pr[V (x, y) = 1]
1/3 for (x, y) N . Suppose that the corresponding search
problem can be solved in probabilistic polynomial time, i.e.,
there is a probabilistic polynomial-time algorithm A such
that for every input x {0, 1}n , outputs y {0, 1}p(n) such
that (x, y) Y with probability at least 2/3 over the coin
tosses of A. Show that if prBPP = prP, then there is a
deterministic polynomial-time algorithm B such that for
every x {0, 1}n , B(x) always outputs y {0, 1}p(n) such
that (x, y)
/ N . (Note that this conclusion is equivalent to
(x, y) Y if is a language, but is weaker otherwise.)
(4) Using Part 3, the Prime Number Theorem (see Problem 2.4),
and the fact that Primality Testing is in BPP (Problem 2.6) to show that if prBPP = prP, then there is a deterministic polynomial-time algorithm that given a number N ,
outputs a prime in the interval [N, 2N ) for all suciently
large N .

2.6

Chapter Notes and References

Recommended textbooks on randomized algorithms are Motwani


Raghavan [291] and MitzenmacherUpfal [290]. The algorithmic power
of randomization became apparent in the 1970s with a number of striking examples, notably Berlekamps algorithm for factoring polynomials over large nite elds [66] and the MillerRabin [287, 314] and
SolovayStrassen [369] algorithms for Primality Testing. The randomized algorithm for Polynomial Identity Testing was independently discovered by DeMillo and Lipton [116], Schwartz [351], and
Zippel [425]. More recently, a deterministic polynomial-time Polynomial Identity Testing algorithm for formulas in form with
a constant number of terms was given by Kayal and Saxena [241],
improving a previous quasipolynomial-time algorithm of Dvir and
Shpilka [126]. Problem 2.4 is from [351]. Problem 2.8 is from [260].
Algorithms for Polynomial Identity Testing and the study of

2.6 Chapter Notes and References

47

arithmetic circuits more generally are covered in the recent survey


by Shpilka and Yehudayo [362]. Recommended textbooks on abstract
algebra and nite elds are [36, 263].
The randomized algorithm for Perfect Matching is due to
Lov
asz [267], who also showed how to extend the algorithm to nonbipartite graphs. An ecient parallel randomized algorithm for nding
a perfect matching was given by Karp, Upfal, and Wigderson [237] (see
also [294]). A randomized algorithm for nding a perfect matching in
the same sequential time complexity as Lovaszs algorithm was given
recently by Mucha and Sankowski [292] (see also [201]).
The ecient sequential and parallel algorithms for Determinant
mentioned in the text are due [108, 377] and [64, 76, 111], respectively.
For more on algebraic complexity and parallel algorithms, we refer to
the textbooks [89] and [258], respectively. The Polynomial Identity
Testing and Primality Testing algorithms of Problems 2.5 and
2.6 are due to Agrawal and Biswas [5]. Agrawal, Kayal, and Saxena [6]
derandomized the Primality Testing algorithm of [5] to prove that
Primality Testing is in P.
The randomized complexity classes RP, BPP, ZPP, and PP were
formally dened by Gill [152], who conjectured that BPP = P (in fact
ZPP = P). Cherno Bounds are named after H. Cherno [95]; the
version in Theorem 2.21 is due to Hoeding [204] and is sometimes
referred to as Hoedings Inequality. The survey by Dubhashi and Panconesi [121] has a detailed coverage of Cherno Bounds and other tail
inequalities. Problem 2.3 is due to Rabin (cf. [152]).
The computational perspective on sampling, as introduced in
Section 2.3.1, is surveyed in [155, 156]. Sampling is perhaps the simplest example of a computational problem where randomization enables
algorithms with running time sublinear in the size of the input. Such
sublinear-time algorithms are now known for a wide variety of interesting computational problems; see the surveys [337, 339].
Promise problems were introduced by Even, Selman, and
Yacobi [137]. For survey of their role in complexity theory, see
Goldreich [159]. Problem 2.11 on the consequences of prBPP = prP
is from [163].

48

The Power of Randomness

The randomized algorithm for [(1 + )]-Approx #DNF is due


to Karp and Luby [236], who initiated the study of randomized
algorithms for approximate counting problems. A 1/2-approximation
algorithm for MaxCut was rst given in [342]; that algorithm can
be viewed as a natural derandomization of Algorithm 2.39. (See Algorithm 3.17.) The 0.878-approximation algorithm was given by Goemans
and Williamson [154].
The O(log2 n)-space algorithm for S-T Connectivity is due to
Savitch [349]. Using the fact that S-T Connectivity (for directed
graphs) is complete for nondeterministic logspace (NL), this result is
equivalent to the fact that NL L2 , where Lc is the class of languages
that can be decided deterministic space O(logc n). The latter formulation (and its generalization NSPACE(s(n)) DSPACE(s(n)2 )) is
known as Savitchs Theorem. The randomized algorithm for Undirected S-T Connectivity was given by Aleliunas, Karp, Lipton,
Lov
asz, and Racko [13], and was recently derandomized by Reingold [327] (see Section 4.4).
For more background on random walks, mixing time, and the
Markov Chain Monte Carlo Method, we refer the reader to [290, 291,
317]. The use of this method for counting perfect matchings is due to
[83, 219, 220].
The bound on hitting time given in Theorem 2.49 is not tight; for
example, it can be improved to (n2 ) for regular graphs that are simple
(have no self-loops or parallel edges) [229].
Even though we will focus primarily on undirected graphs (e.g., in
our study of expanders in Section 4), much of what we do generalizes to
regular, or even Eulerian, digraphs. See Problem 2.10 and the references
[139, 286, 331]. Problem 2.10, Part 2 is from [139].
The Spectral Theorem (Theorem 2.54) can be found in any standard textbook on linear algebra. Problem 2.9 is from [269]; Alon and
Sudakov [26] strengthen it to show (G) = (1/dDn), which implies a
tight bound of (G) = (1/n2 ) for simple graphs (where D = O(n/d)).
Spectral Graph Theory is a rich subject, with many applications
beyond the scope of this text; see the survey by Spielman [371] and
references therein.

2.6 Chapter Notes and References

49

One signicant omission from this section is the usefulness of randomness for verifying proofs. Recall that NP is the class of languages
having membership proofs that can be veried in P. Thus it is natural
to consider proof verication that is probabilistic, leading to the class
MA, as well as a larger class AM, where the proof itself can depend on
the randomness chosen by the verier [44]. (These are both subclasses of
the class IP of languages having interactive proof systems [177].) There
are languages, such as Graph Nonisomorphism, that are in AM but
are not known to be in NP [170]. Derandomizing these proof systems (e.g., proving AM = NP) would show that Graph Nonisomorphism is in NP, i.e., that there are short proofs that two graphs are
nonisomorphic. Similarly to sampling and sublinear-time algorithms,
randomized proof verication can also enable one to read only a small
portion of an appropriately encoded NP proof, leading to the celebrated PCP Theorem and its applications to hardness of approximation [33, 34, 138]. For more about interactive proofs and PCPs, see
[32, 160, 400].

3
Basic Derandomization Techniques

In the previous section, we saw some striking examples of the power of


randomness for the design of ecient algorithms:

Polynomial Identity Testing in co-RP.


[(1 + )]-Approx #DNF in prBPP.
Perfect Matching in RNC.
Undirected S-T Connectivity in RL.
Approximating MaxCut in probabilistic polynomial time.

This was of course only a small sample; there are entire texts on
randomized algorithms. (See the notes and references for Section 2.)
In the rest of this survey, we will turn toward derandomization
trying to remove the randomness from these algorithms. We will achieve
this for some of the specic algorithms we studied, and also consider the
larger question of whether all ecient randomized algorithms can be
derandomized. For example, does BPP = P? RL = L? RNC = NC?
In this section, we will introduce a variety of basic derandomization techniques. These will each be decient in that they are either
infeasible (e.g., cannot be carried out in polynomial time) or specialized (e.g., apply only in very specic circumstances). But it will be
50

3.1 Enumeration

51

useful to have these as tools before we proceed to study more sophisticated tools for derandomization (namely, the pseudorandom objects
of Sections 47).

3.1

Enumeration

We are interested in quantifying how much savings randomization provides. One way of doing this is to nd the smallest possible upper bound
on the deterministic time complexity of languages in BPP. For example, we would like to know which of the following complexity classes
contain BPP:
Denition 3.1 (Deterministic Time Classes). 1
DTIME(t(n)) = {L : L can be decided deterministically
in time O(t(n))}
P = c DTIME(nc )
= c DTIME(2(log n)c )
P

(polynomial time)
(quasipolynomial time)

SUBEXP =

DTIME(2n )

(subexponential time)

EXP = c

c
DTIME(2n )

(exponential time)

The Time Hierarchy Theorem of complexity theory implies that


 SUBEXP  EXP. More
all of these classes are distinct, i.e., P  P
generally, it says that DTIME(O(t(n)/ log t(n)))  DTIME(t(n)) for
any eciently computable time bound t. (What is dicult in complexity theory is separating classes that involve dierent computational
resources, like deterministic time versus nondeterministic time.)
Enumeration is a derandomization technique that enables us to
deterministically simulate any randomized algorithm with an exponential slowdown.
Proposition 3.2. BPP EXP.
1 Often

DTIME() is written as TIME(), but we include the D to emphasize that it


refers to deterministic rather than randomized algorithms. Also, in some contexts (e.g.,
cryptography), exponential time refers to DTIME(2O(n) ) and subexponential time
to be DTIME(2o(n) ); we will use E to denote the former class when it arises in Section 7.

52

Basic Derandomization Techniques

Proof. If L is in BPP, then there is a probabilistic polynomial-time


algorithm A for L running in time t(n) for some polynomial t. As
in the proof of Theorem 2.30, we write A(x; r) for As output on input
x {0, 1}n and coin tosses r {0, 1}m(n) , where we may assume m(n)
t(n) without loss of generality. Then:
Pr[A(x; r) accepts] =

1
2m(n)

A(x; r)

r{0,1}m(n)

We can compute the right-hand side of the above expression in deterministic time 2m(n) t(n).
We see that the enumeration method is general in that it applies
to all BPP algorithms, but it is infeasible (taking exponential time).
However, if the algorithm uses only a small number of random bits, it
becomes feasible:
Proposition 3.3. If L has a probabilistic polynomial-time algorithm that runs in time t(n) and uses m(n) random bits, then L
DTIME(t(n) 2m(n) ). In particular, if t(n) = poly(n) and m(n) =
O(log n), then L P.
Thus an approach to proving BPP = P is to show that the number
of random bits used by any BPP algorithm can be reduced to O(log n).
We will explore this approach in Section 7. However, to date, Proposition 3.2 remains the best unconditional upper-bound we have on the
deterministic time-complexity of BPP.

Open Problem 3.4. Is BPP closer to P or EXP? Is BPP P?


Is BPP SUBEXP?

3.2

Nonconstructive/Nonuniform Derandomization

Next we look at a derandomization technique that can be implemented


eciently but requires some nonconstructive advice that depends on
the input length.

3.2 Nonconstructive/Nonuniform Derandomization

53

Proposition 3.5. If A(x; r) is a randomized algorithm for a language


L that has error probability smaller than 2n on inputs x of length n,
then for every n, there exists a xed sequence of coin tosses rn such
that A(x; rn ) is correct for all x {0, 1}n .
Proof. We use the Probabilistic Method. Consider Rn chosen uniformly
at random from {0, 1}r(n) , where r(n) is the number of coin tosses used
by A on inputs of length n. Then
Pr[x {0, 1}n s.t. A(x; Rn ) incorrect on x]


Pr[A(x; Rn ) incorrect on x]
x

< 2n 2n = 1
Thus, there exists a xed value Rn = rn that yields a correct answer
for all x {0, 1}n .
The advantage of this method over enumeration is that once we
have the xed string rn , computing A(x; rn ) can be done in polynomial
time. However, the proof that rn exists is nonconstructive; it is not
clear how to nd it in less than exponential time.
Note that we know that we can reduce the error probability of
any BPP (or RP, RL, RNC, etc.) algorithm to smaller than 2n by
repetitions, so this proposition is always applicable. However, we begin
by looking at some interesting special cases.
Example 3.6 (Perfect Matching). We apply the proposition to
Algorithm 2.7. Let G = (V, E) be a bipartite graph with m vertices
on each side, and let AG (x1,1 , . . . , xm,m ) be the matrix that has entries
G
AG
i,j = xi,j if (i, j) E, and Ai,j = 0 if (i, j)  E. Recall that the polynomial det(AG (x)) is nonzero if and only if G has a perfect matching.
2
Let Sm = {0, 1, 2, . . . , m2m }. We argued that, by the SchwartzZippel
R
m2 at random and evaluate det(AG ()),
Lemma, if we choose Sm
we can determine whether det(AG (x)) is zero with error probability at
2
most m/|S| which is smaller than 2m . Since a bipartite graph with m

54

Basic Derandomization Techniques

vertices per side is specied by a string of length n = m2 , by Proposim2 such that


tion 3.5 we know that for every m, there exists an m Sm
det(AG (m )) = 0 if and only if G has a perfect matching, for every
bipartite graph G with m vertices on each side.
2

Open Problem 3.7. Can we nd such an m {0, . . . , m2m }m


explicitly, i.e., deterministically and eciently? An NC algorithm (i.e.,
parallel time polylog(m) with poly(m) processors) would put Perfect
Matching in deterministic NC, but even a subexponential-time algorithm would be interesting.

Example 3.8 (Universal Traversal Sequences). Let G be a connected d-regular undirected multigraph on n vertices. From Theorem 2.49, we know that a random walk of poly(n, d) steps from any
start vertex will visit any other vertex with high probability. By increasing the length of the walk by a polynomial factor, we can ensure
that every vertex is visited with probability greater than 1 2nd log n .
By the same reasoning as in the previous example, we conclude that
for every pair (n, d), there exists a universal traversal sequence w
{1, 2, . . . , d}poly(n,d) such that for every n-vertex, d-regular, connected
G and every vertex s in G, if we start from s and follow w then we will
visit the entire graph.

Open Problem 3.9. Can we construct such a universal traversal


sequence explicitly (e.g., in polynomial time or even logarithmic space)?
There has been substantial progress toward resolving this question
in the positive; see Section 4.4.
We now cast the nonconstructive derandomizations provided by
Proposition 3.5 in the language of nonuniform complexity classes.
Denition 3.10. Let C be a class of languages, and a : N N be a
function. Then C/a is the class of languages dened as follows: L C/a

3.2 Nonconstructive/Nonuniform Derandomization

55

if there exists L C, and 1 , 2 , . . . {0, 1} with |n | a(n), such


that x L (x, |x| ) L . The s are called the advice strings.

P/poly is the class c P/nc , i.e., polynomial time with polynomial
advice.
A basic result in complexity theory is that P/poly is exactly the
class of languages that can be decided by polynomial-sized Boolean
circuits:
Fact 3.11. L P/poly i there is a sequence of Boolean circuits
{Cn }nN and a polynomial p such that for all n
(1) Cn : {0, 1}n {0, 1} decides L {0, 1}n
(2) |Cn | p(n).

We refer to P/poly as a nonuniform model of computation


because it allows for dierent, unrelated programs for each input
length (e.g., the circuits Cn , or the advice n ), in contrast to classes
like P, BPP, and NP, that require a single program of constant
size specifying how the computation should behave for inputs of arbitrary length. Although P/poly contains some undecidable problems,2
people generally believe that NP  P/poly. Indeed, trying to prove
lower bounds on circuit size is one of the main approaches to proving
P = NP, since circuits seem much more concrete and combinatorial
than Turing machines. However, this too has turned out to be quite
dicult; the best circuit lower bound known for computing an explicit
function is roughly 5n.
Proposition 3.5 directly implies:
Corollary 3.12. BPP P/poly.
A more general meta-theorem is that nonuniformity is more powerful than randomness.
2 Consider

the unary version of the Halting Problem, which can be decided in constant
time given advice n {0, 1} that species whether the nth Turing machine halts or not.

56

3.3

Basic Derandomization Techniques

Nondeterminism

Although physically unrealistic, nondeterminism has proven to be a


very useful resource in the study of computational complexity (e.g.,
leading to the class NP). Thus it is natural to study how it compares
in power to randomness. Intuitively, with nondeterminism we should
be able to guess a good sequence of coin tosses for a randomized
algorithm and then do the computation deterministically. This intuition
does apply directly for randomized algorithms with 1-sided error:
Proposition 3.13. RP NP.
Proof. Let L RP and A be a randomized algorithm that decides it.
A polynomial-time veriable witness that x L is any sequence of coin
tosses r such that A(x; r) = accept.
However, for 2-sided error (BPP), containment in NP is not clear.
Even if we guess a good random string (one that leads to a correct
answer), it is not clear how we can verify it in polynomial time. Indeed,
it is consistent with current knowledge that BPP equals NEXP (nondeterministic exponential time)! Nevertheless, there is a sense in which
we can show that BPP is no more powerful than NP:
Theorem 3.14. If P = NP, then P = BPP.
Proof. For any language L BPP, we will show how to express membership in L using two quantiers. That is, for some polynomial-time
predicate P ,
xL

y z P (x, y, z),

(3.1)

where we quantify over y and z of length poly(|x|).


Assuming P = NP, we can replace z P (x, y, z) by a polynomialtime predicate Q(x, y), because the language {(x, y) : z P (x, y, z)} is
in co-NP = P. Then L = {x : y Q(x, y)} NP = P.
To obtain the two-quantier expression (3.1), consider a randomized
algorithm A for L, and assume w.l.o.g. that its error probability

3.3 Nondeterminism

57

is smaller than 2n and that it uses m = poly(n) coin tosses. Let


Ax {0, 1}m be the set of coin tosses r for which A(x; r) = 1. We will
show that if x is in L, there exist m shifts (or translations) of Ax
that cover all points in {0, 1}m . (Notice that this is a statement.)
Intuitively, this should be possible because Ax contains all but an exponentially small fraction of {0, 1}m . On the other hand if x
/ L, then no
m shifts of Ax can cover all of {0, 1}m . Intuitively, this is because Ax
is an exponentially small fraction of {0, 1}m .
Formally, by a shift of Ax we mean a set of the form Ax s =
{r s : r Ax } for some s {0, 1}m ; note that |Ax s| = |Ax |. We
will show
x L s1 , s2 , . . . , sm {0, 1}m r {0, 1}m

(Ax si )

i=1

s1 , s2 , . . . , sm {0, 1} r {0, 1}
m

(A(x; r si ) = 1);

i=1

x
/ L s1 , s2 , . . . , sm {0, 1} r {0, 1}
m

r
/

(Ax si )

i=1

s1 , s2 , . . . , sm {0, 1}m r {0, 1}m

(A(x; r si ) = 1).

i=1

We prove both parts by the Probabilistic Method, starting with the


second (which is simpler).
R

x
/ L Let s1 , . . . , sm be arbitrary, and choose R {0, 1}m at random. Now Ax and hence each Ax si contains less than a
2n fraction of {0, 1}m . So, by a union bound,
Pr[R



(Ax si )]
Pr[R Ax si ]
i

< m 2n < 1,
for suciently large n. In particular, there exists an r

{0, 1}m such that r
/ i (Ax si ).

58

Basic Derandomization Techniques


R

x L: Choose S1 , S2 , . . . , Sm {0, 1}m . Then, for every xed r, we


have:


Pr[r
/ (Ax Si )] =
Pr[r
/ Ax Si ]
i

Pr[Si
/ Ax r]

< (2n )m ,
since Ax and hence Ax r contains more than a 1 2n
fraction of {0, 1}m . By a union bound, we have:

Pr[r r
/ (Ax Si )] < 2m (2n )m 1.
i

Thus, there exist s1 , . . . , sm such that


all points r in {0, 1}m .

i (Ax

si ) contains

Readers familiar with complexity theory might notice that the


above proof shows that BPP is contained in the second level of the
polynomial-time hierarchy (PH). In general, the kth level of the PH
contains all languages that satisfy a k-quantier expression analogous
to (3.1).

3.4

The Method of Conditional Expectations

In the previous sections, we saw several derandomization techniques


(enumeration, nonuniformity, nondeterminism) that are general in the
sense that they apply to all of BPP, but are infeasible in the sense
that they cannot be implemented by ecient deterministic algorithms.
In this section and the next one, we will see two derandomization techniques that sometimes can be implemented eciently, but do not apply
to all randomized algorithms.
3.4.1

The General Approach

Consider a randomized algorithm that uses m random bits. We can


view all its sequences of coin tosses as corresponding to a binary tree

3.4 The Method of Conditional Expectations

59

of depth m. We know that most paths (from the root to the leaf) are
good, that is, give a correct answer. A natural idea is to try and nd
such a path by walking down from the root and making good choices
at each step. Equivalently, we try to nd a good sequence of coin tosses
bit-by-bit.
To make this precise, x a randomized algorithm A and an input
x, and let m be the number of random bits used by A on input x.
For 1 i m and r1 , r2 , . . . , ri {0, 1}, dene P (r1 , r2 , . . . , ri ) to be the
fraction of continuations that are good sequences of coin tosses. More
precisely, if R1 , . . . , Rm are uniform and independent random bits, then
def

P (r1 , r2 , . . . , ri ) =

Pr

R1 ,R2 ,...,Rm

[A(x; R1 , R2 , . . . , Rm ) is correct

|R1 = r1 , R2 = r2 , . . . , Ri = ri ]
=

E [P (r1 , r2 , . . . , ri , Ri+1 )].

Ri+1

(See Figure 3.1.)


By averaging, there exists an ri+1 {0, 1} such that
P (r1 , r2 , . . . , ri , ri+1 ) P (r1 , r2 , . . . , ri ). So at node (r1 , r2 , . . . , ri ),
we simply pick ri+1 that maximizes P (r1 , r2 , . . . , ri , ri+1 ). At the end
we have r1 , r2 , . . . , rm , and
P (r1 , r2 , . . . , rm ) P (r1 , r2 , . . . , rm1 ) P (r1 ) P () 2/3,

P(0,1)=7/8

o o

o o o

Fig. 3.1 An example of P (r1 , r2 ), where o at the leaf denotes a good path.

60

Basic Derandomization Techniques

where P () denotes the fraction of good paths from the root. Then
P (r1 , r2 , . . . , rm ) = 1, since it is either 1 or 0.
Note that to implement this method, we need to compute
P (r1 , r2 , . . . , ri ) deterministically, and this may be infeasible. However,
there are nontrivial algorithms where this method does work, often for
search problems rather than decision problems, and where we measure
not a boolean outcome (e.g., whether A is correct as above) but some
other measure of quality of the output. Below we see one such example,
where it turns out to yield a natural greedy algorithm.
3.4.2

Derandomized MAXCUT Approximation

Recall the MaxCutproblem:


Computational Problem 3.15 (Computational Problem 2.38,
rephrased). MaxCut: Given a graph G = (V, E), nd a partition
S, T of V (i.e., S T = V , S T = ) maximizing the size of the set
cut(S, T ) = {{u, v} E : u S, v T }.
We saw a simple randomized algorithm that nds a cut of (expected)
size at least |E|/2 (not counting any self-loops, which can never be cut),
which we now phrase in a way suitable for derandomization.
Algorithm 3.16 (randomized MaxCut, rephrased).
Input: A graph G = ([N ], E) (with no self-loops)
Flip N coins r1 , r2 , . . . , rN , put vertex i in S if ri = 0 and in T if
ri = 1. Output (S, T ).
To derandomize this algorithm using the Method of Conditional
Expectations, dene the conditional expectation



def

e(r1 , r2 , . . . , ri ) =
|cut(S, T )| R1 = r1 , R2 = r2 , . . . , Ri = ri
E
R1 ,R2 ,...,RN

to be the expected cut size when the random choices for the rst i coins
are xed to r1 , r2 , . . . , ri .

3.4 The Method of Conditional Expectations

61

We know that when no random bits are xed, e[] = |E|/2 (because
each edge is cut with probability 1/2), and all we need to calculate is
e(r1 , r2 , . . . , ri ) for 1 i N . For this particular algorithm it turns out
def

that the quantity is not hard to compute. Let Si = {j : j i, rj = 0}


def

(resp. Ti = {j : j i, rj = 1}) be the set of vertices in S (resp. T ) after


def

we determine r1 , . . . , ri , and Ui = {i + 1, i + 2, . . . , N } be the undecided vertices that have not been put into S or T . Then
e(r1 , r2 , . . . , ri ) = |cut(Si , Ti )| + 1/2(|cut(Ui , [N ])|).

(3.2)

Note that cut(Ui , [N ]) is the set of unordered edges that have at least
one endpoint in Ui . Now we can deterministically select a value for ri+1 ,
by computing and comparing e(r1 , r2 , . . . , ri , 0) and e(r1 , r2 , . . . , ri , 1).
In fact, the decision on ri+1 can be made even simpler than computing (3.2) in its entirety, by observing that the set cut(Ui+1 , [N ])
is independent of the choice of ri+1 . Therefore, to maximize
e(r1 , r2 . . . , ri , ri+1 ), it is enough to choose ri+1 that maximizes the
|cut(S, T )| term. This term increases by either |cut({i + 1}, Ti )| or
|cut({i + 1}, Si )| depending on whether we place vertex i + 1 in S or
T , respectively. To summarize, we have
e(r1 , . . . , ri , 0) e(r1 , . . . , ri , 1) = |cut({i + 1}, Ti )| |cut({i + 1}, Si )|.
This gives rise to the following deterministic algorithm, which is guaranteed to always nd a cut of size at least |E|/2:
Algorithm 3.17 (deterministic MaxCut approximation).
Input: A graph G = ([N ], E) (with no self-loops)
(1) Set S = , T =
(2) For i = 0, . . . , N 1:
(a) If |cut({i + 1}, T )| > |cut({i + 1}, S)|, set S S
{i + 1},
(b) Else set T T {i + 1}.
Note that this is the natural greedy algorithm for this problem. In
other cases, the Method of Conditional Expectations yields algorithms

62

Basic Derandomization Techniques

that, while still arguably greedy, would have been much less easy to
nd directly. Thus, designing a randomized algorithm and then trying
to derandomize it can be a useful paradigm for the design of deterministic algorithms even if the randomization does not provide gains
in eciency.

3.5
3.5.1

Pairwise Independence
An Example

As a motivating example for pairwise independence, we give another


way of derandomizing the MaxCut approximation algorithm discussed
above. Recall the analysis of the randomized algorithm:

Pr[Ri = Rj ] = |E|/2,
E[|cut(S)|] =
(i,j)E

where R1 , . . . , RN are the random bits of the algorithm. The key observation is that this analysis applies for any distribution on (R1 , . . . , RN )
satisfying Pr[Ri = Rj ] = 1/2 for each i = j. Thus, they do not need to
be completely independent random variables; it suces for them to be
pairwise independent. That is, each Ri is an unbiased random bit, and
for each i = j, Ri is independent from Rj .
This leads to the question: Can we generate N pairwise independent
bits using less than N truly random bits? The answer turns out to be
yes, as illustrated by the following construction.
Construction 3.18 (pairwise independent bits). Let B1 , . . . , Bk
be k independent unbiased random bits. For each nonempty S [k],
let RS be the random variable iS Bi , where denotes XOR.
Proposition 3.19. The 2k 1 random variables RS in Construction 3.18 are pairwise independent unbiased random bits.
Proof. It is evident that each RS is unbiased. For pairwise independence, consider any two nonempty sets S = T [k]. Then:
RS = RST RS\T
RT = RST RT \S .

3.5 Pairwise Independence

63

Note that RST , RS\T , and RT \S are independent as they depend


on disjoint subsets of the Bi s, and at least two of these subsets are
nonempty. This implies that (RS , RT ) takes each value in {0, 1}2 with
probability 1/4.
Note that this gives us a way to generate N pairwise independent
bits from log(N + 1) independent random bits. Thus, we can reduce
the randomness required by the MaxCut algorithm to logarithmic,
and then we can obtain a deterministic algorithm by enumeration.
Algorithm 3.20 (deterministic MaxCut algorithm II).
Input: A graph G = ([N ], E) (with no self-loops)
For all sequences of bits b1 , b2 , . . . , b log(N +1)
, run the randomized MaxCut algorithm using coin tosses (rS = iS bi )S = and
choose the largest cut thus obtained.
Since there are at most 2(N + 1) sequences of bi s, the derandomized algorithm still runs in polynomial time. It is slower than
the algorithm we obtained by the Method of Conditional Expectations (Algorithm 3.17), but it has the advantage of using logarithmic
workspace and being parallelizable. The derandomization can be sped
up using almost pairwise independence (at the price of a slightly worse
approximation factor); see Problem 3.4.
3.5.2

Pairwise Independent Hash Functions

Some applications require pairwise independent random variables that


take values from a larger range, for example, we want N = 2n pairwise independent random variables, each of which is uniformly distributed in {0, 1}m = [M ]. (Throughout this survey, we will often use
the convention that a lowercase letter equals the logarithm (base 2)
of the corresponding capital letter.) The naive approach is to repeat
the above algorithm for the individual bits m times. This uses roughly
n m = (log M )(log N ) initial random bits, which is no longer logarithmic in N if M is nonconstant. Below we will see that we can do much
better. But rst some denitions.

64

Basic Derandomization Techniques

A sequence of N random variables each taking a value in [M ] can be


viewed as a distribution on sequences in [M ]N . Another interpretation
of such a sequence is as a mapping f : [N ] [M ]. The latter interpretation turns out to be more useful when discussing the computational
complexity of the constructions.
Denition 3.21 (pairwise independent hash functions). A
family (i.e., multiset) of functions H = {h : [N ] [M ]} is pairwise indeR
pendent if the following two conditions hold when H H is a function
chosen uniformly at random from H:
(1) x [N ], the random variable H(x) is uniformly distributed
in [M ].
(2) x1 = x2 [N ], the random variables H(x1 ) and H(x2 ) are
independent.
Equivalently, we can combine the two conditions and require that
1
x1 = x2 [N ], y1 , y2 [M ], Pr [H(x1 ) = y1 H(x2 ) = y2 ] = 2 .
R
M
H H
Note that the probability above is over the random choice of a function
from the family H. This is why we talk about a family of functions
rather than a single function. The description in terms of functions
makes it natural to impose a strong eciency requirement:
Denition 3.22. A family of functions H = {h : [N ] [M ]} is explicit
if given the description of h and x [N ], the value h(x) can be computed in time poly(log N, log M ).
Pairwise independent hash functions are sometimes referred to as
strongly 2-universal hash functions, to contrast with the weaker notion
of 2-universal hash functions, which requires only that Pr[H(x1 ) =
H(x2 )] 1/M for all x1 = x2 . (Note that this property is all we needed
for the deterministic MaxCut algorithm, and it allows for a small savings in that we can also include S = in Construction 3.18.)
Below we present another construction of a pairwise independent
family.

3.5 Pairwise Independence

65

Construction 3.23 (pairwise independent hash functions from


linear maps). Let F be a nite eld. Dene the family of functions
H = { ha,b : F F}a,bF where ha,b (x) = ax + b.
Proposition 3.24. The family of functions H in Construction 3.23 is
pairwise independent.
Proof. Notice that the graph of the function ha,b (x) is the line with
slope a and y-intercept b. Given x1 = x2 and y1 , y2 , there is exactly
one such line containing the points (x1 , y1 ) and (x2 , y2 ) (namely, the
line with slope a = (y1 y2 )/(x1 x2 ) and y-intercept b = y1 ax1 ).
Thus, the probability over a, b that ha,b (x1 ) = y1 and ha,b (x2 ) = y2
equals the reciprocal of the number of lines, namely 1/|F|2 .
This construction uses 2 log |F| random bits, since we have to choose
R
a and b at random from F to get a function ha,b H. Compare this to
|F| log |F| bits required to choose a truly random function, and (log |F|)2
bits for repeating the construction of Proposition 3.19 for each output
bit.
Note that evaluating the functions of Construction 3.23 requires
a description of the eld F that enables us to perform addition and
multiplication of eld elements. Thus we take a brief aside to review
the complexity of constructing and computing in nite elds.
Remark 3.25 (constructing finite fields). Recall that for every prime power q = p^k there is a field F_q (often denoted GF(q)) of size q, and this field is unique up to isomorphism (renaming of elements). The prime p is called the characteristic of the field. F_q has a description of length O(log q) enabling addition, multiplication, and division to be performed in polynomial time (i.e., time poly(log q)). (This description is simply an irreducible polynomial f of degree k over F_p = Z_p.) To construct this description, we first need to determine the characteristic p; finding a prime p of a desired bitlength ℓ can be done probabilistically in time poly(ℓ) = poly(log p) and deterministically in time poly(2^ℓ) = poly(p). Then given p and k, a description of F_q (for q = p^k) can be found probabilistically in time poly(log p, k) = poly(log q) and deterministically in time poly(p, k). Note that for both steps, the deterministic algorithms are exponential in the bitlength of the characteristic p. Thus, for computational purposes, a convenient choice is often to instead take p = 2 and k large, in which case everything can be done deterministically in time poly(k) = poly(log q).
Using a finite field of size 2^n as suggested above, we obtain an explicit construction of pairwise independent hash functions H_{n,n} = {h : {0,1}^n → {0,1}^n} for every n. What if we want a family H_{n,m} of pairwise independent hash functions where the input length n and output length m are not equal? For n < m, we can take hash functions h from H_{m,m} and restrict their domain to {0,1}^n by defining h′(x) = h(x ∘ 0^{m−n}). In the case that m < n, we can take h from H_{n,n} and throw away n − m bits of the output. That is, define h′(x) = h(x)|_m, where h(x)|_m denotes the first m bits of h(x).

In both cases, we use 2 max{n, m} random bits. This is the best possible when m ≥ n. When m < n, the number of random bits can be reduced to m + n by using (a·x)|_m + b with b ∈ {0,1}^m, instead of (a·x + b)|_m. Summarizing:
Theorem 3.26. For every n, m ∈ ℕ, there is an explicit family of pairwise independent functions H_{n,m} = {h : {0,1}^n → {0,1}^m} where a random function from H_{n,m} can be selected using max{m, n} + m random bits.

Problem 3.5 shows that max{m, n} + m random bits is essentially optimal.
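As an illustration of the m < n case of Theorem 3.26, the sketch below implements h(x) = (a·x)|_m + b over GF(2^8), using n + m = 11 random bits. Representing the field by the polynomial x^8 + x^4 + x^3 + x + 1 and keeping the high-order m bits are choices of this sketch, not part of the theorem:

```python
import random

N_BITS, M_BITS = 8, 3   # n = 8, m = 3, so m < n
POLY = 0x11B            # x^8 + x^4 + x^3 + x + 1, irreducible over F_2

def gf_mul(a, b):
    """Multiply in GF(2^8): carryless product, then reduce mod POLY."""
    r = 0
    for i in range(N_BITS):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * N_BITS - 2, N_BITS - 1, -1):
        if (r >> i) & 1:
            r ^= POLY << (i - N_BITS)
    return r

def sample_hash():
    """n bits for a, m bits for b: n + m = 11 random bits in total."""
    a = random.randrange(1 << N_BITS)
    b = random.randrange(1 << M_BITS)
    # h(x) = (a * x)|_m xor b: keep the first m bits of ax, then add b.
    return lambda x: (gf_mul(a, x) >> (N_BITS - M_BITS)) ^ b
```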
3.5.3 Hash Tables

The original motivating application for pairwise independent functions was for hash tables. Suppose we want to store a set S ⊆ [N] of values and answer queries of the form "Is x ∈ S?" efficiently (or, more generally, acquire some piece of data associated with key x in case x ∈ S).
A simple solution is to have a table T such that T[x] = 1 if and only if x ∈ S. But this requires N bits of storage, which is inefficient if |S| ≪ N.

A better solution is to use hashing. Assume that we have a hash function h : [N] → [M] for some M to be determined later. Let T be a table of size M, initially empty. For each x ∈ [N], we let T[h(x)] = x if x ∈ S. So to test whether a given y ∈ S, we compute h(y) and check if T[h(y)] = y. In order for this construction to be well-defined, we need h to be one-to-one on the set S. Suppose we choose a random function H from [N] to [M]. Then, for any set S of size at most K, the probability that there are any collisions is

    Pr[∃ x ≠ y s.t. H(x) = H(y)] ≤ Σ_{x≠y∈S} Pr[H(x) = H(y)] ≤ (K choose 2)·(1/M) < ε

for M = K²/ε. Notice that the above analysis does not require H to be a completely random function; it suffices that H be pairwise independent (or even 2-universal). Thus using Theorem 3.26, we can generate and store H using O(log N) random bits. The storage required for the table T is O(M log N) = O(K² log N) bits, which is much smaller than N when K = N^{o(1)}. Note that to uniquely represent a set of size K, we need space at least log (N choose K) = Ω(K log N) (when K ≤ N^{0.99}). In fact, there is a data structure achieving a matching space upper bound of O(K log N), which works by taking M = O(K) in the above construction and using additional hash functions to separate the (few) collisions that will occur.
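A sketch of this hash-table scheme (the retry loop handles the probability-ε event of a collision; the prime-field hash reduced mod M is our stand-in for the explicit family of Theorem 3.26 and is only approximately pairwise independent):

```python
import random

P = 2**61 - 1  # prime exceeding the universe size N

def build_table(S, eps=0.1):
    """Store S (a set of ints in [N]) in a table of size M = K^2/eps."""
    K = len(S)
    M = max(1, int(K * K / eps))
    while True:
        a, b = random.randrange(1, P), random.randrange(P)
        h = lambda x: ((a * x + b) % P) % M
        T = {}
        if all(T.setdefault(h(x), x) == x for x in S):
            return T, h          # h is one-to-one on S
        # collision (probability < eps): resample h and try again

def member(T, h, y):
    """Is y in S?  One hash evaluation and one table lookup."""
    return T.get(h(y)) == y
```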
Often, when people analyze applications of hashing in computer science, they model the hash function as a truly random function. However, the domain of the hash function is often exponentially large, and thus it is infeasible to even write down a truly random hash function. Thus, it would be preferable to show that some explicit family of hash functions works for the application with similar performance. In many cases (such as the one above), it can be shown that pairwise independence (or k-wise independence, as discussed below) suffices.

3.5.4 Randomness-Efficient Error Reduction and Sampling
Suppose we have a BPP algorithm for a language L that has a constant error probability. We want to reduce the error to 2^{−k}. We have already seen that this can be done using O(k) independent repetitions (by a Chernoff Bound). If the algorithm originally used m random bits, then we use O(km) random bits after error reduction. Here we will see how to reduce the number of random bits required for error reduction by doing repetitions that are only pairwise independent.

To analyze this, we will need an analogue of the Chernoff Bound (Theorem 2.21) that applies to averages of pairwise independent random variables. This will follow from Chebyshev's Inequality, which bounds the deviation of a random variable X from its mean μ in terms of its variance Var[X] = E[(X − μ)²] = E[X²] − μ².
Lemma 3.27 (Chebyshev's Inequality). If X is a random variable with expectation μ, then

    Pr[|X − μ| ≥ ε] ≤ Var[X]/ε².

Proof. Applying Markov's Inequality (Lemma 2.20) to the random variable Y = (X − μ)², we have:

    Pr[|X − μ| ≥ ε] = Pr[(X − μ)² ≥ ε²] ≤ E[(X − μ)²]/ε² = Var[X]/ε².
We now use this to show that a sum of pairwise independent random variables is concentrated around its expectation.

Proposition 3.28 (Pairwise Independent Tail Inequality). Let X₁,...,X_t be pairwise independent random variables taking values in the interval [0,1], let X = (Σ_i X_i)/t, and μ = E[X]. Then

    Pr[|X − μ| ≥ ε] ≤ 1/(t·ε²).
Proof. Let μ_i = E[X_i]. Then

    Var[X] = E[(X − μ)²]
           = (1/t²)·E[(Σ_i (X_i − μ_i))²]
           = (1/t²)·Σ_{i,j} E[(X_i − μ_i)(X_j − μ_j)]
           = (1/t²)·Σ_i E[(X_i − μ_i)²]   (by pairwise independence)
           = (1/t²)·Σ_i Var[X_i]
           ≤ t/t² = 1/t.

Now apply Chebyshev's Inequality.

While this requires less independence than the Chernoff Bound, notice that the error probability decreases only linearly rather than exponentially with the number t of samples.
Error Reduction. Proposition 3.28 tells us that if we use t = O(2^k) pairwise independent repetitions, we can reduce the error probability of a BPP algorithm from 1/3 to 2^{−k}. If the original BPP algorithm uses m random bits, then we can do this by choosing h : {0,1}^{k+O(1)} → {0,1}^m at random from a pairwise independent family, and running the algorithm using coin tosses h(x) for all x ∈ {0,1}^{k+O(1)}. This requires m + max{m, k + O(1)} = O(m + k) random bits.

                                        Number of     Number of
                                        Repetitions   Random Bits
    Independent Repetitions             O(k)          O(km)
    Pairwise Independent Repetitions    O(2^k)        O(m + k)

Note that we saved substantially on the number of random bits, but paid a lot in the number of repetitions needed. To maintain a polynomial-time algorithm, we can only afford k = O(log n). This setting implies that if we have a BPP algorithm with constant error that uses m random bits, we have another BPP algorithm that uses O(m + log n) = O(m) random bits and has error 1/poly(n). That is, we can go from constant to inverse-polynomial error while only paying a constant factor in randomness. (In fact, it turns out there is a way to achieve this with no penalty in randomness; see Problem 4.6.)
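A sketch of this error-reduction procedure; here A is a hypothetical randomized decision algorithm that takes its m coin tosses explicitly, and the prime-field hash again stands in for an exactly pairwise independent family:

```python
import random

def amplify(A, inp, m, k):
    """Reduce a BPP algorithm's error from 1/3 to ~2^{-k} using
    2^{k+c} pairwise independent repetitions and O(m + k) random bits."""
    c = 3                              # the O(1) slack in the seed length
    P = 2**521 - 1                     # prime with more than max(m, k+c) bits
    a, b = random.randrange(1, P), random.randrange(P)
    reps = 1 << (k + c)
    votes = sum(A(inp, ((a * x + b) % P) % (1 << m))  # coins = h(x)
                for x in range(reps))
    return 2 * votes > reps            # majority vote over all repetitions
```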
Sampling. Recall the Sampling problem: Given oracle access to a function f : {0,1}^m → [0,1], we want to approximate μ(f) to within an additive error of ε.

In Section 2.3.1, we saw that we can solve this problem with probability 1 − δ by outputting the average of f on a random sample of t = O(log(1/δ)/ε²) points in {0,1}^m, where the correctness follows from the Chernoff Bound. To reduce the number of truly random bits used, we can use a pairwise independent sample instead. Specifically, taking t = 1/(ε²·δ) pairwise independent points, we get an error probability of at most δ (by Proposition 3.28). To generate t pairwise independent samples of m bits each, we need O(m + log t) = O(m + log(1/ε) + log(1/δ)) truly random bits.
                                        Number of              Number of
                                        Samples                Random Bits
    Truly Random Sample                 O((1/ε²)·log(1/δ))     O(m·(1/ε²)·log(1/δ))
    Pairwise Independent Repetitions    O((1/ε²)·(1/δ))        O(m + log(1/ε) + log(1/δ))

Both of these sampling algorithms have a natural restricted structure. First, they choose all of their queries to the oracle f nonadaptively, based solely on their coin tosses and not based on the answers to previous queries. Second, their output is simply the average of the queried values, whereas the original sampling problem does not constrain the output function. It is useful to abstract these properties as follows.
Definition 3.29. A sampler Samp for domain size M is given coin tosses x ← [N] and outputs a sequence of samples z₁,...,z_t ∈ [M]. We say that Samp : [N] → [M]^t is a (δ, ε) averaging sampler if for every function f : [M] → [0,1], we have

    Pr_{(z₁,...,z_t)←Samp(U_[N])} [ (1/t)·Σ_{i=1}^t f(z_i) > μ(f) + ε ] ≤ δ.    (3.3)

If Inequality (3.3) only holds for f with (boolean) range {0,1}, we call Samp a boolean averaging sampler. We say that Samp is explicit if given x ∈ [N] and i ∈ [t], Samp(x)_i can be computed in time poly(log N, log t).
We note that, in contrast to the Chernoff Bound (Theorem 2.21) and the Pairwise Independent Tail Inequality (Proposition 3.28), this definition seems to only provide an error guarantee in one direction, namely that the sample average does not significantly exceed the global average (except with small probability). However, a guarantee in the other direction also follows by considering the function 1 − f. Thus, up to a factor of 2 in the failure probability δ, the above definition is equivalent to requiring that Pr[|(1/t)·Σ_i f(z_i) − μ(f)| > ε] ≤ δ. We choose to use a one-sided guarantee because it will make the connection to list-decodable codes (in Section 5) slightly cleaner.

Our pairwise-independent sampling algorithm can now be described as follows:
Theorem 3.30 (Pairwise Independent Sampler). For every m ∈ ℕ and δ, ε ∈ [0,1], there is an explicit (δ, ε) averaging sampler Samp : {0,1}^n → ({0,1}^m)^t using n = O(m + log(1/ε) + log(1/δ)) random bits and t = O(1/(ε²·δ)) samples.
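A sketch of this sampler (once more with the approximate prime-field hash standing in for the explicit family of Theorem 3.26):

```python
import math, random

def pairwise_sampler_estimate(f, m, eps, delta):
    """Estimate mu(f) for f: {0,1}^m -> [0,1] from t = ceil(1/(eps^2 delta))
    pairwise independent sample points z_1, ..., z_t."""
    t = math.ceil(1.0 / (eps * eps * delta))
    P = 2**521 - 1                     # prime assumed to exceed 2^m and t
    a, b = random.randrange(1, P), random.randrange(P)
    zs = (((a * i + b) % P) % (1 << m) for i in range(t))
    return sum(f(z) for z in zs) / t   # within eps of mu(f) w.p. >= 1 - delta
```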
As we will see in subsequent sections, averaging samplers are intimately related to the other pseudorandom objects we are studying
(especially randomness extractors). In addition, some applications of
samplers require samplers of this restricted form.

3.5.5 k-wise Independence
Our definition and construction of pairwise independent functions generalize naturally to k-wise independence for any k.
Definition 3.31 (k-wise independent hash functions). For N, M, k ∈ ℕ such that k ≤ N, a family of functions H = {h : [N] → [M]} is k-wise independent if for all distinct x₁, x₂,...,x_k ∈ [N], the random variables H(x₁),...,H(x_k) are independent and uniformly distributed in [M] when H is chosen uniformly at random from H.

Construction 3.32 (k-wise independence from polynomials). Let F be a finite field. Define the family of functions H = {h_{a₀,a₁,...,a_{k−1}} : F → F} where each h_{a₀,a₁,...,a_{k−1}}(x) = a₀ + a₁x + a₂x² + ··· + a_{k−1}x^{k−1} for a₀,...,a_{k−1} ∈ F.

Proposition 3.33. The family H given in Construction 3.32 is k-wise independent.

Proof. Similarly to the proof of Proposition 3.24, it suffices to prove that for all distinct x₁,...,x_k ∈ F and all y₁,...,y_k ∈ F, there is exactly one polynomial h of degree at most k − 1 such that h(x_i) = y_i for all i. To show that such a polynomial exists, we can use the Lagrange Interpolation Formula:

    h(x) = Σ_{i=1}^k y_i · Π_{j≠i} (x − x_j)/(x_i − x_j).

To show uniqueness, suppose we have two polynomials h and g of degree at most k − 1 such that h(x_i) = g(x_i) for i = 1,...,k. Then h − g is a polynomial of degree at most k − 1 with at least k roots, and thus must be the zero polynomial.
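A sketch of Construction 3.32 over a prime field, evaluating the random polynomial by Horner's rule; the choice of field is ours:

```python
import random

P = 2**61 - 1  # F = Z_P

def sample_kwise_hash(k):
    """Choose h(x) = a_0 + a_1 x + ... + a_{k-1} x^{k-1} with the a_i
    uniform in F; this costs k log|F| random bits."""
    coeffs = [random.randrange(P) for _ in range(k)]
    def h(x):
        r = 0
        for c in reversed(coeffs):   # Horner: r <- r*x + c
            r = (r * x + c) % P
        return r
    return h

h4 = sample_kwise_hash(4)  # a 4-wise independent function F -> F
```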
Corollary 3.34. For every n, m, k ∈ ℕ, there is a family of k-wise independent functions H = {h : {0,1}^n → {0,1}^m} such that choosing a random function from H takes k·max{n, m} random bits, and evaluating a function from H takes time poly(n, m, k).
k-wise independent hash functions have applications similar to those of pairwise independent hash functions. The increased independence is crucial in derandomizing some algorithms. k-wise independent random variables also satisfy a tail bound similar to Proposition 3.28, with the key improvement being that the error probability vanishes linearly in t^{k/2} rather than t; see Problem 3.8.

3.6 Exercises
Problem 3.1 (Derandomizing RP versus BPP*). Show that prRP = prP implies that prBPP = prP, and thus also that BPP = P. (Hint: Look at the proof that NP = P ⇒ BPP = P.)

Problem 3.2 (Designs). Designs (also known as packings) are collections of sets that are nearly disjoint. In Section 7, we will see how
they are useful in the construction of pseudorandom generators. Formally, a collection of sets S1 , S2 , . . . , Sm [d] is called an (, a)-design
(for integers a  d) if
For all i, |Si | = .
For all i = j, |Si Sj | < a.
For given , wed like m to be large, a to be small, and d to be small.
That is, wed like to pack many sets into a small universe with small
intersections.
   2
(1) Prove that if m ad / a , then there exists an (, a)-design
S1 , . . . , Sm [d].
Hint: Use the Probabilistic Method. Specically, show that
if the sets are chosen randomly, then for every S1 , . . . , Si1 ,
E [#{j < i : |Si Sj | a}] < 1.

Si

74

Basic Derandomization Techniques

(2) Conclude that for every constant > 0 and every , m N,


2
there exists an (, a)-design S1 , , Sm [d] with d = O( a )
and a = log m. In particular, setting m = 2 , we t exponentially many sets of size  in a universe of size d = O()
while keeping the intersections an arbitrarily small fraction
of the set size.
(3) Using the Method of Conditional Expectations, show how
to construct designs as in Parts 1 and 2 deterministically in
time poly(m, d).

Problem 3.3 (More Pairwise Independent Families).

(1) (matrix-vector family) For an n × m {0,1}-matrix A and b ∈ {0,1}^n, define a function h_{A,b} : {0,1}^m → {0,1}^n by h_{A,b}(x) = (Ax + b) mod 2. (The mod 2 is applied componentwise.) Show that H_{m,n} = {h_{A,b}} is a pairwise independent family. Compare the number of random bits needed to generate a random function in H_{m,n} to Construction 3.23.

(2) (Toeplitz matrices) A is a Toeplitz matrix if it is constant on diagonals, i.e., A_{i+1,j+1} = A_{i,j} for all i, j. Show that even if we restrict the family H_{m,n} in Part 1 to only include h_{A,b} for Toeplitz matrices A, we still get a pairwise independent family. How many random bits are needed now?

Problem 3.4 (Almost Pairwise Independence). A family of functions H mapping domain [N] to range [M] is ε-almost pairwise independent³ if for every x₁ ≠ x₂ ∈ [N] and y₁, y₂ ∈ [M], we have

    Pr_{H←H}[H(x₁) = y₁ and H(x₂) = y₂] ≤ (1 + ε)/M².

³ Another common definition of ε-almost pairwise independence requires instead that for every x₁ ≠ x₂ ∈ [N], if we choose a random hash function H ← H, the random variable (H(x₁), H(x₂)) is ε-close to two uniform and independent elements of [M] in statistical difference (as defined in Section 6). The two definitions are equivalent up to a factor of M² in the error parameter ε.
(1) Show that there exists a family H of ε-almost pairwise independent functions from {0,1}^n to {0,1}^m such that choosing a random function from H requires only O(m + log n + log(1/ε)) random bits (as opposed to O(m + n) for exact pairwise independence). (Hint: First consider domain F^{d+1} for an appropriately chosen finite field F and d ∈ ℕ, and look at maps of the form h = g ∘ f_a, where g comes from some pairwise independent family and f_a : F^{d+1} → F is defined by f_a(x₀,...,x_d) = x₀ + x₁a + x₂a² + ··· + x_d·a^d.)

(2) Give a deterministic algorithm that on input an N-vertex, M-edge graph G (with no self-loops), finds a cut of size at least (1/2 − o(1))·M in time M·polylog(N) and space O(log M) (thereby improving the M·poly(N) running time of Algorithm 3.20).

Problem 3.5 (Size Lower Bound for Pairwise Independent Families). Let H = {h : [N] → [M]} be a pairwise independent family of functions.

(1) Prove that if N ≥ 2, then |H| ≥ M².
(2) Prove that if M = 2, then |H| ≥ N + 1. (Hint: based on H, construct a sequence of orthogonal vectors v_x ∈ {±1}^{|H|} parameterized by x ∈ [N].)
(3) More generally, prove that for arbitrary M, we have |H| ≥ N·(M − 1) + 1. (Hint: for each x ∈ [N], construct M − 1 linearly independent vectors v_{x,y} ∈ ℝ^{|H|} such that v_{x,y} ⊥ v_{x′,y′} if x ≠ x′.)
(4) Deduce that for N = 2^n and M = 2^m, selecting a random function from H requires at least max{n, m} + m random bits.

Problem 3.6 (Frequency Moments of Data Streams). Given one pass through a huge stream of data items (a₁, a₂,...,a_k), where each a_i ∈ {0,1}^n, we want to compute statistics on the distribution of items occurring in the stream while using small space (not enough to store all the items or maintain a histogram). In this problem, you will see how to compute the second frequency moment f₂ = Σ_a m_a², where m_a = #{i : a_i = a}.

The algorithm works as follows: Before receiving any items, it chooses t random 4-wise independent hash functions H₁,...,H_t : {0,1}^n → {+1,−1}, and sets counters X₁ = X₂ = ··· = X_t = 0. Upon receiving the i-th item a_i, it adds H_j(a_i) to counter X_j. At the end of the stream, it outputs Y = (X₁² + ··· + X_t²)/t.

Notice that the algorithm only needs space O(t·n) to store the hash functions H_j and space O(t·log k) to maintain the counters X_j (compared to space k·n to store the entire stream, and space 2^n·log k to maintain a histogram).

(1) Show that for every data stream (a₁,...,a_k) and each j, we have E[X_j²] = f₂, where the expectation is over the choice of the hash function H_j.
(2) Show that Var[X_j²] ≤ 2f₂².
(3) Conclude that for a sufficiently large constant t (independent of n and k), the output Y is within 1% of f₂ with probability at least 0.99.
(4) Show how to decrease the error probability to δ while only increasing the space by a factor of log(1/δ).
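A sketch of this streaming algorithm, with the 4-wise independent ±1 hashes built from random cubics over a prime field as in Construction 3.32 (taking the parity of the field element as the sign is a simplification of ours and is only approximately unbiased):

```python
import random

P = 2**61 - 1

def sample_sign_hash():
    """A 4-wise independent H: items -> {+1, -1}, via a random cubic."""
    coeffs = [random.randrange(P) for _ in range(4)]
    def H(a):
        r = 0
        for c in reversed(coeffs):
            r = (r * a + c) % P
        return 1 if r & 1 else -1
    return H

def second_moment_estimate(stream, t=64):
    """One pass; keeps t counters, outputs Y = (X_1^2 + ... + X_t^2)/t."""
    Hs = [sample_sign_hash() for _ in range(t)]
    X = [0] * t
    for a in stream:
        for j in range(t):
            X[j] += Hs[j](a)
    return sum(x * x for x in X) / t

# E.g., the stream [1, 2, 1, 3, 1] has f_2 = 3^2 + 1^2 + 1^2 = 11.
```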

Problem 3.7 (Improved Pairwise Independent Tail Inequality).

(1) Show that if X is a random variable taking values in [0,1] and μ = E[X], we have Var[X] ≤ μ(1 − μ).
(2) Improve the Pairwise Independent Tail Inequality (Proposition 3.28) to show that if X is the average of t pairwise independent random variables taking values in [0,1] and μ = E[X], then Pr[|X − μ| ≥ ε] ≤ μ(1 − μ)/(t·ε²). In particular, for t = O(1/μ), we have Pr[0.99μ ≤ X ≤ 1.01μ] ≥ 0.99.

Problem 3.8 (k-wise Independent Tail Inequality). Let X be the average of t k-wise independent random variables for an even integer k, and let μ = E[X]. Prove that

    Pr[|X − μ| ≥ ε] ≤ (k²/(4tε²))^{k/2}.

(Hint: show that all terms in the expansion of E[(X − μ)^k] = (1/t^k)·E[(Σ_i (X_i − μ_i))^k] that involve more than k/2 variables X_i are zero.) Note that for fixed k, this probability decays like O(1/(tε²))^{k/2}, improving the 1/(tε²) bound of pairwise independence when k > 2.

Problem 3.9 (Hitting Samplers). A function Samp : [N] → [M]^t is a (δ, ε) hitting sampler if for every set T ⊆ [M] of density greater than ε, we have

    Pr_{(z₁,...,z_t)←Samp(U_[N])}[∃i : z_i ∈ T] ≥ 1 − δ.

(1) Show that every (δ, ε) averaging sampler is a (δ, ε) hitting sampler.
(2) Show that if we only want a hitting sampler, the number of samples in Theorem 3.30 can be reduced to O(1/(εδ)). (Hint: use Problem 3.7.)
(3) For which subset of BPP algorithms are hitting samplers useful for doing randomness-efficient error reduction?

3.7 Chapter Notes and References
The Time Hierarchy Theorem was proven by Hartmanis and Stearns [200]; proofs can be found in any standard text on complexity theory, for example, [32, 161, 366]. Adleman [3] showed that every language in RP has polynomial-sized circuits (cf., Corollary 3.12), and this was generalized to BPP by Gill. Pippenger [310] showed the equivalence between having polynomial-sized circuits and P/poly (Fact 3.11). The general definition of complexity classes with advice (Definition 3.10) is due to Karp and Lipton [235], who explored the relationship between nonuniform lower bounds and uniform lower bounds. A 5n − O(n) circuit-size lower bound for an explicit function (in P) was given by Iwama et al. [218, 254].

The existence of universal traversal sequences (Example 3.8) was proven by Aleliunas et al. [13], who suggested finding an explicit construction (Open Problem 3.9) as an approach to derandomizing the logspace algorithm for Undirected S-T Connectivity. For the state of the art on these problems, see Section 4.4. A conjectured deterministic NC algorithm for Perfect Matching (derandomizing Algorithm 2.7 in a different way than Open Problem 3.7) is given in [4].
Theorem 3.14 is due to Sipser [364], who proved that BPP is in the fourth level of the polynomial-time hierarchy; this was improved to the second level by Gács. Our proof of Theorem 3.14 is due to Lautemann [256]. Problem 3.1 is due to Buhrman and Fortnow [85]. For more on nondeterministic computation and nonuniform complexity, see textbooks on computational complexity, such as [32, 161, 366].

The Method of Conditional Probabilities was formalized and popularized as an algorithmic tool in the work of Spencer [370] and Raghavan [316]. Its use in Algorithm 3.17 for approximating MaxCut is implicit in [279]. For more on this method, see the textbooks [25, 291].
A more detailed treatment of pairwise independence (along with a variety of other topics in pseudorandomness and derandomization) can be found in the survey by Luby and Wigderson [281]. The use of limited independence in computer science originates with the seminal papers of Carter and Wegman [93, 417], which introduced the notions of universal, strongly universal (i.e., k-wise independent), and almost strongly universal (i.e., almost k-wise independent) families of hash functions. The pairwise independent and k-wise independent sample spaces of Constructions 3.18, 3.23, and 3.32 date back to the work of Lancaster [255] and Joffe [222, 223] in the probability literature, and were rediscovered several times in the computer science literature. The construction of pairwise independent hash functions from Part 1 of Problem 3.3 is due to Carter and Wegman [93], and Part 2 is implicit in [408, 168]. The size lower bound for pairwise independent families in Problem 3.5 is due to Stinson [376], based on the Plackett–Burman bound for orthogonal arrays [312]. The construction of almost pairwise independent families in Problem 3.4 is due to Bierbrauer et al. [67] (though the resulting parameters follow from the earlier work of Naor and Naor [296]).

The application to hash tables from Section 3.5.3 is due to Carter and Wegman [93], and the method mentioned for improving the space complexity to O(K log N) is due to Fredman, Komlós, and Szemerédi [142]. The problem of randomness-efficient error reduction (sometimes called "deterministic amplification") was first studied by Karp, Pippenger, and Sipser [234], and the method using pairwise independence given in Section 3.5.4 was proposed by Chor and Goldreich [97]. The use of pairwise independence for derandomizing algorithms was pioneered by Luby [278]; Algorithm 3.20 for MaxCut is implicit in [279]. The notion of averaging samplers was introduced by Bellare and Rompel [57] (under the term "oblivious samplers"). For more on samplers and averaging samplers, see the survey by Goldreich [155]. Tail bounds for k-wise independent random variables, such as the one in Problem 3.8, can be found in the papers [57, 97, 350]. Problem 3.2 on designs is from [134], with the derandomization of Part 3 being from [281, 302]. Problem 3.6 on the frequency moments of data streams is due to Alon, Matias, and Szegedy [22]. For more on data stream algorithms, we refer to the survey by Muthukrishnan [295].

4 Expander Graphs

Now that we have seen a variety of basic derandomization techniques, we will move on to study the first major pseudorandom object in this survey, expander graphs. These are graphs that are "sparse" yet very "well-connected."

4.1 Measures of Expansion

We will typically interpret the properties of expander graphs in an asymptotic sense. That is, there will be an infinite family of graphs G_i, with a growing number of vertices N_i. By "sparse," we mean that the degree D_i of G_i should be very slowly growing as a function of N_i. The "well-connectedness" property has a variety of different interpretations, which we will discuss below. Typically, we will drop the subscripts of i and the fact that we are talking about an infinite family of graphs will be implicit in our theorems. As in Section 2.4.2, we will state many of our definitions for directed multigraphs (which we'll call digraphs for short), though in the end we will mostly study undirected multigraphs.

4.1.1 Vertex Expansion

The classic measure of well-connectedness in expanders requires that every not-too-large set of vertices has "many" neighbors:

Definition 4.1. A digraph G is a (K, A) vertex expander if for all sets S of at most K vertices, the neighborhood N(S) ≝ {u | ∃v ∈ S s.t. (u,v) ∈ E} is of size at least A·|S|.

Ideally, we would like D = O(1), K = Ω(N), where N is the number of vertices, and A as close to D as possible.
There are several other measures of expansion, some of which we will examine in forthcoming sections:

• Boundary Expansion: Instead of N(S), only use the boundary ∂S ≝ N(S) \ S.
• Edge Expansion (cuts): Instead of ∂S, use the number of edges leaving S.
• Random Walks: Random walks converge quickly to the uniform distribution, that is, the second eigenvalue λ(G) is small.
• "Quasi-randomness" (a.k.a. Mixing): for every two sets S and T (say of constant density), the fraction of edges between S and T is roughly the product of their densities.

All of these measures are very closely related to each other, and are even equivalent for certain settings of parameters.
It is not obvious from the definition that good vertex expanders (say, with D = O(1), K = Ω(N), and A = 1 + Ω(1)) even exist. We will show this using the Probabilistic Method.

Theorem 4.2. For all constants D ≥ 3, there is a constant α > 0 such that for all N, a random D-regular undirected graph on N vertices is an (αN, D − 1.01) vertex expander with probability at least 1/2.

Note that the degree bound of 3 is the smallest possible, as every graph of degree 2 is a poor expander (being a union of cycles and chains). In addition, for most settings of parameters, it is impossible to have expansion larger than D − 1 (as shown in Problem 4.3).

We prove a slightly simpler theorem for bipartite expanders.
Definition 4.3. A bipartite multigraph G is a (K, A) vertex expander if for all sets S of left-vertices of size at most K, the neighborhood N(S) is of size at least A·|S|.

Now, let Bip_{N,D} be the set of bipartite multigraphs that have N vertices on each side and are D-left-regular, meaning that every vertex on the left has D neighbors, numbered from 1,...,D (but vertices on the right may have varying degrees).

Theorem 4.4. For every constant D, there exists a constant α > 0 such that for all N, a uniformly random graph from Bip_{N,D} is an (αN, D − 2) vertex expander with probability at least 1/2.
Proof. First, note that choosing G ← Bip_{N,D} uniformly is equivalent to uniformly and independently choosing D neighbors on the right for each left vertex v. Now, for K ≤ αN, let p_K be the probability that there exists a left-set S of size exactly K that does not expand by at least D − 2. Fixing a subset S of size K, N(S) is a set of KD random vertices in [N] (chosen with replacement). We can imagine these vertices V₁, V₂,...,V_{KD} being chosen in sequence. Call V_i a repeat if V_i ∈ {V₁,...,V_{i−1}}. Then the probability that V_i is a repeat, even conditioned on V₁,...,V_{i−1}, is at most (i − 1)/N ≤ KD/N. So,

    Pr[|N(S)| ≤ (D − 2)·K] ≤ Pr[there are at least 2K repeats among V₁,...,V_{KD}]
                           ≤ (KD choose 2K)·(KD/N)^{2K}.

Thus, we find that

    p_K ≤ (N choose K)·(KD choose 2K)·(KD/N)^{2K}
        ≤ (Ne/K)^K·(KDe/(2K))^{2K}·(KD/N)^{2K}
        = (e³D⁴K/(4N))^K,

where e is the base of the natural logarithm. Since K ≤ αN, we can set α = 1/(e³D⁴) to obtain p_K ≤ 4^{−K}. Thus

    Pr_{G←Bip_{N,D}}[G is not an (αN, D − 2) expander] ≤ Σ_{K=1}^{αN} 4^{−K} < 1/2.

There are a number of variants to the above probabilistic construction of expanders.

• We can obtain a bipartite multigraph that is D-regular on both sides by taking the union of D random perfect matchings. This can be analyzed using a small modification of the analysis above; even though V₁,...,V_{KD} are not independent, the probability of V_i being a repeat conditioned on V₁,...,V_{i−1} can still be bounded by KD/(N − K). Also, the multigraph can be made into a simple graph by eliminating or redistributing edges.

• One can optimize α rather than the expansion factor A, showing that for all constants α < 1 and D > 2, there exists a constant A > 1 such that for all sufficiently large N, a random graph in Bip_{N,D} is an (αN, A) vertex expander with high probability.

• In fact, a very general tradeoff between D, α, and A is known: a random D-regular N-vertex bipartite multigraph is an (αN, A) vertex expander with high probability for sufficiently large N if D > (H(α) + H(αA))/(H(α) − αA·H(1/A)), where H(p) = p log(1/p) + (1 − p) log(1/(1 − p)) is the binary entropy function.

• The results can also be extended to unbalanced bipartite graphs (where the right side is smaller than the left), and to nonbipartite graphs as well; both of these cases are important in some applications.
In addition to being natural combinatorial objects, expander graphs have numerous applications in theoretical computer science, including the construction of fault-tolerant networks (indeed, the first papers on expanders were in conferences on telephone networks), sorting in O(log n) time in parallel, derandomization (as we will see), lower bounds in circuit complexity and proof complexity, error-correcting codes, negative results regarding integrality gaps for linear programming relaxations and metric embeddings, distributed routing, and data structures. For many of these applications, it is not enough to know that a random graph is a good expander; we need explicit constructions, that is, ones that are deterministic and efficient. We view such explicit expanders as "pseudorandom objects" because they are fixed graphs that possess many of the properties of random graphs.
4.1.2 Spectral Expansion

Intuitively, another way of saying that a graph is well-connected is to require that random walks on the graph converge quickly to the stationary distribution. As we have seen in Section 2.4.2, the mixing rate of random walks in turn is captured well by the second largest eigenvalue of the transition matrix, and this turns out to be a very useful measure of expansion for many purposes.
Recall that for an N-vertex regular directed graph G with random-walk matrix M, we define

    λ(G) ≝ max_π ||πM − u||/||π − u|| = max_{x⊥u} ||xM||/||x||,

where u = (1/N,...,1/N) ∈ ℝ^N is the uniform distribution on [N], the first maximum is over all probability distributions π ∈ [0,1]^N, and the second maximum is over all vectors x ∈ ℝ^N that are orthogonal to u. We write γ(G) ≝ 1 − λ(G) to denote the spectral gap of G.

Definition 4.5. For γ ∈ [0,1], we say that a regular digraph G has spectral expansion γ if γ(G) ≥ γ (equivalently, λ(G) ≤ 1 − γ).¹
Larger values of γ(G) (or smaller values of λ(G)) correspond to better expansion. Sometimes it is more natural to state results in terms of γ(G) and sometimes in terms of λ(G). Surprisingly, this linear-algebraic measure of expansion turns out to be equivalent to the combinatorial measure of vertex expansion for common parameters of interest.

¹ In other sources (including the original lecture notes on which this survey was based), the spectral expansion referred to λ rather than γ. Here we use γ, because it has the more natural feature that larger values of γ correspond to the graph being more expanding.
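Concretely, for an undirected regular graph λ(G) equals the second largest absolute value of an eigenvalue of M, which is easy to check numerically; a sketch using numpy, with the 4-vertex complete graph as our example:

```python
import numpy as np

# Random-walk matrix of the 3-regular complete graph K_4.
A = np.ones((4, 4)) - np.eye(4)
M = A / 3.0

# The largest eigenvalue is 1 (eigenvector u); lambda(G) is the
# second largest absolute eigenvalue, and gamma(G) = 1 - lambda(G).
eigs = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1]
lam, gamma = eigs[1], 1 - eigs[1]
print(lam, gamma)   # K_4 gives lambda = 1/3, gamma = 2/3
```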
One direction is given by the following:

Theorem 4.6 (spectral expansion ⇒ vertex expansion). If G is a regular digraph with spectral expansion γ = 1 − λ for some λ ∈ [0,1], then, for every α ∈ [0,1], G is an (αN, 1/((1 − α)λ² + α)) vertex expander. In particular, G is an (N/2, 1 + γ) expander.
We prove this theorem using the following two useful statistics of probability distributions.

Definition 4.7. For a probability distribution π, the collision probability of π is defined to be the probability that two independent samples from π are equal, namely CP(π) = Σ_x π_x². The support of π is Supp(π) = {x : π_x > 0}.
Lemma 4.8. For every probability distribution π ∈ [0,1]^N, we have:

(1) CP(π) = ||π||² = ||π − u||² + 1/N, where u is the uniform distribution on [N].
(2) CP(π) ≥ 1/|Supp(π)|, with equality iff π is uniform on Supp(π).

Proof. For Part 1, the fact that CP(π) = ||π||² follows immediately from the definition. Then, writing π = u + (π − u), and noting that π − u is orthogonal to u, we have ||π||² = ||u||² + ||π − u||² = 1/N + ||π − u||².

For Part 2, by Cauchy–Schwarz we have

    1 = Σ_{x∈Supp(π)} π_x ≤ √(|Supp(π)|)·√(Σ_x π_x²) = √(|Supp(π)|·CP(π)),

with equality iff π is uniform on Supp(π).


Proof of Theorem 4.6. The condition that G has spectral expansion γ = 1 − λ is equivalent to saying that λ(G) ≤ λ. By the definition of λ(G) and Part 1 of Lemma 4.8, we have

    CP(πM) − 1/N ≤ λ²·(CP(π) − 1/N)

for every probability distribution π. Letting S be any subset of the vertices of size at most αN and π the uniform distribution on S, we have CP(π) = 1/|S| and CP(πM) ≥ 1/|Supp(πM)| = 1/|N(S)|. Thus,

    1/|N(S)| − 1/N ≤ λ²·(1/|S| − 1/N).

Solving for |N(S)| and using N ≥ |S|/α, we obtain |N(S)| ≥ |S|/(λ²(1 − α) + α), as desired.
The other direction, i.e., obtaining spectral expansion from vertex expansion, is more difficult (and we will not prove it here).

Theorem 4.9 (vertex expansion ⇒ spectral expansion). For every δ > 0 and D > 0, there exists γ > 0 such that if G is a D-regular (N/2, 1 + δ) vertex expander, then it also has spectral expansion γ. Specifically, we can take γ = Ω((δ/D)²).
Note first the dependence on subset size being N/2: this is necessary, because a graph can have vertex expansion (αN, 1 + Ω(1)) for α < 1/2 and be disconnected (e.g., the disjoint union of two good expanders), thereby having no spectral expansion. Also note that the bound on γ deteriorates as D increases. This is also necessary, because adding edges to a good expander cannot hurt its vertex expansion, but can hurt its spectral expansion.

Still, roughly speaking, these two results show that vertex expansion and spectral expansion are closely related, indeed equivalent for many interesting settings of parameters:
Corollary 4.10. Let G be an infinite family of D-regular multigraphs, for a constant D ∈ ℕ. Then the following two conditions are equivalent:

• There is a constant δ > 0 such that every G ∈ G is an (N/2, 1 + δ) vertex expander.
• There is a constant γ > 0 such that every G ∈ G has spectral expansion γ.

When people informally use the term "expander," they often mean a family of regular graphs of constant degree D satisfying one of the two equivalent conditions above.
However, the two measures are no longer equivalent if one wants to optimize the expansion constants. For vertex expansion, we have already seen that if we allow α to be a small constant (depending on D), then there exist (αN, A) vertex expanders with A very close to D − 1, e.g., A = D − 1.01, and Problem 4.3 shows that A cannot be any larger than D − 1. The optimal value for the spectral expansion is also well-understood. First note that, by taking α → 0 in Theorem 4.6, a graph with spectral expansion 1 − λ has vertex expansion A ≈ 1/λ² for small sets. Thus, a lower bound on λ is 1/√D − o(1). In fact, this lower bound can be improved, as shown in the following theorem (and proven in Problem 4.4):

Theorem 4.11. For every constant D ∈ ℕ, any D-regular, N-vertex multigraph G satisfies λ(G) ≥ 2√(D − 1)/D − o(1), where the o(1) term vanishes as N → ∞ (and D is held constant).

Surprisingly, there exist explicit constructions giving λ(G) ≤ 2√(D − 1)/D. Graphs meeting this bound are called Ramanujan graphs. Random graphs almost match this bound, as well:

Theorem 4.12. For any constant D ∈ ℕ, a random D-regular N-vertex graph satisfies λ(G) ≤ 2√(D − 1)/D + o(1) with probability 1 − o(1), where both o(1) terms vanish as N → ∞ (and D is held constant).
Now let us see what these results for spectral expansion imply in the world of vertex expansion. With Ramanujan graphs (λ(G) ≤ 2√(D − 1)/D), the bound from Theorem 4.6 gives a vertex expansion factor of A ≈ D/4 for small sets. This is not tight, and it is known that Ramanujan graphs actually have vertex expansion D/2 − o(1) for sets of density o(1), which is tight in the sense that there are families of graphs with λ(G) ≤ 2√(D − 1)/D but vertex expansion at most D/2. Still, this vertex expansion is not as good as we obtained via the Probabilistic Method (Theorem 4.2), where we achieved vertex expansion D − O(1). This means that we cannot obtain optimal vertex expansion by going through spectral expansion. Similarly, we cannot obtain optimal spectral expansion by going through vertex expansion (because the bound on spectral expansion in Theorem 4.9 necessarily deteriorates as the degree D increases). The conclusion is that vertex and spectral expansion are loosely equivalent, but only if we are not interested in optimizing the constants in the tradeoffs between various parameters (and for some applications these are crucial).
4.1.3 Other Measures of Expansion

In this section, we mention two other useful measures of expansion involving edges crossing cuts in the graph. For two sets S, T ⊆ V(G), let e(S,T) = {(u,v) ∈ S × T : {u,v} ∈ E}. Here (u,v) refers to an ordered pair, in contrast to the definition of cut(S,T) in Section 2.3.4. Thus, we count edges entirely within S ∩ T twice, corresponding to both orientations.

Definition 4.13. A D-regular digraph G is a (K, ε) edge expander if for all sets S of at most K vertices, the cut size e(S, S̄) is at least ε·|S|·D.

That is, at least an ε fraction of the edges from S lead outside S. (Sometimes edge expansion is defined without the normalization factor of D, only requiring e(S, S̄) ≥ ε·|S|.) When viewed in terms of the random walk on G, the ratio e(S, S̄)/(|S|·D) is the probability that, if we condition the stationary distribution on being in S, the random walk leaves S in one step. It turns out that if we fix K = N/2, then edge expansion turns out to be even more closely related to spectral expansion than vertex expansion is. Indeed:

Theorem 4.14.

(1) If a D-regular, N-vertex digraph G has spectral expansion γ, then G is an (N/2, γ/2) edge expander.
(2) If a D-regular, N-vertex digraph G is an (N/2, ε) edge expander and at least an α fraction of the edges leaving each vertex are self-loops for some α ∈ [0,1], then G has spectral expansion αε²/2.

The condition about self-loops in Part 2 is to ensure that the graph is far from being bipartite (or more generally "periodic" in the sense that all cycle lengths are divisible by some number larger than 1), because a bipartite graph has spectral expansion 0 but can have positive edge expansion. For graphs with a constant fraction of self-loops at each vertex, the theorem implies that the edge expansion is bounded away from 0 iff the spectral expansion is bounded away from 0. Unlike Corollary 4.10, this equivalence holds even for graphs of unbounded degree. The intuition for the relation is that a large edge expansion implies that the random walk on the graph has no "bottlenecks" and thus should mix rapidly. This connection also holds for Markov chains in general (when the definitions are appropriately generalized), where the edge expansion is known as the "conductance." Part 1 of Theorem 4.14 will follow as a special case of the Expander Mixing Lemma below; we omit the proof of Part 2.
Next, we consider a generalization of edge expansion, where we look at edges not just from a set S to its complement but between any two sets S and T. If we think of an expander as being like a random graph, we would expect the fraction of edges that go from S to T to be approximately equal to the product of the densities of S and T. The following result shows that this intuition is correct:

Lemma 4.15 (Expander Mixing Lemma). Let G be a D-regular, N-vertex digraph with spectral expansion 1 − λ. Then for all sets of vertices S, T of densities α = |S|/N and β = |T|/N, we have

    | e(S,T)/(N·D) − αβ | ≤ λ·√(α(1−α))·√(β(1−β)).

Observe that the denominator N·D counts all edges of the graph (as ordered pairs). The lemma states that the difference between the fraction of edges from S to T and the expected value if we were to choose G randomly is "small," roughly λ times the square root of this fraction.

Finally, note that Part 1 of Theorem 4.14 follows from the Expander Mixing Lemma by setting T = S^c, so that β = 1 − α and

    e(S,T)/(N·D) ≥ α(1−α) − λ·α(1−α) = γ·α(1−α) ≥ γα/2

for α ≤ 1/2.
When a digraph G = (V,E) has the property that |e(S,T)/|E| − αβ| = o(1) for all sets S, T (with densities α, β), the graph is called quasirandom. Thus, the Expander Mixing Lemma implies that a regular digraph with λ(G) = o(1) is quasirandom. Quasirandomness has been studied extensively for dense graphs, in which case it has numerous equivalent formulations. Here we are most interested in sparse graphs, especially constant-degree graphs (for which λ(G) = o(1) is impossible).
Proof. Let χ_S be the characteristic (row) vector of S and χ_T the characteristic vector of T. Let A be the adjacency matrix of G, and M = A/D be the random-walk matrix for G. Note that e(S,T) = χ_S A χ_T^t = χ_S (DM) χ_T^t, where the superscript t denotes the transpose.

We can express χ_S as the sum of two components, one parallel to the uniform distribution u, and the other a vector χ_S^⊥ with χ_S^⊥ ⊥ u. The coefficient of u is ⟨χ_S, u⟩/⟨u, u⟩ = Σ_i (χ_S)_i = |S| = αN. Then χ_S = (αN)u + χ_S^⊥ and similarly χ_T = (βN)u + χ_T^⊥. Intuitively, the components parallel to the uniform distribution "spread" the weight of S and T uniformly over the entire graph, and χ_S^⊥ and χ_T^⊥ will yield the error term.

Formally, we have

    e(S,T)/(N·D) = (1/N)·((αN)u + χ_S^⊥) M ((βN)u + χ_T^⊥)^t
                 = (1/N)·(αβN²)·uMu^t + (1/N)·(αN)·uM(χ_T^⊥)^t
                   + (1/N)·(βN)·χ_S^⊥Mu^t + (1/N)·χ_S^⊥M(χ_T^⊥)^t.

Since uM = u and Mu^t = u^t, and both χ_S^⊥ and χ_T^⊥ are orthogonal to u, the above expression simplifies to:

    e(S,T)/(N·D) = (αβN)·uu^t + (χ_S^⊥M)(χ_T^⊥)^t/N = αβ + (χ_S^⊥M)(χ_T^⊥)^t/N.

Thus,

    | e(S,T)/(N·D) − αβ | = |(χ_S^⊥M)(χ_T^⊥)^t|/N
                          ≤ ||χ_S^⊥M||·||χ_T^⊥||/N
                          ≤ λ·||χ_S^⊥||·||χ_T^⊥||/N.

To complete the proof, we note that

    αN = ||χ_S||² = ||(αN)u||² + ||χ_S^⊥||² = α²N + ||χ_S^⊥||²,

so ||χ_S^⊥|| = √((α − α²)N) = √(α(1−α)N), and similarly ||χ_T^⊥|| = √(β(1−β)N).
Similarly to vertex expansion and edge expansion, a natural question is to what extent the converse holds. That is, if e(S,T)/(N·D) is always close to the product of the densities of S and T, then is λ(G) necessarily small? This is indeed true:

Theorem 4.16 (Converse to Expander Mixing Lemma). Let G be a D-regular, N-vertex undirected graph. Suppose that for all pairs of disjoint vertex sets S, T, we have

    | e(S,T)/(N·D) − μ(S)μ(T) | ≤ θ·√(μ(S)μ(T))

for some θ ∈ [0,1], where μ(R) = |R|/N for any set R of vertices. Then λ(G) = O(θ·log(1/θ)).

Putting the two theorems together, we see that λ and θ are the same up to a logarithmic factor. Thus, unlike the other connections we have seen, this connection is good for "highly expanding" graphs (i.e., λ(G) close to zero, γ(G) close to 1).

4.2 Random Walks on Expanders

From the previous section, we know that one way of characterizing an expander graph G is by having a bound on its second eigenvalue λ(G), and in fact there exist constant-degree expanders where λ(G) is bounded by a constant less than 1. From Section 2.4.3, we know that this implies that the random walk on G converges quickly to the uniform distribution. Specifically, a walk of length t started at any vertex ends at ℓ₂ distance at most λ^t from the uniform distribution. Thus after t = O(log N) steps, the distribution is very close to uniform; for example, the probability of every vertex is (1 ± 0.01)/N. Note that, if G has constant degree, the number of random bits invested here is O(t) = O(log N), which is within a constant factor of optimal; clearly log N − O(1) random bits are also necessary to sample an almost uniform vertex. Thus, expander walks give a very good tradeoff between the number of random bits invested and the "randomness" of the final vertex in the walk. Remarkably, expander walks give good randomness properties not only for the final vertex in the walk, but also for the sequence of vertices traversed in the walk. Indeed, in several ways to be formalized below, this sequence of vertices "behaves" like uniform independent samples of the vertex set.

A canonical application of expander walks is randomness-efficient error reduction of randomized algorithms: Suppose we have an algorithm with constant error probability, which uses some m random bits. Our goal is to reduce the error to 2^{−k}, with a minimal penalty in random bits and time. Independent repetition of the algorithm suffers just an O(k) multiplicative penalty in time, but needs O(km) random bits. We have already seen that with pairwise independence we can use just O(m + k) random bits, but the time blows up by O(2^k). Expander graphs let us have the best of both worlds, using just m + O(k) random bits, and increasing the time by only an O(k) factor. Note that for k = o(m), the number of random bits is (1 + o(1))·m, even better than what pairwise independence gives.
The general approach is to consider an expander graph with vertex set {0,1}^m, where each vertex is associated with a setting of the random bits. We will choose a uniformly random vertex v₁ and then do a random walk of length t − 1, visiting additional vertices v₂,...,v_t. (Note that unlike the rapid mixing analysis, here we start at a uniformly random vertex.) This requires m random bits for the initial choice, and log D for each of the t − 1 steps. For every vertex v_i on the random walk, we will run the algorithm with v_i as the setting of the random coins.

First, we consider the special case of randomized algorithms with one-sided error (RP). For these, we should accept if at least one execution of the algorithm accepts, and reject otherwise. If the input is a No instance, the algorithm never accepts, so we also reject. If the input is a Yes instance, we want our random walk to hit at least one vertex that makes the algorithm accept. Let B denote the set of "bad" vertices giving coin tosses that make the algorithm reject. By the definition of RP, the density of B is at most 1/2. Thus, our aim is to show that the probability that all the vertices in the walk v₁,...,v_t are in B vanishes exponentially fast in t, if G is a good expander.
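A sketch of this procedure; the expander is abstracted as a hypothetical neighbor function step(v, j) returning the j-th neighbor of vertex v, which Section 4.3 explains how to compute explicitly:

```python
import random

def rp_amplify(A, inp, m, D, t, step):
    """Error reduction for RP: run A on the coin tosses given by the
    vertices of a (t-1)-step walk on a D-regular expander with vertex
    set {0,1}^m.  Costs m + (t-1) * log2(D) random bits."""
    v = random.randrange(1 << m)           # uniform start vertex V_1
    for _ in range(t):
        if A(inp, v):                      # accepting run: input is Yes
            return True
        v = step(v, random.randrange(D))   # one random walk step
    return False  # all t vertices landed in the bad set B: reject
```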
The case t = 2 follows from the Expander Mixing Lemma given in the previous section. If we choose a random edge in a graph with spectral expansion 1 − λ, the probability that both endpoints are in a set B is at most μ(B)² + λ·μ(B). So if λ ≤ μ(B), then the probability is roughly μ(B)², just like for two independent random samples. The case of larger t is given by the following theorem.

Theorem 4.17 (Hitting Property of Expander Walks). If G is a regular digraph with spectral expansion 1 − λ, then for any B ⊆ V(G) of density μ, the probability that a random walk (V₁,...,V_t) of t − 1 steps in G starting at a uniformly random vertex V₁ always remains in B is

    Pr[ ∀i: V_i ∈ B ] ≤ (μ + λ(1 − μ))^t.
Equivalently, a random walk hits the complement of B with high probability. Note that if λ and μ are constants less than 1, then the probability of staying in B is 2^{−Ω(t)}, completing the analysis of the efficient error-reduction algorithm for RP.
Before proving the theorem, we discuss general approaches to analyzing spectral expanders and random walks on them. Typically, the first step is to express the quantities of interest linear-algebraically, involving applications of the random-walk (or adjacency) matrix M to some vectors v. For example, when proving the Expander Mixing Lemma (Lemma 4.15), we expressed the fraction of edges between sets S and T as χ_S M χ_T^t (up to some normalization factor). Then we can proceed in one of the two following ways:

Vector Decomposition: Decompose the input vector(s) v as v = v^∥ + v^⊥, where v^∥ = (⟨v, u⟩/⟨u, u⟩)·u is the component of v in the direction of the uniform distribution u and v^⊥ is the component of v orthogonal to u. This induces a similar orthogonal decomposition of the output vector vM into

    vM = (vM)^∥ + (vM)^⊥ = v^∥M + v^⊥M,

where v^∥M = v^∥ and ||v^⊥M|| ≤ λ·||v^⊥||. Thus, from information about how v's length is divided between the uniform and non-uniform components, we deduce information about how vM is divided between the uniform and non-uniform components. This is the approach we took in the proof of the Expander Mixing Lemma.

Matrix Decomposition: This corresponds to a different decomposition of the output vector vM, one that can be expressed in a way that is independent of the decomposition of the input vector v. Specifically, if G has spectral expansion γ = 1 − λ, then

    vM = γ·vJ + λ·vE = v·(γJ + λE),

where J is the matrix in which every entry is 1/N and the error matrix E satisfies ||vE|| ≤ ||v||. The advantage of this decomposition is that we can apply it even when we have no information about how v decomposes (only its length). The fact that M is a convex combination of J and E means that we can often treat each of these components separately and then just apply the triangle inequality. However, it is less refined than the vector decomposition approach, and sometimes gives weaker bounds. Indeed, if we used it to prove the Expander Mixing Lemma (without decomposing χ_S and χ_T), we would get a slightly worse error term of λ·(√(μ(S)μ(T)) + μ(S)μ(T)).
The Matrix Decomposition Approach can be formalized using the following notion.

Definition 4.18. The (spectral) norm of an N × N real matrix M is defined to be

    ||M|| = max_{x ∈ ℝ^N, x ≠ 0} ||xM||/||x||.

(If M is symmetric, then ||M|| equals the largest absolute value of any eigenvalue of M.)

Some basic properties of the matrix norm are that ||c·A|| = |c|·||A||, ||A + B|| ≤ ||A|| + ||B||, and ||A·B|| ≤ ||A||·||B|| for every two matrices A, B and every c ∈ ℝ. Following the discussion above, we have the following lemma:
Lemma 4.19. Let G be a regular digraph on N vertices with random-walk matrix M. Then G has spectral expansion γ = 1 − λ iff M = γJ + λE, where J is the N × N matrix in which every entry is 1/N (i.e., the random-walk matrix for the complete graph with self-loops) and ||E|| ≤ 1.

Proof. Suppose that G has spectral expansion γ. Then define E = (M − γJ)/λ. To see that E has norm at most 1, first observe that uE = (uM − γ·uJ)/λ = (1 − γ)u/λ = u. Thus it suffices to show that for every vector v orthogonal to u, the vector vE is orthogonal to u and is of length at most ||v||. Orthogonality follows because vM is orthogonal to u (by regularity of G) and vJ = 0. The length bound follows from vE = (vM)/λ and ||vM|| ≤ λ·||v||, by the spectral expansion of G.

Conversely, suppose that M = γJ + λE for some E with ||E|| ≤ 1. Then for every vector v orthogonal to u, we have ||vM|| = ||0 + λ·vE|| ≤ λ·||v||, and thus G has spectral expansion γ.
Intuitively, this lemma says that we can think of a random step on a graph with spectral expansion γ as being a random step on the complete graph with probability γ and "not doing damage" with probability 1 − γ. This intuition would be completely accurate if E were a stochastic matrix, but it is typically not (e.g., it may have negative entries). Still, note that the bound given in Theorem 4.17 exactly matches this intuition: in every step, the probability of remaining in B is at most γμ + λ = μ + λ(1 − μ).
Proof of Theorem 4.17. We need a way to express getting stuck in
B linear-algebraically. For that, we dene P to be the diagonal matrix
with Pi,i = 1 if i B and Pi,i = 0 otherwise. Thus, the probability a
distribution picks a node in B is |P |1 , where | |1 is the 1 norm,

|x|1 = |xi | (which in our case is equal to the sum of the components
of the vector, since all values are nonnegative).
Let M be the random-walk matrix of G. The probability distribution for the rst vertex V1 is given by the vector u. Now we can state
the following crucial fact:
Claim 4.20. The probability that the random walk stays entirely
within B is precisely |uP (M P )t1 |1 .
Proof of Claim: By induction on , we show that (uP (M P ) )i is the
probability that the rst  + 1 vertices of the random walk are in B and
the ( + 1)st vertex is i. The base case is  = 0. If i B, (uP )i = 1/N ;
if i
/ B, (uP )i = 0. Now assume the hypothesis holds up to some .
Then (uP (M P ) M )i is the probability that the rst  + 1 vertices of
the random walk are in B and the ( + 2)nd vertex is i (which may
or may not be in B). Multiplying by P , we zero out all components
for nodes not in B and leave the others unchanged. Thus, we obtain
the probability that the rst  + 2 vertices are in B and the ( + 2)nd
vertex is i.

To get a bound in terms of the spectral expansion, we will now switch to the ℓ₂ norm. The intuition is that multiplying by M shrinks the component that is perpendicular to u (by expansion) and multiplying by P shrinks the component parallel to u (because it zeroes out some entries). Thus, we should be able to show that the norm ||MP|| is strictly less than 1. Actually, to get the best bound, we note that uP(MP)^{t−1} = uP(PMP)^{t−1}, because P² = P, so we instead bound ||PMP||. Specifically:

Claim 4.21. ||PMP|| ≤ μ + λ(1 − μ).
Proof of Claim: Using the Matrix Decomposition Lemma (Lemma 4.19), we have:

    ||PMP|| = ||P(γJ + λE)P|| ≤ γ·||PJP|| + λ·||PEP|| ≤ γ·||PJP|| + λ.

Thus, we only need to analyze the case of J, the random walk on the complete graph. Given any vector x, let y = xP, so that

    xPJP = yJP = (Σ_i y_i)·uP.

Since ||y|| ≤ ||x|| and y has at most μN nonzero coordinates, we have |Σ_i y_i| ≤ √(μN)·||y||, and since ||uP|| = √(μ/N), we get

    ||xPJP|| ≤ √(μN)·||y||·√(μ/N) ≤ μ·||x||.

Thus,

    ||PMP|| ≤ γμ + λ = μ + λ(1 − μ).
Using Claims 4.20 and 4.21, the probability of never leaving B in a (t−1)-step random walk is

    |uP(MP)^{t−1}|₁ ≤ √(μN)·||uP(MP)^{t−1}||
                   ≤ √(μN)·||uP||·||PMP||^{t−1}
                   ≤ √(μN)·√(μ/N)·(μ + λ(1−μ))^{t−1}
                   ≤ (μ + λ(1−μ))^t.

(The first inequality uses the fact that the vector uP(MP)^{t−1} has support of size at most μN.)
The hitting properties described above suffice for reducing the error of RP algorithms. What about BPP algorithms, which have two-sided error? They are handled by the following.

Theorem 4.22 (Chernoff Bound for Expander Walks). Let G be a regular digraph with N vertices and spectral expansion 1 − λ, and let f : [N] → [0,1] be any function. Consider a random walk V₁,...,V_t in G from a uniform start vertex V₁. Then for any ε > 0,

    Pr[ |(1/t)·Σ_i f(V_i) − μ(f)| ≥ λ + ε ] ≤ 2e^{−Ω(ε²t)}.

Note that this is just like the standard Chernoff Bound (Theorem 2.21), except that our additive approximation error increases by λ = 1 − γ. Thus, unlike the Hitting Property we proved above, this bound is only useful when λ is sufficiently small (as opposed to bounded away from 1). This can be achieved by taking a power of the initial expander, where edges correspond to walks of a given length in the original expander; this raises the random-walk matrix M, and hence λ, to that power. However, there is a better Chernoff Bound for Expander Walks, where λ does not appear in the approximation error, but the exponent in the probability of error is Ω(γε²t) instead of Ω(ε²t). The bound above suffices in the common case that a small constant approximation error can be tolerated, as in error reduction for BPP.

Proof. Let Xi be the random variable f (Vi ), and X = i Xi . Just like
in the standard proof of the Cherno Bound (Problem 2.7), we show

that the expectation of the moment generating function erX = i erXi
is not much larger than er E[X] and apply Markovs Inequality, for a
suitable choice of r. However, here the factors erXi are not independent, so the expectation does not commute with the product. Instead,
we express E[erX ] linear-algebraically as follows. Dene a diagonal

4.2 Random Walks on Expanders

99

matrix P whose (i, i)th entry is erf (i) . Then, similarly Claim 4.20 in
the proof of the hitting proof above, we observe that





E[erX ] = uP (M P )t1 1 = u(M P )t 1 N u M P t = M P t .
To see this, we simply note that each cross-term in the matrix product
uP (M P )t1 corresponds to exactly one expander walk v1 , . . . , vt , with

a coecient equal to the probability of this walk times i ef (vi ) . By
the Matrix Decomposition Lemma (Lemma 4.19), we can bound
M P  (1 ) JP  + EP .
Since J simply projects onto the uniform direction, we have

    ||JP||² = ||uP||²/||u||²
            = (Σ_v (e^{r·f(v)}/N)²) / (Σ_v (1/N)²)
            = (1/N)·Σ_v e^{2r·f(v)}
            ≤ (1/N)·Σ_v (1 + 2r·f(v) + O(r²))
            = 1 + 2rμ + O(r²)

for r ≤ 1, and thus

    ||JP|| = √(1 + 2rμ + O(r²)) = 1 + rμ + O(r²).

For the error term, we have

    ||EP|| ≤ ||P|| ≤ e^r = 1 + r + O(r²).

Thus,

    ||MP|| ≤ (1 − λ)·(1 + rμ + O(r²)) + λ·(1 + r + O(r²)) ≤ 1 + (μ + λ)r + O(r²),

and we have

    E[e^{rX}] ≤ (1 + (μ + λ)r + O(r²))^t ≤ e^{(μ+λ)rt + O(r²t)}.

By Markov's Inequality,

    Pr[X ≥ (μ + λ + ε)·t] ≤ e^{−εrt + O(r²t)} = e^{−Ω(ε²t)}

if we set r = ε/c for a large enough constant c. By applying the same analysis to the function 1 − f, we see that Pr[X ≤ (μ − λ − ε)·t] ≤ e^{−Ω(ε²t)}, and this establishes the theorem.
We now summarize the properties that expander walks give us for randomness-efficient error reduction and sampling.

For reducing the error of a BPP algorithm from 1/3 to 2^{−k}, we can apply Theorem 4.22 with λ = ε = 1/12, so that a walk of length t = O(k) suffices. If the original BPP algorithm used m random bits and the expander is of constant degree (which is possible with λ = 1/12), then the number of random bits needed is only m + O(k). Comparing with previous methods for error reduction, we have:

                                    Number of      Number of
                                    Repetitions    Random Bits

Independent Repetitions             O(k)           O(km)
Pairwise Independent Repetitions    O(2^k)         O(k + m)
Expander Walks                      O(k)           m + O(k)
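To make the bit accounting concrete, here is a small Python sketch of this error-reduction procedure. The decision procedure A and the neighbor function nbr are hypothetical stand-ins, not from the text: A plays the role of a BPP algorithm taking an input and an m-bit string of coin tosses, and nbr plays the role of a fully explicit D-regular expander on 2^m vertices (as constructed later in Section 4.3).

```python
import random

def reduce_error(A, x, nbr, m, D, t, rng=random.Random()):
    """Error reduction for two-sided error via an expander walk:
    run A on the t coin-toss strings visited by the walk and take a
    majority vote.  Total randomness: m bits for the start vertex
    plus about log2(D) bits per step, i.e., m + O(t) for constant D
    (versus t*m bits for independent repetitions)."""
    v = rng.getrandbits(m)                 # uniform start vertex V_1
    votes = 0
    for step in range(t):
        votes += 1 if A(x, v) else 0       # use vertex v as A's coin tosses
        if step < t - 1:
            v = nbr(v, rng.randrange(D))   # one random step on the expander
    return 2 * votes > t                   # majority of the t runs
```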

For Sampling, where we are given an oracle to a function f : {0,1}^m → [0, 1] and we want to approximate μ(f) to within an additive error of ε, we can apply Theorem 4.22 with error ε/2 and λ = ε/2. The needed expander can be obtained by taking an O(log(1/ε))th power of a constant-degree expander, yielding the following bounds:
                               Number of               Number of
                               Samples                 Random Bits

Truly Random Samples           O((1/ε²)·log(1/δ))      O(m·(1/ε²)·log(1/δ))
Pairwise Independent Samples   O((1/ε²)·(1/δ))         O(m + log(1/ε) + log(1/δ))
Expander Walks                 O((1/ε²)·log(1/δ))      m + O(log(1/δ)·log(1/ε)/ε²)


The log(1/ε) factor in the number of random bits comes because we took an O(log(1/ε))th power of a constant-degree expander and thus spend O(log(1/ε)) random bits for each step on the expander. This is actually not necessary, and comes from the slightly weaker Chernoff Bound we proved. In any case, note that expander walks have a much better dependence on the error probability δ in the number of samples (as compared to pairwise independence), but have a worse dependence on the approximation error ε in the number of random bits. Problem 4.5 shows how to combine these two samplers to achieve the best of both.
Similarly to pairwise independence, the sampling algorithm based on expander walks is actually an averaging sampler in the sense of Definition 3.29:

Theorem 4.23 (Expander-Walk Sampler). For every m ∈ ℕ and δ, ε ∈ [0, 1], there is a (δ, ε) averaging sampler Samp : {0,1}ⁿ → ({0,1}^m)^t using n = m + O(log(1/δ) · log(1/ε)/ε²) random bits and t = O((1/ε²) · log(1/δ)) samples.
The sampling algorithm of Problem 4.5 that combines expander
walks and pairwise independence, however, is not an averaging sampler,
and it is an open problem to achieve similar parameters with an explicit
averaging sampler:
Open Problem 4.24. Give an explicit construction of a (δ, ε) averaging sampler Samp : {0,1}ⁿ → ({0,1}^m)^t that uses n = O(m + log(1/δ) + log(1/ε)) random bits and t = O((1/ε²) · log(1/δ)) samples.
Before we end this section, we make an important remark: we have not actually given an efficient algorithm for randomness-efficient error reduction (or an explicit averaging sampler)! Our algorithm assumes an expander graph of exponential size, namely 2^m where m is the number of random bits used by the algorithm. Generating such a graph at random would use far too many coins. Even generating it deterministically would not suffice, since we would have to write down an exponential-size object. In the following section, we will see how to define an explicit expander graph without writing it down in its entirety, and efficiently do random walks in such a graph.

4.3 Explicit Constructions

As discussed in previous sections, expander graphs have numerous applications in theoretical computer science. (See also the Chapter Notes and Exercises.) For some of these applications, it may be acceptable to simply choose the graph at random, as we know that a random graph will be a good expander with high probability. For many applications, however, this simple approach does not suffice. Some reasons are the following (in increasing order of significance):

• We may not want to tolerate the error probability introduced by the (unlikely) event that the graph is not an expander. To deal with this, we could try checking that the graph is an expander. Computing the expansion of a given graph is NP-hard for most of the combinatorial measures (e.g., vertex expansion or edge expansion), but the spectral expansion can be computed to high precision in time polynomial in the size of the graph (as it is just an eigenvalue computation). As we saw, spectral expansion does yield estimates on vertex expansion and edge expansion (but cannot give optimal expansion in these measures).

• Some of the applications of expanders (like the one from the previous section) are for reducing the amount of randomness needed for certain tasks. Thus choosing the graph at random defeats the purpose.

• A number of the applications require exponentially large expander graphs, and thus we cannot even write down a randomly chosen expander. For example, for randomness-efficient error reduction of randomized algorithms, we need an expander on 2^m nodes where m is the number of random bits used by the algorithm.

From a more philosophical perspective, finding explicit constructions is a way of developing and measuring our understanding of these important combinatorial objects.


A couple of alternatives for defining explicit constructions of expanders on N nodes are:

Mildly Explicit: Construct a complete representation of the graph in time poly(N).

Fully Explicit: Given a node u ∈ [N] and i ∈ [D], where D is the degree of the expander, compute the ith neighbor of u in time poly(log N).
Consider the randomness-efficient error reduction application discussed in the previous section, in which we performed a random walk on an expander graph with exponentially many nodes. Mild explicitness is insufficient for this application, as the desired expander graph is of exponential size, and hence cannot even be entirely stored, let alone constructed. But full explicitness is perfectly suited for efficiently conducting a random walk on a huge graph. So now our goal is the following:

Goal: Devise a fully explicit construction of an infinite family {Gᵢ} of D-regular graphs with spectral expansion at least γ, where D and γ > 0 are constants independent of i.

We remark that we would also like the set {Nᵢ}, where Nᵢ is the number of vertices in Gᵢ, to be not too sparse, so that the family of graphs {Gᵢ} has graphs of size close to any desired size.
4.3.1 Algebraic Constructions

Here we mention a few known explicit constructions that are of interest because of their simple description, the parameters achieved, and/or the mathematics that goes into their analysis. We will not prove the expansion properties of any of these constructions (but will rather give a different explicit construction in the subsequent sections).
Construction 4.25 (discrete torus expanders). Let G = (V, E) be the graph with vertex set V = Z_M × Z_M, and edges from each node (x, y) to the nodes (x, y), (x + 1, y), (x, y + 1), (x, x + y), (−y, x), where all arithmetic is modulo M.


This is a fully explicit 5-regular digraph with N = M² nodes and spectral expansion γ = Ω(1). It can be made undirected by adding a reverse copy of each edge. We refer to these as discrete torus expanders because Z²_M can be viewed as a discrete version of the real torus, namely [0, 1]² with arithmetic modulo 1. The expansion of these graphs was originally proved using group representation theory, but later proofs for similar discrete-torus expanders were found that only rely on Fourier analysis over the torus.
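Since the neighbor rule is a few arithmetic operations mod M, full explicitness is immediate. A minimal sketch, assuming the edge set as reconstructed above (including the sign in (−y, x)):

```python
def torus_neighbor(M, v, i):
    """Return the i-th neighbor (i in {0,...,4}) of v = (x, y) in the
    discrete torus expander on Z_M x Z_M.  Runs in time poly(log M),
    i.e., polynomial in the description length of a vertex, even
    though the graph has N = M^2 nodes."""
    x, y = v
    a, b = [(x, y), (x + 1, y), (x, y + 1), (x, x + y), (-y, x)][i]
    return (a % M, b % M)
```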
Construction 4.26 (p-cycle with inverse chords). This is the graph G = (V, E) with vertex set V = Z_p and edges that connect each node x with the nodes x + 1, x − 1, and x^{−1} (where all arithmetic is mod p and we define 0^{−1} to be 0).

This graph is only mildly explicit since we do not know how to construct n-bit primes deterministically in time poly(n) (though Cramér's conjecture in Number Theory would imply that we can do so by simply checking the first poly(n) n-bit numbers). The proof of expansion relies on the "Selberg 3/16 Theorem" from number theory.
Construction 4.27 (Ramanujan graphs). G = (V, E) is a graph with vertex set V = F_q ∪ {∞}, the finite field of prime order q s.t. q ≡ 1 mod 4, plus one extra node representing infinity. The edges in this graph connect each node z with all z′ of the form

z′ = ((a₀ + ia₁)z + (a₂ + ia₃)) / ((−a₂ + ia₃)z + (a₀ − ia₁))

for a₀, a₁, a₂, a₃ ∈ ℤ such that a₀² + a₁² + a₂² + a₃² = p, a₀ is odd and positive, and a₁, a₂, a₃ are even, for some fixed prime p ≠ q such that p ≡ 1 mod 4, q is a square modulo p, and i ∈ F_q such that i² = −1 mod q.
The degree of the graph is the number of solutions to the equation a₀² + a₁² + a₂² + a₃² = p, which turns out to be D = p + 1, and it has λ(G) ≤ 2√(D − 1)/D, so it is an optimal spectral expander. (See Theorems 4.11 and 4.12, and note that this bound is even better than we know for random graphs, which have an additive o(1) term in the spectral expansion.) These graphs are also only mildly explicit, again due to the need to find the prime q.

These are called Ramanujan Graphs because the proof of their spectral expansion relies on results in number theory concerning the "Ramanujan Conjectures." Subsequently, the term "Ramanujan graphs" came to refer to any infinite family of graphs with spectral expansion at least 1 − 2√(D − 1)/D.
4.3.2 Graph Operations

The explicit construction of expanders given in the next section will be an iterative one, where we start with a constant-size expander H and repeatedly apply graph operations to get bigger expanders. The operations that we apply should increase the number of nodes in the graph, while keeping the degree and the second eigenvalue bounded. We'll see three operations, each improving one property while paying a price on the others; however, combined together, they yield the desired expander. It turns out that this approach for constructing expanders will also be useful in derandomizing the logspace algorithm for Undirected S-T Connectivity, as we will see in Section 4.4.

The following concise notation will be useful to keep track of each of the parameters:

Definition 4.28. An (N, D, γ)-graph is a D-regular digraph on N vertices with spectral expansion γ.

4.3.2.1 Squaring

Definition 4.29 (Squaring of Graphs). If G = (V, E) is a D-regular digraph, then G² = (V, E′) is a D²-regular digraph on the same vertex set, where the (i, j)th neighbor of a vertex x is the jth neighbor of the ith neighbor of x. In particular, a random step on G² consists of two random steps on G.


Lemma 4.30. If G is an (N, D, 1 − λ)-graph, then G² is an (N, D², 1 − λ²)-graph.

Namely, the degree deteriorates by squaring, while the spectral expansion is improved from γ = 1 − λ to γ′ = 1 − λ² = 2γ − γ².

Proof. The effect of squaring on the number of nodes N and the degree D is immediate from the definition. For the spectral expansion, note that if M is the random-walk matrix for G, then M² is the random-walk matrix for G². So for any vector x ⊥ u,

‖xM²‖ ≤ λ · ‖xM‖ ≤ λ² · ‖x‖.
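In terms of neighbor functions, squaring is just composition, so full explicitness is preserved. A minimal sketch, with nbr a hypothetical neighbor function for G:

```python
def square_neighbor(nbr, v, ij):
    """Return the (i, j)-th neighbor of v in G^2: the j-th neighbor of
    the i-th neighbor of v.  One step in G^2 is exactly two steps in G,
    which is why squaring squares the second eigenvalue."""
    i, j = ij
    return nbr(nbr(v, i), j)
```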
4.3.2.2 Tensoring

The next operation we consider increases the size of the graph at the
price of increasing the degree.
Definition 4.31 (Tensor Product of Graphs). Let G₁ = (V₁, E₁) be D₁-regular and G₂ = (V₂, E₂) be D₂-regular. Then their tensor product is the D₁D₂-regular graph G₁ ⊗ G₂ = (V₁ × V₂, E), where the (i₁, i₂)th neighbor of a vertex (x₁, x₂) is (y₁, y₂), where y_b is the i_b th neighbor of x_b in G_b. That is, a random step on G₁ ⊗ G₂ consists of a random step on G₁ in the first component and a random step on G₂ in the second component.
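Tensoring likewise acts coordinatewise and so preserves full explicitness. A minimal sketch, with nbr1 and nbr2 hypothetical neighbor functions for G₁ and G₂:

```python
def tensor_neighbor(nbr1, nbr2, v, edge):
    """Return the (i1, i2)-th neighbor of v = (x1, x2) in G1 ⊗ G2:
    take the i1-th neighbor in G1 on the first coordinate and the
    i2-th neighbor in G2 on the second, independently."""
    (x1, x2), (i1, i2) = v, edge
    return (nbr1(x1, i1), nbr2(x2, i2))
```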
Often this operation is simply called the product of G₁ and G₂, but we use "tensor product" to avoid confusion with squaring and to reflect its connection with the standard tensor products in linear algebra:

Definition 4.32 (Tensor Products of Vectors and Matrices). Let x ∈ ℝ^{N₁}, y ∈ ℝ^{N₂}; then their tensor product is the vector z = x ⊗ y ∈ ℝ^{N₁N₂} where z_{ij} = x_i · y_j.

Similarly, for matrices A = (a_{ij}) ∈ ℝ^{N₁×N₁}, B = (b_{ij}) ∈ ℝ^{N₂×N₂}, their tensor product is the matrix C = A ⊗ B ∈ ℝ^{N₁N₂×N₁N₂} where C = (c_{ij,i′j′}) for c_{ij,i′j′} = a_{ii′} · b_{jj′}.
A few comments on the tensor operation:

• A random walk on a tensor graph G₁ ⊗ G₂ is equivalent to taking two independent random walks on G₁ and G₂.

• For vectors x ∈ ℝ^{N₁}, y ∈ ℝ^{N₂} that are probability distributions (i.e., nonnegative vectors with ℓ₁ norm 1), their tensor product x ⊗ y is a probability distribution on [N₁] × [N₂] where the two components are independently distributed according to x and y, respectively.

• (x ⊗ y)(A ⊗ B) = (xA) ⊗ (yB) for every x ∈ ℝ^{N₁}, y ∈ ℝ^{N₂}, and in fact A ⊗ B is the unique matrix with this property.

• Not all vectors z ∈ ℝ^{N₁N₂} are decomposable as x ⊗ y for x ∈ ℝ^{N₁} and y ∈ ℝ^{N₂}. Nevertheless, the set of all decomposable tensors x ⊗ y spans ℝ^{N₁N₂}.

• If M₁, M₂ are the random-walk matrices for graphs G₁, G₂, respectively, then the random-walk matrix for the graph G₁ ⊗ G₂ is

  M₁ ⊗ M₂ = (I_{N₁} ⊗ M₂)(M₁ ⊗ I_{N₂}) = (M₁ ⊗ I_{N₂})(I_{N₁} ⊗ M₂),

  where I_N denotes the N × N identity matrix. That is, we can view a random step on G₁ ⊗ G₂ as being a random step on the G₁ component followed by one on the G₂ component, or vice-versa.
The effect of tensoring on expanders is given by the following:

Lemma 4.33. If G₁ is an (N₁, D₁, γ₁)-graph and G₂ is an (N₂, D₂, γ₂)-graph, then G₁ ⊗ G₂ is an (N₁N₂, D₁D₂, min{γ₁, γ₂})-graph.

In particular, if G₁ = G₂, then the number of nodes improves, the degree deteriorates, and the spectral expansion remains unchanged.


Proof. As usual, we write γ₁ = 1 − λ₁, γ₂ = 1 − λ₂; then our goal is to show that G₁ ⊗ G₂ has spectral expansion 1 − max{λ₁, λ₂}. The intuition is as follows. We can think of the vertices of G₁ ⊗ G₂ as being partitioned into N₁ "clouds," each consisting of N₂ vertices, where cloud v₁ contains all vertices of the form (v₁, ·). Thus, any probability distribution (V₁, V₂) on the vertices (v₁, v₂) of G₁ ⊗ G₂ can be thought of as picking a cloud v₁ according to the marginal distribution² V₁ and then picking the vertex v₂ within the cloud v₁ according to the conditional distribution V₂|_{V₁=v₁}. If the overall distribution on pairs is far from uniform, then either

(1) the marginal distribution V₁ on the clouds must be far from uniform, or
(2) the conditional distributions V₂|_{V₁=v₁} within the clouds must be far from uniform.

When we take a random step, the expansion of G₁ will bring us closer to uniform in Case 1, and the expansion of G₂ will bring us closer to uniform in Case 2.

One way to prove the bound in the case of undirected graphs is to use the fact that the eigenvalues of M₁ ⊗ M₂ are all the products of eigenvalues of M₁ and M₂, so the largest magnitude is 1 · 1, and the next largest is bounded by either λ₁ · 1 or 1 · λ₂. Instead, we use the Vector Decomposition Method to give a proof that matches the intuition more closely and is a good warm-up for the analysis of the zig-zag product in the next section. Given any vector x ∈ ℝ^{N₁N₂} that is orthogonal to u_{N₁N₂}, we can decompose x as x = x∥ + x⊥, where x∥ is a multiple of u_{N₂} on each cloud of size N₂ and x⊥ is orthogonal to u_{N₂} on each cloud. Note that x∥ = y ⊗ u_{N₂}, where y ∈ ℝ^{N₁} is orthogonal to u_{N₁} (because x∥ = x − x⊥ is orthogonal to u_{N₁N₂}). If we think of x as the nonuniform component of a probability distribution, then x∥ and x⊥ correspond to the two cases in the intuition above.

For the first case, we have

x∥M = (y ⊗ u_{N₂})(M₁ ⊗ M₂) = (yM₁) ⊗ u_{N₂}.

²For two jointly distributed random variables (X, Y), the marginal distribution of X is simply the distribution of X alone, ignoring information about Y.


The expansion of G₁ tells us that M₁ shrinks y by a factor of λ₁, and thus ‖x∥M‖ ≤ λ₁‖x∥‖. For the second case, we write

x⊥M = x⊥(I_{N₁} ⊗ M₂)(M₁ ⊗ I_{N₂}).

The expansion of G₂ tells us that M₂ will shrink x⊥ by a factor of λ₂ on each cloud, and thus I_{N₁} ⊗ M₂ will shrink x⊥ by the same factor. The subsequent application of M₁ ⊗ I_{N₂} cannot increase the length (being the random-walk matrix for a regular graph, albeit a disconnected one). Thus, ‖x⊥M‖ ≤ λ₂‖x⊥‖.

Finally, we argue that x∥M and x⊥M are orthogonal. Note that x∥M = (yM₁) ⊗ u_{N₂} is a multiple of u_{N₂} on every cloud. Thus it suffices to argue that x⊥ remains orthogonal to u_{N₂} on every cloud after we apply M. Applying (I_{N₁} ⊗ M₂) retains this property (because applying M₂ preserves orthogonality to u_{N₂}, by regularity of G₂), and applying (M₁ ⊗ I_{N₂}) retains this property because it assigns each cloud a linear combination of several other clouds (and a linear combination of vectors orthogonal to u_{N₂} is also orthogonal to u_{N₂}).

Thus,

‖xM‖² = ‖x∥M‖² + ‖x⊥M‖²
      ≤ λ₁²‖x∥‖² + λ₂²‖x⊥‖²
      ≤ max{λ₁, λ₂}² · (‖x∥‖² + ‖x⊥‖²)
      = max{λ₁, λ₂}² · ‖x‖²,

as desired.
4.3.2.3 The Zig-Zag Product

Of the two operations we have seen, one (squaring) improves expansion and one (tensoring) increases size, but both have the deleterious effect of increasing the degree. Now we will see a third operation that decreases the degree, without losing too much in the expansion. By repeatedly applying these three operations, we will be able to construct arbitrarily large expanders while keeping both the degree and expansion constant.


Let G be an (N₁, D₁, γ₁) expander and H be a (D₁, D₂, γ₂) expander. The zig-zag product of G and H, denoted G Ⓩ H, will be defined as follows. The nodes of G Ⓩ H are the pairs (u, i) where u ∈ V(G) and i ∈ V(H). The edges of G Ⓩ H will be defined so that a random step on G Ⓩ H corresponds to a step on G, but using a random step on H to choose the edge in G. (This is the reason why we require the number of vertices in H to be equal to the degree of G.) A step in G Ⓩ H will therefore involve a step to a random neighbor in H and then a step in G to a neighbor whose index is equal to the label of the current node in H. Intuitively, a random walk on a good expander graph H should generate choices that are sufficiently random to produce a good random walk on G. One problem with this definition is that it is not symmetric. That is, the fact that you can go from (u, i) to (v, j) does not mean that you can go from (v, j) to (u, i). We correct this by adding another step in H after the step in G. In addition to allowing us to construct undirected expander graphs, this extra step will also turn out to be important for the expansion of G Ⓩ H.

More formally,
Definition 4.34 (Zig-Zag Product). Let G be a D₁-regular digraph on N₁ vertices, and H a D₂-regular digraph on D₁ vertices. Then G Ⓩ H is a graph whose vertices are pairs (u, i) ∈ [N₁] × [D₁]. For a, b ∈ [D₂], the (a, b)th neighbor of a vertex (u, i) is the vertex (v, j) computed as follows:

(1) Let i′ be the ath neighbor of i in H.
(2) Let v be the i′th neighbor of u in G, so e = (u, v) is the i′th edge leaving u. Let j′ be such that e is the j′th edge entering v in G. (In an undirected graph, this simply means that u is the j′th neighbor of v.)
(3) Let j be the bth neighbor of j′ in H.
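The definition translates directly into code, provided G is given by a hypothetical edge-rotation map rot_G, where rot_G(u, i′) = (v, j′) means the i′th edge leaving u is the j′th edge entering v (this rotation-map interface is made explicit in Section 4.3.3), and H by a hypothetical neighbor function nbr_H. A minimal sketch:

```python
def zigzag_neighbor(rot_G, nbr_H, vertex, ab):
    """Return the (a, b)-th neighbor of (u, i) in G Ⓩ H, following the
    three steps of Definition 4.34: a short H-step, a G-step selected
    by the current H-vertex, and another short H-step."""
    u, i = vertex
    a, b = ab
    i1 = nbr_H(i, a)        # (1) move within the cloud: a-th neighbor of i in H
    v, j1 = rot_G(u, i1)    # (2) cross to a new cloud along edge i1 of u
    j = nbr_H(j1, b)        # (3) move within the new cloud
    return (v, j)
```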

Note that the graph G Ⓩ H depends on how the edges leaving and entering each vertex of G are numbered. Thus it is best thought of as an operation on labelled graphs. (This is made more explicit in Section 4.3.3 via the notion of an edge-rotation map.) Nevertheless, the bound we will prove on expansion holds regardless of the labelling:
Theorem 4.35. If G is an (N₁, D₁, γ₁)-graph and H is a (D₁, D₂, γ₂)-graph, then G Ⓩ H is an (N₁D₁, D₂², γ = γ₁ · γ₂²)-graph. In particular, if γ₁ = 1 − λ₁ and γ₂ = 1 − λ₂, then γ = 1 − λ for λ ≤ λ₁ + 2λ₂.

G should be thought of as a big graph and H as a small graph, where D₁ is a large constant and D₂ is a small constant. Note that the number of nodes D₁ in H is required to equal the degree of G. Observe that when D₁ > D₂², the degree is reduced by the zig-zag product.
There are two different intuitions underlying the expansion of the zig-zag product:

• Given an initial distribution (U, I) on the vertices of G₁ Ⓩ G₂ that is far from uniform, there are two extreme cases, just as in the intuition for the tensor product.³ Either (1) all the (conditional) distributions I|_{U=u} within the clouds are far from uniform, or (2) all the (conditional) distributions I|_{U=u} within the clouds of size D₁ are uniform (in which case the marginal distribution U on the clouds must be far from uniform). In Case 1, the first H-step (U, I) → (U, I′) already brings us closer to the uniform distribution, and the other two steps cannot hurt (as they are steps on regular graphs). In Case 2, the first H-step has no effect, but the G-step (U, I′) → (V, J′) has the effect of making the marginal distribution on clouds closer to uniform; that is, V is closer to uniform than U. But note that the joint distribution (V, J′) isn't actually any closer to the uniform distribution on the vertices of G₁ Ⓩ G₂, because the G-step is a permutation. Still, if the marginal distribution V on clouds is closer to uniform, then the conditional distributions J′|_{V=v} within the clouds must have become further from uniform, and thus the second H-step (V, J′) → (V, J) brings us closer to uniform. This leads to a proof by Vector Decomposition, where we decompose any vector x that is orthogonal to uniform into components x∥ and x⊥, where x∥ is uniform on each cloud, and x⊥ is orthogonal to uniform on each cloud. This approach gives the best known bounds on the spectral expansion of the zig-zag product, but it can be a bit messy, since the two components generally do not remain orthogonal after the steps of the zig-zag product (unlike the case of the tensor product, where we were able to show that x∥M is orthogonal to x⊥M).

• The second intuition is to think of the expander H as behaving similarly to the complete graph on D₁ vertices (with self-loops). In the case that H equals the complete graph, it is easy to see that G Ⓩ H = G ⊗ H. Thus it is natural to apply Matrix Decomposition, writing the random-walk matrix for an arbitrary expander H as a convex combination of the random-walk matrix for the complete graph and an error matrix. This gives a very clean analysis, but slightly worse bounds than the Vector Decomposition Method.

³Here we follow our convention of using capital letters to denote random variables corresponding to the lower-case values in Definition 4.34.
We now proceed with the formal proof, following the Matrix Decomposition approach.
Proof of Theorem 4.35. Let A, B, and M be the random-walk matrices for G₁, G₂, and G₁ Ⓩ G₂, respectively. We decompose M into the product of three matrices, corresponding to the three steps in the definition of G₁ Ⓩ G₂'s edges. Let B̃ be the transition matrix for taking a random G₂-step on the second component of [N₁] × [D₁], that is, B̃ = I_{N₁} ⊗ B, where I_{N₁} is the N₁ × N₁ identity matrix. Let Â be the permutation matrix corresponding to the G₁-step. That is, Â_{(u,i),(v,j)} is 1 iff (u, v) is the ith edge leaving u and the jth edge entering v. By the definition of G₁ Ⓩ G₂, we have M = B̃ÂB̃.

By the Matrix Decomposition Lemma (Lemma 4.19), B = γ₂J + (1 − γ₂)E, where every entry of J equals 1/D₁ and E has norm at most 1. Then B̃ = γ₂J̃ + (1 − γ₂)Ẽ, where J̃ = I_{N₁} ⊗ J and Ẽ = I_{N₁} ⊗ E has norm at most 1. This gives

M = (γ₂J̃ + (1 − γ₂)Ẽ) Â (γ₂J̃ + (1 − γ₂)Ẽ) = γ₂² J̃ÂJ̃ + (1 − γ₂²)F,

where we take (1 − γ₂²)F to be the sum of the three terms involving Ẽ; noting that their norms sum to at most (1 − γ₂²), we see that F has norm at most 1. Now, the key observation is that J̃ÂJ̃ = A ⊗ J. Thus,

M = γ₂² (A ⊗ J) + (1 − γ₂²)F,

and thus

λ(M) ≤ γ₂² · λ(A ⊗ J) + (1 − γ₂²)
     ≤ γ₂² · (1 − γ₁) + (1 − γ₂²)
     = 1 − γ₁γ₂²,

as desired.
4.3.3 The Expander Construction

As a first attempt at constructing a family of expanders, we construct an infinite family G₁, G₂, … of graphs utilizing only the squaring and the zig-zag operations:

Construction 4.36 (Mildly Explicit Expanders). Let H be a (D⁴, D, 7/8)-graph (e.g., as constructed in Problem 4.8), and define:

G₁ = H²
G_{t+1} = G_t² Ⓩ H

Proposition 4.37. For all t, G_t is a (D^{4t}, D², 1/2)-graph.

Proof. By induction on t.

Base Case: by the definition of H and Lemma 4.30, G₁ = H² is a (D⁴, D², 1 − λ₀²)-graph, where λ₀ = λ(H) = 1/8, so λ₀² ≤ 1/2.

Induction Step: First note that G_t² Ⓩ H is well-defined because deg(G_t²) = deg(G_t)² = (D²)² = #nodes(H). Then:

deg(G_{t+1}) = deg(H)² = D²
#nodes(G_{t+1}) = #nodes(G_t²) · #nodes(H) = N_t · D⁴ = D^{4t} · D⁴ = D^{4(t+1)}
λ(G_{t+1}) ≤ λ(G_t)² + 2λ(H) ≤ (1/2)² + 2 · (1/8) = 1/2
Now, we recursively bound the time to compute neighbors in G_t. Actually, due to the way the G-step in the zig-zag product is defined, we bound the time to compute the edge-rotation map (u, i) → (v, j), where the ith edge leaving u equals the jth edge entering v. Denote by time(G_t) the time required for one evaluation of the edge-rotation map for G_t. This requires two evaluations of the edge-rotation map for G_{t−1} (the squaring requires two applications, while the zig-zag part does not increase the number of applications), plus time poly(log N_t) for manipulating strings of length O(log N_t). Therefore,

time(G_t) = 2 · time(G_{t−1}) + poly(log N_t) = 2^t · poly(log N_t) = N_t^{Θ(1)},

where the last equality holds because N_t = D^{4t} for a constant D. Thus, this construction is only mildly explicit.
We remedy the above difficulty by using tensoring to make the sizes of the graphs grow more quickly:

Construction 4.38 (Fully Explicit Expanders). Let H be a (D⁸, D, 7/8)-graph, and define:

G₁ = H²
G_{t+1} = (G_t ⊗ G_t)² Ⓩ H

In this family of graphs, the number of nodes grows doubly exponentially (N_t ≈ c^{2^t}), while the computation time grows only exponentially as before. Namely,

time(G_t) = 4^t · poly(log N_t) = poly(log N_t).
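The following sketch tracks only the parameters (number of nodes, degree, λ) through the recursion of Construction 4.38, using Lemma 4.30, Lemma 4.33, and the weaker zig-zag bound λ(G Ⓩ H) ≤ λ(G) + 2λ(H) from Theorem 4.35; it is bookkeeping only, not the construction itself.

```python
def construction_438_params(D, t):
    """Parameters (num_nodes, degree, lambda) of G_t in Construction
    4.38, where H is a (D^8, D, 7/8)-graph, so lambda(H) = 1/8:
        G_1 = H^2,   G_{t+1} = (G_t ⊗ G_t)^2 Ⓩ H."""
    lam_H = 1 / 8
    N, deg, lam = D ** 8, D ** 2, lam_H ** 2      # G_1 = H^2
    for _ in range(1, t):
        N, deg = N * N, deg ** 2                  # tensor: lambda unchanged
        deg, lam = deg ** 2, lam ** 2             # square: deg becomes D^8 = #nodes(H)
        N, deg, lam = N * deg, D ** 2, lam + 2 * lam_H   # zig-zag with H
    return N, deg, lam                            # lam stays <= 1/2 for all t
```

One can check the invariant directly: if λ(G_t) ≤ 1/2, then λ(G_t)² + 2 · (1/8) ≤ 1/4 + 1/4 = 1/2, matching the analysis of Proposition 4.37.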


We remark that the above family is rather sparse, so the numbers in {N_t} are far apart. To overcome this shortcoming, we can amend the above definition to have

G_t = (G_{⌈t/2⌉} ⊗ G_{⌊t/2⌋})² Ⓩ H.

Now N_t = D^{8t}, so given a number N, we can find a graph G_t in the family whose size is at most D⁸ · N = O(N). Moreover, the construction remains fully explicit because time(G_t) = O(time(G_{⌈t/2⌉}) + time(G_{⌊t/2⌋})) = poly(t). Thus we have established:
Theorem 4.39. There is a constant D ∈ ℕ such that for every t ∈ ℕ, there is a fully explicit expander graph G_t with degree D, spectral expansion 1/2, and N_t = D^{4t} nodes.

Consequently, the randomness-efficient error reduction and averaging sampler based on expander walks can be made explicit:

Corollary 4.40. If a language L has a BPP algorithm with error probability at most 1/3 that uses m(n) random bits on inputs of length n, then for every polynomial k(n), L has a BPP algorithm with error probability at most 2^{−k(n)} that uses m(n) + O(k(n)) random bits.

Corollary 4.41. There is an explicit averaging sampler achieving the parameters of Theorem 4.23.
4.3.4 Open Problems

As we have seen, spectral expanders such as those in Theorem 4.39 are also vertex expanders (Theorem 4.6 and Corollary 4.10) and edge expanders (Theorem 4.14), but these equivalences do not extend to optimizing the various expansion measures.
As mentioned in Section 4.3.1, there are known explicit constructions of optimal spectral expanders, namely Ramanujan graphs. However, unlike the expanders of Theorem 4.39, those constructions rely on deep results in number theory. The lack of a more elementary construction seems to signify a limitation in our understanding of expander graphs.
Open Problem 4.42. Give an explicit "combinatorial" construction of constant-degree expander graphs G with λ(G) ≤ 2√(D − 1)/D (or even λ(G) = O(1/√D)), where D is the degree.
For vertex expansion, it is known how to construct bipartite (or directed) expanders with constant left-degree (or out-degree) D and expansion (1 − ε)D for an arbitrarily small constant ε (see Section 6), but achieving the optimal expansion of D − O(1) (cf. Theorem 4.4) or constructing undirected vertex expanders with high expansion remains open.
Open Problem 4.43. For an arbitrarily large constant D, give an explicit construction of bipartite (Ω(N), D − c) vertex expanders with N vertices on each side and left-degree D, where c is a universal constant independent of D.

Open Problem 4.44. For an arbitrarily small constant ε > 0, give an explicit construction of undirected (Ω(N), (1 − ε)D) vertex expanders with N vertices and constant degree D that depends only on ε.

We remark that while Open Problem 4.43 refers to balanced bipartite graphs (i.e., ones with the same number of vertices on each side), the imbalanced case is also interesting and important. (See Problems 4.10, 5.5 and Open Problems 5.36, 6.35.)

4.4 Undirected S-T Connectivity in Deterministic Logspace

Recall the Undirected S-T Connectivity problem: Given an undirected graph G and two vertices s, t, decide whether there is a path from s to t. In Section 2.4, we saw that this problem can be solved in randomized logspace (RL). Here we will see how we can use expanders and the operations above to solve this problem in deterministic logspace (L).


The algorithm is based on the following two ideas:

• Undirected S-T Connectivity can be solved in logspace on constant-degree expander graphs. More precisely, it is easy on constant-degree graphs where every connected component is promised to be an expander (i.e., has second eigenvalue bounded away from 1): we can try all paths of length O(log N) from s in logarithmic space; this works because expanders have logarithmic diameter. (See Problem 4.2.)

• The same operations we used to construct an infinite expander family above can also be used to turn any graph into an expander (in logarithmic space). Above, we started with a constant-sized expander and used various operations to build larger and larger expanders. There, the goal was to increase the size of the graph (which was accomplished by tensoring and/or zig-zag), while preserving the degree and the expansion (which was accomplished by zig-zag and squaring, which made up for losses in these parameters). Here, we want to improve the expansion (which will be accomplished by squaring), while preserving the degree (as will be handled by zig-zag) and ensuring the graph remains of polynomial size (so we will not use tensoring).
Specifically, the algorithm is as follows.

Algorithm 4.45 (Undirected S-T Connectivity in L).

Input: An undirected graph G with N edges and vertices s and t.

(1) Let H be a fixed (D⁴, D, 3/4)-graph for some constant D.
(2) Reduce (G, s, t) to (G₀, s₀, t₀), where G₀ is a D²-regular graph in which every connected component is nonbipartite, and s₀ and t₀ are connected in G₀ iff s and t are connected in G.
(3) For k = 1, …, ℓ = O(log N), define:
    (a) Let G_k = G_{k−1}² Ⓩ H.
    (b) Let s_k and t_k be any two vertices in the clouds of G_k corresponding to s_{k−1} and t_{k−1}, respectively. (Note that if s_k and t_k are connected in G_k, then s_{k−1} and t_{k−1} are connected in G_{k−1}.)
(4) Try all paths of length O(log N_ℓ) in G_ℓ from s_ℓ and accept if any of them visit t_ℓ.
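The overall structure can be summarized in the following sketch. This is not the logspace implementation (it treats each graph as a black-box rotation-map oracle), and the helpers preprocess, square_rot, and zigzag_rot, the base graph H, and the lifting of vertices into clouds are all hypothetical stand-ins for the steps described above. Its point is to show why step 4 is feasible: after ℓ = O(log N) rounds the degree is still the constant D², so only D^{O(log N)} = poly(N) paths need to be enumerated.

```python
import itertools

def ustcon(G, s, t, H, ell, path_len, D):
    """Sketch of Algorithm 4.45 over edge-rotation maps:
    rot(v, i) returns (w, j), where w is the i-th neighbor of v
    and v is the j-th neighbor of w."""
    rot, sk, tk = preprocess(G, s, t)          # step 2: D^2-regular, nonbipartite
    for _ in range(ell):                       # step 3: ell = O(log N) rounds
        rot = zigzag_rot(square_rot(rot), H)   # G_k = G_{k-1}^2 Ⓩ H
        sk, tk = (sk, 0), (tk, 0)              # any vertex in the corresponding cloud
    if sk == tk:
        return True
    for labels in itertools.product(range(D * D), repeat=path_len):
        v = sk                                 # step 4: try all short paths
        for i in labels:
            v, _ = rot(v, i)
            if v == tk:
                return True
    return False
```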

We will discuss how to implement this algorithm in logspace later, and first analyze its correctness. Let C_k be the connected component of G_k containing s_k. Observe that C_k is a connected component of C_{k−1}² Ⓩ H; below we will show that C_{k−1}² Ⓩ H is connected and hence C_k = C_{k−1}² Ⓩ H. Since C₀ is undirected, connected, and nonbipartite, we have γ(C₀) ≥ 1/poly(N) by Theorem 2.53. We will argue that in each iteration the spectral gap increases by a constant factor, and thus after O(log N) iterations we have an expander.

By Lemma 4.30, we have

γ(C_{k−1}²) = 2γ(C_{k−1}) · (1 − γ(C_{k−1})/2) ≈ 2γ(C_{k−1})

for small γ(C_{k−1}). By Theorem 4.35, we have

γ(C_{k−1}² Ⓩ H) ≥ γ(H)² · γ(C_{k−1}²)
              ≥ (3/4)² · 2γ(C_{k−1}) · (1 − γ(C_{k−1})/2)
              ≥ min{(35/32) · γ(C_{k−1}), 1/18},

where the last inequality is obtained by considering whether γ(C_{k−1}) ≤ 1/18 or γ(C_{k−1}) > 1/18. In particular, C_{k−1}² Ⓩ H is connected, so we have C_k = C_{k−1}² Ⓩ H and

γ(C_k) ≥ min{(35/32) · γ(C_{k−1}), 1/18}.

Thus, after ℓ = O(log N) iterations, we must have γ(C_ℓ) ≥ 1/18. Moreover, observe that the number of vertices N_ℓ in G_ℓ is at most N₀ · (D⁴)^ℓ = poly(N), so considering paths of length O(log N_ℓ) will suffice to decide s-t connectivity in G_ℓ.
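As a sanity check on the recurrence, the following sketch iterates γ_k = min((35/32) · γ_{k−1}, 1/18) from γ₀ = 1/N^c and counts rounds; the count grows as O(log N), matching ℓ = O(log N). The exponent c is illustrative, standing in for the poly(N) bound of Theorem 2.53.

```python
def rounds_until_expander(N, c=3):
    """Number of iterations of gamma -> min((35/32)*gamma, 1/18)
    needed to reach spectral gap 1/18, starting from gamma = 1/N^c."""
    gamma, k = 1.0 / N ** c, 0
    while gamma < 1 / 18:
        gamma = min(35 / 32 * gamma, 1 / 18)
        k += 1
    return k   # about c * log(N) / log(35/32) = O(log N)

# e.g., rounds_until_expander(10**6) is a few hundred, growing logarithmically in N.
```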


To show that the algorithm can be implemented in logarithmic space, we argue that the edge-rotation map of each G_k can be computed with only O(1) more space than the edge-rotation map of G_{k−1}, so that G_ℓ requires space O(log N) + O(ℓ) = O(log N). Since the inductive claim here refers to sublogarithmic differences of space (indeed, O(1) space) and sublogarithmic space is model-dependent (even keeping a pointer into the input requires logarithmic space), we will refer to a specific model of computation in establishing it. (The final result, that Undirected S-T Connectivity is in L, is, however, model-independent, because logspace computations in any reasonable computational model can be simulated by logspace computations in any other reasonable model.) Formally, let space(G_k) denote the workspace needed to compute the edge-rotation map of G_k on a multi-tape Turing machine with the following input/output conventions:
Input Description:

• Tape 1 (read-only): Contains the initial input graph G, with the head at the leftmost position of the tape.
• Tape 2 (read-write): Contains the input pair (v, i), where v is a vertex of G_k and i ∈ [D²] is the index of a neighbor, with the head at the rightmost position of i. The rest of the tape may contain additional data.
• Tapes 3+ (read-write): Blank worktapes with the head at the leftmost position.

Output Description:

• Tape 1: The head should be returned to the leftmost position.
• Tape 2: In place of (v, i), it should contain the output (w, j), where w is the ith neighbor of v and v is the jth neighbor of w. The head should be at the rightmost position of j, and the rest of the tape should remain unchanged from its state at the beginning of the computation.
• Tapes 3+ (read-write): Are returned to the blank state with the heads at the leftmost position.
With these conventions, it is not difficult to argue that space(G₀) = O(log N) and space(G_k) = space(G_{k−1}) + O(1). For the latter, we first argue that space(G_{k−1}²) = space(G_{k−1}) + O(1), and then that space(G_{k−1}² Ⓩ H) = space(G_{k−1}²) + O(1). For G_{k−1}², we are given a triple (v, (i₁, i₂)) on tape 2, with the head on the rightmost position of i₂, where both i₁ and i₂ are elements of [D²] (and thus of constant size). We move the head left to the rightmost position of i₁ and compute the edge-rotation map of G_{k−1} on (v, i₁), so that tape 2 now contains (w, (j₁, i₂)). Then we swap j₁ and i₂, run the edge-rotation map of G_{k−1} on (w, i₂) to get (x, (j₂, j₁)), where x is the i₂th neighbor of w, and move the head to the rightmost position of j₁, completing the rotation. For G_{k−1}² Ⓩ H, we are given a tuple ((v, i), (a₁, a₂)), where v is a vertex of G_{k−1}², i is a vertex of H (equivalently, an edge-label for G_{k−1}²), and a₁, a₂ are edge labels for H. Evaluating the rotation map requires two evaluations of the rotation map for H (both of which are constant-size operations) and one evaluation of the rotation map of G_{k−1}².

Thus we have proven:
Theorem 4.46. Undirected S-T Connectivity is in L.
We remark that proving RL = L in general remains open. The best deterministic simulation known for RL is essentially L^{3/2} = DSPACE(log^{3/2} n), which makes beautiful use of known pseudorandom generators for logspace computation. (Unfortunately, we do not have space to cover this line of work in this survey.) Historically, improved derandomizations for Undirected S-T Connectivity have inspired improved derandomizations of RL (and vice-versa). Since Theorem 4.46 is still quite recent (2005), there is a good chance that we have not yet exhausted the ideas in it.

Open Problem 4.47. Show that RL ⊆ L^c for some constant c < 3/2.
Another open problem is the construction of universal traversal sequences — fixed walks of polynomial length that are guaranteed to visit all vertices in any connected undirected regular graph of a given size. (See Example 3.8 and Open Problem 3.9.) Using the ideas from the algorithm above, it is possible to obtain logspace-constructible, polynomial-length universal traversal sequences for all regular graphs that are consistently labelled in the sense that no pair of distinct vertices have the same ith neighbor for any i ∈ [D]. For general labellings, the best known universal traversal sequences are of length N^{O(log N)} (and are constructible in space O(log² N)).
Open Problem 4.48 (Open Problem 3.9, restated). Give an
explicit construction of universal traversal sequences of polynomial
length for arbitrarily labelled undirected graphs (or even for an arbitrary labelling of the complete graph).
We remark that handling general labellings (for pseudorandom
walk generators rather than universal traversal sequences) seems to
be the main obstacle in extending the techniques of Theorem 4.46 to
prove RL = L. (See the Chapter Notes and References.)

4.5 Exercises

Problem 4.1 (Bipartite Versus Nonbipartite Expanders). Show that constructing bipartite expanders is equivalent to constructing (standard, nonbipartite) expanders. That is, show how given an explicit construction of one of the following, you can obtain an explicit construction of the other:

(1) D-regular (αN, A) expanders on N vertices for infinitely many N, where α > 0, A > 1, and D are constants independent of N.
(2) D-regular (on both sides) (αN, A) bipartite expanders with N vertices on each side for infinitely many N, where α > 0, A > 1, and D are constants independent of N.

(Your transformations need not preserve the constants.)


Problem 4.2 (More Combinatorial Consequences of Spectral Expansion). Let G be a graph on N vertices with spectral expansion γ = 1 − λ. Prove that:

(1) The independence number α(G) is at most (λ/(1 + λ)) · N, where α(G) is defined to be the size of the largest independent set, i.e., subset S of vertices s.t. there are no edges with both endpoints in S.
(2) The chromatic number χ(G) is at least (1 + λ)/λ, where χ(G) is defined to be the smallest number of colors for which the vertices of G can be colored s.t. all pairs of adjacent vertices have different colors.
(3) The diameter of G is O(log_{1/λ} N).

Recall that computing α(G) and χ(G) exactly are NP-complete problems. However, the above shows that for expanders, nontrivial bounds on these quantities can be computed in polynomial time.

Problem 4.3 (Limits on Vertex Expansion). This problem and the next one give limits on the vertex and spectral expansion that can be achieved as a function of the degree D. Both bounds are proved by relating the expansion of an arbitrary D-regular graph G to that of the infinite D-regular tree T_D (where every vertex has one parent and D − 1 children), which is in some sense the best possible D-regular expander.

(1) Show that if a D-regular digraph G is a (K, A) expander, then T_D is a (K, A) expander.
(2) Show that for every D ∈ ℕ, there are infinitely many K ∈ ℕ such that T_D is not a (K, D − 1 + 2/K) expander.
(3) Deduce that for constant D ∈ ℕ and α > 0, if a D-regular, N-vertex digraph G is an (αN, A) vertex expander, then A ≤ D − 1 + o(1), where the o(1) term vanishes as N → ∞ (and D, α are held constant).


Problem 4.4 (Limits on Spectral Expansion). Let G be a D-regular undirected graph and T_D be the infinite D-regular tree (as in Problem 4.3). For a graph H and ℓ ∈ ℕ, let p_ℓ(H) denote the probability that if we choose a random vertex v in H and do a random walk of length 2ℓ, we end back at vertex v.

(1) Show that p_ℓ(G) ≥ p_ℓ(T_D) ≥ C_ℓ · (D − 1)^ℓ/D^{2ℓ}, where C_ℓ is the ℓth Catalan number, which equals the number of properly parenthesized strings in {(, )}^{2ℓ} — strings where no prefix has more )'s than ('s.
(2) Show that N · p_ℓ(G) ≤ 1 + (N − 1) · λ(G)^{2ℓ}. (Hint: use the fact that the trace of a matrix equals the sum of its eigenvalues.)
(3) Using the fact that C_ℓ = (2ℓ choose ℓ)/(ℓ + 1), prove that

λ(G) ≥ 2√(D − 1)/D − o(1),

where the o(1) term vanishes as N → ∞ (and D is held constant).

Problem 4.5 (Near-Optimal Sampling).

(1) Describe an algorithm for Sampling that tosses O(m + log(1/δ) + log(1/ε)) coins, makes O((1/ε²) · log(1/δ)) queries to a function f : {0,1}^m → [0, 1], and estimates μ(f) to within ±ε with probability at least 1 − δ. (Hint: use expander walks to generate several sequences of coin tosses for the pairwise-independent averaging sampler, and compute the answer via a "median of averages.")
(2) Give an explicit (δ, ε) hitting sampler (see Problem 3.9) Samp : {0,1}ⁿ → ({0,1}^m)^t that tosses n = O(m + log(1/δ) + log(1/ε)) coins and generates t = O((1/ε) · log(1/δ)) samples.


It turns out that these bounds on the randomness and query/sample complexities are each optimal up to constant factors (for most parameter settings of interest).

Problem 4.6 (Error Reduction For Free*). Show that if a problem has a BPP algorithm with constant error probability, then it has a BPP algorithm with error probability 1/n that uses exactly the same number of random bits.

Problem 4.7 (Vertex Expanders versus Hitting Samplers). Here we will see that hitting samplers (defined in Problem 3.9) are equivalent to a variant of vertex expanders, where we only require that for (left-)sets S of size exactly K, there are at least A · K neighbors. We call such graphs (= K, A) vertex expanders and will revisit them in the next section (Definition 5.32).

Given a bipartite multigraph with neighbor function Γ : [N] × [D] → [M], we can obtain a sampler Samp : [N] → [M]^D by setting Samp(x)_y = Γ(x, y). Conversely, every such sampler gives rise to a bipartite multigraph. Prove that Samp is a (δ, ε) hitting sampler if and only if Γ is an (= K, A) vertex expander for K = ⌊δN⌋ + 1 and A = (1 − ε)M/K.

Thus, bipartite vertex expanders and hitting samplers are equivalent. However, the typical settings of parameters for the two objects are very different. For example, in vertex expanders, a primary goal is usually to maximize the expansion factor A, but A · K may be significantly smaller than M. In samplers, A · K = (1 − ε)M is usually taken to be very close to M, but A = (1 − ε)M/K may even be smaller than 1. Similarly, the most common setting of expanders takes K/N to be a constant, whereas in samplers it is often thought of as vanishingly small.

Problem 4.8 (A Constant-Sized Expander).

(1) Let F be a finite field. Consider a graph G with vertex set F² and edge set {((a, b), (c, d)) : ac = b + d}. That is, we connect vertex (a, b) to all points on the line y = ax − b. Prove that G is |F|-regular and λ(G) ≤ 1/√|F|. (Hint: consider G².)
(2) Show that if |F| is sufficiently large (but still constant), then by applying appropriate operations to G, we can obtain a base graph for the expander construction given in Section 4.3.3, i.e., a (D⁸, D, 7/8) graph for some constant D.

Problem 4.9 (The Replacement Product). Given a D₁-regular graph G₁ on N₁ vertices and a D₂-regular graph G₂ on D₁ vertices, consider the following graph G₁ Ⓡ G₂ on vertex set [N₁] × [D₁]: vertex (u, i) is connected to (v, j) iff (a) u = v and (i, j) is an edge in G₂, or (b) v is the ith neighbor of u in G₁ and u is the jth neighbor of v. That is, we replace each vertex v in G₁ with a copy of G₂, associating each edge incident to v with one vertex of G₂.

(1) Prove that there is a function g such that if G₁ has spectral expansion γ₁ > 0 and G₂ has spectral expansion γ₂ > 0 (and both graphs are undirected), then G₁ Ⓡ G₂ has spectral expansion g(γ₁, γ₂, D₂) > 0. (Hint: Note that (G₁ Ⓡ G₂)³ has G₁ Ⓩ G₂ as a subgraph.)
(2) Show how to convert an explicit construction of constant-degree (spectral) expanders into an explicit construction of degree-3 (spectral) expanders.
(3) Without using Theorem 4.14, prove an analogue of Part 1 for edge expansion. That is, there is a function h such that if G₁ is an (N₁/2, ε₁) edge expander and G₂ is a (D₁/2, ε₂) edge expander, then G₁ Ⓡ G₂ is an (N₁D₁/2, h(ε₁, ε₂, D₂)) edge expander, where h(ε₁, ε₂, D₂) > 0 if ε₁, ε₂ > 0. (Hint: given any set S of vertices of G₁ Ⓡ G₂, partition S into the clouds that are more than half-full and those that are not.)
(4) Prove that the functions g(γ₁, γ₂, D₂) and h(ε₁, ε₂, D₂) must depend on D₂, by showing that G₁ Ⓡ G₂ cannot be an (N₁D₁/2, ε) edge expander if ε > 1/(D₂ + 1) and N₁ ≥ 2.


Problem 4.10 (Unbalanced Vertex Expanders and Data Structures). Consider a (K, (1 − ε)D) bipartite vertex expander G with N left vertices, M right vertices, and left degree D.

(1) For a set S of left vertices, a y ∈ N(S) is called a unique neighbor of S if y is incident to exactly one edge from S. Prove that every left-set S of size at most K has at least (1 − 2ε)D|S| unique neighbors.
(2) For a set S of size at most K/2, prove that at most |S|/2 vertices outside S have at least δD neighbors in N(S), for δ = O(ε).

Now we'll see a beautiful application of such expanders to data structures. Suppose we want to store a small subset S of a large universe [N] such that we can test membership in S by probing just 1 bit of our data structure. A trivial way to achieve this is to store the characteristic vector of S, but this requires N bits of storage. The hashing-based data structures mentioned in Section 3.5.3 only require storing O(|S|) words, each of O(log N) bits, but testing membership requires reading an entire word (rather than just one bit).

Our data structure will consist of M bits, which we think of as a {0,1}-assignment to the right vertices of our expander. This assignment will have the following property.

Property (∗): For all left vertices x, all but a δ = O(ε) fraction of the neighbors of x are assigned the value χ_S(x) (where χ_S(x) = 1 iff x ∈ S).

(3) Show that if we store an assignment satisfying Property (∗), then we can probabilistically test membership in S with error probability δ by reading just one bit of the data structure.
(4) Show that an assignment satisfying Property (∗) exists provided |S| ≤ K/2. (Hint: first assign 1 to all of S's neighbors and 0 to all its nonneighbors, then try to correct the errors.)

It turns out that the needed expanders exist with M = O(K log N) (for any constant ε), so the size of this data structure matches the hashing-based scheme while admitting (randomized) 1-bit probes. However, note that such bipartite vertex expanders do not follow from explicit spectral expanders as given in Theorem 4.39, because the latter do not provide vertex expansion beyond D/2 nor do they yield highly imbalanced expanders (with M ≪ N) as needed here. But in Section 5, we will see how to explicitly construct expanders that are quite good for this application (specifically, with M = K^{1.01} · polylog N).

4.6 Chapter Notes and References

A detailed coverage of expander graphs and their applications in theoretical computer science is given by Hoory, Linial, and Wigderson [207]. Applications in pure mathematics are surveyed by Lubotzky [276]. The first papers on expander graphs appeared in conferences on telephone networks. Specifically, Pinsker [309] proved that random graphs are good expanders, and used these to demonstrate the existence of graphs called "concentrators." Bassalygo [52] improved Pinsker's results, in particular giving the general tradeoff between the degree D, expansion factor A, and set density α mentioned after Theorem 4.4.

The first computer science application of expanders (and superconcentrators) came in an approach by Valiant [403] to proving circuit lower bounds. An early and striking algorithmic application was the O(log n)-depth sorting network by Ajtai, Komlós, and Szemerédi [10], which also illustrated the usefulness of expanders for derandomization. An exciting recent application of expanders is Dinur's new proof of the PCP Theorem [118].

The fact that spectral expansion implies vertex expansion and edge expansion was shown by Tanner [385] (for vertex expansion) and Alon and Milman [23] (for edge expansion). The converses are discrete analogues of Cheeger's Inequality for Riemannian manifolds [94], and various forms of these were proven by Alon [15] (for vertex expansion), Jerrum and Sinclair [219] (for edge expansion in undirected graphs and, more generally, conductance in reversible Markov chains), and Mihail [286] (for edge expansion in regular digraphs and conductance in nonreversible Markov chains).


The "Ramanujan" upper bound on spectral expansion given by Theorem 4.11 was proven by Alon and Boppana (see [15, 297]). Theorem 4.12, stating that random graphs are asymptotically Ramanujan, was conjectured by Alon [15], but was only proven recently by Friedman [143]. Kahale [228] proved that Ramanujan graphs have vertex expansion roughly D/2 for small sets.

Forms of the Expander Mixing Lemma date back to Alon and Chung [18], who considered the number of edges between a set and its complement (i.e., T = V\S). The converse to the Expander Mixing Lemma (Theorem 4.16) is due to Bilu and Linial [68]. For more on quasirandomness, see [25, 104] for the case of dense graphs and [100, 101] for sparse graphs.

The sampling properties of random walks on expanders were analyzed in a series of works starting with Ajtai, Komlós, and Szemerédi [11]. The hitting bound of Theorem 4.17 is due to Kahale [228], and the Chernoff Bound for expander walks (cf. Theorem 4.22) is due to Gillman [153]. Our proof of the Chernoff Bound is inspired by that of Healy [203], who also provides some other variants and generalizations. The RP version of Problem 4.6 is due to Karp, Pippenger, and Sipser [234], who initiated the study of randomness-efficient error reduction of randomized algorithms. It was generalized to BPP in [107]. The equivalence of hitting samplers and bipartite vertex expanders from Problem 4.7 is due to Sipser [365]. Problem 4.5 is due to Bellare, Goldreich, and Goldwasser [55]; matching lower bounds for sampling were given by Canetti, Even, and Goldreich [91]. Open Problem 4.24 was posed by Bellare and Rompel [57].

Construction 4.25 is due to Margulis [283], and was the first explicit construction of constant-degree expanders. Gabber and Galil [146] (see also [221]) gave a much more elementary proof of expansion for similar expanders, which also provided a specific bound on the spectral expansion (unlike Margulis' proof). Construction 4.26 is a variant of a construction of Lubotzky, Phillips, and Sarnak. (See [275, Thm. 4.42], from which the expansion of Construction 4.26 can be deduced.) Ramanujan graphs (Construction 4.27) were constructed independently by Lubotzky, Phillips, and Sarnak [277] and Margulis [284]. For more on Ramanujan graphs and the mathematical machinery that goes into their analysis, see the books [112, 348, 275].
The zig-zag product and the expander constructions of Section 4.3.3 are due to Reingold, Vadhan, and Wigderson [333]. Our analysis of the zig-zag product is from [331], which in turn builds on [338], who used matrix decomposition (Lemma 4.19) for analyzing other graph products. Earlier uses of graph products in constructing expanders include the use of the tensor product in [385]. Problem 4.9, on the replacement product, is from [331, 333], and can be used in place of the zig-zag product in both the expander constructions and the Undirected S-T Connectivity algorithm (Algorithm 4.45). Independently of [333], Martin and Randall [285] proved a decomposition theorem for Markov chains that implies a better bound on the spectral expansion of the replacement product.

There has been substantial progress on giving a combinatorial construction of Ramanujan graphs (Open Problem 4.42). Bilu and Linial [68] give a mildly explicit construction achieving λ(G) = Õ(1/√D), Ben-Aroya and Ta-Shma [58] give a fully explicit construction achieving λ(G) = D^{−1/2+o(1)}, and Batson, Spielman, and Srivastava [53] give a mildly explicit construction of a weighted graph achieving λ(G) = O(1/√D).

Constant-degree bipartite expanders with expansion (1 − ε)D have been constructed by Capalbo et al. [92], based on a variant of the zig-zag product for "randomness condensers." (See Section 6.3.5.) Alon and Capalbo [17] have made progress on Open Problem 4.44 by giving an explicit construction of undirected constant-degree "unique-neighbor" expanders (see Problem 4.10).

The deterministic logspace algorithm for Undirected S-T Connectivity (Algorithm 4.45) is due to Reingold [327]. The result that RL ⊆ L^{3/2} is due to Saks and Zhou [344], with an important ingredient being Nisan's pseudorandom generator for space-bounded computation [299]. Based on Algorithm 4.45, explicit polynomial-length universal traversal sequences for consistently labelled regular digraphs, as well as "pseudorandom walk generators" for such graphs, were constructed in [327, 331]. (See also [338].) In [331], it is shown that pseudorandom walk generators for arbitrarily labelled regular


digraphs would imply RL = L. The best known explicit construction of a full-fledged universal traversal sequence is due to Nisan [299]; it has length n^{O(log n)}, and can be constructed in time n^{O(log n)} and space O(log² n). (See Section 8.2.1 for more on the derandomization of RL.)

Problem 4.8, Part 1 is a variant of a construction of Alon [16]; Part 2 is from [333]. The results of Problem 4.2 are from [23, 103, 205, 268]. The result of Problem 4.10, on bit-probe data structures for set membership, is due to Buhrman, Miltersen, Radhakrishnan, and Venkatesan [87].

5
List-Decodable Codes

The field of coding theory is motivated by the problem of communicating reliably over noisy channels — where the data sent over the channel may come out corrupted on the other end, but we nevertheless want the receiver to be able to correct the errors and recover the original message. There is a vast literature studying aspects of this problem from the perspectives of electrical engineering (communications and information theory), computer science (algorithms and complexity), and mathematics (combinatorics and algebra). In this survey, we are interested in codes as "pseudorandom objects." In particular, a generalization of the notion of an error-correcting code yields a framework that we will use to unify all of the main pseudorandom objects covered in this survey (averaging samplers, expander graphs, randomness extractors, list-decodable codes, pseudorandom generators).

5.1 Definitions and Existence

5.1.1 Definition

The approach to communicating over a noisy channel is to restrict the data we send to be from a certain set of strings that can be easily disambiguated (even after being corrupted).

Definition 5.1. A q-ary code is a multiset¹ C ⊆ Σ^n̂, where Σ is an alphabet of size q. Elements of C are called codewords. We define the following key parameters:

• n̂ is the block length.
• n = log |C| is the message length.
• ρ = n/(n̂ · log |Σ|) is the (relative) rate of the code.

An encoding function for C is a bijective mapping Enc : {1, …, |C|} → C. Given such an encoding function, we view the elements of {1, …, |C|} as messages. When n = log |C| is an integer, we often think of messages as strings in {0, 1}ⁿ.
We view C and Enc as being essentially the same object (with Enc merely providing a "labelling" of codewords), with the former being useful for studying the combinatorics of codes and the latter for algorithmic purposes.

We remark that our notation differs from the standard notation in coding theory in several ways. Typically in coding theory, the input alphabet is taken to be the same as the output alphabet (rather than {0, 1} and Σ, respectively), the block length is denoted n, the message length (over Σ) is denoted k, and the ratio k/n is referred to as the rate. Our notational choices are made in part to facilitate stating connections to other pseudorandom objects covered in this survey.
So far, we haven't talked at all about the error-correcting properties of codes. Here we need to specify two things: the model of errors (as introduced by the noisy channel) and the notion of a successful recovery. For the errors, the main distinction is between random errors and worst-case errors. For random errors, one needs to specify a stochastic model of the channel. The most basic one is the binary symmetric channel (over alphabet Σ = {0, 1}), where each bit is flipped independently with probability δ. People also study more complex channel models,

¹Traditionally, codes are defined to be sets rather than multisets. However, the generality afforded by multisets will allow the connections we see later to be stated more cleanly. In the case C is a multiset, the condition that a mapping Enc : {1, …, N} → C is bijective means that for every c ∈ C, |Enc⁻¹(c)| equals the multiplicity of c in C.


but as usual with stochastic models, there is always the question of


how well the theoretical model correctly captures the real-life distribution of errors. We, however, will focus on worst-case errors, where
we simply assume that fewer than a fraction of symbols have been
changed. That is, when we send a codeword c n over the channel,
the received word r n diers from c in fewer than
n places. Equivalently, c and r are close in Hamming distance:
Denition 5.2 (Hamming distance). For two strings x, y n ,
their (relative) Hamming distance dH (x, y) equals Pri [xi = yi ]. The
agreement is dened to be agr(x, y) = 1 dH (x, y).
For a string x n and [0, 1], the (open) Hamming ball of
radius around x is the set B(x, ) of strings y n such that
dH (x, y) < .
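To make these definitions concrete, here is a minimal Python sketch (the function names are ours, chosen for illustration) of relative Hamming distance and agreement:

    from fractions import Fraction

    def hamming_distance(x, y):
        """Relative Hamming distance d_H(x, y) = Pr_i[x_i != y_i]."""
        assert len(x) == len(y) and len(x) > 0
        return Fraction(sum(a != b for a, b in zip(x, y)), len(x))

    def agreement(x, y):
        """agr(x, y) = 1 - d_H(x, y)."""
        return 1 - hamming_distance(x, y)

    # Example: two 4-symbol strings over the alphabet {0, 1, 2}.
    assert hamming_distance("0120", "0110") == Fraction(1, 4)
    assert agreement("0120", "0110") == Fraction(3, 4)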
For the notion of a successful recovery, the traditional model requires that we can uniquely decode the message from the received word (in the case of random errors, this need only hold with high probability). Our main focus will be on a more relaxed notion which allows us to produce a small list of candidate messages. As we will see, the advantage of such list decoding is that it allows us to correct a larger fraction of errors.

Definition 5.3. Let $\mathrm{Enc} : \{0,1\}^n \to \Sigma^{\hat{n}}$ be an encoding algorithm for a code C. A $\delta$-decoding algorithm for Enc is a function $\mathrm{Dec} : \Sigma^{\hat{n}} \to \{0,1\}^n$ such that for every $m \in \{0,1\}^n$ and $r \in \Sigma^{\hat{n}}$ satisfying $d_H(\mathrm{Enc}(m), r) < \delta$, we have $\mathrm{Dec}(r) = m$. If such a function Dec exists, we call the code $\delta$-decodable.

A $(\delta, L)$-list-decoding algorithm for Enc is a function $\mathrm{Dec} : \Sigma^{\hat{n}} \to (\{0,1\}^n)^L$ such that for every $m \in \{0,1\}^n$ and $r \in \Sigma^{\hat{n}}$ satisfying $d_H(\mathrm{Enc}(m), r) < \delta$, we have $m \in \mathrm{Dec}(r)$. If such a function Dec exists, we call the code $(\delta, L)$-list-decodable.

Note that a $\delta$-decoding algorithm is the same as a $(\delta, 1)$-list-decoding algorithm. It is not hard to see that, if we do not care about computational efficiency, the existence of such decoding algorithms depends only on the combinatorics of the set of codewords.
Definition 5.4. The (relative) minimum distance of a code $C \subseteq \Sigma^{\hat{n}}$ equals $\min_{x \neq y \in C} d_H(x, y)$.²

² If any codeword appears in C with multiplicity greater than 1, then the minimum distance is defined to be zero.

Proposition 5.5. Let $C \subseteq \Sigma^{\hat{n}}$ be a code with any encoding function Enc.

(1) For $\delta \hat{n} \in \mathbb{N}$, Enc is $\delta$-decodable iff its minimum distance is at least $2\delta - 1/\hat{n}$.
(2) Enc is $(\delta, L)$-list-decodable iff for every $r \in \Sigma^{\hat{n}}$, we have $|B(r, \delta) \cap C| \leq L$.

Proof. Item 2 follows readily from the definitions.

For Item 1, first note that Enc is $\delta$-decodable iff there is no received word $r \in \Sigma^{\hat{n}}$ at distance less than $\delta$ from two codewords $c_1, c_2$. (Such an r should decode to both $c_1$ and $c_2$, which is impossible for a unique decoder.) If the code has minimum distance at least $2\delta - 1/\hat{n}$, then $c_1$ and $c_2$ disagree in at least $2\delta\hat{n} - 1$ positions, which implies that r disagrees with one of them in at least $\delta\hat{n}$ positions. (Recall that $\delta\hat{n}$ is an integer by hypothesis.) Conversely, if the code has minimum distance smaller than $2\delta - 1/\hat{n}$, then there are two codewords that disagree in at most $2\delta\hat{n} - 2$ positions, and we can construct r that disagrees with each in at most $\delta\hat{n} - 1$ positions.

Because of the factor of 2 in Item 1, unique decoding is only possible at distances up to 1/2, whereas we will see that list decoding is possible at distances approaching 1 (with small lists).
The main goals in constructing codes are to have infinite families of codes (e.g., for every message length n) in which we:


- Maximize the fraction $\delta$ of errors correctible (e.g., a constant independent of n and $\hat{n}$).
- Maximize the rate $\rho$ (e.g., a constant independent of n and $\hat{n}$).
- Minimize the alphabet size q (e.g., a constant, ideally q = 2).
- Keep the list size L relatively small (e.g., a constant or poly(n)).
- Have computationally efficient encoding and decoding algorithms.

In particular, coding theorists are very interested in obtaining the optimal tradeoff between the constants $\delta$ and $\rho$ with efficiently encodable and decodable codes.
5.1.2 Existential Bounds

Like expanders, the existence of very good codes can be shown using the probabilistic method. The bounds will be stated in terms of the q-ary entropy functions, so we begin by defining those.
Definition 5.6. For $q, \hat{n} \in \mathbb{N}$ and $\delta \in [0,1]$, we define $H_q(\delta, \hat{n}) \in [0,1]$ to be such that $|B(x, \delta)| = q^{H_q(\delta, \hat{n}) \cdot \hat{n}}$ for $x \in \Sigma^{\hat{n}}$, where $\Sigma$ is an alphabet of size q.

We also define the q-ary entropy function $H_q(\delta) = \delta \log_q((q-1)/\delta) + (1-\delta)\log_q(1/(1-\delta))$.

The reason we use similar notation for $H_q(\delta, \hat{n})$ and $H_q(\delta)$ is Part 1 of the following:

Proposition 5.7. For every $q \in \mathbb{N}$, $\delta \in (0, 1 - 1/q)$, $\varepsilon \in [0, 1/2]$:

(1) $\lim_{\hat{n} \to \infty} H_q(\delta, \hat{n}) = H_q(\delta)$.
(2) $H_q(\delta) \leq H_2(\delta)/\log q + \delta$.
(3) $H_2(1/2 - \varepsilon) = 1 - \Theta(\varepsilon^2)$.
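As an illustration, the following Python sketch (with names of our choosing) computes the q-ary entropy function of Definition 5.6 and numerically checks the quadratic behavior in Part 3:

    import math

    def H(q, delta):
        """q-ary entropy: delta*log_q((q-1)/delta)
        + (1-delta)*log_q(1/(1-delta)), for 0 < delta < 1."""
        return (delta * math.log((q - 1) / delta, q)
                + (1 - delta) * math.log(1 / (1 - delta), q))

    # Part 3 of Proposition 5.7: 1 - H_2(1/2 - eps) = Theta(eps^2).
    for eps in [0.1, 0.01, 0.001]:
        print(eps, (1 - H(2, 0.5 - eps)) / eps**2)  # ratio tends to 2/ln 2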


Now we state the bounds for random error-correcting codes.

Theorem 5.8.

(1) For all $\hat{n}, q \in \mathbb{N}$ and $\delta \in (0, 1 - 1/q)$, there exists a q-ary code of block length $\hat{n}$, minimum distance at least $\delta$, and rate at least $\rho = 1 - H_q(\delta, \hat{n})$.
(2) For all integers $\hat{n}, q, L \in \mathbb{N}$ and $\delta \in (0, 1 - 1/q)$, there exists a $(\delta, L)$-list-decodable q-ary code of block length $\hat{n}$ and rate at least $\rho = 1 - H_q(\delta, \hat{n}) - 1/(L+1)$.

Proof.

(1) Pick the codewords $c_1, \ldots, c_N$ in sequence, ensuring that $c_i$ is at distance at least $\delta$ from $c_1, \ldots, c_{i-1}$. The union of the Hamming balls of radius $\delta$ around $c_1, \ldots, c_{i-1}$ contains at most $(i-1) \cdot q^{H_q(\delta,\hat{n})\hat{n}} \leq (N-1) \cdot q^{H_q(\delta,\hat{n})\hat{n}} < q^{\hat{n}}$ strings, so there is always a choice of $c_i$ outside these balls if we take $N = q^{(1 - H_q(\delta,\hat{n}))\hat{n}}$.

(2) We use the probabilistic method. Choose the N codewords randomly and independently from $\Sigma^{\hat{n}}$. The probability that there is a Hamming ball of radius $\delta$ containing at least L + 1 codewords is at most

$\binom{N}{L+1} \cdot q^{\hat{n}} \cdot \left(q^{H_q(\delta,\hat{n})\hat{n}} / q^{\hat{n}}\right)^{L+1} < \left(N / q^{(1 - H_q(\delta,\hat{n}) - 1/(L+1))\hat{n}}\right)^{L+1}$,

which is less than 1 if we take $N = q^{(1 - H_q(\delta,\hat{n}) - 1/(L+1))\hat{n}}$.
Note that while the rate bounds are essentially the same for achieving minimum distance $\delta$ and achieving list-decoding radius $\delta$ (as we take large list size), the bounds are incomparable because minimum distance $\delta$ only corresponds to unique decoding up to radius roughly $\delta/2$. The bound for list decoding is known to be tight up to the dependence on L (Problem 5.1), while the bound on minimum distance is not tight in general. Indeed, there are families of algebraic-geometric codes with constant alphabet size q, constant minimum distance $\delta > 0$, and constant rate $\rho > 0$ where $\rho > 1 - H_q(\delta, \hat{n})$ for sufficiently large $\hat{n}$. (Thus, this is a rare counterexample to the phenomenon "random is best.") Identifying the best tradeoff between rate and minimum distance, even for binary codes, is a long-standing open problem in coding theory.
Open Problem 5.9. For each constant $\delta \in (0,1)$, identify the largest $\rho > 0$ such that for every $\varepsilon > 0$, there exists an infinite family of codes $C_{\hat{n}} \subseteq \{0,1\}^{\hat{n}}$ of rate at least $\rho - \varepsilon$ and minimum distance at least $\delta$.
Let's look at some special cases of the parameters in Theorem 5.8. For binary codes (q = 2), we will be most interested in the case $\delta = 1/2 - \varepsilon$, which corresponds to correcting the maximum possible fraction of errors for binary codes. (No nontrivial decoding is possible for binary codes at distance greater than 1/2, since a completely random received word will be at distance roughly 1/2 from most codewords.) In this case, Proposition 5.7 tells us that the rate approaches $1 - H_2(1/2 - \varepsilon) = \Theta(\varepsilon^2)$, i.e., the blocklength is $\hat{n} = \Theta(n/\varepsilon^2)$ (for list size $L = \Theta(1/\varepsilon^2)$). For large alphabets q, Proposition 5.7 tells us that the rate $\rho$ approaches $1 - \delta$ as q grows. We will be most interested in the case $\delta = 1 - \varepsilon$ for small $\varepsilon$, so we can correct a $\delta = 1 - \varepsilon$ fraction of errors with a rate arbitrarily close to $\varepsilon$. For example, we can achieve rate $\rho = 0.99\varepsilon$, a list size of $L = O(1/\varepsilon)$, and an alphabet of size $\mathrm{poly}(1/\varepsilon)$. More generally, it is possible to achieve rate $\rho = (1 - \gamma)\varepsilon$ with an alphabet size of $q = (1/\varepsilon)^{O(1/\gamma)}$.
While we are primarily interested in list-decodable codes, minimum distance is often easier to bound. The following allows us to translate bounds on minimum distance into bounds on list-decodability.

Proposition 5.10 (Johnson Bound).

(1) If C has minimum distance $1 - \varepsilon$, then it is $(1 - O(\sqrt{\varepsilon}), O(1/\sqrt{\varepsilon}))$-list-decodable.
(2) If a binary code C has minimum distance $1/2 - \varepsilon$, then it is $(1/2 - O(\sqrt{\varepsilon}), O(1/\varepsilon))$-list-decodable.


Proof. We prove Item 1, and leave Item 2 as an exercise in the next section (Problem 6.3). The proof is by inclusion–exclusion. Suppose for contradiction that there are codewords $c_1, \ldots, c_s$ at distance less than $1 - \ell$ from some $r \in \Sigma^{\hat{n}}$, for $\ell = \sqrt{2\varepsilon}$ and $s = \lceil 2/\ell \rceil$. Then:

$1 \geq$ (fraction of positions where r agrees with some $c_i$)
$\geq \sum_i \mathrm{agr}(r, c_i) - \sum_{1 \leq i < j \leq s} \mathrm{agr}(c_i, c_j)$
$> s\ell - \binom{s}{2}\varepsilon$
$\geq 2 - 1 = 1$,

where the last inequality is by our setting of parameters. Hence, contradiction.
Note the quadratic loss in the distance parameter. This means that
optimal codes with respect to minimum distance are not necessarily
optimal with respect to list decoding. Nevertheless, if we do not care
about the exact tradeo between the rate and the decoding radius, the
above can yield codes where the decoding radius is as large as possible
(approaching 1 for large alphabets and 1/2 for binary alphabets).
5.1.3 Explicit Codes

As usual, most applications of error-correcting codes (in particular the original motivating one) require computationally efficient encoding and decoding. For now, we focus only on the efficiency of encoding.

Definition 5.11. A code $\mathrm{Enc} : \{0,1\}^n \to \Sigma^{\hat{n}}$ is (fully) explicit if given a message $m \in \{0,1\}^n$ and an index $i \in [\hat{n}]$, the ith symbol of Enc(m) can be computed in time $\mathrm{poly}(n, \log \hat{n}, \log |\Sigma|)$.
The reason we talk about computing individual symbols of the codeword rather than the entire codeword is to have a meaningful definition even for codes where the blocklength $\hat{n}$ is superpolynomial in the message length n. One can also consider weaker notions of explicitness, where we simply require that the entire codeword Enc(m) can be computed in time $\mathrm{poly}(\hat{n}, \log |\Sigma|)$.

The constructions of codes that we describe will involve arithmetic over finite fields. See Remark 3.25 for the complexity of constructing finite fields and carrying out arithmetic in such fields.

In describing the explicit constructions below, it will be convenient to think of the codewords as functions $c : [\hat{n}] \to \Sigma$ rather than as strings in $\Sigma^{\hat{n}}$.
Construction 5.12 (Hadamard Code). For $n \in \mathbb{N}$, the (binary) Hadamard code of message length n is the binary code of blocklength $\hat{n} = 2^n$ consisting of all functions $c : \mathbb{Z}_2^n \to \mathbb{Z}_2$ that are linear (modulo 2).
Proposition 5.13. The Hadamard code:

(1) is explicit with respect to the encoding function that takes a message $m \in \mathbb{Z}_2^n$ to the linear function $c_m$ defined by $c_m(x) = \sum_i m_i x_i \bmod 2$,
(2) has minimum distance 1/2, and
(3) is $(1/2 - \varepsilon, O(1/\varepsilon^2))$-list-decodable for every $\varepsilon > 0$.

Proof. Explicitness is clear by inspection. The minimum distance follows from the fact that for every two distinct linear functions $c, c' : \mathbb{Z}_2^n \to \mathbb{Z}_2$, $\Pr_x[c(x) \neq c'(x)] = \Pr_x[(c - c')(x) \neq 0] = 1/2$. The list-decodability follows from the Johnson Bound.
The advantages of the Hadamard code are its small alphabet (binary) and optimal distance (1/2), but unfortunately its rate is exponentially small ($\rho = n/2^n$). By increasing both the field size and degree, we can obtain complementary properties.
Construction 5.14 (Reed–Solomon Codes). For a prime power q and $d \in \mathbb{N}$, the q-ary Reed–Solomon code of degree d is the code of blocklength $\hat{n} = q$ and message length $n = (d+1)\log q$ consisting of all polynomials $p : \mathbb{F}_q \to \mathbb{F}_q$ of degree at most d.

Proposition 5.15. The q-ary Reed–Solomon code of degree d:

(1) is explicit with respect to the encoding function that takes a vector of coefficients $m \in \mathbb{F}_q^{d+1}$ to the polynomial $p_m$ defined by $p_m(x) = \sum_{i=0}^{d} m_i x^i$ (assuming we have a description of the field $\mathbb{F}_q$),
(2) has minimum distance $\delta = 1 - d/q$, and
(3) is $(1 - O(\sqrt{d/q}), O(\sqrt{q/d}))$-list-decodable.

Proof. Again explicitness follows by inspection. The minimum distance follows from the fact that two distinct polynomials of degree at most d agree in at most d points (else their difference would have more than d roots). The list-decodability follows from the Johnson Bound.
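As an illustration of the encoding function in Part 1, here is a minimal Python sketch of Reed–Solomon encoding; for simplicity it assumes q is prime, so that arithmetic mod q implements $\mathbb{F}_q$ (the construction itself works over any finite field):

    def rs_encode(coeffs, p):
        """Reed-Solomon codeword over the prime field F_p: evaluate the
        polynomial with the given coefficients at every point of F_p."""
        def eval_poly(x):
            y = 0
            for c in reversed(coeffs):  # Horner's rule
                y = (y * x + c) % p
            return y
        return [eval_poly(x) for x in range(p)]

    # Degree-1 message (d + 1 = 2 coefficients) over F_5: f(Y) = 2 + 3Y.
    assert rs_encode([2, 3], 5) == [2, 0, 3, 1, 4]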
Note that by setting $q = O(d)$, Reed–Solomon codes simultaneously achieve constant rate and constant distance, the only disadvantage being that the alphabet is of nonconstant size (namely $q = \hat{n} > n$). Another useful setting of parameters for Reed–Solomon codes in complexity theory is $q = \mathrm{poly}(d)$, which gives polynomial rate ($\hat{n} = \mathrm{poly}(n)$) and distance tending to 1 polynomially fast ($\delta = 1 - 1/\mathrm{poly}(n)$).

The following codes interpolate between Hadamard and Reed–Solomon codes, by allowing the number of variables, the degree, and field size all to vary.
Construction 5.16 (Reed–Muller Codes). For a prime power q and $d, m \in \mathbb{N}$, the q-ary Reed–Muller code of degree d and dimension m is the code of blocklength $\hat{n} = q^m$ and message length $n = \binom{m+d}{d}\log q$ consisting of all polynomials $p : \mathbb{F}_q^m \to \mathbb{F}_q$ of (total) degree at most d.

Proposition 5.17. The q-ary Reed–Muller code of degree d and dimension m:

(1) is explicit with respect to the encoding function that takes a vector of coefficients $v \in \mathbb{F}_q^{\binom{m+d}{d}}$ to the corresponding polynomial $p_v$,
(2) has minimum distance at least $1 - d/q$, and
(3) is $(1 - O(\sqrt{d/q}), O(\sqrt{q/d}))$-list-decodable.

Proof. Same as for Reed–Solomon codes, except we use the Schwartz–Zippel Lemma (Lemma 2.4) to deduce the minimum distance.

Note that Reed–Solomon codes are simply Reed–Muller codes of dimension m = 1, and Hadamard codes are essentially Reed–Muller codes of degree d = 1 and alphabet size q = 2 (except that the Reed–Muller code also contains affine linear functions).

5.2 List-Decoding Algorithms

In this section, we will describe efficient list-decoding algorithms for the Reed–Solomon code and variants. It will be convenient to work with the following notation:

Definition 5.18. Let C be a code with encoding function $\mathrm{Enc} : \{1, \ldots, N\} \to \Sigma^{\hat{n}}$. For $r \in \Sigma^{\hat{n}}$, define $\mathrm{LIST}(r, \varepsilon) = \{m : \mathrm{agr}(\mathrm{Enc}(m), r) > \varepsilon\}$.

Then the task of $(1 - \varepsilon)$ list decoding (according to Definition 5.3) is equivalent to producing the elements of $\mathrm{LIST}(r, \varepsilon)$ given $r \in \Sigma^{\hat{n}}$. In this section, we will see algorithms that do this in time polynomial in the bit-length of r, i.e., time $\mathrm{poly}(\hat{n}, \log |\Sigma|)$.
5.2.1 Review of Algebra

The list-decoding algorithms will require some additional algebraic facts and notation:

- For every field $\mathbb{F}$, $\mathbb{F}[X_1, \ldots, X_n]$ is the integral domain consisting of formal polynomials $Q(X_1, \ldots, X_n)$ with coefficients in $\mathbb{F}$, where addition and multiplication of polynomials is defined in the usual way.
- A nonzero polynomial $Q(X_1, \ldots, X_n)$ is irreducible if we cannot write $Q = RS$, where R, S are nonconstant polynomials. For a finite field $\mathbb{F}_q$ of characteristic p and $d \in \mathbb{N}$, a univariate irreducible polynomial of degree d over $\mathbb{F}_q$ can be found deterministically in time $\mathrm{poly}(p, \log q, d)$.
- $\mathbb{F}[X_1, \ldots, X_n]$ is a unique factorization domain. That is, every nonzero polynomial Q can be factored as $Q = Q_1 Q_2 \cdots Q_m$, where each $Q_i$ is irreducible and this factorization is unique up to reordering and multiplication by constants from $\mathbb{F}$. Given the description of a finite field $\mathbb{F}_{p^k}$ and the polynomial Q, this factorization can be done probabilistically in time $\mathrm{poly}(\log p, k, |Q|)$ and deterministically in time $\mathrm{poly}(p, k, |Q|)$.
- For a nonzero polynomial $Q(Y, Z) \in \mathbb{F}[Y, Z]$ and $f(Y) \in \mathbb{F}[Y]$, if $Q(Y, f(Y)) = 0$, then $Z - f(Y)$ is one of the irreducible factors of $Q(Y, Z)$ (and thus $f(Y)$ can be found in polynomial time). This is analogous to the fact that if $c \in \mathbb{Z}$ is a root of an integer polynomial $Q(Z)$, then $Z - c$ is one of the factors of Q (and can be proven in the same way, by long division).
5.2.2 List-Decoding Reed–Solomon Codes

Theorem 5.19. There is a polynomial-time $(1 - \varepsilon)$ list-decoding algorithm for the q-ary Reed–Solomon code of degree d, for $\varepsilon = 2\sqrt{d/q}$. That is, given a function $r : \mathbb{F}_q \to \mathbb{F}_q$ and $d \in \mathbb{N}$, all polynomials of degree at most d that agree with r on more than $\varepsilon q = 2\sqrt{dq}$ inputs can be found in polynomial time.

In fact the constant of 2 can be improved to 1, matching the combinatorial list-decoding radius for Reed–Solomon codes given by an optimized form of the Johnson Bound. See Problem 5.6.
Proof. We are given a received word $r : \mathbb{F}_q \to \mathbb{F}_q$, and want to find all elements of $\mathrm{LIST}(r, \varepsilon)$ for $\varepsilon = 2\sqrt{d/q}$.

Step 1: Find a low-degree Q "explaining" r. Specifically, Q(Y, Z) will be a nonzero bivariate polynomial of degree at most $d_Y$ in its first variable Y and $d_Z$ in its second variable, with the property that $Q(y, r(y)) = 0$ for all $y \in \mathbb{F}_q$. Each such y imposes a linear constraint on the $(d_Y + 1)(d_Z + 1)$ coefficients of Q. Thus, this system has a nonzero solution provided $(d_Y + 1)(d_Z + 1) > q$, and it can be found in polynomial time by linear algebra (over $\mathbb{F}_q$).

Step 2: Argue that each $f(Y) \in \mathrm{LIST}(r, \varepsilon)$ is a "root" of Q. Specifically, it will be the case that $Q(Y, f(Y)) = 0$ for each $f \in \mathrm{LIST}(r, \varepsilon)$. The reason is that $Q(Y, f(Y))$ is a univariate polynomial of degree at most $d_Y + d \cdot d_Z$, and has more than $\varepsilon q$ zeroes (one for each place that f and r agree). Thus, we can conclude $Q(Y, f(Y)) = 0$ provided $\varepsilon q \geq d_Y + d \cdot d_Z$. Then we can enumerate all of the elements of $\mathrm{LIST}(r, \varepsilon)$ by factoring Q(Y, Z) and taking all the factors of the form $Z - f(Y)$.

For this algorithm to work, the two conditions we need to satisfy are

$(d_Y + 1)(d_Z + 1) > q$ and $\varepsilon q \geq d_Y + d \cdot d_Z$.

These conditions can be satisfied by setting $d_Y = \lfloor \varepsilon q/2 \rfloor$, $d_Z = \lfloor \varepsilon q/(2d) \rfloor$, and $\varepsilon = 2\sqrt{d/q}$.
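The following Python sketch illustrates Steps 1 and 2 on a toy example over a prime field (so that arithmetic mod p implements $\mathbb{F}_p$); all names are ours. It performs the interpolation by Gaussian elimination and verifies that $Q(Y, f(Y))$ vanishes, but omits the final bivariate factoring step, for which one would use a computer algebra system:

    def kernel_vector(A, p):
        """Return a nonzero x with A x = 0 (mod p), assuming more unknowns
        than equations; plain Gaussian elimination over F_p (p prime)."""
        rows, cols = len(A), len(A[0])
        A = [[v % p for v in row] for row in A]
        pivot_cols, r = [], 0
        for c in range(cols):
            piv = next((i for i in range(r, rows) if A[i][c]), None)
            if piv is None:
                continue
            A[r], A[piv] = A[piv], A[r]
            inv = pow(A[r][c], p - 2, p)  # inverse mod p
            A[r] = [v * inv % p for v in A[r]]
            for i in range(rows):
                if i != r and A[i][c]:
                    f = A[i][c]
                    A[i] = [(A[i][j] - f * A[r][j]) % p for j in range(cols)]
            pivot_cols.append(c)
            r += 1
            if r == rows:
                break
        free = next(c for c in range(cols) if c not in pivot_cols)
        x = [0] * cols
        x[free] = 1
        for i, c in enumerate(pivot_cols):
            x[c] = -A[i][free] % p
        return x

    # Toy instance: q = 13, degree d = 1, message f(Y) = 3 + 2Y.
    p, d = 13, 1
    dY, dZ = 6, 1                            # (dY+1)*(dZ+1) = 14 > q = 13
    r = [(3 + 2 * y) % p for y in range(p)]  # the codeword of f ...
    for y in (0, 1, 2):                      # ... with 3 of 13 symbols corrupted
        r[y] = (r[y] + 1) % p

    # Step 1: a nonzero Q(Y, Z) with Q(y, r(y)) = 0 for all y, found by
    # linear algebra on the (dY+1)(dZ+1) coefficients of Q.
    A = [[pow(y, a, p) * pow(r[y], b, p) % p
          for a in range(dY + 1) for b in range(dZ + 1)]
         for y in range(p)]
    q_coeffs = kernel_vector(A, p)

    def Q(y, z):
        monomials = [pow(y, a, p) * pow(z, b, p) % p
                     for a in range(dY + 1) for b in range(dZ + 1)]
        return sum(c * mv for c, mv in zip(q_coeffs, monomials)) % p

    # Step 2: Q(Y, f(Y)) has degree <= dY + d*dZ = 7 but vanishes on the
    # 10 > 7 points where r agrees with f, hence is identically zero; so
    # Z - f(Y) appears among the factors of Q(Y, Z).
    assert all(Q(y, r[y]) == 0 for y in range(p))
    assert all(Q(y, (3 + 2 * y) % p) == 0 for y in range(p))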
Note that the rate of Reed–Solomon codes is $\rho = (d+1)/q = \Theta(\varepsilon^2)$. The alphabet size is $q = \hat{n} = \tilde{\Theta}(n/\varepsilon^2)$. In contrast, the random codes of Theorem 5.8 achieve $\rho = \Omega(\varepsilon)$ and $q = \mathrm{poly}(1/\varepsilon)$. It is not known whether the $\varepsilon = \sqrt{d/q}$ bound on the list-decodability of Reed–Solomon codes can be improved, even with inefficient decoding.

Open Problem 5.20. Do there exist constants $\varepsilon, \rho \in (0,1)$ with $\varepsilon < \sqrt{\rho}$ and an infinite sequence of prime powers q such that the q-ary Reed–Solomon code of degree $d = \rho q$ is $(1 - \varepsilon, \mathrm{poly}(q))$-list-decodable?


5.2.3 Parvaresh–Vardy Codes

Our aim is to improve the rate-distance tradeoff to $\rho = \Omega(\varepsilon)$. Intuitively, the power of the Reed–Solomon list-decoding algorithm comes from the fact that we can interpolate the q points (y, r(y)) of the received word using a bivariate polynomial Q of degree roughly $\sqrt{q}$ in each variable (think of d = O(1) for now). If we could use m variables instead of 2, then the degrees would only have to be around $q^{1/m}$.

First attempt: Replace Step 1 with finding an (m+1)-variate polynomial $Q(Y, Z_1, \ldots, Z_m)$ of degree $d_Y$ in Y and $d_Z$ in each $Z_i$ such that $Q(y, r(y), r(y), \ldots, r(y)) = 0$ for every $y \in \mathbb{F}_q$. Then, we will be able to choose a nonzero polynomial Q of degree roughly $q^{1/m}$ such that $Q(Y, f(Y), \ldots, f(Y)) = 0$ for every $f \in \mathrm{LIST}(r, \varepsilon)$, and hence it follows that $Z - f(Y)$ divides the bivariate polynomial $Q^*(Y, Z) = Q(Y, Z, \ldots, Z)$. Unfortunately, $Q^*$ might be the zero polynomial even if Q is nonzero.

Second attempt: Replace Step 1 with finding an (m+1)-variate polynomial $Q(Y, Z_1, \ldots, Z_m)$ of degree $d_Y$ in Y and $d_Z = h - 1$ in each $Z_i$ such that $Q(y, r(y), r(y)^h, r(y)^{h^2}, \ldots, r(y)^{h^{m-1}}) = 0$ for every $y \in \mathbb{F}_q$. Then, it follows that $Q^*(Y, Z) = Q(Y, Z, Z^h, \ldots, Z^{h^{m-1}})$ is nonzero if Q is nonzero, because every monomial in Q with individual degrees at most h − 1 in $Z_1, \ldots, Z_m$ gets mapped to a different power of Z. However, here the difficulty is that the degree of $Q^*(Y, f(Y))$ is too high (roughly $d^* = d_Y + d \cdot h^m$) for us to satisfy the constraint $\varepsilon q \geq d^*$.

Parvaresh–Vardy codes get the best of both worlds by providing more information with each symbol: not just the evaluation of f at each point, but the evaluations of m − 1 other polynomials $f_1, \ldots, f_{m-1}$, each of which is still of degree d (which is good for arguing that $Q(Y, f(Y), f_1(Y), \ldots, f_{m-1}(Y)) = 0$), but which can be viewed as raising f to successive powers of h for the purposes of ensuring that $Q^*$ is nonzero.
To introduce this idea, we need some additional algebra.

- For univariate polynomials f(Y) and E(Y), we define $f(Y) \bmod E(Y)$ to be the remainder when f is divided by E. If E(Y) is of degree k, then $f(Y) \bmod E(Y)$ is of degree at most k − 1.
- The ring $\mathbb{F}[Y]/E(Y)$ consists of all polynomials of degree at most k − 1 with arithmetic modulo E(Y) (analogous to $\mathbb{Z}_n$ consisting of integers smaller than n with arithmetic modulo n). If E is irreducible, then $\mathbb{F}[Y]/E(Y)$ is a field (analogous to $\mathbb{Z}_p$ being a field when p is prime). Indeed, this is how the finite field of size $p^k$ is constructed: take $\mathbb{F} = \mathbb{Z}_p$ and E(Y) to be an irreducible polynomial of degree k over $\mathbb{Z}_p$, and then $\mathbb{F}[Y]/E(Y)$ is the (unique) field of size $p^k$.
- A multivariate polynomial $Q(Y, Z_1, \ldots, Z_m)$ can be reduced modulo E(Y) by writing it as a polynomial in variables $Z_1, \ldots, Z_m$ with coefficients in $\mathbb{F}[Y]$ and then reducing each coefficient modulo E(Y). After reducing Q modulo E, we think of Q as a polynomial in variables $Z_1, \ldots, Z_m$ with coefficients in the field $\mathbb{F}[Y]/E(Y)$.
Construction 5.21 (Parvaresh–Vardy Codes). For a prime power q, integers $m, d, h \in \mathbb{N}$, and an irreducible polynomial E(Y) of degree larger than d, the q-ary Parvaresh–Vardy code of degree d, power h, redundancy m, and irreducible E is defined as follows:

- The alphabet is $\Sigma = \mathbb{F}_q^m$.
- The blocklength is $\hat{n} = q$.
- The message space is $\mathbb{F}_q^{d+1}$, where we view each message as representing a polynomial f(Y) of degree at most d over $\mathbb{F}_q$.
- For $y \in \mathbb{F}_q$, the yth symbol of Enc(f) is

$[f_0(y), f_1(y), \ldots, f_{m-1}(y)]$,

where $f_i(Y) = f(Y)^{h^i} \bmod E(Y)$.
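As a concrete illustration of this encoding (a sketch with names of our choosing, not the original authors' implementation), here is Python code computing one symbol of a Parvaresh–Vardy codeword over a prime field, using polynomial arithmetic modulo E:

    def poly_mul(a, b, p):
        """Multiply coefficient lists (index = degree) over F_p."""
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] = (out[i + j] + ai * bj) % p
        return out

    def poly_mod(a, e, p):
        """Remainder of a(Y) modulo the monic polynomial e(Y), over F_p."""
        a = [v % p for v in a]
        for i in range(len(a) - 1, len(e) - 2, -1):
            c = a[i]
            if c:
                for j in range(len(e)):
                    a[i - len(e) + 1 + j] = (a[i - len(e) + 1 + j] - c * e[j]) % p
        return a[:len(e) - 1]

    def poly_pow_mod(f, h, e, p):
        """f(Y)^h mod e(Y) over F_p, by square-and-multiply."""
        result, base = [1], poly_mod(f, e, p)
        while h:
            if h & 1:
                result = poly_mod(poly_mul(result, base, p), e, p)
            base = poly_mod(poly_mul(base, base, p), e, p)
            h >>= 1
        return result

    def pv_symbol(f, y, m, h, e, p):
        """The y-th symbol [f_0(y), ..., f_{m-1}(y)] of the Parvaresh-Vardy
        encoding of f, where f_i = f^(h^i) mod e (so f_{i+1} = f_i^h mod e)."""
        def eval_at(g, y):
            v = 0
            for c in reversed(g):
                v = (v * y + c) % p
            return v
        symbol, fi = [], list(f)
        for _ in range(m):
            symbol.append(eval_at(fi, y))
            fi = poly_pow_mod(fi, h, e, p)
        return symbol

    # f(Y) = 3 + 2Y over F_5, with E(Y) = Y^2 + Y + 1 (irreducible over F_5):
    # f_1 = f^2 mod E = 3Y, so the symbol at y = 1 is [f(1), f_1(1)] = [0, 3].
    assert pv_symbol([3, 2], 1, 2, 2, [1, 1, 1], 5) == [0, 3]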

Theorem 5.22. For every prime power q, integer $0 \leq d < q$, and irreducible polynomial E of degree d + 1, the q-ary Parvaresh–Vardy code of degree d, redundancy $m = \log(q/d)$, power h = 2, and irreducible E has rate $\rho = \tilde{\Omega}(d/q)$ and can be list-decoded in polynomial time up to distance $\delta = 1 - \tilde{O}(d/q)$.
Proof. We are given a received word $r : \mathbb{F}_q \to \mathbb{F}_q^m$, and want to find all elements of $\mathrm{LIST}(r, \varepsilon)$, for some $\varepsilon = \tilde{O}(d/q)$.

Step 1: Find a low-degree Q "explaining" r. We find a nonzero polynomial $Q(Y, Z_0, \ldots, Z_{m-1})$ of degree at most $d_Y$ in its first variable Y and at most h − 1 in each of the remaining variables, satisfying $Q(y, r(y)) = 0$ for all $y \in \mathbb{F}_q$. Such a Q exists and can be found by linear algebra in time $\mathrm{poly}(q, m)$ provided that:

$d_Y h^m > q$.   (5.1)

Moreover, we may assume that Q is not divisible by E(Y). If it is, we can divide out all the factors of E(Y), which will not affect the conditions $Q(y, r(y)) = 0$ since E has no roots (being irreducible).
Step 2: Argue that each $f(Y) \in \mathrm{LIST}(r, \varepsilon)$ is a "root" of a related univariate polynomial $Q^*$. First, we argue as before that for $f \in \mathrm{LIST}(r, \varepsilon)$, we have

$Q(Y, f_0(Y), \ldots, f_{m-1}(Y)) = 0$.   (5.2)

Since each $f_i$ has degree at most $\deg(E) - 1 = d$, this will be ensured provided

$\varepsilon q \geq d_Y + (h-1) \cdot d \cdot m$.   (5.3)

Once we have this, we can reduce both sides of Equation (5.2) modulo E(Y) and deduce

$0 = Q(Y, f_0(Y), f_1(Y), \ldots, f_{m-1}(Y)) \bmod E(Y) = Q(Y, f(Y), f(Y)^h, \ldots, f(Y)^{h^{m-1}}) \bmod E(Y)$.

Thus, if we define the univariate polynomial

$Q^*(Z) = Q(Y, Z, Z^h, \ldots, Z^{h^{m-1}}) \bmod E(Y)$,

then f(Y) is a root of $Q^*$ over the field $\mathbb{F}_q[Y]/E(Y)$.

Observe that $Q^*$ is nonzero because Q is not divisible by E(Y) and has degree at most h − 1 in each $Z_i$. Thus, we can find all elements of $\mathrm{LIST}(r, \varepsilon)$ by factoring $Q^*(Z)$. Constructing $Q^*(Z)$ and factoring it can be done in time $\mathrm{poly}(q, d, h^m) = \mathrm{poly}(q)$.

For this algorithm to work, we need to satisfy Conditions (5.1) and (5.3). We can satisfy Condition (5.3) by setting $d_Y = \varepsilon q - dhm$, in which case Condition (5.1) is satisfied for

$\varepsilon = dhm/q + 1/h^m = \tilde{O}(d/q)$,   (5.4)

for h = 2 and $m = \log(q/d)$. Observing that the rate is $\rho = d/(mq) = \tilde{\Omega}(d/q)$, this completes the proof of the theorem.

Note that the obstacles to obtaining a tradeoff $\rho = \Omega(\varepsilon)$ are the factor of m in the expression for the rate $\rho = d/(mq)$ and the factor of $h^m$ in Equation (5.4) for $\varepsilon$. We remedy these in the next section.
5.2.4 Folded Reed–Solomon Codes

In this section, we show how to obtain an optimal rate-distance tradeoff, where the rate $\rho$ is arbitrarily close to the agreement parameter $\varepsilon$.

Consider the Parvaresh–Vardy construction with polynomial $E(Y) = Y^{q-1} - \gamma$, where $\gamma$ is a generator of the multiplicative group $\mathbb{F}_q^*$; it can be shown that E(Y) is irreducible. (That is, $\{\gamma, \gamma^2, \ldots, \gamma^{q-1}\} = \mathbb{F}_q \setminus \{0\}$.) Then it turns out that $f^q(Y) \bmod E(Y) = f(Y^q) \bmod E(Y) = f(\gamma Y)$. So, if we set h = q, the degree of $f_i(Y) = f^{h^i}(Y) \bmod E(Y) = f(\gamma^i Y)$ is d even though E has degree q − 1. For each nonzero element y of $\mathbb{F}_q$, the yth symbol of the PV encoding of f(Y) is then

$[f(y), f(\gamma y), \ldots, f(\gamma^{m-1} y)] = [f(\gamma^j), f(\gamma^{j+1}), \ldots, f(\gamma^{j+m-1})]$,   (5.5)

where we write $y = \gamma^j$.
Thus, the symbols of the PV encoding have a lot of overlap. For example, the $\gamma^j$th symbol and the $\gamma^{j+1}$th symbol share all but one component. Intuitively, this means that we should only have to send a 1/m fraction of the symbols of the codeword, saving us a factor of m in the rate. (The other symbols can be automatically filled in by the receiver.) Thus, the rate becomes $\rho = d/q$, just like in Reed–Solomon codes.

More formally, we use the following codes.
Construction 5.23 (Folded Reed–Solomon Codes). For a prime power q, a generator $\gamma$ of $\mathbb{F}_q^*$, and integers $m, d \in \mathbb{N}$, the q-ary folded Reed–Solomon code of degree d, folding parameter m, and generator $\gamma$ is defined as follows:

- The alphabet is $\Sigma = \mathbb{F}_q^m$.
- The blocklength is $\hat{n} = \lfloor (q-1)/m \rfloor$.
- The message space is $\mathbb{F}_q^{d+1}$, where we view each message as representing a polynomial f(Y) of degree at most d over $\mathbb{F}_q$.
- For $k \in [\hat{n}]$, the kth symbol of Enc(f) is

$[f(\gamma^{(k-1)m}), f(\gamma^{(k-1)m+1}), \ldots, f(\gamma^{km-1})]$.
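A minimal Python sketch of this encoding map, again over a prime field for simplicity (the names are ours):

    def frs_encode(f, m, gamma, p):
        """Folded Reed-Solomon encoding over the prime field F_p: evaluate
        f at the powers 1, gamma, gamma^2, ..., gamma^(p-2) of the
        generator, then bundle the evaluations into blocks of m."""
        def eval_at(y):
            v = 0
            for c in reversed(f):
                v = (v * y + c) % p
            return v
        evals = [eval_at(pow(gamma, j, p)) for j in range(p - 1)]
        return [evals[k * m:(k + 1) * m] for k in range((p - 1) // m)]

    # f(Y) = 1 + Y over F_5, generator gamma = 2 (orbit 1, 2, 4, 3), m = 2:
    assert frs_encode([1, 1], 2, 2, 5) == [[2, 3], [0, 4]]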

We now show that these codes can be efficiently list-decoded at distance arbitrarily close to $1 - \rho$, where $\rho = d/q$ is the rate.

Theorem 5.24. For every prime power q, integers $0 \leq d, m \leq q$, and generator $\gamma$ of $\mathbb{F}_q^*$, the q-ary folded Reed–Solomon code of degree d, folding parameter m, and generator $\gamma$ has rate at least $\rho = d/q$ and can be list-decoded in time $q^{O(m)}$ up to distance $\delta = 1 - d/q - O(\sqrt{1/m})$.

Note that this theorem is most interesting (and improves on the Reed–Solomon decoding of Theorem 5.19) when m is larger than q/d, in contrast to the setting of parameters in Theorem 5.22, where m is logarithmic in q. We can afford the larger setting of m because now the rate does not depend on m, and it is this change in parameters that enables us to save the factor of $h^m$ in Equation (5.4). However, the running time, list size, and alphabet size do grow exponentially with m.

Proof. We are given a received word $r : [\hat{n}] \to \mathbb{F}_q^m$, and want to find all elements of $\mathrm{LIST}(r, \varepsilon)$ for an appropriate choice of $\varepsilon = d/q + O(\sqrt{1/m})$.

Step 0: Unpack to a PV received word. From r, we will obtain a received word $r' : \mathbb{F}_q \to \mathbb{F}_q^{m'}$ for the Parvaresh–Vardy code of degree d, power h = q, redundancy $m' = \lceil \sqrt{m} \rceil - 1$, and irreducible $E(Y) = Y^{q-1} - \gamma$. Specifically, following Equation (5.5), each symbol of r yields $m - m'$ symbols of $r'$. If r agrees with the FRS encoding of f in more than $\varepsilon \hat{n}$ positions, then $r'$ will agree with the PV encoding of f in more than $\varepsilon \hat{n} \cdot (m - m')$ positions. We have

$\varepsilon \hat{n} \cdot (m - m') \geq \varepsilon \cdot ((q-1)/m - 1) \cdot (m - m') \geq (\varepsilon - O(1/\sqrt{m})) \cdot q$.

Thus our task is now to find $\mathrm{LIST}(r', \varepsilon')$ for $\varepsilon' = \varepsilon - O(1/\sqrt{m})$, which we do in a similar manner to the decoding algorithm for Parvaresh–Vardy codes.
Step 1: Find a low-degree Q explaining $r'$. We find a nonzero polynomial $Q(Y, Z_0, \ldots, Z_{m'-1})$ satisfying $Q(y, r'(y)) = 0$ for all $y \in \mathbb{F}_q$, and such that Q has degree at most $d_Y$ in its first variable Y and total degree at most $d_Z = 1$ in the $Z_i$ variables. That is, Q only has monomials of the form $Y^j Z_i$ for $j \leq d_Y$. The number of these monomials is $(d_Y + 1) \cdot m'$, so we can find such a Q in time $\mathrm{poly}(q, m') = \mathrm{poly}(q, m)$ provided:

$(d_Y + 1) \cdot m' > q$.   (5.6)

As before, we may assume that Q is not divisible by E(Y). (Note that using total degree 1 instead of individual degrees h − 1 in the $Z_i$'s requires an exponentially larger setting of $m'$ compared to the setting of m needed for Equation (5.1). However, this will provide a significant savings below. In general, it is best to consider a combination of the two constraints, requiring that the total degree in the $Z_i$'s is at most some $d_Z$ and that the individual degrees are all at most h − 1.)
Step 2: Argue that each $f(Y) \in \mathrm{LIST}(r', \varepsilon')$ is a root of a related univariate polynomial $Q^*$. First, we argue as before that for $f \in \mathrm{LIST}(r', \varepsilon')$, we have

$Q(Y, f_0(Y), \ldots, f_{m'-1}(Y)) = 0$,   (5.7)

where $f_i(Y) = f^{h^i}(Y) \bmod E(Y) = f(\gamma^i Y)$. Since each $f_i$ has degree at most d, this will be ensured provided

$\varepsilon' q \geq d_Y + d$.   (5.8)

Note the savings of the factor $(h-1) \cdot m'$ as compared to Inequality (5.3); this is because we chose Q to be of total degree 1 in the $Z_i$'s instead of having individual degree h − 1.

Now, as in the Parvaresh–Vardy decoding, we can reduce both sides of Equation (5.7) modulo E(Y) and deduce

$0 = Q(Y, f_0(Y), f_1(Y), \ldots, f_{m'-1}(Y)) \bmod E(Y) = Q(Y, f(Y), f(Y)^h, \ldots, f(Y)^{h^{m'-1}}) \bmod E(Y)$.

Thus, if we define the univariate polynomial

$Q^*(Z) = Q(Y, Z, Z^h, \ldots, Z^{h^{m'-1}}) \bmod E(Y)$,

then f(Y) is a root of $Q^*$ over the field $\mathbb{F}_q[Y]/E(Y)$.

Observe that $Q^*$ is nonzero because Q is not divisible by E(Y) and has degree at most h − 1 in each $Z_i$. Thus, we can find all elements of $\mathrm{LIST}(r', \varepsilon')$ by factoring $Q^*(Z)$. Constructing and factoring $Q^*(Z)$ can be done in time $\mathrm{poly}(q, h^{m'}) = q^{O(m)}$.

For this algorithm to work, we need to satisfy Conditions (5.6) and (5.8). We can satisfy Condition (5.6) by setting $d_Y = \lfloor q/m' \rfloor$, in which case Condition (5.8) is satisfied for

$\varepsilon' \geq 1/m' + d/q$.

Recalling that $\varepsilon' = \varepsilon - O(1/\sqrt{m})$ and $m' = \lceil \sqrt{m} \rceil - 1$, we can take $\varepsilon = d/q + O(1/\sqrt{m})$.


Setting parameters appropriately gives:
Theorem 5.25. The following holds for all constants , > 0. For
every n N, there is an explicit code of message length n and rate
= that can be list-decoded in polynomial time from distance
1 , with alphabet size q = poly(n).

5.3 List-decoding Views of Samplers and Expanders

151

The polynomials in the running time, alphabet size, and list size
2
depend exponentially on ; specically they are of the form nO(1/ ) .
Nonconstructively, it is possible to have alphabet size (1/)O(/) and
list size O(1/) (see discussion after Theorem 5.8), and one could hope
for running time that is a xed polynomial in n, with an exponent
that is independent of . Recent work has achieved these parameters,
albeit with a randomized construction of codes; it remains open to
have a fully explicit and deterministic construction. However, for a
xed constant-sized alphabet, e.g., q = 2, it is still not known how to
achieve list-decoding capacity.
Open Problem 5.26. For any desired constants $\rho, \delta > 0$ such that $\rho < 1 - H_2(\delta)$, construct an explicit family of codes $\mathrm{Enc}_n : \{0,1\}^n \to \{0,1\}^{\hat{n}}$ that have rate at least $\rho$ and are $(\delta, \mathrm{poly}(n))$-list-decodable in polynomial time.

5.3 List-decoding Views of Samplers and Expanders

In this section, we show how averaging samplers and expander graphs can be understood as variants of list-decodable codes. The general list-decoding framework we use to describe these connections will also allow us to capture the other pseudorandom objects we study in this survey.

We begin with the syntactic correspondence between the three objects, so we can view each as a function $\Gamma : [N] \times [D] \to [M]$:

Construction 5.27 (Syntactic Correspondence of Expanders, Samplers, and Codes).

(1) Given a bipartite multigraph G with N left vertices, M right vertices, and left-degree D, we let $\Gamma : [N] \times [D] \to [M]$ be its neighbor function, i.e., $\Gamma(x, y)$ is the yth neighbor of x in G.
(2) Given a sampler $\mathrm{Samp} : [N] \to [M]^D$, we define $\Gamma : [N] \times [D] \to [M]$ by

$\Gamma(x, y) = \mathrm{Samp}(x)_y$.

(3) Given a code $\mathrm{Enc} : [N] \to [q]^D$, we set $M = q \cdot D$, associate the elements of $[q] \times [D]$ with [M], and define $\Gamma : [N] \times [D] \to [M]$ by

$\Gamma(x, y) = (y, \mathrm{Enc}(x)_y)$.
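In code, these correspondences are just syntactic wrappers; the following Python sketch (names ours) makes this explicit:

    def gamma_from_code(enc):
        """Code as expander/sampler: Gamma(x, y) = (y, Enc(x)_y)."""
        return lambda x, y: (y, enc(x)[y])

    def gamma_from_sampler(samp):
        """Sampler as expander: Gamma(x, y) = Samp(x)_y."""
        return lambda x, y: samp(x)[y]

    # E.g., the Hadamard code on 2-bit messages as a neighbor function:
    enc = lambda m: [bin(m & x).count("1") % 2 for x in range(4)]
    gamma = gamma_from_code(enc)
    assert gamma(0b11, 2) == (2, 1)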

The correspondence between expanders and samplers is identical to the one from Problem 4.7, which shows that the vertex expansion of G is equivalent to the hitting-sampler properties of Samp.

Note that codes correspond to expanders and samplers where the first component of a neighbor/sample equals the edge-label/sample-number. Conversely, any such expander/sampler can be viewed as a code. Many constructions of expanders and samplers have, or can be easily modified to have, this property. This syntactic constraint turns out to be quite natural in the setting of samplers. It corresponds to the case where we have D functions $f_1, \ldots, f_D : [M] \to [0,1]$, we are interested in estimating the total average $(1/D) \sum_i \mu(f_i)$, but can only evaluate each $f_i$ on a single sample.
We now consider the following generalized notion of list decoding.

Definition 5.28. For a function $\Gamma : [N] \times [D] \to [M]$, a set $T \subseteq [M]$, and $\varepsilon \in [0,1)$, we define

$\mathrm{LIST}_\Gamma(T, \varepsilon) = \{x : \Pr_y[\Gamma(x, y) \in T] > \varepsilon\}$ and $\mathrm{LIST}_\Gamma(T, 1) = \{x : \forall y\ \Gamma(x, y) \in T\}$.

More generally, for a function $f : [M] \to [0,1]$, we define

$\mathrm{LIST}_\Gamma(f, \varepsilon) = \{x : \mathbb{E}_y[f(\Gamma(x, y))] > \varepsilon\}$.

We can formulate the list-decoding property of Enc, the averaging-sampler property of Samp, and the vertex expansion property of $\Gamma$ in this language as follows:

Proposition 5.29. Let Enc and $\Gamma$ be as in Construction 5.27. Then Enc is $(1 - 1/q - \varepsilon, K)$ list-decodable iff for every $r \in [M]^D$, we have

$|\mathrm{LIST}_\Gamma(T_r, 1/q + \varepsilon)| \leq K$,

where $T_r = \{(y, r_y) : y \in [D]\}$.

Proof. Observe that, for each $x \in [N]$ and $r \in [M]^D$,

$\Pr_y[\Gamma(x, y) \in T_r] = \Pr_y[(y, \mathrm{Enc}(x)_y) \in T_r] = \Pr_y[\mathrm{Enc}(x)_y = r_y] = \mathrm{agr}(\mathrm{Enc}(x), r)$.

Thus $|\mathrm{LIST}_\Gamma(T_r, 1/q + \varepsilon)| \leq K$ if and only if there are at most K messages x whose encodings have agreement greater than $1/q + \varepsilon$ with r, which is the same as being $(1 - 1/q - \varepsilon, K)$ list-decodable.
Proposition 5.30. Let Samp and $\Gamma$ be as in Construction 5.27. Then

(1) Samp is a $(\delta, \varepsilon)$ averaging sampler iff for every function $f : [M] \to [0,1]$, we have

$|\mathrm{LIST}_\Gamma(f, \mu(f) + \varepsilon)| \leq \delta N$.

(2) Samp is a $(\delta, \varepsilon)$ boolean averaging sampler iff for every set $T \subseteq [M]$, we have

$|\mathrm{LIST}_\Gamma(T, \mu(T) + \varepsilon)| \leq \delta N$.

Proof. Observe that $x \in \mathrm{LIST}_\Gamma(f, \mu(f) + \varepsilon)$ if and only if x is a "bad" set of coin tosses for Samp, namely $(1/D) \sum_{i=1}^{D} f(z_i) > \mu(f) + \varepsilon$, where $(z_1, \ldots, z_D) = \mathrm{Samp}(x)$. Thus Samp errs with probability at most $\delta$ over $x \stackrel{R}{\leftarrow} U_{[N]}$ if and only if $|\mathrm{LIST}_\Gamma(f, \mu(f) + \varepsilon)| \leq \delta N$. The boolean averaging sampler case follows by viewing boolean functions f as characteristic functions of sets T.

Noting that the sets $T_r$ in Proposition 5.29 have density $\mu(T_r) = 1/q$, we see that the averaging-sampler property implies the standard list-decoding property:

Corollary 5.31. If Samp is a $(\delta, \varepsilon)$ boolean averaging sampler of the form $\mathrm{Samp}(x)_y = (y, \mathrm{Enc}(x)_y)$, then Enc is $(1 - 1/q - \varepsilon, \delta N)$ list-decodable.

Note, however, that the typical settings of parameters of samplers and list-decodable codes are very different. With codes, we want the alphabet size q to be as small as possible (e.g., q = 2) and the blocklength D to be linear or polynomial in the message length $n = \log N$, so $M = qD$ is also linear or polynomial in $n = \log N$. In contrast, we usually are interested in samplers for functions on exponentially large domains (e.g., $M = 2^{\Omega(n)}$).

In Section 6, we will see a converse to Corollary 5.31 when the alphabet size is small: if Enc is $(1 - 1/q - \varepsilon, \delta N)$ list-decodable, then Samp is a $(\delta/\varepsilon, q\varepsilon)$ averaging sampler.
For expanders, it will be convenient to state the list-decoding property in terms of the following variant of vertex expansion, where we only require that sets of size exactly K expand:

Definition 5.32. For $K \in \mathbb{N}$, a bipartite multigraph G is an (= K, A) vertex expander if for all sets S consisting of K left-vertices, the neighborhood N(S) is of size at least $A \cdot K$.

Thus, G is a (K, A) vertex expander in the sense of Definition 4.3 iff G is an $(= K', A)$ vertex expander for all positive integers $K' \leq K$.

Proposition 5.33. For $K \in \mathbb{N}$, $\Gamma : [N] \times [D] \to [M]$ is an (= K, A) vertex expander iff for every set $T \subseteq [M]$ such that $|T| < KA$, we have:

$|\mathrm{LIST}_\Gamma(T, 1)| < K$.

Proof.

G is not an (= K, A) expander
$\Leftrightarrow$ $\exists\, S \subseteq [N]$ s.t. $|S| = K$ and $|N(S)| < KA$
$\Leftrightarrow$ $\exists\, S \subseteq [N]$ s.t. $|S| \geq K$ and $|N(S)| < KA$
$\Leftrightarrow$ $\exists\, T \subseteq [M]$ s.t. $|\mathrm{LIST}_\Gamma(T, 1)| \geq K$ and $|T| < KA$,

where the last equivalence follows because if $T = N(S)$, then $S \subseteq \mathrm{LIST}(T, 1)$, and conversely if $S = \mathrm{LIST}(T, 1)$ then $N(S) \subseteq T$.

On one hand, this list-decoding property seems easier to establish than the ones for codes and samplers because we look at $\mathrm{LIST}(T, 1)$ instead of $\mathrm{LIST}(T, \mu(T) + \varepsilon)$. On the other hand, to get expansion (i.e., A > 1), we require a very tight relationship between |T| and $|\mathrm{LIST}(T, 1)|$. In the setting of codes or samplers, we would not care much about a factor of 2 loss in $|\mathrm{LIST}(T)|$, as this just corresponds to a factor of 2 in list size or error probability. But here it corresponds to a factor of 2 loss in expansion, which can be quite significant. In particular, we cannot afford it if we are trying to get $A = (1 - \varepsilon) \cdot D$, as we will be in the next section.

5.4 Expanders from Parvaresh–Vardy Codes

Despite the substantial differences in the standard settings of parameters between codes, samplers, and expanders, it can be very useful to translate ideas and techniques from one object to the other using the connections described in the previous section. In particular, in this section we will see how to build graphs with extremely good vertex expansion ($A = (1 - \varepsilon)D$) from Parvaresh–Vardy codes.

Consider the bipartite multigraph obtained from the Parvaresh–Vardy codes (Construction 5.21) via the correspondence of Construction 5.27. That is, we define a neighbor function $\Gamma : \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q \times \mathbb{F}_q^m$ by

$\Gamma(f, y) = [y, f_0(y), f_1(y), \ldots, f_{m-1}(y)]$,   (5.9)

where f(Y) is a polynomial of degree at most n − 1 over $\mathbb{F}_q$, and we define $f_i(Y) = f(Y)^{h^i} \bmod E(Y)$, where E is a fixed irreducible polynomial of degree n over $\mathbb{F}_q$. (Note that we are using n − 1 instead of d to denote the degree of f.)
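Concretely, in terms of the pv_symbol sketch from Section 5.2.3 (so this snippet assumes that helper is in scope), the neighbor function is just a PV symbol prefixed with the edge label:

    def expander_neighbor(f, y, m, h, e, p):
        """The y-th neighbor of left-vertex f (a polynomial of degree at
        most n-1 over F_p): [y, f_0(y), ..., f_{m-1}(y)]."""
        return [y] + pv_symbol(f, y, m, h, e, p)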
Theorem 5.34. Let $\Gamma : \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q^{m+1}$ be the neighbor function of the bipartite multigraph corresponding to the q-ary Parvaresh–Vardy code of degree d = n − 1, power h, and redundancy m via Construction 5.27. Then $\Gamma$ is a $(K_{\max}, A)$ expander for $K_{\max} = h^m$ and $A = q - nhm$.

Proof. Let K be any integer less than or equal to $K_{\max} = h^m$, and let $A = q - nmh$. By Proposition 5.33, it suffices to show that for every set $T \subseteq \mathbb{F}_q^{m+1}$ of size at most AK − 1, we have $|\mathrm{LIST}(T, 1)| \leq K - 1$.

We begin by doing the proof for $K = K_{\max} = h^m$, and later describe the modifications to handle smaller values of K. The proof goes along the same lines as the list-decoding algorithm for the Parvaresh–Vardy codes from Section 5.2.3.
Step 1: Find a low-degree Q vanishing on T. We find a nonzero polynomial $Q(Y, Z_0, \ldots, Z_{m-1})$ of degree at most $d_Y = A - 1$ in its first variable Y and at most h − 1 in each of the remaining variables such that Q(z) = 0 for all $z \in T$. (Compare this to Q(y, r(y)) = 0 for all $y \in \mathbb{F}_q$ in the list-decoding algorithm, which corresponds to taking $T = T_r$.) This is possible because

$A \cdot h^m = AK > |T|$.

Moreover, we may assume that Q is not divisible by E(Y). If it is, we can divide out all the factors of E(Y), which will not affect the conditions Q(z) = 0 since E has no roots (being irreducible).

Step 2: Argue that each $f(Y) \in \mathrm{LIST}(T, 1)$ is a root of a related univariate polynomial $Q^*$. First, we argue as in the list-decoding algorithm that if $f \in \mathrm{LIST}(T, 1)$, we have

$Q(Y, f_0(Y), \ldots, f_{m-1}(Y)) = 0$.

This is ensured because

$q > A - 1 + nmh$.

(In the list-decoding algorithm, the left-hand side of this inequality was $\varepsilon q$, since we were bounding $|\mathrm{LIST}(T_r, \varepsilon)|$.)

Once we have this, we can reduce both sides modulo E(Y) and deduce

$0 = Q(Y, f_0(Y), f_1(Y), \ldots, f_{m-1}(Y)) \bmod E(Y) = Q(Y, f(Y), f(Y)^h, \ldots, f(Y)^{h^{m-1}}) \bmod E(Y)$.

Thus, if we define the univariate polynomial

$Q^*(Z) = Q(Y, Z, Z^h, \ldots, Z^{h^{m-1}}) \bmod E(Y)$,

then f(Y) is a root of $Q^*$ over the field $\mathbb{F}_q[Y]/E(Y)$.

Observe that $Q^*$ is nonzero because Q is not divisible by E(Y) and has degree at most h − 1 in each $Z_i$. Thus,

$|\mathrm{LIST}(T, 1)| \leq \deg(Q^*) \leq (h-1) + (h-1) \cdot h + (h-1) \cdot h^2 + \cdots + (h-1) \cdot h^{m-1} = K - 1$.

(Compare this to the list-decoding algorithm, where our primary goal was to efficiently enumerate the elements of $\mathrm{LIST}(T_r, \varepsilon)$, as opposed to bounding its size.)
Handling smaller values of K. We further restrict $Q(Y, Z_1, \ldots, Z_m)$ to only have nonzero coefficients on monomials of the form $Y^i \mathrm{Mon}_j(Z_1, \ldots, Z_m)$ for $0 \leq i \leq A - 1$ and $0 \leq j \leq K - 1 \leq h^m - 1$, where $\mathrm{Mon}_j(Z_1, \ldots, Z_m) = Z_1^{j_0} \cdots Z_m^{j_{m-1}}$ and $j = j_0 + j_1 h + \cdots + j_{m-1}h^{m-1}$ is the base-h representation of j. Note that this gives us AK > |T| monomials, so Step 1 is possible. Moreover $\mathrm{Mon}_j(Z, Z^h, Z^{h^2}, \ldots, Z^{h^{m-1}}) = Z^j$, so the degree of $Q^*$ is at most K − 1, and we get the desired list-size bound in Step 2.
Setting parameters, we have:

Theorem 5.35. For every constant $\alpha > 0$, every $N \in \mathbb{N}$, $K \leq N$, and $\varepsilon > 0$, there is an explicit $(K, (1-\varepsilon)D)$ expander with N left-vertices, M right-vertices, left-degree $D = O((\log N)(\log K)/\varepsilon)^{1+1/\alpha}$, and $M \leq D^2 \cdot K^{1+\alpha}$. Moreover, D is a power of 2.

Proof. Let $n = \log N$ and $k = \log K_{\max}$. Let $h = \lceil (2nk/\varepsilon)^{1/\alpha} \rceil$, and let q be the power of 2 in the interval $(h^{1+\alpha}/2, h^{1+\alpha}]$. Set $m = \lceil (\log K_{\max})/(\log h) \rceil$, so that $h^{m-1} \leq K_{\max} \leq h^m$. Then, by Theorem 5.34, the graph $\Gamma : \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q^{m+1}$ defined in (5.9) is an $(h^m, A)$ expander for $A = q - nhm$. Since $K_{\max} \leq h^m$, it is also a $(K_{\max}, A)$ expander.

Note that the number of left-vertices in $\Gamma$ is $q^n \geq N$, and the number of right-vertices is

$M = q^{m+1} \leq q^2 \cdot h^{(1+\alpha)(m-1)} \leq q^2 \cdot K_{\max}^{1+\alpha}$.

The degree is

$D = q \leq h^{1+\alpha} = O(nk/\varepsilon)^{1+1/\alpha} = O((\log N)(\log K_{\max})/\varepsilon)^{1+1/\alpha}$.

To see that the expansion factor $A = q - nhm \geq q - nhk$ is at least $(1 - \varepsilon)D = (1 - \varepsilon)q$, note that

$nhk \leq (\varepsilon/2) \cdot h^{1+\alpha} \leq \varepsilon q$,

where the first inequality holds because $h^\alpha \geq 2nk/\varepsilon$.

Finally, the construction is explicit because a description of $\mathbb{F}_q$ for q a power of 2 (i.e., an irreducible polynomial of degree $\log q$ over $\mathbb{F}_2$) as well as an irreducible polynomial E(Y) of degree n over $\mathbb{F}_q$ can be found in time $\mathrm{poly}(n, \log q) = \mathrm{poly}(\log N, \log D)$.
These expanders are of polylogarithmic rather than constant degree. But the expansion is almost as large as possible given the degree ($A = (1 - \varepsilon) \cdot D$), and the size of the right-hand side is almost as small as possible (in a (K, A) expander, we must have $M \geq KA = (1 - \varepsilon)KD$). In particular, these expanders can be used in the data structure application of Problem 4.10: storing a K-sized subset of [N] using $K^{1.01} \cdot \mathrm{polylog}(N)$ bits in such a way that membership can be probabilistically tested by reading 1 bit of the data structure. (An efficient solution to that application actually requires more than the graph being explicit in the usual sense; it also requires efficient algorithms for finding all left-vertices having at least some fraction of neighbors in a given set $T \subseteq [M]$ of right-vertices, but the expanders above can be shown to have that property by a variant of the list-decoding algorithm above.)
A deficiency of the expander of Theorem 5.35 is that the size of the right-hand side is polynomial in K and D (for constant $\alpha$), whereas the optimal bound is $M = O(KD/\varepsilon)$. Achieving the latter, while keeping the left-degree polylogarithmic, is an open problem:

Open Problem 5.36. Construct (= K, A) bipartite expanders with N left-vertices, degree $D = \mathrm{poly}(\log N)$, expansion A = 0.99D, and $M = O(KD)$ right-hand vertices.

We remark that a construction where D is quasipolynomial in log N is known.

5.5 Exercises

Problem 5.1 (Limits of List Decoding). Show that if there exists a q-ary code $C \subseteq \Sigma^{\hat{n}}$ of rate $\rho$ that is $(\delta, L)$ list-decodable, then $\rho \leq 1 - H_q(\delta, \hat{n}) + (\log_q L)/\hat{n}$.

Problem 5.2 (Concatenated Codes). For codes $\mathrm{Enc}_1 : \{1, \ldots, N\} \to \Sigma_1^{n_1}$ and $\mathrm{Enc}_2 : \Sigma_1 \to \Sigma_2^{n_2}$, their concatenation $\mathrm{Enc} : \{1, \ldots, N\} \to \Sigma_2^{n_1 n_2}$ is defined by

$\mathrm{Enc}(m) = \mathrm{Enc}_2(\mathrm{Enc}_1(m)_1)\,\mathrm{Enc}_2(\mathrm{Enc}_1(m)_2) \cdots \mathrm{Enc}_2(\mathrm{Enc}_1(m)_{n_1})$.

This is typically used as a tool for reducing alphabet size, e.g., with $\Sigma_2 = \{0,1\}$.

(1) Prove that if $\mathrm{Enc}_1$ has minimum distance $\delta_1$ and $\mathrm{Enc}_2$ has minimum distance $\delta_2$, then Enc has minimum distance at least $\delta_1 \delta_2$.
(2) Prove that if $\mathrm{Enc}_1$ is $(1 - \varepsilon_1, \ell_1)$ list-decodable and $\mathrm{Enc}_2$ is $(\delta_2, \ell_2)$ list-decodable, then Enc is $((1 - \ell_2 \varepsilon_1) \cdot \delta_2, \ell_1 \ell_2)$ list-decodable.
(3) By concatenating a Reed–Solomon code and a Hadamard code, show that for every $n \in \mathbb{N}$ and $\varepsilon > 0$, there is a (fully) explicit code $\mathrm{Enc} : \{0,1\}^n \to \{0,1\}^{\hat{n}}$ with blocklength $\hat{n} = O(n^2/\varepsilon^2)$ with minimum distance at least $1/2 - \varepsilon$. Furthermore, show that with blocklength $\hat{n} = \mathrm{poly}(n, 1/\varepsilon)$, we can obtain a code that is $(1/2 - \varepsilon, \mathrm{poly}(1/\varepsilon))$ list-decodable in polynomial time. (Hint: the inner code can be decoded by brute force.)

Problem 5.3 (List Decoding implies Unique Decoding for Random Errors).

(1) Suppose that $C \subseteq \{0,1\}^{\hat{n}}$ is a code with minimum distance at least 1/4 and rate at most $\alpha \varepsilon^2$ for a constant $\alpha > 0$, and we transmit a codeword $c \in C$ over a channel in which each bit is flipped with probability $1/2 - 2\varepsilon$. Show that if $\alpha$ is a sufficiently small constant (independent of $\hat{n}$ and $\varepsilon$), then with all but exponentially small probability over the errors, c will be the unique codeword at distance at most $1/2 - \varepsilon$ from the received word r.
(2) Using Problem 5.2, deduce that for every $\varepsilon > 0$ and $n \in \mathbb{N}$, there is an explicit code of blocklength $\hat{n} = \mathrm{poly}(n, 1/\varepsilon)$ that can be uniquely decoded from $(1/2 - 2\varepsilon)$ random errors as above in polynomial time.

Problem 5.4 (Linear Codes). For a prime power q, a q-ary code $C \subseteq \mathbb{F}_q^{\hat{n}}$ is called linear if C is a linear subspace of $\mathbb{F}_q^{\hat{n}}$. That is, for every $u, v \in C$ and $\alpha \in \mathbb{F}_q$, we have $u + v \in C$ and $\alpha u \in C$.

(1) Verify that the Hadamard, Reed–Solomon, and Reed–Muller codes are all linear.
(2) Show that if C is linear, then the minimum distance of C equals the minimum weight of C, which is defined to be $\min_{c \in C \setminus \{0\}} |\{i : c_i \neq 0\}|/\hat{n}$.

(3) Show that if C is a subspace of dimension n, then its rate is $n/\hat{n}$ and it has an encoding function $\mathrm{Enc} : \mathbb{F}_q^n \to \mathbb{F}_q^{\hat{n}}$ that is a linear map. Moreover, the encoding function can be made to have the property that there is a set $S \subseteq [\hat{n}]$ of n coordinates such that $\mathrm{Enc}(m)|_S = m$ for every $m \in \mathbb{F}_q^n$. (Here $c|_S$ is the projection of c onto the coordinates in S.) Such encodings are called systematic, and will be useful when we study locally decodable codes in Section 7.
(4) Find explicit systematic encodings for the Hadamard and Reed–Solomon codes.
(5) Show that the nonconstructive bound of Theorem 5.8, Part 1, can be achieved by a linear code. That is, for all prime powers q, integers $n, \hat{n} \in \mathbb{N}$, and $\delta \in (0, 1 - 1/q)$, there exists a linear q-ary code of block length $\hat{n}$, minimum distance at least $\delta$, and rate at least $\rho = n/\hat{n}$ provided $\rho \leq 1 - H_q(\delta, \hat{n})$. (Hint: choose a random linear map $\mathrm{Enc} : \mathbb{F}_q^n \to \mathbb{F}_q^{\hat{n}}$ and use Part 2.)

Problem 5.5 (LDPC Codes). Given a bipartite multigraph G with N left-vertices and M right-vertices, we can obtain a linear code $C \subseteq \{0,1\}^N$ (where we view {0,1} as the field of two elements) by:

$C = \{c \in \{0,1\}^N : \forall j \in [M]\ \sum_{i \in \Gamma(j)} c_i = 0\}$,

where $\Gamma(j)$ denotes the set of neighbors of vertex j. When G has small left-degree D (e.g., D = O(1)), then C is called a low-density parity check (LDPC) code.

(1) Show that C has rate at least 1 − M/N.
(2) Show that if G is a (K, A) expander for A > D/2, then C has minimum distance at least $\delta = K/N$.
(3) Show that if G is a $(K, (1-\varepsilon)D)$ expander for a sufficiently small constant $\varepsilon$, then C has a polynomial-time $\delta$-decoder for $\delta = (1 - 3\varepsilon) \cdot K/N$. Assume that G is given as input to the decoder. (Hint: given a received word $r \in \{0,1\}^N$, flip all coordinates of r for which at least 2/3 of the neighboring parity checks are not satisfied, and argue that the number of errors decreases by a constant factor. It may be useful to use the results of Problem 4.10.)

By a probabilistic argument like Theorem 4.4, graphs G as above exist with D = O(1), $K = \Omega(N)$, $M = (1 - \Omega(1))N$, arbitrarily small constant $\varepsilon > 0$, and $N \to \infty$, and in fact explicit constructions are known. This yields explicit LDPC codes with constant rate and constant distance ("asymptotically good" LDPC codes).

Problem 5.6 (Improved list decoding of Reed–Solomon Codes).

(1) Show that there is a polynomial-time algorithm for list decoding the Reed–Solomon codes of degree d over $\mathbb{F}_q$ up to distance $1 - \sqrt{2d/q}$, improving the $1 - 2\sqrt{d/q}$ bound from Theorem 5.19. (Hint: do not use fixed upper bounds on the individual degrees of the interpolating polynomial Q(Y, Z), but rather allow as many monomials as possible.)
(2) (*) Improve the list-decoding radius further to $1 - \sqrt{d/q}$ by using the following "method of multiplicities." First, require the interpolating polynomial Q(Y, Z) to have a zero of multiplicity s at each point (y, r(y)), that is, the polynomial Q(Y + y, Z + r(y)) should have no monomials of degree smaller than s. Second, use the fact that a univariate polynomial R(Y) of degree t can have at most t roots, counting multiplicities.

Problem 5.7 (Twenty Questions). In the game of 20 questions, an oracle has an arbitrary secret $s \in \{0,1\}^n$ and the aim is to determine the secret by asking the oracle as few yes/no questions about s as possible. It is easy to see that n questions are necessary and sufficient. Here we consider a variant where the oracle has two secrets $s_1, s_2 \in \{0,1\}^n$, and can adversarially decide to answer each question according to either $s_1$ or $s_2$. That is, for a question $f : \{0,1\}^n \to \{0,1\}$, the oracle may answer with either $f(s_1)$ or $f(s_2)$. Here it turns out to be impossible to pin down either of the secrets with certainty, no matter how many questions we ask, but we can hope to compute a small list L of secrets such that $L \cap \{s_1, s_2\} \neq \emptyset$. (In fact, |L| can be made as small as 2.) This variant of twenty questions apparently was motivated by problems in Internet traffic routing.

(1) Let $\mathrm{Enc} : \{0,1\}^n \to \{0,1\}^{\hat{n}}$ be a code such that every two codewords in Enc agree in at least a 1/2 fraction of positions and such that Enc has a polynomial-time $(1/4 + \varepsilon, \ell)$ list-decoding algorithm. Show how to solve the above problem in polynomial time by asking the $\hat{n}$ questions $\{f_i\}$ defined by $f_i(x) = \mathrm{Enc}(x)_i$.
(2) Taking Enc to be the code constructed in Problem 5.2, deduce that $\hat{n} = \mathrm{poly}(n)$ questions suffice.

5.6 Chapter Notes and References

Standard texts on coding theory include MacWilliams–Sloane [282] and the Handbook of Coding Theory [313]. The lecture notes of Sudan [379] present coding theory with an algorithmic perspective, and Guruswami [184] gives a thorough treatment of list decoding.
The field of coding theory began with Shannon's seminal 1948 paper [361], which proposed the study of stochastic error models and proved that a random error-correcting code achieves an optimal rate-distance tradeoff with high probability. The study of decoding from worst-case errors began a couple of years later with the work of Hamming [199]. The notion of list decoding was proposed independently in the late 1950s, in the work of Elias [129] and Wozencraft [420]. The nonconstructive bound for the tradeoff between rate and minimum distance of Theorem 5.8 (Part 1) is due to Gilbert [151], and its extension to linear codes in Problem 5.4 (Part 5) is due to Varshamov [406]; together they are known as the Gilbert–Varshamov Bound. For more on the algebraic-geometric codes and how they can be used to beat the Gilbert–Varshamov Bound, see the text by Stichtenoth [375]. The nonconstructive bound for list decoding of Theorem 5.8 (Part 2) is due to Elias [131], and it has been extended to linear codes in [186, 187, 428]. The Johnson Bound (Proposition 5.10) was proven for binary codes by Johnson [224, 225]. An optimized form of the bound can be found in [191].
The binary (q = 2) case of Reed–Muller codes was introduced independently by Reed [324] and Muller [293] in the mid-50s. Reed–Solomon codes were introduced in 1960 by Reed and Solomon [325]. Polynomial-time unique-decoding algorithms for Reed–Solomon codes include those of Peterson [308] and Berlekamp [65]. The first nontrivial list-decoding algorithm was given by Goldreich and Levin [168]; while their algorithm is stated in the language of hardcore bits for one-way functions, it can be viewed as an efficient local list-decoding algorithm for the Hadamard code. (Local decoding algorithms are discussed in Section 7.) The list-decoding algorithm for Reed–Solomon codes of Theorem 5.19 is from the seminal work of Sudan [378], which sparked a resurgence in the study of list decoding. The improved decoding algorithm of Problem 5.6, Part 2 is due to Guruswami and Sudan [189]. Parvaresh–Vardy codes and their decoding algorithm are from [306]. Folded Reed–Solomon codes and their capacity-achieving list-decoding algorithm are due to Guruswami and Rudra [188]. (In all cases, our presentation uses a simplified setting of parameters compared to the original algorithms.) The list size, decoding time, and alphabet size of Folded Reed–Solomon codes have recently been improved in [125, 185, 194], with the best parameters to date being achieved in the randomized construction of [194], while [125] give a fully deterministic construction.
The list-decoding views of expanders and samplers emerged out of work on randomness extractors (the subject of Section 6). Specifically, a close connection between extractors and expanders was understood already at the time that extractors were introduced by Nisan and Zuckerman [303]. An equivalence between extractors and averaging samplers was established by Zuckerman [427] (building on previous connections between other types of samplers and expanders [107, 365]). A connection between list-decodable codes and extractors emerged in the work of Trevisan [389], and the list-decoding view of extractors was crystallized by Ta-Shma and Zuckerman [383]. The list-decoding formulation of vertex expansion and the construction of expanders from Parvaresh–Vardy codes is due to Guruswami, Umans, and Vadhan [192].

Code concatenation (Problem 5.2) was introduced by Forney [140], who used it to construct explicit, efficiently decodable binary codes that achieve capacity for random errors. Efficient list-decoding algorithms for the concatenation of a Reed–Solomon code with a Hadamard code were given in [190]. Low-density parity check (LDPC) codes (Problem 5.5) were introduced by Gallager [148]. The use of expansion to analyze such codes and their decoding algorithm comes from Sipser and Spielman [367]. Explicit expanders suitable for Problem 5.5 were given by [92]. The surveys of Guruswami [183, 184] describe the state of the art in expander-based constructions of codes and in LDPC codes, respectively.

The question of Problem 5.7 (Twenty Questions) was posed in [102], motivated by Internet traffic routing applications. The solution using list decoding is from [20].

6 Randomness Extractors

Randomness extractors are functions that extract almost-uniform bits from sources of biased and correlated bits. The original motivation for extractors was to simulate randomized algorithms with weak random sources as might arise in nature. This motivation is still compelling, but extractors have taken on a much wider significance in the years since they were introduced. They have found numerous applications in theoretical computer science beyond this initial motivating one, in areas from cryptography to distributed algorithms to hardness of approximation. More importantly from the perspective of this survey, they have played a major unifying role in the theory of pseudorandomness. Indeed, the links between the various pseudorandom objects we are studying in this survey (expander graphs, randomness extractors, list-decodable codes, pseudorandom generators, samplers) were all discovered through work on extractors (even though now we find it more natural to present these links in the language of list decoding, as introduced in Section 5.3).

6.1 Motivation and Definition

6.1.1 Deterministic Extractors

Typically, when we design randomized algorithms or protocols, we assume that all algorithms/parties have access to sources of perfect randomness, i.e., bits that are unbiased and completely independent. However, when we implement these algorithms, the physical sources of randomness to which we have access may contain biases and correlations. For example, we may use low-order bits of the system clock, the user's mouse movements, or a noisy diode based on quantum effects. While these sources may have some randomness in them, the assumption that the source is perfect is a strong one, and thus it is of interest to try and relax it.

Ideally, what we would like is a compiler that takes any algorithm A that works correctly when fed perfectly random bits $U_m$, and produces a new algorithm A′ that will work even if it is fed random bits $X \in \{0,1\}^n$ that come from a "weak" random source. For example, if A is a BPP algorithm, then we would like A′ to also run in probabilistic polynomial time. One way to design such compilers is to design a randomness extractor $\mathrm{Ext} : \{0,1\}^n \to \{0,1\}^m$ such that Ext(X) is distributed uniformly in $\{0,1\}^m$.

IID-Bit Sources. A simple version of this question was already considered by von Neumann. He looked at sources that consist of boolean random variables $X_1, X_2, \ldots, X_n \in \{0,1\}$ that are independent but biased. That is, for every i, $\Pr[X_i = 1] = \delta$ for some unknown $\delta$. How can such a source be converted into a source of independent, unbiased bits? Von Neumann proposed the following extractor: Break all the variables into pairs and, for each pair, output 0 if the outcome was 01, 1 if the outcome was 10, and skip the pair if the outcome was 00 or 11. This will yield an unbiased random bit after $1/(2\delta(1-\delta))$ pairs on average.
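Von Neumann's extractor is simple enough to state in a few lines of Python (a sketch, with names of our choosing):

    def von_neumann_extract(bits):
        """Von Neumann's extractor for i.i.d. biased bits: read the input
        in pairs, map 01 -> 0 and 10 -> 1, and skip 00 and 11.  The output
        bits are unbiased because Pr[01] = Pr[10] = delta*(1 - delta)."""
        out = []
        for a, b in zip(bits[0::2], bits[1::2]):
            if a != b:
                out.append(a)
        return out

    # Pairs (0,1), (1,1), (1,0), (0,0) yield outputs 0 and 1.
    assert von_neumann_extract([0, 1, 1, 1, 1, 0, 0, 0]) == [0, 1]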
Independent-Bit Sources. Let's now look at a bit more interesting
class of sources in which all the variables are still independent but the
bias is no longer the same. Specifically, for every i, Pr[X_i = 1] = δ_i and
δ ≤ δ_i ≤ 1 − δ for some constant δ > 0. How can we deal with such
a source?

It can be shown that when we take a parity of ℓ bits from such
an independent-bit source, the result approaches an unbiased coin
flip exponentially fast in ℓ, i.e.,

    | Pr[⊕_{i=1}^ℓ X_i = 1] − 1/2 | = 2^{−Ω(ℓ)}.

The result is not a perfect coin flip but is as good as one for almost all
purposes.
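The decay is easy to verify numerically. The short sketch below uses the identity E[(−1)^{X_1⊕···⊕X_ℓ}] = Π_i (1 − 2δ_i), which holds for independent bits; the choice δ_i = 0.25 is arbitrary:

    from functools import reduce

    def parity_bias(deltas):
        """|Pr[X_1 xor ... xor X_l = 1] - 1/2| for independent bits with
        Pr[X_i = 1] = deltas[i], via the identity
        E[(-1)^(X_1 xor ... xor X_l)] = prod_i (1 - 2*delta_i)."""
        return abs(reduce(lambda acc, d: acc * (1 - 2 * d), deltas, 1.0)) / 2

    # With delta = 1/4 (so each delta_i in [1/4, 3/4]), the bias decays
    # at least as fast as (1/2)^l / 2:
    for l in [1, 5, 10, 20]:
        print(l, parity_bias([0.25] * l))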
Let's be more precise about the problems we are studying. A source
on {0,1}^n is simply a random variable X taking values in {0,1}^n. In
each of the above examples, there is an implicit class of sources being
studied. For example, IndBits_{n,δ} is the class of sources X on {0,1}^n
where the bits X_i are independent and satisfy δ ≤ Pr[X_i = 1] ≤ 1 − δ.
We could define IIDBits_{n,δ} to be the same with the further restriction
that all of the X_i's are identically distributed, i.e., Pr[X_i = 1] =
Pr[X_j = 1] for all i, j, thereby capturing von Neumann sources.
Definition 6.1 (deterministic extractors).¹ Let C be a class of
sources on {0,1}^n. An ε-extractor for C is a function Ext : {0,1}^n →
{0,1}^m such that for every X ∈ C, Ext(X) is ε-close to U_m.
Note that we want a single function Ext that works for all sources
in the class. This captures the idea that we do not want to assume
we know the exact distribution of the physical source we are using,
but only that it comes from some class. For example, for IndBits_{n,δ},
we know that the bits are independent and none are too biased, but
not the specific bias of each bit. Note also that we only allow the
extractor one sample from the source X. If we want to allow multiple
independent samples, then this should be modelled explicitly in
our class of sources; ideally we would like to minimize the independence
assumptions used.

We still need to define what we mean for the output to be ε-close
to U_m.

¹ Such extractors are called deterministic or seedless to contrast with the probabilistic or
seeded randomness extractors we will see later.

Definition 6.2. For random variables X and Y taking values in U, their
statistical difference (also known as variation distance) is Δ(X,Y) =
max_{T⊆U} | Pr[X ∈ T] − Pr[Y ∈ T] |. We say that X and Y are ε-close if
Δ(X,Y) ≤ ε.
Intuitively, X and Y are ε-close if any event in X happens in Y with the
same probability ± ε. This is perhaps the most natural measure of distance for
probability distributions (much more so than the ℓ2 distance we used
in the study of random walks). In particular, it satisfies the following
natural properties.
Lemma 6.3 (properties of statistical difference). Let X, Y, Z, X_1,
X_2, Y_1, Y_2 be random variables taking values in a universe U. Then,

(1) Δ(X,Y) ≥ 0, with equality iff X and Y are identically distributed,
(2) Δ(X,Y) ≤ 1, with equality iff X and Y have disjoint supports,
(3) Δ(X,Y) = Δ(Y,X),
(4) Δ(X,Z) ≤ Δ(X,Y) + Δ(Y,Z),
(5) for every function f, we have Δ(f(X), f(Y)) ≤ Δ(X,Y),
(6) Δ((X_1,X_2), (Y_1,Y_2)) ≤ Δ(X_1,Y_1) + Δ(X_2,Y_2) if X_1 and X_2,
    as well as Y_1 and Y_2, are independent, and
(7) Δ(X,Y) = (1/2)·|X − Y|_1, where |·|_1 is the ℓ1 distance. (Thus,
    X is ε-close to Y iff we can transform X into Y by shifting
    at most an ε fraction of probability mass.)
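Property (7) gives a direct way to compute statistical difference. Here is a small Python sketch, with distributions represented as dictionaries from outcomes to probabilities (a representation chosen here just for illustration):

    def statistical_difference(p, q):
        """Statistical (variation) distance between two distributions,
        given as dicts mapping outcomes to probabilities.  By property (7)
        of Lemma 6.3, the maximum over events T equals half the l1
        distance, which is what we compute."""
        support = set(p) | set(q)
        return sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in support) / 2

    # A coin with bias 0.6 is 0.1-close to a fair coin:
    print(statistical_difference({0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}))  # 0.1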
We now observe that extractors according to this definition give us
the compilers we want.

Proposition 6.4. Let A(w; r) be a randomized algorithm such that
A(w; U_m) has error probability at most γ, and let Ext : {0,1}^n →
{0,1}^m be an ε-extractor for a class C of sources on {0,1}^n. Define
A′(w; x) = A(w; Ext(x)). Then for every source X ∈ C, A′(w; X) has
error probability at most γ + ε.

This application identifies some additional properties we'd like from
our extractors. We'd like the extractor itself to be efficiently computable
(e.g., polynomial time). In particular, to get m almost-uniform bits
out, we should need at most n = poly(m) bits from the weak random
source.
We can cast our earlier extractor for sources of independent bits in
this language:

Proposition 6.5. For every constant δ > 0 and every n, m ∈ N, there is
a polynomial-time computable function Ext : {0,1}^n → {0,1}^m that is
an ε-extractor for IndBits_{n,δ}, with ε = m·2^{−Ω(n/m)}.

In particular, taking n = m², we get exponentially small error with
a source of polynomial length.

Proof. Ext breaks the source into m blocks of length ⌊n/m⌋ and outputs
the parity of each block.
Unpredictable-Bit Sources (a.k.a. Santha–Vazirani Sources).
Another interesting class of sources, which looks similar to the previous
example, is the class UnpredBits_{n,δ} of unpredictable-bit sources.
These are the sources that, for some constant δ > 0, for every i and
every x_1, ..., x_{i−1} ∈ {0,1}, satisfy

    δ ≤ Pr[X_i = 1 | X_1 = x_1, X_2 = x_2, ..., X_{i−1} = x_{i−1}] ≤ 1 − δ.

The parity extractor used above will be of no help with this source, since
the next bit could be chosen in a way that the parity will be equal to 1
with probability 1 − δ. Indeed, there does not exist any nontrivial extractor
for these sources; the best we can do is output the first bit:

Proposition 6.6. For every n ∈ N, δ > 0, and fixed extraction function
Ext : {0,1}^n → {0,1}, there exists a source X ∈ UnpredBits_{n,δ} such
that either Pr[Ext(X) = 1] ≤ δ or Pr[Ext(X) = 1] ≥ 1 − δ. That is,
there is no ε-extractor for UnpredBits_{n,δ} for ε < 1/2 − δ.

The proof is left as an exercise (Problem 6.6).

Nevertheless, as we will see, the answer to the question of whether we
can simulate BPP algorithms with unpredictable sources will be yes!
Indeed, we will even be able to handle a much more general class of
sources, introduced in the next section.
6.1.2 Entropy Measures and General Weak Sources

Intuitively, to extract m almost-uniform bits from a source, the source
must have at least m bits of "randomness" in it. (In particular, its
support cannot be much smaller than 2^m.) Ideally, this is all we would
like to assume about a source. Thus, we need some measure of how much
randomness is in a random variable; this can be done using various
notions of entropy described below.
Definition 6.7 (entropy measures). Let X be a random variable.
Then

• the Shannon entropy of X is:

      H_Sh(X) = E_{x←X}[ log(1/Pr[X = x]) ],

• the Rényi entropy of X is:

      H_2(X) = log( 1 / E_{x←X}[Pr[X = x]] ) = log(1/CP(X)), and

• the min-entropy of X is:

      H_∞(X) = min_x log(1/Pr[X = x]),

where all logs are base 2 (and CP(X) denotes the collision probability
of X).


Rényi entropy H_2(X) should not be confused with the binary
entropy function H_2(δ) from Definition 5.6. Indeed, the q-ary entropy
H_q(δ) is equal to the Shannon entropy of a random variable that equals
1 with probability 1 − δ and is uniformly distributed in {2, ..., q} with
probability δ.

All three measures satisfy the following properties we would
expect from a measure of randomness:

Lemma 6.8 (properties of entropy). For each of the entropy measures
H ∈ {H_Sh, H_2, H_∞} and random variables X, Y, we have:

• H(X) ≥ 0, with equality iff X is supported on a single element,
• H(X) ≤ log |Supp(X)|, with equality iff X is uniform on Supp(X),
• if X, Y are independent, then H((X,Y)) = H(X) + H(Y),
• for every deterministic function f, we have H(f(X)) ≤ H(X), and
• for every X, we have H_∞(X) ≤ H_2(X) ≤ H_Sh(X).

To illustrate the dierences between the three notions, consider a


source X such that X = 0n with probability 0.99 and X = Un with
probability 0.01. Then HSh (X) 0.01n (contribution from the uniform
distribution), H2 (X) log(1/0.992 ) < 1 and H (X) log(1/0.99) < 1
(contribution from 0n ). Note that even though X has Shannon entropy
linear in n, we cannot expect to extract bits that are close to uniform or
carry out any useful randomized computations with one sample from X,
because it gives us nothing useful 99% of the time. Thus, we should use
the stronger measures of entropy given by H2 or H .
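The gap can be computed exactly. The sketch below evaluates all three entropies for this mixture in closed form (needed since the support has size 2^n); the function name is introduced here for illustration:

    import math

    def mixture_entropies(n):
        """Entropies (in bits) of the source X that equals 0^n w.p. 0.99
        and is uniform on {0,1}^n w.p. 0.01, computed in closed form since
        the support is too large to enumerate."""
        p0 = 0.99 + 0.01 * 2**-n            # mass on the string 0^n
        p1 = 0.01 * 2**-n                   # mass on each other string
        h_sh = p0 * math.log2(1 / p0) + (2**n - 1) * p1 * math.log2(1 / p1)
        cp = p0**2 + (2**n - 1) * p1**2     # collision probability
        return h_sh, math.log2(1 / cp), math.log2(1 / p0)

    for n in [10, 100, 1000]:
        h_sh, h2, h_min = mixture_entropies(n)
        print(n, round(h_sh, 2), round(h2, 3), round(h_min, 3))
        # H_Sh grows like 0.01*n, while H_2 and H_min stay below 1 bit.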
Then why is Shannon entropy so widely used in information theory
results? The reason is that such results typically study what happens
when you have many independent samples from the source (whereas we
only allow one sample). In the case of many samples, it turns out that
the source is "close" to one where the min-entropy is roughly equal
to the Shannon entropy. Thus the distinction between these entropy
measures becomes less significant. Moreover, Shannon entropy satisfies
many nice identities that make it quite easy to work with. Min-entropy
and Rényi entropy are much more delicate.

We will consider the task of extracting randomness from sources
where all we know is a lower bound on the min-entropy:

Definition 6.9. A random variable X is a k-source if H_∞(X) ≥ k, i.e.,
if Pr[X = x] ≤ 2^{−k} for every x.
A typical setting of parameters is k = δn for some fixed δ, e.g.,
δ = 0.01. We call δ the min-entropy rate. Some different ranges that are
commonly studied (and are useful for different applications): k =
polylog(n), k = n^γ for a constant γ ∈ (0,1), k = δn for a constant
δ ∈ (0,1), and k = n − O(1). The middle two (k = n^γ and k = δn) are
the most natural for simulating randomized algorithms with weak random
sources.

Examples of k-sources:

• k random and independent bits, together with n − k fixed
  bits (in an arbitrary order). These are called oblivious bit-fixing
  sources.
• k random and independent bits, and n − k bits that depend
  arbitrarily on the first k bits. These are called adaptive bit-fixing
  sources.
• Unpredictable-bit sources with bias parameter δ. These are
  k-sources with k = log(1/(1−δ)^n) = Ω(δn).
• The uniform distribution on a set S ⊆ {0,1}^n with |S| = 2^k.
  These are called flat k-sources.

It turns out that flat k-sources are really representative of general
k-sources.
Lemma 6.10. Every k-source is a convex combination of flat k-sources
(provided that 2^k ∈ N), i.e., X = Σ_i p_i X_i with 0 ≤ p_i ≤ 1, Σ_i p_i = 1,
and all of the X_i flat k-sources.

That is, we can think of any k-source as being obtained by first selecting
a flat k-source X_i according to some distribution (given by the p_i's) and
then selecting a random sample from X_i. This means that if we can
compile probabilistic algorithms to work with flat k-sources, then we
can compile them to work with any k-source.
Proof. Let X be a k-source on [N]. We can view X as partitioning a
circle of unit circumference into N (half-open) intervals, where the t-th
interval I_t has length exactly Pr[X = t]. (If we associate the points on the
circle with [0,1), then the t-th interval is [Pr[X < t], Pr[X ≤ t]).) Now
consider a set S of K points spaced evenly on the circle. Then since
each interval is half-open and has length at most 1/K, each interval
contains at most one point from S, so the uniform distribution on the
set T(S) = {t : S ∩ I_t ≠ ∅} is a flat k-source. Moreover, if we perform a
uniformly random rotation of S on the circle to obtain a rotated set R
and then choose a uniformly random element of T(R), the probability
that we output any value t ∈ [N] is exactly the length of I_t, which equals
Pr[X = t]. Thus we have decomposed X as a convex combination of flat
k-sources. (Specifically, X = Σ_T p_T · U_T, where the sum is over subsets
T ⊆ [N] of size K, and p_T = Pr_R[T(R) = T].)
We also sketch another proof of Lemma 6.10 that can be more
easily generalized to other classes of sources. We can view a random
variable X taking values in [N] as an N-dimensional vector, where
X(i) is the probability mass of i. Then X is a k-source if and only if
X(i) ∈ [0, 2^{−k}] for every i ∈ [N] and Σ_i X(i) = 1. The set of vectors
X satisfying these linear inequalities is a convex polytope in R^N. By
basic linear programming theory, all of the points in the polytope are
convex combinations of its vertices, which are defined to be the points
that make a maximal subset of the inequalities tight. By inspection, the
vertices of the polytope of k-sources are those sources where X(i) = 2^{−k}
for 2^k values of i and X(i) = 0 for the remaining values of i; these are
simply the flat k-sources.
6.1.3 Seeded Extractors

Proposition 6.6 tells us that it is impossible to have deterministic
extractors for unpredictable sources. Here we consider k-sources, which
are more general than unpredictable sources, and hence it is also
impossible to have deterministic extractors for them. The impossibility
result for k-sources is stronger and simpler to prove.
Proposition 6.11. For any Ext : {0,1}^n → {0,1}, there exists an
(n − 1)-source X such that Ext(X) is constant.

Proof. There exists b ∈ {0,1} such that |Ext^{−1}(b)| ≥ 2^n/2 = 2^{n−1}. Then
let X be the uniform distribution on Ext^{−1}(b).
On the other hand, if we reverse the order of quantifiers, allowing the
extractor to depend on the source, it is easy to see that good extractors
exist; in fact a randomly chosen function will be a good extractor
with high probability.

Proposition 6.12. For every n, k, m ∈ N, every ε > 0, and every flat
k-source X, if we choose a random function Ext : {0,1}^n → {0,1}^m with
m = k − 2·log(1/ε) − O(1), then Ext(X) will be ε-close to U_m with
probability 1 − 2^{−Ω(Kε²)}, where K = 2^k.
(In this section, we will make extensive use of the convention that
capital variables are 2 raised to the power of the corresponding lowercase
variable, such as K = 2^k above.)

Proof. Choose our extractor randomly. We want it to have the following
property: for all T ⊆ [M], | Pr[Ext(X) ∈ T] − Pr[U_m ∈ T] | ≤ ε. Equivalently,
|{x ∈ Supp(X) : Ext(x) ∈ T}|/K should differ from the density μ(T)
by at most ε. For each point x ∈ Supp(X), the probability that
Ext(x) ∈ T is μ(T), and these events are independent. By the Chernoff
Bound (Theorem 2.21), for each fixed T, this condition holds with
probability at least 1 − 2^{−Ω(Kε²)}. Then the probability that the condition is
violated for at least one T is at most 2^M · 2^{−Ω(Kε²)}, which is less than 1
for m = k − 2·log(1/ε) − O(1).
Note that the failure probability is doubly-exponentially small in k.
Naively, one might hope that we could get an extractor that's good for
all flat k-sources by a union bound. But the number of flat k-sources
is (N choose K) ≥ (N/K)^K (where N = 2^n), which is unfortunately a larger
double-exponential in k. We can overcome this gap by allowing the extractor
to be slightly "probabilistic," i.e., allowing the extractor a seed consisting
of a small number of truly random bits in addition to the weak
random source. We can think of this seed of truly random bits as a
random choice of an extractor from a family of extractors. This leads to
the following crucial definition:
Definition 6.13 (seeded extractors). A function Ext : {0,1}^n ×
{0,1}^d → {0,1}^m is a (k, ε)-extractor if for every k-source X on {0,1}^n,
Ext(X, U_d) is ε-close to U_m.

(Sometimes we will refer to extractors Ext : [N] × [D] → [M] whose
domain and range do not consist of bit-strings. These are defined in the
natural way, requiring that Ext(X, U_[D]) is ε-close to U_[M].)
The goal is to construct extractors that minimize d and maximize m.
We prove the following theorem.

Theorem 6.14. For every n ∈ N, k ∈ [0,n], and ε > 0, there exists
a (k, ε)-extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with m = k + d −
2·log(1/ε) − O(1) and d = log(n − k) + 2·log(1/ε) + O(1).

One setting of parameters to keep in mind (for our application of
simulating randomized algorithms with a weak source) is k = δn, with
δ a fixed constant (e.g., δ = 0.01), and a fixed constant ε (e.g., ε = 0.01).
Proof. We use the Probabilistic Method. By Lemma 6.10, it suffices for
Ext to work for flat k-sources. Choose the extractor Ext at random.
Then the probability that the extractor fails is at most the number of
flat k-sources times the probability that Ext fails for a fixed flat k-source.
By the above proposition, the probability of failure for a fixed flat
k-source is at most 2^{−Ω(KDε²)} (since (X, U_d) is a flat (k + d)-source and
m = k + d − 2·log(1/ε) − O(1)). Thus the total failure probability is at
most

    (N choose K) · 2^{−Ω(KDε²)} ≤ (Ne/K)^K · 2^{−Ω(KDε²)}.

The latter expression is less than 1 if Dε² ≥ c·log(Ne/K) = c·(n − k) + c′
for constants c, c′. This is equivalent to d ≥ log(n − k) +
2·log(1/ε) + O(1).
It turns out that both bounds (on m and d) are individually tight
up to the O(1) terms.
Recall that our motivation for extractors was to simulate randomized algorithms given only a weak random source, so allowing a truly
random seed may seem to defeat the purpose. However, if the seed is
of logarithmic length as in Theorem 6.14, then instead of selecting it
randomly, we can enumerate all possibilities for the seed and take a
majority vote.
Proposition 6.15. Let A(w; r) be a randomized algorithm for
computing a function f such that A(w; U_m) has error probability
at most γ (i.e., Pr[A(w; U_m) ≠ f(w)] ≤ γ), and let Ext : {0,1}^n ×
{0,1}^d → {0,1}^m be a (k, ε)-extractor. Define

    A′(w; x) = maj_{y∈{0,1}^d} {A(w; Ext(x, y))}.

Then for every k-source X on {0,1}^n, A′(w; X) has error probability
at most 2·(γ + ε).

Proof. The probability that A(w; Ext(X, U_d)) is incorrect is not more
than the probability that A(w; U_m) is incorrect plus ε, i.e., γ + ε, by
the definition of statistical difference. Then the probability that
maj_y A(w; Ext(X, y)) is incorrect is at most 2·(γ + ε), because each
error of maj_y A(w; Ext(x, y)) corresponds to A(w; Ext(x, U_d)) erring
with probability at least 1/2.
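In code, the construction of Proposition 6.15 is just an exhaustive majority vote over seeds. The following Python sketch assumes `algo(w, r)` and `ext(x, y)` are placeholders for a concrete randomized algorithm and a concrete (k, ε)-extractor, respectively:

    def simulate_with_weak_source(algo, ext, d, w, x):
        """A'(w; x) = maj_y algo(w, Ext(x, y)) over all 2^d seeds y, as in
        Proposition 6.15.  `algo` (a randomized algorithm taking an input
        and a random string) and `ext` (a seeded extractor) are
        placeholders for concrete instantiations."""
        votes = [algo(w, ext(x, y)) for y in range(2 ** d)]
        # The enumeration over seeds costs a 2^d factor; with d = O(log n)
        # this is only a polynomial slowdown.
        return max(set(votes), key=votes.count)  # majority answer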
Note that the enumeration incurs a 2^d factor slowdown in the simulation.
Thus, to retain running time poly(m), we want to construct
extractors where (a) d = O(log n); (b) Ext is computable in polynomial
time; and (c) m = n^{Ω(1)}.

We remark that the error probability in Proposition 6.15 can actually
be made exponentially small by using an extractor that is designed
for slightly lower min-entropy. (See Problem 6.2.)

We note that even though seeded extractors suffice for simulating
randomized algorithms with only a weak source, they do not suffice
for all applications of randomness in theoretical computer science. The
trick of eliminating the random seed by enumeration does not work, for
example, in cryptographic applications of randomness. Thus the study
of deterministic extractors for restricted classes of sources remains a
very interesting and active research direction. We, however, will focus
on seeded extractors, due to their many applications and their connections
to the other pseudorandom objects we are studying.

6.2 Connections to Other Pseudorandom Objects

As mentioned earlier, extractors have played a unifying role in the
theory of pseudorandomness, through their close connections with a
variety of other pseudorandom objects. In this section, we will see two
of these connections: specifically, how, by reinterpreting them appropriately,
extractors can be viewed as providing families of hash functions,
and as being a certain type of highly expanding graph.
6.2.1 Extractors as Hash Functions

Proposition 6.12 says that for any subset S ⊆ [N] of size K, if we choose
a completely random hash function h : [N] → [M] for M ≪ K, then h
will map the elements of S almost-uniformly to [M]. Equivalently, if
we let H be distributed uniformly over all functions h : [N] → [M] and
X be uniform on the set S, then (H, H(X)) is statistically close to
(H, U_[M]), where we use the notation U_T to denote the uniform distribution
on a set T. Can we use a smaller family of hash functions than
the set of all functions h : [N] → [M]? This gives rise to the following
variant of extractors.
Definition 6.16 (strong extractors). An extractor Ext : {0,1}^n ×
{0,1}^d → {0,1}^m is a strong (k, ε)-extractor if for every k-source
X on {0,1}^n, (U_d, Ext(X, U_d)) is ε-close to (U_d, U_m). Equivalently,
Ext′(x, y) = (y, Ext(x, y)) is a standard (k, ε)-extractor.

The nonconstructive existence proof of Theorem 6.14 can be
extended to establish the existence of very good strong extractors:

Theorem 6.17. For every n, k ∈ N and ε > 0 there exists a
strong (k, ε)-extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with m = k −
2·log(1/ε) − O(1) and d = log(n − k) + 2·log(1/ε) + O(1).

Note that the output length is m ≈ k instead of m ≈ k + d; intuitively,
a strong extractor needs to extract randomness that is independent
of the seed and thus can only get the k bits from the source.
We see that strong extractors can be viewed as very small families of
hash functions having the almost-uniform mapping property mentioned
above. Indeed, our first explicit construction of extractors is obtained
by using pairwise independent hash functions.

Theorem 6.18 (Leftover Hash Lemma). If H = {h : {0,1}^n →
{0,1}^m} is a pairwise independent (or even 2-universal) family of hash
functions where m = k − 2·log(1/ε), then Ext(x, h) := h(x) is a strong
(k, ε)-extractor. Equivalently, Ext′(x, h) = (h, h(x)) is a standard (k, ε)-extractor.
Note that the seed length equals the number of random bits required
to choose h ← H, which is at least n by Problem 3.5.² This is far
from optimal; for the purposes of simulating randomized algorithms we
would like d = O(log n). However, the output length of the extractor is
m = k − 2·log(1/ε), which is optimal up to an additive constant.

² Problem 3.5 refers to pairwise independent families, but a similar argument shows that
universal families require Ω(n) random bits. (Instead of constructing orthogonal vectors,
we construct vectors that have nonpositive dot product.)
Proof. Let X be an arbitrary k-source on {0,1}^n, let H be as above, and
let H be uniform on H. Let d be the seed length. We show that (H, H(X))
is ε-close to U_d × U_m in the following three steps:

(1) We show that the collision probability of (H, H(X)) is close
    to that of U_d × U_m.

(2) We note that this is equivalent to saying that the ℓ2 distance
    between (H, H(X)) and U_d × U_m is small.
(3) Then we deduce that the statistical difference is small, by
    recalling that the statistical difference equals half of the ℓ1
    distance, which can be (loosely) bounded by the ℓ2 distance.
Proof of (1): By definition, CP(H, H(X)) = Pr[(H, H(X)) =
(H′, H′(X′))], where (H′, X′) is independent of and identically
distributed to (H, X). Note that (H, H(X)) = (H′, H′(X′)) if and only
if H = H′ and either X = X′, or X ≠ X′ but H(X) = H(X′). Thus

    CP(H, H(X)) = CP(H) · ( CP(X) + Pr[H(X) = H(X′) | X ≠ X′] )
                ≤ (1/D) · (1/K + 1/M)
                ≤ (1 + ε²)/(D·M).

To see the penultimate inequality, note that CP(H) = 1/D because
there are D hash functions, CP(X) ≤ 1/K because H_∞(X) ≥ k, and
Pr[H(X) = H(X′) | X ≠ X′] ≤ 1/M by 2-universality.
Proof of (2):

    ‖(H, H(X)) − U_d × U_m‖² = CP(H, H(X)) − 1/(D·M)
                             ≤ (1 + ε²)/(D·M) − 1/(D·M)
                             = ε²/(D·M).
Proof of (3): Recalling that the statistical difference between two
random variables X and Y is equal to (1/2)·|X − Y|_1, we have:

    Δ((H, H(X)), U_d × U_m) = (1/2) · |(H, H(X)) − U_d × U_m|_1
                            ≤ (1/2) · √(D·M) · ‖(H, H(X)) − U_d × U_m‖
                            ≤ (1/2) · √(D·M) · √(ε²/(D·M))
                            = ε/2.

Thus, we have in fact obtained a strong (k, ε/2)-extractor.
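To make the Leftover Hash Lemma concrete, here is a Python sketch of one standard pairwise-independent family, the affine maps h_{A,b}(x) = Ax ⊕ b over GF(2). (This particular representation of the family uses nm + m seed bits, which is correct but longer than the Θ(n)-bit descriptions achievable, e.g., via Toeplitz matrices; all names below are ours.)

    import random

    def hash_affine(rows, b, x, m):
        """h_{A,b}(x) = A.x xor b over GF(2): row i of A is an n-bit mask,
        and output bit i is <rows[i], x> xor b_i.  The family of all such
        maps is pairwise independent, so by Theorem 6.18,
        Ext(x, (A, b)) = h_{A,b}(x) is a strong (k, eps)-extractor when
        m = k - 2*log(1/eps)."""
        out = 0
        for i in range(m):
            bit = bin(rows[i] & x).count("1") & 1   # inner product mod 2
            out = (out << 1) | (bit ^ ((b >> i) & 1))
        return out

    n, m = 16, 4
    A = [random.getrandbits(n) for _ in range(m)]   # the seed: A and b
    b = random.getrandbits(m)
    print(hash_affine(A, b, x=0b1011001110001101, m=m))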

The proof above actually shows that Ext(x, h) = h(x) extracts with
respect to collision probability, or equivalently, with respect to the
ℓ2-norm. This property may be expressed in terms of Rényi entropy
H_2(Z) := log(1/CP(Z)). Indeed, we can define Ext : {0,1}^n × {0,1}^d →
{0,1}^m to be a (k, ε) Rényi-entropy extractor if H_2(X) ≥ k implies
H_2(Ext(X, U_d)) ≥ m − ε (or H_2(U_d, Ext(X, U_d)) ≥ m + d − ε for strong
Rényi-entropy extractors). Then the above proof shows that pairwise-independent
hash functions yield strong Rényi-entropy extractors.

In general, it turns out that an extractor with respect to
Rényi entropy must have seed length d ≥ min{m/2, n − k} − O(1) (as
opposed to d = O(log n)); this explains why the seed length in the above
extractor is large. (See Problem 6.4.)
6.2.2 Extractors versus Expanders

Extractors have a natural interpretation as graphs. Specifically, we can
interpret an extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m as the neighbor
function of a bipartite multigraph G = ([N],[M],E) with N = 2^n
left-vertices, M = 2^m right-vertices, and left-degree D = 2^d,³ where the
r-th neighbor of left-vertex u is Ext(u, r). Typically n ≫ m, so the graph
is very unbalanced. It turns out that the extraction property of Ext
is related to various expansion properties of G. In this section, we
explore this relationship.
Let Ext : {0,1}^n × {0,1}^d → {0,1}^m be a (k, ε)-extractor and
G = ([N],[M],E) the associated graph. Recall that it suffices to
examine Ext with respect to flat k-sources; in this case, the extractor
property says that given a subset S of size K = 2^k on the left, a
random neighbor of a random element of S should be close to uniform
on the right. In particular, if S ⊆ [N] is a subset on the left of size K,
then |N(S)| ≥ (1 − ε)·M. This property is just like vertex expansion,
except that it ensures expansion only for sets of size exactly K, not
any size ≤ K. Recall that we call such a graph an (= K, A) vertex
expander (Definition 5.32). Indeed, this gives rise to the following
weaker variant of extractors.

³ This connection is the reason we use d to denote the seed length of an extractor.

Definition 6.19 (dispersers). A function Disp : {0,1}^n × {0,1}^d →
{0,1}^m is a (k, ε)-disperser if for every k-source X on {0,1}^n,
Disp(X, U_d) has a support of size at least (1 − ε)·2^m.

While extractors can be used to simulate BPP algorithms with a
weak random source, dispersers can be used to simulate RP algorithms
with a weak random source. (See Problem 6.2.)
Then, we have:

Proposition 6.20. Let n, m, d ∈ N, K = 2^k ∈ N, and ε > 0. A function
Disp : {0,1}^n × {0,1}^d → {0,1}^m is a (k, ε)-disperser iff the
corresponding bipartite multigraph G = ([N],[M],E) with left-degree
D is an (= K, A) vertex expander for A = (1 − ε)·M/K.

Note that extractors and dispersers are interesting even when
M ≪ K, so the expansion parameter A may be less than 1. Indeed,
A < 1 is interesting for vertex expanders when the graph is highly
imbalanced. Still, for an optimal extractor, we have M = Θ(ε²·KD)
(because m = k + d − 2·log(1/ε) − Θ(1)), which corresponds to
expansion factor A = Θ(ε²·D). (An optimal disperser actually gives
A = Θ(D/log(1/ε)).) Note this is smaller than the expansion factor of
D/2 in Ramanujan graphs and D − O(1) in random graphs; the reason
is that those expansion factors are for small sets, whereas here we
are asking for sets to expand to almost the entire right-hand side.
Now let's look for a graph-theoretic property that is equivalent
to the extraction property. Ext is a (k, ε)-extractor iff for every set
S ⊆ [N] of size K,

    Δ(Ext(U_S, U_[D]), U_[M]) = max_{T⊆[M]} | Pr[Ext(U_S, U_[D]) ∈ T] − Pr[U_[M] ∈ T] | ≤ ε,

where U_S denotes the uniform distribution on S. This inequality
may be expressed in graph-theoretic terms as follows. For every
set T ⊆ [M],

    Pr[Ext(U_S, U_[D]) ∈ T] − Pr[U_[M] ∈ T] = e(S,T)/(|S|·D) − |T|/M,

so, multiplying both sides by μ(S) = K/N, the extraction property is
equivalent to requiring

    | e(S,T)/(N·D) − μ(S)·μ(T) | ≤ ε·μ(S),

where e(S,T) is the number of edges from S to T (as in Definition 4.13).
Thus, we have:
Proposition 6.21. A function Ext : {0,1}^n × {0,1}^d → {0,1}^m is a
(k, ε)-extractor iff the corresponding bipartite multigraph G = ([N],
[M], E) with left-degree D has the property that | e(S,T)/(N·D) −
μ(S)·μ(T) | ≤ ε·μ(S) for every S ⊆ [N] of size K and every T ⊆ [M].

Note that this is very similar to the Expander Mixing Lemma
(Lemma 4.15), which states that if a graph G has spectral expansion λ,
then for all sets S, T ⊆ [N] we have

    | e(S,T)/(N·D) − μ(S)·μ(T) | ≤ λ·√(μ(S)·μ(T)).

It follows that if λ·√(μ(S)·μ(T)) ≤ ε·μ(S) for all S ⊆ [N] of size K and
all T ⊆ [N], then G gives rise to a (k, ε)-extractor (by turning G into a
D-regular bipartite graph with N vertices on each side in the natural
way). It suffices to have λ ≤ ε·√(K/N) for this to work.
We can use this connection to turn our explicit construction of spectral
expanders into an explicit construction of extractors. To achieve
λ ≤ ε·√(K/N), we can take an appropriate power of a constant-degree
expander. Specifically, if G_0 is a D_0-regular expander on N vertices with
bounded second eigenvalue, we can consider the t-th power of G_0, G =
G_0^t, where t = O(log((1/ε)·√(N/K))) = O(n − k + log(1/ε)). The degree
of G is D = D_0^t, so d = log D = O(t). This yields the following result:

Theorem 6.22. For every n, k ∈ N and ε > 0, there is an
explicit (k, ε)-extractor Ext : {0,1}^n × {0,1}^d → {0,1}^n with
d = O(n − k + log(1/ε)).
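Operationally, this extractor just takes a t-step walk in the expander, starting at the source sample and steered by the seed. A minimal sketch, assuming a hypothetical interface `neighbor(v, j)` returning the j-th neighbor of v in a D_0-regular expander G_0 (any explicit construction from Chapter 4 could be plugged in):

    def expander_extractor(neighbor, x, seed_symbols):
        """Extractor behind Theorem 6.22: output the endpoint of a walk in
        G = G0^t started at the source sample x.  `neighbor(v, j)` is a
        hypothetical interface giving the j-th neighbor of v in a
        D0-regular expander G0; the seed is t symbols from [D0] (so
        d = t*log(D0) bits) and the output is the final vertex, an n-bit
        string."""
        v = x
        for j in seed_symbols:   # t steps in G0 = one step in G0^t
            v = neighbor(v, j)
        return v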

Note that the seed length is significantly better than in the
construction from pairwise-independent hashing when k is close to
n, say k = n − o(n) (i.e., K = N^{1−o(1)}). The output length is n,
which is much larger than the typical output length for extractors
(usually m ≪ n). Using a Ramanujan graph (rather than an arbitrary
constant-degree expander), the seed length can be improved to
d = n − k + 2·log(1/ε) + O(1), which yields an optimal output length
n = k + d − 2·log(1/ε) − O(1).

Another way of proving Theorem 6.22 is to use the fact that a
random step on an expander decreases the ℓ2 distance to uniform, like
in the proof of the Leftover Hash Lemma. This analysis shows that
we actually get a Rényi-entropy extractor, and thus explains the large
seed length d ≈ n − k.

The following table summarizes the main differences between
classic expanders and extractors.
Table 6.1. Differences between classic expanders and extractors.

    Expanders                                   Extractors
    Measured by vertex or spectral expansion    Measured by min-entropy/statistical difference
    Typically constant degree                   Typically logarithmic or poly-logarithmic degree
    All sets of size at most K expand           All sets of size exactly (or at least) K expand
    Typically balanced                          Typically unbalanced, bipartite graphs

6.2.3 List Decoding View of Extractors
In this section, we cast extractors into the same list-decoding framework
that we used to capture list-decodable codes, samplers, and
expanders (in Section 5.3). Recall that all of these objects could be
syntactically described as functions Γ : [N] × [D] → [M], and their
properties could be captured by bounding the sizes of sets of the
form LIST(T, ε) = {x : Pr_y[Γ(x,y) ∈ T] > ε} for T ⊆ [M]. We also
considered a generalization to functions f : [M] → [0,1], where we
defined LIST(f, ε) = {x : E_y[f(Γ(x,y))] > ε}.

Conveniently, an extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m already
meets the syntax of a function Γ : [N] × [D] → [M] (matching our
convention that N = 2^n, D = 2^d, M = 2^m). The extraction property
can be described in the list-decoding framework as follows:
Proposition 6.23. Let Γ = Ext : [N] × [D] → [M], let K = 2^k ∈ N,
and ε ∈ [0,1].

(1) If Ext is a (k, ε) extractor, then for every f : [M] → [0,1], we
    have |LIST(f, μ(f) + ε)| < K.
(2) Suppose that for every T ⊆ [M], we have
    |LIST(T, μ(T) + ε)| ≤ K. Then Ext is a (k + log(1/ε), 2ε) extractor.

Proof.
(1) Suppose for contradiction that |LIST(f, μ(f) + ε)| ≥ K.
Let X be uniformly distributed over LIST(f, μ(f) + ε).
Then X is a k-source, and

    E[f(Ext(X, U_[D]))] = E_{x←X}[f(Ext(x, U_[D]))]
                        > μ(f) + ε
                        = E[f(U_[M])] + ε.

By Problem 6.1, this implies that Ext(X, U_[D]) and U_[M]
are ε-far, contradicting the hypothesis that Ext is a (k, ε)
extractor.

(2) Let X be any (k + log(1/ε))-source. We need to show that
Ext(X, U_[D]) is 2ε-close to U_[M]. That is, we need to show
that for every T ⊆ [M], Pr[Ext(X, U_[D]) ∈ T] ≤ μ(T) + 2ε.
So let T be any subset of [M]. Then

    Pr[Ext(X, U_[D]) ∈ T]
      ≤ Pr[X ∈ LIST(T, μ(T) + ε)]
        + Pr[Ext(X, U_[D]) ∈ T | X ∉ LIST(T, μ(T) + ε)]
      ≤ |LIST(T, μ(T) + ε)| · 2^{−(k+log(1/ε))} + (μ(T) + ε)
      ≤ K · 2^{−(k+log(1/ε))} + μ(T) + ε
      = μ(T) + 2ε.
The proposition does not give an exact list-decoding characterization
of extractors, as the two parts are not exactly converses of each other.
One difference is the extra log(1/ε) bits of entropy and the factor of 2
in ε appearing in Part 2. These are typically insignificant differences
for extractors; indeed even an optimal extractor loses Θ(log(1/ε)) bits
of entropy (cf. Theorem 6.14). A second difference is that Part 1
shows that extractors imply bounds on |LIST(f, μ(f) + ε)| even for
fractional functions f, whereas Part 2 only requires bounds on
|LIST(T, μ(T) + ε)| to deduce the extractor property. This only makes
the result stronger, and indeed we will utilize this below.

Notice that the conditions characterizing extractors here are identical
to the ones characterizing averaging samplers in Proposition 5.30.
Actually, the condition in Part 1 is the one characterizing averaging
samplers, whereas the condition in Part 2 is the one characterizing
boolean averaging samplers. Thus we have:
Corollary 6.24. Let Ext : [N] × [D] → [M] and Samp : [N] → [M]^D
be such that Ext(x, y) = Samp(x)_y. Then:

(1) If Ext is a (k, ε) extractor, then Samp is a (K/N, ε) averaging
    sampler, where K = 2^k.
(2) If Samp is a (K/N, ε) boolean averaging sampler, then Ext
    is a (k + log(1/ε), 2ε) extractor.
(3) If Samp is a (δ, ε) boolean averaging sampler, then Samp is
    a (δ/ε, 2ε) averaging sampler.

Thus, the only real difference between extractors and averaging
samplers is one of perspective, and both perspectives can be useful.
For example, in samplers, we measure the error probability δ = K/N =
2^k/2^n, whereas in extractors we measure the min-entropy threshold k
on its own. Thus, the sampler perspective can be more natural when
δ is relatively large compared to 1/N, and the extractor perspective
when δ becomes quite close to 1/N. Indeed, an extractor for min-entropy
k = o(n) corresponds to a sampler with error probability δ =
1/2^{(1−o(1))n}, which means that each of the n bits of randomness used
by the sampler reduces the error probability by almost a factor of 2!

We can now also describe the close connection between strong
extractors and list-decodable codes when the alphabet size/output
length is small.
Proposition 6.25. Let Ext : [N] × [D] → [M] and Enc : [N] → [M]^D
be such that Ext(x, y) = Enc(x)_y, and let K = 2^k ∈ N.

(1) If Ext is a strong (k, ε) extractor, then Enc is
    (1 − 1/M − ε, K) list-decodable.
(2) If Enc is (1 − 1/M − ε, K) list-decodable, then Ext is a
    (k + log(1/ε), M·ε) strong extractor.

Proof.
(1) Follows from Corollaries 5.31 and 6.24.
(2) Let X be a (k + log(1/ε))-source and Y = U_[D]. Then the
statistical difference between (Y, Ext(X,Y)) and Y × U_[M]
equals

    Δ((Y, Ext(X,Y)), Y × U_[M]) = E_{y←Y}[ Δ(Ext(X,y), U_[M]) ]
                                = E_{y←Y}[ Δ(Enc(X)_y, U_[M]) ]
                                ≤ (M/2) · E_{y←Y}[ max_z ( Pr[Enc(X)_y = z] − 1/M ) ],

where the last inequality follows from the ℓ1 formulation of
statistical difference.

So if we define r ∈ [M]^D by setting r_y to be the value z
maximizing Pr[Enc(X)_y = z] − 1/M, we have:

    Δ((Y, Ext(X,Y)), Y × U_[M]) ≤ (M/2) · ( Pr[(Y, Enc(X)_Y) ∈ T_r] − 1/M )
                                ≤ (M/2) · ( Pr[X ∈ LIST(T_r, 1/M + ε)] + ε )
                                ≤ (M/2) · ( 2^{−(k+log(1/ε))} · K + ε )
                                ≤ M·ε.
Note that the quantitative relationship between extractors and
list-decodable codes given by Proposition 6.25 deteriorates extremely
fast as the output length/alphabet size increases. Nevertheless, the
list-decoding view of extractors as given in Proposition 6.23 turns out
to be quite useful.

6.3 Constructing Extractors

In the previous sections, we have seen that very good extractors
exist: ones extracting almost all of the min-entropy from a source with
only a logarithmic seed length. But the explicit constructions we have
seen (via universal hashing and spectral expanders) are still quite far
from optimal in seed length, and in particular cannot be used to give
a polynomial-time simulation of BPP with a weak random source.
Fortunately, much better extractor constructions are known:
ones that extract any constant fraction of the min-entropy using
a logarithmic seed length, or extract all of the min-entropy using
a polylogarithmic seed length. In this section, we will see how to
construct such extractors.

6.3.1 Block Sources

We introduce a useful model of sources that has more structure than
an arbitrary k-source:
Definition 6.26. A random variable X = (X_1, X_2, ..., X_t) is a (k_1, k_2,
..., k_t) block source if for every x_1, ..., x_{i−1}, X_i|_{X_1=x_1,...,X_{i−1}=x_{i−1}} is a
k_i-source. If k_1 = k_2 = ··· = k_t = k, then we call X a t × k block source.

Note that a (k_1, k_2, ..., k_t) block source is also a (k_1 + ··· + k_t)-source,
but it comes with additional structure: each block is
guaranteed to contribute some min-entropy. Thus, extracting randomness
from block sources is an easier task than extracting from general
sources.
The study of block sources has a couple of motivations.

• They are a natural and plausible model of sources in their
  own right. Indeed, they are more general than the unpredictable-bit
  sources of Section 6.1.1: if X ∈ UnpredBits_{n,δ} is broken
  into t blocks of length ℓ = n/t, then the result is a t × ℓδ′
  block source, where δ′ = log(1/(1−δ)).
• We can construct extractors for general weak sources by
  converting a general weak source into a block source. We
  will see how to do this later in the section.
We now illustrate how extracting from block sources is easier than
from general sources. The idea is that we can extract almost-uniform
bits from later blocks that are essentially independent of earlier blocks,
and hence use these as a seed to extract more bits from the earlier
blocks. Specifically, for the case of two blocks we have the following:

Lemma 6.27. Let Ext_1 : {0,1}^{n_1} × {0,1}^{d_1} → {0,1}^{m_1} be a (k_1, ε_1)-extractor,
and Ext_2 : {0,1}^{n_2} × {0,1}^{d_2} → {0,1}^{m_2} be a (k_2, ε_2)-extractor
with m_2 ≥ d_1. Define Ext′((x_1, x_2), y_2) = (Ext_1(x_1, y_1), z_2),
where (y_1, z_2) is obtained by partitioning Ext_2(x_2, y_2) into a prefix y_1
of length d_1 and a suffix z_2 of length m_2 − d_1.

Then for every (k_1, k_2) block source X = (X_1, X_2) taking values
in {0,1}^{n_1} × {0,1}^{n_2}, it holds that Ext′(X, U_{d_2}) is (ε_1 + ε_2)-close to
U_{m_1} × U_{m_2−d_1}.

Proof. Since X_2 is a k_2-source conditioned on any value of X_1 and Ext_2
is a (k_2, ε_2)-extractor, it follows that (X_1, Y_1, Z_2) = (X_1, Ext_2(X_2, U_{d_2}))
is ε_2-close to (X_1, U_{m_2}) = (X_1, U_{d_1}, U_{m_2−d_1}).

Thus, (Ext_1(X_1, Y_1), Z_2) is ε_2-close to (Ext_1(X_1, U_{d_1}), U_{m_2−d_1}),
which is ε_1-close to (U_{m_1}, U_{m_2−d_1}) because X_1 is a k_1-source and Ext_1
is a (k_1, ε_1)-extractor.

By the triangle inequality, Ext′(X, U_{d_2}) = (Ext_1(X_1, Y_1), Z_2) is
(ε_1 + ε_2)-close to (U_{m_1}, U_{m_2−d_1}).
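A minimal Python sketch of this composition, with `ext1` and `ext2` placeholders for any extractors satisfying the hypotheses of Lemma 6.27 (in particular m_2 ≥ d_1), and bit strings represented as strings of '0'/'1':

    def block_extract(ext1, ext2, d1, x1, x2, y2):
        """Ext'((x1, x2), y2) = (Ext1(x1, y1), z2) from Lemma 6.27: run
        Ext2 on the second block, use a d1-bit prefix of its output as the
        seed for Ext1 on the first block, and keep the rest.  ext1/ext2
        are placeholder callables on '0'/'1' strings with m2 >= d1."""
        r = ext2(x2, y2)            # m2 almost-uniform bits
        y1, z2 = r[:d1], r[d1:]     # seed for ext1, plus leftover output
        return ext1(x1, y1) + z2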
The benefit of this composition is that the seed length of Ext′
depends on only one of the extractors (namely Ext_2) rather than being
the sum of the seed lengths. (If this is reminiscent of the zig-zag
product, it is because they are closely related; see Section 6.3.5.)
Thus, we get to extract from multiple blocks at the price of one.
Moreover, since we can take d_1 = m_2, which is typically much larger
than d_2, the seed length of Ext′ can even be smaller than that of Ext_1.

The lemma extends naturally to extracting from many blocks:
Lemma 6.28. For i = 1, ..., t, let Ext_i : {0,1}^{n_i} × {0,1}^{d_i} → {0,1}^{m_i}
be a (k_i, ε_i)-extractor, and suppose that m_i ≥ d_{i−1} for every i = 1, ..., t,
where we define d_0 = 0. Define Ext′((x_1, ..., x_t), y_t) = (z_1, ..., z_t), where
for i = t, ..., 1, we inductively define (y_{i−1}, z_i) to be a partition of
Ext_i(x_i, y_i) into a d_{i−1}-bit prefix and an (m_i − d_{i−1})-bit suffix.

Then for every (k_1, ..., k_t) block source X = (X_1, ..., X_t) taking
values in {0,1}^{n_1} × ··· × {0,1}^{n_t}, it holds that Ext′(X, U_{d_t}) is ε-close to
U_m for ε = Σ_{i=1}^t ε_i and m = Σ_{i=1}^t (m_i − d_{i−1}).
We remark that this composition preserves strongness: if each of
the Ext_i's corresponds to a strong extractor in the sense that its seed
is a prefix of its output, then Ext′ will also correspond to a strong
extractor. If in addition d_1 = d_2 = ··· = d_t, then this construction can
be seen as simply using the same seed to extract from all blocks.

Already with this simple composition, we can simulate BPP with
an unpredictable-bit source (even though deterministic extraction
from such sources is impossible by Proposition 6.6). As noted above,
by breaking an unpredictable-bit source X with parameter δ into
blocks of length ℓ, we obtain a t × k block source for t = n/ℓ, k = ℓδ′,
and δ′ = log(1/(1−δ)).

Suppose that δ is a constant. Set ℓ = (10/δ′)·log n, so that X
is a t × k block source for k = 10·log n, and define ε = n^{−2}. Letting
Ext : {0,1}^ℓ × {0,1}^d → {0,1}^{d+m} be the (k, ε) extractor using
universal hash functions (Theorem 6.18), we have:

    d = O(ℓ) = O(log n) and
    m = k − 2·log(1/ε) − O(1) > k/2.
Composing Ext with itself t times as in Lemma 6.27, we obtain
Ext′ : {0,1}^{tℓ} × {0,1}^d → {0,1}^{d+tm} such that Ext′(X, U_d) is ε′-close
to uniform, for ε′ = tε ≤ 1/n. (Specifically, Ext′((x_1, ..., x_t), h) =
(h, h(x_1), ..., h(x_t)).) This tells us that Ext′ essentially extracts half of
the min-entropy from X, given a random seed of logarithmic length.
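In code, the composed extractor is just one hash function applied to every block. A sketch, where `hash_family` is a stand-in mapping a seed to a member of a pairwise-independent family (such as the affine GF(2) maps sketched earlier):

    def unpred_block_extract(seed, hash_family, blocks):
        """Ext'((x_1, ..., x_t), h) = (h, h(x_1), ..., h(x_t)).  The same
        hash function h = hash_family(seed) is reused on every block,
        which is why the seed length stays O(l) = O(log n) no matter how
        many blocks are extracted.  `hash_family` is a placeholder mapping
        a seed to a concrete pairwise-independent hash function."""
        h = hash_family(seed)
        return (seed,) + tuple(h(x) for x in blocks)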
Plugging this extractor into the construction of Proposition 6.15 gives
us the following result.

Theorem 6.29. For every constant δ > 0, we can simulate BPP
with an unpredictable-bit source of parameter δ. More precisely, for
every L ∈ BPP and every constant δ > 0, there is a polynomial-time
algorithm A and a polynomial q such that for every w ∈ {0,1}* and
every source X ∈ UnpredBits_{q(|w|),δ}, the probability that A(w; X) errs
is at most 1/|w|.
6.3.2 Reducing General Sources to Block Sources

Given the results of the previous section, a common approach to
constructing extractors for general k-sources is to reduce the case of
general k-sources to that of block sources.
One approach to doing this is as follows. Given a k-source X of
length n, where k = δn, pick a (pseudo)random subset S of the bits of
X, and let W = X|_S be the bits of X in those positions. If the set S
is of size ℓ, then we expect that W will have at least roughly δℓ bits of
min-entropy (with high probability over the choice of S). Moreover,
W reveals at most ℓ bits of information, so if ℓ < δn, intuitively there
should still be min-entropy left in X. (This is justified by Lemma 6.30
below.) Thus, the pair (W, X) should be a block source. This approach
can be shown to work for appropriate ways of sampling the set S, and
recursive applications of it formed the original approach to constructing
good extractors (and is still useful in various contexts today). The
fact mentioned above, that conditioning on a string of length ℓ reduces
min-entropy by at most ℓ bits, is given by the following lemma (which
is very useful when working with min-entropy).
Lemma 6.30 (chain rule for min-entropy). If (W, X) are two
jointly distributed random variables, where (W, X) is a k-source and
|Supp(W)| ≤ 2^ℓ, then for every ε > 0, it holds that with probability at
least 1 − ε over w ← W, X|_{W=w} is a (k − ℓ − log(1/ε))-source.

The proof of this lemma is left as an exercise (Problem 6.1).
This is referred to as the "chain rule" for min-entropy by analogy
with the chain rule for Shannon entropy, which states that
H_Sh(X|W) = H_Sh(W, X) − H_Sh(W), where the conditional Shannon
entropy is defined to be H_Sh(X|W) = E_{w←W}[H_Sh(X|_{W=w})].
Thus, if H_Sh(W, X) ≥ k and W is of length at most ℓ, we have
H_Sh(X|W) ≥ k − ℓ. The chain rule for min-entropy is not quite as
clean; we need to assume that W has small support (rather than just
small min-entropy) and we lose log(1/ε) bits of additional min-entropy.
(The log(1/ε) bits of entropy can be saved by using an appropriate
notion of conditional min-entropy; see Problems 6.7 and 6.8.)
Another approach, which we will follow, is based on the observation
that every source of high min-entropy rate (namely, greater than 1/2)
is (close to) a block source, as shown by the lemma below. Thus, we
will try to convert arbitrary sources into ones of high min-entropy rate.

Lemma 6.31. If X is an (n − Δ)-source of length n, and X = (X_1, X_2)
is a partition of X into blocks of lengths n_1 and n_2, then for every ε > 0,
(X_1, X_2) is ε-close to some (n_1 − Δ, n_2 − Δ − log(1/ε)) block source.

Consider Δ = δn for a constant δ < 1/2, and n_1 = n_2 = n/2. Then
each block contributes min-entropy at least (1/2 − δ)n − log(1/ε). The
proofs of Lemmas 6.30 and 6.31 are left as exercises (Problem 6.1).

6.3.3 Condensers

The previous section left us with the problem of converting a general
k-source into one of high min-entropy rate. We will do this via the
following kind of object:

Definition 6.32. A function Con : {0,1}^n × {0,1}^d → {0,1}^m is a
(k →_ε k′) condenser if for every k-source X on {0,1}^n, Con(X, U_d) is
ε-close to some k′-source. Con is lossless if k′ = k + d.
If k′/m > k/n, then the condenser increases the min-entropy rate,
intuitively making extraction an easier task. Indeed, condensers with
k′ = m are simply extractors themselves.

Like extractors, it is often useful to view condensers graph-theoretically.
Specifically, we think of Con as the neighbor function of
a bipartite multigraph G with N = 2^n left-vertices, left-degree D = 2^d,
and M = 2^m right-vertices.

The lossless condensing property of Con turns out to be equivalent
to G having vertex expansion close to the degree:
Proposition 6.33. Let n, d, m ∈ N, K = 2^k ∈ N, and ε > 0. A function
Con : {0,1}^n × {0,1}^d → {0,1}^m is a (k →_ε k + d) lossless condenser if
and only if the corresponding bipartite multigraph G = ([N],[M],E)
of left-degree D is an (= K, (1 − ε)·D) vertex expander.

Proof. ⇒: Let S ⊆ [N] be any set of size K. Then U_S is a
k-source, so Con(U_S, U_d) is ε-close to a (k + d)-source. This
implies that |Supp(Con(U_S, U_d))| ≥ (1 − ε)·2^{k+d}. Noting that
Supp(Con(U_S, U_d)) = N(S) completes the proof.

⇐: By Lemma 6.10, it suffices to prove that for every subset
S ⊆ [N] of size K, it holds that Con(U_S, U_d) is ε-close to a (k + d)-source.
By expansion, we know that |N(S)| ≥ (1 − ε)·KD. Since
there are only KD edges leaving S, by redirecting at most εKD of the edges,
we can ensure that they all go to KD distinct vertices in [M]. The
uniform distribution on these KD vertices is a (k + d)-source that is
ε-close to Con(U_S, U_d).

Recall that vertex expansion normally corresponds to the disperser
property (see Proposition 6.20), which is weaker than extraction and
condensing. Indeed, vertex expansion generally does not guarantee
much about the distribution induced on a random neighbor of a set
S, except that its support is large. However, in case the expansion is
very close to the degree (A = (1 − ε)·D), then the distribution must
be nearly flat (as noted in the above proof).

By applying Proposition 6.33 to the expanders based on Parvaresh–Vardy
codes (Theorem 5.35), we get the following lossless condenser:
Theorem 6.34. For every constant α > 0, for all positive integers
n ≥ k and all ε > 0, there is an explicit (k →_ε k + d) lossless
condenser Con : {0,1}^n × {0,1}^d → {0,1}^m with d =
O(log n + log(1/ε)) and m = (1 + α)k + O(log(n/ε)).

Note that setting α to be a small constant, we obtain an output
min-entropy rate arbitrarily close to 1.
Problem 6.5 gives a simple extractor Ext for sources of high min-entropy rate
when the error parameter ε is constant. Applying that extractor to the
output of the above condenser, we obtain extractors with a seed length
of d = O(log n) that extract Ω(k) almost-uniform bits (with constant
error ε) from sources of any desired min-entropy k. In the next section,
we will use the above condenser to give an efficient construction of
extractors for arbitrary values of ε.

We remark that having output min-entropy rate
bounded away from 1 is not inherent for lossless condensers. Nonconstructively,
there exist lossless condensers with output length m =
k + d + log(1/ε) + O(1), and Open Problem 5.36 about expanders
can be restated in the language of lossless condensers as follows:

Open Problem 6.35. Give an explicit construction of a (k →_ε k + d)
lossless condenser Con : {0,1}^n × {0,1}^d → {0,1}^m with d = O(log n),
m = k + d + O(1), and ε = 0.01.

If we had such a condenser, then we could get extractors that
extract all but O(1) bits of the min-entropy by then applying the extractors
based on spectral expanders (Theorem 6.22).
6.3.4 The Extractor

In this section, we will use the ideas outlined in the previous section,
namely condensing and block-source extraction, to construct an
extractor that is optimal up to constant factors.

Theorem 6.36. For all positive integers n ≥ k and all ε > 0, there
is an explicit (k, ε) extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with
m ≥ k/2 and d = O(log(n/ε)).
We will use the following building block, constructed in Problem 6.9.

Lemma 6.37. For every constant t > 0 and all positive integers
n ≥ k and all ε > 0, there is an explicit (k, ε) extractor Ext :
{0,1}^n × {0,1}^d → {0,1}^m with m ≥ k/2 and d = k/t + O(log(n/ε)).

The point is that this extractor has a seed length that is an
arbitrarily large constant factor (approximately t/2) smaller than its
output length. Thus, if we use it as Ext_2 in the block-source extraction
of Lemma 6.27, the resulting seed length will be smaller than that of
Ext_1 by an arbitrarily large constant factor. (The seed length of the
composed extractor Ext′ in Lemma 6.27 is the same as that of Ext_2,
which will be a constant factor smaller than its output length m_2,
which we can take to be equal to the seed length d_1 of Ext_1.)
Overview of the Construction. Note that for small min-entropies k,
namely k = O(log(n/ε)), the extractor we want is already
given by Lemma 6.37, with seed length d smaller than the output
length m by any constant factor. (If we allow d ≥ m, then extraction
is trivial: just output the seed.) Thus, our goal will be to recursively
construct extractors for large min-entropies using extractors for smaller
min-entropies. Of course, if Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k_0, ε)
extractor, say with m = k_0/2, then it is also a (k, ε) extractor for every
k ≥ k_0. The problem is that the output length is only k_0/2 rather than
k/2. Thus, we need to increase the output length. This can be achieved
by simply applying extractors for smaller min-entropies several times:
Lemma 6.38. Suppose Ext_1 : {0,1}^n × {0,1}^{d_1} → {0,1}^{m_1} is a (k_1, ε_1)
extractor and Ext_2 : {0,1}^n × {0,1}^{d_2} → {0,1}^{m_2} is a (k_2, ε_2) extractor
for k_2 = k_1 − m_1 − log(1/ε_3). Then Ext′ : {0,1}^n × {0,1}^{d_1+d_2} →
{0,1}^{m_1+m_2} defined by Ext′(x, (y_1, y_2)) = (Ext_1(x, y_1), Ext_2(x, y_2)) is a
(k_1, ε_1 + ε_2 + ε_3) extractor.

The proof of this lemma follows from Lemma 6.30: after conditioning
a k_1-source X on W = Ext_1(X, U_{d_1}), X still has min-entropy
at least k_1 − m_1 − log(1/ε_3) = k_2 (except with probability ε_3), and
thus Ext_2(X, U_{d_2}) can extract an additional m_2 almost-uniform bits.

To see how we might apply this, consider setting k_1 = 0.8k,
m_1 = k_1/2, ε_1 = ε_2 = ε_3 = ε = 2^{−0.1k}, k_2 = k_1 − m_1 − log(1/ε_3) ∈
[0.3k, 0.4k], and m_2 = k_2/2. Then we obtain a (k, 3ε) extractor Ext′
with output length m = m_1 + m_2 > k/2 from two extractors for
min-entropies k_1, k_2 that are smaller than k by a constant factor, and
we can hope to construct the latter two extractors recursively via the
same construction.
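The composition of Lemma 6.38 is equally simple in code: split the seed, run both extractors on the same source sample, and concatenate. In this sketch, `ext1` and `ext2` are placeholders for extractors with k_2 = k_1 − m_1 − log(1/ε_3):

    def concat_extract(ext1, ext2, d1, x, y):
        """Ext'(x, (y1, y2)) = (Ext1(x, y1), Ext2(x, y2)) from Lemma 6.38.
        The seed y is split into a d1-bit prefix for ext1 and a suffix for
        ext2, both extractors are run on the same source sample x, and the
        outputs are concatenated.  ext1/ext2 are placeholder callables on
        '0'/'1' strings."""
        y1, y2 = y[:d1], y[d1:]
        return ext1(x, y1) + ext2(x, y2)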
Now, however, the problem is that the seed length grows by
a constant factor in each level of recursion (e.g., if d_1 = d_2 = d in
Lemma 6.38, we get seed length 2d rather than d). Fortunately, block-source
extraction using the extractor of Lemma 6.37 gives us a method
to reduce the seed length by a constant factor. (The seed length of the
composed extractor Ext′ in Lemma 6.27 is the same as that of Ext_2,
which will be a constant factor smaller than its output length m_2,
which we can take to be equal to the seed length d_1 of Ext_1. Thus, the
seed length of Ext′ will be a constant factor smaller than that of Ext_1.)
In order to apply block-source extraction, we first need to convert our
source to a block source; by Lemma 6.31, we can do this by using the
condenser of Theorem 6.34 to make its entropy rate close to 1.
One remaining issue is that the error still grows by a constant
factor in each level of recursion. However, we can start with
polynomially small error at the base of the recursion, and there are only
logarithmically many levels of recursion, so we can afford this blow-up.

We now proceed with the proof details. It will be notationally
convenient to do the steps in the reverse order from the description
above: first we will reduce the seed length by a constant factor via
block-source extraction, and then apply Lemma 6.38 to increase the
output length.
Proof of Theorem 6.36. Fix n ∈ N and ε_0 > 0. Set d = c·log(n/ε_0)
for an error parameter ε_0 and a sufficiently large constant c to be
determined in the proof below. (To avoid ambiguity, we will keep
the dependence on c explicit throughout the proof, and all big-Oh
notation hides universal constants independent of c.) For k ∈ [0,n], let
i(k) be the smallest nonnegative integer i such that k ≤ 2^i·8d. This
will be the level of recursion in which we handle min-entropy k; note
that i(k) ≤ log k ≤ log n.

For every k ∈ [0,n], we will construct an explicit Ext_k :
{0,1}^n × {0,1}^d → {0,1}^{k/2} that is a (k, ε_{i(k)}) extractor, for an
appropriate sequence ε_0 ≤ ε_1 ≤ ε_2 ≤ ···. Note that we require the seed
length to remain d and the fraction of min-entropy extracted to remain
1/2 for all values of k. The construction will be by induction on i(k).

Base Case: i(k) = 0, i.e., k ≤ 8d. The construction of Ext_k follows
from Lemma 6.37, setting t = 9 and taking c to be a sufficiently large
constant.
Inductive Case: We construct Ext_k for i(k) ≥ 1 from extractors
Ext_{k′} with i(k′) < i(k) as follows. Given a k-source X of length n,
Ext_k works as follows.

(1) We apply the condenser of Theorem 6.34 to convert X into
    a source X′ that is ε_0-close to a k-source of length (9/8)k +
    O(log(n/ε_0)). This requires a seed of length O(log(n/ε_0)).
(2) We divide X′ into two equal-sized halves (X_1, X_2). By
    Lemma 6.31, (X_1, X_2) is 2ε_0-close to a 2 × k′ block source for
    k′ = k/2 − k/8 − O(log(n/ε_0)).
    Note that i(k′) < i(k). Since i(k) ≥ 1, we also have
    k′ ≥ 3d − O(log(n/ε_0)) ≥ 2d, for a sufficiently large choice
    of the constant c.
(3) Now we apply block-source extraction as in Lemma 6.27.
    We take Ext_2 to be a (2d, ε_0) extractor from Lemma 6.37
    with parameter t = 16, which will give us m_2 = d output
    bits using a seed of length d_2 = (2d)/16 + O(log(n/ε_0)).
    For Ext_1, we use our recursively constructed Ext_{k′}, which
    has seed length d, error ε_{i(k′)}, and output length k′/2 ≥ k/6
    (where the latter inequality holds for a sufficiently large
    choice of the constant c, because k > 8d = 8c·log(n/ε_0)).

All in all, our extractor so far has seed length at most
d/8 + O(log(n/ε_0)), error at most ε_{i(k)−1} + O(ε_0), and output
length at least k/6. This would be sufficient for our induction except
that the output length is only k/6 rather than k/2. We remedy this
by applying Lemma 6.38.
With one application of the extractor above, we extract at least
m_1 = k/6 bits of the source min-entropy. Then with another
application of the extractor above for min-entropy threshold
k_2 = k − m_1 − log(1/ε_0) = 5k/6 − log(1/ε_0), by Lemma 6.38, we
extract another (5k/6 − log(1/ε_0))/6 bits, and so on. After four
applications, we have extracted all but (5/6)^4·k + O(log(1/ε_0)) ≤ k/2 bits of
the min-entropy. Our seed length is then 4·(d/8 + O(log(n/ε_0))) ≤ d
and the total error is ε_{i(k)} = O(ε_{i(k)−1}).

Solving the recurrence for the error, we get ε_i = 2^{O(i)}·ε_0 ≤
poly(n)·ε_0, so we can obtain error ε by setting ε_0 = ε/poly(n). As far
as explicitness, we note that computing Ext_k consists of four evaluations
of our condenser from Theorem 6.34, four evaluations of Ext_{k′}
for values of k′ such that i(k′) ≤ i(k) − 1, four evaluations of the
explicit extractor from Lemma 6.37, and simple string manipulations
that can be done in time poly(n, d). Thus, the total computation time
is at most 4^{i(k)}·poly(n, d) = poly(n, d).
Repeatedly applying Lemma 6.38 using extractors from
Theorem 6.36, we can extract any constant fraction of the min-entropy
using a logarithmic seed length, and all the min-entropy using a
polylogarithmic seed length.

Corollary 6.39. The following holds for every constant α > 0.
For every n ∈ N, k ∈ [0,n], and ε > 0, there is an explicit (k, ε)
extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with m ≥ (1 − α)k and
d = O(log(n/ε)).

Corollary 6.40. For every n ∈ N, k ∈ [0,n], and ε > 0, there is
an explicit (k, ε) extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with
m = k − O(log(1/ε)) and d = O(log k · log(n/ε)).

We remark that the above construction can be modified to yield
strong extractors achieving the same output lengths as above (so the
entropy of the seed need not be lost in Corollary 6.40).
A summary of the extractor parameters we have seen is in the
following table.

Table 6.2. Parameters for some constructions of (k, 0.01) extractors.

    Method                          Seed Length d          Output Length m
    Optimal and Nonconstructive     log(n − k) + O(1)      k + d − O(1)
    Necessary for BPP Simulation    O(log n)               k^{Ω(1)}
    Spectral Expanders              O(n − k)               n
    Pairwise Independent Hashing    O(n)                   k + d − O(1)
    Corollary 6.39                  O(log n)               (1 − α)k, any constant α > 0
    Corollary 6.40                  O(log² n)              k − O(1)

While Theorem 6.36 and Corollary 6.39 give extractors that are
optimal up to constant factors in both the seed length and output
length, it remains an important open problem to get one or both of
these to be optimal to within an additive constant while keeping the
other optimal to within a constant factor.

Open Problem 6.41. Give an explicit construction of (k, 0.01) extractors
Ext : {0,1}^n × {0,1}^d → {0,1}^m with seed length d = O(log n)
and output length m = k + d − O(1).

200

Randomness Extractors

By using the condenser of Theorem 6.34, it suffices to achieve the above for high min-entropy rate, e.g., k = 0.99n. Alternatively, a better condenser construction would also resolve the problem. (See Open Problem 6.35.) We note that there is a recent construction of extractors with seed length d = O(log n) and output length m = (1 − 1/polylog(n))·k (improving the output length m = Ω(k) of Corollary 6.39, in the case of constant or slightly subconstant error).

Open Problem 6.42. Give an explicit construction of (k, 0.01) extractors Ext : {0,1}^n × {0,1}^d → {0,1}^m with seed length d = log n + O(1) and m = Ω(k) (or even m = k^{Ω(1)}).
One of the reasons that these open problems are significant is that, in many applications of extractors, the resulting complexity depends exponentially on the seed length d and/or the entropy loss k + d − m. (An example is the simulation of BPP with weak random sources given by Proposition 6.15.) Thus, additive constants in these parameters correspond to constant multiplicative factors in complexity.
Another open problem is more aesthetic in nature. The construction of Theorem 6.36 makes use of the condenser of Theorem 6.34, the Leftover Hash Lemma (Theorem 6.18), and the composition techniques of Lemmas 6.27 and 6.38 in a somewhat complex recursion. It is of interest to have a construction that is more direct. In addition to the aesthetic appeal, such a construction would likely be more practical to implement and provide more insight into extractors. In Chapter 7, we will see a very direct construction based on a connection between extractors and pseudorandom generators, but its parameters will be somewhat worse than those of Theorem 6.36. Thus the following remains open:

Open Problem 6.43. Give a direct construction of (k, ε) extractors Ext : {0,1}^n × {0,1}^d → {0,1}^m with seed length d = O(log(n/ε)) and m = Ω(k).

6.3.5 Block-Source Extraction versus the Zig-Zag Product

To further highlight the connections between extractors and expanders, here we describe how the block-source extraction method of Section 6.3.1 (which we used in the main extractor construction of Theorem 6.36) is closely related to the zig-zag graph product of Section 4.3.2.3 (which we used in the expander construction of Theorem 4.39).
Recall the block-source extraction method of Lemma 6.27: We define Ext′ : {0,1}^{n_1+n_2} × {0,1}^{d_2} → {0,1}^{m_1} by Ext′((x_1, x_2), y_2) = Ext_1(x_1, Ext_2(x_2, y_2)). (Here we consider the special case that m_2 = d_1.) Viewing the extractors as bipartite graphs, the left-vertex set is [N_1] × [N_2] and the left-degree is D_2. A random step from a vertex (x_1, x_2) ∈ [N_1] × [N_2] corresponds to taking a random step from x_2 in G_2 to obtain a right-hand vertex y_1 ∈ {0,1}^{m_2}, which we view as an edge label y_1 for G_1. We then move to the y_1-th neighbor of x_1.
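As a concrete illustration, here is a minimal Python sketch of this composition; the inner extractors ext1 and ext2 are placeholder function arguments, not any particular construction from this survey:

    def block_source_extract(ext1, ext2, x1, x2, y2):
        """Block-source extraction: Ext'((x1, x2), y2) = Ext1(x1, Ext2(x2, y2)).

        ext1, ext2: extractors, each mapping (source string, seed) -> output.
        x1, x2:     the two blocks of the block source.
        y2:         a short truly random seed for ext2.
        Special case m2 = d1: ext2's output is exactly a seed for ext1.
        """
        y1 = ext2(x2, y2)    # nearly uniform even conditioned on x1,
                             # since x2 has min-entropy given x1
        return ext1(x1, y1)  # so y1 can serve as the seed for ext1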
This is just like the first two steps of the zig-zag graph product. Why do we need a third step in the zig-zag product? It is because of the slightly different goals in the two settings. In a (spectral) expander, we consider an arbitrary initial distribution that does not have too much (Rényi) entropy, and we need to add entropy to it. In a block-source extractor, our initial distribution is constrained to be a block source (so both blocks have a certain amount of min-entropy), and our goal is to produce an almost-uniform output (even if we end up with fewer bits than the initial entropy).
Thus, in the zig-zag setting, we must consider the following extreme cases (that are ruled out for block sources):

• The second block has no entropy given the first. Here, the step using G_2 will add entropy, but not enough to make y_1 close to uniform. Thus, we have no guarantees on the behavior of the G_1-step, and we may lose entropy with it. For this reason, we keep track of the edge used in the G_1-step; that is, we remember b_1 such that x_1 is the b_1-th neighbor of z_1 = Ext_1(x_1, y_1). This ensures that the (edge-rotation) mapping (x_1, y_1) ↦ (z_1, b_1) is a permutation and does not lose any entropy. We can think of b_1 as a buffer that retains any extra entropy in (x_1, y_1) that did not get extracted into z_1. So a natural idea is to just do block-source extraction, but output (z_1, b_1) rather than just z_1. However, this runs into trouble with the next case.

• The first block has no entropy, but the second block is completely uniform given the first. In this case, the G_2-step cannot add any entropy, and the G_1-step does not add any entropy because it is a permutation. However, the G_1-step transfers entropy into z_1. So if we add another expander-step from b_1 at the end, we can argue that it will add entropy.

This gives rise to the 3-step definition of the zig-zag product.
While we analyzed the zig-zag product with respect to spectral expansion (i.e., Rényi entropy), it is also possible to analyze it in terms of a condenser-like definition (i.e., outputting distributions ε-close to having some min-entropy). It turns out that a variant of the zig-zag product for condensers leads to a construction of constant-degree bipartite expanders with expansion (1 − ε)·D for the balanced (M = N) or nearly balanced (e.g., M = Θ(N)) case. However, as mentioned in Open Problems 4.43, 4.44, and 5.36, there are still several significant open problems concerning the explicit construction of expanders with vertex expansion close to the degree, achieving expansion D − O(1) in the nonbipartite case, and achieving a near-optimal number of right-hand vertices.

6.4 Exercises

Problem 6.1 (Min-entropy and Statistical Difference).

(1) Prove that for every two random variables X and Y,

    Δ(X, Y) = max_f |E[f(X)] − E[f(Y)]| = (1/2)·|X − Y|_1,

where the maximum is over all [0,1]-valued functions f. (Hint: first identify the functions f that maximize |E[f(X)] − E[f(Y)]|.)


(2) Suppose that (W, X) are jointly distributed random variables where (W, X) is a k-source and |Supp(W)| ≤ 2^ℓ. Show that for every ε > 0, with probability at least 1 − ε over w ←R W, we have that X|_{W=w} is a (k − ℓ − log(1/ε))-source.

(3) Suppose that X is an (n − Δ)-source taking values in {0,1}^n, and we let X_1 consist of the first n_1 bits of X and X_2 the remaining n_2 = n − n_1 bits. Show that for every ε > 0, (X_1, X_2) is ε-close to some (n_1 − Δ, n_2 − Δ − log(1/ε)) block source.

Problem 6.2 (Simulating Randomized Algorithms with Weak Sources).

(1) Let A(w; r) be a randomized algorithm for computing a function f using m random bits such that A(w; U_m) has error probability at most 1/3 (i.e., for every w, Pr[A(w; U_m) ≠ f(w)] ≤ 1/3), and let Ext : {0,1}^n × {0,1}^d → {0,1}^m be a (k, 1/7)-extractor. Define A′(w; x) = maj_{y ∈ {0,1}^d}{A(w; Ext(x, y))} (breaking ties arbitrarily). Show that for every (k + t)-source X, A′(w; X) has error probability at most 2^{−t}.

(2) Let A(w; r) be a randomized algorithm for deciding a language L using m random bits such that A(w; U_m) has one-sided error probability at most 1/2 (i.e., if w ∈ L, then Pr[A(w; U_m) = 1] ≥ 1/2, and if w ∉ L, then Pr[A(w; U_m) = 1] = 0), and let Disp : {0,1}^n × {0,1}^d → {0,1}^m be a (k, 1/3)-disperser. Show how to decide L with one-sided error at most 2^{−t} given a single sample from a (k + t)-source X (with no other randomness), running A and Disp each 2^d times.
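To illustrate the construction in Part 1, here is a minimal Python sketch of the majority-vote simulation A′, specializing to Boolean output for brevity; alg and ext are placeholder function arguments:

    def simulate_with_weak_source(alg, ext, w, x, d):
        """A'(w; x) = maj over y in {0,1}^d of A(w; Ext(x, y)).

        alg: randomized algorithm, alg(w, r) -> 0 or 1.
        ext: a (k, 1/7)-extractor, ext(x, y) -> coin tosses for alg.
        x:   a single sample from the weak source.
        d:   the extractor's seed length; all 2^d seeds are enumerated.
        """
        ones = sum(alg(w, ext(x, y)) for y in range(2 ** d))
        return 1 if 2 * ones > 2 ** d else 0  # ties broken toward 0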

Problem 6.3 (Almost-Universal Hashing). A family H of functions mapping domain [N] to [M] is said to have collision probability at most δ if for every x_1 ≠ x_2 ∈ [N], we have

    Pr_{h ←R H}[h(x_1) = h(x_2)] ≤ δ.

H is ε-almost universal if it has collision probability at most (1 + ε)/M. (Note that this is a relaxation of the notion of ε-almost pairwise independence from Problem 3.4.)
(1) Show that if a family H = {h : [N] → [M]} is ε²-almost universal, then Ext(x, h) := h(x) is a (k, ε) strong extractor for k = m + 2 log(1/ε) + O(1), where m = log M.

(2) Use Problem 3.4 to deduce that for every n ∈ ℕ, k ≤ n, and ε > 0, there is a (k, ε) strong extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with d = O(k + log(n/ε)) and m = k − 2 log(1/ε) − O(1).

(3) Given a family H of functions mapping [N] to [M], we can obtain a code Enc : [N] → [M]^{|H|} by Enc(x)_h = h(x), and conversely. Show that H has collision probability at most δ iff Enc has minimum distance at least 1 − δ.

(4) Use the above connections and the list-decoding view of extractors (Proposition 6.23) to prove the Johnson Bound for small alphabets: if a code Enc : [N] → [M]^n has minimum distance at least 1 − 1/M − ε/M, then it is (1 − 1/M − √ε, O(M/ε)) list-decodable.

Problem 6.4 (Rényi extractors). Call a function Ext : {0,1}^n × {0,1}^d → {0,1}^m a (k, ε) Rényi extractor if for every source X on {0,1}^n of Rényi entropy at least k, it holds that Ext(X, U_d) has Rényi entropy at least m − ε.

(1) Prove that a (k, ε) Rényi extractor is also a (k, O(√ε)) extractor.

(2) Show that for every n, k, m ∈ ℕ with m ≤ n and ε > 0, there exists a (k, ε) Rényi extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with d = O(min{n − k + log(1/ε), m/2 + log(n/ε)}).

(3) Show that if Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k, 1) Rényi extractor, then d ≥ min{n − k, m/2} − O(1). (Hint: consider a k-source that is uniform over {x : ∃y Ext(x, y) ∈ T} for an appropriately chosen set T.)

Problem 6.5 (Extractors versus Samplers). Use the connection between extractors and averaging samplers to do the following:

(1) Prove that for all constants α, ε > 0, there is a constant β < 1 such that for all n, there is an explicit (βn, ε) extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with d ≤ log n and m ≥ (1 − α)·n.

(2) Prove that for every m ∈ ℕ and δ, ε > 0, there exists a (nonconstructive) (δ, ε) averaging sampler Samp : {0,1}^n → ({0,1}^m)^t using n = m + 2 log(1/ε) + log(1/δ) + O(1) random bits and t = O((1/ε²)·log(1/δ)) samples.

(3) Suppose we are given a constant-error BPP algorithm that uses r = r(n) random bits on inputs of length n. Show how, using the explicit extractor of Theorem 6.36, we can reduce its error probability to 2^{−ℓ} using O(r) + ℓ random bits, for any polynomial ℓ = ℓ(n). (Note that this improves upon the r + O(ℓ) given by expander walks when ℓ ≫ r.) Conclude that every problem in BPP has a randomized polynomial-time algorithm that errs for only 2^{q^{0.01}} choices of its q = q(n) random bits.

Problem 6.6 (Extracting from Unpredictable-Bit Sources).

(1) Let X be a source taking values in {0,1}^n such that for all x, y, Pr[X = x]/Pr[X = y] ≤ (1 − δ)/δ. Show that X ∈ UnpredBits_{n,δ}.

(2) Prove that for every function Ext : {0,1}^n → {0,1} and every δ > 0, there exists a source X ∈ UnpredBits_{n,δ} with parameter δ such that Pr[Ext(X) = 1] ≥ 1 − δ or Pr[Ext(X) = 0] ≥ 1 − δ. (Hint: for b ∈ {0,1}, consider X that is uniform on Ext^{−1}(b) with probability 1 − δ and is uniform on Ext^{−1}(1 − b) with probability δ.)

(3) (*) Show how to extract from sources in UnpredBits_{n,δ} using a seeded extractor with a seed of constant length. That is, the seed length should not depend on the length n of the source, but only on the bias parameter δ and the statistical difference from uniform desired. The number of bits extracted should be Ω(n).

Problem 6.7 (Average Min-Entropy). While there is no single "correct" definition of conditional min-entropy, in this problem you will explore one definition that is useful and has nice properties. For two jointly distributed random variables (X, Y), define

    H_∞(Y|X) = log( 1 / E_{x ←R X}[ max_y Pr[Y = y | X = x] ] ).

(1) Prove that H_∞(Y|X) = log(1/max_P Pr[P(X) = Y]), where the maximum is over all functions P from the domain of X to the domain of Y. So average min-entropy captures the unpredictability of Y from X.

(2) Show that if H_∞(Y|X) ≥ k, then with probability at least 1 − ε over x ←R X, we have H_∞(Y|_{X=x}) ≥ k − log(1/ε).

(3) Show that H_∞(Y) ≥ H_∞(Y|X) ≥ H_∞(X, Y) − log|Supp(X)|.

(4) Use the previous two items to deduce Problem 6.1, Part 2.

(5) Give an example showing that H_∞(Y|X) can be much smaller than H_∞(Y) − H_∞(X). Specifically, construct n-bit random variables where H_∞(Y) = n but H_∞(Y|X) and H_∞(X) are both O(1).

Problem 6.8 (Average Min-Entropy Extractors). A function Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k, ε) average min-entropy extractor if for every two jointly distributed random variables (X, Y) such that X takes values in {0,1}^n and H_∞(X|Y) ≥ k, the random variable (Ext(X, U_d), Y) is ε-close to (U_m, Y), where U_d and U_m are taken to be independent of (X, Y). By Problem 6.7, Part 2, it follows that if Ext is a (k, ε) extractor, then it is a (k + log(1/ε), 2ε) average min-entropy extractor. Below, you will show how to avoid the log(1/ε) entropy loss from this reduction.

(1) (*) Show that if Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k, ε)-extractor with k ≤ n − 1, then for every t > 0, Ext is a (k − t, 2^{t+1}·ε)-extractor. (Hint: for a statistical test T ⊆ {0,1}^m, relate the (k − t)-sources X maximizing the distinguishing advantage |Pr[Ext(X, U_d) ∈ T] − Pr[U_m ∈ T]| to the k-sources maximizing the distinguishing advantage.)

(2) Show that if Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k, ε)-extractor, then Ext is a (k, 3ε) average min-entropy extractor.

(3) Use these results and Problem 6.3 to improve Corollary 6.40, and construct, for every n ∈ ℕ, k ≤ n, and ε > 0, an explicit (k, ε) extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with seed length d = O((log k)·(log(n/ε))) and output length m = k + d − 2 log(1/ε) − O(1).

Problem 6.9 (The Building-Block Extractor). Prove Lemma 6.37: Show that for every constant t > 0, all positive integers n ≥ k, and all ε > 0, there is an explicit (k, ε)-extractor Ext : {0,1}^n × {0,1}^d → {0,1}^m with m ≥ k/2 and d = k/t + O(log(n/ε)). (Hint: convert the source into a block source with blocks of length k/O(t) + O(log(n/ε)).)

Problem 6.10 (Encryption and Deterministic Extraction). A (one-time) encryption scheme with key length n and message length m consists of an encryption function Enc : {0,1}^n × {0,1}^m → {0,1}^ℓ and a decryption function Dec : {0,1}^n × {0,1}^ℓ → {0,1}^m such that Dec(k, Enc(k, u)) = u for every k ∈ {0,1}^n and u ∈ {0,1}^m. Let K be a random variable taking values in {0,1}^n. We say that (Enc, Dec) is (statistically) ε-secure with respect to K if for every two messages u, v ∈ {0,1}^m, we have Δ(Enc(K, u), Enc(K, v)) ≤ ε. For example, the one-time pad, where n = m = ℓ and Enc(k, u) = k ⊕ u = Dec(k, u), is 0-secure (a.k.a. perfectly secure) with respect to the uniform distribution K = U_m. For a class C of sources on {0,1}^n, we say that the encryption scheme (Enc, Dec) is ε-secure with respect to C if Enc is ε-secure with respect to every K ∈ C.

(1) Show that if there exists a deterministic ε-extractor Ext : {0,1}^n → {0,1}^m for C, then there exists a 2ε-secure encryption scheme with respect to C.

(2) Conversely, use the following steps to show that if there exists an ε-secure encryption scheme (Enc, Dec) with respect to C, where Enc : {0,1}^n × {0,1}^m → {0,1}^ℓ, then there exists a deterministic 2ε-extractor Ext : {0,1}^n → {0,1}^{m − 2 log(1/ε) − O(1)} for C, provided m ≥ log n + 2 log(1/ε) + O(1).

(a) For each fixed key k ∈ {0,1}^n, define a source X_k on {0,1}^ℓ by X_k = Enc(k, U_m), and let C′ be the class of all these sources (i.e., C′ = {X_k : k ∈ {0,1}^n}). Show that there exists a deterministic ε-extractor Ext′ : {0,1}^ℓ → {0,1}^{m − 2 log(1/ε) − O(1)} for C′, provided m ≥ log n + 2 log(1/ε) + O(1).

(b) Show that if Ext′ is a deterministic ε-extractor for C′ and Enc is ε-secure with respect to C, then Ext(k) = Ext′(Enc(k, 0^m)) is a deterministic 2ε-extractor for C.

Thus, a class of sources can be used for secure encryption iff it is deterministically extractable.

Problem 6.11 (Extracting from Symbol-Fixing Sources*). A generalization of a bit-fixing source is a symbol-fixing source X taking values in Σ^n for some alphabet Σ, where a subset of the coordinates of X is fixed and the rest are uniformly distributed and independent elements of Σ. For Σ = {0, 1, 2} and k ∈ [0, n], give an explicit ε-extractor Ext : Σ^n → {0,1}^m for the class of symbol-fixing sources on Σ^n with min-entropy at least k, with m = Ω(k) and ε = 2^{−Ω(k)}. (Hint: use a random walk on a consistently labelled 3-regular expander graph.)

6.5 Chapter Notes and References

Surveys on randomness extractors and their applications are given by Nisan and Ta-Shma [301] and Shaltiel [352, 354].
The study of randomness extraction goes back to von Neumann [416], who gave the deterministic extractor described in the text for IID-bit sources. (See [130, 307] for improvements in the extraction rate.) The study of extraction was renewed by Blum [69], who showed how to extend von Neumann's method to extract perfectly uniform bits from a source generated by a finite-state Markov chain. Santha and Vazirani [346] proposed the model of unpredictable-bit sources, suggested that an extractor need only produce an output that is statistically close to uniform (rather than perfectly uniform), and proved that there is no deterministic extractor for a single unpredictable-bit source even under this relaxation (Proposition 6.6; the proof we give in Problem 6.6 is from [334]). Vazirani and Vazirani [407] showed that nevertheless every problem in BPP can be solved in polynomial time given a single unpredictable-bit source (Theorem 6.29). Chor and Goldreich [96] generalized this result to block sources, in the process introducing min-entropy to the literature on randomness extraction. Cohen and Wigderson [107] showed how to simulate BPP with any source of sufficiently high min-entropy rate. This was strengthened to any constant entropy rate by Zuckerman [426], and to polynomially small entropy rate in [343, 28].
The notion of seeded extractor we focus on in this section (Definitions 6.13 and 6.16) was introduced by Nisan and Zuckerman [303], who also gave the first construction of extractors with polylogarithmic seed length (building on [426]). A form of the Leftover Hash Lemma (for flat distributions, and where the quality of the output distribution is measured with entropy rather than statistical difference) was first proven by Bennett, Brassard, and Robert [63]. The version in Theorem 6.18 is due to Impagliazzo, Levin, and Luby [197], and the proof we give is due to Rackoff [217]. The connection between the Leftover Hash Lemma and the Johnson Bound in Problem 6.3 is due to Ta-Shma and Zuckerman [383], with the equivalence between almost-universal hashing and minimum distance being from [67]. The extractor parameters of Problem 6.3 were first obtained in [155, 372].
The study of deterministic extractors has also remained active, motivated by applications in cryptography and other settings where enumerating seeds does not work. Bit-fixing sources, first studied in [63, 98, 409], received particular attention for their applications to maintaining security when an adversary learns some bits of a cryptographic secret key [90]. (There has been work on protecting against even more severe forms of leakage, surveyed in [175].) The deterministic extractor for symbol-fixing sources in Problem 6.11 is from [232]. Problem 6.10 (encryption requires deterministic extraction) is due to Bosley and Dodis [77]. For a discussion of deterministic extraction for other models of sources, see Section 8.2.
The relevance of bipartite expanders to simulating randomized algorithms with weak sources is due to Cohen and Wigderson [107], who studied dispersers and generalizations of them (as graphs rather than functions). The expander-based extractor of Theorem 6.22 is due to Goldreich and Wigderson [174]. The connection between extractors and list-decodable codes emerged in the work of Trevisan [389], and the list-decoding view of extractors (Proposition 6.23) was crystallized by Ta-Shma and Zuckerman [383]. The equivalence between extractors and averaging samplers (Corollary 6.24) is due to Zuckerman [427] (building on previous connections between other types of samplers and expanders/dispersers [365, 107]).
The notion of entropy was introduced by Shannon in his seminal paper [361] that gave rise to the field of information theory. The many properties of Shannon entropy and related quantities and their applications to communications theory are covered in the textbook [110]. There, the fact that Shannon entropy converges to min-entropy (up to a vanishing statistical distance) when we take independent samples is called the Asymptotic Equipartition Property. Rényi entropy (indeed, the entropies H_α for all α ∈ (0, ∞)) was introduced in [335].

The fact that every k-source is a convex combination of flat k-sources (Lemma 6.10) is implicit in [96]. Our proof is due to Kalai [230]. The optimal nonconstructive bounds for extractor parameters (Theorem 6.14) were identified by Radhakrishnan and Ta-Shma [315], who proved a matching lower bound on seed length and upper bound on output length.
The method for extracting from block sources given by Lemmas 6.27 and 6.28 was developed over a series of works [96, 410, 426]; the form we use is from [303]. The first method described in Section 6.3.2 for converting general weak sources to block sources is from the original papers of Nisan and Zuckerman [426, 303] (see [402] for a tighter analysis). The observation that high min-entropy sources can be partitioned into block sources, and the benefits of this for constructing extractors, is due to Goldreich and Wigderson [174]. Lossless condensers were first introduced by Raz and Reingold [319] (formulated in a slightly different manner), and the general definition of condensers is from Reingold, Shaltiel, and Wigderson [328]. The equivalence of lossless condensing and vertex expansion close to the degree is from Ta-Shma, Umans, and Zuckerman [382]. The extractor construction of Section 6.3.4 is due to Guruswami, Umans, and Vadhan [192]. (Earlier papers had results similar to Theorem 6.36 with some restrictions on the parameters, such as k = Ω(n) [427] and 1/ε subpolynomial [272].) Recently, Dvir, Kopparty, Saraf, and Sudan [124] constructed extractors that have logarithmic seed length and extract a 1 − o(1) fraction of the min-entropy (for constant or slightly subconstant error).
The zig-zag product for extractors and condensers described in Section 6.3.5 is from [332, 92].
Problem 6.2 (achieving exponentially small error when simulating
BPP with a weak source) is from [426].

7 Pseudorandom Generators

7.1 Motivation and Definition

In the previous sections, we have seen a number of interesting derandomization results:

• Derandomizing specific algorithms, such as the ones for MaxCut and Undirected S-T Connectivity;
• Giving explicit (efficient, deterministic) constructions of various pseudorandom objects, such as expanders, extractors, and list-decodable codes, as well as showing various relations between them;
• Reducing the randomness needed for certain tasks, such as sampling and amplifying the success probability of randomized algorithms; and
• Simulating BPP with any weak random source.

However, all of these still fall short of answering our original motivating question of whether every randomized algorithm can be efficiently derandomized. That is, does BPP = P?
As we have seen, one way to resolve this question in the positive is to use the following two-step process: First show that the number of random bits for any BPP algorithm can be reduced from poly(n) to O(log n), and then eliminate the randomness entirely by enumeration.
Thus, we would like to have a function G that stretches a seed of d = O(log n) truly random bits into m = poly(n) bits that "look random." Such a function is called a pseudorandom generator. The question is how we can formalize the requirement that the output should "look random" in such a way that (a) the output can be used in place of the truly random bits in any BPP algorithm, and (b) such a generator exists.
Some candidate definitions for what it means for the random variable X = G(U_d) to "look random" include the following:

• Information-theoretic or statistical measures: For example, we might measure the entropy of G(U_d), its statistical difference from the uniform distribution, or require pairwise independence. All of these fail one of the two criteria. For example, it is impossible for a deterministic function to increase entropy from O(log n) to poly(n). And it is easy to construct algorithms that fail when run using random bits that are only guaranteed to be pairwise independent.

• Kolmogorov complexity: A string x "looks random" if it is incompressible (cannot be generated by a Turing machine with a description of length less than |x|). An appealing aspect of this notion is that it makes sense of the randomness in a fixed string (rather than a distribution). Unfortunately, it is not suitable for our purposes. Specifically, if the function G is computable (which we certainly want), then all of its outputs have Kolmogorov complexity d = O(log n) (just hardwire the seed into the TM computing G), and hence are very compressible.

• Computational indistinguishability: This is the measure we will use. Intuitively, we say that a random variable X "looks random" if no efficient algorithm can distinguish X from a truly uniform random variable. Another perspective comes from the definition of statistical difference:

    Δ(X, Y) = max_T |Pr[X ∈ T] − Pr[Y ∈ T]|.

With computational indistinguishability, we simply restrict the max to be taken only over "efficient" statistical tests T, that is, tests T for which membership can be efficiently tested.
7.1.1 Computational Indistinguishability

Definition 7.1 (computational indistinguishability). Random variables X and Y taking values in {0,1}^m are (t, ε) indistinguishable if for every nonuniform algorithm T running in time at most t, we have

    |Pr[T(X) = 1] − Pr[T(Y) = 1]| ≤ ε.

The left-hand side above is also called the advantage of T.
Recall that a nonuniform algorithm is an algorithm that may have some nonuniform advice hardwired in. (See Definition 3.10.) If the algorithm runs in time t, we require that the advice string is of length at most t. Typically, to make sense of complexity measures like running time, it is necessary to use asymptotic notions, because a Turing machine can encode a huge lookup table for inputs of any bounded size in its transition function. However, for nonuniform algorithms, we can avoid doing so by using Boolean circuits as our nonuniform model of computation. Similarly to Fact 3.11, every nonuniform Turing machine algorithm running in time t(n) can be simulated by a sequence of Boolean circuits C_n of size Õ(t(n)), and conversely every sequence of Boolean circuits of size s(n) can be simulated by a nonuniform Turing machine running in time Õ(s(n)). Thus, to make our notation cleaner, from now on, by "nonuniform algorithm running in time t," we mean "Boolean circuit of size t," where we measure the size by the number of AND and OR gates in the circuit. (For convenience, we don't count the inputs and negations in the circuit size.) Note also that in Definition 7.1 we have not specified whether the distinguisher is deterministic or randomized; this is because a probabilistic distinguisher achieving advantage greater than ε can be turned into a deterministic distinguisher achieving advantage greater than ε by nonuniformly fixing the randomness. (This is another example of how nonuniformity is more powerful than randomness, as in Corollary 3.12.)

It is also of interest to study computational indistinguishability and pseudorandomness against uniform algorithms.

Definition 7.2 (uniform computational indistinguishability). Let X_m, Y_m be sequences of random variables on {0,1}^m (or {0,1}^{poly(m)}). For functions t : ℕ → ℕ and ε : ℕ → [0,1], we say that {X_m} and {Y_m} are (t(m), ε(m)) indistinguishable for uniform algorithms if for all probabilistic algorithms T running in time t(m), we have

    |Pr[T(X_m) = 1] − Pr[T(Y_m) = 1]| ≤ ε(m)

for all sufficiently large m, where the probabilities are taken over X_m, Y_m, and the random coin tosses of T.

We will focus on the nonuniform definition in this survey, but will mention results about the uniform definition as well.
7.1.2 Pseudorandom Generators

Definition 7.3. A deterministic function G : {0,1}^d → {0,1}^m is a (t, ε) pseudorandom generator (PRG) if

(1) d < m, and
(2) G(U_d) and U_m are (t, ε) indistinguishable.

Also, note that we have formulated the definition with respect to nonuniform computational indistinguishability, but an analogous uniform definition can be given.
People attempted to construct pseudorandom generators long before this definition was formulated. Their generators were tested against a battery of statistical tests (e.g., the number of 1s and 0s are approximately the same, the longest run is of length O(log m), etc.), but these fixed sets of tests provided no guarantee that the generators would perform well in an arbitrary application (e.g., in cryptography or derandomization). Indeed, most classical constructions (e.g., linear congruential generators, as implemented in the standard C library) are known to fail in some applications.
Intuitively, the above definition guarantees that the pseudorandom bits produced by the generator are as good as truly random bits for all efficient purposes (where "efficient" means time at most t). In particular, we can use such a generator for derandomizing any algorithm of running time at most t. For the derandomization to be efficient, we will also need the generator to be efficiently computable.

Definition 7.4. We say a sequence of generators {G_m : {0,1}^{d(m)} → {0,1}^m} is computable in time t(m) if there is a uniform and deterministic algorithm M such that for every m ∈ ℕ and x ∈ {0,1}^{d(m)}, we have M(m, x) = G_m(x) and M(m, x) runs in time at most t(m). In addition, M(m) (with no second input) should output the value d(m) in time at most t(m).
Note that even when we define the pseudorandomness property of the generator with respect to nonuniform algorithms, the efficiency requirement refers to uniform algorithms. As usual, for readability, we will usually refer to a single generator G : {0,1}^{d(m)} → {0,1}^m, with it being implicit that we are really discussing a family {G_m}.

Theorem 7.5. Suppose that for all m there exists an (m, 1/8) pseudorandom generator G : {0,1}^{d(m)} → {0,1}^m computable in time t(m). Then BPP ⊆ ∪_c DTIME(2^{d(n^c)}·(n^c + t(n^c))).
Proof. Let A(x; r) be a BPP algorithm that on inputs x of length n can be simulated by Boolean circuits of size at most n^c, using coin tosses r. Without loss of generality, we may assume that |r| = n^c. (It will often be notationally convenient to assume that the number of random bits used by an algorithm equals its running time or circuit size, so as to avoid an extra parameter. However, most interesting algorithms will only actually read and compute with a smaller number of these bits, so as to leave time available for computation. Thus, one should actually think of an algorithm as only reading a prefix of its random string r.)
The idea is to replace the random bits used by A with pseudorandom bits generated by G, use the pseudorandomness property to show that the algorithm will still be correct with high probability, and finally enumerate over all possible seeds to obtain a deterministic algorithm.

Claim 7.6. For every x of length n, A(x; G(U_{d(n^c)})) errs with probability smaller than 1/2.

Proof of Claim: Suppose that there exists some x on which A(x; G(U_{d(n^c)})) errs with probability at least 1/2. Then T(·) = A(x, ·) is a Boolean circuit of size at most n^c that distinguishes G(U_{d(n^c)}) from U_{n^c} with advantage at least 1/2 − 1/3 > 1/8. (Notice that we are using the input x as nonuniform advice; this is why we need the PRG to be pseudorandom against nonuniform tests.)
Now, enumerate over all seeds of length d(n^c) and take a majority vote. There are 2^{d(n^c)} of them, and for each we have to run both G and A.
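As an illustration, here is a minimal Python sketch of this enumeration; prg and alg are placeholder function arguments, not particular constructions:

    def derandomize(alg, prg, x, d):
        """Deterministic simulation of a BPP algorithm via a PRG.

        alg: the randomized algorithm, alg(x, r) -> 0 or 1.
        prg: the generator, prg(seed) -> string of coin tosses for alg.
        d:   seed length of the generator.

        By Claim 7.6, alg errs with probability < 1/2 over a random
        seed, so the majority vote over all 2^d seeds is correct.
        """
        ones = sum(alg(x, prg(s)) for s in range(2 ** d))
        return 1 if 2 * ones > 2 ** d else 0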
Notice that we can afford for the generator G to have running time t(m) = poly(m) or even t(m) = poly(m)·2^{O(d(m))} without affecting the time of the derandomization by more than a polynomial amount. In particular, for this application, it is OK if the generator runs in more time than the tests it fools (which are time m in this theorem). That is, for derandomization, it suffices to have G that is "mildly explicit" according to the following definition:

Definition 7.7.
(1) A generator G : {0,1}^{d(m)} → {0,1}^m is mildly explicit if it is computable in time poly(m, 2^{d(m)}).
(2) A generator G : {0,1}^{d(m)} → {0,1}^m is fully explicit if it is computable in time poly(m).

These definitions are analogous to the notions of mildly explicit and fully explicit for expander graphs in Section 4.3. The truth table of a mildly explicit generator can be constructed in time polynomial in its size (which is of size m·2^{d(m)}), whereas a fully explicit generator can be evaluated in time polynomial in its input and output lengths (like the neighbor function of a fully explicit expander).
Theorem 7.5 provides a tradeoff between the seed length of the PRG and the efficiency of the derandomization. Let's look at some typical settings of parameters to see how we might simulate BPP in the different deterministic time classes (see Definition 3.1):

(1) Suppose that for every constant ε > 0, there is an (m, 1/8) mildly explicit PRG with seed length d(m) = m^ε. Then BPP ⊆ ∩_{ε>0} DTIME(2^{n^ε}), which by definition is SUBEXP. Since it is known that SUBEXP is a proper subset of EXP, this is already a nontrivial improvement on the current inclusion BPP ⊆ EXP (Proposition 3.2).
(2) Suppose that there is an (m, 1/8) mildly explicit PRG with seed length d(m) = polylog(m). Then BPP ⊆ ∪_c DTIME(2^{log^c n}), which by definition is P̃ (quasipolynomial time).
(3) Suppose that there is an (m, 1/8) mildly explicit PRG with seed length d(m) = O(log m). Then BPP = P.
Of course, all of these derandomizations are contingent on the question of whether PRGs exist. As usual, our first answer is yes, but the proof is not very helpful: it is nonconstructive and thus does not provide an efficiently computable PRG.

Proposition 7.8. For all m ∈ ℕ and ε > 0, there exists a (nonexplicit) (m, ε) pseudorandom generator G : {0,1}^d → {0,1}^m with seed length d = O(log m + log(1/ε)).
Proof. The proof is by the probabilistic method. Choose G : {0,1}^d → {0,1}^m at random. Now, fix a time-m algorithm T. The probability (over the choice of G) that T distinguishes G(U_d) from U_m with advantage ε is at most 2^{−Ω(ε²·2^d)}, by a Chernoff bound. There are 2^{poly(m)} nonuniform algorithms running in time m (i.e., circuits of size m). Thus, union-bounding over all possible T, we get that the probability that there exists a T breaking G is at most 2^{poly(m)}·2^{−Ω(ε²·2^d)}, which is less than 1 for d = O(log m + log(1/ε)).
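To spell out the final calculation, here is a worked version of the union bound (the hidden constant in the Chernoff exponent is left implicit, as in the proof):

    2^{\mathrm{poly}(m)} \cdot 2^{-\Omega(\varepsilon^2 2^d)} < 1
    \quad\Longleftarrow\quad
    \varepsilon^2 2^d \geq c \cdot \mathrm{poly}(m)
    \quad\Longleftarrow\quad
    d \geq \log\big(\mathrm{poly}(m)/\varepsilon^2\big) + O(1)
      = O(\log m + \log(1/\varepsilon)).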
Note that putting together Proposition 7.8 and Theorem 7.5 gives us another way to prove that BPP ⊆ P/poly (Corollary 3.12): just let the advice string be the truth table of an (n^c, 1/8) PRG (which can be described by 2^{d(n^c)}·n^c = poly(n) bits), and then use that PRG in the proof of Theorem 7.5 to derandomize the BPP algorithm. However, if you unfold both this proof and our previous proof (where we do error reduction and then fix the coin tosses), you will see that both proofs amount to essentially the same construction.

7.2 Cryptographic PRGs

The theory of computational pseudorandomness discussed in this section emerged from cryptography, where researchers sought a definition that would ensure that using pseudorandom bits instead of truly random bits (e.g., when encrypting a message) would retain security against all computationally feasible attacks. In this setting, the generator G is used by the honest parties and thus should be very efficient to compute. On the other hand, the distinguisher T corresponds to an attack carried out by an adversary, and we want to protect against adversaries that may invest a lot of computational resources into trying to break the system. Thus, one is led to require that the pseudorandom generators be secure even against distinguishers with greater running time than the generator. The most common setting of parameters in the theoretical literature is that the generator should run in a fixed polynomial time, but the adversary can run in an arbitrary polynomial time.

Definition 7.9. A generator G_m : {0,1}^{d(m)} → {0,1}^m is a cryptographic pseudorandom generator if:

(1) G_m is fully explicit. That is, there is a constant b such that G_m is computable in time m^b.
(2) G_m is an (m^{ω(1)}, 1/m^{ω(1)}) PRG. That is, for every constant c, G_m is an (m^c, 1/m^c) pseudorandom generator for all sufficiently large m.
Due to space constraints and the fact that such generators are covered in other texts (see the Chapter Notes and References), we will not do an in-depth study of cryptographic generators, but just survey what is known about them.
The first question to ask is whether such generators exist at all. It is not hard to show that cryptographic pseudorandom generators cannot exist unless P ≠ NP, indeed unless NP ⊄ P/poly. (See Problem 7.3.) Thus, we do not expect to establish the existence of such generators unconditionally, and instead need to make some complexity assumption. While it would be wonderful to show that NP ⊄ P/poly implies the existence of cryptographic pseudorandom generators, that too seems out of reach. However, we can base them on the very plausible assumption that there are functions that are easy to evaluate but hard to invert.
Definition 7.10. f_n : {0,1}^n → {0,1}^n is a one-way function if:

(1) There is a constant b such that f_n is computable in time n^b for sufficiently large n.
(2) For every constant c and every nonuniform algorithm A running in time n^c:

    Pr[A(f_n(U_n)) ∈ f_n^{−1}(f_n(U_n))] ≤ 1/n^c

for all sufficiently large n.

Assuming the existence of one-way functions seems stronger than the assumption NP ⊄ P/poly. For example, it is an average-case complexity assumption, as it requires that f is hard to invert when evaluated on random inputs. Nevertheless, there are a number of candidate functions believed to be one-way. The simplest is integer multiplication: f_n(x, y) = x·y, where x and y are n/2-bit numbers. Inverting this function amounts to the integer factorization problem, for which no efficient algorithm is known.
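As a toy illustration of this candidate in Python (the parameter size is a placeholder, and this is only meant to contrast the easy forward direction with the conjecturally hard inverse):

    import random

    def f_mult(x: int, y: int) -> int:
        """Candidate one-way function: multiply two n/2-bit integers.

        Computing x * y takes polynomial time; recovering (x, y) from
        the product of two random n/2-bit integers is the integer
        factorization problem, conjectured to be intractable.
        """
        return x * y

    n = 2048  # illustrative parameter
    x = random.getrandbits(n // 2)
    y = random.getrandbits(n // 2)
    z = f_mult(x, y)  # easy; finding factors of z is believed hard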
A classic and celebrated result in the foundations of cryptography is that cryptographic pseudorandom generators can be constructed from any one-way function:

Theorem 7.11. The following are equivalent:

(1) One-way functions exist.
(2) There exist cryptographic pseudorandom generators with seed length d(m) = m − 1.
(3) For every constant ε > 0, there exist cryptographic pseudorandom generators with seed length d(m) = m^ε.

Corollary 7.12. If one-way functions exist, then BPP ⊆ SUBEXP.


What about getting a better derandomization? The proof of the above theorem is more general quantitatively. It takes any one-way function f : {0,1}^ℓ → {0,1}^ℓ and a parameter m, and constructs a generator G_m : {0,1}^{poly(ℓ)} → {0,1}^m. The fact that G_m is pseudorandom is proven by a reduction as follows. Assuming for contradiction that we have an algorithm T that runs in time t and distinguishes G_m from uniform with advantage ε, we construct an algorithm T′ running in time t′ = t·(m/ε)^{O(1)} inverting f (say with probability 1/2). If t′ ≤ poly(ℓ), then this contradicts the one-wayness of f, and hence we conclude that T cannot exist and G_m is a (t, ε) pseudorandom generator.
Quantitatively, if f is hard to invert by algorithms running in time s(ℓ) and we take m = 1/ε = s(ℓ)^{o(1)}, then we have t′ ≤ s(ℓ) for every t = poly(m) and sufficiently large ℓ. Thus, viewing the seed length d of G_m as a function of m, we have d(m) = poly(ℓ) = poly(s^{−1}(m^{ω(1)})), where m^{ω(1)} denotes any superpolynomial function of m. Thus:

• If s(ℓ) = ℓ^{ω(1)}, we can get seed length d(m) = m^ε for any desired constant ε > 0 and BPP ⊆ SUBEXP (as discussed above).
• If s(ℓ) = 2^{ℓ^{Ω(1)}} (as is plausible for the factoring one-way function), then we get seed length d(m) = poly(log m) and BPP ⊆ P̃.
But we cannot get seed length d(m) = O(log m), as needed for concluding BPP = P, from this result. Even for the maximum possible hardness s(ℓ) = 2^{Ω(ℓ)}, we get d(m) = poly(log m). In fact, Problem 7.3 shows that it is impossible to have a cryptographic PRG with seed length O(log m) meeting Definition 7.9, where we require that G_m be pseudorandom against all poly(m)-time algorithms. However, for derandomization we only need G_m to be pseudorandom against a fixed poly-time algorithm, e.g., running in time t = m, and we would get such generators with seed length O(log m) if the aforementioned construction could be improved to yield seed length d = O(ℓ) instead of d = poly(ℓ).

Open Problem 7.13. Given a one-way function f : {0,1}^ℓ → {0,1}^ℓ that is hard to invert by algorithms running in time s = s(ℓ) and a constant c, is it possible to construct a fully explicit (t, ε) pseudorandom generator G : {0,1}^d → {0,1}^m with seed length d = O(ℓ) and pseudorandomness against time t = s·(ε/m)^{O(1)}?

The best known seed length for such a generator is d = Õ(ℓ³·log(m/ε)/log² s), which is Õ(ℓ²) for the case that s = 2^{Ω(ℓ)} and m = 2^{Ω(ℓ)}, as discussed above.
The above open problem has long been solved in the positive for one-way permutations f : {0,1}^ℓ → {0,1}^ℓ. In fact, the construction of pseudorandom generators from one-way permutations has a particularly simple description:

    G_m(x, r) = (⟨x, r⟩, ⟨f(x), r⟩, ⟨f(f(x)), r⟩, . . . , ⟨f^{(m−1)}(x), r⟩),

where |r| = |x| = ℓ and ⟨·, ·⟩ denotes inner product modulo 2. One intuition for this construction is the following. Consider the sequence (f^{(m−1)}(U_ℓ), f^{(m−2)}(U_ℓ), . . . , f(U_ℓ), U_ℓ). By the fact that f is hard to invert (but easy to evaluate), it can be argued that the (i + 1)-st component of this sequence is infeasible to predict from the first i components except with negligible probability. Thus, it is a computational analogue of a block source. The pseudorandom generator then is obtained by a computational analogue of block-source extraction, using the strong extractor Ext(x, r) = ⟨x, r⟩. The fact that the extraction works in this computational setting, however, is much more delicate and complex to prove than in the setting of extractors, and relies on a local list-decoding algorithm for the corresponding code (namely the Hadamard code). See Problems 7.12 and 7.13. (We will discuss local list decoding in Section 7.6.)
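Here is a minimal Python sketch of this generator; the one-way permutation f is passed in as a black box (its existence is exactly the assumption), and ℓ-bit strings are represented as Python integers for brevity:

    def inner_product_mod2(a: int, b: int) -> int:
        """<a, b> mod 2, viewing l-bit integers as bit vectors."""
        return bin(a & b).count("1") % 2

    def owp_prg(f, x: int, r: int, m: int) -> list:
        """G_m(x, r) = (<x,r>, <f(x),r>, ..., <f^(m-1)(x),r>).

        f:    a (hypothetical) one-way permutation on l-bit strings.
        x, r: the two halves of the seed, each l bits.
        Each output bit is the inner product of an iterate of f
        with the same r (the Goldreich-Levin hardcore bit).
        """
        out = []
        for _ in range(m):
            out.append(inner_product_mod2(x, r))
            x = f(x)  # advance to the next iterate of f
        return out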
Pseudorandom Functions. It turns out that a cryptographic pseudorandom generator can be used to build an even more powerful object: a family of pseudorandom functions. This is a family of functions {f_s : {0,1}^d → {0,1}}_{s ∈ {0,1}^d} such that (a) given the seed s, the function f_s can be evaluated in polynomial time, but (b) without the seed, it is infeasible to distinguish an oracle for f_s from an oracle for a truly random function. Thus, in some sense, the d-bit truly random seed s is stretched to 2^d pseudorandom bits (namely the truth table of f_s)! Pseudorandom functions have applications in several domains:

• Cryptography: When two parties share a seed s to a PRF, they effectively share a random function f : {0,1}^d → {0,1}. (By definition, the function they share is indistinguishable from random by any poly-time third party.) Thus, in order for one party to send an encrypted message m to the other, they can simply choose a random r ←R {0,1}^d and send (r, f_s(r) ⊕ m). With knowledge of s, decryption is easy: simply calculate f_s(r) and XOR it with the second part of the received message. However, the value f_s(r) ⊕ m would look essentially random to anyone without knowledge of s. This is just one example; pseudorandom functions have vast applicability in cryptography. (A sketch of this encryption scheme appears after this list.)
• Learning Theory: Here, PRFs are used mainly to prove negative results. The basic paradigm in computational learning theory is that we are given a list of examples of a function's behavior, ((x_1, f(x_1)), (x_2, f(x_2)), . . . , (x_k, f(x_k))), where the x_i's are selected randomly from some underlying distribution, and we would like to predict what the function's value will be on a new data point x_{k+1} coming from the same distribution. Information-theoretically, correct prediction is possible after a small number of samples (with high probability), assuming that the function has a small description (e.g., is computable by a poly-sized circuit). However, it is computationally hard to predict the output of a PRF f_s on a new point x_{k+1} after seeing its value on k points (and this holds even if the algorithm gets to make membership queries, i.e., choose the evaluation points on its own, in addition to getting random examples from some underlying distribution). Thus, PRFs provide examples of functions that are efficiently computable yet hard to learn.

• Hardness of Proving Circuit Lower Bounds: One main approach to proving P ≠ NP is to show that some f ∈ NP doesn't have polynomial-size circuits (equivalently, NP ⊄ P/poly). This approach has had very limited success: the only superpolynomial lower bounds that have been achieved have been for very restricted classes of circuits (monotone circuits, constant-depth circuits, etc.). For general circuits, the best lower bound that has been achieved for a problem in NP is 5n − o(n).
Pseudorandom functions have been used to help explain why existing lower-bound techniques have so far not yielded superpolynomial circuit lower bounds. Specifically, it has been shown that any sufficiently "constructive" proof of superpolynomial circuit lower bounds (one that would allow us to certify that a randomly chosen function has no small circuits) could be used to distinguish a pseudorandom function from truly random in subexponential time, and thus invert any one-way function in subexponential time. This is known as the "Natural Proofs" barrier to circuit lower bounds.
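Here is the encryption sketch promised in the cryptography item above, in Python; HMAC-SHA256 stands in for the PRF f_s purely for illustration (treating it as a PRF is a heuristic assumption, and this sketch omits everything a real scheme needs, e.g., authentication):

    import hmac, hashlib, secrets

    def prf(s: bytes, r: bytes) -> bytes:
        """Stand-in for f_s(r), keyed by the shared seed s."""
        return hmac.new(s, r, hashlib.sha256).digest()

    def encrypt(s: bytes, message: bytes) -> tuple:
        """Choose a fresh random r and send (r, f_s(r) XOR m)."""
        assert len(message) <= 32  # one PRF output block, for simplicity
        r = secrets.token_bytes(16)
        pad = prf(s, r)[:len(message)]
        return r, bytes(a ^ b for a, b in zip(pad, message))

    def decrypt(s: bytes, r: bytes, ciphertext: bytes) -> bytes:
        """Recompute f_s(r) and XOR it off the second component."""
        pad = prf(s, r)[:len(ciphertext)]
        return bytes(a ^ b for a, b in zip(pad, ciphertext))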

7.3 Hybrid Arguments

In this section, we introduce a very useful proof method for working with computational indistinguishability, known as the hybrid argument. We use it to establish two important facts: that computational indistinguishability is preserved under taking multiple samples, and that pseudorandomness is equivalent to next-bit unpredictability.
7.3.1 Indistinguishability of Multiple Samples

The following proposition illustrates that computational indistinguishability behaves like statistical difference when taking many independent repetitions: the distance multiplies by at most the number of copies (cf. Lemma 6.3, Part 6). Proving it will introduce useful techniques for reasoning about computational indistinguishability, and will also illustrate how working with such computational notions can be more subtle than working with statistical notions.

Proposition 7.14. If random variables X and Y are (t, ε) indistinguishable, then for every k ∈ ℕ, X^k and Y^k are (t, kε) indistinguishable (where X^k represents k independent copies of X).

Note that when t = ∞, this follows from Lemma 6.3, Part 6; the challenge here is to show that the same holds even when we restrict to computationally bounded distinguishers.
Proof. We will prove the contrapositive: if there is an efficient algorithm T distinguishing X^k and Y^k with advantage greater than kε, then there is an efficient algorithm T′ distinguishing X and Y with advantage greater than ε. The difference in this proof from the corresponding result about statistical difference is that we need to preserve efficiency when going from T to T′. The algorithm T′ will naturally use the algorithm T as a subroutine. Thus this is a reduction in the same spirit as reductions used elsewhere in complexity theory (e.g., in the theory of NP-completeness).
Suppose that there exists a nonuniform time-t algorithm T such that

    |Pr[T(X^k) = 1] − Pr[T(Y^k) = 1]| > kε.    (7.1)

We can drop the absolute value in the above expression without loss of generality. (Otherwise we can replace T with its negation; recall that negations are free in our measure of circuit size.)
Now we will use a hybrid argument. Consider the hybrid distributions H_i = X^{k−i} Y^i, for i = 0, . . . , k. Note that H_0 = X^k and H_k = Y^k. Then Inequality (7.1) is equivalent to

    ∑_{i=1}^{k} (Pr[T(H_{i−1}) = 1] − Pr[T(H_i) = 1]) > kε,

since the sum telescopes. Thus, there must exist some i ∈ [k] such that Pr[T(H_{i−1}) = 1] − Pr[T(H_i) = 1] > ε, i.e.,

    Pr[T(X^{k−i} X Y^{i−1}) = 1] − Pr[T(X^{k−i} Y Y^{i−1}) = 1] > ε.

By averaging, there exist some x_1, . . . , x_{k−i} and y_{k−i+2}, . . . , y_k such that

    Pr[T(x_1, . . . , x_{k−i}, X, y_{k−i+2}, . . . , y_k) = 1] − Pr[T(x_1, . . . , x_{k−i}, Y, y_{k−i+2}, . . . , y_k) = 1] > ε.

Then, define T′(z) = T(x_1, . . . , x_{k−i}, z, y_{k−i+2}, . . . , y_k). Note that T′ is a nonuniform algorithm with advice i, x_1, . . . , x_{k−i}, y_{k−i+2}, . . . , y_k hardwired in. Hardwiring these inputs costs nothing in terms of circuit size. Thus T′ is a nonuniform time-t algorithm such that

    Pr[T′(X) = 1] − Pr[T′(Y) = 1] > ε,

contradicting the indistinguishability of X and Y.
While the parameters in the above result behave nicely, with (t, ε) going to (t, kε), there are some implicit costs. First, the amount of nonuniform advice used by T′ is larger than that used by T. This is hidden by the fact that we are using the same measure t (namely circuit size) to bound both the time and the advice length. Second, the result is meaningless for large values of k (e.g., k = t), because a time-t algorithm cannot read more than t bits of the input distributions X^k and Y^k.
We note that there is an analogue of the above result for computational indistinguishability against uniform algorithms (Definition 7.2), but it is more delicate, because we cannot simply hardwire i, x_1, . . . , x_{k−i}, y_{k−i+2}, . . . , y_k as advice. Indeed, a direct analogue of the proposition as stated is known to be false. We need to add the additional condition that the distributions X and Y are efficiently samplable. Then T′ can choose i ←R [k] at random, and randomly sample x_1, . . . , x_{k−i} ←R X and y_{k−i+2}, . . . , y_k ←R Y.
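A minimal Python sketch of this uniform variant follows; the distinguisher T and the samplers are placeholder function arguments (efficient samplability of X and Y is precisely the assumption that the samplers exist):

    import random

    def uniform_hybrid_distinguisher(T, sample_x, sample_y, k, z):
        """Turn a k-tuple distinguisher T into a single-sample one.

        Chooses a random hybrid index i and embeds the challenge z
        between fresh X-samples and fresh Y-samples, mirroring the
        hybrids H_i = X^{k-i} Y^i from the proof of Proposition 7.14.
        """
        i = random.randrange(1, k + 1)
        prefix = [sample_x() for _ in range(k - i)]
        suffix = [sample_y() for _ in range(i - 1)]
        return T(prefix + [z] + suffix)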
7.3.2 Next-Bit Unpredictability

In analyzing the pseudorandom generators that we construct, it will be useful to work with a reformulation of the pseudorandomness property, which says that given a prefix of the output, it should be hard to predict the next bit much better than by random guessing.
For notational convenience, we deviate from our usual conventions and write X_i to denote the ith bit of the random variable X, rather than the ith random variable in some ensemble. We have:
Definition 7.15. Let X be a random variable distributed on {0,1}^m. For t ∈ ℕ and ε ∈ [0,1], we say that X is (t, ε) next-bit unpredictable if for every nonuniform probabilistic algorithm P running in time t and every i ∈ [m], we have

    Pr[P(X_1 X_2 ⋯ X_{i−1}) = X_i] ≤ 1/2 + ε,

where the probability is taken over X and the coin tosses of P.

Note that the uniform distribution X = U_m is (t, 0) next-bit unpredictable for every t. Intuitively, if X is pseudorandom, it must be next-bit unpredictable, as this is just one specific kind of test one can perform on X. In fact the converse also holds, and that will be the direction we use.
Proposition 7.16. Let X be a random variable taking values in {0,1}^m. If X is (t, ε) pseudorandom, then X is (t − O(1), ε) next-bit unpredictable. Conversely, if X is (t, ε) next-bit unpredictable, then it is (t, mε) pseudorandom.

Proof. Here U denotes a random variable uniformly distributed on {0,1}^m, and U_i denotes the ith bit of U.

pseudorandom ⇒ next-bit unpredictable. The proof is by reduction. Suppose for contradiction that X is not (t − 3, ε) next-bit unpredictable, so we have a predictor P : {0,1}^{i−1} → {0,1} that succeeds with probability at least 1/2 + ε. We construct an algorithm T : {0,1}^m → {0,1} that distinguishes X from U_m as follows:

    T(x_1 x_2 ⋯ x_m) = 1 if P(x_1 x_2 ⋯ x_{i−1}) = x_i, and 0 otherwise.

T can be implemented with the same number of AND and OR gates as P, plus 3 for testing equality (via the formula (x ∧ y) ∨ (¬x ∧ ¬y)).
next-bit unpredictable ⇒ pseudorandom. Also by reduction. Suppose X is not pseudorandom, so we have a nonuniform algorithm T running in time t such that

    Pr[T(X) = 1] − Pr[T(U) = 1] > ε,

where we have dropped the absolute values without loss of generality, as in the proof of Proposition 7.14.
We now use a hybrid argument. Define H_i = X_1 X_2 ⋯ X_i U_{i+1} U_{i+2} ⋯ U_m. Then H_m = X and H_0 = U. We have:

    ∑_{i=1}^{m} (Pr[T(H_i) = 1] − Pr[T(H_{i−1}) = 1]) > ε,

since the sum telescopes. Thus, there must exist an i such that

    Pr[T(H_i) = 1] − Pr[T(H_{i−1}) = 1] > ε/m.

This says that T is more likely to output 1 when we put X_i in the ith bit than when we put a random bit U_i. We can view U_i as being X_i with probability 1/2 and being ¬X_i with probability 1/2. The only advantage T has must be coming from the latter case, because in the former case the two distributions are identical. Formally,

    ε/m < Pr[T(H_i) = 1] − Pr[T(H_{i−1}) = 1]
        = Pr[T(X_1 ⋯ X_{i−1} X_i U_{i+1} ⋯ U_m) = 1]
          − ( (1/2)·Pr[T(X_1 ⋯ X_{i−1} X_i U_{i+1} ⋯ U_m) = 1]
            + (1/2)·Pr[T(X_1 ⋯ X_{i−1} ¬X_i U_{i+1} ⋯ U_m) = 1] )
        = (1/2)·( Pr[T(X_1 ⋯ X_{i−1} X_i U_{i+1} ⋯ U_m) = 1]
            − Pr[T(X_1 ⋯ X_{i−1} ¬X_i U_{i+1} ⋯ U_m) = 1] ).

This motivates the following next-bit predictor P(x_1 x_2 ⋯ x_{i−1}):

(1) Choose random bits u_i, . . . , u_m ←R {0,1}.
(2) Compute b = T(x_1 ⋯ x_{i−1} u_i ⋯ u_m).
(3) If b = 1, output u_i; otherwise, output ¬u_i.

The intuition is that T is more likely to output 1 when u_i = x_i than when u_i = ¬x_i. Formally, we have:

    Pr[P(X_1 ⋯ X_{i−1}) = X_i]
      = (1/2)·( Pr[T(X_1 ⋯ X_{i−1} U_i U_{i+1} ⋯ U_m) = 1 | U_i = X_i]
        + Pr[T(X_1 ⋯ X_{i−1} U_i U_{i+1} ⋯ U_m) = 0 | U_i = ¬X_i] )
      = (1/2)·( Pr[T(X_1 ⋯ X_{i−1} X_i U_{i+1} ⋯ U_m) = 1]
        + 1 − Pr[T(X_1 ⋯ X_{i−1} ¬X_i U_{i+1} ⋯ U_m) = 1] )
      > 1/2 + ε/m.

Note that as described, P runs in time t + O(m). Recalling that we are using circuit size as our measure of nonuniform time, we can reduce the running time to t as follows. First, we may nonuniformly fix the coin tosses u_i, . . . , u_m of P while preserving its advantage. Then all P does is run T on x_1 ⋯ x_{i−1} concatenated with some fixed bits, and either output what T does or its negation (depending on the fixed value of u_i). Fixing some input bits and negating the output can be done without increasing circuit size. Thus we contradict the next-bit unpredictability of X.
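For concreteness, here is a minimal Python sketch of the predictor P constructed in this proof (before the final circuit-size optimization); the distinguisher T is a placeholder function argument taking a list of m bits:

    import random

    def next_bit_predictor(T, prefix, m):
        """Guess bit i of X from the prefix X_1 ... X_{i-1}.

        Pads the prefix with fresh random bits u_i, ..., u_m, runs the
        distinguisher T on the result, and outputs u_i if T outputs 1
        (and the complement of u_i otherwise).
        """
        i = len(prefix) + 1
        suffix = [random.randrange(2) for _ in range(m - i + 1)]
        b = T(list(prefix) + suffix)
        return suffix[0] if b == 1 else 1 - suffix[0]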
We note that an analogue of this result holds for uniform distinguishers and predictors, provided that we change the definition of next-bit predictor to involve a random choice of i ←R [m] instead of a fixed value of i, and change the time bounds in the conclusions to be t − O(m) rather than t − O(1) and t. (We can't do tricks like in the final paragraph of the proof.) In contrast to the multiple-sample indistinguishability result of Proposition 7.14, this result does not need X to be efficiently samplable for the uniform version.

7.4 Pseudorandom Generators from Average-Case Hardness

In Section 7.2, we surveyed cryptographic pseudorandom generators, which have numerous applications within and outside cryptography, including to derandomizing BPP. However, for derandomization, we can use generators with weaker properties. Specifically, Theorem 7.5 only requires G : {0,1}^{d(m)} → {0,1}^m such that:

(1) G fools (nonuniform) distinguishers running in time m (as opposed to all poly(m)-time distinguishers).
(2) G is computable in time poly(m, 2^{d(m)}) (i.e., G is mildly explicit). In particular, the PRG may take more time than the distinguishers it is trying to fool.
Such a generator implies that every BPP algorithm can be derandomized in time poly(n)·2^{d(poly(n))}.
The benefit of studying such generators is that we can hope to construct them under weaker assumptions than those used for cryptographic generators. In particular, a generator with the properties above no longer seems to imply P ≠ NP, much less the existence of one-way functions. (The nondeterministic distinguisher that tests whether a string is an output of the generator by guessing a seed needs to evaluate the generator, which takes more time than the distinguishers are allowed.) However, as shown in Problem 7.1, such generators still imply nonuniform circuit lower bounds for exponential time, something that is beyond the state of the art in complexity theory.
Our goal in the rest of this section is to construct generators as above from assumptions that are as weak as possible. In this section, we will construct them from Boolean functions computable in exponential time that are hard on average (for nonuniform algorithms), and in the next section we will relax this to only require worst-case hardness.

7.4.1 Average-Case Hardness

A function is hard on average if it is hard to compute correctly on randomly chosen inputs. Formally:

Definition 7.17. For s ∈ ℕ and δ ∈ [0,1], we say that a Boolean function f : {0,1}^ℓ → {0,1} is (s, δ) average-case hard if for all nonuniform probabilistic algorithms A running in time s,

    Pr[A(X) = f(X)] ≤ 1 − δ,

where the probability is taken over X and the coin tosses of A.

Note that saying that f is (s, δ) hard for some δ > 0 (possibly exponentially small) amounts to saying that f is worst-case hard.¹ Thus, we think of average-case hardness as corresponding to values of δ that are noticeably larger than zero, e.g., δ = 1/s^{0.1} or δ = 1/3. Indeed, in this section we will take δ = 1/2 − ε for ε = 1/s. That is, no efficient algorithm can compute f much better than random guessing. A typical setting of parameters we use is s = s(ℓ) somewhere in the range from ℓ^{ω(1)} (slightly superpolynomial) to s(ℓ) = 2^{αℓ} for a constant α > 0. (Note that every function is computable by a nonuniform algorithm running in time roughly 2^ℓ, so we cannot take s(ℓ) to be any larger.) We will also require f to be computable in (uniform) time 2^{O(ℓ)} so that our pseudorandom generator will be computable in time exponential in its seed length. The existence of such an average-case hard function may seem like a strong assumption, but in later sections we will see how to deduce it from a worst-case hardness assumption.

¹ For probabilistic algorithms, the right definition of worst-case hardness is actually that there exists an input x for which Pr[A(x) = f(x)] < 2/3, where the probability is taken over the coin tosses of A. But for nonuniform algorithms, the two definitions can be shown to be roughly equivalent. See Definition 7.34 and the subsequent discussion.
Now we show how to obtain a pseudorandom generator from average-case hardness.

Proposition 7.18. If f : {0,1}^ℓ → {0,1} is (t, 1/2 − ε) average-case hard, then G(x) = x f(x) (the seed x with the bit f(x) appended) is a (t, ε) pseudorandom generator.
¹ For probabilistic algorithms, the right definition of worst-case hardness is actually that there exists an input x for which Pr[A(x) = f(x)] < 2/3, where the probability is taken over the coin tosses of A. But for nonuniform algorithms the two definitions can be shown to be roughly equivalent. See Definition 7.34 and the subsequent discussion.

We omit the proof of this proposition, but it follows from Problem 7.5, Part 2 (by setting m = 1, a = 0, and d = ℓ in Theorem 7.24). Note that this generator includes its seed in its output. This is impossible for cryptographic pseudorandom generators, but is feasible (as shown above) when the generator can have more resources than the distinguishers it is trying to fool.
Of course, this generator is quite weak, stretching by only one bit. We would like to get many bits out. Here are two attempts (illustrated in code below):

• Use concatenation: Define G′(x_1 ··· x_k) = x_1 ··· x_k f(x_1) ··· f(x_k). This is a (t, kε) pseudorandom generator, because G′(U_{kℓ}) consists of k independent samples from a pseudorandom distribution and thus computational indistinguishability is preserved, by Proposition 7.14. Note that already here we are relying on nonuniform indistinguishability, because the distribution (U_ℓ, f(U_ℓ)) is not necessarily samplable (in time that is feasible for the distinguishers). Unfortunately, however, this construction does not improve the ratio between output length and seed length, which remains very close to 1.

• Use composition: For example, try to get two bits out using the same seed length by defining G′(x) = G(G(x)_{1···ℓ}) G(x)_{ℓ+1}, where G(x)_{1···ℓ} denotes the first ℓ bits of G(x). This works for cryptographic pseudorandom generators, but not for the generators we are considering here. Indeed, for the generator G(x) = x f(x) of Proposition 7.18, we would get G′(x) = x f(x) f(x), which is clearly not pseudorandom.
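The two attempts are easy to write down concretely (a Python illustration, not from the original text; bit strings are tuples and f is the hard function):

    def concat_generator(f, l, k, x):
        """Attempt 1: G'(x_1 ... x_k) = x_1 ... x_k f(x_1) ... f(x_k).
        The seed has k*l bits and the output k*l + k bits, so the
        output/seed ratio stays close to 1."""
        blocks = [x[i * l:(i + 1) * l] for i in range(k)]
        return x + tuple(f(b) for b in blocks)

    def composed_generator(G, l, x):
        """Attempt 2: G'(x) = G(G(x)_{1..l}) followed by G(x)_{l+1..}.
        For G(x) = x f(x) we have G(x)_{1..l} = x, so G'(x) = x f(x) f(x):
        its last two bits always agree, and a trivial test distinguishes it."""
        gx = G(x)
        return G(gx[:l]) + gx[l:]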
7.4.2 The Pseudorandom Generator

Our goal now is to show the following:


Theorem 7.19. For s : ℕ → ℕ, suppose that there is a function f ∈ E = DTIME(2^{O(ℓ)})² such that for every input length ℓ ∈ ℕ, f is (s(ℓ), 1/2 − 1/s(ℓ)) average-case hard, where s(ℓ) is computable in time 2^{O(ℓ)}. Then for every m ∈ ℕ, there is a mildly explicit (m, 1/m) pseudorandom generator G : {0,1}^{d(m)} → {0,1}^m with seed length d(m) = O(s^{−1}(poly(m))² / log m) that is computable in time 2^{O(d(m))}.

² E should be contrasted with the larger class EXP = DTIME(2^{poly(ℓ)}). See Problem 7.2.
Note that this is similar to the seed length d(m) = poly(s^{−1}(poly(m))) mentioned in Section 7.2 for constructing cryptographic pseudorandom generators from one-way functions, but the average-case assumption is incomparable (and will be weakened further in the next section). In fact, it is known how to achieve a seed length d(m) = O(s^{−1}(poly(m))), which matches what is known for constructing pseudorandom generators from one-way permutations as well as the converse implication of Problem 7.1. We will not cover that improvement here (see the Chapter Notes and References for pointers), but note that for the important case of hardness s(ℓ) = 2^{Ω(ℓ)}, Theorem 7.19 achieves seed length d(m) = O(O(log m)²/log m) = O(log m) and thus P = BPP. More generally, we have:
Corollary 7.20. Suppose that E has an (s(ℓ), 1/2 − 1/s(ℓ)) average-case hard function f : {0,1}^ℓ → {0,1}.

(1) If s(ℓ) = 2^{Ω(ℓ)}, then BPP = P.
(2) If s(ℓ) = 2^{ℓ^{Ω(1)}}, then BPP ⊆ P̃ (quasipolynomial time).
(3) If s(ℓ) = ℓ^{ω(1)}, then BPP ⊆ SUBEXP.
The idea is to apply f repeatedly, but on slightly dependent inputs, namely ones that share very few bits. The sets of seed bits used for each output bit will be given by a design:

Definition 7.21. S_1, ..., S_m ⊆ [d] is an (ℓ, a)-design if

(1) ∀i, |S_i| = ℓ, and
(2) ∀i ≠ j, |S_i ∩ S_j| ≤ a.


We want lots of sets having small intersections over a small universe. We will use the designs established by Problem 3.2:

Lemma 7.22. For every constant γ > 0 and every ℓ, m ∈ ℕ, there exists an (ℓ, a)-design S_1, ..., S_m ⊆ [d] with d = O(ℓ²/a) and a = γ · log m. Such a design can be constructed deterministically in time poly(m, d).

The important points are that the intersection sizes are only logarithmic in the number of sets, and the universe size d is at most quadratic in the set size ℓ (and can be linear in ℓ in case we take m = 2^{Ω(ℓ)}).
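One standard way to realize designs of roughly these parameters (a sketch, and not necessarily the exact construction of Problem 3.2) identifies the universe with F_q × F_q for a prime q and takes one set per low-degree polynomial:

    from itertools import product

    def poly_designs(q, c):
        """Yield S_p = {(x, p(x)) : x in F_q} over all degree-<c polynomials
        p over F_q (q prime). Each set has size q, and two distinct
        degree-<c polynomials agree on at most c-1 inputs, so distinct sets
        intersect in at most c-1 points: an (l,a)-design with l = q,
        a = c-1, m = q^c sets, and universe size d = q^2."""
        for coeffs in product(range(q), repeat=c):
            yield frozenset(x * q + sum(co * pow(x, i, q)
                                        for i, co in enumerate(coeffs)) % q
                            for x in range(q))

This gives universe size d = ℓ² rather than the sharper O(ℓ²/a) of Lemma 7.22, but it already achieves q^c sets with intersections of size at most c − 1, which conveys the idea.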
Construction 7.23 (Nisan–Wigderson Generator). Given a function f : {0,1}^ℓ → {0,1} and an (ℓ, a)-design S_1, ..., S_m ⊆ [d], define the Nisan–Wigderson generator G : {0,1}^d → {0,1}^m as

    G(x) = f(x|_{S_1}) f(x|_{S_2}) ··· f(x|_{S_m}),

where if x is a string in {0,1}^d and S ⊆ [d], then x|_S is the string of length |S| obtained from x by selecting the bits indexed by S.
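The construction is simple enough to transcribe directly (a Python sketch; f is the hard function on ℓ-bit inputs and design is a list of m size-ℓ subsets of range(d)):

    def nw_generator(f, design, x):
        """Compute G(x) = f(x|S_1) ... f(x|S_m) for a seed x given as a
        tuple of d bits; x|S is the substring of x indexed by S."""
        return tuple(f(tuple(x[i] for i in sorted(S))) for S in design)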
Theorem 7.24. Let G : {0,1}^d → {0,1}^m be the Nisan–Wigderson generator based on a function f : {0,1}^ℓ → {0,1} and some (ℓ, a)-design. If f is (s, 1/2 − ε/m) average-case hard, then G is a (t, ε) pseudorandom generator, for t = s − m · a · 2^a.

Theorem 7.19 follows from Theorem 7.24 by setting ε = 1/m and a = log m, and observing that for ℓ = s^{−1}(m³) we have t = s(ℓ) − m · a · 2^a ≥ m, so we obtain an (m, 1/m) pseudorandom generator. The seed length is d = O(ℓ²/log m) = O(s^{−1}(poly(m))²/log m).
Proof. Suppose G is not a (t, ε) pseudorandom generator. By Proposition 7.16, there is a nonuniform time-t next-bit predictor P such that

    Pr[P(f(X|_{S_1}) f(X|_{S_2}) ··· f(X|_{S_{i−1}})) = f(X|_{S_i})] > 1/2 + ε/m,    (7.2)

for some i ∈ [m], where X is uniform in {0,1}^d. From P, we construct A that computes f with probability greater than 1/2 + ε/m.


Let Y = X|_{S_i}. By averaging, we can fix all the bits of X|_{S̄_i} to some string z (where S̄_i is the complement of S_i) such that the prediction probability remains greater than 1/2 + ε/m (over Y and the coin tosses of the predictor P). Define f_j(y) = f(x|_{S_j}) for j ∈ {1, ..., i−1}. (That is, f_j(y) forms x by placing y in the positions in S_i and z in the others, and then applies f to x|_{S_j}.) Then

    Pr_Y[P(f_1(Y) ··· f_{i−1}(Y)) = f(Y)] > 1/2 + ε/m.

Note that f_j(y) depends on only |S_i ∩ S_j| ≤ a bits of y. Thus, we can compute each f_j with a look-up table, which we can include in the advice to our nonuniform algorithm. Indeed, every function on a bits can be computed by a boolean circuit of size at most a · 2^a. (In fact, size at most O(2^a/a) suffices.)

Then, defining A(y) = P(f_1(y) ··· f_{i−1}(y)), we deduce that A computes f with error probability smaller than 1/2 − ε/m in nonuniform time less than t + m · a · 2^a = s. This contradicts the hardness of f. Thus, we conclude that G is a (t, ε) pseudorandom generator.
Some additional remarks on this proof:

(1) This is a very general construction that works for any average-case hard function f. We only used f ∈ E to deduce that G is computable in E.
(2) The reduction works for any nonuniform class of algorithms C in which functions of logarithmically many bits can be computed efficiently.

Indeed, in the next section we will use the same construction to obtain an unconditional pseudorandom generator fooling constant-depth circuits, and will later exploit the above black-box properties even further.
As mentioned earlier, the parameters of Theorem 7.24 have been improved in subsequent work, but the newer constructions do not have the clean structure of the Nisan–Wigderson generator, where the seed of the generator is used to generate m random but correlated evaluation points, on which the average-case hard function f is evaluated. Indeed, each output bit of the improved generators depends on the entire truth table of the function f, translating to a construction of significantly higher computational complexity. Thus the following remains an interesting open problem (which would have significance for hardness amplification as well as for constructing pseudorandom generators):
Open Problem 7.25. For every ℓ, s ∈ ℕ, construct an explicit generator H : {0,1}^{O(ℓ)} → ({0,1}^ℓ)^m with m = s^{Ω(1)} such that if f is (s, 1/2 − 1/s) average-case hard and we define G(x) = f(H_1(x)) f(H_2(x)) ··· f(H_m(x)), where H_i(x) denotes the ith component of H(x), then G is an (m, 1/m) pseudorandom generator.

7.4.3 Derandomizing Constant-Depth Circuits

Definition 7.26. An unbounded fan-in circuit C(x_1, ..., x_n) has input gates consisting of the variables x_i, their negations ¬x_i, and the constants 0 and 1, as well as computation gates, which can compute the AND or OR of an unbounded number of other gates (rather than just 2, as in usual Boolean circuits).³ The size of such a circuit is the number of computation gates, and the depth is the maximum length of a path from an input gate to the output gate.

AC⁰ is the class of functions f : {0,1}* → {0,1} for which there exist constants c and k and a uniformly constructible sequence of unbounded fan-in circuits (C_n)_{n∈ℕ} such that for all n, C_n has size at most n^c and depth at most k, and for all x ∈ {0,1}^n, C_n(x) = f(x). Uniform constructibility means that there is an efficient (e.g., polynomial-time) uniform algorithm M such that for all n, M(1^n) = C_n (where 1^n denotes the number n in unary, i.e., a string of n 1s). BPAC⁰ is defined analogously, except that C_n may have poly(n) extra inputs, which are interpreted as random bits, and we require Pr_r[C_n(x, r) = f(x)] ≥ 2/3.
³ Note that it is unnecessary to allow internal NOT gates, as these can always be pushed to the inputs via DeMorgan's Laws at no increase in size or depth.


AC⁰ is one of the richest circuit classes for which we have superpolynomial lower bounds:

Theorem 7.27. For every constant k ∈ ℕ and every ℓ ∈ ℕ, the function Par_ℓ : {0,1}^ℓ → {0,1} defined by Par_ℓ(x_1, ..., x_ℓ) = ⊕_{i=1}^{ℓ} x_i is (s_k(ℓ), 1/2 − 1/s_k(ℓ))-average-case hard for nonuniform unbounded fan-in circuits of depth k and size s_k(ℓ) = 2^{Ω(ℓ^{1/k})}.
The proof of this result is beyond the scope of this survey; see the Chapter Notes and References for pointers.

In addition to having an average-case hard function against AC⁰, we also need that AC⁰ can compute arbitrary functions on a logarithmic number of bits.

Lemma 7.28. Every function g : {0,1}^a → {0,1} can be computed by a depth-2 circuit of size 2^a (e.g., by writing g in disjunctive normal form).

Using these two facts with the Nisan–Wigderson pseudorandom generator construction, we obtain the following pseudorandom generator for constant-depth circuits.
Theorem 7.29. For every constant k and every m, there exists a poly(m)-time computable (m, 1/m)-pseudorandom generator G_m : {0,1}^{log^{O(k)} m} → {0,1}^m fooling unbounded fan-in circuits of depth k (and size m).
Proof. This is proven similarly to Theorems 7.19 and 7.24, except that we take f = Par_ℓ rather than a hard function in E, and we observe that the reduction can be implemented in a way that increases the depth by only an additive constant. Specifically, to obtain a pseudorandom generator fooling circuits of depth k and size m, we use the hardness of Par_ℓ against unbounded fan-in circuits of depth k′ = k + 2 and size m², where ℓ = s_{k′}^{−1}(m²) = O(log^{k′} m). Then the seed length of G is O(ℓ²/a) < O(ℓ²) = log^{O(k)} m.


We now follow the steps of the proof of Theorem 7.19 to go from an adversary T of depth k violating the pseudorandomness of G to a circuit A of depth k′ calculating the parity function Par_ℓ.

If T has depth k, then it can be verified that the next-bit predictor P constructed in the proof of Proposition 7.16 also has depth k. (Recall that negations and constants can be propagated to the inputs, so they do not contribute to the depth.) Next, in the proof of Theorem 7.24, we obtain A from P by A(y) = P(f_1(y) f_2(y) ··· f_{i−1}(y)) for some i ∈ {1, ..., m}, where each f_j depends on at most a bits of y. Now we observe that A can be computed by a small constant-depth circuit (if P can). Specifically, applying Lemma 7.28 to each f_j, the size of A is at most (m − 1) · 2^a + m = m² and the depth of A is at most k′ = k + 2. This contradicts the hardness of Par_ℓ.

Corollary 7.30. BPAC⁰ ⊆ P̃.


With more work, this can be strengthened to actually put BPAC⁰ in ÃC⁰, i.e., uniform constant-depth circuits of quasipolynomial size. (The difficulty is that we use majority voting in the derandomization, but small constant-depth circuits cannot compute majority. However, they can compute an approximate majority, and this suffices.)

The above pseudorandom generator can also be used to give a quasipolynomial-time derandomization of the randomized algorithm we saw for approximately counting the number of satisfying assignments to a DNF formula (Theorem 2.34); see Problem 7.4. Improving the running time of either of these derandomizations to polynomial is an intriguing open problem.
Open Problem 7.31. Show that BPAC⁰ = AC⁰ or even BPAC⁰ ⊆ P.

Open Problem 7.32 (Open Problem 2.36, restated). Give a deterministic polynomial-time algorithm for approximately counting the number of satisfying assignments to a DNF formula.


We remark that it has recently been shown how to give an average-case AC⁰ simulation of BPAC⁰ (i.e., the derandomized algorithm is correct on most inputs); see Problem 7.5.
Another open problem is to construct similar unconditional pseudorandom generators as in Theorem 7.29 for circuit classes larger than AC⁰. A natural candidate is AC⁰[2], which is the same as AC⁰ but augmented with unbounded fan-in parity gates. There are known explicit functions f : {0,1}^ℓ → {0,1} (e.g., Majority) for which every AC⁰[2] circuit of depth k computing f has size at least s_k(ℓ) = 2^{ℓ^{Ω(1/k)}}, but unfortunately the average-case hardness is much weaker than we need. These functions are only (s_k(ℓ), 1/2 − 1/O(√ℓ))-average-case hard, rather than (s_k(ℓ), 1/2 − 1/s_k(ℓ))-average-case hard, so we can only obtain a small stretch using Theorem 7.24, and the following remains open.
Open Problem 7.33. For every constant k and every m, construct a (mildly) explicit (m, 1/4)-pseudorandom generator G_m : {0,1}^{m^{o(1)}} → {0,1}^m fooling AC⁰[2] circuits of depth k and size m.

7.5 Worst-Case/Average-Case Reductions and Locally Decodable Codes

In the previous section, we saw how to construct pseudorandom generators from boolean functions that are very hard on average, where every nonuniform algorithm running in time t must err with probability greater than 1/2 − 1/t on a random input. Now we want to relax the assumption to refer to worst-case hardness, as captured by the following definition.

Definition 7.34. A function f : {0,1}^ℓ → {0,1} is worst-case hard for time t if, for all nonuniform probabilistic algorithms A running in time t, there exists x ∈ {0,1}^ℓ such that Pr[A(x) ≠ f(x)] > 1/3, where the probability is over the coin tosses of A.
Note that, for deterministic algorithms A, the definition simply says that ∃x A(x) ≠ f(x). In the nonuniform case, restricting to deterministic algorithms is without loss of generality, because we can always derandomize the algorithm using (additional) nonuniformity. Specifically, following the proof that BPP ⊆ P/poly, it can be shown that if f is worst-case hard for nonuniform deterministic algorithms running in time t, then it is worst-case hard for nonuniform probabilistic algorithms running in time t′ for some t′ = Ω(t/ℓ).
A natural goal is to be able to construct an average-case hard function from a worst-case hard function. More formally, given a function f : {0,1}^ℓ → {0,1} that is worst-case hard for time t = t(ℓ), construct a function f̂ : {0,1}^{O(ℓ)} → {0,1} such that f̂ is average-case hard for time t′ = t^{Ω(1)}. Moreover, we would like f̂ to be in E if f is in E. (Whether we can obtain a similar result for NP is a major open problem, and indeed there are negative results ruling out natural approaches to doing so.)

Our approach to doing this will be via error-correcting codes. Specifically, we will show that if f̂ is the encoding of f in an appropriate kind of error-correcting code, then worst-case hardness of f implies average-case hardness of f̂.

Specifically, we view f as a message of length L = 2^ℓ, and apply an error-correcting code Enc : {0,1}^L → Σ^{L̂} to obtain f̂ = Enc(f), which we view as a function f̂ : {0,1}^{ℓ̂} → Σ, where ℓ̂ = log L̂. Pictorially:

    message f : {0,1}^ℓ → {0,1}   --Enc-->   codeword f̂ : {0,1}^{ℓ̂} → Σ.

(Ultimately, we would like Σ = {0,1}, but along the way we will discuss larger alphabets.)
Now we argue the average-case hardness of f̂ as follows. Suppose, for contradiction, that f̂ is not average-case hard. By definition, there exists an efficient algorithm A with Pr[A(x) = f̂(x)] > 1 − δ. We may assume that A is deterministic by fixing its coins. Then A may be viewed as a received word in Σ^{L̂}, and our condition on A becomes d_H(A, f̂) < δ. So if Dec is a δ-decoding algorithm for Enc, then Dec(A) = f. By assumption A is efficient, so if Dec is efficient, then f may be efficiently computed everywhere. This would contradict our worst-case hardness assumption, assuming that Dec(A) gives a time-t(ℓ) algorithm for f. However, the standard notion of decoding requires reading all 2^{ℓ̂} values of the received word A and writing all 2^ℓ values of the message Dec(A), and thus Time(Dec(A)) ≥ 2^ℓ. But every function on ℓ bits can be computed in nonuniform time roughly 2^ℓ, and even in the uniform case we are mostly interested in t(ℓ) ≤ 2^ℓ. To solve this problem we introduce the notion of local decoding.
Definition 7.35. A local δ-decoding algorithm for a code Enc : {0,1}^L → Σ^{L̂} is a probabilistic oracle algorithm Dec with the following property. Let f : [L] → {0,1} be any message with associated codeword f̂ = Enc(f), and let g : [L̂] → Σ be such that d_H(g, f̂) < δ. Then for all x ∈ [L] we have Pr[Dec^g(x) = f(x)] ≥ 2/3, where the probability is taken over the coin flips of Dec.

In other words, given oracle access to g, we want to efficiently compute any desired bit of f with high probability. So both the input (namely g) and the output (namely f) are treated implicitly; the decoding algorithm does not need to read/write either in its entirety.
Pictorially: Dec is given oracle access to g and an input x, and outputs f(x).

This makes it possible to have sublinear-time (or even polylogarithmic-time) decoding. Also, we note that the bound of 2/3 in the definition can be amplified in the usual way. Having formalized a notion of local decoding, we can now make our earlier intuition precise.
Proposition 7.36. Let Enc be an error-correcting code with a local δ-decoding algorithm Dec that runs in nonuniform time at most t_Dec (meaning that Dec is a boolean circuit of size at most t_Dec equipped with oracle gates), and let f be worst-case hard for nonuniform time t. Then f̂ = Enc(f) is (t′, δ) average-case hard, where t′ = t/t_Dec.


Proof. We do everything as explained before, except with Dec^A in place of Dec(A); now the running time is at most Time(Dec) · Time(A). (We substitute each oracle gate in the circuit for Dec with the circuit for A.)
We note that the reduction in this proof does not use nonuniformity in an essential way. We used nonuniformity to fix the coin tosses of A, making it deterministic. To obtain a version for hardness against uniform probabilistic algorithms, the coin tosses of A can be chosen and fixed randomly instead. With high probability, the fixed coins will not increase A's error by more than a constant factor (by Markov's Inequality); we can compensate for this by replacing the (t′, δ) average-case hardness in the conclusion with, say, (t′, δ/3) average-case hardness.
In light of the above proposition, our task is now to find an error-correcting code Enc : {0,1}^L → Σ^{L̂} with a local decoding algorithm. Specifically, we would like the following parameters.

(1) We want ℓ̂ = O(ℓ), or equivalently L̂ = poly(L). This is because we measure hardness as a function of input length (which in turn translates to the relationship between output length and seed length of pseudorandom generators obtained via Theorem 7.19). In particular, when t = 2^{Ω(ℓ)}, we'd like to achieve t′ = 2^{Ω(ℓ̂)}. Since t′ < t in Proposition 7.36, this is only possible if ℓ̂ = O(ℓ).
(2) We would like Enc to be computable in time 2^{O(ℓ)} = poly(L), which is poly(L̂) if we satisfy the requirement L̂ = poly(L). This is because we want f ∈ E to imply f̂ ∈ E.
(3) We would like Σ = {0,1}, so that f̂ is a boolean function, and δ = 1/2 − ε, so that f̂ has sufficient average-case hardness for the pseudorandom generator construction of Theorem 7.24.
(4) Since f̂ will be average-case hard against time t′ = t/t_Dec, we would want the running time of Dec to be t_Dec = poly(ℓ, 1/ε), so that we can take ε = 1/t^{Ω(1)} and still have t′ = t^{Ω(1)}/poly(ℓ).
Of course, achieving δ = 1/2 − ε is not possible with our current notion of local unique decoding (which is only harder than the standard notion of unique decoding), and thus in the next section we will focus on getting δ to be just a fixed constant. In Section 7.6, we will introduce a notion of local list decoding, which will enable decoding from distance δ = 1/2 − ε.
In our constructions, it will be more natural to focus on the task of decoding codeword symbols rather than message symbols. That is, we replace the message f with the codeword f̂ in Definition 7.35 to obtain the following notion:
Definition 7.37 (Locally Correctible Codes).⁴ A local δ-correcting algorithm for a code C ⊆ Σ^{L̂} is a probabilistic oracle algorithm Dec with the following property. Let f̂ ∈ C be any codeword, and let g : [L̂] → Σ be such that d_H(g, f̂) < δ. Then for all x ∈ [L̂] we have Pr[Dec^g(x) = f̂(x)] ≥ 2/3, where the probability is taken over the coin flips of Dec.

This implies the standard definition of locally decodable codes under the (mild) constraint that the message symbols are explicitly included in the codeword, as captured by the following definition (see also Problem 5.4).
Definition 7.38 (Systematic Encodings). An encoding algorithm Enc : {0,1}^L → C for a code C ⊆ Σ^{L̂} is systematic if there is a polynomial-time computable function I : [L] → [L̂] such that for all f ∈ {0,1}^L, with f̂ = Enc(f), and all x ∈ [L], we have f̂(I(x)) = f(x), where we interpret 0 and 1 as elements of Σ in some canonical way.

Informally, this means that the message f can be viewed as the restriction of the codeword f̂ to the coordinates in the image of I.
Lemma 7.39. If Enc : {0,1}^L → C is systematic and C has a local δ-correcting algorithm running in time t, then Enc has a local δ-decoding algorithm (in the standard sense) running in time t + poly(log L).

Proof. If Dec_1 is the local corrector for C and I is the mapping in the definition of systematic encoding, then Dec_2^g(x) = Dec_1^g(I(x)) is a local decoder for Enc.
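In code, this proof is a one-liner (a sketch; local_corrector takes the oracle g and a codeword position, and I is the systematic map):

    def make_local_decoder(local_corrector, I):
        """Lemma 7.39: to decode message position x, locally correct the
        codeword position I(x) using the same oracle g."""
        return lambda g, x: local_corrector(g, I(x))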
⁴ In the literature, these are often called self-correctible codes.


7.5.1 Local Decoding Algorithms

Hadamard Code. Recall the Hadamard code of message length m, which consists of the truth tables of all Z_2-linear functions c : {0,1}^m → {0,1} (Construction 5.12).
Proposition 7.40. The Hadamard code C ⊆ {0,1}^{2^m} of message length m has a local (1/4 − δ)-correcting algorithm running in time poly(m, 1/δ).
Proof. We are given oracle access to g : {0,1}^m → {0,1} that is at distance less than 1/4 − δ from some (unknown) linear function c, and we want to compute c(x) at an arbitrary point x ∈ {0,1}^m. The idea is random self-reducibility: we can reduce computing c at an arbitrary point to computing c at uniformly random points, where g is likely to give the correct answer. Specifically, c(x) = c(x ⊕ r) ⊕ c(r) for every r, and both x ⊕ r and r are uniformly distributed if we choose r uniformly at random from {0,1}^m. The probability that g differs from c at either of these points is less than 2 · (1/4 − δ) = 1/2 − 2δ. Thus g(x ⊕ r) ⊕ g(r) gives the correct answer with probability noticeably larger than 1/2. We can amplify this success probability by repetition. Specifically, we obtain the following local corrector:
Algorithm 7.41 (Local Corrector for Hadamard Code).
Input: An oracle g : {0,1}^m → {0,1}, x ∈ {0,1}^m, and a parameter δ > 0

(1) Choose r_1, ..., r_t uniformly at random from {0,1}^m, for t = O(1/δ²).
(2) Query g(r_i) and g(r_i ⊕ x) for each i = 1, ..., t.
(3) Output maj_{1≤i≤t} {g(r_i) ⊕ g(r_i ⊕ x)}.

If d_H(g, c) < 1/4 − δ, then this algorithm will output c(x) with probability at least 2/3.
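A direct Python transcription of Algorithm 7.41 (a sketch; g is a callable oracle on m-bit tuples, and the constant in t = O(1/δ²) is chosen loosely):

    import random

    def hadamard_local_correct(g, x, delta):
        """Output c(x) w.p. >= 2/3 when g is within distance 1/4 - delta
        of some linear function c on {0,1}^m."""
        m = len(x)
        t = int(3 / delta ** 2) + 1          # t = O(1/delta^2) repetitions
        votes = 0
        for _ in range(t):
            r = tuple(random.randint(0, 1) for _ in range(m))
            # Each vote g(r) XOR g(r XOR x) equals c(x) unless g errs at
            # r or at r XOR x, so it is correct w.p. > 1/2 + 2*delta.
            votes += g(r) ^ g(tuple(ri ^ xi for ri, xi in zip(r, x)))
        return int(2 * votes > t)            # majority vote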
This local decoding algorithm is optimal in terms of its decoding distance (arbitrarily close to 1/4) and running time (logarithmic in the blocklength), but the problem is that the Hadamard code has exponentially small rate.
Reed–Muller Code. Recall that the q-ary Reed–Muller code of degree d and dimension m consists of all multivariate polynomials p : F_q^m → F_q of total degree at most d (Construction 5.16). This code has minimum distance δ = 1 − d/q. Reed–Muller codes are a common generalization of both Hadamard and Reed–Solomon codes, and thus we can hope that, for an appropriate setting of parameters, we will be able to get the best of both kinds of codes. That is, we want to combine the efficient local decoding of the Hadamard code with the good rate of Reed–Solomon codes.

Theorem 7.42. The q-ary Reed–Muller code of degree d and dimension m has a local 1/12-correcting algorithm running in time poly(m, q), provided d ≤ q/9 and q ≥ 36.
Note that the running time of the decoder is roughly the mth root of the blocklength L̂ = q^m. When m = 1, our decoder can query the entire string and we simply obtain a global decoding algorithm for Reed–Solomon codes (which we already know how to achieve from Theorem 5.19). But for large enough m, the decoder can only access a small fraction of the received word. (In fact, one can improve the running time to poly(m, d, log q), but the weaker result above is sufficient for our purposes.)

The key idea behind the decoder is to take restrictions to random lines in F^m. The restriction of a Reed–Muller codeword to such a line is a Reed–Solomon codeword, and we can afford to run our global Reed–Solomon decoding algorithm on the line.
Formally, for x, y ∈ F^m, we define the (parameterized) line through x in direction y as the function ℓ_{x,y} : F → F^m given by ℓ_{x,y}(t) = x + ty. Note that for every a ∈ F and b ∈ F \ {0}, the line ℓ_{x+ay, by} has the same set of points in its image as ℓ_{x,y}; we refer to this set of points as an unparameterized line. When y = 0, the parameterized line contains only the single point x.


If g : F^m → F is any function and ℓ : F → F^m is a line, then we use g|_ℓ to denote the restriction of g to ℓ, which is simply the composition g ∘ ℓ : F → F. Note that if p is any polynomial of total degree at most d, then p|_ℓ is a (univariate) polynomial of degree at most d.

So we are given an oracle g of distance less than δ from some degree-d polynomial p : F^m → F, and we want to compute p(x) for some x ∈ F^m. We begin by choosing a random line ℓ through x. Every point of F^m \ {x} lies on exactly one parameterized line through x, so the points on ℓ (except x) are distributed uniformly at random over the whole domain, and thus g and p are likely to agree on these points. Thus we can hope to use the points on this line to reconstruct the value of p(x). If δ is sufficiently small compared to the degree (e.g., δ = 1/(3(d + 1))), we can simply interpolate the value of p(x) from d + 1 points on the line. This gives rise to the following algorithm.
Algorithm 7.43 (Local Corrector for Reed–Muller Code I).
Input: An oracle g : F^m → F, an input x ∈ F^m, and a degree parameter d

(1) Choose y uniformly at random from F^m. Let ℓ = ℓ_{x,y} : F → F^m be the line through x in direction y.
(2) Query g to obtain β_0 = g|_ℓ(α_0), ..., β_d = g|_ℓ(α_d), where α_0, ..., α_d ∈ F \ {0} are any fixed points.
(3) Interpolate to find the unique univariate polynomial q of degree at most d such that q(α_i) = β_i for all i.
(4) Output q(0).
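The following Python sketch implements Algorithm 7.43 over a prime field F_p (an illustration; g is a callable oracle on points of F_p^m, and we assume d + 1 < p; pow(·, −1, p) is the modular inverse, Python 3.8+):

    import random

    def rm_local_correct(g, x, d, p):
        """Output p(x) w.p. > 2/3 when g is within distance 1/(3(d+1))
        of some m-variate polynomial of total degree <= d over F_p."""
        y = [random.randrange(p) for _ in range(len(x))]   # random direction
        line = lambda t: tuple((xi + t * yi) % p for xi, yi in zip(x, y))
        ts = range(1, d + 2)                               # d+1 nonzero points
        vals = {t: g(line(t)) for t in ts}
        # Lagrange interpolation of q at 0 from the points (t, vals[t]):
        # q(0) = sum_t vals[t] * prod_{t' != t} t' / (t' - t)   (mod p)
        q0 = 0
        for t in ts:
            num, den = 1, 1
            for t2 in ts:
                if t2 != t:
                    num = num * t2 % p
                    den = den * (t2 - t) % p
            q0 = (q0 + vals[t] * num * pow(den, -1, p)) % p
        return q0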

Claim 7.44. If g has distance less than δ = 1/(3(d + 1)) from some polynomial p of degree at most d, then Algorithm 7.43 will output p(x) with probability greater than 2/3.

Proof of Claim: Observe that for all x ∈ F^m and α_i ∈ F \ {0}, ℓ_{x,y}(α_i) is uniformly random in F^m over the choice of a uniformly random y ∈ F^m. This implies that for each i,

    Pr[g|_ℓ(α_i) ≠ p|_ℓ(α_i)] < δ = 1/(3(d + 1)).

By a union bound,

    Pr[∃i : g|_ℓ(α_i) ≠ p|_ℓ(α_i)] < (d + 1) · δ = 1/3.

Thus, with probability greater than 2/3, we have q(α_i) = p|_ℓ(α_i) for all i, and hence q(0) = p(x). The running time of the algorithm is poly(m, q).
We now show how to improve the decoder to handle a larger fraction of errors, up to distance δ = 1/12. We alter Steps (2) and (3) in the above algorithm. In Step (2), instead of querying only d + 1 points, we query g at all points on ℓ. In Step (3), instead of interpolation, we use a global decoding algorithm for Reed–Solomon codes to recover the univariate polynomial p|_ℓ. Formally, the algorithm proceeds as follows.
Algorithm 7.45 (Local Corrector for Reed–Muller Codes II).
Input: An oracle g : F^m → F, an input x ∈ F^m, and a degree parameter d, where q = |F| ≥ 36 and d ≤ q/9.

(1) Choose y uniformly at random from F^m. Let ℓ = ℓ_{x,y} : F → F^m be the line through x in direction y.
(2) Query g at all points on ℓ to obtain g|_ℓ : F → F.
(3) Run the 1/3-decoder for the q-ary Reed–Solomon code of degree d on g|_ℓ to obtain the (unique) polynomial q at distance less than 1/3 from g|_ℓ (if one exists).⁵
(4) Output q(0).


⁵ A 1/3-decoder for Reed–Solomon codes follows from the (1 − 2√(d/q)) list-decoding algorithm of Theorem 5.19. Since 1/3 ≤ 1 − 2√(d/q), the list-decoder will produce a list containing all univariate polynomials at distance less than 1/3, and since 1/3 is smaller than half the minimum distance (1 − d/q), there will be only one good decoding.


Claim 7.46. If g has distance less than δ = 1/12 from some polynomial p of degree at most d, and the parameters satisfy q = |F| ≥ 36 and d ≤ q/9, then Algorithm 7.45 will output p(x) with probability greater than 2/3.

Proof of Claim: The expected distance between g|_ℓ and p|_ℓ is small:

    E[d_H(g|_ℓ, p|_ℓ)] < 1/q + δ ≤ 1/36 + 1/12 = 1/9,

where the term 1/q is due to the fact that the point x is not random. Therefore, by Markov's Inequality,

    Pr[d_H(g|_ℓ, p|_ℓ) ≥ 1/3] ≤ 1/3.

Thus, with probability at least 2/3, we have that p|_ℓ is the unique polynomial of degree at most d at distance less than 1/3 from g|_ℓ, and thus q must equal p|_ℓ.
7.5.2 Low-Degree Extensions

Recall that to obtain locally decodable codes from locally correctible codes (as constructed above), we need to exhibit a systematic encoding (Definition 7.38). Thus, given f : [L] → {0,1}, we want to encode it as a Reed–Muller codeword f̂ : [L̂] → Σ such that:

• The encoding time is 2^{O(ℓ)} = poly(L).
• ℓ̂ = O(ℓ), or equivalently L̂ = poly(L).
• The code is systematic in the sense of Definition 7.38.

Note that the usual encoding for ReedMuller codes, where the message gives the coecients of the polynomial, is not systematic. Instead
the message should correspond to evaluations of the polynomial at
certain points. Once we settle on the set of evaluation points, the task
becomes one of interpolating the values at these points (given by the
message) to a low-degree polynomial dened everywhere.
The simplest approach is to use the boolean hypercube as the set
of evaluation points.


Lemma 7.47 (multilinear extension). For every f : {0,1}^ℓ → {0,1} and every finite field F, there exists a (unique) polynomial f̂ : F^ℓ → F such that f̂|_{{0,1}^ℓ} ≡ f and f̂ has degree at most 1 in each variable (and hence total degree at most ℓ).

Proof. We prove the existence of the polynomial f̂. Define

    f̂(x_1, ..., x_ℓ) = Σ_{α ∈ {0,1}^ℓ} f(α) · δ_α(x),   where   δ_α(x) = Π_{i : α_i = 1} x_i · Π_{i : α_i = 0} (1 − x_i).

Note that for x ∈ {0,1}^ℓ, δ_α(x) = 1 only when α = x; therefore f̂|_{{0,1}^ℓ} ≡ f. We omit the proof of uniqueness. The bound on the individual degrees is by inspection.
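A direct transcription of this formula (a sketch over a prime field F_p; evaluating f̂ at one point sums over all 2^ℓ terms, in line with the encoding-time accounting below):

    from itertools import product

    def multilinear_extension(f, l, p):
        """Return fhat: F_p^l -> F_p with
        fhat(x) = sum over alpha in {0,1}^l of f(alpha) * delta_alpha(x)."""
        def fhat(x):
            total = 0
            for alpha in product((0, 1), repeat=l):
                term = f(alpha)
                for ai, xi in zip(alpha, x):
                    # delta_alpha(x) multiplies x_i when alpha_i = 1
                    # and (1 - x_i) when alpha_i = 0.
                    term = term * (xi if ai else (1 - xi)) % p
                total = (total + term) % p
            return total
        return fhat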
Thinking of f̂ as an encoding of f, let's inspect the properties of this encoding.

• Since the total degree of the multilinear extension can be as large as ℓ, we need q ≥ 9ℓ for the local corrector of Theorem 7.42 to apply.
• The encoding time is 2^{O(ℓ̂)}, as computing a single point of f̂ requires summing over 2^ℓ elements, and we have 2^{ℓ̂} points on which to compute f̂.
• The code is systematic, since f̂ is an extension of f.
• However, the input length is ℓ̂ = ℓ · log q = Θ(ℓ log ℓ), which is slightly larger than our target of ℓ̂ = O(ℓ).

To solve the problem of the input length ℓ̂ in the multilinear encoding, we reduce the dimension of the polynomial f̂ by changing the embedding of the domain of f: instead of interpreting {0,1} ⊆ F as an embedding of the domain of f in F^ℓ, we map {0,1}^ℓ to H^m for some subset H ⊆ F, and as such embed it in F^m.

More precisely, we fix a subset H ⊆ F of size |H| = ⌈√q⌉. Choose m = ⌈ℓ/log|H|⌉, and fix some efficient one-to-one mapping from {0,1}^ℓ into H^m. With this mapping, view f as a function f : H^m → F. Analogously to before, we have the following.
Lemma 7.48 (low-degree extension). For every finite field F, H ⊆ F, m ∈ ℕ, and function f : H^m → F, there exists a (unique) f̂ : F^m → F such that f̂|_{H^m} ≡ f and f̂ has degree at most |H| − 1 in each variable (and hence has total degree at most m · (|H| − 1)).

Using |H| = ⌈√q⌉, the total degree of f̂ is at most d = m√q. So we can apply the local corrector of Theorem 7.42, as long as q ≥ 81m² (so that d ≤ q/9). Inspecting the properties of f̂ as an encoding of f, we have:
have:
The input length is  = m log q = /log |H| log q = O(),

as desired. (We can use a eld of size 2k for k N, so that L


is a power of 2 and we incur no loss in encoding inputs to f
as bits.)
The code is systematic as long as our mapping from {0, 1}
to Hm is ecient.
Note that not every polynomial of total degree at most m (|H| 1)
is the low-degree extension of a function f : Hm F, so the image of
our encoding function f  f is only a subcode of the ReedMuller code.
This is not a problem, because any subcode of a locally correctible
code is also locally correctible, and we can aord the loss in rate (all
we need is  = O()).
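For completeness, here is the analogous sketch of the low-degree extension of Lemma 7.48 (again over a prime field F_p, with H a list of distinct field elements; it generalizes the multilinear sketch above, which is the special case H = {0, 1}):

    from itertools import product

    def low_degree_extension(f, H, m, p):
        """Return fhat: F_p^m -> F_p with fhat(x) =
        sum over h in H^m of f(h) * prod_i chi_{h_i}(x_i), where chi_h is
        the degree-(|H|-1) polynomial that is 1 at h and 0 on H \\ {h}."""
        def chi(h, t):
            val = 1
            for h2 in H:
                if h2 != h:
                    val = val * (t - h2) * pow(h - h2, -1, p) % p
            return val
        def fhat(x):
            total = 0
            for h in product(H, repeat=m):
                term = f(h)
                for hi, xi in zip(h, x):
                    term = term * chi(hi, xi) % p
                total = (total + term) % p
            return total
        return fhat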
7.5.3 Putting It Together

Combining Theorem 7.42 with Lemmas 7.48 and 7.39, we obtain the following locally decodable code:
Proposition 7.49. For every L ∈ ℕ, there is an explicit code Enc : {0,1}^L → Σ^{L̂}, with blocklength L̂ = poly(L) and alphabet size |Σ| = poly(log L), that has a local (1/12)-decoder running in time poly(log L).

Using Proposition 7.36, we obtain the following conversion from worst-case hardness to average-case hardness:
Proposition 7.50. If there exists f : {0,1}^ℓ → {0,1} in E that is worst-case hard against (nonuniform) time t(ℓ), then there exists f̂ : {0,1}^{O(ℓ)} → {0,1}^{O(log ℓ)} in E that is (t′(ℓ), 1/12) average-case hard for t′(ℓ) = t(ℓ)/poly(ℓ).
This differs from our original goal in two ways: f̂ is not Boolean, and we only get hardness 1/12 (instead of 1/2 − ε). The former concern can be remedied by concatenating the code of Proposition 7.49 with a Hadamard code, similarly to Problem 5.2. Note that the Hadamard code is applied on message space Σ, which is of size polylog(L), so it can be 1/4-decoded by brute force in time polylog(L) (which is the amount of time already taken by our decoder).⁶ Using this, we obtain:
Theorem 7.51. For every L ∈ ℕ, there is an explicit code Enc : {0,1}^L → {0,1}^{L̂} with blocklength L̂ = poly(L) that has a local (1/48)-decoder running in time poly(log L).

Theorem 7.52. If there exists f : {0,1}^ℓ → {0,1} in E that is worst-case hard against time t(ℓ), then there exists f̂ : {0,1}^{O(ℓ)} → {0,1} in E that is (t′(ℓ), 1/48) average-case hard, for t′(ℓ) = t(ℓ)/poly(ℓ).

An improved decoding distance can be obtained using Problem 7.7.
We note that the local decoder of Theorem 7.51 not only runs in time poly(log L), but also makes poly(log L) queries. For some applications (such as Private Information Retrieval, see Problem 7.6), it is important to have the number q of queries be as small as possible, ideally a constant. Using Reed–Muller codes of constant degree, it is possible to obtain constant-query locally decodable codes, but the blocklength will be L̂ = exp(L^{1/(q−1)}). In a recent breakthrough, it was shown how to obtain constant-query locally decodable codes with blocklength L̂ = exp(L^{o(1)}). Obtaining polynomial blocklength remains open.

⁶ Some readers may recognize this concatenation step as the same as applying the Goldreich–Levin hardcore predicate to f̂. (See Problems 7.12 and 7.13.) However, for the parameters we are using, we do not need the power of these results, and can afford to perform brute-force unique decoding instead.
Open Problem 7.53. Are there binary codes that are locally decodable with a constant number of queries (from constant distance δ > 0) and blocklength polynomial in the message length?
7.5.4 Other Connections

As shown in Problem 7.6, locally decodable codes are closely related to protocols for private information retrieval. Another connection, and actually the setting in which these local decoding algorithms were first discovered, is to program self-correctors. Suppose we have a program for computing a function, such as the Determinant, which happens to be a codeword in a locally decodable code (e.g., the determinant is a low-degree multivariate polynomial, and hence a Reed–Muller codeword). Then, even if this program has some bugs and gives the wrong answer on some small fraction of inputs, we can use the local decoding algorithm to obtain the correct answer on all inputs with high probability.

7.6 Local List Decoding and PRGs from Worst-Case Hardness

7.6.1 Hardness Amplification

In the previous section, we saw how to use locally decodable codes to convert worst-case hard functions into ones with constant average-case hardness (Theorem 7.52). Now our goal is to amplify this hardness (e.g., to 1/2 − ε).
There are some generic techniques for hardness amplification. In these methods, we evaluate the function on many independent inputs. For example, consider f′ that concatenates the evaluations of f̂ on k independent inputs:

    f′(x_1, ..., x_k) = (f̂(x_1), ..., f̂(x_k)).

Intuitively, if f̂ is 1/12 average-case hard, then f′ should be (1 − (11/12)^k)-average-case hard, because any efficient algorithm can solve each instance correctly with probability at most 11/12. Proving this is nontrivial (because the algorithm trying to compute f′ need not behave independently on the k instances), but there are Direct Product Theorems showing that the hardness does get amplified essentially as expected. Similarly, if we take the XOR of the evaluations on k independent inputs, the XOR Lemma says that the hardness approaches 1/2 exponentially fast.
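The two amplified functions are trivial to write down (a Python illustration, not from the original text); the difficulty lies entirely in proving that the hardness multiplies:

    from functools import reduce
    from operator import xor

    def direct_product(f, xs):
        """f'(x_1,...,x_k) = (f(x_1),...,f(x_k)); by the Direct Product
        Theorems, the hardness should approach 1 - (1 - delta)^k."""
        return tuple(f(x) for x in xs)

    def xor_amplification(f, xs):
        """f''(x_1,...,x_k) = f(x_1) XOR ... XOR f(x_k); by the XOR Lemma,
        the advantage over random guessing decays exponentially in k."""
        return reduce(xor, (f(x) for x in xs))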
The main disadvantage of these approaches (for our purposes) is that the input length of f′ is kℓ, while we aim for input length O(ℓ). To overcome this problem, it is possible to use derandomized products, where we evaluate f̂ on correlated inputs instead of independent ones.

We will take a different approach, generalizing the notion and algorithms for locally decodable codes to locally list-decodable codes, and thereby directly constructing f̂ that is (1/2 − ε)-hard. Nevertheless, the study of hardness amplification is still of great interest, because it (or variants) can be employed in settings where doing a global encoding of the function is infeasible (e.g., for amplifying the average-case hardness of functions in complexity classes lower than E, such as NP, and for amplifying the security of cryptographic primitives). We remark that results on hardness amplification can be interpreted in a coding-theoretic language as well, as converting locally decodable codes with a small decoding distance into locally list-decodable codes with a large decoding distance. (See Section 8.2.3.)
7.6.2 Definition

We would like to formulate a notion of local list-decoding that enables us to have binary codes that are locally decodable from distances close to 1/2. This is somewhat tricky to define: what does it mean to produce a list of decodings when only asked to decode a particular coordinate? Let g be our received word, and f_1, f_2, ..., f_s the codewords that are close to g. One option would be for the decoding algorithm, on input x, to output a set of values Dec^g(x) that is guaranteed to contain f_1(x), f_2(x), ..., f_s(x) with high probability. However, this is not very useful; in the common case that s ≥ |Σ|, the list could always be Dec^g(x) = Σ. Rather than outputting all of the values, we want to be able to specify to our decoder which f_i(x) to output. We do this with a two-phase decoding algorithm (Dec_1, Dec_2), where both phases can be randomized.

(1) Dec_1, using g as an oracle and not given any input other than the parameters defining the code, returns a list of advice strings a_1, a_2, ..., a_s, which can be thought of as labels for each of the codewords close to g.
(2) Dec_2 (again using oracle access to g) takes input x and a_i, and outputs f_i(x).
The picture for Dec_2 is much like our old decoder, but it takes an extra input a_i corresponding to one of the outputs of Dec_1: Dec_2 is given oracle access to g and inputs x and a_i, and outputs f_i(x).

More formally:

Definition 7.54. A local δ-list-decoding algorithm for a code Enc is a pair of probabilistic oracle algorithms (Dec_1, Dec_2) such that for all received words g and all codewords f̂ = Enc(f) with d_H(f̂, g) < δ, the following holds. With probability at least 2/3 over (a_1, ..., a_s) ← Dec_1^g, there exists an i ∈ [s] such that

    ∀x, Pr[Dec_2^g(x, a_i) = f(x)] ≥ 2/3.

Note that we don't explicitly require a bound on the list size s (to avoid introducing another parameter), but certainly it cannot be larger than the running time of Dec_1.
As we did for locally (unique-)decodable codes, we can define a local δ-list-correcting algorithm, where Dec_2 should recover arbitrary symbols of the codeword f̂ rather than the message f. In this case, we don't require that for all j, Dec_2^g(·, a_j) is a codeword, or that it is close to g; in other words, some of the a_j's may be junk. Analogously to Lemma 7.39, a local δ-list-correcting algorithm implies local δ-list-decoding if the code is systematic.
Proposition 7.36 shows how locally decodable codes convert functions that are hard in the worst case into ones that are hard on average. The same is true for local list-decoding:

Proposition 7.55. Let Enc be an error-correcting code with a local δ-list-decoding algorithm (Dec_1, Dec_2) where Dec_2 runs in time at most t_Dec, and let f be worst-case hard for nonuniform time t. Then f̂ = Enc(f) is (t′, δ) average-case hard, where t′ = t/t_Dec.
Proof. Suppose for contradiction that f̂ is not (t′, δ)-hard. Then some nonuniform algorithm A running in time t′ computes f̂ with error probability smaller than δ. But if Enc has a local list-decoding algorithm, then (with A playing the role of g) there exists a_i (one of the possible outputs of Dec_1^A) such that Dec_2^A(·, a_i) computes f(·) everywhere. Hardwiring a_i as advice, Dec_2^A(·, a_i) is a nonuniform algorithm running in time at most time(A) · time(Dec_2) ≤ t.
Note that, in contrast to Proposition 7.36, here we are using nonuniformity more crucially, in order to select the right function from the list of possible decodings. As we will discuss in Section 7.7.1, this use of nonuniformity is essential for black-box constructions, which do not exploit any structure in the hard function f or the adversary (A in the above proof). However, there are results on hardness amplification against uniform algorithms, which use structure in the hard function f (e.g., that it is complete for a complexity class like E or NP) to identify it among the list of decodings without any nonuniform advice.
7.6.3 Local List-Decoding of Reed–Muller Codes

Theorem 7.56. There is a universal constant c such that the q-ary Reed–Muller code of degree d and dimension m can be locally (1 − ε)-list-corrected in time poly(q, m) for ε = c·√(d/q).

Note that the distance at which list decoding can be done approaches 1 as q/d → ∞. It matches the bound for list-decoding Reed–Solomon codes (Theorem 5.19) up to the constant c. Moreover, as the dimension m increases, the running time of the decoder (poly(q, m)) becomes much smaller than the blocklength (q^m · log q), at the price of a reduced rate ((m+d choose m)/q^m).
Proof. Suppose we are given an oracle g : F^m → F that is (1 − ε)-close to some unknown polynomial p : F^m → F, and that we are given an x ∈ F^m. Our goal is to describe two algorithms, Dec_1 and Dec_2, where Dec_2 is able to compute p(x) using a piece of Dec_1's output (i.e., advice).

The advice that we will give to Dec_2 is the value of p on a single point. Dec_1 can easily generate a (reasonably small) list that contains one such point by choosing a random y ∈ F^m and outputting all pairs (y, z), for z ∈ F. More formally:
Algorithm 7.57 (Reed–Muller Local List-Decoder Dec_1).
Input: An oracle g : F^m → F and a degree parameter d

(1) Choose y uniformly at random from F^m.
(2) Output {(y, z) : z ∈ F}.
This first-phase decoder is rather trivial, in that it doesn't make use of the oracle access to the received word g. It is possible to improve both the running time and list size of Dec_1 by using oracle access to g, but we won't need those improvements below.

Now, the task of Dec_2 is to calculate p(x), given the value of p on some point y. Dec_2 does this by looking at g restricted to the line through x and y, and using the list-decoding algorithm for Reed–Solomon codes to find the univariate polynomials q_1, q_2, ..., q_t that are close to g|_ℓ. If exactly one of these polynomials q_i agrees with p on the test point y, then we can be reasonably confident that q_i(x) = p(x). In more detail, the decoder works as follows:
Algorithm 7.58 (Reed–Muller Local List-Corrector Dec_2).
Input: An oracle g : F^m → F, an input x ∈ F^m, advice (y, z) ∈ F^m × F, and a degree parameter d

(1) Let ℓ = ℓ_{x,y−x} : F → F^m be the line through x and y (so that ℓ(0) = x and ℓ(1) = y).
(2) Run the (1 − ε/2)-list-decoder for Reed–Solomon codes (Theorem 5.19) on g|_ℓ to get all univariate polynomials q_1, ..., q_t that agree with g|_ℓ on greater than an ε/2 fraction of points.
(3) If there exists a unique i such that q_i(1) = z, output q_i(0). Otherwise, fail.
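A Python sketch of Dec_2 (assuming, as a black box, a Reed–Solomon list-decoder rs_list_decode(vals, d, tau) in the spirit of Theorem 5.19, returning the coefficient lists of all degree-≤d univariate polynomials that agree with vals on more than a tau fraction of F_p):

    def rm_list_correct_dec2(g, x, advice, d, p, eps, rs_list_decode):
        """Given advice (y, z) with z a claimed value of p(y), output p(x)
        with high probability, or None on failure."""
        y, z = advice
        # Parameterized line with line(0) = x and line(1) = y.
        line = lambda t: tuple((xi + t * (yi - xi)) % p for xi, yi in zip(x, y))
        vals = [g(line(t)) for t in range(p)]        # g restricted to the line
        ev = lambda q, t: sum(c * pow(t, i, p) for i, c in enumerate(q)) % p
        # Keep only the candidates that agree with the advice at y = line(1).
        matches = [q for q in rs_list_decode(vals, d, eps / 2) if ev(q, 1) == z]
        return ev(matches[0], 0) if len(matches) == 1 else None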

Now that we have fully specified the algorithms, it remains to analyze them and show that they decode p correctly. Observe that it suffices to compute p on greater than an 11/12 fraction of the points x, because then we can apply the unique local correcting algorithm of Theorem 7.42. Therefore, to finish the proof of the theorem we must prove the following.
Claim 7.59. Suppose that g : F^m → F has agreement greater than ε with a polynomial p : F^m → F of degree at most d. For at least half of the points y ∈ F^m, the following holds for greater than an 11/12 fraction of lines ℓ going through y:

(1) agr(g|_ℓ, p|_ℓ) > ε/2.
(2) There does not exist any univariate polynomial q of degree at most d other than p|_ℓ such that agr(g|_ℓ, q) > ε/2 and q(y) = p(y).

Proof of Claim: It suffices to show that Items (1) and (2) hold with probability 0.99 over the choice of a random point y ∈ F^m and a random line ℓ through y; then we can apply Markov's inequality to finish the job.

Item (1) holds by pairwise independence. If the line ℓ is chosen randomly, then the q points on ℓ are pairwise independent samples of F^m. The expected agreement between g|_ℓ and p|_ℓ is simply the agreement between g and p, which is greater than ε by hypothesis. So by the Pairwise-Independent Tail Inequality (Proposition 3.28),

    Pr[agr(g|_ℓ, p|_ℓ) ≤ ε/2] < 1/(q · (ε/2)²),

which can be made smaller than 0.01 for a large enough choice of the constant c in ε = c·√(d/q).
To prove Item (2), we imagine first choosing the line ℓ uniformly at random from all lines in F^m, and then choosing y uniformly at random from the points on ℓ (reparameterizing ℓ so that ℓ(1) = y). Once we choose ℓ, we can let q_1, ..., q_t be all polynomials of degree at most d, other than p|_ℓ, that have agreement greater than ε/2 with g|_ℓ. (Note that this list is independent of the parameterization of ℓ; i.e., if ℓ′(x) = ℓ(ax + b) for a ≠ 0, then p|_{ℓ′} and q′_i(x) = q_i(ax + b) have agreement equal to agr(p|_ℓ, q_i).) By the list-decodability of Reed–Solomon codes (Proposition 5.15), we have t = O(√(q/d)).

Now, since two distinct polynomials can agree on at most d points, when we choose a random point y ∈ ℓ, the probability that a given q_i and p agree at y is at most d/q. After reparameterization of ℓ so that ℓ(1) = y, this gives

    Pr_y[∃i : q_i(1) = p|_ℓ(1)] ≤ t · (d/q) = O(√(d/q)).

This can also be made smaller than 0.01 for a large enough choice of the constant c (since we may assume q/d > c², else ε ≥ 1 and the result holds vacuously).
7.6.4 Putting It Together

To obtain a locally list-decodable (rather than list-correctible) code, we again use the low-degree extension (Lemma 7.48) to obtain a systematic encoding. As before, to encode messages of length L (with ℓ = log L), we apply Lemma 7.48 with |H| = ⌈√q⌉ and m = ⌈ℓ/log|H|⌉, for total degree d ≤ m√q. To decode from a 1 − ε fraction of errors using Theorem 7.56, we need c·√(d/q) ≤ ε, which follows if q ≥ c⁴m²/ε⁴. This yields the following locally list-decodable codes:


Theorem 7.60. For every L ∈ ℕ and ε > 0, there is an explicit code Enc : {0,1}^L → Σ^{L̂}, with blocklength L̂ = poly(L, 1/ε) and alphabet size |Σ| = poly(log L, 1/ε), that has a local (1 − ε)-list-decoder running in time poly(log L, 1/ε).
Concatenating the code with a Hadamard code, similarly to Problem 5.2, we obtain:

Theorem 7.61. For every L ∈ ℕ and ε > 0, there is an explicit code Enc : {0,1}^L → {0,1}^{L̂} with blocklength L̂ = poly(L, 1/ε) that has a local (1/2 − ε)-list-decoder running in time poly(log L, 1/ε).
Using Proposition 7.55, we get the following hardness amplification result:

Theorem 7.62. For s : ℕ → ℕ, suppose that there is a function f : {0,1}^ℓ → {0,1} in E that is worst-case hard against nonuniform time s(ℓ), where s(ℓ) is computable in time 2^{O(ℓ)}. Then there exists f̂ : {0,1}^{O(ℓ)} → {0,1} in E that is (s′(ℓ), 1/2 − 1/s′(ℓ)) average-case hard against (nonuniform) time s′(ℓ), for s′(ℓ) = s(ℓ)^{Ω(1)}/poly(ℓ).
Combining this with Theorem 7.19 and Corollary 7.20, we get:

Theorem 7.63. For s : ℕ → ℕ, suppose that there is a function f ∈ E such that for every input length ℓ ∈ ℕ, f is worst-case hard for nonuniform time s(ℓ), where s(ℓ) is computable in time 2^{O(ℓ)}. Then for every m ∈ ℕ, there is a mildly explicit (m, 1/m) pseudorandom generator G : {0,1}^{d(m)} → {0,1}^m with seed length d(m) = O(s^{−1}(poly(m))²/log m).

Corollary 7.64. For s : ℕ → ℕ, suppose that there is a function f ∈ E = DTIME(2^{O(ℓ)}) such that for every input length ℓ ∈ ℕ, f is worst-case hard for nonuniform time s(ℓ). Then:

(1) If s(ℓ) = 2^{Ω(ℓ)}, then BPP = P.
(2) If s(ℓ) = 2^{ℓ^{Ω(1)}}, then BPP ⊆ P̃.
(3) If s(ℓ) = ℓ^{ω(1)}, then BPP ⊆ SUBEXP.
We note that the hypotheses in these results simply assert that there are problems in E of high circuit complexity, which is quite plausible. Indeed, many common NP-complete problems, such as SAT, are in E and are commonly believed to have circuit complexity 2^{Ω(ℓ)} on inputs of length ℓ (though we seem very far from proving it). Thus, we have a win-win situation: either we can derandomize all of BPP, or SAT has significantly faster (nonuniform) algorithms than currently known.
Problem 7.1 establishes a converse to Theorem 7.63, showing that pseudorandom generators imply circuit lower bounds. The equivalence is fairly tight, except for the fact that Theorem 7.63 has seed length d(m) = O(s^{−1}(poly(m))²/log m) instead of d(m) = O(s^{−1}(poly(m))). It is known how to close this gap via a different construction, which is more algebraic and constructs PRGs directly from worst-case hard functions (see the Chapter Notes and References); a (positive) solution to Open Problem 7.25 would give a more modular and versatile construction. For Corollary 7.64, however, only a partial converse is known. See Section 8.2.2.
Technical Comment. Consider Item 3 of Corollary 7.64, which assumes that there is a problem in E of superpolynomial circuit complexity. This sounds similar to assuming that E ⊄ P/poly (which is equivalent to EXP ⊄ P/poly, by Problem 7.2). However, the latter assumption is a bit weaker, because it only guarantees that there is a function f ∈ E and a function s(ℓ) = ℓ^{ω(1)} such that f has complexity at least s(ℓ) for infinitely many input lengths ℓ. Theorem 7.63 and Corollary 7.64 assume that f has complexity at least s(ℓ) for all ℓ; equivalently, f is not in i.o.-P/poly, the class of functions that are computable by poly-sized circuits for infinitely many input lengths. We need the stronger assumptions because we want to build generators G : {0,1}^{d(m)} → {0,1}^m that are pseudorandom for all output lengths m, in order to get derandomizations of BPP algorithms that are correct on all input lengths. However, there are alternate forms of these results, where the "infinitely often" is moved from the hypothesis to the conclusion. For example, if E ⊄ P/poly, we can conclude that BPP ⊆ i.o.-SUBEXP, where i.o.-SUBEXP denotes the class of languages having deterministic subexponential-time algorithms that are correct for infinitely many input lengths. Even though these "infinitely often" issues need to be treated with care for the sake of precision, it would be quite unexpected if the complexity of problems in E and BPP oscillated as a function of input length in such a strange way that they made a real difference.

7.7 Connections to Other Pseudorandom Objects

7.7.1 Black-Box Constructions

Similarly to our discussion after Theorem 7.19, the pseudorandom generator construction in the previous section is very general. The construction shows how to take any function f : {0,1}^ℓ → {0,1} and use it as a subroutine (oracle) to compute a generator G^f : {0,1}^d → {0,1}^m whose pseudorandomness can be related to the hardness of f. The only place that we use the fact that f ∈ E is to deduce that G^f is computable in E. The reduction proving that G^f is pseudorandom is also very general. We showed how to take any T that distinguishes the output of G^f(U_d) from U_m and use it as a subroutine (oracle) to build an efficient nonuniform algorithm Red such that Red^T computes f. The only place that we use the fact that T is itself an efficient nonuniform algorithm is to deduce that Red^T is an efficient nonuniform algorithm, contradicting the worst-case hardness of f.

Such constructions are called black box, because they treat the hard function f and the distinguisher T as black boxes (i.e., oracles), without using the code of the programs that compute f and T. As we will see, black-box constructions have significant additional implications. Thus, we formalize the notion of a black-box construction as follows:
Definition 7.65. Let G^f : [D] → [M] be a deterministic algorithm that is defined for every oracle f : [L] → {0,1}, let t, k be positive integers such that k ≤ t, and let ε > 0. We say that G is a (t, k, ε) black-box PRG construction if there is a randomized oracle algorithm Red, running in time t, such that for every f : [L] → {0,1} and T : [M] → {0,1}, if

    Pr[T(G^f(U_{[D]})) = 1] − Pr[T(U_{[M]})) = 1] > ε,

then there is an advice string z ∈ [K] (for K = 2^k) such that

    ∀x ∈ [L],  Pr[Red^T(x, z) = f(x)] ≥ 2/3,

where the probability is taken over the coin tosses of Red.


Note that we have separated the running time t of Red and the length k of its nonuniform advice into two separate parameters, and we assume k ≤ t since an algorithm cannot read more than t bits of advice in time t. When we think of Red as a nonuniform algorithm (like a boolean circuit), we may as well think of these two parameters as being equal. (Recall that, up to polylog(s) factors, being computable by a circuit of size s is equivalent to being computable by a uniform algorithm running in time s with s bits of nonuniform advice.) However, separating the two parameters is useful in order to isolate the role of nonuniformity, and to establish connections with the other pseudorandom objects we are studying.⁷
We note that if we apply a black-box pseudorandom generator construction to a function f that is actually hard to compute, then the result is indeed a pseudorandom generator:

Proposition 7.66. If G is a (t, k, ε) black-box PRG construction and f has nonuniform worst-case hardness at least s, then G^f is an (s/Õ(t), ε) pseudorandom generator.

7 Sometimes

it is useful to allow the advice string z to also depend on the coin tosses of
the reduction Red. By error reduction via r = O() repetitions, such a reduction can be
converted into one satisfying Denition 7.65 by sampling r = O() sequences of coin tosses,
but this blows up the advice length by a factor of r, which may be too expensive.

7.7 Connections to Other Pseudorandom Objects

263

Now, we can rephrase the pseudorandom generator construction of


Theorem 7.63 as follows:
Theorem 7.67. For every constant > 0, and every , m N,
and every > 0, there is a (t, k, ) black-box PRG construction
Gf : {0, 1}d {0, 1}m that is dened for every oracle f : {0, 1} {0, 1},
with the following properties:
(1) (Mild) explicitness: Gf is computable in uniform time
poly(m, 2 ) given an oracle for f .
(2) Seed length: d = O(( + log(1/))2 / log m).
(3) Reduction running time: t = poly(m, 1/).
(4) Reduction advice length: k = m1+ + O( + log(m/)).

In addition to asserting the black-box nature of Theorem 7.63, the


above is more general in that it allows to vary independently of m
(rather than setting = 1/m), and gives a tighter bound on the length
of the nonuniform advice than just t = poly(m, 1/).
Proof Sketch: Given a function f , Gf encodes f in the locally
list-decodable code of Theorem 7.61 (with decoding distance 1/2 

for  = /m) to obtain f : {0, 1} {0, 1} with  = O( + log(1/)),


and then computes the NisanWigderson generator based on f
log m) design. The seed length and
(Construction 7.23) and a (,
mild explicitness follow from the explicitness and parameters of the
design and code (Lemma 7.22 and Theorem 7.61). The running time
and advice length of the reduction follow from inspecting the proofs
of Theorems 7.61 and 7.19. Specically, the running time of of the
NisanWigderson reduction in the proof of Theorem 7.19 is poly(m)
(given the nonuniform advice) by inspection, and the running time of
the local list-decoding algorithm is poly(, 1/ ) poly(m, 1/). (We
may assume that m > , otherwise Gf need not have any stretch, and
the conclusion is trivial.) The length of the advice from the locally
list-decodable code consists of a pair (y, z) Fv F, where F is a
eld of size q = poly(, 1/) and v log |F| =  = O( + log(1/)). The

264

Pseudorandom Generators

NisanWigderson reduction begins with the distinguisher-to-predictor


reduction of Proposition 7.16, which uses log m bits of advice to
specify the index i at which the predictor works and m i 1 bits
for hardwiring the bits fed to the distinguisher in positions i, . . . , m. In
addition, for j = 1, . . . , i 1, the NisanWigderson reduction nonuniformly hardwires a truth-table for the function fj (y) which depends on
the at most log m bits of y selected by the intersection of the ith and
jth sets in the design. These truth tables require at most (i 1) m
bits of advice. In total, the amount of advice used is at most
O( + log(1/)) + m i 1 + (i 1) m
= m1+ + O( + log(1/)).
One advantage of a black-box construction is that it allows us to
automatically scale up the pseudorandom generator construction.
If we apply the construction to a function f that is not necessarily
computable in E, but in some higher complexity class, we get a
pseudorandom generator Gf computable in an analogously higher
complexity class. Similarly, if we want our pseudorandom generator
to fool tests T computable by nonuniform algorithms in some higher
complexity class, it suces to use a function f that is hard against an
analogously higher class.
For example, we get the following nondeterministic analogue of
Theorem 7.63:
Theorem 7.68. For s : N N, suppose that there is a function f
NE co-NE such that for every input length  N, f is worst-case
hard for nonuniform algorithms running in time s() with an NP
oracle (equivalently, boolean circuits with SAT gates), where s() is
computable in time 2O() . Then for every m N, there is a pseudorandom generator G : {0, 1}d(m) {0, 1}m with seed length d(m) =
O(s1 (poly(m))2 / log m) such that G is (m, 1/m)-pseudorandom
against nonuniform algorithms with an NP oracle, and G is computable
in nondeterministic time 2O(d(m)) (meaning that there is a nondeterministic algorithm that on input x, outputs G(x) on at least one computation path and outputs either G(x) or fail on all computation paths).

7.7 Connections to Other Pseudorandom Objects

265

The signicance of such generators is that they can be used for


derandomizing AM, which is a randomized analogue of NP, dened
as follows:
Denition 7.69. A language L is in AM i there is a probabilistic
polynomial-time verier V and polynomials m(n), p(n) such that for
all inputs x of length n,
xL
x
/L

Pr

r{0,1}m(n)
R

Pr

r{0,1}m(n)

[y {0, 1}p(n) V (x, r, y) = 1] 2/3,


[y {0, 1}p(n) V (x, r, y) = 1] 1/3.

Another (non-obviously!) equivalent denition of AM is the


class of languages having constant-round interactive proof systems,
where a computationally unbounded prover (Merlin) can convince
a probabilistic polynomial-time verier (Arthur) that an input
x is in L through an interactive protocol with of O(1) rounds of
polynomial-length communication.
Graph Nonisomorphism is the most famous example of a language
that is in AM but is not known to be in NP. Nevertheless, using Theorem 7.68 we can give evidence that Graph Nonisomorphism is in NP.
Corollary 7.70. If there is a function f NE co-NE that, on
inputs of length , is worst-case hard for nonuniform algorithms
running in time 2() with an NP oracle, then AM = NP.
While the above complexity assumption may seem very strong,
it is actually known to be weaker than the very natural assumption that exponential time E = DTIME(2O() ) is not contained in
subexponential space >0 DSPACE(2n ).
As we saw in Section 7.4.3 on derandomizing constant-depth
circuits, black-box constructions can also be scaled down to apply
to lower complexity classes, provided that the construction G and/or
reduction Red can be shown to be computable in a lower class
(e.g., AC0 ).

266

Pseudorandom Generators

7.7.2

Connections to Other Pseudorandom Objects

At rst, it may seem that pseudorandom generators are of a dierent


character than the other pseudorandom objects we have been studying.
We require complexity assumptions to construct pseudorandom generators, and reason about them using the language of computational
complexity (referring to ecient algorithms, reductions, etc.). The
other objects we have been studying are all information-theoretic in
nature, and our constructions of them have been unconditional.
The notion of black-box constructions will enable us to bridge
this gap. Note that Theorem 7.67 is unconditional, and we will see
that it, like all black-box constructions, has an information-theoretic
interpretation. Indeed, we can t black-box pseudorandom generator
constructions into the list-decoding framework of Section 5.3 as follows:
Construction 7.71. Let Gf : [D] [M ] be an algorithm that is
dened for every oracle f : [n] {0, 1}. Then, setting N = 2n , dene
: [N ] [D] [M ], by
(f, y) = Gf (y),
where we view the truth table of f as an element of [N ] {0, 1}n .
It turns out that if we allow the reduction unbounded running time
(but still bound the advice length), then pseudorandom generator
constructions have an exact characterization in our framework:
Proposition 7.72. Let Gf and be as in Construction 7.71. Then
Gf is an (, k, ) black-box PRG construction i for every T [M ],
we have
|LIST (T, (T ) + )| K,
where K = 2k .
Proof.
. Suppose Gf is an (, k, ) black-box PRG construction. Then f
is in LIST (T, (T ) + ) i T distinguishes Gf (U[D] ) from U[M ] with

7.7 Connections to Other Pseudorandom Objects

267

advantage greater than . This implies that there exists a z [K]


such that RedT (, z) computes f everywhere. Thus, the number of
functions f in LIST (T, (T ) + ) is bounded by the number of advice
strings z, which is at most K.
. Suppose that for every T [M ], we have L = |LIST (T, (T ) +
)| K. Then we can dene RedT (x, z) = fz (x), where f1 , . . . , fL are
any xed enumeration of the elements of LIST (T, (T ) + ).
Notice that this characterization of black-box PRG constructions
(with reductions of unbounded running time) is the same as the one
for averaging samplers (Proposition 5.30) and randomness extractors
(Proposition 6.23). In particular, the black-box PRG construction
of Theorem 7.67 is already a sampler and extractor of very good
parameters:
Theorem 7.73. For every constant > 0, every n N, k [0, n], and
every > 0, there is an explicit (k, ) extractor Ext : {0, 1}n {0, 1}d
{0, 1}m with seed length d = O(log2 (n/)/ log k) and output length
m k 1 .
Proof Sketch: Without loss of generality, assume that n is a power
of 2, namely n = 2 = L. Let Gf (y) : {0, 1}d {0, 1}m be the (t, k0 , 0 )
black-box PRG construction of Theorem 7.67 which takes a function
f : {0, 1} {0, 1} and has k0 = m1+ + O( + log(m/0 )), and let
Ext(f, y) = (f, y) = Gf (y). By Propositions 7.72 and 6.23, Ext is a
(k0 + log(1/0 ), 20 ) extractor. Setting = 20 and k = k0 + log(1/0 ),
we have a (k, ) extractor with output length
m = (k O( + log(m/)))1 k 1 O( + log(m/)).
We can increase the output length to k 1 by increasing the seed
length by O( + log(m/)). The total seed length then is

 2


log (n/)
( + log(1/))2
+  + log(m/) = O
.
d=O
log m
log k
The parameters of Theorem 7.73 are not quite as good as those of
Theorem 6.36 and Corollary 6.39, as the output length is k 1 rather

268

Pseudorandom Generators

than (1 )k, and the seed length is only O(log n) when k = n(1) .
However, these settings of parameters are already sucient for many
purposes, such as the simulation of BPP with weak random sources.
Moreover, the extractor construction is much more direct than that of
Theorem 6.36. Specically, it is
Ext(f, y) = (f(y|S1 ), . . . , f(y|Sm )),
where f is an encoding of f in a locally list-decodable code and
S1 , . . . , Sm are a design. In fact, since Proposition 7.72 does not depend
on the running time of the list-decoding algorithm, but only the
amount of nonuniformity, we can use any (1/2 /2m, poly(m/))
list-decodable code, which will only require an advice of length
O(log(m/)) to index into the list of decodings. In particular, we can
use a ReedSolomon code concatenated with a Hadamard code, as in
Problem 5.2.
We now provide some additional intuition for why black-box
pseudorandom generator constructions are also extractors. A blackbox PRG construction Gf is designed to use a computationally
hard function f (plus a random seed) to produce an output that is
computationally indistinguishable from uniform. When we view it as
an extractor Ext(f, y) = Gf (y), we instead are feeding it a function
f that is chosen randomly from a high min-entropy distribution (plus
a random seed). This can be viewed as saying that f is informationtheoretically hard, and from this stronger hypothesis, we are able
to obtain the stronger conclusion that the output is statistically
indistinguishable from uniform. The information-theoretic hardness
of f can be formalized as follows: if f is sampled from a source F
of min-entropy at least k + log(1/), then for every xed function A
(such as A = RedT ), the probability (over f F ) that there exists
a string z of length k such that A(, z) computes f everywhere is at
most . That is, a function generated with min-entropy larger than
k is unlikely to have a description of length k (relative to any xed
interpreter A).
Similarly to black-box PRG constructions, we can also discuss
converting worst-case hard functions to average-case hard functions in

7.7 Connections to Other Pseudorandom Objects

269

a black-box manner:
Denition 7.74. Let Ampf : [D] [q] be a deterministic algorithm
that is dened for every oracle f : [n] {0, 1}. We say that Amp is a
(t, k, ) black-box worst-case-to-average-case hardness amplier if there
is a probabilistic oracle algorithm Red, called the reduction, running
in time t such that for every function g : [D] [q] such that
Pr[g(U[D] ) = Ampf (U[D] )] > 1/q + ,
there is an advice string z [K], where K = 2k , such that
x [n]

Pr[Redg (x, z) = f (x)] 2/3,

where the probability is taken over the coin tosses of Red.


Note that this denition is almost identical to that of a locally
(1 1/q )-list-decodable code (Denition 7.54), viewing Ampf as
Enc(f ), and Redg as Decg2 . The only dierence is that in the denition
of locally list-decodable code, we require that there is a rst-phase
decoder Decg1 that eciently produces a list of candidate advice strings
(a property that is natural from a coding perspective, but is not
needed when amplifying hardness against nonuniform algorithms). If
we remove the constraint on the running time of Red, we simply obtain
the notion of a (1 1/q , K) list-decodable code. By analogy, we
can view black-box PRG constructions (with reductions of bounded
running time) as simply being extractors (or averaging samplers) with
a kind of ecient local list-decoding algorithm (given by Red, again
with an advice string that need not be easy to generate).
In addition to their positive uses illustrated above, black-box reductions and their information-theoretic interpretations are also useful for
understanding the limitations of certain proof techniques. For example,
we see that a black-box PRG construction Gf : {0, 1}d {0, 1}m must
have a reduction that uses k m d log(1/) 1 bits of advice.
Otherwise, by Propositions 7.72 and 6.23, we would obtain a (k, 2)
extractor that outputs m almost-uniform bits when given a source
of min-entropy less than k d 1, which is impossible if < 1/4.

270

Pseudorandom Generators

Indeed, notions of black-box reduction have been used in other settings


as well, most notably to produce a very ne understanding of the
relations between dierent cryptographic primitives, meaning which
ones can be constructed from each other via black-box constructions.

7.8

Exercises

Problem 7.1 (PRGs imply hard functions). Suppose that


for every m, there exists a mildly explicit (m, 1/m) pseudorandom generator Gm : {0, 1}d(m) {0, 1}m . Show that E has
a function f : {0, 1} {0, 1} with nonuniform worst-case hardness t() = (d1 ( 1)). In particular, if d(m) = O(log m), then
t() = 2() (Hint: look at a prex of Gs output.)

Problem 7.2 (Equivalence of lower bounds for EXP and


E). Show that E contains a function f : {0, 1} {0, 1} of circuit complexity (1) if and only if EXP does. (Hint: consider
f  (x1 x ) = f (x1 x ).)
(1)
Does the same argument work if we replace (1) with 2 ? How
about 2() ?

Problem 7.3 (Limitations of Cryptographic Generators).


(1) Prove that a cryptographic pseudorandom generator cannot
have seed length d(m) = O(log m).
(2) Prove that cryptographic pseudorandom generators (even
with seed length d(m) = m 1) imply NP  P/poly.
(3) Note where your proofs fail if we only require that G is an
(mc , 1/mc ) pseudorandom generator for a xed constant c.

Problem 7.4 (Deterministic Approximate Counting). Using


the PRG for constant-depth circuits of Theorem 7.29, give deterministic quasipolynomial-time algorithms for the problems below.

7.8 Exercises

271

(The running time of your algorithms should be 2poly(log n,log(1/)) ,


where n is the size of the circuit/formula given and is the accuracy
parameter mentioned.)
(1) Given a constant-depth circuit C and > 0, approximate
the fraction of inputs x such that C(x) = 1 to within an
additive error of .
(2) Given a DNF formula and > 0, approximate the number
of assignments x such that (x) = 1 to within a multiplicative fraction of (1 + ). You may restrict your attention to
in which all clauses contain the same number of literals.
(Hint: Study the randomized DNF counting algorithm of
Theorem 2.34.)
Note that these are not decision problems, whereas classes such as
BPP and BPAC0 are classes of decision problems. One of the points
of this problem is to show how derandomization can be used for other
types of problems.

Problem 7.5 (Strong Pseudorandom Generators). By analogy


with strong extractors, call a function G : {0, 1}d {0, 1}m a (t, )
strong pseudorandom generator i the function G (x) = (x, G(x)) is a
(t, ) pseudorandom generator.
(1) Show that there do not exist strong cryptographic pseudorandom generators.
(2) Show that the NisanWigderson generator (Theorem 7.24)
is a strong pseudorandom generator.
(3) Suppose that for all constants > 0, there is a strong
and fully explicit (m, (m)) pseudorandom generator

G : {0, 1}m {0, 1}m . Show that for every language


L BPP, there is a deterministic polynomial-time algorithm A such that for all n, Prx{0,1}
R
n [A(x) = L (x)]
1/2n + (poly(n)). That is, we get a polynomial-time
average-case derandomization even though the seed length
of G is d(m) = m .

272

Pseudorandom Generators

(4) Show that for every language L BPAC0 , there is an AC0


algorithm A such that Prx{0,1}
R
n [A(x) = L (x)] 1/n.
(Warning: be careful about error reduction.)

Problem 7.6 (Private Information Retrieval). The goal of private information retrieval is for a user to be able to retrieve an entry of
a remote database in such a way that the server holding the database
learns nothing about which database entry was requested. A trivial
solution is for the server to send the user the entire database, in which
case the user does not need to reveal anything about the entry desired.
We are interested in solutions that involve much less communication.
One way to achieve this is through replication.8 Formally, in a q-server
private information-retrieval (PIR) scheme, an arbitrary database
D {0, 1}n is duplicated at q noncommunicating servers. On input an
index i [n], the user algorithm U tosses some coins r and outputs
queries (x1 , . . . , xq ) = U (i, r), and sends xj to the jth server. The jth
server algorithm Sj returns an answer yj = Sj (xj , D). The user then
computes its output U (i, r, x1 , . . . , xq ), which should equal Di , the ith
bit of the database. For privacy, we require that the distribution of
each query xj (over the choice of the random coin tosses r) is the same
regardless of the index i being queried.
It turns out that q-query locally decodable codes and q-server PIR
are essentially equivalent. This equivalence is proven using the notion
of smooth codes. A code Enc : {0, 1}n n is a q-query smooth code
if there is a probabilistic oracle algorithm Dec such that for every
message x and every i [n], we have Pr[DecEnc(x) (i) = xi ] = 1 and Dec
makes q nonadaptive queries to its oracle, each of which is uniformly
distributed in [
n]. Note that the oracle in this denition is a valid
codeword, with no corruptions. Below you will show that smooth
codes imply locally decodable codes and PIR schemes; converses are
also known (after making some slight relaxations to the denitions).
8 Another

way is through computational security, where we only require that it be computationally infeasible for the database to learn something about the entry requested.

7.8 Exercises

273

(1) Show that the decoder for a q-query smooth code is also a
local (1/3q)-decoder for Enc.
(2) Show that every q-query smooth code Enc : {0, 1}n n
gives rise to a q-server PIR scheme in which the user and
servers communicate at most q (log n
+ log ||) bits for
each database entry requested.
(3) Using the ReedMuller code, show that there is a polylog(n)server PIR scheme with communication complexity
polylog(n) for n-bit databases. That is, the user and servers
communicate at most polylog(n) bits for each database
entry requested. (For constant q, the ReedMuller code with
an optimal systematic encoding as in Problem 5.4 yields a
q-server PIR with communication complexity O(n1/(q1) ).)

Problem 7.7 (Better Local Decoding of ReedMuller Codes).


Show that for every constant > 0, there is a constant > 0 such that
there is a local (1/2 )-decoding algorithm for the q-ary ReedMuller
code of degree d and dimension m, provided that d q. (Here we are
referring to unique decoding, not list decoding.) The running time of
the decoder should be poly(m, q).

Problem 7.8 (Hitting-Set Generators). A set Hm {0, 1}m is a


(t, ) hitting set if for every nonuniform algorithm T running in time
t that accepts greater than an fraction of m-bit strings, T accepts at
least one element of Hm .
(1) Show that if, for every m, we can construct an
(m, 1/2) hitting set Hm in time s(m) m, then

RP c DTIME(s(nc )). In particular, if s(m) = poly(m),
then RP = P.
(2) Show that if there is a (t, ) pseudorandom generator
Gm : {0, 1}d {0, 1}m computable in time s, then there is a
(t, ) hitting set Hm constructible in time 2d s.

274

Pseudorandom Generators

(3) Show that if, for every m, we can construct an (m, 1/2) hitting set Hm in time s(m) = poly(m), then BPP = P. (Hint:
this can be proven in two ways. One uses Problem 3.1 and
the other uses a variant of Problem 7.1 together with Corollary 7.64. How do the parameters for general s(m) compare?)
(4) Dene the notion of a (t, k, ) black-box construction of
hitting set-generators, and show that, when t = , such
constructions are equivalent to constructions of dispersers
(Denition 6.19).

Problem 7.9 (PRGs versus Uniform Algorithms Average


Case Derandomization). For functions t : N N and : N [0, 1],
we say that a sequence {Gm : {0, 1}d(m) {0, 1}m } of is a (t(m), (m))
pseudorandom generator against uniform algorithms i the ensembles {G(Ud(m) )}mN and {Um }mN are uniformly computationally
indistinguishable (Denition 7.2).
Suppose that we have a mildly explicit (m, 1/m) pseudorandom generator against uniform algorithms that has seed length d(m). Show that
for every language L in BPP, there exists a deterministic algorithm
A running in time 2d(poly(n)) poly(n) on inputs of length n such that:
R

(1) Pr [A(Xn ) = L(Xn )] 1 1/n2 , where Xn {0, 1}n and


L() is the characteristic function of L. (The exponent of
2 in n2 is arbitrary, and can be replaced by any constant.)
Hint: coming up with the algorithm A is the easy part;
proving that it works well is a bit trickier.
(2) Pr [A(Xn ) = L(Xn )] 1 1/n2 , for any random variable
Xn distributed on {0, 1}n that is samplable in time n2 .

Problem 7.10 (PRGs are Necessary for Derandomization).


(1) Call a function G : {0, 1}d {0, 1}m a (t, , ) pseudorandom
generator against bounded-nonuniformity algorithms i for

7.8 Exercises

275

every probabilistic algorithm T that has a program of length


at most  and that runs in time at most t on inputs of
length n, we have
| Pr[T (G(Ud )) = 1] Pr[T (Um ) = 1]| .
Consider the promise problem whose YES instances
are truth tables of functions G : {0, 1}d {0, 1}m that are
(m, log m, 1/m) pseudorandom generators against boundednonuniformity algorithms, and whose NO instances are
truth tables of functions that are not (m, log m, 2/m)
pseudorandom generators against bounded-nonuniformity
algorithms. (Here m and d are parameters determined by
the input instance G.) Show that is in prBPP.
(2) Using Problem 2.11, show that if prBPP = prP, then
there is a mildly explicit (m, 1/m) pseudorandom generator
against uniform algorithms with seed length O(log m).
(See Problem 7.9 for the denition. It turns out that the
hypothesis prBPP = prP here can be weakened to obtain
an equivalence between PRGs vs. uniform algorithms and
average-case derandomization of BPP.)

Problem 7.11 (Composition). For simplicity in this problem, only


consider constant t in this problem (although the results do have
generalizations to growing t = t()).
(1) Show that if f : {0, 1} {0, 1} is a one-way permutation,
then for any constant t, f (t) is a one-way permutation, where
def
f (t) (x) = f (f ( f (x))).
' () *
t

(2) Show that the above fails for one-way functions. That
is, assuming that there exists a one-way function g, construct a one-way function f which doesnt remain one
way under composition. (Hint: for |x| = |y| = /2, set
f (x, y) = 1|g(y)| g(y) unless x {0 , 1 }.)

276

Pseudorandom Generators

(3) Show that if G is a cryptographic pseudorandom generator


with seed length d(m) = m(1) , then for any constant t, G(t)
is a cryptographic pseudorandom generator. Note where
your proof fails for fully explicit pseudorandom generators
against time mc for a xed constant c.

Problem 7.12 (Local List Decoding the Hadamard Code). For


m
a function f : Zm
2 Z2 , A parameterized subspace x + V of Z2 of
m
dimension d is given by a linear map V : Zd2 Zm
2 and a shift x Z2 .
(We do not require that the map V be full rank.) We write V for
d
m
0 + V . For a function f : Zm
2 Z2 , we dene f |x+V : Z2 Z2 by
f |x+V (y) = f (x + V (y)).
(1) Let c : Zm
2 Z2 be a codeword in the Hadamard code
(i.e., a linear function), r : Zm
2 Z2 a received word, V a
m
parameterized subspace of Z2 of dimension d, and x Zm
2 .
Show that if dH (r|x+V , c|x+V ) < 1/2, then c(x) can be
computed from x, V , c|V , and oracle access to r in time
poly(m, 2d ) with 2d 1 queries to r.
(2) Show that for every m N and > 0, the Hadamard code
of dimension m has a (1/2 ) local list-decoding algorithm
(Dec1 , Dec2 ) in which both Dec1 and Dec2 run in time
poly(m, 1/), and the list output by Dec1 has size O(1/2 ).
(Hint: consider a random parameterized subspace V of
dimension 2 log(1/) + O(1), and how many choices there
are for c|V .)
(3) Show that Dec2 can be made to be deterministic and run in
time O(m).

Problem 7.13 (Hardcore Predicates). A hardcore predicate for


a one-way function f : {0, 1} {0, 1} is a poly()-time computable
function b : {0, 1} {0, 1} such that for every constant c, every

7.8 Exercises

277

nonuniform algorithm A running in time c , we have:


Pr[A(f (U )) = b(U )]

1
1
+ c,
2


for all suciently large . Thus, while the one-wayness of f only


guarantees that it is hard to compute all the bits of f s input from its
output, b species a particular bit of information about the input that
is very hard to compute (one cant do noticeably better than random
guessing).

(1) Let Enc : {0, 1} {0, 1}L be a code such that given
Enc(x)y can be computed in time
x {0, 1} and y [L],
poly(). Suppose that for every constant c and all suciently
large , Enc has a (1/2 1/c ) local list-decoding algorithm
(Dec1 , Dec2 ) in which both Dec1 and Dec2 run in time
poly(). Prove that if f : {0, 1} {0, 1} is a one-way
function, then b(x, y) = Enc(x)y is a hardcore predicate for
the one-way function f  (x, y) = (f (x), y).
(2) Show that if b : {0, 1} {0, 1} is a hardcore predicate for
a one-way permutation f : {0, 1} {0, 1} , then for every
m = poly(), the following function G : {0, 1} {0, 1}m is
a cryptographic pseudorandom generator:
G(x) = (b(x), b(f (x)), b(f (f (x))), . . . , b(f (m1) (x))).
(Hint: show that G is previous-bit unpredictable.)
(3) Using Problem 7.12, deduce that if f : {0, 1} {0, 1} is
a one-way permutation, then for every m = poly(), the
following is a cryptographic pseudorandom generator:
Gm (x, r) = (x, r, f (x), r, f (f (x)), r, . . . , f (m1) (x), r).

Problem 7.14 (PRGs from 11 One-Way Functions). A


random variable X has (t, ) pseudoentropy at least k if it is (t, )
indistinguishable from some random variable of min-entropy at least k.

278

Pseudorandom Generators

(1) Suppose that X has (t, ) pseudoentropy at least k and that


Ext : {0, 1}n {0, 1}d {0, 1}m is a (k,  )-extractor computable in time t . Show that Ext(X, Ud ) is an (t t , +  )
indistinguishable from Um .

(2) Let f : {0, 1} {0, 1} be a one-to-one one-way function
(not necessarily length-preserving) and b : {0, 1} {0, 1} a
hardcore predicate for f (see Problem 7.13). Show that for
every constant c and all suciently large , the random variable f (U )b(U ) has (c , 1/c ) pseudoentropy at least  + 1.
(3) (*) Show how to construct a cryptographic pseudorandom
generator from any one-to-one one-way function. (Any seed
length (m) < m is ne.)

7.9

Chapter Notes and References

Other surveys on pseudorandom generators and derandomization


include [162, 209, 226, 288].
Descriptions of classical constructions of pseudorandom generators
(e.g., linear congruential generators) and the batteries of statistical
tests that are used to evaluate them can be found in [245, 341]. Linear
congruential generators and variants were shown to be cryptographically insecure (e.g., not satisfy Denition 7.9) in [56, 80, 144, 252, 374].
Current standards for pseudorandom generation in practice can be
found in [51].
The modern approach to pseudorandomness described in this
section grew out of the complexity-theoretic approach to cryptography
initiated by Die and Hellman [117] (who introduced the concept
of one-way functions, among other things). Shamir [360] constructed
a generator achieving a weak form of unpredictability based on the
conjectured one-wayness of the RSA function [336]. (Shamirs generator outputs a sequence of long strings, such that none of the string
can be predicted from the others, except with negligible probability,
but individual bits may be easily predictable.) Blum and Micali [72]
proposed the criterion of next-bit unpredictability (Denition 7.15)
and constructed a generator satisfying it based on the conjectured

7.9 Chapter Notes and References

279

hardness of the Discrete Logarithm Problem. Yao [421] gave the


now-standard denition of pseudorandomness (Denition 7.3) based
on the notion of computational indistinguishability introduced in the
earlier work of Goldwasser and Micali [176] (which also introduced
hybrid arguments). Yao also proved the equivalence of pseudorandomness and next-bit unpredictability (Proposition 7.16), and showed
how to construct a cryptographic pseudorandom generator from any
one-way permutation. The construction described in Section 7.2 and
Problems 7.12 and 7.13 uses the hardcore predicate from the later work
of Goldreich and Levin [168]. The construction of a pseudorandom
generator from an arbitrary one-way function (Theorem 7.11) is due
to H
astad, Impagliazzo, Levin, and Luby [197]. The most ecient
(and simplest) construction of pseudorandom generators from general
one-way functions to date is in [198, 401]. Goldreich, Goldwasser,
and Micali [164] dened and constructed pseudorandom functions,
and illustrated their applicability in cryptography. The application of
pseudorandom functions to learning theory is from [405], and their
application to circuit lower bounds is from [323]. For more about
cryptographic pseudorandom generators, pseudorandom functions,
and their applications in cryptography, see the text by Goldreich [157].
Yao [421] demonstrated the applicability of pseudorandom generators to derandomization, noting in particular that cryptographic
pseudorandom generators imply that BPP SUBEXP, and that one
under stronger intractability assumptions.
can obtain even BPP P
Nisan and Wigderson [302] observed that derandomization only
requires a mildly explicit pseudorandom generator, and showed how
to construct such generators based on the average-case hardness
of E (Theorem 7.24). A variant of Open Problem 7.25 was posed
in [202], who showed that it also would imply stronger results on
hardness amplication; some partial negative results can be found
in [214, 320].
The instantiation of the NisanWigderson pseudorandom generator
that uses the parity function to fool constant-depth circuits (Theorem 7.29) is from the earlier work of Nisan [298]. (The average-case
hardness of parity against constant-depth circuits stated in Theorem 7.27 is due Boppana and H
astad [196].) The rst unconditional

280

Pseudorandom Generators

pseudorandom generator against constant-depth circuits was due to


Ajtai and Wigderson [12] and had seed length (m) = m (compared
to polylog(m) in Nisans generator). Recently, Braverman [81] proved
that any polylog(m)-wise independent distribution fools AC0 , providing a dierent way to obtain polylogarithmic seed length and resolving
a conjecture of Linial and Nisan [265]. The notion of strong pseudorandom generators (a.k.a. seed-extending pseudorandom generators)
and the average-case derandomization of AC0 (Problem 7.5) are from
[242, 355]. Superpolynomial circuit lower bounds for AC0 [2] were
given by [322, 368]. Viola [412] constructed pseudorandom generators
with superpolynomial stretch for AC0 [2] circuits that are restricted
to have a logarithmic number of parity gates.
Detailed surveys on locally decodable codes and their applications
in theoretical computer science are given by Trevisan [391] and
Yekhanin [424]. The notion grew out of several dierent lines of work,
and it took a couple of years before a precise denition of locally decodable codes was formulated. The work of Goldreich and Levin [168] on
hardcore predicates of one-way permutations implicitly provided a local
list-decoding algorithm for the Hadamard code. (See Problems 7.12
and 7.13.) Working on the problem of instance hiding introduced
in [2], Beaver and Feigenbaum [54] constructed a protocol based on
Shamirs secret sharing [359] that eectively amounts to using the
local decoding algorithm for the ReedMuller code (Algorithm 7.43)
with the multilinear extension (Lemma 7.47). Blum, Luby, and Rubinfeld [71] and Lipton [266] introduced the concept of self-correctors
for functions, which allow a one to convert a program that correctly
computes a function on most inputs to one that correctly computes
the function on all inputs.9 Both papers gave self-correctors for group
homomorphisms, which, when applied to homomorphisms from Zn2
to Z2 , can be interpreted as a local corrector for the Hadamard code
9 Blum,

Luby, and Rubinfeld [71] also dened and constructed self-testers for functions,
which allow one to eciently determine whether a program does indeed compute a function
correctly on most inputs before attempting to use self-correction. Together a self-tester and
self-corrector yield a program checker in the sense of [70]. The study of self-testers gave
rise to the notion of locally testable codes, which are intimately related to probabilistically
checkable proofs [41, 42], and to the notion of property testing [165, 337, 340], which is an
area within sublinear-time algorithms.)

7.9 Chapter Notes and References

281

(Proposition 7.40). Lipton [266] observed that the techniques of Beaver


and Feigenbaum [54] yield a self-corrector for multivariate polynomials,
which, as mentioned above, can be interpreted as a local corrector for
the ReedMuller code. Lipton pointed out that it is interesting to apply
these self-correctors to presumably intractable functions, such as the
Permanent (known to be #P-complete [404]), and soon it was realized
that they could also be applied to complete problems for other classes
by taking the multilinear extension [42]. Babai, Fortnow, Nisan, and
Wigderson [43] used these results to construct pseudorandom generators from the worst-case hardness of EXP (or E, due to Problem 7.2),
and thereby obtain subexponential-time or quasipolynomial-time
simulations of BPP under appropriate worst-case assumptions
(Corollary 7.64, Parts 2 and 3). All of these works also used the
terminology of random self-reducibility, which had been present in
the cryptography literature for a while [29], and was known to imply
worst-case/average-case connections. Understanding the relationship
between the worst-case and average-case complexity of NP (rather
than high classes like EXP) is an important area of research; see the
survey [74].
Self-correctors for multivariate polynomials that can handle a
constant fraction of errors (as in Theorem 7.42) and fraction of errors
approaching 1/2 (as in Problem 7.7) were given by Gemmell et al. [149]
and Gemmell and Sudan [150], respectively. Babai, Fortnow, Levin, and
Szegedy [41] reformulated these results as providing error-correcting
codes with ecient local decoding (and local testing) algorithms.
Katz and Trevisan [239] focused attention on the exact query complexity of locally decodable codes (separately from computation time), and
proved that locally decodable codes cannot simultaneously have the
rate, distance, and query complexity all be constants independent of
the message length. Constructions of 3-query locally decodable codes
with subexponential blocklength were recently given by Yekhanin [423]
and Efremenko [128]. Private Information Retrieval (Problem 7.6)
was introduced by Chor, Goldreich, Kushilevitz, and Sudan [99].
Katz and Trevisan [239] introduced the notion of smooth codes and
showed their close relation to both private information retrieval and
locally decodable codes (Problem 7.6). Recently, Saraf, Kopparty,

282

Pseudorandom Generators

and Yekhanin [249] constructed the rst locally decodable codes with
sublinear-time decoding and rate larger 1/2.
Techniques for Hardness Amplication (namely, the Direct Product
Theorem and XOR Lemma) were rst described in oral presentations of
Yaos paper [421]. Since then, these results have been strengthened and
generalized in a number of ways. See the survey [171] and Section 8.2.3.
The rst local list-decoder for ReedMuller codes was given by Arora
and Sudan [35] (stated in the language of program self-correctors). The
one in Theorem 7.56 is due to Sudan, Trevisan, and Vadhan [381], who
also gave a general denition of locally list-decodable codes (inspired
by a list-decoding analogue of program self-correctors dened by Ar
et al. [30]) and explicitly proved Theorems 7.60, 7.61, and 7.62.
The result that BPP = P if E has a function of nonuniform worstcase hardness s() = 2() (Corollary 7.64, Part 1) is from the earlier
work of Impagliazzo and Wigderson [215], who used derandomized
versions of the XOR Lemma to obtain sucient average-case hardness
for use in the NisanWigderson pseudorandom generator. An optimal
construction of pseudorandom generators from worst-case hard functions, with seed length d(m) = O(s1 (poly(m))) (cf., Theorem 7.63),
was given by Shaltiel and Umans [356, 399].
For more background on AM, see the Notes and References of
Section 2. The rst evidence that AM = NP was given by Arvind
and K
obler [37], who showed that one can use the NisanWigderson
generator with a function that is (2() , 1/2 1/2() )-hard for nondeterministic circuits. Klivans and van Melkebeek [244] observed that
the ImpagliazzoWigderson pseudorandom generator construction
is black box and used this to show that AM can be derandomized using functions that are worst-case hard for circuits with
an NP oracle (Theorem 7.68). Subsequent work showed that one
only needs worst-case hardness against a nonuniform analogue of
NP co-NP [289, 356, 357].
Trevisan [389] showed that black-box pseudorandom generator
constructions yield randomness extractors, and thereby obtained the
extractor construction of Theorem 7.73. This surprising connection
between complexity-theoretic pseudorandomness and informationtheoretic pseudorandomness sparked much subsequent work, from

7.9 Chapter Notes and References

283

which the unied theory presented in this survey emerged. The fact
that black-box hardness ampliers are a form of locally list-decodable
codes was explicitly stated (and used to deduce lower bounds on
advice length) in [397]. The use of black-box constructions to classify
and separate cryptographic primitives was pioneered by Impagliazzo
and Rudich [213]; see also [326, 330].
Problem 7.1 (PRGs imply hard functions) is from [302]. Problem 7.2
is a special case of the technique called translation or padding
in complexity theory. Problem 7.4 (Deterministic Approximate
Counting) is from [302]. The fastest known deterministic algorithms
for approximately counting the number of satisfying assignments to
a DNF formula are from [280] and [178] (depending on whether the
approximation is relative or additive, and the magnitude of the error).
The fact that hitting set generators imply BPP = P (Problem 7.8)
was rst proven by Andreev, Clementi, and Rolim [27]; for a more
direct proof, see [173]. Problem 7.9 (that PRGs vs. uniform algorithms
imply average-case derandomization) is from [216]. Goldreich [163]
showed that PRGs are necessary for derandomization (Problem 7.10).
The result that one-to-one one-way functions imply pseudorandom
generators is due to Goldreich, Krawczyk, and Luby [167]; the proof
in Problem 7.14 is from [197].
For more on Kolmogorov complexity, see [261]. In recent years,
connections have been found between Kolmogorov complexity and
derandomization; see [14]. The tighter equivalence between circuit size
and nonuniform computation time mentioned after Denition 7.1 is due
to Pippenger and Fischer [311]. The 5n O(n) lower bound on circuit
size is due to Iwama, Lachish, Morizumi, and Raz [218, 254]. The
fact that single-sample indistinguishability against uniform algorithms
does not imply multiple-sample indistinguishability unless we make
additional assumptions such as ecient samplability (in contrast to
Proposition 7.14), is due to Goldreich and Meyer [169]. (See also [172].)

8
Conclusions

8.1

A Unied Theory of Pseudorandomness

In the course of this survey, we have seen that a wide variety of


pseudorandom objects are very closely related, and indeed almost
equivalent when viewed appropriately: list-decodable codes, averaging samplers, expander graphs, randomness extractors, (black-box)
pseudorandom generator constructions, and hardness ampliers. The
power of these connections comes from the fact they allow us to
translate intuitions and techniques that are very natural for one of
these objects to others where they have been might be dicult to
discover. Indeed, we have seen several examples of this, such the
use of randomness extractors to construct near-optimal samplers
(Problem 6.5), the use of ParvareshVardy codes to construct bipartite
expanders (Theorem 5.35) and lossless condensers (Theorem 6.34), and
the use of the NisanWigderson pseudorandom generator construction
to construct extractors (Theorem 7.73).
Table 8.1 reviews the connections between these objects, by putting
them in our list-decoding framework. Recall that we can view each
of the objects as a function : [N ] [D] [M ] and can characterize
their properties by bounding the sizes of sets of the form LIST (T, ) =
{x : Pry [(x, y) T ] > }, where T [M ]. (When = 1, we change
284

of x
Samp(x)y
Con(x, y)

hitting samplers
k k + d

(lossless) condensers

yth nbr

expanders
(, )

Ext(x, y)

averaging samplers
(k, )

black-box PRGs
(= K, A)

Samp(x)y

hardness ampliers
(, )

Gx (y)

(y, Ampf (y))

list-decodable codes
(t, k, ) black-box

extractors
(t, k, )

(x, y)
(y, Enc(x)y )

Object
(1 1/q , K)

(local w/advice)
dont care

|LIST(T, (T ) + )| K
|LIST(T, (T ) + )| K

t = poly(m)

dont care

|LIST(T, 1)| < K

dont care

|LIST(T, (T ) + )| K
|T | < (1 )DK

(local w/advice)
dont care

|LIST(T, 1)| < K


|T | < (1 )M

|T | < AK

|LIST(T, (T ) + )| K

dont care

t = poly(log n, 1/)

|LIST(T, (T ) + )| K
T = {(y, ry )}

|LIST(T, (T ) + )|  K

Decoding Time
poly(n, 1/)

List-Decoding Problem
T = {(y, ry )}

5.33

k = O(m) [polylog(n), n]

n = O(m + log(1/))
d = O(log(n/)),

4.7
6.33

Prob.

7.72
k = t = poly(m) [polylog(n), n ]
D = O(1), A = 1 + (1),
K = N/2, M = N
K = N , D = poly(1/, log(1/)),

6.23

k = O(m) [polylog(n), n]
= 1/m,

5.30

7.74

Prop.
5.29

n = O(m + log(1/))
d = O(log(n/)),

k = t = poly(log n, 1/) n
K = N , D = poly(1/, log(1/)),

M = Dq = O(n), K = poly(n)
d = O(log n), q = 2, M = Dq = poly(n),

Standard Parameters
q = O(1), = (1),

Table 8.1. Capturing Pseudorandom Objects : [N ] [D] [M ] by bounding sizes of LIST (T, ) = {x : Pry [(x, y) T ] > } for
T [M ]. The parameter denotes a arbitrarily small positive constant. As usual, N = 2n , M = 2m , D = 2d , and K = 2k .

8.1 A Unied Theory of Pseudorandomness

285

286

Conclusions

the inequality to an equality.) Other objects we have encountered,


such as dispersers (Denition 6.19), black-box hitting-set generator
constructions (Problem 7.8), and lossy condensers (Denition 6.32)
can also be cast in the framework, but we leave this as an exercise.
In addition to illustrating the connections between the objects,
Table 8.1 also brings out their dierences. In some cases = (T ) + ,
and in other cases = 1. In some cases (samplers, extractors, PRG constructions), we consider all subsets T [M ], and in other cases (codes,
hardness ampliers, expanders, condensers), we only consider T that
are small and possibly have some additional structure, like being the
graph of a received word. (Typically, the only way the graph structure
is used is to determine that the size of T is D.) In some cases, we want
ecient algorithms to construct LIST(T, ) (even polylogarithmic-time
local decoding algorithms), and in others, all we care about is its size.
The most common parameter ranges also vary quite widely among
the objects. The typical relation between M and N ranges from logarithmic (codes) to polylogarithmic (hardness ampliers) to a constant
power (samplers) to a constant multiplicative factor (expanders).
(Extractors, PRG constructions, and condensers all consider the full
range between polylogarithmic and a constant power.) The typical
value of D ranges from being a constant independent of N (expanders),
to being logarithmic in N (codes), to being polylogarithmic in N
(extractors, condensers, PRG constructions, hardness ampliers).
Despite these dierences, we have seen that the connections are
nevertheless quite powerful, and ideas or techniques developed for one
of these objects are often useful for the others.
The unied framework of Table 8.1 opens up a tantalizing possibility that we can also have a unied construction of all of these objects.
In what follows, we will argue that this is possible ignoring explicitness/eciency considerations (including local list-decoding). The
bounds will be stated in terms of the tails of the binomial distribution:
Denition 8.1 (tails of binomial distributions). For t N,
, (0, 1), dene
 t

1
Bin(t, , ) = Pr
Xi > ,
t
i=1

8.1 A Unied Theory of Pseudorandomness

287

where the Xi s are iid {0, 1}-valued random variables such that
Pr[Xi = 1] = , and dene
 t

1
Bin(t, , 1) = Pr
Xi = 1 = t .
t
i=1

With this denition, we can give the following unied construction:


Theorem 8.2 (nonconstructive unied pseudorandom object).
For every N, M, D N, there exists a function : [N ] [D] [M ]
such that for every T [M ] and every (0, 1], we have
|LIST (T, )|

   
M
N
1
Bin(D, (T ), )K

min K N :
.
2
|T |
K
4K |T |2
We leave the proof of Theorem 8.2 as an exercise and instead show
how many of the nonconstructive bounds weve seen for various pseudorandom objects would simultaneously hold for the of Theorem 8.2
by setting parameters appropriately:
The nonconstructive bounds for expanders (Theorem 4.4)
can be obtained by setting M = N , = 1, = (T ),
K = |T |/(D 2) (ignoring round-o errors), and noting that
   
N
N
Bin(D, , 1)K

|T |
K




Ne K
N e |T | DK

K
|T |


 (D2)K
(D 2) e K
e
=

DK

= ((D 2)eD1 )K
< 1/(4K 2 |T |2 ),
provided is suciently small. Thus, by Proposition 5.33,
denes a (N, (D 2)) vertex expander for a suciently
small .

288

Conclusions

More generally, we see that for any constants , A, D > 0


and suciently large N = M , will be a (N, A)
expander provided that D > (H() + H(A))/ log(1/A),
where H(x) = x log(1/x) + (1 x) log(1/(1 x)) is the
binary entropy function.1 (To see this, use the fact that
N 
(H()+o(1))N as N .)
N = 2
For the nonconstructive bound on extractors (Theorem 6.14) and averaging samplers, we set = (T ) +
N 
 
M
and use the bounds K
(N e/K)K , M
T 2 , and
Bin(D, (T ), (T ) + ) exp((D2 )) (by the Cherno
Bound of Theorem 2.21). Then the condition of Theorem 8.2 is satised provided that D (c/2 ) log(N/K) and
M 2 KD/c for a suciently large constant c.
To get the strong forms of pseudorandom objects, where
(x, y) = (y,  (x, y)) for some function  , we need to consider the
tails of sums of independent, but not necessarily identically distributed,
binomial random variables:
Denition 8.3 (tails of Poisson binomial distributions). For
t, N, , (0, 1), dene
 t

1
Yi > ,
PBin(t, , , ) = sup Pr
t
(Y1 ,...,Yt )
i=1

and


t
1
PBin(t, , 1, ) = sup Pr
Yi = 1 = Bin(t, , 1),
t
(Y1 ,...,Yt )
i=1

where the suprema are taken over all sequences of independent



Bernoulli random variables Y1 , . . . , Yt such that (1/t) ti=1 E[Yi ] ,
and additionally E[Yi ] is an integer multiple of for every i.
1 The

discussion after Theorem 4.4 states a slightly stronger result, only requiring D >
(H() + H(A))/(H() AH(1/)), but this is based on choosing a graph that is the
union of D random perfect matchings, which does indeed have slightly better expansion
properties than the model we are using now, of choosing D random neighbors for each left
vertex.

8.1 A Unied Theory of Pseudorandomness

289

Theorem 8.4 (nonconstructive unied pseudorandom object).


For every N, q, D N, there exists a function  : [N ] [D] [q] such
that if we dene (x, y) = (y,  (x, y)), then for every T [D] [q]
and every (0, 1], we have
|LIST (T, )|

   
Dq
N
1
PBin(D, (T ), , 1/q)K

min K N :
.
|T |
K
4K 2 |T |2
Nonconstructive bounds for strong vertex expanders,
hitting samplers, and lossless condensers (matching the
bounds for the non-strong ones) follow by noting that
PBin(n, , 1, ) Bin(n, , 1). Indeed, writing i for the
expectation of Bernoulli random variable Yi , then
 t
t

 t
t

1
1
Pr
Yi = 1 =
i
i t ,
t
t
i=1

i=1

i=1

where the second-to-last inequality follows from the


Arithmetic MeanGeometric Mean Inequality.
Nonconstructive bounds for strong extractors and
averaging samplers (matching the bounds for the nonstrong ones) follow because the Cherno Bound (Theorem 2.21) also applies to nonidentical random variables:
PBin(D, (T ), (T ) + , ) exp((D2 )) for every > 0.
Nonconstructive bounds for list-decodable codes (Theorem 5.8) follow because the list-decodability of codes
involves taking T to be the graph of a received word
(cf. Proposition 5.29) and hence (T ) = 1/q, and
PBin(D, 1/q, 1 , 1/q) = q Hq (,D)D /q D . To see the latter,

let E[Yi ] = mi /q for mi N such that i mi D. so Yi can
be viewed as the indicator for the event the R(i) {1, . . . , mi }
for a uniformly random received word R : [D] [q], and

Pr[ i Yi > 1 ] is the probability that the graph G(R) =
{(i, r(i)) : i [D]} of the received word R intersects the set

290

Conclusions

T  = {(i, j) : i [D], j mi } in more than (1 )D positions. For a set S T  , the probability that G(R) T  = S
is at most (1/q)|S| (1 1/q)D|S| . (This is exactly the
probability in case S contains only pairs (i, j) with D
distinct values of i, otherwise the probability is zero.) Thus,


  mi 

i
(1/q)s (1 1/q)Ds
Yi > 1
Pr
s
i

s>(1)D

1
D
q
=


s>(1)D

 
D
(q 1)s
s

q Hq (,D)D
.
qD

Now, given this bound on PBin() and Proposition 5.29,


Theorem 8.4 gives us a (, K) list-decodable code Enc :
N  Dq qHq (,D)D K
D ( qD
) 4K12 D2 .
[N ] [q]D provided that K
 
N 
D
N K and Dq
Using the bounds K
D (eq) , we see
that we get a list-decodable code provided that the rate
= log N/D log q is smaller than Hq (, D) O(1/K)
O((log KD)/(KD log q). This matches the nonconstructive
bound for list-decodable codes (Theorem 5.8) up to the
dependence of the error term on the list size K.
Given the above, a natural long-term goal for the theory of
pseudorandomness is the following:
Open Problem 8.5. Can we give explicit constructions of functions
that nearly match the bounds of Theorems 8.2 and 8.4? Can this be
done with ecient (local) list-decoding algorithms?

8.2

Other Topics in Pseudorandomness

In this section, we survey some additional topics in the theory of


pseudorandomness that we did not cover in depth, each of which
could merit at least an entire section on its own. Sections 7.2 and 7.9

8.2 Other Topics in Pseudorandomness

291

contained a detailed survey of cryptographic pseudorandomness, so we


do not discuss it again here.
8.2.1

Pseudorandomness for Space-Bounded Computation

As discussed in Section 4.4, unlike for BPP and RP, we do know


very nontrivial derandomizations RL (and BPL). These derandomizations are obtained constructions of pseudorandom generators
G : {0, 1}d {0, 1}m such that no randomized (log m)-space algorithm
can distinguish G(Ud ) from Um . In order to get derandomizations that
are correct on every input x, we require pseudorandom generators
that fool nonuniform space-bounded algorithms. On the other hand,
since randomized space-bounded algorithms get their random bits
as a stream of coin tosses, we only need to fool space-bounded
distinguishers that read each of their input bits once, in order. Thus,
we consider distinguishers that are (oblivious, read-once) branching
programs, which maintain a state si [w] after reading i input bits,
and determine the next state si+1 [w] as a function of si and the
(i + 1)th input bit. The number w of available states at each time
step is called the width of the branching program, and corresponds to
a space bound of log w bits. A generator G : {0, 1}d(m) {0, 1}m that
is computable in space O(d(m)) and such that G(Ud(m) ) cannot be
distinguished from Um by oblivious, read-once branching programs of
width m implies that BPL DSPACE(O(d(poly(m)))).
The fact that pseudorandom generators imply lower bounds
(Problem 7.1) applies to this context too, but fortunately we do know
exponential width lower bounds for oblivious, read-once branching programs (e.g., computing the inner-product modulo 2 of two -bit strings
requires width 2() ). On the other hand, we cannot simply plug
such lower bounds into the NisanWigderson generator construction
(Theorem 7.24), because the reductions used do not preserve the readonce property (and we do not know superpolynomial lower-bounds for
branching programs that can read each input bit many times).
Nevertheless, a series of papers starting with Ajtai, Komlos, and
Szemeredi [11] have given unconditional pseudorandom generators
for space-bounded computation. The pseudorandom generator of

292

Conclusions

Nisan [299] uses a seed of length O(log2 m) to produce m bits that are
pseudorandom to oblivious, read-once branching programs of width
m. At rst, this only seems to imply that RL L2 , which already
follows from Savitchs Theorem [349] that NL L2 . Nevertheless,
Nisans generator and its additional properties has been used in more
sophisticated ways to obtain highly nontrivial derandomizations of
RL. Specically, Nisan [300] used it to show that every problem in RL
can be solved simultaneously in polynomial time and O(log2 n) space,
and Saks and Zhou [344] used it to prove that RL L3/2 . Another
important pseudorandom generator for space-bounded computation
is that of Nisan and Zuckerman [303], which uses a seed of length
O(log m) to produce logk m bits that are pseudorandom to oblivious,
read-once branching programs of width m. None of these results have
been improved in nearly two decades.
However, substantially better generators have been constructed
for restricted classes of oblivious read-once branching programs.
Specically, there are pseudorandom generators or hitting-set gen
erators (see Problem 7.8) stretching a seed of length O(log
m)
to m bits that fool combinatorial rectangles (which check membership in a rectangle S1 S2 Sm/ log m , where each
Si {0, 1}log m ) [136, 264, 31, 271, 179], branching programs of
width 2 and 3 [345, 73, 363, 179], constant-width regular branching
programs (where the transition function at each layer is regular) [82, 84], and constant-width permutation branching programs
(where each input string induces a permutation of the states at each
layer) [338, 250, 113, 373]. However, the following remains open:
Open Problem 8.6. Is there an explicit pseudorandom generator
2
G : {0, 1}o(log m) {0, 1}m whose output distribution is pseudorandom
to oblivious, read-once branching programs of width 4?
8.2.2

Derandomization vs. Lower Bounds

Derandomization from Uniform Assumptions. The construction of pseudorandom generators we have seen (Theorem 7.63) requires
nonuniform circuit lower bounds for functions in E, and it is of interest

8.2 Other Topics in Pseudorandomness

293

to nd constructions that only require uniform hardness assumptions.


One way to obtain such results is to show that uniform hardness assumptions imply nonuniform assumptions. Indeed, the Karp–Lipton Theorems in complexity theory [235] show that certain strong uniform lower bounds imply nonuniform lower bounds. For example, if an EXP-complete problem cannot be solved by a uniform algorithm in the second level of the polynomial-time hierarchy (like NP but with two nondeterministic quantifiers, cf. Section 3.3), then it also cannot be solved by polynomial-sized circuits. Subsequent strengthenings of the Karp–Lipton Theorem (based on the theory of interactive proofs) show that if EXP ≠ MA (where MA is like NP but allows probabilistic verification of witnesses), then EXP ⊄ P/poly [42]; consequently one gets pseudorandom generators with arbitrary polynomial stretch (secure for infinitely many input lengths) under the assumption EXP ≠ MA [43].
Assuming the Karp–Lipton Theorem cannot be further improved (e.g., to show that EXP ⊆ P/poly implies EXP = BPP), from a uniform assumption such as EXP ≠ BPP, we can only hope to construct pseudorandom generators that are secure against uniform distinguishers (because pseudorandom generators secure against nonuniform distinguishers imply nonuniform lower bounds, by Problem 7.1).
In the context of cryptographic pseudorandom generators, uniform results were typically developed together with or soon after the corresponding nonuniform results. Indeed, analogously to Theorem 7.11, it is known that cryptographic pseudorandom generators that are secure against uniform distinguishers exist if and only if there exist one-way functions that are hard to invert by uniform algorithms [197]. For noncryptographic, mildly explicit pseudorandom generators as in Theorem 7.63 and Corollary 7.64, an obstacle is that black-box constructions (Definition 7.65) require nonuniform advice in the reduction. (See the discussion at the end of Section 7.7.2. This obstacle is avoided in the case of cryptographic pseudorandom generators, because the appropriate definition of black-box construction from one-way functions gives the reduction oracle access to the one-way function in addition to the distinguisher, since one-way functions are supposed to be efficiently computable by definition.)

Thus, we must turn to non-black-box constructions, in which we make more use of the fact that the hard function f is computable in E and/or the fact that the distinguisher T is computable by an efficient probabilistic algorithm, not just to deduce that G^f is mildly explicit and Red^T is efficient. In fact, f and T need not even be used as oracles; we can make use of the code of the programs computing these functions (e.g., to reduce f to an E-complete problem). While at first it may seem difficult to take advantage of non-black-box constructions, this was eventually accomplished by Impagliazzo and Wigderson [216]. They showed that if EXP ≠ BPP, then there are mildly explicit pseudorandom generators with polynomial stretch that are secure against uniform probabilistic distinguishers (for infinitely many input lengths), and hence BPP has subexponential-time average-case derandomizations (by Problem 7.9). (See [397] for a precise statement regarding the construction of pseudorandom generators.) This is a uniform analogue of the low-end nonuniform result in Corollary 7.64 (Item 3). Analogues for the high-end bounds (Items 1, 2) remain open. For example:
Open Problem 8.7. Does EXP ⊄ BPSUBEXP imply mildly explicit generators G : {0,1}^{polylog(m)} → {0,1}^m whose output is pseudorandom to every uniform probabilistic algorithm running in time m (for infinitely many m)?
Such a result is known if we replace EXP by PSPACE [397]. For derandomizing AM instead of BPP, both high-end and low-end uniform results have been obtained [195, 358]. These results utilize hard functions in E, unlike the nonuniform results, which only require hard functions in NE ∩ co-NE (cf. Theorem 7.68).
Derandomization Implies Circuit Lower Bounds. Since uniform hardness assumptions and PRGs against uniform distinguishers only seem to imply average-case derandomizations (Problem 7.9), it is tempting to conjecture that worst-case derandomizations imply (or are even equivalent to) nonuniform circuit lower bounds. A result of this type was first given implicitly by Buhrman, Fortnow, and
Thierauf [86] and then explicitly and in stronger form by Impagliazzo, Kabanets, and Wigderson [212]. Specifically, these results show that if MA (like NP, but with probabilistic verification of witnesses) can be derandomized (e.g., MA = NP or even MA ⊆ NSUBEXP), then NEXP ⊄ P/poly. Derandomization of prBPP implies derandomization of MA, so this also implies that if prBPP = prP or even prBPP ⊆ prSUBEXP, then NEXP ⊄ P/poly. This result falls short of giving a converse to Corollary 7.64 (Item 3) because the circuit lower bounds are for NEXP rather than EXP. (Corollary 7.64, as well as most of the other derandomization results we've seen, apply equally well to prBPP as to BPP.) In addition, the result does not give exponential circuit lower bounds even if we assume full derandomization (prBPP = prP). However, Santhanam [347] shows that prBPP = prP implies that for every constant k, there is a language in NP that does not have circuits of size n^k, which can be viewed as a scaled-down version of the statement that NE requires circuits of size 2^{Ω(n)}.²
Thus, the following remain open:
Open Problem 8.8. Does prBPP = prP imply E ⊄ P/poly (equivalently EXP ⊄ P/poly, by Problem 7.2)?

Open Problem 8.9. Does prBPP = prP imply that NEXP has a problem requiring nonuniform boolean circuits of size 2^{ℓ^{Ω(1)}} on inputs of length ℓ?
By the result of [212] and Corollary 2.31, finding a deterministic polynomial-time algorithm for the prBPP-complete problem [±ε]-Approx Circuit Average implies superpolynomial circuit lower bounds for NEXP. Unfortunately, we do not know a wide variety of natural problems that are complete for prBPP (unlike NP). Nevertheless, Kabanets and Impagliazzo [227] showed that finding a deterministic polynomial-time algorithm for Polynomial Identity
² Indeed, by a standard padding or translation argument in complexity theory, if for some constant k, every language in NP had circuits of size n^k, then every language in NE would have circuits of size 2^{o(n)}.

Testing implies that either NEXP ⊄ P/poly or that the Permanent does not have polynomial-sized arithmetic circuits, both of which are long-standing open problems in complexity theory. (See [1] for a simpler and somewhat stronger proof.) Polynomial Identity Testing is in co-RP, by Theorem 2.12, but is not known to be complete for any randomized complexity class. Of course, it is also of interest to find additional complete problems for prBPP:
Open Problem 8.10. Find combinatorial or algebraic complete
problems for any randomized complexity classes (e.g., prBPP, prRP,
prAM, BPP, RP, ZPP).
Derandomizations of prAM are also known to imply circuit lower bounds, which are stronger than what the aforementioned results give from derandomizations of prBPP in that they either yield exponential-size bounds [38] or give lower bounds for nondeterministic circuits [39].
One interesting interpretation of many of these results is that if derandomization is possible via any means (for all of prBPP or prAM), then it can be done in a canonical way, via pseudorandom generators (because these results show that derandomization implies circuit lower bounds, which in turn imply pseudorandom generators via Theorem 7.63). A recent work by Goldreich [163] directly proves equivalences between various kinds of derandomizations of prBPP (e.g., worst-case or average-case) and various forms of pseudorandom generators, without going through circuit lower bounds. (See Problem 7.10.)
Another reason for the interest in these results is that they suggest derandomization as a potential approach to proving circuit lower bounds. Indeed, derandomization results have played a role in some state-of-the-art lower bounds, namely the result of Buhrman, Fortnow, and Thierauf [86] that MAEXP (the exponential-time analogue of MA) is not in P/poly, and the result of Williams [418, 419] that NEXP is not in ACC (which is defined like AC0 but also allowing unbounded fan-in gates that test whether their inputs sum to zero modulo m, for any constant m). The result of Kabanets and Impagliazzo [227] (along with [7]) has also been one of the motivations for
the recent line of work on derandomizing Polynomial Identity Testing for low-depth arithmetic circuits. (See the Notes and References of Section 2.)
8.2.3 Hardness Amplification
As discussed in Section 7.6.1, hardness amplification is the task of taking a computational problem that is mildly hard on average and making it much harder on average. Hardness amplification was introduced by Yao, in oral presentations of his paper [421]. Specifically, he suggested the Direct Product construction f′(x_1, ..., x_k) = (f(x_1), ..., f(x_k)) to convert a weak one-way function f (one that is mildly hard to invert) into a standard strong one-way function f′ (satisfying Definition 7.10), and an XOR Lemma showing that if a Boolean function f is mildly hard to compute, then f′(x_1, ..., x_k) = f(x_1) ⊕ f(x_2) ⊕ ··· ⊕ f(x_k) is very hard on average to compute. These were tools used in his proof that weak one-way
permutations imply pseudorandom generators. These results have
been generalized and strengthened in a number of ways:
Quantitative Bounds: It is of interest to have tight bounds on the hardness of f′ as a function of the hardness of f and k. For the Direct Product construction, if f is δ average-case-hard, then intuitively we expect f′ to be roughly (1 − (1 − δ)^k) average-case-hard, corresponding to the fact that an efficient algorithm trying to compute f′ should have probability at most 1 − δ of solving each of the k instances correctly. Similarly, in the case of the XOR Lemma, we expect that if f is (1 − δ)/2 average-case-hard, then f′ should be roughly (1 − δ^k)/2-average-case hard. Levin [259] proved a version of the XOR Lemma that gives such a tight bound, and his proof technique also extends to the Direct Product construction (see [171]).
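As a sanity check on these expected bounds, the following Python sketch simulates a toy adversary that answers each of the k instances correctly, independently, with probability 1/2 + δ/2; this independence is a simplifying assumption of the sketch, not of the actual lemmas, which quantify over arbitrary efficient algorithms:

```python
import random

delta, k, trials = 0.2, 5, 200_000
dp_wins = xor_wins = 0
for _ in range(trials):
    # per-instance success probability 1/2 + delta/2, independently
    correct = [random.random() < 0.5 + delta / 2 for _ in range(k)]
    dp_wins += all(correct)                  # direct product needs all k right
    xor_wins += sum(correct) % 2 == k % 2    # XOR is right iff # errors is even

print(dp_wins / trials, (0.5 + delta / 2) ** k)   # direct product success
print(xor_wins / trials, 0.5 + delta ** k / 2)    # piling-up lemma for XOR
```

The second comparison is the piling-up lemma: if each guess has correlation δ with the truth, the XOR of k independent guesses has correlation δ^k.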
Derandomization: Goldreich, Impagliazzo, Levin, Venkatesan, and Zuckerman [166] gave derandomized hardness amplification results for converting weak one-way permutations and weak regular one-way functions into strong ones, where the inputs x_i are not independent but are generated in some pseudorandom way from a short seed that is the
input to f′. Impagliazzo and Wigderson [208, 215] gave derandomized versions of the XOR Lemma and a Direct Product Lemma (for hard-to-compute boolean functions). These results were used in the first proof, due to [215], that P = BPP if E requires exponential-size circuits. Recall that the proof we saw in Section 7 avoids hardness amplification, and instead goes directly from worst-case hardness to strong average-case hardness via locally list-decodable codes (following [381]). Nevertheless, hardness amplification is still of interest because it can be implemented in lower complexity than worst-case-to-average-case amplification. Indeed, hardness amplification (starting from mild average-case hardness) can be implemented in polynomial time with oracle access to f, whereas Viola [411] has shown that black-box worst-case-to-average-case amplification (per Definition 7.74) cannot be implemented in the polynomial-time hierarchy (due to needing to compute a list-decodable encoding of the truth table of f). Indeed, another line of work has investigated hardness amplification for functions in NP.
Hardness Amplification in NP: The study of this topic was initiated in the work of O'Donnell [304]. The goal is to show that if NP has a function f that is mildly hard on average, then it has a function f′ that is very hard on average. Yao's XOR Lemma does not prove this because f being in NP does not imply that f′(x_1, ..., x_k) = f(x_1) ⊕ ··· ⊕ f(x_k) is also in NP, assuming that NP ≠ co-NP (which is commonly conjectured). Thus, O'Donnell examines constructions of the form f′(x_1, ..., x_k) = C(f(x_1), ..., f(x_k)), where C is an efficiently computable and monotone combining function (i.e., changing an input of C from 0 to 1 cannot change the output from 1 to 0). He characterizes the amplification properties of C in terms of its noise stability, thereby connecting the study of hardness amplification with the analysis of boolean functions (see [305] for more on this topic). He uses this connection to find monotone functions C with nearly optimal amplification properties, namely ones that will ensure that the function f′ is roughly (1/2 − O(1/√k))-hard if it is obtained by combining k evaluations of f. Contrast this bound with the XOR Lemma, where C is the (non-monotone) parity function and
ensures that f′ is (1/2 − 1/2^{Ω(k)})-hard. Healy, Vadhan, and Viola [202] showed how to derandomize O'Donnell's construction, so that the inputs x_1, ..., x_k can be generated in a correlated way by a much shorter input to f′. This allows for taking k to be exponential in the input length of f′, and for certain combining functions C, the function f′ is still in NP (using the ability of a nondeterministic computation to compute exponential-sized ORs). As a result, assuming that f is mildly hard for nonuniform algorithms running in time 2^{Ω(n)} (the high end), they obtain f′ ∈ NP that is (1/2 − 1/2^{Ω(n′^{1/2})})-hard, where n′ is the input length of f′. A quantitative improvement was given by [273, 179], replacing 2^{Ω(n′^{1/2})} with 2^{n′/polylog(n′)}, but it remains open to achieve the optimal bound of 2^{Ω(n′)}.
Uniform Reductions: Another line of work has sought to give hardness amplification results for uniform algorithms, similarly to the work on derandomization from uniform assumptions described in Section 8.2.2. As with cryptographic pseudorandom generators, most of the hardness amplification results in the cryptographic setting, such as Yao's original hardness amplification for one-way functions [421], also apply to uniform algorithms. In the noncryptographic setting, a difficulty is that black-box hardness amplification corresponds to error-correcting codes that can be decoded from very large distances, such as 1/2 − ε in the case of binary codes, and at these distances unique decoding is impossible, so one must turn to list decoding and use some nonuniform advice to select the correct decoding from the list. (See Definition 7.74 and the discussion after it. For amplification from mild average-case hardness rather than worst-case hardness, the coding-theoretic interpretation is that the decoding algorithm only needs to recover a string that is very close to the original message, rather than exactly equal to the message [209, 390]; this also requires list decoding for natural settings of parameters.) However, unlike the case of pseudorandom generator constructions, here the number of candidates in the list can be relatively small (e.g., poly(1/ε)), so a reasonable goal for a uniform algorithm is to produce a list of possible decodings, as in our definition of locally list-decodable codes (Definition 7.54). As observed in [216, 397, 390], if we are interested in
amplifying hardness for functions in natural complexity classes such as NP or E, we can use (non-black-box) checkability properties of the initial function f to select a good decoding from the list, and thereby obtain a fully uniform hardness amplification result.
Uniform local list-decoding algorithms for the Reed–Muller Code were given by [35, 381] (as covered in Section 7), and were used to give uniform worst-case-to-average-case hardness amplification results for E and other complexity classes in [397]. Trevisan [390, 392] initiated the study of uniform hardness amplification from mild average-case hardness, giving uniform analogues of some of the hardness amplification results from [208, 304]. Impagliazzo, Jaiswal, Kabanets, and Wigderson [210, 211] gave nearly optimal uniform Direct Product Theorems and XOR Lemmas. The existing uniform amplification results still do not quite match the nonuniform results in two respects. First, the derandomizations are not quite as strong; in particular, it is not known how to achieve an optimal high-end result, converting a function on n-bit inputs that is mildly average-case hard against algorithms that run in time 2^{Ω(n)} into a function on O(n)-bit inputs that is (1/2 − 1/2^{Ω(n)})-hard against time 2^{Ω(n)}. (In the nonuniform setting, this was achieved by [215].) Second, for hardness amplification in NP, the existing uniform results only amplify a function that is mildly hard against algorithms running in time t to ones that are (1/2 − 1/(log t)^{Ω(1)})-hard, rather than (1/2 − 1/t^{Ω(1)})-hard, which is achieved in the nonuniform case by [304, 202]. See [88] for a coding-theoretic approach to closing this gap.
Other Cryptographic Primitives: There has also been a large body of work on security amplification for other kinds of cryptographic primitives and interactive protocols (where the goal of the adversary is much more complex than just computing or inverting a function). Describing this body of work is beyond the scope of this survey, so we simply refer the interested reader to [119, 105, 388] and the references therein.
A key component of many hardness amplification results mentioned above is the Hardcore Theorem of Impagliazzo [208] and variants. In its basic form, this theorem states that if a function f is mildly hard
on average, then there is a hardcore set of inputs, of noticeable density, on which the function is very hard on average. Thus, intuitively, hardness amplification occurs when we evaluate a function many times because we are likely to hit the hardcore set. Since Impagliazzo's original paper, there have been a number of papers giving quantitative improvements to the Hardcore Theorem [243, 206, 45]. In addition to its applications in hardness amplification, the Hardcore Theorem has been shown to be closely related to boosting in machine learning [243], dense model theorems that have applications in additive number theory and leakage-resilient cryptography [127, 329, 395], computational analogues of entropy [49, 106, 145, 257, 329, 381, 401], and regularity lemmas in the spirit of Szemerédi's Regularity Lemma in graph theory [396]. One downside of using the Hardcore Theorem is that the (black-box) reductions in its proofs inherently require a lot of nonuniform advice [274]. As shown by Holenstein [206], this nonuniformity can often be avoided in cryptographic settings, where one can efficiently sample random pairs (x, f(x)); his uniform analogue of the Hardcore Theorem and variants have played a key role in simplifying and improving the construction of cryptographic pseudorandom generators from one-way functions [198, 206, 401].

8.2.4 Deterministic Extractors
As mentioned in Section 6.5, the study of deterministic extractors has remained active even after the introduction of seeded extractors. One reason is that many applications (such as in cryptography, distributed computing, and Monte Carlo simulation) really require high-quality random bits, we often only have access to physical sources of randomness, and the trick of enumerating the seeds of a seeded extractor does not work in these contexts. Another is that deterministic extractors for certain classes of sources often have other applications of interest (even in contexts where we allow sources of truly random bits). For these reasons, after nearly a decade of work on simulating BPP with weak sources and seeded extractors, there was a resurgence of interest in deterministic extractors for various classes of sources [90, 398].

One important class of sources are those that consist of a small number of independent k-sources, first studied by Chor and Goldreich [96] (following [346, 409], who gave extractors for several independent unpredictable-bit sources). In addition to their motivation for obtaining high-quality randomness, extractors for 2 independent sources are of interest because of connections to communication complexity and to Ramsey theory. (Textbooks on these topics are [253] and [182], respectively, and their connections to extractors can be found in [409, 96] and the references below.) In particular, a disperser for 2 independent k-sources of length n is equivalent to a bipartite Ramsey graph: a bipartite graph with N vertices on each side that contains no K × K bipartite clique or K × K bipartite independent set (for N = 2^n and K = 2^k). Giving explicit constructions of Ramsey graphs that approach the K = O(log N) bound given by the probabilistic method [132] is a long-standing open problem posed by Erdős [133].
Chor and Goldreich [96] gave extractors for 2 independent k-sources when k > n/2 (e.g., inner-product modulo 2), and there was no improvement in this bound for nearly 2 decades. Substantial progress began again with the work of Barak, Impagliazzo, and Wigderson [46], who used new results in arithmetic combinatorics to construct extractors for a constant number of independent k-sources when k = δn for an arbitrarily small constant δ > 0. Specifically, they used the Sum–Product Theorem over finite fields [79], which says that for p prime and every subset A ⊆ F_p whose size is not too close to p, either the set A + A of pairwise sums or the set A · A of pairwise products is of size significantly larger than |A|. Arithmetic (and additive) combinatorics have now been found to be closely connected to many topics in pseudorandomness and theoretical computer science more broadly; see the survey of Trevisan [394] and the talks and lecture notes from the minicourse [50].
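For concreteness, here is the inner-product extractor in Python, together with a toy pair of flat (n/2)-sources with orthogonal supports illustrating why min-entropy strictly greater than n/2 is needed (the example sources are ours, not from [96]):

```python
import random

def ip_extract(x, y):
    """Chor-Goldreich two-source extractor: Ext(x, y) = <x, y> mod 2,
    close to uniform when X, Y are independent k-sources with k > n/2."""
    return sum(a * b for a, b in zip(x, y)) % 2

n = 8
# X: first half fixed to 0; Y: second half fixed to 0.  Each is a flat
# source of min-entropy exactly n/2, yet <X, Y> = 0 always.
xs = [[0] * (n // 2) + [random.randrange(2) for _ in range(n // 2)]
      for _ in range(1000)]
ys = [[random.randrange(2) for _ in range(n // 2)] + [0] * (n // 2)
      for _ in range(1000)]
print(sum(ip_extract(x, y) for x, y in zip(xs, ys)) / 1000)  # prints 0.0
```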
A series of subsequent works improved [46] using techniques based on seeded extractors and condensers, various forms of composition, and other new ideas, achieving in particular explicit extractors for 3 sources of min-entropy k = δn [47], extractors for a constant number of sources of min-entropy k = n^δ [318] (notably making no use of arithmetic combinatorics), and dispersers for 2 sources of min-entropy k = n^{o(1)} [48].

The latter result is a substantial improvement over the previous best explicit construction of Ramsey graphs [141], which avoided cliques and independent sets of size K = 2^{√n} and only applied to the nonbipartite case. Even with all of this progress, the following remains open:
Open Problem 8.11. For every constant δ > 0, construct an explicit function Ext : {0,1}^n × {0,1}^n → {0,1} such that for every two independent δn-sources X, Y, Ext(X, Y) is ε-close to uniform on {0,1}.
Another line of work considers classes of sources defined by some measure of complexity, for example capturing the complexity of generating a random sample of the source (from independent coin tosses). Explicit extractors have been constructed for a variety of models of space-bounded sources [69, 231, 246, 247, 398, 408]. In particular, Kamp, Rao, Vadhan, and Zuckerman [231] show how to deterministically extract Ω(n) bits from any source of min-entropy k = Ω(n) generated by a (nonuniform) algorithm with O(n) space; this result exploits connections to extracting from multiple independent sources (as discussed above) as well as from bit-fixing sources (see Section 6.5).
Trevisan and Vadhan [398] suggest considering sources generated by small Boolean circuits, and show that under strong complexity assumptions (similar to, but stronger than, the one in Theorem 7.68), there is a deterministic extractor computable in time poly(s) that extracts randomness from sources of min-entropy k = (1 − Ω(1))n generated by circuits of size s, with error ε = 1/s. It is necessary that the extractor has higher computational complexity than the source,³ but the min-entropy and error bounds can potentially be much better:

Open Problem 8.12. Under plausible complexity assumptions, show that there exists a deterministic extractor computable in time poly(s) that extracts randomness from sources of min-entropy k generated by circuits of size s, with error ε, for k ≤ n/2 and/or ε = s^{−ω(1)}. Alternatively, give evidence that no such extractor exists.
³ Interestingly, for condensers, it no longer seems necessary that the extractor has higher complexity than the source [120].

De and Watson [114] and Viola [414, 415] have recently obtained unconditional extractors for restricted classes of sampling circuits, such as NC0 (where each output bit depends on a constant number of input bits) [114, 414] and AC0 [414], even for sublinear min-entropy. These results are based on a close connection between constructing extractors for a class of circuits and finding explicit distributions that are hard for circuits in the class to sample, the latter being a topic that was also studied for the purpose of proving data structure lower bounds [415].
It is also natural to look at sources that are low complexity in an algebraic sense. The simplest such model is that of affine sources, which are uniform over affine subspaces of F^n for a finite field F. If the subspace has dimension k, then the source is a flat source of min-entropy k log |F|. For the case of F = Z_2, there are now explicit extractors for affine sources of sublinear min-entropy k = o(n) [78, 422, 262] and dispersers for affine sources of subpolynomial min-entropy k = n^{o(1)} [353]. For large fields F (i.e., |F| = poly(n)), there are extractors for subspaces of every dimension k ≥ 1 [147]. There have also been works on extractors for sources described by polynomials of degree larger than 1, either as the output distribution of a low-degree polynomial map [123] or the zero set of a low-degree polynomial (i.e., an algebraic variety) [122].
We remark that several of the state-of-the-art constructions for independent sources and affine sources, such as [47, 48, 353], are quite complicated, using sophisticated compositions and/or machinery from arithmetic combinatorics. It is of interest to find simpler constructions; some progress has been made for affine sources in [60, 262] and for independent sources in [62] (where the latter in fact gives a reduction from constructing 2-source extractors to constructing affine extractors).
8.2.5 Algebraic Pseudorandomness
Algebraic Measures of Pseudorandomness. In this survey, we have mostly focused on statistical and computational measures of (pseudo)randomness, such as pairwise independence and computational indistinguishability, respectively. It is also of interest to consider more algebraic measures, because they can often be related to statistical and/or computational measures, may be convenient for analyzing algebraic constructions of pseudorandom objects, and can have applications of their own (e.g., in additive or arithmetic combinatorics).
One of the most basic and important algebraic measures of pseudorandomness is that of a small-bias space, introduced by Naor and Naor [296]. Here a random variable X = (X_1, ..., X_n) taking values in {0,1}^n is said to be ε-biased iff for every nonempty subset S ⊆ [n], we have (1 − ε)/2 ≤ Pr[⊕_{i∈S} X_i = 1] ≤ (1 + ε)/2. Naor and Naor [296] presented an explicit generator G : {0,1}^d → {0,1}^n of seed length d = O(log(n/ε)) such that G(U_d) is ε-biased; thus G is a pseudorandom generator fooling all parity tests. (Simpler constructions with better constants are in [19].) This generator has found a variety of applications, such as almost k-wise independent hash functions [296, 19] (cf. Problem 3.4), pseudorandom generators for small-width branching programs [345, 363, 179], derandomization of specific algorithms [296], and almost-linear-length probabilistically checkable proofs [61, 59].
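A brute-force check of this definition is straightforward; the following sketch computes the maximum bias of a given sample space over all nonempty parity tests (time exponential in n, so only for toy sizes; the function name is ours):

```python
from itertools import product

def max_bias(sample_space):
    """Maximum over nonempty S of |E[(-1)^{sum_{i in S} X_i}]| for X
    uniform on the given multiset of strings; X is eps-biased iff
    this quantity is at most eps."""
    n = len(sample_space[0])
    worst = 0.0
    for S in product([0, 1], repeat=n):
        if not any(S):
            continue  # skip the empty parity test
        corr = sum((-1) ** sum(s * x for s, x in zip(S, pt))
                   for pt in sample_space) / len(sample_space)
        worst = max(worst, abs(corr))
    return worst

cube = list(product([0, 1], repeat=3))
assert max_bias(cube) == 0.0         # the full cube is 0-biased
assert max_bias([(0, 0, 0)]) == 1.0  # a single point has maximal bias
```

Note that |E[(−1)^{⊕_{i∈S} X_i}]| ≤ ε is exactly the condition (1 − ε)/2 ≤ Pr[⊕_{i∈S} X_i = 1] ≤ (1 + ε)/2, as discussed next.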
Small-bias generators have several equivalent formulations. When viewed appropriately, they are equivalent to linear error-correcting codes (as in Problem 5.4) over F_2 in which every nonzero codeword has relative Hamming weight between (1 − ε)/2 and (1 + ε)/2 [296, 19]. They are also equivalent to expanders with spectral expansion 1 − ε that have the algebraic structure of a Cayley graph over the group G = Z_2^n [24]. (In general, when G is a group and S ⊆ G, the Cayley graph is the |S|-regular digraph on vertex set G, where the neighbors of vertex g are {gs : s ∈ S}.) Finally, the small-bias property is equivalent to X being a distribution on the group G = Z_2^n all of whose nontrivial Fourier coefficients are at most ε [296]. Specifically, the condition (1 − ε)/2 ≤ Pr[⊕_{i∈S} X_i = 1] ≤ (1 + ε)/2 is equivalent to requiring that |E[χ_S(X)]| ≤ ε, where χ_S(x) = (−1)^{Σ_{i∈S} x_i} is the Fourier character of G = Z_2^n indexed by the set S. (For background on Fourier analysis over finite groups, see the book by Terras [387], and for a survey of applications in theoretical computer science, see [115] and the references therein.) Thus, we see that even in an algebraic context, different types of pseudorandom objects (generators, codes, and expanders) are equivalent.
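The correspondence between biases and Cayley-graph eigenvalues can be computed directly: over Z_2^n, the eigenvalues of the normalized adjacency matrix are exactly the Fourier coefficients of the generator multiset. A small sketch (assuming numpy is available), with the hypercube as a sanity check:

```python
import numpy as np
from itertools import product

def cayley_eigenvalues(n, generators):
    """Eigenvalues of the normalized adjacency matrix of the Cayley graph
    on Z_2^n with the given generator multiset: one eigenvalue per
    character chi_S, namely the bias E_s[(-1)^{<S, s>}]."""
    eigs = []
    for S in product([0, 1], repeat=n):
        lam = np.mean([(-1) ** sum(si * gi for si, gi in zip(S, g))
                       for g in generators])
        eigs.append(lam)
    return sorted(eigs, reverse=True)

# The n standard basis vectors generate the hypercube, whose normalized
# eigenvalues are (n - 2|S|)/n:
n = 3
basis = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
print(cayley_eigenvalues(n, basis))  # 1, 1/3, 1/3, 1/3, -1/3, -1/3, -1/3, -1
```

In particular, an ε-biased generator multiset yields a Cayley graph whose nontrivial eigenvalues are all at most ε in absolute value, i.e., spectral expansion 1 − ε.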
These different views of small-bias spaces suggest different generalizations. One is that of linear codes over larger finite fields, which have been studied extensively in coding theory and of which we've seen some examples (Problem 5.4). Another is to consider Cayley graphs and Fourier analysis over groups G other than Z_2^n. Over abelian groups, this generalization is fairly direct; the Fourier coefficients of a small-bias space on a group G are exactly the eigenvalues of a corresponding Cayley graph over the group G. For many abelian groups, including those of the form G = Z_p^n for prime p, there are known explicit constructions of small-bias generators with seed length O(log log |G| + log(1/ε)), corresponding to explicit Cayley expanders of spectral expansion 1 − ε and degree poly(log |G|, 1/ε) [240, 9, 135, 40, 321, 21]. However, there are benefits in working with nonabelian groups, as Cayley expanders over abelian groups G require degree Ω(log N), where N = |G| is the number of vertices [24]. The logarithmic degree lower bound for expanders over abelian groups also holds for the more general notion of Schreier graphs, where the group G acts as permutations on the vertex set V, and we connect a vertex v ∈ V to s(v) for every s ∈ S for some subset S ⊆ G. On the other hand, many of the algebraic constructions of constant-degree expanders, including Ramanujan graphs and the others described in Section 4.3.1, are obtained as Cayley graphs or Schreier graphs over nonabelian groups. The spectral expansion of these constructions can be analyzed via group representation theory, which is more involved than Fourier analysis over abelian groups, because it deals with matrix-valued (rather than scalar-valued) functions. See the surveys by Hoory, Linial, and Wigderson [207] and Lubotzky [276], the lecture notes of Tao [386], and the notes for Section 4 for more on this and other approaches to analyzing the expansion of Cayley and Schreier graphs.
In the above, we think of a small-bias generator (according to the original definition) as producing pseudorandom elements of the group G = Z_2^n, and consider generalizations to different groups G. An alternative view is that the small-bias generator produces a sequence of bits, and the group G = Z_2^n only arises in defining what it means for the distribution on bit-strings to be pseudorandom. More generally, we can take G = H^n for any finite group H, and consider a random variable X = (X_1, ..., X_n) taking values in {0,1}^n to be pseudorandom if for every (h_1, ..., h_n) ∈ H^n, the distribution of h_1^{X_1} h_2^{X_2} ··· h_n^{X_n} is statistically close to h_1^{R_1} h_2^{R_2} ··· h_n^{R_n}, where R = (R_1, ..., R_n) is uniformly distributed in {0,1}^n. This notion is of interest in part because the function f_h(x_1, ..., x_n) = h_1^{x_1} ··· h_n^{x_n} can be computed by a read-once branching program of width |H| (see Section 8.2.1). Thus constructing generators fooling such group product programs is a natural warm-up to constructing pseudorandom generators for space-bounded computation, and in fact is equivalent to constructing generators for permutation branching programs in the case of constant width [250].
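A minimal sketch of a group product program, here over H = S_3 (so width 6), evaluated as a read-once branching program whose state is the partial product (the helper names are ours):

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p * q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def group_product(hs, x):
    """Evaluate h_1^{x_1} h_2^{x_2} ... h_n^{x_n} in S_3: the state is the
    partial product, i.e., a read-once branching program of width |S_3| = 6."""
    state = (0, 1, 2)  # identity permutation
    for h, bit in zip(hs, x):
        state = compose(state, h) if bit else state
    return state

def product_distribution(hs, inputs):
    """Empirical distribution of the product over a list of input strings;
    a generator fools the program if this is statistically close to the
    distribution under uniformly random inputs."""
    dist = {}
    for x in inputs:
        g = group_product(hs, x)
        dist[g] = dist.get(g, 0) + 1 / len(inputs)
    return dist
```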
Yet another type of generalization is obtained by expanding the class of distinguishers from linear functions (which over F_2^n are simply parities) to higher-degree polynomials. A series of recent results has shown that the sum of d small-bias spaces (over F^n for any finite field F) cannot be distinguished from uniform by polynomials of degree d [75, 270, 413]. These results were inspired by and influenced work on the Gowers uniformity norms from arithmetic combinatorics [180, 181], which can be viewed as providing a higher-order Fourier analysis and have found numerous applications in theoretical computer science. (See [394, 50].)
Explicit Constructions via Polynomial Evaluation. As we've seen in this survey, algebra also provides powerful tools for constructing pseudorandom objects whose definitions make no reference to algebra. One particularly useful paradigm in such constructions is polynomial evaluation. Specifically, we construct a pseudorandom object Γ : F^n × E → F^m, where F is a finite field and E ⊆ F^t is a set of evaluation points, by setting Γ(f, y) = (f_1(y), ..., f_m(y)), where we view f ∈ F^n as specifying a low-degree polynomial in t variables, from which we construct m related low-degree polynomials f_1, ..., f_m that we evaluate at the seed y. Reed–Solomon and Reed–Muller Codes (Constructions 5.14 and 5.16) correspond to the case where m = 1 and we take f_1 = f to be a univariate or multivariate polynomial, respectively.⁴ In Parvaresh–Vardy Codes (Construction 5.21), we took f to be a univariate polynomial and obtained the f_i's by powering f modulo an irreducible polynomial. In Folded Reed–Solomon Codes
⁴ In our presentation of Reed–Solomon and Reed–Muller codes, we took E = F^t, but many of the properties of these codes also hold for appropriately chosen subsets of evaluation points.

(Construction 5.23), f_i(Y) = f(γ^{i−1} Y), so evaluating f_i at y amounts to evaluating f at the related point γ^{i−1} y. More generally, if our construction Γ(f, y) is obtained by evaluating f at points g_1(y), ..., g_m(y) for linear (or low-degree) functions g_1, ..., g_m, we can also view it as evaluating the polynomials f_1 = f ∘ g_1, ..., f_m = f ∘ g_m at the seed y.
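Here is a toy instance of this framework in Python, instantiated as a single folded Reed–Solomon symbol over F_13 (the field size and the generator γ = 2 are illustrative choices of ours):

```python
p, gamma = 13, 2   # F_13; 2 generates the multiplicative group F_13^*

def poly_eval(coeffs, x):
    """Evaluate f(x) = sum_j coeffs[j] * x^j over F_p by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def folded_rs_symbol(coeffs, y, m):
    """Gamma(f, y) = (f_1(y), ..., f_m(y)) with f_i(Y) = f(gamma^{i-1} Y),
    i.e., evaluate f at the points y, gamma*y, ..., gamma^{m-1}*y."""
    return tuple(poly_eval(coeffs, pow(gamma, i, p) * y % p)
                 for i in range(m))

f = [3, 1, 4]                      # f(Y) = 3 + Y + 4*Y^2
print(folded_rs_symbol(f, 5, 4))   # one folded codeword symbol
```

The Shaltiel–Umans construction mentioned below follows the same pattern, with a multivariate f and γ a generator of the multiplicative group of the extension field of size |F|^t.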
Prior to Parvaresh–Vardy codes, constructions of this type were used for extractors and pseudorandom generators. Specifically, Miltersen and Vinodchandran [289] show that if we take f to describe a multivariate polynomial (via low-degree extension) and evaluate it on the points of a random axis-parallel line through y, we obtain a hitting-set generator construction against nondeterministic circuits (assuming f has appropriate worst-case hardness for nondeterministic circuits); this construction has played a key role in derandomizations of AM under uniform assumptions [195, 358].⁵ Ta-Shma, Zuckerman, and Safra [384] showed that a similar construction yields randomness extractors with seed length (1 + O(1)) log n for polynomially small min-entropy and polynomial entropy loss. Shaltiel and Umans [356, 399] showed that evaluating the t-variate polynomial f at the points y, γy, γ²y, ..., γ^{m−1}y, where γ is a primitive element of the field of size |F|^t (which we associate with F^t), yields both a very good extractor construction and an optimal construction of pseudorandom generators from worst-case hard functions. Notice that this construction is precisely a multivariate analogue of Folded Reed–Solomon Codes (which came afterwards [188]). Recently, Kopparty, Saraf, and Yekhanin [249] have introduced yet another useful way of obtaining the related polynomials f_1, ..., f_m, namely by taking derivatives of f; this has yielded the first codes with rate approaching 1 while being locally (list-)decodable in sublinear time [249, 248] as well as codes matching the optimal rate-distance tradeoff of Folded Reed–Solomon Codes [193, 248]. All of this suggests that polynomial evaluation may be a promising approach to obtaining a unified and near-optimal construction of pseudorandom objects (Open Problem 8.5).
⁵ The constructions described here have some additional components in addition to the basic polynomial evaluation framework Γ(f, y) = (f_1(y), ..., f_m(y)); for example, the seed should also specify the axis along which the line is parallel in [289] and a position in an inner encoding in [384, 356, 399]. We ignore these components in this informal discussion.

Acknowledgments

My exploration of pseudorandomness began in my graduate and postdoctoral years at MIT and IAS, under the wonderful guidance of Oded Goldreich, Shafi Goldwasser, Madhu Sudan, and Avi Wigderson. It was initiated by an exciting reading group organized at MIT by Luca Trevisan, which immersed me in the subject and started my extensive collaboration with Luca. Through fortuitous circumstances, I also began to work with Omer Reingold, starting what I hope will be a lifelong collaboration. I am indebted to Oded, Shafi, Madhu, Avi, Luca, and Omer for all the insights and research experiences they have shared with me.
I have also learned a great deal from my other collaborators on pseudorandomness, including Boaz Barak, Eli Ben-Sasson, Michael Capalbo, Kai-Min Chung, Nenad Dedić, Yevgeniy Dodis, Parikshit Gopalan, Dan Gutfreund, Venkat Guruswami, Iftach Haitner, Alex Healy, Thomas Holenstein, Jesse Kamp, Danny Lewin, Adriana López-Alt, Shachar Lovett, Chi-Jen Lu, Raghu Meka, Ilya Mironov, Michael Mitzenmacher, Shien Jin Ong, Michael Rabin, Anup Rao, Ran Raz, Yakir Reshef, Leo Reyzin, Thomas Ristenpart, Eyal Rozenman, Thomas Steinke, Madhur Tulsiani, Chris Umans, Emanuele Viola, Hoeteck Wee, Colin Jia Zheng, and David Zuckerman. Needless to
say, this list omits many other researchers in the field with whom I have had stimulating discussions.
The starting point for this survey was scribe notes taken by students in the 2004 version of my graduate course on Pseudorandomness. I thank those students for their contribution: Alexandr Andoni, Adi Akavia, Yan-Cheng Chang, Denis Chebikin, Hamilton Chong, Vitaly Feldman, Brian Greenberg, Chun-Yun Hsiao, Andrei Jorza, Adam Kirsch, Kevin Matulef, Mihai Pătraşcu, John Provine, Pavlo Pylyavskyy, Arthur Rudolph, Saurabh Sanghvi, Grant Schoenebeck, Jordanna Schutz, Sasha Schwartz, David Troiano, Vinod Vaikuntanathan, Kartik Venkatram, David Woodruff. I also thank the students from the other offerings of the course; Dan Gutfreund, who gave some guest lectures in 2007; and all of my teaching fellows, Johnny Chen, Kai-Min Chung, Minh Nguyen, Emanuele Viola, and Colin Jia Zheng. Special thanks are due to Levent Alpoge, Michael Forbes, Dieter van Melkebeek, Greg Price, and Adam Sealfon for their extensive feedback on the lecture notes and/or drafts of this survey.
Helpful input of various types (corrections, comments, answering questions) has also been given by Zachary Abel, Dana Albrecht, Nir Avni, Pablo Azar, Trevor Bass, Osbert Bastiani, Jeremy Booher, Fan Chung, Ben Dozier, Chinmoy Dutta, Zhou Fan, Oded Goldreich, Andrey Grinshpun, Venkat Guruswami, Alex Healy, Stephan Holzer, Andrei Jorza, Michael von Korff, Kevin Lee, Alex Lubotzky, Avner May, Eric Miles, Shira Mitchell, Jelani Nelson, Omer Reingold, Yakir Reshef, Shubhangi Saraf, Shrenik Shah, Madhu Sudan, Justin Thaler, Jon Ullman, Ameya Velingker, Neal Wadhwa, Hoeteck Wee, Avi Wigderson, and David Wu. Thanks to all of them, as well as those I have forgotten.
I am extremely grateful to James Finlay of now publishers for his years of patience and continual encouragement, without which I would have never finished this survey.
My time working on this survey included sabbatical visits to
the Miller Institute for Basic Research in Science at UC Berkeley,
Microsoft Research Silicon Valley, and Stanford University, as well
as support from a Sloan Fellowship, a Guggenheim Fellowship, NSF
grants CCF-0133096 and CCF-1116616, ONR grant N00014-04-1-0478,
and US-Israel BSF grants 2006060 and 2010196.

References

[1] S. Aaronson and D. van Melkebeek, On circuit lower bounds from derandomization, Theory of Computing. An Open Access Journal, vol. 7, pp. 177–184, 2011.
[2] M. Abadi, J. Feigenbaum, and J. Kilian, On hiding information from an oracle, Journal of Computer and System Sciences, vol. 39, no. 1, pp. 21–50, 1989.
[3] L. Adleman, Two theorems on random polynomial time, in Annual Symposium on Foundations of Computer Science (Ann Arbor, Mich., 1978), pp. 75–83, Long Beach, California, 1978.
[4] M. Agrawal, On derandomizing tests for certain polynomial identities, in IEEE Conference on Computational Complexity, p. 355, 2003.
[5] M. Agrawal and S. Biswas, Primality and identity testing via Chinese remaindering, Journal of the ACM, vol. 50, no. 4, pp. 429–443, 2003.
[6] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Annals of Mathematics. Second Series, vol. 160, no. 2, pp. 781–793, 2004.
[7] M. Agrawal and V. Vinay, Arithmetic circuits: A chasm at depth four, in FOCS, pp. 67–75, 2008.
[8] A. V. Aho, ed., Proceedings of the Annual ACM Symposium on Theory of Computing, 1987, New York, USA, 1987.
[9] M. Ajtai, H. Iwaniec, J. Komlós, J. Pintz, and E. Szemerédi, Construction of a thin set with small Fourier coefficients, Bulletin of the London Mathematical Society, vol. 22, no. 6, pp. 583–590, 1990.
[10] M. Ajtai, J. Komlós, and E. Szemerédi, Sorting in c log n parallel steps, Combinatorica, vol. 3, no. 1, pp. 1–19, 1983.

[11] M. Ajtai, J. Komlós, and E. Szemerédi, Deterministic Simulation in LOGSPACE, in Annual ACM Symposium on Theory of Computing, pp. 132–140, New York City, 25–27 May 1987.
[12] M. Ajtai and A. Wigderson, Deterministic simulation of probabilistic constant depth circuits, in Randomness and Computation, vol. 5 of Advances in Computing Research, (F. P. Preparata and S. Micali, eds.), pp. 199–223, 1989.
[13] R. Aleliunas, R. M. Karp, R. J. Lipton, L. Lovász, and C. Rackoff, Random walks, universal traversal sequences, and the complexity of maze problems, in Annual Symposium on Foundations of Computer Science (San Juan, Puerto Rico, 1979), pp. 218–223, New York, 1979.
[14] E. Allender, H. Buhrman, M. Koucký, D. van Melkebeek, and D. Ronneburger, Power from random strings, SIAM Journal on Computing, vol. 35, no. 6, pp. 1467–1493, 2006.
[15] N. Alon, Eigenvalues and expanders, Combinatorica, vol. 6, no. 2, pp. 83–96, 1986.
[16] N. Alon, Eigenvalues, geometric expanders, sorting in rounds, and Ramsey theory, Combinatorica, vol. 6, no. 3, pp. 207–219, 1986.
[17] N. Alon and M. R. Capalbo, Explicit unique-neighbor expanders, in Symposium on Foundations of Computer Science (Vancouver, BC, 2002), pp. 73–79, 2002.
[18] N. Alon and F. R. K. Chung, Explicit construction of linear sized tolerant networks, Discrete Mathematics, vol. 72, no. 1–3, pp. 15–19, 1988.
[19] N. Alon, O. Goldreich, J. Håstad, and R. Peralta, Simple constructions of almost k-wise independent random variables, Random Structures & Algorithms, vol. 3, no. 3, pp. 289–304, 1992. (See also addendum in issue 4(1), 1993, pp. 119–120).
[20] N. Alon, V. Guruswami, T. Kaufman, and M. Sudan, Guessing secrets efficiently via list decoding, ACM Transactions on Algorithms, vol. 3, no. 4, pp. Art. 42, 16, 2007.
[21] N. Alon and Y. Mansour, ε-discrepancy sets and their application for interpolation of sparse polynomials, Information Processing Letters, vol. 54, no. 6, pp. 337–342, 1995.
[22] N. Alon, Y. Matias, and M. Szegedy, The space complexity of approximating the frequency moments, Journal of Computer and System Sciences, vol. 58, no. 1, pp. 137–147, (Part 2) 1999.
[23] N. Alon and V. D. Milman, Eigenvalues, expanders and superconcentrators (Extended Abstract), in Annual Symposium on Foundations of Computer Science, pp. 320–322, Singer Island, Florida, 24–26 October 1984.
[24] N. Alon and Y. Roichman, Random Cayley graphs and expanders, Random Structures and Algorithms, vol. 5, no. 2, pp. 271–284, 1994.
[25] N. Alon and J. H. Spencer, The Probabilistic Method. Wiley-Interscience Series in Discrete Mathematics and Optimization. New York: Wiley-Interscience [John Wiley & Sons], Second Edition, 2000. (With an appendix on the life and work of Paul Erdős).
[26] N. Alon and B. Sudakov, Bipartite subgraphs and the smallest eigenvalue, Combinatorics, Probability and Computing, vol. 9, no. 1, pp. 1–12, 2000.

[27] A. E. Andreev, A. E. F. Clementi, and J. D. P. Rolim, Worst-case hardness suffices for derandomization: A new method for hardness-randomness trade-offs, in Automata, Languages and Programming, 24th International Colloquium, vol. 1256 of Lecture Notes in Computer Science, (P. Degano, R. Gorrieri, and A. Marchetti-Spaccamela, eds.), pp. 177–187, Bologna, Italy: Springer-Verlag, 7–11 July 1997.
[28] A. E. Andreev, A. E. F. Clementi, J. D. P. Rolim, and L. Trevisan, Weak random sources, hitting sets, and BPP simulations, SIAM Journal on Computing, vol. 28, no. 6, pp. 2103–2116, (electronic) 1999.
[29] D. Angluin and D. Lichtenstein, Provable security of cryptosystems: A survey, Technical Report YALEU/DCS/TR-288, Yale University, Department of Computer Science, 1983.
[30] S. Ar, R. J. Lipton, R. Rubinfeld, and M. Sudan, Reconstructing algebraic functions from mixed data, SIAM Journal on Computing, vol. 28, no. 2, pp. 487–510, 1999.
[31] R. Armoni, M. Saks, A. Wigderson, and S. Zhou, Discrepancy sets and pseudorandom generators for combinatorial rectangles, in Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pp. 412–421, Los Alamitos, CA, 1996.
[32] S. Arora and B. Barak, Computational complexity. Cambridge: Cambridge University Press, 2009. (A modern approach).
[33] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof verification and the hardness of approximation problems, Journal of the ACM, vol. 45, pp. 501–555, May 1998.
[34] S. Arora and S. Safra, Probabilistic checking of proofs: A new characterization of NP, Journal of the ACM, vol. 45, pp. 70–122, January 1998.
[35] S. Arora and M. Sudan, Improved low degree testing and its applications, in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 485–495, El Paso, Texas, 4–6 May 1997.
[36] M. Artin, Algebra. Englewood Cliffs, NJ: Prentice Hall Inc., 1991.
[37] V. Arvind and J. Köbler, On pseudorandomness and resource-bounded measure, Theoretical Computer Science, vol. 255, no. 1–2, pp. 205–221, 2001.
[38] B. Aydinlioğlu, D. Gutfreund, J. M. Hitchcock, and A. Kawachi, Derandomizing Arthur-Merlin games and approximate counting implies exponential-size lower bounds, Computational Complexity, vol. 20, no. 2, pp. 329–366, 2011.
[39] B. Aydinlioğlu and D. van Melkebeek, Nondeterministic circuit lower bounds from mildly derandomizing Arthur-Merlin games, Electronic Colloquium on Computational Complexity (ECCC), vol. 19, p. 80, 2012.
[40] Y. Azar, R. Motwani, and J. Naor, Approximating probability distributions using small sample spaces, Combinatorica, vol. 18, no. 2, pp. 151–171, 1998.
[41] L. Babai, L. Fortnow, L. A. Levin, and M. Szegedy, Checking computations in polylogarithmic time, in STOC, (C. Koutsougeras and J. S. Vitter, eds.), pp. 21–31, ACM, 1991.
[42] L. Babai, L. Fortnow, and C. Lund, Nondeterministic exponential time has two-prover interactive protocols, Computational Complexity, vol. 1, no. 1, pp. 3–40, 1991.

[43] L. Babai, L. Fortnow, N. Nisan, and A. Wigderson, BPP has subexponential time simulations unless EXPTIME has publishable proofs, Computational Complexity, vol. 3, no. 4, pp. 307–318, 1993.
[44] L. Babai and S. Moran, Arthur-Merlin games: A randomized proof system, and a hierarchy of complexity classes, Journal of Computer and System Sciences, vol. 36, no. 2, pp. 254–276, 1988.
[45] B. Barak, M. Hardt, and S. Kale, The uniform hardcore lemma via approximate Bregman projections, in Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1193–1200, Philadelphia, PA, 2009.
[46] B. Barak, R. Impagliazzo, and A. Wigderson, Extracting randomness using few independent sources, SIAM Journal on Computing, vol. 36, no. 4, pp. 1095–1118, 2006.
[47] B. Barak, G. Kindler, R. Shaltiel, B. Sudakov, and A. Wigderson, Simulating independence: New constructions of condensers, Ramsey graphs, dispersers, and extractors, Journal of the ACM, vol. 57, no. 4, pp. Art. 20, 52, 2010.
[48] B. Barak, A. Rao, R. Shaltiel, and A. Wigderson, 2-source dispersers for n^{o(1)} entropy and Ramsey graphs beating the Frankl–Wilson construction, Annals of Mathematics, 2012. (To appear. Preliminary version in STOC '06).
[49] B. Barak, R. Shaltiel, and A. Wigderson, Computational analogues of entropy, in Approximation, Randomization, and Combinatorial Optimization, vol. 2764 of Lecture Notes in Computer Science, pp. 200–215, Berlin: Springer, 2003.
[50] B. Barak, L. Trevisan, and A. Wigderson, Additive Combinatorics and Computer Science. http://www.cs.princeton.edu/theory/index.php/Main/AdditiveCombinatoricsMinicourse, August 2007.
[51] E. Barker and J. Kelsey, Recommendation for random number generation using deterministic random bit generators, Special Publication 800-90A, National Institute of Standards and Technology, U.S. Department of Commerce, January 2012.
[52] L. A. Bassalygo, Asymptotically optimal switching circuits, Problems of Information Transmission, vol. 17, no. 3, pp. 206–211, 1981.
[53] J. D. Batson, D. A. Spielman, and N. Srivastava, Twice-Ramanujan sparsifiers, in Annual ACM Symposium on Theory of Computing (Bethesda, MD), pp. 255–262, 2009.
[54] D. Beaver and J. Feigenbaum, Hiding instances in multioracle queries (Extended Abstract), in STACS 90 (Rouen, 1990), vol. 415 of Lecture Notes in Computer Science, pp. 37–48, Berlin: Springer, 1990.
[55] M. Bellare, O. Goldreich, and S. Goldwasser, Randomness in interactive proofs, Computational Complexity, vol. 3, no. 4, pp. 319–354, 1993.
[56] M. Bellare, S. Goldwasser, and D. Micciancio, Pseudo-random number generation within cryptographic algorithms: The DSS case, in CRYPTO, vol. 1294 of Lecture Notes in Computer Science, (B. S. Kaliski Jr., ed.), pp. 277–291, Springer, 1997.
[57] M. Bellare and J. Rompel, Randomness-efficient oblivious sampling, in Annual Symposium on Foundations of Computer Science, pp. 276–287, Santa Fe, New Mexico, 20–22 November 1994.

[58] A. Ben-Aroya and A. Ta-Shma, A combinatorial construction of almost-Ramanujan graphs using the zig-zag product, in Annual ACM Symposium on Theory of Computing (Victoria, British Columbia), pp. 325–334, 2008.
[59] E. Ben-Sasson, O. Goldreich, P. Harsha, M. Sudan, and S. Vadhan, Robust PCPs of proximity, shorter PCPs and applications to coding, SIAM Journal on Computing, vol. 36, no. 4, pp. 889–974, 2006.
[60] E. Ben-Sasson and S. Kopparty, Affine dispersers from subspace polynomials, in STOC '09: Proceedings of the 2009 ACM International Symposium on Theory of Computing, pp. 65–74, New York, 2009.
[61] E. Ben-Sasson, M. Sudan, S. Vadhan, and A. Wigderson, Randomness-efficient low degree tests and short PCPs via epsilon-biased sets, in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 612–621, New York, 2003.
[62] E. Ben-Sasson and N. Zewi, From affine to two-source extractors via approximate duality, in STOC, (L. Fortnow and S. P. Vadhan, eds.), pp. 177–186, ACM, 2011.
[63] C. H. Bennett, G. Brassard, and J.-M. Robert, Privacy amplification by public discussion, SIAM Journal on Computing, vol. 17, no. 2, pp. 210–229, 1988. (Special issue on cryptography).
[64] S. J. Berkowitz, On computing the determinant in small parallel time using a small number of processors, Information Processing Letters, vol. 18, no. 3, pp. 147–150, 1984.
[65] E. R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill Book Co., 1968.
[66] E. R. Berlekamp, Factoring polynomials over large finite fields, Mathematics of Computation, vol. 24, pp. 713–735, 1970.
[67] J. Bierbrauer, T. Johansson, G. Kabatianskii, and B. Smeets, On families of hash functions via geometric codes and concatenation, in Advances in Cryptology – CRYPTO '93 (Santa Barbara, CA, 1993), vol. 773 of Lecture Notes in Computer Science, pp. 331–342, Berlin: Springer, 1994.
[68] Y. Bilu and N. Linial, Lifts, discrepancy and nearly optimal spectral gap, Combinatorica, vol. 26, no. 5, pp. 495–519, 2006.
[69] M. Blum, Independent unbiased coin flips from a correlated biased source: a finite state Markov chain, Combinatorica, vol. 6, no. 2, pp. 97–108, 1986. (Theory of computing (Singer Island, Fla., 1984)).
[70] M. Blum and S. Kannan, Designing programs that check their work, Journal of the ACM, vol. 42, no. 1, pp. 269–291, 1995.
[71] M. Blum, M. Luby, and R. Rubinfeld, Self-testing/correcting with applications to numerical problems, Journal of Computer and System Sciences, vol. 47, no. 3, pp. 549–595, 1993.
[72] M. Blum and S. Micali, How to generate cryptographically strong sequences of pseudorandom bits, SIAM Journal on Computing, vol. 13, no. 4, pp. 850–864, 1984.
[73] A. Bogdanov, Z. Dvir, E. Verbin, and A. Yehudayoff, Pseudorandomness for Width 2 Branching Programs, Electronic Colloquium on Computational Complexity (ECCC), vol. 16, p. 70, 2009.

[74] A. Bogdanov and L. Trevisan, Average-case complexity, Foundations and Trends in Theoretical Computer Science, vol. 2, no. 1, pp. 1–106, 2006.
[75] A. Bogdanov and E. Viola, Pseudorandom bits for polynomials, SIAM Journal on Computing, vol. 39, no. 6, pp. 2464–2486, 2010.
[76] A. Borodin, J. von zur Gathen, and J. Hopcroft, Fast parallel matrix and GCD computations, Information and Control, vol. 52, no. 3, pp. 241–256, 1982.
[77] C. Bosley and Y. Dodis, Does privacy require true randomness?, in Theory of Cryptography, vol. 4392 of Lecture Notes in Computer Science, pp. 1–20, Berlin: Springer, 2007.
[78] J. Bourgain, On the construction of affine extractors, Geometric and Functional Analysis, vol. 17, no. 1, pp. 33–57, 2007.
[79] J. Bourgain, N. Katz, and T. Tao, A sum-product estimate in finite fields, and applications, Geometric and Functional Analysis, vol. 14, no. 1, pp. 27–57, 2004.
[80] J. Boyar, Inferring sequences produced by pseudo-random number generators, Journal of the Association for Computing Machinery, vol. 36, no. 1, pp. 129–141, 1989.
[81] M. Braverman, Polylogarithmic independence fools AC0 circuits, Journal of the ACM, vol. 57, no. 5, pp. Art. 28, 10, 2010.
[82] M. Braverman, A. Rao, R. Raz, and A. Yehudayoff, Pseudorandom generators for regular branching programs, in FOCS, pp. 40–47, IEEE Computer Society, 2010.
[83] A. Z. Broder, How hard is it to marry at random? (On the approximation of the permanent), in Annual ACM Symposium on Theory of Computing (Berkeley, CA), pp. 50–58, 1986.
[84] J. Brody and E. Verbin, The coin problem and pseudorandomness for branching programs, in FOCS, pp. 30–39, IEEE Computer Society, 2010.
[85] H. Buhrman and L. Fortnow, One-sided versus two-sided error in probabilistic computation, in STACS 99 (Trier), vol. 1563 of Lecture Notes in Computer Science, pp. 100–109, Berlin: Springer, 1999.
[86] H. Buhrman, L. Fortnow, and T. Thierauf, Nonrelativizing separations, in Annual IEEE Conference on Computational Complexity (Buffalo, NY, 1998), pp. 8–12, Los Alamitos, CA, 1998.
[87] H. Buhrman, P. B. Miltersen, J. Radhakrishnan, and S. Venkatesh, Are bitvectors optimal?, SIAM Journal on Computing, vol. 31, no. 6, pp. 1723–1744, 2002.
[88] J. Buresh-Oppenheim, V. Kabanets, and R. Santhanam, Uniform hardness amplification in NP via monotone codes, Electronic Colloquium on Computational Complexity (ECCC), vol. 13, no. 154, 2006.
[89] P. Bürgisser, M. Clausen, and M. A. Shokrollahi, Algebraic Complexity Theory, vol. 315 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Berlin: Springer-Verlag, 1997. (With the collaboration of Thomas Lickteig).

[90] R. Canetti, Y. Dodis, S. Halevi, E. Kushilevitz, and A. Sahai, Exposureresilient functions and all-or-nothing transforms, in Advances in
Cryptology EUROCRYPT 00, Lecture Notes in Computer Science,
(B. Preneel, ed.), Springer-Verlag, 1418 May 2000.
[91] R. Canetti, G. Even, and O. Goldreich, Lower bounds for sampling algorithms for estimating the average, Information Processing Letters, vol. 53,
no. 1, pp. 1725, 1995.
[92] M. Capalbo, O. Reingold, S. Vadhan, and A. Wigderson, Randomness conductors and constant-degree lossless expanders, in Annual ACM Symposium
on Theory of Computing (STOC 02), pp. 659668, Montreal, CA, May 2002.
(Joint session with CCC 02).
[93] J. L. Carter and M. N. Wegman, Universal classes of hash functions,
Journal of Computer and System Sciences, vol. 18, no. 2, pp. 143154, 1979.
[94] J. Cheeger, A lower bound for the smallest eigenvalue of the Laplacian,
in Problems in analysis (Papers dedicated to Salomon Bochner, 1969),
pp. 195199, Princeton, NJ: Princeton Univ. Press, 1970.
[95] H. Cherno, A measure of asymptotic eciency for tests of a hypothesis
based on the sum of observations, Annals of Mathematical Statistics, vol. 23,
pp. 493507, 1952.
[96] B. Chor and O. Goldreich, Unbiased bits from sources of weak randomness
and probabilistic communication complexity, SIAM Journal on Computing,
vol. 17, pp. 230261, April 1988.
[97] B. Chor and O. Goldreich, On the power of two-point based sampling,
Journal of Complexity, vol. 5, no. 1, pp. 96106, 1989.
[98] B. Chor, O. Goldreich, J. Håstad, J. Friedman, S. Rudich, and R. Smolensky, "The bit extraction problem or t-resilient functions (Preliminary Version)," in FOCS, pp. 396–407, IEEE, 1985.
[99] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, "Private information retrieval," Journal of the ACM, vol. 45, no. 6, pp. 965–982, 1998.
[100] F. Chung and R. Graham, "Sparse quasi-random graphs," Combinatorica, vol. 22, no. 2, pp. 217–244, 2002. (Special issue: Paul Erdős and his mathematics).
[101] F. Chung and R. Graham, "Quasi-random graphs with given degree sequences," Random Structures and Algorithms, vol. 32, no. 1, pp. 1–19, 2008.
[102] F. Chung, R. Graham, and T. Leighton, "Guessing secrets," Electronic Journal of Combinatorics, vol. 8, no. 1, Research Paper 13, 25 pp. (electronic), 2001.
[103] F. R. K. Chung, "Diameters and eigenvalues," Journal of the American Mathematical Society, vol. 2, no. 2, pp. 187–196, 1989.
[104] F. R. K. Chung, R. L. Graham, and R. M. Wilson, "Quasi-random graphs," Combinatorica, vol. 9, no. 4, pp. 345–362, 1989.
[105] K.-M. Chung, "Efficient parallel repetition theorems with applications to security amplification," PhD Thesis, Harvard University, 2011.
[106] K.-M. Chung, Y. T. Kalai, F.-H. Liu, and R. Raz, "Memory delegation," in Advances in Cryptology – CRYPTO 2011, vol. 6841 of Lecture Notes in Computer Science, pp. 151–168, Heidelberg: Springer, 2011.
[107] A. Cohen and A. Wigderson, "Dispersers, deterministic amplification, and weak random sources (extended abstract)," in Annual Symposium on Foundations of Computer Science (Research Triangle Park, North Carolina), pp. 14–19, 1989.
[108] D. Coppersmith and S. Winograd, "Matrix multiplication via arithmetic progressions," Journal of Symbolic Computation, vol. 9, no. 3, pp. 251–280, 1990.
[109] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. Cambridge, MA: MIT Press, Second Edition, 2001.
[110] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley Series in Telecommunications: John Wiley & Sons, Inc., Second Edition, 1991.
[111] L. Csanky, "Fast parallel matrix inversion algorithms," SIAM Journal on Computing, vol. 5, no. 4, pp. 618–623, 1976.
[112] G. Davidoff, P. Sarnak, and A. Valette, Elementary Number Theory, Group Theory, and Ramanujan Graphs, vol. 55 of London Mathematical Society Student Texts. Cambridge: Cambridge University Press, 2003.
[113] A. De, "Pseudorandomness for permutation and regular branching programs," in IEEE Conference on Computational Complexity, pp. 221–231, 2011.
[114] A. De and T. Watson, "Extractors and lower bounds for locally samplable sources," in Approximation, Randomization, and Combinatorial Optimization, vol. 6845 of Lecture Notes in Computer Science, pp. 483–494, Heidelberg: Springer, 2011.
[115] R. de Wolf, "A Brief Introduction to Fourier Analysis on the Boolean Cube," Theory of Computing, Graduate Surveys, vol. 1, pp. 1–20, 2008.
[116] R. A. DeMillo and R. J. Lipton, "A probabilistic remark on algebraic program testing," Information Processing Letters, vol. 7, no. 4, pp. 193–195, 1978.
[117] W. Diffie and M. E. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, vol. IT-22, no. 6, pp. 644–654, 1976.
[118] I. Dinur, "The PCP theorem by gap amplification," Journal of the ACM, vol. 54, no. 3, article 12, 44 pp. (electronic), 2007.
[119] Y. Dodis, R. Impagliazzo, R. Jaiswal, and V. Kabanets, "Security amplification for interactive cryptographic primitives," in Theory of Cryptography, vol. 5444 of Lecture Notes in Computer Science, pp. 128–145, Berlin: Springer, 2009.
[120] Y. Dodis, T. Ristenpart, and S. Vadhan, "Randomness condensers for efficiently samplable, seed-dependent sources," in Proceedings of the IACR Theory of Cryptography Conference (TCC '12), vol. 7194 of Lecture Notes in Computer Science, (R. Cramer, ed.), pp. 618–635, Springer-Verlag, 19–21 March 2012.
[121] D. Dubhashi and A. Panconesi, Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
[122] Z. Dvir, "Extractors for varieties," in IEEE Conference on Computational Complexity, pp. 102–113, 2009.
[123] Z. Dvir, A. Gabizon, and A. Wigderson, "Extractors and rank extractors for polynomial sources," Computational Complexity, vol. 18, no. 1, pp. 1–58, 2009.
[124] Z. Dvir, S. Kopparty, S. Saraf, and M. Sudan, "Extensions to the method of multiplicities, with applications to Kakeya sets and mergers," in 2009 Annual IEEE Symposium on Foundations of Computer Science (FOCS 2009), pp. 181–190, Los Alamitos, CA, 2009.
[125] Z. Dvir and S. Lovett, "Subspace evasive sets," in Symposium on Theory of Computing, (H. J. Karloff and T. Pitassi, eds.), pp. 351–358, ACM, 2012.
[126] Z. Dvir and A. Shpilka, "Locally decodable codes with two queries and polynomial identity testing for depth 3 circuits," SIAM Journal on Computing, vol. 36, no. 5, pp. 1404–1434 (electronic), 2006/07.
[127] S. Dziembowski and K. Pietrzak, "Leakage-resilient cryptography," in Symposium on Foundations of Computer Science, pp. 293–302, 2008.
[128] K. Efremenko, "3-query locally decodable codes of subexponential length," in STOC'09: Proceedings of the 2009 ACM International Symposium on Theory of Computing, pp. 39–44, New York, 2009.
[129] P. Elias, List Decoding for Noisy Channels, Research Laboratory of Electronics, Massachusetts Institute of Technology, Rep. No. 335, Cambridge, MA, 1957.
[130] P. Elias, "The efficient construction of an unbiased random sequence," The Annals of Mathematical Statistics, vol. 43, no. 3, pp. 865–870, June 1972.
[131] P. Elias, "Error-correcting codes for list decoding," IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 5–12, 1991.
[132] P. Erdős, "Some remarks on the theory of graphs," Bulletin of the American Mathematical Society, vol. 53, pp. 292–294, 1947.
[133] P. Erdős, "Problems and results in chromatic graph theory," in Proof Techniques in Graph Theory (Proceedings of the Ann Arbor Graph Theory Conference, Ann Arbor, Michigan, 1968), pp. 27–35, New York, 1969.
[134] P. Erdős, P. Frankl, and Z. Füredi, "Families of finite sets in which no set is covered by the union of r others," Israel Journal of Mathematics, vol. 51, no. 1–2, pp. 79–89, 1985.
[135] G. Even, "Construction of small probabilistic spaces for deterministic simulation," Master's Thesis, The Technion, 1991.
[136] G. Even, O. Goldreich, M. Luby, N. Nisan, and B. Veličković, "Efficient approximation of product distributions," Random Structures & Algorithms, vol. 13, no. 1, pp. 1–16, 1998.
[137] S. Even, A. L. Selman, and Y. Yacobi, "The complexity of promise problems with applications to public-key cryptography," Information and Control, vol. 61, no. 2, pp. 159–173, 1984.
[138] U. Feige, S. Goldwasser, L. Lovász, S. Safra, and M. Szegedy, "Interactive proofs and the hardness of approximating cliques," Journal of the ACM, vol. 43, no. 2, pp. 268–292, 1996.
[139] J. A. Fill, "Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process," Annals of Applied Probability, vol. 1, no. 1, pp. 62–87, 1991.
[140] G. D. Forney, Concatenated Codes. MIT Press, 1966.
[141] P. Frankl and R. M. Wilson, "Intersection theorems with geometric consequences," Combinatorica, vol. 1, no. 4, pp. 357–368, 1981.
[142] M. L. Fredman, J. Komlós, and E. Szemerédi, "Storing a sparse table with O(1) worst case access time," Journal of the ACM, vol. 31, no. 3, pp. 538–544, 1984.
[143] J. Friedman, "A proof of Alon's second eigenvalue conjecture and related problems," Memoirs of the American Mathematical Society, vol. 195, no. 910, viii+100 pp., 2008.
[144] A. M. Frieze, J. Håstad, R. Kannan, J. C. Lagarias, and A. Shamir, "Reconstructing truncated integer variables satisfying linear congruences," SIAM Journal on Computing, vol. 17, no. 2, pp. 262–280, 1988. (Special issue on cryptography).
[145] B. Fuller, A. O'Neill, and L. Reyzin, "A unified approach to deterministic encryption: New constructions and a connection to computational entropy," in TCC, vol. 7194 of Lecture Notes in Computer Science, (R. Cramer, ed.), pp. 582–599, 2012.
[146] O. Gabber and Z. Galil, "Explicit constructions of linear-sized superconcentrators," Journal of Computer and System Sciences, vol. 22, pp. 407–420, June 1981.
[147] A. Gabizon and R. Raz, "Deterministic extractors for affine sources over large fields," Combinatorica, vol. 28, no. 4, pp. 415–440, 2008.
[148] R. G. Gallager, Low-Density Parity-Check Codes. MIT Press, 1963.
[149] P. Gemmell, R. J. Lipton, R. Rubinfeld, M. Sudan, and A. Wigderson, "Self-testing/correcting for polynomials and for approximate functions," in STOC, (C. Koutsougeras and J. S. Vitter, eds.), pp. 32–42, 1991.
[150] P. Gemmell and M. Sudan, "Highly resilient correctors for polynomials," Information Processing Letters, vol. 43, no. 4, pp. 169–174, 1992.
[151] E. Gilbert, "A comparison of signalling alphabets," Bell System Technical Journal, vol. 31, pp. 504–522, 1952.
[152] J. Gill, "Computational complexity of probabilistic Turing machines," SIAM Journal on Computing, vol. 6, no. 4, pp. 675–695, 1977.
[153] D. Gillman, "A Chernoff bound for random walks on expander graphs," SIAM Journal on Computing, vol. 27, no. 4, pp. 1203–1220 (electronic), 1998.
[154] M. X. Goemans and D. P. Williamson, "Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming," Journal of the ACM, vol. 42, no. 6, pp. 1115–1145, 1995.
[155] O. Goldreich, "A sample of samplers – a computational perspective on sampling (survey)," Electronic Colloquium on Computational Complexity (ECCC), vol. 4, no. 20, 1997.
[156] O. Goldreich, Modern Cryptography, Probabilistic Proofs and Pseudorandomness, vol. 17 of Algorithms and Combinatorics. Berlin: Springer-Verlag, 1999.
[157] O. Goldreich, Foundations of Cryptography. Cambridge: Cambridge University Press, 2001. (Basic Tools).
[158] O. Goldreich, Foundations of Cryptography II. Cambridge: Cambridge University Press, 2004. (Basic Applications).
[159] O. Goldreich, "On promise problems: A survey," in Theoretical Computer Science, vol. 3895 of Lecture Notes in Computer Science, pp. 254–290, Berlin: Springer, 2006.
[160] O. Goldreich, "Probabilistic proof systems: A primer," Foundations and Trends in Theoretical Computer Science, vol. 3, no. 1, pp. 1–91 (2008), 2007.
[161] O. Goldreich, Computational Complexity: A Conceptual Perspective. Cambridge: Cambridge University Press, 2008.
[162] O. Goldreich, A Primer on Pseudorandom Generators, vol. 55 of University Lecture Series. Providence, RI: American Mathematical Society, 2010.
[163] O. Goldreich, "In a World of P=BPP," in Studies in Complexity and Cryptography. Miscellanea on the Interplay of Randomness and Computation, vol. 6650 of Lecture Notes in Computer Science, pp. 191–232, Springer, 2011.
[164] O. Goldreich, S. Goldwasser, and S. Micali, "How to construct random functions," Journal of the ACM, vol. 33, pp. 792–807, October 1986.
[165] O. Goldreich, S. Goldwasser, and D. Ron, "Property testing and its connection to learning and approximation," Journal of the ACM, vol. 45, no. 4, pp. 653–750, 1998.
[166] O. Goldreich, R. Impagliazzo, L. Levin, R. Venkatesan, and D. Zuckerman, "Security preserving amplification of hardness," in Annual Symposium on Foundations of Computer Science, Vol. I, II (St. Louis, MO, 1990), pp. 318–326, Los Alamitos, CA: IEEE Computer Society Press, 1990.
[167] O. Goldreich, H. Krawczyk, and M. Luby, "On the existence of pseudorandom generators," SIAM Journal on Computing, vol. 22, no. 6, pp. 1163–1175, 1993.
[168] O. Goldreich and L. A. Levin, "A hard-core predicate for all one-way functions," in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 25–32, Seattle, Washington, 15–17 May 1989.
[169] O. Goldreich and B. Meyer, "Computational indistinguishability: algorithms vs. circuits," Theoretical Computer Science, vol. 191, no. 1–2, pp. 215–218, 1998.
[170] O. Goldreich, S. Micali, and A. Wigderson, "Proofs that yield nothing but their validity, or All languages in NP have zero-knowledge proof systems," Journal of the ACM, vol. 38, no. 3, pp. 691–729, 1991.
[171] O. Goldreich, N. Nisan, and A. Wigderson, "On Yao's XOR lemma," Technical Report TR95-050, revision 2, Electronic Colloquium on Computational Complexity, http://www.eccc.uni-trier.de/eccc, June 2010.
[172] O. Goldreich and M. Sudan, "Computational indistinguishability: A sample hierarchy," Journal of Computer and System Sciences, vol. 59, no. 2, pp. 253–269, 1999. (13th Annual IEEE Conference on Computational Complexity (Buffalo, NY, 1998)).
[173] O. Goldreich, S. Vadhan, and A. Wigderson, "Simplified derandomization of BPP using a hitting set generator," in Studies in Complexity and Cryptography. Miscellanea on the Interplay of Randomness and Computation, vol. 6650 of Lecture Notes in Computer Science, pp. 59–67, Springer, 2011.
[174] O. Goldreich and A. Wigderson, "Tiny families of functions with random properties: A quality-size trade-off for hashing," Random Structures & Algorithms, vol. 11, no. 4, pp. 315–343, 1997.
[175] S. Goldwasser, "Cryptography without (hardly any) secrets?," in Advances in Cryptology – EUROCRYPT 2009, vol. 5479 of Lecture Notes in Computer Science, pp. 369–370, Berlin: Springer, 2009.
[176] S. Goldwasser and S. Micali, "Probabilistic encryption," Journal of Computer and System Sciences, vol. 28, pp. 270–299, April 1984.
[177] S. Goldwasser, S. Micali, and C. Rackoff, "The knowledge complexity of interactive proof systems," SIAM Journal on Computing, vol. 18, no. 1, pp. 186–208, 1989.
[178] P. Gopalan, R. Meka, and O. Reingold, "DNF sparsification and a faster deterministic counting algorithm," in IEEE Conference on Computational Complexity, pp. 126–135, 2012.
[179] P. Gopalan, R. Meka, O. Reingold, L. Trevisan, and S. Vadhan, "Better pseudorandom generators via milder pseudorandom restrictions," in Proceedings of the Annual IEEE Symposium on Foundations of Computer Science (FOCS '12), 20–23 October 2012. (To appear).
[180] W. T. Gowers, "A new proof of Szemerédi's theorem for arithmetic progressions of length four," Geometric and Functional Analysis, vol. 8, no. 3, pp. 529–551, 1998.
[181] W. T. Gowers, "A new proof of Szemerédi's theorem," Geometric and Functional Analysis, vol. 11, no. 3, pp. 465–588, 2001.
[182] R. L. Graham, B. L. Rothschild, and J. H. Spencer, Ramsey Theory. Wiley-Interscience Series in Discrete Mathematics and Optimization. New York: John Wiley & Sons Inc., Second Edition, 1990.
[183] V. Guruswami, "Guest column: Error-correcting codes and expander graphs," SIGACT News, vol. 35, no. 3, pp. 25–41, 2004.
[184] V. Guruswami, Algorithmic Results in List Decoding, vol. 2, no. 2 of Foundations and Trends in Theoretical Computer Science. now publishers, 2006.
[185] V. Guruswami, "Linear-algebraic list decoding of folded Reed-Solomon codes," in IEEE Conference on Computational Complexity, pp. 77–85, 2011.
[186] V. Guruswami, J. Håstad, and S. Kopparty, "On the list-decodability of random linear codes," IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 718–725, 2011.
[187] V. Guruswami, J. Håstad, M. Sudan, and D. Zuckerman, "Combinatorial bounds for list decoding," IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1021–1034, 2002.
[188] V. Guruswami and A. Rudra, "Explicit codes achieving list decoding capacity: error-correction with optimal redundancy," IEEE Transactions on Information Theory, vol. 54, no. 1, pp. 135–150, 2008.
[189] V. Guruswami and M. Sudan, "Improved decoding of Reed-Solomon and algebraic-geometry codes," IEEE Transactions on Information Theory, vol. 45, no. 6, pp. 1757–1767, 1999.
[190] V. Guruswami and M. Sudan, "List decoding algorithms for certain concatenated codes," in STOC, pp. 181–190, 2000.
[191] V. Guruswami and M. Sudan, "Extensions to the Johnson bound," Unpublished manuscript, February 2001.
[192] V. Guruswami, C. Umans, and S. Vadhan, "Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes," Journal of the ACM, vol. 56, no. 4, pp. 1–34, 2009.
[193] V. Guruswami and C. Wang, "Optimal rate list decoding via derivative codes," in Approximation, Randomization, and Combinatorial Optimization, vol. 6845 of Lecture Notes in Computer Science, pp. 593–604, Heidelberg: Springer, 2011.
[194] V. Guruswami and C. Xing, "Folded codes from function field towers and improved optimal rate list decoding," in STOC, (H. J. Karloff and T. Pitassi, eds.), pp. 339–350, ACM, 2012.
[195] D. Gutfreund, R. Shaltiel, and A. Ta-Shma, "Uniform hardness versus randomness tradeoffs for Arthur-Merlin games," Computational Complexity, vol. 12, no. 3–4, pp. 85–130, 2003.
[196] J. Håstad, Computational Limitations of Small-Depth Circuits. MIT Press, 1987.
[197] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby, "A pseudorandom generator from any one-way function," SIAM Journal on Computing, vol. 28, no. 4, pp. 1364–1396, 1999.
[198] I. Haitner, O. Reingold, and S. Vadhan, "Efficiency improvements in constructing pseudorandom generators from one-way functions," in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC '10), pp. 437–446, 6–8 June 2010.
[199] R. W. Hamming, "Error detecting and error correcting codes," The Bell System Technical Journal, vol. 29, pp. 147–160, 1950.
[200] J. Hartmanis and R. E. Stearns, "On the computational complexity of algorithms," Transactions of the American Mathematical Society, vol. 117, pp. 285–306, 1965.
[201] N. J. A. Harvey, "Algebraic structures and algorithms for matching and matroid problems," in Annual IEEE Symposium on Foundations of Computer Science (Berkeley, CA), pp. 531–542, 2006.
[202] A. Healy, S. Vadhan, and E. Viola, "Using nondeterminism to amplify hardness," SIAM Journal on Computing, vol. 35, no. 4, pp. 903–931, 2006.
[203] A. D. Healy, "Randomness-efficient sampling within NC^1," Computational Complexity, vol. 17, no. 1, pp. 3–37, 2008.
[204] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association, vol. 58, pp. 13–30, 1963.
[205] A. J. Hoffman, "On eigenvalues and colorings of graphs," in Graph Theory and its Applications (Proceedings of Advanced Seminars, Mathematics Research Center, University of Wisconsin, Madison, Wisconsin, 1969), pp. 79–91, New York: Academic Press, 1970.
[206] T. Holenstein, "Key agreement from weak bit agreement," in STOC'05: Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 664–673, New York, 2005.
[207] S. Hoory, N. Linial, and A. Wigderson, "Expander graphs and their applications," Bulletin of the AMS, vol. 43, no. 4, pp. 439–561, 2006.
[208] R. Impagliazzo, "Hard-core distributions for somewhat hard problems," in Annual Symposium on Foundations of Computer Science, pp. 538–545, Milwaukee, Wisconsin, 23–25 October 1995.
[209] R. Impagliazzo, "Hardness as randomness: A survey of universal derandomization," in Proceedings of the International Congress of Mathematicians, Vol. III (Beijing, 2002), pp. 659–672, Beijing, 2002.
[210] R. Impagliazzo, R. Jaiswal, and V. Kabanets, "Approximate list-decoding of direct product codes and uniform hardness amplification," SIAM Journal on Computing, vol. 39, no. 2, pp. 564–605, 2009.
[211] R. Impagliazzo, R. Jaiswal, V. Kabanets, and A. Wigderson, "Uniform direct product theorems: Simplified, optimized, and derandomized," SIAM Journal on Computing, vol. 39, no. 4, pp. 1637–1665, 2009/2010.
[212] R. Impagliazzo, V. Kabanets, and A. Wigderson, "In search of an easy witness: Exponential time vs. probabilistic polynomial time," Journal of Computer and System Sciences, vol. 65, no. 4, pp. 672–694, 2002.
[213] R. Impagliazzo and S. Rudich, "Limits on the provable consequences of one-way permutations," in Advances in Cryptology – CRYPTO '88 (Santa Barbara, CA, 1988), vol. 403 of Lecture Notes in Computer Science, pp. 8–26, Berlin: Springer, 1990.
[214] R. Impagliazzo and A. Wigderson, "An information-theoretic variant of the inclusion-exclusion bound (preliminary version)," Unpublished manuscript, 1996.
[215] R. Impagliazzo and A. Wigderson, "P = BPP if E requires exponential circuits: Derandomizing the XOR lemma," in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 220–229, El Paso, Texas, 4–6 May 1997.
[216] R. Impagliazzo and A. Wigderson, "Randomness vs time: Derandomization under a uniform assumption," Journal of Computer and System Sciences, vol. 63, no. 4, pp. 672–688, 2001. (Special issue on FOCS 98 (Palo Alto, CA)).
[217] R. Impagliazzo and D. Zuckerman, "How to recycle random bits," in Annual Symposium on Foundations of Computer Science (Research Triangle Park, North Carolina), pp. 248–253, 1989.
[218] K. Iwama and H. Morizumi, "An explicit lower bound of 5n − o(n) for Boolean circuits," in Mathematical Foundations of Computer Science 2002, vol. 2420 of Lecture Notes in Computer Science, pp. 353–364, Berlin: Springer, 2002.
[219] M. Jerrum and A. Sinclair, "Approximating the permanent," SIAM Journal on Computing, vol. 18, no. 6, pp. 1149–1178, 1989.
[220] M. Jerrum, A. Sinclair, and E. Vigoda, "A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries," Journal of the ACM, vol. 51, no. 4, pp. 671–697, 2004.
[221] S. Jimbo and A. Maruoka, "Expanders obtained from affine transformations," Combinatorica, vol. 7, no. 4, pp. 343–355, 1987.
[222] A. Joffe, "On a sequence of almost deterministic pairwise independent random variables," Proceedings of the American Mathematical Society, vol. 29, pp. 381–382, 1971.
[223] A. Joffe, "On a set of almost deterministic k-independent random variables," Annals of Probability, vol. 2, no. 1, pp. 161–162, 1974.
[224] S. Johnson, "Upper bounds for constant weight error correcting codes," Discrete Mathematics, vol. 3, pp. 109–124, 1972.
[225] S. M. Johnson, "A new upper bound for error-correcting codes," IRE Transactions on Information Theory, vol. IT-8, pp. 203–207, 1962.
[226] V. Kabanets, "Derandomization: A brief overview," in Current Trends in Theoretical Computer Science, vol. 1: Algorithms and Complexity, (G. Paun, G. Rozenberg, and A. Salomaa, eds.), pp. 165–188, World Scientific, 2004.
[227] V. Kabanets and R. Impagliazzo, "Derandomizing polynomial identity tests means proving circuit lower bounds," Computational Complexity, vol. 13, no. 1–2, pp. 1–46, 2004.
[228] N. Kahale, "Eigenvalues and expansion of regular graphs," Journal of the ACM, vol. 42, no. 5, pp. 1091–1106, 1995.
[229] J. D. Kahn, N. Linial, N. Nisan, and M. E. Saks, "On the cover time of random walks on graphs," Journal of Theoretical Probability, vol. 2, no. 1, pp. 121–128, 1989.
[230] A. T. Kalai, Unpublished manuscript, 2004.
[231] J. Kamp, A. Rao, S. Vadhan, and D. Zuckerman, "Deterministic extractors for small-space sources," Journal of Computer and System Sciences, vol. 77, no. 1, pp. 191–220, 2011.
[232] J. Kamp and D. Zuckerman, "Deterministic extractors for bit-fixing sources and exposure-resilient cryptography," SIAM Journal on Computing, vol. 36, no. 5, pp. 1231–1247, 2006/2007.
[233] H. J. Karloff and T. Pitassi, eds., Proceedings of the Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19–22, 2012. ACM, 2012.
[234] R. Karp, N. Pippenger, and M. Sipser, "A time-randomness tradeoff," in AMS Conference on Probabilistic Computational Complexity, Durham, New Hampshire, 1985.
[235] R. M. Karp and R. J. Lipton, "Turing machines that take advice," L'Enseignement Mathématique. Revue Internationale. IIe Série, vol. 28, no. 3–4, pp. 191–209, 1982.
[236] R. M. Karp, M. Luby, and N. Madras, "Monte Carlo approximation algorithms for enumeration problems," Journal of Algorithms, vol. 10, no. 3, pp. 429–448, 1989.
[237] R. M. Karp, E. Upfal, and A. Wigderson, "Constructing a perfect matching is in Random NC," Combinatorica, vol. 6, no. 1, pp. 35–48, 1986.
[238] J. Katz and Y. Lindell, Introduction to Modern Cryptography. Chapman & Hall/CRC Cryptography and Network Security. Boca Raton, FL: Chapman & Hall/CRC, 2008.
[239] J. Katz and L. Trevisan, "On the efficiency of local decoding procedures for error-correcting codes," in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 80–86 (electronic), New York, 2000.
[240] N. M. Katz, "An estimate for character sums," Journal of the American Mathematical Society, vol. 2, no. 2, pp. 197–200, 1989.
[241] N. Kayal and N. Saxena, "Polynomial identity testing for depth 3 circuits," Computational Complexity, vol. 16, no. 2, pp. 115–138, 2007.
[242] J. Kinne, D. van Melkebeek, and R. Shaltiel, "Pseudorandom generators, typically-correct derandomization, and circuit lower bounds," Computational Complexity, vol. 21, no. 1, pp. 3–61, 2012.
[243] A. R. Klivans and R. A. Servedio, "Boosting and hard-core set construction," Machine Learning, vol. 51, no. 3, pp. 217–238, 2003.
[244] A. R. Klivans and D. van Melkebeek, "Graph nonisomorphism has subexponential size proofs unless the polynomial-time hierarchy collapses," SIAM Journal on Computing, vol. 31, no. 5, pp. 1501–1526 (electronic), 2002.
[245] D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison–Wesley, Third Edition, 1998.
[246] R. König and U. M. Maurer, "Extracting randomness from generalized symbol-fixing and Markov sources," in Proceedings of the 2004 IEEE International Symposium on Information Theory, p. 232, 2004.
[247] R. König and U. M. Maurer, "Generalized strong extractors and deterministic privacy amplification," in IMA International Conference, vol. 3796 of Lecture Notes in Computer Science, (N. P. Smart, ed.), pp. 322–339, Springer, 2005.
[248] S. Kopparty, "List-decoding multiplicity codes," Electronic Colloquium on Computational Complexity (ECCC), vol. 19, p. 44, 2012.
[249] S. Kopparty, S. Saraf, and S. Yekhanin, "High-rate codes with sublinear-time decoding," in STOC, (L. Fortnow and S. P. Vadhan, eds.), pp. 167–176, ACM, 2011.
[250] M. Koucký, P. Nimbhorkar, and P. Pudlák, "Pseudorandom generators for group products: extended abstract," in STOC, (L. Fortnow and S. P. Vadhan, eds.), pp. 263–272, ACM, 2011.
[251] C. Koutsougeras and J. S. Vitter, eds., Proceedings of the Annual ACM Symposium on Theory of Computing, May 5–8, 1991, New Orleans, Louisiana, USA, ACM, 1991.
[252] H. Krawczyk, "How to predict congruential generators," Journal of Algorithms, vol. 13, no. 4, pp. 527–545, 1992.
[253] E. Kushilevitz and N. Nisan, Communication Complexity. Cambridge: Cambridge University Press, 1997.
[254] O. Lachish and R. Raz, "Explicit lower bound of 4.5n − o(n) for Boolean circuits," in Annual ACM Symposium on Theory of Computing, pp. 399–408 (electronic), New York, 2001.
[255] H. O. Lancaster, "Pairwise statistical independence," Annals of Mathematical Statistics, vol. 36, pp. 1313–1317, 1965.
[256] C. Lautemann, "BPP and the polynomial hierarchy," Information Processing Letters, vol. 17, no. 4, pp. 215–217, 1983.
[257] C.-J. Lee, C.-J. Lu, and S.-C. Tsai, "Computational randomness from generalized hardcore sets," in Fundamentals of Computation Theory, vol. 6914 of Lecture Notes in Computer Science, pp. 78–89, Heidelberg: Springer, 2011.
[258] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, and Hypercubes. San Mateo, CA: Morgan Kaufmann, 1992.
[259] L. A. Levin, "One way functions and pseudorandom generators," Combinatorica, vol. 7, no. 4, pp. 357–363, 1987.
[260] D. Lewin and S. Vadhan, "Checking polynomial identities over any field: towards a derandomization?," in Annual ACM Symposium on the Theory of Computing (Dallas, TX), pp. 438–447, New York, 1999.
[261] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and its Applications. Texts in Computer Science. New York: Springer, Third Edition, 2008.
[262] X. Li, "A New Approach to Affine Extractors and Dispersers," in IEEE Conference on Computational Complexity, pp. 137–147, 2011.
[263] R. Lidl and H. Niederreiter, Introduction to Finite Fields and their Applications. Cambridge: Cambridge University Press, First Edition, 1994.
[264] N. Linial, M. Luby, M. Saks, and D. Zuckerman, "Efficient construction of a small hitting set for combinatorial rectangles in high dimension," Combinatorica, vol. 17, no. 2, pp. 215–234, 1997.
[265] N. Linial and N. Nisan, "Approximate inclusion-exclusion," Combinatorica, vol. 10, no. 4, pp. 349–365, 1990.
[266] R. J. Lipton, "New directions in testing," in Distributed Computing and Cryptography (Princeton, NJ, 1989), vol. 2 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pp. 191–202, Providence, RI: American Mathematical Society, 1991.
[267] L. Lovász, "On determinants, matchings, and random algorithms," in Fundamentals of Computation Theory (Berlin/Wendisch-Rietz), pp. 565–574, 1979.
[268] L. Lovász, "On the Shannon capacity of a graph," IEEE Transactions on Information Theory, vol. 25, no. 1, pp. 1–7, 1979.
[269] L. Lovász, Combinatorial Problems and Exercises. Providence, RI: AMS Chelsea Publishing, Second Edition, 2007.
[270] S. Lovett, "Unconditional pseudorandom generators for low-degree polynomials," Theory of Computing. An Open Access Journal, vol. 5, pp. 69–82, 2009.
[271] C.-J. Lu, "Improved pseudorandom generators for combinatorial rectangles," Combinatorica, vol. 22, no. 3, pp. 417–433, 2002.
[272] C.-J. Lu, O. Reingold, S. Vadhan, and A. Wigderson, "Extractors: optimal up to constant factors," in Proceedings of the ACM Symposium on Theory of Computing (STOC '03), pp. 602–611, 2003.
[273] C.-J. Lu, S.-C. Tsai, and H.-L. Wu, "Improved hardness amplification in NP," Theoretical Computer Science, vol. 370, no. 1–3, pp. 293–298, 2007.
[274] C.-J. Lu, S.-C. Tsai, and H.-L. Wu, "Complexity of hard-core set proofs," Computational Complexity, vol. 20, no. 1, pp. 145–171, 2011.
[275] A. Lubotzky, Discrete Groups, Expanding Graphs and Invariant Measures, vol. 125 of Progress in Mathematics. Basel: Birkhäuser Verlag, 1994. (With an appendix by Jonathan D. Rogawski).
[276] A. Lubotzky, "Expander graphs in pure and applied mathematics," American Mathematical Society. Bulletin. New Series, vol. 49, no. 1, pp. 113–162, 2012.
[277] A. Lubotzky, R. Phillips, and P. Sarnak, "Ramanujan graphs," Combinatorica, vol. 8, no. 3, pp. 261–277, 1988.
[278] M. Luby, "A simple parallel algorithm for the maximal independent set problem," SIAM Journal on Computing, vol. 15, no. 4, pp. 1036–1053, 1986.
[279] M. Luby, "Removing randomness in parallel computation without a processor penalty," Journal of Computer and System Sciences, vol. 47, no. 2, pp. 250–286, 1993.
[280] M. Luby, B. Veličković, and A. Wigderson, "Deterministic approximate counting of depth-2 circuits," in ISTCS, pp. 18–24, 1993.
[281] M. Luby and A. Wigderson, Pairwise Independence and Derandomization, vol. 1, no. 4 of Foundations and Trends in Theoretical Computer Science. now publishers, 2005.
[282] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-correcting Codes. Amsterdam: North-Holland Publishing Co., 1977. (North-Holland Mathematical Library, Vol. 16).
[283] G. A. Margulis, "Explicit constructions of expanders," Problemy Peredači Informacii, vol. 9, no. 4, pp. 71–80, 1973.
[284] G. A. Margulis, "Explicit group-theoretic constructions of combinatorial schemes and their applications in the construction of expanders and concentrators," Problemy Peredači Informacii, vol. 24, no. 1, pp. 51–60, 1988.
[285] R. Martin and D. Randall, "Disjoint decomposition of Markov chains and sampling circuits in Cayley graphs," Combinatorics, Probability and Computing, vol. 15, no. 3, pp. 411–448, 2006.
[286] M. Mihail, "Conductance and convergence of Markov chains – a combinatorial treatment of expanders," in Annual Symposium on Foundations of Computer Science (Research Triangle Park, North Carolina), pp. 526–531, 1989.
[287] G. L. Miller, "Riemann's hypothesis and tests for primality," Journal of Computer and System Sciences, vol. 13, no. 3, pp. 300–317, December 1976.
[288] P. Miltersen, Handbook of Randomized Computing, chapter "Derandomizing Complexity Classes." Kluwer, 2001.
[289] P. B. Miltersen and N. V. Vinodchandran, "Derandomizing Arthur-Merlin games using hitting sets," Computational Complexity, vol. 14, no. 3, pp. 256–279, 2005.
[290] M. Mitzenmacher and E. Upfal, Probability and Computing. Cambridge: Cambridge University Press, 2005. (Randomized algorithms and probabilistic analysis).
[291] R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge: Cambridge University Press, 1995.
[292] M. Mucha and P. Sankowski, "Maximum matchings via Gaussian elimination," in Symposium on Foundations of Computer Science (Rome, Italy), pp. 248–255, IEEE Computer Society, 2004.
[293] D. E. Muller, "Boolean algebras in electric circuit design," The American Mathematical Monthly, vol. 61, no. 7, part II, pp. 27–28, 1954. (Proceedings of the symposium on special topics in applied mathematics, Northwestern University (1953)).
[294] K. Mulmuley, U. V. Vazirani, and V. V. Vazirani, "Matching is as easy as matrix inversion," Combinatorica, vol. 7, no. 1, pp. 105–113, 1987.
[295] S. Muthukrishnan, Data Streams: Algorithms and Applications, vol. 1, no. 2 of Foundations and Trends in Theoretical Computer Science. now publishers, 2005.
[296] J. Naor and M. Naor, "Small-bias probability spaces: Efficient constructions and applications," SIAM Journal on Computing, vol. 22, pp. 838–856, August 1993.
[297] A. Nilli, "On the second eigenvalue of a graph," Discrete Mathematics, vol. 91, no. 2, pp. 207–210, 1991.
[298] N. Nisan, "Pseudorandom bits for constant depth circuits," Combinatorica, vol. 11, no. 1, pp. 63–70, 1991.
[299] N. Nisan, "Pseudorandom generators for space-bounded computation," Combinatorica, vol. 12, no. 4, pp. 449–461, 1992.
[300] N. Nisan, "RL ⊆ SC," Computational Complexity, vol. 4, no. 1, pp. 1–11, 1994.
[301] N. Nisan and A. Ta-Shma, "Extracting randomness: A survey and new constructions," Journal of Computer and System Sciences, vol. 58, pp. 148–173, February 1999.
[302] N. Nisan and A. Wigderson, "Hardness vs randomness," Journal of Computer and System Sciences, vol. 49, pp. 149–167, October 1994.
[303] N. Nisan and D. Zuckerman, "Randomness is linear in space," Journal of Computer and System Sciences, vol. 52, pp. 43–52, February 1996.
[304] R. O'Donnell, "Hardness amplification within NP," Journal of Computer and System Sciences, vol. 69, no. 1, pp. 68–94, 2004.
[305] R. O'Donnell, Analysis of Boolean Functions, book draft available at analysisofbooleanfunctions.org, 2012.
[306] F. Parvaresh and A. Vardy, "Correcting errors beyond the Guruswami-Sudan radius in polynomial time," in Proceedings of the IEEE Symposium on Foundations of Computer Science, pp. 285–294, 2005.
[307] Y. Peres, "Iterating von Neumann's procedure for extracting random bits," The Annals of Statistics, vol. 20, no. 1, pp. 590–597, 1992.
[308] W. W. Peterson, "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IRE Transactions on Information Theory, vol. IT-6, pp. 459–470, 1960.
[309] M. Pinsker, "On the complexity of a concentrator," in Annual Teletraffic Conference, pp. 318/1–318/4, Stockholm, 1973.
[310] N. Pippenger, "On simultaneous resource bounds (Preliminary Version)," in Annual Symposium on Foundations of Computer Science (San Juan, Puerto Rico), pp. 307–311, 1979.
[311] N. Pippenger and M. J. Fischer, "Relations among complexity measures," Journal of the Association for Computing Machinery, vol. 26, no. 2, pp. 361–381, 1979.
[312] R. L. Plackett and J. E. Burman, "The design of optimum multi-factorial experiments," Biometrika, vol. 33, pp. 305–325, 1945.
[313] V. S. Pless, W. C. Huffman, and R. A. Brualdi, eds., Handbook of Coding Theory. Vol. I, II. Amsterdam: North-Holland, 1998.
[314] M. O. Rabin, "Probabilistic algorithm for testing primality," Journal of Number Theory, vol. 12, no. 1, pp. 128–138, 1980.
[315] J. Radhakrishnan and A. Ta-Shma, "Bounds for dispersers, extractors, and depth-two superconcentrators," SIAM Journal on Discrete Mathematics, vol. 13, no. 1, pp. 2–24 (electronic), 2000.
[316] P. Raghavan, "Probabilistic construction of deterministic algorithms: approximating packing integer programs," Journal of Computer and System Sciences, vol. 37, no. 2, pp. 130–143, 1988. (Annual IEEE Symposium on the Foundations of Computer Science (Toronto, ON, 1986)).
[317] D. Randall, "Mixing," in Symposium on Foundations of Computer Science (Cambridge, MA), pp. 4–15, 2003.
[318] A. Rao, "Extractors for a constant number of polynomially small min-entropy independent sources," SIAM Journal on Computing, vol. 39, no. 1, pp. 168–194, 2009.
[319] R. Raz and O. Reingold, "On recycling the randomness of states in space bounded computation," in Annual ACM Symposium on Theory of Computing (Atlanta, GA, 1999), pp. 159–168 (electronic), New York: ACM, 1999.
[320] R. Raz, O. Reingold, and S. Vadhan, "Extracting all the Randomness and Reducing the Error in Trevisan's Extractors," Journal of Computer and System Sciences, vol. 65, pp. 97–128, August 2002.
[321] A. Razborov, E. Szemerédi, and A. Wigderson, "Constructing small sets that are uniform in arithmetic progressions," Combinatorics, Probability and Computing, vol. 2, no. 4, pp. 513–518, 1993.
[322] A. A. Razborov, "Lower bounds on the dimension of schemes of bounded depth in a complete basis containing the logical addition function," Akademiya Nauk SSSR. Matematicheskie Zametki, vol. 41, no. 4, pp. 598–607, 623, 1987.
[323] A. A. Razborov and S. Rudich, "Natural proofs," Journal of Computer and System Sciences, vol. 55, no. 1, part 1, pp. 24–35, 1997. (26th Annual ACM Symposium on the Theory of Computing (STOC '94) (Montreal, PQ, 1994)).
[324] I. S. Reed, "A class of multiple-error-correcting codes and the decoding scheme," IRE Transactions on Information Theory, vol. PGIT-4, pp. 38–49, 1954.
[325] I. S. Reed and G. Solomon, "Polynomial codes over certain finite fields," Journal of the Society for Industrial and Applied Mathematics, vol. 8, pp. 300–304, 1960.
[326] O. Reingold, "On black-box separations in cryptography," Tutorial at the Third Theory of Cryptography Conference (TCC '06), March 2006. Slides available from http://research.microsoft.com/en-us/people/omreing/.
[327] O. Reingold, "Undirected connectivity in log-space," Journal of the ACM, vol. 55, no. 4, Art. 17, 24 pp., 2008.
[328] O. Reingold, R. Shaltiel, and A. Wigderson, "Extracting randomness via repeated condensing," SIAM Journal on Computing, vol. 35, no. 5, pp. 1185–1209 (electronic), 2006.
[329] O. Reingold, L. Trevisan, M. Tulsiani, and S. Vadhan, "Dense subsets of pseudorandom sets," in Proceedings of the Annual IEEE Symposium on Foundations of Computer Science (FOCS '08), pp. 76–85, 26–28 October 2008.
[330] O. Reingold, L. Trevisan, and S. Vadhan, "Notions of reducibility between cryptographic primitives," in Proceedings of the First Theory of Cryptography Conference (TCC '04), vol. 2951 of Lecture Notes in Computer Science, (M. Naor, ed.), pp. 1–20, Springer-Verlag, 19–21 February 2004.
[331] O. Reingold, L. Trevisan, and S. Vadhan, "Pseudorandom Walks in Regular Digraphs and the RL vs. L Problem," in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC '06), pp. 457–466, 21–23 May 2006. (Preliminary version as ECCC TR05-22, February 2005).
[332] O. Reingold, S. Vadhan, and A. Wigderson, "Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors," in Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS '00), pp. 3–13, Redondo Beach, CA, 17–19 October 2000.
[333] O. Reingold, S. Vadhan, and A. Wigderson, "Entropy waves, the zig-zag graph product, and new constant-degree expanders," Annals of Mathematics, vol. 155, no. 1, January 2001.
[334] O. Reingold, S. Vadhan, and A. Wigderson, "A note on extracting randomness from Santha–Vazirani sources," Unpublished manuscript, September 2004.
[335] A. Rényi, "On measures of entropy and information," in Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, Vol. I, pp. 547–561, Berkeley, California: University of California Press, 1961.
[336] R. L. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Communications of the Association for Computing Machinery, vol. 21, no. 2, pp. 120–126, 1978.
[337] D. Ron, "Property testing," in Handbook of Randomized Computing, Vol. I, II, vol. 9 of Combinatorial Optimization, pp. 597–649, Dordrecht: Kluwer Academic Publishers, 2001.
[338] E. Rozenman and S. Vadhan, "Derandomized squaring of graphs," in Proceedings of the International Workshop on Randomization and Computation (RANDOM '05), vol. 3624 of Lecture Notes in Computer Science, pp. 436–447, Berkeley, CA, August 2005.
[339] R. Rubinfeld, "Sublinear time algorithms," in International Congress of Mathematicians, Vol. III, pp. 1095–1110, Zürich: European Mathematical Society, 2006.
[340] R. Rubinfeld and M. Sudan, "Robust characterizations of polynomials with applications to program testing," SIAM Journal on Computing, vol. 25, no. 2, pp. 252–271, 1996.
[341] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A. Heckert, J. Dray, and S. Vo, "A statistical test suite for random and pseudorandom number generators for cryptographic applications," National Institute of Standards and Technology, U.S. Department of Commerce, Special Publication 800-22, Revision 1a, April 2010.
[342] S. Sahni and T. Gonzalez, "P-complete approximation problems," Journal of the ACM, vol. 23, no. 3, pp. 555–565, 1976.
[343] M. Saks, A. Srinivasan, and S. Zhou, "Explicit OR-dispersers with polylogarithmic degree," Journal of the ACM, vol. 45, no. 1, pp. 123–154, 1998.
[344] M. Saks and S. Zhou, "BPHSPACE(S) ⊆ DSPACE(S^{3/2})," Journal of Computer and System Sciences, vol. 58, no. 2, pp. 376–403, 1999.
[345] M. Saks and D. Zuckerman, Unpublished manuscript, 1995.
[346] M. Sántha and U. V. Vazirani, "Generating quasirandom sequences from semirandom sources," Journal of Computer and System Sciences, vol. 33, no. 1, pp. 75–87, 1986. (Annual Symposium on Foundations of Computer Science (Singer Island, FL, 1984)).
[347] R. Santhanam, "Circuit lower bounds for Merlin-Arthur classes," in STOC'07: Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 275–283, New York, 2007.
[348] P. Sarnak, Some Applications of Modular Forms, vol. 99 of Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press, 1990.
[349] W. J. Savitch, "Relationships between nondeterministic and deterministic tape complexities," Journal of Computer and System Sciences, vol. 4, pp. 177–192, 1970.
[350] J. P. Schmidt, A. Siegel, and A. Srinivasan, "Chernoff-Hoeffding bounds for applications with limited independence," SIAM Journal on Discrete Mathematics, vol. 8, no. 2, pp. 223–250, 1995.
[351] J. T. Schwartz, "Fast probabilistic algorithms for verification of polynomial identities," Journal of the ACM, vol. 27, no. 4, pp. 701–717, 1980.
[352] R. Shaltiel, "Recent Developments in Extractors," in Current Trends in Theoretical Computer Science, vol. 1, (G. Paun, G. Rozenberg, and A. Salomaa, eds.), pp. 189–228, World Scientific, 2004.
[353] R. Shaltiel, "Dispersers for affine sources with sub-polynomial entropy," in FOCS, (R. Ostrovsky, ed.), pp. 247–256, IEEE, 2011.
[354] R. Shaltiel, "An introduction to randomness extractors," in Automata, Languages and Programming, Part II, vol. 6756 of Lecture Notes in Computer Science, pp. 21–41, Heidelberg: Springer, 2011.
[355] R. Shaltiel, "Weak derandomization of weak algorithms: explicit versions of Yao's lemma," Computational Complexity, vol. 20, no. 1, pp. 87–143, 2011.
[356] R. Shaltiel and C. Umans, "Simple extractors for all min-entropies and a new pseudorandom generator," Journal of the ACM, vol. 52, no. 2, pp. 172–216, 2005.
[357] R. Shaltiel and C. Umans, "Pseudorandomness for approximate counting and sampling," Computational Complexity, vol. 15, no. 4, pp. 298–341, 2006.
[358] R. Shaltiel and C. Umans, "Low-end uniform hardness versus randomness tradeoffs for AM," SIAM Journal on Computing, vol. 39, no. 3, pp. 1006–1037, 2009.
[359] A. Shamir, "How to share a secret," Communications of the Association for Computing Machinery, vol. 22, no. 11, pp. 612–613, 1979.
[360] A. Shamir, "On the generation of cryptographically strong pseudorandom sequences," in Automata, Languages and Programming (Akko, 1981), vol. 115 of Lecture Notes in Computer Science, pp. 544–550, Berlin: Springer, 1981.
[361] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[362] A. Shpilka and A. Yehudayoff, "Arithmetic circuits: A survey of recent results and open questions," Foundations and Trends in Theoretical Computer Science, vol. 5, no. 3–4, pp. 207–388 (2010), 2009.
[363] J. Šíma and S. Žák, "Almost k-wise independent sets establish hitting sets for width-3 1-branching programs," in CSR, vol. 6651 of Lecture Notes in Computer Science, (A. S. Kulikov and N. K. Vereshchagin, eds.), pp. 120–133, Springer, 2011.
[364] M. Sipser, "A complexity theoretic approach to randomness," in Annual ACM Symposium on Theory of Computing, pp. 330–335, Boston, Massachusetts, 25–27 April 1983.
[365] M. Sipser, "Expanders, randomness, or time versus space," Journal of Computer and System Sciences, vol. 36, no. 3, pp. 379–383, 1988. (Structure in Complexity Theory Conference (Berkeley, CA, 1986)).
[366] M. Sipser, Introduction to the Theory of Computation. Course Technology, Second Edition, 2005.
[367] M. Sipser and D. A. Spielman, "Expander codes," IEEE Transactions on Information Theory, vol. 42, no. 6, part 1, pp. 1710–1722, 1996. (Codes and complexity).
[368] R. Smolensky, "Algebraic methods in the theory of lower bounds for Boolean circuit complexity," in STOC, (A. V. Aho, ed.), pp. 77–82, ACM, 1987.
[369] R. Solovay and V. Strassen, "A fast Monte-Carlo test for primality," SIAM Journal on Computing, vol. 6, no. 1, pp. 84–85, 1977.
[370] J. Spencer, Ten Lectures on the Probabilistic Method, vol. 64 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM), Second Edition, 1994.
[371] D. A. Spielman, "Spectral graph theory and its applications," in Symposium on Foundations of Computer Science (FOCS 2007), 21–23 October 2007, Providence, RI, USA, Proceedings, pp. 29–38, 2007.
[372] A. Srinivasan and D. Zuckerman, "Computing with very weak random sources," SIAM Journal on Computing, vol. 28, no. 4, pp. 1433–1459 (electronic), 1999.
[373] T. Steinke, "Pseudorandomness for permutation branching programs without the group theory," Technical Report TR12-083, Electronic Colloquium on Computational Complexity (ECCC), July 2012.
[374] J. Stern, "Secret linear congruential generators are not cryptographically secure," in FOCS, pp. 421–426, IEEE Computer Society, 1987.
[375] H. Stichtenoth, Algebraic Function Fields and Codes, vol. 254 of Graduate Texts in Mathematics. Berlin: Springer-Verlag, Second Edition, 2009.
[376] D. R. Stinson, "Combinatorial techniques for universal hashing," Journal of Computer and System Sciences, vol. 48, no. 2, pp. 337–346, 1994.
[377] V. Strassen, "Gaussian elimination is not optimal," Numerische Mathematik, vol. 13, pp. 354–356, 1969.
[378] M. Sudan, "Decoding of Reed Solomon codes beyond the error-correction bound," Journal of Complexity, vol. 13, pp. 180–193, March 1997.
[379] M. Sudan, "Algorithmic introduction to coding theory," Lecture notes, http://people.csail.mit.edu/madhu/FT01/, 2001.
[380] M. Sudan, "Essential coding theory (lecture notes)," http://people.csail.mit.edu/madhu/FT04/, 2004.
[381] M. Sudan, L. Trevisan, and S. Vadhan, "Pseudorandom generators without the XOR lemma," Journal of Computer and System Sciences, vol. 62, pp. 236–266, 2001.
[382] A. Ta-Shma, C. Umans, and D. Zuckerman, "Lossless condensers, unbalanced expanders, and extractors," Combinatorica, vol. 27, no. 2, pp. 213–240, 2007.
[383] A. Ta-Shma and D. Zuckerman, "Extractor codes," IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3015–3025, 2004.
[384] A. Ta-Shma, D. Zuckerman, and S. Safra, "Extractors from Reed-Muller codes," Journal of Computer and System Sciences, vol. 72, no. 5, pp. 786–812, 2006.
[385] M. R. Tanner, "Explicit concentrators from generalized N-gons," SIAM Journal on Algebraic Discrete Methods, vol. 5, no. 3, pp. 287–293, 1984.
[386] T. Tao, "Expansion in finite groups of Lie type," Lecture notes, http://www.math.ucla.edu/~tao/254b.1.12w/, 2012.
[387] A. Terras, Fourier Analysis on Finite Groups and Applications, vol. 43 of London Mathematical Society Student Texts. Cambridge: Cambridge University Press, 1999.
[388] S. Tessaro, "Computational indistinguishability amplification," PhD thesis, ETH Zurich, http://e-collection.library.ethz.ch/eserv/eth:1817/eth-1817-02.pdf, 2010.
[389] L. Trevisan, "Extractors and pseudorandom generators," Journal of the ACM, vol. 48, no. 4, pp. 860–879 (electronic), 2001.
[390] L. Trevisan, "List decoding using the XOR lemma," in Proceedings of the IEEE Symposium on Foundations of Computer Science, pp. 126–135, Cambridge, MA, October 2003.
[391] L. Trevisan, "Some applications of coding theory in computational complexity," Quaderni di Matematica, vol. 13, pp. 347–424, 2004.
[392] L. Trevisan, "On uniform amplification of hardness in NP," in STOC'05: Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 31–38, New York, 2005.
[393] L. Trevisan, "Pseudorandomness and combinatorial constructions," in International Congress of Mathematicians, Vol. III, pp. 1111–1136, Zürich: European Mathematical Society, 2006.
[394] L. Trevisan, "Guest column: Additive combinatorics and theoretical computer science," SIGACT News, vol. 40, no. 2, pp. 50–66, 2009.
[395] L. Trevisan, "Dense model theorems and their applications," in Theory of Cryptography, vol. 6597 of Lecture Notes in Computer Science, pp. 55–57, Heidelberg: Springer, 2011.
[396] L. Trevisan, M. Tulsiani, and S. Vadhan, "Regularity, boosting, and efficiently simulating every high-entropy distribution," in Proceedings of the Annual IEEE Conference on Computational Complexity (CCC '09), pp. 126–136, 15–18 July 2009. (Preliminary version posted as ECCC TR08-103).
[397] L. Trevisan and S. Vadhan, "Pseudorandomness and average-case complexity via uniform reductions," Computational Complexity, vol. 16, pp. 331–364, December 2007.
[398] L. Trevisan and S. Vadhan, "Extracting randomness from samplable distributions," in Proceedings of the Annual Symposium on Foundations of Computer Science (FOCS '00), pp. 32–42, Redondo Beach, CA, 17–19 October 2000.
[399] C. Umans, "Pseudo-random generators for all hardnesses," Journal of Computer and System Sciences, vol. 67, no. 2, pp. 419–440, 2003.
[400] S. Vadhan, "Probabilistic proof systems, Part I: interactive & zero-knowledge proofs," in Computational Complexity Theory, vol. 10 of IAS/Park City Mathematics Series, (S. Rudich and A. Wigderson, eds.), pp. 315–348, American Mathematical Society, 2004.
[401] S. Vadhan and C. J. Zheng, "Characterizing pseudoentropy and simplifying pseudorandom generator constructions," in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC '12), pp. 817–836, 19–22 May 2012.
[402] S. P. Vadhan, "Constructing locally computable extractors and cryptosystems in the bounded-storage model," Journal of Cryptology, vol. 17, no. 1, pp. 43–77, January 2004.
[403] L. G. Valiant, "Graph-theoretic properties in computational complexity," Journal of Computer and System Sciences, vol. 13, no. 3, pp. 278–285, 1976.
[404] L. G. Valiant, "The complexity of computing the permanent," Theoretical Computer Science, vol. 8, no. 2, pp. 189–201, 1979.
[405] L. G. Valiant, "A theory of the learnable," Communications of the ACM, vol. 27, no. 11, pp. 1134–1142, 1984.
[406] R. Varshamov, "Estimate of the number of signals in error correcting codes," Doklady Akademii Nauk SSSR, vol. 117, pp. 739–741, 1957.
[407] U. V. Vazirani, "Towards a strong communication complexity theory or generating quasi-random sequences from two communicating slightly-random sources (extended abstract)," in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 366–378, Providence, Rhode Island, 6–8 May 1985.
[408] U. V. Vazirani, "Efficiency considerations in using semi-random sources (extended abstract)," in STOC, (A. V. Aho, ed.), pp. 160–168, ACM, 1987.
[409] U. V. Vazirani, "Strong communication complexity or generating quasirandom sequences from two communicating semirandom sources," Combinatorica, vol. 7, no. 4, pp. 375–392, 1987.
[410] U. V. Vazirani and V. V. Vazirani, "Random polynomial time is equal to slightly-random polynomial time," in Annual Symposium on Foundations of Computer Science, pp. 417–428, Portland, Oregon, 21–23 October 1985.
[411] E. Viola, "The complexity of constructing pseudorandom generators from hard functions," Computational Complexity, vol. 13, no. 3–4, pp. 147–188, 2004.
[412] E. Viola, "Pseudorandom bits for constant-depth circuits with few arbitrary symmetric gates," SIAM Journal on Computing, vol. 36, no. 5, pp. 1387–1403 (electronic), 2006/2007.
[413] E. Viola, "The sum of d small-bias generators fools polynomials of degree d," Computational Complexity, vol. 18, no. 2, pp. 209–217, 2009.
[414] E. Viola, "Extractors for circuit sources," in FOCS, (R. Ostrovsky, ed.), pp. 220–229, IEEE, 2011.
[415] E. Viola, "The complexity of distributions," SIAM Journal on Computing, vol. 41, no. 1, pp. 191–218, 2012.
[416] J. von Neumann, "Various techniques used in conjunction with random digits," in Collected Works. Vol. V: Design of Computers, Theory of Automata and Numerical Analysis, New York: The Macmillan Co., 1963.
[417] M. N. Wegman and J. L. Carter, "New hash functions and their use in authentication and set equality," Journal of Computer and System Sciences, vol. 22, no. 3, pp. 265–279, 1981.
[418] R. Williams, "Improving exhaustive search implies superpolynomial lower bounds," in STOC'10: Proceedings of the 2010 ACM International Symposium on Theory of Computing, pp. 231–240, New York: ACM, 2010.
[419] R. Williams, "Non-uniform ACC circuit lower bounds," in IEEE Conference on Computational Complexity, pp. 115–125, 2011.
[420] J. Wozencraft, "List decoding," Quarterly Progress Report, Research Laboratory of Electronics, MIT, vol. 48, pp. 90–95, 1958.
[421] A. C. Yao, "Theory and applications of trapdoor functions (extended abstract)," in Annual Symposium on Foundations of Computer Science, pp. 80–91, Chicago, Illinois, 3–5 November 1982.
[422] A. Yehudayoff, "Affine extractors over prime fields," Combinatorica, vol. 31, no. 2, pp. 245–256, 2011.
[423] S. Yekhanin, "Towards 3-query locally decodable codes of subexponential length," Journal of the ACM, vol. 55, no. 1, Art. 1, 16 pp., 2008.
[424] S. Yekhanin, Locally Decodable Codes. Now Publishers, 2012. (To appear).
[425] R. Zippel, "Probabilistic algorithms for sparse polynomials," in EUROSAM, vol. 72 of Lecture Notes in Computer Science, (E. W. Ng, ed.), pp. 216–226, Springer, 1979.
[426] D. Zuckerman, "Simulating BPP using a general weak random source," Algorithmica, vol. 16, pp. 367–391, October/November 1996.
[427] D. Zuckerman, "Randomness-optimal oblivious sampling," Random Structures & Algorithms, vol. 11, no. 4, pp. 345–367, 1997.
[428] V. V. Zyablov and M. S. Pinsker, "List cascade decoding" (in Russian), Problems of Information Transmission, vol. 17, no. 4, pp. 29–33, 1981.