0400000010
0400000010
Pseudorandomness
By Salil P. Vadhan
Contents
1 Introduction
1.1
1.2
1.3
1.4
2
7
8
8
10
2.1
2.2
2.3
2.4
2.5
2.6
10
17
23
32
40
46
50
3.1
3.2
3.3
3.4
3.5
3.6
3.7
51
52
56
58
62
73
77
Enumeration
Nonconstructive/Nonuniform Derandomization
Nondeterminism
The Method of Conditional Expectations
Pairwise Independence
Exercises
Chapter Notes and References
4 Expander Graphs
4.1
4.2
4.3
4.4
4.5
4.6
Measures of Expansion
Random Walks on Expanders
Explicit Constructions
Undirected S-T Connectivity in Deterministic Logspace
Exercises
Chapter Notes and References
80
80
92
102
116
121
127
5 List-Decodable Codes
131
5.1
5.2
5.3
5.4
5.5
5.6
131
141
151
155
159
163
6 Randomness Extractors
166
6.1
6.2
6.3
6.4
6.5
167
178
188
202
209
7 Pseudorandom Generators
212
7.1
7.2
7.3
7.4
7.5
212
219
224
230
7.6
7.7
239
252
261
7.8
7.9
Exercises
Chapter Notes and References
270
278
8 Conclusions
284
8.1
8.2
284
290
Acknowledgments
309
References
311
R
Foundations and Trends
in
Theoretical Computer Science
Vol. 7, Nos. 13 (2011) 1336
c 2012 S. P. Vadhan
DOI: 10.1561/0400000010
Pseudorandomness
Salil P. Vadhan
School of Engineering and Applied Sciences, Harvard University, Cambridge,
MA, 02138, USA, [email protected]
Abstract
This is a survey of pseudorandomness, the theory of eciently generating objects that look random despite being constructed using little
or no randomness. This theory has signicance for a number of areas
in computer science and mathematics, including computational complexity, algorithms, cryptography, combinatorics, communications, and
additive number theory. Our treatment places particular emphasis on
the intimate connections that have been discovered between a variety
of fundamental pseudorandom objects that at rst seem very dierent in nature: expander graphs, randomness extractors, list-decodable
error-correcting codes, samplers, and pseudorandom generators. The
structure of the presentation is meant to be suitable for teaching in a
graduate-level course, with exercises accompanying each section.
1
Introduction
1.1
Over the past few decades, randomization has become one of the most
pervasive paradigms in computer science. Its widespread uses include:
Algorithm Design: For a number of important algorithmic problems, the most ecient algorithms known are randomized. For example:
Primality Testing: This was shown to have a randomized
polynomial-time algorithm in 1977. It wasnt until 2002 that
a deterministic polynomial-time algorithm was discovered.
(We will see this algorithm, but not its proof.)
Approximate Counting: Many approximate counting problems (e.g., counting perfect matchings in a bipartite graph)
have randomized polynomial-time algorithms, but the fastest
known deterministic algorithms take exponential time.
Undirected S-T Connectivity: This was shown to have
a randomized logspace algorithm in 1979. It wasnt until
2005 that a deterministic logspace algorithm was discovered
using tools from the theory of pseudorandomness, as we
will see.
2
Introduction
Introduction
a constant, independent of the number of vertices), but also wellconnected in some precise sense. For example, the graph cannot be
bisected without cutting a large (say, constant) fraction of the edges.
Expander graphs have numerous applications in theoretical computer science. They were originally studied for their use in designing
fault-tolerant networks (e.g., for telephone lines), ones that maintain
good connectivity even when links or nodes fail. But they also have less
obvious applications, such as an O(log n)-time algorithm for sorting in
parallel.
It is not obvious that expander graphs exist, but in fact it can be
shown, via the Probabilistic Method, that a random graph of degree 3 is
a good expander with high probability. However, many applications
of expander graphs need explicit constructions, and these have taken
longer to nd. We will see some explicit constructions in this survey, but
even the state-of-the-art does not always match the bounds given by
the probabilistic method (in terms of the degree/expansion tradeo).
Error-Correcting Codes: Error-correcting codes are tools for communicating over noisy channels. Specically, they specify a way to
encode messages into longer, redundant codewords so that even if the
codeword gets somewhat corrupted along the way, it is still possible
for the receiver to decode the original message. In his landmark paper
that introduced the eld of coding theory, Shannon also proved the
existence of good error-correcting codes via the Probabilistic Method.
That is, a random mapping of n-bit messages to O(n)-bit codewords
is a good error-correcting code with high probability. Unfortunately,
these probabilistic codes are not feasible to actually use a random
mapping requires an exponentially long description, and we know of no
way to decode such a mapping eciently. Again, explicit constructions
are needed.
In this survey, we will focus on the problem of list decoding. Specifically, we will consider scenarios where the number of corruptions is
so large that unique decoding is impossible; at best one can produce a
short list that is guaranteed to contain the correct message.
A Unied Theory: Each of the above objects has been the center of
a large and beautiful body of research, but until recently these corpora
were largely distinct. An exciting development over the past decade has
been the realization that all four of these objects are almost the same
when interpreted appropriately. Their intimate connections will be a
major focus of this survey, tying together the variety of constructions
and applications that we will see.
The surprise and beauty of these connections has to do with
the seemingly dierent nature of each of these objects. Pseudorandom generators, by asserting what ecient algorithms cannot do, are
objects of complexity theory. Extractors, with their focus on extracting the entropy in a correlated and biased sequence, are informationtheoretic objects. Expander graphs are of course combinatorial objects
(as dened above), though they can also be interpreted algebraically,
as we will see. Error-correcting codes involve a mix of combinatorics,
information theory, and algebra. Because of the connections, we obtain
new perspectives on each of the objects, and make substantial advances
on our understanding of each by translating intuitions and techniques
from the study of the others.
1.2
Introduction
1.3
Notational Conventions
We denote the set of numbers {1, . . . , n} by [n]. We write N for the set
of nonnegative integers (so we consider 0 to be a natural number). We
write S T to mean that S is a subset of T (not necessarily strict),
and S T for S being a strict subset of T . An inequality we use often
is nk (ne/k)k . All logarithms are base 2 unless otherwise specied.
We often use the convention that lowercase letters are the logarithm
(base 2) of the corresponding capital letter (e.g., N = 2n ).
Throughout, we consider random variables that can take values in
arbitrary discrete sets (as well as real-valued random variables). We
generally use capital letters, e.g., X, to denote random variables and
R
lowercase letters, e.g., x, to denote specic values. We write x X
to indicate that x is sampled according to X. For a set S, we write
R
x S to mean that x is selected uniformly at random from S. We
use the convention that multiple occurrences of a random variable in
an expression refer to the same instantiation, e.g., Pr[X = X] = 1. The
def
1.4
mentioned are due to [6, 13, 220, 236, 237, 267, 287, 314, 327, 369];
for more details see Section 2.6.
Recommended textbooks on cryptography are [157, 158, 238]. The
idea that encryption should be randomized is due to Goldwasser and
Micali [176].
The Probabilistic Method for combinatorial constructions is the
subject of the book [25]. Erd
os used this method to prove the existence
of Ramsey graphs in [132]. Major recent progress on explicit constructions of Ramsey graphs was recently obtained by Barak, Rao, Shaltiel,
and Widgerson [48] via the theory of randomness extractors.
The modern notion of a pseudorandom generator was formulated in
the works of Blum and Micali [72] and Yao [421], motivated by cryptographic applications. We will spend most of our time on a variant of
the BlumMicaliYao notion, proposed by Nisan and Wigderson [302],
where the generator is allowed more running time than the algorithms
it fools. A detailed treatment of the BlumMicaliYao notion can be
found in [157].
Surveys on randomness extractors are [301, 352, 354]. The notion
of extractor that we will focus on is the one due to Nisan and
Zuckerman [303].
A detailed survey of expander graphs is [207]. The probabilistic
construction of expander graphs is due to Pinsker [309]. The application of expanders to sorting in parallel is due to Ajtai, Koml
os, and
Szemeredi [10].
A classic text on coding theory is [282]. For a modern, computer
science-oriented treatment, we recommend Sudans lecture notes [380].
Shannon started this eld with the paper [361]. The notion of list decoding was proposed by Elias [129] and Wozencraft [420], and was reinvigorated in the work of Sudan [378]. Recent progress on list decoding is
covered in [184].
2
The Power of Randomness
2.1
10
11
12
consider
f (x) =
(x ).
expanded and terms are collected, this polynomial p can be shown to simply equal
x|F| x.
13
14
It is clear that if f = 0, the algorithm will always accept. The correctness in case f = 0 is based on the following simple but very useful
lemma.
Lemma 2.4 (SchwartzZippel Lemma). If f is a nonzero polynomial of degree d over a eld (or integral domain) F and S F, then
Pr
1 ,...,n S
[f (1 , . . . , n ) = 0]
d
.
|S|
15
16
where Sn denotes the set of permutations on [n]. Note that the th term
is nonzero if and only if the permutation denes a perfect matching.
That is, (i, (i)) E for all 1 i n. So det(A(x)) = 0 i G has no
perfect matching. Moreover its degree is bounded by n, and given values
i,j for the xi,j s we can evaluate det(A()) eciently in parallel in polylogarithmic time using an appropriate algorithm for the determinant.
So to test for a perfect matching eciently in parallel, just
run the Polynomial Identity Testing algorithm with, say, S =
{1, . . . , 2n} Z, to test whether det(A(x)) = 0.
Some remarks:
The above also provides the most ecient sequential
algorithm for Perfect Matching, using the fact that
Determinant has the same time complexity as Matrix
Multiplication, which is known to be at most O(n2.38 ).
More sophisticated versions of the algorithm apply to nonbipartite graphs, and enable nding perfect matchings in the
2 Recall
that [n] denotes the set {1, . . . , n}. See Section 1.3.
17
2.2
2.2.1
18
Complexity Classes
We will now dene complexity classes that capture the power of ecient
randomized algorithms. As is common in complexity theory, these
classes are dened in terms of decision problems, where the set of inputs
where the answer should be yes is specied by a language L {0, 1} .
However, the denitions generalize in natural ways to other types of
computational problems, such as computing functions or solving search
problems.
Recall that we say a deterministic algorithm A runs in time
t : N N if A takes at most t(|x|) steps on every input x, and it runs in
polynomial time if it runs time t(n) = O(nc ) for a constant c. Polynomial time is a theoretical approximation to feasible computation, with
the advantage that it is robust to reasonable changes in the model of
computation and representation of the inputs.
Denition 2.8. P is the class of languages L for which there exists a
deterministic polynomial-time algorithm A such that
x L A(x) accepts.
x
/ L A(x) rejects.
For a randomized algorithm A, we say that A runs in time t : N N
if A takes at most t(|x|) steps on every input x and every sequence of
random bits.
Denition 2.9. RP is the class of languages L for which there exists
a probabilistic polynomial-time algorithm A such that
x L Pr[A(x) accepts] 1/2.
x
L Pr[A(x) accepts] = 0.
Here (and in the denitions below) the probabilities are taken over the
coin tosses of the algorithm A.
19
That is, RP algorithms may have false negatives; the algorithm may
sometimes say no even if the answer is yes, albeit with bounded
probability. But the denition does not allow for false positives. Thus
RP captures ecient randomized computation with one-sided error.
RP stands for randomized polynomial time. Note that the error
probability of an RP algorithm can be reduced to 2p(n) for any polynomial p by running the algorithm p(n) times independently and accepting the input i at least one of the trials accepts. By the same reasoning,
the 1/2 in the denition is arbitrary, and any constant (0, 1) or even
= 1/poly(n) would yield the same class of languages.
A central question in this survey is whether randomization enables
us to solve more problems (e.g., decide more languages) in polynomial
time:
Open Problem 2.10. Does P = RP?
Similarly, we can consider algorithms that may have false positives
but no false negatives.
Denition 2.11. co-RP is the class of languages L whose complement
is in RP. Equivalently, L co-RP if there exists a probabilistic
L
polynomial-time algorithm A such that
x L Pr[A(x) accepts] = 1.
x
L Pr[A(x) accepts] 1/2.
So, in co-RP we may err on no instances, whereas in RP we may
err on yes instances.
Using the Polynomial Identity Testing algorithm we saw earlier, we can deduce that Polynomial Identity Testing for arithmetic formulas is in co-RP. In Problem 2.4, this is generalized to arithmetic circuits, and thus we have:
Theorem 2.12. Arithmetic Circuit Identity Testing over Z,
dened as the language
ACITZ = {C : C(x1 , . . . , xn ) an arithmetic circuit over Z s.t. C = 0},
is in co-RP.
20
21
22
2.2.3
E[X]
.
23
2.3
2.3.1
24
25
x
xQ
x
/Q
Then A gives the same answer on both f0 and f1 (since all the
oracle queries return 0 in both cases), but (f0 ) = 0 and (f1 ) > 1/2,
so the answer must have error greater than 1/4 for at least one of the
functions.
Thus, randomization provides an exponential savings for approximating the average of a function on a large domain. However, this does
not show that BPP = P. There are two reasons for this:
(1) [+]-Approx Oracle Average is not a decision problem,
and indeed it is not clear how to dene languages that capture the complexity of approximation problems. However,
below we will see how a slightly more general notion of decision problem does allow us to capture approximation problems such as this one.
(2) More fundamentally, it does not involve the standard model
of input as used in the denitions of P and BPP. Rather
than the input being a string that is explicitly given to the
algorithm (where we measure complexity in terms of the
length of the string), the input is an exponential-sized oracle
to which the algorithm is given random access. Even though
this is not the classical notion of input, it is an interesting one
that has received a lot of attention in recent years, because
it allows for algorithms whose running time is sublinear (or
even polylogarithmic) in the actual size of the input (e.g., 2m
in the example here). As in the example here, typically such
algorithms require randomization and provide approximate
answers.
2.3.2
Promise Problems
Now we will try to nd a variant of the [+]-Approx Oracle Average problem that is closer to the P versus BPP question. First, to
26
27
28
Note that [+]-Approx Circuit Average can be viewed as the problem of approximately counting the number of satisfying assignments of
a circuit C : {0, 1}m {0, 1} to within additive error 2m , and a solution to this problem may give useless answers for circuits that dont
have very many satisfying assignments (e.g., circuits with fewer than
2m/2 satisfying assignments). Thus people typically study approximate
counting to within relative error. For example, given a circuit C, output a number that is within a (1 + ) factor of its number of satisfying assignments, #C. Or the following essentially equivalent decision
problem:
Computational Problem 2.32. [(1 + )]-Approx #CSAT is the
following promise problem:
CSATY = {(C, N ) : #C > (1 + ) N }
CSATN = {(C, N ) : #C N }
Here can be a constant or a function of the input length n = |(C, N )|.
29
Unfortunately, this problem is NP-hard for general circuits (consider the special case that N = 0), so we do not expect a prBPP
algorithm. However, there is a very pretty randomized algorithm if we
restrict to DNF formulas.
Computational Problem 2.33. [(1 + )]-Approx #DNF is the
restriction of [(1 + )]-Approx #CSAT to C that are formulas in
disjunctive normal form (DNF) (i.e., an OR of clauses, where each
clause is an AND of variables or their negations).
Theorem 2.34. For every function (n) 1/poly(n), [(1 + )]Approx #DNF is in prBPP.
Proof. It suces to give a probabilistic polynomial-time algorithm that
estimates the number of satisfying assignments to within a 1 factor.
Let (x1 , . . . , xm ) be the input DNF formula.
A rst approach would be to apply random sampling as we have
used above: Pick t random assignments uniformly from {0, 1}m and
evaluate on each. If k of the assignments satisfy , output (k/t) 2m .
However, if # is small (e.g., 2m/2 ), random sampling will be unlikely
to hit any satisfying assignments, and our estimate will be 0.
The idea to get around this diculty is to embed the set of satisfying assignments, A, in a smaller set B so that sampling can be
useful. Specically, we will dene sets A B satisfying the following
properties:
(1)
(2)
(3)
(4)
(5)
|A | = |A|.
|A | |B|/poly(n), where n = |(, N )|.
We can decide membership in A in polynomial time.
|B| computable in polynomial time.
We can sample uniformly at random from B in polynomial
time.
30
31
MaxCut
32
To analyze this algorithm, consider any edge e = (u, v). Then the
probability that e crosses the cut is 1/2. By linearity of expectations, we
have:
Pr[e is cut] = |E|/2.
E[|cut(S)|] =
eE
This also serves as a proof, via the probabilistic method, that every
graph (without self-loops) has a cut of size at least |E|/2.
In Section 3, we will see how to derandomize this algorithm. We
note that there is a much more sophisticated randomized algorithm
that nds a cut whose expected size is within a factor of 0.878 of the
largest cut in the graph (and this algorithm can also be derandomized).
2.4
2.4.1
33
34
Notice that this algorithm only requires space O(log n), to maintain
the current vertex v as well as a counter for the number of steps taken.
Clearly, it never accepts when there isnt a path from s to t. In the next
section, we will prove that in any connected undirected graph, a random
walk of length poly(n) from one vertex will hit any other vertex with
high probability. Applying this to the connected component containing
s, it follows that the algorithm accepts with high probability when s
and t are connected.
This algorithm, dating from the 1970s, was derandomized only
in 2005. We will cover the derandomized algorithm in Section 4.4.
35
For generality that will be useful later, many of the denitions in this
section will be given for directed multigraphs (which we will refer to as
digraphs for short). By multigraph, we mean that we allow G to have
parallel edges and self-loops. Henceforth, we will refer to graphs without parallel edges and self-loops as simple graphs. We call a digraph
d-regular if every vertex has indegree d and outdegree d. A self-loop is
considered a single directed edge (i, i) from a vertex i to itself, so contributes 1 to both the indegree and outdegree of vertex i. An undirected
graph is a graph where the number of edges from i to j equals the number of edges from j to i for every i and j. (When i = j, we think of a pair
of edges (i, j) and (j, i) as comprising a single undirected edge {i, j},
and a self-loop (i, i) also corresponds to the single undirected edge {i}.)
To analyze the random-walk algorithm of the previous section, it
suces to prove a bound on the hitting time of random walks.
Denition 2.48. For a digraph G = (V, E), we dene its hitting
time as
hit(G) = max min{t : Pr[a random walk of
i,jV
36
37
but it has the advantage that we will be able to show that the distance
decreases noticeably in every step. This is captured by the following
quantity.
Denition 2.50. For a regular digraph G with random-walk matrix
M , we dene
def
(G) = max
M u
xM
,
= max
xu x
u
Proof. The rst inequality follows from the denition of (G) and
induction. For the second, we have:
u2 = 2 + u2 2, u
i2 + 1/N 2/N
i
=
i
i + 1/N 2/N
= 1 1/N 1.
i
38
For
every
regular
digraph
G,
hit(G) =
Eigenvalues
39
40
Random walks are a powerful tool in the design of randomized algorithms. In particular, they are the heart of the Markov Chain Monte
Carlo method, which is widely used in statistical physics and for solving approximate counting problems. In these applications, the goal is
to generate a random sample from an exponentially large space, such
as an (almost) uniformly random perfect matching for a given bipartite graph G. (It turns out that this is equivalent to approximately
counting the number of perfect matchings in G.) The approach is to
dened on
do a random walk on an appropriate (regular) graph G
the space (e.g., by doing random local changes on the current per is typically of size exponential in the
fect matching). Even though G
input size n = |G|, in many cases it can be proven to have mixing time
a property referred to as rapid mixing. These
poly(n) = polylog(|G|),
Markov Chain Monte Carlo methods provide some of the best examples
of problems where randomization yields algorithms that are exponentially faster than all known deterministic algorithms.
2.5
Exercises
1 ,...,n S
[f (1 , . . . , n ) = 0]
d
.
|S|
2.5 Exercises
41
You may use the fact that every nonzero univariate polynomial of
degree d over F has at most d roots.
Problem 2.5(Polynomial Identity Testing via Modular Reduction). In this problem, you will analyze an alternative to the algorithm
seen in class, which directly handles polynomials of degree larger than
the eld size. It is based on the same idea as Problem 2.4, using the
fact that polynomials over a eld have many of the same algebraic
properties as the integers.
42
2.5 Exercises
43
(1) Show that for every r [0, 1/2], E[erX ] er E[X]+r t . (Hint:
1 + x ex 1 + x + x2 for all x [0, 1/2].)
(2) Deduce the Cherno Bound of Theorem 2.21: Pr [X]
2
2
E[X]+ t e t/4 and Pr [X E[X] t] e t/4 .
(3) Where did you use the independence of the Xi s?
Problem 2.9 (Spectral Graph Theory). Let M be the randomwalk matrix for a d-regular undirected graph G = (V, E) on n vertices.
We allow G to have self-loops and multiple edges. Recall that the uniform distribution is an eigenvector of M of eigenvalue 1 = 1. Prove the
following statements. (Hint: for intuition, it may help to think about
what the statements mean for the behavior of the random walk on G.)
44
terminology comes from the fact that these are precisely the digraphs that have
Eulerian circuits, closed paths that visit all vertices use every directed edge exactly once.
2.5 Exercises
45
46
2.6
47
48
49
One signicant omission from this section is the usefulness of randomness for verifying proofs. Recall that NP is the class of languages
having membership proofs that can be veried in P. Thus it is natural
to consider proof verication that is probabilistic, leading to the class
MA, as well as a larger class AM, where the proof itself can depend on
the randomness chosen by the verier [44]. (These are both subclasses of
the class IP of languages having interactive proof systems [177].) There
are languages, such as Graph Nonisomorphism, that are in AM but
are not known to be in NP [170]. Derandomizing these proof systems (e.g., proving AM = NP) would show that Graph Nonisomorphism is in NP, i.e., that there are short proofs that two graphs are
nonisomorphic. Similarly to sampling and sublinear-time algorithms,
randomized proof verication can also enable one to read only a small
portion of an appropriately encoded NP proof, leading to the celebrated PCP Theorem and its applications to hardness of approximation [33, 34, 138]. For more about interactive proofs and PCPs, see
[32, 160, 400].
3
Basic Derandomization Techniques
This was of course only a small sample; there are entire texts on
randomized algorithms. (See the notes and references for Section 2.)
In the rest of this survey, we will turn toward derandomization
trying to remove the randomness from these algorithms. We will achieve
this for some of the specic algorithms we studied, and also consider the
larger question of whether all ecient randomized algorithms can be
derandomized. For example, does BPP = P? RL = L? RNC = NC?
In this section, we will introduce a variety of basic derandomization techniques. These will each be decient in that they are either
infeasible (e.g., cannot be carried out in polynomial time) or specialized (e.g., apply only in very specic circumstances). But it will be
50
3.1 Enumeration
51
useful to have these as tools before we proceed to study more sophisticated tools for derandomization (namely, the pseudorandom objects
of Sections 47).
3.1
Enumeration
We are interested in quantifying how much savings randomization provides. One way of doing this is to nd the smallest possible upper bound
on the deterministic time complexity of languages in BPP. For example, we would like to know which of the following complexity classes
contain BPP:
Denition 3.1 (Deterministic Time Classes). 1
DTIME(t(n)) = {L : L can be decided deterministically
in time O(t(n))}
P = c DTIME(nc )
= c DTIME(2(log n)c )
P
(polynomial time)
(quasipolynomial time)
SUBEXP =
DTIME(2n )
(subexponential time)
EXP = c
c
DTIME(2n )
(exponential time)
52
1
2m(n)
A(x; r)
r{0,1}m(n)
We can compute the right-hand side of the above expression in deterministic time 2m(n) t(n).
We see that the enumeration method is general in that it applies
to all BPP algorithms, but it is infeasible (taking exponential time).
However, if the algorithm uses only a small number of random bits, it
becomes feasible:
Proposition 3.3. If L has a probabilistic polynomial-time algorithm that runs in time t(n) and uses m(n) random bits, then L
DTIME(t(n) 2m(n) ). In particular, if t(n) = poly(n) and m(n) =
O(log n), then L P.
Thus an approach to proving BPP = P is to show that the number
of random bits used by any BPP algorithm can be reduced to O(log n).
We will explore this approach in Section 7. However, to date, Proposition 3.2 remains the best unconditional upper-bound we have on the
deterministic time-complexity of BPP.
3.2
Nonconstructive/Nonuniform Derandomization
53
Pr[A(x; Rn ) incorrect on x]
x
< 2n 2n = 1
Thus, there exists a xed value Rn = rn that yields a correct answer
for all x {0, 1}n .
The advantage of this method over enumeration is that once we
have the xed string rn , computing A(x; rn ) can be done in polynomial
time. However, the proof that rn exists is nonconstructive; it is not
clear how to nd it in less than exponential time.
Note that we know that we can reduce the error probability of
any BPP (or RP, RL, RNC, etc.) algorithm to smaller than 2n by
repetitions, so this proposition is always applicable. However, we begin
by looking at some interesting special cases.
Example 3.6 (Perfect Matching). We apply the proposition to
Algorithm 2.7. Let G = (V, E) be a bipartite graph with m vertices
on each side, and let AG (x1,1 , . . . , xm,m ) be the matrix that has entries
G
AG
i,j = xi,j if (i, j) E, and Ai,j = 0 if (i, j) E. Recall that the polynomial det(AG (x)) is nonzero if and only if G has a perfect matching.
2
Let Sm = {0, 1, 2, . . . , m2m }. We argued that, by the SchwartzZippel
R
m2 at random and evaluate det(AG ()),
Lemma, if we choose Sm
we can determine whether det(AG (x)) is zero with error probability at
2
most m/|S| which is smaller than 2m . Since a bipartite graph with m
54
Example 3.8 (Universal Traversal Sequences). Let G be a connected d-regular undirected multigraph on n vertices. From Theorem 2.49, we know that a random walk of poly(n, d) steps from any
start vertex will visit any other vertex with high probability. By increasing the length of the walk by a polynomial factor, we can ensure
that every vertex is visited with probability greater than 1 2nd log n .
By the same reasoning as in the previous example, we conclude that
for every pair (n, d), there exists a universal traversal sequence w
{1, 2, . . . , d}poly(n,d) such that for every n-vertex, d-regular, connected
G and every vertex s in G, if we start from s and follow w then we will
visit the entire graph.
55
the unary version of the Halting Problem, which can be decided in constant
time given advice n {0, 1} that species whether the nth Turing machine halts or not.
56
3.3
Nondeterminism
y z P (x, y, z),
(3.1)
3.3 Nondeterminism
57
(Ax si )
i=1
s1 , s2 , . . . , sm {0, 1} r {0, 1}
m
(A(x; r si ) = 1);
i=1
x
/ L s1 , s2 , . . . , sm {0, 1} r {0, 1}
m
r
/
(Ax si )
i=1
(A(x; r si ) = 1).
i=1
x
/ L Let s1 , . . . , sm be arbitrary, and choose R {0, 1}m at random. Now Ax and hence each Ax si contains less than a
2n fraction of {0, 1}m . So, by a union bound,
Pr[R
(Ax si )]
Pr[R Ax si ]
i
< m 2n < 1,
for suciently large n. In particular, there exists an r
{0, 1}m such that r
/ i (Ax si ).
58
Pr[Si
/ Ax r]
< (2n )m ,
since Ax and hence Ax r contains more than a 1 2n
fraction of {0, 1}m . By a union bound, we have:
Pr[r r
/ (Ax Si )] < 2m (2n )m 1.
i
i (Ax
si ) contains
3.4
59
of depth m. We know that most paths (from the root to the leaf) are
good, that is, give a correct answer. A natural idea is to try and nd
such a path by walking down from the root and making good choices
at each step. Equivalently, we try to nd a good sequence of coin tosses
bit-by-bit.
To make this precise, x a randomized algorithm A and an input
x, and let m be the number of random bits used by A on input x.
For 1 i m and r1 , r2 , . . . , ri {0, 1}, dene P (r1 , r2 , . . . , ri ) to be the
fraction of continuations that are good sequences of coin tosses. More
precisely, if R1 , . . . , Rm are uniform and independent random bits, then
def
P (r1 , r2 , . . . , ri ) =
Pr
R1 ,R2 ,...,Rm
[A(x; R1 , R2 , . . . , Rm ) is correct
|R1 = r1 , R2 = r2 , . . . , Ri = ri ]
=
Ri+1
P(0,1)=7/8
o o
o o o
Fig. 3.1 An example of P (r1 , r2 ), where o at the leaf denotes a good path.
60
where P () denotes the fraction of good paths from the root. Then
P (r1 , r2 , . . . , rm ) = 1, since it is either 1 or 0.
Note that to implement this method, we need to compute
P (r1 , r2 , . . . , ri ) deterministically, and this may be infeasible. However,
there are nontrivial algorithms where this method does work, often for
search problems rather than decision problems, and where we measure
not a boolean outcome (e.g., whether A is correct as above) but some
other measure of quality of the output. Below we see one such example,
where it turns out to yield a natural greedy algorithm.
3.4.2
to be the expected cut size when the random choices for the rst i coins
are xed to r1 , r2 , . . . , ri .
61
We know that when no random bits are xed, e[] = |E|/2 (because
each edge is cut with probability 1/2), and all we need to calculate is
e(r1 , r2 , . . . , ri ) for 1 i N . For this particular algorithm it turns out
def
we determine r1 , . . . , ri , and Ui = {i + 1, i + 2, . . . , N } be the undecided vertices that have not been put into S or T . Then
e(r1 , r2 , . . . , ri ) = |cut(Si , Ti )| + 1/2(|cut(Ui , [N ])|).
(3.2)
Note that cut(Ui , [N ]) is the set of unordered edges that have at least
one endpoint in Ui . Now we can deterministically select a value for ri+1 ,
by computing and comparing e(r1 , r2 , . . . , ri , 0) and e(r1 , r2 , . . . , ri , 1).
In fact, the decision on ri+1 can be made even simpler than computing (3.2) in its entirety, by observing that the set cut(Ui+1 , [N ])
is independent of the choice of ri+1 . Therefore, to maximize
e(r1 , r2 . . . , ri , ri+1 ), it is enough to choose ri+1 that maximizes the
|cut(S, T )| term. This term increases by either |cut({i + 1}, Ti )| or
|cut({i + 1}, Si )| depending on whether we place vertex i + 1 in S or
T , respectively. To summarize, we have
e(r1 , . . . , ri , 0) e(r1 , . . . , ri , 1) = |cut({i + 1}, Ti )| |cut({i + 1}, Si )|.
This gives rise to the following deterministic algorithm, which is guaranteed to always nd a cut of size at least |E|/2:
Algorithm 3.17 (deterministic MaxCut approximation).
Input: A graph G = ([N ], E) (with no self-loops)
(1) Set S = , T =
(2) For i = 0, . . . , N 1:
(a) If |cut({i + 1}, T )| > |cut({i + 1}, S)|, set S S
{i + 1},
(b) Else set T T {i + 1}.
Note that this is the natural greedy algorithm for this problem. In
other cases, the Method of Conditional Expectations yields algorithms
62
that, while still arguably greedy, would have been much less easy to
nd directly. Thus, designing a randomized algorithm and then trying
to derandomize it can be a useful paradigm for the design of deterministic algorithms even if the randomization does not provide gains
in eciency.
3.5
3.5.1
Pairwise Independence
An Example
where R1 , . . . , RN are the random bits of the algorithm. The key observation is that this analysis applies for any distribution on (R1 , . . . , RN )
satisfying Pr[Ri = Rj ] = 1/2 for each i = j. Thus, they do not need to
be completely independent random variables; it suces for them to be
pairwise independent. That is, each Ri is an unbiased random bit, and
for each i = j, Ri is independent from Rj .
This leads to the question: Can we generate N pairwise independent
bits using less than N truly random bits? The answer turns out to be
yes, as illustrated by the following construction.
Construction 3.18 (pairwise independent bits). Let B1 , . . . , Bk
be k independent unbiased random bits. For each nonempty S [k],
let RS be the random variable iS Bi , where denotes XOR.
Proposition 3.19. The 2k 1 random variables RS in Construction 3.18 are pairwise independent unbiased random bits.
Proof. It is evident that each RS is unbiased. For pairwise independence, consider any two nonempty sets S = T [k]. Then:
RS = RST RS\T
RT = RST RT \S .
63
64
65
66
Hash Tables
67
K
1
= H(y)]
<
2
M
for M = K 2 /. Notice that the above analysis does not require H to be
a completely random function; it suces that H be pairwise independent (or even 2-universal). Thus using Theorem 3.26, we can generate
and store H using O(log N ) random bits. The storage required for the
table T is O(M log N ) = O(K 2 log N ) bits, which is much smaller than
N when K = N o(1) . Note that to uniquely represent a set of size K,
N
we need space at least log K
= (K log N ) (when K N 0.99 ). In fact,
there is a data structure achieving a matching space upper bound of
O(K log N ), which works by taking M = O(K) in the above construction and using additional hash functions to separate the (few) collisions
that will occur.
Often, when people analyze applications of hashing in computer science, they model the hash function as a truly random function. However, the domain of the hash function is often exponentially large, and
thus it is infeasible to even write down a truly random hash function.
Thus, it would be preferable to show that some explicit family of hash
function works for the application with similar performance. In many
cases (such as the one above), it can be shown that pairwise independence (or k-wise independence, as discussed below) suces.
68
3.5.4
Var[X]
2
Proof. Applying Markovs Inequality (Lemma 2.20) to the random variable Y = (X )2 , we have:
Pr[|X | ] = Pr[(X )2 2 ]
E[(X )2 ] Var[X]
=
.
2
2
1
.
t2
69
=
=
t2
E[(Xi i )(Xj j )]
i,j
1
E[(Xi i )2 ]
t2
1
Var[Xi ]
t2
i
t
Now apply Chebyshevs Inequality.
While this requires less independence than the Cherno Bound, notice
that the error probability decreases only linearly rather than exponentially with the number t of samples.
Error Reduction. Proposition 3.28 tells us that if we use t = O(2k )
pairwise independent repetitions, we can reduce the error probability of a BPP algorithm from 1/3 to 2k . If the original BPP
algorithm uses m random bits, then we can do this by choosing
h : {0, 1}k+O(1) {0, 1}m at random from a pairwise independent
family, and running the algorithm using coin tosses h(x) for all
x {0, 1}k+O(1) . This requires m + max{m, k + O(1)} = O(m + k)
random bits.
Independent Repetitions
Pairwise Independent Repetitions
Number of
Repetitions
Number of
Random Bits
O(k)
O(2k )
O(km)
O(m + k)
70
Number of
Random Bits
O((1/2 ) log(1/))
Pairwise Independent
Repetitions
O((1/2 ) (1/))
O(m (1/2 )
log(1/))
O(m + log(1/)
+ log(1/))
71
values, whereas the original sampling problem does not constrain the
output function. It is useful to abstract these properties as follows.
Denition 3.29. A sampler Samp for domain size M is given coin
R
tosses x [N ] and outputs a sequence of samples z1 , . . . , zt [M ]. We
say that Samp : [N ] [M ]t is a (, ) averaging sampler if for every
function f : [M ] [0, 1], we have
t
1
f (zi ) > (f ) + .
(3.3)
Pr
R
(z1 ,...,zt )Samp(U[N ] ) t
i=1
If Inequality 3.3 only holds for f with (boolean) range {0, 1}, we call
Samp a boolean averaging sampler. We say that Samp is explicit if given
x [N ] and i [t], Samp(x)i can be computed in time poly(log N, log t).
We note that, in contrast to the Cherno Bound (Theorem 2.21)
and the Pairwise Independent Tail Inequality (Proposition 3.28), this
denition seems to only provide an error guarantee in one direction,
namely that the sample average does not signicantly exceed the global
average (except with small probability). However, a guarantee in the
other direction also follows by considering the function 1 f . Thus,
up to a factor of 2 in the failure probability , the above denition
is equivalent to requiring that Pr[|(1/t) i f (zi ) (f )| > ] . We
choose to use a one-sided guarantee because it will make the connection
to list-decodable codes (in Section 5) slightly cleaner.
Our pairwise-independent sampling algorithm can now be described
as follows:
Theorem 3.30 (Pairwise Independent Sampler). For every
m N and , [0, 1], there is an explicit (, ) averaging sampler
Samp : {0, 1}n ({0, 1}m )t using n = O(m + log(1/) + log(1/)) random bits and t = O(1/(2 )) samples.
As we will see in subsequent sections, averaging samplers are intimately related to the other pseudorandom objects we are studying
(especially randomness extractors). In addition, some applications of
samplers require samplers of this restricted form.
72
3.5.5
k-wise Independence
Our denition and construction of pairwise independent functions generalize naturally to k-wise independence for any k.
Denition 3.31 (k-wise independent hash functions). For
N, M, k N such that k N , a family of functions H = {h : [N ]
[M ]} is k-wise independent if for all distinct x1 , x2 , . . . , xk [N ], the
random variables H(x1 ), . . . , H(xk ) are independent and uniformly disR
tributed in [M ] when H H.
k
i=1
yi
x xj
.
xi xj
j=i
3.6 Exercises
73
a random function from H takes k max{n, m} random bits, and evaluating a function from H takes time poly(n, m, k).
k-wise independent hash functions have applications similar to those
that pairwise independent hash functions have. The increased independence is crucial in derandomizing some algorithms. k-wise independent
random variables also satisfy a tail bound similar to Proposition 3.28,
with the key improvement being that the error probability vanishes
linearly in tk/2 rather than t; see Problem 3.8.
3.6
Exercises
Problem 3.2 (Designs). Designs (also known as packings) are collections of sets that are nearly disjoint. In Section 7, we will see how
they are useful in the construction of pseudorandom generators. Formally, a collection of sets S1 , S2 , . . . , Sm [d] is called an (, a)-design
(for integers a d) if
For all i, |Si | = .
For all i = j, |Si Sj | < a.
For given , wed like m to be large, a to be small, and d to be small.
That is, wed like to pack many sets into a small universe with small
intersections.
2
(1) Prove that if m ad / a , then there exists an (, a)-design
S1 , . . . , Sm [d].
Hint: Use the Probabilistic Method. Specically, show that
if the sets are chosen randomly, then for every S1 , . . . , Si1 ,
E [#{j < i : |Si Sj | a}] < 1.
Si
74
Problem 3.4(Almost Pairwise Independence). A family of functions H mapping domain [N ] to range [M ] is -almost pairwise
independent3 if for every x1 = x2 [N ], y1 , y2 [M ], we have
1+
.
Pr [H(x1 ) = y1 and H(x2 ) = y2 ]
R
M2
H H
3 Another
3.6 Exercises
75
(1) Show that there exists a family H of -almost pairwise independent functions from {0, 1}n to {0, 1}m such that choosing
a random function from H requires only O(m + log n +
log(1/)) random bits (as opposed to O(m + n) for exact
pairwise independence). (Hint: First consider domain Fd+1
for an appropriately chosen nite eld F and d N, and look
at maps of the form h = g fa , where g comes from some
pairwise independent family and fa : Fd+1 F is dened by
fa (x0 , . . . , xd ) = x0 + x1 a + x2 a2 + + xd ad .)
(2) Give a deterministic algorithm that on input an N -vertex,
M -edge graph G (with no self-loops), nds a cut of size
at least (1/2 o(1)) M in time M polylog(N ) and space
O(log M ) (thereby improving the M poly(N ) running time
of Algorithm 3.20).
76
77
k2
4t2
k/2
.
[i zi T ] 1 .
3.7
78
79
in [408, 168]. The size lower bound for pairwise independent families
in Problem 3.5 is due to Stinson [376], based on the PlackettBurman
bound for orthogonal arrays [312]. The construction of almost pairwise
independent families in Problem 3.4 is due to Bierbrauer et al. [67]
(though the resulting parameters follow from the earlier work of Naor
and Naor [296]).
The application to hash tables from Section 3.5.3 is due to Carter
and Wegman [93], and the method mentioned for improving the
space complexity to O(K log N ) is due to Fredman, Komlos, and
Szemeredi [142]. The problem of randomness-ecient error reduction (sometimes called deterministic amplication) was rst studied
by Karp, Pippenger, and Sipser [234], and the method using pairwise independence given in Section 3.5.4 was proposed by Chor and
Goldreich [97]. The use of pairwise independence for derandomizing
algorithms was pioneered by Luby [278]; Algorithm 3.20 for MaxCut
is implicit in [279]. The notion of averaging samplers was introduced
by Bellare and Rompel [57] (under the term oblivious samplers). For
more on samplers and averaging samplers, see the survey by Goldreich [155]. Tail bounds for k-wise independent random variables, such
as the one in Problem 3.8, can be found in the papers [57, 97, 350].
Problem 3.2 on designs is from [134], with the derandomization of
Part 3 being from [281, 302]. Problem 3.6 on the frequency moments
of data streams is due to Alon, Matias, and Szegedy [22]. For more on
data stream algorithms, we refer to the survey by Muthukrishnan [295].
4
Expander Graphs
4.1
Measures of Expansion
80
4.1.1
81
Vertex Expansion
S = N (S) \ S.
Edge Expansion (cuts): Instead of S, use the number of
edges leaving S.
Random Walks: Random walks converge quickly to the uniform distribution, that is, the second eigenvalue (G) is
small.
Quasi-randomness (a.k.a Mixing): for every two sets S
and T (say of constant density), the fraction of edges between
S and T is roughly the product of their densities.
All of these measures are very closely related to each other, and are
even equivalent for certain settings of parameters.
It is not obvious from the denition that good vertex expanders
(say, with D = O(1), K = (N ), and A = 1 + (1)) even exist. We
will show this using the Probabilistic Method.
Theorem 4.2. For all constants D 3, there is a constant > 0 such
that for all N , a random D-regular undirected graph on N vertices is
an (N, D 1.01) vertex expander with probability at least 1/2.
Note that the degree bound of 3 is the smallest possible, as every
graph of degree 2 is a poor expander (being a union of cycles and
82
Expander Graphs
.
2K
N
Thus, we nd that
KD 2K
KD
N
pK
2K
K
N
K
3 4 K
KDe 2K KD 2K
e D K
Ne
=
,
K
2K
N
4N
83
Pr
GBipN,D
K=1
1
4K < .
2
84
Expander Graphs
Spectral Expansion
(G) = max
M u
xM
= max
,
xu x
u
other sources (including the original lecture notes on which this survey was based), the
spectral expansion referred to rather than . Here we use , because it has the more
natural feature that larger values of correspond to the graph being more expanding.
85
86
Expander Graphs
.
|N (S)|
N
|S|
N
Solving for |N (S)| and using N |S|/, we obtain |N (S)|
|S|/(2 (1 ) + ), as desired.
The other direction, i.e., obtaining spectral expansion from vertex
expansion, is more dicult (and we will not prove it here).
Theorem 4.9 (vertex expansion spectral expansion). For
every > 0 and D > 0, there exists > 0 such that if G is a D-regular
(N/2, 1 + ) vertex expander, then it also has spectral expansion .
Specically, we can take = ((/D)2 ).
Note rst the dependence on subset size being N/2: this is necessary,
because a graph can have vertex expansion (N, 1 + (1)) for < 1/2
and be disconnected (e.g., the disjoint union of two good expanders),
thereby having no spectral expansion. Also note that the bound on
deteriorates as D increases. This is also necessary, because adding edges
to a good expander cannot hurt its vertex expansion, but can hurt its
spectral expansion.
Still, roughly speaking, these two results show that vertex expansion
and spectral expansion are closely related, indeed equivalent for many
interesting settings of parameters:
Corollary 4.10. Let G be an innite family of D-regular multigraphs,
for a constant D N. Then the following two conditions are equivalent:
There is a constant > 0 such that every G G is an
(N/2, 1 + ) vertex expander.
87
When people informally use the term expander, they often mean
a family of regular graphs of constant degree D satisfying one of the
two equivalent conditions above.
However, the two measures are no longer equivalent if one wants
to optimize the expansion constants. For vertex expansion, we have
already seen that if we allow to be a small constant (depending on
D), then there exist (N, A) vertex expanders with A very close to
D 1, e.g., A = D 1.01, and Problem 4.3 shows that A cannot be
any larger than D 1. The optimal value for the spectral expansion is
also well-understood. First note that, by taking 0 in Theorem 4.6,
2
a graph with spectral expansion 1 has vertex
expansion A 1/
for small sets. Thus, a lower bound on is 1/ D o(1). In fact, this
lower bound can be improved, as shown in the following theorem (and
proven in Problem 4.4):
Theorem 4.11. For every constant D N, any D-regular, N -vertex
88
Expander Graphs
89
90
Expander Graphs
N D
.
Observe that the denominator N D counts all edges of the graph
(as ordered pairs). The lemma states that the dierence between the
fraction of edges from S to T and the expected value if we were to choose
G randomly is small, roughly times the square root of this fraction.
Finally, note that Part 1 of Theorem 4.14 follows from the Expander
Mixing Lemma by setting T = S c , so = 1 and e(S, T )/N D
(1 ) (1 ) /2.
When a digraph G = (V, E) has the property that
|e(S, T )/|E| | = O(1) for all sets S, T (with densities , ),
the graph is called quasirandom. Thus, the Expander Mixing Lemma
implies that a regular digraph with (G) = O(1) is quasirandom.
Quasirandomness has been studied extensively for dense graphs, in
which case it has numerous equivalent formulations. Here we are most
interested in sparse graphs, especially constant-degree graphs (for
which (G) = O(1) is impossible).
Proof. Let S be the characteristic (row) vector of S and T
the characteristic vector of T . Let A be the adjacency matrix of
G, and M = A/D be the random-walk matrix for G. Note that
e(S, T ) = S AtT = S (DM )tT , where the superscript t denotes the
transpose.
We can express S as the sum of two components, one parallel to the uniform distribution u, and the other a vector
S , where
Then S = (N )u +
S and similarly T = (N )u + T . Intuitively,
the components parallel to the uniform distribution spread the weight
91
Formally, we have
e(S, T )
1
t
= ((N )u +
S )M ((N )u + T )
N D
N
1
1
t
= (N 2 )uM ut + (N )uM (
T)
N
N
1
1
t
t
M (
+ (N )
S Mu +
T) .
N S
N
Thus,
e(S, T )
N D =
t
(S M )(
T)
N
1
S M T
N
1
S T .
N
To complete the proof, we note that
2
2
2
N = S 2 = (N )u2 +
S = N + S ,
2 )N =
(1 ) N and similarly
=
(
S
so
(1 ) N .
T =
Similarly to vertex expansion and edge expansion, a natural question is to what extent the converse holds. That is, if e(S, T )/N D is
always close to the product of the densities of S and T , then is (G)
necessarily small? This is indeed true:
Theorem 4.16 (Converse to Expander Mixing Lemma). Let G
be a D-regular, N -vertex undirected graph. Suppose that for all pairs
ofdisjoint vertex sets S, T , we have |e(S, T )/(N D) (S)(T )|
(S)(T ) for some [0, 1], where (R) = |R|/N for any set R of
vertices. Then (G) = O ( log(1/)).
92
Expander Graphs
Putting the two theorems together, we see that and are the
same up to a logarithmic factor. Thus, unlike the other connections we
have seen, this connection is good for highly expanding graphs (i.e.,
(G) close to zero, (G) close to 1).
4.2
93
94
Expander Graphs
95
xRN
xM
x
96
Expander Graphs
97
Thus,
P M P + = + (1 ).
Using Claims 4.20 and 4.21, the probability of never leaving B in a
(t 1)-step random walk is
|uP (M P )t1 |1 N uP (M P )t1
N uP P M P t1
98
Expander Graphs
N
( + (1 ))t1
N
( + (1 ))t .
The hitting properties described above suce for reducing the error
of RP algorithms. What about BPP algorithms, which have two-sided
error? They are handled by the following.
Theorem 4.22(Cherno Bound for Expander Walks). Let G be
a regular digraph with N vertices and spectral expansion 1 , and
let f : [N ] [0, 1] be any function. Consider a random walk V1 , . . . , Vt
in G from a uniform start vertex V1 . Then for any > 0
1
2
f (Vi ) (f ) + 2e( t) .
Pr
t
i
Note that this is just like the standard Cherno Bound (Theorem 2.21), except that our additive approximation error increases by
= 1 . Thus, unlike the Hitting Property we proved above, this
bound is only useful when is suciently small (as opposed to bounded
away from 1). This can be achieved by taking the a power of the initial
expander, where edges correspond to walks of length t in the original
expander; this raises the random-walk matrix and to the tth power.
However, there is a better Cherno Bound for Expander Walks, where
does not appear in the approximation error, but the exponent in
the probability of error is (2 t) instead of (2 t). The bound above
suces in the common case that a small constant approximation error
can be tolerated, as in error reduction for BPP.
Proof. Let Xi be the random variable f (Vi ), and X = i Xi . Just like
in the standard proof of the Cherno Bound (Problem 2.7), we show
that the expectation of the moment generating function erX = i erXi
is not much larger than er E[X] and apply Markovs Inequality, for a
suitable choice of r. However, here the factors erXi are not independent, so the expectation does not commute with the product. Instead,
we express E[erX ] linear-algebraically as follows. Dene a diagonal
99
matrix P whose (i, i)th entry is erf (i) . Then, similarly Claim 4.20 in
the proof of the hitting proof above, we observe that
E[erX ] = uP (M P )t1 1 = u(M P )t 1 N u M P t = M P t .
To see this, we simply note that each cross-term in the matrix product
uP (M P )t1 corresponds to exactly one expander walk v1 , . . . , vt , with
a coecient equal to the probability of this walk times i ef (vi ) . By
the Matrix Decomposition Lemma (Lemma 4.19), we can bound
M P (1 ) JP + EP .
Since J simply projects onto the uniform direction, we have
uP 2
u2
rf (v)
/N )2
v (e
=
2
v (1/N )
1 2rf (v)
=
e
N
v
JP 2 =
1
= 1 + 2r + O(r2 )
for r 1, and thus
JP =
1 + 2r + O(r2 ) = 1 + r + O(r2 ).
2 t)
100
Expander Graphs
By Markovs Inequality,
Pr[X ( + + ) t] ert+O(r
2 t)
= e(
2 t)
Independent Repetitions
Pairwise Independent Repetitions
Expander Walks
Number of
Repetitions
Number of
Random Bits
O(k)
O(2k )
O(k)
O(km)
O(k + m)
m + O(k)
Number of
Random Bits
101
102
Expander Graphs
4.3
Explicit Constructions
103
Algebraic Constructions
104
Expander Graphs
for a0 , a1 , a2 , a3 Z such that a20 + a21 + a22 + a23 = p, a0 is odd and positive, and a1 , a2 , a3 are even, for some xed prime p = q such that p 1
mod 4, q is a square modulo p, and i Fq such that i2 = 1 mod q.
The degree of the graph is the number of solutions to the equation a20 + a21 + a22 + a23 = p, which turns out to be D = p + 1, and it
105
spectral expansion.) These graphs are also only mildly explicit, again
due to the need to nd the prime q.
These are called Ramanujan Graphs because the proof of their
spectral expansion relies on results in number theory concerning the
Ramanujan Conjectures. Subsequently, the term Ramanujan graphs
came to refer to any innite family of graphs with optimal spectral
expansion 1 2 D 1/D.
4.3.2
Graph Operations
Denition 4.28. An (N, D, )-graph is a D-regular digraph on N vertices with spectral expansion .
4.3.2.1
Squaring
106
Expander Graphs
Tensoring
The next operation we consider increases the size of the graph at the
price of increasing the degree.
Denition 4.31 (Tensor Product of Graphs). Let G1 = (V1 , E1 )
be D1 -regular and G2 = (V2 , E2 ) be D2 -regular. Then their tensor
product is the D1 D2 -regular graph G1 G2 = (V1 V2 , E), where the
(i1 , i2 )th neighbor of a vertex (x1 , x2 ) is (y1 , y2 ), where yb is the ib th
neighbor of xb in Gb . That is, a random step on G1 G2 consists of a
random step on G1 in the rst component and a random step on G2 in
the second component.
Often this operation is simply called the product of G1 and G2 ,
but we use tensor product to avoid confusion with squaring and
to reect its connection with the standard tensor products in linear
algebra:
Denition 4.32 (Tensor Products of Vectors and Matrices).
Let x RN1 , y RN2 , then their tensor product is the vector z =
x y RN1 N2 where zij = xi yj .
107
108
Expander Graphs
109
Of the two operations we have seen, one (squaring) improves expansion and one (tensoring) increases size, but both have the deleterious
eect of increasing the degree. Now we will see a third operation that
decreases the degree, without losing too much in the expansion. By
repeatedly applying these three operations, we will be able to construct arbitrarily large expanders while keeping both the degree and
expansion constant.
110
Expander Graphs
111
we follow our convention of using capital letters to denote random variables corresponding to the lower-case values in Denition 4.34.
112
Expander Graphs
distribution V on clouds is closer to uniform, then the conditional distributions within the clouds J |V =v must have
become further from uniform, and thus the second H-step
(V, J ) (V, J) brings us closer to uniform. This leads to a
proof by Vector Decomposition, where we decompose any vector x that is orthogonal to uniform into components x and
x , where x is uniform on each cloud, and x is orthogonal
to uniform on each cloud. This approach gives the best known
bounds on the spectral expansion of the zigzag product, but
it can be a bit messy since the two components generally do
not remain orthogonal after the steps of the zigzag product
(unlike the case of the tensor product, where we were able to
show that x M is orthogonal to x M ).
The second intuition is to think of the expander H as behaving similarly to the complete graph on D1 vertices (with
self-loops). In the case that H equals the complete graph,
z H = G H. Thus it is natuthen it is easy to see that G $
ral to apply Matrix Decomposition, writing the random-walk
matrix for an arbitrary expander H as a convex combination
of the random-walk matrix for the complete graph and an
error matrix. This gives a very clean analysis, but slightly
worse bounds than the Vector Decomposition Method.
We now proceed with the formal proof, following the Matrix Decomposition approach.
Proof of Theorem 4.35. Let A, B, and M be the random-walk matriz G2 , respectively. We decompose M into the
ces for G1 , G2 , and G1 $
product of three matrices, corresponding to the three steps in the def be the transition matrix for taking
z G2 s edges. Let B
inition of G1 $
a random G2 -step on the second component of [N1 ] [D1 ], that is,
= IN B, where IN is the N1 N1 identity matrix. Let A be the
B
1
1
permutation matrix corresponding to the G1 -step. That is, A(u,i),(v,j)
is 1 i (u, v) is the ith edge leaving u and the jth edge entering v. By
AB.
z G2 , we have M = B
the denition of G1 $
By the Matrix Decomposition Lemma (Lemma 4.19), B = 2 J +
(1 2 )E, where every entry of J equals 1/D1 and E has norm at
113
= 2 J + (1 2 )E,
where J = IN J and E
= IN
most 1. Then B
1
1
E has norm at most 1.
This gives
A(
2 J + (1 2 )E)
= 22 JAJ + (1 22 )F,
M = (2 J + (1 2 )E)
where we take (1 22 )F to be the sum of the three terms involving
noting that their norms sum to at most (1 2 ), we see that F has
E;
2
norm at most 1. Now, the key observation is that JAJ = A J.
Thus,
M = 22 A J + (1 22 )F,
and thus
(M ) 22 (A J) + (1 22 )
22 (1 1 ) + (1 22 )
= 1 1 22 ,
as desired.
4.3.3
114
Expander Graphs
z H is well-dened because
Induction Step: First note that G2t $
2
2
2
2
deg(Gt ) = deg(Gt ) = (D ) = #nodes(H). Then,
deg(Gt+1 ) = deg(H)2 = D2
#nodes(Gt+1 ) = #nodes(G2t ) #nodes(H) = Nt D4 = D4t D4 = D4(t+1)
(Gt+1 ) (Gt )2 + 2(H) (1/2)2 + 2 (1/8) = 1/2
Now, we recursively bound the time to compute neighbors in Gt .
Actually, due to the way the G-step in the zigzag product is dened,
we bound the time to compute the edge-rotation map (u, i) (v, j),
where the ith edge leaving u equals the jth edge entering v. Denote
by time(Gt ) the time required for one evaluation of the edge-rotation
map for Gt . This requires two evaluations of the edge-rotation map for
Gt1 (the squaring requires two applications, while the zigzag part
does not increase the number of applications), plus time poly(log Nt )
for manipulating strings of length O(log Nt ). Therefore,
time(Gt ) = 2 time(Gt1 ) + poly(log Nt )
= 2t poly(log Nt )
(1)
= Nt
where the last equality holds because Nt = D4t for a constant D. Thus,
this construction is only mildly explicit.
We remedy the above diculty by using tensoring to make the sizes
of the graphs grow more quickly:
Construction 4.38 (Fully Explicit Expanders). Let H be a
(D8 , D, 7/8)-graph, and dene:
G1 = H 2
z H
Gt+1 = (Gt Gt )2 $
In this family of graphs, the number of nodes grows doubly exponent
tially Nt c2 , while the computation time grows only exponentially
as before. Namely,
time(Gt ) = 4t poly(log Nt ) = poly(log Nt ).
115
Open Problems
116
Expander Graphs
on deep results in number theory. The lack of a more elementary construction seems to signify a limitation in our understanding of expander
graphs.
Open Problem 4.42. Give an explicit combinatorial construction
4.4
Recall the Undirected S-T Connectivity problem: Given an undirected graph G and two vertices s, t, decide whether there is a path from
s to t. In Section 2.4, we saw that this problem can be solved in randomized logspace (RL). Here we will see how we can use expanders and the
operations above to solve this problem in deterministic logspace (L).
117
118
Expander Graphs
4
35
1
min
(Ck1 ),
,
32
18
119
120
Expander Graphs
4.5 Exercises
121
4.5
Exercises
122
Expander Graphs
4.5 Exercises
123
Problem 4.4 (Limits on Spectral Expansion). Let G be a Dregular undirected graph and TD be the innite D-regular tree (as
in Problem 4.3). For a graph H and N, let p (H) denote the probability that if we choose a random vertex v in H and do a random walk
of length 2, we end back at vertex v.
(1) Show that p (G) p (TD ) C (D 1) /D2 , where C is
the th Catalan number, which equals the number of properly
parenthesized strings in {(, )}2 strings where no prex has
more)s than (s.
(2) Show that N p (G) 1 + (N 1) (G)2 . (Hint: use the
fact that the trace of a matrix equals the sum of its eigenvalues.)
(3) Using the fact that C = 2
/( + 1), prove that
2 D1
(G)
O(1),
D
where the O(1) term vanishes as N (and D is held
constant).
124
Expander Graphs
4.5 Exercises
125
126
Expander Graphs
Problem 4.10(Unbalanced Vertex Expanders and Data Structures). Consider a (K, (1 )D) bipartite vertex expander G with N
left vertices, M right vertices, and left degree D.
(1) For a set S of left vertices, a y N (S) is called a unique
neighbor of S if y is incident to exactly one edge from S.
Prove that every left-set S of size at most K has at least
(1 2)D|S| unique neighbors.
(2) For a set S of size at most K/2, prove that at most |S|/2
vertices outside S have at least D neighbors in N (S), for
= O().
Now well see a beautiful application of such expanders to data
structures. Suppose we want to store a small subset S of a large universe
[N ] such that we can test membership in S by probing just 1 bit of our
data structure. A trivial way to achieve this is to store the characteristic
vector of S, but this requires N bits of storage. The hashing-based data
structures mentioned in Section 3.5.3 only require storing O(|S|) words,
each of O(log N ) bits, but testing membership requires reading an entire
word (rather than just one bit.)
Our data structure will consist of M bits, which we think of as a
{0, 1}-assignment to the right vertices of our expander. This assignment
will have the following property.
Property : For all left vertices x, all but a = O() fraction of the
neighbors of x are assigned the value S (x) (where S (x) = 1
i x S).
(3) Show that if we store an assignment satisfying Property ,
then we can probabilistically test membership in S with error
probability by reading just one bit of the data structure.
(4) Show that an assignment satisfying Property exists provided |S| K/2. (Hint: rst assign 1 to all of Ss neighbors
and 0 to all its nonneighbors, then try to correct the errors.)
It turns out that the needed expanders exist with M = O(K log N )
(for any constant ), so the size of this data structure matches the
127
hashing-based scheme while admitting (randomized) 1-bit probes. However, note that such bipartite vertex expanders do not follow from
explicit spectral expanders as given in Theorem 4.39, because the latter
do not provide vertex expansion beyond D/2 nor do they yield highly
imbalanced expanders (with M
N ) as needed here. But in Section 5,
we will see how to explicitly construct expanders that are quite good
for this application (specically, with M = K 1.01 polylogN ).
4.6
A detailed coverage of expander graphs and their applications in theoretical computer science is given by Hoory, Linial, and Wigderson [207].
Applications in pure mathematics are surveyed by Lubotzky [276].
The rst papers on expander graphs appeared in conferences on
telephone networks. Specically, Pinsker [309] proved that random
graphs are good expanders, and used these to demonstrate the existence
of graphs called concentrators. Bassalygo [52] improved Pinskers
results, in particular giving the general tradeo between the degree D,
expansion factor A, and set density mentioned after Theorem 4.4.
The rst computer science application of expanders (and superconcentrators) came in an approach by Valiant [403] to proving circuit
lower bounds. An early and striking algorithmic application was the
O(log n)-depth sorting network by Ajtai, Koml
os, and Szemeredi [10],
which also illustrated the usefulness of expanders for derandomization.
An exciting recent application of expanders is Dinurs new proof of the
PCP Theorem [118].
The fact that spectral expansion implies vertex expansion and edge
expansion was shown by Tanner [385] (for vertex expansion) and Alon
and Milman [23] (for edge expansion). The converses are discrete analogues of Cheegers Inequality for Riemannian manifolds [94], and various forms of these were proven by Alon [15] (for vertex expansion),
Jerrum and Sinclair [219] (for edge expansion in undirected graphs
and, more generally, conductance in reversible Markov chains), and
Mihail [286] (for edge expansion in regular digraphs and conductance
in nonreversible Markov chains).
128
Expander Graphs
129
O( D).
Constant-degree bipartite expanders with expansion (1 )D have
been constructed by Capalbo et al. [92], based on a variant of the zig
zag product for randomness condensers. (See Section 6.3.5.) Alon and
Capalbo [17] have made progress on Open Problem 4.44 by giving an
explicit construction of undirected constant-degree unique-neighbor
expanders (see Problem 4.10).
The deterministic logspace algorithm for Undirected S-T Connectivity (Algorithm 4.45) is due to Reingold [327]. The result
that RL L3/2 is due to Saks and Zhou [344], with an important
ingredient being Nisans pseudorandom generator for space-bounded
computation [299]. Based on Algorithm 4.45, explicit polynomiallength universal traversal sequences for consistently labelled regular
digraphs, as well as pseudorandom walk generators for such graphs,
were constructed in [327, 331]. (See also [338].) In [331], it is shown
that pseudorandom walk generators for arbitrarily labelled regular
130
Expander Graphs
5
List-Decodable Codes
The eld of coding theory is motivated by the problem of communicating reliably over noisy channels where the data sent over the channel
may come out corrupted on the other end, but we nevertheless want
the receiver to be able to correct the errors and recover the original
message. There is a vast literature studying aspects of this problem
from the perspectives of electrical engineering (communications and
information theory), computer science (algorithms and complexity),
and mathematics (combinatorics and algebra). In this survey, we are
interested in codes as pseudorandom objects. In particular, a generalization of the notion of an error-correcting code yields a framework that
we will use to unify all of the main pseudorandom objects covered in
this survey (averaging samplers, expander graphs, randomness extractors, list-decodable codes, pseudorandom generators).
5.1
5.1.1
132
List-Decodable Codes
codes are dened to be sets rather than multisets. However, the generality
aorded by multisets will allow the connections we see later to be stated more cleanly.
In the case C is a multiset, the condition that a mapping Enc : {1, . . . , N } C is bijective
means that for every c C, |Enc1 (c)| equals the multiplicity of c in C.
133
134
List-Decodable Codes
any codeword appears in C with multiplicity greater than 1, then the minimum distance
is dened to be zero.
135
Existential Bounds
Like expanders, the existence of very good codes can be shown using
the probabilistic method. The bounds will be stated in terms of the
q-ary entropy functions, so we begin by dening those.
Denition 5.6. For q, n
N and [0, 1], we dene Hq (, n
) [0, 1]
to be such that |B(x, )| = q Hq (,n)n for x n , where is an alphabet
of size q.
We also dene the q-ary entropy function Hq () = logq ((q
1)/) + (1 ) logq (1/(1 )).
The reason we use similar notation for Hq (, n
) and Hq () is Part 1
of the following:
Proposition 5.7. For every q N, (0, 1 1/q), [0, 1/2],
(1) limn Hq (, n
) = Hq ().
(2) Hq () H2 ()/ log q + .
(3) H2 (1/2 ) = 1 (2 ).
136
List-Decodable Codes
Proof.
(1) Pick the codewords c1 , . . . , cN in sequence ensuring that ci
is at distance at least from c1 , . . . , ci1 . The union of
the Hamming balls of radius around c1 , . . . , ci1 contains
at most (i 1) q Hq (,n)n < (N 1) q Hq (,n)n , so there is
always a choice of ci outside these balls if we take N =
q (1Hq (,n))n .
(2) We use the probabilistic method. Choose the N codewords
randomly and independently from n . The probability that
there is a Hamming Ball of radius containing at least L + 1
codewords is at most
Hq (,n)n L+1
L+1
N
q
N 1
n
,
L+1
q n
q (1Hq (,n)1/(L+1))n
which is less than 1 if we take N = q (1Hq (,n)1/(L+1))n .
Note that while the rate bounds are essentially the same for achieving minimum distance and achieving list-decoding radius (as we take
large list size), the bounds are incomparable because minimum distance
only corresponds to unique decoding up to radius roughly /2. The
bound for list decoding is known to be tight up to the dependence on L
(Problem 5.1), while the bound on minimum distance is not tight in
general. Indeed, there are families of algebraic-geometric codes with
137
constant alphabet size q, constant minimum distance > 0, and constant rate > 0 where > 1 Hq (, n
) for suciently large n
. (Thus,
this is a rare counterexample to the phenomenon random is best.)
Identifying the best tradeo between rate and minimum distance, even
for binary codes, is a long-standing open problem in coding theory.
Open Problem 5.9. For each constant (0, 1), identify the largest
> 0 such that for every > 0, there exists an innite family of codes
Cn {0, 1}n of rate at least and minimum distance at least .
Lets look at some special cases of the parameters in Theorem 5.8.
For binary codes (q = 2), we will be most interested in the case
1/2, which corresponds to correcting the maximum possible fraction of errors for binary codes. (No nontrivial decoding is possible
for binary codes at distance greater than 1/2, since a completely
random received word will be at distance roughly 1/2 with most codewords.) In this case, Proposition 5.7 tells us that the rate approaches
1 H2 (1/2 ) = (2 ), i.e., the blocklength is n
= (n/2 ) (for list
size L = (1/2 )). For large alphabets q, Proposition 5.7 tells us that
the rate approaches 1 as q grows. We will be most interested in
the case = 1 for small , so we can correct a = 1 fraction of
errors with a rate arbitrarily close to . For example, we can achieve
rate of = 0.99, a list size of L = O(1/) and an alphabet of size
poly(1/). More generally, it is possible to achieve rate = (1 )
with an alphabet size of q = (1/)O(1/) .
While we are primarily interested in list-decodable codes, minimum
distance is often easier to bound. The following allows us to translate
bounds on minimum distance into bounds on list-decodability.
Proposition 5.10 (Johnson Bound).
(1) If C has minimum distance 1 , then it is (1
O( ), O(1/ ))-list-decodable.
(2) If a binary code C has minimum distance 1/2 , then it is
(1/2 O( ), O(1/))-list-decodable.
138
List-Decodable Codes
agr(r, ci )
agr(ci , cj )
i
s
> s
2
1i<js
2 1 = 1,
where the last inequality is by our setting of parameters. Hence,
contradiction.
Note the quadratic loss in the distance parameter. This means that
optimal codes with respect to minimum distance are not necessarily
optimal with respect to list decoding. Nevertheless, if we do not care
about the exact tradeo between the rate and the decoding radius, the
above can yield codes where the decoding radius is as large as possible
(approaching 1 for large alphabets and 1/2 for binary alphabets).
5.1.3
Explicit Codes
139
where we simply require that the entire codeword Enc(m) can be computed in time poly(
n, log ||).
The constructions of codes that we describe will involve arithmetic
over nite elds. See Remark 3.25 for the complexity of constructing
nite elds and carrying out arithmetic in such elds .
In describing the explicit constructions below, it will be convenient
to think of the codewords as functions c : [
n] rather than as strings
n
in .
Construction 5.12 (Hadamard Code). For n N, the (binary)
Hadamard code of message length n is the binary code of blocklength
n
= 2n consisting of all functions c : Zn2 Z2 that are linear (modulo 2).
Proposition 5.13. The Hadamard code:
(1) is explicit with respect to the encoding function that takes a
message m Zn2 to the linear function cm dened by cm (x) =
i mi xi mod 2.
(2) has minimum distance 1/2, and
(3) is (1/2 , O(1/2 )) list-decodable for every > 0.
Proof. Explicitness is clear by inspection. The minimum distance
follows from the fact that for every two distinct linear functions
c, c : Zn2 Z2 , Prx [c(x) = c (x)] = Prx [(c c )(x) = 0] = 1/2. The listdecodability follows from the Johnson Bound.
The advantages of the Hadamard code are its small alphabet
(binary) and optimal distance (1/2), but unfortunately its rate is exponentially small ( = n/2n ). By increasing both the eld size and degree,
we can obtain complementary properties
Construction 5.14 (ReedSolomon Codes). For a prime power q
and d N, the q-ary ReedSolomon code of degree d is the code of
blocklength n
= q and message length n = (d + 1) log q consisting of
all polynomials p : Fq Fq of degree at most d.
140
List-Decodable Codes
141
5.2
List-Decoding Algorithms
Review of Algebra
142
List-Decodable Codes
Theorem 5.19. There is a polynomial-time (1 ) list-decodingalgorithm for the q-ary ReedSolomon code of degree d, for = 2 d/q.
That is, given a function r : Fq Fq and d N, all polynomials of
143
144
List-Decodable Codes
5.2.3
ParvareshVardy Codes
h
Then, it follows that Q (Y, Z) = Q(Y, Z, Z , . . . , Z h
) is nonzero if Q
is nonzero because every monomial in Q with individual degrees at most
h 1 in Z1 , . . . , Zm gets mapped to a dierent power of Z. However,
here the diculty is that the degree of Q (Y, f (Y )) is too high (roughly
d = dY + d h m > dm
Z ) for us to satisfy the constraint q d .
ParvareshVardy codes get the best of both worlds by providing more information with each symbol not just the evaluation
of f at each point, but the evaluation of m 1 other polynomials
f1 , . . . , fm1 , each of which is still of degree d (as is good for arguing
that Q(Y, f (Y ), f1 (Y ), . . . , fm1 (Y )) = 0, but can be viewed as raising
f to successive powers of h for the purposes of ensuring that Q is
nonzero.
To introduce this idea, we need some additional algebra.
For univariate polynomials f (Y ) and E(Y ), we dene
f (Y ) mod E(Y ) to be the remainder when f is divided by E.
145
Theorem 5.22. For every prime power q, integer 0 d < q, and irreducible polynomial E of degree d + 1, the q-ary ParvareshVardy code
of degree d, redundancy m = log(q/d), power h = 2, and irreducible E
146
List-Decodable Codes
distance = 1 O(d/q).
Proof. We are given a received word r : Fq Fm
q , and want to nd all
(5.1)
(5.2)
(5.3)
Once we have this, we can reduce both sides of Equation (5.2) modulo E(Y ) and deduce
0 = Q(Y, f0 (Y ), f1 (Y ), . . . , fm1 (Y )) mod E(Y )
= Q(Y, f (Y ), f (Y )h , . . . , f (Y )h
m1
) mod E(Y )
m1
) mod E(Y ),
147
dhm
1
= O(d/q),
+
hm
q
(5.4)
(d/q),
this completes the proof of the theorem.
Note that the obstacles to obtaining a tradeo are the factor
of m in expression for the rate = d/(mq) and the factor of hm in
Equation (5.4) for . We remedy these in the next section.
5.2.4
(5.5)
where we write y = j .
Thus, the symbols of the PV encoding have a lot of overlap. For
example, the j th symbol and the j+1 th symbol share all but one
component. Intuitively, this means that we should only have to send a
1/m fraction of the symbols of the codeword, saving us a factor of m
in the rate. (The other symbols can be automatically lled in by the
148
List-Decodable Codes
149
(5.6)
(5.7)
150
List-Decodable Codes
i
(5.8)
Note the savings of the factor (h 1) m as compared to Inequality (5.3); this is because we chose Q to be of total degree 1 in the Zi s
instead of having individual degree 1.
Now, as in the ParvareshVardy decoding, we can reduce both sides
of Equation (5.7) modulo E(Y ) and deduce
0 = Q(Y, f0 (Y ), f1 (Y ), . . . , fm1 (Y )) mod E(Y )
= Q(Y, f (Y ), f (Y )h , . . . , f (Y )h
m1
) mod E(Y ).
m1
) mod E(Y ),
151
The polynomials in the running time, alphabet size, and list size
2
depend exponentially on ; specically they are of the form nO(1/ ) .
Nonconstructively, it is possible to have alphabet size (1/)O(/) and
list size O(1/) (see discussion after Theorem 5.8), and one could hope
for running time that is a xed polynomial in n, with an exponent
that is independent of . Recent work has achieved these parameters,
albeit with a randomized construction of codes; it remains open to
have a fully explicit and deterministic construction. However, for a
xed constant-sized alphabet, e.g., q = 2, it is still not known how to
achieve list-decoding capacity.
Open Problem 5.26. For any desired constants , > 0 such that
> 1 H2 (), construct an explicit family of of codes Encn : {0, 1}n
{0, 1}n that have rate at least and are -list-decodable in polynomial
time.
5.3
152
List-Decodable Codes
153
= Pr[Enc(x)y = ry ]
y
= agr(Enc(x), r).
Thus |LIST(Tr , 1/q + )| K if and only if there are at most K messages x whose encodings have agreement greater than 1/q + with r,
which is the same as being (1 1/q , K) list-decodable.
Proposition 5.30. Let Samp and be as in Construction 5.27. Then
(1) Samp is a (, ) averaging sampler i for every function f :
[M ] [0, 1], we have
|LIST (f, (f ) + )| N.
(2) Samp is a (, ) boolean averaging sampler i for every set
T [M ], we have
|LIST (T, (T ) + )| N.
154
List-Decodable Codes
Noting that the sets Tr in Proposition 5.29 have density (Tr ) = 1/q,
we see that the averaging-sampler property implies the standard listdecoding property:
Corollary 5.31. If Samp is a (, ) boolean averaging sampler of
the form Samp(x)y = (y, Enc(x)y ), then Enc is (1 1/q , N ) listdecodable.
Note, however, that the typical settings of parameters of samplers
and list-decodable codes are very dierent. With codes, we want the
alphabet size q to be as small as possible (e.g., q = 2) and the blocklength D to be linear or polynomial in the message length n = log N ,
so M = qD is also linear or polynomial in n = log N . In contrast, we
usually are interested in samplers for functions on exponentially large
domains (e.g., M = 2(n) ).
In Section 6, we will see a converse to Corollary 5.31 when the
alphabet size is small: if Enc is (1 1/q , N ), list-decodable, then
Samp is an (/, q ) averaging sampler.
For expanders, it will be convenient to state the list-decoding property in terms of the following variant of vertex expansion, where we
only require that sets of size exactly K expand:
Denition 5.32. For K N, a bipartite multigraph G is an (= K, A)
vertex expander if all sets S consisting of K left-vertices, the neighborhood N (S) is of size at least A K.
Thus, G is a (K, A) vertex expander in the sense of Denition 4.3
i G is an (= K , A) vertex expander for all positive integers K K.
Proposition 5.33. For K N, : [N ] [D] [M ] is an (= K, A)
vertex expander i for every set T [D] [M ] such that |T | < KA,
we have:
|LIST (T, 1)| < K.
155
Proof.
not an (= K, A) expander
S [N ] s.t. |S| = K and |N (S)| < KA
S [N ] s.t. |S| K and |N (S)| < KA
T [M ] s.t. |LIST(T, 1)| K and |T | < KA,
where the last equivalence follows because if T = N (S), then S
LIST(T, 1), and conversely if S = LIST(T, 1) then N (S) T .
On one hand, this list-decoding property seems easier to establish
than the ones for codes and samplers because we look at LIST(T, 1)
instead of LIST(T, (T ) + ). On the other hand, to get expansion
(i.e., A > 1), we require a very tight relationship between |T | and
|LIST(T, 1)|. In the setting of codes or samplers, we would not care
much about a factor of 2 loss in |LIST(T )|, as this just corresponds to
a factor of 2 in list size or error probability. But here it corresponds
to a factor of 2 loss in expansion, which can be quite signicant. In
particular, we cannot aord it if we are trying to get A = (1 ) D,
as we will be in the next section.
5.4
Despite the substantial dierences in the standard settings of parameters between codes, samplers, and expanders, it can be very useful
to translates ideas and techniques from one object to the other using
the connections described in the previous section. In particular, in this
section we will see how to build graphs with extremely good vertex
expansion (A = (1 )D) from ParvareshVardy codes.
Consider the bipartite multigraph obtained from the Parvaresh
Vardy codes (Construction 5.21) via the correspondence of Construction 5.27. That is, we dene a neighbor function : Fnq Fq Fq
Fm
q by
(f, y) = [y, f0 (y), f1 (y), . . . , fm1 (y)],
(5.9)
156
List-Decodable Codes
157
(In the list-decoding algorithm, the left-hand side of this inequality was
q, since we were bounding |LIST(Tr , )|.)
Once we have this, we can reduce both sides modulo E(Y ) and
deduce
0 = Q(Y, f0 (Y ), f1 (Y ), . . . , fm1 (Y )) mod E(Y )
2
= Q(Y, f (Y ), f (Y )h , . . . , f (Y )h
m1
) mod E(Y )
m1
) mod E(Y ),
158
List-Decodable Codes
Proof. Let n = log N and k = log Kmax . Let h = (2nk/)1/ and let q
be the power of 2 in the interval (h1+ /2, h1+ ].
Set m = (log Kmax )/(log h), so that hm1 Kmax hm . Then,
by Theorem 5.34, the graph : Fnq Fq Fm+1
dened in (5.9) is
q
m
an (h , A) expander for A = q nhm. Since Kmax hm , it is also a
(Kmax , A) expander.
Note that the number of left-vertices in is q n N , and the number
of right-vertices is
1+
.
M = q m+1 q 2 h(1+)(m1) q 2 Kmax
The degree is
D = q h1+ = O(nk/)1+1/ = O((log N )(log Kmax )/)1+1/ .
To see that the expansion factor A = q nhm q nhk is at least
(1 )D = (1 )q, note that
nhk (/2) h1+ q,
where the rst inequality holds because h 2nk/.
Finally, the construction is explicit because a description of Fq for q
a power of 2 (i.e., an irreducible polynomial of degree log q over F2 )
as well as an irreducible polynomial E(Y ) of degree n over Fq can be
found in time poly(n, log q) = poly(log N, log D).
These expanders are of polylogarithmic rather than constant degree.
But the expansion is almost as large as possible given the degree
(A = (1 ) D), and the size of the right-hand side is almost as
small as possible (in a (K, A) expander, we must have M KA =
(1 )KD). In particular, these expanders can be used in the data
structure application of Problem 4.10 storing a K-sized subset of [N ]
using K 1.01 polylog(N ) bits in such a way that membership can be
probabilistically tested by reading 1 bit of the data structure. (An ecient solution to that application actually requires more than the graph
being explicit in the usual sense, but also that there are ecient algorithms for nding all left-vertices having at least some fraction neighbors in a given set T [M ] of right vertices, but the expanders above
5.5 Exercises
159
5.5
Exercises
160
List-Decodable Codes
5.5 Exercises
161
162
List-Decodable Codes
parity checks are not satised, and argue that the number of
errors decreases by a constant factor. It may be useful to use
the results of Problem 4.10.)
By a probabilistic argument like Theorem 4.4, graphs G as above exist
with D = O(1), K = (N ), M = (1 (1))N , arbitrarily small constant > 0, and N , and in fact explicit constructions are known.
This yields explicit LDPC codes with constant rate and constant distance (asymptotically good LDPC codes).
list
decoding
of
ReedSolomon
163
s1 or s2 . That is, for a question f : {0, 1}n {0, 1}, the oracle may
answer with either f (s1 ) or f (s2 ). Here it turns out to be impossible
to pin down either of the secrets with certainty, no matter how many
questions we ask, but we can hope to compute a small list L of secrets
such that |L {s1 , s2 }| = 0. (In fact, |L| can be made as small as 2.)
This variant of twenty questions apparently was motivated by problems
in Internet trac routing.
(1) Let Enc : {0, 1}n {0, 1}n be a code such that every two
codewords in Enc agree in at least a 1/2 fraction of positions and that Enc has a polynomial-time (1/4 + , ) listdecoding algorithm. Show how to solve the above problem in
polynomial time by asking the n
questions {fi } dened by
fi (x) = Enc(x)i .
(2) Taking Enc to be the code constructed in Problem 1, deduce
that n
= poly(n) questions suces.
5.6
164
List-Decodable Codes
Stichtenoth [375]. The nonconstructive bound for list decoding of Theorem 5.8 (Part 2) is due to Elias [131], and it has been extended to
linear codes in [186, 187, 428]. The Johnson Bound (Proposition 5.10)
was proven for binary codes by Johnson [224, 225]. An optimized form
of the bound can be found in [191].
The binary (q = 2) case of ReedMuller codes was introduced independently by Reed [324] and Muller [293] in the mid-50s. ReedSolomon
codes were introduced in 1960 by Reed and Solomon [325]. Polynomial time unique-decoding algorithms for ReedSolomon Codes include
those of Peterson [308] and Berlekamp [65]. The rst nontrivial list
decoding algorithm was given by Goldreich and Levin [168]; while
their algorithm is stated in the language of hardcore bits for oneway functions, it can be viewed as an ecient local list-decoding
algorithm for the Hadamard Code. (Local decoding algorithms are discussed in Section 7.) The list-decoding algorithm for ReedSolomon
Codes of Theorem 5.19 is from the seminal work of Sudan [378], which
sparked a resurgence in the study of list decoding. The improved
decoding algorithm of Problem 5.6, Part 2 is due to Guruswami and
Sudan [189]. ParvareshVardy codes and their decoding algorithm are
from [306]. Folded ReedSolomon Codes and their capacity-achieving
list-decoding algorithm are due to Guruswami and Rudra [188]. (In all
cases, our presentation uses a simplied setting of parameters compared
to the original algorithms.) The list size, decoding time, and alphabet
size of Folded ReedSolomon Codes have recently been improved in
[125, 185, 194], with the best parameters to date being achieved in the
randomized construction of [194], while [125] give a fully deterministic
construction.
The list-decoding views of expanders and samplers emerged out
of work on randomness extractors (the subject of Section 6). Specically, a close connection between extractors and expanders was understood already at the time that extractors were introduced by Nisan
and Zuckerman [303]. An equivalence between extractors and averaging samplers was established by Zuckerman [427] (building on previous
connections between other types of samplers and expanders [107, 365]).
A connection between list-decodable codes and extractors emerged in
the work of Trevisan [389], and the list-decoding view of extractors
165
6
Randomness Extractors
166
6.1
6.1.1
167
168
Randomness Extractors
0 < i 1 for some constant > 0. How can we deal with such
a source?
It can be shown that when we take a parity of bits from such
an independent-bit source, the result approaches an unbiased coin
ip exponentially fast in , i.e., | Pr i=1 Xi = 1 1/2| = 2() . The
result is not a perfect coin ip but is as good as one for almost all
purposes.
Lets be more precise about the problems we are studying. A source
on {0, 1}n is simply a random variable X taking values in {0, 1}n . In
each of the above examples, there is an implicit class of sources being
studied. For example, IndBitsn, is the class of sources X on {0, 1}n
where the bits Xi are independent and satisfy Pr[Xi = 1] 1 .
We could dene IIDBitsn, to be the same with the further restriction that all of the Xi s are identically distributed, i.e., Pr[Xi = 1] =
Pr[Xj = 1] for all i, j, thereby capturing von Neumann sources.
Denition 6.1 (deterministic extractors). 1 Let C be a class of
sources on {0, 1}n . An -extractor for C is a function Ext : {0, 1}n
{0, 1}m such that for every X C, Ext(X) is -close to Um .
Note that we want a single function Ext that works for all sources
in the class. This captures the idea that we do not want to assume
we know the exact distribution of the physical source we are using,
but only that it comes from some class. For example, for IndBitsn, ,
we know that the bits are independent and none are too biased, but
not the specic bias of each bit. Note also that we only allow the
extractor one sample from the source X. If we want to allow multiple independent samples, then this should be modelled explicitly in
our class of sources; ideally we would like to minimize the independence
assumptions used.
We still need to dene what we mean for the output to be -close
to Um .
1 Such
169
170
Randomness Extractors
171
xX
"
1
.
Pr [X = x]
1
,
Pr [X = x]
172
Randomness Extractors
Lemma 6.8 (properties of entropy). For each of the entropy measures H {HSh , H2 , H } and random variables X, Y , we have:
H(X) 0, with equality i X is supported on a single
element,
H(X) log |Supp(X)|, with equality i X is uniform on
Supp(X),
if X, Y are independent, then H((X, Y )) = H(X) + H(Y ),
for every deterministic function f , we have H(f (X)) H(X),
and
for every X, we have H (X) H2 (X) HSh (X).
173
174
Randomness Extractors
Seeded Extractors
175
176
Randomness Extractors
N
N K (where N = 2n ), which is unfortunately a larger doubleis K
exponential in k. We can overcome this gap by allowing the extractor
to be slightly probabilistic, i.e., allowing the extractor a seed consisting of a small number of truly random bits in addition to the weak
random source. We can think of this seed of truly random bits as a
random choice of an extractor from family of extractors. This leads to
the following crucial denition:
Denition 6.13 (seeded extractors). A function Ext : {0, 1}n
{0, 1}d {0, 1}m is a (k, )-extractor if for every k-source X on {0, 1}n ,
Ext(X, Ud ) is -close to Um .
(Sometimes we will refer to extractors Ext : [N ] [D] [M ] whose
domain and range do not consist of bit-strings. These are dened in the
natural way, requiring that Ext(X, U[D] ) is -close to U[M ] .)
The goal is to construct extractors that minimize d and maximize m.
We prove the following theorem.
Theorem 6.14. For every n N, k [0, n] and > 0, there exists
a (k, )-extractor Ext : {0, 1}n {0, 1}d {0, 1}m with m = k + d
2 log(1/) O(1) and d = log(n k) + 2 log(1/) + O(1).
One setting of parameters to keep in mind (for our application of
simulating randomized algorithms with a weak source) is k = n, with
a xed constant (e.g., = 0.01), and a xed constant (e.g., = 0.01).
Proof. We use the Probabilistic Method. By Lemma 6.10, it suces for
Ext to work for at k-sources. Choose the extractor Ext at random.
Then the probability that the extractor fails is at most the number of
at k-sources times the probability Ext fails for a xed at k-source.
By the above proposition, the probability of failure for a xed at k2
source is at most 2(KD ) , since (X, Ud ) is a at (k + d)-source) and
m = k + d 2 log( 1 ) O(1). Thus the total failure probability is at
most
N
N e K (KD2 )
(KD2 )
2
2
.
K
K
177
Then for every k-source X on {0, 1}n , A (w; X) has error probability
at most 2 ( + ).
Proof. The probability that A(w; Ext(X, Ud )) is incorrect is not more
than the probability A(w; Um ) is incorrect plus , i.e., + , by
the denition of statistical dierence. Then the probability that
majy A(w, Ext(X, y)) is incorrect is at most 2 ( + ), because each
error of majy A(w; Ext(x, y)) corresponds to A(w; Ext(x, Ud )) erring
with probability at least 1/2.
Note that the enumeration incurs a 2d factor slowdown in the simulation. Thus, to retain running time poly(m), we want to construct
extractors where (a) d = O(log n); (b) Ext is computable in polynomial
time; and (c) m = n(1) .
We remark that the error probability in Proposition 6.15 can actually be made exponentially small by using an extractor that is designed
for slightly lower min-entropy. (See Problem 6.2.)
178
Randomness Extractors
6.2
179
3.5 refers to pairwise independent families, but a similar argument shows that
universal families require (n) random bits. (Instead of constructing orthogonal vectors,
we construct vectors that have nonpositive dot product.)
180
Randomness Extractors
D
K
M
DM
To see the penultimate inequality, note that CP(H) = 1/D because
there are D hash functions, CP(X) 1/K because H (X) k, and
Pr [H(X) = H(X ) |X = X ] 1/M by 2-universality.
Proof of (2):
1
DM
2
1
2
1+
=
.
DM
DM
DM
DM
(H, H(X)) Ud Um
2
DM
2
2
DM
= .
2
((H, H(X)), Ud Um ) =
181
The proof above actually shows that Ext(x, h) = h(x) extracts with
respect to collision probability, or equivalently, with respect to the
2 -norm. This property may be expressed in terms of Renyi entropy
def
H2 (Z) = log(1/CP(Z)). Indeed, we can dene Ext : {0, 1}n {0, 1}d
{0, 1}m to be a (k, ) Renyi-entropy extractor if H2 (X) k implies
H2 (Ext(X, Ud )) m (or H2 (Ud , Ext(X, Ud )) m + d for strong
Renyi-entropy extractors). Then the above proof shows that pairwiseindependent hash functions yield strong Renyi-entropy extractors.
In general, it turns out that an extractor with respect to
Renyi entropy must have seed length d min{m/2, n k} O(1) (as
opposed to d = O(log n)); this explains why the seed length in the above
extractor is large. (See Problem 6.4.)
6.2.2
182
Randomness Extractors
Pr[U[M ] T ]| ,
where US denotes the uniform distribution on S. This inequality
may be expressed in graph-theoretic terms as follows. For every
set T [M ],
183
,
U
)
Pr
U
T
Pr
Ext(U
S
[D]
[M ]
e(S, T )
|T |
|S|D
M
e(S, T )
(S)(T ) (S),
ND
184
Randomness Extractors
Expanders
Measured by vertex or spectral
expansion
Typically constant degree
All sets of size at most K expand
Typically balanced
6.2.3
Extractors
Measured by min-entropy/
statistical dierence
Typically logarithmic or
poly-logarithmic degree
All sets of size exactly (or at
least) K expand
Typically unbalanced, bipartite
graphs
In this section, we cast extractors into the same list-decoding framework that we used to capture list-decodable codes, samplers, and
expanders (in Section 5.3). Recall that all of these objects could be
syntactically described as functions : [N ] [D] [M ], and their
properties could be captured by bounding the sizes of sets of the
def
form LIST (T, ) = {x : Pry [(x, y) T ] > } for T [M ]. We also
185
Proof.
(1) Suppose for contradiction that |LIST (f, (f ) + )| K.
Let X be uniformly distributed over LIST (f, (f ) + ).
Then X is a k-source, and
E[f (Ext(X, U[D] ))] = E [f (Ext(x, U[D] ))]
R
xX
> (f ) +
= E[f (U[M ] )] + .
By Problem 6.1, this implies that Ext(X, U[D] ) and U[M ]
are -far, contradicting the hypothesis that Ext is a (k, )
extractor.
(2) Let X be any (k + log(1/))-source. We need to show that
Ext(X, U[D] ) is 2-close to U[M ] . That is, we need to show
that for every T [M ], Pr[Ext(X, U[D] ) T ] (T ) + 2.
186
Randomness Extractors
187
Proof.
(1) Follows from Corollaries 5.31 and 6.24.
(2) Let X be a (k + log(1/))-source and Y = U[D] . Then the
statistical dierence between (Y, Ext(X, Y )) and Y U[M ]
equals
((Y, Ext(X, Y )), Y U[M ] )
= E [(Ext(X, y), U[M ] )]
R
y Y
= E [(Enc(X)y , U[M ] )]
R
y Y
M
E max Pr[Enc(X)y = z] 1/M ,
R
z
2 yY
188
Randomness Extractors
(2(k+log(1/)) K + )
2
M .
Note that the quantitative relationship between extractors and
list-decodable codes given by Proposition 6.25 deteriorates extremely
fast as the output length/alphabet size increases. Nevertheless, the
list-decoding view of extractors as given in Proposition 6.23 turns out
to be quite useful.
6.3
Constructing Extractors
Block Sources
189
190
Randomness Extractors
191
192
Randomness Extractors
6.3.3
193
Condensers
194
Randomness Extractors
with
d=
195
The Extractor
In this section, we will use the ideas outlined in the previous section
namely condensing and block-source extraction to construct an
extractor that is optimal up to constant factors.
Theorem 6.36. For all positive integers n k and all > 0, there
is an explicit (k, ) extractor Ext : {0, 1}n {0, 1}d {0, 1}m with
m k/2 and d = O(log(n/)).
We will use the following building block, constructed in Problem 6.9.
Lemma 6.37. For every constant t > 0 and all positive integers
n k and all > 0, there is an explicit (k, ) extractor Ext :
{0, 1}n {0, 1}d {0, 1}m with m k/2 and d = k/t + O(log(n/)).
The point is that this extractor has a seed length that is an
arbitrarily large constant factor (approximately t/2) smaller than its
output length. Thus, if we use it as Ext2 in the block-source extraction
of Lemma 6.27, the resulting seed length will be smaller than that of
Ext1 by an arbitrarily large constant factor. (The seed length of the
composed extractor Ext in Lemma 6.27 is the same of that as Ext2 ,
which will be a constant factor smaller than its output length m2 ,
which we can take to be equal to the seed length d1 of Ext1 .)
Overview of the Construction. Note that for small minentropies k, namely k = O(log(n/)), the extractor we want is already
given by Lemma 6.37 with seed length d smaller than the output
length m by any constant factor. (If we allow d m, then extraction
is trivial just output the seed.) Thus, our goal will be to recursively
construct extractors for large min-entropies using extractors for smaller
min-entropies. Of course, if Ext : {0, 1}n {0, 1}d {0, 1}m is a (k0 , )
196
Randomness Extractors
extractor, say with m = k0 /2, then it is also a (k, ) extractor for every
k k0 . The problem is that the output length is only k0 /2 rather than
k/2. Thus, we need to increase the output length. This can be achieved
by simply applying extractors for smaller min-entropies several times:
Lemma 6.38. Suppose Ext1 : {0, 1}n {0, 1}d1 {0, 1}m1 is a (k1 , 1 )
extractor and Ext2 : {0, 1}n {0, 1}d2 {0, 1}m2 is a (k2 , 2 ) extractor
for k2 = k1 m1 log(1/3 ). Then Ext : {0, 1}n {0, 1}d1 +d2
{0, 1}m1 +m2 dened by Ext (x, (y1 , y2 )) = (Ext1 (x, y1 ), Ext2 (x, y2 )) is a
(k1 , 1 + 2 + 3 ) extractor.
The proof of this lemma follows from Lemma 6.30. After conditioning a k1 -source X on W = Ext1 (X, Ud1 ), X still has min-entropy
at least k1 m1 log(1/3 ) = k2 (except with probability 3 ), and
thus Ext2 (X, Ud2 ) can extract an additional m2 almost-uniform bits.
To see how we might apply this, consider setting k1 = 0.8k and
m1 = k1 /2, 1 = 2 = 3 = 20.1k , k2 = k1 m1 log(1/3 )
[0.3k, 0.4k], and m2 = k2 /2. Then we obtain a (k, 3) extractor Ext
with output length m = m1 + m2 > k/2 from two extractors for
min-entropies k1 , k2 that are smaller than k by a constant factor, and
we can hope to construct the latter two extractors recursively via the
same construction.
Now, however, the problem is that the seed length grows by
a constant factor in each level of recursion (e.g., if d1 = d2 = d in
Lemma 6.38, we get seed length 2d rather than d). Fortunately, block
source extraction using the extractor of Lemma 6.37 gives us a method
to reduce the seed length by a constant factor. (The seed length of the
composed extractor Ext in Lemma 6.27 is the same of that as Ext2 ,
which will be a constant factor smaller than its output length m2 ,
which we can take to be equal to the seed length d1 of Ext1 . Thus, the
seed length of Ext will be a constant factor smaller than that of Ext1 .)
In order to apply block source extraction, we rst need to convert our
source to a block source; by Lemma 6.31, we can do this by using the
condenser of Theorem 6.34 to make its entropy rate close to 1.
One remaining issue is that the error still grows by a constant factor in each level of recursion. However, we can start with
197
polynomially small error at the base of the recursion and there are only
logarithmically many levels of recursion, so we can aord this blow-up.
We now proceed with the proof details. It will be notationally
convenient to do the steps in the reverse order from the description
above rst we will reduce the seed length by a constant factor via
block-source extraction, and then apply Lemma 6.38 to increase the
output length.
Proof of Theorem 6.36. Fix n N and 0 > 0. Set d = c log(n/0 )
for an error parameter 0 and a suciently large constant c to be
determined in the proof below. (To avoid ambiguity, we will keep
the dependence on c explicit throughout the proof, and all big-Oh
notation hides universal constants independent of c.) For k [0, n], let
i(k) be the smallest nonnegative integer i such that k 2i 8d. This
will be the level of recursion in which we handle min-entropy k; note
that i(k) log k log n.
For every k [0, n], we will construct an explicit Extk :
{0, 1}n {0, 1}d {0, 1}k/2 that is a (k, i(k) ) extractor, for an
appropriate sequence 0 1 2 . Note that we require the seed
length to remain d and the fraction of min-entropy extracted to remain
1/2 for all values of k. The construction will be by induction on i(k).
Base Case: i(k) = 0, i.e., k 8d. The construction of Extk follows
from Lemma 6.37, setting t = 9 and taking c to be a suciently large
constant.
Inductive Case: We construct Extk for i(k) 1 from extractors
Extk with i(k ) < i(k) as follows. Given a k-source X of length n,
Extk works as follows.
(1) We apply the condenser of Theorem 6.34 to convert X into
a source X that is 0 -close to a k-source of length (9/8)k +
O(log(n/0 )). This requires a seed of length O(log(n/0 )).
(2) We divide X into two equal-sized halves (X1 , X2 ). By
Lemma 6.31, (X1 , X2 ) is 20 -close to a 2 k block source for
k = k/2 k/8 O(log(n/0 )).
198
Randomness Extractors
199
Method
Seed Length d
O(log2 n)
Output Length m
k + d O(1)
k(1)
n
k + d O(1)
(1 )k,
any constant > 0
k O(1)
While Theorem 6.36 and Corollary 6.39 give extractors that are
optimal up to constant factors in both the seed length and output
length, it remains an important open problem to get one or both of
these to be optimal to within an additive constants while keeping the
other optimal to within a constant factor.
Open Problem 6.41. Give an explicit construction of (k, 0.01) extractors Ext : {0, 1}n {0, 1}d {0, 1}m with seed length d = O(log n)
and output length m = k + d O(1).
200
Randomness Extractors
Open Problem 6.42. Give an explicit construction of (k, 0.01) extractors Ext : {0, 1}n {0, 1}d {0, 1}m with seed length d = log n + O(1)
and m = (k) (or even m = k (1) ).
One of the reasons that these open problems are signicant is that,
in many applications of extractors, the resulting complexity depends
exponentially on the seed length d and/or the entropy loss k + d m.
(An example is the simulation of BPP with weak random sources given
by Proposition 6.15.) Thus, additive constants in these parameters
corresponds to constant multiplicative factors in complexity.
Another open problem is more aesthetic in nature. The construction of Theorem 6.36 makes use of the condenser of Theorem 6.36, the
Leftover Hash Lemma (Theorem 6.18) and the composition techniques
of Lemmas 6.27 and 6.38 in a somewhat complex recursion. It is of
interest to have a construction that is more direct. In addition to the
aesthetic appeal, such a construction would likely be more practical
to implement and provide more insight into extractors. In Chapter 7,
we will see a very direct construction based on a connection between
extractors and pseudorandom generators, but its parameters will
be somewhat worse than Theorem 6.36. Thus the following remains
open:
6.3.5
201
202
Randomness Extractors
that retains any extra entropy in (x1 , y1 ) that did not get
extracted into z1 . So a natural idea is to just do block source
extraction, but output (z1 , b1 ) rather than just z1 . However,
this runs into trouble with the next case.
The rst block has no entropy but the second block is
completely uniform given the rst. In this case, the G2 step
cannot add any entropy and the G1 step does not add any
entropy because it is a permutation. However, the G1 step
transfers entropy into z1 . So if we add another expander-step
from b1 at the end, we can argue that it will add entropy.
This gives rise to the 3-step denition of the zigzag product.
While we analyzed the zigzag product with respect to spectral
expansion (i.e., Renyi entropy), it is also possible to analyze it in
terms of a condenser-like denition (i.e., outputting distributions
-close to having some min-entropy). It turns out that a variant
of the zigzag product for condensers leads to a construction of
constant-degree bipartite expanders with expansion (1 ) D for
the balanced (M = N ) or nearly balanced (e.g., M = (N )) case.
However, as mentioned in Open Problems 4.43, 4.44, and 5.36, there
are still several signicant open problems concerning the explicit
construction of expanders with vertex expansion close to the degree,
achieving expansion D O(1) in the nonbipartite case, and achieving
a near-optimal number of right-hand vertices.
6.4
Exercises
1
|X Y |1 ,
2
6.4 Exercises
203
Problem 6.3 (Almost-Universal Hashing). A family H of functions mapping domain [N ] to [M ] is said to have collision probability
204
Randomness Extractors
H H
Problem 6.4 (R
enyi
extractors). Call a function Ext :
n
d
m
{0, 1} {0, 1} {0, 1}
a (k, ) Renyi extractor if for every
source X on {0, 1}n of Renyi entropy at least k, it holds that
Ext(X, Ud ) has Renyi entropy at least m .
6.4 Exercises
205
(3) Show that if Ext : {0, 1}n {0, 1}d {0, 1}m is a (k, 1)
Renyi extractor, then d min{n k, m/2} O(1). (Hint:
consider a k-source that is uniform over {x : yExt(x, y) T }
for an appropriately chosen set T .)
206
Randomness Extractors
6.4 Exercises
207
Problem 6.9
(The
Building-Block
Extractor). Prove
Lemma 6.37: Show that for every constant t > 0 and all positive integers n k and all > 0, there is an explicit (k, )extractor Ext : {0, 1}n {0, 1}d {0, 1}m with m k/2 and
d = k/t + O(log(n/)). (Hint: convert the source into a block
source with blocks of length k/O(t) + O(log(n/)).)
208
Randomness Extractors
a random variable taking values in {0, 1}n . We say that (Enc, Dec)
is (statistically) -secure with respect to K if for every two messages
u, v {0, 1}m , we have (Enc(K, u), Enc(K, v)) . For example, the
one-time pad, where n = m = and Enc(k, u) = k u = Dec(k, u) is
0-secure (a.k.a perfectly secure) with respect to the uniform distribution K = Um . For a class C of sources on {0, 1}n , we say that the
encryption scheme (Enc, Dec) is -secure with respect to C if Enc is
-secure with respect to every K C.
(1) Show that if there exists a deterministic -extractor
Ext : {0, 1}n {0, 1}m for C, then there exists an 2-secure
encryption scheme with respect to C.
(2) Conversely, use the following steps to show that if there
exists an -secure encryption scheme (Enc, Dec) with
respect to C, where Enc : {0, 1}n {0, 1}m {0, 1} , then
there exists a deterministic 2-extractor Ext: {0, 1}n
{0, 1}m2 log(1/)O(1) for C, provided m log n + 2 log
(1/) + O(1).
(a) For each xed key k {0, 1}n , dene a source Xk on
{0, 1} by Xk = Enc(k, Um ), and let C be the class
of all these sources (i.e., C = {Xk : k {0, 1}n }).
Show that there exists a deterministic -extractor
Ext : {0, 1} {0, 1}m2 log(1/)O(1) for C , provided
m log n + 2 log(1/) + O(1).
(b) Show that if Ext is a deterministic -extractor for C
and Enc is -secure with respect to C, then Ext(k) =
Ext (Enc(k, 0m )) is a deterministic 2-extractor for C.
Thus, a class of sources can be used for secure encryption i it is
deterministically extractable.
209
6.5
210
Randomness Extractors
211
7
Pseudorandom Generators
7.1
213
random bits for any BPP algorithm can be reduced from poly(n) to
O(log n), and then eliminate the randomness entirely by enumeration.
Thus, we would like to have a function G that stretches a seed
of d = O(log n) truly random bits into m = poly(n) bits that look
random. Such a function is called a pseudorandom generator. The
question is how we can formalize the requirement that the output
should look random in such a way that (a) the output can be used
in place of the truly random bits in any BPP algorithm, and (b) such
a generator exists.
Some candidate denitions for what it means for the random
variable X = G(Ud ) to look random include the following:
Information-theoretic or statistical measures: For example,
we might measure entropy of G(Ud ), its statistical dierence
from the uniform distribution, or require pairwise independence. All of these fail one of the two criteria. For example,
it is impossible for a deterministic function to increase
entropy from O(log n) to poly(n). And it is easy to construct
algorithms that fail when run using random bits that are
only guaranteed to be pairwise independent.
Kolmogorov complexity: A string x looks random if it is
incompressible (cannot be generated by a Turing machine
with a description of length less than |x|). An appealing
aspect of this notion is that it makes sense of the randomness
in a xed string (rather than a distribution). Unfortunately,
it is not suitable for our purposes. Specically, if the function
G is computable (which we certainly want) then all of its
outputs have Kolmogorov complexity d = O(log n) (just
hardwire the seed into the TM computing G), and hence are
very compressible.
Computational indistinguishability: This is the measure we
will use. Intuitively, we say that a random variable X looks
random if no ecient algorithm can distinguish X from a
truly uniform random variable. Another perspective comes
from the denition of statistical dierence:
(X, Y ) = max | Pr[X T ] Pr[Y T ]|.
T
214
Pseudorandom Generators
Computational Indistinguishability
215
Pseudorandom Generators
216
Pseudorandom Generators
217
The idea is to replace the random bits used by A with pseudorandom bits generated by G, use the pseudorandomness property to show
that the algorithm will still be correct with high probability, and nally
enumerate over all possible seeds to obtain a deterministic algorithm.
Claim 7.6. For every x of length n, A(x; G(Ud(nc ) )) errs with
probability smaller than 1/2.
Proof of Claim: Suppose that there exists some x on which
A(x; G(Ud(nc ) )) errs with probability at least 1/2. Then T () = A(x, )
is a Boolean circuit of size at most nc that distinguishes G(Ud(nc ) )
from Unc with advantage at least 1/2 1/3 > 1/8. (Notice that we are
using the input x as nonuniform advice; this is why we need the PRG
to be pseudorandom against nonuniform tests.)
Now, enumerate over all seeds of length d(nc ) and take a majority
c
vote. There are 2d(n ) of them, and for each we have to run both G
and A.
Notice that we can aord for the generator G to have running time
t(m) = poly(m) or even t(m) = poly(m) 2O(d(m)) without aecting
the time of the derandomization by than more than a polynomial
amount. In particular, for this application, it is OK if the generator
runs in more time than the tests it fools (which are time m in this
theorem). That is, for derandomization, it suces to have G that is
mildly explicit according to the following denition:
Denition 7.7.
(1) A generator G : {0, 1}d(m) {0, 1}m is mildly explicit if it is
computable in time poly(m, 2d(m) ).
(2) A generator G : {0, 1}d(m) {0, 1}m is fully explicit if it is
computable in time poly(m).
218
Pseudorandom Generators
(3) Suppose that there is an (m, 1/8) mildly explicit PRG with
seed length d(m) = O(log m). Then BPP = P.
Of course, all of these derandomizations are contingent on the
question of whether PRGs exist. As usual, our rst answer is yes but
the proof is not very helpful it is nonconstructive and thus does not
provide for an eciently computable PRG:
Proposition 7.8. For all m N and > 0, there exists a (nonexplicit)
(m, ) pseudorandom generator G : {0, 1}d {0, 1}m with seed length
d = O(log m + log(1/)).
Proof. The proof is by the probabilistic method. Choose
G : {0, 1}d {0, 1}m at random. Now, x a time m algorithm, T .
The probability (over the choice of G) that T distinguishes G(Ud )
d 2
from Um with advantage is at most 2(2 ) , by a Cherno
bound. There are 2poly(m) nonuniform algorithms running in time m
(i.e., circuits of size m). Thus, union-bounding over all possible T ,
219
7.2
Cryptographic PRGs
The theory of computational pseudorandomness discussed in this section emerged from cryptography, where researchers sought a denition that would ensure that using pseudorandom bits instead of truly
random bits (e.g., when encrypting a message) would retain security
against all computationally feasible attacks. In this setting, the generator G is used by the honest parties and thus should be very ecient
to compute. On the other hand, the distinguisher T corresponds to an
attack carried about by an adversary, and we want to protect against
adversaries that may invest a lot of computational resources into trying
to break the system. Thus, one is led to require that the pseudorandom
generators be secure even against distinguishers with greater running
time than the generator. The most common setting of parameters in the
theoretical literature is that the generator should run in a xed polynomial time, but the adversary can run in an arbitrary polynomial time.
Denition 7.9. A generator Gm : {0, 1}d(m) {0, 1}m is a cryptographic pseudorandom generator if:
(1) Gm is fully explicit. That is, there is a constant b such that
Gm is computable in time mb .
220
Pseudorandom Generators
1
nc
221
222
Pseudorandom Generators
(1)
If s() = 2
(as is plausible for the factoring one-way
function), then we get seed length d(m) = poly(log m) and
BPP P.
But we cannot get seed length d(m) = O(log m), as needed for concluding BPP = P, from this result. Even for the maximum possible
hardness s() = 2() , we get d(m) = poly(log m). In fact, Problem 7.3
shows that it is impossible to have a cryptographic PRG with seed
length O(log m) meeting Denition 7.9, where we require that Gm
be pseudorandom against all poly(m)-time algorithms. However, for
derandomization we only need Gm to be pseudorandom against a xed
poly-time algorithm, e.g., running in time t = m, and we would get such
generators with seed length O(log m) if the aforementioned construction
could be improved to yield seed length d = O() instead of d = poly().
Open Problem 7.13. Given a one-way function f : {0, 1} {0, 1}
that is hard to invert by algorithms running in time s = s() and a
constant c, it is possible to construct a fully explicit (t, ) pseudorandom generator G : {0, 1}d {0, 1}m with seed length d = O() and
pseudorandomness against time t = s (/m)O(1) ?
The best known seed length for such a generator is
3 log(m/)/ log2 s), which is O(
2 ) for the case that s = 2()
d = O(
()
and m = 2
as discussed above.
The above open problem has long been solved in the positive
for one-way permutations f : {0, 1} {0, 1} . In fact, the construction of pseudorandom generators from one-way permutations has a
particularly simple description:
Gm (x, r) = (x, r, f (x), r, f (f (x)), r, . . . , f (m1) (x), r),
where |r| = |x| = and , denotes inner product modulo 2. One
intuition for this construction is the following. Consider the sequence
(f (m1) (U ), f (m2) (U ), . . . , f (U ), U ). By the fact that f is hard to
invert (but easy to evaluate) it can be argued that the i + 1st component of this sequence is infeasible to predict from the rst i components
except with negligible probability. Thus, it is a computational analogue
223
224
Pseudorandom Generators
7.3
Hybrid Arguments
225
The following proposition illustrates that computational indistinguishability behaves like statistical dierence when taking many
independent repetitions; the distance multiplies by at most the number of copies (cf. Lemma 6.3, Part 6). Proving it will introduce useful
techniques for reasoning about computational indistinguishability, and
will also illustrate how working with such computational notions can
be more subtle than working with statistical notions.
Proposition 7.14. If random variables X and Y are (t, ) indistinguishable, then for every k N, X k and Y k are (t, k) indistinguishable
(where X k represents k independent copies of X).
Note that when t = , this follows from Lemma 6.3, Part 6; the
challenge here is to show that the same holds even when we restrict to
computationally bounded distinguishers.
Proof. We will prove the contrapositive: if there is an ecient
algorithm T distinguishing X k and Y k with advantage greater than
k, then there is an ecient algorithm T distinguishing X and Y
with advantage greater than . The dierence in this proof from the
corresponding result about statistical dierence is that we need to
preserve eciency when going from T to T . The algorithm T will
naturally use the algorithm T as a subroutine. Thus this is a reduction
in the same spirit as reductions used elsewhere in complexity theory
(e.g., in the theory of NP-completeness).
Suppose that there exists a nonuniform time t algorithm T such that
| Pr[T (X k ) = 1] Pr[T (Y k ) = 1]| > k.
(7.1)
226
Pseudorandom Generators
i=1
since the sum telescopes. Thus, there must exist some i [k] such that
Pr[T (Hi1 ) = 1] Pr[T (Hi ) = 1] > , i.e.,
Pr[T (X ki XY i1 ) = 1] Pr[T (X ki Y Y i1 ) = 1] > .
By averaging, there exists some x1 , . . . xki and yki+2 , . . . yk such
that
Pr[T (x1 , . . . xki , X, yki+2 , . . . yk ) = 1]
Pr[T (x1 , . . . xki , Y, yki+2 , . . . yk ) = 1] > .
Then, dene T (z) = T (x1 , . . . xki , z, yki+2 , . . . , yk ). Note that T
is a nonuniform algorithm with advice i, x1 , . . . , xki , yki+2 , . . . yk
hardwired in. Hardwiring these inputs costs nothing in terms of circuit
size. Thus T is a nonuniform time t algorithm such that
Pr[T (X) = 1] Pr[T (Y ) = 1] > ,
contradicting the indistinguishability of X and Y .
While the parameters in the above result behave nicely, with (t, )
going to (t, k), there are some implicit costs. First, the amount of
nonuniform advice used by T is larger than that used by T . This is
hidden by the fact that we are using the same measure t (namely circuit
size) to bound both the time and the advice length. Second, the result is
meaningless for large values of k (e.g., k = t), because a time t algorithm
cannot read more than t bits of the input distributions X k and Y k .
We note that there is an analogue of the above result for computational indistinguishability against uniform algorithms (Denition 7.2),
but it is more delicate, because we cannot simply hardwire i,
x1 , . . . , xki , yki+2 , . . . , yk as advice. Indeed, a direct analogue of
the proposition as stated is known to be false. We need to add the
227
Next-Bit Unpredictability
1
+ ,
2
228
Pseudorandom Generators
i=1
since the sum telescopes. Thus, there must exist an i such that
Pr[T (Hi ) = 1] Pr[T (Hi1 ) = 1] > /m.
This says that T is more likely to output 1 when we put Xi in the
ith bit than when we put a random bit Ui . We can view Ui as being
Xi with probability 1/2 and being Xi with probability 1/2. The only
advantage T has must be coming from the latter case, because in the
former case, the two distributions are identical. Formally,
/m < Pr[T (Hi ) = 1] Pr[T (Hi1 ) = 1]
= Pr[T (X1 Xi1 Xi Ui+1 Um ) = 1]
229
1
Pr[T (X1 Xi1 Xi Ui+1 Um ) = 1]
2
1
+ Pr[T (X1 Xi1 Xi Ui+1 Um ) = 1]
2
1
= (Pr[T (X1 Xi1 Xi Ui+1 Um ) = 1]
2
Pr[T (X1 Xi1 Xi Ui+1 Um ) = 1]).
> + .
2
m
Note that as described P runs in time t + O(m). Recalling that we are
using circuit size as our measure of nonuniform time, we can reduce the
running time to t as follows. First, we may nonuniformly x the coin
tosses ui , . . . , um of P while preserving its advantage. Then all P does
is run T on x1 xi1 concatenated with some xed bits and and either
output what T does or its negation (depending on the xed value of ui ).
Fixing some input bits and negation can be done without increasing
circuit size. Thus we contradict the next-bit unpredictability of X.
We note that an analogue of this result holds for uniform distinguishers and predictors, provided that we change the denition of
230
Pseudorandom Generators
R
7.4
7.4.1
231
Average-Case Hardness
232
Pseudorandom Generators
We omit the proof of this proposition, but it follows from Problem 7.5, Part 2 (by setting m = 1, a = 0, and d = in Theorem 7.24).
Note that this generator includes its seed in its output. This is
impossible for cryptographic pseudorandom generators, but is feasible
(as shown above) when the generator can have more resources than
the distinguishers it is trying to fool.
Of course, this generator is quite weak, stretching by only one bit.
We would like to get many bits out. Here are two attempts:
Use concatenation: Dene G(x1 xk ) = x1 xk f (x1 )
f (xk ). This is a (t, k) pseudorandom generator because
G(Uk ) consists of k independent samples of a pseudorandom
distribution and thus computational indistinguishability is
preserved by Proposition 7.14. Note that already here we
are relying on nonuniform indistinguishability, because the
distribution (U , f (U )) is not necessarily samplable (in time
that is feasible for the distinguishers). Unfortunately, however, this construction does not improve the ratio between
output length and seed length, which remains very close to 1.
Use composition: For example, try to get two
bits out using the same seed length by dening
G (x) = G(G(x)1 )G(x)+1 , where G(x)1 denotes
the rst bits of G(x). This works for cryptographic
pseudorandom generators, but not for the generators we are
considering here. Indeed, for the generator G(x) = xf (x) of
Proposition 7.18, we would get G (x) = xf (x)f (x), which is
clearly not pseudorandom.
7.4.2
should be contrasted with the larger class EXP = DTIME(2poly() ). See Problem 7.2.
233
234
Pseudorandom Generators
1
+ ,
(7.2)
2
m
for some i [m]. From P , we construct A that computes f with
probability greater than 1/2 + /m.
Pr[P (f (X|S1 )f (X|S2 ) f (X|Si1 )) = f (X|Si )] >
235
1
+ .
m
2
236
Pseudorandom Generators
7.4.3
that it is unnecessary to allow internal NOT gates, as these can always be pushed
to the inputs via DeMorgans Laws at no increase in size or depth.
237
238
Pseudorandom Generators
Show
that
BPAC0 = AC0
or
even
239
We remark that it has recently been shown how to give an averagecase AC0 simulation of BPAC0 (i.e., the derandomized algorithm is
correct on most inputs); see Problem 7.5.
Another open problem is to construct similar, unconditional
pseudorandom generators as Theorem 7.29 for circuit classes larger
than AC0 . A natural candidate is AC0 [2], which is the same as AC0
but augmented with unbounded-fan-in parity gates. There are known
explicit functions f : {0, 1} {0, 1} (e.g., Majority) for which every
(1/k)
AC0 [2] circuit of depth k computing f has size at least sk () = 2
,
but unfortunately the average-case hardness is much weaker than
we need. These functions are only (sk (), 1/2 1/O())-average-case
hard, rather than (sk (), 1/2 1/sk ())-average-case hard, so we can
only obtain a small stretch using Theorem 7.24 and the following
remains open.
Open Problem 7.33. For every constant k and every m, construct a
o(1)
(mildly) explicit (m, 1/4)-pseudorandom generator Gm : {0, 1}m
m
0
{0, 1} fooling AC [2] circuits of depth k and size m.
7.5
240
Pseudorandom Generators
algorithms is without loss of generality because we can always derandomize the algorithm using (additional) nonuniformity. Specically,
following the proof that BPP P/poly, it can be shown that if f
is worst-case hard for nonuniform deterministic algorithms running
in time t, then it is worst-case hard for nonuniform probabilistic
algorithms running in time t for some t = (t/).
A natural goal is to be able to construct an average-case hard function from a worst-case hard function. More formally, given a function
f : {0, 1} {0, 1} that is worst-case hard for time t = t(), construct a
function f : {0, 1}O() {0, 1} such that f is average-case hard for time
t = t(1) . Moreover, we would like f to be in E if f is in E. (Whether we
can obtain a similar result for NP is a major open problem, and indeed
there are negative results ruling out natural approaches to doing so.)
Our approach to doing this will be via error-correcting codes.
Specically, we will show that if f is the encoding of f in an appropriate kind of error-correcting code, then worst-case hardness of f implies
average-case hardness of f.
Specically, we view f as a message of length L = 2 , and apply an
Pictorially:
we view as a function f : {0, 1} , where = log L.
(Ultimately, we would like = {0, 1}, but along the way we will
discuss larger alphabets.)
Now we argue the average-case hardness of f as follows. Suppose,
for contradiction, that f is not average-case hard. By denition,
there exists an ecient algorithm A with Pr[A(x) = f(x)] > 1 .
We may assume that A is deterministic by xing its coins. Then A
requires reading all 2 values of the received word A and writing all 2
241
Dec
f(x)
242
Pseudorandom Generators
243
be such that dH (g, f ) < . Then for all x [L] we have Pr[Decg (x) =
f(x)] 2/3, where the probability is taken over the coin ips of Dec.
This implies the standard denition of locally decodable codes
under the (mild) constraint that the message symbols are explicitly
included in the codeword, as captured by the following denition (see
also Problem 5.4).
Denition 7.38 (Systematic Encodings). An encoding algo
rithm Enc : {0, 1}L C for a code C L is systematic if there is
such that for
a polynomial-time computable function I : [L] [L]
L
244
Pseudorandom Generators
7.5.1
245
the blocklength), but the problem is that the Hadamard code has
exponentially small rate.
ReedMuller Code. Recall that the q-ary ReedMuller code of
degree d and dimension m consists of all multivariate polynomials
p : Fm
q Fq of total degree at most d. (Construction 5.16.) This code
has minimum distance = 1 d/q. ReedMuller Codes are a common
generalization of both Hadamard and ReedSolomon codes, and thus
we can hope that for an appropriate setting of parameters, we will
be able to get the best of both kinds of codes. That is, we want to
combine the ecient local decoding of the Hadamard code with the
good rate of ReedSolomon codes.
246
Pseudorandom Generators
247
1
.
3(d + 1)
By a union bound,
1
Pr[i, g| (i ) = p| (i )] < (d + 1) = .
3
Thus, with probability greater than 2/3, we have i, q(i ) = p| (i )
and hence q(0) = p(x). The running time of the algorithm
is poly(m, q).
We now show how to improve the decoder to handle a larger fraction
of errors, up to distance = 1/12. We alter Steps 7.43 and 7.43 in the
above algorithm. In Step 7.43, instead of querying only d + 1 points,
we query over all points in . In Step 7.43, instead of interpolation,
we use a global decoding algorithm for ReedSolomon codes to decode
the univariate polynomial p| . Formally, the algorithm proceeds as
follows.
Algorithm 7.45 (Local Corrector for ReedMuller Codes II).
Input: An oracle g : Fm F, an input x Fm , and a degree parameter d, where q = |F| 36 and d q/9.
R
1/3-decoder for ReedSolomon codes follows
from the (1 2 d/q) list-decoding algo
rithm of Theorem 5.19. Since 1/3 1 2 d/q, the list-decoder will produce a list containing all univariate polynomials at distance less than 1/3, and since 1/3 is smaller than
half the minimum distance (1 d/q), there will be only one good decoding.
5A
248
Pseudorandom Generators
Claim 7.46. If g has distance less than = 1/12 from some polynomial
p of degree at most d, and the parameters satisfy q = |F| 36, d q/9,
then Algorithm 7.45 will output p(x) with probability greater than 2/3.
Proof of Claim: The expected distance (between g| and p| ) is small:
E[dH (g| , p| )] <
1
1
1
1
+=
+
= ,
q
36
12 9
where the term 1/q is due to the fact that the point x is not random.
Therefore, by Markovs Inequality,
Pr[dH (g| , p| ) 1/3] 1/3.
Thus, with probability at least 2/3, we have that p| is the unique
polynomial of degree at most d at distance less than 1/3 from g| and
thus q must equal p| .
7.5.2
Low-Degree Extensions
Note that the usual encoding for ReedMuller codes, where the message gives the coecients of the polynomial, is not systematic. Instead
the message should correspond to evaluations of the polynomial at
certain points. Once we settle on the set of evaluation points, the task
becomes one of interpolating the values at these points (given by the
message) to a low-degree polynomial dened everywhere.
The simplest approach is to use the boolean hypercube as the set
of evaluation points.
249
for
(x) =
i : i =1
xi
(1 xi )
i : i =0
250
Pseudorandom Generators
Putting It Together
Combining Theorem 7.42 with Lemmas 7.48, and 7.39, we obtain the
following locally decodable code:
Proposition 7.49. For every L N, there is an explicit code Enc :
251
Theorem 7.52. If there exists f : {0, 1} {0, 1} in E that is worstcase hard against time t(), then there exists f: {0, 1}O() {0, 1} in
E that is (t (), 1/48) average-case hard, for t () = t()/poly().
An improved decoding distance can be obtained using Problem 7.7.
We note that the local decoder of Theorem 7.51 not only runs
in time poly(log L), but also makes poly(log L) queries. For some
applications (such as Private Information Retrieval, see Problem 7.6),
it is important to have the number q of queries be as small as possible,
ideally a constant. Using ReedMuller codes of constant degree, it
is possible to obtain constant-query locally decodable codes, but the
6 Some
readers may recognize this concatenation step as the same as applying the
GoldreichLevin hardcore predicate to f. (See Problems 7.12 and 7.13.) However, for
the parameters we are using, we do not need the power of these results, and can aord to
perform brute-force unique decoding instead.
252
Pseudorandom Generators
Other Connections
7.6
7.6.1
253
Denition
254
Pseudorandom Generators
Dec_2
f_i(x)
More formally:
Denition 7.54. A local -list-decoding algorithm for a code Enc is
a pair of probabilistic oracle algorithms (Dec1 , Dec2 ) such that for all
received words g and all codewords f = Enc(f ) with dH (f, g) < , the
following holds. With probability at least 2/3 over (a1 , . . . , as ) Decg1 ,
there exists an i [s] such that
x, Pr[Decg2 (x, ai ) = f (x)] 2/3.
Note that we dont explicitly require a bound on the list size s
(to avoid introducing another parameter), but certainly it cannot be
larger than the running time of Dec1 .
As we did for locally (unique-)decodable codes, we can dene a
local -list-correcting algorithm, where Dec2 should recover arbitrary
symbols of the codeword f rather than the message f . In this case,
we dont require that for all j, Decg2 (, aj ) is a codeword, or that it is
255
256
Pseudorandom Generators
(1) Choose y Fm
(2) Output {(y, z) : z F}
This rst-phase decoder is rather trivial in that it doesnt make use
of the oracle access to the received word g. It is possible to improve
both the running time and list size of Dec1 by using oracle access to g,
but we wont need those improvements below.
Now, the task of Dec2 is to calculate p(x), given the value of p
on some point y. Dec2 does this by looking at g restricted to the line
through x and y, and using the list-decoding algorithm for Reed
Solomon Codes to nd the univariate polynomials q1 , q2 , . . . , qt that are
close to g. If exactly one of these polynomials qi agrees with p on the
test point y, then we can be reasonably condent that qi (x) = p(x).
In more detail, the decoder works as follows:
Algorithm 7.58 (ReedMuller Local List-Corrector Dec2 ).
Input: An oracle g : Fm F, an input x Fm , advice (y, z) Fm F,
257
Proof of Claim: It suces to show that Items 7.59 and 7.59 hold
R
with probability 0.99 over the choice of a random point y Fm and
a random line through y; then we can apply Markovs inequality to
nish the job.
Item 7.59 holds by pairwise independence. If the line is chosen
randomly, then the q points on are pairwise independent samples
of Fm . The expected agreement between g| and p| is simply the
258
Pseudorandom Generators
1
,
q (/2)2
which can be madesmaller than 0.01 for a large enough choice of the
constant c in = c d/q.
To prove Item 7.59, we imagine rst choosing the line uniformly
at random from all lines in Fm , and then choosing y uniformly at
random from the points on (reparameterizing so that (1) = y).
Once we choose , we can let q1 , . . . , qt be all polynomials of degree
at most d, other than p| , that have agreement greater than /2 with
g| . (Note that this list is independent of the parametrization of ,
i.e., if (x) = (ax + b) for a = 0 then p| and qi (x) = qi (ax + b)
have agreement equal to agr(p| , qi ).) By the list-decodability
of
ReedSolomon Codes (Proposition 5.15), we have t = O( q/d).
Now, since two distinct polynomials can agree in at most d points,
R
when we choose a random point y , the probability that qi and
p agree at y is at most d/q. After reparameterization of so that
(1) = y, this gives
d
d
.
Pr[i : qi (1) = p(1)] t = O
y
q
q
This can also be made smaller than 0.01 for large enough choice
of the constant c (since we may assume q/d > c2 , else 1 and the
result holds vacuously).
7.6.4
Putting it Together
we apply Lemma 7.48 with |H| = q and m = / log |H|, for total
degree d q . To decode
from a 1 fraction of errors using
Theorem 7.56, we need c d/q , which follows if q c2 2 /4 . This
259
260
Pseudorandom Generators
261
7.7
7.7.1
262
Pseudorandom Generators
(s/O(t),
) pseudorandom generator.
7 Sometimes
it is useful to allow the advice string z to also depend on the coin tosses of
the reduction Red. By error reduction via r = O() repetitions, such a reduction can be
converted into one satisfying Denition 7.65 by sampling r = O() sequences of coin tosses,
but this blows up the advice length by a factor of r, which may be too expensive.
263
264
Pseudorandom Generators
265
Pr
r{0,1}m(n)
R
Pr
r{0,1}m(n)
266
Pseudorandom Generators
7.7.2
267
268
Pseudorandom Generators
than (1 )k, and the seed length is only O(log n) when k = n(1) .
However, these settings of parameters are already sucient for many
purposes, such as the simulation of BPP with weak random sources.
Moreover, the extractor construction is much more direct than that of
Theorem 6.36. Specically, it is
Ext(f, y) = (f(y|S1 ), . . . , f(y|Sm )),
where f is an encoding of f in a locally list-decodable code and
S1 , . . . , Sm are a design. In fact, since Proposition 7.72 does not depend
on the running time of the list-decoding algorithm, but only the
amount of nonuniformity, we can use any (1/2 /2m, poly(m/))
list-decodable code, which will only require an advice of length
O(log(m/)) to index into the list of decodings. In particular, we can
use a ReedSolomon code concatenated with a Hadamard code, as in
Problem 5.2.
We now provide some additional intuition for why black-box
pseudorandom generator constructions are also extractors. A blackbox PRG construction Gf is designed to use a computationally
hard function f (plus a random seed) to produce an output that is
computationally indistinguishable from uniform. When we view it as
an extractor Ext(f, y) = Gf (y), we instead are feeding it a function
f that is chosen randomly from a high min-entropy distribution (plus
a random seed). This can be viewed as saying that f is informationtheoretically hard, and from this stronger hypothesis, we are able
to obtain the stronger conclusion that the output is statistically
indistinguishable from uniform. The information-theoretic hardness
of f can be formalized as follows: if f is sampled from a source F
of min-entropy at least k + log(1/), then for every xed function A
(such as A = RedT ), the probability (over f F ) that there exists
a string z of length k such that A(, z) computes f everywhere is at
most . That is, a function generated with min-entropy larger than
k is unlikely to have a description of length k (relative to any xed
interpreter A).
Similarly to black-box PRG constructions, we can also discuss
converting worst-case hard functions to average-case hard functions in
269
a black-box manner:
Denition 7.74. Let Ampf : [D] [q] be a deterministic algorithm
that is dened for every oracle f : [n] {0, 1}. We say that Amp is a
(t, k, ) black-box worst-case-to-average-case hardness amplier if there
is a probabilistic oracle algorithm Red, called the reduction, running
in time t such that for every function g : [D] [q] such that
Pr[g(U[D] ) = Ampf (U[D] )] > 1/q + ,
there is an advice string z [K], where K = 2k , such that
x [n]
270
Pseudorandom Generators
7.8
Exercises
7.8 Exercises
271
272
Pseudorandom Generators
Problem 7.6 (Private Information Retrieval). The goal of private information retrieval is for a user to be able to retrieve an entry of
a remote database in such a way that the server holding the database
learns nothing about which database entry was requested. A trivial
solution is for the server to send the user the entire database, in which
case the user does not need to reveal anything about the entry desired.
We are interested in solutions that involve much less communication.
One way to achieve this is through replication.8 Formally, in a q-server
private information-retrieval (PIR) scheme, an arbitrary database
D {0, 1}n is duplicated at q noncommunicating servers. On input an
index i [n], the user algorithm U tosses some coins r and outputs
queries (x1 , . . . , xq ) = U (i, r), and sends xj to the jth server. The jth
server algorithm Sj returns an answer yj = Sj (xj , D). The user then
computes its output U (i, r, x1 , . . . , xq ), which should equal Di , the ith
bit of the database. For privacy, we require that the distribution of
each query xj (over the choice of the random coin tosses r) is the same
regardless of the index i being queried.
It turns out that q-query locally decodable codes and q-server PIR
are essentially equivalent. This equivalence is proven using the notion
of smooth codes. A code Enc : {0, 1}n n is a q-query smooth code
if there is a probabilistic oracle algorithm Dec such that for every
message x and every i [n], we have Pr[DecEnc(x) (i) = xi ] = 1 and Dec
makes q nonadaptive queries to its oracle, each of which is uniformly
distributed in [
n]. Note that the oracle in this denition is a valid
codeword, with no corruptions. Below you will show that smooth
codes imply locally decodable codes and PIR schemes; converses are
also known (after making some slight relaxations to the denitions).
8 Another
way is through computational security, where we only require that it be computationally infeasible for the database to learn something about the entry requested.
7.8 Exercises
273
(1) Show that the decoder for a q-query smooth code is also a
local (1/3q)-decoder for Enc.
(2) Show that every q-query smooth code Enc : {0, 1}n n
gives rise to a q-server PIR scheme in which the user and
servers communicate at most q (log n
+ log ||) bits for
each database entry requested.
(3) Using the ReedMuller code, show that there is a polylog(n)server PIR scheme with communication complexity
polylog(n) for n-bit databases. That is, the user and servers
communicate at most polylog(n) bits for each database
entry requested. (For constant q, the ReedMuller code with
an optimal systematic encoding as in Problem 5.4 yields a
q-server PIR with communication complexity O(n1/(q1) ).)
274
Pseudorandom Generators
(3) Show that if, for every m, we can construct an (m, 1/2) hitting set Hm in time s(m) = poly(m), then BPP = P. (Hint:
this can be proven in two ways. One uses Problem 3.1 and
the other uses a variant of Problem 7.1 together with Corollary 7.64. How do the parameters for general s(m) compare?)
(4) Dene the notion of a (t, k, ) black-box construction of
hitting set-generators, and show that, when t = , such
constructions are equivalent to constructions of dispersers
(Denition 6.19).
7.8 Exercises
275
(2) Show that the above fails for one-way functions. That
is, assuming that there exists a one-way function g, construct a one-way function f which doesnt remain one
way under composition. (Hint: for |x| = |y| = /2, set
f (x, y) = 1|g(y)| g(y) unless x {0 , 1 }.)
276
Pseudorandom Generators
7.8 Exercises
277
1
1
+ c,
2
(1) Let Enc : {0, 1} {0, 1}L be a code such that given
Enc(x)y can be computed in time
x {0, 1} and y [L],
poly(). Suppose that for every constant c and all suciently
large , Enc has a (1/2 1/c ) local list-decoding algorithm
(Dec1 , Dec2 ) in which both Dec1 and Dec2 run in time
poly(). Prove that if f : {0, 1} {0, 1} is a one-way
function, then b(x, y) = Enc(x)y is a hardcore predicate for
the one-way function f (x, y) = (f (x), y).
(2) Show that if b : {0, 1} {0, 1} is a hardcore predicate for
a one-way permutation f : {0, 1} {0, 1} , then for every
m = poly(), the following function G : {0, 1} {0, 1}m is
a cryptographic pseudorandom generator:
G(x) = (b(x), b(f (x)), b(f (f (x))), . . . , b(f (m1) (x))).
(Hint: show that G is previous-bit unpredictable.)
(3) Using Problem 7.12, deduce that if f : {0, 1} {0, 1} is
a one-way permutation, then for every m = poly(), the
following is a cryptographic pseudorandom generator:
Gm (x, r) = (x, r, f (x), r, f (f (x)), r, . . . , f (m1) (x), r).
278
Pseudorandom Generators
7.9
279
280
Pseudorandom Generators
Luby, and Rubinfeld [71] also dened and constructed self-testers for functions,
which allow one to eciently determine whether a program does indeed compute a function
correctly on most inputs before attempting to use self-correction. Together a self-tester and
self-corrector yield a program checker in the sense of [70]. The study of self-testers gave
rise to the notion of locally testable codes, which are intimately related to probabilistically
checkable proofs [41, 42], and to the notion of property testing [165, 337, 340], which is an
area within sublinear-time algorithms.)
281
282
Pseudorandom Generators
and Yekhanin [249] constructed the rst locally decodable codes with
sublinear-time decoding and rate larger 1/2.
Techniques for Hardness Amplication (namely, the Direct Product
Theorem and XOR Lemma) were rst described in oral presentations of
Yaos paper [421]. Since then, these results have been strengthened and
generalized in a number of ways. See the survey [171] and Section 8.2.3.
The rst local list-decoder for ReedMuller codes was given by Arora
and Sudan [35] (stated in the language of program self-correctors). The
one in Theorem 7.56 is due to Sudan, Trevisan, and Vadhan [381], who
also gave a general denition of locally list-decodable codes (inspired
by a list-decoding analogue of program self-correctors dened by Ar
et al. [30]) and explicitly proved Theorems 7.60, 7.61, and 7.62.
The result that BPP = P if E has a function of nonuniform worstcase hardness s() = 2() (Corollary 7.64, Part 1) is from the earlier
work of Impagliazzo and Wigderson [215], who used derandomized
versions of the XOR Lemma to obtain sucient average-case hardness
for use in the NisanWigderson pseudorandom generator. An optimal
construction of pseudorandom generators from worst-case hard functions, with seed length d(m) = O(s1 (poly(m))) (cf., Theorem 7.63),
was given by Shaltiel and Umans [356, 399].
For more background on AM, see the Notes and References of
Section 2. The rst evidence that AM = NP was given by Arvind
and K
obler [37], who showed that one can use the NisanWigderson
generator with a function that is (2() , 1/2 1/2() )-hard for nondeterministic circuits. Klivans and van Melkebeek [244] observed that
the ImpagliazzoWigderson pseudorandom generator construction
is black box and used this to show that AM can be derandomized using functions that are worst-case hard for circuits with
an NP oracle (Theorem 7.68). Subsequent work showed that one
only needs worst-case hardness against a nonuniform analogue of
NP co-NP [289, 356, 357].
Trevisan [389] showed that black-box pseudorandom generator
constructions yield randomness extractors, and thereby obtained the
extractor construction of Theorem 7.73. This surprising connection
between complexity-theoretic pseudorandomness and informationtheoretic pseudorandomness sparked much subsequent work, from
283
which the unied theory presented in this survey emerged. The fact
that black-box hardness ampliers are a form of locally list-decodable
codes was explicitly stated (and used to deduce lower bounds on
advice length) in [397]. The use of black-box constructions to classify
and separate cryptographic primitives was pioneered by Impagliazzo
and Rudich [213]; see also [326, 330].
Problem 7.1 (PRGs imply hard functions) is from [302]. Problem 7.2
is a special case of the technique called translation or padding
in complexity theory. Problem 7.4 (Deterministic Approximate
Counting) is from [302]. The fastest known deterministic algorithms
for approximately counting the number of satisfying assignments to
a DNF formula are from [280] and [178] (depending on whether the
approximation is relative or additive, and the magnitude of the error).
The fact that hitting set generators imply BPP = P (Problem 7.8)
was rst proven by Andreev, Clementi, and Rolim [27]; for a more
direct proof, see [173]. Problem 7.9 (that PRGs vs. uniform algorithms
imply average-case derandomization) is from [216]. Goldreich [163]
showed that PRGs are necessary for derandomization (Problem 7.10).
The result that one-to-one one-way functions imply pseudorandom
generators is due to Goldreich, Krawczyk, and Luby [167]; the proof
in Problem 7.14 is from [197].
For more on Kolmogorov complexity, see [261]. In recent years,
connections have been found between Kolmogorov complexity and
derandomization; see [14]. The tighter equivalence between circuit size
and nonuniform computation time mentioned after Denition 7.1 is due
to Pippenger and Fischer [311]. The 5n O(n) lower bound on circuit
size is due to Iwama, Lachish, Morizumi, and Raz [218, 254]. The
fact that single-sample indistinguishability against uniform algorithms
does not imply multiple-sample indistinguishability unless we make
additional assumptions such as ecient samplability (in contrast to
Proposition 7.14), is due to Goldreich and Meyer [169]. (See also [172].)
8
Conclusions
8.1
of x
Samp(x)y
Con(x, y)
hitting samplers
k k + d
(lossless) condensers
yth nbr
expanders
(, )
Ext(x, y)
averaging samplers
(k, )
black-box PRGs
(= K, A)
Samp(x)y
hardness ampliers
(, )
Gx (y)
list-decodable codes
(t, k, ) black-box
extractors
(t, k, )
(x, y)
(y, Enc(x)y )
Object
(1 1/q , K)
(local w/advice)
dont care
|LIST(T, (T ) + )| K
|LIST(T, (T ) + )| K
t = poly(m)
dont care
dont care
|LIST(T, (T ) + )| K
|T | < (1 )DK
(local w/advice)
dont care
|T | < AK
|LIST(T, (T ) + )| K
dont care
t = poly(log n, 1/)
|LIST(T, (T ) + )| K
T = {(y, ry )}
|LIST(T, (T ) + )| K
Decoding Time
poly(n, 1/)
List-Decoding Problem
T = {(y, ry )}
5.33
k = O(m) [polylog(n), n]
n = O(m + log(1/))
d = O(log(n/)),
4.7
6.33
Prob.
7.72
k = t = poly(m) [polylog(n), n ]
D = O(1), A = 1 + (1),
K = N/2, M = N
K = N , D = poly(1/, log(1/)),
6.23
k = O(m) [polylog(n), n]
= 1/m,
5.30
7.74
Prop.
5.29
n = O(m + log(1/))
d = O(log(n/)),
k = t = poly(log n, 1/) n
K = N , D = poly(1/, log(1/)),
M = Dq = O(n), K = poly(n)
d = O(log n), q = 2, M = Dq = poly(n),
Standard Parameters
q = O(1), = (1),
Table 8.1. Capturing Pseudorandom Objects : [N ] [D] [M ] by bounding sizes of LIST (T, ) = {x : Pry [(x, y) T ] > } for
T [M ]. The parameter denotes a arbitrarily small positive constant. As usual, N = 2n , M = 2m , D = 2d , and K = 2k .
285
286
Conclusions
287
where the Xi s are iid {0, 1}-valued random variables such that
Pr[Xi = 1] = , and dene
t
1
Bin(t, , 1) = Pr
Xi = 1 = t .
t
i=1
min K N :
.
2
|T |
K
4K |T |2
We leave the proof of Theorem 8.2 as an exercise and instead show
how many of the nonconstructive bounds weve seen for various pseudorandom objects would simultaneously hold for the of Theorem 8.2
by setting parameters appropriately:
The nonconstructive bounds for expanders (Theorem 4.4)
can be obtained by setting M = N , = 1, = (T ),
K = |T |/(D 2) (ignoring round-o errors), and noting that
N
N
Bin(D, , 1)K
|T |
K
Ne K
N e |T | DK
K
|T |
(D2)K
(D 2) e K
e
=
DK
= ((D 2)eD1 )K
< 1/(4K 2 |T |2 ),
provided is suciently small. Thus, by Proposition 5.33,
denes a (N, (D 2)) vertex expander for a suciently
small .
288
Conclusions
and
t
1
PBin(t, , 1, ) = sup Pr
Yi = 1 = Bin(t, , 1),
t
(Y1 ,...,Yt )
i=1
discussion after Theorem 4.4 states a slightly stronger result, only requiring D >
(H() + H(A))/(H() AH(1/)), but this is based on choosing a graph that is the
union of D random perfect matchings, which does indeed have slightly better expansion
properties than the model we are using now, of choosing D random neighbors for each left
vertex.
289
min K N :
.
|T |
K
4K 2 |T |2
Nonconstructive bounds for strong vertex expanders,
hitting samplers, and lossless condensers (matching the
bounds for the non-strong ones) follow by noting that
PBin(n, , 1, ) Bin(n, , 1). Indeed, writing i for the
expectation of Bernoulli random variable Yi , then
t
t
t
t
1
1
Pr
Yi = 1 =
i
i t ,
t
t
i=1
i=1
i=1
290
Conclusions
T = {(i, j) : i [D], j mi } in more than (1 )D positions. For a set S T , the probability that G(R) T = S
is at most (1/q)|S| (1 1/q)D|S| . (This is exactly the
probability in case S contains only pairs (i, j) with D
distinct values of i, otherwise the probability is zero.) Thus,
mi
i
(1/q)s (1 1/q)Ds
Yi > 1
Pr
s
i
s>(1)D
1
D
q
=
s>(1)D
D
(q 1)s
s
q Hq (,D)D
.
qD
8.2
291
292
Conclusions
Nisan [299] uses a seed of length O(log2 m) to produce m bits that are
pseudorandom to oblivious, read-once branching programs of width
m. At rst, this only seems to imply that RL L2 , which already
follows from Savitchs Theorem [349] that NL L2 . Nevertheless,
Nisans generator and its additional properties has been used in more
sophisticated ways to obtain highly nontrivial derandomizations of
RL. Specically, Nisan [300] used it to show that every problem in RL
can be solved simultaneously in polynomial time and O(log2 n) space,
and Saks and Zhou [344] used it to prove that RL L3/2 . Another
important pseudorandom generator for space-bounded computation
is that of Nisan and Zuckerman [303], which uses a seed of length
O(log m) to produce logk m bits that are pseudorandom to oblivious,
read-once branching programs of width m. None of these results have
been improved in nearly two decades.
However, substantially better generators have been constructed
for restricted classes of oblivious read-once branching programs.
Specically, there are pseudorandom generators or hitting-set gen
erators (see Problem 7.8) stretching a seed of length O(log
m)
to m bits that fool combinatorial rectangles (which check membership in a rectangle S1 S2 Sm/ log m , where each
Si {0, 1}log m ) [136, 264, 31, 271, 179], branching programs of
width 2 and 3 [345, 73, 363, 179], constant-width regular branching
programs (where the transition function at each layer is regular) [82, 84], and constant-width permutation branching programs
(where each input string induces a permutation of the states at each
layer) [338, 250, 113, 373]. However, the following remains open:
Open Problem 8.6. Is there an explicit pseudorandom generator
2
G : {0, 1}o(log m) {0, 1}m whose output distribution is pseudorandom
to oblivious, read-once branching programs of width 4?
8.2.2
Derandomization from Uniform Assumptions. The construction of pseudorandom generators we have seen (Theorem 7.63) requires
nonuniform circuit lower bounds for functions in E, and it is of interest
293
294
Conclusions
295
Thierauf [86] and then explicitly and in stronger form by Impagliazzo, Kabanets, and Wigderson [212]. Specically, these results show
that if MA (like NP, but with probabilistic verication of witnesses)
can be derandomized (e.g., MA = NP or even MA NSUBEXP),
then NEXP P/poly. Derandomization of prBPP implies derandomization of MA, so this also implies that if prBPP = prP or
even prBPP prSUBEXP, then NEXP P/poly. This result falls
short of giving a converse to Corollary 7.64 (Item 3) because the circuit
lower bounds are for NEXP rather than EXP. (Corollary 7.64, as well
as most of the other derandomization results weve seen, apply equally
well to prBPP as to BPP.) In addition, the result does not give exponential circuit lower bounds even if we assume full derandomization
(prBPP = prP). However, Santhanam [347] shows that prBPP =
prP implies that for every constant k, there is a language in NP that
does not have circuits of size nk , which can be viewed as a scaled down
version of the statement that NE requires circuits of size 2(n) .2
Thus, the following remain open:
Open Problem 8.8. Does prBPP = prP imply E P/poly
(equivalently EXP P/poly, by Problem 7.2)?
Open Problem 8.9. Does prBPP = prP imply that NEXP has a
(1)
problem requiring nonuniform boolean circuits of size 2
on inputs
of length ?
By the result of [212] and Corollary 2.31, nding a deterministic polynomial-time algorithm for the prBPP-complete problem
[+]-Approx Circuit Average implies superpolynomial circuit lower
bounds for NEXP. Unfortunately, we do not know a wide variety
of natural problems that are complete for prBPP (unlike NP).
Nevertheless, Kabanets and Impagliazzo [227] showed that nding
a deterministic polynomial-time algorithm for Polynomial Identity
2 Indeed,
296
Conclusions
297
Hardness Amplication
298
Conclusions
299
ensures that f is (1/2 1/2(k) )-hard. Healy, Vadhan, and Viola [202]
showed how to derandomize ODonnells construction, so that the
inputs x1 , . . . , xk can be generated in a correlated way by a much
shorter input to f . This allows for taking k to be exponential in the
input length of f , and for certain combining functions C, the function
f is still in NP (using the ability of a nondeterministic computation
to compute exponential-sized ORs). As a result, assuming that f is
mildly hard for nonuniform algorithms running in time 2(n) (the
1/2
high end), they obtain f NP that is (1/2 1/2(n ) )-hard where
n is the input length of f . A quantitative improvement was given by
1/2
[273, 179], replacing 2(n ) with 2n /polylog(n ) , but it remains open to
achieve the optimal bound of 2(n ) .
Uniform Reductions: Another line of work has sought to give
hardness amplication results for uniform algorithms, similarly to
the work on derandomization from uniform assumptions described in
Section 8.2.2. Like with cryptographic pseudorandom generators, most
of the hardness amplication results in the cryptographic setting, such
as Yaos original hardness amplication for one-way functions [421],
also apply to uniform algorithms. In the noncryptographic setting,
a diculty is that black-box hardness amplication corresponds to
error-correcting codes that can be decoded from very large distances,
such as 1/2 in the case of binary codes, and at these distances
unique decoding is impossible, so one must turn to list decoding and
use some nonuniform advice to select the correct decoding from the
list. (See Denition 7.74 and the discussion after it. For amplication
from mild average-case hardness rather than worst-case hardness, the
coding-theoretic interpretation is that the decoding algorithm only
needs to recover a string that is very close to the original message,
rather than exactly equal to the message [209, 390]; this also requires
list decoding for natural settings of parameters.) However, unlike
the case of pseudorandom generator constructions, here the number
of candidates in the list can be relatively small (e.g., poly(1/)),
so a reasonable goal for a uniform algorithm is to produce a list of
possible decodings, as in our denition of locally list-decodable codes
(Denition 7.54). As observed in [216, 397, 390], if we are interested in
300
Conclusions
301
8.2.4
Deterministic Extractors
302
Conclusions
303
for condensers, it no longer seems necessary that the extractor has higher
complexity than the source [120].
304
Conclusions
De and Watson [114] and Viola [414, 415] have recently obtained
unconditional extractors for restricted classes of sampling circuits,
such as NC0 (where each output bit depends on a constant number of
input bits) [114, 414] AC0 [414], even for sublinear min-entropy. These
results are based on a close connection between constructing extractors
for a class of circuits and nding explicit distributions that are hard
for circuits in the class to sample, the latter a topic that was also
studied for the purpose of proving data structure lower bounds [415].
It is also natural to look at sources that are low complexity in an
algebraic sense. The simplest such model is that of ane sources, which
are uniform over ane subspaces of Fn for a nite eld F. If the subspace has dimension k, then the source is a at source of min-entropy
k log |F|. For the case of F = Z2 , there are now explicit extractors for
ane sources of sublinear min-entropy k = O(n) [78, 422, 262] and dispersers for ane sources of subpolynomial min-entropy k = no(1) [353].
For large elds F (i.e., |F| = poly(n)), there are extractors for subspaces
of every dimension k 1 [147]. There have also been works on extractors for sources described by polynomials of degree larger than 1, either
as the output distribution of a low-degree polynomial map [123] or the
zero set of low-degree polynomial (i.e., an algebraic variety) [122].
We remark that several of the state-of-art constructions for
independent sources and ane sources, such as [47, 48, 353], are quite
complicated, using sophisticated compositions and/or machinery from
arithmetic combinatorics. It is of interest to nd simpler constructions;
some progress has been made for ane sources in [60, 262] and for independent sources in [62] (where the latter in fact gives a reductions from
constructing 2-source extractors to constructing ane extractors).
8.2.5
Algebraic Pseudorandomness
305
306
Conclusions
307
308
Conclusions
constructions described here have some additional components in addition to the basic
polynomial evaluation framework (f, y) = (f1 (y), . . . , fm (y)), for example the seed should
also specify the axis along which the line is parallel in [289] and a position in an inner
encoding in [384, 356, 399]. We ignore these components in this informal discussion.
Acknowledgments
My exploration of pseudorandomness began in my graduate and postdoctoral years at MIT and IAS, under the wonderful guidance of Oded
Goldreich, Sha Goldwasser, Madhu Sudan, and Avi Wigderson. It was
initiated by an exciting reading group organized at MIT by Luca Trevisan, which immersed me in the subject and started my extensive collaboration with Luca. Through fortuitous circumstances, I also began
to work with Omer Reingold, starting what I hope will be a lifelong collaboration. I am indebted to Oded, Sha, Madhu, Avi, Luca, and Omer
for all the insights and research experiences they have shared with me.
I have also learned a great deal from my other collaborators on
pseudorandomness, including Boaz Barak, Eli Ben-Sasson, Michael
Capalbo, Kai-Min Chung, Nenad Dedic, Yevgeniy Dodis, Parikshit
Gopalan, Dan Gutfreund, Venkat Guruswami, Iftach Haitner, Alex
Healy, Thomas Holenstein, Jesse Kamp, Danny Lewin, Adriana
L
opez-Alt, Shachar Lovett, Chi-Jen Lu, Raghu Meka, Ilya Mironov,
Michael Mitzenmacher, Shien Jin Ong, Michael Rabin, Anup Rao, Ran
Raz, Yakir Reshef, Leo Reyzin, Thomas Ristenpart, Eyal Rozenman,
Thomas Steinke, Madhur Tulsiani, Chris Umans, Emanuele Viola,
Hoeteck Wee, Colin Jia Zheng, and David Zuckerman. Needless to
309
310
Acknowledgments
say, this list omits many other researchers in the eld with whom I
have had stimulating discussions.
The starting point for this survey was scribe notes taken by
students in the 2004 version of my graduate course on Pseudorandomness. I thank those students for their contribution: Alexandr Andoni,
Adi Akavia, Yan-Cheng Chang, Denis Chebikin, Hamilton Chong,
Vitaly Feldman, Brian Greenberg, Chun-Yun Hsiao, Andrei Jorza,
Adam Kirsch, Kevin Matulef, Mihai P
atrascu, John Provine, Pavlo
Pylyavskyy, Arthur Rudolph, Saurabh Sanghvi, Grant Schoenebeck,
Jordanna Schutz, Sasha Schwartz, David Troiano, Vinod Vaikuntanathan, Kartik Venkatram, David Woodru. I also thank the
students from the other oerings of the course; Dan Gutfreund, who
gave some guest lectures in 2007; and all of my teaching fellows,
Johnny Chen, Kai-Min Chung, Minh Nguyen, Emanuele Viola, and
Colin Jia Zheng. Special thanks are due to Levent Alpoge, Michael
Forbes, Dieter van Melkebeek, Greg Price, and Adam Sealfon for their
extensive feedback on the lecture notes and/or drafts of this survey.
Helpful input of various types (corrections, comments, answering
questions) has also been given by Zachary Abel, Dana Albrecht, Nir
Avni, Pablo Azar, Trevor Bass, Osbert Bastiani, Jeremy Booher,
Fan Chung, Ben Dozier, Chinmoy Dutta, Zhou Fan, Oded Goldreich,
Andrey Grinshpun, Venkat Guruswami, Alex Healy, Stephan Holzer,
Andrei Jorza, Michael von Kor, Kevn Lee, Alex Lubotzky, Avner May,
Eric Miles, Shira Mitchell, Jelani Nelson, Omer Reingold, Yakir Reshef,
Shubhangi Saraf, Shrenik Shah, Madhu Sudan, Justin Thaler, Jon Ullman, Ameya Velingker, Neal Wadhwa, Hoeteck Wee, Avi Wigderson,
and David Wu. Thanks to all of them, as well as those I have forgotten.
I am extremely grateful to James Finlay of now publishers for his
years of patience and continual encouragement, without which I would
have never nished this survey.
My time working on this survey included sabbatical visits to
the Miller Institute for Basic Research in Science at UC Berkeley,
Microsoft Research Silicon Valley, and Stanford University, as well
as support from a Sloan Fellowship, a Guggenheim Fellowship, NSF
grants CCF-0133096 and CCF-1116616, ONR grant N00014-04-1-0478,
and US-Israel BSF grants 2006060 and 2010196.
References
[1] S. Aaronson and D. van Melkebeek, On circuit lower bounds from derandomization, Theory of Computing. An Open Access Journal, vol. 7, pp. 177184,
2011.
[2] M. Abadi, J. Feigenbaum, and J. Kilian, On hiding information from an oracle, Journal of Computer and System Sciences, vol. 39, no. 1, pp. 2150, 1989.
[3] L. Adleman, Two theorems on random polynomial time, in Annual
Symposium on Foundations of Computer Science (Ann Arbor, Mich., 1978),
pp. 7583, Long Beach, California, 1978.
[4] M. Agrawal, On derandomizing tests for certain polynomial identities, in
IEEE Conference on Computational Complexity, pp. 355, 2003.
[5] M. Agrawal and S. Biswas, Primality and identity testing via Chinese
remaindering, Journal of the ACM, vol. 50, no. 4, pp. 429443, 2003.
[6] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Annals of
Mathematics. Second Series, vol. 160, no. 2, pp. 781793, 2004.
[7] M. Agrawal and V. Vinay, Arithmetic circuits: A chasm at depth four, in
FOCS, pp. 6775, 2008.
[8] A. V. Aho, ed., Proceedings of the Annual ACM Symposium on Theory of
Computing, 1987, New York, USA, 1987.
[9] M. Ajtai, H. Iwaniec, J. Koml
os, J. Pintz, and E. Szemeredi, Construction of a thin set with small Fourier coecients, Bulletin of the London
Mathematical Society, vol. 22, no. 6, pp. 583590, 1990.
[10] M. Ajtai, J. Koml
os, and E. Szemeredi, Sorting in c log n parallel steps,
Combinatorica, vol. 3, no. 1, pp. 119, 1983.
311
312
References
References
313
[27] A. E. Andreev, A. E. F. Clementi, and J. D. P. Rolim, Worst-case hardness suces for derandomization: A new method for hardness-randomness
trade-os, in Automata, Languages and Programming, 24th International
Colloquium, vol. 1256 of Lecture Notes in Computer Science, (P. Degano,
R. Gorrieri, and A. Marchetti-Spaccamela, eds.), pp. 177187, Bologna, Italy:
Springer-Verlag, 711 July 1997.
[28] A. E. Andreev, A. E. F. Clementi, J. D. P. Rolim, and L. Trevisan, Weak
random sources, hitting sets, and BPP simulations, SIAM Journal on
Computing, vol. 28, no. 6, pp. 21032116, (electronic) 1999.
[29] D. Angluin and D. Lichtenstein, Provable security of cryptosystems: A survey, Technical Report YALEU/DCS/TR-288, Yale University, Department
of Computer Science, 1983.
[30] S. Ar, R. J. Lipton, R. Rubinfeld, and M. Sudan, Reconstructing algebraic
functions from mixed data, SIAM Journal on Computing, vol. 28, no. 2,
pp. 487510, 1999.
[31] R. Armoni, M. Saks, A. Wigderson, and S. Zhou, Discrepancy sets and pseudorandom generators for combinatorial rectangles, in Annual Symposium on
Foundations of Computer Science (Burlington, VT, 1996), pp. 412421, Los
Alamitos, CA, 1996.
[32] S. Arora and B. Barak, Computational complexity. Cambridge: Cambridge
University Press, 2009. (A modern approach).
[33] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof verication and the hardness of approximation problems, Journal of the ACM,
vol. 45, pp. 501555, May 1998.
[34] S. Arora and S. Safra, Probabilistic checking of proofs: A new characterization of NP, Journal of the ACM, vol. 45, pp. 70122, January 1998.
[35] S. Arora and M. Sudan, Improved low degree testing and its applications,
in Proceedings of the Annual ACM Symposium on Theory of Computing,
pp. 485495, El Paso, Texas, 46 May 1997.
[36] M. Artin, Algebra. Englewood Clis, NJ: Prentice Hall Inc., 1991.
[37] V. Arvind and J. K
obler, On pseudorandomness and resource-bounded
measure, Theoretical Computer Science, vol. 255, no. 12, pp. 205221, 2001.
[38] B. Aydinlio
glu, D. Gutfreund, J. M. Hitchcock, and A. Kawachi, Derandomizing Arthur-Merlin games and approximate counting implies exponential-size
lower bounds, Computational Complexity, vol. 20, no. 2, pp. 329366, 2011.
[39] B. Aydinlio
glu and D. van Melkebeek, Nondeterministic circuit lower bounds
from mildly derandomizing Arthur-Merlin games, Electronic Colloquium on
Computational Complexity (ECCC), vol. 19, p. 80, 2012.
[40] Y. Azar, R. Motwani, and J. Naor, Approximating probability distributions
using small sample spaces, Combinatorica, vol. 18, no. 2, pp. 151171, 1998.
[41] L. Babai, L. Fortnow, L. A. Levin, and M. Szegedy, Checking computations
in polylogarithmic time, in STOC, (C. Koutsougeras and J. S. Vitter, eds.),
pp. 2131, ACM, 1991.
[42] L. Babai, L. Fortnow, and C. Lund, Nondeterministic exponential time has
two-prover interactive protocols, Computational Complexity, vol. 1, no. 1,
pp. 340, 1991.
314
References
References
315
[58] A. Ben-Aroya and A. Ta-Shma, A combinatorial construction of almostRamanujan graphs using the zig-zag product, in Annual ACM Symposium
on Theory of Computing (Victoria, British Columbia), pp. 325334, 2008.
[59] E. Ben-Sasson, O. Goldreich, P. Harsha, M. Sudan, and S. Vadhan, Robust
PCPs of proximity, shorter PCPs and applications to coding, SIAM Journal
on Computing, vol. 36, no. 4, pp. 889974, 2006.
[60] E. Ben-Sasson and S. Kopparty, Ane dispersers from subspace polynomials, in STOC09 Proceedings of the 2009 ACM International Symposium
on Theory of Computing, pp. 6574, New York, 2009.
[61] E. Ben-Sasson, M. Sudan, S. Vadhan, and A. Wigderson, Randomnessecient low degree tests and short PCPs via epsilon-biased sets, in
Proceedings of the Annual ACM Symposium on Theory of Computing,
pp. 612621, New York, 2003.
[62] E. Ben-Sasson and N. Zewi, From ane to two-source extractors via approximate duality, in STOC, (L. Fortnow and S. P. Vadhan, eds.), pp. 177186,
ACM, 2011.
[63] C. H. Bennett, G. Brassard, and J.-M. Robert, Privacy amplication by
public discussion, SIAM Journal on Computing, vol. 17, no. 2, pp. 210229,
1988. Special issue on cryptography.
[64] S. J. Berkowitz, On computing the determinant in small parallel time using
a small number of processors, Information Processing Letters, vol. 18, no. 3,
pp. 147150, 1984.
[65] E. R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill Book
Co., 1968.
[66] E. R. Berlekamp, Factoring polynomials over large nite elds, Mathematics
of Computation, vol. 24, pp. 713735, 1970.
[67] J. Bierbrauer, T. Johansson, G. Kabatianskii, and B. Smeets, On families
of hash functions via geometric codes and concatenation, in Advances in
cryptology CRYPTO 93 (Santa Barbara, CA, 1993), vol. 773 of Lecture
Notes in Computer Science, pp. 331342, Berlin: Springer, 1994.
[68] Y. Bilu and N. Linial, Lifts, discrepancy and nearly optimal spectral gap,
Combinatorica, vol. 26, no. 5, pp. 495519, 2006.
[69] M. Blum, Independent unbiased coin ips from a correlated biased source
a nite state Markov chain, Combinatorica, vol. 6, no. 2, pp. 97108, 1986.
Theory of computing (Singer Island, Fla., 1984).
[70] M. Blum and S. Kannan, Designing programs that check their work,
Journal of the ACM, vol. 42, no. 1, pp. 269291, 1995.
[71] M. Blum, M. Luby, and R. Rubinfeld, Self-testing/correcting with applications to numerical problems, Journal of Computer and System Sciences,
vol. 47, no. 3, pp. 549595, 1993.
[72] M. Blum and S. Micali, How to generate cryptographically strong sequences
of pseudorandom bits, SIAM Journal on Computing, vol. 13, no. 4,
pp. 850864, 1984.
[73] A. Bogdanov, Z. Dvir, E. Verbin, and A. Yehudayo, Pseudorandomness
for Width 2 Branching Programs, Electronic Colloquium on Computational
Complexity (ECCC), vol. 16, p. 70, 2009.
316
References
References
317
[90] R. Canetti, Y. Dodis, S. Halevi, E. Kushilevitz, and A. Sahai, Exposureresilient functions and all-or-nothing transforms, in Advances in
Cryptology EUROCRYPT 00, Lecture Notes in Computer Science,
(B. Preneel, ed.), Springer-Verlag, 1418 May 2000.
[91] R. Canetti, G. Even, and O. Goldreich, Lower bounds for sampling algorithms for estimating the average, Information Processing Letters, vol. 53,
no. 1, pp. 1725, 1995.
[92] M. Capalbo, O. Reingold, S. Vadhan, and A. Wigderson, Randomness conductors and constant-degree lossless expanders, in Annual ACM Symposium
on Theory of Computing (STOC 02), pp. 659668, Montreal, CA, May 2002.
(Joint session with CCC 02).
[93] J. L. Carter and M. N. Wegman, Universal classes of hash functions,
Journal of Computer and System Sciences, vol. 18, no. 2, pp. 143154, 1979.
[94] J. Cheeger, A lower bound for the smallest eigenvalue of the Laplacian,
in Problems in analysis (Papers dedicated to Salomon Bochner, 1969),
pp. 195199, Princeton, NJ: Princeton Univ. Press, 1970.
[95] H. Cherno, A measure of asymptotic eciency for tests of a hypothesis
based on the sum of observations, Annals of Mathematical Statistics, vol. 23,
pp. 493507, 1952.
[96] B. Chor and O. Goldreich, Unbiased bits from sources of weak randomness
and probabilistic communication complexity, SIAM Journal on Computing,
vol. 17, pp. 230261, April 1988.
[97] B. Chor and O. Goldreich, On the power of two-point based sampling,
Journal of Complexity, vol. 5, no. 1, pp. 96106, 1989.
[98] B. Chor, O. Goldreich, J. H
astad, J. Friedman, S. Rudich, and R. Smolensky,
The bit extraction problem of t-resilient functions (Preliminary Version),
in FOCS, pp. 396407, IEEE, 1985.
[99] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, Private information
retrieval, Journal of the ACM, vol. 45, no. 6, pp. 965982, 1998.
[100] F. Chung and R. Graham, Sparse quasi-random graphs, Combinatorica,
vol. 22, no. 2, pp. 217244, 2002. (Special issue: Paul Erd
os and his
mathematics).
[101] F. Chung and R. Graham, Quasi-random graphs with given degree
sequences, Random Structures and Algorithms, vol. 32, no. 1, pp. 119, 2008.
[102] F. Chung, R. Graham, and T. Leighton, Guessing secrets, Electronic Journal of Combinatorics, vol. 8, no. 1, p. 25 (electronic), 2001. Research Paper 13.
[103] F. R. K. Chung, "Diameters and eigenvalues," Journal of the American Mathematical Society, vol. 2, no. 2, pp. 187–196, 1989.
[104] F. R. K. Chung, R. L. Graham, and R. M. Wilson, "Quasi-random graphs," Combinatorica, vol. 9, no. 4, pp. 345–362, 1989.
[105] K.-M. Chung, "Efficient parallel repetition theorems with applications to security amplification," PhD Thesis, Harvard University, 2011.
[106] K.-M. Chung, Y. T. Kalai, F.-H. Liu, and R. Raz, "Memory delegation," in Advances in Cryptology – CRYPTO 2011, vol. 6841 of Lecture Notes in Computer Science, pp. 151–168, Heidelberg: Springer, 2011.
[193] V. Guruswami and C. Wang, "Optimal rate list decoding via derivative codes," in Approximation, Randomization, and Combinatorial Optimization, vol. 6845 of Lecture Notes in Computer Science, pp. 593–604, Heidelberg: Springer, 2011.
[194] V. Guruswami and C. Xing, "Folded codes from function field towers and improved optimal rate list decoding," in STOC, (H. J. Karloff and T. Pitassi, eds.), pp. 339–350, ACM, 2012.
[195] D. Gutfreund, R. Shaltiel, and A. Ta-Shma, "Uniform hardness versus randomness tradeoffs for Arthur-Merlin games," Computational Complexity, vol. 12, no. 3–4, pp. 85–130, 2003.
[196] J. Håstad, Computational Limitations of Small-Depth Circuits. MIT Press, 1987.
[197] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby, "A pseudorandom generator from any one-way function," SIAM Journal on Computing, vol. 28, no. 4, pp. 1364–1396, 1999.
[198] I. Haitner, O. Reingold, and S. Vadhan, "Efficiency improvements in constructing pseudorandom generators from one-way functions," in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC '10), pp. 437–446, 6–8 June 2010.
[199] R. W. Hamming, "Error detecting and error correcting codes," The Bell System Technical Journal, vol. 29, pp. 147–160, 1950.
[200] J. Hartmanis and R. E. Stearns, "On the computational complexity of algorithms," Transactions of the American Mathematical Society, vol. 117, pp. 285–306, 1965.
[201] N. J. A. Harvey, "Algebraic structures and algorithms for matching and matroid problems," in Annual IEEE Symposium on Foundations of Computer Science (Berkeley, CA), pp. 531–542, 2006.
[202] A. Healy, S. Vadhan, and E. Viola, "Using nondeterminism to amplify hardness," SIAM Journal on Computing, vol. 35, no. 4, pp. 903–931, 2006.
[203] A. D. Healy, "Randomness-efficient sampling within NC1," Computational Complexity, vol. 17, no. 1, pp. 3–37, 2008.
[204] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association, vol. 58, pp. 13–30, 1963.
[205] A. J. Hoffman, "On eigenvalues and colorings of graphs," in Graph Theory and its Applications (Proceedings of Advanced Seminars, Mathematics Research Center, University of Wisconsin, Madison, Wisconsin, 1969), pp. 79–91, New York: Academic Press, 1970.
[206] T. Holenstein, "Key agreement from weak bit agreement," in STOC '05: Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 664–673, New York, 2005.
[207] S. Hoory, N. Linial, and A. Wigderson, "Expander graphs and their applications," Bulletin of the AMS, vol. 43, no. 4, pp. 439–561, 2006.
[208] R. Impagliazzo, "Hard-core distributions for somewhat hard problems," in Annual Symposium on Foundations of Computer Science, pp. 538–545, Milwaukee, Wisconsin, 23–25 October 1995.
[209] R. Impagliazzo, "Hardness as randomness: A survey of universal derandomization," in Proceedings of the International Congress of Mathematicians, Vol. III (Beijing, 2002), pp. 659–672, Beijing, 2002.
[210] R. Impagliazzo, R. Jaiswal, and V. Kabanets, "Approximate list-decoding of direct product codes and uniform hardness amplification," SIAM Journal on Computing, vol. 39, no. 2, pp. 564–605, 2009.
[211] R. Impagliazzo, R. Jaiswal, V. Kabanets, and A. Wigderson, "Uniform direct product theorems: Simplified, optimized, and derandomized," SIAM Journal on Computing, vol. 39, no. 4, pp. 1637–1665, 2009/2010.
[212] R. Impagliazzo, V. Kabanets, and A. Wigderson, "In search of an easy witness: Exponential time vs. probabilistic polynomial time," Journal of Computer and System Sciences, vol. 65, no. 4, pp. 672–694, 2002.
[213] R. Impagliazzo and S. Rudich, "Limits on the provable consequences of one-way permutations," in Advances in Cryptology – CRYPTO '88 (Santa Barbara, CA, 1988), vol. 403 of Lecture Notes in Computer Science, pp. 8–26, Berlin: Springer, 1990.
[214] R. Impagliazzo and A. Wigderson, "An information-theoretic variant of the inclusion-exclusion bound (preliminary version)," Unpublished manuscript, 1996.
[215] R. Impagliazzo and A. Wigderson, "P = BPP if E requires exponential circuits: Derandomizing the XOR lemma," in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 220–229, El Paso, Texas, 4–6 May 1997.
[216] R. Impagliazzo and A. Wigderson, "Randomness vs time: Derandomization under a uniform assumption," Journal of Computer and System Sciences, vol. 63, no. 4, pp. 672–688, 2001. (Special issue on FOCS '98 (Palo Alto, CA)).
[217] R. Impagliazzo and D. Zuckerman, "How to recycle random bits," in Annual Symposium on Foundations of Computer Science (Research Triangle Park, North Carolina), pp. 248–253, 1989.
[218] K. Iwama and H. Morizumi, "An explicit lower bound of 5n − o(n) for Boolean circuits," in Mathematical Foundations of Computer Science 2002, vol. 2420 of Lecture Notes in Computer Science, pp. 353–364, Berlin: Springer, 2002.
[219] M. Jerrum and A. Sinclair, "Approximating the permanent," SIAM Journal on Computing, vol. 18, no. 6, pp. 1149–1178, 1989.
[220] M. Jerrum, A. Sinclair, and E. Vigoda, "A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries," Journal of the ACM, vol. 51, no. 4, pp. 671–697, 2004.
[221] S. Jimbo and A. Maruoka, "Expanders obtained from affine transformations," Combinatorica, vol. 7, no. 4, pp. 343–355, 1987.
[222] A. Joffe, "On a sequence of almost deterministic pairwise independent random variables," Proceedings of the American Mathematical Society, vol. 29, pp. 381–382, 1971.
[223] A. Joffe, "On a set of almost deterministic k-independent random variables," Annals of Probability, vol. 2, no. 1, pp. 161–162, 1974.
[224] S. Johnson, "Upper bounds for constant weight error correcting codes," Discrete Mathematics, vol. 3, pp. 109–124, 1972.
[243] A. R. Klivans and R. A. Servedio, "Boosting and hard-core set construction," Machine Learning, vol. 51, no. 3, pp. 217–238, 2003.
[244] A. R. Klivans and D. van Melkebeek, "Graph nonisomorphism has subexponential size proofs unless the polynomial-time hierarchy collapses," SIAM Journal on Computing, vol. 31, no. 5, pp. 1501–1526 (electronic), 2002.
[245] D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical Algorithms. Addison-Wesley, Third Edition, 1998.
[246] R. König and U. M. Maurer, "Extracting randomness from generalized symbol-fixing and Markov sources," in Proceedings of 2004 IEEE International Symposium on Information Theory, p. 232, 2004.
[247] R. König and U. M. Maurer, "Generalized strong extractors and deterministic privacy amplification," in IMA International Conference, vol. 3796 of Lecture Notes in Computer Science, (N. P. Smart, ed.), pp. 322–339, Springer, 2005.
[248] S. Kopparty, "List-decoding multiplicity codes," Electronic Colloquium on Computational Complexity (ECCC), vol. 19, p. 44, 2012.
[249] S. Kopparty, S. Saraf, and S. Yekhanin, "High-rate codes with sublinear-time decoding," in STOC, (L. Fortnow and S. P. Vadhan, eds.), pp. 167–176, ACM, 2011.
[250] M. Koucký, P. Nimbhorkar, and P. Pudlák, "Pseudorandom generators for group products: extended abstract," in STOC, (L. Fortnow and S. P. Vadhan, eds.), pp. 263–272, ACM, 2011.
[251] C. Koutsougeras and J. S. Vitter, eds., Proceedings of the Annual ACM Symposium on Theory of Computing, May 5–8, 1991, New Orleans, Louisiana, USA, ACM, 1991.
[252] H. Krawczyk, "How to predict congruential generators," Journal of Algorithms, vol. 13, no. 4, pp. 527–545, 1992.
[253] E. Kushilevitz and N. Nisan, Communication Complexity. Cambridge: Cambridge University Press, 1997.
[254] O. Lachish and R. Raz, "Explicit lower bound of 4.5n − o(n) for Boolean circuits," in Annual ACM Symposium on Theory of Computing, pp. 399–408 (electronic), New York, 2001.
[255] H. O. Lancaster, "Pairwise statistical independence," Annals of Mathematical Statistics, vol. 36, pp. 1313–1317, 1965.
[256] C. Lautemann, "BPP and the polynomial hierarchy," Information Processing Letters, vol. 17, no. 4, pp. 215–217, 1983.
[257] C.-J. Lee, C.-J. Lu, and S.-C. Tsai, "Computational randomness from generalized hardcore sets," in Fundamentals of Computation Theory, vol. 6914 of Lecture Notes in Computer Science, pp. 78–89, Heidelberg: Springer, 2011.
[258] F. T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, and Hypercubes. San Mateo, CA: Morgan Kaufmann, 1992.
[259] L. A. Levin, "One way functions and pseudorandom generators," Combinatorica, vol. 7, no. 4, pp. 357–363, 1987.
[260] D. Lewin and S. Vadhan, "Checking polynomial identities over any field: towards a derandomization?," in Annual ACM Symposium on the Theory of Computing (Dallas, TX), pp. 438–447, New York, 1999.
[400] S. Vadhan, "Probabilistic proof systems, Part I – interactive & zero-knowledge proofs," in Computational Complexity Theory, vol. 10 of IAS/Park City Mathematics Series, (S. Rudich and A. Wigderson, eds.), pp. 315–348, American Mathematical Society, 2004.
[401] S. Vadhan and C. J. Zheng, "Characterizing pseudoentropy and simplifying pseudorandom generator constructions," in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC '12), pp. 817–836, 19–22 May 2012.
[402] S. P. Vadhan, "Constructing locally computable extractors and cryptosystems in the bounded-storage model," Journal of Cryptology, vol. 17, no. 1, pp. 43–77, January 2004.
[403] L. G. Valiant, "Graph-theoretic properties in computational complexity," Journal of Computer and System Sciences, vol. 13, no. 3, pp. 278–285, 1976.
[404] L. G. Valiant, "The complexity of computing the permanent," Theoretical Computer Science, vol. 8, no. 2, pp. 189–201, 1979.
[405] L. G. Valiant, "A theory of the learnable," Communications of the ACM, vol. 27, no. 11, pp. 1134–1142, 1984.
[406] R. Varshamov, "Estimate of the number of signals in error correcting codes," Doklady Akademii Nauk SSSR, vol. 117, pp. 739–741, 1957.
[407] U. V. Vazirani, "Towards a strong communication complexity theory or generating quasi-random sequences from two communicating slightly-random sources (extended abstract)," in Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 366–378, Providence, Rhode Island, 6–8 May 1985.
[408] U. V. Vazirani, "Efficiency considerations in using semi-random sources (extended abstract)," in STOC, (A. V. Aho, ed.), pp. 160–168, ACM, 1987.
[409] U. V. Vazirani, "Strong communication complexity or generating quasirandom sequences from two communicating semirandom sources," Combinatorica, vol. 7, no. 4, pp. 375–392, 1987.
[410] U. V. Vazirani and V. V. Vazirani, "Random polynomial time is equal to slightly-random polynomial time," in Annual Symposium on Foundations of Computer Science, pp. 417–428, Portland, Oregon, 21–23 October 1985.
[411] E. Viola, "The complexity of constructing pseudorandom generators from hard functions," Computational Complexity, vol. 13, no. 3–4, pp. 147–188, 2004.
[412] E. Viola, "Pseudorandom bits for constant-depth circuits with few arbitrary symmetric gates," SIAM Journal on Computing, vol. 36, no. 5, pp. 1387–1403 (electronic), 2006/2007.
[413] E. Viola, "The sum of d small-bias generators fools polynomials of degree d," Computational Complexity, vol. 18, no. 2, pp. 209–217, 2009.
[414] E. Viola, "Extractors for circuit sources," in FOCS, (R. Ostrovsky, ed.), pp. 220–229, IEEE, 2011.
[415] E. Viola, "The complexity of distributions," SIAM Journal on Computing, vol. 41, no. 1, pp. 191–218, 2012.
[416] J. von Neumann, "Various techniques used in conjunction with random digits," in Collected Works. Vol. V: Design of Computers, Theory of Automata and Numerical Analysis, New York: The Macmillan Co., 1963.
[417] M. N. Wegman and J. L. Carter, "New hash functions and their use in authentication and set equality," Journal of Computer and System Sciences, vol. 22, no. 3, pp. 265–279, 1981.
[418] R. Williams, "Improving exhaustive search implies superpolynomial lower bounds," in STOC '10: Proceedings of the 2010 ACM International Symposium on Theory of Computing, pp. 231–240, New York: ACM, 2010.
[419] R. Williams, "Non-uniform ACC circuit lower bounds," in IEEE Conference on Computational Complexity, pp. 115–125, 2011.
[420] J. Wozencraft, "List decoding," Quarterly Progress Report, Research Laboratory of Electronics, MIT, vol. 48, pp. 90–95, 1958.
[421] A. C. Yao, "Theory and applications of trapdoor functions (extended abstract)," in Annual Symposium on Foundations of Computer Science, pp. 80–91, Chicago, Illinois, 3–5 November 1982.
[422] A. Yehudayoff, "Affine extractors over prime fields," Combinatorica, vol. 31, no. 2, pp. 245–256, 2011.
[423] S. Yekhanin, "Towards 3-query locally decodable codes of subexponential length," Journal of the ACM, vol. 55, no. 1, Art. 1, 16 pp., 2008.
[424] S. Yekhanin, Locally Decodable Codes. Now Publishers, 2012. (To appear).
[425] R. Zippel, "Probabilistic algorithms for sparse polynomials," in EUROSAM, vol. 72 of Lecture Notes in Computer Science, (E. W. Ng, ed.), pp. 216–226, Springer, 1979.
[426] D. Zuckerman, "Simulating BPP using a general weak random source," Algorithmica, vol. 16, pp. 367–391, October/November 1996.
[427] D. Zuckerman, "Randomness-optimal oblivious sampling," Random Structures & Algorithms, vol. 11, no. 4, pp. 345–367, 1997.
[428] V. V. Zyablov and M. S. Pinsker, "List cascade decoding (in Russian)," Problems of Information Transmission, vol. 17, no. 4, pp. 29–33, 1981.