
CSE 291: Advanced techniques in algorithm design

Shachar Lovett
Spring 2016
Abstract
The class will focus on two themes: linear algebra and probability, and their many
applications in algorithm design. We will cover a number of classical examples, including: fast matrix multiplication, FFT, error correcting codes, cryptography, efficient
data structures, combinatorial optimization, routing, and more. We assume basic familiarity (undergraduate level) with linear algebra, probability, discrete mathematics
and graph theory.

Contents

0 Preface: Mathematical background
  0.1 Fields
  0.2 Polynomials
  0.3 Matrices
  0.4 Probability

1 Matrix multiplication
  1.1 Strassen's algorithm
  1.2 Verifying matrix multiplication
  1.3 Application: checking if a graph contains a triangle
  1.4 Application: listing all triangles in a graph

2 Fast Fourier Transform and fast polynomial multiplication
  2.1 Univariate polynomials
  2.2 The Fast Fourier Transform
  2.3 Inverse FFT
  2.4 Fast polynomial multiplication
  2.5 Multivariate polynomials

3 Secret sharing schemes
  3.1 Construction of a secret sharing scheme
  3.2 Lower bound on the share size

4 Error correcting codes
  4.1 Basic definitions
  4.2 Basic bounds
  4.3 Existence of asymptotically good codes
  4.4 Linear codes

5 Reed-Solomon codes
  5.1 Definition
  5.2 Decoding Reed-Solomon codes from erasures
  5.3 Decoding Reed-Solomon codes from errors

6 Polynomial identity testing and finding perfect matchings in graphs
  6.1 Perfect matchings in bi-partite graphs
  6.2 Polynomial representation
  6.3 Polynomial identity testing
  6.4 Perfect matchings via polynomial identity testing

7 Satisfiability
  7.1 2-SAT
  7.2 3-SAT

8 Hash functions: the power of pairwise independence
  8.1 Pairwise independent bits
  8.2 Application: de-randomized MAXCUT
  8.3 Optimal sample size for pairwise independent bits
  8.4 Hash functions with large ranges
  8.5 Application: collision free hashing
  8.6 Efficient dictionaries: storing sets efficiently
  8.7 Bloom filters

9 Min cut
  9.1 Karger's algorithm
  9.2 Improving the running time

10 Routing
  10.1 Deterministic routing is bad
  10.2 Solution: randomized routing

11 Expander graphs
  11.1 Edge expansion
  11.2 Spectral expansion
  11.3 Cheeger inequality
  11.4 Random walks mix fast
  11.5 Random walks escape small sets
  11.6 Randomness efficient error reduction in randomized algorithms
0 Preface: Mathematical background

0.1 Fields

A field is a set F endowed with two operations: addition and multiplication. It satisfies the
following conditions:
- Associativity: (x + y) + z = x + (y + z) and (xy)z = x(yz).
- Commutativity: x + y = y + x and xy = yx.
- Distributivity: x(y + z) = xy + xz.
- Unit elements (0, 1): x + 0 = x and x · 1 = x.
- Inverse: if x ≠ 0 then there exists 1/x such that x · (1/x) = 1.
You probably know many infinite fields: the real numbers R, the rationals Q and the
complex numbers C. For us, the most important fields will be finite fields, which have a
finite number of elements.
An example is the binary field F_2 = {0, 1}, where addition corresponds to XOR and
multiplication to AND. This is an instance of a more general example, of prime finite fields.
Let p be a prime. The field F_p consists of the elements {0, 1, . . . , p − 1}, where addition and
multiplication are defined modulo p. One can verify that it is indeed a field. The following
fact is important, but we will not prove it.

Fact 0.1. If a finite field F has q elements, then q must be a prime power. For any prime
power q there is exactly one finite field with q elements, which is called the finite field of
order q and denoted F_q.
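For illustration, here is a minimal Python sketch of arithmetic in the prime field F_p (my code, not from the notes); the inverse uses Fermat's little theorem, x^{p−2} = x^{−1} for x ≠ 0.

    # A minimal sketch of arithmetic in the prime field F_p (illustration only).
    p = 7  # any prime

    def add(x, y):
        return (x + y) % p

    def mul(x, y):
        return (x * y) % p

    def inv(x):
        # Fermat's little theorem: x^(p-1) = 1, so x^(p-2) is the multiplicative inverse.
        assert x % p != 0
        return pow(x, p - 2, p)

    assert mul(3, inv(3)) == 1   # 3 * (1/3) = 1 in F_7
    assert add(5, 4) == 2        # 5 + 4 = 9 = 2 (mod 7)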

0.2 Polynomials

Univariate polynomials over a field F are given by

    f(x) = Σ_{i=0}^{n} f_i x^i.

Here, x is a variable which takes values in F, and f_i ∈ F are constants, called the coefficients
of f. We can evaluate f at a point a ∈ F by plugging in x = a, namely

    f(a) = Σ_{i=0}^{n} f_i a^i.

The degree of f is the maximal i such that f_i ≠ 0.

Multi-variate polynomials are defined in the same way: if x_1, . . . , x_d are variables, then

    f(x_1, . . . , x_d) = Σ_{i_1,...,i_d} f_{i_1,...,i_d} (x_1)^{i_1} · · · (x_d)^{i_d},

where (i_1, . . . , i_d) ranges over a finite subset of N^d. The total degree of f is the maximal
i_1 + . . . + i_d for which f_{i_1,...,i_d} ≠ 0.

0.3 Matrices

An n × m matrix over a field F is a 2-dimensional array A_{i,j} for 1 ≤ i ≤ n, 1 ≤ j ≤ m. If A
is an n × m matrix and B an m × ℓ matrix, then their product is the n × ℓ matrix given by

    (AB)_{i,k} = Σ_{j=1}^{m} A_{i,j} B_{j,k}.

If A is an n × n matrix, its determinant is

    det(A) = Σ_{π ∈ S_n} (−1)^{sign(π)} Π_{i=1}^{n} A_{i,π(i)}.

Here, π ranges over all permutations of {1, . . . , n}.

Fact 0.2. If A is an n × n matrix, then det(A) = det(A^T), where A^T is the transpose of A.

Fact 0.3. If A, B are n × n matrices, then det(AB) = det(A) det(B).

0.4 Probability

In this class, we will only consider discrete distributions and discrete random variables, which
take a finite number of possible values. Let X be a random variable such that Pr[X = x_i] = p_i,
where x_i ∈ R, p_i ≥ 0 and Σ p_i = 1. Its expectation (average) is

    E[X] = Σ_i p_i x_i

and its variance is

    Var[X] = E[|X − E[X]|^2] = E[X^2] − E[X]^2.

If X, Y are any two random variables then E[X + Y] = E[X] + E[Y]; if they are independent
then also Var(X + Y) = Var(X) + Var(Y) (in fact, it suffices if E[XY] = E[X] E[Y] for that
to hold). If X, Y are joint random variables, let X|Y = y be the marginal random variable
of X conditioned on Y = y. The conditional expectation E[X|Y] is a random variable which
depends on Y. We have the useful formula

    E[E[X|Y]] = E[X].

We will frequently use the following two common bounds.

Claim 0.4 (Markov inequality). Let X ≥ 0 be a random variable. Then

    Pr[X ≥ a] ≤ E[X] / a.

Proof. Let S = {i : x_i ≥ a}. Then

    E[X] = Σ_i p_i x_i ≥ Σ_{i ∈ S} p_i x_i ≥ Σ_{i ∈ S} p_i a = Pr[X ∈ S] · a = Pr[X ≥ a] · a.

Claim 0.5 (Chebyshev inequality). Let X ∈ R be a random variable with E[X^2] < ∞. Then

    Pr[|X − E[X]| ≥ a] ≤ Var(X) / a^2.

Proof. Let Y = |X − E[X]|^2. Then E[Y] = Var(X) and, by applying the Markov inequality to
Y (note that Y ≥ 0), we have

    Pr[|X − E[X]| ≥ a] = Pr[Y ≥ a^2] ≤ E[Y] / a^2 = Var(X) / a^2.

We will also need tail bounds for the sum of many independent random variables. These
are given by the Chernoff bounds. We state two versions of these bounds: one for absolute
(additive) error and one for relative (multiplicative) error.

Theorem 0.6 (Chernoff bounds). Let Z_1, . . . , Z_n ∈ {0, 1} be independent random variables.
Let Z = Σ Z_i and μ = E[Z]. Then for any ε > 0 it holds that
(i) Absolute error:

    Pr[|Z − E[Z]| ≥ εn] ≤ 2 exp(−2ε^2 n).

(ii) Relative error:

    Pr[Z ≥ (1 + ε)μ] ≤ exp(−ε^2 μ / (2 + ε)).
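As a quick numerical sanity check of the additive-error bound (my code, not from the notes; n, ε and the number of trials are arbitrary choices), one can compare the empirical tail probability of a sum of fair coins with 2 exp(−2ε^2 n):

    # Empirical check of the additive-error Chernoff bound for sums of fair coins.
    import math
    import random

    n, eps, trials = 200, 0.1, 20000
    deviations = 0
    for _ in range(trials):
        z = sum(random.randint(0, 1) for _ in range(n))
        if abs(z - n / 2) >= eps * n:        # |Z - E[Z]| >= eps * n
            deviations += 1

    empirical = deviations / trials
    bound = 2 * math.exp(-2 * eps**2 * n)    # Chernoff: 2 exp(-2 eps^2 n)
    print(empirical, bound)                  # empirical frequency stays below the bound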

1 Matrix multiplication

Matrix multiplication is a basic primitive in many computations. Let A, B be n × n matrices.
Their product C = AB is given by

    C_{i,j} = Σ_{k=1}^{n} A_{i,k} B_{k,j}.

The basic computational problem is how many operations (additions and multiplications)
are required to compute C. Implementing the formula above in a straightforward manner
requires O(n^3) operations. The best possible is 2n^2, which is the number of inputs. The
matrix multiplication exponent, denoted ω, is the best constant such that we can multiply
two n × n matrices in O(n^ω) operations (to be precise, ω is the infimum of these exponents).
As we just saw, 2 ≤ ω ≤ 3. To be concrete, we will consider matrices over the reals, but this
can be defined over any field.

Open Problem 1.1. What is the matrix multiplication exponent ω?

The first nontrivial algorithm was by Strassen [Str69] in 1969, who showed that
ω ≤ log_2 7 ≈ 2.81. Subsequently, researchers were able to improve the exponent. The best
result to date is by Le Gall [LG14], who showed ω ≤ 2.373. Here, we will only describe
Strassen's result, as well as general facts about matrix multiplication. A nice survey on
matrix multiplication, describing most of the advances so far, can be found on the homepage
of Yuval Filmus: http://www.cs.toronto.edu/~yuvalf/.

1.1 Strassen's algorithm

The starting point of Strassen's algorithm is the following algorithm for multiplying 2 × 2
matrices.
1. p1 = (a_{1,1} + a_{2,2})(b_{1,1} + b_{2,2})
2. p2 = (a_{2,1} + a_{2,2}) b_{1,1}
3. p3 = a_{1,1} (b_{1,2} − b_{2,2})
4. p4 = a_{2,2} (b_{2,1} − b_{1,1})
5. p5 = (a_{1,1} + a_{1,2}) b_{2,2}
6. p6 = (a_{2,1} − a_{1,1})(b_{1,1} + b_{1,2})
7. p7 = (a_{1,2} − a_{2,2})(b_{2,1} + b_{2,2})
8. c_{1,1} = p1 + p4 − p5 + p7
9. c_{1,2} = p3 + p5
10. c_{2,1} = p2 + p4
11. c_{2,2} = p1 − p2 + p3 + p6
It can be checked that this program uses 7 multiplications and 18 additions/subtractions,
so 25 operations overall. The naive implementation of multiplying two 2 × 2 matrices requires
8 multiplications and 4 additions, so 12 operations overall. The main observation of Strassen
is that really only the number of multiplications matters.
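The seven-product scheme above is easy to check numerically. A short Python sketch (my code, not from the notes) implements the 2 × 2 formulas and compares them against the naive product:

    # Strassen's seven products for 2x2 matrices, checked against the naive product.
    def strassen_2x2(a, b):
        (a11, a12), (a21, a22) = a
        (b11, b12), (b21, b22) = b
        p1 = (a11 + a22) * (b11 + b22)
        p2 = (a21 + a22) * b11
        p3 = a11 * (b12 - b22)
        p4 = a22 * (b21 - b11)
        p5 = (a11 + a12) * b22
        p6 = (a21 - a11) * (b11 + b12)
        p7 = (a12 - a22) * (b21 + b22)
        return [[p1 + p4 - p5 + p7, p3 + p5],
                [p2 + p4, p1 - p2 + p3 + p6]]

    def naive_2x2(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    assert strassen_2x2(A, B) == naive_2x2(A, B)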
In order to show this, we will convert Strassen's algorithm to a normal form. In such a
program, we first compute some linear combinations of the entries of A and the entries of
B, individually. We then multiply them, and take linear combinations of the results to get
the entries of C. We define it formally below.

Definition 1.2 (Normal form). A normal-form program for computing matrix multiplication,
which uses M multiplications, has the following form:
(i) For 1 ≤ i ≤ M, compute a linear combination α_i of the entries of A.
(ii) For 1 ≤ i ≤ M, compute a linear combination β_i of the entries of B.
(iii) For 1 ≤ i ≤ M, compute the product p_i = α_i · β_i.
(iv) For 1 ≤ i, j ≤ n, compute c_{i,j} as a linear combination of p_1, . . . , p_M.

Lemma 1.3. Any program for computing matrix multiplication which uses M multiplications
(and any number of additions) can be converted to a normal form with at most 2M
multiplications.

Note that in the normal form, the program computes 2M multiplications and O(M n^2)
additions.
Proof. First, we can convert any program for computing matrix multiplication to a straight-line
program. In such a program, every step computes either the sum or product of two
previously computed variables. In our case, let z_1, . . . , z_N be the values computed by the
program. The first 2n^2 are the inputs: z_1, . . . , z_{n^2} are the entries of A (in some order), and
z_{n^2+1}, . . . , z_{2n^2} are the entries of B. The last n^2 variables are the entries of C which we wish
to compute. Every intermediate variable z_i is either:
- A linear combination of two previously computed variables: z_i = α z_j + β z_k, where
  j, k < i and α, β ∈ R.
- A product of two previously computed variables: z_i = z_j · z_k, where j, k < i.
In particular, note that each z_i is some polynomial of the inputs {a_{i,j}, b_{i,j} : 1 ≤ i, j ≤ n}.
Next, we show that as the result is bilinear in the inputs, we can get the computation to
respect that structure. We decompose z_t(A, B) as follows:

    z_t(A, B) = a_t + b_t(A) + c_t(B) + d_t(A, B) + e_t(A, B),

where
- a_t is a constant (independent of the inputs);
- b_t(A) is a linear combination of the entries of A;
- c_t(B) is a linear combination of the entries of B;
- d_t(A, B) is a bi-linear combination of the entries of A, B; namely, a linear combination
  of {a_{i,j} b_{k,ℓ} : 1 ≤ i, j, k, ℓ ≤ n};
- e_t(A, B) is the rest, namely any monomial which is quadratic in A or in B.
Note that inputs have only a linear part (either b_t or c_t), and that outputs have only a
bilinear part (d_t). The main observation is that we can compute all of the linear and bilinear
parts directly, without computing e_t at all, via a straight line program.
(i) Sums: If z_t = z_i + z_j with i, j < t, then
    a_t = a_i + a_j,
    b_t(A) = b_i(A) + b_j(A),
    c_t(B) = c_i(B) + c_j(B),
    d_t(A, B) = d_i(A, B) + d_j(A, B).
The same holds for general linear combinations z_t = α z_i + β z_j.
(ii) Multiplications: If z_t = z_i · z_j with i, j < t, then
    a_t = a_i a_j,
    b_t(A) = a_i b_j(A) + a_j b_i(A),
    c_t(B) = a_i c_j(B) + a_j c_i(B),
    d_t(A, B) = a_i d_j(A, B) + a_j d_i(A, B) + b_i(A) c_j(B) + b_j(A) c_i(B).
Note that the a_t are constants independent of the inputs. So, the only actual multiplications
we do are in computing b_i(A) c_j(B) and b_j(A) c_i(B). To get to a normal form, we compute:
- Linear combinations of the entries of A: b_t(A) for 1 ≤ t ≤ N.
- Linear combinations of the entries of B: c_t(B) for 1 ≤ t ≤ N.
- Products: b_i(A) c_j(B) and b_j(A) c_i(B) whenever z_t = z_i · z_j.
- The d_t(A, B) are linear combinations of these products and of previously computed
  d_i(A, B), d_j(A, B) for i, j < t.
- Output: linear combinations of the d_t(A, B) for 1 ≤ t ≤ N.
Note that we only need the linear combinations which enter the multiplication gates, which
gives the lemma.

Next, we show how to use matrix multiplication programs in normal form to compute
products of large matrices. This will show that only the number of multiplications matters.

Theorem 1.4. If two m × m matrices can be multiplied using M = m^α multiplications (and
any number of additions) in a normal form, then for any n ≥ 1, any two n × n matrices can
be multiplied using only O((mn)^α log(mn)) operations.

So for example, Strassen's algorithm is an algorithm in normal form which uses 7 multiplications
to multiply two 2 × 2 matrices. So, any two n × n matrices can be multiplied
using O(n^{log_2 7} (log n)^{O(1)}) ≤ O(n^{2.81 + ε}) operations for any ε > 0. So, ω ≤ log_2 7 ≈ 2.81.

Proof. Let T(n) denote the number of operations required to compute the product of two
n × n matrices. We assume that n is a power of m, by possibly increasing it to the smallest
power of m larger than it. This might increase n to at most nm. Now, the main idea is to
compute it recursively. We partition an n × n matrix as an m × m matrix whose entries
are (n/m) × (n/m) matrices. Let C = AB and let A_{i,j}, B_{i,j}, C_{i,j} be these sub-matrices of
A, B, C, respectively, where 1 ≤ i, j ≤ m. Then, observe that (as matrices) we have

    C_{i,j} = Σ_{k=1}^{m} A_{i,k} B_{k,j}.

We can apply any algorithm for m × m matrix multiplication in normal form to compute
{C_{i,j}}, as the algorithm never assumes that the inputs commute. So, to compute {C_{i,j}}, we:
(i) For 1 ≤ i ≤ M, compute linear combinations α_i of the A_{i,j}.
(ii) For 1 ≤ i ≤ M, compute linear combinations β_i of the B_{i,j}.
(iii) For 1 ≤ i ≤ M, compute p_i = α_i · β_i.
(iv) For 1 ≤ i, j ≤ m, compute C_{i,j} as a linear combination of p_1, . . . , p_M.
Note that α_i, β_i, p_i are all (n/m) × (n/m) matrices. How many operations do we do? Steps
(i), (ii), (iv) each require at most M m^2 additions of (n/m) × (n/m) matrices, so in total require
O(M n^2) additions. Step (iii) requires M multiplications of matrices of size (n/m) × (n/m).
So, we get the recursion formula

    T(n) = m^α T(n/m) + O(m^α n^2).

This solves to O((mn)^α) if α > 2 and to O((mn)^2 log n) if α = 2. Let's see explicitly the first
case, the second being similar.
Let n = m^s. This recursion corresponds to a tree of depth s, where each node has m^α children.
The number of nodes at depth i is m^{αi}, and the amount of computation that each makes is
O(m^α (n/m^i)^2). Hence, the total amount of computation at depth i is O(m^α m^{(α−2)i} n^2). As
long as α > 2, this grows exponentially fast in the depth, and hence is controlled by the last
level (at depth s), which takes O(m^α m^{(α−2)s} m^{2s}) = O((mn)^α).

1.2 Verifying matrix multiplication

Assume that someone gives you a magical algorithm that is supposed to multiply two matrices
quickly. How would you verify it? One way is to compute the matrix multiplication yourself
and compare the results. This will take time O(n^ω). Can you do better? The answer is yes,
if we allow for randomization. In the following, our goal is to verify that AB = C, where
A, B, C are n × n matrices over an arbitrary field.

Function MatrixMultVerify
Input: n × n matrices A, B, C.
Output: Is it true that AB = C?
1. Choose x ∈ {0, 1}^n randomly.
2. Return TRUE if A(Bx) = Cx, and FALSE otherwise.

Clearly, if AB = C then the algorithm always returns TRUE. Moreover, as all the algorithm
does is iteratively multiply an n × n matrix with a vector, it runs in time O(n^2). The main
question is: can we find matrices A, B, C where AB ≠ C, but where the algorithm returns
TRUE with high probability? The answer is no, and is provided by the following lemma,
applied to M = AB − C.

Lemma 1.5. Let M be a nonzero n × n matrix. Then Pr_{x ∈ {0,1}^n}[Mx = 0] ≤ 1/2.

In particular, if we repeat this t times, the error probability will reduce to 2^{−t}.

Proof. The matrix M has some nonzero row, let's say it is (a_1, . . . , a_n). Then,

    Pr_{x ∈ {0,1}^n}[Mx = 0] ≤ Pr[Σ a_i x_i = 0].

Let i be minimal such that a_i ≠ 0. Then Σ a_i x_i = 0 iff x_i = −Σ_{j>i} (a_j/a_i) x_j. Hence, for
any fixing of {x_j : j > i}, there is at most one value for x_i which would make this hold. So

    Pr[Σ a_i x_i = 0] = E_{x_{i+1},...,x_n ∈ {0,1}}[ Pr_{x_i ∈ {0,1}}[Σ a_i x_i = 0] ]
                     = E_{x_{i+1},...,x_n ∈ {0,1}}[ Pr_{x_i ∈ {0,1}}[x_i = −Σ_{j>i} (a_j/a_i) x_j] ]
                     ≤ 1/2.
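A direct Python rendering of MatrixMultVerify (a sketch for illustration; numpy and the repetition count t are my additions, not from the notes):

    # Randomized verification of AB = C, repeated t times.
    import numpy as np

    def matrix_mult_verify(A, B, C, t=20):
        n = A.shape[0]
        for _ in range(t):
            x = np.random.randint(0, 2, size=n)   # x chosen uniformly from {0,1}^n
            if not np.array_equal(A @ (B @ x), C @ x):
                return False                      # definitely AB != C
        return True                               # AB = C with probability >= 1 - 2^(-t)

    A = np.random.randint(0, 10, size=(50, 50))
    B = np.random.randint(0, 10, size=(50, 50))
    print(matrix_mult_verify(A, B, A @ B))        # True
    C_bad = A @ B
    C_bad[0, 0] += 1
    print(matrix_mult_verify(A, B, C_bad))        # False with high probability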

1.3 Application: checking if a graph contains a triangle

Let G = (V, E) be a graph. Our goal is to find whether G contains a triangle, and more
generally, to enumerate the triangles in G. Trivially, this takes n^3 time. We will show how to
improve it using fast matrix multiplication. Let |V| = n, and let A be the n × n adjacency
matrix of G, A_{i,j} = 1_{(i,j) ∈ E}. Observe that

    (A^2)_{i,j} = Σ_k A_{i,k} A_{k,j} = number of paths of length two between i and j.

So, to check whether G contains a triangle, we can first compute A^2, and then use it to detect
if there is a triangle.

Function TriangleExists(A)
Input: An n × n adjacency matrix A
Output: Is there a triangle in the graph?
1. Compute A^2.
2. Check if there are 1 ≤ i, j ≤ n with A_{i,j} = 1 and (A^2)_{i,j} ≥ 1.

The running time of step 1 is O(n^ω), and of step 2 is O(n^2). Thus the total time is O(n^ω).
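For illustration, a small numpy sketch of TriangleExists (my code, not from the notes; numpy's matrix product stands in for the O(n^ω) multiplication):

    # Detect a triangle by checking whether some edge (i, j) also has a path of length 2.
    import numpy as np

    def triangle_exists(A):
        A2 = A @ A                                   # (A^2)_{i,j} = number of 2-paths from i to j
        return bool(np.any((A == 1) & (A2 >= 1)))    # edge (i,j) plus a 2-path closes a triangle

    A = np.zeros((4, 4), dtype=int)
    for i, j in [(0, 1), (1, 2), (2, 0), (2, 3)]:    # triangle on {0, 1, 2} plus a pendant edge
        A[i, j] = A[j, i] = 1
    print(triangle_exists(A))                        # True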

1.4 Application: listing all triangles in a graph

We next describe a variant of the TriangleExists algorithm, which lists all triangles in the
graph. Again, the goal is to improve upon the naive O(n^3) algorithm which tests all possible
triangles.
The enumeration algorithm will be recursive. At each step, we partition the vertices into
two sets and recurse over the possible 8 configurations. To this end, we will need to check
if a triangle (i, j, k) exists in G with i ∈ I, j ∈ J, k ∈ K for some I, J, K ⊆ V. The same
algorithm works.

Function TriangleExists(A; I, J, K)
Input: An n × n adjacency matrix A, and I, J, K ⊆ {1, . . . , n}
Output: Is there a triangle (i, j, k) with i ∈ I, j ∈ J, k ∈ K?
1. Let A_1, A_2, A_3 be the I × J, J × K, I × K sub-matrices of A.
2. Compute A_1 A_2.
3. Check if there are i ∈ I, k ∈ K with (A_1 A_2)_{i,k} ≥ 1 and (A_3)_{i,k} = 1.

We next describe the triangle listing algorithm. For simplicity, we assume n is a power
of two.

Function TrianglesList(A; I, J, K)
Input: An n × n adjacency matrix A, and I, J, K ⊆ {1, . . . , n}
Output: A listing of all triangles (i, j, k) with i ∈ I, j ∈ J, k ∈ K.
1. If n = 1, check if a triangle exists. If so, output it.
2. If TriangleExists(A; I, J, K) == False, return.
3. Partition I = I_1 ∪ I_2, J = J_1 ∪ J_2, K = K_1 ∪ K_2.
4. Run TrianglesList(A; I_a, J_b, K_c) for all 1 ≤ a, b, c ≤ 2.

We will run TrianglesList(A; V, V, V) to enumerate all triangles in the graph.

Lemma 1.6. If G has m triangles, then TrianglesList outputs all triangles, and runs in time
O(n^ω m^{1−ω/3}).

In particular, if ω = 2, the algorithm runs in time O(n^2 m^{1/3}).

Proof. It is clear that the algorithm lists all triangles, and every triangle is listed once. To
analyze its running time, consider the tree defined by the execution of the algorithm. A
node at depth d corresponds to three matrices of size n/2^d × n/2^d. It either has no children
(if there is no triangle in the corresponding sets of vertices), or it has 8 children. Let ℓ_i denote
the number of nodes at depth i; then we know that

    ℓ_i ≤ min(8^i, 8m).

The first bound is obvious; the second follows because for any node at depth i, its parent at
depth i − 1 must contain a triangle, and all the triangles at a given depth are disjoint. The
computation time at level i is given by

    T_i = ℓ_i · O((n/2^i)^ω).

Let i* be the level at which 8^{i*} = 8m. The computation time up to level i* is given by

    Σ_{i ≤ i*} T_i ≤ Σ_{i ≤ i*} 8^i O((n/2^i)^ω) = Σ_{i ≤ i*} O(n^ω 2^{i(3−ω)}) = O(n^ω 2^{i*(3−ω)}) = O(T_{i*}).

The computation time after level i* is given by

    Σ_{i ≥ i*} T_i ≤ Σ_{i ≥ i*} 8m · O((n/2^i)^ω) = O(8m (n/2^{i*})^ω) = O(T_{i*}),

since this sum is geometrically decreasing. So the total running time is controlled by that of
level i*, and hence

    Σ T_i = O(T_{i*}) = O(n^ω m^{1−ω/3}).

Open Problem 1.7. How fast can we find one triangle in a graph? How about m triangles?

2 Fast Fourier Transform and fast polynomial multiplication

The Fast Fourier Transform is an amazing discovery with many applications. Here, we
motivate it by the problem of computing quickly the product of two polynomials.

2.1 Univariate polynomials

Fix a field, say the reals. A univariate polynomial is

    f(x) = Σ_{i=0}^{n} f_i x^i,

where x is the variable and f_i ∈ R are the coefficients. We may assume that f_n ≠ 0, in which
case we say that f has degree n.
Given two polynomials f, g of degree n, their sum is given by

    (f + g)(x) = f(x) + g(x) = Σ_{i=0}^{n} (f_i + g_i) x^i.

Note that given f, g as their lists of coefficients, we can compute f + g in time O(n).
The product of two polynomials f, g of degree n each is given by

    (fg)(x) = f(x)g(x) = (Σ_{i=0}^{n} f_i x^i)(Σ_{j=0}^{n} g_j x^j) = Σ_{i=0}^{n} Σ_{j=0}^{n} f_i g_j x^{i+j} = Σ_{i=0}^{2n} (Σ_{j=0}^{min(i,n)} f_j g_{i−j}) x^i.

So, in order to compute the coefficients of fg, we need to compute Σ_{j=0}^{min(i,n)} f_j g_{i−j} for all
0 ≤ i ≤ 2n. This trivially takes time n^2. We will see how to do it in time O(n log n), using
the Fast Fourier Transform (FFT).

2.2 The Fast Fourier Transform

Let ω_n ∈ C be a primitive n-th root of unity, ω_n = e^{2πi/n} = cos(2π/n) + i sin(2π/n). The
order-n Fourier matrix is given by

    (F_n)_{i,j} = (ω_n)^{ij} = (ω_n)^{ij mod n}.

What is so special about it? Well, as we will see soon, we can multiply it by a vector in
time O(n log n), whereas for general matrices this takes time O(n^2). To keep the description
simple, we assume from now on that n is a power of two.

Theorem 2.1. For any x ∈ C^n, we can compute F_n x using O(n log n) additions and
multiplications.

Proof. Decompose F_n into four n/2 × n/2 matrices as follows. First, reorder the rows to list
first all n/2 even indices, then all n/2 odd indices. Let F'_n be the new matrix, with re-ordered
rows. Decompose

    F'_n = [ A B ]
           [ C D ].

What are A, B, C, D? If 1 ≤ a, b ≤ n/2 then

    A_{a,b} = (F_n)_{2a,b} = (ω_n)^{2ab} = (ω_{n/2})^{ab} = (F_{n/2})_{a,b}.
    B_{a,b} = (F_n)_{2a,b+n/2} = (ω_n)^{2ab+an} = (ω_n)^{2ab} = (F_{n/2})_{a,b}.
    C_{a,b} = (F_n)_{2a+1,b} = (ω_n)^{2ab+b} = (F_{n/2})_{a,b} (ω_n)^b.
    D_{a,b} = (F_n)_{2a+1,b+n/2} = (ω_n)^{2ab+b+an+n/2} = −(F_{n/2})_{a,b} (ω_n)^b.

So, let P be the n/2 × n/2 diagonal matrix with P_{b,b} = (ω_n)^b. Then

    A = B = F_{n/2},    C = −D = F_{n/2} P.

In order to compute F_n x, decompose x = (x', x'') with x', x'' ∈ C^{n/2}. Then

    F'_n x = (Ax' + Bx'', Cx' + Dx'') = (F_{n/2}(x' + x''), F_{n/2} P (x' − x'')).

Let T(n) be the number of additions and multiplications required to multiply F_n by a vector.
Then to compute F_n x we need to:
- Compute x' + x'', x' − x'' and P(x' − x''), which takes time 3n since P is diagonal.
- Compute F_{n/2}(x' + x'') and F_{n/2} P(x' − x''). This takes time 2T(n/2).
- Reorder the entries of F'_n x to compute F_n x, which takes another n steps.
So we obtain the recursion formula

    T(n) = 2T(n/2) + 4n.

This recursion solves to T(n) ≤ 4n log n:

    T(n) ≤ 2 · 4(n/2) log(n/2) + 4n = 4n log n − 4n + 4n = 4n log n.
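For illustration, here is a Python sketch of this recursion (my code, not from the notes). It computes (F_n x)_k = Σ_j x_j ω_n^{jk} by splitting the input by even and odd index, the mirror image of the row split used in the proof:

    # Recursive radix-2 FFT: y_k = sum_j x_j * omega_n^{jk}, omega_n = e^{2 pi i / n}.
    import cmath

    def fft(x):
        n = len(x)                       # n must be a power of two
        if n == 1:
            return list(x)
        even = fft(x[0::2])              # transform of even-indexed entries (length n/2)
        odd = fft(x[1::2])               # transform of odd-indexed entries (length n/2)
        out = [0j] * n
        for k in range(n // 2):
            w = cmath.exp(2j * cmath.pi * k / n)      # omega_n^k
            out[k] = even[k] + w * odd[k]
            out[k + n // 2] = even[k] - w * odd[k]    # uses omega_n^{n/2} = -1
        return out

    # Sanity check against the direct O(n^2) definition.
    x = [1, 2, 3, 4]
    direct = [sum(x[j] * cmath.exp(2j * cmath.pi * j * k / 4) for j in range(4)) for k in range(4)]
    assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), direct))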

Open Problem 2.2. Can we compute F_n x faster than O(n log n)? Maybe as fast as O(n)?

2.3 Inverse FFT

The inverse of the Fourier matrix is the complex conjugate of the Fourier matrix, up to
scaling.

Lemma 2.3. (F_n)^{−1} = (1/n) conj(F_n).

Proof. We have conj(F_n)_{a,b} = conj(ω_n^{ab}) = ω_n^{−ab}. So

    (F_n conj(F_n))_{a,b} = Σ_{c=0}^{n−1} ω_n^{ac} ω_n^{−cb} = Σ_{c=0}^{n−1} (ω_n)^{(a−b)c}.

If a = b then the sum equals n. We claim that when a ≠ b the sum is zero. To see that, let
S = Σ_{i=0}^{n−1} ω_n^{ic}, where c ≠ 0 mod n. Then

    (ω_n)^c S = Σ_{i=1}^{n} (ω_n)^{ic} = Σ_{i=0}^{n−1} (ω_n)^{ic} = S.

Since ω_n has order n, we have ω_n^c ≠ 1, and hence S = 0. This shows that

    F_n conj(F_n) = n I_n.

Corollary 2.4. For any x ∈ C^n, we can compute (F_n)^{−1} x using O(n log n) additions and
multiplications.

Proof. We have

    (F_n)^{−1} x = (1/n) conj(F_n) x = (1/n) conj(F_n conj(x)).

We can conjugate x to obtain conj(x) in time O(n); compute F_n conj(x) in time O(n log n);
and conjugate the output and divide by n in time O(n).

2.4 Fast polynomial multiplication

Let f(x) = Σ_{i=0}^{n−1} f_i x^i be a polynomial. We identify it with its list of coefficients
(f_0, f_1, . . . , f_{n−1}) ∈ C^n. Its order-n Fourier transform is defined as its evaluations on the
n-th roots of unity:

    f̂_i = f(ω_n^i).

Lemma 2.5. Let f(x) be a polynomial of degree n − 1. Its Fourier transform can be
computed in time O(n log n).

Proof. We have f̂_j = Σ_{i=0}^{n−1} f_i ω_n^{ij}. So

    f̂ = F_n (f_0, f_1, . . . , f_{n−1}).

Corollary 2.6. Let f be a polynomial of degree n − 1. Given the evaluations of f at the
n-th roots of unity, we can recover the coefficients of f in time O(n log n).

Proof. Compute f = (F_n)^{−1} f̂.

The Fourier transform of a product has a simple formula: writing h = fg,

    ĥ_i = h(ω_n^i) = f(ω_n^i) · g(ω_n^i) = f̂_i ĝ_i.

So, we can multiply two polynomials as follows: compute their Fourier transforms; multiply
them coordinate-wise; and then perform the inverse Fourier transform. Note that if f, g
have degrees d, e, respectively, then fg has degree d + e. So, we need to choose n > d + e to
compute their product correctly.
Fast polynomial multiplication
Input: Polynomials f, g
Output: The product fg
0. Let n be the smallest power of two with n ≥ deg(f) + deg(g) + 1.
1. Pad f, g to length n if necessary (by adding zeros).
2. Compute f̂ = F_n f.
3. Compute ĝ = F_n g.
4. Compute ĥ_i = f̂_i ĝ_i for 1 ≤ i ≤ n.
5. Return fg = (F_n)^{−1} ĥ.
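The same five steps can be carried out with a library FFT. A numpy sketch (my code, not from the notes; numpy's FFT uses the conjugate root of unity, which is harmless here since the forward and inverse transforms use the same convention):

    # Multiply two polynomials (coefficient lists, lowest degree first) via FFT.
    import numpy as np

    def poly_multiply(f, g):
        deg = len(f) + len(g) - 2                 # degree of the product
        n = 1
        while n < deg + 1:                        # smallest power of two >= deg(f) + deg(g) + 1
            n *= 2
        fh = np.fft.fft(f, n)                     # pad to length n and transform
        gh = np.fft.fft(g, n)
        product = np.fft.ifft(fh * gh)            # pointwise multiply, then invert
        return np.round(product.real[:deg + 1]).astype(int)

    # (1 + x)(1 + 2x + x^2) = 1 + 3x + 3x^2 + x^3
    print(poly_multiply([1, 1], [1, 2, 1]))       # [1 3 3 1]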

2.5 Multivariate polynomials

Let f, g be multivariate polynomials. For simplicity, let's consider bivariate polynomials. Let

    f(x, y) = Σ_{i,j=0}^{n} f_{i,j} x^i y^j,    g(x, y) = Σ_{i,j=0}^{n} g_{i,j} x^i y^j.

Their product is

    (fg)(x, y) = Σ_{i,j,i',j'=0}^{n} f_{i,j} g_{i',j'} x^{i+i'} y^{j+j'} = Σ_{i,j=0}^{2n} (Σ_{i'=0}^{min(n,i)} Σ_{j'=0}^{min(n,j)} f_{i',j'} g_{i−i',j−j'}) x^i y^j.

Our goal is to compute fg quickly. One approach is to define a two-dimensional FFT.
Instead, we will reduce the problem of multiplying two bivariate polynomials of degree n
in each variable to the problem of multiplying two univariate polynomials of degree O(n^2),
and then apply the algorithm using the standard FFT.
Let N be large enough, to be determined later, and define the following univariate
polynomials:

    F(z) = Σ_{i,j=0}^{n} f_{i,j} z^{Ni+j},    G(z) = Σ_{i,j=0}^{n} g_{i,j} z^{Ni+j}.

We can clearly compute F, G from f, g in linear time, and as deg(F), deg(G) ≤ (N + 1)n,
we can compute FG in time O((Nn) log(Nn)). The only question is whether we can infer
fg from FG.

Lemma 2.7. Let N ≥ 2n + 1. If H(z) = F(z)G(z) = Σ H_i z^i then

    (fg)(x, y) = Σ_{i,j=0}^{2n} H_{Ni+j} x^i y^j.

Proof. We have

    H(z) = F(z)G(z) = (Σ_{i,j=0}^{n} f_{i,j} z^{Ni+j})(Σ_{i',j'=0}^{n} g_{i',j'} z^{Ni'+j'}) = Σ_{i,j,i',j'=0}^{n} f_{i,j} g_{i',j'} z^{N(i+i')+(j+j')}.

We need to show that the only solutions of

    N(i + i') + (j + j') = N i* + j*,

where 0 ≤ i, i', j, j' ≤ n and 0 ≤ i*, j* ≤ 2n, are those which satisfy i + i' = i*, j + j' = j*. As
0 ≤ j + j', j* ≤ 2n and N > 2n, if we compute the value modulo N we get that j + j' = j*,
and hence also i + i' = i*.

Corollary 2.8. We can compute the product of two bivariate polynomials of degree n in
time O(n^2 log n).

3 Secret sharing schemes

Secret sharing is a method for distributing a secret amongst a group of participants, each
of whom is allocated a share of the secret. The secret can only be reconstructed when an
allowed group of participants collaborates, and otherwise no information is learned about
the secret. Here, we consider the following special case. There are n players, each receiving
a share. The requirement is that every group of k players can together learn the secret, but
any group of fewer than k players learns nothing about the secret. A method to accomplish
this is called an (n, k)-secret sharing scheme.

Example 3.1 ((3, 2)-secret sharing scheme). Assume a secret s ∈ {0, 1}. The shares
(S_1, S_2, S_3) are joint random variables, defined as follows. Sample x ∈ F_5 uniformly, and set
S_1 = s + x, S_2 = s + 2x, S_3 = s + 3x. Then each of S_1, S_2, S_3 is uniform in F_5, even
conditioned on the secret, but any pair defines two independent linear equations in the two
variables x, s, which can be solved.

We will show how to construct (n, k)-secret sharing schemes for any n ≥ k ≥ 1. This will
follow [Sha79].

3.1 Construction of a secret sharing scheme

In order to construct (n, k)-secret sharing schemes, we will use polynomials. We will later
see that this is an instance of a more general phenomenon. Let F be a finite field of size
|F| > n. We choose a random polynomial f(x) of degree k − 1 as follows: f(x) = Σ_{i=0}^{k−1} f_i x^i,
where f_0 = s and f_1, . . . , f_{k−1} ∈ F are chosen uniformly. Let α_1, . . . , α_n ∈ F be distinct
nonzero elements. The share for player i is S_i = f(α_i). Note that the secret is s = f(0). For
example, the (3, 2)-secret sharing scheme corresponds to F = F_5, α_1 = 1, α_2 = 2, α_3 = 3.
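For illustration, a small Python sketch of this construction over a prime field (my code, not from the notes; the prime p, the evaluation points 1, . . . , n and the Lagrange-interpolation reconstruction are standard choices not spelled out above):

    # Shamir (n, k)-secret sharing over F_p: shares are evaluations of a random
    # degree-(k-1) polynomial whose constant term is the secret.
    import random

    p = 2**31 - 1                      # a prime larger than n

    def make_shares(secret, n, k):
        coeffs = [secret] + [random.randrange(p) for _ in range(k - 1)]   # f_0 = s, rest random
        return [(i, sum(c * pow(i, j, p) for j, c in enumerate(coeffs)) % p)
                for i in range(1, n + 1)]

    def reconstruct(shares):
        # Lagrange interpolation at x = 0 using any k shares: s = f(0).
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % p
                    den = den * (xi - xj) % p
            secret = (secret + yi * num * pow(den, p - 2, p)) % p
        return secret

    shares = make_shares(secret=123456789, n=5, k=3)
    print(reconstruct(shares[:3]))     # any 3 shares recover the secret: 123456789
    print(reconstruct(shares[1:4]))    # 123456789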
Theorem 3.2. This is an (n, k)-secret sharing scheme.

The proof will use the following definition and claim.

Definition 3.3 (Vandermonde matrices). Let α_1, . . . , α_k ∈ F be distinct elements in a field.
The Vandermonde matrix V = V(α_1, . . . , α_k) is defined as follows:

    V_{i,j} = (α_i)^{j−1}.

Lemma 3.4. If α_1, . . . , α_k ∈ F are distinct elements then det(V(α_1, . . . , α_k)) ≠ 0.

Proof sketch. We will show that det(V(α_1, . . . , α_k)) = Π_{i<j} (α_j − α_i). In particular, it is
nonzero whenever α_1, . . . , α_k are distinct. To see that, let x_1, . . . , x_k ∈ F be variables, and
define the polynomial

    f(x_1, . . . , x_k) = det(V(x_1, . . . , x_k)).

First, note that if we set x_i = x_j for some i ≠ j, then f|_{x_i = x_j} = 0. This is since the matrix
V(x_1, . . . , x_k) with x_i = x_j has two identical rows (the i-th and j-th rows), and hence its
determinant is zero. This then implies (and we omit the proof here) that f(x) is divisible
by x_i − x_j for all i ≠ j. So we can factor

    f(x_1, . . . , x_k) = Π_{i>j} (x_i − x_j) · g(x_1, . . . , x_k)

for some polynomial g(x_1, . . . , x_k). Next, we claim that g is a constant. This will follow by
comparing degrees. Recall that for an n × n matrix V we have

    det(V) = Σ_{π ∈ S_n} (−1)^{sign(π)} Π_{i=1}^{n} V_{π(i),i},

where π ranges over all permutations of {1, . . . , n}. In our case, each entry V_{π(j),j} = x_{π(j)}^{j−1}
is a polynomial of degree j − 1, and hence

    deg(det(V(x_1, . . . , x_n))) = Σ_{j=0}^{n−1} j = (n choose 2).

Observe that also

    deg(Π_{i>j} (x_i − x_j)) = (n choose 2).

So it must be that g is a constant. One can further verify that in fact g = 1, although we
will not need that.
If we substitute x_i = α_i we obtain, as α_1, . . . , α_n are distinct, that

    det(V(α_1, . . . , α_n)) = Π_{i>j} (α_i − α_j) ≠ 0.

Proof of Theorem 3.2. We need to show two things: (i) any k players can recover the secret,
and (ii) any k − 1 players learn nothing about it.
(i) Consider any k players, say i_1, . . . , i_k. Each share S_{i_j} is a linear combination of the
k unknown variables f_0, . . . , f_{k−1}. We will show that they are linearly independent,
and hence the players have enough information to solve for the k unknowns, and
in particular can recover f_0 = s. Let V = V(α_{i_1}, . . . , α_{i_k}). By definition we have
S_{i_j} = (V f)_j, where we view f = (f_0, . . . , f_{k−1}) as a vector in F^k. Since det(V) ≠ 0, the
players can solve the system of equations and obtain f_0 = (V^{−1}(S_{i_1}, . . . , S_{i_k}))_1.
(ii) Consider any k − 1 players, say i_1, . . . , i_{k−1}. We will show that for any fixing of f_0 = s,
the random variables S_{i_1}, . . . , S_{i_{k−1}} are independent and uniform over F. To see that,
let V = V(0, α_{i_1}, . . . , α_{i_{k−1}}) and let f = (f_0, . . . , f_{k−1}) ∈ F^k be chosen uniformly.
Then, (f_0, S_{i_1}, . . . , S_{i_{k−1}}) = V f is also uniform in F^k. In particular, f_0 is independent
of (S_{i_1}, . . . , S_{i_{k−1}}), and hence the distribution of (S_{i_1}, . . . , S_{i_{k−1}}), which happens to be
uniform in F^{k−1}, is independent of the choice of f_0. Thus, the k − 1 players learn no
information about the secret.

We can in fact generalize this construction. Let M be an (n + 1) × k matrix over a field
F with the following properties:
- The first row of M is (1, 0, . . . , 0).
- Every k rows of M are linearly independent.
For example, if α_1, . . . , α_n ∈ F are nonzero and distinct, then M_{i,j} = (α_{i−1})^{j−1} achieves this,
where we set α_0 = 0 and 0^0 = 1. The secret sharing scheme is as follows: choose
f_1, . . . , f_{k−1} ∈ F uniformly at random, compute S = M(s, f_1, . . . , f_{k−1}), and give S_i to the
i-th player as its share. It is easy to verify that the proof of Theorem 3.2 extends to this more
general case. We will later see that such matrices play an important role in other domains,
such as coding theory (MDS codes) and pseudo-randomness (k-wise independent random
variables).

3.2 Lower bound on the share size

In our construction, the shares were elements of F, and hence their size grew with the number
of players. One may ask whether this is necessary, or whether there are better constructions
which achieve smaller shares. Here, we will only analyze the case of linear constructions
(such as the above), although similar bounds can be obtained for general secret sharing
schemes.

Lemma 3.5. Let M be an n × k matrix over a field F, with n ≥ k + 2, such that any k rows
of M are linearly independent. Then |F| ≥ max(k, n − k). In particular, |F| ≥ n/2.

We note that the condition n ≥ k + 2 is tight: the (k + 1) × k matrix whose first k rows
form the identity matrix, and whose last row is all ones, has this property over any field, and
in particular over F_2. We also note that there is a conjecture (called the MDS conjecture)
which speculates that in fact |F| ≥ n − 1, and our construction above is tight.

Proof. We can apply any invertible linear transformation to the columns of M, without
changing the property that any k rows are linearly independent. So, we may assume that the
first k rows of M form the k × k identity matrix I, and the remaining n − k rows form an
(n − k) × k matrix R.
Next, we argue that R cannot contain any 0. Otherwise, if for example R_{i,j} = 0, then
the following k rows of M are linearly dependent: the i-th row of R, and the k − 1 rows of
the identity matrix which exclude row j. So, R_{i,j} ≠ 0 for all i, j.
Hence, we may scale the rows of R so that R_{i,1} = 1 for all 1 ≤ i ≤ n − k. Moreover, we
can then scale the columns of R so that R_{1,i} = 1 for all 1 ≤ i ≤ k. There are now two cases
to consider:
(i) If |F| < k then R_{2,i} = R_{2,j} for some 1 ≤ i < j ≤ k. But then the following k rows are
linearly dependent: the first and second rows of R, and the k − 2 rows of the identity
matrix which exclude rows i, j. So, |F| ≥ k.
(ii) If |F| < n − k then R_{i,2} = R_{j,2} for some 1 ≤ i < j ≤ n − k. But then the following k
rows are linearly dependent: the i-th and j-th rows of R, and the k − 2 rows of the
identity matrix which exclude rows 1, 2. So, |F| ≥ n − k.

4 Error correcting codes

4.1 Basic definitions

An error correcting code allows one to encode messages into (longer) codewords, such that
even in the presence of errors, we can decode the original message. Here, we focus on
worst-case errors, where we make no assumptions on the distribution of errors, but instead
limit the number of errors.

Definition 4.1 (Error correcting code). Let Σ be a finite set, n ≥ k ≥ 1. An error correcting
code over the alphabet Σ of message length k and codeword length n (also called block length)
consists of
- A set of codewords C ⊆ Σ^n of size |C| = |Σ|^k.
- A one-to-one encoding map E : Σ^k → C.
- A decoding map D : Σ^n → Σ^k.
We require that D(E(m)) = m for all messages m ∈ Σ^k.

To describe the error correcting capability of a code, define the distance of x, y ∈ Σ^n as
the number of coordinates where they differ,

    dist(x, y) = |{i ∈ [n] : x_i ≠ y_i}|.

Definition 4.2 (Error correction capability of a code). A code (C, E, D) can correct up to e
errors if for any message m ∈ Σ^k and any x ∈ Σ^n such that dist(E(m), x) ≤ e, it holds that
D(x) = m.

Example 4.3 (Repetition code). Let Σ = {0, 1}, k = 1, n = 3. Define C = {000, 111} and
E : {0, 1} → {0, 1}^3 by E(0) = 000, E(1) = 111. Define D : {0, 1}^3 → {0, 1} by D(x_1, x_2, x_3) =
Majority(x_1, x_2, x_3). Then (C, E, D) can correct up to 1 error.

If we just care about combinatorial bounds (that is, ignore algorithmic aspects), then a
code is defined by its codewords. We can define E : Σ^k → C in any one-to-one way, and
D : Σ^n → Σ^k by mapping x ∈ Σ^n to the closest codeword E(m), breaking ties arbitrarily.
From now on, we simply describe codes by describing the set of codewords C. Once we start
discussing algorithms for encoding and decoding, we will revisit this assumption.

Definition 4.4 (Minimal distance of a code). The minimal distance of C is the minimal
distance between any two distinct codewords,

    dist_min(C) = min_{x ≠ y ∈ C} dist(x, y).

Definition 4.5 ((n, k, d)-code). An (n, k, d)-code over an alphabet Σ is a set of codewords
C ⊆ Σ^n of size |C| = |Σ|^k and minimal distance d.

Lemma 4.6. Let C be an (n, k, 2e + 1)-code. Then it can decode from e errors.

Proof. Let x ∈ C, y ∈ Σ^n be such that dist(x, y) ≤ e. We claim that x is the unique closest
codeword to y. Assume not, that is, there is another x' ∈ C with dist(x', y) ≤ e. Then
by the triangle inequality, dist(x, x') ≤ dist(x, y) + dist(y, x') ≤ 2e, which contradicts the
assumption that the minimal distance of C is 2e + 1.

Moreover, if C has minimal distance d, then there exist x, x' ∈ C and y ∈ Σ^n such that
dist(x, y) + dist(x', y) = d, so the bound is tight. So, we can restrict our study to the
existence of (n, k, d)-codes.

4.2 Basic bounds

Lemma 4.7 (Singleton bound). Let C be an (n, k, d)-code. Then k ≤ n − d + 1.

Proof. We have C ⊆ Σ^n of size |C| = |Σ|^k. Let C' ⊆ Σ^{n−d+1} be the code obtained by
deleting the first d − 1 coordinates from all codewords of C. Note that all codewords remain
distinct, as we assume the minimal distance is at least d. So, |C'| = |Σ|^k. This implies that
k ≤ n − d + 1.

An MDS code (Maximal Distance Separable) is a code for which k = n − d + 1. We will
later see an example of such a code (the Reed-Solomon code).

Lemma 4.8 (Hamming bound). Let C be an (n, k, 2e + 1)-code over Σ with |Σ| = q. Then

    q^k Σ_{i=0}^{e} (n choose i) (q − 1)^i ≤ q^n.

Proof. For each codeword x ∈ C define the ball of radius e around it,

    B(x) = {y ∈ Σ^n : dist(x, y) ≤ e}.

These balls cannot intersect, by the minimal distance requirement. Each ball contains
Σ_{i=0}^{e} (n choose i) (q − 1)^i elements, and there are q^k such balls. The lemma follows.

The Singleton bound and the Hamming bound are incomparable, in the sense that in
some regimes one is superior to the other, and in other regimes the opposite holds. The
following example demonstrates that.

Example 4.9. Let d = 3, corresponding to correcting 1 error. The Singleton bound gives
k ≤ n − 2 over any alphabet. The Hamming bound gives (for e = 1)

    q^k (1 + (q − 1)n) ≤ q^n.

For binary codes, e.g. q = 2, it gives 2^k (n + 1) ≤ 2^n, which gives k ≤ n − log_2(n + 1), which
is a stronger bound than that given by the Singleton bound. On the other hand, if we take q
to be very large, then the Hamming bound gives k ≤ n − 1 + o_q(1), which is weaker than the
Singleton bound.

4.3 Existence of asymptotically good codes

An (n, k, d)-code is said to be asymptotically good (or simply good) if k = αn, d = βn for
some constants α, β > 0. More precisely, we consider families of codes with growing n and
fixed α, β > 0. The Singleton bound implies that α + β ≤ 1, and MDS codes achieve that. It
is unknown whether this is achievable over a binary alphabet, and it is one of the major open
problems in coding theory. Here, we will show that good codes exist for some constants α, β,
without trying to optimize them.

Lemma 4.10. There exists a family C ⊆ {0, 1}^n of size 2^{n/10} such that dist_min(C) ≥ n/10.

Proof. The proof is probabilistic. For N = 2^{n/10} let x_1, . . . , x_N ∈ {0, 1}^n be uniformly
chosen. We claim that with high probability, C = {x_1, . . . , x_N} is as claimed. To see that,
let's consider the probability that dist(x_i, x_j) ≤ n/10 for some fixed 1 ≤ i < j ≤ N. The
number of choices for x_i is 2^n. Given x_i, the number of choices for x_j of distance at most
n/10 from x_i is Σ_{i=0}^{n/10} (n choose i). This should be divided by the total number of pairs,
which is 2^{2n}. So,

    Pr[dist(x_i, x_j) ≤ n/10] ≤ (2^n Σ_{i=0}^{n/10} (n choose i)) / 2^{2n} ≤ n (n choose n/10) / 2^n.

We need some estimates for the binomial coefficient. A useful one is

    (n/m)^m ≤ (n choose m) ≤ (en/m)^m.

So,

    (n choose n/10) ≤ (en/(n/10))^{n/10} = ((10e)^{1/10})^n ≤ (1.4)^n.

So,

    Pr[dist(x_i, x_j) ≤ n/10] ≤ n (1.4)^n / 2^n = n (0.7)^n.

Now, the probability that there exists some pair 1 ≤ i < j ≤ N such that dist(x_i, x_j) ≤ n/10
can be upper bounded by the union bound,

    Pr[∃ 1 ≤ i < j ≤ N, dist(x_i, x_j) ≤ n/10] ≤ Σ_{1 ≤ i < j ≤ N} Pr[dist(x_i, x_j) ≤ n/10]
        ≤ N^2 n (0.7)^n = 2^{2n/10} n (0.7)^n ≤ n (0.81)^n.

So, the probability that dist_min(C) ≤ n/10 is at most n (0.81)^n, which is exponentially
small. Hence, with very high probability, the randomly chosen code will be a good code.

We can get the same bound without using probability. Consider the following process
for choosing x_1, x_2, . . . , x_N ∈ {0, 1}^n. Pick x_1 arbitrarily, and delete all points of distance
≤ n/10 from it; pick x_2 from the remaining points, and delete all points of distance ≤ n/10
from it; pick x_3 from the remaining points, and so on. Continue in this way until all points
are exhausted. The number of points chosen, N, satisfies

    N ≥ 2^n / Σ_{i=0}^{n/10} (n choose i).

This is because we initially have a total of 2^n points, and at each step we delete at most
Σ_{i=0}^{n/10} (n choose i) points. The same calculations as before show that N ≥ 2^{n/10}.

4.4 Linear codes

A special family of codes are linear codes. Let F be a finite field. In a linear code, Σ = F and
C ⊆ F^n is a k-dimensional subspace. The encoding map is a linear map: E(x) = Ax, where
A is an n × k matrix over F. Note that rank(A) = k, as otherwise the set of codewords would
have dimension less than k. In practice, nearly all codes are linear, as the encoding map is
easy to define. However, the decoding map needs inherently to be nonlinear, and is usually
the hardest to compute.

Claim 4.11. Let C be a linear code. Then dist_min(C) = min_{0 ≠ x ∈ C} dist(0, x).

Proof. If x_1, x_2 ∈ C have the minimal distance, then dist(x_1, x_2) = dist(0, x_1 − x_2) and
x_1 − x_2 ∈ C.

We can view the decoding problem, from either erasures or errors, as a linear algebra
problem. Let A be an n × k matrix. The codewords are C = {Ax : x ∈ F^k}, or equivalently
the subspace spanned by the columns of A.

Decoding from erasures. The problem of decoding from erasures is equivalent to the
following problem: given y ∈ (F ∪ {?})^n, find x ∈ F^k such that (Ax)_i = y_i for all y_i ≠ ?.
Equivalently, we want the sub-matrix formed by keeping only the rows {i ∈ [n] : y_i ≠ ?} to
have rank k. So, the requirement that a linear code can be uniquely decoded from e erasures
is equivalent to the requirement that if any e rows of the matrix are deleted, it still has rank
k. Clearly, e ≤ n − k. We will see a code achieving this bound, the Reed-Solomon code. It
will be based on polynomials.

Decoding from errors. The problem of decoding from e errors is equivalent to the
following problem: given y ∈ F^n, find x ∈ F^k such that (Ax)_i ≠ y_i for at most e coordinates.
Equivalently, we want to find a vector spanned by the columns of A which agrees with y
in at least n − e coordinates. If the code has minimal distance d, then we know that this is
mathematically possible whenever e < d/2; however, finding this vector is in general
computationally hard. We will see a code where this is possible, and which moreover has the
best minimal distance, d = n − k + 1. Again, it will be the Reed-Solomon code.

5 Reed-Solomon codes

Reed-Solomon codes are an important family of error-correcting codes that were introduced
by Irving Reed and Gustave Solomon in the 1960s. They have many important applications,
the most prominent of which include consumer technologies such as CDs, DVDs, Blu-ray
Discs, QR codes, satellite communication and so on.

5.1 Definition

Reed-Solomon codes are defined as the evaluations of low degree polynomials over a finite
field. Let F be a finite field. Messages in F^k are treated as the coefficients of a univariate
polynomial of degree k − 1, and codewords are its evaluations on n ≤ |F| points. So,
Reed-Solomon codes are defined by specifying F, k and n ≤ |F| distinct points α_1, . . . , α_n ∈ F,
and the codewords are

    C = {(f(α_1), f(α_2), . . . , f(α_n)) : f(x) = Σ_{i=0}^{k−1} f_i x^i, f_0, . . . , f_{k−1} ∈ F}.

We denote this family of codes in general by RS_F(n, k), and if needed, we can specify the
evaluation points. An important special case is when n = |F|, and we evaluate the polynomial
on all field elements.

Lemma 5.1. The minimal distance of RS_F(n, k) is d = n − k + 1.

Proof. As C is a linear code, it suffices to show that any nonzero polynomial f(x) of
degree ≤ k − 1 has at most k − 1 roots in F; hence |{i ∈ [n] : f(α_i) ≠ 0}| ≥ n − k + 1.
Now, this follows from the fundamental theorem of algebra: a nonzero polynomial of degree
r has at most r roots. We prove it below by induction on r.
If r = 0 then f is a nonzero constant, and so it has no roots. So, assume r ≥ 1. Let
α ∈ F be such that f(α) = 0. Let's shift the input so that the root is at zero. That is, define
g(x) = f(x + α), so that g(0) = 0 and g(x) is also a polynomial of degree r. Express it as

    g(x) = Σ_{i=0}^{r} f_i (x + α)^i = Σ_{i=0}^{r} g_i x^i.

Since g_0 = g(0) = 0, we get that g(x) = x h(x), where h(x) = Σ_{i=0}^{r−1} g_{i+1} x^i is a polynomial
of degree r − 1, and hence f(x) = g(x − α) = (x − α) h(x − α). By induction, h(x − α) has at
most r − 1 roots, and hence f has at most r roots.

Recall that the Singleton bound shows that in any (n, k, d)-code, d ≤ n − k + 1. Codes
which achieve this bound, i.e. for which d = n − k + 1, are called MDS codes (Maximal
Distance Separable). What we just showed is that Reed-Solomon codes are MDS codes.
In fact, for prime fields, it is known that Reed-Solomon codes are the only MDS codes [Bal11],
and it is conjectured to be true over non-prime fields as well (except for a few exceptions in
characteristic two).

5.2 Decoding Reed-Solomon codes from erasures

We first analyze the ability of Reed-Solomon codes to recover from erasures. Assume that
we are given a Reed-Solomon codeword with some coordinates erased. Let S denote the set
of remaining coordinates. That is, for S ⊆ [n] we know that f(α_i) = y_i for all i ∈ S, where
y_i ∈ F. The question is: for which sets S is this information sufficient to uniquely recover
the polynomial f?
Equivalently, we need to solve the following system of linear equations, where the unknowns
are the coefficients f_0, . . . , f_{k−1} of the polynomial f:

    Σ_{j=0}^{k−1} f_j α_i^j = y_i,    i ∈ S.

In order to analyze this, let V = V({α_i : i ∈ S}) be the |S| × k Vandermonde matrix given by
V_{i,j} = α_i^j for i ∈ S, 0 ≤ j ≤ k − 1. Then, we want to solve the system of linear equations

    V f = y,

where f = (f_0, . . . , f_{k−1}) ∈ F^k and y = (y_i : i ∈ S) ∈ F^{|S|}. Clearly, we need |S| ≥ k for
a unique solution to exist. As we saw, whenever |S| = k the matrix V is invertible, hence
there is a unique solution. So, as long as |S| ≥ k, we can restrict to k equations and uniquely
solve for the coefficients of f.

Corollary 5.2. The code RS_F(n, k) can be uniquely decoded from n − k erasures.
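For illustration, a minimal numpy sketch of erasure decoding as the linear system V f = y (my code, not from the notes; it works over the reals, whereas the lecture's setting is a finite field, where the same Vandermonde argument applies with modular arithmetic):

    # Erasure decoding of Reed-Solomon as a linear system: keep any k surviving
    # evaluations and solve the k x k Vandermonde system for the coefficients.
    import numpy as np

    k, n = 3, 6
    coeffs = np.array([5.0, -2.0, 1.0])              # f(x) = 5 - 2x + x^2 (the "message")
    alphas = np.arange(1, n + 1, dtype=float)        # evaluation points alpha_1..alpha_n
    codeword = np.vander(alphas, k, increasing=True) @ coeffs

    survivors = [0, 2, 5]                            # any k non-erased coordinates
    V = np.vander(alphas[survivors], k, increasing=True)
    recovered = np.linalg.solve(V, codeword[survivors])
    print(np.allclose(recovered, coeffs))            # True: k evaluations determine f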

5.3 Decoding Reed-Solomon codes from errors

Next, we study the harder problem of decoding from errors. Again, let f(x) = Σ_{i=0}^{k−1} f_i x^i
be an unknown polynomial of degree k − 1. We know its evaluations on α_1, . . . , α_n ∈ F, but
with a few errors, say e. That is, we are given y_1, . . . , y_n ∈ F, such that y_i ≠ f(α_i) for at
most e of the evaluations. If we knew the locations of the errors, we would be back at the
decoding from erasures scenario; however, we do not know them, and enumerating them is
too costly. Instead, we will describe an algebraic algorithm, called the Berlekamp-Welch
algorithm, which can detect the locations of the errors efficiently, as long as the number of
errors is not too large (interestingly enough, the algorithm was never published as an academic
paper, and instead is a patent).
Define an error locating polynomial E(x) as follows:

    E(x) = Π_{i : y_i ≠ f(α_i)} (x − α_i).

The decoder doesn't know E(x). However, we will still use it in the analysis. It satisfies the
following equation:

    E(α_i) (f(α_i) − y_i) = 0    for all 1 ≤ i ≤ n.

Let N(x) = E(x) f(x). Note that deg(E) = e and deg(N) = deg(E) + deg(f) = e + k − 1.
We have established the following claim.

Claim 5.3. There exist polynomials E(x), N(x) of degrees deg(E) = e, deg(N) = e + k − 1
such that

    N(α_i) − y_i E(α_i) = 0    for all 1 ≤ i ≤ n.

Proof. We have N(α_i) = E(α_i) f(α_i). This is equal to E(α_i) y_i as either y_i = f(α_i) or
otherwise E(α_i) = 0.

The main idea is that we can find such polynomials by solving a system of linear equations.

Claim 5.4. We can efficiently find polynomials Ẽ(x), Ñ(x) of degrees deg(Ẽ) ≤ e,
deg(Ñ) ≤ e + k − 1, not both zero, such that

    Ñ(α_i) − y_i Ẽ(α_i) = 0    for all 1 ≤ i ≤ n.

Proof. Let

    Ẽ(x) = Σ_{j=0}^{e} a_j x^j,    Ñ(x) = Σ_{j=0}^{e+k−1} b_j x^j,

where a_j, b_j are unknown coefficients. They need to satisfy the following system of n linear
equations:

    Σ_{j=0}^{e+k−1} b_j α_i^j − y_i Σ_{j=0}^{e} a_j α_i^j = 0    for all 1 ≤ i ≤ n.

We know that this system has a nonzero solution (since we know that E, N exist by our
assumptions). So, we can find a nonzero solution by linear algebra.

Note that it is not guaranteed that Ẽ = E, Ñ = N, so we are not done yet. However,
the next claim shows that we can still recover f from any Ẽ, Ñ that we find.

Claim 5.5. If e ≤ (n − k)/2 then Ñ(x) = Ẽ(x) f(x).

Proof. Consider the polynomial R(x) = Ñ(x) − Ẽ(x) f(x). Note that for any i such that
f(α_i) = y_i, we have that

    R(α_i) = Ñ(α_i) − Ẽ(α_i) f(α_i) = Ñ(α_i) − Ẽ(α_i) y_i = 0.

So, R has at least n − e roots. On the other hand, deg(R) ≤ max(deg(Ñ), deg(Ẽ) + deg(f)) ≤
e + k − 1. So, as long as n − e > e + k − 1, it has more roots than its degree, and hence must
be the zero polynomial.

Corollary 5.6. The code RS_F(n, k) can be uniquely decoded from (n − k)/2 errors.

Proof. Given Ñ, Ẽ we can solve for f such that Ñ(x) = Ẽ(x) f(x), either by polynomial
division or by solving a system of linear equations.

Note that this is the best we can do, as the minimal distance is n − k + 1.

Polynomial identity testing and finding perfect


matchings in graphs

We describe polynomial identity testing in this chapter. It is a key ingredient in several


algorithms. Here, we motivate it by the problem of checking whether a graph has a perfect
matching. We will first present a combinatorial algorithm, based on Halls marriage theorem.
Then, we will introduce a totally different algorithm based on polynomials and polynomial
identity testing.

6.1

Perfect matchings in bi-partite graphs

Let G = (U, V, E) be a bi-partite graph with |U | = |V | = n. Let U = {u1 , . . . , un } and


V = {v1 , . . . , vn }. A perfect matching is a matching of each node in U to an adjacent
distinct node in V . That is, it is given by a set of edges (u1 , v(1) ), (u2 , v(2) ), . . . , (un , v(n) ),
where Sn is a permutation.
We would like to characterize when a give graph has a perfect matching. For u U , let
(u) V denote the set of neighbors of u in V . For U 0 U , let (U 0 ) = uU 0 (u) be the
neighbors of the vertices of U 0 .
Theorem 6.1 (Hall marriage theorem). G has a perfect matching if and only if
|(U 0 )| |U 0 |

U 0 U.

Proof. The condition is clearly necessary: if G has a perfect matching {(ui , v(i) ) : i [n]}
for some permutation then if U 0 = {ui1 , . . . , uik } then v(i1 ) , . . . , v(ik ) (U 0 ), and hence
|(U 0 )| k = |U 0 |.
The more challenging direction is to show that the condition given is sufficient. To show
that, assume towards a contradiction that G has no perfect matching. Assume without loss
of generality (after possibly renaming the vertices) that M = {(u1 , v1 ), . . . , (um , vm )} is the
largest partial matching in G. Let u U \ {u1 , . . . , um }. We will build a partial matching
for {u1 , . . . , um , u}, which would violate the maximality of M .
If (u, v) E for some v
/ {v1 , . . . , vm }, then clearly M is not maximal, as we can add
the edge (u, v) to it. More generally, we say that a path P in G is an augmenting path for
M if it is of the form
P = u, vi1 , ui1 , vi2 , ui2 , . . . , vik , uik , v
with v
/ {v1 , . . . , vm }. Note that all the even edges in P are in the matching M (namely,
(vi1 , ui1 ), . . . , (vik , uik ) M ), and all the odd edges are not in the matching M (namely,
(u, vi1 ), (ui1 , vi2 ), . . . , (uik , v)
/ M ). Such a path would also allow us to increase the matching
size by one, by taking
M 0 = {(u, vi1 ), (ui1 , vi2 ), . . . , (uik , v)} {(uj , vj ) : j
/ {i1 , . . . , ik }}.
So, by our assumption that M is a partial matching of maximal size, there are no augmenting
paths in G which start at u.
30

We say that a path P in G is an alternating path if every other edge in it belongs to the matching M. Let P be an alternating path of maximum length in G starting at u. It has length at least 1, as u has at least one neighbor. So, it has the form
P = u, v_{i_1}, u_{i_1}, v_{i_2}, u_{i_2}, . . .
This path cannot end at a vertex v_i ∈ V, as it can always be extended to u_i ∈ U. So, it ends at some vertex u_{i_k} ∈ {u_1, . . . , u_m},
P = u, v_{i_1}, u_{i_1}, v_{i_2}, u_{i_2}, . . . , v_{i_k}, u_{i_k}.
Let U′ = {u, u_{i_1}, . . . , u_{i_k}} and V′ = {v_{i_1}, . . . , v_{i_k}}. We claim that V′ = Γ(U′), which would falsify our assumption, since |U′| = k + 1 and |V′| = k. To see that, assume that v ∈ Γ(U′). If v ∈ {v_1, . . . , v_m}, say v = v_i, then by the maximality of P we have u_i ∈ U′, and since the path enters u_i through the matching edge (v_i, u_i), also v_i ∈ V′. Otherwise, if v ∉ {v_1, . . . , v_m}, then it can be used to construct an augmenting path. Indeed, if (u_{i_ℓ}, v) ∈ E for some 1 ≤ ℓ ≤ k then the following is an augmenting path:
P = u, v_{i_1}, u_{i_1}, v_{i_2}, u_{i_2}, . . . , v_{i_ℓ}, u_{i_ℓ}, v.
In either case, we reached a contradiction.
So, we have a mathematical criterion to check if a bi-partite graph has a perfect matching. Moreover, it can be verified that the proof of Hall's marriage theorem is in fact algorithmic: it can be used to find larger and larger partial matchings in a graph. This is much better than verifying the conditions of the theorem directly, which naively would take time 2^n to enumerate all subsets of U. We will now see a totally different way to check if a graph has a perfect matching, using polynomial identity testing.
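The augmenting-path idea from the proof translates directly into an algorithm. Below is a minimal Python sketch (not from the notes; the function and variable names are our own) that repeatedly searches for an augmenting path by depth-first search and returns a perfect matching if one exists.

def perfect_matching(n, adj):
    """adj[u] = list of right-vertices adjacent to left-vertex u (0-indexed).
    Returns match_of_left with match_of_left[u] = matched right vertex,
    or None if no perfect matching exists."""
    match_of_right = [None] * n  # right vertex -> matched left vertex

    def try_augment(u, visited):
        # Try to match u, possibly re-matching previously matched vertices.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or the left vertex matched to v can be moved elsewhere.
            if match_of_right[v] is None or try_augment(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    for u in range(n):
        if not try_augment(u, set()):
            return None  # no augmenting path from u: no perfect matching
    match_of_left = [None] * n
    for v, u in enumerate(match_of_right):
        match_of_left[u] = v
    return match_of_left

# Example: a 3x3 bipartite graph which has a perfect matching.
print(perfect_matching(3, {0: [0, 1], 1: [0], 2: [1, 2]}))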

6.2 Polynomial representation

We will consider multivariate polynomials in x_1, . . . , x_n of degree d. How can we represent them? One way is explicitly, by their list of coefficients:
f(x_1, . . . , x_n) = Σ_{e_1,...,e_n ≥ 0 : Σ e_i ≤ d} f_{e_1,...,e_n} x_1^{e_1} · · · x_n^{e_n}.
For large degrees, this can be quite large: the number of monomials in n variables of degree d is (n+d choose n), which is exponentially large if both n and d are large. This means that evaluating the polynomial on a single input would take exponential time.
Another way is via a computation. Consider for example f(x) = (x + 1)^{2^n}. It has 2^n + 1 monomials, so evaluating it via summing over all monomials would take exponential time. However, we can compute it in time O(n) by first evaluating x + 1, and then squaring it iteratively n times. This shows that some polynomials of exponential degree can in fact be computed in polynomial time. In order to define this more formally, we introduce the notion of an algebraic circuit.

Definition 6.2 (Algebraic circuit). Let F be a field, and x_1, . . . , x_n be variables taking values in F. An algebraic circuit computes a polynomial in x_1, . . . , x_n. It is defined by a directed acyclic graph (DAG), with multiple leaves (nodes with no incoming edges) and a single root (node with no outgoing edges). Each leaf is labeled by either a constant c ∈ F or a variable x_i, which is the polynomial it computes. Internal nodes are labeled in one of two ways: they are either sum gates, which compute the sum of their inputs, or product gates, which compute the product of their inputs. The polynomial computed by the circuit is the polynomial computed by the root.
So for example, we can compute (x + 1)^{2^n} using an algebraic circuit of size O(n):
- It has two leaves, v_1, v_2, which compute the polynomials f_{v_1}(x) = 1 and f_{v_2}(x) = x.
- The node v_3 is a sum gate with two children, v_1, v_2. It computes the polynomial f_{v_3}(x) = x + 1.
- For i = 1, . . . , n, let v_{i+3} be a multiplication gate with two children, both being v_{i+2}. It computes the polynomial f_{v_{i+3}}(x) = f_{v_{i+2}}(x)^2 = (x + 1)^{2^i}.
- The root is v_{n+3}, which computes the polynomial (x + 1)^{2^n}.
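To make the repeated-squaring idea concrete, here is a small Python sketch (our own illustration, not part of the notes) that evaluates (x + 1)^{2^n} at a point using n multiplications, rather than summing the 2^n + 1 coefficients of the explicit representation.

def eval_power_tower(x, n, mod=None):
    """Evaluate (x + 1)^(2^n) using n squarings, optionally modulo `mod`."""
    y = x + 1
    for _ in range(n):
        y = y * y if mod is None else (y * y) % mod
    return y

# (x + 1)^(2^3) = 3^8 = 6561 at x = 2, n = 3.
print(eval_power_tower(2, 3))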


Definition 6.3. Let f(x_1, . . . , x_n) be a polynomial. It is said to be efficiently computable if deg(f) ≤ poly(n) and there is an algebraic circuit of size poly(n) which computes f. The class of polynomials which can be efficiently computed by algebraic circuits is called VP.
An interesting example for a polynomial in VP is the determinant of a matrix. Let M be a matrix with entries x_{i,j}. The determinant polynomial is
det(M)(x_{i,j} : 1 ≤ i, j ≤ n) = Σ_{σ∈S_n} (−1)^{sign(σ)} Π_{i=1}^n x_{i,σ(i)}.
The sum is over all the permutations σ on n elements. In particular, the determinant is a polynomial in n^2 variables of degree n. As a sum of monomials, it has n! monomials, which is very inefficient. However, we know that the determinant can be computed efficiently by Gaussian elimination. Although we will not show this here, it turns out that this can be transformed into an algebraic circuit of polynomial size which computes the determinant.
Another interesting polynomial is the permanent. It is defined very similarly to the determinant, except that we do not have the signs of the permutations:
per(M)(x_{i,j} : 1 ≤ i, j ≤ n) = Σ_{σ∈S_n} Π_{i=1}^n x_{i,σ(i)}.
A direct calculation of the permanent, by summing over all monomials, requires size n!. There are more efficient ways (such as Ryser's formula [Rys63]) which give an arithmetic circuit of size O(2^n n^2), but these are still exponential. It is suspected that no sub-exponential algebraic circuits can compute the permanent, but we do not know how to prove this. The importance of this problem is that the permanent is complete, in the sense that many counting problems can be reduced to computing the permanent of some specific matrices.

Open Problem 6.4. What is the size of the smallest algebraic circuit which computes the
permanent?

6.3 Polynomial identity testing

A basic question in mathematics is whether two objects are the same. Here, we will consider the following problem: given two polynomials f(x), g(x), possibly via an algebraic circuit, is it the case that f(x) = g(x)? Equivalently, since we can create a circuit computing f(x) − g(x), it is sufficient to check if a given polynomial is zero. If this polynomial is given via its list of coefficients, we can simply check that all of them are zero. But this can be a very expensive procedure, as the number of coefficients can be exponentially large. For example, verifying the formula for the determinant of a Vandermonde matrix directly would take exponential time if done in this way, although we saw a direct proof of this formula.
We will see that using randomness, it can be verified whether a polynomial is zero or not. This will be via the following lemma, called the Schwartz-Zippel lemma [Zip79, Sch80], which generalizes the fact that univariate polynomials of degree d have at most d roots to multivariate polynomials.
Lemma 6.5. Let f(x_1, . . . , x_n) be a nonzero polynomial of degree d. Let S ⊆ F be of size |S| > d. Then
Pr_{a_1,...,a_n∈S}[f(a_1, . . . , a_n) = 0] ≤ d/|S|.
Note that the lemma is tight, even in the univariate case: if f(x) = Π_{i=1}^d (x − i) and S = {1, . . . , s} then Pr_{a∈S}[f(a) = 0] = d/s.
Proof. The proof is by induction on n. If n = 1, then f(x_1) is a univariate polynomial of degree d, hence it has at most d roots, and hence Pr_{a_1∈S}[f(a_1) = 0] ≤ d/|S|.
If n > 1, we express the polynomial as
f(x_1, . . . , x_n) = Σ_{i=0}^{d} x_n^i f_i(x_1, . . . , x_{n−1}),
where f_0, . . . , f_d are polynomials in the remaining n − 1 variables, and where deg(f_i) ≤ d − i. Let e ≤ d be maximal such that f_e ≠ 0. Let a_1, . . . , a_n ∈ S be chosen independently and uniformly. We bound the probability that f(a_1, . . . , a_n) = 0 by
Pr_{a_1,...,a_n∈S}[f(a_1, . . . , a_n) = 0]
  = Pr[f(a_1, . . . , a_n) = 0 | f_e(a_1, . . . , a_{n−1}) = 0] · Pr[f_e(a_1, . . . , a_{n−1}) = 0]
  + Pr[f(a_1, . . . , a_n) = 0 | f_e(a_1, . . . , a_{n−1}) ≠ 0] · Pr[f_e(a_1, . . . , a_{n−1}) ≠ 0]
  ≤ Pr[f_e(a_1, . . . , a_{n−1}) = 0] + Pr[f(a_1, . . . , a_n) = 0 | f_e(a_1, . . . , a_{n−1}) ≠ 0].

We bound each term individually. We can bound the probability that f_e(a_1, . . . , a_{n−1}) = 0 by induction:
Pr_{a_1,...,a_{n−1}∈S}[f_e(a_1, . . . , a_{n−1}) = 0] ≤ deg(f_e)/|S| ≤ (d − e)/|S|.
Next, fix a_1, . . . , a_{n−1} ∈ S such that f_e(a_1, . . . , a_{n−1}) ≠ 0. The polynomial f(a_1, . . . , a_{n−1}, x) is a univariate polynomial in x of degree e, hence it has at most e roots. Thus, for any such fixing of a_1, . . . , a_{n−1} we have
Pr_{a_n∈S}[f(a_1, . . . , a_{n−1}, a_n) = 0] ≤ e/|S|.
This implies that also
Pr_{a_1,...,a_n∈S}[f(a_1, . . . , a_n) = 0 | f_e(a_1, . . . , a_{n−1}) ≠ 0] ≤ e/|S|.
We conclude that
Pr_{a_1,...,a_n∈S}[f(a_1, . . . , a_n) = 0] ≤ (d − e)/|S| + e/|S| = d/|S|.

Corollary 6.6. Let f, g be two different multivariate polynomials of degree d. Fix ε > 0 and let s ≥ d/ε. Then
Pr_{a_1,...,a_n∈{1,...,s}}[f(a_1, . . . , a_n) = g(a_1, . . . , a_n)] ≤ ε.

Note that if we have efficient algebraic circuits which compute f and g, then we can run this test efficiently using a randomized algorithm, which evaluates the two circuits on a randomly chosen common input.
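As an illustration, here is a small Python sketch (our own, with hypothetical black-box functions f and g) of the resulting randomized identity test: evaluate both polynomials on a random point with coordinates from a large enough set, and declare them equal if the values agree on several independent trials.

import random

def probably_equal(f, g, n, degree_bound, trials=20):
    """Randomized identity test for two degree <= degree_bound polynomials,
    given only as black-box evaluation functions on n variables.
    If f != g, each trial catches the difference with probability >= 1/2."""
    s = 2 * degree_bound  # |S| >= d / epsilon with epsilon = 1/2
    for _ in range(trials):
        point = [random.randint(1, s) for _ in range(n)]
        if f(*point) != g(*point):
            return False  # definitely different
    return True  # equal with probability >= 1 - 2**(-trials)

# Example: (x + y)^2 versus x^2 + 2xy + y^2 (equal) and x^2 + y^2 (different).
f = lambda x, y: (x + y) ** 2
print(probably_equal(f, lambda x, y: x*x + 2*x*y + y*y, n=2, degree_bound=2))
print(probably_equal(f, lambda x, y: x*x + y*y, n=2, degree_bound=2))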

6.4 Perfect matchings via polynomial identity testing

We will see an efficient way to find if a bipartite graph has a perfect matching, using polynomial identity testing.
Define the following n × n matrix: M_{i,j} = x_{i,j} if (u_i, v_j) ∈ E, and M_{i,j} = 0 otherwise. The determinant of M is
det(M) = Σ_{σ∈S_n} (−1)^{sign(σ)} Π_{i=1}^n M_{i,σ(i)}.
Lemma 6.7. G has a perfect matching iff det(M) is not the zero polynomial.
Proof. Each term Π_{i=1}^n M_{i,σ(i)} is the monomial Π_{i=1}^n x_{i,σ(i)} if σ corresponds to a perfect matching, and it is zero otherwise. Moreover, each monomial appears only once, and hence monomials cannot cancel each other.

We get an efficient randomized algorithm to test if a bi-partite graph has a perfect matching: run the polynomial identity testing algorithm on the polynomial det(M).
Corollary 6.8. Let F = F_p be a finite field for a prime p ≥ n/ε. Define a randomized n × n matrix M over F as follows: if (u_i, v_j) ∈ E then sample M_{i,j} ∈ F uniformly, and if (u_i, v_j) ∉ E then set M_{i,j} = 0. Then
- If G has no perfect matching then always det(M) = 0.
- If G has a perfect matching then Pr[det(M) = 0] ≤ ε.
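The corollary suggests the following randomized test, sketched below in Python (our own illustration; the determinant modulo a prime p is computed by straightforward Gaussian elimination over F_p, and the prime and trial count are arbitrary choices).

import random

def det_mod_p(mat, p):
    """Determinant of a square matrix over F_p via Gaussian elimination."""
    n = len(mat)
    a = [row[:] for row in mat]
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] % p != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det = (det * a[col][col]) % p
        inv = pow(a[col][col], p - 2, p)  # inverse modulo a prime
        for r in range(col + 1, n):
            factor = (a[r][col] * inv) % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p

def has_perfect_matching(n, edges, p=10**9 + 7, trials=10):
    """Randomized test: det(M) of a random substitution is nonzero with high
    probability iff the bipartite graph has a perfect matching."""
    for _ in range(trials):
        m = [[random.randrange(1, p) if (i, j) in edges else 0 for j in range(n)]
             for i in range(n)]
        if det_mod_p(m, p) != 0:
            return True
    return False

# A 3x3 bipartite graph: edges (u_i, v_j) given as index pairs.
print(has_perfect_matching(3, {(0, 0), (0, 1), (1, 0), (2, 2)}))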
The main advantage of the polynomial identity testing algorithm over the combinatorial algorithm based on Hall's marriage theorem, which we saw before, is that the polynomial identity testing algorithm can be parallelized. It turns out that the determinant can be computed in parallel by poly(n) processors in time O(log^2 n), and in particular we can check if a graph has a perfect matching in that time if parallelism is allowed. On the other hand, all implementations of the algorithm based on Hall's marriage theorem require at least Ω(n) time, even when parallelism is allowed.

7 Satisfiability

Definition 7.1 (CNF formulas). A CNF formula over boolean variables is a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals (variables or their negation). A k-CNF is a CNF formula where each clause contains exactly k literals.
For example, the following is a 3-CNF with 6 variables:
ϕ(x_1, . . . , x_6) = (x_1 ∨ x_2 ∨ x_3) ∧ (x_1 ∨ x_3 ∨ x_5) ∧ (x_1 ∨ x_2 ∨ x_4) ∧ (x_1 ∨ x_2 ∨ x_6).
The k-SAT problem is the computational problem of deciding whether a given k-CNF has a satisfying assignment. Many constraint satisfaction problems can be cast as a k-SAT problem, for example: verifying that a chip works correctly, scheduling flights in an airline, routing packets in a network, etc. As we will shortly see, 2-SAT can be solved in polynomial time (in fact, in linear time); however, for k ≥ 3, the k-SAT problem is NP-hard, and the only known algorithms solving it run in exponential time. However, even there we will see that we can improve upon full enumeration (which takes 2^n time). We will present an algorithm that solves 3-SAT in time ≈ 2^{0.41n}. The same algorithm solves k-SAT for any k ≥ 3 in time 2^{c_k n} where c_k < 1.
Both the polynomial algorithm for 2-SAT and the exponential algorithm for k-SAT, k ≥ 3, will be based on a similar idea: analyzing a random walk on the space of possible solutions.

7.1 2-SAT

Let x = (x_1, . . . , x_n) and let ϕ(x) be a 2-CNF given by
ϕ(x) = C_1(x) ∧ . . . ∧ C_m(x),
where each C_i is the OR of two literals. We say that an assignment x ∈ {0, 1}^n satisfies a clause C_i if C_i(x) = 1. We will analyze the following simple looking algorithm.

Function Solve-2SAT
Input: 2-CNF ϕ
Output: x ∈ {0, 1}^n such that ϕ(x) = 1
1. Set x = 0.
2. While there exists some clause C_i such that C_i(x) = 0:
   2.1 Let x_a, x_b be the two variables participating in C_i.
   2.2 Choose ℓ ∈ {a, b} uniformly: Pr[ℓ = a] = Pr[ℓ = b] = 1/2.
   2.3 Flip x_ℓ.
3. Output x.
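Below is a runnable Python transcription of Solve-2SAT (our own; clauses are encoded as pairs of nonzero integers, where a negative index denotes a negated variable, and the unsatisfied clause is picked at random rather than arbitrarily).

import random

def solve_2sat(n, clauses, max_steps=None):
    """Random-walk 2-SAT solver. `clauses` is a list of pairs of literals;
    literal +i means x_i is true, -i means x_i is false (1-indexed variables).
    Returns a satisfying assignment (list of bools) or None after max_steps."""
    if max_steps is None:
        max_steps = 4 * n * n  # enough steps to succeed with probability >= 1/4
    x = [False] * (n + 1)  # x[1..n]; start from the all-zeros assignment

    def satisfied(lit):
        return x[abs(lit)] == (lit > 0)

    for _ in range(max_steps):
        unsat = [c for c in clauses if not (satisfied(c[0]) or satisfied(c[1]))]
        if not unsat:
            return x[1:]
        lit = random.choice(random.choice(unsat))  # pick a clause, then a variable
        x[abs(lit)] = not x[abs(lit)]              # flip it
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(solve_2sat(3, [(1, 2), (-1, 3), (-2, -3)]))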
We will show the following theorem.

Theorem 7.2. If ϕ is a satisfiable 2-CNF, then with probability at least 1/4 over the internal randomness of the algorithm, it outputs a solution within 4n^2 steps.
We note that if we wish a higher success probability (say, 99%) then we can simply repeat the algorithm a few times, where in each phase, if the algorithm does not find a solution within the first 4n^2 steps, we restart the algorithm. The probability that the algorithm still doesn't find a solution after t restarts is at most (3/4)^t. So, to get success probability 99% we need to run the algorithm 16 times (since (3/4)^{16} ≈ 1%).
We next proceed to the proof. For the proof, fix some solution x∗ for ϕ (if there is more than one, choose one arbitrarily). Let x_t denote the value of x in the t-th iteration of the loop. Note that it is a random variable, which depends on our choice of which clause to choose and which variables to flip in the previous steps. Define d_t = dist(x_t, x∗) to be the Hamming distance between x_t and x∗ (that is, the number of bits where they differ). Clearly, at any stage 0 ≤ d_t ≤ n, and if d_t = 0 then we have reached a solution, and we output x_t = x∗ at iteration t.
Consider x_t, the assignment at iteration t, and assume that d_t > 0. Let C = ℓ_a ∨ ℓ_b be a violated clause, where ℓ_a ∈ {x_a, ¬x_a} and ℓ_b ∈ {x_b, ¬x_b}. This means that either (x∗)_a ≠ (x_t)_a or (x∗)_b ≠ (x_t)_b (or both), since C(x∗) = 1 but C(x_t) = 0. If we choose ℓ ∈ {a, b} such that (x_t)_ℓ ≠ (x∗)_ℓ, then the distance between x_{t+1} and x∗ decreases by one; otherwise, it increases by one. So we have:
d_{t+1} = d_t + ε_t,
where ε_t ∈ {−1, 1} is a random variable that satisfies Pr[ε_t = −1 | x_t] ≥ 1/2.
Another way to put it: the sequence of distances d_0, d_1, d_2, . . . is a random walk on {0, 1, . . . , n}. It starts at some arbitrary location d_0. In each step, we move to the left (getting closer to 0) with probability at least 1/2, and otherwise move to the right (getting further away from 0). We will show that after O(n^2) steps, with high probability, this has to terminate: either some satisfying assignment has been found, or otherwise we hit 0 and output x∗. We do so by showing that a random walk tends to drift far from its origin.
For simplicity, we first analyze the slightly simpler case where the random walk is symmetric, that is Pr[ε_t = −1 | x_t] = Pr[ε_t = 1 | x_t] = 1/2. We will then show how to extend the analysis to our case, where the probability for −1 could be larger (intuitively, this should only help us get to 0 faster; however, proving this formally is a bit technical).
Lemma 7.3. Let y_0, y_1, . . . be a random walk, defined as follows: y_0 = 0 and y_{t+1} = y_t + ε_t, where ε_t ∈ {−1, 1} and Pr[ε_t = −1 | y_t] = 1/2 for all t ≥ 0. Then, for any t ≥ 0,
E[y_t^2] = t.
Proof. We prove this by induction on t. It is clear for t = 0. We have
E[y_{t+1}^2] = E[(y_t + ε_t)^2] = E[y_t^2] + 2E[y_t ε_t] + E[ε_t^2].
By induction, E[y_t^2] = t. Since ε_t ∈ {−1, 1} we have E[ε_t^2] = 1. To conclude, we need to show that E[y_t ε_t] = 0. We show this via the rule of conditional expectations:
E[y_t ε_t] = E_{y_t}[E_{ε_t}[y_t ε_t | y_t]] = E_{y_t}[y_t · E_{ε_t}[ε_t | y_t]] = E_{y_t}[y_t · 0] = 0.
We now prove the more general lemma, where we allow a consistent drift.
Lemma 7.4. Let y_0, y_1, . . . be a random walk, defined as follows: y_0 = 0 and y_{t+1} = y_t + ε_t, where ε_t ∈ {−1, 1} and Pr[ε_t = −1 | y_t] ≥ 1/2 for all t ≥ 0. Then, for any t ≥ 0,
E[y_t^2] ≥ t/2.
Note that the same result holds by symmetry if we assume instead that Pr[ε_t = 1 | y_t] ≥ 1/2 for all t ≥ 0.
Proof. The proof is by a coupling argument. Define a new random walk y′_0, y′_1, . . ., where y′_0 = 0 and y′_{t+1} = y′_t + ε′_t. In general, we allow y′_t, ε′_t to depend on y_1, . . . , y_t. So, fix y_1, . . . , y_t, and assume that Pr[ε_t = −1 | y_t] = α for some α ≥ 1/2. Define ε′_t as:
ε′_t(y_t, ε_t) = 1 if ε_t = 1; and if ε_t = −1, then ε′_t = −1 with probability 1/(2α) and ε′_t = 1 with probability 1 − 1/(2α).
It satisfies the following properties:
- ε′_t ≥ ε_t.
- Pr[ε′_t = −1 | y_t] = 1/2.
Note that the random walk y′_0, y′_1, . . . is a symmetric random walk, which further satisfies y′_t ≥ y_t for all t ≥ 0. By the previous lemma,
E[(y′_t)^2] = t.
Note moreover that since the random walk y′_0, y′_1, . . . is symmetric, Pr[y′_t = a] = Pr[y′_t = −a] for any a ∈ Z. In particular, Pr[y′_t ≤ 0] ≥ 1/2. Whenever y′_t ≤ 0 we have y_t^2 ≥ (y′_t)^2. So we have
E[y_t^2] ≥ E[y_t^2 · 1_{y′_t ≤ 0}] ≥ E[(y′_t)^2 · 1_{y′_t ≤ 0}] = E[(y′_t)^2]/2 = t/2.

We return now to the proof of Theorem 7.2. The proof will use Lemma 7.4, but the
analysis is more subtle.


Proof of Theorem 7.2. Recall that x_0, x_1, . . . is the sequence of guesses for a solution which the algorithm explores. Let T ∈ N denote the random variable of the step at which the algorithm outputs a solution. The challenge in the analysis is that not only is the sequence random, but also T is a random variable. For simplicity of notation later on, set x_t = x_T for all t > T.
Let y_0, y_1, . . . be defined as y_t = d_t − d_0, where recall that d_t = dist(x_t, x∗). In order to analyze this random walk, we define a new random walk z_0, z_1, . . . as follows: set z_t = y_t for t ≤ T, and for t ≥ T set z_{t+1} = z_t + δ_t, where δ_t ∈ {−1, +1} is uniformly and independently chosen. We will argue that the sequence z_t satisfies the conditions of Lemma 7.4. Namely, if we define ε_t = z_{t+1} − z_t then Pr[ε_t = −1 | z_t] ≥ 1/2.
To show this, let us condition on x_0, . . . , x_t. If none of them are solutions to ϕ then ε_t = y_{t+1} − y_t, and conditioned on x_0, . . . , x_t not being solutions, we already showed that the probability that ε_t = −1 is ≥ 1/2. If, on the other hand, x_0, . . . , x_t contain a solution, then ε_t = δ_t and the probability that ε_t = −1 is exactly 1/2. In either case we have
Pr[ε_t = −1 | x_0, . . . , x_t] ≥ 1/2.
This then implies that
Pr[ε_t = −1 | z_t] ≥ 1/2.
Thus, we may apply Lemma 7.4 to the sequence z_0, z_1, . . . and obtain that for any t ≥ 1 we have
E[z_t^2] ≥ t/2.
Next, for t ≥ 1 let T_t = min(T, t). We may write z_t as
z_t = y_{T_t} + Σ_{i=T_t}^{t−1} δ_i,
where the sum is empty if T_t = t. In particular, conditioning on T gives that
E[z_t^2 | T] = E[(y_{T_t} + Σ_{i=T_t}^{t−1} δ_i)^2 | T] = E[y_{T_t}^2 | T] + t − T_t.
Averaging over T gives then that
E[z_t^2] = E[y_{T_t}^2] + t − E[T_t].
Next, observe that any y_i = d_i − d_0 is a difference of two numbers between 0 and n, and hence |y_i| ≤ n. In particular, E[y_{T_t}^2] ≤ n^2. Thus we have
E[T_t] = E[y_{T_t}^2] + t − E[z_t^2] ≤ n^2 + t − t/2 = n^2 + t/2.
Set t_0 = 4n^2. We have
E[T_{t_0}] ≤ n^2 + 2n^2 = 3n^2 = (3/4)t_0.
By Markov's inequality we have that
Pr[T ≥ t_0] = Pr[T_{t_0} = t_0] ≤ Pr[T_{t_0} ≥ (4/3)E[T_{t_0}]] ≤ 3/4.
So, with probability at least 1/4, we have that T < t_0, which means the algorithm finds a solution within t_0 = 4n^2 steps.

7.2 3-SAT

Let ϕ be a 3-CNF. Finding if ϕ has a satisfying assignment is NP-hard, and the best known algorithms take exponential time. However, we can still improve upon the naive 2^n full enumeration, as the following algorithm shows. The algorithm we analyze is due to Schöning [Sch99]. Let m ≥ 1 be a parameter to be determined later.

Function Solve-3SAT
Input: 3-CNF ϕ
Output: x ∈ {0, 1}^n such that ϕ(x) = 1
1. Choose x ∈ {0, 1}^n randomly.
2. For i = 1, . . . , m:
   2.1 If ϕ(x) = 1, output x.
   2.2 Otherwise, let C_i be some clause such that C_i(x) = 0, with variables x_a, x_b, x_c.
   2.3 Choose ℓ ∈ {a, b, c} uniformly: Pr[ℓ = a] = Pr[ℓ = b] = Pr[ℓ = c] = 1/3.
   2.4 Flip x_ℓ.
3. Output FAIL.
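Here is a runnable Python transcription of Solve-3SAT (our own; the same literal encoding as before, with the outer restart loop included so that the overall success probability is high, and with the restart count left as a tunable parameter).

import random

def schoening_walk(n, clauses, m):
    """One run of the random walk: random start, up to m random flips."""
    x = [random.random() < 0.5 for _ in range(n + 1)]  # x[1..n]

    def satisfied(clause):
        return any(x[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(m):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return x[1:]
        lit = random.choice(random.choice(unsat))
        x[abs(lit)] = not x[abs(lit)]
    return None

def solve_3sat(n, clauses, restarts=1000):
    """Repeat the walk; by the analysis below, each run succeeds with
    probability about 2^{-0.41n}, so restarts should scale like 2^{0.41n}."""
    for _ in range(restarts):
        result = schoening_walk(n, clauses, m=n)
        if result is not None:
            return result
    return None

# (x1 or x2 or x3) and (not x1 or not x2 or x3) and (x1 or not x3 or x2)
print(solve_3sat(3, [(1, 2, 3), (-1, -2, 3), (1, -3, 2)]))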
Our goal is to analyze the following question: what is the success probability of the algorithm? As before, assume ϕ is satisfiable, and choose some satisfying assignment x∗. Define x_i to be the value of x at the i-th iteration of the algorithm, and let d_i = dist(x_i, x∗).
Claim 7.5. The following holds:
(i) Pr[d_0 = k] = 2^{−n} (n choose k) for all 0 ≤ k ≤ n.
(ii) d_{t+1} = d_t + ε_t where ε_t ∈ {−1, 1} satisfies Pr[ε_t = −1 | d_t] ≥ 1/3.
Proof. For (i), note that d_0 is the distance of a random string from x∗, so equivalently, it is the Hamming weight of a uniform element of {0, 1}^n. The number of elements of Hamming weight k is (n choose k). For (ii), if x_a, x_b, x_c are the variables appearing in an unsatisfied clause at iteration t, then at least one of them disagrees with the value of x∗. If we happen to choose it, the distance will decrease by one; otherwise, it will increase by one.

For simplicity, let's assume from now on that Pr[ε_t = −1 | d_t] = 1/3, where the more general case can be handled similarly to the way we handled it for 2-SAT.
Claim 7.6. Assume that d_0 = k. The probability that the algorithm finds a satisfying solution is at least
(m choose (m+k)/2) (1/3)^{(m+k)/2} (2/3)^{(m−k)/2}.
Proof. Consider the sequence of steps ε_0, . . . , ε_{m−1}. If there are k more −1's than +1's in this sequence, then starting at d_0 = k, we will reach d_m = 0. The number of such sign sequences is (m choose (m+k)/2), the probability of seeing a −1 is 1/3, and the probability of seeing a +1 is 2/3.
Claim 7.7. For any 0 ≤ k ≤ n, the probability that the algorithm finds a solution is at least
2^{−n} (n choose k) (m choose (m+k)/2) (1/3)^{(m+k)/2} (2/3)^{(m−k)/2}.
Proof. We require that d_0 = k (which occurs with probability 2^{−n} (n choose k)) and, conditioned on that occurring, apply Claim 7.6.
We now need to optimize parameters. Fix k = αn, m = βn for some constants α, β > 0. We will use the following approximation: for n ≥ 1 and 0 < α < 1,
(1/(n+1)) · 2^{H(α)n} ≤ (n choose αn) ≤ 2^{H(α)n},
where H(α) = α log_2(1/α) + (1 − α) log_2(1/(1 − α)) is the entropy function. Then
(n choose k) ≈ 2^{H(α)n},
(m choose (m+k)/2) ≈ 2^{H(1/2 + α/(2β)) βn},
(1/3)^{(m+k)/2} (2/3)^{(m−k)/2} = 3^{−βn} · 2^{(β−α)n/2}.
So, we can express the probability of success as 2^{−γn}, where
γ = 1 − H(α) − βH(1/2 + α/(2β)) + β log_2 3 − (β − α)/2.
Our goal is to choose 0 < α < 1 and β to minimize γ. The minimum is obtained for α = 1/3, β = 1, which gives γ ≈ 0.41.
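A quick numerical check of this optimization (our own sketch, not part of the notes) confirms the constant:

from math import log2

def H(p):
    """Binary entropy function."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gamma(alpha, beta):
    """Exponent gamma with k = alpha*n and m = beta*n, as derived above."""
    return (1 - H(alpha) - beta * H(0.5 + alpha / (2 * beta))
            + beta * log2(3) - (beta - alpha) / 2)

print(gamma(1 / 3, 1.0))                    # about 0.415
print(gamma(0.30, 1.0), gamma(0.36, 1.0))   # nearby points are slightly worse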
So, we have an algorithm that runs in time O(m) = O(n), and finds a satisfying assignment with probability ≈ 2^{−γn}. To find a satisfying assignment with high probability, we simply repeat it N = 5 · 2^{γn} times. The probability that it fails in all these executions is at most
(1 − 2^{−γn})^N ≤ exp(−2^{−γn} N) = exp(−5) ≤ 1%.
Corollary 7.8. We can solve 3-SAT in time O(2^{γn}) for γ ≈ 0.41.

8 Hash functions: the power of pairwise independence

Hash functions are used to map elements from a large domain to a small one. They are commonly used in data structures, cryptography, streaming algorithms, coding theory, and more - anywhere we want to efficiently store a small subset of a large universe. Typically, for many of the applications, we would not have a single hash function, but instead a family of hash functions, where we would randomly choose one of the functions in this family as our hash function.
Let H = {h : U → R} be a family of functions, mapping elements from a (typically large) universe U to a (typically small) range R. For many applications, we would like two, seemingly contradicting, properties from the family of functions:
- Functions h ∈ H should look random.
- Functions h ∈ H should be succinctly described, and hence can be processed and stored efficiently.
The way to resolve this is to be more specific about what we mean by "looking random". The following definition is such a concrete realization, which although quite weak, is already very useful.
Definition 8.1 (Pairwise independent hash functions). A family H = {h : U → R} is said to be pairwise independent if for any two distinct elements x_1 ≠ x_2 ∈ U, and any two (possibly equal) values y_1, y_2 ∈ R,
Pr_{h∈H}[h(x_1) = y_1 and h(x_2) = y_2] = 1/|R|^2.

We investigate the power of pairwise independent hash functions in this chapter, and
describe a few applications. For many more applications we recommend the book [LLW06].

8.1 Pairwise independent bits

To simplify notations, let us consider the case of R = {0, 1}. We also assume that |U| = 2^k for some k ≥ 1, by possibly increasing the size of the universe by a factor of at most two. Thus, we can identify U = {0, 1}^k, and identify functions h ∈ H with boolean functions h : {0, 1}^k → {0, 1}. Consider the following construction:
H_2 = {h_{a,b}(x) = ⟨a, x⟩ + b (mod 2) : a ∈ {0, 1}^k, b ∈ {0, 1}}.
One can check that |H_2| = 2^{k+1} = 2|U|, which is much smaller than the set of all functions from {0, 1}^k to {0, 1} (which has size 2^{2^k}). We will show that H_2 is pairwise independent. To do so, we need the following claim.
Claim 8.2. Fix x ∈ {0, 1}^k, x ≠ 0^k. Then
Pr_{a∈{0,1}^k}[⟨a, x⟩ = 0 (mod 2)] = 1/2.
Proof. Fix i ∈ [k] such that x_i = 1. Then
Pr_{a∈{0,1}^k}[⟨a, x⟩ = 0 (mod 2)] = Pr[a_i = Σ_{j≠i} a_j x_j (mod 2)] = 1/2.

Lemma 8.3. H_2 is pairwise independent.
Proof. Fix distinct x_1, x_2 ∈ {0, 1}^k and (not necessarily distinct) y_1, y_2 ∈ {0, 1}. In all the calculations below of ⟨a, x⟩ + b, we evaluate the result modulo 2. We need to prove
Pr_{a∈{0,1}^k, b∈{0,1}}[⟨a, x_1⟩ + b = y_1 and ⟨a, x_2⟩ + b = y_2] = 1/4.
If we just randomize over a, then by Claim 8.2, for any y ∈ {0, 1} we have
Pr_a[⟨a, x_1⟩ − ⟨a, x_2⟩ = y] = Pr_a[⟨a, x_1 − x_2⟩ = y] = 1/2.
Randomizing also over b gives us the desired result:
Pr_{a,b}[⟨a, x_1⟩ + b = y_1 and ⟨a, x_2⟩ + b = y_2] = Pr_{a,b}[⟨a, x_1⟩ − ⟨a, x_2⟩ = y_1 − y_2 and b = y_1 − ⟨a, x_1⟩] = (1/2) · (1/2) = 1/4.
Next, we describe an alternative viewpoint, which will justify the name "pairwise independent bits".
Definition 8.4 (Pairwise independent bits). A distribution D over {0, 1}^n is said to be pairwise independent if for any distinct i, j ∈ [n] and any y_1, y_2 ∈ {0, 1} we have
Pr_{x∈D}[x_i = y_1 and x_j = y_2] = 1/4.
We note that we can directly use H_2 to generate pairwise independent bits. Assume that n = 2^k. Identify h ∈ H_2, h : {0, 1}^k → {0, 1}, with the string u_h ∈ {0, 1}^n given by concatenating all the evaluations of h in some pre-fixed order. Let D be the distribution over {0, 1}^n obtained by sampling h ∈ H_2 uniformly and outputting u_h. Then the condition that D is pairwise independent is equivalent to H_2 being pairwise independent. Note that the construction above gives a distribution D supported on |H_2| = 2n elements of {0, 1}^n, much less than the full space. In particular, we can represent a string u in the support of D by specifying the hash function which generated it, which only requires log |H_2| = log n + 1 bits.
Example 8.5. Let n = 4. A uniform string from the following set of 8 = 2^3 strings is pairwise independent:
{0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}.
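Here is a small Python sketch (our own) of the family H_2 and of the induced distribution on pairwise independent bits; for n = 4 (k = 2) it reproduces exactly the 8 strings of Example 8.5.

from itertools import product

def h_ab(a, b, x):
    """The hash function h_{a,b}(x) = <a, x> + b (mod 2)."""
    return (sum(ai * xi for ai, xi in zip(a, x)) + b) % 2

def pairwise_independent_strings(k):
    """All strings u_h obtained by evaluating each h in H_2 on all of {0,1}^k."""
    points = list(product([0, 1], repeat=k))  # fixed evaluation order
    strings = []
    for a in product([0, 1], repeat=k):
        for b in [0, 1]:
            strings.append(''.join(str(h_ab(a, b, x)) for x in points))
    return strings

print(sorted(set(pairwise_independent_strings(2))))
# ['0000', '0011', '0101', '0110', '1001', '1010', '1100', '1111']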

8.2 Application: de-randomized MAXCUT

Let G = (V, E) be a simple undirected graph. For S ⊆ V let E(S, S^c) = {(u, v) ∈ E : u ∈ S, v ∈ S^c} be the set of edges which cross the cut S. The MAXCUT problem asks to find the maximal number of edges in a cut:
MAXCUT(G) = max_{S⊆V} |E(S, S^c)|.
Computing the MAXCUT of a graph is known to be NP-hard. Still, there is a simple randomized algorithm which approximates it within a factor of 2. Below, let V = {v_1, . . . , v_n}.
Lemma 8.6. Let x_1, . . . , x_n ∈ {0, 1} be uniformly and independently chosen. Set
S = {v_i : x_i = 1}.
Then
E_S[|E(S, S^c)|] ≥ |E(G)|/2 ≥ MAXCUT(G)/2.
Proof. For any choice of S we have
|E(S, S^c)| = Σ_{(v_i,v_j)∈E} 1_{v_i∈S} 1_{v_j∈S^c}.
Note that every undirected edge {u, v} in G is actually counted twice in the calculation above, once as (u, v) and once as (v, u). However, clearly at most one of these is in E(S, S^c). By linearity of expectation, the expected size of the cut is
E_S[|E(S, S^c)|] = Σ_{(v_i,v_j)∈E} E[1_{v_i∈S} 1_{v_j∉S}] = Σ_{(v_i,v_j)∈E} Pr[x_i = 1 and x_j = 0] = 2|E(G)| · (1/4) = |E(G)|/2.

This implies that a random choice of S has a non-negligible probability of giving a 2-approximation.
Corollary 8.7. Pr_S[|E(S, S^c)| ≥ |E(G)|/2] ≥ 1/(2|E(G)|) ≥ 1/n^2.
Proof. Let X = |E(S, S^c)| be a random variable counting the number of edges in a random cut. Let μ = |E(G)|/2, where we know that E[X] ≥ μ. Note that whenever X < μ, we in fact have that X ≤ μ − 1/2, since X is an integer and μ is a half-integer. Also, note that always X ≤ |E(G)| ≤ 2μ. Let p = Pr[X ≥ μ]. Then
E[X] = E[X | X ≥ μ] Pr[X ≥ μ] + E[X | X ≤ μ − 1/2] Pr[X ≤ μ − 1/2]
  ≤ 2μ · p + (μ − 1/2)(1 − p)
  ≤ μ − 1/2 + 2μp.
So we must have 2μp ≥ 1/2, which means that p ≥ 1/(4μ) = 1/(2|E(G)|).

In particular, we can sample O(n^2) sets S, compute for each one its cut size, and we are guaranteed that with high probability, the maximum will be at least |E(G)|/2.
Next, we derandomize this randomized algorithm using pairwise independent bits. As a side benefit, it will reduce the computation time from testing O(n^2) sets to testing only O(n) sets.
Lemma 8.8. Let x_1, . . . , x_n ∈ {0, 1} be pairwise independent bits (such as the ones given by H_2). Set
S = {v_i : x_i = 1}.
Then
E_S[|E(S, S^c)|] ≥ |E(G)|/2.
Proof. The only place where we used the fact that the bits were uniform in the proof of Lemma 8.6 was in the calculation
Pr[x_i = 1 and x_j = 0] = 1/4
for all distinct i, j. However, this is also true for pairwise independent bits (by definition).
In particular, for one of the O(n) sets S that we generate in the algorithm, we must have that |E(S, S^c)| is at least the average, and hence |E(S, S^c)| ≥ |E(G)|/2.
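A compact Python sketch of the derandomized algorithm (our own illustration): enumerate the 2n hash functions in H_2 (with n rounded up to a power of two), build the corresponding cut for each, and keep the best one.

from itertools import product

def derandomized_maxcut(n, edges):
    """Return a cut S with |E(S, S^c)| >= |E|/2, by enumerating the strings
    induced by H_2 over a universe of size 2^k >= n."""
    k = max(1, (n - 1).bit_length())          # 2^k >= n
    points = list(product([0, 1], repeat=k))[:n]
    best_cut, best_size = None, -1
    for a in product([0, 1], repeat=k):
        for b in [0, 1]:
            bits = [(sum(ai * xi for ai, xi in zip(a, x)) + b) % 2 for x in points]
            S = {i for i in range(n) if bits[i] == 1}
            size = sum(1 for (u, v) in edges if (u in S) != (v in S))
            if size > best_size:
                best_cut, best_size = S, size
    return best_cut, best_size

# Triangle plus a pendant edge: |E| = 4, so the cut found has size >= 2.
print(derandomized_maxcut(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))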

8.3 Optimal sample size for pairwise independent bits

The previous application showed the usefulness of having small sample spaces for pairwise independent bits. We saw that we can generate O(n) binary strings of length n, such that choosing one of them uniformly gives us pairwise independent bits. We next show that this is optimal.
Lemma 8.9. Let X ⊆ {0, 1}^n and let D be a distribution supported on X. Assume that D is pairwise independent. Then |X| ≥ n.
Proof. Let X = {x_1, . . . , x_m} and let p_ℓ = Pr[D = x_ℓ]. For any i ∈ [n], we construct a vector v_i ∈ R^m as follows:
(v_i)_ℓ = √(p_ℓ) · (−1)^{(x_ℓ)_i}.
We will show that the set of vectors {v_1, . . . , v_n} are linearly independent in R^m, and hence we must have |X| = m ≥ n.
As a first step, we show that ⟨v_i, v_j⟩ = 0 for all i ≠ j:
⟨v_i, v_j⟩ = Σ_{ℓ=1}^m p_ℓ (−1)^{(x_ℓ)_i + (x_ℓ)_j} = E_{x∈D}[(−1)^{x_i + x_j}]
  = Pr_{x∈D}[x_i + x_j = 0 (mod 2)] − Pr_{x∈D}[x_i + x_j = 1 (mod 2)] = 1/2 − 1/2 = 0.
Next, we show that this implies that v_1, . . . , v_n must be linearly independent. Assume towards contradiction that this is not the case. That is, there exist coefficients α_1, . . . , α_n ∈ R, not all zero, such that
Σ α_i v_i = 0.
However, for any j ∈ [n], we have
0 = ⟨Σ_i α_i v_i, v_j⟩ = Σ_i α_i ⟨v_i, v_j⟩ = α_j ‖v_j‖^2 = α_j.
So α_j = 0 for all j, a contradiction. Hence, v_1, . . . , v_n are linearly independent and hence |X| = m ≥ n.

8.4 Hash functions with large ranges

We now consider the problem of constructing a family of hash functions H = {h : U → R} for large R. For simplicity, we will assume that |R| is prime, although this requirement can be somewhat removed. So, let's identify R = F_p for a prime p. We may assume that |U| = p^k, by possibly increasing the size of the universe by a factor of p. This allows us to identify U = F_p^k. Define the following family of hash functions:
H_p = {h_{a,b}(x) = ⟨a, x⟩ + b : a ∈ F_p^k, b ∈ F_p}.
Note that |H_p| = p^{k+1} = |U| · |R|, and observe that for p = 2 this coincides with our previous definition of H_2. We will show that H_p is pairwise independent. As before, we need an auxiliary claim first.
Claim 8.10. Let x ∈ F_p^k, x ≠ 0. Then for any y ∈ F_p,
Pr_{a∈F_p^k}[⟨a, x⟩ = y] = 1/p.
Proof. Fix i ∈ [k] such that x_i ≠ 0. Then
Pr_{a∈F_p^k}[⟨a, x⟩ = y] = Pr[a_i x_i = y − Σ_{j≠i} a_j x_j].
Now, for every fixing of {a_j : j ≠ i}, we have that a_i x_i is uniformly distributed in F_p, hence the probability that it equals any specific value is exactly 1/p.
Lemma 8.11. H_p is pairwise independent.
Proof. Fix distinct x_1, x_2 ∈ F_p^k and (not necessarily distinct) y_1, y_2 ∈ F_p. All the calculations below of ⟨a, x⟩ + b are in F_p. We need to show
Pr_{a∈F_p^k, b∈F_p}[⟨a, x_1⟩ + b = y_1 and ⟨a, x_2⟩ + b = y_2] = 1/p^2.
If we just randomize over a, then by Claim 8.10, for any y ∈ F_p,
Pr_a[⟨a, x_1⟩ − ⟨a, x_2⟩ = y] = Pr_a[⟨a, x_1 − x_2⟩ = y] = 1/p.
Randomizing also over b gives us the desired result:
Pr_{a,b}[⟨a, x_1⟩ + b = y_1 and ⟨a, x_2⟩ + b = y_2] = Pr_{a,b}[⟨a, x_1⟩ − ⟨a, x_2⟩ = y_1 − y_2 and b = y_1 − ⟨a, x_1⟩] = (1/p) · (1/p) = 1/p^2.

8.5 Application: collision free hashing

Let S ⊆ U be a set of objects. A hash function h : U → R is said to be collision free for S if it is injective on S. That is, h(x) ≠ h(y) for all distinct x, y ∈ S. We will show that if R is large enough, then any pairwise independent hash family contains many collision free hash functions for any small set S. This is extremely useful: it allows us to give lossless compression of elements from a large universe to a small range.
Lemma 8.12. Let H = {h : U → R} be a pairwise independent hash family. Let S ⊆ U be a set of size |S|^2 ≤ |R|. Then
Pr_{h∈H}[h is collision free for S] ≥ 1/2.

Proof. Let h ∈ H be uniformly chosen, and let X be a random variable that counts the number of collisions in S. That is,
X = Σ_{{x,y}⊆S} 1_{h(x)=h(y)}.
The expected value of X is
E[X] = Σ_{{x,y}⊆S} Pr_{h∈H}[h(x) = h(y)] = (|S| choose 2) · 1/|R| ≤ |S|^2/(2|R|) ≤ 1/2.
By Markov's inequality,
Pr_{h∈H}[h is not collision free for S] = Pr[X ≥ 1] ≤ 1/2.


8.6 Efficient dictionaries: storing sets efficiently

We now show how to use pairwise independent hash functions in order to design efficient dictionaries. Fix a universe U. For simplicity, we will assume that for any R we have a family of pairwise independent hash functions H = {h : U → R}, and note that while our previous constructions required |R| to be prime (or in fact, a prime power), this will at most double the size of the range, which at the end will only change our space requirements by a constant factor.
Given a set S ⊆ U of size |S| = n, we would like to design a data structure which supports queries of the form "is x ∈ S?". Our goal will be to do so while minimizing both the space requirements and the time it takes to answer a query. If we simply store the set as a sorted list of n elements, then the space (memory) requirements are O(n log |U|), and each query takes time O(log |U| + log n), by doing a binary search on the list. We will see that these can be improved via hashing.
First, consider the following simple hashing scheme. Fix a range R = {1, . . . , n^2}. Let H = {h : U → [n^2]} be a pairwise independent hash family. We showed that a randomly chosen h ∈ H will be collision free on S with probability at least 1/2. So, we can sample h ∈ H until we find such an h, which on average would require two attempts. Let A be an array of length n^2. It will be mostly empty, except that we set A[h(x)] = x for all x ∈ S. Now, to check whether x ∈ S, we compute h(x) and check whether A[h(x)] = x or not. Thus, the query time is only O(log |U|). However, we pay with large space requirements: to store n elements, we maintain an array of size n^2, which requires at least n^2 bits (and possibly even O(n^2 log |U|), depending on how efficient we are in storing the empty cells).
We describe a two-step hashing scheme due to Fredman, Komlós and Szemerédi [FKS84] which avoids this large waste of space. It will use only O(n log n + log |U|) space, but would still allow for a query time of O(log |U|). As a preliminary step, we apply the collision free hash scheme we just described. So, we may assume from now on that |U| = O(n^2) and that S ⊆ U has size |S| = n.
Step 1. We first find a hash function h : U → [n] which has only ≤ n collisions. Let Coll(h, S) denote the number of collisions of h for S, namely
Coll(h, S) = |{{x, y} ⊆ S : h(x) = h(y)}|.
If H = {h : U → [n]} is a family of pairwise independent hash functions, then
E_{h∈H}[Coll(h, S)] = Σ_{{x,y}⊆S} Pr[h(x) = h(y)] = (|S| choose 2) · 1/n ≤ |S|^2/(2n) = n/2.
By Markov's inequality, we have
Pr_{h∈H}[Coll(h, S) ≥ n] ≤ 1/2.
So, after on average two iterations of randomly choosing h ∈ H, we find such a function h : U → [n] such that Coll(h, S) ≤ n. We fix it from now on. Note that it is represented using only O(log n) bits.

Step 2. Next, for any i ∈ [n] let S_i = {x ∈ S : h(x) = i}. Observe that Σ |S_i| = n and
Σ_{i=1}^n (|S_i| choose 2) = Coll(h, S) ≤ n.
Let n_i = |S_i|^2. Note that Σ n_i = 2 Coll(h, S) + Σ |S_i| ≤ 3n. We will find hash functions h_i : U → [n_i] which are collision free on S_i. Choosing a uniform hash function from a pairwise independent set of hash functions H_i = {h : U → [n_i]} succeeds on average after two samples. So, we only need O(n) time in total (in expectation) to find these functions. As each h_i requires O(log n) bits to be represented, we need in total O(n log n) bits to represent all of them.
Let A be an array of size 3n. Let offset_i = Σ_{j<i} n_j. The sub-array A[offset_i : offset_i + n_i] will be used to store the elements of S_i. Initially A is empty. We set
A[offset_i + h_i(x)] = x   for all x ∈ S_i.
Note that there are no collisions in A, as we are guaranteed that the h_i are collision free on S_i. We will also keep the list of {offset_i : i ∈ [n]} in a separate array.
Query. To check whether x ∈ S, we do the following:
- Compute i = h(x).
- Read offset_i.
- Check if A[offset_i + h_i(x)] = x or not.
This can be computed using O(log n) bit operations.
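The following Python sketch (our own simplification) mirrors the two-level structure of the FKS construction. It uses the standard universal family x → ((a·x + b) mod P) mod m as a stand-in for the pairwise independent families of the notes, and assumes the stored items are integers below the prime P.

import random

P = (1 << 61) - 1  # a large prime; items are assumed to be integers below P

def random_hash(m):
    """A hash function from the universal family x -> ((a*x + b) mod P) mod m."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

class FKSDictionary:
    """Static dictionary: two-level hashing with no collisions at the second level."""
    def __init__(self, items):
        n = len(items)
        # Level 1: retry until h has at most n collisions.
        while True:
            self.h = random_hash(n)
            buckets = [[] for _ in range(n)]
            for x in items:
                buckets[self.h(x)].append(x)
            if sum(len(b) * (len(b) - 1) // 2 for b in buckets) <= n:
                break
        # Level 2: for each bucket of size s, a collision free hash into s^2 cells.
        self.tables, self.hashes = [], []
        for b in buckets:
            size = max(1, len(b)) ** 2
            while True:
                hi = random_hash(size)
                table = [None] * size
                ok = True
                for x in b:
                    slot = hi(x)
                    if table[slot] is not None:
                        ok = False
                        break
                    table[slot] = x
                if ok:
                    break
            self.tables.append(table)
            self.hashes.append(hi)

    def __contains__(self, x):
        i = self.h(x)
        return self.tables[i][self.hashes[i](x)] == x

d = FKSDictionary([3, 17, 42, 1000])
print(17 in d, 5 in d)  # True False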
Space requirements. The hash function h requires O(log n) bits. The hash functions {h_i : i ∈ [n]} require O(n log n) bits. The array A requires O(n log n) bits.
Setup time. The setup algorithm is randomized, as it needs to find good hash functions. It has expected running time O(n log n) bit operations:
- To find h takes O(n log n) time, as this is how long it takes to verify that it has few collisions.
- To find each h_i takes O(|S_i| log n) time, and in total it is O(n log n) time.
- To set up the arrays of {offset_i : i ∈ [n]} and A takes O(n log n) time.
RAM model vs. bit model. Up until now, we counted bit operations. However, computers can operate on words efficiently. A model for that is the RAM model, where we can perform basic operations on log n-bit words. In this model, it can be verified that the query time is O(1) word operations, the space requirement is O(n) words, and the setup time is O(n) word operations.

8.7 Bloom filters

Bloom filters allow for even more efficient data structures for set membership, if some errors are allowed. Let U be a universe, and S ⊆ U a subset of size |S| = n. Let h : U → [m] be a uniform hash function, for m to be determined later. The data structure maintains a bit array A of length m, initially set to zero. Then, for every x ∈ S, we set A[h(x)] = 1. In order to check if x ∈ S, we compute h(x), read the value A[h(x)], and answer "yes" if A[h(x)] = 1 and "no" otherwise. This has the following guarantees:
- No false negatives: if x ∈ S we will always say "yes".
- Few false positives: if x ∉ S, we will say "yes" with probability |{i : A[i] = 1}|/m, assuming h is a fully random function.
So, if for example we set m = 2n, then the probability for x ∉ S that we say "no" is at least 1/2. In fact, the probability is greater, since when hashing n elements to 2n values there will be some collisions, and so the number of 1s in the array will be less than n. It turns out that to get probability 1/2 we only need m ≈ 1.44n. This is since the probability that A[i] = 0, over the choice of h, is
Pr_h[A[i] = 0] = Pr[h(x) ≠ i for all x ∈ S] = (1 − 1/m)^n ≈ e^{−n/m}.
So, for m = n/ln(2) ≈ 1.44n, the expected number of 0s in A is ≈ m/2.
Note that a Bloom filter uses O(n) bits, which is much less than the O(n log |U|) bits we needed when we did not allow for any errors. It can be shown that we don't need h to be uniform (which can be costly to store); a k-wise independent hash family (an extension of pairwise independence) for k = O(log n) suffices, and such a function can be stored using only O(log^2 n) bits.
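A minimal Python sketch of this one-hash-function variant (our own; real Bloom filters typically use several hash functions, and Python's built-in hash with a random salt stands in for the random-looking hash function assumed above).

import random

class SimpleBloomFilter:
    """One-hash-function Bloom filter: no false negatives, few false positives."""
    def __init__(self, n, m=None):
        self.m = m if m is not None else int(1.44 * n) + 1  # m ~ n / ln 2
        self.bits = [0] * self.m
        self.salt = random.randrange(1 << 61)

    def _h(self, x):
        # Stand-in for a random-looking hash function; not pairwise independent.
        return hash((self.salt, x)) % self.m

    def add(self, x):
        self.bits[self._h(x)] = 1

    def __contains__(self, x):
        return self.bits[self._h(x)] == 1

bf = SimpleBloomFilter(n=3)
for word in ["cat", "dog", "bird"]:
    bf.add(word)
print("cat" in bf, "fish" in bf)  # True, and usually False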

9 Min cut

Let G = (V, E) be a graph. A cut in this graph is |E(S, S^c)| for some ∅ ≠ S ⊊ V, that is, the number of edges which cross a partition of the vertices. Finding the maximum cut is NP-hard, and the best algorithms solving it run in exponential time. We saw an algorithm which finds a 2-approximation for the max-cut. However, finding the minimum cut turns out to be solvable in polynomial time. Today, we will see a randomized algorithm due to Karger [Kar93] which achieves this in a beautiful way. To formalize this, the algorithm will find
min-cut(G) = min_{∅≠S⊊V} |E(S, S^c)|.

9.1 Karger's algorithm

The algorithm is very simple: choose a random edge and contract it. Repeat n − 2 times, until only two vertices remain. Output this as a guess for the min-cut. We will show that this outputs the min-cut with probability at least 2/n^2, hence repeating it 3n^2 times (say) will yield a min-cut with probability at least 1 − (1 − 2/n^2)^{3n^2} ≥ 99%.
Formally, to define a contraction we need to allow graphs to have parallel edges.
Definition 9.1 (edge contraction). Let G = (V, E) be an undirected graph with |V| = n vertices, potentially with parallel edges. For an edge e = (u, v) ∈ E, the contraction of G along e is an undirected graph on n − 1 vertices, where we merge u, v to be a single node, and delete any self-loops that may be created in this process.
We can now present the algorithm.

Karger
Input: Undirected graph G = (V, E) with |V| = n
Output: Cut in G
1. Let G_n = G.
2. For i = n, . . . , 3 do:
   2.1 Choose a uniform edge e_i ∈ G_i.
   2.2 Set G_{i−1} to be the contraction of G_i along e_i.
3. Output the cut in G corresponding to G_2.
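A short Python sketch of Karger's algorithm (our own), representing the multigraph implicitly via a union-find structure over super-vertices and repeating the contraction O(n^2) times; it assumes the input graph is connected.

import random

def karger_once(n, edges):
    """One run of random contractions; returns the size of the resulting cut."""
    parent = list(range(n))

    def find(v):                      # union-find to track merged super-vertices
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)   # uniform among surviving multigraph edges,
        ru, rv = find(u), find(v)     # once self-loops are skipped
        if ru != rv:
            parent[ru] = rv
            remaining -= 1
    return sum(1 for (u, v) in edges if find(u) != find(v))

def min_cut(n, edges, repeats=None):
    repeats = repeats or 3 * n * n    # enough repetitions for high success probability
    return min(karger_once(n, list(edges)) for _ in range(repeats))

# Two triangles joined by a single edge: the minimum cut has size 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_cut(6, edges))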
Next, we proceed to analyze the algorithm. We start with a few observations. By contracting along an edge e = (u, v), we make a commitment that u, v belong to the same side of the cut, so we restrict the number of potential cuts. Hence, any cut of G_i is a cut of G, and in particular
min-cut(G) ≤ min-cut(G_i),   i = n, . . . , 2.
In order to analyze the algorithm, let's fix from now on a min-cut S ⊊ V(G). We will analyze the probability that the algorithm never chooses an edge from the cut E(S, S^c). In that case, after n − 2 contractions, the output will be exactly the cut S. First, we bound the min-cut by cuts for which one side is a single vertex.
Claim 9.2. min-cut(G) ≤ min_{v∈V} deg(v) ≤ 2|E|/|V|.
Proof. For any vertex v ∈ V, we can form a cut by taking E({v}, V \ {v}). It has deg(v) many edges. Since 2|E| = Σ_{v∈V} deg(v) we can bound
min-cut(G) ≤ min_{v∈V} deg(v) ≤ 2|E|/|V|.

Claim 9.3. Let G = (V, E) be a graph, and let S ⊊ V be a minimal cut in G. Let e ∈ E be a uniformly chosen edge. Then
Pr_{e∈E}[e ∈ E(S, S^c)] ≤ 2/|V|.
Proof. We saw that |E(S, S^c)| ≤ 2|E|/|V|. So, the probability that e is in the cut is
Pr_{e∈E}[e ∈ E(S, S^c)] = |E(S, S^c)|/|E| ≤ (2|E|/|V|)/|E| = 2/|V|.
Theorem 9.4. For any min-cut S in G, Pr[algorithm outputs S] ≥ 1/(n choose 2) ≥ 2/n^2.
Proof. Fix a min-cut S ⊊ V. The algorithm outputs S if it never chooses an edge from the cut E(S, S^c). So
Pr[algorithm outputs S] = Pr[e_n, . . . , e_3 ∉ E(S, S^c)] = Π_{i=n}^{3} Pr[e_i ∉ E(S, S^c) | e_n, . . . , e_{i+1} ∉ E(S, S^c)].
To analyze this, assume that e_n, . . . , e_{i+1} ∉ E(S, S^c), so that S is still a cut in G_i. Thus, it is also a min-cut in G_i, and we proved that in this case, Pr[e_i ∈ E(S, S^c)] ≤ 2/|V(G_i)| = 2/i. So
Pr[algorithm outputs S] ≥ (1 − 2/n)(1 − 2/(n−1)) · · · (1 − 2/3)
  = (n−2)/n · (n−3)/(n−1) · (n−4)/(n−2) · · · 2/4 · 1/3
  = 2/(n(n − 1)).
We get a nice corollary: the number of min-cuts in G is at most (n choose 2). Can you prove it directly?

9.2 Improving the running time

The time it takes to find a min-cut by Karger's algorithm described above is O(n^4): we need O(n^2) iterations to guarantee success with high probability. Every iteration requires n − 2 rounds of contraction, and every contraction takes O(n) time. We will see how to improve this running time to O(n^2 log n), due to Karger and Stein [KS96]. The main observation guiding this is that most of the error comes when the graph becomes rather small; however, when the graphs are small, the running time is also smaller. So, we can allow ourselves to run multiple instances for smaller graphs, which boosts the success probability without increasing the running time too much.

Fast-Karger
Input: Undirected graph G = (V, E) with |V| = n
Output: Cut in G
0. If n = 2 output the only possible cut.
1. Let G_n = G and m = ⌈n/√2⌉.
2. For i = n, . . . , m + 1 do:
   2.1 Choose a uniform edge e_i ∈ G_i.
   2.2 Set G_{i−1} to be the contraction of G_i along e_i.
3. Run recursively S_i = Fast-Karger(G_m) for i = 1, 2.
4. Output S ∈ {S_1, S_2} which minimizes |E(S, S^c)|.

Claim 9.5. Fix a min-cut S. Run the original Karger algorithm, but stop and output G_m for m = ⌈n/√2⌉. Then the probability that S is still a cut in G_m is at least 1/2.
Proof. Repeating the analysis we did, but stopping once we reach G_m, gives
Pr[S is a cut of G_m] = Pr[e_n, . . . , e_{m+1} ∉ E(S, S^c)] ≥ (n−2)/n · · · (m−1)/(m+1) = m(m−1)/(n(n−1)).
So, if we set m = ⌈n/√2⌉, this probability is at least 1/2.


Theorem 9.6. Fast-Karger runs in time O(n^2 log n) and returns a minimal cut with probability at least 1/4.
Proof. We first analyze the success probability. Let P(n) be the probability that the algorithm succeeds on graphs of size n. We will prove by induction that P(n) ≥ 1/4. We proved that
Pr[min-cut(G_m) = min-cut(G)] ≥ 1/2.
So we have
P(n) ≥ 1 − (1 − (1/2)P(m))^2 ≥ 1/4.
We next analyze the running time. Let T(n) be the running time of the algorithm on graphs of size n. Then
T(n) = 2T(n/√2) + O(n^2).
This solves to T(n) = O(n^2 log n).

10 Routing

Let G = (V, E) be an undirected graph, where nodes represent processors and edges represent communication channels. Each node wants to send a message to another node: v → π(v), where π is some permutation on the vertices. However, messages can only traverse on edges, and each edge can only carry one message at a given time unit. A routing scheme is a method of deciding on paths for the messages obeying these restrictions, which tries to minimize the time it takes for all messages to reach their destination. If more than one packet needs to traverse an edge, then only one packet does so at any unit time, and the rest are queued for later time steps. The order of sending the remaining packets does not matter much. For example, you can assume a FIFO (First In First Out) queue on every edge.
Here, we will focus on the hypercube graph H_n, which is a common graph used in distributed computation.
Definition 10.1 (Hypercube graph). The hypercube graph H_n = (V, E) has vertices corresponding to all n-bit strings, V = {0, 1}^n, and edges which correspond to bit flips,
E = {(x, x ⊕ e_i) : x ∈ {0, 1}^n, i ∈ [n]},
where e_i is the i-th unit vector, and ⊕ is bitwise xor.
An oblivious routing scheme is a scheme where the path for sending v → π(v) depends just on the endpoints v, π(v), and not on the targets of all other messages. Such schemes are easy to implement, as each node v can compute them given only the local knowledge of its target π(v). A very simple one for the hypercube graph is the bit-fixing scheme: in order to route v = (v_1, . . . , v_n) to u = (u_1, . . . , u_n), we flip the bits in order, whenever necessary. So for example, the path from v = 10110 to u = 00101 is
10110 → 00110 → 00100 → 00101.
We denote by P_fix(v, u) the path from v to u according to the bit-fixing routing scheme.
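A tiny Python sketch (our own) of the bit-fixing path computation, reproducing the example above:

def bit_fixing_path(v, u):
    """Path from v to u in the hypercube, fixing differing bits left to right.
    Vertices are given as bit strings of equal length."""
    path, current = [v], list(v)
    for i in range(len(v)):
        if current[i] != u[i]:
            current[i] = u[i]
            path.append(''.join(current))
    return path

print(bit_fixing_path('10110', '00101'))
# ['10110', '00110', '00100', '00101']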

10.1 Deterministic routing is bad

Although the maximal distance between pairs of vertices in H_n is n, routing based on the bit-fixing scheme can incur a very large overhead, due to the fact that edges can only carry one message at a time.
Lemma 10.2. There are permutations π : {0, 1}^n → {0, 1}^n for which the bit-fixing scheme requires at least 2^{n/2}/n time steps to transfer all messages.
Proof. Assume n is even, and write x ∈ {0, 1}^n as x = (x′, x″) with x′, x″ ∈ {0, 1}^{n/2}. Consider any permutation π : {0, 1}^n → {0, 1}^n which maps (x′, 0) to (0, x′) for all x′ ∈ {0, 1}^{n/2}. These 2^{n/2} paths all pass through a single vertex (0, 0). As it has only n outgoing edges, we need at least 2^{n/2}/n time steps to send all these packets.

In fact, a more general theorem is true, which shows that any deterministic oblivious routing scheme is equally bad. Here, a deterministic oblivious routing scheme is any scheme in which, if π(v) = u, then the path from v to u depends only on v, u, and moreover is decided in some deterministic fixed way.
Theorem 10.3. For any deterministic oblivious routing scheme, there exists a permutation π : {0, 1}^n → {0, 1}^n which requires at least 2^{n/2}/√n time steps.
We will not prove this theorem. Instead, we will see how randomization can greatly enhance performance.

10.2 Solution: randomized routing

We will consider the following oblivious routing scheme, which we call the two-step bit-fixing scheme. It uses randomness on top of the deterministic bit-fixing routing scheme described earlier.
Definition 10.4 (Two-step bit-fixing scheme). In order to route a packet from a source v ∈ {0, 1}^n to a target π(v) ∈ {0, 1}^n do the following:
(i) Sample uniformly an intermediate target t(v) ∈ {0, 1}^n.
(ii) Follow the path P_fix(v, t(v)).
(iii) Follow the path P_fix(t(v), π(v)).
Observe that t : {0, 1}^n → {0, 1}^n is not necessarily a permutation, as each t(v) is sampled independently, and hence there could be collisions. Still, we prove that with very high probability, all packets will be delivered in linear time.
Theorem 10.5. With probability ≥ 1 − 2^{−(n−1)}, all packets will be routed to their destinations in at most 14n time steps.
In preparation for proving it, we first prove a few results about the deterministic bit-fixing routing scheme.
Claim 10.6. Let v, v′, u, u′ ∈ {0, 1}^n. Let P = P_fix(v, u) and P′ = P_fix(v′, u′). If the paths separate at some point, they never re-connect. That is, let w_1, . . . , w_m be the vertices of P and let w′_1, . . . , w′_{m′} be the vertices of P′. Assume that w_i = w′_j but w_{i+1} ≠ w′_{j+1}. Then
w_ℓ ≠ w′_{ℓ′}   for all ℓ ≥ i + 1, ℓ′ ≥ j + 1.
Proof. By assumption w_i = w′_j and w_{i+1} ≠ w′_{j+1}. Then w_{i+1} = w_i ⊕ e_a and w′_{j+1} = w′_j ⊕ e_b with a ≠ b. Assume without loss of generality that a < b. Then (w_ℓ)_a = (w_{i+1})_a = (w_i)_a ⊕ 1 for all ℓ ≥ i + 1, while (w′_{ℓ′})_a = (w′_{j+1})_a = (w′_j)_a = (w_i)_a for all ℓ′ ≥ j + 1. Thus, for any ℓ ≥ i + 1, ℓ′ ≥ j + 1 we have w_ℓ ≠ w′_{ℓ′}, which means that the paths never intersect again.


Lemma 10.7. Fix π : {0, 1}^n → {0, 1}^n and v ∈ {0, 1}^n. Let e_1, . . . , e_k be the edges of P_fix(v, π(v)). Define
S_v = {v′ ∈ V : v′ ≠ v, P_fix(v′, π(v′)) contains some edge from e_1, . . . , e_k}.
Then the packet sent from v to π(v) will reach its destination after at most k + |S_v| steps.
Proof. Let S = S_v for simplicity of notation. The proof is by a charging argument. Let p_v be the packet sent from v to π(v). We assume that packets carry "tokens" on them. Initially, there are no tokens. Assume that at some time step, the packet p_v is supposed to traverse an edge e_i for some i ∈ [k], but instead another packet p_{v′} is sent over e_i at this time step (necessarily v′ ∈ S). In such a case, we generate a new token and place it on p_{v′}. Next, if for some v′ ∈ S, a packet p_{v′} with at least one token on it is supposed to traverse an edge e_j for j ∈ [k], but instead another packet p_{v″} is sent over it at the same time step (again, necessarily v″ ∈ S), then we move one token from p_{v′} to p_{v″}.
We will show that any packet p_{v′} for v′ ∈ S can have at most one token on it at any given moment. This shows that at most |S| tokens are generated overall. Thus, p_v is delayed for at most |S| steps and hence reaches its destination after at most k + |S| steps.
To see that, observe that tokens always move forward along e_1, . . . , e_k. That is, if we follow a specific token, it starts at some edge e_i, follows a path e_i, . . . , e_j for some j ≥ i, and then traverses an edge outside e_1, . . . , e_k. At this point, by Claim 10.6, it can never intersect the path e_1, . . . , e_k again. So, we can never have two tokens which traverse the same edge at the same time, and hence two tokens can never be on the same packet.
Lemma 10.8. Let t : {0, 1}^n → {0, 1}^n be uniformly chosen. Let P(v) = P_fix(v, t(v)). Then with probability at least 1 − 2^{−n} over the choice of t, for any path P(v), there are at most 6n other paths P(w) which intersect some edge of P(v).
Proof. For v, w ∈ {0, 1}^n, let X_{v,w} ∈ {0, 1} be the indicator random variable for the event that P(v) and P(w) intersect in an edge. Our goal is to upper bound X_v = Σ_{w≠v} X_{v,w} for all v ∈ {0, 1}^n. Before analyzing it, we first analyze a simpler random variable.
Fix an edge e = (u, u ⊕ e_a). For w ∈ {0, 1}^n, let Y_{e,w} ∈ {0, 1} be an indicator variable for the event that the edge e belongs to the path P(w). The number of paths which pass through e is then Σ_{w∈{0,1}^n} Y_{e,w}. Now, the path between w and t(w) passes through e iff w_i = u_i for all i > a, and t(w)_i = u_i for all i < a. Let A_e = {w ∈ {0, 1}^n : w_i = u_i for all i > a}. Only w with w ∈ A_e has a nonzero probability for the path from w to t(w) to go through e. Note that |A_e| = 2^a. Moreover, for any w ∈ A_e, the probability that P(w) indeed passes through e is given by
Pr[Y_{e,w} = 1] = Pr[t(w)_i = u_i for all i < a] = 2^{−(a−1)}.
Hence, the expected number of paths P(w) which go through any edge e is at most 2, since
E[Σ_{w∈{0,1}^n} Y_{e,w}] = Σ_{w∈A_e} Pr[Y_{e,w} = 1] = 2^a · 2^{−(a−1)} = 2.

Next, the paths P(v), P(w) intersect if some e ∈ P(v) belongs to P(w). This implies that
X_{v,w} ≤ Σ_{e∈P(v)} Y_{e,w}.
Hence we can bound
E[X_v] = E[Σ_{w≠v} X_{v,w}] ≤ E[Σ_{e∈P(v)} Σ_{w≠v} Y_{e,w}].
In order to bound E[X_v], note that once we fix t(v) then P(v) becomes a fixed list of at most n edges. Hence
E[X_v | t(v)] = Σ_{e∈P(v)} E[Σ_{w≠v} Y_{e,w}] ≤ 2n.
This then implies the same bound once we average over t(v) as well,
E[X_v] ≤ 2n.
This means that for every v, the path P(v) intersects on average at most 2n other paths P(w). Next, we show that if we slightly increase the bound, then this holds with very high probability. This requires a tail bound; in our case, a multiplicative version of the Chernoff inequality.
Theorem 10.9 (Chernoff bound, multiplicative version). Let Z_1, . . . , Z_N ∈ {0, 1} be independent random variables. Let Z = Z_1 + . . . + Z_N with E[Z] = μ. Then for any δ > 0,
Pr[Z ≥ (1 + δ)μ] ≤ exp(−δ^2 μ / (2 + δ)).
In our case, let v ∈ {0, 1}^n, fix t(v) ∈ {0, 1}^n and let Z_1, . . . , Z_N be the random variables {X_{v,w} : w ≠ v}. Note that they are indeed independent, their sum is Z = X_v, and that μ = E[X_v] ≤ 2n. Taking δ = 2 gives
Pr[X_v ≥ 6n] ≤ exp(−2n).
By the union bound, the probability that X_v ≥ 6n for some v ∈ {0, 1}^n is bounded by
Pr[∃v ∈ {0, 1}^n, X_v ≥ 6n] ≤ 2^n exp(−2n) ≤ 2^{−n}.
Proof of Theorem 10.5. Let t : {0, 1}^n → {0, 1}^n be uniformly chosen. By Lemma 10.8, we have that |S_v| ≤ 6n for all v ∈ {0, 1}^n with probability at least 1 − 2^{−n}. By Lemma 10.7, this implies that the packet sent from v to t(v) reaches its destination in at most n + 6n = 7n time steps. Analogously, we can send the packets from t(v) to π(v) in at most 7n time steps (note that this is exactly the same argument, except that the starting point is now randomized, instead of the end point). Again, the success probability of this phase is ≥ 1 − 2^{−n}. By the union bound, the choice of t is good for both phases with probability at least 1 − 2 · 2^{−n}.
11 Expander graphs

Expander graphs are deterministic graphs which behave like random graphs in many ways.
They have a large number of applications, including in derandomization, constructions of
error-correcting codes, robust network design, and many more. Here, we will only give some
definitions and describe a few of their properties. For a much more comprehensive survey
see [HLW06].

11.1 Edge expansion

Let G = (V, E) be an undirected graph. We will focus here on d-regular graphs, but many of the results can be extended to non-regular graphs as well. Let E(S, T) = |E ∩ (S × T)| denote the number of edges in G with one endpoint in S and the other in T. For a subset S ⊆ V, its edge boundary is E(S, S^c). We say that G is an edge expander if any set S has many edges going out of it.
Definition 11.1. The Cheeger constant of G is
h(G) = min_{S⊆V, 1≤|S|≤|V|/2} E(S, S^c)/|S|.
Simple bounds are 0 ≤ h(G) ≤ d, with h(G) = 0 iff G is disconnected. A simple example for a graph with large edge expansion is the complete graph. If G = K_n, the complete graph on n vertices, then d = n − 1 and
h(G) = min_{S⊆V, 1≤|S|≤|V|/2} E(S, S^c)/|S| = min_{S⊆V, 1≤|S|≤|V|/2} |S^c| ≥ n/2.
Our interest however will be in constructing large but very sparse graphs (ideally with d = 3) for which h(G) ≥ c for some absolute constant c > 0. Such graphs are highly connected graphs. For example, the following lemma shows that by deleting a few edges in such graphs, we can only disconnect a few vertices. This is very useful for example in network design, where we want the failure of edges to affect as few nodes as possible.
Lemma 11.2. To disconnect k vertices from the rest of the graph, we must delete at least k · h(G) edges.
Proof. If after deleting some number of edges, a set S ⊆ V of size |S| = k gets disconnected from the graph, then we must have deleted at least E(S, S^c) ≥ k · h(G) many edges.
There are several constructions of expander graphs which are based on number theory. The constructions are simple and beautiful, but the proofs are hard. For example, a construction of Selberg has V = Z_p ∪ {∞} and E = {(x, x + 1), (x, x − 1), (x, −1/x) : x ∈ V}. It is a 3-regular graph, and it can be proven to have h(G) ≥ 3/32. The degree 3 is the smallest we can hope for.

Claim 11.3. If G is 2-regular on n vertices then h(G) ≤ 4/n.
Proof. If G is 2-regular, it is a union of cycles. If it is disconnected then h(G) = 0. Otherwise, it is a cycle v_1, v_2, . . . , v_n, v_1. If we take S = {v_1, . . . , v_{n/2}} then E(S, S^c) = 2, |S| = n/2 and hence h(G) ≤ 4/n.

11.2 Spectral expansion

We will now describe another notion of expansion, called spectral expansion. It may seem less natural, but we will later see that it is essentially equivalent to edge expansion. However, the benefit will be that it is easy to check if a graph has good spectral expansion, while we don't know of an efficient way to test for edge expansion (other than computing it directly, which takes exponential time).
For a d-regular graph G = (V, E), |V| = n, let A be the adjacency matrix of G. That is, A is an n × n matrix with
A_{i,j} = 1 if (i, j) ∈ E, and A_{i,j} = 0 if (i, j) ∉ E.
Note that A is a symmetric matrix, hence its eigenvalues are all real. We first note a few simple properties of them.
Claim 11.4.
(i) All eigenvalues of $A$ are in the range $[-d, d]$.
(ii) The vector $\vec{1}$ is an eigenvector of $A$ with eigenvalue $d$.
(iii) If $G$ has $k$ connected components then $A$ has $k$ linearly independent eigenvectors with
eigenvalue $d$.
(iv) If $G$ is bipartite then $A$ has $-d$ as an eigenvalue.
Parts (iii), (iv) are in fact "iff", but we will only show one direction.
Proof. (i) Let $v \in \mathbb{R}^n$ be an eigenvector of $A$ with eigenvalue $\lambda$. Let $i$ be such that $|v_i|$ is
maximal. Then
$$\lambda v_i = (Av)_i = \sum_{j \sim i} v_j,$$
and hence $|\lambda| |v_i| \le \sum_{j \sim i} |v_j| \le d |v_i|$, which gives $|\lambda| \le d$.

(ii) For any $i \in [n]$,
$$(A \vec{1})_i = \sum_{j \sim i} 1 = d.$$

(iii) Let $\vec{1}_S \in \mathbb{R}^n$ be the indicator vector of a set $S \subseteq V$. If $G$ has $k$ connected components,
say $S_1, \ldots, S_k \subseteq V$, then $\vec{1}_{S_1}, \ldots, \vec{1}_{S_k}$ are $k$ linearly independent eigenvectors of $A$ with
eigenvalue $d$.

(iv) If $G$ is bipartite, say $V = V_1 \cup V_2$ with $E \subseteq V_1 \times V_2$, then the vector $v = \vec{1}_{V_1} - \vec{1}_{V_2}$ has
eigenvalue $-d$. Indeed, if $i \in V_1$ then
$$(Av)_i = \sum_{j \sim i} v_j = -d,$$
since if $j \sim i$ then $j \in V_2$ and hence $v_j = -1$. Similarly if $i \in V_2$.


Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$ be the eigenvalues of $G$. We know that $\lambda_1 = d$ and that $\lambda_n \ge -d$.
A graph is a spectral expander if all eigenvalues, except $\lambda_1$, are bounded away from $d$ and $-d$.

Definition 11.5. A $d$-regular graph $G$ is a $\lambda$-expander for $0 \le \lambda \le d$ if $|\lambda_2|, \ldots, |\lambda_n| \le \lambda$.
It is very simple to check spectral expansion: we can simply compute the eigenvalues
of the matrix $A$. Surprisingly, having nontrivial spectral expansion (namely $\lambda \le d - \varepsilon$)
is equivalent to having nontrivial edge expansion (namely $h(G) \ge \varepsilon_0$). We next see that
expanders have a stronger property than just edge expansion: the number of edges between
any two large sets is close to the expected number in a random $d$-regular graph.
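To make the first point concrete, here is a small sketch (not from the original notes; it assumes Python with numpy and represents the graph by its adjacency matrix) that computes the spectral expansion parameter $\lambda = \max(|\lambda_2|, \ldots, |\lambda_n|)$ directly from the eigenvalues.

```python
import numpy as np

def spectral_expansion(A):
    """Given the adjacency matrix of a d-regular graph, return (d, lam)
    where lam = max(|lambda_2|, ..., |lambda_n|)."""
    A = np.asarray(A, dtype=float)
    d = int(A[0].sum())                    # regular: every row sums to d
    eigs = sorted(np.linalg.eigvalsh(A))   # A is symmetric => real eigenvalues
    lam = max(abs(e) for e in eigs[:-1])   # drop the top eigenvalue lambda_1 = d
    return d, lam

# Example: the complete graph K_n is (n-1)-regular with lambda = 1.
n = 8
K = np.ones((n, n)) - np.eye(n)
print(spectral_expansion(K))               # roughly (7, 1.0)
```

On the complete graph $K_n$ the eigenvalues are $n-1$ (once) and $-1$ (with multiplicity $n-1$), so the sketch should report $\lambda \approx 1$.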
Lemma 11.6 (Expander mixing lemma). Let $G$ be a $d$-regular $\lambda$-expander. Let $S, T \subseteq V$.
Then
$$\left| E(S, T) - \frac{d|S||T|}{n} \right| \le \lambda \sqrt{|S||T|}.$$
Proof. We can write $E(S, T) = \vec{1}_S^T A \vec{1}_T$. We decompose
$$\vec{1}_S = \sum \alpha_i v_i \qquad \text{and} \qquad \vec{1}_T = \sum \beta_i v_i.$$
Then
$$E(S, T) = \sum \alpha_i \beta_i \lambda_i.$$
The term for $i = 1$ corresponds to the random graph case: $\alpha_1 = \langle \vec{1}_S, v_1 \rangle = |S|/\sqrt{n}$,
$\beta_1 = |T|/\sqrt{n}$ and $\lambda_1 = d$, so
$$\alpha_1 \beta_1 \lambda_1 = \frac{d|S||T|}{n}.$$
We can thus bound
$$\left| E(S, T) - \frac{d|S||T|}{n} \right| = \left| \sum_{i=2}^n \alpha_i \beta_i \lambda_i \right| \le \lambda \sum_{i=2}^n |\alpha_i| |\beta_i| \le \lambda \sqrt{\sum_{i=2}^n \alpha_i^2} \sqrt{\sum_{i=2}^n \beta_i^2},$$
where we used the Cauchy-Schwarz inequality $\langle u, v \rangle \le \|u\|_2 \|v\|_2$ for vectors $u, v$. To conclude,
note that $|S| = \langle \vec{1}_S, \vec{1}_S \rangle = \sum \alpha_i^2$ and similarly $|T| = \sum \beta_i^2$. Hence
$$\left| E(S, T) - \frac{d|S||T|}{n} \right| \le \lambda \sqrt{|S||T|}.$$

For example, if $|S| = \alpha n$ and $|T| = \beta n$ then
$$E(S, T) = \left( d\alpha\beta \pm \lambda\sqrt{\alpha\beta} \right) n.$$
So, as long as $\lambda/d \ll \sqrt{\alpha\beta}$, we have a good estimate on the number of edges between $S$
and $T$.

Corollary 11.7. Let $G = (V, E)$ be a $d$-regular $\lambda$-expander. Let $I \subseteq V$ be an independent
set. Then $|I| \le \frac{\lambda}{d} |V|$.

Proof. If $I \subseteq V$ is an independent set then $E(I, I) = 0$. Plugging this into the expander
mixing lemma gives
$$\frac{d|I|^2}{|V|} \le \lambda |I|,$$
which gives $|I| \le \frac{\lambda}{d} |V|$.
So, we can prove strong properties of a graph whenever $\lambda \ll d$. But how small can $\lambda$ be
as a function of the degree $d$? Alon and Boppana proved that $\lambda \ge 2\sqrt{d-1}(1 - o(1))$. This
bound is tight, and graphs which attain it are called Ramanujan graphs. We will prove a
slightly weaker bound.

Lemma 11.8. Let $G = (V, E)$ be a $d$-regular $\lambda$-expander. Assume that $|V| = n \gg d$. Then
$\lambda \ge \sqrt{d}(1 - o(1))$.
P
Proof. For any real matrix M it holds that T r(M M t ) =
|i (M )|2 . As A is symmetric,
we have
X
T r(A2 ) =
2i .
On the other hand,
2

T r(A ) =

n
X

(A )i,i =

i=1

So

n
X

A2i,j = 2|E| = nd.

i,j=1

2i = nd. We have 1 = d and hence


n
X

2i = nd d2 = d(n d).

i=2

As we have |i | for all i 2, we conclude that




d(n d)
d1
2

=d 1
= d(1 o(1)).
n1
n1

11.3 Cheeger inequality

We will prove the following theorem, relating spectral expansion and edge expansion.
Theorem 11.9 (Cheeger inequality). For any $d$-regular graph $G$,
$$\frac{1}{2}(d - \lambda_2) \le h(G) \le \sqrt{2d(d - \lambda_2)}.$$
Let $v_1, \ldots, v_n$ be the eigenvectors of $A$ corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$. Since
the matrix $A$ is symmetric, we can choose orthonormal eigenvectors, $\langle v_i, v_j \rangle = 1_{i=j}$. In
particular, $v_1 = \frac{1}{\sqrt{n}}\vec{1}$. We will only prove the lower bound on $h(G)$, which is easier and is
sufficient for our goals: to show nontrivial edge expansion, it suffices to show that $\lambda_2 \ll d$.
We start with a general characterization of $\lambda_2$.
Claim 11.10. $\lambda_2 = \sup_{w \in \mathbb{R}^n,\ \langle w, \vec{1} \rangle = 0} \frac{w^T A w}{w^T w}$.

Proof. Let $w \in \mathbb{R}^n$ be such that $\langle w, \vec{1} \rangle = 0$. We can decompose $w = \sum_{i=2}^n \alpha_i v_i$. Then
$w^T w = \sum \alpha_i^2$ and $w^T A w = \sum_{i=2}^n \lambda_i \alpha_i^2$. Since $\lambda_i \le \lambda_2$ for all $i \ge 2$, we obtain that
$w^T A w \le \lambda_2 w^T w$.
Clearly, if $w = v_2$ then $w^T A w = \lambda_2 w^T w$, hence the claim follows.
So, to prove the lower bound $h(G) \ge (d - \lambda_2)/2$, which is equivalent to $\lambda_2 \ge d - 2h(G)$,
we just need to exhibit a suitable vector $w$. We do so in the next lemma.

Lemma 11.11. Let $S \subseteq V$ be a set for which $h(G) = \frac{E(S, S^c)}{|S|}$. Define $w \in \mathbb{R}^n$ by
$$w = \vec{1}_S - \frac{|S|}{n}\vec{1}.$$
Then $\langle w, \vec{1} \rangle = 0$ and
$$\frac{w^T A w}{w^T w} \ge d - 2h(G).$$

Proof. It is clear that $\langle w, \vec{1} \rangle = 0$. For the latter claim, we first compute
$$w^T w = \left(\vec{1}_S - \frac{|S|}{n}\vec{1}\right)^T \left(\vec{1}_S - \frac{|S|}{n}\vec{1}\right) = |S| - \frac{|S|^2}{n}.$$
Next, $Aw = A\vec{1}_S - \frac{d|S|}{n}\vec{1}$, and since $\langle w, \vec{1} \rangle = 0$ we have
$$w^T A w = w^T A \vec{1}_S = \vec{1}_S^T A \vec{1}_S - \frac{d|S|^2}{n} = E(S, S) - \frac{d|S|^2}{n} = d|S| - E(S, S^c) - \frac{d|S|^2}{n}.$$
We now plug in $E(S, S^c) = |S| h(G)$ and obtain that
$$w^T A w = w^T w \cdot d - |S| h(G) = w^T w \left(d - \frac{n}{n - |S|} h(G)\right) \ge w^T w \left(d - 2h(G)\right),$$
since $|S| \le n/2$ and hence $n/(n - |S|) \le 2$.
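On small graphs both sides of Cheeger's inequality can be verified directly: $h(G)$ by brute force over all sets of size at most $n/2$, and $\lambda_2$ from the adjacency matrix. The following is a hedged sketch (assuming numpy; the brute-force search is mine and only feasible for tiny $n$, and the 3-regular example graph is again my own choice).

```python
import itertools
import numpy as np

def cheeger_constant(A):
    """Brute-force h(G): minimize E(S, S^c)/|S| over all S with 1 <= |S| <= n/2."""
    n = len(A)
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), size):
            Sc = [v for v in range(n) if v not in S]
            cut = A[np.ix_(list(S), Sc)].sum()     # edges leaving S
            best = min(best, cut / size)
    return best

# The same small 3-regular example: 8-cycle plus antipodal diagonals.
n, d = 8, 3
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    A[i, (i + n // 2) % n] = 1

lam2 = sorted(np.linalg.eigvalsh(A))[-2]           # second largest eigenvalue
h = cheeger_constant(A)
assert (d - lam2) / 2 <= h + 1e-9                  # lower bound of Theorem 11.9
assert h <= np.sqrt(2 * d * (d - lam2)) + 1e-9     # upper bound of Theorem 11.9
print(h, (d - lam2) / 2, np.sqrt(2 * d * (d - lam2)))
```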


11.4 Random walks mix fast

We saw one notion under which expanders are robust: deleting a few edges can only
disconnect a few vertices. Now we will see another: random walks mix fast. This will
require a bound on all $|\lambda_i|$ for $i \ge 2$, which is how we defined $\lambda$-expanders.

A random walk in a $d$-regular graph $G$ is defined as one expects: given a current node
$i \in V$, a neighbour $j$ of $i$ is selected uniformly, and we move to $j$. There is a simple
characterization of the probability distribution on the nodes after one step, given by the
normalized adjacency matrix. Define
$$\bar{A} = \frac{1}{d} A.$$
We can describe distributions over $V$ as vectors $\pi \in (\mathbb{R}^+)^n$, where $\pi_i$ is the probability that
we are at node $i$.

Claim 11.12. Let $\pi \in (\mathbb{R}^+)^n$ be a distribution over the nodes. After taking one step in the
random walk on $G$, the new distribution over the nodes is given by $\bar{A}\pi$.


Proof. Let $\pi'$ be the distribution over the nodes after the random walk step. The probability
that we are at node $i$ after the step is the sum, over all its neighbours $j \sim i$, of the probability
that we were at node $j$ before the step, times the probability that we chose to go from $j$ to $i$.
This latter probability is always $1/d$, as the graph is $d$-regular. So
$$\pi'_i = \sum_{j \sim i} \pi_j \cdot (1/d) = (1/d)(A\pi)_i = (\bar{A}\pi)_i.$$

We next use this observation to show that if $\lambda \ll d$ then random walks in $G$ converge fast
to the uniform distribution. The distance between distributions is the statistical distance,
given by
$$\mathrm{dist}(\pi, \pi') = \frac{1}{2} \sum_i |\pi_i - \pi'_i| = \frac{1}{2} \|\pi - \pi'\|_1.$$
It can be shown that this is also equal to the largest probability by which an event can
distinguish $\pi$ from $\pi'$,
$$\mathrm{dist}(\pi, \pi') = \max_{F \subseteq [n]} \sum_{i \in F} \pi_i - \pi'_i.$$
Below, we denote by $U = (1/n)\vec{1}$ the uniform distribution over the nodes.


Lemma 11.13. Let $\pi_0$ be any starting distribution over the nodes of $V$. Let $\pi_1, \pi_2, \ldots$ be
the distributions obtained by performing a random walk on the nodes. Then
$$\|\pi_t - U\|_1 \le n(\lambda/d)^t.$$
Proof. Decompose $\pi_0 = \sum \alpha_i v_i$, where $\alpha_1 = \langle \pi_0, v_1 \rangle = \frac{1}{\sqrt{n}} \langle \pi_0, \vec{1} \rangle = 1/\sqrt{n}$ and hence
$\alpha_1 v_1 = (1/n)\vec{1} = U$ is the uniform distribution. We have that $\pi_t = \bar{A}^t \pi_0$. The eigenvectors
of $\bar{A}$ are $v_1, \ldots, v_n$ with eigenvalues $1 = \lambda_1/d, \lambda_2/d, \ldots, \lambda_n/d$. Hence
$$\pi_t = \sum_{i=1}^n \alpha_i (\lambda_i/d)^t v_i.$$
Thus, the difference between $\pi_t$ and the uniform distribution is given by
$$\pi_t - U = \sum_{i=2}^n \alpha_i (\lambda_i/d)^t v_i.$$
In order to bound $\|\pi_t - U\|_1$, it will be easier to first bound $\|\pi_t - U\|_2$, and then use the
Cauchy-Schwarz inequality: for any $w \in \mathbb{R}^n$ we have
$$\|w\|_1^2 = \left(\sum_{i=1}^n |w_i|\right)^2 \le n \sum_{i=1}^n |w_i|^2 = n \|w\|_2^2.$$
Now, $|\lambda_i| \le \lambda$ for all $i \ge 2$, and $\alpha_i^2 \le \|\pi_0\|_2^2 \le \|\pi_0\|_1^2 = 1$. So
$$\|\pi_t - U\|_2^2 = \sum_{i=2}^n \alpha_i^2 (\lambda_i/d)^{2t} \le n(\lambda/d)^{2t}.$$
Hence,
$$\|\pi_t - U\|_1 \le \sqrt{n} \, \|\pi_t - U\|_2 \le n(\lambda/d)^t.$$
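A hedged numerical illustration of Lemma 11.13 (assuming numpy; the same small 3-regular test graph as in the earlier sketches): starting from a point mass and repeatedly multiplying by $\bar{A} = A/d$, the $\ell_1$ distance to uniform stays below $n(\lambda/d)^t$.

```python
import numpy as np

# Same 3-regular test graph: 8-cycle plus antipodal diagonals.
n, d = 8, 3
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    A[i, (i + n // 2) % n] = 1

Abar = A / d                                       # normalized adjacency matrix
lam = max(abs(e) for e in sorted(np.linalg.eigvalsh(A))[:-1])

pi = np.zeros(n)
pi[0] = 1.0                                        # start at node 0 (a point mass)
U = np.ones(n) / n                                 # uniform distribution
for t in range(1, 11):
    pi = Abar @ pi                                 # one step of the random walk
    l1 = np.abs(pi - U).sum()
    assert l1 <= n * (lam / d) ** t + 1e-9         # the bound of Lemma 11.13
    print(t, l1, n * (lam / d) ** t)
```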

Corollary 11.14. The diameter of a $d$-regular $\lambda$-expander is at most $\frac{2 \log n}{\log(d/\lambda)}$.

Proof. Fix any $i, j \in V$. The probability that a random walk of length $t$ which starts at $i$
reaches $j$ is at least $1/n - n(\lambda/d)^t$. If $t = \frac{c \log n}{\log(d/\lambda)}$ then the error term is bounded by
$$n(\lambda/d)^t \le n^{-(c-1)}.$$
So, for $c > 2$ the error term is $< 1/n$, and hence there is a positive probability to reach $j$
from $i$ within $t$ steps. In particular, their distance is bounded by $t$.

11.5 Random walks escape small sets

We next show another property of random walks on expanders: they don't stay trapped in
small sets.


Lemma 11.15. Let $S \subseteq V$. Let $i_0 \in V$ be uniformly chosen, and let $i_1, i_2, \ldots, i_t \in V$ be
nodes obtained by a random walk starting at $i_0$. Then
$$\Pr[i_0, i_1, \ldots, i_t \in S] \le \left( \frac{|S|}{n} + \frac{\lambda}{d}\left(1 - \frac{|S|}{n}\right) \right)^t.$$
Proof. We analyze the event that $i_0, i_1, \ldots, i_t \in S$ by analyzing the conditional probabilities:
$$\Pr[i_0, i_1, \ldots, i_t \in S] = \Pr[i_0 \in S] \cdot \Pr[i_1 \in S \mid i_0 \in S] \cdots \Pr[i_t \in S \mid i_0, i_1, \ldots, i_{t-1} \in S].$$
Clearly, $\Pr[i_0 \in S] = \frac{|S|}{n}$. However, we will present it in another way. Let $\pi_0 = (1/n)\vec{1}$ be
the uniform distribution, and let $\Pi_S$ be the projection to $S$. That is, $\Pi_S$ is a diagonal $n \times n$
matrix with $(\Pi_S)_{i,i} = 1_{i \in S}$. Then
$$\Pr[i_0 \in S] = \vec{1}^T \Pi_S \pi_0.$$
Let $\pi_0'$ be the conditional distribution of $i_0$, conditioned on $i_0 \in S$. It is the uniform
distribution over $S$. Equivalently,
$$\pi_0' = \frac{\Pi_S \pi_0}{\Pr[i_0 \in S]}.$$
The distribution of $i_1$, conditioned on $i_0 \in S$, is given by $\pi_1 = \bar{A} \pi_0'$. The probability that
$i_1 \in S$ conditioned on $i_0 \in S$ is given by
$$\Pr[i_1 \in S \mid i_0 \in S] = \vec{1}^T \Pi_S \pi_1.$$
Let $\pi_1'$ be the distribution of $i_1$, conditioned on $i_0, i_1 \in S$. Then
$$\pi_1' = \frac{\Pi_S \pi_1}{\Pr[i_1 \in S \mid i_0 \in S]} = \frac{\Pi_S \bar{A} \pi_0'}{\Pr[i_1 \in S \mid i_0 \in S]} = \frac{\Pi_S \bar{A} \Pi_S \pi_0}{\Pr[i_1 \in S \mid i_0 \in S] \Pr[i_0 \in S]} = \frac{\Pi_S \bar{A} \Pi_S \pi_0}{\Pr[i_0, i_1 \in S]}.$$
Similarly,
$$\Pr[i_2 \in S \mid i_0, i_1 \in S] = \vec{1}^T \Pi_S \pi_1',$$
and
$$\pi_2' = \frac{\Pi_S \bar{A} \pi_1'}{\Pr[i_2 \in S \mid i_0, i_1 \in S]} = \frac{\Pi_S \bar{A} \Pi_S \bar{A} \Pi_S \pi_0}{\Pr[i_0, i_1, i_2 \in S]}.$$
More generally, and exploiting the fact that $\Pi_S^2 = \Pi_S$, the conditional distribution of $i_t$,
conditioned on $i_0, \ldots, i_t \in S$, is given by
$$\pi_t' = \frac{(\Pi_S \bar{A} \Pi_S)^t \pi_0}{\Pr[i_0, \ldots, i_t \in S]}.$$
Since $\pi_t'$ is a distribution, $\vec{1}^T \pi_t' = 1$. Hence we can compute
$$\Pr[i_0, i_1, \ldots, i_t \in S] = \vec{1}^T (\Pi_S \bar{A} \Pi_S)^t \pi_0.$$
Let $M = \Pi_S \bar{A} \Pi_S$. As it is a symmetric matrix, its eigenvalues are all real. Let $\mu \in \mathbb{R}$ denote
its largest eigenvalue in absolute value. We will shortly bound $|\mu|$. This will then imply that
$\|M v\|_2 \le |\mu| \|v\|_2$ for any vector $v \in \mathbb{R}^n$, and hence $\|M^t v\|_2 \le |\mu|^t \|v\|_2$. Thus
$$\Pr[i_0, i_1, \ldots, i_t \in S] \le \|\vec{1}\|_2 \cdot \|M^t \pi_0\|_2 \le \sqrt{n} \cdot |\mu|^t \|\pi_0\|_2 = |\mu|^t.$$
In order to bound $|\mu|$, let $w \in \mathbb{R}^n$ denote the eigenvector corresponding to $\mu$, where we
normalize $\|w\|_2 = 1$. Since $w = M w = \Pi_S(\bar{A} \Pi_S w)$ we must have that $w$ is supported on
$S$, and hence $\Pi_S w = w$. Decompose $w = \alpha v_1 + w^{\perp}$ where $\alpha = \langle w, v_1 \rangle$ and $\langle w^{\perp}, v_1 \rangle = 0$,
and let $\beta = \|w^{\perp}\|_2$. We have
$$1 = \|w\|_2^2 = \alpha^2 \|v_1\|_2^2 + \|w^{\perp}\|_2^2 = \alpha^2 + \beta^2$$
and
$$|\mu| = |w^T M w| = |w^T \bar{A} w| = \left| \alpha^2 + (w^{\perp})^T \bar{A} w^{\perp} \right| \le \alpha^2 + (\lambda/d)\beta^2 = \alpha^2 + (\lambda/d)(1 - \alpha^2).$$
So, to bound $|\mu|$ we need to bound $|\alpha|$. As $w$ is supported on $S$ we have
$$|\alpha| = |\langle w, v_1 \rangle| = \frac{1}{\sqrt{n}} \left| \langle w, \vec{1} \rangle \right| = \frac{1}{\sqrt{n}} \left| \langle w, \vec{1}_S \rangle \right| \le \frac{1}{\sqrt{n}} \|w\|_2 \|\vec{1}_S\|_2 = \sqrt{|S|/n},$$
and hence $\alpha^2 \le |S|/n$. We thus obtain the bound
$$|\mu| \le \frac{|S|}{n} + \frac{\lambda}{d}\left(1 - \frac{|S|}{n}\right).$$
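The quantity bounded in the proof, $\vec{1}^T(\Pi_S \bar{A} \Pi_S)^t \pi_0$, can be computed exactly on small graphs, giving a quick check of Lemma 11.15. A hedged sketch (assuming numpy; the test graph and the choice of $S$ are mine):

```python
import numpy as np

# Same 3-regular test graph: 8-cycle plus antipodal diagonals.
n, d = 8, 3
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    A[i, (i + n // 2) % n] = 1

Abar = A / d
lam = max(abs(e) for e in sorted(np.linalg.eigvalsh(A))[:-1])

S = [0, 1, 2]                                      # the set the walk should escape
PiS = np.diag([1.0 if i in S else 0.0 for i in range(n)])
pi0 = np.ones(n) / n                               # uniform starting distribution
M = PiS @ Abar @ PiS

for t in range(1, 8):
    stay = np.ones(n) @ np.linalg.matrix_power(M, t) @ pi0   # Pr[i_0,...,i_t in S]
    bound = (len(S) / n + (lam / d) * (1 - len(S) / n)) ** t
    assert stay <= bound + 1e-9
    print(t, stay, bound)
```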

11.6 Randomness efficient error reduction in randomized algorithms

We describe an application of Lemma 11.15 to error reduction in randomized algorithms.


Let $A(x, r)$ be a randomized algorithm which computes a boolean function $f(x) \in \{0, 1\}$.
Let's assume for simplicity that it has one-sided error (the analysis can be extended to
two-sided error, but we won't do that here). That is, we assume that
- If $f(x) = 0$ then $A(x, r) = 0$ always.
- If $f(x) = 1$ then $\Pr_r[A(x, r) = 1] \ge 1/2$.

Let's say that we want to increase the success probability for inputs $x$ for which $f(x) = 1$
from $1/2$ to $1 - \delta$, where for simplicity we take $\delta = 2^{-t}$. A simple solution is to repeat
the algorithm $t$ times with fresh randomness, and output 0 only if all the runs output 0. If $A$
uses $m$ random bits (e.g. $r \in \{0,1\}^m$) then the new algorithm will use $mt$ random bits.
However, this can be improved using expanders.

Lemma 11.16. Let $G = (V, E)$ be some $d$-regular $\lambda$-expander for $V = \{0, 1\}^m$, where
$\lambda < d = O(1)$ are constants. We treat the nodes of $G$ as assignments to the random bits of $A$.
Consider the following algorithm: choose a random $r_0 \in V$, and let $r_1, \ldots, r_t$ be obtained by
a random walk on $G$ starting at $r_0$. On input $x$, we run $A(x, r_0), \ldots, A(x, r_t)$, and output 0
only if all the runs output 0. Then

1. The new algorithm is a one-sided error algorithm with error $2^{-\Omega(t)}$.

2. The new algorithm uses only $m + O(t)$ random bits.
Proof. If $f(x) = 0$ then $A(x, r) = 0$ for all $r$, hence we will return 0 always. So assume
that $f(x) = 1$. Let $B = \{r \in \{0, 1\}^m : A(x, r) = 0\}$ be the set of bad random strings, on
which the algorithm makes a mistake. By assumption, $|B| \le |V|/2$. We will return 0 only if
$r_0, \ldots, r_t \in B$. However, we know that
$$\Pr[r_0, \ldots, r_t \in B] \le \left( \frac{1}{2} + \frac{1}{2} \cdot \frac{\lambda}{d} \right)^t = p^t,$$
where $p = p(\lambda, d) < 1$ is a constant. So the probability of error is $2^{-\Omega(t)}$. The number
of random bits required is as follows: $m$ random bits to choose $r_0$, but only $\log d = O(1)$
random bits to choose each $r_i$ given $r_{i-1}$. So the total number of random bits is $m + O(t)$.
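A hedged sketch of this amplification scheme (assuming numpy; the "expander" is the small toy graph used earlier rather than an explicit expander on $\{0,1\}^m$, and `noisy_test` is a hypothetical stand-in for $A(x, r)$): the seeds $r_1, \ldots, r_t$ come from a walk, so each extra run costs only about $\log_2 d$ random bits.

```python
import numpy as np

rng = np.random.default_rng(1)

def expander_walk_seeds(neighbors, t, rng):
    """Pick r_0 uniformly, then take t random-walk steps; each step costs
    only ~log2(d) fresh random bits instead of m bits."""
    r = int(rng.integers(len(neighbors)))
    seeds = [r]
    for _ in range(t):
        r = neighbors[r][int(rng.integers(len(neighbors[r])))]
        seeds.append(r)
    return seeds

def amplified(alg, x, neighbors, t, rng):
    """Output 1 if any run accepts; errs only if every walk seed is bad."""
    return int(any(alg(x, r) for r in expander_walk_seeds(neighbors, t, rng)))

# Toy "expander" on 8 seeds: the 8-cycle plus antipodal diagonals (3-regular).
n = 8
neighbors = [[(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)]

# Hypothetical one-sided-error test: when f(x) = 1 it rejects on half the seeds.
bad = set(range(n // 2))
noisy_test = lambda x, r: 0 if r in bad else 1

runs = 10_000
errors = sum(amplified(noisy_test, None, neighbors, t=10, rng=rng) == 0
             for _ in range(runs))
print("empirical error:", errors / runs)   # far smaller than the single-run 1/2
```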


References

[Bal11] Simeon Ball. A proof of the MDS conjecture over prime fields. In 3rd International Castle Meeting on Coding Theory and Applications, volume 5, page 41. Univ. Autònoma de Barcelona, 2011.

[FKS84] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM (JACM), 31(3):538–544, 1984.

[HLW06] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–561, 2006.

[Kar93] David R. Karger. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In SODA, volume 93, pages 21–30, 1993.

[KS96] David R. Karger and Clifford Stein. A new approach to the minimum cut problem. Journal of the ACM (JACM), 43(4):601–640, 1996.

[LG14] François Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, pages 296–303. ACM, 2014.

[LLW06] Michael Luby and Avi Wigderson. Pairwise independence and derandomization. Now Publishers Inc, 2006.

[Rys63] Herbert John Ryser. Combinatorial Mathematics. 1963.

[Sch80] Jacob T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM (JACM), 27(4):701–717, 1980.

[Sch99] Uwe Schöning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. In 40th Annual Symposium on Foundations of Computer Science, pages 410–414. IEEE, 1999.

[Sha79] Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.

[Str69] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356, 1969.

[Zip79] Richard Zippel. Probabilistic algorithms for sparse polynomials. Springer, 1979.
