CSE291 Course Notes
CSE291 Course Notes
Shachar Lovett
Spring 2016
Abstract
The class will focus on two themes: linear algebra and probability, and their many
applications in algorithm design. We will cover a number of classical examples, including: fast matrix multiplication, FFT, error correcting codes, cryptography, efficient
data structures, combinatorial optimization, routing, and more. We assume basic familiarity (undergraduate level) with linear algebra, probability, discrete mathematics
and graph theory.
Contents
0 Preface: Mathematical background
0.1 Fields . . . . . . . . . . . . . . . .
0.2 Polynomials . . . . . . . . . . . . .
0.3 Matrices . . . . . . . . . . . . . . .
0.4 Probability . . . . . . . . . . . . .
.
.
.
.
4
4
4
5
5
.
.
.
.
7
7
11
12
12
.
.
.
.
.
14
14
14
16
16
17
19
19
21
.
.
.
.
23
23
24
25
26
5 Reed-Solomon codes
5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 Decoding Reed-Solomon codes from erasures . . . . . . . . . . . . . . . . . .
5.3 Decoding Reed-Solomon codes from errors . . . . . . . . . . . . . . . . . . .
27
27
28
28
30
30
31
33
34
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 Matrix multiplication
1.1 Strassens algorithm . . . . . . . . . . . .
1.2 Verifying matrix multiplication . . . . . .
1.3 Application: checking if a graph contains a
1.4 Application: listing all triangles in a graph
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. . . . .
. . . . .
triangle
. . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
multiplication
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
in graphs
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7 Satisfiability
7.1 2-SAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7.2 3-SAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
36
36
40
.
.
.
.
.
.
.
42
42
44
45
46
47
48
50
9 Min cut
9.1 Kargers algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9.2 Improving the running time . . . . . . . . . . . . . . . . . . . . . . . . . . .
51
51
53
10 Routing
10.1 Deterministic routing is bad . . . . . . . . . . . . . . . . . . . . . . . . . . .
10.2 Solution: randomized routing . . . . . . . . . . . . . . . . . . . . . . . . . .
55
55
56
11 Expander graphs
11.1 Edge expansion . . . . . . . . . . . . .
11.2 Spectral expansion . . . . . . . . . . .
11.3 Cheeger inequality . . . . . . . . . . .
11.4 Random walks mix fast . . . . . . . . .
11.5 Random walks escape small sets . . . .
11.6 Randomness efficient error reduction in
59
59
60
63
64
65
67
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
randomized algorithms
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0.1
Fields
A field is a set F endowed with two operations: addition and multiplication. It satisfies the
following conditions:
Associativity: (x + y) + z = x + (y + z) and (xy)z = x(yz).
Commutativity: x + y = y + x and xy = yx.
Distributivity: x(y + z) = xy + xz.
Unit elements (0,1): x + 0 = x and x1 = x.
Inverse: If x 6= 0 then there exists 1/x such that x(1/x) = 1.
You probably know many infinite fields: the real numbers R, the rationals Q and the
complex numbers C. For us, the most important fields will be finite fields, which have a
finite number of elements.
An example is the binary field F2 = {0, 1}, where addition corresponds to XOR and
multiplication to AND. This is an instance of a more general example, of prime finite fields.
Let p be a prime. The field Fp consists of the elements {0, 1, . . . , p 1}, where addition and
multiplication are defined modulo p. One can verify that it is indeed a field. The following
fact is important, but we will not prove it.
Fact 0.1. If a finite field F has q elements, then q must be a prime power. For any prime
power there is exactly one finite field with q elements, which is called the finite field of order
q, and denoted Fq .
0.2
Polynomials
n
X
f i xi .
i=0
Here, x is a variable which takes values in F, and fi F are constants, called the coefficients
of f . We can evaluate f at a point a F by plugging x = a, namely
f (a) =
n
X
f i ai .
i=0
Multi-variate polynomials are defined in the same way: if x1 , . . . , xd are variables that
X
f (x1 , . . . , xd ) =
fi1 ,...,id (x1 )i1 . . . (xd )id
i1 ,...,id
where i1 , . . . , id range over a finite subset of Nd . The total degree of f is the maximal
i1 + . . . + id for which fi1 ,...,id 6= 0.
0.3
Matrices
m
X
Ai,j Bj,k .
k=1
(1)
sign()
n
Y
Ai,(i) .
i=1
Sn
0.4
Probability
In this class, we will only consider discrete distributions and discrete random variables, which
take a finite number of possible
P values Let X be a random variable, such that Pr[X = xi ] = pi
where xi R, pi 0 and
pi = 1. Its expectation (average) is
X
E[X] =
p i xi
i
E[X]
.
a
iS
Claim 0.5 (Chebychev inequality). Let X R be a random variable where E[X 2 ] < .
Then
Var(X)
.
Pr [|X E[X]| a]
a2
Proof. Let Y = |X E[X]|2 . Then E[Y ] = Var(X) and, by applying Markov inequality to
Y (note that Y 0) we have
Pr [|X E[X]| a] = Pr[Y 2 a2 ]
Var(X)
E[Y 2 ]
=
.
2
a
a2
We will also need tail bounds for the sum of many independent random variables. This
is given by the Chrenoff bounds. We state two versions of these bounds: one for absolute
(additive) error and one for relative (multiplicative) error.
Theorem
P0.6 (Chernoff bounds). Let Z1 , . . . , Zn {0, 1} be independent random variables.
Let Z =
Zi and = E[Z]. Then for any > 0 it holds that
(i) Absolute error:
Pr [|Z E[Z]| n] 2 exp(22 n).
(ii) Relative error:
2
Pr [Z (1 + )] exp
2+
.
Matrix multiplication
n
X
ai,k bk,j .
k=1
The basic computational problem is how many operations (additions and multiplications)
are required to compute C. Implementing the formula above in a straightforward manner
requires O(n3 ) operations. The best possible is 2n2 , which is the number of inputs. The
matrix multiplication exponent, denoted , is the best constant such that we can multiply
two n n matrices in O(n ) operations (to be precise, is the infimum of these exponents).
As we just saw, 2 3. To be concrete, we will consider matrices over the reals, but this
can be defined over any field.
Open Problem 1.1. What is the matrix multiplication exponent?
The first nontrivial algorithm was by Strassen [Str69] in 1969, who showed that
log2 7 2.81. Subsequently, researchers were able to improve the exponent. The best
results to date are by Le Gall [LG14] who get 2.373. Here, we will only describe
Strassens result, as well as general facts about matrix multiplication. A nice survey on
matrix multiplication, describing most of the advances so far, can be found in the homepage
of Yuval Filmus: http://www.cs.toronto.edu/~yuvalf/.
1.1
Strassens algorithm
The starting point of Strassens algorithm is the following algorithm for multiplying 2 2
matrices.
1. p1 = (a1,1 + a2,2 )(b1,1 + b2,2 )
2. p2 = (a2,1 + a2,2 )b1,1
3. p3 = a1,1 (b1,2 b2,2 )
4. p4 = a2,2 (b2,1 b1,1 )
5. p5 = (a1,1 + a1,2 )b2,2
6. p6 = (a2,1 a1,1 )(b1,1 + b1,2 )
7. p7 = (a1,2 a2,2 )(b2,1 + b2,2 )
8. c1,1 = p1 + p4 p5 + p7
9. c1,2 = p3 + p5
7
10. c2,1 = p2 + p4
11. c2,2 = p1 p2 + p3 + p6
It can be checked that this program uses 7 multiplications and 18 additions/subtractions,
so 25 operations overall. The naive implementation of multiplying two 22 matrices requires
8 multiplications and 4 additions, so 12 operations overall. The main observation of Strassen
is that really only the number of multiplications matter.
In order to show it, we will convert Strassens algorithm to a normal form. In such a
program, we first compute some linear combinations of the entries of A and the entries of
B, individually. We then multiply them, and take linear combinations of the results to get
the entries of C. We define it formally below.
Definition 1.2 (Normal form). A normal-form program for computing matrix multiplication,
which uses M multiplications, has the following form:
(i) For 1 i M , compute linear combinations i of the entries of A.
(ii) For 1 i M , compute linear combinations i of the entries of B.
(iii) For 1 i M , Compute the product pi = i i .
(iv) For 1 i, j n, compute ci,j as a linear combination of p1 , . . . , pM .
Lemma 1.3. Any program for computing matrix multiplication, which uses M multiplications (and any number of additions), can be converted to a normal form with at most 2M
multiplications.
Note that in the normal form, the program computes 2M multiplications and O(M n2 )
additions.
Proof. First, we can convert any program for computing matrix multiplication to a straightline program. In such a program, every step computes either the sum or product of two
previously computed variables. In our case, let z1 , . . . , zN be the values computed by the
program. The first 2n2 are the inputs: z1 , . . . , zn2 are the entries of A (in some order), and
zn2 +1 , . . . , z2n2 are the entries of B. The last n2 variables are the entries of C which we wish
to compute. Every intermediate variable zi is either:
A linear combination of two previously computed variables: zi = zj + zk , where
j, k < i and , R.
A product of two previously computed variables: zi = zj zk , where j, k < i.
In particular, note that each zi is some polynomial of the inputs {ai,j , bi,j : 1 i, j n}.
Next, we show that as the result is bilinear in the inputs, we can get the computation to
respect that structure.We decompose zt (A, B) as follows:
zt (A, B) = at + bt (A) + ct (B) + dt (A, B) + et (A, B),
where
8
Next, we show how to use matrix multiplication programs in normal form to compute
products of large matrices. This will show that only the number of multiplications matter.
Theorem 1.4. If two m m matrices can be computed using M = m multiplications (and
any number of additions) in a normal form, then for any n 1, any two n n matrices can
be multiplied using only O((mn) log(mn)) operations.
So for example, Strassens algorithm is an algorithm in normal form which uses 7 multiplications to multiply two 2 2 matrices. So, any two n n matrices can be multiplied
using O(nlog2 7 (log n)O(1) ) O(n2.81+ ) operations for any > 0. So, log2 7 2.81.
Proof. Let T (n) denote the number of operations required to compute the product of two
n n matrices. We assume that n is a power of m, by possible increasing it to the smallest
power of m larger than it. This might increase n to at most nm. Now, the main idea is to
compute it recursively. We partition an n n matrix as an m m matrix, whose entries
are (n/m) (n/m) matrices. Let C = AB and let Ai,j , Bi,j , Ci,j be these sub-matrices of
A, B, C, respectively, where 1 i, j m. Then, observe that (as matrices) we have
Ci,j =
m
X
Ai,k Bk,j .
k=1
We can apply any algorithm for m m matrix multiplication in normal form to compute
{Ci,j }, as the algorithm never assumes that the inputs commute. So, to compute {Ci,j }, we:
(i) For 1 i M , compute linear combinations i of the Ai,j .
(ii) For 1 i M , compute linear combinations i of the Bi,j .
(iii) For 1 i M , Compute pi = i i .
(iv) For 1 i, j m, compute Ci,j as a linear combination of p1 , . . . , pM .
Note that i , i , pi are all (n/m) (n/m) matrices. How many operations do we do? steps
(i),(ii),(iv) each require M m2 additions of (n/m) (n/m) matrices, so in total require
O(M n2 ) additions. Step (iii) requires M multiplications of matrices of size (n/m) (n/m).
So, we get the recursion formula
T (n) = m T (n/m) + O(m n2 ).
This solves to O((mn) ) if > 2 and to O((mn)2 log n) if = 2. Lets see explicitly the first
case, the second being similar.
Let n = ms . This recursion solves to a tree of depth s, where each node has m children.
The number of nodes at depth i is mi , and the amount of computation that each makes is
O(m (n/mi )2 ). Hence, the total amount of computation at depth i is O(m m(2)i n2 ). As
long as > 2, this grows exponentially fast in the depth, and hence controlled by the last
level (at depth s) which takes O(m m(2)s m2s ) = O((mn) ).
10
1.2
Assume that someone gives you a magical algorithm that is supposed to multiply two matrices quickly. How would you verify it? one way is to compute matrix multiplication yourself,
and compare the results. This will take time O(n ). Can you do better? the answer is yes,
if we allow for randomization. In the following, our goal is to verify that AB = C where
A, B, C are n n matrices over an arbitrary field.
Function MatrixMultVerify
Input : n n m a t r i c e s A, B, C .
Output : I s i s t r u e t h a t AB = C ?
1 . Choose x {0, 1}n randomly .
2 . Return TRUE i f A(Bx) = Cx , and FALSE o t h e r w i s e .
Clearly, if AB = C then the algorithm always returns true. Moreover, as all the algorithm
does is iteratively multiply an n n matrix with a vector, it runs in time O(n2 ). The main
question is: can we find matrices A, B, C where AB 6= C, but where the algorithm returns
TRUE with high probability? The answer is no, and is provided by the following lemma,
applied to M = AB C.
Lemma 1.5. Let M be a nonzero n n matrix. Then Prx{0,1}n [M x = 0] 1/2.
In particular, if we repeat this t times, the error probability will reduce to 2t .
Proof. The matrix M has some nonzero row, lets say it is a1 , . . . , an . Then,
hX
i
Pr n [M x = 0] Pr
ai x i = 0 .
x{0,1}
P
P
Let i be minimal such that ai 6= 0. Then,
ai xi = 0 iff xi = j>i (aj /ai )xj . Hence, for
any fixing of {xj : j > i}, there is at most one value for xi which would make this hold.
hX
i
hX
i
Pr
ai xi = 0 = Exj ,...,xn {0,1}
Pr
ai x i = 0
xi {0,1}
"
"
##
X
= Exj ,...,xn {0,1}
Pr
xi =
(aj /ai )xj
xi {0,1}
1/2.
11
j>i
1.3
Let G = (V, E) be a graph. Our goal is to find whether G contains a triangle, and more
generally, enumerate the triangles in G. Trivially, this takes n3 time. We will show how to
improve it using fast matrix multiplication. Let |V | = n, A be the n n adjacency matrix
of G, Ai,j = 1(i,j)E . Observe that
X
(A2 )i,j =
Ai,k Ak,j = number of pathes of length two between i, j
k
So, to check G contains a triangle, we can first compute A2 , and then use it to detect if there
is a triangle.
Function TriangleExists(A)
Input : An n n a d j a c e n c y matrix A
Output : I s t h e r e a t r i a n g l e i n t he graph ?
1 . Compute A2
2 . Check i f t h e r e i s 1 i, j n with Ai,j = 1 and (A2 )i,j 1 .
The running time of step 1 is O(n ), and of step 2 is O(n2 ). Thus the total time is O(n ).
1.4
We next describe a variant of the TriangleExists algorithms, which lists all triangles in the
graph. Again, the goal is to improve upon the naive O(n3 ) algorithm which tests all possible
triangles.
The enumeration algorithm will be recursive. At each step, we partition the vertices to
two sets and recurse over the possible 8 configurations. To this end, we will need to check
if a triangle i, j, k exists in G with i I, j J, k K for some I, J, K V . The same
algorithm works.
Function TriangleExists(A; I,J,K)
Input : An n n a d j a c e n c y matrix A , I, J, K {1, . . . , n}
Output : I s t h e r e a t r i a n g l e (i, j, k) with i I, j J, k K .
1 . Let A1 , A2 , A3 be I J, J K, I K s u b m a t r i c e s o f A .
2 . Compute A1 A2 .
3 . Check i f t h e r e i s i I, k K with (A1 A2 )i,k = 1 and (A3 )i,k = 1 .
We next describe the triangle listing algorithm. For simplicity, we assume n is a power
of two.
12
I f n = 1 check i f t r i a n g l e e x i s t s . I f so , output i t .
I f Ch e ck Tr i an g le ( I , J ,K)==F a l s e r e t u r n .
P a r t i t i o n I = I1 I2 , J = J1 J2 , K = K1 K2 .
Run T r i a n g l e s L i s t ( Ia , Jb , Kc ) f o r a l l 1 a, b, c 2 .
Let i be the level at which 8i = 8m. The computation time up to level i is given by
X
X
X
Ti
8i O((n/2i ) ) =
O(n 2i(3) ) = O(n 2i (3) ) = O(Ti ).
ii
ii
ii
ii
Open Problem 1.7. How fast can we find one triangle in a graph? how about m triangles?
13
The Fast Fourier Transform is an amazing discovery with many applications. Here, we
motivate it by the problem of computing quickly the product of two polynomials.
2.1
Univariate polynomials
n
X
f i xi ,
i=0
where x is the variable and fi R are the coefficients. We may assume that fn 6= 0, in which
case we say that f has degree n.
Given two polynomials f, g of degree n, their sum is given by
(f + g)(x) = f (x) + g(x) =
n
X
(fi + gi )xi .
i=0
Note that given f, g as their list of coefficients, we can compute f + g in time O(n).
The product of two polynomials f, g of degree n each is given by
! n
!
min(i,n)
n
n X
n
2n
X
X
X
X
X
(f g)(x) = f (x)g(x) =
f i xi
gj xj =
fi gj xi+j =
fj gij xi .
i=0
j=0
i=0 j=0
i=0
j=0
P
fj gij for all
So, in order to compute the coefficients of f g, we need to compute min(i,n)
j=0
0 i 2n. This trivially takes time n2 . We will see how to do it in time O(n log n), using
Fast Fourier Transform (FFT).
2.2
Proof. Decompose Fn into four n/2 n/2 matrices as follows. First, reorder the rows to list
first all n/2 even indices, then all n/2 odd indices. Let Fn0 be the new matrix, with re-ordered
rows. Decompose
A B
0
Fn =
.
C D
What are A, B, C, D? If 1 a, b n/2 then
Aa,b = (Fn )2a,b = (n )2ab = (n/2 )ab = (Fn/2 )a,b .
Ba,b = (Fn )2a,b+n/2 = (n )2ab+an = (n )2ab = (Fn/2 )a,b .
Ca,b = (Fn )2a+1,b = (n )2ab+b = (Fn/2 )a,b (n )b .
Da,b = (Fn )2a+1,b+n/2 = (n )2ab+b+an+n/2 = (Fn/2 )a,b (n )b .
So, let P be the n/2 n/2 diagonal matrix with Pa,b = (n )b . Then
A = B = Fn/2 ,
C = D = Fn/2 P.
Open Problem 2.2. Can we compute Fn x faster than O(n log n)? maybe as fast as O(n)?
15
2.3
Inverse FFT
The inverse of the Fourier matrix is the complex conjugate of the Fourier matrix, up to
scaling.
Lemma 2.3. (Fn )1 = n1 Fn .
Proof. We have Fna,b = nab = nab . So
(Fn Fn )a,b =
n1
X
nac ncb
n1
X
(n )(ab)c .
c=0
c=0
If a =
then the sum equals n. We claim that when a 6= b the sum is zero. To see that, let
Pbn1
S = i=0 nic , where c 6= 0 mod n. Then
n
n1
X
X
ic
(n ) S =
(n ) =
(n )ic = S.
c
i=1
i=0
Corollary 2.4. For any x Cn , we can compute (Fn )1 x using O(n log n) additions and
multiplications.
Proof. We have
Fn1 x = n1 Fn x = n1 Fn x.
We can conjugate x to obtain x in time O(n); compute Fn x in time O(n log n); and conjugate
the output and divide by n in time O(n).
2.4
Pn1
i
Let f (x) =
We identify it with the list of coefficients
i=0 fi x be a polynomial.
n
(f0 , f1 , . . . , fn1 ) C . Its order n Fourier transform is defined as its evaluations on the
n-th roots of unity:
fbi = f (ni ).
Lemma 2.5. Let f (x) be a polynomial of degree n 1. Its Fourier transform can be
computed in time O(n log n).
16
P
ij
Proof. We have fbj = n1
i=0 fi n . So
f0
f1
..
.
fb = Fn
fn1
2.5
Multivariate polynomials
Let f, g be multivariate polynomials. For simplicity, lets consider bivariate polynomials. Let
f (x, y) =
n
X
i j
fi,j x y ,
g(x, y) =
i,j=0
n
X
i,j=0
17
gi,j xi y j .
Their product is
(f g)(x, y) =
n
X
i,j,i0 ,j 0 =0
2n
X
min(n,i) min(n,j)
i0 =0
j 0 =0
i,j=0
i,j=0
We can clearly compute F, G from f, g in linear time, and as deg(F ), deg(G) (N + 1)n,
we can compute F G in time O((N n) log(N n)). The only question is whether we can infer
f g from F G.
P
Lemma 2.7. Let N 2n + 1. If H(z) = F (z)G(z) = Hi z i then
(f g)(x, y) =
2n
X
HN i+j xi y j .
i,j=0
Proof. We have
H(z) = F (z)G(z) =
n
X
!
fi,j z N i+j
i=0
n
X
n
X
!
gi0 ,j 0 z
N i0 +j 0
i=0
0
i,j,i0 ,j 0 =0
18
Secret sharing is a method for distributing a secret amongst a group of participants, each
of whom is allocated a share of the secret. The secret can only be reconstructed when an
allowed groups of participants collaborate, and otherwise no information is learned about
the secret. Here, we consider the following special case. There are n players, each receiving
a share. The requirement is that every group of k players can together learn the secret, but
any group of less than k players can learn nothing about the secret. A method to accomplish
that is called an (n, k)-secret sharing scheme.
Example 3.1 ((3, 2)-secret sharing scheme). Assume a secret s {0, 1}. The shares
(S1 , S2 , S3 ) are joint random variables, defined as follows. Sample x F5 uniformly, and set
S1 = s + x, S2 = s + 2x, S3 = s + 3x. Then each of S1 , S2 , S3 is uniform in F5 , even conditioned on the secret, but any pair define two independent linear equations in two variables
x, s which can be solved.
We will show how to construct (n, k) secret sharing schemes for any n k 1. This will
follow [Sha79].
3.1
In order to construct (n, k)-secret sharing schemes, we will use polynomials. We will later
see that this is an instance of a more general phenomena. Let F be a finite field
P of sizei
|F| > n. We choose a random polynomial f (x) of degree k 1 as follows: f (x) = k1
i=0 fi x
where f0 = s and f1 , . . . , fk1 F are chosen uniformly. Let 1 , . . . , n F be distinct
nonzero elements. The share for player i is Si = f (i ). Note that the secret is s = f (0). For
example, the (3, 2)-secret sharing scheme corresponds to F = F5 , 1 = 1, 2 = 2, 3 = 3.
Theorem 3.2. This is an (n, k)-secret sharing scheme.
The proof will use the following definition and claim.
Definition 3.3 (Vandermonde matrices). Let 1 , . . . , k F be distinct elements in a field.
The Vandermonde matrix V = V (1 , . . . , k ) is defined as follows:
Vi,j = (i )j1 .
Lemma 3.4. If 1 , . . . , k F are distinct elements then det(V (1 , . . . , k )) 6= 0.
Q
Proof sketch. We will show that det(V (1 , . . . , k )) = i<j (j i ). In particular, it is
nonzero whenever 1 , . . . , k are distinct. To see that, let x1 , . . . , xk F be variables, and
define the polynomial
f (x1 , . . . , xk ) = det(V (x1 , . . . , xk )).
First, note that if we set xi = xj for some i 6= j, then f |xi =xj = 0. This is since the matrix
V (x1 , . . . , xk ) with xi = xj has two identical rows (the i-th and j-th rows), and hence its
19
determinant is zero. This then implies (and we omit the proof here) that f (x) is divisible
by xi xj for all i 6= j. So we can factor
Y
f (x1 , . . . , xk ) =
(xi xj )g(x1 , . . . , xk )
i>j
for some polynomial g(x1 , . . . , xk ). Next, we claim that g is a constant. This will follow by
comparing degrees. Recall that for an n n matrix V we have
det(V ) =
sign()
(1)
n
Y
V(i),i ,
i=1
Sn
where ranges over all permutations of {1, . . . , n}. In our case, Vi,j = xj(i) is a polynomial
of degree j, and hence
n1
X
n
deg(det(V (x1 , . . . , xn ))) =
.
j=
2
j=0
Observer that also
n
deg( (xi xj )) =
.
2
i>j
Y
So it must be that g is a constant. One can further verify that in fact g = 1, although we
would not need that.
If we substitute xi = i we obtain that, as 1 , . . . , n are distinct that
Y
det(V (1 , . . . , n )) =
(i j ) 6= 0.
i>j
Proof of Theorem 3.2. We need to show two things: (i) any k players can recover the secret,
and (ii) any k 1 learn nothing about it.
(i) Consider any k players, say i1 , . . . , ik . Each share Sij is a linear combination of the
k unknown variables f0 , . . . , fk1 . We will show that they are linearly independent,
and hence the players have enough information to solve for the k unknowns, and
in particular can recover f0 = s. Let V = V (i1 , . . . , ik ). By definition we have
Sij = (V f )j , where we view f = (f0 , . . . , fk1 ) as a vector in Fk . Since det(V ) 6= 0, the
players can solve the system of equations and obtain f0 = (V 1 (Si1 , . . . , Sik ))1 .
(ii) Consider any k 1 players, say i1 , . . . , ik1 . We will show that for any fixing of f0 = s,
the random variables Si1 , . . . , Sik1 are independent and uniform over F. To see that,
let V = V (0, i1 , . . . , ik1 ) and let f = (f0 , . . . , fk1 ) Fk be chosen uniformly.
Then, (f0 , Si1 , . . . , Sik1 ) = V f is also uniform in Fk . In particular, f0 is independent
of (Si1 , . . . , Sik1 ), and hence the distribution of (Si1 , . . . , Sik1 ), which happens to be
uniform in Fk1 , is independent of the choice of f0 . Thus, the k1 learn no information
about the secret.
20
3.2
In our construction, the shares were elements of F, and hence their size grew with the number
of players. One may ask whether this is necessary, or whether there are better constructions
which achieve smaller shares. Here, we will only analyze the case of linear constructions
(such as above), although similar bounds can be obtained by general secret sharing schemes.
Lemma 3.5. Let M be a n k matrix over a field F, with n k + 2, such that any k rows
in M are linearly independent. Then |F| max(k, n k). In particular, |F| n/2.
We note that the condition n k + 2 is tight: the (k + 1) k matrix whose first k rows
form the identity matrix, and its last row is all ones, has this property over any field, and
in particular F2 . We also note that there is a conjecture (called the MDS conjecture) which
speculates that in fact |F| n 1, and our construction above is tight.
Proof. We can apply any invertible linear transformation to the columns of M , without
changing the property that any k rows are linearly independent. So, we may assume that
I
,
M=
R
where I is the k k identity matrix, and R is a k (n k) matrix.
Next, we argue that R cannot contain any 0. Otherwise, if for example Ri,j = 0, then
the following k rows of M are linearly dependent: the i-th row of R, and the k 1 rows of
the identity matrix which exclude row j. So, Ri,j 6= 0 for all i, j.
Hence, we may scale the rows of R so that Ri,1 = 1 for all 1 i n k. Moreover, we
can then scale the columns of R so that R1,i = 1 for all 1 i k. There are now two cases
to consider:
21
(i) If |F| < k then R2,i = R2,j for some 1 i < j 6= k. But then the following k rows are
linearly dependent: the first and second rows of R, and the k 2 rows of the identity
matrix which exclude rows i, j. So, |F| k.
(ii) If |F| < n k then Ri,2 = Rj,2 for some 1 i < j n k. But then the following k
rows are linearly dependent: the i-th and j-th rows of R, and the k 2 rows of the
identity matrix which exclude rows 1, 2. So, |F| n k.
22
4.1
Basic definitions
An error correcting code allows to encode messages into (longer) codewords, such that even
in the presence of a errors, we can decode the original message. Here, we focus on worst
case errors, where we make no assumptions on the distribution of errors, but instead limit
the number of errors.
Definition 4.1 (Error correcting code). Let be a finite set, n k 1. An error correcting
code over the alphabet of message length k and codeword length n (also called block length)
consists of
A set of codewords C n of size |C| = ||k .
A one-to-one encoding map E : k C.
A decoding map D : n k .
We require that D(E(m)) = m for all messages m k .
To describe the error correcting capability of a code, define the distance of x, y n as
the number of coordinates where they differ,
dist(x, y) = |{i [n] : xi 6= yi }|.
Definition 4.2 (Error correction capability of a code). A code (C, E, D) can correct up to e
errors if for any message m k and any x n such that dist(E(m), x) e, it holds that
then D(x) = m.
Example 4.3 (Repetition code). Let = {0, 1}, k = 1, n = 3. Define C = {000, 111}, E :
{0, 1} {0, 1}3 by E(0) = 000, E(1) = 111. Define D : {0, 1}3 {0, 1} by D(x1 , x2 , x3 ) =
Majority(x1 , x2 , x3 ). Then (C, E, D) can correct up to 1 errors.
If we just care about combinatorial bounds (that is, ignore algorithmic aspects), then a
code is defined by its codewords. We can define E : k C by any one-to-one way, and
D : n k by mapping x n to the closest codeword E(m), breaking ties arbitrarily.
From now on, we simply describe codes by describing the set of codewords C. Once we start
discussing algorithms for encoding and decoding, we will revisit this assumption.
Definition 4.4 (Minimal distance of a code). The minimal distance of C is the minimal
distance of any two distinct codewords,
distmin (C) = min dist(x, y).
x6=yC
Definition 4.5 ((n, k, d)-code). An (n, k, d)-code over an alphabet is a set of codewords
C n of size |C| = ||k and minimal distance d.
23
Lemma 4.6. Let C be an (n, k, 2e + 1)-code. Then it can decode from e errors.
Proof. Let x C, y n be such that dist(x, y) e. We claim that x is the unique closest
codeword to y. Assume not, that is there is another x0 C with dist(x0 , y) e. Then
by the triangle inequality, dist(x, x0 ) dist(x, y) + dist(y, x0 ) 2e, which contradicts the
assumption that the minimal distance of C is 2e + 1.
Moreover, if C has minimal distance d, then there exist x, x0 C and y n such that
dist(x, y)+dist(x0 , y) = d, so the bound is tight. So, we can restrict our study to the existence
of (n, k, d)-codes.
4.2
Basic bounds
4.3
.
Pr[dist(xi , xj ) n/10]
22n
2n
We need some estimates for the binomial coefficient. A useful one is
n m n en m
.
m
m
m
So,
n/10
n
en
((10e)1/10 )n (1.4)n .
n/10
n/10
So,
n(1.4)n
= n(0.7)n .
2n
Now, the probability that there exists some pair 1 i < j N such that dist(xi , xj ) n/10
can be upper bounded by the union bound,
X
Pr[1 i < j N, dist(xi , xj ) n/10]
Pr[dist(xi , xj ) n/10]
Pr[dist(xi , xj ) n/10]
1i<jN
25
from it; pick x3 from the remaining points, and so on. Continue in such a way until all points
are exhausted. The number of points chosen N satisfies that
2n
N Pn/10
i=0
.
n
i
This
we have initially a total of 2n points, and at each point we delete at most
Pn/10is nbecause
undelete points. The same calculations as before show that N 2n/10 .
i=0
i
4.4
Linear codes
A special family of codes are linear codes. Let F be a finite field. In a linear code, = F and
C Fn is a k-dimensional subspace. The encoding map is a linear map: E(x) = Ax where
A is a n k matrix over F. Note that rank(A) = k, as otherwise the set of codewords will
have dimension less than k. In practice, nearly all codes are linear, as the encoding map is
easy to define. However, the decoding map needs inherently to be nonlinear, and is usually
the hardest to compute.
Claim 4.11. Let C be a linear code. Then distmin (C) = min06=xC dist(0, x).
Proof. If x1 , x2 C have the minimal distance, then dist(x1 , x2 ) = dist(0, x1 x2 ) and
x1 x2 C.
We can view the decoding problem from either erasures or errors as a linear algebra
problem. Let A be an n k matrix. Codewords are C = {Ax : x Fk }, or equivalently the
subspace spanned by the columns of A.
Decoding from erasures. The problem of decoding from erasures is equivalent to the
following problem: given y (F {?})n , find x Fk such that (Ax)i = yi for all yi 6=?.
Equivalently, we want the sub-matrix formed by keeping only the rows {i [n] : yi 6=?} to
form a rank k matrix. So, the requirement that a linear code can be uniquely decoded from e
erasures, is equivalent to the requirement that if any e rows are deleted in the matrix, it still
has rank k. Clearly, e n k. We will see a code achieving this bound, the Reed-Solomon
code. It will be based on polynomials.
Decoding from errors. The problem of decoding from e errors is equivalent to the following problem: given y Fn , find x Fk such that (Ax)i 6= yi for at most e coordinates.
Equivalently, we want to find a vector spanned by the columns of A, which agrees with y
in at least n e coordinates. If the code has minimal distance d, then we know that this is
mathematically possible whenever e < d/2; however, finding this vector is in general computationally hard. We will see a code where this is possible, and which moreover has the best
minimal distance, d = n k + 1. Again, it will be Reed-Solomon code.
26
Reed-Solomon codes
Reed-Solomon codes are an important group of error-correcting codes that were introduced
by Irving Reed and Gustave Solomon in the 1960s. They have many important applications,
the most prominent of which include consumer technologies such as CDs, DVDs, Blu-ray
Discs, QR Codes, satellite communication and so on.
5.1
Definition
Reed-Solomon codes are defined as the evaluation of low degree polynomials over a finite
field. Let F be a finite field. Messages are in Fk are treated as the coefficients of a univariate
polynomial of degree k 1, and codewords are its evaluations on n < |F| points. So, ReedSolomon codes are defined by specifying F, k and n < |F| points 1 , . . . , n F, and its
codewords are
C = {(f (1 ), f (2 ), . . . , f (n )) : f (x) =
k1
X
fi xi , f0 , . . . , fk1 F}.
i=0
We define this family of codes in general as RSF (n, k), and if needs, we can specify the
evaluation points. An important special case is when n = |F|, and we evaluate the polynomial
on all field elements.
Lemma 5.1. The minimal distance of RSF (n, k) is d = n k + 1.
Proof. As C is a linear code, it suffices to show that for any nonzero polynomial f (x) of
degree k 1, |{x F : f (x) = 0}| k 1. Hence, |{i [n] : f (i ) = 0}| n k + 1.
Now, this follows from the fundamental theorem of algebra: a nonzero polynomial of degree
r has at most r roots. We prove it below by induction on r.
If r = 0 then f is a nonzero constant, and so it has no roots. So, assume r 1. Let
F be such that f () = 0. Lets shift the input so that the root is at zero. That is, define
g(x) = f (x + ), so that g(0) = 0 and g(x) is also a polynomial of degree r. Express it as
g(x) =
r
X
fi (x + ) =
i=0
r
X
gi xi .
i=0
P
i
Since g0 = g(0) = 0, we get that g(x) = xh(x) where h(x) = r1
i=0 gi+1 x is a polynomial of
degree r 1, and hence f (x) = g(x ) = (x )h(x ). By induction, h(x ) has at
most r 1 roots, and hence f has at most r roots.
Recall that the Singleton bound shows that in any (n, k, d) code, d n k + 1. Codes
which achieve this bound, i.e for which d = n k + 1, are called MDS codes (Maximal
Distance Separable). What we just showed is that Reed-Solomon codes are MDS codes.
In fact, for prime fields, it is known that Reed-Solomon are the only MDS codes [Bal11],
and it is conjecture to be true over non-prime fields as well (except for a few exceptions in
characteristic two).
27
5.2
We first analyze the ability of Reed-Solomon codes to recover from erasures. Assume that
we are given a Reed-Solomon codeword, with some coordinates erased. Let S denote the set
of remaining coordinates. That is, for S [n] we know that f (i ) = yi for all i S, where
yi F. The question is: for which sets S is this information sufficient to uniquely recover
the polynomial f ?
Equivalently, we need to solve the following system of linear equations, where the unknowns are the coefficients f0 , . . . , fk1 of the polynomial f :
k1
X
fj ij = yi ,
i S.
j=0
In order to analyze this, let V = V ({i : i S}) be a |S| k Vandermonde matrix given by
Vi,j = ij for i S, 0 j k 1. Then, we want to solve the system of linear equations
V f = y,
where f = (f0 , . . . , fk1 ) Fk and y = (yi : i S) F|S| . Clearly, we need |S| k for
a unique solution to exist. As we saw, whenever |S| = k the matrix V is invertible, hence
there is a unique solution. So, as long as |S| k, we can restrict to k equations and uniquely
solve for the coefficients of f .
Corollary 5.2. The code RSF (n, k) can be uniquely decoded from n k erasures.
5.3
P
i
Next, we study the harder problem of decoding from errors. Again, let f (x) = k1
i=0 fi x be
an unknown polynomial of degree k 1. We know its evaluation on 1 , . . . , n F, but with
a few errors, say e. That is, we are given y1 , . . . , yn F, such that yi 6= f (i ) for at most e
of the evaluations. If we knew the locations of the errors, we would be back at the decoding
from erasures scenario; however, we do not know them, and enumerating them is too costly.
Instead, we will design an algebraic algorithm, called the Berlekamp-Welch algorithm, which
can detect the locations of the errors efficiently, as long as the number of errors is not too
large (interestingly enough, the algorithm was never published as an academic paper, and
instead is a patent).
Define a error locating polynomial E(x) as follows:
Y
E(x) =
(x i ).
i:yi 6=f (i )
The decoder doesnt know E(x). However, we will still use it in the analysis. It satisfies the
following equation:
E(i ) (f (i ) yi ) = 0
1 i n.
Let N (x) = E(x)f (x). Note that deg(E) = e and deg(N ) = deg(E) + deg(f ) = e + k 1.
We established the following claim.
28
Claim 5.3. There exists polynomials E(x), N (x) of degrees deg(E) = e, deg(N ) = e + k 1
such that
N (i ) yi E(i ) = 0
1 i n.
Proof. We have N (i ) = E(i )f (i ). This is equal to E(i )yi as either yi = f (i ) or
otherwise E(i ) = 0.
The main idea is that we can find such polynomials by solving a system of linear equations.
1 i n.
Proof. Let
E(x)
=
e
X
(x) =
N
aj x j ,
j=0
e+k1
X
bj x j ,
j=0
where aj , bj are unknown coefficients. They need to satisfy the following system of n linear
equations:
e
e+k1
X
X
aj ij yi
bj ij = 0
1 i n.
j=0
j=0
We know that this system has a nonzero solution (since we know that E, N exist by our
assumptions). So, we can find a nonzero solution by linear algebra.
= N , so we are not done yet. However,
Note that it is not guaranteed that E = E, N
N
that we find.
the next claim shows that we can still recover f from any E,
(x) = E(x)f
Proof. Given N
(x), either by polynomial
division, or by solving a system of linear equations.
Note that this is the best we can do, as the minimal distance is n k + 1.
29
6.1
U 0 U.
Proof. The condition is clearly necessary: if G has a perfect matching {(ui , v(i) ) : i [n]}
for some permutation then if U 0 = {ui1 , . . . , uik } then v(i1 ) , . . . , v(ik ) (U 0 ), and hence
|(U 0 )| k = |U 0 |.
The more challenging direction is to show that the condition given is sufficient. To show
that, assume towards a contradiction that G has no perfect matching. Assume without loss
of generality (after possibly renaming the vertices) that M = {(u1 , v1 ), . . . , (um , vm )} is the
largest partial matching in G. Let u U \ {u1 , . . . , um }. We will build a partial matching
for {u1 , . . . , um , u}, which would violate the maximality of M .
If (u, v) E for some v
/ {v1 , . . . , vm }, then clearly M is not maximal, as we can add
the edge (u, v) to it. More generally, we say that a path P in G is an augmenting path for
M if it is of the form
P = u, vi1 , ui1 , vi2 , ui2 , . . . , vik , uik , v
with v
/ {v1 , . . . , vm }. Note that all the even edges in P are in the matching M (namely,
(vi1 , ui1 ), . . . , (vik , uik ) M ), and all the odd edges are not in the matching M (namely,
(u, vi1 ), (ui1 , vi2 ), . . . , (uik , v)
/ M ). Such a path would also allow us to increase the matching
size by one, by taking
M 0 = {(u, vi1 ), (ui1 , vi2 ), . . . , (uik , v)} {(uj , vj ) : j
/ {i1 , . . . , ik }}.
So, by our assumption that M is a partial matching of maximal size, there are no augmenting
paths in G which start at u.
30
We say that a path P in G is an alternating path if each other edge in it belongs to the
matching M . Let P be an alternating path of maximum length in G starting at u. It has
length at least 1, as u has at least one neighbor. So, it has the form
P = u, vi1 , ui1 , vi2 , ui2 , . . .
This path cannot end at a vertex vi V , as it can always be extended to ui U . So, it ends
at some vertex uik {u1 , . . . , um },
P = u, vi1 , ui1 , vi2 , ui2 , . . . , vik , uik .
Let U 0 = {u, ui1 , . . . , uik } and V 0 = {vi1 , . . . , vik }. We claim that V 0 = (U 0 ), which would
falsify our assumption, since |U 0 | = k + 1, |V 0 | = k. To see that, assume that v (U 0 ). If
v {v1 , . . . , vm }, say v = vi , then by construction ui U 0 and hence vi (U 0 ). Otherwise, if
v
/ {v1 , . . . , vm }, then it can be used to construct an augmenting path. Indeed, if (ui` , v) E
for some 1 ` k then the following is an augmenting path:
P = u, vi1 , ui1 , vi2 , ui2 , . . . , vi` , ui` , v.
In either case, we reached a contradiction.
So, we have a mathematical criteria to check if a bi-partite graph has a perfect matching.
Moreover, it can be verified that the proof of Halls marriage theorem is in fact algorithmic:
it can be used to find larger and larger partial matchings in a graph. This is much better
than verifying the conditions of the theorem, which naively would take time 2n to enumerate
all subsets of U . We will now see a totally different way to check if a graph has a perfect
matching, using polynomial identity testing.
6.2
Polynomial representation
ei d
Definition 6.2 (Algebraic circuit). Let F be a field, and x1 , . . . , xn be variables taking values
in F. An algebraic circuit computes a polynomial in x1 , . . . , xn . It is defined by a directed
acyclic graph (DAG), with multiple leaves (nodes with no incoming edges) and a single root
(node with no outgoing edges). Each leaf is labeled by either a constant c F or a variable
xi , which is the polynomial it computes. Internal nodes are labeled in one of two way: they
are either sum gates, which compute the sum of their inputs, or they are product gates,
which compute the product of their inputs. The polynomial computes by the circuit is the
polynomial computed by the root.
n
So for example, we can compute (x + 1)2 using an algebraic circuit of size O(n):
It has two leaves: v1 , v2 which compute the polynomials fv1 (x) = 1, fv2 (x) = x.
The node v3 is a sum gate with two children, v1 , v2 . It computes the polynomial
fv3 (x) = x + 1.
For i = 1, . . . , n, let vi+3 be a multiplication gate with two children, both being vi+2 .
i
It computes the polynomial fvi+3 (x) = fvi+2 (x)2 = (x + 1)2 .
n
(1)
sign()
n
Y
xi,(i) .
i=1
Sn
The sum is over of all the permutations on n elements. In particular, the determinant is a
polynomial in n2 variables of degree n. As a sum of monomials, it has n! monomials, which
is very inefficient. However, we know that the determinant can be computed efficiently by
Gaussian elimination. Although we would not show this here, it turns out that it can be
transformed to an algebraic circuit of polynomial size which computes the determinant.
Another interesting polynomial is the permanent. It is defined very similarly to the
determinant, except that we do not have the signs of the permutations:
per(M )((xi,j : 1 i, j n)) =
n
XY
xi,(i) .
Sn i=1
A direct calculation of the permanent, by summing over all monomials, requires size n!. There
are more efficient ways (such as Ryser formula [Rys63]) which gives an arithmetic circuit of
size O(2n n2 ), but these are still exponential. It is suspected that no sub-exponential algebraic
circuits can compute the permanent, but we do not know how to prove this. The importance
of this problem is that the permanent is complete, in the sense that many counting problems
can be reduced to computing the permanent of some specific matrices.
32
Open Problem 6.4. What is the size of the smallest algebraic circuit which computes the
permanent?
6.3
A basic question in mathematics is whether two objects are the same. Here, we will consider
the following problem: given two polynomials f (x), g(x), possibly via an algebraic circuit,
is it the case that f (x) = g(x)? Equivalently, since we can create a circuit computing
f (x) g(x), it is sufficient to check if a given polynomial is zero. If this polynomial is given
via its list of coefficients, we can simply check that all of them are zero. But, this can be
a very expensive procedure, as the number of coefficients can be exponentially large. For
example, verifying the formula for the determanent of a Vandermonde matrix directly would
take exponential time if done in this way, although we saw a direct proof of this formula.
We will see that using randomness, it can be verified if a polynomial is zero or not.
This will be via the following lemma, called that Schwartz-Zippel lemma [Zip79, Sch80],
which generalizes the fact that univariate polynomials of degree d have at most d roots, to
multivariate polynomials.
Lemma 6.5. Let f (x1 , . . . , xn ) be a nonzero polynomial of degree d. Let S F of size
|S| > d. Then
d
.
Pr [f (a1 , . . . , an ) = 0]
a1 ,...,an S
|S|
Q
Note that the lemma is tight, even in the univariate case: if f (x) = di=1 (x i) and
S = {1, . . . , s} then PraS [f (a) = 0] = ds .
Proof. The proof is by induction on n. If n = 1, then f (x1 ) is a univariate polynomial of
d
degree d, hence it has at most d roots, and hence Pra1 S [f (a1 ) = 0] |S|
.
If n > 1, we express the polynomial as
f (x1 , . . . , xn ) =
d
X
i=0
a1 ,...,an S
[f (a1 , . . . , an ) = 0]
33
We bound each term individually. We can bound the probability that fe (a1 , . . . , an1 ) = 0
by induction:
de
deg(fe )
.
Pr
[fe (a1 , . . . , an1 ) = 0]
a1 ,...,an1 S
|S|
|S|
Next, fix a1 , . . . , an1 S such that fe (a1 , . . . , an1 ) 6= 0. The polynomial f (a1 , . . . , an1 , x)
is a univariate polynomial in x of degree e, hence it has at most e roots. Thus, for any such
fixing of a1 , . . . , an1 we have
Pr [f (a1 , . . . , an1 , an ) = 0]
an S
e
.
|S|
a1 ,...,an S
e
.
|S|
We conclude that
Pr
a1 ,...,an S
[f (a1 , . . . , an ) = 0]
e
d
de
+
=
.
|S|
|S|
|S|
Corollary 6.6. Let f, g be two different multivariate polynomials of degree d. Fix > 0 and
let s d/. Then
Pr
a1 ,...,an {1,...,s}
[f (a1 , . . . , an ) = g(a1 , . . . , an )] .
Note that if we have an efficient algebraic circuit which computes f, g, then we can run
this test efficiently using a randomized algorithm, which evaluates the two circuits on a
randomly chosen joint input.
6.4
We will see an efficient way to find if a bipartite graph has a perfect matching, using polynomial identity testing.
Define the following n n matrix: Mi,j = xi,j if (ui , vj ) E, and Mi,j = 0 otherwise.
The determinant of M is
det(M ) =
sign()
(1)
n
Y
Mi,(i) .
i=1
Sn
Lemma 6.7. G has a perfect matching iff det(M ) is not the zero polynomial.
Q
Q
Proof. Each term ni=1 Mi,(i) is the monomial ni=1 xi,(i) if corresponds to a perfect
matching; and it zero otherwise. Moreover, each monomial appears only once, and hence
monomials cannot cancel each other.
34
35
Satisfiability
Definition 7.1 (CNF formulas). A CNF formula over boolean variables is a conjunction
(AND) of clauses, where each clause is a disjunction (OR) of literals (variables or their
negation). A k-CNF is a CNF formula where each clause contains exactly k literals.
For example, the following is a 3-CNF with 6 variables:
(x1 , . . . , x6 ) = (x1 x2 x3 ) (x1 x3 x5 ) (x1 x2 x4 ) (x1 x2 x6 ).
The k-SAT problem is the computational problem of deciding whether a given k-CNF
has a satisfying assignment. Many constraint satisfaction problems can be cast as a k-SAT
problem, for example: verifying that a chip works correctly, scheduling flights in an airline,
routing packets in a network, etc. As we will shortly see, 2-SAT can be solved in polynomial
time (in fact, in linear time); however, for k 3, the k-SAT problem is NP-hard, and the
only known algorithms solving it run in exponential time. However, even there we would
see that we can improve upon full enumeration (which takes 2n time). We will present an
algorithm that solves 3-SAT in time 20.41n . The same algorithm solves k-SAT for any
k 3 in time 2ck n where ck < 1.
Both the polynomial algorithm for 2-SAT and the exponential algorithm for 3-SAT,
k 3, will be based on a similar idea: analyzing a random walk on the space of possible
solutions.
7.1
2-SAT
Theorem 7.2. If is a satisfiable 2-CNF, then with probability at least 1/4 over the internal
randomness of the algorithm, it outputs a solution within 4n2 steps.
We note that if we wish a higher success probability (say, 99%) then we can simply repeat
the algorithm a few times, where in each phase, if the algorithm does not find a solution
within the first 4n2 steps, we restart the algorithm. The probability that the algorithm still
doesnt find a solution after t restarts is at most (3/4)t . So, to get success probability of 99%
we need to run the algorithm 9 times (since (3/4)9 1%).
We next proceed to the proof. For the proof, fix some solution x for (if there is more
than one, choose one arbitrarily). Let xt denote the value of x in the t-th iteration of the
loop. Note that it is a random variable, which depends on our choice of which clause to
choose and which variables to flip in the previous steps. Define dt = dist(xt , x ) to be the
hamming distance between xt and x (that is, the number of bits where they differ). Clearly,
at any stage 0 dt n, and if dt = 0 then we reached a solution, and we output xt = x at
iteration t.
Consider xt , the assignment at iteration t, and assume that dt > 0. Let C = `a `b
be a violated clause, where `a {xa , xa } and `b {xb , xb }. This means that either
(x )a 6= (xt )a or (x )b 6= (xt )b (or both), since C(x ) = 1 but C(xt ) = 0. If we choose
` {a, b} such that (xt )` 6= (x )` , then the distance between xt+1 and x decreases by one;
otherwise, it increases by one. So we have:
dt+1 = dt + t ,
where t {1, 1} is a random variable that satisfies Pr[t = 1|xt ] 1/2.
Another way to put it, the sequence of distances d0 , d1 , d2 , . . . is a random walk on
{0, 1, . . . , n}. It starts at some arbitrary location d0 . In each step, we move to the left
(getting closer to 0) with probability 1/2, and otherwise move to the right (getting further
away from n). We will show that after O(n2 ) steps, with high probability, this has to terminate: either some satisfying assignment has been found, or otherwise we hit 0 and output
x . We do so by showing that a random walk tends to drift far from its origin.
For simplicity, we first analyze the slightly simpler case where the random walk is symmetric, that is Pr[t = 1|xt ] = Pr[t = 1|xt ] = 1/2. We will then show how to extend the
analysis to our case, where the probability for 1 could be larger (intuitively, this should
only help us get to 0 faster; however proving this formally is a bit technical).
Lemma 7.3. Let y0 , y1 , . . . be a random walk, defined as follows: y0 = 0 and yt+1 = yt + t ,
where t {1, 1} and Pr[t = 1|yt ] = 1/2 for all t 0. Then, for any t 0,
E[yt2 ] = t.
Proof. We prove this by induction on t. It is clear for t = 0. We have
2
E[yt+1
] = E[(yt + t )2 ] = E[yt2 ] + 2E[yt t ] + E[2t ].
37
We now prove the more general lemma, where we allow a consistent drift.
Lemma 7.4. Let y0 , y1 , . . . be a random walk, defined as follows: y0 = 0 and yt+1 = yt + t ,
where t {1, 1} and Pr[t = 1|yt ] 1/2 for all t 0. Then, for any t 0,
E[yt2 ] t/2.
Note that the same result holds by symmetry if we assume instead that Pr[t = 1|yt ]
1/2 for all t 0.
Proof. The proof is by a coupling argument. Define a new random walk y00 , y10 , . . ., where
0
= yt0 + 0t . In general, we would have that yt0 , 0t depend on y1 , . . . , yt . So, fix
y00 = 0, yt+1
y1 , . . . , yt , and assume that Pr[t = 1|yt ] = for some 1/2. Define 0t as:
0t (yt , t )
1 if t = 1
if t = 1, with probability 1/2
= 1
1 if t = 1, with probability 1 1/2
We return now to the proof of Theorem 7.2. The proof will use Lemma 7.4, but the
analysis is more subtle.
38
Proof of Theorem 7.2. Recall that x0 , x1 , . . . is the sequence of guesses for a solution which
the algorithm explores. Let T N denote the random variable of the step at which the
algorithm outputs a solution. The challenge in the analysis is that not only the sequence is
random, for also T is a random variable. For simplicity of notation later on, set xt = xT for
all t > T .
Let y0 , y1 , . . . be defined as yt = dt d0 , where recall that dt = dist(xt , x ). In order to
analyze this random walk, we define a new random walk z0 , z1 , . . . as follows: set zt = yt for
t T , and for t T set zt+1 = zt + t , where t {1, +1} is uniformly and independently
chosen. We will argue that the sequence zt satisfies the conditions of Lemma 7.4. Namely,
if we define t = zt+1 zt then Pr[t = 1|zt ] 1/2.
To show this, let us condition on x0 , . . . , xt . If none of them are solutions to then
t = yt+1 yt , and conditioned on x0 , . . . , xt not being solutions, we already showed that
the probability that t = 1 is 1/2. If, on the other hand, x0 , . . . , xt contain a solution,
then t = t and the probability that t = 1 is exactly 1/2. In either case we have
Pr[t = 1|x0 , . . . , xt ] 1/2.
This then implies that
Pr[t = 1|zt ] 1/2.
Thus, we may apply Lemma 7.4 to the sequence z0 , z1 , . . . and obtain that for any t 1
we have
E[zt2 ] t/2.
Next, for t 1 let Tt = min(T, t). We may write zt as
zt = yTt +
t1
X
i ,
i=Tt
!2
t1
X
E[zt2 |T ] = E yTt +
i T = E[yT2t |T ] + t Tt .
i=Tt
7.2
3-SAT
Let be a 3-CNF. Finding if has a satisfying assignment is NP-hard, and the best known
algorithms take exponential time. However, we can still improve upon the naive 2n full
enumeration, as the following algorithm shows. The algorithm we analyze is due to Schoening [Sch99].
Let m 1 be a parameter to be determined later.
Function Solve-3SAT
Input : 3CNF
Output : x {0, 1}n such t h a t (x) = 1
1 . Choose x {0, 1}n randomly .
2 . For i = 1, . . . , m :
2 . 1 I f (x) = 1 , output x .
2 . 2 Otherwise , l e t Ci be some c l a u s e such t h a t Ci (x) = 0 ,
with v a r i a b l e s xa , xb , xc .
2 . 2 Choose ` {a, b, c} u n i f o r m l y :
Pr[` = a] = Pr[` = b] = Pr[` = c] = 1/3 .
2 . 3 F l i p x` .
3 . Output FAIL .
Our goal is to analyze the following question: what is the success probability of the
algorithm? as before, assume is satisfiable, and choose some satisfying assignment x .
Define xi to be the value of x at the i-th iteration of the algorithm, and let di = dist(xi , x ).
Claim 7.5. The following holds
(i) Pr[d0 = k] = 2n nk for all 0 k n.
(ii) dt+1 = dt + t where t {1, 1} satisfies Pr[t = 1|dt ] 1/3.
Proof. For (i), note that d0 is the distance of a random string from x , so equivalently, it is
the hamming weight of a uniform element of {0, 1}n . The number of elements of hamming
weight k is nk . For (ii), if xa , xb , xc are the variables appearing in an unsatisfied clause at
iteration t, then at least one of them disagrees with the value of x . If we happen to choose
it, the distance will decrease by one, otherwise, it will increase by one.
40
For simplicity, lets assume from now on that Pr[t = 1|dt ] = 1/3, where the more
general case can be handled similar to the way we handled it for 2-SAT.
Claim 7.6. Assume that d0 = k. The probability that the algorithm finds a satisfying solution
is at least
(m+k)/2 (mk)/2
2
m
1
.
3
3
(m + k)/2
Proof. Consider the sequence of steps 0 , . . . , m1 . If there are k more 1 than +1 in this
sequence, then starting at d0 = k, we will reach dm = 0. The number of such sign sequences
m
is (m+k)/2
, the probability for seeing a 1 is a 1/3, and the probability for seeing a +1 is
2/3.
Claim 7.7. For any 0 k n, the probability that the algorithms finds a solution is at least
(m+k)/2 (mk)/2
m
1
2
n n
2
.
k
(m + k)/2
3
3
Proof. We require that d0 = k (which occurs with probability 2n nk ) and, conditioned on
that occurring, apply Claim 7.6.
We now need to optimize parameters. Fix k = n, m = n for some constants , > 0.
We will use the following approximation: for n 1 and 0 < < 1,
n
(1)n
n
1
1
2H()n ,
n
1
1
where H() = log2 ( 1 ) + (1 ) log2 ( 1
) is the entropy function. Then
n
2H()n ,
k
1
m
+
n
H
2
2
2
(m + k)/2
(m+k)/2 (mk)/2
1
2
= 3n 2()/2n .
3
3
= 1 H() H 21 + 2
+ log2 3
.
2
Our goal is to choose 0 < < 1 and to minimize . The minimum is obtained for
= 1/3, = 1, which gives 0.41.
So, we have an algorithm that runs in time O(m) = O(n), and finds a satisfiable assignment with probability 2n . To find a satisfiable assignment with high probability, we
simply repeat it N = 5 2n times. The probability it fails in all these executions is at most
(1 2n )N exp(2n N ) exp(5) 1%.
Corollary 7.8. We can solve 3-SAT in time O(2n ) for 0.41.
41
Hash functions are used to map elements from a large domain to a small one. They are commonly used in data structures, cryptography, streaming algorithms, coding theory, and more
- anywhere where we want to store efficiently a small subset of a large universe. Typically,
for many of the applications, we would not have a single hash function, but instead a family
of hash functions, where we would randomly choose one of the functions in this family as
our hash function.
Let H = {h : U R} be a family of functions, mapping elements from a (typically
large) universe U to a (typically small) range R. For many applications, we would like two,
seemingly contradicting, properties from the family of functions:
Functions h H should look random
Functions h H are succinctly described, and hence processed and stored efficiently.
The way to resolve this is to be more specific about what do we mean by looking
random. The following definition is such a concrete realization, which although is quite
weak, it is already very useful.
Definition 8.1 (Pairwise independent hash functions). A family $H = \{h : U \to R\}$ is said to be pairwise independent if for any two distinct elements $x_1 \ne x_2 \in U$, and any two (possibly equal) values $y_1, y_2 \in R$,
$$\Pr_{h \in H}[h(x_1) = y_1 \text{ and } h(x_2) = y_2] = \frac{1}{|R|^2}.$$
We investigate the power of pairwise independent hash functions in this chapter, and
describe a few applications. For many more applications we recommend the book [LLW06].
8.1
To simplify notation, let us consider the case of $R = \{0,1\}$. We also assume that $|U| = 2^k$ for some $k \ge 1$, by possibly increasing the size of the universe by a factor of at most two. Thus, we can identify $U = \{0,1\}^k$, and identify functions $h \in H$ with boolean functions $h : \{0,1\}^k \to \{0,1\}$. Consider the following construction:
$$H_2 = \{h_{a,b}(x) = \langle a, x\rangle + b \pmod 2 \;:\; a \in \{0,1\}^k,\ b \in \{0,1\}\}.$$
One can check that $|H_2| = 2^{k+1} = 2|U|$, which is much smaller than the set of all functions from $\{0,1\}^k$ to $\{0,1\}$ (which has size $2^{2^k}$). We will show that $H_2$ is pairwise independent.
To do so, we need the following claim.
Claim 8.2. Fix $x \in \{0,1\}^k$, $x \ne 0^k$. Then
$$\Pr_{a \in \{0,1\}^k}[\langle a, x\rangle \equiv 0 \pmod 2] = \frac{1}{2}.$$

Proof. Since $x \ne 0^k$, there is some index $i$ with $x_i = 1$. Then
$$\Pr_{a \in \{0,1\}^k}[\langle a, x\rangle \equiv 0 \pmod 2] = \Pr\Big[a_i \equiv \sum_{j \ne i} a_j x_j \pmod 2\Big] = \frac{1}{2},$$
since $a_i$ is uniform and independent of $\{a_j : j \ne i\}$.

Claim 8.3. $H_2$ is pairwise independent.

Proof. Fix distinct $x_1 \ne x_2 \in \{0,1\}^k$ and $y_1, y_2 \in \{0,1\}$. The event that $h_{a,b}(x_1) = y_1$ and $h_{a,b}(x_2) = y_2$ is equivalent (modulo 2) to $\langle a, x_1 + x_2\rangle = y_1 + y_2$ and $b = y_1 - \langle a, x_1\rangle$. Since $x_1 + x_2 \ne 0^k$, by Claim 8.2 the first event has probability $1/2$ over the choice of $a$; and for every fixing of $a$, the second event has probability $1/2$ over the choice of $b$. Hence
$$\Pr_{a,b}[h_{a,b}(x_1) = y_1 \text{ and } h_{a,b}(x_2) = y_2] = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}.$$
Next, we describe an alternative viewpoint, which will justify the name pairwise independent bits.
Definition 8.4 (Pairwise independent bits). A distribution $D$ over $\{0,1\}^n$ is said to be pairwise independent if for any distinct $i, j \in [n]$ and any $y_1, y_2 \in \{0,1\}$ we have
$$\Pr_{x \sim D}[x_i = y_1 \text{ and } x_j = y_2] = \frac{1}{4}.$$
We note that we can directly use $H_2$ to generate pairwise independent bits. Assume that $n = 2^k$. Identify $h \in H_2$, $h : \{0,1\}^k \to \{0,1\}$, with the string $u_h \in \{0,1\}^n$ given by concatenating all the evaluations of $h$ in some pre-fixed order. Let $D$ be the distribution over $\{0,1\}^n$ obtained by sampling $h \in H_2$ uniformly and outputting $u_h$. Then the condition that $D$ is pairwise independent is equivalent to $H_2$ being pairwise independent. Note that the construction above gives a distribution $D$ supported on $|H_2| = 2n$ elements of $\{0,1\}^n$, much less than the full space. In particular, we can represent a string $u$ in the support of $D$ by specifying the hash function which generated it, which only requires $\log|H_2| = \log n + 1$ bits.
Example 8.5. Let n = 4. A uniform string from the following set of 8 = 23 strings is
pairwise independent:
{0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111}.
8.2

Recall that the maximum cut of a graph $G = (V, E)$ with $V = \{v_1, \ldots, v_n\}$ is
$$\mathrm{MAXCUT}(G) = \max_{S \subseteq V} |E(S, S^c)|.$$
Computing it exactly is NP-hard; we show that a random cut gives a 2-approximation, and then derandomize this using pairwise independent bits.

Lemma 8.6. Let $x_1, \ldots, x_n \in \{0,1\}$ be uniform and independent bits, and set $S = \{v_i : x_i = 1\}$. Then $\mathbb{E}_S[|E(S, S^c)|] \ge |E(G)|/2$. In particular, $\mathrm{MAXCUT}(G) \ge |E(G)|/2$.

Proof. Writing the sum over ordered pairs of adjacent vertices,
$$|E(S, S^c)| = \sum_{(v_i, v_j) \in E} 1_{v_i \in S} \cdot 1_{v_j \in S^c}.$$
Note that every undirected edge $\{u, v\}$ in $G$ is actually counted twice in the calculation above, once as $(u, v)$ and once as $(v, u)$. However, clearly at most one of these is in $E(S, S^c)$.
By linearity of expectation, the expected size of the cut is
$$\mathbb{E}_S[|E(S, S^c)|] = \sum_{(v_i, v_j) \in E} \mathbb{E}[1_{v_i \in S} 1_{v_j \notin S}] = \sum_{(v_i, v_j) \in E} \mathbb{E}[1_{x_i = 1} 1_{x_j = 0}] = \sum_{(v_i, v_j) \in E} \frac{1}{4} = \frac{|E(G)|}{2}.$$
This implies that a random choice of $S$ has a non-negligible probability of giving a 2-approximation.
Corollary 8.7. $\Pr_S\left[|E(S, S^c)| \ge \frac{|E(G)|}{2}\right] \ge \frac{1}{2|E(G)|} \ge \frac{1}{n^2}$.
Proof. Let $X = |E(S, S^c)|$ be a random variable counting the number of edges in a random cut. Let $\mu = |E(G)|/2$, where we know that $\mathbb{E}[X] \ge \mu$. Note that whenever $X < \mu$, we in fact have that $X \le \mu - 1/2$, since $X$ is an integer and $\mu$ is a multiple of $1/2$. Also, note that always $X \le |E(G)| = 2\mu$. Let $p = \Pr[X \ge \mu]$. Then
$$\mathbb{E}[X] = \mathbb{E}[X \mid X \ge \mu]\Pr[X \ge \mu] + \mathbb{E}[X \mid X \le \mu - 1/2]\Pr[X \le \mu - 1/2] \le 2\mu\, p + (\mu - 1/2)(1-p) \le \mu - 1/2 + 2\mu p.$$
So we must have $2\mu p \ge 1/2$, which means that $p \ge 1/(4\mu) = 1/(2|E(G)|)$.
In particular, we can sample O(n2 ) sets S, compute for each one its cut size, and we are
guaranteed that with high probability, the maximum will be at least |E(G)|/2.
Next, we derandomize this randomized algorithm using pairwise independent bits. As
a side benefit, it will reduce the computation time from testing O(n2 ) sets to testing only
O(n) sets.
Lemma 8.8. Let $x_1, \ldots, x_n \in \{0,1\}$ be pairwise independent bits (such as the ones given by $H_2$). Set
$$S = \{v_i : x_i = 1\}.$$
Then
$$\mathbb{E}_S[|E(S, S^c)|] \ge \frac{|E(G)|}{2}.$$

Proof. The only place where we used the fact that the bits were uniform in the proof of Lemma 8.6 was in the calculation
$$\Pr[x_i = 1 \text{ and } x_j = 0] = \frac{1}{4}$$
for all distinct $i, j$. However, this is also true for pairwise independent bits (by definition). In particular, for one of the $O(n)$ sets $S$ that we generate in the algorithm, we must have that $|E(S, S^c)|$ is at least the average, and hence $|E(S, S^c)| \ge |E(G)|/2$.
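A short Python sketch of the resulting deterministic 2-approximation: it enumerates the O(n) seeds of H_2, interprets each as pairwise independent bits x_1, ..., x_n, and keeps the best of the induced cuts. The graph encoding and helper names are illustrative choices of ours, not from the notes.

from itertools import product

def cut_size(edges, S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

def derandomized_maxcut(n, edges):
    """Try all O(n) seeds (a, b) of H_2; the bits x_i = <a, bin(i)> + b (mod 2)
    are pairwise independent, so the best of the induced cuts has size >= |E|/2."""
    k = max(1, (n - 1).bit_length())   # embed [n] into {0,1}^k
    best = set()
    for a in product([0, 1], repeat=k):
        for b in (0, 1):
            bits = [(sum(ai * ((i >> j) & 1) for j, ai in enumerate(a)) + b) % 2
                    for i in range(n)]
            S = {i for i in range(n) if bits[i] == 1}
            if cut_size(edges, S) > cut_size(edges, best):
                best = S
    return best

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
S = derandomized_maxcut(4, edges)
print(S, cut_size(edges, S))   # cut size is at least |E|/2 = 2.5, i.e. >= 3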
8.3
The previous application showed the usefulness of having small sample spaces for pairwise
independent bits. We saw that we can generate O(n) binary strings of length n, such that
choosing one of them uniformly gives us pairwise independent bits. We next show that this
is optimal.
Lemma 8.9. Let $X \subseteq \{0,1\}^n$ and let $D$ be a distribution supported on $X$. Assume that $D$ is pairwise independent. Then $|X| \ge n$.

Proof. Let $X = \{x_1, \ldots, x_m\}$ and let $p_\ell = \Pr[D = x_\ell]$. For any $i \in [n]$, we construct a vector $v_i \in \mathbb{R}^m$ as follows:
$$(v_i)_\ell = \sqrt{p_\ell}\,(-1)^{(x_\ell)_i}.$$
We will show that the vectors $v_1, \ldots, v_n$ are linearly independent in $\mathbb{R}^m$, and hence we must have $|X| = m \ge n$.
As a first step, we show that $\langle v_i, v_j\rangle = 0$ for all $i \ne j$:
$$\langle v_i, v_j\rangle = \sum_{\ell=1}^m p_\ell (-1)^{(x_\ell)_i + (x_\ell)_j} = \mathbb{E}_{x \sim D}\left[(-1)^{x_i + x_j}\right] = \Pr_{x \sim D}[x_i + x_j \equiv 0 \pmod 2] - \Pr_{x \sim D}[x_i + x_j \equiv 1 \pmod 2] = \frac{1}{2} - \frac{1}{2} = 0.$$
Next, we show that this implies that $v_1, \ldots, v_n$ must be linearly independent. Assume towards contradiction that this is not the case. That is, there exist coefficients $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$, not all zero, such that
$$\sum_i \alpha_i v_i = 0.$$
However, for any $j \in [n]$, we have
$$0 = \Big\langle \sum_i \alpha_i v_i, v_j\Big\rangle = \sum_i \alpha_i \langle v_i, v_j\rangle = \alpha_j \|v_j\|^2 = \alpha_j,$$
where we used that $\|v_j\|^2 = \sum_\ell p_\ell = 1$. So $\alpha_j = 0$ for all $j$, a contradiction. Hence, $v_1, \ldots, v_n$ are linearly independent and hence $|X| = m \ge n$.
8.4

We next generalize the construction $H_2$ from the binary field to a general prime field $\mathbb{F}_p$. Define the family of hash functions from $U = \mathbb{F}_p^k$ to $R = \mathbb{F}_p$ given by
$$H_p = \{h_{a,b}(x) = \langle a, x\rangle + b \pmod p \;:\; a \in \mathbb{F}_p^k,\ b \in \mathbb{F}_p\}.$$

Claim 8.10. Fix $x \in \mathbb{F}_p^k$, $x \ne 0$, and $y \in \mathbb{F}_p$. Then
$$\Pr_{a \in \mathbb{F}_p^k}[\langle a, x\rangle = y] = \frac{1}{p}.$$

Proof. Since $x \ne 0$, there is some index $i$ with $x_i \ne 0$. The event $\langle a, x\rangle = y$ is equivalent to
$$a_i x_i = y - \sum_{j \ne i} a_j x_j.$$
Now, for every fixing of $\{a_j : j \ne i\}$, we have that $a_i x_i$ is uniformly distributed in $\mathbb{F}_p$, hence the probability that it equals any specific value is exactly $1/p$.
Lemma 8.11. $H_p$ is pairwise independent.

Proof. Fix distinct $x_1, x_2 \in \mathbb{F}_p^k$ and (not necessarily distinct) $y_1, y_2 \in \mathbb{F}_p$. All the calculations below of $\langle a, x\rangle + b$ are in $\mathbb{F}_p$. We need to show
$$\Pr_{a \in \mathbb{F}_p^k,\ b \in \mathbb{F}_p}[\langle a, x_1\rangle + b = y_1 \text{ and } \langle a, x_2\rangle + b = y_2] = \frac{1}{p^2}.$$
If we just randomize over $a$, then by Claim 8.10, for any $y \in \mathbb{F}_p$,
$$\Pr_a[\langle a, x_1\rangle - \langle a, x_2\rangle = y] = \Pr_a[\langle a, x_1 - x_2\rangle = y] = \frac{1}{p}.$$
Randomizing also over $b$ gives us the desired result:
$$\Pr_{a,b}[\langle a, x_1\rangle + b = y_1 \text{ and } \langle a, x_2\rangle + b = y_2] = \Pr_a[\langle a, x_1 - x_2\rangle = y_1 - y_2]\cdot \Pr_b[b = y_1 - \langle a, x_1\rangle] = \frac{1}{p}\cdot\frac{1}{p} = \frac{1}{p^2}.$$
8.5

Let $H = \{h : U \to R\}$ be a family of pairwise independent hash functions, and let $S \subseteq U$ be a subset with $|S|^2 \le |R|$. We say that $h$ is collision free for $S$ if $h(x) \ne h(y)$ for all distinct $x, y \in S$. We claim that a uniformly chosen $h \in H$ is collision free for $S$ with probability at least $1/2$.
Proof. Let $h \in H$ be uniformly chosen, and let $X$ be a random variable that counts the number of collisions in $S$. That is,
$$X = \sum_{\{x,y\} \subseteq S} 1_{h(x) = h(y)}.$$
By pairwise independence,
$$\mathbb{E}[X] = \sum_{\{x,y\} \subseteq S} \Pr_{h \in H}[h(x) = h(y)] = \binom{|S|}{2}\frac{1}{|R|} \le \frac{|S|^2}{2|R|} \le \frac{1}{2}.$$
By Markov's inequality,
$$\Pr_{h \in H}[h \text{ is not collision free for } S] = \Pr[X \ge 1] \le \mathbb{E}[X] \le 1/2.$$
8.6
We now show how to use pairwise independent hash functions in order to design efficient dictionaries. Fix a universe $U$. For simplicity, we will assume that for any range $R$ we have a family of pairwise independent hash functions $H = \{h : U \to R\}$; note that while our previous constructions required $|R|$ to be a prime (or in fact, a prime power), this assumption at most doubles the size of the range, which in the end only changes our space requirements by a constant factor.
Given a set $S \subseteq U$ of size $|S| = n$, we would like to design a data structure which supports queries of the form "is $x \in S$?". Our goal will be to do so while minimizing both the space requirements and the time it takes to answer a query. If we simply store the set as a sorted list of $n$ elements, then the space (memory) requirement is $O(n \log|U|)$ bits, and each query takes time $O(\log|U| \cdot \log n)$ by doing a binary search on the list. We will see that these can be improved via hashing.
First, consider the following simple hashing scheme. Fix a range $R = \{1, \ldots, n^2\}$. Let $H = \{h : U \to [n^2]\}$ be a family of pairwise independent hash functions. We showed that a randomly chosen $h \in H$ will be collision free on $S$ with probability at least $1/2$. So, we can sample $h \in H$ until we find such an $h$, which on average requires two attempts. Let $A$ be an array of length $n^2$. It will be mostly empty, except that we set $A[h(x)] = x$ for all $x \in S$. Now, to check whether $x \in S$, we compute $h(x)$ and check whether $A[h(x)] = x$ or not. Thus, the query time is only $O(\log|U|)$. However, the price is that the space requirements are large: to store $n$ elements, we maintain an array of size $n^2$, which requires at least $n^2$ bits (and possibly even $O(n^2 \log|U|)$, depending on how efficient we are in storing the empty cells).
We describe a two-step hashing scheme due to Fredman, Komlós and Szemerédi [FKS84] which avoids this large waste of space. It will use only $O(n \log n + \log|U|)$ space, but will still allow for a query time of $O(\log|U|)$. As a preliminary step, we apply the collision free hash scheme we just described. So, we may assume from now on that $|U| = O(n^2)$ and that $S \subseteq U$ has size $|S| = n$.
Step 1. We first find a hash function $h : U \to [n]$ which has only $n$ collisions. Let $\mathrm{Coll}(h, S)$ denote the number of collisions of $h$ for $S$, namely
$$\mathrm{Coll}(h, S) = |\{\{x, y\} \subseteq S : h(x) = h(y)\}|.$$
If $H = \{h : U \to [n]\}$ is a family of pairwise independent hash functions, then
$$\mathbb{E}_{h \in H}[\mathrm{Coll}(h, S)] = \sum_{\{x,y\} \subseteq S} \Pr_{h \in H}[h(x) = h(y)] = \binom{|S|}{2}\frac{1}{n} \le \frac{|S|^2}{2n} = \frac{n}{2}.$$
So, after on average two iterations of randomly choosing $h \in H$, we find a function $h : U \to [n]$ such that $\mathrm{Coll}(h, S) \le n$. We fix it from now on. Note that it is represented using only $O(\log n)$ bits.
Step 2. Next, for any $i \in [n]$ let $S_i = \{x \in S : h(x) = i\}$. Observe that $\sum_i |S_i| = n$ and
$$\sum_{i=1}^n \binom{|S_i|}{2} = \mathrm{Coll}(h, S) \le n.$$
Let $n_i = |S_i|^2$. Note that $\sum_i n_i = 2\,\mathrm{Coll}(h, S) + \sum_i |S_i| \le 3n$. We will find hash functions $h_i : U \to [n_i]$ which are collision free on $S_i$. Choosing a uniform hash function from a pairwise independent family of hash functions $H_i = \{h : U \to [n_i]\}$ succeeds on average after two samples. So, we only need $O(n)$ attempts in total (in expectation) to find these functions. As each $h_i$ requires $O(\log n)$ bits to be represented, we need in total $O(n \log n)$ bits to represent all of them.
Let $A$ be an array of size $3n$. Let $\mathrm{offset}_i = \sum_{j < i} n_j$. The sub-array $A[\mathrm{offset}_i : \mathrm{offset}_i + n_i]$ will be used to store the elements of $S_i$. Initially $A$ is empty. We set
$$A[\mathrm{offset}_i + h_i(x)] = x \qquad \text{for all } x \in S_i.$$
Note that there are no collisions in A, as we are guaranteed that hi are collision free on Si .
We will also keep the list of {offseti : i [n]} in a separate array.
Query. To check whether x S, we do the following:
Compute i = h(x).
Read offseti .
Check if A[offseti + hi (x)] = x or not.
This can be computed using O(log n) bit operations.
Space requirements. The hash function $h$ requires $O(\log n)$ bits. The hash functions $\{h_i : i \in [n]\}$ require $O(n \log n)$ bits. The array $A$ requires $O(n \log n)$ bits.
Setup time. The setup algorithm is randomized, as it needs to find good hash functions. Its expected running time is $O(n \log n)$ bit operations.
To find $h$ takes $O(n \log n)$ time, as this is how long it takes to verify that it has at most $n$ collisions.
To find each $h_i$ takes $O(|S_i| \log n)$ time, and in total this is $O(n \log n)$ time.
To set up the arrays $\{\mathrm{offset}_i : i \in [n]\}$ and $A$ takes $O(n \log n)$ time.
RAM model vs bit model. Up until now, we counted bit operations. However, computers can operate on words efficiently. A model for that is the RAM model, where we can
perform basic operations on log n-bit words. In this model, it can be verified that the query
time is O(1) word operations, space requirements are O(n) words and setup time is O(n)
word operations.
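A compact Python sketch of the two-level construction. In place of the pairwise independent families above it uses the standard universal family h_{a,b}(x) = ((ax + b) mod P) mod r, which suffices for the collision bounds of Steps 1 and 2; class and variable names are illustrative choices of ours, not from the notes.

import random

class FKSDict:
    """Static two-level dictionary: O(n) cells overall, O(1) probes per query."""

    P = (1 << 31) - 1   # a prime larger than any key we expect to store

    def __init__(self, keys):
        n = max(1, len(keys))
        # Step 1: top-level hash into n buckets with at most n collisions.
        while True:
            self.h = self._random_hash(n)
            buckets = [[] for _ in range(n)]
            for x in keys:
                buckets[self.h(x)].append(x)
            if sum(len(b) * (len(b) - 1) // 2 for b in buckets) <= n:
                break
        # Step 2: hash bucket S_i collision-free into n_i = |S_i|^2 cells.
        self.tables, self.hashes = [], []
        for b in buckets:
            size = max(1, len(b)) ** 2
            while True:
                hi = self._random_hash(size)
                table = [None] * size
                ok = True
                for x in b:
                    if table[hi(x)] is not None:
                        ok = False
                        break
                    table[hi(x)] = x
                if ok:
                    break
            self.tables.append(table)
            self.hashes.append(hi)

    def _random_hash(self, r):
        a, b = random.randrange(1, self.P), random.randrange(self.P)
        return lambda x: ((a * x + b) % self.P) % r

    def __contains__(self, x):
        i = self.h(x)
        return self.tables[i][self.hashes[i](x)] == x

S = random.sample(range(10**6), 1000)
d = FKSDict(S)
assert all(x in d for x in S)     # no false negatives
assert (10**6 + 123) not in d     # and no false positives at all
print("FKS dictionary OK")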
8.7 Bloom filters
Bloom filters allow for even more efficient data structures for set membership, if some errors are allowed. Let $U$ be a universe and $S \subseteq U$ a subset of size $|S| = n$. Let $h : U \to [m]$ be a uniform hash function, for $m$ to be determined later. The data structure maintains a bit array $A$ of length $m$, initially set to zero. Then, for every $x \in S$, we set $A[h(x)] = 1$. In order to check if $x \in S$, we compute $h(x)$ and read the value $A[h(x)]$: we answer "yes" if $A[h(x)] = 1$ and "no" otherwise. This has the following guarantees:

- No false negatives: if $x \in S$ we will always say "yes".
- Few false positives: if $x \notin S$, we will say "yes" with probability $\frac{|\{i : A[i] = 1\}|}{m}$, assuming $h$ is a fully random function.
9 Min cut
Let $G = (V, E)$ be a graph. A cut in this graph is $|E(S, S^c)|$ for some $\emptyset \ne S \subsetneq V$, that is, the number of edges which cross a partition of the vertices. Finding the maximum cut is NP-hard, and the best algorithms solving it run in exponential time; we saw an algorithm which finds a 2-approximation for the max-cut. However, finding the minimum cut turns out to be solvable in polynomial time. Today, we will see a randomized algorithm due to Karger [Kar93] which achieves this in a beautiful way. To formalize this, the algorithm will find
$$\text{min-cut}(G) = \min_{\emptyset \ne S \subsetneq V} |E(S, S^c)|.$$
9.1 Karger's algorithm
The algorithm is very simple: choose a random edge and contract it. Repeat $n - 2$ times, until only two vertices remain. Output this as a guess for the min-cut. We will show that this outputs the min-cut with probability at least $2/n^2$, hence repeating it $3n^2$ times (say) will yield a min-cut with probability at least $1 - (1 - 2/n^2)^{3n^2} \ge 99\%$.
Formally, to define a contraction we need to allow graphs to have parallel edges.
Definition 9.1 (Edge contraction). Let $G = (V, E)$ be an undirected graph with $|V| = n$ vertices, potentially with parallel edges. For an edge $e = (u, v) \in E$, the contraction of $G$ along $e$ is an undirected graph on $n - 1$ vertices, where we merge $u, v$ into a single node, and delete any self-loops that may be created in this process.
We can now present the algorithm.
Karger
Input: Undirected graph G = (V, E) with |V| = n
Output: Cut in G
1. Let G_n = G.
2. For i = n, ..., 3 do:
   2.1 Choose a uniform edge e_i in G_i.
   2.2 Set G_{i-1} to be the contraction of G_i along e_i.
3. Output the cut in G corresponding to G_2.
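For illustration, here is a short Python implementation of the contraction algorithm, using a union-find structure over the merged super-vertices; parallel edges are kept implicitly and self-loops are skipped when sampling. This is a sketch of ours, not code from the notes, and it assumes the input graph is connected.

import random

def karger_cut(n, edges):
    """One run of Karger's algorithm on a connected n-vertex multigraph given
    as a list of edges (u, v). Returns the size of the cut it finds."""
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)    # uniform among surviving parallel edges
        ru, rv = find(u), find(v)
        if ru == rv:                   # this edge became a self-loop; skip it
            continue
        parent[ru] = rv                # contract the edge
        remaining -= 1
    return sum(1 for (u, v) in edges if find(u) != find(v))

def min_cut(n, edges, repetitions):
    """Repeat ~3 n^2 times to find the min cut with high probability."""
    return min(karger_cut(n, edges) for _ in range(repetitions))

# Two triangles joined by a single bridge; the min cut is 1.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
print(min_cut(6, edges, repetitions=3 * 6 * 6))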
Next, we proceed to analyze the algorithm. We start with a few observations. By
contracting along an edge e = (u, v), we make a commitment that u, v belong to the same
side of the cut, so we restrict the number of potential cuts. Hence, any cut of Gi is a cut of
G, and in particular
$$\text{min-cut}(G) \le \text{min-cut}(G_i), \qquad i = n, \ldots, 2.$$
In order to analyze the algorithm, let's fix from now on a min-cut $S \subseteq V(G)$. We will analyze the probability that the algorithm never contracts an edge of $E(S, S^c)$; in this case, after $n - 2$ contractions, the output will be exactly the cut $S$. First, we bound the min-cut by cuts for which one side is a single vertex.

Claim 9.2. $\text{min-cut}(G) \le \min_{v \in V} \deg(v) \le \frac{2|E|}{|V|}$.

Proof. The cut which separates a single vertex $v$ from the rest of the graph has exactly $\deg(v)$ edges, hence $\text{min-cut}(G) \le \min_{v \in V} \deg(v)$. Moreover, $\sum_{v \in V} \deg(v) = 2|E|$, so the minimal degree is at most the average degree $\frac{2|E|}{|V|}$.

Claim 9.3. Let $G' = (V', E')$ be a graph in which $S$ is a min-cut. Then
$$\Pr_{e \in E'}[e \in E(S, S^c)] = \frac{|E(S, S^c)|}{|E'|} \le \frac{2}{|V'|}.$$

Proof. By Claim 9.2 applied to $G'$, $|E(S, S^c)| = \text{min-cut}(G') \le 2|E'|/|V'|$, and the claim follows by dividing by $|E'|$.

Claim 9.4. The probability that the algorithm outputs the cut $S$ is at least $1/\binom{n}{2} \ge 2/n^2$.

Proof. As long as no edge of $E(S, S^c)$ has been contracted, $S$ is still a min-cut of the current graph $G_i$, which has $i$ vertices (indeed, $E(S, S^c)$ is still a cut of $G_i$, and $\text{min-cut}(G_i) \ge \text{min-cut}(G) = |E(S, S^c)|$). Hence, by Claim 9.3,
$$\Pr[\text{the algorithm outputs } S] = \prod_{i=n}^{3} \Pr[e_i \notin E(S, S^c) \mid e_n, \ldots, e_{i+1} \notin E(S, S^c)] \ge \prod_{i=n}^{3}\left(1 - \frac{2}{i}\right) = \frac{n-2}{n}\cdot\frac{n-3}{n-1}\cdots\frac{2}{4}\cdot\frac{1}{3} = \frac{2}{n(n-1)}.$$
9.2
The time it takes to find a min-cut by Karger's algorithm described above is $O(n^4)$: we need $O(n^2)$ iterations to guarantee success with high probability, every iteration requires $n - 2$ rounds of contraction, and every contraction takes $O(n)$ time. We will see how to improve this running time to $O(n^2 \log n)$, due to Karger and Stein [KS96]. The main observation guiding this is that most of the error comes when the graph becomes rather small; however, when the graphs are small, the running time is also smaller. So, we can afford to run multiple instances on smaller graphs, which boosts the success probability without increasing the running time by too much.
Fast-Karger
Input: Undirected graph G = (V, E) with |V| = n
Output: Cut in G
0. If n = 2 output the only possible cut.
1. Let G_n = G and m = n/√2.
2. For i = n, ..., m+1 do:
   2.1 Choose a uniform edge e_i in G_i.
   2.2 Set G_{i-1} to be the contraction of G_i along e_i.
3. Run recursively S_i = Fast-Karger(G_m) for i = 1, 2.
4. Output the S in {S_1, S_2} which minimizes E(S, S^c).
Claim 9.5. Fix a min-cut $S$. Run the original Karger algorithm, but output $G_m$ for $m = n/\sqrt{2}$. Then the probability that $S$ is still a cut in $G_m$ is at least $1/2$.

Proof. Repeating the analysis we did, but stopping once we reach $G_m$, gives
$$\Pr[S \text{ is a cut of } G_m] = \Pr[e_n, \ldots, e_{m+1} \notin E(S, S^c)] \ge \frac{n-2}{n}\cdot\frac{n-3}{n-1}\cdots\frac{m-1}{m+1} = \frac{m(m-1)}{n(n-1)} \ge \frac{1}{2}.$$
We next analyze the running time. Let $T(n)$ be the running time of the algorithm on graphs of size $n$. Then
$$T(n) = 2T(n/\sqrt{2}) + O(n^2),$$
which solves to $T(n) = O(n^2 \log n)$.
10 Routing
Let $G = (V, E)$ be an undirected graph, where nodes represent processors and edges represent communication channels. Each node wants to send a message to another node: $v \to \pi(v)$, where $\pi$ is some permutation on the vertices. However, messages can only traverse edges, and each edge can only carry one message at a given time unit. A routing scheme is a method of deciding on paths for the messages obeying these restrictions, which tries to minimize the time it takes for all messages to reach their destination. If more than one packet needs to traverse an edge, then only one packet does so at any unit time, and the rest are queued for later time steps. The order of sending the remaining packets does not matter much; for example, you can assume a FIFO (First In First Out) queue on every edge.
Here, we will focus on the hypercube graph $H$, which is a common graph used in distributed computation.

Definition 10.1 (Hypercube graph). The hypercube graph $H_n = (V, E)$ has vertices corresponding to all $n$-bit strings, $V = \{0,1\}^n$, and edges which correspond to bit flips,
$$E = \{(x, x \oplus e_i) : x \in \{0,1\}^n,\ i \in [n]\},$$
where $e_i$ is the $i$-th unit vector, and $\oplus$ is bitwise xor.
An oblivious routing scheme is a scheme where the path of sending $v \to \pi(v)$ depends just on the endpoints $v, \pi(v)$, and not on the targets of all other messages. Such schemes are easy to implement, as each node $v$ can compute them given only its local knowledge of its target $\pi(v)$. A very simple one for the hypercube graph is the bit-fixing scheme: in order to route $v = (v_1, \ldots, v_n)$ to $u = (u_1, \ldots, u_n)$, we flip the bits in order, whenever necessary. So for example, the path from $v = 10110$ to $u = 00101$ is
$$10110 \to 00110 \to 00100 \to 00101.$$
We denote by Pfix (v, u) the path from v to u according to the bit-fixing routing scheme.
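A tiny Python helper computing the vertices of P_fix(v, u), reproducing the example above (a sketch of ours; the notes give no code here).

def bit_fixing_path(v, u):
    """Vertices of P_fix(v, u): fix the differing bits from left to right."""
    path, cur = [v], list(v)
    for i, (a, b) in enumerate(zip(v, u)):
        if a != b:
            cur[i] = b
            path.append("".join(cur))
    return path

print(" -> ".join(bit_fixing_path("10110", "00101")))
# 10110 -> 00110 -> 00100 -> 00101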
10.1
Although the maximal distance between pairs of vertices in H is n, routing based on the
bit-fixing scheme can incur a very large overhead, due to the fact that edges can only carry
one message at a time.
Lemma 10.2. There are permutations $\pi : \{0,1\}^n \to \{0,1\}^n$ for which the bit-fixing scheme requires at least $2^{n/2}/n$ time steps to transfer all messages.

Proof. Assume $n$ is even, and write $x \in \{0,1\}^n$ as $x = (x', x'')$ with $x', x'' \in \{0,1\}^{n/2}$. Consider any permutation $\pi : \{0,1\}^n \to \{0,1\}^n$ which maps $(x', 0)$ to $(0, x')$ for all $x' \in \{0,1\}^{n/2}$. These $2^{n/2}$ paths all pass through the single vertex $(0, 0)$. As it has only $n$ outgoing edges, we need at least $2^{n/2}/n$ time steps to send all these packets.
In fact, a more general theorem is true, which shows that any deterministic oblivious routing scheme is equally bad. Here, a deterministic oblivious routing scheme is any scheme in which, if $\pi(v) = u$, then the path from $v$ to $u$ depends only on $v, u$, and moreover is decided in some deterministic fixed way.

Theorem 10.3. For any deterministic oblivious routing scheme, there exists a permutation $\pi : \{0,1\}^n \to \{0,1\}^n$ which requires at least $2^{n/2}/\sqrt{n}$ time steps.
We will not prove this theorem. Instead, we will see how randomization can greatly
enhance performance.
10.2
We will consider the following oblivious routing scheme, which we call the two-step bit-fixing
scheme. It uses randomness on top of the deterministic bit-fixing routing scheme described
earlier.
Definition 10.4 (Two-step bit fixing scheme). In order to route a packet from a source $v \in \{0,1\}^n$ to a target $\pi(v) \in \{0,1\}^n$ do the following:
(i) Sample uniformly an intermediate target $t(v) \in \{0,1\}^n$.
(ii) Follow the path $P_{\text{fix}}(v, t(v))$.
(iii) Follow the path $P_{\text{fix}}(t(v), \pi(v))$.
Observe that $t : \{0,1\}^n \to \{0,1\}^n$ is not necessarily a permutation, as each $t(v)$ is sampled independently, and hence there could be collisions. Still, we prove that with very high probability, all packets will be delivered in linear time.
Theorem 10.5. With probability at least $1 - 2^{-(n-1)}$, all packets will be routed to their destinations in at most $14n$ time steps.
In preparation for proving it, we first prove a few results about the deterministic bit-fixing
routing scheme.
Claim 10.6. Let $v, v', u, u' \in \{0,1\}^n$. Let $P = P_{\text{fix}}(v, u)$ and $P' = P_{\text{fix}}(v', u')$. If the paths separate at some point, they never re-connect. That is, let $w_1, \ldots, w_m$ be the vertices of $P$ and let $w'_1, \ldots, w'_{m'}$ be the vertices of $P'$. Assume that $w_i = w'_j$ but $w_{i+1} \ne w'_{j+1}$. Then
$$w_\ell \ne w'_{\ell'} \qquad \forall\, \ell \ge i+1,\ \ell' \ge j+1.$$

Proof. By assumption $w_i = w'_j$ and $w_{i+1} \ne w'_{j+1}$. Then $w_{i+1} = w_i \oplus e_a$ and $w'_{j+1} = w'_j \oplus e_b$ with $a \ne b$. Assume without loss of generality that $a < b$. Then $(w_\ell)_a = (w_{i+1})_a = (w_i)_a \oplus 1$ for all $\ell \ge i+1$, while $(w'_{\ell'})_a = (w'_{j+1})_a = (w'_j)_a = (w_i)_a$ for all $\ell' \ge j+1$. Thus, for any $\ell \ge i+1$, $\ell' \ge j+1$ we have $w_\ell \ne w'_{\ell'}$, which means that the paths never intersect again.
Lemma 10.7. Fix $\pi : \{0,1\}^n \to \{0,1\}^n$ and $v \in \{0,1\}^n$. Let $e_1, \ldots, e_k$ be the edges of $P_{\text{fix}}(v, \pi(v))$. Define
$$S_v = \{v' \in V : v' \ne v,\ P_{\text{fix}}(v', \pi(v')) \text{ contains some edge from } e_1, \ldots, e_k\}.$$
Then the packet sent from $v$ to $\pi(v)$ will reach its destination after at most $k + |S_v|$ steps.

Proof. Let $S = S_v$ for simplicity of notation. The proof is by a charging argument. Let $p_v$ be the packet sent from $v$ to $\pi(v)$. We assume that packets carry tokens on them. Initially, there are no tokens. Assume that at some time step, the packet $p_v$ is supposed to traverse an edge $e_i$ for some $i \in [k]$, but instead another packet $p_{v'}$ is sent over $e_i$ at this time step (necessarily $v' \in S$). In such a case, we generate a new token and place it on $p_{v'}$. Next, if for some $v' \in S$, a packet $p_{v'}$ with at least one token on it is supposed to traverse an edge $e_j$ for $j \in [k]$, but instead another packet $p_{v''}$ is sent over it at the same time step (again, necessarily $v'' \in S$), then we move one token from $p_{v'}$ to $p_{v''}$.

We will show that any packet $p_{v'}$ for $v' \in S$ can have at most one token on it at any given moment. This shows that at most $|S|$ tokens are generated overall. Thus, $p_v$ is delayed for at most $|S|$ steps and hence reaches its destination after at most $k + |S|$ steps.

To see that, observe that tokens always move forward along $e_1, \ldots, e_k$. That is, if we follow a specific token, it starts at some edge $e_i$, follows a path $e_i, \ldots, e_j$ for some $j \ge i$, and then traverses an edge outside $e_1, \ldots, e_k$. At this point, by Claim 10.6, it can never intersect the path $e_1, \ldots, e_k$ again. So, we can never have two tokens which traverse the same edge at the same time, and hence two tokens can never be on the same packet.
Lemma 10.8. Let $t : \{0,1\}^n \to \{0,1\}^n$ be uniformly chosen. Let $P(v) = P_{\text{fix}}(v, t(v))$. Then with probability at least $1 - 2^{-n}$ over the choice of $t$, for every path $P(v)$, there are at most $6n$ other paths $P(w)$ which intersect some edge of $P(v)$.

Proof. For $v, w \in \{0,1\}^n$, let $X_{v,w} \in \{0,1\}$ be the indicator random variable for the event that $P(v)$ and $P(w)$ intersect in an edge. Our goal is to upper bound $X_v = \sum_{w \ne v} X_{v,w}$ for all $v \in \{0,1\}^n$. Before analyzing it, we first analyze a simpler random variable.

Fix an edge $e = (u, u \oplus e_a)$. For $w \in \{0,1\}^n$, let $Y_{e,w} \in \{0,1\}$ be an indicator variable for the event that the edge $e$ belongs to the path $P(w)$. The number of paths which pass through $e$ is then $\sum_{w \in \{0,1\}^n} Y_{e,w}$. Now, the path between $w$ and $t(w)$ passes through $e$ only if $w_i = u_i$ for all $i > a$, and $t(w)_i = u_i$ for all $i < a$. Let $A_e = \{w \in \{0,1\}^n : w_i = u_i\ \forall i > a\}$. Only $w \in A_e$ has a nonzero probability for the path from $w$ to $t(w)$ to go through $e$. Note that $|A_e| = 2^a$. Moreover, for any $w \in A_e$, the probability that $P(w)$ indeed passes through $e$ is at most
$$\Pr[Y_{e,w} = 1] \le \Pr[t(w)_i = u_i\ \forall i < a] = 2^{-(a-1)}.$$
Hence, the expected number of paths $P(w)$ which go through any edge $e$ is at most 2, since
$$\mathbb{E}\Big[\sum_{w \in \{0,1\}^n} Y_{e,w}\Big] = \sum_{w \in A_e} \Pr[Y_{e,w} = 1] \le 2^a \cdot 2^{-(a-1)} = 2.$$
Next, the paths $P(v), P(w)$ intersect exactly when some edge $e \in P(v)$ belongs to $P(w)$. This implies that
$$X_{v,w} \le \sum_{e \in P(v)} Y_{e,w},$$
and hence
$$\mathbb{E}[X_v] = \mathbb{E}\Big[\sum_{w \ne v} X_{v,w}\Big] \le \mathbb{E}\Big[\sum_{e \in P(v)}\sum_{w \ne v} Y_{e,w}\Big].$$
In order to bound $\mathbb{E}[X_v]$, note that once we fix $t(v)$ then $P(v)$ becomes a fixed list of at most $n$ edges. Hence
$$\mathbb{E}[X_v \mid t(v)] \le \sum_{e \in P(v)} \mathbb{E}\Big[\sum_{w \ne v} Y_{e,w}\Big] \le 2n.$$
This then implies the same bound once we average over $t(v)$ as well,
$$\mathbb{E}[X_v] \le 2n.$$
This means that for every $v$, the path $P(v)$ intersects on average at most $2n$ other paths $P(w)$. Next, we show that if we slightly increase this bound, it holds for all $v$ simultaneously with very high probability. This requires a tail bound; in our case, a multiplicative version of the Chernoff inequality.
Theorem 10.9 (Chernoff bound, multiplicative version). Let $Z_1, \ldots, Z_N \in \{0,1\}$ be independent random variables. Let $Z = Z_1 + \ldots + Z_N$ with $\mathbb{E}[Z] = \mu$. Then for any $\delta > 0$,
$$\Pr[Z \ge (1+\delta)\mu] \le \exp\left(-\frac{\delta^2\mu}{2+\delta}\right).$$
In our case, let $v \in \{0,1\}^n$, fix $t(v) \in \{0,1\}^n$ and let $Z_1, \ldots, Z_N$ be the random variables $\{X_{v,w} : w \ne v\}$. Note that, conditioned on $t(v)$, they are indeed independent, their sum is $Z = X_v$, and $\mu = \mathbb{E}[X_v] \le 2n$. Taking $\delta = 2$ gives
$$\Pr[X_v \ge 6n] \le \exp(-2n).$$
By the union bound, the probability that $X_v \ge 6n$ for some $v \in \{0,1\}^n$ is bounded by
$$\Pr[\exists v \in \{0,1\}^n,\ X_v \ge 6n] \le 2^n \exp(-2n) \le 2^{-n}.$$
Proof of Theorem 10.5. Let $t : \{0,1\}^n \to \{0,1\}^n$ be uniformly chosen. By Lemma 10.8, we have that $|S_v| \le 6n$ for all $v \in \{0,1\}^n$ with probability at least $1 - 2^{-n}$. By Lemma 10.7, this implies that the packet sent from $v$ to $t(v)$ reaches its destination in at most $n + 6n = 7n$ time steps. Analogously, we can send the packets from $t(v)$ to $\pi(v)$ in at most $7n$ time steps (note that this is exactly the same argument, except that the starting point is now randomized instead of the end point). Again, the success probability of this phase is at least $1 - 2^{-n}$. By the union bound, the choice of $t$ is good for both phases with probability at least $1 - 2\cdot 2^{-n}$.
11 Expander graphs
Expander graphs are deterministic graphs which behave like random graphs in many ways.
They have a large number of applications, including in derandomization, constructions of
error-correcting codes, robust network design, and many more. Here, we will only give some
definitions and describe a few of their properties. For a much more comprehensive survey
see [HLW06].
11.1 Edge expansion
Let G = (V, E) be an undirected graph. We will focus here on d-regular graphs, but many
of the results can be extended to non-regular graphs as well. Let $E(S, T) = |E \cap (S \times T)|$ denote the number of edges in $G$ with one endpoint in $S$ and the other in $T$. For a subset $S \subseteq V$, its edge boundary is $E(S, S^c)$. We say that $G$ is an edge expander if any set $S$ has
many edges going out of it.
Definition 11.1. The Cheeger constant of $G$ is
$$h(G) = \min_{S \subseteq V,\ 1 \le |S| \le |V|/2} \frac{E(S, S^c)}{|S|}.$$
Simple bounds are $0 \le h(G) \le d$, with $h(G) = 0$ iff $G$ is disconnected. A simple example of a graph with large edge expansion is the complete graph. If $G = K_n$, the complete graph on $n$ vertices, then $d = n - 1$ and
$$h(G) = \min_{S \subseteq V,\ 1 \le |S| \le |V|/2} \frac{E(S, S^c)}{|S|} = \min_{S \subseteq V,\ 1 \le |S| \le |V|/2} |S^c| = n/2.$$
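Computing h(G) exactly requires examining exponentially many sets, so it is only feasible for tiny graphs; the following brute-force Python snippet does exactly that and reproduces h(K_n) = n/2 for even n (illustrative code, not from the notes).

from itertools import combinations

def cheeger_constant(n, edges):
    """h(G): minimize E(S, S^c)/|S| over all S with 1 <= |S| <= n/2."""
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            S = set(S)
            boundary = sum(1 for (u, v) in edges if (u in S) != (v in S))
            best = min(best, boundary / size)
    return best

n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]   # K_6
print(cheeger_constant(n, edges))   # prints 3.0 = n/2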
Our interest, however, will be in constructing large but very sparse graphs (ideally with $d = 3$) for which $h(G) \ge c$ for some absolute constant $c > 0$. Such graphs are highly connected. For example, the following lemma shows that by deleting a few edges in such graphs, we can only disconnect a few vertices. This is very useful, for example, in network design, where we want the failure of edges to affect as few nodes as possible.
Lemma 11.2. To disconnect $k \le |V|/2$ vertices from the rest of the graph, we must delete at least $k \cdot h(G)$ edges.

Proof. If after deleting some number of edges, a set $S \subseteq V$ of size $|S| = k$ gets disconnected from the graph, then we must have deleted at least $E(S, S^c) \ge k \cdot h(G)$ many edges.
There are several constructions of expander graphs which are based on number theory. The constructions are simple and beautiful, but the proofs are hard. For example, a construction of Selberg has $V = \mathbb{Z}_p \cup \{\infty\}$ and $E = \{(x, x+1), (x, x-1), (x, 1/x) : x \in V\}$. It is a 3-regular graph, and it can be proven to have $h(G) \ge 3/32$. The degree 3 is the smallest we can hope for.
11.2 Spectral expansion
We will now describe another notion of expansion, called spectral expansion. It may seem less natural, but we will later see that it is essentially equivalent to edge expansion. The benefit is that it is easy to check whether a graph has good spectral expansion, while we don't know of an efficient way to test for edge expansion (other than computing it directly, which takes exponential time).
For a d-regular graph G = (V, E), |V | = n, let A be the adjacency matrix of G. That is,
$A$ is an $n \times n$ matrix with
$$A_{i,j} = \begin{cases} 1 & \text{if } (i,j) \in E, \\ 0 & \text{if } (i,j) \notin E. \end{cases}$$
Note that A is a symmetric matrix, hence its eigenvalues are all real. We first note a few
simple properties of them.
Claim 11.4.
(i) All eigenvalues of $A$ are in the range $[-d, d]$.
(ii) The vector $\vec{1}$ is an eigenvector of $A$ with eigenvalue $d$.
(iii) If $G$ has $k$ connected components then $A$ has $k$ linearly independent eigenvectors with eigenvalue $d$.
(iv) If $G$ is bipartite then $A$ has $-d$ as an eigenvalue.
Parts (iii), (iv) are in fact "if and only if" statements, but we will only show one direction.
Proof. (i) Let $v \in \mathbb{R}^n$ be an eigenvector of $A$ with eigenvalue $\lambda$. Let $i$ be such that $|v_i|$ is maximal. Then
$$\lambda v_i = (Av)_i = \sum_{j \sim i} v_j,$$
and hence
$$|\lambda||v_i| \le \sum_{j \sim i} |v_j| \le |v_i|\sum_{j \sim i} 1 = d|v_i|,$$
so $|\lambda| \le d$.

(iii) Let $\vec{1}_S \in \mathbb{R}^n$ be the indicator vector for a set $S \subseteq V$. If $G$ has $k$ connected components, say $S_1, \ldots, S_k \subseteq V$, then $\vec{1}_{S_1}, \ldots, \vec{1}_{S_k}$ are all eigenvectors of $A$ with eigenvalue $d$, and they are clearly linearly independent.
(iv) If $G$ is bipartite, say $V = V_1 \cup V_2$ with $E \subseteq V_1 \times V_2$, then the vector $v = \vec{1}_{V_1} - \vec{1}_{V_2}$ has eigenvalue $-d$. Indeed, if $i \in V_1$ then all its neighbours lie in $V_2$, so
$$(Av)_i = \sum_{j \sim i} v_j = -d = -d\cdot v_i,$$
and similarly $(Av)_i = d = -d\cdot v_i$ for $i \in V_2$.
Order the eigenvalues of $A$ as $d = \lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$, and let $v_1, \ldots, v_n$ be a corresponding orthonormal basis of eigenvectors (which exists since $A$ is symmetric), with $v_1 = \frac{1}{\sqrt{n}}\vec{1}$. Define
$$\lambda = \lambda(G) = \max_{i \ge 2} |\lambda_i|.$$
We say that $G$ is a $\lambda$-expander if $\lambda(G) \le \lambda$. The following lemma shows that if $\lambda$ is small, then the number of edges between any two sets of vertices is close to what we would expect in a random $d$-regular graph.

Lemma 11.5 (Expander mixing lemma). Let $G$ be a $d$-regular $\lambda$-expander on $n$ vertices. Then for any $S, T \subseteq V$,
$$\left|E(S, T) - \frac{d|S||T|}{n}\right| \le \lambda\sqrt{|S||T|}.$$

Proof. Decompose $\vec{1}_S = \sum \alpha_i v_i$ and $\vec{1}_T = \sum \beta_i v_i$, where $\alpha_i = \langle \vec{1}_S, v_i\rangle$ and $\beta_i = \langle \vec{1}_T, v_i\rangle$. Then
$$E(S, T) = \vec{1}_S^T A \vec{1}_T = \sum_{i=1}^n \lambda_i \alpha_i \beta_i.$$
The terms for $i = 1$ correspond to the random graph case: $\alpha_1 = \langle \vec{1}_S, v_1\rangle = |S|/\sqrt{n}$, $\beta_1 = |T|/\sqrt{n}$ and $\lambda_1 = d$, so
$$\lambda_1 \alpha_1 \beta_1 = \frac{d|S||T|}{n}.$$
We can thus bound, using the Cauchy-Schwarz inequality,
$$\left|E(S, T) - \frac{d|S||T|}{n}\right| = \left|\sum_{i=2}^n \lambda_i \alpha_i \beta_i\right| \le \lambda\sum_{i=2}^n |\alpha_i||\beta_i| \le \lambda\sqrt{\sum_{i=2}^n \alpha_i^2}\sqrt{\sum_{i=2}^n \beta_i^2} \le \lambda\|\vec{1}_S\|_2\|\vec{1}_T\|_2 = \lambda\sqrt{|S||T|}.$$

How small can $\lambda$ be? Note that
$$\mathrm{Tr}(A^2) = \sum_{i=1}^n (A^2)_{i,i} = \sum_{i,j=1}^n A_{i,j}^2 = nd,$$
since every vertex has exactly $d$ neighbours. On the other hand, $\mathrm{Tr}(A^2) = \sum_{i=1}^n \lambda_i^2$. So
$$\sum_{i=2}^n \lambda_i^2 = nd - d^2 = d(n-d),$$
and hence $\lambda^2 \ge \frac{d(n-d)}{n-1}$; for $n \gg d$ this means that $\lambda$ cannot be much smaller than $\sqrt{d}$.
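It is easy to compute λ(G) numerically and to check the mixing bound on examples; here is a short numpy sketch using the complete graph, whose spectrum is known (λ_1 = n − 1 and all other eigenvalues are −1). The particular sets S, T are arbitrary illustrations.

import numpy as np

n = 8
A = np.ones((n, n)) - np.eye(n)         # adjacency matrix of K_n, d = n - 1
eigs = np.sort(np.linalg.eigvalsh(A))   # A is symmetric, so eigenvalues are real
d = eigs[-1]
lam = max(abs(eigs[0]), abs(eigs[-2]))  # lambda = max over i >= 2 of |lambda_i|
print("d =", d, "lambda =", lam)        # d = 7.0, lambda = 1.0

# Check the expander mixing lemma on one pair S, T.
S = np.zeros(n); S[:3] = 1              # |S| = 3
T = np.zeros(n); T[2:6] = 1             # |T| = 4
E_ST = S @ A @ T                        # counts ordered pairs, as in the notes
assert abs(E_ST - d * S.sum() * T.sum() / n) <= lam * np.sqrt(S.sum() * T.sum())
print("mixing lemma:", E_ST, "vs", d * S.sum() * T.sum() / n)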
11.3 Cheeger inequality
We will prove the following theorem, relating spectral expansion and edge expansion.

Theorem 11.9 (Cheeger inequality). For any $d$-regular graph $G$,
$$\frac{1}{2}(d - \lambda_2) \le h(G) \le \sqrt{2d(d - \lambda_2)}.$$
Let $v_1, \ldots, v_n$ be the eigenvectors of $A$ corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$. Since the matrix $A$ is symmetric, we can choose orthonormal eigenvectors, $\langle v_i, v_j\rangle = 1_{i=j}$. In particular, $v_1 = \frac{1}{\sqrt{n}}\vec{1}$. We will only prove the lower bound on $h(G)$, which is easier and is sufficient for our goals: to show nontrivial edge expansion, it suffices to show that $\lambda_2$ is bounded away from $d$.
We start with a general characterization of $\lambda_2$.

Claim 11.10. $\lambda_2 = \sup_{w \in \mathbb{R}^n,\ \langle w, \vec{1}\rangle = 0} \frac{w^T A w}{w^T w}$.

Proof. Let $w \in \mathbb{R}^n$ be such that $\langle w, \vec{1}\rangle = 0$. We can decompose $w = \sum_{i=2}^n \alpha_i v_i$. Then $w^T w = \sum_{i=2}^n \alpha_i^2$ and $w^T A w = \sum_{i=2}^n \lambda_i \alpha_i^2$. Since $\lambda_i \le \lambda_2$ for all $i \ge 2$, we obtain that
$$w^T A w \le \lambda_2 w^T w.$$
Clearly, if $w = v_2$ then $w^T A w = \lambda_2 w^T w$, hence the claim follows.
So, to prove the lower bound $h(G) \ge (d - \lambda_2)/2$, which is equivalent to $\lambda_2 \ge d - 2h(G)$, we just need to exhibit an appropriate vector $w$. We do so in the next lemma.
Lemma 11.11. Let $S \subseteq V$ be a set with $|S| \le |V|/2$ for which $h(G) = \frac{E(S, S^c)}{|S|}$. Define $w \in \mathbb{R}^n$ by
$$w = \vec{1}_S - \frac{|S|}{n}\vec{1}.$$
Then $\langle w, \vec{1}\rangle = 0$ and
$$\frac{w^T A w}{w^T w} \ge d - 2h(G).$$

Proof. It is clear that $\langle w, \vec{1}\rangle = 0$. For the latter claim, we first compute
$$w^T w = \left(\vec{1}_S - \frac{|S|}{n}\vec{1}\right)^T\left(\vec{1}_S - \frac{|S|}{n}\vec{1}\right) = |S| - \frac{|S|^2}{n}.$$
Next, $Aw = A\vec{1}_S - \frac{d|S|}{n}\vec{1}$, and since $\langle w, \vec{1}\rangle = 0$ we have
$$w^T A w = w^T A \vec{1}_S = \vec{1}_S^T A \vec{1}_S - \frac{d|S|^2}{n} = E(S, S) - \frac{d|S|^2}{n} = d|S| - E(S, S^c) - \frac{d|S|^2}{n}.$$
We now plug in $E(S, S^c) = |S|h(G)$ and obtain that
$$w^T A w = w^T w \cdot d - |S|h(G) = w^T w\left(d - \frac{|S|}{|S| - |S|^2/n}h(G)\right) = w^T w\left(d - \frac{n}{n - |S|}h(G)\right) \ge w^T w\,(d - 2h(G)),$$
where the last inequality uses $|S| \le n/2$.
11.4
We saw one notion under which expanders are robust: deleting a few edges can only disconnect a few vertices. Now we will see another: random walks mix fast. This will require a bound on $|\lambda_i|$ for all $i \ge 2$, which is how we defined $\lambda$-expanders.

A random walk in a $d$-regular graph $G$ is defined as one expects: given a current node $i \in V$, a neighbour $j$ of $i$ is selected uniformly, and we move to $j$. There is a simple characterization of the probability distribution on the nodes after one step, given by the normalized adjacency matrix. Define
$$\bar{A} = \frac{1}{d}A.$$
We can describe distributions over $V$ as vectors $\pi \in (\mathbb{R}^+)^n$, where $\pi_i$ is the probability that we are at node $i$.

Claim 11.12. Let $\pi \in (\mathbb{R}^+)^n$ be a distribution over the nodes. After taking one step in the random walk, the new distribution is $\bar{A}\pi$.
We next use this observation to show that if $\lambda < d$ then random walks in $G$ converge fast to the uniform distribution. The distance between distributions is the statistical distance, given by
$$\mathrm{dist}(\pi, \pi') = \frac{1}{2}\sum_i |\pi_i - \pi'_i| = \frac{1}{2}\|\pi - \pi'\|_1.$$
It can be shown that this is also equal to the largest probability by which an event can distinguish $\pi$ from $\pi'$,
$$\mathrm{dist}(\pi, \pi') = \max_{F \subseteq [n]} \sum_{i \in F} (\pi_i - \pi'_i).$$
Lemma 11.13. Let $G$ be a $d$-regular $\lambda$-expander on $n$ vertices, let $\pi_0$ be any initial distribution on the vertices, and let $\pi_t$ be the distribution after $t$ steps of the random walk. Then
$$\|\pi_t - U\|_1 \le n(\lambda/d)^t,$$
where $U$ denotes the uniform distribution.

Proof. Decompose $\pi_0 = \sum \alpha_i v_i$, where $\alpha_1 = \langle \pi_0, v_1\rangle = \frac{1}{\sqrt{n}}\langle \pi_0, \vec{1}\rangle = \frac{1}{\sqrt{n}}$, and hence $\alpha_1 v_1 = \frac{1}{n}\vec{1} = U$ is the uniform distribution. We have that $\pi_t = \bar{A}^t\pi_0$. The eigenvectors of $\bar{A}$ are $v_1, \ldots, v_n$ with eigenvalues $1 = \lambda_1/d, \lambda_2/d, \ldots, \lambda_n/d$. Hence
$$\pi_t = \sum_{i=1}^n \alpha_i(\lambda_i/d)^t v_i \qquad \text{and} \qquad \pi_t - U = \sum_{i=2}^n \alpha_i(\lambda_i/d)^t v_i.$$
In order to bound $\|\pi_t - U\|_1$, it will be easier to first bound $\|\pi_t - U\|_2$, and then use the Cauchy-Schwarz inequality: for any $w \in \mathbb{R}^n$ we have
$$\|w\|_1^2 = \Big(\sum_{i=1}^n |w_i|\Big)^2 \le n\sum_{i=1}^n |w_i|^2 = n\|w\|_2^2.$$
Since $|\alpha_i| = |\langle \pi_0, v_i\rangle| \le \|\pi_0\|_2 \le \|\pi_0\|_1 = 1$ for all $i$, we get
$$\|\pi_t - U\|_2^2 = \sum_{i=2}^n \alpha_i^2(\lambda_i/d)^{2t} \le n(\lambda/d)^{2t}.$$
Hence,
$$\|\pi_t - U\|_1 \le n(\lambda/d)^t.$$

Corollary 11.14. The diameter of $G$ is at most $\frac{2\log n}{\log(d/\lambda)}$.
Proof. Fix any $i, j \in V$. The probability that a random walk of length $t$ which starts at $i$ reaches $j$ is at least $1/n - n(\lambda/d)^t$. If $t = \frac{c\log n}{\log(d/\lambda)}$ then the error term is bounded by
$$n(\lambda/d)^t \le n^{-(c-1)}.$$
So, for c > 2 the error term is < 1/n, and hence there is a positive probability to reach j
from i within t steps. In particular, their distance is bounded by t.
11.5
We next show another property of random walks on expanders: they don't stay trapped in small sets.
Theorem 11.15. Let $G = (V, E)$ be a $d$-regular $\lambda$-expander on $n$ vertices, and let $S \subseteq V$. Let $i_0$ be uniformly chosen in $V$, and let $i_0, i_1, \ldots, i_t$ be a random walk starting at $i_0$. Then
$$\Pr[i_0, i_1, \ldots, i_t \in S] \le \left(\frac{|S|}{n} + \frac{\lambda}{d}\left(1 - \frac{|S|}{n}\right)\right)^t.$$

Proof. Let $\Pi_S$ be the $n \times n$ diagonal matrix with $(\Pi_S)_{i,i} = 1$ if $i \in S$ and $0$ otherwise; note that $\Pi_S^2 = \Pi_S$. Let $\pi_0$ be the distribution of $i_0$ and let $\pi'_0 = \frac{\Pi_S \pi_0}{\Pr[i_0 \in S]}$ be the distribution of $i_0$ conditioned on $i_0 \in S$. By Claim 11.12, the distribution of $i_1$ conditioned on $i_0 \in S$ is $\bar{A}\pi'_0$, so $\Pr[i_1 \in S \mid i_0 \in S] = \vec{1}_S^T\bar{A}\pi'_0$, and the distribution of $i_1$ conditioned on $i_0, i_1 \in S$ is
$$\pi'_1 = \frac{\Pi_S\bar{A}\pi'_0}{\Pr[i_1 \in S \mid i_0 \in S]} = \frac{\Pi_S\bar{A}\Pi_S\pi_0}{\Pr[i_1 \in S \mid i_0 \in S]\cdot\Pr[i_0 \in S]} = \frac{\Pi_S\bar{A}\Pi_S\pi_0}{\Pr[i_0, i_1 \in S]}.$$
Similarly, $\Pr[i_2 \in S \mid i_0, i_1 \in S] = \vec{1}_S^T\bar{A}\pi'_1$, and
$$\pi'_2 = \frac{\Pi_S\bar{A}\Pi_S\bar{A}\Pi_S\pi_0}{\Pr[i_0, i_1, i_2 \in S]}.$$
More generally, exploiting the fact that $\Pi_S^2 = \Pi_S$, the distribution of $i_t$ conditioned on $i_0, \ldots, i_t \in S$ is given by
$$\pi'_t = \frac{(\Pi_S\bar{A}\Pi_S)^t\pi_0}{\Pr[i_0, \ldots, i_t \in S]}.$$
Let $M = \Pi_S\bar{A}\Pi_S$, and let $\mu$ be its largest eigenvalue in absolute value. Summing the coordinates in the identity above gives
$$\Pr[i_0, \ldots, i_t \in S] = \vec{1}^T M^t\pi_0 = \vec{1}_S^T M^t(\Pi_S\pi_0) \le \|\vec{1}_S\|_2\cdot|\mu|^t\cdot\|\Pi_S\pi_0\|_2 = \sqrt{|S|}\cdot|\mu|^t\cdot\frac{\sqrt{|S|}}{n} \le |\mu|^t,$$
where we used that $M$ is symmetric and that $\pi_0$ is uniform. So it remains to bound $|\mu|$.

Let $w$ be a unit eigenvector of $M$ with eigenvalue $\mu$. The rows and columns of $M$ outside $S$ are zero, so $w$ is supported on $S$, and hence $\Pi_S w = w$. Decompose $w = \alpha v_1 + w^{\perp}$ where $\alpha = \langle w, v_1\rangle$ and $\langle w^{\perp}, v_1\rangle = 0$, and let $\beta = \|w^{\perp}\|_2$. We have
$$1 = \|w\|_2^2 = \alpha^2\|v_1\|_2^2 + \|w^{\perp}\|_2^2 = \alpha^2 + \beta^2$$
and
$$|\mu| = |w^T M w| = |w^T\bar{A}w| = \left|\alpha^2 + (w^{\perp})^T\bar{A}w^{\perp}\right| \le \alpha^2 + \frac{\lambda}{d}\beta^2 = \alpha^2 + \frac{\lambda}{d}(1 - \alpha^2).$$
So, to bound $|\mu|$ we need to bound $|\alpha|$. As $w$ is supported on $S$ we have
$$|\alpha| = |\langle w, v_1\rangle| = \frac{1}{\sqrt{n}}\left|\langle w, \vec{1}\rangle\right| = \frac{1}{\sqrt{n}}\left|\langle w, \vec{1}_S\rangle\right| \le \frac{1}{\sqrt{n}}\|w\|_2\|\vec{1}_S\|_2 = \sqrt{|S|/n},$$
and hence $\alpha^2 \le |S|/n$. We thus obtain the bound
$$|\mu| \le \frac{|S|}{n} + \frac{\lambda}{d}\left(1 - \frac{|S|}{n}\right).$$

11.6

Let $A$ be a randomized algorithm computing a function $f$ with one-sided error, using $m$ random bits: if $f(x) = 0$ then $A(x, r) = 0$ for all $r \in \{0,1\}^m$, while if $f(x) = 1$ then $\Pr_r[A(x, r) = 0] \le 1/2$. Running $A$ with $t$ independent random strings reduces the error to $2^{-t}$, but costs $tm$ random bits. Using a random walk on an expander, we can obtain a similar error reduction using only $m + O(t)$ random bits.
Lemma 11.16. Let $G = (V, E)$ be some $d$-regular $\lambda$-expander for $V = \{0,1\}^m$, where $\lambda < d = O(1)$ are constants. We treat nodes of $G$ as assignments to the random bits of $A$. Consider the following algorithm: choose a random $r_0 \in V$, and let $r_1, \ldots, r_t$ be obtained by a random walk on $G$ starting at $r_0$. On input $x$, we run $A(x, r_0), \ldots, A(x, r_t)$, and output 0 only if all the runs output 0. Then:
1. The new algorithm is a one-sided error algorithm with error $2^{-\Omega(t)}$.
2. The new algorithm uses only $m + O(t)$ random bits.
Proof. If $f(x) = 0$ then $A(x, r) = 0$ for all $r$, hence we will always return 0. So assume that $f(x) = 1$. Let $B = \{r \in \{0,1\}^m : A(x, r) = 0\}$ be the bad random strings, on which the algorithm makes a mistake. By assumption, $|B| \le |V|/2$. We will return 0 only if $r_0, \ldots, r_t \in B$. However, we know that
$$\Pr[r_0, \ldots, r_t \in B] \le \left(\frac{1}{2} + \frac{1}{2}\cdot\frac{\lambda}{d}\right)^t = p^t,$$
where $p = p(\lambda, d) < 1$ is a constant. So the probability of error is $2^{-\Omega(t)}$. The number of random bits required is as follows: $m$ random bits to choose $r_0$, but only $\log d = O(1)$ random bits to choose each $r_i$ given $r_{i-1}$. So the total number of random bits is $m + O(t)$.
References
[Bal11] Simeon Ball. A proof of the MDS conjecture over prime fields. In 3rd International Castle Meeting on Coding Theory and Applications, volume 5, page 41. Univ. Autònoma de Barcelona, 2011.
[FKS84] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM (JACM), 31(3):538-544, 1984.
[HLW06] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439-561, 2006.
[Kar93] David R. Karger. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 1993.
[KS96] David R. Karger and Clifford Stein. A new approach to the minimum cut problem. Journal of the ACM (JACM), 43(4):601-640, 1996.
[LG14]
[LLW06] Michael Luby and Avi Wigderson. Pairwise Independence and Derandomization. Now Publishers Inc, 2006.
[Rys63]
[Sch80]
[Sch99]
[Sha79]
[Str69]
[Zip79]