Lecture 1: Course Intro and Hashing
1 Difference Between Grad and Undergrad Algorithms
Algorithms are integral to computer science and every computer scientist (even as an
undergrad) has designed several algorithms. So has many a physicist, electrical engineer,
mathematician, etc. This course is meant to be your one-stop shop that teaches you how
to design a variety of algorithms. The operative word is “variety.” In other words, you will
avoid the blinders that one often sees in domain experts. A Bayesian needs priors on the
data before he can design algorithms; an optimization expert wishes to cast all problems
as mathematical optimization; a systems designer has never seen any problem that cannot
be solved by hashing. (OK, mostly kidding but the joke does reflect truth to some degree.)
These and more domain-specific ideas make an appearance in our course, but we will learn
to not be wedded to any single approach.
The primary skill you will learn in this course is how to analyse algorithms: prove their
correctness, bound their running time, and establish any other relevant properties. Learning
to analyse a variety of algorithms (designed by others) will let you design better algorithms later in life.
I will try to fill the course with beautiful algorithms. Be prepared for frequent rose-smelling
stops, in other words.
The changing graph. In undergrad algorithms the graph is given and arbitrary (worst-
case). In grad algorithms we are willing to look at where the graph came from (social
network, computer vision etc.) since those properties may be germane to designing a good
algorithm. (This is not a radical idea of course but we will see that formulating good graph
models is not easy. This is why you see a lot of heuristic work in practice, without any
mathematical proofs of correctness.)
Changing data structures: In undergrad algorithms the data structures were simple
and often designed to hold data generated by other algorithms. A stack allows you to hold
vertices during depth-first search traversal of a graph, or instances of a recursive call to a
procedure. A heap is useful for sorting and searching.
But in the newer applications, data often comes from sources we don’t control. Thus it
may be noisy, or inexact, or both. It may be high-dimensional. Something like a heap
will then not suffice, and we need more advanced data structures.
We will encounter the “curse of dimensionality,” which constrains algorithm design for
high-dimensional data.
Type of analysis: In undergrad algorithms the algorithms were often exact and worked on
all (i.e., worst-case) inputs. In grad algorithms we are willing to relax these requirements.
Advanced Algorithm Design: Hashing
Lectured by Prof. Moses Charikar
Transcribed by Linpeng Tang∗
Feb 2nd, 2013
1 Preliminaries
In hashing, we want to store a subset S of a large universe U (U can be very
large; say |U| = 2^32, the set of all 32-bit integers), where |S| = m is relatively
small. For each x ∈ U, we want to support 3 operations:
• insert(x). Insert x into S.
• delete(x). Delete x from S.
• query(x). Check whether x ∈ S.
[Figure: a hash function h maps elements of the universe U into a table of n locations.]
A hash table can support all these 3 operations. We design a hash function

h : U −→ {0, 1, . . . , n − 1}    (1.1)

such that x ∈ U is placed in T[h(x)], where T is a table of size n.
Since |U| ≫ n, multiple elements can be mapped into the same location in
T, and we deal with these collisions by constructing a linked list at each location
in the table.
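To make this concrete, here is a minimal sketch of a chaining hash table in Python; the modular hash function and the fixed table size are illustrative assumptions, not prescribed by the lecture.

class ChainedHashTable:
    """Table of n locations; collisions resolved by a list per location."""

    def __init__(self, n):
        self.n = n
        self.table = [[] for _ in range(n)]  # T[i] holds the chain at location i

    def _h(self, x):
        # Stand-in hash function h : U -> {0, ..., n-1}.
        return hash(x) % self.n

    def insert(self, x):
        chain = self.table[self._h(x)]
        if x not in chain:
            chain.append(x)

    def delete(self, x):
        chain = self.table[self._h(x)]
        if x in chain:
            chain.remove(x)

    def query(self, x):
        return x in self.table[self._h(x)]

For example, t = ChainedHashTable(8); t.insert(42); t.query(42) returns True.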
One natural question to ask is: how long is the linked list at each location?
We make two kinds of assumptions:
∗ [email protected]
1. Assume the input is random.

2. Assume the hash function is random.
Assumption 1 may not be valid for many applications, since the input might
be correlated.
For Assumption 2, we construct a set of hash functions H, choose a random
function h ∈ H, and hope that, on average over this choice, we will achieve
good performance.
2 Hash Functions
Say we have a family of hash functions H, and for each h ∈ H, h : U −→ [n]¹.
What do we mean by saying these functions are random?
For any distinct x_1, x_2, . . . , x_m ∈ S (x_i ≠ x_j when i ≠ j), and any
a_1, a_2, . . . , a_m ∈ [n], ideally a random H should satisfy:

• Pr_{h∈H}[h(x_1) = a_1] = 1/n.

• Pr_{h∈H}[h(x_1) = a_1 ∧ h(x_2) = a_2] = 1/n^2. Pairwise independence.

• Pr_{h∈H}[h(x_1) = a_1 ∧ h(x_2) = a_2 ∧ · · · ∧ h(x_k) = a_k] = 1/n^k. k-wise independence.

• Pr_{h∈H}[h(x_1) = a_1 ∧ h(x_2) = a_2 ∧ · · · ∧ h(x_m) = a_m] = 1/n^m. Full independence
(note that here m = |U|). In this case we have n^m possible h (we store
h(x) for each x ∈ U), so we need m log n bits to represent each hash
function. Since m is usually very large, this is not practical.
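To see where the m log n bits come from: a fully random function can only be represented by explicitly storing an independent random value per key. The following Python sketch (the lazy dictionary is an implementation convenience, not from the lecture) makes this cost visible.

import random

class FullyRandomFunction:
    """A truly random h : U -> [n]: one stored value per key ever queried."""

    def __init__(self, n):
        self.n = n
        self.values = {}  # explicit table of h(x), log n bits per entry

    def __call__(self, x):
        # Fix h(x) to a uniform value in [n] the first time x is seen.
        if x not in self.values:
            self.values[x] = random.randrange(self.n)
        return self.values[x]

Representing h on all of U takes |U| log n bits, which is exactly why full independence is impractical.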
For any x, let L_x be the length of the linked list containing x; then L_x is just
the number of elements with the same hash value as x. Let the random variable

I_y = 1 if h(y) = h(x), and 0 otherwise.    (2.1)

So L_x = 1 + Σ_{y≠x} I_y, and

E[L_x] = 1 + Σ_{y≠x} E[I_y] = 1 + (m − 1)/n    (2.2)
Note that we don’t need full independence to prove this property, and pairwise
independence would actually suffice.
¹ We use [n] to denote the set {0, 1, . . . , n − 1}.
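A quick empirical check of (2.2), using Python's built-in PRNG as a stand-in for a random hash function (the parameters below are arbitrary):

import random

def avg_list_length(m, n, trials=1000):
    """Estimate E[L_x] by hashing m elements into n slots at random."""
    total = 0
    for _ in range(trials):
        h = [random.randrange(n) for _ in range(m)]   # h(x_1), ..., h(x_m)
        total += sum(1 for v in h if v == h[0])       # list length of x_1
    return total / trials

print(avg_list_length(100, 100))  # should be close to 1 + 99/100 = 1.99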
3 2-Universal Hash Families
Definition 3.1 (Carter-Wegman). A family H of hash functions is 2-universal if
for any x ≠ y ∈ U,

Pr_{h∈H}[h(x) = h(y)] ≤ 1/n    (3.1)

Note that this property is even weaker than pairwise independence.
We can design 2-universal hash families in the following way. Choose a prime
p ∈ {|U|, . . . , 2|U|}, and let

f_{a,b}(x) = ax + b mod p, for a ∈ {1, . . . , p − 1} and b ∈ {0, . . . , p − 1},    (3.2)

and let

h_{a,b}(x) = f_{a,b}(x) mod n    (3.3)
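A minimal Python sketch of this family; the particular prime P below (just above 2^32) is an illustrative choice for p, not specified in the lecture.

import random

P = 4294967311  # a prime slightly above 2^32, playing the role of p

def random_hash(n):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod n from the family."""
    a = random.randrange(1, P)  # a in {1, ..., p-1}
    b = random.randrange(P)     # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % P) % n

h = random_hash(100)
print(h(42), h(43))  # two table locations in [100]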
For x ≠ y, the pair (f_{a,b}(x), f_{a,b}(y)) ranges uniformly over pairs (s, t) ∈ [p]^2
with s ≠ t as we vary (a, b), so

Pr_{h∈H}[h(x) = h(y)] = (1/(p(p − 1))) · Σ_{s≠t} δ(s ≡ t mod n)
                      ≤ (1/(p(p − 1))) · p(p − 1)/n    (3.10)
                      = 1/n    (3.11)

where δ(E) is 1 if the event E holds and 0 otherwise. Inequality (3.10) follows because for each
s ∈ [p], we have at most (p − 1)/n different t such that s ≠ t and s ≡ t
(mod n).
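A quick empirical check of Definition 3.1, reusing random_hash from the sketch above (the inputs 42, 43 and the trial count are arbitrary):

def collision_rate(x, y, n, trials=100000):
    """Estimate Pr over (a, b) that h_{a,b}(x) == h_{a,b}(y)."""
    hits = 0
    for _ in range(trials):
        h = random_hash(n)
        if h(x) == h(y):
            hits += 1
    return hits / trials

print(collision_rate(42, 43, 100))  # should be at most about 1/100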
Can we design a collision-free hash table then? Say we have m elements,
and the hash table is of size n. Since for any x_1 ≠ x_2, Pr_h[h(x_1) = h(x_2)] ≤ 1/n,
the expected total number of collisions is just

E[Σ_{x_1≠x_2} δ(h(x_1) = h(x_2))] = Σ_{x_1≠x_2} Pr[h(x_1) = h(x_2)] ≤ (m choose 2) · 1/n    (3.12)

In particular, if n = m^2 then the expected number of collisions is less than 1/2,
so a random h from a 2-universal family is collision-free with probability at least 1/2.
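A sketch of this idea in Python, again reusing random_hash: with a table of size m^2, a few retries find a collision-free function (integer keys assumed).

def find_collision_free(S):
    """Resample h until no two keys of S collide (expects O(1) retries)."""
    n = len(S) ** 2  # table size m^2 makes expected collisions < 1/2
    while True:
        h = random_hash(n)
        if len({h(x) for x in S}) == len(S):  # all locations distinct
            return h, n

h, n = find_collision_free([3, 1, 4, 15, 9, 2, 6])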
[Figure: two-level hashing. In the top-level table with locations 0, . . . , n − 1,
a location i holding s_i elements is resolved by a second-level table of s_i^2 locations.]
4 Load Balance
In the load balancing problem, we can imagine that we are trying to put balls into
bins. If we have n balls and n bins, and we put each ball into a uniformly random bin,
then for a given i,

Pr[bin_i gets at least k balls] ≤ (n choose k) · 1/n^k ≤ 1/k!    (4.1)

(the second inequality uses (n choose k) ≤ n^k / k!).
By Stirling’s formula,

k! ∼ √(2πk) · (k/e)^k    (4.2)

If we choose k = O(log n / log log n), we can make 1/k! ≤ 1/n^2, since then
log k! = Θ(k log k) = Θ(log n) with a constant we can take to be at least 2. Then

Pr[∃ a bin with ≥ k balls] ≤ n · 1/n^2 = 1/n    (4.3)
So with probability larger than 1 − 1/n,²

max load ≤ O(log n / log log n)    (4.4)
Note that if we look at 2 random bins when each new ball comes in and put
the ball into the bin with fewer balls, we can achieve a maximum load of
O(log log n), which is a huge improvement.
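An empirical comparison of the two schemes, as a minimal Python sketch (n is an arbitrary illustrative size):

import random

def max_load_one_choice(n):
    """Throw n balls into n uniformly random bins; return the maximum load."""
    bins = [0] * n
    for _ in range(n):
        bins[random.randrange(n)] += 1
    return max(bins)

def max_load_two_choices(n):
    """For each ball, sample 2 random bins and use the currently emptier one."""
    bins = [0] * n
    for _ in range(n):
        i, j = random.randrange(n), random.randrange(n)
        bins[i if bins[i] <= bins[j] else j] += 1
    return max(bins)

n = 100000
print(max_load_one_choice(n))   # on the order of log n / log log n
print(max_load_two_choices(n))  # on the order of log log n, much smaller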
² This can easily be improved to 1 − 1/n^c for any constant c.