Advanced Algorithms Course. Lecture Notes. Part 10: Hashing
Hashing
(This part may be skipped if you already know hashing very well from Data
Structures courses.)
Let U be a universe (a huge set) of elements. A dictionary is a data
structure that keeps track of a set S ⊆ U and supports the following operations:
insert, delete, lookup. That is, a dictionary enables us to quickly
insert elements into a set, delete elements from a set, or retrieve elements of
the set. Hash tables are among the most well known implementations of
dictionaries. In the following, n is always some fixed size bound much smaller
than |U|. A hash table H is an array of size n, with indices 0, . . . , n − 1,
where n ≥ |S|. That is, H allocates enough space for storing sets S of at
most n elements. However, several elements may be stored in the same entry
of H, for example as a list. Then we speak of collisions.
A hash function h maps U onto this index set. In order to execute any
of the dictionary operations for an element, we compute the index of that
element and access the corresponding entry of H. Of course, h must be
easily computable, and it is essential that our hash function keeps collisions
to a minimum: If many elements are stored in the same entry, we still
have to search for the desired element there, and this would slow down the
dictionary operation. Since U is much larger than n, collisions cannot be
avoided, but with a good randomized approach we can keep their expected
number small. In the following, note again that randomness is only in the
algorithm (here: in the design of our hash function h), but we do not make
any assumptions on the set S we want to store, other than |S| ≤ n.
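To make the dictionary operations concrete, here is a minimal Python sketch
of a hash table with chaining. The class name ChainedHashTable and its
interface are our own illustration; the hash function h is simply passed in as
a parameter, and a concrete randomized choice of h is the topic of the rest
of this part.

class ChainedHashTable:
    """Minimal sketch of a hash table with chaining (illustration only)."""

    def __init__(self, n, h):
        self.n = n                              # number of entries
        self.h = h                              # hash function: element -> 0..n-1
        self.table = [[] for _ in range(n)]     # one chain (list) per entry

    def insert(self, x):
        chain = self.table[self.h(x)]
        if x not in chain:                      # avoid storing duplicates
            chain.append(x)

    def delete(self, x):
        chain = self.table[self.h(x)]
        if x in chain:
            chain.remove(x)

    def lookup(self, x):
        return x in self.table[self.h(x)]       # search only one chain

# Toy usage with a fixed (non-random) hash function, just for demonstration:
T = ChainedHashTable(10, lambda x: x % 10)
T.insert(42); T.insert(7)
print(T.lookup(42), T.lookup(13))               # True False

Every operation first computes h(x) and then works only on the chain of that
single entry, so the cost is dominated by the length of that chain.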
Here is a classical simple hashing scheme, along with a rigorous analysis
of its performance. We will choose h at random from a certain class of easily
computable functions. We call a function class universal if for any pair of
distinct elements u, v ∈ U the probability of h(u) = h(v), over the random
choice of h from the class, is at most 1/n. This is a good
property for hashing because, if we pick a random h from a universal class
then, for any fixed element u, the expected number of other elements s ∈ S
with h(s) = h(u) is at most 1, so we rarely get large clusters of elements in
the same entry of H. Thus our dictionary will be able to do any operation
in O(1) expected time.
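Spelled out, the expectation bound behind this claim is the following, for a
fixed element u ∈ S and h drawn at random from a universal class:

E[number of s ∈ S with s ≠ u and h(s) = h(u)] = ∑_{s ∈ S, s ≠ u} Pr[h(s) = h(u)] ≤ (|S| − 1)/n < 1,

by linearity of expectation and |S| ≤ n.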
But do such universal classes of functions exist? Trivially, the class of
all functions from U into the index set has this property. But what would
it mean to choose a random h from the class of all functions? Since the
values of such h are random and independent, h has no structure, and we
can compute the values of h for given elements only by looking them up in
a table of size |U|, which is against the very idea of hashing. We need
a restricted class of functions which are easily computable but still spread
out the elements of any subset of at most n elements well. One construction
comes from elementary number theory.
We choose a prime number p slightly larger than our n. (Prime numbers
are dense enough in the set of integers, so we will always find such a p. We do
not go into details of this preprocessing step.) We represent the elements of
U as vectors x = (x1, . . . , xr) with 0 ≤ xi < p for all i. The dimensionality
we need is clearly r ≈ log |U| / log p. (This may look complicated, but note
that these vectors can be seen as arbitrary names of the elements.) For
every a = (a1, . . . , ar) with 0 ≤ ai < p we define a hash function
ha(x) = (∑_{i=1}^{r} ai xi) mod p.
For any given x ∈ U these values are really easy to compute. It remains
to analyze the collisions. We will see that the class of all functions ha is
universal.
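As an illustration, here is a small Python sketch of this family. The helper
names make_hash and to_vector and the use of Python's random module are our
own choices; only the formula ha(x) = (∑ ai xi) mod p is taken from the
construction above. Note that the hash values lie in 0, . . . , p − 1, so the
table would get p entries, where p is only slightly larger than n.

import random

def make_hash(p, r):
    """Pick a random member h_a of the family, i.e. a random vector a."""
    a = [random.randrange(p) for _ in range(r)]
    return lambda x: sum(ai * xi for ai, xi in zip(a, x)) % p

def to_vector(u, p, r):
    """Write an integer name u in base p, giving the vector (x_1, ..., x_r)."""
    x = []
    for _ in range(r):
        x.append(u % p)
        u //= p
    return x

# Example usage: p = 101 is a prime slightly larger than n = 100,
# and r = 4 handles a universe of size up to p**r.
p, r = 101, 4
h = make_hash(p, r)
print(h(to_vector(123456, p, r)))   # some index in 0..p-1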
Very little help from number theory is needed: If p is a prime number,
and z ≠ 0 mod p, then az = bz mod p implies a = b mod p for any two
numbers a, b. (The proof is straightforward: since p is prime and does not
divide z, z has a multiplicative inverse mod p, and we can multiply both
sides by it.)
Using this fact we show, for any two distinct x, y ∈ U, that ha(x) = ha(y)
happens with probability at most 1/p. (Recall where this probability comes
from: We took some random a.) Since x ≠ y, their vectors must differ
somewhere. Hence, let j be some position where xj ≠ yj. A nice trick
makes the probability calculation extremely simple: Instead of considering a
random a, we fix all ai, i ≠ j, and choose only the component aj randomly,
where 0 ≤ aj < p. Then the probability result applies also to a random
vector a. (Why?) By the construction of ha, a collision ha(x) = ha(y)
appears if and only if aj(yj − xj) = ∑_{i≠j} ai(xi − yi) mod p. Since we have
fixed the right-hand side, we can treat it as a constant, say m. Now define
z := yj − xj. Due to the above number-theoretic fact, there exists exactly
one aj with aj z = m mod p. Hence the probability of a collision is 1/p ≤ 1/n,
and our hash table can execute dictionary operations in O(1) expected time.
A final remark: There is often confusion about the time complexity of
hash table operations. O(1) is the expected number of arithmetic operations.
But the bit complexity is not constant; it grows logarithmically in the size
of the sets we want to deal with. Thus, hash tables are asymptotically not
faster than other dictionary implementations such as balanced search trees.
The real advantage of hash tables is elsewhere: They are easy to implement
(just evaluation of some simple functions) and use only arithmetic, which
is physically faster than manipulations with pointers, etc., that would be
needed to implement trees.
Closest Points
For the problem of finding a closest pair of n points in the plane there
exists a divide-and-conquer algorithm running in O(n log n) time. It follows
a simple idea but is a bit complicated when it comes to the implementation
details. A simpler randomized algorithm solves the problem already in O(n)
expected time plus O(n) dictionary operations.
We can always assume that our n points are in a unit square. In our
algorithm we maintain a real number d which is the smallest distance be-
tween two points known so far. We consider the n points in random order.
For every new point p we test whether p has distance smaller than d to
some earlier point, and in this case we update d. For an efficient test we
have to avoid computing the distances to all earlier points. Therefore we
divide the unit square into squares of side length d/2. Since d is the smallest
distance, at most one earlier point can be located in each square. Moreover,
those points which might have a distance smaller than d to p are in squares
close to the square containing p, more precisely, they are in a 5 × 5 grid of
squares. Thus we have to test at most 24 candidates in every step. Hence
O(n) computations are enough, for all n points. So far we have not even
used the fact that points are processed in random order.
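As a sketch of this test in Python: we assume a dictionary grid that maps a
cell (a pair of integer indices) to the single earlier point stored in it; the
names cell_of and closer_point are our own, and a Python dict stands in for
the hash table discussed next.

import math

def cell_of(point, d):
    """Indices of the square of side length d/2 containing the point."""
    x, y = point
    return (int(x // (d / 2)), int(y // (d / 2)))

def closer_point(p, d, grid):
    """Return the nearest earlier point at distance < d from p, or None.
    Only the 5 x 5 block of cells around p's cell has to be inspected."""
    ci, cj = cell_of(p, d)
    best = None
    for i in range(ci - 2, ci + 3):
        for j in range(cj - 2, cj + 3):
            q = grid.get((i, j))                # at most one point per cell
            if q is not None and math.dist(p, q) < d:
                if best is None or math.dist(p, q) < math.dist(p, best):
                    best = q
    return best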
However, some complications begin here: We need to know which points
are in the candidate squares! For this purpose we may use a hash table, with
an entry for every point. But whenever d is diminished, our partitioning into
squares of side length d/2 changes totally, and we have to create a new
hash table from scratch. How often do we have to insert our points into
the various hash tables? Only here does the randomized order of the points
become important.
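Continuing the sketch from above, the rebuilding step and the overall loop
might look as follows. The names rebuild and closest_pair_distance are again
our own, cell_of and closer_point are reused from the previous sketch, and we
assume at least two distinct points (so that d > 0).

import math
import random

def rebuild(points_so_far, d):
    """Fresh dictionary for the new side length d/2; every processed point
    is inserted again.  Since d is the smallest distance seen so far,
    each cell receives at most one point."""
    grid = {}
    for q in points_so_far:
        grid[cell_of(q, d)] = q
    return grid

def closest_pair_distance(points):
    points = list(points)
    random.shuffle(points)                      # random insertion order
    d = math.dist(points[0], points[1])
    grid = rebuild(points[:2], d)
    for i in range(2, len(points)):
        p = points[i]
        q = closer_point(p, d, grid)
        if q is not None:                       # p is closer than d to some earlier point
            d = math.dist(p, q)                 # update d ...
            grid = rebuild(points[:i + 1], d)   # ... and rebuild from scratch
        else:
            grid[cell_of(p, d)] = p             # just insert p
    return d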
Let X be a random variable for the total number of insertions. Let Xi
be another random variable, with Xi = 1 if the ith point causes an update,
and Xi = 0 else. Clearly, X = n + ∑_i i·Xi, since every point is inserted
once anyway, and an update at the ith point forces us to reinsert the first
i points into the new hash table. The key fact is that Xi = 1
with probability at most 2/i: For each i, the first i points are randomly
ordered as well, hence the event that one of the two points of a closest
pair among them is the ith point has probability at most 2/i. Linearity of
expectation gives E[X] = n + ∑_i i·E[Xi] ≤ n + ∑_i i·(2/i) ≤ 3n. Thus, the
expected number of dictionary operations is O(n), and every dictionary
operation needs O(1) expected time. From these two facts it follows that the
total expected time is O(n). Stop! The latter conclusion seems obvious at
first glance. But we have to point out that a strict proof needs a careful
analysis of conditional expectations, since we combine here two different
sources of randomness. However, we omit this technical proof. We only
wanted to stress the efficiency and simplicity of a randomized approach.