Lec 11 Hash Table
Hash Tables
11.1 Direct-address tables
Direct addressing is a simple technique that works
well when the universe U of keys is reasonably
small. Suppose that an application needs a dynamic
set in which each element has a key drawn from the
universe U = {0,1,…, m – 1} where m is not too
large. We shall assume that no two elements have the
same key.
To represent the dynamic set, we use an array, or
direct-address table, T[0, …, m − 1], in which
each position, or slot, corresponds to a key in the
universe U.
Implementation of direct-address table
DIRECT-ADDRESS-INSERT(T, x)
  T[key[x]] ← x
DIRECT-ADDRESS-DELETE(T, x)
  T[key[x]] ← NIL
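The two operations above can be sketched in Python; the `Element` class and its field names are assumptions for illustration, not part of the original pseudocode:

```python
class Element:
    """An element whose key is drawn from the universe U = {0, 1, ..., m-1}."""
    def __init__(self, key, data):
        self.key = key
        self.data = data

class DirectAddressTable:
    """Direct-address table T[0 .. m-1]: one slot per key in U."""
    def __init__(self, m):
        self.T = [None] * m

    def search(self, k):
        return self.T[k]       # O(1)

    def insert(self, x):
        self.T[x.key] = x      # T[key[x]] <- x

    def delete(self, x):
        self.T[x.key] = None   # T[key[x]] <- NIL
```

Every operation is a single array access, which is why all three run in O(1) worst-case time.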
11.2 Hash tables
The difficulty with direct addressing is obvious:
if the universe U is large (sometimes
unbounded), storing a table T of size |U| may
be impractical, or even impossible.
Furthermore, the set K of keys actually stored
may be so small relative to U that most of the
space allocated for T would be wasted. With
hashing, the storage requirement can be reduced to
Θ(|K|), while searching for an element in the
hash table still requires only O(1) time on average.
Implementation of hash table
With a hash function h : U → {0, 1, …, m − 1}, an
element with key k is stored in slot h(k) rather
than in slot k.
Collision resolution techniques:
Chaining
Putting all the elements that hash to
the same slot in a linked list.
Open addressing
One element in one position!
Implementation of chained hash
Functions of chained hash
CHAINED-HASH-INSERT(T, x)
insert x at the head of the list T[h(key[x])]
CHAINED-HASH-SEARCH(T, k)
search for an element with key k in the list
T[h(k)]
CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(key[x])]
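A minimal Python sketch of these three operations, using the division method k mod m as the (assumed) hash function and Python lists standing in for the linked-list chains:

```python
class ChainedHashTable:
    """Hash table with collisions resolved by chaining."""
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]  # one chain per slot

    def _h(self, k):
        return k % self.m  # division-method hash (an assumption here)

    def insert(self, key, value):
        # O(1): insert at the head of the list T[h(key)]
        self.T[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # Worst case proportional to the chain length
        for k, v in self.T[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # With singly linked chains, deletion costs a search as well
        chain = self.T[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```

Inserting keys 1 and 6 with m = 5 puts both in chain T[1], illustrating a collision resolved by chaining.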
Complexity of chained-hash functions
INSERT: O(1) worst case, assuming the
element x being inserted is not already present
in the table.
SEARCH: worst-case time proportional
to the length of the list.
DELETE: O(1) if the lists are doubly linked
and a pointer to the element is given; similar
to the cost of searching if the lists are singly
linked.
Analysis of hashing with chaining
Given a hash table T with m slots
that stores n elements, define the
load factor α = n/m (the average
number of elements stored in a
chain).
Assumption: simple uniform hashing
Simple uniform hashing: any given element is
equally likely to hash into any of the m slots,
independently of where any other element has
hashed to.
We assume simple uniform hashing; also,
computing the hash function takes O(1) time.
For j = 0, 1, …, m − 1, let us denote the length of the
list T[j] by nj, so that
n = n0 + n1 + … + nm−1,
and the average value of nj is E[nj] = n/m = α.
Theorem 11.1
In a hash table in which collisions are resolved by
chaining, an unsuccessful search takes expected
time Θ(1 + α), under the assumption of simple
uniform hashing.
Proof:
The average length of a list is α = n/m.
The expected number of elements examined in an
unsuccessful search is α.
The total time required (including the time for
computing h(k)) is O(1 + α).
Theorem 11.2
In a hash table in which collisions are resolved
by chaining, a successful search takes time
Θ(1 + α) on average, under the assumption
of simple uniform hashing.
Assume that the CHAINED-HASH-INSERT
procedure inserts a new element at the front of the
list instead of at the end.
Let Xij be the indicator random variable for the event
that the i-th and j-th inserted elements hash into the
same slot; under simple uniform hashing, E[Xij] = 1/m.
The expected number of elements examined in a
successful search is

  E[(1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} Xij)]
  = (1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} E[Xij])
  = (1/n) Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} 1/m)
  = 1 + (1/(nm)) Σ_{i=1}^{n} (n − i)
  = 1 + (1/(nm)) (Σ_{i=1}^{n} n − Σ_{i=1}^{n} i)
  = 1 + (1/(nm)) (n² − n(n + 1)/2)
  = 1 + (n − 1)/(2m)
  = 1 + α/2 − α/(2n).

The total time required for a successful search is
therefore Θ(2 + α/2 − α/(2n)) = Θ(1 + α).
11.3 Hash functions
What makes a good hash function? Ideally, each key
is equally likely to hash to any of the m slots:
  Σ_{k : h(k) = j} P(k) = 1/m for j = 0, 1, …, m − 1.
Example:
Assume the keys are real numbers k with 0 ≤ k < 1,
uniformly distributed. Then
  h(k) = ⌊km⌋
satisfies this condition.
Interpreting keys as natural numbers
In many cases, we can assume the universe of
keys is the set N = {0, 1, 2, …}; keys that are not
natural numbers (e.g., character strings) can be
interpreted as natural numbers in a suitable
radix notation.
11.3.1 The division method
  h(k) = k mod m
Suggestion: choose m to be a prime not
too close to an exact power of 2:
m = 2^p ⇒ h(k) is just the p lowest-order bits of k
m = 2^p − 1 and k a character string interpreted in
radix 2^p ⇒ h(k1) = h(k2) whenever k1 is a
permutation of k2 (Exercise 11.3-3)
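A small Python check of the first point: with m = 2^p, keys that agree in their p lowest-order bits always collide, regardless of their high-order bits. The specific keys and moduli below are made up for illustration:

```python
def h_div(k, m):
    return k % m  # division-method hash

m_pow2 = 2 ** 8    # m = 2^p keeps only the 8 lowest-order bits of k
m_prime = 701      # a prime not too close to a power of 2

# Two keys with the same low byte 0xAB but very different high-order bits:
k1, k2 = 0x1234AB, 0x9999AB
```

With `m_pow2` the two keys collide unavoidably; with the prime modulus they happen to land in different slots.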
11.3.2 The multiplication method
  h(k) = ⌊m (kA mod 1)⌋,
where 0 < A < 1 and (kA mod 1) is the fractional
part of kA.
Suggestion (Knuth): choose m = 2^p and
A ≈ (√5 − 1)/2.
Implementation with w-bit words: let s = ⌊A · 2^w⌋;
compute k · s = r1 · 2^w + r0; then h(k) is given by
the p highest-order bits of r0.
Example:
k = 123456, p = 14, m = 2^14 = 16384,
A = (√5 − 1)/2 = 0.61803…
h(k) = ⌊16384 × (123456 × A mod 1)⌋
     = ⌊16384 × 0.0041151…⌋
     = ⌊67.4219⌋ = 67
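The example can be reproduced in Python; floating-point arithmetic is accurate enough here, though a real implementation would use the integer w-bit scheme described above:

```python
import math

def h_mul(k, p=14, A=(math.sqrt(5) - 1) / 2):
    """Multiplication-method hash: h(k) = floor(m * (k*A mod 1)) with m = 2^p."""
    m = 2 ** p
    frac = (k * A) % 1.0   # fractional part of k*A
    return int(m * frac)   # floor, since frac >= 0
```

`h_mul(123456)` returns 67, matching the worked example.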
11.4 Open addressing
All elements are stored in the hash table
itself (no chains). The hash function takes the
probe number as a second argument:
  h : U × {0, 1, …, m − 1} → {0, 1, …, m − 1},
and for every key k the probe sequence
⟨h(k, 0), h(k, 1), …, h(k, m − 1)⟩ is a
permutation of ⟨0, 1, …, m − 1⟩.
Linear probing:
  h(k, i) = (h′(k) + i) mod m,
where h′ is an auxiliary hash function.
It suffers from the primary clustering problem.
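The linear-probing probe sequence is easy to write down explicitly; h′ here is taken to be the division-method hash, an assumption for illustration:

```python
def linear_probe_sequence(k, m):
    """Slots examined by linear probing, in order: a permutation of 0..m-1."""
    h1 = k % m  # auxiliary hash h'(k) (assumed)
    return [(h1 + i) % m for i in range(m)]
```

Because successive probes simply step through consecutive slots, the full sequence visits every slot exactly once.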
Quadratic probing:
  h(k, i) = (h′(k) + c1·i + c2·i²) mod m,
where h′ is an auxiliary hash function and c1, c2 ≠ 0
are auxiliary constants.
It suffers from a milder form, the secondary
clustering problem.
Linear probing vs. quadratic probing
In both methods, h(k1, 0) = h(k2, 0) implies
h(k1, i) = h(k2, i) for all i.
With linear probing, h(k1, i) = h(k2, j) makes
h(k1, i + 1) = h(k2, j + 1) likely to hold;
with quadratic probing, this is unlikely.
Double hashing:
  h(k, i) = (h1(k) + i · h2(k)) mod m
Example (m = 13): h1(k) = k mod 13,
h2(k) = 1 + (k mod 11), with keys 79, 69, 98, 72, 50
already in slots 1, 4, 5, 7, 11.
To INSERT 14: h1(14) = 1 is occupied (by 79), and
h2(14) = 4, so the probe sequence is slots 1, 5, 9, …;
slot 5 is also occupied (by 98), and 14 is placed in
slot 9, the first empty slot probed.
Example:
  h1(k) = k mod m
  h2(k) = 1 + (k mod m′)
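A Python sketch of open addressing with double hashing, using the example's h1(k) = k mod 13 and h2(k) = 1 + (k mod 11); the insertion order of the pre-existing keys is an assumption chosen to reproduce the table state above:

```python
class DoubleHashTable:
    """Open-address hash table with double hashing (keys only, no satellite data)."""
    def __init__(self, m=13, m2=11):
        self.m, self.m2 = m, m2
        self.T = [None] * m

    def _probe(self, k, i):
        h1 = k % self.m
        h2 = 1 + (k % self.m2)  # never 0, so the whole table can be reached
        return (h1 + i * h2) % self.m

    def insert(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.T[j] is None:
                self.T[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.T[j] is None:
                return None  # an empty slot ends the probe sequence
            if self.T[j] == k:
                return j
        return None
```

Inserting 79, 72, 98, 69, 50 in that order and then 14 places 14 in slot 9, after probing the occupied slots 1 and 5, as in the example.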
Double hashing vs.
linear or quadratic probing
Double hashing represents an
improvement over linear and
quadratic probing in that Θ(m²)
probe sequences are used, rather than
Θ(m). Its performance is very close
to that of uniform hashing.
Analysis of open-address hashing
Theorem 11.6
Given an open-address hash table with
load factor α = n/m < 1, the expected
number of probes in an unsuccessful
search is at most 1/(1 − α), assuming
uniform hashing.
Proof.
Let X be the number of probes made in an
unsuccessful search, and define
  pi = Pr(exactly i probes access occupied slots)
for 0 ≤ i ≤ n, with pi = 0 for i > n.
The expected number of probes is 1 + Σ_{i≥0} i·pi.
Using the identity
  E[X] = Σ_{i≥0} i · Pr{X = i}
       = Σ_{i≥0} i (Pr{X ≥ i} − Pr{X ≥ i + 1})
       = Σ_{i≥1} Pr{X ≥ i},
we can work with
  qi = Pr(at least i probes access occupied slots):
  q1 = n/m
  q2 = (n/m)((n − 1)/(m − 1))
  qi = (n/m)((n − 1)/(m − 1)) ⋯ ((n − i + 1)/(m − i + 1))
     ≤ (n/m)^i = α^i, if 1 ≤ i ≤ n,
and qi = 0 for i > n. Therefore
  1 + Σ_{i≥0} i·pi = 1 + Σ_{i≥1} qi
                   ≤ 1 + α + α² + ⋯ = 1/(1 − α).
Example:
α = 0.1: 1/(1 − α) ≈ 1.1 probes
α = 0.5: 1/(1 − α) = 2 probes
α = 0.9: 1/(1 − α) = 10 probes
Corollary 11.7
Inserting an element into an open-
address hash table with load factor α
requires at most 1/(1 − α) probes on
average, assuming uniform hashing.
Proof.
An element is inserted only if there is
room in the table, and thus α < 1.
Inserting a key requires an unsuccessful
search followed by placement of the key
in the first empty slot found. Thus, the
expected number of probes is at most
1/(1 − α).
Theorem 11.8
Given an open-address hash table with load
factor α < 1, the expected number of probes in
a successful search is at most
  (1/α) ln(1/(1 − α)),
assuming uniform hashing and assuming that
each key in the table is equally likely to be
searched for.
Proof.
A search for a key k follows the same probe
sequence as was followed when the element
with key k was inserted.
If k was the (i + 1)st key inserted into the hash
table, the load factor at the time it was inserted
was i/m, so by Corollary 11.7 the expected
number of probes made in a search for k is at
most 1/(1 − i/m) = m/(m − i).
Averaging over all n keys in the hash table
gives us the average number of probes in a
successful search:

  (1/n) Σ_{i=0}^{n−1} m/(m − i)
  = (m/n) Σ_{i=0}^{n−1} 1/(m − i)
  = (1/α) Σ_{k=m−n+1}^{m} 1/k
  = (1/α)(H_m − H_{m−n})
      (harmonic numbers H_i = Σ_{j=1}^{i} 1/j)
  ≤ (1/α) ∫_{m−n}^{m} (1/x) dx
  = (1/α) ln(m/(m − n))
  = (1/α) ln(1/(1 − α)).
Example:
α = 0.1: (1/0.1) ln(1/(1 − 0.1)) ≈ 1.054
α = 0.5: (1/0.5) ln(1/(1 − 0.5)) ≈ 1.386
α = 0.9: (1/0.9) ln(1/(1 − 0.9)) ≈ 2.558
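The three values above can be checked numerically:

```python
import math

def successful_search_bound(alpha):
    """Theorem 11.8 upper bound: (1/alpha) * ln(1/(1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))
```

The bound grows quite slowly until the table is nearly full, which is why open addressing remains practical at moderate load factors.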