Lec 11 Hash Table

Hash tables provide an efficient way to store key-value pairs when the universe of possible keys is large or unbounded. Hash tables use a hash function to map each key to an integer index in an array. Collisions occur when two keys hash to the same index. Collision resolution techniques include chaining, where colliding keys are stored in a linked list at the index, and open addressing, where keys try alternate indices. Common hash functions include division and multiplication. Linear probing is a basic open addressing scheme but can cause clustering issues. Quadratic probing and double hashing are more sophisticated open addressing methods.


11. Hash Tables
11.1 Direct-address tables
• Direct addressing is a simple technique that works well when the universe U of keys is reasonably small. Suppose that an application needs a dynamic set in which each element has a key drawn from the universe U = {0, 1, …, m − 1}, where m is not too large. We shall assume that no two elements have the same key.
• To represent the dynamic set, we use an array, or direct-address table, T[0 … m − 1], in which each position, or slot, corresponds to a key in the universe U.
Implementation of direct-address table
• Example: U = {0, 1, …, 9}, K = {2, 3, 5, 8}


Functions of direct addressing
DIRECT-ADDRESS-SEARCH(T, k)
  return T[k]

DIRECT-ADDRESS-INSERT(T, x)
  T[key[x]] ← x

DIRECT-ADDRESS-DELETE(T, x)
  T[key[x]] ← NIL
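The three operations above can be sketched in Python; the `DirectAddressTable` class name and the stored string values are illustrative, not from the slides:

```python
class DirectAddressTable:
    """Minimal sketch of a direct-address table T[0 .. m-1]:
    each key in U = {0, ..., m-1} owns its own slot."""

    def __init__(self, m):
        self.slots = [None] * m          # one slot per possible key

    def search(self, k):
        return self.slots[k]             # DIRECT-ADDRESS-SEARCH: return T[k]

    def insert(self, key, value):
        self.slots[key] = value          # DIRECT-ADDRESS-INSERT: T[key[x]] <- x

    def delete(self, key):
        self.slots[key] = None           # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL


T = DirectAddressTable(10)               # U = {0, 1, ..., 9}
for k in (2, 3, 5, 8):                   # K = {2, 3, 5, 8}
    T.insert(k, f"element-{k}")
print(T.search(5))   # → element-5
print(T.search(4))   # → None (no element with key 4)
```

All three operations are single array accesses, which is why each runs in O(1) worst-case time.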
11.2 Hash tables
• The difficulty with direct addressing is obvious: if the universe U is large (sometimes unbounded), storing a table T of size |U| may be impractical, or even impossible. Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted. With a hash table, the storage requirement can be reduced to Θ(|K|), while searching for an element still takes only O(1) time on average.
Implementation of hash table
• A hash function h maps keys to hash-table slots.
• Keys k2 and k5 map to the same slot, so they collide.
Some terminology and principles
• Universe set: U, the set of all possible key values.
• Hash function: h : U → {0, 1, …, m − 1}
• Hash table: T[0 … m − 1]
• We say an element with key k hashes to slot h(k); we also say that h(k) is the hash value of key k.
• Collision: two keys hash to the same slot in the hash table.
• Trade-off: a smaller hash table may introduce more collisions.
Collision resolution techniques:
• Chaining: put all the elements that hash to the same slot in a linked list.
• Open addressing: store at most one element in each slot.
Implementation of chained hashing
• (Figure: each slot T[j] heads a linked list of the keys that hash to j.)
Functions of chained hashing
• CHAINED-HASH-INSERT(T, x)
  insert x at the head of the list T[h(key[x])]
• CHAINED-HASH-SEARCH(T, k)
  search for the element with key k in the list T[h(k)]
• CHAINED-HASH-DELETE(T, x)
  delete x from the list T[h(key[x])]
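A minimal chaining sketch in Python; here each slot holds a Python list playing the role of the linked list T[h(k)], and the hash function k mod m is an assumption for illustration:

```python
class ChainedHashTable:
    """Minimal sketch of hashing with chaining."""

    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]  # one chain per slot

    def _h(self, k):
        return k % self.m                    # division-method hash (assumption)

    def insert(self, k):
        self.table[self._h(k)].insert(0, k)  # CHAINED-HASH-INSERT: head of list

    def search(self, k):
        return k in self.table[self._h(k)]   # scan only the chain T[h(k)]

    def delete(self, k):
        self.table[self._h(k)].remove(k)     # scan-then-unlink, as in the slides


T = ChainedHashTable(9)
for k in (5, 28, 19, 15, 20, 33):
    T.insert(k)
print(T.search(28))   # → True
T.delete(28)
print(T.search(28))   # → False
```

Note that deleting through a Python list is a linear scan of the chain; the O(1) DELETE claimed on the next slide requires a doubly linked list and a pointer to the element itself.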
Complexity of chained-hash functions
• INSERT: O(1) in the worst case, assuming the element x being inserted is not already present in the table.
• SEARCH: proportional to the length of the list in the worst case.
• DELETE: O(1) if the lists are doubly linked and a pointer to the element is given; otherwise the same as searching, if the lists are singly linked.
Analysis of hashing with chaining
• Given a hash table T with m slots that stores n elements.
• Load factor: α = n/m (the average number of elements stored in a chain).
Assumption: simple uniform hashing
• Simple uniform hashing: any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed to.
• We assume simple uniform hashing, and also that computing the hash function takes O(1) time. For j = 0, 1, …, m − 1, let us denote the length of the list T[j] by nj, so that
  n = n0 + n1 + … + nm−1,
  and the average value of nj is E[nj] = α = n/m.
Theorem 11.1
• In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time Θ(1 + α), under the assumption of simple uniform hashing.

Proof:
• The average length of a list is α = n/m.
• The expected number of elements examined in an unsuccessful search is α.
• The total time required (including the time for computing h(k)) is O(1 + α).
Theorem 11.2
• In a hash table in which collisions are resolved by chaining, a successful search takes time Θ(1 + α) on average, under the assumption of simple uniform hashing.
• Assume that the CHAINED-HASH-INSERT procedure inserts a new element at the front of the list instead of the end.
• Let Xij be the indicator random variable for the event that the i-th and j-th inserted keys hash to the same slot; under simple uniform hashing, E[Xij] = 1/m.
• Since new elements go to the front of their list, a search for the i-th inserted key examines that key plus every later key that landed in the same chain. The expected number of elements examined in a successful search is therefore

  E[(1/n) Σ_{i=1..n} (1 + Σ_{j=i+1..n} Xij)]
    = (1/n) Σ_{i=1..n} (1 + Σ_{j=i+1..n} E[Xij])
    = (1/n) Σ_{i=1..n} (1 + Σ_{j=i+1..n} 1/m)
    = 1 + (1/(nm)) Σ_{i=1..n} (n − i)
    = 1 + (1/(nm)) (n² − n(n + 1)/2)
    = 1 + (n − 1)/(2m)
    = 1 + α/2 − α/(2n).

• The total time required for a successful search is thus Θ(2 + α/2 − α/(2n)) = Θ(1 + α).
11.3 Hash functions
• What makes a good hash function? Each slot should be equally likely: if keys are drawn from a distribution P, we want

  Σ_{k : h(k) = j} P(k) = 1/m   for j = 0, 1, …, m − 1.

• Example:
  • Assume the keys are uniformly distributed over 0 ≤ k < 1.
  • Set h(k) = ⌊km⌋.
Interpreting keys as natural numbers
• In many cases, we can assume the universe of keys is the set N = {0, 1, 2, …}.
• Example: ASCII coding, radix 128:
  "pt" = (p, t) = (112, 116) → 112·128 + 116 = 14452
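The radix-128 interpretation above can be sketched as a short Python helper (the function name is illustrative):

```python
def key_from_string(s: str, radix: int = 128) -> int:
    """Interpret an ASCII string as a radix-128 natural number,
    as in the slide's (p, t) example."""
    k = 0
    for ch in s:
        k = k * radix + ord(ch)  # shift left one radix-128 digit, add char code
    return k


print(key_from_string("pt"))  # → 14452, i.e. 112*128 + 116
```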
11.3.1 The division method

  h(k) = k mod m

• Suggestion: choose m to be a prime not too close to an exact power of 2:
  • m = 2^p ⇒ h(k) is just the p lowest-order bits of k
  • m = 2^p − 1 ⇒ h(k1) = h(k2) if the radix-2^p string k1 is a permutation of k2 (Exercise 11.3-3)
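The division method and the m = 2^p pitfall can be demonstrated in a couple of lines (the prime 701 is an illustrative choice, sitting between 2^9 = 512 and 2^10 = 1024):

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


# m = 701: a prime not too close to an exact power of 2
print(division_hash(123456, 701))

# Pitfall: with m = 2**p, h(k) keeps only the p lowest-order bits of k,
# so it ignores most of the key.
print(division_hash(123456, 2**4) == (123456 & 0b1111))  # → True
```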
11.3.2 The multiplication method

  h(k) = ⌊m (kA mod 1)⌋,

  where kA mod 1 = kA − ⌊kA⌋ is the fractional part of kA.
Suggestion:
• Choose m = 2^p and A ≈ (√5 − 1)/2 (Knuth).
• Implementation with w-bit words: let s = ⌊A · 2^w⌋; the product k·s occupies two words, r1·2^w + r0; h(k) is the p highest-order bits of r0.
Example:

  k = 123456, p = 14, m = 2^14 = 16384
  A = (√5 − 1)/2 = 0.61803…
  h(k) = ⌊16384 × (123456 A mod 1)⌋
       = ⌊16384 × 0.0041151…⌋
       = ⌊67.4219…⌋ = 67
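A floating-point sketch of the multiplication method, reproducing the worked example (a production implementation would use the integer w-bit scheme from the previous slide instead):

```python
import math


def mult_hash(k: int, p: int = 14) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1)) with m = 2**p."""
    A = (math.sqrt(5) - 1) / 2   # Knuth's suggestion, ~0.6180339887
    m = 2 ** p
    frac = (k * A) % 1.0         # fractional part of k*A
    return int(m * frac)         # floor, since frac >= 0


print(mult_hash(123456))  # → 67, matching the slide's worked example
```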
11.4 Open addressing
• All elements are stored in the hash table itself (no chains).
• h : U × {0, 1, …, m − 1} → {0, 1, …, m − 1}.
• With open addressing, we require that for every key k, the probe sequence
  h(k, 0), h(k, 1), …, h(k, m − 1)
  be a permutation of {0, 1, …, m − 1}, or at least h(k, i) ∈ {0, 1, …, m − 1} for all k, i.
HASH-INSERT(T, k)
  i ← 0
  repeat j ← h(k, i)
         if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
  until i = m
  error "hash table overflow"
HASH-SEARCH(T, k)
  i ← 0
  repeat j ← h(k, i)
         if T[j] = k
            then return j
         i ← i + 1
  until T[j] = NIL or i = m
  return NIL
Linear probing:

  h(k, i) = (h′(k) + i) mod m,  where h′ is an auxiliary hash function.

• It suffers from the primary clustering problem.
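HASH-INSERT and HASH-SEARCH can be sketched in Python with linear probing; taking h′(k) = k mod m as the auxiliary hash function is an assumption for illustration:

```python
def hash_insert(T, k):
    """HASH-INSERT with linear probing: h(k, i) = (h'(k) + i) mod m."""
    m = len(T)
    for i in range(m):
        j = (k % m + i) % m        # probe sequence for key k
        if T[j] is None:           # None plays the role of NIL
            T[j] = k
            return j
    raise RuntimeError("hash table overflow")


def hash_search(T, k):
    """HASH-SEARCH: follow the same probe sequence; stop at NIL or after m probes."""
    m = len(T)
    for i in range(m):
        j = (k % m + i) % m
        if T[j] == k:
            return j
        if T[j] is None:           # an empty slot means k was never inserted
            return None
    return None


T = [None] * 7
for key in (10, 17, 5):            # 10 and 17 both hash to slot 3 and collide
    hash_insert(T, key)
print(hash_search(T, 17))  # → 4: linear probing pushed 17 one slot past 10
```

Note that this search logic breaks down once deletions are allowed: removing a key would have to leave a DELETED marker rather than NIL, or searches could stop early.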
Quadratic probing:

  h(k, i) = (h′(k) + c1·i + c2·i²) mod m,  c1, c2 ≠ 0

• It suffers from the secondary clustering problem.
Linear probing vs. quadratic probing
• In both schemes, h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i) for every i: two keys with the same initial slot follow the same probe sequence.
• With linear probing, h(k1, i) = h(k2, j) also makes h(k1, i + 1) = h(k2, j + 1) likely, so occupied runs grow (primary clustering); with quadratic probing this is unlikely.
• (Figure: the keys 79, 69, 98, 72, 14, 50 inserted into a 13-slot table under each scheme.)
Double hashing:

  h(k, i) = (h1(k) + i·h2(k)) mod m
• (Figure: m = 13, h1(k) = k mod 13, h2(k) = 1 + (k mod 11); with 79, 69, 72, 98, 50 already in the table, INSERT 14 probes slots 1 and 5, which are occupied, and places 14 in slot 9.)
Example:

  h1(k) = k mod m
  h2(k) = 1 + (k mod m′),  with m′ slightly less than m (say, m − 1)
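A Python sketch of double hashing that reproduces the figure's INSERT 14 step, using the example functions h1(k) = k mod 13 and h2(k) = 1 + (k mod 11):

```python
def double_hash_insert(table, k):
    """Open-addressing insert with double hashing,
    h(k, i) = (h1(k) + i*h2(k)) mod m."""
    m = len(table)                 # m = 13 in the slide's figure
    h1, h2 = k % m, 1 + (k % 11)
    for i in range(m):
        j = (h1 + i * h2) % m      # probe sequence for key k
        if table[j] is None:
            table[j] = k
            return j
    raise RuntimeError("hash table overflow")


T = [None] * 13
for key in (79, 69, 72, 98, 50, 14):
    double_hash_insert(T, key)
print(T.index(14))  # → 9: probes 1 and 5 are occupied, so 14 lands in slot 9
```

Because the step size h2(k) varies with the key, two keys with the same initial slot usually diverge after the first probe, which is exactly what quadratic probing cannot do.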
Double hashing vs. linear or quadratic probing
• Double hashing represents an improvement over linear and quadratic probing in that Θ(m²) probe sequences are used, rather than Θ(m). Its performance is very close to that of uniform hashing.
Analysis of open-address hashing
Theorem 11.6
• Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing.
Proof.
• Define pi = Pr{exactly i probes access occupied slots} for 0 ≤ i ≤ n, and pi = 0 for i > n.
• The expected number of probes is 1 + Σ_{i=0..∞} i·pi.
• Define qi = Pr{at least i probes access occupied slots}.
• Why Σ_{i=0..∞} i·pi = Σ_{i=1..∞} qi: for a nonnegative integer random variable X,

  E[X] = Σ_{i=0..∞} i·Pr{X = i}
       = Σ_{i=0..∞} i·(Pr{X ≥ i} − Pr{X ≥ i + 1})
       = Σ_{i=1..∞} Pr{X ≥ i}
  q1 = n/m
  q2 = (n/m)·((n − 1)/(m − 1))
  qi = (n/m)·((n − 1)/(m − 1))···((n − i + 1)/(m − i + 1)) ≤ (n/m)^i = α^i   for 1 ≤ i ≤ n,
  qi = 0 for i > n.

• Therefore

  1 + Σ_{i=0..∞} i·pi = 1 + Σ_{i=1..∞} qi ≤ 1 + α + α² + … = 1/(1 − α).
Example:

  α = 0.1:  1/(1 − α) ≈ 1.1   (1–2 probes)
  α = 0.5:  1/(1 − α) = 2     (~2 probes)
  α = 0.9:  1/(1 − α) = 10    (~10 probes)
Corollary 11.7
• Inserting an element into an open-address hash table with load factor α requires at most 1/(1 − α) probes on average, assuming uniform hashing.
Proof.
• An element is inserted only if there is room in the table, and thus α < 1. Inserting a key requires an unsuccessful search followed by placement of the key in the first empty slot found. Thus, the expected number of probes is at most 1/(1 − α).
Theorem 11.8
• Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most

  (1/α) ln(1/(1 − α)),

  assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.
Proof.
• A search for a key k follows the same probe sequence as was followed when the element with key k was inserted.
• If k was the (i + 1)st key inserted into the hash table, the expected number of probes made in a search for k is at most 1/(1 − i/m) = m/(m − i).
• Averaging over all n keys in the hash table gives us the average number of probes in a successful search:

  (1/n) Σ_{i=0..n−1} m/(m − i) = (m/n) Σ_{i=0..n−1} 1/(m − i)
                               = (1/α)(H_m − H_{m−n})
                               ≤ (1/α) ∫_{m−n}^{m} (1/x) dx
                               = (1/α) ln(m/(m − n))
                               = (1/α) ln(1/(1 − α)),

  where H_i = Σ_{j=1..i} 1/j (the harmonic numbers).
Example:

  α = 0.1:  (1/α) ln(1/(1 − α)) ≈ 1.054
  α = 0.5:  (1/α) ln(1/(1 − α)) ≈ 1.386
  α = 0.9:  (1/α) ln(1/(1 − α)) ≈ 2.558
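The bounds from Theorems 11.6 and 11.8 are easy to evaluate side by side; this sketch reproduces the numbers from both example slides:

```python
import math


def unsuccessful_bound(alpha: float) -> float:
    """Theorem 11.6: expected probes in an unsuccessful search <= 1/(1 - alpha)."""
    return 1 / (1 - alpha)


def successful_bound(alpha: float) -> float:
    """Theorem 11.8: expected probes in a successful search
    <= (1/alpha) * ln(1/(1 - alpha))."""
    return (1 / alpha) * math.log(1 / (1 - alpha))


for alpha in (0.1, 0.5, 0.9):
    print(f"alpha={alpha}: unsuccessful <= {unsuccessful_bound(alpha):.3f}, "
          f"successful <= {successful_bound(alpha):.3f}")
```

The contrast is the practical takeaway: as the table fills up (α → 1), unsuccessful searches degrade much faster (10 probes at α = 0.9) than successful ones (about 2.6 probes).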