Lecture 13 - Hash Tables
Lecture 13 - Hash Tables
Hash Tables
1
Hash Tables (1/4)
¢ Motivation: symbol tables
u A compiler uses a symbol table to relate symbols to associated
data
ê Symbols: variable names, procedure names, etc.
ê Associated data: memory location, call graph, etc.
u For a symbol table (also called a dictionary), we care about
search, insertion, and deletion
u We typically don’t care about sorted order
2
N
e
t
w
o
r
Hash Tables (2/4)
k
i
n
g
L
a 0
b
o
r U
a (universe of keys) h(k1)
t
o h(k4)
r
y
3 K k1 h(k2)
/
4
(actual k4 k2
9 keys)
k3 h(k3)
m-1
3
Hash Tables (3/4)
¢ Idea
u Use a function h to compute the slot for each key
ℎ ∶ 𝑈 → {0, 1, . . . , 𝑚 − 1}
¢ Advantages:
u Reduce the range of array indices handled: 𝑚 instead of |𝑈|
4
Hash Tables (4/4)
¢ More formally:
u Given a table 𝑇 and a record 𝑥, with key (= symbol) and
satellite data, we need to support:
ê Insert (𝑇, 𝑥)
ê Delete (𝑇, 𝑥)
ê Search (𝑇, 𝑥)
u We want these to be fast, but don’t care about sorting the
records
¢ The structure we will use is a hash table
u Supports all the above in 𝑂(1) expected time !
5
Direct-Address Tables
¢ Suppose:
u The range of keys is 0. . 𝑚 − 1
u Keys are distinct
¢ The idea:
u Set up an array 𝑇[0. . 𝑚 − 1] in which
ê 𝑇[𝑖] = 𝑥 if 𝑥 ∈ 𝑇 and key[𝑥] = 𝑖
ê 𝑇[𝑖] = NULL otherwise
u This is called a direct-address table
ê Operations take 𝑂(1) time!
ê So what’s the problem?
6
Direct-Address Tables
¢ Dictionary operations are trivial and take 𝑂(1) time
each:
u DIRECT-ADDRESS-SEARCH(T, k)
return 𝑇[𝑘]
u DIRECT-ADDRESS-INSERT(T, x)
𝑇[key[𝑥]] ← 𝑥
u DIRECT-ADDRESS-DELETE(T, x)
𝑇[key[𝑥]] ← NIL
7
The Problem With Direct Addressing
¢ Direct addressing works well when the range 𝑚 of keys
is relatively small
¢ But what if the keys are 32-bit integers?
u Problem 1: direct-address table will have
232 entries, more than 4 billion
u Problem 2: even if memory is not an issue, the time to initialize
the elements to NULL may be …
¢ Solution: map keys to smaller range 0. . 𝑚 − 1
¢ This mapping is called a hash function
8
Hash Functions
¢ Next problem: collision
T
U 0
(universe of keys)
h(k1)
k1
h(k4)
K k4
(actual k5
h(k2) = h(k5)
keys)
k2 h(k3)
k3
m-1
9
Resolving Collisions
¢ How can we solve the problem of collisions?
u Solution 1: chaining
u Solution 2: open addressing
10
Chaining (1/10)
index LL Head
0
1
2
3
4
5
...
n-1
11
Chaining (2/10) key1: data1
Hash
index LL Head
Function
0
1
2 3
3 key1:data1
4
5
...
n-1
12
Chaining (3/10) key2: data2
Hash
index LL Head
Function
0
1 key2:data2
2 1
3 key1:data1
4
5
...
n-1
13
Chaining (4/10) key3: data3
Hash
index LL Head
Function
0
1 key2:data2
2 4
3 key1:data1
4 key3:data3
5
...
n-1
14
Chaining (5/10) key4: data4
Hash
index LL Head
Function
0
1 key2:data2
2 3
3 key4:data4 key1:data1
4 key3:data3
5
...
n-1 Note: Insertion at
beginning of list
15
Chaining (6/10)
¢ Chaining puts elements that hash to the same slot in a
linked list:
T
U ——
(universe of keys) k1 k4 ——
——
k1
——
K k4 k5 ——
(actual
k7 k5 k2 k7 ——
keys)
——
k2 k3 k3 ——
k8
k6
k8 k6 ——
——
16
Chaining (7/10)
¢ How do we insert an element?
T
U ——
(universe of keys) k1 k4 ——
——
k1
——
K k4 k5 ——
(actual
k7 k5 k2 k7 ——
keys)
——
k2 k3 k3 ——
k8
k6
k8 k6 ——
——
17
Chaining (8/10)
¢ How do we delete an element?
u Do we need a doubly-linked list for efficient delete?
T
U ——
(universe of keys) k1 k4 ——
——
k1
——
K k4 k5 ——
(actual
k7 k5 k2 k7 ——
keys)
——
k2 k3 k3 ——
k8
k6
k8 k6 ——
——
18
Chaining (9/10)
¢ How do we search for an element with a given key?
T
U ——
(universe of keys) k1 k4 ——
——
k1
——
K k4 k5 ——
(actual
k7 k5 k2 k7 ——
keys)
——
k2 k3 k3 ——
k8
k6
k8 k6 ——
——
19
Chaining (10/10)
u CHAINED-HASH-INSERT(T, x)
insert 𝑥 at the head of list 𝑇[ℎ(key[𝑥])]
u CHAINED-HASH-SEARCH(T, k)
search for an element with key 𝑘 in list 𝑇[ℎ[𝑘]]
u CHAINED-HASH-DELETE(T, x)
delete 𝑥 from the list 𝑇[ℎ(key[𝑥])]
20
Practice Problems
𝑖 0 1 2 3 4 5 6 7 8
[] [10, [20] [12] [] [5] [33, [] [17]
𝑇[𝑖] 19, 15]
28]
21
Analysis of Chaining
22
Analysis of Chaining
23
Choosing A Hash Function
24
Hash Functions: The Division Method
¢ Idea
u Map a key 𝑘 into one of the 𝑚 slots by taking the
remainder of 𝑘 divided by 𝑚
ℎ(𝑘) = 𝑘 𝑚𝑜𝑑 𝑚
¢ Advantage
u fast, requires only one operation
¢ Disadvantage
u Certain values of 𝑚 are bad: power of 2 and non-prime
numbers
25
The Division Method
26
Hash Functions: The Multiplication Method
Idea:
¢ Multiply key 𝑘 by a constant 𝐴, 0 < 𝐴 < 1
¢ Extract the fractional part of 𝑘𝐴
¢ Multiply the fractional part by 𝑚
¢ Take the floor of the result
ℎ(𝑘) = 𝑚 (𝑘 𝐴 𝑚𝑜𝑑 1)
fractional part of 𝑘𝐴 = 𝑘𝐴 − 𝑘𝐴
27
The Multiplication Method
¢ For a constant 𝐴, 0 < 𝐴 < 1:
¢ ℎ(𝑘) = 𝑚 (𝑘𝐴 − 𝑘𝐴 )
Fractional part of 𝑘𝐴
¢ Choose 𝑚 = 2𝑃
¢ Choose 𝐴 not too close to 0 or 1
¢ Knuth: Good choice for 𝐴 = (Ö5 - 1)/2
28
Truncation method
• Ignore part of the key and use the rest as the array
index (converting non-numeric parts)
• Example
If students have a 9-digit identification number, take
the last 3 digits as the table position
– e.g. 925371622 becomes 622
30
Mid-square method
¢ Example
u consider items whose keys are 4-digit numbers in
base 10
u The goal is to hash these key values to a table of
size 100
u This range is equivalent to two digits in base 10,
that is, the number of bits is 2
u For example, the input is the number 4567,
squaring gets an 8-digit number, 20857489
u The middle two digits of this result is 57
32
Folding method
¢ Partition the key into several parts of the same length
except for the last
¢ These parts are then added together to obtain the hash
address for the key
¢ Two ways of carrying out this addition
u Shift-folding
u Folding and reverse
33
Folding method
¢ Example
34
Digit analysis method
¢ All the keys in the table are known in advance
¢ The index is formed by extracting, and then
manipulating specific digits from the key
¢ For example, the key is 925371622, we select
the digits from 5 through 7 resulting 537
¢ The manipulation can then take many forms
u Reversing the digits (735)
u Performing a circular shift to the right (753)
u Performing a circular shift to the left (375)
u Swapping each pair of digits (357)
35
Open Addressing (1/9)
¢ If we have enough contiguous memory to store all the
keys (𝑚 > 𝑁)
Þ store the keys in the table itself
¢ No need to use the linked lists anymore
¢ Collision resolution
u Put the elements that collide in the available empty places in
the table
39
Open Addressing (2/9)
¢ Basic idea:
u To insert: if slot is full, try another slot, …, until an open slot is
found (probing)
u To search, follow same sequence of probes as would be used
when inserting the element
ê If reach element with correct key, return it
ê If reach a NULL pointer, element is not in table
40
Open Addressing (3/9)
index data
0
1
2
3
4
5
...
n-1
41
Open Addressing (4/9)
key1: data1
index data
0
1
2
Hash
3 key1:data1 Function
4
5
...
n-1 3
42
Open Addressing (5/9)
key2: data2
index data
0
1 key2:data2
2
Hash
3 key1:data1 Function
4
5
...
n-1 1
43
Open Addressing (6/9)
key3: data3
index data
0
1 key2:data2
2
Hash
3 key1:data1 Function
4 key3:data3
5
...
n-1 4
44
Open Addressing (7/9)
key4: data4
index data
0
1 key2:data2
2
Hash
3 key1:data1 Function
4 key3:data3
5
...
n-1 3
?
45
Open Addressing (8/9)
key4: data4
index data
0
1 key2:data2
2
Hash
3 key1:data1 Function
4 key3:data3
5 key4:data4
...
n-1 3
46
Open Addressing (9/9)
47
Linear Probing (1/3)
48
Linear Probing (2/3)
49
Linear Probing (3/3)
¢ Deleting a key
0
u Cannot mark the slot as empty
u Impossible to retrieve keys inserted after that
slot was occupied
¢ Solution
u Mark the slot with a sentinel value DELETED
¢ The deleted slot can later be used for
insertion
m-1
¢ Searching will be able to find all the keys
50
Practice Problems
¢ Consider a hash table of length 11 using open addressing
with the primary hash function ℎ1(𝑘) = 𝑘. Illustrate the
result of inserting 31, 4, 15, 28, 59 using linear probing.
u Hash function ℎ(𝑘, 𝑖) = (ℎ1(𝑘) + 𝑖) mod 𝑚 = (𝑘 + 𝑖) mod 𝑚
ê ℎ(31,0) = 9 Þ 𝑇[9] = 31
ê ℎ 4,0 = 4 Þ 𝑇[4] = 4
ê ℎ(15,0) = 4 collision
o ℎ 15,1 = 5 ⇒ 𝑇[5] = 15
ê ℎ 28,0 = 6 ⇒ 𝑇[6] = 28
ê ℎ(59,0) = 4 collision
o ℎ(59,1) = 5 collision
o ℎ(59,2) = 6 collision
o ℎ 59,3 = 7 Þ 𝑇[7] = 59
51
Double Hashing (1/3)
52
Double Hashing (2/3)
0
ℎ1(𝑘) = 𝑘 mod 13 1 79
6
ℎ1(14) = 14 mod 13 = 1 7 72
12
ℎ(14, 2) = (ℎ1(14) + 2 ℎ2(14)) mod 13
= (1 + 8) mod 13 = 9
53
Double Hashing (3/3)
Choosing the second hash function
ℎ(𝑘, 𝑖) = (ℎ1(𝑘) + 𝑖 ℎ2(𝑘)) mod 𝑚
¢ ℎ2 should not evaluate to 0 Þ infinite loop
¢ ℎ2 should be relatively prime to the table size 𝑚
¢ Solution 1
u Choose m prime
u Design ℎ2 such that it produces an integer less than 𝑚
¢ Solution 2
u Choose 𝑚 = 2𝑝
u Design ℎ2 such that it always produces an odd number
54
Practice Problems
¢ Consider a hash table of length 11 using open addressing.
Illustrate the result of inserting 31, 4, 15, 28, 59 using
double hashing with functions ℎ1(𝑘) = 𝑘 and ℎ2(𝑘) = 1 + (𝑘
mod (𝑚 − 1)).
u Hash function ℎ(𝑘, 𝑖) = (𝑘 + 𝑖(1 + (𝑘 mod (𝑚 − 1)))) mod 𝑚
ê ℎ 31,0 = 9 Þ 𝑇[9] = 31
ê ℎ 4,0 = 4 Þ 𝑇[4] = 4
ê ℎ(15,0) = 4 collision
o ℎ 15,1 = 10 Þ 𝑇[10] = 15
ê ℎ 28,0 = 6 Þ 𝑇[6] = 28
ê ℎ(59,0) = 4 collision
o ℎ 59,1 = 3 Þ 𝑇[3] = 59
55