L04 Hashing
L04 Hashing
5
Direct Addressing Table: Operations
delete (key)
a[key] = null
find (key)
return a[key]
6
Direct Addressing Table: Restrictions
Keys must be integer
What happens for key values 151A and NR10?
Range of keys must be small
Keys must be dense, i.e. not many gaps in
the key values.
How to overcome these restrictions?
7
Hash Table
11
Bad Hash Functions
12
Perfect Hash Functions
13
How to Define a Hash Function?
Uniform hash function
Division method
Multiplication method
Hashing of strings
14
Uniform Hash Functions
15
3 Division method (mod operator)
Map into a hash table of m slots.
Use the modulo operator (% ) to map an integer
to a value between 0 and m-1.
hash(k ) = k % m
The most popular method.
16
3 How to pick m (table size)?
17
Multiplication method
1. Multiply by a fraction A between 0 and 1
2. Extract the fractional part
3. Multiply the fraction by m, the hash table size
18
Hashing of strings
hash1(s) // s is a string
sum = 0
foreach character c in s
sum += c // sum up the ASCII values of all characters
return sum % m // m is the hash table size
19
Hashing of strings
hash(“Tan Ah Teck”)
20
Hashing of strings:
Lee Chin Tan
Chen Le Tian
Chan Tin Lee
21
Improved Hashing of strings
A better way is to “shift” the sum before adding the next
character, so that its position will affect the hash code.
This kind of hash code is called a polynomial hash code
hash2(s)
sum = 0
foreach character c in s
sum = sum*37 + c
return sum % m // m is the hash table size
22
Collision Resolution
4 Probability of Collision (1/2)
von Mises Paradox (The Birthday Paradox): “How
many people must be in a room before the probability
that some share a birthday, ignoring the year and leap
days, becomes at least 50 percent?”
24
4 Probability of Collision (2/2)
This means that if there are 23 people in a room,
the probability that some people share a
birthday is 50.7%.
So, if we insert 23 keys in to a table with 365
slots, more than half of the time we get
collisions. Such a result is counter-intuitive to
many.
So, collision is very likely!
25
How to resolve collisions?
26
27
28
Load Factor
n: number of keys in the hash table
m: size of the hash tables – number of slots
Define the load factor
= n/m
a measure of how full the hash table is.
Note: can be >= 1.
If table size is the number of linked lists, then
is the average length of the linked lists.
29
Average Running Time
Find O(1 + )
Insert O(1)
Delete O(1 + )
30
31
Linear Probing
0
hash(k) = k mod 7 In linear probing,
1 when there is a
Here the table size m=7
collision, we scan
2 forwards for the the
Note: 7 is a prime number.
3 next empty slot
(wrapping around
4 when we reach the
last slot).
5
32
Linear Probing: Insert 18
0
hash(k) = k mod 7
1
hash(18) 2
= 18 mod 7
3
=4
4 18
5
33
Linear Probing: Insert 14
0 14
hash(k) = k mod 7
1
2
hash(14)
3
= 14 mod 7
=0 4 18
5
34
Linear Probing: Insert 21
0 14
hash(k) = k mod 7
1 21
Collision occurs!
2 Look for next empty slot.
3
hash(21)
= 21 mod 7 4 18
=0 5
35
Linear Probing
36
Linear Probing: Insert 35
0 14
hash(k) = k mod 7 Collision, need to
1 check next 3 slots.
21
hash(35) 2 1
= 35 mod 7 3 35
=0
4 18
5
37
Linear Probing: Find 35
0 14
hash(k) = k mod 7 Found 35, after 4
1 probes.
21
hash(35) = 0
2 1
3 35
4 18
5
38
Linear Probing: Find 8
0 14
hash(k) = k mod 7 8 NOT found.
1 Need 5 probes!
21
hash(8) = 1
2 1
3 35
4 18
5
39
Linear Probing: Delete 21
0 14
hash(k) = k mod 7
1 21
hash(21) = 0
2 1
3 35
4 18
5
40
Linear Probing: Find 35
0 14
hash(k) = k mod 7 35 NOT found!
1 Incorrect!
hash(35) = 0
2 1
We cannot simply
3 35 remove a value,
4 because it can
18
affect find()!
5
41
How to delete?
Lazy Deletion
Use three different states of a slot
Occupied
deleted
Empty
When a value is removed from linear probed
hash table, we just mark the status of the slot
as “deleted”, Instead of emptying the slot.
Need to use a state array the same size as
the hash table.
42
Linear Probing: Delete 21
0 14
hash(k) = k mod 7 Slot 1 is occupied
1 but now marked
X
21
as deleted.
hash(21) = 0
2 1
3 35
4 18
5
43
Linear Probing
44
Linear Probing: Insert 15
0 14
hash(k) = k mod 7
1 X
21
15 15 is inserted into
hash(15) = 1 the slot 1 which
2 1 was marked as
Note: We continue 3 35 deleted
to search for 15, 4 18
and found that 15 We can insert a
is not in the hash 5 new value into a
table (total 5 slot that has been
6
probes).
marked as
deleted.
45
46
Problem of Linear Probing
0 14
1 15
Consecutive
2 1 occupied slots.
3 35
4 18
5
6 This is called
Primary
Clustering
47
Linear Probing
48
Modified Linear Probing
Q: How to modify linear probing to avoid primary clustering?
We can modify the probe sequence as follows:
hash(key)
( hash(key) + 1 * d ) % m
( hash(key) + 2 * d ) % m
( hash(key) + 3 * d ) % m
:
where d is some constant integer >1 and is co-prime to m.
Note: Since d and m are co-primes, the probe sequence
covers all the slots in the hash table.
49
Quadratic Probing
50
Quadratic Probing: Insert 3, 18
0
hash(k) = k mod 7
1
hash(3) = 3
Hash(18)= 4 2
3 3
4 18
5
51
Quadratic Probing: Insert 38
0 38
hash(k) = k mod 7
1
hash(38) = 3
2
Collision
3 3
+1
4 18
5
+4
6
52
Can quadratic probing always find a free slot?
Insert 12 into the previous example, followed by 10.
what happen?
Theorem
If < 0.5, and m is prime, then we can always
find an empty slot.
(m is the table size and is the load factor)
53
Secondary Clustering
In quadratic probing, clusters are formed along
the path of probing, instead of around the home
location.
These clusters are called secondary clusters.
Secondary clusters are formed as a result of
using the same pattern in probing by all keys.
54
Double Hashing
55
After Inserting 14 and 18, Insert 21
hash(k) = k mod 7 0 14 hash(21)
hash2(k) = k mod 5
1 21 =21 mod 7
2
=0
3
hash2(21)
4 18 = 21 mod 5
5 =1
6
56
Double Hashing: Insert 4
hash(k) = k mod 7 0 14 If we insert 4, the
hash2(k) = k mod 5
1 probe sequence is
21
4 (home), 8, 12, …
2
3 hash(4) = 4
4
hash2(4) = 4
18
5 4
6
57
Double Hashing: Insert 35
hash(k) = k mod 7 0 14 But if we insert 35,
hash2(k) = k mod 5
1 the probe sequence
21
hash(35) = 0 is 0, 0, 0, …
hash2(35) = 0 2
What is wrong?
3 Since hash2(35)=0.
4 Not acceptable!
18
5 4
6
58
Warning:
59
Note:
If hash2(k) =1, then it is the same as linear probing.
60
Good Collision Resolution Method
61
[CS1020 Lecture 10 AY2013/4 S1]
62
Summary
How to hash? Criteria for good hash functions?
How to resolve collision?
Collision resolution techniques:
separate chaining
linear probing
quadratic probing
double hashing
Problem on deletions
Primary clustering and secondary clustering.
63