Week 10: Hash Table: Readings
Hash Table
Readings
Required: [Weiss] ch20
Exercise: 20.5
nus.soc.cs1102b.week10
Recap

           Unsorted Array/List   Sorted Array   BST        Hash Table
Insert     O(1)                  O(N)
Delete     O(N)                  O(N)
Find       O(N)
findMin    O(N)                  O(1)           O(log N)   O(N)
findMax    O(N)                  O(1)           O(log N)   O(N)
Direct Addressing Table
9 October 2002
- find(N)
- insert(N)
- delete(N)
(Figure: a direct addressing table of booleans; slot 989 holds true.)
(Figure: a direct addressing table of records; slot 2 holds (2, data) and slot 989 holds (989, data).)
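A minimal Python sketch of a direct addressing table, assuming keys are integers in a small known range so that a key such as 989 can index the array directly. The class name `DirectAddressTable` and the parameter `universe_size` are illustrative and do not come from the slides.

```python
class DirectAddressTable:
    """One slot per possible key; the key itself is the array index."""

    def __init__(self, universe_size):
        # Assumption: every key is an integer in [0, universe_size).
        self.slots = [None] * universe_size

    def insert(self, key, data):
        self.slots[key] = data

    def delete(self, key):
        self.slots[key] = None

    def find(self, key):
        # Returns the stored data, or None if the key is absent.
        return self.slots[key]


table = DirectAddressTable(1000)
table.insert(989, "data")
print(table.find(989))  # data
```

Every operation is a single array access, but the array needs one slot for every possible key, whether or not that key is ever stored.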
Restrictions
Hash Table
Idea
HASHING
Hash Table
h(66752378) = 17: slot 17 stores (66752378, data)
h(68744483) = 974: slot 974 stores (68744483, data)
Hash Table
insert (key, data)
    a[ h(key) ] = data
delete (key)
    a[ h(key) ] = null
search (key)
    return a[ h(key) ]
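The three operations above can be sketched in Python as follows. This is an illustration only: the slides do not say which function h produces the slot numbers in their figures, so `h(key) = key % m` and the table size `m = 1009` are assumptions, and the class name `NaiveHashTable` is hypothetical.

```python
class NaiveHashTable:
    """Hash table with no collision handling (illustration only)."""

    def __init__(self, m=1009):
        self.m = m                # number of slots (assumed value)
        self.a = [None] * m

    def h(self, key):
        # Assumed hash function for integer keys.
        return key % self.m

    def insert(self, key, data):
        self.a[self.h(key)] = data

    def delete(self, key):
        self.a[self.h(key)] = None

    def search(self, key):
        return self.a[self.h(key)]


table = NaiveHashTable()
table.insert(66752378, "data")
print(table.search(66752378))  # data
```

Because only the data is stored in slot h(key), two different keys that hash to the same slot overwrite each other in this sketch.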
Hash Table
(Figure: the table holds (66752378, data) and (68744483, data); a third key, 67774385, is passed through h.)
Problem
COLLISION
How to hash?
How to resolve collisions?
Hash Functions
- random
- fast
- depends
Example
- $k \in [0, X)$
- $\mathrm{hash}(k) = \left\lfloor \dfrac{k\,m}{X} \right\rfloor$
Hashing Integers
Division Method
hash( k ) = k % m
mod operator
How to pick m?
m = 16
m = 10
m = 13
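One way to compare these table sizes is to count how many distinct slots a patterned key set reaches under the division method. The sketch below assumes, purely for illustration, that the keys are multiples of 4; the helper names `division_hash` and `slots_used` are hypothetical.

```python
def division_hash(k, m):
    return k % m


def slots_used(keys, m):
    """Number of distinct slots reached by the given keys."""
    return len({division_hash(k, m) for k in keys})


keys = range(0, 400, 4)  # assumed key pattern: multiples of 4
for m in (16, 10, 13):
    print(m, slots_used(keys, m))
# 16 4
# 10 5
# 13 13
```

With this key pattern, m = 16 and m = 10 leave most slots unused because they share a factor with the key spacing, while the prime m = 13 reaches every slot.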
Rule
Multiplication Method
1. Multiply the key by a constant A, where 0 < A < 1
2. Extract the fractional part of the product
3. Multiply the fractional part by m and round down
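The three steps can be sketched in Python as below. The slides do not give a value for A; the constant used here, (sqrt(5) - 1) / 2, is a commonly cited choice and is an assumption of this example, as is the function name `multiplication_hash`. Floating-point arithmetic is used for brevity, so very large keys may lose precision.

```python
import math

# Assumed constant: the golden-ratio conjugate, roughly 0.618.
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k, m, a=A):
    product = k * a                   # step 1: multiply by A
    fractional = product % 1.0        # step 2: keep the fractional part
    return math.floor(fractional * m)  # step 3: scale to [0, m)


print(multiplication_hash(123456, 16384))  # 67
```

Unlike the division method, the quality of this hash does not depend on m being prime, so m can be chosen freely.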
Hashing Strings
Hashing of Strings
hash(s, m)
    sum = 0
    foreach character c in s
        sum += c
    return sum % m
Hashing of Strings
- Lee Chin Tan
- Chen Le Tian
- Chan Tin Lee
All three names contain exactly the same characters, so summing the characters gives the same hash value for each of them.
Hashing of Strings
hash(s, m)
    sum = 0
    foreach character c in s
        sum = sum*37 + c
    return sum % m
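A short Python sketch comparing the two string hashes on the three names from the earlier slide. It assumes the second hash uses the update `total = total * 37 + c`, takes character values from `ord`, and uses an arbitrary table size `m = 101`; the function names are hypothetical.

```python
def additive_hash(s, m):
    total = 0
    for c in s:
        total += ord(c)
    return total % m


def polynomial_hash(s, m):
    total = 0
    for c in s:
        # Multiplying by 37 makes the result depend on character order.
        total = total * 37 + ord(c)
    return total % m


names = ["Lee Chin Tan", "Chen Le Tian", "Chan Tin Lee"]
m = 101  # assumed table size
print([additive_hash(n, m) for n in names])    # three identical values
print([polynomial_hash(n, m) for n in names])
```

The additive hash cannot tell the names apart because they are rearrangements of the same characters; the polynomial version weights each position differently, so rearranging the characters changes the sum before the final `% m`.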
Collision Resolution
Probability of Collision
Probability of Collision
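Q(n) = Probability of unique birthday for n people

$$Q(n) = \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{365 - n + 1}{365}$$

P(n) = 1 - Q(n) = Probability that at least two of n people share a birthday
P(23) = 0.507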
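The value for 23 people can be checked numerically. The sketch below assumes 365 equally likely birthdays and takes P(n) = 1 - Q(n); the function names are illustrative.

```python
def prob_all_unique(n, days=365):
    """Q(n): probability that n people all have different birthdays."""
    q = 1.0
    for i in range(n):
        q *= (days - i) / days
    return q


def prob_collision(n, days=365):
    """P(n) = 1 - Q(n): probability that at least two birthdays coincide."""
    return 1.0 - prob_all_unique(n, days)


print(round(prob_collision(23), 3))  # 0.507
```

Read as a hash table, this says that 23 keys placed uniformly at random into 365 slots already collide more often than not.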
Probability of Collision
Collision Resolutions
- Separate Chaining
- Linear Probing
- Quadratic Probing
- Double Hashing
Separate Chaining
Idea
(Figure: a table with slots 0 to m-1; each slot points to a linked list of entries such as (k1, data), (k2, data), (k3, data) and (k4, data).)
Hash Table
insert (key, data)
    insert data into the list a[ h(key) ]
delete (key)
    delete data from the list a[ h(key) ]
search (key)
    find key from the list a[ h(key) ]
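A minimal Python sketch of these three operations with separate chaining. It assumes integer keys, `h(key) = key % m`, and Python lists standing in for linked lists; the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Each slot a[i] holds a list (chain) of (key, data) pairs."""

    def __init__(self, m=7):
        self.m = m
        self.a = [[] for _ in range(m)]

    def h(self, key):
        return key % self.m  # assumed hash function for integer keys

    def insert(self, key, data):
        chain = self.a[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: replace data
                chain[i] = (key, data)
                return
        chain.append((key, data))

    def delete(self, key):
        chain = self.a[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

    def search(self, key):
        for k, data in self.a[self.h(key)]:
            if k == key:
                return data
        return None


table = ChainedHashTable(7)
table.insert(14, "a")
table.insert(21, "b")    # 14 and 21 both hash to slot 0
print(table.search(21))  # b
```

Colliding keys simply share a chain, so the cost of each operation is proportional to the length of that one chain.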
Analysis
- n: number of keys
- m: number of slots
- L: load factor
- L = n/m
- Average length of list = L
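A small numeric check of the last two lines, assuming 20 integer keys distributed over m = 5 chains with `key % m`. The helper name `load_factor` is illustrative.

```python
def load_factor(n, m):
    return n / m


m = 5
chains = [[] for _ in range(m)]
for key in range(20):
    chains[key % m].append(key)

# Averaged over all m slots, the chain length is n/m however the keys fall.
average_length = sum(len(c) for c in chains) / m
print(load_factor(20, m), average_length)  # 4.0 4.0
```

The average is taken over slots, so it equals L even when the keys are spread unevenly; a poor hash function shows up as a few long chains, not as a different average.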
Rehashing
Linear Probing
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   -    -    -    -    -    -    -
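Insert 21
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   21   -    -    18   -    -
14 and 18 are already in the table. 21 mod 7 = 0, but slot 0 holds 14, so 21 goes into the next free slot, 1.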
Insert 1
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   21   1    -    18   -    -
1 mod 7 = 1, but slot 1 holds 21, so 1 goes into slot 2.
Insert 35
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   21   1    35   18   -    -
35 mod 7 = 0; slots 0, 1 and 2 are occupied, so 35 goes into slot 3.
Find 35
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   21   1    35   18   -    -
Probe slots 0, 1, 2, 3: FOUND 35 in slot 3.
Find 8
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   21   1    35   18   -    -
8 mod 7 = 1; probe slots 1, 2, 3, 4, then reach the empty slot 5: 8 NOT FOUND.
Delete 21
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   -    1    35   18   -    -
21 is found in slot 1 and the slot is emptied.
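Find 35
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   -    1    35   18   -    -
35 mod 7 = 0; slot 0 holds 14 and slot 1 is empty, so the search stops: 35 NOT FOUND! (although 35 is still in slot 3).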
Problem
Cannot Delete!
How to delete?
Lazy Deletion
Three different states
- occupied
- occupied but marked as deleted
- empty
Delete 21
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   X    1    35   18   -    -
Slot 1 is marked as deleted (X) instead of being emptied.
Find 35
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   X    1    35   18   -    -
The search continues past the deleted slot 1: FOUND 35 in slot 3.
Insert 15
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   14   15   1    35   18   -    -
15 mod 7 = 1; slot 1 is marked as deleted, so it can be reused for 15.
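The linear probing walkthrough with lazy deletion can be replayed with the Python sketch below. It assumes integer keys stored without data, `hash(k) = k mod 7`, and that `insert` is never called with a key that is already present; the class name `LinearProbingTable` and the sentinels `EMPTY` and `DELETED` are hypothetical.

```python
EMPTY = object()    # slot never used
DELETED = object()  # slot used once, now marked as deleted


class LinearProbingTable:
    def __init__(self, m=7):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key):
        # Assumption: key is not already in the table.
        for j in self._probe(key):
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = key
                return j
        raise RuntimeError("table is full")

    def find(self, key):
        for j in self._probe(key):
            if self.slots[j] is EMPTY:  # a truly empty slot ends the search
                return None
            if self.slots[j] == key:    # DELETED never equals a key
                return j
        return None

    def delete(self, key):
        j = self.find(key)
        if j is not None:
            self.slots[j] = DELETED     # lazy deletion: mark, do not empty
        return j


table = LinearProbingTable(7)
for k in (18, 14, 21, 1, 35):
    table.insert(k)
table.delete(21)
print(table.find(35))   # 3
print(table.insert(15))  # 1
```

Marking the slot keeps probe sequences intact for keys stored beyond it, at the cost of deleted slots still lengthening searches until they are reused.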
Problem
Primary Clustering
Quadratic Probing
Linear Probing
hash(key)
( hash(key) + 1 ) % m
( hash(key) + 2 ) % m
( hash(key) + 3 ) % m
:
Quadratic Probing
hash(key)
( hash(key) + 1 ) % m
( hash(key) + 4 ) % m
( hash(key) + 9 ) % m
:
Insert 3
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   -    -    -    3    18   -    -
18 is already in slot 4. 3 mod 7 = 3 and slot 3 is free, so 3 goes into slot 3.
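Insert 38
hash(k) = k mod 7
slot:  0    1    2    3    4    5    6
key:   38   -    -    3    18   -    -
38 mod 7 = 3 is occupied; (3 + 1) % 7 = 4 is occupied; (3 + 4) % 7 = 0 is free, so 38 goes into slot 0.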
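The quadratic probe sequence and the two insertions can be reproduced with the Python sketch below. It assumes integer keys, `hash(k) = k mod 7`, and `None` for an empty slot; the function names are hypothetical.

```python
def quadratic_probe_sequence(key, m, count):
    """First `count` slots probed: hash(key) + 0, 1, 4, 9, ... (mod m)."""
    start = key % m
    return [(start + i * i) % m for i in range(count)]


def quadratic_insert(table, key):
    m = len(table)
    for i in range(m):
        j = (key % m + i * i) % m
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
for k in (18, 3, 38):
    quadratic_insert(table, k)
print(quadratic_probe_sequence(38, 7, 4))  # [3, 4, 0, 5]
print(table)  # [38, None, None, 3, 18, None, None]
```

Unlike linear probing, the sequence does not visit every slot, which is why an insertion can fail here even when the table still has free slots.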
Theorem
Problems
- If two keys have the same initial position, their probe sequences are the same.
- Secondary clustering.
Double Hashing
hash(key)
(hash(key) + hash2(key)) % m
(hash(key) + 2*hash2(key)) % m
(hash(key) + 3*hash2(key)) % m
:
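Insert 21
hash(k) = k mod 7
hash2(k) = k mod 5
slot:  0    1    2    3    4    5    6
key:   14   21   -    -    18   -    -
14 and 18 are already in the table. 21 mod 7 = 0 is occupied; hash2(21) = 1, so the next probe is (0 + 1) % 7 = 1, which is free.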
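Insert 4
hash(k) = k mod 7
hash2(k) = k mod 5
slot:  0    1    2    3    4    5    6
key:   14   21   -    -    18   4    -
4 mod 7 = 4 is occupied; hash2(4) = 4, so the probes are (4 + 4) % 7 = 1, which is occupied, then (4 + 8) % 7 = 5, which is free.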
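Insert 35
hash(k) = k mod 7
hash2(k) = k mod 5
slot:  0    1    2    3    4    5    6
key:   14   21   -    -    18   4    -
35 mod 7 = 0 is occupied and hash2(35) = 35 mod 5 = 0, so every probe lands on slot 0 again and 35 cannot be inserted.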
Warning
hash2(key) must never be 0.
Change hash2(key) to
hash2(key) = 5 - (key % 5)
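A Python sketch of double hashing, assuming the corrected second hash `hash2(key) = 5 - (key % 5)`, integer keys, a table of 7 slots and `None` for an empty slot. The function names are hypothetical.

```python
def hash1(key, m=7):
    return key % m


def hash2(key):
    # Always in 1..5, so the probe step is never 0.
    return 5 - (key % 5)


def double_hash_insert(table, key):
    m = len(table)
    for i in range(m):
        j = (hash1(key, m) + i * hash2(key)) % m
        if table[j] is None:
            table[j] = key
            return j
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 7
for k in (14, 18, 21, 4, 35):
    double_hash_insert(table, k)
print(table)  # [14, 21, None, 35, 18, 4, None]
```

With this hash2, 35 probes slot 0, then slot 5, then slot 3, so it finds a free slot instead of probing slot 0 forever. Because 7 is prime and the step lies between 1 and 5, the first 7 probes visit every slot.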