COMP211slides 11
E.G.M. Petrakis
Hashing
Hash Table
Hash Function: transforms keys to array indices
[Figure: key -> h(key) -> index into a table with positions 0, 1, 2, ..., n that hold the data]
h(key): Hash Function
[Figure: table with positions 0 to 999 and one record per position; each key is stored at the position given by its last three digits, e.g. key 4967000 at position 0, 8421002 at 2, 4618396 at 396, 4957397 at 397, 1286399 at 399, 0001999 at 999]
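The figure suggests that the hash function keeps the last three digits of the key, i.e. h(key) = key mod 1000. A minimal sketch under that assumption (the function name `h` and the default table size are illustrative, not taken from the slides):

```python
def h(key: int, m: int = 1000) -> int:
    """Map an integer key to a table index in 0..m-1 (division method)."""
    return key % m


# A few keys from the figure and the positions they map to.
keys = [4967000, 8421002, 4618396, 1286399, 9846995]
positions = {key: h(key) for key in keys}
print(positions)
```

Two keys with the same last three digits (e.g. 4618396 and a hypothetical 7000396) would map to the same position, which is the collision problem the next slides address.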
Collision Resolution
1. Open Addressing (rehashing):
Open Addressing
Computes a new address for the key
if its hash position is already occupied (rehashing)
Example
i=h(key)=key mod 100
rh(i) = (i+1) mod 100
key: 193
i = h(193) = 93
rh(93) = (93+1) mod 100 = 94
If position 93 is occupied, key 193 will occupy
position 94
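The example can be sketched in code as follows. This is a minimal illustration, assuming that position 93 is already taken by a hypothetical key 93; the names `insert`, `EMPTY` and `M` are illustrative.

```python
M = 100          # table size used in the example
EMPTY = None     # marker for a free position


def h(key: int) -> int:
    return key % M


def rh(i: int) -> int:
    return (i + 1) % M


def insert(table: list, key: int) -> int:
    """Store key at h(key), rehashing while the position is occupied."""
    i = h(key)
    for _ in range(M):          # at most M positions can be examined
        if table[i] is EMPTY:
            table[i] = key
            return i
        i = rh(i)               # position occupied: compute a new address
    raise RuntimeError("table is full")


table = [EMPTY] * M
first = insert(table, 93)       # hypothetical key that occupies position 93
second = insert(table, 193)     # collides at 93, rehashes to 94
print(first, second)
```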
[Figure: hash table showing key 193 stored at its rehash position]
Problem 2: Primary
Clustering
Different keys that hash
into different addresses
compete with each other
in successive rehashes
Problem 3: Secondary Clustering
Keys that hash to the same address follow the same rehash sequence
rh = 4, 6, 9, 3, ...
ii. key 13 : h(13) = 3
rh = 4, 6, 9, 3, ...
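The sequence 4, 6, 9, 3 is consistent with a rehash rule that adds 1, then 2, then 3, ... to the previous position modulo 10. The slide does not state the rule, so the sketch below is an assumption; key 23 is a hypothetical second key with the same home position as key 13.

```python
from itertools import islice


def rehash_sequence(key: int, m: int = 10):
    """Yield rehash positions, assuming the k-th rehash adds k (mod m)."""
    i = key % m              # home position h(key)
    step = 1
    while True:
        i = (i + step) % m
        yield i
        step += 1


seq_13 = list(islice(rehash_sequence(13), 4))
seq_23 = list(islice(rehash_sequence(23), 4))   # hypothetical key, same home position 3
print(seq_13, seq_23)
```

Because the sequence depends only on the home position, both keys compete for exactly the same positions, which is what secondary clustering means.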
[Figure: hash table with positions 0 to 9; recoverable keys: 10, 53, 14, 15, 46]
Linear Probing
Store the key into the next free position
$h_0 = h(key)$, usually $h_0 = key \bmod m$
$h_i = (h_{i-1} + 1) \bmod m$, $i \ge 1$
[Figure: hash table with positions 0 to 9 holding keys 301, 22, 102, 452, 35, 99]
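Linear probing can be sketched as below, using the keys from the figure with m = 10. The insertion order and the helper name `insert_linear` are assumptions made for illustration; the probe count includes the final probe that finds the free position.

```python
def insert_linear(table: list, key: int) -> tuple:
    """Insert key with linear probing; return (position, probes used)."""
    m = len(table)
    i = key % m                    # h0 = key mod m
    probes = 1
    while table[i] is not None:
        if probes == m:            # every position was examined
            raise RuntimeError("table is full")
        i = (i + 1) % m            # hi = (h(i-1) + 1) mod m
        probes += 1
    table[i] = key
    return i, probes


table = [None] * 10
results = {key: insert_linear(table, key) for key in [301, 22, 102, 452, 35, 99]}
total_probes = sum(probes for _, probes in results.values())
print(results, total_probes)
```

With this order, keys 22, 102 and 452 all start at position 2 and end up in positions 2, 3 and 4, so later keys in the group pay more probes.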
Observation 1
Different insertion orders of the same keys can need different numbers of probes
S2 = {53,40,33,31,12,22,77,50,8,99,27,3,11} => 30 probes
position  key  number of probes
0         77   2
1         27   1
2         12   4
3         3    1
4         40   4
5         31   1
6         53   6
7         33   1
8         99   1
9         8    2
10        22   2
11        11   1
12        50   2
Observation 2
[Figure: hash table holding keys 70, 12, 33, 14, 55, 65, 75, 85]
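Observation 3
Linear probing tends to create long sequences of occupied positions
$$P = \frac{B + 1}{m}$$
P: probability that the next insertion extends a sequence of B occupied positions in a table of m positions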
Observation 4
Linear probing suffers from both
primary and secondary clustering
Solution: double hashing
uses two hash functions h1, h2 and a
rehashing function rh
Double Hashing
Two hash functions and a rehashing
function
$$h_2(key) = \begin{cases} m \ \mathrm{div}\ 2 & \text{if } q = 0 \\ q & \text{if } q \neq 0 \end{cases}$$
Example (continued)
A. m = 10, key = 23
h1(23) = 3, h2(23) = 2
rh(3,2) = (3+2) mod 10 = 5
rehash sequence: 5, 7, 9, 1, ...
B. m = 10, key = 13
h1(13) = 3, h2(13) = 1, rh(3,1) = (3+1) mod 10 = 4
rehash sequence: 4, 5, 6, ...
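The two rehash sequences can be reproduced with a short sketch. The definition of q did not survive in the slides; the code assumes q = (key div m) mod m, which agrees with h2(23) = 2 and h2(13) = 1 but is otherwise a guess. The rehash function rh(i, j) = (i + j) mod m is taken from the example.

```python
from itertools import islice


def h1(key: int, m: int) -> int:
    return key % m


def h2(key: int, m: int) -> int:
    q = (key // m) % m          # assumption: definition of q is not in the slides
    return m // 2 if q == 0 else q


def rh(i: int, j: int, m: int) -> int:
    return (i + j) % m


def rehash_sequence(key: int, m: int):
    """Yield the positions tried after the home position h1(key)."""
    i, j = h1(key, m), h2(key, m)
    while True:
        i = rh(i, j, m)
        yield i


seq_23 = list(islice(rehash_sequence(23, 10), 4))
seq_13 = list(islice(rehash_sequence(13, 10), 3))
print(seq_23, seq_13)
```

Although both keys share the home position 3, their step sizes differ, so their rehash sequences separate after the first collision.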
Performance of Open Addressing
Distinguish between successful and unsuccessful search
probes are treated as independent events
load factor: α = n/m
α: probability to probe an occupied position
each position has the same probability P = 1/m
Unsuccessful Search
The hash sequence is exhausted
let u be the expected number of probes
u equals the expected length of the hash
sequence
P(k): probability to search k positions in
the hash sequence
$$u = \sum_{k \ge 1} k\,P(k) = P(1) + \big[P(2) + P(2)\big] + \big[P(3) + P(3) + P(3)\big] + \cdots$$
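$$u = \sum_{k \ge 1} P(\ge k \text{ probes}) = \sum_{k \ge 1} \alpha^{k-1} = \frac{1}{1-\alpha}$$
independent events: the k-th probe is needed only if the first k-1 probes all hit occupied positions, which has probability $\alpha^{k-1}$ (α = n/m is the load factor)
u increases with α => performance drops as α increases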
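As a quick numerical check, the closed form u = 1/(1 - α) can be compared with a truncated version of the series. The function names and the number of terms are illustrative choices.

```python
def unsuccessful_probes(alpha: float) -> float:
    """Expected probes of an unsuccessful search, u = 1 / (1 - alpha)."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1.0 / (1.0 - alpha)


def unsuccessful_probes_series(alpha: float, terms: int = 10_000) -> float:
    """Same quantity as the truncated sum of alpha**(k-1) for k = 1..terms."""
    return sum(alpha ** (k - 1) for k in range(1, terms + 1))


for alpha in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"{alpha:.2f}  closed form {unsuccessful_probes(alpha):6.2f}"
          f"  series {unsuccessful_probes_series(alpha):6.2f}")
```

For load factors of 25%, 50%, 75%, 90% and 95% the closed form gives 1.33, 2.00, 4.00, 10.00 and 20.00, the same values listed for unsuccessful search with double hashing in the Experimental Results table.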
Successful Search
The hash sequence is not exhausted
the number of probes to find a key:
$$s = \frac{1}{\alpha}\int_0^{\alpha} (u + 1)\,dx = 1 + \frac{1}{\alpha}\ln\frac{1}{1-\alpha}$$
(approximation, with $u = 1/(1-x)$ under the integral)
s increases with α
Performance
The performance drops as the load factor α increases
the higher the value of α is, the higher
the probability of collisions
Experimental Results
LOAD     SUCCESSFUL                     UNSUCCESSFUL
FACTOR   LINEAR   i + bkey   DOUBLE     LINEAR   i + bkey   DOUBLE
25%      1.17     1.16       1.15       1.39     1.37       1.33
50%      1.50     1.44       1.39       2.50     2.19       2.00
75%      2.50     2.01       1.85       8.50     4.64       4.00
90%      5.50     2.85       2.56       50.50    11.40      10.00
95%      10.50    3.52       3.15       200.50   22.04      20.00
         SUCCESSFUL
m        LINEAR   i + bkey   DOUBLE     UNSUCCESSFUL   LOG2m
100      6.60     4.62       4.12       50.50          6.64
500      14.35    6.22       5.72       250.50         8.97
1000     20.15    6.91       6.41       500.50         9.97
5000     44.64    8.52       8.02       2500.50        12.29
10000    63.00    9.21       8.71       5000.50        13.29
Separate Chaining
Keys hashing to the same hash value
are stored in separate lists
one list per hash position
can store more than m records
easy to implement
the keys in each list can be ordered
[Figure: separate chaining table with one nil-terminated linked list per position; keys shown: 40, 91, 42, 130, 192, 75, 66, 16, 87, 67, 417, 227, 49]
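Separate chaining can be sketched with one Python list per hash position. The table size m = 10, the hash function key mod m and the class name `ChainedHashTable` are assumptions for illustration; the keys are the ones that appear in the figure.

```python
class ChainedHashTable:
    """Hash table with one chain (list) per hash position."""

    def __init__(self, m: int):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def insert(self, key: int) -> None:
        chain = self.chains[key % self.m]
        if key not in chain:          # ignore duplicates
            chain.append(key)

    def search(self, key: int) -> tuple:
        """Return (found, number of key comparisons)."""
        chain = self.chains[key % self.m]
        for comparisons, stored in enumerate(chain, start=1):
            if stored == key:
                return True, comparisons
        return False, len(chain)      # the entire chain was searched


table = ChainedHashTable(10)
for key in [40, 91, 42, 130, 192, 75, 66, 16, 87, 67, 417, 227, 49]:
    table.insert(key)
print(table.chains)
```

The sketch stores 13 keys in a table of 10 positions, which open addressing could not do; an unsuccessful search such as `table.search(57)` walks the whole chain at position 7.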
Performance of Separate
Chaining
Depends on the average chain size
insertions are independent events
let P(c,n,m): probability that a position receives exactly c of the n inserted keys (table of m positions)
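$$P(c, n, m) = \binom{n}{c}\left(\frac{1}{m}\right)^{c}\left(1 - \frac{1}{m}\right)^{n-c} = \frac{1}{c!}\cdot\frac{n}{m}\cdot\frac{n-1}{m}\cdots\frac{n-c+1}{m}\left(1 - \frac{1}{m}\right)^{n-c}$$
For $n, m \to \infty$ with $\alpha = n/m$ fixed:
$$\frac{n-c+1}{m} \Rightarrow \alpha, \qquad \left(1 - \frac{1}{m}\right)^{n-c} \Rightarrow e^{-\alpha}$$
$$P(c, n, m) = \frac{1}{c!}\,\alpha^{c}\,e^{-\alpha} \quad \text{(Poisson)}$$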
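The quality of the Poisson approximation can be checked numerically. The sketch below compares the exact binomial probability with the Poisson limit for an arbitrarily chosen n = 5000 and m = 10000 (α = 0.5); the function names are illustrative.

```python
from math import comb, exp, factorial


def p_binomial(c: int, n: int, m: int) -> float:
    """Exact probability that a position receives c of n keys."""
    return comb(n, c) * (1 / m) ** c * (1 - 1 / m) ** (n - c)


def p_poisson(c: int, alpha: float) -> float:
    """Poisson limit (1/c!) * alpha**c * e**(-alpha)."""
    return alpha ** c * exp(-alpha) / factorial(c)


n, m = 5000, 10000          # arbitrary example values, alpha = 0.5
for c in range(5):
    print(c, round(p_binomial(c, n, m), 5), round(p_poisson(c, n / m), 5))
```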
Unsuccessful Search
The entire chain is searched
the average number of comparisons
equals its average length u
$$u = \sum_{c \ge 0} c\,P(c, \alpha) = \sum_{c \ge 0} c\,\frac{\alpha^{c}}{c!}\,e^{-\alpha} = \alpha$$
Successful Search
Not the whole chain is searched
the average number of comparisons:
$$s = \frac{1}{\alpha}\int_0^{\alpha} (u + 1)\,dx = \frac{1}{\alpha}\int_0^{\alpha} (x + 1)\,dx = 1 + \frac{\alpha}{2}$$
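Both results for separate chaining, u = α and s = 1 + α/2, can be verified numerically. The sketch below sums the Poisson-weighted chain lengths and integrates (x + 1) with a midpoint rule; the function names, truncation limit and step count are illustrative choices.

```python
from math import exp, factorial


def u_chaining(alpha: float, max_c: int = 100) -> float:
    """Average chain length: sum over c of c * alpha**c * e**(-alpha) / c!."""
    return sum(c * alpha ** c * exp(-alpha) / factorial(c) for c in range(max_c))


def s_chaining(alpha: float, steps: int = 10_000) -> float:
    """(1/alpha) * integral from 0 to alpha of (x + 1) dx, midpoint rule."""
    dx = alpha / steps
    integral = sum(((k + 0.5) * dx + 1) * dx for k in range(steps))
    return integral / alpha


alpha = 0.8                  # example load factor
print(u_chaining(alpha), s_chaining(alpha))
```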
Performance
The performance drops as the chains grow longer, i.e. as the load factor α = n/m increases
Coalesced Hashing
The hash sequence is implemented as a linked list within the hash table
no rehash function
the next hash position is taken from the list of empty positions
[Figure: keys 49, 29, 59, 19 linked into one list inside the table]
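initialization
avail: head of the list of empty positions

position  key     next
0         nilkey  -1
1         nilkey   0
2         nilkey   1
3         nilkey   2
4         nilkey   3
5         nilkey   4
6         nilkey   5
7         nilkey   6
8         nilkey   7
9         nilkey   8

The next column links the list of empty positions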
initially: avail = 9
h(key) = key mod 10
keys:
14,29,34,28,42,39,84,38
position  key     next
0         nilkey  -1
1         nilkey   0
2         42      -1
3         38      -1
4         14       8
5         84      -1
6         39      -1
7         28       3
8         34       5
9         29       6

The next column holds the lists of rehashing positions and the list of empty positions
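A minimal sketch of coalesced hashing, assuming one common variant: a colliding key is appended to the end of the chain that starts at its home position, and free positions are taken from the highest index downwards. Instead of threading the empty positions through the link field as in the slides, the sketch keeps a simple downward-scanning cursor `avail`. All names are illustrative; the sketch reproduces the key positions of the example, and its link values are those of this variant, not a claim about the figure.

```python
NILKEY = None    # marker for an empty position
NIL = -1         # end-of-chain marker


class CoalescedHashTable:
    def __init__(self, m: int):
        self.m = m
        self.keys = [NILKEY] * m
        self.next = [NIL] * m
        self.avail = m - 1              # highest position that may still be free

    def insert(self, key: int) -> int:
        i = key % self.m
        if self.keys[i] is NILKEY:      # home position is free
            self.keys[i] = key
            return i
        while self.next[i] != NIL:      # walk to the end of the chain
            i = self.next[i]
        while self.avail >= 0 and self.keys[self.avail] is not NILKEY:
            self.avail -= 1             # skip positions filled in the meantime
        if self.avail < 0:
            raise RuntimeError("table is full")
        j = self.avail
        self.keys[j] = key
        self.next[i] = j                # link the new position into the chain
        return j

    def search(self, key: int) -> int:
        i = key % self.m
        while i != NIL and self.keys[i] is not NILKEY:
            if self.keys[i] == key:
                return i
            i = self.next[i]
        return NIL


table = CoalescedHashTable(10)
for key in [14, 29, 34, 28, 42, 39, 84, 38]:
    table.insert(key)
print(table.keys)
```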
Performance of Coalesced
Hashing
Unsuccessful search
$$\frac{1}{4}e^{2\alpha} - \frac{\alpha}{2} + 0.75 \ \text{probes/search}$$
Successful search
$$\frac{e^{2\alpha} - 1}{8\alpha} + \frac{\alpha}{4} + 0.75 \ \text{probes/search}$$