Hashing
ES-103 (Data Structure)
Alg.: DIRECT-ADDRESS-INSERT(T, x)
T[x.key] = x
Alg.: DIRECT-ADDRESS-DELETE(T, x)
T[x.key] = NIL
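The two direct-address operations above can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the class name `DirectAddressTable`, the `universe_size` parameter and the `search` method are assumptions added for the example, and keys are assumed to be integers in 0..universe_size-1.

```python
class DirectAddressTable:
    """Direct-address table: one slot for every possible key (hypothetical sketch)."""

    def __init__(self, universe_size):
        # None plays the role of NIL in the pseudocode.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        # DIRECT-ADDRESS-INSERT: T[x.key] = x
        self.slots[key] = value

    def delete(self, key):
        # DIRECT-ADDRESS-DELETE: T[x.key] = NIL
        self.slots[key] = None

    def search(self, key):
        # Not shown in the slides; assumed here so the result can be inspected.
        return self.slots[key]


table = DirectAddressTable(universe_size=10)
table.insert(3, "three")
found = table.search(3)
table.delete(3)
after_delete = table.search(3)
print(found, after_delete)
```

Each operation is a single array access, which is why the table needs one slot per key in the universe.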
Example 2:
Hash Tables
When K is much smaller than U, a hash table requires
much less space than a direct-address table
Can reduce storage requirements to |K|
Can still get O(1) search time, but only in the average case,
not in the worst case
Idea
Use a function h to compute the slot for each key
Store the element in slot h(k)
Hash Tables
A hash function h transforms a key into an index in a hash
table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
• Advantages:
Reduce the range of array indices handled: m
instead of |U|
Storage is also reduced
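Example: Hash Tables
[Figure: the universe of keys U contains the set K of actual keys k1, ..., k5. The hash function h maps each key to a slot of table T[0..m-1]: h(k1), h(k4), h(k3), and h(k2) = h(k5), i.e., k2 and k5 hash to the same slot.]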
Revisit Example 2
• Suppose that the keys are nine-digit Social Security numbers (ssn)
• Hash Function : h(x) = x mod 100 (Table Size: T[0..99])
h(ssn) = ssn mod 100 (last 2 digits of ssn)
For example, if ssn = 101234511 then
h(101234511) = 101234511 mod 100 = 11
[Table T[0..99]: slot 11 holds 101234511; the other slots are shown empty.]
Problem with this approach?
Revisit Example 2 (continued)
• For ssn = 111456811:
h(111456811) = 111456811 mod 100 = 11
• Slot 11 already holds 101234511: Collision!
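The collision in this example can be checked directly. The sketch below assumes the slide's hash function h(x) = x mod 100; the function name `h` and the default `m=100` are illustrative choices only.

```python
def h(ssn, m=100):
    """Hash a nine-digit ssn to a slot in T[0..m-1] (the last two digits when m = 100)."""
    return ssn % m


slot_a = h(101234511)
slot_b = h(111456811)
collides = slot_a == slot_b
print(slot_a, slot_b, collides)
```

Both numbers end in 11, so they land in the same slot even though only two keys are stored in a table of 100 slots.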
Collision
• Two or more keys hash to the same slot!!
• For a given set K of keys
If |K| ≤ m, collision may or may not happen, depending on
the hash function
If |K| > m, collisions will definitely happen (i.e., there
must be at least two keys that have the same hash value)
Slot j contains a pointer to the head of the list of all elements that hash to j
Example
• M=10
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
• Hash Function (HF) : Key mod M
• Collision Resolution Technique : Chaining
Example (continued): table after inserting all keys
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
Slot   Keys in the chain (in order of arrival)
0      (empty)
1      (empty)
2      72
3      (empty)
4      (empty)
5      25, 55, 75
6      16
7      27
8      28, 48
9      99, 69
• The first six keys (25, 99, 16, 27, 28, 72) each land in an empty slot.
• 55 collides with 25 in slot 5, 69 collides with 99 in slot 9, 48 collides with 28 in slot 8, and 75 collides again in slot 5.
• Every chain ends with NULL.
Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
insert x at the head of list T[h(x.key)]
• Assumes that the element being inserted isn’t already in
the list
• It would take an additional search to check if it was
already inserted
• Worst-case running time is O(1)
Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(x.key)]
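Chained insertion and deletion can be sketched as below, using the earlier example (M = 10, hash function key mod M). This is an illustration under stated assumptions: the class name `ChainedHashTable` is hypothetical, `collections.deque` stands in for a linked list, and `delete` takes a key rather than a pointer to the element, so it has to walk the chain first.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with collision resolution by chaining (hypothetical sketch)."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; an empty deque corresponds to a NULL head pointer.
        self.table = [deque() for _ in range(m)]

    def h(self, key):
        return key % self.m

    def insert(self, key):
        # CHAINED-HASH-INSERT: put the key at the head of its chain, O(1).
        # Assumes the key is not already present.
        self.table[self.h(key)].appendleft(key)

    def search(self, key):
        # Walks the chain, so the cost grows with the chain length.
        return key in self.table[self.h(key)]

    def delete(self, key):
        # CHAINED-HASH-DELETE by key: find the key in its chain and unlink it.
        self.table[self.h(key)].remove(key)


t = ChainedHashTable(10)
for key in [25, 99, 16, 27, 28, 72, 55, 69, 48, 75]:
    t.insert(key)

chain_5 = list(t.table[5])
chain_8 = list(t.table[8])
chain_9 = list(t.table[9])
t.delete(55)
chain_5_after = list(t.table[5])
print(chain_5, chain_8, chain_9, chain_5_after)
```

Because new keys go to the head, each chain lists its keys in reverse order of arrival, for example 75, 55, 25 in slot 5.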
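Example : m = 8 = 2^3 (keys written in binary)
00101 mod 2^3 = 101
01101 mod 2^3 = 101   Collision
10101 mod 2^3 = 101
11101 mod 2^3 = 101
i.e., when m = 2^k, key mod 2^k is just the k low-order bits of the key:
1010111001110101010101 mod 2^k = 101010101 (here k = 9)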
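The effect of choosing m = 2^k can be reproduced with a few lines of Python. The helper name `low_bits_hash` and its parameter `p` are illustrative assumptions; the keys are the binary strings from the example above.

```python
def low_bits_hash(key, p):
    """Division-method hash with table size m = 2**p."""
    return key % (2 ** p)


keys = [0b00101, 0b01101, 0b10101, 0b11101]
slots = [low_bits_hash(k, 3) for k in keys]

# key mod 2**p keeps exactly the p low-order bits, same as a bit mask.
same_as_mask = all(k % 8 == (k & 0b111) for k in keys)

big_key = int("1010111001110101010101", 2)
low9 = format(low_bits_hash(big_key, 9), "b")
print(slots, same_as_mask, low9)
```

All four keys share the low-order bits 101, so they collide regardless of their high-order bits; the hash ignores most of the key.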
• Quadratic probing
• Double hashing
Linear probing: Inserting a key
• Idea
when there is a collision, check the next
available position in the table (i.e.,
probing)
• Open Addressing
Linear Probing, Quadratic Probing, Double hashing
Linear Probing : Discussion
h' : U → {0, 1, ..., m - 1}
h(k, i) = (h'(k) + i) mod m for i = 0, 1, …, m-1
• Given key k, we first probe T[h'(k)]
• We next probe T[h'(k) + 1], and so on up to T[m - 1].
• Then, we wrap around to T[0], T[1], ..., T[h'(k) - 1]
• m distinct probe sequences possible, because the
initial probe determines the entire probe sequence.
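The probe order described above can be listed explicitly. In this sketch the function name `linear_probe_sequence` is hypothetical, and h'(k) = k mod m is assumed as the auxiliary hash function.

```python
def linear_probe_sequence(k, m):
    """Slots probed for key k: h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed."""
    start = k % m
    return [(start + i) % m for i in range(m)]


seq = linear_probe_sequence(23, 10)
print(seq)
```

For k = 23 and m = 10 the sequence starts at slot 3, runs to slot 9, then wraps around to slots 0, 1, 2, so every slot is visited exactly once.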
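Linear Probing
m = 10 {0-9}
H(k) = k mod m
Keys = 10, 11, 22, 33, 20, 30, 40, 50
Slot   Key
0      10
1      11
2      22
3      33
4      20
5      30
6      40
7
8
9
• Keys 20, 30 and 40 all hash to slot 0 and are pushed to slots 4, 5 and 6, so slots 0-6 form one long run of full slots.
• Key 50 also hashes to slot 0 and has to probe past the whole run before reaching the free slot 7.
• Linear Probing suffers from Primary Clustering: the average search time increases.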
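The growth of the cluster can be seen by counting probes during insertion. The sketch below assumes H(k) = k mod m as in the example; the function name `linear_probing_insert` and the probe counter are additions for illustration.

```python
def linear_probing_insert(table, key):
    """Insert key by linear probing; return (slot used, number of probes)."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot, i + 1
    raise OverflowError("hash table is full")


table = [None] * 10
probes = {}
for key in [10, 11, 22, 33, 20, 30, 40, 50]:
    slot, count = linear_probing_insert(table, key)
    probes[key] = count

print(table)
print(probes)
```

The keys that hash to slot 0 need more and more probes as the run of full slots grows, which is the primary clustering effect.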
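Quadratic probing
• Quadratic probing was introduced to solve the problem of
Primary Clustering.
H(k) = k mod m
QP(key, i) = (H(key) + C1·i + C2·i^2) mod m, i ∈ {0, 1, 2, ..., m - 1}
Example: inserting key 59 with m = 11 and C1 = C2 = 1 (slots 6, 7, 8, 9 already hold 15, 18, 17, 31)
QP(59, 0) = (H(59) + 1·0 + 1·0^2) mod 11 = 4    Collision
QP(59, 1) = (H(59) + 1·1 + 1·1^2) mod 11 = 6    Collision
QP(59, 2) = (H(59) + 1·2 + 1·2^2) mod 11 = 10   Collision
QP(59, 3) = (H(59) + 1·3 + 1·3^2) mod 11 = 5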
Quadratic Probing
• Quadratic probing uses a hash function of the form
h(k, i) = (h'(k) + C1·i + C2·i^2) mod m
where C1 and C2 are positive constants and i ∈ {0, 1, 2, …, m-1}
• The initial position probed is T[h'(k)]; later positions
probed are offset by amounts that depend in a quadratic
manner on the probe number i.
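The quadratic probe formula can be evaluated for every probe number i. This sketch assumes h'(k) = k mod m and C1 = C2 = 1, with m = 11 and key 59 as sample values; the function name `quadratic_probe_sequence` is hypothetical.

```python
def quadratic_probe_sequence(k, m, c1=1, c2=1):
    """Slots h(k, i) = (h'(k) + c1*i + c2*i*i) mod m for i = 0..m-1, with h'(k) = k mod m."""
    start = k % m
    return [(start + c1 * i + c2 * i * i) % m for i in range(m)]


seq = quadratic_probe_sequence(59, 11)
distinct_slots = sorted(set(seq))
print(seq)
print(distinct_slots)
```

With these particular constants the sequence for key 59 begins 4, 6, 10, 5 and reaches only 6 of the 11 slots, so an insertion can fail to find a free slot even when the table is not full. Whether the whole table is covered depends on the choice of C1, C2 and m.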
Quadratic Probing
• If two keys have the same initial probe position,
then their probe sequences are the same,
since h(k1, 0) = h(k2, 0)
implies h(k1, i) = h(k2, i)
• This property leads to Secondary Clustering.
• Similar to linear probing, the initial probe
determines the entire sequence and so only m
distinct probe sequences are used.
Example:
m = 10 {0-9}
H(k) = k mod m
Key = 10, 15, 74, 20, 35, 45, 92
CRT = QP
Slot   Key
0      10
1      45
2      20
3
4      74
5      15
6
7      35
8      92
9
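The table in this example can be reproduced with a short insertion routine. The slide does not state the constants; C1 = C2 = 1 is assumed here because it yields exactly the table shown. The function name `quadratic_probing_insert` is hypothetical.

```python
def quadratic_probing_insert(table, key, c1=1, c2=1):
    """Insert key using h(k, i) = (k mod m + c1*i + c2*i*i) mod m; return the slot used."""
    m = len(table)
    for i in range(m):
        slot = (key % m + c1 * i + c2 * i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found on the probe sequence")


table = [None] * 10
placed = {}
for key in [10, 15, 74, 20, 35, 45, 92]:
    placed[key] = quadratic_probing_insert(table, key)

# Keys 15, 35 and 45 share the initial slot 5, so they follow the same probe sequence.
shared_sequence = [(5 + i + i * i) % 10 for i in range(3)]
print(table)
print(shared_sequence)
```

Keys 15, 35 and 45 all start at slot 5 and walk the same sequence 5, 7, 1, each one stopping one step later than the previous key. That shared path is the secondary cluster.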
Linear and Quadratic
m = 10 {0-9}
H(k)=k mod m
Key = 10, 15, 74, 20, 35, 45, 92, 30
CRT= QP
Primary Cluster
Secondary Cluster
Example 1: Double hashing
• m = 13
Key = 79, 69, 72, 98, 50, 14
h1(k) = k mod m
h2(k) = 1 + (k mod (m-3))
CRT = Double Hashing
H(98, 0) = (h1(98) + 0·h2(98)) mod 13 = 7    Collision (slot 7 holds 72)
H(98, 1) = (h1(98) + 1·h2(98)) mod 13 = 3
H(50, 0) = (h1(50) + 0·h2(50)) mod 13 = 11
H(14, 0) = (h1(14) + 0·h2(14)) mod 13 = 1    Collision (slot 1 holds 79)
H(14, 1) = (h1(14) + 1·h2(14)) mod 13 = 6
Table after inserting all keys:
Slot   Key
0
1      79
2
3      98
4      69
5
6      14
7      72
8
9
10
11     50
12
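Example 1 can be replayed with a small double-hashing insert. The hash functions are the ones given on the slide, h1(k) = k mod 13 and h2(k) = 1 + (k mod 10); the function name `double_hash_insert` is a hypothetical helper.

```python
def double_hash_insert(table, key, h1, h2):
    """Insert key using H(key, i) = (h1(key) + i*h2(key)) mod m; return the slot used."""
    m = len(table)
    for i in range(m):
        slot = (h1(key) + i * h2(key)) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no free slot found on the probe sequence")


m = 13
table = [None] * m


def h1(k):
    return k % m


def h2(k):
    return 1 + (k % (m - 3))


slots = {}
for key in [79, 69, 72, 98, 50, 14]:
    slots[key] = double_hash_insert(table, key, h1, h2)

print(slots)
```

Keys 98 and 14 each collide once at their h1 slot and are then moved by a key-dependent step h2, so two keys with the same initial slot generally follow different probe sequences.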
Example 2: Double hashing
• m = 11
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
h1(k) = k mod m
h2(k) = 1 + (k mod (m-1))
CRT = Double Hashing
H(key, i) = (h1(key) + i·h2(key)) mod 11
H(10, 0) = (h1(10) + 0·h2(10)) mod 11 = 10
H(22, 0) = (h1(22) + 0·h2(22)) mod 11 = 0
H(31, 0) = (h1(31) + 0·h2(31)) mod 11 = 9
H(4, 0) = (h1(4) + 0·h2(4)) mod 11 = 4
H(15, 0) = (h1(15) + 0·h2(15)) mod 11 = 4    Collision
H(15, 1) = (h1(15) + 1·h2(15)) mod 11 = 10   Collision
H(15, 2) = (h1(15) + 2·h2(15)) mod 11 = 5
H(18, 0) = (h1(18) + 0·h2(18)) mod 11 = 7
H(17, 0) = (h1(17) + 0·h2(17)) mod 11 = 6
H(88, 0) = (h1(88) + 0·h2(88)) mod 11 = 0    Collision
H(88, 1) = (h1(88) + 1·h2(88)) mod 11 = 9    Collision
H(88, 2) = (h1(88) + 2·h2(88)) mod 11 = 7    Collision
H(88, 3) = (h1(88) + 3·h2(88)) mod 11 = 5    Collision
H(88, 4) = (h1(88) + 4·h2(88)) mod 11 = 3
H(59, 0) = (h1(59) + 0·h2(59)) mod 11 = 4    Collision
H(59, 1) = (h1(59) + 1·h2(59)) mod 11 = 3    Collision
H(59, 2) = (h1(59) + 2·h2(59)) mod 11 = 2
Table after inserting all keys:
Slot   Key
0      22
1
2      59
3      88
4      4
5      15
6      17
7      18
8
9      31
10     10
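The probe-by-probe trace of Example 2 can be generated rather than computed by hand. This sketch uses the slide's h1(k) = k mod 11 and h2(k) = 1 + (k mod 10); the helper name `insert_with_trace` is hypothetical and simply records every slot examined.

```python
def insert_with_trace(table, key, h1, h2):
    """Insert key by double hashing; return the list of slots probed, last one being the slot used."""
    m = len(table)
    probed = []
    for i in range(m):
        slot = (h1(key) + i * h2(key)) % m
        probed.append(slot)
        if table[slot] is None:
            table[slot] = key
            return probed
    raise OverflowError("no free slot found on the probe sequence")


m = 11
table = [None] * m


def h1(k):
    return k % m


def h2(k):
    return 1 + (k % (m - 1))


traces = {}
for key in [10, 22, 31, 4, 15, 18, 17, 88, 59]:
    traces[key] = insert_with_trace(table, key, h1, h2)

for key, probed in traces.items():
    print(key, probed)
print(table)
```

Every entry in a trace except the last is a collision. Key 88 needs five probes (slots 0, 9, 7, 5, 3) and key 59 needs three (slots 4, 3, 2).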