0% found this document useful (0 votes)
92 views65 pages

Hashing PDF

This document provides an introduction to hashing. It discusses the motivation for using hashing to provide constant time search. Direct addressing is described which uses an array indexed by keys, but requires all keys to be distinct. Hashing is introduced to reduce storage by using a hash function to map keys to table slots, which can cause collisions. Collision handling techniques of chaining and open addressing are outlined. Chaining stores collided keys in linked lists at table slots, while open addressing probes for empty slots. Prime table sizes and good hash functions are discussed to minimize collisions.

Uploaded by

SYED MAJID AHMAD
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
92 views65 pages

Hashing PDF

This document provides an introduction to hashing. It discusses the motivation for using hashing to provide constant time search. Direct addressing is described which uses an array indexed by keys, but requires all keys to be distinct. Hashing is introduced to reduce storage by using a hash function to map keys to table slots, which can cause collisions. Collision handling techniques of chaining and open addressing are outlined. Chaining stores collided keys in linked lists at table slots, while open addressing probes for empty slots. Prime table sizes and good hash functions are discussed to minimize collisions.

Uploaded by

SYED MAJID AHMAD
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 65

An Introduction To

Hashing
ES-103 (Data Structure)

Dept of Computer Science & Engineering


Indian Institute of Information Technology, Design and Manufacturing
Jabalpur (MP)
Motivation
 Search Time O(1)
 Other Data Structures:
 Linked-List
 Array …………….
Search Time is O(N)
 Tree
Search Time O(log N)
N: no. of elements
The Search Problem
 Find items with keys matching a given search key
 Given an array A, containing n keys, and a search key
x, find the index i such as x=A[i]
Direct Addressing
 Assumption
 Key values are distinct
 Each key is drawn from a universe U = {0, 1, . . . ,m-1}
 Idea
 Store the items in an array, indexed by keys
 Direct-address table representation:
 An array T[0 . . . m - 1]
 Each slot, or position, in T corresponds to a key in U
 For an element x with key k, a pointer to x (or x itself)
will be placed in location T[k]
 If there are no elements with key k in the set, T[k] is
empty, represented by NIL
Direct Addressing
Direct Addressing
Alg.: DIRECT-ADDRESS-SEARCH(T, k)
return T[k]

Alg.: DIRECT-ADDRESS-INSERT(T, x)
T[x. key ]= x

Alg.: DIRECT-ADDRESS-DELETE(T, x)
T[x. key] = NIL

Running time for these operations: O(1)


Direct Addressing :
Example 1:

Example 2:
Hash Tables
 When K is much smaller than U, a hash table requires
much less space than a direct-address table
 Can reduce storage requirements to |K|
 Can still get O(1) search time, but on the average case,
not the worst case

 Idea
 Use a function h to compute the slot for each key
 Store the element in slot h(k)
Hash Tables
 A hash function h transforms a key into an index in a hash
table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
• Advantages:
 Reduce the range of array indices handled: m
instead of |U|
 Storage is also reduced
Example: Hash Tables

U
(universe of keys) h(k1)
h(k4)

k1
k4 k2 h(k2) = h(k5)
K
(actual k k3 h(k3)
5
keys)

m-1
Revisit Example 2 0
1
• Suppose that the keys are nine-digit 2
Social Security numbers (ssn) 3
4
• Hash Function : h(x) = x mod 100 5
(Table Size) T[0-99] 6
7
 h(ssn)=ssn mod 100 (Last 2 digits of ssn)
8
 for example if ssn = 101234511 then 9
h(101234511) =101234511mod100= 11 10
11 101234511
.
.
.
.

99
Problem with this approach?
Revisit Example 2 0
1
• Suppose that the keys are nine-digit 2
Social Security numbers (ssn) 3
4
• Hash Function : h(x) = x mod 100 5
(Table Size) T[0-99] 6
7
 h(ssn)=ssn mod 100 (Last 2 digits of ssn)
8
 for example if ssn = 101234511 then 9
h(101234511) = 101234511mod100=11 10
11 101234511
For ssn = 111456811 then .
.
h(111456811)= 111456811mod 100 = 11
.
.

Collision ?........ 99
Collision
• Two or more keys hash to the same slot!!
• For a given set K of keys
 If |K| ≤ m, collision may or may not happen, depending on
the hash function
 If |K| > m, collisions will definitely happen (i.e., there
must be at least two keys that have the same hash value)

Avoiding collisions completely is hard, even with a good hash


function
Handling Collisions
• Chaining
• Open addressing
 Linear probing
 Quadratic probing
 Double hashing
Handling Collisions Using Chaining
• Idea
 Put all elements that hash to the same slot into a linked
list:

 Slot j contains a pointer to the head of the list of all elements that hash to j
Example
• M=10
• Keys = • Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
• Hash Function (HF) : Key mod M
• Collision Resolution Technique : Chaining
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0

1
72 Null
2

4
25 NULL
5

6 16 NULL
7
27 NULL
8 28 NULL
9
99 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0

1
72 Null
2

4
55 25 NULL
5

6 16 NULL
7
27 NULL
8 28 NULL
9
99 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0

1
72 Null
2

4
55 25 NULL
5

6 16 NULL
7
27 NULL
8 28 NULL
9 99
69 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0

1
72 Null
2

4
55 25
NULL
5

6 16 NULL
7
27 NULL
8 28
48 NULL

9 99
69 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0

1
72 Null
2

4
25
75 55 NULL
5

6 16 NULL
7
27 NULL
8 28
48 NULL

9 99
69 NULL
Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
insert x at the head of list T[h(x. key)]
• Assumes that the element being inserted isn’t already in
the list
• It would take an additional search to check if it was
already inserted
• Worst-case running time is O(1)
Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(x. key)]

• Need to find the element to be deleted.


• Worst-case running time:
 Deletion depends on searching the corresponding
list
Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k)
• search for an element with key k in list T[h(k)]
• Running time is proportional to the length of the list of
elements in slot h(k)
Hashing with Chaining: Worst Case
T
• How long does it take to search
0
for an element with a given key?
• Worst case:
 All n keys hash to the same
slot
 Worst-case time to search is
(n), plus time to compute the
chain
hash function
m-1
Hashing with Chaining: Worst Case
• Hashing with chaining uses extra space, despite the space
is present in the hash table itself.
Hash Function
• A hash function transforms a key into a table address
• What make a good hash function
 Easy to compute
 Approximates a random function: for every input,
every output is equally likely (simple uniform hashing)
 Minimize the chance that closely related keys
hash to the same slot.
The Division Method
• Idea:
 Map a key k into one of the m slots by taking the
remainder of k divided by m
h(k) = k mod m
• Advantage:
 fast, requires only one operation
• Disadvantage:
 Certain values of m are bad, e.g.,
 power of 2
 non-prime numbers
Example :
0
• m= 10 (0 - 9) Hash Table
1
H(key) = key mod m
key = 1234569 mod 10 = 9 2

9 1234569
Example :
m  (8)23

00101mod 23  101
01101mod 23  101 Collision
10101mod 2  101
3

11101mod 23  101
i.e.
m 2 k
1010111001110101010101mod 2  101010101
k

K-bit LSB K-bit


The Division Method m m
K
• Choose m to be a prime, not close 97 100
to a power of 2
 Column 2: k mod 97
 Column 3: k mod 100
Open Addressing
• Idea:
 If we have enough contiguous memory to store all
the keys (m > N)  store the keys in the table itself
 No need to use linked lists anymore
 Basic idea:
 Insertion: if a slot is full, try another one, until
you find an empty one
 Search: follow the same sequence of probes
 Deletion: Difficult .
• Search time depends on the length of the
probe sequence!
Common Open Addressing Methods
• Linear probing

• Quadratic probing

• Double hashing
Linear probing: Inserting a key
• Idea
when there is a collision, check the next
available position in the table (i.e.,
probing)

h(k,i) = (h1(k) + i) mod m


i=0,1,2,...
 First slot probed: h1(k)
 Second slot probed: h1(k) + 1
 Third slot probed: h1(k)+2, and so on
 probe sequence:
< h1(k), h1(k)+1 , h1(k)+2 , ....>
• Can generate m probe sequences
maximum
Example : Linear Probing
m=10 {0-9}
0 10
H(key) = key mod m 1 60
Keys : 10,15, 99, 60, 75, 19, 35, 44
2
LP(10,0)  ( H (10)  0) mod10  (0  0) mod10  0
3
LP(15,0)  ( H (15)  0) mod10  (5  0) mod10  5
LP(99,0)  ( H (99)  0) mod10  (9  0) mod10  9 4
5 15
LP(60,0)  ( H (60)  0) mod10  (0  0) mod10  0
Collision ?.... 6
7
LP(60,1)  ( H (60)  1) mod10  (0  1) mod10  1 8
9 99
Example : Linear Probing
m=10 {0-9}
0 10
H(key) = key mod m 1 60
Keys : 10,15, 99, 60, 75, 19, 35, 44
2 19
LP(75,0)  ( H (75)  0) mod10  (5  0) mod10  5
3
Collision ?....
LP(75,1)  ( H (75)  1) mod10  (5  1) mod10  6 4
LP(19,0)  ( H (19)  0) mod10  (9  0) mod10  9 15
Collision ?...
5
LP(19,1)  ( H (19)  1) mod10  (9  1) mod10  0 6 75
Collision ?...
LP(19,2)  ( H (19)  2) mod10  (9  2) mod10  1 7
Collision ?... 8
LP(19,3)  ( H (19)  3) mod10  (9  3) mod10  2 99
9
Example : Linear Probing
m=10 {0-9}
0 10
H(key) = key mod m 1 60
Keys : 10,15, 99, 60, 75, 19, 35, 44
2 19
LP(35,0)  ( H (35)  0)  (5  0) mod10  5 3
Collision ?.....
LP(35,1)  ( H (35)  1)  (5  1) mod10  6 4 44
Collision ?..... 5 15
LP(35,2)  ( H (35)  2)  (5  2) mod10  7
6 75

LP(44,0)  ( H (44)  0)  (4  0) mod10  4


7 35
8
9 99
Searching: Linear Probing 0 10
• Find element 19
1 60
H (19)  19 mod10  9 2 19
3
• Note:
 In linear probing we are probing 4 44
(attempting) one by one slot 5 15
 To search an element in the table
Best Case = O(1) 6 75
 If empty slot reach and not find element 7 35
Then element is not there
 worst case O(m) 8
 We are not wasting space in the form of 9 99
linked list.
Deletion: Linear Probing 0 10
• In linear probing deletion is difficult because
1 60
deletion of one element, create trouble to other
element 2 19
e.g. delete 60
3
Find element 19 ?..
4 44
• Solution 5 15
Mark the slot with a sentinel value 6 75
DELETED
The deleted slot can later be used for 7 35
insertion 8
Searching will be able to find all the keys
9 99
Hashing
 Search Time O(1)
 Direct Address Table
 Idea:
 Store the items in an array, indexed by keys
 Advantage:
 Search Time: O(1)
 Insertion Time: O(1)
 Deletion Time: O(1)
 Disadvantage:
 High memory Requirement
Hashing
 Hash Table
 Idea:
 Transform the key into one of the memory location in
Hash Table using hash function.
 Advantage:
 Reduced memory location.
 Disadvantage:
 Collision.
Hashing
 collision Resolution Technique:
 Chaining
 Idea:
 Store the keys in linked list, pointing to the same slot
 Advantage:
 Any number of collision can be handled
 Disadvantage:
 Use of extra space, despite the space is present in
hash table itself

• Open Addressing
 Linear Probing, Quadratic Probing, Double hashing
Linear Probing : Discussion 0
h ' : U  {0,1..., m  1}
h(k , i )  (h ' (k )  i ) mod m for i= 0,1,…,m-1 1
'
2
• Given key k, we first probe T [h (k )]
• We next probe T [h (k )]  1 and so on up to
' 3
T [m  1] .
• Then, we wrap around the
.
T [0], T [1],..., T [h (k )  1]
'

.
• m distinct probe sequences possible, because the
initial probe determines the entire probe sequence. .

m-1
Linear Probing 10
m = 10 {0-9}
0
6 full
H(k) = k mod m 1 11
Slot
Keys = 10, 11, 22, 33, 20, 30, 40
2 22
50 Primary
3 33
Clustering
4 20
5 30 Average
Search
6 40 Time
• Linear Probing suffers from 7 increases
Primary Clustering
8
9
Quadratic probing
• Quadratic probing introduced to solve the problem of
Primary Clustering.

H(k)=k mod m
QP(key, i)  ( H (key)  C1i  C2i 2 ) mod m
i {0,1,2,...,m  1}

C1 and C2 are positive constant


Example 1:Quadratic probing 0 10
m = 10 {0-9}
1
H(k)=k mod m
Key=10,15, 74, 20, 35, 45, 92 2 20
CRT= QP, C1 =1, C2 = 1
3
QP (10 ,0)  ( H (10 )  1.0  1.0 2 ) mod 10  0
4 74
QP (15 ,0)  ( H (15 )  1.0  1.0 ) mod 10  5
2

QP (74 ,0)  ( H (74 )  1.0  1.0 2 ) mod 10  4 5 15

QP (20 ,0)  ( H (20 )  1.0  1.0 2 ) mod 10  0 6


Collision ?....... 7
QP (20 ,1)  ( H (20 )  1.1  1.12 ) mod 10  2
8
9
Example1 :Quadratic probing 0 10
m = 10 {0-9}
1 45
H(k)=k mod m
Key=10,15, 74, 20, 35, 45, 92 2 20
CRT= QP 3
QP (35,0)  ( H (35 )  1.0  1.0 2 ) mod 10  5 4 74
Collision ?.......
5 15
QP (35,1)  ( H (35 )  1.1  1.12 ) mod 10  7
6
QP (45,0)  ( H (45 )  1.0  1.0 2 ) mod 10  5
Collision ?....... 7 35
QP (45,1)  ( H (45 )  1.1  1.12 ) mod 10  7 8
Collision ?.......
9
QP (45,2)  ( H (45 )  1.2  1.2 2 ) mod 10  1
Example1 :Quadratic probing 0 10
m = 10 {0-9}
1 45
H(k)=k mod m
Key=10,15, 74, 20, 35, 45, 92 2 20
CRT= QP 3
QP (92 ,0)  ( H (92 )  1.0  1.0 2 ) mod 10  2 4 74
Collision ?.......
5 15
QP (92 ,1)  ( H (92 )  1.1  1.12 ) mod 10  4
Collision ?....... 6
QP (92 ,2)  ( H (92 )  1.2  1.2 2 ) mod 10  8 7 35
8 92
9
Example 2.:Quadratic probing 0 22
m = 11 {0-10}
1
H(k)=k mod m
Key=10,22, 31, 4, 15, 18, 17, 88, 59 2
CRT= QP, C1 =1, C2 = 1
3
QP (10 ,0)  ( H (10 )  1.0  1.0 2 ) mod 11  10 4 4
QP ( 22 ,0)  ( H ( 22 )  1.0  1.0 2 ) mod 11  0
5
QP (31,0)  ( H (31)  1.0  1.0 2 ) mod 11  9

QP (4,0)  ( H (4)  1.0  1.0 ) mod 11  4


2 6 15

QP (15,0)  ( H (15 )  1.0  1.02 ) mod 11  4


7
Collision ?....... 8
QP (15,1)  ( H (15 )  1.1  1.12 ) mod 11  6 9 31
Example 2:Quadratic probing 0 22
m = 11 {0-10}
H(k)=k mod m
1
Key=10,22, 31, 4, 15, 18, 17, 88, 59 2 88
CRT= QP
3
QP (18,0)  ( H (18 )  1.0  1.0 2 ) mod 11  7
4 4
QP (17 ,0)  ( H (17 )  1.0  1.0 2 ) mod 11  6
Collision ?.......
5
QP (17 ,1)  ( H (17 )  1.1  1.12 ) mod 11  8 6 15

QP (88,0)  ( H (88 )  1.0  1.02 ) mod 11  0 7 18

Collision ?.......
8 17

QP (88,1)  ( H (88 )  1.1  1.12 ) mod 11  2 9 31


Example 2:Quadratic probing 0 22
m = 11 {0-10}
1
H(k)=k mod m
Key=10,22, 31, 4, 15, 18, 17, 88, 59 2 88
CRT= QP 3
QP (59 ,0)  ( H (59 )  1.0  1.0 2 ) mod 11  4 4
Collision ?.......
4
QP (59 ,1)  ( H (59 )  1.1  1.12 ) mod 11  6 5 59

Collision ?....... 6 15
QP (59 ,2)  ( H (59 )  1.2  1.2 2 ) mod 11  10 7 18
Collision ?....... 8 17
QP (59 ,3)  ( H (59 )  1.3  1.32 ) mod 11  5 9 31
Quadratic Probing
• Quadratic probing uses hash function of the form
h(k , i)  (h ' (k )  C1i  C2i 2 ) mod m
where C1 and C2 are positive constants and i={0,1,2,…,m-1}
• The initial position probed is T [h ' (k )] later positions
probed are offset by amounts that depend in a quadratic
manner on the probe number i.
Quadratic Probing 0 10
•If two keys have the same initial probe position,
then their probe sequences are the same, 1 45
since h(k1 ,0)  h(k 2 ,0) 2 20
implies h(k1 , i )  h(k2 , i )
• This property leads to Secondary Clustering.
3
m = 10 {0-9} 4 74
H(k)=k mod m 5 15
Key=10,15, 74, 20, 35, 45, 92
CRT= QP 6
7 35
• Similar to linear probing, the initial probe 8 92
determines the entire sequence and so only m
distinct probe sequences are used. 9
Linear and Quadratic
m = 10 {0-9}
H(k)=k mod m
Key=10,15, 74, 20,
35, 45, 92, 30
CRT= QP

Primary Cluster

Secondary Cluster

Linear Probing Quadratic Probing


Double hashing
• Double hashing is a hash function of the form

h(k , i)  (h1 (k )  ih2 (k )) mod m


i {0,1,...,m  1}
here,
h1(k) = k mod m and 2 h ( k )  1  ( k mod m '
)
Where m ' is chosen to be slightly less than m

• m2 probe sequences are used, rather than m, since each


possible (h1(k), h2(k)) pair yields a distinct probe sequence.

• Double hashing introduced to solve the problem of


Secondary Clustering.
Example 1 : Double hashing 0
• m = 13 1 79
2
Key = 79, 69, 72, 98, 50, 14
3
h1(key) = key mod m
4 69
h2(key) = 1+(key mod (m-3) )
5
CRT= Double Hashing
6
H (key, i)  (h1 (key)  ih2 (key)) mod 13 7 72
8
H (79,0)  (h1 (79 )  0.h2 (79 )) mod 13  1
9
H (69,0)  (h1 (69 )  0.h2 (69 )) mod 13  4
10
H (72,0)  (h1 (72 )  0.h2 (72 )) mod 13  7
11

12
Example 1: Double hashing 0
• m = 13 1 79
2
Key = 79, 69, 72, 98, 50, 14
3 98
h1(k) = k mod m
4 69
h2(k) = 1+(k mod (m-3) )
5
CRT= Double Hashing
6
H (98,0)  (h1 (98)  0.h2 (98)) mod 13  7 7 62
Collision ?.... 8
H (98,1)  (h1 (98)  1.h2 (98)) mod 13  3 9

10
H (50,0)  (h1 (50 )  0.h2 (50 )) mod 13  11
11 50
12
Example 1: Double hashing 0
• m = 13 1 79
2
Key = 79, 69, 72, 98, 50, 14
3 98
h1(k) = k mod m
4 69
h2(k) = 1+(k mod (m-3) )
5
CRT= Double Hashing
6 14
H (14,0)  (h1 (14 )  0.h2 (14 )) mod 13  1 7 62
Collision ?.... 8
H (14,1)  (h1 (14 )  1.h2 (14 )) mod 13  6 9

10

11 50
12
Example 2: Double hashing 0 22

• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5
CRT= Double Hashing
6
H (key, i)  (h1 (key)  ih2 (key)) mod 11 7
8
H (10,0)  (h1 (10 )  0.h2 (10 )) mod 11  10
9 31
H (22,0)  (h1 (22 )  0.h2 (22 )) mod 11  0 10 10
H (31,0)  (h1 (31)  0.h2 (31)) mod 11  9
H (4,0)  (h1 (4)  0.h2 (4)) mod 11  4
Example 2: Double hashing 0 22

• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6
H (key, i)  (h1 (key)  ih2 (key)) mod 11 7
H (15,0)  (h1 (15)  0.h2 (15)) mod 11  4 8
9 31
Collision ?.....
H (15,1)  (h1 (15)  1.h2 (15)) mod 11  10 10 10
Collision ?.....
H (15,2)  (h1 (15)  2.h2 (15)) mod 11  5
Example 2: Double hashing 0 22

• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i)  (h1 (key)  ih2 (key)) mod 11 7 18

H (18,0)  (h1 (18)  0.h2 (18)) mod 11  7 8


9 31
H (17 ,0)  (h1 (17 )  0.h2 (17 )) mod 11  6 10 10
Example 2: Double hashing 0 22

• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i)  (h1 (key)  ih2 (key)) mod 11 7 18
H (88,0)  (h1 (88)  0.h2 (88)) mod 11  0 8
Collision ?..... 9 31

H (88,1)  (h1 (88)  1.h2 (88)) mod 11  9 10 10


Collision ?.....
H (88,2)  (h1 (88)  2.h2 (88)) mod 11  7
Collision ?.....
Example 2: Double hashing 0 22

• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3 88
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i)  (h1 (key)  ih2 (key)) mod 11 7 18

H (88,3)  (h1 (88)  3.h2 (88 )) mod 11  5 8


9 31
Collision ?.....
10
H (88,4)  (h1 (88)  4.h2 (88 )) mod 11  3 10
Example 2: Double hashing 0 22

• m = 11 1
2 59
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3 88
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i)  (h1 (key)  ih2 (key)) mod 11 7 18

H (59,0)  (h1 (59 )  0.h2 (59 )) mod 11  4 8


9 31
Collision ?.....
10
H (59,1)  (h1 (59 )  1.h2 (59 )) mod 11  3 10

Collision ?.....
H (59,2)  (h1 (59 )  2.h2 (59 )) mod 11  2

You might also like