HASHING
table)
Hash
function
Table size.
Collision handling scheme
Figure: Simple hash table with table size = 10 (slots numbered 0 to 9)
IIIIII// CCSS88335911--DDAATTAA
IIIIYYRR SSTTRRUUCCTTUURREESS
Hash function:
A hash function maps each key to a cell of the hash table and should distribute the keys
evenly among the cells.
The same hash function is used both to store data in the hash table and to retrieve it.
A hash function is used to implement a hash table.
The integer value returned by the hash function is called the hash key.
If the input keys are integers, the commonly used hash function is
H(key) = key % table size.
E.g. consider the following keys (36, 18, 72, 43, 6) with table size = 8.
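The example keys can be worked through with a short sketch. It assumes the modulo hash function H(key) = key % table size that these notes use for integer keys; the function name hash_mod is chosen here for illustration only.

```c
#include <assert.h>

/* Modulo hash: maps a non-negative integer key to a slot in
   the range 0 .. table_size - 1. The name is illustrative. */
int hash_mod(int key, int table_size)
{
    assert(key >= 0 && table_size > 0);
    return key % table_size;
}
```

With table size 8, the keys 36, 18, 72, 43 and 6 hash to slots 4, 2, 0, 3 and 6 respectively, so no two of them collide.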
The folding method for constructing hash functions begins by dividing the item into
equal-size pieces (the last piece may not be of equal size). These pieces are then added together
to give the resulting hash key value. For example, if our item was the phone number 436-555-
4601, we would take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). After the
addition, 43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need
to perform the extra step of dividing by 11 and keeping the remainder. In this case 210 % 11 is
1, so the phone number 436-555-4601 hashes to slot 1.
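The folding steps above can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the function name fold_hash, the fixed piece length of 2 digits, and the choice to skip non-digit characters such as '-' are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Folding hash: split the digits of item into pieces of 2 digits,
   add the pieces, and reduce the sum modulo the table size.
   Non-digit characters (such as '-') are skipped. */
int fold_hash(const char *item, int table_size)
{
    int sum = 0;     /* running total of the pieces */
    int piece = 0;   /* value of the piece being built */
    int digits = 0;  /* digits collected in the current piece */

    assert(table_size > 0);
    for (; *item != '\0'; item++) {
        if (!isdigit((unsigned char)*item))
            continue;
        piece = piece * 10 + (*item - '0');
        if (++digits == 2) {   /* piece complete: add it and start a new one */
            sum += piece;
            piece = 0;
            digits = 0;
        }
    }
    sum += piece;   /* the last piece may be shorter than 2 digits */
    return sum % table_size;
}
```

Calling fold_hash("436-555-4601", 11) adds 43 + 65 + 55 + 46 + 1 = 210 and returns 210 % 11 = 1, matching the worked example.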
Collision:
If two or more keys hash to the same index, the corresponding records cannot be stored in the
same location. This condition is known as a collision.
Characteristics of Good Hashing Function:
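/* Types assumed from context: Position and List are pointers to struct node,
   and Hashtable points to a structure holding TheLists (an array of list
   headers) and Tablesize. The function header of insert is reconstructed. */
void insert(int key, Hashtable H)
{
    Position P, newnode;
    List L;
    P = find(key, H);
    if (P == NULL)   /* insert only if the key is not already present */
    {
        newnode = malloc(sizeof(struct node));
        L = H->TheLists[Hash(key, H->Tablesize)];
        newnode->next = L->next;   /* link the new node at the front of the list */
        newnode->data = key;
        L->next = newnode;
    }
}
Position find(int key, Hashtable H)
{
    Position P;
    List L;
    L = H->TheLists[Hash(key, H->Tablesize)];
    P = L->next;   /* skip the header node */
    while (P != NULL && P->data != key)
        P = P->next;
    return P;   /* NULL if the key is not in the list */
}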
If two keys map to the same value, the elements are chained together in a list.
The initial configuration of the hash table with separate chaining is given below. A singly
linked list (SLL) is used to chain the elements.
Index   List
0       NULL
1       NULL
2       NULL
3       NULL
4       NULL
5       NULL
6       NULL
7       NULL
8       NULL
9       NULL
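Insert the following four keys 22, 84, 35, 62 into a hash table of size 10 using separate
chaining. The hash function is
H(key) = key % 10
1. H(22) = 22 % 10 = 2
2. H(84) = 84 % 10 = 4
3. H(35) = 35 % 10 = 5
4. H(62) = 62 % 10 = 2
Keys 22 and 62 both hash to index 2, so they are chained together in the same list.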
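The four insertions can be reproduced with a small self-contained sketch of separate chaining. It is a simplified variant under stated assumptions: each slot stores a plain pointer to the first node (no header node), the table is a global array of size 10, and the names chain_table, chain_hash, chain_find and chain_insert are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 10

struct node {
    int data;
    struct node *next;
};

/* One list per slot; NULL means the chain is empty. */
struct node *chain_table[TABLE_SIZE];

int chain_hash(int key)
{
    assert(key >= 0);
    return key % TABLE_SIZE;
}

/* Return the node holding key, or NULL if the key is absent. */
struct node *chain_find(int key)
{
    struct node *p = chain_table[chain_hash(key)];
    while (p != NULL && p->data != key)
        p = p->next;
    return p;
}

/* Insert key at the front of its chain unless it is already present.
   Returns 1 if a node was added, 0 otherwise. */
int chain_insert(int key)
{
    struct node *n;
    if (chain_find(key) != NULL)
        return 0;
    n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->data = key;
    n->next = chain_table[chain_hash(key)];
    chain_table[chain_hash(key)] = n;
    return 1;
}
```

After inserting 22, 84, 35 and 62 in that order, slot 2 holds the chain 62 -> 22 (the newer key sits at the front), slot 4 holds 84, and slot 5 holds 35.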
Advantages
1. More elements than the table size can be inserted, because each slot holds a linked list.
Disadvantages
1. It requires more pointers, which occupy more memory space.
2. Search takes time, since the hash function must be evaluated and the list must then be
traversed.
Open Addressing
It is also called closed hashing.
It is a collision resolution technique.
It uses Hi(X) = (Hash(X) + F(i)) mod Tablesize.
When a collision occurs, alternative cells are tried until an empty cell is found.
Types:
Linear Probing
Quadratic Probing
Double Hashing
Hash function
H(key) = key % table size.
Insert Operation
To insert a key, use the hash function to identify the list into which
the element should be inserted.
Then traverse the list to check whether the element is already present.
If it exists, increment its count.
Otherwise, place the new element at the front of the list.
Linear Probing:
It is the easiest method to handle collisions.
Apply the hash function H(key) = key % table size.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i.
How to probe:
first probe – given a key k, hash to H(key)
second probe – if H(key) is occupied, try H(key) + f(1)
third probe – if H(key) + f(1) is occupied, try H(key) + f(2)
And so forth.
Probing properties:
We force f(0) = 0.
The ith probe is to (H(key) + f(i)) % table size.
If i reaches size - 1, the probe has failed.
Depending on f(i), the probe may fail sooner.
Long sequences of probes are costly.
Probe sequence is:
H(key) % table size
(H(key) + 1) % table size
(H(key) + 2) % table size
Index   Empty   After 89   After 18   After 49   After 58   After 69
0                                     49         49         49
1                                                58         58
2                                                           69
3
4
5
6
7
8                          18         18         18         18
9               89         89         89         89         89
Linear probing: hash table after each insertion (table size = 10)
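The linear probing insertions of 89, 18, 49, 58 and 69 can be traced with the sketch below. It assumes a table of size 10, non-negative keys, and -1 as the marker for an empty cell; the names lp_init and lp_insert are illustrative.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* marker for an unused cell (assumes keys are >= 0) */

/* Mark every cell as empty. */
void lp_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using linear probing, F(i) = i.
   Returns the cell used, or -1 if every cell is occupied. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int cell = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (table[cell] == EMPTY) {
            table[cell] = key;
            return cell;
        }
    }
    return -1;   /* probe failed: table is full */
}
```

Inserting the keys in order places 89 in cell 9 and 18 in cell 8; 49 collides at 9 and wraps to cell 0; 58 probes 8, 9, 0 and lands in cell 1; 69 probes 9, 0, 1 and lands in cell 2.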
Quadratic Probing
To resolve the primary clustering problem, quadratic probing can be used. With quadratic
probing, rather than always moving one spot, move i^2 spots from the point of collision, where
i is the number of attempts to resolve the collision.
It is another collision resolution method, which distributes items more evenly.
From the original index H, if the slot is filled, try cells H+1^2, H+2^2, H+3^2, ..., H+i^2
with wrap-around.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i^2
Hi(X) = (Hash(X) + i^2) mod Tablesize
Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it is difficult to find an empty spot.
Quadratic probing also introduces a new problem known as secondary clustering: elements that
hash to the same hash key always probe the same alternative cells.
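A minimal sketch of quadratic probing, using Hi(X) = (Hash(X) + i^2) mod Tablesize. The table size of 10, the -1 empty marker, the limit of Tablesize probes, and the names qp_init and qp_insert are assumptions made for illustration.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define EMPTY (-1)   /* marker for an unused cell (assumes keys are >= 0) */

/* Mark every cell as empty. */
void qp_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using quadratic probing, F(i) = i * i.
   Returns the cell used, or -1 if no empty cell was found within
   TABLE_SIZE probes (this can happen even when the table is not full). */
int qp_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int cell = (key % TABLE_SIZE + i * i) % TABLE_SIZE;
        if (table[cell] == EMPTY) {
            table[cell] = key;
            return cell;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58, 69: 89 and 18 go to cells 9 and 8; 49 collides at 9 and moves 1^2 to cell 0; 58 tries 8 and 9, then moves 2^2 from 8 to cell 2; 69 tries 9 and 0, then moves 2^2 from 9 to cell 3.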
Double Hashing
Double hashing applies a second hash function to the key when a collision occurs. The result
of the second hash function is the number of positions from the point of collision at which
to try the insertion.
There are a couple of requirements for the second function:
It must never evaluate to 0.
It must make sure that all cells can be probed.
Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize
A popular second hash function is:
Hash2(key) = R - (key % R), where R is a prime number that is smaller than the size of the
table.
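A minimal sketch of double hashing with Hash2(key) = R - (key % R). The table size of 10, the choice R = 7, the -1 empty marker, and the names dh_init, dh_hash2 and dh_insert are assumptions made for illustration.

```c
#include <assert.h>

#define TABLE_SIZE 10
#define R 7          /* a prime smaller than TABLE_SIZE (assumed value) */
#define EMPTY (-1)   /* marker for an unused cell (assumes keys are >= 0) */

/* Mark every cell as empty. */
void dh_init(int table[TABLE_SIZE])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Second hash function: always in the range 1 .. R, so never 0. */
int dh_hash2(int key)
{
    return R - (key % R);
}

/* Insert key using Hi(X) = (Hash(X) + i * Hash2(X)) mod Tablesize.
   Returns the cell used, or -1 if no empty cell was found within
   TABLE_SIZE probes. */
int dh_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int cell = (key % TABLE_SIZE + i * dh_hash2(key)) % TABLE_SIZE;
        if (table[cell] == EMPTY) {
            table[cell] = key;
            return cell;
        }
    }
    return -1;
}
```

For the keys 89, 18, 49, 58, 69: 89 and 18 go to cells 9 and 8. Hash2(49) = 7, so 49 moves from 9 to cell 6. Hash2(58) = 5, so 58 moves from 8 to cell 3. Hash2(69) = 1, so 69 moves from 9 to cell 0. Because the table size 10 is not prime, some step sizes (for example 5) cannot reach every cell, which is one reason a prime table size is usually preferred.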
Rehashing
Once the hash table gets too full, operations start to take too long and insertions may
fail. To solve this problem, a table at least twice the size of the original is built and
the elements are transferred to the new table.
Advantage:
The programmer does not have to worry about the table size.
Simple to implement
Can be used in other data structure as well
The new size of the hash table:
should also be prime
will be used to calculate the new insertion spot (hence the name rehashing)
Rehashing is a very expensive operation: it takes O(N) time, since there are N elements to
rehash and the table size is roughly 2N. This is acceptable, since it does not happen often.
The question becomes: when should rehashing be applied?
Some possible answers:
once the table becomes half full
once an insertion fails
once a specific load factor has been reached, where the load factor is the ratio of
the number of elements in the hash table to the table size
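The rehashing policy can be sketched as follows. This is one possible arrangement under stated assumptions: collisions are resolved by linear probing, rehashing is triggered when the load factor exceeds 0.5, the new size is the first prime at or above twice the old size, keys are non-negative, and allocation failures are only checked with assert. All names (HashTable, ht_create, ht_insert, ht_contains) are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define EMPTY (-1)   /* marker for an unused cell (assumes keys are >= 0) */

typedef struct {
    int *cells;
    int size;    /* number of cells (kept prime) */
    int count;   /* number of stored keys */
} HashTable;

static int is_prime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

static int next_prime(int n)
{
    while (!is_prime(n)) n++;
    return n;
}

static int *new_cells(int size)
{
    int *cells = malloc((size_t)size * sizeof *cells);
    assert(cells != NULL);
    for (int i = 0; i < size; i++) cells[i] = EMPTY;
    return cells;
}

HashTable *ht_create(int size)
{
    HashTable *h = malloc(sizeof *h);
    assert(h != NULL);
    h->size = next_prime(size);
    h->count = 0;
    h->cells = new_cells(h->size);
    return h;
}

/* Place key by linear probing; the caller guarantees a free cell exists. */
static void place(HashTable *h, int key)
{
    int cell = key % h->size;
    while (h->cells[cell] != EMPTY)
        cell = (cell + 1) % h->size;
    h->cells[cell] = key;
    h->count++;
}

/* Build a table about twice as large and re-insert every key: O(N). */
static void rehash(HashTable *h)
{
    int *old = h->cells;
    int old_size = h->size;
    h->size = next_prime(2 * old_size);
    h->count = 0;
    h->cells = new_cells(h->size);
    for (int i = 0; i < old_size; i++)
        if (old[i] != EMPTY)
            place(h, old[i]);   /* positions are recomputed with the new size */
    free(old);
}

void ht_insert(HashTable *h, int key)
{
    assert(key >= 0);
    place(h, key);
    if (2 * h->count > h->size)   /* load factor above 0.5 */
        rehash(h);
}

int ht_contains(const HashTable *h, int key)
{
    int cell = key % h->size;
    while (h->cells[cell] != EMPTY) {
        if (h->cells[cell] == key) return 1;
        cell = (cell + 1) % h->size;
    }
    return 0;
}
```

Starting from a table of size 7, inserting 13, 15, 24 and 6 pushes the load factor to 4/7, so the table is rebuilt with size 17 (the first prime at or above 14) and each key is placed again using key % 17.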
Extendible Hashing
Extendible hashing is a mechanism for altering the size of the hash table to accommodate
new entries when buckets overflow.
A common strategy in internal hashing is to double the hash table and rehash each entry.
However, this technique is slow for tables stored on disk, because writing all pages to disk
is too expensive.
Therefore, instead of doubling the whole hash table, we use a directory of pointers to
buckets, and double the number of buckets by doubling the directory, splitting just
the bucket that overflows.
Since the directory is much smaller than the file, doubling it is much cheaper. Only
one page of keys and pointers is split.
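The directory-doubling idea can be sketched in memory as follows. This is a simplified illustration, not a disk-based implementation: the bucket capacity of 2, the use of the low-order bits of the key as the directory index, the assumption that keys are distinct, and all names (Bucket, ExtHash, eh_create, eh_insert, eh_contains) are choices made here for brevity.

```c
#include <assert.h>
#include <stdlib.h>

#define BUCKET_CAP 2   /* keys per bucket (assumed for illustration) */

typedef struct {
    int local_depth;   /* number of key bits this bucket depends on */
    int count;
    unsigned keys[BUCKET_CAP];
} Bucket;

typedef struct {
    int global_depth;  /* the directory has 2^global_depth entries */
    Bucket **dir;
} ExtHash;

static Bucket *new_bucket(int depth)
{
    Bucket *b = malloc(sizeof *b);
    assert(b != NULL);
    b->local_depth = depth;
    b->count = 0;
    return b;
}

ExtHash *eh_create(void)
{
    ExtHash *h = malloc(sizeof *h);
    assert(h != NULL);
    h->global_depth = 1;
    h->dir = malloc(2 * sizeof *h->dir);
    assert(h->dir != NULL);
    h->dir[0] = new_bucket(1);
    h->dir[1] = new_bucket(1);
    return h;
}

/* The low global_depth bits of the key select the directory entry. */
static Bucket *eh_bucket(const ExtHash *h, unsigned key)
{
    return h->dir[key & ((1u << h->global_depth) - 1u)];
}

int eh_contains(const ExtHash *h, unsigned key)
{
    const Bucket *b = eh_bucket(h, key);
    for (int i = 0; i < b->count; i++)
        if (b->keys[i] == key)
            return 1;
    return 0;
}

/* Insert a key; assumes keys are distinct. */
void eh_insert(ExtHash *h, unsigned key)
{
    for (;;) {
        Bucket *b = eh_bucket(h, key);
        if (b->count < BUCKET_CAP) {
            b->keys[b->count++] = key;
            return;
        }
        if (b->local_depth == h->global_depth) {
            /* Double the directory: the new half mirrors the old half. */
            unsigned old = 1u << h->global_depth;
            h->dir = realloc(h->dir, 2 * old * sizeof *h->dir);
            assert(h->dir != NULL);
            for (unsigned i = 0; i < old; i++)
                h->dir[old + i] = h->dir[i];
            h->global_depth++;
        }
        /* Split only the overflowing bucket on its next bit. */
        unsigned bit = 1u << b->local_depth;
        Bucket *nb = new_bucket(++b->local_depth);
        for (unsigned i = 0; i < (1u << h->global_depth); i++)
            if (h->dir[i] == b && (i & bit))
                h->dir[i] = nb;
        int n = b->count;
        b->count = 0;
        for (int i = 0; i < n; i++) {
            unsigned k = b->keys[i];
            Bucket *t = (k & bit) ? nb : b;
            t->keys[t->count++] = k;
        }
        /* Loop again: the key now maps to a bucket that may have room. */
    }
}
```

Inserting the keys 0 to 7 grows the directory from 2 to 4 entries; inserting 8 then overflows the bucket holding 0 and 4, which doubles the directory to 8 entries and splits only that one bucket, while all other buckets are left untouched.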
Figure: extendible hashing example with directory entries 0, 1 and 00, 01, 10, 11, and
buckets of binary keys (000100, 010100, 100000, 111000, ...)