9. Hash Function and Hash Table
Hashing
Hashing can be used in almost all such situations and, in practice, performs
extremely well compared to the above data structures such as arrays, linked
lists, and balanced BSTs. With hashing, search takes O(1) time on average
(under reasonable assumptions) and O(n) time in the worst case.
Hashing is an improvement over the Direct Access Table. The idea is to use a
hash function that converts a given phone number, or any other key, to a smaller
number, and to use that smaller number as an index into a table called a hash table.
Hash Function:
A function that converts a given big phone number to a small, practical integer
value. The mapped integer value is used as an index into the hash table. In simple
terms, a hash function maps a big number or string to a small integer that can be
used as an index into the hash table.
A good hash function should have the following properties:
1) Efficiently computable.
2) Should distribute the keys uniformly (each table position is equally likely
for each key).
For example, for phone numbers a bad hash function is to take the first three
digits, since many numbers share the same leading digits. A better function is to
take the last three digits. Note that this may still not be the best hash
function; there may be better ways.
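The difference between the two choices can be checked on a small sample. The sketch below is only an illustration: the sample numbers (all sharing a made-up "555" prefix) and the function names are assumptions, not part of the original material.

```python
from collections import Counter

# Hypothetical sample: 1000 ten-digit numbers that share the prefix "555".
phone_numbers = [f"555{n:07d}" for n in range(1234, 1234 + 1000)]

def hash_first_three(number: str) -> int:
    """Bad choice: the leading digits are the same for many numbers."""
    return int(number[:3])

def hash_last_three(number: str) -> int:
    """Better choice: the trailing digits vary from number to number."""
    return int(number[-3:])

# Count how many keys land in each slot under each function.
first_buckets = Counter(hash_first_three(p) for p in phone_numbers)
last_buckets = Counter(hash_last_three(p) for p in phone_numbers)

print(len(first_buckets))  # 1: every key collides in slot 555
print(len(last_buckets))   # 1000: each key gets its own slot
```

On this sample the first-three-digits function sends every key to one slot, while the last-three-digits function spreads the keys over all 1000 slots. Real phone data will be less tidy than this.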
Hash Table:
An array that stores pointers to the records corresponding to given phone numbers.
An entry in the hash table is NIL if no existing phone number has a hash function
value equal to the index of that entry.
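A minimal sketch of such a table, assuming a last-three-digits hash function and a table of 1000 entries. The record layout, the names, and the use of Python's None for NIL are illustrative assumptions. Collisions are deliberately ignored here.

```python
TABLE_SIZE = 1000  # assumed size: last three digits give indices 0..999

def h(phone: str) -> int:
    """Assumed hash function: the last three digits of the number."""
    return int(phone[-3:])

# Every entry starts out as None, which plays the role of NIL.
table = [None] * TABLE_SIZE

# Hypothetical record; the table entry refers to the record itself.
record = {"phone": "5550001234", "name": "Asha"}
table[h(record["phone"])] = record

def lookup(phone: str):
    """Return the record for phone, or None if the slot is NIL or holds another key."""
    entry = table[h(phone)]
    if entry is not None and entry["phone"] == phone:
        return entry
    return None

print(lookup("5550001234"))  # the stored record
print(lookup("5550009999"))  # None: slot 999 is still NIL
```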
Division Method.
Mid Square Method.
Folding Method.
Multiplication Method.
The division method is the simplest way to generate a hash value. The hash
function divides the key k by M (typically the table size) and uses the
remainder: h(k) = k mod M.
M is best chosen as a prime number, since that tends to distribute the keys
more uniformly.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
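The same calculation as a short sketch; the function name division_hash is an assumption chosen for illustration.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the hash value is the remainder of k divided by m."""
    return k % m

print(division_hash(12345, 95))  # 90, matching the worked example
```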
3 June 2022 C-DAC 6
Division Method
Pros:
This method can be used with any value of M.
The division method is very fast since it requires only a single division
operation.
Cons:
This method leads to poor performance since consecutive keys map to
consecutive hash values in the hash table.
Sometimes extra care should be taken to choose the value of M.
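Both drawbacks of the division method can be seen on small inputs. The sketch below is illustrative only: the key ranges and the two values of M (100 and the prime 97) are assumptions picked to make the effect visible.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the hash value is the remainder of k divided by m."""
    return k % m

# Consecutive keys land in consecutive slots.
consecutive = [division_hash(k, 97) for k in range(1000, 1005)]
print(consecutive)  # [30, 31, 32, 33, 34]

# Keys that are all multiples of 10: a poorly chosen M wastes most of the table.
keys = range(0, 1000, 10)
slots_m100 = {division_hash(k, 100) for k in keys}
slots_m97 = {division_hash(k, 97) for k in keys}
print(len(slots_m100), len(slots_m97))  # 10 97
```

With M = 100 the hundred keys share only 10 slots, while with the prime M = 97 they reach all 97 slots.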
Mid Square Method
Pros:
The performance of this method is good because most or all digits of the key
contribute to the middle digits of the squared result, which are used as the
hash value.
The result is not dominated by the distribution of the top digit or bottom digit of
the original key value.
Cons:
The size of the key is one limitation of this method: if the key is large, its
square has roughly twice as many digits.
Another disadvantage is that collisions still occur, although we can try to
reduce them.
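A minimal sketch of the mid-square idea, assuming the usual formulation: square the key and keep the middle r digits. The function name, the parameter r, and the rule for picking the middle when the digit counts do not divide evenly are assumptions for illustration.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square k and return the middle r decimal digits of the square."""
    # Left-pad with zeros so there are always at least r digits to pick from.
    squared = str(k * k).zfill(r)
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])

print(mid_square_hash(60, 2))    # 60 * 60 = 3600, middle two digits -> 60
print(mid_square_hash(1234, 3))  # 1234 * 1234 = 1522756, middle three digits -> 227
```

Python integers do not overflow, so the key-size limitation does not appear here; in a language with fixed-width integers the square of a large key may not fit in the integer type.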
Digit Folding Method
1) Separate chaining:-
This technique keeps a linked list at each slot where a collision occurs.
A new key that hashes to that slot is inserted into the linked list.
The linked lists attached to the slots look like chains, which is why this
technique is called separate chaining.
Search takes O(1 + α) time on average, where α = n/N is the load factor for n
stored keys, and O(n) in the worst case, when every key falls into the same
chain. The data is stored in these linked lists.
Data type for storage :- LinkedList[ ] Table; Table = new LinkedList[N], where
N is the table size
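Separate chaining can be sketched as follows. This is an illustration under stated assumptions: Python lists stand in for the linked lists, the hash function is key mod table size, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions with one chain per slot."""

    def __init__(self, size: int = 7):
        self.size = size
        # One chain per slot; a Python list stands in for a linked list.
        self.table = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size  # assumed hash function (division method)

    def insert(self, key: int, value) -> None:
        chain = self.table[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already present: update it
                return
        chain.append((key, value))       # new key: add it to the chain

    def search(self, key: int):
        # Only the chain of the key's slot is scanned.
        for existing_key, value in self.table[self._index(key)]:
            if existing_key == key:
                return value
        return None


demo = ChainedHashTable(7)
for k, v in [(10, "a"), (17, "b"), (24, "c"), (5, "d")]:
    demo.insert(k, v)        # 10, 17 and 24 all hash to slot 3

print(len(demo.table[3]))    # 3: the colliding keys share one chain
print(demo.search(17))       # b
```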
2) Open addressing:-
(i) Linear probing:- When a collision occurs, scan down the array one cell at a
time, wrapping around at the end, looking for an empty cell.
(ii) Quadratic probing:- When an incoming key's hash value points to an
already-occupied slot or bucket, spread out the search for an empty slot: on the
i-th attempt, step i^2 cells from the original slot instead of i.
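The two probe sequences can be compared side by side. In this sketch the table size of 11, the home slot 3, and the function names are assumptions chosen for illustration.

```python
def linear_probe(home: int, i: int, size: int) -> int:
    """Slot examined on the i-th attempt with linear probing."""
    return (home + i) % size

def quadratic_probe(home: int, i: int, size: int) -> int:
    """Slot examined on the i-th attempt with quadratic probing."""
    return (home + i * i) % size

SIZE = 11  # assumed table size
HOME = 3   # assumed slot given by the hash function

linear_sequence = [linear_probe(HOME, i, SIZE) for i in range(5)]
quadratic_sequence = [quadratic_probe(HOME, i, SIZE) for i in range(5)]

print(linear_sequence)     # [3, 4, 5, 6, 7]
print(quadratic_sequence)  # [3, 4, 7, 1, 8]
```

Linear probing examines neighbouring cells, while quadratic probing jumps further on each attempt.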
To insert an element, compute the hash code of the key and use it as an index
into the array.
If that slot is already occupied, use linear probing to find the next empty
location.
To delete an element, compute the hash code of the key and use it as an index
into the array.
If the element is not found at that index, use linear probing to search the
slots ahead.
When it is found, store a dummy item there instead of emptying the slot, so that
later searches still probe past it and reach keys stored further along.
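The insert and delete steps can be sketched as below. This is a minimal illustration, not a definitive implementation: the class name, the key mod size hash function, and the DELETED sentinel used as the dummy item are all assumptions.

```python
EMPTY = None
DELETED = object()  # dummy item left behind by a deletion


class LinearProbingTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key: int):
        """Yield the slots to examine, starting at the key's home slot."""
        home = key % self.size  # assumed hash function
        for i in range(self.size):
            yield (home + i) % self.size

    def insert(self, key: int, value) -> None:
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break  # the key cannot be stored beyond an empty slot
            if slot is DELETED:
                if first_free is None:
                    first_free = idx  # a dummy slot can be reused
            elif slot[0] == key:
                self.slots[idx] = (key, value)  # key present: update it
                return
        if first_free is None:
            raise RuntimeError("hash table is full")
        self.slots[first_free] = (key, value)

    def search(self, key: int):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None  # reached a never-used slot: key is absent
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key: int) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[idx] = DELETED  # keep the probe chain unbroken
                return True
        return False


probing_demo = LinearProbingTable(7)
for k, v in [(10, "a"), (17, "b"), (24, "c")]:
    probing_demo.insert(k, v)   # all three hash to slot 3; stored in 3, 4, 5

probing_demo.delete(17)         # slot 4 now holds the dummy item
print(probing_demo.search(24))  # c: the search walks past the dummy item
```

If slot 4 were simply emptied, the search for 24 would stop there and wrongly report the key as missing.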