0% found this document useful (0 votes)
43 views

9.hash Function and Hash Table

Hashing provides efficient O(1) search time by mapping keys to array indices via hash functions. A good hash function uniformly distributes keys and is efficiently computable. Common hash functions include division, mid-square, digit folding, and multiplication methods. Collisions are resolved via separate chaining by linking keys to the same index, or open addressing by probing to the next open index. Double hashing combines two hash functions to determine probe locations. Elements are inserted by computing the hash code and probing linearly if occupied, and deleted by setting the element to a dummy value.

Uploaded by

Patil Vinayak
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views

9.hash Function and Hash Table

Hashing provides efficient O(1) search time by mapping keys to array indices via hash functions. A good hash function uniformly distributes keys and is efficiently computable. Common hash functions include division, mid-square, digit folding, and multiplication methods. Collisions are resolved via separate chaining by linking keys to the same index, or open addressing by probing to the next open index. Double hashing combines two hash functions to determine probe locations. Elements are inserted by computing the hash code and probing linearly if occupied, and deleted by setting the element to a dummy value.

Uploaded by

Patil Vinayak
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

3 June 2022 C-DAC 1

Hashing

Hashing is the solution that can be used in almost all such situations and
performs extremely well compared to above data structures like Array, Linked
List, Balanced BST in practice. With hashing O(1) search time on average
(under reasonable assumptions) and O(n) in worst case.

Hashing is an improvement over Direct Access Table. The idea is to use hash
function that converts a given phone number or any other key to a smaller
number and uses the small number as index in a table called hash table.

3 June 2022 C-DAC 2


Hash Function

A function that converts a given big phone number to a small practical integer
value. The mapped integer value is used as an index in hash table. In simple
terms, a hash function maps a big number or string to a small integer that can be
used as index in hash table.
A good hash function should have following properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely
for each key)
For example for phone numbers a bad hash function is to take first three digits.
A better function is consider last three digits. Please note that this may not be
the best hash function. There may be better ways.

3 June 2022 C-DAC 3


Hash Table

Hash Table:
An array that stores pointers to records corresponding to a given phone number.
An entry in hash table is NIL if no existing phone number has hash function
value equal to the index for the entry.

3 June 2022 C-DAC 4


Types of Hash functions

Division Method.
Mid Square Method.
Folding Method.
Multiplication Method.

3 June 2022 C-DAC 5


Division Method

This is the most simple and easiest method to generate a hash value. The hash
function divides the value k by M and then uses the remainder obtained.

Formula: h(K) = k mod M


Here, k is the key value, and M is the size of the hash table.

It is best suited that M is a prime number as that can make sure the keys are
more uniformly distributed. The hash function is dependent upon the remainder
of a division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
3 June 2022 C-DAC 6
Division Method

Pros:
This method is quite good for any value of M.
The division method is very fast since it requires only a single division
operation.

Cons:
This method leads to poor performance since consecutive keys map to
consecutive hash values in the hash table.
Sometimes extra care should be taken to chose value of M.

3 June 2022 C-DAC 7


Mid Square Method

It involves two steps to compute the hash value-


1) Square the value of the key k i.e. k2
2) Extract the middle r digits as the hash value.
Formula: h(K) = h(k x k)
Here, k is the key value. The value of k is based on the size of the table.
Example:
Suppose the hash table has 100 memory locations. So r = 2 because two digits
are required to map the key to the memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60
The hash value obtained is 60

3 June 2022 C-DAC 8


Mid Square Method - Example

Suppose a 4-digit seed is taken. seed = 4765


Hence, square of seed is = 4765 * 4765 = 22705225
Now, from this 8-digit number, any four digits are extracted (Say, the middle
four).
So, the new seed value becomes seed = 7052
Now, square of this new seed is = 7052 * 7052 = 49730704
Again, the same set of 4-digits is extracted.
So, the new seed value becomes seed = 7307
This process is repeated as many times as a key is required.

3 June 2022 C-DAC 9


Mid Square Method - Example

3 June 2022 C-DAC 10


Mid Square Method

Pros:
The performance of this method is good as most or all digits of the key value
contribute to the result. This is because all digits in the key contribute to
generating the middle digits of the squared result.
The result is not dominated by the distribution of the top digit or bottom digit of
the original key value.

Cons:
The size of the key is one of the limitations of this method, as the key is of big
size then its square will double the number of digits.
Another disadvantage is that there will be collisions but we can try to reduce
collisions.
3 June 2022 C-DAC 11
Digit Folding Method

This method involves two steps:


1. Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where
each part has the same number of digits except for the last part that can have
lesser digits than the other parts.
2. Add the individual parts. Hash value is obtained by ignoring last carry if any.
Formula:
k = k1, k2, k3, k4, ….., kn
s = k1+ k2 + k3 + k4 +….+ kn
h(K)= s
Here,s is obtained by adding the parts of the key k
Example:
k = 12345; k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5 = 51 ie h(K) = 51
3 June 2022 C-DAC 12
Collision Resolution

3 June 2022 C-DAC 13


Collision Resolution

1) Separate chaining:-
This technique creates a linked list to the slot for which collision occurs.
The new key is then inserted in the linked list.
These linked lists to the slots appear like chains.
That is why, this technique is called as separate chaining.
Complexity is O(log n).The data is stored in each of these linked lists.
Data type for storage :- LinkedList[ ] Table; Table = new LinkedList(N), where
N is the table size

3 June 2022 C-DAC 14


Collision Resolution

2) Open chaining (Probing)

(i) Linear probing:- : When collision occurs, scan down the array one cell at a
time looking for an empty cell.
(ii) Quadratic probing:- when an incoming data's hash value indicates it
should be stored in an already-occupied slot or bucket. Spread out the search for
an empty slot – Increment by i^2 instead of i.

3 June 2022 C-DAC 15


Double Hashing

1.Take the key to store on the hash-table.


2.Apply the first hash function h_1h1​(key) over key to get the location to store the
key.
3.If the location is empty, place the key on that location.
4.If the location is already filled, apply the second hash function h_2h2​(key) in
combination with the first hash function h_1h1​(key) to get the new location for the
key.

3 June 2022 C-DAC 16


Double Hashing

3 June 2022 C-DAC 17


Insert Operation

Whenever an element is to be inserted, compute the hash code of the key passed
and locate the index using that hash code as an index in the array.
Use linear probing for empty location, if an element is found at the computed
hash code.

3 June 2022 C-DAC 18


Delete Operation

Whenever an element is to be deleted, compute the hash code of the key passed
and locate the index using that hash code as an index in the array.
Use linear probing to get the element ahead if an element is not found at the
computed hash code.
When found, store a dummy item there to keep the performance of the hash
table intact.

3 June 2022 C-DAC 19

You might also like