Hashing 1
A collision occurs when two different keys are mapped by the hash function to the same hash
value (or index) in the hash table. Since the hash function produces a fixed-size output (usually an
integer), it is always possible for different keys to produce the same hash value. Collisions are
inevitable in hashing whenever the number of possible keys exceeds the size of the hash table
(pigeonhole principle).
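The pigeonhole argument can be checked with a small Python sketch. It assumes integer keys and a modulo hash function (key % table_size), the same kind of hash used in the chaining example later in this section. The helper names hash_index and find_collisions are hypothetical. With six keys and only five slots, at least one slot has to receive two keys.

```python
def hash_index(key: int, table_size: int) -> int:
    """Map an integer key to a slot index in [0, table_size) with a modulo hash."""
    return key % table_size


def find_collisions(keys: list[int], table_size: int) -> dict[int, list[int]]:
    """Group keys by slot and return only the slots that received 2+ keys."""
    slots: dict[int, list[int]] = {}
    for key in keys:
        slots.setdefault(hash_index(key, table_size), []).append(key)
    return {index: ks for index, ks in slots.items() if len(ks) > 1}


# Six keys, five slots: by the pigeonhole principle some slot must be shared.
keys = [10, 21, 32, 43, 54, 65]
collisions = find_collisions(keys, 5)
print(collisions)  # {0: [10, 65]}
```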
Types of Collisions
1. Collision: Two distinct keys are mapped to the same location in the hash table. The keys
themselves remain different; only their hash values coincide.
Handling Collisions
There are several strategies to handle collisions in hashing, including chaining, open addressing,
and rehashing. The rest of this section focuses on chaining.
Chaining (or Separate Chaining) is a collision resolution technique used in hash tables to handle
situations where multiple keys hash to the same index (i.e., a collision). In this method, each slot of
the hash table does not store just one key but instead holds a linked list (or another collection) of
keys that hash to the same index.
Structure of Chaining
1. Hash Table: The hash table is an array of slots (or buckets), where each slot contains a
pointer to a linked list (or other dynamic data structure).
2. Linked List: Every slot in the table points to a linked list holding all the keys that hash to
that specific index. A slot that no key hashes to holds an empty list (a null pointer).
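The structure above can be sketched in Python as an array with one linked-list head per slot. This is an illustrative sketch rather than a reference implementation: the class names Node and ChainedHashTable are hypothetical, keys are assumed to be integers, and None stands in for the null pointer of an empty slot.

```python
from typing import Optional


class Node:
    """One element of a slot's singly linked list."""

    def __init__(self, key: int, next_node: Optional["Node"] = None) -> None:
        self.key = key
        self.next = next_node


class ChainedHashTable:
    """Array of slots; each slot holds the head of a linked list, or None if empty."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buckets: list[Optional[Node]] = [None] * size

    def chain(self, index: int) -> list[int]:
        """Return the keys stored in the linked list at index, head first."""
        keys = []
        node = self.buckets[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(5)
# Link two keys into slot 2 by hand to show the layout: 17 -> 7
table.buckets[2] = Node(17, Node(7))
print(table.chain(2))  # [17, 7]
print(table.chain(0))  # []
```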
Operations in Chaining
1. Insertion: When inserting a new key:
• Compute the hash value using a hash function.
• If the slot at that index is empty, create a new linked list there containing just the key.
• If the slot already holds other keys (a collision), add the new key to the linked list at that
index, either at the head or at the tail.
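A minimal Python sketch of these insertion steps, assuming integer keys, a modulo hash (key % table size), and a table stored as a Python list of chain heads. The names Node, insert, and chain_keys are hypothetical. Skipping duplicate keys and appending at the tail are design choices made for this sketch, not requirements of the steps above; prepending at the head instead makes insertion O(1) when duplicates do not need to be checked.

```python
from typing import Optional


class Node:
    """One element of a slot's singly linked list."""

    def __init__(self, key: int, next_node: Optional["Node"] = None) -> None:
        self.key = key
        self.next = next_node


def insert(buckets: list[Optional[Node]], key: int) -> None:
    """Insert key with chaining, using the modulo hash key % len(buckets)."""
    index = key % len(buckets)       # compute the hash value
    head = buckets[index]
    if head is None:                 # empty slot: start a new one-node list
        buckets[index] = Node(key)
        return
    node = head                      # collision: walk to the end of the chain
    while True:
        if node.key == key:          # assumption: duplicate keys are ignored
            return
        if node.next is None:
            node.next = Node(key)    # append the new key at the tail
            return
        node = node.next


def chain_keys(head: Optional[Node]) -> list[int]:
    """List the keys in a chain from head to tail."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys


buckets: list[Optional[Node]] = [None] * 5
for k in (17, 7, 12):
    insert(buckets, k)
print(chain_keys(buckets[2]))  # [17, 7, 12]
```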
2. Search: To search for a key:
• Compute the hash value of the key.
• Check the corresponding index in the hash table.
• If there is a linked list at that index, iterate through it to find the key. If the slot is empty or
the key is not in the list, the key is not in the table.
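The search steps translate into a short loop. This Python sketch assumes integer keys, a modulo hash, and a table stored as a list of chain heads; Node and search are hypothetical names. The table below is built by hand with the same keys as the chaining example later in this section.

```python
from typing import Optional


class Node:
    """One element of a slot's singly linked list."""

    def __init__(self, key: int, next_node: Optional["Node"] = None) -> None:
        self.key = key
        self.next = next_node


def search(buckets: list[Optional[Node]], key: int) -> bool:
    """Return True if key is stored in the table (modulo hash, chaining)."""
    node = buckets[key % len(buckets)]  # compute the hash, go to that slot
    while node is not None:             # walk the chain at that index
        if node.key == key:
            return True
        node = node.next
    return False                        # empty slot or end of chain: not found


# Size-5 table: 25 at index 0, 3 at index 3, and 17 -> 7 -> 12 chained at index 2.
buckets: list[Optional[Node]] = [Node(25), None, Node(17, Node(7, Node(12))), Node(3), None]
print(search(buckets, 7))   # True
print(search(buckets, 22))  # False: 22 % 5 == 2, but 22 is not in that chain
```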
3. Deletion: To delete a key:
• Compute the hash value of the key.
• Check the corresponding index in the hash table.
• Search the linked list at that index; if the key is found, unlink its node from the list.
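Deleting from a singly linked list means redirecting the pointer that leads to the node being removed. In this Python sketch (integer keys and a modulo hash assumed; Node, delete, and chain_keys are hypothetical names), removing the head of a chain updates the table slot itself, while removing any other node updates the previous node's next pointer.

```python
from typing import Optional


class Node:
    """One element of a slot's singly linked list."""

    def __init__(self, key: int, next_node: Optional["Node"] = None) -> None:
        self.key = key
        self.next = next_node


def delete(buckets: list[Optional[Node]], key: int) -> bool:
    """Unlink key from its chain; return True if it was found and removed."""
    index = key % len(buckets)
    prev: Optional[Node] = None
    node = buckets[index]
    while node is not None:
        if node.key == key:
            if prev is None:             # removing the head of the chain
                buckets[index] = node.next
            else:                        # bypass a node in the middle or at the tail
                prev.next = node.next
            return True
        prev, node = node, node.next
    return False                         # key was not in the table


def chain_keys(head: Optional[Node]) -> list[int]:
    """List the keys in a chain from head to tail."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys


buckets: list[Optional[Node]] = [None, None, Node(17, Node(7, Node(12))), None, None]
delete(buckets, 7)
print(chain_keys(buckets[2]))  # [17, 12]
```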
Advantages of Chaining
1. Simple to implement: Chaining is easy to implement, and collisions alone never force the table
to be resized. Open addressing, by contrast, must be resized at the latest once every slot is
occupied.
2. Dynamic size: Because each linked list can grow, the table can store more keys than it has slots
without being resized. The hard limit is the memory available for the linked lists, although
operations slow down as the chains get longer.
3. Flexible handling of collisions: Even if multiple collisions occur, the size of the linked lists
can grow to accommodate the additional keys. The hash table size does not need to be
increased unless necessary (e.g., if the load factor becomes too large).
4. No clustering: Unlike open addressing methods (e.g., linear or quadratic probing), chaining
does not suffer from clustering (groups of consecutive filled slots) because each hash table
slot is treated independently with its linked list.
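The load factor is commonly defined as α = n / m, the number of stored keys divided by the number of slots. The Python sketch below grows the table and rehashes every key once α exceeds a threshold. It uses Python lists as the chains for brevity, and the class name ChainedSet, the threshold of 1.0, and the doubling policy are illustrative assumptions; real implementations pick their own values.

```python
class ChainedSet:
    """Chaining with Python lists as chains; grows when the load factor is too high.

    MAX_LOAD = 1.0 and doubling the size are illustrative choices only.
    """

    MAX_LOAD = 1.0

    def __init__(self, size: int = 5) -> None:
        self.buckets: list[list[int]] = [[] for _ in range(size)]
        self.count = 0  # number of stored keys (n)

    def load_factor(self) -> float:
        """alpha = n / m: stored keys divided by the number of slots."""
        return self.count / len(self.buckets)

    def insert(self, key: int) -> None:
        bucket = self.buckets[key % len(self.buckets)]
        if key in bucket:  # already stored: nothing to do
            return
        bucket.append(key)
        self.count += 1
        if self.load_factor() > self.MAX_LOAD:
            self._resize(2 * len(self.buckets))

    def _resize(self, new_size: int) -> None:
        # Every key must be rehashed: key % new_size usually differs from key % old_size.
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(new_size)]
        for bucket in old_buckets:
            for key in bucket:
                self.buckets[key % new_size].append(key)


s = ChainedSet(5)
for k in range(6):
    s.insert(k)
print(len(s.buckets), s.load_factor())  # 10 0.6
```

Five keys in five slots give α = 1.0, which does not exceed the threshold; the sixth key pushes α to 1.2 and triggers the resize to ten slots.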
Disadvantages of Chaining
1. Extra space overhead: Every stored key needs a list node with a pointer to the next node, on
top of the table's array of head pointers. This overhead becomes significant when the table holds
many keys.
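2. Slower access time with long chains: The cost of searching for or deleting a key is proportional
to the length of the chain at its index (inserting is too, if the chain is scanned for duplicates). If
many elements hash to the same index, this degrades to O(n) in the worst case, where n is the
total number of keys in the table and all of them land in a single chain.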
3. Cache performance: Chaining can be less cache-efficient because list nodes are scattered
across non-contiguous memory locations, whereas open addressing stores the elements directly
in the table's contiguous array.
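The effect of long chains can be measured by counting key comparisons. This Python sketch assumes a modulo hash and uses Python lists as the chains; build_table and search_cost are hypothetical helpers. Both tables have 100 slots and hold 1,000 keys, but in the first one every key is a multiple of 100, so all keys collide into slot 0 and an unsuccessful search must scan all of them.

```python
def build_table(keys: list[int], size: int) -> list[list[int]]:
    """Insert keys with chaining (Python lists as chains) and a modulo hash."""
    buckets: list[list[int]] = [[] for _ in range(size)]
    for key in keys:
        buckets[key % size].append(key)
    return buckets


def search_cost(buckets: list[list[int]], key: int) -> int:
    """Count the key comparisons made while searching for key."""
    comparisons = 0
    for stored in buckets[key % len(buckets)]:
        comparisons += 1
        if stored == key:
            break
    return comparisons


size, n = 100, 1000
bad = build_table([size * i for i in range(n)], size)  # every key % 100 == 0
good = build_table(list(range(n)), size)               # 10 keys per slot
missing = size * n                                     # 100000 hashes to slot 0, not stored
print(search_cost(bad, missing))   # 1000: one chain holds all n keys
print(search_cost(good, missing))  # 10: each chain holds n / size keys
```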
Example of Chaining
Let's assume we have a hash table of size 5 and we want to insert the keys 25, 17, 3, 7, and 12. We
will use a simple modulo hash function (key % 5).
Visual Example
• Key 25: 25 % 5 = 0
• Key 17: 17 % 5 = 2
• Key 3: 3 % 5 = 3
• Key 7: 7 % 5 = 2 (collision occurs here because 17 already occupies index 2)
• Key 12: 12 % 5 = 2 (another collision at index 2)
With chaining, each colliding key is appended to the linked list at its index, so the final table
is:
• Index 0: 25
• Index 1: empty
• Index 2: 17 → 7 → 12
• Index 3: 3
• Index 4: empty
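The example can be reproduced with a short Python sketch. It uses Python lists as the chains (an assumption for brevity; a linked list that appends at the tail gives the same order), and chained_table is a hypothetical helper name.

```python
def chained_table(keys: list[int], size: int) -> list[list[int]]:
    """Insert keys with chaining, using Python lists as the per-slot chains."""
    buckets: list[list[int]] = [[] for _ in range(size)]
    for key in keys:
        buckets[key % size].append(key)  # colliding keys join the end of the chain
    return buckets


table = chained_table([25, 17, 3, 7, 12], 5)
for index, chain in enumerate(table):
    shown = " -> ".join(str(k) for k in chain) or "(empty)"
    print(f"Index {index}: {shown}")
```

Running it prints one line per slot: "Index 0: 25", "Index 1: (empty)", "Index 2: 17 -> 7 -> 12", "Index 3: 3", and "Index 4: (empty)".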
Conclusion
Collisions are an inherent part of hashing, but they can be effectively managed using various
techniques like chaining, open addressing, and rehashing. The key is to choose the right collision
resolution strategy based on the expected data size and access patterns to ensure efficient hash table
operations.