Hashing 1

The document discusses collisions in hashing, where multiple keys map to the same hash value, and outlines various strategies for handling these collisions, primarily focusing on chaining. Chaining allows multiple keys to be stored at the same index using linked lists, while also detailing its advantages and disadvantages. Other methods mentioned include open addressing and rehashing, emphasizing the importance of managing the load factor for efficient hashing performance.

Uploaded by

Rupali Kaldoke

In hashing, a collision occurs when two or more keys are mapped to the same hash value (or index)
in the hash table. Because the hash function produces a fixed-size output (usually an integer),
different keys can produce the same hash value. Collisions are unavoidable whenever the number of
possible keys exceeds the size of the hash table (pigeonhole principle).

Types of Collisions
1. Collision: This occurs when two different keys are mapped to the same location in the hash
table but are still distinct keys.

Handling Collisions
There are several strategies to handle collisions in hashing:

1. Chaining (Separate Chaining)


• In this method, each slot of the hash table points to a linked list (or another dynamic data
structure like a tree) that stores all the keys that hash to the same index.
• Advantage: Insertion is efficient, and the table itself can stay a fixed size because the
lists grow to hold a variable number of keys.
• Disadvantage: If there are too many collisions, the linked lists become long, and the
performance of search operations could degrade to O(n) in the worst case.
Example: If two keys k1 and k2 hash to index i, then index i will hold a linked list containing k1
and k2.

Chaining (or Separate Chaining) is a collision resolution technique used in hash tables to handle
situations where multiple keys hash to the same index (i.e., a collision). In this method, each slot of
the hash table does not store just one key but instead holds a linked list (or another collection) of
keys that hash to the same index.

How Chaining Works


In a hash table, when a collision occurs (i.e., two keys hash to the same index), the new key is
added to the linked list (or another structure, such as a binary search tree or a doubly linked list) at
that particular index. This way, multiple keys can be stored in the same index without any loss of
information.

Structure of Chaining
1. Hash Table: The hash table is an array of slots (or buckets), where each slot contains a
pointer to a linked list (or other dynamic data structure).
2. Linked List: Every slot in the table stores a linked list that holds all the keys that hash to
that specific index.

Operations in Chaining
1. Insertion: When inserting a new key:
• Compute the hash value using a hash function.
• If the corresponding index is empty, the key becomes the first entry in the list at that index.
• If the index already contains other keys (due to collisions), the new key is added to
the linked list at that index.
2. Search: To search for a key:
• Compute the hash value of the key.
• Check the corresponding index in the hash table.
• If there is a linked list at that index, iterate through the list to find the key.
3. Deletion: To delete a key:
• Compute the hash value of the key.
• Check the corresponding index in the hash table.
• If the key is found in the linked list, remove it from the list.
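
The three operations above can be sketched as a small Python class. This is a minimal illustration, not a reference implementation: the class and method names are invented here, a Python list stands in for each linked list, and the hash function is the simple modulo hash used elsewhere in this document (so keys are assumed to be non-negative integers).

```python
class ChainedHashTable:
    """Hash table using separate chaining; each bucket is a Python list."""

    def __init__(self, size=5):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Simple modulo hash (assumes integer keys).
        return key % self.size

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        if key not in bucket:  # skip duplicates
            bucket.append(key)

    def search(self, key):
        # Only the chain at the key's index has to be scanned.
        return key in self.buckets[self._index(key)]

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            bucket.remove(key)
            return True
        return False


table = ChainedHashTable(size=5)
for k in (25, 17, 3, 7, 12):
    table.insert(k)
```

After these insertions, table.search(7) returns True, and table.delete(7) removes 7 from the chain at index 2 while leaving 17 and 12 in place.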

Advantages of Chaining
1. Simple to implement: Chaining is easy to implement, and an insertion never fails for lack of
a free slot, so collisions alone never force a resize. An open-addressing table, by contrast,
must be resized once it fills up.
2. Dynamic size: The linked list allows the hash table to grow dynamically without needing to
resize the table. The only limitation is the memory available for the linked lists.
3. Flexible handling of collisions: Even if multiple collisions occur, the size of the linked lists
can grow to accommodate the additional keys. The hash table size does not need to be
increased unless necessary (e.g., if the load factor becomes too large).
4. No clustering: Unlike open addressing methods (e.g., linear or quadratic probing), chaining
does not suffer from clustering (groups of consecutive filled slots) because each hash table
slot is treated independently with its linked list.

Disadvantages of Chaining
1. Extra space overhead: Each slot needs a pointer to its list, and each stored key needs a
list node with its own pointer to the next node. This overhead grows with the number of keys
and can be significant when the table holds many small keys.
2. Slower access time with long chains: If many elements hash to the same index, the time
complexity for searching or deleting an element (and for inserting, when duplicates are
checked) can degrade to O(n) in the worst case, where n is the number of elements in the
linked list.
3. Cache performance: Chaining can be less cache-efficient because the elements are stored
in non-contiguous memory locations, as opposed to open addressing, where elements are
stored in contiguous locations.

Example of Chaining
Let's assume we have a hash table of size 5 and we want to insert the keys 25, 17, 3, 7, and 12. We
will use a simple modulo hash function (key % 5).

• Insert 25: 25 % 5 = 0. Index 0 is empty, so 25 is inserted at index 0.
• Insert 17: 17 % 5 = 2. Index 2 is empty, so 17 is inserted at index 2.
• Insert 3: 3 % 5 = 3. Index 3 is empty, so 3 is inserted at index 3.
• Insert 7: 7 % 5 = 2. Index 2 is already occupied by 17, so 7 is added to the linked list at
index 2.
• Insert 12: 12 % 5 = 2. Index 2 already contains 17 and 7, so 12 is added to the linked
list at index 2.
After these insertions, the hash table will look like this:

Index   Linked List
0       25
1       (empty)
2       17 → 7 → 12
3       3
4       (empty)
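
The table above can be reproduced with a few lines of Python. This is only a sketch of the worked example: the names TABLE_SIZE, keys, and buckets are chosen here for illustration, and plain Python lists stand in for the linked lists.

```python
TABLE_SIZE = 5
keys = [25, 17, 3, 7, 12]

# One (initially empty) chain per slot.
buckets = [[] for _ in range(TABLE_SIZE)]

for key in keys:
    # Simple modulo hash: the key is appended to the chain at key % TABLE_SIZE.
    buckets[key % TABLE_SIZE].append(key)

for index, chain in enumerate(buckets):
    print(index, " -> ".join(map(str, chain)) if chain else "(empty)")
```

The chain at index 2 ends up holding 17, 7, and 12 in insertion order, matching the table.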

2. Open Addressing (Closed Hashing)


• In this approach, all elements are stored within the hash table itself. When a collision occurs,
the algorithm looks for the next available slot according to a probe sequence.
• There are several probing techniques:
• Linear Probing: If a collision occurs at index i, then check index i+1, then i+2,
and so on (wrap around if needed).
• Quadratic Probing: Instead of stepping one slot at a time, it probes at quadratically
growing offsets (i.e., i + 1^2, i + 2^2, i + 3^2, etc., wrapping around if needed).
• Double Hashing: Use a second hash function to calculate the next position when a
collision occurs.
• Advantage: It can be more space-efficient since no additional data structures like linked
lists are used.
• Disadvantage: It may lead to clustering (runs of consecutive filled slots, most pronounced
with linear probing) and requires careful management of table size to maintain good performance.
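
The three probing techniques differ only in the sequence of slots they visit. The sketch below generates those sequences so they can be compared side by side. The function names are made up for this illustration, and the second hash function used for double hashing (1 + key % (size - 1)) is just one common textbook choice, not something prescribed by this document.

```python
def linear_probe(start, size):
    """Yield start, start+1, start+2, ... wrapping around the table."""
    for step in range(size):
        yield (start + step) % size


def quadratic_probe(start, size):
    """Yield start, start+1^2, start+2^2, ... modulo the table size."""
    for step in range(size):
        yield (start + step * step) % size


def double_hash_probe(key, size):
    """Step by a stride computed from a second hash (assumed h2, see above)."""
    start = key % size
    stride = 1 + key % (size - 1)  # never zero, so the probe always advances
    for step in range(size):
        yield (start + step * stride) % size


print(list(linear_probe(2, 7)))        # [2, 3, 4, 5, 6, 0, 1]
print(list(quadratic_probe(2, 7)))     # [2, 3, 6, 4, 4, 6, 3]
print(list(double_hash_probe(17, 7)))  # [3, 2, 1, 0, 6, 5, 4]
```

Note that the quadratic sequence revisits some slots and never reaches others (0, 1, and 5 here), which is one reason table size needs care with open addressing.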

3. Rehashing (Dynamic Resizing)


• Rehashing involves resizing the hash table when the load factor (or the number of collisions)
exceeds a certain threshold. When this happens, a new hash table is created with a larger size,
and all elements are reinserted with their indices recomputed for the new size (in effect, a
new hash function).
• Advantage: Keeps the load factor (number of elements per table slot) in check, leading to
fewer collisions.
• Disadvantage: Rehashing is expensive because every element must be reinserted (O(n) per
resize), which hurts most when resizing happens frequently.
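
As a rough sketch of the idea, the function below rebuilds a chained table at a larger size. The name rehash and the new size of 11 are assumptions made for this example. The starting table is the size-5 table from the chaining example.

```python
def rehash(buckets, new_size):
    """Return a new bucket array of new_size with every key reinserted."""
    new_buckets = [[] for _ in range(new_size)]
    for chain in buckets:
        for key in chain:
            # The index is recomputed against the new table size.
            new_buckets[key % new_size].append(key)
    return new_buckets


old = [[25], [], [17, 7, 12], [3], []]  # size-5 table from the chaining example
new = rehash(old, 11)                   # 11 is an arbitrary larger size

for index, chain in enumerate(new):
    if chain:
        print(index, chain)
```

With 11 slots, the chain of three keys at index 2 is broken up (17, 7, and 12 move to indices 6, 7, and 1), although 25 and 3 now share index 3. Rehashing reduces collisions on average but does not eliminate them.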

Load Factor and Performance


The load factor is the ratio of the number of elements to the size of the hash table. A high load
factor increases the likelihood of collisions. Managing this factor is key to efficient hashing:
• Low load factor: Fewer collisions, but inefficient use of memory.
• High load factor: More collisions, leading to degraded performance.
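
The ratio is simple to compute and is typically compared against a threshold to decide when to resize. In the sketch below, the function names and the 0.75 threshold are illustrative assumptions. The right threshold depends on the collision strategy and the workload.

```python
MAX_LOAD = 0.75  # illustrative threshold, not taken from this document


def load_factor(num_elements, table_size):
    """Number of stored elements divided by the number of slots."""
    return num_elements / table_size


def needs_resize(num_elements, table_size, max_load=MAX_LOAD):
    """True when the table is fuller than the chosen threshold."""
    return load_factor(num_elements, table_size) > max_load


print(load_factor(5, 5))   # 1.0
print(needs_resize(5, 5))  # True
print(needs_resize(3, 5))  # False
```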

Example of Collision Handling


Let's assume we have a hash table of size 5, and we are inserting keys 25, 17, 3, 7, and 12. Using
a simple modulo hash function (key % table_size):

• Key 25: 25 % 5 = 0
• Key 17: 17 % 5 = 2
• Key 3: 3 % 5 = 3
• Key 7: 7 % 5 = 2 (collision occurs here because 17 already occupies index 2)
• Key 12: 12 % 5 = 2 (another collision at index 2)

In this case, the collision would need to be resolved using one of the techniques mentioned above,
such as chaining or open addressing.
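
For comparison with chaining, here is a minimal sketch of how open addressing with linear probing would place the same keys. The function name insert_linear and the use of None to mark an empty slot are choices made for this example.

```python
def insert_linear(table, key):
    """Insert key using linear probing; return the index where it landed."""
    size = len(table)
    start = key % size
    for step in range(size):
        index = (start + step) % size  # wrap around at the end of the table
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 5
placed = [insert_linear(table, k) for k in (25, 17, 3, 7, 12)]

print(placed)  # [0, 2, 3, 4, 1]
print(table)   # [25, 12, 17, 3, 7]
```

Key 7 finds index 2 and then index 3 occupied, so it lands at index 4. Key 12 has to probe indices 2, 3, 4, and 0 before reaching the free slot at index 1, which shows how a cluster of filled slots lengthens later probes.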

Conclusion
Collisions are an inherent part of hashing, but they can be effectively managed using various
techniques like chaining, open addressing, and rehashing. The key is to choose the right collision
resolution strategy based on the expected data size and access patterns to ensure efficient hash table
operations.
