0% found this document useful (0 votes)
21 views

Collision

Uploaded by

Richa Singh
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views

Collision

Uploaded by

Richa Singh
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 24

Collisions

Definition: A collision occurs when more than one value to be


hashed by a particular hash function hash to the same slot in the
table or data structure (hash table) being generated by the hash
function.

Example Hash Table With Collisions:


Let’s take the exact same hash function from before: take the value to be
hashed mod 10, and place it in that slot in the hash table.
Numbers to hash: 22, 9, 14, 17, 42
As before, the hash table is shown to the right.​
As before, we hash each value as it appears in the
string of values to hash, starting with the first value.
The first four values can be entered into the hash
table without any issues. It is the last
value, 42, however, that causes a problem. 42 mod
10 = 2, but there is already a value in slot 2 of the
hash table, namely 22. This is a collision.​

Collision Resolution
Techniques

Chaining Open Addressing


(Open Hashing) (Closed Hashing)
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Chaining
When we use chaining to resolve collisions, we simply allow each slot in the
hash table to accept more than one value. Therefore, in the example above, 42
would simply go in slot 2, as the hash function told us, in a list after 22.
The chaining technique
In the chaining approach, the hash table is an array of linked lists i.e., each
index has its own linked list.
All key-value pairs mapping to the same index will be stored in the linked list
of that index.

The benefits of chaining


• Through chaining, insertion in a hash table always occurs in O(1) since
linked lists allow insertion in constant time.
• Theoretically, a chained hash table can grow infinitely as long as there
is enough space.
• A hash table which uses chaining will never need to be resized.
if a phone book is implemented using hash tables
with separate chaining and A and B resolve to same
index, how will the algorithm deduce whose phone
number to return?
You need to do a linear search on the linked list. Therefore, in the worst
case, hashing with chaining would have complexity O(N). That would
happen if all values in the hash table were mapped to the same index
(which a reasonable hash function would try to avoid of course)
Open Addressing:
Like separate chaining, open addressing is a method for handling collisions.
In Open Addressing, all elements are stored in the hash table itself. So at
any point, the size of the table must be greater than or equal to the total
number of keys (Note that we can increase table size by copying old data if
needed). This approach is also known as closed hashing. This entire
procedure is based upon probing. We will understand the types of probing
ahead:
• Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
• Search(k): Keep probing until the slot’s key doesn’t become equal to k or an empty slot is
reached.
• Delete(k): Delete operation is interesting. If we simply delete a key, then the search may
fail. So slots of deleted keys are marked specially as “deleted”.
The insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.
Different ways of Open Addressing:
1. Linear Probing:
In linear probing, the hash table is searched sequentially that starts from
the original location of the hash. If in case the location that we get is
already occupied, then we check for the next location.
The function used for rehashing is as follows: rehash(key) =
(n+1)%table-size.
For example, The typical gap between two probes is 1 as seen in the
example below:
Challenges in Linear Probing :
•Primary Clustering: One of the problems with linear probing is
Primary clustering, many consecutive elements form groups and it
starts taking time to find a free slot or to search for an element.
•Secondary Clustering: Secondary clustering is less severe, two
records only have the same collision chain (Probe Sequence) if their
initial position is the same.
Example: Let us consider a simple hash function as “key mod 5” and a
sequence of keys that are to be inserted are 50, 70, 76, 93.
•Step1: First draw the empty hash table which will have a possible range
of hash values from 0 to 4 according to the hash function provided.
•Step 2: Now insert all the keys in the hash table one by one. The first
key is 50. It will map to slot number 0 because 50%5=0. So insert it
into slot number 0.

Step 3: The next key is 70. It will map to


slot number 0 because 70%5=0 but 50 is
already at slot number 0 so, search for
the next empty slot and insert it
•Step 4: The next key is 76. It
will map to slot number 1
because 76%5=1 but 70 is
already at slot number 1 so,
search for the next empty slot
and insert it.
Step 5: The next key is 93 It will
map to slot number 3 because
93%5=3, So insert it into slot
number 3
2. Quadratic Probing
If you observe carefully, then you will understand that the
interval between probes will increase proportionally to the
hash value. Quadratic probing is a method with the help of
which we can solve the problem of clustering that was
discussed above. This method is also known as the mid-
square method. In this method, we look for the i2‘th slot
in the ith iteration. We always start from the original hash
location. If only the location is occupied then we check the
other slots.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
…………………………………………..
Example: Let us consider table Size = 7, hash function as
Hash(x) = x % 7 and collision resolution strategy to be f(i) =
i2 . Insert = 22, 30, and 50.
•Step 1: Create a table of
size 7.
•Step 2 – Insert 22 and 30
• Hash(22) = 22 % 7 = 1, Since the cell at index 1 is empty, we can
easily insert 22 at slot 1.
• Hash(30) = 30 % 7 = 2, Since the cell at index 2 is empty, we can
easily insert 30 at slot 2.
•Step 3: Inserting 50
• Hash(50) = 50 % 7 = 1
• In our hash table slot 1 is already occupied. So, we will search for
slot 1+12, i.e. 1+1 = 2,
• Again slot 2 is found occupied, so we will search for cell 1+22,
i.e.1+4 = 5,
• Now, cell 5 is not occupied so we will place 50 in slot 5.
3. Double Hashing
The intervals that lie between probes are computed by another hash
function. Double hashing is a technique that reduces clustering in an
optimized way. In this technique, the increments for the probing
sequence are computed by using another hash function. We use
another hash function hash2(x) and look for the i*hash2(x) slot in
the ith rotation.

let hash(x) be the slot index computed using hash function.


If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) +
2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) +
3*hash2(x)) % S
…………………………………………..
Example: Insert the keys 27, 43, 92, 72 into the Hash Table of size 7.
where first hash-function is h1​(k) = k mod 7 and second hash-
function is h2(k) = 1 + (k mod 5)
•Step 1: Insert 27
• 27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.
•Step 2: Insert 43
• 43 % 7 = 1, location 1 is empty so insert 43
into 1 slot.
•Step 3: Insert 92
• 92 % 7 = 6, but location 6 is already being occupied and this is a
collision
• So we need to resolve this collision using double hashing.
hnew = [h1(92) + i * (h2(92)] % 7
= [6 + 1 * (1 + 92 % 5)] % 7
=9%7
=2

Now, as 2 is an empty slot,


so we can insert 92 into 2nd slot.
•Step 4: Insert 72
• 72 % 7 = 2, but location 2 is already being occupied and this is a collision.
• So we need to resolve this collision using double hashing.

hnew = [h1(72) + i * (h2(72)] % 7


= [2 + 1 * (1 + 72 % 5)] % 7
=5%7
= 5,
Now, as 5 is an empty slot,
so we can insert 72 into 5th slot.
Comparison of the above three:
•Linear probing has the best cache performance but suffers
from clustering. One more advantage of Linear probing is
easy to compute.
•Quadratic probing lies between the two in terms of cache
performance and clustering.
•Double hashing has poor cache performance but no
clustering. Double hashing requires more computation time
as two hash functions need to be computed.
S.No. Separate Chaining Open Addressing

1. Chaining is Simpler to implement. Open Addressing requires more computation.

2. In chaining, Hash table never fills up, we can always add In open addressing, table may become full.
more elements to chain.

3. Chaining is Less sensitive to the hash function or load Open addressing requires extra care to avoid clustering
factors. and load factor.

4. Chaining is mostly used when it is unknown how many Open addressing is used when the frequency and number
and how frequently keys may be inserted or deleted. of keys is known.

5. Cache performance of chaining is not good as keys are Open addressing provides better cache performance as
stored using linked list. everything is stored in the same table.

6. Wastage of Space (Some Parts of hash table in chaining In Open addressing, a slot can be used even if an input
are never used). doesn’t map to it.

7. Chaining uses extra space for links. No links in Open addressing


Note: Cache performance of chaining is not good because
when we traverse a Linked List, we are basically jumping
from one node to another, all across the computer’s
memory. For this reason, the CPU cannot cache the nodes
which aren’t visited yet, this doesn’t help us. But with Open
Addressing, data isn’t spread, so if the CPU detects that a
segment of memory is constantly being accessed, it gets
cached for quick access.

You might also like