Week 9_Hash Functions and Collision
Collision Handling
• The pair is of the form (key, value), where for a given key,
one can find a value using some kind of a “function” that
maps keys to values. The key for a given object can be
calculated using a function called a hash function. For
example, given an array A, if i is the key, then we can find the
value by simply looking up A[i].
Describe the hash function.
• A hash function is a fixed procedure that changes a key into a hash
key.
• This function converts a key into a length-restricted value known as
a hash value or hash.
• Although the hash value is typically shorter than the original, it
nevertheless represents the original string of characters.
• With digital signatures, the sender transmits both the hash value and the
signature to the recipient. The recipient generates a hash value
using the same hash algorithm and compares it to the
hash value received along with the message.
• If the hash values match, the message arrived unaltered.
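The hash comparison described above can be sketched as follows. This is a minimal illustration only: SHA-256 is an assumed choice (the slides do not name a hash algorithm), the function names digest and verify are hypothetical, and the signature step itself is left out.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return the SHA-256 hash of the message as a hex string."""
    return hashlib.sha256(message).hexdigest()


def verify(message: bytes, received_hash: str) -> bool:
    """Recompute the hash on the receiver's side and compare it."""
    return digest(message) == received_hash


# Sender side: hash the message and send both.
message = b"transfer 100 to account 42"
sent_hash = digest(message)

# Receiver side: an unaltered message matches, a modified one does not.
print(verify(message, sent_hash))                        # True
print(verify(b"transfer 900 to account 42", sent_hash))  # False
```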
• Assume we want to create a system for storing employee
records that include phone numbers (as keys). We also want
the following queries to run quickly:
• Division Method.
• Mid Square Method.
• Folding Method.
• Multiplication Method.
• Let’s begin discussing these methods in detail.
1. Division Method:
• This is the simplest method for generating a hash
value. The hash function divides the key k by M and then
uses the remainder obtained.
• Formula: h(k) = k mod M
• Here,
• k is the key value, and
• M is the size of the hash table.
• M is best chosen as a prime number, since that helps
distribute the keys more uniformly. The hash
function depends on the remainder of a division.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
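The two division-method examples above can be checked with a short sketch. The function name division_hash is hypothetical, and non-negative integer keys are assumed.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the hash is the remainder of k divided by m."""
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```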
• Pros:
• Cons:
2. Mid Square Method:
• k = 60
• k x k = 60 x 60
• = 3600
• h(60) = 60 (the middle two digits of 3600)
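The mid-square example above (square the key, then take the middle digits) can be sketched as follows. The function name mid_square_hash and the parameter r (how many middle digits to keep) are hypothetical, and the rule for locating the middle when the lengths do not divide evenly is an assumption; textbooks differ on that detail.

```python
def mid_square_hash(k: int, r: int = 2) -> int:
    """Square the key and return its middle r decimal digits."""
    squared = str(k * k).zfill(r)      # pad so at least r digits exist
    start = (len(squared) - r) // 2    # assumed rule: lean left when uneven
    return int(squared[start:start + r])


print(mid_square_hash(60))  # 60 * 60 = 3600, middle two digits: 60
```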
3. Folding Method:
• Here,
• s is obtained by adding the parts of the key k
• Example:
• k = 12345
• k1 = 12, k2 = 34, k3 = 5
• s = k1 + k2 + k3
• = 12 + 34 + 5
• = 51
• h(k) = 51
• Note:
• The number of digits in each part depends on the
size of the hash table. For example, if the size of the
hash table is 100, then each part must have two digits, except
for the last part, which can have fewer digits.
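The folding example above can be sketched as follows. The function name folding_hash and the parameter part_digits are hypothetical. The sketch returns the sum s directly, as in the example; reducing s modulo the table size when the sum is too large would be an additional assumption not shown in the slides.

```python
def folding_hash(k: int, part_digits: int = 2) -> int:
    """Split the decimal digits of k into parts and add the parts."""
    digits = str(k)
    parts = [
        int(digits[i:i + part_digits])
        for i in range(0, len(digits), part_digits)
    ]
    return sum(parts)


print(folding_hash(12345))  # 12 + 34 + 5 = 51
```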
4. Multiplication Method
• Here,
• M is the size of the hash table.
• k is the key value.
• A is a constant value.
• Example:
• k = 12345
• A = 0.357840
• M = 100
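The example values above can be worked through with a short sketch. It assumes the commonly used form of the multiplication method, h(k) = floor(M * (k * A mod 1)); that formula is not shown in the slides, so treat it as an assumption. The function name multiplication_hash is hypothetical.

```python
import math


def multiplication_hash(k: int, a: float, m: int) -> int:
    """Assumed form: floor(m * fractional part of (k * a))."""
    fractional = (k * a) % 1.0   # keep only the fractional part of k * a
    return math.floor(m * fractional)


# 12345 * 0.357840 = 4417.5348, fractional part 0.5348, times 100 = 53.48
print(multiplication_hash(12345, 0.357840, 100))  # 53
```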
• Cons:
• Separate Chaining
• Open Addressing
• Separate Chaining:
• The idea behind separate chaining is to make each slot of the
array point to a linked list, called a chain, that holds every
key hashing to that slot. Separate chaining
is one of the most popular and commonly used
techniques for handling collisions.
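A minimal sketch of separate chaining is given below. It assumes integer keys and uses the division method for the slot index; the class and method names (Node, ChainedHashTable, put, get) are hypothetical and chosen only for illustration.

```python
class Node:
    """One entry in a chain (singly linked list)."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [None] * size   # each slot holds the head of a chain

    def _index(self, key):
        return key % self.size       # division method, integer keys assumed

    def put(self, key, value):
        i = self._index(key)
        node = self.slots[i]
        while node is not None:      # update in place if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.slots[i] = Node(key, value, self.slots[i])  # prepend to chain

    def get(self, key):
        node = self.slots[self._index(key)]
        while node is not None:      # walk the chain for this slot
            if node.key == key:
                return node.value
            node = node.next
        return None


table = ChainedHashTable(size=7)
table.put(15, "a")   # 15 % 7 == 1
table.put(22, "b")   # 22 % 7 == 1, collides with 15 and joins its chain
print(table.get(15), table.get(22), table.get(8))  # a b None
```

Both colliding keys stay retrievable because they share one chain rather than competing for a single slot.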
• NOTE: Buckets marked "deleted" are handled the same as any other empty
bucket during insertion.
• When searching, the search does not stop when it comes across a "deleted"
bucket.
• The search ends only when the required key or an empty bucket is
found.
Open Addressing
• In open addressing:
• All the keys are kept inside the hash table itself, unlike separate
chaining.
• The hash table contains only the key information.
• The methods for open addressing are as follows:
• Linear Probing
• Quadratic Probing
• Double Hashing
• Each technique is described below:
• (a) Linear probing
• In linear probing, the hash table is searched sequentially,
starting from the slot given by the hash. If that slot is already
occupied, we check the next one.
• Let S be the size of the table and let hash(x) be the slot index
calculated using a hash algorithm.
• If slot hash(x) % S is full, then we try (hash(x) + 1) % S
• If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
• If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
• ..................................................
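The probing steps above, together with the handling of "deleted" buckets, can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it assumes integer keys, uses key % S as the hash, and the class and method names are hypothetical. One deliberate choice: insert keeps scanning past a "deleted" slot until it reaches a truly empty one, so that a key already stored further along is not inserted twice.

```python
EMPTY = object()    # slot that has never been used
DELETED = object()  # marker left behind when a key is removed


class LinearProbingTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        start = key % self.size                  # integer keys assumed
        for i in range(self.size):
            yield (start + i) % self.size        # hash, hash + 1, hash + 2, ...

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                            # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = i               # reusable, but keep scanning
            elif slot == key:
                return True                      # already present
        if first_free is None:
            return False                         # table is full
        self.slots[first_free] = key
        return True

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False                     # an empty bucket ends the search
            if slot is not DELETED and slot == key:
                return True
        return False

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[i] = DELETED          # leave a marker, not EMPTY
                return True
        return False


table = LinearProbingTable(size=7)
for k in (10, 17, 24):       # all three hash to slot 3
    table.insert(k)
table.delete(17)             # slot 4 becomes a "deleted" marker
print(table.search(24))      # True: the search continues past the marker
print(table.search(17))      # False
table.insert(31)             # 31 % 7 == 3, reuses the "deleted" slot
print(table.slots.index(31))  # 4
```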
• Linear probing problems:
• Time Complexity:
• The worst-case time to search for an element in linear probing is
O(table size).
• This can happen even if only one element remains and all the
others have been deleted:
• the "deleted" markers left in the hash table then force a full table
search.
• (b) Quadratic probing
• In quadratic probing, the interval between probes grows with
each attempt. This reduces the clustering that linear probing
suffers from. Some sources also call this approach the
mid-square method; it should not be confused with the mid-square
hash function. In the i-th iteration we probe the slot at offset i^2
from the hash position. We always begin
where the hash was generated, and we check the other slots
only if that location is taken.
• Let hash(x) be the slot index computed using the hash function.
• If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
• If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
• If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
• ..................................................
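The quadratic probe sequence above can be sketched as follows. The function names are hypothetical, integer keys and key % S as the hash are assumed, and the table is a plain list with None marking an empty slot. Note that a quadratic sequence does not necessarily visit every slot, so the insert can fail even when the table is not full.

```python
def quadratic_probe_sequence(h, size, attempts):
    """Slots visited: (h + 0*0) % size, (h + 1*1) % size, (h + 2*2) % size, ..."""
    return [(h + i * i) % size for i in range(attempts)]


def quadratic_insert(table, key):
    """Insert key into table (a list, None = empty); return the slot or None."""
    size = len(table)
    h = key % size                      # integer keys assumed
    for i in range(size):
        idx = (h + i * i) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    return None                         # no free slot found on this sequence


print(quadratic_probe_sequence(3, 7, 4))  # [3, 4, 0, 5]

table = [None] * 7
print(quadratic_insert(table, 10))  # 3
print(quadratic_insert(table, 17))  # 4  (3 is taken, 3 + 1*1 = 4)
print(quadratic_insert(table, 24))  # 0  (3 and 4 are taken, (3 + 2*2) % 7 = 0)
```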
• (c) Double hashing
• In double hashing, a second hash function determines the gap between
the probes. Double hashing is effective at reducing clustering,
because the increments of the probing sequence depend on the key
itself. In the i-th iteration we probe the slot at offset
i*hash2(x) from the hash position, where hash2(x) is the second hash
function.
• Let hash(x) be the slot index computed using the hash function.
• If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
• If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
• If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
• ..................................................
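The double-hashing steps above can be sketched as follows. The slides do not define hash2, so the choice hash2(x) = 1 + (x % (S - 1)) is an assumption, picked because it never returns 0 (a zero step would probe the same slot forever). The function names are hypothetical, integer keys are assumed, and a prime table size is used so that every step size reaches all slots.

```python
def hash1(key, size):
    return key % size                 # first hash: the starting slot


def hash2(key, size):
    return 1 + (key % (size - 1))     # assumed second hash: step size, never 0


def double_hash_insert(table, key):
    """Insert key into table (a list, None = empty); return the slot or None."""
    size = len(table)
    start, step = hash1(key, size), hash2(key, size)
    for i in range(size):
        idx = (start + i * step) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    return None                       # no free slot found


table = [None] * 7
print(double_hash_insert(table, 10))  # 3  (start 3, free)
print(double_hash_insert(table, 17))  # 2  (start 3 taken, step 6: (3 + 6) % 7)
print(double_hash_insert(table, 24))  # 4  (start 3 taken, step 1: (3 + 1) % 7)
```

Keys 17 and 24 collide with 10 at slot 3 but then follow different probe sequences, which is what distinguishes double hashing from linear and quadratic probing.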
• Comparing the first three: