Hashing
• Hashing refers to the process of transforming a given key to
another value.
• It involves mapping data to a specific index in a hash table
using a hash function that enables fast retrieval of information
based on its key.
• The transformation of a key to the corresponding value is done
using a hash function, and the value obtained from the hash
function is called the hash code.
Need for Hash data structure
● The array is one of the most common data structures.
● For a large data set an array becomes inefficient: searching an
unsorted array takes O(n) time.
● So we look for a data structure that can store the data and
search it in constant time on average, i.e. in O(1) time. This is
how the hash data structure came into play.
● With the hash data structure, it is possible to store data and
retrieve it in constant time on average.
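As a rough illustration of the difference, the sketch below contrasts a linear scan of a Python list with a lookup in a Python dict, which is a built-in hash table. The record names and values are made up for the example.

```python
# Hypothetical records; the names and values are illustrative only.
records = [("alice", 30), ("bob", 25), ("carol", 41)]


def find_in_list(pairs, name):
    """Linear search: checks entries one by one, O(n) in the worst case."""
    for key, value in pairs:
        if key == name:
            return value
    return None


# A dict is Python's built-in hash table: O(1) lookup on average.
table = dict(records)

print(find_in_list(records, "carol"))  # scans the list entry by entry
print(table["carol"])                  # a single hashed lookup
```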
Components of Hashing
1. Key: A key can be any value, such as a string or an integer,
that is fed as input to the hash function.
2. Hash Function: The hash function receives the input key and
returns the index of an element in an array called a hash table.
The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to
values using a hash function. It stores the data in an
associative manner in an array where each data value has its
own index.
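A minimal sketch of how the three components fit together is given below. The table size of 7, the character-sum scheme for string keys, and the helper names `hash_function`, `put`, and `get` are assumptions made for illustration. Collisions are deliberately ignored here.

```python
TABLE_SIZE = 7  # assumed size, chosen only for illustration


def hash_function(key):
    """Map a string or integer key to a hash index in [0, TABLE_SIZE)."""
    if isinstance(key, int):
        code = key
    else:
        # Sum of character codes: a deliberately simple, assumed scheme.
        code = sum(ord(ch) for ch in str(key))
    return code % TABLE_SIZE


# The hash table itself: an array with one slot per hash index.
hash_table = [None] * TABLE_SIZE


def put(key, value):
    """Store (key, value) at the key's hash index (overwrites on collision)."""
    hash_table[hash_function(key)] = (key, value)


def get(key):
    """Return the value stored for key, or None if the key is absent."""
    entry = hash_table[hash_function(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put(50, "fifty")
put("cat", "meow")
print(hash_function(50), get(50))
print(hash_function("cat"), get("cat"))
```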
How Hashing Works?
The process of hashing can be broken down into three steps:
less.
● It causes wastage of memory space if there is a
Advantages:
● Simple to implement.
● Works well when m is a prime number.
Disadvantages:
● Poor distribution if m is not chosen wisely.
● A prime modulus is often chosen because it has no factors in
common with regular patterns in the keys (for example, keys that
are all multiples of 10).
● This allows keys to be more evenly distributed and reduces
clustering, thus improving the performance of the hash table.
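The effect of the modulus can be seen in a small experiment. The sketch below assumes the method under discussion is the division method, h(k) = k mod m, which the surviving text does not name explicitly. The keys are illustrative and were chosen to share the common factor 10.

```python
def division_hash(key, m):
    """Division-method hash (assumed): the remainder of key divided by m."""
    return key % m


# Illustrative keys that all share the factor 10.
keys = [10, 20, 30, 40, 50, 60, 70]

slots_m10 = [division_hash(k, 10) for k in keys]  # m shares a factor with the keys
slots_m7 = [division_hash(k, 7) for k in keys]    # m is prime

print(slots_m10)  # every key lands in the same slot
print(slots_m7)   # the keys spread over distinct slots
```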
Advantages:
● Reduces the probability of collisions.
Disadvantages:
● Requires more computation and storage.
What is a Collision?
● A collision in hashing occurs when two different keys map to
the same hash value.
● The probability of a hash collision depends on the size of the
hash table, the distribution of hash values, and the quality of
the hash function.
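A collision is easy to reproduce. The sketch below assumes the hash function key mod 7 and two illustrative keys, 50 and 85.

```python
def h(key):
    """Assumed hash function for illustration: key mod 7."""
    return key % 7


a, b = 50, 85
# Two different keys that produce the same hash value collide.
print(h(a), h(b), h(a) == h(b))
```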
How to handle Collisions
Separate Chaining
● The idea is to make each cell of the hash table point to a linked
list of records that have the same hash function value.
● This method is implemented using the linked list data structure.
● When numerous elements are hashed into the same slot index,
those elements are added to a chain, which is a singly-linked list.
Example:
Hash function: key mod 7
Values: 50, 700, 76, 85, 92, 73, 101
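The example above (hash function key mod 7 with the values 50, 700, 76, 85, 92, 73, 101) can be traced with a short sketch. The class and method names are hypothetical, and appending new keys at the tail of each chain is an assumption; inserting at the head is equally valid.

```python
class Node:
    """One record in a singly-linked chain."""

    def __init__(self, key):
        self.key = key
        self.next = None


class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [None] * size  # each slot holds the head of a chain

    def _index(self, key):
        return key % self.size  # hash function: key mod size

    def insert(self, key):
        node = Node(key)
        i = self._index(key)
        if self.slots[i] is None:
            self.slots[i] = node
            return
        cur = self.slots[i]
        while cur.next is not None:  # walk to the tail of the chain
            cur = cur.next
        cur.next = node

    def search(self, key):
        cur = self.slots[self._index(key)]
        while cur is not None:
            if cur.key == key:
                return True
            cur = cur.next
        return False

    def chain(self, i):
        """Return the keys stored in slot i, in chain order."""
        keys, cur = [], self.slots[i]
        while cur is not None:
            keys.append(cur.key)
            cur = cur.next
        return keys


table = ChainedHashTable(7)
for k in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(k)

for i in range(table.size):
    print(i, table.chain(i))
```

Slot 1 ends up holding the chain 50, 85, 92 because all three keys leave remainder 1 when divided by 7.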
Advantages:
● Easy to implement.
● We can always add more elements to the chain, so the hash table
never runs out of space.
● Less sensitive to the load factor and to the choice of hash function.
● Typically used when it is unclear how many keys will be added or
removed, or how frequently.
Disadvantages:
● Chaining's cache performance is poor because keys are kept in a
linked list.
● Space wastage (some parts of the hash table are never used).
● In the worst case, search time can become O(n) as the chain
lengthens.
● Additional space is used for the links.
Open Addressing
Unlike chaining, open addressing doesn't store multiple elements
in the same slot. Here, each slot is either filled with a single key
or left NIL.
Linear Probing
In linear probing, the hash table is searched sequentially, starting
from the original hash location. If that location is already
occupied, we check the next location.
Algorithm
1. Calculate the hash index, i.e. key = data % size.
2. If hashTable[key] is empty, store the value directly:
hashTable[key] = data.
3. If the index already holds a value, check the next index
using key = (key + 1) % size.
4. If the next index hashTable[key] is available, store the value
there. Otherwise, try the following index.
5. Repeat the above process until an empty slot is found.
Example: Let us consider the simple hash function "key mod 5"
and the sequence of keys to be inserted: 50, 70, 76, 85, 93.
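The algorithm and example above can be sketched as follows. The function name `linear_probe_insert` and the choice to raise an error when the table is full are assumptions; the steps themselves do not say what happens in that case.

```python
def linear_probe_insert(table, data):
    """Insert data using linear probing; return the index where it was stored."""
    size = len(table)
    key = data % size                 # step 1: hash index
    for _ in range(size):             # at most one full pass over the table
        if table[key] is None:        # steps 2 and 4: empty slot found
            table[key] = data
            return key
        key = (key + 1) % size        # step 3: move to the next index
    raise OverflowError("hash table is full")  # assumed behaviour


table = [None] * 5
placed = [linear_probe_insert(table, k) for k in [50, 70, 76, 85, 93]]
print(placed)
print(table)
```

With "key mod 5", 50 takes index 0, 70 collides at 0 and moves to 1, 76 collides at 1 and moves to 2, 85 probes 0, 1, 2 and lands at 3, and 93 collides at 3 and moves to 4.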
Quadratic Probing
Quadratic probing operates by taking the original hash index and
adding successive values of an arbitrary quadratic polynomial until an
open slot is found.
Algorithm
1. If the slot hash(x) % n is full, then we try (hash(x) + 1²) % n.
2. If (hash(x) + 1²) % n is also full, then we try (hash(x) + 2²) % n.
3. If (hash(x) + 2²) % n is also full, then we try (hash(x) + 3²) % n.
4. This process is repeated for successive values of i until an empty
slot is found.
Example 1:
Table size = 7, hash function Hash(x) = x % 7. Insert 22, 30,
and 50.
Example 2:
Table size: 10; hash function: h(x) = x % 10
Keys: 2, 12, 22, 32
Quadratic probing formula: (hash(x) + i²) % n
Resulting table:
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   32  2   12  -   -   22  -   -   -
1. h(2) = 2 % 10 = 2
2. h(12) = 12 % 10 = 2, index 2 is occupied
(2 + 1²) % 10 = 3
3. h(22) = 22 % 10 = 2, index 2 is occupied
(2 + 1²) % 10 = 3, index 3 is occupied
(2 + 2²) % 10 = 6
4. h(32) = 32 % 10 = 2, index 2 is occupied
(2 + 1²) % 10 = 3, index 3 is occupied
(2 + 2²) % 10 = 6, index 6 is occupied
(2 + 3²) % 10 = 1
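Both examples can be checked with a short sketch of quadratic probing. The function name `quadratic_probe_insert` is hypothetical, and giving up after n probes is an assumed cutoff: quadratic probing is not guaranteed to visit every slot, so it can fail even when the table still has free slots.

```python
def quadratic_probe_insert(table, x):
    """Insert x using quadratic probing; return the index where it was stored."""
    n = len(table)
    home = x % n                      # hash(x) % n
    for i in range(n):                # i = 0 is the home slot itself
        idx = (home + i * i) % n      # (hash(x) + i^2) % n
        if table[idx] is None:
            table[idx] = x
            return idx
    raise OverflowError("no empty slot found within n probes")  # assumed cutoff


# Example 1: table size 7, keys 22, 30, 50
table1 = [None] * 7
placed1 = [quadratic_probe_insert(table1, k) for k in [22, 30, 50]]
print(placed1)

# Example 2: table size 10, keys 2, 12, 22, 32
table2 = [None] * 10
placed2 = [quadratic_probe_insert(table2, k) for k in [2, 12, 22, 32]]
print(placed2)
```

In Example 1, 50 hashes to index 1 (taken by 22), then tries (1 + 1) % 7 = 2 (taken by 30), and finally (1 + 4) % 7 = 5, which is free.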
Double Hashing
● Double hashing works by using two hash functions to compute two
different hash values for a given key.
● The first hash function is h1(k) which takes the key and gives out a
location on the hash table.
● If that location is occupied (a collision), the secondary hash
function h2(k) is used in combination with the first hash function
h1(k) to find a new location in the hash table.
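A common way to combine the two functions is to probe (h1(k) + i * h2(k)) % n for i = 0, 1, 2, and so on. The sketch below follows that scheme. The table size of 7, the particular choice h2(k) = 5 - (k % 5), and the keys 22, 30, 50 are assumptions for illustration and do not come from the slides.

```python
def h1(k, n):
    """Primary hash: the home location in the table."""
    return k % n


def h2(k, prime=5):
    """Secondary hash (assumed form): a step size in 1..prime, never zero."""
    return prime - (k % prime)


def double_hash_insert(table, k):
    """Insert k using double hashing; return the index where it was stored."""
    n = len(table)
    start, step = h1(k, n), h2(k)
    for i in range(n):
        idx = (start + i * step) % n  # (h1(k) + i * h2(k)) % n
        if table[idx] is None:
            table[idx] = k
            return idx
    raise OverflowError("no empty slot found")  # assumed behaviour


table = [None] * 7
placed = [double_hash_insert(table, k) for k in [22, 30, 50]]
print(placed)
print(table)
```

Here 50 collides with 22 at index 1, so the step h2(50) = 5 is applied and 50 is stored at (1 + 5) % 7 = 6. Because the table size 7 is prime and the step is never zero, the probe sequence can reach every slot.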