Hashing Data Structure
What is Hashing
Table of Contents/Roadmap
What is Hashing
Need for Hash data structure
Components of Hashing
How does Hashing work?
What is a Hash function?
Types of Hash functions:
Properties of a Good hash function:
Complexity of calculating hash value using the hash function
Problem with Hashing:
What is collision?
How to handle Collisions?
1) Separate Chaining:
2) Open Addressing:
2.a) Linear Probing:
2.b) Quadratic Probing:
2.c) Double Hashing:
What is meant by Load Factor in Hashing?
What is Rehashing?
Applications of Hash Data structure
Real-Time Applications of Hash Data structure
Advantages of Hash Data structure
Disadvantages of Hash Data structure
Conclusion
Need for Hash data structure
Every day, the data on the internet is increasing multifold and it is
always a struggle to store this data efficiently. In day-to-day
programming, this amount of data might not be that big, but still, it
needs to be stored, accessed, and processed easily and efficiently. A
very common data structure that is used for such a purpose is the
Array data structure.
Now the question arises: if arrays already exist, why do we need a new data structure? The answer lies in the word "efficiency". Although inserting at the end of an array takes O(1) time, searching it takes O(n) time in general, or O(log n) if the array is kept sorted and binary search is used. This may seem small, but for a large data set it causes serious slowdowns, which makes the array inefficient when lookups are frequent.
So we are looking for a data structure that can store data and search it in constant time, i.e. in O(1) time. This is where the hash data structure comes in: with hashing, data can be stored and retrieved in O(1) time on average.
Components of Hashing
Hashing has three main components:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to the hash function, the technique that determines an index or location for storing an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. This index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a special function called a hash function. It stores data in an associative manner in an array, where each data value is placed at the index computed from its key.
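To make the three components concrete, here is a minimal Python sketch. The function name `hash_index`, the table size of 5, and the multiplier 31 used for string keys are illustrative assumptions, not part of the original text; integer keys use the same key % size rule as the examples in this article.

```python
def hash_index(key, table_size):
    """Hypothetical hash function: map an int or str key to an index in [0, table_size)."""
    if isinstance(key, int):
        # Integer keys: the division (modulo) method, e.g. 12 % 5 = 2.
        return key % table_size
    # String keys: a simple polynomial hash over the characters (assumed multiplier 31).
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size
    return h


table_size = 5
hash_table = [None] * table_size            # the hash table: one slot per index
for key, value in [(12, "twelve"), ("dog", "pet")]:
    index = hash_index(key, table_size)     # the hash index for this key
    hash_table[index] = (key, value)        # no collision handling in this sketch

print(hash_table)  # [None, None, (12, 'twelve'), None, ('dog', 'pet')]
```

This sketch simply overwrites a slot when two keys share an index (for example, 22 % 5 is also 2), which is exactly the collision problem that separate chaining and open addressing solve.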
1) Separate Chaining
The idea is to make each cell of the hash table point to a linked list
of records that have the same hash function value. Chaining is
simple but requires additional memory outside the table.
Example: Given a hash function, insert some elements into the hash table, using separate chaining as the collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
Let's solve this problem step by step:
Step 1: First draw the empty hash table which will have a
possible range of hash values from 0 to 4 according to the hash
function provided.
Hash table
Step 2: Now insert all the keys in the hash table one by one. The
first key to be inserted is 12 which is mapped to bucket number 2
which is calculated by using the hash function 12%5=2.
Insert 12
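Step 3: Now the next key is 22. It will map to bucket number 2 because 22%5=2. But bucket 2 is already occupied by key 12, so separate chaining handles the collision by creating a linked list at bucket 2 and adding 22 to it.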
Insert 22
Step 4: The next key is 15. It will map to slot number 0 because
15%5=0.
Insert 15
Step 5: Now the next key is 25. Its bucket number will be 25%5=0. But bucket 0 is already occupied by key 15. So the separate chaining method will again handle the collision by creating a linked list at bucket 0.
Insert 25
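A minimal Python sketch of the separate chaining example, assuming the hash function key % 5. Python lists stand in for the linked lists, the class name `ChainedHashTable` is hypothetical, and the sketch continues with the last element, 37, which also lands in bucket 2 because 37 % 5 = 2.

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a chain of keys with the same hash value."""

    def __init__(self, size=5):
        self.size = size
        # Python lists stand in for the linked lists described in the text.
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        return key % self.size              # hash function = key % 5

    def insert(self, key):
        bucket = self.buckets[self._hash(key)]
        if key not in bucket:               # skip duplicate keys
            bucket.append(key)              # a collision just extends the chain

    def search(self, key):
        # Only the chain in the key's own bucket needs to be scanned.
        return key in self.buckets[self._hash(key)]


table = ChainedHashTable(5)
for key in [12, 22, 15, 25, 37]:            # insertion order used in the steps
    table.insert(key)
print(table.buckets)  # [[15, 25], [], [12, 22, 37], [], []]
```

A lookup only scans one chain, so it stays fast as long as the chains stay short.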
Hash table
Step 2: Now insert all the keys in the hash table one by one. The
first key is 50. It will map to slot number 0 because 50%5=0. So
insert it into slot number 0.
Step 3: The next key is 70. It will map to slot number 0 because 70%5=0, but 50 is already at slot number 0, so we probe the next slot, slot 1, which is empty, and insert 70 there.
Insert 70 into hash table
Step 4: The next key is 76. It will map to slot number 1 because 76%5=1, but 70 is already at slot number 1, so we probe the next slot, slot 2, which is empty, and insert 76 there.
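The linear probing steps can be reproduced with a short Python sketch. It assumes a table of size 5 with hash function key % 5, consistent with the calculations above; the function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the slot it was stored in."""
    size = len(table)
    start = key % size                      # hash function = key % 5
    for i in range(size):
        slot = (start + i) % size           # try start, start + 1, start + 2, ... (wrapping)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 5                          # assumed table size of 5
for key in [50, 70, 76]:
    linear_probe_insert(table, key)
print(table)  # [50, 70, 76, None, None]
```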
Hash table
Step 3: Inserting 50
Hash(50) = 50 % 7 = 1
In our hash table, slot 1 is already occupied. So we search slot $1 + 1^2$, i.e. $1 + 1 = 2$.
Slot 2 is also occupied, so we search slot $1 + 2^2$, i.e. $1 + 4 = 5$.
Slot 5 is not occupied, so we place 50 in slot 5.
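A minimal Python sketch of quadratic probing for this example, assuming a table of size 7 and Hash(key) = key % 7. The text does not show which keys already fill slots 1 and 2, so the keys 22 and 30 below are hypothetical placeholders, chosen because 22 % 7 = 1 and 30 % 7 = 2.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing; return the slot it was stored in."""
    size = len(table)
    start = key % size                        # Hash(key) = key % 7
    for i in range(size):
        slot = (start + i * i) % size         # probe start, start + 1^2, start + 2^2, ...
        if table[slot] is None:
            table[slot] = key
            return slot
    # Quadratic probing may not visit every slot, so this can fail before the table is full.
    raise OverflowError(f"no free slot found for key {key}")


table = [None] * 7
for key in [22, 30]:                          # hypothetical keys occupying slots 1 and 2
    quadratic_probe_insert(table, key)
print(quadratic_probe_insert(table, 50))      # 5
```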
In double hashing, the i-th probe for key k is $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod n$, where
i is a non-negative integer that indicates the collision number,
k = the element/key being hashed,
n = the hash table size.
Complexity of the Double hashing algorithm:
Time complexity: O(n) in the worst case
Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).
Step 1: Insert 27
27 % 7 = 6, location 6 is empty, so insert 27 into slot 6.
Step 2: Insert 43
43 % 7 = 1, location 1 is empty, so insert 43 into slot 1.
Insert key 43 in the hash table
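Step 3: Insert 692
692 % 7 = 6, but location 6 is already occupied by 27, so we resolve this collision using double hashing with i = 1:
$h_{new} = [h_1(692) + 1 \cdot h_2(692)] \bmod 7 = [6 + (1 + 692 \bmod 5)] \bmod 7 = [6 + 3] \bmod 7 = 2$
Location 2 is empty, so insert 692 into slot 2.
Step 4: Insert 72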
72 % 7 = 2, but location 2 is already occupied (by 692), so this is a collision.
We resolve this collision using double hashing with i = 1:
$h_{new} = [h_1(72) + i \cdot h_2(72)] \bmod 7 = [2 + 1 \cdot (1 + 72 \bmod 5)] \bmod 7 = [2 + 3] \bmod 7 = 5$
Location 5 is empty, so insert 72 into slot 5.
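The double hashing example can be checked with a short Python sketch. It uses the two hash functions given in the example, h1(k) = k mod 7 and h2(k) = 1 + (k mod 5); the function name `double_hash_insert` is an illustrative assumption.

```python
def h1(k):
    return k % 7                # first hash function: home slot


def h2(k):
    return 1 + (k % 5)          # second hash function: step size, never 0


def double_hash_insert(table, key):
    """Insert key, probing slots (h1(key) + i * h2(key)) % len(table) for i = 0, 1, 2, ..."""
    size = len(table)
    for i in range(size):
        slot = (h1(key) + i * h2(key)) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError(f"no free slot found for key {key}")


table = [None] * 7
slots = {key: double_hash_insert(table, key) for key in [27, 43, 692, 72]}
print(slots)  # {27: 6, 43: 1, 692: 2, 72: 5}
print(table)  # [None, 43, 692, None, None, 72, 27]
```

Because the table size 7 is prime and h2 never returns 0, the probe sequence for any key visits all 7 slots before giving up.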
What is Rehashing?
As the name suggests, rehashing means hashing again. When the load factor grows beyond a predefined threshold (Java's HashMap, for example, uses a default load factor of 0.75), collisions become more frequent and operations slow down. To overcome this, the array is enlarged (typically doubled), and every existing entry is hashed again and stored in the new array, which keeps the load factor and the cost of operations low.
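A minimal sketch of rehashing, assuming a separate-chaining table that starts with 4 buckets and doubles when the load factor (number of entries divided by number of buckets) exceeds 0.75. The class name, the starting size, and the use of Python's built-in `hash()` are illustrative choices, not something the text prescribes.

```python
class RehashingTable:
    """Chained hash table that doubles its bucket array when the load factor exceeds max_load."""

    def __init__(self, size=4, max_load=0.75):
        self.size = size
        self.max_load = max_load
        self.count = 0
        self.buckets = [[] for _ in range(size)]

    def load_factor(self):
        return self.count / self.size

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % self.size]
        for pair in bucket:
            if pair[0] == key:              # existing key: update in place
                pair[1] = value
                return
        bucket.append([key, value])
        self.count += 1
        if self.load_factor() > self.max_load:
            self._rehash()

    def _rehash(self):
        old_buckets = self.buckets
        self.size *= 2                      # double the array size
        self.buckets = [[] for _ in range(self.size)]
        for bucket in old_buckets:          # hash every entry again into the new array
            for key, value in bucket:
                self.buckets[hash(key) % self.size].append([key, value])

    def get(self, key):
        for k, v in self.buckets[hash(key) % self.size]:
            if k == key:
                return v
        raise KeyError(key)


t = RehashingTable()
for n in range(10):
    t.insert(n, n * n)
print(t.size, t.get(7))  # 16 49
```

Here the table grows from 4 to 8 buckets on the 4th insert and from 8 to 16 on the 7th, so after 10 inserts the load factor is 10/16 = 0.625.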
Applications of Hash Data structure
Hashing is used in databases for indexing.
Hashing is used in disk-based data structures.
In programming languages such as Python and JavaScript, hash tables are used to implement objects and dictionaries.
Real-Time Applications of Hash Data structure
Hashing is used for cache mapping, giving fast access to the data.
Hashing can be used for password verification.
Hash functions are used in cryptography to produce message digests.
Rabin-Karp algorithm for pattern matching in a string.
Calculating the number of different substrings of a string.
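As an illustration of the Rabin-Karp item, here is a minimal Python sketch of its rolling-hash idea. The base 256 and the modulus 1_000_000_007 are assumed parameters; any base and large prime modulus work. Every hash match is confirmed by a direct string comparison, because different windows can share a hash value.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)            # base^(m-1), used to remove the leading character
    p_hash = t_hash = 0
    for i in range(m):                      # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Updating the window hash takes O(1) time, so the whole scan is O(n + m) on average instead of comparing the pattern at every position.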
Advantages of Hash Data structure
Some hash table implementations support safe concurrent access (for example, Java's ConcurrentHashMap).
Hash tables are usually faster than search trees for lookups by key, since a balanced search tree needs O(log n) comparisons.
Hashing provides constant time for searching, insertion, and deletion operations on average.
Disadvantages of Hash Data structure
Hashing is inefficient when there are many collisions.
Collisions are practically unavoidable when the set of possible keys is large.
Some implementations do not allow null keys or values (for example, Java's Hashtable).
Conclusion
From the above discussion, we conclude that the goal of hashing is to solve the problem of finding an item in a collection quickly. For example, if we have a list of millions of English words and we want to find a particular word, hashing lets us locate it efficiently. Checking each word in the list one by one until we find a match would be slow. Hashing reduces search time by narrowing the search, right from the start, to the few words that share the target's hash value.