Hashtables
1. Key: A key can be anything, such as a string or an integer, that is fed as input to the hash function, the technique that determines an index or location for storing an item in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an element in an array called a hash table. The index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a special function called a hash function. It stores the data in an associative manner in an array where each data value has its own unique index.
There are many hash functions that use numeric or alphanumeric keys. This article focuses on the following hash functions:
1. Division Method
2. Mid-Square Method
3. Folding Method
4. Multiplication Method
What is collision?
The hashing process generates a small number for a big key, so there is a possibility that two
keys could produce the same value. The situation where a newly inserted key maps to an
already occupied slot is called a collision, and it must be handled using some collision handling technique.
Hashing is the process of generating a value from a text or a list of numbers using a
mathematical function known as a hash function.
A hash function is a function that converts a given numeric or alphanumeric key to a small
practical integer value. The mapped integer value is used as an index in the hash table. In simple
terms, a hash function maps a large number or string to a small integer that can be used as
the index in the hash table.
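As a minimal sketch of this idea, the snippet below maps a string to a small integer index. The function name simple_string_hash and the choice of summing character codes are assumptions made for illustration only; real hash functions are usually more elaborate.

```python
def simple_string_hash(key: str, table_size: int) -> int:
    """Illustrative only: sum the character codes and reduce modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


print(simple_string_hash("cat", 10))  # 2
print(simple_string_hash("act", 10))  # 2, the same index as "cat" (a collision)
```

Because "cat" and "act" contain the same characters, this simple function sends both to the same index, which is exactly the collision situation described earlier.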
Data in a hash table is stored as pairs of the form (key, value), where for a given key, one can find the value using some kind
of "function" that maps keys to values. The index for a given key can be calculated using a
function called a hash function. For example, given an array A, if i is the key, then we can find
the value by simply looking up A[i].
1. Division Method:
This is the simplest method to generate a hash value. The hash function divides
the key k by M and uses the remainder as the hash value:
h(k) = k mod M
Here, k is the key value and M is the size of the hash table.
It is best to choose a prime number for M, since that helps distribute the keys more
uniformly.
Example:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90

k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
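The two examples above can be reproduced with a short sketch. The function name division_hash is an assumption chosen for illustration; it simply applies k mod M.

```python
def division_hash(k: int, m: int) -> int:
    """Division method: the hash value is the remainder of k divided by m."""
    if m <= 0:
        raise ValueError("table size m must be positive")
    return k % m


print(division_hash(12345, 95))  # 90
print(division_hash(1276, 11))   # 0
```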
Pros:
Cons:
1. This method can perform poorly, since consecutive keys map to consecutive hash
values in the hash table.
2. Extra care is sometimes needed when choosing the value of M.
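2. Mid-Square Method:
The mid-square method is a very good hashing method. It involves two steps to compute the
hash value:
1. Square the value of the key k, i.e. compute k x k.
2. Extract the middle r digits of the square as the hash value.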
Example:
Suppose the hash table has 100 memory locations. Then r = 2, because two digits are required to
map the key to a memory location.
k = 60
k x k = 60 x 60
= 3600
h(60) = 60 (the middle two digits of 3600)
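A minimal sketch of the mid-square computation follows. The name mid_square_hash and the rule for locating the "middle" when the square has an odd number of digits (the window is shifted toward the left) are assumptions made here, since the text does not specify them.

```python
def mid_square_hash(k: int, r: int) -> int:
    """Square k and return the middle r digits of the square."""
    squared = str(k * k)
    if len(squared) <= r:
        # The square is too short to have a proper middle; use it as is.
        return int(squared)
    # Assumption: for an odd surplus of digits, the window leans left.
    start = (len(squared) - r) // 2
    return int(squared[start:start + r])


print(mid_square_hash(60, 2))  # 60, the middle two digits of 3600
```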
Pros:
1. The performance of this method is good, because most or all digits of the key
contribute to generating the middle digits of the squared result.
2. The result is not dominated by the distribution of the top digit or bottom digit of the
original key value.
Cons:
1. The size of the key is one of the limitations of this method: if the key is large,
its square has roughly twice as many digits.
2. Another disadvantage is that collisions will still occur, although we can try to reduce them.
3. Folding Method:
This method involves two steps:
1. Divide the key value k into a number of parts, i.e. k1, k2, k3, ..., kn, where each part has
the same number of digits except for the last part, which can have fewer digits than the other
parts.
2. Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
Formula:
k = k1, k2, k3, ..., kn
s = k1 + k2 + k3 + ... + kn
h(K) = s
Here,
s is obtained by adding the parts of the key k.
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
Note:
The number of digits in each part depends on the size of the hash table. Suppose, for
example, the size of the hash table is 100; then each part must have two digits except for the last
part, which can have fewer digits.
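The folding steps can be sketched as follows. The name folding_hash is hypothetical, and "ignoring the last carry" is interpreted here as keeping only the lowest part_size digits of the sum, which is an assumption rather than something the text states explicitly.

```python
def folding_hash(k: int, part_size: int) -> int:
    """Split the decimal digits of k into parts, add them, and drop any carry."""
    digits = str(k)
    # Cut the digit string into chunks of part_size; the last chunk may be shorter.
    parts = [int(digits[i:i + part_size]) for i in range(0, len(digits), part_size)]
    # Assumption: dropping the carry means keeping the lowest part_size digits.
    return sum(parts) % (10 ** part_size)


print(folding_hash(12345, 2))  # 12 + 34 + 5 = 51
```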
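4. Multiplication Method:
This method multiplies the key k by a constant A with 0 < A < 1, takes the fractional part of
the product, multiplies it by the table size M, and takes the floor of the result.
Formula:
h(K) = floor(M (kA mod 1))
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Example:
k = 12345
A = 0.357840
M = 100
h(12345) = floor[100 (12345 x 0.357840 mod 1)]
= floor[100 (4417.5348 mod 1)]
= floor[100 (0.5348)]
= floor[53.48]
= 53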
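As a rough sketch, the multiplication method with the values k = 12345, A = 0.357840 and M = 100 can be computed as below. The function name multiplication_hash is an assumption, and the code uses ordinary floating-point arithmetic, so results for products very close to a whole number could be affected by rounding.

```python
import math


def multiplication_hash(k: int, a: float, m: int) -> int:
    """Multiplication method: floor(m * fractional part of k * a)."""
    if not 0 < a < 1:
        raise ValueError("the constant a must lie strictly between 0 and 1")
    fractional = (k * a) % 1.0  # fractional part of k * a
    return math.floor(m * fractional)


print(multiplication_hash(12345, 0.357840, 100))  # 53
```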
Pros:
The advantage of the multiplication method is that it can work with any value of A between 0 and 1,
although some values tend to give better results than others.
Cons:
The multiplication method is generally best suited to table sizes that are a power of two; only then
is the whole process of computing the index from the key very fast.
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally likely for each key).
3. Should have a low load factor (number of items in the table divided by the size of the
table).
There are mainly two methods to handle collisions:
1. Separate Chaining
2. Open Addressing
1) Separate Chaining
The idea is to make each cell of the hash table point to a linked list of records that have the same
hash function value. Chaining is simple but requires additional memory outside the table.
Example: Suppose the hash function is key % 5 and we have to insert the keys 12, 22, 15 and 25 into the hash table,
using separate chaining as the collision resolution technique.
Let's go through the solution step by step:
Step 1: First draw the empty hash table, which has a possible range of hash values
from 0 to 4 according to the hash function provided.
Step 2: Now insert all the keys into the hash table one by one. The first key to be inserted
is 12, which is mapped to bucket number 2, calculated using the hash function:
12%5=2.
Step 3: The next key is 22. It maps to bucket number 2 because 22%5=2. But
bucket 2 is already occupied by key 12, so separate chaining handles the collision
by creating a linked list at bucket 2.
Step 4: The next key is 15. It maps to slot number 0 because 15%5=0.
Step 5: The next key is 25. Its bucket number is 25%5=0. But bucket 0 is
already occupied by key 15, so separate chaining again handles the collision
by creating a linked list at bucket 0.
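The steps above can be sketched in a few lines. The class name ChainedHashTable is an assumption, and each chain is a plain Python list rather than a hand-written linked list, which is a simplification made for brevity.

```python
class ChainedHashTable:
    """Hash table using separate chaining with the hash function key % size."""

    def __init__(self, size: int = 5):
        self.size = size
        # One chain per bucket; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(size)]

    def insert(self, key: int) -> None:
        bucket = self.buckets[key % self.size]
        if key not in bucket:  # avoid storing the same key twice
            bucket.append(key)

    def contains(self, key: int) -> bool:
        return key in self.buckets[key % self.size]


table = ChainedHashTable(5)
for key in (12, 22, 15, 25):
    table.insert(key)
print(table.buckets)  # [[15, 25], [], [12, 22], [], []]
```

Keys 12 and 22 share bucket 2, and keys 15 and 25 share bucket 0, matching the collisions in the worked steps.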
2) Open Addressing
In open addressing, all elements are stored in the hash table itself. Each table entry contains
either a record or NIL. When searching for an element, we examine the table slots one by one
until the desired element is found or it is clear that the element is not in the table.
In linear probing, the hash table is searched sequentially, starting from the original location of
the hash. If the location that we get is already occupied, then we check the next
location.
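Algorithm:
1. Calculate the hash key, i.e. key = data % size.
2. If hashTable[key] is empty, store the value directly: hashTable[key] = data.
3. If the hash index already holds a value, check the next index using key = (key + 1) % size.
4. If the next index hashTable[key] is available, store the value there. Otherwise try
the next index.
5. Repeat the above process until an empty slot is found.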
Example: Let us consider a simple hash function, “key mod 5”, and a sequence of keys
to be inserted: 50, 70, 76, 85, 93.
Step 1: First draw the empty hash table, which has a possible range of hash values
from 0 to 4 according to the hash function provided.
[Figure: Empty hash table]
Step 2: Now insert all the keys into the hash table one by one. The first key is 50. It
maps to slot number 0 because 50%5=0, so insert it into slot number 0.
[Figure: Insert 50 into hash table]
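Step 3: The next key is 70. It maps to slot number 0 because 70%5=0, but 50 is
already at slot number 0, so search for the next empty slot and insert it there (slot number 1).
Step 4: The next key is 76. It maps to slot number 1 because 76%5=1, but 70 is
already at slot number 1, so search for the next empty slot and insert it there (slot number 2).
Step 5: The next key is 93. It maps to slot number 3 because 93%5=3, so insert it into
slot number 3.
[Figure: Insert 93 into hash table]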
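A minimal sketch of linear probing insertion is given below. The function name linear_probe_insert and the use of None for an empty slot are assumptions, and the run uses only a subset of the example keys (50, 70, 76 and 93).

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using linear probing and return the slot where it was stored."""
    size = len(table)
    index = key % size
    for _ in range(size):  # visit each slot at most once
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size  # move to the next slot, wrapping around
    raise OverflowError("hash table is full")


table = [None] * 5
slots = [linear_probe_insert(table, key) for key in (50, 70, 76, 93)]
print(slots)  # [0, 1, 2, 3]
print(table)  # [50, 70, 76, 93, None]
```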
Quadratic probing is an open addressing scheme in computer programming for resolving hash
collisions in hash tables. Quadratic probing operates by taking the original hash index and adding
successive values of an arbitrary quadratic polynomial until an open slot is found.
This method is also known as the mid-square method (not to be confused with the mid-square hash
function described earlier), because in the i-th iteration we look for the i^2-th
probe (slot), with i = 0, 1, ..., n - 1. We always start from the original
hash location. Only if that location is occupied do we check the other slots.
Let hash(x) be the slot index computed using the hash function and n be the size of the hash
table. The i-th probe then examines slot (hash(x) + i^2) % n.
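Example: Let us consider table size = 7, hash function Hash(x) = x % 7 and collision
resolution strategy f(i) = i^2. Insert 22, 30, and 50.
Step 1: Inserting 22
o Hash(22) = 22 % 7 = 1
o Slot 1 is empty, so 22 is inserted at slot 1.
Step 2: Inserting 30
o Hash(30) = 30 % 7 = 2
o Slot 2 is empty, so 30 is inserted at slot 2.
Step 3: Inserting 50
o Hash(50) = 50 % 7 = 1
o In our hash table slot 1 is already occupied. So, we search for slot 1+1^2, i.e.
1+1 = 2.
o Slot 2 is also occupied, so we search for slot 1+2^2, i.e. 1+4 = 5.
o Slot 5 is empty, so 50 is inserted at slot 5.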
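The example can be checked with a short sketch. The function name quadratic_probe_insert is hypothetical, and stopping after n probes is an assumption made here; quadratic probing can fail to find a free slot even when the table is not completely full.

```python
def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key using quadratic probing with f(i) = i * i; return its slot."""
    n = len(table)
    home = key % n
    for i in range(n):  # assumption: give up after n probes
        index = (home + i * i) % n
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 7
slots = [quadratic_probe_insert(table, key) for key in (22, 30, 50)]
print(slots)  # [1, 2, 5]
print(table)  # [None, 22, 30, None, None, 50, None]
```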
Double hashing is a collision resolution technique in open addressed hash tables. Double
hashing makes use of two hash functions.
The first hash function is h1(k), which takes the key and gives a location in the hash
table. If that location is empty, we can simply place our key there.
But if the location is occupied (a collision), we use a secondary hash function
h2(k) in combination with the first hash function h1(k) to find a new location in the hash
table.
This combination of hash functions is of the form
h(k, i) = (h1(k) + i * h2(k)) % n
where
i is a non-negative integer that indicates the collision number,
k is the key being hashed,
n is the size of the hash table.
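Example: Insert the keys 27, 43, 692, 72 into a hash table of size 7, where the first hash function
is h1(k) = k mod 7 and the second hash function is h2(k) = 1 + (k mod 5).
Step 1: Insert 27
27 mod 7 = 6. Slot 6 is empty, so insert 27 into slot 6.
Step 2: Insert 43
43 mod 7 = 1. Slot 1 is empty, so insert 43 into slot 1.
Step 3: Insert 692
692 mod 7 = 6, but slot 6 is already occupied by 27. h2(692) = 1 + (692 mod 5) = 3, so the next
location is (6 + 1 x 3) mod 7 = 2. Slot 2 is empty, so insert 692 into slot 2.
Step 4: Insert 72
72 mod 7 = 2, but slot 2 is already occupied by 692. h2(72) = 1 + (72 mod 5) = 3, so the next
location is (2 + 1 x 3) mod 7 = 5. Slot 5 is empty, so insert 72 into slot 5.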
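The example can be traced with the sketch below, which probes slot (h1(k) + i * h2(k)) mod n for i = 0, 1, 2, and so on. The function name double_hash_insert and the limit of n probes are assumptions made for illustration; the two hash functions are the ones given in the example.

```python
def double_hash_insert(table: list, key: int) -> int:
    """Insert key using double hashing and return the slot where it was stored."""
    n = len(table)
    h1 = key % n        # first hash function: k mod 7 for a table of size 7
    h2 = 1 + (key % 5)  # second hash function from the example; never zero
    for i in range(n):  # assumption: give up after n probes
        index = (h1 + i * h2) % n
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 7
slots = [double_hash_insert(table, key) for key in (27, 43, 692, 72)]
print(slots)  # [6, 1, 2, 5]
print(table)  # [None, 43, 692, None, None, 72, 27]
```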
The load factor of a hash table is defined as the number of items the hash table contains
divided by the size of the hash table. The load factor is the decisive parameter when we
want to decide whether to rehash or whether more elements can be added to the existing hash
table.
It also helps us judge the efficiency of the hash function, i.e. it tells us whether the hash function
we are using is distributing the keys uniformly in the hash table.
What is Rehashing?
As the name suggests, rehashing means hashing again. When the load factor rises
above its predefined threshold (a commonly used default is 0.75, for example in Java's HashMap), the
complexity of operations increases. To overcome this, the size of the array is increased (doubled) and all the values are
hashed again and stored in the new, double-sized array to maintain a low load factor and low
complexity.
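A minimal sketch of rehashing is shown below. The class name ResizingHashTable, the starting capacity of 4 and the use of separate chaining are all assumptions made for illustration; the 0.75 threshold and the doubling follow the description above.

```python
class ResizingHashTable:
    """Chained hash table that doubles its size when the load factor exceeds a limit."""

    def __init__(self, capacity: int = 4, max_load: float = 0.75):
        self.capacity = capacity
        self.max_load = max_load
        self.count = 0
        self.buckets = [[] for _ in range(capacity)]

    def load_factor(self) -> float:
        return self.count / self.capacity

    def insert(self, key: int) -> None:
        bucket = self.buckets[key % self.capacity]
        if key in bucket:
            return
        bucket.append(key)
        self.count += 1
        if self.load_factor() > self.max_load:
            self._rehash()

    def _rehash(self) -> None:
        # Collect every stored key, double the array, and hash each key again.
        old_keys = [key for bucket in self.buckets for key in bucket]
        self.capacity *= 2
        self.buckets = [[] for _ in range(self.capacity)]
        for key in old_keys:
            self.buckets[key % self.capacity].append(key)


table = ResizingHashTable()
for key in (1, 5, 9, 13):
    table.insert(key)
print(table.capacity, table.load_factor())  # 8 0.5
```

With a capacity of 4, all four keys land in bucket 1. The fourth insert pushes the load factor to 1.0, which triggers the rehash and spreads the keys over buckets 1 and 5 of the larger table.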
Hashing is used for cache mapping, for fast access to the data.
For lookups, hash tables are on average more efficient than search trees and many other data structures.
Hashing provides constant time on average for searching, insertion, and deletion operations.
Hash collisions practically cannot be avoided for a large set of possible keys.
Conclusion
From the above discussion, we conclude that the goal of hashing is to resolve the challenge of
finding an item quickly in a collection. For example, if we have a list of millions of English
words and we wish to find a particular term, then we would use hashing to locate it more
efficiently. It would be inefficient to check each item in the list of millions until we find a
match. Hashing reduces search time by restricting the search to a smaller set of words from the
beginning.