Module-4 Dictionaries and Hash Tables
Operations:
• Insertion
• Deletion
• Searching
Insertion:
• Insertion in a dictionary is similar to insertion in a linked list. We create a
new node and store its address in the previous node.
• While inserting a value in the dictionary, we search the dictionary for the
given key.
o If we find the key, we enter (overwrite) the data into that node.
o If we do not find the key, we create a new node at the correct
position (keys should be in ascending order) and enter the value and
key into the new node.
Example: Insert (2,90),(11,86),(4,65) into a dictionary
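Step 1: Insert (2,90). The dictionary is empty, so the new node becomes the head:
(2,90)
Step 2: Insert (11,86). Key 11 is greater than 2, so the new node goes after (2,90):
(2,90) -> (11,86)
Step 3: Insert (4,65). Key 4 lies between 2 and 11, so the new node goes between them:
(2,90) -> (4,65) -> (11,86)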
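// Assumes: struct node { int key; int value; struct node *next; };
// and a global pointer struct node *head = NULL; (needs <stdlib.h>)
void insert(int key, int value)
{
    struct node *temp = head, *prev = NULL;
    // Traverse while the keys are smaller than the given key
    while (temp != NULL && temp->key < key)
    {
        prev = temp;
        temp = temp->next;
    }
    // Key is found: enter (overwrite) the value in that node
    if (temp != NULL && temp->key == key)
    {
        temp->value = value;
        return;
    }
    // Key is not found: create a new node at the correct position
    struct node *newNode = malloc(sizeof(struct node));
    newNode->key = key;
    newNode->value = value;
    newNode->next = temp;
    if (prev == NULL)
        head = newNode;
    else
        prev->next = newNode;
}
Deletion: To delete a key,
• We traverse the dictionary and search for the matching key.
• If we find the key, we unlink that node from the list and free it.
void delete(int key)
{
    struct node *temp = head, *prev = NULL;
    while (temp != NULL) // Traversing to find the key
    {
        if (temp->key == key)
        {
            if (prev == NULL)
                head = temp->next; // Deleting the first node
            else
                prev->next = temp->next;
            free(temp);
            return;
        }
        prev = temp;
        temp = temp->next;
    }
}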
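Searching: To search for a value,
• We traverse the dictionary and look for the matching key.
• If we find the key, we return the corresponding value.
int search(int key) {
    struct node *temp = head;
    while (temp != NULL) {
        if (temp->key == key)
            return temp->value; // Key found
        temp = temp->next;
    }
    return -1; // Key not found (assumes -1 is never stored as a value)
}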
Division Method: This is the simplest method to generate a hash value. The
hash function divides the key k by M and uses the remainder obtained.
Formula:
h(k) = k mod M
Here,
k is the key value, and
M is the size of the hash table.
M is best chosen as a prime number, since that tends to distribute the keys
more uniformly. The hash function depends only on the remainder of a
division.
Example 1:
k = 12345
M = 95
h(12345) = 12345 mod 95
= 90
Example 2:
k = 1276
M = 11
h(1276) = 1276 mod 11
= 0
Open Addressing: In Open Addressing, all elements are stored in the hash table
itself. So at any point, the size of the table must be greater than or equal to the
total number of keys (Note that we can increase table size by copying old data
if needed). This approach is also known as closed hashing.
1. Linear Probing: In linear probing, the hash table is searched sequentially,
starting from the original hash location. If that location is already occupied,
we check the next location, wrapping around to the start of the table when the
end is reached.
Formula:
index = ((key% TABLE_SIZE ) + i) % TABLE_SIZE
Example: Insert the following sequence of keys in the hash table
{9, 7, 11, 13, 12, 8}
Use linear probing technique for collision resolution
h(k, i) = [h(k) + i] mod m
h(k) = 2k + 5
m=10
Solution:
Step 01:
First Draw an empty hash table of Size 10.
The possible range of hash values will be [0, 9].
Step 02:
Insert the given keys one by one in the hash table.
First Key to be inserted in the hash table = 9.
h(k) = 2k + 5
h(9) = 2*9 + 5 = 23
h(k, i) = [h(k) + i] mod m
h(9, 0) = [23 + 0] mod 10 = 3
So, key 9 will be inserted at index 3 of the hash table
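Step 03:
Next key to be inserted in the hash table = 7.
h(k) = 2k + 5
h(7) = 2*7 + 5 = 19
h(k, i) = [h(k) + i] mod m
h(7, 0) = [19 + 0] mod 10 = 9
So, key 7 will be inserted at index 9 of the hash table
Step 04:
Next key to be inserted in the hash table = 11.
h(k) = 2k + 5
h(11) = 2*11 + 5 = 27
h(k, i) = [h(k) + i] mod m
h(11, 0) = [27 + 0] mod 10 = 7
So, key 11 will be inserted at index 7 of the hash table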
Step 05:
Next Key to be inserted in the hash table = 13.
h(k) = 2k + 5
h(13) = 2*13 + 5 = 31
h(k, i) = [h(k) + i] mod m
h(13, 0) = [31 + 0] mod 10 = 1
So, key 13 will be inserted at index 1 of the hash table
Step 06:
Next key to be inserted in the hash table = 12.
h(k) = 2k + 5
h(12) = 2*12 + 5 = 27
h(k, i) = [h(k) + i] mod m
h(12, 0) = [27 + 0] mod 10 = 7
Here Collision has occurred because index 7 is already filled.
Now we will increase i by 1.
h(12, 1) = [27 + 1] mod 10 = 8
So, key 12 will be inserted at index 8 of the hash table.
Step 07:
Next key to be inserted in the hash table = 8.
h(k) = 2k + 5
h(8) = 2*8 + 5 = 21
h(k, i) = [h(k) + i] mod m
h(8, 0) = [21 + 0] mod 10 = 1
Here a collision has occurred because index 1 is already filled.
Now we will increase i by 1, so i becomes 1.
h(8, 1) = [21 + 1] mod 10 = 2
Index 2 is vacant, so key 8 will be inserted at index 2 of the hash table.
This is how the linear probing collision resolution technique works.
Operations:
• Insertion
• Deletion
• Searching
Insertion: If we would like to add another element to the table with key k4,
which is hashed to the same location as an existing element, we probe the
following slots one by one and place it in the first empty slot.
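void insert(int key)
{
    int index = hashFunction(key);
    int probes = 0;
    // -1 represents an empty slot. If the slot is not empty,
    // we perform linear probing (move to the next slot).
    while (hashTable[index] != -1)
    {
        if (++probes == SIZE) // Every slot was checked: the table is full
        {
            printf("Hash table is full, cannot insert %d\n", key);
            return;
        }
        index = (index + 1) % SIZE;
    }
    hashTable[index] = key; // Insert the key
    printf("Inserted %d at index %d\n", key, index);
}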
Searching: Linear probing works as its name suggests. A probe starts at the
hashed index and linearly traverses the underlying array. If it finds the key
it is looking for, it returns that key's index to the user.
However, if it reaches an empty bucket during its search, it stops and
signals to the user that nothing was found.
int search(int key)
{
    int index = hashFunction(key);
    int probes = 0;
    // Stop at an empty slot, or after every slot has been checked once
    while (hashTable[index] != -1 && probes < SIZE)
    {
        if (hashTable[index] == key) // Check if the key matches
        {
            return index; // Key found, return the index
        }
        index = (index + 1) % SIZE; // Linear probing: move to the next slot
        probes++;
    }
    return -1; // Key not found
}
Deletion: Deleting an item while using linear probing is a bit tricky. Consider
what happens if we naively remove an item by marking its slot as empty.
Suppose we then look for an item that the hash function maps to the same
location as the deleted item, and that was therefore placed further to the
right in the hash table. The search fails, because the linear probe stops
when it encounters an empty slot.
The solution is to mark the deleted slot with a special value called a
tombstone instead of marking it as empty.
If the linear probe encounters a tombstone while searching the array, it will
ignore it and continue its search for the element we want.
Then, we search for an element as before, but this time we are able to
find it thanks to the tombstone.
void delete(int key)
{
    int index = search(key);
    if (index != -1)
    {
        // Leave a tombstone (assumed marker: -2) instead of -1, so that
        // searches for keys placed after this slot do not stop early
        hashTable[index] = -2;
        printf("Deleted key %d at index %d\n", key, index);
    }
    else
    {
        printf("Key %d not found in the hash table\n", key);
    }
}
Display Function:
void display()
{
printf("Hash Table:\n");
for (int i = 0; i < SIZE; i++)
{
printf("[%d] -> ", i);
if (hashTable[i] >= 0) // Negative values (such as -1) mark slots that hold no key
{
printf("%d", hashTable[i]);
}
else
{
printf("Empty");
}
printf("\n");
}
}
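Disadvantage of Linear Probing: clustering. Occupied slots tend to form long
consecutive runs, so later insertions and searches need more probes.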
Procedure: Let hash(x) be the slot index computed using the hash function.
Example: Let us consider a simple hash function as “key mod 7” and sequence
of keys as 50, 700, 76, 85, 92, 73, 101
3. Double Hashing: Double hashing uses two hash functions. The first hash
function is the simple division method. When a collision occurs, a second
hash function determines the step size for probing, so the slot tried on the
i-th attempt is (H1(key) + i * H2(key)) % TABLE_SIZE. A good second hash
function must never evaluate to zero and should allow all slots of the table
to be probed. A common choice is:
H2(key) = P - (key % P)
where P is a prime number smaller than the size of the hash table.
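Example: Insert 67, 90, 55, 17, 49 into a hash table of size 10, with P = 7
67 % 10 = 7
90 % 10 = 0
55 % 10 = 5
17 % 10 = 7 (apply the 2nd function as there's a collision)
H2(17) = 7 - (17 % 7) = 7 - 3 = 4
Final Index = (7 + 4) % 10 = 11 % 10 = 1
49 % 10 = 9
Resulting hash table:
Index  Key
0      90
1      17
2
3
4
5      55
6
7      67
8
9      49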
Example: Insert 76,93,40,47,10,55
Procedure:
• For each addition of a new entry to the map, check the load factor
(number of entries divided by number of buckets).
• If it is greater than its pre-defined value (or the default value of 0.75 if
none is given), then rehash.
• To rehash, make a new array of double the previous size and make it
the new bucketArray.
• Then traverse each element in the old bucketArray and call
insert() for each, so as to insert it into the new, larger bucket array.
Inserting 4 and 6:
Both 4 (100) and 6 (110) have 0 in their LSB. Hence, they are hashed as follows:
Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by
directory 0 is already full. Hence, overflow occurs.
• Apply Step 7, Case 1.
• Since Local Depth = Global Depth, the bucket splits and directory
expansion takes place.
• Also, rehashing of the numbers present in the overflowing bucket
takes place after the split.
• Since the global depth is incremented by 1, the global depth
is now 2.
• Hence, 16, 4, 6 and 22 are now rehashed w.r.t. their 2 LSBs:
16 (10000), 4 (100), 6 (110), 22 (10110).
Notice that the bucket which did not overflow has remained untouched, so
directories 01 and 11 point to the same bucket.
Inserting 24 and 10: 24(11000) and 10 (1010) can be hashed based on
directories with id 00 and 10. Here, we encounter no overflow condition.