Hashing
Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects. Some
examples of how hashing is used in our lives include:
In universities, each student is assigned a unique roll number that can be used to retrieve information
about them.
In libraries, each book is assigned a unique number that can be used to determine information about the
book, such as its exact position in the library or the users it has been issued to.
In both these examples, the students and books were hashed to a unique number.
Hashing is a technique to convert a range of key values into a range of indexes of an array.
Here the modulo operator is used to map each key to an index. Consider an example of a hash
table of size 20, in which the following items are to be stored. Items are in the (key, value) format.
In hashing, large keys are converted into small keys by using hash functions.
The values are then stored in a data structure called hash table.
The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each
element is assigned a key (converted key). By using that key you can access the element
in O(1) time on average. Using the key, the algorithm (hash function) computes an index that
suggests where an entry can be found or inserted.
Implementation:
1. An element is converted into an integer by using a hash function. This integer can be used as an index
to store the original element in the hash table.
2. The element is stored in the hash table, where it can be quickly retrieved using the hashed key.
hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size and it is then reduced to an index (a number between 0
and array_size − 1) by using the modulo operator (%).
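The two steps above can be sketched in C++ as follows. This is a minimal illustration, not a prescribed implementation: the helper name indexFor is hypothetical, and std::hash stands in for hashfunc(key), which the text leaves unspecified.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a string key to a slot in [0, arraySize).
// std::hash is an assumed stand-in for hashfunc(key); any hash function would do.
std::size_t indexFor(const std::string& key, std::size_t arraySize) {
    std::size_t hash = std::hash<std::string>{}(key);  // independent of the array size
    return hash % arraySize;                           // reduced to 0 .. arraySize - 1
}
```

Calling indexFor("apple", 20) always yields the same slot for the same key within one run of the program, and the result is always smaller than 20.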
Hash function
A hash function is any function that can be used to map a data set of an arbitrary size to a data set of a fixed
size, which falls into the hash table. The values returned by a hash function are called hash values, hash codes,
hash sums, or simply hashes.
To achieve a good hashing mechanism, it is important to have a good hash function with the following basic
requirements:
1. Easy to compute: It should be easy to compute and must not become an algorithm in itself.
2. Uniform distribution: It should provide a uniform distribution across the hash table and should not result
in clustering.
3. Fewer collisions: Collisions occur when two or more elements are mapped to the same hash value. These
should be avoided as far as possible.
Items to be stored, in (key, value) format, in a hash table of size 20:
(1,20)
(2,70)
(42,80)
(4,25)
(12,44)
(14,32)
(17,11)
(13,78)
(37,98)

Sr. No.   Key   Hash             Array index
1         1     1 % 20 = 1       1
2         2     2 % 20 = 2       2
3         42    42 % 20 = 2      2
4         4     4 % 20 = 4       4
5         12    12 % 20 = 12     12
6         14    14 % 20 = 14     14
7         17    17 % 20 = 17     17
8         13    13 % 20 = 13     13
9         37    37 % 20 = 17     17
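The index column can be reproduced with a few lines of C++. The sketch below uses the modulo hash from the example and reports which keys land on a slot that an earlier key already claimed. The function names hashCode and collidingKeys are illustrative, and keys are assumed to be non-negative.

```cpp
#include <vector>

// Hash function from the example: key modulo the table size.
// Assumes key >= 0 (a negative key would give a negative remainder).
int hashCode(int key, int tableSize) { return key % tableSize; }

// Returns the keys whose slot was already claimed by an earlier key.
std::vector<int> collidingKeys(const std::vector<int>& keys, int tableSize) {
    std::vector<bool> used(tableSize, false);
    std::vector<int> collisions;
    for (int key : keys) {
        int index = hashCode(key, tableSize);
        if (used[index]) collisions.push_back(key);  // slot already taken
        used[index] = true;
    }
    return collisions;
}
```

For the nine keys listed above and a table of size 20, collidingKeys returns 42 (it shares index 2 with key 2) and 37 (it shares index 17 with key 17). These are the entries that need a collision resolution technique.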
Hash table
A hash table is a data structure that is used to store key/value pairs. It uses a hash function to compute an index
into an array in which an element will be inserted or searched. With a good hash function, hashing works
well: under reasonable assumptions, the average time required to search for an element in a hash table is O(1).
Let us consider a string S. You are required to count the frequency of all the characters in this string.
string S = "ababcd"
The simplest way to do this is to iterate over all the possible characters and count their frequency one by
one. The time complexity of this approach is O(26*N), where N is the size of the string and there are 26
possible characters.
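#include <iostream>
#include <string>
using namespace std;

//Prints how often each lowercase letter occurs in S
void countFre(string S)
{
    for(char c = 'a';c <= 'z';++c)
    {
        int frequency = 0;
        for(size_t i = 0;i < S.length();++i)
            if(S[i] == c)
                frequency++;
        cout << c << ' ' << frequency << endl;
    }
}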
Output
a 2
b 2
c 1
d 1
e 0
f 0
…
z 0
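Hashing removes the outer loop over the 26 letters. The sketch below is one possible version, assuming S contains only the lowercase letters 'a' to 'z'. The names hashFunc and countFrequencies are illustrative. Each character is hashed to an index into a 26-element array, so the string is scanned once and the cost drops to O(N).

```cpp
#include <array>
#include <string>

// Hash function for lowercase letters: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25.
int hashFunc(char c) { return c - 'a'; }

// Counts every character in a single pass over S.
// Assumes S contains only the lowercase letters 'a'..'z'.
std::array<int, 26> countFrequencies(const std::string& S) {
    std::array<int, 26> frequency{};   // all counts start at zero
    for (char c : S)
        ++frequency[hashFunc(c)];      // the hash value is the array index
    return frequency;
}
```

For S = "ababcd" the returned array holds 2, 2, 1, 1 in its first four positions and 0 everywhere else, matching the output above.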
Basic Operations
Following are the basic operations of a hash table.
Search − Searches for an element in a hash table.
Insert − Inserts an element into a hash table.
Delete − Deletes an element from a hash table.
Insert Operation
Whenever an element is to be inserted, compute the hash code of the key and use it as an
index into the array. If the slot at that index is already occupied, use linear probing to
find the next empty location.
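void insert(string s)
{
    //Compute the index using the hash function
    int index = hashFunc(s);
    //Linear probing: move to the next slot, wrapping around, until an empty one is found
    while(hashTable[index] != "")
        index = (index + 1) % hashTableSize;
    hashTable[index] = s;
}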
Search Operation
Whenever an element is to be searched for, compute the hash code of the key and use it as an
index into the array. If the element is not found at that index, use linear probing to examine
the slots that follow.
void search(string s)
{
    //Compute the index by using the hash function
    int index = hashFunc(s);
    //Linear probing: move ahead (wrapping around) until s or an empty slot
    //is found, giving up after hashTableSize probes
    int probes = 0;
    while(hashTable[index] != s and hashTable[index] != "" and ++probes < hashTableSize)
        index = (index + 1) % hashTableSize;
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}
Delete Operation
Whenever an element is to be deleted, compute the hash code of the key and use it as an index
into the array. If the element is not found at that index, use linear probing to examine the
slots that follow. When it is found, store a dummy item there so that later searches still probe
past the slot and the performance of the hash table stays intact.
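//Fragment: the surrounding probing loop is omitted
if(hashArray[hashIndex]->key == key) {
    struct DataItem* temp = hashArray[hashIndex];
    //Store the dummy item in the freed slot (dummyItem is an assumed placeholder)
    hashArray[hashIndex] = dummyItem;
    return temp;
}
//Reached when no slot in the probe sequence holds the key
return NULL;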
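A self-contained sketch of deletion with a dummy item is given below. It is an illustration only and makes several assumptions: a table of 21 strings, a placeholder hash function that sums character codes, an empty string marking a never-used slot, and a hypothetical marker string DELETED acting as the dummy item (so that string itself cannot be stored as a key). The function names are illustrative as well.

```cpp
#include <string>

const int hashTableSize = 21;
std::string hashTable[hashTableSize];        // "" marks a never-used slot
const std::string DELETED = "<deleted>";     // hypothetical dummy item (tombstone)

// Placeholder hash function (an assumption): sum of character codes.
int hashFunc(const std::string& s) {
    int sum = 0;
    for (unsigned char c : s) sum += c;
    return sum % hashTableSize;
}

// Insert with linear probing; assumes the table is not full.
void insertKey(const std::string& s) {
    int index = hashFunc(s);
    while (hashTable[index] != "" && hashTable[index] != DELETED)
        index = (index + 1) % hashTableSize;
    hashTable[index] = s;
}

// Returns true if s is stored. Dummy items are skipped, not treated as the end.
bool contains(const std::string& s) {
    int index = hashFunc(s);
    for (int probes = 0; probes < hashTableSize && hashTable[index] != ""; ++probes) {
        if (hashTable[index] == s) return true;
        index = (index + 1) % hashTableSize;
    }
    return false;
}

// Replaces s with the dummy item; returns false if s was not present.
bool eraseKey(const std::string& s) {
    int index = hashFunc(s);
    for (int probes = 0; probes < hashTableSize && hashTable[index] != ""; ++probes) {
        if (hashTable[index] == s) { hashTable[index] = DELETED; return true; }
        index = (index + 1) % hashTableSize;
    }
    return false;
}
```

With this placeholder hash, "ab" and "ba" collide, so "ba" is stored one slot after "ab". After eraseKey("ab"), a search for "ba" still succeeds because the dummy item keeps the probe going. Clearing the slot to an empty string instead would cut the search short.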
Applications
Associative arrays: Hash tables are commonly used to implement many types of in-memory tables.
They are used to implement associative arrays (arrays whose indices are arbitrary strings or other
complicated objects).
Database indexing: Hash tables may also be used as disk-based data structures and database indices
(such as in dbm).
Caches: Hash tables can be used to implement caches i.e. auxiliary data tables that are used to speed
up the access to data, which is primarily stored in slower media.
Object representation: Several dynamic languages, such as Perl, Python, JavaScript, and Ruby use
hash tables to implement objects.
Algorithms: Hash functions are used in various algorithms to make their computation faster.
Separate chaining
Separate chaining is one of the most commonly used collision resolution techniques. It is usually
implemented using linked lists. In separate chaining, each element of the hash table is a linked list. To
store an element in the hash table you must insert it into a specific linked list. If there is a collision (i.e.
two different elements have the same hash value), then both elements are stored in the same linked list.
The cost of a lookup is that of scanning the entries of the selected linked list for the required key. If the
distribution of the keys is sufficiently uniform, then the average cost of a lookup depends only on the
average number of keys per linked list. For this reason, chained hash tables remain effective even when
the number of table entries (N) is much higher than the number of slots.
For separate chaining, the worst-case scenario is when all the entries are inserted into the same linked
list. The lookup procedure may have to scan all its entries, so the worst-case cost is proportional to the
number (N) of entries in the table.
In the accompanying figure, CodeMonk and Hashing both hash to the value 2. The slot at index 2
can hold only one entry directly, so the next entry (in this case Hashing) is linked (attached) to
the entry of CodeMonk.
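Assumption
hashTable is an array that holds one list of strings per slot (the code indexes into each list, so a
vector<string> per slot fits), and hashFunc returns a valid index into that array.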
Insert
void insert(string s)
{
    // Compute the index using Hash Function
    int index = hashFunc(s);
    // Insert the element in the linked list at the particular index
    hashTable[index].push_back(s);
}
Search
void search(string s)
{
    //Compute the index by using the hash function
    int index = hashFunc(s);
    //Search the linked list at that specific index
    for(int i = 0;i < hashTable[index].size();i++)
    {
        if(hashTable[index][i] == s)
        {
            cout << s << " is found!" << endl;
            return;
        }
    }
    cout << s << " is not found!" << endl;
}
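For reference, a self-contained sketch of separate chaining is shown below. Everything the text leaves open is an assumption here: a table of 20 slots, std::list as the linked list, a placeholder hash function that sums character codes, and the name contains for a search that returns a bool instead of printing.

```cpp
#include <list>
#include <string>

const int hashTableSize = 20;                       // assumed table size
std::list<std::string> hashTable[hashTableSize];    // one linked list per slot

// Placeholder hash function (an assumption): sum of character codes.
int hashFunc(const std::string& s) {
    int sum = 0;
    for (unsigned char c : s) sum += c;
    return sum % hashTableSize;
}

// Appends s to the list at its hashed index; collisions simply share the list.
void insert(const std::string& s) {
    hashTable[hashFunc(s)].push_back(s);
}

// Scans only the list at the hashed index.
bool contains(const std::string& s) {
    for (const std::string& entry : hashTable[hashFunc(s)])
        if (entry == s) return true;
    return false;
}
```

With this hash, "ab" and "ba" both map to index 15, so after inserting both the list at that index holds two entries and each lookup scans at most those two.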
Open addressing
In open addressing, all entry records are stored in the array itself instead of in linked lists. When a new
entry has to be inserted, the hash index of the hashed value is computed and then the array is examined
(starting with the hashed index). If the slot at the hashed index is unoccupied, the entry record is inserted
there; otherwise the search proceeds in some probe sequence until it finds an unoccupied slot.
The probe sequence is the order in which slots are examined. Different probe sequences use different
intervals between successive entry slots or probes.
When searching for an entry, the array is scanned in the same sequence until either the target element or an
unused slot is found. Reaching an unused slot indicates that there is no such key in the table. The name
"open addressing" refers to the fact that the location or address of the item is not determined by its hash
value alone.
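Linear probing
Linear probing is when the interval between successive probes is fixed (usually to 1). Let's assume that the
hashed index for a particular entry is index. The probing sequence for linear probing will be:
index % hashTableSize
(index + 1) % hashTableSize
(index + 2) % hashTableSize
(index + 3) % hashTableSize
and so on…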
Assumption
string hashTable[21];
int hashTableSize = 21;
Insert
void insert(string s)
{
    //Compute the index using the hash function
    int index = hashFunc(s);
    //Search for an unused slot, wrapping around to the start when the index
    //reaches hashTableSize (this loops forever if the table is full)
    while(hashTable[index] != "")
        index = (index + 1) % hashTableSize;
    hashTable[index] = s;
}
Search
void search(string s)
{
    //Compute the index using the hash function
    int index = hashFunc(s);
    //Probe until s or an unused slot is found, wrapping around at hashTableSize;
    //stop after hashTableSize probes so that a full table cannot loop forever
    int probes = 0;
    while(hashTable[index] != s and hashTable[index] != "" and ++probes < hashTableSize)
        index = (index + 1) % hashTableSize;
    //Check if the element is present in the hash table
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}
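A self-contained variant of linear probing that can be compiled on its own is sketched below. The hash function (sum of character codes) is a placeholder assumption. The names insertKey and findSlot are illustrative, and both return a slot number (or -1) instead of printing, so that the effect of probing is visible.

```cpp
#include <string>

const int hashTableSize = 21;
std::string hashTable[hashTableSize];   // "" marks an unused slot

// Placeholder hash function (an assumption): sum of character codes.
int hashFunc(const std::string& s) {
    int sum = 0;
    for (unsigned char c : s) sum += c;
    return sum % hashTableSize;
}

// Stores s and returns the slot used, or -1 if every slot is occupied.
int insertKey(const std::string& s) {
    int index = hashFunc(s);
    for (int probes = 0; probes < hashTableSize; ++probes) {
        if (hashTable[index] == "") { hashTable[index] = s; return index; }
        index = (index + 1) % hashTableSize;   // fixed interval of 1
    }
    return -1;
}

// Returns the slot holding s, or -1 if an unused slot is reached first.
int findSlot(const std::string& s) {
    int index = hashFunc(s);
    for (int probes = 0; probes < hashTableSize && hashTable[index] != ""; ++probes) {
        if (hashTable[index] == s) return index;
        index = (index + 1) % hashTableSize;
    }
    return -1;
}
```

With this hash, "abc", "acb" and "bca" all map to index 0, so inserting them in that order fills slots 0, 1 and 2. A lookup for "cab" (also hashed to 0) probes those three slots, reaches the unused slot 3 and reports -1.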
Quadratic Probing
Quadratic probing is similar to linear probing; the only difference is the interval between successive probes or
entry slots. Here, when the slot at a hashed index for an entry record is already occupied, you must start
traversing until you find an unoccupied slot. The slot to examine is computed by adding successive
values of a quadratic polynomial to the original hashed index.
Let us assume that the hashed index for an entry is index and at index there is an occupied slot. The probe
sequence will be as follows:
(index + 1 * 1) % hashTableSize
(index + 2 * 2) % hashTableSize
(index + 3 * 3) % hashTableSize
and so on…
Assumption
string hashTable[21];
int hashTableSize = 21;
Insert
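void insert(string s)
{
    //Compute the home index using the hash function
    int home = hashFunc(s);
    int index = home;
    //Probe home + 1*1, home + 2*2, ... (modulo hashTableSize) until an unused
    //slot is found; give up after hashTableSize probes because quadratic
    //probing does not always reach every slot
    int h = 1;
    while(hashTable[index] != "" and h < hashTableSize)
    {
        index = (home + h*h) % hashTableSize;
        h++;
    }
    if(hashTable[index] == "")
        hashTable[index] = s;
    else
        cout << s << " could not be inserted!" << endl;
}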
Search
void search(string s)
{
    //Compute the home index using the hash function
    int home = hashFunc(s);
    int index = home;
    //Probe home + 1*1, home + 2*2, ... (modulo hashTableSize) until s or an
    //unused slot is found, giving up after hashTableSize probes
    int h = 1;
    while(hashTable[index] != s and hashTable[index] != "" and h < hashTableSize)
    {
        index = (home + h*h) % hashTableSize;
        h++;
    }
    //Is the element present in the hash table
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}
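To see which slots quadratic probing actually visits, the probe sequence can be listed directly. The sketch below assumes the sequence (home + h*h) % tableSize for h = 0, 1, 2, and so on, where home is the original hashed index. The function names are illustrative.

```cpp
#include <set>
#include <vector>

// Slots visited by quadratic probing: (home + h*h) % tableSize for h = 0 .. count - 1.
std::vector<int> quadraticProbes(int home, int tableSize, int count) {
    std::vector<int> slots;
    for (int h = 0; h < count; ++h)
        slots.push_back((home + h * h) % tableSize);
    return slots;
}

// Number of different slots reached within tableSize probes.
int distinctSlots(int home, int tableSize) {
    std::set<int> seen;
    for (int slot : quadraticProbes(home, tableSize, tableSize))
        seen.insert(slot);
    return static_cast<int>(seen.size());
}
```

For home index 5 and a table of size 21, the first five probes are 5, 6, 9, 14 and 0. Within 21 probes only 8 different slots are reached, so an insert can fail to find a free slot even though the table is not full. This is why the number of probes should be bounded.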
Double hashing
Double hashing is similar to linear probing; the only difference is the interval between successive probes.
Here, the interval between probes is computed by using two hash functions.
Let us say that the hashed index for an entry record is an index that is computed by one hashing function and the
slot at that index is already occupied. You must start traversing in a specific probing sequence to look for an
unoccupied slot. The probing sequence will be:
(index + 1 * indexH) % hashTableSize
(index + 2 * indexH) % hashTableSize
and so on…
Here, indexH is the hash value that is computed by another hash function.
Assumption
string hashTable[21];
int hashTableSize = 21;
Insert
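void insert(string s)
{
    //Compute the index using hash function 1 and the step using hash function 2
    int index = hashFunc1(s);
    int indexH = hashFunc2(s);
    //Search for an unused slot, wrapping around at hashTableSize.
    //indexH must be non-zero and should share no factor with hashTableSize,
    //otherwise some slots are never probed
    int probes = 0;
    while(hashTable[index] != "" and ++probes < hashTableSize)
        index = (index + indexH) % hashTableSize;
    if(hashTable[index] == "")
        hashTable[index] = s;
    else
        cout << s << " could not be inserted!" << endl;
}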
Search
void search(string s)
{
    //Compute the index using hash function 1 and the step using hash function 2
    int index = hashFunc1(s);
    int indexH = hashFunc2(s);
    //Probe until s or an unused slot is found, wrapping around at
    //hashTableSize and giving up after hashTableSize probes
    int probes = 0;
    while(hashTable[index] != s and hashTable[index] != "" and ++probes < hashTableSize)
        index = (index + indexH) % hashTableSize;
    //Is the element present in the hash table
    if(hashTable[index] == s)
        cout << s << " is found!" << endl;
    else
        cout << s << " is not found!" << endl;
}
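The text does not define the two hash functions, so the sketch below picks them purely for illustration: hashFunc1 sums character codes and hashFunc2 derives a step from the string length. It also assumes a prime table size of 23 rather than 21, because with a prime size every non-zero step visits all slots. insertKey and findSlot are hypothetical names that return a slot number, or -1.

```cpp
#include <string>

const int hashTableSize = 23;           // assumed prime size, see the note above
std::string hashTable[hashTableSize];   // "" marks an unused slot

// First hash (an assumption): sum of character codes, gives the home slot.
int hashFunc1(const std::string& s) {
    int sum = 0;
    for (unsigned char c : s) sum += c;
    return sum % hashTableSize;
}

// Second hash (an assumption): step size in 1 .. hashTableSize - 1, never 0.
int hashFunc2(const std::string& s) {
    return 1 + static_cast<int>(s.size()) % (hashTableSize - 1);
}

// Stores s and returns the slot used, or -1 if no free slot was found.
int insertKey(const std::string& s) {
    int index = hashFunc1(s);
    int indexH = hashFunc2(s);
    for (int probes = 0; probes < hashTableSize; ++probes) {
        if (hashTable[index] == "") { hashTable[index] = s; return index; }
        index = (index + indexH) % hashTableSize;
    }
    return -1;
}

// Returns the slot holding s, or -1 if an unused slot is reached first.
int findSlot(const std::string& s) {
    int index = hashFunc1(s);
    int indexH = hashFunc2(s);
    for (int probes = 0; probes < hashTableSize && hashTable[index] != ""; ++probes) {
        if (hashTable[index] == s) return index;
        index = (index + indexH) % hashTableSize;
    }
    return -1;
}
```

With these functions, "ab" and "ba" share home slot 11 and step 3, so "ab" is stored at 11 and "ba" at 14. A second hash that depends on more than the length would give colliding keys different steps, which is the main advantage of double hashing over linear probing.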