ADI Hashing
Hashing
• Hashing is a technique for converting an element or a value into a fixed-size key
that is used to identify that element within a group of similar elements.
• In other words, hashing is the process of mapping an element or a value to a
key that is ideally unique.
• In hashing, a hash function takes the value (element) as input and generates a
fixed-size key, also known as a hash code, as output. The generated hash code is
then used as the index position at which the value associated with the key is
stored.
Examples of Hashing in Data Structure
The following are real-life examples of hashing in the data structure –
• In schools, the teacher assigns a unique roll number to each student. Later,
the teacher uses that roll number to retrieve information about that student.
• A library has a very large number of books. The librarian assigns a unique
number to each book. This unique number helps in identifying the position
of the book on the bookshelf.
Need for Hash data structure
• Every day, the data on the internet is increasing multifold and it is always a struggle to store
this data efficiently.
• In day-to-day programming, this amount of data might not be that big, but still, it needs to be
stored, accessed, and processed easily and efficiently.
• A very common data structure that is used for such a purpose is the Array data structure.
• Now the question arises: if the Array was already there, why was a new data
structure needed?
• The answer to this is in the word “efficiency”.
• Storing an element in an Array at a known index takes O(1) time, but
searching for a value takes O(log n) time at best (binary search on a
sorted array) and O(n) time if the array is unsorted. This may appear
small, but for a large data set it adds up, and this, in turn, makes the
Array data structure inefficient for frequent lookups.
• So now we are looking for a data structure that can store the data and
search in it in constant time, i.e. in O(1) time.
• This is where the Hash data structure comes into play. With a hash
table, it is possible to store data in constant time and retrieve it in
constant time as well, on average.
Components of Hashing
1.Key: A key can be anything, such as a string or an integer, that is fed as
input to the hash function, the technique that determines an index or location
for storing an item in a data structure.
2.Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known as
the hash index.
3.Hash Table: A hash table is a data structure that maps keys to values using a
special function called a hash function. It stores the data in an associative
manner in an array, where each data value is placed at the index computed
from its key.
Hash Table
• A hash table is a data structure which uses hashing to store data in the form
of key-value pairs, such that the basic operations, i.e. insertion, deletion,
and searching, can be performed on the data in O(1) time on average.
• Consider a hash table as an array, and whenever we need to insert, delete or
search an element in it, first the hash code corresponding to that element is
computed and then treating the hash code as the index of the array, the
required operation is performed on the given element.
• A simple hashing approach would be to use the modulo (%) operator as a hash
function and generate the index for a given value (assuming it is numerical).
hash = value % hashTableSize
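As an illustration of the modulo approach, here is a minimal Java sketch. The class name ModuloHashDemo and the helper hash are hypothetical names chosen for this example, and the table size of 10 is just an assumption for the demo.

```java
// Minimal sketch: the modulo operator used as a hash function.
class ModuloHashDemo {
    // Hypothetical helper: maps an integer value to a slot index.
    static int hash(int value, int hashTableSize) {
        // Math.floorMod keeps the result in [0, hashTableSize) even for negative values.
        return Math.floorMod(value, hashTableSize);
    }

    public static void main(String[] args) {
        int hashTableSize = 10;                  // assumed size for the demo
        int[] hashTable = new int[hashTableSize];

        int value = 47;
        int index = hash(value, hashTableSize);  // 47 % 10 = 7
        hashTable[index] = value;                // the hash code is used as the array index

        System.out.println(value + " is stored at index " + index);
    }
}
```

Math.floorMod is used instead of the plain % operator only so that negative inputs still yield a valid index; for non-negative values the two agree.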
How does Hashing work?
• Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to
store it in a table.
• Our main objective here is to search or update the values stored in the table
quickly, in O(1) time, and we are not concerned about the ordering of strings
in the table. So each string in the given set acts as the key, and the string
itself is also the value that is stored. But how do we store the value
corresponding to the key?
Step 1: We know that a hash function (which is some mathematical
formula) is used to calculate the hash value, which acts as the index of
the data structure where the value will be stored.
Step 2: So, let’s assign
• “a” = 1,
• “b” = 2, ... etc., to all alphabetical characters.
Step 3: Therefore, the numerical value of each string is the sum of the values of its characters:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash
function that is used here is the sum of the characters in key mod Table size. We
can compute the location of the string in the array by taking the sum(string) mod 7.
Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
• The above technique enables us to calculate the location of a given
string by using a simple hash function and rapidly find the value that
is stored in that location. Therefore the idea of hashing seems like a
great way to store (key, value) pairs of the data in a table.
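The steps above can be sketched in Java as follows. This is only an illustration under the same assumptions as the example (lowercase keys, “a” = 1, “b” = 2, and a table of size 7); the class and method names are hypothetical.

```java
// Sketch of the letter-sum hash from the worked example.
class LetterSumHashDemo {
    // Sum of letter positions ('a' = 1, 'b' = 2, ...) for a lowercase key.
    static int letterSum(String key) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Hash function: sum of the characters in the key mod table size.
    static int hash(String key, int tableSize) {
        return letterSum(key) % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 7;
        String[] table = new String[tableSize];

        for (String key : new String[] {"ab", "cd", "efg"}) {
            table[hash(key, tableSize)] = key;   // "ab" -> 3, "cd" -> 0, "efg" -> 4
        }

        for (int i = 0; i < tableSize; i++) {
            System.out.println(i + " -> " + table[i]);
        }
    }
}
```

Note that this simple hash gives the same sum to any two strings made of the same letters (for example "ab" and "ba"), so such keys would land in the same slot.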
int hashTable[10];
int hashTableSize = 10;
• The hashing process generates a small number for a big key, so there is
a possibility that two keys could produce the same value. The situation
where a newly inserted key maps to an already occupied slot is called a
collision, and it must be handled using some collision handling technique.
How to handle Collisions?
• There are mainly two methods to handle collision:
1.Separate Chaining:
2.Open Addressing:
Separate Chaining
• The idea is to make each cell of the hash table point to a linked list of
records that have the same hash function value. Chaining is simple but
requires additional memory outside the table.
• Example: Given a hash function, insert some elements into the hash
table using separate chaining as the collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
A step-by-step approach to solving the above problem:
Step 1: First draw the empty hash table which will have a possible
range of hash values from 0 to 4 according to the hash function
provided.
Step 2: Now insert all the keys in the hash table one by one. The first key to be
inserted is 12, which is mapped to bucket number 2, calculated by using the
hash function 12%5=2.
Step 3: Now the next key is 22. It will map to bucket number 2
because 22%5=2. But bucket 2 is already occupied by key 12, so the
separate chaining method handles the collision by adding 22 to the
linked list at bucket 2.
Step 4: The next key is 15. It will map to bucket number 0 because
15%5=0.
Step 5: Now the next key is 25. Its bucket number will be
25%5=0. But bucket 0 is already occupied by key 15. So the
separate chaining method will again handle the collision by
adding 25 to the linked list at bucket 0.
Step 6: The last key is 37. It will map to bucket number 2 because
37%5=2, so it is added to the linked list at bucket 2 after 12 and 22.
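The separate chaining example can be sketched in Java as follows. This is a minimal illustration, not a production hash table: the class name SeparateChainingDemo and its methods are hypothetical, and each bucket is simply a java.util.LinkedList of keys.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of a hash table that resolves collisions with separate chaining.
class SeparateChainingDemo {
    // One linked list (chain) per bucket.
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    SeparateChainingDemo(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    // Appends the key to the chain of its bucket; colliding keys share a chain.
    void insert(int key) {
        buckets.get(hash(key)).add(key);
    }

    // Only the chain of the key's bucket has to be scanned.
    boolean search(int key) {
        return buckets.get(hash(key)).contains(key);
    }

    List<Integer> chain(int bucket) {
        return buckets.get(bucket);
    }

    public static void main(String[] args) {
        SeparateChainingDemo table = new SeparateChainingDemo(5);  // hash = key % 5
        for (int key : new int[] {12, 22, 15, 25, 37}) {
            table.insert(key);
        }
        for (int b = 0; b < 5; b++) {
            System.out.println(b + " -> " + table.chain(b));
        }
        // Bucket 0 holds [15, 25] and bucket 2 holds [12, 22, 37]; the others stay empty.
    }
}
```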
Complexity Analysis
M = Number of slots in the hash table
N = Number of keys to be inserted in the hash table
Load factor α = N/M
• Insert: O(1)
• Search
• > Best: O(1)
• > Avg: O(1 + N/M), i.e. O(1 + α)
• > Worst: O(N)
• Delete: depends on the time complexity of searching
Open Addressing
• Open Addressing, which is also known as closed hashing, is a
technique of collision resolution in hash tables. The main idea of open
addressing is to keep all the data in the table itself. To achieve this, we
search for alternative slots in the hash table until a suitable one is found.
• The three major collision resolution strategies are:
1.Linear Probing
2.Quadratic Probing
3.Double Hashing
• When using open addressing, a collision is resolved by probing
(searching) alternative cells in the hash table until our target cell
(empty cell while insertion, and cell with value x while searching x) is
found.
Linear Probing
• In linear probing, collisions are resolved by searching the hash table
consecutively (with wraparound) until an empty cell is found.
• The definition of the collision function f is quite simple in linear probing.
As suggested by the name, it is a linear function of i, or simply f(i)=i,
where i is the number of the probe attempt (0, 1, 2, ...).
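To make the probe function concrete, the sketch below assumes the usual convention that the i-th probe for a key examines cell (hash(x) + f(i)) mod size. The names ProbeSequenceDemo and probe are hypothetical.

```java
// Sketch: the sequence of cells visited by linear probing, f(i) = i.
class ProbeSequenceDemo {
    // i-th cell examined for a key whose home slot is `home`.
    static int probe(int home, int i, int size) {
        return (home + i) % size;   // the % size gives the wraparound
    }

    public static void main(String[] args) {
        int size = 7;
        int home = 5;               // e.g. hash(40) = 40 % 7 = 5
        for (int i = 0; i < size; i++) {
            System.out.print(probe(home, i, size) + " ");
        }
        System.out.println();
        // Cells are visited in the order 5 6 0 1 2 3 4.
    }
}
```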
Algorithm of linear probing
• Insert(x) -
• Find the hash value, k of x from the hash function hash(x).
• Iterate consecutively in the table starting from the k, till you find a cell that is
currently not occupied.
• Place x in that cell.
• Search(x) -
• Find the hash value k of x from the hash function hash(x).
• Iterate consecutively in the table starting from k, till you find a cell that
contains x or that has never been occupied.
• If we find x, the search is successful; otherwise it is unsuccessful.
• Delete(x) -
• Repeat the steps of Search(x).
• If element x does not exist in the table then we can't delete it.
• If x exists in a cell (say k), put a special marker (for example ∞) in cell k to
denote that it was occupied some time in the past, but is now empty.
Pseudocode for Linear Probing
class Hashing:
    size, table[]

    Hash(x):
        return x%size

    Insert(x):
        k=Hash(x)
        while(table[k] is not empty):
            k=(k+1)%size
        table[k]=x

    Search(x):
        k=Hash(x)
        while(table[k] != x):
            if(table[k] has never been occupied):
                return false
            k=(k+1)%size
        return table[k]==x

    Delete(x):
        k=Hash(x)
        while(table[k]!=x):
            if(table[k] has never been occupied):
                return
            k=(k+1)%size
        if(table[k]==x):
            table[k] = -Infinity
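The pseudocode can be turned into runnable Java roughly as follows. This is a sketch under stated assumptions rather than a definitive implementation: instead of storing a marker value such as -Infinity in the table, it keeps a separate state array (never used, occupied, deleted), and it bounds every loop to one pass over the table so that a full table cannot cause an infinite loop. It does not check for duplicate keys. All names are hypothetical.

```java
// Sketch of an integer hash table using open addressing with linear probing.
class LinearProbingTable {
    private static final byte NEVER_USED = 0, OCCUPIED = 1, DELETED = 2;

    private final int[] keys;
    private final byte[] state;   // per-slot state; DELETED plays the role of the marker

    LinearProbingTable(int size) {
        keys = new int[size];
        state = new byte[size];   // all slots start as NEVER_USED
    }

    private int hash(int x) {
        return Math.floorMod(x, keys.length);
    }

    // Places x in the first free slot at or after hash(x); returns the slot, or -1 if full.
    int insert(int x) {
        int k = hash(x);
        for (int i = 0; i < keys.length; i++) {
            if (state[k] != OCCUPIED) {          // never used or deleted: free to use
                keys[k] = x;
                state[k] = OCCUPIED;
                return k;
            }
            k = (k + 1) % keys.length;           // next cell, with wraparound
        }
        return -1;
    }

    // Returns the slot holding x, or -1 if x is not in the table.
    int indexOf(int x) {
        int k = hash(x);
        for (int i = 0; i < keys.length; i++) {
            if (state[k] == NEVER_USED) return -1;               // x cannot be further along
            if (state[k] == OCCUPIED && keys[k] == x) return k;
            k = (k + 1) % keys.length;                           // skip other keys and deleted slots
        }
        return -1;
    }

    boolean search(int x) {
        return indexOf(x) >= 0;
    }

    // Marks the slot as deleted instead of resetting it, so later searches keep probing past it.
    boolean delete(int x) {
        int k = indexOf(x);
        if (k < 0) return false;
        state[k] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(7);
        table.insert(16);                       // 16 % 7 = 2, stored in slot 2
        table.insert(9);                        // 9 % 7 = 2, collision, stored in slot 3
        table.insert(23);                       // 23 % 7 = 2, collisions, stored in slot 4
        table.delete(9);                        // slot 3 becomes DELETED
        System.out.println(table.search(23));   // true: the search probes past the deleted slot
        System.out.println(table.search(9));    // false
    }
}
```

The deleted state is what keeps Search correct after a deletion: if slot 3 were reset to "never used", the search for 23 would stop there and wrongly report that 23 is absent.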
Example of linear probing -
• Table Size = 7 Hash Function - hash(key)=key%7
• Insert - 16,40,27,9,75
Step 1 - Make an empty hash table of size 7.
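The remaining insertions of this example can be traced with a short Java sketch. It assumes non-negative keys and uses null to mark an empty slot; the class name LinearProbingTrace and the method insertAll are hypothetical.

```java
import java.util.Arrays;

// Sketch: inserting 16, 40, 27, 9, 75 into a table of size 7 with linear probing.
class LinearProbingTrace {
    // Inserts the keys in order and returns the final table; null marks an empty slot.
    static Integer[] insertAll(int[] keys, int size) {
        Integer[] table = new Integer[size];
        for (int key : keys) {
            int k = key % size;                       // hash(key) = key % 7
            int probes = 0;
            while (table[k] != null && probes < size) {
                k = (k + 1) % size;                   // try the next cell, with wraparound
                probes++;
            }
            if (table[k] == null) {
                table[k] = key;                       // key is dropped only if the table is full
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Integer[] table = insertAll(new int[] {16, 40, 27, 9, 75}, 7);
        System.out.println(Arrays.toString(table));
        // prints [75, null, 16, 9, null, 40, 27]
    }
}
```

Here 16, 40 and 27 go straight to slots 2, 5 and 6. Key 9 hashes to slot 2, which is taken, so it moves to slot 3. Key 75 hashes to slot 5, finds slots 5 and 6 occupied, and wraps around to slot 0.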
• In Java, HashSet uses a HashMap internally for storing its objects. You may
wonder how this works, since entering a value in a HashMap needs a key-value
pair, but in a HashSet we pass only one value.
• Storage in HashMap: The value we insert in the HashSet actually acts as
the key to the map object, and for its value, Java uses a constant dummy object.
So in the key-value pairs, all the values will be the same.
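The idea of a set backed by a map can be illustrated with the simplified sketch below. It is not the actual java.util.HashSet source; SimpleSet and PRESENT are names chosen for this illustration.

```java
import java.util.HashMap;

// Simplified sketch of a set that stores its elements as the keys of a HashMap.
class SimpleSet<E> {
    // One shared dummy object used as the value for every key.
    private static final Object PRESENT = new Object();

    private final HashMap<E, Object> map = new HashMap<>();

    // put returns the previous value, which is null when the key was not present.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(E e) {
        return map.containsKey(e);
    }

    // remove returns the value that was mapped, i.e. PRESENT if the element existed.
    boolean remove(E e) {
        return map.remove(e) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        SimpleSet<String> set = new SimpleSet<>();
        System.out.println(set.add("ab"));   // true: newly added
        System.out.println(set.add("ab"));   // false: already present
        System.out.println(set.size());      // 1
    }
}
```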
• Make a HashSet
HashSet<E> hs = new HashSet<E>();
Builds an empty HashSet object whose default initial capacity is 16
and whose default load factor is 0.75.
• Put values into a HashSet
hs.add(key); // Adds the specified element if it is not already present and
returns true; returns false if it is already present.
• Some other useful methods
hs.remove(key); // Removes the specified element from this set if it is
present. Elements are objects, so a primitive such as int is stored as its
wrapper type (Integer).
hs.clear(); // Removes all of the elements from the set.
hs.contains(key); // Returns true if this set contains the specified
element.
hs.size(); // Returns the number of elements in the set.
• Iterating through the set using the iterator() method
Iterator<E> i = hs.iterator();
while (i.hasNext())
{ // Iterating over elements
// using next() method
System.out.println(i.next());
}
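Putting the HashSet methods together, a complete program could look like the sketch below. The element type String and the class name HashSetDemo are assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Iterator;

// Sketch: basic HashSet operations in one runnable program.
class HashSetDemo {
    // Builds a small set of strings for the demo.
    static HashSet<String> build() {
        HashSet<String> hs = new HashSet<>();
        hs.add("ab");
        hs.add("cd");
        hs.add("efg");
        return hs;
    }

    public static void main(String[] args) {
        HashSet<String> hs = build();

        System.out.println(hs.add("ab"));        // false: "ab" is already in the set
        System.out.println(hs.contains("cd"));   // true
        hs.remove("cd");
        System.out.println(hs.size());           // 2

        // A HashSet does not guarantee any particular iteration order.
        Iterator<String> it = hs.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }

        hs.clear();
        System.out.println(hs.isEmpty());        // true
    }
}
```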