0% found this document useful (0 votes)
43 views47 pages

ADI Hashing

Uploaded by

adarshssingh2311
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views47 pages

ADI Hashing

Uploaded by

adarshssingh2311
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 47

Hashing

Hashing
• Hashing is a technique of converting an element or a value into a fixed size key,
that is used to uniquely identify that element from the group of similar
elements.
• In other words, hashing is a process of mapping an element or a value with a
unique key.
• In hashing, a hash function is used, which takes the value(element) as an input
and generates the fixed size key also known as hash code as an output. The
generated hash code is thus used as the index position at which the value
associated with the key is stored.
Examples of Hashing in Data Structure
The following are real-life examples of hashing in the data structure –
• In schools, the teacher assigns a unique roll number to each student. Later,
the teacher uses that roll number to retrieve information about that student.
• A library has an infinite number of books. The librarian assigns a unique
number to each book. This unique number helps in identifying the position
of the books on the bookshelf.
Need for Hash data structure
• Every day, the data on the internet is increasing multifold and it is always a struggle to store
this data efficiently.
• In day-to-day programming, this amount of data might not be that big, but still, it needs to be
stored, accessed, and processed easily and efficiently.
• A very common data structure that is used for such a purpose is the Array data structure.
• Now the question arises if Array was already there, what was the need for a new data
structure!
• The answer to this is in the word “efficiency“.
• Though storing in Array takes O(1) time, searching in it takes at least
O(log n) time. This time appears to be small, but for a large data set, it
can cause a lot of problems and this, in turn, makes the Array data
structure inefficient.
• So now we are looking for a data structure that can store the data and
search in it in constant time, i.e. in O(1) time.
• This is how Hashing data structure came into play. With the
introduction of the Hash data structure, it is now possible to easily
store data in constant time and retrieve them in constant time as well.
Components of Hashing
1.Key: A Key can be anything string or integer which is fed as input in the
hash function the technique that determines an index or location for storage
of an item in a data structure.
2.Hash Function: The hash function receives the input key and returns the
index of an element in an array called a hash table. The index is known as
the hash index.
3.Hash Table: Hash table is a data structure that maps keys to values using a
special function called a hash function. Hash stores the data in an associative
manner in an array where each data value has its own unique index.
Hash Table
• Hash table is a data structure which uses hashing to store the data in the form
of key value pairs, such that the basic operations i.e. the insertion, deletion,
and searching can be performed on the data in O(1) time.
• Consider a hash table as an array, and whenever we need to insert, delete or
search an element in it, first the hash code corresponding to that element is
computed and then treating the hash code as the index of the array, the
required operation is performed on the given element.
• A simple hashing approach would be to use the modulo(%) operator as a hash
function and generate the key for a given value(assuming its numerical).
hash= value % hashTableSize
How does Hashing work?
• Suppose we have a set of strings {“ab”, “cd”, “efg”} and we would like to
store it in a table.
• Our main objective here is to search or update the values stored in the table
quickly in O(1) time and we are not concerned about the ordering of strings
in the table. So the given set of strings can act as a key and the string itself
will act as the value of the string but how to store the value corresponding to
the key?
Step 1: We know that hash functions (which is some mathematical
formula) are used to calculate the hash value which acts as the index of
the data structure where the value will be stored.
Step 2: So, let’s assign
• “a” = 1,
• “b”=2, .. etc, to all alphabetical characters.
Step 3: Therefore, the numerical value by summation of all characters of the string:
“ab” = 1 + 2 = 3,
“cd” = 3 + 4 = 7 ,
“efg” = 5 + 6 + 7 = 18
Step 4: Now, assume that we have a table of size 7 to store these strings. The hash
function that is used here is the sum of the characters in key mod Table size. We
can compute the location of the string in the array by taking the sum(string) mod 7.
Step 5: So we will then store
“ab” in 3 mod 7 = 3,
“cd” in 7 mod 7 = 0, and
“efg” in 18 mod 7 = 4.
• The above technique enables us to calculate the location of a given
string by using a simple hash function and rapidly find the value that
is stored in that location. Therefore the idea of hashing seems like a
great way to store (key, value) pairs of the data in a table.
int hashTable[10];
int hashTableSize = 10;

int hashFunction(int value)


{
return (value);
}
void Insert(int value)
{
//Compute the index using the hash function
int index= hashFunction(value) % hashTablesize;
//insert the value in the hashtable at the computed index
hashTable[index]=value;
}

void search(int value)


{
//Compute the index uing the Hash function
int index=hashFunction(value) % hashTableSize;
//Check the value in the hashTable present the computed index
if (hashTable[index]==value)
printf(“found”);
else printf("not found");
}
What is a Hash function?
• The hash function creates a mapping between key and value, this is done through the use of
mathematical formulas known as hash functions. The result of the hash function is referred to as a hash
value or hash. The hash value is a representation of the original string of characters but usually smaller
than the original.
Types of Hash functions:
1. Division Method. A good hash function should have the following properties:
1.Efficiently computable.
2. Mid Square Method. 2. Should uniformly distribute the keys (Each table position is
3. Folding Method. equally likely for each.
3.Should minimize collisions.
4. Multiplication Method 4.Should have a low load factor(number of items in the table divided
by the size of the table).
Problem with Hashing
• If we consider the above example, the hash function we used is the
sum of the letters, but if we examined the hash function closely then
the problem can be easily visualized that for different strings same
hash value is begin generated by the hash function.
• For example: {“ab”, “ba”} both have the same hash value, and string
{“cd”,”be”} also generate the same hash value, etc. This is known
as collision and it creates problem in searching, insertion, deletion,
and updating of value.
What is Collision?

• The hashing process generates a small number for a big key, so there is
a possibility that two keys could produce the same value. The situation
where the newly inserted key maps to an already occupied, and it must
be handled using some collision handling technology.
How to handle Collisions?
• There are mainly two methods to handle collision:
1.Separate Chaining:
2.Open Addressing:
Separate Chaining
• The idea is to make each cell of the hash table point to a linked list of
records that have the same hash function value. Chaining is simple but
requires additional memory outside the table.
• Example: We have given a hash function and we have to insert some
elements in the hash table using a separate chaining method for
collision resolution technique.
Hash function = key % 5,
Elements = 12, 15, 22, 25 and 37.
step by step approach to how to solve the
above problem:
Step 1: First draw the empty hash table which will have a possible
range of hash values from 0 to 4 according to the hash function
provided.
Step 2: Now insert all the keys in the hash table one by one. The first key to be
inserted is 12 which is mapped to bucket number 2 which is calculated by using the
hash function 12%5=2.

Step 3: Now the next key is 22. It will map to bucket number 2
because 22%5=2. But bucket 2 is already occupied by key 12.
Step 4: The next key is 15. It will map to slot number 0 because
15%5=0.

Step 5: Now the next key is 25. Its bucket number will be
25%5=0. But bucket 0 is already occupied by key 25. So
separate chaining method will again handle the collision by
creating a linked list to bucket 0.
Complexity Analysis
M= Number of slots in hash table
• Insert –O(1) N= Number of keys to be inserted in hash table
Load factor α = N/M
• Search
• > Best :O(1)
• > Avg: O(N/M)
• >Worst: O(N)
• Delete- depends on
Time complexity of
searching
Open Addressing
• Open Addressing, which is also known as closed hashing is a
technique of collision resolution in hash tables. The main idea of open
addressing is to keep all the data in the same table to achieve it, we
search for alternative slots in the hash table until it is found.
• The three Major collision resolution strategies
Linear Probing
Quadratic Probing
Double hashing
• When using open addressing, a collision is resolved by probing
(searching) alternative cells in the hash table until our target cell
(empty cell while insertion, and cell with value x while searching x) is
found.
Linear Probing
• In linear probing, collisions are resolved by searching the hash table
consecutively (with wraparound) until an empty cell is found.
• The definition of collision function f is quite simple in linear probing.
As suggested by the name it is a linear function of i or simply f(i)=i
Algorithm of linear probing
• Insert(x) -
• Find the hash value, k of x from the hash function hash(x).
• Iterate consecutively in the table starting from the k, till you find a cell that is
currently not occupied.
• Place x in that cell.
• Search(x) -
• Find the hash value k of x from the hash function hash(x).
• Iterate consecutively in the table starting from the ,k, till you find a cell that
contains x or which is never been occupied.
• If we found x, then the search is successful and unsuccessful in the other case.
• Delete(x) -
• Repeat the steps of Search(x).
• If element x does not exist in the table then we can't delete it.
• If x exists in the cell (say k), put ∞ in cell k to denote it has been occupied
some time in the past, but now it is empty.
Pseudocode for Linear Probing
class Hashing: Delete(x):
size, table[] k=Hash(x)
Hash(x): while(table[k]!=x):
return x%size if(table[k] has never been occupied):
return
Insert(x): k=(k+1)%size
k=Hash(x) if(table[k]==x):
while(table[k] is not empty): table[k] = -Infinity
k=(k+1)%size
table[k]=x

Search(x):
k=Hash(x)
while(table[k] != x):
if(table[k] has never been occupied):
return false
k=(k+1)%size
return table[k]==x
Example of linear probing -
• Table Size = 7 Hash Function - hash(key)=key%7
• Insert - 16,40,27,9,75
Step 1 - Make an empty hash table of size 7.

Step 2 - Inserting 16,40, 27


• ℎ(16)=16%7=2
• ℎ(40)=40%7=5
• ℎ(27)=27%7=6
• Step 3 - Inserting 9 and 75.
• hash(9)=9%7=2 But at index 2 already 16 is placed and hence collision occurs so as
per linear probing we will search for consecutive cells till we find an empty cell.
So we will probe for hash(9)+1 i.e. cell 3, since the next cell i.e. 3 is not occupied we
place 9 in cell 33.
• hash(75)=75%7=5 Again collision happens because 40 is already placed in cell 5. So
will search for the consecutive cells, so we search for cell 6 which is also occupied
then we will search for cell (hash(75)+2)%7 i.e. 0 which is empty so we will place 75
there.
Search - 75,21
• Step 4 - Search 75 and 21
• hash(75)=75%7=5 But at index 5,75 is not present so we search for
consecutive cells until we found an empty cell or a cell with a value of 75. So
we search in cell 6 but it does not contain 75, so we search for 75 in cell 0 and
we stop our search here as we have found 75.
• h(21)=21%7=0 We will search for 21 in cell 0 but it contains 75 so we will
search in the next cell hash(21)+1, i.e. 1 since it is found empty it is clear
that 21 do not exist in our table.
Problem With Linear Probing
• Even though linear probing is intuitive and easy to implement but it
suffers from a problem known as Primary Clustering. It occurs
because the table is large enough therefore time to get an empty cell or
to search for a key k is quite large.
• worst case time complexity
of searching, insertion and deletion operations to be O(n), where n is
the size of the table.
Quadratic Probing
• The working of quadratic probing involves taking the initial hash
value and probing in the hash table by adding successive values of an
arbitrary quadratic polynomial. As suggested by its name, quadratic
probing uses a quadratic collision function f. One of the most common
and reasonable choices for f(i)=i^2
• For inserting x we search for the cells hash(x)+0,hash(x)+1^2,hash(x)
+2^2,... until we find an empty cell to insert x.
• For searching x we again search for the cells hash(x)+0,hash(x)
+1^2,hash(x)+2^2,... until we find a cell with value x. If we find an
empty cell that has never been occupied it means x is not present in the
hash table.
• For deletion, we repeat the search process if a cell is found with
value x we replace the value x with a predefined unique value to
denote that this cell has contained some value in past.
Example of Quadratic Probing
• Table Size = 7 Insert = 15, 23, and 85. Search & Delete = 85 Hash
Function -Hash(x)=x%7 Collision Resolution Strategy - f(i)=i^2

Step 1 - Create a table of size 7.

Step 2 - Insert 15 and 23


•hash(15)=15%7=1 Since the cell at index 1 is not
occupied we can easily insert 15 at cell 1.
•hash(23)=23%7=2 Again cell 2 is not occupied so
place 23 in cell 2. After performing this step our hash
table will look like
• Step 3 - Inserting 85
• hash(85)=85%7=1 In our hash table cell 1 is already occupied so we
will search for cell 1+1^2, i.e. cell 2. Again it is found occupied so we
will search for cell 1+2^2 , i.e. cell 5. It is not occupied so we will
place 85 in cell 5. After performing all these 3 insertions in our hash
table it will look like -
Double Hashing
• Double hashing offers us one of the best techniques for hashing with
open addressing. The permutation formed by double hashing is like a
random permutation therefore it almost eliminates the chances of
cluster forming as it uses a secondary hash function as an offset to deal
with collision condition.
• The hash function that is used in double hashing is of the form -
h(k,i)=hash1​(k)+i×hash2​(k), i=0,1,2,3,...
Example of double hashing
• Table size = 7 hash1​(k) = k%7 and hash2​(k)=1+k
%6 Insert = 37,25,12,40 and 75 Search & Delete = 75
Step 1 - Create a table of size 77.

Step 2 - Insert 37,25 and 12


•hash1​(37)=37%7=2
•hash1​(25)=25%7=4
•hash1​(12)=12%7=5
There is no collision at any point during inserting
these three values so we will simply place these
elements at their respective positions. After which,
our hash table will look like -
Step 3 - Insert 40and 74
• hash1​(40)=40%7=5, hash2​(40)=1+40%6=5 But at the cell at index
5 is already occupied, so we will check for next cell i.e. hash1​
(40)+1×hash2​(40) which will evaluate to (5+5)%7=3. Cell at index
3 is not occupied so we will place 40 in cell 3.
• hash1​(74)=74%7=4, hash2​(74)=1+74%6=3 But at the cell at index
4 is already occupied, so we will check for next cell i.e. hash1​
(74)+1×hash2​(74) which will evaluate to (4+3)%7=0. Cell at index
0 is not occupied so we will place 74 in cell 0.
Load Factor
• Total number of elements per bucket= N/M
• The load factor of the hash table can be defined as the number of items
the hash table contains divided by the size of the hash table. Load
factor is the decisive parameter that is used when we want to rehash
the previous hash function or want to add more elements to the
existing hash table.
• It helps us in determining the efficiency of the hash function i.e. it tells
whether the hash function which we are using is distributing the keys
uniformly or not in the hash table.
Rehashing?
• Rehashing means hashing again. Basically, when the load factor
increases to more than its predefined value (the default value of the
load factor is 0.75), the complexity increases. So to overcome this, the
size of the array is increased (doubled) and all the values are hashed
again and stored in the new double-sized array to maintain a low load
factor and low complexity.
Hashing Data Structure in JAVA
HashMap
• HashMap class implements the Map interface which allows us to store key and value pair, where keys should be unique.
• If you try to insert the duplicate key, it will replace the element of the corresponding key
• Java HashMap contains values based on the key.
• Java HashMap contains only unique keys.
• Java HashMap may have one null key and multiple null values.
• Java HashMap is non synchronized.
• Java HashMap maintains no order.
• The initial default capacity of Java HashMap class is 16 with a load factor of 0.75.
Make a HashMap
HashMap<keyType, valueType> myMap =
new HashMap<keyType, valueType>();
Put and get values into a map
myMap.put(key, value);
myMap.get(key) // returns the corresponding value

Some useful other methods


int size = myMap.size();//returns the size of pairs added into map
myMap.containsKey(key); // returns true or false if key is in map
myMap.keySet(); //returns the keys in the Map in collection set
myMap.values();//returns the values in the Map in collection set
myMap.remove(key); // remove the key value pairs
myMAp. isEmpty();//returns true if this map contains no key-value mappings.

Iterate using a foreach loop


for(keyType key : myMap.keySet())
{ // not ordered
myMap.get(key); // do something with the key/value pair
}
Methods inherited from class java.util.AbstractMap
METHOD DESCRIPTION
equals() Compares the specified object with this map for equality.

hashCode() Returns the hash code value for this map.

toString() Returns a string representation of this map.


HashSet in Java
• A few important features of HashSet are mentioned below:
• Implements Set Interface.
• The underlying data structure for HashSet is Hashtable.
• As it implements the Set Interface, duplicate values are not allowed.
• Objects that you insert in HashSet are not guaranteed to be inserted in
the same order. Objects are inserted based on their hash code.
• NULL elements are allowed in HashSet.
. HashSet does not store duplicate items
Internal Working of a HashSet

• HashSet uses HashMap for storing its object internally. You must be
wondering that to enter a value in HashMap we need a key-value pair,
but in HashSet, we are passing only one value.
• Storage in HashMap: Actually the value we insert in HashSet acts as
a key to the map Object and for its value, java uses a constant variable.
So in the key-value pair, all the values will be the same.
• Make a HashSet
HashSet<E> hs = new HashSet<E>();
build an empty HashSet object in which the default initial capacity is 16
and the default load factor is 0.75.
• Put values into a hashset
hs.add(key);// Used to add the specified element if it is not present, if it
is present then return false.
• Some useful other methods
hs.remove(key);//It is used to remove the specified element from this
set if it is present. Key should be of object type
hs.clear();//It is used to remove all of the elements from the set.
hs.contains(key);// used to return true if this set contains the specified
element.
hs.size();//returns the number of elements in set
• Iterating through set using Iterator () method
Iterator<E> i = Hash_Set.iterator();
while (i.hasNext())
{ // Iterating over elements
// using next() method
System.out.println(i.next());
}

You might also like