
Lecture 4 Hashtable and HashMap

The document covers the concepts of HashTables and HashMaps, including their implementation and the importance of hash functions. It discusses the challenges of collisions in hash tables and methods to handle them, specifically focusing on chaining. Additionally, it provides a practical example of implementing a phonebook using these data structures in Java.

Uploaded by

bruha dev

#LifeKoKaroLift

Data Structures
Module Name: HashTables and HashMaps
Topic Name: Hashtable and HashMap
Spot Test

Let’s take a quick revision assessment of previously taught topics :-

● Stacks

● Queues

● Implementing Stack using Queue

● Implementing Queue using Stack


Today’s Agenda

● Introduction to Hashtable
● Array implementation of phonebook
● Hashing & Hash Functions
● Collisions in Hash Tables
● Hashtable implementation of Phonebook
● Introduction to HashMap
● Finding Symmetric Pairs
● First Unique Character Problem
Introduction to Hashtable

● In the previous modules, you learnt about searching and sorting algorithms.

● What is the time complexity of searching for an element in an unsorted
array? What about a sorted array?
Can you reduce this time complexity even further?

● In this module, you will learn about data structures that you
can use to search for an element in constant time.
Introduction to Hashtable

● Imagine you are on a week-long business trip to Bangkok. On the very first
day, you realize that you forgot to pack an essential item. You have no
other option but to order it online to your hotel address.

● Now let’s say your order has been delivered and you go to reception
to collect it. The receptionist can search for your name among the
parcels in one of the following three ways:
Introduction to Hashtable

● The receptionist may have a lot of parcels kept in an unorganized way and
go through all of them looking for your name.

● The parcels may instead be kept sorted by name. In that case, a manual
version of binary search can be applied to reduce the number of parcels the
receptionist has to check, which speeds up the search.

● If you pay attention, these two options are nothing but analogues of linear
and binary search.
Introduction to Hashtable

● Well, there is a third possibility too. What if there are 26 storage spaces,
one for each letter of the alphabet, each containing the parcels of people
whose names start with that letter? In this case, the receptionist can go
directly to your letter and fetch your parcel quickly (even if there are 3-4
parcels in that letter’s column).

● Which option do you like the best?


Array Implementation of Phonebook

● A phone book is basically an array of phone book entries. In fact, the two
fields of the phone book object are the array of entries (called entries) and a
count of the number of entries it contains (num_entries).

● There are three basic methods that we want to execute: adding a new entry,
finding an entry, and removing an entry. Additionally, we’ll add a method
isFull() that indicates whether the phone book is full yet.
Array Implementation of Phonebook

public class PhoneBook {

    public static final int MAX_ENTRIES = 100;

    private PhoneBookEntry[] entries;
    private int num_entries;

    public PhoneBook() {
        entries = new PhoneBookEntry[MAX_ENTRIES];
        num_entries = 0;
    }

    public boolean isFull() {
        return num_entries == MAX_ENTRIES;
    }
Array Implementation of Phonebook
    public void add(PhoneBookEntry entry) {
        // We check to make sure there's room and that the
        // entry isn't null, just to make sure.
        if (entry != null && num_entries < MAX_ENTRIES) {
            // We add the entry at the end of the entries we have.
            entries[num_entries] = entry;
            ++num_entries;
        }
    }

    public PhoneBookEntry find(String name) {
        // We search through the entries we have for the name.
        for (int i = 0; i < num_entries; i++) {
            if (entries[i].getName().equals(name))
                return entries[i];
        }
        return null;
    }
Array Implementation of Phonebook

    public PhoneBookEntry delete(String name) {
        // We search through the entries we have for the name.
        for (int i = 0; i < num_entries; i++) {
            if (entries[i].getName().equals(name)) {
                // When we find it, we'll move the last entry into
                // the current slot, replacing the found one.
                PhoneBookEntry ret = entries[i];
                entries[i] = entries[num_entries - 1];
                --num_entries;
                return ret;
            }
        }
        return null;
    }
}
Hashing and Hash Functions

● The idea of hashing is based on the central concept of the hash
function.

● A hash function is any function that can be used for mapping arbitrarily
sized data to fixed size data.
Hashing and Hash Functions

● As you can see in the figure, the cardinality of the input set U is
infinite, whereas the output set V is a range with a fixed size M, where
M is a natural number greater than 0.

● The job of the hash function H is to map each element of the set U to
some element of the set V.
Hashing and Hash Functions

● In layman’s terms, the basic idea behind a hash function is to map the
input (which can be of any data type) to the output (which is mostly a natural
number).

● Another obvious yet important point to consider here is that the hash
function should be fast to compute, or else, having a constant time search
complexity after the hashing will be useless.

● Let’s understand why a hash function needs to be fast with an example.
But before that, have you ever been to Mumbai?
Hashing and Hash Functions

● It will be like creating a shortcut bridge between two places to reduce the
existing distance, for example Bandra-Worli Sea Link but to access that
bridge you’ve to go through an hour long registration process every time.

● So, the reduced distance doesn’t really help us save any time overall.
Hashing and Hash Functions

● Now, let’s take an example of a simple phonebook.

● Consider a hash function that uses the first letter of a name and maps
the name to the index of that letter in the English alphabet, which is
then used as the index in the hash table.
Hashing and Hash Functions

● In this example, you can see that the name ‘Avi’ starts with an ‘A’ and since
A is the first letter of the English alphabet, it gets mapped to the first index
of the hash table, i.e. 0.

● Similarly, ‘Bob’ gets mapped to the hash index 1, ‘Chandrika’ to index 2 and
so on.

● Let’s say you have completed all the entries and you search for the name
‘Farook’. What do you think will happen?
Hashing and Hash Functions

● Now, when a new entry, ‘Farook’, comes in for search, you can directly
apply the hash function to ‘Farook’.

● Here, see that the first letter is ‘F’, which means H(F) -> 5, so we can
directly go to index 5 of the hash table and see whether the entry ‘Farook’ is
there in the hash table or not.

● This search for the name ‘Farook’ by applying the hash function and directly
checking the hash index 5 would take a total time of O(1), i.e. constant time.
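
The first-letter hash function described on these slides can be sketched in Java as below. This is only a minimal illustration: the class name FirstLetterHash and the method hash are made up for this example, and the sketch assumes every name is non-empty and starts with an English letter.

```java
class FirstLetterHash {
    // Hypothetical helper: maps a name to an index 0..25 using its first letter.
    // Assumes the name is non-empty and starts with A-Z or a-z.
    static int hash(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        return first - 'A';
    }

    public static void main(String[] args) {
        // One slot per letter; collisions are ignored in this sketch.
        String[] table = new String[26];
        for (String name : new String[] {"Avi", "Bob", "Chandrika", "Farook"}) {
            table[hash(name)] = name;
        }

        // Searching needs one hash computation and one array access.
        int index = hash("Farook");
        System.out.println("Farook hashes to index " + index);
        System.out.println("Found: " + "Farook".equals(table[index]));
    }
}
```

Note that the search never loops over the table: the cost is one call to hash and one array lookup, which is where the O(1) claim comes from.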
Hashing and Hash Functions

● Is the complexity REALLY O(1)?

○ It depends on a good hash function, the number of elements in the hash
table and the capacity of the hash table.
○ Actually, it is O(n/N) [assuming a good hash function that distributes keys
uniformly], where n is the number of elements in the hash table and N
is the number of buckets, i.e. the locations that keys are mapped to
(the capacity of the table).

■ However, we keep n/N between 0 and 1 to achieve constant time complexity.

■ n/N is called the Fill Ratio or Load Factor.

Hashing and Hash Functions

● Is the complexity REALLY O(1)? (continued)

■ The load factor is used by HashMap, Hashtable and other hash-based classes
in Java to decide when to increase the bucket capacity (so that constant
time is maintained).

■ The default load factor for HashMap and HashSet is 0.75.

■ Note that the performance of a hashing mechanism (for storage or
search) degrades if the hash function maps every key into the same
bucket (A BAD HASH FUNCTION); search then becomes O(n) in the worst case.
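
To make the load factor concrete, here is a small sketch of the arithmetic. The helper names loadFactor and shouldGrow are hypothetical and only illustrate the idea of growing once n/N passes a threshold; they are not the JDK’s internal code. The HashMap(int initialCapacity, float loadFactor) constructor used in main is part of the standard library.

```java
import java.util.HashMap;
import java.util.Map;

class LoadFactorDemo {
    // n = number of stored entries, capacity = number of buckets (N).
    static double loadFactor(int n, int capacity) {
        return (double) n / capacity;
    }

    // Illustration only: grow when the fill ratio exceeds the allowed maximum.
    static boolean shouldGrow(int n, int capacity, double maxLoadFactor) {
        return loadFactor(n, capacity) > maxLoadFactor;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(12, 16));        // fill ratio with 12 entries in 16 buckets
        System.out.println(shouldGrow(12, 16, 0.75));  // exactly at the threshold
        System.out.println(shouldGrow(13, 16, 0.75));  // past the threshold

        // The initial capacity and load factor can be passed explicitly.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        map.put("Avi", 1);
        System.out.println(map.size());
    }
}
```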
Collisions in Hash Tables

● Let’s understand a major limitation of hash tables that was overlooked in the
previous example, collision.

● Let’s say that you maintained a phone book with a hash function that uses
the first character of a name. What would happen if there were two or
more names with the same first letter? Like adding Akbar to the given
hash table, which already has an entry Avi...

● We saw earlier that there can be infinite inputs and they all have to map
to a fixed range of natural numbers. So, it’s fairly obvious that
collisions are inevitable.
Collisions in Hash Tables

So, there are two initial thoughts about this situation:

● The first instinctive thought is to not allow such entries to be added
into the hashtable to avoid the error. But that would not be an
optimal solution because then the phonebook won’t be able to have more
than 26 people, one for each letter.

● Another idea can be to replace the old entry with the new one. But then
there will be no way to access the old entry, and once again we
won’t be able to store more than 26 entries in our phonebook.
Collisions in Hash Tables

What we saw is a typical example of Collision.

● We already saw that the set U is usually much larger than the set V.

● So, it means that collisions are bound to happen a lot because the
elements of a much larger set are mapped to a much smaller set.

● Thus, it can be said that collisions are an integral part of the
hashing mechanism.
Collisions in Hash Tables

● The phenomenon of two distinct keys, K1 and K2, getting hashed to the
same index is known as a ‘collision’ in hash tables.

● So for two keys where K1≠K2, if H(K1)=H(K2), a collision occurs.

● Collisions cannot be avoided completely, but they may be reduced to
a great extent by choosing good hash functions.
Collisions in Hash Tables

There are two major ways to handle the problem of collisions:

● CLOSED HASHING (also called open addressing), e.g. LINEAR PROBING

● OPEN HASHING or CHAINING
Collisions in Hash Tables

● In our discussions, we will be focussing solely on the latter, i.e. Chaining.

● In this method, rather than storing a single entry at each index, there is a
linked list of entries maintained at each index. So, every time a new entry
with the same hash index comes into the picture, it gets added as the head
of the linked list at that particular index.
Collisions in Hash Tables

● Let’s understand with the help of an example.

● In this figure, we can see ‘Akbar’ getting added to the linked list at the
0th index where ‘Avi’ is already present, thus increasing the size of the
list and also giving the chance to store another key with the same hash
index at the same position.
Collisions in Hash Tables

Let’s see how the final searching for a key after chaining takes place:

● The first step is to calculate the hash value or the hash index.

● Then, we find out the entry corresponding to the index in our hashtable. This
entry may not be a single element but a linked list of elements.

● Now, we perform a linear search on our linked list to find if the key is present
or not.

● If the key is present, then it is found during this linear search. If not, the
linear search completes without finding it.
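
The chaining steps above can be sketched as a small Java class. This is a minimal illustration rather than a production hash table: the class name ChainedPhoneBook is made up, it stores names only, and it reuses the first-letter hash from the earlier slides (so it assumes names start with an English letter).

```java
import java.util.LinkedList;

class ChainedPhoneBook {
    private static final int BUCKETS = 26;

    // Each index holds a linked list (chain) of names.
    private final LinkedList<String>[] table;

    @SuppressWarnings("unchecked")
    ChainedPhoneBook() {
        table = new LinkedList[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            table[i] = new LinkedList<>();
        }
    }

    // First-letter hash; assumes the name starts with A-Z or a-z.
    private int hash(String name) {
        return Character.toUpperCase(name.charAt(0)) - 'A';
    }

    // New entries go to the head of the chain, so insertion is O(1).
    void add(String name) {
        table[hash(name)].addFirst(name);
    }

    // Step 1: compute the index. Step 2: fetch the chain at that index.
    // Step 3: linear search through the chain.
    boolean contains(String name) {
        for (String entry : table[hash(name)]) {
            if (entry.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ChainedPhoneBook book = new ChainedPhoneBook();
        book.add("Avi");
        book.add("Akbar"); // collides with Avi at index 0 and is chained
        System.out.println(book.contains("Akbar"));
        System.out.println(book.contains("Avinash"));
    }
}
```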
Collisions in Hash Tables

Let’s see an example to clarify the search process even more:

● Here, when we search for Akbar, we get the output as True, but if we
search for Avinash, the entire linked list is traversed only to find
that Avinash is not present in the list.

● Thus, in the worst case scenarios, the time complexity approaches O(n) for
the search operation.
Collisions in Hash Tables

● With chaining, the addition of a new entry into the hash table is still an O(1)
operation because you are adding the element to the head of the list,
whereas searching can take O(n) time when the list at an index grows
in proportion to the number of entries in the hash table.

● Regardless of this, maintaining a hash table with chaining is a good way to
handle collisions.
Collisions in Hash Tables

● The choice of a hash function greatly affects the overall process of hashing.
So, for example, let’s choose a hash function that is proven to be a bad
choice, as shown below:
Collisions in Hash Tables

● As you can see, the hash function considers the position of the first letter of
each name in the English alphabet and then considers whether it lies in an
even position or odd position in the alphabet.

● If the letter is located at an even position (for example, ‘A’ comes at
position 0), the hash index is 0. So the names starting with A, C, E, etc.
get mapped to the hash index 0, whereas names starting with B, D, F, etc.
get mapped to index 1.

● Thus, Avi, Chandrika and Eva have hash index 0, and Bob and Dev have
hash index 1.
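
As a rough sketch, this even/odd hash function could be written as follows. The class name ParityHash is hypothetical, and the code assumes names start with an English letter.

```java
class ParityHash {
    // Bad hash function from the slide: only the parity of the first
    // letter's position in the alphabet (A = 0, B = 1, ...) is used.
    static int hash(String name) {
        int position = Character.toUpperCase(name.charAt(0)) - 'A';
        return position % 2;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Avi", "Bob", "Chandrika", "Dev", "Eva"}) {
            System.out.println(name + " -> " + hash(name));
        }
    }
}
```

Whatever the size of the table, this function can only ever return 0 or 1, so every entry lands in one of two chains.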
Collisions in Hash Tables

● In this scenario, only two of the indices of the hash table get occupied and
that too with both indices getting chained with long linked lists. It results in a
skewed or unbalanced load on the hashtable.

● This way of storing prevents us from achieving constant runtime
complexity, since we would have to traverse the entire list at a single index
in the worst case.
Collisions in Hash Tables

● As we can see, all the names are packed up in long chains, so we will end up
with very bad time performance for the search operation.

● Besides, since only two of the indices have taken up the load of the
entire hash table, a lot of useful space is being wasted.

● However, if we were careful in choosing the hash function, the scenario
would have been quite different. For instance, we wouldn’t be wasting so
much space and could have avoided creating the long chains at only two of
the indices of the hash table.
Collisions in Hash Tables

● Next thing that comes up in hashing is patterns, which you need to avoid in
order to ensure better performance of your hash functions.

● Let’s consider the case of a university, so the names are all saved with the
title ‘Prof.’ ahead of them, e.g. ‘Prof. Avi’, ‘Prof. Bob’, and so on.

● Now, all of the names get mapped to the same index since they all
start with ‘P’. Here, a well-defined hash function, which worked well in
some other cases, would fail as soon as it encounters a pattern in
the input domain.
Collisions in Hash Tables
What we get here is an example of a hash function that happens to work well in
certain cases but doesn’t produce desirable results when the input values have a
pattern in them.
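
The ‘Prof.’ problem can be reproduced with a short sketch. Both method names below are hypothetical. The second method is just one possible alternative, assumed here for illustration: it hashes the whole key with the standard String.hashCode() and reduces it to a table index with Math.floorMod, so it does not depend on the first letter alone.

```java
class PatternHashDemo {
    // First-letter hash from the earlier slides (assumes an English first letter).
    static int firstLetterHash(String name) {
        return Character.toUpperCase(name.charAt(0)) - 'A';
    }

    // Alternative sketch: use every character of the key via String.hashCode().
    // floorMod keeps the index non-negative even for negative hash codes.
    static int wholeKeyHash(String name) {
        return Math.floorMod(name.hashCode(), 26);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Prof. Avi", "Prof. Bob", "Prof. Chandrika"}) {
            System.out.println(name + ": first-letter index = " + firstLetterHash(name)
                    + ", whole-key index = " + wholeKeyHash(name));
        }
    }
}
```

With the first-letter hash, every name lands on the index of ‘P’. The whole-key version can still produce collisions, but the shared prefix no longer forces all keys into one bucket.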
Collisions in Hash Tables

There is no single hash function that is universally applicable. For a
given application, a good hash function should be designed with the
following characteristics in mind:

1. It should map all the keys.

2. It should distribute the keys uniformly across the array indices.

3. It should output different hash values for similar, yet unequal, keys.

4. It should be fast and easy to compute.


Hashtable implementation of Phonebook

● Let’s talk about the actual Java implementation of the same. Hash tables
are used in the implementation of a particular data structure
called a dictionary, which stores key-value pairs.

● We’ve been talking about the phonebook example of hash tables since the
start of today’s session.

● Let’s take a look at the actual Java implementation of the same using the
hash table API provided by Java.
Hashtable implementation of Phonebook

Let’s look at the API for dictionary abstract data-type. It is simple and has only
two functions:

● ADD(): Adds entries to the dictionary. This function may also be
called put, insert, etc. in various implementations.

● LOOKUP(): Takes a key and finds the value corresponding to it. This
function too may have different names like search, get or find in various
implementations.
Hashtable implementation of Phonebook

● You basically require two main functions called put and get, where put does
the work of adding values to your hash table and get retrieves the results
from the hash table using the key.

● First, you must import the hash table to your class, which is done by simply
writing ‘import java.util.Hashtable;’ where all the import statements go in the
code.

● After that, you have to declare the hash table by writing:

Hashtable<String, Integer> name = new Hashtable<String, Integer>();
Hashtable implementation of Phonebook

● In our example, String is the type of the key of the hash table and Integer
is the type of the value that is intended to be stored at that key. (Java
generics need the wrapper class Integer rather than the primitive int.)

● As for the put function, you have to write ‘name.put(key, value)’, which
hashes the key and stores the value against it.

● And as for the get function, you simply write ‘name.get(key)’, which returns
the value stored at the specified key.
Hashtable implementation of Phonebook

● There are other functions too provided by the hash table API, such as
containsKey(key), to check if a particular key is contained in a hashtable or not.

● Moreover, Hashtable.keySet() returns the set of keys that are contained in the
hash table, and you can traverse the set of keys to get a list of all the keys
contained in the hash table.

● Also, Hashtable.remove(key) removes the given key and the value mapped to it
from the hash table.

● Lastly, Hashtable.clear() completely clears the hash table and leaves it empty.
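
Putting these calls together, a phonebook on top of java.util.Hashtable might look like the sketch below. The class name HashtablePhoneBook, the helper build and the four-digit numbers are made up for illustration; the methods put, get, containsKey, keySet, remove, clear and isEmpty are the standard Hashtable methods.

```java
import java.util.Hashtable;

class HashtablePhoneBook {
    // Builds a small sample phonebook; names and numbers are illustrative only.
    static Hashtable<String, Integer> build() {
        Hashtable<String, Integer> phoneBook = new Hashtable<>();
        phoneBook.put("Avi", 1001);
        phoneBook.put("Bob", 1002);
        phoneBook.put("Chandrika", 1003);
        return phoneBook;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> phoneBook = build();

        System.out.println(phoneBook.get("Bob"));            // value stored for the key "Bob"
        System.out.println(phoneBook.containsKey("Farook")); // no such key was added

        // keySet() gives all keys; the iteration order is not guaranteed.
        for (String name : phoneBook.keySet()) {
            System.out.println(name + " -> " + phoneBook.get(name));
        }

        phoneBook.remove("Bob"); // removes the key and its value
        phoneBook.clear();       // empties the table
        System.out.println(phoneBook.isEmpty());
    }
}
```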
Introduction to Hashmap

● In the last class we learnt about the data structure Hashtable, and now it’s
time to learn a similar data structure: HASHMAP.

● Even though there are some differences between a map and a table, these two
data structures are very similar in Java.
Introduction to Hashmap

● As already told, these two data structures are very similar. Here are the similarities:

○ Both are the implementations of Map interface in java.

○ Both of them perform similar functions.

○ Both do not maintain any order of elements.

● But they wouldn’t be named as two different data structures if they were
exactly the same. Like any human couple, they’ve got their differences too:
Introduction to Hashmap
Differences between Hashtable and HashMap are as follows:
Hashtable | HashMap

It is used in the older versions of Java. | It exists only in newer versions of Java, i.e., it has been part of Java since version 1.2.

It doesn’t allow a key to be null (because it calls the hashCode() and equals() methods on key objects). | It allows at most one key to be null (an improvement over Hashtable).

It doesn’t allow storing a null value. | It allows storing any number of null values.

It is a bit slower (because of synchronization). | It is faster.

It is synchronized (thread-safe). | It is non-synchronized (not thread-safe).

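
The null-handling difference can be checked with a short sketch. The class and method names here are hypothetical; the behaviour shown (HashMap accepting a null key and null values, Hashtable.put throwing NullPointerException for a null key) is the documented behaviour of the standard classes.

```java
import java.util.HashMap;
import java.util.Hashtable;

class NullKeyDemo {
    // HashMap accepts one null key and any number of null values.
    static boolean hashMapAcceptsNulls() {
        HashMap<String, Integer> map = new HashMap<>();
        map.put(null, 1);
        map.put("x", null);
        return map.containsKey(null) && map.containsKey("x") && map.get("x") == null;
    }

    // Hashtable rejects a null key with a NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Hashtable<String, Integer> table = new Hashtable<>();
        try {
            table.put(null, 1);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap accepts nulls: " + hashMapAcceptsNulls());
        System.out.println("Hashtable rejects null key: " + hashtableRejectsNullKey());
    }
}
```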


Introduction to Hashmap

● Hashmaps are used when we have to store a very large dataset and
might need to access some particular value later.

● One such example is storing the Aadhar IDs of over 1 billion Indians.
When we have to find a particular person in this massive dataset,
it can be done in O(1) time on average.

● Another example is a scientist who stores millions of readings over the
years for a particular experiment. If they want to access a particular
reading, the lookup will be fastest if the data is stored in a
hashmap.
Introduction to Hashmap

● We declare the HashMap in Java by using the below instruction:

HashMap<keyDataType, valueDataType> hashMapName =
    new HashMap<keyDataType, valueDataType>();

● For example, if we have to store key-value pairs of the form character-integer
like a->1, b->2 and so on, the way to declare it is:

HashMap<Character, Integer> example =
    new HashMap<Character, Integer>();
Introduction to Hashmap

There are many methods in a HashMap; which are listed below:

Methods Operations

put(key,value) This method adds the specified key with the specified value to the HashMap.

remove(key) If the key is present in the HashMap, then it removes the key along with the value mapped to it.

containsKey(key) If there is any mapping to the specified key, then it returns true.
Introduction to Hashmap

If you want to use your own classes (Student, MyKeyClass, etc.) as keys, then these
classes should override the hashCode() and equals() methods of the Object class.

You can use them as keys without overriding these methods, BUT YOU WILL NOT BE ABLE
TO SEARCH YOUR KEYS in the MAP with an equal-looking object (you will need the exact
same reference that was inserted).

This is true for ALL hash-based classes, i.e. those with “Hash” in their names
(HashMap, HashSet, Hashtable, etc.)
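
The effect of overriding (or not overriding) these methods can be seen in the sketch below. Student and PlainStudent are hypothetical key classes written only for this illustration: the first overrides equals() and hashCode(), the second inherits them from Object.

```java
import java.util.HashMap;
import java.util.Objects;

class StudentKeyDemo {
    // Hypothetical key class that overrides equals() and hashCode().
    static final class Student {
        private final String name;
        private final int rollNumber;

        Student(String name, int rollNumber) {
            this.name = name;
            this.rollNumber = rollNumber;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Student)) return false;
            Student that = (Student) other;
            return rollNumber == that.rollNumber && name.equals(that.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, rollNumber);
        }
    }

    // Same fields, but equals() and hashCode() are inherited from Object,
    // so two instances are equal only if they are the same reference.
    static final class PlainStudent {
        private final String name;
        private final int rollNumber;

        PlainStudent(String name, int rollNumber) {
            this.name = name;
            this.rollNumber = rollNumber;
        }
    }

    // Looks up the grade using a fresh but equal key object.
    static String lookupWithOverrides() {
        HashMap<Student, String> grades = new HashMap<>();
        grades.put(new Student("Avi", 1), "A");
        return grades.get(new Student("Avi", 1));
    }

    // The same lookup fails because the fresh key is a different reference.
    static String lookupWithoutOverrides() {
        HashMap<PlainStudent, String> grades = new HashMap<>();
        grades.put(new PlainStudent("Avi", 1), "A");
        return grades.get(new PlainStudent("Avi", 1));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithOverrides());    // the stored grade
        System.out.println(lookupWithoutOverrides()); // null: key not found
    }
}
```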
Introduction to Hashmap

Methods Operations

size() This returns the number of key-value mappings present in the HashMap.

isEmpty() If there is no key-value mapping present in the HashMap, it returns true.

clear() Removes all mappings present in the HashMap.

get(key) Returns the value mapped to the specified key in the HashMap.

keySet() Returns the set of keys present in the HashMap.

● There is no doubt about the fact that these methods are almost same as the
methods of a hashtable.
Introduction to Hashmap

Let’s consider a hashmap H with two entries , A->1 and B->2 and test these methods:

INPUT: OUTPUT:

➔ H.put(D, 4); ➔ (Adds the key value pair D->4)


➔ H.remove(B); ➔ (Removes the key value pair B->2)
➔ H.containsKey(C); ➔ FALSE (Because no key C is present)
➔ H.size() ➔ 2 (Two pairs are present in the hashmap)
➔ H.isEmpty() ➔ FALSE
➔ H.get(A) ➔ 1 (The value for key A)
➔ H.keySet() ➔ {A, D}
➔ H.clear() ➔ (Deletes all the content from the hashmap)
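
The same sequence of calls can be run as real Java. This is a minimal sketch in which the keys are assumed to be of type Character and the class name HashMapMethodsDemo is made up; the calls themselves are the standard HashMap methods listed above.

```java
import java.util.HashMap;

class HashMapMethodsDemo {
    // Starts with the two entries from the slide: A->1 and B->2.
    static HashMap<Character, Integer> initial() {
        HashMap<Character, Integer> h = new HashMap<>();
        h.put('A', 1);
        h.put('B', 2);
        return h;
    }

    public static void main(String[] args) {
        HashMap<Character, Integer> h = initial();

        h.put('D', 4);                           // adds the pair D->4
        h.remove('B');                           // removes the pair B->2
        System.out.println(h.containsKey('C'));  // no key C is present
        System.out.println(h.size());            // pairs left: A->1 and D->4
        System.out.println(h.isEmpty());
        System.out.println(h.get('A'));          // value for key A
        System.out.println(h.keySet());          // keys A and D (order not guaranteed)
        h.clear();                               // deletes all the content
        System.out.println(h.isEmpty());
    }
}
```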
Finding Symmetric Pairs

● Well, it’s time to put our newly learnt knowledge into practice. The problem is
called “Finding Symmetric Pairs”.

● You will be given an array of pairs, and you have to print all the symmetric pairs. Pair
(a, b) and (c, d) are called symmetric pairs, if a is equal to d and b is equal to c.

● For example, if the given array of pairs is {{1, 2}, {2, 3}, {3, 4}, {4, 3}, {2, 1}} then the
symmetric pairs in the given array of pairs are:
○ (1, 2) and (3, 4) because the pair (1, 2) has its symmetric pair (2, 1)
○ and the pair (3, 4) has its symmetric pair (4, 3) in the given array.
Finding Symmetric Pairs

Here’s the basic approach to solve the problem: (APPROACH 1)

● for(int i = 0; i < arr.length; i++) // Traverse through the given array

○ int firstC = arr[i][0]; // Get the first and second elements
○ int secondC = arr[i][1]; // of the current pair

○ for(int j = i+1; j < arr.length; j++) // Check whether the following pairs are
// symmetric to the current pair or not
■ int firstO = arr[j][0];
■ int secondO = arr[j][1];
■ if(firstC == secondO && secondC == firstO) // Symmetric pair found
print the pair (firstC, secondC)
Finding Symmetric Pairs

● In this approach, to extract key-value pair of say ith element, we need to get the
arr[i][0] and arr[i][1] values.

● Now, we start with the first pair, and for all the following pairs, we check if there is
any pair such that its key matches with the value of this pair and its value matches
the key of this pair.

● If a match is found, the pair is printed.

● Otherwise, we repeat all the above steps for the next pair and search for its
symmetric pair in all the pairs AFTER the current pair.
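
Approach 1 can be written out as a runnable sketch like the one below. The class name SymmetricPairsBruteForce and the choice to collect the matches in a list (instead of printing inside the loops) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SymmetricPairsBruteForce {
    // Returns every pair that has a symmetric partner later in the array.
    static List<int[]> find(int[][] arr) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < arr.length; i++) {
            int firstC = arr[i][0];
            int secondC = arr[i][1];
            // Only the pairs AFTER the current one need to be checked.
            for (int j = i + 1; j < arr.length; j++) {
                int firstO = arr[j][0];
                int secondO = arr[j][1];
                if (firstC == secondO && secondC == firstO) {
                    result.add(arr[i]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] arr = {{1, 2}, {2, 3}, {3, 4}, {4, 3}, {2, 1}};
        for (int[] pair : find(arr)) {
            System.out.println("(" + pair[0] + ", " + pair[1] + ")");
        }
    }
}
```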
Finding Symmetric Pairs

Here’s the hashmap approach to solve the problem: (APPROACH 2)

● Create a hashmap.

● Traverse through the array and check for every current pair to identify whether the
second element of the current pair is present in the hashmap or not.

○ If it is present, then check whether the first element of the current pair and
value of the key in hashmap are the same or not.

○ If they are same, then print the key and value in the hashmap.

○ Otherwise, add that current pair to the hashmap considering the first element of
the pair as the key and the second element of the pair as the value of the key.
Finding Symmetric Pairs

● In approach 1, for each pair, you need to traverse through the other pairs of the array
to check whether the pairs are symmetric or not. This approach takes O(n²) time.

● In approach 2, For each pair, checking for the symmetric pair in the HashMap takes
O(1) time. Since all the pairs are checked, the time complexity of this approach is
O(n).

● For space complexity, in the worst case, if there are no symmetric pairs in the array,
then the size of the hashmap is O(n). Therefore, the space complexity is O(n).

Which approach according to you is the better one???

Let’s check the Java code for the hashmap approach now:
Finding Symmetric Pairs

HashMap<Integer, Integer> hashMap = new HashMap<Integer, Integer>();
boolean flag = false; // Records whether at least one symmetric pair was found

for (int i = 0; i < arr.length; i++) { // Traversing the input array
    int first1 = arr[i][0];
    int second1 = arr[i][1];
    // Searching the current value in the keys of hashmap
    Integer data = hashMap.get(second1);
    if (data != null && data == first1) {
        System.out.println(second1 + " " + first1); // Printing the symmetric pair
        flag = true;
    } else {
        // Otherwise, store the current pair: first element as key, second as value
        hashMap.put(first1, second1);
    }
}
First Unique Character

● Let’s quickly check another problem which uses our newly learnt data structure.

● You will be given a string, and you have to find and print the first unique character,
i.e., the first non-repeating character of the string.

● The string may contain duplicate characters.

SPOILER ALERT!!!
Unlike most of the problems, the time complexity analysis of our algorithm(s) will produce
a very unusual conclusion.
First Unique Character

● Let’s see a couple of examples to understand the problem statement better:

● Consider the string ‘abcdebadf’. If you observe the given string, the first character ‘a’
and the second character ‘b’ are repeated in abcdebadf. BUT, the third character ‘c’ is
not repeated in abcdebadf. Therefore, ‘c’ is the first unique character.

● Let’s see another example. Consider a humongous string which comprises all the
numbers from 1-1000, but in word format:
onetwothree…………………………fivehundred……………………..thousand

In this case, the first unique character is ‘a’ from thousand. How many of you knew
this fun fact already? See, computer programming can be fun too!!!
First Unique Character

Let’s see the approach to solve this problem now:


● First, create a hashmap to get the count of each character of the string, where the
key of the hashmap represents the character of the string, and the value in the
hashmap represents the number of times the character is repeated in the string.
● After creating the hashmap, insert all the characters and their respective counts into
it by scanning through the string once.
● Now, scan through the string again and while scanning through each character, check
its count in the hashmap. If the count is 1, then stop scanning further and print that
character. Otherwise, scan through the string until you either reach such character or
the end of the string.

Now, it’s your turn to write the code for the same!
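
Once you have tried it yourself, you can compare with one possible solution sketch below. The class name FirstUniqueCharacter and the decision to return null when no unique character exists are assumptions made for this example; the two-scan logic follows the approach described above.

```java
import java.util.HashMap;
import java.util.Map;

class FirstUniqueCharacter {
    // Returns the first non-repeating character, or null if there is none.
    static Character firstUnique(String s) {
        // First scan: count how many times each character occurs.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            counts.merge(s.charAt(i), 1, Integer::sum);
        }
        // Second scan: the first character whose count is 1 is the answer.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (counts.get(c) == 1) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstUnique("abcdebadf"));
        System.out.println(firstUnique("aabb"));
    }
}
```

Both scans touch each character once and each hashmap operation takes O(1) time on average, so the whole method runs in O(n) time for a string of length n.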
#LifeKoKaroLift

Thank You!
Happy learning!
