Theory PDF
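Suppose we want to design a system for storing employee records keyed by
phone number, and we want the following queries to be performed efficiently:
inserting a phone number with its record, searching for a phone number, and
deleting a phone number.
We can think of using the following data structures to maintain information about
different phone numbers: an array, a linked list, a balanced binary search tree, or a
direct access table.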
For arrays and linked lists, we need to search in a linear fashion, which can be
costly in practice. If we use an array and keep the data sorted, a phone number
can be found in O(log n) time using binary search, but insert and delete
operations become costly because the sorted order has to be maintained.
With a balanced binary search tree, we get moderate search, insert and delete
times. All of these operations can be guaranteed to run in O(log n) time.
Another solution is a direct access table: we make a big array and use phone
numbers as indices into it. An entry in the array is NIL if the phone number is not
present; otherwise the entry stores a pointer to the record corresponding to that
phone number. In terms of time complexity this solution is the best of all, since
every operation takes O(1) time. For example, to insert a phone number, we create
a record with the details of the given phone number, use the phone number as the
index, and store the pointer to the created record in the table.
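The idea can be sketched as follows. This is only an illustration under stated assumptions: keys are restricted to hypothetical 4-digit numbers so that the array stays small, and the names Record and DirectAccessTable are made up for this example.

```cpp
#include <array>
#include <memory>
#include <string>

// Hypothetical record type; the text does not define one.
struct Record {
    std::string name;
};

// Direct access table for keys in [0, 10000), i.e. 4-digit numbers.
// The caller is assumed to pass only keys inside that range.
class DirectAccessTable {
    static constexpr int kSize = 10000;
    // One slot per possible key; nullptr plays the role of NIL.
    std::array<std::unique_ptr<Record>, kSize> slots_;

public:
    // Create a record and store its pointer at index `key`: O(1).
    void insert(int key, const std::string& name) {
        slots_[key] = std::make_unique<Record>(Record{name});
    }

    // Return the record for `key`, or nullptr if the key is absent: O(1).
    const Record* search(int key) const { return slots_[key].get(); }

    // Reset the slot back to NIL: O(1).
    void remove(int key) { slots_[key].reset(); }
};
```

Every operation is a single array access, which is where the O(1) bound comes from. The cost is one slot per possible key, whether or not that key is ever used.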
This solution has many practical limitations. The first problem is that the extra
space required is huge. For example, if a phone number has n digits, we need
O(m * 10^n) space for the table, where m is the size of a pointer to a record.
Another problem is that an integer type in a programming language may not be able
to store n digits.
Due to the above limitations, a direct access table cannot always be used. Hashing
is a solution that can be used in almost all such situations and, in practice, performs
extremely well compared to the data structures above (array, linked list, balanced
BST). With hashing we get O(1) search time on average (under reasonable
assumptions) and O(n) in the worst case.
Hashing is an improvement over the direct access table. The idea is to use a hash
function that converts a given phone number, or any other key, to a smaller
number, and to use that small number as an index in a table called a hash table.
Hash Function: A function that converts a given big phone number to a small
practical integer value. The mapped integer value is used as an index in the hash
table. In simple terms, a hash function maps a big number or string to a small
integer that can be used as an index in the hash table.
A good hash function should have the following properties:
1. Efficiently computable.
2. Should uniformly distribute the keys (each table position equally likely for
each key).
For example, for phone numbers a bad hash function is to take the first three digits,
since many numbers share the same leading digits. A better function is to consider
the last three digits. Please note that this may not be the best hash function. There
may be better ways.
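The two candidate functions can be sketched as below. This is a rough illustration only: it assumes 10-digit phone numbers stored in a 64-bit integer and a table of 1000 slots, and the function names are made up for this example.

```cpp
#include <cstdint>

constexpr int kTableSize = 1000;  // assumed table size: one slot per 3-digit value

// "Bad" choice: the first three digits (assumes exactly 10 digits).
// Numbers that share a prefix all land in the same slot.
int hashFirstThree(std::uint64_t phone) {
    return static_cast<int>(phone / 10000000ULL);
}

// "Better" choice: the last three digits, which tend to vary more between numbers.
int hashLastThree(std::uint64_t phone) {
    return static_cast<int>(phone % kTableSize);
}
```

For instance, 9876543210 and 9876500000 both map to 987 under hashFirstThree, while hashLastThree sends them to 210 and 0.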
Hash Table: An array that stores pointers to the records corresponding to given
phone numbers. An entry in the hash table is NIL if no existing phone number has a
hash function value equal to the index of that entry.
Collision Handling: Since a hash function gives us a small number for a big key,
it is possible that two keys result in the same value. The situation where a newly
inserted key maps to an already occupied slot in the hash table is called a collision,
and it must be handled using some collision handling technique. Following are the
ways to handle collisions:
• Chaining: The idea is to make each cell of the hash table point to a linked list of
records that have the same hash function value. Chaining is simple, but requires
additional memory outside the table.
• Open Addressing: In open addressing, all elements are stored in the hash
table itself. Each table entry contains either a record or NIL. When searching
for an element, we examine table slots one by one until the desired element is
found or it is clear that the element is not in the table.
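Chaining can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the class name ChainedHashTable, the string record type and the "key mod bucket count" hash are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedHashTable {
    // Each cell of the table holds a linked list of (key, record) pairs.
    std::vector<std::list<std::pair<std::uint64_t, std::string>>> buckets_;

    // Assumed hash function: key mod number of buckets.
    std::size_t index(std::uint64_t key) const { return key % buckets_.size(); }

public:
    explicit ChainedHashTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Insert a record, or overwrite it if the key is already in the chain.
    void insert(std::uint64_t key, const std::string& record) {
        for (auto& entry : buckets_[index(key)]) {
            if (entry.first == key) { entry.second = record; return; }
        }
        buckets_[index(key)].emplace_back(key, record);
    }

    // Walk the chain for the key's bucket; nullptr means "not present".
    const std::string* search(std::uint64_t key) const {
        for (const auto& entry : buckets_[index(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Unlink the key from its chain; returns false if it was not there.
    bool remove(std::uint64_t key) {
        auto& chain = buckets_[index(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) { chain.erase(it); return true; }
        }
        return false;
    }
};
```

Colliding keys simply share a chain, so the table never fills up, but each list node is allocated outside the table itself.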
Open Addressing
Like separate chaining, open addressing is a method for handling collisions. In
open addressing, all elements are stored in the hash table itself, so at any point the
size of the table must be greater than or equal to the total number of keys (note
that we can increase the table size by copying the old data if needed).
Important Operations:
• Insert(k): Keep probing until an empty slot is found. Once an empty slot is
found, insert k.
• Search(k): Keep probing until the slot's key becomes equal to k or an
empty slot is reached.
• Delete(k): The delete operation is interesting. If we simply empty a key's slot,
a later search for another key may stop there and fail. So slots of deleted keys are
marked specially as "deleted". Insert can place an item in a deleted slot, but
search doesn't stop at a deleted slot.
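The three operations, including the special "deleted" marker, can be sketched as below. This is a minimal illustration that assumes non-negative integer keys, a "key mod table size" hash and a probe step of 1; the class and member names are made up for this example.

```cpp
#include <cstddef>
#include <vector>

class OpenAddressingSet {
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        int key = 0;
        State state = State::Empty;
    };
    std::vector<Slot> slots_;

    // i-th slot to examine for key k (assumes k >= 0).
    std::size_t probe(int k, std::size_t i) const {
        return (static_cast<std::size_t>(k) + i) % slots_.size();
    }

public:
    explicit OpenAddressingSet(std::size_t size) : slots_(size) {}

    // Search(k): stop at k or at an empty slot, but walk past "deleted" slots.
    bool search(int k) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const Slot& s = slots_[probe(k, i)];
            if (s.state == State::Empty) return false;
            if (s.state == State::Occupied && s.key == k) return true;
        }
        return false;
    }

    // Insert(k): take the first slot that is empty or marked "deleted".
    bool insert(int k) {
        if (search(k)) return true;  // already present
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(k, i)];
            if (s.state != State::Occupied) {
                s.key = k;
                s.state = State::Occupied;
                return true;
            }
        }
        return false;  // table is full
    }

    // Delete(k): mark the slot as "deleted" instead of emptying it.
    bool remove(int k) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(k, i)];
            if (s.state == State::Empty) return false;
            if (s.state == State::Occupied && s.key == k) {
                s.state = State::Deleted;
                return true;
            }
        }
        return false;
    }
};
```

If remove reset the slot to Empty instead, a search for a key stored further along the same probe sequence would stop early and wrongly report it as missing.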
1. Linear Probing: In linear probing, we linearly probe for the next slot. The
typical gap between two probes is 1, as in the example below.
Let hash(x) be the slot index computed using the hash function and S be the table
size. If slot hash(x) % S is full, we try (hash(x) + 1) % S; if that is also full, we try
(hash(x) + 2) % S, and so on.
Let us consider a simple hash function, “key mod 7”, and the sequence of keys
50, 700, 76, 85, 92, 73, 101.
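The example can be traced with a short sketch. It assumes non-negative keys, no more keys than slots, and uses -1 as a stand-in for an empty slot; the function name is made up for this illustration.

```cpp
#include <vector>

// Returns the table after inserting the keys one by one with linear probing.
// -1 marks an empty slot (assumes non-negative keys and
// keys.size() <= tableSize, otherwise the probing loop would not terminate).
std::vector<int> buildLinearProbingTable(const std::vector<int>& keys, int tableSize) {
    std::vector<int> table(tableSize, -1);
    for (int key : keys) {
        int slot = key % tableSize;         // hash(x) = key mod S
        while (table[slot] != -1) {         // collision: move to the next slot
            slot = (slot + 1) % tableSize;  // wrap around at the end of the table
        }
        table[slot] = key;
    }
    return table;
}
```

For the keys above and a table of size 7, slots 0 to 6 end up holding 700, 50, 85, 92, 73, 101 and 76. The keys 85 and 92 both hash to slot 1 and are pushed to slots 2 and 3, which in turn pushes 73 and 101 (hash 3) to slots 4 and 5.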
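2. Quadratic Probing: We look for the (hash(x) + i*i) % S slot in the i-th
iteration.
3. Double Hashing: We use another hash function hash2(x) and look for the
(hash(x) + i*hash2(x)) % S slot in the i-th iteration.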
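A probe sequence under double hashing might look like the sketch below. The second hash function used here, 1 + (x mod (S - 1)), is only one common choice and is an assumption of this example, as are the function names. It is picked because it never returns 0.

```cpp
#include <vector>

// Assumed second hash function: 1 + (x mod (S - 1)).
// It is never zero, so every probe actually moves (requires S >= 2, x >= 0).
int hash2(int x, int S) { return 1 + x % (S - 1); }

// Returns the first `count` slots examined for key x in a table of size S,
// using slot_i = (hash(x) + i * hash2(x)) % S with hash(x) = x mod S.
std::vector<int> probeSequence(int x, int S, int count) {
    std::vector<int> sequence;
    for (int i = 0; i < count; ++i) {
        sequence.push_back((x % S + i * hash2(x, S)) % S);
    }
    return sequence;
}
```

With S = 7, the keys 85 and 92 both start at slot 1, but 85 then probes 3, 5, 0, 2 while 92 probes 4, 0. Keys that collide on the first slot usually follow different paths afterwards.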
• Linear probing has the best cache performance but suffers from clustering.
Another advantage of linear probing is that it is easy to compute.
• Quadratic probing lies between the two in terms of cache performance and
clustering.
• Double hashing has poor cache performance but no clustering. Double
hashing requires more computation time as two hash functions need to be
computed.
• set
• unordered_set
• map
• unordered_map
set
Sets are a type of associative container in which each element has to be unique,
because the value of the element identifies it. The value of an element cannot be
modified once it is added to the set, though it is possible to remove the element and
add its modified value.
Sets are used in situations where we need to check whether an element is present in
a collection. This can also be done with the help of arrays, but it would take up a lot
of space. Sets can also be used to solve many problems related to sorting, as the
elements in a set are arranged in sorted order.
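Since elements cannot be changed in place, "modifying" a value amounts to an erase followed by an insert. A small sketch, with a helper name made up for this illustration:

```cpp
#include <set>

// Replaces oldValue with newValue in the set.
// Returns false (and leaves the set unchanged) if oldValue is not present.
bool replaceValue(std::set<int>& s, int oldValue, int newValue) {
    auto it = s.find(oldValue);  // O(log n) lookup
    if (it == s.end()) return false;
    s.erase(it);                 // remove the old element
    s.insert(newValue);          // the set places the new value in sorted position
    return true;
}
```

After replacing 20 with 25 in the set {10, 20, 30}, iteration yields 10, 25, 30: the set keeps its elements in sorted order.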
Some basic functions associated with Set:
Implementation:
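#include <iostream>
#include <set>
#include <iterator>
using namespace std;

int main()
{
    // empty set container
    set<int> s;
    // List of elements
    int arr[] = {40, 20, 60, 30, 50, 50, 10};
    // Insert the elements; the duplicate 50 is stored only once
    for (int x : arr)
        s.insert(x);
    // Check whether 50 is present in the set
    if (s.find(50) != s.end())
        cout << "50 is present" << endl;
    return 0;
}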
Output:
50 is present
unordered_set
The unordered_set container is implemented using a hash table, where keys are
hashed into indices of the table, so it is not possible to maintain any order. All
operations on an unordered_set take constant time O(1) on average, which can go
up to linear time in the worst case depending on the internally used hash function,
but in practice they perform very well and generally provide constant-time search.
The unordered_set can contain keys of any type, predefined or user-defined, but
when we define a key of a user-defined type, we need to specify a hash function for
it and an equality comparison according to which keys will be compared.
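For a user-defined key type, std::unordered_set needs both a hash function and an equality comparison. A minimal sketch follows, in which the Point type, the PointHash functor and the way the two member hashes are combined are all illustrative assumptions:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical user-defined key type.
struct Point {
    int x;
    int y;
    // Equality comparison used to tell apart keys that share a bucket.
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Hash functor passed as the second template argument.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Simple combination of the two member hashes; good enough for a sketch.
        return std::hash<int>()(p.x) ^ (std::hash<int>()(p.y) << 1);
    }
};

using PointSet = std::unordered_set<Point, PointHash>;
```

A PointSet then behaves like any other unordered_set: inserting Point{1, 2} twice stores it once, and count(Point{1, 2}) returns 1.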
set vs unordered_set
Note: Like set containers, unordered_set also allows only unique keys.
Implementation:
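#include <iostream>
#include <unordered_set>
#include <iterator>
using namespace std;
int main()
{
    // empty set container
    unordered_set<int> s;
    // List of elements
    int arr[] = {40, 20, 60, 30, 50, 50, 10};
    // Insert the elements; the duplicate 50 is stored only once
    for (int x : arr)
        s.insert(x);
    // Print the elements (iteration order is unspecified)
    cout << "The elements in the unordered_set are:\n";
    for (int x : s)
        cout << x << " ";
    cout << "\n";
    if (s.find(50) != s.end())
        cout << "50 is present" << endl;
    return 0;
}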
Output:
The elements in the unordered_set are:
10 50 30 60 40 20
50 is present
Map container
Like a set, the map container is also associative and stores elements in an ordered
way, but a map stores elements as key-value pairs. Each element has a key value
and a mapped value. No two elements can have the same key value.
Implementation:
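#include <iostream>
#include <iterator>
#include <map>
using namespace std;

int main()
{
    // empty map container
    map<int, int> mp;

    // insert elements as (key, mapped value) pairs
    mp.insert({1, 40});
    mp.insert({2, 30});
    mp.insert({3, 60});
    mp.insert({4, 20});
    mp.insert({5, 50});
    mp.insert({6, 50});
    mp.insert({7, 10});

    // printing map mp
    map<int, int>::iterator itr;
    cout << "The map mp is : \n";
    cout << "KEY\tELEMENT\n";
    for (itr = mp.begin(); itr != mp.end(); ++itr) {
        cout << itr->first
             << '\t' << itr->second << '\n';
    }
    return 0;
}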
Output:
The map mp is :
KEY ELEMENT
1 40
2 30
3 60
4 20
5 50
6 50
7 10
unordered_map Container
The unordered_map is an associative container that stores elements formed by the
combination of a key value and a mapped value. The key value is used to uniquely
identify the element, and the mapped value is the content associated with the key.
Both key and value can be of any type, predefined or user-defined.
Internally, unordered_map is implemented using a hash table: the keys provided to
the map are hashed into indices of the hash table. That is why the performance of
the data structure depends a lot on the hash function, but on average the cost of
search, insert and delete from the hash table is O(1).
Implementation:
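#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

int main()
{
    // Declaring umap to be of <string, int> type
    // key will be of string type and mapped value will
    // be of int type
    unordered_map<string, int> umap;

    // inserting values
    umap.insert({"GeeksforGeeks", 10});
    umap.insert({"Practice", 20});
    umap.insert({"Contribute", 30});

    // traversing the unordered_map (iteration order is unspecified)
    for (const auto& x : umap)
        cout << x.first << " " << x.second << endl;

    return 0;
}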
unordered_map vs unordered_set
In unordered_set, we have only keys and no values; it is mainly used to check
presence/absence in a set. For example, consider the problem of counting
frequencies of individual words. We can’t use unordered_set (or set) as we can’t
store counts, so we need an unordered_map (or map) from each word to its count.
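The word-frequency problem can be sketched with an unordered_map from word to count. The function name countWords and the choice to split words on whitespace are assumptions of this example.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Counts how often each whitespace-separated word occurs in `text`.
std::unordered_map<std::string, int> countWords(const std::string& text) {
    std::unordered_map<std::string, int> frequency;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        // operator[] inserts the word with count 0 on first use, then we increment.
        ++frequency[word];
    }
    return frequency;
}
```

For the text "to be or not to be", the map holds four keys, with "to" and "be" mapped to 2 and "or" and "not" mapped to 1. An unordered_set could only record that each word occurred.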
unordered_map vs map
map (like set) is an ordered sequence of unique keys, whereas in unordered_map
keys can be stored in any order, hence "unordered".
A map is implemented as a balanced tree structure, which is why it is possible to
maintain order between the elements (by a specific tree traversal). The time
complexity of map operations is O(log n), while for unordered_map it is O(1) on
average.
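The ordering guarantee of map can be seen by simply iterating over it. A small sketch, with a helper name made up for this illustration:

```cpp
#include <map>
#include <string>
#include <vector>

// Collects the keys in the order std::map iterates them.
// Because map keeps its keys sorted, the result is in ascending order
// regardless of the order in which the elements were inserted.
std::vector<std::string> sortedKeys(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& entry : m) {
        keys.push_back(entry.first);
    }
    return keys;
}
```

A map built from "pear", "apple", "mango" (in that insertion order) yields apple, mango, pear. The same loop over an unordered_map would visit the keys in an unspecified order, so map is the better fit when sorted traversal matters and unordered_map when only fast lookup does.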
class GFG {
public static void main(String args[])
{
class GFG {
// Print HashMap
System.out.println(hmap);
}
import java.util.HashSet;
class Test {
    public static void main(String[] args)
    {
        HashSet<String> h = new HashSet<String>();
    }
}
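import java.util.*;
public class Demo {
    public static void main(String[] args) {
        LinkedHashSet<String> linkedset =   // elements inferred from the output below
            new LinkedHashSet<String>(Arrays.asList("A", "B", "C", "D", "E"));
        System.out.println("Size of LinkedHashSet = " + linkedset.size());
        System.out.println("Original LinkedHashSet:" + linkedset);
        System.out.println("Removing D from LinkedHashSet: " + linkedset.remove("D"));
        System.out.println("Trying to Remove Z which is not present: " + linkedset.remove("Z"));
        System.out.println("Checking if A is present=" + linkedset.contains("A"));
        System.out.println("Updated LinkedHashSet: " + linkedset);
    }
}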
Output:
Size of LinkedHashSet = 5
Original LinkedHashSet:[A, B, C, D, E]
Removing D from LinkedHashSet: true
Trying to Remove Z which is not present: false
Checking if A is present=true
Updated LinkedHashSet: [A, B, C, E]
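import java.util.*;
class TreeSetDemo {
    public static void main(String[] args)
    {
        TreeSet<String> ts1 = new TreeSet<String>();
        // Elements inferred from the output below
        ts1.addAll(Arrays.asList("A", "B", "C"));
        System.out.println("TreeSet: " + ts1);
    }
}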
Output:
TreeSet: [A, B, C]