How to use unordered_map efficiently in C++
Last Updated :
22 May, 2021
Pre-requisite: unordered_set, unordered_map
C++ provides std::unordered_set and std::unordered_map to be used as a hash set and hash map respectively. They perform insertion/deletion/access in constant average time.
- However, a single operation takes O(n) time in the worst case, so inserting n keys can take O(n^2) time in total.
- The reason is that common implementations (GCC's libstdc++, for example) hash an integer key to itself and choose its bucket by taking the key modulo the current bucket count, which is a prime number.
- When the input is large and many keys are multiples of that prime, they all land in the same bucket. Each insertion then has to scan an ever-growing chain, which leads to the O(n^2) total.
- Depending on the compiler version, the bucket count that triggers this may be 107897 or 126271.
Example 1: Insert multiples of each of the two primes above and measure the execution time. One of the primes takes much longer than the other.
C++
// C++ program to demonstrate the worst-case
// time complexity of an unordered_map
#include <chrono>
#include <iostream>
#include <unordered_map>
using namespace std;
using namespace std::chrono;

const int N = 55000;
const int prime1 = 107897;
const int prime2 = 126271;

void insert(int prime)
{
    // Starting the clock
    auto start = high_resolution_clock::now();

    // Keys are long long because i * prime
    // does not fit in a 32-bit int
    unordered_map<long long, int> umap;

    // Inserting multiples of the prime
    // number as keys in the map
    for (int i = 1; i <= N; i++)
        umap[1LL * i * prime] = i;

    // Stopping the clock
    auto stop = high_resolution_clock::now();

    // Converting the elapsed time
    // to milliseconds
    auto duration = duration_cast<milliseconds>(stop - start);

    // Time in seconds
    cout << "for " << prime << " : "
         << duration.count() / 1000.0
         << " seconds " << endl;
}

// Driver code
int main()
{
    // Function call for prime 1
    insert(prime1);

    // Function call for prime 2
    insert(prime2);
}
Output:
for 107897 : 2.261 seconds
for 126271 : 0.024 seconds
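Clearly, for one of the two primes the total running time grows as O(n^2). Which prime is slow depends on the compiler version, and the exact timings vary from machine to machine.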
The built-in hash function that unordered_map uses for integer keys typically behaves like this:
C++
struct hash {
    size_t operator()(uint64_t x) const
    {
        return x;
    }
};
Because this function maps every integer to itself, the bucket index is simply the key modulo the bucket count, so structured keys can collide heavily. Once the table has grown to the critical bucket count, every multiple of that prime lands in the same bucket, and each further insertion has to walk the whole chain, which makes it slow. So, the idea is to randomize the hash function.
The goal is a method that spreads the keys evenly across the hash table, which makes collisions far less likely. For this, we use a constant derived from the golden ratio, which is closely related to the Fibonacci sequence. Successive multiples of the golden ratio (Phi = 1.618...), taken modulo 1, subdivide a range evenly without clustering or quickly looping back to the starting position.
We can create our own simple hash function. Below is the hash function:
C++
struct modified_hash {
    static uint64_t splitmix64(uint64_t x)
    {
        // 0x9e3779b97f4a7c15 is 2^64 divided by
        // Phi (1.6180...). The other two values,
        // 0xbf58476d1ce4e5b9 and 0x94d049bb133111eb,
        // are the multipliers of the splitmix64
        // mixing step. Together they scramble the
        // bits of x so that keys are spread
        // evenly over the hash table
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    // Returns size_t, the type unordered_map
    // expects, so the hash is not truncated
    size_t operator()(uint64_t x) const
    {
        // A per-run seed taken from a
        // high-precision clock, so the
        // hash differs between runs
        static const uint64_t random
            = steady_clock::now()
                  .time_since_epoch()
                  .count();

        // Returns the final hash value
        return splitmix64(x + random);
    }
};
Basically, the hashing function above mixes each key with a seed chosen at run time, so the hash values look random and cannot be predicted in advance. To know more about this, please refer to the article on Fibonacci hashing.
Example 2: Using the above hashing function, the program runs very quickly.
C++
// C++ program to measure the insertion
// time of an unordered_map that uses
// the modified hash function
#include <chrono>
#include <cstdint>
#include <iostream>
#include <unordered_map>
using namespace std;
using namespace std::chrono;

struct modified_hash {
    static uint64_t splitmix64(uint64_t x)
    {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    // Returns size_t so that the
    // hash value is not truncated
    size_t operator()(uint64_t x) const
    {
        // Per-run seed taken from the clock
        static const uint64_t random
            = steady_clock::now()
                  .time_since_epoch()
                  .count();
        return splitmix64(x + random);
    }
};

const int N = 55000;
const int prime1 = 107897;
const int prime2 = 126271;

// Function to insert in the hash map
void insert(int prime)
{
    auto start = high_resolution_clock::now();

    // The third template argument makes
    // the map use the custom hash function.
    // Keys are long long because i * prime
    // does not fit in a 32-bit int
    unordered_map<long long, int, modified_hash> umap;

    // Inserting multiples of the prime
    // number as keys in the map
    for (int i = 1; i <= N; i++)
        umap[1LL * i * prime] = i;

    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stop - start);

    cout << "for " << prime << " : "
         << duration.count() / 1000.0
         << " seconds " << endl;
}

// Driver code
int main()
{
    // Function call for prime 1
    insert(prime1);

    // Function call for prime 2
    insert(prime2);
}
Output:
for 107897 : 0.025 seconds
for 126271 : 0.024 seconds
Reserving space beforehand
The initial number of buckets of an unordered_map is implementation-defined and usually small. Every time the load factor (the average number of elements per bucket) exceeds max_load_factor, the container allocates a larger bucket array (roughly twice as large in common implementations) and rehashes all the elements into it.
So, we can reserve the capacity beforehand according to our input size by using the .reserve() method.
umap.reserve(1024);
1024 can be replaced by the number of elements you expect to insert. As long as the map holds no more elements than that, no rehashing takes place, which saves the repeated reallocation and makes the program more efficient.
Setting max_load_factor
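max_load_factor is the maximum average number of elements per bucket that the unordered_map tolerates before it grows the table. The default value is 1.
Setting it to a lower value like 0.25 gives the map more buckets per element, which can greatly decrease the probability of collisions at the cost of extra memory.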
umap.max_load_factor(0.25);
Example 3: Using the two methods above can make umap faster:
C++
#include <chrono>
#include <iostream>
#include <unordered_map>
using namespace std;
using namespace std::chrono;

const int N = 55000;
const int prime1 = 107897;
const int prime2 = 126271;

void insert(int prime)
{
    // Starting the clock
    auto start = high_resolution_clock::now();

    // Keys are long long because i * prime
    // does not fit in a 32-bit int
    unordered_map<long long, int> umap;

    // Reserving space beforehand
    umap.reserve(1024);

    // Decreasing max_load_factor
    umap.max_load_factor(0.25);

    // Inserting multiples of the prime
    // number as keys in the map
    for (int i = 1; i <= N; i++)
        umap[1LL * i * prime] = i;

    // Stopping the clock
    auto stop = high_resolution_clock::now();

    // Converting the elapsed time
    // to milliseconds
    auto duration = duration_cast<milliseconds>(stop - start);

    // Time in seconds
    cout << "for " << prime << " : "
         << duration.count() / 1000.0
         << " seconds " << endl;
}

// Driver code
int main()
{
    // Function call for prime 1
    insert(prime1);

    // Function call for prime 2
    insert(prime2);
}
Output :
for 107897 : 0.029 seconds
for 126271 : 0.026 seconds