05 Chapter Performance MongoDB New
The fact that RAM is roughly 25 times faster than common SSDs makes the transition from disk-oriented to RAM-oriented designs a strong, appealing factor for databases to be built around memory usage.
As a result, MongoDB has storage engines that are either very dependent on RAM or that even run completely in memory for their data management operations.
Connections also consume memory: roughly one megabyte per established connection. It is safe to say that the more RAM you have available, the more performant your MongoDB deployment will tend to be.
By default, MongoDB will try to use all available CPU cores to respond to incoming requests. The non-locking concurrency control mechanism of the WiredTiger storage engine relies heavily on the CPU to process these requests.
This means that for non-blocking operations, like writing to different documents concurrently or responding to incoming query requests such as reads, MongoDB will perform better the more CPU resources are available.
There are also certain operations, like page compression, data calculation operations, aggregation framework operations, and map-reduce, among others, that require available CPU cycles.
11/12/2023 MongoDB Data Model
7
1. Hardware
The faster your network and the larger its bandwidth, the better the performance you will experience. But this is not the end of the story regarding network utilization with MongoDB.
MongoDB is a distributed database, built for high availability but also for horizontal scaling, where the sharded cluster, with all its different components, allows you to distribute your data horizontally.
The way the hosts that hold the different nodes of your cluster are connected can affect the overall performance of your system.
The type of connection between data centers, and especially its latency (we haven't cracked going faster than the speed of light yet), will play a great part in the performance experienced by your application.
This, combined with the write concern, read concern, and read preference that your application can set while issuing commands or requests to the server, needs to be taken into consideration when analyzing the performance of your application.
• Replica Set
• Sharded Cluster
• Consider latency;
• Read implications;
• Write implications.
Apart from the main purpose of providing high availability (in case of failure of a node, the remaining nodes still provide the service), replica sets can also serve a few other functions, like offloading eventually consistent reads to secondaries, reserving the primary for the operational workload, or directing specific workloads to secondary nodes with targeted index configurations.
The other side of distributed systems in MongoDB, with the purpose of horizontal scalability, is the sharded cluster.
[Diagram: sharded cluster components: mongos, config servers, shard nodes]
02. Performance on Clusters
Considerations in Distributed Systems
In our sharded cluster, we have the mongos, responsible for routing our client application requests to the designated nodes.
We also have config servers. These nodes are responsible for holding the mapping of our sharded cluster (where data sits at each point in time), as well as the general configuration of the sharded cluster itself.
And finally, we have our shard nodes. Shard nodes are responsible for holding the application data. Databases, collections, and indexes reside in these members, and it is here that all the major workload will be performed.
˗ You need to understand how your data grows and how it is accessed;
˗ When querying a sharded cluster:
• If we are not using the shard key, we will be performing scatter-gather queries;
• If we are using the shard key, we will be performing routed queries.
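The routing difference above can be sketched with a toy in-memory model (hypothetical shard ranges and data, not driver code): when the filter contains the shard key, mongos can target a single shard; otherwise it must scatter the query to every shard and gather the results.

```javascript
// Hypothetical 3-shard cluster, range-sharded on user_id.
const shards = [
  { name: "shard0", range: [0, 100],   docs: [{ user_id: 7,   name: "a" }] },
  { name: "shard1", range: [100, 200], docs: [{ user_id: 150, name: "b" }] },
  { name: "shard2", range: [200, 300], docs: [{ user_id: 250, name: "c" }] },
];

function query(filter) {
  // Routed query: the shard key is in the filter, so consult the range map;
  // otherwise scatter-gather: every shard must be asked.
  const targets = ("user_id" in filter)
    ? shards.filter(s => filter.user_id >= s.range[0] && filter.user_id < s.range[1])
    : shards;
  const results = targets.flatMap(s =>
    s.docs.filter(d => Object.entries(filter).every(([k, v]) => d[k] === v)));
  return { shardsContacted: targets.length, results };
}

const routed = query({ user_id: 150 });  // shard key present: 1 shard contacted
const scattered = query({ name: "b" });  // no shard key: all 3 shards contacted
console.log(routed.shardsContacted, scattered.shardsContacted); // 1 3
```

Both calls return the same document here; the difference is how many shards had to do work to find it.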
[Diagrams: mongos routing requests to config servers and shard nodes; each shard performing a local sort]
[Diagram: mongos performs a sorting merge of the shards' locally sorted results; limit + skip are applied locally on each shard and re-applied on mongos]
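The merge behaviour shown in the diagrams can be illustrated with a small simulation (hypothetical data; the real merge is internal to the server): each shard sorts locally and ships only the first skip + limit documents, and mongos merge-sorts those streams and applies the final skip and limit.

```javascript
const shardData = [
  [3, 9, 12],  // shard0, already sorted locally
  [1, 5, 14],  // shard1
  [2, 7, 8],   // shard2
];

// Simple k-way merge of locally sorted lists (what mongos does conceptually).
function mergeSorted(lists) {
  const out = [];
  const idx = lists.map(() => 0);
  while (true) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (idx[i] < lists[i].length &&
          (best === -1 || lists[i][idx[i]] < lists[best][idx[best]])) best = i;
    }
    if (best === -1) return out;
    out.push(lists[best][idx[best]++]);
  }
}

function shardedSortSkipLimit(lists, skip, limit) {
  // Each shard only needs to ship its first (skip + limit) documents;
  // the real skip and limit are applied to the merged stream on mongos.
  const partial = lists.map(l => l.slice(0, skip + limit));
  return mergeSorted(partial).slice(skip, skip + limit);
}

console.log(shardedSortSkipLimit(shardData, 2, 3)); // [ 3, 5, 7 ]
```

Note that the result matches a global sort of all documents followed by skip 2, limit 3, even though no single node ever sorted the full data set.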
{ first_name: "Donal", … }
db.people.find().readPref("primary")
db.people.find().readPref("primaryPreferred")
db.people.find().readPref("secondary")
db.people.find().readPref("secondaryPreferred")
db.people.find().readPref("nearest")
Reading from Secondaries
Typical reasons to read from secondaries include analytics queries and local (geographically close) reads:
db.people.find().readPref("secondary")
Aggregation Optimizations
˗ How it works;
˗ Where operations are completed;
˗ Optimizations.
˗ $out;
˗ $facet;
˗ $lookup;
˗ $graphLookup.
Follow ESR Rule in Compound Indexes
Equality
• An index may have multiple keys for queries with exact matches. The index keys for equality matches can appear in any order. However, to satisfy an equality match with the index, all of the index keys for exact matches must come before any other index fields.
• Exact matches should be selective. To reduce the number of index keys scanned, ensure equality tests eliminate at least 90% of possible document matches.
Sort
• "Sort" determines the order for results. Sort follows equality matches because the equality
matches reduce the number of documents that need to be sorted.
• An index can support sort operations when the query fields are a subset of the index keys. Sort
operations on a subset of the index keys are only supported if the query includes equality
conditions for all of the prefix keys that precede the sort keys.
• To improve query performance, create an index on the manufacturer and model fields:
db.cars.createIndex( { manufacturer: 1, model: 1 } )
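To see why this index avoids a blocking sort, here is a toy model of the compound index (plain sorted arrays standing in for the B-tree, with made-up car data): entries are ordered by (manufacturer, model), so an equality match on the prefix field yields a contiguous index range that is already sorted by model.

```javascript
const cars = [
  { manufacturer: "Toyota", model: "Corolla" },
  { manufacturer: "Honda",  model: "Civic" },
  { manufacturer: "Toyota", model: "Camry" },
  { manufacturer: "Honda",  model: "Accord" },
];

// Build the "index": entries kept sorted by the compound key
// (manufacturer first, then model), as in { manufacturer: 1, model: 1 }.
const index = [...cars].sort((a, b) =>
  a.manufacturer.localeCompare(b.manufacturer) ||
  a.model.localeCompare(b.model));

// Equality on the prefix field selects a contiguous range of the index,
// and that range is already in model order: no extra sort step needed.
const hondas = index.filter(e => e.manufacturer === "Honda");
console.log(hondas.map(e => e.model)); // [ 'Accord', 'Civic' ]
```

This is exactly the ESR intuition: the equality key narrows the range, and the sort key order falls out of the index for free.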
Follow ESR Rule in Compound Indexes
Sort - Blocking sort
• A blocking sort indicates that MongoDB must consume and process all input documents to the
sort before returning results. Blocking sorts do not block concurrent operations on the collection
or database.
• If MongoDB cannot use an index or indexes to obtain the sort order, MongoDB must perform a
blocking sort operation on the data.
• MongoDB has to use temporary files on disk to store data exceeding the 100 megabyte memory limit while processing a blocking sort operation.
• Sort operations that use an index often have better performance than blocking sorts.
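The difference can be sketched with a toy simulation (illustrative only, not MongoDB internals): an index-backed sort can return its first result after reading a single document, while a blocking sort must read and buffer the entire input before emitting anything, which is why the server caps its memory use and may spill to temporary files.

```javascript
// A generator that counts how many documents have been pulled from storage.
function* docSource(docs, counter) {
  for (const d of docs) { counter.read++; yield d; }
}

// Index-backed sort: documents already arrive in sort order, so the first
// result is available after reading a single document.
function firstViaIndex(sortedDocs) {
  const counter = { read: 0 };
  const it = docSource(sortedDocs, counter);
  const first = it.next().value;
  return { first, docsReadBeforeFirstResult: counter.read };
}

// Blocking sort: every document must be read and buffered before sorting.
function firstViaBlockingSort(unsortedDocs) {
  const counter = { read: 0 };
  const buffered = [...docSource(unsortedDocs, counter)];
  buffered.sort((a, b) => a - b);
  return { first: buffered[0], docsReadBeforeFirstResult: counter.read };
}

console.log(firstViaIndex([1, 2, 3]));        // { first: 1, docsReadBeforeFirstResult: 1 }
console.log(firstViaBlockingSort([3, 1, 2])); // { first: 1, docsReadBeforeFirstResult: 3 }
```

With millions of documents, that buffered array is what runs into the 100 megabyte limit.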
Follow ESR Rule in Compound Indexes
Range
• "Range" filters scan fields. The scan doesn't require an exact match, which means range filters are loosely bound to index keys. To improve query efficiency, make the range bounds as tight as possible and use equality matches to limit the number of documents that must be scanned.
• Range filters use operators such as $gt, $gte, $lt, and $lte, for example:
db.cars.find( { price: { $gt: 15000, $lt: 20000 } } )
Cardinality
• Cardinality is defined as the number of unique elements present in a set. The lower the cardinality, the more duplicated elements.
• So if a set of 5 elements is made of Boolean values, then the cardinality of the set is going to be two. All sets made of Booleans have a maximum cardinality of two and a minimum cardinality of one.
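A quick way to see cardinality in code, using a Set over hypothetical field values: low-cardinality fields like booleans make poor leading index keys because each key value matches many documents.

```javascript
// Cardinality = number of distinct values in a collection of field values.
const cardinality = (values) => new Set(values).size;

const booleans = [true, false, true, true, false]; // low cardinality
const userIds  = [101, 102, 103, 104, 105];        // high cardinality

console.log(cardinality(booleans)); // 2  (the max possible for a boolean field)
console.log(cardinality(userIds));  // 5  (fully unique: ideal selectivity)
```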
B-Tree & Prefix Compression: Query Performance & Disk usage
• Indexes on low-cardinality fields must be built carefully. One more side effect of having an index on such fields is that it impacts writes as well, since every insert or update of the field must also update the index.
B-Tree & Prefix Compression: Query Performance & Disk usage
Partial Index
• Partial indexes only index the documents in a collection that meet a specified filter expression.
By indexing a subset of the documents in a collection, partial indexes have lower storage
requirements and reduced performance costs for index creation and maintenance.
• For example, the following operation creates a compound index that indexes only the
documents with a rating field greater than 5.
db.restaurants.createIndex(
{ cuisine: 1, name: 1 },
{ partialFilterExpression: { rating: { $gt: 5 } } }
)
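A toy illustration of the storage saving (in-memory arrays with made-up restaurant data standing in for the real index): only documents matching the filter expression contribute index entries.

```javascript
const restaurants = [
  { name: "A", cuisine: "Thai",  rating: 8 },
  { name: "B", cuisine: "Diner", rating: 3 },
  { name: "C", cuisine: "Sushi", rating: 9 },
  { name: "D", cuisine: "Pizza", rating: 4 },
];

// Full index on { cuisine: 1, name: 1 }: one entry per document.
const fullIndex = restaurants.map(d => [d.cuisine, d.name]);

// Partial index with partialFilterExpression { rating: { $gt: 5 } }:
// only qualifying documents are indexed.
const partialIndex = restaurants
  .filter(d => d.rating > 5)
  .map(d => [d.cuisine, d.name]);

console.log(fullIndex.length, partialIndex.length); // 4 2
```

Half the documents here fail the filter, so the partial index carries half the entries, and correspondingly less maintenance cost on writes.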
Use Covered Queries When Possible
Covered query
• A covered query is a query that can be satisfied entirely using an index and does not have to
examine any documents. An index covers a query when all of the following apply:
• all the fields in the query are part of an index, and
• all the fields returned in the results are in the same index.
• no fields in the query are equal to null (i.e. {"field" : null} or {"field" : {$eq : null}} ).
• For example, a collection inventory has the following index on the type and item fields:
db.inventory.createIndex( { type: 1, item: 1 } )
• This index will cover the following operation which queries on the type and item fields and
returns only the item field:
db.inventory.find( { type: "food", item: /^c/ }, { item: 1, _id: 0 } )
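A sketch of why no documents need to be examined (plain objects standing in for the index entries; not server internals): the index on { type: 1, item: 1 } alone contains every field the query filters on and projects.

```javascript
const inventory = [
  { _id: 1, type: "food", item: "cake",  qty: 10 },
  { _id: 2, type: "food", item: "candy", qty: 20 },
  { _id: 3, type: "tool", item: "clamp", qty: 5  },
];

// The index holds only the indexed fields, not the full documents.
const typeItemIndex = inventory.map(d => ({ type: d.type, item: d.item }));

let documentsExamined = 0; // would be incremented on any document fetch

// find({ type: "food", item: /^c/ }, { item: 1, _id: 0 }) served from the index:
const results = typeItemIndex
  .filter(e => e.type === "food" && /^c/.test(e.item))
  .map(e => ({ item: e.item }));

console.log(results, documentsExamined); // [ { item: 'cake' }, { item: 'candy' } ] 0
```

Because both the filter and the projection are resolved from index entries, the documents-examined count stays at zero, which is the defining property of a covered query.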
Use Covered Queries When Possible
Covered query
• For the specified index to cover the query, the projection document must explicitly specify _id: 0
to exclude the _id field from the result since the index does not include the _id field.
• For example, consider a collection userdata with documents of the following form:
{ _id: 1, user: { login: "tester" } }
• Since MongoDB 3.6, an index on the embedded field, such as { "user.login": 1 }, can cover a query that filters and projects only "user.login" with _id excluded; earlier versions could not cover queries on embedded fields.
˗ Index creation in the background helps to overcome the locking bottleneck, but decreases the efficiency of index traversal;
˗ Recommend that developers write covered queries. This kind of query is satisfied entirely by an index, so zero documents need to be inspected, which makes the query run a lot faster. All the projection keys need to be indexed;
˗ Remove duplicate and unused indexes; this also improves disk throughput and memory utilization.