
Simplr Solutions

MongoDB Consulting Report


Santhosh S Kashyap <[email protected]>, MongoDB Inc.
April 2023

Participants:
● Ofer Beit Halachmi, Simplr Solutions
● Srinivas Jaligama, Simplr Solutions
● Santhosh S Kashyap, Consulting Engineer, MongoDB Inc.

This document summarizes the discussions and recommendations from a 3-day remote
consultation with the “Simplr Solutions” team on 17th, 18th and 19th April 2023.

Each recommendation is assigned a level from 1 to 3. The levels correspond to the following
priorities:

1. Implement this recommendation immediately.

Not implementing this recommendation incurs the risk of data loss, system unavailability, or
other significant problems that may cause outages in the future.

2. Implement this recommendation as soon as possible.

These recommendations are less severe than level 1, but typically represent significant
problems that could affect a production system. Consider these issues promptly.

3. Consider this recommendation.

While this suggestion may improve some aspects of the application, it is not critical and
may reflect a much larger change in the application code or MongoDB deployment.
Consider these modifications as part of the next revision of the application.

1. Executive Summary
2 Background
2.1 Application
2.2 Environment
3 Recommendations
3.1 Query Optimization [Priority 1]
3.1.1 Prefer Equality or $in vs Negators
3.1.2 Optimize $or conditions.
3.2 Remove Unused Indexes [Priority 1]
3.2.1 Safe Practice to Delete an Index
3.3 Precompute conversations and messages [Priority 2]
3.4 Archive Unused Data [Priority 2]
3.5 Snowflake Integration [Priority 3]
3.5.1 Use Kafka For Integration
3.5.2 Manual Integration using Change Streams
4 Other Notes, Questions & Answers
4.1 Indexing strategy
4.1.1 General Indexing Strategies
4.1.2 Create compound indexes to help with sorting
4.1.3 ESR Rule
4.1.3.1 (E → R) Equality Before Range
4.1.3.2 (E → S) Equality before Sort
4.1.3.3 (S → R) Sort before Range
4.1.4 Explain Plans
4.2 Archiving Strategies
4.2.1 Atlas Data Federation
4.2.1.1 Features not supported by Atlas Data Federation
4.2.1.2 $out stage
4.2.1.3 FedRamp AWS
4.2.2 Online Archive
What are the limitations of the Online Archive?
Can we restore the data back from S3 buckets?
How can we connect to our archived data? Does the connection string for the
cluster also change?
How is the performance when we connect to Cluster and Online Archive both?
4.3 Mongolyser
4.4 Keyhole analysis
4.4.1 Keyhole Setup
4.4.2 Keyhole Logs analysis

4.4.3 Keyhole Cluster Analysis
4.5 Atlas Backup
4.5.1 Continuous Backup
4.5.2 How to quickly restore from a Multi Region Cluster?
4.6 Atlas Event Changes
4.6.1 Change Streams
4.6.2 Atlas Triggers
Database Triggers
Scheduled Triggers
Restarting a suspended Trigger
LIMITATIONS
5 Recommended Further Consulting and Training
5.1 Consulting
5.2 Training

1. Executive Summary
Simplr offers companies a human-first, machine-enabled customer experience solution
that meets the demands of the NOW Customer across all digital channels. Simplr allows
companies to immediately expand their customer service capacity and engage customers
with speed, empathy, and precision.

This document contains comments and recommendations for Simplr’s simplr-prod Atlas
cluster. The effort concentrated primarily on removing unused and redundant indexes from
this cluster, along with backups, online archives, and integration strategies for Snowflake.

2 Background
2.1 Application
The company offers a fully managed service that connects a chatbot, human agents, and an
AI-powered platform to deliver better and more cost-efficient CX than legacy BPOs. With
Simplr, clients are engaging with customers in ways that drive more revenue. In doing so,
they’re fundamentally transforming CX programs into strategic imperatives for their
companies.

2.2 Environment

The application is developed in Node.js using Typegoose. Following are the details of
the MongoDB deployment that was the focus of this engagement:
Cluster Name       | Deployment Type                        | Region                      | Cluster Tier  | MongoDB Version
simplr-prod        | Replica Set (4 nodes, PSS + Analytics) | AWS N. Virginia (us-east-1) | M80 (General) | 4.4.21
Prod-clone-Dec2022 | Replica Set (4 nodes, PSS + Analytics) | AWS N. Virginia (us-east-1) | M50 (General) | 4.4.21

3 Recommendations
3.1 Query Optimization [Priority 1]
During the consultation, we analyzed one week of logs using the Keyhole analyser to find slow
queries. We saw a few queries that can be optimized, and we discussed query optimization
with the team.

Query 1
From the log analysis we saw that, out of 1,445,714 slow queries tagged by MongoDB,
1,442,125 were from the query below, executed by the npm package agenda. This package
creates scheduled tasks, acquires a lock for a task, and then performs the task.

{
"t":{
"$date":"2023-04-11T01:40:26.850+00:00"
},
"s":"I",
"c":"COMMAND",
"id":51803,
"ctx":"conn30900",
"msg":"Slow query",
"attr":{
"type":"command",
"ns":"simplr.agendaJobs-prod",
"command":{
"findAndModify":"agendaJobs-prod",
"query":{
"$or":[
{
"name":"conversationCloseTimer",
"lockedAt":null,
"nextRunAt":{
"$lte":{
"$date":"2023-04-11T01:40:26.698Z"
}
},
"disabled":{
"$ne":true
}
},
{
"name":"conversationCloseTimer",
"lockedAt":{
"$exists":false
},
"nextRunAt":{
"$lte":{
"$date":"2023-04-11T01:40:26.698Z"

}
},
"disabled":{
"$ne":true
}
},
{
"name":"conversationCloseTimer",
"lockedAt":{
"$lte":{
"$date":"2023-04-11T01:40:11.698Z"
}
},
"disabled":{
"$ne":true
}
}
]
},
"sort":{
"nextRunAt":1,
"priority":-1
},

"planSummary":"IXSCAN { name: 1, nextRunAt: 1, priority: -1, lockedAt: 1,


disabled: 1 }, IXSCAN { name: 1, lockedAt: 1, nextRunAt: 1, priority: -1,
disabled: 1 }, IXSCAN { name: 1, lockedAt: 1, nextRunAt: 1, priority: -1,
disabled: 1 }",
"keysExamined":41734,
"docsExamined":0,
"hasSortStage":true,
"nMatched":0,
"nModified":0,
"numYields":41,
"queryHash":"C66DDE38",
"planCacheKey":"27DF7BC2",
"reslen":232,

"protocol":"op_query",
"durationMillis":5140
}
}

We saw that this query is executed whenever agenda starts a job. We also dived deep into
the agenda code and saw that the query can be optimized by removing one of the $or
conditions, since in the current database lockedAt always exists. This requires a change to
the agenda v2.0.0 code the team is using. Due to time constraints we were not able to
implement and test this; the team must test thoroughly for possible issues before
implementing it. We also saw that this logic is greatly improved in newer versions of agenda,
so the team can also evaluate upgrading the package.
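
A minimal sketch (mongosh syntax, not tested during the engagement) of the simplified filter
after dropping the lockedAt: { $exists: false } branch, assuming lockedAt is always present in
the current data. The update shown is only illustrative of how agenda acquires its lock; the
actual change has to be made in the agenda v2.0.0 code and validated by the team.

db.getCollection("agendaJobs-prod").findAndModify({
  query: {
    $or: [
      { // job is due and not locked
        name: "conversationCloseTimer",
        lockedAt: null,
        nextRunAt: { $lte: new Date() },
        disabled: { $ne: true }
      },
      { // or its lock has expired (cutoff = now minus the configured lock lifetime)
        name: "conversationCloseTimer",
        lockedAt: { $lte: new Date(Date.now() - 15 * 1000) },
        disabled: { $ne: true }
      }
    ]
  },
  sort: { nextRunAt: 1, priority: -1 },
  update: { $set: { lockedAt: new Date() } }, // illustrative lock acquisition
  new: true
});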

Query 2
The following query was analyzed

{
"t":{
"$date":"2023-04-14T19:13:36.140+00:00"
},
"s":"I",
"c":"COMMAND",
"id":51803,
"ctx":"conn557",
"msg":"Slow query",
"attr":{
"type":"command",
"ns":"simplr.Message",
"command":{
"aggregate":"Message",
"pipeline":[
{
"$match":{
"_p_conversationPtr":{
"$exists":true
},
"direction":"INBOUND",
"_updated_at":{
"$gte":{
"$date":"2023-04-13T00:00:00.000Z"
},
"$lte":{
"$date":"2023-04-14T00:00:00.000Z"
}
}
}
},
{
"$project":{
"messageBody":1,
"_p_conversationPtr":1,
"conversationId":{
"$substr":[
"$_p_conversationPtr",
13,
-1
]
},
"_id":1
}
},
{
"$lookup":{
"from":"Conversation",
"let":{
"conversationId":"$conversationId"

},
"pipeline":[
{
"$match":{
"$expr":{
"$eq":[
"$_id",
"$$conversationId"
]
}
}
},
{
"$project":{
"smbName":1,
"firstMessageArrivedAt":1,
"simplrTicketNumber":1,
"_id":1
}
}
],
"as":"conversation"
}
},
{
"$skip":42840
},
{
"$limit":20
}
],
"cursor":{

},
"lsid":{
"id":{
"$uuid":"6285c6f4-6552-410c-b173-bbfd738489cd"
}
},
"$clusterTime":{
"clusterTime":{
"$timestamp":{
"t":1681499575,
"i":63
}
},
"signature":{
"hash":{
"$binary":{
"base64":"EmroYSf1BTt8RGzHXJQQV+PcAKs=",
"subType":"0"
}
},
"keyId":7179330532990255267

}
},
"$db":"simplr",
"$readPreference":{
"mode":"secondaryPreferred"
}
},
"planSummary":"IXSCAN { _updated_at: -1 }",
"keysExamined":134461,
"docsExamined":134461,
"protocol":"op_msg",
"durationMillis":40408
}
}

The above query can be improved by adding an optimal index for the match condition using
the ESR rule.

Possible index
direction: 1, _updated_at: 1, _p_conversationPtr: 1

We also see that the $skip and $limit stages are placed after the $lookup, which causes a
large number of documents to be looked up. Since the team is not using the looked-up
documents for sorting, matching, or unwinding, the $skip and $limit can be moved before the
$lookup, which drastically reduces the number of documents reaching the $lookup stage, as
sketched below.
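
A minimal sketch of the suggested index and the reordered pipeline. Field names, the $substr
expression, and the pagination values are taken from the logged query above; the date range
is only illustrative.

db.Message.createIndex({ direction: 1, _updated_at: 1, _p_conversationPtr: 1 });

db.Message.aggregate([
  { $match: {
      _p_conversationPtr: { $exists: true },
      direction: "INBOUND",
      _updated_at: {
        $gte: ISODate("2023-04-13T00:00:00Z"),
        $lte: ISODate("2023-04-14T00:00:00Z")
      }
  } },
  { $project: {
      messageBody: 1,
      _p_conversationPtr: 1,
      conversationId: { $substr: ["$_p_conversationPtr", 13, -1] },
      _id: 1
  } },
  // Paginate before the join so only 20 documents reach the $lookup stage.
  { $skip: 42840 },
  { $limit: 20 },
  { $lookup: {
      from: "Conversation",
      let: { conversationId: "$conversationId" },
      pipeline: [
        { $match: { $expr: { $eq: ["$_id", "$$conversationId"] } } },
        { $project: { smbName: 1, firstMessageArrivedAt: 1, simplrTicketNumber: 1, _id: 1 } }
      ],
      as: "conversation"
  } }
]);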

Query 3

{
"t":{
"$date":"2023-04-14T20:08:42.333+00:00"
},
"s":"I",
"c":"COMMAND",
"id":51803,
"ctx":"conn361",
"msg":"Slow query",
"attr":{
"type":"command",
"ns":"simplr.EmailQueue",
"command":{
"find":"EmailQueue",
"filter":{
"status":{
"$in":[
"SENDING",
"FAILED"
]
},

"lastUpdate":{
"$lt":{
"$date":"2023-04-14T19:52:43.871Z"
}
},
"processed":{
"$ne":true
}
},
"sort":{

},
"projection":{

},
"limit":100,
"returnKey":false,
"showRecordId":false,
"planSummary":"IXSCAN { status: 1 }",
"keysExamined":246070,
"docsExamined":246069,
"durationMillis":58457
}
}
The above query can be improved by adding an optimal index for the match condition using
the ESR rule.

Possible index
status: 1, lastUpdate: 1, processed: 1

We can also avoid a negated predicate by changing processed: { $ne: true } to an equality
such as processed: { $eq: false }; for more details please refer to Prefer Equality or $in vs
Negators. A sketch of both changes follows.
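
A minimal sketch of the suggested index and the rewritten filter, assuming the application
always writes processed explicitly as true or false (to be verified before changing the query).

db.EmailQueue.createIndex({ status: 1, lastUpdate: 1, processed: 1 });

db.EmailQueue.find({
  status: { $in: ["SENDING", "FAILED"] },                    // equality ($in over a small set)
  lastUpdate: { $lt: new Date("2023-04-14T19:52:43.871Z") }, // range
  processed: false                                           // equality instead of { $ne: true }
}).limit(100);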

3.1.1 Prefer Equality or $in vs Negators

During the consultation, we observed a few queries with a $ne check, so we discussed the
possibility of using equality or $in in these conditions.

The inequality operator $ne is an expensive operation because it is not very selective: it often
matches a large portion of the index. As a result, the performance of a $ne query with an
index is frequently no better than without one.

It is recommended to modify such queries to use the $in operator. Note that MongoDB runs
$in as a set of equality matches until the number of combinations of all the $in values
reaches 200, after which it treats the $in as a range; exceeding 200 combinations can
therefore result in an in-memory sort stage. Keep this in mind when designing the index for
the query using the ESR rule.
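
A minimal sketch of the rewrite, reusing the EmailQueue example from Query 3. It assumes
the application only ever stores true or false in processed; note that matching null would also
match documents where the field is missing.

// Instead of a negated predicate:
db.EmailQueue.find({ processed: { $ne: true } });

// prefer an equality, or a small $in over the values the application actually writes:
db.EmailQueue.find({ processed: false });
db.EmailQueue.find({ processed: { $in: [false, null] } });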

3.1.2 Optimize $or conditions.

In the queries above we observed a few conditions where $or is used, so we discussed $or
optimization and how indexes work in the case of $or. For MongoDB to use
indexes to evaluate an $or expression, all the clauses in the $or expression must be
supported by distinct indexes. Otherwise, MongoDB will perform a collection scan.

When using indexes with $or queries, each clause of an $or can use its own index. Consider
the following query:

db.inventory.find( { $or: [ { quantity: { $lt: 20 } }, { price: 10 } ] } )

The following indexes need to be created to support the above query:

db.inventory.createIndex( { quantity: 1 } )
db.inventory.createIndex( { price: 1 } )

3.2 Remove Unused Indexes [Priority 1]


During the consultation we ran the script below to list all indexes and their usage, in order to
get a list of unused indexes.

// Run as:
// mongosh "mongodb+srv://<user>:<pass>@<mongodbURI>" --eval "var _printJSON=false;" indexStats.js > IndexStats.log

// save the code below as indexStats.js


// code
db = db.getSiblingDB("admin");
var dbs = db.adminCommand("listDatabases").databases;

dbs.forEach(function (database) {
  // Skip internal databases
  if (["admin", "system", "local", "config"].indexOf(database.name) !== -1) {
    return;
  }
  db = db.getSiblingDB(database.name);
  db.getCollectionNames().forEach(function (collection) {
    var view = db.getCollection(collection).exists().type;
    if (collection !== "system.views" && view == "collection") {
      print("--" + collection + "--");
      print(
        db.getCollection(collection)
          .aggregate([
            { $indexStats: {} },
            {
              $project: {
                indexName: "$name",
                indexKeys: "$spec.key",
                indexUsage: "$accesses.ops",
              },
            },
          ])
          .toArray()
          .map((m) => ({
            indexName: m.indexName,
            indexKeys: m.indexKeys,
            indexUsage: m.indexUsage,
          }))
      );
      print("\n");
    }
  });
});
// code

In MongoDB, indexes aid in the efficient execution of queries. However, each index you
create has a negative impact on write performance and requires some disk space. Adding
unnecessary indexes to a collection results in a bloated collection and slow writes. Consider
whether each query performed by your application justifies the creation of an index. Remove
unused indexes, either because the field is not used to query the database or because the
index is redundant. Below is the list of indexes with zero usage.
Please note that the figures in the list are as of April 17 and will change over time; the team
should run the script again to confirm usage before removing any index.

Collection | Index Key | Usage
DecisionFlowStatusV2 | { _created_at: 1 } | 0
LastMile | { _updated_at: -1 } | 0
SimplrExpert | { _updated_at: -1 } | 0
SimplrExpert | { locationCountry: 1 } | 0
ExpertQueue | { isActive: 1, _p_conversationPtr: 1, _p_simplrExpertPtr: 1 } | 0
ExpertQueue | { _created_at: -1 } | 0
ExpertQueue | { isActive: 1, _p_conversationPtr: 1, _rperm: 1 } | 0
ExpertQueue | { _updated_at: -1 } | 0
CustomerSurvey | { _created_at: 1 } | 0
CustomerSurvey | { didSubmit: 1 } | 0
CustomerSurvey | { rating: 1 } | 0
CustomerSurvey | { _updated_at: 1 } | 0
CustomerSurvey | { _fts: 'text', _ftsx: 1 } | 0
CustomerSurvey | { platformSurveyId: 1, platformDomain: 1 } | 0
Conversation | { simplrChannel: 1, customerChannel: 1, isOpen: 1, status: 1, _updated_at: -1 } | 0
Conversation | { _p_simplrAccountPtr: 1, conversationUpdatedAt: -1, communicationType: 1, _rperm: 1 } | 0
Conversation | { _created_at: -1 } | 0
Conversation | { _p_simplrExpertPtr: 1, status: 1, paymentStatus: 1, isTestTicket: 1, closedAt: 1 } | 0
Conversation | { lastStatusChangedAt: 1 } | 0
Conversation | { customerChannel: 1, isOpen: 1 } | 0
Conversation | { firstMessageArrivedAt: -1 } | 0
Conversation | { _rperm: 1, conversationUpdatedAt: -1 } | 0
Conversation | { platformAssigneeId: -1 } | 0
Conversation | { conversationUpdatedAt: 1 } | 0
Conversation | { intentValidationStatus: 1 } | 0
Conversation | { _p_simplrAccountPtr: 1, brand: 1 } | 0
Conversation | { _updated_at: -1 } | 0
Conversation | { orderingSystem: -1 } | 0
Conversation | { _p_simplrAccountPtr: 1, _rperm: 1, conversationUpdatedAt: -1 } | 0
Conversation | { _p_simplrAccountPtr: 1, _created_at: 1 } | 0
CustomerSurveyTicket | { platformTicketId: 1 } | 0
DecisionFlowStatus | { _updated_at: -1 } | 0
DecisionFlowStatus | { _created_at: -1 } | 0
DecisionFlowStatus | { _p_decisionFlowPtr: 1 } | 0
FlowSystem | { _updated_at: -1 } | 0
FlowSystem | { version: 1 } | 0
FlowSystem | { _created_at: -1 } | 0
_Idempotency | { expire: 1 } | 0
_Idempotency | { reqId: 1 } | 0
QualityRating | { _p_reviewedExpertPtr: 1, _created_at: -1 } | 0
AmazonReport | { reportType: 1, sellerId: 1, endDate: -1 } | 0
AnalyticsEvent | { eventName: 1 } | 0
AnalyticsEvent | { _created_at: -1 } | 0
AnalyticsEvent | { _p_conversationPtr: 1 } | 0
AnalyticsEvent | { _p_simplrExpertPtr: 1 } | 0
AnalyticsEvent | { _updated_at: -1 } | 0
agendaJobs-setTimer | { name: 1, nextRunAt: 1, priority: -1, lockedAt: 1, disabled: 1 } | 0
PlatformMacro | { _p_simplrAccountPtr: 1, platformType: 1 } | 0
PlatformMacro | { _p_simplrAccountPtr: 1 } | 0
agendaJobs-prod | { name: 1, nextRunAt: -1, lastRunAt: -1, lastFinishedAt: -1 } | 0
agendaJobs-prod | { name: 1, nextRunAt: 1, lockedAt: 1 } | 0
agendaJobs-prod | { name: 1, nextRunAt: 1, priority: -1, disabled: 1 } | 0
agendaJobs-prod | { lastFinishedAt: -1 } | 0
agendaJobs-prod | { nextRunAt: -1, lastRunAt: -1, lastFinishedAt: -1 } | 0
Message | { _p_simplrAccountPtr: 1, _created_at: 1 } | 0
Message | { smbName: 1 } | 0
Message | { _p_conversationPtr: 1, _rperm: 1, _created_at: 1 } | 0
Message | { _p_simplrExpertPtr: 1, _p_couponCodePtr: 1, _created_at: 1 } | 0
Message | { _created_at: 1 } | 0
IntegrationConfig | { isActive: 1 } | 0
IntegrationConfig | { _created_at: -1 } | 0
IntegrationConfig | { systemType: 1 } | 0
IntegrationConfig | { _updated_at: -1 } | 0
IntegrationConfig | { resourceType: 1 } | 0
ExpertPayment | { _created_at: -1 } | 0
ExpertPayment | { _p_simplrAccountPtr: 1, readyForPaymentAt: 1 } | 0
ExpertPayment | { _updated_at: -1 } | 0
EmailQueue | { _created_at: -1 } | 0
EmailQueue | { status: 1, lastUpdate: 1, processed: 1 } | 0
EmailQueue | { simplrAccountId: 1 } | 0
EmailQueue | { type: 1 } | 0
_User | { username: 1 } | 0
_User | { email: 1 } (case insensitive) | 0

3.2.1 Safe Practice to Delete an Index

During the discussion, the team inquired about safely deleting an index from the database.
We talked through the following steps (a mongosh sketch follows the list):

● Identify the index that needs to be removed from the database
● Hide the index to validate that removing it does not affect ongoing operations
● Delete the index permanently from the database
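
A minimal sketch of the hide-then-drop workflow. The index name is illustrative (taken from
the unused-index list above); run db.collection.getIndexes() to confirm the exact name first.
Hidden indexes require MongoDB 4.4 or later, which this cluster meets.

// 1. Hide the index so the planner stops using it while it is still maintained.
db.Conversation.hideIndex("orderingSystem_-1");

// 2. Monitor query performance (Atlas metrics, $indexStats) for a reasonable window.
//    If anything regresses, unhide it:
// db.Conversation.unhideIndex("orderingSystem_-1");

// 3. If nothing regresses, drop the index permanently.
db.Conversation.dropIndex("orderingSystem_-1");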

3.3 Precompute conversations and messages [Priority 2]

During the consultation, we saw that the team was using Typegoose's populate method
extensively.

// Populate getting called on every find and findAll


return await this.findAll({
sort,
filter,
populate,
roleGroup,
logContext,

});

// Type goose populate call


queries.map((query) =>
query.model.populate(
query.docs,
query.paths.map((path) => ({
...path,
match,
transform: this.options.transform,
}))
)
)

It is to be noted that populate is not a MongoDB operation but a Typegoose method derived
from the Mongoose driver. Populate fetches the referenced documents and substitutes them
in place of the references; when the data is not already cached, this can lead to a large
number of additional calls to the MongoDB database, drastically reducing query performance.
We can reduce and optimize this by replacing populate with a $lookup, applying the lookup
optimizations discussed above.

We can optimize this further by precomputing the referenced fields using the computed
pattern. Here we can create a new collection (to preserve the previous schema) or reuse one
of the current collections, and precompute the referenced fields with the data required by the
API and frontend. We also discussed using change streams and Atlas Triggers to keep the
data in sync by recomputing whenever a source data point is added or modified.

We discussed combining conversations and messages into a single new computed
collection, since these two collections are usually used together and the team currently has
to perform populates across them during queries. Hence we discussed the possibility of
combining these collections by precomputing them into a new collection that stores only the
data required by the APIs. We also discussed limiting this to a time window (for example,
conversations from only the last 6 months) to reduce the increase in data size due to
redundancy. A sketch of the change-stream-based sync follows.
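
A minimal mongosh sketch of keeping a precomputed collection in sync with a change
stream. The target collection name (ConversationWithMessages) and the copied fields are
illustrative assumptions, not an agreed schema; the same logic could equally run inside an
Atlas Database Trigger.

const messages    = db.getSiblingDB("simplr").getCollection("Message");
const precomputed = db.getSiblingDB("simplr").getCollection("ConversationWithMessages");

const cursor = messages.watch(
  [ { $match: { operationType: { $in: ["insert", "update", "replace"] } } } ],
  { fullDocument: "updateLookup" }
);

while (cursor.hasNext()) {
  const event = cursor.next();
  const msg = event.fullDocument;
  // Append only the message fields the API actually needs to the precomputed document.
  precomputed.updateOne(
    { _id: msg._p_conversationPtr },
    { $push: { messages: { _id: msg._id, messageBody: msg.messageBody, direction: msg.direction } } },
    { upsert: true }
  );
}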

3.4 Archive Unused Data [Priority 2]

During the consultation, we saw a few collections that are very large, such as the
conversations, messages, history, and events collections. The team mentioned that some of
these collections are moved directly to Snowflake and are not accessed through MongoDB,
but that the data still needs to remain available in MongoDB; such collections can be moved
to Online Archive. We also saw a large number of old and unused conversations and
messages that can also be archived. Hence we discussed ways to archive the data.

Please note: data pushed to Online Archive cannot be modified

Please refer to Archiving Strategies.

3.5 Snowflake Integration [Priority 3]

During the consultation, the team mentioned that they wanted alternatives to Fivetran for
integrating data into Snowflake. We discussed two possible approaches for integration into
Snowflake.

3.5.1 Use Kafka For Integration

Here we discussed the possibility of using the Kafka source connector for MongoDB to fetch
data from MongoDB and Kafka's Snowflake sink connector to push that data into Snowflake.

3.5.2 Manual Integration using Change Streams

Here we discussed pushing data to Snowflake using the Snowflake APIs. After the initial sync
is complete, the team can use change streams to track changes to the data points that need
to be pushed, and then use the Snowflake APIs to push those changes into Snowflake, as
sketched below.
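
A minimal mongosh sketch of the change-stream side of this approach. pushToSnowflake() is
a hypothetical helper that wraps the Snowflake API call and is not shown here; the resume
token is persisted so the stream can continue from where it left off after a restart.

const tokens = db.getSiblingDB("simplr").getCollection("snowflakeSyncTokens"); // illustrative bookkeeping collection
const last   = tokens.findOne({ _id: "Conversation" });

const stream = db.getSiblingDB("simplr").getCollection("Conversation").watch(
  [],
  last ? { resumeAfter: last.token, fullDocument: "updateLookup" }
       : { fullDocument: "updateLookup" }
);

while (stream.hasNext()) {
  const change = stream.next();
  pushToSnowflake(change.fullDocument);            // hypothetical helper calling the Snowflake API
  tokens.updateOne(
    { _id: "Conversation" },
    { $set: { token: change._id } },               // change._id is the resume token
    { upsert: true }
  );
}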

4 Other Notes, Questions & Answers


4.1 Indexing strategy

4.1.1 General Indexing Strategies


During the consultation, the team wanted to understand the indexing strategies and how to
implement them optimally. Hence indexing strategies were discussed. When adding or
modifying your approach to indexing, consider the following notes:

● A compound index can be utilized to satisfy multiple queries. For example, if there is
a compound index like { a:1, b:1, c:1 }, this can be utilized to satisfy all the
following queries -
○ db.coll.find({ a:3 })
○ db.coll.find({ a:3, b:5 })
○ db.coll.find({ a:3, b:5, c:8 })

● The order of the index keys is important for a compound index. In general, the
following rule of thumb can be used for the key order in compound indexes: First use
keys on fields on which there is an equality match in the query (these are usually the
most “selective” part of the query), then the sort keys, and then range query keys.
We call this the ESR (Equality-Sort-Range) rule. Consider the following example:

○ If the Query shape looks like - { a:5, b: {$gt : 5} }.sort({ c:1 })


○ Index field order should be - { a:1, c:1, b:1 }

The reason for this structure is that for an equality match, it is expected that it will
only select a subset of the index keys (therefore more selective), compared to sorting
which will scan all of the index keys for an index.

● Remove indexes that are not used because every index creates overhead for write
operations. The $indexStats command can be used for getting the statistics about
the usage of the indexes or index usage can be checked using MongoDB Compass.
Atlas UI can also be used to check the index usage.

● For the fastest processing, ensure that your indexes fit entirely in RAM so that the
system can avoid reading the index from the disk. For more information, refer to
Ensure Indexes Fit in RAM.

● When possible use covered queries. A covered query is a query that can be satisfied
entirely using an index and does not have to examine any documents (it will not show
a FETCH stage in the winning plan of explain-results).

● Use indexes to sort query results. For more information please refer to the link -Use
Indexes to Sort Query Results.

● You can make MongoDB use an index with hint() if for some reason the optimal index
is not used. Use caution while using .hint() because changes to the database’s query
optimizer may be negated by forcing an index selection with hints.

4.1.2 Create compound indexes to help with sorting

It’s common to find indexes that are optimized for querying and others for sorting.
Compound indexes should be structured to optimize for both whenever possible. A
compound index can benefit both a query and a sort when created properly.

Order matters in a compound index. Indexes are, by definition, a sorted structure. All levels
of a tree are sorted in ascending or descending order. Some queries and sorts can use an
index while others cannot, all depending on the query structure and sort options. It may help
to visualize an index as a generic tree structure to help see why.

Picture an index on { type: 1, name: 1, color: 1 } as a sorted tree: type at the top level, name
sorted within each type, and color sorted within each name. This index will work efficiently for
the following queries:

● db.food.find({ type: "fruit" }).sort({ name: 1 })


● db.food.find({ type: "fruit", name: "apple" }).sort({ color: 1 })
● db.food.find({}).sort({ type: 1 })
● db.food.find({ type: "fruit", name: "apple", color: "green" })

All of the example queries above leverage the fact that every “level” of the tree is sorted. It’s
important to note that name and color are both sorted under their respective tree. For
example, apple and orange are sorted correctly even though cucumber comes before
orange in a basic sort because they do not share a common parent.

The same index can be used for the following queries, but less efficiently:
● db.food.find({ type: "fruit" }).sort({ color: 1 })
● db.food.find({ name: "apple" }).sort({ type: 1 })

Both examples above can use the index to satisfy either the equality or the sort but not both.

Looking at the first example, equality on fruit will eliminate half of the tree structure (i.e.
vegetable), but an in-memory sort of color is still required. Orange comes before yellow
when sorting only by color, but those colors don’t contain a common parent so they are not
sorted for this example. An in-memory sort is now required to sort by color.

The second example has an equality on name and sort on type. The index is already sorted
on type so it can be used for that portion, but the entire tree must be traversed to eliminate
all entries with the name apple.

For more information on indexing performance, please refer to the following blog
Performance Best Practices: Indexing

4.1.3 ESR Rule

While creating indexes, ESR is the rule of thumb to keep in mind; it is also recommended as
a performance best practice.

4.1.3.1 (E → R) Equality Before Range


When creating queries that ensure selectivity, we learn that “selectivity” is the ability of a
query to narrow results using the index. Effective indexes are more selective and allow
MongoDB to use the index for a larger portion of the work associated with fulfilling the query.

Equality fields should always form the prefix for the index to ensure selectivity.

4.1.3.2 (E → S) Equality before Sort


Placing Sort predicates after sequential Equality keys allows the index to:
● Provide a non-blocking sort.
● Minimize the amount of scanning required.

4.1.3.3 (S → R) Sort before Range


Having a Range predicate before the Sort can result in a Blocking (In Memory) Sort being
performed as the index cannot be used to satisfy the sort criteria.

Refer to the Documentation for more details
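
A short, illustrative example of applying the rule end to end; the collection and field names
(orders, status, createdAt, amount) are hypothetical.

// Query shape: equality on status, sort on createdAt, range on amount.
db.orders.find({ status: "OPEN", amount: { $gt: 100 } }).sort({ createdAt: 1 })

// ESR-ordered index: Equality (status), then Sort (createdAt), then Range (amount).
db.orders.createIndex({ status: 1, createdAt: 1, amount: 1 })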

4.1.4 Explain Plans

We can get insights about query/aggregation performance using explain plans. We can run
an explain plan on a query using .explain() on a query/aggregation. By looking at the
execution plan, the following can be determined:
● Access paths being used for fetching the data
● Various stages that go into execution
● How data is being sorted
● What indexes are being used and which fields the indexes are being used for

Using the explain method, an explainable object can be constructed to get the
allPlansExecution as below.

db.collection.explain("allPlansExecution").<query/aggregation>

Sample Execution plan results

"executionStats" : {
"executionSuccess" : <boolean>,
"nReturned" : <int>,
"executionTimeMillis" : <int>,

"totalKeysExamined" : <int>,
"totalDocsExamined" : <int>,
"executionStages" : {
"stage" : <STAGE1> // Executed second
"nReturned" : <int>,
"executionTimeMillisEstimate" : <int>,
"works" : <int>,
"advanced" : <int>,
"needTime" : <int>,
"needYield" : <int>,
"saveState" : <int>,
"restoreState" : <int>,
"isEOF" : <boolean>,
...
"inputStage" : {
"stage" : <STAGE2>, // First Executed
"nReturned" : <int>,
"executionTimeMillisEstimate" : <int>,
...
"inputStage" : {
...
}
}
},
"allPlansExecution" : [
{
"nReturned" : <int>,
"executionTimeMillisEstimate" : <int>,
"totalKeysExamined" : <int>,
"totalDocsExamined" :<int>,
"executionStages" : {
"stage" : <STAGEA>,
"nReturned" : <int>,
"executionTimeMillisEstimate" : <int>,
...
"inputStage" : {
"stage" : <STAGEB>,
...
"inputStage" : {
...
}
}
}
},
...
]
}

Each stage passes its results (i.e. documents or index keys) to the parent node. The leaf
nodes access the collection or the indices. The internal nodes manipulate the documents or
the index keys that result from the child nodes. The root node is the final stage from which
MongoDB derives the result set.

Stages are descriptive of the operation; e.g.


● COLLSCAN for a collection scan
● IXSCAN for scanning index keys
● FETCH for retrieving documents

The key metrics to note are


● nReturned: Shows the number of documents returned
● executionTimeMillis: Total time required in milliseconds for the execution of the query
● keysExamined: Number of index entries scanned
● docsExamined: Number of documents scanned from disk
● executionStages: Shows the complete execution path the query took

In a performant system, nReturned, keysExamined and docsExamined should be the same


because we should not scan more than what we want to be returned from the database. In
other words, there should be a 1:1 ratio between the parameters. Please see Explain
Results for more information on explain output.
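
As an illustrative check, the explain output for the EmailQueue query from section 3.1 (with
the suggested index in place) could be inspected as follows; the filter values are placeholders.

const plan = db.EmailQueue.find({
  status: { $in: ["SENDING", "FAILED"] },
  lastUpdate: { $lt: new Date() },
  processed: false
}).limit(100).explain("executionStats");

// In a healthy plan these three numbers stay close to each other.
printjson({
  nReturned: plan.executionStats.nReturned,
  totalKeysExamined: plan.executionStats.totalKeysExamined,
  totalDocsExamined: plan.executionStats.totalDocsExamined
});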

4.2 Archiving Strategies

4.2.1 Atlas Data Federation

Atlas Data Federation is a multi-tenant on-demand query processing engine that allows
users to quickly query their archived data stored in Amazon S3 buckets using the standard
MongoDB Query Language (MQL). Atlas Data Federation supports multiple formats for data
stored in AWS S3 buckets, viz. JSON, BSON, CSV, TSV, Avro, ORC, and Parquet.

Data is analyzed on-demand with no infrastructure setup and no time-consuming


transformations, pre-processing, or metadata management. There's no schema to
pre-define, allowing you to work with your data faster.

Atlas Data Federation also supports “Federated Queries” with Atlas clusters as data sources
in addition to data stored in S3 buckets. This means that you can combine both your live
cluster data and historical data in your S3 buckets in virtual databases and collections on
Atlas Data Federation and can query on these virtual databases/collections using MQL
seamlessly.

For additional information on Atlas Data Federation, please refer to the links below-

● Atlas Data Federation — MongoDB Atlas Data Federation


4.2.1.1 Features not supported by Atlas Data Federation

● Creating Indexes
● Monitoring Data with Atlas monitoring tools
● Creating a Data Federation with S3 buckets from more than one AWS account
● Assigning Atlas temporary users permission to query a Data Federation
● Querying documents larger than 16MB
● Adding IP address associated with your Data Federation to your Atlas project
whitelist
● Assigning Atlas read only access to your AWS account with AWS security groups
● Returning more than 100 collections for wildcard collections

4.2.1.2 $out stage

$out takes documents returned by the aggregation pipeline and writes them to a specified
collection. The $out operator must be the last stage in the aggregation pipeline. In Atlas Data
Federation, $out can be used to write to S3 buckets with read and write permissions or to an
Atlas cluster namespace.

{
"$out": {
"s3": {
"bucket": "<bucket-name>",
"region": "<aws-region>",
"filename": "<file-name>",
"format": {
"name": "json|json.gz|bson|bson.gz",
"maxFileSize": "<file-size>"
}
}
}
}
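
A hedged usage sketch showing $out as the final stage of an aggregation run through a
federated database connection; the database, collection, bucket, and file-size values are
placeholders.

db.getSiblingDB("VirtualDatabase").getCollection("VirtualCollection").aggregate([
  { $match: { _created_at: { $lt: ISODate("2022-01-01T00:00:00Z") } } },
  { $out: {
      s3: {
        bucket: "simplr-archive-bucket",        // placeholder bucket name
        region: "us-east-1",
        filename: "conversations-pre-2022",
        format: { name: "json.gz", maxFileSize: "100MiB" }
      }
  } }
])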

4.2.1.3 FedRamp AWS


During the consultation, the team mentioned that a standard AWS S3 bucket cannot be used
because some of the data cannot be transferred to non-FedRAMP storage. Hence we
discussed the possibility of using FedRAMP-authorized AWS services to achieve this.

The US Federal Government is dedicated to delivering its services to the American people in
the most innovative, secure, and cost-efficient fashion. Cloud computing plays a key part in
how the federal government can achieve operational efficiencies and innovate on demand to
advance their mission across the nation. That is why many federal agencies today are using
AWS cloud services to process, store, and transmit federal government data.

For more details please refer here


4.2.2 Online Archive

Atlas Online Archive is a new feature that provides the capability to move historical
documents to MongoDB-managed S3 buckets automatically.

With Atlas Online Archive, the team can define a simple rule based on a date field and the
number of days, for archiving data off of a cluster, pick specific fields you query most
frequently, and then sit back. Atlas will automatically move data off of your cluster and into a
more cost-effective storage layer (MongoDB managed AWS S3 buckets) that can still be
queried with a connection string that combines cluster and archive data, powered by Atlas
Data Lake.

Online Archive is a good fit for many different use cases, including

● Insert-only workloads, where data is immutable and has lower performance


requirements the older it gets
● Historical log keeping
● Time-series datasets
● Storing data that would have been deleted using TTL indexes

The process of configuring “Atlas Online Archive” with screenshots of every step and how to
connect to it, is very well documented in the following MongoDB’s blog - Online Archive: A
New Paradigm for Data Tiering on MongoDB Atlas, also refer to Archive Cluster Data.

What are the limitations of the Online Archive?


Currently, here are the limitations of Online Archive:
● You can configure multiple online archives in the same namespace, but only one can
be active at any given time.
● Archived data becomes immutable and only the entire archive can be deleted.
● The performance will be lower for queries targeting archived data (compared to data
in Atlas clusters)
● You can only partition archived documents with up to two fields in addition to the date
field used for archiving.
● You cannot create multiple online archives on the same fields in the same collection.

You can create up to 50 online archives per cluster, of which up to 20 can be active at any
given time.

Can we restore the data back from S3 buckets?


Yes, you can restore the data back from your online archive by using $out and $merge. This
along with $match will help you migrate data from the archive to a temporary collection on
the Atlas cluster.

For a step-by-step guide, refer to the documentation: Restore Archived Data
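
A hedged sketch of such a restore, run against the federated connection string that includes
the Online Archive; the cluster, database, and collection names are placeholders, and $match
selects the archived slice to bring back.

db.Conversation.aggregate([
  { $match: { _created_at: { $lt: ISODate("2022-01-01T00:00:00Z") } } },
  { $merge: {
      into: { atlas: { clusterName: "simplr-prod", db: "simplr", coll: "Conversation_restored" } },
      on: "_id",
      whenMatched: "keepExisting",
      whenNotMatched: "insert"
  } }
])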

How can we connect to our archived data? Does the connection string for the cluster also
change?
There are two possible ways to connect to your online archive; however, you can only
perform read-only, federated queries against your archive (or your archive and cluster together):
● Connect to your Archived data only, or
● Connect to Archived as well as cluster data

After configuring the online archive, Atlas provides connection strings for three targets: the
archive only, the cluster only, and the archive and cluster together.

Also, the standard connection string for your cluster remains the same; it is just that you are
provided with two more additional connection strings for connecting to your archive as well
as archive and cluster together.

How is the performance when we connect to Cluster and Online Archive both?
The queries are propagated in parallel to the underlying S3 data of the Online Archive as well
as to the cluster. The cluster is likely to respond more quickly (depending on the
characteristics of the data, the presence of indexes, etc.), since it is managed by MongoDB
and S3 is generally slower. The bottleneck can be S3, but Online Archive performance can be
improved by defining efficient partitioning when creating the archive.

4.3 Mongolyser
Using the mongolyser tool you can detect, diagnose and anticipate any bottlenecks, issues
and red flags inside your MongoDB implementation. The information that the tool examines
includes in depth log analysis, query health and efficiency analysis, index analysis and much
more. This tool only runs administrative commands and does not contain any DDL or DML
commands. It is also worth mentioning that no analytics data leaves the system running the
tool.

We helped the team download and install the latest release from here. This tool is still
evolving and will provide new and deeper insights in the near future.

Note 1: It is to be noted that the mongolyser tool is not officially supported by MongoDB. It is
to be considered as a 3rd party tool from an unverified source.

Note 2: Certain admin commands being used inside mongolyser’s analysis engine, under
specific conditions, can cause performance impact. Hence it is recommended to run these
analyses in Least User Activity (LUA) hours.

4.4 Keyhole analysis

The team wanted to monitor the application query patterns for the index usage and the
MongoDB cluster for query targeting to devise a strategy for index and query optimizations.
Hence, the team was advised to use Keyhole to analyze the system logs while leveraging
Atlas metrics/Atlas profiler to gauge the health of the cluster periodically. Using Keyhole with
Maobi helps in gaining actionable insights from the log files generated by MongoDB and can
help you scan your MongoDB cluster effectively. The information that the keyhole examines
includes MongoDB configurations, cluster statistics, cache usage visibility, database
schema, indexes, and index usages. It also identifies if your performance issues are related
to short hardware resources (such as physical RAM, CPU, and disk IOPS), and/or slow
queries without proper indexes.

Please refer to the following blog posts to understand how to best use keyhole:
● Survey Your Mongo Land
● Peek at your MongoDB Clusters like a Pro with Keyhole: Part 1
● Peek at your MongoDB Clusters like a Pro with Keyhole: Part 2
● Peek at your MongoDB Clusters like a Pro with Keyhole: Part 3

4.4.1 Keyhole Setup

Please use the installation steps mentioned here to download and install Keyhole, Maobi for
monitoring the cluster using Keyhole.

4.4.2 Keyhole Logs analysis

Following command can be used to run the analysis on the log file downloaded from Atlas:
keyhole -loginfo mongodb.log.gz

4.4.3 Keyhole Cluster Analysis

Following command can be used to generate cluster information for the Atlas cluster by
providing the Atlas connection string:
keyhole -allinfo "mongodb+srv://<username>:<password>@cluster.mongodb.net"

These generated log files can be uploaded on http://localhost:3030 to generate the HTML
report using Maobi with Docker.

Please note: The keyhole tool is not officially supported by MongoDB. Keyhole reports for
log analysis can be run offline and the visualization of the report in Maobi requires network
calls for loading CSS or HTML related files.

4.5 Atlas Backup

MongoDB Atlas provides a fully managed backup service for use with MongoDB
deployments. There are different backup strategies which you can use to backup your
database:

● Cloud Backups: These use the cloud provider’s native snapshot capabilities (incremental
snapshots) to take a volume snapshot of your database. Snapshots can
be configured as dictated by your backup policy.

● Continuous Cloud Backups: These backups also record the oplog for a configured
window, which can be replayed after a snapshot has been restored to a cluster upto
a minute granularity, thus allowing Point-in-Time restores. Please note that this
feature increases the monthly cost of your cluster.

During the engagement, it was observed that continuous backup was not enabled on the
production cluster. In addition, only snapshot backups were enabled, taken at a frequency of
every 6 hours. The team can consider the following points to set up a backup strategy based
on their requirements:

4.5.1 Continuous Backup

Continuous Backup can be enabled from Cluster’s configuration screen on MongoDB Atlas.

For best performance during restores, please refer to the prerequisites section of the
following documentation - Restore a Cluster from a Cloud Backup — MongoDB Atlas.

4.5.2 How to quickly restore from a Multi Region Cluster?

During the consultation, the team mentioned that one of the main hurdles to converting their
cluster to a multi-region cluster is that a restore into a similar multi-region cluster goes
through a full download restore process, which can take a long time. Hence we discussed
restoring the backup to a single-region cluster, which can be done using a direct-attach
restore, and then converting that cluster to a multi-region cluster; this greatly reduces the
downtime when a cluster needs to be restored.

Please note: the team must test this strategy to determine the exact time taken for the entire
process and any potential downtime

4.6 Atlas Event Changes

4.6.1 Change Streams


During the consultation, in the context of creating a precomputed collection (for example, for
Atlas Search), we discussed ways to keep data in sync; for that we discussed change
streams and Atlas Triggers.

Change streams were introduced in MongoDB version 3.6 to give applications the ability to
listen for changes happening in the database in real time using a simple API, and they have
since become considerably more robust. Change streams provide “resumability” and retry
logic to handle loss of connections to the MongoDB cluster (such as timeouts, transient
network errors, or elections).

Characteristics of change streams

1. Targeted changes
Changes can be filtered to provide relevant and targeted changes to listening
applications. As an example, filters can be on operation type or fields within the
document.
2. Resumability
Resumability was top of mind when building change streams to ensure that
applications can see every change in a collection. Each change stream response
includes a resume token. In cases where the connection between the application and
the database is temporarily lost, the application can send the last resume token it
received and change streams will pick up right where the application left off. In cases
of transient network errors or elections, the driver will automatically make an attempt
to reestablish a connection using its cached copy of the most recent resume token.
However, to resume after application failure, the application needs to persist the
resume token, as drivers do not maintain state over application restarts.
3. Total ordering
MongoDB 3.6 and above has a global logical clock that enables the server to order
all changes across a sharded cluster. Applications will always receive changes in the
order they were applied to the database.
4. Durability
Change streams only include majority-committed changes. This means that every
change seen by listening applications is durable in failure scenarios such as a new
primary being elected.
5. Security
Change streams are secure – users are only able to create change streams on
collections to which they have been granted read access.
6. Ease of use
Change streams are familiar – the API syntax takes advantage of the established
MongoDB drivers and query language, and are independent of the underlying oplog
format.

For better understanding of a use case and relevant code examples, please refer to this and
this.

4.6.2 Atlas Triggers


Triggers allow you to execute server-side logic in response to database events or according
to a schedule. Atlas provides two kinds of Triggers: Database and Scheduled triggers.

Database Triggers

Database triggers allow you to execute server-side logic whenever a document is added,
updated, or removed in a linked cluster. Use database triggers to implement complex data
interactions, including updating information in one document when a related document
changes or interacting with an external service when a DML event occurs.
Scheduled Triggers

Scheduled triggers allow you to execute server-side logic on a regular schedule that you
define using CRON expressions. Use scheduled triggers to do work that happens on a
periodic basis, such as updating a document every minute, generating a nightly report, or
sending an automated weekly email newsletter.

Read more about triggers here: Triggers.

Restarting a suspended Trigger


Realm triggers use change streams to watch for document changes on a MongoDB
database. Database triggers may enter a suspended state in response to an event that
prevents the trigger’s change stream from continuing, such as a network disruption or
changes to the underlying cluster. There are a number of scenarios in which a trigger
becomes suspended:

● Realm receives an invalidate event from the change stream, for example
dropDatabase or renameCollection. Invalidate events close the change stream
cursor and prevent them from resuming.
● The resume point/token which the trigger needs to use is no longer in the oplog.
● A network error resulted in a communication failure and invalidation of the underlying
change stream.
● An authentication error where the Atlas database user used by the Realm trigger is
no longer valid, for example, if the Realm App is imported with --strategy=replace
instead of --strategy=merge.

Typically, restarting the trigger establishes a new change stream against the watched
collection. If you restart the trigger with a resume token, Realm attempts to resume the
trigger’s underlying change stream at the event immediately following the last change event
it processed. If successful, the trigger processes any events that occurred while it was
suspended.

However, it is possible that the suspended trigger cannot be restarted with a resume token if
the resume token is no longer in the oplog by the time the trigger attempts to resume (for
example, due to a small oplog window). The solution is to restart the trigger without the
resume token. If you do not use a resume token, the trigger listens for new events, but will
not fire for any events that occurred while it was suspended. Ensure that your oplog size is
sufficient (typically a few times more than the peak value from the Oplog GB / Hour graph in

the cluster metrics view) in your cluster’s configuration to prevent the resume token from
disappearing from the oplog in the future.

LIMITATIONS
Like all services, Realm triggers have certain limits that apply to ensure optimal
performance. Below are the limits applied to Realm Triggers/Functions:

1. Function runtime is limited to 90 seconds.


2. Function memory usage is limited to 256MB.
3. Functions support most commonly used ES6+ features, but some features that are
uncommon or unsuited to serverless workloads are not supported. For more
information, see JavaScript Feature Compatibility.
4. A function may open a maximum of 5 sockets using the net built-in module.
5. Realm supports a subset of built-in Node modules. For a full list of supported and
unsupported modules, see Built-In Module Support.
6. There is an 18 MB limit for incoming webhook and function requests. For functions
called from an SDK, this limit applies to the total size of all arguments you pass to the
function.

5 Recommended Further Consulting and Training


5.1 Consulting
We covered backups, unused and redundant indexes, Snowflake integration, and query
optimization during the consultation. Recommendations for optimizing the queries and
backups have been suggested in this report; however, it is recommended to re-engage with
the MongoDB Professional Services team once the new Typegoose implementation is in
place along with the above recommendations.

5.2 Training

MongoDB offers a comprehensive set of instructor-led training courses covering all aspects
of building and running applications with MongoDB. Instructor-led training is the fastest and
best way to learn MongoDB in depth. Both public and private training classes are available -
for more information or to enroll in classes, please see Instructor-Led Training.

