12 MongoDB Design Patterns Part 1
▪ A write operation is atomic at the level of a single document, even if the
operation modifies multiple embedded documents within that single document
▪ When a single write operation (e.g. db.collection.updateMany()) modifies multiple
documents, the modification of each document is atomic, but the operation as a
whole is not atomic
▪ For situations requiring atomicity of reads and writes to multiple documents (in a
single or multiple collections), MongoDB supports multi-document transactions:
▪ in version 4.0, MongoDB supports multi-document transactions on replica sets
▪ in version 4.2, MongoDB introduces distributed transactions, which add support for
multi-document transactions on sharded clusters and incorporate the existing support
for multi-document transactions on replica sets
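As a sketch of how such a multi-document transaction looks in the mongo shell (the "shop" database and the "orders"/"inventory" collections are illustrative assumptions, not part of the slides):

// both writes become visible together, or not at all
const session = db.getMongo().startSession();
const orders = session.getDatabase("shop").orders;
const inventory = session.getDatabase("shop").inventory;
session.startTransaction();
try {
    orders.insertOne({ _id: 1, item: "abc123", qty: 2 });
    inventory.updateOne({ sku: "abc123" }, { $inc: { qty: -2 } });
    session.commitTransaction();
} catch (error) {
    session.abortTransaction();
    throw error;
} finally {
    session.endSession();
}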
MongoDB can perform schema validation during updates and insertions. Existing
documents do not undergo validation checks until modification.
▪ validator: specify validation rules or expressions for the collection
▪ validationLevel: determines how strictly MongoDB applies validation rules to existing
documents during an update
▪ strict, the default, applies the validation rules to all inserts and to all updates of any document in the collection
▪ moderate, applies the validation rules to inserts and only to updates of existing documents that already fulfill the validation criteria
▪ validationAction: determines whether MongoDB should raise an error and reject documents
that violate the validation rules, or warn about the violations in the log but allow invalid
documents
db.createCollection( <name>,
  { validator: <document>,
    validationLevel: <string>,
    validationAction: <string>
  } )
▪ Starting in version 3.6, MongoDB supports JSON Schema validation (recommended)
db.createCollection( "contacts",
{ validator: {
$or: [
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb\.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
}
})
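The example above uses query operators; a minimal sketch of the same collection with the recommended $jsonSchema form could look as follows (the required field, descriptions, and validation options shown here are illustrative assumptions):

db.createCollection( "contacts",
  { validator: { $jsonSchema: {
        bsonType: "object",
        required: [ "phone" ],
        properties: {
          phone:  { bsonType: "string", description: "must be a string and is required" },
          email:  { bsonType: "string", pattern: "@mongodb\\.com$" },
          status: { enum: [ "Unknown", "Incomplete" ] }
        }
    } },
    validationLevel: "strict",
    validationAction: "error"
  } )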
▪ Atomicity
▪ Embedded Data Model vs Multi-Document Transaction
▪ Sharding
▪ selecting the proper shard key has significant implications for performance, and can
enable or prevent query isolation and increased write capacity (see the sketch below)
▪ Indexes
▪ each index requires at least 8 kB of data space.
▪ adding an index has some negative performance impact for write operations
▪ collections with high read-to-write ratio often benefit from additional indexes
▪ when active, each index consumes disk space and memory
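A minimal mongosh sketch of the two operations these considerations refer to (database, collection, field names, and keys are assumptions chosen for illustration):

// an index to support a collection with a high read-to-write ratio;
// it consumes disk space and memory and adds work to every write that touches these fields
db.orders.createIndex({ customer_id: 1, date: -1 });

// shard on a key that matches the dominant query pattern, so reads can be
// isolated to few shards while writes are spread across the cluster
sh.enableSharding("shop");
sh.shardCollection("shop.orders", { customer_id: 1, _id: 1 });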
▪ Approximation
▪ Attribute
▪ Bucket
▪ Computed
▪ Document Versioning
▪ Extended Reference
▪ Outlier
▪ Pre-allocation
▪ Polymorphic
▪ Schema Versioning
▪ Subset
▪ Tree
source: https://www.mongodb.com/blog/post/building-with-patterns-the-extended-reference-pattern
▪ Let's say that our city planning strategy
is based on needing one fire engine
per 10,000 people.
▪ instead of updating the population in
the database with every change, we
could build in a counter and only
update it by 100, about 1% of the time.
▪ Another option might be to have a
function that returns a random number.
If, for example, that function returns a
number from 0 to 100, it will return 0
around 1% of the time. When that
condition is met, we increase the
counter by 100.
▪ Our writes are significantly reduced
here, in this example by 99%.
▪ when working with large amounts of
data, the impact on performance of
write operations is large too.
▪ Examples
▪ population counter
▪ movie website counter
source: https://www.mongodb.com/blog/post/building-with-patterns-the-approximation-pattern
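A minimal sketch of the probabilistic counter described above (collection and field names are illustrative assumptions):

// called by the application every time one person is added to a city
function recordPersonAdded(cityId) {
    // Math.random() < 0.01 is true about 1% of the time
    if (Math.random() < 0.01) {
        db.cities.updateOne(
            { _id: cityId },
            { $inc: { population: 100 } }   // one write stands in for roughly 100 events
        );
    }
}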
▪ Useful when
▪ expensive calculations are
frequently done
▪ the precision of those calculations
is not the highest priority
▪ Pros
▪ fewer writes to the database
▪ no schema change required
▪ Cons
▪ exact numbers aren’t being
represented
▪ implementation must be done in
the application
▪ Examples
▪ population counter
▪ movie website counter
source: https://www.mongodb.com/blog/post/building-with-patterns-the-approximation-pattern
▪ Let’s think about a collection of
movies.
▪ The documents will likely have
similar fields involved across all
the documents:
▪ title, director, producer, cast, etc.
▪ Let’s say we want to search on the
release date: which release date?
Movies are often released on
different dates in different
countries.
▪ A search for a release date will
require looking across many
fields at once; we’d need several
indexes on our movies collection.
▪ Move this subset of information into an array
and reduce the indexing needs. We turn this
information into an array of key-value pairs.
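A minimal sketch of what this looks like for the movies example (titles, locations, and dates are illustrative assumptions):

// release dates per country become an array of key-value pairs
db.movies.insertOne({
    title: "Star Wars",
    director: "George Lucas",
    releases: [
        { location: "USA",    date: ISODate("1977-05-25") },
        { location: "France", date: ISODate("1977-10-19") }
    ]
});

// one compound index on the array fields replaces one index per country-specific field
db.movies.createIndex({ "releases.location": 1, "releases.date": 1 });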
▪ Useful when
▪ there is a subset of fields that
share common characteristics
▪ the fields we need to sort on are
only found in a small subset of
documents
▪ Pros
▪ fewer indexes are needed, e.g.,
{"releases.location": 1,
"releases.date": 1}
▪ queries become simpler to write
and are generally faster
▪ Example
▪ product catalog
source: https://www.mongodb.com/blog/post/building-with-patterns-the-attribute-pattern
▪ With data coming in as a stream over a period
of time (time series data) we may be inclined
to store each measurement in its own
document, as if we were using a relational
database.
▪ We could end up having to index sensor_id
and timestamp for every single measurement
to enable rapid access.
▪ We can "bucket" this data, by time, into
documents that hold the measurements from
a particular time span. We can also
programmatically add additional information
to each of these "buckets".
▪ Benefits in terms of index size savings,
potential query simplification, and the ability
to use that pre-aggregated data in our
documents.
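A minimal sketch of such a bucket for sensor readings (collection, field names, and the one-hour span are illustrative assumptions):

// one document holds all measurements of a sensor for a one-hour span,
// plus pre-aggregated values (e.g., average temperature = sum_temperature / count)
db.sensor_buckets.updateOne(
    { sensor_id: 12345,
      start: ISODate("2024-01-01T10:00:00Z"),
      end:   ISODate("2024-01-01T11:00:00Z") },
    { $push: { measurements: { timestamp: ISODate("2024-01-01T10:42:00Z"), temperature: 21.3 } },
      $inc:  { count: 1, sum_temperature: 21.3 } },
    { upsert: true }   // creates the bucket document the first time a reading falls in this span
);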
▪ Useful when
▪ needing to manage streaming data
▪ time-series
▪ real-time analytics
▪ Internet of Things (IoT)
▪ Pros
▪ reduces the overall number of
documents in a collection
▪ improves index performance
▪ can simplify data access by leveraging
pre-aggregation, e.g., average
temperature = sum/count
▪ Example
▪ IoT, time series
▪ The usefulness of data becomes much
more apparent when we can compute
values from it.
▪ What's the total sales revenue of …?
▪ How many viewers watched …?
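A minimal sketch of the Computed pattern for the viewers question (collection and field names are illustrative assumptions):

// every screening is stored as usual ...
db.screenings.insertOne({ movie_id: "tt0076759", date: ISODate("2024-01-01"),
                          viewers: 1250, revenue: 10000 });

// ... and a per-movie summary is updated at the same time, so the totals
// are a single document read instead of an aggregation over all screenings
db.movie_totals.updateOne(
    { _id: "tt0076759" },
    { $inc: { total_viewers: 1250, total_revenue: 10000, num_screenings: 1 } },
    { upsert: true }
);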
In an e-commerce application
▪ the order
▪ the customer
▪ the inventory
are separate logical entities
▪ However, the full retrieval of an order
requires joining data from different entities
▪ A customer can have N orders,
creating a 1-N relationship
▪ Embedding all the customer
information inside each order
▪ reduces the JOIN operations
▪ results in a lot of duplicated
information
▪ not all the customer data may
actually be needed
Instead of embedding (i.e., duplicating) all the data of an external entity (i.e., another
document), we only copy the fields we access frequently.
Instead of including a reference to join the information, we embed only the highest-priority,
most frequently accessed fields (see the sketch below).
▪ Useful when
▪ your application is experiencing lots of JOIN operations to bring together frequently accessed data
▪ Pros
▪ improves performance when there are
a lot of join operations
▪ faster reads and a reduction in the
complexity of data fetching
▪ Cons
▪ data duplication; it works best if such
data rarely change (e.g., user-id, name)
▪ Sometimes duplication of data is better
because you keep the historical values
(e.g., shipping address of the order)
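A minimal sketch of an order document using the Extended Reference pattern (field names and values are illustrative assumptions):

// the order keeps a reference to the customer plus a copy of the few
// customer fields that are read with almost every order
db.orders.insertOne({
    _id: 4567,
    date: ISODate("2024-01-01"),
    items: [ { sku: "abc123", qty: 2, price: 25.0 } ],
    customer: {
        customer_id: 789,                  // reference back to the full customer document
        name: "Ada Lovelace",              // duplicated, rarely-changing field
        shipping_address: "12 Example St"  // duplication kept on purpose: preserves the historical value
    }
});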
For further information on the content of these slides,
please refer to the book
https://flowygo.com/en/projects/design-with-mongodb/