
▪ Unlike tables in SQL databases, collections do not require their documents to share the same schema, i.e., the following properties may vary from document to document:
▪ the set of fields and
▪ the data type of the same field

▪ In practice, however, documents in a collection share a similar structure


▪ Which is the best document structure?
▪ Are there patterns to address common applications?

▪ It is possible to enforce document validation rules for a collection during update and insert operations

▪ A write operation is atomic on the level of a single document, even if the
operation modifies multiple embedded documents within a single document
▪ When a single write operation (e.g. db.collection.updateMany()) modifies multiple
documents, the modification of each document is atomic, but the operation as a
whole is not atomic
▪ For situations requiring atomicity of reads and writes to multiple documents (in a
single or multiple collections), MongoDB supports multi-document transactions:
▪ in version 4.0, MongoDB supports multi-document transactions on replica sets
▪ in version 4.2, MongoDB introduces distributed transactions, which adds support for
multi-document transactions on sharded clusters and incorporates the existing support
for multi-document transactions on replica sets

MongoDB can perform schema validation during updates and insertions. Existing
documents do not undergo validation checks until modification.
▪ validator: specifies validation rules or expressions for the collection
▪ validationLevel: determines how strictly MongoDB applies validation rules to existing documents during an update
▪ strict, the default, applies validation rules to all inserts and updates
▪ moderate applies validation rules to inserts and to updates of existing documents that already fulfill the validation criteria; updates to existing documents that do not fulfill the criteria are not checked
▪ validationAction: determines whether MongoDB should raise an error and reject documents that violate the validation rules, or warn about the violations in the log but allow invalid documents

db.createCollection( <name>,
{validator: <document>,
validationLevel: <string>,
validationAction: <string>,
})
▪ Starting in version 3.6, MongoDB supports JSON Schema validation (recommended)

▪ To specify JSON Schema validation, use the $jsonSchema operator


db.createCollection("students",
{ validator: {
$jsonSchema: {
bsonType: "object",
required: [ "name", "year" ],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
year: {
bsonType: "int",
minimum: 2000,
maximum: 2099,
description: "must be an integer in [2000, 2099] and is required»
}
}
}
}
})
In addition to JSON Schema validation that uses the $jsonSchema query operator,
MongoDB supports validation with other query operators, except for:
▪ $near, $nearSphere, $text, and $where operators
▪ Note: users can bypass document validation with the bypassDocumentValidation option.

db.createCollection( "contacts",
{ validator: {
$or: [
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb\.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
}
})
▪ Atomicity
▪ Embedded Data Model vs Multi-Document Transaction

▪ Sharding
▪ selecting the proper shard key has significant implications for performance, and can
enable or prevent query isolation and increased write capacity
▪ Indexes
▪ each index requires at least 8 kB of data space.
▪ adding an index has some negative performance impact for write operations
▪ collections with high read-to-write ratio often benefit from additional indexes
▪ when active, each index consumes disk space and memory

▪ Data Lifecycle Management
▪ the Time to Live (TTL) feature of collections expires documents after a period of time, as sketched below
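A minimal TTL sketch, assuming a hypothetical eventlog collection: documents are removed automatically about 3600 seconds after their createdAt timestamp.

// TTL index: MongoDB deletes documents once createdAt is older than expireAfterSeconds.
db.eventlog.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }
)
db.eventlog.insertOne({ createdAt: new Date(), event: "login" })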

▪ Approximation
▪ Attribute
▪ Bucket
▪ Computed
▪ Document Versioning
▪ Extended Reference
▪ Outlier
▪ Pre-allocation
▪ Polymorphic
▪ Schema Versioning
▪ Subset
▪ Tree
source: https://www.mongodb.com/blog/post/building-with-patterns-the-extended-reference-pattern
▪ Let's say that our city planning strategy is based on needing one fire engine per 10,000 people.
▪ Instead of updating the population in the database with every change, we could build in a counter and only update by 100, 1% of the time.
▪ Another option might be to have a function that returns a random number. If, for example, that function returns a number from 0 to 100, it will return 0 around 1% of the time. When that condition is met, we increase the counter by 100.
▪ Our writes are significantly reduced here, in this example by 99%.
▪ When working with large amounts of data, the impact on performance of write operations is large too.
Examples
▪ population counter
▪ movie website counter
source: https://www.mongodb.com/blog/post/building-with-patterns-the-approximation-pattern
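A minimal sketch of the random-increment idea in mongosh-style JavaScript, assuming a hypothetical cities collection with a population counter (all names are illustrative):

// Instead of $inc: { population: 1 } on every change,
// increment by 100 roughly 1% of the time.
function recordPersonAdded(cityId) {
  // Math.floor(Math.random() * 100) === 0 holds about 1% of the time.
  if (Math.floor(Math.random() * 100) === 0) {
    db.cities.updateOne({ _id: cityId }, { $inc: { population: 100 } });
  }
}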

▪ Useful when
▪ expensive calculations are
frequently done
▪ the precision of those calculations
is not the highest priority
▪ Pros
▪ fewer writes to the database
▪ no schema change required

▪ Cons
▪ exact numbers aren’t being represented
▪ implementation must be done in the application
▪ Examples
▪ population counter
▪ movie website counter
source: https://www.mongodb.com/blog/post/building-with-patterns-the-approximation-pattern

▪ Let’s think about a collection of
movies.
▪ The documents will likely have
similar fields involved across all
the documents:
▪ title, director, producer, cast, etc.

▪ Let’s say we want to search on the release date: which release date? Movies are often released on different dates in different countries.
▪ A search for a release date will require looking across many fields at once; we’d need several indexes on our movies collection.
▪ Move this subset of information into an array of key-value pairs and reduce the indexing needs, as sketched below.
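A minimal sketch of the transformation; the releases, location, and date field names are illustrative assumptions following the MongoDB blog post:

// Before: one field per country (release_USA, release_France, ...),
// each needing its own index.
// After: a single array of key-value pairs.
db.movies.insertOne({
  title: "Star Wars",
  releases: [
    { location: "USA", date: ISODate("1977-05-25") },
    { location: "France", date: ISODate("1977-10-19") }
  ]
})
// One compound index now covers every release date.
db.movies.createIndex({ "releases.location": 1, "releases.date": 1 })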

▪ Useful when
▪ there is a subset of fields that
share common characteristics
▪ the fields we need to sort on are
only found in a small subset of
documents
▪ Pros
▪ fewer indexes are needed, e.g.,
{"releases.location": 1,
"releases.date": 1}
▪ queries become simpler to write
and are generally faster
▪ Example
▪ product catalog

Source: https://www.mongodb.com/blog/post/building-with-patterns-the-attribute-pattern
▪ With data coming in as a stream over a period
of time (time series data) we may be inclined
to store each measurement in its own
document, as if we were using a relational
database.
▪ We could end up having to index sensor_id
and timestamp for every single measurement
to enable rapid access.
▪ We can "bucket" this data, by time, into
documents that hold the measurements from
a particular time span. We can also
programmatically add additional information
to each of these "buckets".
▪ This brings benefits in terms of index size savings, potential query simplification, and the ability to use that pre-aggregated data in our documents, as sketched below.
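A minimal sketch of a bucketed time-series document; the sensor_readings collection and its field names are illustrative assumptions following the MongoDB blog post:

// One document holds an hour of measurements for one sensor,
// plus pre-aggregated fields maintained on each insert.
db.sensor_readings.insertOne({
  sensor_id: 12345,
  start_date: ISODate("2019-01-31T10:00:00Z"),
  end_date: ISODate("2019-01-31T11:00:00Z"),
  measurements: [
    { timestamp: ISODate("2019-01-31T10:00:00Z"), temperature: 40 },
    { timestamp: ISODate("2019-01-31T10:01:00Z"), temperature: 40 }
  ],
  transaction_count: 2, // pre-aggregated: number of measurements
  sum_temperature: 80   // pre-aggregated: average = sum_temperature / transaction_count
})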

▪ Useful when
▪ needing to manage streaming data
▪ time-series
▪ real-time analytics
▪ Internet of Things (IoT)

▪ Pros
▪ reduces the overall number of
documents in a collection
▪ improves index performance
▪ can simplify data access by leveraging
pre-aggregation, e.g., average
temperature = sum/count
▪ Example
▪ IoT, time series

▪ The usefulness of data becomes much
more apparent when we can compute
values from it.
▪ What's the total sales revenue of …?
▪ How many viewers watched …?

▪ These types of questions can be answered from data stored in a database, but the answers must be computed.
▪ Running these computations every time they are requested, though, becomes a highly resource-intensive process, especially on huge datasets.
▪ Example: a movie review website; every time we visit a movie webpage, it provides information about the number of cinemas the movie has played in, the total number of people who've watched it, and the overall revenue.
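A minimal sketch of the pattern for this movie example; the screenings and movies collections and their field names are illustrative assumptions:

// Record one screening, then fold its numbers into the movie's running totals.
db.screenings.insertOne({
  movie_id: "tt0076759",
  theater: "Odeon",
  num_viewers: 1200,
  revenue: 12000
})
db.movies.updateOne(
  { _id: "tt0076759" },
  { $inc: { num_screenings: 1, total_viewers: 1200, total_revenue: 12000 } }
)
// Rendering the movie page is now a single document fetch, no aggregation needed.
db.movies.findOne({ _id: "tt0076759" })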
▪ Useful when
▪ very read-intensive data access patterns
▪ data needs to be repeatedly computed by the
application
▪ computations are done in conjunction with updates or at defined intervals, e.g., every hour
▪ Pros
▪ reduction in CPU workload for frequent
computations
▪ Cons
▪ it may be difficult to identify the need for this
pattern
▪ Examples
▪ revenue or viewer
▪ time series data
▪ product catalogs
▪ In most cases we query only the latest
state of the data.
▪ What about situations in which we need to
query previous states of the data?
▪ What if we need to have some functionality
of version control of our documents?

▪ Goal: keep the version history of documents available and usable
▪ The pattern makes some assumptions about the data in the database and the data access patterns of the application:
▪ Limited number of revisions
▪ Limited number of versioned documents
▪ Most of the queries performed are done on
the most recent version of the document
▪ An insurance company might make use of this
pattern.
▪ Each customer has a “standard” policy and a
second portion that is specific to that customer.
▪ This second portion would contain a list of policy
add-ons and a list of specific items that are being
insured.

▪ As the customer changes which specific items are insured, this information needs to be updated while the historical information needs to remain available as well.
▪ When a customer purchases a new item and
wants it added to their policy, a new
policy_revision document is created using the
current_policy document.
▪ A version field in the document is then incremented to identify it as the latest revision, and the customer's changes are added.
18
The newest revision will be stored in the
current_policies collection and the old version
will be written to the policy_revisions collection.
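A minimal sketch of this two-collection flow; the policy_id, revision, and insured_items fields are illustrative assumptions:

// 1. Copy the current document into the history collection.
const current = db.current_policies.findOne({ policy_id: "P-1234" });
db.policy_revisions.insertOne(current);
// 2. Apply the customer's change and bump the revision number in place.
db.current_policies.updateOne(
  { policy_id: "P-1234" },
  {
    $inc: { revision: 1 },
    $push: { insured_items: { item: "golf clubs", value: 1500 } }
  }
);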
▪ Pros
▪ easy to implement, even on existing
systems
▪ no performance impact on queries on the
latest revision
▪ Cons
▪ doubles the number of writes
▪ queries need to target the correct
collection
▪ Examples
▪ financial industries
▪ healthcare industries
source: https://www.mongodb.com/blog/post/building-with-patterns-the-document-versioning-pattern

In an e-commerce application
▪ the order
▪ the customer
▪ the inventory
are separate logical entities.
▪ However, the full retrieval of an order requires joining data from different entities.
▪ A customer can have N orders, creating a 1-N relationship.
▪ Embedding all the customer information inside each order
▪ reduces the JOIN operations
▪ results in a lot of duplicated information
▪ not all the customer data may actually be needed
Instead of embedding (i.e., duplicating) all the data of an external entity (i.e., another document), we copy only the fields we access frequently. Instead of including just a reference and joining the information at query time, we embed the highest-priority, most frequently accessed fields.
▪ Useful when
▪ your application is experiencing lots of JOIN operations to bring together frequently accessed data

▪ Pros
▪ improves performance when there are
a lot of join operations
▪ faster reads and a reduction in the
complexity of data fetching
▪ Cons
▪ data duplication; it works best if such data rarely changes (e.g., user id, name)
▪ sometimes duplication of data is better, because it preserves the historical values (e.g., the shipping address of an order)
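A minimal sketch of an order embedding only a frequently accessed subset of the customer; all collection and field names are illustrative assumptions:

// The full customer document lives in its own collection.
db.customers.insertOne({
  _id: "C-42",
  name: "Ada Lovelace",
  email: "ada@example.com",
  loyalty_points: 730
})
// The order embeds only the high-priority customer fields (the "extended
// reference"), plus the id for the rare cases that need the rest.
db.orders.insertOne({
  _id: "O-1001",
  date: ISODate("2024-03-01"),
  customer: {
    customer_id: "C-42",
    name: "Ada Lovelace",
    shipping_address: "12 Analytical St, London"
  },
  items: [ { sku: "BOOK-7", qty: 1, price: 29.90 } ]
})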
For further information on the content of these slides, please refer to the book
“Design with MongoDB”
Best Models for Applications
by Alessandro Fiori

https://flowygo.com/en/projects/design-with-mongodb/
