4 Pattern in MongoDB1

The document discusses data modeling patterns for NoSQL databases like MongoDB. It describes the attribute pattern, which helps organize common and uncommon fields across documents to potentially reduce the number of indexes needed. The pattern involves transforming key-value properties into an array of documents. An example shows how different product types could use the attribute pattern to store varying size attributes in a standardized way. Maintaining data integrity and consistency when using patterns is also addressed.

Uploaded by

Mạnh Nguyễn
mongoDB

Lecturer: Từ Thị Xuân Hiền


Contents
▪ What are Patterns?
▪ Significance of Data Modeling Patterns
▪ Patterns in NOSQL Data Modeling
What are Patterns?
▪ Building Blocks
▪ Identified by our Consulting
Engineers helping customers for
the last 12 years.

▪ Common Language
▪ Data Architects and Engineers
can easily reference the same
things
What are Patterns?
▪ Patterns are the most powerful tool for designing schemas
for MongoDB and NoSQL.
▪ Patterns are not full solutions to problems. They are smaller sections of those solutions.
▪ Patterns are reusable units of knowledge.
▪ What design patterns do for software architecture, these patterns do for data modeling and schema design in document databases.
What can patterns do for you?
▪ Improve Performance
▪ By using no more resources
than you should
▪ Simplify the access to the data
▪ By grouping and pre-arranging
data in a simpler form
Patterns in Schema Design - MongoDB
▪ Benefits of Patterns
▪ Patterns help to optimize large documents, for example with the Subset pattern.
▪ The Computed pattern avoids repeated calculations.
▪ Patterns help you handle changes to the system implementation quickly.
▪ Patterns serve as a common language for teams working on schema designs.
▪ Having clear patterns and understanding when and how to use them reduces errors in the data model for MongoDB and makes the process more predictable.
Handling Duplication, Staleness and Integrity

▪ Using patterns may raise some concerns:
▪ Duplication
▪ Duplicating data across documents
▪ Data Staleness
▪ Accepting staleness in some pieces of data
▪ Data Integrity Issues
▪ Writing extra application-side logic to ensure referential integrity
Handling Duplication
▪ Duplication may cause inconsistency when you change one piece of data while its duplicate is left unchanged.
▪ Cause of duplication: embedding information into a given document for faster access.
▪ Concern:
▪ Duplication makes handling changes to duplicated information a challenge of correctness and consistency, because multiple documents in different collections may need to be updated.
Handling Duplication
▪ Duplication is the solution:
In some cases, duplication is
better than no duplication.
▪ Example:
▪ Let's link orders of products to
the address of the customer
that placed the order by using a
reference to a customer
document.
Handling Duplication
▪ Duplication is the solution:
▪ Example (cont.):
▪ With a reference, updating the address for this customer also changes the address shown for already fulfilled shipments, that is, orders that have already been delivered to the customer.
▪ This is not the desired behavior.
Handling Duplication
▪ Duplication is the solution
▪ Example (cont.):
▪ Embedding a copy of the address within the shipment document ensures we keep the correct value.
▪ When the customer moves, we add another shipping address on file.
▪ Using this new address for new orders does not affect the already shipped orders.
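The address example above can be sketched in plain Python, with dictionaries standing in for MongoDB documents. The field names (`addresses`, `shipping_address`, `customer_id`) and the sample values are assumptions made for illustration; they do not come from the slides.

```python
import copy

# Hypothetical customer document with a list of addresses on file.
customer = {
    "_id": "c1",
    "name": "Ana",
    "addresses": [{"street": "1 Old Road", "city": "Lyon"}],
}


def create_shipment(shipment_id, customer_doc):
    """Build a shipment that embeds a copy of the customer's latest address."""
    current_address = customer_doc["addresses"][-1]
    return {
        "_id": shipment_id,
        "customer_id": customer_doc["_id"],
        # deepcopy so later edits to the customer never reach this snapshot
        "shipping_address": copy.deepcopy(current_address),
    }


first = create_shipment("s1", customer)

# The customer moves: add a new address instead of overwriting the old one.
customer["addresses"].append({"street": "9 New Street", "city": "Paris"})
second = create_shipment("s2", customer)

print(first["shipping_address"]["city"])
print(second["shipping_address"]["city"])
```

The first shipment keeps the address it was shipped to, while the second one picks up the new address. Here the duplicated address is a deliberate snapshot rather than a consistency problem.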
Handling Duplication
▪ Duplication has minimal effect: another situation to consider is when the copied data never changes.
▪ Example:
▪ Let's say we want to model movies and actors.
▪ Movies have many actors and actors play in many movies. So this
is a typical many-to-many relationship
▪ Avoiding duplication in a many-to-many relationship requires us to
keep two collections and create references between the documents
in the two collections.
Handling Duplication
▪ Duplication has minimal effect:
▪ Example (cont.)
▪ If we list the actors in a given movie document, we are creating duplication.
▪ However, once the movie is released, the list of actors does not change.
▪ So duplicating this unchanging information is perfectly acceptable.
Handling Duplication
▪ Duplication should be handled: this is the duplication of a piece of information that needs to, or may, change with time.
▪ Example:
▪ The revenues for a given movie, which are stored within the movie, and the revenues earned per screening.
▪ In this case, we have duplication between the sum stored in the movie document and the revenues stored in the screening documents used to compute that sum.
Handling Duplication
▪ Duplication should be handled:
▪ Example (cont.):
▪ In this situation we must keep multiple values in sync over time, which raises the question: does the benefit of having this sum precomputed outweigh the cost and trouble of keeping it in sync?
▪ If yes, then use the Computed pattern.
▪ If we want the sum to stay synchronized, it may be the responsibility of the application to keep it in sync: whenever the application writes a new document to the collection or updates the value of an existing document, it must update the sum.
▪ But how often should we actually recalculate the sum?
▪ This brings us to the next concern we must consider when using patterns: staleness.
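A minimal sketch of keeping the precomputed sum in sync from the application, using in-memory Python structures in place of the movies and screenings collections. The names `total_revenue`, `add_screening` and `recompute_total` are hypothetical. On a real MongoDB deployment the increment would typically be an update with the `$inc` operator, which is not shown here.

```python
# In-memory stand-ins for the movies and screenings collections.
movies = {"m1": {"_id": "m1", "title": "Example Movie", "total_revenue": 0}}
screenings = []


def add_screening(movie_id, revenue):
    """Write a screening and update the movie's precomputed sum with it."""
    screenings.append({"movie_id": movie_id, "revenue": revenue})
    movies[movie_id]["total_revenue"] += revenue


def recompute_total(movie_id):
    """Recalculate the sum from the source documents (the expensive path)."""
    return sum(s["revenue"] for s in screenings if s["movie_id"] == movie_id)


add_screening("m1", 1200)
add_screening("m1", 800)

# The stored sum and the recomputed sum agree because every write updated both.
print(movies["m1"]["total_revenue"], recompute_total("m1"))
```

Reads use the stored `total_revenue` and never scan the screenings. The cost is that every write path must remember to update the sum.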
Handling Staleness
▪ Due to globalization and the world being flatter, systems are now accessed by millions of concurrent users, which makes it more challenging to display up-to-the-second data to all of them.
▪ Example:
▪ The availability of a product shown to a user may still have to be confirmed at checkout time.
▪ The prices of plane tickets or hotel rooms may change right before you book them.
Handling Staleness
▪ Why do we get this staleness?
▪ New events come along at such a fast rate that updating data constantly can cause performance issues.
▪ The main concern when solving this issue is data quality and reliability.
▪ The issues:
▪ How long can the user tolerate not seeing the most up-to-date value for a specific field?
▪ Analytic queries are often run on secondary nodes, which may have stale data.
Handling Staleness
▪ Resolve Staleness:
▪ In the world of big data, the usual way to handle staleness is to batch updates.
▪ Change streams let an application access and respond to data changes, either in real time or in a delayed mode.
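Batching updates can be sketched as follows. Incoming events are accumulated in memory and applied to the stored documents in one pass, so the stored value is stale between flushes. This is an illustrative sketch only: the `pending` buffer and the `record_sale` and `flush` helpers are hypothetical names, and a real system would trigger the flush on a timer or from a change stream consumer.

```python
from collections import defaultdict

# Stored documents (stand-in for a collection) and a buffer of unapplied events.
movies = {"m1": {"total_revenue": 0}, "m2": {"total_revenue": 0}}
pending = defaultdict(int)  # movie_id -> revenue not yet written


def record_sale(movie_id, amount):
    """Accept an event cheaply; the stored document is not touched yet."""
    pending[movie_id] += amount


def flush():
    """Apply all pending increments at once; return how many documents changed."""
    updated = 0
    for movie_id, amount in pending.items():
        movies[movie_id]["total_revenue"] += amount
        updated += 1
    pending.clear()
    return updated


for amount in (10, 15, 20):
    record_sale("m1", amount)
record_sale("m2", 30)

stale_value = movies["m1"]["total_revenue"]  # read before the flush
documents_written = flush()
fresh_value = movies["m1"]["total_revenue"]  # read after the flush
```

Four events result in two document writes. How often `flush` runs is exactly the tolerated staleness the slides ask about.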
Handling Referential Integrity
▪ Referential integrity has some similarities to staleness.
▪ Why?
▪ References link information between documents (or tables).
▪ MongoDB has no built-in support for cascading deletes.
▪ Concern?
▪ A challenge for correctness and consistency
Handling Referential Integrity
▪ Resolve Referential Integrity
▪ Change Streams
▪ For delayed referential integrity, we can rely on change streams.
▪ Single Document
▪ We can avoid using references by embedding information in a single document instead of linking it.
▪ Multi-Document Transactions
▪ We can use MongoDB multi-document transactions to update multiple documents at once.
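Because deletes do not cascade, the application has to remove or repair references itself. The sketch below uses plain Python lists and dictionaries in place of collections, with hypothetical `customers` and `orders` data, to contrast an application-side cascading delete with a naive delete that leaves a dangling reference.

```python
# Stand-ins for two collections linked by a reference (orders -> customers).
customers = {"c1": {"name": "Ana"}, "c2": {"name": "Ben"}}
orders = [
    {"_id": "o1", "customer_id": "c1"},
    {"_id": "o2", "customer_id": "c2"},
    {"_id": "o3", "customer_id": "c1"},
]


def find_orphans():
    """Return ids of orders whose referenced customer no longer exists."""
    return [o["_id"] for o in orders if o["customer_id"] not in customers]


def delete_customer_cascade(customer_id):
    """Delete a customer and every order that references it."""
    customers.pop(customer_id, None)
    orders[:] = [o for o in orders if o["customer_id"] != customer_id]


delete_customer_cascade("c1")
orphans_after_cascade = find_orphans()

del customers["c2"]  # naive delete: the referencing order is left behind
orphans_after_naive = find_orphans()
```

A check like `find_orphans` is the kind of delayed repair job that could be driven by change streams. In a real deployment the two writes in `delete_customer_cascade` would go inside one multi-document transaction.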
Recap
▪ For a given piece of data
▪ Should or could the information be duplicated or not?
▪ Resolve with bulk updates
▪ What is the tolerated or acceptable staleness?
▪ Resolve with updates based on change streams
▪ Which pieces of data require referential integrity?
▪ Resolve or prevent the inconsistencies with change stream or
transactions
PATTERNS IN NOSQL DATA MODELING
Patterns in NOSQL Data Modeling
1. Attribute Pattern
2. Extended Reference Pattern
3. Subset Pattern
4. Computed Pattern
5. Bucket Pattern
6. Schema Versioning Pattern
7. Tree Patterns
8. Polymorphic Pattern
9. Other Patterns
Attribute Pattern
▪ The attribute pattern is orthogonal to the polymorphic pattern. It helps to organize fields that have common characteristics you want to search across, fields that are rare, or an influx of unpredictable properties that you need to manage.
▪ The attribute pattern potentially reduces the number of indexes.
▪ To use the attribute pattern, transpose the keys/values of the desired properties into an array of documents.
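The transposition step can be sketched as a small Python function that moves selected fields of a document into an array of `{"k": ..., "v": ...}` sub-documents. The function name `to_attribute_array`, the array name `attributes` and the sample product are hypothetical and used only for illustration.

```python
def to_attribute_array(doc, fields, array_name="attributes"):
    """Return a copy of doc with the given fields moved into a k/v array."""
    # Keep every field that is not being transposed.
    result = {key: value for key, value in doc.items() if key not in fields}
    # Turn each selected field into one {"k": name, "v": value} sub-document.
    result[array_name] = [
        {"k": name, "v": doc[name]} for name in fields if name in doc
    ]
    return result


product = {"sku": "A1", "brand": "Acme", "color": "Blue", "size": "large"}
transposed = to_attribute_array(product, ["color", "size"])
print(transposed)
```

Fields missing from a given document are simply skipped, so rare or unpredictable properties do not need special handling.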
Attribute Pattern
▪ Example:
▪ Products have identification fields, like brand, manufacturer, sub-brand, and enterprise, that are common across the majority of products.
▪ Products also have additional fields that are common across many products, like color and size. However, these values may have different units and mean different things for different products.
Attribute Pattern
▪ Example (cont.)
▪ The size of a beverage made in the US may be measured in ounces, while the same drink in Europe will be measured in milliliters.
▪ For the MongoDB charger, the size is measured according to its three dimensions.
▪ The size of a Cherry Coke six-pack could be 12 ounces for a single can, or six times 12 ounces (72 ounces) for the full six-pack.
▪ We could also list the physical dimensions and report the amount of liquid in that field.
Attribute Pattern
▪ Example (cont.)
▪ The third list of fields is the set of fields that are not going to exist in all the products. They may appear in a new description that your supplier is providing you.
▪ For a sugary drink, you may want to know the type of sweetener, while for a battery, you are more interested in its specifications, like the amount of electricity provided.
▪ Schema and indexing concerns arise with this third list of fields.
▪ To search effectively on one of those fields, you need an index.
Attribute Pattern
▪ Example (cont.)
▪ Searching on the capacity for my battery would require an index.
▪ Searching on the voltage output of my battery would also require an
index.
▪ If you have tons of fields, you may have a lot of indexes.
Using the attribute pattern
▪ How to use the attribute pattern
▪ Identify the list of fields you want to transpose.
▪ For each field and its associated value, create a key/value pair.
▪ Example:
▪ We transpose the fields input, output, and capacity.
Using the attribute pattern
▪ Example (cont.)
▪ For consistency, let's use K for key and V for value, as some of our aggregation functions do.
▪ Under the field name K, we put the name of the original field as the value.
▪ For the first one, the field was named "input", so that becomes the value for K.
▪ The value for input was 5 volts / 1,300 milliamps, so this is the value for the field V.
Using the attribute pattern
▪ Example (cont.)
▪ Repeating the same thing for the original fields output and capacity, we get three documents, each with a K and a V.
Using the attribute pattern
▪ Example (cont.)
▪ Because of their similar shape, it is easy to place them together under an array named "add_specs" (additional specs).
▪ Note that for the third field, we not only transpose it to a key/value pair but also add a third field called U to store the units separately.
▪ This third field qualifies the relationship between the K and the V.
Using the attribute pattern
▪ Example (cont.)
▪ The last thing to do is to create an index for all that info.
▪ This is done by creating an index on "add_specs.k" and "add_specs.v".
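Putting the steps together, the charger document might look like the sketch below. Only the field names `input`, `output`, `capacity` and `add_specs` and the input value come from the slides. The other values are illustrative assumptions, and `has_spec` is a hypothetical helper that imitates, in plain Python, a query matching `k` and `v` on the same array element.

```python
charger = {
    "name": "MongoDB charger",
    "add_specs": [
        {"k": "input", "v": "5V/1300 mA"},
        {"k": "output", "v": "5V/1A"},             # illustrative value
        {"k": "capacity", "v": 4200, "u": "mAh"},  # illustrative value and unit
    ],
}


def has_spec(doc, key, value):
    """True if a single add_specs element matches both the key and the value."""
    return any(
        spec["k"] == key and spec["v"] == value
        for spec in doc.get("add_specs", [])
    )


# One compound index covers every transposed field; 1 means ascending order.
index_keys = [("add_specs.k", 1), ("add_specs.v", 1)]

print(has_spec(charger, "capacity", 4200))
print(has_spec(charger, "input", 4200))
```

With PyMongo, a list of `(field, direction)` pairs like `index_keys` is the form accepted by `Collection.create_index`. That call is left out so the sketch runs without a database.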
Fields that share Common Characteristics

▪ Another scenario: we have a document representing a movie.
▪ In the document, there are several fields to keep track of when the movie was released.
▪ In this case, we keep track of the dates when a movie was released in the USA, in Mexico, and in France, and when it appeared at the San Jose movie festival.
▪ One thing to observe about those fields is that they all hold the same type of value: a release date.
Fields that share Common Characteristics

▪ Question: What if we want to find all the movies released between two dates across all countries?
▪ We would have to list every country and festival,
▪ run a separate range query for each of them, and then aggregate all the results.
Fields that share Common Characteristics

▪ Using the attribute pattern and transforming those release dates into an array of key/value pairs, we can replace those separate queries with a single query over the array.
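The effect on the query can be sketched in plain Python. The `release_*` field names, the array name `releases`, the dates and both helper functions are assumptions for illustration. In MongoDB the same check would be a single range filter on `releases.v`, ideally inside `$elemMatch` so that both bounds apply to the same array element.

```python
from datetime import date

# Hypothetical movie document with one field per release location.
movie = {
    "title": "Example Movie",
    "release_USA": date(2019, 5, 20),
    "release_Mexico": date(2019, 5, 28),
    "release_France": date(2019, 6, 1),
    "release_festival_San_Jose": date(2019, 3, 2),
}


def transpose_releases(doc, prefix="release_"):
    """Move every release_* field into a 'releases' array of k/v pairs."""
    out = {k: v for k, v in doc.items() if not k.startswith(prefix)}
    out["releases"] = [
        {"k": k, "v": v} for k, v in doc.items() if k.startswith(prefix)
    ]
    return out


def released_between(doc, start, end):
    """One range check over all locations instead of one check per field."""
    return any(start <= release["v"] <= end for release in doc["releases"])


transposed = transpose_releases(movie)
in_may = released_between(transposed, date(2019, 5, 1), date(2019, 5, 31))
in_2020 = released_between(transposed, date(2020, 1, 1), date(2020, 12, 31))
```

Adding a new country or festival now means adding one more array element. Neither the query nor the index has to change.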
Fields that share Common Characteristics

▪ Problem
▪ The attribute pattern addresses the problem of having a lot of similar fields in a document.
▪ Search across many fields at once
▪ Many similar fields are present in only a subset of the documents.
▪ Solution
▪ Break the field/value into a sub-document with:
▪ fieldA: field
▪ fieldB: value
▪ Example:
▪ {"color": "Blue", "size": "large"}
▪ becomes [{"k": "color", "v": "Blue"}, {"k": "size", "v": "large"}]
Fields that share Common Characteristics

▪ Use case examples
▪ Characteristics of a product
▪ Set of fields all having the same value type
▪ List of dates
▪ With movies, where a different location can have a different release date
▪ Benefits and trade-offs
▪ Easier to index
▪ Allows for non-deterministic field names
▪ Ability to qualify the relationship between the original field and value
Summary
▪ The attribute pattern
▪ Orthogonal Pattern to polymorphism
▪ Add organization for
▪ Common characteristics
▪ Rare/unpredictable fields
▪ Reduces number of indexes
▪ Transpose keys/values as
▪ Array of sub-documents of form:
▪ {"k": "key", "v": "value"}
Lab: Apply the Attribute Pattern
Problem: User Story
▪ The museum we work at has grown from a local attraction to one that is seen as having very popular items.
▪ For this reason, other museums around the world have started exchanging pieces of art with our museum.
▪ Our database was tracking whether our pieces are on display and where they are in the museum.
▪ To track the pieces we started exchanging with other museums, we added an array called events, in which we created an entry for each date a piece was loaned and the museum it was loaned to.
Lab: Apply the Attribute Pattern
▪ Problem: User Story
▪ The problem with this design is that we need to build a new index
every time there is a new museum with which we start exchanging
pieces.
▪ For example, when we started working with The Prado in Madrid, we
needed to add this index:

{ "events.prado" : 1 }
Lab: Apply the Attribute Pattern
▪ Task: To address this issue, you've decided to change the
schema to:
▪ Use a single index on all event dates.
▪ Transform the field that tracks the date when a piece was acquired,
date_acquisition, so that it is also indexed with the values above.
▪ To ensure the validator can verify your solution, use "k" and "v" as
field names if needed.
▪ To complete this lab:
▪ Modify the following schema to incorporate the above changes:
Lab: Apply the Attribute Pattern
▪ Save your new schema to a file named pattern_attribute.json.
▪ Validate your answer on Windows by running in the CMD
shell:
validate_m320 pattern_attribute --file pattern_attribute.json
