
Data Modeling With MongoDB

This document discusses data modeling with MongoDB. It covers key considerations like linking vs embedding data and provides examples. The methodology involves iteratively defining entities and relationships, evaluating the application workload, and finalizing the data model with relevant design patterns. Linking is better if related data is often queried or changed separately, while embedding works for tightly-coupled data.

Data Modeling with MongoDB

Yulia Genkina
Curriculum Engineer @ MongoDB
Agenda

• Key Considerations
• Linking vs. Embedding
• Design Patterns
• Use Case Example
• Conclusion
Let’s Compare
RDBMS approach to data modeling vs. MongoDB
Modeling for RDBMS Concerns

Step 1: Define the Schema
Step 2: Develop the application and queries

Callouts stamped on the slide across its builds: "CORRECT", "DENORMALIZED?", "Data dictates"
Data Modeling with MongoDB

Develop the Application → Define the Data Model → Improve the Application → Improve the Data Model

• Many design options
• Designed for the usage pattern
• Data model evolution is easy

Improve the Application → Improve the Data Model
• Can evolve without any downtime
Key Considerations
For Data Modeling with MongoDB

There Is No Magic Formula, but There Is A Method

• Data model is defined at the application level
• Design is part of each phase of the application lifetime
• What affects the data model:
  o The data that your application needs
  o Application's read and write usage of the data
Data Modeling
Methodology to Achieve a Near Magic Almost Formula
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Evaluate the application workload:
• Data size
• A list of operations ranked by importance

Map out entities and their relationships:
• CRD: Collection Relationship Diagram (Link or Embed?)

Results:
• Data size
• Database queries and indexes
• Current operations and assumptions
Link vs. Embed
Which is the Right Decision and What Does it Mean?
What Can Be Linked?

Relationships:
• One-to-one
• One-to-many
• Many-to-many

Example: Entities and relationships in a Blog
• articles: title, date, text
• users: name, email
• tags: name, url
• categories: name, url
• comments: name, url

The diagram connects these entities with 1-to-N and N-to-N relationships.
One-to-One Linked

Book = { // either side can track
  "_id": 1,
  "title": "Harry Potter and the Methods of Rationality",
  "slug": "9781857150193-hpmor",
  "author": 1, // more fields follow…
}

Author = {
  "_id": 1,
  "firstName": "Eliezer",
  "lastName": "Yudkowsky",
  "book": 1, // more fields follow…
}
One-to-One Embedded

Book = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": {
"firstName": "Eliezer",
"lastName": "Yudkowsky"
},
// more fields follow…
}
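To make the difference between the two one-to-one forms concrete, here is a minimal in-memory sketch in plain JavaScript. The arrays `books` and `authors` only stand in for collections, and `findAuthorOfLinkedBook` is a hypothetical helper, not a MongoDB API. Against a real database the linked form would need a second query or an aggregation join.

```javascript
// In-memory stand-ins for two collections (assumption: no database involved).
const authors = [{ _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", book: 1 }];
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 },
];

// Linked: the book stores only the author's _id, so a second lookup is needed.
function findAuthorOfLinkedBook(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  return authors.find((a) => a._id === book.author) ?? null;
}

// Embedded: the author travels inside the book document, so one read is enough.
const embeddedBook = {
  _id: 1,
  title: "Harry Potter and the Methods of Rationality",
  author: { firstName: "Eliezer", lastName: "Yudkowsky" },
};

const linkedAuthor = findAuthorOfLinkedBook(1);
console.log(linkedAuthor.lastName);        // resolved through the link
console.log(embeddedBook.author.lastName); // read directly from the book
```

The embedded form saves a lookup, while the linked form keeps the author in one place if it is read or changed on its own.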
One-to-Many: Array in Parent

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [1, 5, 17],
// more fields follow…
}
One-to-Many: Scalar in Child

Book1 = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}

Book2 = {
"_id": 5,
"title": "How to Actually Change Your Mind",
"slug": "1939311179490-how-to-change",
"author": 1, // more fields follow…
}
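The two one-to-many layouts answer the question "which books belong to this author?" in different ways. The sketch below is an in-memory illustration in plain JavaScript; the arrays stand in for collections and both function names are hypothetical.

```javascript
// In-memory stand-ins for the two collections shown on the slides.
const authors = [
  { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", books: [1, 5, 17] },
];
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 },
  { _id: 5, title: "How to Actually Change Your Mind", author: 1 },
];

// Array in parent: read the parent's list of ids, then fetch each child.
function booksViaParentArray(authorId) {
  const author = authors.find((a) => a._id === authorId);
  if (!author) return [];
  return books.filter((b) => author.books.includes(b._id));
}

// Scalar in child: filter the children by the parent id they carry.
function booksViaChildScalar(authorId) {
  return books.filter((b) => b.author === authorId);
}

const viaParent = booksViaParentArray(1).map((b) => b._id);
const viaChild = booksViaChildScalar(1).map((b) => b._id);
console.log(viaParent, viaChild);
```

Book 17 is listed in the parent array but is not among the sample book documents, so neither function returns it here.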
Many-to-Many: Arrays on either side

Book = { // either side can track
  "_id": 5,
  "title": "Harry Potter and the Methods of Rationality",
  "slug": "9781857150193-hpmor",
  "authors": [1, 3], // more fields follow…
}

Author = {
  "_id": 1,
  "firstName": "Eliezer",
  "lastName": "Yudkowsky",
  "books": [5, 7], // more fields follow…
}
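When both sides of a many-to-many relationship keep an array of ids, the two arrays have to be kept in step. The following is a small sketch of that bookkeeping on plain objects; `link` and the `coAuthor` document are hypothetical and only illustrate the idea.

```javascript
// Record the relationship on both sides, without creating duplicate ids.
function link(book, author) {
  if (!book.authors.includes(author._id)) book.authors.push(author._id);
  if (!author.books.includes(book._id)) author.books.push(book._id);
}

const book = {
  _id: 5,
  title: "Harry Potter and the Methods of Rationality",
  authors: [1, 3],
};
const author = { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", books: [5, 7] };
// Hypothetical co-author whose "books" array has not been updated yet.
const coAuthor = { _id: 3, books: [] };

link(book, author);   // already linked on both sides: nothing changes
link(book, coAuthor); // adds the missing book id on the author side

console.log(book.authors, author.books, coAuthor.books);
```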
Embed All
• articles: title, date, text
  • tags []: name, url
  • categories []: name, url
  • comments []: name, url
  • users: name, email
• Queries by articles

Embed & Link
• articles: title, date, text
  • tags []: name, url
  • categories []: name, url
  • comments []: name, url
• users: name, email (separate collection, 1-to-N)
• Queries by articles or users

To Link or Embed?

• How often does the embedded information get accessed?
• Is the data queried using the embedded information?
• Does the embedded information change often?
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Evaluate the application workload:
• Data size
• A list of operations ranked by importance

Map out entities and their relationships:
• CRD: Collection Relationship Diagram (Link or Embed?)

Finalize the data model for each collection:
• Identify and apply relevant design patterns

Results:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Design Patterns
Brief introduction
The Schema Versioning Pattern
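The slides for this pattern are images, so the following is only a hedged sketch of the general idea: each document records which shape it has, and the application upgrades old shapes when it reads them. The field name `schema_version`, the function `upgradeBook`, and the v1-to-v2 change (a single `author` id becoming an `authors` array) are assumptions chosen for illustration.

```javascript
const CURRENT_VERSION = 2;

// Assumption: version 1 books store one "author" id, version 2 books store
// an "authors" array. Documents without schema_version are treated as v1.
function upgradeBook(doc) {
  const version = doc.schema_version ?? 1;
  if (version >= CURRENT_VERSION) return doc; // already in the current shape
  const { author, ...rest } = doc;
  return {
    ...rest,
    authors: author === undefined ? [] : [author],
    schema_version: CURRENT_VERSION,
  };
}

const oldBook = { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 };
const newBook = { _id: 5, title: "How to Actually Change Your Mind", authors: [1], schema_version: 2 };

// Both shapes can live side by side; readers always see the current one.
const upgraded = [oldBook, newBook].map(upgradeBook);
console.log(upgraded.map((b) => b.authors));
```

Because old and new documents can coexist, the collection can be migrated gradually rather than in one step.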
The Bucket Pattern

Tabular Approach: new document for each sensor reading
Document Approach: new document per time unit per sensor

The Bucket Pattern

• Really benefits from the document model
• Used to store small, related data items
  • Bank transactions: related by account and date
  • IoT readings: related by sensor and date
• Reduces index sizes by a large magnitude
• Increases speed of retrieval of related data
• Enables the Computed Pattern
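The Bucket Pattern Implementation

// Incoming reading: sensor = 5, value = 22, time = 2020-05-11
const reading = { sensor: 5, value: 22, time: new Date("2020-05-11") };

// Append to the sensor's open bucket (fewer than 200 readings);
// if no such bucket exists, the upsert creates a new one.
db.iot.updateOne(
  { "sensor": reading.sensor, "valcount": { "$lt": 200 } },
  { "$push": { "readings": { "v": reading.value, "t": reading.time } },
    "$inc": { "valcount": 1 } },
  { upsert: true })

// Resulting bucket document
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3,
  "readings": [ {"v": 11, "t": Date("2020-05-09")},
                {"v": 81, "t": Date("2020-05-10")},
                {"v": 22, "t": Date("2020-05-11")} ] }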
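The behaviour of the upsert above can be traced without a database. The sketch below imitates it on a plain array: `iot` stands in for the collection and `addReading` is a hypothetical helper that mirrors the filter, `$push`, `$inc`, and upsert steps. It is an approximation for illustration, not how the server executes the update.

```javascript
const BUCKET_LIMIT = 200; // same threshold as the "$lt": 200 filter
const iot = [];           // in-memory stand-in for the db.iot collection

function addReading(collection, reading, limit = BUCKET_LIMIT) {
  // Filter step: a bucket for this sensor that still has room.
  let bucket = collection.find(
    (b) => b.sensor === reading.sensor && b.valcount < limit
  );
  if (!bucket) {
    // Upsert step: no open bucket, so start a new one.
    bucket = { sensor: reading.sensor, valcount: 0, readings: [] };
    collection.push(bucket);
  }
  bucket.readings.push({ v: reading.value, t: reading.time }); // $push
  bucket.valcount += 1;                                        // $inc
  return bucket;
}

// 450 readings for sensor 5 roll over into three buckets.
for (let i = 0; i < 450; i++) {
  addReading(iot, { sensor: 5, value: i, time: new Date(2020, 4, 11, 0, 0, i) });
}
console.log(iot.map((b) => b.valcount)); // [ 200, 200, 50 ]
```

Three documents hold what the tabular approach would store as 450, which is where the smaller index comes from.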
The Computed Pattern

(Diagram label: CPU work)

The Computed Pattern

"Never recompute what you can precompute"

• Reads are often more common than writes
• Compute on write is less work than compute on read
• When updating the database, update some summary records too
• Can be thought of as a caching pattern
Computed Pattern with the Bucket Pattern

// Incoming reading: sensor = 5, value = 22, time = 2020-05-11
const reading = { sensor: 5, value: 22, time: new Date("2020-05-11") };

// Same bucket upsert as before, but the running total "tot"
// is maintained on every write.
db.iot.updateOne(
  { "sensor": reading.sensor, "valcount": { "$lt": 200 } },
  { "$push": { "readings": { "v": reading.value, "t": reading.time } },
    "$inc": { "valcount": 1, "tot": reading.value } },
  { upsert: true })

// Resulting bucket document
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3, "tot": 114,
  "readings": [ { "v": 11, "t": Date("2020-05-09") },
                { "v": 81, "t": Date("2020-05-10") },
                { "v": 22, "t": Date("2020-05-11") } ] }
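With `valcount` and `tot` maintained on write, an average can be read from the two summary fields instead of from every reading. The sketch below compares both routes on the sample bucket; the function names are hypothetical and the data is the bucket shown above with the timestamps left out.

```javascript
const buckets = [
  { sensor: 5, valcount: 3, tot: 114, readings: [{ v: 11 }, { v: 81 }, { v: 22 }] },
];

// Computed on write: the average needs only the two summary fields.
function averageFromSummary(bs) {
  const count = bs.reduce((n, b) => n + b.valcount, 0);
  const total = bs.reduce((s, b) => s + b.tot, 0);
  return count === 0 ? null : total / count;
}

// Computed on read: every individual reading has to be visited.
function averageFromReadings(bs) {
  const values = bs.flatMap((b) => b.readings.map((r) => r.v));
  if (values.length === 0) return null;
  return values.reduce((s, v) => s + v, 0) / values.length;
}

console.log(averageFromSummary(buckets));  // 38
console.log(averageFromReadings(buckets)); // 38
```

Both give 114 / 3 = 38, but the first touches two numbers per bucket while the second touches up to 200.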
Other Patterns and Where To Find Them

Learning
• MongoDB Blog, MongoDB Developer Portal and MongoDB University are all great resources to continue learning about data modeling and patterns.
• Design Patterns: Elements of Reusable Object-Oriented Software (a book!)

Other talks at this conference:
• Advanced Schema Design Patterns
• A Complete Methodology to Data Modeling
• Using JSON Schema to Save Lives
• Attribute Pattern and the Wildcard Index: Is the Attribute Pattern Obsolete?
Design an Online Shopping App:
MongoMart
A Use Case Example
Step 1

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Evaluate the application workload:
• Data size
• A list of operations ranked by importance

Results:
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Evaluate the Application Workload

1000 stores
• 50 employees per store
• 1 store lookup per customer per year

10 million items
• 100 reviews per item
• 500 thousand updates per day

100 million user accounts
• 500 thousand new accounts per week
• Logging in 20 times a year
• Looking up 100 items per year
• Creating 5 carts per year
• Reviewing 2 items per year

Carts
• Placing 4 items in the cart
• Buying an average of 2 items per cart

Analytics
• 10 data scientists each running 10 queries a day
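The figures above can be turned into rough sizes and average rates with a little arithmetic. The sketch below assumes that the per-year figures apply to each user account and that load is spread evenly over the year, so the results are averages, not peaks. All variable names are illustrative.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;
const SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY;

// Figures from the workload slide.
const workload = {
  stores: 1000,
  employeesPerStore: 50,
  items: 10e6,
  reviewsPerItem: 100,
  itemUpdatesPerDay: 500e3,
  users: 100e6,
  // Assumption: these rates are per user account per year.
  perUserPerYear: { logins: 20, itemLookups: 100, cartsCreated: 5, reviewsWritten: 2 },
};

// Data size: how many documents (or embedded entries) to plan for.
const totalStaff = workload.stores * workload.employeesPerStore; // 50 thousand
const totalReviews = workload.items * workload.reviewsPerItem;   // 1 billion

// Average operations per second for each kind of operation.
const perSecond = {};
for (const [op, perYear] of Object.entries(workload.perUserPerYear)) {
  perSecond[op] = (workload.users * perYear) / SECONDS_PER_YEAR;
}
perSecond.itemUpdates = workload.itemUpdatesPerDay / SECONDS_PER_DAY;

// Most frequent first; frequency is only one input to "ranked by importance".
const ranked = Object.entries(perSecond).sort((a, b) => b[1] - a[1]);
for (const [op, rate] of ranked) {
  console.log(op, rate.toFixed(1), "per second");
}
```

Item lookups come out at roughly 317 per second on average, well ahead of every other operation, which is consistent with treating "user views a specific item" as a critical read.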
Workload Evaluation Summary

Most important queries
• r2: user views a specific item (has to be under 1 ms)
• w3: user adds item to cart (write concern: majority)

Required indexes
• {"category": 1, "item_name": 1}
• {"category": 1, "item_name": 1, "price": 1}
• {"username": 1} and more..

List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views

Assumptions and Projections
• Data will be stored for a maximum of 5 years
• Number of items sold and number of users will double each year
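The doubling assumption compounds quickly. As a rough sketch, the snippet below projects the 100 million user accounts from the workload slide over the 5-year retention window; combining those two figures, and the helper name `projectDoubling`, are assumptions made for illustration.

```javascript
// Value at the start (year 0) and after each full year of doubling.
function projectDoubling(initial, years) {
  const byYear = [];
  for (let year = 0; year <= years; year++) {
    byYear.push(initial * 2 ** year);
  }
  return byYear;
}

const users = projectDoubling(100e6, 5);
users.forEach((count, year) => {
  console.log(`year ${year}: ${(count / 1e6).toFixed(0)} million users`);
});
```

After five doublings the user count is 32 times the starting value, 3.2 billion accounts, so a model sized only for today's numbers would be off by more than an order of magnitude.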
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Evaluate the application workload:
• Data size
• A list of operations ranked by importance

Map out entities and their relationships:
• CRD: Collection Relationship Diagram (Link or Embed?)

Results:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Entity Relationship Diagram

(Diagram: the entities carts, users, items, staff, views, reviews, and stores, connected by 1-to-N and N-to-N relationships.)


Collections Relationship Diagram (Simple)
Embed Everything!

(Diagram: users, items, stores, carts, reviews, staff, views, and categories, with N-to-N and 1-to-N relationships between collections.)
Collections Relationship Diagram (Better)
Accommodate for assumptions.
Embed & Link!

(Diagram: items, carts, reviews, stores, users, staff, views, and categories, with 1-to-N and N-to-N relationships between collections. The annotation "clear every 5 years" appears twice in the diagram.)
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Evaluate the application workload:
• Data size
• A list of operations ranked by importance

Map out entities and their relationships:
• CRD: Collection Relationship Diagram (Link or Embed?)

Finalize the data model for each collection:
• Identify and apply relevant schema patterns

Results:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Apply all the Patterns!

Patterns Used:

• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
Conclusion
And additional considerations
Your Data Model Will Evolve
Just like your application

Small team, Medium team, Large team, Very big team
Tailor the Data Model
To your unique setup

Simpler data model → Performant data model

• Shared hosted DB
• Replica Set at
• Small team
• Large Sharded Cluster

Small team, Medium team, Large team, Very big team
Flexible Data Modeling Approach

For a Simpler data model focus on:
• Evaluate the application workload: the most frequent operation
• Map out the entities and their relationships: embedding data
• Finalize schema for each collection: use few patterns

For a bit of both:
• Evaluate the application workload: data size, the most frequent operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary

For the most Performant data model focus on:
• Evaluate the application workload: data size, the most frequent operations, the most important operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary
#MDBlive

Visit our product "booths" for new features, like the new Schema Advisor in Atlas!
mongodb.com/live/product

Special Thanks to:
John Page, Daniel Coupal, Eoin Brazil for excellent content support
