Data Modeling With MongoDB
Yulia Genkina
Curriculum Engineer @ MongoDB
Agenda
• Key Considerations
• Design Patterns
• Conclusion
Let’s Compare
RDBMS approach to data modeling vs. MongoDB
Modeling for RDBMS Concerns
Step 1: Define the data model (Correct? Denormalized?); the data dictates the design
Step 2: Develop the application and queries
Modeling for RDBMS Concerns
• Develop the Application
• Define the Data Model
• Improve the Application
• Improve the Data Model
Many design options
There Is No Magic Formula, but There Is A Method
Design is part of each phase of the application lifetime
What affects the data model:
• The data that your application needs
• The application’s read and write usage of the data
Data Modeling
A Methodology to Achieve an Almost Magic Formula
Step-by-step Iteration
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats
Example: Entities and relationships in a Blog
• articles: title, date, text
• users: name, email (1-to-N with articles)
• categories: name, url (N-to-N with articles)
• comments: name, url (1-to-N with articles)
One-to-One Linked
Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky"
"book": 1, // more fields follow…
}
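To make the cost of linking concrete, here is a minimal sketch in plain JavaScript that resolves the `book` reference in memory. The arrays `authors` and `books` and the helpers `findById` and `authorWithBook` are hypothetical stand-ins for two collections; against a real database this would be a second query or a `$lookup` aggregation stage.

```javascript
// Hypothetical in-memory stand-ins for the "authors" and "books" collections.
const authors = [
  { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", book: 1 },
];
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", slug: "9781857150193-hpmor" },
];

// Return the document with the given _id, or null (like findOne({ _id: id })).
function findById(collection, id) {
  const doc = collection.find((d) => d._id === id);
  return doc === undefined ? null : doc;
}

// Resolving a linked one-to-one reference needs a second lookup.
// The result is a new object; the stored author document is left unchanged.
function authorWithBook(authorId) {
  const author = findById(authors, authorId);
  if (author === null) return null;
  return { ...author, book: findById(books, author.book) };
}

console.log(authorWithBook(1).book.title);
```

Embedding the book (or the author, as in the next example) removes that second lookup; linking keeps each document small and independently updatable.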
One-to-One Embedded
Book = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": {
"firstName": "Eliezer",
"lastName": "Yudkowsky"
},
// more fields follow…
}
One-to-Many: Array in Parent
Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [1, 5, 17],
// more fields follow…
}
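A minimal sketch of reading the children back from the parent's array, mimicking a `find({ _id: { $in: ids } })` query in memory. The `books` collection contents (including the placeholder title for book 17 and the extra book 99) and the helper `findByIds` are hypothetical.

```javascript
// The parent document from the slide, holding an array of child _ids.
const author = { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky", books: [1, 5, 17] };

// Hypothetical "books" collection; titles for 17 and 99 are placeholders.
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality" },
  { _id: 5, title: "How to Actually Change Your Mind" },
  { _id: 17, title: "(title of book 17)" },
  { _id: 99, title: "(a book by another author)" },
];

// Mimics find({ _id: { $in: ids } }): keep every book whose _id is listed.
function findByIds(collection, ids) {
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

const authorBooks = findByIds(books, author.books);
console.log(authorBooks.map((b) => b.title));
```

This shape works best when the "many" side is small and bounded, because the parent's array grows with every child.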
One-to-Many: Scalar in Child
Book1 = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}
Book2 = {
"_id": 5,
"title": "How to Actually Change Your Mind",
"slug": "1939311179490-how-to-change",
"author": 1, // more fields follow…
}
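With the reference stored in each child, finding an author's books is a filter on the `author` field. This is a minimal in-memory sketch; the book with `_id: 8` and the helper `findBooksByAuthor` are hypothetical, and a real deployment would run `find({ author: 1 })`, ideally backed by an index on `author`.

```javascript
// Hypothetical "books" collection: each child stores its parent's _id.
const books = [
  { _id: 1, title: "Harry Potter and the Methods of Rationality", author: 1 },
  { _id: 5, title: "How to Actually Change Your Mind", author: 1 },
  { _id: 8, title: "(a book by another author)", author: 2 },
];

// Mimics find({ author: authorId }). In MongoDB, an index on { author: 1 }
// keeps this query efficient as the collection grows.
function findBooksByAuthor(collection, authorId) {
  return collection.filter((doc) => doc.author === authorId);
}

console.log(findBooksByAuthor(books, 1).map((b) => b._id));
```

The parent document stays the same size no matter how many children exist, which is why this shape suits large or unbounded "many" sides.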
Many-to-Many: Arrays on either side
Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [5, 7], // more fields follow…
}
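With arrays on either side, every new link touches two documents. A minimal sketch, assuming hypothetical `authors` and `books` maps (book 7 uses a placeholder title) and a hypothetical helper `linkAuthorAndBook`; the `addToSet` function only mimics the behavior of MongoDB's `$addToSet` update operator in memory.

```javascript
// Hypothetical in-memory documents: each side keeps an array of the other side's _ids.
const authors = new Map([[1, { _id: 1, lastName: "Yudkowsky", books: [5] }]]);
const books = new Map([
  [5, { _id: 5, title: "How to Actually Change Your Mind", authors: [1] }],
  [7, { _id: 7, title: "(title of book 7)", authors: [] }],
]);

// Append a value only if it is absent, like the $addToSet operator.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// Link an author and a book on both sides. In MongoDB these are two separate
// writes, so a failure between them could leave the arrays out of sync.
function linkAuthorAndBook(authorId, bookId) {
  const author = authors.get(authorId);
  const book = books.get(bookId);
  if (!author || !book) throw new Error("unknown author or book");
  addToSet(author.books, bookId);
  addToSet(book.authors, authorId);
}

linkAuthorAndBook(1, 7);
console.log(authors.get(1).books, books.get(7).authors);
```

If both sides must always agree, the two updates can be wrapped in a multi-document transaction; otherwise the application has to tolerate or repair brief inconsistencies.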
Embed All vs. Embed & Link
Embed All: a single articles collection
• articles: title, date, text, tags [] (name, url), categories [] (name, url), comments [] (name, url), users (name, email)
Embed & Link: users move to their own collection, linked 1-to-N
• articles: title, date, text, tags [] (name, url), categories [] (name, url), comments [] (name, url)
• users: name, email
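The two shapes above can be built from the same data. A minimal sketch, assuming hypothetical sample values, a singular embedded `user` field, and a `user_id` reference field (both names are illustrative, not from the slides):

```javascript
// Hypothetical normalized data for one blog article and its author.
const user = { _id: 42, name: "Ada", email: "ada@example.com" };
const article = { _id: 1, title: "Modeling tips", date: "2020-06-09", text: "Article body" };
const tags = [{ name: "mongodb", url: "/tags/mongodb" }];
const categories = [{ name: "databases", url: "/categories/databases" }];
const comments = [{ name: "Great post", url: "/comments/1" }];

// Embed All: one document carries everything the article page displays.
function embedAll() {
  return { ...article, tags, categories, comments, user: { name: user.name, email: user.email } };
}

// Embed & Link: tags, categories and comments stay inside the article,
// while the user lives in a separate "users" collection referenced by _id.
function embedAndLink() {
  return { ...article, tags, categories, comments, user_id: user._id };
}

console.log(JSON.stringify(embedAll()));
console.log(JSON.stringify(embedAndLink()));
```

Embed All serves the article page with one read but repeats the user's details in every article they write; Embed & Link stores each user once at the cost of a second lookup.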
Evaluate the application workload
• Data size
• A list of operations ranked by importance
Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
Finalize the data model for each collection
• Identify and apply relevant design patterns
Outputs:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations assumptions and growth projections
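The first step yields two artifacts: a data-size estimate and a ranked list of operations. A minimal sketch with made-up figures; the collection sizes, the `perSecond` rates, the 1 to 3 `importance` score, and the tie-break on frequency are all hypothetical choices for illustration.

```javascript
// Hypothetical workload figures, for illustration only.
const collections = [
  { name: "articles", documents: 1000000, avgDocBytes: 2048 },
  { name: "users", documents: 50000, avgDocBytes: 512 },
];
const operations = [
  { name: "read article", perSecond: 500, importance: 3 },
  { name: "add comment", perSecond: 20, importance: 2 },
  { name: "publish article", perSecond: 0.1, importance: 3 },
];

// Data size: documents times average document size, summed over collections.
function estimateDataBytes(colls) {
  return colls.reduce((sum, c) => sum + c.documents * c.avgDocBytes, 0);
}

// Rank operations by importance first, then by frequency (copy, then sort).
function rankOperations(ops) {
  return [...ops].sort((a, b) => b.importance - a.importance || b.perSecond - a.perSecond);
}

const bytes = estimateDataBytes(collections);
console.log(`${(bytes / 1024 ** 3).toFixed(2)} GiB`);
console.log(rankOperations(operations).map((op) => op.name));
```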
Design Patterns
Brief introduction
The Schema Versioning Pattern
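The slides only name this pattern, so here is a hedged sketch of its usual idea: each document records the shape it was written with, and the application handles every version still stored, so old and new documents can coexist in one collection. The field name `schema_version` and the v1/v2 shapes (a single `phone` string versus a `contacts` array) are hypothetical.

```javascript
// Hypothetical documents: version 1 stored one "phone" string; version 2
// moved contact details into a "contacts" array. A missing schema_version means v1.
const docs = [
  { _id: 1, name: "Ada", phone: "555-0100" },
  { _id: 2, name: "Grace", schema_version: 2, contacts: [{ type: "phone", value: "555-0199" }] },
];

// Convert any stored version to the current (v2) shape in application code.
function toCurrentShape(doc) {
  const version = doc.schema_version ?? 1;
  if (version === 1) {
    const { phone, ...rest } = doc;
    return { ...rest, schema_version: 2, contacts: phone ? [{ type: "phone", value: phone }] : [] };
  }
  if (version === 2) return doc;
  throw new Error(`unsupported schema_version ${version}`);
}

console.log(docs.map((d) => toCurrentShape(d)));
```

Old documents can then be rewritten in the new shape lazily, the next time they are saved, or by a background job, without downtime.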
The Bucket Pattern
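The Bucket pattern groups many small related documents, typically time-series measurements, into one document per window. A minimal sketch, assuming hypothetical sensor readings, a one-hour window, and illustrative field names (`start`, `count`, `sum`, `measurements`):

```javascript
// Hypothetical individual readings, one per measurement.
const readings = [
  { sensor: "s1", ts: new Date("2020-06-09T10:05:00Z"), value: 21.5 },
  { sensor: "s1", ts: new Date("2020-06-09T10:35:00Z"), value: 22.0 },
  { sensor: "s1", ts: new Date("2020-06-09T11:10:00Z"), value: 22.4 },
];

// Group readings into one bucket document per sensor per UTC hour, keeping a
// count and a running sum so simple aggregates need no second pass.
function bucketByHour(items) {
  const buckets = new Map();
  for (const r of items) {
    const hour = new Date(r.ts);
    hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
    const key = `${r.sensor}|${hour.toISOString()}`;
    if (!buckets.has(key)) {
      buckets.set(key, { sensor: r.sensor, start: hour, count: 0, sum: 0, measurements: [] });
    }
    const b = buckets.get(key);
    b.measurements.push({ ts: r.ts, value: r.value });
    b.count += 1;
    b.sum += r.value;
  }
  return [...buckets.values()];
}

console.log(bucketByHour(readings).map((b) => [b.start.toISOString(), b.count]));
```

Fewer, larger documents mean smaller indexes and fewer reads for range queries over time.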
The Computed Pattern
CPU work
"Never recompute what you can
precompute"
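A minimal sketch of "precompute on write": each new review updates stored aggregates on the product, so reads never scan the reviews. The product fields (`reviewCount`, `ratingSum`, `avgRating`) and helper names are hypothetical; in MongoDB the counters could be updated with the `$inc` operator.

```javascript
// Hypothetical product document that stores precomputed aggregates.
const product = { _id: "p1", name: "Widget", reviewCount: 0, ratingSum: 0, avgRating: null };
const reviews = []; // stands in for a separate "reviews" collection

// The CPU work happens once per write: update the stored aggregates...
function addReview(prod, rating) {
  reviews.push({ productId: prod._id, rating });
  prod.reviewCount += 1;
  prod.ratingSum += rating;
  prod.avgRating = prod.ratingSum / prod.reviewCount;
}

// ...so every read returns the stored value instead of recomputing it.
function getAverageRating(prod) {
  return prod.avgRating;
}

addReview(product, 5);
addReview(product, 4);
console.log(getAverageRating(product));
```

This pays off when reads far outnumber writes, which is the usual case for ratings and totals.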
Learning
Design Patterns: Elements of Reusable Object-Oriented
Software – a book!
Entity Relationship Diagram
[Diagram: entities users, carts, items, reviews, stores, staff, views, and categories, connected by 1-to-N and N-to-N relationships]
Collections Relationship Diagram (Better)
Accommodate the assumptions.
Embed & Link!
[Diagram: collections items, carts, reviews, stores, users, staff, views, and categories, connected by 1-to-N and N-to-N relationships]
Apply all the Patterns!
Patterns Used:
• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
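Two of the patterns above are not illustrated elsewhere in these slides, so here is a minimal sketch of both on the e-commerce example. The item fields, the limit of three embedded reviews, and the helper names are hypothetical. In MongoDB, the Subset update could be a single `$push` using the `$each`, `$position: 0`, and `$slice` modifiers.

```javascript
// Hypothetical limit: how many reviews the item document keeps embedded.
const MAX_EMBEDDED_REVIEWS = 3;

const item = { _id: "i1", name: "Blue mug", price: 12.5, recentReviews: [] };
const reviews = []; // stands in for a separate "reviews" collection

// Subset: store every review in the reviews collection, but embed only the
// most recent few in the item, newest first.
function addReviewToItem(it, review, reviewsCollection) {
  reviewsCollection.push({ ...review, itemId: it._id });
  it.recentReviews = [review, ...it.recentReviews].slice(0, MAX_EMBEDDED_REVIEWS);
}

// Extended Reference: a cart line keeps the item's _id plus copies of the
// fields the cart page displays, so rendering the cart needs no extra lookup.
function cartLineFor(it, quantity) {
  return { itemId: it._id, name: it.name, price: it.price, quantity };
}

for (let n = 1; n <= 4; n++) addReviewToItem(item, { rating: n, text: `review ${n}` }, reviews);
console.log(item.recentReviews.map((r) => r.rating));
console.log(cartLineFor(item, 2));
```

The trade-off with Extended Reference is duplication: if an item's name or price changes, the copies in existing carts may need updating.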
Conclusion
And additional considerations
Your Data Model Will Evolve
Just like your application
[Figure: team size grows from small, to medium, to large, to very big]
Tailor the Data Model
To your unique setup
[Figure: as the team grows from small to very big, the setup moves from a shared hosted DB, through a replica set, to a large sharded cluster, and the data model moves from simple toward performant]
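Flexible Data Modeling Approach
For a simpler data model, focus on:
• Evaluate the application workload: the most frequent operation
• Map out the entities and their relationships: embedding data
• Finalize the schema for each collection: use few patterns
For a bit of both, focus on:
• Evaluate the application workload: data size and the most frequent operations
• Map out the entities and their relationships: embedding and linking data
• Finalize the schema for each collection: use as many patterns as necessary
For the most performant data model, focus on:
• Evaluate the application workload: data size, the most frequent operations, and the most important operations
• Map out the entities and their relationships: embedding and linking data
• Finalize the schema for each collection: use as many patterns as necessary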
#MDBlive