
Data Modeling With MongoDB

This document discusses data modeling with MongoDB. It covers key considerations like linking vs embedding data and provides examples. The methodology involves iteratively defining entities and relationships, evaluating the application workload, and finalizing the data model with relevant design patterns. Linking is better if related data is often queried or changed separately, while embedding works for tightly-coupled data.

Uploaded by

Muhammad Riza Alifi


Data Modeling with MongoDB

Yulia Genkina
Curriculum Engineer @ MongoDB
Agenda

Key Considerations

Linking vs. Embedding

Design Patterns

Use Case Example

Conclusion
Let’s Compare
RDBMS approach to data modeling vs. MongoDB
Modeling for RDBMS Concerns

Step 1: Define the Schema
Step 2: Develop the application and queries

[Slide annotations across the build: "CORRECT", "DENORMALIZED?", "Data dictates"]
Data Modeling with MongoDB

Develop the Application -> Define the Data Model -> Improve the Application -> Improve the Data Model

• Many design options
• Designed for the usage pattern
• Data model evolution is easy
• Can evolve without any downtime
Key Considerations
For Data Modeling with MongoDB
There Is No Magic Formula, but There Is A Method

• Data model is defined at the application level
• Design is part of each phase of the application lifetime
• What affects the data model:
  o The data that your application needs
  o Application’s read and write usage of the data
Data Modeling
Methodology to Achieve a Near Magic Almost Formula
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

Outputs:
• Data size
• Database queries and indexes
• Current operations and assumptions
Link vs. Embed
Which is the Right Decision and What Does it Mean?
What Can Be Linked?

Relationships:
• One-to-one
• One-to-many
• Many-to-many

Example: Entities and relationships in a Blog
• articles: title, date, text
• tags: name, url
• categories: name, url
• comments: name, url
• users: name, email
[Diagram: the entities are connected by 1-to-N and N-to-N relationships.]
One-to-One Linked

Book = { // either side can track

"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"book": 1, // more fields follow…
}
One-to-One Embedded

Book = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": {
"firstName": "Eliezer",
"lastName": "Yudkowsky"
},
// more fields follow…
}
One-to-Many: Array in Parent

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [1, 5, 17],
// more fields follow…
}
One-to-Many: Scalar in Child

Book1 = {
"_id": 1,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"author": 1, // more fields follow…
}

Book2 = {
"_id": 5,
"title": "How to Actually Change Your Mind",
"slug": "1939311179490-how-to-change",
"author": 1, // more fields follow…
}
Many-to-Many: Arrays on either side

Book = { //either side can track

"_id": 5,
"title": "Harry Potter and the Methods of Rationality",
"slug": "9781857150193-hpmor",
"authors": [1, 3], // more fields follow…
}

Author = {
"_id": 1,
"firstName": "Eliezer",
"lastName": "Yudkowsky",
"books": [5, 7], // more fields follow…

}
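As a rough illustration of what linking costs at read time, the sketch below resolves the "authors" array of a linked book against an in-memory list of authors. It uses plain JavaScript arrays in place of collections. The resolveAuthors helper and the second author are hypothetical and exist only for this example. Against a real database the same step would be a second query or a $lookup stage.

```javascript
// In-memory stand-ins for two linked collections (hypothetical data).
const authors = [
  { _id: 1, firstName: "Eliezer", lastName: "Yudkowsky" },
  { _id: 3, firstName: "Second", lastName: "Author" }, // placeholder author
];

const book = {
  _id: 5,
  title: "Harry Potter and the Methods of Rationality",
  authors: [1, 3], // links: author _id values
};

// Resolve the linked ids into full author documents (hypothetical helper).
function resolveAuthors(bookDoc, authorDocs) {
  const byId = new Map(authorDocs.map((a) => [a._id, a]));
  return bookDoc.authors.map((id) => byId.get(id));
}

// The embedded shape a reader would get without the extra lookup.
const embedded = { ...book, authors: resolveAuthors(book, authors) };
console.log(embedded.authors.map((a) => a.lastName));
```

The linked form keeps each author in one place, at the price of this extra resolution step on every read that needs author details.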
Embed All

articles
• title
• date
• text
• tags []: name, url
• categories []: name, url
• comments []: name, url
• users: name, email

Queries by articles

Embed & Link

articles
• title
• date
• text
• tags []: name, url
• categories []: name, url
• comments []: name, url

users (linked, 1-to-N)
• name
• email

Queries by articles or users

To Link or Embed?

• How often does the embedded information get accessed?
• Is the data queried using the embedded information?
• Does the embedded information change often?
Step-by-step Iteration

Inputs:
• Business domain expertise
• Current and predicted scenarios
• Production logs and stats

Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)

Step 3: Finalize the data model for each collection
• Identify and apply relevant design patterns

Outputs:
• Collections with document fields and shapes for each
• Data size
• Database queries and indexes
• Current operations, assumptions, and growth projections
Design Patterns
Brief introduction
The Schema Versioning Pattern
The Bucket Pattern

Tabular Approach: new document for each sensor reading
Document Approach: new document per time unit per sensor

The Bucket Pattern

• Really benefits from the document model
• Used to store small, related data items
  • Bank Transactions – related by account and date
  • IoT Readings – related by sensor and date
• Reduces index sizes by a large magnitude
• Increases speed of retrieval of related data
• Enables the Computed Pattern
The Bucket Pattern Implementation

sensor = 5, value = 22, time = new Date("2020-05-11")

// Append the reading to a bucket for this sensor that holds fewer than
// 200 readings; the upsert creates a new bucket when none has room.
db.iot.updateOne({ "sensor": sensor,
                   "valcount": { "$lt": 200 } },
                 { "$push": { "readings": { "v": value, "t": time } },
                   "$inc": { "valcount": 1 } },
                 { upsert: true })

// Example bucket document after three readings
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3,
  "readings": [ {"v": 11, "t": Date("2020-05-09")},
                {"v": 81, "t": Date("2020-05-10")},
                {"v": 22, "t": Date("2020-05-11")} ] }
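To make the filter, upsert, push and increment steps easier to follow, here is a minimal in-memory sketch of the same logic in plain JavaScript. It is not a MongoDB client: the addReading function and the cap parameter are hypothetical names, and the array stands in for the iot collection. The cap is lowered to 2 in the usage lines only so that the rollover to a new bucket is visible.

```javascript
// Minimal in-memory model of the bucket upsert (not a MongoDB client).
const BUCKET_SIZE = 200; // same cap as the "$lt": 200 filter

function addReading(collection, sensor, value, time, cap = BUCKET_SIZE) {
  // Filter step: find a bucket for this sensor that still has room.
  let bucket = collection.find(
    (doc) => doc.sensor === sensor && doc.valcount < cap
  );
  // Upsert step: no open bucket exists, so start a new one.
  if (!bucket) {
    bucket = { sensor, valcount: 0, readings: [] };
    collection.push(bucket);
  }
  // $push and $inc steps.
  bucket.readings.push({ v: value, t: time });
  bucket.valcount += 1;
  return bucket;
}

const iot = [];
addReading(iot, 5, 11, new Date("2020-05-09"), 2);
addReading(iot, 5, 81, new Date("2020-05-10"), 2);
addReading(iot, 5, 22, new Date("2020-05-11"), 2); // cap reached, new bucket
console.log(iot.length, iot.map((b) => b.valcount));
```

With a cap of 2, the third reading no longer matches the filter, so a second bucket is created for the same sensor.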
The Computed Pattern

[Figure: CPU work]

"Never recompute what you can precompute"

• Reads are often more common than writes
• Compute on write is less work than compute on read
• When updating the database, update some summary records too
• Can be thought of as a caching pattern
Computed Pattern with the Bucket Pattern

sensor = 5, value = 22, time = new Date("2020-05-11")

// Same bucket update as before, plus a running total ("tot") kept on write.
db.iot.updateOne({ "sensor": sensor,
                   "valcount": { "$lt": 200 } },
                 { "$push": { "readings": { "v": value, "t": time } },
                   "$inc": { "valcount": 1, "tot": value } },
                 { upsert: true })

// Example bucket document: tot = 11 + 81 + 22 = 114
{ "_id": ObjectId("abcd12340101"), "sensor": 5, "valcount": 3, "tot": 114,
  "readings": [ { "v": 11, "t": Date("2020-05-09") },
                { "v": 81, "t": Date("2020-05-10") },
                { "v": 22, "t": Date("2020-05-11") } ] }
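The benefit on the read side can be sketched in a few lines of plain JavaScript. The bucket object below copies the values from the slide; the two function names are hypothetical. One function uses the precomputed "tot" and "valcount" fields, the other walks the readings array the way a read would have to without the Computed Pattern.

```javascript
// Bucket document shaped like the one on the slide (timestamps omitted).
const bucket = {
  sensor: 5,
  valcount: 3,
  tot: 114,
  readings: [{ v: 11 }, { v: 81 }, { v: 22 }],
};

// Read path with the Computed Pattern: one division, no array scan.
function averageFromSummary(doc) {
  return doc.valcount === 0 ? null : doc.tot / doc.valcount;
}

// Read path without it: walk every reading on each read.
function averageFromReadings(doc) {
  if (doc.readings.length === 0) return null;
  const sum = doc.readings.reduce((acc, r) => acc + r.v, 0);
  return sum / doc.readings.length;
}

console.log(averageFromSummary(bucket), averageFromReadings(bucket)); // 38 38
```

Both paths return the same average, but the first does constant work no matter how many readings the bucket holds.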
Other Patterns and Where To Find Them
MongoDB Blog, MongoDB Developer Portal and
MongoDB University are all great resources to continue
learning about data modeling and patterns.

Learning
Design Patterns: Elements of Reusable Object-Oriented
Software – a book!

Other talks at this conference:

• Advanced Schema Design Patterns
• A Complete Methodology to Data Modeling
• Using JSON Schema to Save Lives
• Attribute Pattern and the Wildcard Index: Is the
Attribute Pattern Obsolete?
Design an Online Shopping App:
MongoMart
A Use Case Example
Step 1: Evaluate the application workload
• Data size
• A list of operations ranked by importance
Evaluate the Application Workload

• 1000 stores
  • 50 employees per store
  • 1 store lookup per customer per year
• 10 Million items
  • 100 reviews per item
  • 500 thousand updates per day
• 100 Million user accounts
  • 500 thousand new accounts per week
  • Logging in 20 times a year
  • Looking up 100 items per year
  • Creating 5 carts per year
  • Placing 4 items in the cart
  • Buying an average of 2 items per cart
  • Reviewing 2 items per year
• Analytics
  • 10 data scientists each running 10 queries a day
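These figures can be turned into rough data sizes and average request rates with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not part of the original deck. It assumes traffic is spread evenly over a 365-day year and that the 500 thousand daily updates apply to items; real peaks would be higher than these averages.

```javascript
// Figures taken from the workload slide.
const users = 100e6;
const items = 10e6;
const reviewsPerItem = 100;
const lookupsPerUserPerYear = 100;
const cartsPerUserPerYear = 5;
const itemUpdatesPerDay = 500e3; // assumption: these updates target items

const SECONDS_PER_DAY = 24 * 60 * 60;
const SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY;

// Data size: total number of reviews to store.
const totalReviews = items * reviewsPerItem;

// Average operation rates, assuming an even spread over time.
const lookupsPerSecond = (users * lookupsPerUserPerYear) / SECONDS_PER_YEAR;
const cartsPerSecond = (users * cartsPerUserPerYear) / SECONDS_PER_YEAR;
const updatesPerSecond = itemUpdatesPerDay / SECONDS_PER_DAY;

console.log(totalReviews);                 // 1000000000
console.log(Math.round(lookupsPerSecond)); // 317
console.log(Math.round(cartsPerSecond));   // 16
console.log(Math.round(updatesPerSecond)); // 6
```

Item lookups outnumber every other operation by a wide margin, which lines up with the item view being treated as one of the most important queries.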
Workload Evaluation Summary

Most important queries
• r2: user views a specific item – has to be under 1 ms
• w3: user adds item to cart – write concern: majority

Required indexes
• {"category": 1, "item_name": 1}
• {"category": 1, "item_name": 1, "price": 1}
• {"username": 1} and more..

Assumptions and Projections
• Data will be stored for a maximum of 5 years
• Number of items sold and number of users will double each year

List of Entities:
• carts
• categories
• items
• reviews
• staff
• stores
• users
• views
Step-by-step Iteration

Step 2: Map out entities and their relationships
• CRD: Collection Relationship Diagram (Link or Embed?)
Entity Relationship Diagram

[Diagram: entities carts, users, items, staff, views, reviews and stores, connected by 1-to-N and N-to-N relationships.]
Collections Relationship Diagram (Simple)
Embed Everything!

[Diagram: users, items, carts, reviews, stores, staff, views and categories, with N-to-N and 1-to-N relationships.]
Collections Relationship Diagram (Better)
Accommodate for assumptions.
Embed & Link!

[Diagram: items, carts, reviews, stores, users, staff, views and categories, with 1-to-N and N-to-N relationships. Two parts of the diagram are annotated "clear every 5 years".]
Step-by-step Iteration

Step 3: Finalize the data model for each collection
• Identify and apply relevant schema patterns
Apply all the Patterns!

Patterns Used:

• Schema Versioning
• Subset
• Computed
• Bucket
• Extended Reference
Conclusion
And additional considerations
Your Data Model Will Evolve
Just like your application

Small team, Medium team, Large team, Very big team
Tailor the Data Model
To your unique setup

[Diagram: a spectrum from "Simple data model" to "Performant data model" across team sizes: Small team, Medium team, Large team, Very big team]
• Shared hosted DB
• Small team
• Replica Set at
• Large Sharded Cluster
Flexible Data Modeling Approach

For a Simpler data model focus on:
• Evaluate the application workload: the most frequent operation
• Map out the entities and their relationships: embedding data
• Finalize schema for each collection: use few patterns

For a bit of both:
• Evaluate the application workload: data size, the most frequent operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary

For the most Performant data model focus on:
• Evaluate the application workload: data size, the most frequent operations, the most important operations
• Map out the entities and their relationships: embedding and linking data
• Finalize schema for each collection: use as many patterns as necessary
#MDBlive

Visit our product

"booths" for new
features, like the new
Schema Advisor in
Atlas!
mongodb.com/live/product
#MDBlive

Special Thanks to:

John Page, Daniel Coupal,
Eoin Brazil for excellent
content support
