
6 Rules of Thumb for MongoDB Schema Design: Part 2

MongoDB
June 5, 2014 | Updated: May 1, 2018
#Technical

By William Zola, Lead Technical Support Engineer at MongoDB

This is the second stop on our tour of modeling One-to-N relationships in MongoDB. Last time I
covered the three basic schema designs: embedding, child-referencing, and parent-referencing. I also
covered the two factors to consider when picking one of these designs:

Will the entities on the “N” side of the One-to-N ever need to stand alone?

What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions?

With these basic techniques under our belt, I can move on to covering more sophisticated schema
designs, involving two-way referencing and denormalization.

Intermediate: Two-Way Referencing

If you want to get a little bit fancier, you can combine two techniques and include both styles of
reference in your schema, having both references from the “one” side to the “many” side and
references from the “many” side to the “one” side.

For an example, let’s go back to that task-tracking system. There’s a “people” collection holding
Person documents, a “tasks” collection holding Task documents, and a One-to-N relationship from
Person -> Task. The application will need to track all of the Tasks owned by a Person, so we will need
to reference Person -> Task.

With the array of references to Task documents, a single Person document might look like this:

db.people.findOne()
{
    _id: ObjectID("AAF1"),
    name: "Kate Monster",
    tasks: [ // array of references to Task documents
        ObjectID("ADF9"),
        ObjectID("AE02"),
        ObjectID("AE73")
        // etc
    ]
}

On the other hand, in some other contexts this application will display a list of Tasks (for example, all
of the Tasks in a multi-person Project) and it will need to quickly find which Person is responsible for
each Task. You can optimize this by putting an additional reference to the Person in the Task
document.

db.tasks.findOne()
{
    _id: ObjectID("ADF9"),
    description: "Write lesson plan",
    due_date: ISODate("2014-04-01"),
    owner: ObjectID("AAF1") // Reference to Person document
}

This design has all of the advantages and disadvantages of the “One-to-Many” schema, but with some
additions. Putting the extra ‘owner’ reference into the Task document means that it’s quick and easy
to find the Task’s owner, but it also means that if you need to reassign the task to another person, you
need to perform two updates instead of just one. Specifically, you’ll have to update both the reference
from the Person to the Task document, and the reference from the Task to the Person. (And to the
relational gurus who are reading this: you’re right. Using this schema design means that it is no
longer possible to reassign a Task to a new Person with a single atomic update. This is OK for our
task-tracking system, but you need to consider whether it works for your particular use case.)
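
The reassignment described above can be sketched in plain JavaScript, with in-memory Maps standing in for the ‘people’ and ‘tasks’ collections. The `reassignTask` helper, the second person, and the second task are hypothetical and exist only for illustration. In MongoDB each write would be a separate update operation, so another client could observe the state between them.

```javascript
// In-memory stand-ins for the 'people' and 'tasks' collections (assumption:
// plain Maps keyed by _id; a real application would issue MongoDB updates).
const people = new Map([
  ["AAF1", { _id: "AAF1", name: "Kate Monster", tasks: ["ADF9", "AE02"] }],
  ["AAF2", { _id: "AAF2", name: "Second Person", tasks: [] }],
]);
const tasks = new Map([
  ["ADF9", { _id: "ADF9", description: "Write lesson plan", owner: "AAF1" }],
  ["AE02", { _id: "AE02", description: "Grade papers", owner: "AAF1" }],
]);

// Hypothetical helper: reassigning a task touches both sides of the relationship.
function reassignTask(taskId, newOwnerId) {
  const task = tasks.get(taskId);
  const oldOwner = people.get(task.owner);
  const newOwner = people.get(newOwnerId);

  // Write 1: fix the Person -> Task references
  // (in MongoDB this touches two Person documents).
  oldOwner.tasks = oldOwner.tasks.filter((id) => id !== taskId);
  newOwner.tasks.push(taskId);

  // Write 2: fix the Task -> Person reference.
  task.owner = newOwnerId;
}

reassignTask("ADF9", "AAF2");
console.log(people.get("AAF1").tasks); // tasks still owned by the old owner
console.log(tasks.get("ADF9").owner);  // id of the new owner
```

If the process stops between the two writes, the Person and Task documents disagree about who owns the task, which is the consistency cost the paragraph above describes.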

Intermediate: Denormalizing With “One-To-Many” Relationships

Beyond just modeling the various flavors of relationships, you can also add denormalization into your
schema. This can eliminate the need to perform the application-level join for certain cases, at the
price of some additional complexity when performing updates. An example will help make this clear.

Denormalizing from Many -> One

For the parts example, you could denormalize the name of the part into the ‘parts[]’ array. For
reference, here’s the version of the Product document without denormalization.

> db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [ // array of references to Part documents
        ObjectID('AAAA'), // reference to the #4 grommet above
        ObjectID('F17C'), // reference to a different Part
        ObjectID('D2AA'),
        // etc
    ]
}

Denormalizing would mean that you don’t have to perform the application-level join when displaying all
of the part names for the product, but you would have to perform that join if you needed any other
information about a part.

> db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [
        { id : ObjectID('AAAA'), name : '#4 grommet' }, // Part name is denormalized
        { id : ObjectID('F17C'), name : 'fan blade assembly' },
        { id : ObjectID('D2AA'), name : 'power switch' },
        // etc
    ]
}

While making it easier to get the part names, this would add just a bit of client-side work to the
application-level join:

// Fetch the product document
> product = db.products.findOne({catalog_number: 1234});
// Create an array of ObjectID()s containing *just* the part ids
> part_ids = product.parts.map( function(doc) { return doc.id } );
// Fetch all the Parts that are linked to this Product
> product_parts = db.parts.find({_id: { $in : part_ids } } ).toArray() ;

Denormalizing saves you a lookup of the denormalized data at the cost of a more expensive update: if
you’ve denormalized the Part name into the Product document, then when you update the Part name
you must also update every place it occurs in the ‘products’ collection.
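
To make the update cost concrete, here is a minimal sketch in plain JavaScript, with arrays standing in for the ‘parts’ and ‘products’ collections. The `renamePart` helper, the second product, and the new part name are hypothetical; the point is only that one rename fans out to every Product that carries a copy of the name.

```javascript
// In-memory stand-ins for the 'parts' and 'products' collections (assumption).
const parts = [
  { _id: "AAAA", name: "#4 grommet" },
  { _id: "F17C", name: "fan blade assembly" },
];
const products = [
  { catalog_number: 1234, parts: [{ id: "AAAA", name: "#4 grommet" },
                                  { id: "F17C", name: "fan blade assembly" }] },
  { catalog_number: 5678, parts: [{ id: "AAAA", name: "#4 grommet" }] },
];

// Hypothetical helper: rename a part and propagate the new name to every
// denormalized copy. Returns how many Product documents had to be rewritten.
function renamePart(partId, newName) {
  // Update the authoritative copy in 'parts'.
  const part = parts.find((p) => p._id === partId);
  part.name = newName;

  // Update every denormalized copy in 'products'.
  let touched = 0;
  for (const product of products) {
    let changed = false;
    for (const ref of product.parts) {
      if (ref.id === partId) {
        ref.name = newName;
        changed = true;
      }
    }
    if (changed) touched += 1;
  }
  return touched;
}

const productsTouched = renamePart("AAAA", "#4 grommet (steel)");
console.log(productsTouched); // number of Product documents rewritten
```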

Denormalizing only makes sense when there’s a high ratio of reads to updates. If you’ll be reading
the denormalized data frequently, but updating it only rarely, it often makes sense to pay the price of
slower and more complex updates in order to get more efficient queries. As updates
become more frequent relative to queries, the savings from denormalization decrease.

For example: assume the part name changes infrequently, but the quantity on hand changes
frequently. This means that while it makes sense to denormalize the part name into the Product
document, it does not make sense to denormalize the quantity on hand.

Also note that if you denormalize a field, you lose the ability to perform atomic and isolated updates
on that field. Just like with the two-way referencing example above, if you update the part name in the
Part document, and then in the Product document, there will be a sub-second interval where the
denormalized ‘name’ in the Product document will not reflect the new, updated value in the Part
document.

Denormalizing from One -> Many

You can also denormalize fields from the “One” side into the “Many” side:

> db.parts.findOne()
{
    _id : ObjectID('AAAA'),
    partno : '123-aff-456',
    name : '#4 grommet',
    product_name : 'left-handed smoke shifter', // Denormalized from the ‘Product’ document
    product_catalog_number: 1234, // Ditto
    qty: 94,
    cost: 0.94,
    price: 3.99
}

However, if you’ve denormalized the Product name into the Part document, then when you update the
Product name you must also update every place it occurs in the ‘parts’ collection. This is likely to be a
more expensive update, since you’re updating multiple Parts instead of a single Product. As such, it’s
significantly more important to consider the read-to-write ratio when denormalizing in this way.

Intermediate: Denormalizing With “One-To-Squillions” Relationships

You can also denormalize the “one-to-squillions” example. This works in one of two ways: you can
either put information about the “one” side (from the ‘hosts’ document) into the “squillions” side (the
log entries), or you can put summary information from the “squillions” side into the “one” side.

Here’s an example of denormalizing into the “squillions” side. I’m going to add the IP address of the
host (from the ‘one’ side) into the individual log message:

> db.logmsg.findOne()
{
    time : ISODate("2014-03-28T[Link].382Z"),
    message : 'cpu is on fire!',
    ipaddr : '[Link]',
    host: ObjectID('AAAB')
}

Your query for the most recent messages from a particular IP address just got easier: it’s now just one
query instead of two.

> last_5k_msg = db.logmsg.find({ipaddr : '[Link]'}).sort({time : -1}).limit(5000).toArray()

In fact, if there’s only a limited amount of information you want to store at the “one” side, you can
denormalize it ALL into the “squillions” side and get rid of the “one” collection altogether:

> db.logmsg.findOne()
{
    time : ISODate("2014-03-28T[Link].382Z"),
    message : 'cpu is on fire!',
    ipaddr : '[Link]',
    hostname : '[Link]',
}

On the other hand, you can also denormalize into the “one” side. Let’s say you want to keep the last
1000 messages from a host in the ‘hosts’ document. You could use the $each / $slice functionality
introduced in MongoDB 2.4 to keep that list sorted, and only retain the last 1000 messages.

The log messages get saved in the ‘logmsg’ collection as well as in the denormalized list in the ‘hosts’
document: that way the message isn’t lost when it ages out of the ‘hosts.logmsgs’ array.

// Get log message from monitoring system
logmsg = get_log_msg();
// Field names on the monitoring record (msg, ipaddr) are assumed here
log_message_here = logmsg.msg;
log_ip = logmsg.ipaddr;
// Get current timestamp
now = new Date();
// Find the _id for the host I’m updating
host_doc = db.hosts.findOne({ipaddr : log_ip}, {_id : 1}); // Don’t return the whole document
host_id = host_doc._id;
// Insert the log message, the parent reference, and the denormalized data into the ‘many’ side
db.logmsg.insert({time : now, message : log_message_here, ipaddr : log_ip, host : host_id});
// Push the denormalized log message onto the ‘one’ side
db.hosts.update( {_id : host_id},
    {$push : {logmsgs : { $each : [ { time : now, message : log_message_here } ],
                          $sort : { time : 1 },  // Sort oldest-first so the newest entries come last
                          $slice : -1000 }       // Only keep the latest 1000
    }} );

Note the use of the projection specification ( {_id:1} ) to prevent MongoDB from having to ship the
entire ‘hosts’ document over the network. By telling MongoDB to only return the _id field, I reduce the
network overhead down to just the few bytes that it takes to store that field (plus just a little bit more
for the wire protocol overhead).
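
To make the effect of the $each / $sort / $slice modifiers concrete, here is a plain-JavaScript sketch of the same bookkeeping on an ordinary array. The `pushCapped` helper and the sample timestamps are hypothetical; the code only mimics the behavior described above and is not MongoDB’s implementation. It assumes `limit` is greater than zero.

```javascript
// Hypothetical helper mimicking {$push: {$each, $sort: {time: 1}, $slice: -limit}}
// on a plain array: append, sort oldest-first, keep only the newest `limit` entries.
function pushCapped(list, newEntries, limit) {
  const merged = list.concat(newEntries);
  merged.sort((a, b) => a.time - b.time); // ascending by time
  return merged.slice(-limit);            // negative slice keeps the tail (newest)
}

// Push five messages, one second apart, into a list capped at three entries.
let logmsgs = [];
for (let i = 1; i <= 5; i++) {
  const entry = { time: new Date(2014, 2, 28, 9, 0, i), message: "msg " + i };
  logmsgs = pushCapped(logmsgs, [entry], 3);
}
console.log(logmsgs.map((m) => m.message)); // the three most recent messages
```

Sorting in ascending order before applying a negative slice is what makes the oldest entries fall off the front of the array.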

Just as with denormalizing in the “One-to-Many” case, you’ll want to consider the ratio of reads to
updates. Denormalizing the log messages into the Host document makes sense only if log messages
are infrequent relative to the number of times the application needs to look at all of the messages for
a single host. This particular denormalization is a bad idea if you want to look at the data less
frequently than you update it.

Recap

In this post, I’ve covered the additional choices that you have past the basics of embed, child-
reference, or parent-reference.

You can use bi-directional referencing if it optimizes your schema, and if you are willing to pay the price of not
having atomic updates

If you are referencing, you can denormalize data either from the “One” side into the “N” side, or from the “N”
side into the “One” side

When deciding whether or not to denormalize, consider the following factors:

You cannot perform an atomic update on denormalized data

Denormalization only makes sense when you have a high read to write ratio

Next time, I’ll give you some guidelines to pick and choose among all of these options.

More Information
Schema Design Consulting Services

Thinking in Documents (recorded webinar)

Schema Design for Time-Series Data (recorded webinar)

Socialite, the Open Source Status Feed - Managing the Social Graph (recorded presentation)

This post was updated in January 2015 to include additional resources and updated links.

