Lecture Notes 2 - MongoDB Data Modeling
LECTURE 2 NOTES
● What is Data Modeling?
■ In MongoDB, data modeling refers to the process of designing the structure of your
data to best suit the needs of your application. Unlike traditional relational
databases where data is organized in tables, rows, and columns, MongoDB is a
NoSQL database that stores data in flexible, JSON-like documents.
■ The key challenge in data modeling is balancing the needs of the application, the
performance characteristics of the database engine, and the data retrieval
patterns. When designing data models, always consider the application usage of
the data (i.e. queries, updates, and processing of the data) as well as the inherent
structure of the data itself.
■ Flexible Schema
○ Unlike SQL databases, where you must determine and declare a table's
schema before inserting data, MongoDB's collections, by default, do not
require their documents to have the same schema. That is:
○ The documents in a single collection do not need to have the same set of
fields and the data type for a field can differ across documents within a
collection.
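As a quick sketch of the flexible-schema point, here are two documents that could legally live in the same collection (plain Python dicts stand in for BSON documents; the user data is made up for illustration):

```python
# Two documents headed for the same "users" collection: they have
# different field sets, and "joined" has a different type in each.
# (Illustrative data only; not from the lecture.)
user_a = {"_id": 1, "name": "Asha", "joined": "2023-05-01", "tags": ["admin"]}
user_b = {"_id": 2, "name": "Ben", "joined": 2023}  # no "tags", integer "joined"

# MongoDB would accept both in one collection; a relational table would not.
extra_fields = set(user_a) - set(user_b)
print(extra_fields)  # fields only user_a has
```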
■ Document Structure
○ With MongoDB, you may embed related data in a single structure or document.
These schemas are generally known as ‘denormalized’ models, and take
advantage of MongoDB's rich documents.
○ In general, use embedded data models when you have ‘contains’ relationships between entities, or one-to-many relationships in which the ‘many’ (child) documents always appear with, or are viewed in the context of, the ‘one’ (parent) documents.
○ To access data within embedded documents, use dot notation to ‘reach into’ them. See the MongoDB manual pages Query an Array and Query on Embedded/Nested Documents for more examples of accessing data in arrays and embedded documents.
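A small sketch of both points: the embedded one-to-many model and dot-notation access. The dict shapes and the get_path helper are made up for illustration; in real queries the server interprets the dotted path (e.g. "addresses.0.city") for you:

```python
# A 'contains' one-to-many relationship modeled by embedding: the
# addresses only ever appear in the context of their parent user.
user = {
    "_id": 1,
    "name": "Asha",
    "addresses": [
        {"label": "home", "city": "Pune"},
        {"label": "work", "city": "Mumbai"},
    ],
}

def get_path(doc, path):
    # Resolve a MongoDB-style dotted path against nested dicts/lists.
    # Hypothetical helper: real queries pass the dotted path to the server.
    cur = doc
    for part in path.split("."):
        cur = cur[int(part)] if isinstance(cur, list) else cur[part]
    return cur

print(get_path(user, "addresses.0.city"))  # -> Pune
```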
■ Documents in MongoDB must be smaller than the maximum BSON document size (16 MB).
■ For many use cases in MongoDB, the denormalized data model is optimal.
● Atomicity of Write Operations
○ A denormalized data model with embedded data combines all related data in a
single document instead of normalizing across multiple documents and
collections. This data model facilitates atomic operations.
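To see why embedding helps atomicity: in the denormalized model below, the order status and total live in one document, so one write changes both together. The shapes and the apply_update stand-in are illustrative; in MongoDB this would be a single updateOne with $set, which is atomic per document:

```python
# Denormalized order: header fields and line items in one document.
order = {
    "_id": 101,
    "status": "pending",
    "items": [{"sku": "A1", "qty": 2}],
    "total": 20.0,
}

def apply_update(doc, set_fields):
    # Toy stand-in for a single {$set: ...} update: because it touches
    # one document, all the fields change together or not at all.
    doc.update(set_fields)
    return doc

apply_update(order, {"status": "paid", "total": 18.0})  # one atomic write
print(order["status"], order["total"])
```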
■ Multi-Document Transactions
○ For situations that require atomicity of reads and writes to multiple documents
(in a single or multiple collections), MongoDB supports multi-document
transactions:
○ Note: in many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.
● Sharding
■ To distribute data and application traffic in a sharded collection, MongoDB uses the
shard key. Selecting the proper shard key has significant implications for
performance, and can enable or prevent query isolation and increased write
capacity. While you can change your shard key later, it is important to carefully
consider your shard key choice.
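Why shard key choice matters can be sketched with a toy hashed router (MongoDB's real hashed sharding uses its own hash function and chunk ranges; md5 here is only illustrative):

```python
import hashlib

def shard_for(key_value, num_shards=3):
    # Toy hashed-shard-key routing: hash the key value, map it to a shard.
    h = int(hashlib.md5(str(key_value).encode()).hexdigest(), 16)
    return h % num_shards

# A high-cardinality key (user id) can use every shard; a key with only
# two values ("active"/"inactive") can never use more than two.
shards_used = {shard_for(uid) for uid in range(1000)}
print(len(shards_used))           # all 3 shards receive writes
status_shards = {shard_for(s) for s in ["active", "inactive"]}
print(len(status_shards))         # at most 2 shards, however many exist
```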
● Indexes
■ Use indexes to improve performance for common queries. Build indexes on fields
that appear often in queries and for all operations that return sorted results.
MongoDB automatically creates a unique index on the _id field.
○ While in use, each index consumes disk space and memory. This usage can be significant and should be tracked for capacity planning, especially for concerns over working set size.
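The effect of an index can be sketched in memory: instead of scanning every document for env == "dev", a single-field index maps each value straight to the matching _ids (a toy structure; real MongoDB indexes are B-trees maintained by the server, and the document shapes here are invented):

```python
# Documents in a toy collection (illustrative shapes).
docs = [
    {"_id": 1, "env": "dev"},
    {"_id": 2, "env": "prod"},
    {"_id": 3, "env": "dev"},
]

# Build a single-field "index" on env: value -> list of _ids.
index_on_env = {}
for d in docs:
    index_on_env.setdefault(d["env"], []).append(d["_id"])

print(index_on_env["dev"])  # direct lookup, no collection scan
```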
■ Consider a sample collection log that stores log documents for various
environments and applications. The log collection contains documents in the
following form:
{ log: "dev", ts: ..., info: ... }
■ If the total number of documents is low, you may group documents into collections
by type. For logs, consider maintaining distinct log collections, such as logs_dev
and logs_debug. The logs_dev collection would contain only the documents related
to the dev environment.
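The routing for that scheme is trivial in application code; a sketch (the document shape follows the log example above, and the helper name is made up):

```python
def log_collection_name(environment):
    # Map an environment to its dedicated collection, e.g. logs_dev.
    return f"logs_{environment}"

doc = {"log": "dev", "ts": "2024-01-01T00:00:00Z", "info": "started"}
print(log_collection_name(doc["log"]))  # -> logs_dev
```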
■ Generally, having a large number of collections carries no significant performance penalty and can result in very good performance. Distinct collections are very important for high-throughput batch processing.
■ When using models that have a large number of collections, consider the following
behaviors:
○ Each index, including the index on _id, requires at least 8 kB of data space.
○ For each database, a single namespace file (i.e. <database>.ns) stores all meta-data for that database, and each index and collection has its own entry in the namespace file. See the namespace length limits for specific limitations.
○ ‘Rolling up’ these small documents into logical groupings means that
queries to retrieve a group of documents involve sequential reads and fewer
random disk accesses. Additionally, ‘rolling up’ documents and moving
common fields to the larger document benefit the index on these fields.
There would be fewer copies of the common fields and there would be
fewer associated key entries in the corresponding index. See Indexes for
more information on indexes.
○ However, if you often only need to retrieve a subset of the documents within
the group, then ‘rolling-up’ the documents may not provide better
performance. Furthermore, if small, separate documents represent the
natural model for the data, you should maintain that model.
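A sketch of the ‘rolling up’ idea: many per-minute documents become one per-hour document, so reading the hour is one fetch instead of sixty (the shapes and the roll_up helper are assumptions for illustration):

```python
# Sixty small per-minute documents (illustrative shape).
events = [{"minute": m, "count": 1} for m in range(60)]

def roll_up(hour, events):
    # Group an hour of small documents into one larger document.
    return {
        "_id": f"2024-01-01T{hour:02d}",
        "counts": [e["count"] for e in events],
        "total": sum(e["count"] for e in events),
    }

doc = roll_up(13, events)
print(doc["total"])  # 60 events, readable in a single fetch
```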
■ Consider the following suggestions and strategies for optimizing storage utilization
for these collections:
➢ To optimize storage use, users can specify a value for the _id field explicitly
when inserting documents into the collection. This strategy allows
applications to store a value in the _id field that would have occupied space
in another portion of the document.
➢ You can store any value in the _id field, but because this value serves as a
primary key for documents in the collection, it must uniquely identify them.
If the field's value is not unique, then it cannot serve as a primary key as
there would be collisions in the collection.
➢ MongoDB stores all field names in every document. For most documents,
this represents a small fraction of the space used by a document; however,
for small documents the field names may represent a proportionally large
amount of space. Consider a collection of small documents that resemble
the following:
{ last_name: "Smith", best_score: 3.9 }
➢ If you shorten the field named last_name to lname and the field named
best_score to score, as follows, you could save 9 bytes per document.
{ lname: "Smith", score: 3.9 }
○ Embed documents:
➢ In some cases you may want to embed documents in other documents and
save on the per-document overhead. See Collection Contains Large Number
of Small Documents.
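The byte arithmetic behind the renaming strategy checks out directly, and the explicit-_id strategy is a one-liner; a quick sketch (the SKU value is invented for illustration):

```python
# Field names are stored in every BSON document, so each rename saves
# len(old_name) - len(new_name) bytes per document.
renames = {"last_name": "lname", "best_score": "score"}
savings_per_doc = sum(len(old) - len(new) for old, new in renames.items())
print(savings_per_doc)               # 9 bytes per document, as above
print(savings_per_doc * 1_000_000)   # roughly 9 MB across a million docs

# Explicit _id: store a natural unique key once, instead of keeping it
# in another field alongside an auto-generated ObjectId.
product = {"_id": "SKU-00042", "lname": "Smith", "score": 3.9}
```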
■ Best practices for MongoDB data modeling revolve around document structure, schema design, embedding vs referencing, and scalability:
○ Know Your Application: Understand how your application will use the data,
the read/write patterns, and the types of queries it will execute most
frequently.
○ Analyze Access Patterns: Design the data model based on how data will be
accessed and queried.
○ Index Appropriately:
○ Compound Indexes: For queries that involve multiple fields, consider using
compound indexes.
○ Use References Judiciously: If referencing data, ensure it's done for logical
relationships and doesn't lead to excessive query loads.
○ Refine Data Model: Modify the data model as needed based on observed
performance and changing application requirements.
○ Prototype and Test: Experiment with different data models and query
patterns. Test the performance of approaches to find the most efficient one.
○ Iterate Based on Feedback: Use insights gained from testing to refine and
improve the data model iteratively.
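How a compound index serves a multi-field query can be sketched with sorted key tuples and binary search (a toy model; real MongoDB indexes are server-maintained B-trees, and field order in the key matters just as it does here):

```python
import bisect

# Toy compound index on (env, ts): (key tuple, _id) entries kept sorted.
entries = sorted([
    (("dev", 3), 1), (("prod", 1), 2), (("dev", 1), 3), (("dev", 2), 4),
])
keys = [k for k, _ in entries]

# Query: env == "dev" AND 2 <= ts <= 3, answered by two binary searches.
lo = bisect.bisect_left(keys, ("dev", 2))
hi = bisect.bisect_right(keys, ("dev", 3))
matching_ids = [doc_id for _, doc_id in entries[lo:hi]]
print(matching_ids)  # -> [4, 1]
```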
■ Example Use Cases (e.g. e-commerce platforms, IoT, analytics, social networks, logging, gaming):
○ IoT: Storing sensor data, device information, and telemetry data.
○ Analytics and Event Tracking: Capturing and analyzing log data, user interactions, or system events.
○ Social Networks: Managing user profiles, friendships, posts, comments, and likes. Modeling approach: use a combination of embedding and referencing to balance data retrieval efficiency with consistency.
○ Logging and Monitoring: Storing and analyzing system events, errors, and performance metrics.
○ Gaming: Storing user profiles, game data, scores, and achievements.
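The embedding-plus-referencing blend mentioned for social networks can be sketched like this (the document shapes are assumptions; the application-side join in render_post is what an aggregation $lookup or a second query would do for real):

```python
# Comments are embedded (always read with their post); the author is
# referenced by _id (shared by many posts, updated in one place).
users = {"u1": {"_id": "u1", "name": "Asha"}}
post = {
    "_id": "p1",
    "author_id": "u1",                         # reference
    "body": "hello",
    "comments": [{"by": "u2", "text": "hi"}],  # embedded
}

def render_post(post, users):
    # Resolve the author reference in application code (illustrative).
    author = users[post["author_id"]]
    return {"author": author["name"], "body": post["body"],
            "comment_count": len(post["comments"])}

print(render_post(post, users))
```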