
MongoDB

MongoDB Modeling:
There are two types of modeling in MongoDB:

1. Embedded Modeling

Embedded modeling is a good fit when the related data belongs closely to its parent and there is a one-to-one
relationship between them. In such cases the related data does not change frequently and is not used on its
own. In embedded modeling the related data is stored as a sub-document inside the main document, so
embedded modeling implies denormalization of the data. For example, suppose we have a list of customers
with their personal information and addresses. With the embedded approach we do not need a separate
collection for customer addresses, because each address is only relevant to its respective customer.

A document in the embedded collection will look like this:

{
    "_id" : ObjectId("5d3ffdc8575e43fc45ef05c7"),
    "first_name" : "Customer",
    "last_name" : "One",
    "email" : "[email protected]",
    "work_address" : {
        "address" : "work address",
        "street" : 1,
        "state" : "XYZ",
        "country" : "Xyz"
    },
    "home_address" : {
        "address" : "home address",
        "street" : 1,
        "state" : "XYZ",
        "country" : "Xyz"
    }
}
In embedded modeling the child data belongs to the parent data, which reflects a one-to-one relationship.
Sub-documents should be small and kept in a proper hierarchy. In most cases embedded modeling performs
better for reads, but it uses more memory because the documents and sub-documents are stored together in
the same collection. The total size of a document, including all of its sub-documents, must not exceed 16 MB
(the BSON document size limit).
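
As a minimal sketch of reading embedded data (assuming the customer documents above live in a collection named customers, which the original notes do not state), embedded fields can be filtered and projected with dot notation:

// find customers whose home address is in state "XYZ"
db.customers.find({ "home_address.state": "XYZ" })

// return only the email and the embedded work_address sub-document
db.customers.find({}, { email: 1, work_address: 1 })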

2. Reference Modeling

Reference modeling is a good fit when the relationship between the collections is one-to-many. It is used
when the related data changes frequently and is not fully dependent on the other data. In reference modeling
the _id of one document is stored as a reference in the other document, so reference modeling implies
normalization of the data. For example, suppose we have a list of students enrolled in different courses. In
this case we keep separate students and courses collections: each student has its own document in the
students collection, each course has its own document in the courses collection, and every student document
holds the reference ids of the courses it is enrolled in. The documents will look like this:
Student collection and documents:

{
    _id: 123456,
    name: "Student 1",
    courses: [147852, 963258]
}
{
    _id: 782654,
    name: "Student 2",
    courses: [147852, 963258]
}

Course collection and documents:

{
    _id: 147852,
    title: "Course 1"
}
{
    _id: 963258,
    title: "Course 2"
}
The above is just one example of reference modeling; it can be used in many ways according to the
requirements. Reference modeling is mostly used when the relationship between the data is one-to-many. It is
usually less efficient to read than embedded modeling, but it saves memory because documents do not carry
sub-documents. When using reference modeling we should keep an eye on performance, since resolving
references can increase the number of queries, so the schema should be designed to keep the query count to a
minimum.
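
As a hedged sketch of how the references above can be resolved (assuming the collections are named students and courses), we can either issue two queries from the application or let the server join the data with the $lookup aggregation stage:

// two-query approach: fetch the student, then fetch the referenced courses
var student = db.students.findOne({ _id: 123456 })
var courses = db.courses.find({ _id: { $in: student.courses } }).toArray()

// server-side join using $lookup
db.students.aggregate([
    { $lookup: {
        from: "courses",
        localField: "courses",
        foreignField: "_id",
        as: "courseDocs"
    } }
])

The two-query approach keeps documents small but doubles the round trips, while $lookup performs the join on the server in a single request.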

Datatypes:
1. String
a. This is the most commonly used datatype to store the data. String in MongoDB must be
UTF-8 valid.
2. Integer
a. This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending
upon your server.
3. Boolean
a. This type is used to store a Boolean (true/ false) value.
4. Double
a. This type is used to store floating point values.
5. Undefined
a. This MongoDB data type stores the undefined values.
6. Null
a. This MongoDB data type stores a null value.
7. Min/ Max keys
a. This type is used to compare a value against the lowest and highest BSON elements.
8. Arrays
a. This type is used to store arrays, lists, or multiple values under one key.
9. Timestamp
a. This can be handy for recording when a document has been modified or added.
10. Object
a. This datatype is used for embedded documents.
11. Symbol
a. This MongoDB data type is similar to the string data type. It is not supported by the shell;
if the shell receives a symbol from the database, it is converted into a string.
12. Date
a. This datatype is used to store a date and time, kept internally as milliseconds since the Unix
epoch. You can specify your own date and time by creating a Date object and passing the year, month,
and day into it.
13. Object ID
a. This datatype is used to store the document’s ID.
14. Binary data
a. This datatype is used to store binary data.
15. JavaScript
a. This datatype is used to store JavaScript code into the document.
16. Regular expression
a. This datatype is used to store regular expression.
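
A minimal sketch showing several of these types together in one document (the samples collection and its field names are made up for illustration):

db.samples.insertOne({
    name: "Sample",                 // String
    count: NumberInt(42),           // 32-bit Integer
    price: 9.99,                    // Double
    active: true,                   // Boolean
    tags: ["a", "b", "c"],          // Array
    profile: { level: 1 },          // Object (embedded document)
    createdAt: new Date(),          // Date
    pattern: /mongo/i,              // Regular expression
    raw: BinData(0, "SGVsbG8="),    // Binary data
    nothing: null                   // Null
})

The _id field is added automatically as an Object ID if it is not supplied.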

Profiling:
The MongoDB profiler collects data about all the database operations executed in a mongod instance. There
are different profiling levels in MongoDB:

1. 0 - The profiler is disabled and will not collect any data. This is the default level.
2. 1 - The profiler is enabled but only collects data for operations that take longer than the slowms
value.
3. 2 - The profiler is enabled and collects data for all operations.

Getting profiling level:

To check whether the profiler is enabled or disabled, simply run this command in the mongo shell:

db.getProfilingLevel()

If you get the value 0, the profiler is disabled. To enable the profiler, simply run this command
in the mongo shell:

db.setProfilingLevel(1, { slowms: 20 })

The first parameter in the above command is the profiler level, and slowms is the profiler threshold in
milliseconds. The default threshold is 100 milliseconds, but you can change it by giving your desired value
in milliseconds.

The slowms threshold applies to all databases in a mongod instance. It is used by both the database
profiler and the diagnostic log, and should be set to the highest useful value to avoid performance
degradation.

Profile random sample of slow operations

We can also profile a random sample of slow operations by specifying sampleRate when enabling the
profiler. Use the command below to set sampleRate:

db.setProfilingLevel(1, { sampleRate: 0.42 })

The default value of sampleRate is set to 1.0, meaning all slow operations are profiled. When
sampleRate is set between 0 and 1, databases with profiling level 1 will only profile a randomly sampled
percentage of slow operations according to sampleRate.

The above command sets the profiling level for the current database to 1 and sets the profiler to sample
42% of all slow operations.

Checking current profiler status:

Running db.getProfilingStatus() will give us the following output:

{ "was" : 0, "slowms" : 100, "sampleRate" : 1.0, "ok" : 1 }


As you can see in the output, the default slowms is 100 and the default sampleRate is 1.0. The was field
shows the current profiler level.

View profiler data:

The database profiler writes information about database operations to the system.profile collection.

You can get the profiler data by querying the system.profile collection according to your requirements.
Given below is one example:

db.system.profile.find().limit(10).sort( { ts : -1 } ).pretty()
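
As another hedged example (the namespace test.customers is hypothetical), the profiler data can be filtered by duration or by collection:

// operations that took longer than 100 ms, newest first
db.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).pretty()

// all profiled operations against a specific collection ("<database>.<collection>")
db.system.profile.find({ ns: "test.customers" }).pretty()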

Aggregation Framework:
By using the aggregation framework we can easily group the documents in a collection by specific conditions,
and we can also add computed fields. Aggregation pipelines run on the server, which is often much faster than
fetching the documents with simple queries and processing them in application code.

Aggregation Process:

When working with the aggregation framework we use the documents in a collection as the input data, perform
the desired operations on that input in different stages, and get a refined output that can also be stored in
a different collection. Each stage is independent of the others; the output of one stage is passed as the
input to the next, so in aggregation the order of the stages is very important.

aggregate() is the method used in MongoDB to run an aggregation pipeline. It takes the different stages and
gives us the desired result. Given below is basic pseudocode for aggregate():

db.<collectionName>.aggregate([

    // stage 1
    { $<stageOperator>: { <field>: <value> } },

    // stage 2
    { $<stageOperator>: { <field>: <value> } }

])
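
As a concrete, minimal sketch (reusing the hypothetical students collection from the reference modeling example above), the following pipeline counts how many students are enrolled in each course:

db.students.aggregate([
    // stage 1: keep only students that have at least one course
    { $match: { courses: { $exists: true, $ne: [] } } },
    // stage 2: produce one document per (student, course) pair
    { $unwind: "$courses" },
    // stage 3: count the students per course
    { $group: { _id: "$courses", studentCount: { $sum: 1 } } },
    // stage 4: most popular course first
    { $sort: { studentCount: -1 } }
])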
AllowDiskUse:

Each aggregation stage is allowed to use a maximum of 100 MB of RAM. If a stage exceeds this limit, the
server stops the operation and returns an error.

{ allowDiskUse: true } tells the MongoDB server to write stage data to temporary files, so the server uses
temporary files on disk instead of holding everything in RAM. This is recommended when running aggregation
queries on large data sets. The default behaviour is to use RAM only.
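
A minimal sketch of passing the option (the collection and sort field are placeholders); allowDiskUse is given as the second argument of aggregate():

db.largeCollection.aggregate(
    [ { $sort: { createdAt: -1 } } ],
    { allowDiskUse: true }   // let stages spill to temporary files instead of failing at the 100 MB limit
)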

Stages:

1. $match
2. $group
3. $count
4. $sort
5. $project
6. $limit
7. $unwind
8. Accumulator operators
a. $sum
b. $avg
c. $max
d. $min

The accumulator operators are always used with the $group stage.

9. Expression operators
a. $type
b. $lt
c. $and
d. $or
e. $gt
f. $multiply

These expression operators are mostly used inside the $project stage, and they can also be used within the
$group stage together with the accumulator operators.

10. $out

This stage is used to store the output of an aggregation in a separate collection, as shown in the sketch
below. If the collection does not exist it is created; if it already exists, $out replaces its contents.
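
A hedged sketch combining $match, $group with accumulators, and $out (the orders collection and its fields are hypothetical): group completed orders per customer and write the result to a new collection.

db.orders.aggregate([
    { $match: { status: "completed" } },
    { $group: {
        _id: "$customerId",
        totalSpent: { $sum: "$amount" },
        averageOrder: { $avg: "$amount" }
    } },
    { $out: "customer_totals" }   // writes (and replaces) the customer_totals collection
])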

Indexing:
Indexing is one of the best ways to make queries return results faster. The key is to know what data our
queries fetch and what type of index should be created on which fields to speed the queries up and increase
performance. In aggregation, create an index for the fields used for matching: an early $match stage can take
advantage of an index, while $group generally cannot. The two basic kinds of index are listed below, with
minimal sketches after the list.

1. Single index
2. Compound index
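
Minimal sketches of both kinds (the collection and field names are taken from the embedded example above and are illustrative only):

// single-field index on email
db.customers.createIndex({ email: 1 })

// compound index on last_name, then first_name
db.customers.createIndex({ last_name: 1, first_name: 1 })

// list the indexes that exist on the collection
db.customers.getIndexes()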

Which modeling is best for which type of data?

While designing a database schema in MongoDB, we keep the following points in mind:

1. Dataset
2. Generated data
3. Where to use the data
4. Relationship between the data
5. Fewer queries, more data per query
6. How often the data is changing (more important while indexing)?
7. Document size should stay under 16 MB.
8. Only update the required fields instead of updating the whole document.
9. Avoid application joins.

If we take care of the above points while designing our database schema, it will help us choose the modeling
approach that is best for our data.

Best Practices:
1. The application and the database should be hosted on the same network.
2. Create authenticated users for database access (a minimal sketch follows this list).
3. Use appropriate indexes for queries to get better performance.
4. Use replica sets for high availability.
5. Back up data regularly.
6. Keep the number of array elements in a document well below four figures.
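
A minimal sketch for point 2 above (the database name, user name, password, and role are placeholders, not values from the original notes):

use myAppDb
db.createUser({
    user: "appUser",
    pwd: "changeThisPassword",
    roles: [ { role: "readWrite", db: "myAppDb" } ]   // read/write access limited to myAppDb
})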

Questions and Answers:


1. Reading data with embedded objects along with the main document
a. db.collectionname.find()
b. The above will return all the data, including the main document and its embedded documents
2. Reading data of referenced document with main document
a.
3. Applying any condition on embedded or referenced document (while reading data from main
document)
4. How embedding or references work behind the scenes
5. How to update embedded documents
6. How to read a specific embedded document if there are multiple embedded documents in the main
document
7. How to insert a main document with embedded or referenced data
8. What happens to indexing/caching when a new document is added or updated
9. Validation Framework
10. Any third-party open-source component to use as a wrapper on MongoDB
11. What are the alternative options in MongoDB for something like stored procedures in MySQL
References:
https://www.youtube.com/watch?v=-56x56UppqQ

https://www.youtube.com/watch?v=4rhKKFbbYT4

https://www.youtube.com/watch?v=wA7ui4l8JBw

https://www.youtube.com/results?search_query=mongodb+indexing+explained+urdu+

https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

https://docs.mongodb.com/manual/tutorial/manage-the-database-profiler/

https://docs.mongodb.com/manual/applications/data-models/

https://www.tutorialspoint.com/mongodb/mongodb_datatype

https://www.infoq.com/articles/Starting-With-MongoDB/

https://devops.com/7-best-practices-new-mongodb-users-know/

https://www.developer.com/db/indexing-tips-for-improving-your-mongodb-performance.html

