Introduction to MongoDB
Overview
This chapter will introduce you to MongoDB fundamentals, first defining
data and its types, then exploring how a database solves data storage
challenges. You will learn about the different types of databases and how to
select the right one for your task. Once you have a clear idea about these
concepts, we will discuss MongoDB, its features, architecture, licensing,
and deployment models. By the end of the chapter, you will have gained
hands-on experience using MongoDB through Atlas—the cloud-based
service used to manage MongoDB—and worked with its basic elements,
such as databases, collections, and documents.
Introduction
A database is a platform to store data in a way that is secure, reliable, and easily
available. There are two types of databases used in general: relational databases
and non-relational databases. Non-relational databases are often called NoSQL
databases. A NoSQL database is used to store large quantities of complex and
diverse data, such as product catalogs, logs, user interactions, analytics, and
more. MongoDB is one of the most established NoSQL databases, with features
such as data aggregation, ACID (Atomicity, Consistency, Isolation, Durability)
transactions, horizontal scaling, and Charts, all of which we will explore in detail in
the upcoming sections.
MongoDB comes in different variants and can be utilized for both experimental
and real-world applications. It is easier to set up and simpler to manage than most
other databases due to its intuitive syntax for queries and commands. MongoDB is
available for anyone to install on their own machine(s) or to be used on the cloud
as a managed service. MongoDB's cloud-managed service (called Atlas) offers a free
tier to everyone, whether you are an established enterprise or a student.
Before we start our discussion of MongoDB, let us first learn about database
management systems.
The data stored in the NoSQL database varies depending on the provider, but
generally, data is stored as documents instead of tables. An example of this would be
databases for inventory management, where different products can have different
attributes and, therefore, require a flexible structure. Similarly, an analytics database
that stores data from different sources in different structures would also need a
flexible structure.
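As a rough illustration of why a flexible structure helps, the following plain JavaScript sketch models two inventory records that carry different attributes. The product names and field names are made up for this example; they are assumptions, not data from a real catalog.

```javascript
// Hypothetical inventory records: each product carries only the
// attributes that make sense for it.
const products = [
  { name: "T-shirt", price: 15, size: "M", color: "blue" },
  { name: "Laptop", price: 900, ramGB: 16, screenInches: 14 }
];

// Collect the attribute names used by each record.
const fieldsPerProduct = products.map((p) => Object.keys(p).sort());

// Attributes that appear in one record but not in the other.
const onlyInFirst = fieldsPerProduct[0].filter(
  (f) => !fieldsPerProduct[1].includes(f)
);
const onlyInSecond = fieldsPerProduct[1].filter(
  (f) => !fieldsPerProduct[0].includes(f)
);

console.log(onlyInFirst);  // [ 'color', 'size' ]
console.log(onlyInSecond); // [ 'ramGB', 'screenInches' ]
```

In a table with fixed columns, both records would need every column; here each record simply lists what it has.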
Comparison
Let us compare NoSQL databases and RDBMS based on the following factors. You
will get an in-depth understanding of these as you read through this book. For now,
a basic overview is provided in the following table:
That concludes our discussion on databases and the differences between the various
database types. In the next section, we will begin our exploration of MongoDB.
Introduction to MongoDB
MongoDB is a popular NoSQL database that can store both structured and
unstructured data. Founded in 2007 by Kevin P. Ryan, Dwight Merriman, and Eliot
Horowitz in New York, the organization was initially called 10gen and was later
renamed MongoDB—a word inspired by the term humongous.
It provides both the essential and the advanced features needed to store real-world
big data. Its document-based design makes it easy to understand and use. It is
built to be utilized for both experimental and real-world applications and is easier to
set up and simpler to manage than most of the other NoSQL databases. Its intuitive
syntax for queries and commands makes it easy to learn.
• Flexible and Dynamic Schema: MongoDB allows a flexible schema for your
database. A flexible schema allows variance in fields in different documents.
In simple terms, each record in the database may or may not have the same
number of attributes. It addresses the need for storing evolving data without
making any changes to the schema itself.
• Rich Query Language: MongoDB supports an intuitive and rich query language,
which means simple yet powerful queries. It comes with a rich aggregation
framework that allows you to group and filter data as required. It also has
built-in support for general-purpose text search and specific purposes like
geospatial searches.
• Atomicity means all or nothing: either all the operations in a transaction take
effect or none of them do. If one of the operations fails, all the operations
already executed are rolled back, leaving the affected data in the state it was
in before the transaction started.
• Consistency in a transaction means keeping the data consistent as per the rules
defined for the database. If a transaction breaks any database consistency rules,
then it must be rolled back.
• Durability ensures that the changes made by a committed transaction persist.
Once a transaction has been committed, the database ensures that its changes
survive even a system crash.
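The all-or-nothing idea behind atomicity can be sketched in plain JavaScript. This is only a toy model under stated assumptions: the function runTransaction, the account names, and the balances are hypothetical, and the copy-then-publish approach is chosen for illustration; it is not how MongoDB implements transactions internally.

```javascript
// Apply every operation to a working copy; publish the copy only if all succeed.
function runTransaction(data, operations) {
  const workingCopy = JSON.parse(JSON.stringify(data)); // deep copy of plain data
  try {
    for (const operation of operations) {
      operation(workingCopy);
    }
    return { committed: true, data: workingCopy };
  } catch (error) {
    // Any failure discards the working copy, so the original state is kept.
    return { committed: false, data: data };
  }
}

// Hypothetical account balances used only for this illustration.
const accounts = { alice: 100, bob: 50 };

const debit = (amount) => (state) => {
  if (state.alice < amount) throw new Error("insufficient funds");
  state.alice -= amount;
};
const credit = (amount) => (state) => {
  state.bob += amount;
};

const success = runTransaction(accounts, [debit(30), credit(30)]);
const failure = runTransaction(accounts, [credit(500), debit(500)]);

console.log(success); // committed: alice 70, bob 80
console.log(failure); // not committed: alice 100, bob 50
```

In the failing case the credit has already been applied to the working copy when the debit fails, yet the returned data still shows the original balances, which is the rollback behavior described above.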
Note
MongoDB 1.0 was first officially launched in February 2009 as an open
source database. Since then, there have been several stable releases of
the software. More information about different versions and the evolution
of MongoDB can be found at the official MongoDB website
(https://www.mongodb.com/evolved).
MongoDB Editions
MongoDB is available in two different editions to address the needs of developers
and enterprises, as follows:
Enterprise Edition: The Enterprise Edition uses the same underlying software as
the Community Edition but comes with some additional features, which include
the following:
• In-memory storage engine: This provides high throughput and low latency.
• System event auditing: This lets you record events in JSON format.
Standalone
Replica Set
Sharded
Sharded deployments allow you to store the data in a distributed way. They are
required for applications that manage massive data and expect high throughput. A
shard contains a subset of the data, and each shard must use a replica set to provide
redundancy of the data that it holds. Multiple shards working together provide a
distributed and replicated dataset.
Managing MongoDB
MongoDB provides the user with two options. Based on your requirements, you
can either install it on your system and manage the database yourself or utilize the
Database as a Service (DBaaS) option offered by MongoDB (Atlas). Let us learn more
about these two options.
Self-Managed
MongoDB is available to be downloaded and installed on your machines. The
machine can be a workstation, a server, or a virtual machine in a data center or on the
cloud. You can install MongoDB as a standalone server, a replica set, or a sharded cluster. All
these deployments are possible with both the Community and Enterprise Editions.
Each deployment has its advantages and associated complexity. A self-managed
database can be useful for scenarios where you either want more granular control of
your database or you just want to learn database management and operations.
In this section, we learned about the history of MongoDB and its evolution. We also
learned about different editions of MongoDB and the differences between them. We
concluded the section by learning how MongoDB can be deployed and managed.
MongoDB Atlas
MongoDB Atlas is the DBaaS offering from MongoDB Inc. It allows you to provision
a database on the cloud as a service, which can be used for your applications from
anywhere. Atlas uses cloud infrastructures from different cloud vendors. You can
choose the cloud vendor on which you want to deploy your database. Like any other
managed service, you get the benefits of highly available secured environments with
low or no maintenance needed.
• Simple Setup: The database setup on Atlas is easy and can be done in just a few
steps. Atlas runs a variety of automated tasks behind the scenes to set up your
multi-node cluster.
Cloud Providers
MongoDB Atlas currently supports three cloud providers, namely AWS, GCP,
and Microsoft Azure.
Availability Zones
Availability Zones (AZs) are a group of physical data centers within close proximity,
equipped with computational, storage, or networking resources.
Regions
A region is a geographical area, for example, Sydney, Mumbai, London, and so on.
A region generally consists of two or more AZs. The AZs are generally in different
cities/towns away from each other, to provide fault tolerance in case of any natural
disasters. Fault tolerance is the ability of a system to keep running when something
goes wrong in one portion of the system. In terms of AZs, if one AZ goes down for
some reason, another AZ should still be able to serve the operations.
• AWS: https://docs.atlas.mongodb.com/reference/amazon-aws/#amazon-aws.
• GCP: https://docs.atlas.mongodb.com/reference/google-gcp/#google-gcp.
• Azure: https://docs.atlas.mongodb.com/reference/microsoft-azure/#microsoft-azure.
Atlas Tiers
To build a database cluster in MongoDB Atlas, you need to select a tier. A tier is a
level of database power that you get from your cluster. When you provision your
database in Atlas, you are given two parameters: RAM and storage. Depending on
your selection of these parameters, an appropriate amount of database power is
provisioned. The cost of your cluster is linked to the selection of RAM and storage; a
higher selection means a higher cost and a lower selection means a lower cost.
M0 is the free tier available in MongoDB Atlas, which gives you shared RAM with
storage of 512 MB. It is the tier that we will be using for our learning purposes. The
free tier is not available in all regions, so if you do not find it in your region, select the
closest free tier region. The proximity of your database determines the latency for
your operations.
Selecting a tier requires an understanding of your database usage and how much
you would like to spend. Underprovisioned databases can exhaust your application's
capacity at peak usage and can lead to application errors. Overprovisioned
databases can help your application perform well but are more expensive. One of
the advantages of using a cloud database is that you can always modify your cluster
size as per your needs. However, you still need to determine the optimal capacity for
your day-to-day database use. Determining the maximum number of concurrent
connections is a critical decision factor that can help you choose the appropriate
MongoDB Atlas tier for your use case. Let us look at the different tiers available:
Once we have identified our requirements, the estimated cost can be calculated
as follows:
Note
Apart from the running cost of your cluster, you should consider the cost of
additional services such as backups, data transfer, and support contracts.
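As a rough sketch of such an estimate, assume the running cost is billed per hour of cluster uptime. The function name estimateClusterCost and the rate of 0.5 USD per hour are placeholders invented for this example; they are not real Atlas prices, and the extra services mentioned in the note are left out.

```javascript
// Estimate the running cost of a cluster from an hourly rate.
// The rate passed in is a placeholder, not a real Atlas price.
function estimateClusterCost(hourlyRateUsd, hoursPerDay, days) {
  return hourlyRateUsd * hoursPerDay * days;
}

// Hypothetical example: 0.5 USD per hour, running all day for 30 days.
const monthlyCost = estimateClusterCost(0.5, 24, 30);
console.log(monthlyCost.toFixed(2)); // 360.00
```

Substitute the rate shown for your chosen tier, provider, and region to get a figure that reflects your own cluster.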
Let us implement our learning in an example scenario through the following exercise.
2. You can sign up using your Google account or by providing your details manually
as can be seen from the following screen. Provide your usage, Your Work
Email, First Name, Last Name, and Password details in the respective
fields, select the checkbox to agree to the terms of service and click Get
started free.
The following window appears in which you can enter your organization and
project details:
Next, you should see the following page, which means your account has been
successfully created:
MongoDB Atlas Organizations, Projects, Users, and Clusters | 15
Organizations
A MongoDB Atlas organization is the top-level entity in your account, containing other
elements such as projects, clusters, and users. You need to set up an organization
first before any other resources.
2. You will see the default organization in the list of organizations. To create a
new organization, click the Create New Organization button in the
top-right corner:
3. Type the organization name in the Name Your Organization field. Leave
the default selection for Cloud Service as MongoDB Atlas. Click Next to
proceed to the next step:
4. You will see your login as the Organization Owner. Leave everything at
its default and click Create Organization.
Once you have successfully created the organization, the following Projects
screen will appear:
So, in this exercise, you have successfully created the organization for your
MongoDB application.
Projects
A project provides a grouping of clusters and users for a specific purpose; for
example, you may want to separate your lab, demo, and production environments.
Similarly, you may want a different network, region, and user setup for different
environments. Projects allow you to do this grouping as per your own organizational
needs. In the next exercise, you will create a project.
2. Provide a name for your project on the Name Your Project tab. Name the
project myMongoProject. Click Next:
3. Click Create Project. The Add Members and Set Permissions page
is not mandatory, so leave it as the default. Your name should appear as the
Project Owner:
Figure 1.14: Add Members and Set Permissions for the project
Your project is now set up. A cluster setup splash screen appears as shown in
the following figure:
Now that you have created a project, you can create your first MongoDB
cloud deployment.
MongoDB Clusters
A MongoDB cluster is the term used for a database replica set or sharded deployment
in MongoDB Atlas. A cluster is a distributed set of servers used for data storage and
retrieval. A MongoDB cluster, at the minimum level, is a three-node replica set. In
a sharded environment, a single cluster may contain hundreds of nodes/servers
containing different replica sets with each replica set comprised of at least three
nodes/servers.
5. Select the Recommended region that is closest to your location and is free. In
this case, you are selecting Sydney, as can be seen from the following figure:
On the region selection page, you will see your cluster setting as per your
selection. The Cluster Tier will be M0 Sandbox (Shared RAM, 512 MB
storage), Additional Settings will be MongoDB 4.2 No Backup,
and Cluster Name will be Cluster0:
6. Ensure that the selections are made correctly in the preceding step so that the
cost appears as FREE. Any selections different from what is recommended in
the previous steps may add costs for your cluster. Click on Create Cluster:
After a few minutes, you should see your new cluster, as shown here:
3. Now type your Password and click Login. The Clusters window appears as
shown here:
4. Click the CONNECT button under Cluster0. It will open a modal screen
as follows:
The first step before you connect to the cluster is to whitelist your IP address.
MongoDB Atlas has a built-in security feature that is enabled by default, which
blocks connectivity to the database from everywhere. So, the whitelisting of the
client IP is necessary to connect to the database.
6. The screen will show your current IP address; just click on the Add IP
Address button. If you wish to add more IPs to the whitelist, you can add
them manually by clicking the Add a Different IP Address option (see
preceding figure):
7. To create a new MongoDB user, provide a Username and Password for a new
user and click on the Create Database User button to create a user as
shown here:
Once the details are successfully updated, the following screen appears:
9. Download and install the mongo shell by selecting the options for your
workstation/client machine as shown in the following screenshot:
10. Once you have the mongo shell installed, run the connection string you grabbed
in the preceding step to connect to your database. When prompted, enter the
password that you used for your MongoDB user in the previous step:
If everything goes well, you should see the mongo shell connected to your Atlas
cluster. Here is some sample output from running the connection string:
Ignore the warnings seen in Figure 1.37. At the end, you should see your cluster
name and a command prompt. You can run the show databases command
to list the existing databases. You should see the two databases that are used by
MongoDB for administrative purposes. Here is some sample output of the show
databases command:
MongoDB Enterprise Cluster0-shard-0:PRIMARY> show databases
admin 0.000GB
local 4.215GB
MongoDB Elements
Let us dive into some very basic elements of MongoDB, such as databases,
collections, and documents. Databases are basically aggregations of collections,
which in turn, are made up of documents. A document is the basic building block in
MongoDB and contains information about the various fields in a key-value format.
Documents
MongoDB stores data records in documents. A document is a collection of field
names and values, structured in a JavaScript Object Notation (JSON)-like format.
JSON is an easy-to-understand key-value pair format to describe data. The documents
in MongoDB are stored as an extension of the JSON type, which is called BSON (Binary
JSON). It is a binary-encoded serialization of JSON-like documents. BSON is designed
to be more efficient in space than standard JSON. BSON also contains extensions that
allow the representation of data types that cannot be represented in JSON. We will
look at these in detail in Chapter 2, Documents and Data Types.
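One type that plain JSON cannot represent is a date. The following sketch uses only standard JavaScript (no MongoDB driver) to show a date value turning into a string after a round trip through JSON text. It illustrates the limitation of JSON that BSON addresses, not the BSON encoding itself; the sample record is made up for this example.

```javascript
// A record holding a date value, as a JavaScript object.
const person = {
  name: "Sample Person",
  dateOfBirth: new Date("2007-12-25T00:00:00Z")
};

// Round-trip the record through plain JSON text.
const restored = JSON.parse(JSON.stringify(person));

console.log(person.dateOfBirth instanceof Date); // true
console.log(typeof restored.dateOfBirth);        // string
console.log(restored.dateOfBirth);               // 2007-12-25T00:00:00.000Z
```

After the round trip the value is just text, so the application has to know that this field should be parsed back into a date.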
Document Structures
MongoDB documents contain field and value pairs and follow a basic structure, as
follows:
{
"firstFieldName": firstFieldValue,
"secondFieldName": secondFieldValue,
…
"nthFieldName": nthFieldValue
}
{
"_id":ObjectId("5da26111139a21bbe11f9e89"),
"name":"Anita P",
"placeOfBirth":"Koszalin",
"profession":"Nursing"
}
The following is another example with some fields and date types from BSON:
{
"_id" : ObjectId("5da26553fb4ef99de45a6139"),
"name" : "Roxana",
"dateOfBirth" : new Date("Dec 25, 2007"),
"placeOfBirth" : "Brisbane",
"profession" : "Student"
}
{
"_id" : ObjectId("5da2685bfb4ef99de45a613a"),
"name" : "Helen",
"dateOfBirth" : new Date("Dec 25, 2007"),
"placeOfBirth" : "Brisbane",
"profession" : "Student",
"hobbies" : [
"painting",
"football",
"singing",
"story-writing"],
"address" : {
"city" : "Sydney",
"country" : "Australia",
"postcode" : 2161
}
}
The _id field shown in the preceding snippet is auto-generated by MongoDB and
is used as a unique identifier for the document. We will learn more about this in the
upcoming chapters.
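To see how the array and the embedded document in the preceding example can be read, here is a small sketch in plain JavaScript. The object repeats part of that document; the _id and dateOfBirth fields are left out so that the snippet runs without the mongo shell.

```javascript
// Part of the preceding document, written as a plain JavaScript object.
const helen = {
  name: "Helen",
  hobbies: ["painting", "football", "singing", "story-writing"],
  address: { city: "Sydney", country: "Australia", postcode: 2161 }
};

const firstHobby = helen.hobbies[0];     // array elements are read by position
const city = helen.address.city;         // nested fields are read with dot notation
const hobbyCount = helen.hobbies.length; // number of values in the array

console.log(firstHobby, city, hobbyCount); // painting Sydney 4
```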
Collections
In MongoDB, documents are stored in collections. Collections are analogous to tables
in relational databases. You need to use the collection name in your queries for
operations such as insert, retrieve, delete, and so on.
Creating a Database
Creating a database in MongoDB is very simple. Execute the use command in the
mongo shell as follows, by replacing yourDatabaseName with your own choice of
database name:
use yourDatabaseName
If the database does not exist, MongoDB switches the current database to the new
name and creates the database once you first store data in it. If the database exists,
MongoDB switches to it. Here is the output of the last command:
switched to db yourDatabaseName
Note
Naming conventions and using logical names always help even if you are
working on a learning project. The project name is meant to be replaced
by something more meaningful for you and understandable for later use.
This rule applies to the name of any asset that we create, so try to use
logical names.
Creating a Collection
You can use the createCollection command to create a collection. This
command allows you to utilize different options for your collection, such as a capped
collection, validation, collation, and so on. Another way to create a collection is by just
inserting a document in a non-existent collection. In such a case, MongoDB checks
whether the collection exists, and if not, it will create the collection before inserting
the documents passed. We will try to utilize both methods to create a collection.
db.createCollection( '<collectionName>',
{
capped: <boolean>,
autoIndexId: <boolean>,
size: <number>,
max: <number>,
storageEngine: <document>,
validator: <document>,
validationLevel: <string>,
validationAction: <string>,
indexOptionDefaults: <document>,
viewOn: <string>,
pipeline: <pipeline>,
collation: <document>,
writeConcern: <document>
})
db.createCollection('myCappedCollection',
{
capped: true,
size: 256,
max: 5
})
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1592064731, 1),
"signature" : {
"hash" : BinData(0,"XJ2DOzjAagUkftFkLQIT9W2rKjc="),
"keyId" : NumberLong("6834058563036381187")
}
},
"operationTime" : Timestamp(1592064731, 1)
}
Do not worry about the preceding options much as none of them are mandatory. If
you do not need to set any of these, then your createCollection command can
be simplified as follows:
db.createCollection('myFirstCollection')
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1597230876, 1),
"signature" : {
"hash" : BinData(0,"YO8Flg5AglrxCV3XqEuZGaaLzZc="),
"keyId" : NumberLong("6853300587753111555")
}
},
"operationTime" : Timestamp(1597230876, 1)
}
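The capped and max options used for myCappedCollection limit how many documents the collection keeps: once the limit is reached, the oldest documents make room for new ones. The following plain JavaScript sketch mimics only that behavior. The class CappedList is hypothetical, and the sketch ignores the size option (the limit in bytes), so treat it as a simplified model rather than MongoDB's implementation.

```javascript
// A toy model of a capped collection that keeps at most `max` documents.
class CappedList {
  constructor(max) {
    this.max = max;
    this.documents = [];
  }

  insert(document) {
    this.documents.push(document);
    // Once the limit is exceeded, drop the oldest document first.
    if (this.documents.length > this.max) {
      this.documents.shift();
    }
  }
}

const capped = new CappedList(5); // mirrors max: 5 from the example
for (let i = 1; i <= 7; i++) {
  capped.insert({ seq: i });
}

const remaining = capped.documents.map((d) => d.seq);
console.log(remaining); // [ 3, 4, 5, 6, 7 ]
```

Seven inserts into a list capped at five leave only the five most recent documents.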
use yourDatabaseName;
db.myCollectionName.insert(
{
"name" : "Yahya A", "company" : "Sony"}
);
WriteResult({ "nInserted" : 1 })
The preceding output returns the number of documents inserted into the collection.
As you have inserted a document in a non-existent collection, MongoDB must have
created the collection for us before inserting this document. To confirm that, display
your collections list using the following command:
show collections;
The output of your command should display the list of collections in your database,
something like this:
myCollectionName
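The create-on-first-insert behavior can be modeled with a few lines of plain JavaScript. The Map named database and the function insertInto are hypothetical stand-ins for this sketch; they only mirror the check described above and are not part of the mongo shell.

```javascript
// Toy in-memory "database": a Map from collection name to an array of documents.
const database = new Map();

function insertInto(collectionName, document) {
  // Create the collection on first use, then add the document.
  if (!database.has(collectionName)) {
    database.set(collectionName, []);
  }
  database.get(collectionName).push(document);
}

const before = [...database.keys()];
insertInto("myCollectionName", { name: "Yahya A", company: "Sony" });
const after = [...database.keys()];

console.log(before); // []
console.log(after);  // [ 'myCollectionName' ]
```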
Creating Documents
As you must have noticed in the previous section, we used the insert
command to put a document in a collection. Let us look at a couple of variants of
insert commands.
db.blogs.insertOne(
{ username: "Zakariya", noOfBlogs: 100, tags: ["science",
"fiction"]
})
The insertOne operation returns the _id value of the newly inserted document.
Here is the output of the insertOne command:
{
"acknowledged" : true,
"insertedId" : ObjectId("5ea3a1561df5c3fd4f752636")
}
Note
insertedId is the unique ID for the document that is inserted, and it will
not be the same for you as mentioned in the output.
db.blogs.insertMany(
[
{ username: "Thaha", noOfBlogs: 200, tags: ["science",
"robotics"]},
{ username: "Thayebbah", noOfBlogs: 500, tags: ["cooking",
"general knowledge"]},
{ username: "Thaherah", noOfBlogs: 50, tags: ["beauty",
"arts"]}
]
)
The output returns the _id values of all the newly inserted documents:
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5f33cf74592962df72246ae8"),
ObjectId("5f33cf74592962df72246ae9"),
ObjectId("5f33cf74592962df72246aea")
]
}
db.collection.find(query, projection)
The command takes two optional parameters: query and projection. The query
parameter allows you to pass a document to apply filters during the find operation.
The projection parameter allows you to pick desired attributes from the returned
documents instead of all the attributes. When no parameter is passed in the find
command, then all the documents are returned.
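To make the roles of the two parameters concrete, here is a toy version written in plain JavaScript. The function findDocuments and the sample data are hypothetical; the sketch supports only equality filters and inclusion-style projections, whereas the real find command offers many more operators and also returns the _id field by default.

```javascript
// Toy version of find(): equality-only filter plus an inclusion-only projection.
function findDocuments(documents, query = {}, projection = null) {
  const matches = documents.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  if (projection === null) {
    return matches; // no projection: return whole documents
  }
  // Keep only the fields marked with 1 in the projection.
  const wanted = Object.keys(projection).filter((f) => projection[f] === 1);
  return matches.map((doc) =>
    Object.fromEntries(wanted.filter((f) => f in doc).map((f) => [f, doc[f]]))
  );
}

// Hypothetical sample data.
const people = [
  { Name: "A", City: "Sydney", Age: 30 },
  { Name: "B", City: "Perth", Age: 41 }
];

const everyone = findDocuments(people);
const sydneyNames = findDocuments(people, { City: "Sydney" }, { Name: 1 });

console.log(everyone.length); // 2
console.log(sydneyNames);     // [ { Name: 'A' } ]
```

With no arguments every document comes back; with both arguments only the matching document is returned, trimmed to the requested field.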
db.records.insertMany(
[
{ Name: "Aaliya A", City: "Sydney"},
{ Name: "Naseem A", City: "New Delhi"}
]
)
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5f33cfac592962df72246aeb"),
ObjectId("5f33cfac592962df72246aec")
]
}
First, fetch these records using the find command without the pretty method:
db.records.find()
Now, run the same find command using the pretty method:
db.records.find().pretty()
It should return the same records, but in a beautifully formatted way as shown here:
{
"_id" : ObjectId("5f33cfac592962df72246aeb"),
"Name" : "Aaliya A",
"City" : "Sydney"
}
{
"_id" : ObjectId("5f33cfac592962df72246aec"),
"Name" : "Naseem A",
"City" : "New Delhi"
}
Clearly, the pretty() method can be quite useful when you are looking at multiple
or nested documents, as the output is more easily readable.
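A similar contrast between compact and indented output can be reproduced in plain JavaScript with JSON.stringify. This is only an analogy for what pretty() does in the mongo shell, not its implementation; the record below reuses one of the sample documents without its _id.

```javascript
const record = { Name: "Aaliya A", City: "Sydney" };

const compact = JSON.stringify(record);           // everything on one line
const indented = JSON.stringify(record, null, 2); // one field per line, 2-space indent

console.log(compact);
console.log(indented);
```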
[
{
"title": "Rocky",
"releaseDate": new Date("Dec 3, 1976"),
"genre": "Action",
"about": "A small-time boxer gets a supremely rare chance to fight a heavyweight champion in a bout in which he strives to go the distance for his self-respect.",
"countries": ["USA"],
"cast" : ["Sylvester Stallone","Talia Shire",
"Burt Young"],
"writers" : ["Sylvester Stallone"],
"directors" : ["John G. Avildsen"]
},
{
"title": "Rambo 4",
"releaseDate": new Date("Jan 25, 2008"),
"genre": "Action",
"about": "In Thailand, John Rambo joins a group of mercenaries to venture into war-torn Burma, and rescue a group of Christian aid workers who were kidnapped by the ruthless local infantry unit.",
"countries": ["USA"],
"cast" : ["Sylvester Stallone", "Julie Benz",
"Matthew Marsden"],
"writers" : ["Art Monterastelli",
"Sylvester Stallone"],
"directors" : ["Sylvester Stallone"]
}
]
5. Create an awards collection with a few records using the following data:
{
"title": "Oscars",
"year": "1976",
"category": "Best Film",
"nominees": ["Rocky","All The President's Men","Bound For Glory","Network","Taxi Driver"],
"winners" :
[
{
"movie" : "Rocky"
}
]
}
{
"title": "Oscars",
"year": "1976",
"category": "Actor In A Leading Role",
"nominees": ["PETER FINCH","ROBERT DE NIRO",
"GIANCARLO GIANNINI","WILLIAM HOLDEN","SYLVESTER STALLONE"],
"winners" :
[
{
"actor" : "PETER FINCH",
"movie" : "Network"
}
]
}
6. Check whether your inserts have saved the documents in the collection as
desired by fetching the documents.
Note
The solution for this activity can be found via this link.
Summary
We began this chapter by covering the fundamentals of data, databases, RDBMS,
and NoSQL databases. You learned the differences between RDBMS and NoSQL
databases, and how to decide which database is a good fit for a given scenario. You
learned that MongoDB can be self-managed or used as DBaaS, set up your account
in MongoDB Atlas, and reviewed MongoDB deployment on different cloud platforms
and how to estimate its cost. We concluded the chapter with the MongoDB structure
and its basic components, such as databases, collections, and documents. In the
next chapter, you will utilize these concepts to explore MongoDB components and its
data model.