Module-3

MongoDB is a cross-platform, open-source, non-relational, distributed NoSQL database that uses BSON for data storage, allowing for flexible and dynamic schemas. It addresses challenges faced by traditional RDBMS, such as handling large volumes of unstructured data and providing scalability, fault tolerance, and ease of distribution. Key features include support for dynamic queries, replication for high availability, sharding for horizontal scaling, and in-place updates for improved performance.

Uploaded by

dhanulokesh06
Copyright
© © All Rights Reserved
MODULE-3

Introduction to MongoDB
What is MongoDB?
MongoDB is
i. Cross-platform.

ii. Open source.

iii. Non-relational.

iv. Distributed.

v. NoSQL.

vi. Document-oriented data store.

Why MongoDB?
• A few of the major challenges with traditional RDBMS are dealing with large volumes of
data and a rich variety of data, particularly unstructured data, and meeting the scale
needs of enterprise data.
• The need is for a database that can scale out or scale horizontally to meet the scale
requirements, has flexibility with respect to schema, is fault tolerant, is consistent and
partition tolerant, and can be easily distributed over a multitude of nodes in a cluster.

Using JavaScript Object Notation (JSON)

• JSON is extremely expressive.
• MongoDB does not actually store JSON; it uses BSON (pronounced "Bee Son"), a binary form of JSON.
• It is an open standard.
• It is used to store complex data structures.
• Let us look at how data is stored in a .csv file. Assume this data is about the employees of
an organization named "XYZ". As can be seen below, the column values are separated
using commas and the rows are separated by a carriage return.
• John, Mathews, +123 4567 8900
Andrews, Symmonds, +456 7890 1234
Mable, Mathews, +789 1234 5678
• This looks good! However, let us make it slightly more legible by adding column headings.
• FirstName, LastName, ContactNo
John, Mathews, +123 4567 8900
Andrews, Symmonds, +456 7890 1234
Mable, Mathews, +789 1234 5678
• Now assume that a few employees have more than one ContactNo. These can be neatly
classified as OfficeContactNo and HomeContactNo.
• Let us look at another piece of data that you wish to store about the employees: their
email addresses. Here again we have the same issue. A few employees have two email
addresses, a few have three, and a few have more than three.
• As we come across these fields or columns, we realize that it gets messy with .csv. CSV
files are known to store data well if it is flat and does not have repeating values.
• The problem becomes even more complex when different departments maintain the
details of their employees. The formats of the .csv files (columns, etc.) could differ vastly,
and it will take some effort to merge the files from the various departments into a
single file.
• This problem can be solved by XML. But as the name suggests, XML is highly extensible.
It does not define a data format; rather, it defines how you define a data
format. You may be prepared to undertake this cumbersome task for highly complex and
structured data; however, for simple data exchange it might just be too much work.
• Enter JSON! Let us look at how it handles the problem at hand.
{
"FirstName": "John",
"LastName": "Mathews",
"ContactNo": ["+123 4567 8900", "+123 4444 5555"]
}
{
"FirstName": "Andrews",
"LastName": "Symmonds",
"ContactNo": ["+456 7890 1234", "+456 6666 7777"]
}
{
"FirstName": "Mable",
"LastName": "Mathews",
"ContactNo": "+789 1234 5678"
}
• As you can see, it is quite easy to read JSON. There is absolutely no confusion now. One
can have a list of n contact numbers, and they can be stored with ease.
• JSON is very expressive. It provides the much-needed ease to store and retrieve
documents in their real form. The binary form of JSON is BSON. BSON is an open
standard. In most cases it consumes less space than text-based JSON. There
is yet another advantage with BSON. It is much easier and quicker to convert BSON to a
programming language's native data format. There are MongoDB drivers available for a
number of programming languages such as C, C++, Ruby, PHP, Python, C#, etc., and
each works slightly differently. Using the basic binary format enables the native data
structures to be built quickly for each language without going through the hassle of first
processing JSON.
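
As a small illustration of why repeating values are easy to handle in JSON, the sketch below parses employee records in which ContactNo is sometimes a single string and sometimes a list. It is plain JavaScript; the helper name contactNumbers is a hypothetical one introduced for this example and is not part of MongoDB.

```javascript
// Employee records as JSON text. ContactNo may be one string or a list of strings.
const employeesJson = `[
  {"FirstName": "John", "LastName": "Mathews", "ContactNo": ["+123 4567 8900", "+123 4444 5555"]},
  {"FirstName": "Mable", "LastName": "Mathews", "ContactNo": "+789 1234 5678"}
]`;

// Hypothetical helper: always return the contact numbers as an array.
function contactNumbers(employee) {
  const value = employee.ContactNo;
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

const employees = JSON.parse(employeesJson);
for (const employee of employees) {
  // Print each first name with the number of contact numbers on record.
  console.log(employee.FirstName, contactNumbers(employee).length);
}
```

No extra columns or file-format changes are needed when an employee gains another contact number; the list simply grows.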

Creating or Generating a Unique Key

Each document should have a unique identifier, stored under the _id key. It is similar to the
primary key in relational databases. This facilitates search for documents based on the unique
identifier. An index is automatically built on the unique identifier. It is your choice to either
provide unique values yourself or have MongoDB generate them for you.
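
The behaviour described above can be sketched with a tiny in-memory stand-in for a collection. This is only an illustration under stated assumptions: TinyCollection, insert, and findById are hypothetical names, not the MongoDB API, and the generated identifiers are simple counters, whereas MongoDB generates ObjectId values.

```javascript
// Hypothetical in-memory stand-in for a collection; not the MongoDB API.
class TinyCollection {
  constructor() {
    this.docs = new Map(); // plays the role of the automatic index on _id
    this.nextAutoId = 1;   // source of generated identifiers in this sketch
  }

  insert(doc) {
    // Use the caller's _id if supplied; otherwise generate one.
    const _id = doc._id !== undefined ? doc._id : `auto-${this.nextAutoId++}`;
    if (this.docs.has(_id)) {
      throw new Error(`duplicate _id: ${_id}`); // _id must be unique
    }
    this.docs.set(_id, { ...doc, _id });
    return _id;
  }

  // Lookup by unique identifier goes straight through the Map.
  findById(_id) {
    return this.docs.get(_id);
  }
}

const roster = new TinyCollection();
roster.insert({ _id: 1, StudName: "Michelle" });          // caller-supplied _id
const generatedId = roster.insert({ StudName: "Mabel" }); // generated _id
console.log(generatedId, roster.findById(generatedId).StudName);
```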

Database

A database is a collection of collections. In other words, it is like a container for collections. It gets created the first time
that your collection makes a reference to it. It can also be created on demand. Each database gets its own set
of files on the file system. A single MongoDB server can house several databases.

Collection

A collection is analogous to a table of RDBMS. A collection is created on demand. It gets created the first time
that you attempt to save a document that references it. A collection exists within a single database. A collection
holds several MongoDB documents. A collection does not enforce a schema. This implies that documents within
a collection can have different fields. Even if the documents within a collection have the same fields, the order of
the fields can be different.

Document

A document is analogous to a row/record/tuple in an RDBMS table. A document has a dynamic schema. This
implies that the documents in a collection need not necessarily have the same set of fields/key-value pairs. The
figure shows a collection by the name "students" containing three documents.

A collection “students” containing 3 documents
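
To make the idea of a dynamic schema concrete, the sketch below models three documents as plain JavaScript objects. The field values are made up for illustration and are not taken from the figure; fieldNames and commonFields are hypothetical helpers.

```javascript
// Hypothetical documents in a "students" collection; the values are made up.
const studentDocs = [
  { _id: 1, StudName: "Michelle Jacintha", Grade: "VII" },
  { _id: 2, StudName: "Mabel Mathews", Grade: "VII", Hobbies: ["Chess", "Skating"] },
  { _id: 3, Grade: "VIII", StudName: "Andre Symonds", DOJ: "2012-06-01" }, // different field order
];

// Field names used by one document, sorted so that field order does not matter.
function fieldNames(doc) {
  return Object.keys(doc).sort();
}

// Fields that appear in every document of the collection.
function commonFields(docs) {
  return fieldNames(docs[0]).filter((name) => docs.every((doc) => name in doc));
}

console.log(commonFields(studentDocs));
```

Only some documents carry Hobbies or DOJ, and the third document lists Grade before StudName; none of this requires a schema change.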


Support for Dynamic Queries

MongoDB has extensive support for dynamic queries. This is in keeping with traditional
RDBMS wherein we have static data and dynamic queries. CouchDB, another document-
oriented, schema-less NoSQL database and MongoDB's biggest competitor, works on quite
the reverse philosophy. It has support for dynamic data and static queries.

Storing Binary Data

• MongoDB provides GridFS to support the storage of binary data. A single document can
hold up to 16 MB of data (4 MB in early versions of MongoDB). This usually suffices for
photographs (such as a profile picture) or small audio clips. However, if one wishes to
store larger files such as movie clips, GridFS is the solution.
• GridFS stores the metadata (data about data along with the context information) in a
collection called "files". It then breaks the data into small pieces called chunks and stores
them in the "chunks" collection. This process takes care of the need for easy scalability.
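
The chunking idea can be sketched in a few lines of plain JavaScript. This is a simplified illustration, not the GridFS implementation: toChunks and fromChunks are hypothetical helpers, and the 8-byte chunk size is an assumption chosen only to keep the demonstration small.

```javascript
// Split binary data into fixed-size chunks: one metadata document plus one
// document per chunk, loosely following the files/chunks layout.
function toChunks(fileId, filename, data, chunkSizeBytes) {
  const fileDoc = { _id: fileId, filename, length: data.length, chunkSize: chunkSizeBytes };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSizeBytes, n += 1) {
    // n records the position of the chunk within the file.
    chunkDocs.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSizeBytes) });
  }
  return { fileDoc, chunkDocs };
}

// Reassemble the original bytes by concatenating the chunks in order of n.
function fromChunks(chunkDocs) {
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((chunk) => chunk.data));
}

const original = Buffer.from("a short stand-in for a large binary file");
const { fileDoc, chunkDocs } = toChunks(1, "clip.bin", original, 8);
console.log(fileDoc.length, chunkDocs.length);
```

Because each chunk is an ordinary document, the pieces of one large file can be stored and distributed like any other documents.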
Replication

It provides data redundancy and high availability. It helps to recover from hardware failure
and service interruptions. In MongoDB, the replica set has a single primary and several
secondaries. Each write request from the client is directed to the primary. The primary logs all
write requests into its Oplog (operations log). The Oplog is then used by the secondary
replica members to synchronize their data. This way there is strict adherence to consistency.
The clients usually read from the primary. However, the client can also specify a read
preference that will then direct the read operations to the secondary.

The process of REPLICATION in MongoDB
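
The flow of writes from the primary to a secondary through the Oplog can be sketched as follows. This is a toy in-memory model under stated assumptions: the Primary and Secondary classes and their methods are hypothetical, and a simple counter stands in for the Oplog timestamp.

```javascript
// Hypothetical in-memory model of a replica set; not the MongoDB API.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered log of every write applied on the primary
  }
  write(key, value) {
    this.data.set(key, value);
    this.oplog.push({ ts: this.oplog.length + 1, key, value });
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.lastApplied = 0; // ts of the last oplog entry replayed here
  }
  // Replay, in order, every oplog entry that has not been applied yet.
  syncFrom(primary) {
    for (const entry of primary.oplog) {
      if (entry.ts > this.lastApplied) {
        this.data.set(entry.key, entry.value);
        this.lastApplied = entry.ts;
      }
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("student:1", { Grade: "VII" });
primary.write("student:1", { Grade: "VIII" });
secondary.syncFrom(primary);
console.log(secondary.data.get("student:1").Grade, secondary.lastApplied);
```

Until syncFrom runs, a read served by the secondary may return older data than the primary holds, which is why the read preference is left to the client.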

Sharding
Sharding is akin to horizontal scaling. It means that the large dataset is divided and
distributed over multiple servers or shards. Each shard is an independent database and
collectively they would constitute a logical database.

The prime advantages of sharding are as follows:

i. Sharding reduces the amount of data that each shard needs to store and manage.
For example, if the dataset was 1 TB in size and we were to distribute this over
four shards, each shard would house just 256 GB of data. As the cluster grows, the
amount of data that each shard will store and manage will decrease.
ii. Sharding reduces the number of operations that each shard handles. For example,
if we were to insert data, the application needs to access only that shard which
houses that data.

The process of SHARDING in MongoDB
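
Both advantages can be illustrated with a short sketch. It assumes a simple hash-based router; shardFor and perShardGb are hypothetical helpers, and the hash function is an arbitrary one chosen for this example rather than the one MongoDB uses.

```javascript
// Hypothetical hash-based router: maps a shard key value to one of N shards.
function shardFor(shardKey, shardCount) {
  let hash = 0;
  for (const ch of String(shardKey)) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash % shardCount;
}

// Data per shard when a dataset is divided evenly (1 TB taken as 1024 GB).
function perShardGb(totalGb, shardCount) {
  return totalGb / shardCount;
}

// Route 1000 made-up roll numbers to four shards. Each insert touches one shard only.
const shards = Array.from({ length: 4 }, () => []);
for (let rollNo = 1; rollNo <= 1000; rollNo += 1) {
  shards[shardFor(`S${rollNo}`, 4)].push(rollNo);
}
console.log(shards.map((shard) => shard.length), perShardGb(1024, 4));
```

The same key always maps to the same shard, so a later read or update for that key can be sent directly to the shard that holds it.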

Updating Information In-Place

MongoDB updates the information in-place. This implies that it updates the data wherever it
is available. It does not allocate separate space and the indexes remain unaltered.

MongoDB is all for lazy writes. It writes to the disk once every second. Reading and writing
to disk is a slow operation as compared to reading and writing from memory. The fewer the
reads and writes that we perform to the disk, the better the performance. This makes
MongoDB faster than competitors that write almost immediately to the disk.
However, there is a tradeoff. MongoDB makes no guarantee that data written since the
last flush is stored safely on the disk.
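
The tradeoff can be seen in a toy model of lazy writes. This is a sketch under stated assumptions, not how MongoDB's storage engine is implemented: LazyStore and its methods are hypothetical, and a Map stands in for the disk.

```javascript
// Hypothetical write-behind store: writes land in memory and reach "disk" only on flush.
class LazyStore {
  constructor() {
    this.memory = new Map();  // current state, served to readers
    this.pending = new Map(); // changes not yet persisted
    this.disk = new Map();    // stands in for durable storage
  }
  // A write touches memory only, which is why it is fast.
  write(key, value) {
    this.memory.set(key, value);
    this.pending.set(key, value);
  }
  // In a real system a timer would call this periodically.
  flush() {
    for (const [key, value] of this.pending) this.disk.set(key, value);
    this.pending.clear();
  }
  // Simulate a crash: memory is lost, only flushed data survives.
  crash() {
    this.memory = new Map(this.disk);
    this.pending.clear();
  }
}

const store = new LazyStore();
store.write("a", 1);
store.flush();       // "a" is now on disk
store.write("b", 2); // "b" exists only in memory
store.crash();
console.log(store.memory.has("a"), store.memory.has("b"));
```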

Terms used in RDBMS and MongoDB

Create Database

The syntax for creating a database is as follows:

use DATABASE_Name

To create a database by the name "myDB", the syntax is

use myDB

To confirm which database you are currently using, type the following command at the MongoDB shell:

db

To get a list of all databases, type the following command:

show dbs

Notice that the newly created database, "myDB" does not show up in the list above. The
reason is that the database needs to have at least one document to show up in the list.

The default database in MongoDB is test. If one does not create any database, all collections
are by default stored in the test database.

Drop Database

The syntax to drop a database is as follows:

db.dropDatabase();

To drop the database "myDB", first ensure that you are currently in the "myDB" database
and then use the db.dropDatabase() command to drop the database.

use myDB;
db.dropDatabase();

Confirm if the database "myDB" has been dropped.

If no database is selected, the default database "test" is dropped.


Data Types in MongoDB
The following are various data types in MongoDB.

A few commands worth looking at are as follows


Consider a table "Students" with the following columns:
i. StudRollNo
ii. StudName
iii. Grade
iv. Hobbies
v. DOJ
Before we get into the details of CRUD operations in MongoDB, let us look at how the
statements are written in RDBMS and MongoDB.
MongoDB Query Language
