Module-3
Introduction to MongoDB
What is MongoDB?
MongoDB is
i. Cross-platform.
iii. Non-relational.
iv. Distributed.
v. NoSQL.
Why MongoDB?
• A few of the major challenges with traditional RDBMS are dealing with large volumes of
data, a rich variety of data (particularly unstructured data), and meeting the scale
needs of enterprise data.
• The need is for a database that can scale out or scale horizontally to meet the scale
requirements, has flexibility with respect to schema, is fault tolerant, is consistent and
partition tolerant, and can be easily distributed over a multitude of nodes in a cluster.
Unique Identifier
Each JSON document should have a unique identifier, stored in the _id key. It is similar to the
primary key in relational databases and makes it easy to look up a document by its unique
identifier. An index is automatically built on _id. You can either provide unique values
yourself or let MongoDB generate one (an ObjectId) for you.
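The _id rules above can be illustrated with a small in-memory sketch. It is not MongoDB's implementation: ToyCollection and makeId are hypothetical names, makeId only imitates the 24-hex-character shape of an ObjectId, and the student names are made up.

```javascript
// Toy model of the _id rules: every document gets a unique _id,
// either supplied by the caller or generated on insert.
let counter = 0;

// Hypothetical stand-in for an ObjectId: 8 hex chars of time,
// 10 of randomness, 6 of a running counter (24 in total).
function makeId() {
  const seconds = Math.floor(Date.now() / 1000).toString(16).padStart(8, "0");
  const random = Math.floor(Math.random() * 0xffffffffff).toString(16).padStart(10, "0");
  counter = (counter + 1) % 0x1000000;
  return seconds + random + counter.toString(16).padStart(6, "0");
}

class ToyCollection {
  constructor() {
    this.byId = new Map(); // plays the role of the automatic index on _id
  }

  insert(doc) {
    const _id = doc._id === undefined ? makeId() : doc._id;
    if (this.byId.has(_id)) {
      throw new Error("duplicate _id: " + _id); // uniqueness is enforced
    }
    const stored = { ...doc, _id };
    this.byId.set(_id, stored);
    return stored;
  }

  findById(id) {
    return this.byId.get(id); // lookup goes straight through the index
  }
}

const students = new ToyCollection();
const withOwnId = students.insert({ _id: 1, name: "Asha" }); // caller-supplied _id
const withGeneratedId = students.insert({ name: "Ravi" });   // generated _id
console.log(withOwnId._id, withGeneratedId._id.length);
```

Inserting a second document with _id 1 into this toy collection throws an error, which mirrors how a duplicate primary key is rejected in a relational table.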
Database
A database is a collection of collections. In other words, it is like a container for collections. A database is created
on demand: it comes into existence the first time data is stored in one of its collections. Each database gets its own set
of files on the file system. A single MongoDB server can house several databases.
Collection
A collection is analogous to a table in an RDBMS. A collection is created on demand: it gets created the first time
that you attempt to save a document that references it. A collection exists within a single database. A collection
holds several MongoDB documents. By default, a collection does not enforce a schema. This implies that documents within
a collection can have different fields. Even if the documents within a collection have the same fields, the order of
the fields can be different.
Document
A document is analogous to a row/record/tuple in an RDBMS table. A document has a dynamic schema. This
implies that the documents in a collection need not have the same set of fields/key-value pairs. The figure
shows a collection by the name "students" containing three documents.
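As an illustration of a dynamic schema, here is a minimal sketch of a "students" collection with three documents. The field names and values are invented for this example and are not taken from the figure.

```javascript
// Hypothetical "students" collection: three documents, three different shapes.
const students = [
  { _id: 1, name: "Asha", grade: "VII" },
  { _id: 2, name: "Ravi", grade: "VII", hobbies: ["chess", "cricket"] },
  // Same core fields as the first document, in a different order, plus an extra one.
  { _id: 3, grade: "VIII", name: "Meena", email: "meena@example.com" }
];

// Describe each document by its sorted list of field names.
const fieldSets = students.map((doc) => Object.keys(doc).sort().join(","));

// Count how many distinct "shapes" live side by side in one collection.
const distinctShapes = new Set(fieldSets).size;
console.log(fieldSets, distinctShapes);
```

In an RDBMS table all three rows would have to share one set of columns; here each document carries only the fields it needs.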
MongoDB has extensive support for dynamic queries. This is in keeping with traditional
RDBMS wherein we have static data and dynamic queries. CouchDB, another document-
oriented, schema-less NoSQL database and MongoDB's biggest competitor, works on quite
the reverse philosophy. It has support for dynamic data and static queries.
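To make "dynamic queries" concrete, the sketch below evaluates query objects that are built at run time against a small in-memory array. The matches and find functions are hypothetical helpers written for this example; they support only equality and the $gt / $lt comparison operators and are not MongoDB's query engine.

```javascript
// Minimal matcher: supports equality and the $gt / $lt operators only.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object") {
      // Operator form, e.g. { age: { $gt: 12 } }
      return Object.entries(cond).every(([op, operand]) => {
        if (op === "$gt") return value > operand;
        if (op === "$lt") return value < operand;
        throw new Error("unsupported operator: " + op);
      });
    }
    return value === cond; // plain equality, e.g. { grade: "VII" }
  });
}

function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

// Made-up sample data.
const students = [
  { name: "Asha", age: 12, grade: "VII" },
  { name: "Ravi", age: 13, grade: "VII" },
  { name: "Meena", age: 14, grade: "VIII" }
];

// The data stays as it is; the queries are composed on the fly.
const inGradeSeven = find(students, { grade: "VII" });
const olderThanTwelve = find(students, { age: { $gt: 12 } });
const both = find(students, { grade: "VII", age: { $gt: 12 } });
console.log(inGradeSeven.length, olderThanTwelve.length, both.map((d) => d.name));
```

No query has to be declared in advance: any combination of fields and conditions can be sent against the stored data.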
• A single MongoDB document is limited in size (4 MB in early versions, 16 MB in current
versions). This usually suffices for photographs (such as a profile picture) or small audio
clips. However, if one wishes to store larger files such as movie clips, MongoDB provides
GridFS to support the storage of binary data.
• GridFS stores the metadata (data about data along with the context information) in a collection
called "files" (fs.files by default). It then breaks the data into small pieces called chunks and stores them in the
"chunks" collection (fs.chunks by default). This process takes care of the need for easy scalability.
Replication
Replication provides data redundancy and high availability. It helps to recover from hardware failure
and service interruptions. In MongoDB, a replica set has a single primary and several
secondaries. Each write request from the client is directed to the primary. The primary logs all
write requests into its Oplog (operations log). The Oplog is then used by the secondary
replica members to synchronize their data. Because every write goes through the single primary,
reads from the primary are consistent. By default, clients read from the primary. However, a client can also specify a read
preference that directs read operations to a secondary; such reads may return slightly stale data because
secondaries apply the Oplog asynchronously.
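The flow of writes from the primary to a secondary through the Oplog can be sketched as a toy model. The Primary and Secondary classes and the shape of the log entries are assumptions made for this example; they do not reflect MongoDB's actual Oplog format or replication protocol.

```javascript
// Toy replica set: one primary that logs writes, one secondary that replays them.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered log of every write
  }

  write(key, value) {
    this.data.set(key, value);
    this.oplog.push({ ts: this.oplog.length + 1, op: "set", key, value });
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.lastApplied = 0; // position in the primary's log already replayed
  }

  // Replay only the log entries that have not been applied yet.
  syncFrom(primary) {
    for (const entry of primary.oplog) {
      if (entry.ts > this.lastApplied) {
        this.data.set(entry.key, entry.value);
        this.lastApplied = entry.ts;
      }
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();

primary.write("balance", 100);
secondary.syncFrom(primary);

primary.write("balance", 250);
const staleRead = secondary.data.get("balance"); // secondary has not synced yet

secondary.syncFrom(primary);
const freshRead = secondary.data.get("balance"); // caught up with the primary
console.log(staleRead, freshRead);
```

The gap between the two reads is the window in which a read directed to a secondary can lag behind the primary.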
Sharding
Sharding is akin to horizontal scaling. It means that the large dataset is divided and
distributed over multiple servers or shards. Each shard is an independent database and
collectively they would constitute a logical database.
i. Sharding reduces the amount of data that each shard needs to store and manage.
For example, if the dataset was 1 TB in size and we were to distribute this over
four shards, each shard would house just 256 GB of data. As the cluster grows, the
amount of data that each shard will store and manage will decrease.
ii. Sharding reduces the number of operations that each shard handles. For example,
if we were to insert data, the application needs to access only the shard that
houses that data.
Figure: The process of sharding in MongoDB
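The two points above can be sketched with a toy cluster that routes each document to exactly one shard. ToyCluster and hashKey are hypothetical names, and the simple string hash is an assumption for this example, not the hash function MongoDB uses.

```javascript
// Simple deterministic string hash (illustrative only).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

class ToyCluster {
  constructor(shardCount) {
    this.shards = Array.from({ length: shardCount }, () => []);
  }

  // The shard key value alone decides which shard owns a document.
  shardFor(key) {
    return hashKey(key) % this.shards.length;
  }

  // An insert touches one shard only; the others do no work.
  insert(doc, shardKeyField) {
    const index = this.shardFor(doc[shardKeyField]);
    this.shards[index].push(doc);
    return index;
  }
}

const cluster = new ToyCluster(4);
for (let i = 0; i < 1000; i++) {
  cluster.insert({ studentId: "S" + i, marks: i % 100 }, "studentId");
}

const perShardCounts = cluster.shards.map((s) => s.length);
const totalDocs = perShardCounts.reduce((a, b) => a + b, 0);

// The storage arithmetic from the text: 1 TB (1024 GB) over four shards.
const perShardGB = 1024 / 4;
console.log(perShardCounts, totalDocs, perShardGB);
```

Adding more shards to the constructor lowers both the number of documents per shard and the share of inserts each shard receives.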
MongoDB updates the information in-place. This implies that it updates the data wherever it
is available. It does not allocate separate space and the indexes remain unaltered.
MongoDB is all for lazy writes. Instead of writing to the disk on every operation, it applies writes in memory and
flushes them to the disk periodically. Reading and writing
to disk is slow compared to reading and writing from memory. The fewer the
reads and writes that we perform to the disk, the better the performance. This makes
MongoDB faster than competitors that write almost immediately to the disk.
However, there is a tradeoff. A write that has not yet reached the disk can be lost if the server crashes;
journaling and a stricter write concern reduce this risk at some cost in speed.
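The lazy-write tradeoff can be shown with a toy write-behind store. LazyStore is a hypothetical class, and the "disk" here is just a second in-memory Map standing in for durable storage; nothing about it reflects MongoDB's storage engine.

```javascript
// Toy write-behind store: writes hit memory first and reach "disk" only on flush.
class LazyStore {
  constructor() {
    this.memory = new Map(); // fast, volatile
    this.disk = new Map();   // stand-in for slow, durable storage
    this.dirty = new Set();  // keys changed since the last flush
  }

  write(key, value) {
    this.memory.set(key, value); // fast path: no disk access
    this.dirty.add(key);
  }

  // Periodic flush: copy everything changed since last time to "disk".
  flush() {
    for (const key of this.dirty) {
      this.disk.set(key, this.memory.get(key));
    }
    this.dirty.clear();
  }

  // Simulated crash: memory is wiped and rebuilt from what reached "disk".
  crash() {
    this.memory = new Map(this.disk);
    this.dirty.clear();
  }
}

const store = new LazyStore();
store.write("a", 1);
store.flush();       // "a" is now durable
store.write("b", 2); // "b" lives in memory only
store.crash();

const survived = store.memory.has("a");
const lost = !store.memory.has("b");
console.log(survived, lost);
```

Flushing more often narrows the window in which a write can be lost, but gives up part of the speed advantage.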
Create Database
The syntax to create a database is:
use DATABASE_Name
To create a database by the name "myDB", the command is:
use myDB
To confirm which database you are currently using, type the following command at the MongoDB shell:
db
To get a list of all databases, type:
show dbs
Notice that the newly created database, "myDB", does not show up in the list. The
reason is that a database needs to contain at least one document to show up in the list.
The default database in MongoDB is test. If one does not create any database, all collections
are by default stored in the test database.
Drop Database
db.dropDatabase();
To drop the database "myDB", first ensure that you have switched to the "myDB" database
and then use the db.dropDatabase() command to drop it.
use myDB;
db.dropDatabase();