0% found this document useful (0 votes)
25 views16 pages

Unit 2

Uploaded by

thirdp753
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views16 pages

Unit 2

Uploaded by

thirdp753
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

www.acuityeducare.

com
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

Q.1: Explain MongoDB data model


Acuity Educare Ans:

 MongoDB is a document-based database system where the documents can have a


flexible schema. This means that documents within a collection can have different (or

NGT 
same) sets of fields. This affords you more flexibility when dealing with data.
A MongoDB deployment can have many databases. Each database is a set of
collections. Collections are similar to the concept of tables in SQL; however, they are
schemaless. Each collection can have multiple documents. Think of a document as a
SEM : V row in SQL.

SEM V: UNIT 2

 In an RDBMS system, since the table structures and the data types for each column
are fixed, you can only add data of a particular data type in a column. In MongoDB, a
collection is a collection of documents where data is stored as key-value pairs.
 Let’s understand with an example how data is stored in a document. The following
document holds the name and phone numbers of the users:

{"Name": "ABC", "Phone": ["1111111", "222222" ] }

 Dynamic schema means that documents within the same collection can have the same
or different sets of fields or structure, and even common fields can store different types
of values across documents. There’s no rigidness in the way data is stored in the
documents of a collection.
Let’s see an example of a Region collection:
{ "R_ID" : "REG001", "Name" : "United States" }
{ "R_ID" :1234, "Name" : "New York" , "Country" : "United States" }

607A, 6th floor, Ecstasy business park, city of joy, JSD


road, mulund (W) | 8591065589/022-25600622
Page 1 of 63
YouTube - Abhay More | Telegram - abhay_more
Abhay More 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
abhay_more

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

Q.2: write note on JSON and BSON Q.3 write note on The Identifier(_id)and capped collection

Ans: Ans:
JSON
 MongoDB is a document-based database. It uses Binary JSON for storing its data. The Identifier (_id)
 . JSON stands for JavaScript Object Notation. It’s a standard used for data interchange  MongoDB stores data in documents. Documents are made up of key-value pairs.
in today’s modern Web (along with XML). The format is human and machine readable. Although a document can be compared to a row in RDBMS, unlike a row, documents
It is not only a great way to exchange data but also a nice way to store data. have flexible schema.
 All the basic data types (such as strings, numbers, Boolean values, and arrays) are  A key, which is nothing but a label, can be roughly compared to the column name in
supported by JSON. RDBMS.A key is used for querying data from the documents. Hence, like a RDBMS
 At high level,JSON have two things- An Object and an array .An object is a collection of primary key (used to uniquely identify each row), you need to have a key that uniquely
name/value pairs and an array is ordered list of values. With the combination of identifies each document within a collection. This is referred to as _id in MongoDB.
two,you can have complete JSON structure  If you have not explicitly specified any value for a key, a unique value will be
 The maximum number of documents that can embed in a document is 100. This is very automatically generated and assigned to it by MongoDB. This key value is immutable
important factor while working with MongoDB. and can be of any data type except arrays.
 The following code shows what a JSON document looks like:
{ Capped Collection
"_id" : 1, If you want to log the activities ,cache data or high volume data within an application and you
"name" : { "first" : "John", "last" : "Doe" }, want to store data in the same order it is inserted
"publications" : [  MongoDB offers Capped collections for doing so. Capped collections are fixed size
{ circular collections which can store data in the same order it is inserted in order to
"title" : "First Book", support high performance for create read and delete operation.
"year" : 1989,  It is very fixed size,high-performance and “auto FIFO age-out” .That is when allotted
"publisher" : "publisher1" space is fully utilized ,newly added object will replace the older ones in the same order
}, it is inserted.
{ "title" : "Second Book",  New objects can be inserted into capped collection.
"year" : 1999,  Existing objects can be replaced.
"publisher" : "publisher2"  But you can’t remove an individual object from capped collection.
}  To create capped collection ,we use the following command
] >db.createCollection(“CappedLogCollection”,{capped:True,size:10000,max:1000})
} Where the size is the maximum size of the capped collection in bytes,and max specifies
 JSON lets you keep all the related pieces of information together in one place, which the number of documents in the capped collections.
provides excellent performance. It also enables the updating of a document to be  To check whether the collection is capped or not
independent. It is schemaless. >db.cappedLogCollection.isCapped()
 If you want to cap the existing collection
Binary JSON (BSON) >db.runCommand({“convertToCapped”:”posts”,size:10000})
Where the posts is the name of the collection which will be capped.
 MongoDB stores the JSON document in a binary-encoded format. This is termed as  MongoDB uses capped collection for maintaining replication logs. It assures protection
BSON. The BSON data model is an extended form of the JSON data model. of data in the insertion order thus leading high performance without use of indexes.
 MongoDB’s implementation of a BSON document is fast, highly traversable, and
lightweight. It supports embedding of arrays and objects within other arrays, and also
enables MongoDB to reach inside the objects to build indexes and match objects Q.4: explain Object-Oriented Programming
against queried expressions, both on top-level and nested BSON keys.This means that
MongoDb gives users the ease of use and flexibility of JSON documents together with Ans:
the speed and richness of a lightweight binary format
 Object-oriented programming enables you to have classes share data and behaviors
using inheritance. It also lets you define functions in the parent class that can be
overridden in the child class and thus will function differently in a different context.
 In other words, you can use the same function name to manipulate the child as well as
the parent class, although under the hood the implementations might be different. This
feature is referred to as polymorphism.
Page 2 of 63 Page 3 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 Relational databases ,with their focus on tables with a fixed schema and does not allow  One of the ways of doing this is to include the new structure to the new documents
us to define a related set of schemas for table so that we could store any object in our being added to the collection and then gradually migrating the collection in the
hierarchy in the same table background while the application is still running. This is one of the many use cases
 The flexibility of MongoDB offers by not enforcing a particular shema for all documents where having a polymorphic schema will be advantageous.
in a collection provides several benefits to the application programmer over an RDBMS  For example, say you are working with a Tickets collection where you have documents
solution: with ticket details, like so:
o Better Mapping of object-oriented inheritance and polymorphism // "Ticket1" document (stored in "Tickets" collection")
o S+implermigrstions between schemas with less application downtime {
o Better support for semi-structured domain data. id: 1,
 For Example Priority: "High",
{ type: "Incident",
_id:1, text: "Printer not working"
Password:”7f1afdbe” }...........
Firstname:”Derick”  At some point, the application team decides to introduce a “short description” field in
Lastname:”Rethans” the ticketdocument structure, so the best alternative is to introduce this new field in
Contacts:[ the new ticket documents.
{  Within the application, you embed a piece of code that will handle retrieving both “old
method:”phone” style” documents(without a short description field) and “new style” documents (with a
value:”+447551569555” short description field).
}  Gradually the old style documents can be migrated to the new style documents. Once
] the migration is completed, if required the code can be updated to remove the piece of
}, code that was embedded to handle the missing field.
{
_id:2,
Password:”ae9c300e” Q.6: Explain the basic Query of MongoDB
Firstname:”Rasmus”
Lastname:”Lerdorf” Ans:
}  CRUD operations (Create, Read, Update, and Delete) are used to query database.The
 In the above example we can see that the fields of two documents are not common mongo shell is standard distribution of MongoDB which provides a full database
and the structure is also different .we can also have fields with same name but interface,enabling you to work on different data stored in mongoDB.Once the database
different datatypes .This flexible schema not only enables you to store related data services have started,you can fire up the mongo shell and start using to query the
with different structures together in a same collection but it also simplifies the querying database.
 MongoDB by default listens for any incoming connections on port 27017 of the
localhost interface. Now that the database server is started ,you can start issuing
Q.5: Explain Schema Evolution commands to the server using mongo shell or any new command prompt.
 Let’s understand how to use the import/export tool to import and export data in and
Ans: out of the MongoDB database.
 When you are working with databases, one of the most important considerations that  First, create a CSV file to hold the records of students with the following structure:
you need to account for is the schema evolution (i.e. the change in the schema’s Name, Gender, Class, Score, Age.
impact on the running application). The design should be done in a way as to have  Save the file in C:/ as “student.json”
minimal or no impact on the application, meaning no or minimal downtime, no or very  Next, import the data from the MongoDB database to a new collection in order to look
minimal code changes, etc. at how the import tool works.
 Typically, schema evolution happens by executing a migration script that upgrades the  Open the command prompt (by running it as administrator) and import the .json file
databaseschema from the old version to the new one. If the database is not in using the following command:
production, the script can be simple drop and recreation of the database.
 Although MongoDB offers an Update option that can be used to update all the C:\>Mongoimport --db<database_name>--collection <collection_name><file_path
documents’ structure within a collection if there’s a new addition of a field, imagine the  Example
impact of doing this if you have thousands of documents in the collection. It would be C:\>Mongoimport --db details --collection student <student.json
very slow and would have a negative impact on the underlying application’s  Issue the following command to import the data from file student.json to a new
performance collection called student in the database name details.

Page 4 of 63 Page 5 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 In order to validate whether the collection is created and data is imported ,you connect
to the database using mongo shell To start the mongo shell, run command prompt as Q.7: Explain Create and Insert command
administrator and issue the command
>mongo Ans:
And press Enter Create and Insert
 Now ,on MongoDB shell check whether database,collections and documents exist with The use command
following commands  MongoDB use DATABASE_NAME is used to create database. The command will create a
 show dbs : print list of all databases on the server new database if it doesn’t exit ,otherwise it wil return the existing database.
 use<db>: switch current database to <db> variable db is set to the current database >use <database_name>
 show collections:print list of all collections for current database. Example:
 db.<collection_name>.find():To display all the documents present in collection. >use details
Switched to db details
Example:  If you want to check your database list,use the command show dbs
>use details  MongoDB doesn’t create a database until data is inserted into the database. In order to
Switched to db details view database we need to insert atleast one data.
>show collections
student Create collection implicitly
>db.student.find()  To create collection implicitly, we use the following command:
{ "_id" : ObjectId("5450af58c770b7161eefd31d"), "Name" : "S1", "Gender" : "M", >db.<collection_name>.insert({})
"Class" : "C1", "Age" : 19 } Example:
....... >db.student.insert({“name”:”ankush”,”age”:19})
{ "_id" : ObjectId("5450af59c770b7161eefd31e"), "Name" : "S2", "Gender" : "M",  To view the created collection,we use the following command
"Class" : "C2", "Age" : 18 } >show collections
 At any point, help can be accesses using the help() command .  Querying a document from collection(Read)
> help >db.<collection_name>.find() OR >db.<collection_name>.find().pretty()
db.help(): help on db methods Example:
db.mycoll.help() :help on collection methods >db.student.find()
sh.help():sharding helpers  The find() command helps to query data from MongoDB collection and to display the
rs.help(): replica set helpers result in a formatted way,you can use pretty() method.
help admin: administrative help
help connect :connecting to a db help Create collection explicitly
help keys: key shortcuts  To create collection explicitly,we use following command:
help misc:misc things to know >db.createCollection(“<collection_name>”,options)
help mr:mapreduce  In the command name is name of collection to be created. Options is a document and
show dbs:show database names is used to specify configuration of collection
show collections :show collections in current database  Options can be defined as follows:
show users :show users in current database
............. Field Type Description
exit :quit the mongo shell Capped Boolean If true,enables a capped
collection. Capped collection
 As shown above, if you need help on any of the methods of dbor collection , you can is a fixed size collection that
use db.help() or db.<CollectionName>.help() . automatically overwrites its
oldest entries when reaches
 For example, if you need help on the dbcommand, execute db.help() . to maximum size. If you
>db.help() specify true,you need to
DB methods: specify size parameter also
db.addUser(userDocument) autoIndexId Boolean (optional)If
... true,automatically create
db.shutdownServer() index on _id field.Default
db.stats() value is false
db.version() current version of the server

Page 6 of 63 Page 7 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

Size Number (optional)specifies a Example: criteria.


maximum size in bytes for db.emp.findOneAndReplace({“name”:”kavita”}
capped collection.If capped ,{“age”:20},{“addrs”:”Thane”})
is true,then you need to db.collection.findOneAndUpdate(filter,update Updates a single document
specify this field also ,options) based on the filter criteria
Max Number (Optional)specifies the
maximum number of  db.collection.insertMany():This method inserts multiple documents into collection .
documents allowed in the >db.emp.insertMany([{“name”:”raj”,”age”:30,”sal”:25000},{
capped collection. name”:”rajesh”,”age”:35,”sal”:35000}])

Inserting by Explicitly Specifying _id


 Example:  In the previous examples of insert, the id field was not specified, so it was implicitly
>db.createCollection(“student”,{capped:”true”,autoindexId:true,size:6142800, added. In the following example, you will see how to explicitly specify the id field when
max:10000}) inserting the documents within a collection. While explicitly specifying the id field, you
 The documents in the MongoDB are in JSON format .we can create document have to keep in mind the uniqueness of the field; otherwise the insert will fail.
separately and merge all documents into one collection as follows:  The following command explicitly specifies the id field:
> user1 = {FName: "Test", LName: "User", Age:30, Gender: "M", Country: "US"} >db.users.insert({"_id":10, "Name": "explicit id"})
{ The insert operation creates the following document in the userscollection:
"FName" : "Test", { "_id" : 10, "Name" : "explicit id" }
"LName" : "User",
"Age" : 30, Q.8 Explain Update and Delete command
"Gender" : "M",
"Country" : "US" Ans:
} Update
> user2 = {Name: "Test User", Age:45, Gender: "F", Country: "US"}
{ "Name" : "Test User", "Age" : 45, "Gender" : "F", "Country" : "US" }  The update() method updates a single document by default. If you need to update all
> documents that match the selection criteria, you can do so by setting the multi option
You will next add both these documents ( user1 and user2) to the users collection in as true.
the following order of operations:  Let’s begin by updating the values of existing columns. The $set operator will be used
>db.users.insert(user1) for updating the records.
>db.users.insert(user2) db.collection_name.update({selection_criteria,updated_data})
db.emp.update({name:”anand”},$set:{age:50}})
Inserting Documents Using Loop  If you check the output, you will see that only the first document record is updated,
 Documents can also be added to the collection using a for loop. The following code which is the default behavior of update since no multi option was specified.
inserts users using for .  Now let’s change the update command and include the multi option:
>for(var i=1; i<=20; i++) db.users.insert({"Name" : "Test User" + i, "Age": db.emp.update({name:”anand”},$set:{age:50}},{multi:true})
10+i,"Gender" : "F", "Country" : "India"})  Setting this property updates all the documents that satisfies the given condition
 Various Read operations in mongodb are as follows:  use the same update() command with the $unset operator to remove fields
from the documents.The following command will remove the field Company from all
Method Description the documents:
Db.collection.find(query,projection) Returns one document that >db.users.update({},{$unset:{"Company":""}},{multi:true})
Example: satisfies the specified query  The save() replaces the existing document with the new document passed in the save()
db.emp.findOne({“age”:45}) criteria on the collection. >db.collection_name.save({_id:ObjectId(),NewData({})
If multiple documents satisfy
the query method returns the Delete
first document
db.collection.findOneAndDelete(filter,options) Delets the single document  To delete documents in a collection, use the remove () method . If you specify a
Example: based on the filter and return selection criterion, only the documents meeting the criteria will be deleted. If no
db.emp.findOneAndDelete({age:45}) the deleted document criteria is specified, all of the documents will be deleted.The following command will
db.collection.findOneAndReplace(filter, Modifies and replaces a single delete the documents where Gender = ‘M’ :
replacement,options) document based on the filter >db.emp.remove({"Gender":"M"})
Page 8 of 63 Page 9 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 If there are multiple records and you want to delete only the first record ,then set  To display the first 3 documents who stay in Mumbai
justOne() parameter in remove() method.  In huge collection,if you want to return only few matching documents,then the limit()
>db.emp.remove({“age”:20},1) command is used
 The following command will delete all documents: >db.emp.find({“addrs”:”mumbai”}).limit(3)
>db.users.remove({})  To display the next 2 document after skipping first two records whose age is 37
 Finally, if you want to drop the collection, the following command will drop the >db.emp.find({“age”:37}).skip(2).limit(2)
collection:
>db.users.drop() Q.10: Explain Conditional Operators

Ans:
Q.9: write note on Query Document  Conditional operators enable you to have more control over the data you are trying to
extract from the database.The operator compares two expressions and fetches
Ans: documents from mongodb collection

 A rich query system is provided by MongoDB. Query documents can be passed as a NAME DESCRIPTION
parameter to the find() method to filter documents within a collection. $eq Matches values that are equal to a specified
 A query document is specified within open “{” and closed “}” curly braces. A query value
document ismatched against all of the documents in the collection before returning the {Key:{$eq:value}}
result set.Using the find() command without any query document or an empty query $gt Matches values that are greater than a
document such asfind({}) returns all the documents within the collection. specified value
 A query document can contain selectors and projectors. {Key:{$gt:value}}
o A selector is like a where condition in SQL or a filter that is used to filter out the $gte Matches values that are greater than or
results. equal to a specified value
o A projector is like the select condition or the selection list that is used to display the {Key:{$gte:value}}
data fields. $lt Matches values that are less than a specified
value
Selector {Key:{$lt:value}}
$lte Matches values that are less than or equal
 In MongoDB ,when you execute find() method,then it displays all fields of a document toa specified value
The following command will return all the female users: {Key:{$lte:value}}
>db.users.find({"Gender":"F"}) $in Matches any of the values specified in an
 MongoDB also supports operators that merge different conditions together in array
order to refine your search on the basis of your requirements. {Key:{$in:value}}
Let’s refine the above query to now look for female users from India. The following $ne Matches all values that are not equal toa
command willreturn the same: specified value
>db.users.find({"Gender":"F", $or: [{"Country":"India"}]}) {Key:{$ne:value}}
 Next, if you want to find all female users who belong to either India or US, execute the
following command: To find students whose Age >25 .
>db.users.find({"Gender":"F",$or:[{"Country":"India"},{"Country":"US >db.students.find({"Age":{"$gt":25}})
"}]})
If you change the above example to return students with Age >= 25 , then the command is
Projector >db.students.find({"Age":{"$gte":25}})

 In above example find() command returns all fields of the documents matching the Let’s find all students who belong to either class C1 or C2 . The command for the same is
selector. >db.students.find({"Class":{"$in":["C1","C2"]}})
 Let’s add a projector to the query document where, in addition to the selector, you will
also mention specific details or fields that need to be displayed. Let’s next find students who don’t belong to class C1 or C2 . The command is
 Suppose you want to display the first name and age of all female employees. In this >db.students.find({"Class":{"$nin":["C1","C2"]}})
case, along with theselector, a projector is also used.
 Execute the following command to return the desired result set:
>db.users.find({"Gender":"F"}, {"Name":1,"Age":1}) If you want to find all students who are younger than 25 (Age < 25), you can execute the
Page 10 of 63 Page 11 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

following findwith a selector: { "_id" : "M", "totalStudent" : 9 }


>db.students.find({"Age":{"$lt":25}}) >

If you want to find out all students who are older than 25 (Age <= 25), execute the Similarly, in order to find out the class-wise average score , the following command can be
following: executed:
>db.students.find({"Age":{"$lte":25}}) >db.students.aggregate({$group:{_id:"$Class", AvgScore: {$avg: "$Score"}}})
{ "_id" : "Biology", "AvgScore" : 90 }
Q.11: Explain MapReduce { "_id" : "C3", "AvgScore" : 90 }
{ "_id" : "Chemistry", "AvgScore" : 90 }
Ans: { "_id" : "C2", "AvgScore" : 93 }
 Map-reduce is the data processing paradigm for condensing large volumes of data into { "_id" : "C1", "AvgScore" : 85 }
useful aggregate results. >
 For map-reduce operation mongodb provides the mapreduce database command
 In this map-reduce operation ,mongoDB applies the map phase each input.The map Q13: Explain Regular Expressions
function emits key-value pairs.
 For those keys that have multiple values, MongoDB applies reduce phase ,which collect Ans:
and condenses the aggregate data.
 Map reduce function can be used on both structured data and unstructured data  Regular expressions are useful in scenarios where you want to find a stringas some
o Map is a javascript function that maps a value with akey and emits a key-value particular pattern. As in SQL we had LIKE clause, InmongoDB it is regular expressions
pair.It divides the big problem into multiple small problems,which can be further for the same.
subdivided into sub-problems
o Reduce is a javascript function that reduces or groups all the documents having In order to understand this, let's take the example of students with different names.
the same key and produces the final output which was the answer to big
problem that you were trying to solve >db.students.insert({Name:"Student1", Age:30, Gender:"M", Class: "Biology", Score:90})
 In order to understand how it works,let’s consider the following example where you >db.students.insert({Name:"Student2", Age:30, Gender:"M", Class: "Chemistry", Score:90})
find out the number of male,female and others in emp collection >db.students.insert({Name:"Test1", Age:30, Gender:"M", Class: "Chemistry", Score:90})
 The first step is to create map and reduce functions and then you call the mapReduce >db.students.insert({Name:"Test2", Age:30, Gender:"M", Class: "Chemistry", Score:90})
function and pass the necessary arguments. >db.students.insert({Name:"Test3", Age:30, Gender:"M", Class: "Chemistry", Score:90})
>var map=function(){emit(this.Gender,1)}
>var reduce=function(key,value){return Array.sum(value);}  Say you want to find all students with names starting with “St” or “Te” and whose class
 This will group document emitted by the map function on the key field begins with “Che”.
 Put them together using the mapReduce function
>db.emp.mapReduce(map,reduce,{out:”mapreducecount”}) The same can be filtered using regular expressions, like so:
Q.12 Explain aggregate functions >db.students.find({"Name":/(St|Te)*/i, "Class":/(Che)/i})
{ "_id" : ObjectId("52f89ecae451bb7a56e59086"), "Name" : "Student2", "Age" : 30,
Ans: "Gender" : "M", "Class" : "Chemistry", "Score" : 90 }
.........................
 Aggregate operations process data records and return computed results. Aggregation { "_id" : ObjectId("52f89f06e451bb7a56e59089"), "Name" : "Test3", "Age" : 30,
operations group values from multiple document together and can perform a variety of "Gender" : "M", "Class" : "Chemistry", "Score" : 90 }
operations on the grouped data to return a single result, >
 Aggregate function groups the records in collection, and can be used to provide total
number (sum), average, minimum, maximum, etc out of the group selected. In order to understand how the regular expression works, let’s take the query
 The aggregation framework enables you find out the aggregate value without using the "Name":/(St|Te)*/i.
MapReduce function. Performance-wise, the aggregation framework is faster than the
MapReduce function.  //I indicates that the regex is case insensitive.
To perform aggregate function aggregate() is function to be used. Following is the syntax for  (St|Te)*means the Name string must start with either “St” or “Te”.
aggregation:  The * at the end means it will match anything after that.
 When you put everything together, you are doing a case insensitive match of names
>db.students.aggregate({$group:{_id:"$Gender", totalStudent: {$sum: 1}}}) that have either “St” or “Te” at the beginning of them. In the regex for the Class also
{ "_id" : "F", "totalStudent" : 6 } the same Regex is issued.
Page 12 of 63 Page 13 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 It takes an optional parameter called verbose , which determines what the explain
Q.14: Explain use of cursors output should look like.
The following are the verbosity modes:
Ans:  allPlansExecution,executionStats, and queryPlanner.
 The default verbosity mode is queryPlanner,which means if nothing is specified, it
 The find() method is used, MongoDB returns the results of the query as a cursor defaultsto queryPlanner.
object. In order to display the result, the mongo shell iterates over the returned cursor The following code covers the steps executed when filtering on the username field:
 To return all the users in the US. In order to do so, you created a variable, assigned >db.users.find({"Name":"Test User"}).explain("allPlansExecution")
the output of find() to the variable, which is a cursor, and then using the while loop "queryPlanner" : {
you iterate and print the output. "plannerVersion" : 1,
The code snippet is as follows: "namespace" : "mydbproc.users",
> var c = db.users.find({"Country":"US"}) "indexFilterSet" : false,
> while(c.hasNext()) printjson(c.next()) "parsedQuery" : {
{ "$and" : [ ]
"_id" :ObjectId("52f4a823958073ea07e15070"), },
"FName" : "Test",
"LName" : "User", "winningPlan" : {
"Age" : 30, "stage" : "COLLSCAN",
"Gender" : "M", "filter" : {
"Country" : "US" "$and" : [ ]
} },
{ "direction" : "forward"
"_id" :ObjectId("52f4a826958073ea07e15071"), },
"Name" : "Test User", "rejectedPlans" : [ ]
"Age" : 45, },
"Gender" : "F", "executionStats" : {
"Country" : "US" "executionSuccess" : true,
} "nReturned" : 20,
> "executionTimeMillis" : 0,
 The next() function returns the next document. The h asNext() function returns true if "totalKeysExamined" : 0,
a document exists, and printjson() renders the output in JSON format. "totalDocsExamined" : 20,
 The variable to which the cursor object is assigned can also be manipulated as an "executionStages" : {
array. If, instead of looping through the variable, you want to display the document at "stage" : "COLLSCAN",
array index 1, you can run the following command: "filter" : {
"$and" : [ ]
> var c = db.users.find({"Country":"US"}) },
>printjson(c[1])
{ "nReturned" : 20,
"_id" :ObjectId("52f4a826958073ea07e15071"), "executionTimeMillisEstimate" : 0,
"Name" : "Test User", "works" : 22,
.... "Gender" : "F", "advanced" : 20,
"Country" : "US"} "needTime" : 1,
"needFetch" : 0,
Q.15: Explain explain() function "saveState" : 0,
"restoreState" : 0,
Ans: "isEOF" : 1,
"invalidates" : 0,
 The explain() function can be used to see what steps the MongoDB database is running "direction" : "forward",
while executing a query. the output format of the function and the parameter that is "docsExamined" : 20
passed to the function have changed. },
"allPlansExecution" : [ ]

Page 14 of 63 Page 15 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

}, "Name" : 1
"serverInfo" : { },
"host" : " ANOC9", "indexName" : "Name_1",
"port" : 27017, "isMultiKey" : false,
"version" : "3.0.4", "direction" : "forward",
"gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952" "indexBounds" : {
}, "Name" : [
"ok" : 1 "[\"user101\", \"user101\"]"
]
 As you can see, the explain() output returns information regarding queryPlanner, }
executionStats, and serverInfo. As highlighted above, the information the output }
returns depends on the verbosity modeselected. },
"rejectedPlans" : [ ]
Q16: Explain Concept of Indexes },
"executionStats" : {
Ans: "executionSuccess" : true,
"nReturned" : 1,
 Indexes are used to provide high performance read operations for queries that are "executionTimeMillis" : 0,
used frequently. "totalKeysExamined" : 1,
 By default, whenever a collection is created and documents are added to it, an index "totalDocsExamined" : 1,
is created on the id field of the document. "executionStages" : {
"stage" : "FETCH",
Single Key Index "nReturned" : 1,
"executionTimeMillisEstimate" : 0,
 Let’s create an index on the Name field of the document. Use ensureIndex() to create "works" : 2,
the index. "advanced" : 1,
"needTime" : 0,
>db.testindx.ensureIndex({"Name":1}) "needFetch" : 0,
"saveState" : 0,
 The index creation will take few minutes depending on the server and the collection "restoreState" : 0,
size. "isEOF" : 1,
 Let’s run the same query that you run earlier with explain() to check what the steps "invalidates" : 0,
the database is executing post index creation. "docsExamined" : 1,
 Check the n ,nscanned, and millisfields in the output. "alreadyHasObj" : 0,
"inputStage" : {
>db.testindx.find({"Name":"user101"}).explain("allPathsExecution") "stage" : "IXSCAN",
{ "nReturned" : 1,
"queryPlanner" : { "executionTimeMillisEstimate" : 0,
"plannerVersion" : 1, "works" : 2,
"namespace" : "mydbproc.testindx", "advanced" : 1,
"indexFilterSet" : false, "needTime" : 0,
"parsedQuery" : { "needFetch" : 0,
"Name" : { "saveState" : 0,
"$eq" : "user101" "restoreState" : 0,
} "isEOF" : 1,
}, "invalidates" : 0,
"winningPlan" : { "keyPattern" : {
"stage" : "FETCH", "Name" : 1
"inputStage" : { },
"stage" : "IXSCAN", "indexName" : "Name_1",
"keyPattern" : { "isMultiKey" : false,

Page 16 of 63 Page 17 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

"direction" : "forward", "plannerVersion" : 1,


"indexBounds" : { "namespace" : "mydbproc.testindx",
"Name" : [ "indexFilterSet" : false,
"[\"user101\", \"user101\"]" "parsedQuery" : {
] "$and" : [
}, {
"keysExamined" : 1, "Name" : {
"dupsTested" : 0, "$eq" : "user5"
"dupsDropped" : 0, }
"seenInvalidated" : 0, },
"matchTested" : 0
} {
}, "Age" : {
"allPlansExecution" : [ ] "$gt" : 25
}, }
"serverInfo" : { }
"host" : "ANOC9", ]
"port" : 27017, },
"version" : "3.0.4", "winningPlan" : {
"gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952" "stage" : "KEEP_MUTATIONS",
}, "inputStage" : {
"ok" : 1 "stage" : "FETCH",
} "filter" : {
> "Age" : {
 As you can see in the results, there is no table scan. The index creation makes a "$gt" : 25
significant difference in the query execution time. }
},
Compound Index ............................
"indexBounds" : {
 When creating an index, you should keep in mind that the index covers most of your "Name" : [
queries. If yousometimes query only the Name field and at times you query both the "[\"user5\", \"user5\"
Name and the Age field, creating a compound index on the Name and Age fields will be },
more beneficial than an index that is created on either of the fields because the "rejectedPlans" : [
compound index will cover both queries. {
"stage" : "FETCH",
The following command creates a compound index on fields Name and Age of the collection ......................................................
testindx. "indexName" : "Name_1_Age_1",
"isMultiKey" : false,
>db.testindx.ensureIndex({"Name":1, "Age": 1}) "direction" : "forward",
.....................................................
 Compound indexes help MongoDB execute queries with multiple clauses more "executionStats" : {
efficiently. When "executionSuccess" : true,
 creating a compound index, it is also very important to keep in mind that the fields "nReturned" : 1,
that will be used for exactmatches (e.g. Name :"S1" ) come first, followed by fields "executionTimeMillis" : 0,
that are used in ranges (e.g. Age : {"$gt":20} ). "totalKeysExamined" : 1,
"totalDocsExamined" : 1,
Hence the above index will be beneficial for the following query: .....................................................
"inputStage" : {
>db.testindx.find({"Name": "user5","Age":{"$gt":25}}).explain("allPlansExecution") "stage" : "FETCH",
{ "filter" : {
"queryPlanner" : { "Age" : {

Page 18 of 63 Page 19 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

"$gt" : 25 >db.testindx.insert({"Name":"usercit"})
} >db.testindx.insert({"Name":"usercit", "Age":30})
},
"nReturned" : 1, However, if you execute
"executionTimeMillisEstimate" : 0,
"works" : 2, >db.testindx.insert({"Name":"usercit", "Age":30})
"advanced" : 1,
"allPlansExecution" : [  it’ll throw an error like E11000 duplicate key error index:
{ mydbpoc.testindx.$Name_1_Age_1
"nReturned" : 1,  dup key: { : "usercit", : 30.0 }
"executionTimeMillisEstimate" : 0,  You may create the collection and insert the documents first and then create an index
CHAPTER 6 ■ USING MONGODB SHELL on the collection.
75  If you create a unique index on the collection that might have duplicate values in the
"totalKeysExamined" : 1, fields on
"totalDocsExamined" : 1,  which the index is being created, the index creation will fail.
"executionStages" : {  To cater to this scenario, MongoDB provides a dropDupsoption. The dropDupsoption
............................................................. saves the first document found and remove any subsequent documents with duplicate
"serverInfo" : { values.
"host" : " ANOC9",
"port" : 27017, The following command will create a unique index on the name field and will delete any
"version" : "3.0.4", duplicate
"gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952" documents:
},
"ok" : 1 >db.testindx.ensureIndex({"Name":1},{"unique":true, "dropDups":true})
} >
>
system.indexes
Unique Index
 Whenever you create a database, by default a system.indexescollection is created. All
 Creating index on a field doesn’t ensure uniqueness, so if an index is created on the of the information about a database’s indexes is stored in the system.indexescollection.
Name field, then two or more documents can have the same names. However, if This is a reserved collection, so you cannot modify its documents or remove documents
uniqueness is one of the constraints that needs to be enabled, the unique property from it. You can manipulate it only through ensureIndexand the dropIndexesdatabase
needs to be set to true when creating the index. commands.
 Whenever an index is created, its meta information can be seen in system.indexes.
First, let’s drop the existing indexes.
The followingcommand can be used to fetch all the index information about the mentioned
>db.testindx.dropIndexes() collection:

The following command will create a unique index on the Name field of the testindxcollection: db.collectionName.getIndexes()

>db.testindx.ensureIndex({"Name":1},{"unique":true}) For example, the following command will return all indexes created on the testindxcollection:

Now if you try to insert duplicate names in the collection as shown below, MongoDB returns >db.testindx.getIndexes()
an errorand does not allow insertion of duplicate records:
dropIndex
For example, if you have a unique index on {"name":1, "age":1} ,
>db.testindx.ensureIndex({"Name":1, "Age":1},{"unique":true})  The dropIndexcommand is used to remove the index.
>
The following command will remove the Name field index from the testindxcollection:
then the following inserts will be permissible:
>db.testindx.dropIndex({"Name":1})

Page 20 of 63 Page 21 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

{ "nIndexesWas" : 3, "ok" : 1 }
> Q18: MongoDB Document Data Model Approach

reIndex Ans:

 When you have performed a number of insertions and deletions on the collection, you  As you know, in MongoDB, data is stored in documents . This opens up some new
may have to rebuild the indexes so that the index can be used optimally. The possibilities in schema design. It also complicates our schema design process. In
reIndexcommand is used to rebuild the indexes. MongoDB, the schema design depends on the problem you are trying to solve.
 The following command rebuilds all the indexes of a collection. It will first drop the
indexes, including the default index on the id field, and then it will rebuild the indexes. Embedding

db.collectionname.reIndex()  you will see if embedding will have a positive impact on the performance. Embedding
can be useful when you want to fetch some set of data and display it on the screen,
The following command rebuilds the indexes of the testindxcollection: such as a page that displays comments associated with the blog; in this case the
comments can be embedded in the Blogs document.
>db.testindx.reIndex()  The benefit of this approach is that since MongoDB stores the documents contiguously
{ on disk, all therelated data can be fetched in a single seek.
"nIndexesWas" : 2,  Apart from this, since JOINs are not supported and you used referencing in this case,
"msg" : "indexes dropped for collection", the application might do something like the following to fetch the comments data
"nIndexes" : 2, associated with the blog.
..............  Fetch the associated comments _id from the blogs document.
"ok" : 1  Fetch the comments document based on the comments_idfound in the first step.
}  If you take this approach, which is referencing, not only does the database have to do
> multiple seeks to find your data, but additional latency is introduced into the lookup
since it now takes two round trips to the database to retrieve your data.
Q17: Designing an Application’s Data Model  If the application frequently accesses the comments data along with the blogs, then
almost certainly embedding the comments within the blog documents will have a
Ans: positive impact on the performance.
 Another concern that weighs in favor of embedding is the desire for atomicity and
let's understand how to design the data model for an application in mongoDB he MongoDB isolation in writing data. MongoDB is designed without multi-documents transactions.
database provides two options for designing a data model: In MongoDB, the atomicity of the operation is provided only at a single document level
so data that needs to be updated together atomically needs to be placed together in a
 the user can either embed related objects within one another single document.
 When you update data in your database, you must ensure that your update either
 or it can reference each other using ID succeeds or fails entirely, never having a “partial success,” and that no other database
reader ever sees an incomplete write operation.
The Problem with Normal Forms
Referencing
 As mentioned, the nice thing about normalization is that it allows for easy updating
without any redundancy  You understood embedding is the approach that will provide the best performance in
 However, a problem arises when you try to get the data back out . For instance, to find many cases; it also provides data consistency guarantees. However, in some cases, a
all tags and comments associated with posts by a specific user, the relational database more normalized model works better in MongoDB.
programmer uses a JOIN.  One reason for having multiple collections and adding references is the increased
 By using a JOIN, the database returns all data as per the application screen design, but flexibility it gives when querying the data. Let’s understand this with the blogging
the real problem is what operation the database performs to get that result set. example mentioned above.
 Generally, any RDBMS reads from a disk and does a seek, which takes well over 99%  You saw how to use embedded schema, which will work very well when displaying all
of the timespent reading a row. When it comes to disk access, random seeks are the the data together on a single page (i.e. the page that displays the blog post followed
enemy. The reason why this is so important in this context is because JOINs typically by all of the associated comments).
require random seeks. The JOIN operation is one of the most expensive operations Now suppose you have a requirement to search for the comments posted by a particular
within a relational database. user. The query(using this embedded schema) would be as follows:

Page 22 of 63 Page 23 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

db.posts.find({'comments.author': 'author2'},{'comments': 1}) find() on thecomments collection :


The result of this query, then, would be documents of the following form:
{ db.comments.find({"author": "author2"})
"_id" :ObjectId("509d27069cc1ae293b36928d"),
"comments" : [ {  In general, if your application’s query pattern is well known, and data tends to be
"subject" : "Sample Comment 1 ", accessed in only one way, an embedded approach works well. Alternatively, if your
"body" : "Comment1 Body.", application may query data in many different ways, or you are not able to anticipate
"author_id" : "author2", the patterns in which data may be queried, a more “normalized” approach may be
"created_date" :ISODate("2015-07-06T13:34:23.929Z")}...] better.
}  For instance, in the above schema, you will be able to sort the comments or return a
"_id" :ObjectId("509d27069cc1ae293b36928d"), more restricted set of comments using the limit, skip operators. In the embedded case,
"comments" : [ you’re stuck retrieving all the comments in the same order in which they are stored in
{ the post.
"subject" : "Sample Comment 2",  For instance, a popular blog with a large amount of reader engagement may have
"body" : "Comments Body.", hundreds or even thousands of comments for a given post. In this case, embedding
"author_id" : "author2", carries significant penalties with it:
"created_date" :ISODate("2015-07-06T13:34:23.929Z") • Effect on read performance : As the document size increases, it will occupy more
}...]} memory. The problem with memory is that a MongoDB database caches frequently
accessed documents in memory, and the larger the documents become, the lesser
 The major drawback to this approach is that you get back much more data than you the probability of them fitting into memory. This will lead to more page faults while
actually need. retrieving the documents, which will lead to random disk I/O, which will further slow
 On the other hand, suppose you decide to use a normalized schema. In this case you down the performance.
will have three documents: “Authors,” “Posts,” and “Comments.” • Effect on update performance : As the size increases and an update operation is
 The “Authors” document will have Author-specific content such as Name, Age, Gender, performed on such documents to append data, eventually MongoDB is going to
etc., and the need to move the document to an area with more space available. This movement,
 “Posts” document will have posts-specific details such as post creation time, author of when it happens, significantly slows update performance.
the post, actual content, and the subject of the post.
 The “Comments” document will have the post’s comments such as CommentedOn date
time, created Q19: Operational Considerations
 by author, and the text of the comment. This is depicted as follows:
 In addition to the way the elements interact with each other (i.e. whether to store the
// Authors document: documents in an embedded manner or use references), a number of other operational
{ factors are important when designing a data model for the application. These factors
"_id": ObjectId("508d280e9cc1ae293b36928e "), are covered in the following sections.
"name": "Jenny",
.......... Data Lifecycle Management
}
//Posts Document  This feature needs to be used if your application has datasets that need to be persisted
{ in the database only for a limited time period.
"_id" :ObjectId("508d27069cc1ae293b36928d"),....................  Say you need to retain the data related to the review and comments for a month. This
} feature can be taken into consideration.
// Comments document:  This is implemented by using the Time to Live (TTL) feature of the collection. The TTL
{ feature of the collection ensures that the documents are expired after a period of time.
"_id": ObjectId("508d359a9cc1ae293b3692a0"),  Additionally, if the application requirement is to work with only the recently inserted
"Author": ObjectId("508d27069cc1ae293b36928d"), documents, using capped collections will help optimize the performance.
"created_date" :ISODate("2015-07-06T13:34:59.336Z"),
"Post_id": ObjectId("508d27069cc1ae293b36928d"), Indexes
..........
}  Indexes can be created to support commonly used queries to increase the
performance. By default, an index is created by MongoDB on the id field.
In this scenario, the query to find the comments by “author2” can be fulfilled by a simple  The following are a few points to consider when creating indexes:

Page 24 of 63 Page 25 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 At least 8KB of data space is required by each index.  It handles data requests, manages data access, and performs background
 For write operations, an index addition has some negative performance impact. Hence management operations.
for collections with heavy writes, indexes might be expensive because for each insert,  When a mongod is run without any arguments, it connects to the default data
the keys must be added to all the indexes. directory.
 Indexes are beneficial for collections with heavy read operations such as where the  By default, MongoDB listens for connections from clients on port 27017, and stores
proportion of read-to-write operations is high. The un-indexed read operations are not data in the /data/db directory in C:\ drive.
affected by an index.  mongod also has a HTTP server which listens on a port 1000 higher than the default
port. This basic HTTP server provides administrative information about the database.
Sharding
mongo
 One of the important factors when designing the application model is whether to
partition the data or not.  mongo provides an interactive JavaScript interface for the developer to test queries
 This is implemented using sharding in MongoDB. and operations directly on the database and for the system administrators to manage
 Sharding is also referred as partitioning of data. In MongoDB, a collection is partitioned the database.
with its  This is all done via the command line. When the mongo shell is started, it will connect
 documents distributed across cluster of machines, which are referred as shards. This to the default database called test .
can have a significant impact on the performance. We will discuss sharding more in  This database connection value is assigned to global variable db.
Chapter tk.
 A Large Number of Collections The design considerations for having multiple collections mongos
vs. storing data in a single collection are the following:
 There is no performance penalty in choosing multiple collections for storing data.  mongos is used in “MongoDB Shard,”. It is a routing service for MongoDB
 Having distinct collections for different types of data can have shard configurations that processes queries from the application layer, and
performanceimprovements in high-throughput batch processing applications. determines the location of this data in the sharded cluster, in order to
 When you are designing models that have a large number of collections, you need to complete these operations.
take intoconsideration the following behaviors:
 A certain minimum overhead of few kilobytes is associated with each collection. Q21: what are the various tools available in MongoDB?
 At least 8KB of data space is required by each index, including the id index.
 You know by now that the metadata for each database is stored in the Ans:
<database>.nsfile. Each
 collection and index has its own entry in the namespace file, so you need to consider mongodump
the limits_on_the_size_of_namespace files when deciding to implement a large number
of collections.  This utility is used as part of an effective backup strategy.
 Growth of the Document Few updates, such as pushing an element to an array, adding  It creates a binary export of the database contents.
new fields, etc., can lead to an increase in the document size, which can lead to the  mongodump can read data from either mongod or mongos instances
movement of the document from one slot to another in order to fit in the document. mongorestore:
This process of document relocation is both resource and time consuming.
 Although MongoDB provides padding to minimize the relocation occurrences, you may  The mongorestore program writes data from a binary database dump created
need to handle the document growth manually. by mongodump to a MongoDB instance.
 mongorestore can create a new database or add data to an existing database.
Q20: Explain concept of Core Processes of MongoDB  mongorestore can write data to either mongod or mongos instances

Ans: C:\Program Files\MongoDB\Server\4.0\bin>mongodump


mongod 2018-08-10T10:25:17.172+0530 writing admin.system.users to
mongos 2018-08-10T10:25:17.176+0530 done dumping admin.system.users (1 document)
mongo 2018-08-10T10:25:17.177+0530 writing admin.system.version to
2018-08-10T10:25:17.182+0530 done dumping admin.system.version (3 documents)
mongod 2018-08-10T10:25:17.182+0530 writing mydb.books to
2018-08-10T10:25:17.183+0530 writing mydb.emp1 to
 mongod is the primary daemon( is a program that runs continuously and exists for 2018-08-10T10:25:17.186+0530 writing mydb.emp to
the purpose of handling periodic service requests that a computer system expects to 2018-08-10T10:25:17.186+0530 writing details.student to
receive) process for the MongoDB system. 2018-08-10T10:25:17.207+0530 done dumping mydb.books (6 documents)

Page 26 of 63 Page 27 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

2018-08-10T10:25:17.207+0530 writing mydb.mapreducecount to Q22: What is Replication? Explain it


2018-08-10T10:25:17.266+0530 done dumping mydb.emp1 (10001 documents)
2018-08-10T10:25:17.267+0530 done dumping mydb.mapreducecount (3 documents) Ans:
2018-08-10T10:25:18.203+0530 done dumping details.student (22 documents)
2018-08-10T10:25:18.208+0530 done dumping mydb.emp (9 documents)  Replication is the process of synchronizing data across multiple servers.
bsondump:  Replication provides redundancy and increases data availability with multiple copies of
data on different database servers.
 The bsondump converts BSON files into human-readable formats, such as JSON and  Replication protects a database from the loss of a single server.
CSV.  Replication also allows you to recover from hardware failure and service interruptions.
 For example, this utility can be used to read the output file generated by mongodump.  With additional copies of the data, you can dedicate one to disaster recovery,
reporting, or backup.
mongoimport, mongoexport:
How Replication works in MongoDB
 The mongoimport tool imports content from an Extended JSON, CSV, or TSV export
created by mongoexport and importing it into a mongod instance.
 mongoexport is a utility that produces a JSON, CSV or TSV export of data stored in a
MongoDB instance.

mongostat, mongotop ,mongosniff


 These utilities provide diagnostic information related to the current operation of a
mongod instance.
 The mongostat utility command checks the status of all running mongod instances and
return counters of database operations. These counters include inserts, queries,
updates, deletes, and cursors. Command also shows when you’re hitting page faults,
and showcase your lock percentage. This means that you're running low on memory,
hitting write capacity or have some performance issue.
 mongotop provides a method to track the amount of time a MongoDB instance spends
reading and writing data. mongotop provides statistics on a per-collection level. By There are two types of replication supported in MongoDB:
default, mongotop returns values every second.
 mongosniff is a powerful traffic capture and replay tool that you can use to inspect and  Traditional master/slave replication and
record commands sent a MongoDB instance, and then replay those commands back  Replica set.
onto another host at a later time.
 Master/Slave Replication
Standalone Deployment
 In this type of replication, there is one master and a number of slaves that replicate
 Standalone deployment is used for development purpose; it doesn’t ensure any the data from the master.
redundancy of data and it doesn’t ensure recovery in case of failures. So it’s not  The only advantage with this type of replication is that there’s no restriction on the
recommended for use in production environment. number of slaves within a cluster.
 Standalone deployment has the following components: a single mongod and a client  However, thousands of slaves will overburden the master node, so in practical
connecting to the mongod, as shown : scenarios it’s better to have less than dozen slaves.
 In a basic master/slave setup, you have two types of mongod instances: one instance
is in the master mode and the remaining are in the slave mode, as shown below.
 Since the slaves are replicating from the master, all slaves need to be aware of the
master’s address.

 MongoDB uses sharding and replication to provide a highly available system by


distributing and duplicating the data.

Page 28 of 63 Page 29 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 The master node maintains a capped collection (oplog) that stores an ordered history
of logical writes to the database.
 The slaves replicate the data using this oplog collection.
 Since the oplog is a capped collection, if the slave’s state is far behind the master’s
state, the slave may become out of sync.
 In that scenario, the replication will stop and manual intervention will be needed to re-  The primary logs any changes or updates to its data sets in its oplog(read as op log).
establish the replication.  The secondaries also replicate the oplog of the primary and apply all the operations to
 There are two main reasons behind a slave becoming out of sync: their data sets.
 The slave shuts down or stops and restarts later. During this time, the oplog may have  When the primary becomes unavailable, the replica set nominates a secondary as the
deleted the log of operations required to be applied on the slave. primary.
 The slave is slow in executing the updates that are available from the master.  The primary node is selected through an election mechanism. If the primary goes
down, the selected node will be chosen as the primary node.
Disadvantage:  Figure below shows how a two-member replica set failover happens.

 this type of replication doesn’t automate failover and


 provides less redundancy.

Replica set

 Replica sets are basically a type of master-slave replication but they provide automatic
failover.
 A replica set has one master, which is termed as primary, and multiple slaves, which
are termed as secondary in the replica set context.
 In a replica set, the primary mongod receives all write operations from clients and the
secondary mongod replicates the operations from the primary and thus both have the
same data set.

 The primary node goes down, and the secondary is promoted as primary.
 The original primary comes up, it acts as slave, and becomes the secondary node.

The points to be noted are

 A replica set is a mongod’s cluster, which replicates among one another and ensures
automatic failover.

Page 30 of 63 Page 31 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 In the replica set, one mongod will be the primary member and the others will be
secondary members.
 The primary member is elected by the members of the replica set. All writes are
directed to the primary member whereas the secondary members replicate from the
primary asynchronously using oplog.
 The secondary’s data sets reflect the primary data sets, enabling them to be promoted
to primary in case of unavailability of the current primary

Primary and Secondary Members

 There are two types of members: primary members and secondary members.

Primary member :
Hidden members
 A replica set can have only one primary, which is elected by the voting nodes in the
replica set. Any node with associated priority as 1 can be elected as a primary. The
client redirects all the write operations to the primary member, which is then later
replicated to the secondary members.

Secondary member

 Members in SECONDARY state replicate the primary’s data set and can be configured
to accept read operations. Secondary’s are eligible to vote in elections, and may be
elected to the PRIMARY state if the primary becomes unavailable. In addition to this, a
replica set can have other types of secondary members.

Types of Secondary Members

 Priority 0 Replica Set Members in MongoDB  A hidden member maintains a copy of the primary’s data set but is invisible to client
 Hidden members applications.
 Delayed members  Hidden members must always be priority 0 members and so cannot become primary.
 Arbiters  In a replica set, these members can be dedicated for reporting needs or backups.
 Non-voting members  Hidden members can vote in the elections

Priority 0 Replica Set Members in MongoDB Delayed members

 A priority zero member in a replica set is a secondary member that cannot become the  Delayed members contain copies of a replica set’s data set.
primary. These members can act as normal secondary’s, but cannot trigger any  However, a delayed member’s data set reflects an earlier, or delayed, state of the set.
election.  Must be priority 0 members. Set the priority to 0 to prevent a delayed member from
The main functions of a priority are as follows: becoming primary as they do not consist of updated data.
 Maintains data set copies  Should be hidden members. Always prevent applications from seeing and querying
 Accepts and performs read operations delayed members.
Elects the primary node  Do vote in elections for primary, if members[n].votes is set to 1.

 A priority zero member is particularly useful in multi-data center deployments.


 In a replica set containing three members, one data center hosts both, the primary and
a secondary, and the second data center hosts one priority zero member. priority zero
members are helpful for backup or standby member.

Page 32 of 63 Page 33 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 performing replica set maintenance using methods such


as rs.stepDown() or rs.reconfig(), and the secondary members losing connectivity
to the primary for more than the configured timeout (10 seconds by default).

Arbiters

 They are secondary members that do not hold a copy of the primary’s data and hence
they cannot become primary.
 Replica sets may have arbiters to add a vote in elections for primary.
 Arbiters always have exactly 1 election vote, and thus allow replica sets to have an  The replica set cannot process write operations until the election completes
uneven number of voting members without the overhead of an additional member that
successfully.
replicates data.
 The replica set can continue to serve read queries if such queries are configured
to run on secondaries.

Process of Election
 In order to get elected, a server need to not just have the majority but needs to
have majority of the total votes.
 If there are X servers with each server having 1 vote, then a server can become
primary only when it has at least [(X/2) + 1] votes.
 If a server gets the required number of votes or more, then it will become primary.
The primary that went down still remains part of the set; when it is up, it will act as
a secondary server until the time it gets a majority of votes again.
Non-voting members  If just two nodes are there, acting as master and slave then slave will never be
promoted as master if the server goes down.
 These members hold the primary’s data copy, they can accept client read operations,
 In case of network partitioning, the master will lose the majority of votes since it
and they can also become the primary, but they cannot vote in an election.
 The voting ability of a member can be disabled by setting its votes to 0. By default will have only its own one vote and it’ll be demoted to slave.
every member has one vote.  A replica set uses an arbiter to help resolve such conflicts.

cfg_1 = rs.conf() Adding arbiter for elections


cfg_1.members[3].votes = 0  If a network partition occurs with the master and arbiter in one data center and the
cfg_1.members[4].votes = 0 slave in another data center, the master will remain master since it will still have
cfg_1.members[5].votes = 0 the majority of votes.
rs.reconfig(cfg_1)  If the master fails with no network partitioning, the slave can be promoted to
master because it will have two votes (slave + arbiter).
Q.23 Explain Election process  This three-server setup provides a robust failover deployment.
Ans:
 Replica sets use elections to determine which set member will become primary.
 Replica sets can trigger an election in response to a variety of events, such as:
 Adding a new node to the replica set,
 initiating a replica set,

Page 34 of 63 Page 35 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 The oplog is a capped collection, with every new addition of an operation, the oldest
operations are automatically moved out. This is done to ensure that it does not
grow beyond a pre-set bound, which is the oplog size.
 By default in MongoDB, available free space or 5% is used for the oplog on
Windows

Reconsideration of the oplog size:


Data Replication Process  Updates to multiple documents simultaneously: Since the operations need to be
 The members of a replica set replicate data continuously. translated into operations that are idempotent(means that even if it’s applied
 Every member, including the primary member, maintains an oplog. multiple times on the secondary, the secondary node data will remain
 An oplog is a capped collection where the members maintain a record of all the  consistent), this scenario might end up requiring great deal of oplog size.
operations that are performed on the data set.  Deletes and insertions happening at the same rate involving same amount of data:
 The secondary members copy the primary member’s oplog and apply all the In this scenario, although the database size will not increase, the operations
operations in an asynchronous manner. translation into an idempotent operation can lead to a bigger oplog.
 Large number of in-place updates: Although these updates will not change the
database size, the recording of updates as idempotent operations in the oplog can
lead to a bigger oplog.

Initial Sync and Replication


Initial sync is done when the member is in either of the following two cases:
1. The node has started for the first time (i.e. it’s a new node and has no data).
2. The node has become stale, where the primary has overwritten the oplog and the
node has not replicated the data. In this case, the data will be removed.
In both cases, the initial sync involves the following steps:
1. First, all databases are cloned.
2. Using oplog of the source node, the changes are applied to its dataset.
3. Finally, the indexes are built on all the collections.
 Post the initial sync, the replica set members continuously replicate the changes in
order to be up-to-date.
Oplog  Most of the synchronization happens from the primary, but chained replication can
 Oplog stands for the operation log . An oplog is a capped collection where all the be enabled where the sync happens from a secondary only (i.e. the sync targets are
operations that modify the data are recorded. changed based on the ping time and state of other member’s replication).
 The oplog is maintained in a special database, namely local in the collection
oplog.$main . Syncing – Normal Operation
 Every operation is maintained as a document, where each document corresponds to  In normal operations, the secondary chooses a member from where it will sync its
one operation that is performed on the master server. The document contains data, and then the
various keys, including the following keys :  operations are pulled from the chosen source’s oplog collection ( local.oplog.rs ).
 ts :This stores the timestamp when the operations are performed. It’s an internal
Once the operation (op) is get, the secondary does the following:
type and is composed of a 4-byte timestamp and a 4-byte incrementing counter. 1. It first applies the op to its data copy.
 op: This stores information about the type of operation performed. The value is 2. Then it writes the op to its local oplog.
stored as 1-byte code (e.g. it will store an “I” for an insert operation). 3. Once the op is written to the oplog, it requests the next op.
 ns :This key stores the collection namespace on which the operation was  Suppose it crashes between step 1 and step 2, and then it comes back again.
performed.  In this scenario, it’ll assume the operation has not been performed and will re-apply
 o:This key specifies the operation that is performed. In case of an insert, this will it.
store the document to insert. Only operations that change the data are maintained
Starting Up
in the oplog because it’s a mechanism for ensuring that the secondary node data is
 When a node is started, it checks its local collection to find out the
in sync with the primary node data.
lastOpTimeWritten .
Page 36 of 63 Page 37 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 This is the time of the latest op that was applied on the secondary. Failover

The following shell helper can be used to find the latest op in the shell:  All members of a replica set are connected to each other as shown below.
 rs.debug.getLastOpWritten()  They exchange a heartbeat message amongst each other.
 The output returns a field named ts , which depicts the last op time.  A node with missing heartbeat is considered as crashed.
 If a member starts up and finds the ts entry, it starts by choosing a target to sync
from and it will start syncing as in a normal operation.
 However, if no entry is found, the node will begin the initial sync process.

Whom to Sync From?


 As of 2.0, based on the average ping time servers automatically sync from the
 If the node is a secondary node, it will be removed from the membership of the
“nearest” node.
replica set. In the future, when it recovers, it can re-join. Once it re-joins, it needs to
 When you bring up a new node, it sends heartbeats to all nodes and monitors the update the latest changes.
response time.  If the down period is small, it connects to the primary and catches up with the latest
updates.
Based on the data received, it then decides the member to sync from using the following  However, if the down period is lengthy, the secondary server will need to resync with
algorithm: Primary where it deletes all its data and does an initial sync as if it’s a new server.
for each healthy member Loop:  When a primary does not communicate with the other members of the set for more
if state is Primary than the configured electionTimeoutMillis period (10 seconds by default), an eligible
add the member to possible sync target set secondary calls for an election to nominate itself as the new primary.
if member’s lastOpTimeWritten is greater then the local lastOpTime Written  A new primary will be elected by majority of the replica set nodes, which is in
add the member to possible sync target set accordance with the automatic failover capability of the replica set.
Set sync_from = MIN (PING TIME to members of sync target set)
 Running the following command will show the server that is chosen as the source
for syncing:

db.adminCommand({replSetGetStatus:1})
 The output field of syncingTo is present only on secondary nodes and provides
information on the node from which it is syncing.

Making Writes Work with Chaining Slaves


 MongoDB supports chained replication.
 A chained replication occurs when a secondary member replicates from another
secondary member instead of primary. Rollbacks
 This might be the case, for example, if a secondary selects its replication target  A rollback reverts write operations on a former primary when the member rejoins
its replica set after a failover.
based on ping time and if the closest member is another secondary thus reducing
 A rollback is necessary only if the primary had accepted write operations that
the load on primary
the secondaries had not successfully replicated before the primary stepped down.
 When a server is started, it’ll most probably choose a server within the same data  When the primary rejoins the set as a secondary, it reverts, or “rolls back,” its write
center to sync from, thus reducing the WAN traffic. operations to maintain database consistency with the other members.
 There is no method to handle rollback situations automatically for MongoDB. Therefore
manual Intervention is required to apply rollback data.
 While applying the rollback, it’s vital to ensure that these are replicated to either all or
at least some of the members in the set so that in case of any failover rollbacks can be
avoided.
 A rollback does not occur if the write operations replicate to another member of the
replica set before the primary steps down and if that member remains available and
accessible to a majority of the replica set.

Page 38 of 63 Page 39 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

Consistency  The members should be distributed geographically in order to cater to main data
center failure. As shown below, the members that are kept at a geographically different
 In MongoDB, the reads can be routed to the secondaries, the writes are always routed location other than the main data center can have priority set as 0, so that they cannot
to the primary. be elected as primary and can act as a standby only.
 If the read requests are routed to the primary node, it will always see the up-to-date
changes, which means the read operations are always consistent with the last write
operations.
 However, if the application has changed the read preference to read from secondaries,
there might be a probability of user not seeing the latest changes or seeing previous
states. This is because the writes are replicated asynchronously on the secondaries.
 This behavior is characterized as eventual consistency, which means that although the
secondary’s state is not consistent with the primary node state, it will eventually
become consistent over time.
 There is no way that reads from the secondary can be guaranteed to be consistent,
except by issuing write concerns to ensure that writes succeed on all members before
the operation is actually marked successful.

Possible Replication Deployment  When replica set members are distributed across data centers, network partitioning
can prevent data centers from communicating with each other. In order to ensure a
 The architecture you chose to deploy a replica set affects its capability and capacity. majority in the case of network partitioning, it keeps a majority of the members in one
location.
Odd number of members
Scaling Reads
 This should be done in order to ensure that there is no tie when electing a primary. If
you have an even number of voting members, deploy an arbiter so that the set has an  The primary purpose of the secondaries is to ensure data availability in case of
odd number of voting members. downtime of the primary node
 They can be used to perform some backup operations or data processing jobs or to
scale out reads.
 One of the ways to scale reads is to issue the read queries against the secondary
nodes; by doing so the workload on the master is reduced.
 One important point that you need to consider when using secondaries for scaling read
operations is that in MongoDB the replication is asynchronous, which means if any
write or update operation is performed on the master’s data, the secondary data will be
momentarily out-of-date.
 If the application in question is read-heavy and is accessed over a network and does
not need up-to-date data, the secondaries can be used to scale out the read in order to
 Replica set fault tolerance is the count of members, which can go down but still the provide a good read throughput.
replica set has enough members to elect a primary in case of any failure.  Although by default the read requests are routed to the primary node, the requests can
be distributed over secondary nodes by specifying the read preferences.

 If the application is read-heavy , the read can be distributed across secondaries. As


the requirement increases, more nodes can be added to increase the data duplication;
this can have a positive impact on the read throughput.

Page 40 of 63 Page 41 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 Applications that are geographically distributed: In such cases, you can have a
replica set that is distributed across geographies. The read preferences should be set to In most situations, operations read from the primary but if it is
primaryPreferred
read from the nearest secondary node. This helps in reducing the latency that is unavailable, operations read from secondary members.
caused when reading over network and this improves the read performance.
secondary All operations read from the secondary members of the replica set.

In most situations, operations read from secondary members but if


secondaryPreferred no secondary members are available, operations read from
the primary.

Operations read from member of the replica set with the least
nearest
network latency, irrespective of the member’s type.

Write Concerns
 If the application always requires up-to-date data, it uses the option
primaryPreferred, which in normal circumstances will always read from the primary  When the client application interacts with MongoDB, it is generally not aware whether
node. However, if the primary is unavailable, as is the case during failover situations, the database is on standalone deployment or is deployed as a replica set.
operations read from secondary members.  However, when dealing with replica sets, the client should be aware of write concern
and read concern.
 Since a replica set duplicates the data and stores it across multiple nodes, these two
concerns give a client application the flexibility to enforce data consistency across
nodes while performing read or write operations.
 Using a write concern enables the application to get a success or failure response from
MongoDB.
 When used in a replica set deployment of MongoDB, the write concern sends a
confirmation from the server to the application that the write has succeeded on the
primary node. However, this can be configured so that the write concern returns
success only when the write is replicated to all the nodes maintaining the data.
 Note: If while specifying number the number is greater than the nodes that actually
hold the data, the command will keep on waiting until the members are available. In
order to avoid this indefinite wait time, wtimeout should also be used along with w,
which will ensure that it will wait for the specified time period, and if the write has not
 If you have an application that supports two types of operations, the first operation is succeeded by that time, it will time out.
the main workload that involves reading and doing some processing on the data,
whereas the second operation generates reports using the data. In such a scenario,
you can have the reporting reads directed to the secondaries.

Read Preference Mode Description

Default mode. All operations read from the current replica


set primary.
Multi-document transactions that contain read operations must use
primary
read preference primary.
All operations in a given transaction must route to the same
member.

Page 42 of 63 Page 43 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

How Writes Happen with Write Concern  One example is a query scanning through all documents of a database where the size
exceeds the server memory. This leads to loading of the documents in memory and
In order to ensure that the written data is present on say at least two members, issue the moving the working set out to disk.
following command :  In MongoDB, the scaling is handled by scaling out the data horizontally (i.e.
>db.testprod.insert({i:”test”, q: 50, t: “B”}, {writeConcern: {w:2}}) partitioning the data across multiple commodity servers), which is also called sharding
(horizontal scaling).
Steps  Sharding addresses the challenges of scaling to support large data sets and high
throughput by horizontally dividing the datasets across servers where each server is
1. The write operation is directed to the primary. responsible for handling its part of data and no one server is burdened.
2. The operation is written to the oplog of primary with ts depicting the time of  These servers are also called shards.
operation.  Every shard is an independent database. All the shards collectively make up a single
3. A w: 2 is issued, so the write operation needs to be written to one more server before it’s logical database .
marked successful.  Sharding reduces the operations count handled by each shard. For example, when data
4. The secondary queries the primary’s oplog for the op, and it applies the op. is inserted, only the shards responsible for storing those records need to be accessed.
5. Next, the secondary sends a request to the primary requesting for ops with ts  The processes that need to be handled by each shard reduce as the cluster grows
greater than t. because the subset of data that the shard holds reduces. This leads to an increase in
6. At this point, the primary sends an update that the operation until t has been the throughput and capacity horizontally.
applied by the secondary as it’s requesting for ops with {ts: {$gt: t}} .
7. The writeConcern finds that a write has occurred on both the primary and
secondary, satisfying the w: 2 criteria, and the command returns success.

Q23: Explain implementing Advanced Clustering with Replica sets.

Ans:

1. Setting up a replica set.


2. Removing a server.
3. Adding a server.
4. Adding an arbiter.
5. Inspecting the status.
6. Forcing a new election of a primary.
7. Using the web interface to inspect the status of the replica set.

Q24: What is Sharding?

Ans:

 MongoDB uses memory extensively for low latency database operations (Low
latency describes a computer network that is optimized to process a very high volume
of data messages with minimal delay (latency)).
 When you compare the speed of reading data from memory to reading data from disk,
reading from memory is approximately 100,000 times faster than reading from the
disk.
 A page fault happens when data which is not there in memory is accessed by
MongoDB.
 If there’s free memory available, the OS will directly load the requested page into
memory; however, in the absence of free memory, the page in memory is written to When to use Sharding
the disk and then the requested page is loaded in the memory, slowing down the
process.  Although sharding is a compelling and powerful feature, it has significant infrastructure
 Few operations accidentally purge large portion of the working set from the memory, requirements and it increases the complexity of the overall deployment.
leading to an adverse effect on the performance.  Use sharding in the following instances:
Page 44 of 63 Page 45 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 The size of the dataset is huge and it has started challenging the capacity of a single Shards
system.
 Since memory is used by MongoDB for quickly fetching data, it becomes important to  Shards (upper left) store the application data. In a sharded cluster, only the
scale out when the active work set limits are set to reach. mongos routers or system administrators should be connecting directly to the
 If the application is write-intensive, sharding can be used to spread the writes across shards.
multiple servers.  Like an unsharded deployment, each shard can be a single node for development
and testing, but should be a replica set in production.
Q25: Explain Sharding Components  mongos routers (center) cache the cluster metadata and use it to route operations
to the correct shard or shards.
Ans:  Config servers (upper right) persistently store metadata about the cluster, including
 Sharding is enabled in MongoDB via sharded clusters. which shard has what subset of the data.
 The following are the components of a sharded cluster:
 Shards
 mongos
 Config servers

Config Server

 Config servers are special mongods that hold the sharded cluster’s metadata. This
metadata depicts the sharded system state and organization.
 The config server stores data for a single sharded cluster. The config servers should
be available for the proper functioning of the cluster.
 One config server can lead to a cluster’s single point of failure. For production
deployment it’s recommended to have at least three config servers, so that the
cluster keeps functioning even if one config server is not accessible.
 A config server stores the data in the config database, which enables routing of the
 The shard is the component where the actual data is stored. For the sharded cluster, it client requests to the respective data. This database should not be updated.
holds a subset of data and can either be a mongod or a replica set.  MongoDB writes data to the config server only when the data distribution has
 All shard’s data combined together forms the complete dataset for the sharded cluster. changed for balancing the cluster.
 Sharding is enabled per collection basis, so there might be collections that are not
sharded. mongos
 In every sharded cluster there’s a primary shard where all the unsharded collections
are placed in addition to the sharded collection data.  The mongos act as the routers. They are responsible for routing the read and write
 When deploying a sharded cluster, by default the first shard becomes the primary request from the application to the shards.
shard although it’s configurable.  An application interacting with a mongo database need not worry about how the data is
stored internally on the shards. For them, it’s transparent because it’s only the mongos
they interact with.
 The mongos, in turn, route the reads and writes to the shards.

Page 46 of 63 Page 47 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 The mongos cache the metadata from config server so that for every read and write Shard Key
request they don’t overburden the config server.
 However, in the following cases, the data is read from the config server :  Any indexed single/compound field that exists within all documents of the collection
 Either an existing mongos has restarted or a new mongos has started for the first time. can be a shard key. You specify that this is the field basis which the documents of the
 Migration of chunks. collection need to be distributed.
 Internally, MongoDB divides the documents based on the value of the field into chunks
Q26: Explain Data Distribution Process and distributes them across the shards.
There are two ways MongoDB enables distribution of the data:
Ans:  range-based partitioning and
 In MongoDB, the data is sharded or distributed at the collection level.  Hashbased partitioning.
 The collection is partitioned by the shard key.

Range based partitioning

 In range-based partitioning , the shard key values are divided into ranges.
 Say you consider a timestamp field as the shard key.
 In this way of partitioning, the values are considered as a straight line starting from a
Min value to Max value where Min is the starting period (say, 01/01/1970) and Max is
the end period (say, 12/31/9999).
 Every document in the collection will have timestamp value within this range only, and
 Levels of granularity available in a sharded MongoDB deployment
it will represent some point on the line.
 Based on the number of shards available, the line will be divided into ranges, and
documents will be distributed based on them.
 In this scheme of partitioning, the documents where the values of the shard key are
nearby are likely to fall on the same shard.
 This can significantly improve the performance of the range queries.

These four levels of granularity represent the units of data in MongoDB:


 Document—The smallest unit of data in MongoDB. A document represents a single
object in the system and can’t be divided further. You can compare this to a row in a
relational database.
 Chunk—A group of documents clustered by values on a field. A chunk is a concept that
exists only in sharded setups. This is a logical grouping of documents based on their
values for a field or set of fields, known as a shard key.
 Collection—A named grouping of documents within a database. To allow users to
separate a database into logical groupings that make sense for the application,
MongoDB provides the concept of a collection. This is nothing more than a named
grouping of documents, and it must be explicitly specified by the application to run any Disadvantage
queries.
 Database—Contains collections of documents. This is the top-level named grouping in  However, the disadvantage is that it can lead to uneven distribution of data,
the system. Because a database contains collections of documents, a collection must overloading one of the shards, which may end up receiving majority of the requests,
also be specified to perform any operations on the documents themselves. whereas the other shards remain underloaded, so the system will not scale properly.

Page 48 of 63 Page 49 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

Hash based partitioning  For a sharded cluster, 64MB is the default chunk size. In most situations, this is an apt
size for chunk slitting and migration.
 In hash-based partitioning , the data is distributed on the basis of the hash value of the  The mongos routes writes to the appropriate chunk based on the shard key value.
shard field. MongoDB splits chunks when they grow beyond the configured chunk size. Both inserts
 If selected, this will lead to a more random distribution compared to range-based and updates can trigger a chunk split.
partitioning.
 It’s unlikely that the documents with close shard key will be part of the same chunk.
For example, for ranges based on the hash of the id field, there will be a straight line of
hash values, which will again be partitioned on basis of the number of shards.
 On the basis of the hash values, the documents will lie in either of the shards.

For Example
Say you have a blog posts collection which is sharded o
 Shard #1: Beginning of time up to July 2009
 Shard #2: August 2009 to December 2009
 Shard #3: January 2010 to through the end of
 time the field date .
 This ensures that the data is evenly distributed, but it happens at the cost of efficient
range queries.
Role of ConfigServers in the Above Scenario
Chunks
 Consider a scenario where you start getting insert requests for millions of
documents with the date of September 2009.
 The data is moved between the shards in form of chunks.
 In this case, Shard #2 begins to get overloaded.
 The shard key range is further partitioned into subranges, which are also termed as
 The config server steps in once it realizes that Shard #2 is becoming too big. It will
chunks.
split the data on the shard and start migrating it to other shards. After the
 A chunk consists of a subset of sharded data.
migration is completed, it sends the updated status to the mongos.
 Each chunk has a inclusive lower and exclusive upper range based on the shard key.
 So now Shard #2 has data from August 2009 until September 18, 2009 and Shard
#3 contains data from September 19, 2009 until the end of time.
 When a new shard is added to the cluster, it’s the config server’s responsibility to
figure out what to do with it. The data may need to be immediately migrated to the
new shard, or the new shard may need to be in reserve for some time.
 In summary, the config servers are the brains. Whenever any data is moved
around, the config servers let the mongos know about the final configuration so that
the mongos can continue doing proper routing.

Q.27 Explain Data Balancing Process

 We need to understand how MongoDB ensures that all the shards are equally loaded.
 The addition of new data or modification of existing data, or the addition or removal of
servers, can lead to imbalance in the data distribution, which means either one shard is
overloaded with more chunks and the other shards have less number of chunks, or it
Page 50 of 63 Page 51 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

can lead to an increase in the chunk size, which is significantly greater than the other
chunks.
 MongoDB ensures balance with the following background processes:
• Chunk splitting
• Balancer
Chunk Splitting

 Chunk splitting is one of the processes that ensures the chunks are of the specified
size.
 As you have seen, a shard key is chosen and it is used to identify how the documents
will be distributed across the shards.
 The documents are further grouped into chunks of 64MB (default and is configurable)
and are stored in the shards based on the range it is hosting.
 If the size of the chunk changes due to an insert or update operation, and exceeds the  Any of the mongos within the cluster can initiate the balancer process.
default chunk size, then the chunk is split into two smaller chunks by the mongos.  They do so by acquiring a lock on the config database of the config server, as
balancer involves migration of chunks from one shard to another, which can lead to
a change in the metadata, which will lead to change in the config server database.
 The balancer migrates one chunk at a time

The balancers impact on DB performance

1. Be configured to start the migration only when the migration threshold has reached.
The migration threshold is the difference in the number of maximum and minimum
chunks on the shards.
2. Or it can be scheduled to run in a time period that will not impact the production
traffic.

Migration process

 The moveChunk command is sent to the source shard.


 A n internal m oveChunk command is started on the source where it creates the
 copy of the documents within the chunk and queues it. In the meantime, any
 operations for that chunk are routed to the source by the mongos because the
 This process keeps the chunks within a shard of the specified size or lesser than that.  config database is not yet changed and the source will be responsible for serving
 Insert and update operations trigger splits.  any read/write request on that chunk.
 The split operation leads to modification of the data in the config server as the  The destination shard starts receiving the copy of the data from the source.
metadata is modified. Although splits don’t lead to migration of data, this operation can  Once all of the documents in the chunks have been received by the destination
lead to an unbalance of the cluster with one shard having more chunks compared to  shard, the synchronization process is initiated to ensure that all changes that
another.  have happened to the data while migration are updated at the destination shard.
 Once the synchronization is completed, the next step is to update the metadata
Balancer  with the chunk’s new location in the config database. This activity is done by
 the destination shard that connects to the config database and carries out the
 The balancer is a background process that manages chunk migrations.  necessary updates.
 If the difference in number of chunks between the largest and smallest shard exceed  Post successful completion of all the above, the document copy that is
the migration thresholds, the balancer begins migrating chunks across the cluster to  maintained at the source shard is deleted.
ensure an even distribution of data.
Operations on Sharding
1. Setting up a sharded cluster.
2. Creating a database and collection, and enable sharding on the collection.
3. Using the import command to load data in the sharded collection.
4. Distributed data amongst the shards.
5. Adding and removing shards from the cluster and checking how data is distributed
Page 52 of 63 Page 53 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

automatically.  { "shard" : "shard0001" }


Controlling Collection Distribution (Tag-Based Sharding)
 A tag is a keyword used as a “label” to group documents into different categories. This  { "shard" : "shard0001" }
allows the users to quickly navigate through similar content and it’s especially useful
 { "shard" : "shard0002" }
when dealing with a big amount of data.
 Tagging gives operators control over which collections go to which shard.  { "shard" : "shard0002" }
 In order to understand tag-based sharding, let’s set up a sharded cluster with 3
shards, 3 config servers, and 1 mongos.  { "shard" : "shard0002" }

1. Start the config servers  mongos>db.chunks.find({ns:"movies.comedy"}, {shard:1, _id:0}).sort({shard:1})

2. Start the shards  { "shard" : "shard0000" }

3. Start the mongos  { "shard" : "shard0000" }

4. Next, start a new terminal window, connect to the mongos, and enable sharding on the  { "shard" : "shard0000" }
collections
 { "shard" : "shard0000" }
5. 5. View the running databases connected to the mongos instance running at port.
 { "shard" : "shard0001" }
6. 6. Get reference to the database named as movies .
 mongos>db.chunks.find({ns:"movies.action"}, {shard:1, _id:0}).sort({shard:1})
7. 7. Enable sharding of the database movies .
 { "shard" : "shard0000" }
8. 8. Shard the collection movies.drama by shard key originality .
 { "shard" : "shard0000" }
9. 9. Shard the collection movies.action by shard key distribution .
 { "shard" : "shard0000" }
10.10. Shard the collection movies.comedy by shard key collections .
 { "shard" : "shard0000" }
 Now, to check how data is distributed across the shards.
 { "shard" : "shard0000" }
 Switch to configdb:
mongos> use config  { "shard" : "shard0001" }
 switched to db config
mongos>  { "shard" : "shard0001" }
 You can use chunks.find to look at how the chunks are distributed:
mongos>db.chunks.find({ns:"movies.drama"}, {shard:1,  { "shard" : "shard0001" }
_id:0}).sort({shard:1})
 { "shard" : "shard0001" }
 Similarly on the other documents of a collection if you fire the chunks.find command
we can see where the data of the data is distributed in the shards.  { "shard" : "shard0002" }
 mongos>db.chunks.find({ns:"movies.drama"}, {shard:1, _id:0}).sort({shard:1})  { "shard" : "shard0002" }
 { "shard" : "shard0000" }  { "shard" : "shard0002" }
 { "shard" : "shard0000" }  { "shard" : "shard0002" }
 { "shard" : "shard0000" }  { "shard" : "shard0001" }
 { "shard" : "shard0000" }  { "shard" : "shard0001" }
 { "shard" : "shard0001" }
Page 54 of 63 Page 55 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 { "shard" : "shard0002" } {collection:MaxKey}, "comedies")


 Now wait for the cluster to rebalance so that the chunks are distributed based on
 { "shard" : "shard0002" } the tags and rules defined above.

 { "shard" : "shard0002" }

Distribution without Tagging

Scaling with Tagging


 Let’s assume the collection movies.action needs two servers for its data. Since you
have only three shards, this means the other two collection’s data need to be
Tag the shards as belonging to each of the collection:
moved to one shard.
mongos>sh.addShardTag("shard0000", "dramas")
mongos>sh.addShardTag("shard0001", "actions")  In this scenario, you will change the tagging of the shards. You will add the tag
mongos>sh.addShardTag("shard0002", "comedies") “comedies” to Shard0 and remove the tag from Shard2, and further add the tag
mongos> “actions” to Shard2.
This signifies the following:  This means that the chunks tagged “comedies” will be moved to Shard0 and chunks
 Put the chunks tagged “dramas” on shard0000. tagged “actions” will be spread to Shard2.
 Put the chunks tagged “actions” on shard0001.  You first move the collection movies.comedy chunk to Shard0 and remove the same
 And put the chunks tagged “comedies” on shard0002. from Shard2:
 Next, you will create rules to tag the collections chunk accordingly.
mongos>sh.addShardTag("shard0000","comedies")
Rule 1: All chunks created in the movies.drama collection will be tagged as “dramas:” mongos>sh.removeShardTag("shard0002","comedies")
mongos>sh.addTagRange("movies.drama", {originality:MinKey},  Next, you add the tag “actions” to Shard2, so that movies.action chunks are spread
{originality:MaxKey}, "dramas") across Shard2 also:
 The rule uses MinKey, which means negative infinity, and MaxKey, which means
positive infinity. mongos>sh.addShardTag("shard0002","actions")
 Hence the above rule means mark all of the chunks of the collection movies.drama  Re-issuing the find command after some time will show the following results:
with the tag “dramas.”
mongos>db.chunks.find({ns:"movies.drama"}, {shard:1,
 Similar to this you will make rules for the other two collections.
_id:0}).sort({shard:1})
mongos>sh.addTagRange("movies.action", {distribution:MinKey},
{distribution:MaxKey}, "actions")
mongos>sh.addTagRange("movies.comedy", {collection:MinKey},
Page 56 of 63 Page 57 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 the data is distributed effectively among the nodes.

Monitoring the Config Servers


 The config server stores the metadata of the sharded cluster. The mongos caches
the data and routes the request to the respective shards.
 If the config server goes down but there’s a running mongos instance, there’s no
immediate impact on the shard cluster and it will remain available for a while.
 However, you won’t be able to perform operations like chunk migration or restart a
new mongos.
 In the long run, the unavailability of the config server can severely impact the
availability of the cluster. To ensure that the cluster remains balanced and
available, you should monitor the config servers.

Monitoring the Shard Status Balancing and Chunk Distribution


 For a most effective sharded cluster deployment , it’s required that the chunks be
distributed evenly among the shards.
 This is done automatically by MongoDB using a background process.
 You need to monitor the shard status to ensure that the process is working
effectively.
 For this, you can use the db.printShardingStatus() or sh.status() command in the
mongos mongo shell to ensure that the process is working effectively.

Monitoring the Lock Status


 In almost all cases the balancer releases its locks automatically after completing its
Q28: What Points to Remember When Importing Data in a Sharded
process, but you need to check the lock status of the database in order to ensure
Environment
there’s no long lasting lock because this can block future balancing, which will affect
Ans:
Pre-Splitting of the Data the availability of the cluster.
 Instead of leaving the choice of chunks creation with MongoDB, you can tell  Issue the following from mongos mongo to
MongoDB how to do so using the following command: o check the lock status:
o use config
db.runCommand( { split : "practicalmongodb.mycollection" , middle : { shardkey : value } } o db.locks.find()
);
Deciding on the Chunk Size Production Cluster Architecture
 You need to keep the following points in mind when deciding on the chunk size :  To understand production cluster architecture, let’s consider a use case of a social
networking application where the user can create a circle of friends and can share
1. If the size is too small, the data will be distributed evenly but it will end up having more
their comments or pictures across the group. The user can also comment or like her
frequent migrations, which will be an expensive operation at the mongos layer.
2. If the size is large, it will lead to less migration, reducing the expense at the mongos layer, friend’s comments or pictures. The users are geographically distributed.
but you will end up with uneven data distribution.  The application requirement is immediate availability across geographies of all the
Choosing a Good Shard Key comments; data should be redundant so that the user’s comments, posts and
 It’s very essential to pick a good shard key for good distribution of data among pictures are not lost; and it should be highly available.
nodes of the shard cluster .  So the application’s production cluster should have the following components:

Monitoring for Sharding 1. At least two mongos instance, but you can have more as per need.
 In addition to the normal monitoring and analysis that is done for other MongoDB 2. Three config servers, each on a separate system.
instances, the sharding 3. Two or more replica sets serving as shards . The replica sets are distributed across
geographies with read concern set to nearest.
 cluster requires an additional monitoring to ensure that all its operations are
functioning appropriately and
Page 58 of 63 Page 59 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622
TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

 If it is, it should be recovered and restarted whereas if it’s non-recoverable, you


need to create a new replica set and replace it as soon as possible.

Q29: Explain Scenario 1, Scenario 2, Scenario 3 and Scenario 4


Ans:
Scenario 1:
Mongos become unavailable
 The application server where mongos has gone down will not be able to
communicate with the cluster but it will not lead to any data loss since the mongos
don’t maintain any data of its own.
 The mongos can restart, and while restarting, it can sync up with the config servers
to cache the cluster metadata, and the application can normally start its operations

Scenario 3
If one of the shard becomes unavailable
 In this scenario, the data on the shard will be unavailable, but the other shards will
be available, so it won’t stop the application.
 The application can continue with its read/ write operations; however, the partial
results must be dealt with within the application.
 In parallel, the shard should attempt to recover as soon as possible

Scenario 2
One of the mongod of the replica set becomes unavailable in a shard
 Since you used replica sets to provide high availability, there is no data loss.
 If a primary node is down, a new primary is chosen, whereas if it’s a secondary
node, then it is disconnected and the functioning continues normally.
 The only difference is that the duplication of the data is reduced, making the system
little weak, so you should in parallel check if the mongod is recoverable.

Page 60 of 63 Page 61 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2 TRAINING -> CERTIFICATION -> PLACEMENT BSC IT : SEM – V NGT: UNIT2

Scenario 4
Only one config server is available out of three
 In this scenario, although the cluster will become readonly, it will not serve any
operations that might lead to changes in the cluster structure, thereby leading to a
change of metadata such as chunk migration or chunk splitting.
 The config servers should be replaced ASAP because if all config servers become
unavailable, this will lead to an inoperable cluster

Page 62 of 63 Page 63 of 63
YouTube - Abhay More | Telegram - abhay_more YouTube - Abhay More | Telegram - abhay_more
607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622 607A, 6th floor, Ecstasy business park, city of joy, JSD road, mulund (W) | 8591065589/022-25600622

You might also like