MEAN 3 L4 Advanced MongoDB With Aggregation
Indexes are data structures that store a collection's data set in a form that is easy to traverse. Indexes help perform the following functions:
● Execute queries and find documents that match the query criteria without a collection scan
● Limit the number of documents a query examines
● Store field values in the order of the value
● Support equality matches and range-based queries
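For example, a single-field index can be created and used as in the minimal sketch below; the customer_info collection and customer_name field are assumptions for illustration:
db.customer_info.createIndex({ customer_name: 1 })   // ascending single-field index
db.customer_info.find({ customer_name: "Alice" })    // an equality match that can use the index instead of a collection scan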
Types of Index
● TTL Indexes: A TTL index is used for TTL collections, which remove data automatically after a specified period of time.
● Unique Indexes: A unique index causes MongoDB to reject all documents that contain a
duplicate value for the indexed field.
● Partial Indexes: A partial index indexes only documents that meet specified filter criteria.
● Case Insensitive Indexes: A case insensitive index disregards the case of the index key values.
● Sparse Indexes: A sparse index does not index documents that do not have the indexed field.
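The sketches below show how some of these index types are created. The collection names, field names, and values are illustrative assumptions; the partial and case insensitive options require MongoDB 3.2 and 3.4 or later, respectively:
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })   // TTL index: documents expire about an hour after their createdAt value
db.orders.createIndex({ customer: 1 }, { partialFilterExpression: { amount: { $gt: 100 } } })   // partial index: only documents matching the filter are indexed
db.users.createIndex({ email: 1 }, { collation: { locale: "en", strength: 2 } })   // case insensitive index via a collation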
Compound Index
A compound index in MongoDB indexes multiple fields in a single index structure; the fields are listed, separated by commas, in the index specification document. MongoDB limits the fields of a compound index to a maximum of 31.
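A minimal sketch of a compound index, assuming a products collection with category and price fields:
db.products.createIndex({ category: 1, price: -1 })   // category ascending, price descending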
Sparse Index
Sparse indexes contain entries only for documents that have the indexed field; they ignore documents that do not contain the indexed field.
To create a sparse index, use the db.collection.createIndex() method and set the sparse option to true.
When a sparse index would return an incomplete result set, MongoDB does not use that index unless it is specified in the hint() method. For example, a sparse index on the field x is not used for the query
{ x: { $exists: false } }
because documents that lack the field x are not present in the index.
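A sketch, assuming a customer_info collection in which only some documents contain a phone field:
db.customer_info.createIndex({ phone: 1 }, { sparse: true })   // documents without phone are not indexed
db.customer_info.find({ phone: { $exists: false } }).hint({ phone: 1 })   // forces the sparse index; returns no documents because the matching documents are absent from the index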
Unique Index
If a document does not have a value for the indexed field, a unique index stores a null value for that document. Because of this unique constraint, MongoDB permits only one document without the indexed field. If more than one document has a missing or valueless indexed field, the index build process fails.
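A sketch, assuming an email field in the customer_info collection:
db.customer_info.createIndex({ email: 1 }, { unique: true })   // rejects documents that duplicate an existing email value
// If more than one document may lack the email field, drop the index above and combine unique with sparse instead:
db.customer_info.createIndex({ email: 1 }, { unique: true, sparse: true })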
Create Compound, Sparse, and Unique Indexes
Duration: 45 min.
Problem Statement:
You are given a project to create compound, sparse, and unique indexes.
Assisted Practice: Guidelines to Demonstrate Indexing
By default, during index creation, operations on a database are blocked and the database becomes unavailable for any read or write operation. The read and write operations on the database are queued until the index build completes.
Use the following command to make MongoDB available even during an index build process:
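The command itself is not reproduced above. In the MongoDB versions this course targets, the background option of createIndex() keeps the database available during the build; a sketch, assuming the customer_name field:
db.customer_info.createIndex({ customer_name: 1 }, { background: true })   // build the index in the background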
If you perform any administrative operations when MongoDB is creating indexes in the
background for a collection, you will receive an error.
The background index build process:
● Uses an incremental approach and is slower than the normal foreground process
● Depends on the size of the index for its speed
● Impacts database performance
To avoid performance issues:
● Use getIndexes() to ensure that your application checks for the required indexes at start-up
● Use the equivalent method for your driver and ensure it terminates an operation if the proper indexes do not exist
● Use separate application code and designated maintenance windows for index builds
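A minimal sketch of such a start-up check in the mongo shell, assuming the application requires an index on customer_name:
var indexes = db.customer_info.getIndexes();
var hasNameIndex = indexes.some(function (idx) { return idx.key.customer_name === 1; });
if (!hasNameIndex) {
    throw new Error("Required index on customer_name does not exist");   // terminate instead of running unindexed queries
}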
Index Creation on Replica Set
Background index build operations on the secondary members of a replica set begin only after the index build completes on the primary. To build large indexes on secondaries, perform the following steps:
1. Restart one secondary at a time in standalone mode.
2. Build the index and wait until the index build completes.
3. Restart the member as a part of the replica set.
4. Allow it to catch up with the other members of the set.
5. Build the index on the next secondary in the same way.
6. When all secondaries have the new index, step down the primary.
7. Restart the former primary as a standalone.
8. Build the index on the former primary.
dropIndex() Method
db.accounts.dropIndex( { "tax-id": 1 } )
db.collection.dropIndexes() Method
To remove all indexes barring the _id index from a collection, use the command:
db.collection.dropIndexes()
Index Modification
1. Drop the index: Execute the query given below to return a document showing the operation status.
2. Recreate the index: Execute the query given below to return a document showing the status of the results.
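The queries are not reproduced above; a sketch, reusing the accounts collection and tax-id index from the dropIndex() example:
db.accounts.dropIndex({ "tax-id": 1 })     // returns a status document such as { "nIndexesWas" : 2, "ok" : 1 }
db.accounts.createIndex({ "tax-id": 1 })   // recreates the index and returns a document describing the result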
Duration: 40 min.
Problem Statement:
Use the db.collection.reIndex() method to rebuild all indexes of a collection. This drops all indexes, including the _id index, and rebuilds all indexes in a single operation.
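A minimal sketch, assuming the customer_info collection:
db.customer_info.reIndex()   // drops and rebuilds all indexes on the collection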
All indexes of a collection and a database can be listed. To get a list of all indexes of a
collection, use the db.collection.getIndexes() or a similar method.
To list all indexes of all collections, use the operation given below in the mongo shell.
db.getCollectionNames().forEach(function(collection) {
    // fetch and print the index definitions of each collection in the current database
    var indexes = db[collection].getIndexes();
    print("Indexes for " + collection + ":");
    printjson(indexes);
});
Retrieval of Index
Duration: 40 min.
Problem Statement:
Aggregations process data sets and return calculated results. They run on the mongod instance to simplify application code and limit resource requirements. The aggregation framework:
● Uses a collection of documents as input and returns results in the form of one or more documents
● Is based on data processing pipelines: documents pass through multi-stage pipelines and are transformed into an aggregated result
● Provides, as its most basic pipeline stage, filters that function like queries
● Groups and sorts documents by a defined field or fields through pipeline operations
● Uses native operations within MongoDB to allow efficient data aggregation and is the preferred method for data aggregation
Aggregation
The aggregation operation given below returns all states with total population greater than 10 million.
The aggregation operation given below returns user names sorted by the month of their joining.
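The two operations are not reproduced above. The sketches below follow the familiar MongoDB examples and assume a zipcodes collection with state and pop fields and a users collection with a joined date field:
db.zipcodes.aggregate([
    { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
    { $match: { totalPop: { $gte: 10000000 } } }
])   // states with a total population greater than 10 million
db.users.aggregate([
    { $project: { name: 1, month_joined: { $month: "$joined" } } },
    { $sort: { month_joined: 1 } }
])   // user names sorted by the month of joining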
Aggregation operations manipulate data and return a computed result based on the input document
and a specific procedure. Aggregation provides the following semantics for data processing:
Count
This command, along with the two methods count() and cursor.count(), provides access to total counts in the mongo shell. The command given below counts all documents in the customer_info collection.
db.customer_info.count()
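The count() method and cursor.count() also accept query criteria; a sketch, assuming a city field:
db.customer_info.count({ city: "London" })           // count the documents matching a query
db.customer_info.find({ city: "London" }).count()    // equivalent count on a cursor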
Distinct
This operation searches for documents matching a query and returns all unique values for a field in
the matched document. The syntax given below is an example of a distinct operation.
db.customer_info.distinct( "customer_name" )
Aggregation Operations
Duration: 40 min.
Problem Statement:
Group
Group operations accept a set of documents as input. They match the given query, apply the operation, and then return an array of documents with the computed results.
The group operation does not support sharded collections. In addition, the results of the group operation must not exceed 16 megabytes.
The group operation shown below groups documents by the field 'a', where 'a' is less than 3, and sums the count field for each group.
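A sketch of such a group operation, assuming a records collection whose documents contain the fields a and count:
db.records.group({
    key: { a: 1 },                                                    // group by the field a
    cond: { a: { $lt: 3 } },                                          // only documents where a is less than 3
    reduce: function (curr, result) { result.count += curr.count; },  // sum the count field per group
    initial: { count: 0 }
})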
Duration: 30 min.
Problem Statement:
Replication
Replication performs the following functions:
● Stores multiple copies of data across different databases in multiple locations, and thus protects data when the database suffers any loss
● Helps manage data in the event of hardware failures and other kinds of service interruptions
● Stores copies of data in different data centers to increase the locality and availability of data for distributed applications
Master-Slave Replication
Master-slave replication is the oldest mode of replication that MongoDB supports. In earlier versions of MongoDB, master-slave replication was used for failover, backup, and read scaling. In newer versions, it has been replaced by replica sets for most use cases.
A replica set consists of a group of mongod instances that host the same data set. The replica set functions as follows:
● The primary mongod receives all write operations, and the secondary mongod instances replicate the operations from the primary.
● The primary node receives the write operations from clients.
● The primary logs any changes or updates to its data sets in its oplog.
● The secondaries replicate the oplog of the primary and apply all the operations to their data sets.
● When the primary becomes unavailable, the replica set nominates a secondary as the primary.
(Diagram: a client application's driver sends writes and reads to the primary, while the two secondaries replicate from the primary.)
Replica Set in MongoDB
An extra mongod instance can be added to a replica set to act as an arbiter. Following are some characteristics of an arbiter:
● An arbiter is a node that participates only in elections to select the primary node; it does not hold data.
Secondary members in a replica set asynchronously apply operations from the primary. Replica sets can continue to function without some secondary members. As a result, secondary members may not always return the most recent data to clients.
Replica Set Members
A replica set can also have an arbiter. Arbiters do not replicate or store data but play a crucial role in selecting a secondary to take the place of the primary when the primary becomes unavailable. A typical replica set contains a primary, a secondary, and an arbiter.
Priority 0 Replica Set Members
A priority 0 member is a secondary member that cannot become the primary. The characteristics of a priority 0 member are:
● Cannot trigger an election
● Can maintain copies of the data set, accept and serve read operations, and vote to elect a primary
By configuring a priority 0 member, you can prevent a secondary from becoming the primary. In a typical three-member replica set, one data center hosts both the primary and a secondary, and a second data center hosts one priority 0 member.
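A sketch of configuring an existing member as priority 0 (the member index 2 is an assumption):
cfg = rs.conf()
cfg.members[2].priority = 0   // this member can no longer become primary
rs.reconfig(cfg)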
Hidden Replica Set Members
Hidden members of a replica set are invisible to client applications. The characteristics of hidden members are:
● They store a copy of the primary's data.
● They are priority 0 members; they can vote to elect a primary but cannot become the primary.
● They do not serve read operations from client applications.
● They can be used for dedicated functions such as reporting and backup.
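A sketch of configuring a hidden member (the member index is an assumption); a hidden member must also have priority 0:
cfg = rs.conf()
cfg.members[3].priority = 0
cfg.members[3].hidden = true
rs.reconfig(cfg)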
Start a Replica Set
Duration: 30 min.
Problem Statement:
Tag sets allow you to target read operations to selected replica set members. Customized read preferences and write concerns evaluate tag sets as follows:
● Read preferences consider the tag value when selecting a replica set member to read from.
● Write concerns ignore the tag value when selecting a member.
You can specify tag sets with the following read preference modes:
● primaryPreferred
● secondary
● secondaryPreferred
● nearest
Tags are not compatible with the primary mode but are compatible with the nearest mode. When a tag set is combined with the nearest mode, the read preference selects the matching member, primary or secondary, with the lowest network latency.
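A sketch of a tagged read in the mongo shell, assuming the dc tag values configured below; the trailing empty document allows any member as a fallback:
db.customer_info.find().readPref("nearest", [ { "dc": "NYK" }, { } ])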
Tag Set for Replica Set
Tag sets allow customizing write concerns and read preferences in a replica set. MongoDB stores tag
sets in the replica set configuration object.
conf = rs.conf()
conf.members[0].tags = { "dc": "NYK", "rackNYK": "A" }
conf.members[1].tags = { "dc": "NYK", "rackNYK": "A" }
conf.members[2].tags = { "dc": "NYK", "rackNYK": "B" }
conf.members[3].tags = { "dc": "LON", "rackLON": "A" }
conf.members[4].tags = { "dc": "LON", "rackLON": "B" }
conf.settings = { getLastErrorModes: { MultipleDC: { "dc": 2 }, multiRack: { "rackNYK": 2 } } }
rs.reconfig(conf)
Replica Set and Patterns
Replica Set Deployment Strategies
You can use the following deployment strategies for a replica set:
● Add Capacity Ahead of Demand: Add a new member to an existing replica set before new demands arise.
● Distribute Members Geographically: Keep at least one member in an alternate data center as a backup in case of any data loss incident. Set the priorities of these members to zero to prevent them from becoming primary.
● Keep Majority in One Location: When electing the primary, all members must be able to see each other to create a majority. To enable the members to elect the primary, ensure that most of the members are in one location.
● Use Replica Set Tag Sets: Tag sets ensure that all operations are replicated at specific data centers and help route read operations to specific machines.
● Use Journaling: Use journaling to safely write data to disk in case of shutdowns, power failures, and other unexpected failures.
Replica Set Deployment Patterns
The record of operations maintained by the master server is called the operation log or
oplog. Each oplog document denotes a single operation performed on the master server
and contains the following keys:
● ts: An internal value used to track operations. It contains a 4-byte timestamp and a 4-byte incrementing counter.
● op: The type of operation performed, as a 1-byte code; for example, i for an insert.
● ns: The collection name (namespace) where the operation is performed.
● o: The document specifying the operation to perform. For an insert, this is the document to insert.
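An illustrative oplog entry for an insert, with assumed collection and values, might look like this:
{ "ts": Timestamp(1625151234, 1), "op": "i", "ns": "shop.customers", "o": { "_id": 1, "name": "Alice" } }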
Replication State and Local Database
MongoDB maintains a local database called local to keep the information about the replication
state and the list of master and slaves. The content of this database remains local to the master
and slaves.
Slaves store the replication information in the local database. The unique slave identifier gets saved
in the me collection and the list of masters gets saved in sources collection.
To check the replication status, use the function given below when connected to the master:
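The function is not reproduced above; it is most likely the db.printReplicationInfo() helper:
db.printReplicationInfo()   // prints the configured oplog size and the time range of the operations it holds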
The output contains the oplog size and the date ranges of the operations in the oplog. In the given example, the oplog is 10 megabytes in size and can accommodate only about 30 seconds of operations. The log length is a useful metric only for servers that have been operational long enough for the oplog to roll over.
The function given below prints a list of sources for a slave, each displaying information such as how far behind the master it is:
db.printSlaveReplicationInfo()
Check a Replica Set Status
Duration: 30 min.
Problem Statement:
Sharding is the process of distributing data across multiple servers for storage. The
characteristics of sharding are as follows:
● Sharding adds more servers to a database and automatically balances data and load across
various servers.
● Sharding provides additional write capacity by distributing the write load over a number of
mongod instances.
● Sharding splits the data set and distributes it across multiple databases, or shards. Each shard serves as an independent database, and together, the shards make up a single logical database.
Sharded clusters require a proper infrastructure setup, which increases the overall
complexity of the deployment. Therefore, consider deploying sharded clusters only when
your system shows the following characteristics:
● The data set outgrows the storage capacity of a single MongoDB instance.
● The size of the active working set exceeds the capacity of the maximum available RAM.
A shard is a replica set or a single mongod instance that holds the data
subset used in a sharded cluster. Each shard is a replica set that
provides redundancy and high availability for the data it holds.
The characteristics of a shard are as follows:
Choose the appropriate shard key based on the following two factors:
Range-Based Shard Key
In range-based sharding, MongoDB divides data sets into different ranges based on the values of shard
keys. In range-based sharding, documents having close shard key values reside in the same chunk and
shard.
Range-based partitioning supports range queries because for a given range query of a shard key, the
query router can easily find which shards contain those chunks.
Data distribution in range-based partitioning can be uneven, which may negate some benefits of sharding. For example, if the shard key value increases linearly, such as a timestamp, then all requests for a given time range map to the same chunk and shard. In such cases, a small set of shards may receive most of the requests, and the system would fail to scale.
Hash-Based Sharding
For hash-based partitioning, MongoDB first calculates the hash of a field’s value, and then
creates chunks using those hashes. In hash-based partitioning, collections in a cluster are
randomly distributed.
In hash-based partitioning:
● Data is evenly distributed.
● Hashed key values randomly distribute data across chunks and shards.
● Range queries on the shard key are inefficient, because documents with adjacent shard key values are unlikely to reside in the same chunk.
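A sketch of hash-based sharding, assuming a records.users collection sharded on a hashed _id:
sh.shardCollection("records.users", { "_id": "hashed" })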
Impact of Shard Keys on Cluster Operation
Some shard keys can scale write operations: a computed shard key with some randomness allows a cluster to scale write operations. To improve write scaling, MongoDB supports sharding a collection on a hashed index.
Shard keys also affect how queries behave in a sharded cluster, in the following two ways:
● Querying
A mongos instance enables applications to interact with sharded clusters. When mongos
receives queries from client applications, it uses metadata from the config server and
routes queries to the mongod instances. mongos makes querying operational in sharded
environments.
● Query Isolation
Query execution is fast and efficient when mongos can route the query to a single shard, using the shard key and the metadata from the config server. If your query contains the first component (prefix) of a compound shard key, mongos can route the query to a single shard and thus provide good performance.
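A sketch of this routing behavior, assuming an orders collection sharded on the compound key { customer_id: 1, order_date: 1 }:
db.orders.find({ customer_id: 42 })   // contains the shard key prefix, so mongos can target a single shard
db.orders.find({ status: "open" })    // no shard key in the query, so mongos must broadcast to all shards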
Production Cluster Deployment
● Config Servers
Each of the three config servers must be hosted on a separate machine. A single sharded cluster must have exclusive use of its config servers. If you deploy multiple sharded clusters, each cluster must have its own group of config servers.
● Shards
A production cluster must have two or more replica sets or shards.
Deploy a Sharded Cluster
● Step 2: Start each config server by issuing a command using the syntax given below:
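The syntax is not reproduced above; a sketch for the MongoDB versions this course targets, where the data directory is an assumption and 27019 is the conventional config server port:
mongod --configsvr --dbpath /data/configdb --port 27019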
● Step 3: To start a mongos instance, issue a command using the syntax given below:
To start a mongos that connects to config server instances running on the hosts cfg0.example.net, cfg1.example.net, and cfg2.example.net on the default ports, issue the command given below:
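A sketch of that command, assuming the config servers listen on the default config server port 27019:
mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019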
Once the mongos is running, connect to it from a mongo shell and add a shard to the cluster by using the sh.addShard() method:
sh.addShard("mongodb0.example.net:27017")
Create a Shard Cluster and Deploy the Sharded Cluster
Problem Statement:
You must enable sharding for a database before you can shard any of its collections.
To enable sharding, perform the following steps:
● Step 1: From a mongo shell, connect to the mongos instance and issue a command using the
syntax given below.
mongo --host <hostname of machine running mongos> --port <port mongos listens on>
● Step 2: Issue the sh.enableSharding() method and specify the name of the database for which
you want to enable sharding. Use the syntax given below:
sh.enableSharding("<database>")
Optionally, enable sharding for a database using the enableSharding command. For this, use the
syntax given below.
db.runCommand( { enableSharding: <database> } )
Enable Sharding for Collection
● Determine the shard key. The selected shard key impacts the efficiency of sharding.
● If the collection already contains data, create an index on the shard key using the createIndex() method. If the collection is empty, MongoDB creates the index as part of sh.shardCollection().
● To enable sharding for a collection, open the mongo shell and issue the sh.shardCollection() method.
Enable Sharding for Collection
To enable sharding for a collection, replace the string <database>.<collection> with the full namespace of your collection. This string consists of the name of your database, a dot, and the full name of the collection.
The example given below shows sharding a collection on its partition key.
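The example is not reproduced above; a sketch, assuming a records.people collection partitioned on the zipcode field:
sh.shardCollection("records.people", { "zipcode": 1 })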
Shard Balancing
● When the shard distribution in a cluster is uneven, the balancer migrates chunks from one
shard to another to achieve a balance in chunk numbers per shard.
● Chunk migration is a background operation that occurs between two shards, an origin and a
destination.
● The origin shard sends all of the chunk's current documents to the destination shard.
● During the migration, if an error occurs, the balancer aborts the process and leaves the
chunk unchanged in the origin shard.
● Adding a new shard to a cluster may create an imbalance because the new shard has no
chunks.
● Similarly, when a shard is being removed, the balancer migrates all the chunks from the
shard to other shards.
● After all the data is migrated and the metadata is updated, the shard can be safely removed.
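Sketches of the related commands, with assumed shard names and hosts:
sh.addShard("rs1/mongodb3.example.net:27017")   // adding a shard; the balancer migrates chunks to it
db.adminCommand({ removeShard: "shard0001" })   // starts draining the shard; rerun the command to check the remaining chunks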
Shard Balancing
Chunk migrations carry bandwidth and workload overheads, which may impact the database
performance.
Tag Aware Sharding
● In MongoDB, you can create tags for a range of shard keys to associate those ranges with a group of shards.
● The balancer that moves chunks from one shard to another obeys these tagged ranges.
● The balancer moves or keeps a specific subset of the data on a specific set of shards and
ensures that the most relevant data resides on the shard which is geographically closer to
the client/application server.
Add Shard Tags
When connected to a mongos instance, use the sh.addShardTag() method to associate tags with a particular shard. The example given below adds the tag NYC to two shards and adds the tags SFO and NRT to a third shard.
sh.addShardTag("shard0000", "NYC")
sh.addShardTag("shard0001", "NYC")
sh.addShardTag("shard0002", "SFO")
sh.addShardTag("shard0002", "NRT")
To assign a tag to a range of shard keys, connect to the mongos instance and use the sh.addTagRange() method. The following operations assign:
● the NYC tag to two ranges of zip codes in Manhattan and Brooklyn
● the SFO tag to one range of zip codes in San Francisco
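The operations are not reproduced above; sketches that follow the familiar MongoDB zip code example, assuming the records.users namespace used later in this lesson:
sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")   // Manhattan
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")   // Brooklyn
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")   // San Francisco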
Remove Shard Tags
Shard tags exist in the shard's document in the shards collection of the config database. To return all shards with a specific tag, use the operations given below:
use config
db.shards.find({ tags: "NYC" })
To return all shard key ranges tagged with NYC, use the sequence of operations given below:
use config
db.tags.find({ tags: "NYC" })
The example given below removes the NYC tag assignment for the range of zip codes within
Manhattan:
use config
db.tags.remove({ _id: { ns: "records.users", min: { zipcode: "10001" }}, tag: "NYC" })
Key Takeaways
Knowledge Check 1
Which among the following is the correct syntax to create a unique index on the field name?
a. db.collection.createUniqueIndex({name:1})
b. db.collection.createUniqueIndex({name:1})
c. db.collection.createIndex({unique:true},{name:1})
d. db.collection.createIndex({name:1},{unique:true})
The correct answer is d. The db.collection.createIndex() method takes the index key pattern as its first argument and an options document, such as { unique: true }, as its second argument.
Knowledge Check 3
Which of the following techniques is used for scaling write operations in MongoDB?
a. Replication
b. Sharding
c. Indexing
d. Splitting
The correct answer is b. Sharding distributes the write load across multiple mongod instances, providing additional write capacity.
Knowledge Check 5
Which of the following methods is used for listing all indexes of a collection?
a. getIndex()
b. listIndex()
c. getIndexes()
d. listIndexes()
The correct answer is c. The db.collection.getIndexes() method returns an array of documents describing the indexes on a collection.
PQR Corp is a leading corporate training provider. PQR Corp has decided to share analysis reports with its clients. These reports will help the clients identify the employees who have completed the training and evaluation exams, their strengths, and the areas where they need improvement. This is going to be a unique selling feature for PQR Corp. The company has a huge amount of data to deal with. It has hired you as an expert and wants your help to solve this problem.