Indexing-Sharding and Replication in MongoDB
By – Jigna Patel
MongoDB Indexes

• In MongoDB, a query that cannot use an index performs a collection scan: every document in the collection is read to find the matches. On a large collection, a collection scan will:
1. Result in various performance bottlenecks
2. Significantly slow down your application
What are indexes in MongoDB?

• Indexes are special data structures that store a small part of the
collection’s data in a way that can be queried easily.
• In simplest terms, an index stores the values of the indexed fields
outside the collection and keeps track of where each document is located
on disk. The index keeps these values in sorted order.
• This ordering helps to perform equality matches and range-based
query operations efficiently. In MongoDB, indexes are defined at the
collection level, and indexes on any field or subfield of the documents
in a collection are supported.
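To make the role of the ordering concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB is implemented (MongoDB uses B-tree structures); a sorted array stands in for the index, and the documents, the `pos` field (a stand-in for a disk location) and all function names are assumptions made for illustration.

```javascript
// Hypothetical documents; `pos` stands in for a document's location on disk.
const docs = [
  { pos: 0, name: "Maya", score: 72 },
  { pos: 1, name: "Arun", score: 91 },
  { pos: 2, name: "Zoe", score: 85 },
  { pos: 3, name: "Liam", score: 64 },
];

// Build the index: (key, position) pairs kept sorted by key.
function buildIndex(documents, field) {
  return documents
    .map((d) => ({ key: d[field], pos: d.pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search for the first index entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query: positions of documents with min <= key <= max.
// Only the matching slice of the index is visited, not every document.
function rangeLookup(index, min, max) {
  const positions = [];
  for (let i = lowerBound(index, min); i < index.length && index[i].key <= max; i++) {
    positions.push(index[i].pos);
  }
  return positions;
}

const scoreIndex = buildIndex(docs, "score");
console.log(rangeLookup(scoreIndex, 70, 90)); // range match on score
console.log(rangeLookup(scoreIndex, 91, 91)); // equality match as a one-value range
```

Because the entries are sorted, both the equality match and the range match start with a binary search and then read only neighbouring entries, which is the efficiency the bullets above describe.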
Creating indexes
• When a collection is created, MongoDB creates a unique index on the _id
field. MongoDB refers to this as the default _id index.
This default index cannot be dropped from the collection.
• Every document has an _id field, which serves as the key of this default index.
• Syntax : db.<collection>.createIndex(<Key and Index Type>, <Options>)
• When creating an index, you need to define the field to be indexed and the
direction of the key (1 or -1) to indicate ascending or descending order.
• Indexes cannot be renamed after creation. (The only way to rename an index is
to first drop that index and recreate it using the desired name.)
• Let’s create an index on the name field in the studentgrades collection and
name it "student name index".
Finding indexes
• You can find all the available indexes in a MongoDB collection by using the
getIndexes method.
• Syntax: db.<collection>.getIndexes()
Dropping indexes

• To delete an index from a collection, use the dropIndex method and specify the name of the index to be dropped.
• Syntax : db.<collection>.dropIndex(<Index Name / Field Name>)
• Let’s remove the user-created index with the index name student name
index, as shown below.
• The dropIndexes method drops all the indexes in a collection except the
default _id index.
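The create, list and drop calls from the last three sections can be sketched as small helper functions. The collection name studentgrades and the index name "student name index" come from the slides; the helper names are made up here, and passing `db` as a parameter is an assumption of this sketch so that the helpers can be tried with the `db` object in mongosh or with a stand-in object.

```javascript
// Create a single-field index on `name` (ascending) with an explicit index name.
function createStudentNameIndex(db) {
  // First argument: key document (field and direction).
  // Second argument: options document (here only the index name).
  return db.studentgrades.createIndex({ name: 1 }, { name: "student name index" });
}

// List every index on the collection, including the default _id index.
function listStudentIndexes(db) {
  return db.studentgrades.getIndexes();
}

// Drop the user-created index by its name.
function dropStudentNameIndex(db) {
  return db.studentgrades.dropIndex("student name index");
}

// Drop all indexes on the collection except the default _id index.
function dropAllUserIndexes(db) {
  return db.studentgrades.dropIndexes();
}
```

In mongosh, after loading these definitions, calling `createStudentNameIndex(db)` followed by `listStudentIndexes(db)` should show the new index next to the default _id index.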
Common MongoDB index types

• MongoDB provides different types of indexes that can be utilized according to user needs. Here are the most common ones:
1. Single field index
2. Compound index
3. Multikey index
Single field index

• These user-defined indexes use a single field in a document to create an index in an ascending or descending sort order (1 or -1).
• In a single field index, the sort order of the index key does not have an
impact because MongoDB can traverse the index in either direction.

You can use the sort() method to see how the data will be
represented in the index.
Compound index
• You can use multiple fields in a MongoDB document to create a compound index.
• This type of index uses the first field for the initial sort and then sorts by the
subsequent fields.
• For a compound index on subject (ascending) and score (descending), MongoDB will:
1. First sort by the subject field
2. Then, within each subject value, sort by score
• The index would create a data structure similar to the result of the following query:
db.studentgrades.find({},{_id:0}).sort({subject:1, score:-1})
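The ordering that such a compound key produces can be reproduced in plain JavaScript. This is only an illustration of the sort order (subject ascending, then score descending), not of MongoDB's internals; the sample documents and the function name are hypothetical.

```javascript
// Hypothetical sample of studentgrades documents.
const grades = [
  { subject: "science", score: 80 },
  { subject: "maths", score: 65 },
  { subject: "maths", score: 92 },
  { subject: "english", score: 70 },
  { subject: "science", score: 88 },
];

// Comparator for the compound key { subject: 1, score: -1 }.
function compareSubjectAscScoreDesc(a, b) {
  // First key: subject, ascending.
  if (a.subject !== b.subject) return a.subject < b.subject ? -1 : 1;
  // Second key, used only within the same subject: score, descending.
  return b.score - a.score;
}

// Copy before sorting so the original array is left untouched.
const ordered = [...grades].sort(compareSubjectAscScoreDesc);
console.log(ordered.map((d) => `${d.subject}:${d.score}`));
```

Within each subject the higher score comes first, while the subjects themselves appear in alphabetical order.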
Multikey index
• MongoDB supports indexing array fields. When you create an index for a field
containing an array, MongoDB will create separate index entries for every
element in the array.
• These multikey indexes enable users to query documents using the elements
within the array.
• MongoDB automatically creates a multikey index when it encounters an array
field in the indexed key; the user does not need to explicitly define the multikey type.
• Let’s create a new data set containing an array field to demonstrate the
creation of a multikey index.
• Now let’s create an index using the grades field.

• The above code will automatically create a multikey index in MongoDB.
When you query for an exact array match on the array field (grades), MongoDB
uses the index to look up the first element of the array given in the find() method
and then filters those documents for the whole matching array.
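The idea of one index entry per array element can be sketched as follows. The documents are made-up examples (the original data set from the slides is not reproduced here), and the functions only imitate the concept; they are not MongoDB APIs.

```javascript
// Hypothetical documents with an array field named `grades`.
const students = [
  { _id: 1, name: "Asha", grades: [85, 90, 72] },
  { _id: 2, name: "Ravi", grades: [90, 60] },
];

// Emit one (key, _id) entry per array element, sorted by key and then by _id.
function buildMultikeyEntries(documents, field) {
  const entries = [];
  for (const doc of documents) {
    // A scalar value is treated as a one-element array.
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) entries.push({ key: value, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key || a.id - b.id);
}

// Find the _id of every document whose array contains `value`.
function findByElement(entries, value) {
  return entries.filter((e) => e.key === value).map((e) => e.id);
}

const gradeEntries = buildMultikeyEntries(students, "grades");
console.log(gradeEntries.length);             // one entry per array element
console.log(findByElement(gradeEntries, 90)); // documents containing the grade 90
```

Two documents with five array elements in total yield five index entries, which is why a single element such as 90 can be matched without scanning every document.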
Other MongoDB index types

1. Geospatial Index
• MongoDB provides two types of indexes to increase the efficiency of database queries
when dealing with geospatial coordinate data:
• 2d indexes that use planar geometry which is intended for legacy coordinate pairs used in
MongoDB 2.2 and earlier.
• 2dsphere indexes that use spherical geometry.
2. Text index
• The text index type enables you to search the string content in a collection.
3. Hashed index
• MongoDB Hashed index type is used to provide support for hash-based
sharding functionality. This would index the hash value of the specified field.
Sharding
What is sharding?
• Sharding is the process of distributing data across multiple hosts. In
MongoDB, sharding is achieved by splitting large data sets into small
data sets across multiple MongoDB instances.
How sharding works…
When dealing with high throughput applications or very large
databases, the underlying hardware becomes the main limitation. High
query rates can stress the CPU, RAM, and I/O capacity of disk drives
resulting in a poor end-user experience.

To mitigate this problem, there are two types of scaling methods.


• Vertical Scaling
• Horizontal Scaling - MongoDB supports horizontal scaling through sharding
MongoDB sharding basics

MongoDB sharding works by creating a cluster of MongoDB instances consisting of at least three servers. A sharded cluster has three main components:
• The shard
• Mongos
• Config servers
• Shard :
A shard is a single MongoDB instance that holds a subset of the sharded data.
Shards can be deployed as replica sets to increase availability and provide
redundancy. The combination of multiple shards creates a complete data set.
For example, a 2 TB data set can be broken down into four shards, each
containing 500 GB of data from the original data set.
• Mongos
The mongos acts as the query router, providing a stable interface between the
application and the sharded cluster. This MongoDB instance is responsible for
routing client requests to the correct shard.
• Config Servers
Configuration servers store the metadata and the configuration settings for the
whole cluster.
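How the three components cooperate can be sketched with a simplified routing function. It assumes ranged sharding on a numeric shard key; the chunk table stands in for the metadata held by the config servers, the function plays the part of mongos, and all names and ranges are hypothetical. A real cluster manages many chunks per shard and can also shard on hashed keys.

```javascript
// Hypothetical metadata, as the config servers might describe it:
// each chunk covers shard key values in [min, max) and lives on one shard.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Simplified stand-in for mongos: pick the shard that owns a shard key value.
function routeToShard(chunks, shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  if (!chunk) throw new Error(`no chunk covers shard key ${shardKeyValue}`);
  return chunk.shard;
}

console.log(routeToShard(chunkTable, 42));   // falls in the first range
console.log(routeToShard(chunkTable, 1000)); // lower bound is inclusive
console.log(routeToShard(chunkTable, 2500)); // falls in the last range
```

The application only talks to the router; which shard holds a given value is decided entirely by the metadata.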
Components illustration:
Find out
• Benefits of sharding
• Limitation of sharding in MongoDB
MongoDB Replication: A Complete Introduction
What is MongoDB Replication?
• MongoDB replication is the process of creating a copy of the same data set in
more than one MongoDB server.
• This can be achieved by using a Replica Set. A replica set is a group of mongod
processes (MongoDB instances) that maintain the same data set.
• Replication enables database administrators to provide:
1. Data redundancy
2. High availability of data
• Replication can also be used as a part of load balancing: write operations
always go to the primary node, while read operations can be distributed across
the instances depending on the use case.
How MongoDB replication works

• MongoDB handles replication through a Replica Set, which consists of multiple MongoDB nodes that are grouped together as a unit.
• A Replica Set should have a minimum of three MongoDB nodes (the recommended minimum for production):
1. One of the nodes will be considered the primary node that receives
all the write operations.
2. The others are considered secondary nodes. These secondary nodes
will replicate the data from the primary node.
• While the primary node is the only instance that accepts write operations, any other node within a replica set can accept read operations. By default, reads are also sent to the primary.
• Reading from secondary nodes can be configured through the read preference of a supported MongoDB client.

• In an event where the primary node is unavailable or inoperable, a secondary node will take the primary node’s role to provide continuous availability of data.
• In such a case, the primary node selection is made through a process called Replica Set Elections, where the most suitable secondary node is selected as the new primary node.
The Heartbeat process

• Heartbeat is the process that identifies the current status of a MongoDB node in a replica set.
• The replica set nodes send pings (heartbeats) to each other every two seconds.
• If any node doesn’t ping back within 10 seconds, the other nodes in the
replica set mark it as inaccessible.
• This functionality is vital for the automatic failover process, in which the
primary node is unreachable and the secondary nodes do not receive a
heartbeat from it within the allocated time frame.
• The replica set then automatically elects a secondary node to act as the
new primary.
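The timeout rule above can be sketched as a small check. The 10-second threshold is the one stated in the slides; the node names, the timestamps and the function name are hypothetical, and this is an illustration of the rule rather than MongoDB's actual implementation.

```javascript
// Threshold from the slides: a node that has not answered for more than
// 10 seconds is marked as inaccessible (heartbeats are sent every 2 seconds).
const HEARTBEAT_TIMEOUT_MS = 10000;

// lastHeartbeatAt maps a node name to the time (ms) of its last answered ping.
function findUnreachable(lastHeartbeatAt, nowMs) {
  return Object.entries(lastHeartbeatAt)
    .filter(([, seenAt]) => nowMs - seenAt > HEARTBEAT_TIMEOUT_MS)
    .map(([node]) => node);
}

// Hypothetical state at t = 60 s: node-b last answered 11 seconds ago.
const lastSeen = { "node-a": 58000, "node-b": 49000, "node-c": 59500 };
console.log(findUnreachable(lastSeen, 60000));
```

If the node reported as unreachable is the primary, this is the condition that leads to a failover.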
Replica set elections

• The elections in replica sets are used to determine which MongoDB node should become the primary node. These elections can occur in the following instances:
• Loss of connectivity to the primary node (detected by heartbeats)
• Initializing a replica set
• Adding a new node to an existing replica set
• Maintenance of a Replica set using stepDown or rs.reconfig methods
• In the process of an election, first, one of the nodes raises a flag requesting
an election, and the other nodes vote on whether to elect that node as the primary
node. The candidate becomes primary once it receives votes from a majority of the voting nodes.
• The time for an election process to complete should typically not exceed 12 seconds, assuming
that replica configuration settings are at their default values.
• A major factor that may affect the time for an election to complete is the
network latency, and it can cause delays in getting your replica set back to
operation with the new primary node.
• The replica set cannot process any write operations until the election is
completed.
• However, read operations can be served if read queries are configured to be
processed on secondary nodes.
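In MongoDB, a candidate wins an election once it holds votes from a majority of the voting members. That arithmetic can be sketched as follows; the function names are made up for illustration, and the sketch ignores everything else that influences an election (priorities, how up to date each member is, and so on).

```javascript
// Smallest number of votes that is a strict majority of the voting members.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A candidate wins once its votes (including its own) reach the majority.
function winsElection(votesReceived, votingMembers) {
  return votesReceived >= votesNeeded(votingMembers);
}

console.log(votesNeeded(3));     // majority in a three-node replica set
console.log(winsElection(2, 3)); // candidate plus one other node
console.log(winsElection(2, 5)); // not enough in a five-node replica set
```

This also shows why a three-node replica set can still elect a primary after losing one node: the two remaining nodes form a majority.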
MongoDB Replica Set vs
MongoDB Cluster
• A replica set creates multiple copies of the same data set across the replica
set nodes. The basic objective of a replica set is to:
1. Increase data availability
2. Provide a built-in backup solution
• Clusters work differently. The MongoDB cluster distributes the data across
multiple nodes using a shard key.
• This process will break down the data into multiple pieces called shards and
then place each shard on a separate node.
• The main purpose of a cluster is to support extremely large data sets and
high throughput operations by horizontally scaling the workload.
• The major difference between a replica set and a cluster is:
1. A replica set copies the data set as a whole.
2. A cluster distributes the workload and stores pieces of data (shards)
across multiple servers.
• MongoDB allows users to combine these two functionalities by creating a
sharded cluster, where each shard is replicated to a secondary server in
order to provide high data availability and redundancy.
