ADB - Lab Sheet 6

Uploaded by

yolaxa1297

Experiment No. 6: Explore Replication Commands

Level 1: Try all replication commands on ‘Student’ Database.

Ensuring data availability and reliability is crucial in modern database management.
MongoDB, a flexible NoSQL database, addresses this need through its built-in replication
system. Replication preserves data integrity, provides high availability, and supports
effective scaling. By spreading and synchronizing data across multiple nodes, it protects
against hardware failures and data loss, improves read scalability, and facilitates
geographically distributed applications.

This article explores the core ideas of MongoDB replication, its advantages, and the
underlying processes that make it a vital resource for building dependable and adaptable
database systems. It also walks through two methods for setting up MongoDB replication.

What is MongoDB
MongoDB is a widely used open-source, NoSQL database management system that falls
under the category of document-oriented databases. Unlike traditional relational
databases, MongoDB uses a flexible, schema-less data model that is designed to handle
large amounts of unstructured or semi-structured data. It is developed by MongoDB,
Inc. and has gained widespread recognition in various industries and applications.

Key features of MongoDB

• Document-Oriented: MongoDB stores data in documents, which are JSON-like
structures composed of key-value pairs. Documents can have different fields and
structures, providing flexibility to accommodate different data formats within a
single collection (the rough equivalent of a table).
• Collections and Documents: In MongoDB, documents are organized into
collections, which can be considered equivalent to tables in relational
databases. Each document within a collection can have a different schema, which
makes it easier to handle dynamic data.
• No Fixed Schema: Unlike relational databases, MongoDB does not require a
fixed schema for its collections. This means fields can be added, modified, or
removed in one document without affecting the other documents within the same
collection.
• Scalability: MongoDB supports horizontal scaling by allowing data to be
distributed across multiple servers or nodes. This is crucial for handling large
amounts of data and high workloads.
• Replication: MongoDB offers built-in replication through Replica Sets, enabling
the creation of multiple copies of data for high availability and fault tolerance.
• Sharding: Sharding is a method of partitioning data across multiple servers
(shards). MongoDB's sharding feature allows databases to be distributed and
balanced across different servers, which can significantly improve query
performance and handle large data volumes.
• Query Language: MongoDB provides a rich query language for retrieving data
from collections. Queries can be written to match the hierarchical nature of
documents and support various filtering, sorting, and aggregation operations.
• Indexes: Indexes in MongoDB improve query performance by allowing efficient
data retrieval. Single-field, compound, geospatial, and text indexes are a few of
the index types that MongoDB supports.
• Aggregation Framework: The Aggregation Framework offers powerful data
transformation and aggregation capabilities, allowing the user to perform
complex queries and operations on data within MongoDB.
• Ad Hoc Queries: MongoDB supports ad hoc queries, which is useful because you
can query data without the need to predefine relationships or join tables.
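
To make the flexible-schema and ad hoc query points concrete, here is a minimal sketch in plain JavaScript that runs without a database. The `students` array stands in for a collection, the sample values are made up, and `matches` is a hypothetical helper that mimics only simple equality filters, not MongoDB's full query language.

```javascript
// Three documents in the same "collection", each with a different shape.
// All names and values are illustrative assumptions.
const students = [
  { _id: 1, name: "Asha", dept: "CSE", marks: { adb: 82 } },
  { _id: 2, name: "Ravi", dept: "ECE", hobbies: ["chess"] },
  { _id: 3, name: "Meena", dept: "CSE" },
];

// Hypothetical helper: true when every key in `filter` has the same
// value in `doc` (a tiny subset of MongoDB's query semantics).
function matches(doc, filter) {
  return Object.keys(filter).every((key) => doc[key] === filter[key]);
}

// Ad hoc query: no join or predeclared relationship is needed.
const cseNames = students
  .filter((doc) => matches(doc, { dept: "CSE" }))
  .map((doc) => doc.name);

console.log(cseNames);
```

Against a real collection, the comparable query in the MongoDB shell would be `db.students.find({ dept: "CSE" })`, assuming a collection named `students`.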

MongoDB is widely used in various scenarios, including web applications, mobile
applications, real-time analytics, content management systems, and more. It suits
projects that need to handle dynamically changing data structures, high scalability,
and efficient data retrieval. However, it's important to choose the right database
system based on the specific requirements of your application.

What is MongoDB replication and how it works


MongoDB replication is a data synchronization process that allows multiple copies of
MongoDB data to be maintained across different servers or nodes. This feature is
designed to improve data availability, fault tolerance, and scalability in distributed
database environments. MongoDB replication is implemented using a structure called a
replica set.

A replica set consists of two or more MongoDB instances: one primary and one or more
secondaries. The primary node is the source of truth for data modifications and handles
all write operations from clients. Secondary nodes replicate data from the primary node
so that they hold a consistent copy of its data.

The replication process works as follows:

• Write operations on the primary: When a client sends a write operation (such
as an insert, update, or delete) to the primary node, the primary node processes
the operation and records it in its oplog (operations log).
• Oplog replication to secondaries: The oplog contains a chronological record of
all the write operations performed. Secondary nodes asynchronously copy new
oplog entries from the primary and apply the same operations to their own data
sets in the same order they were executed on the primary node.
• Achieving data consistency: Through this oplog-based replication, secondary
nodes catch up with the primary's data over time, so their data remains
consistent with the primary's.
• Read operations: The primary handles all write operations, and reads also go to
the primary by default. Clients can choose to read from secondary nodes by
setting a read preference, which distributes the read load and reduces the
primary node's workload. However, note that a secondary node might return
slightly outdated data due to replication lag.
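
The flow above can be sketched as a toy model in plain JavaScript. This is a simplification under stated assumptions, not how the server is implemented: the `Primary` and `Secondary` classes and the `applyEntry` helper are hypothetical, and the "oplog" is an ordinary array rather than MongoDB's capped collection.

```javascript
// Apply one oplog entry to a node's data (insert and update both set the key).
function applyEntry(data, entry) {
  if (entry.op === "delete") data.delete(entry.key);
  else data.set(entry.key, entry.value);
}

// Toy primary: applies each write locally, then records it in its oplog.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered list of { op, key, value }
  }
  write(op, key, value) {
    const entry = { op, key, value };
    applyEntry(this.data, entry);
    this.oplog.push(entry);
  }
}

// Toy secondary: remembers how far into the primary's oplog it has applied.
class Secondary {
  constructor() {
    this.data = new Map();
    this.applied = 0; // index of the next oplog entry to apply
  }
  // Copy and apply entries not seen yet, in their original order.
  syncFrom(primary) {
    while (this.applied < primary.oplog.length) {
      applyEntry(this.data, primary.oplog[this.applied]);
      this.applied += 1;
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("insert", "s1", { name: "Asha" });
secondary.syncFrom(primary);
primary.write("update", "s1", { name: "Asha K" });

// Until the next sync, the secondary serves stale data (replication lag).
const staleName = secondary.data.get("s1").name;
secondary.syncFrom(primary);
const freshName = secondary.data.get("s1").name;
console.log(staleName, freshName);
```

The gap between `staleName` and `freshName` is the replication lag that the read-operations point warns about.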

MongoDB replication provides several benefits


• High Availability: In the event of primary node failure, a secondary node can be
automatically promoted to the primary role, ensuring that the database remains
operational and minimizing downtime.
• Fault Tolerance: Multiple replicas of data reduce the risk of data loss due to
hardware failures or other issues affecting a single node.
• Read Scalability: Secondary nodes can handle read queries, distributing the
read workload and improving overall performance.
• Data Redundancy: Having multiple replicas of data provides a level of data
redundancy, helping protect against data loss.

To set up and manage replication, administrators can define a Replica Set by specifying
the nodes and their roles in a configuration. MongoDB's replication mechanism ensures
data consistency, handles failover, and provides the tools necessary to monitor the
status of the Replica Set.

In summary, MongoDB replication is a critical feature that enhances data availability
and reliability in distributed environments. It enables the maintenance of synchronized
data copies across multiple nodes, allowing for fault tolerance and improved
performance in MongoDB database systems.

Methods to set up MongoDB replication


Setting up MongoDB replication is a crucial step in creating a fault-tolerant and highly
available database environment. Replication keeps multiple copies of your data on
different servers, providing data redundancy and fault tolerance. The two methods below
show how to set it up.

Method 1: MongoDB replication using a replica set

Setting up MongoDB replication using a replica set involves several steps. Here are
detailed instructions with code snippets for each step:

Step 1: Prepare MongoDB Instances

Install MongoDB on each of the servers or virtual machines that will form the replica
set. You can follow the installation instructions provided in the MongoDB
documentation.

Step 2: Configure Network Settings

Ensure that all the servers can communicate with each other over the network. Make the
hostnames of all replica set members resolvable on every server, either by adding their
hostnames and IP addresses to the /etc/hosts file or through DNS configuration.

Step 3: Start MongoDB Instances

On each server, create a MongoDB configuration file named mongod.conf, written in YAML
and similar to the snippet below. Use the same replSetName on every member.

storage:
  dbPath: /var/lib/mongodb
  # Drop the journal block on MongoDB 6.1+, where journaling is always on.
  journal:
    enabled: true
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  bindIp: <ip_address>
  port: <port>
replication:
  replSetName: myReplSet

Start MongoDB on each server with the configuration file created above by running the
following command:

mongod -f /path/to/mongod.conf

Step 4: Initialize the Replica Set

Connect to any one of the MongoDB instances (a future member of the replica set) using
the MongoDB shell command below. On MongoDB 6.0 and later the legacy mongo shell is no
longer shipped, so use mongosh instead.

mongo --host <hostname>:<port>

Replica Set needs to be initialized now by executing the following command:

rs.initiate({_id: "myReplSet", members: [{_id: 0, host: "<primary_host>:<primary_port>"}]})
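
As an alternative to adding members one at a time, the configuration document passed to `rs.initiate()` can list every member up front. The sketch below only builds that document in plain JavaScript; `buildReplSetConfig` is a hypothetical helper and the hostnames are placeholders, not values from this lab.

```javascript
// Hypothetical helper: build the document that rs.initiate() expects.
// Each member gets a unique numeric _id and a "host:port" string.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match replSetName in mongod.conf
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

// Placeholder hostnames; replace them with your own servers.
const config = buildReplSetConfig("myReplSet", [
  "mongo1.example.net:27017",
  "mongo2.example.net:27017",
  "mongo3.example.net:27017",
]);

console.log(JSON.stringify(config, null, 2));
// In the MongoDB shell you would then run: rs.initiate(config)
```

Listing three members at initiation gives the set an odd number of voters from the start, which matters for elections.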

Step 5: Add Secondary Members

After initiating the replica set, connect to the primary node from a terminal with the
following command:

mongo --host <primary_host>:<primary_port>

Add a secondary member from the MongoDB shell with the following command:

rs.add("<secondary_host>:<secondary_port>")

Repeat this step for each secondary member.

Step 6: Optional - Add Arbiter Node

If you want to add an arbiter node, which votes in elections but holds no data, connect
to the primary node's MongoDB shell and execute the following command:

rs.addArb("<arbiter_host>:<arbiter_port>")

Step 7: Check Replica Set Status

To check the status of the replica set, connect to any of the MongoDB instances and run
the following command in the MongoDB shell:

rs.status()
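
The output of `rs.status()` is a large document, so it can help to condense it. The sketch below is one possible approach: `summarizeStatus` is a hypothetical helper, and `sampleStatus` is a hand-written stand-in that uses only a few of the fields found in real output (`name`, `stateStr`, `health`, `optimeDate`).

```javascript
// Hypothetical helper: condense an rs.status()-shaped document into
// one row per member, with lag measured against the primary.
function summarizeStatus(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return status.members.map((m) => ({
    name: m.name,
    state: m.stateStr,
    healthy: m.health === 1,
    // Seconds behind the primary's last applied operation, if a primary exists.
    lagSeconds: primary ? (primary.optimeDate - m.optimeDate) / 1000 : null,
  }));
}

// Hand-written sample, not real server output.
const sampleStatus = {
  set: "myReplSet",
  members: [
    { name: "mongo1.example.net:27017", stateStr: "PRIMARY", health: 1,
      optimeDate: new Date("2024-01-01T10:00:10Z") },
    { name: "mongo2.example.net:27017", stateStr: "SECONDARY", health: 1,
      optimeDate: new Date("2024-01-01T10:00:08Z") },
  ],
};

const summary = summarizeStatus(sampleStatus);
console.log(summary);
```

In the MongoDB shell, the same helper could be called as `summarizeStatus(rs.status())` after pasting the function definition.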

Step 8: Test Connection Failure

To test failover, simulate a primary node failure by stopping the primary's MongoDB
instance. The replica set should automatically elect a new primary node, provided a
majority of its voting members is still reachable.

Please note that the provided steps and code snippets are generalized, and the actual
steps might require adjustments based on the specific environment and use case. This is
where a near real-time, low-code tool like Fivetran can be leveraged: you connect
MongoDB to it, and Fivetran then handles the replication tasks.
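
Whether the failover test in Step 8 succeeds depends on how many voting members remain reachable, because electing a primary requires a strict majority of votes. The arithmetic can be sketched as follows; the function names are hypothetical and the model ignores member priorities and non-voting members.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be down while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(votingMembers, reachable) {
  return reachable >= majority(votingMembers);
}

// Print set size, majority, and tolerated failures for common sizes.
for (const n of [2, 3, 5]) {
  console.log(n, majority(n), faultTolerance(n));
}
```

A two-member set tolerates no failures, which is why a third data-bearing member or the arbiter from Step 6 is typically added before testing failover.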

While MongoDB replication using the replica set method offers numerous benefits,
there are situations where its complexity, resource requirements, or alignment with
specific use cases make it less feasible. Organizations need to carefully assess their
requirements, infrastructure, and operational capabilities to determine whether replica
sets are the appropriate solution or if alternative strategies should be considered. In
such circumstances one can always consider a low code data replication tool like
Fivetran.

Method 2: MongoDB replication using Fivetran


Step 1: Find Host Identifiers

To configure Fivetran, you need to identify the MongoDB replica set's host identifier.
The host identifier can be in various formats:

• SRV host identifier: `mongodb+srv://example.mongodb.net`
• Connection string: `mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017`
• Domain and port: `your.server.com:27017`
• IP address and port: `1.2.3.4:27017`

You can also optionally use Analytics nodes or specify read preferences based on your
needs. You can find your host identifiers either using MongoDB Atlas or the MongoDB
shell.
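
To show how the four formats relate, here is a small parsing sketch. `parseHostIdentifier` is a hypothetical helper written for this illustration; it assumes identifiers without credentials, database paths, query options, or IPv6 addresses, and it is not how Fivetran or any MongoDB driver parses them.

```javascript
// Hypothetical helper: split a host identifier into host/port pairs.
function parseHostIdentifier(identifier) {
  const srvPrefix = "mongodb+srv://";
  const stdPrefix = "mongodb://";
  if (identifier.startsWith(srvPrefix)) {
    // With SRV, hosts and ports come from DNS records, so no port is given.
    const host = identifier.slice(srvPrefix.length);
    return { srv: true, hosts: [{ host, port: null }] };
  }
  // Strip the scheme if present; bare "host:port" forms have none.
  const list = identifier.startsWith(stdPrefix)
    ? identifier.slice(stdPrefix.length)
    : identifier;
  const hosts = list.split(",").map((entry) => {
    const [host, port] = entry.split(":");
    // 27017 is MongoDB's default port.
    return { host, port: port ? Number(port) : 27017 };
  });
  return { srv: false, hosts };
}

const parsed = parseHostIdentifier(
  "mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017"
);
console.log(parsed.hosts.map((h) => h.host));
```

The connection-string form names every replica set member explicitly, while the SRV form leaves member discovery to DNS.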

Using MongoDB Atlas:

• Log in to the MongoDB Atlas dashboard.
• In the Cluster Overview tab, click Connect.
• Select Connect your application.
• Copy the SRV host identifier.

Using MongoDB shell:

• Connect to your replica set or primary node using the MongoDB shell.
• Execute the `db.adminCommand({ replSetGetStatus : 1 }).members` command.
• Copy the host identifier and alternative host identifiers if needed.

Step 2: Allow Database Access

Create a database user for Fivetran using MongoDB Atlas or the MongoDB shell.

Using MongoDB Atlas:

• Log in to the MongoDB Atlas dashboard.
• Go to Security > Database Access.
• Create a new database user with specific privileges including `readAnyDatabase`
and `read` on the `local` database.

Using MongoDB shell:

• Connect to your replica set or primary node using the MongoDB shell.
• Run `db.createUser()` to create a user for Fivetran, specifying the same roles.
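
The user document for the shell route can be sketched as below. `buildFivetranUser` is a hypothetical helper, and the username and password are placeholders; the two roles are the ones named in the Atlas steps above.

```javascript
// Hypothetical helper: build the document passed to db.createUser().
function buildFivetranUser(username, password) {
  return {
    user: username,
    pwd: password,
    roles: [
      // readAnyDatabase is granted on admin and does not cover local.
      { role: "readAnyDatabase", db: "admin" },
      // The oplog lives in the local database, hence the separate role.
      { role: "read", db: "local" },
    ],
  };
}

// Placeholder credentials; use a strong generated password in practice.
const userDoc = buildFivetranUser("fivetran", "<password>");
console.log(JSON.stringify(userDoc, null, 2));
// In the MongoDB shell: switch to the admin database, then db.createUser(userDoc)
```

Granting read-only roles keeps the Fivetran user from modifying any data in the cluster.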

Step 3: Choose Connection Method

Decide how you want to connect Fivetran to your MongoDB cluster: directly, using an
SSH tunnel, or through a private link.

Connect Directly (TLS required): Configure firewall and access control to allow
incoming connections from Fivetran's IPs.

Using MongoDB Atlas:

• Make note of MongoDB cluster's cloud service provider and region.
• Go to Security > Network Access.
• Add Fivetran's IP to the access list.

Using MongoDB shell:

• Follow MongoDB's Security Considerations documentation to safelist Fivetran's IPs.

Connect using SSH (TLS optional): Configure firewall to allow connections to your
MongoDB port from your SSH tunnel server's IP.

Connect using PrivateLink (Optional): If you have a Business Critical plan, you can
use AWS PrivateLink, Azure Private Link, or Google Cloud Private Service Connect to
connect Fivetran to your MongoDB Atlas database.

Step 4: Set Oplog Size (Optional)

Adjust oplog size to retain sufficient changes, at least 24 hours' worth, preferably seven
days' worth. Your oplog size can be adjusted using either MongoDB Atlas or the
MongoDB shell:

• MongoDB Atlas: Follow MongoDB Atlas' Set Oplog Size tutorial.
• MongoDB shell: Follow MongoDB's Change the Size of the Oplog tutorial.
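
A rough way to pick a size is to multiply the oplog growth rate by the retention window. The sketch below does that arithmetic and builds the `replSetResizeOplog` command document, whose `size` field is in megabytes. The helper names and the 50 MB per hour write rate are assumptions for illustration; measure your own rate, for example from the time window that `rs.printReplicationInfo()` reports.

```javascript
// Rough estimate: oplog growth per hour multiplied by hours to retain.
function requiredOplogMB(mbPerHour, retentionHours) {
  return Math.ceil(mbPerHour * retentionHours);
}

// Build the replSetResizeOplog command document (size is in megabytes).
function resizeOplogCommand(sizeMB) {
  return { replSetResizeOplog: 1, size: sizeMB };
}

// Assumed write volume: 50 MB of oplog per hour (an example figure).
const oneDay = requiredOplogMB(50, 24);
const sevenDays = requiredOplogMB(50, 24 * 7);
console.log(oneDay, sevenDays);
console.log(resizeOplogCommand(sevenDays));
// In the MongoDB shell: db.adminCommand(resizeOplogCommand(sevenDays))
```

The resize command applies to the member it is run on, so it would need to be repeated on each member of the replica set.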

Step 5: Choose Pack Mode

Choose between packed mode and unpacked mode based on your needs.

Step 6: Finish Fivetran Configuration

• Enter Destination schema prefix.
• Enter Host and ports.
• Provide Fivetran-specific User and Password.
• Choose Connection Method.
• If you enabled SSL/TLS on your database, configure it accordingly.
• Click Save & Test to validate the connection.

You only need to authenticate the MongoDB instance with Fivetran once, which takes a
few minutes. Upon successful setup, you can start syncing data using Fivetran and
replicate it to the destinations you choose, whether an AWS service, a cloud database,
or a data warehouse. For more details, refer to Fivetran's MongoDB setup guide.

Advantages of using Fivetran for MongoDB replication


The following are some major benefits of utilizing Fivetran for MongoDB replication:

• Seamless Data Integration: Fivetran offers pre-built connections for a variety
of data sources like MongoDB and for various data warehouses like Amazon
Redshift, removing the need for manual scripting or intricate settings. This
speeds up and simplifies the process of integrating data.
• Automated Workflows: By automating the process of data loading, automated
workflows assist in consistently and frequently synchronizing data. By
minimizing manual involvement and maintaining data integrity, it manages
incremental updates, data format changes, and schema modifications.
• Data Transformation Capabilities: Before putting the data into the data
warehouse, users can perform customized data transformations thanks to the
system's robust data transformation capabilities. This makes it possible to clean,
normalize, and enhance data, ensuring that it is prepared for analysis.
• Monitoring and Alerting: It provides monitoring and alerting options so that
you can keep track of how the data integration process is going. It offers visibility
into data loading metrics, error correction, and alerts for any emerging issues.
• Flexibility of Data Sources: Fivetran supports a wide range of data sources. It
allows businesses to combine data from many sources into data warehouses like
BigQuery, Redshift etc. by connecting to different databases, cloud services, and
apps.
• Saving Time and Resources: By automating the data loading process and
eliminating the need for manual intervention, Fivetran saves time and resources.
Teams can now focus on data analysis and developing conclusions from the
loaded data.

Level 2: Implement replication commands on ‘Employee’ Database.


(Students will implement and write in the lab record)
