ADB - Lab Sheet 6
Ensuring data availability and reliability is crucial in database management. MongoDB,
a flexible NoSQL database, addresses this need with its strong replication system.
Replication preserves data integrity, provides high availability, and supports
effective scaling. By spreading and synchronizing data across multiple nodes, it
protects against hardware failures and data loss, improves read scalability, and
facilitates geographically distributed applications.
This article explores the core ideas of MongoDB replication, its advantages, and the
underlying processes that make it a vital resource for creating dependable and
adaptable database systems. It also covers methods for setting up MongoDB replication.
What is MongoDB
MongoDB is a widely used open-source, NoSQL database management system that falls
under the category of document-oriented databases. Unlike traditional relational
databases, MongoDB uses a flexible, schema-less data model that is designed to handle
large amounts of unstructured or semi-structured data. It is developed by MongoDB,
Inc. and has gained widespread recognition in various industries and applications.
A replica set consists of two or more MongoDB instances: one primary and one or more
secondaries. The primary node is the source of truth for data modifications and handles
all write operations from clients. Secondary nodes replicate data from the primary node
so that they hold a consistent copy of its data.
Write operations on the primary: When a client sends a write operation (such
as an insert, update, or delete) to the primary node, the primary processes
the operation and records it in its oplog (operations log).
Oplog replication to secondaries: Secondary nodes asynchronously copy new entries
from the primary's oplog, which contains a chronological record of all the write
operations performed. Each secondary applies the same operations to its own data
set in the same order they were executed on the primary node.
Achieving data consistency: Through this oplog-based replication, secondary
nodes catch up with the primary node's data over time, so the data on secondary
nodes becomes consistent with the primary's.
Read operations: The primary node handles all write operations, but both primary
and secondary nodes can serve read operations. Clients can choose to read from
secondary nodes, which distributes the read load and reduces the primary node's
workload. Note, however, that a secondary node might return slightly outdated data
because of replication lag.
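The four points above can be illustrated with a toy model. The sketch below is plain JavaScript, not MongoDB's actual implementation: the `Node` class, its `apply` and `syncFrom` methods, and the operation format are all hypothetical names invented for this example. It shows a secondary replaying the primary's oplog in order, and how replication lag appears until the secondary has synced.

```javascript
// Toy model of oplog-based replication (not MongoDB's real implementation).
class Node {
  constructor() {
    this.data = new Map(); // _id -> document
    this.oplog = [];       // ordered list of operations applied on this node
  }

  // Apply one operation to the local data set and record it in the oplog.
  apply(op) {
    if (op.type === "insert" || op.type === "update") {
      this.data.set(op.id, { ...(this.data.get(op.id) || {}), ...op.doc });
    } else if (op.type === "delete") {
      this.data.delete(op.id);
    }
    this.oplog.push(op);
  }

  // Copy and replay every primary oplog entry this node has not applied yet.
  syncFrom(primary) {
    for (const op of primary.oplog.slice(this.oplog.length)) {
      this.apply(op);
    }
  }
}

const primary = new Node();
const secondary = new Node();

// Writes go to the primary only.
primary.apply({ type: "insert", id: 1, doc: { name: "Ada" } });
primary.apply({ type: "update", id: 1, doc: { city: "London" } });

// Before syncing, the secondary lags behind the primary.
const lagBefore = primary.oplog.length - secondary.oplog.length;

secondary.syncFrom(primary);
const lagAfter = primary.oplog.length - secondary.oplog.length;

console.log(lagBefore, lagAfter, secondary.data.get(1));
```

A read served by `secondary` before `syncFrom` runs would miss both writes, which is the stale-read risk described under read operations.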
Install MongoDB on each server or virtual machine that will be a member of the replica
set. You can follow the installation instructions provided in the MongoDB
documentation.
Ensure that all the servers can communicate with each other over the network. Add the
hostnames and IP addresses of all replica set members to the /etc/hosts file on each
server, or to your DNS configuration.
On each server, create a MongoDB configuration file named mongod.conf, written in YAML
format, similar to the snippet below.
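storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true   # option removed in MongoDB 6.1+, where journaling is always on
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  bindIp:           # set to the address this server should listen on
  port:             # set to the port this server should listen on
replication:
  replSetName: myReplSet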
Start MongoDB on each server with the mongod.conf file created above, using the
following bash command:
mongod -f /path/to/mongod.conf
Next, use the MongoDB shell to connect to any one of the MongoDB instances that will
form the replica set, and initiate the replica set with rs.initiate().
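Once connected, a replica set is typically initiated with `rs.initiate()`, which accepts a configuration document naming the set and its members. The sketch below builds such a document in plain JavaScript so it can be inspected outside the shell. The helper `buildReplSetConfig` and the `mongoN.example.net` host names are hypothetical, chosen only for illustration.

```javascript
// Build the configuration document that rs.initiate() accepts.
// The host names used below are hypothetical placeholders.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match replication.replSetName in mongod.conf
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const config = buildReplSetConfig("myReplSet", [
  "mongo1.example.net:27017",
  "mongo2.example.net:27017",
  "mongo3.example.net:27017",
]);
console.log(JSON.stringify(config, null, 2));

// In the MongoDB shell, connected to one member:
//   rs.initiate(config)
```

Listing every member up front is an alternative to initiating with a single member and adding the others one at a time with `rs.add()`.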
After initiating the replica set, connect to the primary node and add each additional
member with the following MongoDB shell command:
rs.add("<host>:<port>")
If you want to add an arbiter node for elections, connect to the primary node's
MongoDB shell and execute the following JavaScript command:
rs.addArb("<arbiter_host>:<arbiter_port>")
To check the status of the replica set, connect to any of the MongoDB instances and run
the following JavaScript command:
rs.status()
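The document returned by `rs.status()` includes a `members` array, and each entry carries fields such as `name`, `stateStr`, and `health`. As a rough sketch of how that output might be read, the snippet below summarizes a hand-written sample. The `sampleStatus` object is a hypothetical excerpt, not real output, and `summarize` is an illustrative helper rather than part of MongoDB.

```javascript
// Summarise the members array of an rs.status() result.
// sampleStatus is a hand-written, hypothetical excerpt; real output has more fields.
const sampleStatus = {
  set: "myReplSet",
  members: [
    { name: "mongo1.example.net:27017", stateStr: "PRIMARY", health: 1 },
    { name: "mongo2.example.net:27017", stateStr: "SECONDARY", health: 1 },
    { name: "mongo3.example.net:27017", stateStr: "(not reachable/healthy)", health: 0 },
  ],
};

function summarize(status) {
  // Collect member names that are currently in the given state.
  const byState = (state) =>
    status.members.filter((m) => m.stateStr === state).map((m) => m.name);
  return {
    primary: byState("PRIMARY")[0] || null,
    secondaries: byState("SECONDARY"),
    unhealthy: status.members.filter((m) => m.health !== 1).map((m) => m.name),
  };
}

const summary = summarize(sampleStatus);
console.log(summary);
```

In the MongoDB shell the same helper could be called as `summarize(rs.status())`.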
To test failover, you can simulate a primary node failure by stopping the primary's
MongoDB instance. The replica set should automatically elect a new primary node.
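An election succeeds only when a strict majority of voting members can still reach each other, which is also why an arbiter can be useful. The arithmetic can be sketched as follows. The function names `majority` and `faultTolerance` are chosen for this example and are not MongoDB APIs.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `${n} voting members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

With two voting members, losing either one leaves no majority, so no new primary can be elected. Adding an arbiter as a third voter lets the set survive the loss of one member without storing a third copy of the data.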
Note that the steps and code snippets above are generalized, and the actual steps might
require adjustments based on the specific environment and use case. A near-real-time,
low-code tool such as Fivetran can help here: once you connect it to MongoDB, Fivetran
handles the replication tasks for you.
While MongoDB replication using the replica set method offers numerous benefits,
there are situations where its complexity, resource requirements, or alignment with
specific use cases make it less feasible. Organizations need to carefully assess their
requirements, infrastructure, and operational capabilities to determine whether replica
sets are the appropriate solution or if alternative strategies should be considered. In
such circumstances, a low-code data replication tool like Fivetran is worth considering.
Step 1: Find the Host Identifier
To configure Fivetran, you need to identify the MongoDB replica set's host identifier,
which can take several formats.
Connect to your replica set or primary node using the MongoDB shell.
Execute the `db.adminCommand({ replSetGetStatus : 1 }).members` command.
Copy the host identifier and alternative host identifiers if needed.
Step 2: Allow Database Access
Create a database user for Fivetran using MongoDB Atlas or the MongoDB shell.
Connect to your replica set or primary node using the MongoDB shell.
Execute the necessary command to create a user for Fivetran, specifying roles.
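In the MongoDB shell, users are created with `db.createUser()`, which takes a document with `user`, `pwd`, and `roles` fields. The sketch below only builds and validates such a document. The user name, the placeholder password, and the `read` role on a placeholder database are illustrative assumptions, not Fivetran's documented requirements, so the roles should be taken from Fivetran's setup guide.

```javascript
// Build the document passed to db.createUser().
// The user name, password and role below are illustrative assumptions,
// not Fivetran's documented requirements.
function buildUserSpec(user, pwd, roles) {
  if (!user || !pwd) throw new Error("user and pwd are required");
  if (!Array.isArray(roles) || roles.length === 0) {
    throw new Error("at least one role is required");
  }
  return { user, pwd, roles };
}

const fivetranUser = buildUserSpec("fivetran_reader", "<password>", [
  { role: "read", db: "<database_name>" },
]);
console.log(JSON.stringify(fivetranUser));

// In the MongoDB shell, connected to the primary:
//   db.getSiblingDB("admin").createUser(fivetranUser)
```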
Step 3: Choose Connection Method
Decide how you want to connect Fivetran to your MongoDB cluster: directly, using an
SSH tunnel, or through a private link.
Connect Directly (TLS required): Configure firewall and access control to allow
incoming connections from Fivetran's IPs.
Connect using PrivateLink (Optional): If you have a Business Critical plan, you can
use AWS PrivateLink, Azure Private Link, or Google Cloud Private Service Connect to
connect Fivetran to your MongoDB Atlas database.
Adjust the oplog size so that it retains enough changes: at least 24 hours' worth, and
preferably seven days' worth. You can adjust the oplog size using either MongoDB Atlas
or the MongoDB shell.
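A rough way to pick a target size is to multiply the rate at which the oplog grows by the retention window you want. The sketch below does that arithmetic. The figure of 50 MB per hour is a hypothetical value for illustration, and `requiredOplogMB` is a helper invented for this example, so you would substitute a rate measured on your own deployment.

```javascript
// Estimate the oplog size needed to retain a given window of changes.
// writeRateMBPerHour is a figure you measure yourself; it is not a value
// MongoDB reports under that name.
function requiredOplogMB(writeRateMBPerHour, retentionHours) {
  return Math.ceil(writeRateMBPerHour * retentionHours);
}

const rate = 50; // hypothetical: 50 MB of oplog entries per hour
const oneDay = requiredOplogMB(rate, 24);
const sevenDays = requiredOplogMB(rate, 24 * 7);
console.log(`24 hours: ${oneDay} MB, 7 days: ${sevenDays} MB`);

// In the MongoDB shell, the resize is run on each member, with size in megabytes:
//   db.adminCommand({ replSetResizeOplog: 1, size: sevenDays })
```

In the MongoDB shell, `rs.printReplicationInfo()` reports the current oplog size and the time span it covers, which can be used to check the estimate.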
Choose between packed mode and unpacked mode based on your needs.