Replication

The document discusses replication in MySQL and PostgreSQL, highlighting its importance for data duplication, high availability, and fault tolerance. It details various replication methods, including streaming, replicated block device, and WAL, along with their configurations and benefits. Additionally, it addresses security considerations and challenges associated with replication setups.

Uploaded by

Khalil Hafiz

REPLICATION IN MYSQL & POSTGRESQL

11/29/2023

Group members:

SAMEN RYAN
NGOUONJI MANZAM KAALIL
TCHABAT VALIER
RYAN ERIC NGU
NKOUEKAM GESKY
TCHUMTCHUA VANELLE
OLOMO ABBE
BOMG SHALOM
NDIP LUCY-DIANE

Table of Contents
Group members
I. Replication
A. Postgres Replication
B. MySQL Replication
What is automatic failover?
Benefits of Using Replication
How MySQL Replication Works
How PostgreSQL Replication Works
Method 1: Streaming
Method 2: Replicated Block Device
Method 3: WAL
Types of Replication
Physical replication
Logical replication
Replication Modes
Synchronous mode
Asynchronous mode
To conclude

I. Replication:
Replication is a fundamental feature in database management
systems that allows for the duplication of data from a main server to a secondary server,
providing high availability and scalability. In the context of MySQL and PostgreSQL,
replication plays a crucial role in ensuring the reliability and accessibility of data for
organizations. However, alongside the benefits of replication, it is essential to address security
considerations to safeguard the integrity and confidentiality of replicated data.
Setting up replication between two databases offers fault tolerance against unexpected mishaps.
It is widely considered one of the best strategies for maintaining high availability during disasters.
This exposé explores the replication features of MySQL and PostgreSQL, as well as the
security measures available to protect replicated data.

Replication agents are the software components that perform the replication tasks, such as
reading, sending, receiving, or applying the data changes.

Examples of replication agents (these names come from Microsoft SQL Server replication) are:

- SQL Server Agent
- Snapshot Agent
- Queue Reader Agent

A. Postgres Replication:
PostgreSQL replication is defined as the process of copying data from a PostgreSQL
database server to another server. The source database server is also known as the “primary”
server, whereas the database server receiving the copied data is known as the “replica” server.

The PostgreSQL database follows a straightforward replication model, where all writes go to
a primary node. The primary node can then apply these changes and broadcast them to
secondary nodes.

B. MySQL Replication:
MySQL replication is a process in which data from a primary MySQL database is
copied and sent to one or more secondary databases, known as replicas.

Replication ensures that data is copied and deliberately populated into another
environment, based on the transactions of the source environment, instead of being
stored in only one location.
What is automatic failover?

Once physical streaming replication has been set up and configured, failover
can take place if the primary server for the database fails. Failover is the term for the
recovery process, which in PostgreSQL can take some time, particularly as PostgreSQL
itself does not provide built-in tools to detect server failures.
EnterpriseDB's EDB Postgres Failover Manager lets you detect database failures and
promotes the most current standby server as the new master, helping to avoid costly database
downtime.
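
Choosing the "most current" standby comes down to comparing how far each standby has replayed the WAL. The Python sketch below is only an illustration of that comparison, not how EDB Postgres Failover Manager is implemented. The standby names and positions are made-up sample values; on a real standby the position could be read with the pg_last_wal_replay_lsn() function. The helper names lsn_to_int and most_current_standby are hypothetical.

```python
def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN such as '0/3000148' to an integer.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def most_current_standby(replay_positions):
    """Return the name of the standby with the highest replayed LSN.

    replay_positions maps a standby name to its last replayed LSN.
    """
    return max(replay_positions, key=lambda name: lsn_to_int(replay_positions[name]))


# Sample data (assumed, not taken from a real cluster).
positions = {
    "standby1": "0/3000060",
    "standby2": "0/3000148",
    "standby3": "0/2FFFFFF",
}
candidate = most_current_standby(positions)
print(candidate)
```

LSNs should be compared numerically rather than as strings, because the hexadecimal parts are not zero-padded.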

Benefits of Using Replication

Here are a few key benefits of leveraging PostgreSQL replication:

• Data migration: You can use PostgreSQL replication for data migration, either
through a change of database server hardware or through system deployment.
• Fault tolerance: If the primary server fails, the standby server can take over as the
primary because both servers contain the same data.
• Online transactional processing (OLTP) performance: You can improve the
transaction processing time and query time of an OLTP system by moving the reporting
query load to a replica. Transaction processing time is the time it takes for a given query to be
executed before a transaction is finished.
• System testing in parallel: When upgrading to a new system, you need to make sure
that the system works well with existing data, hence the need to test with a copy of the
production database before deployment.
• Data security: Because the replica can pause the replication process, it is possible to
run backup services on the replica without corrupting the corresponding source data.
• Analytics: Live data can be created on the source, while the analysis of the information
can take place on the replica without affecting the performance of the source.
How MySQL Replication Works

Here is an example of how to set up MySQL replication using SQL statements. The statements
use the traditional master/slave syntax; from MySQL 8.0.22 onward, START SLAVE and
SHOW SLAVE STATUS are deprecated in favor of START REPLICA and SHOW REPLICA STATUS.

1. First, create a user on the master server that the slave server can use to connect to the master
server:

CREATE USER 'replication_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'replication_user'@'%';

2. Next, on the master server, lock all tables so that no writes occur while you record the
binary log position:

FLUSH TABLES WITH READ LOCK;

3. Find out the current binary log file and position. Keep this session open until you have noted
the values (and copied any existing data to the slave), then release the lock with UNLOCK TABLES:

SHOW MASTER STATUS;

4. On the slave server, configure the slave to connect to the master:

CHANGE MASTER TO
MASTER_HOST='master_host_name',
MASTER_USER='replication_user',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='recorded_log_file_name',
MASTER_LOG_POS=recorded_log_position;

5. Start the slave:

START SLAVE;

6. Finally, check the slave status:

SHOW SLAVE STATUS\G
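
Step 4 can be scripted once the values from step 3 are known. The following Python sketch only builds the statement text; it does not connect to any server. The helper build_change_master is hypothetical, and the log file name mysql-bin.000003 and position 154 are made-up sample values rather than output from a real master.

```python
def build_change_master(host, user, password, log_file, log_pos):
    """Return a CHANGE MASTER TO statement for the slave (step 4)."""

    def quote(value):
        # Escape backslashes and single quotes for a MySQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO\n"
        f"  MASTER_HOST={quote(host)},\n"
        f"  MASTER_USER={quote(user)},\n"
        f"  MASTER_PASSWORD={quote(password)},\n"
        f"  MASTER_LOG_FILE={quote(log_file)},\n"
        f"  MASTER_LOG_POS={int(log_pos)};"
    )


# Sample values standing in for what SHOW MASTER STATUS reported (assumed).
statement = build_change_master(
    "master_host_name", "replication_user", "password", "mysql-bin.000003", 154
)
print(statement)
```

The log position is passed through int() so that it is emitted as a number, matching the unquoted MASTER_LOG_POS value in the statement above.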

How PostgreSQL Replication Works

Generally, people believe that when you’re dabbling with a primary and secondary
architecture, there’s only one way to set up backups and replication. PostgreSQL deployments,
however, can follow any of these three methods:

1. Streaming replication: Replicates data from the primary node to the secondary, then
copies data to Amazon S3 or Azure Blob for backup storage.
2. Volume-level replication: Replicates data at the storage layer, starting from the
primary node to the secondary node, then copies data to Amazon S3 or Azure Blob for
backup storage.
3. Incremental backups: Replicates data from the primary node while constructing a new
secondary node from Amazon S3 or Azure Blob storage, allowing for streaming directly
from the primary node.

Method 1: Streaming

PostgreSQL streaming replication, also known as WAL replication, can be set up
after installing PostgreSQL on all servers. This approach to replication is based on moving the
WAL data from the primary to the target database.

You can implement PostgreSQL streaming replication by using a primary-secondary
configuration. The primary server is the main instance that handles the primary database and
all its operations. The secondary server acts as the supplementary instance and applies all
changes made to the primary database to itself, generating an identical copy in the process.
The primary is the read/write server, whereas the secondary server is read-only.

For this method, you need to configure both the primary node and the standby node. The
following sections describe the steps involved in configuring them.

Configuring Primary Node

You can configure the primary node for streaming replication by carrying out the following
steps:

Step 1: Initialize the Database

To initialize the database, you can use the initdb utility. Next, you can create
a new user with replication privileges by using the following statement:

CREATE USER example_username REPLICATION LOGIN ENCRYPTED PASSWORD 'example_password';

Supply your own username and password in the statement. The role name is an identifier, so it
is not wrapped in single quotes; only the password is a string literal. The REPLICATION
keyword gives the user the required privileges. An example statement would look
like this:

CREATE USER rep_user REPLICATION LOGIN ENCRYPTED PASSWORD 'rep_pass';

Step 2: Configure Streaming Properties

Next, you can configure the streaming properties with the PostgreSQL configuration file
(postgresql.conf) that can be modified as follows:

wal_level = logical
wal_log_hints = on
max_wal_senders = 8
max_wal_size = 1GB
hot_standby = on

Here’s a little background around the parameters used in the previous snippet:

• wal_log_hints : This parameter is required for the pg_rewind capability that comes
in handy when the standby server is out of sync with the primary server.
• wal_level : This parameter controls how much information is written to the WAL, with
possible values minimal , replica , or logical . Streaming replication needs replica or logical .
• max_wal_size : This sets a soft limit on how large the WAL may grow between
automatic checkpoints.
• hot_standby : When set to on , this parameter allows read-only connections to the
secondary.
• max_wal_senders : You can use max_wal_senders to specify the maximum
number of concurrent connections that can be established with the standby servers.
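
As a quick sanity check of these settings, the Python sketch below parses a configuration fragment and reports what would block streaming replication. It is a simplified illustration: parse_conf and streaming_problems are hypothetical helpers, the parser only understands plain "name = value" lines, and a setting that is not listed is treated as unset (the sketch does not know the server's built-in defaults).

```python
def parse_conf(text):
    """Parse 'name = value' lines, ignoring comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip("'")
    return settings


def streaming_problems(settings):
    """Return the reasons why the primary is not ready for streaming."""
    problems = []
    if settings.get("wal_level") not in ("replica", "logical"):
        problems.append("wal_level must be replica or logical")
    if int(settings.get("max_wal_senders", "0")) < 1:
        problems.append("max_wal_senders must be at least 1")
    return problems


SAMPLE_CONF = """
wal_level = logical
wal_log_hints = on
max_wal_senders = 8
max_wal_size = 1GB
hot_standby = on
"""

settings = parse_conf(SAMPLE_CONF)
print(streaming_problems(settings))
```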
Step 3: Create New Entry

After you’ve modified the parameters in the postgresql.conf file, a new replication entry in
the pg_hba.conf file can allow the servers to establish a connection with each other for
replication.

You can usually find this file in the data directory of PostgreSQL. The entry has the following
form, with the address given in CIDR notation:

host replication rep_user IPaddress md5

Once this entry is in place and the configuration has been reloaded, the primary server allows a
user called rep_user to connect from the specified IP address and act as the standby server for
replication. For instance:

host replication rep_user 192.168.0.22/32 md5
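
To make the meaning of such an entry concrete, the Python sketch below checks whether a given user and client address would match a single entry of this form. It is an illustration only: hba_allows is a hypothetical helper, it handles just the five-field "host replication ..." layout shown above, and it ignores the authentication method and every other pg_hba.conf feature.

```python
import ipaddress


def hba_allows(entry, user, client_ip):
    """Check a 'host replication USER ADDRESS METHOD' entry against a client."""
    conn_type, database, hba_user, address, _method = entry.split()
    if conn_type != "host" or database != "replication":
        return False
    if hba_user not in (user, "all"):
        return False
    # strict=False tolerates addresses whose host bits are set, e.g. x.x.x.22/24.
    network = ipaddress.ip_network(address, strict=False)
    return ipaddress.ip_address(client_ip) in network


ENTRY = "host replication rep_user 192.168.0.22/32 md5"
print(hba_allows(ENTRY, "rep_user", "192.168.0.22"))
print(hba_allows(ENTRY, "rep_user", "192.168.0.23"))
```

A /32 mask matches exactly one address, which is why only the standby at 192.168.0.22 is accepted in this example.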

Configuring Standby Node

To configure the standby node for streaming replication, follow these steps:

Step 1: Back Up Primary Node

To configure the standby node, use the pg_basebackup utility to generate a backup of
the primary node. This will serve as a starting point for the standby node. You can use this
utility with the following syntax:

pg_basebackup -D <data_directory> -h <primary_host> -X stream -c fast -U rep_user -W

The parameters used in the syntax mentioned above are as follows:

• -h : The host name or IP address of the primary server.
• -D : The target directory that the backup is written to.
• -c : The checkpoint mode, either fast or spread (note the lowercase c).
• -X : Includes the required WAL files in the backup; stream fetches them over a second
connection while the backup is running.
• -U : The user to connect as.
• -W : Prompts for a password before connecting to the database.
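
When the backup is launched from a script, it helps to build the argument list explicitly. The Python sketch below assembles the same command as an argument list; basebackup_command is a hypothetical helper, and the data directory and host are placeholder values that must be replaced with real ones.

```python
def basebackup_command(data_dir, primary_host, user):
    """Build the pg_basebackup argument list used for the standby."""
    return [
        "pg_basebackup",
        "-D", data_dir,       # target directory for the backup
        "-h", primary_host,   # primary server to copy from
        "-X", "stream",       # stream WAL while the backup runs
        "-c", "fast",         # request an immediate checkpoint
        "-U", user,           # replication user
        "-W",                 # prompt for the password
    ]


# Placeholder values (assumed for illustration).
command = basebackup_command("/var/lib/postgresql/data", "192.168.0.26", "rep_user")
print(" ".join(command))
```

On the standby, the list could then be passed to subprocess.run(command, check=True); passing a list avoids shell quoting problems with paths that contain spaces.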

Step 2: Set Up Replication Configuration File

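Next, you need to check whether the replication configuration file exists. If it doesn't, you can
create it as recovery.conf.

You should create this file in the data directory of the PostgreSQL installation. You can
generate it automatically by using the -R option of the pg_basebackup utility.
Note that recovery.conf applies to PostgreSQL 11 and earlier. From PostgreSQL 12 onward the file
is no longer used: primary_conninfo is set in postgresql.conf, and an empty standby.signal file in
the data directory puts the server into standby mode.

The recovery.conf file should contain the following settings: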

standby_mode = 'on'
primary_conninfo = 'host=<master_host> port=<postgres_port> user=<replication_user> password=<password> application_name=<host_name>'
recovery_target_timeline = 'latest'

The parameters used in the aforementioned commands are as follows:

• primary_conninfo : The connection string that the standby uses to connect to the
primary server.
• standby_mode : When set to on , this parameter causes the server to start as a
standby.
• recovery_target_timeline : This selects the timeline to recover along; latest makes the
standby follow the newest timeline, for example after a failover.

To set up a connection, you need to provide the username, IP address, and password as values
for the primary_conninfo parameter. For instance:

primary_conninfo = 'host=192.168.0.26 port=5432 user=rep_user password=rep_pass'
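
The connection string is a list of keyword=value pairs separated by spaces. The Python sketch below builds one from keyword arguments and quotes values that need it (empty values or values containing spaces, single quotes, or backslashes). The helper conninfo is hypothetical, and the sketch covers only the connection string itself, not the extra quoting needed when the string is placed inside a configuration file.

```python
def conninfo(**params):
    """Build a keyword=value connection string from keyword arguments."""

    def quote(value):
        text = str(value)
        needs_quotes = (
            text == ""
            or any(ch.isspace() for ch in text)
            or "'" in text
            or "\\" in text
        )
        if not needs_quotes:
            return text
        # Inside quotes, backslashes and single quotes are backslash-escaped.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"

    return " ".join(f"{key}={quote(value)}" for key, value in params.items())


connection = conninfo(host="192.168.0.26", port=5432, user="rep_user", password="rep_pass")
print(connection)
```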

Step 3: Restart Secondary Server

Finally, you can restart the secondary server to complete the configuration process.

However, streaming replication comes with several challenges, such as:

• Various PostgreSQL clients (written in different programming languages) converse
with a single endpoint. When the primary node fails, these clients will keep retrying the
same DNS name or IP address. This makes failover visible to the application.
• PostgreSQL replication doesn't come with built-in failover and monitoring. When the
primary node fails, you need to promote a secondary to be the new primary. This
promotion needs to be executed in a way where clients write to only one primary node,
and they don't observe data inconsistencies.
• PostgreSQL replicates its entire state. When you need to build a new secondary node,
the secondary needs to replay the entire history of state changes from the primary node,
which is resource-intensive and makes it costly to retire existing nodes and create
new ones.

Method 2: Replicated Block Device

The replicated block device method depends on disk mirroring (also known as volume
replication). In this approach, changes are written to a persistent volume which gets
synchronously mirrored to another volume.

The added benefit of this method is its compatibility and data durability in cloud environments
with all relational databases, including PostgreSQL, MySQL, and SQL Server, to name a few.

However, the disk-mirroring approach to PostgreSQL replication requires you to replicate both
the WAL and the table data. Since each write to the database now needs to go over the network
synchronously, you can't afford to lose a single byte, as that could leave your database in a
corrupt state.

This method is typically used by managed services such as Azure PostgreSQL and Amazon RDS.

Method 3: WAL

WAL consists of segment files (16 MB by default). Each segment has one or more records. A
log sequence number (LSN) is a pointer to a record in WAL, letting you know the
position/location where the record has been saved in the log file.

A standby server uses WAL segments (also known as XLOGs in PostgreSQL
terminology) to continuously replicate changes from its primary server. Write-ahead logging
guarantees durability and atomicity in a DBMS by serializing chunks of byte-array
data (each one with a unique LSN) to stable storage before they get applied to a database.

Applying a mutation to a database might lead to various file system operations. A pertinent
question is how a database can guarantee atomicity if the server fails, for example
due to a power outage, while it is in the middle of a file system update. When a database boots,
it begins a startup or replay process which reads the available WAL segments and compares
them with the LSN stored on every data page (every data page is marked with the LSN of the
latest WAL record that affects the page). A WAL record whose LSN is newer than the LSN on
its page describes a change that never reached the page, so it is applied again.
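
The replay rule described above can be sketched in a few lines of Python. This is a toy model under loud assumptions: pages hold a single string value, LSNs are plain integers, and each WAL record is an (lsn, page_id, new_value) tuple. Page and replay are hypothetical names, and real PostgreSQL recovery is far more involved.

```python
from dataclasses import dataclass


@dataclass
class Page:
    value: str = ""
    lsn: int = 0  # LSN of the latest WAL record applied to this page


def replay(wal_records, pages):
    """Apply the WAL records that a page has not seen yet; return the count."""
    applied = 0
    for lsn, page_id, new_value in sorted(wal_records):
        page = pages.setdefault(page_id, Page())
        if lsn > page.lsn:  # the change never reached the page before the crash
            page.value = new_value
            page.lsn = lsn
            applied += 1
    return applied


# Page "a" was flushed up to LSN 2 before the simulated crash; page "b" never was.
pages = {"a": Page(value="v2", lsn=2)}
wal = [(1, "a", "v1"), (2, "a", "v2"), (3, "a", "v3"), (4, "b", "w1")]
first_pass = replay(wal, pages)
second_pass = replay(wal, pages)
print(first_pass, second_pass)
```

Because records at or below a page's LSN are skipped, running the replay a second time changes nothing, which is what makes recovery safe to repeat.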
Log Shipping-Based Replication (Block Level)

Streaming replication refines the log shipping process. Rather than waiting for a WAL
segment to be completed and switched, the records are sent as they get created, thus decreasing
replication delay.

Streaming replication also improves on log shipping because the standby server connects to the
primary server over the network by using a replication protocol. The primary server can
then send WAL records directly over this connection without having to depend on scripts
provided by the end-user.

Log Shipping-Based Replication (File Level)

Log shipping is defined as copying log files to another PostgreSQL server to generate another
standby server by replaying WAL files. This server is configured to work in recovery mode,
and its sole purpose is to apply any new WAL files as they show up.

This secondary server then becomes a warm backup of the primary PostgreSQL server. It can
also be configured as a read replica that serves read-only queries, which is referred to as a
hot standby.

Continuous WAL Archiving

Duplicating WAL files as they are created into any location other than
the pg_wal subdirectory to archive them is known as WAL archiving. PostgreSQL will call
a script given by the user for archiving each time a WAL segment file is completed.

The script can use the scp command to duplicate the file to one or more locations, or copy it to
an NFS mount. Once archived, the WAL segment files can be used to recover the
database to any given point in time.
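
An archiving script of this kind can be very small. The Python sketch below copies one completed segment into an archive directory and refuses to overwrite a file that is already there. The function archive_wal_segment and the directory layout are assumptions made for illustration; a real script would be wired into PostgreSQL through the archive_command setting and would exit with a non-zero status whenever the function returns False.

```python
import shutil
from pathlib import Path


def archive_wal_segment(segment_path, archive_dir):
    """Copy one completed WAL segment into archive_dir without overwriting."""
    source = Path(segment_path)
    target = Path(archive_dir) / source.name
    if target.exists():
        # Report failure instead of silently replacing an archived segment.
        return False
    shutil.copy2(source, target)
    return True
```

Refusing to overwrite protects the archive if two servers are accidentally pointed at the same directory.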

Other log-based configurations include:

• Synchronous replication: Before every synchronous replication transaction gets
committed, the primary server waits until standbys confirm that they got the data. The
benefit of this configuration is that there won't be any conflicts caused by parallel
writing processes.
• Synchronous multi-master replication: Here, every server can accept write
requests, and modified data gets transmitted from the original server to every other
server before each transaction gets committed. It uses the two-phase commit (2PC) protocol and
adheres to the all-or-none rule.
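
The all-or-none rule of 2PC can be shown with a toy in-memory model. The Python sketch below is a teaching aid under stated assumptions, not PostgreSQL's implementation: Node and two_phase_commit are hypothetical names, the can_commit flag stands in for whatever might make a server vote "no", and there is no networking, logging, or crash handling.

```python
class Node:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = []
        self.pending = None

    def prepare(self, change):
        # Phase 1: stage the change and vote yes or no.
        if self.can_commit:
            self.pending = change
        return self.can_commit

    def commit(self):
        # Phase 2 (success): make the staged change permanent.
        self.data.append(self.pending)
        self.pending = None

    def rollback(self):
        # Phase 2 (failure): discard the staged change.
        self.pending = None


def two_phase_commit(nodes, change):
    """Commit the change on every node, or on none of them."""
    votes = [node.prepare(change) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.rollback()
    return False


healthy = [Node("a"), Node("b"), Node("c")]
mixed = [Node("a"), Node("b", can_commit=False)]
print(two_phase_commit(healthy, "row-1"), two_phase_commit(mixed, "row-1"))
```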
WAL Streaming Protocol Details

A process known as the WAL receiver, running on the standby server, uses the connection
details provided in the primary_conninfo parameter of recovery.conf and connects to the
primary server over a TCP/IP connection.

To start streaming replication, the frontend sends the replication parameter within the startup
message. A Boolean value of true, yes, 1, or ON lets the backend know that it needs to go into
physical replication walsender mode.

The WAL sender is another process that runs on the primary server and is in charge of sending the
WAL records to the standby server as they get generated. The WAL receiver writes the received
records to the standby's own WAL as if they had been generated by locally connected clients.

Once the WAL records reach the WAL segment files, the standby server keeps
replaying the WAL so that the primary and standby stay up to date.

Types of Replication

There are two types of PostgreSQL replication: logical and physical replication.

A simple logical operation — initdb — would carry out the physical operation of creating a
base directory for a cluster. Likewise, a simple logical operation CREATE
DATABASE would carry out the physical operation of creating a subdirectory in the base
directory.

Physical replication usually deals with files and directories. It doesn’t know what these files
and directories represent. These methods are used to maintain a full copy of the entire data of
a single cluster, typically on another machine, and are done at the file system level or disk level
and use exact block addresses.

Logical replication is a way of reproducing data entities and their modifications, based upon
their replication identity (usually a primary key). Unlike physical replication, it deals with
databases, tables, and DML operations, and it is configured per database rather than for the
whole cluster. It uses a publish and subscribe model where one or more subscribers are
subscribed to one or more publications on a publisher node.
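
The idea of applying changes by replication identity can be illustrated with a small Python model. Everything here is an assumption made for the example: the subscriber table is a dictionary keyed by primary key, each change is a dictionary with an op, a key, and (for inserts and updates) a row, and apply_change is a hypothetical helper. It shows the concept only, not PostgreSQL's wire format.

```python
def apply_change(table, change):
    """Apply one logical change to a table keyed by primary key."""
    op, key = change["op"], change["key"]
    if op == "insert":
        table[key] = dict(change["row"])
    elif op == "update":
        # Merge the changed columns into the row identified by the key.
        table[key] = {**table[key], **change["row"]}
    elif op == "delete":
        del table[key]
    else:
        raise ValueError(f"unknown operation: {op}")


# Changes published by the source, in commit order (sample data).
changes = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "Bob"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "key": 2},
]

subscriber = {}
for change in changes:
    apply_change(subscriber, change)
print(subscriber)
```

Because every change names the row by its key rather than by a block address, the subscriber does not need the same physical layout as the publisher.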

Replication Modes

There are mainly two modes of PostgreSQL replication: synchronous and asynchronous.

Synchronous replication allows data to be written to both the primary and secondary server at
the same time, whereas asynchronous replication ensures that the data is first written to the
primary and then copied to the secondary server.

In synchronous mode replication, transactions on the primary database are considered complete
only when those changes have been replicated to all the synchronous replicas. The replica servers
must all be available all the time for the transactions to be completed on the primary. The
synchronous mode of replication is used in high-end transactional environments with immediate
failover requirements.

In asynchronous mode, transactions on the primary server can be declared complete when the
changes have been made on just the primary server. These changes are then replicated to the
replicas later. The replica servers can remain out of sync for a certain duration, called
the replication lag. In the case of a crash, data loss may occur, but the overhead imposed by
asynchronous replication is small, so it's acceptable in most cases (it doesn't overburden the
primary). Failover from the primary database to the secondary database takes longer than
with synchronous replication.
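
The difference between the two modes can be made concrete with a toy simulation. The Python sketch below is an assumption-laden model, not PostgreSQL behavior: Primary and Replica are hypothetical classes, logs are plain lists, and "waiting for confirmation" is modeled by copying the record to every replica before commit returns.

```python
class Replica:
    def __init__(self):
        self.log = []


class Primary:
    def __init__(self, replicas, synchronous):
        self.log = []
        self.replicas = replicas
        self.synchronous = synchronous
        self.unsent = []  # records committed locally but not yet shipped

    def commit(self, record):
        self.log.append(record)
        if self.synchronous:
            # Commit completes only after every replica has the record.
            for replica in self.replicas:
                replica.log.append(record)
        else:
            # Commit completes immediately; shipping happens later.
            self.unsent.append(record)

    def ship_pending(self):
        for record in self.unsent:
            for replica in self.replicas:
                replica.log.append(record)
        self.unsent.clear()

    def lag(self, replica):
        """Replication lag, measured in records."""
        return len(self.log) - len(replica.log)


sync_replica, async_replica = Replica(), Replica()
sync_primary = Primary([sync_replica], synchronous=True)
async_primary = Primary([async_replica], synchronous=False)
for record in ("t1", "t2"):
    sync_primary.commit(record)
    async_primary.commit(record)
lag_before = async_primary.lag(async_replica)
async_primary.ship_pending()
lag_after = async_primary.lag(async_replica)
print(sync_primary.lag(sync_replica), lag_before, lag_after)
```

If the asynchronous primary crashed before ship_pending ran, the two unsent records would be the data lost, which is the trade-off described above.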

To conclude, data replication is an automated backup process in which your data is
repeatedly copied from its main database to another, remote location for safekeeping. It's an
integral technology for any site or app running a database server. You can also use the
replicated database to process read-only SQL, allowing more processes to be run within the
system.
