
Chapter 4. Database Cluster Setup


Cluster services are used to automatically manage the primary and standby databases. For the
reference environment, the Red Hat Enterprise Linux High Availability Add-On is used. Refer
to Red Hat Enterprise Linux 7: High Availability Add-On Reference [4] for additional
information.

4.1. Setup
Before setting up the cluster, stop the database on both nodes and open the required firewall
service:

#systemctl stop $APPLIANCE_PG_SERVICE

#firewall-cmd --permanent --add-service=high-availability
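The --permanent flag only updates the stored firewalld configuration; if the firewall is already running, reload it so the rule is applied to the runtime configuration as well:

#firewall-cmd --reload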

Set the password for the hacluster user, which is created when pacemaker is installed. Be sure
the hacluster password is the same on both systems.

#passwd hacluster

Enable the pcsd daemon:

#systemctl enable pcsd

#systemctl start pcsd

Install the json_pure gem which is required by the pcs command:

#gem install json_pure

Set the following variables on each database system. Pick a cluster name that is unique.

#NODE1=NODE1_HOSTNAME

#NODE2=NODE2_HOSTNAME

#CLUSTER_NAME=CLUSTER_NAME

Run this command to set up authorization to pcsd on both nodes. It prompts for the hacluster password:

#pcs cluster auth $NODE1 $NODE2 -u hacluster


On NODE1 (primary):

#pcs cluster setup --local --name $CLUSTER_NAME $NODE1 $NODE2

#pcs cluster setup --start --enable --name $CLUSTER_NAME $NODE1 $NODE2

Verify the cluster configuration with the following command:

#pcs status
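Both nodes should be reported as online. If the membership looks wrong, the corosync ring status can also be checked directly with standard cluster tooling:

#corosync-cfgtool -s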

Next, configure fencing to allow a surviving cluster node to forcibly remove a non-responsive
node from the cluster.

Note

For the reference environment, VMware components are used as the virtualization provider. If
you are using Red Hat virtualization technologies instead, the VMware-specific configuration
below does not apply. Refer to Red Hat Enterprise Linux 7: High Availability Add-On
Administration [5] for additional information on fencing configuration and setup.

Check that both database systems can access the SOAP API of the vCenter they are running on.
The vCenter user needs to be able to power virtual machines on and off.

#fence_vmware_soap -o list -a VCENTER_HOSTNAME -l VCENTER_USERNAME -p VCENTER_PASSWORD -z

Add vCenter as a fence device to the cluster. This only needs to be run on the first database
system:

#pcs stonith create VCENTER_NAME_stonith fence_vmware_soap \
    ipaddr="VCENTER_HOSTNAME" ssl="1" login='VCENTER_USERNAME' \
    passwd='VCENTER_PASSWORD' pcmk_host_list="$NODE1,$NODE2" \
    pcmk_host_check="static-list"
Note

For virtualization environments it is recommended to configure pcmk_host_map. Refer to the
following knowledge base article for configuration: https://access.redhat.com/solutions/701463

On both database systems, verify that fencing works. The following command
reboots NODE2 when run on NODE1:

#pcs stonith fence $NODE2


When it has finished rebooting, fence NODE1 from NODE2:

#pcs stonith fence $NODE1

After each of the systems comes back, verify that the following command shows both cluster
nodes as online, and the fence_vmware_soap agent as started:

#pcs status

Add the virtual IP address that the CloudForms appliances will use to connect to the primary
database.

#pcs resource create pgvip ocf:heartbeat:IPaddr2 ip=VIP \
    cidr_netmask=NETWORK_PREFIX iflabel=pgvip meta target-role=Started
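Once the resource is started, the address should be visible on the node that hosts it. As an illustrative check (not part of the reference procedure), the interface label set by iflabel can be used to spot it:

#ip addr show | grep pgvip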

Next, create the resource for the cluster to run the PostgreSQL database. Because a newer
PostgreSQL version from the Red Hat Software Collections channel is used, create two wrapper
scripts on both database systems:

/usr/local/bin/pg_ctl:

#cat >/usr/local/bin/pg_ctl <<'EOF'
#!/bin/bash
scl enable rh-postgresql94 -- pg_ctl "$@"
EOF

/usr/local/bin/psql:

#cat >/usr/local/bin/psql <<'EOF'
#!/bin/bash
scl enable rh-postgresql94 -- psql "$@"
EOF

#chmod 755 /usr/local/bin/{pg_ctl,psql}
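A quick sanity check that the wrapper scripts resolve the Software Collections environment is to ask each for its version; both should report a PostgreSQL 9.4 build:

#/usr/local/bin/pg_ctl --version

#/usr/local/bin/psql --version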

On NODE1, create the resource for the cluster to manage the PostgreSQL service.
Replace REPLICATOR_PASSWORD and REPLICATION_VIP with the actual values.

#pcs resource create postgresql pgsql pgctl=/usr/local/bin/pg_ctl \
    pgdata=/var/opt/rh/rh-postgresql94/lib/pgsql/data/ psql=/usr/local/bin/psql \
    config=/var/opt/rh/rh-postgresql94/lib/pgsql/data/postgresql.conf \
    rep_mode=async repuser=replicator \
    primary_conninfo_opt="password=REPLICATOR_PASSWORD" node_list="$NODE1 $NODE2" \
    restore_command='cp /var/opt/rh/rh-postgresql94/lib/pgsql/wal-archive/%f "%p"' \
    master_ip=REPLICATION_VIP tmpdir=/var/opt/rh/rh-postgresql94/lib/pgsql/tmp \
    check_wal_receiver=false restart_on_promote=true \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="4s" on-fail="restart" \
    op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" \
    op promote timeout="60s" interval="0s" on-fail="restart" \
    op demote timeout="60s" interval="0s" on-fail="stop" \
    op stop timeout="60s" interval="0s" \
    op notify timeout="60s" interval="0s"

#pcs resource master postgresql-ms postgresql master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true

Low timeout values are set for the monitor operations to speed up master/slave negotiation when
the resource is started.
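The operation values configured above can be reviewed at any time on either node:

#pcs resource show postgresql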

Configure the cluster to keep the replication and service IP address on the same cluster node as
the primary database.

#pcs constraint colocation add pgvip with Master postgresql-ms INFINITY

#pcs constraint order promote postgresql-ms then start pgvip symmetrical=false score=INFINITY

#pcs constraint order demote postgresql-ms then stop pgvip symmetrical=false score=0

Replace the postgresql resource agent file to support pglogical failover. The new resource
agent runs the syncsubs method every 10 minutes, as defined in the resource configuration.
syncsubs reads the subscription information from the pglogical tables and writes it to a target
location, either a local file or a remote location reached over ssh. When a failure occurs, the
subscription information is already available to the new master, which reads it in and recreates
the subscription.

#cp pgsql /usr/lib/ocf/resource.d/heartbeat/
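To inspect the subscription information that syncsubs works from, the pglogical catalog can be queried directly. This is an illustrative check only; DATABASE_NAME is a placeholder, and the column names assume pglogical 2.x:

#su - postgres -c "/usr/local/bin/psql -d DATABASE_NAME -c 'SELECT sub_name, sub_enabled FROM pglogical.subscription;'"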

Verify PostgreSQL is started:

#pcs status

This should show that the database is running as master on one node, and as slave on the other
node. If it shows that the database is stopped on both nodes and provides "failed actions" details,
run the following to help diagnose the issue:

#pcs resource debug-start postgresql
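If the debug output is not conclusive, the Pacemaker and corosync messages in the system journal are usually the next place to look (standard systemd tooling, not specific to this setup):

#journalctl -u pacemaker -u corosync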


One more thing to do is ensure that old WAL files are deleted. This could be done with
the archive_cleanup_command option of the pgsql cluster resource script; however, it is
desirable to keep WAL files longer than replication needs them, so that the database can be
reconstructed from an older backup.

To do this, create a script in /etc/cron.daily on both nodes. The find command at the end of the
script is run as the postgres user because, in production, the wal-archive directory is an NFS
mount that is not readable by the root user.

/etc/cron.daily/pgsql-replication:

#cat >/etc/cron.daily/pgsql-replication <<'EOF'
#!/bin/bash

# Delete archived WAL files after N days. This assumes that we take a full
# backup more often than that.

set -eu

# Exit if we are on the standby node. We only need to run the delete command
# once, and the primary node is the one who writes the files.

if [ -e /var/opt/rh/rh-postgresql94/lib/pgsql/data/recovery.conf ]; then
    exit 0
fi

# The number of days after which archived WAL files will be deleted.
MAX_AGE=3

su -c "find /var/opt/rh/rh-postgresql94/lib/pgsql/wal-archive -maxdepth 1 -mtime +$MAX_AGE -type f -delete" postgres
EOF

#chmod +x /etc/cron.daily/pgsql-replication
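The script can also be run once by hand on each node to confirm it exits cleanly; on the standby it should exit immediately because of the recovery.conf check:

#/etc/cron.daily/pgsql-replication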

4.1.1. Operations

To stop all cluster resources on one node:

#pcs cluster standby $HOSTNAME

If that node is the primary database, the remaining node should automatically switch from
standby to primary.
To re-enable the node to host cluster resources:

#pcs cluster unstandby $HOSTNAME

The pgsql cluster resource agent uses a lock file to track clean shutdowns of the primary/standby
pair and will refuse to promote a node to primary if the lock file is present.

This lock file is only deleted when the standby is stopped first and then the primary is stopped.
If the primary is stopped first, or is fenced, and the standby takes over, the lock file is not
deleted.

In the reference environment, this precaution is not necessary because the Write Ahead Logs are
archived. If PostgreSQL remains in a "Stopped" state on a node and the "Failed actions" list
shows "unknown error", run the following to help diagnose the issue:

#pcs resource debug-start postgresql

If the node displays the following error:

ERROR: My data may be inconsistent. You have to remove /var/opt/rh/rh-postgresql94/lib/pgsql/tmp/PGSQL.lock file to force start.

delete /var/opt/rh/rh-postgresql94/lib/pgsql/tmp/PGSQL.lock and run:

#pcs resource cleanup postgresql
