
Building a Lightweight High Availability Cluster

Using RepMgr

Stephan Müller

June 29, 2018


Schedule

Introduction
Postgres high availability options
Write ahead log and streaming replication
Built-in tools
Cluster management with RepMgr
Configuration and usage
Automatic failover with RepMgrD
Backup and Recovery with BarMan
Configuration and usage

Wrap-up & Discussion


Please ask questions
Personal Background

IT Operations, for 2.5 years


OLMeRO
Swiss market leader for internet solutions for the construction sector
Tender and construction site management
renovero.ch
Craftsmen's offers for private customers
Belongs to the Tamedia portfolio
Publishing company
Digital marketplaces

Mathematics and Computer Science in Berlin


Cryptography, Category Theory
Thank you PGDay.ch’17
Postgres High Availability Options on Different Layers

Hardware
SAN
Transparent to OS and postgres
Fails spectacularly
Operating system
Distributed Replicated Block Device (DRBD)
SAN in Software
Database physical
WAL based: Log shipping (≥ v 8.3)
WAL based: Streaming replication (≥ v 9.0)
Database logical
PGDay.ch’18: Harald Armin Massa → 11:00
FOSDEM’18: Magnus Hagander
App-in-db
Slony-I (trigger based)
Application
Introduction: Postgres Write Ahead Log

Before committing any transaction (i.e. setting its state to COMMITTED in the clog), the transaction is written to the WAL and flushed to disk
One big virtual file (16 EB)
Divided into logical files (4 GB)
Divided into segments (16 MB)
This is what you see on your disk
pg_xlog/ 0000000A 0000083E 000000B1
         timeline  block    segment

Divided into pages (8 KB)


Contains xlog records with transaction data
Log Sequence Number (LSN) is a byte address in WAL
SELECT pg_current_xlog_location();   -- 83E/B18FE7C0
Address 8FE7C0 in segment 0000000A0000083E000000B1
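For example, the pre-10 helper pg_xlogfile_name() maps an LSN to the segment file that holds it (result shown for the address above):
SELECT pg_xlogfile_name(pg_current_xlog_location());   -- 0000000A0000083E000000B1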
Introduction: Postgres Write Ahead Log

BEGIN; INSERT INTO foo VALUES('bar'); COMMIT;


Each page has a pg_lsn attribute:
Contains the LSN of the last xlog record which modified that page
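A sketch of how to look at that attribute with the pageinspect extension (assuming the table foo from above and superuser rights):
CREATE EXTENSION pageinspect;
SELECT lsn FROM page_header(get_raw_page('foo', 0));   -- LSN of the last change to page 0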
Recovery After a Crash Using the Write Ahead Log

Your server just crashed


After a restart:
Uncommitted data?
It’s lost.
Committed but not yet written to db?
Start replaying missing records from WAL
Where to start?
From the last checkpoint. Location saved in the pg_control file
pg_controldata /your/data/dir
Corrupted page writes?
full_page_writes = on
Insert complete backup of pages into WAL
That is what makes your WAL so big: ∼8 KB for each modified page
In short: Write Ahead Log is the D in ACID
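Illustration (output abbreviated; the value reuses the example LSN from earlier):
pg_controldata /your/data/dir | grep 'checkpoint location'
Latest checkpoint location:           83E/B18FE7C0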
Write Ahead Log and Streaming Replication

Idea: Copy WAL to other postgres servers


Remote server indefinitely replays from WAL
Log Shipping: "Just copy WAL segments"
Streaming Replication: Copy individual xlog records
Different levels of replication: synchronous_commit
off            Everywhere asynchronous
local          Locally synchronous, remote asynchronous
on             Wait until the remote server has written to WAL
remote_apply   Wait until the remote server has committed

synchronous_standby_names


Tradeoff: Safety vs Performance
Tunable on transaction level
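For example, a single non-critical transaction can trade safety for speed (a small sketch):
BEGIN;
SET LOCAL synchronous_commit = 'off';   -- applies to this transaction only
INSERT INTO foo VALUES ('bar');
COMMIT;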
Postgres Streaming Replication Benefits

Built-in
Easy to set up
Hard to break
Easy monitoring: All or nothing
SELECT * FROM pg_stat_replication;
pid              | 20841
usename          | repmgr
application_name | db02                 remote server
backend_xmin     | 294106915
state            | streaming            OK
sent_location    | 83E/F92947F0
write_location   | 83E/F92947F0         in memory
flush_location   | 83E/F92947F0         on disk
replay_location  | 83E/F92947B8         applied to db
sync_state       | async
[...]
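A related monitoring sketch (pre-10 function names, matching the columns above) that expresses the replay lag in bytes:
SELECT application_name,
       pg_xlog_location_diff(sent_location, replay_location) AS lag_bytes
FROM pg_stat_replication;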
Streaming Replication: Easy Setup

Prepare primary:
postgresql.conf
listen_addresses = '192.168.0.10'
max_wal_senders ≥ #nodes + 2
wal_level = replica
wal_log_hints = on              for pg_rewind
Special user:
CREATE ROLE repuser WITH REPLICATION

Don't forget pg_hba.conf and your firewall
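A minimal pg_hba.conf sketch for the replication connections (subnet and auth method are assumptions):
host    replication    repuser    192.168.0.0/24    md5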


Prepare standby:
pg_basebackup -h primary -P -U repuser -X stream -R
postgresql.conf:
hot_standby = on
Adjust recovery.conf
Done. OK, it is a bit more complicated than that, but not much
Cluster Management Solutions

At the end of the day: You want an easy failover solution.


Patroni
Focuses on automatic failover
Based on etcd / zookeeper
RepMgr
Wraps built-in commands
Focuses on manual failover
Automatic failover with repmgrd
Very slim
PAF (postgres automatic failover)
Focuses on automatic failover
Based on corosync / pacemaker
Using virtual IPs
Overview: RepMgr (Replication Manager)

https://repmgr.org/ (Source on GitHub)


Developed by 2ndQuadrant, written in C
Packaged for most distributions
Use 2ndQuadrant repository
Depending on your postgres version:
dnf install repmgr96    (or repmgr10, etc.)

Few dependencies to build from source


Well documented
Only manual failover (i.e. switchover)
Tuneable to automatic failover
Plays well with BarMan (Backup and Recovery Manager)
Setting up RepMgr on Primary

Start with your primary postgres node


Create repmgr user (superuser or replication privilege)
createuser -s repmgr

Create db for metadata


createdb repmgr -O repmgr

Adjust pg_hba.conf
Allow repmgr user to connect to its db, local and remotely
Prepare repmgr.conf
node_id = 1
node_name = db01                don't use role names
conninfo = 'host=db01.olmero.ch user=repmgr dbname=repmgr'
RepMgr Usage: Start a Cluster

General pattern: repmgr [options] <object> <verb>

object ∈ {primary, standby, node, cluster, witness}
verb ∈ {register, clone, follow, switchover, check, show, ...}
Register primary node
repmgr primary register
Installs some extensions
Adds entry to repmgr database
SELECT * FROM repmgr.nodes;
node_id          | 1
upstream_node_id |
active           | t
node_name        | db01
type             | primary
location         | default
priority         | 30
conninfo         | host=db01.olmero.ch dbname=repmgr user=repmgr
repluser         | repmgr
slot_name        |
config_file      | /etc/repmgr.conf
RepMgr Usage: Adding Nodes to Your Cluster

Start with empty data directory


Copy and modify repmgr.conf from primary:
node_id = 2
node_name = db02
conninfo = 'host=db02.olmero.ch user=repmgr dbname=repmgr'

Clone primary server


repmgr -h db01.olmero.ch standby clone

Executes a basebackup
pg_basebackup -h node1 -U repmgr -X stream

Prepares recovery.conf
RepMgr Usage: Adding Nodes to Your Cluster (cont)

recovery.conf:
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=db01.olmero.ch user=repmgr application_name=db02'
restore_command = '/usr/bin/barman-wal-restore barman olmero %f %p'

Start postgres server - Done.


Streaming replication is running
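Typically the new standby is then also registered, so it shows up in the repmgr metadata and in cluster show (sketch; -f points to the repmgr.conf from above):
repmgr -f /etc/repmgr.conf standby register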
RepMgr Usage: Change Primary

View your cluster: (run on any node)


repmgr cluster show

ID | Name | Role    | Status    | Upstream | Location
---+------+---------+-----------+----------+---------
1  | db01 | primary | * running |          | default
2  | db02 | standby |   running | db01     | default
3  | db03 | standby |   running | db01     | default

Switch over to a new primary: (run on the node that will become the new primary)


repmgr standby switchover
You want to start with a healthy cluster
Shut down old primary (service_stop_command)
Promote local node (service_promote_command)
pg_rewind old primary
Restart and rejoin old primary
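It can be worth checking the preconditions first; repmgr 4.x offers a dry-run mode and can repoint the other standbys in one go (sketch):
repmgr standby switchover --dry-run
repmgr standby switchover --siblings-follow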
Manual Failover with RepMgr

Promote a standby:
Make sure your old primary is dead and will stay dead
Choose a standby and run
repmgr standby promote
Calls service_promote_command from repmgr.conf
Change the upstream node for your other standbys
repmgr standby follow

Tell your applications about the new master


Use a connection pooler to separate your application and database
For example: PgBouncer
Your old primary is trashed
Delete and clone from new primary
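A sketch of re-adding the trashed node once db02 is the new primary (--force overwrites the old data directory):
repmgr -h db02.olmero.ch -U repmgr -d repmgr standby clone --force
Then start postgres and register the node again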
Automatic Failover with RepMgr: Overview

A repmgrd runs on each postgres node


repmgrd uses metadata table from repmgr db
It knows your postgres cluster
But it is not aware of other repmgrds
The repmgrds are not a cluster themselves (unlike etcd)
repmgrd PQpings the cluster's primary and its "local" node
On failure: repmgrd on a standby promotes its local node
Automatic Failover with RepMgr: Configuration

Shared configuration: /etc/repmgr.conf


failover = automatic
priority = 100
reconnect_attempts = 10
reconnect_interval = 20
promote_command = repmgr standby promote      # No

Latest LSN overrules priority


No fencing! Only rudimentary checks are done
Use a wrapper to do all the logic:
promote_command = /your/fancy/failover/script.py

STONITH in software
Eventually call repmgr standby promote
In doubt, leave it out
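A deliberately minimal wrapper sketch (hypothetical host and service name; the real fencing logic is site-specific):
#!/bin/sh
# 1. Fence the old primary so it cannot come back as a second master
ssh root@db01.olmero.ch 'systemctl stop postgresql-9.6' || exit 1
# 2. Only then promote the local node
exec repmgr standby promote -f /etc/repmgr.conf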
BarMan: Backup and Recovery Manager

https://www.pgbarman.org/
Developed by 2ndQuadrant, written in Python 2
Packaged for most distributions
dnf install barman
dnf install barman-cli (on your postgres nodes)
Physical backups
Fast recovery
Point In Time Recovery (PITR)
No logical backups
Onsite and offsite backups possible
Restore functionality
BarMan: Overview

Think: "A postgres node without postgres"


Copies your data directory
pg_basebackup
rsync
Uses streaming replication for continuous WAL archiving
pg_receivexlog
On barman's disk:
/data1/barman/olmero/base:
20180626T013002/            your data dir
20180627T013002/

/data1/barman/olmero/wals:
[...]
0000002E0000084B/           all wal segments
0000002E0000084C/
0000002E0000084D/
0000002E.history
BarMan: Configuration

Everything in barman.conf
[olmero]
conninfo = host=db01.olmero.ch user=barman dbname=postgres
streaming_conninfo = host=db01.olmero.ch user=barman

backup_method = rsync
ssh_command = ssh postgres@db01.olmero.ch -c arcfour
reuse_backup = link
parallel_jobs = 4

streaming_archiver = on          ; stream wals
slot_name = barman01             ; use a replication slot

Point barman to your postgres primary


Additionally:
Passwordless SSH login
DB connection with replication privilege
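A sketch of those prerequisites (user and host names are assumptions):
createuser -s barman                       # superuser, or at least REPLICATION privilege
ssh-copy-id postgres@db01.olmero.ch        # passwordless SSH from the barman host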
BarMan: Commandline Usage

barman backup olmero


Basebackup via rsync
Start pg_receivexlog
barman list backups olmero
20180627 Wed Jun 27 04:40:39 - Size: 468.3 GiB - WAL Size: 8.5 GiB
20180626 Tue Jun 26 04:58:48 - Size: 468.4 GiB - WAL Size: 9.5 GiB

barman check olmero --nagios


BARMAN OK - Ready to serve the Espresso backup for olmero
barman replication-status olmero
Pretty print "SELECT * FROM pg_stat_replication;"
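These commands are usually driven from cron rather than a daemon; a hypothetical /etc/cron.d/barman entry:
* * * * *  barman  barman cron             # WAL archiving and maintenance, every minute
0 1 * * *  barman  barman backup olmero    # nightly base backup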
BarMan: How to Restore a Backup

Restore from backup:


barman recover olmero latest /data/dir
       --remote-ssh-command "ssh postgres@db01"
       <recovery-target>

Use appropriate recovery target


--target-time "Wed Jan 01 09:30:00 2018"
--target-xid 128278783
--target-name "foo"            # SELECT pg_create_restore_point('foo')
--target-immediate             # only recover base backup

Restores basebackup via rsync


Prepares recovery.conf:
barman-wal-restore -U barman barman01 olmero %f %p
Start your postgres server
BarMan and Failover

Barman has no daemons, no extra processes


Everything is a cron job
Barman is not aware of your cluster
Check regularly for a new primary
You have to write a custom script
Adjust config
Start streaming from new primary
barman receive-wal --create-slot olmero
barman switch-wal olmero
If your primary changed
Timeline will change, no confusion in wal segments
Make a new basebackup
Wrap up - Picture at OLMeRO

repmgr as a wrapper around built-in features


Very flexible, very slim
BYOS: You have to bring your own failover logic
This is very hard
Plays well with barman
Thank You

Questions and Discussion
