
Building a Lightweight High Availability Cluster

Using RepMgr

Stephan Müller

June 29, 2018


Schedule

Introduction
Postgres high availability options
Write ahead log and streaming replication
Built-in tools
Cluster management with RepMgr
Configuration and usage
Automatic failover with RepMgrD
Backup and Recovery with BarMan
Configuration and usage

Wrap-up & Discussion


Please ask questions
Personal Background

IT Operations, for 2.5 years


OLMeRO
Swiss market leader for internet solutions for the construction sector
Tender and construction site management
renovero.ch
Craftsmen's offers for private customers
Belongs to the Tamedia portfolio
Publishing company
Digital marketplaces

Mathematics and Computer Science in Berlin


Cryptography, Category Theory
Thank you PGDay.ch’17
Postgres High Availability Options on Different Layers

Hardware
SAN
Transparent to OS and postgres
Fails spectacularly
Operating system
Distributed Replicated Block Device (DRBD)
SAN in Software
Database physical
WAL based: Log shipping (≥ v 8.3)
WAL based: Streaming replication (≥ v 9.0)
Database logical
PGDay.ch’18: Harald Armin Massa → 11:00
FOSDEM’18: Magnus Hagander
App-in-db
Slony-I (trigger based)
Application
Introduction: Postgres Write Ahead Log

Before committing any transaction (i.e. setting its state to COMMITTED in the clog), the transaction is written to the WAL and flushed to disk
One big virtual file (16 EB)
Divided into logical files (4 GB)
Divided into segments (16 MB)
This is what you see on your disk
pg_xlog/ 0000000A 0000083E 000000B1
         timeline  block    segment

Divided into pages (8 KB)


Contains xlog records with transaction data
Log Sequence Number (LSN) is a byte address in WAL
SELECT pg_current_xlog_location();   -- 83E/B18FE7C0
Address 8FE7C0 in segment 0000000A0000083E000000B1
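For example, the pre-10 helper pg_xlogfile_name() maps an LSN to the segment file that holds it (result shown for the address above):
SELECT pg_xlogfile_name(pg_current_xlog_location());   -- 0000000A0000083E000000B1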
Introduction: Postgres Write Ahead Log

BEGIN; INSERT INTO foo VALUES('bar'); COMMIT;


Each page has a pg_lsn attribute:
Contains the LSN of the last xlog record which modified that page
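A sketch of how to look at that attribute with the pageinspect extension (assuming the table foo from above and superuser rights):
CREATE EXTENSION pageinspect;
SELECT lsn FROM page_header(get_raw_page('foo', 0));   -- LSN of the last change to page 0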
Recovery After a Crash Using the Write Ahead Log

Your server just crashed


After a restart:
Uncommitted data?
It’s lost.
Committed but not yet written to db?
Start replaying missing records from WAL
Where to start?
From the last checkpoint. Location saved in the pg_control file
pg_controldata /your/data/dir
Corrupted page writes?
full_page_writes = on
Insert complete backup of pages into WAL
That is what makes your WAL so big: ∼8 KB for each modified page
In short: Write Ahead Log is the D in ACID
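Illustration (output abbreviated; the value reuses the example LSN from earlier):
pg_controldata /your/data/dir | grep 'checkpoint location'
Latest checkpoint location:           83E/B18FE7C0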
Write Ahead Log and Streaming Replication

Idea: Copy WAL to other postgres servers


Remote server indefinitely replays from WAL
Log Shipping: "Just copy WAL segments"
Streaming Replication: Copy individual xlog records
Different levels of replication: synchronous_commit
off            Everywhere asynchronous
local          Locally synchronous, remote asynchronous
on             Wait until the remote server has written to WAL
remote_apply   Wait until the remote server has committed

synchronous_standby_names


Tradeoff: Safety vs Performance
Tunable on transaction level
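For example, a single non-critical transaction can trade safety for speed (a small sketch):
BEGIN;
SET LOCAL synchronous_commit = 'off';   -- applies to this transaction only
INSERT INTO foo VALUES ('bar');
COMMIT;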
Postgres Streaming Replication Benefits

Built-in
Easy to set up
Hard to break
Easy monitoring: All or nothing
SELECT * FROM pg_stat_replication;
pid              | 20841
usename          | repmgr
application_name | db02                 remote server
backend_xmin     | 294106915
state            | streaming            OK
sent_location    | 83E/F92947F0
write_location   | 83E/F92947F0         in memory
flush_location   | 83E/F92947F0         on disk
replay_location  | 83E/F92947B8         applied to db
sync_state       | async
[...]
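A related monitoring sketch (pre-10 function names, matching the columns above) that expresses the replay lag in bytes:
SELECT application_name,
       pg_xlog_location_diff(sent_location, replay_location) AS lag_bytes
FROM pg_stat_replication;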
Streaming Replication: Easy Setup

Prepare primary:
postgresql.conf
listen_addresses = '192.168.0.10'
max_wal_senders ≥ #nodes + 2
wal_level = replica
wal_log_hints = on              for pg_rewind
Special user:
CREATE ROLE repuser WITH REPLICATION

Don't forget pg_hba.conf and your firewall
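A minimal pg_hba.conf sketch for the replication connections (subnet and auth method are assumptions):
host    replication    repuser    192.168.0.0/24    md5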


Prepare standby:
pg_basebackup -h primary -P -U repuser -X stream -R
postgresql.conf:
hot_standby = on
Adjust recovery.conf
Done. OK, it is a bit more complicated than that, but not much
Cluster Management Solutions

At the end of the day: You want an easy failover solution.


Patroni
Focuses on automatic failover
Based on etcd / zookeeper
RepMgr
Wraps built-in commands
Focuses on manual failover
Automatic failover with repmgrd
Very slim
PAF (postgres automatic failover)
Focuses on automatic failover
Based on corosync / pacemaker
Using virtual IPs
Overview: RepMgr (Replication Manager)

https://repmgr.org/ (Source on GitHub)


Developed by 2ndQuadrant, written in C
Packaged for most distributions
Use 2ndQuadrant repository
Depending on your postgres version:
dnf install repmgr96    (or repmgr10, etc.)

Few dependencies to build from source


Well documented
Only manual failover (i.e. switchover)
Tuneable to automatic failover
Plays well with BarMan (Backup and Recovery Manager)
Setting up RepMgr on Primary

Start with your primary postgres node


Create repmgr user (superuser or replication privilege)
createuser -s repmgr

Create db for metadata


createdb repmgr -O repmgr

Adjust pg_hba.conf
Allow repmgr user to connect to its db, local and remotely
Prepare repmgr.conf
node_id = 1
node_name = db01                don't use role names
conninfo = 'host=db01.olmero.ch user=repmgr dbname=repmgr'
RepMgr Usage: Start a Cluster

General pattern: repmgr [options] <object> <verb>

object ∈ {primary, standby, node, cluster, witness}
verb ∈ {register, clone, follow, switchover, check, show, ...}
Register primary node
repmgr primary register
Installs some extensions
Adds entry to repmgr database
SELECT * FROM repmgr.nodes;
node_id          | 1
upstream_node_id |
active           | t
node_name        | db01
type             | primary
location         | default
priority         | 30
conninfo         | host=db01.olmero.ch dbname=repmgr user=repmgr
repluser         | repmgr
slot_name        |
config_file      | /etc/repmgr.conf
RepMgr Usage: Adding Nodes to Your Cluster

Start with empty data directory


Copy and modify repmgr.conf from primary:
node_id = 2
node_name = db02
conninfo = 'host=db02.olmero.ch user=repmgr dbname=repmgr'

Clone primary server


repmgr -h db01.olmero.ch standby clone

Executes a basebackup
pg_basebackup -h node1 -U repmgr -X stream

Prepares recovery.conf
RepMgr Usage: Adding Nodes to Your Cluster (cont)

recovery.conf:
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=db01.olmero.ch user=repmgr application_name=db02'
restore_command = '/usr/bin/barman-wal-restore barman olmero %f %p'

Start postgres server - Done.


Streaming replication is running
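Typically the new standby is then also registered, so it shows up in the repmgr metadata and in cluster show (sketch; -f points to the repmgr.conf from above):
repmgr -f /etc/repmgr.conf standby register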
RepMgr Usage: Change Primary

View your cluster: (run on any node)


repmgr cluster show

ID | Name | Role    | Status    | Upstream | Location
---+------+---------+-----------+----------+---------
1  | db01 | primary | * running |          | default
2  | db02 | standby |   running | db01     | default
3  | db03 | standby |   running | db01     | default

Switch over to a new primary: (run on the node that will become the new primary)


repmgr standby switchover
You want to start with a healthy cluster
Shut down old primary (service_stop_command)
Promote local node (service_promote_command)
pg_rewind old primary
Restart and rejoin old primary
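It can be worth checking the preconditions first; repmgr 4.x offers a dry-run mode and can repoint the other standbys in one go (sketch):
repmgr standby switchover --dry-run
repmgr standby switchover --siblings-follow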
Manual Failover with RepMgr

Promote a standby:
Make sure your old primary is dead and will stay dead
Choose a standby and run
repmgr standby promote
Calls service_promote_command from repmgr.conf
Change the upstream node for your other standbys
repmgr standby follow

Tell your applications about the new master


Use a connection pooler to separate your application and database
For example: PgBouncer
Your old primary is trashed
Delete and clone from new primary
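A sketch of re-adding the trashed node once db02 is the new primary (--force overwrites the old data directory):
repmgr -h db02.olmero.ch -U repmgr -d repmgr standby clone --force
Then start postgres and register the node again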
Automatic Failover with RepMgr: Overview

A repmgrd runs on each postgres node


repmgrd uses metadata table from repmgr db
It knows your postgres cluster
But it is not aware of other repmgrds
The repmgrds are not a cluster themselves (unlike etcd)
repmgrd PQpings the cluster's primary and its "local" node
On failure: repmgrd on a standby promotes its local node
Automatic Failover with RepMgr: Configuration

Shared configuration: /etc/repmgr.conf


failover = automatic
priority = 100
reconnect_attempts = 10
reconnect_interval = 20
promote_command = repmgr standby promote      # No

Latest LSN overrules priority


No fencing! Only rudimentary checks are done
Use a wrapper to do all the logic:
promote_command = /your/fancy/failover/script.py

STONITH in software
Eventually call repmgr standby promote
In doubt, leave it out
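A deliberately minimal wrapper sketch (hypothetical host and service name; the real fencing logic is site-specific):
#!/bin/sh
# 1. Fence the old primary so it cannot come back as a second master
ssh root@db01.olmero.ch 'systemctl stop postgresql-9.6' || exit 1
# 2. Only then promote the local node
exec repmgr standby promote -f /etc/repmgr.conf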
BarMan: Backup and Recovery Manager

https://www.pgbarman.org/
Developed by 2ndQuadrant, written in Python 2
Packaged for most distributions
dnf install barman
dnf install barman-cli (on your postgres nodes)
Physical backups
Fast recovery
Point In Time Recovery (PITR)
No logical backups
Onsite and offsite backups possible
Restore functionality
BarMan: Overview

Think: "A postgres node without postgres"


Copies your data directory
pg_basebackup
rsync
Uses streaming replication for continuous WAL archiving
pg_receivexlog
On barman's disk:
/data1/barman/olmero/base:
20180626T013002/            your data dir
20180627T013002/

/data1/barman/olmero/wals:
[...]
0000002E0000084B/           all wal segments
0000002E0000084C/
0000002E0000084D/
0000002E.history
BarMan: Configuration

Everything in barman.conf
[olmero]
conninfo = host=db01.olmero.ch user=barman dbname=postgres
streaming_conninfo = host=db01.olmero.ch user=barman

backup_method = rsync
ssh_command = ssh postgres@db01.olmero.ch -c arcfour
reuse_backup = link
parallel_jobs = 4

streaming_archiver = on          ; stream wals
slot_name = barman01             ; use a replication slot

Point barman to your postgres primary


Additionally:
Passwordless SSH login
DB connection with replication privilege
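A sketch of those prerequisites (user and host names are assumptions):
createuser -s barman                       # superuser, or at least REPLICATION privilege
ssh-copy-id postgres@db01.olmero.ch        # passwordless SSH from the barman host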
BarMan: Commandline Usage

barman backup olmero


Basebackup via rsync
Start pg_receivexlog
barman list backups olmero
20180627 Wed Jun 27 04:40:39 - Size: 468.3 GiB - WAL Size: 8.5 GiB
20180626 Tue Jun 26 04:58:48 - Size: 468.4 GiB - WAL Size: 9.5 GiB

barman check olmero --nagios


BARMAN OK - Ready to serve the Espresso backup for olmero
barman replication-status olmero
Pretty print "SELECT * FROM pg_stat_replication;"
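These commands are usually driven from cron rather than a daemon; a hypothetical /etc/cron.d/barman entry:
* * * * *  barman  barman cron             # WAL archiving and maintenance, every minute
0 1 * * *  barman  barman backup olmero    # nightly base backup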
BarMan: How to Restore a Backup

Restore from backup:


barman recover olmero latest /data/dir
       --remote-ssh-command "ssh postgres@db01"
       <recovery-target>

Use appropriate recovery target


--target-time "Wed Jan 01 09:30:00 2018"
--target-xid 128278783
--target-name "foo"            # SELECT pg_create_restore_point('foo')
--target-immediate             # only recover base backup

Restores basebackup via rsync


Prepares recovery.conf:
barman-wal-restore -U barman barman01 olmero %f %p
Start your postgres server
BarMan and Failover

Barman has no daemons, no extra processes


Everything is a cron job
Barman is not aware of your cluster
Check regularly for a new primary
You have to write a custom script
Adjust config
Start streaming from new primary
barman receive-wal --create-slot olmero
barman switch-wal olmero
If your primary changed
Timeline will change, no confusion in wal segments
Make a new basebackup
Wrap up - Picture at OLMeRO

repmgr as a wrapper around built-in features


Very flexible, very slim
BYOS: You have to bring your own failover logic
This is very hard
Plays well with barman
Thank You

Questions and Discussion
