Galera Cluster Best Practices Guide

The document discusses best practices for Galera cluster, a synchronous multi-master database cluster for MySQL. It covers topics like dealing with conflicts that can occur from concurrent writes to the same data on different nodes, performing state transfers when a new node joins the cluster, and handling database backups and schema upgrades within the Galera cluster.

Uploaded by

Jackey Lin
Galera Cluster Best Practices

Seppo Jaakola
Codership
Agenda

● Galera Cluster Short Introduction
● Multi-master Conflicts
● State Transfers (SST, IST)
● Backups
● Schema Upgrades
● Galera Project

www.codership.com

Galera Cluster
Multi-Master Replication

● There can be several nodes
● Client can connect to any node
● Read & write access to any node
● Replication is synchronous
● Multi-master cluster looks like one big database with multiple entry points

Galera Cluster

➢ Synchronous multi-master cluster
➢ For MySQL/InnoDB
➢ 3 or more nodes needed for HA
➢ Automatic node provisioning
➢ Works in LAN / WAN / Cloud

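The three-node minimum can be illustrated with a little arithmetic. This is a sketch only: it assumes the surviving part of the cluster must hold a strict majority of equally weighted nodes, which the slide does not spell out, and the function names are made up for illustration.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if a group of `reachable_nodes` holds a strict majority of the cluster."""
    return 2 * reachable_nodes > cluster_size


def tolerated_failures(cluster_size: int) -> int:
    """Largest number of nodes that can fail while the rest still hold a majority."""
    return (cluster_size - 1) // 2
```

Under this assumption a two-node cluster tolerates no failures, because the surviving node is exactly half of the cluster and not a majority, while three nodes tolerate one failure.
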
Synchronous Replication

1. Transaction is processed locally (read & write) up to commit time
2. At commit, transaction is replicated to whole cluster
3. Client gets OK status
4. Transaction is applied in slaves

Dealing with Multi-Master Conflicts
Multi-master Conflicts

● Two clients write to the same data on different nodes
● Conflict detected
● One write gets OK status, the other gets a deadlock error

Multi-Master Conflicts

● Galera uses optimistic concurrency control
● If two transactions modify the same row on different nodes at the same time, one of the transactions must abort
➔ Victim transaction will get a deadlock error
● Applications should retry deadlocked transactions; however, not all applications have retry logic built in

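Application-side retrying can be sketched as below. This is a minimal illustration, not a driver-specific recipe: `DatabaseError` is a stand-in for whatever exception your MySQL driver raises, and `run_with_retry` and its parameters are hypothetical names. The only fact borrowed from MySQL is error code 1213 (ER_LOCK_DEADLOCK).

```python
import random
import time

ER_LOCK_DEADLOCK = 1213  # MySQL error code for a deadlock victim


class DatabaseError(Exception):
    """Stand-in for a driver exception; real drivers expose the code differently."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_retry(txn, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run txn() and retry it when it fails with a deadlock error.

    txn must execute the whole transaction (begin .. commit),
    so that every retry starts again from the first statement.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DatabaseError as exc:
            # Give up on other errors, or once the attempts are used up.
            if exc.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Randomised, growing pause so competing writers do not collide again at once.
            sleep(base_delay * attempt * random.random())
```

The whole transaction is passed in as one callable because a deadlock aborts everything done so far; re-running only the last statement would not be safe.
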
Database Hot-Spots

● Hot-spots are rows that many transactions want to write to simultaneously
● Patterns like queues or ID allocation can be hot-spots

Diagnosing Multi-Master Conflicts

● In the past, Galera did not log much information about cluster-wide conflicts
● However, with the wsrep_debug configuration, all conflicts (...and plenty of other information) will be logged
● The next release will add a new variable, wsrep_log_conflicts, which will cause each cluster conflict to be logged in the MySQL error log
● Monitor:
  ● wsrep_local_bf_aborts
  ● wsrep_local_cert_failures

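Both counters only ever grow, so what matters is how fast they grow. A minimal sketch of that comparison follows; it assumes you have already read two status snapshots into dictionaries (for example from SHOW GLOBAL STATUS), and `conflict_rates` is a hypothetical helper name.

```python
CONFLICT_COUNTERS = ("wsrep_local_bf_aborts", "wsrep_local_cert_failures")


def conflict_rates(before, after, interval_seconds):
    """Per-second increase of each conflict counter between two status snapshots.

    `before` and `after` map status variable names to values; values may be
    strings, as status output usually is.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        name: (int(after[name]) - int(before[name])) / interval_seconds
        for name in CONFLICT_COUNTERS
    }
```

A rate that stays near zero suggests conflicts are rare; a steadily rising rate points at a hot-spot worth analyzing.
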
wsrep_retry_autocommit

● Galera can retry an autocommit transaction on behalf of the client application, inside the MySQL server
● MySQL will not return a deadlock error, but will silently retry the transaction
● wsrep_retry_autocommit=n will retry the transaction n times before giving up and returning a deadlock error
● Retrying applies only to autocommit transactions, as retrying is not safe for multi-statement transactions

Retry Autocommit

1. Conflict detected on write
2. Retrying

Multi-Master Conflicts

1) Analyze the hot-spot
2) Check if application logic can be changed to catch the deadlock exception and apply retrying logic in the application
3) Try whether the wsrep_retry_autocommit configuration helps
4) Limit the number of master nodes, or change completely to a master-slave model
   If you can filter out the access to the hot-spot table, it is enough to treat writes only to the hot-spot table as master-slave

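Step 4 can be sketched as a routing rule: writes that touch a hot-spot table always go to one designated node, everything else is spread over all nodes. The sketch below is deliberately naive. The regular expression only recognises simple single-table statements, and `pick_node`, the node names and the table name in the usage note are all hypothetical.

```python
import re

# Captures the target table of simple INSERT / REPLACE / UPDATE / DELETE statements.
_WRITE_TABLE = re.compile(
    r"^\s*(?:INSERT\s+INTO|REPLACE\s+INTO|UPDATE|DELETE\s+FROM)\s+`?(\w+)`?",
    re.IGNORECASE,
)


def pick_node(sql, nodes, hot_tables, hot_master, counter):
    """Return the node that should execute `sql`.

    Writes to a table in `hot_tables` (lower-case names) go to `hot_master`;
    all other statements are distributed round-robin using `counter`.
    """
    match = _WRITE_TABLE.match(sql)
    if match and match.group(1).lower() in hot_tables:
        return hot_master
    return nodes[counter % len(nodes)]
```

For example, with nodes node1..node3 and a hot table job_queue, every write to job_queue lands on the chosen master, so those writes can no longer conflict across nodes, while reads and other writes keep using the whole cluster.
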
State Transfers
State Transfer

➢ Joining node needs to get the current database state
➢ Two choices:
  ➢ IST: incremental state transfer
  ➢ SST: full state transfer
➢ If the joining node had some previous state and gcache spans to that, then IST can be used

State Snapshot Transfer

● To send full database state
● wsrep_sst_method to choose the method:
  ➢ mysqldump
  ➢ rsync
  ➢ xtrabackup

SST Request

[Diagram: the joiner sends an SST request, carrying its wsrep_sst_method, to the cluster over Galera replication]

SST Method

[Diagram: the donor runs one of the scripts wsrep_sst_mysqldump, wsrep_sst_rsync or wsrep_sst_xtrabackup to transfer state to the joiner]

SST API

● SST is an open API for shell scripts
● Anyone can write a custom SST
● SST API can be used e.g. for:
  ● Backups
  ● Filtering out part of database

wsrep_sst_mysqldump

● Logical backup
● Slowest method
● Configure authentication
  ➢ wsrep_sst_auth="root:rootpass"
  ➢ Super privilege needed
● Make sure the SST user on the donor node can take a mysqldump from the donor and load it over the network to the joiner node
● You can try this manually beforehand

wsrep_sst_rsync

● Physical backup
● Fast method
● Can only be used when node is starting
  ➢ Rsyncing the data directory under a running InnoDB is not possible

wsrep_sst_xtrabackup

● Contributed by Percona
● Probably the fastest method
● Uses xtrabackup
● Least blocking on donor side (a short read lock is still used when backup starts)

SST Donor

● All SST methods cause some disturbance for the donor node
● By default, the donor accepts client connections, although committing will be prohibited for a while
● If wsrep_sst_donor_rejects_queries is set, the donor gives an unknown command error to clients
➔ Best practice is to dedicate a reference node for donor and backup activities

Incremental State Transfer

[Diagram: the joiner sends a request to join with its GTID (seqno-n); the donor finds seqno-n in its gcache and sends IST events, which the joiner applies]

Incremental State Transfer

● Very effective
● gcache.size parameter defines how big a cache will be maintained
● Gcache is mmap; available disk space is the upper limit for size allocation

Incremental State Transfer

● Use database size and write rate to optimize gcache:
  ➢ gcache < database
  ➢ Write rate tells how long a tail will be stored in cache

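The sizing rule comes down to simple arithmetic, sketched below. It assumes the write rate is measured as replicated write-set bytes per second (for instance estimated from the wsrep_replicated_bytes and wsrep_received_bytes status counters) and ignores any per-entry overhead inside gcache. Both function names are hypothetical.

```python
def gcache_window_seconds(gcache_size_bytes, write_rate_bytes_per_sec):
    """Approximate length, in seconds, of the write-set tail that fits in gcache."""
    if write_rate_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return gcache_size_bytes / write_rate_bytes_per_sec


def gcache_size_for_outage(outage_seconds, write_rate_bytes_per_sec, database_size_bytes):
    """gcache size that covers an outage of `outage_seconds`, capped at the database size.

    The cap follows the slide's "gcache < database" rule of thumb.
    """
    needed = outage_seconds * write_rate_bytes_per_sec
    return min(needed, database_size_bytes)
```

With a 1 GiB gcache and 1 MiB/s of writes, the cache holds roughly 1024 seconds of history, so under these assumptions a node that was away for longer than about 17 minutes would need SST instead of IST.
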
Incremental State Transfer

● You can think of IST as a short asynchronous replication session
● If communication is of bad quality, a node can drop and join back fast with IST

Backups
Backups

➢ All Galera nodes are constantly up to date
➢ Best practices:
  ➢ Dedicate a reference node for backups
  ➢ Assign global trx ID with the backup
➢ Possible methods:
  1. Disconnecting a node for backup
  2. Using SST script interface
  3. xtrabackup

Backups with global Trx ID

➢ Global transaction ID (GTID) marks a position in the cluster transaction stream
➢ A backup with a known GTID makes it possible to utilize IST when joining new nodes, e.g. when:
  ➢ Recovering the node
  ➢ Provisioning new nodes

Backup by Disconnecting a Node

1. Isolate the backup node from load balancing
2. Disconnect from group, e.g. clear wsrep_provider
3. Work your backup magic
4. Read global transaction ID from status and assign to backup:
   ● wsrep_cluster_uuid
   ● wsrep_last_committed

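Tagging a backup with its position can be sketched as follows. The sketch assumes the GTID is written as the cluster UUID and the sequence number joined by a colon, and it takes the status variable names from the slide; the exact names may differ between Galera versions (for example wsrep_cluster_state_uuid), so the keys are parameters. The helper names and the UUID in the usage note are made up.

```python
def backup_gtid(status, uuid_key="wsrep_cluster_uuid", seqno_key="wsrep_last_committed"):
    """Build a 'uuid:seqno' string from a dict of wsrep status values."""
    uuid = status[uuid_key]
    seqno = int(status[seqno_key])
    return f"{uuid}:{seqno}"


def parse_gtid(gtid):
    """Split a 'uuid:seqno' string back into (uuid, seqno)."""
    uuid, _, seqno = gtid.rpartition(":")
    return uuid, int(seqno)
```

For a status snapshot with UUID 6b1f0d2a-0000-11e2-0800-aabbccddeeff and last committed seqno 1234, the label stored next to the backup would be 6b1f0d2a-0000-11e2-0800-aabbccddeeff:1234.
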
Backup by SST

● Donor mode provides an isolated processing environment
● A special SST script can be written just to prepare a backup on the donor node: wsrep_sst_backup
● Garbd can be used to trigger the donor node to run wsrep_sst_backup

Backup by SST API

1. Launch garbd; it sends an SST request with wsrep_sst_donor=node3 and wsrep_sst_method=backup
2. Donor (node3) launches wsrep_sst_backup
3. wsrep_sst_backup prepares the backup, together with the GTID
4. Backup node returns to cluster

Backup by xtrabackup

● Xtrabackup is a hot backup method and can be used anytime
● Simple, efficient
● Use the --galera-info option to get the global transaction ID logged into a separate galera info file

Schema Upgrades
Schema Upgrades

● DDL is non-transactional, and therefore bad
● Galera has two methods for DDL
  ● TOI, Total Order Isolation
  ● RSU, Rolling Schema Upgrade
● Use wsrep_osu_method to choose either option

Total Order Isolation

● DDL is replicated up-front
● Each node will get the DDL statement and must process the DDL at the same slot in the transaction stream
● Galera will isolate the affected table/database for the duration of DDL processing

Rolling Schema Upgrade

● DDL is not replicated
● Galera will take the node out of replication for the duration of DDL processing
● When the DDL is done, the node will catch up with missed transactions (like IST)
● DBA should roll the RSU operation over all nodes
● Requires backwards compatible schema changes

wsrep_on=OFF

● wsrep_on is a session variable telling whether this session will be replicated or not
● I tried to hide this information as best I could, but somebody has leaked it
● And so, yes, it is possible to run a "poor man's RSU" with wsrep_on set to OFF
  ● Such a session may be aborted by replication
● Use only if you are really sure that:
  ● planned SQL is not conflicting
  ● SQL will not generate inconsistency

Schema Upgrades

● Best practices:
➔ Plan your upgrades
➔ Try to be backwards compatible
➔ Rehearse your upgrades
➔ Find out DDL execution time
➔ Go for RSU if possible

Consistent Reads
Consistent reads

● Replication is virtually synchronous: at commit, the transaction is replicated to the whole cluster
● Two statements sent over connections to different nodes:
  1. Insert into t1 values (1,....)
  2. Select from t1 where i=1
● Will the select see the inserted row?

Consistent Reads

● Aka read causality
● There is a causal dependency between operations on two database connections
● Application is expecting to see the values of an earlier write

Consistent Reads

● Use: wsrep_causal_reads=ON
  ➔ Every read (select, show) will wait until the slave queue has been fully applied
● There is a timeout for max causal read wait:
  ● replicator.causal_read_keepalive

Other Tidbits...
Parallel Applying

● Aka parallel replication
● "True parallel applying"
● Every application will benefit from it
● Works not on database level, not on table level, but on row level
● wsrep_slave_threads=n
● How many slaves make sense:
  ● Monitor wsrep_cert_deps_distance
  ● Max 2 * cores

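The two rules of thumb can be combined into one small calculation. This is a sketch under an assumption: wsrep_cert_deps_distance is read here as the average number of write-sets that could be applied in parallel, and `suggest_slave_threads` is a hypothetical helper, not part of Galera.

```python
import math
import os


def suggest_slave_threads(cert_deps_distance, cores=None):
    """Suggest a wsrep_slave_threads value, never above 2 * cores.

    cert_deps_distance: observed wsrep_cert_deps_distance status value.
    cores: CPU core count; detected from the local machine when omitted.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    # At least one applier thread, and no more than the workload can use.
    wanted = max(1, math.ceil(cert_deps_distance))
    return min(wanted, 2 * cores)
```

A workload with a distance of about 3 on an 8-core node would get 4 threads, while a distance of 24 on a 4-core node is capped at 8.
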
MyISAM Replication

● On experimental level
● MyISAM is phasing out, so there is not much demand to complete it
● Replicates SQL up-front, like TOI
● Should be used in master-slave model
● No checks for non-deterministic SQL, e.g.:
  ● Insert into t (r, time) values (rand(), now());

SSL / TLS

● Replication over SSL is supported
● No authentication (yet), only encryption
● Whole cluster must use SSL

SSL or VPN

● Bundling several nodes through a VPN tunnel may cause a vulnerability
● When the VPN gateway breaks, a big part of the cluster will be blacked out
● Best practice is to go for SSL if the VPN does not have alternative routes

UDP Multicast

● Configure with gmcast.mcast_addr
● The full cluster must be configured for either multicast or TCP sockets
● Multicast is good for scalability
● Best practice is to go for multicast if planning for large clusters

Galera Project
Galera Project
● Galera Cluster for MySQL
● 5 years development
● based on MySQL server community edition
● Fully open source
● Active community
● ~3 releases per year
● Release 2.2 RC out yesterday
● Major release 3.0 in the works
● Galera Replication also used in:
  ● Percona XtraDB Cluster
  ● MariaDB Galera Cluster

Galera Project

[Diagram: Galera Cluster for MySQL, MariaDB Galera Cluster and Percona XtraDB Cluster are built on MySQL, MariaDB and Percona Server respectively; each involves an API merge and uses the Galera Replication plugin]

Questions?

Thank you for listening!


Happy Clustering :-)
