Galera Cluster Best Practices
Seppo Jaakola
Codership
Agenda
● Galera Cluster Short Introduction
● Multi-master Conflicts
● State Transfers (SST, IST)
● Backups
● Schema Upgrades
● Galera Project
www.codership.com
Galera Cluster
Multi-Master Replication
[Diagram: three MySQL nodes connected by Galera Replication, each serving read & write clients]
● There can be several nodes
● Client can connect to any node
● Read & write access to any node
● Replication is synchronous
● Multi-master cluster looks like one big database with multiple entry points
Galera Cluster
➢ Synchronous multi-master cluster
➢ For MySQL/InnoDB
➢ 3 or more nodes needed for HA
➢ Automatic node provisioning
➢ Works in LAN / WAN / Cloud
Synchronous Replication
[Diagram: three MySQL nodes connected by Galera Replication]
1. Read & write: transaction is processed locally up to commit time
2. Commit: transaction is replicated to whole cluster
3. OK: client gets OK status
4. Transaction is applied in slaves
Dealing with Multi-Master Conflicts
Multi-master Conflicts
[Diagram: two clients write through different MySQL nodes; Galera Replication detects the conflict; one write gets OK, the other gets a deadlock error]
Multi-Master Conflicts
● Galera uses optimistic concurrency control
● If two transactions modify same row on
different nodes at the same time, one of
the transactions must abort
➔ Victim transaction will get deadlock error
● Application should retry deadlocked
transactions, however not all applications
have retrying logic inbuilt
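The retrying logic mentioned above can be sketched as follows. This is a minimal illustration, not a driver-specific recipe: `DeadlockError` and `run_with_retry` are hypothetical names, and a real application would catch whatever exception its MySQL driver raises for error 1213 (ER_LOCK_DEADLOCK).

```python
import random
import time

# MySQL reports a cluster conflict to the victim as a deadlock (error 1213).
ER_LOCK_DEADLOCK = 1213


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception."""
    errno = ER_LOCK_DEADLOCK


def run_with_retry(transaction, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run transaction() and replay it when it is aborted as a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the deadlock
            # Back off a little, with jitter, before replaying the transaction.
            sleep(base_delay * attempt * (1 + random.random()))
```

The callable passed in must contain the whole transaction (begin, statements, commit), so that a retry replays all of it rather than only the failed statement.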
Database Hot-Spots
● Some rows that many transactions want to
write to simultaneously
● Patterns like queue or ID allocation can be
hot-spots
Hot-Spots
[Diagram: several concurrent writes targeting the same hot row]
Diagnosing Multi-Master Conflicts
● In the past Galera did not log much information
from cluster wide conflicts
● But, by using wsrep_debug configuration, all
conflicts (...and plenty of other information) will
be logged
● Next release will add new variable:
wsrep_log_conflicts which will cause each cluster
conflict to be logged in mysql error log
● Monitor:
● wsrep_local_bf_aborts
● wsrep_local_cert_failures
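One way to monitor the two counters is to sample them periodically and look at how much they grew. The sketch below assumes the status rows have already been fetched (for example with SHOW STATUS LIKE 'wsrep_local%') as (name, value) pairs; `parse_status` and `conflict_deltas` are hypothetical helper names.

```python
CONFLICT_COUNTERS = ("wsrep_local_bf_aborts", "wsrep_local_cert_failures")


def parse_status(rows):
    """Turn (Variable_name, Value) rows into a dict keyed by variable name."""
    return {name: value for name, value in rows}


def conflict_deltas(before, after, counters=CONFLICT_COUNTERS):
    """Return how much each conflict counter grew between two snapshots."""
    return {name: int(after[name]) - int(before[name]) for name in counters}


# Illustrative values only, not measurements from a real cluster.
earlier = parse_status([("wsrep_local_bf_aborts", "10"),
                        ("wsrep_local_cert_failures", "4")])
later = parse_status([("wsrep_local_bf_aborts", "17"),
                      ("wsrep_local_cert_failures", "4")])
growth = conflict_deltas(earlier, later)
```

A counter that keeps growing between samples suggests that the node is regularly losing multi-master conflicts.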
wsrep_retry_autocommit
● Galera can retry autocommit transaction on
behalf of the client application, inside of the
MySQL server
● MySQL will not return deadlock error, but will
silently retry the transaction
● wsrep_retry_autocommit=n will retry the
transaction n times before giving up and
returning deadlock error
● Retrying applies only to autocommit
transactions, as retrying is not safe for
multi-statement transactions
Retry Autocommit
[Diagram: two writes arrive through different MySQL nodes on Galera Replication]
1. Conflict detected
2. Retrying
Multi-Master Conflicts
1) Analyze the hot-spot
2) Check if application logic can be changed to
catch deadlock exception and apply retrying
logic in application
3) Try if wsrep_retry_autocommit configuration
helps
4) Limit the number of master nodes or change
completely to master-slave model
If you can filter out the access to the hot-spot
table, it is enough to treat writes only to the
hot-spot table as master-slave
State Transfers
State Transfer
➢ Joining node needs to get the current
database state
➢ Two choices:
➢ IST: incremental state transfer
➢ SST: full state transfer
➢ If joining node had some previous state
and gcache spans to that, then IST can
be used
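The choice between the two transfers can be pictured with the toy function below. It is a sketch under stated assumptions, not Galera's actual code: the function name, its parameters, and the exact boundary conditions are hypothetical, and it only encodes the rule above that IST is possible when the joiner has previous state and the donor's gcache still reaches back to it.

```python
def choose_state_transfer(joiner_uuid, joiner_seqno, cluster_uuid,
                          gcache_first_seqno, gcache_last_seqno):
    """Return "IST" if the donor's gcache covers what the joiner is missing."""
    # No previous state from this cluster: only a full snapshot will do.
    if joiner_uuid != cluster_uuid or joiner_seqno < 0:
        return "SST"
    next_needed = joiner_seqno + 1
    # The gcache must still hold every write set from next_needed onwards.
    if gcache_first_seqno <= next_needed <= gcache_last_seqno + 1:
        return "IST"
    return "SST"
```

For example, a joiner that stopped at seqno 1000 can use IST from a donor whose gcache holds seqnos 900 to 1500, but needs SST if the gcache starts at 1200.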
State Snapshot Transfer
● To send full database state
● wsrep_sst_method to choose the method:
➢ mysqldump
➢ rsync
➢ xtrabackup
SST Request
[Diagram: the joiner sends an SST request, carrying its wsrep_sst_method, to the MySQL nodes over Galera Replication]
SST Method
[Diagram: the donor transfers state to the joiner with one of wsrep_sst_mysqldump, wsrep_sst_rsync or wsrep_sst_xtrabackup]
SST API
● SST is open API for shell scripts
● Anyone can write custom SST
● SST API can be used e.g. for:
● Backups
● Filtering out part of database
wsrep_sst_mysqldump
● Logical backup
● Slowest method
● Configure authentication
➢ wsrep_sst_auth="root:rootpass"
➢ SUPER privilege needed
● Make sure SST user in donor node can
take mysqldump from donor and load it
over the network to joiner node
● You can try this manually beforehand
wsrep_sst_rsync
● Physical backup
● Fast method
● Can only be used when node is starting
➢ Rsyncing the data directory under a running
InnoDB is not possible
wsrep_sst_xtrabackup
● Contributed by Percona
● Probably the fastest method
● Uses xtrabackup
● Least blocking on donor side (a short
read lock is still used when backup starts)
SST Donor
● All SST methods cause some disturbance
for donor node
● By default donor accepts client
connections, although committing will be
prohibited for a while
● If wsrep_sst_donor_rejects_queries is set,
donor gives unknown command error to
clients
➔ Best practice is to dedicate a reference
node for donor and backup activities
Incremental State Transfer
[Diagram: donor and joiner, each with its own gcache]
1. Joiner sends a request to join with GTID: seqno-n
2. Donor finds seqno-n in its gcache
3. Donor sends IST events and the joiner applies them
Incremental State Transfer
● Very effective
● gcache.size parameter defines how big
cache will be maintained
● Gcache is mmap, available disk space is
upper limit for size allocation
Incremental State Transfer
● Use database size and write rate to
optimize gcache:
➢ gcache < database
➢ Write rate tells how long tail will be
stored in cache
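The sizing rule above is simple arithmetic: the cache size divided by the write rate gives the length of the tail that stays available for IST. A rough sketch follows, with hypothetical function names and the simplifying assumption that the replicated write rate is constant.

```python
def gcache_window_seconds(gcache_size_bytes, write_rate_bytes_per_sec):
    """How long a tail of the transaction stream fits in the gcache."""
    if write_rate_bytes_per_sec <= 0:
        raise ValueError("write rate must be positive")
    return gcache_size_bytes / write_rate_bytes_per_sec


def gcache_size_for_outage(outage_seconds, write_rate_bytes_per_sec,
                           database_size_bytes):
    """Cache size that covers a given outage, capped at the database size."""
    needed = outage_seconds * write_rate_bytes_per_sec
    # A gcache larger than the database gains nothing over a full SST.
    return min(needed, database_size_bytes)


MIB = 1024 ** 2
GIB = 1024 ** 3
window = gcache_window_seconds(1 * GIB, 1 * MIB)  # 1 GiB cache at 1 MiB/s
```

With these illustrative numbers, a node may stay away for about 17 minutes (1024 seconds) and still rejoin with IST.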
Incremental State Transfer
● You can think of IST as a short
asynchronous replication session
● If communication is of bad quality, a node
can drop out and join back fast with IST
Backups
Backups
➢ All Galera nodes are constantly up to date
➢ Best practices:
➢ Dedicate a reference node for backups
➢ Assign global trx ID with the backup
➢ Possible methods:
1. Disconnecting a node for backup
2. Using SST script interface
3. xtrabackup
Backups with global Trx ID
➢ Global transaction ID (GTID) marks a
position in the cluster transaction stream
➢ A backup with a known GTID makes it possible
to utilize IST when joining new nodes, e.g.
when:
➢ Recovering the node
➢ Provisioning new nodes
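As an illustration of tagging a backup with its position, the sketch below builds a GTID from a status snapshot and renders it in the uuid:seqno form. The class and function names are hypothetical, and the status keys follow the names used in this deck (wsrep_cluster_uuid, wsrep_last_committed), which may differ between versions.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalTrxId:
    """A position in the cluster transaction stream."""
    cluster_uuid: str
    seqno: int

    def __str__(self):
        return f"{self.cluster_uuid}:{self.seqno}"


def gtid_from_status(status):
    """Build a GTID from a dict of status variables (key names assumed)."""
    return GlobalTrxId(status["wsrep_cluster_uuid"],
                       int(status["wsrep_last_committed"]))


def parse_gtid(text):
    """Parse "uuid:seqno" back into a GlobalTrxId."""
    cluster_uuid, _, seqno = text.rpartition(":")
    uuid.UUID(cluster_uuid)  # raises ValueError if the UUID is malformed
    return GlobalTrxId(cluster_uuid, int(seqno))
```

Storing the rendered string next to the backup files keeps the position available when the backup is later used to recover or provision a node.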
Backup by Disconnecting a Node
[Diagram: load balancing in front of three MySQL nodes on Galera Replication]
1. Isolate the backup node
2. Disconnect from group, e.g. clear wsrep_provider
3. Work your backup magic
4. Read global transaction ID from status and assign to backup:
wsrep_cluster_uuid
wsrep_last_committed
Backup by SST
● Donor mode provides isolated processing
environment
● A special SST script can be written just to
prepare backup in donor node:
wsrep_sst_backup
● Garbd can be used to trigger donor node to run
the wsrep_sst_backup
Backup by SST API
[Diagram: load balancing in front of node1, node2 and node3 on Galera Replication, plus garbd]
1. Launch garbd, which sends an SST request with:
wsrep_sst_donor=node3
wsrep_sst_method=backup
2. Donor launches wsrep_sst_backup
3. wsrep_sst_backup prepares the backup, tagged with the GTID
4. Backup node returns to cluster
Backup by xtrabackup
● Xtrabackup is a hot backup method and can be
used anytime
● Simple, efficient
● Use the --galera-info option to get the global
transaction ID logged into a separate galera info file
Schema Upgrades
Schema Upgrades
● DDL is non-transactional, and therefore
bad
● Galera has two methods for DDL
● TOI, Total Order Isolation
● RSU, Rolling Schema Upgrade
● Use wsrep_osu_method to choose either
option
Total Order Isolation
● DDL is replicated up-front
● Each node will get the DDL statement
and must process the DDL at same slot
in transaction stream
● Galera will isolate the affected
table/database for the duration of DDL
processing
Rolling Schema Upgrade
● DDL is not replicated
● Galera will take the node out of replication
for the duration of DDL processing
● When the DDL is done, the node will catch up
with missed transactions (like IST)
● DBA should roll RSU operation over all
nodes
● Requires backwards compatible schema
changes
wsrep_on=OFF
● wsrep_on is a session variable telling if this
session will be replicated or not
● I tried to hide this information as best I could,
but somebody has leaked it out
● And so, yes, it is possible to run
“poor man's RSU” with wsrep_on set to OFF
● such session may be aborted by replication
● Use only, if you are really sure that:
● planned SQL is not conflicting
● SQL will not generate inconsistency
Schema Upgrades
● Best practices:
➔ Plan your upgrades
➔ Try to be backwards compatible
➔ Rehearse your upgrades
➔ Find out DDL execution time
➔ Go for RSU if possible
Consistent Reads
Consistent reads
Replication is virtually synchronous...
[Diagram: on commit, the transaction is replicated to the whole cluster of three MySQL nodes over Galera Replication]
Consistent reads
1. Insert into t1 values (1,....)
2. Select from t1 where i=1
Will the select see the inserted row?
[Diagram: two MySQL nodes connected by Galera Replication]
Consistent Reads
● Aka read causality
● There is causal dependency between
operations on two database connections
● Application is expecting to see the values of
earlier write
Consistent Reads
● Use: wsrep_causal_reads=ON
➔ Every read (select, show) will wait until slave
queue has been fully applied
● There is timeout for max causal read wait:
● replicator.causal_read_keepalive
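The waiting behaviour can be pictured with a toy model. This is not how Galera implements it; the `Node` class and its methods are hypothetical and only show the idea that a read blocks until the node has applied everything the cluster had committed when the read arrived, or until a timeout expires.

```python
import threading


class Node:
    """Toy model of a node that applies replicated write sets in order."""

    def __init__(self):
        self._applied = 0
        self._cond = threading.Condition()

    def apply(self, seqno):
        """Record that write sets up to seqno have been applied."""
        with self._cond:
            self._applied = max(self._applied, seqno)
            self._cond.notify_all()

    def causal_read(self, cluster_seqno, read, timeout):
        """Wait until cluster_seqno has been applied, then run read()."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied >= cluster_seqno, timeout=timeout)
        if not caught_up:
            raise TimeoutError("causal read wait timed out")
        return read()
```

The cost of the guarantee is visible in the model: every read may pay the latency of the slave queue, which is why the timeout exists.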
Other Tidbits...
Parallel Applying
● Aka parallel replication
● “true parallel applying”
● Every application will benefit from it
● Works not on database level, not on table level,
but on row level
● wsrep_slave_threads=n
● How many slaves makes sense:
● Monitor wsrep_cert_deps_distance
● Max 2 * cores
MyISAM Replication
● On experimental level
● MyISAM is being phased out, so there is not
much demand to complete it
● Replicates SQL up-front, like TOI
● Should be used in master-slave model
● No checks for non-deterministic SQL
● Insert into t (r, time) values (rand(), now());
SSL / TLS
● Replication over SSL is supported
● No authentication (yet), only encryption
● Whole cluster must use SSL
SSL or VPN
● Bundling several nodes through VPN
tunnel may cause a vulnerability
● When VPN gateway breaks, a big part of
cluster will be blacked out
● Best practice is to go for SSL if VPN does
not have alternative routes
UDP Multicast
● Configure with gmcast.mcast_addr
● The whole cluster must be configured for
either multicast or TCP sockets
● Multicast is good for scalability
● Best practice is to go for multicast if
planning for large clusters
Galera Project
Galera Project
● Galera Cluster for MySQL
● 5 years development
● based on MySQL server community edition
● Fully open source
● Active community
● ~3 releases per year
● Release 2.2 RC out yesterday
● Major release 3.0 in the works
● Galera Replication also used in:
● Percona XtraDB Cluster
● MariaDB Galera Cluster
Galera Project
[Diagram: Galera Cluster for MySQL, MariaDB Galera Cluster and Percona XtraDB Cluster are built on MySQL, MariaDB and Percona Server respectively, each through an API merge; all three use the same Galera Replication plugin]
Questions?
Thank you for listening!
Happy Clustering :-)