MariaDB Galera Cluster

- Bad state snapshot transfers (SSTs) when a new node joins can cause inconsistencies if the data transferred is incorrect.
- Misuse of the cluster, such as violating transaction boundaries or failing to commit transactions, can also lead to inconsistencies.
- In a multi-master replication setup, inconsistencies will immediately violate business logic since any node can be written to or read from.

Uploaded by

Hector Lira

MariaDB Galera Cluster

Dealing with Inconsistency Issues

Seppo Jaakola
Codership
➢ Seppo Jaakola
➢ One of the founders of Codership
➢ Codership – Galera Replication developers
➢ Partner of MariaDB for developing and supporting MariaDB Galera Cluster
➢ Galera releases since 2009

www.galeracluster.com
Galera Project

[Diagram: two product lines, Galera Cluster for MySQL (MySQL Community edition plus WSREP API) and MariaDB Galera Cluster (MariaDB plus WSREP API, merged from R&D), both running on the Galera Replication Plugin.]

MariaDB Galera Cluster R&D

● MariaDB Galera Cluster (MGC) releases are based on MariaDB 5.5 and 10.0
● Since MariaDB 10.1, Galera is built into MariaDB
    ● If wsrep_provider is not specified, the server works as native MariaDB
    ● Galera will also be present in MariaDB 10.2, 10.3...

Agenda

● Master-Slave vs Multi-Master Replication Topologies
● Galera Replication Overview
● Reasons for Inconsistencies
    ● Bad SST
    ● Cluster misuse
● Effect – the Harm Done
● How to Detect Inconsistencies
● Inconsistency Recovery
● Galera Inconsistency Voting Protocol

Inconsistency in Master-Slave and Multi-Master Topologies

Master-Slave Async Replication

[Diagram: a MariaDB master writes its bin log; two MariaDB slaves each receive the events into a relay log. Clients read & write on the master; the slaves are read only.]

● The master node is the trusted data source and is used for recovery
● If slave nodes are not used for reading, there are no immediate inconsistency issues
● If slave nodes get reads, but the reads are not used for anything critical, there are no problems for business logic
● If reads from slave nodes are used as input for writes, inconsistency may violate business logic

Inconsistency for Master-Slave

● The master node is always the trusted data source
● Recovery from inconsistency is straightforward
● If slave nodes are not used for reading, an inconsistency does not cause an immediate problem
● If slaves are used for reading, the application will get false data
    ● This is safe only if these reads are not used for anything critical
● If slave read results are trusted and used as input for later write operations, the application's business logic will be violated as well

Multi-Master Replication

[Diagram: three MariaDB nodes joined by Galera Replication, each accepting reads & writes. From the outside, the cluster is drawn as a single MariaDB.]

● Read & write access to any node
● A client can connect to any node
● There can be several nodes
● Replication is synchronous
● A multi-master cluster looks like one big database with multiple entry points
● Adding more nodes opens new connection ports

Inconsistency for Multi-Master

● The effect of inconsistency is more prominent
● Inconsistency may immediately violate the business logic
● Reads from a compromised node will immediately hurt business logic

[Diagram: three read & write MariaDB nodes on Galera Replication, one of them compromised.]

Reasons for Inconsistency
Reasons for Inconsistency

● Bad SST
    ● A full State Snapshot Transfer (SST) is an operation external to the cluster, which happens through the SST API
    ● The cluster picks a donor node to help the joining node reach the same state as the active cluster
    ● 3 main variants: mysqldump, rsync and xtrabackup based versions (and more to come: MariaBackup)
    ● An improper SST leaves the joining node with an inconsistent database
● Cluster misuse
    ● wsrep_on, a session and global variable that requires SUPER privileges, tells whether the wsrep replication plugin should replicate the transaction or not
    ● sql_log_bin, a session variable, skips binlogging for the session in the master node. With no binlog events, Galera has nothing to replicate

Reasons for Inconsistency

● Replication filtering
    ● binlog_do*: filtered changes are not binlogged in the master node
    ● replicate_do*: the slave node does not apply filtered incoming events
● Slave lag
    ● Not a real inconsistency, but replication may be slow
    ● If reads see too old a version of the data, the consequence for the application is the same as having an inconsistent database
● Bug

Consistent Reads from a Cluster

● Galera Cluster orders transactions and forces them to commit in strictly the same order on every cluster node
● The cluster has built-in flow control, which keeps the nodes' commit pace very close to each other
    ● gcs.fc_limit, gcs.fc_factor
● If a client has only one connection (to one node), it will see and operate with safe data access

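The flow control mentioned above can be pictured with a toy model. This is only a sketch, assuming the commonly documented meaning of the two parameters: replication is paused when a node's receive queue exceeds gcs.fc_limit, and resumed once the queue drops below gcs.fc_limit * gcs.fc_factor. The class name FlowControl and the numeric values are hypothetical, not Galera code or recommended settings.

```python
from dataclasses import dataclass


@dataclass
class FlowControl:
    """Toy model of one node's flow-control switch (not Galera code)."""
    fc_limit: int        # pause when the receive queue exceeds this length
    fc_factor: float     # resume when the queue drops below fc_limit * fc_factor
    paused: bool = False

    def observe(self, queue_len: int) -> bool:
        """Update and return the paused state for the current queue length."""
        if not self.paused and queue_len > self.fc_limit:
            self.paused = True
        elif self.paused and queue_len < self.fc_limit * self.fc_factor:
            self.paused = False
        return self.paused


fc = FlowControl(fc_limit=16, fc_factor=0.5)   # illustrative values only
states = [fc.observe(n) for n in (10, 17, 12, 8, 7)]
print(states)
```

The gap between the pause and resume thresholds keeps the switch from flapping when the queue length hovers around the limit.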
Consistent Reads from a Cluster

[Diagram: two MariaDB nodes committing the same numbered sequence of transactions, 38 to 49.]

Consistent Reads from a Cluster

● If a client has several connections, or connects through a proxy or load balancer that can direct the client's connections to several cluster nodes, a read causality issue may occur

Read Causality

[Diagram: a client issues INSERT INTO mydata on one MariaDB node; the insert (INS) travels through Galera Replication to a second MariaDB node. Replication is "synchronous".]

● A SELECT FROM mydata sent to the second node may arrive before the replicated INSERT (INS) has been applied there
● There is causality between the INSERT and the SELECT
● MaxScale Proxy may use several connections and trigger a read causality issue

Read Causality

● Can be avoided with wsrep_sync_wait
    ● Makes the session wait for the slave queue to be flushed before issuing the next read
    ● Sync wait can be applied to:
        ● Select, show, begin, update, delete
    ● Makes all of the session's reads somewhat slower; it should be used only when read causality is really needed
● A read causality issue may look like inconsistency at first glance, but the cluster is totally consistent

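The effect of waiting for the slave queue can be sketched in a few lines. This is a simplified model, not Galera internals: the Node class, its methods and the sync_wait flag are hypothetical names, and "waiting" is modelled by simply applying everything that is already queued.

```python
from collections import deque


class Node:
    """Toy cluster node: replicated write sets queue up before being applied."""

    def __init__(self):
        self.rows = []
        self.queue = deque()   # received but not yet applied write sets

    def receive(self, row):
        self.queue.append(row)

    def apply_all(self):
        while self.queue:
            self.rows.append(self.queue.popleft())

    def select(self, sync_wait=False):
        # With sync_wait, the read first waits for the apply queue to drain
        # (modelled here by applying everything that is queued).
        if sync_wait:
            self.apply_all()
        return list(self.rows)


node1, node2 = Node(), Node()
node1.rows.append("r1")   # INSERT commits locally on node1 ...
node2.receive("r1")       # ... and is delivered to node2, but not yet applied

stale = node2.select()                  # read races ahead of the applier
causal = node2.select(sync_wait=True)   # read waits for the queue first
print(stale, causal)
```

The first read misses the row even though both nodes will end up identical, which is why the issue only looks like inconsistency.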
Triggers, Stored Procedures, Events

● Triggers fire in the master node only, and their data manipulation effects are replicated as ROW events
● The same applies to stored procedures: they are processed in the master node only
● Events should be used carefully. They will execute in all nodes where they are enabled
    ● Managing in which node an event should be enabled or slaveside_disabled may be error prone, especially when nodes drop and join the cluster often

Foreign Key Constraint

● FK constraints add a new level of data coherence in the cluster
● Galera enforces FK constraints in all cluster nodes, and any problem in constraint validation will lead to a node emergency shutdown
● Supporting FK constraints in Galera replication was a major effort in Galera release 2 (2011-2012)
● An FK violation in a slave node shows up as cluster inconsistency

Async Replication

● A Galera Cluster node can operate as a slave node for an external MariaDB master
● Incoming native replication is treated like any client connection
● However, it is possible to configure async replication as "pre-ordered" replication; this type of replication stream is trusted, and no certification is performed
    ● This may be a vulnerability for consistency if not used carefully
    ● See: wsrep_preordered

The Effect of Inconsistency
The Harm Done

● The MariaDB asynchronous replication slave thread stops on an error
    ● But it can be configured to accept certain errors in replication
    ● e.g. with: --slave-skip-errors
    ● Note that if replication errors are not dealt with, slaves may become inconsistent, and the inconsistency just gets worse over time
● Galera is very strict about inconsistency:
    ● Errors in DDL are ignored
    ● Any error in regular DML transaction replication will cause an emergency abort of the node
    ● Errors in foreign key constraint validation will also cause an emergency abort

Inconsistency for Multi-Master

[Diagram: three read & write MariaDB nodes receive write sets (WS) through Galera Replication; one node fails to apply a write set and drops out, leaving two read & write nodes.]

● An error in applying will trigger an emergency abort

Inconsistency for Multi-Master

● Every failed write set will be logged in a GRA_x_y.dat log file
    ● x = thread ID
    ● y = seqno

[Diagram: a write set (WS) from Galera Replication fails to apply on a MariaDB node and is stored as binlog events in GRA_x_y.dat.]

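When several such files pile up in the data directory, sorting them by seqno shows which write set failed first. A minimal sketch, assuming the file name pattern exactly as given on the slide (the naming may differ between versions); parse_gra_name, GraFile and the file listing are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class GraFile(NamedTuple):
    thread_id: int
    seqno: int


# Pattern taken from the slide: GRA_<thread ID>_<seqno>.dat
GRA_NAME = re.compile(r"^GRA_(\d+)_(\d+)\.dat$")


def parse_gra_name(name: str) -> Optional[GraFile]:
    """Return thread ID and seqno for a GRA file name, or None for other files."""
    m = GRA_NAME.match(name)
    if m is None:
        return None
    return GraFile(thread_id=int(m.group(1)), seqno=int(m.group(2)))


names = ["GRA_3_950.dat", "GRA_2_948.dat", "ib_logfile0"]   # hypothetical listing
failed = sorted((g for g in map(parse_gra_name, names) if g), key=lambda g: g.seqno)
print(failed)
```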
Could Some Errors Be Neglected?

● Failure to delete a row, when the row is already deleted
● Failure to insert a row, when exactly the same row already exists
● In a failing node, we should allow reads
    ● This requirement is implemented by wsrep_dirty_reads

Could Some Errors Be Neglected?

● The application implements certain business logic
● There are reads to learn the current state, and based on the results of those reads, the application does certain write operations

Business logic:
    Begin
    Select…    (reads)
    Select…
    Select…
    Update…    (writes)
    Update…
    Commit

Could Some Errors Be Neglected?

● Any replication error is a sign of data inconsistency
    ● Even a delete of a non-existing row
● The inconsistency may be anything from a fresh event to a hidden inconsistency dating from system startup
● Detecting an inconsistency always happens "after the fact"; business logic may have been violated for ages already
● Therefore strict measures are needed: dirty data must not be used anymore

Could Some Errors Be Neglected?

● Should a failing node stay in the cluster instead of shutting down?
● Should we allow reads from a non-primary node?
    ● This is a long-standing wish, now supported by the option wsrep_dirty_reads
● Is it wise to read from a non-primary node?
    ● No
● Why are DDL errors ignored?
    ● Galera replicates DDL up-front as a verbatim SQL statement
    ● Some application frameworks generate impossible DDL on a regular basis, like DROP TABLE <non-existent>

About Foreign Key Constraint

● An FK constraint adds extra discipline to database content
● If data inconsistency is about to happen, FK constraints may make it surface earlier
● Supporting FK constraints in Galera Cluster is somewhat complicated, because:
    ● Galera allows multi-master access
    ● Replication slaves are highly parallel
● Because of this, we have to carefully control when and how FK constraints are enforced, and what kind of multi-master operation must be rejected as an FK violation

Recovering from Inconsistency
How to Recover from Inconsistency

● Practically, only SST helps
● However, troubleshooting analysis is needed to figure out which node has the most reliable data
● Also, the reason for the inconsistency should be found
    ● This may help to prevent a similar problem from happening in the future
● Unfortunately, analysis of inconsistency is probably the hardest problem with database clusters
    ● There is no indication of when the inconsistency was brought into the system
    ● And often there are no signs in the logs of the incident which caused the inconsistency

How to Recover from Inconsistency

● Can you tune your database back to consistency by tolerating the applying errors and hoping that RBR events will eventually write over the bad data?
● In a multi-master topology, absolutely NO
● In a master-slave topology, some good progress might happen
    ● But not necessarily
    ● Deletes and inserts would work correctly
    ● But updates which cannot match the ROW before image would fail
    ● And reads from this node would always be a compromise

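Why deletes and inserts can repair a diverged slave while updates cannot is easier to see in a small model. This is a sketch under stated assumptions, not the server's applier: rows are a dict keyed by primary key, an insert over an existing row is assumed to overwrite it, a delete of a missing row is assumed to be tolerated, and an update is skipped when its before image does not match. The function apply_tolerant and the sample data are hypothetical.

```python
def apply_tolerant(table, event):
    """Apply one row event to `table` (dict: primary key -> value), ignoring errors.

    Returns True if the event took effect, False if it was skipped.
    """
    kind, pk = event[0], event[1]
    if kind == "insert":
        table[pk] = event[2]          # overwrites a bad row if one exists
        return True
    if kind == "delete":
        table.pop(pk, None)           # a missing row is tolerated
        return True
    if kind == "update":
        before, after = event[2], event[3]
        if table.get(pk) != before:   # before image does not match: cannot apply
            return False
        table[pk] = after
        return True
    raise ValueError(f"unknown event kind: {kind}")


master = {1: "a", 2: "b", 3: "c"}
slave = {1: "a", 2: "WRONG", 3: "c"}   # row 2 has diverged on the slave

events = [
    ("update", 2, "b", "b2"),   # before image "b" does not match "WRONG"
    ("delete", 3, "c"),
    ("insert", 3, "c3"),
]
for e in events:
    apply_tolerant(master, e)
results = [apply_tolerant(slave, e) for e in events]
print(results, slave, master)
```

Row 3 converges through the delete and insert, but row 2 stays wrong because the only event touching it is an update that cannot match.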
Detecting Inconsistencies
Detecting Inconsistencies up-front

● To prevent sudden inconsistency emergencies, it would be good to scan the database for inconsistencies beforehand
● Early inconsistency detection would:
    ● Reduce the harm of the inconsistency for business logic
    ● Allow more time for dealing with the issue
● However, the inconsistency found is just as hard to troubleshoot
● Also, constant database scanning takes some resources, and can itself cause issues for cluster use
● But how can you check data consistency in a live cluster, with transactions being processed at high speed?

Galera Consistency Checking

● Galera has a built-in method for running consistency checks in an isolated transaction sequence slot
● Consistency checking is supported for statements:
    ● INSERT...SELECT
    ● REPLACE...SELECT
    ● Having the special version comment /*!99997 */

CREATE TABLE consistency (id INT AUTO_INCREMENT,
                          checksum VARCHAR(32), … )
INSERT INTO consistency (checksum,...)
    SELECT MD5(GROUP_CONCAT(i,j))
    FROM mytable /*!99997 */

Galera Consistency Checking

[Diagram sequence: three read & write MariaDB nodes apply one shared, ordered stream of write sets (WS 945, 946, ... 965) from Galera Replication. An INSERT...SELECT… /*!99997 */ statement is submitted and takes its own slot in that sequence, so it is processed at the same position on every node.]

Galera Consistency Checking

● The INSERT...SELECT is run in total order isolation (TOI)
● It will block all commits until the INSERT...SELECT has completed
● This may have an impact on overall transaction throughput; avoid too large a select result set
● The pt-table-checksum tool uses Galera consistency checking support and automatically scans all tables for inconsistencies

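Because the checksum statement runs at the same sequence position on every node, comparing the resulting checksums is enough to spot a diverged node. The sketch below only mirrors the idea of MD5(GROUP_CONCAT(i,j)); it is not byte-compatible with the server's output, and table_checksum and the node contents are hypothetical.

```python
import hashlib


def table_checksum(rows):
    """MD5 over the concatenated column values of all rows, in the given order."""
    text = ",".join(f"{i}{j}" for i, j in rows)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical contents of mytable(i, j) on three nodes at the same sequence slot
node_rows = {
    "A": [(1, 10), (2, 20), (3, 30)],
    "B": [(1, 10), (2, 20), (3, 30)],
    "C": [(1, 10), (2, 21), (3, 30)],   # diverged row
}
checksums = {name: table_checksum(rows) for name, rows in node_rows.items()}
suspects = sorted(n for n, c in checksums.items() if c != checksums["A"])
print(suspects)
```

A mismatch only says that two nodes differ, not which of them holds the correct data; that still needs the troubleshooting analysis described earlier.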
Optimizing Inconsistency Shutdown
Optimizing Inconsistency Shutdown

● Current policy for inconsistency:
    ● For a suspected inconsistency, the cluster node will do an emergency shutdown
    ● However, DDL failures are logged only as warnings
    ● An inconsistency injected into one node can cause all other nodes to shut down

Inconsistency Shutdown

Example with three nodes (Node A, Node B, Node C) connected by Galera Replication:

● Create table t1 (i int)
    ● The DDL is replicated, and table t1 exists on all three nodes
● On Node A: Set wsrep_on=OFF; Insert into t1 values (8)
    ● The insert is not replicated, so row 8 exists only on Node A
● On Node A: Set wsrep_on=ON; Delete from t1;
    ● The delete of row 8 (Del 8) is replicated to Node B and Node C, which do not have the row
● Node B and Node C fail to apply the delete and shut down. Node A remains in the minority and changes to Non-Primary

Optimizing Consistency Shutdown

● Codership has developed a method to optimize node emergency shutdowns due to suspected inconsistency, so that they happen only for a minimal set of compromised nodes
● When an inconsistency is observed, nodes communicate through the consistency voting protocol to compare which nodes face a similar issue in applying
● The target is to find the majority: the nodes which have the same understanding about the fate of the offending write set

Consistency Voting

Same scenario with consistency voting: Node A holds row 8 in t1, and after Set wsrep_on=ON; Delete from t1; the delete (Del 8) is replicated to Node B and Node C.

● Node A: Success
● Node B: Error: 'row not found'
● Node C: Error: 'row not found'

[Diagram: after the vote, Node B and Node C hold the majority result, so only Node A, the minority, has to leave the cluster.]

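The vote above can be sketched as a simple majority tally. This is a simplified model of the idea, not Galera's actual protocol: the function vote is hypothetical, and treating a missing strict majority as "every node is suspect" is an assumption made only for this example.

```python
from collections import Counter


def vote(results):
    """results: dict node -> outcome of applying the same write set.

    Returns (majority_outcome, nodes_to_abort). Without a strict majority
    this sketch returns (None, all nodes), which is an assumption.
    """
    tally = Counter(results.values())
    outcome, count = tally.most_common(1)[0]
    if count * 2 <= len(results):
        return None, sorted(results)
    return outcome, sorted(n for n, r in results.items() if r != outcome)


results = {
    "Node A": "success",
    "Node B": "error: row not found",
    "Node C": "error: row not found",
}
majority, to_abort = vote(results)
print(majority, to_abort)
```

Note that the majority here agrees on an error: what matters is that most nodes reach the same result for the write set, not that the result is a success.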
Galera Consistency Voting Protocol

● With consistency voting, Galera Cluster can mitigate the harm of inconsistency for the cluster
● In the best case, only one node has to abort, and the majority can continue operating normally
● However, the database has been inconsistent for an indefinitely long period, and application business logic may have been hurt

Summary

● Database inconsistency is a critical problem and very hard to track down
● It affects both master-slave (MS) and multi-master (MM) topologies
    ● But the inconsistency problem is immediate in MM
● It is possible to scan Galera Clusters for inconsistencies up-front
● Recovery requires a full database copy
● The Galera Consistency Voting protocol will help, but only in keeping the cluster in service


Thank you for listening!


Happy Clustering :-)