
Galera Cluster

This document discusses database replication with MySQL. It covers key concepts such as asynchronous and synchronous replication, row-based and statement-based replication, and the certification-based replication used in Galera clusters. It gives examples of how data is replicated across nodes, how conflicts are detected and resolved, and how consistency is maintained during replication, and it discusses the advantages and disadvantages of the different replication approaches.


MySQL Database Replication

Instructor: Hadi Alnabriss


 Database Replication Concepts
 Asynchronous and Synchronous Replication
 Row and Statement Based Replication
 Certification Based Replication
 Building Galera Cluster
 Experiencing Node Failure and Adding Additional Nodes
 Monitoring and Checking Cluster Status
 Understanding Weighted Quorum
 Load Balancing
 Database replication refers to copying data from one node to
another.
 This replicates the same database to many nodes.
 Can be used for load balancing.
 Useful for backup and reporting.
 Database replication looks simple, but under the hood
it is very complex.
 The daunting part is keeping the data consistent among the
nodes.
 Replication must guarantee the ACID model requirements:
◦ Atomicity: database transactions should be atomic units of change that are
committed upon successful completion or rolled back if the transaction
aborts.
◦ Consistency: the database state should be consistent at all times, during the
progression of a transaction and after commits and rollbacks.
◦ Isolation: transactions should be isolated from one another, so that one
transaction cannot interfere with the work set of another transaction, and
also to avoid conflicts.
◦ Durability: once a transaction is committed, its changes must persist, even
across crashes and restarts.
 The master server receives the updates and propagates
them to the slaves.
 The master writes all updates to a binary log file; the
slaves copy this file and replay it locally.
 Insert, update and delete queries (DML) are forwarded to
the master server.
 Read queries can be scaled out.
 Most applications are read-heavy.
 Web developers need to modify their applications to direct
reads and writes appropriately.
 Updates can be sent to any node; the replication system then
propagates the updates to the other DBMSs (Database
Management Systems).
 Asynchronous replication writes the new update to the local
DB, commits to the client, then propagates the change to the
other nodes using the binary log file.
 A receiving node that is busy may in effect answer "I am busy
now, I will do that in 3 minutes" and apply the change much
later, so replicas can lag behind the node that took the
write.
 (Cons) Data is not always consistent.
 (Pros) Can work with low bandwidths and long distances.
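The behaviour above can be sketched in a few lines of Python. This is an illustration only, not MySQL's actual implementation: the master acknowledges the client before any replica has replayed the binary log, so a read on a lagging replica can be stale.

```python
# Minimal sketch of asynchronous replication (illustrative only):
# the master commits locally and acknowledges the client immediately;
# replicas replay the binary log later.

class AsyncMaster:
    def __init__(self):
        self.data = {}
        self.binlog = []           # ordered list of (key, value) events

    def write(self, key, value):
        self.data[key] = value     # commit locally
        self.binlog.append((key, value))
        return "ok"                # acknowledge before replicas apply

class AsyncReplica:
    def __init__(self):
        self.data = {}
        self.applied = 0           # position in the master's binlog

    def catch_up(self, master):
        # replay any binlog events not yet applied locally
        for key, value in master.binlog[self.applied:]:
            self.data[key] = value
        self.applied = len(master.binlog)

master, replica = AsyncMaster(), AsyncReplica()
master.write("balance", 500)
stale = replica.data.get("balance")   # replica has not replayed yet
replica.catch_up(master)
fresh = replica.data.get("balance")   # now consistent with the master
```

The window between the master's acknowledgment and the replica's catch-up is exactly where the "(Cons) Data is not always consistent" point applies.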
 Synchronous replication must guarantee that new data is added
to all the nodes at the same time.
 The update is sent to the replication manager; when the
acknowledgment is received from every node, all the nodes
update the record at the same time, and only after all the
nodes have made the modification is the commit returned to
the client.
 If one of the DBMSs does not commit the modification, the
transaction is rejected.
 (Pros) No data loss when nodes crash.
 (Pros) Data replicas are always consistent.
 (Cons) Any increase in the number of nodes leads to an
exponential growth in transaction response times and in
the probability of conflicts and deadlock rates.
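The all-or-nothing rule above can be sketched as follows. This is a simplified illustration, not a real two-phase commit protocol: the write is applied everywhere only if every node acknowledges, and one refusal rejects the whole transaction.

```python
# Illustrative sketch of synchronous (eager) replication: the update
# commits only if every node can apply it; one refusal rejects it.

class SyncCluster:
    def __init__(self, n_nodes):
        self.nodes = [{} for _ in range(n_nodes)]

    def write(self, key, value, failing_nodes=()):
        # phase 1: every node must acknowledge the tentative update
        if any(i in failing_nodes for i in range(len(self.nodes))):
            return "rollback"          # one refusal rejects the transaction
        # phase 2: all nodes acknowledged, apply everywhere at once
        for node in self.nodes:
            node[key] = value
        return "commit"

cluster = SyncCluster(3)
ok = cluster.write("balance", 500)                    # all nodes accept
bad = cluster.write("balance", 0, failing_nodes={2})  # node 2 refuses
values = [node.get("balance") for node in cluster.nodes]
```

Because every write waits on every node, the slowest node sets the pace, which is the root of the "(Cons)" scaling problem above.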
 Semi-synchronous replication guarantees that a transaction is
correctly copied to at least one of the other servers: the
update is copied to the other servers, and as soon as one of
them confirms the copy, the transaction is committed.
 (Pros) Semi-sync allows for additional data integrity.
 (Cons) Slower than asynchronous replication.
 We want synchronous replication because it keeps data
consistent, but its performance degrades badly as the number
of nodes increases.
 How can this problem be solved? Proposed solutions:
◦ Group Communication
◦ Write Sets
◦ Database State Machine
◦ Transaction Reordering
 Galera Cluster uses certification-based replication, which
combines these approaches.
 Statement-Based Replication
◦ The SQL statements themselves are replicated
◦ Not safe: non-deterministic statements can produce different
results on different nodes
 Row-Based Replication
◦ The actual data changes are replicated
◦ Safe
◦ If a statement changes many rows, replication writes more data to the
binary log
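The difference can be sketched as follows. This is an illustration, not MySQL's implementation; the `nondeterministic` callable is a stand-in for a statement such as `NOW()` or `RAND()` that returns a different value on every execution.

```python
import itertools

def replicate_statement(nodes, statement):
    # statement-based: each node re-executes the statement locally
    for node in nodes:
        node["x"] = statement()

def replicate_rows(nodes, statement):
    # row-based: execute once on the source, then ship the resulting value
    value = statement()
    for node in nodes:
        node["x"] = value

# Deterministic stand-in for a non-deterministic statement:
# every call returns a different value.
counter = itertools.count()
nondeterministic = lambda: next(counter)

nodes = [{}, {}, {}]
replicate_statement(nodes, nondeterministic)
stmt_consistent = len({node["x"] for node in nodes}) == 1  # nodes diverge

replicate_rows(nodes, nondeterministic)
row_consistent = len({node["x"] for node in nodes}) == 1   # nodes agree
```

This is why the slide calls statement-based replication "not safe": re-executing the statement per node gives each node its own answer, while shipping the row values keeps them identical.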
 The Group Communication layer decides which node can be
added to the cluster (the group) or evicted from it.
 It performs failure detection.
 It orders the servers' messages.
 A client executes a write transaction; it is executed
optimistically on the local server up to the point just
before it would actually be committed.
 The transaction is bundled into a write-set and sent to the
Group Communication toolkit.
 The message is ordered by the Group Communication toolkit
and delivered to all nodes in the same order.
 Every server keeps an associated version for each row update,
and each transaction carries a version number to guarantee
consistency.
 If the transaction has no conflicts, it is executed.
 Certification detects conflicts by comparing these version
numbers.
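The certification test itself reduces to a single comparison. A minimal sketch, using the version numbers from the walkthrough that follows (the function name is ours, not Galera's):

```python
def certify(current_version, db_version):
    # The write-set certifies only if no conflicting transaction has
    # committed since it executed, i.e. its version has not fallen behind.
    return current_version >= db_version

db_version = 1
tx_ok = certify(1, db_version)   # CV=1, DBV=1 -> no conflict, certified
if tx_ok:
    db_version += 1              # committing bumps the database version to 2

tx_conflict = certify(3, 5)      # CV=3, DBV=5 -> conflict, rollback
```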
 We have a database group of three nodes.
 Each database has a database version number (DBV); initially
DBV=1 on all three nodes.
 The database version number is increased after any
transaction is committed.

 A client executes a write transaction on DB1.
 The transaction executes up to the before-commit stage, then
it broadcasts the write-set and data to the group.
 The transaction write-set is composed of the primary keys of
each updated table row and the database version at which the
transaction was executed.
 The version numbers are now compared.
 The transaction's current version (CV=1) equals the database
version (DBV=1), so there is no conflict; the transaction is
allowed to commit, and the row version in the certification
module is updated to 2 (cv: 2).

 After updating the row version on the local server DB1, it
proceeds to commit and returns success to the client.
 On the remote servers (DB2 and DB3) the transaction is queued
to be applied by the applier module.

 If the version number in the write-set is less than the
database version number, the transaction is rolled back
(e.g. CV=3, DBV=5).

 In this example we have two clients updating the same record
at the same time: client 1 sets X=500 on DB1 while client 2
sets X=0 on DB2 (DBV=1 on all three nodes).
 Each server (DB1 and DB2) completes the request up to the
point just before committing to the client.

 Now each write-set is bundled and sent to the group
communication:
◦ From DB1: transaction T1, update X=500, current version = 1
◦ From DB2: transaction T2, update X=0, current version = 1
 The group communication chooses the order for the two
transactions (assume T1 goes first).
 T1 is checked for certification.
 Since CV >= DBV (1 >= 1), there is no conflict and T1 is
certified.


 The version number is now increased to 2 (DBV=2 on all
nodes), the applier module guarantees that the transaction
is applied on the two other servers, and the commit is
returned to client 1.


 Now the group communication delivers the second transaction,
T2 (from DB2: update X=0, current version = 1).
 T2's current version is 1 while DBV is now 2, so
certification fails.
 Server 2 rolls back the transaction and returns an error to
client 2.
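The whole two-client scenario can be replayed in a short sketch. This is an illustration under the same assumptions as the slides (both transactions executed at DBV=1, group communication orders T1 first); the function name and tuple layout are ours:

```python
def apply_in_order(transactions, db_version):
    # transactions: (name, new_value, version_seen_at_execution),
    # already in the total order chosen by group communication
    results, x = [], None
    for name, new_value, seen_version in transactions:
        if seen_version >= db_version:   # certification test
            x = new_value
            db_version += 1              # commit bumps the version
            results.append((name, "commit"))
        else:
            results.append((name, "rollback"))
    return results, x, db_version

# Both transactions executed at DBV=1; T1 is delivered first.
results, x, dbv = apply_in_order([("T1", 500, 1), ("T2", 0, 1)],
                                 db_version=1)
# T1 certifies (x=500, DBV -> 2); T2 saw version 1 < 2 and rolls back.
```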


 A transactional DBMS uses the concept of isolation to
isolate concurrent transactions.
 Concurrent transactions are isolated for:
◦ Performance
◦ Reliability
◦ Consistency
 The "isolation level" determines the degree of isolation
needed when multiple transactions try to make changes
and issue queries to the database.
 In the following example we have two concurrent transactions;
assume that T1 is applied before T2. What is the final value
of X?

Old data: X=500
Transaction 1: SELECT X; UPDATE X = X + 500
Transaction 2: SELECT X; UPDATE X = X - 500

 Transaction 2 did not read the value of X updated by T1.
 So without isolation the final value is X=0 (the wrong value;
with isolation it would be X=500).
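The arithmetic of the example above, spelled out as plain Python (an illustration of the lost-update effect, not of any particular DBMS):

```python
# Without isolation: both transactions read the same starting value.
x = 500
t1_read = x
t2_read = x          # T2 reads before T1 commits
x = t1_read + 500    # T1 commits: x = 1000
x = t2_read - 500    # T2 overwrites from its stale read: x = 0
unisolated = x       # wrong value

# With isolation: T2 reads only after T1 has committed.
x = 500
x = x + 500          # T1
x = x - 500          # T2 sees T1's committed value
isolated = x         # correct value
```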
 Transactions should be isolated from one another.
 A transaction must not interfere with the work set of another
transaction, to avoid conflicts.
 This is achieved through table and row locking.
 The more restrictive the isolation level, the longer the
locking, which causes delays and performance issues for the
sake of transaction integrity.
 Galera Cluster supports 4 isolation levels:
 (1) READ-UNCOMMITTED: transactions can see data changes made
by other transactions even before they are committed.
 Also known as a dirty read.
 This level provides no real isolation at all: in the example
above, Transaction 2 could already see X=1000 before
Transaction 1 commits.
 (2) READ-COMMITTED: transactions can see committed changes
made by other transactions.
 SELECT queries read the data committed prior to the query.
 So when a single transaction runs multiple SELECT queries,
each one sees its own snapshot of committed data, and the
snapshots differ as other transactions commit changes in
between: each SELECT may read different data.
 (3) REPEATABLE-READ: the default isolation level for MySQL
InnoDB.
 A snapshot of the data is taken before the first SELECT
query, and all subsequent queries see the same snapshot.
 Queries do not see changes committed by other transactions,
making reads repeatable: all the queries see the same value.
 (4) SERIALIZABLE: all rows accessed by the transaction are
locked.
 Since the data snapshots available to SELECT queries are the
same, this is similar to REPEATABLE-READ, but the accessed
rows are effectively read-only for other transactions.
 In the master-slave mode of Galera Cluster, all four
isolation levels can be used.
 In multi-master mode, only the REPEATABLE-READ level is
supported.
 With MySQL configured at the default REPEATABLE-READ level,
transactions on the same node are isolated at that level.
 This can exhibit the "lost update" problem.
 Here we have two concurrent transactions on the same node,
T1 and T2:
1. Both get the same data snapshot using SELECT.
2. T1 locks the data and updates it; T2 tries to update the
locked data.
3. T2 waits, pending T1's commit.
4. T1 commits and releases the lock.
5. T2 updates and commits, missing the updates applied by T1.
 With Galera Cluster's snapshot isolation, "first committer
wins" logic is used to ensure data integrity.
 It applies when concurrent transactions operate on separate
nodes of the cluster against the same database.
 When T1 and T2 run on two different nodes, the lost-update
problem cannot occur, because the second transaction is
rejected outright.
 To overcome this issue, Galera Cluster utilizes
MySQL/InnoDB's SELECT ... FOR UPDATE statement for reads that
are performed in preparation for an update.
 This statement takes the lock at READ time, prior to the
UPDATE operation itself, so that conflicting WRITEs are
prevented.
 In our example, when T2 tries to read the data for update, it
will be rejected.
 State transfer is the process of replicating data from the
cluster to an individual node.
 Synchronizing a node's data with the cluster is called
provisioning.
 Two methods are available in Galera Cluster to provision
nodes:
◦ State Snapshot Transfer (SST), where a snapshot of the entire
node state is transferred.
◦ Incremental State Transfer (IST), where only the missing
transactions are transferred.
 With SST, the cluster provisions nodes by transferring a full
data copy from one node to another, using one of two
approaches:
◦ Logical: this method uses mysqldump.
◦ Physical: this method copies the data files directly from
server to server (e.g. rsync).
 With IST, the cluster provisions a node by identifying the
transactions missing on the joiner and sending only those,
instead of the entire state.
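The SST/IST distinction can be sketched in a few lines. This is a conceptual illustration only (function names and the sequence-number scheme are ours, not Galera's wire protocol):

```python
def sst(donor_log):
    # State Snapshot Transfer: ship the entire state
    return list(donor_log)

def ist(donor_log, joiner_seqno):
    # Incremental State Transfer: ship only the transactions the joiner
    # is missing (sequence numbers above its last applied one)
    return [tx for seqno, tx in enumerate(donor_log, start=1)
            if seqno > joiner_seqno]

donor_log = ["tx1", "tx2", "tx3", "tx4", "tx5"]
full = sst(donor_log)                    # the whole state
delta = ist(donor_log, joiner_seqno=3)   # only the missing tail
```

A node that was briefly offline needs only the short `delta`, which is why IST is much cheaper than a full SST for rejoining nodes.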
 Currently replication works only with the InnoDB storage
engine.
 Writes to tables of other types, including the system
(mysql.*) tables, are not replicated.
 This limitation excludes DDL (Data Definition Language)
statements such as CREATE USER, which implicitly modify the
mysql.* tables.
 There is, however, experimental support for MyISAM (the
wsrep_replicate_myisam system variable).
 All tables should have a primary key (multi-column primary
keys are supported).
 DELETE operations are unsupported on tables without a primary
key.
 Rows in tables without a primary key may appear in a
different order on different nodes.
 Transaction size:
◦ Galera does not explicitly limit the transaction size, but a
write-set is processed as a single memory-resident buffer; as a
result, extremely large transactions (e.g. LOAD DATA) may
adversely affect node performance.
◦ To avoid that, the wsrep_max_ws_rows and wsrep_max_ws_size
system variables limit a transaction to 128K rows and 1 GB by
default.
◦ If necessary, users may increase those limits.
◦ Future versions will add support for transaction fragmentation.
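As an illustration, raising these limits would look like the following my.cnf fragment (the values here are arbitrary examples, not recommendations; the defaults quoted are the ones stated above):

[mysqld]
wsrep_max_ws_rows=262144       # raise the default 128K row limit
wsrep_max_ws_size=2147483648   # raise the default 1 GB size limit to 2 GB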
Practical Section
 Three nodes minimum, with hardware specifications of at
least:
◦ 1 GHz single core
◦ 512 MB RAM
◦ 100 Mbps network connectivity (use NAT to guarantee network
connectivity and Internet access)
 Note 1: one slow node might slow down your whole cluster.
 Note 2: configure enough swap space on your system.
 Install CentOS 7 (latest version) on each node.
 To avoid blocked connections during this lab, disable
firewalld and SELinux (not recommended in production).
 Configure network connections (Internet access required).
 Configure hostnames on the three nodes (gdb01, gdb02 and
gdb03).
◦ During Galera DB configuration we will use hostnames instead
of IP addresses.
 Configure the hosts file on each node (IP and name for each
node):

10.12.0.134 gdb01
10.12.0.135 gdb02
10.12.0.136 gdb03
 The CentOS repositories include the old version 5.5.
 We will install MariaDB version 10.1.
 Add the repository (Ref: https://mariadb.com/kb/en/library/yum/):
◦ Create the file /etc/yum.repos.d/MariaDB.repo:
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.1/centos7-amd64
gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1

 Install the MariaDB server, client and common tools using the
command:
◦ yum install MariaDB-server MariaDB-client MariaDB-common
 On The 1st node (any node), start MariaDB and run the
command :
◦ mysql_secure_installation
 Configure Galera Section in the file : /etc/my.cnf.d/server.cnf

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.12.0.134,10.12.0.135,10.12.0.136"
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_cluster_name="cluster1"
wsrep_sst_method=rsync
wsrep_node_address="10.12.0.134"
wsrep_node_name="gdb01"
 On the 1st node, run the command:
 galera_new_cluster
 Now the cluster size = 1 (check it with: SHOW STATUS LIKE
'wsrep_cluster_size';)
 On the 2nd node run:
 systemctl start mariadb
 Check the cluster size again.
 On the 3rd node run:
 systemctl start mariadb
 Check the cluster size again.
 To check the cluster, create a new database and a new
(InnoDB) table, insert data, and check that everything is
replicated to the other servers:
create database test100;
use test100;
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
insert into t1(a) values(50);
insert into t1(a) values(60);
insert into t1(a) values(70);

 On the other nodes, check for the new data.
 Check what happens when you shut down some of the nodes
gracefully.
 Check what happens when you power off some of the machines.
 How do you restart the cluster (stop and start all the
nodes)?
 Prepare the node as you did with the previous three nodes.
 Install the required repository and packages for MariaDB.
 Modify the galera section in /etc/my.cnf.d/server.cnf on the
three existing nodes.
 Copy the galera section configuration from any one of the
three nodes and paste it into the new node (modify the IP and
the name of the node).
 Restart the mariadb service on the nodes of the cluster (the
3 nodes).
 Start the mariadb service on the new node.
 This problem occurs in environments with multi-master DBMSs.
 It arises when a problem happens to the replication system
and clients still make updates to the database: which value
is right?
 Cluster failures that leave database nodes operating
autonomously of each other are called split-brain conditions.
 When this occurs, data can become irreparably corrupted, e.g.
when two database nodes independently update the same row in
the same table.
 The current number of nodes in the cluster defines the
current cluster size.
 Every time a node joins the cluster, the total cluster size
increases.
 When a node leaves the cluster gracefully, the cluster size
decreases.
 The cluster size determines the number of votes required to
achieve quorum.
 The cluster needs quorum before making any decisions.
 What happens if we have two members and they vote Yes and No
on a proposal ("Do you accept increasing the DB size?")?
What is the final decision: Yes or No?
 What happens if we have three members voting Yes, No and Yes
on the same proposal? The final decision is Yes: the majority
wins.
 Galera Cluster takes a quorum vote whenever a node does not
respond and is suspected of no longer being part of the
cluster.
 The component that has quorum continues to operate as the
Primary Component, while components without quorum enter the
non-primary state and begin attempting to reconnect to the
Primary Component.
 Example (expected votes: 3): a partition with 1 vote has no
quorum, while the partition with 2 votes has a valid quorum.
 Clusters that have an even number of nodes risk split-brain
conditions.
 If you lose network connectivity between the partitions in a
way that splits the nodes exactly in half, neither partition
can retain quorum and both enter a non-primary state.
 Example (expected votes: 4): both partitions have 2 votes,
so neither has quorum.
 With an odd number of nodes, one partition becomes the
Primary Component and the other becomes non-primary.
 Example (expected votes: 5): the partition with 2 votes has
no quorum, while the partition with 3 votes has a valid
quorum.
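The quorum rule in these examples boils down to a strict majority. A minimal sketch (unweighted votes; real deployments can also assign per-node weights):

```python
def has_quorum(partition_votes, expected_votes):
    # A partition keeps quorum only with a strict majority of the
    # expected votes; exactly half is not enough.
    return partition_votes > expected_votes / 2

three_node = [has_quorum(1, 3), has_quorum(2, 3)]   # minority vs majority
even_split = [has_quorum(2, 4), has_quorum(2, 4)]   # 2-2: neither side wins
five_node  = [has_quorum(2, 5), has_quorum(3, 5)]   # 2 vs 3: majority wins
```

The even-split case is exactly why clusters with an even number of nodes risk losing quorum on both sides at once.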
 When you design your infrastructure, try to avoid split-brain
conditions (by using multiple datacenters, switches, servers,
power sources, etc.).
 You can also make advanced configurations that modify the
weights of member nodes.
 You can protect your Galera cluster with your firewall (e.g.
iptables or firewalld on Linux), but you need to keep the
following ports open:
◦ 3306: MySQL client connections and State Snapshot Transfers
that use the mysqldump method.
◦ 4567: Galera Cluster replication traffic; multicast
replication uses both UDP and TCP on this port.
◦ 4568: Incremental State Transfer.
◦ 4444: all other State Snapshot Transfers.
 For securing database server and client connections, you can
use the internal MySQL SSL support.
 If you use logical transfer methods for state snapshot
transfer, such as mysqldump, this is the only step you need
to take to secure your state snapshot transfers.

##### /etc/my.cnf file
# MySQL Server
[mysqld]
ssl-ca = /path/to/ca-cert.pem
ssl-key = /path/to/server-key.pem
ssl-cert = /path/to/server-cert.pem

# MySQL Client Configuration
[mysql]
ssl-ca = /path/to/ca-cert.pem
ssl-key = /path/to/client-key.pem
ssl-cert = /path/to/client-cert.pem
 To enable SSL on the internal node processes, you need to
define the paths to the key, certificate and certificate
authority files that the node should use to encrypt
replication traffic.

## Add the following line to the my.cnf file
wsrep_provider_options="socket.ssl_key=/path/to/server-key.pem;socket.ssl_cert=/path/to/server-cert.pem;socket.ssl_ca=/path/to/cacert.pem"

# OR set it at runtime:
SET GLOBAL wsrep_provider_options="socket.ssl_key=/path/to/server-key.pem;socket.ssl_cert=/path/to/server-cert.pem;socket.ssl_ca=/path/to/cacert.pem"
 How do you balance client requests to your cluster?
 Clients connect to HAProxy.
 HAProxy is used for load balancing.
 HAProxy is used for fault tolerance.
 In this environment we still have a single point of failure:
the HAProxy node itself.
 To avoid the single point of failure, you need to use more
than one HAProxy.
 But then, to which IP will the clients connect?
 Pacemaker creates a cluster for the HAProxy nodes.
 It monitors the nodes using a heartbeat.
 It can be used to create a virtual IP (VIP).
 The VIP runs on one HAProxy; if the HAProxy holding the VIP
fails, the VIP is migrated to the second HAProxy.
 This works like an active/passive solution.
 Congratulations on completing this course!
 Please rate how helpful this course was for you.
 I will be glad to share my knowledge with you and to hear
about your experience and advice regarding Galera Cluster and
MySQL High Availability.
 I will consider adding any additional topics that you think
are important for this course.
 http://galeracluster.com/documentation-webpages/
 https://mysqlhighavailability.com
 https://clusterengine.me
