pg_chameleon
MySQL to PostgreSQL replica made easy
Federico Campoli
Transferwise
PGCon, Ottawa
01 Jun 2018
http://www.pgdba.org
@4thdoctor_scarf
Federico Campoli (Transferwise) pg_chameleon PGCon, Ottawa 01 Jun 2018
Few words about the speaker
Born in 1972
Passionate about IT since 1982
mostly because of the TRON movie
Joined the Oracle DBA secret society in 2004
In love with PostgreSQL since 2006
Devrim PostgreSQL tattoo’s copycat
Works at Transferwise as Data Engineer
Disclaimer
I’m not a developer
I’m a DBA...which means being hated by everybody and hating everybody
So, to put things in the right perspective...I use tabs
[Image: Palpatine]
Table of contents
1 History
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
History
The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpBB on MySQL
The database migration was successful
However, phpBB didn’t work very well with PostgreSQL [1]
The script is written in Python 2.6
It’s a monolithic script
And it’s slow, very slow
It’s a good checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
[1] Opening a new connection for each query is not the smartest thing to do.
I’m not scared of using the ORMs
Years 2013/2015
First attempt at pg_chameleon
Developed in Python 2.7
Used SQLAlchemy for extracting MySQL’s metadata
Proof of concept only
It was built during the years of the life on a roller coaster [2]
Therefore it was just a way to discharge frustration
Abandoned after a while
SQLAlchemy’s limitations were frustrating as well (see slide 3)
And pgloader did the same job much, much better
[2] Recording available here: http://www.pgbrighton.uk/post/backup_recovery/
pg_chameleon reborn
Year 2016
I needed to replicate the data from MySQL to PostgreSQL
http://tech.transferwise.com/scaling-our-analytics-database/
The amazing library python-mysql-replication allowed me to build a proof of concept
It later evolved into pg_chameleon 1.x
Kudos to the python-mysql-replication team!
https://github.com/noplay/python-mysql-replication
pg_chameleon 1.x
Developed on the London to Brighton commute
Released as stable on 7 May 2017
Followed by 8 bugfix releases
Compatible with CPython 2.7/3.3+
No more SQLAlchemy
The MySQL driver changed from MySQLdb to PyMySQL
Command line helper
Supports type override on the fly (Danger!)
Installs in a virtualenv and system wide via PyPI
Can detach the replica for minimal downtime migrations
pg_chameleon version 1’s limitations
All the affected tables are locked in read only mode during the init replica process
During the init replica the data is not accessible
Tables must have a primary key to be replicated
No daemon, the process always stays in the foreground
Single schema replica
One process per schema
Network inefficient
Read and replay are not concurrent, with a risk of high lag
The optional threaded mode is very inefficient and fragile
A single error in the replay process and the replica is broken
MySQL Replica in a nutshell
MySQL Replica
The MySQL replica is logical
When the replica is enabled the data changes are stored in the master’s binary log files
The slave reads the changes from the master’s binary log files
The slave saves the stream of data into local relay logs
The relay logs are replayed against the slave
[Diagram: MySQL replica]
Log formats
MySQL has three ways of storing the changes in the binary logs.
STATEMENT: It logs the statements which are replayed on the slave. It’s the best solution for the bandwidth. However, when replaying statements with non-deterministic functions this format generates different values on the slave (e.g. using an insert with a column autogenerated by the uuid function).
ROW: It’s deterministic. This format logs the row images.
MIXED: It takes the best of both worlds. The master logs the statements unless a non-deterministic function is used. In that case it logs the row image.
All three formats always log the DDL query events.
The python-mysql-replication library, and therefore pg_chameleon, requires the ROW format to work properly.
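The ROW requirement can be checked before a replica is started. The sketch below is only an illustration and is not part of pg_chameleon: it takes the result of SHOW GLOBAL VARIABLES as a plain dictionary (fetching it from MySQL is left out) and lists the settings that would get in the way of a row based replica. The function name and the REQUIRED table are assumptions made for this example.

```python
# Hypothetical helper, not part of pg_chameleon.
# Settings a row based replica relies on, named as in SHOW GLOBAL VARIABLES.
REQUIRED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def check_binlog_settings(variables):
    """Return a list of readable problems; an empty list means all is fine."""
    problems = []
    for name, expected in REQUIRED.items():
        # Compare case-insensitively, a missing variable counts as unset.
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            problems.append(f"{name} is {actual or 'unset'}, expected {expected}")
    return problems


if __name__ == "__main__":
    # Example values as they could come back from SHOW GLOBAL VARIABLES.
    print(check_binlog_settings({"log_bin": "ON", "binlog_format": "MIXED"}))
```

A server logging in MIXED or STATEMENT format is reported here, which matches the statement above that only the row images can be decoded.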
A chameleon in the middle
pg_chameleon
pg_chameleon mimics a MySQL slave’s behaviour
It performs the initial load for the replicated tables
It connects to MySQL using the replica protocol
It stores the row images into a PostgreSQL table
A PL/pgSQL function decodes the rows and replays the changes
It can detach the replica for minimal downtime migrations
PostgreSQL acts as relay log and replication slave
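The store-then-replay idea can be pictured with a toy model. This is not pg_chameleon’s code: the real tool keeps the row images in a PostgreSQL table and replays them with a PL/pgSQL function, while the sketch below replaces both with plain Python structures. The event layout (action, before, after) and the id primary key are assumptions chosen for the example.

```python
# Toy model only: a list stands in for the PostgreSQL log table and a dict
# keyed by primary key stands in for the replicated table.

def replay(relay_log, table):
    """Apply the queued row images, in order, to `table`."""
    for event in relay_log:
        action = event["action"]
        if action == "insert":
            row = event["after"]
            table[row["id"]] = row
        elif action == "update":
            # The before image locates the row, the after image replaces it.
            table.pop(event["before"]["id"], None)
            row = event["after"]
            table[row["id"]] = row
        elif action == "delete":
            table.pop(event["before"]["id"], None)
        else:
            raise ValueError(f"unknown action: {action}")
    return table


if __name__ == "__main__":
    relay_log = [
        {"action": "insert", "after": {"id": 1, "name": "igor"}},
        {"action": "update", "before": {"id": 1, "name": "igor"},
         "after": {"id": 1, "name": "Igor"}},
        {"action": "insert", "after": {"id": 2, "name": "tmp"}},
        {"action": "delete", "before": {"id": 2, "name": "tmp"}},
    ]
    print(replay(relay_log, {}))
```

Because reading and replaying are separate steps, the queue can keep filling while older events are still being applied.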
[Diagram: MySQL replica + pg_chameleon]
pg_chameleon 2.0 #1
Developed at pgconf.eu 2017 and on the commute
Released as stable on 1 January 2018
Compatible with Python 3.3+
Installs in a virtualenv and system wide via PyPI
Replicates multiple schemas from a single MySQL into a target PostgreSQL database
Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica
Daemonised replica process with two distinct subprocesses, for concurrent read and replay
pg_chameleon 2.0 #2
Soft locking replica initialisation. The tables are locked only during the copy.
Rollbar integration for simpler error detection and messaging
Experimental support for the PostgreSQL source type
The tables are loaded in a separate schema which is swapped with the existing one. This approach requires more space but it makes the init replica virtually painless, leaving the old data accessible until the init replica is complete.
The DDL is translated into the PostgreSQL dialect, keeping the schema in sync with MySQL automatically
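The slides do not show how the schema swap is carried out. As a rough sketch, assuming it can be done by renaming schemas inside a single PostgreSQL transaction, the statements could be generated like this. The function and the schema names used in the example are hypothetical and do not come from pg_chameleon.

```python
# Hypothetical sketch: build the SQL that swaps a freshly loaded schema
# with the live one. The renames run in one transaction, so readers see
# either the old data or the new data, never a missing schema.

def swap_statements(live, loading, retired):
    """Return the statements that put `loading` in place of `live`."""
    return [
        "BEGIN;",
        f'ALTER SCHEMA "{live}" RENAME TO "{retired}";',
        f'ALTER SCHEMA "{loading}" RENAME TO "{live}";',
        "COMMIT;",
    ]


if __name__ == "__main__":
    for statement in swap_statements(
        "loxodonta_africana", "_loxodonta_africana_tmp", "_loxodonta_africana_old"
    ):
        print(statement)
```

The retired schema still holds a full copy of the old data until it is dropped, which is where the extra space mentioned above goes.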
Version 2.0’s limitations
Tables must have a primary or unique key to be replicated
When detaching the replica the foreign keys are always created as ON DELETE/UPDATE RESTRICT
The source type PostgreSQL supports only the init replica process
Replica initialisation
The replica initialisation follows the same workflow as stated in the MySQL online manual.
Flush the tables with read lock
Get the master’s coordinates
Copy the data
Release the locks
However...
pg_chameleon flushes the tables with read lock one by one. The lock is held only during the copy.
The log coordinates are stored in the replica catalogue along with the table’s name and used by the replica process to determine whether the table’s binlog data should be used or not.
The replica starts inconsistent and gains consistency over time.
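The per table decision can be sketched as a comparison of binlog coordinates. The code below is an illustration under stated assumptions, not pg_chameleon’s implementation: coordinates are (file name, position) pairs, binlog files carry a numeric suffix such as mysql-bin.000012, the event position is the one at which the event ends, and an event is applied only if it is strictly newer than the coordinates saved when the table was copied.

```python
# Hypothetical helpers, not taken from pg_chameleon.

def binlog_sequence(file_name):
    """Extract the numeric suffix from a binlog file name, e.g. mysql-bin.000012."""
    return int(file_name.rsplit(".", 1)[1])


def should_apply(event_coords, table_coords):
    """True when the event is newer than the table's copy snapshot.

    Both arguments are (file_name, position) tuples. Comparing the file
    sequence first and the position second mirrors the order in which
    MySQL writes its binary logs.
    """
    event_file, event_pos = event_coords
    table_file, table_pos = table_coords
    return (binlog_sequence(event_file), event_pos) > (
        binlog_sequence(table_file),
        table_pos,
    )


if __name__ == "__main__":
    snapshot = ("mysql-bin.000003", 400)
    print(should_apply(("mysql-bin.000003", 120), snapshot))  # already in the copy
    print(should_apply(("mysql-bin.000004", 4), snapshot))    # newer log file
```

Tables copied early skip fewer events than tables copied late, which is why the replica becomes consistent only once the stream has passed the coordinates of the last table copied.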
Fallback on failure
The data is pulled from MySQL in slices using the CSV format. This approach prevents memory overload.
Once the file is saved it is pushed into PostgreSQL using the COPY command.
However...
COPY is fast but runs in a single transaction
One failure and the entire batch is rolled back
If this happens the procedure loads the same data using INSERT statements
Which can be very slow
The process attempts to clean the NUL markers which are allowed by MySQL
If the row still fails on insert then it’s discarded
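The fallback can be sketched independently of any database driver. In the example below, copy_rows and insert_row are hypothetical callables standing in for COPY and INSERT; they are not pg_chameleon functions. One simplification is an assumption of this sketch: the NUL characters are stripped before the first insert attempt rather than after a failure.

```python
# Hypothetical sketch of "bulk load first, row by row on failure".

def load_slice(rows, copy_rows, insert_row):
    """Load one slice of rows and return the rows that were discarded.

    `copy_rows(rows)` loads the whole slice or raises, leaving nothing
    behind. `insert_row(row)` loads a single row or raises.
    """
    try:
        copy_rows(rows)  # fast path, all or nothing
        return []
    except Exception:
        pass  # the whole batch was rolled back, fall through to inserts

    discarded = []
    for row in rows:
        # Strip NUL characters from string values before inserting.
        cleaned = [v.replace("\x00", "") if isinstance(v, str) else v for v in row]
        try:
            insert_row(cleaned)
        except Exception:
            discarded.append(row)  # still failing, give up on this row
    return discarded
```

The slow path costs one statement per row, so it is worth it only because a single bad row would otherwise block the entire slice.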
Replica in action
MySQL configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf
To enable the binary logging find the section [mysqld] and check that the following parameters are set.
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
MySQL user for replica
Set up a replication user on MySQL
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica = PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
In our example we are using the sakila test database.
https://dev.mysql.com/doc/sakila/en/
PostgreSQL setup
Add a user on PostgreSQL able to create schemas and relations in the destination database
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
Install pg_chameleon
Install pg_chameleon and create the configuration files
pip install pip --upgrade
pip install pg_chameleon
chameleon set_configuration_files
cd ~/.pg_chameleon/configuration
cp config-example.yml default.yml
Edit the file default.yml setting the correct values for connection and source.
Configure global settings in default.yml
PostgreSQL connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "replica"
  database: "db_replica"
  charset: "utf8"
Rollbar configuration
rollbar_key: '<rollbar_long_key>'
rollbar_env: 'pgcon-demo'
Type override (optional)
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
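How an override of this shape could be resolved is sketched below. This is a guess at the logic for illustration only and is not lifted from pg_chameleon: the function name, the reduced DEFAULT_TYPES map and the schema.table naming of the tables are all assumptions. The dictionary passed as type_override has the same structure as the YAML section above.

```python
# Hypothetical default mapping, reduced to a few types for the example.
DEFAULT_TYPES = {
    "tinyint": "smallint",
    "int": "integer",
    "varchar": "character varying",
}


def resolve_type(column_type, table, type_override):
    """Return the PostgreSQL type name for a MySQL column type.

    `column_type` is the full MySQL type, e.g. 'tinyint(1)'. An override
    wins when its table list contains '*' or the given table name;
    otherwise the default map is used on the base type (length dropped).
    """
    override = type_override.get(column_type)
    if override:
        tables = override["override_tables"]
        if "*" in tables or table in tables:
            return override["override_to"]
    base_type = column_type.split("(", 1)[0]
    return DEFAULT_TYPES.get(base_type, base_type)


if __name__ == "__main__":
    overrides = {
        "tinyint(1)": {"override_to": "boolean", "override_tables": ["*"]}
    }
    print(resolve_type("tinyint(1)", "sakila.film", overrides))
    print(resolve_type("tinyint(4)", "sakila.film", overrides))
```

The override matches the full type string, so tinyint(1) becomes boolean while tinyint(4) keeps the default conversion. Values that do not fit the target type break the replay, hence the earlier warning about danger.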
Configure the mysql source
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "replica"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: loxodonta_africana
    limit_tables:
    skip_tables:
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "1 day"
    type: mysql
Add the source and initialise the replica
Add the source mysql and initialise the replica for it. We are using --debug in order to get the logging on the console.
chameleon create_replica_schema --debug
chameleon add_source --config default --source mysql --debug
chameleon init_replica --config default --source mysql --debug
Start the replica
Start the replica process
chameleon start_replica --config default --source mysql
Show the replica status
chameleon show_status --config default --source mysql
Time for a demo
Demo!
The demo will fail miserably for sure and you will hate this project forever.
Lessons learned
Strictness is an illusion. MySQL doubly so
MySQL’s lack of strictness is not a mystery.
The funny way MySQL manages the default with NOT NULL can break the replica.
Therefore any field with NOT NULL added after the initialisation is always created as nullable in PostgreSQL.
The DDL. A real pain in the back
I initially tried to use sqlparse for tokenising the DDL emitted by MySQL.
Unfortunately it didn’t work as I expected.
So I decided to use regular expressions.
Some people, when confronted with a problem,
think "I know, I’ll use regular expressions."
Now they have two problems.
-- Jamie Zawinski
MySQL even in ROW format emits the DDL as statements
The class sql_token uses regular expressions to tokenise the DDL
The tokenised data is used to build the DDL in the PostgreSQL dialect
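To give a flavour of the regular expression approach, here is a minimal sketch that handles a single statement form, ALTER TABLE ... ADD COLUMN with one column. It is not the sql_token class: the pattern, the reduced TYPE_MAP and the function name are assumptions made for the example, and anything the pattern does not recognise is simply returned as None.

```python
import re

# Matches e.g.: ALTER TABLE `film` ADD COLUMN `rating` INT(11) NOT NULL
# The negative lookahead keeps ADD INDEX, ADD KEY and friends from being
# mistaken for a column definition.
ADD_COLUMN = re.compile(
    r"ALTER\s+TABLE\s+`?(?P<table>\w+)`?\s+"
    r"ADD\s+(?!(?:INDEX|KEY|CONSTRAINT|UNIQUE|PRIMARY|FOREIGN|FULLTEXT)\b)"
    r"(?:COLUMN\s+)?`?(?P<column>\w+)`?\s+"
    r"(?P<type>\w+)(?:\((?P<size>[\d,\s]+)\))?",
    re.IGNORECASE,
)

# Hypothetical, heavily reduced type map.
TYPE_MAP = {
    "int": "integer",
    "tinyint": "smallint",
    "datetime": "timestamp without time zone",
}


def translate_add_column(statement):
    """Translate a simple MySQL ADD COLUMN to PostgreSQL, or return None."""
    match = ADD_COLUMN.search(statement)
    if match is None:
        return None
    mysql_type = match.group("type").lower()
    pg_type = TYPE_MAP.get(mysql_type, mysql_type)
    # Sizes on mapped types, such as the display width in INT(11), are
    # dropped; sizes on unmapped types, e.g. varchar(50), are kept.
    if match.group("size") and mysql_type not in TYPE_MAP:
        pg_type += f"({match.group('size')})"
    # NOT NULL is deliberately ignored: the column is created nullable.
    return (
        f'ALTER TABLE "{match.group("table")}" '
        f'ADD COLUMN "{match.group("column")}" {pg_type};'
    )


if __name__ == "__main__":
    print(translate_add_column(
        "ALTER TABLE `film` ADD COLUMN `rating` INT(11) NOT NULL"
    ))
```

Even this narrow case needs a lookahead to avoid misreading ADD INDEX as a column, which hints at how quickly the two problems from the quote show up.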
Wrap up
To boldly go where no chameleon has gone before
Short term goals, version 2.0
Automatically resync the tables when they error on replay
Improve the replay speed and CPU efficiency
GTID support for the MySQL source
Medium term goals, version 2.1
Parallel copy and index creation in order to speed up the init replica process
Logical replica from PostgreSQL
Improve the default column handling
Igor, the green little guy
The chameleon logo has been developed by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, as portrayed in the movie Young Frankenstein.
Feedback please!
Please report any issue on GitHub and follow pg_chameleon on Twitter for the announcements.
https://github.com/the4thdoctor/pg_chameleon
@pg_chameleon
Did you say hire?
WE ARE HIRING!
https://transferwise.com/jobs/
That’s all folks!
Thank you for listening!
Any questions?
Please be very basic, I’m just an electrician after all.
Image credits
Palpatine, Dr. Evil disclaimer, It could work (Young Frankenstein), source memegenerator
MySQL Image source, WikiCommons
Hard Disk image, source WikiCommons
Tron image, source Tron Wikia
Twitter icon, source Open Icon Library
The PostgreSQL logo, copyright the PostgreSQL global development group
Boromir get rid of mysql, source imgflip
Morpheus, source imgflip
Keep calm chameleon, source imgflip
The dolphin picture - Copyright artnoose
Perseus, Framed - Copyright Federico Campoli
Pinkie Pie that’s all folks, Copyright by dan232323, used with permission
Doom, source RetroPie
License
This document is distributed under the terms of the Creative Commons
Attribution, Not Commercial, Share Alike
Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 44 / 46
pg chameleon
MySQL to PostgreSQL replica made easy
Federico Campoli
Transferwise
PGCon, Ottawa
01 Jun 2018
http://www.pgdba.org
@4thdoctor scarf
Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 45 / 46

More Related Content

PDF
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
PDF
MySQL Shell for DBAs
PDF
PostgreSQL and RAM usage
PPTX
OpenTelemetry For Developers
PDF
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
PDF
Raft presentation
PPTX
Using multi-tenant WordPress to simplify development
PDF
Observability
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Shell for DBAs
PostgreSQL and RAM usage
OpenTelemetry For Developers
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Raft presentation
Using multi-tenant WordPress to simplify development
Observability

What's hot (20)

PPTX
Sonatype nexus 로 docker registry 관리하기
PDF
MySQL GTID 시작하기
PDF
Galera explained 3
PDF
Airflow Best Practises & Roadmap to Airflow 2.0
PDF
Galera Replication Demystified: How Does It Work?
PDF
[수정본] 우아한 객체지향
PDF
Adventures in Observability - Clickhouse and Instana
PDF
Custom DevOps Monitoring System in MelOn (with InfluxDB + Telegraf + Grafana)
PDF
Percona Live 2022 - The Evolution of a MySQL Database System
PDF
Tame the small files problem and optimize data layout for streaming ingestion...
PPTX
My sql failover test using orchestrator
PDF
Spring Boot on Amazon Web Services with Spring Cloud AWS
PPTX
Practical learnings from running thousands of Flink jobs
PPTX
Airflow - a data flow engine
PDF
우아한 객체지향
PPTX
Introduction to react_js
PDF
PostgreSQL HA
PDF
SeaweedFS introduction
PPTX
Apache Kafka Architectures and Fundamentals
PDF
(책 소개) 가상 면접 사례로 배우는 대규모 시스템 설계 기초
Sonatype nexus 로 docker registry 관리하기
MySQL GTID 시작하기
Galera explained 3
Airflow Best Practises & Roadmap to Airflow 2.0
Galera Replication Demystified: How Does It Work?
[수정본] 우아한 객체지향
Adventures in Observability - Clickhouse and Instana
Custom DevOps Monitoring System in MelOn (with InfluxDB + Telegraf + Grafana)
Percona Live 2022 - The Evolution of a MySQL Database System
Tame the small files problem and optimize data layout for streaming ingestion...
My sql failover test using orchestrator
Spring Boot on Amazon Web Services with Spring Cloud AWS
Practical learnings from running thousands of Flink jobs
Airflow - a data flow engine
우아한 객체지향
Introduction to react_js
PostgreSQL HA
SeaweedFS introduction
Apache Kafka Architectures and Fundamentals
(책 소개) 가상 면접 사례로 배우는 대규모 시스템 설계 기초
Ad

Similar to pg_chameleon MySQL to PostgreSQL replica made easy (20)

PDF
Pg chameleon MySQL to PostgreSQL replica
PDF
pg_chameleon a MySQL to PostgreSQL replica
PDF
The ninja elephant, scaling the analytics database in Transwerwise
PDF
PostgreSQL, the big the fast and the (NOSQL on) Acid
PDF
PostgreSQL - backup and recovery with large databases
PDF
The ninja elephant, scaling the analytics database in Transwerwise
PDF
a look at the postgresql engine
PDF
Hitchikers guide handout
PDF
A couple of things about PostgreSQL...
PDF
The hitchhiker's guide to PostgreSQL
PDF
Opencast Architecture
PDF
pgpool: Features and Development
PDF
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
PPTX
Distributed System explained (with Java Microservices)
PDF
JPA Week3 Entity Mapping / Hexagonal Architecture
PDF
Infrastructure as code might be literally impossible part 2
PDF
Migrating PostgreSQL to the Cloud
PDF
Pg big fast ugly acid
PDF
Puppetconf 2015 - Puppet Reporting with Elasticsearch Logstash and Kibana
PDF
MySQL Software Repositories
Pg chameleon MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replica
The ninja elephant, scaling the analytics database in Transwerwise
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL - backup and recovery with large databases
The ninja elephant, scaling the analytics database in Transwerwise
a look at the postgresql engine
Hitchikers guide handout
A couple of things about PostgreSQL...
The hitchhiker's guide to PostgreSQL
Opencast Architecture
pgpool: Features and Development
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Distributed System explained (with Java Microservices)
JPA Week3 Entity Mapping / Hexagonal Architecture
Infrastructure as code might be literally impossible part 2
Migrating PostgreSQL to the Cloud
Pg big fast ugly acid
Puppetconf 2015 - Puppet Reporting with Elasticsearch Logstash and Kibana
MySQL Software Repositories
Ad

More from Federico Campoli (8)

PDF
Pg chameleon, mysql to postgresql replica made easy
PDF
Life on a_rollercoaster
PDF
Backup recovery with PostgreSQL
PDF
Don't panic! - Postgres introduction
PDF
Streaming replication
PDF
PostgreSql query planning and tuning
PDF
PostgreSQL, The Big, The Fast and The Ugly
PDF
Postgresql database administration volume 1
Pg chameleon, mysql to postgresql replica made easy
Life on a_rollercoaster
Backup recovery with PostgreSQL
Don't panic! - Postgres introduction
Streaming replication
PostgreSql query planning and tuning
PostgreSQL, The Big, The Fast and The Ugly
Postgresql database administration volume 1

Recently uploaded (20)

PDF
CIFDAQ's Token Spotlight: SKY - A Forgotten Giant's Comeback?
PDF
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Building High-Performance Oracle Teams: Strategic Staffing for Database Manag...
PDF
madgavkar20181017ppt McKinsey Presentation.pdf
PDF
REPORT: Heating appliances market in Poland 2024
PDF
Google’s NotebookLM Unveils Video Overviews
PDF
HCSP-Presales-Campus Network Planning and Design V1.0 Training Material-Witho...
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Dell Pro 14 Plus: Be better prepared for what’s coming
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
KodekX | Application Modernization Development
PDF
Modernizing your data center with Dell and AMD
PDF
How AI Agents Improve Data Accuracy and Consistency in Due Diligence.pdf
PPTX
ABU RAUP TUGAS TIK kelas 8 hjhgjhgg.pptx
PDF
Automating ArcGIS Content Discovery with FME: A Real World Use Case
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Reimagining Insurance: Connected Data for Confident Decisions.pdf
CIFDAQ's Token Spotlight: SKY - A Forgotten Giant's Comeback?
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Building High-Performance Oracle Teams: Strategic Staffing for Database Manag...
madgavkar20181017ppt McKinsey Presentation.pdf
REPORT: Heating appliances market in Poland 2024
Google’s NotebookLM Unveils Video Overviews
HCSP-Presales-Campus Network Planning and Design V1.0 Training Material-Witho...
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Dell Pro 14 Plus: Be better prepared for what’s coming
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
KodekX | Application Modernization Development
Modernizing your data center with Dell and AMD
How AI Agents Improve Data Accuracy and Consistency in Due Diligence.pdf
ABU RAUP TUGAS TIK kelas 8 hjhgjhgg.pptx
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Chapter 3 Spatial Domain Image Processing.pdf
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
NewMind AI Weekly Chronicles - August'25 Week I
Reimagining Insurance: Connected Data for Confident Decisions.pdf

pg_chameleon MySQL to PostgreSQL replica made easy

  • 1. pg chameleon MySQL to PostgreSQL replica made easy Federico Campoli Transferwise PGCon, Ottawa 01 Jun 2018 http://www.pgdba.org @4thdoctor scarf Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 1 / 46
  • 2. Few words about the speaker Born in 1972 Passionate about IT since 1982 mostly because of the TRON movie Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 2 / 46
  • 3. Few words about the speaker Born in 1972 Passionate about IT since 1982 mostly because of the TRON movie Joined the Oracle DBA secret society in 2004 In love with PostgreSQL since 2006 Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 2 / 46
  • 4. Few words about the speaker Born in 1972 Passionate about IT since 1982 mostly because of the TRON movie Joined the Oracle DBA secret society in 2004 In love with PostgreSQL since 2006 Devrim PostgreSQL tattoo’s copycat Works at Transferwise as Data Engineer Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 2 / 46
  • 5. Disclaimer I’m not a developer I’m a DBA... Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 3 / 46
  • 6. Disclaimer I’m not a developer I’m a DBA...which means being hated by everybody and hating everybody Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 3 / 46
  • 7. Disclaimer I’m not a developer I’m a DBA...which means being hated by everybody and hating everybody So, to put things in the right perspective... Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 3 / 46
  • 8. Disclaimer I’m not a developer I’m a DBA...which means being hated by everybody and hating everybody So, to put things in the right perspective...I use tabs Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 3 / 46
  • 9. Palpatine Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 4 / 46
  • 10. Table of contents 1 History 2 MySQL Replica in a nutshell 3 A chameleon in the middle 4 Replica in action 5 Lessons learned 6 Wrap up Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 5 / 46
  • 11. History Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 6 / 46
  • 14. The beginnings. Years 2006/2012: neo_my2pg.py. I wrote the script because of a struggling phpbb on MySQL. The database migration was successful. However, phpbb didn’t work very well with PostgreSQL (opening a new connection for each query is not the smartest thing to do). The script is written in Python 2.6. It’s a monolithic script, and it’s slow, very slow. It’s a good checklist of things to avoid when coding. https://github.com/the4thdoctor/neo_my2pg
  • 16. I’m not scared of using the ORMs. Years 2013/2015: first attempt at pg chameleon. Developed in Python 2.7. Used SQLAlchemy for extracting MySQL’s metadata. Proof of concept only. It was built during the years of the life on a roller coaster (recording available here: http://www.pgbrighton.uk/post/backup_recovery/), therefore it was just a way to discharge frustration. Abandoned after a while: SQLAlchemy’s limitations were frustrating as well (see the Disclaimer slide), and pgloader did the same job much, much better.
  • 18. pg chameleon reborn. Year 2016: I needed to replicate the data from MySQL to PostgreSQL. http://tech.transferwise.com/scaling-our-analytics-database/ The amazing library python-mysql-replication allowed me to build a proof of concept, which later evolved into pg chameleon 1.x. Kudos to the python-mysql-replication team! https://github.com/noplay/python-mysql-replication
  • 20. pg chameleon 1.x: Developed on the London to Brighton commute. Released as stable on 7 May 2017. Followed by 8 bugfix releases. Compatible with CPython 2.7/3.3+. No more SQLAlchemy. The MySQL driver changed from MySQLdb to PyMySQL. Command line helper. Supports type override on the fly (Danger!). Installs in a virtualenv or system wide via PyPI. Can detach the replica for minimal downtime migrations.
  • 23. pg chameleon version 1’s limitations: All the affected tables are locked in read only mode during the init replica process. During the init replica the data is not accessible. Tables require primary keys to be replicated. No daemon, the process always stays in the foreground. Single schema replica. One process per schema. Network inefficient. Read and replay are not concurrent, with the risk of high lag. The optional threaded mode is very inefficient and fragile. A single error in the replay process and the replica is broken.
  • 24. MySQL Replica in a nutshell
  • 25. MySQL Replica: The MySQL replica is logical. When the replica is enabled the data changes are stored in the master’s binary log files. The slave gets the changes from the master’s binary log files. The slave saves the stream of data into local relay logs. The relay logs are replayed against the slave.
  • 26. MySQL Replica
  • 27. Log formats: MySQL has three ways of storing the changes in the binary logs. STATEMENT: it logs the statements, which are replayed on the slave. It’s the best solution for bandwidth. However, when replaying statements with non-deterministic functions this format generates different values on the slave (e.g. an insert with a column autogenerated by the uuid function). ROW: it’s deterministic. This format logs the row images. MIXED: it takes the best of both worlds. The master logs the statements unless a non-deterministic function is used; in that case it logs the row image. All three formats always log the DDL query events. The python-mysql-replication library, and therefore pg chameleon, requires the ROW format to work properly.
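The difference between STATEMENT and ROW logging can be illustrated with a small Python toy. This is only a sketch: the function names are hypothetical, and `uuid.uuid4()` stands in for MySQL's `uuid()` function. It does not talk to MySQL at all.

```python
import uuid


def insert_with_uuid():
    # Stand-in for a statement such as: INSERT INTO t (id) VALUES (uuid())
    return {"id": str(uuid.uuid4())}


def replay_statement(statement):
    """STATEMENT format: the replica re-executes the statement itself."""
    return statement()


def replay_row(row_image):
    """ROW format: the replica applies the logged row image verbatim."""
    return dict(row_image)


# The master executes the statement once and produces a row.
master_row = insert_with_uuid()

# Replaying the statement calls the non-deterministic function again.
statement_replica_row = replay_statement(insert_with_uuid)

# Replaying the row image copies the value the master actually stored.
row_replica_row = replay_row(master_row)
```

With statement replay the replica ends up with a different identifier than the master, while the row image reproduces the master's value exactly.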
  • 28. A chameleon in the middle
  • 29. pg chameleon: pg chameleon mimics a MySQL slave’s behaviour. It performs the initial load for the replicated tables. It connects to the MySQL replica protocol. It stores the row images into a PostgreSQL table. A PL/pgSQL function decodes the rows and replays the changes. It can detach the replica for minimal downtime migrations. PostgreSQL acts as relay log and replication slave.
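The "store the row images, then replay them" idea can be sketched in plain Python. This is an assumption-laden toy, not pg chameleon's code: the real replay is done by a PL/pgSQL function against a PostgreSQL log table, while here the log is a list of dictionaries and the target table is a dictionary keyed by primary key. The event layout (`action`, `pk`, `row`) is hypothetical.

```python
def replay(log_rows, table):
    """Apply queued row images, in order, to a table keyed by primary key."""
    for event in log_rows:
        action, pk = event["action"], event["pk"]
        if action in ("insert", "update"):
            # The row image carries the full row, so it simply replaces
            # whatever is stored under that key.
            table[pk] = event["row"]
        elif action == "delete":
            table.pop(pk, None)
    return table


# Hypothetical relay log: row images captured from the source binlog.
relay_log = [
    {"action": "insert", "pk": 1, "row": {"id": 1, "name": "igor"}},
    {"action": "insert", "pk": 2, "row": {"id": 2, "name": "frau"}},
    {"action": "update", "pk": 1, "row": {"id": 1, "name": "Igor"}},
    {"action": "delete", "pk": 2, "row": None},
]

replica_table = replay(relay_log, {})
```

Because reading (filling the log) and replaying (draining it) only share the log, the two steps can run as separate processes, which is the split version 2.0 makes.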
  • 30. MySQL replica + pg chameleon
  • 31. pg chameleon 2.0 #1: Developed at pgconf.eu 2017 and on the commute. Released as stable on 1 January 2018. Compatible with Python 3.3+. Installs in a virtualenv or system wide via PyPI. Replicates multiple schemas from a single MySQL into a target PostgreSQL database. Conservative approach to the replica: tables which generate errors are automatically excluded from the replica. Daemonised replica process with two distinct subprocesses, for concurrent read and replay.
  • 32. pg chameleon 2.0 #2: Soft locking replica initialisation: the tables are locked only during the copy. Rollbar integration for simpler error detection and messaging. Experimental support for the PostgreSQL source type. The tables are loaded in a separate schema which is swapped with the existing one. This approach requires more space but it makes the init replica virtually painless, leaving the old data accessible until the init replica is complete. The DDL is translated into the PostgreSQL dialect, keeping the schema in sync with MySQL automatically.
  • 33. Version 2.0’s limitations: Tables require primary or unique keys to be replicated. When detaching the replica the foreign keys are always created as ON DELETE/UPDATE RESTRICT. The PostgreSQL source type supports only the init replica process.
  • 35. Replica initialisation: The replica initialisation follows the same workflow as stated in the MySQL online manual: flush the tables with read lock, get the master’s coordinates, copy the data, release the locks. However... pg chameleon flushes the tables with read lock one by one. The lock is held only during the copy. The log coordinates are stored in the replica catalogue along with the table’s name and are used by the replica process to determine whether the table’s binlog data should be used or not. The replica starts inconsistent and gains consistency over time.
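The per-table coordinate check described above could look roughly like the following sketch. The function names and the event layout are hypothetical, and it assumes that an event's position is its start offset in the binlog file, so an event sitting exactly at the stored position happened after the copy.

```python
def binlog_position(log_file, log_pos):
    """Turn ('mysql-bin.000012', 345) into a comparable tuple (12, 345)."""
    sequence = int(log_file.rsplit(".", 1)[1])
    return (sequence, log_pos)


def should_replay(event, table_coordinates):
    """Decide whether a binlog event must be applied to the replica.

    table_coordinates maps a table name to the (log_file, log_pos) pair
    recorded when that table was copied.
    """
    table = event["table"]
    if table not in table_coordinates:
        return False  # the table is not part of the replica
    copied_at = binlog_position(*table_coordinates[table])
    event_at = binlog_position(event["log_file"], event["log_pos"])
    # Changes logged before the copy are already contained in the copied data.
    return event_at >= copied_at


coordinates = {"film": ("mysql-bin.000003", 1200)}

old_event = {"table": "film", "log_file": "mysql-bin.000003", "log_pos": 800}
new_event = {"table": "film", "log_file": "mysql-bin.000004", "log_pos": 4}
```

Since every table has its own coordinates, tables copied early start applying changes while later tables are still being copied, which is why the replica gains consistency over time.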
  • 37. Fallback on failure: The data is pulled from MySQL in slices using the CSV format. This approach prevents memory overload. Once the file is saved it is pushed into PostgreSQL using the COPY command. However... COPY is fast but runs in a single transaction: one failure and the entire batch is rolled back. If this happens the procedure loads the same data using INSERT statements, which can be very slow. The process attempts to clean the NUL markers which are allowed by MySQL. If the row still fails on insert then it’s discarded.
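The fallback logic can be sketched as follows. This is a simplified illustration under stated assumptions, not pg chameleon's implementation: `copy_rows` and `insert_row` are hypothetical callables standing in for the COPY and INSERT paths, `FakeTarget` is a stand-in for PostgreSQL, and the NUL cleanup is applied before each single-row insert.

```python
def load_batch(rows, copy_rows, insert_row):
    """Try a bulk load first; on failure retry row by row, discarding bad rows."""
    try:
        copy_rows(rows)  # fast path, all or nothing
        return len(rows), []
    except Exception:
        pass  # the whole batch was rolled back, fall back to single inserts

    loaded, discarded = 0, []
    for row in rows:
        # Strip the NUL characters that MySQL accepts but PostgreSQL rejects.
        cleaned = [v.replace("\x00", "") if isinstance(v, str) else v for v in row]
        try:
            insert_row(cleaned)
            loaded += 1
        except Exception:
            discarded.append(row)
    return loaded, discarded


class FakeTarget:
    """Stand-in for PostgreSQL: rejects NUL bytes and rows without a key."""

    def __init__(self):
        self.rows = []

    def _check(self, row):
        if row[0] is None or any(isinstance(v, str) and "\x00" in v for v in row):
            raise ValueError("rejected row")

    def copy_rows(self, rows):
        for row in rows:
            self._check(row)  # a single bad row fails the whole batch
        self.rows.extend(rows)

    def insert_row(self, row):
        self._check(row)
        self.rows.append(row)


target = FakeTarget()
batch = [[1, "ok"], [2, "bad\x00byte"], [None, "no key"]]
loaded, discarded = load_batch(batch, target.copy_rows, target.insert_row)
```

In this run the bulk path fails, two rows are loaded through the slow path (one after cleaning) and the row without a key is discarded.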
  • 38. Replica in action
  • 39. MySQL configuration: The MySQL configuration file is usually stored in /etc/mysql/my.cnf. To enable binary logging, find the section [mysqld] and check that the following parameters are set: binlog_format = ROW log-bin = mysql-bin server-id = 1 binlog-row-image = FULL
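A quick way to verify those parameters is a small checker built on Python's standard `configparser`. This is a hypothetical helper, not part of pg chameleon, and it assumes a simple my.cnf: directives such as `!includedir` are not understood by `configparser` and would need to be stripped first.

```python
import configparser

# Values the slide asks for; option names are normalised to underscores
# because MySQL accepts both "log-bin" and "log_bin".
REQUIRED_VALUES = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_mysqld_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section of cnf_text."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    settings = {
        name.replace("-", "_"): (value or "").strip()
        for name, value in parser["mysqld"].items()
    }
    problems = []
    for name, expected in REQUIRED_VALUES.items():
        if settings.get(name, "").upper() != expected:
            problems.append(f"{name} should be {expected}")
    if "log_bin" not in settings:
        problems.append("log_bin is not set")
    if not settings.get("server_id"):
        problems.append("server_id is not set")
    return problems


sample_cnf = """
[mysqld]
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
"""

problems = check_mysqld_settings(sample_cnf)
```

An empty list means the four parameters from the slide are present with the expected values.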
  • 40. MySQL user for replica: Set up a replication user on MySQL. CREATE USER usr_replica; SET PASSWORD FOR usr_replica = PASSWORD('replica'); GRANT ALL ON sakila.* TO 'usr_replica'; GRANT RELOAD ON *.* TO 'usr_replica'; GRANT REPLICATION CLIENT ON *.* TO 'usr_replica'; GRANT REPLICATION SLAVE ON *.* TO 'usr_replica'; FLUSH PRIVILEGES; In our example we are using the sakila test database. https://dev.mysql.com/doc/sakila/en/
  • 41. PostgreSQL setup: Add a user on PostgreSQL able to create schemas and relations in the destination database. CREATE USER usr_replica WITH PASSWORD 'replica'; CREATE DATABASE db_replica WITH OWNER usr_replica;
  • 42. Install pg chameleon: Install pg chameleon and create the configuration files, then edit the file default.yml, setting the correct values for connection and source: pip install pip --upgrade; pip install pg_chameleon; chameleon set_configuration_files; cd ~/.pg_chameleon/configuration; cp config-example.yml default.yml
  • 45. Configure global settings in default.yml. PostgreSQL connection: pg_conn: {host: "localhost", port: "5432", user: "usr_replica", password: "replica", database: "db_replica", charset: "utf8"} Rollbar configuration: rollbar_key: '<rollbar_long_key>' rollbar_env: 'pgcon-demo' Type override (optional): type_override: {"tinyint(1)": {override_to: boolean, override_tables: ["*"]}}
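How a type override rule might be applied can be sketched in a few lines of Python. The `resolve_type` helper and its arguments are hypothetical, and the sketch assumes that `"*"` in `override_tables` means "every table"; it illustrates the configuration's intent rather than pg chameleon's internals.

```python
# The override rule from the configuration above, as a Python dictionary.
TYPE_OVERRIDE = {
    "tinyint(1)": {"override_to": "boolean", "override_tables": ["*"]},
}


def resolve_type(table, mysql_type, default_pg_type, overrides=TYPE_OVERRIDE):
    """Return the PostgreSQL type for a column, honouring the override rules."""
    rule = overrides.get(mysql_type)
    if rule is None:
        return default_pg_type
    tables = rule["override_tables"]
    if "*" in tables or table in tables:
        return rule["override_to"]
    return default_pg_type


overridden = resolve_type("film", "tinyint(1)", "smallint")
untouched = resolve_type("film", "int(11)", "integer")
```

The "Danger!" warning from the 1.x feature list applies here too: if a column declared as tinyint(1) holds values other than 0 and 1, converting it to boolean cannot work for every row.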
  • 48. Configure the mysql source: sources: {mysql: {db_conn: {host: "localhost", port: "3306", user: "usr_replica", password: "replica", charset: 'utf8', connect_timeout: 10}, schema_mappings: {sakila: loxodonta_africana}, limit_tables: null, skip_tables: null, grant_select_to: [usr_readonly], lock_timeout: "120s", my_server_id: 100, replica_batch_size: 10000, replay_max_rows: 10000, batch_retention: '1 day', copy_max_memory: "300M", copy_mode: 'file', out_dir: /tmp, sleep_loop: 1, on_error_replay: continue, on_error_read: continue, auto_maintenance: "1 day", type: mysql}}
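The effect of `schema_mappings`, `limit_tables` and `skip_tables` can be sketched as a small routing function. This is a hypothetical helper, and it assumes that table filters are written as `schema.table`; it only illustrates how a MySQL table name could be routed to its PostgreSQL schema.

```python
def target_table(source_table, schema_mappings, limit_tables=None, skip_tables=None):
    """Map 'mysql_schema.table' to 'pg_schema.table', or None if excluded."""
    schema, table = source_table.split(".", 1)
    if schema not in schema_mappings:
        return None  # schema is not replicated at all
    if limit_tables and source_table not in limit_tables:
        return None  # a limit list exists and this table is not on it
    if skip_tables and source_table in skip_tables:
        return None  # table explicitly skipped
    return f"{schema_mappings[schema]}.{table}"


mappings = {"sakila": "loxodonta_africana"}
mapped = target_table("sakila.film", mappings)
skipped = target_table("sakila.film", mappings, skip_tables=["sakila.film"])
```

With the configuration from the slide, every table in the MySQL schema sakila lands in the PostgreSQL schema loxodonta_africana.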
  • 49. Add the source and initialise the replica: Add the source mysql and initialise the replica for it. We are using --debug in order to get the logging on the console. chameleon create_replica_schema --debug; chameleon add_source --config default --source mysql --debug; chameleon init_replica --config default --source mysql --debug
  • 51. Start the replica. Start the replica process: chameleon start_replica --config default --source mysql Show the replica status: chameleon show_status --config default --source mysql
  • 52. Time for a demo: Demo! The demo will fail miserably for sure and you will hate this project forever.
  • 53. Lessons learned
  • 54. Strictness is an illusion. MySQL doubly so: MySQL’s lack of strictness is not a mystery. The funny way MySQL manages defaults with NOT NULL can break the replica. Therefore any field with NOT NULL added after the initialisation is always created as NULLable in PostgreSQL.
  • 57. The DDL. A real pain in the back: I initially tried to use sqlparse for tokenising the DDL emitted by MySQL. Unfortunately it didn’t work as I expected, so I decided to use regular expressions. Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. -- Jamie Zawinski. MySQL, even in ROW format, emits the DDL as statements. The class sql_token uses regular expressions to tokenise the DDL. The tokenised data is used to build the DDL in the PostgreSQL dialect.
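As an illustration of the regular expression approach, the sketch below tokenises one narrow kind of MySQL statement (ALTER TABLE ... ADD COLUMN) and rebuilds it for PostgreSQL. It is not the sql_token class: the pattern, the `TYPE_MAP` table and the function name are assumptions made for this example, and real MySQL DDL has many more forms than this pattern covers.

```python
import re

# Hypothetical, deliberately tiny type mapping.
TYPE_MAP = {
    "int": "integer",
    "tinyint": "smallint",
    "datetime": "timestamp without time zone",
    "varchar": "character varying",
}

ADD_COLUMN = re.compile(
    r"ALTER\s+TABLE\s+`?(?P<table>\w+)`?\s+"
    r"ADD\s+(?:COLUMN\s+)?`?(?P<column>\w+)`?\s+"
    r"(?P<type>\w+)(?:\((?P<size>[\d,\s]+)\))?",
    re.IGNORECASE,
)


def translate_add_column(mysql_ddl):
    """Translate a simple MySQL ADD COLUMN statement, or return None."""
    match = ADD_COLUMN.search(mysql_ddl)
    if match is None:
        return None
    token = match.groupdict()
    pg_type = TYPE_MAP.get(token["type"].lower(), token["type"].lower())
    # Only character types keep their length; integer display widths are dropped.
    if token["size"] and pg_type == "character varying":
        pg_type += f"({token['size']})"
    # NOT NULL is intentionally not carried over, so the column is NULLable.
    return f'ALTER TABLE "{token["table"]}" ADD COLUMN "{token["column"]}" {pg_type};'


translated = translate_add_column(
    "ALTER TABLE `film` ADD COLUMN `notes` VARCHAR(64) NOT NULL"
)
```

Dropping NOT NULL in the rebuilt statement mirrors the lesson from the strictness slide: fields added after the initialisation are created as NULLable.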
  • 58. Wrap up
  • 59. To boldly go where no chameleon has gone before. Short term goals, version 2.0: Automatically re-sync the tables when they error on replay. Improve the replay speed and CPU efficiency. GTID support for the MySQL source. Medium term goals, version 2.1: Parallel copy and index creation in order to speed up the init replica process. Logical replica from PostgreSQL. Improve the default column handling.
  • 60. Igor, the green little guy: The chameleon logo was designed by Elena Toma, a talented Italian lady. https://www.facebook.com/Tonkipapperoart/ The name Igor is inspired by Marty Feldman’s Igor, as portrayed in the movie Young Frankenstein.
  • 61. Feedback please! Please report any issue on GitHub and follow pg chameleon on Twitter for the announcements. https://github.com/the4thdoctor/pg_chameleon @pg_chameleon
  • 62. Did you say hire? WE ARE HIRING! https://transferwise.com/jobs/
  • 63. That’s all folks! Thank you for listening! Any questions? Please be very basic, I’m just an electrician after all.
  • 64. Image credits: Palpatine, Dr. Evil disclaimer, It could work (Young Frankenstein): source memegenerator. MySQL image: source WikiCommons. Hard disk image: source WikiCommons. Tron image: source Tron Wikia. Twitter icon: source Open Icon Library. The PostgreSQL logo: copyright the PostgreSQL Global Development Group. Boromir get rid of mysql: source imgflip. Morpheus: source imgflip. Keep calm chameleon: source imgflip. The dolphin picture: copyright artnoose. Perseus, Framed: copyright Federico Campoli. Pinkie Pie that’s all folks: copyright dan232323, used with permission. Doom: source RetroPie.
  • 65. License: This document is distributed under the terms of the Creative Commons Attribution, Not Commercial, Share Alike license.
  • 66. pg chameleon: MySQL to PostgreSQL replica made easy. Federico Campoli, Transferwise. PGCon, Ottawa, 01 Jun 2018. http://www.pgdba.org @4thdoctor_scarf