pg_chameleon
MySQL to PostgreSQL lightweight replica
Federico Campoli
Brighton PostgreSQL Meetup
18 November 2016
Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 1 / 44
Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 The pg_chameleon library
4 Caveats, traps, the usual political stuff...
5 Wrap up
Some history
The beginnings
Years 2006/2012
neo_my2pg.py
Developed to help a struggling phpBB forum
The database was successfully migrated from MySQL to PostgreSQL
The migration failed for other reasons
It’s written in Python 2.6
It’s a monolithic script
And it’s slow, very slow
You can use it as a checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
I’m not scared of using the ORMs
Years 2013/2015
First attempt at pg_chameleon
Developed in Python 2.7
SQLAlchemy was used to extract the MySQL metadata
Good proof of concept. No real hope of becoming usable
Built during the years of the roller coaster
It was just a way to discharge frustration
Abandoned because pgloader did the same and better
The ORM limitations didn’t help to keep the project alive
pg_chameleon reborn
Year 2016
The project’s revamp was triggered by a specific need.
What if it were possible to replicate data from MySQL to PostgreSQL?
The library python-mysql-replication can decode the MySQL replica when the
ROW-based log format is used.
Trying won’t harm, they said.
Still on Python 2.7
Removed SQLAlchemy
Switched the MySQL driver to PyMySQL
The library python-mysql-replication reads the MySQL replica
Provides a basic command line
MySQL Replica in a nutshell
MySQL Replica
MySQL logs logical data changes rather than physical ones
The data changes are stored in a local binary log on the master
The slave pulls the replication data from the master and saves it in its local
relay logs
The slave reads the local relay logs and replays the data
[Diagram: MySQL Replica]
Log formats
STATEMENT logs the statements, which are then replayed on the slave.
It seems the best solution for performance.
Replaying non-deterministic functions (e.g. uuid) generates inconsistent slaves.
ROW is deterministic. It logs the changed rows and the DDL queries.
This format is required for pg_chameleon to work.
MIXED takes the best of both worlds. The master logs the statements unless
a non-deterministic function is used. In that case it logs the row image.
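The difference between STATEMENT and ROW can be shown with a toy model. This is only a sketch: plain Python lists stand in for the master and slave tables, and run_statement is a hypothetical helper, not part of MySQL or pg_chameleon.

```python
import uuid

def run_statement(table, make_value):
    """Execute an 'INSERT' whose value comes from a function call."""
    table.append(make_value())

# STATEMENT format: the slave re-executes the same statement,
# so a non-deterministic function is evaluated a second time.
master, statement_slave = [], []
run_statement(master, uuid.uuid4)
run_statement(statement_slave, uuid.uuid4)

# ROW format: the slave receives the row image produced on the master.
row_slave = list(master)

print(master == statement_slave)  # False: two independent random UUIDs
print(master == row_slave)        # True: the row image was copied
```

The same reasoning explains why MIXED switches to the row image only when a non-deterministic function is involved.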
A chameleon in the middle
pg_chameleon mimics a MySQL slave’s behaviour
Reads the replica
Stores the decoded rows in a PostgreSQL table
PostgreSQL acts as relay log and replication slave
A PL/pgSQL function decodes the rows and replays the changes
With an extra cool feature.
Initialise the PostgreSQL replica schema in just one command
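The store-then-replay idea can be sketched in a few lines. Everything here is an assumption made for illustration: a Python list plays the role of the PostgreSQL relay table, a dictionary plays the replicated tables, and store_event and replay are hypothetical names, not the library's functions. The real replay is done by a PL/pgSQL function.

```python
import json

def store_event(relay_log, action, table, row):
    """Append a decoded row event to the relay log as a JSON string."""
    relay_log.append(json.dumps({"action": action, "table": table, "row": row}))

def replay(relay_log, tables, pkey="id"):
    """Apply the stored events in order, then empty the relay log."""
    for raw in relay_log:
        event = json.loads(raw)
        rows = tables.setdefault(event["table"], {})
        key = event["row"][pkey]
        if event["action"] == "delete":
            rows.pop(key, None)
        else:
            # insert and update both store the latest row image
            rows[key] = event["row"]
    relay_log.clear()

relay_log, tables = [], {}
store_event(relay_log, "insert", "film", {"id": 1, "title": "Alien"})
store_event(relay_log, "update", "film", {"id": 1, "title": "Aliens"})
store_event(relay_log, "insert", "film", {"id": 2, "title": "Brazil"})
store_event(relay_log, "delete", "film", {"id": 2})
replay(relay_log, tables)
print(tables)  # {'film': {1: {'id': 1, 'title': 'Aliens'}}}
```

Updates and deletes are matched on the key column, so a replay of this kind needs a primary key on every replicated table.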
[Diagram: MySQL replica + pg_chameleon]
The pg_chameleon library
Project structure
project directory
    pg_chameleon.py
    config
        config.yaml
    logs
    pg_chameleon
        lib
            global_lib.py
            mysql_lib.py
            pg_lib.py
            sqlutil_lib.py
        sql
            upgrade
            create_schema.sql
pg_chameleon.py
Command line wrapper
Uses argparse to execute the commands
Can easily be extended with more commands
init_replica: copies the data from MySQL and saves the master coordinates in
PostgreSQL
    this command locks the MySQL tables in read-only mode during the copy
start_replica: connects to the MySQL master and replays the changes in
PostgreSQL
create_schema, drop_schema, upgrade_schema: manual actions on the
PostgreSQL service schema
    not required in general because init_replica recreates the service schema
    from scratch
    start_replica runs the schema migrations, if required, before starting the
    program loop
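A command line wrapper of this kind might look like the sketch below. The command names come from the list above; the parser layout and the run function are assumptions for illustration, not the actual pg_chameleon.py code.

```python
import argparse

# Command names taken from the list above; the dispatch itself is hypothetical.
COMMANDS = ("init_replica", "start_replica", "create_schema",
            "drop_schema", "upgrade_schema")

def build_parser():
    parser = argparse.ArgumentParser(description="pg_chameleon command line (sketch)")
    parser.add_argument("command", choices=COMMANDS, help="action to execute")
    return parser

def run(argv):
    args = build_parser().parse_args(argv)
    # A real wrapper would call the matching replica_engine method here.
    return "running %s" % args.command

print(run(["init_replica"]))  # running init_replica
```

Adding a command then only means adding its name to COMMANDS and a matching method to call, which is one way to read "easily extended".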
global_lib.py
class global_config: loads config.yaml into the class attributes
class replica_engine: wraps the mysql and pgsql class methods and sets up the
logging method. A global_config instance is created to get the
configuration settings
mysql_lib.py
class mysql_connection: connects to MySQL using the parameters provided by
replica_engine
class mysql_engine: does all the magic for the replication setup and execution
class mysql_engine
locks and releases the tables for the init_replica command
pulls the data out of MySQL in CSV format or as insert statements
extracts the metadata from MySQL’s information_schema
copies the data into PostgreSQL using the class pg_engine
falls back to inserts if the copy fails for any reason
starts the replica stream using python-mysql-replication
decodes the replica events into a data dictionary which is saved by pg_engine
when a replica binlog has been read, executes the PostgreSQL replay via pg_engine
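The "copy first, fall back to inserts" step can be sketched as a small pattern. The function names are hypothetical and the two loaders are passed in as callables, so the sketch makes no claim about how mysql_engine or pg_engine implement it. Collecting the rejected rows is an extra assumption of this example.

```python
def load_rows(rows, bulk_copy, insert_row):
    """Try one bulk copy; on any failure insert row by row and collect rejects."""
    try:
        bulk_copy(rows)
        return []
    except Exception:
        rejected = []
        for row in rows:
            try:
                insert_row(row)
            except Exception:
                rejected.append(row)
        return rejected

# Stand-ins for the fast and the slow loader: both refuse rows containing None.
target = []

def bulk_copy(rows):
    if any(None in row for row in rows):
        raise ValueError("copy aborted by a bad row")
    target.extend(rows)

def insert_row(row):
    if None in row:
        raise ValueError("bad row")
    target.append(row)

rejected = load_rows([(1, "a"), (2, None), (3, "c")], bulk_copy, insert_row)
print(target)    # [(1, 'a'), (3, 'c')]
print(rejected)  # [(2, None)]
```

The slow path costs one statement per row, which is why the fallback is only worth taking when the bulk copy has already failed.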
pg_lib.py
class pg_encoder: extends the JSON encoder class and adds some special handling for
types like decimal and datetime
class pgsql_connection: connects to the PostgreSQL database
class pgsql_engine: does all the magic for rebuilding the data structure,
loading data and migrating the schema
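An encoder along the lines of pg_encoder can be built on the standard json.JSONEncoder. This is a minimal sketch written for Python 3, with PgEncoder as a hypothetical name; which types the real class handles, and how it formats them, is not stated in the slides.

```python
import datetime
import decimal
import json

class PgEncoder(json.JSONEncoder):
    """Serialise types the default JSON encoder rejects."""
    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # Keep the exact decimal text instead of converting to float.
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        return super().default(obj)

row = {
    "price": decimal.Decimal("9.99"),
    "created": datetime.datetime(2016, 11, 18, 19, 0),
}
print(json.dumps(row, cls=PgEncoder, sort_keys=True))
# {"created": "2016-11-18T19:00:00", "price": "9.99"}
```

Returning the decimal as a string avoids the rounding a float conversion could introduce before the value reaches PostgreSQL.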
class pgsql_engine
creates and upgrades the service schema sch_chameleon
builds the create statements for tables and indices using the metadata
provided by mysql_engine
executes the create statements and registers the MySQL tables in sch_chameleon
copies the data into the tables and falls back to inserts if the copy fails
builds the primary keys and indices using the metadata provided by
mysql_engine
stores the JSON data from the replica and executes the replay
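Building a create statement from column metadata could look roughly like this. The metadata layout, the build_create_table name and the type map are all assumptions made for the example. The mapping actually used by pgsql_engine is not shown in the slides.

```python
# Hypothetical MySQL to PostgreSQL type map; unknown types fall back to text.
TYPE_MAP = {
    "int": "integer",
    "tinyint": "smallint",
    "varchar": "character varying",
    "datetime": "timestamp without time zone",
    "blob": "bytea",
}

def build_create_table(schema, table, columns):
    """columns: list of dicts with name, type and optional length / nullable keys."""
    col_defs = []
    for col in columns:
        pg_type = TYPE_MAP.get(col["type"], "text")
        if col.get("length") and pg_type == "character varying":
            pg_type += "(%d)" % col["length"]
        null_sql = "" if col.get("nullable", True) else " NOT NULL"
        col_defs.append('    "%s" %s%s' % (col["name"], pg_type, null_sql))
    return 'CREATE TABLE "%s"."%s" (\n%s\n);' % (schema, table, ",\n".join(col_defs))

ddl = build_create_table("sakila", "film", [
    {"name": "film_id", "type": "int", "nullable": False},
    {"name": "title", "type": "varchar", "length": 255},
])
print(ddl)
```

Primary keys and indices are left out on purpose, matching the list above where they are built in a later step.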
sqlutil_lib.py
Consists of just one class, sql_token, which tokenises the MySQL queries to be used by
pgsql_engine for building the DDL in PostgreSQL’s dialect.
Currently under development
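As a rough idea of what tokenising a DDL query involves, the sketch below picks the command and the table name out of CREATE TABLE and DROP TABLE statements with a regular expression. The pattern and the parse_ddl function are assumptions for illustration and are not the sql_token implementation.

```python
import re

# Matches CREATE TABLE / DROP TABLE, an optional IF [NOT] EXISTS clause
# and a table name with or without MySQL backticks.
DDL_PATTERN = re.compile(
    r"^\s*(?P<command>CREATE|DROP)\s+TABLE\s+"
    r"(?:IF\s+(?:NOT\s+)?EXISTS\s+)?`?(?P<table>\w+)`?",
    re.IGNORECASE,
)

def parse_ddl(statement):
    """Return the command and table name, or None for unsupported statements."""
    match = DDL_PATTERN.match(statement)
    if match is None:
        return None
    return {"command": match.group("command").upper(), "table": match.group("table")}

print(parse_ddl("CREATE TABLE `film` (id INT)"))       # {'command': 'CREATE', 'table': 'film'}
print(parse_ddl("drop table if exists film"))          # {'command': 'DROP', 'table': 'film'}
print(parse_ddl("ALTER TABLE film ADD COLUMN x INT"))  # None
```

A regular expression is enough for the statement header. Translating the column definitions into PostgreSQL's dialect needs a fuller tokeniser.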
config.yaml
my_server_id: the server id for the MySQL replica. Must be unique within the
replica cluster
copy_max_memory: the maximum amount of memory to use when copying the
table into PostgreSQL. The value can be specified in (k)ilobytes,
(M)egabytes or (G)igabytes by adding the suffix (e.g. 300M)
my_database: MySQL database to replicate. A schema with the same name will
be initialised in the PostgreSQL database
pg_database: destination database in PostgreSQL.
copy_mode: the allowed values are ‘file’ and ‘direct’. With direct the copy
happens on the fly. With file the table is first dumped to a CSV file, then
reloaded into PostgreSQL.
hexify: a YAML list of the data types that require conversion to hex (e.g.
blob, binary). The conversion happens during the copy and during the replica.
log_dir: directory where the logs are stored
log_level: logging verbosity. Allowed values are debug, info, warning, error
log_dest: log destination. stdout for debugging purposes, file for normal
activity.
my_charset: MySQL charset for the copy (please note the replica is always in
utf8)
pg_charset: PostgreSQL connection’s charset.
tables_limit: YAML list of the tables to replicate. If empty, the entire MySQL
database is replicated.
sleep_loop: seconds to wait between replica batch attempts
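A few of these settings lend themselves to a quick sanity check. The sketch below is not the global_config class: parse_size and check_config are hypothetical helpers that work on an already loaded dictionary, and the 1024-based multipliers are an assumption, since the slides only name the suffixes.

```python
# Assumed multipliers for the k, M and G suffixes.
UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(value):
    """Convert '300M' style strings to bytes; plain numbers are taken as bytes."""
    text = str(value).strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def check_config(config):
    """Return a list of messages for values outside the documented choices."""
    allowed = {
        "copy_mode": ("file", "direct"),
        "log_level": ("debug", "info", "warning", "error"),
        "log_dest": ("stdout", "file"),
    }
    errors = []
    for key, choices in allowed.items():
        if config.get(key) not in choices:
            errors.append("%s must be one of: %s" % (key, ", ".join(choices)))
    return errors

print(parse_size("300M"))  # 314572800
print(check_config({"copy_mode": "stream", "log_level": "info", "log_dest": "file"}))
# ['copy_mode must be one of: file, direct']
```

Running such a check before init_replica would catch a mistyped value without touching either database.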
MySQL connection parameters
mysql_conn:
    host: localhost
    port: 3306
    user: replication_username
    passwd: never_commit_passwords
PostgreSQL connection parameters
pg_conn:
    host: localhost
    port: 5432
    user: replication_username
    password: never_commit_passwords
MySQL replica configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf
To enable binary logging, find the section [mysqld] and check that the following
parameters are set.
binlog_format: has to be ROW for capturing the DML events
log-bin: any name is good (e.g. mysql-bin)
server-id: has to be a numerical value, unique within the replication cluster
The value 1 is used for the master
binlog_row_image: has to be FULL, as required by the python-mysql-replication
library
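The four parameters can be checked with a short script. This is a sketch using Python's configparser on the text of a configuration file. check_mysqld_settings is a hypothetical helper, and it treats a parameter that is absent from the file as a problem even when the server default would be acceptable. Real my.cnf files may contain directives such as !includedir that this simple parser does not handle.

```python
import configparser

REQUIRED = {"binlog_format": "row", "binlog_row_image": "full"}

def check_mysqld_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL accepts dashes and underscores interchangeably in option names.
    settings = {key.replace("-", "_"): (value or "")
                for key, value in parser.items("mysqld")}
    problems = []
    for name, expected in REQUIRED.items():
        if settings.get(name, "").lower() != expected:
            problems.append("%s should be %s" % (name, expected.upper()))
    if "log_bin" not in settings:
        problems.append("log-bin is not set")
    if not settings.get("server_id", "").isdigit():
        problems.append("server-id must be numeric")
    return problems

GOOD = """[mysqld]
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog_row_image = FULL
"""

print(check_mysqld_settings(GOOD))                          # []
print(check_mysqld_settings(GOOD.replace("ROW", "MIXED")))  # ['binlog_format should be ROW']
```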
MySQL setup
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica = PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
PostgreSQL setup
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
Replica setup
Copy config-yaml.example to config.yaml and set the configuration
parameters
./pg_chameleon.py init_replica
Wait for init_replica to complete, then start the replica with
./pg_chameleon.py start_replica
Caveats, traps, the usual political stuff...
Limitations
Tables require primary keys in order to be replicated
There is no cleanup for the rubbish accepted by MySQL (e.g. nulls implicitly
converted to 0)
No daemonisation yet
Binary data is hexified to avoid issues with PostgreSQL
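Hexifying a binary value is a one-liner in Python. The sketch below uses the standard binascii module, and hexify here is a hypothetical helper. How pg_chameleon converts the string back on the PostgreSQL side is not covered in the slides.

```python
import binascii

def hexify(value):
    """Return the hexadecimal text for a binary value, or None for NULL."""
    if value is None:
        return None
    return binascii.hexlify(value).decode("ascii")

print(hexify(b"\x00\xffPG"))  # 00ff5047
```

The result contains only the characters 0-9 and a-f, so it passes safely through CSV files and JSON documents. In PostgreSQL such a string can be turned back into bytea with decode(value, 'hex').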
What works
Replication of the MySQL schema into PostgreSQL
Locking the tables in MySQL and getting the master coordinates
Creation of primary keys and indices in PostgreSQL
Writing the MySQL row events into PostgreSQL
Replay of the replicated data in PostgreSQL
What seems to work
Enum support
Binary import into bytea (hex conversion)
Initial copy based on copy to file or in memory
Fall back to inserts in case of rubbish data (slow)
Replication of CREATE and DROP TABLE statements
What doesn’t work
Replication of ALTER TABLE statements
Materialisation of the MySQL views
Import of foreign keys into PostgreSQL
Daemonisation, background workers for replay, postgres extension
Wrap up
Igor, the green little guy
The chameleon logo was designed by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, as portrayed in the movie
Young Frankenstein.
Some numbers
Lines of code
global_lib.py                  163
mysql_lib.py                   521
pg_lib.py                      557
sql_util.py                    208
create_schema.sql              354
Total lines in libraries      1449
Total lines including SQL     1803
pg_chameleon’s license
Plain old 2-clause BSD License
Copyright (c) 2016, Federico Campoli
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Please Test!
That’s all!
Please clone the repository, test and break the tool!
Report issues!
https://github.com/the4thdoctor/pg_chameleon
Boring legal stuff
MySQL image, source: Wikimedia Commons
Hard disk image, source: Wikimedia Commons
Slonik logo, copyright PostgreSQL Global Development Group
Contacts and license
Twitter: 4thdoctor_scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons

More Related Content

PDF
Backup recovery with PostgreSQL
PDF
Hitchikers guide handout
PDF
PostgreSql query planning and tuning
PDF
PostgreSQL, the big the fast and the (NOSQL on) Acid
PDF
Streaming replication
PDF
The ninja elephant, scaling the analytics database in Transwerwise
PDF
PostgreSQL - backup and recovery with large databases
PDF
The ninja elephant, scaling the analytics database in Transwerwise
Backup recovery with PostgreSQL
Hitchikers guide handout
PostgreSql query planning and tuning
PostgreSQL, the big the fast and the (NOSQL on) Acid
Streaming replication
The ninja elephant, scaling the analytics database in Transwerwise
PostgreSQL - backup and recovery with large databases
The ninja elephant, scaling the analytics database in Transwerwise

What's hot (20)

PDF
Pg big fast ugly acid
PDF
The hitchhiker's guide to PostgreSQL
PDF
pg_chameleon a MySQL to PostgreSQL replica
PDF
Life on a_rollercoaster
PDF
a look at the postgresql engine
PDF
pg_chameleon MySQL to PostgreSQL replica made easy
PDF
A couple of things about PostgreSQL...
PDF
Don't panic! - Postgres introduction
PDF
[로켓 자바] Part 1 성능 튜닝 마인드 확립
PDF
JPA Week3 Entity Mapping / Hexagonal Architecture
PDF
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
PDF
lwref: insane idea on reference counting
PDF
new ifnet(9) KPI for FreeBSD network stack
PDF
etcd based PostgreSQL HA Cluster
PDF
New sendfile
PDF
Kernel Recipes 2016 - The kernel report
PDF
Week7 bean life cycle
PDF
Ireland OUG Meetup May 2017
PDF
XXXsrc 2018 -the record of the past year-
PPTX
Git basic
Pg big fast ugly acid
The hitchhiker's guide to PostgreSQL
pg_chameleon a MySQL to PostgreSQL replica
Life on a_rollercoaster
a look at the postgresql engine
pg_chameleon MySQL to PostgreSQL replica made easy
A couple of things about PostgreSQL...
Don't panic! - Postgres introduction
[로켓 자바] Part 1 성능 튜닝 마인드 확립
JPA Week3 Entity Mapping / Hexagonal Architecture
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
lwref: insane idea on reference counting
new ifnet(9) KPI for FreeBSD network stack
etcd based PostgreSQL HA Cluster
New sendfile
Kernel Recipes 2016 - The kernel report
Week7 bean life cycle
Ireland OUG Meetup May 2017
XXXsrc 2018 -the record of the past year-
Git basic
Ad

Viewers also liked (9)

PDF
Postgresql database administration volume 1
PPTX
Media planning
PPTX
MuCEM
PPTX
My favorite movember mustaches
DOCX
Course outline aacsb mba 839_global outsourcing_r kumar
PDF
SISTEMA DE IDENTIDADE VISUAL - SIV - CANNIBAL Sex Shop
PPTX
Media planning f
PPTX
Catálogo de bolsas Chenson - Cristal cosmetic
PDF
PostgreSQL, The Big, The Fast and The Ugly
Postgresql database administration volume 1
Media planning
MuCEM
My favorite movember mustaches
Course outline aacsb mba 839_global outsourcing_r kumar
SISTEMA DE IDENTIDADE VISUAL - SIV - CANNIBAL Sex Shop
Media planning f
Catálogo de bolsas Chenson - Cristal cosmetic
PostgreSQL, The Big, The Fast and The Ugly
Ad

Similar to Pg chameleon MySQL to PostgreSQL replica (20)

PDF
Python performance engineering in 2017
PDF
Pg chameleon, mysql to postgresql replica made easy
PDF
GenAI-powered assistants compared in a real case - 2025-03-18
PPT
Os Webb
PPT
How Typepad changed their architecture without taking down the service
PDF
Raptor user manual3.0
PDF
Nancy CLI. Automated Database Experiments
PDF
My works in gitub, etc.
PDF
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
PDF
PostgreSQL Server Programming - Second Edition Dar
PPTX
FAIR Projector Builder
PDF
Postgresql Up And Running Regina Obe Leo Hsu
PDF
Intermediate python
PDF
Spark Meetup
PDF
R programming for data science
PPTX
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
PPTX
Apache Camel K - Copenhagen v2
PPTX
Apache Camel K - Copenhagen
PDF
Learning Concurrent Programming in Scala Second Edition Aleksandar Prokopec
PDF
Python Interview Questions And Answers 2019 | Edureka
Python performance engineering in 2017
Pg chameleon, mysql to postgresql replica made easy
GenAI-powered assistants compared in a real case - 2025-03-18
Os Webb
How Typepad changed their architecture without taking down the service
Raptor user manual3.0
Nancy CLI. Automated Database Experiments
My works in gitub, etc.
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
PostgreSQL Server Programming - Second Edition Dar
FAIR Projector Builder
Postgresql Up And Running Regina Obe Leo Hsu
Intermediate python
Spark Meetup
R programming for data science
Cassandra Summit Sept 2015 - Real Time Advanced Analytics with Spark and Cass...
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen
Learning Concurrent Programming in Scala Second Edition Aleksandar Prokopec
Python Interview Questions And Answers 2019 | Edureka

Recently uploaded (20)

PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
PPTX
ChatGPT's Deck on The Enduring Legacy of Fax Machines
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
Chapter 2 Digital Image Fundamentals.pdf
PPTX
CroxyProxy Instagram Access id login.pptx
PPTX
Telecom Fraud Prevention Guide | Hyperlink InfoSystem
PDF
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
PDF
agentic-ai-and-the-future-of-autonomous-systems.pdf
PDF
This slide provides an overview Technology
PPTX
How Much Does It Cost to Build a Train Ticket App like Trenitalia in Italy.pptx
PDF
GamePlan Trading System Review: Professional Trader's Honest Take
PPTX
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
PDF
Event Presentation Google Cloud Next Extended 2025
PDF
solutions_manual_-_materials___processing_in_manufacturing__demargo_.pdf
PDF
CIFDAQ's Token Spotlight: SKY - A Forgotten Giant's Comeback?
PDF
Doc9.....................................
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Smarter Business Operations Powered by IoT Remote Monitoring
NewMind AI Weekly Chronicles - July'25 - Week IV
ChatGPT's Deck on The Enduring Legacy of Fax Machines
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Chapter 2 Digital Image Fundamentals.pdf
CroxyProxy Instagram Access id login.pptx
Telecom Fraud Prevention Guide | Hyperlink InfoSystem
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
agentic-ai-and-the-future-of-autonomous-systems.pdf
This slide provides an overview Technology
How Much Does It Cost to Build a Train Ticket App like Trenitalia in Italy.pptx
GamePlan Trading System Review: Professional Trader's Honest Take
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
Event Presentation Google Cloud Next Extended 2025
solutions_manual_-_materials___processing_in_manufacturing__demargo_.pdf
CIFDAQ's Token Spotlight: SKY - A Forgotten Giant's Comeback?
Doc9.....................................
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Understanding_Digital_Forensics_Presentation.pptx
Smarter Business Operations Powered by IoT Remote Monitoring

Pg chameleon MySQL to PostgreSQL replica

  • 1. pg chameleon MySQL to PostgreSQL lightweight replica Federico Campoli Brighton PostgreSQL Meetup 18 November 2016 Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 1 / 44
  • 2. Table of contents 1 Some history 2 MySQL Replica in a nutshell 3 The pg chameleon library 4 Caveats, traps, the usual political stuff... 5 Wrap up Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 2 / 44
  • 3. Some history Table of contents 1 Some history 2 MySQL Replica in a nutshell 3 The pg chameleon library 4 Caveats, traps, the usual political stuff... 5 Wrap up Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 3 / 44
  • 4. Some history The beginnings Years 2006/2012 neo my2pg.py Developed for helping a struggling phpbb The database was successfully migrated from MySQL to PostgreSQL The migration failed for other reasons Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 4 / 44
  • 5. Some history The beginnings Years 2006/2012 neo my2pg.py Developed for helping a struggling phpbb The database was successfully migrated from MySQL to PostgreSQL The migration failed for other reasons It’s written in python 2.6 It’s a monolith script And it’s slow, very slow Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 4 / 44
  • 6. Some history The beginnings Years 2006/2012 neo my2pg.py Developed for helping a struggling phpbb The database was successfully migrated from MySQL to PostgreSQL The migration failed for other reasons It’s written in python 2.6 It’s a monolith script And it’s slow, very slow You can use it as checklist for things to avoid when coding https://github.com/the4thdoctor/neo my2pg Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 4 / 44
  • 7. Some history I’m not scared of using the ORMs Years 2013/2015 First attempt of pg chameleon Developed in Python 2.7 SQLAlchemy was used for extracting the MySql’s metadata Good proof of concept. No real hope to become usable Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 5 / 44
  • 8. Some history I’m not scared of using the ORMs Years 2013/2015 First attempt of pg chameleon Developed in Python 2.7 SQLAlchemy was used for extracting the MySql’s metadata Good proof of concept. No real hope to become usable Built during the years of the roller coaster It was a just a way to discharge frustration Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 5 / 44
  • 9. Some history I’m not scared of using the ORMs Years 2013/2015 First attempt of pg chameleon Developed in Python 2.7 SQLAlchemy was used for extracting the MySql’s metadata Good proof of concept. No real hope to become usable Built during the years of the roller coaster It was a just a way to discharge frustration Abandoned because pgloader did the same and better The ORM limitations didn’t help to keep the project alive Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 5 / 44
  • 10. Some history pg chameleon reborn Year 2016 The project’s revamp the was triggered by a specific need. What if were possible to replicate data from MySQL to PostgreSQL? The library python-mysql-replication can decode the mysql replica when using ROW based. Trying won’t harm they said. Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 6 / 44
  • 11. Some history pg chameleon reborn Is still on Python 2.7 Removed SQLAlchemy Switched the mysql driver to PyMySQL The library python-mysql-replication reads the MySQL replica Provides a basic command line Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 7 / 44
  • 12. MySQL Replica in a nutshell Table of contents 1 Some history 2 MySQL Replica in a nutshell 3 The pg chameleon library 4 Caveats, traps, the usual political stuff... 5 Wrap up Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 8 / 44
  • 13. MySQL Replica in a nutshell MySQL Replica MySQL saves the logical data rather the physical The data changes are stored in a local binary log The slave saves in its local relay logs the replication data pulled from the master The slave read the local relay logs and replays the data Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 9 / 44
  • 14. MySQL Replica in a nutshell MySQL Replica Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 10 / 44
  • 15. MySQL Replica in a nutshell Log formats STATEMENT format logs the statements which are replayed on the slave. It seems the best solution for performance. Replaying not deterministic functions generate inconsistent slaves (e.g. uuid). ROW is deterministic. It logs the changed row and the DDL queries. This format is required for pg chameleon to work. MIXED takes the best of both worlds. The master logs the statements unless a not deterministic function is used. In that case it logs the row image. Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 11 / 44
  • 16. MySQL Replica in a nutshell A chameleon in the middle pg chameleon mimics a mysql slave’s behaviour Reads the replica Stores the decoded rows into a PostgreSQL table PostgreSQL acts as relay log and replication slave A plpgSQL function decodes the rows and replay the changes Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 12 / 44
  • 17. MySQL Replica in a nutshell A chameleon in the middle pg chameleon mimics a mysql slave’s behaviour Reads the replica Stores the decoded rows into a PostgreSQL table PostgreSQL acts as relay log and replication slave A plpgSQL function decodes the rows and replay the changes With an extra cool feature. Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 12 / 44
  • 18. MySQL Replica in a nutshell: A chameleon in the middle. pg chameleon mimics a MySQL slave's behaviour: it reads the replica stream and stores the decoded rows into a PostgreSQL table. PostgreSQL acts as both relay log and replication slave: a PL/pgSQL function decodes the rows and replays the changes. With an extra cool feature: the PostgreSQL replica schema is initialised with just one command.
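The "relay log in a table, then replay" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: in the real tool the log lives in a PostgreSQL table and the replay is done by a PL/pgSQL function, while here a list and a dictionary stand in for both. The event layout (`table`, `action`, `pk`, `row`) and the sample rows are invented for illustration.

```python
import json


def store_event(log_table, event):
    """Save a decoded row event as JSON, like a row in a relay-log table."""
    log_table.append(json.dumps(event))


def replay(log_table, tables):
    """Apply the stored events in order, then leave the log empty."""
    while log_table:
        event = json.loads(log_table.pop(0))
        rows = tables.setdefault(event["table"], {})
        if event["action"] == "delete":
            rows.pop(event["pk"], None)
        else:
            # Inserts and updates both carry the full row image.
            rows[event["pk"]] = event["row"]


log_table = []
tables = {}

store_event(log_table, {"table": "film", "action": "insert", "pk": 1,
                        "row": {"film_id": 1, "title": "first title"}})
store_event(log_table, {"table": "film", "action": "update", "pk": 1,
                        "row": {"film_id": 1, "title": "second title"}})
store_event(log_table, {"table": "film", "action": "insert", "pk": 2,
                        "row": {"film_id": 2, "title": "to be removed"}})
store_event(log_table, {"table": "film", "action": "delete", "pk": 2,
                        "row": None})

replay(log_table, tables)
print(tables)
```

Keeping the stored events separate from the replay means the reader of the MySQL stream never has to wait for the changes to be applied.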
  • 19. MySQL Replica in a nutshell: MySQL replica + pg chameleon
  • 20. The pg chameleon library
  • 21. The pg chameleon library: Project structure. Project directory: pg_chameleon.py; config (config.yaml); logs; pg_chameleon: lib (global_lib.py, mysql_lib.py, pg_lib.py, sqlutil_lib.py), sql (upgrade, create_schema.sql).
  • 22. The pg chameleon library: pg_chameleon.py. Command line wrapper. Uses argparse to execute the commands and can easily be extended with more commands.
  • 23. The pg chameleon library: pg_chameleon.py. init_replica copies the data from MySQL and saves the master coordinates in PostgreSQL; this command locks the MySQL tables in read-only mode during the copy. start_replica connects to the MySQL master and replays the changes in PostgreSQL. create_schema, drop_schema and upgrade_schema are manual actions on the PostgreSQL service schema; they are not required in general, because init_replica recreates the service schema from scratch and start_replica runs the schema migrations, if required, before starting the program loop.
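A command line wrapper of this kind could look roughly like the sketch below. The command names come from the slide; `build_parser`, `main` and the placeholder handlers are hypothetical and do not reproduce the actual pg_chameleon.py.

```python
import argparse

COMMANDS = [
    "init_replica",
    "start_replica",
    "create_schema",
    "drop_schema",
    "upgrade_schema",
]


def build_parser():
    """One positional argument restricted to the known commands."""
    parser = argparse.ArgumentParser(description="Command line wrapper sketch")
    parser.add_argument("command", choices=COMMANDS, help="action to run")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder handlers: a real wrapper would call the replica engine here.
    handlers = {
        name: (lambda name=name: "would run " + name) for name in COMMANDS
    }
    return handlers[args.command]()


if __name__ == "__main__":
    print(main())
```

With this layout, adding a command means adding one name to `COMMANDS` and one handler, and argparse rejects anything that is not in the list.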
  • 24. The pg chameleon library: global_lib.py. class global_config: loads config.yaml into the class attributes. class replica_engine: wraps the mysql and pgsql class methods and sets up the logging; it creates a global_config instance to get the configuration settings.
  • 25. The pg chameleon library: mysql_lib.py. class mysql_connection: connects to MySQL using the parameters provided by replica_engine. class mysql_engine: does all the magic for the replication setup and execution.
  • 26. The pg chameleon library: mysql_lib.py, class mysql_engine. Locks and releases the tables for the init_replica command. Pulls the data out of MySQL in CSV format or as insert statements. Extracts the metadata from MySQL's information_schema. Copies the data into PostgreSQL using the class pg_engine, falling back to inserts if the copy fails for any reason. Starts the replica stream using python-mysql-replication. Decodes the replica events into a data dictionary, which is saved by pg_engine. When a replica binlog has been read, executes the PostgreSQL replay via pg_engine.
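The "copy first, fall back to inserts" behaviour can be sketched as follows. The callables `copy_rows` and `insert_row` are hypothetical stand-ins for the real database operations, and collecting the rejected rows is an addition made for this example, not something stated on the slide.

```python
def load_table(rows, copy_rows, insert_row):
    """Try a bulk copy first; fall back to row-by-row inserts on any failure.

    Returns a tuple (method_used, rejected_rows).
    """
    try:
        copy_rows(rows)
        return "copy", []
    except Exception:
        # The bulk load failed as a whole, so retry one row at a time
        # and collect the rows that still cannot be loaded.
        rejected = []
        for row in rows:
            try:
                insert_row(row)
            except Exception:
                rejected.append(row)
        return "insert", rejected


# Fake target table that refuses rows with a missing id.
target = []


def fake_copy(rows):
    if any(row["id"] is None for row in rows):
        raise ValueError("bulk copy aborted")
    target.extend(rows)


def fake_insert(row):
    if row["id"] is None:
        raise ValueError("bad row")
    target.append(row)


method, rejected = load_table(
    [{"id": 1}, {"id": None}, {"id": 2}], fake_copy, fake_insert
)
print(method, rejected, target)
```

With a real PostgreSQL driver, a failed copy would presumably also need a rollback before the inserts start, since the failed transaction is left in an aborted state.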
  • 27. The pg chameleon library: pg_lib.py. class pg_encoder: extends the JSON encoder class and adds special handling for types like decimal and datetime. class pgsql_connection: connects to the PostgreSQL database. class pgsql_engine: does all the magic for rebuilding the data structure, loading the data and migrating the schema.
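An encoder with special handling for decimal and datetime values might look like the sketch below. It relies on the standard `json.JSONEncoder.default` hook; the class name `PgEncoder`, the choice to serialise decimals as strings and the sample row are assumptions made for illustration, not the tool's actual code.

```python
import datetime
import decimal
import json


class PgEncoder(json.JSONEncoder):
    """JSON encoder that also accepts decimal and date/time values."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # A string keeps the exact value; a float could lose precision.
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        # Anything else falls through to the standard TypeError.
        return super().default(obj)


row = {
    "amount": decimal.Decimal("4.99"),
    "payment_date": datetime.datetime(2016, 11, 18, 19, 30),
}
encoded = json.dumps(row, cls=PgEncoder, sort_keys=True)
print(encoded)
```

Without such an encoder, `json.dumps` raises a `TypeError` on the first decimal or datetime value it meets in a decoded row.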
  • 28. The pg chameleon library: pg_lib.py, class pgsql_engine. Creates and upgrades the service schema sch_chameleon. Builds the create statements for tables and indices using the metadata provided by mysql_engine. Executes the create statements and registers the MySQL tables in sch_chameleon. Copies the data into the tables, falling back to inserts if the copy fails. Builds the primary keys and indices using the metadata provided by mysql_engine. Stores the JSON data from the replica and executes the replay.
  • 29. The pg chameleon library: sqlutil_lib.py. Consists of just one class, sql_token, which tokenises the MySQL queries so that pgsql_engine can build the DDL in PostgreSQL's dialect. Currently under development.
  • 30. The pg chameleon library: config.yaml. my_server_id: the server id for the MySQL replica; it must be unique within the replica cluster. copy_max_memory: the maximum amount of memory to use when copying a table into PostgreSQL; the value can be given in (k)ilobytes, (M)egabytes or (G)igabytes by adding the suffix (e.g. 300M). my_database: the MySQL database to replicate; a schema with the same name will be initialised in the PostgreSQL database. pg_database: the destination database in PostgreSQL. copy_mode: the allowed values are 'file' and 'direct'; with direct the copy happens on the fly, with file the table is first dumped to a CSV file and then reloaded into PostgreSQL. hexify: a YAML list of the data types that require conversion to hex (e.g. blob, binary); the conversion happens both during the copy and during the replica.
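A value such as 300M has to be turned into a number of bytes at some point. The sketch below shows one way to do it; the function name `parse_copy_max_memory`, the 1024-based multipliers and the treatment of a bare number as bytes are assumptions, since the slide only lists the accepted suffixes.

```python
import re

# Assumed multipliers: the slide names the suffixes but not their exact size.
_UNITS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_copy_max_memory(value):
    """Convert a setting like '300M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([kMG]?)", str(value).strip())
    if match is None:
        raise ValueError("invalid copy_max_memory value: %r" % (value,))
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]


print(parse_copy_max_memory("300M"))
print(parse_copy_max_memory("64k"))
```

Rejecting anything that does not match the pattern keeps a typo in config.yaml from silently turning into a tiny or huge memory limit.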
  • 31. The pg chameleon library: config.yaml. log_dir: the directory where the logs are stored. log_level: logging verbosity; the allowed values are debug, info, warning and error. log_dest: log destination; stdout for debugging purposes, file for normal activity. my_charset: the MySQL charset for the copy (please note the replica is always in utf8). pg_charset: the PostgreSQL connection's charset. tables_limit: a YAML list of the tables to replicate; if empty, the entire MySQL database is replicated. sleep_loop: the seconds to wait between replica batch attempts.
  • 32. The pg chameleon library: config.yaml, MySQL connection parameters
      mysql_conn:
          host: localhost
          port: 3306
          user: replication_username
          passwd: never_commit_passwords
  • 33. The pg chameleon library: config.yaml, PostgreSQL connection parameters
      pg_conn:
          host: localhost
          port: 5432
          user: replication_username
          password: never_commit_passwords
  • 34. The pg chameleon library: MySQL replica configuration. The MySQL configuration file is usually stored in /etc/mysql/my.cnf. To enable binary logging, find the section [mysqld] and check that the following parameters are set. binlog_format: has to be ROW to capture the DML events. log-bin: any name is good (e.g. mysql-bin). server-id: has to be a numerical value, unique across the replication cluster; the value 1 is used for the master. binlog_row_image: has to be full, as required by the python-mysql-replication library.
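The checklist above can be turned into a small sanity check with Python's standard configparser. This is a sketch, not part of pg chameleon: `check_mysqld_settings` is a hypothetical helper, the sample text is invented, and a real my.cnf may contain directives (such as include lines) that configparser does not understand.

```python
import configparser

REQUIRED = {"binlog_format": "row", "binlog_row_image": "full"}


def check_mysqld_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL accepts dashes and underscores in option names alike.
    settings = {
        key.replace("-", "_"): (value or "")
        for key, value in parser.items("mysqld")
    }
    problems = []
    for name, expected in REQUIRED.items():
        if settings.get(name, "").lower() != expected:
            problems.append("%s should be %s" % (name, expected.upper()))
    if "log_bin" not in settings:
        problems.append("log_bin is not set")
    if not settings.get("server_id", "").isdigit():
        problems.append("server_id must be numeric")
    return problems


SAMPLE = """
[mysqld]
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog_row_image = FULL
"""

print(check_mysqld_settings(SAMPLE))
```

An empty list means the four parameters from the slide are present with the expected values.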
  • 35. The pg chameleon library: MySQL setup
      CREATE USER usr_replica;
      SET PASSWORD FOR usr_replica = PASSWORD('replica');
      GRANT ALL ON sakila.* TO 'usr_replica';
      GRANT RELOAD ON *.* TO 'usr_replica';
      GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
      GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
      FLUSH PRIVILEGES;
  • 36. The pg chameleon library: PostgreSQL setup
      CREATE USER usr_replica WITH PASSWORD 'replica';
      CREATE DATABASE db_replica WITH OWNER usr_replica;
  • 37. The pg chameleon library: Replica setup. Copy config-yaml.example to config.yaml and set the configuration parameters. Run ./pg_chameleon.py init_replica. Wait for init_replica to complete, then start the replica with ./pg_chameleon.py start_replica.
  • 38. Caveats, traps, the usual political stuff...
  • 39. Caveats, traps, the usual political stuff...: Limitations. Tables need a primary key to be replicated. There is no cleanup for the rubbish accepted by MySQL (e.g. nulls implicitly converted to 0). No daemonisation yet. Binary data is hexified to avoid issues with PostgreSQL.
  • 40. Caveats, traps, the usual political stuff...: What works. Replicating the MySQL schema into PostgreSQL. Locking the tables in MySQL and getting the master coordinates. Creating primary keys and indices on PostgreSQL. Writing the MySQL row events into PostgreSQL. Replaying the replicated data in PostgreSQL.
  • 41. Caveats, traps, the usual political stuff...: What seems to work. Enum support. Binary import into bytea (hex conversion). Initial copy based on copy to file or in memory. Fallback to inserts in case of rubbish data (slow). Replication of CREATE and DROP TABLE statements.
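The hex conversion for binary columns could be as simple as the sketch below. The helper names `hexify` and `to_bytea_literal` are hypothetical; the only PostgreSQL fact relied on is that bytea accepts the hex input format, a `\x` prefix followed by hexadecimal digits.

```python
def hexify(value):
    """Turn a binary value (e.g. from a blob column) into a hex string."""
    if value is None:
        return None
    return bytes(value).hex()


def to_bytea_literal(hex_string):
    """Prefix the hex digits the way PostgreSQL's bytea hex format expects."""
    return "\\x" + hex_string


blob = b"\x00\xffPG"
as_hex = hexify(blob)
print(as_hex)
print(to_bytea_literal(as_hex))
```

A hex string is plain ASCII, so it travels safely through CSV files and JSON documents, and `bytes.fromhex(as_hex)` gives back the original bytes.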
  • 42. Caveats, traps, the usual political stuff...: What doesn't work. Replication of ALTER TABLE statements. Materialisation of the MySQL views. Foreign keys import in PostgreSQL. Daemonisation, background workers for replay, postgres extension.
  • 43. Wrap up
  • 44. Wrap up: Igor, the green little guy. The chameleon logo was designed by Elena Toma, a talented Italian lady. https://www.facebook.com/Tonkipapperoart/ The name Igor is inspired by Martin Feldman's Igor, as portrayed in the movie Young Frankenstein.
  • 45. Wrap up: Some numbers. Lines of code:
      global_lib.py                 163
      mysql_lib.py                  521
      pg_lib.py                     557
      sql_util.py                   208
      create_schema.sql             354
      Total lines in libraries     1449
      Total lines including SQL    1803
  • 46. Wrap up: pg chameleon's license. Plain old 2-clause BSD License.
      Copyright (c) 2016, Federico Campoli. All rights reserved.
      Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
      * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
      * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
      THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • 47. Wrap up: Please test! That's all! Please clone the repository, test and break the tool! Report issues! https://github.com/the4thdoctor/pg_chameleon
  • 48. Wrap up: Boring legal stuff. MySQL image: source WikiCommons. Hard disk image: source WikiCommons. Slonik logo: copyright PostgreSQL Global Development Group.
  • 49. Wrap up: Contacts and license. Twitter: 4thdoctor_scarf. Blog: http://www.pgdba.co.uk. Brighton PostgreSQL Meetup: http://www.meetup.com/Brighton-PostgreSQL-Meetup/. This document is distributed under the terms of the Creative Commons
  • 50. Wrap up: pg chameleon, MySQL to PostgreSQL lightweight replica. Federico Campoli, Brighton PostgreSQL Meetup, 18 November 2016