
Migration From DB2 in a Large Public Setting: Lessons Learned

Balázs Bárány and Michael Banck

PGConf.EU 2017
Introduction

I Federal state ministry in Germany


I Hosting by state’s central IT service centre
I Michael worked as an external consultant for both
I Balázs took over DBA role and migration lead at ministry
I Michael continues to support the service centre’s Postgres operations



Introduction

I Proof-of-Concept version of this talk presented at pgconf.eu 2015


I Slides still available on https://wiki.postgresql.org/
I DB2 UDB is the z/OS mainframe edition of IBM’s DB2 database
I DB2 UDB central database and application server (“the Host”) in German state ministry
I Used by programs written in (mostly) Software AG Natural and Java (some PL/I)
I Natural (and PL/I) programs directly executed on the mainframe, no network round-trip
I Business-critical, handles considerable payouts of EU subsidies
I Crunch time is in spring, when users apply for subsidies



Prior Postgres Usage

I Postgres introduced about 12 years ago due to geospatial requirements (PostGIS, nothing comparable for DB2 at the time)
I Started using Postgres for smaller, non-critical projects about 7 years ago
I Modernized the software stack merging geospatial and business data about 5 years ago
I In-house code development of Java web applications (Tomcat/Hibernate)
I Business logic in the applications, almost no (DB-level) foreign keys, no stored procedures
I Some business data retrieved from DB2, either via a second JDBC connection, or via
batch migrations
I Migrated all Natural and PL/I programs to Java/Postgres



Application Migration Strategy

I Java Applications
I Development environment switched to Postgres and errors fixed
I Not a lot of problems if Hibernate is used
I Potentially migrated to a modernized framework
I PL/I Applications
I Rewritten in Java (only a few)
I Natural Applications
I Automatic migration/transcription into (unidiomatic, but correct) Java, initially still running against DB2
I Test of migrated “Java” application on the original data
I Test on schema migrated to PostgreSQL
I Multi-year project facilitated by an external consultancy



Setup Before the Migration

I Postgres
I Postgres-9.4/PostGIS-2.1 (upgrade to 9.6/2.3 planned in late 2017)
I SLES11, 64 cores, 512 GB memory, SAN storage
I HA 2-node setup using Pacemaker, two streaming standbys (one disaster recovery
standby)
I Roughly 1.3 TB data, 22 schemas, 440 tables, 180 views in PROD
I Almost no stored procedures (around 10)
I DB2
I DB2 UDB Version 10
I Roughly 600 GB data in PROD instance
I Almost no stored procedures (around 20, written in PL/I)



Steps towards migration

I Natural migration to Java delayed


I Originally planned for November 2015, ready in July 2017
I Gave us one year for testing the migration process
I Several Java projects maintained by external developers have been (mostly)
successfully tested on local Postgres deployments
I First production migration of a complex Java program and its schema done in early
2016
I Required daily migration of core tables (DB2 to PostgreSQL) starting at that point
I Separate DB2 database operated by the ministry migrated from mainframe to another
Postgres instance successfully in Q1/2017
I Just a data migration, schema was migrated by hand



Tools used for the migration

I SQL Workbench/J (http://www.sql-workbench.net) v117.6


I Java-based, DB-agnostic workbench GUI
I Heavily-used in-house already, installed on workstations
I Allows for headless script/batch operation via internal programs
I Used for schema migration and data export from DB2
I pgloader (http://pgloader.io) 3.2.0
I Postgres bulk loading and migration tool written in Lisp
I Open Source (PostgreSQL license)
I Written and maintained by Dimitri Fontaine (PostgreSQL major contributor)
I Used for data import into Postgres
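As a rough illustration of that last step (file path, connection string and options are invented, and the exact clauses should be checked against the pgloader documentation), a load command for one exported flat file could look roughly like this:

  LOAD CSV
       FROM '/migration/export/payments.txt'
       INTO postgresql://migration@localhost/appdb?app.payments
       WITH truncate,
            fields terminated by '\t',
            fields optionally enclosed by '"'
        SET client_encoding to 'utf-8';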



Schema Migration

I General Approach
I Dump schema objects into an XML representation
I Transform XML into Postgres DDL via XSLT
I Provide compatibility environment for functions called in views and triggers
I Post-process SQL DDL to remove/work-around remaining issues
I Handle triggers separately
I Ignore functions/stored procedures (out-of-scope)
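To make the pipeline concrete: the XSLT step turns the XML description of each DB2 table into plain Postgres DDL. A minimal sketch of the kind of output produced (table, columns and type mapping invented for illustration):

  CREATE TABLE app.payments (
      id        integer NOT NULL,
      amount    numeric(15,2),
      reference varchar(40),
      changed   timestamp(6) NOT NULL DEFAULT current_timestamp,
      PRIMARY KEY (id)
  );
  -- post-processing then cleans up whatever the stylesheet could not express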



DB2 Compatibility Layer (db2fce)

I Similar (in spirit) to orafce, only SQL-functions so far


I https://github.com/credativ/db2fce, PostgreSQL license
I SYSIBM.SYSDUMMY1 view (similar to Oracle’s DUAL table)
I SELECT 1 FROM SYSIBM.SYSDUMMY1;
I db2 Schema:
I Time/Date: MICROSECOND()/SECOND()/MINUTE()/HOUR()/DAY()/MONTH()/YEAR()/DAYS()/MONTHS_BETWEEN()
I String: LOCATE()/TRANSLATE()/STRIP()
I Casts: CHAR()/INTEGER()/INT()/DOUBLE()/DECIMAL()/DEC()
I Aliases: VALUE() (for coalesce()), DOUBLE (for DOUBLE PRECISION type), ^= (for <>
/ != operators), !! (for || operator)
I search_path changed to 'db2, public' in the database configuration
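A hedged example of the effect (table and column names invented): with db2fce installed and db2 on the search_path, DB2-flavoured SQL such as the following should run unchanged on Postgres:

  SELECT 1 FROM SYSIBM.SYSDUMMY1;
  SELECT VALUE(amount, 0)        AS amount,       -- VALUE() maps to coalesce()
         INT(customer_id)        AS customer_id,  -- scalar-function cast
         LOCATE('EUR', currency) AS pos           -- DB2 string function
    FROM payments
   WHERE YEAR(changed) = YEAR(CURRENT_DATE);
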
Data Migration, Encountered Problems

I Several tables had \x00 values in them, resulting in "invalid byte sequence for
encoding UTF8: 0x00" errors
I Exporting tables with a column USER resulted in WbExport writing the username of
the person running it
I Default timestamp resolution was too coarse, leading to duplicate key violations
I NUMERIC(X,Y) columns were exported with a precision of 2 only
I Timestamps falling into the daylight-saving-time switch were rejected by PostgreSQL on import
I Workaround: export with -Duser.timezone=GMT even though the values are local (Central European) timestamps
I Objects in target DB with the same name as in the source, but different contents
I Renamed in source system



Full Migration

I Closed databases for “normal” usage


I Source DB switched to read only
I PostgreSQL: removed USAGE on schemas from non-DBA users
I Notified users with open connections
I Deactivated HA watchdog, disaster recovery
I Scripted (automatic) migration process:
I Dumped schema to XML, converted to DDL, post-processed
I Dropped indexes, constraints and triggers
I Exported data
I Imported data
I Set sequence values
I Created indexes, constraints and triggers
I Created grants
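Two of these scripted steps spelled out as SQL, with invented object names: resetting each sequence to the current maximum key after the bulk load, and recreating an index afterwards:

  SELECT setval('app.payments_id_seq', (SELECT max(id) FROM app.payments));
  CREATE INDEX payments_customer_idx ON app.payments (customer_id);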



Full Migration, Results

I Full migration in 3 processes (different schemas) in 14 hours incl. index building


I Database was gaining 1 GB every 2 minutes when 3 processes were writing
I Up to 80 Mbit/sec both incoming and outgoing on the network interfaces
I Up to 120 Mbit/sec when writing (4 CPUs at the limit)
I Data validation jobs started whenever a schema was ready
I Minimal differences in floating point representation
I Everything else identical, including binary data and sequence values
I Watching logs for errors while the applications start
I A few schema or table permissions were missing
I Tables missing
I Dropped from source system before, application was not tested



SQL Differences

I Migration Guide in PostgreSQL wiki


I https://wiki.postgresql.org/wiki/File:DB2UDB-to-PG.pdf
I Age and Author unknown
I Noticed SQL Differences
I CURRENT TIMESTAMP etc. (but CURRENT_TIMESTAMP is supported by DB2 as well)
I Casts via scalar functions like INT(foo.id)
I CURRENT DATE + 21 DAYS
I ‘2100-12-31 24.00.00.000000’ timestamps in the data: DB2's hour 24 rolls over to the next day in Postgres, so year 2100 becomes 2101
I Operators like != instead of <>
I “Default default” values: attribute INTEGER DEFAULT (no value given) in DB2
I Corresponds to attribute INTEGER DEFAULT 0 in PostgreSQL
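A small before/after sketch (invented table) of the kind of rewrite these differences imply; the DB2 constructs either need db2fce or a rewrite to the native form:

  -- DB2 style
  SELECT INT(a.id) AS id, CURRENT DATE + 21 DAYS AS deadline FROM applications a;
  -- native PostgreSQL
  SELECT a.id::integer AS id, CURRENT_DATE + INTERVAL '21 days' AS deadline FROM applications a;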



Behaviour Differences

I DB2 sorts data by GROUP BY keys, no need for ORDER BY


I PostgreSQL doesn’t guarantee this
I Sorting differences
I EBCDIC: numbers after characters (ASCII: before)
I EBCDIC: special characters after characters and numbers
I Similar behaviour with C collation in Postgres
I Applications using EBCDIC order inside values
I Application got “duplicate key value” error
I Tried to use CURRENT TIMESTAMP as primary key
I Postgres: CURRENT_TIMESTAMP is the transaction start time. DB2: the current time, regardless of transaction. (Postgres's clock_timestamp() behaves like DB2 here.)
I Application trying to insert NULL into field with DEFAULT
I DB2 accepted the NULL and used the DEFAULT
I For Postgres, we had to create a trigger to fix the INSERTs
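A minimal sketch of such a trigger (hypothetical table and column names; the real trigger has to know the affected columns and their DB2 defaults):

  CREATE FUNCTION app.fix_null_status() RETURNS trigger AS $$
  BEGIN
      IF NEW.status IS NULL THEN
          NEW.status := 0;   -- the column's DEFAULT on the DB2 side
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER fix_null_status BEFORE INSERT ON app.applications
      FOR EACH ROW EXECUTE PROCEDURE app.fix_null_status();
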
Transaction handling

I Some queries in DB2 using WITH UR


I UR = Uncommitted Read
I Performance optimization to avoid locks (but getting inconsistent data)
I No comparable built-in mechanism in PostgreSQL, but thanks to MVCC (readers do not block writers) locking is not a huge problem
I PostgreSQL cancels the entire transaction after an error, ROLLBACK necessary
I Some program logic needed changes (bad error handling, errors used for branching logic)
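One common PostgreSQL-side pattern for the "errors used for branching logic" case (a general technique, not necessarily what was done here) is to wrap the risky statement in a savepoint so a failure does not poison the rest of the transaction:

  BEGIN;
  SAVEPOINT try_insert;
  INSERT INTO app.applications (id, status) VALUES (42, 0);  -- may fail with duplicate key
  -- on error: ROLLBACK TO SAVEPOINT try_insert; and take the UPDATE branch instead
  COMMIT;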



Performance

I Most applications comparable or faster after migration


I A few with mainframe access patterns slower
I Some smaller impacts: different indexing, JDBC oddities, . . .
I A few huge problems (application not usable)
I Indexes correct, query is fast in SQL client
I Not using “best” index when called from prepared query
I Found the reason: ID (integer) field was queried with NUMERIC parameter
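To illustrate with an invented table: there is no integer = numeric operator, so when the parameter is bound as NUMERIC the indexed integer column is cast to numeric and the plain b-tree index cannot be used; binding or casting the parameter as integer restores the index scan:

  -- slow: parameter declared numeric, index on id (integer) not usable
  PREPARE q_slow(numeric) AS SELECT * FROM app.applications WHERE id = $1;
  -- fast: parameter declared integer, index usable again
  PREPARE q_fast(integer) AS SELECT * FROM app.applications WHERE id = $1;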



JDBC: defaultRowFetchSize

I Applications started crashing with Out of Memory errors


I Pattern: SELECT * FROM <big table>, read some rows for display
I PostgreSQL JDBC reads the whole data set by default
I DB2 didn’t, so the application was working fine
I Solution: setting defaultRowFetchSize to a reasonable value (e. g. 10000)
I No negative effects (negligible performance hit?)
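For example (hypothetical host and database names), the property can simply be appended to the JDBC URL: jdbc:postgresql://dbhost/appdb?defaultRowFetchSize=10000. Note that the PostgreSQL JDBC driver only honours the fetch size when autocommit is off, since it fetches the rows via a server-side cursor.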



JDBC: stringtype

I Errors with prepared statements, but query works in SQL client


I This works (automatic casting to date):
SELECT * FROM t WHERE dat = ’2017-08-01’;
I This doesn’t (with param1 = ’2017-08-01’):
SELECT * FROM t WHERE dat = ?;
I Could affect date, timestamp, numeric and Boolean columns
I Comparison of date type with “forced” text type fails, no automatic cast
I Solution: stringtype=unknown
I Fine for the affected applications
I Might be wrong in some situations, e. g. garbled date format
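For example (hypothetical host and database names): jdbc:postgresql://dbhost/appdb?stringtype=unknown makes the driver send string parameters with an unknown type, so the server infers date, numeric or boolean from the query context, restoring the DB2-like implicit casts.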



Summary

I Customer is happy
I Big savings on mainframe costs
I mainframe performance units, DB2, Natural runtime, . . .
I One modern database for business and GIS data, pleasant usage
I Better standards for DB roles and permissions, change management etc.



Contact

I DB2 compatibility extension: https://github.com/credativ/db2fce


I Michael Banck <[email protected]>
I http://www.credativ.de/postgresql-competence-center
I Balázs Bárány <[email protected]>
I https://datascientist.at/

Questions?

