IBM Data Movement Tool
Vikram S. Khatri
Certified Consulting I/T Specialist
IBM
19 Jun 2009
This article presents a very simple and powerful tool that enables applications from
Oracle to be run on IBM® DB2® Version 9.7 for Linux®, UNIX®, and Windows®. The
tool can also be used to move data from various other database management
systems to DB2 for Linux, UNIX, and Windows and DB2 for z/OS®.
Introduction
Beginning with DB2 V9.7 for Linux, UNIX, and Windows, the Migration Toolkit (MTK)
is not required in order to use applications from Oracle on DB2 products. This tool
replaces the MTK functionality with a greatly simplified workflow.
For all other scenarios, for example, moving data from a database to DB2 for z/OS,
this tool complements the MTK, particularly in the area of high-speed data
movement. Using this tool, as much as 4 TB of data has been moved in just three days.
A GUI provides an easy-to-use interface for the novice, while the command-line
API is often preferred by advanced users.
Preparation
Download
First, download the tool from the Download section to your target DB2 server.
Additional steps are required to move data to DB2 for z/OS. (Check for the latest
available version of the tool.)
Installation
Once you have downloaded the IBMDataMovementTool.zip file, extract the files into
a directory called IBMDataMovementTool on your target DB2 server. A server-side
install (on the DB2 server) is strongly recommended to achieve the best data
movement performance.
Prerequisites
• DB2 V9.7 should be installed on your target server if you are enabling an
Oracle application to be run on DB2 for Linux, UNIX, and Windows.
• Java™ version 1.5 or higher must be installed on your target server. To
verify your current Java version, run the java -version command. By
default, Java is installed as part of DB2 for Linux, UNIX, and Windows in
<install_dir>\SQLLIB\java\jdk (Windows) or /opt/ibm/db2/V9.7/java/jdk
(Linux).
Table 1. Location of JDBC drivers for your source database and DB2

Database                            JDBC drivers
Oracle                              ojdbc5.jar, ojdbc6.jar, or ojdbc14.jar; xdb.jar;
                                    xmlparserv2.jar; or classes12.jar or
                                    classes111.jar for Oracle 7 or 8i
SQL Server                          sqljdbc5.jar or sqljdbc.jar
Sybase                              jconn3.jar
MySQL                               mysql-connector-java-5.0.8-bin.jar or latest driver
PostgreSQL                          postgresql-8.1-405.jdbc3.jar or latest driver
DB2 for Linux, UNIX, and Windows    db2jcc.jar and db2jcc_license_cu.jar, or
                                    db2jcc4.jar and db2jcc4_license_cu.jar
DB2 for z/OS                        db2jcc.jar and db2jcc_license_cisuz.jar, or
                                    db2jcc4.jar and db2jcc4_license_cisuz.jar
DB2 for i                           jt400.jar
MS Access                           Access_JDBC30.jar (optional)
Environment setup
Since a database connection to the target is required to run the tool, the DB2
database must be created first. On DB2 V9.7, we recommend that you use the
default automatic storage and choose a 32 KB page size. When enabling
applications to be run on DB2 V9.7, the instance and the database must be
operating in compatibility mode. It is also recommended to adjust the rounding
behavior to match Oracle's. You can deploy objects out of dependency order
by setting the revalidation semantics to deferred_force.
On UNIX systems
$ db2set DB2_COMPATIBILITY_VECTOR=ORA
$ db2set DB2_DEFERRED_PREPARE_SEMANTICS=YES
$ db2stop force
$ db2start
$ db2 "create db testdb automatic storage yes on /db2data1,
/db2data2,/db2data3 DBPATH ON /db2system PAGESIZE 32 K"
$ db2 update db cfg for testdb using auto_reval deferred_force
$ db2 update db cfg for testdb using decflt_rounding round_half_up
On Windows systems
The registry and configuration commands shown above are the same on Windows; run
them from a DB2 Command Window, omitting the $ prompt and using Windows drive
letters for the database paths.
Starting the tool
On Windows:
IBMDataMovementTool.cmd
On UNIX:
chmod +x IBMDataMovementTool.sh
./IBMDataMovementTool.sh
You will now see a GUI window. Some messages should also appear in the shell
window. Please look through these messages to ensure no errors were logged
before you start using the GUI.
If you have not set DB2_COMPATIBILITY_VECTOR, the tool will report a warning.
Please follow the steps to set the compatibility vector if you have not done so.
[2010-01-10 17.08.58.578]
INPUT Directory = .
[2010-01-10 17.08.58.578]
Configuration file loaded: 'jdbcdriver.properties'
[2010-01-10 17.08.58.593]
Configuration file loaded: 'IBMExtract.properties'
[2010-01-10 17.08.58.593]
appJar : 'C:\IBMDataMovementTool\IBMDataMovementTool.jar'
[2010-01-10 17.08.59.531]
DB2 PATH is C:\Program Files\IBM\SQLLIB
[2010-01-10 17.35.30.015]
*** WARNING ***. The DB2_COMPATIBILITY_VECTOR is not set.
[2010-01-10 17.35.30.015]
To set compatibility mode, discontinue this program and
run the following commands
[2010-01-10 17.35.30.015] db2set DB2_COMPATIBILITY_VECTOR=FFF
[2010-01-10 17.35.30.015] db2stop force
[2010-01-10 17.35.30.015] db2start
The GUI screen as shown in Figure 1 has fields for specifying the source and DB2
database connection information. The sequence of events in this screen is:
4. Specify the working directory where the DDL and data are to be extracted.
5. Choose whether you want DDL and/or DATA. If you select only DDL, an
additional genddl script is generated.
6. Click the Extract DDL/Data button. You can monitor progress in the
console window.
8. Optionally, click the View Script/Output button to check the
generated scripts, DDL, data, or the output log file.
10. You can use Execute DB2 Script to run the generated DB2 scripts
instead of running them from the command line. Data movement is an
iterative exercise: if you need to drop all tables before you start fresh,
you can select the drop-table script and execute it. You can also use this
button to execute the scripts in the order you want them to be executed.
After clicking the Extract DDL/Data button, you will see the tool's messages in the
View File tab, as shown in Figure 2.
After the extraction of the DDL and DATA completes, you will notice several new
files created in the working directory. These files can be run against DB2 from
the command line.
Configuration files
The following command scripts are regenerated each time you run the tool in GUI
mode. However, you can use these scripts to perform all data movement steps
without the GUI. This is helpful when you want to embed this tool in a batch
process to accomplish an automated data movement.
You can also run the tool in command-line mode, particularly when GUI capability
is not available. The tool switches modes automatically if it is not able to start
the GUI. If you want to force the tool to run in command-line interactive mode,
you can specify:
On Windows:
IBMDataMovementTool -console
On UNIX:
./IBMDataMovementTool.sh -console
You will be presented with interactive options to specify the source and DB2
database connection parameters in a step-by-step process. A sample of the output
from the console window is shown below:
After the extraction of the DDL and DATA, you have three different ways of
deploying the extracted objects in DB2.
The interactive deploy option is likely your best choice when you are also deploying
PL/SQL objects such as triggers, functions, procedures, and PL/SQL packages.
The GUI screen as shown in Figure 4 is used for interactive deployment of DDL and
other database objects. The sequence of events in this screen is:
3. Use the Open Directory button to select the working directory containing
the previously extracted objects. The objects are read and listed in a tree
view.
4. You can deploy all objects by pressing the Deploy All Objects button on the
toolbar. Most objects will deploy successfully, while others may fail.
5. When you click an object that failed to deploy in the tree view, you
can see the source of the object in the editor window. The reason for the
failure is listed in the deployment log below.
7. You can select one or more objects using the CTRL key and click the Deploy
Selected Objects button on the toolbar to deploy objects after they have
been edited. Deployment failures often occur in a cascade, which means
that once one object is successfully deployed, others that depend on it
will also deploy.
• Go to the root directory of the data movement and run the rowcount
script.
• You should see a report generated in the "<source database
name>.tables.rowcount" file. The report contains row counts from both
source and target databases.
oracle : db2
"TESTCASE"."CALL_STACKS" : 123 "TESTCASE"."CALL_STACKS" : 123
"TESTCASE"."CLASSES" : 401 "TESTCASE"."CLASSES" : 401
"TESTCASE"."DESTINATION" : 513 "TESTCASE"."DESTINATION" : 513
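Because the report pairs the source and DB2 counts on each line, a quick consistency check can be scripted. The awk sketch below assumes the whitespace-separated layout shown above (table, colon, count, repeated for the target) and uses a stand-in file name; it is not part of the tool.

```shell
# Stand-in rowcount report in the layout shown above; point awk at your
# real "<source database name>.tables.rowcount" file instead.
cat > sample.rowcount <<'EOF'
"TESTCASE"."CALL_STACKS" : 123 "TESTCASE"."CALL_STACKS" : 123
"TESTCASE"."CLASSES" : 401 "TESTCASE"."CLASSES" : 400
EOF

# Fields 3 and 6 hold the source and target counts; report any mismatch.
awk 'NF >= 6 && $3 != $6 { print "MISMATCH:", $1, $3, "vs", $6 }' sample.rowcount
```

Any line printed identifies a table whose load should be investigated before cutover.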
When the source database is very large and there is not enough space to hold
the intermediate data files, using a pipe is the recommended way to move the data.
On Windows systems
The tool uses Pipe.dll to create Windows pipes; make sure that this DLL is placed
in the same directory as the IBMDataMovementTool.jar file.
On UNIX systems
The tool creates UNIX pipes with the mkfifo command and uses them to move data
from the source to DB2.
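The named-pipe technique can be seen in isolation with a small shell sketch; this is an illustration of the mechanism, not the tool's actual script. The background writer stands in for the unload step and the reader for the DB2 LOAD, and no intermediate data file is materialized on disk.

```shell
# Create a named pipe, feed it from a background writer, then consume it.
pipe=/tmp/dmt_demo.pipe
mkfifo "$pipe"
printf 'row1\nrow2\nrow3\n' > "$pipe" &   # stand-in for the unload process
wc -l < "$pipe"                           # stand-in for DB2 LOAD reading the pipe
rm -f "$pipe"
```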
Before you can use a pipe between the source and DB2 databases, the table
definitions must be created. Follow this procedure:
2. Click the Extract DDL/Data button to unload the data, or run the
unload script from the command-line window.
3. Click the Deploy DDL/Data button, or run the db2gen script from the
command-line window.
5. Click the Extract / Deploy through Pipe Load button, or run the
unload script from the command-line window.
You can use this tool from z/OS to move data from a source database to
DB2 for z/OS. However, the following additional steps are required.
2. This zip file contains a file named jzos.pax. FTP this file, using UNIX
System Services, in binary mode to the directory where you would like
JZOS installed.
4. Run the command pax -rvf jzos.pax. This creates a subdirectory called jzos
in your current working directory. This subdirectory will be referred to as
<JZOS_HOME>.
5. In the user's home directory, create a file named .profile based upon the
template given below, making changes as appropriate for your z/OS DB2
installation.
export JZOS_HOME=$HOME/jzos
export JAVA_HOME=/usr/lpp/java/J1.5
export PATH=$JAVA_HOME/bin:$PATH
export CLPHOME=/usr/lpp/db2/db2910/db2910_base/lib/IBM
export CLASSPATH=$CLASSPATH:/usr/lpp/db2/db2910/db2910_base/lib/clp.jar
export CLPPROPERTIESFILE=$HOME/clp.properties
export LIBPATH=$LIBPATH:$JZOS_HOME
alias db2="java com.ibm.db2.clp.db2"
7. In the user's home directory, create a file named clp.properties based upon
the template given below.
10. The IBMExtract.properties, geninput, and unload scripts are created for you.
13. The parameter zoveralloc specifies by how much you want to oversize
your file allocation requests. A value of 1 means that you are not
oversizing at all. In an environment with sufficient free storage, this might
work. In a realistic environment, 15/11 (1.3636) is a good estimate. It
is recommended that you start at 1.3636 (15/11) and lower the value
gradually until you get file write errors, and then increase it a little. If you
know the value of the SMS parameter REDUCE SPACE UP TO, you should
be able to calculate the ideal value of zoveralloc by setting it to 1 / (1 -
(X/100)), where X is the value of REDUCE SPACE UP TO given as an
integer between 0 and 100. Note that REDUCE SPACE UP TO represents a
percentage.
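As a worked example of the zoveralloc formula above (a sketch, not part of the tool): if REDUCE SPACE UP TO is 27, then zoveralloc = 1 / (1 - 27/100) ≈ 1.3699. The arithmetic is easy to script:

```shell
# Compute zoveralloc = 1 / (1 - X/100), where X is REDUCE SPACE UP TO.
X=27
awk -v x="$X" 'BEGIN { printf "%.4f\n", 1 / (1 - x / 100) }'
# prints 1.3699
```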
15. Run the geninput script to create an input file for the unload process.
17. Run the generated script to create the DDL and load the data on z/OS DB2.
18. After data loading into the DB2 tables on z/OS is complete, the
intermediate datasets must be deleted; DSNUTILS will fail if you do not
delete them.
19. Use the following Java program to delete those intermediate datasets as
part of the cleanup.
Create a script jd as shown below:
JZOS_HOME=$HOME/jzos
JAVA_HOME=/usr/lpp/java/J1.5
CLASSPATH=$HOME/migr/IBMDataMovementTool.jar:$JZOS_HOME/ibmjzos.jar
LIBPATH=$LIBPATH:$JZOS_HOME
$JAVA_HOME/bin/java -cp $CLASSPATH \
-Djava.ext.dirs=${JZOS_HOME}:${JAVA_HOME}/lib/ext ibm.Jd $1
Change the file permission to 755 and run the script; you will then see output
like the following:
DNET770:/u/dnet770/migr: >./jd
USAGE: ibm.Jd <filter_key>
USAGE: ibm.Jd "DNET770.TBLDATA.**"
USAGE: ibm.Jd "DNET770.TBLDATA.**.CERR"
USAGE: ibm.Jd "DNET770.TBLDATA.**.LERR"
USAGE: ibm.Jd "DNET770.TBLDATA.**.DISC"
It is out of the scope of this article to discuss hardware requirements and database
capacity planning, but it is important to keep the following considerations in mind
when estimating the time to complete a large-scale data movement.
• You need a good network connection between the source and DB2 servers,
preferably 1 Gbps or higher. Network bandwidth will limit how quickly
the data movement can complete.
• The number of CPUs on the source server determines how many tables you
can unload in parallel. For a database larger than 1 TB, you should
have a minimum of 4 CPUs on the source server.
• The number of CPUs on the DB2 server will determine the speed of the
LOAD process. As a rule of thumb, loading the data will take 1/4 to 1/3 of
the total time, and the rest will be consumed by the unload process.
• Plan the DB2 database layout ahead of time. Please consult IBM's best
practices papers for DB2.
Tips and techniques
• Gain an understanding of the tool in command-line mode. Use the GUI to
generate the data movement scripts (geninput and unload), and practice data
unload by running the unload script from the command line.
• Extract only the DDL from the source by setting GENDDL=true and
UNLOAD=false in the unload script. Use the generated DDL to plan the
table space and table mapping. Use a separate output directory to
store the generated DDL and data by specifying the target directory with
the -DOUTPUT_DIR parameter in the unload script. The generation of the
DDL should be done ahead of the final data movement.
• Use the geninput script to generate a list of tables to be moved from the
source to DB2. Use the SRCSCHEMA=ALL and DSTSCHEMA=ALL parameters in the
geninput script to generate a list of all tables. Edit the file to remove
unwanted tables, and split it into several input files for a staggered
movement approach in which you perform the unload from the source and the
load to the target in parallel.
• After breaking the table input file (generated by the geninput script) into
several files, copy the unload script into a matching number of files, change
the name of the input file in each, and specify a different output directory for
each unload process. For example, you could create 10 unload scripts that
unload 500 tables each, for a total of 5,000 tables.
• Make sure that you handle DDL and DATA in separate steps. Do not mix
the two into a single step for such a large movement of data.
• The tool unloads data from the source tables in parallel, controlled by the
NUM_THREADS parameter in the unload script. The default value is 5,
and you can increase it to a level where CPU utilization on your source
server is around 90%.
• Pay attention to the order of the tables listed in the input file. The
geninput script has no intelligence to put the tables in a particular
order, but you need to order the tables in a way that minimizes
unload time. The tables listed in the input files are fed to a pool of threads
in round-robin fashion, and it can happen that all but one of the threads
have finished the unload while the last is still running. To keep all
threads busy, organize the input file so the tables appear in increasing
order of row count.
• It may still happen that all tables have been unloaded while a few threads are
still held up unloading very large tables. You can unload the same
table in multiple threads if you can specify the WHERE clause properly in
the input file. For example:
Make sure that you use the right keys in the WHERE clause, which
should preferably be either the primary key or a unique index. The tool
takes care of generating the proper DB2 LOAD scripts to load data from the
multiple files generated by the tool. No other setup is required to unload the
same table in multiple threads, apart from adding a different WHERE clause
to each entry as explained.
• After breaking your unload process into several steps, you can start putting
data into DB2 as soon as a batch has finished unloading the data.
The key here is a separate output directory for each unload batch. All
files necessary to put the data into DB2 are generated in that output directory.
For DDL, you will use the generated db2ddl script to create the table definitions.
For data, you will use the db2load script to load the data into DB2. If you
combine DDL and data in a single step, the name of the script will be
db2gen.
• Automate the whole process in your shell scripts so that the unload and
load processes are synchronized. Every large data movement
from Oracle or another database to DB2 is unique, and your skills will be
tested in automating all of these jobs. Save the output of
the jobs to a file by using the tee command, so that you can watch
the progress while the output is also saved in a log file.
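The batching and logging tips above can be sketched with standard UNIX commands. The file names (tables.txt, movement.log), the batch size, and the echo standing in for a real unload run are all illustrative assumptions, not the tool's actual names.

```shell
# Stand-in table list, one table per line (a real list comes from geninput).
printf '"TEST"."T%02d"\n' 1 2 3 4 5 6 7 8 9 10 11 12 > tables.txt

# Split the list into batches of 4 tables each:
# tables.batch.aa, tables.batch.ab, tables.batch.ac
split -l 4 tables.txt tables.batch.

# Drive one unload per batch, logging progress with tee so that it is both
# visible on screen and preserved in a log file.
for f in tables.batch.*; do
  echo "unload started for batch $f" | tee -a movement.log
done
```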
Run mock tests
Do not skip the mock movement: it tests your automation and validates the way
you planned the staggered unload from the source and load into DB2. The only
customization required is in creating the shell scripts to run these tasks in the
right order.
Follow these steps to run the mock tests:
1. Copy your data movement scripts and automation shell scripts to a mock
directory.
2. Estimate your time by unloading a few large tables in a few threads.
3. Add a WHERE clause to limit the number of rows, to test the movement of
data. For example, you can add a ROWNUM predicate to limit the number of
rows in Oracle, or use the TOP clause for SQL Server.
4. Practice your scripts, make changes as necessary, and prepare for
the final run.
Final run
1. You have already extracted the DDL and made the required manual changes
for the mapping between tables and tablespaces, if required.
3. Make sure you have around 10,000 open cursors configured for the Oracle
database, if that is the source.
For a large movement of data, it is much more about planning, discipline, and the
ability to automate jobs. The tool provides all the capability that you require for such
a movement. This little tool has moved very large databases from source to DB2.
Frequently asked questions

Q: I am running this tool from a secure shell window on my Linux/UNIX platform,
and I see a few messages in the command-line shell, but I do not see the GUI and
it seems that the tool has hung.
A: Depending upon your DISPLAY settings, the GUI window may have opened on
your display-capable server. You need to properly export your DISPLAY settings.
Consult your UNIX system administrator.

Q: I am trying to move data from PostgreSQL, and I do not see a PostgreSQL
JDBC driver shipped with the tool.
A: No JDBC drivers are provided with the tool, due to licensing considerations.
You should get your database's JDBC driver from your licensed software.

Q: It is not possible to grant DBA to the user extracting data from the Oracle
database. How can I use the tool?
A: You will at least need SELECT_CATALOG_ROLE granted to the user, and
SELECT privileges on the tables used for the migration.

Q: What are the databases to which this tool can connect?
A: Any database that has a type-4 JDBC driver. So you can connect to MySQL,
PostgreSQL, Ingres, SQL Server, Sybase, Oracle, DB2, and others. It can also
connect to a database that has an ODBC-JDBC connector, so you can also move
data from an Access database.

Q: What version of Java do I need to run this tool?
A: You need a minimum of Java 1.5 to run the tool. The dependency on Java 1.5
is basically due to the GUI portion of the tool. If you really need support
for Java 1.4.2, send me a note and I will compile the tool for Java 1.4.2, but
the GUI to create the data movement driver scripts will not run. You can
determine the version of Java by running this command:
$ java -version
C:\>java -version

Q: How do I check the version of the tool?
A: Run IBMDataMovementTool -version on Windows, or ./IBMDataMovementTool.sh
-version on Linux/UNIX.

Q: I get the error "Unsupported major.minor version 49.0" or "(.:15077):
Gtk-WARNING **: cannot open display:" when I run the tool. What does it mean?
A: You are using a version of Java less than 1.5. Install a Java version higher
than 1.4.2 to overcome this problem. We prefer that you install IBM Java.

Q: What information do I need about the source and DB2 database servers in
order to run this tool?
A: You need to know the IP address, port number, database name, user ID, and
password for both the source and DB2 databases. The user ID for the source
database should have DBA privileges, and the SYSADM privilege is needed for the
DB2 database.

Q: I am running this tool from my Windows workstation and it is running
extremely slowly. What can I do?
A: The default memory allocated to this tool by the IBMDataMovementTool.cmd or
IBMDataMovementTool.sh command script is 990 MB, set by using the -Xmx switch
for the JVM. Try reducing this memory, as you might have less memory available
on your workstation.
Acknowledgements
Many IBMers from around the world provided valuable feedback on the tool;
without their feedback, the tool in its current shape would not have been
possible. I acknowledge significant help, feedback, suggestions, and guidance
from the following people:
• Jason A Arnold
• Serge Rielau
• Marina Greenstein
• Maria N Schwenger
• Patrick Dantressangle
• Sam Lightstone
• Barry Faust
• Vince Lee
• Connie Tsui
• Raanon Reutlinger
• Antonio Maranhao
• Max Petrenko
• Kenneth Chen
• Masafumi Otsuki
• Neal Finkelstein
Disclaimer
This article contains a tool. IBM grants you ("Licensee") a non-exclusive, royalty
free, license to use this tool. However, the tool is provided as-is and without any
warranties, whether EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED
WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
OR NON-INFRINGEMENT. IBM AND ITS LICENSORS SHALL NOT BE LIABLE
FOR ANY DAMAGES SUFFERED BY LICENSEE THAT RESULT FROM YOUR
USE OF THE SOFTWARE. IN NO EVENT WILL IBM OR ITS LICENSORS BE
LIABLE FOR ANY LOST REVENUE, PROFIT OR DATA, OR FOR DIRECT,
INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES,
HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY,
ARISING OUT OF THE USE OF OR INABILITY TO USE SOFTWARE, EVEN IF
IBM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Downloads
• Product: IBM Data Movement Tool
Note
1. A new build of the tool is uploaded frequently, after bug fixes and new enhancements.
Click Help > Check New Version from the GUI, or enter the command
./IBMDataMovementTool.sh -check, to check whether a new build is available for download.
You can find the tool's build number from the Help > About menu option or by entering the
./IBMDataMovementTool.sh -version command. This tool uses the JGoodies Forms 1.2.1,
JGoodies Looks 2.2.2, and JSyntaxPane 0.9.4 packages for the GUI interface.
Resources
Learn
• "Migrate from MySQL or PostgreSQL to DB2 Express-C" (developerWorks,
June 2006) was the first article written for this tool.
• "DB2 Viper 2 compatibility features" (developerWorks, July 2007) is the article
that explains compatibility features.
• You can also use Migration Toolkit, for the migration of data and procedures.
Get products and technologies
• Download DB2 Express-C 9.7, a no-charge version of DB2 Express database
server for the community.
• Download a free trial version of DB2 9.7 for Linux, UNIX, and Windows.
• Download IBM product evaluation versions and get your hands on application
development tools and middleware products from DB2, Lotus®, Rational®,
Tivoli®, and WebSphere®.
Discuss
• Participate in the discussion forum for this content.
• Check out developerWorks blogs and get involved in the developerWorks
community.
Trademarks
IBM, AIX, DB2, and z/OS are trademarks of IBM Corporation in the United
States and many other countries. Java and all Java-based trademarks are
trademarks of Sun Microsystems, Inc. in the United States and other countries. Linux
is a trademark of Linus Torvalds in the United States and other countries. Microsoft,
Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States and other countries. UNIX is a registered trademark
of The Open Group in the United States and other countries. Oracle is a trademark of
Oracle Corporation in the United States and other countries. Other company,
product, or service names may be trademarks or service marks of others.