dbexec is a utility for executing SQL scripts (typically Extract, Transform, Load (ETL) scripts) against multiple databases as efficiently as possible. It performs automatic (but currently rudimentary) dependency analysis of all provided scripts and executes as many in parallel as possible based on the dependencies discovered. It also offers simple text substitution (for centralizing and securing login credentials, storage paths, etc.) and a few extensions to the SQL language, primarily to aid in error handling.
dbexec [options] files...
Execute files in the most efficient order available after analysis of dependencies between scripts based on data files produced and consumed.
.. program:: dbexec
.. option:: --version

   Outputs the application's version number and exits immediately

.. option:: -h, --help

   Outputs the help screen shown above and exits immediately

.. option:: -q, --quiet

   Specifies that only errors are to be displayed on stderr. By default both warnings and errors will be displayed

.. option:: -v, --verbose

   Specifies that informational items are to be displayed on stderr in addition to warnings and errors

.. option:: -l, --logfile LOGFILE

   Output all messages to LOGFILE. Output to the logfile is not influenced by --quiet or --verbose - all messages (informational, warning, and error) will always be included

.. option:: -D, --debug

   Run dbexec under PDB, the Python debugger. Generally only useful for developers. Also note that in this mode, debug entries will be output to stderr as well, which results in a lot of output

.. option:: -t, --terminator TERMINATOR

   Use TERMINATOR as the statement terminator in all specified scripts. This value defaults to semi-colon (';'), unlike the DB2 CLP which defaults to line breaks as a statement terminator. It is not possible to specify line-breaks as dbexec's statement terminator.

.. option:: -a, --auto-commit

   This is the reverse of the -c option of the DB2 CLP. By default, dbexec runs scripts without auto-COMMIT enabled (equivalent to the +c mode in DB2 CLP). This is because I consider auto-COMMIT to be a dangerous default (one should always be able to rollback scripts that have failed).

.. option:: -c, --config CONFIG

   Specifies the ini-style configuration file which is expected to contain a [Substitute] section. The keys and values listed in that section will be substituted for $references in the specified SQL scripts.

.. option:: -d, --delete-files

   If specified, tells dbexec to remove all the files generated by the scripts (by EXPORT commands) after finishing execution (note that if execution fails for any reason, files will not be removed).

.. option:: -n, --dry-run

   If specified, tells dbexec to parse but not execute the specified scripts. If specified twice, file-permission tests will also be carried out. If specified three times, database login tests will also be carried out.

.. option:: -r, --retry RETRY

   DEPRECATED. If specified, specifies the default number of retries that dbexec will use for failed scripts. This option is now deprecated and will be removed in a future version. Use the ON statement to specify retry options with considerably finer control.

.. option:: -s, --stop-on-error

   This option is copied from the DB2 CLP. If used, scripts will terminate immediately if an error occurs (normally a script will continue running to the end if an error occurs, although it will still be considered to have "failed" by dbexec in either case)

.. option:: -L, --log-scripts TRANSFORM

   If specified, dbexec will enable real-time logging of script output by writing a log file per script that is specified for execution. The TRANSFORM value takes the form /regexpr/subst and specifies a regular expression and substitution pattern that will be applied to the script's filename in order to obtain the corresponding log filename. Be warned that this parameter will undergo major revisions when #85 is implemented.

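To illustrate the /regexpr/subst form of TRANSFORM, here is a minimal Python sketch of how such a value could be applied to a script filename. The helper name ``apply_transform`` and the rule that the first character of TRANSFORM acts as the separator are assumptions made for this example; this is not dbexec's actual implementation.

```python
import re


def apply_transform(transform, filename):
    """Apply a '/regexpr/subst' transform to filename (hypothetical helper)."""
    if len(transform) < 2:
        raise ValueError('transform must look like /regexpr/subst')
    sep = transform[0]  # assumption: the first character is the separator
    parts = transform[1:].split(sep)
    if len(parts) != 2:
        raise ValueError('transform must contain exactly two separators')
    pattern, subst = parts
    # re.sub replaces every match of pattern in filename with subst
    return re.sub(pattern, subst, filename)


# Derive a log filename by swapping the .sql extension for .log
print(apply_transform(r'/\.sql$/.log', 'export_dept.sql'))
```

With this sketch, a transform of ``/\.sql$/.log`` maps ``export_dept.sql`` to ``export_dept.log``.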
dbexec requires one or more SQL or CLP scripts in order to run. Scripts are simple text files containing SQL or CLP commands separated by a statement terminator. Unlike the DB2 CLP, dbexec does not permit statements to be separated by line breaks; the default statement terminator is semi-colon (equivalent to the DB2 CLP with the -t switch). You can specify a different statement terminator with dbexec's :option:`-t` command line switch (equivalent to the DB2 CLP's -td switch).
dbexec is primarily intended for working with scripts which transfer data from one or more DB2 databases to another (ETL scripts). This is why it concentrates on analyzing dependencies based on exported and imported files. To demonstrate a typical work-flow with dbexec, we will start with the following simple EXPORT script:
export_dept.sql::

    CONNECT TO SAMPLE1 USER fred USING secret;
    EXPORT TO /tmp/DEPTDATA.IXF OF IXF SELECT DEPTNO, DEPTNAME, MGRNO FROM DEPARTMENT;
    CONNECT RESET;

The following command line can be used to execute this script with dbexec:
$ dbexec export_dept.sql
Normally, dbexec produces no output unless warnings or errors occur. If you wish to see more information, including the output of each script, and a summary of execution, use the :option:`-v` switch. Below is shown the output produced when running the example script with this switch::

    $ dbexec -v export_dept.sql
    Parsing script export_dept.sql
    Calculating dependencies for script export_dept.sql
    File /tmp/DEPTDATA.IXF is produced, but never consumed
    Test create/write for /tmp/DEPTDATA.IXF succeeded
    Test login to database SAMPLE1 succeeded (username: fred)
    Starting script export_dept.sql
    Script export_dept.sql completed successfully with return code 0
    Script export_dept.sql output
        CONNECT TO 'SAMPLE1' USER 'fred' USING

           Database Connection Information

         Database server        = DB2/LINUX 8.2.4
         SQL authorization ID   = FRED
         Local database alias   = SAMPLE1

        EXPORT TO '/tmp/DEPTDATA.IXF' OF IXF SELECT DEPTNO, DEPTNAME, MGRNO FROM DEPARTMENT
        SQL3104N The Export utility is beginning to export data to file "DEPTDATA.IXF".

        SQL3105N The Export utility has finished exporting "1" rows.

        Number of rows exported: 1

        CONNECT RESET
        DB20000I The SQL command completed successfully.

    Script          Started  Duration Status
    --------------- -------- -------- ------
    export_dept.sql 12:40:14 00:00:00 OK
    --------------- -------- -------- ------
    Total           12:40:14 0s

There are several things to note in the output above:
File /tmp/DEPTDATA.IXF is produced, but never consumed
This line indicates that dbexec has noticed the /tmp/DEPTDATA.IXF file is created by the script, but that no other script consumes this file (with a LOAD or IMPORT statement). This is not an error, but a warning. In a set of scripts designed to transfer data from one database to another this might indicate that an import script has been forgotten from the command line, or that an extraneous export script has been included.
Test create/write for /tmp/DEPTDATA.IXF succeeded
Before executing any scripts, dbexec performs a test of all file permissions that will be required by all scripts being executed. Specifically, the ability to create, read, and/or write files that are produced or consumed by each script is tested. The tests performed are all non-destructive, hence the content of any data files that exist prior to the run will be preserved.
Test login to database SAMPLE1 succeeded (username: fred)
Before executing any scripts, dbexec performs a "test login" against each combination of database and username found in all the scripts specified on the command line. The reason for this is to avoid locking user accounts in the case where an incorrect password is used. For example, if ten scripts are to be executed which extract data from a single database, and this database has a security policy causing an account to be locked after 5 invalid login attempts (a stupid policy, but one all too common in enterprises) then if dbexec executes all 10 scripts in parallel with an invalid password it will immediately cause the account to be locked. Performing a test login before beginning execution will avoid this scenario. Should any test login fail, dbexec will abort before starting any scripts.
::

    Starting script export_dept.sql
    Script export_dept.sql completed successfully with return code 0
    Script export_dept.sql output
        CONNECT TO 'SAMPLE1' USER 'fred' USING
        ...

This sequence of lines implies that dbexec does not print the output of a script until after the script has finished executing (successfully or otherwise). Because dbexec uses parallel execution, if it printed the output of scripts while they were executing it would be difficult to discern which script produced which line of output.
::

    Script          Started  Duration Status
    --------------- -------- -------- ------
    export_dept.sql 12:40:14 00:00:00 OK
    --------------- -------- -------- ------
    Total           12:40:14 00:00:00

Finally, the table at the end of the output provides a summary of the execution. In this case, one can see that a single script was executed successfully in less than 1 second.
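The non-destructive file test described above can be pictured with a short Python sketch. It assumes that "non-destructive" means opening an existing file without truncating it, and removing any probe file created for a path that did not exist beforehand. The function name ``can_write`` is hypothetical and this is not dbexec's actual code.

```python
import os
import tempfile


def can_write(path):
    """Return True if path can be created or written without altering it."""
    if os.path.exists(path):
        try:
            # Append mode opens the file for writing but never truncates it
            with open(path, 'ab'):
                return True
        except OSError:
            return False
    try:
        # 'x' mode creates the file and fails if it already exists
        with open(path, 'xb'):
            pass
    except OSError:
        return False
    os.remove(path)  # remove the probe file so no trace is left behind
    return True


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'DEPTDATA.IXF')
    with open(existing, 'w') as f:
        f.write('previous export')
    print(can_write(existing))                              # existing file
    print(can_write(os.path.join(tmp, 'NEW.IXF')))          # file to be created
    print(can_write(os.path.join(tmp, 'missing', 'X.IXF'))) # no such directory
```

The existing file keeps its content after the test, and the probe for a new file leaves nothing behind.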
One issue with the example above is that the password of the user fred, used to connect to the SAMPLE1 database, is stored in the SQL script. Firstly, this is rather insecure. Secondly, consider the case where multiple scripts connect to the same database: if fred needs to change his password, all the scripts would need updating. Instead, we can use dbexec's variable substitution feature to move the password (and, in the example below, the username, database name, and export path) out of the script and into a single configuration file. This makes maintenance easier, as there is now a single location to update in the case of a password change. It also, to a small extent, improves security as there is now only one file that we need to secure (with file system permissions for example). Below you can see the configuration file:
dbexec.ini::

    [Substitute]
    SAMPLEDB=SAMPLE1
    SAMPLEUSER=fred
    SAMPLEPASS=secret
    EXPORTPATH=/tmp

And the updated export script:
export_dept.sql::

    CONNECT TO $SAMPLEDB USER $SAMPLEUSER USING $SAMPLEPASS;
    EXPORT TO $EXPORTPATH/DEPTDATA.IXF OF IXF SELECT DEPTNO, DEPTNAME, MGRNO FROM DEPARTMENT;
    CONNECT RESET;

The configuration file is a simple INI-style file. It consists of a section titled "Substitute", containing ``name=value`` lines. Blank lines and comments (prefixed by semi-colon or hash signs) will be ignored. Continuation lines can be specified by indentation.
Substitution variables may appear anywhere in an SQL script, within quoted strings or outside them. They may represent values or be statements in and of themselves. Variables are prefixed with a dollar sign, and the name of the variable is optionally enclosed in ``${braces}``. If a variable name is immediately followed by a character which could form part of a variable name (an alphabetic character, a number or an underscore) then the braces are mandatory.
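The substitution rules above closely resemble those of Python's ``string.Template``, so they can be illustrated with a small sketch built on the standard library. This is only an approximation for illustration: the function names are hypothetical, and dbexec's own parser may treat edge cases (such as a stray ``$`` or an undefined variable) differently.

```python
import configparser
from string import Template


def load_substitutions(ini_text):
    """Read name=value pairs from the [Substitute] section of INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(ini_text)
    return dict(parser['Substitute'])


def substitute(sql, variables):
    """Replace $NAME and ${NAME} references in sql with their values."""
    # Template.substitute raises KeyError for an undefined variable
    return Template(sql).substitute(variables)


INI = """\
[Substitute]
; comments may start with a semi-colon
SAMPLEDB=SAMPLE1
SAMPLEUSER=fred
# or with a hash sign
EXPORTPATH=/tmp
"""

variables = load_substitutions(INI)
print(substitute('CONNECT TO $SAMPLEDB USER $SAMPLEUSER;', variables))
# Braces are needed here because "_new" could be part of a variable name
print(substitute('EXPORT TO ${EXPORTPATH}_new/DEPT.IXF OF IXF SELECT 1;', variables))
```

Without the braces, the second call would look up a variable named ``EXPORTPATH_new`` and, in this sketch, fail with a ``KeyError``.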
To execute the script with the configuration file, the following command line could be used:
$ dbexec -c dbexec.ini export_dept.sql
To test the configuration and confirm that variables are substituted correctly and dependencies fulfilled, use the :option:`-n` switch (commonly combined with :option:`-v` to ensure all detail is displayed)::

    $ dbexec -nv -c dbexec.ini export_dept.sql
    Parsing script export_dept.sql
    Calculating dependencies for script export_dept.sql
    File /tmp/DEPTDATA.IXF is produced, but never consumed
    Dependency tree:
    '- export_dept.sql

    Data transfers:
    SAMPLE1 ---> /tmp/DEPTDATA.IXF

    SQL that would be executed is logged below:
    export_dept.sql

    CONNECT TO 'SAMPLE1' USER 'fred' USING 'secret'@

    EXPORT TO '/tmp/DEPTDATA.IXF' OF IXF SELECT DEPTNO, DEPTNAME, MGRNO FROM DEPARTMENT@

    CONNECT RESET@

You may note that the statement terminator in the output has changed from the original semi-colon to an at-symbol (@). There are two reasons for this:
- In order to ensure that all scripts output when using the :option:`-n` option use the same terminator. This is useful when redirecting the output (see below).
- In order to ensure that compound statements (in between ``BEGIN ATOMIC`` and ``END``) are able to use semi-colon as the terminator for statements within them, a terminator other than semi-colon must be used. The at-symbol is a common "alternate" terminator.
When :option:`-n` is used, all SQL that is output by dbexec is written to stdout, while informational and warning messages are written to stderr. Furthermore, all scripts are output in the order they would be executed (the ordering of scripts that would be executed in parallel is arbitrary). This enables one to redirect the output to a file in order to obtain a single script which, if run, would produce the same result as running dbexec without the :option:`-n` switch. This can be useful for debugging purposes. For example::

    $ dbexec -nv -c dbexec.ini export_dept.sql > test.sql
    Parsing script export_dept.sql
    Calculating dependencies for script export_dept.sql
    File /tmp/DEPTDATA.IXF is produced, but never consumed
    Dependency tree:
    '- export_dept.sql

    Data transfers:
    SAMPLE1 ---> /tmp/DEPTDATA.IXF

    SQL that would be executed is logged below:
    export_dept.sql
    $ cat test.sql
    CONNECT TO 'SAMPLE1' USER 'fred' USING 'secret'@

    EXPORT TO '/tmp/DEPTDATA.IXF' OF IXF SELECT DEPTNO, DEPTNAME, MGRNO FROM DEPARTMENT@

    CONNECT RESET@

Note that :option:`-n` can be specified multiple times:
- -n: parse scripts and output SQL to run
- -nn: as above but also run file permission tests
- -nnn: as above but also test database logins
Finally, a more complete example, involving four SQL scripts. Three of the scripts connect to two databases and extract data. The final script loads all the exports into a third target database. A configuration file defines all the user names and passwords involved:
dm.ini::

    [Substitute]
    DM1USER=GB01111
    DM1PASS=Passw0rd
    DM2USER=GB01111
    DM2PASS=Passw0rd
    DM3USER=db2admin
    DM3PASS=db2admin
    EXPORTPATH=/tmp

dm1_export_a.sql::

    CONNECT TO DM1 USER $DM1USER USING $DM1PASS;
    EXPORT TO $EXPORTPATH/A.IXF OF IXF SELECT * FROM A;
    CONNECT RESET;

dm1_export_b.sql::

    CONNECT TO DM1 USER $DM1USER USING $DM1PASS;
    EXPORT TO $EXPORTPATH/B.IXF OF IXF SELECT * FROM B;
    CONNECT RESET;

dm2_export.sql::

    CONNECT TO DM2 USER $DM2USER USING $DM2PASS;
    EXPORT TO $EXPORTPATH/C.IXF OF IXF SELECT * FROM C;
    CONNECT RESET;

dm3_import.sql::

    CONNECT TO DM3 USER $DM3USER USING $DM3PASS;
    LOAD FROM $EXPORTPATH/A.IXF OF IXF REPLACE INTO A;
    LOAD FROM $EXPORTPATH/B.IXF OF IXF REPLACE INTO B;
    LOAD FROM $EXPORTPATH/C.IXF OF IXF INSERT INTO C;
    CONNECT RESET;

The following command line can be used to see the results of the dependency analysis that dbexec performs, and the substituted SQL that would be executed::

    $ dbexec -nv -c dm.ini dm*.sql
    Parsing script dm1_export_a.sql
    Parsing script dm1_export_b.sql
    Parsing script dm2_export.sql
    Parsing script dm3_import.sql
    Calculating dependencies for script dm1_export_a.sql
    Calculating dependencies for script dm1_export_b.sql
    Calculating dependencies for script dm2_export.sql
    Calculating dependencies for script dm3_import.sql
    Dependency tree:
    '- dm3_import.sql
       |
       +- dm1_export_a.sql
       +- dm1_export_b.sql
       '- dm2_export.sql

    Data transfers:
    DM1 ---> /tmp/A.IXF ---> DM3
    DM1 ---> /tmp/B.IXF ---> DM3
    DM2 ---> /tmp/C.IXF ---> DM3

    SQL that would be executed is logged below:
    dm1_export_a.sql

    CONNECT TO 'DM1' USER 'GB01111' USING 'Passw0rd'@

    EXPORT TO '/tmp/A.IXF' OF IXF SELECT * FROM A@

    CONNECT RESET@

    dm1_export_b.sql

    CONNECT TO 'DM1' USER 'GB01111' USING 'Passw0rd'@

    EXPORT TO '/tmp/B.IXF' OF IXF SELECT * FROM B@

    CONNECT RESET@

    dm2_export.sql

    CONNECT TO 'DM2' USER 'GB01111' USING 'Passw0rd'@

    EXPORT TO '/tmp/C.IXF' OF IXF SELECT * FROM C@

    CONNECT RESET@

    dm3_import.sql

    CONNECT TO 'DM3' USER 'db2admin' USING 'db2admin'@

    LOAD FROM '/tmp/A.IXF' OF IXF REPLACE INTO A@

    LOAD FROM '/tmp/B.IXF' OF IXF REPLACE INTO B@

    LOAD FROM '/tmp/C.IXF' OF IXF INSERT INTO C@

    CONNECT RESET@

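The idea behind this kind of dependency analysis can be sketched in a few lines of Python. The sketch assumes that only ``EXPORT TO``, ``LOAD FROM`` and ``IMPORT FROM`` statements matter and that file names contain no whitespace. It uses ``graphlib.TopologicalSorter`` (Python 3.9 or later) to group scripts into batches that could run in parallel. The function names are hypothetical and the real dbexec analysis may differ.

```python
import re
from graphlib import TopologicalSorter

# Assumption: files are produced by EXPORT and consumed by LOAD or IMPORT
PRODUCES = re.compile(r'\bEXPORT\s+TO\s+(\S+)', re.IGNORECASE)
CONSUMES = re.compile(r'\b(?:LOAD|IMPORT)\s+FROM\s+(\S+)', re.IGNORECASE)


def dependencies(scripts):
    """Map each script name to the set of scripts whose files it consumes."""
    producers = {}
    for name, sql in scripts.items():
        for path in PRODUCES.findall(sql):
            producers[path] = name
    return {
        name: {producers[path] for path in CONSUMES.findall(sql)
               if path in producers and producers[path] != name}
        for name, sql in scripts.items()
    }


def batches(scripts):
    """Yield lists of scripts that could run in parallel, in execution order."""
    sorter = TopologicalSorter(dependencies(scripts))
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)


SCRIPTS = {
    'dm1_export_a.sql': 'EXPORT TO /tmp/A.IXF OF IXF SELECT * FROM A;',
    'dm1_export_b.sql': 'EXPORT TO /tmp/B.IXF OF IXF SELECT * FROM B;',
    'dm2_export.sql': 'EXPORT TO /tmp/C.IXF OF IXF SELECT * FROM C;',
    'dm3_import.sql': ('LOAD FROM /tmp/A.IXF OF IXF REPLACE INTO A; '
                       'LOAD FROM /tmp/B.IXF OF IXF REPLACE INTO B; '
                       'LOAD FROM /tmp/C.IXF OF IXF INSERT INTO C;'),
}

for batch in batches(SCRIPTS):
    print(batch)
```

For the four example scripts, the sketch yields the three export scripts as the first batch and ``dm3_import.sql`` as the second, which matches the dependency tree shown above.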
Alternatively, instead of supplying the list of scripts to be executed with a wildcard (dm*.sql), one can use a "response file". A response file is a simple text file containing one command line parameter per line:
dmfiles.lst::

    dm1_export_a.sql
    dm1_export_b.sql
    dm2_export.sql
    dm3_import.sql

The response file is specified on the dbexec command line by prefixing its name with @:
$ dbexec -nv -c dm.ini @dmfiles.lst
Note that any command line parameter may be specified in the response file, not just filenames. Hence the following is equivalent to the command line above (note that :option:`-c` and dm.ini appear on separate lines as each line is strictly one parameter):
dmfiles.lst::

    -c
    dm.ini
    dm1_export_a.sql
    dm1_export_b.sql
    dm2_export.sql
    dm3_import.sql

::

    $ dbexec -nv @dmfiles.lst

Inside response files, quoting of parameters which contain spaces is not required. Furthermore, leading and trailing whitespace is significant. Parameters within response files may include wildcard characters (which will be expanded as normal), but response files are not recursive (placing an @-prefixed filename within a response file will not include the second level response file within the command line).
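The response file rules above (one parameter per line, whitespace preserved, wildcards expanded, no recursion) can be sketched in Python as follows. The function name ``expand_args`` is hypothetical, and skipping completely empty lines is an assumption of this sketch rather than documented dbexec behaviour.

```python
import glob
import os
import tempfile


def expand_args(args):
    """Expand @response files (one level only) and wildcards in args."""
    expanded = []
    for arg in args:
        if arg.startswith('@'):
            with open(arg[1:]) as response:
                for line in response:
                    line = line.rstrip('\n')  # other whitespace is kept
                    if line:  # assumption: empty lines are skipped
                        expanded.append(line)
        else:
            expanded.append(arg)
    result = []
    for arg in expanded:
        # glob.glob returns [] when nothing matches; keep the parameter as-is
        result.extend(sorted(glob.glob(arg)) or [arg])
    return result


with tempfile.TemporaryDirectory() as tmp:
    for name in ('dm1_export_a.sql', 'dm2_export.sql'):
        open(os.path.join(tmp, name), 'w').close()
    listing = os.path.join(tmp, 'dmfiles.lst')
    with open(listing, 'w') as f:
        f.write('-c\ndm.ini\n' + os.path.join(tmp, 'dm*.sql') + '\n')
    print(expand_args(['-nv', '@' + listing]))
```

Because parameters read from a response file are never themselves treated as response files, an @-prefixed line inside the file is passed through unchanged in this sketch.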
dbexec enhances the SQL language with a few extra commands that have proved
useful in production environments. The first and simplest is the INSTANCE
statement which is used in environments with multiple DB2 instances to switch
the instance that the script will execute against. The second is the ON
statement which provides rudimentary error handling capabilities based on
return codes or regular expression matching of the output. The following script
demonstrates both of these custom statements::

    -- Use the client instance specified in the configuration
    INSTANCE $CLIENT_INSTANCE;
    CONNECT TO $SOME_SERVER USER $SOME_USER USING $SOME_PASS;
    -- If the source's table is unavailable, retry for an hour and then give up
    ON REGEX '^SQL0904' WAIT 30 MINUTES AND RETRY SCRIPT 2 TIMES THEN FAIL;
    EXPORT TO $EXPORT_PATH/MY_EXPORT.IXF OF IXF SELECT * FROM SOME_TABLE WITH UR;
    CONNECT RESET;

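The retry behaviour requested by the ON statement in this script can be pictured with a short Python sketch. It is an interpretation based on the comment in the script (wait 30 minutes, retry the script up to twice, then fail), not dbexec's implementation. ``run_script`` is a hypothetical callable returning a ``(return_code, output)`` pair, and the sample outputs and return codes below are made up for the demonstration.

```python
import re
import time


def run_with_retry(run_script, pattern, wait_seconds, retries, sleep=time.sleep):
    """Run run_script(), retrying while its output matches pattern.

    Returns (return_code, attempts) once the output no longer matches, or
    raises RuntimeError when the retries are exhausted.
    """
    regex = re.compile(pattern, re.MULTILINE)
    attempts = 0
    while True:
        return_code, output = run_script()
        attempts += 1
        if not regex.search(output):
            return return_code, attempts
        if attempts > retries:
            raise RuntimeError('giving up after %d retries' % retries)
        sleep(wait_seconds)  # wait before running the script again


# Made-up outputs: the table is unavailable twice, then the export succeeds
outputs = iter([
    (4, 'SQL0904N resource unavailable (sample output)'),
    (4, 'SQL0904N resource unavailable (sample output)'),
    (0, 'Number of rows exported: 1'),
])
waits = []  # record the requested waits instead of actually sleeping
result = run_with_retry(lambda: next(outputs), r'^SQL0904', 30 * 60, 2,
                        sleep=waits.append)
print(result, waits)
```

Passing ``sleep`` as a parameter lets the demonstration record the waits instead of pausing for an hour.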