bfa0004 · Aug 2, 2012

dbexec

dbexec is a utility for executing SQL scripts (typically Extract, Transform, Load (ETL) scripts) against multiple databases as efficiently as possible. It performs automatic (but currently rudimentary) dependency analysis of all provided scripts and executes as many in parallel as the discovered dependencies allow. It also offers simple text substitution (for centralizing and securing login credentials, storage paths, etc.) and a few extensions to the SQL language, primarily to aid in error handling.

Synopsis

dbexec [options] files...

Description

Execute files in the most efficient order available, determined by analyzing the dependencies between scripts based on the data files they produce and consume.

.. program:: dbexec

.. option:: --version

   Outputs the application's version number and exits immediately

.. option:: -h, --help

   Outputs the help screen shown above and exits immediately

.. option:: -q, --quiet

   Specifies that only errors are to be displayed on stderr. By default both
   warnings and errors will be displayed

.. option:: -v, --verbose

   Specifies that informational items are to be displayed on stderr in addition
   to warnings and errors

.. option:: -l, --logfile LOGFILE

   Output all messages to LOGFILE. Output to the logfile is not influenced by
   --quiet or --verbose - all messages (informational, warning, and error) will
   always be included

.. option:: -D, --debug

   Run dbexec under PDB, the Python debugger. Generally only useful for
   developers. Also note that in this mode, debug entries will be output to
   stderr as well, which results in a lot of output

.. option:: -t, --terminator TERMINATOR

   Use TERMINATOR as the statement terminator in all specified scripts. This
   value defaults to semi-colon (';'), unlike the DB2 CLP which defaults to
   line breaks as a statement terminator. It is not possible to specify
   line breaks as dbexec's statement terminator.

.. option:: -a, --auto-commit

   This is the reverse of the -c option of the DB2 CLP. By default, dbexec
   runs scripts without auto-COMMIT enabled (equivalent to the +c mode in DB2
   CLP). This is because I consider auto-COMMIT to be a dangerous default (one
   should always be able to roll back scripts that have failed).

.. option:: -c, --config CONFIG

   Specifies the ini-style configuration file which is expected to contain a
   [Substitute] section. The keys and values listed in that section will be
   substituted for $references in the specified SQL scripts.

.. option:: -d, --delete-files

   If specified, tells dbexec to remove all the files generated by the scripts
   (by EXPORT commands) after finishing execution (note that if execution
   fails for any reason, files will not be removed).

.. option:: -n, --dry-run

   If specified, tells dbexec to parse but not execute the specified
   scripts. If specified twice, file-permission tests will also be carried
   out. If specified three times, database login tests will also be carried
   out.

.. option:: -r, --retry RETRY

   DEPRECATED. Specifies the default number of retries that dbexec will use
   for failed scripts. This option will be removed in a future version. Use
   the ON statement instead, which specifies retry options with considerably
   finer control.

.. option:: -s, --stop-on-error

   This option is copied from the DB2 CLP. If used, scripts will terminate
   immediately if an error occurs (normally a script will continue running to
   the end if an error occurs, although it will still be considered to have
   "failed" by dbexec in either case)

.. option:: -L, --log-scripts TRANSFORM

   If specified, dbexec will enable real-time logging of script output by
   writing a log file per script that is specified for execution. The TRANSFORM
   value takes the form /regexpr/subst and specifies a regular expression and
   substitution pattern that will be applied to the script's filename in order
   to obtain the corresponding log filename. Be warned that this parameter will
   undergo major revisions when #85 is implemented.
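
As a rough illustration of how a ``/regexpr/subst`` value could map a script filename to a log filename, here is a minimal Python sketch. It is not dbexec's implementation: the function name ``log_filename`` is hypothetical, and the parsing rule (the first character is the delimiter, separating exactly one pattern and one substitution) is an assumption.

```python
import re

def log_filename(transform, script):
    """Apply a /regexpr/subst style transform to a script filename.

    Assumption: the first character of the transform is the delimiter and
    it separates exactly two parts, the pattern and the substitution.
    """
    delimiter = transform[0]
    parts = transform[1:].split(delimiter)
    if len(parts) != 2:
        raise ValueError("expected a transform of the form /regexpr/subst")
    pattern, subst = parts
    # re.sub rewrites every match of the pattern in the filename
    return re.sub(pattern, subst, script)

# Replace a trailing .sql extension with .log
print(log_filename(r"/\.sql$/.log", "export_dept.sql"))
```

With this rule, the transform ``/\.sql$/.log`` would place each log file next to its script, differing only in extension.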


Tutorial

Basic Usage

dbexec requires one or more SQL or CLP scripts in order to run. Scripts are simple text files containing SQL or CLP commands separated by a statement terminator. Unlike the DB2 CLP, dbexec does not permit statements to be separated by line breaks; the default statement terminator is semi-colon (equivalent to the DB2 CLP with the -t switch). You can specify a different statement terminator with dbexec's :option:`-t` command line switch (equivalent to the DB2 CLP's -td switch).
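
To make the role of the statement terminator concrete, the following Python sketch splits a script into statements on a configurable terminator while leaving terminators inside quoted strings alone. It is only an illustration under simplifying assumptions (the name ``split_statements`` is hypothetical, and comments are not handled); dbexec's real parser is more thorough.

```python
def split_statements(script, terminator=";"):
    """Split a script into statements on the given terminator.

    Terminators inside single- or double-quoted strings are ignored.
    Simplification: SQL comments are not recognised.
    """
    statements, current, quote = [], [], None
    i = 0
    while i < len(script):
        ch = script[i]
        if quote:
            # Inside a quoted string: copy characters until the quote closes
            current.append(ch)
            if ch == quote:
                quote = None
            i += 1
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
            i += 1
        elif script.startswith(terminator, i):
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
            i += len(terminator)
        else:
            current.append(ch)
            i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements

for statement in split_statements("CONNECT TO SAMPLE1; SELECT ';' FROM T; CONNECT RESET;"):
    print(statement)
```

Calling ``split_statements(text, "@")`` instead treats semi-colons as ordinary characters, which is what allows a compound statement to contain them.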

dbexec is primarily intended for working with scripts which transfer data from one or more DB2 databases to another (ETL scripts). This is why it concentrates on analyzing dependencies based on exported and imported files. To demonstrate a typical workflow with dbexec, we will start with the following simple EXPORT script:

export_dept.sql

CONNECT TO SAMPLE1 USER fred USING secret;

EXPORT TO /tmp/DEPTDATA.IXF OF IXF
SELECT
    DEPTNO,
    DEPTNAME,
    MGRNO
FROM DEPARTMENT;

CONNECT RESET;

The following command line can be used to execute this script with dbexec:

$ dbexec export_dept.sql

Normally, dbexec produces no output unless warnings or errors occur. If you wish to see more information, including the output of each script and a summary of execution, use the :option:`-v` switch. The output produced when running the example script with this switch is shown below:

$ dbexec -v export_dept.sql
Parsing script export_dept.sql

Calculating dependencies for script export_dept.sql
File /tmp/DEPTDATA.IXF is produced, but never consumed
Test create/write for /tmp/DEPTDATA.IXF succeeded
Test login to database SAMPLE1 succeeded (username: fred)
Starting script export_dept.sql
Script export_dept.sql completed successfully with return code 0
Script export_dept.sql output
CONNECT TO 'SAMPLE1' USER 'fred' USING

   Database Connection Information

 Database server        = DB2/LINUX 8.2.4
 SQL authorization ID   = FRED
 Local database alias   = SAMPLE1


EXPORT TO '/tmp/DEPTDATA.IXF' OF IXF SELECT DEPTNO, DEPTNAME, MGRNO FROM
DEPARTMENT
SQL3104N  The Export utility is beginning to export data to file
"DEPTDATA.IXF".

SQL3105N  The Export utility has finished exporting "1" rows.


Number of rows exported: 1


CONNECT RESET
DB20000I  The SQL command completed successfully.

Script          Started  Duration Status
--------------- -------- -------- ------
export_dept.sql 12:40:14 00:00:00 OK
--------------- -------- -------- ------
Total           12:40:14 0s

There are several things to note in the output above:

File /tmp/DEPTDATA.IXF is produced, but never consumed

This line indicates that dbexec has noticed the /tmp/DEPTDATA.IXF file is created by the script, but that no other script consumes this file (with a LOAD or IMPORT statement). This is not an error, but a warning. In a set of scripts designed to transfer data from one database to another, it might indicate that an import script has been omitted from the command line, or that an extraneous export script has been included.

Test create/write for /tmp/DEPTDATA.IXF succeeded

Before executing any scripts, dbexec performs a test of all file permissions that will be required by all scripts being executed. Specifically, the ability to create, read, and/or write files that are produced or consumed by each script is tested. The tests performed are all non-destructive, hence the content of any data files that exist prior to the run will be preserved.
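
One way a non-destructive create/write test could work is sketched below in Python. This is an assumption about the general technique rather than a description of dbexec's code, and the function name ``can_create_or_write`` is hypothetical. The key point is that opening a file in append mode never truncates it, so existing data survives the test.

```python
import os
import tempfile

def can_create_or_write(path):
    """Return True if path can be created or written without losing data."""
    existed = os.path.exists(path)
    try:
        # Append mode never truncates, so existing content is preserved
        with open(path, "ab"):
            pass
    except OSError:
        return False
    if not existed:
        # Remove the empty file that was created purely by the test
        os.remove(path)
    return True

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "DEPTDATA.IXF")
    with open(existing, "wb") as handle:
        handle.write(b"previous export")
    print(can_create_or_write(existing))
    print(can_create_or_write(os.path.join(tmp, "NEW.IXF")))
    print(sorted(os.listdir(tmp)))
```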

Test login to database SAMPLE1 succeeded (username: fred)

Before executing any scripts, dbexec performs a "test login" against each combination of database and username found in the scripts specified on the command line. This avoids locking user accounts when an incorrect password is used. For example, suppose ten scripts extract data from a single database whose security policy locks an account after five invalid login attempts (a stupid policy, but one all too common in enterprises). If dbexec executed all ten scripts in parallel with an invalid password, the account would be locked immediately. Performing a single test login before execution begins avoids this scenario. Should any test login fail, dbexec will abort before starting any scripts.
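
The idea of testing each database and username combination only once can be sketched as follows. This Python fragment is illustrative only: the regular expression, the ``login_pairs`` name, and the decision to upper-case database names are assumptions, not dbexec's actual logic.

```python
import re

# Simplified pattern for "CONNECT TO <database> USER <username>"
CONNECT_RE = re.compile(r"\bCONNECT\s+TO\s+(\S+)\s+USER\s+(\S+)", re.IGNORECASE)

def login_pairs(scripts):
    """Collect the unique (database, username) pairs used by all scripts.

    scripts maps a script name to its (already substituted) SQL text.
    """
    pairs = set()
    for sql in scripts.values():
        for database, username in CONNECT_RE.findall(sql):
            pairs.add((database.upper(), username))
    return sorted(pairs)

# Ten scripts that all connect to the same database as the same user
scripts = {
    "export_%d.sql" % n: "CONNECT TO SAMPLE1 USER fred USING secret;"
    for n in range(10)
}
print(login_pairs(scripts))
```

Ten scripts sharing one connection yield a single pair, so a wrong password costs one failed login attempt instead of ten.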

Starting script export_dept.sql
Script export_dept.sql completed successfully with return code 0
Script export_dept.sql output
CONNECT TO 'SAMPLE1' USER 'fred' USING
...

This sequence of lines shows that dbexec does not print the output of a script until after the script has finished executing (successfully or otherwise). Because dbexec executes scripts in parallel, printing output while scripts were still running would interleave their lines and make it difficult to discern which script produced which line.
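
The buffering behaviour described above can be sketched in Python with a thread pool. This is a generic illustration, not dbexec's implementation: ``run_script`` is a hypothetical stand-in that fabricates a few lines of output instead of executing anything.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_script(name):
    """Stand-in for real execution: returns the script's whole output."""
    lines = ["%s: statement %d ok" % (name, n) for n in range(1, 4)]
    return name, "\n".join(lines)

def run_all(names, max_workers=4):
    """Run scripts in parallel, reporting each one only when it finishes."""
    report = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_script, name) for name in names]
        for future in as_completed(futures):
            name, output = future.result()
            # The complete output of one script is emitted as a single
            # block, so lines from different scripts never interleave
            report.append("Script %s output\n%s" % (name, output))
    return report

for block in run_all(["a.sql", "b.sql", "c.sql"]):
    print(block)
    print()
```

The order of the blocks depends on which script finishes first, but each block always contains the output of exactly one script.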

Script          Started  Duration Status
--------------- -------- -------- ------
export_dept.sql 12:40:14 00:00:00 OK
--------------- -------- -------- ------
Total           12:40:14 00:00:00

Finally, the table at the end of the output provides a summary of the execution. In this case, one can see that a single script was executed successfully in less than 1 second.

One issue with the example above is that the password of the user fred, used to connect to the "SAMPLE1" database, is stored in the SQL script. Firstly, this is rather insecure. Secondly, consider the case where multiple scripts connect to the same database: if Fred needs to change his password, all the scripts would need updating. Instead, we can use dbexec's variable substitution feature to move the password (and, in the example below, the username, database name, and export path) out of the script and into a single configuration file. This makes maintenance easier, as there is now a single location to update in the case of a password change. It also, to a small extent, improves security, as there is now only one file that we need to secure (with file system permissions, for example). Below you can see the configuration file:

dbexec.ini

[Substitute]
SAMPLEDB=SAMPLE1
SAMPLEUSER=fred
SAMPLEPASS=secret

EXPORTPATH=/tmp

And the updated export script:

export_dept.sql

CONNECT TO $SAMPLEDB USER $SAMPLEUSER USING $SAMPLEPASS;

EXPORT TO $EXPORTPATH/DEPTDATA.IXF OF IXF
SELECT
    DEPTNO,
    DEPTNAME,
    MGRNO
FROM DEPARTMENT;

CONNECT RESET;

The configuration file is a simple INI-style file. It consists of a section titled "Substitute", containing name=value lines. Blank lines and comments (prefixed by semi-colon or hash signs) will be ignored. Continuation lines can be specified by indentation.
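
For readers who want to inspect such a file programmatically, the sketch below reads a "Substitute" section with Python's standard ``configparser`` module, which happens to support the same comment prefixes and indented continuation lines. Whether dbexec itself uses this module is an assumption; ``load_substitutions`` is a hypothetical helper.

```python
import configparser

INI_TEXT = """\
[Substitute]
; comments may start with a semi-colon
# or with a hash sign
SAMPLEDB=SAMPLE1
SAMPLEUSER=fred
SAMPLEPASS=secret

EXPORTPATH=/tmp
"""

def load_substitutions(text):
    """Return the name=value pairs of the [Substitute] section as a dict."""
    # interpolation=None keeps any % characters in values literal
    parser = configparser.ConfigParser(interpolation=None)
    # By default configparser lower-cases keys; keep them as written
    parser.optionxform = str
    parser.read_string(text)
    return dict(parser["Substitute"])

print(load_substitutions(INI_TEXT))
```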

Substitution variables may appear anywhere in an SQL script, within quoted strings or outside them. They may represent values or be statements in and of themselves. Variables are prefixed with a dollar sign, and the name of the variable is optionally enclosed in ${braces}. If a variable name is immediately followed by a character which could form part of a variable name (an alphabetic character, a number, or an underscore) then the braces are mandatory.
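
Python's standard ``string.Template`` class follows the same ``$name`` and ``${name}`` conventions, so it serves as a convenient way to illustrate when braces are needed. This is a sketch of the syntax only; it is an assumption that dbexec behaves identically, and the ``TABLE`` variable is hypothetical.

```python
from string import Template

values = {"EXPORTPATH": "/tmp", "TABLE": "DEPT"}

# $EXPORTPATH is followed by "/", which cannot be part of a name, so no
# braces are needed. ${TABLE} is followed by "DATA", so braces are required.
sql = Template("EXPORT TO $EXPORTPATH/${TABLE}DATA.IXF OF IXF")
print(sql.substitute(values))

# Without braces the whole of "TABLEDATA" is taken as the variable name,
# which is not defined, so safe_substitute leaves it untouched.
print(Template("EXPORT TO $EXPORTPATH/$TABLEDATA.IXF").safe_substitute(values))
```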

To execute the script with the configuration file, the following command line could be used:

$ dbexec -c dbexec.ini export_dept.sql

To test the configuration and ensure that variables are substituted correctly and dependencies fulfilled, use the :option:`-n` switch (commonly combined with :option:`-v` to ensure all detail is displayed):

$ dbexec -nv -c dbexec.ini export_dept.sql
Parsing script export_dept.sql

Calculating dependencies for script export_dept.sql
File /tmp/DEPTDATA.IXF is produced, but never consumed

Dependency tree:
'- export_dept.sql

Data transfers:
SAMPLE1 ---> /tmp/DEPTDATA.IXF

SQL that would be executed is logged below:

export_dept.sql
CONNECT TO 'SAMPLE1' USER 'fred' USING 'secret'@

EXPORT TO '/tmp/DEPTDATA.IXF' OF IXF
SELECT
    DEPTNO,
    DEPTNAME,
    MGRNO
FROM DEPARTMENT@

CONNECT RESET@

You may note that the statement terminator in the output has changed from the original semi-colon to an at-symbol (@). There are two reasons for this:

  • To ensure that every script written out under the :option:`-n` option uses the same terminator. This is useful when redirecting the output (see below).
  • To ensure that compound statements (between BEGIN ATOMIC and END) can use semi-colon as the terminator for the statements within them, a terminator other than semi-colon must be used for the script itself. The at-symbol is a common "alternate" terminator.

When :option:`-n` is used, all SQL that is output by dbexec is written to stdout, while informational and warning messages are written to stderr. Furthermore, all scripts are output in the order they would be executed (the ordering of scripts that would be executed in parallel is arbitrary). This enables one to redirect the output to a file in order to obtain a single script which, if run, would produce the same result as running dbexec without the :option:`-n` switch. This can be useful for debugging purposes. For example:

$ dbexec -nv -c dbexec.ini export_dept.sql > test.sql
Parsing script export_dept.sql

Calculating dependencies for script export_dept.sql
File /tmp/DEPTDATA.IXF is produced, but never consumed

Dependency tree:
'- export_dept.sql

Data transfers:
SAMPLE1 ---> /tmp/DEPTDATA.IXF

SQL that would be executed is logged below:

export_dept.sql

$ cat test.sql
CONNECT TO 'SAMPLE1' USER 'fred' USING 'secret'@

EXPORT TO '/tmp/DEPTDATA.IXF' OF IXF
SELECT
    DEPTNO,
    DEPTNAME,
    MGRNO
FROM DEPARTMENT@

CONNECT RESET@

Note that :option:`-n` can be specified multiple times:

  • -n: parse scripts and output SQL to run
  • -nn: as above but also run file permission tests
  • -nnn: as above but also test database logins
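
A repeatable switch of this kind is commonly implemented as a counter. The Python sketch below uses ``argparse`` with ``action="count"`` to show the idea; it is an assumption that dbexec parses its options this way, and ``build_parser`` and ``dry_run_stages`` are hypothetical names.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="dbexec-sketch")
    # Each occurrence of -n increments the dry_run counter
    parser.add_argument("-n", "--dry-run", action="count", default=0)
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser

def dry_run_stages(level):
    """Return the stages carried out for a given number of -n switches."""
    stages = [
        "parse scripts and output SQL",
        "test file permissions",
        "test database logins",
    ]
    return stages[:level]

args = build_parser().parse_args(["-nnv"])
print(args.dry_run, args.verbose)
print(dry_run_stages(args.dry_run))
```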

Finally, a more complete example, involving four SQL scripts. Three of the scripts connect to two databases and extract data. The final script loads all the exports into a third target database. A configuration file defines all the user names and passwords involved:

dm.ini

[Substitute]
DM1USER=GB01111
DM1PASS=Passw0rd

DM2USER=GB01111
DM2PASS=Passw0rd

DM3USER=db2admin
DM3PASS=db2admin

EXPORTPATH=/tmp

dm1_export_a.sql

CONNECT TO DM1 USER $DM1USER USING $DM1PASS;

EXPORT TO $EXPORTPATH/A.IXF OF IXF SELECT * FROM A;

CONNECT RESET;

dm1_export_b.sql

CONNECT TO DM1 USER $DM1USER USING $DM1PASS;

EXPORT TO $EXPORTPATH/B.IXF OF IXF SELECT * FROM B;

CONNECT RESET;

dm2_export.sql

CONNECT TO DM2 USER $DM2USER USING $DM2PASS;

EXPORT TO $EXPORTPATH/C.IXF OF IXF SELECT * FROM C;

CONNECT RESET;

dm3_import.sql

CONNECT TO DM3 USER $DM3USER USING $DM3PASS;

LOAD FROM $EXPORTPATH/A.IXF OF IXF REPLACE INTO A;
LOAD FROM $EXPORTPATH/B.IXF OF IXF REPLACE INTO B;
LOAD FROM $EXPORTPATH/C.IXF OF IXF INSERT INTO C;

CONNECT RESET;

The following command line can be used to see the results of the dependency analysis that dbexec performs, and the substituted SQL that would be executed:

$ dbexec -nv -c dm.ini dm*.sql
Parsing script dm1_export_a.sql
Parsing script dm1_export_b.sql
Parsing script dm2_export.sql
Parsing script dm3_import.sql

Calculating dependencies for script dm1_export_a.sql
Calculating dependencies for script dm1_export_b.sql
Calculating dependencies for script dm2_export.sql
Calculating dependencies for script dm3_import.sql

Dependency tree:
'- dm3_import.sql
   |
   +- dm1_export_a.sql
   +- dm1_export_b.sql
   '- dm2_export.sql

Data transfers:
DM1 ---> /tmp/A.IXF ---> DM3
DM1 ---> /tmp/B.IXF ---> DM3
DM2 ---> /tmp/C.IXF ---> DM3

SQL that would be executed is logged below:

dm1_export_a.sql
CONNECT TO 'DM1' USER 'GB01111' USING 'Passw0rd'@

EXPORT TO '/tmp/A.IXF' OF IXF SELECT * FROM A@

CONNECT RESET@

dm1_export_b.sql
CONNECT TO 'DM1' USER 'GB01111' USING 'Passw0rd'@

EXPORT TO '/tmp/B.IXF' OF IXF SELECT * FROM B@

CONNECT RESET@

dm2_export.sql
CONNECT TO 'DM2' USER 'GB01111' USING 'Passw0rd'@

EXPORT TO '/tmp/C.IXF' OF IXF SELECT * FROM C@

CONNECT RESET@

dm3_import.sql
CONNECT TO 'DM3' USER 'db2admin' USING 'db2admin'@

LOAD FROM '/tmp/A.IXF' OF IXF REPLACE INTO A@
LOAD FROM '/tmp/B.IXF' OF IXF REPLACE INTO B@
LOAD FROM '/tmp/C.IXF' OF IXF INSERT INTO C@

CONNECT RESET@
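
The dependency tree shown in the output above can be approximated with a short Python sketch: find the files each script produces (EXPORT TO) and consumes (LOAD FROM or IMPORT FROM), link consumers to producers, then group scripts into "waves" that could run in parallel. The regular expressions and function names are hypothetical simplifications, not dbexec's parser.

```python
import re

PRODUCES = re.compile(r"\bEXPORT\s+TO\s+(\S+)", re.IGNORECASE)
CONSUMES = re.compile(r"\b(?:LOAD|IMPORT)\s+FROM\s+(\S+)", re.IGNORECASE)

def dependencies(scripts):
    """Map each script to the set of scripts whose files it consumes."""
    producers = {}
    for name, sql in scripts.items():
        for path in PRODUCES.findall(sql):
            producers[path] = name
    deps = {name: set() for name in scripts}
    for name, sql in scripts.items():
        for path in CONSUMES.findall(sql):
            if path in producers and producers[path] != name:
                deps[name].add(producers[path])
    return deps

def execution_waves(deps):
    """Group scripts into waves; scripts in one wave can run in parallel."""
    remaining = {name: set(needs) for name, needs in deps.items()}
    waves = []
    while remaining:
        ready = sorted(n for n, needs in remaining.items() if not needs)
        if not ready:
            raise ValueError("circular dependency between scripts")
        waves.append(ready)
        for name in ready:
            del remaining[name]
        for needs in remaining.values():
            needs.difference_update(ready)
    return waves

# Script text after substitution, reduced to the relevant statements
scripts = {
    "dm1_export_a.sql": "EXPORT TO /tmp/A.IXF OF IXF SELECT * FROM A;",
    "dm1_export_b.sql": "EXPORT TO /tmp/B.IXF OF IXF SELECT * FROM B;",
    "dm2_export.sql": "EXPORT TO /tmp/C.IXF OF IXF SELECT * FROM C;",
    "dm3_import.sql": (
        "LOAD FROM /tmp/A.IXF OF IXF REPLACE INTO A;"
        "LOAD FROM /tmp/B.IXF OF IXF REPLACE INTO B;"
        "LOAD FROM /tmp/C.IXF OF IXF INSERT INTO C;"
    ),
}
for wave in execution_waves(dependencies(scripts)):
    print(wave)
```

For the four example scripts this places the three exports in the first wave and the import in the second, matching the tree printed by dbexec.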

Alternatively, instead of supplying the list of scripts to be executed with a wildcard (dm*.sql), one can use a "response file". A response file is a simple text file containing one command line parameter per line:

dmfiles.lst

dm1_export_a.sql
dm1_export_b.sql
dm2_export.sql
dm3_import.sql

The response file is specified on the dbexec command line by prefixing its name with @:

$ dbexec -nv -c dm.ini @dmfiles.lst

Note that any command line parameter may be specified in the response file, not just filenames. Hence the following is equivalent to the command line above (note that :option:`-c` and dm.ini appear on separate lines as each line is strictly one parameter):

dmfiles.lst

-c
dm.ini
dm1_export_a.sql
dm1_export_b.sql
dm2_export.sql
dm3_import.sql

$ dbexec -nv @dmfiles.lst

Inside response files, quoting of parameters which contain spaces is not required. Furthermore, leading and trailing whitespace is significant. Parameters within response files may include wildcard characters (which will be expanded as normal), but response files are not recursive (placing an @-prefixed filename within a response file will not include the second level response file within the command line).
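
The rules above can be sketched in Python as follows. This is an approximation under stated assumptions: ``expand_args`` is a hypothetical name, blank lines are assumed to be skipped, and only the line break is removed from each line so that other whitespace stays significant.

```python
import glob
import os
import tempfile

def expand_args(argv):
    """Expand @response-file arguments into one parameter per line."""
    expanded = []
    for arg in argv:
        if not arg.startswith("@"):
            expanded.append(arg)
            continue
        with open(arg[1:]) as handle:
            # Strip only the line break; other whitespace is significant
            lines = [line.rstrip("\n") for line in handle]
        for param in lines:
            if not param:
                continue  # assumption: blank lines are ignored
            # Wildcards are expanded; anything that matches no file
            # (options, plain values) is passed through unchanged.
            # A nested "@file" entry is NOT opened: no recursion.
            matches = sorted(glob.glob(param))
            expanded.extend(matches or [param])
    return expanded

with tempfile.TemporaryDirectory() as tmp:
    for name in ("dm1_export_a.sql", "dm3_import.sql"):
        open(os.path.join(tmp, name), "w").close()
    response = os.path.join(tmp, "dmfiles.lst")
    with open(response, "w") as handle:
        handle.write("-c\ndm.ini\n%s\n" % os.path.join(tmp, "dm*.sql"))
    print(expand_args(["-nv", "@" + response]))
```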

Enhanced SQL Commands

dbexec enhances the SQL language with a few extra commands that have proved useful in production environments. The first and simplest is the INSTANCE statement which is used in environments with multiple DB2 instances to switch the instance that the script will execute against. The second is the ON statement which provides rudimentary error handling capabilities based on return codes or regular expression matching of the output. The following script demonstrates both of these custom statements:

-- Use the client instance specified in the configuration
INSTANCE $CLIENT_INSTANCE;

CONNECT TO $SOME_SERVER USER $SOME_USER USING $SOME_PASS;

-- If the source's table is unavailable, retry for an hour and then give up
ON REGEX '^SQL0904' WAIT 30 MINUTES AND RETRY SCRIPT 2 TIMES THEN FAIL;

EXPORT TO $EXPORT_PATH/MY_EXPORT.IXF OF IXF
SELECT * FROM SOME_TABLE
WITH UR;

CONNECT RESET;
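
The retry behaviour requested by the ON statement in this script can be pictured with the Python sketch below. The semantics are an assumption based on the comment in the script (two retries, 30 minutes apart, then failure); ``run_with_retry`` is a hypothetical helper, and the ``run`` callable and its output strings are invented stand-ins for real script execution.

```python
import re
import time

def run_with_retry(run, pattern, wait_seconds, retries, sleep=time.sleep):
    """Re-run a script while its output matches pattern.

    run() must return (return_code, output). After the given number of
    retries has been used up, a RuntimeError is raised (THEN FAIL).
    """
    regex = re.compile(pattern, re.MULTILINE)
    attempts = 0
    while True:
        attempts += 1
        return_code, output = run()
        if not regex.search(output):
            return return_code, output, attempts
        if attempts > retries:
            raise RuntimeError("output still matches %r after %d retries"
                               % (pattern, retries))
        sleep(wait_seconds)

# Fake script: unavailable twice, then successful (invented output)
outputs = iter([
    "SQL0904N (resource unavailable)",
    "SQL0904N (resource unavailable)",
    "Number of rows exported: 1",
])
waits = []
result = run_with_retry(lambda: (0, next(outputs)), r"^SQL0904",
                        wait_seconds=30 * 60, retries=2, sleep=waits.append)
print(result, waits)
```

Passing ``sleep`` as a parameter lets the example record the waits instead of actually pausing for half an hour.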