Checkpoint Restart V2.1
STANDARD
CHECKPOINT RESTART
1.0 An overview of checkpoint restart
A symbolic checkpoint call also does this. However, it has a further function: the application program can store away
information which will allow it to restart if it fails. This information varies between types of program,
but it tends to be things like: the root key of the database record currently being processed, the page number of
the report which is being output, and accumulators giving financial totals and item counts of whatever has been processed.
Symbolic checkpointing is the only one which is of relevance in our environment: basic checkpointing is not (and should
not be) used in application programs.
As a side effect of both types of call, the current position on all databases is destroyed and must be re-established, by the
application, before any further processing can be done. The exception to this for DB2 is the use of SQL DECLARE
CURSOR WITH HOLD which maintains position across a COMMIT call.
There are two other IMS facilities which need to be explained before it is possible to show the logic of a typical
checkpoint/restartable program.
Firstly, there is a type of file called a GSAM (Generalised Sequential Access Method) database, which is not really a
database at all! It is basically an ordinary sequential file (BSAM, or VSAM ESDS) which is accessed through DL/I ISRT
(for writing) and GN (for reading) calls.
It might seem rather perverse to give the ability to access sequential files via DL/I calls, but there is a very good reason.
This is that on a restart call (explained below!), GSAM files are positioned to the record being processed at the last
checkpoint. Also, after a checkpoint call, positioning is not lost, as it is on true databases.
If this facility did not exist, it would be hard to restart a program which used sequential files as it would have to position
itself to the right place. This would involve doing things like reading through the input file until the relevant record had
been read, which is all rather messy. The situation is even worse for output files, where the program would have to read
from the old output file and write them to the new one until correctly positioned.
Secondly, there is a DL/I call XRST (for restart), which must be the very first call which the program issues. This can do
one of two things. Either, it will tell the application that it is not a restart and that processing can continue normally. Or,
alternatively, it will tell the program that it is a restart and that special action is necessary. In this case DL/I will do some
extra things. The information which the program saved at the last checkpoint call before it failed will be returned to the
program. The position is restored for GSAM files, so no application work is needed for them. However, for other databases
position is not restored and it is the application’s responsibility to restore position as necessary.
The general scheme of things for a batch program which is checkpoint/restartable is as follows:
1. It issues an XRST call to IMS, which tells it whether a restart is taking place or not.
2. If a restart is taking place, it re-opens any DB2 cursors needed and re-positions on the various databases. It then
   resumes processing using the data which was stored away in the checkpoint area.
3. Otherwise, it opens any GSAM files which are to be used and, if any of them are for output, issues an initial
   CHKP call to IMS.
4. It then performs the main logic of the program, repeating the following steps until there is no more work to be done:
   • Do the Logical Unit of Work (L.U.W.), i.e. the lowest USEFUL function which it performs in a repetitive way,
     e.g. process an input record.
   • Issue a CHKP call.
   • Re-position on the various databases (but not on DB2 cursors declared WITH HOLD).
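The scheme above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: FakeIMS, run and the fields of the checkpoint area are hypothetical names, not IMS interfaces, and backout is modelled simply by discarding work that was never checkpointed.

```python
class FakeIMS:
    """Hypothetical in-memory stand-in for the IMS XRST and CHKP calls."""

    def __init__(self):
        self.saved_area = None   # checkpoint area stored by the last CHKP
        self.committed = []      # updates made permanent by a CHKP

    def xrst(self):
        # None means a normal start; otherwise a copy of the saved area.
        return None if self.saved_area is None else dict(self.saved_area)

    def chkp(self, area, pending):
        # Commit the work done since the last checkpoint and save restart data.
        self.committed.extend(pending)
        self.saved_area = dict(area)


def run(ims, records, fail_at=None):
    """Process records, one per L.U.W.; abend inside record index fail_at."""
    area = ims.xrst()
    if area is None:
        area = {"next_index": 0, "total": 0}   # normal start
    # On a restart, repositioning is modelled by resuming at next_index.
    i = area["next_index"]
    while i < len(records):
        pending = [records[i]]                 # the L.U.W. (uncommitted)
        if i == fail_at:
            raise RuntimeError("simulated abend")   # pending work is backed out
        area["total"] += records[i]
        area["next_index"] = i + 1
        ims.chkp(area, pending)                # checkpoint after each L.U.W.
        i += 1
    return area["total"]
```

Running it once with fail_at set and then again without it shows the second run resuming from the saved area rather than 'from the top', with the accumulator carried across the abend.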
• DB2 automatically backs out table updates to the last COMMIT point.
• IMS BMP jobs automatically back out DL/I updates to the last checkpoint.
• IMS stand-alone DL/I jobs do not back out automatically. However, production jobs include a job step which performs
this function, using the IMS feature known as DBRC. In fact, DBRC will prevent access to a database which needs
backing out until this step is performed.
When testing your checkpoint/restart programs, however, you may need to perform a manual batch backout.
1.4 ARC
Another piece of software which is involved in the process of checkpoint restart is ARC (APPLICATION RESTART
CONTROL) from BMC Software. ARC provides a wide range of features but at the time of writing its use is limited to
replacing the functionality of its predecessor product, BMP Restart. Broadly, these functions are:
• To provide a central repository of the checkpoint/restart characteristics of each group of jobs.
• To intercept CHKP calls to ensure that these are passed to IMS at an appropriate rate. Typically a checkpoint is
honoured only once a certain time period has elapsed. This function is known as pacing.
• To maintain the restart information for each job, such that jobs can be restarted without manual intervention.
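The pacing idea can be illustrated with a minimal Python sketch. The Pacer class, its min_interval parameter and the injected clock are all hypothetical; how ARC actually configures and applies pacing is not shown here.

```python
class Pacer:
    """Hypothetical pacing filter: honours a CHKP only after min_interval seconds."""

    def __init__(self, min_interval, clock):
        self.min_interval = min_interval
        self.clock = clock                 # callable returning the time in seconds
        self.last_honoured = clock()       # pacing starts from program start

    def chkp(self):
        """Return 'honoured' or 'skipped' for one CHKP request."""
        now = self.clock()
        if now - self.last_honoured >= self.min_interval:
            self.last_honoured = now
            return "honoured"
        return "skipped"
```

The point of the design is that the program can issue a CHKP after every L.U.W. without worrying about the cost, because the decision on which calls reach IMS is made outside the program.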
ARC is administered by Information Management and it is their responsibility to ensure that checkpoint pacing is set
correctly. However there are certain features of ARC that can assist you in testing your checkpoint/restartable programs
and these are noted below. For more information please refer to the following manuals on BookManager bookshelf BMC
Software Products for DB2, IMS, CICS, & MVS:
APPLICATION RESTART CONTROL User Guide V 2.1
APPLICATION RESTART CONTROL Reference Manual V 2.1
• Any program which accesses a DB2 or IMS database, whether read or update, must issue regular checkpoint calls.
• Any program which updates a DB2 or IMS database must also be written to be checkpoint restartable.
• Any scheduled program whose longest execution is likely to exceed 20 minutes elapsed must also be written to
be checkpoint restartable.
Thus a short-running read-only IMS or DB2 program will issue an XRST call and regular CHKP calls, but can use
READ/WRITE instructions for sequential files. It should not use GSAM. Such programs will always run 'from the top',
whether starting normally or restarting after an abend. The XRST call is necessary to keep IMS happy, but the program
should perform the same 'from the top' processing whether the XRST call indicates normal start or restart.
Other IMS or DB2 programs will be either long running, or do updates. These will issue an XRST call and regular CHKP
calls, but in addition will be truly restartable. This means they must use GSAM for sequential files, and will be written
and tested so that they use a checkpoint save area, if necessary, and do any required repositioning following an XRST call
which indicates a restart. Following an abend, they must rerun from the restart point to completion.
Programs which do not access IMS or DB2 databases need not take checkpoints.
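As a compact restatement of the rules above, here is a hedged Python sketch of the decision. The function name, its parameters and the returned keys are hypothetical, and it assumes the 20 minute rule is only relevant to programs which access a database.

```python
def checkpoint_requirements(accesses_db, updates_db, longest_run_minutes):
    """Map a program's characteristics to the checkpoint rules above."""
    if not accesses_db:
        # No IMS or DB2 access: no checkpoints needed.
        return {"issue_chkp": False, "restartable": False, "use_gsam": False}
    # Updaters and long runners must be truly restartable.
    restartable = updates_db or longest_run_minutes > 20
    return {
        "issue_chkp": True,          # XRST plus regular CHKP calls
        "restartable": restartable,
        "use_gsam": restartable,     # GSAM only for truly restartable programs
    }
```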
An alternative for one-off read-only DL/I programs which do not require absolute integrity of extracted data is to use a
PSB with PROCOPT=GO. Access by these programs ignores locks, so they run the risk of reading
uncommitted data, which may lack referential integrity with other data read during the same run.
2.1 Overview
The following are a set of guidelines for coding checkpoint/restartable programs. They are by no means hard and fast rules
but they have proved themselves in existing programs. Generally, checkpoint/restart CAN be installed in programs almost
invisibly, if a little care is taken. It can also be fairly easy to retrofit it to an existing program, if that was reasonably well
written initially. It can however be a blight if misused, and in programs which have been troublesome, one or more of the
following guidelines has invariably been broken.
update programs, which can have many input records to apply to a single database record. An update program which
applies unbilled financial transactions to C/M Accounts would be a typical example.
The rule is that where there is a choice of L.U.W. then go for the lowest. The pacing function of ARC applied externally to
the program will then ensure that honoured checkpoints occur at an appropriate rate. Choosing the lower level L.U.W. may
well incur more re-positioning overhead after successful checkpoints. However, transaction driven programs usually have
to find the particular database record they are interested in, so the re-positioning tends to be 'lost' into this.
program 2:
    Initialise.
    XRST call.
    If restart
        reposition on database(s)
    Otherwise
        open files.
    While input file non-empty
        read next input record,
        if not End Of File
            CHKP call,
            if checkpoint honoured
                reposition on database(s) (if necessary),
            main logic.
    Write trailers to o/p files.
    Close files.
    Terminate.
Generally speaking, most people intuitively prefer the first solution, probably without being able to explain exactly why.
There are however some who vociferously advocate the second. This causes confusion amongst those trying to design their
programs' structure, so the air needs clearing a little. The first method is, in almost every instance, preferable to the second.
There are two basic reasons for this.
Firstly, the first method uses the well-known structured technique: 'read ahead'. It is a very useful tool, which should not be
discarded lightly, unless there are clear advantages in doing so.
Secondly, the first method is much clearer in its approach to how the checkpoint relates to the program logic. That is to say,
the contents of the checkpoint area reflect the L.U.W. which has just been completed. The philosophy is that a checkpoint
acknowledges total completion of a particular L.U.W. On the other hand, in the second method a checkpoint indicates that
the program is about to start processing a particular L.U.W. This does not seem, to most people, a particularly clear way of
expressing what checkpoint/restart is all about.
The above arguments are not meant to show that there is never a case for using the second method. They should
however have convinced you that it shouldn't be used very often. Only if you think that there is a strong case should you
design a program this way, and you should clear it with a senior developer in your area first.
2.4 Where to code checkpoint/restart logic
This might sound, on the face of it, somewhat obvious. You have a section, which is performed in the initialisation phase,
which does the necessary work for handling restarts or opening GSAM files. There would also be a section, performed from
the main control loop, which does the necessary work for checkpointing (including re-positioning after honoured
checkpoint calls, if necessary).
It feels like such a natural way to do it, that it is difficult to conceive how else to approach it. However, if
checkpoint/restart is being retrofitted to an existing program, then it is quite possible that this way won't easily fit with the
current structure. If this is the case then you are strongly recommended to change that, rather than use a different way of
implementing checkpoint/restart. In general, where problems have been found in the past with such exercises, it has been
with those programs which are not structured very well. Signs of trouble are things like references to restart flags in
sections which don't have anything to do with it. For example, in a section which reads an input file for the rest of the
program, it shouldn't need to look at a flag to see whether it should read the file or take it from the checkpoint area (after a
restart). Traces of pathological connections between functionally distinct areas of programs are a sure warning that there
could be severe problems ahead.
worse still simply not work properly - usually causing mayhem and pandemonium in other innocent and unsuspecting
programs.
For DB2, Appendix A contains some advice provided by Information Management on how to code DECLARE CURSOR
statements that will work correctly on a restart.
3.3 Use of ARC Online Panels
ARC provides a range of features to allow you to display and control your checkpoint/restart testing. On Testbed enter
‘TSO BMCAES’ on a command line:
Enter option 3:
Enter option 1:
BCSID : BCSS
Enter option 1:
JOBNAME . . . . . . Y0HG****
JOBSTEP . . . . . . ********
PROCSTEP . . . . . . ********
PROGNAME . . . . . . ********
PSBNAME . . . . . . ********
Enter your job details:
List Records
Command ===> Scroll ===> CSR
Act JOBNAME JOBSTEP PROCSTEP PROGNAME PSBNAME Run Start Date/Time Status
From here you can select options to delete a checkpoint to force a cold start on the next test run or to examine
the contents of the checkpoint data.
END
where Ixxxx,Oxxxx are the JCL DD names (input and output)
RECORD=n refers to LRECL. For variable-length files IMS will add two bytes to this value for QSAM
compatibility as per the note below.
SIZE=0 refers to BLKSIZE. SIZE must be specified as 0 in the DBD to stop DL/I calculating it at DBDGEN time.
2. Make an entry in the program's PSB for the GSAM dataset.
PCB TYPE=GSAM,DBDNAME=xxxx,PROCOPT=LS
Always use PROCOPT=LS or GS.
3. Code a PCB mask.
    01  DLI-xxxx-PCB.
        03  DLI-xxxx-DBD-NAME        PIC X(8).
        03  DLI-xxxx-SEG-LEVEL       PIC XX.
        03  DLI-xxxx-STATUS-CODE     PIC XX.
        03  DLI-xxxx-PROC-OPT-CODE   PIC X(4).
        03  DLI-xxxx-RESRVED         PIC S9(9) COMP.
        03  DLI-xxxx-SEG-NAME        PIC X(8).
        03  DLI-xxxx-KFB-LENGTH-CNT  PIC S9(9) COMP.
        03  DLI-xxxx-SENS-SEG-CNT    PIC S9(9) COMP.
        03  DLI-xxxx-KFB-AREA        PIC X(8).
        03  DLI-xxxx-UNDEF-LENGTH    PIC S9(9) COMP.
4. Use the GN call to read GSAM records sequentially. Use the ISRT call to write new records to the end of the file.
There are several things to note with using GSAM files.
1. It is not necessary to explicitly open or close the file using OPEN and CLSE calls. GSAM implicitly performs these
functions. The first call to a GSAM file will open it and, if opened, the file will be closed at the end of the program.
Implicit closing also occurs if, on reading sequentially, end-of-file is reached (status GB). Be aware that the next GN
call following this will cause the file to be re-opened, and the first record to be retrieved again.
2. GSAM datasets must be pre-allocated. This avoids the need for 2 sets of JCL - one for normal processing which creates
the dataset and one for restart processing which adds to an existing dataset. Furthermore, they must be pre-allocated
using HUT21, rather than IEFBR14, to avoid creating empty files without end-of-file markers.
If the DDNAME of the file in the pre-allocation step starts 'DD', HUT21 will open and close the file after allocating it.
This ensures that the file is empty. So all files in the pre-allocation step must be named 'DD......' e.g. DD1, DD2, etc.
3. You do not need to re-establish position on GSAM files after returning from the checkpoint or after restarting a
program - unlike for all other IMS databases. After checkpointing or restarting, repositioning will occur automatically.
However, be sure to code a DISP of SHR on all your GSAM files - or else unpredictable results may occur on restart.
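The end-of-file behaviour described in note 1 is an easy trap, so here is a minimal Python model of it. FakeGsam and read_all are hypothetical names, and the class only imitates the behaviour stated in the note (status GB at end of file, then re-open on the next GN); it is not an IMS interface.

```python
class FakeGsam:
    """Hypothetical model of the GN behaviour described in note 1."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0

    def gn(self):
        """Return (status, record); status 'GB' signals end of file."""
        if self.pos >= len(self.records):
            self.pos = 0          # implicit close: the next GN re-opens the file
            return ("GB", None)
        rec = self.records[self.pos]
        self.pos += 1
        return ("  ", rec)


def read_all(gsam):
    """Read to end of file, stopping at the first GB status."""
    out = []
    while True:
        status, rec = gsam.gn()
        if status == "GB":
            break                 # one more GN would return the first record again
        out.append(rec)
    return out
```

A read loop should therefore treat GB as final and never issue a further GN 'just in case'.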
DBD NAME=GSIJ,ACCESS=(GSAM,BSAM),PASSWD=NO
DATASET DD1=IGSIJ,DD2=OGSIJ,RECFM=VB,RECORD=352,SIZE=0
Set RECORD=max record/segment size, including 2 byte length field.
In your program (example taken from GS26000):
    01  WP-GSAM-AREAS.
        03  WP-GSIJ-REC.
            05  WP-GSIJ-LEN          PIC S9(4) COMP.
            05  WP-GSIJ-DATA         PIC X(400).
    01  GSIJA-SE-FIN-DTL-HDR-REC.
        03  GSIJA-FINE-REC-CD        PIC XX.
        .
    01  GSIJB-SE-FIN-ACCT-TRANS-REC.
        03  GSIJB-FINE-REC-CD        PIC XX.
        .
    01  GSIJC-SE-FIN-DTL-SUBM-REC.
        03  GSIJC-FINE-REC-CD        PIC XX.
        .
    MOVE +352 TO WP-GSIJ-LEN.
    MOVE GSIJB-SE-FIN-ACCT-TRANS-REC TO WP-GSIJ-DATA.
    CALL 'CBLTDLI' USING DLI-ISRT,
                         WL-GSIJ-PCB,
                         WP-GSIJ-REC.
Some points on GSAM vs QSAM variable length record considerations.
Note that the GSIJA- record layout generated from Data Dictionary does not mention the length field. This is so that the
layout can be used whether the file is being accessed via GSAM or QSAMIORT. Explanation follows:-
GSAM uses a 2 byte segment length field. This follows standard DL/1 variable length segment conventions, i.e. 2 byte
S9(4) COMP. The length value includes the 2 bytes for the length field. For an example, see WP-GSIJ-REC above.
QSAM uses a 4 byte Record Descriptor Word. This is made up of a 2 byte S9(4) COMP record length and a 2 byte
FILLER. See QSAMIORT's description in the Utilities section of Standards for examples. The QSAM record length
includes 2 bytes for the length field and 2 bytes for the filler.
A common scenario is a VB dataset output as GSAM in a checkpoint restartable program, and read via QSAMIORT in
another program. Have we a compatibility problem between GSAM and QSAMIORT record prefixes? The answer is no -
because DL/1 outputs standard QSAM VB records; i.e. DL/1 inserts the FILLER and adds 2 to the segment length before
writing the record. Thus the GSAM output program may set record length to 352, but both physically, and to QSAMIORT,
the record length will be 354.
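The length arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes big-endian halfwords and a zero-filled FILLER; it only mirrors the prefix layouts described in the text and does not call DL/I or QSAM.

```python
import struct


def gsam_segment(data: bytes) -> bytes:
    """Build a GSAM-style segment: 2-byte length (including itself) plus data."""
    return struct.pack(">h", len(data) + 2) + data


def qsam_vb_record(segment: bytes) -> bytes:
    """Convert a GSAM-style segment to a QSAM VB record with a 4-byte prefix."""
    (seg_len,) = struct.unpack(">h", segment[:2])
    data = segment[2:seg_len]
    # Prefix = 2-byte length (including all 4 prefix bytes) + 2-byte filler.
    return struct.pack(">hh", seg_len + 2, 0) + data
```

For 350 bytes of data the GSAM length field holds 352 and the resulting QSAM record is 354 bytes long, matching the figures in the text.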
If a GSAM or QSAM program is coded like GS26000 (above), then data is moved from an unformatted WORKING
STORAGE IO-area to the formatted record layout. This means both the GSAM and the QSAM program can use the same
Data Dictionary-generated layout. The different length record prefixes are defined in the unformatted IO-areas.
5.0 System Design Considerations
All too often, Checkpoint/Restart capability is added as an afterthought, either during program design or, worse
still, after program build has been completed. Many of the problems that occur with checkpoint restart could be
avoided if sufficient thought was given to these matters at the outset. In other words, one of the key elements of
the Systems Design process should be to think through the restart implications of a potential failure at every single
point in a batch process flow. Here are some guidelines to help with this.
5.1 Drivers
All batch processes will be driven in some way. The driver can either be a transaction file (‘hit file’) or a
sequential scan of a database. Where you have a choice then use a hit file whenever possible rather than a
database scan. This means that you don't need special routines for re-positioning the true databases. There may
also be benefits in performance, testing and production support.
You should also avoid having more than one driver (whether the driver is a database or a hit file) in any one
program. The merge-update logic does not fit in at all well with checkpoint-restart. Instead, generate a single
sequential hit file using conventional merge logic and then use this to update the database in a subsequent step.
5.2 Reports
Avoid the production of reports in checkpoint/restartable programs as it can be difficult to work out how to
restart these. If at all possible, write the details required by the report to a GSAM file and then have a small
program to print the report in a separate step. The side-benefit here is that we may be able to find a better way
to get the data to the customer than a mainframe report, such as transmitting the data to a PC file. We could do
this without touching the database program.
subsequently abends. Accordingly, the application program will have to cater for this: e.g. by doing a
REWRITE if a WRITE fails because the record was already present.
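The WRITE-then-REWRITE fallback can be sketched in Python as follows. FakeKsds, DuplicateKey and put are hypothetical names standing in for the keyed file and its duplicate-key condition; a real program would test the file status returned by its own I/O routine instead.

```python
class DuplicateKey(Exception):
    """Raised when a WRITE finds the key already present."""


class FakeKsds:
    """Hypothetical keyed file: WRITE adds a record, REWRITE replaces one."""

    def __init__(self):
        self.records = {}

    def write(self, key, data):
        if key in self.records:
            raise DuplicateKey(key)
        self.records[key] = data

    def rewrite(self, key, data):
        if key not in self.records:
            raise KeyError(key)
        self.records[key] = data


def put(ksds, key, data):
    """WRITE the record; if it survived an earlier run, REWRITE it instead."""
    try:
        ksds.write(key, data)
    except DuplicateKey:
        ksds.rewrite(key, data)
```

Because put gives the same result whether or not the record is already there, re-processing an L.U.W. after a restart leaves the file in the same state as a clean run.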
Thirdly, if the access pattern of the program to the KSDS is amenable, the problem can simply evaporate. This
would happen if the program reads the KSDS at the beginning of a run and updates it again at the end. This is
quite frequently the case with control or totals files which are also quite often VSAM. In this case, because
updates to the file only happen at one time (in the program's termination phase), they do not cross L.U.W.s and
so cannot get out of synch with the database updates.
5.5 Sorts
It is an unfortunate fact that sorts and checkpoint/restart don't mix well. It is notionally possible to get around
the incompatibilities, but it is almost never worth it. It complicates the logic of the program to the point where it
is incredibly tortuous. It is still not possible to checkpoint the program during the sort input/output phases, and it
doesn't carry any advantage in saved I/O operations.
This point does sometimes escape the notice of systems designers. It is not unheard of to find a spec for a
checkpoint/restartable program, which is also supposed to do some kind of internal sort. The solution is always
to split the program into 2 (or occasionally more) smaller ones. Usually, one of them will be a stand-alone (JCL)
sort to do the sorting for the checkpoint/restartable one.
6.0 Checkpoint/Restart with Other Products
6.1 Director
6.1.1 Introduction.
DIRTOBMC is a user exit module that allows DIRECTOR programs to use the ARC checkpoint/restart
software. This means that DIRECTOR can be used instead of COBOL when there is a requirement to write a
'one-off' checkpoint/restartable program.
Using the DIRECTOR language, instead of COBOL, to code and test a checkpoint/restartable program can
considerably reduce the amount of programming effort required. However DIRECTOR should only be used for
production support and systems development tasks and not for regular scheduled production applications.
You will need to have a reasonable understanding of the DIRECTOR language and of checkpoint/restart programming
techniques in order to use DIRTOBMC successfully. Examples of checkpoint/restartable DIRECTOR programs
can be found in DEVL.STANDARD.JCL members DIRBMC1 and DIRBMC2.
6.1.4 The CHKP call.
To make a checkpoint call set the function area AREAFUNC to CHKP and save any data you may require for a
restart in AREASAVE (a root segment key value for example). An example CHKP call follows:
AREAFUNC POS=1,C'CHKP'          * set function
AREASAVE POS=1,KFBA(1,11)       * save any restart data
UEXIT DIRTOBMC(AREAFUNC,AREAPCB,AREASAVE)
               |        |       |
               |        |       ---> CHECKPOINT SAVE AREA
               |        ---> IO PCB MASK
               ---> FUNCTION
Note that checkpoint ids do not need to be specified as they are generated by DIRTOBMC for you.
It is best to place your CHKP call in the position which makes restarting easiest. The following rules for
placing the CHKP call should serve most programs:
1. If your program is driven by an input GSAM file then place the CHKP call just before the GSAM read
statement (the GN call).
2. If your program is database driven (i.e. you are sequentially processing a database) then place the CHKP
call just before reading a root segment.
Actual checkpoint frequency will be controlled by ARC. The status-code from the CHKP call is passed back to
your DIRECTOR program in bytes 11-12 of AREAPCB. If using pacing class PACESK then a status-code of
'SK' indicates that checkpoint processing has been skipped and processing can continue as usual. A status code
of spaces means that a checkpoint has been taken and the DIRECTOR program must therefore reposition on any
databases it is sequentially processing.
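The status-code handling just described can be sketched in Python. The function name and the returned action strings are hypothetical; the only facts taken from the text are that the status-code sits in bytes 11-12 of AREAPCB, that 'SK' means the checkpoint was skipped and that spaces mean it was taken.

```python
def chkp_action(area_pcb: str) -> str:
    """Decide what to do after a CHKP call from the returned PCB area."""
    status = area_pcb[10:12]      # bytes 11-12, counting from 1
    if status == "SK":
        return "continue"         # checkpoint skipped by pacing: carry on
    if status == "  ":
        return "reposition"       # checkpoint taken: re-establish position
    return "error"                # anything else needs investigating
```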
IF STATUS#
DOEXIT
ENDIF
A status-code of 'GB' is returned when end of file is detected.
The format of a GSAM file write is as follows:
DO DBD=DIR2
ISRT
ENDDO
IF STATUS#
DOEXIT
ENDIF
6.1.6 Testing.
The restart code of a checkpoint/restartable DIRECTOR program must be thoroughly tested before it is allowed
to run in the production environment. DEVL.STANDARD.JCL members DIRBMC1 and DIRBMC2 provide
examples of the JCL required. The following points may also be helpful.
• An old style checkpoint database is not required. However, your PSB must specify COMPAT=YES on the
PSBGEN statement.
• Any GSAM files must be pre-allocated prior to running the DIRECTOR step for a normal start.
• Load library DEVT.TESTBED.LOADLIB which contains DIRTOBMC must be placed on the STEPLIB
statement.
• The IEFRDER IMS logging DD statement should be present in your run JCL. This file is required for
running a database backout.
In order to test that restart is working correctly:
1. Place displays in your DIRECTOR program (using PRINT) just after the CHKP call to show which
database record has just been processed.
2. Cause an abend to occur in your DIRECTOR program when a record has been read. You can call module
ZCDB029 to cause an abend (see DEVL.STANDARD.JCL(DIRBMC1) for example code).
3. Run a database backout job after the abend. See member DEVL.STANDARD.JCL(DICHKPBO) as an
example.
4. Re-run your DIRECTOR program and display which key is returned in AREASAVE to prove that you are
restarting in the correct place.
7.0 Defunct Checkpoint/Restart Components
7.2 CBLTBMC
This module was developed in response to a bug with GSAM processing in the original version of BMP Restart.
It is called instead of the standard module CBLTDLI. Where calls to CBLTBMC are encountered in a program
which is undergoing maintenance then you should perform a straight replacement with equivalent calls to
CBLTDLI.
7.3 CBLTDLX
CBLTDLX was the in-house written checkpoint/restart method in use prior to BMP Restart. This module was
called instead of CBLTDLI and checkpoint information was maintained on DL/1 databases. Where CBLTDLX
is encountered in a program which is undergoing maintenance then it should be removed and replaced with
standard calls to CBLTDLI. Please refer to section Converting COBOL To DLX Modules in the Utilities
Replacement Guide for details of how to achieve this.
7.4 SPLTDLI
This is a version of the database split utility which calls CBLTDLX rather than CBLTDLI. Where encountered
you can make a straight replacement with the standard module SPLTDLZ. Better still, there is another
technique for accessing split databases that is now available which avoids the need for these specialised
modules. Please contact the Development Centre for details.
Appendix A - DB2 Restart Logic
DB2 DATA
Col_A (key) other
37421 … … …
37422 … … …
37423 … … …
37424 … … …
37425 … … …
37426 … … …
SQL CODE
DECLARE crs1 CURSOR WITH HOLD FOR
SELECT …
FROM tables
WHERE …
AND Col_A > :ws-a
ORDER BY Col_A
When the key of a DB2 table consists of just one column the restart logic is simple and intuitive to code.
The only thing to bear in mind with any cursor that you want to stay open across logical units of work is that
it must have an ORDER BY clause. It should also be defined WITH HOLD so that the program doesn’t have to
close and re-open the cursor after every checkpoint, which is a significant overhead.
DB2 DATA
Col_A Col_B
37421 00
37421 01
37421 02
37422 00
37422 01
37422 02
SQL CODE
SELECT …
FROM tables
WHERE …
AND Col_A > :ws-a
AND Col_B > :ws-b
ORDER BY Col_A, COL_B
A common mistake in re-positioning logic comes when we have to deal with multi-column keys. The most
common error is to code the cursor as above.
The problem comes to light if the program abends after processing, say, the second of the rows in our sample
table. In this case, Col_A = 37421 and Col_B = 01
When the program restarts, it would appear to pick up from where it left off and complete with a return code of 0.
However, with the cursor coded as above, the third, fourth and fifth rows in our table would not have been processed
by either the first run or the restart, because only the sixth row satisfies both predicates. DB2 can give no indication
that these rows have been missed as it has returned all the rows the program requested.
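The loss of rows can be reproduced with a small Python sketch using an in-memory SQLite table in place of DB2. The table name t, the helper restart_rows and the lower-case column names are hypothetical. Both the predicate as coded above and a variant with >= on Col_A are tried, since either form loses rows.

```python
import sqlite3

ROWS = [(37421, 0), (37421, 1), (37421, 2), (37422, 0), (37422, 1), (37422, 2)]


def restart_rows(predicate, ws_a, ws_b):
    """Return the rows a restart cursor would fetch for a WHERE predicate."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (col_a INTEGER, col_b INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    sql = f"SELECT col_a, col_b FROM t WHERE {predicate} ORDER BY col_a, col_b"
    rows = con.execute(sql, {"a": ws_a, "b": ws_b}).fetchall()
    con.close()
    return rows


# Abend after the second row, so the checkpoint area holds (37421, 1).
strict = restart_rows("col_a > :a AND col_b > :b", 37421, 1)
inclusive = restart_rows("col_a >= :a AND col_b > :b", 37421, 1)
```

In both cases the rows (37422, 00) and (37422, 01) are never returned, and the query itself completes without error.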
A.3 Compound Key - Right
DB2 DATA
Col_A Col_B
37421 00
37421 01
37421 02
37422 00
37422 01
SQL CODE
SELECT …
FROM tables
WHERE …
AND ((Col_A = :ws-a
AND Col_B > :ws-b)
OR Col_A > :ws-a)
ORDER BY Col_A, COL_B
It says that we want to process all rows which have a Col_A value greater than the abend value and, in addition,
we want to process all rows that have the same Col_A value and a larger Col_B value.
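As a check on the predicate above, here is a Python sketch using an in-memory SQLite table in place of DB2. The table name t and the helper rows_after are hypothetical. For every possible abend point it confirms that the restart cursor returns exactly the rows not yet processed.

```python
import sqlite3

ROWS = [(37421, 0), (37421, 1), (37421, 2), (37422, 0), (37422, 1)]


def rows_after(ws_a, ws_b):
    """Rows fetched on restart when (ws_a, ws_b) was the last key processed."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (col_a INTEGER, col_b INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    sql = """SELECT col_a, col_b FROM t
             WHERE ((col_a = :a AND col_b > :b) OR col_a > :a)
             ORDER BY col_a, col_b"""
    rows = con.execute(sql, {"a": ws_a, "b": ws_b}).fetchall()
    con.close()
    return rows
```

For example, restarting after (37421, 01) returns (37421, 02), (37422, 00) and (37422, 01) in that order.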
However, even this sometimes causes problems from a performance perspective. DB2’s optimizer, which
decides what is the best access path to take, doesn’t realise that the predicates referring to Col_A on either side
of the OR are set to the same value. As a result, DB2 sometimes chooses an inefficient route to the data.
DB2 DATA
Col_A Col_B
37421 00
37421 01
37421 02
37422 00
37422 01
SQL CODE
SELECT …
FROM tables
WHERE …
AND Col_A >= :ws-a
AND ((Col_A = :ws-a
AND Col_B > :ws-b )
OR Col_A > :ws-a)
By adding an additional predicate into the restart logic we are explicitly telling DB2 that every row on the
restart will be greater than, or equal to, a particular value. This is usually enough to have the optimizer choose
the best access path.
NB: There are some circumstances when the best access path for a restart of a program is different from the
best for a normal run. This is because when a program restarts it will typically process fewer rows, and DB2
will always estimate the number of rows that will be returned. It then uses that value as a factor to determine the
access path to take. As a result, if the query is a join DB2 may choose a less than ideal method of joining the
tables (it has three to choose from) or even for simple queries on a single table DB2 may choose to use an index
when a table scan is best.
Therefore, when coding restart logic for long running programs on the critical path, or if you have concerns
about performance then contact your project team’s DBA. If your project team has not been allocated one then
contact Richard Livett, X4211.